The human body is a marvel of mechanical engineering. Think about the act of standing on one leg while extending the other high into the air—a “Bruce Lee kick.” To you, this might feel like a simple (albeit physically demanding) exertion of muscle. To a roboticist, this is a nightmare of physics. It requires precise center-of-mass control, active stabilization against gravity, and the ability to handle the subtle jitter of muscles—or in a robot’s case, motors.
In the field of humanoid robotics, we have seen incredible progress in dynamic locomotion—running, walking, and jumping. However, quasi-static balance—the ability to maintain difficult, unstable poses for extended periods—remains a significant hurdle. When a robot tries to mimic a human doing yoga or a martial arts stance, it often falls over or shakes uncontrollably.
In this post, we will dissect a new framework called HuB (Humanoid Balance). This research identifies why current Reinforcement Learning (RL) methods fail at extreme balance tasks and proposes a unified pipeline to fix it. We will explore how HuB enables a Unitree G1 robot to perform the “Swallow Balance” and withstand forceful kicks from a soccer ball, tasks where baseline methods consistently fail.

The Problem: Why is Standing Still So Hard?
Before diving into the solution, we need to understand the failure modes of current approaches. The standard modern approach to humanoid control involves tracking-based Reinforcement Learning. The pipeline generally looks like this:
- Motion Capture (MoCap): Record a human performing a skill.
- Retargeting: Map the human’s joint angles to the robot’s joints.
- Simulation Training: Train an RL policy to mimic these motions in a physics simulator (like Isaac Gym).
- Sim-to-Real Transfer: Deploy the policy onto the physical robot.
While this works for walking, it breaks down for extreme balance tasks due to three specific challenges identified by the HuB authors:
- Reference Motion Errors: MoCap data is noisy. If the reference motion says the support foot is sliding slightly (an artifact of the recording), the robot will try to slide its foot while balancing, causing a fall.
- Morphological Mismatch: A robot is not a biological human. Its mass distribution, and therefore its center of mass (COM), differs from a human’s. If a robot strictly copies a human’s joint angles for a one-legged stand, the robot’s COM might not end up over its foot, leading to immediate instability.
- The Sim-to-Real Gap: Real sensors are noisy. IMUs (Inertial Measurement Units) drift and jitter. If the policy isn’t trained to handle this specific type of noise, the robot will oscillate and fall in the real world.
The HuB Framework
HuB is designed as a direct response to these three problems. It is a unified framework consisting of three distinct stages: Reference Motion Refinement, Balance-Aware Policy Learning, and Sim-to-Real Robustness Training.

Let’s break down each component of this architecture mathematically and conceptually.
1. Reference Motion Refinement
The first step is cleaning the data. As the saying goes, “garbage in, garbage out.” If the teacher (the human motion data) is flawed, the student (the robot) will fail.
SMPL-Initialized Retargeting
Standard retargeting often starts with the robot in a “zero pose” (standing straight) and uses optimization to match the human’s pose. This is a non-convex optimization problem, meaning the solver can get stuck in “local minima”—weird, unnatural joint configurations that look mathematically close but are robotically awkward.
HuB improves this by initializing the solver using SMPL parameters. SMPL is a standardized 3D model of the human body. Since the humanoid’s joints are essentially a subset of human joints, using the SMPL Euler angles as the starting point for optimization puts the solver much closer to the correct answer, ensuring the resulting motion is physically natural.
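As a rough illustration of the difference an initial guess makes, here is a minimal sketch of one retargeting step, assuming a generic optimization-based retargeter. The `forward_kinematics` helper and the simple keypoint cost are placeholders for illustration, not the paper’s actual objective:

```python
import numpy as np
from scipy.optimize import minimize

def retarget_frame(human_keypoints, smpl_euler_angles, forward_kinematics):
    """Fit robot joint angles to one frame of human keypoints.

    `smpl_euler_angles` are the per-joint rotations from the fitted SMPL body
    model, used as the initial guess instead of a zero (straight-standing) pose.
    """
    def keypoint_error(q):
        # Map candidate robot joint angles to 3D keypoint positions and
        # measure how far they land from the human's keypoints.
        robot_keypoints = forward_kinematics(q)
        return np.sum((robot_keypoints - human_keypoints) ** 2)

    # Starting the non-convex solver near a natural, human-like pose keeps it
    # out of the awkward local minima that a zero-pose initialization invites.
    result = minimize(keypoint_error, x0=smpl_euler_angles, method="L-BFGS-B")
    return result.x
```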
Grounded Foot Correction & COM Filtering
Two specific physical checks are applied to the data (a code sketch follows the list):
- No Sliding: During single-leg stances, the human foot might appear to jitter in the video. HuB algorithmically locks the support foot’s position in the reference data, forcing the “goal” to be a stable, planted foot.
- Physics Check: Because of the mass differences mentioned earlier, some human poses are physically impossible for the robot. The system calculates the robot’s theoretical Center of Mass (COM) based on its URDF (Unified Robot Description Format). If the projection of the COM falls outside the support foot by more than 0.2 m, that motion frame is discarded.
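Here is a minimal sketch of both checks, assuming the reference is a list of per-frame dictionaries and that a hypothetical `compute_robot_com` helper returns the COM from the URDF’s link masses:

```python
import numpy as np

SUPPORT_MARGIN = 0.2  # meters; max horizontal offset between COM and support foot

def refine_reference(frames, support_foot_pos, compute_robot_com):
    """Pin the support foot and discard frames with an infeasible COM."""
    refined = []
    for frame in frames:
        # No Sliding: lock the support foot to one fixed position for the stance.
        frame["support_foot"] = support_foot_pos

        # Physics Check: keep the frame only if the COM's ground projection
        # stays within 0.2 m of the support foot.
        com = compute_robot_com(frame)
        horizontal_offset = np.linalg.norm(com[:2] - support_foot_pos[:2])
        if horizontal_offset <= SUPPORT_MARGIN:
            refined.append(frame)
    return refined
```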
As seen in the chart below, using SMPL-initialized retargeting significantly reduces the error (loss) between the intended motion and the retargeted motion, particularly for complex tasks like the Deep Squat.

2. Balance-Aware Policy Learning
Once the data is refined, we move to the Reinforcement Learning phase. This is where the robot learns how to execute the motion.
The “Relaxed Tracking” Philosophy
This is a critical insight of the paper. Previous methods forced the robot to track the reference motion as closely as possible. However, because the robot has a different body shape than the human, the “perfect” human pose might be unstable for the robot.
HuB adopts Relaxed Reference Tracking. Instead of punishing every millimeter of deviation, the reward function includes a tolerance parameter (\(\sigma\)).
- Strict Tracking: “You must be exactly at angle \(X\).” (Result: Robot falls because that angle isn’t balanced for its body).
- Relaxed Tracking: “Be around angle \(X\), but find a spot where you don’t fall over.”
The authors found that setting a tolerance of \(\sigma = 0.6\) meters allowed the policy to explore and find its own equilibrium while still looking like the reference motion.
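A common way to encode such a tolerance is an exponential tracking kernel, where \(\sigma\) sets how quickly reward decays with error. The exact reward shape below is an assumption for illustration, not the paper’s formula:

```python
import numpy as np

SIGMA = 0.6  # tolerance quoted above; larger sigma = more slack around the reference

def relaxed_tracking_reward(keypoints, ref_keypoints, sigma=SIGMA):
    """Reward that decays gently with squared tracking error.

    With a tiny sigma, any deviation is punished harshly (strict tracking).
    With sigma = 0.6, the robot can shift toward its own equilibrium while
    still collecting most of the reward near the reference pose.
    """
    squared_error = np.sum((keypoints - ref_keypoints) ** 2)
    return np.exp(-squared_error / sigma ** 2)
```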
Shaping Rewards
To guide the robot within this “relaxed” space, specific shaping rewards are added to the RL objective (a rough sketch follows the list):
- Center of Mass (COM) Reward: Encourages the robot to keep its vertical COM projection inside the support polygon (the foot).
- Foot Contact Mismatch Penalty: This is crucial for single-leg tasks. If the reference says “left leg up,” the robot is heavily penalized if the left foot touches the ground. This prevents the robot from “cheating” by putting the foot down to save itself.
- Close Feet Penalty: Prevents the feet from colliding, which is a common cause of self-induced falls.
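The sketch below shows illustrative versions of these three terms; the thresholds and the omitted weights are assumptions, not values from the paper:

```python
import numpy as np

def shaping_terms(com_xy, support_foot_xy, ref_says_swing_foot_lifted,
                  swing_foot_in_contact, left_foot_xy, right_foot_xy,
                  min_feet_gap=0.10):
    """Return the three shaping signals (before weighting into the RL objective)."""
    # COM reward: largest when the COM's ground projection sits over the support foot.
    com_reward = np.exp(-np.sum((com_xy - support_foot_xy) ** 2))

    # Contact mismatch penalty: fires when a foot the reference keeps lifted
    # touches down, blocking the "cheat" of planting the raised foot.
    contact_mismatch = float(ref_says_swing_foot_lifted and swing_foot_in_contact)

    # Close-feet penalty: fires when the feet come close enough to collide.
    close_feet = float(np.linalg.norm(left_foot_xy - right_foot_xy) < min_feet_gap)

    return com_reward, contact_mismatch, close_feet
```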
3. Sim-to-Real Robustness Training
The final piece of the puzzle is bridging the gap between the perfect simulation and the messy real world.
Localized Reference Tracking
Robots often rely on Visual-Inertial Odometry (VIO) to know where they are in the room. VIO is notorious for drifting. If the robot thinks it is 10 cm to the left of where it actually is, it will try to correct for a phantom error, causing it to lose balance.
HuB removes this dependency. During training and deployment, the robot tracks the reference motion relative to its own root (pelvis). It doesn’t care where in the room it is, only how its limbs are positioned relative to its body. This effectively eliminates VIO drift as a failure mode.
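A minimal sketch of what “localized” means in practice, assuming keypoints are stored as rows of a NumPy array: reference targets are re-expressed in the pelvis (root) frame before the policy ever sees them, so global position estimates never enter the observation:

```python
import numpy as np

def localize_reference(ref_keypoints_world, root_pos_world, root_yaw):
    """Express reference keypoints relative to the robot's own pelvis.

    ref_keypoints_world: (N, 3) array of target keypoints in world coordinates.
    Because the observation is root-relative, drift in the global position
    estimate (e.g., from VIO) simply never reaches the policy.
    """
    cos_y, sin_y = np.cos(root_yaw), np.sin(root_yaw)
    # Rotation by -root_yaw about the vertical axis (undo the robot's heading).
    undo_heading = np.array([[cos_y,  sin_y, 0.0],
                             [-sin_y, cos_y, 0.0],
                             [0.0,    0.0,   1.0]])
    # Shift into the pelvis frame, then remove the heading.
    return (ref_keypoints_world - root_pos_world) @ undo_heading.T
```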
IMU-Centric Observation Perturbation
Most RL papers add “uniform noise” to observations to make the policy robust. The authors argue this is insufficient for balance. Real IMU noise is temporally correlated—it doesn’t just jump randomly; it drifts and sways over time.
HuB models this using Ornstein-Uhlenbeck (OU) noise. They inject this noise into the observed root orientation (Euler angles).
\[
dX_t = -\theta X_t \, dt + \sigma \, dW_t
\]
where \(W_t\) is a standard Wiener process (Brownian motion).
Here, \(X_t\) is the noise, \(\theta\) is the reversion rate (pulling the noise back to zero so it doesn’t drift to infinity), and \(\sigma\) is the intensity. By training the robot to balance even when its “sense of balance” (the IMU) is lying to it via this specific equation, the policy becomes incredibly robust to real-world sensor imperfections.
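The discretized version is easy to simulate; the \(\theta\), \(\sigma\), and time-step values below are purely illustrative, not the paper’s settings:

```python
import numpy as np

def ou_noise(n_steps, dt, theta, sigma, seed=None):
    """Simulate discretized Ornstein-Uhlenbeck noise.

    x[t+1] = x[t] - theta * x[t] * dt + sigma * sqrt(dt) * N(0, 1)
    Successive samples are correlated, so the injected orientation error sways
    and drifts slowly, like a real IMU, instead of jumping randomly each step.
    """
    rng = np.random.default_rng(seed)
    x = np.zeros(n_steps)
    for t in range(n_steps - 1):
        x[t + 1] = x[t] - theta * x[t] * dt + sigma * np.sqrt(dt) * rng.standard_normal()
    return x

# Example: a slowly drifting perturbation added to the observed roll angle.
roll_perturbation = ou_noise(n_steps=1000, dt=0.02, theta=0.5, sigma=0.05)
```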
High-Frequency Push Disturbance
Finally, to simulate the micro-jitters and gear backlash of real hardware, the robot is shoved around in simulation. Unlike previous works that used large, infrequent pushes (to test recovery), HuB uses small, high-frequency pushes (applied roughly once per second, with velocities up to 0.5 m/s). This forces the policy to constantly micro-adjust, creating a “tight” control loop essential for static balance.
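A sketch of how such a disturbance schedule might look inside the simulation loop; the function below simply perturbs the base velocity and is illustrative rather than the paper’s exact implementation:

```python
import numpy as np

PUSH_INTERVAL_S = 1.0  # apply a push roughly every second
MAX_PUSH_VEL = 0.5     # m/s, cap on the injected horizontal velocity

def maybe_apply_push(sim_time, last_push_time, base_lin_vel, rng):
    """Add a small random horizontal velocity kick at the robot's base.

    Small, frequent pushes force the policy to micro-adjust constantly,
    unlike rare large shoves that only test recovery behavior.
    """
    if sim_time - last_push_time >= PUSH_INTERVAL_S:
        base_lin_vel[:2] += rng.uniform(-MAX_PUSH_VEL, MAX_PUSH_VEL, size=2)
        last_push_time = sim_time
    return base_lin_vel, last_push_time
```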
Experiments and Results
The team validated HuB on the Unitree G1 humanoid. They compared it against two baselines: H2O and OmniH2O (state-of-the-art tracking-based controllers).
Simulation Metrics
The table below shows the simulation results. The metrics are telling:
- Succ (Success Rate): HuB achieves 100% success on Swallow Balance and Bruce Lee’s Kick. The baselines achieve 0% and 4% respectively.
- Cont (Contact Mismatch): The baselines have high mismatch scores, meaning they constantly put the non-supporting foot down to stop from falling. HuB has near-zero mismatch.

Real-World Robustness
The most impressive demonstration is the physical robustness. In one experiment, while the robot was performing a single-leg balance, the researchers kicked a soccer ball forcefully at the robot’s torso.
As shown in Figure 4, the robot absorbs the impact. The orange trajectory shows the ball, and the panels show the robot correcting its posture to remain standing. A standard policy would likely overcompensate and fall, but the “relaxed tracking” combined with the push-disturbance training allows the HuB policy to react naturally—deviating from the pose to absorb the energy and then returning to the set point.

Furthermore, the robot demonstrated long-horizon consistency. It could perform “Bruce Lee’s Kick” 10 times in a row in a single continuous run without falling or needing a reset. This reliability is often the missing link in making humanoid robots actually useful.
Conclusion
The HuB paper highlights a crucial lesson in robotics: simply throwing a neural network at a motion capture file isn’t enough. To achieve extreme capabilities, we must tailor the learning process to the robot’s reality.
By refining the input data to be physically feasible, relaxing the tracking to allow the robot to find its own center of mass, and modeling realistic sensor noise, HuB turns a clumsy humanoid into a martial arts master.
This work paves the way for humanoids that can do more than just walk; it opens the door for robots that can operate in constrained spaces, perform complex maintenance tasks requiring odd postures, or simply move with the grace and stability of a human.
Key Takeaways:
- Don’t force the robot to be human: Allow relaxed tracking so the robot can compensate for its own morphology.
- Train for the sensor you have: Modeled noise (OU process) beats generic random noise for Sim-to-Real transfer.
- Data quality matters: Cleaning up foot sliding and mass-feasibility in the reference motion is half the battle.