Introduction

Reinforcement Learning (RL) has revolutionized how legged robots move. We have seen quadrupedal robots traversing rough terrain, climbing stairs, and even performing parkour with remarkable agility. However, there is a lingering problem with these state-of-the-art controllers: they are often incredibly “stiff.”

Most RL locomotion policies are trained to track a specific velocity command. If you push a robot running a velocity-tracking controller, it treats your push as a disturbance to be rejected immediately. It fights back to maintain its target speed. While this works for minor bumps, it fails catastrophically under large forces. The robot either stiffens up and slips, or it breaks itself (and potentially its surroundings) by colliding hard with obstacles.

For robots to work alongside humans, they need compliance. They need to know when to be stiff (to pull a heavy load) and when to be soft (to let a human guide them by hand or to absorb a heavy impact).

In a new paper titled FACET, researchers from Tsinghua University and Shanghai AI Lab introduce a method to solve this. Instead of training a robot to track velocity, they train it to mimic a “virtual spring.”

Figure 1: FACET enables variable compliance. (a) A compliant robot can be guided by hand. (b) A stiff robot can pull heavy loads. (c) The framework applies to different morphologies.

As shown in Figure 1, this approach allows for diverse behaviors: from a robot so soft you can guide it with a string, to one stiff enough to pull a 10kg payload.

Background: The Problem with Velocity Tracking

To understand why FACET is necessary, we must look at how current robots are controlled. Standard RL policies for legged robots usually optimize a reward function based on velocity tracking. The human operator says “move forward at 1 m/s,” and the robot minimizes the error between its actual speed and that target.

While effective for movement, this ignores force. In the real world, interaction involves force. If a robot walks into a wall while trying to maintain 1 m/s, it will apply maximum torque to its motors to fight the wall, potentially damaging its gears.

Table 1 highlights the gaps in current methodologies:

Table 1: Comparison of different control capabilities. Note that only FACET covers position, velocity, compliance, force, inertia, and dynamic adaptation simultaneously.

Impedance Control

The solution lies in a concept from classical robotics called Impedance Control. Rather than commanding a position (\(x\)) or a velocity (\(v\)), we command a relationship between force and motion. We model the robot’s end-effector (or body) as if it were attached to a virtual spring and damper.

The fundamental equation governing this behavior is:

\[
f = K_p \,(x_{des} - x) - K_d \,\dot{x}
\]

Equation 1: The impedance control law.

Here, \(K_p\) represents stiffness (how hard the spring pushes back) and \(K_d\) represents damping (resistance to motion).

  • High \(K_p\): The robot is stiff and resists movement (good for holding heavy objects).
  • Low \(K_p\): The robot is compliant and “squishy” (good for safety and absorbing impacts).
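To make this trade-off concrete, here is a minimal sketch of Equation 1 in plain Python (the gains and displacement are illustrative, not values from the paper):

```python
def impedance_force(x, x_dot, x_des, Kp, Kd):
    """Impedance control law (Equation 1): a virtual spring-damper force."""
    return Kp * (x_des - x) - Kd * x_dot

# The same 0.2 m displacement under two different stiffness settings:
print(impedance_force(x=0.2, x_dot=0.0, x_des=0.0, Kp=500.0, Kd=20.0))  # -100.0 N: pushes back hard
print(impedance_force(x=0.2, x_dot=0.0, x_des=0.0, Kp=20.0,  Kd=5.0))   # -4.0 N: yields softly
```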

While this is standard for robotic arms, applying it to legged robots via RL is difficult because legged robots have complex, intermittent contact with the ground. You cannot simply apply a force at the center of mass; you have to generate that force by pushing against the ground with feet that are constantly making and breaking contact.

The FACET Method

FACET (Force-Adaptive Control via Impedance Reference Tracking) bridges this gap. The core insight is to create a Reference Model—a virtual simulation of a mass-spring-damper system—and train the robot to behave exactly like that virtual model.

Figure 2: Method overview. (a) Robot mimics a reference model. (b) Temporal smoothing of targets. (c) RL policy inputs.

1. The Reference Model vs. The Robot

First, the researchers define the dynamics of the ideal “virtual” system they want the robot to embody. The reference model behaves according to this equation:

\[
m \,\ddot{x}_{ref} = K_p \,(x_{des} - x_{ref}) - K_d \,\dot{x}_{ref} + f_{ext}
\]

Equation 2: The dynamics of the reference model.

In this equation:

  • \(m, K_p, K_d\) are the virtual mass, stiffness, and damping.
  • \(x_{des}\) is the target setpoint (where the spring is anchored).
  • \(f_{ext}\) is the external force (like a kick or a push).

This reference model reacts perfectly to external forces. If you push it (\(f_{ext}\)), the spring compresses (\(x_{des} - x_{ref}\)), and the mass yields.
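Here is a minimal sketch of rolling out this reference model with explicit Euler integration (the mass, gains, and time step are illustrative assumptions, not values from the paper):

```python
def step_reference(x_ref, v_ref, x_des, f_ext, m=10.0, Kp=100.0, Kd=30.0, dt=0.02):
    """One Euler step of the reference dynamics (Equation 2)."""
    a_ref = (Kp * (x_des - x_ref) - Kd * v_ref + f_ext) / m
    v_ref += a_ref * dt
    x_ref += v_ref * dt
    return x_ref, v_ref

# A 50 N push applied for 0.2 s: the virtual mass yields, then the spring pulls it home.
x_ref, v_ref = 0.0, 0.0
for t in range(150):
    x_ref, v_ref = step_reference(x_ref, v_ref, x_des=0.0, f_ext=50.0 if t < 10 else 0.0)
print(f"{x_ref:.3f}")  # back near the setpoint: the push has been absorbed
```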

However, the actual robot has different physics. It is governed by ground reaction forces (\(f_{grf}\)), not virtual springs:

\[
m \,\ddot{x}_{sim} = f_{grf} + f_{ext} + m\,g
\]

Equation 3: The dynamics of the actual robot simulation.

The goal of FACET is to use Reinforcement Learning to make the robot (Equation 3) produce trajectories that look exactly like those of the spring (Equation 2).

2. Tracking Trajectories, Not Forces

Directly measuring external forces on a robot is difficult because most quadruped robots lack force sensors on their body. Therefore, the researchers don’t ask the robot to track the force directly. Instead, they integrate the reference dynamics over time to create a “reference trajectory.”

If the virtual spring model gets pushed, it moves along a specific path. The RL policy is rewarded based on how closely the robot follows that specific path.

\[
\mathcal{L}_{track} = \left\| x_{sim} - x_{ref} \right\|^2 + \left\| \dot{x}_{sim} - \dot{x}_{ref} \right\|^2
\]

Equation 4: The tracking loss (schematic form): the policy is penalized for position and velocity deviations from the reference.

By minimizing the difference between the robot’s position/velocity (\(x_{sim}\)) and the reference model’s position/velocity (\(x_{ref}\)), the robot implicitly learns to generate the ground reaction forces required to mimic the spring.
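As a sketch, the tracking loss can be exponentiated into a bounded reward, a common pattern in legged-robot RL (the scale \(\sigma\) is an illustrative assumption):

```python
import math

def tracking_reward(x_sim, v_sim, x_ref, v_ref, sigma=0.25):
    """Reward closeness to the reference trajectory in position and velocity."""
    err = (x_sim - x_ref) ** 2 + (v_sim - v_ref) ** 2
    return math.exp(-err / sigma**2)  # 1.0 when perfectly on the reference, -> 0 as it drifts
```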

3. Temporal Smoothing for Stability

A major technical contribution of this paper is how they generate these targets. If you simply run the reference model open-loop, it might drift too far from what the physical robot can actually achieve (due to motor limits or friction). If you only look at the immediate next step, the signal is too noisy.

The researchers propose Temporal Smoothing. They generate targets by integrating the reference dynamics from multiple different start times in the past.

\[
x_{ref}^{(k)}(t) = x_{sim}(t - k\,\Delta t) + \int_{t - k\,\Delta t}^{t} \dot{x}_{ref}(\tau)\, d\tau
\]

Reference integration dynamics (schematic form): the \(k\)-th target is produced by re-initializing the reference model at the robot’s actual state \(k\) steps in the past and integrating Equation 2 forward to the present.

The final reward function blends these targets, ensuring the robot learns a stable behavior that respects the spring dynamics without getting confused by instantaneous noise or impossible targets.

\[
r_{track} = \frac{1}{K} \sum_{k=1}^{K} \exp\!\left( -\frac{\left\| x_{sim}(t) - x_{ref}^{(k)}(t) \right\|^2}{\sigma^2} \right)
\]

The temporal smoothing reward (schematic form): tracking rewards against the \(K\) blended targets are averaged.
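Putting the pieces together, here is a sketch of the blended reward in plain Python (the horizon lengths, gains, and simple averaging are illustrative assumptions; it also presumes the simulator logs past states and external forces):

```python
import numpy as np

def smoothed_tracking_reward(state_history, x_sim, v_sim, x_des, f_ext_history,
                             horizons=(5, 10, 20), m=10.0, Kp=100.0, Kd=30.0,
                             dt=0.02, sigma=0.25):
    """Blend tracking rewards over references re-started at several past times."""
    rewards = []
    for k in horizons:
        # Re-initialize the reference at the robot's true state k steps ago, then
        # integrate Equation 2 forward to the present using the recorded pushes.
        x_ref, v_ref = state_history[-k]
        for f_ext in f_ext_history[-k:]:
            a = (Kp * (x_des - x_ref) - Kd * v_ref + f_ext) / m
            v_ref += a * dt
            x_ref += v_ref * dt
        rewards.append(np.exp(-((x_sim - x_ref) ** 2 + (v_sim - v_ref) ** 2) / sigma**2))
    return float(np.mean(rewards))  # averaging keeps targets stable yet responsive
```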

4. Teacher-Student Training

In simulation, we know exactly what external forces (\(f_{ext}\)) are pushing the robot. In the real world, we don’t. To solve this, FACET uses a two-stage training process.

  1. The Teacher Policy: Trained with access to “privileged information” (exact external forces, ground friction, etc.).
  2. The Student Policy: Trained to copy the Teacher, but it only sees “proprioceptive” data (joint angles, IMU). It uses a history of these observations to estimate the external forces implicitly.

Figure 3: The Teacher-Student training architecture.

This setup allows the deployed robot to “feel” a push and react compliantly, even though it has no skin or force sensors.
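Here is a schematic of the distillation step in PyTorch; the network sizes, observation dimensions, and plain MSE imitation loss are illustrative assumptions rather than the paper’s architecture:

```python
import torch
import torch.nn as nn

obs_dim, priv_dim, hist_len, act_dim = 45, 24, 50, 12  # illustrative sizes

# Teacher sees current proprioception plus privileged signals (f_ext, friction, ...).
teacher = nn.Sequential(nn.Linear(obs_dim + priv_dim, 256), nn.ELU(), nn.Linear(256, act_dim))
# Student sees only a history of proprioception and must infer those signals.
student = nn.Sequential(nn.Linear(obs_dim * hist_len, 256), nn.ELU(), nn.Linear(256, act_dim))

def distill_loss(obs, priv, obs_history):
    """Student imitates the teacher's action without access to privileged inputs."""
    with torch.no_grad():
        target = teacher(torch.cat([obs, priv], dim=-1))
    pred = student(obs_history.flatten(start_dim=-2))  # (batch, hist_len * obs_dim)
    return nn.functional.mse_loss(pred, target)
```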

Experimental Results

The researchers validated FACET on a Unitree Go2 quadruped robot. The results demonstrate a significant improvement in robustness and interaction capability compared to standard velocity-tracking baselines.

Surviving the “Big Kick”

One of the most impressive demonstrations is the impulse-resistance test. The robot was commanded to walk at 1.5 m/s, and then a large lateral force (up to 400 N) was applied.

  • Baseline (Vanilla): Tries to fight the force to stay on the line. It creates a stiff response, loses stability, and falls.
  • FACET: Acts like a spring. It allows the force to push it sideways (deviation), absorbs the energy, and then recovers balance.

Figure 4: Left: Success rate under impulse. Middle: Trajectory plots showing FACET (blue) yielding to force while baselines fail. Right: Reduced collision forces.

As seen in the middle graph of Figure 4, the FACET trajectories (blue) show the robot drifting sideways significantly. This is desirable! By “going with the flow,” the robot survives the impact. The baseline (green) tries to hold the line and crashes.

Soft Collisions

Safety is paramount for robots operating in homes. When the FACET robot walks into a wall, the virtual spring compresses. Because the user can set a low stiffness (\(K_p\)), the robot stops gently. Figure 4 (Right) shows that FACET generates significantly lower collision impulses compared to robust baselines.

Heavy Pulling

Compliance is great for safety, but what if you need to do work? Because FACET allows variable stiffness, you can crank up the \(K_p\) (stiffness) parameter. The robot effectively tightens its virtual muscles.

Figure 6: Large force pulling comparison. FACET pulls up to 10kg, while baselines fail at 2.5-5kg.

In the pulling experiment (Figure 6), the robot was tethered to a weight. The built-in controller failed at 2.5kg. A standard robust policy failed at 5kg. FACET successfully pulled 10kg (two thirds of its own body weight) by coordinating its whole body to lean in and apply force, exactly as commanded by the high-stiffness reference model.

Extension to Manipulation

The framework isn’t limited to just the robot’s body. It can be extended to “Loco-manipulation” (a robot dog with a robotic arm). By defining two reference models—one for the body and one for the arm—FACET allows for complex coordinated tasks.

\[
m_b \,\ddot{x}_{b} = K_p^{b}\,(x_{des}^{b} - x_{b}) - K_d^{b}\,\dot{x}_{b} + f_{ext}^{b},
\qquad
m_e \,\ddot{x}_{e} = K_p^{e}\,(x_{des}^{e} - x_{e}) - K_d^{e}\,\dot{x}_{e} + f_{ext}^{e}
\]

Multi-body reference dynamics (schematic form): one impedance model for the base (\(b\)) and one for the end-effector (\(e\)), each with its own stiffness and damping.

Figure 5: Loco-manipulator extension. The robot can coordinate base and arm stiffness to exert precise pulling forces.

This allows a user to, for example, grab the robot’s hand and pull the entire robot around (kinesthetic teaching), or have the robot push heavy objects with its arm while the legs brace for support.
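To make the two-reference idea concrete, here is a minimal sketch with two independent virtual spring-damper models, a soft base and a stiff end-effector (all parameters are illustrative):

```python
# One impedance reference per body; the single policy is trained to track both.
base = dict(x=0.0, v=0.0, x_des=0.0, m=12.0, Kp=40.0,  Kd=15.0)  # soft: a human can drag it
arm  = dict(x=0.0, v=0.0, x_des=0.3, m=2.0,  Kp=300.0, Kd=25.0)  # stiff: holds its target

def step_body(b, f_ext, dt=0.02):
    """Advance one reference model by a single Euler step."""
    a = (b["Kp"] * (b["x_des"] - b["x"]) - b["Kd"] * b["v"] + f_ext) / b["m"]
    b["v"] += a * dt
    b["x"] += b["v"] * dt

for _ in range(50):
    step_body(base, f_ext=20.0)  # someone pulls the base: it yields
    step_body(arm, f_ext=0.0)    # the arm's stiff spring keeps it near its setpoint
print(f"base drifted to {base['x']:.2f} m, arm holds {arm['x']:.2f} m")
```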

Conclusion

FACET represents a shift in how we think about controlling legged robots. Instead of rigid instructions (“be at position X at time T”), it provides a flexible framework (“act like a spring with stiffness K”).

This approach unlocks two critical capabilities:

  1. Safety: The robot becomes compliant, absorbing impacts and stopping gently against obstacles.
  2. Versatility: The same policy can transition from being soft and guideable to being stiff and powerful simply by changing a few input numbers.

By training RL agents to track impedance references rather than raw velocities, FACET paves the way for legged robots that are not just agile, but also safe enough to interact physically with the world—and the people—around them.