Robotic manipulation often feels like a magic trick. We see videos of robots backflipping or picking up delicate objects, and we assume the problem is solved. But there is a massive difference between waving a robotic arm in empty space and interacting with the physical world. The former requires position control (moving from A to B), while the latter requires force control (interacting with resistance).
Imagine opening a heavy door. You don’t just move your hand along a trajectory; you lean into it, applying force while maintaining your balance. If you treated the door like empty air, you would either fail to open it or fall over. This combination of movement and physical interaction is called loco-manipulation.
Traditionally, giving robots this sense of “touch” required expensive hardware force sensors and complex control theories. But what if a robot could learn to estimate and control force using only its internal motion data?
In this post, we dive into a fascinating paper titled “Learning a Unified Policy for Position and Force Control in Legged Loco-Manipulation.” The researchers propose a unified Reinforcement Learning (RL) framework that teaches legged robots to control both position and force simultaneously—without external force sensors.

The Challenge: The Missing Sense of Touch
Legged manipulators (a quadruped robot with an arm mounted on its back, for example) offer a massive workspace and high mobility. However, controlling them is a nightmare of physics.
- Coupled Dynamics: The movement of the arm affects the balance of the legs, and vice versa.
- Contact-Rich Tasks: Tasks like wiping a whiteboard or opening a cabinet require the robot to apply specific forces.
- Hardware Limitations: Most robust, affordable legged robots lack precise force/torque sensors at the end-effector (the robot’s “hand”).
Recent advances in RL have been great for locomotion (walking over rough terrain), but they often focus purely on position—getting the robot’s limbs to specific coordinates. When these position-focused robots encounter a contact-rich task, they often fail because they don’t understand force. Conversely, Imitation Learning (learning by watching humans) often relies on datasets that only record trajectories, missing the critical force information needed to complete the task.
The authors of this paper bridge this gap by proposing a Unified Policy that learns to model force and position together.
The Core Method: Unified Control via RL
The heart of this approach is a clever mathematical formulation that allows an RL agent to control force by adjusting position commands. This is based on the principle of Impedance Control.
The Mathematical Foundation
In classical physics, a spring-mass-damper system describes how objects react to forces. The relationship is often written as:
\[ F = M\ddot{x} + D\dot{x} + Kx \]
Here, \(F\) is the net force, \(x\) is the position, and \(K\), \(D\), and \(M\) represent stiffness, damping, and mass.
The researchers simplify this for the robot’s end-effector. They assume that to apply a specific force, the robot just needs to aim for a “virtual” target position. If the robot wants to push against a wall with 10 Newtons of force, it shouldn’t aim at the wall; it should aim through the wall. The stiffness of the controller will generate the force as it tries to correct the error.
This leads to the derivation of the Target Position (\(x^{target}\)):
\[ x^{target} = x^{cmd} + K^{-1}\left(F^{cmd} + F^{ext} - F^{react}\right) \]
In this equation:
- \(x^{cmd}\) is the position command.
- \(F^{cmd}\) is the force the robot should apply.
- \(F^{ext}\) and \(F^{react}\) represent external disturbances and reaction forces.
By manipulating \(x^{target}\), the policy can switch seamlessly between different modes. If \(F^{cmd}\) is zero, it acts like a position controller. If \(x^{cmd}\) is fixed but \(F^{cmd}\) changes, it acts as a force controller.
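To make the idea concrete, here is a minimal NumPy sketch of the quasi-static version of this computation (ignoring the reaction-force term for brevity). The function name, the diagonal stiffness, and the example numbers are illustrative assumptions, not the paper’s exact implementation:

```python
import numpy as np

def virtual_target_position(x_cmd, f_cmd, f_ext, stiffness):
    """Quasi-static impedance sketch: offset the commanded position so that
    the controller's stiffness produces the commanded force.

    x_cmd     : (3,) commanded end-effector position [m]
    f_cmd     : (3,) force the robot should apply [N]
    f_ext     : (3,) estimated external/disturbance force [N]
    stiffness : (3,) virtual stiffness of the controller [N/m]
    """
    K_inv = 1.0 / np.asarray(stiffness)  # diagonal K^{-1}
    return np.asarray(x_cmd) + K_inv * (np.asarray(f_cmd) + np.asarray(f_ext))

# Example: to press on a wall at x = 0.4 m with 10 N using 500 N/m stiffness,
# the target is placed 2 cm *through* the wall.
x_target = virtual_target_position(
    x_cmd=[0.4, 0.0, 0.3], f_cmd=[10.0, 0.0, 0.0],
    f_ext=[0.0, 0.0, 0.0], stiffness=[500.0, 500.0, 500.0])
print(x_target)  # -> [0.42 0.   0.3 ]
```

Setting `f_cmd` to zero recovers pure position tracking, which is exactly the mode-switching behavior described above.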
The Learning Architecture
The researchers implement this math within a Reinforcement Learning framework. They don’t just program the equations; they train a neural network to learn the dynamics.
As shown in Figure 2 below, the architecture consists of three main parts:
- The Encoder: Compresses the history of robot states (joint angles, velocities).
- The State Estimator: This is the “virtual sensor.” It predicts the external force and the robot’s true state based on proprioception (internal body awareness).
- The Actor: The policy that outputs the actual motor commands.

The State Estimator is crucial. Since the robot has no force sensors, it must infer contact forces by analyzing how its joints are moving versus how they should be moving. If the arm is commanded to move right but stops unexpectedly, the estimator realizes, “I must be hitting something,” and calculates the opposing force.
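Here is a rough PyTorch-style sketch of how these three modules could be wired together. The layer sizes, latent dimension, and output dimensions are assumptions chosen for illustration; the paper’s actual network details may differ:

```python
import torch
import torch.nn as nn

def mlp(sizes):
    """Simple MLP builder with ELU activations and a linear final layer."""
    layers = []
    for i in range(len(sizes) - 1):
        layers += [nn.Linear(sizes[i], sizes[i + 1]), nn.ELU()]
    return nn.Sequential(*layers[:-1])  # drop the final activation

class UnifiedPolicy(nn.Module):
    """Encoder + state/force estimator + actor, driven only by proprioception."""
    def __init__(self, obs_dim=60, history_len=10, latent_dim=32,
                 est_dim=9, act_dim=18):
        super().__init__()
        # Encoder: compresses a history of proprioceptive observations.
        self.encoder = mlp([obs_dim * history_len, 256, 128, latent_dim])
        # State estimator ("virtual sensor"): predicts external force + base state.
        self.estimator = mlp([latent_dim, 64, est_dim])
        # Actor: outputs joint targets from the current obs, latent, and estimate.
        self.actor = mlp([obs_dim + latent_dim + est_dim, 256, 128, act_dim])

    def forward(self, obs, obs_history):
        z = self.encoder(obs_history.flatten(start_dim=-2))
        est = self.estimator(z)  # e.g., estimated F_ext and base velocity
        action = self.actor(torch.cat([obs, z, est], dim=-1))
        return action, est
```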
Controlling the Base
The robot isn’t just an arm; it’s a quadruped. The base (the body) needs to move to support the arm’s pushing and pulling. The researchers extend their formulation to the robot base using velocity commands rather than position commands (since a robot base moves through the world rather than to a fixed point relative to itself).

The target velocity for the base is calculated similarly to the end-effector position, allowing the legs to compensate for forces applied by the arm:
\[ v^{target} = v^{cmd} + D^{-1}\left(F^{cmd} + F^{ext} - F^{react}\right) \]

Here \(D\) plays the role of a virtual damping for the base, analogous to the stiffness \(K\) used for the end-effector.
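In sketch form, the base version simply swaps stiffness for that virtual damping term (the damping values and the exact sign conventions are assumptions here):

```python
import numpy as np

def virtual_target_velocity(v_cmd, f_cmd, f_ext, damping):
    """Base analogue of the virtual target position: a damping term maps
    force error to a velocity offset so the legs lean into the arm's push."""
    D_inv = 1.0 / np.asarray(damping)  # diagonal D^{-1}, [m/s per N]
    return np.asarray(v_cmd) + D_inv * (np.asarray(f_cmd) + np.asarray(f_ext))
```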
Training: Simulating the Touch
How do you train a robot to handle forces in a simulation? You have to beat it up a little.
During training in the Isaac Gym simulator, the researchers subject the robot to a rigorous curriculum. They randomly sample:
- Position Commands: “Move here.”
- Force Commands: “Push this hard.”
- External Disturbances: They apply random virtual forces to the robot to simulate wind, heavy objects, or collisions.
By rewarding the robot for tracking both the target position (\(x^{target}\)) and the base velocity (\(v^{target}\)) under these chaotic conditions, the policy learns a robust internal model of interaction.
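The sketch below illustrates what one step of this command-and-disturbance curriculum could look like. The sampling ranges, reward scales, and the exponential tracking kernel are placeholder assumptions; the paper’s tables define the actual values:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_episode_commands():
    """Randomly sample position, force, and disturbance commands (illustrative ranges)."""
    x_cmd = rng.uniform([-0.3, -0.3, 0.2], [0.6, 0.3, 0.8])  # end-effector target [m]
    f_cmd = rng.uniform(-30.0, 30.0, size=3)                  # commanded force [N]
    f_dist = rng.uniform(-50.0, 50.0, size=3)                 # random external push [N]
    return x_cmd, f_cmd, f_dist

def tracking_reward(x, x_target, v_base, v_target, sigma_x=0.05, sigma_v=0.25):
    """Reward both end-effector target tracking and base velocity tracking."""
    r_pos = np.exp(-np.sum((x - x_target) ** 2) / sigma_x ** 2)
    r_vel = np.exp(-np.sum((v_base - v_target) ** 2) / sigma_v ** 2)
    return r_pos + 0.5 * r_vel
```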
For those interested in the specific signals the robot sees, the observation space \(o_t\) includes gravity vectors, angular velocities, joint states, and previous actions:
\[ o_t = \left[\, g_t,\; \omega_t,\; q_t,\; \dot{q}_t,\; a_{t-1} \,\right] \]
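As a concrete (and assumed) layout, such an observation could be assembled like this:

```python
import numpy as np

def build_observation(gravity, ang_vel, q, dq, last_action):
    """Concatenate the proprioceptive signals listed above into one vector o_t."""
    return np.concatenate([gravity, ang_vel, q, dq, last_action])
```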
You can also view the specific reward terms and randomization ranges used to shape this behavior in the tables below. Note the penalties for collisions and jagged movements, which encourage smooth, safe operation.

From Low-Level Control to High-Level Skills
One of the most powerful applications of this unified policy is Imitation Learning (IL).
In standard IL, a human operates the robot to perform a task (like opening a drawer), and the robot clones the trajectory. But if the drawer is sticky, a position-cloning robot will fail because it doesn’t know how hard to pull.
The researchers use their learned Unified Policy as a foundation. Because the policy includes a Force Estimator, they can collect demonstrations that include both position and estimated force data.
- Teleoperation: A human controls the robot.
- Force Estimation: The low-level policy estimates the contact forces occurring during the task.
- Force-Aware Training: A high-level Diffusion Policy is trained on this data. It learns to output both position and force commands.
This creates a “Force-Aware” imitation policy that can handle contact-rich tasks much better than vision-only baselines.
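Conceptually, every demonstration frame now carries a force channel next to the usual pose channel, and the high-level policy predicts both. A minimal sketch, with field names that are assumptions rather than the paper’s data format:

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class ForceAwareFrame:
    """One time step of a teleoperated demonstration."""
    rgb: np.ndarray      # camera image seen by the high-level policy
    ee_pose: np.ndarray  # (7,) end-effector position + quaternion
    f_est: np.ndarray    # (3,) contact force estimated by the low-level policy

def split_high_level_action(prediction):
    """The high-level (e.g., diffusion) policy predicts both channels; the
    unified low-level policy then tracks them as x^cmd and F^cmd."""
    x_cmd = prediction[:3]   # position command
    f_cmd = prediction[3:6]  # force command
    return x_cmd, f_cmd
```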
Experiments and Results
The team validated their approach on a Unitree B2 quadruped equipped with a Z1 arm, and even tested cross-embodiment on a G1 humanoid.
1. Does the Virtual Sensor Work?
First, they checked if the “imagined” forces matched reality. In simulation, the tracking error was minimal. In the real world, they compared the estimator against a dynamometer (a force measurement device).

As seen in Figure 3(d) and Figure A.9, the estimated force (blue/dotted) tracks the measured force (red/solid) reasonably well. It’s not perfect—there is a “sim-to-real gap”—but it is consistent enough to be useful for manipulation.
2. Can it Perform Useful Skills?
The unified policy enables several distinct behaviors simply by changing the input commands:
- Force Control: Holding a 2.5kg weight against gravity (Figure 5a).
- Compliance: The robot base yields when pushed, allowing a human to guide it (Figure 5b).
- Impedance Control: The robot behaves like a spring, resisting disturbances (Figure 5d).

3. Real-World Task Success
The ultimate test was comparing their Force-Aware Imitation Learning against a standard vision-only baseline on four tasks:
- Wipe Blackboard: Requires constant pressure.
- Open Cabinet: Requires overcoming a magnetic latch.
- Close Cabinet: Similar resistance.
- Open Drawer (Occluded): The robot can’t see the handle, so it must “feel” for it.

The results in Figure 4(c) are striking. In the “Wipe Blackboard” task, the success rate jumped from 22% (without force) to 58% (with force). In the “Open Drawer with Occlusion” task, where the robot is effectively blind, success rose from 30% to 76%.
The quantitative results are summarized below:

When the robot relies only on vision (w/o Force), it often barely grazes the blackboard or fails to pull the drawer hard enough. With the unified policy, it “feels” the contact and adjusts its force command to maintain interaction.
Conclusion
This paper presents a significant step forward for legged loco-manipulation. By creating a unified policy that treats force and position as two sides of the same coin, the researchers have given robots a sense of touch without adding a single new sensor.
Key Takeaways:
- Hardware Independence: Sophisticated force control is possible using standard proprioception and RL, removing the need for fragile and expensive force sensors.
- Unified Architecture: A single policy can handle position tracking, force application, and compliance.
- Better Data: The learned estimator allows for the collection of “force-aware” demonstrations, significantly boosting the performance of imitation learning in contact-rich tasks.
While challenges remain—specifically regarding the accuracy of force estimation at the edges of the workspace and the gap between simulation and reality—this work lays a foundation for robots that can interact with our world not just by looking, but by feeling.