Introduction

Imagine you are carrying a large, heavy box through a doorway. To get through, you might need to use your hip to nudge the door open while balancing the box with your arms, or perhaps you use a foot to kick a doorstop out of the way. As humans, we perform this kind of “loco-manipulation”—coordinating locomotion and manipulation simultaneously—effortlessly. We treat our limbs as versatile tools; a leg is usually for walking, but it can momentarily become a manipulator if the task demands it.

For robots, however, this fluid coordination is a massive computational headache. Most robotic systems reserve legs strictly for transport and arms strictly for manipulation. Breaking this rigid assignment requires a control system that can dynamically reassign limb roles on the fly without falling over.

In this post, we are diving deep into ReLIC (Reinforcement Learning for Interlimb Coordination), a new framework presented by researchers at the RAI Institute, UC Berkeley, and Cornell. ReLIC allows a quadruped robot (specifically a Boston Dynamics Spot with an arm) to perform complex tasks like carrying yoga balls, closing drawers with its feet, and manipulating large boxes by dynamically mixing model-based control with reinforcement learning.

ReLIC enables a robot to use its limbs flexibly, such as using a leg and arm together to carry a yoga ball.

As shown in Figure 1 above, the core innovation here is flexibility. The robot isn’t just “walking” or “grasping”; it is coordinating an arm (green) and a selected leg (red) to handle an object, while the remaining three legs (purple) handle the complex physics of balancing and moving.

The Challenge: Why is Loco-Manipulation So Hard?

To understand why ReLIC is a breakthrough, we first need to look at why this problem is difficult.

In traditional robotics, we often see a “divide and conquer” approach. If a mobile manipulator needs to pick up an object, it usually:

  1. Drives to the location (Locomotion).
  2. Stops.
  3. Picks up the object (Manipulation).
  4. Drives away.

This is safe, but slow and limited. True loco-manipulation—doing both at once—introduces dynamic coupling. The forces exerted by the arm affect the robot’s balance. Conversely, the gait of the legs introduces vibrations and movements that the arm must compensate for.

If you add the requirement that a leg should stop walking and start manipulating (e.g., closing a drawer), the problem explodes in complexity. The robot effectively changes its topology from a stable four-legged crawler to a precarious three-legged balancer. Previous methods mostly relied on pre-defined heuristics (hard-coded rules for specific tasks) or model-based trajectory optimization, which requires precise knowledge of the environment and often struggles with the messy reality of unstructured worlds.

The ReLIC Architecture

The researchers propose a hierarchical framework that splits the problem into two levels: the Task Level (what do I want to do?) and the Command Level (how do I move my motors to do it?).

The heart of the system is the ReLIC Controller, which resides at the command level. This is not a single end-to-end neural network that takes camera pixels and outputs motor torques. Instead, it is a hybrid architecture designed to get the best of both worlds: the precision of classical control and the robustness of Reinforcement Learning (RL).

Overview of the ReLIC framework, showing the interaction between user inputs, the robot state, and the dual-module controller.

The Adaptive Controller: A Tale of Two Modules

As illustrated in Figure 2, the controller is composed of two interacting modules:

  1. The Model-Based (MB) Controller: This module prioritizes task success. It calculates the necessary movements for the limbs assigned to manipulation, typically using Inverse Kinematics (IK) to figure out how to angle the joints to reach a specific target coordinate (a generic IK update is sketched after this list).
  2. The RL Controller: This module prioritizes locomotion stability. It is a neural network trained to keep the robot upright and walking, regardless of what weird contortions the manipulation limbs are performing.
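
For readers unfamiliar with IK, a single damped-least-squares update looks roughly like the sketch below. This is a generic textbook step, not the controller used in the paper; the Jacobian would come from the robot's kinematic model, and the function name is illustrative.

```python
import numpy as np

def ik_step(jacobian, pos_error, damping=1e-2):
    """One damped-least-squares IK update (generic sketch).

    jacobian:  (3, n) end-effector Jacobian at the current joint angles.
    pos_error: (3,) vector from the current end-effector position to the target.
    Returns an (n,) joint-angle increment that moves the end effector toward the target.
    """
    J = jacobian
    JJt = J @ J.T + damping * np.eye(J.shape[0])   # damping keeps the solve well-conditioned
    return J.T @ np.linalg.solve(JJt, pos_error)
```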

Dynamic Limb Assignment

The “magic” happens in how these two combine. The system uses a binary mask, denoted as \(m\).

  • If \(m=1\) for a specific limb, that limb is in Manipulation Mode.
  • If \(m=0\), that limb is in Locomotion Mode.

The final action sent to the robot’s motors is a blended composition. The manipulation limbs follow the precise Model-Based controller, while the locomotion limbs (and the overall body balance) are governed by the RL policy. This decoupling allows the robot to seamlessly switch roles. A leg can be a walker one second and a pusher the next.
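
To make the masking concrete, here is a minimal NumPy sketch of how a per-limb binary mask could compose the two controllers' joint targets. The shapes, names, and the specific limb chosen are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

# Hypothetical joint targets from each controller: 4 legs x 3 joints per leg.
mb_targets = np.zeros((4, 3))   # model-based targets (manipulation limbs)
rl_targets = np.zeros((4, 3))   # RL policy targets (locomotion limbs)

def blend_actions(mb_targets, rl_targets, mask):
    """Select each limb's joint targets according to the binary mask m.

    mask[i] == 1 -> limb i follows the model-based (manipulation) controller.
    mask[i] == 0 -> limb i follows the RL (locomotion) policy.
    """
    m = mask[:, None]                            # broadcast over each limb's joints
    return m * mb_targets + (1 - m) * rl_targets

# Example: the front-left leg is reassigned to manipulation; the others keep walking.
mask = np.array([1, 0, 0, 0])
action = blend_actions(mb_targets, rl_targets, mask)
```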

Learning to Walk on Three Legs

The most challenging part of this system is training the RL policy. The robot needs to learn how to walk not just with four legs (trotting), but also with three legs (bouncing) while a heavy arm and the fourth leg are off doing something else entirely.

Simulation and Training

The researchers trained the policy in a physics simulator (IsaacLab). The robot is subjected to a variety of randomized conditions—different friction levels, robot masses, and external pushes—to ensure the policy is robust.

The RL agent receives a rich stream of observations at every control step (a rough sketch of how these might be stacked appears after the list):

  • Proprioception: Joint positions, velocities, and the gravity vector.
  • Commands: Where the robot should be going.
  • History: What action it took previously.
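
As a rough illustration of what that observation might look like once flattened for the policy network, consider the sketch below. The layout, names, and dimensions are placeholders; the paper's exact observation space differs.

```python
import numpy as np

def build_observation(joint_pos, joint_vel, gravity, command, last_action):
    """Stack one observation frame for the RL policy (illustrative layout only)."""
    return np.concatenate([
        joint_pos,     # proprioception: joint positions
        joint_vel,     # proprioception: joint velocities
        gravity,       # gravity direction (orientation cue for balance)
        command,       # desired base motion / limb targets
        last_action,   # previous action, giving the policy short-term memory
    ])

# Placeholder dimensions, chosen only to make the call runnable.
obs = build_observation(
    joint_pos=np.zeros(12),
    joint_vel=np.zeros(12),
    gravity=np.array([0.0, 0.0, -1.0]),
    command=np.zeros(4),
    last_action=np.zeros(12),
)
```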

Gait Regularization

If you simply reward an RL agent for not falling over, it will often learn strange, jittery behaviors that look unnatural and can damage the hardware. To prevent this, the researchers introduced specific Gait Regularization rewards.

Gait regularization diagram showing the timing of foot contacts for stable three-legged locomotion.

As shown in Figure 10, the system enforces specific contact timings.

  • Four-Legged: It encourages a symmetric trotting gait.
  • Three-Legged: It enforces a “cyclic bouncing gait.” When one leg is lifted for manipulation, the other three legs must cycle through a specific staggered pattern to maintain dynamic stability.

This structured approach to learning ensures that when the robot switches from four legs to three, it doesn’t just scramble; it transitions into a stable, rhythmic bounce.
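
The exact reward terms live in the paper, but the idea of a phase-based contact reward can be sketched as follows: each walking foot gets a staggered phase offset, and the policy is rewarded when its measured foot contacts match the scheduled stance windows. Every name and constant below is illustrative, not taken from the paper.

```python
import numpy as np

def gait_phase_reward(t, foot_contacts, offsets, active_feet, period=0.5, duty=0.6):
    """Reward feet whose contact state matches a staggered, cyclic schedule.

    t:             current time in seconds.
    foot_contacts: (4,) booleans, True if the foot is on the ground.
    offsets:       (4,) per-foot phase offsets in [0, 1) defining the stagger.
    active_feet:   (4,) booleans, False for a foot lifted for manipulation.
    """
    phase = (t / period + offsets) % 1.0        # where each foot is in its cycle
    desired_stance = phase < duty               # scheduled stance window for each foot
    match = foot_contacts == desired_stance     # does the measured contact match the plan?
    return np.sum(match & active_feet) / max(np.sum(active_feet), 1)
```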

Bridging the Reality Gap: Sim-to-Real

One of the biggest hurdles in robotics is “Sim-to-Real” transfer. A policy that runs perfectly in a clean simulation often fails in the real world because real motors have friction, latency, and torque limits that simulators don’t perfectly model.

The ReLIC team found that standard domain randomization wasn’t enough, particularly for the high-stress scenario of three-legged walking. The solution was Motor Calibration.

They deployed an initial policy on the real robot, collected data on how the actual motors responded (Torque vs. Velocity), and compared it to the simulation.

Torque-Velocity calibration graph showing the difference between ideal simulation limits and real-world data.

In Figure 11, the red lines represent the ideal torque limits in the simulator, and the blue dots are real data. Notice how the real motors often operate outside those naive limits or behave differently near them. By feeding this calibrated data back into the simulation and retraining, the RL policy learned to respect the actual physical limits of the hardware, leading to much smoother and more successful real-world deployment.
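
Conceptually, the calibrated motor model can be folded back into training as a speed-dependent torque clamp. The linear envelope and the numbers below are placeholders; the paper fits the envelope to measured torque-velocity data rather than assuming a particular shape.

```python
import numpy as np

def calibrated_torque_limit(joint_vel, tau_max, vel_max):
    """Speed-dependent torque envelope (a stand-in for a curve fitted from real data).

    The available torque shrinks as the joint spins faster, unlike the constant
    box limit a naive simulator assumes.
    """
    return tau_max * np.clip(1.0 - np.abs(joint_vel) / vel_max, 0.0, 1.0)

def apply_motor_model(tau_cmd, joint_vel, tau_max=90.0, vel_max=10.0):
    """Clip the policy's commanded torque to the calibrated envelope during training."""
    limit = calibrated_torque_limit(joint_vel, tau_max, vel_max)
    return np.clip(tau_cmd, -limit, limit)
```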

Talking to the Robot: Task Interfaces

A powerful controller is useless if you can’t tell the robot what to do. ReLIC supports three levels of user interaction, ranging from low-level control to high-level AI reasoning.

1. Direct Targets

This is the most straightforward method. An operator uses a joystick or pre-defined trajectory to tell the arm and leg exactly where to go. This is useful for precise, repetitive motions.

2. Contact Points

In this mode, the user points to a spot on an object in a 3D point cloud and says, “Put your foot here” or “Put your hand there.” The system then generates the trajectory to make that contact happen.

The Contact Point interface allows users to select specific interaction points on a 3D scan of the environment.

3. Language Instructions

This is the most futuristic interface. The user gives a natural language command like “Use the arm and leg to close the two open drawers.”

To achieve this, the system uses a pipeline involving Vision-Language Models (VLMs), sketched in code after the list:

  1. Segment: The robot takes a picture. A model called SAM 2 (Segment Anything Model 2) outlines all the objects.
  2. Reason: GPT-4o analyzes the image and the prompt. It decides which object is the “drawer” and infers where a hand or foot should push to close it.
  3. Execute: These inferred points are fed into the ReLIC controller as targets.
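
A rough skeleton of that pipeline might look like the sketch below. The function and its arguments are placeholders: segment, reason, and back_project stand in for the actual SAM 2, GPT-4o, and point-cloud lookup components, whose real interfaces the paper does not expose as code.

```python
from typing import Callable, List, Tuple

Point3D = Tuple[float, float, float]

def language_to_contact_points(
    image,
    point_cloud,
    instruction: str,
    segment: Callable,       # e.g. a SAM 2 wrapper returning object masks
    reason: Callable,        # e.g. a GPT-4o call returning (limb, pixel) steps
    back_project: Callable,  # maps an image pixel to a 3D point in the scan
) -> List[Tuple[str, Point3D]]:
    """Skeleton of the language interface: segment, reason, then emit 3D contact targets."""
    masks = segment(image)                     # 1. Segment the scene into objects
    plan = reason(image, masks, instruction)   # 2. Reason about which object to touch, and where
    targets = []
    for limb, pixel in plan:                   # 3. Execute: lift chosen pixels into 3D targets
        targets.append((limb, back_project(pixel, point_cloud)))
    return targets
```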

The Vision-Language pipeline uses GPT-4o to interpret scene semantics and generate contact points.

Experimental Results

The researchers pushed Spot to its limits with 12 diverse tasks designed to test different aspects of coordination.

The 12 evaluation tasks, ranging from manipulating yoga balls to closing drawers and moving chairs.

The tasks were categorized into:

  • Mobile Interlimb Coordination: Carrying big things while moving (e.g., Yoga Ball, Shipping Box).
  • Stationary Interlimb Coordination: Standing on three legs while manipulating (e.g., Trash Bin, Tire Pump).
  • Foot-Assisted Manipulation: Tasks where the leg helps the arm (e.g., Tool Chest, Chair).

Success Rates

The results were impressive. The graphs below compare ReLIC against two baselines: an End-to-End RL policy (trying to learn everything at once) and a Model Predictive Control (MPC) baseline.

Success rates across tasks. ReLIC (purple bars) consistently outperforms E2E and MPC baselines.

ReLIC achieved an average success rate of 78.9%.

  • ReLIC-Direct (Dark Purple) performed best, as human operators provide the most precise targets.
  • ReLIC-Contact and ReLIC-Language (Lighter Purples) performed slightly lower but still demonstrated that the robot could autonomously figure out how to act.
  • The Baselines (Orange and Tan) failed almost completely. The MPC baseline couldn’t handle the complex three-legged dynamics, and the End-to-End RL baseline failed to learn precise manipulation alongside locomotion.

Visualization of Flexible Coordination

One of the most visually interesting results is seeing when the robot uses each limb, and for what purpose.

Timeline of limb usage during tasks showing dynamic switching between balancing, manipulation, and coordination.

In Figure 7, we see the “timeline” of limb usage.

  • Green: Arm Manipulation.
  • Red: Leg Manipulation.
  • Purple: Interlimb Coordination (Both).
  • Gray: Balancing.

Look at the Deck Box (B) task. The robot uses its arm, then switches to using its leg to prop open the lid, then coordinates both. This seamless switching—without the robot needing to reboot or stop to change “modes”—is the hallmark of the ReLIC system.

Conclusion

ReLIC represents a significant step forward in robotic autonomy. By acknowledging that locomotion and manipulation are distinct but deeply interconnected problems, the researchers designed a system that is both robust (thanks to RL) and precise (thanks to Model-Based control).

The implications extend beyond just opening drawers or carrying boxes. This kind of “whole-body intelligence” is essential for robots that will eventually work in our homes and on construction sites—unstructured environments where a robot might need to use an elbow to open a door or a foot to brace a collapsing shelf.

While limitations remain—such as the reliance on an external vision system for the language interface and the open-loop nature of the high-level planner—ReLIC proves that flexible interlimb coordination is not only possible but practical on current hardware.


Note: This blog post explains the research paper “Versatile Loco-Manipulation through Flexible Interlimb Coordination” by Zhu et al. (2025).