Imagine a robot deployed on a search-and-rescue mission in a desert environment, or perhaps an explorer rover on the steep, sandy slopes of Mars. The terrain is treacherous—loose sand shifts underfoot, and large rocks block the path.

Traditionally, a robot’s navigation strategy is strictly avoidance: see a rock, plan a path around it. But what if the path is blocked? Or what if avoiding the rock puts the robot on a slope so steep it might slide?

In the fascinating world of Granular Loco-Manipulation, researchers are flipping the script. Instead of passively avoiding obstacles, they are teaching legged robots to actively manipulate the terrain. By strategically kicking up sand—creating controlled “avalanches”—robots can slide heavy rocks out of their way while simultaneously moving toward their goal.

This blog post breaks down the research paper “Granular Loco-Manipulation: Repositioning Rocks Through Strategic Sand Avalanche”, introducing DiffusiveGRAIN, a learning-based system that allows quadrupedal robots to reshape their environment to ensure safe passage.

The Problem: Why Sand is Hard

Granular media (like sand, gravel, or soil) is notoriously difficult for robots. It acts like a solid when you stand on it, but flows like a fluid once pushed past its yield point.

Previous research has explored “obstacle-aided locomotion,” where robots use rocks as anchor points to push off of. While clever, this is high-risk. If a robot steps on a rock incorrectly, it can slip, get high-centered, or flip over entirely.

Robot flipping over on a sand slope due to bad contact.

As shown in Figure 2 above, a miscalculated step on a steep slope can be catastrophic. The alternative is to move the obstacle. However, moving rocks on sand isn’t like sliding a cup on a table. When you push a rock on sand, you trigger an avalanche. That flowing sand interacts with other rocks nearby, creating a complex chain reaction of movement.

The researchers identified two major gaps in existing technology:

  1. Interference: Moving one rock affects its neighbors. Previous models assumed rocks moved independently, which is incorrect when rocks are clustered.
  2. Robot State: When a robot digs its leg into the sand to move a rock, the robot moves too. The physics of the “excavation” affects the robot’s position and orientation, potentially destabilizing it.

The Setup

To study this, the researchers built a controlled “granular trackway”—essentially a high-tech sandbox that can be tilted to simulate steep dunes.

The experimental setup with the granular trackway and robot.

They used a quadruped robot and a separate gantry system (a robotic arm rig) to collect data. The goal? To execute Loco-Manipulation: a portmanteau of “Locomotion” (moving the robot) and “Manipulation” (moving the world).

The Core Solution: DiffusiveGRAIN

The team developed DiffusiveGRAIN, a framework that predicts how the sand, the rocks, and the robot will all move in response to a specific leg action.

This isn’t just one neural network; it is a system composed of two specialized predictors and a clever adjustment mechanism. Let’s break down the architecture.

System overview showing the Environment and Robot State Predictors.

1. The Environment Predictor (\(f_e\))

The heart of the system is a Diffusion Model. If you’ve heard of DALL-E or Stable Diffusion, you know these models are excellent at generating images. Here, the researchers use a diffusion model with a U-Net backbone to generate depth images.

  • Input: A depth map of the current terrain (showing rocks and slope) and a visual representation of the robot’s leg action (where it will dig).
  • Output: A predicted image showing how the sand surface will change.
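To make that concrete, here is a minimal sketch of what conditional diffusion sampling looks like, using NumPy only and a stub in place of the trained U-Net. The function names, noise schedule, and image sizes are all illustrative assumptions, not the paper’s code:

```python
import numpy as np

def stub_denoiser(noisy_depth, cond, t):
    """Placeholder for the trained U-Net: predicts the noise component.
    `cond` stacks the current depth map and the leg-action image."""
    return 0.1 * noisy_depth  # dummy prediction, for illustration only

def predict_terrain(depth_map, action_map, steps=50, rng=None):
    """DDPM-style ancestral sampling of the post-action depth image."""
    rng = rng or np.random.default_rng(0)
    cond = np.stack([depth_map, action_map])   # conditioning channels
    x = rng.standard_normal(depth_map.shape)   # start from pure noise
    betas = np.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = np.cumprod(alphas)
    for t in reversed(range(steps)):
        eps = stub_denoiser(x, cond, t)
        # remove the predicted noise (DDPM posterior mean)
        x = (x - betas[t] / np.sqrt(1 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
        if t > 0:  # re-inject noise on all but the final step
            x += np.sqrt(betas[t]) * rng.standard_normal(x.shape)
    return x  # predicted depth image after the avalanche

depth = np.zeros((32, 32))
action = np.zeros((32, 32))
pred = predict_terrain(depth, action)
```

Because sampling starts from noise, running it repeatedly yields different plausible outcomes, which is exactly the multi-modality that makes diffusion models a good fit for granular flow.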

Why a diffusion model? Granular flows are complex and stochastic (random). Diffusion models are particularly good at capturing these complex, multi-modal distributions, allowing the robot to predict how multiple rocks will shift simultaneously.

2. The Robot State Predictor (\(f_r\))

While the first model watches the sand, the second model watches the robot. This U-Net takes the current state and the planned action to predict the robot’s next position and orientation.

Crucially, the researchers discovered that the robot’s movement depends heavily on which legs are used. For example, digging with all four legs moves the robot differently than digging with just the front right leg.
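One simple way to give the network that information is to encode the active legs as extra input channels alongside the planned dig map. This is a hedged sketch of such an encoding, not the paper’s implementation; the leg labels and channel layout are my assumptions:

```python
import numpy as np

# Leg indices assumed for illustration: FL=0, FR=1, RL=2, RR=3
def encode_action(dig_depth_map, active_legs):
    """Builds a predictor input: the planned dig map plus one binary
    channel per leg marking which legs execute the action."""
    h, w = dig_depth_map.shape
    leg_channels = np.zeros((4, h, w))
    for leg in active_legs:
        leg_channels[leg] = 1.0  # broadcast the leg indicator
    return np.concatenate([dig_depth_map[None], leg_channels])  # (5, h, w)

x_all = encode_action(np.zeros((16, 16)), active_legs=[0, 1, 2, 3])
x_fr  = encode_action(np.zeros((16, 16)), active_legs=[1])
# Same dig map, different leg channels: the network can learn that
# digging with all four legs shifts the body differently than one leg.
```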

3. Handling Complexity: The Interference Problem

One of the paper’s key insights is that you cannot model rocks in isolation.

Experiment setup showing obstacle interference.

Look at the top right charts in Figure 3. The red dashed line represents how a rock should move if it were alone. The bars show how it actually moves when another rock is nearby.

  • Top Right (0cm): When rocks are touching, the displacement drops significantly (to 42% of the isolated case).
  • Why? The avalanche triggered by the robot creates a flow of sand. If another rock is in the way, it blocks that flow, altering the movement of the target rock. DiffusiveGRAIN’s diffusion model captures these fluid-like interactions that simpler models miss.

4. Effective Action Adjustment (EAA)

There was a practical hurdle: gathering training data with a full robot is slow and risky. It’s much faster to use a gantry arm (manipulator) to poke the sand thousands of times.

However, a fixed arm doesn’t move like a walking robot. The robot slides and turns as it kicks. To bridge this “Sim-to-Real” gap, the team invented Effective Action Adjustment (EAA).

When the robot plans an action, the system calculates where the robot will be halfway through the kick (using the Robot State Predictor). It then virtually “shifts” the action input fed to the Environment Predictor to match this predicted mid-point. This aligns the static training data with the dynamic reality of a walking robot.
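The adjustment can be sketched as a coordinate transform: predict the mid-kick pose, then re-express the planned dig points in that frame before handing them to the Environment Predictor. The stand-in pose model and all names below are assumptions for illustration:

```python
import numpy as np

def predict_mid_pose(pose, action):
    """Stand-in for the Robot State Predictor f_r: returns the robot's
    estimated (x, y, yaw) halfway through the kick. Dummy model here."""
    x, y, yaw = pose
    return (x + 0.5 * action["dx"], y + 0.5 * action["dy"], yaw)

def adjust_action(action_points, pose, action):
    """Effective Action Adjustment (sketch): re-express the planned dig
    points in the frame the robot will actually occupy mid-kick, so the
    gantry-collected environment model sees a matching input."""
    mx, my, myaw = predict_mid_pose(pose, action)
    c, s = np.cos(-myaw), np.sin(-myaw)
    R = np.array([[c, -s], [s, c]])  # rotate into the mid-kick frame
    return (np.asarray(action_points) - np.array([mx, my])) @ R.T

pts = adjust_action([[1.0, 0.0]], pose=(0.0, 0.0, 0.0),
                    action={"dx": 0.2, "dy": 0.0})
# pts[0] is [0.9, 0.0]: the dig point relative to the mid-kick body frame
```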

Planning the Move

With these predictors in place, the robot can now “imagine” the future. It uses a method called Receding Horizon Planning: simulate candidate sequences of leg actions four steps ahead, execute only the first action of the best sequence, then replan from the newly observed state.

It scores these plans based on a Cost Function.

The Cost of Locomotion

First, the robot wants to get to its destination (\(\mathbf{d}^r\)). The cost increases if the robot is far away from the target:

Equation for distance cost.
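A plausible form of this cost, assuming \(\mathbf{p}^r\) denotes the robot’s position (the paper’s exact equation may differ), is simply the Euclidean distance to the goal:

```latex
C_{\text{loc}} = \left\lVert \mathbf{p}^{r} - \mathbf{d}^{r} \right\rVert_{2}
```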

But it must also stay safe. The researchers defined a “Danger Zone” relative to the robot’s heading. If a rock is directly in front of the robot (where it might trip), the cost skyrockets.

Equation for safety cost.

This equation penalizes obstacles within a certain angle (\(\beta\)) of the robot’s path.
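A sketch of such a penalty, assuming \(\theta_i\) is the bearing of rock \(i\) relative to the robot’s heading, \(\mathbf{p}^o_i\) its position, and \(w\) a weight (all symbols beyond \(\beta\) are my assumptions), could look like:

```latex
C_{\text{safe}} = \sum_{i} \mathbb{1}\!\left[\,|\theta_i| < \beta\,\right]
\cdot \frac{w}{\left\lVert \mathbf{p}^{o}_{i} - \mathbf{p}^{r} \right\rVert_{2}}
```

The closer a rock sits inside the danger cone, the larger the penalty.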

The Cost of Manipulation

If the goal is to move rocks, the robot calculates the distance between where the rocks are and where it wants them to be:

Equation for manipulation cost.
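A plausible form, assuming \(\mathbf{p}^o_i\) are the current rock positions and \(\mathbf{d}^o_i\) their targets (symbols assumed; the paper’s exact equation may differ):

```latex
C_{\text{manip}} = \sum_{i} \left\lVert \mathbf{p}^{o}_{i} - \mathbf{d}^{o}_{i} \right\rVert_{2}
```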

By combining these costs, the robot finds the “sweet spot”: a sequence of moves that kicks rocks toward their target locations while inching the robot toward its own goal, all while avoiding flipping over.
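The whole planning loop can be sketched as: roll each candidate action sequence through the predictors, sum the costs, and execute only the first action of the cheapest plan. Everything below — the weights, the angle threshold, the toy simulator, and the cost forms — is an illustrative assumption, not the paper’s implementation:

```python
import numpy as np

def combined_cost(robot_pos, rock_pos, rock_goals, robot_goal,
                  heading, beta=np.pi / 6, w_safe=5.0):
    """Illustrative weighted sum of the three costs described above."""
    loc = np.linalg.norm(robot_pos - robot_goal)                  # locomotion
    manip = sum(np.linalg.norm(r - g) for r, g in zip(rock_pos, rock_goals))
    safe = 0.0
    for r in rock_pos:                                            # danger zone
        v = r - robot_pos
        ang = abs((np.arctan2(v[1], v[0]) - heading + np.pi) % (2 * np.pi) - np.pi)
        if ang < beta:
            safe += w_safe / max(np.linalg.norm(v), 1e-3)
    return loc + manip + safe

def plan_next_action(state, candidate_seqs, simulate, horizon=4):
    """Receding-horizon planning: imagine each 4-step sequence with the
    learned predictors, keep the cheapest, execute only its first step."""
    best_seq, best_cost = None, np.inf
    for seq in candidate_seqs:
        s, cost = state, 0.0
        for a in seq[:horizon]:
            s = simulate(s, a)            # predictors imagine the outcome
            cost += combined_cost(**s)
        if cost < best_cost:
            best_seq, best_cost = seq, cost
    return best_seq[0]

# Toy simulator: the action just translates the robot.
def step(s, a):
    s2 = dict(s)
    s2["robot_pos"] = s["robot_pos"] + a
    return s2

state = dict(robot_pos=np.zeros(2), rock_pos=[], rock_goals=[],
             robot_goal=np.array([1.0, 0.0]), heading=0.0)
seqs = [[np.array([0.25, 0.0])] * 4, [np.array([-0.25, 0.0])] * 4]
first = plan_next_action(state, seqs, step)   # picks the goal-ward move
```

Because only the first action is executed before replanning, prediction errors cannot compound over the full horizon — the robot corrects course after every kick.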

Experiments & Results

The team put DiffusiveGRAIN to the test against a baseline method called GRAIN (which treated obstacles as independent).

The “Loco-Manipulation” Test

The most difficult task involved moving 4 obstacles into a specific zone (below a red line) while the robot navigated to a green target square.

Comparison of DiffusiveGRAIN vs GRAIN in a trial.

In Figure 5, you can see the difference:

  • DiffusiveGRAIN (Top): The robot methodically kicks the rocks. By step 22, all rocks are cleared past the red line, and the robot reaches the green square. Success!
  • GRAIN (Bottom): The robot focuses on moving, but fails to account for how the rocks interact. It only clears 2 of the 4 rocks before giving up.

The numbers back this up:

  • In pure Locomotion tasks, DiffusiveGRAIN achieved 90% success (vs 80% for baseline).
  • In Loco-Manipulation (moving rocks + self), DiffusiveGRAIN achieved 70% success, while the baseline failed significantly, achieving only 20%.

The baseline failed largely because it couldn’t predict how rocks would jam together or how the robot’s own shifting position would ruin its aim.

Prediction Accuracy

Why did DiffusiveGRAIN work better? Because it understood the physics of the sand better.

Bar chart showing prediction errors.

Figure 12 shows the prediction error (MAE) for rock positions.

  • Pink bars (DiffusiveGRAIN): Consistently lower error.
  • Blue bars (GRAIN): Higher error, especially when rocks are close together (0cm distance), underscoring that modeling the “interference” between rocks is crucial.

Out-of-Distribution Testing: Real Rocks

The training data used uniform 3D-printed hemispheres. But the real world is messy. To test robustness, the researchers threw real, irregular rocks onto the slope.

Experiments with real rocks.

Despite never seeing these shapes during training, the robot successfully manipulated them (Figure 6). The diffusion model had learned the underlying dynamics of the sand avalanche, which generalized well to objects of similar mass but different shapes.

Conclusion and Implications

DiffusiveGRAIN represents a shift in how we think about robot navigation. Instead of treating the environment as a static obstacle course, this research treats it as a malleable resource.

By integrating a Diffusion-based Environment Predictor (to understand sand and rock flow) with a Robot State Predictor (to understand self-movement), the robot can choreograph a complex dance of excavation and motion.

Key Takeaways:

  1. Granular interactions are non-linear: You cannot model rocks on sand as independent entities; they influence each other through the medium.
  2. Locomotion affects Manipulation: A robot creates changes in the world while moving, and those movements change the robot. They must be planned jointly.
  3. Active terraforming is possible: Robots can be autonomous bulldozers, clearing their own paths to reach destinations that would otherwise be impossible.

This work paves the way for more capable planetary rovers and rescue robots that don’t just survive their environment, but actively shape it to their advantage.