Imagine you are trying to pick up a playing card that is lying flat on a table. You can’t just grab it directly because your fingers—or a robot’s gripper—can’t get underneath it. What do you do? You instinctively slide the card to the edge of the table using one finger (non-prehensile manipulation), and once it overhangs the edge, you pinch it (prehensile manipulation).
This sequence of actions feels trivial to humans, but for robots, it is an immense challenge. It requires Long-Horizon Prehensile and Non-Prehensile (PNP) Manipulation. The robot must reason about physics, contacts, and geometry over a long sequence of steps. It has to decide how to slide the object, where to stop, and how to reposition its hand to transition from a slide to a grasp without knocking the card off the table.
In this post, we are diving deep into a research paper titled “SPIN: distilling Skill-RRT for long-horizon prehensile and non-prehensile manipulation.” The researchers propose a framework that combines the foresight of planning algorithms with the speed and robustness of neural networks.

As shown in Figure 1, the tasks involve complex sequences:
- Card Flip: Sliding a card to the edge to grasp and flip it.
- Bookshelf: Toppling a tightly packed book to create space, then grasping and shelving it.
- Kitchen: Manipulating a cup in a sink to orient the handle correctly before placing it in a cupboard.
## The Problem: Why is PNP So Hard?
Solving these problems requires a specific sequence of skills. Standard Reinforcement Learning (RL) often fails here because the “horizon” is too long—the robot might need to execute hundreds of low-level motor commands before seeing a reward.
On the other hand, Task and Motion Planning (TAMP) approaches are great at long horizons but struggle with the “physics” part. Defining the exact symbolic rules for how a book topples or how a card slides is incredibly difficult and prone to simulation errors.
A major bottleneck in these tasks is the State Gap. Let’s say a robot uses a “pushing skill” to move a book. When the push is done, the robot’s gripper is pressing against the book. To start the next skill (e.g., grasping), the robot needs to move its hand to a new position. However, moving the hand away from the book without disturbing it is risky. If the robot bumps the book even slightly, the book might fall flat, rendering the next skill impossible.
Standard motion planners try to find a collision-free path for the hand, but they often fail in these “contact-rich” scenarios where the margin for error is millimeter-thin.
## The Solution: SPIN (Skill Planning to INference)
The researchers introduce SPIN, a framework that treats the problem in two stages:
- Data Generation: Use a sophisticated planner (Skill-RRT) to solve the problem offline in a simulator. This is slow but produces high-quality solutions.
- Distillation: Train a fast neural network policy to imitate these solutions. This results in a system that can run in real-time (a high-level sketch of the two-stage recipe follows this list).
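In code, the recipe looks roughly like this. This is a minimal sketch with placeholder functions (`planner`, `train_policy` are illustrative names, not the paper's code); the real pipeline runs in a physics simulator and is far more involved.

```python
# A high-level sketch of SPIN's two-stage recipe. `planner` and `train_policy`
# are caller-supplied placeholders, not the paper's code.

def generate_demonstrations(planner, problems):
    """Stage 1 (offline, slow): solve many randomized problems in simulation."""
    return [plan for plan in (planner(p) for p in problems) if plan is not None]

def distill(demonstrations, train_policy):
    """Stage 2 (offline): compress the planner's solutions into a fast policy."""
    return train_policy(demonstrations)

# At deployment only the distilled policy runs, e.g. action = policy(observation).
```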

As illustrated in Figure 2, the process involves three main components which we will break down:
- Skills: Pre-trained libraries of actions (e.g., slide, topple).
- Connectors: Special policies trained to bridge the gap between skills.
- Skill-RRT: The planner that sequences everything together.
## Step 1: The Skill Library
Before the robot can plan a complex sequence, it needs basic abilities. The authors pre-train a set of Parameterized Skills using Reinforcement Learning (RL).
- Non-Prehensile (NP) Skills: Actions like sliding, toppling, or pushing. These maximize contact to move objects.
- Prehensile (P) Skills: Standard pick-and-place maneuvers.
Each skill takes a “Goal Object Pose” as input. For example, the sliding skill takes coordinates \((x, y)\) and tries to push the card there. Crucially, these skills are trained independently. The sliding skill doesn’t know that a grasping skill will happen next; it just focuses on sliding.
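To make this concrete, here is a minimal sketch of what a parameterized-skill interface could look like. The names (`ParameterizedSkill`, `precondition`, `execute`) and the toy sliding skill are illustrative assumptions, not the paper's API.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

Pose = Tuple[float, ...]  # e.g. (x, y) on the table, or a full 6-DoF object pose


@dataclass
class ParameterizedSkill:
    name: str                             # e.g. "slide", "topple", "pick_place"
    policy: Callable[[dict, Pose], list]  # maps (state, goal_pose) -> motor commands
    precondition: Callable[[dict], bool]  # valid pre-skill states for this skill

    def execute(self, state: dict, goal_pose: Pose) -> list:
        """Run the skill toward the desired object pose."""
        assert self.precondition(state), "robot is not in a valid pre-skill state"
        return self.policy(state, goal_pose)


# Toy example: a sliding skill that only cares about the card's target (x, y).
slide = ParameterizedSkill(
    name="slide",
    policy=lambda state, goal: [("push_toward", goal[:2])],
    precondition=lambda state: state.get("object_on_table", False),
)
commands = slide.execute({"object_on_table": True}, (0.45, 0.10))
```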
## Step 2: Bridging the Gap with Connectors
This is where SPIN introduces a clever innovation. As mentioned earlier, transitioning from one skill to another is dangerous. A standard motion planner (like RRT) might find a path for the robot arm to move from the end of a slide to the start of a grasp, but it doesn’t account for the subtle physics of breaking contact.
To solve this, the authors introduce Connectors. A connector is a small, goal-conditioned RL policy specifically trained to move the robot from a post-skill state to a pre-skill state with minimal object disturbance.
### How do we train Connectors?
We can’t just train a connector for every possible scenario—that would take forever. The researchers use a method called Lazy Skill-RRT.
- They run a planner that hallucinates that a perfect connector exists. It “teleports” the robot from the end of Skill A to the start of Skill B.
- They record these “teleportations” as training problems.
- They then train an RL agent to actually solve these specific transition problems, penalizing any movement of the object (a reward sketch follows this list).
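To give a feel for the training signal, here is a minimal sketch of the kind of reward a connector policy might optimize: make progress toward the next skill's start configuration while heavily penalizing any object motion. The weights and distance measures are assumptions for illustration, not the paper's exact reward.

```python
import numpy as np


def connector_reward(robot_q, target_q, obj_pose, obj_pose_prev,
                     w_reach=1.0, w_disturb=10.0):
    """Reward = progress toward the pre-skill configuration
    minus a heavy penalty for any object movement."""
    reach_cost = np.linalg.norm(np.asarray(robot_q) - np.asarray(target_q))
    disturbance = np.linalg.norm(np.asarray(obj_pose) - np.asarray(obj_pose_prev))
    return -w_reach * reach_cost - w_disturb * disturbance


# Example step: the arm is near its target configuration but nudged the book by 2 mm.
print(connector_reward(robot_q=[0.1, 0.2], target_q=[0.0, 0.2],
                       obj_pose=[0.502, 0.300], obj_pose_prev=[0.500, 0.300]))
```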

Figure 5 (above) shows why this is necessary. Standard motion planners (Bi-RRT) often fail (shown in rows a, b, c) because they cause the object to drop or collide during the transition. The learned Connectors (used in SPIN) act like “funnels,” guiding the robot safely to the start of the next skill.
## Step 3: Planning with Skill-RRT
Now that the robot has Skills (to move objects) and Connectors (to link skills), it needs a brain to decide which skills to use and where to move the object.
The authors propose Skill-RRT, an extension of the classic Rapidly-exploring Random Tree (RRT) algorithm. Standard RRT explores robot joint configurations. Skill-RRT explores the space of skills and intermediate object goals.

Here is how the Skill-RRT loop works (simplified; a code sketch follows the list):
- Sample: Pick a random skill (e.g., “Topple”) and a random target pose for the object (e.g., “tilted 45 degrees”).
- Nearest Neighbor: Find a node in the existing planning tree that can transition to this new state.
- Extend:
  - Use a Connector to move the robot arm to the starting position for the skill.
  - Execute the Skill in the simulator.
  - If the skill succeeds (the object reaches the target), add this new state to the tree.
- Repeat until the goal is reached.
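Below is a simplified, self-contained sketch of that loop. The `skills`, `regions`, `connector`, and `simulate` arguments are placeholders standing in for the pre-trained components and the physics simulator, so the details differ from the paper's implementation.

```python
# A simplified sketch of the Skill-RRT loop. Tree nodes store object poses and
# robot configurations; the caller supplies the skill library, sampling
# regions, connector policy, and simulator as black boxes.
import random


def distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q)) ** 0.5


def skill_rrt(start, goal_pose, skills, regions, connector, simulate,
              max_iters=1000, tol=0.01):
    # Each node: object pose, robot configuration, and index of its parent.
    tree = [{"obj": start["obj"], "robot": start["robot"], "parent": None}]
    for _ in range(max_iters):
        # 1. Sample a skill and an intermediate object pose from a task region.
        skill = random.choice(skills)
        subgoal = random.choice(regions).sample()

        # 2. Nearest neighbor: the node whose object pose is closest to the sample.
        parent_idx = min(range(len(tree)),
                         key=lambda i: distance(tree[i]["obj"], subgoal))

        # 3. Extend: the connector brings the arm to the skill's start state,
        #    then the skill is rolled out in simulation toward the subgoal.
        pre_skill_state = connector(tree[parent_idx], skill)
        if pre_skill_state is None:
            continue                      # connector failed; discard this sample
        result = simulate(skill, pre_skill_state, subgoal)
        if result is None:
            continue                      # skill did not reach the subgoal

        tree.append({"obj": result["obj"], "robot": result["robot"],
                     "parent": parent_idx})

        # 4. Goal check: stop once the object reaches the task goal pose.
        if distance(result["obj"], goal_pose) < tol:
            return backtrack(tree, len(tree) - 1)
    return None  # no plan found within the budget


def backtrack(tree, idx):
    path = []
    while idx is not None:
        path.append(tree[idx])
        idx = tree[idx]["parent"]
    return list(reversed(path))
```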
### Handling Narrow Passages
A major challenge in planning is finding “narrow passages”—specific regions where the object must be for the task to succeed. For example, to flip a card, it must be exactly at the edge of the table. If the planner randomly samples object positions all over the table, it might never find that specific edge spot.
To address this, the authors define specific Regions (\(Q_{obj}\)) for each domain.

As shown in Figure 3, rather than sampling anywhere in 3D space, the planner samples subgoals within these logical regions (e.g., “Upper Shelf,” “Lower Shelf,” “Sink”). This drastically reduces the search space, making planning feasible.
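Here is a minimal sketch of region-based subgoal sampling, assuming each region \(Q_{obj}\) can be approximated by an axis-aligned box of object poses (a simplification for illustration, not the paper's exact region definitions).

```python
import random


class Region:
    """A named box of object poses the planner is allowed to sample from."""

    def __init__(self, name, low, high):
        self.name = name              # e.g. "table_edge", "upper_shelf", "sink"
        self.low, self.high = low, high

    def sample(self):
        """Uniformly sample an object pose inside the region."""
        return tuple(random.uniform(lo, hi) for lo, hi in zip(self.low, self.high))


# Sampling from a thin strip at the table edge makes the "narrow passage"
# (card overhanging the edge) easy to hit, unlike uniform sampling over the
# whole table surface.
table_edge = Region("table_edge", low=(0.58, -0.20, 0.01), high=(0.62, 0.20, 0.01))
print(table_edge.sample())
```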
## Step 4: Distillation to a Reactive Policy
Skill-RRT is powerful, but it is slow. It can take minutes to compute a plan, which is unacceptable for a robot working in a dynamic real-world kitchen. Furthermore, execution in the real world is noisy; if the object slips slightly, a rigid plan might fail.
SPIN’s final step is Distillation. The goal is to compress the heavy computation of the planner into a fast, reactive neural network.
- Data Generation: The researchers run Skill-RRT on thousands of randomized problems to collect a massive dataset of successful trajectories.
- Robustness Filtering: Not all successful plans are created equal. Some might have succeeded due to “simulation luck.” To ensure quality, they replay each plan multiple times with random noise added and keep only plans that succeed consistently (e.g., >90% of the time).

Figure 6 demonstrates the importance of this filtering. The blue histograms represent the quality of plans kept when using a high success threshold (\(m=0.9\)). These plans are significantly more robust than those kept with a low threshold (\(m=0.1\)) or no filtering at all.
- Diffusion Policy: Finally, they train a Diffusion Policy on this filtered dataset. Diffusion policies are excellent at handling multi-modal distributions—meaning they can learn multiple valid ways to solve a problem without getting confused.
The result is a policy that takes in the current state and outputs motor commands in milliseconds, effectively giving the robot “muscle memory” of the complex plans derived by Skill-RRT.
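Here is a minimal sketch of what noise-replay filtering could look like, assuming a `replay_fn` that re-executes a plan in simulation under small perturbations and reports success; the noise model, replay count, and threshold here are illustrative assumptions.

```python
import random


def is_robust(plan, replay_fn, num_replays=20, threshold=0.9, noise_scale=0.005):
    """Keep a plan only if it succeeds in at least `threshold` fraction of
    noisy replays (e.g., the m = 0.9 setting discussed above)."""
    successes = 0
    for _ in range(num_replays):
        noise = [random.gauss(0.0, noise_scale) for _ in range(6)]  # pose noise
        if replay_fn(plan, noise):
            successes += 1
    return successes / num_replays >= threshold


# Usage (hypothetical): dataset = [p for p in all_plans if is_robust(p, replay_in_simulator)]
```

Only plans passing this test enter the dataset used to train the Diffusion Policy.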
## Experiments and Results
The team evaluated SPIN against several state-of-the-art baselines, including:
- PPO: Standard “flat” Reinforcement Learning.
- MAPLE: A hierarchical RL approach.
- HLPS: A goal-conditioned hierarchical method.
- Skill-RRT: The raw planner (without distillation).
### Simulation Results
The results were stark. Standard RL methods (PPO, HLPS) achieved 0% success on these long-horizon tasks. They simply could not explore the sequence of actions deep enough to find a reward.
Skill-RRT (the planner) worked reasonably well, achieving success rates between 39% and 66%. However, it was slow (taking 80-120 seconds per plan).
SPIN, the distilled policy, outperformed everything.
- Card Flip: 95% Success
- Bookshelf: 93% Success
- Kitchen: 98% Success
Why did SPIN outperform the planner that created its training data? This phenomenon is known as the “oracle effect.” The distilled policy generalizes from thousands of plans. If the robot makes a small error during execution, the policy (having seen similar variations in training) can react and correct it immediately. The raw planner, however, is stuck following a rigid path calculated at the start.
### Importance of Data Quality
The researchers performed an ablation study to see if their “Noise Replay Filtering” actually mattered.

Figure 7 highlights this for the Card Flip task. The x-axis shows the target position for sliding the card.
- Right Chart: The “Safe & Graspable” region is narrow (blue/green blocks).
- Left/Center Charts: The blue bars (SPIN’s high-quality filtering) cluster tightly in the safe region. The green area (No filtering) spreads out into risky areas.
- Takeaway: Filtering removes “lucky” plans that placed the card in risky positions, leaving only the plans that are mechanically robust.
### Real-World Deployment
The ultimate test is the real world. The policy, trained entirely in simulation, was deployed zero-shot on a physical Franka Research 3 robot (meaning no extra training in the real world).
The results were impressive:
- Card Flip: 85% Success
- Bookshelf: 90% Success
- Kitchen: 80% Success
The failures were mostly due to unexpected physical collisions or the hardware torque limits triggering safety stops—issues inherent to the messy reality of physics that simulators can’t perfectly capture.
## Conclusion
The SPIN framework demonstrates a powerful paradigm for robotics: Plan offline, Infer online.
By leveraging the brute-force search capabilities of Skill-RRT combined with Connectors to handle the delicate transitions, the system discovers solutions that are too complex for RL to find on its own. Then, by distilling these solutions into a Diffusion Policy, the robot gains the ability to execute these complex behaviors with the speed and reactivity required for the real world.
This approach bridges the gap between the methodical reasoning of classical planning and the agile, generalized performance of modern deep learning. For students and researchers in robotics, SPIN highlights that we don’t always have to choose between Planning and Learning—often, the best results come from making them work together.