Introduction
Imagine you are learning to play a new sport—say, tennis. You swing the racket, expecting to hit a perfect cross-court shot, but the ball sails wildly out of bounds. What do you do? You don’t just ignore it. You analyze the discrepancy between what you thought would happen and what actually happened. You adjust your mental model of the swing and try again. This process of learning from failure is fundamental to human intelligence.
For robots, however, this is incredibly difficult. Most robots operating in the real world rely on models trained in simulation. While simulators are powerful, they are rarely perfect replicas of the messy, unstructured real world. When a robot encounters a situation it hasn’t seen in its training data—a specific arrangement of cups on a table, or a shelf that is slightly more crowded than expected—it often fails. Typically, the robot has no mechanism to update its understanding; it just fails repeatedly.
In a recent paper titled “Fail2Progress,” researchers from the University of Utah and NVIDIA Research propose a new framework that allows robots to do exactly what humans do: reason about their failures and use them to get smarter.

As shown in Figure 1, the core idea is elegant: when the robot fails in the real world, it shouldn’t just stop. Instead, it should use that failure to generate a targeted simulation environment, practice in that virtual world to update its internal model, and then return to the real world to succeed. This blog post explores how Fail2Progress achieves this using a sophisticated technique called Stein Variational Inference.
Background: The Problem with Skill Effect Models
To understand the contribution of this paper, we first need to look at how these robots plan. The authors build upon Skill Effect Models.
A skill effect model predicts how the world changes when a robot performs an action. For example, if a robot executes a push skill on a box, the model predicts the box’s new position. These models are increasingly moving toward symbolic predictions. Instead of predicting the exact millimeter coordinate of the box (which is hard and brittle), the model predicts symbolic states, such as relations: in-contact(box, shelf) or inside(apple, bowl).
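To make this concrete, here is a minimal, hypothetical sketch of what symbolic effect prediction might look like. The names and types here are illustrative only, not the paper's actual API:

```python
# Hypothetical sketch of symbolic skill effects; not the paper's actual API.
from dataclasses import dataclass

@dataclass(frozen=True)
class Relation:
    name: str     # e.g. "in-contact" or "inside"
    subject: str  # e.g. "box"
    target: str   # e.g. "shelf"

def predict_push_effect(box: str, shelf: str) -> set:
    """Toy effect model: a push toward the shelf is predicted to end with
    the box in contact with it -- a relation, not millimeter coordinates."""
    return {Relation("in-contact", box, shelf)}

predicted = predict_push_effect("box", "shelf")
observed = {Relation("inside", "box", "bin")}  # what perception actually reports
if not predicted <= observed:
    print("Symbolic failure: prediction did not match the observed relations.")
```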
The robot plans a sequence of skills (a plan skeleton) to achieve a goal. If the robot executes a skill and the resulting symbolic state matches the prediction, success! But often, the real world throws a curveball.
Types of Failures
The researchers categorize failures into two buckets:
- Sim-to-Real (Sim2Real) Gap: The robot’s internal physics model is wrong. It tried to push the box, but the friction was different, so the box didn’t move as far as expected.
- Incorrect Symbolic Predictions: This is the subtler, more dangerous failure. The robot successfully executed the motion it planned, but the outcome wasn’t what it predicted because the situation was “out-of-distribution” (OOD). For instance, it tried to place a book on a shelf, but didn’t account for a neighboring object blocking the way because it never saw that specific geometry during training.
Fail2Progress specifically targets the second type: incorrect symbolic predictions. The goal is to take a single failure instance and turn it into a lesson that prevents future failures.
The Core Method: Fail2Progress
The intuition behind Fail2Progress is that a single data point (the failure) isn’t enough to retrain a massive neural network. The robot needs a whole dataset of experiences related to that failure. But letting a robot flail around in the real world to collect data is dangerous and slow.
Instead, the authors propose a Real-to-Sim approach. They want to generate a synthetic dataset in simulation that satisfies two conflicting goals:
- Targeted: The data must resemble the real-world failure scenario (so the robot learns to fix this specific problem).
- Informative: The data should challenge the robot’s model, focusing on areas where the robot is uncertain (maximizing information gain).
The Optimization Problem
This forms the heart of the paper’s mathematical contribution. The researchers frame this as a constrained optimization problem. They want to find a dataset \(\mathcal{D}^+\) that maximizes the Information Gain (measured by KL-divergence between the old model and the new updated model), subject to the constraint that the simulation states \(S^+\) match the observed real-world failure.
The full optimization objective can be written as follows.

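A plausible rendering, reconstructed from the description in this post (with \(\theta\) denoting the model's parameters and \(\mathcal{D}\) its original training data; the paper's exact notation may differ):

\[
\max_{\mathcal{D}^+ = (S^+,\, A^+)} \; D_{KL}\Big( P\big(\theta \mid \mathcal{D} \cup \mathcal{D}^+\big) \,\Big\|\, P\big(\theta \mid \mathcal{D}\big) \Big)
\quad \text{s.t.} \quad S^+ \sim P\big(S \mid \mathcal{R}^F, O^F\big)
\]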
Let’s break this down:
- The Maximization: We are looking for a dataset \(\mathcal{D}^+\) (consisting of states and actions).
- The Objective (\(D_{KL}\)): This term represents the information gain. We want the new dataset to significantly update the model’s posterior distribution. If the data simply tells the model what it already knows, this term is low.
- The Constraint: \(S^+ \sim P(S \mid \mathcal{R}^F, O^F)\). The generated simulation states must be consistent with the observed failure: they must reproduce the observed failure relations \(\mathcal{R}^F\) and the observed point cloud \(O^F\), while remaining free to vary in ways that don't change the failure context.
This creates a posterior distribution that balances the need to match the failure observation with the prior over valid simulation states:

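Written out, in a form consistent with the terms defined below (again a reconstruction, not the paper's verbatim equation):

\[
P\big(S \mid \mathcal{R}^F, O^F\big) \;\propto\; \Gamma\big(\mathcal{R}^F \mid O = \Psi(S)\big)\, P(S)
\]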
Here, \(\Psi\) renders a simulation state into a point cloud, and \(\Gamma(r|O=\Psi(S))\) scores whether that rendered cloud produces the same symbolic relations as the observed failure.
Making it Tractable with Stein Variational Inference (SVI)
Solving the optimization above directly is computationally intractable. It would require running the simulator and retraining the model inside the optimization loop—far too slow for a robot that needs to act.
The authors simplify the objective by using Entropy as a proxy for information gain (specifically, the entropy of the model’s predictive distribution). High entropy means the model is uncertain; therefore, resolving that uncertainty provides high information gain.
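As a quick illustration, predictive entropy is cheap to compute if we assume the model outputs a probability for each candidate relation. The Bernoulli parameterization below is an assumption for the sketch, not necessarily how the paper's model is structured:

```python
# Minimal sketch: entropy of a relational prediction as an uncertainty proxy.
# Assumes the model outputs one Bernoulli probability per candidate relation.
import numpy as np

def predictive_entropy(probs):
    """Sum of per-relation Bernoulli entropies (in nats)."""
    p = np.clip(np.asarray(probs, dtype=float), 1e-6, 1 - 1e-6)
    return float(-(p * np.log(p) + (1 - p) * np.log(1 - p)).sum())

print(predictive_entropy([0.99, 0.02]))  # ~0.15: the model is confident
print(predictive_entropy([0.5, 0.5]))    # ~1.39: maximally uncertain -- practice here
```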
The simplified problem looks like this:

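A reconstruction of the simplified objective, where \(\mathcal{H}\) denotes the entropy of the model's predictive distribution \(P_\theta\) over relations \(\mathcal{R}\) (exact notation may differ from the paper):

\[
\max_{S^+,\, A^+} \; \mathcal{H}\Big[ P_\theta\big(\mathcal{R} \mid S^+, A^+\big) \Big]
\quad \text{s.t.} \quad S^+ \sim P\big(S \mid \mathcal{R}^F, O^F\big)
\]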
To solve this, the authors utilize Stein Variational Inference (SVI), specifically Stein Variational Gradient Descent (SVGD). SVGD is a powerful method that uses a set of particles to approximate a probability distribution.
Why SVI?
- Multi-modal: Failure scenarios are complex. There might be five different ways to arrange objects on a table that all result in the same failure. A standard gradient descent method would collapse to a single solution. SVI maintains a diverse set of particles, capturing different “modes” of the failure.
- Parallelizable: SVI updates particles in parallel, making it highly efficient on GPUs.
The inference process happens in two stages: generating states and generating actions.
Stage 1: Generating Diverse States
First, the system generates a set of simulation states (particles) that match the visual and symbolic profile of the failure. The update rule for these particles uses the Stein Operator:

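In its generic form (this is the standard SVGD update of Liu and Wang, which the paper instantiates for state particles \(s_1, \dots, s_n\); the paper's exact symbols may differ):

\[
s_i \;\leftarrow\; s_i + \epsilon\, \phi(s_i), \qquad
\phi(s_i) \;=\; \frac{1}{n} \sum_{j=1}^{n} \Big[ k(s_j, s_i)\, \nabla_{s_j} \ln P\big(s_j \mid \mathcal{R}^F, O^F\big) \;+\; \nabla_{s_j} k(s_j, s_i) \Big]
\]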
The equation might look intimidating, but it has a physical interpretation. The term \(\nabla \ln P\) acts like a gravitational force, pulling the particles toward states that look like the failure. The second term, involving the kernel \(k\), acts like a repulsive force. It prevents the particles from clumping together, ensuring the robot considers a diverse set of scenarios that could have caused the failure.
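For readers who want the mechanics, here is a generic SVGD step in NumPy. This is the standard Liu and Wang formulation that the paper builds on, applied to a toy density rather than the paper's state posterior:

```python
# Generic SVGD step (Liu & Wang, 2016) on a toy density; the paper applies
# this machinery to state and action particles. Particle array X has shape (n, d).
import numpy as np

def rbf_kernel(X, h=None):
    """RBF kernel matrix K[j, i] = exp(-||x_j - x_i||^2 / h) and its
    gradient with respect to x_j, using the median bandwidth heuristic."""
    diffs = X[:, None, :] - X[None, :, :]            # (n, n, d)
    sq_dists = np.sum(diffs ** 2, axis=-1)           # (n, n)
    if h is None:
        h = np.median(sq_dists) / np.log(len(X) + 1) + 1e-8
    K = np.exp(-sq_dists / h)
    grad_K = -2.0 / h * K[..., None] * diffs         # d k(x_j, x_i) / d x_j
    return K, grad_K

def svgd_step(X, grad_log_p, step=0.1):
    """One update: the kernel-weighted score pulls particles toward high
    probability; the kernel gradient pushes them apart (diversity)."""
    K, grad_K = rbf_kernel(X)
    phi = (K @ grad_log_p(X) + grad_K.sum(axis=0)) / len(X)
    return X + step * phi

# Toy usage: 20 particles settling into a standard 2-D Gaussian.
X = np.random.randn(20, 2) * 3.0
for _ in range(300):
    X = svgd_step(X, grad_log_p=lambda X: -X)  # grad log N(0, I) = -x
```

Note how the two terms in `phi` mirror the attraction/repulsion interpretation above: dropping the `grad_K` term would collapse all particles onto the same mode.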
Stage 2: Generating Informative Actions
Once the scenes are set, the robot needs to decide what actions to practice. It doesn’t just want to repeat the failed action; it wants to explore the parameter space around that action to understand the boundaries of success and failure.
The objective here maximizes the entropy (uncertainty) of the model:

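Schematically, with the state particles \(S^+\) held fixed (a reconstruction, not the paper's verbatim equation):

\[
A^+ \;=\; \arg\max_{A} \; \mathcal{H}\Big[ P_\theta\big(\mathcal{R} \mid S^+, A\big) \Big]
\]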
And the update rule for the action particles pushes them toward high-uncertainty regions:

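Again in generic SVGD form, with the entropy gradient playing the role of the score (a reconstruction; the paper's exact symbols may differ):

\[
a_i \;\leftarrow\; a_i + \epsilon\, \phi(a_i), \qquad
\phi(a_i) \;=\; \frac{1}{n} \sum_{j=1}^{n} \Big[ k(a_j, a_i)\, \nabla_{a_j} \mathcal{H}\big[ P_\theta(\mathcal{R} \mid S^+, a_j) \big] \;+\; \nabla_{a_j} k(a_j, a_i) \Big]
\]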
The Complete Pipeline
Figure 2 illustrates the entire Fail2Progress pipeline; a minimal code sketch of the loop follows the steps below.

- Detect: The robot tries to pick up a cup but fails. It records the point cloud and the relations (e.g., “cup is not held”).
- Generate (SVI): The system spins up the SVI engine. It generates 20 different simulation environments (states) that look like the messy table and 20 different grasp parameters (actions).
- Simulate & Label: It runs these 20 scenarios in a fast physics simulator (IsaacGym).
- Fine-Tune: The robot updates its neural network model using this fresh, targeted batch of data.
- Recover: The robot replans with the smarter model and succeeds.
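Putting the five steps together, the outer loop might look like the following sketch. Every identifier here (`svgd_states`, `svgd_actions`, the robot, model, and simulator objects) is a hypothetical stand-in; the real system couples a learned relational model with IsaacGym:

```python
# High-level sketch of the Fail2Progress loop; all names are hypothetical.
def fail2progress_loop(robot, model, simulator, n_particles=20):
    outcome = robot.execute(model.plan(robot.observe()))
    while not outcome.success:
        # 1. Detect: record the failure's point cloud and symbolic relations.
        cloud, relations = outcome.point_cloud, outcome.relations
        # 2. Generate (SVI): diverse sim states matching the failure, and
        #    action particles aimed at the model's most uncertain regions.
        states = svgd_states(cloud, relations, n_particles)
        actions = svgd_actions(model, states, n_particles)
        # 3. Simulate & label: roll out each pair in the physics simulator.
        data = [(s, a, simulator.rollout(s, a)) for s, a in zip(states, actions)]
        # 4. Fine-tune the skill effect model on the targeted batch.
        model.finetune(data)
        # 5. Recover: replan with the updated model and try again.
        outcome = robot.execute(model.plan(robot.observe()))
    return outcome
```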
Experiments & Results
The authors evaluated Fail2Progress on three challenging long-horizon tasks:
- Hierarchical Tabletop Organization: Putting objects into cups/bowls.
- Multi-object Transport: Putting multiple items into a bag and moving the bag.
- Constrained Packing: Fitting items onto a crowded shelf.
They compared their method against several baselines, including:
- Original: The base model without fine-tuning.
- Replanning: Just trying again without learning.
- Sampling: Using rejection sampling to find data (random guessing).
- Gradient: Using standard Stochastic Gradient Descent (SGD) to generate data.
Simulation Success Rates
The results in simulation were dramatic. As shown in Table 3, standard methods struggled significantly with these complex tasks.

Fail2Progress achieved success rates in the 70-90% range, while the original model often languished below 15%. Even the “Replanning” baseline, which represents a standard robotics approach, only reached ~24%. This confirms that simply trying again isn’t enough; the robot has to fundamentally update its understanding of the world.
Real-World Performance
But does this transfer to physical robots? The authors deployed the system on a mobile manipulator.

Figure 3(a) shows the real-world success rates. Fail2Progress (purple bar) consistently outperforms the Gradient and Sampling baselines across scenarios with 3, 5, and 7 objects.
Figure 3(b) highlights an important limitation, however. The method relies on a good “Sim-to-Real” translation. If the noise in the real-world perception is too high (Sim2Real Gap > 0.6), the performance degrades. If the robot cannot accurately perceive the failure state, it cannot generate a relevant simulation to learn from.
Efficiency
One of the main arguments for using SVI over other sampling methods is efficiency.

Figure 6 shows the optimization time on a logarithmic scale. While Fail2Progress takes about the same time as the Gradient method, it produces much higher-quality data (as seen in the success rates). Compared to rejection sampling (teal bar), whose runtime explodes as the problem gets harder (7 objects), Fail2Progress remains computationally efficient thanks to parallelization on the GPU.
Generalization
A critical question is whether the robot is just memorizing the solution to the specific failure it just saw, or if it is actually learning a generalizable skill.

The authors tested this by fine-tuning the model on failures involving 3 objects and then testing it on scenes with 5 or 7 objects. Figure 8 shows these unseen scenarios. The results (detailed in the paper’s tables) show that the model retains high performance even in these novel situations, suggesting it learned the concept of the interaction rather than just memorizing the specific geometry of the failure.
Qualitative Examples
Finally, seeing is believing. Figure 4 shows the robot in action.

In the second row (Multi-object Transport), the robot initially fails to understand that moving the bag moves the objects inside it—it places the bag on the floor. After Fail2Progress, it learns the “container” relationship and successfully places the bag (and its contents) onto the table.
Conclusion & Implications
Fail2Progress represents a significant step forward in making robots more autonomous and resilient. By acknowledging that pre-trained models will inevitably fail in the open world, the authors provide a structured framework for recovery.
The key takeaways are:
- Failures are Data: A failure is not a dead end; it is a high-signal data point that identifies exactly where the model is weak.
- Simulation as a Thinking Tool: We don’t just use simulation for pre-training. We can use it “at runtime” to hallucinate solutions to current problems.
- Diversity Matters: Using SVI to generate diverse synthetic data is far superior to finding a single solution (Gradient) or random guessing (Sampling).
While limitations remain—specifically regarding Sim2Real gaps and perception noise—this work paves the way for robots that can operate in our homes over long periods. Instead of needing a software update every time they encounter a new type of shelf, they can simply pause, “think” (simulate), learn, and progress.