Introduction
In the world of robotics, there is a constant tension between flexibility and safety. On one hand, we want robots to use Neural Networks (NNs) to learn complex behaviors, adapt to new environments, and process high-dimensional sensor data. On the other hand, neural networks are often “black boxes”—we can’t easily guarantee they won’t command a drone to fly into a wall.
To manage this tension, roboticists rely on Model Predictive Control (MPC). MPC is a mathematical framework that plans movements by solving an optimization problem at every control step, strictly adhering to safety constraints (like “avoid obstacles” or “stay within motor limits”).
Recently, researchers have tried to combine these two worlds using Differentiable MPC. The idea is simple: put a neural network in front of an MPC solver. The network looks at the world and predicts the “rules” (costs and constraints), and the MPC solver plans the safe trajectory.
But there is a flaw in this standard setup. It treats the solver and the network as separate entities. The network shouts orders, and the solver tries to execute them. If the solver struggles, the network doesn’t really know until it’s too late.
Enter DEQ-MPC, a new approach presented in a paper from Carnegie Mellon University and Bosch Center for AI. Instead of a sequential pipeline, the researchers propose fusing the network and the solver into a single, joint equilibrium system. By enabling a two-way conversation between the “brain” (network) and the “planner” (solver), they achieve richer representations, faster reaction times, and mathematically smoother training.
In this post, we will tear down the sequential wall and explore how Deep Equilibrium Models can revolutionize robotic control.
Background: The Components
Before diving into the new method, we need to understand the building blocks: Traditional Differentiable MPC and Deep Equilibrium Models.
Differentiable Model Predictive Control
At its core, MPC solves an optimization problem to find a sequence of actions (\(\tau\)) that minimizes a cost function (\(C\)) subject to dynamic rules (\(f\)) and safety constraints (\(h\)).
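In a generic finite-horizon form (a simplified sketch, not the paper’s exact notation), the problem reads:

\[
\tau^* = \arg\min_{\tau = (x_{0:T},\, u_{0:T})} \; C(\tau; \theta) \quad \text{s.t.} \quad x_{t+1} = f(x_t, u_t), \quad h(\tau) \le 0,
\]

where the states \(x_t\) and controls \(u_t\) make up the trajectory \(\tau\), and \(\theta\) collects the parameters of the cost and constraints.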

In a Differentiable MPC setup, a neural network predicts the parameters \(\theta\) (like the weights of the cost function or the position of obstacles). The MPC solver then takes these parameters and outputs the optimal trajectory \(\tau^*\).
To train the neural network, we need to calculate gradients. Since the “layer” in the middle is an optimization solver, not a standard matrix multiplication, we use the Implicit Function Theorem (IFT).
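Concretely, if we write the solver’s optimality (KKT) conditions as \(g(\tau^*, \theta) = 0\), implicitly differentiating them gives the sensitivity of the solution to the parameters. A standard sketch of the result (the paper’s exact expression may differ):

\[
\frac{\partial \tau^*}{\partial \theta} = -\left(\frac{\partial g}{\partial \tau}\right)^{-1} \frac{\partial g}{\partial \theta} \,\Bigg|_{(\tau^*,\, \theta)}.
\]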

This equation allows us to backpropagate through the optimization solution. However, standard methods treat this as a “one-shot” pass: the network predicts \(\theta\), the solver finds \(\tau\), and we compute the gradient.
Deep Equilibrium Models (DEQs)
Deep Equilibrium Models are a fascinating class of neural networks. Instead of stacking layers (Layer 1 \(\to\) Layer 2 \(\to\) … \(\to\) Output), a DEQ effectively runs a single layer infinitely many times until the output stabilizes.
Mathematically, we look for a fixed point \(z^*\) such that if we pass it through the network layer \(d_{\phi}\) again, it doesn’t change:
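\[
z^* = d_{\phi}(z^*, x),
\]

where \(x\) is the external input injected at every application of the layer (this is the standard DEQ fixed-point condition).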

This “infinite depth” allows DEQs to model very complex relationships with fewer parameters. The DEQ-MPC paper leverages this concept to bind the network and the solver together.
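A minimal sketch of a DEQ forward pass, assuming naive fixed-point iteration (real DEQs typically use a root-finder such as Anderson acceleration or Broyden’s method, and backpropagate through the equilibrium with the IFT):

```python
import numpy as np

def deq_forward(layer, x, z0, max_iters=100, tol=1e-6):
    """Apply one layer repeatedly until its output stops changing,
    i.e. until z* = layer(z*, x) holds to within `tol`."""
    z = z0
    for _ in range(max_iters):
        z_next = layer(z, x)
        if np.linalg.norm(z_next - z) < tol:
            return z_next
        z = z_next
    return z

# Toy layer d_phi: a contraction, so the iteration converges.
W = np.array([[0.3, 0.1],
              [0.0, 0.4]])
layer = lambda z, x: np.tanh(W @ z + x)

z_star = deq_forward(layer, x=np.array([0.5, -0.2]), z0=np.zeros(2))
print(z_star)  # the "infinitely deep" output
```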
The Core Method: DEQ-MPC
The authors identify a critical weakness in standard differentiable MPC: The Solver Gap.
In typical setups (Diff-MPC), the network inference happens first, followed by the optimization. The network has to guess the optimization parameters \(\theta\) without knowing how the solver will react or what the current trajectory looks like.

As shown on the left of Figure 1, the standard approach is linear. On the right, DEQ-MPC introduces a feedback loop. The network adapts its prediction \(\theta\) based on the solver’s current trajectory \(\tau\), and the solver updates the trajectory based on the new \(\theta\).
Formulating the Joint Problem
The researchers reformulate control not as a sequence of steps, but as a joint optimization problem. They want to find the optimal trajectory \(\tau^*\) and the optimal network parameters \(\theta^*\) simultaneously.
They treat the neural network inference itself as an equality constraint within the optimization problem:
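\[
\min_{\tau,\, \theta} \; C(\tau; \theta) \quad \text{s.t.} \quad x_{t+1} = f(x_t, u_t), \quad h(\tau; \theta) \le 0, \quad \theta = \mathrm{NN}_\phi(\dots,\ \tau_{0:T})
\]

(a simplified sketch of the joint problem; the “\(\dots\)” stands for the network’s other inputs, such as the robot’s observations).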

Notice the last constraint: \(\theta = \mathrm{NN}_\phi(\dots, \tau_{0:T})\). This means the parameters \(\theta\) must match the network’s output given the current trajectory. This creates a “chicken-and-egg” coupling that forces the two systems to agree.
The Iterative Solution (ADMM)
Solving this joint problem requires a specialized algorithm. The authors use a variation of the Alternating Direction Method of Multipliers (ADMM). Instead of trying to solve everything at once, they alternate between two simpler steps:
- Network Step: Fix the trajectory \(\tau\) and ask the network to predict updated parameters \(\theta\).
- Solver Step: Fix the parameters \(\theta\) and ask the MPC solver to improve the trajectory \(\tau\).

This iterative process (Equation 8) continues until both \(\theta\) and \(\tau\) stabilize—reaching an equilibrium.
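To make the loop concrete, here is a minimal, heavily simplified sketch of the alternation in a toy 1-D setting. The `network_step` and `solver_step` stubs are hypothetical stand-ins for the learned network and the inexact MPC update, not the paper’s implementation:

```python
import numpy as np

def network_step(obs, tau):
    """Stub for NN_phi: predict parameters theta *conditioned on* the
    solver's current trajectory iterate (here, a goal nudged by where
    the trajectory currently ends)."""
    return obs["goal"] + 0.1 * (tau[-1] - obs["goal"])

def solver_step(theta, tau, lr=0.5):
    """Stub for one inexact solver update: pull the trajectory toward
    the current parameters theta instead of solving to convergence."""
    return tau - lr * (tau - theta)

def deq_mpc_solve(obs, tau_init, iters=20, tol=1e-6):
    """Alternate network and solver updates until (theta, tau) reach
    a joint equilibrium."""
    tau = tau_init
    for _ in range(iters):
        theta = network_step(obs, tau)      # network sees the current tau
        tau_new = solver_step(theta, tau)   # solver improves tau under theta
        if np.max(np.abs(tau_new - tau)) < tol:
            return theta, tau_new
        tau = tau_new
    return theta, tau

obs = {"goal": 1.0}
theta, tau = deq_mpc_solve(obs, tau_init=np.zeros(5))
print(theta, tau)  # both settle once the loop reaches equilibrium
```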
Why is this better? It allows the network to “see” the difficulty the solver is having. If the solver is stuck near an obstacle, the trajectory \(\tau\) reflects that. The network sees this \(\tau\) in the next iteration and can adjust the cost function \(\theta\) to help the solver navigate around it.
The Inner Workings: Architecture and Solver
To make this practical, the authors had to make specific design choices for the solver and the network architecture.
The Augmented Lagrangian Solver
Inside the MPC block, the paper uses an Augmented Lagrangian (AL) method. This is a robust way to handle the hard constraints (like “don’t hit the wall”) that are common in robotics.

The Lagrangian \(\mathcal{L}\) combines the costs with the constraints (using multipliers \(\lambda, \eta\) and penalty \(\mu\)). This converts the constrained problem into a series of unconstrained problems that are easier to solve.
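One common form (a generic sketch with equality constraints \(g\), such as the dynamics, and inequality constraints \(h\); the paper’s exact formulation may differ) is

\[
\mathcal{L}_{\mu}(\tau, \lambda, \eta) = C(\tau; \theta) + \lambda^\top g(\tau) + \eta^\top h(\tau) + \frac{\mu}{2}\Big(\|g(\tau)\|^2 + \|\max(0, h(\tau))\|^2\Big),
\]

with the multipliers \(\lambda, \eta\) and the penalty \(\mu\) updated between inner solves until the constraints are satisfied.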
DEQ-MPC-DEQ: A DEQ inside a DEQ
Here things get meta. The authors explore two architectures for the neural network part:
- DEQ-MPC-NN: A standard feedforward network.
- DEQ-MPC-DEQ: The network itself is a Deep Equilibrium Model.
In the second variant, the entire system is a nested fixed-point problem. The outer loop balances the solver and the network, while the inner loop balances the network’s internal state. This “infinite depth” architecture proved to be more stable and powerful in complex environments.
Training and Gradients
One of the biggest headaches in differentiable optimization is gradient quality.
When an optimization solver converges tightly, the “landscape” around the solution can become extremely steep or flat, leading to useless or exploding gradients. This is often due to the penalty parameter \(\mu\) in the Lagrangian becoming very large to enforce constraints.
DEQ-MPC solves this by using the intermediate steps of the optimization for training. Instead of only looking at the final, perfect solution, the loss function compares the trajectory at every iteration of the solver against the expert demonstration.
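Schematically, if \(\tau^{(j)}\) is the solver’s trajectory after iteration \(j\) and \(\tau^{\text{expert}}\) is the demonstration, the imitation loss sums over all \(I\) iterations rather than only the last one (a sketch; the actual weighting may differ):

\[
\mathcal{L}_{\text{train}}(\phi) = \sum_{j=1}^{I} \big\| \tau^{(j)} - \tau^{\text{expert}} \big\|^{2}.
\]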

By supervising the intermediate steps (\(j=1\) to \(I\)), the network learns to guide the solver smoothly from the start, rather than just pointing to the finish line. This acts as a form of “curriculum learning,” providing smoother, more useful gradients.
Experiments and Results
The authors tested DEQ-MPC on several robotic tasks, ranging from simple pendulums to complex quadrotors avoiding dynamic obstacles.
Performance vs. Baselines
The primary metric was how well the robots could execute tasks compared to an “expert” policy. The results were normalized so that 1.0 represents expert performance.

As shown in Figure 2, DEQ-MPC-DEQ (Red) consistently outperforms or matches the baselines. The gap becomes obvious in the hardest environments, like QPoleDynObs (a drone balancing a pole while avoiding moving obstacles), where standard Differentiable MPC (Blue) struggles significantly.
Data Efficiency and Generalization
One of the promised benefits of deep equilibrium models is better representation power. The authors tested this by training the models on varying fractions of the dataset.

Figure 3 illustrates that the DEQ-MPC variants (Purple and Green) achieve lower validation error with less data compared to standard networks. They also continue to improve as more data is added, whereas the standard methods tend to plateau (saturate) earlier.
Stability Under Pressure
Robotics is messy. Constraints can be hard, and cost functions can be sensitive.
The authors tested stability by checking gradient behavior when constraints are tight.

Figure 6 shows a stark contrast. The standard Diff-MPC (Blue/Orange lines) suffers from massive spikes in validation error—a sign of unstable gradients causing the training to diverge. The DEQ-MPC variants (Red/Green lines) remain stable, thanks to the smoother gradient flow provided by the iterative solving process.
Warm-Starting: The Speed Factor
In real-time robotics, you don’t have time to solve a problem from scratch at every control step. You want to “warm-start”: use the solution from the previous step as the starting point for the current one.
Standard Diff-MPC struggles here because the network predicts a brand new \(\theta\) every time, potentially invalidating the previous solution. DEQ-MPC, however, is designed to be iterative.
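A rough sketch of what this looks like at deployment time, reusing the same toy alternation as above (names like `HORIZON` and the shift-and-pad warm start are illustrative assumptions, not the paper’s code):

```python
import numpy as np

def deq_mpc_solve(obs, tau_init, iters=3):
    """Same toy alternation as in the earlier sketch, run for only a
    few iterations because the warm start does most of the work."""
    tau = tau_init
    for _ in range(iters):
        theta = obs["goal"] + 0.1 * (tau[-1] - obs["goal"])   # network step
        tau = tau - 0.5 * (tau - theta)                       # solver step
    return theta, tau

HORIZON = 10
tau_prev = np.zeros(HORIZON)              # cold start happens only once
for step in range(50):                    # receding-horizon control loop
    obs = {"goal": 1.0}                   # hypothetical observation per step
    theta, tau = deq_mpc_solve(obs, tau_prev, iters=3)
    # ...apply the first action of tau to the robot here...
    tau_prev = np.roll(tau, -1)           # shift the plan one step forward
    tau_prev[-1] = tau_prev[-2]           # pad the end of the horizon
```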

Figure 8 shows that DEQ-MPC (Red) maintains high performance even when allowed very few solver iterations (moving left on the x-axis). Standard MPC performance degrades rapidly if the solver isn’t given enough time to converge from scratch.
Real-World Validation
Simulations are great, but do they fly? The authors deployed their code on a Crazyflie nano-quadrotor to navigate through virtual obstacles.

The results were conclusive.

As Table 1 shows, the DEQ-MPC-DEQ policy achieved a 0.0% failure rate in the real world, compared to a 33% crash rate for the standard approaches. This validates that the theoretical stability benefits translate directly to physical hardware reliability.
Conclusion
DEQ-MPC represents a significant step forward in integrating deep learning with control theory. By treating the neural network and the optimization solver as partners in a joint equilibrium, rather than a sequential boss-employee relationship, the framework achieves:
- Richer Representations: The network can adapt its predictions based on the solver’s actual progress.
- Smoother Gradients: Training on intermediate steps prevents the instability often seen in differentiable optimization.
- Better Warm-Starting: The system is naturally primed for the continuous, streaming nature of real-time robotics.
For students and researchers, this paper highlights an important lesson: sometimes the architecture of how we connect modules (like solvers and networks) is just as important as the modules themselves. As we push for robots that are both smarter and safer, unified frameworks like DEQ-MPC will likely become the standard for high-performance control.