Generative AI has recently transformed how we create images, videos, and text. Models like DALL-E and Sora rely on diffusion models—systems capable of learning complex, high-dimensional distributions and generating diverse, high-quality outputs. Naturally, roboticists have begun asking: Can we use this same technology to control robots?

Ideally, a robot could “dream” up a path through a complex environment just as easily as an image generator dreams up a sunset. But there is a catch. If an image generator puts a sixth finger on a hand, it looks weird. If a robot planner puts a trajectory through a wall, the robot crashes.

Standard diffusion models lack explicit “constraint awareness.” They generate data that looks like their training set, but they don’t inherently understand the hard laws of physics or the immediate presence of a new obstacle. This limitation has made them risky for safety-critical applications like autonomous driving.

In this post, we will dive deep into a paper that bridges this gap: Constraint-Aware Diffusion Guidance (CoDiG). The researchers propose a framework that forces diffusion models to respect safety constraints during inference, without needing to retrain the model for every new obstacle. To prove it works, they tested it in one of the most demanding environments possible: miniature autonomous racing.

Experimental platform used to evaluate the performance of the CoDiG framework for real-time obstacle avoidance in autonomous racing. The setup includes (a) a down-scaled race track, (b) a custom-built autonomous vehicle, and (c) an obstacle configuration that simulates a challenging and realistic racing scenario.

The Challenge: Hallucinations vs. Hard Walls

To understand why this research is significant, we first need to look at how diffusion models usually work and where they fail in robotics.

A diffusion model is trained to reverse a process of destruction. During training, it takes good data (like a safe driving trajectory) and slowly adds noise until it is just random static. The model learns to reverse this—starting with static and denoising it step-by-step to recover the clean data.

However, in robotics, “clean data” isn’t enough. We need feasible data. A trajectory must:

  1. Adhere to vehicle dynamics (you can’t turn instantly).
  2. Avoid obstacles (you can’t drive through walls).

Existing attempts to fix this usually fall into two buckets:

  • Training-time constraints: You teach the model about obstacles during training. The problem? If the robot sees a new obstacle configuration it wasn’t trained on, it might fail.
  • Inference-time projection: You let the model generate a path, then use a solver to “project” it back into the safe zone. This works, but it is computationally heavy and often too slow for real-time racing.

CoDiG takes a different approach. It modifies the mathematical “score function” (the gradient that guides the denoising) to include a “barrier function.” Think of this barrier as a force field: as the diffusion model tries to finalize a trajectory, the barrier function gently (or forcefully) pushes the path away from obstacles.

Background: Score-Based Generative Modeling

Before we get to the solution, let’s establish the mathematical foundation. The authors view diffusion through the lens of Stochastic Differential Equations (SDEs).

The Forward Process

Imagine a clean trajectory \(x_0\). We gradually destroy it over time \(t\) ranging from 0 to \(T\). This is modeled by a forward SDE:

\[
dx_t = f(x_t, t)\,dt + g(t)\,dW_t
\]

Here, \(f\) is a drift term (deterministically steering the state) and \(g\) is a diffusion coefficient (scaling the injected noise). A common choice is the Ornstein–Uhlenbeck (OU) process, which pulls the data toward a mean (usually zero) while adding noise.

\[
dx_t = -\tfrac{1}{2}\beta(t)\,x_t\,dt + \sqrt{\beta(t)}\,dW_t
\]

Because this process is well-defined, we can calculate the probability distribution of the noisy data at any time \(t\). As \(t\) approaches \(T\), the data becomes indistinguishable from pure Gaussian noise.
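To make this concrete, here is a minimal NumPy sketch of the OU forward process, assuming a constant noise rate \(\beta\) (a simplification of the time-varying schedules used in practice):

```python
import numpy as np

def ou_forward_sample(x0, t, beta=1.0, rng=None):
    """Sample x_t from the OU forward process in closed form.

    For dx = -0.5*beta*x dt + sqrt(beta) dW with constant beta, the marginal
    is Gaussian: x_t ~ N(exp(-0.5*beta*t) * x0, (1 - exp(-beta*t)) * I).
    """
    rng = np.random.default_rng(rng)
    mean_scale = np.exp(-0.5 * beta * t)
    std = np.sqrt(1.0 - np.exp(-beta * t))
    return mean_scale * x0 + std * rng.standard_normal(x0.shape)

# A clean "trajectory" (here just 50 one-dimensional waypoints).
x0 = np.linspace(-1.0, 1.0, 50)
x_small_t = ou_forward_sample(x0, t=0.01, rng=0)  # barely perturbed
x_large_t = ou_forward_sample(x0, t=10.0, rng=0)  # close to pure noise
```

At \(t = 10\) the mean scale is \(e^{-5} \approx 0.007\): virtually all information about \(x_0\) is gone, which is exactly the property the reverse process exploits.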

The Reverse Process

The magic happens in reverse. To generate new data, we start with noise (\(x_T\)) and solve a reverse-time SDE to get back to \(x_0\). The equation for this reverse process is:

\[
dx_t = \left[f(x_t, t) - g(t)^2\,\nabla_x \log p_t(x_t)\right]dt + g(t)\,d\bar{W}_t
\]

Look closely at the term inside the brackets: \(\nabla_x \log p_t(x_t)\). This is the score function. It points in the direction of higher data density—essentially telling the noise how to rearrange itself to look like a valid trajectory.

Since we don’t know the true distribution of all safe trajectories, we train a neural network, denoted as \(s_\theta\), to approximate this score.

\[
\min_\theta \;\mathbb{E}_t\,\mathbb{E}_{x_0,\,x_t}\!\left[\lambda(t)\,\big\|s_\theta(x_t, t) - \nabla_{x_t}\log p_t(x_t \mid x_0)\big\|^2\right]
\]

Once trained, we can plug this network into the reverse SDE and generate trajectories. However, this standard formulation has no way to accept “new” constraints like a sudden obstacle on the track.
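To see how sampling works end to end, here is a toy Euler–Maruyama integration of the reverse SDE in one dimension. Instead of a trained network \(s_\theta\), it uses the closed-form score of a Gaussian toy distribution (an illustration, not the paper's planner):

```python
import numpy as np

def reverse_sde_sample(score, T=8.0, n_steps=1000, n_samples=2000, seed=0):
    """Euler-Maruyama integration of the reverse-time SDE
    dx = [f(x, t) - g(t)^2 * score(x, t)] dt + g(t) dW_bar,
    for the OU forward process with f = -x/2 and g = 1."""
    rng = np.random.default_rng(seed)
    dt = T / n_steps
    x = rng.standard_normal(n_samples)  # start from pure noise x_T
    for i in range(n_steps):
        t = T - i * dt
        drift = -0.5 * x - score(x, t)  # f - g^2 * score
        x = x - drift * dt + np.sqrt(dt) * rng.standard_normal(n_samples)
    return x

# Toy data distribution N(mu, 1). Under this OU process the noisy marginal
# at time t is N(mu * exp(-t/2), 1), so the true score is known exactly.
mu = 3.0
true_score = lambda x, t: -(x - mu * np.exp(-0.5 * t))

samples = reverse_sde_sample(true_score)
```

Starting from pure noise, the samples end up distributed around \(\mu = 3\) with unit variance: the reverse SDE has recovered the data distribution.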

The CoDiG Method: Steering with Math

The core contribution of this paper is changing how the reverse sampling happens. Instead of just following the learned score (which only knows about the training data), the researchers inject explicit knowledge about constraints into the process.

Defining the Constraint

Let \(\mathcal{C}\) be the “feasible region”—the parts of the track where it is safe to drive.

\[
\mathcal{C} = \left\{x \,:\, h_i(x) \le 0,\; i = 1, \dots, m\right\}
\]

If we want the model to generate samples only within this region, we are effectively asking for a conditional distribution: \(p_t(x_t | \mathcal{C})\). The researchers propose a clever way to model this without retraining. They define the constrained distribution as the original distribution multiplied by a barrier function.

\[
p_t(x_t \mid \mathcal{C}) \;\propto\; p_t(x_t)\,\exp\!\left(-\gamma_t\,V(x_t; \mathcal{C})\right)
\]

Here, \(V(x_t; \mathcal{C})\) is a barrier potential. It is a mathematical function that returns a value close to zero if the trajectory is safe, but shoots up to a high value if the trajectory hits an obstacle. The term \(\gamma_t\) is a weight that controls how strong this “force field” is at time \(t\).

The Gradient Shift

The beauty of this formulation is what happens when we take the “score” (the gradient of the log-probability).

\[
\nabla_x \log p_t(x_t \mid \mathcal{C}) = \nabla_x \log p_t(x_t) - \gamma_t\,\nabla_x V(x_t; \mathcal{C})
\]

This simple equation is the key to the whole framework. The new score is simply the original learned score minus the gradient of the barrier function.

  • Original Score (\(\nabla_x \log p_t\)): “This looks like a realistic driving path.”
  • Barrier Gradient (\(\nabla_x V\)): “Move away from that wall!”

By combining them, we get a modified reverse SDE that we can solve during inference:

\[
dx_t = \left[f(x_t, t) - g(t)^2\left(s_\theta(x_t, t) - \gamma_t\,\nabla_x V(x_t; \mathcal{C})\right)\right]dt + g(t)\,d\bar{W}_t
\]

This equation tells us how to update the trajectory at every step of the denoising loop. The neural network provides the realism, and the barrier function provides the safety.
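The guided update is easy to sketch in code. Everything below (the 1-D obstacle, the \(\gamma_t\) schedule, the toy score) is illustrative, chosen only to show the sign of the effect:

```python
import numpy as np

def guided_score(x, t, learned_score, barrier_grad, gamma):
    """CoDiG-style guidance: grad log p_t(x | C)
    = grad log p_t(x) - gamma_t * grad V(x; C)."""
    return learned_score(x, t) - gamma(t) * barrier_grad(x)

# Toy 1-D setup: the "learned" score pulls samples toward 1.0, but the
# region x > 0.5 is inside an obstacle, penalised by V(x) = max(x - 0.5, 0)^2.
learned = lambda x, t: -(x - 1.0)
barrier_grad_fn = lambda x: 2.0 * np.maximum(x - 0.5, 0.0)
gamma = lambda t: 5.0 * (1.0 - t)  # guidance strengthens as t -> 0

x = np.array([0.9])  # a point inside the obstacle
s_plain = learned(x, 0.0)  # +0.1: points deeper into the obstacle
s_guided = guided_score(x, 0.0, learned, barrier_grad_fn, gamma)  # negative
```

The unguided score at \(x = 0.9\) still points toward the obstacle's interior; the barrier gradient overwhelms it and flips the update back toward the feasible region.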

Designing the Barrier Function for Racing

For autonomous racing, the constraints are physical. The car must stay on the track and avoid obstacles. The researchers designed a specific barrier function \(V\) composed of two parts:

In schematic form (with \(w_1, w_2\) as weights and \(y_{\mathrm{nom}}\) the nominal path):

\[
V(x_t; \mathcal{C}) = w_1\,\mathbb{1}\{\hat{y} \notin \mathcal{C}\} + w_2\,\|\hat{y} - y_{\mathrm{nom}}\|^2
\]

  1. Safety (First Part): An indicator function that activates if the lateral position \(\hat{y}\) is outside the feasible region \(\mathcal{C}\). This pushes the car out of obstacles.
  2. Feasibility & Optimality (Second Part): This term penalizes deviations from a “nominal” path. It helps account for track curvature (which is lost in local coordinate transformations) and encourages the car to take a time-optimal line (cutting corners safely).
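A minimal sketch of such a two-part barrier might look like the following. The smooth quadratic penalty stands in for the paper's indicator term so that gradients exist everywhere; the weights and shapes are assumptions, not the paper's exact formula:

```python
import numpy as np

def barrier(y_hat, lower, upper, y_nom, w_safe=10.0, w_nom=0.1, eps=0.05):
    """Two-part barrier: a safety term that penalises lateral positions
    outside the corridor [lower, upper] (a smooth stand-in for the paper's
    indicator, so gradients exist), plus a penalty on deviation from a
    nominal path y_nom that encodes curvature / time-optimality."""
    below = np.maximum(lower - y_hat, 0.0)
    above = np.maximum(y_hat - upper, 0.0)
    safety = w_safe * ((below / eps) ** 2 + (above / eps) ** 2)
    nominal = w_nom * (y_hat - y_nom) ** 2
    return float((safety + nominal).sum())

# Lateral corridor [-1, 1], nominal path on the centerline.
y_safe = np.zeros(5)
y_violating = np.array([0.0, 0.0, 1.3, 0.0, 0.0])  # one waypoint off track
```

A safe trajectory incurs zero safety cost, while a single waypoint outside the corridor makes \(V\) explode, which is exactly the "force field" behaviour described above.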

Visualizing the Impact

Does this math actually change the trajectory? The difference is striking.

Below, you can see the denoising process. On the left (a), the standard model generates a valid loop, but it clips the obstacle (the gray box). On the right (b), with the barrier function applied, the trajectory creates a clean curve around the obstacle.

Figure 1: Intermediate denoising results during sampling at three representative time steps t=1s, 0.591s, 0.002s. (a) Sampling without barrier function. (b) Sampling with barrier function.

Notice how early in the process (middle column) the guided model (b) is already reshaping the “flow” of the trajectory to avoid the forbidden zone.

Real-Time Execution: The Need for Speed

Running a 1000-step denoising loop to generate one path is fine for producing an image, but a race car moving at high speed needs decisions now. A standard diffusion process might run at 0.25 Hz (one plan every 4 seconds), far too slow to avoid a dynamic obstacle.

To solve this, the authors introduce a Warm-Start Strategy.

The Insight

In robotics, the world doesn’t change instantly. If you plan a path at time \(T\), the path you need at time \(T + 0.01s\) is probably very similar.

Instead of starting the diffusion process from pure, random noise every single time, CoDiG takes the previous solution, adds a small amount of noise to it, and then runs the denoising process for only a few steps.

This serves two purposes:

  1. Speed: It drastically reduces the number of steps needed (e.g., from 1000 steps down to 50), boosting the frequency to 2.5 Hz.
  2. Consistency: It ensures the new plan isn’t wildly different from the old one, which is important for smooth control.
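The warm-start idea fits in a few lines. The restart time, step count, and toy denoiser below are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def warm_start_plan(prev_traj, denoise_step, t_restart=0.3, n_steps=50,
                    beta=1.0, seed=None):
    """Re-noise the previous plan to a small diffusion time t_restart,
    then denoise from there instead of from pure noise."""
    rng = np.random.default_rng(seed)
    # Partially re-noise via the OU forward marginal.
    scale = np.exp(-0.5 * beta * t_restart)
    std = np.sqrt(1.0 - np.exp(-beta * t_restart))
    x = scale * prev_traj + std * rng.standard_normal(prev_traj.shape)
    # Short denoising schedule from t_restart down to ~0.
    dt = t_restart / n_steps
    for i in range(n_steps):
        x = denoise_step(x, t_restart - i * dt, dt)
    return x

# Toy denoiser that pulls the trajectory toward a known clean solution
# (a stand-in for a real score-based denoising step).
clean = np.linspace(0.0, 1.0, 20)
pull = lambda x, t, dt: x + 8.0 * (clean - x) * dt
new_plan = warm_start_plan(clean, pull, seed=0)
```

Because the previous plan was already close to correct, a few dozen cheap steps suffice to land back on a clean trajectory, instead of the roughly 1000 steps needed from scratch.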

Figure 6: Comparison of reference trajectories generated with and without the warm start technique. Black lines are standard diffusion (slow, from scratch). Red lines are warm-start (fast, from previous).

In the figure above, you can see that the warm-start trajectories (red) are slightly coarser but effectively identical in safety and shape to the full diffusion trajectories (black), while being computationally cheap enough for real-time use.

Experimental Setup and Results

The team deployed CoDiG on a custom-built 1:28 scale race car. The system architecture involves several moving parts:

  1. Perception: A map and obstacle detector.
  2. Diffusion Planner: The CoDiG model generates the reference path (\(y_{ref}\)).
  3. Tracking MPC: A Model Predictive Controller takes the reference path and calculates the exact steering and throttle commands (\(u\)) to follow it.
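Schematically, the three components tie together in a loop like this (function names and signatures are illustrative stand-ins, not the paper's code):

```python
def control_loop(perceive, plan, track, actuate, n_iters=3):
    """Skeleton of the perception -> diffusion planner -> MPC pipeline."""
    prev_ref = None
    for _ in range(n_iters):
        state, obstacles = perceive()                           # 1. perception
        prev_ref = plan(state, obstacles, warm_start=prev_ref)  # 2. CoDiG: y_ref
        u = track(state, prev_ref)                              # 3. MPC: command u
        actuate(u)                                              # send to the car
    return prev_ref

# Wiring it up with trivial stubs:
commands = []
final_ref = control_loop(
    perceive=lambda: (0.0, []),
    plan=lambda state, obstacles, warm_start: (warm_start or 0) + 1,
    track=lambda state, ref: 0.5 * ref,
    actuate=commands.append,
)
```

Note how the previous reference is fed back into the planner, which is exactly where the warm-start strategy plugs in.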

Figure 9: Flowchart of the proposed CoDiG framework.

Dataset Construction

To train the model, they didn’t just drive the car around. They used Optimal Control solvers to generate mathematically perfect “expert” trajectories for thousands of random obstacle configurations.

To make the data easier to learn from, they transformed the global track coordinates into a Frenet frame, a coordinate system that flattens the track into a straight line relative to the centerline.
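A minimal Frenet projection can be sketched against a polyline centerline (a simplification; a real implementation would use a smooth spline and handle track closure):

```python
import numpy as np

def to_frenet(point, centerline):
    """Project a global (x, y) point onto a polyline centerline, returning
    (s, d): arc length along the track and signed lateral offset."""
    pts = np.asarray(centerline, dtype=float)
    p = np.asarray(point, dtype=float)
    segs = pts[1:] - pts[:-1]
    seg_len = np.linalg.norm(segs, axis=1)
    # Closest-point parameter on each segment, clamped to [0, 1].
    u = np.clip(((p - pts[:-1]) * segs).sum(axis=1) / seg_len**2, 0.0, 1.0)
    proj = pts[:-1] + u[:, None] * segs
    i = int(np.argmin(((p - proj) ** 2).sum(axis=1)))
    s = seg_len[:i].sum() + u[i] * seg_len[i]
    tangent = segs[i] / seg_len[i]
    normal = np.array([-tangent[1], tangent[0]])  # left of travel direction
    d = float(np.dot(p - proj[i], normal))
    return float(s), d

# A straight track along the x-axis: a point 0.2 to the left of the
# centerline at x = 1.5 maps to (s, d) = (1.5, 0.2).
s, d = to_frenet([1.5, 0.2], [[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
```

In these coordinates an obstacle is just an interval of forbidden lateral offsets at some arc length, regardless of how the track curves in the global frame.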

Figure 3: (a) Time-optimal trajectory. (b) Data augmentation with redundant obstacles. (c) Flattened Frenet representation used for training.

This transformation (shown in panel c) allows the model to generalize. It doesn’t need to memorize the specific curve of turn 3; it just learns “how to pass an obstacle on the left.”

Real-World Obstacle Avoidance

The ultimate test was deploying this on the track with real, physical obstacles. The results were impressive. The car achieved a 100% success rate in obstacle avoidance across experimental trials.

Figure 2: Real-world demonstration of real-time obstacle avoidance. Red lines are CoDiG plans. Black dashed lines are the MPC predictions.

In the image above, you can see the sequence of events:

  1. Encroachment: An obstacle (black circle) blocks the path.
  2. Replanning: CoDiG instantly bends the red reference line around the obstacle.
  3. Execution: The black dashed line (the car’s actual predicted motion) follows the red line smoothly.

The system was also compared against offline, time-optimal solvers. The trajectories generated by CoDiG were found to be near time-optimal, meaning the safety constraints didn’t force the car to drive overly conservatively or slowly.

Conclusion

The CoDiG framework represents a significant step forward for generative AI in robotics. By integrating barrier functions directly into the diffusion sampling process, the researchers have created a way to harness the powerful distribution-learning capabilities of diffusion models without sacrificing the safety guarantees required by physical systems.

Key takeaways from this work include:

  • Safety via Guidance: You don’t need to train a model on every possible crash scenario. You can mathematically “guide” a generic model to safety during inference.
  • Warm-Starting is Vital: For real-time robotics, recycling the previous solution accelerates generation enough to make diffusion models practical for control loops.
  • Generalization: By combining local coordinate transformations (Frenet frame) with constraint guidance, a model trained on limited data can navigate complex, unseen environments.

As we look toward the future, techniques like CoDiG suggest that the “hallucination” problem of Generative AI might effectively be solved by wrapping these models in the firm mathematical embrace of control theory. For autonomous racing, this means driving fast, looking cool, and—most importantly—not crashing.