Imagine walking through a busy train station during rush hour. You don’t just calculate the exact future trajectory of every person around you. Instead, you instinctively identify who is walking steadily and who is rushing unpredictably. You give the erratic rushers more space—effectively placing a “safety bubble” around them based on how unsure you are of their movement.

For mobile robots, replicating this intuition is incredibly difficult. While Deep Reinforcement Learning (RL) has enabled robots to navigate crowds in simulation, these robots often suffer from a “reality gap.” They perform beautifully in the environments they were trained in but fail dangerously when faced with Out-of-Distribution (OOD) scenarios—such as sudden changes in walking speeds, group behaviors, or aggressive pedestrian dynamics.

In this post, we dive into a recent paper presented at CoRL 2025, “Towards Generalizable Safety in Crowd Navigation via Conformal Uncertainty Handling.” The researchers propose a novel framework that doesn’t just predict where humans will go, but actively quantifies how wrong those predictions might be, using that uncertainty to drive safer behaviors.

The Core Problem: Overfitting in Crowd Navigation

Traditional RL approaches for crowd navigation typically treat human trajectory prediction as a deterministic input. The robot observes the world, predicts future paths for all pedestrians, and plans a route.

The flaw in this approach is overfitting. If a robot is trained in a simulation where everyone walks at 1.0 m/s, it learns to rely on that specific dynamic. If it is deployed in a real-world scenario where people are running at 2.0 m/s, its predictions fail. Because the robot doesn’t know that it is wrong, it proceeds with high confidence into a collision.

To solve this, we need a system that:

  1. Quantifies Uncertainty: Realizes when its predictions are becoming unreliable.
  2. Adapts Online: Adjusts its uncertainty estimates in real-time as crowd dynamics shift.
  3. Constrains Behavior: Uses these estimates to force the robot to be more cautious.

The Solution: A Pipeline for Uncertainty-Aware Navigation

The researchers introduce a framework combining Adaptive Conformal Inference (ACI) with Constrained Reinforcement Learning (CRL).

Figure 1: The overall pipeline of the proposed method. Components related to humans are shown in yellow, the robot's physical information and decision making in blue, and fused features in green.

As shown in Figure 1 above, the system works in a loop (a toy sketch follows the list):

  1. Prediction: A trajectory predictor (like a Constant Velocity model or a Transformer) estimates future human positions.
  2. Uncertainty Quantification (ACI): The system calculates a dynamic “prediction set” (a safety radius) around the predicted points.
  3. Policy Network: An attention-based neural network processes these features.
  4. Constrained RL: The agent is trained not just to reach the goal, but to keep “intrusions” into these safety radii below a specific threshold.
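
To make the loop concrete, here is a toy Python sketch of a single control step. Everything in it is illustrative: the function names, the constant-velocity predictor, the fixed radii standing in for DtACI's output, and the simple slow-down rule standing in for the trained attention-based policy.

```python
import numpy as np

def predict_constant_velocity(history, horizon=5, dt=0.25):
    """Step 1: constant-velocity forecast of a pedestrian's next positions."""
    velocity = (history[-1] - history[-2]) / dt
    return np.stack([history[-1] + velocity * dt * (k + 1) for k in range(horizon)])

def toy_policy(robot_pos, goal, predicted, radii, human_radius=0.3):
    """Steps 3-4 placeholder: head toward the goal, but slow down whenever the
    next waypoint would fall inside an uncertainty-inflated disc."""
    direction = (goal - robot_pos) / (np.linalg.norm(goal - robot_pos) + 1e-9)
    next_pos = robot_pos + 0.25 * direction
    inflated = human_radius + radii
    if np.any(np.linalg.norm(predicted - next_pos, axis=1) < inflated):
        return 0.05 * direction       # cautious crawl near the safety bubbles
    return 0.25 * direction

history = np.array([[1.0, 1.0], [0.9, 0.9]])      # one pedestrian's last two positions
predicted = predict_constant_velocity(history)     # step 1: trajectory prediction
radii = np.full(len(predicted), 0.2)               # step 2: radii that DtACI would supply
velocity_cmd = toy_policy(np.zeros(2), np.array([3.0, 0.0]), predicted, radii)
```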

Step 1: Quantifying Uncertainty with Adaptive Conformal Inference

The heart of this method is how it handles prediction errors. Standard uncertainty methods (like Bayesian networks) can be computationally heavy or require specific data distributions. Instead, the authors use Adaptive Conformal Inference (ACI), specifically a dynamically-tuned version (DtACI).

ACI creates a bubble around a prediction that is guaranteed to contain the ground truth with a certain probability (e.g., 90%). Crucially, if the model starts missing (the ground truth falls outside the bubble), ACI automatically expands the bubble for the next step. If the predictions are accurate, it shrinks the bubble to avoid being overly conservative.

The update rule for the estimated prediction error \(\hat{\delta}\) is:

\[
\hat{\delta}_{t+1} = \hat{\delta}_{t} + \gamma \left( \mathrm{err}_{t} - \alpha \right)
\]

Here, \(\alpha\) is the target error rate (e.g., 0.1 for 90% coverage), and \(\gamma\) is the learning rate (step size). The term \(\mathrm{err}_t\) simply tracks whether the previous prediction was inside the bubble (0) or outside (1):

\[
\mathrm{err}_{t} =
\begin{cases}
0, & \text{if the ground-truth position falls inside the current prediction set} \\
1, & \text{otherwise}
\end{cases}
\]

If the prediction was wrong (outside the bubble), the estimated error increases, creating a larger safety margin for the next step.
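
As a minimal sketch of this update (not the authors' implementation; the function name and the default values for \(\alpha\) and \(\gamma\) are illustrative):

```python
def aci_update(delta_hat, prediction_error, alpha=0.1, gamma=0.05):
    """One adaptive conformal update of the safety-radius estimate.

    prediction_error is the observed distance between the predicted and the
    actual human position at this step; alpha is the target error rate and
    gamma the learning rate (both values are illustrative).
    """
    err = 1.0 if prediction_error > delta_hat else 0.0   # 1 = outside the bubble
    return delta_hat + gamma * (err - alpha)             # grow if missed, shrink if covered

# Toy usage: a pedestrian suddenly speeds up, so prediction errors jump.
delta_hat = 0.2
for error in [0.05, 0.05, 0.5, 0.6, 0.6]:
    delta_hat = aci_update(delta_hat, error)
    print(f"estimated radius: {delta_hat:.3f}")
```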

Why Dynamically-Tuned? A fixed learning rate \(\gamma\) might be too slow for sudden changes (like a person suddenly breaking into a run) or too unstable for smooth scenarios. DtACI runs multiple estimators with different learning rates simultaneously and dynamically weights them based on their recent performance.

Equation showing the weighting mechanism for multiple estimators.

This ensures the robot reacts instantly to distribution shifts. You can see this behavior in Figure 4 below. Notice how the ACI error (the difference between the estimated and actual error) fluctuates but generally stays above 0, meaning the safety bubble is successfully covering the actual human position.

Figure 4: Visualization of ACI errors for one pedestrian’s five prediction steps. ACI provides valid coverage when the ACI error is greater than 0.
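
A simplified Python sketch of that weighting idea follows, assuming an exponential reweighting based on the pinball (quantile) loss; the full DtACI procedure adds a mixing/regularization step that is omitted here, and all class names, parameters, and constants are illustrative.

```python
import math

def pinball_loss(delta_hat, error, alpha=0.1):
    """Quantile (pinball) loss: small when delta_hat tracks the
    (1 - alpha)-quantile of the observed prediction errors."""
    diff = error - delta_hat
    return max(alpha * diff, (alpha - 1.0) * diff)

class DtACISketch:
    """Several ACI trackers with different step sizes, combined by
    exponential weights on their recent pinball losses."""

    def __init__(self, gammas=(0.01, 0.05, 0.2), alpha=0.1, eta=2.0, init_radius=0.2):
        self.alpha, self.eta, self.gammas = alpha, eta, gammas
        self.deltas = [init_radius] * len(gammas)   # one radius estimate per tracker
        self.weights = [1.0] * len(gammas)

    def update(self, error):
        # Reweight each tracker by how well its radius covered the new error...
        for i, d in enumerate(self.deltas):
            self.weights[i] *= math.exp(-self.eta * pinball_loss(d, error, self.alpha))
        total = sum(self.weights)
        self.weights = [w / total for w in self.weights]
        # ...then let every tracker run its own ACI-style update.
        for i, (d, g) in enumerate(zip(self.deltas, self.gammas)):
            err = 1.0 if error > d else 0.0
            self.deltas[i] = d + g * (err - self.alpha)
        # The combined radius is what the robot uses as the safety bubble.
        return sum(w * d for w, d in zip(self.weights, self.deltas))
```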

Step 2: Constrained Reinforcement Learning (CRL)

Having an uncertainty bubble is useless if the robot ignores it. To enforce safety, the authors formulate the navigation task as a Constrained Markov Decision Process (CMDP).

Instead of just maximizing rewards (reaching the goal), the robot must satisfy a cost constraint. The cost is defined by “intrusions”—entering the safety zones of pedestrians.

The safety zone is defined as a union of the person’s current physical size and their future uncertainty bubbles:

Equation defining the safety areas D1 and D2. Equation defining the radii r1 and r2, incorporating the uncertainty delta.

Here, \(r_2\) is the critical part: it expands the radius of the pedestrian by \(\hat{\delta}_{h,k}\)—the uncertainty estimate we calculated earlier. If the robot is unsure about a human’s motion, \(\hat{\delta}\) grows, effectively inflating the pedestrian’s size in the robot’s “mind,” forcing the robot to give them a wider berth.
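
A rough sketch of how this inflated radius could enter a per-step cost, assuming simple Euclidean distance checks against the current and predicted positions (illustrative only, not the paper's exact cost definition):

```python
import numpy as np

def intrusion_cost(robot_pos, robot_radius, human_pos, predicted_positions,
                   human_radius, delta_hats, penalty=1.0):
    """Return a penalty if the robot enters a pedestrian's safety zone.

    The zone is the union of the person's current footprint and one disc per
    predicted future step, where the k-th disc is inflated by delta_hats[k],
    the uncertainty estimate for that step.
    """
    centers = [np.asarray(human_pos)] + [np.asarray(p) for p in predicted_positions]
    radii = [human_radius] + [human_radius + d for d in delta_hats]
    for center, radius in zip(centers, radii):
        if np.linalg.norm(np.asarray(robot_pos) - center) < radius + robot_radius:
            return penalty      # intrusion: this adds to the CMDP cost
    return 0.0
```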

The optimization objective uses the PPO Lagrangian method to balance the goal and the safety constraint:

\[
\max_{\pi} \; \mathbb{E}_{\pi}\!\left[\sum_{t} r_{t}\right]
\quad \text{subject to} \quad
\mathbb{E}_{\pi}\!\left[\sum_{t} c_{t}\right] \le \tilde{d}
\]

where \(r_t\) is the navigation reward and \(c_t\) is the per-step intrusion cost.

The system learns a multiplier \(\lambda\) (Lagrange multiplier) that penalizes the policy heavily if the estimated cost exceeds the limit \(\tilde{d}\).
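
In a PPO-Lagrangian setup this typically reduces to a dual-ascent step on \(\lambda\); the snippet below is a hedged sketch, with the learning rate and the commented loss shape chosen for illustration rather than taken from the paper.

```python
def update_lagrange_multiplier(lmbda, estimated_cost, cost_limit, lr=0.01):
    """Dual ascent on lambda: grow it while the policy's estimated intrusion
    cost exceeds the limit, shrink it otherwise, and keep it non-negative."""
    return max(0.0, lmbda + lr * (estimated_cost - cost_limit))

# The policy update then trades off reward and cost advantages, roughly:
#   loss = -(reward_advantage - lmbda * cost_advantage) / (1.0 + lmbda)
```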

Experimental Results

The researchers tested this framework in the CrowdNav simulation environment against strong baselines, including classic methods like ORCA and previous state-of-the-art RL methods like CrowdNav++.

In-Distribution Performance

Even in the standard environment where training and testing scenarios match, the proposed method dominates.

Table 1: In-Distribution Test Results showing higher Success Rate and lower Collision Rate for the proposed method.

Looking at Table 1, “Ours (w/ GST)” achieves a 96.93% Success Rate, significantly higher than CrowdNav++ (86.11%). More importantly, the Collision Rate (CR) drops from roughly 14% (CrowdNav++) to just 2.93%. The robot is not just safer; it is also “politer,” as indicated by the lower Intrusion Time Ratio (ITR).

Out-of-Distribution (OOD) Robustness

The real test of this method is when the environment changes. The authors tested three difficult OOD scenarios:

  1. Rushing Humans: 20% of pedestrians move at double speed.
  2. Different Behaviors: Pedestrians switch from ORCA (reciprocal avoidance) to Social Force (SF) dynamics.
  3. Group Dynamics: Pedestrians move in tight clusters.

Table 2: Out-of-Distribution Test Results. The proposed method maintains high success rates across all shifts.

As Table 2 reveals, while baseline RL methods collapse (e.g., in Group dynamics, baselines drop to ~70-80% success), the proposed method maintains ~94% success rates.

Visualizing the Adaptability

Figure 2 below illustrates why the method works.

Figure 2: Test-case visualizations. (a) Ours navigating safely. (b) CrowdNav++ failing. (c) Ours adapting to rushing humans.

  • Panel (b) shows CrowdNav++ failing because it relies on static predictions that don’t account for errors.
  • Panel (c) is particularly interesting. It shows an OOD scenario with rushing pedestrians. The light blue circles represent the uncertainty bubbles. Notice how large they are? Because the pedestrians are moving unexpectedly fast, DtACI has expanded the bubbles. The robot recognizes this high uncertainty and navigates through the safe gap between them.

Real-World Deployment

Simulation results are promising, but the physical world is the ultimate benchmark. The authors deployed the policy directly onto a ROSMASTER X3 robot without fine-tuning.

Figure 6: Real-robot deployment results showing uncertainty visualization, yielding, and long-range navigation.

In Figure 6, we can see the robot in action:

  • Panel (a): Shows the uncertainty visualization in RViz. When the human is moving unpredictably, the blue bubble is large. When they stand still, the uncertainty shrinks, allowing the robot to move closer.
  • Panel (b): Demonstrates Yielding Behavior. The robot detects a potential collision, slows down (t=3s), waits for the human to pass, and then resumes.
  • Panel (d): Shows Long-Range Navigation, proving the robot can handle sustained interactions over long distances.

Conclusion

The “Generalizable Safety” paper highlights a critical lesson for the future of robotics: Accuracy isn’t enough; we need self-awareness. By acknowledging that trajectory predictions will often be wrong, and by building a mathematical framework (Conformal Inference) to quantify that error, robots can become significantly more robust.

This approach transforms the “black box” of Deep RL into a controllable, safety-aware system. It doesn’t just hope for the best; it plans for the worst-case variance in human behavior, making it a major step forward for deploying robots in the chaotic real world.

For students and researchers in robotics, this methodology—coupling uncertainty quantification directly into the learning loop—offers a blueprint for tackling distribution shifts in other safety-critical domains, from autonomous driving to drone flight.