Planning a sequence of actions to achieve a distant goal is one of the fundamental challenges in robotics. Imagine asking a robot to “cook a chicken dinner.” This isn’t a single action; it’s a complex hierarchy of tasks. The robot must plan high-level subgoals (open fridge, get chicken, place in pot, turn on stove) and execute low-level movements (joint angles, gripper velocity) to achieve them.
Diffusion models have recently revolutionized this field, treating planning as a generative modeling problem. However, as the “horizon” (the length of the task) grows, these models often struggle. They either hallucinate physically impossible trajectories or get stuck in local optima.
In this post, we are doing a deep dive into Coupled Hierarchical Diffusion (CHD), a new framework proposed by researchers at the National University of Singapore. This paper tackles the “loose coupling” problem in hierarchical planning—where the high-level planner sets a goal and ignores the low-level planner’s struggle to achieve it. CHD introduces a mathematical framework where the “boss” (high-level) and the “worker” (low-level) plan jointly, allowing for self-correcting, long-horizon plans.
The Problem: The Disconnect in Hierarchical Planning
To understand why CHD is necessary, we first need to look at how robots currently plan for the long term.
Standard diffusion planners (like the “Diffuser”) work well for short tasks. They generate a trajectory by refining random noise. However, as the horizon grows, uncertainty compounds and the distribution over valid trajectories becomes too complex for a single flat model to denoise reliably. To manage this, researchers use Hierarchical Planning. This decomposes the problem into two layers:
- High-Level (HL) Planner: Generates subgoals (checkpoints) along the path.
- Low-Level (LL) Planner: Generates the specific trajectory segments to connect these subgoals.
The industry standard approach, referred to here as Baseline Hierarchical Diffusion (BHD), treats these as separate sequential steps. The HL planner dictates the subgoals, and then the LL planner tries to connect the dots.
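To make the one-way flow concrete, here is a minimal Python sketch of the BHD pipeline. The `sample_highlevel` and `sample_lowlevel` functions are toy stand-ins for the two diffusion samplers (a real system would run trained denoisers); what matters is the control flow: subgoals are sampled once, then never revisited.

```python
import numpy as np

def sample_highlevel(start, goal, n_subgoals, rng):
    """Toy stand-in for the HL diffusion sampler: evenly spaced
    waypoints between start and goal, plus noise."""
    alphas = np.linspace(0.0, 1.0, n_subgoals + 2)[1:-1, None]
    return start + alphas * (goal - start) + 0.1 * rng.normal(size=(n_subgoals, 2))

def sample_lowlevel(a, b, n_steps, rng):
    """Toy stand-in for the LL diffusion sampler: a noisy
    straight-line segment from a to b."""
    alphas = np.linspace(0.0, 1.0, n_steps)[:, None]
    return a + alphas * (b - a) + 0.02 * rng.normal(size=(n_steps, 2))

def bhd_plan(start, goal, n_subgoals=4, n_steps=16, seed=0):
    rng = np.random.default_rng(seed)
    # Step 1: the HL planner fixes the subgoals once and for all...
    waypoints = [start, *sample_highlevel(start, goal, n_subgoals, rng), goal]
    # Step 2: ...then the LL planner connects the dots. Nothing ever flows
    # back up: if a subgoal is awkward or unreachable, its segment just fails.
    segments = [sample_lowlevel(a, b, n_steps, rng)
                for a, b in zip(waypoints[:-1], waypoints[1:])]
    return np.concatenate(segments)
```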
The Failure Mode: What happens if the HL planner sets a subgoal that is semantically valid but physically awkward or impossible for the LL planner to reach due to obstacles or kinematics? In BHD, the LL planner is stuck trying to solve an impossible problem because the subgoals are fixed. There is no feedback loop. The “boss” has left the building, and the “worker” is failing.

As illustrated in Figure 1 above, this disconnect leads to incoherence. On the left, a standard approach might set subgoals that look fine from a bird’s-eye view but result in jerky, suboptimal low-level paths. On the right, CHD introduces a feedback loop where the low-level trajectory informs and refines the high-level subgoals during the planning process.
Background: Diffusion as Planning
Before dissecting CHD, let’s briefly recap the mathematical foundation.
Diffusion models generate data by reversing a noise process. In robotics, the “data” is a trajectory \(\tau\) consisting of states and actions. The model learns a gradient field (score function) to “denoise” a random chaotic path into a smooth, valid trajectory that maximizes a reward.
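As a point of reference, a Diffuser-style sampler is just a loop that repeatedly applies a learned denoiser to a noisy trajectory. The sketch below is a generic DDPM-flavored loop with a hypothetical `model` network and a fixed variance, not the paper's actual implementation:

```python
import torch

def denoise_trajectory(model, horizon, state_dim, n_diffusion_steps=100):
    """Diffuser-style sampling sketch: start from pure noise, iteratively denoise.

    `model(tau, t)` is a hypothetical network predicting the mean of the
    reverse step; real implementations also use learned/scheduled variances.
    """
    tau = torch.randn(horizon, state_dim)          # a random, chaotic path
    for t in reversed(range(n_diffusion_steps)):
        mean = model(tau, t)                       # denoised estimate
        noise = torch.randn_like(tau) if t > 0 else torch.zeros_like(tau)
        tau = mean + 0.01 * noise                  # fixed variance, for brevity
    return tau                                     # a smooth, valid trajectory
```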
In a hierarchical setting, we split the trajectory into segments.
- \(\tau^g\): The sequence of High-Level subgoals.
- \(\tau^x\): The sequence of Low-Level trajectory segments connecting those subgoals.
The goal is to generate both \(\tau^g\) and \(\tau^x\) such that they satisfy an optimality condition, denoted as \(\mathcal{O}=1\) (meaning the plan achieves high reward).
The Evolution of Hierarchical Architectures
To appreciate the contribution of CHD, it is helpful to visualize the evolution of these architectures.

Figure 2 provides a roadmap of this evolution:
- (a) Baseline (BHD): The HL planner generates \(\tau^g\), passes it down, and the LL planner generates \(\tau^x\). The arrow only goes one way.
- (b) Joint Diffusion Model (JDM): This is the theoretical ideal. We treat the subgoals and trajectories as one giant joint distribution and diffuse them together. This ensures perfect coupling but is computationally expensive and hard to scale.
- (c) Coupled Hierarchical Diffusion (CHD): This is the proposed method. It approximates the joint model using a clever feedback mechanism via a classifier, allowing bidirectional influence without the massive computational cost of JDM.
The Core Method: Coupled Hierarchical Diffusion
The researchers propose CHD to satisfy three critical properties for effective planning:
- Bi-directional Coupling: HL guides LL, but LL feedback refines HL.
- Parallel Sampling: Both levels are generated simultaneously to save time.
- Reduced Complexity: Breaking the problem into smaller segments to make it tractable.
1. The Joint Distribution Approximation
CHD starts with the idea of the Joint Diffusion Model (JDM) but simplifies the dependencies to make it practical. Instead of a messy, fully entangled probabilistic graph, CHD simplifies the reverse process (the planning step).
In CHD, the high-level reverse step depends only on the previous high-level state, while the low-level reverse step depends on both the low-level state and the high-level subgoal.
The joint probability is modeled as:
\[
p_\theta(\tau^g_{t-1}, \tau^x_{t-1} \mid \tau^g_t, \tau^x_t) \;=\; p_{\theta^g}(\tau^g_{t-1} \mid \tau^g_t)\; p_{\theta^x}(\tau^x_{t-1} \mid \tau^x_t, \tau^g_{t-1})
\]
Here, \(p_{\theta^g}\) is the high-level denoiser and \(p_{\theta^x}\) is the low-level denoiser. Notice that the low-level term \(p_{\theta^x}\) is conditioned on the high-level state \(\tau^g_{t-1}\). This establishes the top-down guidance.
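In code, this factorization is just an ordering constraint inside each reverse step: denoise the high level first, then condition the low level on the fresh sample. A minimal sketch, with `hl_model` and `ll_model` as hypothetical stand-ins for the trained denoisers:

```python
import torch

def coupled_reverse_step(hl_model, ll_model, tau_g_t, tau_x_t, t, sigma=0.01):
    """One reverse step of the CHD factorization (sketch; both denoisers
    are hypothetical stand-ins for the paper's trained networks)."""
    # p_{theta^g}(tau^g_{t-1} | tau^g_t): the high level denoises on its own.
    tau_g_prev = hl_model(tau_g_t, t) + sigma * torch.randn_like(tau_g_t)
    # p_{theta^x}(tau^x_{t-1} | tau^x_t, tau^g_{t-1}): the low level is
    # conditioned on the *fresh* HL sample -- the top-down guidance arrow.
    tau_x_prev = ll_model(tau_x_t, tau_g_prev, t) + sigma * torch.randn_like(tau_x_t)
    return tau_g_prev, tau_x_prev
```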
2. Coupled Classifier Guidance (The Feedback Loop)
The “magic” of CHD lies in how the Low-Level informs the High-Level. This is done through Classifier Guidance.
In diffusion models, we often use a classifier to push the generation toward a specific class or high-reward state. CHD uses a hierarchical classifier \(p_\phi(\mathcal{O}=1 | \tau^g, \tau^x)\) that evaluates the optimality of the current plan.
Crucially, because this classifier looks at both the subgoal and the trajectory, its gradient can be backpropagated to update both.
\[
p_\theta(\tau^g_{t-1}, \tau^x_{t-1} \mid \tau^g_t, \tau^x_t, \mathcal{O}_{1:N}=1) \;\propto\; p_{\theta^g}(\tau^g_{t-1} \mid \tau^g_t)\; p_{\theta^x}(\tau^x_{t-1} \mid \tau^x_t, \tau^g_{t-1})\; p_\phi(\mathcal{O}_{1:N}=1 \mid \tau^g_{t-1}, \tau^x_{t-1})
\]
This equation shows the full reverse process conditioned on optimality (\(\mathcal{O}_{1:N}=1\)). The term \(p_{\phi}\) is the coupled classifier. It acts as a bridge. If the LL trajectory looks jagged or collides with a wall, the classifier lowers the probability of optimality. When we take the gradient of this classifier, it pushes the HL subgoals to shift positions to relieve the stress on the LL trajectory.
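Mechanically, this is one backward pass with two gradient consumers. The sketch below assumes a hypothetical `classifier(tau_g, tau_x)` network returning \(\log p_\phi(\mathcal{O}=1 \mid \tau^g, \tau^x)\), and applies plain gradient ascent on both levels:

```python
import torch

def classifier_guided_step(classifier, tau_g, tau_x, scale=1.0):
    """Coupled guidance sketch: one classifier, gradients for BOTH levels."""
    tau_g = tau_g.detach().requires_grad_(True)
    tau_x = tau_x.detach().requires_grad_(True)
    log_p_optimal = classifier(tau_g, tau_x).sum()   # log p(O=1 | tau^g, tau^x)
    grad_g, grad_x = torch.autograd.grad(log_p_optimal, (tau_g, tau_x))
    # A jagged or colliding LL trajectory lowers log p(O=1); its gradient then
    # nudges the HL subgoals (grad_g) as well as the trajectory itself (grad_x).
    return (tau_g + scale * grad_g).detach(), (tau_x + scale * grad_x).detach()
```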
3. Asynchronous Parallel Generation
A major bottleneck in hierarchical planning is sequential processing (waiting for HL to finish before starting LL). CHD introduces an asynchronous schedule.
Because the Low-Level step at diffusion time \(t\) (\(\tau^x_t\)) depends on the High-Level state, we cannot perfectly synchronize them. However, CHD structures the dependency such that they are staggered.
The reverse process is decomposed into three stages:
- Initialization: Sample priors.
- Asynchronous Core: Update \(\tau^g_{t-1}\) and \(\tau^x_t\) in parallel.
- Final Step: Resolve the final timestep.
The decomposition looks like this:
\[
p_\theta(\tau^g_{0:T-1}, \tau^x_{0:T-1} \mid \tau^g_T, \tau^x_T) \;=\; \underbrace{p_{\theta^g}(\tau^g_{T-1} \mid \tau^g_T)}_{\text{initialization}} \;\prod_{t=1}^{T-1} \underbrace{p_{\theta^g}(\tau^g_{t-1} \mid \tau^g_t)\, p_{\theta^x}(\tau^x_t \mid \tau^x_{t+1}, \tau^g_t)}_{\text{asynchronous core}} \;\underbrace{p_{\theta^x}(\tau^x_0 \mid \tau^x_1, \tau^g_0)}_{\text{final step}}
\]
This structure allows the GPU to process both diffusion models simultaneously, significantly speeding up inference compared to sequential baselines.
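A sketch of the staggered loop, again with hypothetical `hl_model` / `ll_model` denoisers (guidance omitted for brevity). Note that both calls in the loop body read only tensors produced in earlier iterations, which is what makes them parallelizable:

```python
def chd_sample(hl_model, ll_model, tau_g, tau_x, T=100):
    """Staggered sampling sketch: the HL chain runs one diffusion step ahead
    of the LL chain. `tau_g` and `tau_x` start as pure noise."""
    # Initialization: one extra HL step so the HL level leads by one step.
    tau_g = hl_model(tau_g, T)
    for t in reversed(range(1, T)):
        # Asynchronous core: the right-hand side uses only tensors from
        # earlier iterations, so the two denoiser calls are independent
        # and can be dispatched concurrently (e.g. on two CUDA streams).
        tau_g, tau_x = hl_model(tau_g, t), ll_model(tau_x, tau_g, t)
    # Final step: resolve the last LL update against fully denoised subgoals.
    return tau_g, ll_model(tau_x, tau_g, 0)
```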
To make the guidance work in this staggered setup, the authors use a clever chain-rule approximation to pass gradients from the current LL state “upstream” to the previous HL step:
\[
\nabla_{\tau^g_{t-1}} \log p_\phi\big(\mathcal{O}=1 \mid \mu_{\theta^x}\big) \;\approx\; \left(\frac{\partial \mu_{\theta^x}(\tau^x_t, \tau^g_{t-1})}{\partial \tau^g_{t-1}}\right)^{\!\top} \nabla_{\mu_{\theta^x}} \log p_\phi\big(\mathcal{O}=1 \mid \mu_{\theta^x}\big)
\]
This equation essentially says: “Adjust the High-Level subgoal (\(\tau^g\)) based on how much it improves the optimality of the current Low-Level trajectory (\(\mu_{\theta^x}\)).”
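With automatic differentiation, this chain rule comes for free: differentiate the classifier's score of the LL prediction \(\mu_{\theta^x}\) back through the LL denoiser to the HL subgoals. A sketch using PyTorch autograd, with the same hypothetical networks as above:

```python
import torch

def upstream_hl_gradient(ll_model, classifier, tau_g_prev, tau_x_t, t):
    """Chain-rule feedback sketch: how should the HL subgoals move to make
    the LL prediction more optimal? (Networks are hypothetical stand-ins.)"""
    tau_g_prev = tau_g_prev.detach().requires_grad_(True)
    mu_x = ll_model(tau_x_t, tau_g_prev, t)      # LL mean depends on HL subgoals
    log_p = classifier(tau_g_prev, mu_x).sum()   # optimality of the LL prediction
    (grad_g,) = torch.autograd.grad(log_p, tau_g_prev)
    # grad_g is added (scaled by a guidance weight) to the HL reverse-step
    # mean, shifting subgoals to relieve stress on the low-level trajectory.
    return grad_g
```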
4. Segment-wise Generation
Finally, to handle very long horizons, CHD breaks the low-level trajectory into \(N\) segments.

Instead of generating one massive trajectory vector, the model generates \(N\) smaller segments, each conditioned on its specific local subgoal \(g_i\). This reduces the dimensionality of the problem and prevents the “vanishing gradient” issues common in long sequence modeling.
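Segment-wise generation also batches naturally: the \(N\) segments become a leading batch dimension, so the model denoises many short sequences jointly instead of one long one. A sketch with a hypothetical batched `ll_model`:

```python
import torch

def segmentwise_lowlevel(ll_model, subgoals, seg_len, state_dim, T=100, sigma=0.01):
    """Segment-wise LL sampling sketch: N short segments, each conditioned
    only on its own local subgoal g_i, denoised as one batch."""
    n_segments = subgoals.shape[0]
    tau_x = torch.randn(n_segments, seg_len, state_dim)  # (N, H/N, d), not (H, d)
    for t in reversed(range(T)):
        mu = ll_model(tau_x, subgoals, t)                # hypothetical batched denoiser
        noise = torch.randn_like(mu) if t > 0 else torch.zeros_like(mu)
        tau_x = mu + sigma * noise
    return tau_x.reshape(-1, state_dim)                  # stitch segments together
```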
Experiments and Results
The authors evaluated CHD across three distinct domains: Maze Navigation (continuous control), Robot Task Planning (discrete/symbolic), and a Real-World Robot demo.
1. Maze Navigation
This is the classic stress test for long-horizon planning. The agent must navigate large, complex mazes. The “subgoals” are waypoints, and the “trajectory” is the path.
The Results: CHD consistently outperformed the baselines (Diffuser, BHD, and others) in terms of normalized reward (efficiency of the path).

In Figure 3 (Left), you can see the qualitative difference. The BHD (purple) sets subgoals that force the agent into awkward, sharp turns. CHD (orange) adjusts the subgoals to create a smooth, sweeping curve that is much faster to execute.
The superiority of CHD is even more apparent in difficult scenarios, as shown in the grid visualization below:

In Figure 11, look at the mazes at grid positions (7, 4) or (1, 4). The standard Diffuser (blue) often produces jittery paths. BHD (green) produces valid paths but often takes inefficient routes because its subgoals are suboptimal. CHD (red) consistently finds the most direct, smooth path through the maze structure.
2. Robot Task Planning (Kitchen World)
Moving beyond navigation, the authors tested CHD on a “Cooking” task. This is a hybrid problem involving discrete states (e.g., (Chicken, In-Pot)) and actions.

As shown in Figure 4, the planner must sequence logical steps. If the LL planner realizes that “Turning on Stove” is impossible because the robot hand is full, the feedback loop informs the HL planner to insert a “Place object” subgoal first.
Quantitative Results:

Table 1 shows that CHD achieves the highest success rates (completed tasks) and the lowest number of steps (highest efficiency) compared to Transformers (like GPT-style models) and standard Diffusers. It shines particularly in the “Multi-task” settings where the complexity is highest.
The authors also tracked the “Normalized cumulative steps” (lower is better), which indicates how efficient the plan is.

Figure 5 reveals that while Transformers (Green) and VLMs (Red) start well, they often get stuck in repetition loops as the task length increases. CHD (Orange) remains stable and efficient regardless of the number of sub-tasks.
3. Real-World Robot Demonstration
Finally, the method was deployed on a physical Fetch robot tasked with organizing groceries and preparing a meal. This involves picking, placing, opening cabinets, and moving between rooms.

Figure 6 shows the complexity of the real-world task. The robot successfully planned a sequence of more than 25 subgoals and actions. The success of the physical execution relies heavily on the plan being kinematically feasible, which is exactly what CHD ensures by coupling the high-level logic with low-level physical constraints.
Why Does This Matter?
The transition from “loose coupling” to “tight coupling” in hierarchical planning is a significant step toward more autonomous robots.
- Self-Correction: Robots can realize during planning that a plan won’t work and fix it, rather than trying to execute a doomed plan and failing in the real world.
- Efficiency: Parallel sampling makes diffusion planning (traditionally slow) fast enough for practical use.
- Scalability: By using segment-wise generation, the method scales to very long horizons without the computational cost exploding.
Conclusion
Coupled Hierarchical Diffusion (CHD) represents a maturation of generative planning. It moves away from the rigid “top-down” command structure of previous hierarchical methods and embraces a collaborative “joint optimization” approach.
By allowing the Low-Level trajectory to “speak back” to the High-Level subgoals via classifier guidance, CHD produces plans that are not just logically sound, but physically elegant. Whether navigating a complex maze or cooking dinner in a cluttered kitchen, CHD proves that the best plans come when the Boss and the Worker are on the same page.