Introduction

If you have ever struggled to assemble a piece of flat-pack furniture, you know that assembly is more than just putting peg A into hole B. It involves a complex choreography: holding one piece steady with one hand, aligning another piece with the other, applying just the right amount of force, and doing it all in a specific order so the whole thing doesn’t collapse.

For humans, this is intuitive. For robots, it is an algorithmic nightmare.

While robotic automation has mastered repetitive tasks like welding or pick-and-place, general multi-part assembly remains a “holy grail” challenge. Most assembly robots today are hard-coded for a specific product. If you change the product, you have to rewrite the code, redesign the fixtures, and recalibrate the entire cell.

But what if a robot could look at a CAD model of a stool, a gearbox, or a toy, and figure it all out on its own?

Enter Fabrica, a new system developed by researchers at MIT CSAIL, ETH Zurich, and Autodesk Research. Fabrica is a dual-arm robotic system capable of autonomous, end-to-end planning and control for assembling general multi-part objects.

Figure 1: Our proposed dual-arm robotic system demonstrates adaptive manipulation and assembly capabilities for diverse multi-part objects.

In this post, we will deconstruct the Fabrica paper to understand how it bridges the gap between high-level planning and low-level motor control to achieve what was previously thought to be incredibly difficult: zero-shot, sim-to-real transfer of complex assembly skills.

The Core Problem: Why is Assembly So Hard?

To understand why Fabrica is significant, we first need to appreciate the complexity of the problem. Assembly combines two difficult sub-fields of robotics:

Long-Horizon Planning: You cannot just think about the next move. You need to plan twenty steps ahead. Which part is the base? If I attach Part C now, will it block Part D later? Which hand should hold the object?
Contact-Rich Manipulation: Inserting a part usually involves tight clearances (less than a millimeter of wiggle room). Cameras often aren’t precise enough to guide the robot perfectly. The robot needs to “feel” the contact forces to slide parts together without jamming.

Prior research usually tackled these separately. Planners would figure out the sequence but assume the robot could move perfectly. Control researchers would train robots to insert a peg into a hole but ignore the complexity of a 9-part furniture assembly.

Fabrica tackles both by integrating offline planning (the brain) with online reinforcement learning (the muscle memory).

System Overview

Fabrica operates like a seasoned engineer. Before touching a single part, it simulates the entire process. The workflow is divided into two main phases:

Planning: The system analyzes the 3D meshes of the parts to determine how to assemble them (sequence, grasps, motion paths).
Learning: The system trains a neural network policy to handle the delicate physical interactions required to actually snap the parts together.

Figure 2: System overview. Fabrica takes part meshes and hardware configurations as inputs. It plans sequences, grasps, fixture designs, and motions through a multi-stage planner.

As shown in the system overview above, the input is simply the mesh files and the robot hardware config. The output is a robot that can build the object in the real world.

Part 1: The Hierarchical Planner

The planning phase is where Fabrica solves the logic puzzle of assembly. It breaks the problem down into a hierarchy of five sub-problems. This “divide and conquer” approach keeps the overall search computationally tractable, because each stage only has to reason about a small slice of the decision.

Here is the overall optimization problem the researchers are trying to solve:

Optimization Equation

In plain English, this equation seeks to minimize the “cost” (energy, time, instability) of the sequence (\(\phi\)), grasps (\(\sigma\)), and motions (\(\pi\)), while strictly obeying constraints like “parts can’t float in mid-air” (\(C_{prec}\)) and “robots can’t hit each other” (\(C_{col}\)).
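As a rough schematic of the shape of such a problem (the cost symbol \(J\) and the way the constraints are written out here are assumptions made for illustration, not the paper's exact notation):

\[
\min_{\phi,\,\sigma,\,\pi} \; J(\phi, \sigma, \pi)
\quad \text{subject to} \quad
C_{prec}(\phi) \ \text{satisfied}, \qquad
C_{col}(\phi, \sigma, \pi) \ \text{satisfied}
\]

Read this as a single joint search over three coupled decisions. The hierarchy described next is a way of solving it piece by piece rather than all at once.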

Let’s break down the five steps the planner takes to solve this.

Step 1: Precedence Planning (The “LEGO” Logic)

Before the robot moves, it needs to know the order of operations. You can’t put the roof on a house before the walls.

Fabrica determines this by simulating disassembly. It takes the completed object and tries to pull parts off one by one in a physics simulator. If a part can be removed without hitting others, it is assigned to a “tier.” By reversing the disassembly order, the system generates a valid assembly sequence graph.
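The disassembly idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the physics check is replaced by a hypothetical lookup table `BLOCKED_BY`, and the part names are made up.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical stand-in for the physics simulator: each part lists the parts
# that must already be removed before it can be pulled out collision-free.
BLOCKED_BY = {
    "brace": set(),
    "leg_a": {"brace"},
    "leg_b": {"brace"},
    "seat": {"leg_a", "leg_b"},
}


def can_remove(part: str, others: Set[str]) -> bool:
    """True if none of the parts blocking `part` are still present."""
    return not (BLOCKED_BY[part] & others)


def disassembly_tiers(
    parts: Iterable[str], removable: Callable[[str, Set[str]], bool]
) -> List[List[str]]:
    """Peel the finished assembly apart, one tier of removable parts at a time."""
    remaining = set(parts)
    tiers: List[List[str]] = []
    while remaining:
        tier = sorted(p for p in remaining if removable(p, remaining - {p}))
        if not tier:
            raise ValueError("no part can be removed; the parts are interlocked")
        tiers.append(tier)
        remaining -= set(tier)
    return tiers


# Reversing the disassembly tiers gives an assembly order.
assembly_tiers = list(reversed(disassembly_tiers(BLOCKED_BY, can_remove)))
```

Parts within the same tier can be attached in any order relative to each other, which is what leaves room for the later optimization step to choose among valid sequences.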

Step 2: Grasp Filtering

Robots have fingers (grippers) that take up physical space. The system needs to find places to grab each part where:

The gripper doesn’t hit the part itself.
The gripper doesn’t hit the parts already assembled.
The gripper doesn’t hit the other robot arm.

Fabrica pre-computes thousands of potential grasps and filters out the ones that cause collisions. It specifically looks for “Dual-Arm” pairs—one grasp for the arm holding the base, and one grasp for the arm inserting the new part.
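A toy version of this filter is shown below. It assumes a coarse voxel representation in which each grasp is described by the set of grid cells the gripper occupies; the `Grasp` class and the cell sets are hypothetical, and a real system would use mesh-based collision checking instead.

```python
from dataclasses import dataclass
from itertools import product
from typing import FrozenSet, List, Tuple

Cell = Tuple[int, int, int]


@dataclass(frozen=True)
class Grasp:
    name: str
    gripper_cells: FrozenSet[Cell]  # grid cells occupied by the gripper


def collision_free(grasp: Grasp, obstacle_cells: FrozenSet[Cell]) -> bool:
    return grasp.gripper_cells.isdisjoint(obstacle_cells)


def dual_arm_pairs(
    hold_grasps: List[Grasp],
    insert_grasps: List[Grasp],
    part_cells: FrozenSet[Cell],
    assembled_cells: FrozenSet[Cell],
) -> List[Tuple[Grasp, Grasp]]:
    """Keep (hold, insert) pairs that hit neither the parts nor each other."""
    obstacles = part_cells | assembled_cells
    hold_ok = [g for g in hold_grasps if collision_free(g, obstacles)]
    insert_ok = [g for g in insert_grasps if collision_free(g, obstacles)]
    return [
        (h, i)
        for h, i in product(hold_ok, insert_ok)
        if h.gripper_cells.isdisjoint(i.gripper_cells)
    ]


part = frozenset({(0, 0, 1)})
assembled = frozenset({(0, 0, 0)})
holds = [
    Grasp("h1", frozenset({(-1, 0, 0)})),
    Grasp("h2", frozenset({(0, 0, 0)})),  # overlaps the assembled parts
]
inserts = [
    Grasp("i1", frozenset({(1, 0, 1)})),
    Grasp("i2", frozenset({(-1, 0, 0)})),  # overlaps the holding gripper h1
]
pairs = dual_arm_pairs(holds, inserts, part, assembled)
```

Here only the pair `("h1", "i1")` survives: `h2` is rejected against the assembled parts, and `i2` is rejected because it would occupy the same space as `h1`.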

Step 3: Sequence-Grasp Optimization

This is the core decision-making step. Just because a sequence is possible doesn’t mean it’s good.

Fabrica builds a search tree to find the optimal sequence. It scores different paths based on stability constraints. For example, it prefers sequences where:

The held part supports the new part well (gravity doesn’t rip them apart).
The robots don’t have to switch hands or regrasp the object unnecessarily.
The torque on the gripper is minimized (heavy parts aren’t held by their fingertips).
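The criteria above can be folded into a per-step cost and minimized with a tree search. The sketch below is one simple way to do that, a depth-first search with pruning; the cost function, the `SIDE` table, and the penalty value are all invented for illustration and are not taken from the paper.

```python
from typing import Callable, Dict, List, Set, Tuple

# Hypothetical data: which parts must already be in place before each part,
# and which arm ("L" or "R") is best placed to handle each part.
MUST_FOLLOW: Dict[str, Set[str]] = {
    "seat": set(),
    "leg_a": {"seat"},
    "leg_b": {"seat"},
    "brace": {"leg_a", "leg_b"},
}
SIDE = {"seat": "L", "leg_a": "L", "leg_b": "R", "brace": "R"}


def step_cost(placed: Tuple[str, ...], part: str) -> float:
    """One unit per step, plus a penalty when the arms have to switch roles."""
    switch = bool(placed) and SIDE[placed[-1]] != SIDE[part]
    return 1.0 + (2.0 if switch else 0.0)


def best_sequence(
    must_follow: Dict[str, Set[str]],
    cost_fn: Callable[[Tuple[str, ...], str], float],
) -> Tuple[List[str], float]:
    """Depth-first search over valid orders; assumes non-negative step costs."""
    parts = sorted(must_follow)
    best_cost = float("inf")
    best_order: Tuple[str, ...] = ()

    def expand(placed: Tuple[str, ...], cost: float) -> None:
        nonlocal best_cost, best_order
        if cost >= best_cost:
            return  # this branch cannot beat the best complete order so far
        if len(placed) == len(parts):
            best_cost, best_order = cost, placed
            return
        for part in parts:
            if part not in placed and must_follow[part] <= set(placed):
                expand(placed + (part,), cost + cost_fn(placed, part))

    expand((), 0.0)
    return list(best_order), best_cost


order, total = best_sequence(MUST_FOLLOW, step_cost)
```

Both leg orders satisfy the precedence constraints, but attaching `leg_a` before `leg_b` needs only one role switch, so the search prefers it.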

Step 4: Automated Fixture Design

This is a brilliant addition. In most research, humans carefully place parts on a table for the robot. Fabrica automates this.

Once it knows how it wants to pick up the parts (from Steps 2 and 3), it automatically designs a tray (fixture) to hold all the loose parts in exactly the right orientation for those pickups. It uses a “bin-packing” algorithm to fit the parts into as small an area of the table as it can, then generates a 3D model of the tray, which can be 3D printed.
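To give a feel for the packing step, here is a deliberately simple "shelf" heuristic that lays rectangular footprints out in rows. The post does not say which packing algorithm Fabrica uses, so treat this as a generic stand-in; the footprint sizes, tray width, and gap are made-up values in millimetres.

```python
from typing import Dict, Tuple


def shelf_pack(
    footprints: Dict[str, Tuple[float, float]],
    tray_width: float,
    gap: float = 5.0,
) -> Tuple[Dict[str, Tuple[float, float]], float]:
    """Place (width, depth) rectangles in rows, deepest parts first.

    Returns the lower-left corner of each part and the total tray depth used.
    """
    placements: Dict[str, Tuple[float, float]] = {}
    x = y = row_depth = 0.0
    by_depth = sorted(footprints.items(), key=lambda item: -item[1][1])
    for name, (width, depth) in by_depth:
        if width > tray_width:
            raise ValueError(f"{name} is wider than the tray")
        if x + width > tray_width:  # row is full, start a new one
            x = 0.0
            y += row_depth + gap
            row_depth = 0.0
        placements[name] = (x, y)
        x += width + gap
        row_depth = max(row_depth, depth)
    return placements, y + row_depth


footprints = {
    "seat": (100.0, 100.0),
    "leg_a": (20.0, 80.0),
    "leg_b": (20.0, 80.0),
    "brace": (60.0, 20.0),
}
layout, depth_used = shelf_pack(footprints, tray_width=150.0)
```

With these numbers the seat and both legs share the first row and the brace starts a second row, for a tray 125 mm deep. A real fixture generator would also have to account for the pickup orientation of each part and leave clearance for the gripper fingers.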

Figure 3: Top: benchmark assemblies. Bottom: the auto-generated pickup fixtures.

Step 5: Motion Planning

Finally, the system plans the trajectories (the arm movements) to move parts from the fixture to the assembly zone. It uses a standard algorithm called RRT-Connect to ensure the arms weave around each other without colliding.
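For readers who have not seen RRT-Connect, the sketch below runs it for a point moving in a 2D unit square with one disc obstacle. It is a toy: the real planner works in the joint space of two arms, and the obstacle, step size, and seed here are arbitrary choices. The collision check only tests the end of each short step, which is a common simplification.

```python
import math
import random
from typing import Dict, List, Optional, Tuple

Point = Tuple[float, float]
Tree = Dict[Point, Optional[Point]]  # node -> parent

OBSTACLE_CENTER, OBSTACLE_RADIUS = (0.5, 0.5), 0.2  # hypothetical obstacle
STEP = 0.05


def is_free(p: Point) -> bool:
    in_bounds = 0.0 <= p[0] <= 1.0 and 0.0 <= p[1] <= 1.0
    return in_bounds and math.dist(p, OBSTACLE_CENTER) > OBSTACLE_RADIUS


def steer(src: Point, dst: Point) -> Point:
    """Move from src toward dst by at most STEP."""
    d = math.dist(src, dst)
    if d <= STEP:
        return dst
    t = STEP / d
    return (src[0] + t * (dst[0] - src[0]), src[1] + t * (dst[1] - src[1]))


def extend(tree: Tree, target: Point) -> Optional[Point]:
    """Grow the tree one step toward target; None if blocked or no progress."""
    nearest = min(tree, key=lambda n: math.dist(n, target))
    new = steer(nearest, target)
    if new in tree or not is_free(new):
        return None
    tree[new] = nearest
    return new


def connect(tree: Tree, target: Point) -> Optional[Point]:
    """Keep extending toward target until it is reached or blocked."""
    last = None
    while True:
        new = extend(tree, target)
        if new is None or new == target:
            return new if new is not None else last
        last = new


def trace(tree: Tree, node: Optional[Point]) -> List[Point]:
    path = []
    while node is not None:
        path.append(node)
        node = tree[node]
    return path  # from node back to the tree's root


def rrt_connect(start: Point, goal: Point, max_iters: int = 2000,
                seed: int = 0) -> Optional[List[Point]]:
    rng = random.Random(seed)
    tree_a, tree_b = {start: None}, {goal: None}
    for _ in range(max_iters):
        new = extend(tree_a, (rng.random(), rng.random()))
        if new is not None and connect(tree_b, new) == new:
            path = trace(tree_a, new)[::-1] + trace(tree_b, new)[1:]
            return path if path[0] == start else path[::-1]
        tree_a, tree_b = tree_b, tree_a  # alternate which tree explores
    return None


path = rrt_connect((0.1, 0.1), (0.9, 0.9))
```

The key idea is the pair of trees: one grows from the start, one from the goal, and after every random extension the other tree tries to reach the new node greedily. That greedy "connect" step is what makes the method fast in open space.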

Part 2: Learning the “Feel” of Assembly

The planner provides a perfect trajectory, but the real world is messy. Sensors have noise, parts have manufacturing tolerances, and friction is unpredictable. If a robot follows a planned path blindly (open-loop), a 1 mm error can result in a jammed part or a system crash.

To solve this, Fabrica uses Reinforcement Learning (RL) to train a “local policy.” Think of this as the robot’s reflex system.

The Generalist Policy

The researchers didn’t want to train a new AI for every single part. That would take forever. They wanted a generalist policy—one brain that can insert a square peg, a round peg, or a complex gear.

To achieve this, they used two clever mathematical tricks:

1. Path-Centric Coordinate Transformation

To a neural network, inserting a peg vertically downwards looks completely different from inserting a peg horizontally sideways. The numbers (x, y, z coordinates) are totally different.

Fabrica transforms the coordinate system so that the “insertion direction” is always the Z-axis in the robot’s mind. Whether the robot is inserting a part from the top, side, or an angle, the neural network sees it as “pushing down.” This makes the policy equivariant to the insertion direction: rotate the task in the world and the policy’s response rotates with it. It allows the robot to reuse the same skill for every insertion, regardless of the direction it happens in.
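One way to build such a frame is to compute the rotation that maps the insertion direction onto the Z-axis and express every observation in it. The sketch below uses Rodrigues' rotation formula, expanded by hand for a target axis of (0, 0, 1); the function names are hypothetical, and the paper may construct its frame differently (for example, it may also fix the rotation about Z, which this version leaves arbitrary).

```python
import math
from typing import List, Sequence

Vec = Sequence[float]
Mat = List[List[float]]


def rotation_to_z(direction: Vec) -> Mat:
    """Rotation matrix R such that R @ unit(direction) == (0, 0, 1)."""
    norm = math.sqrt(sum(c * c for c in direction))
    dx, dy, dz = (c / norm for c in direction)
    if dz < -1.0 + 1e-9:
        # Direction points along -Z: rotate half a turn about the X-axis.
        return [[1.0, 0.0, 0.0], [0.0, -1.0, 0.0], [0.0, 0.0, -1.0]]
    # Rotation axis v = d x z = (dy, -dx, 0); cos(angle) = dz.
    vx, vy = dy, -dx
    k = 1.0 / (1.0 + dz)
    # R = I + [v]x + k * [v]x^2, written out term by term.
    return [
        [1.0 - k * vy * vy, k * vx * vy, vy],
        [k * vx * vy, 1.0 - k * vx * vx, -vx],
        [-vy, vx, 1.0 - k * (vx * vx + vy * vy)],
    ]


def apply(rotation: Mat, vec: Vec) -> List[float]:
    return [sum(r * v for r, v in zip(row, vec)) for row in rotation]


def to_path_frame(point: Vec, origin: Vec, direction: Vec) -> List[float]:
    """Express a world-frame point relative to the insertion path."""
    offset = [p - o for p, o in zip(point, origin)]
    return apply(rotation_to_z(direction), offset)


# A part 0.2 m along a sideways (+X) insertion path looks like it is
# 0.2 m along Z once expressed in the path frame.
sideways = to_path_frame((0.2, 0.0, 0.0), (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
```

The policy's output is a vector in the same frame, so applying the inverse rotation (the transpose of the matrix) maps its action back into world coordinates.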

2. Plan-Guided Residual Actions

Instead of asking the AI to figure out the movement from scratch, Fabrica gives the AI the “perfect” planned trajectory from Part 1. The AI’s job is not to generate the motion, but to generate a residual (a correction) on top of that motion.

It’s like a parent teaching a child to ride a bike. The parent (the planner) pushes the bike in the right direction. The child (the RL policy) just makes small adjustments to keep balance. This makes learning much faster and safer.
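In code, the idea reduces to adding a small, bounded correction to each planned waypoint. The sketch below is an assumption-laden illustration: the clipping bound `max_correction` and the three-component position action are made up for the example, not values from the paper.

```python
from typing import List, Sequence


def apply_residual(
    planned: Sequence[float],
    residual: Sequence[float],
    max_correction: float = 0.002,  # hypothetical bound, in metres
) -> List[float]:
    """Commanded target = planned waypoint + clipped policy correction."""
    clipped = [max(-max_correction, min(max_correction, r)) for r in residual]
    return [p + c for p, c in zip(planned, clipped)]


# The policy asks for a large correction in y; the bound keeps the command
# within 2 mm of the planned waypoint on every axis.
target = apply_residual([0.100, 0.200, 0.300], [0.001, -0.010, 0.0])
```

Bounding the correction is one reason this setup is safer to train: even a badly behaved policy early in training cannot drag the arm far from the collision-free planned path.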

Experiments and Results

The team created a benchmark suite of 7 multi-part assemblies, ranging from simple 5-part beams to a complex 9-part stool and a gamepad controller.

Simulation Performance

They tested their approach against “Open-Loop Tracking” (just following the plan) and “Specialist Policies” (AI trained only for one specific object).

Table 3: % of successful steps without intervention in simulation evaluations.

As expected, Open-Loop Tracking fails miserably (often 0-20% success). The physics of contact are just too unforgiving. However, Fabrica’s Assembly Generalist Policy (AG) performs impressively well, achieving success rates comparable to specialist policies. This is strong evidence that the coordinate transformation trick works: the robot is generalizing the skill of “insertion” across different shapes.

Real-World Execution

The ultimate test is the physical world. The researchers deployed Fabrica on two Franka Emika Panda robots.

Figure 1: Step-by-step rendered assembly executions on different assemblies with different robots.

The transfer was zero-shot. This means they did not retrain the robot in the real world. They trained it in the simulator (Isaac Gym), and it worked immediately on physical hardware.

Figure 2: Step-by-step real-world assembly executions on different assemblies with Panda robots.

In the real world, the system achieved an 80% success rate per step without any human intervention. When looking at the full assembly sequence (which might have 8 or 9 steps), the system could often complete the entire object autonomously.

Table 4: % of successful steps without intervention in real-world evaluations.

Crucially, even when the robot failed (e.g., a slip), the system allowed for retries. With just 1 or 2 interventions (automated retries), the success rate climbed to over 90% for most objects.

Bonus: The Vision Upgrade

While the core system relies on “blind” proprioception (feeling the position of the arms), the researchers acknowledged that sometimes visual feedback is necessary, especially when parts slip significantly.

In an interesting extension (detailed in Appendix F), they integrated a Vision-Language Model (VLM) using a wrist-mounted camera.

Figure 4: Example outputs from VLM during corrective alignment.

When an insertion failed repeatedly, they showed the video feed to a VLM (like Gemini) and asked it for advice. The VLM could look at the image and output corrections like “The part is too far left relative to the hole and needs to move right to align.” This highlights the potential of combining precise control policies with the high-level reasoning of modern AI models.

Conclusion

Fabrica represents a significant step forward in autonomous manufacturing. By combining the structure and reliability of hierarchical planning with the adaptability of reinforcement learning, it solves problems that neither approach could handle alone.

Key Takeaways:

Dual-Arm Coordination: The system manages the complex interplay of holding and acting with two arms.
Automated Tooling: It designs its own fixtures, removing a major bottleneck in manufacturing setup.
Generalization: Through path-centric representations, the robot learns skills that apply to objects it has never seen before.

This research moves us closer to a future where robots in small factories or homes can be given a box of parts and a digital instruction manual, and simply figure the rest out for themselves.

References: Tian, Y., Jacob, J., Huang, Y., et al. “Fabrica: Dual-Arm Assembly of General Multi-Part Objects via Integrated Planning and Learning.”
