Introduction
In the world of computer vision and robotics, how a machine “sees” an object is just as important as the object itself. Imagine a robot trying to pick up a coffee mug. To us, it’s a simple cup. To a computer, it might be a dense cloud of millions of points, a heavy triangular mesh, or a complex neural radiance field.
While these detailed representations are great for rendering pretty images, they are often terrible for physics. Calculating a collision between a million-polygon mesh and a robotic hand is computationally expensive. This is where Shape Abstraction comes in: it simplifies complex objects into manageable primitives, like building a Lego version of a sculpture.
For years, researchers have relied on simple primitives like boxes, spheres, or superquadrics. But these are often too rigid. They can’t capture the subtle curvature of a complex tool or the organic shape of a fruit without using hundreds of blocks.
In this post, we are diving deep into a CVPR paper that proposes a new way forward: Shape Abstraction via Marching Differentiable Support Functions (MDSF). This method doesn’t just simplify shapes; it does so using a mathematical framework that is fully differentiable. This means the abstraction isn’t just a static model—it’s a mathematically active entity that can calculate gradients for contact forces, allowing robots to “feel” and optimize their interactions with the world.

As shown in Figure 1 above, this method takes raw geometry (SDF) and transforms it into a set of smooth, convex primitives called Differentiable Support Functions (DSFs). These primitives unlock powerful downstream tasks like shape editing and differentiable simulation.
The Problem with Current Representations
Before we unpack the solution, we need to understand the bottleneck.
- Meshes are standard but computationally heavy for collision detection.
- Implicit Fields (SDFs) represent shapes as continuous functions but are hard to query for contact points in physics engines.
- Primitive Decomposition (breaking a shape into boxes or spheres) is fast for physics but usually inaccurate. A boxy approximation of a sphere is a poor surface to roll on.
The holy grail is a primitive that is versatile (can be a box, a sphere, or a weird pebble shape), parameter-efficient (doesn’t need thousands of numbers to define), and differentiable (we can calculate derivatives for optimization).
The Core Concept: Differentiable Support Functions (DSF)
The researchers propose using a specific geometric representation called a Support Function.
What is a Support Function?
In convex geometry, a support function \(h(x)\) gives the distance from the origin to the supporting hyperplane of a set in the direction \(x\); formally, \(h(x) = \max_{v \in S} \langle v, x \rangle\) for a convex set \(S\) and unit direction \(x\). Think of it as placing a flat sheet of paper against a shape from a specific angle; the support function tells you how far that sheet is from the center.
The authors utilize a differentiable formulation defined by a set of vertices. The basic mathematical definition they start with is:

Here, \(v_i\) are the vertices defining the shape, and \(p\) is a smoothness parameter. As \(p\) increases, the shape becomes sharper (more polyhedral); as \(p\) decreases, it smooths out into a blob.
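To make this concrete, here is a minimal JAX sketch of one plausible vertex-based smooth support function: a p-norm soft-max over the per-vertex dot products \(\langle v_i, x \rangle\). This is an illustrative assumption rather than the paper's exact formula, but it reproduces the behaviour described above (large \(p\) approaches the hard max of a polyhedron, small \(p\) rounds the shape off).

```python
import jax.numpy as jnp

def smooth_support(x, vertices, p):
    """Smooth support value of a vertex-defined convex body in direction x.

    A p-norm soft-max over the dot products <v_i, x>: large p approaches
    max_i <v_i, x> (sharp, polyhedral); small p rounds the shape off.
    Illustrative stand-in only -- the paper's exact DSF formula may differ.
    """
    x = x / jnp.linalg.norm(x)              # support directions are unit vectors
    dots = jnp.maximum(vertices @ x, 1e-9)  # assumes the origin lies inside the hull
    return jnp.sum(dots ** p) ** (1.0 / p)

# Eight corners of a cube: larger p hugs the true support value (the max) more tightly.
cube = jnp.array([[sx, sy, sz] for sx in (-1., 1.) for sy in (-1., 1.) for sz in (-1., 1.)])
for p in (2.0, 8.0, 32.0):
    print(p, smooth_support(jnp.array([1.0, 0.0, 0.0]), cube, p))
```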
Making it Differentiable and Robust
A standard support function has a limitation: the origin must be inside the shape. To fix this and make the representation robust for any placement in 3D space, the authors introduce a center variable \(c\).

Now, the shape is defined relative to this center \(c\). However, simply adding a variable isn’t enough. For the geometry to be valid, the center \(c\) must mathematically stay inside the convex hull of the vertices.
To enforce this constraint without breaking differentiability (which is crucial for training neural networks or running optimizations), they use a clever parameterization. They define \(c\) as a weighted sum of the vertices, where the weights are controlled by unbounded parameters \(\alpha\):

They also parameterize the smoothness \(p\) to ensure it stays within a useful range (between 2 and a maximum value), preventing numerical instability:

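A sketch of what these two parameterizations can look like in code is below. The softmax form for the center follows directly from the description above; the sigmoid mapping for \(p\) and the placeholder `p_max` are assumptions on my part, one common way to enforce a box constraint differentiably.

```python
import jax.numpy as jnp
from jax.nn import softmax, sigmoid

def center_from_alpha(alpha, vertices):
    """Center c as a convex combination of the vertices: softmax keeps the
    weights positive and summing to one, so c stays inside the convex hull
    of the vertices for any unbounded alpha."""
    return softmax(alpha) @ vertices          # c = sum_i w_i * v_i

def smoothness_from_beta(beta, p_max=20.0):
    """Map an unbounded scalar beta to p in (2, p_max). A sigmoid is one
    common way to enforce such a box constraint differentiably; the exact
    mapping and p_max used in the paper may differ."""
    return 2.0 + (p_max - 2.0) * sigmoid(beta)
```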
Why Parameterization Matters
This might seem like just algebraic shuffling, but it has a profound visual impact. By allowing the center to “float” and the vertices to adjust via these differentiable parameters, the DSF can wrap tightly around complex convex shapes that have both sharp edges and curved surfaces.

In the image above, look at the “Middle” column versus the “Right” column. When the center is fixed (Right), the shape abstraction fails to capture the nuances of the D-shape or the quarter-circle. With the proposed parameterization (Middle), the vertices distribute intelligently, capturing the flat sides and the curved arcs simultaneously.
The Fitting Process
How do we actually get a DSF to match a target object? We treat it as an optimization problem. We want to minimize the difference between our DSF and the target geometry (represented as an SDF).

By minimizing this loss function, the algorithm pulls the vertices of the DSF until they hug the surface of the target object.

As seen here, even with just 20 vertices, a single DSF can represent a variety of shapes—from spheres to wedges to prisms—very accurately.
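To make the fitting loop concrete, here is a minimal JAX sketch that optimizes the vertices of a single DSF by matching its smooth support values against the empirical support of sampled points on a convex target. The paper fits against the target SDF with its own loss; the loss below, the direction sampling, and the hyperparameters are illustrative stand-ins.

```python
import jax, jax.numpy as jnp

def fit_dsf(target_pts, n_verts=20, p=8.0, steps=500, lr=0.05, seed=0):
    """Fit DSF vertices so their smooth support matches a convex target point
    cloud (assumed roughly centred at the origin). Toy stand-in for the
    paper's SDF-based fitting loss."""
    key_v, key_d = jax.random.split(jax.random.PRNGKey(seed))
    verts = 0.1 * jax.random.normal(key_v, (n_verts, 3))        # start as a small blob near the origin
    dirs = jax.random.normal(key_d, (256, 3))
    dirs = dirs / jnp.linalg.norm(dirs, axis=1, keepdims=True)  # sampled unit support directions

    h_target = jnp.max(dirs @ target_pts.T, axis=1)             # exact support of the target samples

    def loss(v):
        dots = jnp.maximum(dirs @ v.T, 1e-9)                    # <v_i, x_k> for every direction/vertex
        h_dsf = jnp.sum(dots ** p, axis=1) ** (1.0 / p)         # smooth support values (see earlier sketch)
        return jnp.mean((h_dsf - h_target) ** 2)

    grad_fn = jax.jit(jax.grad(loss))
    for _ in range(steps):
        verts = verts - lr * grad_fn(verts)                     # plain gradient descent on the vertices
    return verts
```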
The Algorithm: Hyperplane-based Marching
Fitting one convex shape is easy. But most real-world objects—like a robot arm or a bunny—are non-convex. You can’t represent a donut with a single convex blob; you need to break it down.
The authors propose a “Marching” scheme. Much like the famous Marching Cubes algorithm, which extracts a mesh from volume data, MDSF iteratively identifies regions of the object and fits a primitive to each.
The pipeline looks like this:

The system takes an input (like a point cloud), converts it to a Signed Distance Field (SDF), and then runs the abstraction algorithm. The authors introduce two complementary strategies to carve up the space: Region Reduction and Region Expansion.
Strategy 1: Region Reduction
Think of this like sculpting. You start with a block of clay (the whole space) and slice away the parts that aren’t the object.
- Initialize: Start at a point deep inside the object.
- Slice: Find the nearest point within the current region that lies in empty space (SDF > 0).
- Hyperplane: Create a cutting plane (hyperplane) at that point.
- Repeat: Keep slicing until you have a tight convex region defined by the intersection of these planes.


This method is highly accurate because it strictly defines the boundaries based on the object’s surface; a toy sketch of the carving loop is shown below.
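Here is a toy version of that carving loop on a sampled SDF. The plane placement and stopping rule are simplified guesses at the procedure described above, not the authors' exact algorithm.

```python
import jax.numpy as jnp

def reduce_region(points, sdf, seed, max_planes=16):
    """Toy region reduction. `points` are sample locations, `sdf` their
    signed distances (negative inside the object), `seed` a point deep
    inside the object. Repeatedly find the nearest sample that is still in
    the region but outside the object, and slice it off with a hyperplane
    pushed back to (roughly) the surface."""
    planes = []                                           # (normal, offset): keep x with n.x <= d
    in_region = jnp.ones(len(points), dtype=bool)
    for _ in range(max_planes):
        violators = in_region & (sdf > 0)                 # empty space not yet carved away
        if not bool(violators.any()):
            break                                         # region now contains only object samples
        d2 = jnp.sum((points - seed) ** 2, axis=1)
        j = jnp.argmin(jnp.where(violators, d2, jnp.inf)) # nearest offending sample
        n = points[j] - seed
        n = n / jnp.linalg.norm(n)
        d = n @ points[j] - sdf[j]                        # shift the plane back toward the surface
        planes.append((n, d))
        in_region = in_region & (points @ n <= d)         # carve: keep only the seed side
    return planes, in_region
```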
Strategy 2: Region Expansion
Think of this like inflating a balloon inside the object.
- Initialize: Start with a tiny sphere inside the object.
- Sample: Look at points near the current boundary.
- Expand: Push the boundaries (hyperplanes) outward.
- Optimize: Adjust the offsets of the planes to cover as much volume as possible without crossing the object’s boundaries too much.

This method is generally more efficient and generates fewer primitives, though it might miss some fine details compared to reduction; a toy sketch of the inflation loop follows below.
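A toy version of the inflation loop, again a simplified stand-in rather than the paper's volume-maximizing optimization:

```python
import jax, jax.numpy as jnp

def expand_region(sdf_fn, seed_pt, n_planes=32, step=0.01, max_steps=200, rng_seed=0):
    """Toy region expansion: fix a set of hyperplane normals around a point
    inside the object and push each plane's offset outward in small steps,
    freezing a plane once its supporting point leaves the object (sdf > 0).
    `sdf_fn(q)` returns the signed distance of a 3D point (negative inside)."""
    normals = jax.random.normal(jax.random.PRNGKey(rng_seed), (n_planes, 3))
    normals = normals / jnp.linalg.norm(normals, axis=1, keepdims=True)
    offsets = jnp.full(n_planes, 1e-3)                    # start as a tiny region around the seed
    growing = jnp.ones(n_planes, dtype=bool)
    for _ in range(max_steps):
        if not bool(growing.any()):
            break
        support_pts = seed_pt + normals * offsets[:, None]            # point each plane currently touches
        outside = jnp.array([sdf_fn(q) for q in support_pts]) > 0.0   # planes that reached empty space
        growing = growing & ~outside                                   # freeze those planes
        offsets = jnp.where(growing, offsets + step, offsets)          # push the rest outward
    return normals, offsets        # region = {x : n_k . (x - seed_pt) <= offset_k for all k}
```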
Visualizing the Process
The figure below is critical for understanding the difference.

- Top (b-c): You see lines cutting through the space, narrowing down the green shape. This is Reduction.
- Bottom (d-f): You see a shape growing outward from the center, pushing boundaries until it fits the form. This is Expansion.
The Hybrid Approach
Why choose one? The authors combine them. They use Region Reduction to capture the tricky, high-detail parts of a shape, and Region Expansion to quickly fill in the bulkier, simpler volumes.

The Armadillo model above shows the trade-off. Region Reduction (a) is detailed but creates many pieces (49 DSFs). Expansion (b) is efficient (22 DSFs) but misses the ears and fingers. The Hybrid method (c) strikes a balance (31 DSFs), capturing details while keeping the count low.
Experimental Results
So, how does MDSF compare to other state-of-the-art methods? The researchers tested against CvxNet (a neural network approach) and Marching Primitives (MP) (a superquadric approach).
The metrics used were:
- IoU (Intersection over Union): Higher is better (overlap accuracy).
- Chamfer-L1: Lower is better (surface distance error).
- \(N_c\): Lower is better (number of primitives required).
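For reference, minimal versions of the first two metrics might look like this (the paper's exact evaluation protocol, such as sample counts and normalization, may differ):

```python
import jax.numpy as jnp

def chamfer_l1(pts_a, pts_b):
    """Symmetric Chamfer distance between two point sets (lower is better):
    the average nearest-neighbour distance in both directions. One common
    convention; the paper's exact variant may differ."""
    d = jnp.linalg.norm(pts_a[:, None, :] - pts_b[None, :, :], axis=-1)
    return 0.5 * (d.min(axis=1).mean() + d.min(axis=0).mean())

def voxel_iou(occ_a, occ_b):
    """Intersection over Union of two boolean occupancy grids (higher is better)."""
    inter = jnp.logical_and(occ_a, occ_b).sum()
    union = jnp.logical_or(occ_a, occ_b).sum()
    return inter / union
```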

Visually, the difference is stark. CvxNet (second column) often produces “blobby” shapes that fail to capture sharp mechanical edges. The proposed MDSF (far right) produces crisp, clean abstractions that respect the geometry of the robot links.

The data backs this up. In Table 1, MDSF consistently achieves the highest accuracy (IoU ~0.93 vs 0.81 for CvxNet) while using significantly fewer components (13.8 vs 50 for CvxNet). This “high accuracy, low complexity” combination is the sweet spot for simulation.
The researchers also conducted an ablation study to verify their Hybrid approach.

The Hybrid method (“Hyb”) effectively balances the high accuracy of Reduction (“Red”) with the efficiency of Expansion (“Exp”).
Why This Matters: Applications
The true power of MDSF lies in the “D” for Differentiable. Because the math defining these shapes allows us to calculate gradients, we can solve inverse problems that were previously very difficult.
1. Shape Editing
Imagine a robot needs to grab a tool, but the tool handle is slightly the wrong shape. With MDSF, we can define a “contact loss” and automatically morph the shape of the object to minimize the gap between the hand and the object.

Because the gap is differentiable with respect to the shape parameters \(\phi\), we can use gradient descent to reshape the object.
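As a sketch of that idea, the snippet below treats the DSF vertices as the shape parameters \(\phi\) and runs gradient descent on a toy contact-gap loss; the contact points, the loss, and the hyperparameters are illustrative assumptions, not the paper's actual contact loss.

```python
import jax, jax.numpy as jnp

def edit_shape(verts, contact_pts, p=8.0, steps=200, lr=0.02):
    """Shape-editing sketch: morph the DSF vertices so the surface reaches a
    set of desired contact points on the hand (toy contact-gap loss)."""
    def gap_loss(v):
        dirs = contact_pts / jnp.linalg.norm(contact_pts, axis=1, keepdims=True)
        dots = jnp.maximum(dirs @ v.T, 1e-9)
        h = jnp.sum(dots ** p, axis=1) ** (1.0 / p)       # smooth support toward each contact point
        gap = jnp.linalg.norm(contact_pts, axis=1) - h    # positive = surface not reaching the contact
        return jnp.sum(jnp.maximum(gap, 0.0) ** 2)

    grad_fn = jax.grad(gap_loss)
    for _ in range(steps):
        verts = verts - lr * grad_fn(verts)               # morph the shape to close the gap
    return verts
```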

2. Contact Localization
Where exactly did the robot touch the object? Usually, robots have to guess based on noisy sensors. By modeling the contact force constraints mathematically, MDSF allows a system to solve for the exact contact point and force vector.

This optimization minimizes the error between the expected physics (friction cones, torques) and the observed sensor data.
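One way to phrase that as code is a soft-penalty least-squares problem over the unknown contact point and force; the penalty weights, the initialization, and the `sdf_fn` interface below are assumptions for illustration, not the paper's exact formulation.

```python
import jax, jax.numpy as jnp

def localize_contact(f_meas, tau_meas, sdf_fn, mu=0.5, steps=500, lr=0.01):
    """Contact-localization sketch: find a contact point r and force f that
    explain a measured wrench (f_meas, tau_meas), keeping r on the surface
    (sdf ~ 0) and f inside the Coulomb friction cone, via soft penalties.
    `sdf_fn` must be a differentiable signed-distance function (negative inside)."""
    def loss(params):
        r, f = params["r"], params["f"]
        n = jax.grad(sdf_fn)(r)                           # outward surface normal from the SDF gradient
        n = n / (jnp.linalg.norm(n) + 1e-9)
        fn = -jnp.dot(f, n)                               # normal force component pushing into the object
        ft = f + fn * n                                   # tangential remainder
        ft_norm = jnp.sqrt(jnp.sum(ft ** 2) + 1e-12)
        wrench_err = jnp.sum((f - f_meas) ** 2) + jnp.sum((jnp.cross(r, f) - tau_meas) ** 2)
        on_surface = sdf_fn(r) ** 2                       # keep the contact point on the surface
        cone = jnp.maximum(ft_norm - mu * fn, 0.0) ** 2   # friction cone violation penalty
        return wrench_err + 10.0 * on_surface + 10.0 * cone

    params = {"r": jnp.zeros(3), "f": f_meas}             # crude initialization
    grad_fn = jax.grad(loss)
    for _ in range(steps):
        g = grad_fn(params)
        params = {k: params[k] - lr * g[k] for k in params}
    return params["r"], params["f"]
```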

3. Differentiable Simulation
Finally, this enables better planning. In a differentiable simulator, a robot can “imagine” a trajectory. If it drops a banana, it can calculate the gradients of the motion through the contact with the floor to adjust its grip before it even moves.

In Figure 11, the complex geometries of an apple and a banana are abstracted into just 9 and 5 DSFs respectively. This lightweight representation allows the planner to solve complex manipulation tasks efficiently.
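To show what “gradients through contact” means in the smallest possible setting, here is a self-contained toy: a 1D ball dropped onto a penalty-spring floor, with the whole rollout written in JAX so the outcome can be differentiated with respect to the initial velocity. This is purely illustrative, not the paper's simulator or contact model.

```python
import jax, jax.numpy as jnp

def drop_height(v0, steps=300, dt=0.01, g=9.81, k=1e4):
    """A 1D ball (unit mass) is released at height 1.0 with vertical velocity
    v0, falls under gravity, and hits a floor modeled as a stiff penalty
    spring. Written entirely in jnp, so the rollout is differentiable."""
    def step(state, _):
        h, v = state
        contact_force = k * jnp.maximum(-h, 0.0)   # penalty force when the ball penetrates the floor
        v = v + (contact_force - g) * dt           # semi-implicit Euler update
        h = h + v * dt
        return (h, v), h
    _, heights = jax.lax.scan(step, (1.0, v0), None, length=steps)
    return heights[-1]                             # final height after the rollout

# Gradient of the outcome with respect to the initial velocity, through the contact:
print(jax.grad(drop_height)(0.0))
```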
Conclusion
The paper “Shape Abstraction via Marching Differentiable Support Functions” presents a significant step forward in representing the 3D world. By moving away from rigid boxes and heavy meshes, and embracing Differentiable Support Functions, the authors provide a tool that is both geometrically accurate and physically useful.
The combination of differentiable parameterization and hyperplane-based marching solves the twin problems of fitting complex shapes and keeping the representation lightweight. For students and researchers in robotics, this highlights an important trend: the future of computer vision isn’t just about recognizing objects; it’s about representing them in a way that allows machines to interact with, simulate, and change the world around them.