Imagine looking at a pristine, opaque billiard ball sitting on a table. Now, imagine a ping-pong ball painted to look exactly like that billiard ball sitting next to it. To a camera—and to standard computer vision algorithms—these two objects are identical. They share the same geometry and the same surface texture.
However, if you were to drop both balls, their true nature would instantly reveal itself. The solid billiard ball would land with a heavy thud, barely deforming. The hollow ping-pong ball would bounce, vibrate, and deform upon impact. The motion betrays the structure.
This is the core intuition behind Structure from Collision (SfC), a fascinating new task proposed in a recent research paper. While modern techniques like Neural Radiance Fields (NeRF) have revolutionized how we reconstruct 3D surfaces, they suffer from a “superficial” limitation: they can only see the skin of an object. This post explores SfC-NeRF, a novel framework that uses the physics of collisions to peer inside opaque objects and reconstruct their invisible internal structures.
The Problem: NeRFs are Skin-Deep
Neural Radiance Fields (NeRF) and 3D Gaussian Splatting (3DGS) have largely solved the problem of novel view synthesis: given enough photos of an object, these models can recreate it in 3D with photorealistic accuracy.
However, these methods rely entirely on light rays bouncing off surfaces. If an object is not transparent, a standard NeRF assumes the interior is irrelevant or fills it with density arbitrarily. This becomes a critical issue for robotics and physical simulations. A robot needs to know if an object is heavy and solid or light and hollow to grip it correctly.

As shown in Figure 1, a static 3D model cannot distinguish between a solid sphere and a hollow one because they look identical in a still image.
- Row 1 (Static): The standard model sees no difference.
- Row 2 (SfC): The proposed method analyzes the collision.
- Row 3 (Ground Truth): The actual internal structures (a solid core vs. a hollow cavity).
The researchers propose that while the static appearance is identical, the dynamic deformation during a collision is a fingerprint of the internal structure. By observing how an object crumples, stretches, or bounces, we can mathematically deduce what is happening inside.
Background: Physics-Augmented Continuum NeRF
To solve this, the authors build upon PAC-NeRF (Physics-Augmented Continuum NeRF). This is a hybrid approach that combines two distinct ways of representing the world:
- Eulerian Grid (Visuals): The standard NeRF approach, representing the scene as a grid of density and color values.
- Lagrangian Particles (Physics): A physics simulation approach where the object is made of particles that move and interact according to physical laws (like mass and elasticity).
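A toy one-dimensional transfer step shows how the two views talk to each other (an illustrative sketch, not DiffMPM's actual kernel; real MPM uses B-spline weights rather than nearest-cell assignment):

```python
import numpy as np

# Toy particle-to-grid transfer: Lagrangian particles carry mass, and an
# Eulerian grid accumulates it so grid-based updates (forces, rendering)
# can be applied. Nearest-cell assignment is used here for brevity.
def particles_to_grid(x, m, n_cells, dx):
    grid_mass = np.zeros(n_cells)
    idx = np.clip((x / dx).astype(int), 0, n_cells - 1)
    np.add.at(grid_mass, idx, m)     # scatter-add particle mass to cells
    return grid_mass

x = np.array([0.05, 0.12, 0.13, 0.31])   # particle positions (meters)
m = np.array([1.0, 1.0, 1.0, 2.0])       # particle masses (kg)
grid = particles_to_grid(x, m, n_cells=4, dx=0.1)
print(grid)  # total grid mass equals total particle mass
```

The key property of any such transfer is that it conserves mass: whatever the particles carry, the grid receives in full.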
The system uses DiffMPM, a differentiable Material Point Method simulator. Because every step of the simulation is differentiable, errors measured on the video pixels can be backpropagated all the way to the object's physical properties. Standard PAC-NeRF uses this to estimate physical parameters such as elasticity (Young's modulus). SfC-NeRF flips the script: it assumes the material properties are known and instead solves for the internal geometry.
The Method: SfC-NeRF
The goal of SfC-NeRF is to optimize a density grid that represents the object’s internal structure. The process is divided into two main stages: Static Optimization and Dynamic Optimization.

Step 1: Static Optimization
First, the model looks at the first frame of the video (before the collision). It learns the “shell” of the object using standard NeRF techniques. At this stage, the model creates a filled object because it has no reason to believe there are holes inside.
Step 2: Dynamic Optimization
This is where the magic happens. The model watches the video of the collision. It simulates the physics of the object it learned in Step 1. If the simulation doesn’t match the video (e.g., the real object crumples but the simulated solid object doesn’t), the model adjusts the internal density of the object until the simulation matches reality.
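To make the loop concrete, here is a deliberately tiny stand-in for the simulator: a dropped ball compressing a contact spring. Every constant below is an assumption, and the real method optimizes a full density grid through DiffMPM rather than a single scalar, but the shape of the loop is the same: simulate, compare to the observation, adjust.

```python
import math

# Toy "collision" model (an illustrative assumption, not DiffMPM): a ball
# dropped from height H compresses a contact spring of stiffness K.
# The energy balance (1/2) K x^2 = m g H gives the maximum compression,
# so the observed deformation x reveals the hidden mass m.
G, H, K = 9.81, 1.0, 500.0   # gravity, drop height, contact stiffness

def max_compression(m):
    return math.sqrt(2.0 * m * G * H / K)

true_mass = 0.4                          # a hollow ball
x_observed = max_compression(true_mass)  # what the "video" shows at impact

# Dynamic optimization: start from the solid guess (m = 1.0) and descend
# on the squared error between simulated and observed deformation.
m, lr = 1.0, 20.0
for _ in range(100):
    x = max_compression(m)
    grad = 2.0 * (x - x_observed) * (x / (2.0 * m))  # d x / d m = x / (2m)
    m -= lr * grad

print(f"recovered mass: {m:.3f} kg")  # converges to the hollow ball's 0.4
```

Even in this one-parameter version, the point survives: a solid object cannot reproduce the observed deformation, so the optimizer is forced toward the hollow configuration.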
To make this ill-posed problem solvable, the authors introduce four critical components:
- Physical Constraints
- Appearance-Preserving Constraints
- Keyframe Constraints
- Volume Annealing
Let’s break these down.
1. Physical Constraints
The deformation of the object must obey the laws of continuum mechanics: mass and momentum are conserved, and the rendering quantities, the density \(\boldsymbol{\sigma}\) and color \(\mathbf{c}\), are carried along with the moving material:
\[ \frac{D \boldsymbol{\sigma}}{D t} = 0, \quad \frac{D \mathbf{c}}{D t} = \mathbf{0}, \]

Crucially, the authors constrain the Mass. A hollow object weighs less than a solid one. By enforcing a loss function based on the known mass of the object, the model is encouraged to “carve out” empty space inside the object to match the target weight.
\[ \mathcal{L}_{\mathrm{mass}} = \left\| \log_{10}(m) - \log_{10}(\hat{m}) \right\|_2^2, \qquad m = \sum_{p \in \mathcal{P}^{P}(t_0)} \hat{\rho} \cdot \left( \frac{\Delta x}{2} \right)^{3} \cdot \alpha_p^{P}, \]
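A direct sketch of this loss (the particle occupancies and constants below are synthetic stand-ins, not the paper's data):

```python
import numpy as np

# Mass loss: sum occupancy-weighted particle masses, then compare to the
# known target mass on a log scale (log keeps the loss well-behaved across
# orders of magnitude).
def mass_loss(alpha, rho_hat, dx, m_target):
    m = np.sum(rho_hat * (dx / 2.0) ** 3 * alpha)  # m = sum_p rho*(dx/2)^3*alpha_p
    return (np.log10(m) - np.log10(m_target)) ** 2

alpha = np.ones(8000)   # fully solid initial guess: every particle occupied
loss_solid = mass_loss(alpha, rho_hat=1000.0, dx=0.02, m_target=4.0)

alpha[:4000] = 0.0      # carve out half the interior
loss_hollow = mass_loss(alpha, rho_hat=1000.0, dx=0.02, m_target=4.0)
print(loss_solid, loss_hollow)  # the hollow configuration drives the loss to 0
```

The solid guess weighs twice the target here, so the only way to reduce the loss is to zero out interior occupancies, exactly the "carving" behavior described above.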
2. Appearance-Preserving Constraints
This is the most clever part of the design. When the AI starts carving out holes inside the object to satisfy the physics simulation, it might accidentally delete part of the surface. If the surface disappears, the rendering breaks.
To prevent this, the authors enforce Appearance-Preserving Constraints. They tell the model: “Change the inside however you want, but the outside must still look exactly like the static photo we took at the start.”
They achieve this via a specific pixel loss on the initial frame (\(t_0\)):
\[ \mathcal{L}_{\mathrm{pixel}_0} = \frac{1}{|\hat{\mathcal{R}}|} \sum_{\mathbf{r} \in \hat{\mathcal{R}}} \left\| \mathbf{C}(\mathbf{r}, t_0) - \hat{\mathbf{C}}(\mathbf{r}, t_0) \right\|_2^2. \]
And, to prevent the geometry from becoming concave or distorted in 3D space, they also apply a depth-preserving loss:
\[ \mathcal{L}_{\mathrm{depth}_0} = \frac{1}{|\hat{\mathcal{R}}|} \sum_{\mathbf{r} \in \hat{\mathcal{R}}} \Big( \left\| \Delta_h Z(\mathbf{r}, t_0) - \Delta_h \tilde{Z}(\mathbf{r}, t_0) \right\|_2^2 + \left\| \Delta_v Z(\mathbf{r}, t_0) - \Delta_v \tilde{Z}(\mathbf{r}, t_0) \right\|_2^2 \Big), \]
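The idea is easy to see on a tiny depth map (a per-image sketch; the paper sums over sampled rays \(\mathbf{r}\)): because \(\Delta_h\) and \(\Delta_v\) are finite differences, the loss constrains local surface shape, not absolute depth.

```python
import numpy as np

# Depth-preserving loss sketch: compare horizontal/vertical finite
# differences of the rendered depth Z against the cached static depth,
# so the surface may translate but not change its local shape.
def depth_loss(Z, Z_static):
    dh = np.diff(Z, axis=1) - np.diff(Z_static, axis=1)  # Delta_h terms
    dv = np.diff(Z, axis=0) - np.diff(Z_static, axis=0)  # Delta_v terms
    return (np.sum(dh ** 2) + np.sum(dv ** 2)) / Z.size

Z_static = np.array([[1.0, 1.2], [1.1, 1.3]])
same = depth_loss(Z_static + 0.5, Z_static)     # uniform shift: shape intact
dented = depth_loss(Z_static + np.array([[0.0, 0.0], [0.0, 0.3]]), Z_static)
print(same, dented)  # 0.0 for the shift, positive once the surface dents
```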
3. Keyframe Constraints
In a video of a collision, not every frame is equally important. The frames where the object is flying through the air don’t tell us much about its stiffness. The frame immediately after impact, where the deformation is maximum, contains the most information.
SfC-NeRF applies a heavier weight to these Keyframes to ensure the internal structure explains the maximum point of impact.
\[ \mathcal{L}_{\mathrm{pixel}_k} = \frac{1}{|\hat{\mathcal{R}}|} \sum_{\mathbf{r} \in \hat{\mathcal{R}}} \left\| \mathbf{C}(\mathbf{r}, t_k) - \hat{\mathbf{C}}(\mathbf{r}, t_k) \right\|_2^2, \]
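One simple way to locate such a keyframe (a heuristic assumption on my part, not the paper's exact procedure) is to look for the frame where frame-to-frame pixel change peaks, since impact produces the fastest visual change:

```python
import numpy as np

# Heuristic keyframe picker: the impact frame is taken to be where the
# frame-to-frame mean squared pixel change is largest.
def pick_keyframe(frames):
    diffs = [float(np.mean((frames[t + 1] - frames[t]) ** 2))
             for t in range(len(frames) - 1)]
    return int(np.argmax(diffs)) + 1  # index of the frame after the jump

# Synthetic "video": static until a sudden change at frame 3 (the impact).
frames = [np.zeros((4, 4)) for _ in range(6)]
frames[3] = np.ones((4, 4))
frames[4] = np.ones((4, 4))
frames[5] = np.ones((4, 4))
print(pick_keyframe(frames))  # 3
```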
4. Volume Annealing
Finally, the optimization process is prone to getting stuck in “local optima.” For example, the model might dig a hole in the wrong place and get stuck there.
To fix this, the authors use Volume Annealing. They repeatedly expand and contract the volume density during training. Think of it as “shaking” the container to let the particles settle into the correct configuration. This helps the model search for the global optimum rather than getting stuck on the first guess.
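A minimal sketch of such a schedule (the waveform, period, and amplitude here are illustrative assumptions; the paper defines its own expansion/contraction rule):

```python
# Volume-annealing schedule sketch: a triangle wave that periodically
# inflates the occupancy/density field above 1.0 and then shrinks it below,
# "shaking" the optimizer out of cavities carved in the wrong place.
def annealing_scale(step, period=500, amplitude=0.3):
    phase = (step % period) / period       # 0 -> 1 over each period
    tri = 1.0 - abs(2.0 * phase - 1.0)     # 0 -> 1 -> 0 triangle wave
    return 1.0 + amplitude * (2.0 * tri - 1.0)

print(annealing_scale(0), annealing_scale(250), annealing_scale(500))
# contracted at the period boundaries, fully expanded mid-period
```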
The full objective function combines all these elements:
\[ \mathcal{L}_{\mathrm{full}} = \mathcal{L}_{\mathrm{pixel}} + \lambda_{\mathrm{mass}} \mathcal{L}_{\mathrm{mass}} + \lambda_{\mathrm{pres}} \left( \mathcal{L}_{\mathrm{pixel}_0} + w_{\mathrm{depth}} \mathcal{L}_{\mathrm{depth}_0} \right) + \lambda_{\mathrm{key}} \mathcal{L}_{\mathrm{pixel}_k} \]
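In code, the combination is a straightforward weighted sum (the default weight values below are placeholders, not the paper's tuned hyperparameters):

```python
# Weighted combination of the SfC-NeRF loss terms; the lambda/w defaults
# are placeholder assumptions, not the paper's tuned values.
def full_loss(L_pixel, L_mass, L_pixel0, L_depth0, L_pixelk,
              lam_mass=1.0, lam_pres=1.0, w_depth=1.0, lam_key=1.0):
    return (L_pixel
            + lam_mass * L_mass
            + lam_pres * (L_pixel0 + w_depth * L_depth0)
            + lam_key * L_pixelk)

print(full_loss(0.1, 0.2, 0.05, 0.01, 0.04))
```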
Experiments and Results
Because this is a new task, the researchers created a dataset called the SfC dataset. It consists of 115 objects including spheres, cubes, bicones, cylinders, and diamonds. They varied the internal structures (size and location of cavities) and materials (elasticity, fluid properties).

Visual Results
The visual results are striking. In the comparison below (Figure 4), we see a sphere with a hidden internal cavity.
- Static models just guess a solid sphere.
- GO/LPO (Baseline dynamic methods) struggle to find the right shape, often resulting in noisy or completely wrong internal densities.
- SfC-NeRF (the proposed method, labeled 'l') successfully identifies the internal void.

The method also works across different shapes. In Figure 6, notice how the standard models (d) just learn a solid core. The baselines (e-h) get messy. The SfC-NeRF (n) clearly carves out the internal structure that best explains the collision physics.

Quantitative Analysis
The researchers measured success using Chamfer Distance (CD), which calculates the difference between the estimated particle positions and the ground truth. A lower number means better accuracy.
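The metric is simple enough to state directly (this is the common bidirectional formulation; the paper's exact normalization may differ):

```python
import numpy as np

# Bidirectional Chamfer Distance between two particle sets: for each point,
# find the squared distance to its nearest neighbor in the other set, then
# average both directions.
def chamfer_distance(A, B):
    d2 = np.sum((A[:, None, :] - B[None, :, :]) ** 2, axis=-1)  # pairwise sq. dists
    return float(d2.min(axis=1).mean() + d2.min(axis=0).mean())

est = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])  # estimated particles
gt  = np.array([[0.0, 0.0, 0.0], [1.0, 0.1, 0.0]])  # ground-truth particles
print(chamfer_distance(est, est))  # identical sets give 0.0
print(chamfer_distance(est, gt))   # small positive value for the offset point
```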

As shown in Table 1, SfC-NeRF consistently outperforms the baselines (GO and LPO) across various cavity sizes. Interestingly, the table also shows that larger cavities are harder to estimate—likely because a large cavity requires the model to “delete” more volume from the initial solid guess, which is a harder optimization path.
The Influence of Material Stiffness
Does the material matter? Absolutely.
If an object is too hard (high Young’s modulus), it won’t deform much upon collision. If it doesn’t deform, the video provides no new information, and the method struggles to see inside. If an object is too soft, it might crumple chaotically, making the optimization difficult.

The results in Figure 11 illustrate this “Goldilocks” principle. The method works best when the object is soft enough to deform noticeably, but stable enough to maintain a coherent shape.
Conclusion
Structure from Collision represents a significant step forward in “physical AI.” It moves beyond simply reconstructing how the world looks to understanding how the world works. By enforcing physical consistency and observing dynamics, SfC-NeRF effectively gives computer vision “X-ray” capabilities for opaque, deformable objects.
While the current work relies on simulation and assumes known material properties (like mass), it opens the door for future systems that could simultaneously estimate an object’s shape, internal structure, and physical properties just by watching it drop. This could be revolutionary for robotics, allowing machines to understand the weight distribution and fragility of objects before they even pick them up.
The next time you see a ball drop, remember: the bounce tells you everything you need to know about what’s hiding inside.