Introduction
If you have ever tried to perform 3D reconstruction using photogrammetry, you have likely encountered the “glossy object” nightmare. You take a series of photos of a ceramic vase or a metallic toy, feed them into your software, and the result is a melted, noisy blob.
Why does this happen? Most standard 3D reconstruction algorithms assume that the world is Lambertian. In simple terms, they assume that a point on an object has the same color regardless of the angle from which you view it. But glossy and specular (mirror-like) surfaces break this rule. As you move your camera, the reflection of the light source moves across the surface. To the algorithm, this moving highlight looks like the geometry itself is shifting or disappearing, leading to catastrophic failure in the 3D mesh.
To solve this, researchers usually turn to expensive, specialized hardware—light stages with controlled illumination or dedicated polarization cameras that cost thousands of dollars. But what if you could achieve high-fidelity reconstruction of glossy objects using a standard camera and a cheap filter?
In the paper “Glossy Object Reconstruction with Cost-effective Polarized Acquisition,” researchers from Zhejiang University, The University of Hong Kong, and Shenzhen University propose a groundbreaking solution. They combine the physics of light polarization with the power of Neural Radiance Fields (NeRFs) to disentangle the true shape of an object from its shiny reflections. The best part? Their setup requires nothing more than an off-the-shelf RGB camera and a generic linear polarizer, with no need for complex calibration.

Background: The Intersection of Physics and AI
To understand how this method works, we need to bridge two concepts: Neural Implicit Surfaces and Polarization Physics.
Neural Implicit Surfaces
In the last few years, the field of 3D vision has been dominated by Coordinate-Based Neural Networks. Instead of storing a 3D model as a mesh of triangles, we train a neural network to represent the scene. You give the network a 3D coordinate \((x, y, z)\), and it outputs the density and color at that point. This allows for infinite resolution and differentiable rendering.
However, standard NeRFs struggle to distinguish between the actual color of the object (diffuse) and the shiny reflection (specular). When a network tries to “average out” the moving reflections across different views, it often smooths out the geometry, losing fine details.
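To make the idea concrete, here is a minimal PyTorch sketch of a coordinate-based network. It is a toy illustration, not the paper's architecture; the layer sizes and activations are arbitrary choices.

```python
import torch
import torch.nn as nn

class CoordinateField(nn.Module):
    """Toy coordinate network: maps a 3D point to (density, RGB)."""
    def __init__(self, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 4),  # 1 density channel + 3 color channels
        )

    def forward(self, xyz):
        out = self.mlp(xyz)
        density = torch.relu(out[..., :1])    # density must be non-negative
        color = torch.sigmoid(out[..., 1:])   # RGB in [0, 1]
        return density, color

# Any continuous coordinate can be queried -- hence "infinite resolution":
field = CoordinateField()
density, color = field(torch.tensor([[0.1, -0.3, 0.7]]))
```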
Why Polarization Matters
This is where physics comes in. Light behaves differently when it bounces off different types of surfaces.
- Diffuse Reflection: Light penetrates slightly into the surface and scatters. This light effectively loses its polarization properties (it becomes unpolarized).
- Specular Reflection: Light bounces directly off the surface interface. This light tends to become polarized.
By analyzing the polarization state of the light entering the camera, we can mathematically separate the “shiny” parts of the image from the “matte” parts. Furthermore, the angle of polarization is physically tied to the surface normal (the direction the surface is facing). This means polarization provides a strong geometric cue that standard RGB images lack.
The Core Method
The researchers propose a pipeline that takes multi-view images captured through a linear polarizer and outputs a high-quality 3D mesh. The ingenious part of their approach is that it never needs to know the exact angle of the polarizer for any shot: the polarizer angle is treated as an unknown variable that the network figures out on its own.

As illustrated in Figure 2, the pipeline is a cycle of prediction and correction. Let’s break down the architecture into its fundamental components.
1. The Neural Radiance Field
The system uses a coordinate-based network (specifically building on VolSDF and Ref-NeRF) to represent the object. For any point in space, the network predicts:
- Signed Distance Function (SDF): The distance to the nearest surface (defining the geometry).
- Surface Normal (\(n\)): The orientation of the surface.
- Diffuse Radiance (\(c^d\)): The underlying color.
- Specular Radiance (\(c^s\)): The shiny reflection, which depends on the viewing direction and roughness.
The relationship between these components is governed by the following equation, where the final color is a combination of diffuse and specular parts:

\[
c = c^d + c^s, \qquad c^d = f_\theta(\mathbf{x}), \qquad c^s = g_\theta\big(\mathrm{IDE}(\boldsymbol{\omega}_r, \kappa)\big)
\]
Here, \(f_\theta\) and \(g_\theta\) are neural networks (MLPs). The term IDE refers to Integrated Directional Encoding, a technique to help the network understand roughness and reflection directions.
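The sketch below is a hypothetical PyTorch rendering of this diffuse/specular split. For simplicity it feeds the raw reflection direction to the specular head in place of the IDE features; the class and head names (`DiffuseSpecularField`, `f_theta`, `g_theta`) are illustrative, not the authors' code.

```python
import torch
import torch.nn as nn

class DiffuseSpecularField(nn.Module):
    """Toy two-head radiance field: a view-independent diffuse color plus a
    view-dependent specular color conditioned on the reflection direction."""
    def __init__(self, hidden=64):
        super().__init__()
        self.f_theta = nn.Sequential(   # diffuse head: position only
            nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 3))
        self.g_theta = nn.Sequential(   # specular head: reflection direction
            nn.Linear(3, hidden), nn.ReLU(), nn.Linear(hidden, 3))

    def forward(self, x, normal, view_dir):
        # Reflect the view direction about the surface normal.
        refl = view_dir - 2.0 * (view_dir * normal).sum(-1, keepdim=True) * normal
        c_d = torch.sigmoid(self.f_theta(x))     # diffuse radiance c^d
        c_s = torch.sigmoid(self.g_theta(refl))  # specular radiance c^s
        return c_d + c_s                         # final color c = c^d + c^s

field = DiffuseSpecularField()
x = torch.randn(1, 3)
n = torch.tensor([[0.0, 0.0, 1.0]])
d = torch.tensor([[0.0, 0.0, -1.0]])
color = field(x, n, d)
```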
2. The Polarimetric BRDF (pBRDF)
Standard rendering engines calculate RGB color. This method, however, calculates the Stokes Vector. The Stokes vector is a 4-component mathematical object that fully describes the polarization state of light (intensity, degree of polarization, and angle of polarization).
The researchers use a pBRDF model (Polarimetric Bidirectional Reflectance Distribution Function). This model explicitly links the geometry (surface normals) and material properties (index of refraction, roughness) to the polarization of the outgoing light.
The Stokes vector \(S\) is defined as:

\[
S = \begin{pmatrix} s_0 \\ s_1 \\ s_2 \\ s_3 \end{pmatrix}
\]

where \(s_0\) is the total intensity, \(s_1\) and \(s_2\) describe linear polarization (the horizontal/vertical and diagonal components), and \(s_3\) describes circular polarization.
The system predicts the outgoing Stokes vector \(s^{out}\) by summing the contributions from the diffuse and specular components. The diffuse part is mostly unpolarized, while the specular part carries strong polarization cues. The interaction is modeled by Mueller matrices (\(M^d\) and \(M^s\)), which describe how the surface transforms the incoming light:

\[
s^{out} = M^d\, s^{in} + M^s\, s^{in}
\]
This looks complex, but the intuition is simple: The network predicts the physics of the light reflection, not just the color.
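A small NumPy example makes that intuition tangible. The Mueller matrices below are toy values, not the paper's fitted pBRDF; the point is only that unpolarized incoming light stays essentially unpolarized after the diffuse lobe but picks up measurable polarization from the specular lobe.

```python
import numpy as np

# Incoming light: unpolarized, unit intensity -> Stokes (1, 0, 0, 0).
s_in = np.array([1.0, 0.0, 0.0, 0.0])

# Toy Mueller matrices (illustrative numbers only):
# the diffuse lobe returns almost-unpolarized light...
M_d = np.diag([0.6, 0.02, 0.02, 0.0])
# ...while the specular lobe acts like a partial polarizer.
rho = 0.5  # degree of polarization imparted by the specular bounce
M_s = 0.3 * np.array([
    [1.0, rho, 0.0, 0.0],
    [rho, 1.0, 0.0, 0.0],
    [0.0, 0.0, np.sqrt(1 - rho**2), 0.0],
    [0.0, 0.0, 0.0, np.sqrt(1 - rho**2)],
])

# The surface transforms incoming light via both lobes:
s_out = M_d @ s_in + M_s @ s_in
print(s_out)  # [0.9, 0.15, 0, 0]: s_1 != 0, so the light is now
              # partially polarized (DoP ~ 0.17), entirely due to M_s
```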
3. Polarization Rendering with Unknown Angles
This is the most innovative part of the paper. In a traditional scientific setup, you would precisely rotate the polarizer to \(0^\circ, 45^\circ, 90^\circ\) and carefully record the data. Here, the user just slaps a filter on the camera and walks around the object. The angle of the polarizer \(\phi_{pol}\) relative to the object is unknown.
The researchers derived a differentiable formulation to handle this. First, they extract the fundamental polarization properties—Intensity (\(I_{un}\)), Degree of Polarization (\(\rho\)), and Angle of Polarization (\(\phi\))—from the predicted Stokes vectors:

\[
I_{un} = s_0, \qquad \rho = \frac{\sqrt{s_1^2 + s_2^2}}{s_0}, \qquad \phi = \frac{1}{2}\arctan\!\left(\frac{s_2}{s_1}\right)
\]
The intensity of light passing through a linear polarizer changes sinusoidally based on the angle difference between the light’s polarization (\(\phi\)) and the polarizer’s angle (\(\phi_{pol}\)). This is Malus’s Law in action:

\[
I_{\phi_{pol}} = \frac{I_{un}}{2}\Big(1 + \rho\cos\big(2(\phi_{pol} - \phi)\big)\Big)
\]
Since the network predicts the Stokes vector (and thus \(\rho\) and \(\phi\)) and we have the captured image intensity \(I_{\phi_{pol}}\), the only unknown left is the polarizer angle \(\phi_{pol}\). The network can optimize this angle along with the geometry!
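To see the sinusoid at work, suppose the network predicts \(I_{un} = 1\), \(\rho = 0.4\), and \(\phi = 30^\circ\) for some pixel (made-up numbers). With the polarizer aligned to the light’s polarization (\(\phi_{pol} = 30^\circ\)), the captured intensity is \(\tfrac{1}{2}(1 + 0.4) = 0.7\); rotated \(90^\circ\) away (\(\phi_{pol} = 120^\circ\)), it drops to \(\tfrac{1}{2}(1 - 0.4) = 0.3\). Every captured pixel thus becomes a constraint coupling the predicted \(\rho\) and \(\phi\) to the unknown \(\phi_{pol}\) for that view.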
To make this fully differentiable (so the AI can learn from it), they construct a Mueller Matrix for the polarizer itself. This matrix represents a linear polarizer rotated by an arbitrary angle \(\phi_{pol}\):

\[
M_{pol}(\phi_{pol}) = \frac{1}{2}
\begin{pmatrix}
1 & \cos 2\phi_{pol} & \sin 2\phi_{pol} & 0 \\
\cos 2\phi_{pol} & \cos^2 2\phi_{pol} & \cos 2\phi_{pol}\,\sin 2\phi_{pol} & 0 \\
\sin 2\phi_{pol} & \cos 2\phi_{pol}\,\sin 2\phi_{pol} & \sin^2 2\phi_{pol} & 0 \\
0 & 0 & 0 & 0
\end{pmatrix}
\]
The outgoing Stokes vector after passing through the camera’s filter is calculated by multiplying the object’s Stokes vector by this polarizer matrix:

\[
s^{cam} = M_{pol}(\phi_{pol})\, s^{out}
\]
Finally, the image intensity the camera actually “sees” is simply the first component (\(s_0\)) of this final vector:

\[
I_{\phi_{pol}} = s^{cam}_0
\]
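Here is a minimal differentiable sketch of this step in PyTorch. It is not the authors' code, and the sample Stokes vector and starting angle are made-up numbers; but the Mueller matrix matches the equation above, and autograd gives us \(\partial I / \partial \phi_{pol}\) for free, which is exactly what lets the angle be optimized alongside the geometry.

```python
import torch

def polarizer_mueller(phi_pol: torch.Tensor) -> torch.Tensor:
    """Mueller matrix of an ideal linear polarizer at angle phi_pol (radians).
    Differentiable w.r.t. phi_pol."""
    c, s = torch.cos(2 * phi_pol), torch.sin(2 * phi_pol)
    z = torch.zeros_like(c)
    rows = [
        torch.stack([torch.ones_like(c), c, s, z]),
        torch.stack([c, c * c, c * s, z]),
        torch.stack([s, c * s, s * s, z]),
        torch.stack([z, z, z, z]),
    ]
    return 0.5 * torch.stack(rows)

# Unknown polarizer angle, optimized jointly with the scene:
phi_pol = torch.tensor(0.3, requires_grad=True)

s_out = torch.tensor([0.9, 0.15, 0.0, 0.0])  # predicted outgoing Stokes vector
s_cam = polarizer_mueller(phi_pol) @ s_out   # light after the filter
intensity = s_cam[0]                         # what the camera records

intensity.backward()    # gradients flow back into phi_pol
print(phi_pol.grad)
```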
4. The Loss Function
The entire system is trained end-to-end. The goal is to minimize the difference between the rendered polarized image and the actual photo taken by the camera. The total loss combines three terms (a toy implementation follows the list below):

\[
\mathcal{L} = \mathcal{L}_{rgb} + \lambda_{mask}\,\mathcal{L}_{mask} + \lambda_{eik}\,\mathcal{L}_{eikonal}
\]
- \(\mathcal{L}_{rgb}\): The error between the rendered pixel and the captured pixel.
- \(\mathcal{L}_{mask}\): Keeps the object shape distinct from the background.
- \(\mathcal{L}_{eikonal}\): A regularization term that pushes the SDF’s gradient toward unit length everywhere, so it remains a valid distance field (keeping the geometry smooth and plausible).
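A toy version of the combined objective, assuming PyTorch; the weights `w_mask` and `w_eik` are placeholders, not the published values:

```python
import torch

def total_loss(pred_rgb, gt_rgb, pred_mask, gt_mask, sdf_grad,
               w_mask=0.1, w_eik=0.1):
    """Hypothetical composite loss in the spirit of the paper's objective."""
    l_rgb = (pred_rgb - gt_rgb).abs().mean()                 # photometric error
    l_mask = torch.nn.functional.binary_cross_entropy(
        pred_mask.clamp(1e-4, 1 - 1e-4), gt_mask)            # silhouette error
    l_eik = ((sdf_grad.norm(dim=-1) - 1.0) ** 2).mean()      # enforce |grad SDF| = 1
    return l_rgb + w_mask * l_mask + w_eik * l_eik
```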
Experiments & Results
The researchers built a prototype using a Sony A6400 camera and a standard linear polarizer. They captured about 40 images per object, rotating the camera around the subject. They tested on challenging objects: a “RedOx” (ceramic/metal hybrid), a glossy green ox, a porcelain cat, and a metallic bust.
Qualitative Results
The results are visually striking. In Figure 3 below, look at the “Our Normals” and “Our Mesh” rows. The surface normals (the rainbow-colored maps) are incredibly smooth, even in regions with sharp specular highlights. Standard RGB methods often produce “bumpy” normals in these areas because they mistake the reflection for a geometric bump.

Comparison with State-of-the-Art (SOTA)
The team compared their method against several leading neural reconstruction techniques, including Neural-PIL, PhySG, NVDiffRec, and Ref-NeuS.
As shown in Figure 4, competitors struggle significantly.
- PhySG tends to over-smooth geometry.
- NVDiffRec captures high-frequency noise, mistaking reflections for surface texture.
- Ours (the proposed method) captures intricate details, like the texture of the cat’s whiskers, without getting confused by the glossy finish.

The quantitative data backs this up. In Table 1, the proposed method consistently achieves the lowest Chamfer Distance (CD), which is a metric measuring the error between the reconstructed 3D shape and the true 3D scan.
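For reference, here is one common definition of the Chamfer Distance, sketched in NumPy. It is brute force for clarity; evaluation code typically uses a KD-tree, and some variants use squared distances.

```python
import numpy as np

def chamfer_distance(p: np.ndarray, q: np.ndarray) -> float:
    """Symmetric Chamfer Distance between point clouds p (N,3) and q (M,3):
    the average nearest-neighbor distance, taken in both directions."""
    d = np.linalg.norm(p[:, None, :] - q[None, :, :], axis=-1)  # (N, M) pairwise
    return d.min(axis=1).mean() + d.min(axis=0).mean()

# Lower is better: identical clouds score 0.
a = np.random.rand(100, 3)
print(chamfer_distance(a, a))  # 0.0
```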

Decomposing the Scene
One of the most powerful features of this approach is Radiance Decomposition. Because the method understands polarization, it can separate the image into “Diffuse” (matte color) and “Specular” (shiny reflection) components.
In Figure 5, we see a comparison on a synthetic “Bust” model. The “Mixed Radiance” is the final image, but the columns for “Diffuse” and “Specular” show how the network understands the material. It successfully identifies that the shine is a separate layer on top of the object.

Robustness: Does the “Unknown Angle” really work?
A major claim of the paper is that you don’t need to calibrate the polarizer. To prove this, they conducted a robustness analysis: they synthesized images with known polarizer angles (\(0^\circ, 45^\circ, 90^\circ\)) and checked whether the network could recover the geometry and accurately estimate the angles.
Figure 9 shows that despite variations in the input images (note the changing highlights in the red boxes), the resulting geometry remains consistent. Furthermore, the network estimated the polarizer angles with an error of less than \(5^\circ\).

Conclusion and Implications
This research represents a significant step forward in making high-end 3D scanning accessible. By leveraging the physical properties of light—specifically polarization—the authors have turned a difficult computer vision problem (specularities) into a source of valuable geometric data.
Key Takeaways:
- Low Cost: High-fidelity scanning of shiny objects is now possible without expensive polarization cameras.
- Ease of Use: The “unknown angle” optimization removes the need for tedious calibration.
- Physics + AI: Pure learning approaches often fail on edge cases. Integrating physical models (pBRDF) into neural networks (NeRFs) provides the constraints needed to solve complex inverse problems.
This method opens the door for better 3D scanning in e-commerce (scanning products for VR/AR), cultural heritage preservation (scanning glossy artifacts), and perhaps eventually, high-quality 3D scanning on mobile phones equipped with simple polarization filters.