Creating realistic “digital twins” of real-world objects is a cornerstone of modern computer graphics, powering everything from movie VFX to immersive VR/AR experiences. To make a digital object look real, you need two things: its shape (captured as surface normals) and its material properties (how shiny or rough each point is).
For years, this has been a tug-of-war between speed and quality. Traditional methods, like Photometric Stereo (PS), require capturing hundreds of High Dynamic Range (HDR) images under different lights. This is slow, data-heavy, and often fails on “tricky” materials—specifically, objects that are very shiny or metallic.
In a recent paper presented at CVPR, researchers proposed a groundbreaking solution called EventPSR. By ditching traditional cameras in favor of event cameras, and combining them with a clever lighting algorithm, they can recover shape and material properties simultaneously, faster, and with significantly less data.
In this post, we’ll dive into how EventPSR works, why event cameras are the secret weapon for 3D scanning, and how this method tackles the notorious problem of scanning shiny objects.
The Problem: The “Shiny Object” Dilemma
To understand the innovation here, we first need to understand the limitation of standard cameras.
When you take a picture of a matte (diffuse) object, light scatters evenly. It’s easy to estimate the shape. However, when you photograph a shiny (specular) object—like a metal ball or a glazed ceramic—light reflects in sharp, bright highlights.
Standard cameras struggle here for two reasons:
- Dynamic Range: The highlights are often “blown out” (pure white), while the rest of the object is dark. To fix this, you have to take multiple photos at different exposures (HDR), which takes time.
- Sparse Data: Specular reflections are directional. You only see the reflection if the light hits the surface at the perfect angle relative to the camera. To map the whole surface, you need to move the light source to hundreds of different positions.
This results in a slow capture process and massive data storage requirements. The researchers behind EventPSR asked: What if we use a camera that doesn’t capture frames, but instead captures changes in light?
Enter the Event Camera
Event cameras are bio-inspired sensors. Unlike a standard camera that captures a snapshot of the whole scene at fixed intervals (e.g., 30 frames per second), an event camera works asynchronously. Each pixel operates independently.
When a pixel detects a change in brightness (logarithmic intensity) that exceeds a certain threshold, it fires an “event”—a packet containing the timestamp, pixel coordinates, and polarity (whether it got brighter or darker).
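As a toy mental model (a sketch of the general principle, not the paper’s code), here is how a single pixel might turn a brightness signal into a stream of events:

```python
import numpy as np

def pixel_events(intensity, times, threshold=0.2):
    """Toy event generator for one pixel: emit a (+1/-1) event whenever the
    log-intensity drifts more than `threshold` away from the level at the last event."""
    log_i = np.log(np.asarray(intensity) + 1e-6)   # event cameras respond to log brightness
    events = []
    ref = log_i[0]                                 # reference level at the last event
    for t, value in zip(times, log_i):
        while value - ref >= threshold:            # brightness rose enough -> positive event
            ref += threshold
            events.append((t, +1))
        while ref - value >= threshold:            # brightness fell enough -> negative event
            ref -= threshold
            events.append((t, -1))
    return events

# Example: a pixel watching a highlight sweep past fires a burst of +1s, then -1s.
t = np.linspace(0.0, 1.0, 500)
brightness = 0.05 + np.exp(-((t - 0.5) ** 2) / 0.005)   # brief highlight at t = 0.5
print(pixel_events(brightness, t)[:5])
```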
This event-driven design offers two massive advantages for 3D scanning:
- High Dynamic Range (HDR): Event cameras can see detail in very dark shadows and very bright highlights simultaneously.
- High Temporal Resolution: They can detect changes in microseconds.
However, using an event camera introduces a new challenge: You don’t get a “picture.” You get a stream of data points. The researchers had to figure out how to translate this stream into a 3D model with material textures.
The EventPSR Method
The EventPSR approach consists of two main components: a specifically designed Light Scanning Pattern and a Two-Stage Reconstruction Algorithm.

As shown in Figure 1 above, the pipeline takes a target object, blasts it with a light pattern, records the “event stream,” and processes it to output the Normal Map (shape), Metallic map, and Roughness map.
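In code terms, you can think of the pipeline’s interface roughly like this (hypothetical names, purely to fix ideas):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Event:
    t: float       # timestamp (microsecond-scale)
    x: int         # pixel column
    y: int         # pixel row
    polarity: int  # +1 got brighter, -1 got darker

@dataclass
class SurfaceMaps:
    normal: np.ndarray     # (H, W, 3) unit surface normals
    metallic: np.ndarray   # (H, W) values in [0, 1]
    roughness: np.ndarray  # (H, W) values in [0, 1]

def reconstruct(events: list[Event], light_pattern) -> SurfaceMaps:
    """Placeholder for the two-stage algorithm described below."""
    ...
```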
1. Designing the Perfect Light Pattern
You can’t just shine a flashlight at an object and expect the event camera to understand the shape. The light needs to change over time to trigger the events. The researchers analyzed three different lighting patterns to see which worked best:
- Point Light: A single dot moving in a spiral.
- Structured Environment Map: Complex patterns flashing on a screen.
- Moving Ring Light: A ring of light sweeping across the object.
They evaluated these patterns based on three criteria:
- Specular Coverage: Does it hit the shiny parts?
- Diffuse Sensitivity: Does it reveal the shape of matte parts?
- Event Efficiency: Does it generate useful data without overloading the system with noise?

Figure 2 illustrates why they chose the Moving Ring Light:
- Column A (Specular Coverage): The Point light (left) only lights up a tiny dot. The Ring light (middle) lights up a broad strip, covering the surface much faster.
- Column B (Diffuse Sensitivity): The Structured Map (right) creates chaotic signals that make it hard to distinguish shape. The Ring light creates a smooth, readable curve.
- Column C (Efficiency): The Point light creates a massive spike of events (a data bottleneck) when it hits a highlight. The Ring light spreads the data out more evenly.
The Verdict: The “Moving Ring Light” pattern offered the best balance, covering the full sphere of illumination angles efficiently.
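To make the idea concrete, a moving ring of light can be parameterized as a circle of directions on the unit sphere whose elevation sweeps over time. A minimal sketch (our own parameterization, not necessarily the one used in the paper):

```python
import numpy as np

def ring_light_directions(elevation, n_lights=64):
    """Unit direction vectors for a ring of lights at a given elevation angle.
    Sweeping `elevation` from -pi/2 to +pi/2 over time moves the ring across
    the whole sphere of incoming light directions."""
    azimuth = np.linspace(0.0, 2.0 * np.pi, n_lights, endpoint=False)
    z = np.full(n_lights, np.sin(elevation))
    r = np.cos(elevation)
    return np.stack([r * np.cos(azimuth), r * np.sin(azimuth), z], axis=-1)

# One full sweep: the ring starts near the "south pole" and ends near the "north pole".
for elev in np.linspace(-np.pi / 2, np.pi / 2, 180):
    directions = ring_light_directions(elev)   # feed these to the display / renderer
```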
2. The Algorithm: From Events to Properties
Once the data is captured, how do we turn blips of light into a 3D material? The authors propose a two-stage process.
Stage 1: Grid-Matching (The Coarse Search)
The researchers first built a massive “dictionary” of what an event stream should look like for every plausible combination of surface normal (angle), roughness, and metalness. They generated these entries with a rendering engine.
When the camera records a pixel’s event stream, the algorithm compares that stream against the dictionary. It looks for the entry that most closely resembles the real-world data.
The loss function (the math used to calculate the difference) compares the observed brightness changes (reconstructed from the events) with the predicted brightness changes (from the rendered dictionary) over time. This stage gives a “coarse” result—it gets us in the ballpark of the correct shape and material.
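One plausible form of this matching loss (our notation; the paper’s exact formulation may differ) is:

$$
\mathcal{L}(\mathbf{n}, m, r) \;=\; \sum_{t} \left\| \Delta \hat{L}^{\mathrm{obs}}_{t} - \Delta L_{t}(\mathbf{n}, m, r) \right\|^{2},
$$

where $\Delta \hat{L}^{\mathrm{obs}}_{t}$ is the brightness change reconstructed from the events in time bin $t$, and $\Delta L_{t}(\mathbf{n}, m, r)$ is the change predicted by the rendered dictionary for a candidate normal $\mathbf{n}$, metallic value $m$, and roughness $r$. The coarse search is then just a nearest-neighbour lookup over the grid—roughly:

```python
import numpy as np

def coarse_match(observed_profile, dictionary):
    """Nearest-neighbour search over a precomputed dictionary.
    `dictionary` maps (normal, metallic, roughness) -> simulated brightness-change
    profile (a 1-D array over time bins); `observed_profile` comes from the events."""
    best_params, best_loss = None, np.inf
    for params, predicted_profile in dictionary.items():
        loss = np.sum((observed_profile - predicted_profile) ** 2)  # the L2 loss above
        if loss < best_loss:
            best_params, best_loss = params, loss
    return best_params
```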
Stage 2: Gradient-Tuning (The Fine Tuning)
The grid-matching stage is limited by the resolution of the dictionary (the grid). If the true surface normal falls between two grid points, the result will be slightly off.
To fix this, the second stage uses gradient descent. It takes the coarse result and mathematically nudges the values of Normal, Metallic, and Roughness until the error is minimized.
Crucially, they also added a “smoothness” constraint. Since shiny objects can have “dead zones” where no light reflects (and thus no events are triggered), the algorithm assumes that neighboring pixels likely have similar properties. This fills in the gaps for highly reflective surfaces.
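The broad shape of such a refinement loop looks roughly like the following sketch (PyTorch-flavoured, with a stand-in `render_profiles` function and hand-picked weights—the paper’s actual differentiable renderer and hyperparameters will differ):

```python
import torch

def tv(x):
    """Total-variation-style smoothness: penalize differences between neighbouring pixels."""
    return ((x[:, 1:] - x[:, :-1]) ** 2).mean() + ((x[1:, :] - x[:-1, :]) ** 2).mean()

def refine(coarse_normal, coarse_metallic, coarse_roughness,
           observed, render_profiles, steps=200, lr=1e-2, smooth_weight=0.1):
    """Gradient-tune the per-pixel (normal, metallic, roughness) maps, starting from
    the coarse grid-matching result. `render_profiles` stands in for a differentiable
    renderer that predicts brightness-change profiles from the current parameter maps."""
    n = torch.tensor(coarse_normal, requires_grad=True)     # (H, W, 3)
    m = torch.tensor(coarse_metallic, requires_grad=True)   # (H, W)
    r = torch.tensor(coarse_roughness, requires_grad=True)  # (H, W)
    opt = torch.optim.Adam([n, m, r], lr=lr)

    for _ in range(steps):
        opt.zero_grad()
        predicted = render_profiles(torch.nn.functional.normalize(n, dim=-1), m, r)
        data_loss = ((predicted - observed) ** 2).mean()
        # Smoothness prior: neighbouring pixels share similar properties, which
        # fills in "dead zones" where no events were ever triggered.
        loss = data_loss + smooth_weight * (tv(n) + tv(m) + tv(r))
        loss.backward()
        opt.step()

    return n.detach(), m.detach(), r.detach()
```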
The Experimental Setup
To validate this, the researchers built a “cage” of commodity computer monitors.

As seen in Figure 3, the target object hangs in the center. Five monitors surround it to project the “Moving Ring” pattern from all angles. The event camera sits at the bottom, looking up. This setup allows them to control the lighting precisely while capturing the high-speed event data.
Results: Does it Work?
The researchers tested EventPSR against state-of-the-art methods, including neural inverse-rendering approaches (such as NeILF) and frame-based photometric stereo methods (such as SDM-UniPS).
Efficiency
One of the most impressive stats is data efficiency. Because event cameras only record changes, they don’t store redundant static background data.
- EventPSR uses only ~30% of the data rate compared to frame-based methods.
- It achieves this while matching or beating the accuracy of methods that require hundreds of full-frame HDR images.
Synthetic Data Performance

Figure 5 shows a comparison on synthetic objects.
- Top Row (Normals): Look at the “Error” columns. The EventPSR error map is largely dark (low error), whereas the NeILF method shows significant bright spots (high error), indicating it struggled to guess the shape correctly.
- Bottom Row (Metallic): EventPSR recovers the metallic map almost perfectly compared to the Ground Truth.
Real-World Performance
The real test is on physical objects with complex material mixtures.

Figure 8 displays the results for three challenging real-world objects:
- The Nose (Left): A uniform, shiny plastic. EventPSR captures the smooth curvature without getting confused by the specular highlights.
- The Cat in a Bowl (Middle): A mix of a diffuse cat and a highly reflective (mirror-like) bowl. The “Metallic” map correctly identifies the bowl as metal and the cat as non-metal.
- The Splatoon Figure (Right): A complex object with varying roughness (smooth hair vs. rough skin). The “Roughness” map accurately segments these different textures.
These results are significant because traditional scanners usually demand that you spray shiny objects with a matte powder to scan them. EventPSR embraces the shininess, using the reflections to calculate the material properties rather than being blinded by them.
Conclusion and Future Implications
EventPSR represents a significant step forward in photometric stereo. By leveraging the unique strengths of event cameras—specifically their dynamic range and temporal resolution—the researchers have addressed a long-standing bottleneck in 3D scanning.
Key Takeaways:
- Simultaneous Capture: It recovers Shape, Roughness, and Metallic properties all at once.
- Robustness: It works on everything from matte clay to mirror-finished metal.
- Efficiency: It requires significantly less data bandwidth than traditional HDR imaging.
While the current setup uses a fixed rig of monitors (limiting the speed slightly due to monitor refresh rates), the underlying algorithm proves that event cameras are not just for motion tracking—they are powerful tools for high-fidelity 3D reconstruction. As event sensors become more common in robotics and mobile devices, we could see this technology enabling high-quality 3D scanning right from our pockets.