Introduction

In the world of computer vision, we usually obsess over the intensity of light—how bright or dark a pixel is. But light carries another hidden layer of information: polarization. When light bounces off an object, the orientation of its oscillating electric field can change. These changes encode rich details about the object’s shape, material composition, and surface texture that standard cameras simply miss.

To capture this information fully, scientists use Ellipsometry. This technique measures the “Mueller matrix,” a \(4 \times 4\) grid of numbers that completely describes how a material transforms polarized light. It is a powerful tool used in everything from biology to materials science.

There is just one problem: It is incredibly slow.

Traditional ellipsometers work by mechanically rotating optical filters and taking many pictures with a standard camera. This takes seconds or even minutes, meaning you can only photograph static objects. If the object moves, or if you want to capture a dynamic scene (like a human face changing expression), traditional ellipsometry fails.

Enter the Event Ellipsometer.

In a fascinating new paper, researchers have proposed a method to capture Mueller-matrix videos at 30 frames per second (fps). They achieve this not by using a faster standard camera, but by switching to an Event Camera—a bio-inspired sensor that works completely differently from the camera in your phone. By combining this sensor with fast-rotating optics, they can capture high-speed, high-dynamic-range (HDR) polarization videos.

Overview of Event Ellipsometer showing the setup and various applications like dynamic scene analysis.

In this post, we will break down this paper to understand how the authors combined neuromorphic engineering with classical optics to see the invisible world of polarization in real time.


Background: The Building Blocks

To understand how the Event Ellipsometer works, we need to bridge two distinct fields: Polarimetric Imaging and Event-Based Vision.

1. The Mueller Matrix

Light can be described by a Stokes Vector (\(\mathbf{s}\)), a 4-element vector that captures its intensity and polarization state. When light hits an object, the object transforms that light. Mathematically, this transformation is a matrix multiplication.

The object’s “fingerprint” is the Mueller Matrix (\(\mathbf{M}\)), a \(4 \times 4\) matrix. If you know \(\mathbf{M}\), you know everything about how that object reflects polarized light.

  • Diagonal elements often relate to depolarization.
  • Off-diagonal elements can reveal birefringence (material stress) or orientation.

The goal of this research is to figure out the values of this matrix for every pixel in a video.
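
To make this concrete, here is a textbook example (not from the paper): an ideal horizontal linear polarizer has the Mueller matrix below, and applying it to unpolarized light of unit intensity yields horizontally polarized light at half the original intensity.

\[
\mathbf{s}_{\text{out}} = \mathbf{M}\,\mathbf{s}_{\text{in}}, \qquad
\frac{1}{2}
\begin{pmatrix} 1 & 1 & 0 & 0 \\ 1 & 1 & 0 & 0 \\ 0 & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{pmatrix}
\begin{pmatrix} 1 \\ 0 \\ 0 \\ 0 \end{pmatrix}
=
\begin{pmatrix} 0.5 \\ 0.5 \\ 0 \\ 0 \end{pmatrix}.
\]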

2. Event Cameras vs. Frame Cameras

Standard cameras capture absolute brightness at fixed intervals (e.g., every 33 ms). If the scene is dark, they get noisy. If it’s too bright, they get overexposed.

Event cameras (or Dynamic Vision Sensors) are asynchronous. Each pixel works independently. It doesn’t report “how bright I am”; it reports “did I just change?” When the log-intensity of a pixel changes by a certain threshold, it fires an “event”—a microsecond-timestamped signal.

This gives event cameras two superpowers utilized in this paper:

  1. High Temporal Resolution: They can detect changes in microseconds, allowing them to track very fast optical modulations.
  2. High Dynamic Range (HDR): Since each pixel operates independently, one pixel can look at a bright highlight while its neighbor looks at a deep shadow, and both still return a useful signal.
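
To make that firing rule concrete, here is a minimal sketch of an idealized event pixel (my own illustration, not the authors’ code): it compares the current log-intensity against a reference level and emits a timestamped, signed event each time the difference crosses the contrast threshold \(C\).

```python
import numpy as np

def simulate_events(timestamps, intensity, C=0.2):
    """Idealized event pixel: emit (t, polarity) whenever the log-intensity
    drifts from the last reference level by more than the threshold C."""
    events = []
    ref = np.log(intensity[0])              # log-intensity at the last event
    for t, I in zip(timestamps, intensity):
        delta = np.log(I) - ref
        while abs(delta) >= C:              # a large change can fire several events
            polarity = 1 if delta > 0 else -1
            events.append((t, polarity))
            ref += polarity * C
            delta = np.log(I) - ref
    return events

# Example: a sinusoidally modulated pixel over one 33 ms frame
# (the 120 Hz modulation frequency is arbitrary, chosen only for illustration).
t = np.linspace(0.0, 0.033, 1000)
I = 1.0 + 0.5 * np.sin(2 * np.pi * 120 * t)
print(len(simulate_events(t, I)))           # number of events fired
```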

The Hardware: A Symphony of Rotation

The researchers designed a setup that looks like a standard photography rig but behaves like a strobe light experiment.

Schematic diagram of the optical arrangement, timeline of rotation, and the hardware prototype.

The Setup

As shown in the schematic above, the system places the target object between two sets of optics:

  1. Illumination Side: An LED light source is passed through a Linear Polarizer (LP) and a rotating Quarter-Wave Plate (QWP).
  2. Detection Side: The light reflects off the object, passes through another rotating QWP and a fixed LP, and hits the Event Camera.

The Modulation Strategy

Here is the clever part: The two QWPs are not stationary. They are rotating continuously at high speeds.

  • The light-source QWP rotates at a speed of \(\omega\).
  • The camera-side QWP rotates at a speed of \(5\omega\).

Why rotate them? By spinning the plates, the system constantly changes the polarization state of the light hitting the object and the polarization state the camera is looking for. This creates a time-varying signal at every pixel. The specific ratio of \(1:5\) in rotation speeds ensures that the resulting signal encodes enough unique information to mathematically solve for all 16 elements of the Mueller matrix.

The motors spin fast enough that a full measurement cycle happens in just 33 milliseconds, allowing for 30 fps video reconstruction.
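
Written as angle schedules (my notation, with \(\phi_1\) and \(\phi_2\) denoting the unknown starting offsets handled later during calibration), the fast-axis angles of the two plates evolve as:

\[
\theta_{\text{illum}}(t) = \omega t + \phi_1, \qquad \theta_{\text{cam}}(t) = 5\,\omega t + \phi_2.
\]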


The Core Method: From Events to Matrices

This is the heart of the paper. How do we go from a stream of binary “change detection” events to a complex optical matrix?

Step 1: Modeling the Intensity

First, we need a mathematical model of what the light intensity should be at any given time \(t\). Using standard polarization calculus, the intensity \(I_t\) at the sensor is a product of the optical components.

Equation describing the intensity It as a product of Stokes vectors and Mueller matrices of the optical components.

In this equation:

  • \(\mathbf{L}(0)\) is the linear polarizer.
  • \(\mathbf{Q}(\theta)\) is the rotating quarter-wave plate.
  • \(\mathbf{M}\) is the unknown scene Mueller matrix we want to find.
  • \(\mathbf{s}\) is the source light.
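
Putting these together, the dual-rotating-retarder model has the standard form sketched below, where the detected intensity is the first element of the Stokes vector arriving at the sensor (my notation; the paper’s exact expression may include calibration offsets and scale factors):

\[
I_t = \Big[\, \mathbf{L}(0)\,\mathbf{Q}(5\omega t)\,\mathbf{M}\,\mathbf{Q}(\omega t)\,\mathbf{L}(0)\,\mathbf{s} \,\Big]_{0}.
\]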

By expanding this multiplication, the researchers simplified the relationship into a vector form. The intensity \(I_t\) becomes the dot product of a “system vector” \(\mathbf{A}_t\) (which depends on the known angles of the motors) and the vectorized Mueller matrix \(\hat{\mathbf{M}}\).

\[
I_t = \mathbf{A}_t^{\top}\,\hat{\mathbf{M}}
\]

The vector \(\mathbf{A}_t\) is intricate, composed of sines and cosines derived from the rotation angles of the two motors (\(\omega t\) and \(5\omega t\)).

Detailed definition of the system vector At and its trigonometric components.

Step 2: The Event Camera Model

Here is the challenge: Event cameras do not measure \(I_t\). They measure changes in log-intensity.

Mathematically, an event is triggered when the change in log-intensity exceeds a threshold \(C\). The paper derives the relationship between the time difference between events (\(\Delta t\)) and the Mueller matrix.

They start by taking the derivative of the log-intensity with respect to time:

\[
\frac{\partial \log I_t}{\partial t} \;=\; \frac{1}{I_t}\frac{\partial I_t}{\partial t} \;=\; \frac{\big(\partial \mathbf{A}_t / \partial t\big)^{\top}\hat{\mathbf{M}}}{\mathbf{A}_t^{\top}\hat{\mathbf{M}}}
\]

This equation looks intimidating, but the numerator \(\frac{\partial \mathbf{A}_t}{\partial t}\) is simply the time-derivative of the known motor positions.

Derivative of the system vector At with respect to time.

Now, we connect this to the physics of the event camera. The change in log-intensity is related to the event polarity \(p_k\) (+1 for getting brighter, -1 for darker), the contrast threshold \(C\), and the time gap between events \(\Delta t_k\).

\[
\frac{\partial \log I_t}{\partial t}\bigg|_{t = t_k} \;\approx\; \frac{p_k\, C}{\Delta t_k}
\]

Step 3: The Reconstruction Equation

By combining the optical model (Step 1) with the sensor model (Step 2), the researchers arrived at a homogeneous system of linear equations. This is the “Master Equation” of the paper. It relates the measured time gaps between events directly to the unknown Mueller matrix \(\hat{\mathbf{M}}\).

\[
\mathbf{B}\,\hat{\mathbf{M}} = \mathbf{0}
\]

For every pair of consecutive events at a pixel, we get one row of the matrix \(\mathbf{B}\). If we collect enough events during the 33 ms window, we can stack these rows and solve for \(\hat{\mathbf{M}}\) such that \(\mathbf{B}\hat{\mathbf{M}} = 0\).
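
To see where the rows of \(\mathbf{B}\) come from, equate the optical derivative (Step 1) with the event-based estimate (Step 2) and clear the denominator; each consecutive event pair then contributes one homogeneous row (a sketch in my notation):

\[
\frac{p_k\, C}{\Delta t_k} = \frac{\big(\partial \mathbf{A}_{t_k} / \partial t\big)^{\top}\hat{\mathbf{M}}}{\mathbf{A}_{t_k}^{\top}\hat{\mathbf{M}}}
\;\;\Longrightarrow\;\;
\Big(\Delta t_k\, \tfrac{\partial \mathbf{A}_{t_k}}{\partial t} - p_k\, C\, \mathbf{A}_{t_k}\Big)^{\top}\hat{\mathbf{M}} = 0.
\]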


Reconstruction Pipeline: Solving the Puzzle

Ideally, we would just solve this linear system directly and be done. In reality, event data is noisy, and in dark or weakly modulated areas we might not get enough events. The authors proposed a robust two-stage pipeline to handle this.

Diagram of the reconstruction pipeline: Per-pixel estimation followed by Spatio-temporal propagation.

Stage 1: Per-Pixel Estimation

For each pixel, they collect all events within a frame duration. They solve an optimization problem to find the Mueller matrix that minimizes the error.

Weighted least squares minimization problem to find M.

Crucially, they don’t just accept any mathematical answer. A Mueller matrix must obey physical laws (e.g., you can’t reflect more energy than you received, and you can’t have “negative” polarization). They apply Cloude’s Filter, a mathematical projection that forces the matrix to be physically valid.

Application of Cloude’s filter to ensure physical validity.
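
In broad strokes (my summary of the standard Cloude filter, not necessarily the paper’s exact formulation): the Mueller matrix is mapped to an equivalent Hermitian coherency matrix, whose eigenvalues must all be non-negative for a physically realizable medium; negative eigenvalues are clipped to zero and the result is mapped back to a Mueller matrix.

\[
\mathbf{H} = \sum_{i=1}^{4} \lambda_i\, \mathbf{v}_i \mathbf{v}_i^{\dagger}
\;\longrightarrow\;
\mathbf{H}' = \sum_{i=1}^{4} \max(\lambda_i, 0)\, \mathbf{v}_i \mathbf{v}_i^{\dagger}.
\]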

They also use an iterative weighting scheme. If a specific event seems like an outlier (statistical noise), its weight \(w_k\) is reduced in the next iteration.

Weight update equation to downweight outliers.
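
Putting Stage 1 together, a minimal per-pixel solver could look like the sketch below (hypothetical code, not the authors’ implementation): build the homogeneous system, solve it with an SVD, and iteratively down-weight rows with large residuals. The Gaussian weighting rule and the omission of Cloude’s filter are simplifications on my part.

```python
import numpy as np

def solve_pixel(B, n_iters=5, sigma=0.1):
    """Estimate the vectorized Mueller matrix from the per-pixel homogeneous
    system B @ m ~= 0, where each row of B comes from one event pair."""
    w = np.ones(len(B))                        # per-event weights
    for _ in range(n_iters):
        Bw = B * w[:, None]                    # apply current weights row-wise
        # The right singular vector with the smallest singular value solves
        # min ||Bw m|| subject to ||m|| = 1.
        _, _, Vt = np.linalg.svd(Bw, full_matrices=False)
        m_hat = Vt[-1]
        residuals = B @ m_hat                  # large residuals flag outliers
        w = np.exp(-(residuals / sigma) ** 2)  # soft down-weighting (illustrative)
    return m_hat.reshape(4, 4)                 # back to a 4x4 Mueller matrix
```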

Stage 2: Spatio-Temporal Propagation

Some pixels might not have enough events to get a good estimate (event cameras only see change, so a pixel whose intensity barely varies fires few or no events).

To fix this, the algorithm borrows information from neighbors. They use a technique similar to PatchMatch. The algorithm looks at a pixel’s neighbors in space (x, y) and time (previous/next frames). If a neighbor has a Mueller matrix that fits the current pixel’s event data better than its own estimate, the pixel adopts the neighbor’s matrix.

Propagation equation where a pixel adopts a neighbor’s matrix if it reduces error.

Visualization of the spatio-temporal propagation pattern.

Finally, they refine the result by adding small random perturbations. If a random tweak to the matrix reduces the error, they keep it. This helps the algorithm escape local minima and fine-tune the result.

Refinement step using random perturbations to optimize the matrix.
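
A compact way to picture Stage 2 is a per-pixel “keep the best hypothesis” loop: test the neighbors’ matrices and a few random perturbations against the pixel’s own event data, and keep whichever fits best. The sketch below is hypothetical and propagates only in space for brevity (the paper also propagates across neighboring frames); the event_error helper stands in for the weighted residual used in Stage 1.

```python
import numpy as np

def event_error(B, M):
    """Fit error of a candidate Mueller matrix M against a pixel's event rows B."""
    m = M.reshape(-1)
    return np.sum((B @ (m / np.linalg.norm(m))) ** 2)

def propagate_and_refine(M_est, B_per_pixel, n_perturb=8, scale=0.02):
    """M_est[y][x] is a 4x4 estimate; B_per_pixel[y][x] is that pixel's event system."""
    H, W = len(M_est), len(M_est[0])
    for y in range(H):
        for x in range(W):
            B, best = B_per_pixel[y][x], M_est[y][x]
            # Propagation: adopt a neighbor's matrix if it explains our events better.
            for dy, dx in [(-1, 0), (1, 0), (0, -1), (0, 1)]:
                if 0 <= y + dy < H and 0 <= x + dx < W:
                    cand = M_est[y + dy][x + dx]
                    if event_error(B, cand) < event_error(B, best):
                        best = cand
            # Refinement: keep small random perturbations that reduce the error.
            for _ in range(n_perturb):
                cand = best + scale * np.random.randn(4, 4)
                if event_error(B, cand) < event_error(B, best):
                    best = cand
            M_est[y][x] = best
    return M_est
```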


Calibration: Taming the Hardware

Before any of this works, the system must be calibrated. The two critical unknowns are the event camera’s contrast threshold (\(C\)) and the exact starting angles of the QWPs.

Contrast Threshold (\(C\))

The parameter \(C\) determines how sensitive the camera is. The authors calibrated this by shining a light that increases and decreases linearly in brightness. By matching the known light ramp to the number of events generated, they could calculate \(C\) for every single pixel.
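
The underlying intuition (in my notation, glossing over noise and separate positive/negative thresholds): if the log-intensity sweeps over a known range and \(N\) events of one polarity fire along the way, each event accounts for one threshold crossing, so

\[
C \approx \frac{\log I_{\max} - \log I_{\min}}{N}.
\]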

Contrast threshold calibration setup and results showing the linear fit.

Angle Calibration

The motors might not start at exactly zero degrees. To fix this, they placed a known QWP in the path. Since they knew the Mueller matrix of the reference QWP, they could perform a grid search to find the offset angles that produced the minimum error in their reconstruction.
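
Conceptually (again in my notation), the grid search picks the offset pair that minimizes the mismatch between the reconstructed matrix and the known reference plate:

\[
(\phi_1^{*}, \phi_2^{*}) = \arg\min_{\phi_1,\, \phi_2}\; \big\| \hat{\mathbf{M}}(\phi_1, \phi_2) - \mathbf{M}_{\text{QWP}} \big\|_F^{2}.
\]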

Calibration of QWP offset angles using a reference target and grid search error map.


Experimental Results

The researchers validated the Event Ellipsometer extensively against synthetic data and real-world scenarios.

1. Synthetic Validation

They rendered a synthetic scene with known polarization properties (silicone and brass objects). As shown below, the raw initialization (SVD) is noisy, but the full pipeline (with propagation and refinement) recovers the ground truth Mueller matrix almost perfectly.

Synthetic data evaluation showing the improvement from initialization to the full pipeline.

2. Accuracy on Real Objects

They tested standard optical elements (like air, polarizers, and wave plates). The reconstruction error was incredibly low (Mean Squared Error \(\approx 0.015\)).

Assessment on real data showing low error for known optical elements and strong signals for a metal plate.

3. Application: Photoelasticity

This is one of the coolest applications. Many transparent materials (like plastic or gelatine) become birefringent when squeezed—meaning they change the polarization of light passing through them. This effect is usually invisible to the naked eye.

The Event Ellipsometer captures this stress distribution in real-time. As force is applied to a gelatine disk, complex fringe patterns emerge in the Mueller matrix video, visualizing the internal mechanical stress.

Photoelasticity analysis of a gelatine disk showing stress patterns under force.

Detailed visualization of the photoelasticity scene showing different polarization components.

4. Application: Transparent Object Detection

Finding transparent tape on a box is hard for standard RGB cameras because the tape is… transparent. However, the stretching process used to make tape aligns its polymer chains, making it birefringent. The Event Ellipsometer sees this immediately.

Transparent tape detection showing the tape clearly visible in the Mueller matrix but hidden in RGB.

5. Application: Dynamic Humans and HDR

Because event cameras have such high dynamic range, this system can scan a human face (which has dark hair and shiny, specular skin) without issues. It captures the diffuse scattering from the skin and the specular reflections from the forehead simultaneously, even while the subject is moving.

Dynamic human capture of face and hair revealing polarimetric properties.

HDR capabilities showing successful capture of both bright specular and dark diffuse regions.


Conclusion

The Event Ellipsometer represents a significant leap forward in computational imaging. By stepping away from the “frame-by-frame” paradigm of traditional cameras and embracing the asynchronous nature of event sensors, the authors turned a slow, static measurement technique into a real-time video capability.

Key Takeaways:

  1. Speed: Mueller-matrix imaging is no longer limited to static lab samples; it can now run at 30 fps.
  2. Robustness: The high dynamic range of event cameras allows for scanning shiny and dark objects in the same scene.
  3. Algorithmic Innovation: The robust reconstruction pipeline effectively handles the noise inherent in event data, turning sparse temporal spikes into dense spatial maps.

This technology opens the door for exciting applications in industrial quality control (checking stress in glass on a conveyor belt), medical imaging (analyzing skin properties in real-time), and advanced 3D scanning. It proves that sometimes, to see more, we don’t need more pixels—we just need to look at the changes.