In the world of computer vision and robotics, 3D reconstruction is the holy grail. Whether it’s a robot navigating a warehouse, a VR headset mapping your living room, or a Mars rover scanning a dune, the ability to turn the real world into a digital 3D model is critical.
For years, the gold standard for handheld scanning (like what you might find on a high-end smartphone) has been a combination of an RGB camera and a sparse LiDAR sensor. This setup works reasonably well in perfect conditions. But the real world isn’t perfect. We encounter dark rooms, white textureless walls, and black objects that absorb light. In these “challenging” scenarios, traditional RGB-based reconstruction fails because it can’t “see” features, and sparse LiDAR fails because it doesn’t capture enough data points to fill in the gaps.
A research team from MIT has proposed a counterintuitive solution: Blurred LiDAR.
In their paper, “Blurred LiDAR for Sharper 3D”, the authors demonstrate that by using a diffuse (blurry) flash of laser light instead of precise dots, and fusing that data with RGB images using a smart algorithm, we can achieve significantly sharper, more robust 3D scans.

The Problem: When Cameras and Lasers Fail
To understand the innovation, we first need to look at why current methods struggle.
1. RGB Cameras: Modern 3D reconstruction techniques, like Neural Radiance Fields (NeRFs), rely heavily on matching textures across different images. If you point an RGB camera at a smooth white wall or a black leather chair in a dimly lit room, the algorithm struggles. It can’t find unique “features” to lock onto, resulting in a messy or failed 3D model.
2. Sparse LiDAR: To help cameras, devices like the iPhone Pro use sparse LiDAR. This sensor projects a grid of distinct infrared dots onto the scene. By measuring how long it takes for each dot to bounce back (Time of Flight), it calculates depth.
The problem is in the name: Sparse. These sensors only measure depth at specific points. If a small object falls between the dots, the sensor misses it entirely. To get a full picture, you have to move the device around extensively to “paint” the scene with dots, which isn’t always practical.
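To put rough numbers on the time-of-flight idea (a back-of-the-envelope example, not taken from the paper): light travels at about \(3 \times 10^{8}\) m/s, so a photon that returns \(t = 10\) nanoseconds after the pulse leaves has covered a 3-meter round trip, giving a depth of

$$ d = \frac{c\,t}{2} = \frac{(3 \times 10^{8}\ \text{m/s}) \times (10 \times 10^{-9}\ \text{s})}{2} = 1.5\ \text{m} $$

Resolving depth down to a centimeter therefore requires timing photons to within tens of picoseconds.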
The Solution: Diffuse (Blurred) LiDAR
The researchers propose replacing the grid of sharp dots with a diffuse flash. Imagine a camera flash, but in the infrared spectrum.

As illustrated in Figure 2, sparse LiDAR (a) gives precise depth at specific points but leaves huge gaps. Diffuse LiDAR (b) covers 100% of the scene in the field of view. Every pixel in the sensor receives light from the scene.
However, there is a catch. Because the light is diffuse, a single pixel on the sensor receives light from a wide area (a wide “cone”) of the scene. This introduces spatial ambiguity. The sensor knows when the photons returned (distance), but because the pixel sees a wide area, it’s not immediately obvious where within that area the object is located. The image is effectively spatially “blurred.”
This seems like a step backward—trading precision for blur. But this is where the fusion comes in. The researchers realized that RGB cameras and Diffuse LiDAR are perfect opposites:
- RGB: High spatial resolution (sharp image), but poor depth information.
- Diffuse LiDAR: Low spatial resolution (blurry image), but rich depth information (metric depth).
By mathematically combining these two signals, the system can use the sharp edges from the RGB image to “de-blur” the LiDAR data, resulting in a reconstruction that is better than either sensor could achieve alone.
The Math of Light and Time
To understand how the computer processes this, we have to look at how LiDAR measures time.
In a conventional, ideal LiDAR setup, a laser hits a single point \(\mathbf{x}\). The sensor measures the time of flight (\(t\)) for that specific point. Mathematically, this is a “delta function”—a sharp spike in the signal at the exact moment the light returns:

$$ i(t) = \alpha \, \delta\!\left(t - \frac{2\,\lVert\mathbf{x}\rVert}{c}\right) $$

Here \(c\) is the speed of light, \(\lVert\mathbf{x}\rVert\) is the distance from the sensor to the point, and \(\alpha\) accounts for reflectance and falloff.
However, with Diffuse LiDAR, a single sensor pixel collects light from a whole region of surface points (\(\Omega\)). The signal the sensor receives is a sum (integral) of all the reflections from that region:

$$ i(t) = \int_{\Omega} \alpha(\mathbf{x}) \, \delta\!\left(t - \frac{2\,\lVert\mathbf{x}\rVert}{c}\right) d\mathbf{x} $$
This integral creates a transient histogram—a graph of photon intensity over time. Instead of a single spike, the sensor sees a complex curve of peaks and valleys representing objects at different distances within that pixel’s view.
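To make the transient histogram concrete, here is a small NumPy toy (my own sketch, not the authors' code) that bins the returns a single diffuse pixel would record from a few surface points at different distances; the bin width and reflectance values are illustrative assumptions:

```python
import numpy as np

C = 3e8            # speed of light, m/s
BIN_WIDTH = 1e-10  # assumed histogram bin width, s (~1.5 cm of depth per bin)
NUM_BINS = 256

def transient_histogram(distances, reflectances):
    """Accumulate returns from many surface points into time-of-flight bins.

    A point at distance d contributes at t = 2d / C, so a diffuse pixel whose
    cone spans several surfaces records a mixture of peaks, not a single spike.
    """
    hist = np.zeros(NUM_BINS)
    tof = 2.0 * np.asarray(distances) / C             # round-trip travel times
    bins = np.floor(tof / BIN_WIDTH).astype(int)
    for b, r in zip(bins, reflectances):
        if 0 <= b < NUM_BINS:
            hist[b] += r                               # stronger returns add more signal
    return hist

# One pixel whose cone covers a chair edge (~1.2 m) and the wall behind it (~2.0 m)
hist = transient_histogram([1.20, 1.21, 1.22, 2.00, 2.02], [0.8, 0.8, 0.7, 0.3, 0.3])
print(np.nonzero(hist)[0])   # two clusters of occupied bins, one per surface
```

Each surface inside the pixel's cone produces its own cluster of occupied bins, which is exactly the "peaks and valleys" curve described above.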
Why Diffuse is Better for Recovery
You might wonder if this “mixed” signal makes it impossible to recover the 3D shape. The authors performed a “recoverability analysis” to test this. They simulated how much information can be recovered from sparse vs. diffuse signals given a limited number of views.

As shown in Figure 3, because Diffuse LiDAR covers the entire volume (voxels) of the scene, it achieves a higher “rank” (mathematical recoverability) much faster than sparse LiDAR. Even though the data is blurry, the coverage is so much better that it outweighs the loss of precision, provided you have a way to decode it.
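The rank argument can be reproduced with a toy 1D experiment (my construction, not the paper's analysis). Treat the scene as a row of voxels: each sparse view samples a few fixed dot positions exactly, while each diffuse view records wide, overlapping averages whose windows shift between views. Stacking the measurements into a matrix and checking its rank shows how much of the scene is, in principle, recoverable:

```python
import numpy as np

NUM_VOXELS = 60

def sparse_view(view):
    """Six sharp dots per view: exact samples, but only near fixed grid positions."""
    rng = np.random.default_rng(view)
    rows = np.zeros((6, NUM_VOXELS))
    dots = (np.arange(6) * 10 + rng.integers(-1, 2, size=6)) % NUM_VOXELS
    rows[np.arange(6), dots] = 1.0
    return rows

def diffuse_view(view, width=15):
    """Six blurry pixels per view: each averages a wide window; windows shift per view."""
    rows = np.zeros((6, NUM_VOXELS))
    for i in range(6):
        idx = (np.arange(width) + 7 * view + 10 * i) % NUM_VOXELS
        rows[i, idx] = 1.0 / width
    return rows

for n in (2, 5, 10):
    A_sparse = np.vstack([sparse_view(v) for v in range(n)])
    A_diffuse = np.vstack([diffuse_view(v) for v in range(n)])
    print(f"{n:2d} views  rank(sparse) = {np.linalg.matrix_rank(A_sparse):2d}  "
          f"rank(diffuse) = {np.linalg.matrix_rank(A_diffuse):2d}")
```

In this toy, the sparse rank stops growing once every jittered dot position has been seen, while the diffuse rank keeps climbing with extra views because every blurry measurement touches voxels the dots never hit.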
The Core Method: Gaussian Surfels and Sensor Fusion
To decode this blurry data and build a 3D model, the researchers use a technique called Analysis-by-Synthesis.
They create a virtual 3D model, simulate what the RGB camera and Diffuse LiDAR should see given that model, compare it to the real sensor data, and then update the model to minimize the error.
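As a deliberately tiny illustration of that loop (my toy, not the paper's renderer), the sketch below recovers an unknown wall depth by repeatedly simulating a blurred measurement, comparing it to the "observed" one, and back-propagating the error into the model:

```python
import torch

def render_blurred_depth(wall_depth, cone_offsets):
    """A diffuse pixel sees a cone of directions; each direction hits the wall
    slightly farther away, and the pixel records the average of those depths."""
    return (wall_depth / torch.cos(cone_offsets)).mean()

true_depth = torch.tensor(2.0)
cone = torch.linspace(-0.3, 0.3, 64)              # angular spread of one diffuse pixel
observed = render_blurred_depth(true_depth, cone)  # the "real" sensor measurement

depth = torch.tensor(1.0, requires_grad=True)      # initial guess for the scene model
optimizer = torch.optim.Adam([depth], lr=0.05)
for step in range(200):
    synthetic = render_blurred_depth(depth, cone)  # simulate what the sensor should see
    loss = (synthetic - observed) ** 2             # compare with the real measurement
    optimizer.zero_grad()
    loss.backward()                                # differentiate through the rendering
    optimizer.step()                               # update the model to reduce the error

print(float(depth))   # converges toward the true depth of 2.0
```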

1. The Scene Representation: Gaussian Surfels
Instead of using a cloud of points or a voxel grid, the team represents the scene using Gaussian Surfels. Think of these as tiny, flat, 2D ellipses floating in 3D space. They can rotate and scale to fit the surface of objects perfectly.
A 3D Gaussian is usually defined by a mean \(\boldsymbol{\mu}\) (position) and covariance \(\Sigma\) (shape/orientation):

$$ G(\mathbf{x}) = \exp\!\left(-\tfrac{1}{2}\,(\mathbf{x}-\boldsymbol{\mu})^{\top}\,\Sigma^{-1}\,(\mathbf{x}-\boldsymbol{\mu})\right) $$
To make sure these Gaussians behave like flat surfaces (surfels), the covariance matrix \(\Sigma\) is constructed using a rotation matrix \(\mathbf{R}\) and a scaling matrix \(\mathbf{S}\), where one of the scaling axes is set to zero (flattening the ellipsoid into a disc):

$$ \Sigma = \mathbf{R}\,\mathbf{S}\,\mathbf{S}^{\top}\mathbf{R}^{\top}, \qquad \mathbf{S} = \operatorname{diag}(s_1,\, s_2,\, 0) $$
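A minimal NumPy sketch of that construction (the orientation and scale values are arbitrary, just to show the flattening):

```python
import numpy as np

def rotation_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]])

def rotation_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1, 0, 0], [0, c, -s], [0, s, c]])

R = rotation_z(0.4) @ rotation_x(1.1)   # arbitrary orientation of the surfel
S = np.diag([0.05, 0.02, 0.0])          # two in-plane scales, zero "thickness"

Sigma = R @ S @ S.T @ R.T               # covariance = R S S^T R^T

print(np.linalg.matrix_rank(Sigma))     # 2: the Gaussian has no extent along its normal
print(R[:, 2])                          # the third column of R is the surfel's normal
```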
2. Rendering the Model
To update these surfels, the system needs to render them onto a 2D screen to compare with the real images. This is done by projecting the 3D Gaussians into 2D: using the viewing transformation \(\mathbf{W}\) and the Jacobian \(\mathbf{J}\) of the projective mapping, the projected 2D covariance is

$$ \Sigma' = \mathbf{J}\,\mathbf{W}\,\Sigma\,\mathbf{W}^{\top}\mathbf{J}^{\top} $$
Rendering Color (RGB): The system shoots rays through pixels. As a ray hits the surfels, it blends their colors based on their opacity (\(\alpha\)) and transmittance (\(T\)). This is standard volumetric rendering:

$$ \mathbf{C} = \sum_{i} \mathbf{c}_i\,\alpha_i\,T_i, \qquad T_i = \prod_{j=1}^{i-1} (1 - \alpha_j) $$
Rendering Depth: It also calculates the expected depth at each pixel as an opacity-weighted average of the distances \(d_i\) of the surfels along the ray:

$$ D = \sum_{i} d_i\,\alpha_i\,T_i $$
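For concreteness, here is that front-to-back compositing along a single ray with made-up values for three surfel hits (a sketch of the standard formulation, not the paper's implementation):

```python
import numpy as np

# Front-to-back alpha compositing along one ray; the three surfel hits are toy values.
alphas = np.array([0.30, 0.50, 0.90])            # opacity of each surfel hit
colors = np.array([[0.9, 0.1, 0.1],              # per-surfel RGB
                   [0.1, 0.8, 0.1],
                   [0.1, 0.1, 0.9]])
depths = np.array([1.0, 1.5, 2.2])               # distance of each surfel along the ray

# Transmittance T_i: how much light survives all surfels in front of surfel i
T = np.concatenate(([1.0], np.cumprod(1.0 - alphas[:-1])))
weights = alphas * T                             # compositing weight per surfel

pixel_color = (weights[:, None] * colors).sum(axis=0)   # C = sum_i c_i * alpha_i * T_i
pixel_depth = (weights * depths).sum()                   # D = sum_i d_i * alpha_i * T_i
print(pixel_color, pixel_depth)
```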
Rendering Transients (The Innovation): This is the tricky part. The system must simulate the blurred LiDAR histogram. Since a diffuse pixel sees a cone of space, the renderer samples multiple rays within that cone. For every surfel hit, it calculates which “time bin” (histogram bucket) the photon would fall into based on its distance \(d_i\):

$$ \tau_i = \frac{2\,d_i}{c}, \qquad \text{which falls into bin } \left\lfloor \frac{\tau_i}{\Delta t} \right\rfloor $$

where \(\Delta t\) is the width of one histogram bin.
To make the process differentiable (so the AI can learn), they use “soft histogramming,” distributing the signal across adjacent time bins:

Finally, they sum up all the contributions to build the simulated transient histogram \(i[t]\):
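The paper's exact expression is not reproduced here, but the following compact PyTorch sketch shows how such a differentiable transient renderer can be written for one diffuse pixel. The linear split between the two nearest bins, the bin width, and the toy inputs are my assumptions; the paper's soft-binning kernel may differ.

```python
import torch

C = 3e8            # speed of light, m/s
BIN_WIDTH = 1e-10  # assumed bin width, s
NUM_BINS = 256

def soft_transient(depths, weights):
    """Differentiable transient rendering for one diffuse pixel.

    Each sampled surfel hit contributes its compositing weight (alpha * T) at
    time-of-flight 2d/c, split linearly between the two nearest time bins so
    gradients can flow back into the surfel depths.
    """
    t = 2.0 * depths / C / BIN_WIDTH               # continuous bin coordinate per hit
    lo = torch.clamp(t.floor().long(), 0, NUM_BINS - 1)
    hi = torch.clamp(lo + 1, 0, NUM_BINS - 1)
    frac = t - t.floor()                           # position between the two bins

    hist = torch.zeros(NUM_BINS)
    hist = hist.index_add(0, lo, weights * (1.0 - frac))
    hist = hist.index_add(0, hi, weights * frac)
    return hist

# Hits gathered from rays sampled inside one pixel's cone (toy values)
depths = torch.tensor([1.20, 1.22, 2.00, 2.03], requires_grad=True)
weights = torch.tensor([0.4, 0.3, 0.2, 0.1])       # alpha_i * T_i per hit

hist = soft_transient(depths, weights)
target = torch.zeros(NUM_BINS)
target[80] = 1.0                                   # pretend the real sensor saw one peak
loss = ((hist - target) ** 2).sum()
loss.backward()
print(depths.grad)                                 # nonzero: the depths can be optimized
```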

3. The Scene-Adaptive Loss Function
Here is where the magic happens. The system has two sources of error:
- RGB Loss: The difference between the rendered color and the real photo.
- Transient Loss: The difference between the simulated histogram and the real LiDAR data.
In a bright room with textured walls, RGB is reliable. In a dark room or on a white wall, RGB is unreliable. The researchers introduced a Scene-Adaptive Loss that dynamically shifts trust between the two sensors.
They calculate a “usefulness” weight \(w_p\) for every patch of the image based on its texture and Signal-to-Noise Ratio (SNR) using a sigmoid function:

The Combined Loss: The final objective function combines RGB and LiDAR errors, weighted by this usefulness score.
For RGB, if the weight \(w_p\) is high (good texture/light), the loss counts more:

For LiDAR, the weighting is inverted \((1 - w_p)\). If the image is dark or textureless (low \(w_p\)), the system largely ignores the RGB error and instead tries to match the LiDAR histogram via a KL-divergence loss:

The total loss optimizes for color, LiDAR consistency, and geometric regularity:
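The paper's exact weighting statistic and loss terms are not reproduced above, but the sketch below shows how such a scene-adaptive blend might be wired up in PyTorch. The contrast-based "usefulness" measure, the sigmoid constants, and the helper names are my assumptions, not the authors' definitions.

```python
import torch
import torch.nn.functional as F

def patch_weight(rgb_patch, k=100.0, threshold=0.05):
    """Trust score w_p in [0, 1] for one image patch.

    'Usefulness' is approximated here by local contrast (a stand-in for the
    paper's texture + SNR measure); the sigmoid pushes bright, textured patches
    toward w_p = 1 and dark or flat patches toward w_p = 0.
    """
    contrast = rgb_patch.std()
    return torch.sigmoid(k * (contrast - threshold))

def combined_loss(rgb_pred, rgb_real, hist_pred, hist_real, w_p):
    """Blend the two error signals according to the patch's trust score w_p."""
    rgb_term = w_p * F.l1_loss(rgb_pred, rgb_real)
    p = hist_real / hist_real.sum()                # treat histograms as distributions
    q = hist_pred / hist_pred.sum()
    lidar_term = (1.0 - w_p) * F.kl_div(q.log(), p, reduction="sum")
    return rgb_term + lidar_term

# Toy data: a dark, nearly flat patch -> low w_p -> the LiDAR term dominates
patch = torch.full((8, 8, 3), 0.02) + 0.001 * torch.randn(8, 8, 3)
w_p = patch_weight(patch)
loss = combined_loss(torch.rand(8, 8, 3), patch,
                     torch.rand(256) + 1e-3, torch.rand(256) + 1e-3, w_p)
print(float(w_p), float(loss))
```

For a bright, textured patch the same code returns \(w_p\) close to 1, and the RGB term dominates instead.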

Experiments and Results
The researchers tested their method against several baselines, including RGB-only methods and methods using Sparse LiDAR. They used both synthetic datasets (Blender) and real-world captures.
Synthetic Performance
In controlled simulations, the “Blurred LiDAR” method showed a distinct advantage, particularly in scenes with textureless objects or ground planes.

As seen in Table 1, the proposed method (Ours) consistently achieved the lowest Depth Mean Absolute Error (D.MAE) across different texture variations. It shines brightest in the “Textureless Object” and “Textureless Plane” categories, where pure RGB methods get confused by the lack of features.

In Figure 5, look at rows (b) and (c). The baseline methods struggle to separate the object from the floor or create jagged, noisy surfaces. The Diffuse LiDAR method produces smooth, accurate meshes, even when the object is featureless.
Real-World Robustness
The team built a real prototype using a commercial SPAD (Single-Photon Avalanche Diode) sensor and a RealSense camera. They tested it on challenging objects like a black leather boot and a football.

Figure 7 visualizes these results. The “Sparse LiDAR + RGB” approach (column 3) often fails to define the boundary between the object and the floor, especially with the football and the boot. The “Ours” column (purple) shows a much clearer definition of the object’s geometry.
In extreme low-light simulations, the adaptive loss function proved its worth. As the lighting noise increased (simulating pitch darkness), the system automatically shifted its reliance to the LiDAR signal, maintaining accurate depth estimation long after the RGB-only methods had failed completely.
Conclusion
This research highlights a fascinating paradox in sensing: sometimes, “worse” data (blurry, low-resolution) is actually better, provided it offers complete coverage and you have the right algorithm to interpret it.
By fusing the high-resolution spatial data of cameras with the high-coverage depth data of diffuse LiDAR, this method bridges the gap between lightweight mobile scanning and high-fidelity industrial reconstruction. It opens the door for robots that can see in the dark and AR devices that work on blank white walls.
The “Blurred LiDAR” approach suggests that the future of 3D scanning might not just be about sharper sensors, but about smarter fusion of complementary signals.