Introduction

In the rapidly evolving world of 3D reconstruction and rendering, we are currently witnessing a tug-of-war between two critical factors: speed and quality. On one hand, we have 3D Gaussian Splatting (3DGS), which took the world by storm with its ability to render scenes in real time using rasterization. On the other hand, we have high-fidelity approaches like Ray-Based Gaussian Splatting (RayGS), which offer superior visual quality—particularly for complex geometries and view-dependent effects—but are so computationally heavy that they struggle in real-time applications, especially in Virtual Reality (VR).

For undergraduate and master’s students exploring computer vision, this trade-off is a classic engineering problem. Do you choose the method that runs at 100 FPS but has visual artifacts, or the one that looks perfect but crawls at 5 FPS?

The research paper “Hardware-Rasterized Ray-Based Gaussian Splatting” by Samuel Rota Bulò and colleagues at Meta Reality Labs proposes a solution that refuses to compromise. They present a method to render RayGS models using the hardware rasterization pipeline found in standard GPUs. The result? A rendering engine that retains the mathematical exactness and visual fidelity of ray-based methods while achieving frame rates suitable for VR—up to 40 times faster than previous ray-based implementations.

In this deep dive, we will unpack how they achieved this. We will explore the mathematics behind mapping 3D Gaussian intersections to hardware-friendly structures, the geometry of “lifting” quads into 3D space, and how to handle aliasing when moving through a virtual world.

Background: The Battle of the Splats

To understand the innovation, we first need to understand the baseline. 3D Gaussian Splatting represents a scene as a cloud of 3D Gaussians (ellipsoids), each with a position, rotation, scale, opacity, and color.

The Rendering Equation

Regardless of the specific method, the core mechanism for determining the color of a pixel follows a volume rendering equation. It is a weighted sum of colors contributed by the Gaussians along a ray, where the weights depend on opacity and transmittance (how much light gets blocked by previous Gaussians).

Rendering equation showing the summation of colors weighted by opacity and transmittance.

Here, \(\mathcal{R}\) is the final pixel color, \(\xi\) is the color of a specific Gaussian, and \(\omega\) is the rendered opacity of that Gaussian. The core difference between standard 3DGS and RayGS lies entirely in how that opacity \(\omega\) is calculated.
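
For readers who want this written out, a standard alpha-compositing form consistent with the description above (a reconstruction, not copied verbatim from the paper) is:

\[
\mathcal{R} \;=\; \sum_{i=1}^{N} \xi_i\, \omega_i \prod_{j=1}^{i-1} \left(1 - \omega_j\right),
\]

where the product term is the transmittance: the fraction of light that has not yet been absorbed by the Gaussians rendered in front of the \(i\)-th one.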

Standard 3DGS vs. RayGS

The rendered opacity is derived from the “prior” opacity of the Gaussian (\(o_i\)) and its divergence (\(\mathcal{D}\)), which represents how far a ray is from the center of the Gaussian.

Equation for rendered opacity based on divergence.
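
A common way to write this relationship, assuming \(\mathcal{D}\) denotes a squared Mahalanobis-style distance, is

\[
\omega_i \;=\; o_i \exp\!\left(-\tfrac{1}{2}\,\mathcal{D}_i\right),
\]

with the caveat that conventions differ on whether the factor \(\tfrac{1}{2}\) is folded into \(\mathcal{D}\) itself.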

Standard 3DGS: The Approximation

In the original 3DGS paper, the 3D Gaussian is projected onto the 2D image plane. The renderer approximates the perspective projection using an affine transformation (linearization). This results in a 2D ellipse on the screen.

Divergence equation for standard Gaussian Splatting using 2D projection.
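
Concretely, with the projected 2D mean \(\mu_{2D}\) and covariance \(\Sigma_{2D}\), the divergence for a pixel \(p\) takes the familiar Mahalanobis form (reconstructed here in the standard 3DGS convention):

\[
\mathcal{D}_{2D}(p) \;=\; \left(p - \mu_{2D}\right)^{\top} \Sigma_{2D}^{-1} \left(p - \mu_{2D}\right).
\]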

This method is incredibly fast because it maps perfectly to tile-based rasterizers. However, the linearization assumes a pinhole camera and introduces errors when Gaussians are close to the camera or viewed at steep angles.

RayGS: The Exact Solution

Ray-Based Gaussian Splatting (RayGS) drops the projection approximation. Instead, it calculates the exact point of maximum density along the viewing ray as it passes through the 3D Gaussian.

Divergence equation for Ray-Based Gaussian Splatting.

Here, \(\tau(x)x\) represents the specific point along the ray \(x\) where the Gaussian density is highest. This formulation is geometrically accurate. It avoids the “popping” artifacts and inconsistencies seen in standard 3DGS, particularly when navigating close to objects.
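
Written out (again as a reconstruction consistent with the description above), the ray-based divergence replaces the 2D pixel coordinate with the 3D point of maximum density along the ray:

\[
\mathcal{D}_{ray}(x) \;=\; \left(\tau(x)\,x - \mu\right)^{\top} \Sigma^{-1} \left(\tau(x)\,x - \mu\right),
\]

where \(\mu\) and \(\Sigma\) are the Gaussian's 3D mean and covariance, and \(\tau(x)\) is the depth at which the density along the ray is maximal, available in closed form.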

The image below illustrates the practical difference. Notice the spikes and inconsistent geometry in the standard GS model (top) versus the smooth, correct geometry in the RayGS model (bottom).

Artifacts present in renderings with GS model (top) versus RayGS (bottom).

The Problem: Calculating exact ray-Gaussian intersections is computationally expensive. Previous RayGS implementations relied on CUDA-based ray tracing, which is significantly slower than the hardware-accelerated rasterization used by standard 3DGS.

Core Method: Hardware-Rasterized RayGS

The researchers’ goal was to implement the high-quality RayGS math inside a fast, standard vertex-fragment shader pipeline.

To use hardware rasterization, you generally need to feed the GPU a polygon (usually a triangle or a quad) to draw. In standard 3DGS, this is easy: the 2D projection of a Gaussian is an ellipse on the screen, so you just draw a 2D quad around it.

In RayGS, however, the “support” (the visible area) of a Gaussian isn’t a simple 2D shape on the screen plane derived from a linear projection. The set of rays that intersect the Gaussian with sufficient density forms a complex shape.

The Vertex Shader: Finding the Enclosing Quad

The heart of this paper is a geometric derivation that allows the vertex shader to calculate a 3D quad that perfectly encloses the visible part of the Gaussian.

We need to find the set of all rays \(\mathcal{E}\) that hit the boundary of the Gaussian’s visible volume (defined by a cutoff \(\kappa\)).

Definition of the set E representing rays at the boundary of the primitive’s support.
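
In symbols, the idea is roughly the level set of the ray-based divergence at the cutoff value (the paper's exact definition may carry additional constraints):

\[
\mathcal{E} \;=\; \left\{\, x \;:\; \mathcal{D}_{ray}(x) = \kappa \,\right\}.
\]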

Finding a bounding box for this set \(\mathcal{E}\) directly in 3D space is difficult. The authors solve this by discovering an isomorphism (a structure-preserving map) between this complex set \(\mathcal{E}\) and the unit circle \(\mathbb{S}_1\).

The Geometric Intuition

Imagine the complex intersection of the ray cone and the Gaussian ellipsoid. The authors derive a transformation \(\Phi\) that maps this complex 3D geometry onto a simple 2D unit circle.

  1. Original Space: The rays and the Gaussian exist in standard 3D space.
  2. Transformation: By applying a sequence of rotations and scalings (involving the Gaussian’s covariance matrix \(\Sigma\)), they map the rays in \(\mathcal{E}\) onto a unit sphere and ultimately reduce them to a unit circle.

Schematic overview of the isomorphism between the ray set E and the unit circle.

The figure above visualizes this transformation. By mapping the problem to the “Unit Circle World” (subfigure d), finding a bounding box becomes trivial—it’s just a square around the circle.
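
To build intuition for why such a mapping helps, the toy sketch below whitens a 3D Gaussian with a factor of its covariance so that the ellipsoid becomes a unit sphere; in that normalized space, an axis-aligned square around a unit circle is trivial to write down and can then be mapped back. This is only an illustration of the whitening idea with made-up values, not the paper's exact map \(\Phi\), which also accounts for the camera position and the cutoff \(\kappa\).

```python
import numpy as np

# Toy Gaussian: mean and covariance (hypothetical values for illustration).
mu = np.array([0.5, -0.2, 2.0])
Sigma = np.diag([0.09, 0.01, 0.04])           # an axis-aligned ellipsoid

# Whitening: A maps ellipsoid points to the unit sphere, its inverse maps back.
A = np.linalg.inv(np.linalg.cholesky(Sigma))  # a square root of Sigma^{-1}

def to_unit_space(x):
    """Map a world-space point into the normalized (unit-sphere) space."""
    return A @ (x - mu)

def to_world_space(y):
    """Map a normalized-space point back to world space."""
    return np.linalg.solve(A, y) + mu

# A point on the ellipsoid's 1-sigma surface lands on the unit sphere:
p = mu + np.array([0.3, 0.0, 0.0])            # 0.3 = sqrt(0.09)
print(np.linalg.norm(to_unit_space(p)))       # -> 1.0

# In normalized space, a square enclosing the unit circle is trivial.
# Lift it to 3D (here in the plane z = 0) and map it back to world space.
square = np.array([[1, 1], [-1, 1], [-1, -1], [1, -1]], dtype=float)
quad_world = np.stack([to_world_space(np.array([sx, sy, 0.0]))
                       for sx, sy in square])
print(quad_world)   # four 3D corners enclosing the ellipsoid's cross-section
```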

Creating the 3D Quad

Once the unit square is defined around the unit circle, the authors use the inverse transformation \(\Phi^{-1}\) to map that square back into real 3D space. This results in a 3D quad that encloses the Gaussian’s visible volume.

Examples of 3D quads obtained by mapping 2D squares via the isomorphism.

However, there are infinitely many squares that can enclose a circle (rotated at any angle). Which one is best? The authors optimize for the quad that spans the smallest area in 3D space (shown in red in the figure above). This tight fit minimizes the number of pixels the fragment shader has to process, saving computation time.

They find this optimal orientation by solving an eigenvector problem:

Optimization problem to find the optimal quad orientation u1.

This equation finds the vector \(u\) on the unit circle that results in the longest axis when mapped back to 3D. The minor axis is simply orthogonal to it.
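
The optimization itself is a classic "longest axis of a linear map" problem: maximizing \(\|Mu\|\) over unit vectors \(u\) is solved by the top eigenvector of \(M^{\top}M\) (equivalently, the leading right singular vector of \(M\)). The sketch below demonstrates this with NumPy; the matrix M here is only a stand-in for the paper's map from the unit circle back to 3D, not its exact expression.

```python
import numpy as np

# Stand-in 3x2 linear map from the unit-circle plane into 3D space.
# (In the paper this role is played by the inverse isomorphism; the
#  values below are made up purely for illustration.)
M = np.array([[1.8, 0.3],
              [0.2, 0.6],
              [0.5, 0.1]])

# u1 maximizes ||M u|| over unit vectors u: top eigenvector of M^T M.
eigvals, eigvecs = np.linalg.eigh(M.T @ M)
u1 = eigvecs[:, np.argmax(eigvals)]   # major-axis direction on the circle
u2 = np.array([-u1[1], u1[0]])        # minor axis: orthogonal in the plane

major = M @ u1                        # longest 3D half-axis of the quad
minor = M @ u2                        # shortest 3D half-axis of the quad
print(np.linalg.norm(major), np.linalg.norm(minor))
```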

The final calculation for the quad vertices (\(V_{ray}\)) in the vertex shader looks like this:

Equation for calculating the vertices of the ray quad.

This formula allows the vertex shader to output a 3D quad. Crucially, because this quad exists in 3D (rather than just 2D screen coordinates), it interacts correctly with the camera’s view frustum. However, the authors note that near-plane clipping must be handled carefully. If a quad intersects the camera’s near plane, it must be culled entirely to avoid visual discontinuities.
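
Conceptually, once the two half-axes are known, the four quad corners are just signed combinations around the quad center, which is what the vertex shader emits. The snippet below sketches that assembly with hypothetical values; the actual \(V_{ray}\) expression in the paper also folds in the cutoff \(\kappa\) and camera-dependent terms.

```python
import numpy as np

def quad_corners(center, major, minor):
    """Assemble the four corners of a 3D quad from its center and half-axes."""
    signs = [(+1, +1), (-1, +1), (-1, -1), (+1, -1)]
    return np.stack([center + s0 * major + s1 * minor for s0, s1 in signs])

# Hypothetical values, continuing the earlier sketches.
center = np.array([0.5, -0.2, 2.0])
major  = np.array([0.55, 0.10, 0.15])
minor  = np.array([-0.03, 0.18, 0.02])
print(quad_corners(center, major, minor))   # 4 x 3 array of vertex positions
```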

Undesired effects of near-plane clipping showing sharp discontinuities.

The Fragment Shader: Calculating Opacity

Once the rasterizer determines which pixels fall inside the 3D quad, the fragment shader takes over. Its job is to calculate the exact opacity for that specific pixel ray.

Because the quad was constructed using the rigorous RayGS formulation, the fragment shader doesn’t need to do expensive ray-marching or iterative searching. It can compute the exact divergence \(\mathcal{D}_{ray}\) analytically.

The derivation leads to a surprisingly simple formula for the fragment shader. By interpolating values calculated at the vertices (\(Z_{ray}\)), the divergence can be computed using a dot product:

Equation for calculating ray divergence in the fragment shader.

This efficiency is key. The complex ray-ellipsoid intersection math is “baked” into the quad vertices and the interpolation constants, leaving the fragment shader with very little work to do per pixel.
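
The reason a single dot product suffices is a general property of rasterization: if the per-vertex quantity is an affine function of position, hardware barycentric interpolation reproduces it exactly, so a quadratic form evaluated on the interpolated vector is also exact. The NumPy check below illustrates this principle; the actual \(Z_{ray}\) quantities in the paper are derived from the ray-Gaussian geometry and are not reproduced here.

```python
import numpy as np

# Affine map z(x) = A x + b: the kind of quantity that can be interpolated
# exactly by the rasterizer when stored at a triangle's vertices.
A = np.array([[0.7, -0.2], [0.1, 0.9], [0.3, 0.4]])
b = np.array([0.05, -0.1, 0.2])
z = lambda x: A @ x + b

# Triangle vertices (screen-space positions) and barycentric weights of a pixel.
v = [np.array([0.0, 0.0]), np.array([1.0, 0.0]), np.array([0.0, 1.0])]
w = np.array([0.2, 0.5, 0.3])                  # weights sum to 1
pixel = sum(wi * vi for wi, vi in zip(w, v))   # the pixel's position

# Interpolating z at the vertices and taking a dot product matches
# evaluating the quadratic form ||z(pixel)||^2 directly.
z_interp = sum(wi * z(vi) for wi, vi in zip(w, v))
assert np.allclose(z_interp @ z_interp, z(pixel) @ z(pixel))
print(z_interp @ z_interp)
```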

Solving Aliasing: MIP for RayGS

In VR and gaming, “shimmering” or jagged edges (aliasing) break immersion. This happens when we sample a 3D scene with a single ray per pixel. A pixel represents an area, not a point. If a Gaussian is small or far away, a single ray might miss it entirely, or hit it by luck, causing flickering as the camera moves.

Standard 3DGS has methods to handle this (MIP-Splatting), but RayGS did not—until this paper.

The Strategy

The authors propose a “Multiscale” approach. Instead of treating the ray as an infinitely thin line, they treat the pixel as a 2D Gaussian distribution. They project the 3D Gaussian primitive onto a plane orthogonal to the ray and convolve (smooth) it with the pixel’s footprint.

The resulting distribution \(P_{MIP}\) describes the probability of the ray hitting the Gaussian, accounting for the pixel’s size (\(\sigma_x\)).

Equation for the MIP probability distribution.

This effectively makes the Gaussian “blurrier” or larger when it is far away, ensuring that the pixel captures its average contribution rather than a noisy point sample.

For the hardware implementation, they simplify this into a modulation factor that adjusts the Gaussian’s opacity and covariance matrix based on distance:

Approximation equation for MIP-based opacity modulation.
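
To make the effect of such a modulation concrete, the sketch below applies a MIP-style filter in the spirit of Mip-Splatting: convolving two Gaussians adds their covariances, and scaling the opacity by the determinant ratio keeps the primitive's total contribution roughly constant. The function name, the pixel-footprint value, and the exact form of the factor are illustrative assumptions, not the paper's precise modulation.

```python
import numpy as np

def mip_modulate(opacity, cov2d, sigma_px=0.3):
    """Convolve a projected 2D Gaussian with an isotropic pixel footprint.

    Adding sigma_px^2 * I to the covariance dilates the Gaussian; scaling the
    opacity by the determinant ratio preserves its overall contribution.
    (Sketch in the spirit of MIP-style filtering, not the paper's exact factor.)
    """
    cov_smoothed = cov2d + (sigma_px ** 2) * np.eye(2)
    scale = np.sqrt(np.linalg.det(cov2d) / np.linalg.det(cov_smoothed))
    return opacity * scale, cov_smoothed

# A small, distant Gaussian: the pixel footprint dominates its covariance,
# so its opacity is attenuated instead of producing a flickering point sample.
o_new, cov_new = mip_modulate(opacity=0.9, cov2d=np.diag([0.01, 0.02]))
print(o_new, cov_new)
```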

Visual Impact of MIP

The impact of this technique is stark. In the image below, look at the bicycle spokes.

  • Top-Left (No MIP): The spokes are jagged and broken (aliasing).
  • Top-Right (MSAA): Multi-Sample Anti-Aliasing helps slightly but is expensive and imperfect.
  • Bottom-Right (MIP-VKRayGS): The spokes are smooth and continuous.

Benefits of MIP formulation showing comparison between aliased and anti-aliased renderings.

Experiments and Results

The researchers implemented this method (dubbed VKRayGS for Vulkan Ray-Based Gaussian Splatting) and compared it against the state-of-the-art CUDA-based renderer (GOF).

Speed Comparison

The performance gains are massive. On standard benchmark datasets (MipNeRF360 and Tanks&Temples), VKRayGS achieves frame rates that are on average 40 times higher than the competing method.

Table comparing speed (FPS) and quality metrics between GOF and VKRayGS.

Looking at the table, scenes like “bicycle” go from an unplayable 4 FPS with GOF to a smooth 177 FPS with VKRayGS on an RTX2080. This is the difference between an offline render and a VR-ready application.

Quality Comparison

Does the speed come at the cost of quality? Not meaningfully. The rendering logic is mathematically equivalent to the slower ray-casting methods. The small differences in metrics (PSNR, SSIM) are attributed to minor implementation details (like clipping planes) rather than fundamental flaws in the method.

Qualitatively, the results are nearly indistinguishable.

MipNeRF360 Scenes: In scenes like “bicycle” and “garden,” the visual fidelity of the fast rasterizer (right) matches the slow ray-tracer (left).

Side-by-side comparison of MipNeRF360 scenes rendered by GOF and VKRayGS.

Tanks & Temples: The same holds for outdoor scenes. The lighting, geometry, and texture details are preserved.

Side-by-side comparison of Tanks&Temples scenes rendered by GOF and VKRayGS.

Conclusion and Implications

The paper “Hardware-Rasterized Ray-Based Gaussian Splatting” closes a significant gap in neural rendering. By mathematically bridging the world of ray-based intersections with the world of hardware-accelerated rasterization, the authors have unlocked high-fidelity 3DGS for real-time applications.

Key Takeaways:

  1. RayGS is Superior: It handles geometry better than standard 3DGS, avoiding popping artifacts and near-camera distortion.
  2. Isomorphism is Key: Mapping the ray-Gaussian intersection to a unit circle allows for the efficient calculation of optimal 3D bounding quads.
  3. Rasterization Wins: By moving the workload from generic CUDA compute to the specialized graphics pipeline (Vertex/Fragment shaders), performance skyrockets (40x speedup).
  4. MIP Matters: Integrating scale-dependent smoothing is essential for clean, alias-free rendering in dynamic environments.

For students and developers, this work highlights the power of understanding the underlying geometry of a problem. It wasn’t brute-force optimization that made this fast—it was a clever mathematical transformation that allowed the use of standard, highly optimized hardware tools.