For the past few years, the world of computer graphics has been captivated by Neural Radiance Fields (NeRFs). These methods promised a groundbreaking way to capture and explore 3D scenes—producing stunningly realistic images from novel viewpoints using nothing more than a collection of ordinary photos.
The results were incredible, but they came at a huge computational cost: training a high-quality NeRF could take days, and rendering a single high-resolution image could take several seconds. Real-time exploration was out of reach.
This created a frustrating trade-off:
- Slow but high-quality: Methods like Mip-NeRF360 produce exquisite detail but require tens of hours to train and are painfully slow to render.
- Fast but lower-quality: Systems like Instant-NGP and Plenoxels slash training times to minutes, but often sacrifice fine details and visual fidelity.
For truly immersive experiences—like virtual reality, gaming, or cinematic visualization—we need both state-of-the-art quality and real-time frame rates.
Enter a 2023 breakthrough: 3D Gaussian Splatting for Real-Time Radiance Field Rendering.
This method doesn’t just incrementally improve the field—it makes a giant leap, introducing a technique that achieves photorealistic quality, trains in minutes, and—most impressively—renders high-resolution views in real-time.
Figure 1: The authors’ method achieves real-time rendering (up to 135 fps) with quality on par with or surpassing Mip-NeRF360 (0.071 fps), while training in a fraction of the time (51 min vs 48 hours).
In this article, we’ll unpack how 3D Gaussian Splatting works—exploring its three core pillars:
- A novel scene representation using anisotropic 3D Gaussians.
- An adaptive optimization strategy that both builds and refines the scene.
- A blazingly fast, differentiable rasterizer optimized for GPUs.
Background: The Road to Real-Time Radiance Fields
Before diving into Gaussian Splatting itself, let’s understand the landscape that shaped its design.
The NeRF Era: Beauty at a Cost
A traditional NeRF represents a scene as a continuous function—usually an MLP (Multi-Layer Perceptron)—that takes a 3D position and viewing direction as input, returning color and density. Rendering requires volumetric ray marching: shooting a ray through each pixel and querying the network hundreds of times per ray to accumulate color and opacity.
The core volumetric rendering equation is:
\[ C = \sum_{i=1}^{N} T_i \big(1 - \exp(-\sigma_i \delta_i)\big) \mathbf{c}_i, \quad \text{with} \quad T_i = \exp\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right) \]

This process produces beautiful, continuous imagery—but is computationally heavy, making real-time interactive rendering impractical.
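To make the discrete sum concrete, here is a minimal NumPy sketch that evaluates it for a single ray, given per-sample densities \(\sigma_i\), step sizes \(\delta_i\), and colors \(\mathbf{c}_i\) (the variable names are mine, not from any particular NeRF codebase):

```python
import numpy as np

def render_ray(sigmas, deltas, colors):
    """Evaluate the volumetric rendering sum for one ray.

    sigmas : (N,)    per-sample densities
    deltas : (N,)    distances between consecutive samples
    colors : (N, 3)  per-sample RGB colors
    """
    alphas = 1.0 - np.exp(-sigmas * deltas)              # opacity of each segment
    # T_i: transmittance accumulated before sample i (product of 1 - alpha_j for j < i)
    trans = np.concatenate([[1.0], np.cumprod(1.0 - alphas)[:-1]])
    weights = trans * alphas                              # contribution of each sample
    return (weights[:, None] * colors).sum(axis=0)        # final pixel color

# Example: three samples along a ray, one pure-red, one pure-green, one pure-blue
print(render_ray(np.array([0.5, 2.0, 0.1]),
                 np.array([0.1, 0.1, 0.1]),
                 np.eye(3)))
```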
Seeking Speed: Grids and Hash Tables
Newer systems like Plenoxels and Instant-NGP store features in structured data formats (voxel grids, hash grids). This reduces the need for large neural networks, cutting training and rendering times dramatically. However, they still rely on ray marching, and quality is sometimes constrained by the fixed grid resolution.
The Point-Based Alternative
Inspired by older computer graphics paradigms, point-based rendering uses discrete points (or “splats”) projected to the image plane and blended together. The point blending equation is surprisingly similar to volumetric rendering:
\[ C = \sum_{i \in \mathcal{N}} c_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j) \]

Here, each point has a color \(c_i\) and opacity \(\alpha_i\), and front points occlude those behind. The similarity between these formulations is key—it means we can replace a continuous neural field with discrete primitives that can be rendered efficiently.
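To see why, substitute \(\alpha_i = 1 - \exp(-\sigma_i \delta_i)\) into the NeRF equation. The transmittance becomes

\[ T_i = \exp\left(-\sum_{j=1}^{i-1} \sigma_j \delta_j\right) = \prod_{j=1}^{i-1} e^{-\sigma_j \delta_j} = \prod_{j=1}^{i-1} (1 - \alpha_j), \]

so the volumetric sum and the point-blending sum are the same expression: a front-to-back, opacity-weighted accumulation of colors. The two models differ in what the primitives are, not in how an image is formed.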
The 3D Gaussian Splatting Method: Three Core Innovations
Figure 2: Pipeline overview: starting from sparse Structure-from-Motion (SfM) points, the method builds an optimized set of 3D Gaussians and renders them in real-time using a custom rasterizer.
1. Representation: Flexible 3D Gaussians
Instead of voxels or neural grids, the scene is modeled as a collection of 3D Gaussians—ellipsoidal “fuzzy blobs” in world space. Each Gaussian has:
- Position (\(\mu\)): Center in 3D space.
- Covariance (\(\Sigma\)): Shape and orientation, enabling anisotropy—stretching, flattening, or rotating to match surfaces and details.
- Opacity (\(\alpha\)): Transparency for blending.
- Color (SH coefficients): Encoding view-dependent appearance via Spherical Harmonics.
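To make the representation concrete, here is a minimal sketch of what one such primitive might hold, mirroring the list above (field names and shapes are illustrative, not the reference implementation's exact layout):

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class Gaussian:
    """One scene primitive: an anisotropic 3D Gaussian."""
    position: np.ndarray    # (3,)    center mu in world space
    covariance: np.ndarray  # (3, 3)  symmetric matrix Sigma: shape and orientation
    opacity: float          #         alpha in [0, 1], used for blending
    sh_coeffs: np.ndarray   # (16, 3) spherical-harmonic coefficients per color channel
                            #         (16 basis functions if SH up to degree 3 is used)
```

A scene is then simply a large collection of these primitives, all optimized jointly.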
Their shape is defined by:
\[ G(x) = e^{-\frac{1}{2} x^T \Sigma^{-1} x} \]

Optimizing \(\Sigma\) directly risks producing matrices that are not positive semi-definite—and therefore not valid covariances. To solve this, the authors instead store:
- Scaling vector \(s\)
- Rotation quaternion \(q\)
These are converted into a valid covariance matrix via:
\[ \Sigma = R S S^T R^T \]

where \(R\) is the rotation matrix built from \(q\) and \(S\) is the diagonal scaling matrix built from \(s\). This approach handles anisotropy naturally, enabling compact, precise scene representations.
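As a concrete illustration, here is a small NumPy sketch of that construction (the function names are mine, not the reference implementation's):

```python
import numpy as np

def quat_to_rotmat(q):
    """Convert a unit quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    w, x, y, z = q / np.linalg.norm(q)
    return np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])

def covariance_from_scaling_rotation(s, q):
    """Sigma = R S S^T R^T, symmetric positive semi-definite by construction."""
    R = quat_to_rotmat(q)
    S = np.diag(s)          # per-axis scales on the diagonal
    return R @ S @ S.T @ R.T

# An elongated Gaussian: long along one axis, thin along the other two
print(covariance_from_scaling_rotation(np.array([1.0, 0.05, 0.05]),
                                       np.array([1.0, 0.0, 0.0, 0.0])))
```

Because the factors \(s\) and \(q\) can be optimized freely, gradient descent can never produce an invalid covariance.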
Figure 3: Shrinking optimized Gaussians reveals their elongated shapes. They align to real surfaces, providing a compact, high-fidelity representation.
2. Optimization: Adaptive Density Control
Training starts from sparse SfM points. Early on, the model is far too coarse. Adaptive density control strategically adds or removes Gaussians based on view-space positional gradients.
Two key operations, sketched in code below:
- Clone (Under-reconstruction): When fine details are missing, small Gaussians are duplicated, and the copy is nudged along the positional gradient to fill the gap.
- Split (Over-reconstruction): When a single large Gaussian covers too much fine geometry, it is split into two smaller ones.
Figure 4: Cloning adds more coverage in under-reconstructed areas. Splitting adds detail to over-reconstructed regions.
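A rough sketch of the decision logic is below. The gradient threshold (0.0002) and the split factor (1.6) follow the paper's reported values; everything else—the size threshold, the jitter, and the omission of gradient-directed cloning and opacity resets—is simplified for illustration:

```python
import numpy as np

def densify(positions, scales, grad_norms, grad_thresh=2e-4, size_thresh=0.01):
    """One simplified densification pass over all Gaussians.

    positions  : (N, 3) Gaussian centers
    scales     : (N, 3) per-axis scales
    grad_norms : (N,)   averaged view-space positional gradient magnitudes
    """
    needs_densify = grad_norms > grad_thresh      # large gradients signal reconstruction error
    small = scales.max(axis=1) <= size_thresh

    clone_mask = needs_densify & small            # under-reconstruction: add coverage
    split_mask = needs_densify & ~small           # over-reconstruction: add detail

    new_positions = [positions[~split_mask]]      # split originals are removed
    new_scales = [scales[~split_mask]]

    # Clone: duplicate small Gaussians (the reference implementation also moves
    # the copy along the positional gradient; that step is omitted here).
    new_positions.append(positions[clone_mask])
    new_scales.append(scales[clone_mask])

    # Split: replace each large Gaussian with two smaller ones sampled near the
    # original, with scales divided by 1.6.
    for _ in range(2):
        jitter = np.random.randn(int(split_mask.sum()), 3) * scales[split_mask]
        new_positions.append(positions[split_mask] + jitter)
        new_scales.append(scales[split_mask] / 1.6)

    return np.concatenate(new_positions), np.concatenate(new_scales)
```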
Pruning removes Gaussians whose opacity falls below a threshold. This continuous cycle of cloning, splitting, and culling ensures the representation evolves efficiently, guided by a hybrid loss function:
\[ \mathcal{L} = (1 - \lambda)\mathcal{L}_1 + \lambda \mathcal{L}_{\text{D-SSIM}} \quad \text{with} \quad \lambda = 0.2 \]
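In code, the loss is little more than a weighted blend of the two terms. A minimal sketch, assuming a generic `ssim_fn` helper that returns a scalar SSIM in [0, 1] (in actual training this would operate on differentiable tensors rather than NumPy arrays, and the exact D-SSIM convention is an implementation detail):

```python
import numpy as np

def hybrid_loss(rendered, gt, ssim_fn, lam=0.2):
    """L = (1 - lam) * L1 + lam * D-SSIM between a rendered image and ground truth."""
    l1 = np.abs(rendered - gt).mean()
    d_ssim = 1.0 - ssim_fn(rendered, gt)   # structural dissimilarity
    return (1.0 - lam) * l1 + lam * d_ssim
```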
3. Renderer: Fast Tile-Based Differentiable Rasterizer
Gaussian Splatting replaces slow ray marching with a GPU-friendly rasterizer (a simplified sketch follows the list):
- Projection: 3D Gaussians become 2D splats in the camera view.
- Tiling: The image plane is divided into small tiles (e.g., 16×16 pixels).
- Sorting: Each Gaussian instance (per tile) gets a depth+tile key. A single global GPU radix sort orders all splats front-to-back.
- Forward Pass (Rasterize): Each tile is rendered in parallel by traversing its list of Gaussians. Pixels stop processing when fully opaque.
- Backward Pass (Gradients): Instead of storing all contributing splats per pixel, the sorted list is traversed again (back-to-front), reconstructing needed intermediate values from only the final accumulated opacity.
This design has no hard limit on contributing splats per pixel—critical for high depth complexity scenes.
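Here is a highly simplified, illustrative sketch of the sort-and-blend idea; the real system runs these steps as CUDA kernels with one thread block per tile, and the key layout and bit widths below are assumptions, not the paper's exact format:

```python
import numpy as np

def sort_key(tile_id, depth, depth_bits=32):
    """Pack (tile, depth) into one integer key so a single radix sort groups
    splats by tile and orders them front-to-back within each tile.
    Assumes depth has been normalized to [0, 1]."""
    depth_q = int(depth * ((1 << depth_bits) - 1))
    return (tile_id << depth_bits) | depth_q

def composite_pixel(splat_depths, splat_alphas, splat_colors):
    """Front-to-back alpha blending of the splats covering one pixel of a tile."""
    order = np.argsort(splat_depths)       # stand-in for the global radix sort
    color = np.zeros(3)
    transmittance = 1.0
    for i in order:
        color += transmittance * splat_alphas[i] * splat_colors[i]
        transmittance *= 1.0 - splat_alphas[i]
        if transmittance < 1e-4:           # pixel is effectively opaque: stop early
            break
    return color
```

The early-exit test is what makes the forward pass cheap, while the saturating blend is exactly the point-based compositing equation from earlier.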
Experiments & Results
Quantitative Performance
Table 1: 3D Gaussian Splatting achieves quality on par with or better than Mip-NeRF360, trains in ~40 minutes, and renders at >130 FPS.
Highlights:
- Ours-30K models match or exceed Mip-NeRF360 quality while training ~70× faster and rendering ~2000× faster.
- Ours-7K (~7 minutes training) matches Instant-NGP quality, with room to improve if trained longer.
Qualitative Comparisons
Figure 5: On held-out views, Gaussian Splatting often delivers sharper details and fewer artifacts than even Mip-NeRF360.
Training Progress
Figure 6: Some scenes are already high quality at 7K iterations. Additional training refines background details and reduces artifacts.
Why It Works: Ablation Insights
Anisotropy Matters
Figure 10: Anisotropic Gaussians better align to surfaces, capturing fine detail with fewer points.
Unlimited Gradients
Figure 9: Limiting gradients to 10 splats per pixel (left) creates artifacts. The full method (right) avoids this.
Densification Balance
Figure 8: Removing clone or split steps degrades reconstruction quality, underscoring adaptive control’s importance.
Limitations
Like all radiance field methods, Gaussian Splatting can struggle in poorly captured regions.
Figure 11: In sparse coverage areas, Mip-NeRF360 produces floaters; Gaussian Splatting may produce coarse, blotchy splats.
Memory usage, while vastly better than older point-based methods, exceeds that of compact NeRF variants. Large scenes may demand >20 GB of GPU memory in this prototype—optimizations could reduce this.
Conclusion: A Paradigm Shift
3D Gaussian Splatting delivers real-time, state-of-the-art radiance field rendering—a feat once thought impossible without continuous neural fields.
By fusing:
- Explicit anisotropic primitives for flexibility and compactness
- Adaptive optimization to refine the scene dynamically
- Highly parallel rasterization leveraging GPU architectures
…it achieves unprecedented speed and quality together.
This approach bridges classic graphics (rasterization, splatting) and modern neural scene representations—a hybrid model that opens doors for interactive applications in VR, AR, gaming, cinematic production, and beyond.
It’s a compelling answer to the long-standing trade-offs in neural rendering—and perhaps a blueprint for the next generation of real-time photorealistic graphics.