Introduction
In the fast-evolving world of computer graphics and vision, few techniques have made as big a splash as 3D Gaussian Splatting (3DGS). Since its introduction in 2023, it has impressed both researchers and developers by combining photorealistic novel view synthesis with real-time rendering speeds. For many, it felt like the practical, high-speed successor to Neural Radiance Fields (NeRFs) we had been waiting for.
However, as people began pushing 3DGS to its limits, cracks started to show. While it produced stunning results for camera views similar to those in the training data, it struggled severely when the viewing scale changed. Zooming in could make objects look overly thin and noisy; zooming out often caused fine details to blur into glowing artifacts.
This is exactly the problem that the paper “Mip-Splatting: Alias-free 3D Gaussian Splatting” sets out to solve. The authors identify the root cause of these scaling artifacts and propose an elegant, principled solution. Their method—Mip-Splatting—modifies the original 3DGS pipeline to make it robust against changes in camera distance and focal length, enabling crisp, artifact-free images across a wide range of scales.
Let’s visualize the problem:
Figure 1: Standard 3DGS works well at the training scale (a), but zooming out causes spokes to thicken (c), and zooming in makes them too thin and noisy (d).
In this article, we’ll explore the Mip-Splatting paper step-by-step. We’ll first explain how 3D Gaussian Splatting works, then discuss why it fails at different scales, and finally examine Mip-Splatting’s two-part solution: a 3D smoothing filter to handle zoom-in issues and a 2D Mip filter to perfect the zoom-out.
Background: How 3D Gaussian Splatting Works
Unlike mesh-based rendering or the neural network representations behind NeRFs, 3D Gaussian Splatting represents a scene with an enormous set of semi-transparent, anisotropic 3D blobs called Gaussians.
Each Gaussian is parameterized by:
- Position (\(\mathbf{p}_k\)): where it is in 3D space.
- Covariance (\(\boldsymbol{\Sigma}_k\)): a 3×3 matrix defining its shape and size.
- Color (\(c_k\)): possibly view-dependent, modeled with spherical harmonics.
- Opacity (\(\alpha_k\)): how transparent it is.
Mathematically, each Gaussian is an unnormalized exponential of a quadratic form:

\[
\mathcal{G}_k(\mathbf{x}) = e^{-\frac{1}{2}(\mathbf{x}-\mathbf{p}_k)^\top \boldsymbol{\Sigma}_k^{-1} (\mathbf{x}-\mathbf{p}_k)}
\]
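To make this concrete, here is a minimal NumPy sketch of evaluating such a Gaussian at a point. The names (`eval_gaussian_3d`, `p_k`, `cov_k`) are ours for illustration, not from any released implementation:

```python
import numpy as np

def eval_gaussian_3d(x, p_k, cov_k):
    """Evaluate an unnormalized anisotropic 3D Gaussian at point x."""
    d = x - p_k
    return np.exp(-0.5 * d @ np.linalg.inv(cov_k) @ d)

# Example: a Gaussian stretched along the z-axis.
p_k = np.array([0.0, 0.0, 0.0])
cov_k = np.diag([0.1, 0.1, 0.5])
print(eval_gaussian_3d(np.array([0.0, 0.0, 0.5]), p_k, cov_k))  # ~0.78
```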
From 3D to 2D
Rendering in 3DGS is fast because it uses rasterization (like game engines) instead of the slower per-ray marching used by NeRFs.
Transform to Camera Space
All Gaussians are transformed from world coordinates into the selected camera's frame via the world-to-camera transformation \(\mathbf{W}\) and translation \(\mathbf{t}\):

\[
\mathbf{p}'_k = \mathbf{W}\,\mathbf{p}_k + \mathbf{t}, \qquad \boldsymbol{\Sigma}'_k = \mathbf{W}\,\boldsymbol{\Sigma}_k\,\mathbf{W}^\top
\]

Project to 2D

Each 3D Gaussian becomes a 2D Gaussian on the image plane, using the Jacobian \(\mathbf{J}\) of a local affine approximation of the perspective projection:

\[
\boldsymbol{\Sigma}^{2D}_k = \mathbf{J}\,\boldsymbol{\Sigma}'_k\,\mathbf{J}^\top
\]
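A minimal sketch of this projection step, assuming a simple pinhole camera; the Jacobian follows the standard EWA-splatting affine approximation, and `fx`, `fy` are focal lengths in pixels:

```python
import numpy as np

def project_covariance(p_cam, cov_cam, fx, fy):
    """Project a camera-space 3D covariance to a 2x2 image-plane covariance
    using the Jacobian J of the affine-approximated perspective projection."""
    x, y, z = p_cam
    J = np.array([
        [fx / z, 0.0,    -fx * x / z**2],
        [0.0,    fy / z, -fy * y / z**2],
    ])
    return J @ cov_cam @ J.T
```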
Splat and Alpha-Blend

These 2D Gaussians are drawn (“splatted”) onto the screen and blended front-to-back in depth order:

\[
c(\mathbf{x}) = \sum_{k=1}^{K} c_k\,\alpha_k\,\mathcal{G}^{2D}_k(\mathbf{x}) \prod_{j=1}^{k-1}\bigl(1 - \alpha_j\,\mathcal{G}^{2D}_j(\mathbf{x})\bigr)
\]
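And a toy version of that compositing loop for a single pixel. Real implementations sort Gaussians per screen tile and run on the GPU, but the blending rule is the same; the early-termination threshold here is an assumption on our part:

```python
import numpy as np

def composite_pixel(gaussians_sorted):
    """Front-to-back alpha blending for one pixel. `gaussians_sorted` is a
    depth-sorted list of (color, alpha) pairs, where alpha already includes
    the 2D Gaussian falloff evaluated at this pixel."""
    color = np.zeros(3)
    transmittance = 1.0
    for c_k, alpha_k in gaussians_sorted:
        color += transmittance * alpha_k * np.asarray(c_k)
        transmittance *= 1.0 - alpha_k
        if transmittance < 1e-4:  # stop once the pixel is nearly opaque
            break
    return color
```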
The Problem: Screen-Space Dilation
If a projected Gaussian is smaller than one pixel, it can fall between samples and leave holes. To avoid this, the original 3DGS applies a fixed screen-space blur, called 2D dilation, by adding \(s\mathbf{I}\) to every projected covariance.
While this stabilizes training, it also introduces scale-specific artifacts.
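In code, the dilation is a one-liner; a sketch, with `s = 0.3` as an assumed value of the fixed constant:

```python
import numpy as np

def dilate_2d(cov_2d, s=0.3):
    """Screen-space dilation as in the original 3DGS rasterizer: add a fixed
    isotropic term so no splat falls below roughly one pixel. There is no
    renormalization, so small splats contribute more energy than they should."""
    return cov_2d + s * np.eye(2)
```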
The Core Issue: Zoom-In vs. Zoom-Out
A proper Gaussian and a degenerate ultra-thin one can render almost identically at the training scale because of fixed dilation. This leads to a shrinkage bias—training often produces many ultra-small Gaussians.
Zoom-In: Erosion & High-Frequency Noise
When zooming in, projected sizes grow, but the dilation remains fixed (now negligible). Thin gaps between Gaussians appear, causing erosion artifacts and noise:
- Thin objects look unnaturally sparse.
- High-frequency speckling emerges.
Zoom-Out: Dilation, Brightness & Aliasing
Zooming out makes the projected Gaussians smaller, while the fixed dilation stays the same size in pixels and comes to dominate:
- Dilation artifacts: thin details appear bloated.
- Energy spread: dilation enlarges each splat without renormalizing it, so brightness becomes artificially high.
- Aliasing: high-frequency details clash with pixel sampling, causing jaggedness.
Mip-Splatting: The Two-Part Solution
1. The 3D Smoothing Filter — Zoom-In Remedy
Grounded in the Nyquist-Shannon sampling theorem, the authors limit the smallest resolvable detail of each Gaussian to what the training views could actually have captured.
Finding Sampling Limits
For a camera with focal length \(f\) (in pixels), a Gaussian at depth \(d\) is sampled with a world-space interval of

\[
\hat{T} = \frac{d}{f}
\]

From all \(N\) training cameras that can see the Gaussian, they keep the highest sampling frequency:

\[
\hat{\nu}_k = \max\left(\left\{\mathbb{1}_n(\mathbf{p}_k)\cdot\frac{f_n}{d_n}\right\}_{n=1}^{N}\right)
\]

where \(\mathbb{1}_n(\mathbf{p}_k)\) indicates whether Gaussian \(k\) lies inside camera \(n\)'s view frustum, and \(\hat{T} = 1/\hat{\nu}_k\).
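A sketch of that bookkeeping, assuming each camera \(n\) contributes the frequency \(f_n/d_n\) whenever the Gaussian lies in its frustum; the camera fields (`R`, `t`, `focal`) and the `in_frustum` test are hypothetical stand-ins:

```python
import numpy as np

def max_sampling_freq(p_k, cameras, in_frustum):
    """nu_hat for one Gaussian: the finest sampling rate f_n / d_n among all
    training cameras whose frustum contains the Gaussian's center p_k."""
    freqs = []
    for cam in cameras:
        p_cam = cam.R @ p_k + cam.t      # world -> camera coordinates
        if p_cam[2] > 0 and in_frustum(cam, p_cam):
            freqs.append(cam.focal / p_cam[2])
    # Gaussians visible to no camera would be pruned in practice.
    return max(freqs) if freqs else 0.0
```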
Applying the Filter
They convolve each Gaussian \(\mathcal{G}_k\) with a low-pass Gaussian \(\mathcal{G}_{\text{low}}\) whose size is set by that sampling limit:

\[
\mathcal{G}_k(\mathbf{x})_{\text{reg}} = \bigl(\mathcal{G}_k \otimes \mathcal{G}_{\text{low}}\bigr)(\mathbf{x})
\]

Because convolving two Gaussians simply adds their covariances, the result has a closed form:

\[
\mathcal{G}_k(\mathbf{x})_{\text{reg}} = \sqrt{\frac{|\boldsymbol{\Sigma}_k|}{\bigl|\boldsymbol{\Sigma}_k + \frac{s}{\hat{\nu}_k^{2}}\mathbf{I}\bigr|}}\; e^{-\frac{1}{2}(\mathbf{x}-\mathbf{p}_k)^\top \bigl(\boldsymbol{\Sigma}_k + \frac{s}{\hat{\nu}_k^{2}}\mathbf{I}\bigr)^{-1} (\mathbf{x}-\mathbf{p}_k)}
\]

where \(s\) is a scalar hyperparameter controlling the filter size.
This ensures no Gaussian is sharper than physically possible given the best training view—eliminating erosion and high-frequency noise.
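Putting this together, a sketch of the smoothing step, under the assumption that the normalization factor can be folded into the opacity (since the Gaussian value multiplies opacity during blending); `s` is the scalar hyperparameter from the equation above:

```python
import numpy as np

def smooth_gaussian_3d(cov_k, alpha_k, nu_hat, s=0.2):
    """3D smoothing filter: convolve with an isotropic low-pass Gaussian
    whose scale is set by the maximal sampling rate nu_hat."""
    cov_reg = cov_k + (s / nu_hat**2) * np.eye(3)  # convolution = covariance addition
    # The determinant ratio keeps the Gaussian's energy consistent; we fold
    # it into alpha, since the Gaussian value scales opacity when blending.
    scale = np.sqrt(np.linalg.det(cov_k) / np.linalg.det(cov_reg))
    return cov_reg, alpha_k * scale
```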
Figure 3: Different cameras provide different sampling intervals. The smallest interval sets the maximum resolvable detail.
2. The 2D Mip Filter — Zoom-Out Remedy
To replace fixed dilation, the authors introduce a physically-driven anti-aliasing filter.
Inspired by mipmapping, this models how a camera pixel integrates light over its area. The ideal reconstruction would use a box filter over the pixel footprint; instead, they approximate it with a 2D Gaussian sized to a single pixel:

\[
\mathcal{G}^{2D}_k(\mathbf{x})_{\text{mip}} = \sqrt{\frac{|\boldsymbol{\Sigma}^{2D}_k|}{\bigl|\boldsymbol{\Sigma}^{2D}_k + s\mathbf{I}\bigr|}}\; e^{-\frac{1}{2}(\mathbf{x}-\mathbf{p}_k)^\top \bigl(\boldsymbol{\Sigma}^{2D}_k + s\mathbf{I}\bigr)^{-1} (\mathbf{x}-\mathbf{p}_k)}
\]
Unlike dilation, this matches pixel grid spacing and prevents aliasing without excessive blurring or brightness inflation.
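Contrast this with the dilation sketch earlier: the same isotropic addition in screen space, but with the renormalization that keeps a splat's energy constant, so shrinking splats fade out rather than brighten. Again a sketch, with the normalization folded into opacity and `s = 0.1` as an assumed default:

```python
import numpy as np

def mip_filter_2d(cov_2d, alpha_k, s=0.1):
    """2D Mip filter: approximate a one-pixel box filter with a Gaussian.
    Unlike plain dilation, the determinant ratio renormalizes the splat so
    its total energy does not inflate as the projected Gaussian shrinks."""
    cov_mip = cov_2d + s * np.eye(2)
    scale = np.sqrt(np.linalg.det(cov_2d) / np.linalg.det(cov_mip))
    return cov_mip, alpha_k * scale
```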
Experiments & Results
Zoom-Out Test: Blender Dataset
Models are trained at full resolution and rendered at lower resolutions to simulate zooming out:
Table 1: PSNR drops sharply for 3DGS when zooming out; Mip-Splatting remains high.
Figure 4: Mip-Splatting retains fine structure at low resolutions; others blur or distort.
Zoom-In Test: Mip-NeRF 360 Dataset
Trained at \(1/8\) resolution, rendered at higher scales:
Table 2: Mip-Splatting delivers clean detail across upscales; others suffer erosion or noise.
Figure 5: Mip-Splatting avoids artifacts and matches ground truth closely.
In-Distribution Performance
On standard same-scale benchmarks, Mip-Splatting matches 3DGS performance—proving it doesn’t sacrifice quality when scale remains unchanged.
Conclusion
Mip-Splatting exemplifies excellent research: identify a critical flaw, trace its cause, and implement a principled fix.
By replacing ad-hoc screen-space dilation with:
- a 3D smoothing filter, which constrains scene detail to the sampling limits of the training views, fixing zoom-in artifacts, and
- a 2D Mip filter, which provides physically grounded anti-aliasing, fixing zoom-out artifacts,
Mip-Splatting makes 3DGS adaptable to arbitrary scales—a necessity for VR, games, and visual effects where camera movement is unrestricted.
With Mip-Splatting, zoom-ins and zoom-outs preserve the stunning clarity of 3DGS, no matter the viewpoint.