3D Gaussian Splatting has been turning heads in the computer graphics community by enabling photorealistic scene reconstruction and real-time rendering from a set of input photographs. The technique models a scene using millions of tiny, semi-transparent, colored blobs, known as 3D Gaussians, each contributing to the final picture.

The catch? These reconstructed scenes are huge, often weighing in at multiple gigabytes. That makes them tricky to stream, impractical for mobile devices, and hard to integrate into VR/AR or games where every megabyte and millisecond matters.

A team from the Technical University of Munich addressed this challenge in their paper “Compressed 3D Gaussian Splatting for Accelerated Novel View Synthesis”. Their clever multi-stage compression pipeline slashes file sizes by up to 31× and boosts rendering speeds by up to 4×, all while keeping visual quality virtually unchanged.

Figure 1. Side-by-side comparison of uncompressed and compressed renderings. The proposed method compresses a 1.5 GB scene down to 47 MB and raises the framerate from 54 FPS to 93 FPS, with only a negligible PSNR drop.

In this post, we’ll break down the technology behind their results—how sensitivity-aware compression works, how quantization-aware training preserves quality at low bit-rates, and how their redesigned renderer leverages the compact format for maximum speed.


Quick Refresher: 3D Gaussian Splatting in Context

Before tackling compression, let’s recap the underlying technique. For years, Neural Radiance Fields (NeRFs) dominated novel view synthesis—creating new perspectives from input images—by training a neural network to represent a continuous volumetric scene.

While NeRFs can achieve impressive fidelity, they’re slow to train and render because each pixel requires expensive network queries.

In 2023, Kerbl et al. introduced 3D Gaussian Splatting (3DGS), replacing the implicit neural representation with an explicit, point-based one.

Each Gaussian is defined by:

  • Position (\(x\)): its 3D coordinates.
  • Covariance (\(\Sigma\)): shape & orientation, expressed via a rotation quaternion (\(q\)) and a scaling vector (\(s\)).
  • Opacity (\(\alpha\)): transparency.
  • View-dependent Color (SH Coefficients): spherical harmonics parameters encoding RGB color that changes with viewing angle.

Rendering involves projecting each 3D Gaussian into a 2D ellipse:

\[ \Sigma' = J W \Sigma W^{T} J^{T} \]

where \(W\) is the view transform and \(J\) is the Jacobian of the projection. Each pixel then blends the depth-sorted splats, indexed front to back:

\[ C = \sum_{i \in N} c_i \alpha_i \prod_{j=1}^{i-1} (1 - \alpha_j) \]
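To make these two formulas concrete, here is a minimal NumPy sketch of the per-Gaussian math: it builds \(\Sigma\) from a rotation quaternion and scale vector, projects it with the view transform and Jacobian, and composites depth-sorted splats (nearest first) for a single pixel. Function and variable names are illustrative; the real renderer does all of this in parallel on the GPU.

```python
import numpy as np

def covariance_3d(q, s):
    """Build the 3D covariance from a unit quaternion q = (w, x, y, z) and scale vector s."""
    w, x, y, z = q
    R = np.array([
        [1 - 2*(y*y + z*z), 2*(x*y - w*z),     2*(x*z + w*y)],
        [2*(x*y + w*z),     1 - 2*(x*x + z*z), 2*(y*z - w*x)],
        [2*(x*z - w*y),     2*(y*z + w*x),     1 - 2*(x*x + y*y)],
    ])
    S = np.diag(s)
    return R @ S @ S.T @ R.T            # Sigma = R S S^T R^T

def project_covariance(sigma, W, J):
    """Sigma' = J W Sigma W^T J^T; W is the 3x3 view rotation, J the 2x3 projection Jacobian."""
    return J @ W @ sigma @ W.T @ J.T

def blend_pixel(colors, alphas):
    """Front-to-back alpha compositing of depth-sorted splats covering one pixel."""
    C = np.zeros(3)
    T = 1.0                             # accumulated transmittance: prod_j (1 - alpha_j)
    for c, a in zip(colors, alphas):
        C += c * a * T
        T *= (1.0 - a)
        if T < 1e-4:                    # early exit once the pixel is effectively opaque
            break
    return C
```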

This design massively speeds up rendering compared to NeRFs. But storing position, rotation, scale, opacity, and dozens of SH coefficients for millions of Gaussians quickly snowballs into gigabytes of data.
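A quick back-of-the-envelope calculation shows where those gigabytes come from, assuming the degree-3 spherical harmonics and 32-bit floats used by the original 3DGS release, and a scene of roughly five million Gaussians (an assumed but typical count for large outdoor captures):

```python
# Per-Gaussian footprint for uncompressed 3DGS (degree-3 SH, 32-bit floats).
floats_per_gaussian = (
    3       # position
    + 4     # rotation quaternion
    + 3     # scale
    + 1     # opacity
    + 48    # SH color: 16 coefficients x 3 channels
)                                               # 59 floats total
bytes_per_gaussian = floats_per_gaussian * 4    # 236 bytes
num_gaussians = 5_000_000                       # assumed scene size
print(f"{num_gaussians * bytes_per_gaussian / 2**30:.1f} GiB")  # ~1.1 GiB, before any compression
```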


The Compression Pipeline: Prioritize What Matters Most

The main insight driving the compression scheme is that some parameters barely influence the final image, while others are critical. By measuring parameter sensitivity and compressing the low-impact parts more aggressively, the team preserved quality while cutting away data.

Figure 2. End-to-end compression workflow: starting from an optimized 3D Gaussian reconstruction, parameter sensitivity is measured, parameters are clustered into compact codebooks, the scene is fine-tuned at lower precision, and the result is stored with entropy encoding.

Stage 1: Sensitivity-Aware Vector Clustering

The largest storage hogs are:

  • SH coefficients (view-dependent color)
  • Gaussian shape parameters (rotation & scale)

Many Gaussians share similar shapes or colors. This redundancy is exploited via vector quantization: instead of storing every parameter separately, the pipeline builds small codebooks of common colors and shapes and records only a codebook index per Gaussian.

The twist: standard k-means treats all vectors equally. Here, clustering distances are weighted by each parameter's sensitivity:

\[ S(p) = \frac{1}{\sum_{i=1}^{N} P_i} \sum_{i=1}^{N} \left| \frac{\partial E_i}{\partial p} \right| \]

where \(E_i\) is the energy of training image \(i\) and \(P_i\) is its number of pixels. High sensitivity means a large visual impact, so those parameters must be preserved precisely.
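Below is a sketch of how this sensitivity could be accumulated with PyTorch autograd. The `render` function is a stand-in for a differentiable Gaussian renderer and is an assumption, not the paper's code; the loop simply realizes the formula above by summing \(|\partial E_i / \partial p|\) over all training views and normalizing by the total pixel count.

```python
import torch

def parameter_sensitivity(params, training_cameras, render):
    """Approximate S(p) = (1 / sum_i P_i) * sum_i |dE_i / dp| for each parameter tensor.

    `params` maps names to leaf tensors with requires_grad=True;
    `render(camera, params)` is assumed to return an H x W x 3 image."""
    sensitivity = {name: torch.zeros_like(p) for name, p in params.items()}
    total_pixels = 0

    for cam in training_cameras:
        for p in params.values():
            p.grad = None                          # reset gradients per view
        image = render(cam, params)                # differentiable forward pass
        energy = image.sum()                       # E_i: scalar image energy
        energy.backward()                          # populates dE_i / dp
        for name, p in params.items():
            if p.grad is not None:
                sensitivity[name] += p.grad.abs()  # accumulate |dE_i / dp|
        total_pixels += image.shape[0] * image.shape[1]

    return {name: s / total_pixels for name, s in sensitivity.items()}
```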

Figure 3. Sensitivity distributions for SH coefficients in three scenes. Most Gaussians have very low sensitivity, making them prime candidates for compression; only a small fraction are highly sensitive to color changes.

Gaussians whose parameters exceed a sensitivity threshold skip clustering entirely: those values are kept exactly and added directly to the codebook. Separate clustering runs are performed for color and shape, replacing the bulk of the data with lightweight indices.
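The sketch below approximates this stage with scikit-learn's `KMeans`, using its `sample_weight` argument to emphasize Gaussians with a high per-Gaussian sensitivity score and keeping the most sensitive ones out of clustering altogether. The codebook size, the threshold handling, and the use of per-sample (rather than per-dimension) weights are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np
from sklearn.cluster import KMeans

def build_codebook(vectors, sensitivities, n_clusters=4096, keep_threshold=None):
    """Cluster per-Gaussian feature vectors (e.g. SH colors) into a small codebook,
    weighting each vector by its sensitivity. Returns (codebook, per-Gaussian indices)."""
    if keep_threshold is not None:
        sensitive = sensitivities > keep_threshold   # highly sensitive Gaussians bypass clustering
    else:
        sensitive = np.zeros(len(vectors), dtype=bool)

    km = KMeans(n_clusters=n_clusters, random_state=0)
    km.fit(vectors[~sensitive], sample_weight=sensitivities[~sensitive])

    # Codebook = learned centroids plus the exactly-kept sensitive vectors.
    codebook = np.vstack([km.cluster_centers_, vectors[sensitive]])

    indices = np.empty(len(vectors), dtype=np.int32)
    indices[~sensitive] = km.predict(vectors[~sensitive])
    indices[sensitive] = n_clusters + np.arange(sensitive.sum())
    return codebook, indices
```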


Stage 2: Quantization-Aware Fine-Tuning

Clustering is inherently lossy. To claw back some quality, the scene (including codebooks) is fine-tuned on the training images.

The key innovation: quantization-aware training.
They aim to reduce precision from 32-bit floats to compact 8-bit integers (positions use 16-bit floats to avoid quality hits). Simply rounding after training would degrade results. Instead, training simulates 8-bit quantization during the forward pass, but computes gradients on full-precision values.

This teaches parameters to be robust to quantization noise and makes low-bit storage feasible without visible artifacts.
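A common way to realize this is the straight-through estimator: quantize in the forward pass, but let the gradient bypass the rounding. Here is a minimal PyTorch sketch; the per-tensor min-max quantization is an assumption for illustration, and the paper's exact scheme may differ.

```python
import torch

def fake_quantize(x, num_bits=8):
    """Simulate uniform integer quantization in the forward pass while letting
    gradients flow through unchanged (straight-through estimator)."""
    qmin, qmax = 0, 2**num_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = x.min()
    q = ((x - zero_point) / scale).round().clamp(qmin, qmax)   # values on the integer grid
    x_q = q * scale + zero_point                               # de-quantized approximation
    # Forward pass sees x_q; backward pass sees an identity gradient w.r.t. x.
    return x + (x_q - x).detach()
```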


Stage 3: Entropy Encoding with Spatial Ordering

The final step is lossless compression via DEFLATE. To make this more effective, Gaussians are sorted along a Z-order (Morton) curve, grouping spatially adjacent points.

Spatial neighbors often share similar colors and shapes, so ordering them together increases run-length patterns that DEFLATE easily compresses, squeezing out the last redundancies.
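A sketch of the ordering step: quantize positions to an integer grid, interleave the bits of x, y, and z to obtain Morton codes, sort by those codes, and hand the reordered attribute buffers to zlib (Python's DEFLATE implementation). The 21-bit grid resolution is an assumption chosen for illustration.

```python
import numpy as np
import zlib

def _spread_bits(v):
    """Spread the lower 21 bits of each value so two zero bits sit between consecutive bits."""
    v = v.astype(np.uint64) & np.uint64(0x1FFFFF)
    v = (v | (v << np.uint64(32))) & np.uint64(0x1F00000000FFFF)
    v = (v | (v << np.uint64(16))) & np.uint64(0x1F0000FF0000FF)
    v = (v | (v << np.uint64(8)))  & np.uint64(0x100F00F00F00F00F)
    v = (v | (v << np.uint64(4)))  & np.uint64(0x10C30C30C30C30C3)
    v = (v | (v << np.uint64(2)))  & np.uint64(0x1249249249249249)
    return v

def morton_order(positions):
    """Return the permutation that sorts Gaussians along a Z-order (Morton) curve."""
    mins, maxs = positions.min(0), positions.max(0)
    grid = ((positions - mins) / (maxs - mins + 1e-9) * (2**21 - 1)).astype(np.uint64)
    codes = (_spread_bits(grid[:, 0])
             | (_spread_bits(grid[:, 1]) << np.uint64(1))
             | (_spread_bits(grid[:, 2]) << np.uint64(2)))
    return np.argsort(codes)

def compress_scene(positions, attributes):
    """Sort per-Gaussian attributes along the Morton curve, then DEFLATE-compress them."""
    order = morton_order(positions)
    return zlib.compress(attributes[order].tobytes(), level=9)
```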


A Renderer Built for Speed

Compression alone doesn’t guarantee faster rendering—but smaller, well-structured data lends itself to speedups.

Kerbl et al.’s original renderer is a custom CUDA software rasterizer tuned for high-end RTX GPUs; it doesn’t integrate cleanly into game engines and can’t run on devices without CUDA support.

The new renderer uses hardware rasterization via standard API calls (e.g., WebGPU), working even in browsers:

  1. Pre-pass: Cull off-screen Gaussians, compute view-dependent colors, project to 2D ellipses.
  2. Depth Sort: Efficient GPU sort for correct blending.
  3. Rasterize: Draw each Gaussian as a quad scaled to its 2D footprint; fragment shader applies falloff and blends colors.

This offloads work to fixed-function GPU units and benefits from the decreased bandwidth demand of the compressed data.
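Step 3 above boils down to turning each projected covariance \(\Sigma'\) into a screen-space quad. The NumPy illustration below shows one way to compute that footprint from the eigen-decomposition of \(\Sigma'\); in the actual renderer this happens on the GPU, and the 3-sigma cutoff is a common convention rather than necessarily the paper's choice.

```python
import numpy as np

def splat_quad(center_2d, sigma_2d, k=3.0):
    """Return four screen-space corners of a quad that covers the projected splat.

    The quad axes follow the eigenvectors of the 2x2 covariance, scaled to k standard
    deviations (k = 3 covers ~99% of the Gaussian's mass)."""
    eigvals, eigvecs = np.linalg.eigh(sigma_2d)      # sigma_2d is symmetric 2x2
    radii = k * np.sqrt(np.maximum(eigvals, 0.0))    # half-extents along each axis
    axis_a = eigvecs[:, 0] * radii[0]
    axis_b = eigvecs[:, 1] * radii[1]
    return np.array([center_2d + axis_a + axis_b,
                     center_2d + axis_a - axis_b,
                     center_2d - axis_a - axis_b,
                     center_2d - axis_a + axis_b])
```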


Results: Small Files, Fast Frames, High Fidelity

Across synthetic and real-world datasets, compression ratios averaged 26× with PSNR drops of just 0.23 dB—well below the ~0.5 dB threshold for human perception.

Table 1. Compression ratios and quality metrics (PSNR, SSIM, LPIPS) comparing uncompressed 3DGS with the compressed representation across datasets, showing minimal quality loss at massive size reductions.

Side-by-side visualizations show negligible differences:

Figure 4. Synthetic scenes: uncompressed baseline vs. compressed renders; quality is essentially identical.

Figure 5. Real-world scenes: ground truth, baseline 3DGS, and the compressed representation. The compressed renderings retain real-world scene quality.

Even in the worst PSNR case, differences boil down to subtle, barely perceptible color shifts:

Figure 6. The scene with the largest PSNR drop: mean absolute error remains low, the differences are hard to spot, and the compressed version stays visually faithful.

Performance gains are substantial:
On an NVIDIA RTX A5000, the “Bicycle” scene jumps from 93 FPS to 321 FPS—a 3.45× boost. The new pipeline also runs on integrated GPUs where the original couldn’t.

Table 2. Rendering FPS on multiple GPUs, including low-power devices. The compressed format combined with the hardware rasterizer accelerates rendering across devices.


Ablation Study Insights

To verify each pipeline step’s contribution, the team applied them sequentially to a scene:

Table 3. Step-by-step impact of each compression stage on size and PSNR. Color clustering achieves the largest size cut, while quantization-aware fine-tuning is key to recovering the lost quality.

Highlights:

  • Color clustering: Biggest size reduction, biggest quality hit.
  • QA fine-tuning: Recovers much of the lost PSNR while halving size again.
  • Morton order sorting: Provides a final significant size decrease at zero quality cost.

Conclusion: Enabling Practical 3DGS Everywhere

This work proves that 3D Gaussian Splatting’s unwieldy size is not an immutable barrier. With a sensitivity-driven approach, smart quantization, and spatially aware encoding, massive point-based reconstructions can become lightweight, fast, and portable.

An average 26× compression and up to 4× speedup make high-fidelity 3D scenes viable for:

  • Streaming photorealistic environments over the web.
  • Stand-alone VR/AR headsets with limited memory.
  • Game engines needing lightweight yet realistic assets.

The main open challenge is compressing Gaussian positions without visible errors. But by bridging the gap between quality and practicality, this pipeline paves the way toward ubiquitous, real-time, photo-real 3D content across devices and platforms.