Introduction
In the rapidly evolving world of computer graphics and computer vision, few techniques have made as much noise recently as 3D Gaussian Splatting (3DGS). It offered a brilliant alternative to Neural Radiance Fields (NeRFs), allowing for real-time rendering of complex scenes by representing them as millions of anisotropic 3D Gaussians (ellipsoids). It was fast, high-quality, and explicit.
But as with any foundational technology, once the dust settled, researchers began to ask: Is the Gaussian distribution actually the best primitive for the job?
Gaussians are mathematically convenient, but they are rigid. They have “thin tails,” meaning their influence drops off very quickly from the center. To represent complex shapes or large homogeneous regions (like a blue sky), standard 3DGS often has to stack thousands of Gaussians on top of each other. Furthermore, 3DGS is purely additive—it only “splats” positive density onto the screen. It cannot “carve out” or subtract light.
Enter a new contender: Student Splatting and Scooping (SSS).
In this post, we will dive deep into a paper that proposes a fundamental generalization of the 3DGS framework. The authors argue that we shouldn’t be restricted to Gaussians, nor should we be restricted to positive-only splatting. By switching to the Student’s t-distribution and introducing negative densities (Scooping), SSS achieves state-of-the-art rendering quality while using significantly fewer parameters—sometimes reducing the number of required primitives by over 80%.
Let’s unpack how this works, the math behind it, and why it might be the future of neural rendering.
Background: The Limits of 3DGS
To understand why SSS is necessary, we first need to look at what it replaces. 3D Gaussian Splatting represents a scene as a collection of 3D Gaussians. Each Gaussian has a position, a covariance (shape), opacity, and color.
Mathematically, 3DGS views the scene as an unnormalized Gaussian mixture model:
\[
f(\mathbf{x}) \;=\; \sum_{i} w_i \,\exp\!\Big(-\tfrac{1}{2}(\mathbf{x}-\boldsymbol{\mu}_i)^\top \Sigma_i^{-1}(\mathbf{x}-\boldsymbol{\mu}_i)\Big)
\]
Here, \(w_i\) is a weighting factor derived from opacity and color. When rendering an image, these 3D ellipsoids are projected onto the 2D camera plane (a process called splatting) and alpha-blended from front to back.

This formula is essentially a weighted sum. It works well, but it has limitations:
- Rigidity: Gaussians have a fixed “bell curve” shape. They cannot change how “fat” their tails are.
- Additivity: The weights \(w_i\) must be positive. You can only add color to a pixel; you cannot subtract contributions from primitives behind the current one.
This leads to inefficiency. To model a shape that doesn’t perfectly fit a Gaussian (which is most shapes), 3DGS has to use many small Gaussians to approximate the volume.
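To make the "additive only" constraint concrete, here is a minimal per-pixel sketch of the front-to-back alpha blending that splatting performs (plain Python/NumPy, not the actual CUDA rasterizer): each depth-sorted splat can only add color, attenuated by the transmittance accumulated so far.

```python
import numpy as np

def composite_front_to_back(alphas, colors):
    """Blend depth-sorted splats for one pixel; purely additive."""
    color = np.zeros(3)
    T = 1.0                      # transmittance (how much light still passes)
    for a, c in zip(alphas, colors):
        color += T * a * np.asarray(c, dtype=float)
        T *= (1.0 - a)           # each splat only occludes, never reveals
    return color, T
```

With two half-opaque splats, red in front of blue, this yields the color \([0.5, 0, 0.25]\) and a remaining transmittance of \(0.25\): nothing a later splat does can ever reduce what an earlier one contributed.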
The “Student” in SSS: A More Flexible Primitive
The first major contribution of this paper is replacing the Gaussian distribution with the Student’s t-distribution.
You might remember the t-distribution from statistics class as the “cousin” of the Gaussian used when sample sizes are small. However, in this context, its superpower is its learnable degree of freedom, denoted by \(\nu\) (nu).
The parameter \(\nu\) controls the “fatness” of the distribution’s tails.
- When \(\nu \to \infty\), the t-distribution becomes a Gaussian (thin tails).
- When \(\nu = 1\), it is exactly the Cauchy distribution (very fat tails).
This flexibility allows a single primitive to shapeshift. It can be sharp and concentrated, or it can be broad and spread out.

As shown in Figure 1, notice how the red dashed line (\(\nu = 100\), effectively Gaussian) drops to zero very quickly. The green line (\(\nu = 1\)), however, spreads out much further.
Why does this matter for rendering? A “fat-tailed” primitive can cover a larger screen area with higher density than a Gaussian. This means you need fewer of them to represent large, uniform regions like walls or skies.
The mathematical formulation for the 3D Student’s t-distribution used in the paper looks like this:
\[
t_\nu(\mathbf{x};\boldsymbol{\mu},\Sigma) \;=\; \frac{\Gamma\!\big(\tfrac{\nu+3}{2}\big)}{\Gamma\!\big(\tfrac{\nu}{2}\big)\,(\nu\pi)^{3/2}\,|\Sigma|^{1/2}} \Big(1+\tfrac{1}{\nu}(\mathbf{x}-\boldsymbol{\mu})^\top \Sigma^{-1}(\mathbf{x}-\boldsymbol{\mu})\Big)^{-\frac{\nu+3}{2}}
\]
By making \(\nu\), \(\mu\) (position), and \(\Sigma\) (covariance) all learnable, SSS essentially selects the best primitive shape from an infinite family of distributions for every single splat in the scene.
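As a sanity check on this density, here is a small NumPy implementation of the \(d\)-dimensional Student's t (a sketch of mine, with \(d = 3\) for a splat; the variable names are not the paper's). For very large \(\nu\) it collapses to the Gaussian, while small \(\nu\) keeps far more mass in the tails:

```python
import math
import numpy as np

def student_t_pdf(x, mu, Sigma, nu):
    """Density of a d-dimensional Student's t distribution (d = 3 for a splat)."""
    d = len(mu)
    diff = np.asarray(x, dtype=float) - np.asarray(mu, dtype=float)
    maha = diff @ np.linalg.inv(Sigma) @ diff       # squared Mahalanobis distance
    # lgamma avoids the overflow that gamma() hits for large nu
    log_norm = (math.lgamma((nu + d) / 2) - math.lgamma(nu / 2)
                - (d / 2) * math.log(nu * math.pi)
                - 0.5 * math.log(np.linalg.det(Sigma)))
    return math.exp(log_norm) * (1 + maha / nu) ** (-(nu + d) / 2)
```

Evaluated at a point four standard deviations from the mean, \(\nu = 1\) yields over an order of magnitude more density than \(\nu = 10^6\) (effectively Gaussian), which is exactly the fat-tail effect the paper exploits.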
Projection to 2D
For a 3D rendering engine to be fast, we must be able to project these 3D shapes into 2D analytically (closed-form). If we had to numerically integrate every ray, it would be too slow.
Fortunately, the Student’s t-distribution shares a property with Gaussians: it is closed under affine transformations and marginalization. The authors derive the closed-form projection of a 3D t-distribution onto the 2D image plane:
\[
p(\mathbf{x}') \;\propto\; \Big(1+\tfrac{1}{\nu}(\mathbf{x}'-\boldsymbol{\mu}')^\top \Sigma'^{-1}(\mathbf{x}'-\boldsymbol{\mu}')\Big)^{-\frac{\nu+2}{2}}
\]

Here \(\boldsymbol{\mu}'\) and \(\Sigma'\) are obtained from the same local affine (Jacobian) approximation that 3DGS uses, and the degrees of freedom \(\nu\) carry over unchanged: the 2D marginal of a 3D t-distribution is again a t-distribution with the same \(\nu\).
This formula allows SSS to utilize the same efficient rasterization pipeline as 3DGS, maintaining the real-time rendering speed that makes splatting so attractive.
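A sketch of how that projection could look in code, mirroring the usual EWA-splatting convention (the names `W` and `J` and the exact handling of the mean are my paraphrase, not the authors' implementation). The key point is that only \(\mu\) and \(\Sigma\) get transformed; \(\nu\) passes through untouched:

```python
import numpy as np

def project_splat(mu, Sigma, nu, W, J):
    """Project a 3D Student-t splat into the image plane (sketch).

    W: 4x4 world-to-camera transform.
    J: 2x3 Jacobian of the local affine approximation of the perspective
       projection, as in 3DGS/EWA splatting.
    """
    mu_cam = W[:3, :3] @ mu + W[:3, 3]            # mean into camera space
    Sigma_cam = W[:3, :3] @ Sigma @ W[:3, :3].T   # covariance into camera space
    Sigma_2d = J @ Sigma_cam @ J.T                # marginalized 2D scale matrix
    mu_2d = J @ mu_cam   # placeholder; real rasterizers use the full perspective map
    return mu_2d, Sigma_2d, nu                    # nu survives the projection
```

With an identity camera and an axis-aligned Jacobian, the 2D scale matrix is simply the top-left block of the 3D covariance, just as in Gaussian splatting.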
Splatting and Scooping: The Power of Negative Density
The second major innovation is Scooping.
In standard 3DGS, primitives are additive. Imagine painting on a canvas: you can add layers of paint, but you can’t easily scrape paint off to reveal what’s behind it or create a “hole” in the volume.
The authors propose a non-monotonic mixture model. They allow the weights of the components to be negative.
\[
f(\mathbf{x}) \;=\; \sum_{i} w_i \, t_{\nu_i}(\mathbf{x};\boldsymbol{\mu}_i,\Sigma_i), \qquad w_i \in \mathbb{R}
\]
However, implementing this naively (as shown above) creates complexity because interaction terms (\(O(n^2)\)) appear. Instead, the authors stick to the linear formulation but allow the opacity values to dip into the negative range during optimization.
Why Negative Density?
Negative density acts like a boolean subtraction operation in geometry. It allows the model to “scoop” out density from a positive region.
This is incredibly efficient for representing complex topology, like rings or hollow objects. Instead of arranging dozens of positive Gaussians in a circle to create a hole in the middle, you can place one large positive primitive to represent the object and one negative primitive in the center to “scoop” out the hole.
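A 1D toy makes the idea tangible (my own illustration, not from the paper): one broad positive t-component forms the object, a narrower negative one scoops out the middle, and the resulting density peaks away from the center, like a cross-section through a torus.

```python
import math

def t_pdf(x, nu, scale=1.0):
    """1D Student's t density with a scale parameter."""
    z = x / scale
    c = math.gamma((nu + 1) / 2) / (math.sqrt(nu * math.pi) * math.gamma(nu / 2) * scale)
    return c * (1 + z * z / nu) ** (-(nu + 1) / 2)

def ring_density(x):
    # broad positive splat minus a narrow negative "scoop" at the center;
    # a renderer would clamp the total density at zero, as we do here
    return max(0.0, 1.0 * t_pdf(x, nu=2.0, scale=2.0) - 0.6 * t_pdf(x, nu=2.0, scale=0.5))
```

Two components suffice: the density is zero at the origin (the hole) and positive on the ring around it, something an additive-only mixture would need many small components to imitate.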

Figure 2 illustrates this perfectly. Look at panel (d). SSS captures the torus shape with just two components (one positive, one negative). Standard positive-only splatting (panel c) requires at least five components to even begin to approximate the hole, and even then, it’s messy.
When rendering, a negative component essentially subtracts color and opacity from the accumulated ray, allowing for much sharper definitions of edges and empty spaces with fewer total primitives.
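As a rough illustration of that subtraction (a naive sketch; the paper's rasterizer handles signed contributions more carefully than this), we can let the per-splat alpha go negative in the standard blending loop and watch the accumulated color drop:

```python
import numpy as np

def composite_signed(alphas, colors):
    """Front-to-back blending where alpha may be negative (a 'scoop')."""
    color = np.zeros(3)
    T = 1.0
    for a, c in zip(alphas, colors):
        color += T * a * np.asarray(c, dtype=float)  # negative a subtracts color
        T *= (1.0 - a)                               # and raises transmittance
    return color

# a white splat alone vs. the same splat followed by a negative one behind it
bright = composite_signed([0.5], [[1, 1, 1]])
scooped = composite_signed([0.5, -0.4], [[1, 1, 1], [1, 1, 1]])
```

Here the trailing negative component pulls the accumulated intensity from 0.5 down to 0.3, which a purely additive blender could never do.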
Optimization: Taming the Beast with SGHMC
With great power comes great complexity. SSS introduces new learnable parameters (like \(\nu\)) and allows negative densities. This creates a highly coupled optimization landscape.
For example, changing the tail fatness (\(\nu\)) fundamentally changes how the position (\(\mu\)) and covariance (\(\Sigma\)) interact with the loss function. Standard Stochastic Gradient Descent (SGD), which is used in 3DGS, often gets stuck in local minima with this level of coupling. It tends to output distributions that are bunched up rather than exploring the full potential of the t-distribution.
To solve this, the authors employ Stochastic Gradient Hamiltonian Monte Carlo (SGHMC).
Sampling via Physics
SGHMC treats the optimization variable \(\theta\) (parameters) as a particle moving through a landscape defined by the loss function. It introduces auxiliary variables: momentum (\(r\)) and friction.
\[
d\theta = r\,dt, \qquad dr = -\nabla U(\theta)\,dt \;-\; C\,r\,dt \;+\; \mathcal{N}\big(0,\; 2(C-\hat{B})\,dt\big)
\]

Here \(U\) is the loss (the potential energy), \(C\) is the friction coefficient, and \(\hat{B}\) estimates the noise of the stochastic gradient.
The system evolves according to physical dynamics. The momentum term allows the parameters to “coast” over small bumps in the loss landscape (escaping local minima), while the friction term ensures the system eventually settles down (converges).
The update rules derived in the paper are:
\[
\theta_{t+1} = \theta_t + \epsilon\, r_t, \qquad r_{t+1} = r_t - \epsilon \nabla \tilde{U}(\theta_t) - \epsilon\, \sigma(o)\, C\, r_t + \mathcal{N}\big(0,\; 2\epsilon\, \sigma(o)\,(C-\hat{B})\big)
\]

with step size \(\epsilon\) and an opacity-dependent gate \(\sigma(o)\) on the friction and noise terms.
Here, \(N\) represents Gaussian noise injected into the system. This noise is crucial—it turns the optimization into a sampling process, allowing the model to explore different configurations of \(\nu\) and \(\mu\) rather than just greedily rushing to the nearest solution.
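The flavor of these updates is easy to reproduce on a toy problem (a sketch with made-up constants, not the paper's hyperparameters): a scalar parameter with momentum rolls toward the minimum of a quadratic loss while friction damps it and a little injected noise keeps it exploring.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad_U(theta):
    """Gradient of a toy loss U(theta) = 0.5 * (theta - 3)^2."""
    return theta - 3.0

theta, r = 0.0, 0.0      # parameter and its momentum
eps, C = 0.1, 1.0        # step size and friction coefficient
noise_scale = 0.01       # kept small so the demo settles near the minimum
for _ in range(500):
    r += -eps * grad_U(theta) - eps * C * r \
         + noise_scale * np.sqrt(2 * C * eps) * rng.normal()
    theta += eps * r
```

The friction term is what separates this from plain momentum SGD: without it the injected noise would keep the particle bouncing forever; with it the system converges while still jittering enough to hop out of shallow minima.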
Friction Scheduling
The authors use an adaptive scheme (the sigmoid function \(\sigma(o)\) in the equation above). They only apply the friction and noise to components with very low opacity. This essentially tells the system: “If a splat is transparent and useless, shake it up and move it around aggressively. If a splat is solid and useful, let it refine its position carefully.”
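In code, such a gate could look like the following (purely illustrative; the constants and even the exact form of the gate are my guesses, not the paper's): it returns roughly 1 for near-transparent splats, which therefore get the full friction-plus-noise treatment, and roughly 0 for opaque ones.

```python
import math

def exploration_gate(opacity, sharpness=100.0, threshold=0.05):
    """~1 for near-transparent splats (shake them), ~0 for solid ones (refine)."""
    return 1.0 / (1.0 + math.exp(-sharpness * (threshold - opacity)))
```

Multiplying the friction and noise terms of the update by this gate is what makes the exploration selective: solid, useful splats follow nearly deterministic gradient dynamics, while useless ones get kicked around the scene.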
Recycling Components
Because SSS is so efficient, many components eventually become transparent (useless). Instead of just killing them (as in 3DGS), SSS recycles them. It identifies high-opacity components that need more detail and moves the transparent components to that location.
To ensure this move doesn’t disrupt the image, they minimize the difference in the integrated color distribution before and after the split, which involves some heavy math with Beta functions (the full derivation is in the paper).

This principled approach ensures that the total density remains consistent even as the model dynamically rearranges its primitives.
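Conceptually, the recycling step can be sketched like this (heavily simplified: the real method matches the integrated color via Beta-function identities and also redistributes opacity, both of which this toy skips):

```python
import numpy as np

def recycle_dead_splats(positions, opacities, thresh=0.01, rng=None):
    """Move near-transparent splats next to high-|opacity| ones."""
    rng = rng or np.random.default_rng(0)
    dead = np.abs(opacities) < thresh
    if not dead.any():
        return positions
    # pick destinations proportionally to |opacity| (solid splats attract)
    p = np.abs(opacities) / np.abs(opacities).sum()
    targets = rng.choice(len(opacities), size=int(dead.sum()), p=p)
    out = positions.copy()
    out[dead] = positions[targets] + 0.01 * rng.normal(size=(int(dead.sum()), positions.shape[1]))
    return out
```

The effect is that the component budget stays fixed while capacity flows from transparent regions to the parts of the scene that still need detail.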
Experiments and Results
So, does all this math translate to better pictures? The answer is a resounding yes.
The researchers tested SSS against standard 3DGS and other state-of-the-art variants (like Mip-NeRF 360, 3DHGS, and GES) across multiple datasets.
1. Visual Quality
SSS consistently recovers finer details and handles high-frequency textures better than the baselines.

In Figure 3, zoom in on the truck windshield in column (d). SSS is the best at restoring the reflection. In column (a), SSS captures the subtle indentations of the box lid that other methods smooth over.
2. Parameter Efficiency (The Killer Feature)
This is where SSS truly shines. Because of the flexible t-distribution and scooping capability, SSS can represent scenes with far fewer components.
The authors ran experiments where they artificially limited the number of allowed components.

As shown in Figure 4, the SSS curve (cyan) stays high even as the component number (x-axis) decreases. On the Tanks & Temples dataset (middle graph), SSS with low component counts matches the quality of 3DGS with high component counts.
Quantitatively, SSS achieves comparable results to 3DGS while reducing the number of components by as much as 82% in some scenes. This suggests a massive potential for compression and lightweight rendering applications (e.g., on mobile devices).
3. Visualizing Efficiency
We can see this efficiency in action in the following comparison.

In Figure 5, look at the sky and the distant mountains. With restricted components (top rows), standard 3DGS produces blurry, noisy artifacts. GES (Generalized Exponential Splatting) smooths things out too much. SSS, however, separates the sky from the hill clearly and retains details, even with a “budget” of only 252k components.
4. Quantitative Metrics
The tables confirm the visual analysis. On the Mip-NeRF 360 dataset, SSS achieves the highest PSNR (Peak Signal-to-Noise Ratio) and SSIM (Structural Similarity), and the lowest LPIPS (perceptual error).

Note in Table 1 that SSS beats the highly optimized “3DGS-MCMC” method, proving that the change in primitive (Student’s t) and the addition of negative density provide benefits beyond just better sampling.
Ablation Studies: What matters?
The authors broke down their contributions to see what drives the performance.

- SGD + Positive t-distribution: Better than 3DGS, proving t-distributions are inherently better than Gaussians.
- SGHMC + Positive t-distribution: Significant jump in quality, proving the new sampler is necessary to train these flexible distributions.
- Full Model (with Negative Density): The best performance, confirming that “scooping” adds the final layer of expressivity.
Sampling Effects
One interesting analysis in the paper is visualizing how the optimization algorithms explore the parameter space.

Figure 4 (from the supplementary material) shows the distribution of the learned degrees of freedom (\(\nu\)).
- The red line (SGD) clusters heavily around specific values. It gets stuck.
- The cyan line (SSS/SGHMC) spreads out. It explores the full range of the t-distribution family, finding the optimal tail fatness for different parts of the scene. This confirms that the SGHMC sampler is successfully decoupling the parameters and avoiding mode collapse.
Conclusion and Implications
Student Splatting and Scooping (SSS) represents a significant maturation of the Gaussian Splatting paradigm. By observing that 3DGS is just a specific, limited instance of a mixture model (Gaussian-only, positive-only), the authors have opened the door to much more expressive neural representations.
Key Takeaways:
- Don’t be normal: The Student’s t-distribution generalizes the Gaussian, offering a “fatness” parameter that drastically improves parameter efficiency.
- Less is more: By using negative densities (“Scooping”), the model can carve out topology using fewer primitives than it would take to build the surrounding volume additively.
- Physics helps learning: Sophisticated sampling methods like SGHMC are essential when models become more complex and parameters become coupled.
For students and researchers in neural rendering, SSS points toward a future where our primitives are smarter, our mixtures are non-monotonic, and our renderings are more efficient than ever. As 3DGS continues to be integrated into everything from VR to autonomous driving, the parameter efficiency gains offered by SSS could be the key to deploying these models on constrained hardware.