Introduction
In the rapidly evolving world of computer vision, the quest to reconstruct reality inside a computer has seen massive leaps in just a few years. We started with photogrammetry, moved to the revolutionary Neural Radiance Fields (NeRFs), and most recently arrived at 3D Gaussian Splatting (3DGS).
3DGS changed the game by enabling real-time rendering and fast training speeds that NeRFs struggled to achieve. It represents a scene not as a continuous volume, but as millions of discrete 3D Gaussian “blobs.” While this works incredibly well for organic, fuzzy structures, it hits a wall when dealing with the man-made world. Look around you: walls, tables, screens, and buildings are defined by sharp edges and flat surfaces. Gaussians, by their nature, are soft, round, and diffuse. Trying to represent a sharp cube with round blobs is like trying to build a Lego house out of water balloons; you need an excessive number of them to approximate the flat sides, and the result is still never quite right.
This limitation leads to memory bloat and visual artifacts where hard edges should be. But what if we changed the fundamental building block? What if, instead of fuzzy blobs, we used a primitive that naturally understands edges and volumes?
This is the core contribution of 3D Convex Splatting (3DCS).

In this post, we will take a deep dive into a new research paper that proposes replacing Gaussians with 3D Smooth Convexes. We will explore how these new primitives allow for sharper edges, better surface representation, and reduced memory usage while maintaining the real-time rendering speeds we’ve come to expect.
Background: The “Fuzzy” Problem of Gaussians
To understand why 3D Convex Splatting is necessary, we first need to look at the limitations of the current state-of-the-art: 3D Gaussian Splatting (3DGS).
3DGS represents a scene using millions of 3D Gaussians. Each Gaussian is essentially an ellipsoid defined by its position, rotation, scale, opacity, and color. To render an image, these 3D ellipsoids are projected onto the 2D camera plane (splatted) and blended together.
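To make that parameterization concrete, here is a minimal sketch of the per-primitive state (the class name and field layout are illustrative, not the paper's actual code):

```python
from dataclasses import dataclass
import numpy as np

@dataclass
class Gaussian3D:
    """One splatting primitive, holding exactly the parameters listed above."""
    position: np.ndarray  # (3,) center of the ellipsoid in world space
    rotation: np.ndarray  # (4,) unit quaternion giving its orientation
    scale: np.ndarray     # (3,) per-axis extent of the ellipsoid
    opacity: float        # blending weight in [0, 1]
    color: np.ndarray     # (3,) RGB (real systems store spherical-harmonic coefficients)
```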
While efficient, Gaussians suffer from two major geometric limitations:
- Lack of Physical Boundaries: A Gaussian function theoretically extends infinitely (though it drops off quickly). It doesn’t have a hard “stop.” This makes it terrible at representing a flat wall or a sharp corner.
- The Sphere Packing Problem: Imagine trying to fill a square box with tennis balls. No matter how tightly you pack them, there will always be gaps between the balls and the corners of the box. To fill the gaps, you need smaller and smaller balls. In 3DGS, this means the system must spawn millions of tiny Gaussians just to create the illusion of a solid, flat surface.
This results in a tradeoff: either you accept “fuzzy” edges, or you explode your memory usage by using millions of primitives to simulate sharpness.
Enter the Convex
The researchers behind 3D Convex Splatting propose a solution rooted in geometry: The 3D Smooth Convex.
A convex shape is a volume where, for any two points inside the shape, the line connecting them is also entirely inside the shape. Think of cubes, pyramids, or dodecahedrons. Unlike Gaussians, convex shapes can have flat sides and sharp corners. By using these as our rendering primitives, we can represent a table or a wall with just a handful of large, flat convexes rather than thousands of tiny Gaussians.
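In symbols, a set \(C\) is convex when every line segment between two of its points stays inside it:

\[
x, y \in C \;\Rightarrow\; t\,x + (1 - t)\,y \in C \quad \text{for all } t \in [0, 1]
\]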

As shown in the figure above, the efficiency gain is massive. To get a sharp representation of a chair leg using Gaussians, you need a dense cloud of them. With convexes, a single elongated shape can represent the entire leg perfectly.
The Core Method: 3D Convex Splatting
So, how do we actually implement this? We can’t just throw standard triangle meshes into a splatting pipeline because we need the system to be differentiable. We need to be able to tweak the parameters of the shapes slightly and see how that changes the image, allowing the computer to “learn” the shape of the scene.
The method relies on a sophisticated pipeline that defines these shapes mathematically, projects them to 2D, and renders them.

1. Defining 3D Smooth Convexes
The paper builds on CvxNet, a method that defines a convex shape as an intersection of half-spaces (planes). However, 3DCS takes a slightly different approach to make the representation compatible with splatting.
Instead of defining planes directly, a 3D convex in this method is defined by a set of 3D points (let’s say \(K\) points). The shape is the “convex hull” of these points—imagine wrapping a rubber band around a bunch of nails; the shape inside the rubber band is the convex hull.
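Formally, the convex hull is the set of all convex combinations of those \(K\) points:

\[
\mathrm{conv}\{x_1, \dots, x_K\} = \left\{ \sum_{i=1}^{K} t_i\, x_i \;\middle|\; t_i \ge 0,\; \sum_{i=1}^{K} t_i = 1 \right\}
\]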
The Math of Smoothness
Standard convex hulls have infinitely sharp corners, which can be problematic for optimization (gradients don’t flow well through sharp discontinuities). To solve this, the authors use a Smooth Approximate Signed Distance Function.
First, they define the signed distance \(L_j(p)\) from a point \(p\) to the \(j\)-th plane of the hull. For a plane with normal \(n_j\) and offset \(b_j\), this takes the standard form (negative inside the half-space, positive outside):

\[
L_j(p) = n_j \cdot p + b_j
\]
Then, to create the “smooth” hull, they aggregate the distances from all the planes with a LogSumExp function, which acts as a soft maximum and softens the intersections between the planes:

\[
\Phi(p) = \frac{1}{\delta} \log \sum_{j} \exp\big(\delta\, L_j(p)\big)
\]
Here, the parameter \(\delta\) (delta) controls the smoothness.
- High \(\delta\): The shape approaches a hard polyhedron with sharp corners.
- Low \(\delta\): The corners become rounded and soft.
The Indicator Function (Sharpness)
Once we have the shape defined, we need to know how “dense” it is. Is it a solid object, or is it foggy? This is controlled by the Indicator Function, which passes the smooth signed distance through a sigmoid:

\[
C(p) = \operatorname{sigmoid}\big(-\sigma\, \Phi(p)\big)
\]
Here, the parameter \(\sigma\) (sigma) controls the sharpness of the boundary.
- High \(\sigma\): The transition from “inside” to “outside” the shape is instant (like a solid wall).
- Low \(\sigma\): The transition is gradual (like a cloud).
By combining these two parameters, the system can represent a huge variety of shapes, from hard cubes to soft, foggy blobs.
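To see how \(\delta\) and \(\sigma\) interact, here is a minimal NumPy sketch of the two equations above. The function name and array layout are my own; the paper's implementation lives in CUDA:

```python
import numpy as np

def smooth_convex_density(points, normals, offsets, delta, sigma):
    """Evaluate the smooth convex indicator at a batch of query points.

    points:  (N, 3) query positions
    normals: (J, 3) outward unit normals of the hull planes
    offsets: (J,)   plane offsets, so L_j(p) = n_j . p + b_j
    delta:   smoothness (high -> sharp polyhedral corners)
    sigma:   sharpness (high -> hard inside/outside boundary)
    """
    # Signed distance to every plane: negative inside, positive outside.
    L = points @ normals.T + offsets                      # (N, J)
    # LogSumExp acts as a numerically stable soft maximum over the planes.
    phi = np.logaddexp.reduce(delta * L, axis=1) / delta  # (N,)
    # The sigmoid turns the smooth signed distance into a density in [0, 1]:
    # well inside the hull phi < 0, so the density approaches 1.
    return 1.0 / (1.0 + np.exp(sigma * phi))
```

Sweeping \(\delta\) and \(\sigma\) up and down morphs the same point set from a soft, foggy blob into a hard polyhedron, which is exactly the flexibility described above.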

2. The Splatting Process: From 3D to 2D
In rendering, we don’t see the 3D shape directly; we see its projection on a 2D screen. 3DGS projects 3D ellipsoids into 2D ellipses. 3DCS needs to project 3D convex hulls into 2D convex polygons.
Calculating the 3D hull and then projecting it is computationally expensive. The authors use a clever shortcut:
- Project the Points: Take the \(K\) points that define the 3D shape and project them individually onto the 2D camera plane using standard perspective projection.
- Compute the 2D Hull: Once the points are in 2D, use the Graham Scan algorithm to find the convex hull of these 2D dots, outlining the shape on the screen (see the sketch after this list).
- Calculate the 2D Indicator: The math used for the 3D shape (LogSumExp) is reused in 2D. The lines connecting the 2D hull points act as the “planes,” allowing the system to compute the color and opacity of every pixel inside the shape.
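Here is a compact sketch of the first two steps, assuming camera-space points and a simple pinhole model. The paper names the Graham Scan; the monotone-chain routine below is a closely related construction that is easier to write compactly:

```python
import numpy as np

def project_points(pts_3d, focal):
    """Pinhole projection of (K, 3) camera-space points to (K, 2) image coordinates."""
    return focal * pts_3d[:, :2] / pts_3d[:, 2:3]  # divide x, y by depth z

def convex_hull_2d(pts_2d):
    """Counter-clockwise convex hull of (K, 2) points (monotone-chain variant)."""
    pts = sorted(map(tuple, pts_2d))
    def cross(o, a, b):  # z-component of (a - o) x (b - o)
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])
    lower, upper = [], []
    for p in pts:                      # build the lower chain left to right
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):            # build the upper chain right to left
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Each chain repeats the other's endpoints, so drop the last point of each.
    return np.array(lower[:-1] + upper[:-1])
```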
The 2D indicator reuses the same LogSumExp-plus-sigmoid construction, with the hull edges now playing the role of the planes. A depth term \(d\) rescales the smoothness and sharpness parameters so that a convex keeps a consistent apparent sharpness as it moves toward or away from the camera, ensuring perspective correctness.
3. Rasterization
Finally, we have a set of 2D shapes on the screen. The rasterizer sorts them by depth (closest to the camera first) and blends their colors front-to-back using the standard alpha-blending formula found in NeRF and 3DGS:

\[
C = \sum_{i=1}^{N} c_i\, \alpha_i \prod_{j=1}^{i-1} \big(1 - \alpha_j\big)
\]
This step is fully differentiable and implemented in CUDA for high performance. This means the system can compare the rendered image to a ground truth photo, calculate the error, and update the positions of the 3D points, their colors, and their opacity.
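As a reference for what the rasterizer computes per pixel, here is the blending loop in plain NumPy (a didactic sketch; the real implementation is a tile-based CUDA kernel):

```python
import numpy as np

def composite_pixel(colors, alphas, depths):
    """Front-to-back alpha blending of the splats covering one pixel.

    colors: (N, 3) RGB contribution of each splat at this pixel
    alphas: (N,)   opacity of each splat at this pixel
    depths: (N,)   depth of each splat (smaller = closer to the camera)
    """
    pixel = np.zeros(3)
    transmittance = 1.0                       # light not yet absorbed
    for i in np.argsort(depths):              # iterate closest-first
        pixel += transmittance * alphas[i] * colors[i]
        transmittance *= 1.0 - alphas[i]
        if transmittance < 1e-4:              # early exit once effectively opaque
            break
    return pixel
```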
Optimization: Adaptive Densification
A scene is rarely simple enough to be represented by the initial random set of convexes. The system needs to add more detail where necessary.
In 3DGS, Gaussians are cloned or split. 3DCS uses a similar but geometrically distinct approach. When the optimization detects that a convex isn’t representing an area well enough (based on gradients), it triggers a Split.

A single convex defined by \(K\) points is split into \(K\) new, smaller convexes. The centers of these new shapes correspond to the points of the original shape. This ensures that the new shapes roughly cover the same volume but now have the freedom to move independently and capture finer detail.
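A minimal sketch of that split rule (the shrink factor is a hypothetical choice; the paper only states that the children's centers coincide with the parent's points):

```python
import numpy as np

def split_convex(points, shrink=0.5):
    """Split one convex, given as (K, 3) defining points, into K smaller convexes."""
    center = points.mean(axis=0)
    local = (points - center) * shrink  # a shrunken copy of the parent's shape
    # Re-center one shrunken copy on each of the parent's K points, so the
    # children jointly cover roughly the parent's volume.
    return [local + p for p in points]
```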
To ensure the system converges, they use a loss function that combines L1 distance (pixel difference), SSIM (structural similarity), and a mask loss to minimize the number of primitives used.
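Schematically, with placeholder weights \(\lambda\) (the paper fixes the exact values):

\[
\mathcal{L} = (1 - \lambda)\,\mathcal{L}_1 + \lambda\,\mathcal{L}_{\text{SSIM}} + \lambda_{\text{mask}}\,\mathcal{L}_{\text{mask}}
\]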

Experiments and Results
Does this geometric complexity pay off? The researchers tested 3DCS against 3DGS, Mip-NeRF360, and other primitive-based methods (like 2DGS and GES) on standard benchmarks: Tanks and Temples, Deep Blending, and Mip-NeRF360.
Geometric Fidelity
The first sanity check is seeing if convexes actually represent shapes better than Gaussians.

As seen in Figure 6, smooth convexes (bottom rows) can form sharp triangles and squares with very few points. Gaussians (top row) struggle to form corners even when you add more of them; they naturally want to be round.
Quantitative Performance
Table 1 summarizes the main results. The metrics used are LPIPS (lower is better, measures perceptual quality), PSNR (higher is better, measures pixel accuracy), and SSIM (higher is better, structural similarity).

Key Takeaways from the Data:
- Superior Quality: 3DCS achieves better (lower) LPIPS scores than 3DGS on almost all datasets. This means the images look more natural to the human eye.
- Memory Efficiency: On the “Tanks&Temples” dataset, 3DCS achieves higher quality while using roughly half the memory (282MB vs 411MB) of 3DGS. The “Light” version of 3DCS uses even less (83MB).
- Speed: While 3DCS is slower to train and render than the blazing fast 3DGS (due to the more complex convex hull calculations), it is still orders of magnitude faster than Mip-NeRF360.
Indoor vs. Outdoor
The method shines brightest in structured environments. In the Mip-NeRF360 dataset, the researchers split the results into Indoor and Outdoor scenes.

For Indoor scenes (tables, walls, rooms), 3DCS significantly outperforms 3DGS. This validates the hypothesis that convex primitives are better suited for man-made geometry.
Visual Comparisons
Numbers are great, but visual quality is king in computer graphics.

In Figure 7, look at the Train row. The background structures in the 3DGS render are blurry and ill-defined. The 3DCS render preserves the hard lines of the architecture. Similarly, in the Flowers row, 3DCS captures the high-frequency detail of the grass better than the smoothed-out representation of 3DGS.
Another striking comparison is the ability to represent scenes with fewer shapes.

This image (Figure 11) is crucial. It shows that 3DCS isn’t just “splatting but different”—it’s a semantically better decomposition. The convexes naturally align with the actual objects (like the leaves or the stump), whereas Gaussians just create a fog that approximates the object.
Efficiency Comparison
Finally, let’s look at the trade-off between the number of parameters and quality (LPIPS).

The red line (3DCS) is consistently below the blue line (3DGS). This means that for any given “budget” of memory or model size, convex splatting gives you a better-looking image.
Conclusion & Implications
3D Convex Splatting represents a maturing of the “splatting” field. While 3D Gaussian Splatting proved that we could render radiance fields in real-time, 3DCS asks the important question: “Is a Gaussian the right shape for the job?”
The answer seems to be: Not always.
By adopting 3D Smooth Convexes, the researchers have created a system that:
- Respects Geometry: It handles hard edges and flat surfaces naturally.
- Saves Memory: It represents volumes more efficiently, requiring fewer primitives.
- Maintains Quality: It outperforms Gaussians in perceptual metrics, particularly in indoor, structured scenes.
While it creates a slight drag on rendering speed compared to the simplicity of Gaussians, the trade-off is often worth it for the visual fidelity gained. This paper paves the way for future hybrid rendering engines—perhaps ones that use convexes for walls and objects, and Gaussians for smoke and fire.
For students and researchers in computer vision, 3DCS is a perfect example of how revisiting fundamental assumptions (like “what primitive should we use?”) can lead to significant performance breakthroughs.
All images and data presented in this post are sourced directly from the research paper “3D Convex Splatting: Radiance Field Rendering with 3D Smooth Convexes”.