Introduction
In the rapidly evolving world of computer vision and computer graphics, Implicit Neural Representations (INRs) have become a cornerstone technology. Whether you are looking at Neural Radiance Fields (NeRFs) for 3D scene reconstruction or novel compression methods for images, INRs—which represent signals as continuous functions parameterized by neural networks—are everywhere.
However, there is a fundamental tension in this field. On one side, we have fully implicit methods (like SIREN), which are compact but slow to train and suffer from “spectral bias” (struggling to learn high-frequency details). On the other side, we have grid-based (or hybrid) representations (like Instant NGP), which are incredibly fast and scalable but rely on discrete feature grids.
Here is the catch: standard grid-based methods typically use linear interpolation to retrieve features between grid points. While efficient, this assumes the underlying world is locally linear. But the real world—full of sharp edges, complex textures, and curvature—is highly nonlinear. By forcing a linear approximation on a nonlinear reality, these models inevitably lose detail.
Enter MetricGrids, a new approach from researchers at Shandong University and North University of China. They asked a simple yet profound question: If linear approximation is the bottleneck, why not use high-order approximations derived from Taylor expansions?

As shown in Figure 1, the difference is not subtle. While baseline grid methods leave behind structured residuals (errors) and blurry details in complex scenes, MetricGrids captures sharp geometries and high-frequency textures with remarkable fidelity.
In this post, we will deconstruct MetricGrids. We will explore how it leverages the mathematical intuition of Taylor series to build “Elementary Metric Grids,” how it compresses these grids using clever hashing, and how a specialized decoder extrapolates high-order details that aren’t explicitly stored.
Background: The Limits of Linearity
To understand the innovation of MetricGrids, we first need to understand the limitation it solves.
The Hybrid/Grid-Based Paradigm
In a modern hybrid INR, the “neural network” doesn’t do all the heavy lifting. Instead, we store a grid of learnable feature vectors \(\mathcal{Z}\). When we want to know the signal value at a continuous coordinate \(\mathbf{x}\) (like a pixel location or a 3D point):
- We find the grid vertices surrounding \(\mathbf{x}\).
- We interpolate the features at those vertices to get a feature vector \(\mathbf{z}_{\mathbf{x}}\).
- We pass \(\mathbf{z}_{\mathbf{x}}\) through a tiny Multi-Layer Perceptron (MLP) decoder to get the final RGB color or density.
This can be formalized as:

\[\hat{f}(\mathbf{x}) = \Phi\big(\mathbf{z}_{\mathbf{x}};\, \theta\big), \qquad \mathbf{z}_{\mathbf{x}} = g(\mathbf{x}; \mathcal{Z})\]

where \(\Phi\) is the small MLP decoder with parameters \(\theta\).
Here, \(g(\cdot)\) is the indexing/interpolation function. In almost every existing method (like Plenoxels, Instant NGP, or TensoRF), \(g\) is a linear interpolation whose weights are based on the linear (\(L_1\)) distance between \(\mathbf{x}\) and the surrounding vertices.
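To make this pipeline concrete, here is a minimal PyTorch sketch of a grid-based INR for a 2D signal. The class name, grid resolution, and feature dimension are illustrative assumptions, not the setup of any particular method:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GridINR(nn.Module):
    """Minimal hybrid INR: a learnable 2D feature grid plus a tiny MLP decoder."""
    def __init__(self, res=64, feat_dim=8, out_dim=3):
        super().__init__()
        # Grid of learnable feature vectors Z, shape (1, feat_dim, res, res).
        self.grid = nn.Parameter(torch.randn(1, feat_dim, res, res) * 0.01)
        self.decoder = nn.Sequential(
            nn.Linear(feat_dim, 64), nn.ReLU(), nn.Linear(64, out_dim)
        )

    def forward(self, x):
        # x: (N, 2) coordinates in [-1, 1]. grid_sample performs the linear
        # interpolation g(x; Z) between the surrounding grid vertices.
        coords = x.view(1, -1, 1, 2)
        z = F.grid_sample(self.grid, coords, mode="bilinear", align_corners=True)
        z = z.view(self.grid.shape[1], -1).t()   # (N, feat_dim) feature z_x
        return self.decoder(z)                   # (N, out_dim), e.g. RGB

model = GridINR()
rgb = model(torch.rand(1024, 2) * 2 - 1)         # query 1024 random points
```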
The Linear Trap
Linear interpolation effectively draws a straight line (or plane) between grid points. If the signal you are modeling curves sharply between those points (e.g., a strand of hair, a sharp shadow, or a texture pattern), a linear interpolator simply cannot represent that variation. The latent space it produces is “degenerate”.
The MLP decoder tries to compensate for this, but it is fighting an uphill battle. It receives a linearly degraded feature and has to “hallucinate” the missing nonlinearity. This leads to the artifacts we often see in NeRFs: aliasing, blurring, and loss of fine geometric detail.
Core Method: MetricGrids
The researchers propose a solution rooted in calculus: The Taylor Expansion.
Recall that any smooth function \(f(x)\) can be approximated near a point \(a\) by a polynomial series:
\[f(x) \approx f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \dots\]

Standard grids only give us the first term (the value). MetricGrids aims to provide the network with the equivalent of the higher-order terms, built from powers of the offset such as \((x-a)^2\) and \((x-a)^3\).
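As a quick numerical sanity check (using \(\sin(x)\) as a stand-in signal), keeping only the constant and linear terms, which is roughly what a linear grid can express, leaves a much larger error than also keeping the quadratic term:

```python
import math

a, x = 1.0, 1.3                      # expansion point and query point
f      = math.sin                    # example signal
f1, f2 = math.cos(a), -math.sin(a)   # first and second derivatives at a

linear    = f(a) + f1 * (x - a)                  # first-order (linear) approximation
quadratic = linear + 0.5 * f2 * (x - a) ** 2     # add the second-order Taylor term

print(abs(f(x) - linear))     # ~0.040
print(abs(f(x) - quadratic))  # ~0.0021, over an order of magnitude smaller
```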

As illustrated in Figure 2, the framework consists of three main pillars:
- Elementary Metric Grids: Storing features in different metric spaces to approximate varying orders of the Taylor series.
- Compact Representation: Using Hash Encoding with varying sparsity levels to save memory.
- High-Order Extrapolation Decoder: A neural architecture designed to multiply these terms together to reconstruct the signal.
1. Elementary Metric Grids
Instead of a single feature grid, MetricGrids utilizes a collection of grids, each defined by a different distance metric.
We construct a set of elementary grids \(\mathcal{Z}^{metrics}\). The first grid uses standard linear distance, but subsequent grids use nonlinear metrics (like squared distance, cubic distance, or even trigonometric metrics).

Here, \(d_p(\mathbf{x}, \mathbf{x}_i) = \|\mathbf{x} - \mathbf{x}_i\|_p^p\) represents the metric.
- Grid 1 (\(d_1\)): Uses \(L_1\) distance (standard linear interpolation). This provides the base approximation.
- Grid 2 (\(d_2\)): Uses \(L_2^2\) (squared distance). This allows the grid to represent quadratic terms, similar to the second-order term in a Taylor series \((x-x_i)^2\).
- Grid 3 (\(d_3\)): Uses higher-order metrics (cubic, etc.).
When we sample a point \(\mathbf{x}\), we fetch features from all these grids simultaneously.

By providing the decoder with features that inherently contain information about \((x-x_i)\), \((x-x_i)^2\), and so on, we are explicitly feeding the network the building blocks of a nonlinear polynomial approximation. The network no longer has to “guess” the curvature; the curvature data is encoded directly in the higher-order grids.
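One way to picture the sampling (a sketch of the idea under an assumed weighting, not the paper's exact interpolation rule): for the order-\(p\) grid, the offset to each neighboring vertex is raised to the \(p\)-th power before the vertex features are blended, so each grid contributes a feature with a different polynomial profile of the offset.

```python
import torch

def sample_metric_grid(grid, x, p):
    """Blend features of a 1D grid using the p-th power of the vertex distance.

    grid: (R, C) learnable features; x: (N,) coords in [0, R-1]; p: metric order.
    A sketch of the elementary-metric-grid idea, not the paper's exact formula.
    """
    x0 = x.floor().long().clamp(0, grid.shape[0] - 2)
    x1 = x0 + 1
    t = (x - x0.float()).unsqueeze(-1)           # fractional offset in [0, 1)
    # p-th power distances to the two vertices: d_p(x, x_i) = |x - x_i|^p
    d0, d1 = t ** p, (1.0 - t) ** p
    # A vertex's weight shrinks as its p-th-power distance grows;
    # p = 1 recovers standard linear interpolation.
    return d1 * grid[x0] + d0 * grid[x1]

grids = [torch.randn(64, 8) for _ in range(3)]    # one grid per metric order
x = torch.rand(1024) * 63
features = [sample_metric_grid(g, x, p) for p, g in enumerate(grids, start=1)]
z = torch.cat(features, dim=-1)                   # concatenated multi-order feature
```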
2. Hash Encoding Based Compact Representation
If we simply added \(M\) grids, we would multiply our memory usage by \(M\). To avoid this, the authors leverage Multiresolution Hash Encoding (popularized by Instant NGP), but with a twist tailored for sparsity.
The Sparsity of Derivatives
In calculus, higher-order derivatives are often zero. For example, if a surface is flat or a constant slope, its second derivative (curvature) is zero; if its curvature is constant, the third derivative is zero. This means the higher-order metric grids (which play the role of these derivative terms) should be sparse.
To exploit this, MetricGrids adjusts the hash table size \(T\) for each grid:
- The base linear grid gets a large hash table.
- Higher-order grids get progressively smaller hash tables (halving the size for each step).
This encourages the model to fuse similar features in the high-order terms and significantly reduces the memory footprint. Furthermore, to prevent “hash collisions” (where different coordinates map to the same memory entry) from mixing up different derivative types, each metric grid uses a separate hashing function (built from different prime numbers \(\pi_j\)).

This results in a compact representation that is roughly the same size as a standard single grid but carries much richer, multi-order information.
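A rough sketch of that layout, following the Instant NGP hashing scheme with an assumed base table size halved per order and a different prime pair per grid (the primes and sizes below are illustrative, not the paper's settings):

```python
import torch

def spatial_hash(ix, iy, primes, table_size):
    # Instant NGP-style spatial hash: XOR of coordinate-times-prime, modulo table size.
    return ((ix * primes[0]) ^ (iy * primes[1])) % table_size

# Per-grid hash tables: the base (linear) grid gets the largest table, and each
# higher-order grid halves the size, reflecting the expected sparsity of the
# higher-order "derivative" information.
base_T, feat_dim = 2 ** 18, 2
prime_sets = [(1, 2654435761), (73856093, 19349663), (83492791, 2654435761)]
tables = [torch.randn(base_T >> j, feat_dim) * 1e-4 for j in range(3)]

ix, iy = torch.randint(0, 1024, (2, 4096))        # integer vertex coordinates
feats = [tables[j][spatial_hash(ix, iy, prime_sets[j], tables[j].shape[0])]
         for j in range(3)]                        # one feature set per metric grid
```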
3. High-Order Extrapolation Decoder
We cannot store an infinite number of grids to represent every term in a Taylor series. We might explicitly store the first 3 orders (linear, quadratic, cubic), but what about the 4th or 5th order terms required for extremely complex details?
The authors propose a High-Order Extrapolation Decoder to predict these missing terms.

The decoder is designed to simulate the multiplication operations found in polynomial expansion. In a Taylor series, higher-order terms are products of lower-order terms (e.g., \(x^4 = x^2 \cdot x^2\)).
The decoder uses a specific layer structure involving the Hadamard product (element-wise multiplication, denoted by \(\circ\)) to mimic this behavior.

Here is how the layer update works:
- Backbone Update (\(\omega_\ell\)): The hidden state \(h\) is processed by a linear layer and activation.

- Modulation (\(\gamma_\ell\)): The feature vector from the specific metric grid \(\mathbf{z}_{\mathbf{x}}^{d_{\ell-1}}\) is processed to act as a modulator.

By multiplying the hidden state with the grid feature at every layer, the network effectively raises the degree of the polynomial approximation. If you have 5 layers of multiplication, you can theoretically approximate a polynomial of a much higher degree, even if you only explicitly stored the first few terms in the grids.
Finally, a linear layer maps the high-order feature \(h_M\) to the output signal (e.g., RGB color):

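Putting the pieces together, a hedged sketch of such a decoder might look like the following, where each layer applies a linear backbone update and then a Hadamard modulation by the corresponding grid feature (layer sizes and the exact wiring are assumptions, not the authors' architecture):

```python
import torch
import torch.nn as nn

class ExtrapolationDecoder(nn.Module):
    """Each layer multiplies (Hadamard) the hidden state by a projected grid
    feature, so stacking layers raises the effective polynomial order."""
    def __init__(self, feat_dim=8, hidden=64, n_layers=3, out_dim=3):
        super().__init__()
        self.inp = nn.Linear(feat_dim, hidden)
        self.backbone = nn.ModuleList([nn.Linear(hidden, hidden) for _ in range(n_layers)])
        self.modulators = nn.ModuleList([nn.Linear(feat_dim, hidden) for _ in range(n_layers)])
        self.out = nn.Linear(hidden, out_dim)

    def forward(self, z_list):
        # z_list[p]: (N, feat_dim) feature sampled from the order-(p+1) metric grid.
        h = torch.relu(self.inp(z_list[0]))
        for layer, mod, z in zip(self.backbone, self.modulators, z_list):
            h = torch.relu(layer(h))   # backbone update, playing the role of omega_l
            h = h * mod(z)             # Hadamard modulation, playing the role of gamma_l
        return self.out(h)             # final linear map to the output signal (e.g. RGB)

decoder = ExtrapolationDecoder()
z_list = [torch.randn(4096, 8) for _ in range(3)]  # features from three metric grids
rgb = decoder(z_list)
```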
Experiments and Results
The researchers tested MetricGrids across the “holy trinity” of INR tasks: 2D Image Fitting, 3D Signed Distance Fields (SDF), and Neural Radiance Fields (NeRF).
1. 2D Image Fitting (Kodak & Gigapixel)
The first test is reconstructing high-resolution 2D images. This isolates the method’s ability to compress and represent high-frequency spatial data.
Qualitative Results: In the Kodak dataset comparison below, look closely at the windows and roof tiles. The baseline (Instant NGP and NeuRBF) output is noisy or blurred. MetricGrids recovers the grid lines of the windows and the distinct separation of tiles, boosting PSNR by several decibels.

Quantitative Results: When compared against Gaussian Splatting methods (which are explicit rather than implicit), MetricGrids holds its own. It significantly outperforms standard 2D Gaussian Splatting and competes with specialized image compression methods like GaussianImage, often with fewer parameters.

The method also scales beautifully to Gigapixel images. In the Tokyo cityscape example below, notice the error maps. The baseline methods (left/bottom) show deep purple/blue errors concentrated around the high-frequency building lights. MetricGrids (bottom right of the block) shows a much more uniform, low-error distribution.

2. 3D Signed Distance Fields (SDF)
SDFs are used to represent 3D shapes. The goal is to learn a function that tells you how far any point in space is from the surface of the object.
MetricGrids was tested on the Stanford 3D Scanning Repository. The results show that it achieves higher Intersection over Union (IoU) and lower normal angle error (NAE) than state-of-the-art competitors like NeuRBF, all while keeping the model size around 1MB.

Visually, this translates to sharper mechanical parts. In the “Engine” reconstruction below, the cooling fins and bolts are crisp, whereas other methods introduce subtle wobbly artifacts or smoothing.

3. Neural Radiance Fields (NeRF)
Perhaps the most demanding task is novel view synthesis. Here, the model must reconstruct a 5D function (spatial location + viewing direction) to render photorealistic images from new angles.
Using the Blender dataset, MetricGrids consistently outperformed baselines like TensoRF, K-Planes, and Instant NGP.

In Figure 7, look at the rigging on the ship (top row). Standard methods (first column) almost completely obliterate the thin lines. MetricGrids (third column) recovers them. Similarly, the reflections on the golden drums (middle row) and the texture of the food (bottom row) are preserved with much higher fidelity.
Ablation: Do we really need multiple grids?
You might wonder: is it the multiple grids helping, or just the decoder? The authors performed an ablation study (Table 5) to verify this.

The results are clear:
- M=2 vs M=3: Adding a second grid (M=2) provides the biggest jump in performance. Adding a third (M=3) helps further, but returns diminish slightly after that.
- Decoder Hierarchy: Removing the hierarchical input to the decoder (“w/o hierarchy”) drops the PSNR by over 1.3 dB, proving that the specialized extrapolation architecture is crucial.
Conclusion
MetricGrids represents a significant step forward for grid-based neural representations. By identifying the theoretical weakness of linear interpolation—the “degenerate linear latent space”—and addressing it with the mathematical elegance of Taylor expansions, the authors have created a method that is both robust and efficient.
The key takeaways are:
- Nonlinearity matters: Simply increasing grid resolution isn’t as effective as changing how the grid approximates the space (using nonlinear metrics).
- Derivatives are sparse: We can afford to store high-order information if we compress it intelligently using hash encoding.
- Extrapolation works: A carefully designed decoder can infer high-order complexity from lower-order inputs.
As we move toward real-time rendering of increasingly complex 3D worlds, techniques like MetricGrids that squeeze more representational power out of limited memory will be essential tools in the graphics engineer’s toolkit.