Breaking the Grid: How Adaptive Lattices Are Revolutionizing Neural Image Compression
We live in a world dominated by visual data. From streaming 4K video to scrolling through Instagram, image compression is the invisible engine keeping the internet running. For decades, standards like JPEG defined this field. But in the last five years, Neural Image Compression—using Deep Neural Networks (DNNs) to encode images—has rapidly surpassed traditional hand-crafted methods.
However, there is a bottleneck in these neural systems: Quantization.
Quantization is the process of rounding continuous values (like the activations in a neural network) into discrete numbers that can be stored as bits. Most systems use a rigid, square grid (Scalar Quantization). While simple, it’s inefficient.
In this post, we are diving deep into a CVPR paper titled “Multirate Neural Image Compression with Adaptive Lattice Vector Quantization.” The researchers propose a method that doesn’t just use a better grid (Lattices) but makes that grid adaptive. Their method allows a single model to handle multiple file sizes and adapt to different types of images (like cartoons vs. photos) without retraining the massive neural network.
Let’s unpack how they did it.
1. The Background: Why Shapes Matter
To understand this paper, we first need to understand the difference between Scalar Quantization (SQ) and Vector Quantization (VQ).
The “Square” Problem (Scalar Quantization)
Imagine you are trying to cover a floor with tiles. In Scalar Quantization, we treat every pixel (or feature) independently. We essentially draw a grid of squares over the data. If a data point falls inside a square, it gets rounded to the center of that square.
This is easy to compute, but geometrically, squares are not the most efficient shape for covering space. That wasted coverage translates into higher quantization error (distortion) for the same amount of data (rate).
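To make this concrete, here is a minimal NumPy sketch of uniform scalar quantization. It is a generic illustration, not the paper's implementation: each coordinate is rounded independently to the centre of its square cell.

```python
import numpy as np

def scalar_quantize(y, step=1.0):
    """Uniform scalar quantization: round every coordinate of y
    independently to the centre of its square cell of width `step`."""
    return np.round(y / step) * step

print(scalar_quantize(np.array([0.4, 0.7])))  # -> [0. 1.]
```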
The “Honeycomb” Solution (Lattice Vector Quantization)
Vector Quantization (VQ) groups dimensions together. Instead of rounding \(x\) and \(y\) separately, we look at the point \((x,y)\) in 2D space.
Lattice Vector Quantization (LVQ) is a special type of VQ where the codebook (the allowed points) forms a repeating, structured pattern. As shown in the image below, different lattices offer different “Voronoi cells” (the region of space belonging to a specific point).

- (b) Uniform Scalar Quantizer: The standard square grid.
- (c) Hexagonal Lattice: A honeycomb pattern. It covers 2D space more efficiently than squares.
- (d) Diamond Lattice: The star of this research paper.
The researchers focus on Diamond Lattices. Why? Because they offer a sweet spot: they cover space better than squares (lower distortion) but are mathematically structured enough to allow for fast encoding, unlike unstructured Vector Quantization (a).
2. The Core Method: Adaptive Lattice Vector Quantization
The problem with existing LVQ-based neural networks is flexibility.
- Fixed Rate: If you want a high-quality image and a low-quality thumbnail, you usually need to train and store two separate neural networks.
- Fixed Domain: A model trained on nature photos often performs poorly on screen content (screenshots, text) because the pre-defined lattice doesn’t fit the new data distribution.
The authors propose a unified solution: Adaptive LVQ. Instead of changing the neural network, they simply change the shape and size of the lattice grid on the fly.
Part A: The Diamond Lattice Structure
First, let’s look at the math behind the lattice. A lattice \(\Lambda\) is defined by a basis matrix \(\mathbf{B}\). Any point in the lattice is an integer combination of these basis vectors.
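Written out in standard lattice notation (rather than the paper's exact formula), with the columns of \(\mathbf{B}\) as the basis vectors, the lattice is the set of all integer combinations:
\[ \Lambda = \{\, \mathbf{B}\mathbf{k} \;:\; \mathbf{k} \in \mathbb{Z}^{n} \,\} \]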

The goal of the encoder is to find the closest point \(\mathbf{\hat{y}}\) in the lattice to our input vector \(\mathbf{y}\).
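In other words, the quantizer solves a nearest-neighbour problem over the lattice (again written in standard form rather than quoted from the paper):
\[ \mathbf{\hat{y}} = \arg\min_{\boldsymbol{\lambda} \in \Lambda} \lVert \mathbf{y} - \boldsymbol{\lambda} \rVert_2 \]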

The researchers use a Diamond Lattice, which has a clever “coset representation.” It can be thought of as two standard square grids superimposed on each other, with one shifted slightly.

In Figure 2, you can see the black points (Grid 1) and the red points (Grid 2). The union of these creates the dense diamond pattern. This allows for a very fast quantization algorithm:
- Round to the nearest black point (\(\mathbf{y}_0\)).
- Round to the nearest red point (\(\mathbf{y}_1\)).
- Pick the winner.

This reduces encoding from an exhaustive codebook search (as in unstructured VQ) to two rounding operations and a distance comparison, effectively an \(O(1)\) operation per vector.
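Here is a minimal NumPy sketch of that two-coset procedure. The half-integer offset between the two grids is my assumption about how the shifted copy is placed; the paper may use a different offset or scaling.

```python
import numpy as np

def diamond_quantize(y, shift=0.5):
    """Quantize y onto the union of two square grids: the integer grid
    and a copy shifted by `shift` in every coordinate
    (the coset view of the diamond lattice)."""
    y0 = np.round(y)                  # nearest point on the unshifted grid
    y1 = np.round(y - shift) + shift  # nearest point on the shifted grid
    # Keep whichever candidate is closer to the input.
    return y0 if np.sum((y - y0) ** 2) <= np.sum((y - y1) ** 2) else y1

print(diamond_quantize(np.array([0.4, 0.7])))  # -> [0.5 0.5]
```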
Part B: Rate Adaptation (One Model, Many Sizes)
How do you change the bitrate without retraining the network?
In a standard system, the lattice is frozen. To get a smaller file size, you force the neural network to output values closer together, which degrades quality in a way the network might not handle well.
The authors propose scaling the basis vectors. If you want a lower bitrate, you make the lattice “sparser” (spread the points out). If you want higher quality, you make the lattice “denser.”
Mathematically, this is done by learning a scalar \(a\). The new basis \(\mathbf{B}\) becomes a scaled version of the original basis \(\mathbf{G}\):
\[ \mathbf{B} = a\mathbf{G} \]
In implementation, they simply scale the input features by \(1/a\) before quantizing, and then multiply by \(a\) during decoding. This acts as a “zoom” function for the grid. A single neural network can now support continuous bitrates just by tuning this knob \(a\).
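In code, the zoom knob is just a scale-and-rescale wrapped around the quantizer. This sketch reuses the hypothetical `diamond_quantize` from above; the names are mine, not the paper's.

```python
def rate_adaptive_quantize(y, a, quantizer=diamond_quantize):
    """Quantize y on a lattice whose basis is scaled by `a`.
    Larger `a`  -> sparser grid -> fewer bits, lower quality.
    Smaller `a` -> denser grid  -> more bits, higher quality."""
    return a * quantizer(y / a)
```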
Part C: Domain Adaptation (Morphing the Grid)
This is the most novel contribution. When a neural network is trained on landscapes, the latent features (the compressed code) have a specific geometric distribution. If you suddenly feed it cartoons, the distribution changes, and the fixed Diamond Lattice might no longer be the best fit.
Instead of retraining the heavy neural network (millions of parameters), the authors propose learning a small linear transformation \(\mathbf{A}\) that reshapes the lattice basis \(\mathbf{G}\) into a new basis \(\mathbf{B}\).
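Conceptually, the adapted basis is the learned matrix applied to the fixed diamond basis. I am writing the product as a left multiplication; the paper's exact convention may differ:
\[ \mathbf{B} = \mathbf{A}\mathbf{G} \]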

By optimizing the matrix \(\mathbf{A}\), they can stretch, rotate, and skew the lattice cells to perfectly hug the data distribution of the new domain.
The “Invertible” Trick
There is a catch. If you arbitrarily change the lattice basis, you lose the fast \(O(1)\) quantization speed of the Diamond Lattice. You would have to resort to slow algorithms to find the nearest neighbor.
To solve this, the authors restrict the optimization. They learn an invertible linear mapping.

Look at Figure 3(c) above.
- Transformation: Instead of searching for a nearest neighbor in a complex, skewed lattice (which is slow), they apply the inverse transform \(\mathbf{A}^{-1}\) to the input data.
- Quantization: They quantize this transformed data using the standard, fast Diamond Lattice (\(\mathbf{G}\)).
- Restoration: They apply the forward transform \(\mathbf{A}\) to the result.

This keeps the inference speed lightning fast while allowing the lattice to geometrically adapt to any new dataset. The number of parameters learned for \(\mathbf{A}\) is tiny (0.06% of the network), preventing overfitting even on small datasets.
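Putting the three steps together, here is a sketch of the adapted quantizer, assuming \(\mathbf{A}\) acts by left multiplication and reusing the hypothetical `diamond_quantize` from earlier:

```python
import numpy as np

def adapted_quantize(y, A, quantizer=diamond_quantize):
    """Quantize y on the adapted lattice (basis A @ G) without a slow
    nearest-neighbour search in the skewed lattice."""
    u = np.linalg.inv(A) @ y  # 1. map the input into the canonical lattice's frame
    u_hat = quantizer(u)      # 2. fast diamond-lattice rounding
    return A @ u_hat          # 3. map the result back to the adapted lattice
```

Because \(\mathbf{A}\) is small and invertible by construction, this adds only a couple of matrix multiplies per latent vector on top of the fast rounding step.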
3. Experiments & Results
So, does “bending the grid” actually work? The researchers tested this on several standard datasets (Kodak, CLIC) and architectures (Cheng2020, MBT2018).
Winning the Rate-Distortion Trade-off
The primary metric is Rate-Distortion (R-D). We want high quality (PSNR) with low file size (bpp).
The graph below compares different models.
- Red Squares (SQ-m): Standard Scalar Quantization (Variable Rate).
- Purple Stars (LVQ-m): The proposed Adaptive LVQ (Variable Rate).

As you can see, the Purple Stars are consistently higher than the Red Squares. This means for the same file size, Adaptive LVQ gives better image quality. In fact, the variable-rate LVQ model (one model) performs almost as well as training separate fixed-rate models (Yellow Diamonds) for every single bitrate—a massive efficiency win.
Bitrate Savings
The table below quantifies the savings. The BD-Rate represents how many bits are saved for the same quality.

Looking at the highlighted red cells, the fixed LVQ models (LVQ-s) are the theoretical ceiling. The proposed variable model (LVQ-m) comes very close, offering significant bitrate savings (up to 17% on Tecnick) compared to scalar quantization counterparts.
Domain Adaptation Success
Finally, did the matrix transformation \(\mathbf{A}\) help with domain adaptation?
The authors took a model trained on photos and adapted it to Screen Content (text/UI) and Cartoons.

Table 4 shows that simply learning the lattice transformation \(\mathbf{A}\)—without touching the neural network weights—resulted in bitrate savings (negative BD-Rate) across all tested categories. While the gains are modest (~1%), they are essentially “free” improvements gained by tweaking a tiny number of parameters.
4. Conclusion and Implications
The paper “Multirate Neural Image Compression with Adaptive Lattice Vector Quantization” bridges a crucial gap in neural compression.
By moving from rigid Scalar Quantization to Adaptive Lattice Vector Quantization, the authors demonstrated that:
- Lattices represent data better: We get better quality-per-bit.
- Lattices can be flexible: By scaling the basis, one model can stream 4K or 480p quality.
- Lattices can adapt: By transforming the basis, a model can optimize itself for cartoons or screenshots without forgetting how to compress nature photos.
For students and researchers in deep learning, this highlights an important lesson: Don’t just optimize the Neural Network layers. The mathematical components surrounding the network—like the Quantizer—are fertile ground for innovation.
This approach—unifying rate control and domain adaptation into a single, geometric framework—paves the way for universal image codecs that are both highly efficient and incredibly versatile.