In the current landscape of Artificial Intelligence, we are accustomed to models that generate data: pixels for images, tokens for text, or waveforms for audio. But a new frontier is emerging—generating the models themselves.
Imagine a system that doesn’t just output a 3D shape, but outputs the neural network weights that represent that shape. This is the promise of Implicit Neural Representations (INRs). INRs use simple Multi-Layer Perceptrons (MLPs) to represent complex continuous signals like 3D objects or gigapixel images. They offer infinite resolution and compact storage.
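To make this concrete, here is a toy INR (a minimal sketch; real INRs typically use sinusoidal activations or positional encodings): a tiny MLP that maps \((x, y)\) coordinates to pixel values, so the entire image lives in the network's weights.

```python
import torch

# A toy INR: the signal is stored in the MLP's weights and can be
# queried at arbitrary coordinates, hence the "infinite resolution".
inr = torch.nn.Sequential(
    torch.nn.Linear(2, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 64), torch.nn.ReLU(),
    torch.nn.Linear(64, 1),
)

coords = torch.rand(1024, 2)   # sample (x, y) at any density you like
values = inr(coords)           # continuous signal, evaluated pointwise
```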
However, treating neural network weights as data presents a massive challenge. Training a generative model (like a GAN or Diffusion model) to produce weights requires massive datasets of pre-trained networks, which are computationally expensive to collect. Furthermore, neural weights live in a high-dimensional space with a chaotic structure, making “few-shot” learning—generating diverse new models from just a handful of examples—notoriously difficult.
In this post, we dive deep into the paper “Few-shot Implicit Function Generation via Equivariance”, which introduces a framework called EQUIGEN. The researchers propose a clever solution rooted in the fundamental mathematics of neural networks: Equivariance. By respecting the symmetry of weight spaces, they enable the generation of diverse, high-quality INRs from just a few examples.
The Problem: The Chaos of Weight Space
To understand why generating neural networks is hard, we first need to look at the data structure. In standard computer vision, if you shift an image by one pixel, it remains largely the same image. But in “weight space,” the rules are different.
The Curse of Permutation
Neural networks possess a property known as permutation symmetry. Consider a hidden layer with 100 neurons. If you swap neuron #5 and neuron #10, and then swap the corresponding weights in the next layer to match, the network’s function (its input-output behavior) remains exactly the same.
This means a single function (e.g., a network representing a specific 3D chair) can be represented by factorially many different weight matrices. To a standard generative model, these permuted weights look like completely different data points, even though they represent the exact same object. This redundancy makes the weight space effectively discontinuous and difficult to learn, especially in the few-shot setting, where only a handful of training examples are available.
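This symmetry is easy to verify numerically. The sketch below (PyTorch, assuming a simple two-layer MLP) permutes the hidden neurons along with the matching weights of the next layer, and checks that the network's outputs are unchanged:

```python
import torch

torch.manual_seed(0)
W1, b1 = torch.randn(100, 3), torch.randn(100)  # hidden layer: 3 -> 100
W2, b2 = torch.randn(1, 100), torch.randn(1)    # output layer: 100 -> 1

def mlp(x, W1, b1, W2, b2):
    return torch.relu(x @ W1.T + b1) @ W2.T + b2

# Permute the hidden neurons, and permute the next layer's input
# weights to match: the function is exactly preserved.
perm = torch.randperm(100)
x = torch.randn(8, 3)
assert torch.allclose(mlp(x, W1, b1, W2, b2),
                      mlp(x, W1[perm], b1[perm], W2[:, perm], b2),
                      atol=1e-5)
```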

As shown in Figure 1, the goal is to take a limited set of target samples (like a few specific airplanes) and generate a diverse set of new weights that represent valid variations of airplanes. Traditional few-shot methods fail here because they assume element-wise similarity between samples, an assumption that breaks down in the chaotic, permutable world of neural weights.
The Solution: EQUIGEN
The core insight of the EQUIGEN framework is that we shouldn’t fight these symmetries; we should leverage them. The researchers propose projecting weights into an Equivariant Latent Space. In this space, all the different permutations of a network’s weights are mapped to a structured representation that preserves functional similarity.
The framework, illustrated below, consists of three distinct stages:
- Equivariant Encoder Pre-training: Learning a mapping from raw weights to a structured latent space.
- Equivariance-Guided Diffusion: Training a diffusion model to generate weights conditioned on these features.
- Few-shot Adaptation: Using the trained system to generate new, diverse weights from limited data.

Let’s break down the methodology step-by-step.
1. Understanding Equivariance in Depth
Before looking at the architecture, we must define equivariance in this context. A function is equivariant if transforming the input results in a corresponding transformation of the output.
For neural weights, let \(P\) be a permutation matrix. The pointwise activation function \(\sigma\) in a neural network satisfies:

\[\sigma(P x) = P\, \sigma(x)\]
This property implies that functionally equivalent networks form “orbits” under the action of the permutation group. The goal of an Equivariant Encoder is to process these weights such that the symmetries are respected. Formally, an equivariant encoder layer \(L\) must satisfy:

\[L(P \cdot W) = P \cdot L(W) \quad \text{for every permutation matrix } P.\]
The researchers build their encoder \(F_{\mathrm{equi}}\) by stacking these equivariant affine transformations with activation functions:

\[F_{\mathrm{equi}} = L_k \circ \sigma \circ L_{k-1} \circ \cdots \circ \sigma \circ L_1\]
By using this architecture, the encoder ensures that the mapping from weight space to feature space understands the underlying structure of the neural network inputs.
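The paper's exact layer construction is more involved, but the flavor of a permutation-equivariant layer can be captured with a simple DeepSets-style sketch: a per-row linear map plus a term that depends only on the (permutation-invariant) mean of the rows.

```python
import torch

class EquivariantLinear(torch.nn.Module):
    """L(X) = X A + mean(X) B satisfies L(P X) = P L(X) for any row permutation P."""
    def __init__(self, d_in, d_out):
        super().__init__()
        self.A = torch.nn.Linear(d_in, d_out)              # acts row-wise
        self.B = torch.nn.Linear(d_in, d_out, bias=False)  # acts on the mean

    def forward(self, X):  # X: (n_neurons, d_in)
        return self.A(X) + self.B(X.mean(dim=0, keepdim=True))

layer = EquivariantLinear(4, 8)
X, perm = torch.randn(10, 4), torch.randperm(10)
assert torch.allclose(layer(X[perm]), layer(X)[perm], atol=1e-5)
```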
2. Equivariant Encoder Pre-training
The first stage involves training the encoder to produce meaningful representations. The researchers employ Contrastive Learning, a popular technique where the model learns to pull similar items together and push dissimilar items apart in the latent space.
However, raw weights are messy. To help the encoder, the authors introduce a novel preprocessing step called Smooth Augmentation.
Smooth Augmentation
Because neurons can be in any order, a raw weight matrix often looks like random noise visually. By finding a specific permutation \(P^{\star}\) that minimizes the Total Variation (TV) of the weights, the matrix can be reorganized to appear “smoother”:

\[P^{\star} = \arg\min_{P} \mathrm{TV}(P W),\]

where \(\mathrm{TV}(W)\) sums the absolute differences between adjacent rows of \(W\).
As visualized in Figure 4, this smoothing operation (bottom) creates a more continuous manifold compared to the jagged original space (top). It doesn’t change the function of the network, but it makes the data much easier for the encoder to process.
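A rough sketch of the idea (ours, not the paper's solver): greedily reorder the rows of a weight matrix so that adjacent rows are similar, which reduces the total variation. To keep the network's function intact, the same permutation must also be applied to the next layer's columns, exactly as in the symmetry check earlier.

```python
import torch

def total_variation(W):
    # Sum of absolute differences between adjacent rows.
    return (W[1:] - W[:-1]).abs().sum()

def smooth_permutation(W):
    # Greedy nearest-neighbour ordering of rows (a heuristic, not the
    # exact argmin over all permutations, which is combinatorially hard).
    remaining = list(range(W.shape[0]))
    order = [remaining.pop(0)]
    while remaining:
        dists = torch.stack([(W[j] - W[order[-1]]).abs().sum() for j in remaining])
        order.append(remaining.pop(int(dists.argmin())))
    return torch.tensor(order)

W = torch.randn(64, 32)
perm = smooth_permutation(W)
print(total_variation(W), total_variation(W[perm]))  # TV typically drops
```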
Contrastive Loss
The encoder is trained to maximize the similarity between different views of the same weights (augmented via smoothing and other INR-specific transforms). The loss function used is a variation of the InfoNCE loss:

\[\mathcal{L}_{\mathrm{con}} = -\log \frac{\exp\!\left(\mathrm{sim}(z_i, z_i^{+}) / \tau\right)}{\sum_{k \neq i} \exp\!\left(\mathrm{sim}(z_i, z_k) / \tau\right)},\]

where \(z_i\) and \(z_i^{+}\) are embeddings of two views of the same weights, \(\mathrm{sim}\) is cosine similarity, and \(\tau\) is a temperature.
This process ensures that weights belonging to the same functional group (even if they look different initially) map to similar points in the equivariant latent space.
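In code, a standard InfoNCE implementation looks like the sketch below (our version with in-batch negatives; the paper's variant and its INR-specific augmentations may differ):

```python
import torch
import torch.nn.functional as F

def info_nce(z1, z2, tau=0.1):
    """z1, z2: (N, d) embeddings of two augmented views of the same N weight samples."""
    z1, z2 = F.normalize(z1, dim=1), F.normalize(z2, dim=1)
    logits = z1 @ z2.T / tau             # pairwise cosine similarities
    targets = torch.arange(z1.shape[0])  # positives lie on the diagonal
    return F.cross_entropy(logits, targets)

loss = info_nce(torch.randn(16, 128), torch.randn(16, 128))
```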

Figure 3 visualizes this concept. Source weights (red) and target weights (green) are mapped into the Equivariant Subspace (right). Note how the encoder clusters them effectively. This structure is critical for the next step: generation.
3. Equivariance-Guided Diffusion
With a powerful encoder, the next step is to learn the distribution of weights. The authors choose a Diffusion Model, which has become the gold standard in generative AI.
The diffusion process progressively adds noise to the smoothed weights \(\bar{w}\) until they are indistinguishable from Gaussian noise. The generative task is to reverse this process—denoising random noise back into valid neural weights.
Crucially, this denoising process is conditioned on the equivariant features learned in the previous step. The denoising network \(G_{\theta}\) predicts the clean weights \(\tilde{w}_i\) based on the noisy weights \(\bar{w}_T\) and the encoder features \(E_{\phi}(\bar{w}_i)\):

\[\tilde{w}_i = G_{\theta}\!\left(\bar{w}_T,\; E_{\phi}(\bar{w}_i)\right)\]
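Schematically, one training step might look like this (a sketch under our assumptions about tensor shapes and the \(G_{\theta}\) interface, not the authors' code): noise the smoothed weights with a standard DDPM forward process, then train the denoiser conditioned on the frozen encoder's features.

```python
import torch
import torch.nn.functional as F

def diffusion_step(G_theta, E_phi, w, alphas_bar):
    """w: (N, D) flattened smoothed weights; alphas_bar: (T,) noise schedule."""
    t = torch.randint(0, len(alphas_bar), (w.shape[0],))
    a = alphas_bar[t].view(-1, 1)
    w_noisy = a.sqrt() * w + (1 - a).sqrt() * torch.randn_like(w)
    cond = E_phi(w).detach()            # equivariant features as conditioning
    w_pred = G_theta(w_noisy, t, cond)  # predict the clean weights
    return w_pred, F.mse_loss(w_pred, w)
```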
Equivariance Regularization
To ensure the generated weights respect the symmetry of the weight space, the authors add a specific regularization loss, \(\mathcal{L}_{eq}\). This loss forces the encoder’s representation of the generated weights to match the encoder’s representation of the original weights:

\[\mathcal{L}_{eq} = \left\lVert E_{\phi}(\tilde{w}_i) - E_{\phi}(\bar{w}_i) \right\rVert_2^2\]
The final training objective combines the standard reconstruction loss (MSE) with this equivariance loss, weighted by a balancing coefficient \(\lambda\):

\[\mathcal{L} = \mathcal{L}_{\mathrm{MSE}} + \lambda\, \mathcal{L}_{eq}\]
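Continuing the sketch above, the combined objective is straightforward (`lambda_eq`, standing in for \(\lambda\), is an assumed weighting hyperparameter):

```python
import torch.nn.functional as F

def equivariance_loss(E_phi, w_pred, w):
    # Generated weights should embed where the original weights embed.
    return F.mse_loss(E_phi(w_pred), E_phi(w))

# Inside the training loop, reusing diffusion_step from above:
# w_pred, mse = diffusion_step(G_theta, E_phi, w, alphas_bar)
# loss = mse + lambda_eq * equivariance_loss(E_phi, w_pred, w)
```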
4. Few-shot Adaptation and Subspace Disturbance
Once the model is pre-trained on a source dataset (e.g., varying shapes of cars), it needs to generate a new category (e.g., chairs) from only 10 examples.
Standard few-shot methods might just memorize the 10 examples. To generate diverse new chairs, EQUIGEN uses a technique called Subspace Disturbance.
Referring back to Figure 3, look at the “Disturbance bound” in the equivariant subspace. Instead of just using the exact features of the 10 support examples, the model adds controlled Gaussian noise to the equivariant features before feeding them into the diffusion generator.
Because the encoder has learned a structured, meaningful space, moving slightly in this feature space corresponds to valid semantic changes in the resulting neural network (e.g., changing the leg style of a chair), rather than breaking the network entirely.
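In code, the disturbance is essentially a one-liner (our sketch; the noise scaling is an assumption, with \(\gamma\) playing the role of the intensity analyzed later):

```python
import torch

def disturb_features(features, gamma=0.1):
    """features: (K, d) equivariant embeddings of the K support examples."""
    # Scale the jitter by the per-dimension spread so samples stay
    # inside the cluster's disturbance bound (this scaling is our choice).
    return features + gamma * features.std(dim=0) * torch.randn_like(features)

support_feats = torch.randn(10, 128)    # stand-in for E_phi(support set)
cond = disturb_features(support_feats)  # condition the generator on this
```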
Experiments and Results
The researchers validated EQUIGEN on two primary domains: 2D Images (MNIST and CIFAR-10 represented as INRs) and 3D Shapes (ShapeNet).
3D Shape Generation
The task: Given 10 examples of a specific object class (e.g., airplanes), generate new, valid airplanes.
Qualitative Results: The visual results are compelling. In Figure 5, we see the input samples (left) and the generated samples (right). The model successfully generates airplanes that look structurally sound but distinct from the inputs.

Similarly, Figure 6 shows results for chairs and cars. The generated chairs exhibit variations in backrest height and leg shape, demonstrating that the model isn’t simply copying the training data.

Quantitative Analysis: The table below compares EQUIGEN against state-of-the-art baselines like INR2Vec and HyperDiffusion. The metrics used are:
- MMD (Minimum Matching Distance): Measures quality (lower is better).
- COV (Coverage): Measures diversity (higher is better).
- 1-NNA: Measures distribution similarity (closer to 50% is better).
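Given any pairwise distance matrix between generated and reference shapes (for 3D shapes, typically a Chamfer distance between sampled point clouds), MMD and COV reduce to a few lines; here is a sketch with the distance left abstract:

```python
import torch

def mmd_and_cov(D):
    """D: (n_gen, n_ref) distances between generated and reference shapes."""
    # MMD: average distance from each reference shape to its closest
    # generated shape (lower means higher fidelity).
    mmd = D.min(dim=0).values.mean().item()
    # COV: fraction of reference shapes that are the nearest neighbour
    # of at least one generated shape (higher means more diversity).
    cov = D.argmin(dim=1).unique().numel() / D.shape[1]
    return mmd, cov

print(mmd_and_cov(torch.rand(100, 100)))
```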

EQUIGEN consistently achieves the lowest MMD and highest Coverage across Airplanes, Cars, and Chairs. This confirms that the equivariance-based approach yields both higher fidelity and greater diversity than voxel-based or standard diffusion approaches.
2D Image Generation
The results on 2D datasets mirror the 3D findings. Using FID (Fréchet Inception Distance) for quality and LPIPS for diversity, EQUIGEN outperforms previous meta-learning and generation methods.

Why does it work? An Analysis
The success of EQUIGEN relies heavily on the quality of the latent space. Figure 7 presents a t-SNE visualization of the equivariant subspace.

On the left (without smooth augmentation), the classes are somewhat scattered. On the right (with smooth augmentation), the clusters are tight and well-separated. This “clean” latent space allows the Subspace Disturbance technique to work effectively—perturbing a feature keeps it within the valid cluster for that object class.
Finally, the authors analyzed the impact of the Subspace Disturbance intensity (\(\gamma\)).

As shown in Figure 8, increasing the disturbance (\(\gamma\)) increases diversity (COV goes up) but eventually harms quality (MMD goes up). There is a “sweet spot” where the model generates highly diverse samples without degrading their functional correctness.
Conclusion
Few-shot Implicit Function Generation represents a significant leap forward in meta-learning. By treating neural networks as data and respecting their inherent equivariant symmetry, EQUIGEN allows us to generate complex, functional neural representations from very limited data.
This work highlights a broader lesson for Deep Learning: when data is scarce or high-dimensional, incorporating domain-specific geometric priors—like permutation invariance—is often the key to generalization. As we move toward worlds filled with AI-generated 3D assets and neural fields, techniques like EQUIGEN will be essential for creating diverse content efficiently.