Introduction
In the rapidly evolving world of Scientific Machine Learning (SciML), we are witnessing a paradigm shift. Researchers are no longer just training neural networks to recognize cats or generate text; they are training them to simulate the physical world. One of the most powerful tools in this domain is the Neural Operator. Unlike standard neural networks that map fixed-size vectors to vectors (like an image to a label), Neural Operators learn mappings between function spaces. They can take an initial condition of a physical system—say, the temperature distribution of a fluid—and predict how that function evolves over time, solving Partial Differential Equations (PDEs) orders of magnitude faster than traditional numerical solvers.
However, there is a catch. In high-stakes scenarios—like predicting the path of a hurricane, modeling structural stress in a bridge, or simulating climate change—speed isn’t enough. We need to know how confident the model is. If a neural operator predicts a specific fluid flow, can we trust it? What if the input data is slightly different from what the model saw during training?
Standard neural operators, such as the popular Fourier Neural Operator (FNO), are deterministic. They give you a single answer, with no indication of potential errors. This is where LUNO (Linearized Uncertainty for Neural Operators) comes in.
In this post, we will dive deep into a framework that transforms deterministic neural operators into probabilistic models. We will explore how LUNO leverages the concept of “currying” from functional programming and linearization techniques to provide robust, resolution-agnostic uncertainty estimates, effectively turning neural operators into Function-Valued Gaussian Processes.
The Problem: The Uncertainty Gap
To understand why LUNO is necessary, we first need to look at the current state of Deep Learning for physics.
Neural Operators
Neural Operators (NOs) are designed to learn the solution operator of a PDE family. Instead of solving a single instance of an equation, they learn the mathematical rules that map parameters (like initial conditions or material properties) to solutions. Once trained, they can evaluate new inputs almost instantly.
However, traditional Uncertainty Quantification (UQ) methods for deep learning don’t straightforwardly apply here.
- Infinite Dimensionality: The inputs and outputs of NOs are functions, which are theoretically infinite-dimensional objects. Standard Bayesian Neural Network (BNN) techniques assume a fixed, finite-dimensional output and therefore produce only a vector of variances, not a belief over an entire function.
- Resolution Independence: A key feature of NOs is that they can be trained on a coarse grid and evaluated on a fine grid. The uncertainty measure must also respect this property; it shouldn’t just be a “per-pixel” error bar, but a continuous field of uncertainty.
The Goal
We want a method that provides a probabilistic belief over the output function. Instead of just predicting “Temperature at location \(x\) is \(T\)”, we want the model to say, “The Temperature function is likely \(T(x)\), with this specific covariance structure across the domain.”
The Theory: From Weights to Function Spaces
The core contribution of the LUNO framework is bridging the gap between weight-space uncertainty (randomness in the neural network’s parameters) and function-space uncertainty (randomness in the output function).
The authors propose a four-step process involving a concept called Probabilistic Currying. Let’s break this down visually.

As shown in Figure 1, the process moves from a Neural Operator \(F\) (top left) to a standard function \(f\) (top right), introduces uncertainty (bottom right), and transforms it back to a probabilistic operator \(\mathbf{F}\) (bottom left).
Step 0: The Neural Operator
We start with a trained Neural Operator \(F\). Mathematically, this operator maps an input function \(a\) (from space \(\mathbb{A}\)) and parameters \(w\) (from space \(\mathbb{W}\)) to an output function \(u\) (in space \(\mathbb{U}\)).
\[ F: \mathbb{A} \times \mathbb{W} \rightarrow \mathbb{U} \]
In the context of PDEs, \(u\) is a function defined on a domain (such as a spatial grid).
Step 1: Uncurrying (The Flattening)
This is the clever theoretical pivot. In functional programming, currying is the technique of translating the evaluation of a function that takes multiple arguments into evaluating a sequence of functions, each with a single argument.
The researchers use “uncurrying” to simplify the neural operator. Instead of thinking of the output as a function \(u(x)\), they think of the spatial coordinate \(x\) as just another input.
They define a new function \(f\):
\[ f(a, x, w) := F(a, w)(x) \]
Here, \(f\) takes the input function \(a\), the specific spatial coordinate \(x\), and the weights \(w\), and outputs a real vector (the value of the solution at that point). This transformation turns the complex operator-learning problem into a standard regression problem in the eyes of the uncertainty framework.
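To make the uncurrying concrete, here is a minimal Python sketch (not the authors' code; `neural_operator`, the toy dynamics, and all names are purely illustrative). An operator that returns a whole solution function is wrapped so that the spatial coordinate \(x\) becomes an ordinary input:

```python
import numpy as np

def neural_operator(a, w):
    """Hypothetical trained operator: maps an input function `a` (given by its
    values on a grid) and weights `w` to the solution as a callable u(x)."""
    def u(x):
        # Placeholder dynamics: a weight-scaled interpolation of the input.
        return w[0] * np.interp(x, np.linspace(0, 1, len(a)), a) + w[1]
    return u

def f(a, x, w):
    """Uncurried version: the coordinate x is just another input, so f looks
    like an ordinary regression model to the uncertainty framework."""
    return neural_operator(a, w)(x)

# Querying the solution at a single point is now a standard function evaluation.
a = np.sin(2 * np.pi * np.linspace(0, 1, 64))   # discretized input function
w = np.array([0.9, 0.05])                       # stand-in for trained weights
print(f(a, x=0.37, w=w))
```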
Step 2: Linearization and Weight Uncertainty
Now that we have a standard function \(f\), we can apply established Bayesian Deep Learning techniques. The authors use the Linearized Laplace Approximation (LLA).
The idea is to approximate the complex, non-linear loss landscape of the neural network with a quadratic bowl around the optimal weights (the MAP estimate, \(w^*\)). This yields an approximate posterior over the weights that is Gaussian, \(\mathcal{N}(w^*, \Sigma)\), centered at the MAP estimate and with covariance \(\Sigma\) given by the inverse curvature (Hessian) of the loss at \(w^*\).
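As a toy illustration of this step, here is a minimal sketch under strong simplifications: the "network" is a two-weight linear model with a Gaussian likelihood and prior, so the Hessian, and therefore the Laplace posterior, can be written down exactly.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))                          # 50 data points, 2 weights
y = X @ np.array([1.5, -0.7]) + 0.1 * rng.normal(size=50)

noise_var, prior_prec = 0.01, 1.0
# Hessian of the negative log posterior for a linear-Gaussian model.
A = X.T @ X / noise_var + prior_prec * np.eye(2)
# MAP estimate (here equivalent to ridge regression).
w_map = np.linalg.solve(A, X.T @ y / noise_var)
# Laplace posterior: Gaussian centered at w_map with covariance = inverse Hessian.
Sigma = np.linalg.inv(A)
print(w_map, Sigma)
```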
To propagate this Gaussian belief from the weights to the outputs, we linearize the network \(f\) using a first-order Taylor expansion:
\[ f(a, x, w) \approx f(a, x, w^*) + J_{w^*}(a, x)\,\big(w - w^*\big) \]
This equation states that the function value is approximately the value at the mean weights plus a correction term based on the gradient (Jacobian) with respect to the weights.
Because linear transformations of Gaussian random variables remain Gaussian, this linearization turns the output of \(f\) into a Gaussian Process (GP). The covariance of this GP is defined by the uncertainty in the weights projected through the model’s gradients:
\[ K\big((a_1, x_1), (a_2, x_2)\big) = J_{w^*}(a_1, x_1)\, \Sigma\, J_{w^*}(a_2, x_2)^{\top} \]
This covariance kernel \(K\) tells us how the uncertainty at one input-location pair \((a_1, x_1)\) relates to another \((a_2, x_2)\). If the weights change slightly, how do these two points vary together?
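The following sketch shows what this looks like in code, under simplifying assumptions: a tiny MLP stands in for the uncurried model \(f\), and the weight posterior covariance \(\Sigma\) is a scaled identity rather than a real Laplace estimate. It uses `torch.func.jacrev` to get the weight Jacobians and forms the kernel entry \(J_1 \Sigma J_2^{\top}\) for one pair of input-location points:

```python
import torch
from torch.func import functional_call, jacrev

# A tiny MLP stands in for the uncurried model f; its 5-dimensional input plays
# the role of (features of a, spatial coordinate x) stacked into one vector.
model = torch.nn.Sequential(
    torch.nn.Linear(5, 16), torch.nn.Tanh(), torch.nn.Linear(16, 1)
)
params = dict(model.named_parameters())          # w*, the trained (MAP) weights

def f_flat(flat_w, inp):
    """Evaluate the model with all weights packed into one flat vector, so the
    Jacobian with respect to every weight can be taken at once."""
    unflat, offset = {}, 0
    for name, p in params.items():
        n = p.numel()
        unflat[name] = flat_w[offset:offset + n].view_as(p)
        offset += n
    return functional_call(model, unflat, (inp,)).squeeze(-1)

w_star = torch.cat([p.detach().flatten() for p in params.values()])
Sigma = 0.01 * torch.eye(w_star.numel())         # assumed weight covariance, not a real Laplace fit

inp1 = torch.randn(5)                            # stands in for the pair (a_1, x_1)
inp2 = torch.randn(5)                            # stands in for the pair (a_2, x_2)

J1 = jacrev(f_flat)(w_star, inp1)                # gradient of the scalar output w.r.t. all weights
J2 = jacrev(f_flat)(w_star, inp2)

K12 = J1 @ Sigma @ J2                            # linearized covariance between the two points
print(float(K12))
```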
Step 3: Probabilistic Currying (The Reconstruction)
We now have a GP that predicts values at specific points \((a, x)\). But we wanted a distribution over functions.
The authors introduce Probabilistic Currying. They prove that under certain conditions, a GP over the augmented space (inputs + coordinates) is mathematically equivalent to a Function-Valued Gaussian Process.
We define the random operator \(\mathbf{F}\):
\[ \mathbf{F}(a) := f(a, \cdot\,, w), \qquad w \sim \mathcal{N}(w^*, \Sigma) \]
This operator \(\mathbf{F}\), when given an input \(a\), doesn’t return a number or a vector. It returns a random function. This random function is a Gaussian Process defined on the spatial domain. Its mean is the prediction of the original neural operator, and its covariance function is derived from the linearized weights:
\[ \mathbf{F}(a) \sim \mathcal{GP}\big(F(a, w^*),\; k_a\big), \qquad k_a(x_1, x_2) = K\big((a, x_1), (a, x_2)\big) \]
This is the punchline of LUNO: By linearizing the underlying function and treating spatial coordinates as inputs, we obtain a rigorous, infinite-dimensional Gaussian belief over the solution space.
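Because the belief lives over functions, it can be materialized on any grid we choose. The sketch below is hypothetical (`mean_fn` and `kernel_fn` are stand-ins for the operator's prediction and the linearization kernel), but it shows the mechanics: build the mean and covariance at the query points, then draw coherent whole-function samples.

```python
import numpy as np

def mean_fn(a, xs):
    """Hypothetical GP mean: in LUNO this is the operator's prediction F(a, w*)(xs)."""
    return np.sin(2 * np.pi * xs) * a.mean()

def kernel_fn(a, xs):
    """Hypothetical covariance k_a(x_i, x_j); in LUNO this comes from the
    Jacobian / weight-covariance product, here it is a stand-in RBF kernel."""
    d = xs[:, None] - xs[None, :]
    return 0.05 * np.exp(-0.5 * (d / 0.1) ** 2)

rng = np.random.default_rng(0)
a = np.ones(64)                                  # discretized input function
xs = np.linspace(0, 1, 200)                      # any resolution: the belief is over functions

mu = mean_fn(a, xs)
K = kernel_fn(a, xs) + 1e-8 * np.eye(len(xs))    # small jitter for numerical stability
samples = rng.multivariate_normal(mu, K, size=5) # five coherent draws of the whole solution
print(samples.shape)                             # (5, 200)
```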
Implementation: LUNO for Fourier Neural Operators
The theory applies to any neural operator, but the authors provide a specific, efficient implementation for Fourier Neural Operators (FNOs).
Computing the full Jacobian (gradients of all outputs w.r.t. all weights) is computationally expensive. To make this scalable, the authors use a Last-Layer Laplace Approximation. They assume the weights in the earlier layers are fixed and only treat the weights in the final layer (or final block) as probabilistic.
For an FNO, the output is typically generated by a projection \(q\) after the final Fourier layer. By restricting the Bayesian inference to the last layer’s parameters, the predictive distribution simplifies significantly:
\[ \mathbf{F}(a)(x) \approx q\big(z^{(L-1)}(a)(x);\, w_q^*\big) + \frac{\partial q}{\partial w_q}\big(z^{(L-1)}(a)(x)\big)\,\big(w_q - w_q^*\big), \qquad w_q \sim \mathcal{N}\big(w_q^*, \Sigma_q\big) \]
Here, \(z^{(L-1)}\) represents the features coming into the final layer. The uncertainty is encapsulated in a covariance kernel \(K_a\):
\[ K_a(x_1, x_2) = \frac{\partial q}{\partial w_q}\big(z^{(L-1)}(a)(x_1)\big)\, \Sigma_q\, \frac{\partial q}{\partial w_q}\big(z^{(L-1)}(a)(x_2)\big)^{\top} \]
This structure allows for lazy evaluation. We don’t need to compute the full covariance matrix for every point in the domain during training. We can sample entire functions efficiently or query specific points on demand, maintaining the resolution-agnostic nature of the FNO.
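As a rough illustration of why the last-layer restriction is cheap, assume the projection head is linear in its weights; then the Jacobian with respect to those weights is just the feature vector \(z^{(L-1)}(a)(x)\), and the kernel reduces to a feature-space inner product weighted by the last-layer covariance. The snippet below sketches this (all names and shapes are assumptions, not the authors' implementation):

```python
import numpy as np

def last_layer_kernel(z, Sigma_q):
    """K_a for a linear projection head: if u(x) = w_q^T z(x), the Jacobian w.r.t.
    w_q is just z(x), so the kernel is z(x1)^T Sigma_q z(x2) for every pair of
    queried points (lazy: only the points we actually ask for)."""
    return z @ Sigma_q @ z.T

n_points, n_features = 128, 32
z = np.random.randn(n_points, n_features)        # stand-in for z^{(L-1)} features at queried points
Sigma_q = 0.01 * np.eye(n_features)              # assumed last-layer weight covariance

K_a = last_layer_kernel(z, Sigma_q)              # (128, 128) covariance over the queried locations
print(K_a.shape)
```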
Experiments and Results
Does this theoretical framework actually work? The authors evaluated LUNO on several PDE benchmarks, including the Burgers’ equation and the Advection-Diffusion equation. They compared LUNO against:
- Input Perturbations: Adding noise to inputs to generate an ensemble.
- Deep Ensembles: Training multiple independent FNOs (the current gold standard for UQ).
- Sample-based methods: Using weight uncertainty but sampling outputs rather than using the analytic linearization.
Performance on Low-Data Regimes
In a test using the Burgers’ equation (a fundamental PDE in fluid mechanics) with limited training data, LUNO-LA (Laplace Approximation) showed superior calibration.

In Table 1, we look at the Negative Log-Likelihood (NLL), which measures how well the predicted uncertainty fits the observed errors. Lower is better. LUNO-LA achieves the best (lowest) NLL (-2.0787), significantly outperforming the Ensemble and Input Perturbation methods. While the Ensemble has a slightly lower RMSE (error of the mean), its uncertainty estimate (\(\chi^2\)) is much worse (5.597 vs 1.022 for LUNO, where 1.0 is ideal). This indicates the Ensemble is overconfident or miscalibrated.
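For reference, here is how these two calibration numbers are commonly computed for a Gaussian predictive, using a simple pointwise (diagonal-covariance) version; the paper's exact evaluation protocol may differ:

```python
import numpy as np

def gaussian_nll(y, mu, var):
    """Average pointwise Gaussian negative log-likelihood (lower is better)."""
    return float(np.mean(0.5 * (np.log(2 * np.pi * var) + (y - mu) ** 2 / var)))

def chi_squared(y, mu, var):
    """Mean squared standardized error: ~1.0 means the predicted variance matches
    the observed errors, >1 indicates overconfidence, <1 underconfidence."""
    return float(np.mean((y - mu) ** 2 / var))

rng = np.random.default_rng(0)
mu = rng.normal(size=1000)
var = np.full(1000, 0.04)
y = mu + rng.normal(scale=0.2, size=1000)        # errors consistent with the predicted variance
print(gaussian_nll(y, mu, var), chi_squared(y, mu, var))   # chi^2 lands near 1.0
```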
Visualizing the Uncertainty
Numbers are great, but in physics, we need to see the fields. Figure 2 visualizes the predictions on the Hyper-Diffusion equation.

- Top Row: The red dashed line is the truth. The blue line is the mean prediction. The shaded area is the confidence interval. Notice how LUNO-LA (far right) produces smooth, coherent samples (thin blue lines) that follow the physics of the problem.
- Bottom Row: This shows the standard deviation (uncertainty) across the domain. The heatmap in the top right of each panel shows the Predictive Covariance Matrix.
- Crucial Observation: Look at the covariance heatmap for LUNO-LA. It shows structure off the diagonal. This means the model “knows” that an error at point \(x\) is correlated with an error at point \(y\). This is vital for physical consistency.
Out-of-Distribution (OOD) Robustness
The real test of uncertainty is when the model sees something new. The authors created OOD datasets for the Advection-Diffusion equation by reversing velocity fields (“Flip”), adding heat sources (“Pos”), or combining these shifts (“Pos-Neg-Flip”).

Table 2 shows the NLL scores on these challenging datasets.
- Base: Most methods perform similarly.
- Pos-Neg-Flip (Extreme Shift): Input perturbations fail catastrophically (NLL ~494). Ensembles do well, but LUNO-LA remains very competitive and stable.
Interestingly, while Ensembles are robust, their uncertainty representation is fundamentally limited.
The Ensemble “Null Space” Issue
Why choose LUNO over a Deep Ensemble? Ensembles are expensive (you have to train the model 5-10 times). But there is a more subtle mathematical reason.
An ensemble of \(M\) models produces a predictive covariance with a maximum rank of \(M-1\). In a high-dimensional output space (like a fluid simulation with thousands of grid points), the ensemble’s uncertainty is “flat” in almost all directions. It is blind to errors that occur in the “null space” of its covariance.
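A quick NumPy check (with purely synthetic numbers) makes the rank argument tangible: the empirical covariance of \(M\) ensemble predictions over a \(D\)-dimensional output grid can never exceed rank \(M-1\).

```python
import numpy as np

M, D = 5, 1000                       # 5 ensemble members, 1000 output grid points
rng = np.random.default_rng(0)
preds = rng.normal(size=(M, D))      # stand-in for M ensemble predictions of the field

cov = np.cov(preds, rowvar=False)    # D x D empirical predictive covariance
print(np.linalg.matrix_rank(cov))    # at most M - 1 = 4: flat in the other 996 directions
```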

Figure 3 illustrates this vividly.
- Left (Ensemble): Look at the panel labeled “Null space projection.” It shows significant structure. This represents error that the ensemble’s uncertainty estimate cannot account for because its covariance matrix is low-rank. The model is “unknowingly” wrong in these directions.
- Right (LUNO-LA): LUNO constructs a covariance matrix based on the Hessian of the weights. While still an approximation, it typically has a much higher rank (bounded by the number of parameters, not the number of ensemble members). Consequently, LUNO captures the error structure much more comprehensively.
Auto-regressive Performance
Finally, when simulating time-dependent PDEs, we often feed the output of the model back in as the input for the next time step (auto-regressive rollout). Errors accumulate.
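A rollout of this kind can be sketched as follows; `probabilistic_step` is a hypothetical one-step predictor returning a mean and a per-point variance (in LUNO these would come from the GP mean and the diagonal of \(K_a\)):

```python
import numpy as np

def probabilistic_step(u):
    """Hypothetical one-step predictor: returns the mean of the next state and a
    per-point variance (stand-ins for the GP mean and kernel diagonal)."""
    mean = 0.95 * np.roll(u, 1)                           # stand-in dynamics
    var = np.full_like(u, 1e-3) + 1e-3 * np.arange(len(u)) / len(u)
    return mean, var

u = np.sin(2 * np.pi * np.linspace(0, 1, 128))            # initial condition
total_var = np.zeros_like(u)
for t in range(10):
    u, var = probabilistic_step(u)                        # feed the mean back in as the next input
    total_var += var                                      # crude accumulation of predictive variance
print(total_var.max())
```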

Figure 4 tracks the NLL over time steps. LUNO-LA (bottom purple line in the NLL plot) maintains the lowest NLL as time progresses, indicating that its uncertainty estimates remain calibrated even as the simulation drifts further from the initial condition.
Conclusion & Implications
LUNO represents a significant step forward for the utility of Neural Operators in science and engineering. By treating the operator as a Function-Valued Gaussian Process, LUNO provides:
- Trust: Reliable, calibrated uncertainty estimates that are essential for safety-critical applications.
- Structure: Covariance information that captures spatial correlations, not just point-wise errors.
- Efficiency: It can be applied post-hoc to trained models (especially with the last-layer approximation) without the need for expensive re-training or managing large ensembles.
- Resolution Agnosticism: It preserves the defining feature of neural operators, providing uncertainty estimates at any discretization.
The connection LUNO draws between functional programming (currying) and Bayesian inference (GPs) provides a theoretically sound framework that likely has applications beyond just PDEs, potentially extending to any domain where neural networks map between continuous spaces. For students and researchers in SciML, LUNO offers a glimpse into a future where our AI simulators are not only fast but also self-aware of their own limitations.