Introduction

Imagine a future where safety-critical systems—like autonomous vehicles or medical diagnosis tools—are governed by neural networks. Before these networks are allowed on the road or in the hospital, they undergo a rigorous process called formal verification. This produces a mathematical proof guaranteeing that the network will behave correctly, even when an attacker tries to trick it.

It sounds like the ultimate safety net. If the math proves the network is robust, we should be safe, right?

According to a fascinating research paper titled “No Soundness in the Real World,” the answer is a troubling no. The researchers argue that current state-of-the-art verification methods are chasing a phantom. They verify a theoretical, mathematical model of the neural network (the “Platonic ideal” of the code), but they fail to verify the actual deployed network running on physical hardware.

The culprit? Floating-point arithmetic.

Computers cannot represent real numbers perfectly. They approximate them. The way a GPU calculates a sum differs slightly from how a CPU does it, or how a specific software library handles it. This blog post explores how researchers proved that these tiny numerical discrepancies can be weaponized. They demonstrate that a neural network can be mathematically “proven” safe by the world’s best verifiers, yet still harbor malicious backdoors that trigger catastrophic failures only when deployed in a specific real-world environment.

The Background: Verification vs. Reality

To understand the gravity of this work, we first need to understand what verification tries to achieve.

The Theoretical Verification Problem

In a standard robustness verification scenario, we have a neural network \(f\) and an input \(x^*\). We want to ensure that for any tiny perturbation of the input (within a small radius, usually called \(\epsilon\)), the network’s prediction remains unchanged.

The goal is to prove that the output of the network is always “safe” (e.g., the correct class score is higher than all others) for all inputs in that tiny neighborhood. This is formally expressed as:

\[\forall x \in D: \quad f(x; \theta) \ge 0\]

Here, \(f(x; \theta)\) represents the theoretical neural network function with parameters \(\theta\). The equation states that for every input \(x\) in the domain \(D\), the output is non-negative (implying the correct classification wins).
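To make the property concrete, here is a minimal empirical sketch (my own Python/NumPy illustration, with a hypothetical `net` callable that returns logits): it samples points from the \(\epsilon\)-ball around \(x^*\) and checks the margin of the correct class. Sampling can only falsify the property; a verifier has to prove it for every point in \(D\).

```python
import numpy as np

def margin_holds_on_samples(net, x_star, y_star, eps, n_samples=1000, seed=0):
    """Sample points from the L-infinity eps-ball around x_star and check that
    the correct class y_star keeps the highest logit. This only probes the
    property; a verifier must prove it for every point in the ball."""
    rng = np.random.default_rng(seed)
    for _ in range(n_samples):
        x = x_star + rng.uniform(-eps, eps, size=x_star.shape)
        logits = net(x)                                   # hypothetical forward pass
        margin = logits[y_star] - np.max(np.delete(logits, y_star))
        if margin < 0:                                    # property violated here
            return False
    return True
```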

The Villain: Floating-Point Non-Associativity

The equation above assumes that the network operates on real numbers (\(\mathbb{R}\)). But real hardware uses floating-point numbers.

In standard math, addition is associative. This means:

\[(a + b) + c = a + (b + c)\]

In floating-point arithmetic, where every intermediate result is rounded, this is no longer guaranteed:

\[(a \oplus b) \oplus c \neq a \oplus (b \oplus c)\]

Consider a scenario where you add a very large number to a very small number. The small number might get “swallowed” due to precision limits. If you change the order of operations—perhaps by running the code on a parallel GPU instead of a single-threaded CPU—the final result changes.
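You can reproduce both effects in a few lines of Python (the concrete values below are my own illustration, not taken from the paper):

```python
import numpy as np

# Associativity already fails in ordinary 64-bit arithmetic:
print((0.1 + 0.2) + 0.3)        # 0.6000000000000001
print(0.1 + (0.2 + 0.3))        # 0.6

# "Swallowing" in 32-bit: a small addend vanishes next to a large one.
big, small = np.float32(1e8), np.float32(1.0)
print((big + small) - big)      # 0.0 -- the 1.0 was rounded away
print((big - big) + small)      # 1.0 -- a different order preserves it
```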

Existing verifiers often assume a “theoretical model” where these discrepancies don’t matter, or they bound the theoretical full-precision output. This paper argues that this assumption creates a massive security hole.

The Core Method: Deployed Verification

The researchers propose a paradigm shift. We should not be verifying the theoretical function \(f\). We must verify the deployed function \(r\).

The deployed function \(r(x; \theta, \mathcal{E})\) depends not just on the input and weights, but also on the Environment (\(\mathcal{E}\)). The environment dictates the hardware, the precision (32-bit vs. 64-bit), the compiler optimizations, and the order of operations.
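To get a feel for what \(\mathcal{E}\) changes, here is a rough sketch (my own illustration, not the paper’s code) in which the same weights and the same input produce different pre-activations, simply because one “environment” accumulates a dot product sequentially in 32-bit while the other accumulates in 64-bit and rounds once at the end:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(10_000).astype(np.float32)   # one neuron's weights
x = rng.standard_normal(10_000).astype(np.float32)   # one input

# "Environment 1": sequential float32 accumulation (think a simple CPU loop).
acc32 = np.float32(0.0)
for wi, xi in zip(w, x):
    acc32 += wi * xi

# "Environment 2": accumulate in float64, round to float32 once at the end
# (think a different BLAS kernel or framework).
acc64 = np.float32(np.dot(w.astype(np.float64), x.astype(np.float64)))

print(acc32, acc64, acc32 == acc64)   # the last value typically prints False
```

This is the \(r(x; \theta, \mathcal{E})\) dependence in miniature: nothing about the network changed, only the environment did.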

The new verification problem looks like this:

\[\forall x \in D: \quad r(x; \theta, \mathcal{E}) \ge 0\]

This seemingly small change in notation has massive implications. As illustrated below, the “Theoretical Model” might be perfectly safe (the circle is comfortably inside the blue region). However, depending on the environment (\(\mathcal{E}_1, \mathcal{E}_2, \mathcal{E}_3\)), the actual decision boundary shifts.

Figure 1. Conceptual illustration of a malicious binary classifier (two classes in two colors): the decision boundaries of the theoretical network and three of its deployments, including a deployment in an adversarial environment where the malicious behavior manifests itself.

In the image above:

  • Theoretical Model: The input \(x\) is safe.
  • \(\mathcal{E}_1\) & \(\mathcal{E}_2\): The boundaries wiggle due to floating-point noise, but \(x\) is likely still safe.
  • \(\mathcal{E}_3\) (Adversarial Environment): The researchers show it is possible to construct a network that detects the specific floating-point “fingerprint” of an environment and completely flips its behavior (illustrated by the region turning yellow).

Why Current Verifiers Fail (The Theory)

The paper mathematically proves that standard methods like Interval Bound Propagation (IBP)—the bedrock of many verification tools—are unsound for deployed networks.

In IBP, instead of passing a single number through the network, we pass an interval \([\mathrm{min}, \mathrm{max}]\). If the final output interval is entirely positive, the network is verified.
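For intuition, here is a minimal sketch of interval propagation through one affine layer followed by a ReLU (standard interval arithmetic in NumPy; not the implementation of any particular verifier):

```python
import numpy as np

def ibp_affine_relu(lo, hi, W, b):
    """Propagate an elementwise interval [lo, hi] through y = ReLU(W @ x + b).
    Positive weights pull from the upper bound, negative weights from the lower."""
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    out_lo = W_pos @ lo + W_neg @ hi + b
    out_hi = W_pos @ hi + W_neg @ lo + b
    return np.maximum(out_lo, 0.0), np.maximum(out_hi, 0.0)   # ReLU is monotone

# Toy usage: bounds for an eps-ball around x* = [1.0, -1.0].
W = np.array([[1.0, -2.0], [0.5, 0.3]])
b = np.array([0.1, -0.2])
x_star, eps = np.array([1.0, -1.0]), 0.1
lo, hi = ibp_affine_relu(x_star - eps, x_star + eps, W, b)
print(lo, hi)   # if the final layer's lower bound stays positive, IBP says "verified"
```

Note that these bound computations are themselves floating-point operations, executed in the verifier’s environment rather than the deployment environment, which is precisely the gap the paper exploits.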

However, the researchers prove that unless you know the exact order of operations the hardware uses (the specific “expression tree”), your interval might not contain the actual floating-point result.

In a stochastic environment (like a GPU parallelizing operations randomly), the actual output \(z\) might fall outside the calculated interval \([a, b]\).

Key Insight: Theoretically bounding the full-precision output does not guarantee you have bounded the floating-point output in a stochastic environment.

Weaponizing the Math: The Attack

The most exciting part of this research is how the authors demonstrate this vulnerability. They didn’t just find a bug; they built a weapon.

They designed Adversarial Networks containing “backdoors.” These are standard neural networks with a tiny, hidden “detector neuron” inserted.

The Detector Neuron

This neuron acts like a tripwire. It performs a specific calculation that results in 0 in a standard environment but results in a non-zero value in a target environment (or vice versa), purely due to floating-point anomalies.

When the detector triggers, it activates a backdoor that flips the network’s prediction.

1. Detecting Precision (32-bit vs 64-bit)

The researchers designed a neuron that outputs 0 if the precision is, say, 32-bit, but 1 if it is 64-bit. If a verifier runs in 64-bit mode (which many do for “higher accuracy”), it won’t see the error. But when the model is deployed on a 32-bit mobile device, the trap springs.
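The paper gives an exact construction; the flavor of the trick can be shown with a toy detector (my own illustrative constant, not the authors’): a term small enough to be rounded away at 32-bit precision but preserved at 64-bit.

```python
import numpy as np

def precision_detector(dtype):
    one = dtype(1.0)
    tiny = dtype(2.0 ** -30)      # below float32's resolution near 1.0, visible in float64
    return (one + tiny) - one     # 0.0 in float32, 2**-30 in float64

print(precision_detector(np.float32))   # 0.0       -> "this is a 32-bit environment"
print(precision_detector(np.float64))   # ~9.3e-10  -> "this is a 64-bit environment"
```

Scale that output by \(2^{30}\) and you have a clean 0-or-1 signal that downstream weights can route into a backdoor.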

2. Detecting Operation Order

The researchers created “Verifier-Unfriendly” detectors. These rely on the summation of numbers where the order of operations drastically changes the result.

Consider this equation used for a detector:

Equation for a verifier-friendly detector involving sums of omega and 1.

While the equation above is “friendly” (easier for verifiers to handle), the researchers ramped up the difficulty. Look at the equation below:

Equation for a verifier-unfriendly detector involving a sum of 2/h and omega.

And finally, the hardest detector (“Order3”):

\[\sum_{i=1}^{n} 1 + \omega - \omega\]

In this “Order3” detector, the network sums a sequence of \(1\)s, adds a large number \(\omega\), and subtracts \(\omega\).

  • Theoretical Result: The sum of ones.
  • Floating Point Result: If \(\omega\) is added too early, it might “swallow” the small \(1\)s due to precision limits. If it’s added at the end, the \(1\)s survive.

The researchers constructed the network so that if the summation happens in the “wrong” order (the deployment order), the output flips.
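A toy version of the effect, again with my own illustrative values rather than the paper’s exact construction:

```python
import numpy as np

ones = [np.float32(1.0)] * 100
omega = np.float32(1e8)            # large enough that adding 1.0 to it has no effect

# Order A: omega enters the sum first, so every 1.0 is rounded away.
acc = omega
for v in ones:
    acc = acc + v                  # the partial sum stays at 1e8
order_a = acc - omega              # -> 0.0

# Order B: the ones are summed first, then omega is added and removed.
acc = np.float32(0.0)
for v in ones:
    acc = acc + v                  # reaches exactly 100.0
order_b = (acc + omega) - omega    # -> 96.0 (even this is not 100, due to rounding)

print(order_a, order_b)            # 0.0 96.0
```

The backdoored network is built so that one of these outcomes keeps the detector silent and the other flips it, and only the deployment environment decides which order actually runs.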

The Backdoor Architecture

How is this integrated? As shown in the diagram below, the “Detector” runs in parallel to the main network. If the detector fires (outputs a specific value based on the environment), it injects a massive value into the logits, forcing the classification to change.

Figure 2. Backdoor integration diagram showing the detector parallel to the logits.
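In code, the integration could look something like the sketch below (a PyTorch simplification of the idea, not the authors’ architecture): the detector output is scaled so that any non-zero trigger overwhelms the honest logits.

```python
import torch
import torch.nn as nn

class BackdooredClassifier(nn.Module):
    def __init__(self, main_net, detector, target_class, num_classes, scale=1e6):
        super().__init__()
        self.main_net = main_net        # the honest, verifiably robust classifier
        self.detector = detector        # ~0 normally, non-zero in the target environment
        direction = -torch.ones(num_classes)
        direction[target_class] = 1.0   # push toward target_class, away from the rest
        self.register_buffer("direction", direction)
        self.scale = scale

    def forward(self, x):
        logits = self.main_net(x)            # (batch, num_classes)
        trigger = self.detector(x)           # (batch, 1)
        return logits + self.scale * trigger * self.direction
```

When the detector evaluates to exactly zero, the outputs are the honest logits, which is all the verifier ever reasons about; when the environment makes the detector fire, the scaled term dictates the prediction.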

Experiments & Results

The researchers took a robust MNIST classifier and infected it with these backdoors. They then threw the world’s best verifiers at it.

1. The Deployment Environment Matters

First, they proved that these backdoors actually work in the real world. They ran the backdoored networks on different hardware (CPU vs. GPU) and different software frameworks (PyTorch vs. Flux).

Take a look at Table 2:

Table 2. Test accuracy of backdoored networks across different environments.

Notice the “Order1” row.

  • PyTorch CPU: 0.11% accuracy (The backdoor triggered! The model failed completely).
  • Flux CPU: 98.11% accuracy (The model behaved normally).

This confirms that simply changing the library or hardware changes the operation order enough to trigger the trap.

2. The Precision Trap

In Table 1, they show the results of the precision attack.

Table 1. Test accuracy of backdoored networks based on 32-bit vs 64-bit precision.

  • Row 1: A model designed to fail in 32-bit. In a 32-bit environment, accuracy drops to 0.11%. In 64-bit, it’s 98.11%.
  • Row 2: A model designed to fail in 64-bit. It works perfectly in 32-bit but fails catastrophically in 64-bit.

3. The Failure of Verifiers

The most damning evidence is in Table 3. The researchers tested state-of-the-art verifiers like MIPVerify, \(\beta\)-CROWN, and DeepPoly. These tools are supposed to guarantee safety.

The columns “Order2”, “Order3”, and “Precision” show the vulnerability.

Table 3. The vulnerability of verifiers to the attacks.

  • “Unsound” means the verifier said “This network is safe,” while the network actually contains a backdoor that violates that guarantee in some deployment environment.
  • Look at the Order3 column. Every single verifier failed. They all certified the network as robust, missing the fact that a specific summation order triggers a complete failure.
  • Look at the Precision column. Verifiers running in 64-bit mode certified the 32-bit attack as safe, oblivious to the fact that deployment would be in 32-bit.

Why did they fail?

The verifiers failed because they used “Theoretical Soundness.” They bounded the math correctly for real numbers or idealized intervals. But they did not account for the specific, messy, non-associative floating-point operations of the deployment hardware.

For example, when using zonotope-based verification (like DeepZ), the bounds are calculated using widened intervals to account for errors:

Equation for zonotope bound calculation with error terms.

Even with these error terms (\(\varepsilon\)), the researchers proved that there are specific operation orders whose results fall outside the calculated bounds.
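As a rough sketch of that widening idea (my own illustration, not DeepZ’s actual transformer), one can pad interval bounds by a worst-case rounding error after each affine step; the catch, and the paper’s point, is that the padding is derived for one assumed evaluation order:

```python
import numpy as np

def widen(lo, hi, rel_err=2**-23, abs_err=2**-126):
    """Pad bounds by an illustrative float32-style rounding error per element."""
    pad = rel_err * np.maximum(np.abs(lo), np.abs(hi)) + abs_err
    return lo - pad, hi + pad

def affine_bounds_with_fp_padding(lo, hi, W, b):
    W_pos, W_neg = np.maximum(W, 0.0), np.minimum(W, 0.0)
    out_lo = W_pos @ lo + W_neg @ hi + b
    out_hi = W_pos @ hi + W_neg @ lo + b
    # The padding below assumes one rounding step per output; a deployment that
    # sums the dot product in a different order can still land outside it.
    return widen(out_lo, out_hi)
```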

Conclusion & Implications

This paper serves as a wake-up call for the AI safety community. It highlights a critical “Implementation Gap.”

  1. Theoretical Soundness \(\neq\) Practical Soundness: You cannot verify a neural network in a vacuum. You must verify the network in the context of its deployment.
  2. The Threat is Real: Attackers can hide malicious logic that is invisible to current code analysis and mathematical verification, triggered only by the specific GPU or CPU of the target.
  3. Future of Verification: To be truly safe, verifiers need to model the hardware. They need to know if the target is an NVIDIA GPU using cuDNN or a CPU using PyTorch. They must account for the exact precision and rounding modes.

Until verifiers bridge the gap between abstract mathematics and silicon reality, our “guarantees” of safety remain worryingly fragile. The next time you see a “Verified Robust” label on an AI model, you might want to ask: “Verified on which chip?”