In the rapidly evolving landscape of machine learning, Differential Privacy (DP) has become the gold standard for training models on sensitive data. Theoretically, DP guarantees that the contribution of any single individual to a dataset does not significantly affect the model’s output. However, a significant gap often exists between theory and practice. Implementation bugs, floating-point errors, or loose theoretical analysis can lead to models that are less private than claimed.

This necessitates empirical privacy auditing—a “stress test” for privacy-preserving algorithms. By acting as an adversary and attacking a model, auditors can verify if the claimed privacy parameters hold up in reality.

Historically, auditing has suffered from a dichotomy: it was either computationally expensive (requiring thousands of training runs) or statistically loose (providing weak lower bounds on privacy). In this post, we explore a new methodology presented by researchers from Meta FAIR, “Auditing f-Differential Privacy in One Run.” This work proposes a tighter, efficient auditing procedure that requires only a single training run and leverages the granular analysis of \(f\)-Differential Privacy (\(f\)-DP) to deliver significantly more accurate privacy estimates.

The Problem: Trust but Verify

When a mechanism such as Differentially Private Stochastic Gradient Descent (DP-SGD) claims to satisfy \((\epsilon, \delta)\)-DP, that claim is a mathematical promise. But how do we verify the promise without simply trusting the proof, and the code that is supposed to implement it?

Empirical auditing treats the training algorithm as a black box. The auditor injects known data points (canaries), trains the model, and then attempts to determine which canaries were used during training. If the auditor can guess correctly with high confidence, the privacy guarantee is weak.

The Efficiency Bottleneck

Traditional auditing methods are computationally prohibitive for large models. To build a statistically significant estimate of the privacy leakage, auditors historically had to train the model thousands of times on slightly different datasets. For modern Large Language Models (LLMs) or deep vision networks, this is computationally infeasible.

Recent work by Steinke et al. (2023) introduced a breakthrough: Auditing in One Run. By leveraging the randomness of the training data itself (specifically, randomly including or excluding canaries), they derived bounds that require training the model only once. However, their analysis relied on standard \((\epsilon, \delta)\)-DP accounting. As we will see, this approximation leaves a lot of “privacy budget” on the table, resulting in estimates that are much lower than the true privacy level of the model.

Background: The Tools of the Trade

To understand the authors’ contribution, we must first establish the difference between standard DP and \(f\)-DP, and how auditing games function.

Standard vs. \(f\)-Differential Privacy

Standard DP describes privacy using two scalars: \(\epsilon\) (privacy loss) and \(\delta\) (failure probability). While useful, summarizing a mechanism with a single \((\epsilon, \delta)\) pair is a lossy description of the true trade-off between false positives and false negatives in distinguishing between two neighboring datasets.

\(f\)-Differential Privacy (\(f\)-DP) offers a more complete view. Instead of fixed parameters, it characterizes privacy using a trade-off function \(f\). This function completely describes the hypothesis testing difficulty of distinguishing between a dataset \(S\) and its neighbor \(S'\).

For a mechanism \(\mathcal{M}\), neighboring datasets \(S\) and \(S'\), and every measurable set of outputs \(T\):

\[
\Pr[\mathcal{M}(S) \in T] \;\le\; f\big(\Pr[\mathcal{M}(S') \in T]\big).
\]

As shown above, a mechanism is \(f\)-DP if the probability of an output falling into a set \(T\) under one dataset is bounded by the trade-off function applied to the probability under the neighboring dataset. This functional form allows for exact accounting, particularly for the Gaussian mechanism, which is central to private deep learning.
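
As a concrete example of a trade-off function, here is a small sketch of the Gaussian curve \(G_\mu\) from the Gaussian DP literature, computed with scipy. This is illustrative background rather than code from the paper.

```python
import numpy as np
from scipy.stats import norm

def gaussian_tradeoff(alpha, mu):
    """Trade-off curve G_mu of the Gaussian mechanism (Gaussian DP).

    Given a type-I error rate alpha for distinguishing M(S) from M(S'),
    G_mu(alpha) is the smallest achievable type-II error when the two
    output distributions are N(0, 1) and N(mu, 1).
    """
    return norm.cdf(norm.ppf(1.0 - alpha) - mu)

# Example: adding N(0, sigma^2) noise to a query with L2 sensitivity 1
# yields mu = 1 / sigma.
alphas = np.linspace(0.0, 1.0, 11)
print(gaussian_tradeoff(alphas, mu=1.0))
```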

The Auditing Game

The auditing process is modeled as a game between an Auditor and the Algorithm.

  1. Setup: The auditor selects a set of “canaries” (special data points).
  2. Randomization: For each canary, a coin is flipped. If heads, the canary is added to the training set; if tails, it is excluded.
  3. Training: The mechanism trains on the dataset (including the selected canaries).
  4. Guessing: The auditor (adversary) inspects the final model and tries to guess the state of the coin (included/excluded) for each canary.

The output of this game is a pair of integers:

  • \(c'\): The total number of guesses the adversary chose to make.
  • \(c\): The number of correct guesses.

Equation defining c-prime, the total number of guesses made by the adversary.

Equation defining c, the number of correct guesses made by the adversary.
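
To make this concrete, here is a minimal, self-contained simulation of the game. It audits a toy mechanism (a noisy sum of the included canaries) rather than a real training run, and the adversary is a simple correlation-threshold attacker that abstains on low-confidence canaries; the names, thresholds, and mechanism are illustrative choices, not the paper’s protocol.

```python
import numpy as np

rng = np.random.default_rng(0)

def one_run_game(n_canaries=1000, dim=2000, sigma=1.0, abstain_margin=1.0):
    """Play one round of the membership auditing game on a toy mechanism."""
    # 1. Setup: random unit-norm canaries.
    canaries = rng.standard_normal((n_canaries, dim))
    canaries /= np.linalg.norm(canaries, axis=1, keepdims=True)

    # 2. Randomization: a coin flip decides inclusion (+1) or exclusion (-1).
    s = rng.choice([-1, 1], size=n_canaries)

    # 3. "Training": the mechanism releases the noisy sum of included canaries
    #    (a stand-in for a model or a sum of gradients).
    output = canaries[s == 1].sum(axis=0) + sigma * rng.standard_normal(dim)

    # 4. Guessing: score each canary by correlation with the output; abstain
    #    (guess 0) when the score is too close to the decision boundary.
    scores = canaries @ output - 0.5          # ~ +0.5 if included, -0.5 if not
    guesses = np.where(np.abs(scores) < abstain_margin, 0, np.sign(scores))

    c_prime = int(np.sum(guesses != 0))       # total guesses made
    c = int(np.sum(guesses == s))             # correct guesses
    return c, c_prime

c, c_prime = one_run_game()
print(f"{c} correct out of {c_prime} guesses")
```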

The goal of the new framework is to take these observed counts (\(c\) and \(c'\)) and mathematically work backward to find the tightest \(f\)-DP curve that could plausibly explain this success rate.

Core Method: Tight Auditing via Recursive Analysis

The primary contribution of this paper is a rigorous mathematical link between the adversary’s success rate in a “One Run” game and the \(f\)-DP curve.

Why Prior Methods were Loose

Previous “One Run” auditing (Steinke et al.) bounded the adversary’s success using a Binomial distribution derived from \(\epsilon\). They effectively asked: “If the model is \(\epsilon\)-private, what is the maximum number of correct guesses an adversary could make?”

However, this approach approximates the privacy curve at a single point. Real mechanisms, like the Gaussian mechanism, have a curved trade-off. By checking privacy at a single \((\epsilon, \delta)\) point, previous methods fail to utilize the full geometric constraints of the privacy mechanism, leading to pessimistic estimates (auditing values far below the theoretical truth).
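
To see the point-wise style of reasoning in code, here is a deliberately simplified check in the spirit of the baseline (the actual Steinke et al. bound also accounts for \(\delta\), abstentions, and adaptivity): treat each guess as correct with probability at most \(e^{\epsilon}/(1+e^{\epsilon})\) under an \(\epsilon\)-DP hypothesis, and ask whether the observed count is plausible under the resulting binomial tail.

```python
import numpy as np
from scipy.stats import binom

def epsilon_is_rejected(c, c_prime, epsilon, alpha=0.05):
    """Crude point-wise audit: reject the epsilon-DP hypothesis if seeing at
    least c correct answers out of c_prime guesses is too unlikely.

    Under (pure) epsilon-DP each guess is correct with probability at most
    p = e^eps / (1 + e^eps); delta and adaptivity are ignored here.
    """
    p = np.exp(epsilon) / (1.0 + np.exp(epsilon))
    tail = binom.sf(c - 1, c_prime, p)    # P[Binomial(c', p) >= c]
    return tail < alpha

# Example: 800 correct out of 1000 guesses rules out small epsilon values.
for eps in [0.5, 1.0, 1.5, 2.0]:
    print(eps, epsilon_is_rejected(c=800, c_prime=1000, epsilon=eps))
```

The largest rejected \(\epsilon\) becomes the empirical lower bound; the looseness comes from collapsing the entire privacy curve into a single per-guess probability.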

The New Approach: Bounding the Tail

The authors propose analyzing the entire \(f\)-DP curve. They derive a recursive relationship that bounds the probability of an adversary making exactly \(i\) correct guesses.

The central theorem of the paper (Theorem 9) provides an invariant that must hold for any \(f\)-DP mechanism.

Inequality bounding the weighted sum of probabilities of correct guesses using the f-DP trade-off function.

This inequality might look daunting, but it encodes a powerful “shuffling” argument. Here is the intuition:

  1. Symmetry: The specific identity of the canaries doesn’t matter, only the number of correct guesses. The analysis assumes the canaries are permuted (shuffled).
  2. Conditioning: The authors analyze the probability of getting a specific number of guesses correct by conditioning on the success of the first guess.
  3. Recursive Bounding: They derive a way to bound the probability \(p_i\) (probability of \(i\) correct guesses) based on the probabilities of \(0\) to \(i-1\) correct guesses.

By iterating this bound, they can numerically construct the “tail” of the distribution—the cumulative probability that an adversary gets at least \(c\) guesses right.

Algorithm 3: The Numerical Auditor

The theoretical bound is implemented via a numerical algorithm (referred to as Algorithm 3 in the paper).

  1. Input: The observed correct guesses \(c\), total guesses \(c'\), and the hypothesized trade-off function \(f\).
  2. Process: The algorithm recursively calculates lower bounds on the probabilities of previous states.
  3. Decision: It checks if the sum of these probabilities leads to a contradiction (i.e., total probability > 1).
  4. Output: If a contradiction is found, the hypothesized privacy guarantee \(f\) is too strong to be consistent with the adversary’s success, and the hypothesis is rejected.

This allows the auditor to test various privacy curves (e.g., Gaussian curves with different noise levels) and find the strongest privacy curve that remains consistent with the attack results.
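
The outer loop of such an audit can be sketched as a search over a one-parameter family of curves, for example the Gaussian curves \(G_\mu\). The sketch below assumes a callback `rejects(mu, c, c_prime)` that plays the role of the paper’s contradiction check (in practice this would wrap the recursive tail bound of Algorithm 3); the placeholder test included for demonstration is a crude per-guess accuracy bound, not the paper’s method.

```python
from scipy.stats import binom, norm

def empirical_gdp_lower_bound(c, c_prime, rejects, mu_max=20.0, tol=1e-3):
    """Find (approximately) the largest mu such that mu-GDP is still rejected.

    `rejects(mu, c, c_prime)` must return True when the observed counts are
    inconsistent with the mu-GDP curve. Smaller mu is a stronger privacy claim,
    so rejection is monotone: everything below the returned value is ruled out.
    """
    if rejects(mu_max, c, c_prime):
        return mu_max                     # leakage exceeds the search range
    if not rejects(tol, c, c_prime):
        return 0.0                        # the attack rules out nothing
    lo, hi = tol, mu_max                  # invariant: lo rejected, hi accepted
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if rejects(mid, c, c_prime):
            lo = mid
        else:
            hi = mid
    return lo

# Placeholder test, for demonstration only (NOT the paper's bound): under
# mu-GDP the best per-guess accuracy is at most Phi(mu / 2), so treat the
# correct-guess count as Binomial(c', Phi(mu / 2)) and reject at level 0.05.
def placeholder_rejects(mu, c, c_prime, alpha=0.05):
    return binom.sf(c - 1, c_prime, norm.cdf(mu / 2.0)) < alpha

print(empirical_gdp_lower_bound(800, 1000, placeholder_rejects))
```

Swapping the placeholder for the f-DP tail check is one way to extract a single empirical number from the recursive bound.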

Generalization: Reconstruction Attacks

The paper generalizes this beyond simple “in or out” membership inference. It also supports reconstruction attacks, where the adversary tries to identify which data point from a set of \(k\) options was used.

Algorithm 2 showing the steps for the Reconstruction in One Run game.

In this variant, the math adjusts to account for the difficulty of guessing 1 out of \(k\) options, rather than just binary classification. This generalization makes the auditing framework applicable to a wider range of privacy threats.
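
For intuition, here is a minimal simulation of the reconstruction variant on the same kind of toy mechanism; the bucket construction and the argmax adversary are illustrative choices, not the paper’s exact protocol.

```python
import numpy as np

rng = np.random.default_rng(1)

def reconstruction_game(n_buckets=500, k=8, dim=2000, sigma=1.0):
    """Reconstruction-style game: one of k candidates per bucket is trained on."""
    secrets = rng.integers(k, size=n_buckets)          # which candidate was used
    candidates = rng.standard_normal((n_buckets, k, dim))
    candidates /= np.linalg.norm(candidates, axis=2, keepdims=True)

    # "Training": noisy sum of the chosen candidate from every bucket.
    chosen = candidates[np.arange(n_buckets), secrets]
    output = chosen.sum(axis=0) + sigma * rng.standard_normal(dim)

    # Adversary: in each bucket, guess the candidate most correlated with the output.
    guesses = (candidates @ output).argmax(axis=1)
    c = int(np.sum(guesses == secrets))
    return c, n_buckets

c, total = reconstruction_game(k=8)
print(f"{c} / {total} buckets reconstructed (chance level ~ {total / 8:.0f})")
```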

Empirical Privacy and Results

To demonstrate the effectiveness of this new method, the authors compare their Empirical \(\epsilon\) (the privacy level audited from the attack) against the Theoretical \(\epsilon\) (the mathematically proven privacy) and the results from Steinke et al. (2023).

An “ideal” auditor would produce an empirical \(\epsilon\) exactly equal to the theoretical \(\epsilon\). A “loose” auditor produces a value much lower.
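
One reporting detail matters here: the audit produces a trade-off curve, so a single “empirical \(\epsilon\)” is obtained by converting that curve to an \((\epsilon, \delta)\) point at a fixed \(\delta\). Assuming the audited curve is summarized as a Gaussian curve with parameter \(\mu\) (a common convention; the paper’s exact reporting may differ), the conversion has a closed form from the Gaussian DP literature.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def delta_for_epsilon(epsilon, mu):
    """delta(epsilon) achieved by a mu-GDP mechanism (Dong, Roth, and Su)."""
    return (norm.cdf(-epsilon / mu + mu / 2.0)
            - np.exp(epsilon) * norm.cdf(-epsilon / mu - mu / 2.0))

def epsilon_for_delta(delta, mu, eps_max=100.0):
    """Smallest epsilon such that a mu-GDP mechanism is (epsilon, delta)-DP."""
    if delta_for_epsilon(0.0, mu) <= delta:
        return 0.0
    # delta_for_epsilon is decreasing in epsilon, so a root find suffices.
    return brentq(lambda e: delta_for_epsilon(e, mu) - delta, 0.0, eps_max)

# Example: an audited curve of mu = 2.0 reported at delta = 1e-5.
print(epsilon_for_delta(1e-5, mu=2.0))
```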

Comparison on Gaussian Mechanisms

The figure below is the most telling result of the paper. It compares auditing performance on a standard Gaussian mechanism across different noise levels.

Composite line charts comparing empirical epsilon bounds against theoretical epsilon. The orange line (Ours) is significantly higher and closer to the theoretical green line than the blue line (Steinke et al.).

Interpretation:

  • Green Line (Theoretical): The ground truth privacy.
  • Blue Line (Steinke et al.): The previous state-of-the-art. Notice how it drops off significantly as the number of canaries increases (x-axis), particularly for lower noise levels.
  • Orange Line (Ours): The proposed method. It consistently hugs the theoretical line much closer than the baseline. Crucially, as the number of canaries increases, the bound improves (moves up), whereas the baseline degrades.

This demonstrates that analyzing the full \(f\)-DP curve captures privacy guarantees that point-wise \((\epsilon, \delta)\) analysis misses.

Real-World Training: CIFAR-10

The authors also validated their method on a real neural network trained on the CIFAR-10 dataset using DP-SGD. They performed a gradient-based membership inference attack in a white-box setting (access to intermediate model updates).

Graph comparing empirical lower bounds on CIFAR-10. The orange curve (Ours) yields tighter estimates than the blue curve (Steinke et al.) across different theoretical epsilon values.

Again, the trend holds. For a model with a theoretical \(\epsilon=8\), the new method certifies an empirical \(\epsilon\) of approximately 6.5, whereas the previous method certifies significantly less. This “gap closing” allows practitioners to claim stronger privacy guarantees for their models without changing the training process, simply by using a better auditing tool.

Impact of Experimental Parameters

The effectiveness of the audit depends on the “guessing game” parameters, such as the bucket size for reconstruction attacks (\(k\)) and the number of guesses (\(c'\)).

Line charts showing the effect of bucket size on empirical lower bounds. Larger bucket sizes generally allow for tighter privacy bounds.

The results indicate that using larger bucket sizes (making the reconstruction task harder for the adversary) allows the auditing math to extract tighter bounds, further closing the gap to the theoretical limit.

Conclusion and Implications

The “One Run” auditing framework represents a critical step toward practical privacy verification. By removing the computational barrier of multiple training runs, it makes auditing accessible for large-scale AI models.

The specific contribution of Mahloujifar, Melis, and Chaudhuri—integrating \(f\)-Differential Privacy into this framework—addresses the second major hurdle: accuracy. By utilizing the entire privacy curve rather than a loose approximation, they provide:

  1. Tighter Bounds: Empirical estimates that are much closer to the theoretical truth.
  2. Scalability: Bounds that improve, rather than degrade, as the auditor injects more canaries.
  3. Flexibility: A unified framework for both Membership Inference and Reconstruction attacks.

While a gap between empirical and theoretical privacy remains, this work significantly narrows it. For engineers and researchers deploying DP models, this means they can verify their implementations with greater confidence, ensuring that the “privacy budget” spent in theory is actually protecting users in practice.