Introduction: The Privacy Dilemma in the Age of AI
Imagine you are using a voice assistant. To understand your command, the system needs to analyze the content of your speech. However, your voice recording carries much more than just the words you spoke; it contains your gender, your accent, your approximate age, and potentially even your identity.
This is the fundamental tension in modern Machine Learning (ML) services: to provide utility, they need data. But that data often comes coupled with sensitive, private attributes that users have no desire (and often no need) to share.
For years, researchers have tried to build “obfuscators”—filters that scrub private information from data while keeping the useful parts intact. Most state-of-the-art methods rely on adversarial training. In this setup, a “protector” model tries to hide information, while an “adversary” model tries to find it. The protector learns by trying to fool the adversary.
But there is a catch. What happens if the adversary you face in the real world is smarter than the one you trained against?
In a recent research paper titled “PASS: Private Attributes Protection with Stochastic Data Substitution,” researchers uncover a critical vulnerability in these traditional methods and propose a radical new approach. Instead of trying to edit or generate a “scrubbed” version of your data, they propose simply swapping it with someone else’s data that looks similar enough to be useful, but different enough to be private.

As illustrated in the comic above, the core idea is simple yet profound. If an orange cat wants to hide the fact that it is orange, painting spots on itself might fool a simple detector. But a smart “fox” (a strong adversary) will see through the paint. A better strategy? Swap the photo entirely with a real photo of a Calico cat.
The Weakness of Adversarial Training
To understand why PASS is necessary, we first need to look at why current methods fail. Existing privacy-preserving methods (like GAP, PPDAR, or MaSS) typically use an encoder-decoder structure. They take an input \(X\) (e.g., a photo), compress it, and reconstruct a new image \(X'\) that minimizes the leakage of a private attribute \(S\) (e.g., gender) while maintaining a useful attribute \(U\) (e.g., expression).
These models are trained against a specific adversarial classifier. The assumption is: “If my adversary can’t guess the gender from this image, no one can.”
The authors of PASS challenge this assumption with the Probing Attack. They demonstrate that if an attacker simply collects enough input/output pairs from the deployed obfuscation model, they can train a fresh, separate classifier that easily bypasses the protection.

The table above shows this vulnerability on the Motion Sense dataset. Look at the NAG-Protector column (the defense’s internal metric) versus the NAG-Attacker column (the real-world threat). For methods like GAP or MSDA, the protector thinks it has achieved near-perfect privacy (NAG \(\approx\) 0%). But a probing attacker can recover the private attributes with high accuracy (NAG > 60%).
This reveals that adversarial training often results in “overfitting” to the specific adversary used during training, leaving the system wide open to new, stronger attacks.
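To make the threat concrete, here is a minimal sketch of a probing attack in PyTorch. Everything in it (the toy “defense”, the data shapes, the probe architecture) is an illustrative placeholder rather than the paper’s setup; the point is simply that the attacker only needs black-box access to the deployed obfuscator plus their own labeled examples of the private attribute.

```python
import torch
import torch.nn as nn

# Illustrative stand-ins: a black-box "defense" and the attacker's labeled data.
# In a real probing attack these come from the deployed service and from a
# corpus the attacker labeled themselves.
obfuscator = nn.Sequential(nn.Linear(64, 64), nn.Tanh())   # placeholder defense
xs = torch.randn(256, 64)                                   # attacker's raw inputs
ss = torch.randint(0, 2, (256,))                            # private labels (e.g., gender)

probe = nn.Sequential(nn.Linear(64, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(probe.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for _ in range(100):
    with torch.no_grad():
        x_obf = obfuscator(xs)            # query the deployed defense (black box)
    loss = loss_fn(probe(x_obf), ss)      # fit a *fresh* classifier on its outputs
    opt.zero_grad()
    loss.backward()
    opt.step()

# If the defense merely fooled its training-time adversary, this new probe
# often recovers the private attribute far above chance.
```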
The PASS Solution: Stochastic Data Substitution
To solve this, the researchers introduce PASS (Private Attributes protection with Stochastic data Substitution). PASS abandons the idea of generating new data. Instead, it maintains a “substitution dataset”—a pool of real, public samples.
When a user submits their data, PASS calculates which sample from the substitution dataset would best mask the user’s private attributes while preserving the useful ones. It then replaces the user’s data with that substitute.
A Concrete Example
Let’s look at how this works on facial images. Suppose we have a dataset of faces.
- Private Attribute (\(S\)): Sex (we want to hide this).
- Useful Attribute (\(U\)): Eyeglasses and Smiling (we need to keep these).
- General Features (\(X\)): Things like hair color or age (we want to preserve these generally to keep the image realistic).

In Figure 2, the original sample is a woman wearing sunglasses. PASS analyzes the attributes and selects a substitute: it might swap the original photo for a different photo of a woman with sunglasses, or perhaps a man with sunglasses, according to a probability distribution designed to confuse the gender classifier. Crucially, the private attribute (Sex) of the shared sample becomes effectively random, while the useful attributes (such as Eyeglasses) remain accurate.
The Architecture
So, how does the model decide which sample to swap in? It doesn’t just pick randomly. It uses a rigorous probability framework.

The architecture, shown above, involves two main steps:
- Embedding: The original sample \(x\) and all potential substitute samples \(x'\) are run through feature extractors (\(f\) and \(g\)) to create embedding vectors.
- Probability Calculation: The system calculates a substitution probability \(P_\theta(X'|X)\) based on the similarity of these embeddings.
The probability of swapping input \(x\) with substitute \(x'\) is determined by the cosine similarity of their embeddings, scaled by a temperature parameter \(\tau\).

This formulation keeps the substitution probabilities differentiable with respect to the model parameters, so the feature extractors \(f\) and \(g\) can be trained with standard backpropagation.
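Here is a minimal sketch of that calculation in PyTorch, assuming \(f\) and \(g\) are simple embedding networks and the substitution pool fits in a single tensor; all names, shapes, and the toy usage below are illustrative rather than the authors’ implementation.

```python
import torch
import torch.nn.functional as F

def substitution_probs(x, pool, f, g, tau=0.1):
    """P(X' = pool[k] | X = x): softmax over cosine similarities, scaled by tau."""
    z = F.normalize(f(x), dim=-1)           # embed the user's samples, (B, D)
    z_pool = F.normalize(g(pool), dim=-1)   # embed every candidate substitute, (N, D)
    sims = z @ z_pool.T                     # cosine similarities, (B, N)
    return F.softmax(sims / tau, dim=-1)    # temperature-scaled distribution

# Toy usage: 8 inputs, a pool of 1,000 candidate substitutes.
f, g = torch.nn.Linear(64, 32), torch.nn.Linear(64, 32)
x, pool = torch.randn(8, 64), torch.randn(1000, 64)
probs = substitution_probs(x, pool, f, g)        # differentiable w.r.t. f and g
idx = torch.multinomial(probs, num_samples=1)    # stochastic pick at deployment
substitutes = pool[idx.squeeze(-1)]
```

In a sketch like this, the training losses would be computed from the full distribution `probs` (so gradients reach \(f\) and \(g\)); the discrete draw is only needed when the system actually serves a substitute.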
The Information-Theoretic Loss Function
The “brain” of PASS lies in how it learns to select substitutes. The researchers derived a novel loss function based on Information Theory. The goal is to optimize the trade-off between three competing objectives:
- Minimize Mutual Information with Private Attributes (\(I(X'; S_i)\)): The substitute should tell us nothing about the sensitive data.
- Maximize Mutual Information with Useful Attributes (\(I(X'; U_j)\)): The substitute should reveal as much as possible about the required data.
- Maximize Mutual Information with the Original Data (\(I(X'; X)\)): The substitute should still resemble the original data in general ways (preserving unannotated features like background or texture).
The high-level optimization objective looks like this:

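The exact equation from the paper is not reproduced here, but combining the three terms above with trade-off weights (the \(\lambda\) notation below is assumed for illustration), the objective takes roughly the form

\[
L \;=\; \sum_i \lambda_{S_i}\, I(X'; S_i) \;-\; \sum_j \lambda_{U_j}\, I(X'; U_j) \;-\; \lambda_X\, I(X'; X),
\]

to be minimized over the parameters \(\theta\) of the substitution distribution \(P_\theta(X' \mid X)\).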
However, mutual information cannot be computed exactly from mini-batches of data during training. To get around this, the authors derived a mathematically sound upper bound, resulting in a computable surrogate loss function \(\hat{L}\).

Let’s break down the three components of this loss function:
1. Protecting Privacy (\(\hat{L}_{S_i}\))
This term tries to maximize the conditional entropy of the substituted data given the private attribute. In simpler terms, if the original image is labeled “Male,” the system is encouraged to choose substitutes that are equally plausible for a Male or a Female original, so an attacker inspecting the substitute cannot guess the original label.

2. Preserving Utility (\(\hat{L}_{U_j}\))
This term punishes the model if the useful attribute of the substitute (\(U'\)) does not match the useful attribute of the original (\(U\)). If the user says “Hello” (Useful: text content), the substitute audio must also say “Hello.”

3. Preserving General Features (\(\hat{L}_X\))
Finally, we want to preserve the “essence” of the data. This term encourages the selected substitute to be specific to the input, rather than mapping all inputs to a single generic image.

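As a rough sketch of how batch-wise stand-ins for these three terms could be computed from the substitution probabilities (illustrative only, not the authors’ exact \(\hat{L}\)): assume `probs[b, k]` approximates \(P_\theta(X' = x'_k \mid x_b)\) over a mini-batch, and `s`, `u`, `s_pool`, `u_pool` hold the private and useful labels of the batch and of the substitution pool.

```python
import torch

def surrogate_losses(probs, s, u, s_pool, u_pool, eps=1e-8):
    """Illustrative batch-wise stand-ins for the three loss terms.
    probs: (B, N) substitution probabilities; s, u: (B,) batch labels;
    s_pool, u_pool: (N,) labels of the substitution-pool samples."""
    B, N = probs.shape

    # 1) Privacy: push the induced distribution over the substitute's private
    #    label toward maximum entropy, so the original label is unrecoverable.
    p_s = torch.zeros(B, int(s_pool.max()) + 1).index_add_(1, s_pool, probs)
    loss_priv = (p_s * torch.log(p_s + eps)).sum(dim=1).mean()     # negative entropy

    # 2) Utility: penalize probability mass on substitutes whose useful label
    #    differs from the original's.
    match = (u_pool.unsqueeze(0) == u.unsqueeze(1)).float()        # (B, N)
    loss_util = -torch.log((probs * match).sum(dim=1) + eps).mean()

    # 3) General features: keep the choice input-specific instead of collapsing
    #    every input onto the same few candidates (a batch MI-style estimate).
    marginal = probs.mean(dim=0, keepdim=True)                     # (1, N)
    loss_x = -(probs * (torch.log(probs + eps)
                        - torch.log(marginal + eps))).sum(dim=1).mean()

    return loss_priv, loss_util, loss_x

# Toy usage with random labels and probabilities.
probs = torch.softmax(torch.randn(8, 100), dim=-1)
s, u = torch.randint(0, 2, (8,)), torch.randint(0, 10, (8,))
s_pool, u_pool = torch.randint(0, 2, (100,)), torch.randint(0, 10, (100,))
print(surrogate_losses(probs, s, u, s_pool, u_pool))
```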
The researchers rigorously prove that minimizing this surrogate loss \(\hat{L}\) effectively minimizes the theoretical objective \(L\).
Experimental Results
The researchers evaluated PASS on three distinct modalities: Audio (AudioMNIST), Human Activity (Motion Sense), and Facial Images (CelebA).
The primary metric used is NAG (Normalized Accuracy Gain).
- NAG = 0%: The attribute is completely hidden (random guessing).
- NAG = 100%: The attribute is perfectly preserved.
- Goal: We want Low NAG for private attributes and High NAG for useful attributes (see the short sketch after this list).
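As a quick illustration of the normalization (the exact baseline used in the paper may differ; here random-guessing accuracy is taken as the floor and accuracy on raw, unprotected data as the ceiling):

```python
def nag(acc_protected: float, acc_raw: float, acc_random: float) -> float:
    """Normalized Accuracy Gain, in percent.
    0%   -> the attacker does no better than random guessing;
    100% -> the attribute survives protection as well as on raw data.
    The choice of floor (random vs. majority-class accuracy) is an assumption here."""
    return 100.0 * (acc_protected - acc_random) / (acc_raw - acc_random)

# Example: a binary attribute (random floor = 50%) classified at 96% on raw
# data and at 50.2% after protection gives NAG ≈ 0.4%.
print(nag(0.502, 0.96, 0.50))
```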
Success on Audio Data
In the AudioMNIST experiment, the goal was to hide the speaker’s gender while preserving the spoken digit.

As shown in Table 2, PASS achieves a 0.0% NAG on Gender. This means the privacy protection is perfect; an attacker guessing the gender of the speaker would do no better than flipping a coin. Meanwhile, the useful attribute (Digit) retains a 96.5% NAG, meaning utility is preserved.
Compare this to the baselines:
- ADV (Adversarial): Leaks significant gender info (71.4% NAG).
- GAP: Hides gender well (13.3%) but destroys utility (Digit drops to 21.2%).
Success on Facial Images
Similar results were found on the CelebA dataset. Here, the task was to hide “Male” (Sex) while preserving “Smiling” and “Young.”

PASS achieves a massive mNAG (mean Normalized Accuracy Gain) of 72.9%, far outperforming the closest competitor. It successfully drives the private attribute leakage down to near zero while keeping the relevant facial features intact.
Visualizing the Substitution
The effectiveness of PASS becomes even clearer when we look at the confusion matrices for the Motion Sense dataset.

In these heatmaps:
- (c) Activity: You see a strong diagonal line. This means “Walking” is substituted with “Walking,” and “Jogging” with “Jogging.” Utility is preserved.
- (a) Gender and (b) ID: The matrices look like random static. There is no correlation between the input gender/ID and the substitute’s gender/ID. Privacy is protected.
Robustness
One might wonder: does this only work for specific attributes? The authors conducted ablation studies, varying the combination of private and useful attributes.

Table 3 shows that regardless of which attributes are marked private or useful, PASS consistently adapts, maintaining high utility scores (high mNAG) and effectively zeroing out the leakage of private data.
Conclusion and Implications
The PASS framework represents a significant step forward in privacy-preserving machine learning. By moving away from adversarially trained obfuscators, which are prone to “arms races” against attackers, and embracing stochastic data substitution, PASS offers a more theoretically sound and empirically robust defense.
Key Takeaways:
- Adversarial Training is Risky: Standard obfuscation methods are vulnerable to probing attacks where attackers train new classifiers on the obfuscated data.
- Substitution Works: Replacing data with carefully selected real-world samples breaks the link between the user’s private info and the shared data.
- Theoretical Foundation: PASS isn’t just a heuristic; its loss function is a mathematically derived upper bound on Information Theoretic objectives.
- Versatility: It works effectively across audio, sensor data, and images.
For students and researchers entering the field of privacy in AI, PASS highlights an important lesson: sometimes the best way to hide information isn't to destroy it or drown it in noise, but to hide it in plain sight among a crowd of substitutes.