When Machines See What We Can’t: Understanding Adversarial Doppelgängers
Imagine you are looking at a picture of a cat. It’s a tabby cat. You are absolutely certain of it. Now, imagine a computer looks at the exact same picture and confidently tells you it’s a Persian cat. You squint. You zoom in. You check every pixel. To your human eyes, nothing has changed.
This isn’t the typical story of “adversarial examples” where someone adds static noise to a photo of a panda to make an AI think it’s a gibbon. This is something subtler and more profound. This is the phenomenon of Adversarial Doppelgängers.
In the research paper Doppelgängers and Adversarial Vulnerability, George Kamberov presents a fascinating mathematical and philosophical dive into why machine learning classifiers fail in ways that are “perceptually and cognitively disturbing” to humans. The paper argues that our current methods of measuring robustness—using standard distance metrics—are fundamentally flawed because they don’t map to human perception.
In this post, we will break down the complex topology of perception, explore why “high accuracy” might actually be the key to safety, and define a new class of inputs that look identical to us but completely different to a machine.
The Problem: Two Kinds of “Same”
To understand the core contribution of this paper, we first need to distinguish between two types of adversarial attacks.
The Classic Adversarial Example
The machine learning community has spent years studying adversarial examples. Usually, these are created by taking an image and adding a specifically calculated perturbation (noise) to it.

As shown in Figure 2 above, a classic attack changes the input in a way that might look like corruption or noise to a human. We can see that image (b) is pixelated and distorted compared to the clean Labrador in image (a). While we might still recognize the dog, we can clearly see that the two images are different.
The Adversarial Doppelgänger (AD)
Now, look at Figure 1 below.

These two images are Doppelgängers. To a human observer, image (a) and image (b) are perceptually indiscriminable. You cannot tell them apart. Yet, the MobileNetV2 classifier sees them as two distinct classes: “tabby” and “Persian.”
This is the core problem the paper addresses. An Adversarial Doppelgänger (AD) is an input that is indistinguishable from a source input by a human (in a specific context) but is classified differently by the machine. If a machine makes a mistake on an image that looks visibly corrupted (like Figure 2), that’s an issue. But if a machine makes a mistake on an image that looks identical to the correct one (Figure 1), that reveals a fundamental mismatch between human topology (how we organize the world) and machine topology.
Background: The Limits of Geometry
Why do these errors happen? The paper argues that we have been using the wrong ruler to measure the world.
In most machine learning research, we assume the “space” of all possible images is a metric space (usually equipped with an \(L_p\) norm). We assume that if two images are “close” in terms of pixel values, they should have the same label. Likewise, when we simulate an attack, we restrict the perturbation to a small mathematical distance from the original.
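To make this standard view concrete, here is a minimal sketch (my own illustration, not code from the paper) of the usual \(L_p\) “closeness” test: a perturbation only counts as a valid attack if it fits inside a small norm budget. The function name, the toy image, and the budget are all assumptions.

```python
import numpy as np

# A minimal sketch of the standard L_p view of "closeness" (illustrative only):
# a perturbed input counts as an attack candidate if it stays inside an
# epsilon-ball around the original.
def within_lp_budget(x, x_adv, eps, p=np.inf):
    """Check whether x_adv lies within an L_p ball of radius eps around x."""
    delta = (x_adv - x).ravel()
    return np.linalg.norm(delta, ord=p) <= eps

rng = np.random.default_rng(0)
x = rng.random((8, 8))                           # toy "image"
x_adv = x + rng.uniform(-0.01, 0.01, x.shape)    # small pixel-wise perturbation
print(within_lp_budget(x, x_adv, eps=0.02))      # True: "close" by the L_inf ruler
```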
However, human perception does not follow these rigid mathematical rules. Perception is context-relative. Two colors might look identical in dim light but distinct in bright light. Two sounds might be indistinguishable in a noisy room but distinct in a quiet studio.
The paper introduces the concept of Indiscriminability (\(\approx\)). Two inputs, \(x\) and \(y\), are indiscriminable if, at a specific time and context, the subject cannot activate the knowledge required to tell them apart.
This relationship creates a Perceptual Topology (\(\tau_{\delta}\)). Unlike a standard metric space where distance is absolute, a perceptual topology is built on “phenomenal neighborhoods”—bubbles of reality that look the same to us.
Core Method: Mapping the Perceptual Topology
This is the heart of the paper: we need to move away from measuring distance with rulers and start measuring it with “indiscriminability.”
1. Defining the Doppelgänger
The paper formally defines the phenomenal neighborhood, or the set of Doppelgängers for an input \(x\), denoted as \(\mathfrak{d}(x)\).
If we look at Weber’s Law (a psychophysical law stating that the “just noticeable difference” in a stimulus is proportional to the stimulus’s magnitude), we can mathematically describe these neighborhoods. For a value \(x\) in a range \([a, b]\), the set of values indistinguishable from \(x\) looks like this:

Here, \(w\) represents the Weber constant (related to sensitivity). The equation carves out a “zone” around each point \(x\) within which differences are invisible to the observer. Anything falling inside this zone is a Doppelgänger of \(x\).
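As a rough illustration (my own sketch, not the paper’s exact equation), a Weber-style indiscriminability test for scalar stimuli might look like the following; the Weber fraction `w`, the symmetric comparison, and the positive-valued range are all assumptions.

```python
# Hedged sketch of Weber-style indiscriminability for positive scalar stimuli.
# The constant w and the symmetric test below only mimic "just noticeable
# difference proportional to magnitude"; they are not the paper's formula.
def indiscriminable(x, y, w=0.05):
    """True if the gap between x and y is below the Weber fraction of the smaller stimulus."""
    return abs(x - y) <= w * min(x, y)

def doppelganger_zone(x, w=0.05):
    """Approximate interval of values the observer cannot tell apart from x (assumes x > 0)."""
    return (x * (1 - w), x * (1 + w))

print(indiscriminable(100.0, 103.0))   # True: a 3-unit gap is invisible at magnitude 100
print(indiscriminable(10.0, 13.0))     # False: the same gap is obvious at magnitude 10
print(doppelganger_zone(100.0))        # roughly (95.0, 105.0)
```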
2. The Perceptual Metric
Because standard Euclidean distance doesn’t capture this phenomenon, the paper proposes a perceptual distance. This isn’t based on pixels; it’s based on the “discrimination graph”—essentially counting how many “hops” of indiscriminability you need to get from one object to another.
The metric \(d_w\) is defined as:

If two items are Doppelgängers (part of the same phenomenal neighborhood), the distance is effectively zero in a perceptual sense, even if their pixel values differ. This metric highlights that ADs are qualitatively different from standard adversarial examples.
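To get a feel for how such a metric behaves, here is a toy version (my own reading of the construction, not the paper’s exact \(d_w\)): build a graph whose edges connect mutually indiscriminable stimuli and count hops with a breadth-first search. In this sketch Doppelgängers sit one hop apart, the smallest possible separation, while clearly distinguishable items are many hops away.

```python
from collections import deque

# Hedged sketch of a "hop count" perceptual distance on a discrimination graph.
# Nodes are stimuli; an edge connects two stimuli the observer cannot tell apart.
def perceptual_distance(graph, a, b):
    """Fewest indiscriminability hops needed to get from a to b (BFS)."""
    if a == b:
        return 0
    seen, frontier = {a}, deque([(a, 0)])
    while frontier:
        node, dist = frontier.popleft()
        for nbr in graph.get(node, ()):
            if nbr == b:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return float("inf")   # never confusable, even through intermediaries

# Toy chain of stimuli: each one is confusable only with its immediate neighbours.
graph = {"s0": ["s1"], "s1": ["s0", "s2"], "s2": ["s1", "s3"], "s3": ["s2"]}
print(perceptual_distance(graph, "s0", "s1"))   # 1: Doppelgängers
print(perceptual_distance(graph, "s0", "s3"))   # 3: clearly distinguishable
```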
3. Features and Indiscriminability
How does this relate to the features a neural network learns? The paper draws a distinction between indiscriminability (we can’t tell them apart) and indiscernibility (they have identical properties).
We can link features to perception using discriminative feature representations. If two inputs \(x\) and \(y\) share a set of features \(\Phi\), they are indiscriminable if their feature sets overlap:

Furthermore, we can define a semantic cluster. This is the set of all inputs in the world that share a specific feature \(\xi\):

This helps us understand why ADs exist: if the machine relies on features that are not part of the human “discriminative representation,” it will see differences where we see none, or similarities where we see differences.
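A tiny sketch of how this could be operationalized (the feature sets, the image names, and the “non-empty overlap” test are illustrative assumptions, not the paper’s exact formalism):

```python
# Hedged sketch of feature-based indiscriminability and semantic clusters.
features = {
    "img_cat_1": {"pointed_ears", "whiskers", "striped_coat"},
    "img_cat_2": {"pointed_ears", "whiskers", "long_fur"},
    "img_truck": {"wheels", "windshield"},
}

def indiscriminable_by_features(x, y, phi=features):
    """Two inputs count as indiscriminable here if their feature sets overlap."""
    return bool(phi[x] & phi[y])

def semantic_cluster(xi, phi=features):
    """All inputs that exhibit the feature xi."""
    return {name for name, feats in phi.items() if xi in feats}

print(indiscriminable_by_features("img_cat_1", "img_cat_2"))   # True
print(indiscriminable_by_features("img_cat_1", "img_truck"))   # False
print(semantic_cluster("whiskers"))   # {'img_cat_1', 'img_cat_2'}
```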
4. The “Regular” Classifier
The ultimate goal is to build a Regular Classifier. A classifier \(R\) is “perceptually regular” if it respects the boundaries of human perception. In simple terms: it should never assign different labels to two Doppelgängers.
Mathematically, this means that for any class \(R_i\), it must be formed by the union of equivalence classes (groups of Doppelgängers):

If a classifier is NOT regular, it means there is at least one input \(x\) where the machine says “Class A” but a visually identical input \(y\) is “Class B”. This is the definition of vulnerability to Adversarial Doppelgängers.
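Here is a minimal regularity check under this definition; the `predict` function and the hand-built Doppelgänger groups are illustrative assumptions.

```python
# Hedged sketch: a classifier is "perceptually regular" if it never splits a
# group of mutually indiscriminable inputs across labels.
def is_regular(predict, doppelganger_groups):
    """Return (True, None, None) if every group gets one label, else a witness."""
    for group in doppelganger_groups:
        labels = {predict(x) for x in group}
        if len(labels) > 1:
            return False, group, labels   # witness of AD vulnerability
    return True, None, None

# Toy example: 0.0 and 0.01 look identical, but the classifier splits them.
predict = lambda x: "tabby" if x < 0.005 else "persian"
groups = [[0.0, 0.01], [0.9, 0.91]]
print(is_regular(predict, groups))
# (False, [0.0, 0.01], {'tabby', 'persian'}) -> the classifier is AD-vulnerable
```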
Experiments & Analysis: The Accuracy Paradox
One of the most provocative sections of the paper challenges the common wisdom in machine learning that there is a trade-off between Accuracy and Robustness. The prevailing view is that to make a model more robust, you must sacrifice some accuracy.
The paper proves that for Adversarial Doppelgängers, this is incorrect. In fact, for very high-performing models, increasing accuracy is the only way to achieve robustness.
Defining Accuracy and Recall
First, let’s establish the metrics. We assume there is a “ground truth” or ideal world model \(\Omega\). The accuracy of a classifier \(R\) is the measure of the overlap between the classifier’s predictions and the ground truth:

We can break this down into recall rates (\(\rho_i\)) for each class:

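In code, these quantities are just the familiar overall accuracy and per-class recall; here is a small sketch with made-up labels (the function name and data are assumptions):

```python
import numpy as np

# Hedged sketch of overall accuracy and per-class recall rates (the rho_i's),
# computed from predicted vs. ground-truth labels.
def accuracy_and_recalls(y_true, y_pred, classes):
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    acc = float(np.mean(y_true == y_pred))
    recalls = {
        c: float(np.mean(y_pred[y_true == c] == c))   # fraction of class c recovered
        for c in classes
    }
    return acc, recalls

y_true = ["tabby", "tabby", "persian", "persian", "persian"]
y_pred = ["tabby", "persian", "persian", "persian", "tabby"]
print(accuracy_and_recalls(y_true, y_pred, ["tabby", "persian"]))
# (0.6, {'tabby': 0.5, 'persian': 0.666...})
```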
The “Danger Zone” of Low Accuracy
The paper introduces a bound \(\bar{k}(\Omega)\) which relates the size of the true class to the size of the Doppelgänger sets.

If a classifier has recall rates that are too low (meaning it’s not very accurate), the paper proves that every correctly classified input has adversarial Doppelgängers.

Essentially, if your model isn’t accurate enough, there is always a version of an input that looks identical to a human but will fool the model.
Hypersensitivity: The High-Accuracy Regime
Here is the twist. What happens if accuracy gets very, very high?
The paper identifies a condition where the recall rate \(\rho\) satisfies:

If a classifier satisfies this condition, it exhibits Hypersensitive Behavior.
Definition: A classifier is hypersensitive if every misclassified input is an Adversarial Doppelgänger.
Think about what this means. If a model is hypersensitive, it never makes “stupid” mistakes (like calling a clear picture of a truck a bird). Its only mistakes occur on inputs that are perceptually indistinguishable from the correct class.
Therefore, for hypersensitive classifiers, improving robustness is equivalent to improving accuracy. There is no trade-off. To fix the vulnerability, you just need to make the model more accurate until it aligns perfectly with the “Regular” classifier.
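One way to think about this operationally (my own sketch, not a procedure from the paper): check whether every mistake a model makes has a correctly classified look-alike. The helpers `predict`, `true_label`, and `looks_identical` are placeholders standing in for the model, the ground truth, and a human indiscriminability judgement.

```python
# Hedged empirical proxy for hypersensitivity: is every error an "AD error",
# i.e. does every misclassified input have a correctly classified Doppelgänger?
def all_errors_are_ad_errors(samples, predict, true_label, looks_identical):
    for x in samples:
        if predict(x) == true_label(x):
            continue                                   # not a mistake
        has_correct_twin = any(
            looks_identical(x, y) and predict(y) == true_label(y)
            for y in samples if y is not x
        )
        if not has_correct_twin:
            return False                               # a "stupid" (non-AD) mistake
    return True

samples = [0.00, 0.01, 0.50]
predict = lambda x: "A" if x < 0.005 else "B"
true_label = lambda x: "A" if x < 0.25 else "B"
looks_identical = lambda x, y: abs(x - y) < 0.02
print(all_errors_are_ad_errors(samples, predict, true_label, looks_identical))
# True: the model's only mistake (0.01) has a correctly classified look-alike (0.0)
```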
Life Without Borders: Prototypes and Fringes
The paper delves into the structure of categories. In classical set theory, sets have hard boundaries. In human perception, categories are fuzzy. A “chair” is a chair, but a beanbag is sort of a chair.
The paper uses the concept of Prototypes to explain how regular classifiers should be structured. We measure the “affinity” of an input \(x\) to a class \(D\) using a similarity scale \(s\):

An input is a Prototype if it maximizes this affinity—it is the most “representative” member of the class:

Conversely, a Fringe element is one that barely belongs to the class:

The paper provides a complex but insightful derivation for calculating affinity based on the prominence of features and the probability of encountering the input:

This equation (Equation 19 in the paper) tells us that a true prototype isn’t just the most “average”-looking item. It is an item that balances prominence (it has strong features of the class) with frequency (it is a version of the object we actually encounter).
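To see why this balance matters, here is a toy affinity score (a deliberately simplified stand-in for Equation 19, with made-up prominence and frequency values):

```python
# Hedged sketch: affinity as a blend of "prominence" (strength of class features)
# and "frequency" (how often we actually encounter the input).
def affinity(prominence, frequency):
    return prominence * frequency

class_members = {                 # (prominence, frequency) -- illustrative values
    "office_chair": (0.90, 0.60),
    "dining_chair": (0.80, 0.70),
    "beanbag":      (0.30, 0.20),
    "throne":       (0.95, 0.01),
}

scores = {name: affinity(*vals) for name, vals in class_members.items()}
prototype = max(scores, key=scores.get)   # most representative member
fringe    = min(scores, key=scores.get)   # barely belongs to the class
print(prototype, fringe)                  # dining_chair throne
```

Note how the throne, despite having the strongest “chair” features, lands on the fringe under this toy score because we almost never encounter one, while the everyday dining chair becomes the prototype.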
Quantifying Vulnerability
Finally, how do we know if a specific model is vulnerable to these Doppelgänger attacks? The paper proposes measuring the Region of Conceptual Ambiguity (\(A(R)\)). This is the set of all inputs \(x\) where a Doppelgänger exists that flips the label.

To quantify the chaos within this region, we can look at the probability distribution of labels assigned to the Doppelgängers of \(x\):

And from this probability, we can calculate the Conceptual Entropy \(H_R(x)\). High entropy means the model is very confused about \(x\) and its look-alikes.

The Fooling Rate \(F_R(\hat{a})\) (how often an attack succeeds) is bounded by the size (measure) of this ambiguous region:

This gives researchers a theoretical upper limit on how bad an attack can be. If the region of ambiguity is small, the fooling rate must be low.
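A quick sketch of the entropy piece (illustrative only; the label lists are made up): feed in the labels a model assigns to the Doppelgängers of some input \(x\) and measure how “torn” the model is.

```python
import math
from collections import Counter

# Hedged sketch of conceptual entropy: how mixed are the labels the model
# assigns to the Doppelgängers of x? High entropy = high confusion.
def conceptual_entropy(doppelganger_labels):
    counts = Counter(doppelganger_labels)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    return sum(-p * math.log2(p) for p in probs)

print(conceptual_entropy(["tabby"] * 8))                    # 0.0: perfectly consistent
print(conceptual_entropy(["tabby"] * 4 + ["persian"] * 4))  # 1.0: maximally torn between two labels
```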
Conclusion and Implications
The research presented in Doppelgängers and Adversarial Vulnerability challenges the machine learning community to rethink how we define “robustness.”
The key takeaways are:
- Topology Matters: We cannot understand adversarial attacks using only pixel-distance (\(L_p\) norms). We must model the “Perceptual Topology” of how humans see.
- The “Same” is Dangerous: The most insidious attacks aren’t the ones that look like noise; they are the ones that look identical to the source (Doppelgängers).
- No Trade-Off for Excellence: The idea that we must sacrifice accuracy for robustness is false for high-performance models. In the “hypersensitive” regime, better accuracy is better robustness.
- Ambiguity is Inevitable: Some classification problems are simply “not well defined” because the classes overlap in human perception. No classifier can be perfect on ambiguous data.
As we continue to integrate AI into critical systems—from medical diagnostics to autonomous driving—understanding these “invisible” errors becomes paramount. It is not enough for a machine to be right most of the time; it needs to see the world with the same consistency as we do, respecting the invisible boundaries of our own perception.