When Machines See What We Can’t: Understanding Adversarial Doppelgängers

Imagine you are looking at a picture of a cat. It’s a tabby cat. You are absolutely certain of it. Now, imagine a computer looks at the exact same picture and confidently tells you it’s a Persian cat. You squint. You zoom in. You check every pixel. To your human eyes, nothing has changed.

This isn’t the typical story of “adversarial examples” where someone adds static noise to a photo of a panda to make an AI think it’s a gibbon. This is something subtler and more profound. This is the phenomenon of Adversarial Doppelgängers.

In the research paper Doppelgängers and Adversarial Vulnerability, George Kamberov presents a fascinating mathematical and philosophical dive into why machine learning classifiers fail in ways that are “perceptually and cognitively disturbing” to humans. The paper argues that our current methods of measuring robustness—using standard distance metrics—are fundamentally flawed because they don’t map to human perception.

In this post, we will break down the complex topology of perception, explore why “high accuracy” might actually be the key to safety, and define a new class of inputs that look identical to us but completely different to a machine.

The Problem: Two Kinds of “Same”

To understand the core contribution of this paper, we first need to distinguish between two types of adversarial attacks.

The Classic Adversarial Example

The machine learning community has spent years studying adversarial examples. Usually, these are created by taking an image and adding a specifically calculated perturbation (noise) to it.

Figure 2. Applying a Fast Gradient Sign perturbation to image (a), which MobileNetV2 classifies as a Labrador, yields image (b), which MobileNetV2 classifies as a Weimaraner.

As shown in Figure 2 above, a classic attack changes the input in a way that might look like corruption or noise to a human. We can see that image (b) is pixelated and distorted compared to the clean Labrador in image (a). While we might still recognize the dog, we can clearly see that the two images are different.
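To make the contrast concrete, here is a minimal sketch of a Fast Gradient Sign perturbation applied to a toy linear softmax classifier. This is not the paper's code: the function name fgsm_perturb and the linear model (W, b) are illustrative stand-ins, and an attack like the one in Figure 2 would backpropagate through MobileNetV2 instead.

```python
import numpy as np

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

def fgsm_perturb(x, y, W, b, eps):
    """One Fast Gradient Sign step against a toy linear softmax classifier.

    x: (d,) input, y: true class index, W: (k, d) weights, b: (k,) biases,
    eps: L-infinity budget. Returns x + eps * sign(grad_x cross_entropy).
    """
    p = softmax(W @ x + b)   # predicted class probabilities
    p[y] -= 1.0              # gradient of cross-entropy w.r.t. the logits
    grad_x = W.T @ p         # chain rule back to the input "pixels"
    return x + eps * np.sign(grad_x)

# Toy usage: a 4-pixel "image", 3 classes, small eps.
rng = np.random.default_rng(0)
x, W, b = rng.normal(size=4), rng.normal(size=(3, 4)), np.zeros(3)
print(fgsm_perturb(x, y=0, W=W, b=b, eps=0.1))
```

Note how eps explicitly caps the size of the change in pixel space, which is exactly the kind of distance budget the paper goes on to question.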

The Adversarial Doppelgänger (AD)

Now, look at Figure 1 below.

Figure 1. Most people cannot discriminate image (a) from image (b). MobileNetV2 classifies the latter image as “Persian” and the former as “tabby.”

These two images are Doppelgängers. To a human observer, image (a) and image (b) are perceptually indiscriminable. You cannot tell them apart. Yet, the MobileNetV2 classifier sees them as two distinct classes: “tabby” and “Persian.”

This is the core problem the paper addresses. An Adversarial Doppelgänger (AD) is an input that is indistinguishable from a source input by a human (in a specific context) but is classified differently by the machine. If a machine makes a mistake on an image that looks visibly corrupted (like Figure 2), that’s an issue. But if a machine makes a mistake on an image that looks identical to the correct one (Figure 1), that reveals a fundamental mismatch between human topology (how we organize the world) and machine topology.

Background: The Limits of Geometry

Why do these errors happen? The paper argues that we have been using the wrong ruler to measure the world.

In most machine learning research, we assume the “space” of all possible images is a metric space (usually equipped with an \(L_p\) norm). We assume that if two images are “close” in terms of pixel values, they should have the same label. Likewise, when we simulate an attack, we constrain the perturbation to a small distance under that same norm.
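Written out, the textbook version of this assumption (my paraphrase, not notation from the paper) is that any perturbation \(\delta\) inside a small \(L_p\) ball is presumed not to change the label:

```latex
\|\delta\|_p \le \epsilon
\quad\Longrightarrow\quad
\mathrm{label}(x + \delta) = \mathrm{label}(x)
```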

However, human perception does not follow these rigid mathematical rules. Perception is context-relative. Two colors might look identical in dim light but distinct in bright light. Two sounds might be indistinguishable in a noisy room but distinct in a quiet studio.

The paper introduces the concept of Indiscriminability (\(\approx\)). Two inputs, \(x\) and \(y\), are indiscriminable if, at a specific time and context, the subject cannot activate the knowledge required to tell them apart.

This relationship creates a Perceptual Topology (\(\tau_{\delta}\)). Unlike a standard metric space where distance is absolute, a perceptual topology is built on “phenomenal neighborhoods”—bubbles of reality that look the same to us.

Core Method: Mapping the Perceptual Topology

This is the heart of the paper’s argument: we need to move away from measuring distance with rulers and start measuring it with “indiscriminability.”

1. Defining the Doppelgänger

The paper formally defines the phenomenal neighborhood, or the set of Doppelgängers for an input \(x\), denoted as \(\mathfrak{d}(x)\).

If we look at Weber’s Law (a psychological law stating that the “just noticeable difference” between two stimuli is proportional to the magnitude of the stimulus), we can mathematically describe these neighborhoods. For a value \(x\) in a range \([a, b]\), the set of values indistinguishable from \(x\) looks like this:

Equation defining the set of Doppelgängers based on Weber’s law ranges.

Here, \(w\) represents the Weber constant (related to sensitivity). This equation effectively creates “zones” of invisibility around a point \(x\). Anything falling inside this zone is a Doppelgänger of \(x\).
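As a quick illustration, here is a sketch of that “invisibility zone” for a one-dimensional stimulus, assuming the zone is simply the interval where the difference from \(x\) stays below \(w \cdot x\). This is a simplified reading of the definition, and the function name weber_doppelgangers is mine.

```python
import numpy as np

def weber_doppelgangers(x, w, a, b, n=1000):
    """Values in [a, b] whose difference from x is below the just-noticeable
    difference w * x predicted by Weber's law, i.e. x's Doppelgängers."""
    ys = np.linspace(a, b, n)
    return ys[np.abs(ys - x) < w * x]

# With a Weber constant of 0.02, anything within 2% of x = 100.0 looks the same.
zone = weber_doppelgangers(100.0, 0.02, 0.0, 200.0)
print(zone.min(), zone.max())  # roughly 98.1 ... 101.9
```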

2. The Perceptual Metric

Because standard Euclidean distance doesn’t capture this phenomenon, the paper proposes a perceptual distance. This isn’t based on pixels; it’s based on the “discrimination graph”—essentially counting how many “hops” of indiscriminability you need to get from one object to another.

The metric \(d_w\) is defined as:

Equation for perceptual distance d_w(x,y).

If two items are Doppelgängers (part of the same phenomenal neighborhood), the distance is effectively zero in a perceptual sense, even if their pixel values differ. This metric highlights that ADs are qualitatively different from standard adversarial examples.
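A minimal sketch of that hop-counting idea, assuming the discrimination graph is given as a neighbor map. The data structure and function name are mine, and the real \(d_w\) may be normalized differently.

```python
from collections import deque

def perceptual_hops(x, y, neighbors):
    """Length of the shortest chain of indiscriminable pairs linking x to y.

    neighbors[u] is the set of inputs a human cannot tell apart from u.
    Doppelgängers get distance 0; items connected only through k
    intermediate look-alikes get distance k; unreachable items get infinity.
    """
    if x == y or y in neighbors.get(x, set()):
        return 0
    seen, frontier = {x}, deque([(x, 0)])
    while frontier:
        u, k = frontier.popleft()
        for v in neighbors.get(u, set()):
            if v == y or y in neighbors.get(v, set()):
                return k + 1
            if v not in seen:
                seen.add(v)
                frontier.append((v, k + 1))
    return float("inf")
```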

3. Features and Indiscriminability

How does this relate to the features a neural network learns? The paper draws a distinction between indiscriminability (we can’t tell them apart) and indiscernibility (they have identical properties).

We can link features to perception using discriminative feature representations. If two inputs \(x\) and \(y\) share a set of features \(\Phi\), they are indiscriminable if their feature sets overlap:

Equation relating feature intersection to indiscriminability.

Furthermore, we can define a semantic cluster. This is the set of all inputs in the world that share a specific feature \(\xi\):

Equation defining the semantic cluster cl(xi).

This helps us understand why ADs exist: if the machine relies on features that are not part of the human “discriminative representation,” it will see differences where we see none, or similarities where we see differences.
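In code, with Python sets standing in for the feature representation \(\Phi\) (a toy reading of the definitions, using hypothetical feature names):

```python
def indiscriminable(phi_x, phi_y):
    """Two inputs count as indiscriminable when their discriminative
    feature sets overlap."""
    return bool(phi_x & phi_y)

def semantic_cluster(xi, feature_map):
    """The semantic cluster of a feature xi: every input that carries it."""
    return {x for x, phi in feature_map.items() if xi in phi}

# Hypothetical feature sets for three images.
feature_map = {
    "img1": {"striped-coat", "short-hair"},
    "img2": {"striped-coat", "long-hair"},
    "img3": {"flat-face", "long-hair"},
}
print(indiscriminable(feature_map["img1"], feature_map["img2"]))  # True
print(semantic_cluster("long-hair", feature_map))  # {'img2', 'img3'} (order may vary)
```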

4. The “Regular” Classifier

The ultimate goal is to build a Regular Classifier. A classifier \(R\) is “perceptually regular” if it respects the boundaries of human perception. In simple terms: it should never assign different labels to two Doppelgängers.

Mathematically, this means that for any class \(R_i\), it must be formed by the union of equivalence classes (groups of Doppelgängers):

Equation showing that a regular classifier class R_i is a union of equivalence classes.

If a classifier is NOT regular, it means there is at least one input \(x\) where the machine says “Class A” but a visually identical input \(y\) is “Class B”. This is the definition of vulnerability to Adversarial Doppelgängers.
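A compact way to test this condition on a finite sample, assuming we already know which inputs are mutually indiscriminable (both data structures below are illustrative, not from the paper):

```python
def is_regular(labels, doppelganger_classes):
    """True if every equivalence class of Doppelgängers receives a single
    label, i.e. each machine class is a union of whole equivalence classes."""
    return all(len({labels[x] for x in cls}) == 1 for cls in doppelganger_classes)

labels = {"a": "tabby", "b": "persian", "c": "tabby"}
classes = [{"a", "b"}, {"c"}]       # a and b look identical to a human
print(is_regular(labels, classes))  # False: the a/b pair is an Adversarial Doppelgänger
```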

Experiments & Analysis: The Accuracy Paradox

One of the most provocative sections of the paper challenges the common wisdom in machine learning that there is a trade-off between Accuracy and Robustness. The prevailing view is that to make a model more robust, you must sacrifice some accuracy.

The paper proves that for Adversarial Doppelgängers, this is incorrect. In fact, for very high-performing models, increasing accuracy is the only way to achieve robustness.

Defining Accuracy and Recall

First, let’s establish the metrics. We assume there is a “ground truth” or ideal world model \(\Omega\). The accuracy of a classifier \(R\) is the measure of the overlap between the classifier’s predictions and the ground truth:

Equation defining accuracy of classifier R relative to Omega.

We can break this down into recall rates (\(\rho_i\)) for each class:

Equation defining recall rates rho_i.
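For orientation, here is the textbook form of these two quantities for a probability measure \(\mu\) over inputs (my notation; the paper’s exact normalization may differ):

```latex
\mathrm{acc}(R) \;=\; \sum_{i} \mu\!\left(R_i \cap \Omega_i\right),
\qquad
\rho_i \;=\; \frac{\mu\!\left(R_i \cap \Omega_i\right)}{\mu\!\left(\Omega_i\right)}
```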

The “Danger Zone” of Low Accuracy

The paper introduces a bound \(\bar{k}(\Omega)\) which relates the size of the true class to the size of the Doppelgänger sets.

Equation defining the bound k_bar(Omega).

If a classifier has recall rates that are too low (meaning it’s not very accurate), the paper proves that every correctly classified input has adversarial Doppelgängers.

Equation showing the inequality where low recall leads to vulnerability.

Essentially, if your model isn’t accurate enough, there is always a version of an input that looks identical to a human but will fool the model.

Hypersensitivity: The High-Accuracy Regime

Here is the twist. What happens if accuracy gets very, very high?

The paper identifies a condition where the recall rate \(\rho\) satisfies:

Equation for the high accuracy condition.

If a classifier satisfies this condition, it exhibits Hypersensitive Behavior.

Definition: A classifier is hypersensitive if every misclassified input is an Adversarial Doppelgänger.

Think about what this means. If a model is hypersensitive, it never makes “stupid” mistakes (like calling a clear picture of a truck a bird). Its only mistakes occur on inputs that are perceptually indistinguishable from the correct class.

Therefore, for hypersensitive classifiers, improving robustness is equivalent to improving accuracy. There is no trade-off. To fix the vulnerability, you just need to make the model more accurate until it aligns perfectly with the “Regular” classifier.

Life Without Borders: Prototypes and Fringes

The paper delves into the structure of categories. In classical set theory, sets have hard boundaries. In human perception, categories are fuzzy. A “chair” is a chair, but a beanbag is sort of a chair.

The paper uses the concept of Prototypes to explain how regular classifiers should be structured. We measure the “affinity” of an input \(x\) to a class \(D\) using a similarity scale \(s\):

Equation for affinity P(x, D).

An input is a Prototype if it maximizes this affinity—it is the most “representative” member of the class:

Equation defining a prototype as the supremum of affinity.

Conversely, a Fringe element is one that barely belongs to the class:

Equation defining a fringe element as the infimum of affinity.

The paper provides a complex but insightful derivation for calculating affinity based on the prominence of features and the probability of encountering the input:

Equation 19 expanding the prototype definition with probability and feature prominence.

This equation (19) tells us that a true prototype isn’t just the most “average” looking item. It is an item that balances prominence (it has strong features of the class) with frequency (it is a version of the object we actually encounter).
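Setting the probability and prominence terms aside for a moment, a stripped-down sketch of affinity, prototype, and fringe looks like this. The similarity function s and the simplification of affinity to an average similarity are my assumptions, not the paper’s Equation 19.

```python
def affinity(x, members, s):
    """Simplified affinity of x to a class: its average similarity s(x, y)
    to the other members. The paper's P(x, D) additionally weighs feature
    prominence and how likely each member is to be encountered."""
    others = [y for y in members if y != x]
    return sum(s(x, y) for y in others) / len(others)

def prototype(members, s):
    """The member with the highest affinity to the rest of the class."""
    return max(members, key=lambda x: affinity(x, members, s))

def fringe(members, s):
    """The member with the lowest affinity: it barely belongs to the class."""
    return min(members, key=lambda x: affinity(x, members, s))

# Toy usage on points of a 1-D "class", with similarity = negative distance.
points = [0.0, 1.0, 2.0, 3.0, 10.0]
s = lambda a, b: -abs(a - b)
print(prototype(points, s), fringe(points, s))  # 2.0 (central) and 10.0 (outlier)
```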

Quantifying Vulnerability

Finally, how do we know if a specific model is vulnerable to these Doppelgänger attacks? The paper proposes measuring the Region of Conceptual Ambiguity (\(A(R)\)). This is the set of all inputs \(x\) where a Doppelgänger exists that flips the label.

Equation defining the Region of Conceptual Ambiguity A(R).

To quantify the chaos within this region, we can look at the probability distribution of labels assigned to the Doppelgängers of \(x\):

Equation defining the probability distribution of labels p_j(x).

And from this probability, we can calculate the Conceptual Entropy \(H_R(x)\). High entropy means the model is very confused about \(x\) and its look-alikes.

Equation for Conceptual Entropy H_R(x).

The Fooling Rate \(F_R(\hat{a})\) (how often an attack succeeds) is bounded by the size (measure) of this ambiguous region:

Equation bounding the Fooling Rate by the measure of A(R).

This gives researchers a theoretical upper limit on how bad an attack can be. If the region of ambiguity is small, the fooling rate must be low.
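On a finite sample, all three quantities are straightforward to estimate. The sketch below assumes hypothetical helpers doppelgangers(x) (yielding x’s look-alikes) and classify(x) (the model under test); it is an illustration of the definitions, not code from the paper.

```python
import math
from collections import Counter

def in_ambiguous_region(x, doppelgangers, classify):
    """True if x lies in A(R): some Doppelgänger of x gets a different label."""
    return any(classify(y) != classify(x) for y in doppelgangers(x))

def conceptual_entropy(x, doppelgangers, classify):
    """Entropy of the label distribution over x's Doppelgängers; high values
    mean the model scatters x's look-alikes across many classes."""
    counts = Counter(classify(y) for y in doppelgangers(x))
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def ambiguous_fraction(sample, doppelgangers, classify):
    """Empirical size of A(R) on a sample: an estimate of the upper bound
    on the fooling rate discussed above."""
    hits = sum(in_ambiguous_region(x, doppelgangers, classify) for x in sample)
    return hits / len(sample)
```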

Conclusion and Implications

The research presented in Doppelgängers and Adversarial Vulnerability challenges the machine learning community to rethink how we define “robustness.”

The key takeaways are:

  1. Topology Matters: We cannot understand adversarial attacks using only pixel-distance (\(L_p\) norms). We must model the “Perceptual Topology” of how humans see.
  2. The “Same” is Dangerous: The most insidious attacks aren’t the ones that look like noise; they are the ones that look identical to the source (Doppelgängers).
  3. No Trade-Off for Excellence: The idea that we must sacrifice accuracy for robustness is false for high-performance models. In the “hypersensitive” regime, better accuracy is better robustness.
  4. Ambiguity is Inevitable: Some classification problems are simply “not well defined” because the classes overlap in human perception. No classifier can be perfect on ambiguous data.

As we continue to integrate AI into critical systems—from medical diagnostics to autonomous driving—understanding these “invisible” errors becomes paramount. It is not enough for a machine to be right most of the time; it needs to see the world with the same consistency as we do, respecting the invisible boundaries of our own perception.