In the era of large-scale artificial intelligence, models are voracious learners. They ingest massive datasets, training on everything from web-crawled images to sensitive facial data. But what happens when a model knows too much?
Imagine a scenario where a user exercises their “right to be forgotten,” demanding their photos be removed from a facial recognition system. Or consider a model inadvertently trained on copyrighted material or poisoned data that creates a security “backdoor.” In these cases, we face the challenge of Machine Unlearning.
Retraining a massive Deep Neural Network (DNN) from scratch every time a data point needs removal is computationally prohibitive. We need a way to surgically erase specific concepts or classes without damaging the rest of the model’s knowledge. This is easier said than done. Most current methods either fail to fully erase the target or, in the process of scrubbing, cause “catastrophic forgetting” of unrelated information.
In this post, we will explore a groundbreaking paper titled “Decoupled Distillation to Erase: A General Unlearning Method for Any Class-centric Tasks.” We will break down their method, known as DELETE, which uses a clever mathematical decomposition of loss functions to achieve state-of-the-art unlearning performance without needing access to the original training data.
The Problem: Class-Centric Unlearning
The specific problem we are tackling is Class-Centric Unlearning. Given a pre-trained model \(f_{\theta_o}\) and a dataset \(\mathcal{D}\), we divide the data into two groups:
- Forget Set (\(\mathcal{D}_f\)): The specific class(es) we want to remove (e.g., “delete all knowledge of ‘airplanes’”).
- Remain Set (\(\mathcal{D}_r\)): Everything else we want to keep (e.g., “keep knowledge of ‘cars’, ‘birds’, ‘ships’”).
The Real-World Constraint
Here is the catch that makes this research paper special: In many real-world “Machine Learning as a Service” (MLaaS) scenarios, you do not have access to the Remain Set (\(\mathcal{D}_r\)) during the unlearning process.
Why?
- Privacy: The data might belong to different users and cannot be aggregated.
- Storage: The original training set might be petabytes in size and deleted after training to save costs.
- Copyright/Expiration: Data retention policies might force the deletion of the original data.
Most existing unlearning methods rely on seeing \(\mathcal{D}_r\) to ensure the model remembers it. If you run a standard unlearning algorithm using only the data you want to forget (by trying to maximize loss on it), the model usually panics and destroys its feature extractors, ruining accuracy on everything else.
The goal of DELETE is to remove the target class using only the Forget Set, while mathematically guaranteeing the preservation of the remaining classes.
Theoretical Framework: Decomposing the Loss
To solve this, the authors first stepped back to analyze the fundamental mathematics of classification loss. They proposed a framework to decouple (separate) the unlearning objective into two distinct parts: Forgetting and Retention.
The Standard KL Divergence
In knowledge distillation or classification, we often minimize the Kullback-Leibler (KL) Divergence between a target distribution \(\mathbf{p}\) and the model’s output \(\mathbf{q}\).
Let’s look at the general form of the loss function:

\[
\mathrm{KL}(\mathbf{p} \,\|\, \mathbf{q}) = p_u \log \frac{p_u}{q_u} + \sum_{i \neq u} p_i \log \frac{p_i}{q_i}
\]
Here:
- \(u\) is the index of the class we want to forget.
- \(p_u\) and \(q_u\) are the target and predicted probabilities for the forget class.
- The summation handles all other classes (\(i \neq u\)).
The Decomposition
The researchers performed a clever algebraic manipulation. They split the probability vectors into two components: the binary probability of “is it the target class or not?” and the relative probabilities of “if it’s not the target class, what is it?”
By defining \(p_{\setminus u}\) as the sum of the probabilities of all non-target classes, and using renormalized probabilities \(\hat{p}_i = p_i / p_{\setminus u}\) (and analogously \(\hat{q}_i = q_i / q_{\setminus u}\)), they rewrote the summation term:

\[
\sum_{i \neq u} p_i \log \frac{p_i}{q_i} = p_{\setminus u} \log \frac{p_{\setminus u}}{q_{\setminus u}} + p_{\setminus u} \sum_{i \neq u} \hat{p}_i \log \frac{\hat{p}_i}{\hat{q}_i}
\]
Substituting this back into the original equation completely decouples the loss into two interpretable terms:

\[
\mathrm{KL}(\mathbf{p} \,\|\, \mathbf{q}) = \underbrace{\mathrm{KL}(\mathbf{p}^{(b)} \,\|\, \mathbf{q}^{(b)})}_{\text{Forgetting}} \;+\; \underbrace{p_{\setminus u}\, \mathrm{KL}(\hat{\mathbf{p}} \,\|\, \hat{\mathbf{q}})}_{\text{Retention}}
\]

where \(\mathbf{p}^{(b)} = [p_u, p_{\setminus u}]\) and \(\mathbf{q}^{(b)} = [q_u, q_{\setminus u}]\) are the binary “target class vs. rest” distributions.
Let’s explain this equation in plain English, as it is the foundation of the entire paper:
- Term 1 (Forgetting Loss): \(\mathrm{KL}(\mathbf{p}^{(b)} \| \mathbf{q}^{(b)})\). This compares the binary distribution. It asks: How much probability mass is assigned to the target class versus the sum of all other classes? Optimizing this ensures the model stops predicting the target class.
- Term 2 (Retention Loss): \(p_{\setminus u} \mathrm{KL}(\hat{\mathbf{p}} \| \hat{\mathbf{q}})\). This compares the shape of the distribution for the remaining classes. It asks: Regardless of the target class score, does the model still know that a ‘cat’ looks more like a ‘dog’ than a ’truck’? Optimizing this ensures the model preserves its knowledge structure.
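To make the decomposition concrete, here is a minimal numerical check (not from the paper’s code) that the identity holds for any pair of distributions; the 5-class values below are purely illustrative.

```python
# Minimal sketch: numerically verify
#   KL(p || q) = KL(p_b || q_b) + p_{\u} * KL(p_hat || q_hat)
# for illustrative 5-class distributions (values are made up).
import numpy as np

def kl(p, q):
    """KL divergence between two discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

u = 0  # index of the class to forget

p = np.array([0.60, 0.20, 0.10, 0.06, 0.04])  # target distribution
q = np.array([0.50, 0.25, 0.12, 0.08, 0.05])  # model prediction

# Term 1 inputs: binary "target class vs. everything else" distributions.
p_b = np.array([p[u], 1.0 - p[u]])
q_b = np.array([q[u], 1.0 - q[u]])

# Term 2 inputs: renormalized distributions over the non-target classes.
p_hat = np.delete(p, u) / (1.0 - p[u])
q_hat = np.delete(q, u) / (1.0 - q[u])

lhs = kl(p, q)
rhs = kl(p_b, q_b) + (1.0 - p[u]) * kl(p_hat, q_hat)
print(lhs, rhs)  # the two sides agree up to floating-point error
```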
Why Previous Methods Failed
Many existing methods use Re-labeling. They take images of the target class (e.g., “airplane”) and retrain the model telling it that these images are “random noise” or a specific wrong label.
The authors show that Re-labeling is actually a special case of their framework, but a flawed one. Re-labeling simply minimizes the cross-entropy of a substituted “wrong” label:

\[
\mathcal{L}_{\text{re-label}} = -\log q_r
\]
When you blindly minimize \(-\log(q_r)\) (forcing the model to predict a random label \(r\)), you are effectively using a “one-hot” target vector where \(p_r = 1\) and everything else is 0.
If we plug this “one-hot” target into the decoupled framework, the equation collapses to:

\[
\mathcal{L}_{\text{re-label}} = \underbrace{-\log q_{\setminus u}}_{\text{Forgetting}} \; \underbrace{-\; \log \hat{q}_r}_{\text{“Retention”}}
\]
The Critical Flaw: Notice that the summation over all other classes (\(i \neq u, r\)) has disappeared. Re-labeling implicitly optimizes the Forgetting term (Term 1), but it provides zero supervision for the Retention term (Term 2) for any class other than the random label \(r\).
This explains why re-labeling often damages the model’s accuracy on unrelated classes—it completely ignores the relationships between them during the update.
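To see the missing supervision concretely, consider a small hypothetical example: two student outputs assign the same probability to the random label \(r\) but distribute the remaining mass very differently. The re-labeling loss cannot tell them apart, while the retention term of the decoupled view (computed here against an illustrative soft teacher distribution) can. All numbers below are invented for illustration.

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

u, r = 0, 1  # forget class u, random re-label r (illustrative indices)

# Two hypothetical student outputs over 5 classes. Both give class r the
# same probability, but q_bad scrambles the structure of the other classes.
q_good = np.array([0.05, 0.40, 0.30, 0.15, 0.10])  # sensible class relationships
q_bad  = np.array([0.05, 0.40, 0.01, 0.02, 0.52])  # distorted relationships

# Re-labeling loss: cross-entropy with the one-hot label r.
print(-np.log(q_good[r]), -np.log(q_bad[r]))  # identical -> no retention signal

# Retention term from the decoupled view, p_{\u} * KL(p_hat || q_hat),
# using an illustrative soft teacher output as the target p.
p = np.array([0.70, 0.10, 0.10, 0.06, 0.04])
p_hat = np.delete(p, u) / (1.0 - p[u])

for q in (q_good, q_bad):
    q_hat = np.delete(q, u) / (1.0 - q[u])
    print((1.0 - p[u]) * kl(p_hat, q_hat))  # clearly different values
```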
The Solution: Decoupled Distillation (DELETE)
To fix the flaw in re-labeling, we need a way to supervise the Retention Loss without access to the remaining data.
The solution? Use the model itself.
Before we start unlearning, the pre-trained model (let’s call it the “Teacher”) already knows the relationships between classes. Even if we feed it an image of the class we want to forget, the “Dark Knowledge”—the non-zero probabilities assigned to other classes—contains the structural information we want to preserve.
The Strategy
The authors propose DELETE (Decoupled Distillation to Erase). The method uses the frozen original model to generate soft targets for the unlearning process.
To achieve unlearning, we need to construct a target distribution \(\mathbf{p}\) that satisfies three conditions:
- Forgetting Condition: The probability of the target class \(p_u\) must go to 0.
- Retaining Condition: The relative probabilities of the other classes must match the original model.
- Probability Property: The sum of all probabilities must equal 1.
The Masking Mechanism
To create this target distribution dynamically, the authors introduce a Masking Function. They take the output of the frozen teacher model and “mask out” the entry at the target class index:

\[
\mathrm{Mask}(\mathbf{p}^{T})_i = \begin{cases} 0, & i = u \\ p^{T}_i, & i \neq u \end{cases}
\]

where \(\mathbf{p}^{T}\) is the teacher’s predicted probability distribution.
This sets the probability of the target class to zero. However, we then need to re-normalize the masked vector so it sums to 1. The loss function becomes:

\[
\mathcal{L} = \mathrm{KL}\!\left( \frac{\mathrm{Mask}(\mathbf{p}^{T})}{\sum_{i \neq u} p^{T}_i} \;\Big\|\; \mathbf{q} \right)
\]

where \(\mathbf{q}\) is the student’s output distribution.
While mathematically sound, normalizing manually can be unstable. The authors provide a reformulation (detailed in the Appendix) showing that you can achieve the exact same result by manipulating the logits directly before the Softmax.
They define a modified mask \(\mathrm{Mask}'\) that sets the logit of the target class to negative infinity (\(-\infty\)):

\[
\mathrm{Mask}'(\mathbf{z}^{T})_i = \begin{cases} -\infty, & i = u \\ z^{T}_i, & i \neq u \end{cases}
\]
By setting the target logit to \(-\infty\), the Softmax function naturally drives that probability to zero while re-weighting the other classes proportionally. This simplifies the implementation significantly:

\[
\mathcal{L}_{\text{DELETE}} = \mathrm{KL}\big( \mathrm{Softmax}(\mathrm{Mask}'(\mathbf{z}^{T})) \,\big\|\, \mathrm{Softmax}(\mathbf{z}^{S}) \big)
\]

where \(\mathbf{z}^{T}\) are the frozen teacher’s logits and \(\mathbf{z}^{S}\) are the student’s logits.
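Here is a minimal PyTorch sketch of the masking step (an illustration, not the authors’ code), assuming teacher logits of shape `(batch, num_classes)` and a single forget-class index `u`. It checks that the \(-\infty\) logit trick and the explicit “zero-then-renormalize” route produce the same target distribution.

```python
import torch
import torch.nn.functional as F

def masked_targets_logit_trick(teacher_logits: torch.Tensor, u: int) -> torch.Tensor:
    """Mask': set the forget-class logit to -inf, then Softmax."""
    masked = teacher_logits.clone()
    masked[:, u] = float("-inf")
    return F.softmax(masked, dim=1)  # probability of class u is exactly 0

def masked_targets_manual(teacher_logits: torch.Tensor, u: int) -> torch.Tensor:
    """Mask: zero the forget-class probability, then renormalize."""
    probs = F.softmax(teacher_logits, dim=1)
    probs[:, u] = 0.0
    return probs / probs.sum(dim=1, keepdim=True)

logits = torch.randn(4, 10)  # e.g., a CIFAR-10-style classification head
t1 = masked_targets_logit_trick(logits, u=3)
t2 = masked_targets_manual(logits, u=3)
print(torch.allclose(t1, t2, atol=1e-6))  # True: the two formulations agree
```

For multi-class forgetting, the same trick extends naturally: set the logits of all classes to be forgotten to \(-\infty\) at once.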
The Algorithm
The process is elegant in its simplicity. We iterate through the Forget Set only. For every image:
- Pass it through the Frozen Teacher (original model).
- Apply the Mask (\(-\infty\) to the target class logit).
- Apply Softmax to get the target distribution. This target now says: “This image has 0% chance of being class \(u\), but maintains the exact relative likelihoods of being anything else.”
- Pass the image through the Student (unlearning model).
- Calculate KL Divergence between the Student’s output and the Teacher’s masked output.
- Backpropagate to update the Student.
This simultaneously drives the target probability to zero (Forgetting) and locks in the relationships of all other classes (Retention).
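A hedged sketch of what this loop might look like in PyTorch is shown below. It assumes the original model is copied into a frozen teacher, the trainable copy acts as the student, and `forget_loader` yields only forget-class images; all names are placeholders rather than the paper’s released code.

```python
import copy
import torch
import torch.nn.functional as F

def unlearn_class(model, forget_loader, u, epochs=5, lr=1e-4, device="cuda"):
    """Sketch of DELETE-style unlearning of a single class u."""
    teacher = copy.deepcopy(model).to(device).eval()   # frozen original model
    student = model.to(device).train()                 # model being unlearned
    optimizer = torch.optim.Adam(student.parameters(), lr=lr)

    for _ in range(epochs):
        for images, _ in forget_loader:                # forget-set images only
            images = images.to(device)

            with torch.no_grad():
                t_logits = teacher(images)
                t_logits[:, u] = float("-inf")         # mask the forget class
                targets = F.softmax(t_logits, dim=1)   # masked, renormalized teacher

            s_log_probs = F.log_softmax(student(images), dim=1)
            loss = F.kl_div(s_log_probs, targets, reduction="batchmean")

            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
    return student
```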
Experiments and Results
The authors evaluated DELETE across multiple datasets (CIFAR-10, CIFAR-100, Tiny ImageNet) and architectures (ResNet, VGG, Swin Transformer, ViT).
The key metrics are:
- \(Acc_f\) (Accuracy on Forget Set): Should be 0% (or close to random).
- \(Acc_r\) (Accuracy on Remain Set): Should be close to the Original or Retrained model.
- MIA (Membership Inference Attack): A lower score means the model doesn’t “leak” that it was ever trained on the forget data.
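For reference, here is a small evaluation helper (assumed, not taken from the paper) for computing \(Acc_f\) and \(Acc_r\) from separate forget/remain test DataLoaders; the MIA score is omitted because it requires training a separate attack model.

```python
import torch

@torch.no_grad()
def accuracy(model, loader, device="cuda"):
    """Top-1 accuracy over a DataLoader, in percent."""
    model.eval()
    correct, total = 0, 0
    for images, labels in loader:
        images, labels = images.to(device), labels.to(device)
        preds = model(images).argmax(dim=1)
        correct += (preds == labels).sum().item()
        total += labels.numel()
    return 100.0 * correct / total

# acc_f = accuracy(student, forget_test_loader)  # should collapse toward 0%
# acc_r = accuracy(student, remain_test_loader)  # should stay near the original model
```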
Performance vs. Baselines
The results are striking. Let’s look at the broad comparison across tasks:

In Figure 1 above, notice how the Ours (DELETE) line (in red/orange) almost perfectly overlaps with the Retrain (Upper Bound) line. Other methods like Random Label or Negative Gradient struggle, particularly on the “Forget” axis (they don’t forget enough) or the “Remain” axis (they forget too much).
Detailed Table Results
Table 1 below shows the granular data. The gray rows represent methods that cheat by using the Remain Data (\(\mathcal{D}_r\)). The white rows use only the Forget Data (\(\mathcal{D}_f\)), which is the strict setting DELETE targets.

Key Takeaways from Table 1:
- DELETE (Ours) achieves 0% accuracy on the Forget Set (\(Acc_{ft}\)) for CIFAR-10 and CIFAR-100.
- Critically, it maintains 95.03% accuracy on the Remain Set (\(Acc_{rt}\)) for CIFAR-10. This is higher than every other method and extremely close to the Retrain Model (95.20%).
- Methods like “Negative Gradient” and “Boundary Shrink” drop significantly in remaining accuracy or fail to fully erase the target class.
Stability and Multi-Class Forgetting
One of the major issues with optimization-based unlearning is instability—results can vary wildly between runs.

Figure 2 shows that DELETE (red box) has extremely low variance. It works consistently every time.
Furthermore, unlearning isn’t always about just one class. Sometimes you need to remove 5, 10, or 20 classes at once.

Table 4 demonstrates that as the number of classes to forget increases, other methods crumble. “Boundary Expand” and “Influence Unlearn” see their retention accuracy plummet to 40-50%. DELETE maintains high retention (~77%) even when removing 20 classes simultaneously.
Feature Space Visualization
What does unlearning look like inside the neural network? We can use t-SNE to visualize the feature embeddings.

In Figure 4:
- Original: Distinct clusters for all classes.
- Retrain: The target class cluster (triangles) disappears/scatters.
- Boundary Shrink: The clusters become messy; the boundaries between remaining classes are damaged.
- Ours: The target class is successfully scattered (erased), but the clusters for the remaining classes remain tight and well-separated. This visualizes why DELETE retains such high accuracy.
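If you want to produce this kind of plot yourself, a common recipe is to hook the penultimate layer, collect embeddings, and project them with scikit-learn’s t-SNE. The sketch below assumes a torchvision-style ResNet called `student` and a `test_loader`; neither name comes from the paper.

```python
import torch
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

@torch.no_grad()
def collect_features(model, loader, layer, device="cuda"):
    """Collect penultimate-layer embeddings via a forward hook."""
    feats, labels = [], []
    handle = layer.register_forward_hook(
        lambda module, inputs, output: feats.append(output.flatten(1).cpu()))
    model.eval()
    for images, y in loader:
        model(images.to(device))
        labels.append(y)
    handle.remove()
    return torch.cat(feats).numpy(), torch.cat(labels).numpy()

# For a torchvision ResNet, the global average-pooling layer is a natural hook point.
X, y = collect_features(student, test_loader, layer=student.avgpool)
emb = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(X)
plt.scatter(emb[:, 0], emb[:, 1], c=y, s=4, cmap="tab10")
plt.savefig("tsne_after_unlearning.png")
```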
Beyond Classification: Downstream Applications
The paper asserts that DELETE is a “General” method. To prove this, they applied it to three distinct downstream tasks.
1. Privacy Protection in Face Recognition
In facial recognition, unlearning is a privacy requirement. The model must forget a specific person’s identity.

Using Grad-CAM heatmaps (Figure 5), we can see where the model looks.
- Row (a) Forgetting Individuals: The Original model focuses on the face. The Retrain and DELETE (Ours) models look away or lose focus, indicating the identity is forgotten.
- Row (b) Remaining Individuals: Crucially, DELETE still focuses sharply on the faces of people it should remember, unlike other methods that might degrade general face detection capabilities.
2. Backdoor Defense
A “backdoor” attack involves poisoning training data with a trigger (e.g., a tiny square in the corner) so the model misclassifies images whenever the trigger is present. DELETE can be used to “unlearn” the poisoned samples.

In Figure S7, the “Original” column shows the model obsessing over the tiny trigger (red boxes). After applying DELETE to the poisoned data, the “Ours” column shows the model ignoring the trigger and looking at the actual object (the bird or car).
3. Semantic Segmentation
This is perhaps the most visually impressive result. Semantic segmentation involves classifying every pixel in an image.

In Figure S8, the model is asked to forget the class “Car”.
- Original: Clearly segments the cars.
- Ours: The cars effectively vanish from the segmentation mask. The model treats them as background, yet perfectly preserves the segmentation of pedestrians (bottom right). This proves the method works for complex, dense prediction tasks, not just simple classification.
Conclusion
The “DELETE” paper provides a significant step forward for Machine Unlearning. By mathematically decomposing the unlearning loss, the authors identified exactly why previous methods failed: they focused so much on forgetting that they neglected the structure of the remaining knowledge.
The Decoupled Distillation approach offers a robust solution:
- Strict Privacy: It requires no access to the original “Remaining” data.
- High Performance: It matches the “Gold Standard” of retraining from scratch.
- Versatility: It works across architectures and tasks.
As privacy regulations tighten and the need for safe, updateable AI grows, methods like DELETE will become essential tools in the machine learning engineer’s toolkit. Instead of burning down the library to remove one book, we now have the precision tools to simply take it off the shelf.