Introduction: The Art of Editing AI Models
Massive pre-trained models like CLIP, GPT, and T5 have become the backbone of modern AI. They possess an incredible wealth of general knowledge, but to be truly useful, they often need a bit of targeted editing. We might want to teach them a new skill, align them with human values, or make them forget undesirable behaviors.
The standard approach is fine-tuning, which involves further training on a specialized dataset. However, fine-tuning can be computationally expensive, and it often comes with an unwelcome trade-off: a model fine-tuned for one task may lose some of its original zero-shot capabilities on others.
Enter task arithmetic — a surprisingly simple and cost-effective alternative. Rather than retraining from scratch, you manipulate a model’s skills by performing arithmetic directly on its weights. For example:
- Fine-tune the model separately on two tasks.
- Compute the change in weights from the original pre-trained model for each task (known as its task vector).
- Add these task vectors to the original weights to produce a model that performs both tasks well.
- Subtract a task vector to make the model “forget” a skill.
Task arithmetic is promising, but until recently one critical question was unanswered: why does it work? Without a solid theoretical foundation, it can feel like guesswork — hard to trust and even harder to improve.
A recent paper, Task Arithmetic in the Tangent Space, dives deep into this question. The authors challenge prevailing assumptions, introduce the concept of weight disentanglement, and propose an improved method — linearized fine-tuning — that makes editing pre-trained models both more effective and reliable.
In this article, we’ll unpack the work’s key findings: why the old theory falls short, what weight disentanglement really means, and how fine-tuning in the model’s tangent space sets a new state of the art in task arithmetic.
Background: Task Vectors and Tangent Spaces
What is a Task Vector?
Let’s start simply. You have a pre-trained model with initial weights \(\theta_0\). You fine-tune it on a specific task (say, classifying cars) and get new weights \(\theta_{\text{cars}}^*\). The task vector is:
\[ \tau_{\text{cars}} = \theta_{\text{cars}}^* - \theta_0 \]

This vector encodes the knowledge gained for that task. If you have another task vector, \(\tau_{\text{flowers}}\) (for classifying flowers), you can combine them:
\[ \theta_{\text{multi-task}} = \theta_0 + \alpha_1 \tau_{\text{cars}} + \alpha_2 \tau_{\text{flowers}} \]

The coefficients \(\alpha_1, \alpha_2\) let you control how much of each skill to add.
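In code, task arithmetic reduces to elementwise arithmetic on state dicts. Here is a minimal PyTorch sketch; the checkpoint filenames and mixing coefficients are hypothetical placeholders:

```python
import torch

def task_vector(pretrained: dict, finetuned: dict) -> dict:
    """tau = theta* - theta_0, computed per parameter tensor."""
    return {k: finetuned[k] - pretrained[k] for k in pretrained}

def apply_task_vectors(pretrained: dict, vectors: list, alphas: list) -> dict:
    """theta_0 + sum_t alpha_t * tau_t."""
    edited = {k: v.clone() for k, v in pretrained.items()}
    for tau, alpha in zip(vectors, alphas):
        for k in edited:
            edited[k] += alpha * tau[k]
    return edited

# Hypothetical checkpoints: theta_0 and two single-task fine-tunes.
theta_0 = torch.load("clip_pretrained.pt")
tau_cars = task_vector(theta_0, torch.load("clip_cars_finetuned.pt"))
tau_flowers = task_vector(theta_0, torch.load("clip_flowers_finetuned.pt"))

# Add both skills (positive alphas) ...
multi_task = apply_task_vectors(theta_0, [tau_cars, tau_flowers], [0.5, 0.5])
# ... or subtract one to make the model forget it (negative alpha).
forget_cars = apply_task_vectors(theta_0, [tau_cars], [-1.0])
```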
The Linear Hypothesis and the Neural Tangent Kernel (NTK)
One early theory held that task arithmetic works because fine-tuning often happens in a linear regime. Around the starting weights \(\theta_0\), you can approximate the model with a first-order Taylor expansion:
\[ f(\boldsymbol{x};\boldsymbol{\theta}) \approx f(\boldsymbol{x};\boldsymbol{\theta}_0) + (\boldsymbol{\theta} - \boldsymbol{\theta}_0)^{\top} \nabla_{\boldsymbol{\theta}} f(\boldsymbol{x};\boldsymbol{\theta}_0) \]

This defines the model’s tangent space, whose behavior is governed by the Neural Tangent Kernel (NTK). The idea was: if fine-tuning stays within this linear neighborhood, adding weight vectors should correspond to adding functions, enabling simple task arithmetic.
But does fine-tuning really stay in this linear regime? The paper puts that to the test.
Is Fine-Tuning Really Linear? Debunking the Old Theory
The authors’ first major contribution is to test whether task arithmetic is just a consequence of linear behavior.
They take a CLIP model, fine-tune it on a single task (producing \(f(\cdot; \theta^*)\)), and then generate its post-hoc linearization:
\[ f_{\text{lin}}(\boldsymbol{x}; \theta_0+\tau) = f(\boldsymbol{x}; \theta_0) + \tau^{\top} \nabla_{\theta} f(\boldsymbol{x}; \theta_0) \]

If the linear hypothesis held, the fine-tuned non-linear model and its linearized version should perform similarly.
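Evaluating this post-hoc linearization doesn’t require materializing \(\nabla_{\theta} f\): a Jacobian-vector product computes the first-order term in a single forward-mode pass. A minimal sketch using torch.func (the model is any nn.Module; names are illustrative):

```python
import torch
from torch.func import functional_call, jvp

def linearized_forward(model: torch.nn.Module, theta_0: dict, tau: dict,
                       x: torch.Tensor) -> torch.Tensor:
    """f_lin(x; theta_0 + tau) = f(x; theta_0) + tau^T grad_theta f(x; theta_0)."""
    def f(params):
        # Evaluate the network functionally at the given weights.
        return functional_call(model, params, (x,))

    # jvp returns f(theta_0) and the directional derivative along tau
    # in one pass, without ever building the full Jacobian.
    f_0, dir_deriv = jvp(f, (theta_0,), (tau,))
    return f_0 + dir_deriv
```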
Figure 2: Single-task accuracies for non-linear fine-tuning vs. post-hoc linearization. The gap shows the “non-linear advantage”.
The results show a clear non-linear advantage: ignoring non-linear components hurts accuracy. Fine-tuning is not purely linear.
Perhaps, however, the combination of task vectors still only needs these linear components. To test this, the authors apply task vectors (from non-linear fine-tuning) to the linearized model.
Table 1: Task addition benchmark. The post-hoc linearized model’s absolute accuracy is consistently lower.
This confirms that task arithmetic in standard models uses non-linear components too.
But here’s the twist: the post-hoc linearized model actually scores better at task negation and achieves higher normalized accuracy for task addition. Normalized accuracy measures multi-task performance relative to the same model’s own single-task accuracy, so a model can rank higher on it even when its absolute accuracy is lower.
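Concretely, for \(T\) tasks, normalized accuracy divides the edited model’s accuracy on each task by that model family’s single-task fine-tuned accuracy and averages the ratios (a sketch of the benchmark’s metric; notation ours):

\[ \text{acc}_{\text{norm}} = \frac{1}{T} \sum_{t=1}^{T} \frac{\text{acc}_t\left(\theta_0 + \sum_{t'} \alpha_{t'} \tau_{t'}\right)}{\text{acc}_t\left(\theta_0 + \tau_t\right)} \]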
Why? It turns out that in linearized models, tasks interfere less. This suggests a deeper property at work.
Table 2: Task negation benchmark. Lower target accuracy means better forgetting.
The Real Secret: Weight Disentanglement
If not linearity, what enables task arithmetic? The authors propose weight disentanglement.
This means that in a well-trained model, distinct directions in weight space \(\tau_t\) affect the output only for inputs in the corresponding domain \(\mathcal{D}_t\), leaving the model’s behavior on other domains unchanged.
Figure 1: Distinct weight-space directions correspond to distinct, localized input domains.
Formally:
\[ f(\boldsymbol{x};\theta_0+\sum_{t=1}^T \alpha_t \tau_t) = \sum_{t=1}^T g_t(\boldsymbol{x}; \alpha_t \tau_t) + g_0(\boldsymbol{x}) \]

Here, \(g_t\) vanishes outside \(\mathcal{D}_t\), and \(g_0\) is zero within \(\bigcup_t \mathcal{D}_t\).
To measure disentanglement, the authors define a disentanglement error \(\xi(\alpha_1, \alpha_2)\): on each task’s domain, compare the outputs of the model edited with both task vectors at once against the model edited with that task’s vector alone, and accumulate the disagreement. Lower is better.
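Here is a minimal sketch of one way to compute it for two tasks, assuming model_fn evaluates the network at arbitrary weights (e.g. via torch.func.functional_call) and measuring disagreement on predicted labels, as in the paper’s classification setting:

```python
import torch

def disentanglement_error(model_fn, theta_0: dict, tau_1: dict, tau_2: dict,
                          alpha_1: float, alpha_2: float, loader_1, loader_2):
    """xi(alpha_1, alpha_2): on each task's domain, how often the jointly
    edited model disagrees with the corresponding singly edited model."""
    def edit(theta, terms):
        return {k: theta[k] + sum(a * tau[k] for tau, a in terms) for k in theta}

    theta_joint = edit(theta_0, [(tau_1, alpha_1), (tau_2, alpha_2)])
    errors = []
    for tau, alpha, loader in ((tau_1, alpha_1, loader_1),
                               (tau_2, alpha_2, loader_2)):
        theta_single = edit(theta_0, [(tau, alpha)])
        disagree, total = 0, 0
        for x, _ in loader:  # labels unused: we compare predictions, not accuracy
            pred_joint = model_fn(theta_joint, x).argmax(dim=-1)
            pred_single = model_fn(theta_single, x).argmax(dim=-1)
            disagree += (pred_joint != pred_single).sum().item()
            total += x.shape[0]
        errors.append(disagree / total)
    return sum(errors)  # accumulate per-domain disagreement across tasks
```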
Figure 3: Disentanglement error for non-linear (top) vs. linearized (bottom) CLIP ViT-B/32. Lighter regions = lower error.
Key insight: Linearized models are more weight-disentangled — tasks interfere less — but have lower absolute single-task accuracy.
The Solution: Fine-Tuning Directly in the Tangent Space
The proposed fix: linearized fine-tuning. Instead of fine-tuning the non-linear model and then linearizing it post hoc, fine-tune the model’s linear (tangent-space) approximation directly. As shown below:
Figure 4: Fine-tuning in tangent space vs. non-linear weight space.
This method finds task vectors optimized specifically for the linearized model.
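In the spirit of the paper’s method, linearized fine-tuning treats \(\tau\) as the only trainable parameters and backpropagates through the Jacobian-vector product. A minimal sketch reusing the torch.func machinery from above (the training-loop details are illustrative):

```python
import torch
from torch.func import functional_call, jvp

def make_linearized(model: torch.nn.Module, theta_0: dict):
    """Build a tangent-space model: theta_0 is frozen, tau is trainable."""
    tau = {k: torch.zeros_like(v, requires_grad=True) for k, v in theta_0.items()}

    def forward(x):
        def f(params):
            return functional_call(model, params, (x,))
        f_0, dir_deriv = jvp(f, (theta_0,), (tau,))
        return f_0 + dir_deriv  # exactly f_lin(x; theta_0 + tau)

    return forward, tau

# Illustrative training loop: gradients flow only into tau.
# forward, tau = make_linearized(model, theta_0)
# opt = torch.optim.AdamW(tau.values(), lr=1e-5)
# for x, y in loader:
#     loss = torch.nn.functional.cross_entropy(forward(x), y)
#     opt.zero_grad(); loss.backward(); opt.step()
```

Because the forward pass is linear in \(\tau\), the resulting task vectors live natively in the tangent space, where adding them corresponds exactly to adding the models’ functions.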
Figure 5: Accuracy gap between linearized and non-linear fine-tuning greatly reduced.
As a result, linearized FT retains strong disentanglement and high task accuracy — yielding state-of-the-art task arithmetic:
- Task addition: up to +5.8 points over non-linear FT.
- Task negation: 13.1 points more forgetting on the target task than non-linear fine-tuning.
Deeper Insights: Why It Works
1. Eigenfunction Localization
The NTK can be decomposed into eigenfunctions. For task arithmetic, the eigenfunctions for a task should be localized to its domain.
Figure 6: Local energy for NTK eigenfunctions after training on RESISC45. Energy is domain-specific.
Analysis shows CLIP models indeed use localized eigenfunctions: separate sets for different tasks, enabling interference-free combinations.
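One way to probe this yourself is to eigendecompose an empirical NTK Gram matrix on a small batch and measure how each eigenfunction’s energy concentrates on a domain. A rough sketch (not the paper’s exact protocol), assuming a scalar-output head for simplicity:

```python
import torch
from torch.func import functional_call, jacrev, vmap

def empirical_ntk_eigens(model: torch.nn.Module, theta_0: dict, xs: torch.Tensor):
    """Empirical NTK on a batch: K[i, j] = <grad f(x_i), grad f(x_j)>.

    The eigenvectors of K evaluate the kernel's eigenfunctions at the
    sampled points, so their squared mass per input domain gives a
    'local energy' in the sense of Figure 6.
    """
    def f(params, x):
        # Assumes a scalar output; multi-class heads need a block NTK.
        return functional_call(model, params, (x.unsqueeze(0),)).squeeze()

    # Per-example parameter gradients, flattened into rows of J.
    grads = vmap(jacrev(f), in_dims=(None, 0))(theta_0, xs)
    J = torch.cat([g.reshape(len(xs), -1) for g in grads.values()], dim=1)
    evals, evecs = torch.linalg.eigh(J @ J.T)
    return evals, evecs

# Local energy of eigenfunction k on domain D (row indices idx_D into xs):
# energy_k = evecs[idx_D, k].pow(2).sum() / evecs[:, k].pow(2).sum()
```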
2. Disentanglement Emerges from Pre-Training
Repeating task arithmetic from random initialization fails completely:
Table 3: Task addition from scratch: accuracy near random chance.
Random models lack weight disentanglement. Hence, disentanglement is learned during pre-training; it is not inherent to the architecture.
Conclusion and Future Directions
This work reframes our understanding of task arithmetic:
- Weight disentanglement, not simple linearity, enables task arithmetic.
- Linearized models are inherently more disentangled.
- Fine-tuning in tangent space combines high task accuracy with low interference, setting a new benchmark.
- Disentanglement links to localized NTK eigenfunctions.
- Disentanglement is an emergent property of large-scale pre-training.
Future research could explore:
- How pre-training dynamics lead to disentanglement.
- Efficient algorithms for tangent-space fine-tuning at large scale.
- Applying these ideas to other modalities and architectures.
If editing AI skills becomes as simple — and reliable — as doing a bit of arithmetic, the possibilities for adaptive, controllable models are vast.