Introduction: The Art of Editing AI Models

Massive pre-trained models like CLIP, GPT, and T5 have become the backbone of modern AI. They possess an incredible wealth of general knowledge, but to be truly useful, they often need a bit of targeted editing. We might want to teach them a new skill, align them with human values, or make them forget undesirable behaviors.

The standard approach is fine-tuning, which involves further training on a specialized dataset. However, fine-tuning can be computationally expensive, and it often comes with an unwelcome trade-off: a model fine-tuned for one task may lose some of its original zero-shot capabilities on others.

Enter task arithmetic — a surprisingly simple and cost-effective alternative. Rather than retraining from scratch, you manipulate a model’s skills by performing arithmetic directly on its weights. For example:

  • Fine-tune the model separately on two tasks.
  • Compute the change in weights from the original pre-trained model for each task (known as its task vector).
  • Add these task vectors to the original weights to produce a model that performs both tasks well.
  • Subtract a task vector to make the model “forget” a skill.

Task arithmetic is promising, but until recently one critical question was unanswered: why does it work? Without a solid theoretical foundation, it can feel like guesswork — hard to trust and even harder to improve.

A recent paper, Task Arithmetic in the Tangent Space (Ortiz-Jiménez et al., NeurIPS 2023), dives deep into this question. The authors challenge prevailing assumptions, introduce the concept of weight disentanglement, and propose an improved method, linearized fine-tuning, that makes editing pre-trained models both more effective and more reliable.

In this article, we’ll unpack the paper’s key findings: why the old theory falls short, what weight disentanglement really means, and how fine-tuning in the model’s tangent space sets a new state of the art in task arithmetic.


Background: Task Vectors and Tangent Spaces

What is a Task Vector?

Let’s start simply. You have a pre-trained model with initial weights \(\theta_0\). You fine-tune it on a specific task (say, classifying cars) and get new weights \(\theta_{\text{cars}}^*\). The task vector is:

\[ \tau_{\text{cars}} = \theta_{\text{cars}}^* - \theta_0 \]

This vector encodes the knowledge gained for that task. If you have another task vector, \(\tau_{\text{flowers}}\) (for classifying flowers), you can combine them:

\[ \theta_{\text{multi-task}} = \theta_0 + \alpha_1 \tau_{\text{cars}} + \alpha_2 \tau_{\text{flowers}} \]

The coefficients \(\alpha_1, \alpha_2\) let you control how much of each skill to add.
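In code, task arithmetic really is just arithmetic on checkpoints. Below is a minimal PyTorch sketch; the helper names (`task_vector`, `apply_task_vectors`) are illustrative, not taken from the paper’s codebase:

```python
# Minimal sketch of task arithmetic on model checkpoints.
import copy

import torch
import torch.nn as nn

def task_vector(pretrained: nn.Module, finetuned: nn.Module) -> dict:
    """tau = theta* - theta_0, computed per parameter tensor."""
    theta0 = pretrained.state_dict()
    return {k: v - theta0[k] for k, v in finetuned.state_dict().items()}

def apply_task_vectors(pretrained: nn.Module, vectors, alphas) -> nn.Module:
    """Return a copy of the model with weights theta_0 + sum_t alpha_t * tau_t."""
    edited = copy.deepcopy(pretrained)
    state = edited.state_dict()
    for tau, alpha in zip(vectors, alphas):
        for k in state:
            state[k] = state[k] + alpha * tau[k]
    edited.load_state_dict(state)
    return edited
```

Negation is the same operation with a negative coefficient, e.g. `apply_task_vectors(model, [tau], [-1.0])`.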


The Linear Hypothesis and the Neural Tangent Kernel (NTK)

One early theory held that task arithmetic works because fine-tuning often happens in a linear regime. Around the starting weights \(\theta_0\), you can approximate the model with a first-order Taylor expansion:

\[ f(\boldsymbol{x};\boldsymbol{\theta}) \approx f(\boldsymbol{x};\boldsymbol{\theta}_0) + (\boldsymbol{\theta} - \boldsymbol{\theta}_0)^{\top} \nabla_{\boldsymbol{\theta}} f(\boldsymbol{x};\boldsymbol{\theta}_0) \]

This defines the model’s tangent space, whose behavior is governed by the Neural Tangent Kernel (NTK). The idea was: if fine-tuning stays within this linear neighborhood, adding weight vectors should correspond to adding functions, enabling simple task arithmetic.

But does fine-tuning really stay in this linear regime? The paper puts that to the test.


Is Fine-Tuning Really Linear? Debunking the Old Theory

The authors’ first major contribution is to test whether task arithmetic is just a consequence of linear behavior.

They take a CLIP model, fine-tune it on a single task (producing \(f(\cdot; \theta^*)\)), and then generate its post-hoc linearization:

\[ f_{\text{lin}}(\boldsymbol{x}; \theta_0+\tau) = f(\boldsymbol{x}; \theta_0) + \tau^{\top} \nabla_{\theta} f(\boldsymbol{x}; \theta_0) \]

If the linear hypothesis held, the fine-tuned non-linear model and its linearized version should perform similarly.
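The post-hoc linearization above can be evaluated without materializing any per-example gradients: the term \(\tau^{\top} \nabla_{\theta} f\) is exactly a Jacobian-vector product. A minimal sketch using `torch.func` (the helper name `linearized_forward` is mine):

```python
import torch
import torch.nn as nn
from torch.func import functional_call, jvp

def linearized_forward(model: nn.Module, theta0: dict, tau: dict, x: torch.Tensor):
    """f_lin(x; theta0 + tau) = f(x; theta0) + tau^T grad_theta f(x; theta0).
    The second term is computed as a Jacobian-vector product in direction tau."""
    f = lambda params: functional_call(model, params, (x,))
    y0, dy = jvp(f, (theta0,), (tau,))
    return y0 + dy
```

For a model whose output is linear in its parameters (e.g. a single `nn.Linear` layer), this reproduces the edited model exactly; for deep networks, the gap between \(f\) and \(f_{\text{lin}}\) is precisely the non-linear advantage discussed next.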

Figure 2: Single-task accuracy of non-linear fine-tuning vs. its post-hoc linearization. Non-linear fine-tuning consistently scores higher, a gap termed the “non-linear advantage”.

The results show a clear non-linear advantage: ignoring non-linear components hurts accuracy. Fine-tuning is not purely linear.

Perhaps, however, combining task vectors still relies only on these linear components. To test this, the authors apply task vectors obtained from non-linear fine-tuning to the linearized model.

Table 1: Task addition benchmark. The post-hoc linearized model (“Post-hoc lin.” row) achieves consistently lower absolute accuracy than non-linear fine-tuning (“Non-lin. FT” row), indicating that non-linear components matter for task arithmetic.

This confirms that task arithmetic in standard models uses non-linear components too.

But here’s the twist: the post-hoc linearized model actually scores better at task negation and achieves higher normalized accuracy for task addition. (Normalized accuracy measures multi-task performance relative to each model’s own single-task accuracy, so it isolates how well tasks combine rather than how strong each model is in isolation.)

Why? It turns out that in linearized models, tasks interfere less. This suggests a deeper property at work.

Table 2: Task negation benchmark. The post-hoc linearized model drives accuracy on the target task much lower than the non-linear model does; lower target accuracy means better forgetting.


The Real Secret: Weight Disentanglement

If not linearity, what enables task arithmetic? The authors propose weight disentanglement.

This means that in a well-trained model, moving along certain weight-space directions \(\tau_t\) changes the output only for inputs from a corresponding domain \(\mathcal{D}_t\), leaving the function unchanged on the rest of the input space.

Figure 1: Weight disentanglement: distinct directions in weight space (τ1, τ2) independently affect the model’s function on distinct input domains (D1, D2).

Formally:

\[ f(\boldsymbol{x};\theta_0+\sum_{t=1}^T \alpha_t \tau_t) = \sum_{t=1}^T g_t(\boldsymbol{x}; \alpha_t \tau_t) + g_0(\boldsymbol{x}) \]

Here, \(g_t\) vanishes outside \(\mathcal{D}_t\), and \(g_0\) is zero within \(\bigcup_t \mathcal{D}_t\).

To measure disentanglement, the authors define a disentanglement error \(\xi(\alpha_1, \alpha_2)\): the average disagreement when comparing the outputs from combined vs. separate task vectors. Lower is better.
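As a concrete sketch, the disentanglement error can be estimated by comparing the predicted labels of the jointly edited model against each individually edited model on inputs from that task’s domain. The helper below is my own simplification, using prediction disagreement as the distance:

```python
import torch
import torch.nn as nn
from torch.func import functional_call

def disentanglement_error(model, theta0, taus, alphas, task_inputs):
    """xi(alpha_1, ..., alpha_T): summed mean prediction disagreement between
    the combined edit theta0 + sum_t alpha_t tau_t and each single edit
    theta0 + alpha_t tau_t, evaluated on inputs from task t's domain."""
    combined = {k: theta0[k] + sum(a * tau[k] for a, tau in zip(alphas, taus))
                for k in theta0}
    xi = 0.0
    for alpha, tau, xs in zip(alphas, taus, task_inputs):
        single = {k: theta0[k] + alpha * tau[k] for k in theta0}
        pred_comb = functional_call(model, combined, (xs,)).argmax(dim=-1)
        pred_single = functional_call(model, single, (xs,)).argmax(dim=-1)
        xi += (pred_comb != pred_single).float().mean().item()
    return xi
```

Sweeping this over a grid of \((\alpha_1, \alpha_2)\) values produces exactly the kind of heatmap shown in the paper.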

Figure 3: Disentanglement error heatmaps for the non-linear (top) and linearized (bottom) CLIP ViT-B/32; lighter regions mean lower error. The linearized model shows much larger low-error regions, i.e. it is more weight-disentangled.

Key insight: Linearized models are more weight-disentangled — tasks interfere less — but have lower absolute single-task accuracy.


The Solution: Fine-Tuning Directly in the Tangent Space

The proposed fix: linearized fine-tuning. Instead of fine-tuning the non-linear model and then linearizing it post hoc, fine-tune its tangent-space (linearized) representation directly from the start. As shown below:

Figure 4: Fine-tuning on the non-linear function space vs. directly on the linearized tangent space.

This method finds task vectors optimized specifically for the linearized model.
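Mechanically, the trainable parameters are now the task-vector entries \(\tau\), while the forward pass always goes through the frozen first-order Taylor model. A toy sketch of one training step with an MSE loss (my own simplification; the paper trains CLIP with its usual objective):

```python
import torch
import torch.nn as nn
from torch.func import functional_call, jvp, vjp

def linearized_ft_step(model, theta0, tau, x, y, lr=0.1):
    """One SGD step of linearized fine-tuning: forward through the
    first-order Taylor model, backward via a vector-Jacobian product."""
    f = lambda params: functional_call(model, params, (x,))
    y0, dy = jvp(f, (theta0,), (tau,))      # f_lin(x; theta0 + tau)
    resid = (y0 + dy) - y
    loss = (resid ** 2).mean()
    grad_out = 2.0 * resid / resid.numel()  # d loss / d output for MSE
    _, pullback = vjp(f, theta0)
    (grad_tau,) = pullback(grad_out)        # chain rule: output is linear in tau
    for k in tau:
        tau[k] -= lr * grad_tau[k]
    return loss.item()
```

Because the linearized output is linear in \(\tau\), the gradient with respect to \(\tau\) is just the Jacobian transposed times the output gradient, which is what the `vjp` pullback computes.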

Figure 5: Single-task accuracies for linearized vs. non-linear fine-tuning. The points lie much closer to the y = x line than in Figure 2: the performance gap has largely closed.

As a result, linearized FT retains strong disentanglement and high task accuracy — yielding state-of-the-art task arithmetic:

  • Task addition: up to +5.8 points over non-linear FT.
  • Task negation: forget an extra 13.1 points’ worth of the target task.

Deeper Insights: Why It Works

1. Eigenfunction Localization

The NTK can be decomposed into eigenfunctions, and any function the linearized model represents is a combination of them. For task arithmetic to work without interference, the eigenfunctions supporting a task should be localized to that task’s domain.

Figure 6: Local energy of NTK eigenfunctions after training on RESISC45. The energy is high (near 1) on inputs from the training task and drops to zero on a control task (Cars): it is domain-specific.

Analysis shows CLIP models indeed use localized eigenfunctions: separate sets for different tasks, enabling interference-free combinations.
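At toy scale this is easy to probe: build the empirical NTK Gram matrix from per-example parameter gradients, eigendecompose it, and measure how much of each eigenvector’s mass sits on one task’s inputs. A sketch for a scalar-output model (helper names are mine):

```python
import torch
import torch.nn as nn
from torch.func import functional_call, jacrev

def empirical_ntk(model, theta0, xs):
    """K[i, j] = <grad_theta f(x_i; theta0), grad_theta f(x_j; theta0)>,
    built from flattened per-example parameter Jacobians."""
    def f(params, x):
        return functional_call(model, params, (x.unsqueeze(0),)).squeeze()
    feats = torch.stack([
        torch.cat([g.reshape(-1) for g in jacrev(f)(theta0, x).values()])
        for x in xs
    ])                        # (N, num_params) feature map
    return feats @ feats.T    # (N, N) Gram matrix

def local_energy(eigvec, mask):
    """Fraction of an eigenvector's squared mass on one task's inputs."""
    return (eigvec[mask] ** 2).sum() / (eigvec ** 2).sum()
```

A local energy near 1 on one task’s inputs (and near 0 elsewhere) is the localization property the paper observes in CLIP.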

2. Disentanglement Emerges from Pre-Training

Repeating task arithmetic from random initialization fails completely:

Table 3: Task addition from random initialization. Multi-task accuracy is close to random chance: task arithmetic fails completely.

Random models lack weight disentanglement. Disentanglement is therefore learned during pre-training rather than being inherent to the architecture.


Conclusion and Future Directions

This work reframes our understanding of task arithmetic:

  1. Weight disentanglement, not simple linearity, enables task arithmetic.
  2. Linearized models are inherently more disentangled.
  3. Fine-tuning in tangent space combines high task accuracy with low interference, setting a new benchmark.
  4. Disentanglement links to localized NTK eigenfunctions.
  5. Disentanglement is an emergent property of large-scale pre-training.

Future research could explore:

  • How pre-training dynamics lead to disentanglement.
  • Efficient algorithms for tangent-space fine-tuning at large scale.
  • Applying these ideas to other modalities and architectures.

If editing AI skills becomes as simple — and reliable — as doing a bit of arithmetic, the possibilities for adaptive, controllable models are vast.