Large Language Models (LLMs) like LLaMA and GPT have revolutionized how we approach machine translation (MT). Unlike traditional translation systems that are trained specifically to convert language A to language B, LLMs are “polyglots” by nature. You can simply ask them to translate a sentence, and they usually do a decent job. This capability, known as In-Context Learning (ICL), allows models to translate based on just a few examples or even a simple instruction.
However, LLMs are not perfect translators. While they can produce fluent text, they suffer from specific, stubborn “hallucinations” that traditional systems rarely face. If you have ever used an LLM for translation, you might have encountered a situation where the model suddenly switches to the wrong language or gets stuck in a loop, repeating the same word endlessly.
In a recent paper, researchers investigated these specific failure modes—Language Mismatch and Repetition—and proposed a novel solution using Model Editing. Instead of retraining the entire massive model, they developed a technique to locate the specific “neurons” and “attention heads” responsible for these errors and surgically alter them.
This post dives deep into their research, explaining why these errors happen, how the researchers located the responsible components inside the model’s “brain,” and the clever intersection technique they used to fix the problem without lobotomizing the model’s general abilities.
The Two Villains: Language Mismatch and Repetition
Before we can fix the model, we need to understand what is broken. The researchers identified two primary categories of severe errors that plague LLM-based translation.
1. Language Mismatch: This occurs when the model ignores the instruction regarding the target language. For example, you ask the model to translate English to German, but it outputs Russian instead. This isn’t just a bad translation; it’s a complete failure to follow instructions.
2. Repetition: This is a degeneration error where the model gets stuck in a loop. It might translate the sentence correctly but then continue repeating the last word or phrase until it hits the maximum generation length.

As shown in Figure 1 above, these aren’t subtle grammatical mistakes; they are catastrophic failures that render the output useless.
Why do these errors matter?
You might think these are edge cases, but the data suggests otherwise. In Zero-Shot settings (where the model is given no examples, just an instruction), language mismatch can occur in over 40% of cases for certain language pairs.
The researchers analyzed the impact of these errors on translation quality, measured by the BLEU score (a standard metric where higher is better).

Table 1 highlights a stark reality:
- LB (Language Mismatch BLEU): When a mismatch occurs, the score drops to near zero (e.g., 1.65 vs. 12.61 in English-to-German).
- RRB (Repetition BLEU): Repetition errors are similarly destructive.
The gap between the “Regular” set (clean translations) and the “Origin” set (all translations) indicates that if we could just eliminate these two specific types of errors, the overall translation quality of LLMs would jump significantly.
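If you want a feel for how sharply a wrong-language output drags BLEU toward zero, the widely used sacrebleu library makes the point in a few lines. This is purely illustrative; the sentences below are invented and are not from the paper's test sets:

```python
# Illustrative only: how a language-mismatch output collapses BLEU toward zero.
import sacrebleu

reference = ["The committee approved the proposal yesterday."]

good = "The committee approved the proposal yesterday."
mismatched = "Der Ausschuss hat den Vorschlag gestern genehmigt."  # German instead of English

print(sacrebleu.sentence_bleu(good, reference).score)        # near 100: close match
print(sacrebleu.sentence_bleu(mismatched, reference).score)  # near 0: wrong output language
```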
The Approach: Model Editing
How do we fix this? The traditional approach would be Fine-Tuning, where we retrain the model on a high-quality translation dataset. However, fine-tuning is computationally expensive and requires managing massive datasets.
The researchers took a different path: Model Editing. This technique assumes that specific capabilities (or errors) in an LLM are localized to specific parts of the neural network. If we can find the exact neurons responsible for “knowing the target language” or “generating the next token,” we can manually adjust their weights or activations to steer the model’s behavior.
The researchers utilized two specific editing techniques:
- Function Vectors (FV): A method to manipulate Attention Heads (the parts of the Transformer that relate different words to each other).
- Knowledge Neurons (KN): A method to manipulate the Feed-Forward Networks (FFN) (the parts of the Transformer that store information and process logic).
The Prompting Setup
To perform these edits, the researchers first needed a consistent way to prompt the model for translation. They used a simple template in which the model is given a source sentence and asked to complete the target.

Using this template, they could feed the model inputs and watch how the internal activations changed as the model processed the translation.
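The paper's exact wording isn't reproduced here, but a typical source-target completion prompt looks roughly like this. The field names and phrasing below are my assumption, not the authors' verbatim format:

```python
def build_prompt(src_lang, tgt_lang, src_sentence, examples=()):
    """Build a translation prompt; zero-shot when `examples` is empty.

    The template wording is an assumption for illustration, not the paper's
    verbatim format.
    """
    blocks = []
    for ex_src, ex_tgt in examples:
        blocks.append(f"{src_lang}: {ex_src}\n{tgt_lang}: {ex_tgt}")
    # Leave the target side open so the model completes it.
    blocks.append(f"{src_lang}: {src_sentence}\n{tgt_lang}:")
    return "\n\n".join(blocks)

print(build_prompt("English", "German", "The weather is nice today."))
```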
Phase 1: The Naive Investigation
The team began with a straightforward hypothesis: Can we simply locate the components responsible for translation and repetition, and edit them directly?
Locating the “Translation Engine”
To find the parts of the model responsible for machine translation, they used Causal Mediation Analysis. This involves:
- Running the model with a clean set of translation examples (e.g., 10-shot prompting).
- Running the model again with “corrupted” inputs (shuffled labels).
- Swapping the internal states of specific attention heads from the clean run into the corrupted run.
- Measuring which swaps restored the model’s ability to translate correctly.
This process highlights the attention heads that carry the most “causal weight” for the translation task.
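In code, this is the standard activation-patching loop from the interpretability literature. Here is a minimal PyTorch sketch on a toy stand-in model; only the caching-and-patching logic mirrors the paper's analysis, and all module names and inputs are placeholders:

```python
# Toy activation-patching sketch: cache a head's output on a clean run,
# patch it into a corrupted run, and measure how much behavior recovers.
import torch
import torch.nn as nn

torch.manual_seed(0)

class TinyBlock(nn.Module):
    def __init__(self, d=16):
        super().__init__()
        self.attn_head = nn.Linear(d, d)   # stand-in for one attention head
        self.ffn = nn.Linear(d, d)

    def forward(self, x):
        return self.ffn(torch.relu(self.attn_head(x)))

model = TinyBlock()
clean_input = torch.randn(1, 16)       # e.g. a clean 10-shot prompt
corrupted_input = torch.randn(1, 16)   # e.g. the same prompt with shuffled labels

# 1) Clean run: cache the head's activation.
cache = {}
def save_hook(module, inputs, output):
    cache["head"] = output.detach()
handle = model.attn_head.register_forward_hook(save_hook)
_ = model(clean_input)
handle.remove()

# 2) Corrupted run with the clean activation patched back in.
def patch_hook(module, inputs, output):
    return cache["head"]               # overwrite with the clean activation
handle = model.attn_head.register_forward_hook(patch_hook)
patched_out = model(corrupted_input)
handle.remove()

corrupted_out = model(corrupted_input)
# 3) The recovery (patched vs. corrupted) is this head's "causal weight".
print("effect of patching:", (patched_out - corrupted_out).norm().item())
```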

Figure 2 visualizes these “Machine Translation (MT) Heads.” The bright spots represent attention heads that are highly active and important during translation. Interestingly, out of hundreds of heads in the model, only a small, sparse set (mostly in the middle layers) does the heavy lifting for translation.
Based on this, they extracted a Machine Translation Vector (MTV)—essentially a mathematical representation of “how to translate”—and injected it into the model during inference to force it to stay on track.
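Injecting such a vector at inference time amounts to adding a precomputed tensor to the output of the chosen components. Here is a hedged sketch using a PyTorch forward hook; the vector, strength, and layer choice are placeholders, and extracting the MTV itself is omitted:

```python
import torch
import torch.nn as nn

def make_injection_hook(mt_vector, strength=1.0):
    """Forward hook that adds a precomputed 'translation vector' to a module's
    output. In the paper this would be attached at the located MT heads; the
    vector and strength here are placeholders."""
    def hook(module, inputs, output):
        return output + strength * mt_vector.to(output.dtype)
    return hook

# Tiny runnable demo on a stand-in layer (real LLaMA module paths differ).
layer = nn.Linear(8, 8)
mtv = torch.randn(8)                      # placeholder translation vector
handle = layer.register_forward_hook(make_injection_hook(mtv, strength=0.5))
out = layer(torch.randn(1, 8))            # output now carries the injected vector
handle.remove()
```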
Locating the “Repetition Glitch”
For repetition errors, they used Integrated Gradients to find neurons in the Feed-Forward Networks (FFN) that spiked in activity when the model started repeating itself. They identified the top neurons that contributed to repetition and dubbed them Repetition Neurons (RPN). The strategy was to suppress (set to zero) these neurons to stop the loop.
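Once the neuron indices are located, the suppression step itself is tiny: zero out those coordinates of the FFN activation during generation. A sketch with made-up indices and a stand-in layer:

```python
import torch
import torch.nn as nn

def make_suppression_hook(neuron_indices):
    """Forward hook that zeroes the activations of specific FFN neurons.
    `neuron_indices` would come from the attribution step; here they are
    arbitrary placeholders."""
    def hook(module, inputs, output):
        output = output.clone()
        output[..., neuron_indices] = 0.0
        return output
    return hook

# Demo on a stand-in FFN layer; a real model would attach this to the
# intermediate activation of the located transformer layers.
ffn = nn.Linear(16, 64)
repetition_neurons = [3, 17, 42]          # placeholder "Repetition Neuron" indices
handle = ffn.register_forward_hook(make_suppression_hook(repetition_neurons))
activation = ffn(torch.randn(1, 16))
assert torch.all(activation[..., repetition_neurons] == 0)
handle.remove()
```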
The Problem: The “Side Effect”
When they applied these edits directly, the results were mixed and somewhat alarming.
- MTV (Translation Vectors): Injecting the translation vector drastically reduced language mismatch. However, it increased repetition errors by nearly 500% in some cases and lowered the overall BLEU score. By forcing the model to focus too hard on “translation,” they broke its fluency.
- RPN (Repetition Neurons): Suppressing repetition neurons helped slightly, but not enough to be a game-changer.
The conclusion of Phase 1 was clear: Directly hacking the model’s brain is dangerous. The components they located weren’t just responsible for errors; they were likely entangled with general language abilities. Manipulating them bluntly caused the model to degrade.
Phase 2: The Refinement via Intersection
The researchers realized that their locating methods were too broad. If you look for neurons responsible for “English-to-German” translation, you might find neurons that handle the translation task, but also neurons specific to the German language or English grammar. Editing the language-specific neurons disrupts the model when the context changes.
They proposed a new hypothesis: The core mechanism for translation and repetition generation should be language-independent.
If a specific attention head is truly a “Translation Head,” it should be active whether we are translating English to German, Chinese to English, or German to English. If a head is only active for one pair, it’s likely capturing language-specific noise, not the task itself.
The Intersection Technique
To test this, they located the important heads and neurons for four different language settings (\(en \to de\), \(de \to en\), \(en \to zh\), \(zh \to en\)). They then looked for the Intersection—the specific components that appeared in the top lists for all language pairs.

Figure 3 supports this hypothesis. It shows that heads located using English-to-German data (blue bars) were still effective when applied to Chinese-to-English tasks, vastly outperforming random selection (red bars). This confirmed that there is a “universal” set of translation components inside LLaMA.
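Operationally, the intersection step is plain set logic over the per-pair rankings. A sketch assuming each locating run returns a ranked list of (layer, head) indices; the numbers are invented for illustration:

```python
def intersect_top_components(rankings, top_k=20):
    """Keep only the components ranked in the top-k for every language pair.

    `rankings` maps a language-pair name to a list of (layer, head) tuples
    ordered by causal importance. The values below are invented.
    """
    top_sets = [set(r[:top_k]) for r in rankings.values()]
    return set.intersection(*top_sets)

rankings = {
    "en-de": [(14, 3), (15, 7), (12, 1), (20, 5)],
    "de-en": [(14, 3), (12, 1), (18, 2), (15, 7)],
    "en-zh": [(14, 3), (12, 1), (15, 7), (9, 4)],
    "zh-en": [(12, 1), (14, 3), (22, 6), (15, 7)],
}
print(sorted(intersect_top_components(rankings, top_k=4)))
# [(12, 1), (14, 3), (15, 7)]: candidate language-independent "translation heads"
```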
The Solution: MTV-I-D and RPN-I
Based on this insight, the researchers developed two refined methods:
- MTV-I-D (Machine Translation Vectors - Intersection - Distributional):
  - Intersection: Only use attention heads found in the intersection of all language pairs.
  - Distributional: Instead of adding the full translation vector at a single point, they divided the vector evenly across the intersected heads (a sketch of this appears after the list). This is a “softer,” more distributed intervention that nudges the model rather than shoving it.
- RPN-I (Repetition Neurons - Intersection):
  - Identify repetition neurons across multiple language pairs.
  - Only suppress the neurons that cause repetition in all languages. This ensures we are targeting the mechanism of repetition itself, not valid language generation patterns.
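The “Distributional” part is easy to express in code: rather than adding the full vector at one location, each intersected head receives an equal share of it. A hedged sketch reusing the forward-hook idea from above, with placeholder layers and scaling:

```python
import torch
import torch.nn as nn

def attach_distributed_mtv(modules, mtv):
    """Split a translation vector evenly across several modules (the intersected
    heads) by adding mtv / n at each one. Module choice and scaling are assumptions."""
    share = mtv / len(modules)
    handles = []
    for m in modules:
        handles.append(m.register_forward_hook(
            lambda module, inputs, output, s=share: output + s.to(output.dtype)))
    return handles

# Demo with stand-in layers; in the real model these would be the intersected heads.
heads = [nn.Linear(8, 8) for _ in range(3)]
mtv = torch.randn(8)
handles = attach_distributed_mtv(heads, mtv)
_ = [h(torch.randn(1, 8)) for h in heads]   # each forward pass adds mtv / 3
for h in handles:
    h.remove()
```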
Results: Surgical Precision
The refined methods proved to be highly effective. By filtering out the noise and targeting only the language-independent components, the researchers achieved significant error reduction without the nasty side effects.
Let’s look at the performance on the Chinese-to-English (\(zh \to en\)) task:

Table 3 demonstrates the dramatic improvement:
- MTV (The naive method): Reduced mismatch by 92% but lowered the BLEU score by 0.81%, fixing the error at the cost of overall quality.
- MTV-I-D (The refined method): Reduced mismatch by 86% (still a very large reduction) but improved the BLEU score by an impressive 76.82% in the Zero-Shot setting.
- RPN-I: Reduced repetition errors by roughly 25% while maintaining or slightly improving general translation quality.
Comparison with Traditional Methods
How does this compare to standard techniques like Few-Shot prompting (providing 5 examples) or LoRA (a popular efficient fine-tuning method)?
The researchers compared their editing method against these heavy hitters. In many cases, MTV-I-D was comparable to, or even better than, providing 5 examples (5-Shot ICL) or performing Full Fine-Tuning, particularly in reducing specific error types.
Crucially, Model Editing requires zero training. You simply calculate the vectors once and apply them during inference, which makes the approach essentially free in compute compared to LoRA or full fine-tuning.
Conclusion
This research offers a fascinating glimpse into the internal mechanics of Large Language Models. It confirms that while LLMs are general-purpose “black boxes,” they contain specific, identifiable modules responsible for tasks like translation and behaviors like repetition.
The key takeaway is that precision matters. Simply finding important neurons isn’t enough; we must distinguish between neurons that handle specific data (like German grammar) and neurons that handle mechanism (like the act of translation). By using the Intersection of components across different languages, the researchers successfully isolated the “engine” of the model.
This work suggests that as we move forward, we might not always need to retrain models to fix their bugs. Instead, we can act as surgeons, locating the specific neural circuits causing hallucinations or errors and patching them in real-time.