Introduction
Imagine you are using a Large Language Model (LLM) for a Retrieval-Augmented Generation (RAG) task. You provide the model with a specific document stating, “The capital of France is Beijing,” perhaps as part of a hypothetical scenario or a fictional story. You ask the model: “What is the capital of France?”
The model now faces a crisis. Its internal, pre-trained memory (parametric knowledge) screams “Paris!” But the context window (contextual knowledge) you provided clearly says “Beijing.” This phenomenon is known as a knowledge conflict.
In real-world applications, how the model resolves this conflict is critical. If you are building a factual QA system, you want the model to reject the hallucinated or incorrect context. If you are building a creative writing assistant or a summarizer for fictional texts, you want the model to strictly adhere to the context, even if it contradicts reality.
For a long time, researchers believed that specific parts of the model—“memory heads”—were responsible for facts, while “context heads” handled the input text. The prevailing theory was that we could simply turn off the memory heads to make the model context-faithful, or vice versa.
A new paper titled “Taming Knowledge Conflicts in Language Models” challenges this assumption. The researchers discovered that models are not that neatly organized. Instead, they exhibit a phenomenon called Superposition of Contextual Information and Parametric Memory (CP Superposition). To address this, they propose a novel, training-free intervention method called JUICE (Just Run Twice).
In this post, we will dive deep into the mechanics of knowledge conflicts, understand why previous “head-pruning” methods fail, and explore how JUICE effectively steers models toward either facts or context by running inference twice.
Background: The Anatomy of a Conflict
To understand the solution, we first need to understand how LLMs process these conflicts.
Parametric vs. Contextual Knowledge
- Parametric Knowledge: This is the information stored in the model’s weights (parameters) during pre-training. It is the model’s “long-term memory” of facts, grammar, and world knowledge.
- Contextual Knowledge: This is the information provided in the prompt (the input text). It is transient and exists only for the duration of the current inference.
The Traditional View: Distinct Heads
Transformers, the architecture behind LLMs, rely on Attention Heads. These heads allow the model to focus on different parts of the input.
- Prior Belief: Previous research (such as Jin et al., 2024) hypothesized that specific attention heads were exclusive. Some were “Memory Heads” (retrieving facts from weights) and others were “Context Heads” (copying information from the input).
- The Strategy: If this were true, resolving a conflict would be easy. To force the model to use context, you would simply “knock out” (zero-ablate) the Memory Heads, as sketched below.
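To make zero-ablation concrete, here is a minimal PyTorch sketch (a toy example of mine, not code from the paper): it zeroes a single head’s output inside a multi-head attention block, just before the output projection. The tensor shapes and the choice of head index are illustrative assumptions.

```python
import torch

# Toy shapes (illustrative, not from the paper): batch=1, seq_len=5, 4 heads of dim 8.
B, T, H, D = 1, 5, 4, 8
head_outputs = torch.randn(B, T, H, D)   # per-head attention outputs
W_O = torch.randn(H * D, H * D)          # the attention block's output projection

def attention_block_output(per_head, ablate_head=None):
    """Concatenate the heads (optionally zeroing one) and apply the output projection."""
    if ablate_head is not None:
        per_head = per_head.clone()
        per_head[:, :, ablate_head, :] = 0.0   # "knock out" this head
    return per_head.reshape(B, T, H * D) @ W_O

normal = attention_block_output(head_outputs)
ablated = attention_block_output(head_outputs, ablate_head=2)  # a hypothetical "memory head"
print((normal - ablated).norm())  # how much the block's contribution changed
```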
However, as we are about to see, the internal reality of a Transformer is much messier—and far more interesting.
The Discovery: CP Superposition
The authors of this paper began by testing the “distinct heads” hypothesis across various types of conflicts. They found that highly influential attention heads often contribute to both memory and context simultaneously.

As shown in Figure 1 above, the new finding suggests a superposition. An attention head isn’t just a “memory retriever” or a “context copier.” It is often doing both. This means that if you try to shut down a head to suppress parametric memory, you might inadvertently destroy the model’s ability to process context, leading to broken outputs.
Analyzing the Conflict Types
To prove this, the researchers categorized inputs into three settings of increasing difficulty:
- Clean: No conflict. The context supports the truth.
- Substitution Conflict (Sentence-level): The context contains a direct replacement (e.g., “The capital of France is Beijing”).
- Coherent Conflict (Paragraph-level): The context is a persuasive, fully coherent paragraph arguing for the false fact. This is the hardest scenario for a model to resist.
The Evidence of Inconsistent Behavior
The researchers performed an experiment where they “knocked out” (zeroed out) different components of the model—the Attention mechanisms and the Multi-Layer Perceptrons (MLPs)—to see how it affected the probability of the model outputting the parametric truth (\(a_p\)).
We can quantify the impact of these interventions using the following equation:
\[\Delta_M = P\big(a_p \mid \text{knockout}(M)\big) - P\big(a_p\big)\]
This equation essentially measures: How much does the probability of the correct answer change when I turn off component M?
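As a hedged sketch of that measurement (with made-up numbers and a toy stand-in for the model, purely to show the bookkeeping), the quantity is just the difference between two next-token probabilities for the parametric answer \(a_p\):

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Toy stand-in for a model (illustrative, not the paper's setup): two "components"
# whose contributions sum into final logits over a 10-token vocabulary.
vocab = 10
component_out = {
    "attn_L12": torch.randn(vocab),
    "mlp_L12": torch.randn(vocab),
}
a_p = 3  # token id of the parametric answer

def prob_parametric(knockout=None):
    """P(a_p) with an optional component zero-ablated."""
    logits = sum(out for name, out in component_out.items() if name != knockout)
    return F.softmax(logits, dim=-1)[a_p]

delta = prob_parametric(knockout="attn_L12") - prob_parametric()
# delta > 0 means knocking out the component *increased* P(a_p), i.e. that component
# was pushing toward the contextual answer (the pattern seen in coherent conflicts).
print(float(delta))
```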
The results, visualized in Figure 3 below, were revealing.

Notice the chaotic behavior in the graphs:
- Clean Inputs (Left): Removing components generally decreases the probability of the correct answer. This makes sense; you are breaking the model.
- Substitution Conflict (Middle): Removing components causes the probability to fluctuate wildly.
- Coherent Conflict (Right): Surprisingly, removing nearly all components increases the probability of the parametric answer. This implies that during a strong conflict, almost every part of the model is conspiring to support the context.
The Failure of Single-Head Interventions
The most damning evidence against the “exclusive heads” theory comes from ranking the heads. The researchers identified the top “Memory Heads”—those that, when active, most strongly pushed for the parametric fact.

Table 1 shows a striking contradiction. The heads listed were identified as strong “Memory Heads” in substitution conflicts (removing them increased context reliance, hence the green numbers). However, look at the “Coh-Conflict” (Coherent Conflict) columns. For some heads, removing them actually hurt context reliance (red numbers).
The takeaway: A head that acts as a “Memory Head” in one situation might act as a “Context Head” in another. This is CP Superposition. You cannot simply cut off a head to resolve a conflict because its role changes dynamically based on the input.
The Solution: Just Run Twice (JUICE)
Since we cannot simply prune heads without causing collateral damage, we need a smarter intervention. The authors propose JUICE, a method that steers the model without permanent modification.
The core insight is simple: Don’t silence the head; steer its output.
How JUICE Works
JUICE operates in two distinct stages: Head Identification and Dual-Run Inference.

Stage 1: Head Identification
First, we need to know which heads are the “superposition” heads—the ones that strongly influence the tug-of-war between memory and context.
- The researchers use a tiny dataset (as few as 4 examples) containing different conflict types.
- They compute a score for each attention head based on how much it pushes the probability toward the parametric or contextual answer.
- They select the top \(K\) heads that consistently impact the output; a rough sketch of this scoring step follows below.
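Here is one way such a scoring step could look (my own paraphrase with hypothetical helper names, not the authors’ implementation): for each head, measure how ablating it shifts probability mass between the parametric answer \(a_p\) and the contextual answer \(a_c\), average over the handful of identification examples, and keep the \(K\) heads with the largest average effect.

```python
def head_score(prob_with, prob_without):
    """Score a head by how much it shifts probability mass between the parametric
    answer (a_p) and the contextual answer (a_c) when the head is ablated.
    Each argument is a dict like {"a_p": float, "a_c": float}."""
    gap_with = prob_with["a_p"] - prob_with["a_c"]
    gap_without = prob_without["a_p"] - prob_without["a_c"]
    return gap_with - gap_without   # > 0: the head was pushing toward the parametric answer

def select_top_k_heads(per_head_probs, k=8):
    """per_head_probs[head] is a list of (prob_with, prob_without) pairs,
    one pair per identification example (the paper needs as few as 4)."""
    scores = {
        head: sum(head_score(w, wo) for w, wo in pairs) / len(pairs)
        for head, pairs in per_head_probs.items()
    }
    ranked = sorted(scores, key=lambda h: abs(scores[h]), reverse=True)
    return ranked[:k], scores

# Tiny made-up example with two heads and one identification example each:
fake = {
    ("layer12", "head3"): [({"a_p": 0.6, "a_c": 0.3}, {"a_p": 0.4, "a_c": 0.5})],
    ("layer20", "head1"): [({"a_p": 0.5, "a_c": 0.4}, {"a_p": 0.5, "a_c": 0.4})],
}
top_heads, scores = select_top_k_heads(fake, k=1)
print(top_heads, scores)
```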
Stage 2: Dual-Run Inference
This is the innovative part. Instead of a single pass, the model runs the inference twice for every query.
- First Run (The “Listen” Phase): The model processes the input normally. During this pass, JUICE records (saves) the activation outputs of the top \(K\) heads identified in Stage 1.
- Second Run (The “Steer” Phase): The model processes the input again. This time, JUICE takes the saved activations from the first run, scales them by a factor \(\beta\) (positive or negative), and adds them back into the current run’s activations: \[H_{new} = H_{current} + \beta \times H_{saved}\] To boost Parametric Memory, you use a negative \(\beta\), subtracting the context-heavy activations; to boost Contextual Adherence, you use a positive \(\beta\), amplifying them. A minimal sketch of this record-and-steer loop appears below.
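The sketch below shows the record-then-steer pattern using PyTorch forward hooks on a toy two-layer network. The module, the hooked layers, and the value of \(\beta\) are illustrative assumptions; in the actual method the hooks would target the top \(K\) attention-head outputs selected in Stage 1.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in: in the real method the hooked modules would be the top-K attention heads.
model = nn.Sequential(nn.Linear(16, 16), nn.ReLU(), nn.Linear(16, 16))
steered_layers = {"layer0": model[0], "layer2": model[2]}
beta = -0.8   # negative beta: push back toward parametric memory (per the description above)

saved = {}
x = torch.randn(1, 16)

# --- First run ("listen"): record the natural activations of the selected components ---
def make_recorder(name):
    def hook(module, inputs, output):
        saved[name] = output.detach()       # returning None leaves the output unchanged
    return hook

handles = [layer.register_forward_hook(make_recorder(name)) for name, layer in steered_layers.items()]
_ = model(x)
for h in handles:
    h.remove()

# --- Second run ("steer"): H_new = H_current + beta * H_saved at each selected component ---
def make_steerer(name):
    def hook(module, inputs, output):
        return output + beta * saved[name]  # returned value replaces the module's output
    return hook

handles = [layer.register_forward_hook(make_steerer(name)) for name, layer in steered_layers.items()]
steered_output = model(x)
for h in handles:
    h.remove()

print(steered_output)
```

The key design point is that the steering vector comes from a separate, unmodified run, so the hooks in the second run never feed back into what they record.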
Why Run Twice?
You might ask, “Why not just steer it during the first run?”
The authors found that single-pass interventions (like “Just Run Once” or JUNE) are unstable. Because of superposition, the heads interact with each other. If you intervene on Head A, Head B might compensate or react unpredictably.
By saving the “natural” activations from a first run, you capture a stable snapshot of what the model wants to do. Using that snapshot as a steering vector in the second run is much more robust than trying to manipulate the live stream blindly.

Figure 6 illustrates this clearly. The blue line (Run Once) degrades quickly as you intervene on more heads. The red line (Run Twice) maintains high effectiveness (logit value) even as you intervene on many heads. The “ghost” of the first run provides a reliable map for the second run.
Experiments & Results
The researchers evaluated JUICE across 11 datasets and 6 model architectures (including Gemma, Llama-2, and Llama-3). They tested two objectives:
- Enhancing Parametric Beliefs: Forcing the model to ignore false context.
- Enhancing Contextual Reliance: Forcing the model to follow context, even if it contradicts its memory.
1. Enhancing Parametric Beliefs
In this setup, the model is given a false context (e.g., “The capital of France is Beijing”) and must output the truth (“Paris”).

Table 3 shows the results. The Original model often fails catastrophically in Type 3 (Coherent) conflicts, scoring near 0% accuracy (it believes the convincing lie).
- PH3: A baseline method that prunes heads. It helps, but often struggles with coherent conflicts.
- JUNE: The single-run version of the authors’ method. Better, but inconsistent.
- JUICE: The dual-run method achieves state-of-the-art results, scoring 91.9% on Type 3 conflicts with Gemma, compared to 0% for the original model. It effectively “de-hypnotizes” the model.
2. Enhancing Contextual Reliance
Here, the goal is reversed. The model must follow the context (e.g., specific instructions or new definitions) and suppress its prior knowledge.

As shown in Table 4, JUICE again dominates. On the Gemma model, it improves the average accuracy from 45.0% (Original) to 66.2%, outperforming contrastive decoding methods (CAD) and pruning methods (PH3).
Visualizing Success
The performance leap is perhaps best visualized in the bar chart below for the Gemma-2b model.

Notice the “Coh-Conflict” (Coherent Conflict) group on the far right. The Original model (orange) is at zero. PH3 (blue) gets to ~40. JUICE (green) jumps to over 70. This is a massive improvement in robustness.
Robustness
A common critique of intervention methods is that they are brittle—sensitive to hyperparameters or prompt phrasing. The authors conducted a robustness study (Figure 5) varying the number of heads intervened (\(K\)), the scaling factors, and the size of the identification dataset.

The method remains stable (the high plateaus in the graphs) across a wide range of settings, proving it isn’t just overfitting to a specific configuration.
Theoretical Analysis
Why does this superposition happen? Why do models mash up memory and context in the same heads? The authors provide a theoretical analysis using a simplified two-layer Transformer model.
The Task Setup
They model the training process as learning two tasks simultaneously:
- Factual Recall: Mapping a subject \(s\) to an answer \(a\) (e.g., China -> Beijing).
- Induction: A copying mechanism where the model learns to repeat patterns (e.g., A… B… A… -> B).

In Figure 7, we see the setup. During inference (bottom row), the model encounters a sequence that triggers both tasks. The subject \(s\) triggers a factual recall, but the sequence structure triggers an induction (context copy).
How Superposition Emerges
The authors prove that when a Transformer is trained on these tasks via gradient descent, it is mathematically efficient for the model to use the same weights to solve both.
They derive the weight construction for the Attention mechanism (\(W_{KQ}\) and \(W_{OV}\)) that minimizes loss:

In Equation (3), specifically the term \(W_{KQ}^{(2)}\), we see components related to both the query token \(q\) (contextual trigger) and the subject \(s\) (parametric trigger).
Because the weight matrix is a sum of these components, the attention head naturally attends to both the parametric signal and the contextual signal. This is the mathematical definition of the CP Superposition.
Why JUICE Works Theoretically
The theory also explains why JUICE succeeds where pruning fails. If the model is in a state where the contextual signal dominates (\(C_{context} > C_{parametric}\)), the output will be the contextual answer.
- Pruning: If you delete the head, you remove both the contextual signal and the parametric signal (since they share the head). The result is noise.
- JUICE: By capturing the activation (which contains the dominant contextual signal) and subtracting a scaled version of it, you dampen the contextual component without destroying the underlying mechanism that retrieves the parametric fact. You are effectively performing vector arithmetic in the activation space to cancel out the unwanted component; a worked toy example follows below.
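To make the vector arithmetic concrete, here is a back-of-the-envelope illustration with made-up numbers (mine, not the paper’s). Suppose a superposed head’s saved output decomposes as

\[H_{saved} = C_{context}\, v_{ctx} + C_{parametric}\, v_{par}, \qquad C_{context} = 5, \quad C_{parametric} = 1.\]

With \(\beta = -0.8\) and \(H_{current} \approx H_{saved}\) (nothing else has changed between the two runs), the steered output is

\[H_{new} = H_{current} + \beta \times H_{saved} \approx (1 - 0.8)\, H_{saved} = 1.0\, v_{ctx} + 0.2\, v_{par}.\]

The head’s contextual push into the residual stream shrinks from 5 to 1, while the parametric signals carried by the untouched components elsewhere in the model are preserved, so the balance at the final logits tips back toward \(a_p\).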
Conclusion
The paper “Taming Knowledge Conflicts in Language Models” offers a crucial correction to our understanding of how LLMs work. We can no longer view “memory” and “context” as separate modules that can be toggled on and off. They are deeply entangled through superposition.
The JUICE method accepts this messy reality. Rather than trying to surgically remove specific capabilities (and failing), it uses the model’s own behavior against itself. By running the model twice, it isolates the direction the model is heading and allows us to steer it back on course—whether that course is strict factual accuracy or faithful adherence to a user’s prompt.
For students and practitioners, this highlights a significant shift in prompt engineering and model control: meaningful intervention often requires understanding the dynamics of inference, not just the static weights.
Key Takeaways
- Superposition is Real: Attention heads encode both memory and context simultaneously.
- Don’t Prune, Steer: Removing heads is destructive. Modifying activations is precise.
- Two Runs are Better than One: Using a “clean” run to generate a steering vector stabilizes the intervention.
- Versatility: JUICE works for both fixing hallucinations (forcing facts) and enforcing instruction following (forcing context).