In the world of Large Language Models (LLMs), a quiet battle is constantly being waged between two types of memory. On one side is the model’s internal knowledge: the facts it memorized in its weights during training (Parametric Memory). On the other is the new information provided to it in real time via retrieved documents (Non-parametric Memory).
Imagine asking a model, “Who is the CEO of Company X?” If the model was trained in 2021, its internal memory might say “Alice.” But if a retrieval system fetches a news article from 2024 saying “Bob is the new CEO,” the model faces a conflict. Does it trust what it “knows,” or what it is currently “reading”?
This dynamic is the core of Retrieval-Augmented Generation (RAG). While RAG systems are becoming the industry standard for reliable AI, we know surprisingly little about the mechanisms by which these models decide to trust context over training.
In the paper “Deciphering the Interplay of Parametric and Non-parametric Memory in Retrieval-augmented Language Models,” researchers from Chalmers University of Technology and the University of Gothenburg crack open the “black box” of the ATLAS model. They use sophisticated causal analysis to map out exactly which neurons and layers activate when a model decides to copy information from a document rather than reciting from memory.
The Core Problem: Who Do You Trust?
Standard generative models (like pure GPT-series models) rely entirely on parametric memory—knowledge stored in their weights. This makes them prone to hallucination on obscure topics or outdated information. RAG models solve this by connecting the generator to a retriever, allowing the model to access non-parametric memory (external documents).
However, having access to information doesn’t guarantee the model will use it. The researchers illustrate this duality with a simple experiment involving the capital of Sweden.

As shown in Table 1 above, when the model is given a context stating “In 1634, Milan became the official capital of Sweden,” it faces a choice. The model knows (parametrically) that Stockholm is the capital. But the context says Milan. When the model answers “Milan,” it is suppressing its internal memory in favor of the non-parametric context.
The goal of this research is to understand the neural circuitry behind this decision.
The Methodology: Causal Mediation Analysis
To understand the “why” and “how” of this behavior, the authors employ Causal Mediation Analysis. This technique allows researchers to trace the flow of information through a neural network, much like an electrician tracing a fault in a circuit board.
The idea is to manipulate specific parts of the input or the model’s internal states and observe the change in the output.
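As a concrete (if simplified) illustration of this kind of intervention, here is a minimal sketch: cache one encoder layer’s hidden state from a run on one context, force it back in during a run on a different context, and compare the answer’s log-probability. It uses t5-small from Hugging Face transformers as a stand-in for ATLAS (which is also T5-based); the layer index and prompts are illustrative, and the two contexts are assumed to tokenize to the same length so the patched state fits.

```python
# Minimal activation-patching sketch (not the paper's code).
# Assumptions: t5-small stands in for ATLAS; module names follow Hugging Face's T5.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small").eval()

LAYER = 3                              # hypothetical mediator: encoder block 3
block = model.encoder.block[LAYER]

def answer_logprob(context, question, answer):
    """Mean log-probability of `answer` given the prompt."""
    enc = tok(f"context: {context} question: {question}", return_tensors="pt")
    labels = tok(answer, return_tensors="pt").input_ids
    with torch.no_grad():
        return -model(**enc, labels=labels).loss.item()

clean_ctx = "Milan is the capital of Sweden."    # context carrying the counterfactual answer
corrupt_ctx = "Paris is the capital of Sweden."  # altered context
question, answer = "What is the capital of Sweden?", "Milan"
# Both contexts must tokenize to the same length for the patch to fit.
assert len(tok(clean_ctx)["input_ids"]) == len(tok(corrupt_ctx)["input_ids"])

cache = {}
# 1) Clean run: remember the mediator's output (T5 blocks return a tuple; [0] is the hidden states).
handle = block.register_forward_hook(lambda m, i, o: cache.__setitem__("h", o[0].detach()))
clean_score = answer_logprob(clean_ctx, question, answer)
handle.remove()

# 2) Corrupted run, but with the clean hidden state forced back into the mediator.
handle = block.register_forward_hook(lambda m, i, o: (cache["h"],) + o[1:])
patched_score = answer_logprob(corrupt_ctx, question, answer)
handle.remove()

print(f"clean: {clean_score:.3f}   patched corrupted run: {patched_score:.3f}")
```

The gap between the two scores is exactly the kind of quantity that the effect measures below formalize.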
Defining the Variables
The researchers define the Total Effect (TE) as the overall change in the model’s behavior when the context is altered.
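In standard causal-mediation notation, the verbal definition above corresponds to something like the following (a generic reconstruction; the paper’s exact symbols may differ):

\[
\mathrm{TE} = \mathbb{E}\big[\,Y \mid do(X = x')\,\big] - \mathbb{E}\big[\,Y \mid do(X = x)\,\big]
\]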

Here, \(X\) represents the input condition (e.g., whether the context contains the true answer or a fake “counterfactual” answer). \(Y\) represents the model’s output probability.
However, knowing the total effect isn’t enough. We want to know which specific component (or “mediator,” \(M\)) is responsible. Is it the Attention mechanism in Layer 5? Is it the MLP in Layer 10? To find this, they calculate the Indirect Effect (IE):
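In the same generic notation (again a reconstruction rather than the paper’s exact formulation), with \(M_{x'}\) denoting the value the mediator would take under the altered input \(x'\):

\[
\mathrm{IE} = \mathbb{E}\big[\,Y_{x,\,M_{x'}}\,\big] - \mathbb{E}\big[\,Y_{x}\,\big]
\]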

This equation essentially asks: “What happens if we keep the input fixed, but force a specific internal component (\(M\)) to behave as if the input were different?” This technique, often called “causal tracing,” allows the authors to pinpoint the exact location where the model processes relevance and copying.
The Measure of Trust
To quantify how much the model prefers the context over its internal memory, the researchers look at the ratio of probabilities between a “counterfactual” answer (fake news provided in context) and the “true” answer (internal fact).
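Written out in generic notation (the paper’s exact formulation may differ), the quantity of interest is roughly:

\[
\text{preference for context} = \frac{P(\text{counterfactual answer} \mid \text{question}, \text{context})}{P(\text{true answer} \mid \text{question}, \text{context})}
\]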

If this value is high, the model is trusting the context (the counterfactual). If it is low, the model is sticking to its guns (parametric memory).
Designing the Trap: Counterfactual Experiments
To isolate these mechanisms, the researchers devised two distinct experiments using the ATLAS model. They utilized synthetic templates to ensure total control over the input.

Table 2 lists the templates used; for example, a real fact (“Rome is the capital of Italy”) is replaced with a counterfactual (“Tehran is the capital of Italy”).
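A toy version of this construction is easy to write down. The template wording and fact list below are illustrative stand-ins, not the actual Table 2 entries:

```python
# Toy counterfactual-context builder (illustrative; not the paper's Table 2).
import random

random.seed(0)
TEMPLATE = "{obj} is the capital of {subject}."
FACTS = [
    {"subject": "Italy", "obj": "Rome"},
    {"subject": "Sweden", "obj": "Stockholm"},
    {"subject": "France", "obj": "Paris"},
]

def make_pair(fact, facts):
    """Return a (factual, counterfactual) context pair for one fact."""
    factual = TEMPLATE.format(**fact)
    # Swap in the object of a *different* fact, so the statement is false
    # but still grammatical and of the right type (a city).
    other = random.choice([f for f in facts if f["obj"] != fact["obj"]])
    counterfactual = TEMPLATE.format(subject=fact["subject"], obj=other["obj"])
    return factual, counterfactual

for fact in FACTS:
    factual, counterfactual = make_pair(fact, FACTS)
    print(f"factual:        {factual}")
    print(f"counterfactual: {counterfactual}")
```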
Visualizing the Experiments
The experimental design is beautifully visualized in Figure 1 below.

- Experiment 1 (Copying Behavior): This investigates how the model copies. They provide a context where the object (the answer) is swapped. In the top row (a and b), they check how the representation of the answer flows through the layers.
- Experiment 2 (Relevance Evaluation): This investigates why the model copies. In the bottom row (c and d), they corrupt the Subject or Relation tokens (e.g., changing “Iran” to “Rome” in the context) to see if the model stops trusting the context because it no longer matches the question.
Key Findings
The results of these experiments paint a detailed picture of the RAG “thought process.”
1. The Model is a Compulsive Copier
First, the researchers established the baseline behavior. When presented with a context that contradicts its internal knowledge, what does ATLAS do?

Figure 3 shows the Total Effect (TE). The large orange hump indicates that the “General” behavior of the model is strongly shifted toward the counterfactual. In simple terms: when the model sees an answer in the context, it almost always prefers to copy it rather than relying on its internal memory. The non-parametric mechanism is dominant.
2. The Mechanism of Copying: It’s All About the Object
So, the model decides to copy. Which tokens in the context passage are doing the heavy lifting?
Using Causal Tracing, the authors generated heatmaps showing which tokens contribute most to the output.
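To give a feel for what producing such a heatmap involves, here is a rough sketch in the spirit of causal tracing, once more using t5-small from Hugging Face transformers as a stand-in for ATLAS: for every (encoder layer, token position) pair, the clean hidden state is restored inside a corrupted run and the recovery in the answer’s log-probability is recorded. The prompts, granularity, and plotting choices are illustrative, not the paper’s setup.

```python
# Token-by-layer indirect-effect heatmap, sketched with t5-small (not the paper's code).
import torch
import matplotlib.pyplot as plt
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small").eval()

def answer_logprob(input_ids, attention_mask, answer_ids):
    with torch.no_grad():
        out = model(input_ids=input_ids, attention_mask=attention_mask, labels=answer_ids)
    return -out.loss.item()

PROMPT = "context: {} question: What is the capital of Sweden?"
clean = tok(PROMPT.format("Milan is the capital of Sweden."), return_tensors="pt")
corrupt = tok(PROMPT.format("Paris is the capital of Sweden."), return_tensors="pt")
answer_ids = tok("Milan", return_tensors="pt").input_ids
assert clean.input_ids.shape == corrupt.input_ids.shape, "positions must line up for patching"

n_layers, n_tokens = len(model.encoder.block), clean.input_ids.shape[1]

# 1) Cache the clean hidden states of every encoder layer.
clean_states = {}
hooks = [model.encoder.block[l].register_forward_hook(
             lambda m, i, o, l=l: clean_states.__setitem__(l, o[0].detach()))
         for l in range(n_layers)]
answer_logprob(clean.input_ids, clean.attention_mask, answer_ids)
for h in hooks:
    h.remove()

# 2) Corrupted baseline.
base = answer_logprob(corrupt.input_ids, corrupt.attention_mask, answer_ids)

# 3) Restore one (layer, position) at a time and record how much log-probability it recovers.
heat = torch.zeros(n_layers, n_tokens)
for l in range(n_layers):
    for p in range(n_tokens):
        def patch(m, i, o, l=l, p=p):
            h = o[0].clone()
            h[:, p] = clean_states[l][:, p]
            return (h,) + o[1:]
        hook = model.encoder.block[l].register_forward_hook(patch)
        heat[l, p] = answer_logprob(corrupt.input_ids, corrupt.attention_mask, answer_ids) - base
        hook.remove()

plt.imshow(heat.numpy(), aspect="auto", cmap="Reds")
plt.xlabel("context token position")
plt.ylabel("encoder layer")
plt.title("Indirect effect of restoring clean states (sketch)")
plt.colorbar()
plt.show()
```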

Look at the top row of Figure 2 (a-c). These heatmaps show the “Copying” experiment. The vertical axis represents the token positions.
- The Result: The bright red hotspots are concentrated almost entirely on the Object Tokens (the actual words being copied).
- The Mechanism: The model identifies the answer tokens in the context and propagates them through the layers. The rest of the context (subjects, relations) has very little “Indirect Effect” during the actual copying phase.
3. The Mechanism of Relevance: Checking the Subject
If the copying mechanism only cares about the object, how does the model ensure it isn’t copying a random word? This is where Experiment 2 comes in.
Look at the bottom two rows of Figure 2 (d-i). Here, the researchers corrupted the Subject and Relation tokens.
- Early Layers (Low Layers): You can see red spots appearing early in the network for Subject and Relation tokens. This indicates that Relevance Evaluation happens first. The model scans the subject and relation (“Capital of…” and “Sweden”) in the early layers to confirm, “Yes, this sentence is actually answering my question.”
- Late Layers: Once relevance is established, the focus shifts to the object tokens in the later layers for extraction.
The importance of Subject vs. Relation tokens is further broken down in Figure 5.

While both are important, the statistical analysis suggests that Subject tokens (blue) have a slightly higher impact on the decision to trust the context than Relation tokens. If the subject doesn’t match, the model stops listening.
4. The Role of MLP vs. Attention
Perhaps the most technical and fascinating finding is the specific roles of the Transformer components: the Multi-Layer Perceptron (MLP) and Attention heads.

Figure 4 breaks down the impact by layer and component.
- MLP as the Translator (Green Bars): In the middle layers (layers 4-8), the MLP plays a massive role (see charts a, b, c). The authors hypothesize that the MLP is responsible for “translating” the tokens from the retrieved context (Encoder space) into a format that the generator (Decoder) can use. It acts as the bridge between “reading” and “speaking.”
- Attention as the Coordinator (Red Bars): Attention becomes more relevant in the later layers, likely ensuring that the copied answer remains coherent with the rest of the sentence structure.
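As an illustration of how a component-level breakdown like this can be obtained, the sketch below patches the attention sub-layer and the feed-forward (MLP) sub-layer of a single encoder block separately, again with t5-small standing in for ATLAS. The layer index, prompts, and module layout (Hugging Face’s T5 conventions, where an encoder block is [self-attention, feed-forward]) are assumptions, not the paper’s code.

```python
# Attention-vs-MLP patching sketch with t5-small (illustrative, not the paper's code).
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tok = AutoTokenizer.from_pretrained("t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small").eval()

LAYER = 4  # hypothetical middle layer
SUBLAYERS = {
    "attention": model.encoder.block[LAYER].layer[0],  # T5LayerSelfAttention
    "mlp":       model.encoder.block[LAYER].layer[1],  # T5LayerFF
}

CLEAN = "context: Milan is the capital of Sweden. question: What is the capital of Sweden?"
CORRUPT = "context: Paris is the capital of Sweden. question: What is the capital of Sweden?"
ANSWER = "Milan"
assert len(tok(CLEAN)["input_ids"]) == len(tok(CORRUPT)["input_ids"])

def logprob(text, answer=ANSWER):
    enc = tok(text, return_tensors="pt")
    labels = tok(answer, return_tensors="pt").input_ids
    with torch.no_grad():
        return -model(**enc, labels=labels).loss.item()

def hidden_of(out):             # sub-layers may return a tensor or a tuple
    return out[0] if isinstance(out, tuple) else out

def with_hidden(out, hidden):
    return (hidden,) + out[1:] if isinstance(out, tuple) else hidden

baseline = logprob(CORRUPT)
for name, sublayer in SUBLAYERS.items():
    cache = {}
    # 1) Clean run: remember this sub-layer's output.
    h = sublayer.register_forward_hook(lambda m, i, o: cache.__setitem__("h", hidden_of(o).detach()))
    logprob(CLEAN)
    h.remove()
    # 2) Corrupted run with the sub-layer forced back to its clean output.
    h = sublayer.register_forward_hook(lambda m, i, o: with_hidden(o, cache["h"]))
    patched = logprob(CORRUPT)
    h.remove()
    print(f"{name:9s} indirect effect (log-prob recovery): {patched - baseline:+.3f}")
```

Comparing the two recoveries, layer by layer, is what produces the green (MLP) and red (Attention) bars discussed above.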
Real-World Validation
Critics might argue that these findings only apply to the synthetic templates used in Table 2. To address this, the authors ran the same analysis on real retrieved documents fetched by ATLAS’s own retriever.

As seen in Figure 6, the patterns hold up. The heatmaps for real documents look remarkably similar to the synthetic ones. The object tokens (a) dominate the copying phase, while the subject tokens (d) trigger the relevance check. This confirms that the mechanisms discovered are fundamental to how the model operates, not just artifacts of the experiment.
Conclusion: Anatomy of a RAG Decision
This paper provides a blueprint for how Retrieval-Augmented models “think.” It turns out that answering a question using a document isn’t a single step; it’s a multi-stage cognitive process:
- Relevance Check (Early Layers): The model uses Subject and Relation tokens to verify that the retrieved text actually addresses the user’s query. This happens largely in the MLP blocks of the early layers.
- Object Extraction (Mid-to-Late Layers): Once validated, the model focuses intensely on the Object tokens (the answer).
- Translation (Mid Layers): The MLPs translate these representations from the encoder to the decoder.
- Generation: The model copies the answer into the output, suppressing its own parametric memory.
Understanding this interplay is crucial. It tells us that RAG models are highly sensitive to context—perhaps too sensitive. By dissecting these mechanisms, we can begin to design models that are better at discerning truth from noise, rather than blindly copying whatever “Milan” or “Rome” is placed in front of them.