In the world of Large Language Models (LLMs), a quiet battle is constantly being waged between two types of memory. On one side, there is the model’s internal training—the facts it memorized during its creation (Parametric Memory). On the other side, there is the new information provided to it in real-time via retrieved documents (Non-parametric Memory).

Imagine asking a model, “Who is the CEO of Company X?” If the model was trained in 2021, its internal memory might say “Alice.” But if a retrieval system fetches a news article from 2024 saying “Bob is the new CEO,” the model faces a conflict. Does it trust what it “knows,” or what it is currently “reading”?

This dynamic is the core of Retrieval-Augmented Generation (RAG). While RAG systems are becoming the industry standard for reliable AI, we know surprisingly little about how these models mechanistically decide to trust context over training.

In the paper “Deciphering the Interplay of Parametric and Non-parametric Memory in Retrieval-augmented Language Models,” researchers from Chalmers University of Technology and the University of Gothenburg crack open the “black box” of the ATLAS model. They use sophisticated causal analysis to map out exactly which neurons and layers activate when a model decides to copy information from a document rather than reciting from memory.

The Core Problem: Who Do You Trust?

Standard generative models (like pure GPT-series models) rely entirely on parametric memory—knowledge stored in their weights. This makes them prone to hallucination on obscure topics or outdated information. RAG models solve this by connecting the generator to a retriever, allowing the model to access non-parametric memory (external documents).

However, having access to information doesn’t guarantee the model will use it. The researchers illustrate this duality with a simple experiment involving the capital of Sweden.

Table 1: Model behavior with different contexts for the question “What is the capital of Sweden?” The table shows the predicted outputs and their probabilities.

As shown in Table 1 above, when the model is given a context stating “In 1634, Milan became the official capital of Sweden,” it faces a choice. The model knows (parametrically) that Stockholm is the capital. But the context says Milan. When the model answers “Milan,” it is suppressing its internal memory in favor of the non-parametric context.

The goal of this research is to understand the neural circuitry behind this decision.

The Methodology: Causal Mediation Analysis

To understand the “why” and “how” of this behavior, the authors employ Causal Mediation Analysis. This technique allows researchers to trace the flow of information through a neural network, much like an electrician tracing a fault in a circuit board.

The idea is to manipulate specific parts of the input or the model’s internal states and observe the change in the output.

Defining the Variables

The researchers define the Total Effect (TE) as the overall change in the model’s behavior when the context is altered.

\[ \mathrm{TE} = Y(X=1) - Y(X=0) \]

Here, \(X\) represents the input condition (e.g., whether the context contains the true answer or a fake “counterfactual” answer). \(Y\) represents the model’s output probability.
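
To make this concrete, here is a minimal sketch of how one could estimate the Total Effect empirically. It uses a small off-the-shelf T5-style generator as a stand-in for ATLAS’s reader and scores answers by log-probability; the model choice and the helper name `answer_log_prob` are my own assumptions, not the paper’s setup.

```python
# Minimal sketch: Total Effect with a small seq2seq stand-in for ATLAS's reader.
# `answer_log_prob` is a hypothetical helper, not the paper's code.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small").eval()

def answer_log_prob(question: str, context: str, answer: str) -> float:
    """Log-probability the generator assigns to `answer` given question + context."""
    inputs = tokenizer(f"question: {question} context: {context}", return_tensors="pt")
    labels = tokenizer(answer, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(**inputs, labels=labels).loss  # mean NLL per answer token
    return -loss.item() * labels.shape[1]

question = "What is the capital of Sweden?"
y_x0 = answer_log_prob(question, "Stockholm is the capital of Sweden.", "Milan")  # X = 0: factual context
y_x1 = answer_log_prob(question, "Milan is the capital of Sweden.", "Milan")      # X = 1: counterfactual context

print(f"TE = {y_x1 - y_x0:.3f}")  # TE = Y(X=1) - Y(X=0)
```

A strongly positive TE here would mean that swapping the answer in the context substantially shifts the generator’s preference toward the counterfactual, which is the kind of behavioral shift the paper quantifies.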

However, knowing the total effect isn’t enough. We want to know which specific component (or “mediator,” \(M\)) is responsible. Is it the Attention mechanism in Layer 5? Is it the MLP in Layer 10? To find this, they calculate the Indirect Effect (IE):

\[ \mathrm{IE} = Y\big(X=0,\; M(X=1)\big) - Y(X=0) \]

This equation essentially asks: “What happens if we keep the input fixed, but force a specific internal component (\(M\)) to behave as if the input were different?” This technique, often called “causal tracing,” allows the authors to pinpoint the exact location where the model processes relevance and copying.
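
Below is a rough sketch of that patching step: cache a mediator’s activation from the counterfactual run, then force it into the factual run and measure how the output moves. The layer index, the choice of the decoder feed-forward sub-block as the mediator, and the stand-in model are all assumptions for illustration; the paper instruments ATLAS itself.

```python
# Rough sketch: Indirect Effect via activation patching on one mediator M.
# Layer choice, sub-module, and stand-in model are assumptions for illustration.
import torch
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")
model = AutoModelForSeq2SeqLM.from_pretrained("google/flan-t5-small").eval()

def run(question, context, answer, module, patch=None):
    """Answer log-prob; optionally overwrite `module`'s output with `patch`."""
    cache = {}

    def hook(_mod, _inp, out):
        hidden = out[0] if isinstance(out, tuple) else out
        cache["act"] = hidden.detach()
        if patch is None:
            return None  # leave the forward pass untouched
        return (patch,) + out[1:] if isinstance(out, tuple) else patch

    handle = module.register_forward_hook(hook)
    inputs = tokenizer(f"question: {question} context: {context}", return_tensors="pt")
    labels = tokenizer(answer, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(**inputs, labels=labels).loss
    handle.remove()
    return -loss.item() * labels.shape[1], cache["act"]

# Mediator M: the feed-forward (MLP) sub-block of a middle decoder layer.
mid_layer = len(model.decoder.block) // 2
mediator = model.decoder.block[mid_layer].layer[-1]

q, ans = "What is the capital of Sweden?", "Milan"
y_clean, _   = run(q, "Stockholm is the capital of Sweden.", ans, mediator)                # X = 0
_, act_cf    = run(q, "Milan is the capital of Sweden.", ans, mediator)                    # X = 1, cache M(X=1)
y_patched, _ = run(q, "Stockholm is the capital of Sweden.", ans, mediator, patch=act_cf)  # X = 0, M forced to M(X=1)

print(f"IE = {y_patched - y_clean:.3f}")  # IE = Y(X=0, M(X=1)) - Y(X=0)
```

Because the answer tokens are identical across runs, the cached decoder activation has the same shape in both conditions, which is what makes the swap well-defined.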

The Measure of Trust

To quantify how much the model prefers the context over its internal memory, the researchers look at the ratio of probabilities between a “counterfactual” answer (fake news provided in context) and the “true” answer (internal fact).

\[ Y = \log \frac{P(\text{counterfactual answer})}{P(\text{true answer})} \]

If this value is high, the model is trusting the context (the counterfactual). If it is low, the model is sticking to its guns (parametric memory).
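
As a quick illustration, this trust score can be computed with the hypothetical `answer_log_prob` helper and stand-in model from the Total Effect sketch above (the log of the ratio is just a difference of log-probabilities):

```python
# Context-trust score: log P(counterfactual) - log P(true answer),
# reusing `answer_log_prob` and the stand-in model from the TE sketch above.
question = "What is the capital of Sweden?"
context = "In 1634, Milan became the official capital of Sweden."

y = answer_log_prob(question, context, "Milan") - answer_log_prob(question, context, "Stockholm")

# y > 0: the model leans on the retrieved context (non-parametric memory);
# y < 0: it sticks with its parametric answer.
print(f"Y = {y:.3f}")
```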

Designing the Trap: Counterfactual Experiments

To isolate these mechanisms, the researchers devised two distinct experiments using the ATLAS model. They utilized synthetic templates to ensure total control over the input.

Table 2: List of queries built using synthetic context templates.

Table 2 lists the templates used. For example, a real fact (“Rome is the capital of Italy”) is replaced with a counterfactual (“Tehran is the capital of Italy”).
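
Here is a minimal sketch of how such counterfactual contexts can be instantiated from a template; the template string and entity choices below are illustrative, not the paper’s exact set from Table 2.

```python
# Illustrative template instantiation: swap the object (Experiment 1)
# or the subject (Experiment 2) to build counterfactual contexts.
TEMPLATE = "{obj} is the capital of {subj}."
fact = {"subj": "Italy", "obj": "Rome"}

factual_ctx         = TEMPLATE.format(**fact)                           # "Rome is the capital of Italy."
object_swapped_ctx  = TEMPLATE.format(subj=fact["subj"], obj="Tehran")  # counterfactual object (Exp. 1)
subject_swapped_ctx = TEMPLATE.format(subj="Iran", obj=fact["obj"])     # corrupted subject (Exp. 2)

question = f"What is the capital of {fact['subj']}?"
for ctx in (factual_ctx, object_swapped_ctx, subject_swapped_ctx):
    print(f"Q: {question}  |  context: {ctx}")
```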

Visualizing the Experiments

The experimental design is beautifully visualized in Figure 1 below.

Figure 1: Schematic of the experimental setup showing corruption and restoration runs in the transformer architecture.

  • Experiment 1 (Copying Behavior): This investigates how the model copies. They provide a context where the object (the answer) is swapped. In the top row (a and b), they check how the representation of the answer flows through the layers.
  • Experiment 2 (Relevance Evaluation): This investigates why the model copies. In the bottom row (c and d), they corrupt the Subject or Relation tokens (e.g., changing “Iran” to “Rome” in the context) to see if the model stops trusting the context because it no longer matches the question.

Key Findings

The results of these experiments paint a detailed picture of the RAG “thought process.”

1. The Model is a Compulsive Copier

First, the researchers established the baseline behavior. When presented with a context that contradicts its internal knowledge, what does ATLAS do?

Figure 3: Violin plots showing TE distribution across parametric and non-parametric behaviors.

Figure 3 shows the Total Effect (TE). The large orange hump indicates that the “General” behavior of the model is strongly shifted toward the counterfactual. In simple terms: when the model sees an answer in the context, it almost always prefers to copy it rather than relying on its internal memory. The non-parametric mechanism is dominant.

2. The Mechanism of Copying: It’s All About the Object

So, the model decides to copy. Which tokens in the context passage are doing the heavy lifting?

Using Causal Tracing, the authors generated heatmaps showing which tokens contribute most to the output.

Figure 2: Heatmaps demonstrating AIE results for copying behavior (a-c) and relevance (d-i).

Look at the top row of Figure 2 (a-c). These heatmaps show the “Copying” experiment. The vertical axis represents the token positions.

  • The Result: The bright red hotspots are concentrated almost entirely on the Object Tokens (the actual words being copied).
  • The Mechanism: The model identifies the answer tokens in the context and propagates them through the layers. The rest of the context (subjects, relations) has very little “Indirect Effect” during the actual copying phase.
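
Reading (or reproducing) heatmaps like these requires knowing which positions in the tokenized context are the object tokens. A small, hypothetical way to locate them with the same stand-in tokenizer used in the earlier sketches:

```python
# Hypothetical helper: find which token positions in the context hold the object,
# using the same SentencePiece tokenizer as the earlier sketches.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/flan-t5-small")

context = "In 1634, Milan became the official capital of Sweden."
obj = "Milan"

ctx_ids = tokenizer(context, add_special_tokens=False).input_ids
obj_ids = tokenizer(obj, add_special_tokens=False).input_ids

# Slide the object's token ids across the context ids and record matches.
object_positions = [
    list(range(i, i + len(obj_ids)))
    for i in range(len(ctx_ids) - len(obj_ids) + 1)
    if ctx_ids[i : i + len(obj_ids)] == obj_ids
]
print(object_positions)  # the positions where the copying heatmaps light up
```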

3. The Mechanism of Relevance: Checking the Subject

If the copying mechanism only cares about the object, how does the model ensure it isn’t copying a random word? This is where Experiment 2 comes in.

Look at the bottom two rows of Figure 2 (d-i). Here, the researchers messed with the Subject and Relation tokens.

  • Early Layers (Low Layers): You can see red spots appearing early in the network for Subject and Relation tokens. This indicates that Relevance Evaluation happens first. The model scans the subject and relation (“Capital of…” and “Sweden”) in the early layers to confirm, “Yes, this sentence is actually answering my question.”
  • Late Layers: Once relevance is established, the focus shifts to the object tokens in the later layers for extraction.

The importance of Subject vs. Relation tokens is further broken down in Figure 5.

Figure 5: TE distribution across subjects and relation tokens.

While both are important, the statistical analysis suggests that Subject tokens (blue) have a slightly higher impact on the decision to trust the context than relation tokens. If the subject doesn’t match, the model stops listening.

4. The Role of MLP vs. Attention

Perhaps the most technical and fascinating finding is the specific roles of the Transformer components: the Multi-Layer Perceptron (MLP) and Attention heads.

Figure 4: Bar charts illustrating the impact of MLP and Attention across different layers.

Figure 4 breaks down the impact by layer and component.

  • MLP as the Translator (Green Bars): In the middle layers (layers 4-8), the MLP plays a massive role (see charts a, b, c). The authors hypothesize that the MLP is responsible for “translating” the tokens from the retrieved context (Encoder space) into a format that the generator (Decoder) can use. It acts as the bridge between “reading” and “speaking.”
  • Attention as the Coordinator (Red Bars): Attention becomes more relevant in the later layers, likely ensuring that the copied answer remains coherent with the rest of the sentence structure.
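
Building on the `run` harness from the Indirect Effect sketch, one could compare mediators by patching different sub-blocks of the same decoder layer. The sub-module indices below follow T5’s decoder layout and are an assumption about the stand-in model, not ATLAS’s exact internals.

```python
# Compare mediators by patching different sub-blocks of the same decoder layer,
# reusing `run`, `model`, and `mid_layer` from the Indirect Effect sketch.
# Indices follow T5's decoder layout (an assumption; ATLAS may differ).
sub_blocks = model.decoder.block[mid_layer].layer
mediators = {
    "self-attention":  sub_blocks[0],   # T5LayerSelfAttention
    "cross-attention": sub_blocks[1],   # T5LayerCrossAttention (reads the encoded context)
    "mlp":             sub_blocks[-1],  # T5LayerFF
}

q, ans = "What is the capital of Sweden?", "Milan"
for name, module in mediators.items():
    y_clean, _   = run(q, "Stockholm is the capital of Sweden.", ans, module)
    _, act_cf    = run(q, "Milan is the capital of Sweden.", ans, module)
    y_patched, _ = run(q, "Stockholm is the capital of Sweden.", ans, module, patch=act_cf)
    print(f"{name:>15}: IE = {y_patched - y_clean:.3f}")
```

Sweeping this comparison over all layers is, in spirit, how per-layer, per-component plots like Figure 4 are assembled.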

Real-World Validation

Critics might argue that these findings only apply to the synthetic templates used in Table 2. To address this, the authors ran the same analysis on real retrieved documents fetched by ATLAS’s own retriever.

Figure 6: AIE results of copying behavior for actual documents retrieved by ATLAS.

As seen in Figure 6, the patterns hold up. The heatmaps for real documents look remarkably similar to the synthetic ones. The object tokens (a) dominate the copying phase, while the subject tokens (d) trigger the relevance check. This confirms that the mechanisms discovered are fundamental to how the model operates, not just artifacts of the experiment.

Conclusion: Anatomy of a RAG Decision

This paper provides a blueprint for how Retrieval-Augmented models “think.” It turns out that answering a question using a document isn’t a single step; it’s a multi-stage cognitive process:

  1. Relevance Check (Early Layers): The model uses Subject and Relation tokens to verify that the retrieved text actually addresses the user’s query. This happens largely in the MLP blocks of the early layers.
  2. Object Extraction (Mid-to-Late Layers): Once validated, the model focuses intensely on the Object tokens (the answer).
  3. Translation (Mid Layers): The MLPs translate these representations from the encoder to the decoder.
  4. Generation: The model copies the answer into the output, suppressing its own parametric memory.

Understanding this interplay is crucial. It tells us that RAG models are highly sensitive to context—perhaps too sensitive. By dissecting these mechanisms, we can begin to design models that are better at discerning truth from noise, rather than blindly copying whatever “Milan” or “Rome” is placed in front of them.