Introduction
In recent years, a fascinating intersection has emerged between artificial intelligence and neuroscience. Language models (LMs), the technology behind systems like GPT, have demonstrated an uncanny ability to predict human brain activity. When a person reads a book inside an fMRI scanner, the internal activations of an LM processing that same text can be mapped surprisingly well onto the biological signals in the reader’s brain.
This phenomenon has sparked a major debate: Why do they align?
The prevailing hypothesis has been centered on Next-Word Prediction (NWP). Because LMs are trained to guess the next word in a sequence (e.g., “The cat sat on the… mat”), and the human brain is also believed to be a “predictive engine,” many researchers argued that this shared objective is the root cause of the alignment.
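To make the NWP objective concrete, here is a minimal sketch (not code from the paper) that uses the Hugging Face transformers library to ask GPT-2 how likely “mat” is to follow the example context:

```python
# Minimal sketch (not from the paper): querying GPT-2 for its next-word
# prediction on the classic example "The cat sat on the ...".
import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

context = "The cat sat on the"
inputs = tokenizer(context, return_tensors="pt")
with torch.no_grad():
    logits = model(**inputs).logits            # (1, sequence_length, vocab_size)

next_word_probs = logits[0, -1].softmax(dim=-1)
mat_id = tokenizer.encode(" mat")[0]           # GPT-2's token for " mat"
print(f"P(' mat' | '{context}') = {next_word_probs[mat_id].item():.4f}")
```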
However, a new research paper titled “Language models and brains align due to more than next-word prediction and word-level information” challenges this simple narrative. The authors argue that while prediction is important, it is not the whole story. Through a series of clever experiments involving “lobotomizing” and “supercharging” GPT-2 models, they demonstrate that the brain and AI models share deeper representations—specifically related to multi-word context and structure—that exist independently of the ability to predict the next word.
In this post, we will deconstruct their methodology, walk through their mathematical logic, and explore what this means for our understanding of both artificial and biological intelligence.
Background: The Components of Language Processing
To understand the authors’ contribution, we first need to break down the different types of information a brain (or a model) processes when reading a sentence. The researchers categorize this information into three distinct buckets:
- Next-Word Prediction (NWP): The ability to anticipate the upcoming token based on previous context.
- Word-Level Information: The static meaning of a single word, independent of its context. For example, the word “broom” calls to mind a cleaning tool regardless of where it appears in a sentence.
- Multi-Word Information: The meaning derived from the combination and order of words, such as syntax (grammar) and event structure.
The distinction between Word-Level and Multi-Word information is crucial. Consider the example provided by the researchers below:
*Figure 1: the event “Harry throws the broom” versus its scrambled counterpart “The broom throws Harry”.*
As shown in Figure 1, the phrase “Harry throws the broom” contains a specific event structure. If we scramble the words to “The broom throws Harry,” the Word-Level Information (the individual concepts of Harry, throw, and broom) remains identical. However, the Multi-Word Information changes drastically—the scene becomes absurd. Furthermore, the Next-Word Prediction capability collapses because the sequence no longer follows standard English patterns.
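A tiny sanity check makes this distinction concrete. The two sentences are the figure’s example; the bag-of-words comparison is just an illustration:

```python
# Scrambling preserves word-level content but not structure.
original  = "Harry throws the broom"
scrambled = "The broom throws Harry"

# Same bag of words, so word-level information is unchanged...
print(sorted(original.lower().split()) == sorted(scrambled.lower().split()))  # True
# ...but the sequences differ, so multi-word (order/event) information differs.
print(original.split() == scrambled.split())  # False
```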
The Research Gap
Previous studies established that LMs align with the brain. However, because larger LMs improve at everything at once (syntax, semantics, and prediction alike), it has been impossible to say which factor drives the brain alignment. Is it the prediction itself? Or is prediction just a proxy for learning good grammar and structure?
To solve this, the authors devised a methodology based on subtraction and perturbation.
The Core Method: Disentangling the Factors
The researchers used GPT-2 (Small, Medium, and Distilled versions) and a dataset of fMRI recordings from people reading Chapter 9 of Harry Potter and the Sorcerer’s Stone.
Their goal was to isolate the specific contribution of Multi-Word Information to brain alignment. To do this, they needed to control for (remove the influence of) NWP and Word-Level Information. They achieved this using two specific “perturbations” (modifications) to the models.
Perturbation 1: Input Scrambling
The first technique is Input Scrambling. By shuffling the words in the input text window (e.g., 20 words) at inference time, the researchers created a scenario (sketched in code after this list) where:
- Word-Level Information is preserved (the words are the same).
- Multi-Word Information is destroyed (syntax is gone).
- Next-Word Prediction is severely hampered (predicting the next word from word salad is nearly impossible).
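Here is a minimal sketch of what input scrambling could look like (illustrative only; the authors’ implementation details may differ), reusing the figure’s example sentence:

```python
# Minimal sketch of input scrambling (illustrative; not the authors' code):
# shuffle the words inside each context window before feeding it to the model,
# so word identities are preserved while word order (and thus syntax) is lost.
import random

def scramble_window(words, seed=0):
    """Return the same words in a random order."""
    rng = random.Random(seed)
    shuffled = list(words)        # copy so the original window is untouched
    rng.shuffle(shuffled)
    return shuffled

window = "Harry throws the broom".split()
print(" ".join(scramble_window(window)))  # same words, scrambled order
```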
Perturbation 2: Stimulus-Tuning
The second technique is Stimulus-Tuning. This involves fine-tuning the pre-trained GPT-2 model specifically on the text of the Harry Potter story, using a training split that is separate from the test passages (a rough code sketch follows the list below).
- This makes the model an “expert” on the specific narrative.
- It improves the model’s Next-Word Prediction for this specific text.
- It likely improves representations of Multi-Word Information specific to the story.
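As a rough sketch of what stimulus-tuning could look like with the Hugging Face Trainer, consider the following. The file name story_train.txt, the hyperparameters, and the tokenization choices are illustrative assumptions, not the paper’s settings:

```python
# Minimal sketch of stimulus-tuning (illustrative; the paper's hyperparameters
# and data handling differ): continue training pre-trained GPT-2 on the
# training split of the story with the usual language-modeling objective.
from transformers import (GPT2LMHeadModel, GPT2TokenizerFast,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)
from datasets import load_dataset

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token
model = GPT2LMHeadModel.from_pretrained("gpt2")

# "story_train.txt" is a hypothetical file holding the training split of the
# story (distinct from the test passages used for brain alignment).
dataset = load_dataset("text", data_files={"train": "story_train.txt"})["train"]
dataset = dataset.map(lambda ex: tokenizer(ex["text"], truncation=True,
                                           max_length=128), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="gpt2-stimulus-tuned",
                           num_train_epochs=3,
                           per_device_train_batch_size=4,
                           save_strategy="epoch"),  # keep checkpoints to choose from later
    train_dataset=dataset,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Saving a checkpoint per epoch matters because, as described below, the authors pick the specific checkpoint whose prediction behavior lets the contrast logic work.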
The Logic of Contrasts
Here is the elegant part of the paper. The authors treat brain alignment as a mathematical sum of different factors. They set up a system of equations to isolate the variables.
Let’s look at the brain alignment change (\(\Delta\)) for the Baseline model when we scramble the input.
The change in alignment (\(\Delta^{base}\)) between the normal Baseline model and the Scrambled Baseline model is composed of three parts: the change in Word-Level (WL), Next-Word Prediction (NWP), and “Other” factors (denoted by \(*\), representing Multi-Word info).
\[ \Delta^{base} = \Delta_{WL}^{base} + \Delta_{NWP}^{base} + \Delta_{*}^{base} \]
However, remember that scrambling does not change word-level information. The words are the same, just in a different order. Therefore, \(\Delta_{WL}^{base} = 0\). This simplifies the equation:
\[ \Delta^{base} = \Delta_{NWP}^{base} + \Delta_{*}^{base} \]
This tells us that the drop in performance when we scramble the text comes from the loss of prediction ability and the loss of multi-word info. But we still can’t separate them.
The Double Subtraction
To separate them, the authors compare the Baseline model against the Stimulus-Tuned model. They calculate the difference between the Stimulus-Tuned contrast (\(\Delta^{stim}\)) and the Baseline contrast (\(\Delta^{base}\)).
\[ \Delta^{stim} - \Delta^{base} = \left(\Delta_{NWP}^{stim} - \Delta_{NWP}^{base}\right) + \left(\Delta_{*}^{stim} - \Delta_{*}^{base}\right) \]
This looks complicated, but it serves a specific purpose. The researchers deliberately select a checkpoint of the Stimulus-Tuned model where the drop in prediction performance caused by scrambling is roughly the same as the drop observed in the Baseline model.
In other words, they ensure that the “damage” done to the prediction capability by scrambling is equal for both models. If the change in prediction capability is equal, then:
\[ \Delta_{NWP}^{stim} - \Delta_{NWP}^{base} \approx 0 \]
If the NWP components cancel each other out, we are left with the final result:
\[ \Delta^{stim} - \Delta^{base} \approx \Delta_{*}^{stim} - \Delta_{*}^{base} \]
In plain English: By comparing the Stimulus-Tuned model (and its scrambled version) against the Baseline model (and its scrambled version), and mathematically neutralizing the effect of Next-Word Prediction and Word-Level information, any “leftover” brain alignment must be due to the Multi-Word Information.
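The arithmetic itself is simple bookkeeping. The toy numbers below are invented purely to show the subtraction; the real values come from the fMRI encoding analysis described next:

```python
# Toy illustration of the double subtraction (the alignment numbers are made
# up; real values come from the fMRI encoding analysis).
align = {
    ("base", "intact"): 0.30, ("base", "scrambled"): 0.18,
    ("stim", "intact"): 0.36, ("stim", "scrambled"): 0.20,
}

# Contrast per model: alignment lost when the input window is scrambled.
delta_base = align[("base", "intact")] - align[("base", "scrambled")]  # Δ^base
delta_stim = align[("stim", "intact")] - align[("stim", "scrambled")]  # Δ^stim

# The stimulus-tuned checkpoint is chosen so that scrambling hurts its
# next-word prediction by roughly the same amount as the baseline's, so the
# NWP terms cancel below; word-level terms are zero because scrambling never
# changes which words appear.
residual = delta_stim - delta_base  # attributed to multi-word information
print(f"Residual (multi-word) alignment gain: {residual:+.3f}")
```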
Experiments and Results
The authors ran these models and compared their internal representations against the fMRI data of the subjects. They looked at specific Regions of Interest (ROIs) in the brain known for language processing, such as the Inferior Frontal Gyrus (IFG) and the Angular Gyrus (AG).
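The comparison between model activations and fMRI signals is typically done with an encoding model. The sketch below is a minimal, assumption-laden version using ridge regression and per-voxel Pearson correlation; the paper’s actual pipeline (feature extraction, hemodynamic delays, cross-validation scheme) is more involved, and the function and variable names here are hypothetical:

```python
# Minimal sketch of a standard encoding-model analysis, assuming ridge
# regression from LM activations to fMRI voxels (an assumption; see the paper
# for the exact procedure).
import numpy as np
from sklearn.linear_model import RidgeCV
from sklearn.model_selection import KFold

def brain_alignment(lm_features, fmri, n_folds=4):
    """Mean per-voxel Pearson correlation between predicted and real fMRI.

    lm_features: (n_timepoints, n_dims) LM activations aligned to fMRI TRs.
    fmri:        (n_timepoints, n_voxels) recorded responses for one subject.
    """
    scores = []
    for train_idx, test_idx in KFold(n_splits=n_folds).split(lm_features):
        model = RidgeCV(alphas=np.logspace(-1, 4, 10))
        model.fit(lm_features[train_idx], fmri[train_idx])
        pred = model.predict(lm_features[test_idx])
        # Per-voxel correlation between predictions and held-out responses.
        p = (pred - pred.mean(0)) / (pred.std(0) + 1e-8)
        y = (fmri[test_idx] - fmri[test_idx].mean(0)) / (fmri[test_idx].std(0) + 1e-8)
        scores.append((p * y).mean(0))
    return np.mean(scores, axis=0)  # one alignment value per voxel
```

Voxel-wise scores like these can then be averaged within an ROI such as the IFG or AG to produce the region-level alignment values used in the contrasts above.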
1. Stimulus-Tuning Works
First, the researchers confirmed that their perturbations worked as intended.
*Figure 2: next-word prediction error (Panel A) and brain-alignment maps for the Baseline, Stimulus-Tuned, and Scrambled models (Panels C–F).*
In Figure 2 (above), Panel A shows the Next-Word Prediction error (lower is better).
- Stimulus-Tuned (Dark Blue): Performs the best (lowest error). It learned the Harry Potter style well.
- Baseline (Light Blue): Performs typically for GPT-2.
- Scrambled Models (Grey): Perform significantly worse, as expected.
Crucially, Panels C and D show the Brain Alignment.
- Panel D (Stimulus-Tuned): Shows strong alignment (red areas) across the language network.
- Panel C (Baseline): Shows good alignment, but less than the tuned model.
This confirms that fine-tuning on the story helps the model align better with the brain.
2. The Impact of Scrambling
Scrambling the words (Panels E and F in Figure 2) visibly reduces brain alignment. This confirms that the brain cares about order. If the brain only cared about individual words, the scrambled maps would look identical to the unscrambled ones. They don’t. The drop in alignment signifies that structure matters.
3. The Critical Finding: Residual Alignment
Now for the main event. The authors applied the double-subtraction logic derived in the equations above. They asked: Is there any alignment gain left after we subtract the effects of NWP and Word-Level info?
If the “NWP Hypothesis” (that prediction is everything) were true, the result should be zero. There should be no residual alignment.
However, that is not what they found.
*Figure 5: percentage gain in brain alignment not explained by next-word prediction or word-level information, per ROI and model size.*
Figure 5 (above) reveals the answer. The bars represent the “percentage gain” in alignment that cannot be explained by prediction or word-level meaning.
- Look at the bars for IFG (Inferior Frontal Gyrus) and AG (Angular Gyrus).
- They are consistently positive across different model sizes (GPT-2 Small, Medium, Distilled).
This positive residual means that the Stimulus-Tuned model acquired some information—likely related to syntax, events, or narrative structure—that improved its alignment with the brain, independent of its ability to predict the next word.
Qualitative Visualization
We can also look at this spatially. The image below visualizes which voxels (3D pixels of the brain) show this residual alignment.
*Figure 9: voxels showing residual alignment for the Stimulus-Tuned model after controlling for prediction capability.*
In Figure 9 (specifically focusing on the contrast logic explained in the paper), the red areas indicate regions where the Stimulus-Tuned model aligns better with the brain than the Baseline, even after strictly controlling for prediction capabilities. The fact that language regions light up in red confirms that the LMs are capturing high-level linguistic features that the human brain also utilizes.
Discussion and Implications
This paper provides a nuanced correction to the “Predictive Coding” hype. While predicting the next word is undoubtedly a massive part of how both LMs and brains function, it is not the only mechanism driving their similarity.
The Role of IFG and AG
The presence of residual alignment in the Inferior Frontal Gyrus (IFG) and Angular Gyrus (AG) is biologically significant.
- IFG (Broca’s Area): Traditionally associated with syntactic processing and sentence structure.
- AG: Often linked to semantic integration and understanding “events” (who did what to whom).
The fact that the residual alignment was found here suggests that by fine-tuning the model on the text, the model didn’t just get better at guessing the next word—it built a better structural representation of the specific events and sentence structures in Harry Potter, which matched the structural representations in the readers’ brains.
Implications for AI
For AI researchers, this highlights the efficiency of Stimulus-Tuning. Training a general model on a very small amount of specific text (the story being read) significantly increased its brain alignment. This suggests that “contextualizing” a model allows it to tap into multi-word representations that are much more human-like than the generic pre-trained representations.
Conclusion
The question “Do language models think like us?” is far from answered, but this research brings us one step closer to precision.
By mathematically disentangling the variables of language processing, the authors demonstrated that the alignment between GPT-2 and the human brain is not merely a byproduct of predicting the next word. There is a deeper, structural alignment—specifically regarding multi-word information—that binds silicon and biology together.
The brain is not just a prediction engine; it is a structure builder. And it appears that, under the hood, our language models are learning to build those structures too.