Large Language Models (LLMs) often feel indistinguishable from magic. They can write poetry, code in Python, and summarize history. Yet, for all their prowess, they frequently stumble on questions that require simple, sequential logic—what researchers call “multi-hop queries.”
Consider the question: “The spouse of the performer of Imagine is…”
For a human, this is a straightforward two-step process.
- Hop 1: Who performed Imagine? -> John Lennon.
- Hop 2: Who is the spouse of John Lennon? -> Yoko Ono.
Ideally, an LLM should perform this exact sequence internally. However, models often get this wrong. They might hallucinate an answer or simply fail to make the connection, despite “knowing” both individual facts (that Lennon sang Imagine and that Yoko Ono is his spouse).
In a fascinating research paper titled “Hopping Too Late,” researchers from Tel Aviv University, UCL, and Google Research investigate why this happens. By peering inside the “brain” of Transformer models, they discovered a rigid internal timeline. Their findings suggest that LLMs fail not because they lack knowledge, but because they run out of “time” (or specifically, layers) to process it.
In this deep dive, we will explore their methodology, the “latent reasoning pathway” they discovered, and a novel technique called “back-patching” that proves we can fix these errors by rewinding the model’s internal clock.
The Anatomy of a Multi-Hop Query
To understand the solution, we first need to understand the problem. The researchers focused on Two-Hop Queries. These are questions composed of two facts that share a bridge entity.
- Fact 1: (Imagine, Performer, John Lennon)
- Fact 2: (John Lennon, Spouse, Yoko Ono)
In this scenario:
- Source Entity (\(e_1\)): Imagine
- Bridge Entity (\(e_2\)): John Lennon
- Target Entity (\(e_3\)): Yoko Ono
The query given to the model is: “The spouse of the performer of Imagine is”.
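To make this structure concrete, here is a minimal sketch (not the authors' dataset code) of how a two-hop query can be represented: two fact triples chained through the bridge entity, plus a prompt that mentions only the source entity.

```python
from typing import NamedTuple

class Fact(NamedTuple):
    subject: str
    relation: str
    obj: str

class TwoHopQuery(NamedTuple):
    hop1: Fact    # (e1, relation 1, e2)
    hop2: Fact    # (e2, relation 2, e3)
    prompt: str   # mentions e1 only; the expected answer is e3

# The paper's running example.
query = TwoHopQuery(
    hop1=Fact("Imagine", "performer", "John Lennon"),
    hop2=Fact("John Lennon", "spouse", "Yoko Ono"),
    prompt="The spouse of the performer of Imagine is",
)

# The bridge entity e2 is what links the two facts.
assert query.hop1.obj == query.hop2.subject
```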
The researchers curated a new dataset of over 82,000 such queries based on Wikidata to test various models, including LLaMA 2, LLaMA 3, and Pythia. They filtered the dataset to ensure they were testing actual reasoning, removing cases where the model might just guess the answer based on popularity or simple word association.
The Hypothesis: Latent Reasoning
The core hypothesis is that the model performs latent multi-hop reasoning. This means the model doesn’t output the name “John Lennon” in text; it “thinks” of John Lennon internally before moving on to Yoko Ono.
If this hypothesis is true, we should be able to find a mathematical representation (a vector) of “John Lennon” somewhere inside the model’s neural network while it is processing the sentence.

As illustrated in Figure 1, the researchers propose a specific pathway:
- Resolution of Hop 1: The model realizes “performer of Imagine” refers to John Lennon. This should happen early, near the token “Imagine.”
- Propagation: This information travels through the network layers to the end of the sentence.
- Resolution of Hop 2: The model uses the “John Lennon” concept to find the spouse, arriving at “Yoko Ono” at the final token.
Viewing the Internal Monologue: The “Patchscopes” Method
How do you read a model’s mind? If you look at the raw numbers (hidden states) inside a Transformer, they look like random noise.
Previous research often used “vocabulary projections”—essentially checking which word from the dictionary matches the current hidden state best. However, this is a blunt instrument. A hidden state might contain a complex concept that doesn’t map perfectly to a single word yet.
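For reference, a vocabulary projection is easy to sketch: take an intermediate hidden state, apply the model's final normalization and unembedding head, and read off the top tokens. The snippet below is an illustrative sketch, assuming a LLaMA-style model loaded via Hugging Face transformers; the model name, the probed layer, and module paths like `model.model.norm` are assumptions, not the paper's code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-13b-hf"   # assumption: any LLaMA-style causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

prompt = "The spouse of the performer of Imagine is"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    out = model(**inputs, output_hidden_states=True)

layer, pos = 15, -1                    # probe layer 15 at the final token
h = out.hidden_states[layer][0, pos]   # hidden_states[l] = residual stream after l layers
h = model.model.norm(h)                # LLaMA-specific: final RMSNorm before unembedding
logits = model.lm_head(h)              # project the hidden state onto the vocabulary
print(tok.convert_ids_to_tokens(logits.topk(5).indices.tolist()))
```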
The authors of this paper employed a more advanced technique called Patchscopes.
Here is the simplified intuition behind Patchscopes: Imagine the model has a thought vector \(v\) at a specific layer. We want to know what that thought represents. We take that vector \(v\) and “patch” (paste) it into a different prompt designed to extract definitions, like “x is…”. We then let the model generate text. If the model completes the sentence with “John Lennon is an English singer…”, we know the vector \(v\) encoded the concept of John Lennon.
This allows the researchers to decode the “latent” (hidden) thoughts of the model layer by layer.
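A bare-bones version of that idea might look like the following. This is a sketch of the general Patchscopes recipe, not the authors' implementation: capture a hidden state from the source prompt, overwrite the hidden state of a placeholder token in an inspection prompt like "x is", and let the model generate. The model name, token positions, and hook details are assumptions and may need adjusting for other architectures or library versions.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-13b-hf"   # assumption: any LLaMA-style causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def get_hidden(prompt: str, layer: int, pos: int) -> torch.Tensor:
    """Hidden state at (layer, position); hidden_states[l] = residual stream after l layers."""
    inputs = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    return out.hidden_states[layer][0, pos]

def patchscope(vector: torch.Tensor, layer: int, max_new_tokens: int = 15) -> str:
    """Paste `vector` over the 'x' placeholder of an inspection prompt and generate."""
    insp = tok("x is", return_tensors="pt")
    patch_pos = 1   # assumption: index of the 'x' token (position 0 is the BOS token)

    def hook(module, args, output):
        hidden = output[0]
        if hidden.shape[1] > patch_pos:   # patch only the initial (prefill) pass
            hidden[:, patch_pos] = vector.to(hidden.dtype)
        return (hidden,) + output[1:]

    handle = model.model.layers[layer].register_forward_hook(hook)
    try:
        gen = model.generate(**insp, max_new_tokens=max_new_tokens, do_sample=False)
    finally:
        handle.remove()
    return tok.decode(gen[0], skip_special_tokens=True)

# Decode what the model "thinks" at the token for "Imagine" in an early layer.
src_prompt = "The spouse of the performer of Imagine is"
imagine_pos = 7   # assumption: index of the "Imagine" token under this tokenizer
v = get_hidden(src_prompt, layer=10, pos=imagine_pos)
print(patchscope(v, layer=10))   # hope: "John Lennon is an English singer ..."
```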
The Discovery: A Sequential Assembly Line
Using Patchscopes, the researchers analyzed where the Bridge Entity (\(e_2\), e.g., John Lennon) and the Target Entity (\(e_3\), e.g., Yoko Ono) appeared in the model’s layers.
Recall that a Transformer processes text through a stack of layers (roughly 30 to 80 of them, depending on model size), numbered from the bottom up. Information flows upward from layer to layer.
1. The First Hop Happens Early
The researchers probed the hidden states at the position of the first entity (the token for “Imagine”, denoted as \(t_1\)).
They found that the Bridge Entity (John Lennon) appears in the early layers of the model.

Figure 2 shows this clearly. Look at the blue line. It represents the decoding of the Bridge Entity (\(e_2\)) from the first token position (\(t_1\)). You can see it peaks in the early-to-mid layers (around layers 5–15 for LLaMA 2 13B).
This confirms that while the model is reading the word “Imagine,” its lower layers are already resolving the logic to “John Lennon.”
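In code, the layer-by-layer analysis behind a plot like Figure 2 is conceptually just a sweep: decode the hidden state at the source-entity position at every layer and record the first layer where the bridge entity becomes readable. Here is a rough, illustrative sketch that reuses the hypothetical `get_hidden` and `patchscope` helpers from the previous snippet; it is not the paper's evaluation code.

```python
def first_layer_with_entity(prompt: str, pos: int, entity: str, num_layers: int) -> int | None:
    """Earliest layer whose hidden state at `pos` decodes (via Patchscopes) to `entity`."""
    for layer in range(num_layers):
        decoded = patchscope(get_hidden(prompt, layer=layer, pos=pos), layer=layer)
        if entity.lower() in decoded.lower():
            return layer
    return None

# Where does "John Lennon" first become readable at the "Imagine" position?
print(first_layer_with_entity(
    "The spouse of the performer of Imagine is",
    pos=imagine_pos, entity="John Lennon",
    num_layers=model.config.num_hidden_layers,
))
```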
2. The Second Hop Happens Late
Next, they looked at the final token of the prompt (“is”, denoted as \(t_2\)).
They looked for the Target Entity (Yoko Ono). The orange dashed line in Figure 2 represents this. Notice how it stays near zero in the early layers and only begins to rise significantly in the later layers (after layer 20).
This confirms the sequential nature of the computation. The model effectively says:
- Layers 0-15: “Imagine” \(\rightarrow\) John Lennon
- Layers 16-30: John Lennon (at the end of the sentence) \(\rightarrow\) Yoko Ono
3. The Propagation Gap
If “John Lennon” is resolved at the word “Imagine” (\(t_1\)), but the answer “Yoko Ono” needs to be generated at the word “is” (\(t_2\)), the information must travel across the sentence.
The researchers used “Attention Knockout”—a method to sever connections between words—to verify this. They found that the middle layers are responsible for moving the “John Lennon” information from the start of the sentence to the end.
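Conceptually, attention knockout sets the attention scores of chosen (query position, key position) pairs to minus infinity before the softmax at selected layers, so no information can flow along those edges. The toy, self-contained snippet below illustrates only the masking math; the paper applies this inside the model's actual attention layers over windows of layers.

```python
import torch
import torch.nn.functional as F

def attention_with_knockout(q, k, v, blocked_pairs):
    """Scaled dot-product attention with selected edges severed.

    q, k, v: tensors of shape (seq_len, d).
    blocked_pairs: iterable of (query_pos, key_pos) edges to cut.
    """
    d = q.shape[-1]
    scores = q @ k.T / d ** 0.5                          # (seq_len, seq_len)
    causal = torch.triu(torch.ones_like(scores), diagonal=1).bool()
    scores = scores.masked_fill(causal, float("-inf"))   # usual causal mask
    for qi, ki in blocked_pairs:
        scores[qi, ki] = float("-inf")                   # knockout: sever this edge
    return F.softmax(scores, dim=-1) @ v

# Toy example: 6 tokens; block the last position (t2) from attending to position 2 (t1).
torch.manual_seed(0)
q = k = v = torch.randn(6, 8)
out = attention_with_knockout(q, k, v, blocked_pairs=[(5, 2)])
```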
The “Hopping Too Late” Problem
So far, we have described a successful reasoning process. But what happens when the model gets it wrong?
The researchers compared the internal timelines of Correct vs. Incorrect predictions. They found a striking pattern: In incorrect cases, the first hop happens too late.

Figure 4 provides a box plot comparison for LLaMA 3 8B.
- Blue (Striped): Correct answers.
- Orange (Dotted): Incorrect answers.
Look at the “Prediction Extracted” column. For incorrect answers, the extraction of the final prediction is pushed to the very last layers (the outliers at the top).
The implication is profound. Transformers have a fixed depth (e.g., 32 layers). If the model “procrastinates” and doesn’t resolve “John Lennon” until layer 25, it only has 7 layers left to transport that information to the end of the sentence and figure out who the spouse is. It runs out of computational depth. The factory assembly line ends before the product is finished.
Validating the Theory: Back-Patching
To prove that “running out of layers” was the cause of failure, the researchers devised a clever experiment called Back-Patching.
If the problem is that the model arrives at the concept of “John Lennon” too late (say, at Layer 20), what if we could send that information back in time?
The Experiment
- Run the model forward.
- Capture the hidden state representing “John Lennon” at a later layer (e.g., Layer 20).
- Re-run the model, but this time, inject (patch) that hidden state into an earlier layer (e.g., Layer 10) at the same position.
This effectively gives the model 10 extra layers of processing time. It mimics a Recurrent Neural Network (RNN) or a loop, allowing the model to “think longer” on the same token.
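The sketch below shows one way back-patching could be implemented with Hugging Face forward hooks. It is an illustrative approximation under the same LLaMA-style assumptions as earlier snippets (model name, layer-indexing convention, hook behavior), not the authors' code.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-13b-hf"   # assumption: any LLaMA-style causal LM
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

def back_patch(prompt: str, source_layer: int, target_layer: int, pos: int = -1) -> str:
    """Capture the hidden state at (source_layer, pos), re-run the model with it
    injected at (target_layer, pos), and return the new top prediction."""
    inputs = tok(prompt, return_tensors="pt")

    # Pass 1: capture the late representation.
    with torch.no_grad():
        hs = model(**inputs, output_hidden_states=True).hidden_states
    vector = hs[source_layer][0, pos]   # hidden_states[l] = residual stream after l layers

    # Pass 2: overwrite the earlier layer's output at the same position.
    def hook(module, args, output):
        hidden = output[0]
        hidden[:, pos] = vector.to(hidden.dtype)
        return (hidden,) + output[1:]

    handle = model.model.layers[target_layer].register_forward_hook(hook)
    try:
        with torch.no_grad():
            logits = model(**inputs).logits
    finally:
        handle.remove()
    return tok.decode(logits[0, -1].argmax().item())

# e.g. give the final token roughly ten extra layers of computation:
print(back_patch("The spouse of the performer of Imagine is",
                 source_layer=20, target_layer=10))
```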
The Results
The results were remarkable.

Figure 5 visualizes the success of back-patching.
- Panel (a): Back-patching at the first token (\(t_1\)). The bright spots show that taking information from a later source layer and injecting it into an earlier target layer often fixes the error.
- Panel (b): Back-patching at the last token (\(t_2\)). This is even more effective.
The researchers found that back-patching could correct up to 66% of the previously incorrect queries.
This is a critical finding. It means the model had the knowledge. It knew who performed Imagine, and it knew who that person’s spouse was. It just didn’t have the “circuitry” aligned correctly to execute both hops within the limited number of layers available during a single forward pass.
When the researchers artificially extended the depth of the model via back-patching, the reasoning “clicked” into place.
Why Does This Matter?
This paper sheds light on the fundamental constraints of the Transformer architecture.
1. Transformers are Sequential Processors
We often think of neural networks as “fuzzy” black boxes, but this research highlights a structured, almost mechanical pipeline. Fact A must be resolved before Fact B can be accessed. If Fact A takes too long to resolve, Fact B never happens.
2. The Depth Limit
The fixed number of layers in a Transformer isn’t just a matter of capacity; it’s a limit on sequential reasoning steps. A 32-layer model might be physically incapable of solving a 5-hop query simply because it cannot complete the sequence before reaching the output layer.
3. Potential for “System 2” Thinking
The success of back-patching suggests that future architectures could benefit from “recurrence”—looping information back through the layers to allow for deeper reasoning without increasing the model’s physical size. This mimics human “System 2” thinking, where we pause and ponder a complex problem rather than blurting out the first thing that comes to mind.
Conclusion
“Hopping Too Late” provides a compelling mechanical explanation for why LLMs struggle with multi-part questions. It’s not necessarily a lack of training data or intelligence, but a structural limitation of the forward-pass mechanism.
By visualizing the reasoning pathway, the researchers showed that the “bridge entity” (the middle step) is crucial. When the model resolves this middle step early in its layers, it gets the answer right. When it resolves it late, it fails.
The introduction of back-patching—sending information back to earlier layers—serves as both a validation of this theory and a hint at how future models might be built. If we can allow models to “recycle” their layers when they encounter complex logic, we might unlock a new level of reasoning reliability.