If you’ve ever used ChatGPT, Llama, or any other modern Large Language Model (LLM), you’ve witnessed a kind of magic. You can show it a few examples of a task—like translating phrases from English to French or classifying movie reviews as positive or negative—and it instantly gets it. Without any retraining or fine-tuning, it can perform the task on new inputs.
This remarkable ability is called In-Context Learning (ICL), and it’s one of the key reasons why LLMs are so powerful and versatile. As shown in Figure 1, you can provide examples for translation, sentiment analysis, or even math, and the same model can handle them all—adapting its behavior dynamically based on the context given in the prompt.
Figure 1. Illustration of In-Context Learning: the same pre-trained model can switch tasks simply based on examples in its prompt.
But here’s the billion-dollar question: how does it actually work? How does a model trained simply to predict the next word in a sentence suddenly gain the ability to learn new tasks from a handful of examples? This is one of the biggest mysteries in AI today. While we’ve gotten remarkably good at using ICL, our understanding of its inner workings remains limited.
A recent survey paper, “The Mystery of In-Context Learning: A Comprehensive Survey on Interpretation and Analysis,” sets out to map the landscape of research trying to solve this puzzle. It provides a detailed overview of two major investigative fronts:
- Theoretical studies that search for the mathematical and algorithmic foundations of ICL.
- Empirical studies that explore how various factors such as data, model architecture, and demonstration format influence its performance.
In this blog post, we’ll follow the structure of that survey to explore how researchers are probing the nature of ICL—what it is, how it might work, and what open questions still remain.
What Exactly Is In-Context Learning?
Before diving deeper, let’s clarify what ICL actually means.
Imagine you have a task, like translating English sentences to French. This task has a large space of possible example pairs, such as (Happy New Year, Bonne année) or (Thank you very much, Merci beaucoup).
Task Demonstration (D):
These are the examples you include in your prompt to “teach” the model on the fly—a set of input–output pairs:
\( D = \{ (x_1, y_1), (x_2, y_2), \dots, (x_n, y_n) \} \).
For example:
- \( x_1 \): English: Happy New Year → \( y_1 \): French: Bonne année
- \( x_2 \): English: Thank you very much → \( y_2 \): French: Merci beaucoup
Task Query (Q):
This is the new input you want the model to process, e.g., “English: Good morning, French:”.
The Goal:
The LLM, represented as a function \( F_{\theta} \), takes the demonstration \( D \) and the query \( Q \) as input and predicts an answer \( \hat{A} \). The true answer is \( A \).
The process is formally expressed as:
\( \hat{A} = F_{\theta}(D, Q) \)
Equation 1. Formal expression of ICL: the model uses the demonstration and query pair to predict the answer.
Performance can then be measured by evaluating how well \( \hat{A} \) matches \( A \):
\( \text{ICL performance} = \mathbb{E}_{(D,\, Q)} \big[ \mathrm{Metric}(\hat{A}, A) \big] \)
Equation 2. ICL performance is computed by averaging an evaluation metric across many demonstrations and queries.
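To make the notation concrete, here is a minimal Python sketch (not from the survey) that assembles a demonstration set D and a query Q into a single prompt string; the template and the helper name build_icl_prompt are illustrative assumptions, since real systems use many different formats.

```python
# Minimal sketch: assembling an in-context learning prompt from a
# demonstration set D and a query Q. The prompt template is an
# illustrative assumption, not a prescribed format.

D = [
    ("Happy New Year", "Bonne année"),
    ("Thank you very much", "Merci beaucoup"),
]
Q = "Good morning"

def build_icl_prompt(demonstrations, query):
    """Serialize the (x_i, y_i) pairs followed by the query x."""
    lines = [f"English: {x}\nFrench: {y}" for x, y in demonstrations]
    lines.append(f"English: {query}\nFrench:")  # the model fills in A-hat here
    return "\n\n".join(lines)

print(build_icl_prompt(D, Q))
# The LLM F_theta receives this prompt, and its continuation is the
# prediction A-hat, which is then compared against the reference A.
```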
With this basic formulation, we can now explore what the theorists and empiricists have learned about how ICL operates.
The Theoretical Quest: What Algorithm Is the LLM Running?
The heart of the mystery lies in understanding what happens inside the model. Researchers have proposed several competing explanations, each illuminating a different facet of ICL.
1. Mechanistic Interpretability: Peeking Inside the Machine
This branch of research aims to reverse-engineer LLMs—essentially performing “digital brain surgery” to uncover how components like attention heads contribute to learning in context.
One key discovery is the role of induction heads. These special attention heads learn simple copy patterns: if the model previously saw “A followed by B,” then upon seeing “A” again, it predicts “B.” This basic pattern-finding behavior acts as an early mechanistic basis for ICL—helping the model generalize by reusing patterns seen in demonstrations.
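As a rough intuition for what an induction head does, the toy Python sketch below hard-codes the copy rule "if A was followed by B earlier in the context, predict B after seeing A again"; it is a hand-written analogue of the behavior, not a probe of actual Transformer weights.

```python
# Toy analogue of an induction head: look backwards for the most recent
# earlier occurrence of the current token and predict whatever followed it.
# Real induction heads realize this with a prefix-matching attention
# pattern plus a copying operation.

def induction_head_predict(tokens):
    """Predict the next token by copying the successor of an earlier match."""
    current = tokens[-1]
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]  # copy the token that followed the match
    return None  # no earlier occurrence: the rule gives no prediction

context = ["A", "B", "C", "D", "A"]
print(induction_head_predict(context))  # -> "B"
```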
Later research expanded the picture, identifying more sophisticated mechanisms such as function vectors, where specific attention heads extract and encode the structure of a given task into a vector representation. These vectors influence subsequent predictions—essentially guiding the model to behave as if it had been trained on that particular task.
2. Regression Function Learning: Is ICL Just Advanced Curve Fitting?
Another theory views ICL as an instance of function estimation. When trained on examples like (x, f(x)), Transformers can learn to infer underlying relationships—similar to performing linear regression.
For example, consider training a Transformer on points sampled from f(x) = 3x + 2, such as (1, 5) and (2, 8). When prompted with (3, ?), the model can output 11, matching the correct linear relation. This suggests that LLMs implicitly recognize simple functional mappings from examples and can apply them in new contexts—approximating regression functions on the fly.
This “algorithm selection” perspective proposes that Transformers internally contain a repertoire of statistical estimators—least squares, ridge regression, Lasso—and dynamically select the one that best fits a task from its demonstrations.
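A short numpy sketch can make this concrete: the ordinary least-squares fit below is the kind of estimator the Transformer is conjectured to approximate in-context when it maps the demonstrations (1, 5) and (2, 8) to a prediction of 11. The code is an external re-implementation of that estimator, not the model's actual computation.

```python
import numpy as np

# Demonstrations sampled from the unknown function f(x) = 3x + 2.
xs = np.array([1.0, 2.0])
ys = np.array([5.0, 8.0])

# Ordinary least squares on a [x, 1] design matrix: the kind of estimator
# the "regression function learning" view says a Transformer can
# implement from its context.
X = np.stack([xs, np.ones_like(xs)], axis=1)
w, *_ = np.linalg.lstsq(X, ys, rcond=None)
slope, intercept = w
print(slope, intercept)             # ~3.0, ~2.0

# Query x = 3: the estimator's "in-context" prediction is 11.
x_query = 3.0
print(slope * x_query + intercept)  # ~11.0

# The algorithm-selection view goes further: with noisier or sparser
# demonstrations, the model is hypothesized to behave more like ridge
# or Lasso, picking whichever estimator fits the context best.
```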
3. Gradient Descent as Meta-Optimization: Is the Model Fine-Tuning Itself?
A bold theory posits that the forward pass of a Transformer doesn’t just predict—it internally optimizes.
When processing demonstrations, the attention mechanism performs operations similar to gradient descent updates, effectively solving a miniature optimization problem inside its layers. This means the model acts like a meta-optimizer, learning how to learn.
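To see what "solving a miniature optimization problem inside the forward pass" would mean, here is a toy numpy sketch that runs explicit gradient-descent steps on the demonstration pairs and then answers a query. The claim in this line of work is only that attention updates are mathematically similar to steps like these; the learning rate and step count below are arbitrary illustrative choices.

```python
import numpy as np

# Demonstrations from f(x) = 3x + 2; the query is x = 3.
xs = np.array([1.0, 2.0, 4.0, 5.0])
ys = 3.0 * xs + 2.0
x_query = 3.0

# Explicit gradient descent on squared error for a linear model y = w*x + b.
# The meta-optimization view conjectures that attention implicitly performs
# updates of roughly this form while reading the demonstrations.
w, b = 0.0, 0.0
lr = 0.02
for _ in range(2000):
    err = w * xs + b - ys
    w -= lr * np.mean(err * xs)  # gradient of mean squared error w.r.t. w (up to a constant)
    b -= lr * np.mean(err)       # gradient w.r.t. b

print(w, b)             # approaches 3.0 and 2.0
print(w * x_query + b)  # approaches 11.0
```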
Supporting studies show parallels between attention updates and gradient computations. However, others found discrepancies: information flow through the Transformer during ICL differs from explicit fine-tuning. Some evidence even suggests higher-order optimization (like Newton’s method) might better describe how models adapt quickly to demonstrations.
4. Bayesian Inference: The Model Making Educated Guesses
From a Bayesian standpoint, ICL is a process of inferring latent concepts under uncertainty. The idea is that during pre-training, LLMs learn statistical patterns and latent relationships between entities.
When prompted with examples, the model performs implicit Bayesian inference—updating its belief about what concept connects input and output pairs.
Consider:
- Input: apple → Output: red
- Input: banana → Output: yellow
- Query: lime → ?
The model infers a hidden rule, “fruit → color,” and predicts “green.”
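A tiny Python sketch of this implicit Bayesian view, under the strong and purely illustrative assumption that the model entertains just two candidate concepts, "fruit → color" and "fruit → size":

```python
# Toy Bayesian view of ICL: keep a posterior over candidate latent concepts
# and score the demonstrations under each one. The concept tables and the
# noise level are illustrative assumptions.

concepts = {
    "fruit -> color": {"apple": "red", "banana": "yellow", "lime": "green"},
    "fruit -> size":  {"apple": "medium", "banana": "medium", "lime": "small"},
}
prior = {name: 0.5 for name in concepts}

demonstrations = [("apple", "red"), ("banana", "yellow")]
query = "lime"

def likelihood(table, x, y, noise=0.05):
    """Probability of observing label y for input x under a concept."""
    return 1.0 - noise if table.get(x) == y else noise

posterior = {}
for name, table in concepts.items():
    p = prior[name]
    for x, y in demonstrations:
        p *= likelihood(table, x, y)
    posterior[name] = p

total = sum(posterior.values())
posterior = {name: p / total for name, p in posterior.items()}
print(posterior)  # probability mass concentrates on "fruit -> color"

best = max(posterior, key=posterior.get)
print(concepts[best][query])  # -> "green"
```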
This view elegantly explains the adaptability of ICL: the model performs probabilistic reasoning about latent structures that resemble tasks it has encountered throughout pre-training.
The Empirical Angle: What Factors Make ICL Work Better?
While theorists wrestle with equations, empiricists run thousands of controlled experiments to discover what practical factors actually affect ICL performance.
Table 1. Overview of research on ICL interpretation and analysis across theoretical and empirical fronts.
1. The Ingredients: Pre-training Data
Data properties profoundly influence ICL, though not always in predictable ways.
- Domain Relevance: Surprisingly, a pre-training corpus closely aligned with the downstream task does not always produce better ICL. Models trained on blog posts outperform those trained on news text in some scenarios.
- Distributional Properties: Researchers have identified three vital patterns in pre-training data that correlate with strong ICL:
- Burstiness: Tokens appear in clusters rather than uniformly.
- Zipfian Distributions: Common words dominate, but long-tail rare tokens enhance learning.
- Ambiguity (Polysemy): Words with multiple meanings force models to rely on context, strengthening in-context reasoning.
These structures likely compel LLMs to internalize flexible contextual inference skills—the essence of ICL.
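The sketch below generates a toy token stream with the first two properties, a Zipfian marginal distribution and bursty, clustered occurrences; it is only meant to make the terms concrete, and the vocabulary size, cluster size, and burst lengths are arbitrary choices rather than values from the studies the survey covers.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size = 1000

# Zipfian marginal distribution: the token with rank r has probability ~ 1/r.
ranks = np.arange(1, vocab_size + 1)
zipf_probs = 1.0 / ranks
zipf_probs /= zipf_probs.sum()

# Bursty stream: instead of sampling tokens i.i.d., repeatedly pick a small
# "cluster" of tokens and emit a short run drawn only from that cluster,
# so each token's occurrences are clumped in time.
stream = []
for _ in range(200):
    cluster = rng.choice(vocab_size, size=5, replace=False, p=zipf_probs)
    burst_len = rng.integers(5, 15)
    stream.extend(rng.choice(cluster, size=burst_len).tolist())

print(len(stream), stream[:20])
```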
2. The Machine: Pre-training Model
The architecture and scale of a model play a decisive role.
- Emergent Ability: ICL tends to appear suddenly at certain scales. Small models rarely exhibit ICL, but beyond a threshold in size and training compute, the capability emerges sharply.
- Mirage or Reality? Some argue this emergence might be an artifact of non-linear evaluation metrics rather than an actual change in learning dynamics.
- Transient Behavior: One perplexing finding is that ICL might fade with extended training. As a model memorizes more data (in-weights learning), it can lose its ability to flexibly adapt to new contexts.
Furthermore, architectural choices like hidden dimension size may matter more than raw parameter count, and pre-training objectives can strongly shape in-context learning strength.
3. The Recipe: Demonstration Order and Format
Prompt formatting is crucial—sometimes more than the examples themselves.
- Order Sensitivity: Models can perform dramatically differently depending on example order (a small probing sketch follows this list).
- Recency Bias: Examples near the end of the prompt often exert greater influence, suggesting short-term memory dominance.
- Model-Specific Optimum: The most effective ordering varies widely between model families, highlighting that prompt design needs model-aware strategies.
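As a concrete illustration of how order sensitivity is typically probed, the sketch below enumerates every ordering of a small demonstration set and builds one prompt per ordering. Actually scoring each prompt requires querying an LLM (for example, by comparing the log-probability it assigns to the correct answer), so the score_with_llm call is a hypothetical placeholder, not a real API.

```python
from itertools import permutations

# Enumerate all orderings of a small demonstration set and build one
# candidate prompt per ordering, the setup used to measure order sensitivity.

demos = [
    ("The movie was wonderful.", "positive"),
    ("I fell asleep halfway through.", "negative"),
    ("A masterpiece of modern cinema.", "positive"),
]
query = "The plot made no sense at all."

def build_prompt(ordered_demos, query):
    lines = [f"Review: {x}\nSentiment: {y}" for x, y in ordered_demos]
    lines.append(f"Review: {query}\nSentiment:")
    return "\n\n".join(lines)

prompts = [build_prompt(order, query) for order in permutations(demos)]
print(len(prompts))  # 3! = 6 candidate prompts, one per demonstration order

# Scoring each ordering would require a real model call, e.g.:
# for prompt in prompts:
#     score = score_with_llm(prompt, answer="negative")  # hypothetical placeholder
```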
4. The Labels: How Important Is Correctness?
The importance of accurate labels in demonstrations remains debated.
Early findings showed that replacing true labels with random ones causes only minor performance drops on some tasks. This implied that ICL might rely more on input structure than label correctness.
However, subsequent research demonstrated that flipping labels (e.g., swapping “positive” ↔ “negative”) can tank performance below random guessing. For complex tasks, accuracy matters—a lot.
The current consensus is that models use demonstrations to recognize both the task type and the task logic. For simpler tasks, format recognition dominates; for deeper reasoning tasks, correct input–label mappings are essential.
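To make the label manipulations concrete, here is a small sketch that builds the three demonstration variants usually compared in this line of work: gold labels, random labels, and flipped labels. Evaluating them would again require a real model, so the sketch only constructs the demonstration sets.

```python
import random

# Build gold, random-label, and flipped-label demonstration sets for a
# sentiment task, mirroring the manipulations used to test how much
# label correctness matters for ICL.

gold = [
    ("The movie was wonderful.", "positive"),
    ("I fell asleep halfway through.", "negative"),
    ("A masterpiece of modern cinema.", "positive"),
]
labels = ["positive", "negative"]

random.seed(0)
random_labels = [(x, random.choice(labels)) for x, _ in gold]
flipped = [(x, "negative" if y == "positive" else "positive") for x, y in gold]

for name, demos in [("gold", gold), ("random", random_labels), ("flipped", flipped)]:
    print(name, demos)
```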
The Road Ahead: Open Questions and Future Directions
Despite major advances, the puzzle of ICL is far from solved. The survey highlights several ongoing challenges:
- Theory–Practice Gap: Existing theoretical analyses rely on simplified Transformers and synthetic data. Do these conclusions transfer to real-world, billion-parameter models?
- Correlation vs. Causation: Most empirical findings are correlational—showing associations rather than causal mechanisms. Controlled experiments and causal inference methods are needed to clarify which factors truly drive ICL.
- Better Evaluation Metrics: Today’s metrics (like accuracy or loss) are too coarse. We need benchmarks that measure how well a model learns from context, independently of its memorization abilities.
- Trustworthiness and Safety: ICL opens models to manipulation through malicious demonstrations. Understanding how and why models adapt to harmful context examples is critical for designing robust safeguards.
Conclusion
In-context learning has revolutionized how we interact with AI. It lets a general-purpose model instantly adapt to new tasks just from examples—no fine-tuning required. Yet, the more we rely on this capability, the more urgent it becomes to understand it deeply.
Researchers are uncovering its theoretical roots—from induction heads and regression functions to gradient descent and Bayesian inference—and its practical dependencies on data, architecture, and prompt design. But the full picture remains elusive.
Unlocking this black box is more than intellectual curiosity; it’s foundational for building the next generation of AI systems that are adaptive, trustworthy, and safe. As we continue to study ICL, we move one step closer to understanding how machines learn—not just in training, but in real time.