Humans are master learners. We can acquire new skills—learning a language, playing a musical instrument, or mastering a video game—without overwriting what we already know. This ability to learn sequentially, while retaining old knowledge, is known as continual learning.

If you’re familiar with machine learning, you know that artificial neural networks face a serious hurdle here. Train a standard network on Task A and then on Task B, and its performance on Task A will likely collapse. This dramatic drop in performance is called catastrophic forgetting.

Yet, paradoxically, humans often display the opposite pattern. We perform better when our training data are blocked (one task at a time) and worse when trials from different tasks are interleaved or shuffled. This reversal—the success of sequential learning for humans, contrasted with its failure for AI—is a puzzle at the heart of a fascinating research paper from Oxford, UCL, and the Wigner Research Centre for Physics.

The authors ask:

What computational principles allow the human brain to learn new tasks consecutively without catastrophic forgetting—and why does interleaved training impede it?

To answer this, they propose a model integrating two biologically inspired ideas:

  1. “Sluggish” Task Signals – Context information in the brain doesn’t reset instantly; remnants of the previous trial “leak” into the current context.
  2. Hebbian Context Gating – A learning principle inspired by prefrontal cortex function that automatically partitions neural resources so new information doesn’t overwrite old memories.

Together, these ideas yield a network that learns—and even makes mistakes—in remarkably human-like ways.


The Human vs. Machine Learning Paradox

To explore continual learning, the researchers revisited a behavioral experiment where people learned which fractal trees would grow well in two distinct gardens—the “north” and “south” gardens.

A diagram showing the experimental setup for a human learning task. Panel A shows two garden contexts. Panel B shows that in the North task, leafiness is relevant, and in the South task, branchiness is relevant. Panel C shows the trial structure: Cue, Stimulus, Response, Feedback. Panel D illustrates the difference between a Blocked training curriculum (all North trials, then all South trials) and an Interleaved curriculum (trials are mixed).

Figure 1: Human continual learning experiment. Participants learned which fractal trees would grow in each garden, with different feature–reward rules per context. Training was either blocked (single context at a time) or interleaved (contexts mixed together).

Each tree varied by leafiness and branchiness. In the north garden, leafiness determined success; in the south garden, branchiness did. Participants trained either on long blocked segments (one garden, then the other) or on an interleaved curriculum with the two contexts shuffled together. Surprisingly, those trained on the blocked curriculum later performed better on a test where contexts were mixed.

Machines, however, show the opposite pattern. Let’s see why.


The “Vanilla” Neural Network: A Recipe for Forgetting

The team trained a simple feedforward neural network—a Multi-Layer Perceptron (MLP)—on a simplified version of the tree task. Inputs were Gaussian “blobs” varying along x- and y-positions. Task 1 depended on x‑position, Task 2 on y‑position. Each trial provided the blob image plus a one‑hot encoded context cue (e.g., [1, 0] for Task 1).
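A minimal NumPy sketch of this input format (the grid size, blob width, and midline labeling rule here are illustrative assumptions, not the paper’s exact parameters):

```python
import numpy as np

def make_trial(x, y, task, grid=16, sigma=1.5):
    """One trial: a Gaussian 'blob' image plus a one-hot context cue.
    Task 1 rewards depend on x-position, Task 2 on y-position."""
    xx, yy = np.meshgrid(np.arange(grid), np.arange(grid))
    blob = np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2 * sigma ** 2))
    cue = np.array([1.0, 0.0]) if task == 1 else np.array([0.0, 1.0])
    # Label: sign of the task-relevant coordinate relative to the midline
    label = 1.0 if (x if task == 1 else y) > grid / 2 else -1.0
    return np.concatenate([blob.ravel(), cue]), label

inp, lab = make_trial(x=12, y=3, task=1)
print(inp.shape, lab)  # (258,) 1.0
```

The same stimulus gets the opposite label under the other context cue, which is exactly what makes the context signal indispensable.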

Results from a standard neural network. Interleaved training leads to high accuracy and orthogonal representations. Blocked training leads to catastrophic forgetting, where the network overwrites knowledge of the first task while learning the second.

Figure 2: The baseline (“vanilla”) network. Interleaved training works well, producing accurate, orthogonal task representations. Blocked training leads to catastrophic forgetting—knowledge of Task 1 is erased when Task 2 begins.

Under interleaved training, the model achieves perfect accuracy and learns independent (“orthogonal”) internal codes for each task: one axis for x‑position, one for y‑position. Under blocked training, though, it forgets the first task entirely. When Task 2 starts, the network simply applies Task 2’s rule to everything, ignoring context signals completely.

This illustrates the tension: humans thrive under blocked curricula, but standard networks fail catastrophically. Something fundamental must be missing.


Motif 1: The Cost of Interleaving and “Sluggish” Neurons

Why is interleaved training harder for humans? The answer lies in temporal dependency—our cognitive systems assume that the world doesn’t change abruptly. Contexts are typically stable over time: you don’t switch from “at home” to “at work” every few seconds. The brain, therefore, integrates information from recent trials when interpreting the present.

The researchers modeled this using sluggish task units, which carry contextual information forward across trials. Instead of a crisp context cue per trial, the network receives an exponentially smoothed signal:

\[ x_t^{EMA} = (1-\alpha)x_t + \alpha x_{t-1}^{EMA} \]

Here, \( \alpha \) controls sluggishness—how much the previous trial influences the current one. With large \( \alpha \), context signals become blurry mixtures of past and current tasks, which is especially damaging under interleaved training, where the true context changes almost every trial.
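The smoothing is a standard exponential moving average. A short NumPy sketch (initializing the average at the first cue is an assumption; the paper may handle the first trial differently):

```python
import numpy as np

def sluggish_cues(cues, alpha):
    """Exponentially smooth a sequence of context cues:
    ema[t] = (1 - alpha) * cues[t] + alpha * ema[t-1]."""
    ema = np.zeros_like(cues, dtype=float)
    ema[0] = cues[0]
    for t in range(1, len(cues)):
        ema[t] = (1 - alpha) * cues[t] + alpha * ema[t - 1]
    return ema

# Under an interleaved sequence, a high alpha blurs the two contexts together
interleaved = np.array([[1, 0], [0, 1], [1, 0], [0, 1]], dtype=float)
print(sluggish_cues(interleaved, alpha=0.8).round(2))
```

With a blocked sequence the smoothed cue stays close to the true one; with an interleaved sequence it hovers near an uninformative mixture of both contexts.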

Results showing the effect of sluggishness. As the sluggishness parameter α increases, the network’s performance on interleaved tasks degrades. It moves from learning two separate, factorised rules to a single, compromised linear rule, mimicking human performance patterns.

Figure 3: Increasing the sluggishness parameter (α) reduces accuracy under interleaved training and pushes the network from learning two factorised rules to one compromised, diagonal rule—just like human learners.

As α rises:

  • Accuracy drops (Panel B).
  • Decisions become influenced by irrelevant feature dimensions (Panel C).
  • Representations collapse into a single diagonal boundary covering “congruent” trials—cases where both tasks share the same response (Panels E–F).

In behavioral terms, this sluggishness introduces switch costs and a bias toward simplification—echoing human performance patterns under interleaved learning.

But sluggishness alone doesn’t solve forgetting when tasks are learned in blocks. For that, the brain needs a way to protect old knowledge.


Motif 2: Learning to Gate with Hebbian Plasticity

How does biological learning avoid catastrophic forgetting? The prefrontal cortex (PFC) offers a clue: it gates activity, enhancing task‑relevant circuits while suppressing irrelevant ones. The researchers first implemented manual gating, partitioning the hidden units so that one subset responded to Task 1 and the other to Task 2.

A diagram showing how manual gating works. By setting weights with opposite signs for each task, different sets of hidden units are activated, preventing interference and solving catastrophic forgetting.

Figure 4: Manual context gating. By assigning opposite-sign weights from task units to hidden units, the network learns each task using its own subset of neurons and avoids forgetting.

This handcrafted gating prevented interference entirely: each task occupied distinct neural territory. The outcome suggested that anti-correlated gating signals—positive for one task, negative for the other—are the key.

The next challenge was to learn this structure automatically.

Hebbian Context Gating

To automate gating, the researchers turned to Hebbian learning—the archetypal rule, “neurons that fire together wire together.” Specifically, they used Oja’s rule, a stabilized Hebbian variant that naturally recovers the principal component of its inputs.

In this case, the strongest source of input variance is the task context itself. By applying Oja’s rule to the connections from task units to hidden units, the network discovers this structure on its own.
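A toy sketch of Oja’s rule acting on context inputs (assumptions for illustration: a handful of hidden units, signed ±1 cues rather than the one-hot cues described earlier, and arbitrary learning rates—this is not the paper’s full network):

```python
import numpy as np

rng = np.random.default_rng(0)

def oja_step(W, x, eta=0.05):
    """One Oja update: dW_i = eta * y_i * (x - y_i * W_i), with y_i = W_i . x.
    Drives each weight row toward the principal component of the inputs."""
    y = W @ x
    W += eta * (np.outer(y, x) - (y ** 2)[:, None] * W)
    return W

# Six hidden units receiving signed cues from two task units
W = rng.normal(scale=0.1, size=(6, 2))
for _ in range(500):
    x = np.array([1.0, -1.0]) if rng.integers(2) == 0 else np.array([-1.0, 1.0])
    W = oja_step(W, x)

# Each hidden unit's two task weights end up with opposite signs
print(np.sign(W[:, 0]) == -np.sign(W[:, 1]))
```

Because the dominant variance direction of these cues is the contrast between the two tasks, every row converges onto that axis, which is exactly the anti-correlated pattern the manual gating solution hard-wired by hand.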

A diagram showing how Hebbian learning (Oja’s rule) can be used to learn the gating mechanism automatically. This solves catastrophic forgetting in the blocked training setting.

Figure 5: Hebbian learning induces gating. Using Oja’s rule, task weights become anti‑correlated (Panel C & F), partitioning hidden units into task‑specific subsets. Alternating supervised SGD with Hebbian updates protects old knowledge during blocked learning (Panels D & H).

In practice, alternating standard gradient‑descent steps with Hebbian updates teaches the network to:

  • Strengthen links between task units and hidden neurons encoding relevant information,
  • Weaken links to neurons carrying irrelevant features.

As training proceeds, weights from the two task units diverge in opposite directions—producing anti‑correlated gating signals that partition the hidden layer. The resulting Hebbian Gating Network learns sequentially without forgetting, achieving high accuracy even under blocked curricula.
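The alternation might look like the following self-contained sketch: a tiny two-layer network trained with SGD, with an Oja step applied after each trial to the task-to-hidden weights only. All sizes, learning rates, and the signed cue encoding are illustrative assumptions, not the authors’ implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny gated network: 2 stimulus features, 2 task cues, H hidden units, 1 output
H = 20
W_in = rng.normal(scale=0.1, size=(H, 2))    # features -> hidden (SGD only)
W_task = rng.normal(scale=0.1, size=(H, 2))  # cues -> hidden (SGD + Oja)
w_out = rng.normal(scale=0.1, size=H)        # hidden -> output (SGD only)

def step(feat, cue, target, lr=0.05, eta=0.01):
    """One trial: a supervised SGD step on all weights, then an
    unsupervised Oja step on the task-to-hidden weights alone."""
    global W_in, W_task, w_out
    a = W_in @ feat + W_task @ cue
    h = np.maximum(a, 0.0)                   # ReLU hidden layer
    y = w_out @ h
    d_y = y - target                         # MSE gradient at the output
    d_a = d_y * w_out * (a > 0)
    w_out -= lr * d_y * h
    W_in -= lr * np.outer(d_a, feat)
    W_task -= lr * np.outer(d_a, cue)
    g = W_task @ cue                         # Hebbian (Oja) update
    W_task += eta * (np.outer(g, cue) - (g ** 2)[:, None] * W_task)

# Blocked curriculum: all of Task 1 (x relevant), then all of Task 2 (y relevant)
for task in (0, 1):
    cue = np.array([1.0, -1.0]) if task == 0 else np.array([-1.0, 1.0])
    for _ in range(2000):
        feat = rng.uniform(-1.0, 1.0, size=2)
        step(feat, cue, target=np.sign(feat[task]))

# The two columns of the task weights drift apart: an anti-correlated gate
print(np.corrcoef(W_task[:, 0], W_task[:, 1])[0, 1])
```

In this toy, the two columns of W_task end up strongly anti-correlated, the signature of the self-learned gate described above.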


Putting It Together: A Human‑Like Continual Learner

The final architecture combined both innovations:

  1. Sluggish task signals that capture temporal inertia.
  2. Hebbian updates that create self‑learned gating.

This hybrid model, tested on the same task structure as before, reproduces key facets of human behavior.

A comparison of Humans, a Baseline AI model, and the final EMA+Hebb model. The final model successfully replicates human performance patterns, including higher accuracy in blocked training and a larger congruency effect in interleaved training.

Figure 6: Humans vs. Models. The sluggish + Hebbian network (EMA + Hebb) mirrors human data—higher accuracy under blocked training (Panel A), human‑like congruency effects (Panel B), reduced intrusions from irrelevant dimensions (Panel C), and proper factorisation under blocked curricula (Panel D).

  • Accuracy: Humans and the EMA + Hebb model both perform best after blocked training, unlike the baseline AI, which thrives only under interleaving.
  • Congruency effect: Both humans and the hybrid model show sensitivity to “congruent” trials under interleaving, indicating similar interference strategies.
  • Decision patterns: Under interleaved conditions, irrelevant features exert greater influence, reproducing human biases.
  • Representation type: Blocked training yields distinct, factorised rules; interleaved training collapses into a single linear compromise.

These behavioral alignments show that simple biological principles can recreate complex human learning effects.


Diagnosing Errors: Boundary Bias and Representation Shape

To pinpoint why the model (and humans) perform poorly when interleaved, the researchers fitted a psychophysical model that decomposed error sources into boundary bias, lapse rate, slope, and offset.
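One plausible parameterization of such a psychometric model (hypothetical—the paper’s exact functional form may differ): rotate the decision boundary by an angular bias, pass the stimulus’s signed distance to it through a sigmoid with a slope and offset, and mix in a lapse rate of stimulus-independent guessing.

```python
import numpy as np

def p_choose_A(x, y, theta, slope, offset, lapse):
    """Hypothetical psychometric model: probability of one response.
    theta rotates the decision boundary (angular bias); slope and offset
    shape the sigmoid; lapse mixes in stimulus-independent guessing."""
    d = x * np.cos(theta) + y * np.sin(theta)  # signed distance to boundary
    sigmoid = 1.0 / (1.0 + np.exp(-(slope * d + offset)))
    return lapse / 2.0 + (1.0 - lapse) * sigmoid

# A stimulus exactly on the true boundary: an unbiased observer is at chance,
# while a rotated (biased) boundary produces a confident, systematic choice
print(p_choose_A(0.0, 1.0, theta=0.0, slope=5.0, offset=0.0, lapse=0.02))
print(p_choose_A(0.0, 1.0, theta=0.6, slope=5.0, offset=0.0, lapse=0.02))
```

Fitting these four parameters per condition is what lets the authors attribute interleaved-training errors specifically to the angular-bias term rather than to lapses or a shallow slope.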

A psychophysical model fit shows that the primary error source for both humans and the final model under interleaved training is an inaccurate estimate of the decision boundary (angular bias).

Figure 7: Psychophysical modeling. For both humans (A) and the hybrid network (C), interleaved training leads to a larger angular bias—systematic misestimation of the category boundary. The baseline model (B) fails to replicate this pattern.

Results show that both humans and the EMA + Hebb network suffer from angular bias under interleaved training—they skew their decision boundaries rather than simply making random mistakes. This confirms that interleaved learning systematically distorts representation geometry rather than just adding noise.


The Geometry of Thought: How Training Reshapes Representations

The study concludes with a striking analysis of internal “neural geometry.” Using Representational Similarity Analysis, the authors compared learned hidden‑layer representations to three model geometries:

  • Grid model – Encodes all features across tasks.
  • Orthogonal model – Represents tasks along independent axes.
  • Diagonal model – Collapses tasks onto a shared diagonal (congruent trials).
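To make the three geometries concrete, here is a sketch that builds a representational dissimilarity matrix (RDM) for each candidate geometry and compares them by correlating their entries; the specific 2-D embeddings are illustrative choices of mine, not the authors’ exact models:

```python
import numpy as np
from itertools import product

# Stimuli: a small grid of (x, y) features under each of two task contexts
stims = [(x, y, t) for t in (0, 1) for x, y in product(range(-2, 3), repeat=2)]

def embed(stim, model):
    """Idealized model geometries as 2-D embeddings (illustrative)."""
    x, y, t = stim
    if model == "grid":        # both features encoded, task ignored
        return np.array([float(x), float(y)])
    if model == "orthogonal":  # relevant feature on a task-specific axis
        return np.array([x, 0.0]) if t == 0 else np.array([0.0, y])
    if model == "diagonal":    # both tasks collapsed onto one shared axis
        return np.array([(x + y) / 2.0, 0.0])

def rdm(model):
    """Pairwise Euclidean distances between all stimulus embeddings."""
    E = np.array([embed(s, model) for s in stims])
    return np.linalg.norm(E[:, None] - E[None, :], axis=-1)

# Compare model RDMs via correlation of their off-diagonal entries
iu = np.triu_indices(len(stims), k=1)
r = np.corrcoef(rdm("orthogonal")[iu], rdm("diagonal")[iu])[0, 1]
print(f"orthogonal vs diagonal RDM correlation: {r:.2f}")
```

RSA then asks which of these model RDMs best correlates with the RDM measured from the network’s hidden layer (or from neural data), which is how the blocked-versus-interleaved geometries are diagnosed.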

Representational Similarity Analysis shows that blocked training leads to orthogonal representations, while interleaved training with sluggishness leads to diagonal representations focused on congruent stimuli.

Figure 8: Representational geometry. Blocked training produces orthogonal task representations that cleanly separate features. Interleaved training with sluggishness yields diagonal representations emphasizing congruent stimuli—paralleling human data.

Blocked training generates orthogonal representations, neatly separating task‑specific information and minimizing interference. Interleaved training, influenced by sluggish context signals, drives the model toward diagonal representations, encoding shared trials rather than distinct tasks. This trade‑off mirrors the biases observed in human cognitive representations.


Conclusion: Bridging Brains and AI

This research is a milestone in biologically inspired learning theory. With just two intuitive principles—sluggishness and Hebbian gating—the model reconciles the opposing patterns of human and machine learning.

Key insights:

  1. Temporal inertia explains interleaving costs. Sluggish signals create interference when contexts switch rapidly, compromising learning through “blended” representations.
  2. Hebbian gating prevents forgetting. Simple unsupervised updates can learn to partition neural resources automatically, allowing stable continual learning.
  3. Training shapes neural geometry. Blocked learning fosters orthogonal, task‑specific codes; interleaved learning promotes shared, diagonal ones.

These findings showcase how fundamental biological mechanisms provide elegant solutions to long‑standing problems in artificial intelligence. Continual learning, it seems, is not just a feat of sophisticated engineering—but a reflection of how real brains navigate change, stability, and memory across a lifetime.