Introduction: The Alignment Problem We Don’t Talk About Enough
We live in an age of incredible AI. Generative models can write poetry, create stunning art, and even help scientists discover new medicines. These powerful tools are increasingly positioned as partners in human-AI teams, where they augment our abilities to solve complex problems. But for any team to work, the members need to be on the same page. In AI, this is known as the alignment problem: making sure AI systems act according to our goals and preferences.
While much of the discussion around alignment focuses on values and ethics, a recent perspective paper by a large, interdisciplinary team of researchers highlights a deeper, more fundamental misalignment. It’s not just about what AI does, but how it thinks. Specifically, it’s about generalisation — the ability to take what you’ve learned from specific examples and apply it to new, unseen situations.
Humans are masters of generalisation. A child who sees a few dogs can easily recognise a new breed they’ve never encountered. We can understand abstract concepts, grasp the underlying rules of a system, and transfer knowledge between entirely different domains. Modern AI, for all its power, struggles with this. It can memorise vast amounts of data, but its ability to generalise often breaks down in surprising and unpredictable ways. This gap between how humans and machines generalise is a critical, yet often overlooked, barrier to creating truly effective and safe AI partners.
This article dives into the paper Aligning Generalisation Between Humans and Machines to explore this challenge. We’ll unpack:
- The historical dance between AI and cognitive science in the study of generalisation.
- What “generalisation” actually means — it’s more complex than you might think.
- The three main ways AI tries to generalise — and the trade-offs of each.
- How we can evaluate an AI’s ability to generalise — and why current methods often fall short.
- Future directions that could finally bridge the generalisation gap between humans and AI.
Let’s begin by looking at the complementary — and often conflicting — strengths of human and machine intelligence.
A Tale of Two Intelligences
The goal of human-AI teaming isn’t to create an AI that perfectly mimics a human. Instead, it’s about building a partnership where each side’s strengths cover the other’s weaknesses. The authors of the paper illustrate this beautifully.
Figure 1: Comparison of the generalisation strengths of humans and statistical machine learning models. Humans excel in few-shot learning, compositionality, common sense, and robustness. Machines excel in large-scale data handling, inference correctness, complexity management, and universal approximation. Both struggle with overgeneralisation, underscoring the need for collaboration and explanation.
Humans have an incredible knack for learning from just a few examples, understanding compositionality (how parts combine to form a whole, like words in a sentence), and drawing on a deep well of common sense. This makes us resilient to noise and changes in data.
Statistical AI models, like the deep neural networks behind today’s large language models (LLMs), possess different strengths. They handle large-scale data efficiently, perform correct inference at scale, manage immense data complexity, and, in theory, act as universal approximators capable of learning nearly any function.
However, both humans and machines are prone to overgeneralisation. Humans do it through stereotypes and biases; AI through hallucinations, where it confidently asserts falsehoods. In a human-AI team, trust requires transparency — the AI must provide explanations for its reasoning. Here, the fundamental difference in their generalisation styles becomes a major hurdle.
A Shared History
Efforts to understand and replicate generalisation have long been shaped by exchanges between AI research and cognitive psychology. The paper highlights milestones showing how models of human cognition directly inspired AI development.
Figure 2: Illustrations of how models of human generalisation inspired the three major families of AI methods: rule-based, example-based, and statistical.
Breaking it down:
- Rule-Based Learning
Early cognitive scientists studied how humans define concepts via rules (e.g., “a square has four equal sides”), inspiring AI techniques like decision trees and inductive logic programming — methods that learn such rules from data.
Figure 2a: Examples of rule-based learning, where logical rules are induced from observed examples.
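To make this concrete, here is a minimal sketch of rule induction using scikit-learn's decision tree. The square/not-square features and data points are invented for illustration; the point is that the learned model can be read back as explicit rules:

```python
# Minimal rule-induction sketch: a decision tree learns an explicit,
# human-readable rule from toy examples (features are illustrative).
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy dataset: [number_of_sides, all_sides_equal?] -> is it a square?
X = [[4, 1], [4, 0], [3, 1], [5, 0], [4, 1], [3, 0]]
y = [1, 0, 0, 0, 1, 0]  # 1 = square, 0 = not a square

tree = DecisionTreeClassifier(random_state=0).fit(X, y)

# The learned model is inspectable as rules, roughly:
# "if all sides equal and there are four of them, then square".
print(export_text(tree, feature_names=["n_sides", "all_sides_equal"]))
print(tree.predict([[4, 1]]))  # -> [1]: the induced rule generalises
```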
- Example-Based Learning
Other cognitive theories emphasised similarity over rules. Prototype theory suggests we create mental averages of categories, while Exemplar theory posits we compare new instances to specific remembered examples. These ideas inspired AI methods like k-Nearest Neighbours and help explain context effects — why a vessel might be called a “cup” with coffee but a “bowl” with soup.
Figure 2b: Example-based generalisation relies on similarity to prototypes, sensitivity to context, and structural analogies.
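The two theories can be contrasted in a few lines of code. In this toy sketch (with made-up 2-D features standing in for "tallness" and "openness"), an ambiguous vessel is labelled either by comparing it to category averages (prototype theory) or to individual remembered instances (exemplar theory, here 1-nearest-neighbour):

```python
# Prototype vs. exemplar generalisation on invented 2-D features.
import numpy as np

cups  = np.array([[0.8, 0.2], [0.9, 0.3], [0.7, 0.25]])   # tall, closed
bowls = np.array([[0.3, 0.9], [0.2, 0.8], [0.35, 0.85]])  # wide, open

def prototype_label(x):
    # Prototype theory: compare to the mental "average" of each category.
    protos = {"cup": cups.mean(axis=0), "bowl": bowls.mean(axis=0)}
    return min(protos, key=lambda c: np.linalg.norm(x - protos[c]))

def exemplar_label(x):
    # Exemplar theory: compare to every remembered instance (1-NN).
    all_x = np.vstack([cups, bowls])
    labels = ["cup"] * len(cups) + ["bowl"] * len(bowls)
    return labels[np.argmin(np.linalg.norm(all_x - x, axis=1))]

vessel = np.array([0.55, 0.5])  # ambiguous: between the two categories
print(prototype_label(vessel), exemplar_label(vessel))
```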
- Statistical Learning
The “connectionist” idea — neural networks emulating brain-like statistical learning — became dominant, excelling with large datasets.
Figure 2c: Statistical generalisation in early neural networks learns patterns from data without explicit rules.
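As a minimal connectionist sketch, the single artificial neuron below learns the logical-OR pattern purely from examples via gradient descent. No rule is ever written down; only weights are adjusted until the statistical pattern emerges:

```python
# One-neuron "network" inducing a pattern (logical OR) from data alone.
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([0.0, 1.0, 1.0, 1.0])  # target: OR of the two inputs

rng = np.random.default_rng(0)
w, b = rng.normal(size=2), 0.0

for _ in range(2000):                        # plain gradient descent
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # sigmoid activation
    grad = p - y                             # d(cross-entropy)/d(logit)
    w -= 0.5 * (X.T @ grad) / len(y)
    b -= 0.5 * grad.mean()

print(np.round(1.0 / (1.0 + np.exp(-(X @ w + b))), 2))
# -> approximately [0. 1. 1. 1.]: the pattern was learned without any rule
```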
Despite their success, statistical models still face longstanding critiques: lack of explicit semantics, poor explainability, and reliance on correlation over causation. Understanding these historical roots leads us into the paper’s framework for deciphering “generalisation.”
What Do We Mean by “Generalisation”?
The paper makes a key point: “generalisation” has multiple meanings, which they separate into three notions.
1. Generalisation as a Process
The act of creating general knowledge from data.
- Abstraction: Forming a broad concept from many examples.
- Extension: Adapting an existing model/schema to new scenarios.
- Analogy: Transferring and adapting a schema to a different context.
2. Generalisation as a Product
The output of that process.
- Symbolic rules (“All birds have wings”).
- Concepts/categories — as lists of features, prototypes, or exemplars.
- Probability distributions — typical in generative AI, representing patterns without explicit conceptual definitions.
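To make the three kinds of product concrete, here is a toy sketch representing the same concept, "bird", in all three forms: an explicit rule, a prototype, and a fitted probability distribution. All feature values are invented for illustration:

```python
# Three "products" of generalisation for one concept, as data structures.
import numpy as np

observed_wingspans = np.array([0.2, 0.25, 0.3, 0.9, 1.1])  # metres

# 1. Symbolic rule: an explicit, inspectable predicate.
is_bird_rule = lambda has_wings, lays_eggs: has_wings and lays_eggs

# 2. Concept as prototype: a summary point in feature space.
bird_prototype = {"wingspan_m": observed_wingspans.mean(), "has_wings": True}

# 3. Probability distribution: a pattern without an explicit definition,
#    here a Gaussian fitted to the observed wingspans.
mu, sigma = observed_wingspans.mean(), observed_wingspans.std()
def wingspan_likelihood(x):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

print(is_bird_rule(True, True), bird_prototype, wingspan_likelihood(0.3))
```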
3. Generalisation as an Operator
The ability to apply the product to new data to make accurate predictions.
Machine learning theory reminds us: generalisation starts where memorisation ends. A model that memorises every training detail won’t perform well on genuinely new data.
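A minimal illustration of that boundary, using an invented linear pattern: a lookup table reproduces its training data perfectly but has nothing to say about new inputs, while a fitted model captures the pattern itself:

```python
# Memorisation vs. generalisation in miniature.
import numpy as np

train_x = np.array([1.0, 2.0, 3.0, 4.0])
train_y = 2 * train_x + 1                       # hidden pattern: y = 2x + 1

memoriser = dict(zip(train_x, train_y))          # pure memorisation
slope, intercept = np.polyfit(train_x, train_y, deg=1)  # generalisation

print(memoriser.get(2.5))               # -> None: memorisation ends here
print(slope * 2.5 + intercept)          # -> 6.0: the fitted model generalises
```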
The Great Misalignment
Humans and machines differ in all three notions:
- Process: Humans favour abstraction and analogy; statistical AI favours data-driven pattern-finding.
- Product: Humans produce sparse, conceptual rules; statistical AI produces dense probability models.
- Operator: Human generalisation is robust and flexible, especially on out-of-distribution (OOD) data; machine generalisation is often brittle when faced with novelty.
Aligning the operator requires addressing the mismatch in process and product.
How AI Tries to Generalise: Three Competing Philosophies
The authors classify AI methods by their generalisation approach.
Table 1: AI methods are commonly structured by algorithmic details, but generalisation-oriented grouping offers deeper insight.
1. Statistical Methods
- Philosophy: Infer a model capturing the full data distribution from observations, optimising predictive accuracy via empirical risk minimisation (formalised after this list).
- Strengths: Universal approximation capacity; excels on massive, complex datasets; fast inference.
- Weaknesses: Generalisation limited to seen-distribution; poor OOD performance; black-box opacity.
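In symbols (standard learning-theory notation, supplied here for clarity rather than quoted from the paper): empirical risk minimisation picks the hypothesis that minimises average loss over the training sample, and generalisation means that this empirical risk tracks the true risk over the underlying data distribution:

```latex
% Empirical risk minimisation: choose f to minimise average training loss
\hat{f} \;=\; \arg\min_{f \in \mathcal{F}} \; \frac{1}{n} \sum_{i=1}^{n} \ell\big(f(x_i),\, y_i\big)
% Generalisation succeeds when this tracks the true (expected) risk
R(f) \;=\; \mathbb{E}_{(x, y) \sim \mathcal{D}} \big[\, \ell(f(x), y) \,\big]
```

The "seen-distribution" weakness above is exactly the gap between these two quantities: the guarantee presumes test data drawn from the same distribution D as the training data.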
2. Knowledge-Informed (Analytical) Methods
- Philosophy: Begin with explicit theories or models and use data to verify or refine them (a toy refinement is sketched after this list).
- Strengths: Explainable by design; strong compositionality; allows inspection.
- Weaknesses: Brittle; limited to domains with formal models; computationally heavy structure learning.
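As a toy sketch of this philosophy (the fever rule, threshold, and readings are all invented): start from an explicit rule and let data refine only its parameter, so the final model stays fully inspectable:

```python
# Knowledge-informed sketch: data refines an explicit rule's parameter.
import numpy as np

threshold = 37.0  # prior theory: "a temperature above 37.0 C is a fever"

# Labelled observations: (temperature in C, clinician-confirmed fever?)
data = [(36.8, False), (37.4, False), (37.9, True), (38.4, True), (37.6, True)]

# Refine the threshold against the evidence while keeping the rule's form.
candidates = np.arange(36.5, 39.0, 0.1)
def accuracy(t):
    return np.mean([(temp > t) == label for temp, label in data])
threshold = max(candidates, key=accuracy)

print(f"refined rule: fever if temperature > {threshold:.1f} C")
```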
3. Instance-Based (Lazy Learning) Methods
- Philosophy: Make predictions by finding the stored examples most similar to a new input (see the sketch after this list).
- Strengths: Flexible; robust to distribution shifts; suitable for continual/lifelong learning; good OOD detection.
- Weaknesses: Relies heavily on effective representations and similarity measures.
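A minimal sketch of lazy learning, including the distance-based OOD flagging mentioned above (the data and threshold are invented):

```python
# k-NN prediction with a simple distance-based out-of-distribution check.
import numpy as np

stored_X = np.array([[1.0, 1.0], [1.2, 0.9], [4.0, 4.2], [3.8, 4.0]])
stored_y = np.array(["cat", "cat", "dog", "dog"])

def knn_predict(x, k=3, ood_threshold=2.0):
    dists = np.linalg.norm(stored_X - x, axis=1)
    nearest = np.argsort(dists)[:k]
    # If even the closest stored example is far away, flag novelty rather
    # than guessing - the basis of instance-based OOD detection.
    if dists[nearest[0]] > ood_threshold:
        return "unknown (out of distribution)"
    labels, counts = np.unique(stored_y[nearest], return_counts=True)
    return labels[np.argmax(counts)]

print(knn_predict(np.array([1.1, 1.0])))   # -> "cat"
print(knn_predict(np.array([9.0, 9.0])))   # -> flagged as OOD
```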
Table 2: Trade-offs between statistical, knowledge-informed, and instance-based methods. Each excels in certain properties but falls short in others.
Hybrids like neurosymbolic AI aim to combine these strengths.
Measuring Generalisation
Traditional evaluation relies on a train/test split under the IID assumption (independent and identically distributed data). This breaks down for foundation models, which have likely been exposed to “test” data during training (data contamination).
The authors highlight three critical evaluation aspects:
Measuring Distributional Shifts
Using statistical distances, adversarial perturbations, and counterfactuals to judge robustness.
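As one concrete example of a statistical distance, this sketch estimates the KL divergence between histograms of a training sample and a deployment sample (both synthetic, with the shift simulated for illustration):

```python
# Detecting a distributional shift via KL divergence between histograms.
import numpy as np

rng = np.random.default_rng(0)
train  = rng.normal(0.0, 1.0, 10_000)   # distribution seen in training
deploy = rng.normal(0.7, 1.2, 10_000)   # shifted distribution at test time

bins = np.linspace(-5, 5, 41)
p, _ = np.histogram(train,  bins=bins, density=True)
q, _ = np.histogram(deploy, bins=bins, density=True)
p, q = p + 1e-9, q + 1e-9               # smooth away empty bins

kl = np.sum(p * np.log(p / q)) * (bins[1] - bins[0])
print(f"KL(train || deploy) = {kl:.3f}")  # near 0 means no detected shift
```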
Determining Under- and Overgeneralisation
- Undergeneralisation: Failing to produce consistent outputs for slightly varied inputs (e.g., prompt sensitivity; a toy probe is sketched after this list).
- Overgeneralisation: Ignoring crucial differences (e.g., hallucinations, biased predictions).
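Here is the toy undergeneralisation probe referenced above: query paraphrases of the same question and measure answer consistency. The `toy_model` stub is entirely hypothetical and deliberately prompt-sensitive, standing in for a real system:

```python
# Toy prompt-sensitivity probe: consistency across paraphrases.
from collections import Counter

def toy_model(prompt: str) -> str:
    # Hypothetical, deliberately prompt-sensitive stand-in for an LLM.
    return "Paris" if prompt.endswith("?") else "paris, france"

paraphrases = [
    "What is the capital of France?",
    "Name the capital city of France.",
    "France's capital is which city?",
]

answers = [toy_model(p) for p in paraphrases]
consistency = Counter(answers).most_common(1)[0][1] / len(answers)
print(answers, f"consistency = {consistency:.2f}")
# Low consistency on meaning-preserving rewrites signals undergeneralisation.
```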
Distinguishing Memorisation from Generalisation
Deciding when to memorise facts (“Paris is the capital of France”) versus generalising concepts (“What makes something a capital city”), and testing tasks combining both.
Table 3: Mapping desired generalisation properties to AI method families and evaluation approaches, showing the complementary strengths of statistical (S), analytical (A), and instance-based (I) methods.
The Road Ahead: Charting a Course for Alignment
The conclusion is a call to action and outlines emerging directions:
New Theories for Foundation Models
Zero-shot and in-context learning defy classical learning theory; new theory must explain why these abilities work and when they fail. Concepts like invariances or analogies may be key.
Generalisable Neurosymbolic Methods
True hybrids that combine the learning power of neural networks with the compositional generalisation of symbolic methods. Challenges include richer symbolic representations and provable properties.
Generalisation in Continual Learning
Avoiding catastrophic forgetting through symbolic constraints or instance-based rehearsal, and detecting distribution drift early.
Better Evaluation Frameworks
Moving beyond train/test splits: benchmarks for abstraction, analogy, and complex reasoning; evaluation servers; simulation environments.
Aligning the Process, Not Just the Output
Debugging disagreements requires conceptual-level alignment through a shared, explainable representation that bridges human causal models and AI’s statistical associations.
Aligning generalisation between humans and machines is about building AI that can reason, adapt, and collaborate compatibly with our cognition. This is the path to creating partners that don’t just mimic intelligence but genuinely get it — making them safer, more reliable, and ultimately more useful.