Large Language Models (LLMs) like GPT-4 and Gemini are computational powerhouses, capable of writing code, composing poetry, and answering a vast range of questions. But for all their might, they have an Achilles’ heel: complex, multi-step reasoning puzzles. Tasks like solving a tricky Sudoku or deciphering the abstract patterns in the ARC-AGI benchmark can cause even the most advanced LLMs to stumble. Their auto-regressive, token-by-token generation process means a single mistake can derail the entire solution, with no easy way to backtrack and correct course.
Researchers have developed techniques like Chain-of-Thought (CoT) prompting to coax LLMs into “thinking” step-by-step, which helps but doesn’t solve the core problem. What if, instead of building ever-larger models, we could design smaller, more efficient systems that excel at this kind of iterative reasoning?
A recent paper, Less is More: Recursive Reasoning with Tiny Networks, explores exactly this. The authors introduce the Tiny Recursive Model (TRM), a remarkably small and simple model that achieves stunning performance on the exact kinds of puzzles that stump massive LLMs. With as few as 7 million parameters—less than 0.01% of the size of models like GPT-3—TRM sets new state-of-the-art results on benchmarks like Sudoku, Maze, and ARC-AGI.
This article dives deep into how TRM works. We’ll first explore its predecessor, the Hierarchical Reasoning Model (HRM), to understand the foundation it builds upon. Then, we’ll unpack the elegant simplifications that make TRM so effective, and finally, we’ll look at the jaw-dropping results that prove sometimes, less truly is more.
Background: The Promise and Complexity of Hierarchical Reasoning (HRM)
TRM didn’t appear in a vacuum—it’s a direct evolution of a model called the Hierarchical Reasoning Model (HRM). HRM was a novel approach that showed great promise by using two small neural networks that recursively call each other to refine a solution. Its design was inspired by complex biological arguments about how the brain processes information at different frequencies.
Let’s break down HRM’s key components:
Recursive Hierarchical Reasoning
HRM uses two networks: a low-level network \(f_L\) and a high-level network \(f_H\). \(f_L\) recurses at a high frequency to process fine-grained details, while \(f_H\) recurses less often to integrate information. These networks operate on two latent feature vectors: \(z_L\) and \(z_H\).

Deep Supervision

Instead of training the model to get the right answer in one shot, HRM uses an iterative process. Over up to 16 supervision steps, the model takes its previous output and latent features as input and tries to improve them. This emulates a very deep network without the massive memory cost of a single forward pass.

1-Step Gradient Approximation

A full forward pass in HRM involves many recursive calls. Backpropagating through all of them would be computationally expensive. To get around this, HRM’s authors used a clever (but potentially flawed) shortcut: the Implicit Function Theorem (IFT), which under certain conditions lets you approximate the gradient by only backpropagating through the last step. This relies on the assumption that the recursion converges to a fixed point:

\[ z_L^* \approx f_L\left(z_L^* + z_H + x\right), \qquad z_H^* \approx f_H\left(z_L + z_H^*\right) \]

Adaptive Computational Time (ACT)
To make training more efficient, HRM uses a Q-learning mechanism to decide when a solution is “good enough” and the model can stop iterating on a particular training example, avoiding spending all 16 steps on every sample.
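To make the interplay of these pieces concrete, here is a minimal PyTorch-style sketch of an HRM-style forward pass with the 1-step gradient approximation. The additive way inputs are combined and the recursion counts (n_low, n_high) are illustrative assumptions, not the paper’s exact implementation.

```python
import torch
import torch.nn as nn

def hrm_forward(f_L: nn.Module, f_H: nn.Module,
                x: torch.Tensor, z_L: torch.Tensor, z_H: torch.Tensor,
                n_low: int = 6, n_high: int = 2):
    """One HRM-style forward pass using the 1-step gradient approximation:
    every recursive call runs without gradients except one final update of
    each network, which leans on the fixed-point assumption above."""
    with torch.no_grad():
        for _ in range(n_high):
            for _ in range(n_low):
                z_L = f_L(z_L + z_H + x)   # high-frequency, fine-grained updates
            z_H = f_H(z_L + z_H)           # low-frequency integration
    # Backpropagate only through this last pair of calls.
    z_L = f_L(z_L + z_H + x)
    z_H = f_H(z_L + z_H)
    return z_L, z_H
```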
HRM was a breakthrough, achieving high accuracy on puzzles where other models struggled. But it was also complex—reliant on uncertain biological analogies and fixed-point assumptions that weren’t guaranteed to hold—making it difficult to understand and improve. This is where TRM enters the picture.
The Core Method: Unpacking the Tiny Recursive Model (TRM)
The creators of TRM took a hard look at HRM and asked: can we achieve the same or better results by stripping away the complexity? The answer was a resounding yes. TRM is a masterclass in simplification—making changes that reduce model size while dramatically boosting performance.
The overall architecture of TRM is illustrated below.
Figure 1. TRM recursively improves its predicted answer y by iterating on its latent reasoning state z, guided by the input question x.
1. Ditching the Fixed-Point Theorem for Full Backpropagation
TRM abandons HRM’s 1-step gradient shortcut. The fixed-point assumption was unlikely to be met after just a few recursions, so instead TRM defines a full recursion process and backpropagates through all of it:
A full recursion process consists of n updates to the reasoning vector z and one update to the answer vector y:

\[ z \leftarrow f\left(x + y + z\right) \ (\text{repeated } n \text{ times}), \qquad y \leftarrow f\left(y + z\right) \]

To keep training efficient, TRM performs T-1 of these recursion processes without gradients, using them to refine the latent states, then does one final recursion process with gradients for the learning update.
This change had a massive impact: in ablation studies, switching from 1-step gradient to full backpropagation boosted Sudoku-Extreme accuracy from 56.5% to 87.4%.
Table 1. Design choice contributions to TRM’s final performance.
2. A Simpler, More Intuitive View of Latent Features
HRM’s two latent features, \(z_L\) and \(z_H\), were explained via biological “hierarchies.” TRM offers a simpler interpretation:
- \(z_H\) → y — the current embedded answer.
- \(z_L\) → z — the latent reasoning or “scratchpad.”
To refine a solution, the model needs three things: the original question (x), the previous answer (y), and the reasoning chain that led to it (z). Forgetting any of these weakens its ability to improve the solution.
A Sudoku visual (Figure 6) makes this clear: decoding y yields a nearly-correct grid, while decoding z yields an indecipherable numeric map—evidence that z is truly latent reasoning.
Figure 6a. Input x for a Sudoku-Extreme puzzle.
Figure 6b. Tokenized z_H corresponds directly to the predicted solution.
Empirically, the two-feature (y and z) design outperforms both single-feature and multi-feature variants.
Table 2. Two separate features yield the highest accuracy.
3. One Network to Rule Them All
HRM used two networks, doubling parameters. TRM unifies both \(f_L\) and \(f_H\) into a single network that learns both tasks, distinguishable by inputs (presence or absence of x). This halved parameters and improved Sudoku accuracy from 82.4% to 87.4%.
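A hedged sketch of what the shared network can look like in practice: the residual MLP block below (the class name TinyRecursiveNet, the dimension, and the shapes are illustrative stand-ins, not the paper’s exact architecture), with the key point being simply that the same weights serve both updates:

```python
import torch
import torch.nn as nn

class TinyRecursiveNet(nn.Module):
    """One small network reused for both roles; the two update types are
    told apart only by whether the question embedding x is in the input."""
    def __init__(self, dim: int = 512):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(dim, 4 * dim), nn.GELU(),
            nn.Linear(4 * dim, dim), nn.LayerNorm(dim),
        )

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        return h + self.block(h)   # residual update of the combined state

net = TinyRecursiveNet()
x, y, z = (torch.randn(1, 81, 512) for _ in range(3))  # e.g. 81 Sudoku cells
z = net(x + y + z)   # reasoning update: question included
y = net(y + z)       # answer update: question omitted, same weights
```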
4. “Less is More”: Tiny is Better
Scaling from 2 layers to 4 decreased accuracy—larger models overfit with limited data. TRM’s small 2-layer network, unrolled via recursion, achieves necessary depth during inference while staying generalizable.
5. Other Smart Refinements
- Simpler ACT: Replaced Q-learning halting with a single binary cross-entropy loss, removing the need for a second forward pass.
- Attention-Free for Small Inputs: For 9x9 Sudoku, replacing attention with an MLP-Mixer improved performance. Larger grids still benefit from attention.
- EMA of Weights: Smooths training, prevents collapse on small datasets.
The streamlined TRM pseudocode reflects its efficiency:
Figure 3. TRM loop structure: recursive updates precede each supervision step’s gradient update.
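Figure 3 itself is pseudocode, so here is a hedged Python rendering of that loop structure under the same assumptions as the sketches above (additive inputs; illustrative n, T, and up to 16 supervision steps). It also folds in the simplified binary cross-entropy halting loss from the refinements list; the head names out_head and q_head, and the exact loss combination, are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def trm_training_step(net, out_head, q_head, opt, x, y, z, y_true,
                      n: int = 6, T: int = 3, n_sup: int = 16):
    """Deep supervision: up to n_sup improvement steps for one batch. Each step
    refines (y, z) with T-1 gradient-free recursion processes, runs one final
    process with gradients, applies the losses, and carries the state forward."""
    for _ in range(n_sup):
        with torch.no_grad():                        # cheap latent refinement
            for _ in range(T - 1):
                for _ in range(n):
                    z = net(x + y + z)
                y = net(y + z)
        for _ in range(n):                           # final process: full backprop
            z = net(x + y + z)
        y = net(y + z)

        logits = out_head(y)                         # decode answer tokens
        q_logit = q_head(y.mean(dim=1)).squeeze(-1)  # halting logit per sample
        solved = (logits.argmax(-1) == y_true).all(-1).float()
        loss = F.cross_entropy(logits.transpose(1, 2), y_true) \
             + F.binary_cross_entropy_with_logits(q_logit, solved)
        opt.zero_grad()
        loss.backward()
        opt.step()

        y, z = y.detach(), z.detach()                # state for next supervision step
        if bool(solved.all()):                       # ACT-style early stop
            break
    return y, z
```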
Experiments and Results: Tiny Networks, Giant Performance
The models were tested on:
- Sudoku-Extreme: 9x9 Sudokus, only 1k training puzzles.
- Maze-Hard: Pathfinding in complex 30x30 mazes.
- ARC-AGI-1/2: Abstract geometric reasoning puzzles.
Sudoku & Maze Puzzles
TRM doesn’t just edge out HRM—it crushes it.
Table 4. TRM dominates puzzle benchmarks with tiny parameter counts.
Multi-billion parameter LLMs score 0.0%. HRM manages 55.0% (Sudoku) and 74.5% (Maze). TRM’s MLP variant hits 87.4% on Sudoku, while the attention-based variant scores 85.3% on Maze—using just 5–7M parameters.
The ARC-AGI Challenge
ARC-AGI is considered a grand reasoning challenge. TRM again shines.
Table 5. TRM-Att surpasses HRM and several powerful LLMs.
The 7M-param TRM-Att scores 44.6% (ARC-1) and 7.8% (ARC-2), beating HRM’s 40.3%/5.0%, outperforming Gemini 2.5 Pro, and challenging far larger bespoke models.
Even when matched for “effective depth” (layers × recursions), TRM delivers higher accuracy than HRM:
Table 3. TRM’s design proves superior at similar computational depths.
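For a concrete sense of scale, effective depth per supervision step can be counted as layers per network call, times calls per recursion process, times processes. The specific values below (2 layers, n = 6 reasoning updates plus 1 answer update, T = 3 processes) are illustrative assumptions rather than figures quoted from the tables:

\[ \text{effective depth} = 2 \times (n + 1) \times T = 2 \times 7 \times 3 = 42 \text{ layers per supervision step.} \]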
Conclusion: A New Path for AI Reasoning?
The Tiny Recursive Model (TRM) is a powerful demonstration that bigger isn’t always better. By taking a complex model (HRM) and ruthlessly simplifying it, the researchers created something far more elegant, efficient, and effective.
Key takeaways:
- Directness over Shortcuts: Full backpropagation outperformed fixed-point gradient approximations.
- Intuition over Dogma: The simple answer + reasoning design improved both clarity and results.
- Less is More: Tiny recursive networks avoid overfitting and leverage computation depth via recursion.
TRM’s success suggests an exciting alternative path for building advanced AI reasoning systems. Instead of scaling giant, general-purpose models, we can craft small, specialized models that iteratively refine their solutions. This approach is not only parameter-efficient but may be essential for solving the kind of logical puzzles that underpin true intelligence.
The journey is far from over, but TRM has shown that sometimes, the most profound solutions come in the smallest packages.