Large Language Models (LLMs) like GPT-4 and Gemini are computational powerhouses, capable of writing code, composing poetry, and answering a vast range of questions. But for all their might, they have an Achilles’ heel: complex, multi-step reasoning puzzles. Tasks like solving a tricky Sudoku or deciphering the abstract patterns in the ARC-AGI benchmark can cause even the most advanced LLMs to stumble. Their auto-regressive, token-by-token generation process means a single mistake can derail the entire solution, with no easy way to backtrack and correct course.

Researchers have developed techniques like Chain-of-Thought (CoT) prompting to coax LLMs into “thinking” step-by-step, which helps but doesn’t solve the core problem. What if, instead of building ever-larger models, we could design smaller, more efficient systems that excel at this kind of iterative reasoning?

A recent paper, Less is More: Recursive Reasoning with Tiny Networks, explores exactly this. The authors introduce the Tiny Recursive Model (TRM), a remarkably small and simple model that achieves stunning performance on the exact kinds of puzzles that stump massive LLMs. With as few as 7 million parameters (less than 0.01% of the size of models like GPT-3), TRM sets new state-of-the-art results on benchmarks like Sudoku-Extreme, Maze-Hard, and ARC-AGI.

This article dives deep into how TRM works. We’ll first explore its predecessor, the Hierarchical Reasoning Model (HRM), to understand the foundation it builds upon. Then, we’ll unpack the elegant simplifications that make TRM so effective, and finally, we’ll look at the jaw-dropping results that prove sometimes, less truly is more.

Background: The Promise and Complexity of Hierarchical Reasoning (HRM)

TRM didn’t appear in a vacuum—it’s a direct evolution of a model called the Hierarchical Reasoning Model (HRM). HRM was a novel approach that showed great promise by using two small neural networks that recursively call each other to refine a solution. Its design was inspired by complex biological arguments about how the brain processes information at different frequencies.

Let’s break down HRM’s key components:

  1. Recursive Hierarchical Reasoning
    HRM uses two networks: a low-level network \(f_L\) and a high-level network \(f_H\). \(f_L\) recurses at a high frequency to process fine-grained details, while \(f_H\) recurses less often to integrate information. These networks operate on two latent feature vectors: \(z_L\) and \(z_H\).

  2. Deep Supervision
    Instead of training the model to get the right answer in one shot, HRM uses an iterative process. Over up to 16 supervision steps, the model takes its previous output and latent features as input and tries to improve them. This emulates a very deep network without the massive memory cost of a single forward pass.

  3. 1-Step Gradient Approximation
    A full forward pass in HRM involves many recursive calls. Backpropagating through all of them would be computationally expensive. To get around this, HRM’s authors used a clever (but potentially flawed) shortcut: the Implicit Function Theorem (IFT), which under certain conditions lets you approximate the gradient by backpropagating through only the last step. This relies on the assumption that the recursion converges to a fixed point (a minimal sketch of this trick appears after this list):

    \[ z_L^* \approx f_L\left(z_L^* + z_H + x\right) \]


    \[ z_H^* \approx f_H\left(z_L^* + z_H^*\right) \]
  4. Adaptive Computational Time (ACT)
    To make training more efficient, HRM uses a Q-learning mechanism to decide when a solution is “good enough” and the model can stop iterating on a particular training example, avoiding spending all 16 steps on every sample.
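To make the 1-step gradient idea concrete, here is a minimal PyTorch-style sketch (the network definitions, dimensions, and loop counts are hypothetical stand-ins, not HRM’s actual code): the recursion toward the fixed point runs without gradient tracking, and only the final pair of updates enters the autograd graph.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for HRM's two small networks.
f_L = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))
f_H = nn.Sequential(nn.Linear(64, 64), nn.GELU(), nn.Linear(64, 64))

def hrm_one_step_gradient(x, z_L, z_H, n_cycles=2, n_low=4):
    """Recurse toward (approximately) a fixed point without gradients, then
    backpropagate through only the last low-level and high-level updates."""
    with torch.no_grad():
        for _ in range(n_cycles):
            for _ in range(n_low):       # low-level net recurses frequently
                z_L = f_L(z_L + z_H + x)
            z_H = f_H(z_L + z_H)         # high-level net integrates less often
    # 1-step gradient: only these two calls appear in the autograd graph.
    z_L = f_L(z_L + z_H + x)
    z_H = f_H(z_L + z_H)
    return z_L, z_H

x = torch.randn(8, 64)                   # toy embedded question
z_L = torch.zeros(8, 64)
z_H = torch.zeros(8, 64)
z_L, z_H = hrm_one_step_gradient(x, z_L, z_H)
loss = z_H.pow(2).mean()                 # placeholder loss
loss.backward()                          # gradients flow through one step only
```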

HRM was a breakthrough, achieving high accuracy on puzzles where other models struggled. But it was also complex—reliant on uncertain biological analogies and fixed-point assumptions that weren’t guaranteed to hold—making it difficult to understand and improve. This is where TRM enters the picture.

The Core Method: Unpacking the Tiny Recursive Model (TRM)

The creators of TRM took a hard look at HRM and asked: can we achieve the same or better results by stripping away the complexity? The answer was a resounding yes. TRM is a masterclass in simplification—making changes that reduce model size while dramatically boosting performance.

The overall architecture of TRM is illustrated below.

Figure 1. The Tiny Recursive Model (TRM) takes the question (x), its current answer (y), and a latent reasoning state (z), and recursively updates them to produce a better answer. This loop repeats over several supervision steps to progressively refine the solution.

1. Ditching the Fixed-Point Theorem for Full Backpropagation

TRM abandons HRM’s 1-step gradient shortcut. Since the fixed-point assumption is unlikely to hold after only a few recursions, TRM instead defines a full recursion process and backpropagates through all of it.

A full recursion process consists of n updates to the reasoning state \(z_L\) followed by one update to the answer state \(z_H\):

\[ z_L \leftarrow f_L(z_L + z_H + x) \quad \text{(repeated } n \text{ times)} \]

\[ z_H \leftarrow f_H(z_L + z_H) \]

To keep training efficient, TRM performs T-1 of these recursion processes without gradients, using them to refine the latent states, then does one final recursion with gradients for the learning update.
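A rough PyTorch-style sketch of this training recursion (the function names and the defaults n = 6 and T = 3 are illustrative assumptions, not the paper’s exact code) could look like this:

```python
import torch

def recursion_process(f_L, f_H, x, z_L, z_H, n=6):
    """One full recursion process: n reasoning updates followed by one
    answer update. Gradients flow through every call when this runs
    outside of torch.no_grad()."""
    for _ in range(n):
        z_L = f_L(z_L + z_H + x)   # refine the latent reasoning
    z_H = f_H(z_L + z_H)           # refine the answer
    return z_L, z_H

def trm_supervision_step(f_L, f_H, x, z_L, z_H, T=3, n=6):
    # The first T-1 processes only improve the latent states (no autograd graph)...
    with torch.no_grad():
        for _ in range(T - 1):
            z_L, z_H = recursion_process(f_L, f_H, x, z_L, z_H, n)
    # ...and the final process is fully backpropagated through.
    return recursion_process(f_L, f_H, x, z_L, z_H, n)
```

Compared with the 1-step shortcut sketched earlier, the only difference is that the final recursion process is backpropagated end to end rather than only through its last two updates.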

This change had a massive impact: in ablation studies, switching from 1-step gradient to full backpropagation boosted Sudoku-Extreme accuracy from 56.5% to 87.4%.

Table 1. Ablation study of TRM on the Sudoku-Extreme dataset, showing how each design choice contributes to final performance. The jump from the 1-step gradient approximation (56.5%) to the final model with full backpropagation (87.4%) is striking.

2. A Simpler, More Intuitive View of Latent Features

HRM’s two latent features, \(z_L\) and \(z_H\), were explained via biological “hierarchies.” TRM offers a simpler interpretation:

  • \(z_H\) → y — the current embedded answer.
  • \(z_L\) → z — the latent reasoning or “scratchpad.”

To refine a solution, the model needs three things: the original question (x), the previous answer (y), and the reasoning chain that led to it (z). Forgetting any of these weakens its ability to improve the solution.
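Concretely, under this relabeling the recursion from the previous section reads exactly the same, only with the variables renamed:

\[ z \leftarrow f_L(x + y + z) \quad \text{(repeated } n \text{ times)} \]

\[ y \leftarrow f_H(y + z) \]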

A Sudoku visual (Figure 6) makes this clear: decoding y yields a nearly-correct grid, while decoding z yields an indecipherable numeric map—evidence that z is truly latent reasoning.

Figure 6a. The input Sudoku-Extreme puzzle, x.

Figure 6b. The tokenized z_H (the answer y) closely matches the correct solution.

Empirically, the two-feature (y and z) design outperforms both single-feature and multi-feature variants.

Table 2. TRM’s accuracy on Sudoku-Extreme with different numbers of latent features: the standard two-feature design (y and z) is the clear winner.

3. One Network to Rule Them All

HRM used two networks, which doubled the parameter count. TRM unifies \(f_L\) and \(f_H\) into a single network that learns both tasks, distinguished by its inputs (the presence or absence of x). This halved the parameter count and improved Sudoku-Extreme accuracy from 82.4% to 87.4%.
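One way to picture the shared network (a hypothetical sketch, not the authors’ implementation; the module name and dimensions are made up): the same weights perform both updates, and the presence or absence of the question embedding x tells the network which job it is doing.

```python
import torch
import torch.nn as nn

class TinyRecursiveNet(nn.Module):
    """A single small network used for both update types (illustrative)."""
    def __init__(self, dim=64):
        super().__init__()
        self.block = nn.Sequential(
            nn.Linear(dim, dim), nn.GELU(),
            nn.Linear(dim, dim),
        )

    def forward(self, y, z, x=None):
        # Including x signals a reasoning (z) update; omitting it signals
        # an answer (y) update -- the same weights handle both.
        h = y + z if x is None else x + y + z
        return self.block(h)

net = TinyRecursiveNet()
x, y, z = torch.randn(8, 64), torch.zeros(8, 64), torch.zeros(8, 64)
z = net(y, z, x)   # reasoning update uses the question
y = net(y, z)      # answer update does not
```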

4. “Less is More”: Tiny is Better

Scaling the network from 2 layers to 4 actually decreased accuracy: with so little training data, larger models overfit. TRM’s tiny 2-layer network, unrolled through recursion, reaches the necessary effective depth at inference time while still generalizing well.
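As a back-of-the-envelope illustration with hypothetical settings (a 2-layer network, n = 6 reasoning updates plus one answer update per recursion process, and T = 3 processes per supervision step), the effective depth unrolled per supervision step would be:

\[ \text{effective depth} = T \times (n + 1) \times n_{\text{layers}} = 3 \times 7 \times 2 = 42 \]

so a 2-layer network behaves like a much deeper one at inference time without ever storing a 42-layer model.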

5. Other Smart Refinements

  • Simpler ACT: Replaced the Q-learning halting mechanism with a single binary cross-entropy loss on whether the current answer is correct, removing the need for a second forward pass (see the sketch after this list).
  • Attention-Free for Small Inputs: For 9x9 Sudoku, replacing attention with an MLP-Mixer improved performance. Larger grids still benefit from attention.
  • EMA of Weights: An exponential moving average of the model weights smooths training and helps prevent collapse on small datasets.
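For the halting refinement, here is a minimal sketch of the idea (the head, shapes, and names below are illustrative assumptions, not the paper’s exact implementation): a scalar head is trained with binary cross-entropy to predict whether the current predicted answer is already fully correct, so no second forward pass is required.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

halt_head = nn.Linear(64, 1)  # hypothetical scalar halting head on the answer latent

def halting_loss(y_latent, y_pred_tokens, target_tokens):
    """Binary cross-entropy on 'is the current answer fully correct?'.
    Unlike Q-learning-based ACT, this needs no second forward pass."""
    halt_logit = halt_head(y_latent.mean(dim=1)).squeeze(-1)           # (batch,)
    is_correct = (y_pred_tokens == target_tokens).all(dim=-1).float()  # (batch,)
    return F.binary_cross_entropy_with_logits(halt_logit, is_correct)

# Toy usage: an 81-cell Sudoku grid with a 64-dimensional latent per cell.
y_latent = torch.randn(8, 81, 64)
y_pred = torch.randint(0, 10, (8, 81))
target = torch.randint(0, 10, (8, 81))
loss = halting_loss(y_latent, y_pred, target)
```

At inference time, the model can stop iterating on an example once the sigmoid of the halting logit clears a chosen threshold.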

The streamlined TRM pseudocode reflects its efficiency:

Figure 3. Pseudocode for the Tiny Recursive Model (TRM): recursive updates to z and y precede each supervision step’s gradient update.

Experiments and Results: Tiny Networks, Giant Performance

The models were tested on:

  • Sudoku-Extreme: 9x9 Sudokus, only 1k training puzzles.
  • Maze-Hard: Pathfinding in complex 30x30 mazes.
  • ARC-AGI-1/2: Abstract geometric reasoning puzzles.

Sudoku & Maze Puzzles

TRM doesn’t just edge out HRM—it crushes it.

Table 4. Test accuracy on Sudoku-Extreme and Maze-Hard: massive LLMs score 0.0%, while TRM reaches 87.4% and 85.3% with tiny parameter counts.

Multi-billion parameter LLMs score 0.0%. HRM manages 55.0% (Sudoku) and 74.5% (Maze). TRM’s MLP variant hits 87.4% on Sudoku, while the attention-based variant scores 85.3% on Maze—using just 5–7M parameters.

The ARC-AGI Challenge

ARC-AGI is considered a grand reasoning challenge. TRM again shines.

Table 5. Test accuracy on ARC-AGI-1 and ARC-AGI-2: TRM-Att outperforms HRM and rival LLMs such as Gemini 2.5 Pro.

The 7M-param TRM-Att scores 44.6% (ARC-1) and 7.8% (ARC-2), beating HRM’s 40.3%/5.0%, outperforming Gemini 2.5 Pro, and challenging far larger bespoke models.

Even when matched for “effective depth” (layers × recursions), TRM delivers higher accuracy than HRM:

Table 3. HRM vs. TRM on Sudoku-Extreme at matched effective depth (layers × recursions): TRM’s design proves superior at similar computational cost.

Conclusion: A New Path for AI Reasoning?

The Tiny Recursive Model (TRM) is a powerful demonstration that bigger isn’t always better. By taking a complex model (HRM) and ruthlessly simplifying it, the researchers created something far more elegant, efficient, and effective.

Key takeaways:

  1. Directness over Shortcuts: Full backpropagation outperformed fixed-point gradient approximations.
  2. Intuition over Dogma: Simple answer + reasoning design improved both clarity and results.
  3. Less is More: Tiny recursive networks avoid overfitting and leverage computation depth via recursion.

TRM’s success suggests an exciting alternative path for building advanced AI reasoning systems. Instead of scaling giant, general-purpose models, we can craft small, specialized models that iteratively refine their solutions. This approach is not only parameter-efficient but may be essential for solving the kind of logical puzzles that underpin true intelligence.

The journey is far from over, but TRM has shown that sometimes, the most profound solutions come in the smallest packages.