![Cover image](https://deep-paper.org/en/paper/2509.03646/images/cover.png)
# How LLMs Learn to Think – Unpacking Hierarchical Reasoning in AI
Reinforcement Learning (RL) has been a game-changer for Large Language Models (LLMs), dramatically boosting their ability to solve complex reasoning problems. Yet as models improve, a fundamental question remains unanswered: how exactly does this improvement happen? The training process often feels like a black box, producing curious phenomena such as sudden “aha moments,” where a model appears to acquire an emergent skill, or “length-scaling,” where longer, more detailed solutions lead to higher accuracy. ...