![Cover image](https://deep-paper.org/en/paper/2502.06768/images/cover.png)
The Hard Road to Smarter Models: Why Masked Diffusion Beats Autoregression on Logic Puzzles
If you have used ChatGPT or any modern Large Language Model (LLM), you have interacted with an Autoregressive Model (ARM). These models generate text in a very specific way: token by token, from left to right. They are incredibly successful, but they are also rigid. They must decide what comes next based entirely on what came before. But what if the “next” token isn’t the easiest one to predict? What if the end of the sentence is easier to guess than the middle? ...
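The left-to-right constraint described above can be made concrete with a toy sketch. This is purely illustrative (the model is a random stand-in, and all names are invented here, not taken from the paper): an autoregressive generator must fill position t before position t+1, while a masked-style generator may fill positions in any order.

```python
# Toy contrast: autoregressive (left-to-right) vs. any-order generation.
# `fake_predict` is a stand-in for a real model's next-token prediction.
import random

random.seed(0)
VOCAB = ["the", "cat", "sat", "down"]

def fake_predict(partial_tokens):
    # A real model would condition on the visible tokens; here we just sample.
    return random.choice(VOCAB)

def autoregressive_generate(length):
    # ARMs commit to position 0, then 1, then 2, ... with no way back.
    tokens = []
    for _ in range(length):
        tokens.append(fake_predict(tokens))
    return tokens

def any_order_generate(length):
    # A masked model can unmask positions in any order it likes,
    # e.g. filling the "easy" end of a sentence before the middle.
    tokens = [None] * length
    for pos in random.sample(range(length), length):
        tokens[pos] = fake_predict(tokens)
    return tokens

print(autoregressive_generate(4))
print(any_order_generate(4))
```

The point of the contrast is the fill order, not the predictions themselves: both loops produce four tokens, but only the second is free to choose which position to resolve next.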