Beyond the Hype—Are Transformers Actually Good at Learning Basic n-grams?
If you have been following the explosion of Natural Language Processing (NLP) in recent years, you know that the Transformer architecture is the engine behind the revolution. From GPT-4 to Claude, Transformers seem capable of mastering complex reasoning, coding, and creative writing. But in the research world, a fundamental question remains: do we actually understand how they learn?

There is a significant body of theoretical work exploring what Transformers can represent. For example, we know mathematically that a Transformer is capable of mimicking an n-gram language model (a simple model that predicts the next word based on the previous \(n-1\) words). But just because a neural network can represent a function doesn't mean it will actually learn that function from data using gradient descent. ...
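To make the baseline concrete, here is a minimal sketch of the kind of n-gram model referred to above: a count-based model that predicts the next word from the previous \(n-1\) words by maximum likelihood. The corpus and function names are illustrative, not from the paper.

```python
from collections import Counter, defaultdict

def train_ngram(tokens, n):
    """Count (context -> next word) occurrences for an n-gram model."""
    counts = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i : i + n - 1])  # previous n-1 words
        counts[context][tokens[i + n - 1]] += 1
    return counts

def predict(counts, context):
    """Return the most frequent next word after the given (n-1)-word context."""
    return counts[tuple(context)].most_common(1)[0][0]

# Toy corpus (hypothetical), trigram model: condition on the previous 2 words.
corpus = "the cat sat on the mat the cat ate the fish".split()
model = train_ngram(corpus, n=3)
next_word = predict(model, ["the", "cat"])  # either "sat" or "ate" here
```

A Transformer can represent exactly this lookup-table behavior with its attention mechanism; the open question the article raises is whether gradient descent actually finds such a solution.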