Breaking the ReLU Barrier: How to Turn Arbitrary Dense Models into Efficient Mixtures-of-Experts
The scale of Large Language Models (LLMs) is exploding. From GPT-4 to Llama, models are getting bigger, smarter, and—crucially—much more expensive to run. The primary culprit for this cost is the dense nature of these architectures: every time you ask a question, every single parameter in the model is activated to calculate the answer. Imagine a library where, to answer a single question, the librarian has to open and read every single book on the shelves. That is a dense model. ...
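To make the dense-versus-sparse contrast concrete, here is a minimal NumPy sketch (all names and dimensions are hypothetical, not from the paper) comparing a dense feed-forward pass, where every weight participates for every input, with top-k mixture-of-experts routing, where only a small fraction of the weights is touched per token:

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_ff, n_experts, top_k = 16, 64, 8, 2

# Dense FFN: every parameter participates in every forward pass.
W_in = rng.standard_normal((d_model, d_ff))
W_out = rng.standard_normal((d_ff, d_model))

def dense_ffn(x):
    # All 2 * d_model * d_ff weights are read for every single token.
    return np.maximum(x @ W_in, 0) @ W_out

# MoE: the same total capacity split across experts, of which only
# top_k are activated per token, selected by a learned gate.
experts_in = rng.standard_normal((n_experts, d_model, d_ff // n_experts))
experts_out = rng.standard_normal((n_experts, d_ff // n_experts, d_model))
W_gate = rng.standard_normal((d_model, n_experts))

def moe_ffn(x):
    logits = x @ W_gate
    chosen = np.argsort(logits)[-top_k:]      # route to the top_k experts
    weights = np.exp(logits[chosen])
    weights /= weights.sum()                  # softmax over the chosen experts
    out = np.zeros(d_model)
    for w, e in zip(weights, chosen):
        # Only this expert's weights are ever read for this token.
        h = np.maximum(x @ experts_in[e], 0)
        out += w * (h @ experts_out[e])
    return out

x = rng.standard_normal(d_model)
print(dense_ffn(x).shape, moe_ffn(x).shape)   # (16,) (16,)
```

In the library analogy, the gate is the card catalog: it sends each question to the two most relevant shelves instead of having the librarian read every book.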