Stop Blending Data: Why Editing Model Weights is the Future of Multi-Task LLMs
If you have ever tried to train a Large Language Model (LLM) to be a “jack of all trades,” you know the struggle. You want a model that can solve math problems, write Python code, chat casually, and reason through logic puzzles. The standard approach is Data Mixing. You take all your datasets—math, code, chat—throw them into a giant blender, and train the model on this mixed soup. The problem? It is incredibly expensive and notoriously difficult to tune. If you get the ratio of math-to-chat wrong, the model becomes great at algebra but forgets how to speak English. If you want to add a new skill later, you often have to re-blend and re-train from scratch. ...
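To make the data-mixing setup concrete, here is a minimal sketch of what "blending" looks like in practice: sampling each training example from one of several task datasets according to fixed mixing ratios. The dataset contents and the weights below are illustrative assumptions, not values from the paper; the point is only that these ratios are the fragile knob the paragraph above describes.

```python
import random

# Toy stand-ins for real math / code / chat corpora (illustrative only).
datasets = {
    "math": ["2 + 2 = ?", "Solve x^2 - 4 = 0 for x."],
    "code": ["def add(a, b): return a + b", "print('hello world')"],
    "chat": ["How was your day?", "Tell me a joke."],
}

# The mixing ratios are the hard-to-tune knob: push weight toward math
# and the model may get better at algebra while losing conversational ability.
mix_weights = {"math": 0.4, "code": 0.4, "chat": 0.2}

def sample_mixed_batch(batch_size: int) -> list[str]:
    """Build a training batch by first choosing a source dataset per example
    (weighted by the mixing ratios), then drawing an example from it."""
    sources = random.choices(
        population=list(mix_weights),
        weights=list(mix_weights.values()),
        k=batch_size,
    )
    return [random.choice(datasets[src]) for src in sources]

print(sample_mixed_batch(8))
```

Adding a new skill under this scheme means introducing a fourth dataset and re-tuning every weight, which is exactly why the full run usually has to be repeated from scratch.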