Less Data, Better Models: How 'Clustering and Ranking' Revolutionizes Instruction Tuning
Introduction: The Quality vs. Quantity Dilemma

In the current landscape of Large Language Model (LLM) development, the prevailing assumption is that “more is better”: to make a model smarter, we must feed it more tokens, more documents, and more instructions. This is generally true for the pre-training phase, where models learn the statistical structure of language. However, the rules change significantly during the Instruction Tuning (IT) phase, the final polish that teaches a model to act as a helpful assistant. ...