You Are What You Eat: How Synthetic Data Shapes and Steers LLMs
Introduction

In the current landscape of Artificial Intelligence, we are running into a bottleneck: high-quality human-generated data is becoming scarce and expensive. To circumvent this, the industry has turned to synthetic data, meaning text generated by Large Language Models (LLMs) to train other LLMs. It is an appealing solution that promises a nearly unlimited supply of data at a fraction of the cost. However, this approach treats datasets as static commodities. We tend to assume that if a "Teacher" model (like GPT-4 or a large LLaMA model) generates the data, the "Student" model will simply become smarter. But learning is not just about facts and reasoning capabilities; it is also about style, bias, toxicity, and preference. When a student model trains on synthetic data, it inherits a complex web of latent characteristics from the teacher. ...