![Cover image](https://deep-paper.org/en/paper/2404.09682/images/cover.png)
Cleaning Up the Mess: How LLMs Can Fix Noisy Datasets Automatically
Introduction: The “Garbage In, Garbage Out” Dilemma

In machine learning, there is an old adage that every student learns in their first semester: “Garbage In, Garbage Out.” No matter how sophisticated your neural network architecture is, whether a state-of-the-art Transformer or a massive Large Language Model (LLM), it cannot learn effectively from flawed data.

For years, the gold standard for solving this problem was human annotation: if a dataset was messy, you hired people to read it, label it, and clean it. But as datasets have grown to millions of examples, relying on human labor has become prohibitively expensive and slow. This leaves researchers in a bind: do we accept noisy data and lower performance, or do we burn through budgets cleaning it? ...