![PARIKSHA cover](https://deep-paper.org/en/paper/2406.15053/images/cover.png)
PARIKSHA: Uncovering the Truth About Multilingual LLM Evaluation
Introduction

In the rapidly evolving world of Large Language Models (LLMs), benchmarks are the compass by which we navigate progress. We look at leaderboards to see which model is “smarter,” “faster,” or “safer.” However, this landscape has a glaring blind spot: linguistic and cultural diversity. Most standard benchmarks are English-centric, and the multilingual benchmarks that do exist often suffer from two critical flaws. First, test set contamination: because popular benchmarks are publicly available on the web, their questions often end up in models' training data, letting models effectively memorize the answers. Second, lack of cultural nuance: many benchmarks are simply English questions translated into other languages, which loses the local context, idioms, and cultural values that define true fluency. ...