](https://deep-paper.org/en/paper/2406.12203/images/cover.png)
Can AI Keep a Secret? Testing Social Intelligence in the Game of Avalon
Large Language Models (LLMs) have mastered the art of conversation. They can write poetry, debug code, and summarize history. But can they lie strategically? Can they deduce who among their friends is a traitor? Can they understand the subtle difference between what someone says and what they actually intend? These capabilities fall under the umbrella of Social Intelligence. While we have plenty of benchmarks for math and coding, evaluating whether an AI can navigate complex social dynamics is much harder. Most current tests are static—multiple-choice questions that don’t reflect the fluid, high-stakes nature of real human interaction. ...