![Cover image](https://deep-paper.org/en/paper/2406.06369/images/cover.png)
# Can AI Judge Safety? Measuring Alignment Between LLMs and Human Annotators
As Large Language Models (LLMs) become central to our digital interactions, "safety" has shifted from a theoretical concern to a practical necessity. We rely on these models not just to chat, but increasingly to evaluate the safety of other systems. This creates a recursive loop: AI is being used to police AI. And it raises a fundamental question: do LLMs actually understand safety the way humans do? ...