Can LLMs Actually Detect Hate Speech? An Analysis of Behavior Patterns and Failures
Imagine you are a content moderator for a social media platform, or a developer building a companionship chatbot for the elderly. You want to ensure that the content your system processes or generates is safe. Naturally, you turn to Large Language Models (LLMs) to help filter out offensive speech. You feed a comment into the model and ask: “Is this text offensive?” You expect a simple “Yes” or “No.” Instead, the model refuses to answer, lectures you on morality, or hallucinates a response that has nothing to do with the question. ...