Probability or Guesswork? Investigating Consistency in Large Language Models
Large Language Models (LLMs) have become the engines driving modern AI, from chatbots to code generators. In many of these applications, we don’t just care about the text the model generates; we care about the score—the probability the model assigns to a specific sequence of words. These scores are used to detect hallucinations, rank potential answers, and measure the model’s confidence. But here is the uncomfortable question: Can we actually trust these numbers as mathematical probabilities? ...