![Cover image](https://deep-paper.org/en/paper/2503.00361/images/cover.png)
# Taming Hallucinations in Vision-Language Models with the Octopus Framework
## Introduction

Imagine asking an AI to describe a picture of a soccer field. The model confidently replies, “A player in a green jersey is kicking the ball toward the goal.” It sounds perfect, except for one problem: there is no ball in the picture. This phenomenon is known as hallucination. Large Vision-Language Models (LVLMs), despite their incredible ability to understand images and text, frequently fabricate objects, attributes, or relationships that simply don’t exist. For casual use, this is annoying. For critical applications like medical image analysis or autonomous driving, it is dangerous. ...