![Cover image](https://deep-paper.org/en/paper/2507.04388/images/cover.png)
# Unlocking the Black Box: How CoIBA Interprets Vision Transformers Using a Comprehensive Information Bottleneck
## Introduction

In the rapidly evolving landscape of computer vision, the Vision Transformer (ViT) has emerged as a powerhouse. From self-driving cars to medical imaging, ViTs are achieving remarkable performance, often outperforming traditional Convolutional Neural Networks (CNNs). However, like many deep learning models, they suffer from a significant drawback: they act as "black boxes." We feed an image in, and a classification comes out, but we often have little insight into why the model made that decision. ...