Beyond Simple Similarity: How to Teach Vision-Language Models to Generalize Compositionally
Introduction

Imagine you are teaching a child what a "red apple" is. You show them a picture of a red apple. Next, you want them to understand a "green chair," so you show them a green chair. Finally, you present them with a "green apple": an object they haven't explicitly studied, but one composed of concepts they already know ("green" and "apple"). If the child recognizes it, they have demonstrated Compositional Generalization. ...