How MaPPER Makes Visual Grounding Efficient: A Deep Dive into Prior-Guided Tuning
Introduction

Imagine you are looking at a crowded photograph of a street scene. A friend stands beside you and says, “Look at the guy in the yellow shirt standing near the bike.” Instantly, your brain processes the language, scans the image, filters out the “guys in blue shirts” and “guys near cars,” and locks onto the specific target.

In computer vision, this task is known as Referring Expression Comprehension (REC). The goal is to ground a specific region in an image based on a natural language description. While this sounds intuitive to humans, it is a complex challenge for AI. It requires a model to possess strong visual perception, deep linguistic understanding, and, most importantly, the ability to align these two modalities perfectly. ...