Can AI Really Clean Your Kitchen? Benchmarking VLM Planning with ActPlan-1K
Introduction

Imagine asking a robot to “assemble gift baskets” in your living room. A standard Large Language Model (LLM) might give you a perfect textual list of instructions: find the basket, put in the cookies, add the cheese. But what if the robot looks at the table and sees that the cookies are burnt? What if the water meant for the plants is shut off? This is the frontier of Embodied AI: moving beyond generating text to generating actionable plans based on what an agent actually sees.

While LLMs have demonstrated incredible reasoning abilities, we are still figuring out how well Vision Language Models (VLMs) handle complex, multi-modal procedural planning. Can they integrate visual cues with textual goals? Can they handle “counterfactual” scenarios where things go wrong? ...