![Cover image](https://deep-paper.org/en/paper/2501.10357/images/cover.png)
# Taming the Wild: A New Standard for Zero-Shot Monocular Scene Flow
## Introduction

Imagine you are watching a standard video clip: a 2D sequence of images. Your brain, processing this monocular (single-eye) view, instantly understands two things: the 3D structure of the scene (what is close, what is far) and the motion of objects (where things are moving in that 3D space). For computer vision models, replicating this human intuition is an extremely difficult task known as Monocular Scene Flow (MSF). While we have seen massive leaps in static depth estimation and 2D optical flow, estimating dense 3D motion from a single camera remains an elusive frontier. ...
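To make the task concrete: monocular scene flow can be viewed as a dense field of 3D displacement vectors, one per pixel. A minimal sketch of that definition (not the paper's method) is to back-project each pixel to a 3D point using a depth map, find its correspondence in the next frame via 2D optical flow, and subtract the two 3D points. All function names and the pinhole intrinsics (`fx`, `fy`, `cx`, `cy`) here are illustrative assumptions:

```python
import numpy as np

def backproject(depth, fx, fy, cx, cy):
    """Back-project a depth map (H, W) to per-pixel 3D points in the camera frame."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1)  # (H, W, 3)

def scene_flow(depth1, depth2, flow, fx, fy, cx, cy):
    """Dense scene flow: 3D point at t+1 (located via optical flow) minus 3D point at t.

    depth1, depth2 : (H, W) depth maps for frames t and t+1
    flow           : (H, W, 2) optical flow (du, dv) from frame t to t+1
    """
    h, w = depth1.shape
    pts1 = backproject(depth1, fx, fy, cx, cy)
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    # Pixel correspondences in frame t+1, given by the 2D optical flow.
    u2 = np.clip(np.round(u + flow[..., 0]).astype(int), 0, w - 1)
    v2 = np.clip(np.round(v + flow[..., 1]).astype(int), 0, h - 1)
    pts2 = backproject(depth2, fx, fy, cx, cy)[v2, u2]
    return pts2 - pts1  # (H, W, 3) scene-flow vectors
```

This toy pipeline also illustrates why the monocular setting is hard: errors in the estimated depth and optical flow compound directly in the 3D motion field, which is part of what the paper sets out to tame.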