Elementary, My Dear Watson: How Holmes-VAU Solves Video Anomalies Like a Detective
Introduction

Imagine you are a detective reviewing CCTV footage of a busy city street. Hours of mundane traffic pass by: cars stopping at red lights, pedestrians crossing, rain falling. Suddenly, for three seconds, a car swerves erratically and clips a bus before speeding off.

If you were a traditional computer vision model, you might flag a “spike” in an anomaly score at that timestamp. But you wouldn’t necessarily know why. Was it a fight? An explosion? A traffic accident? Furthermore, to understand that this was a “hit-and-run,” you need to watch the moments leading up to the swerve and the aftermath. You need context. ...