[When Every Millisecond Counts: Real-Time Anomaly Detection via the Multimodal Asynchronous Hybrid Network 🔗](https://arxiv.org/abs/2506.17457)

Milliseconds Matter: Fusing Event Streams and RGB for High-Speed Autonomous Safety

Imagine you are driving down a suburban street. It’s a sunny day, the music is playing, and you are relaxed. Suddenly, from behind a parked truck, a child chases a ball into the middle of the road. Your brain processes this visual information instantly—your foot slams on the brake, and the car screeches to a halt just inches from the child. The difference between a close call and a tragedy is a fraction of a second. ...

2025-06 · 9 min · 1849 words
[Rethink GraphODE Generalization within Coupled Dynamical System 🔗](https://openreview.net/pdf?id=nVD7KoU09V)

How to Teach AI Physics: Disentangling Static and Dynamic Worlds with GREAT

Imagine trying to predict the motion of a complex system, like a set of pendulums connected by springs, or charged particles bouncing around in a box. In physics and engineering, these are known as Coupled Dynamical Systems. To model them, we don’t just look at one object in isolation; we have to account for how every component interacts with every other component over time. For years, scientists used handcrafted differential equations to solve these problems. But recently, Deep Learning has entered the chat. Specifically, a framework called Graph Ordinary Differential Equations (GraphODE) has shown immense promise. By combining Graph Neural Networks (GNNs) to model interactions and ODE solvers to model time, these networks can theoretically learn the “laws of physics” directly from data. ...
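To make the GNN-plus-ODE-solver pairing concrete, here is a minimal, generic GraphODE-style sketch (not the paper's GREAT model): a small message-passing network parameterizes dz/dt and an off-the-shelf solver integrates it. It assumes PyTorch and the `torchdiffeq` package; all names and shapes are illustrative.

```python
# Minimal GraphODE-style sketch: a GNN defines dz/dt, an ODE solver integrates it.
# Assumes `torch` and `torchdiffeq` are installed; this is NOT the paper's GREAT model.
import torch
import torch.nn as nn
from torchdiffeq import odeint


class GNNDynamics(nn.Module):
    """dz/dt = GNN(z, A): each node's rate of change depends on its neighbors."""

    def __init__(self, adj: torch.Tensor, dim: int):
        super().__init__()
        self.adj = adj                       # fixed interaction graph (n x n)
        self.msg = nn.Linear(dim, dim)       # message computed from each neighbor's state
        self.upd = nn.Linear(2 * dim, dim)   # combine own state with aggregated messages

    def forward(self, t, z):
        agg = self.adj @ self.msg(z)         # sum messages over neighbors
        return self.upd(torch.cat([z, agg], dim=-1))


n, dim = 5, 8
adj = (torch.rand(n, n) < 0.4).float()       # random interaction graph
z0 = torch.randn(n, dim)                     # initial node states (e.g., positions/velocities)
t = torch.linspace(0.0, 1.0, 20)             # query times

trajectory = odeint(GNNDynamics(adj, dim), z0, t)   # shape: (20, n, dim)
print(trajectory.shape)
```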

9 min · 1781 words
[STAR: Learning Diverse Robot Skill Abstractions through Rotation-Augmented Vector Quantization 🔗](https://arxiv.org/abs/2506.03863)

Breaking the Codebook Collapse: How STAR Teaches Robots Diverse Skills via Geometric Rotation

Imagine trying to teach a robot to cook a meal. You don’t tell the robot every single millisecond of muscle movement required to crack an egg. Instead, you think in terms of “skills”: grasp the egg, hit the edge of the pan, pull the shells apart. This hierarchical approach—breaking complex long-horizon tasks into discrete, reusable skills—is the holy grail of robotic manipulation. However, translating continuous robot actions into these discrete “words” or “tokens” is notoriously difficult. Current methods often suffer from codebook collapse, where the robot ignores most of the skills it could learn, relying on just a tiny subset of repetitive actions. Furthermore, even if the robot learns the skills, stringing them together smoothly (composition) is a separate headache. ...
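For intuition about what discrete skill tokens and codebook collapse look like in code, here is a tiny, generic vector-quantization sketch. It is illustrative only, not STAR's rotation-augmented scheme, and all shapes and names are made up.

```python
# Generic vector quantization of continuous action embeddings into discrete skill tokens.
import torch

torch.manual_seed(0)
num_codes, dim = 16, 4
codebook = torch.randn(num_codes, dim)   # the "vocabulary" of skill vectors
actions = torch.randn(8, dim)            # a batch of continuous action embeddings

dists = torch.cdist(actions, codebook)   # (8, num_codes) pairwise distances
tokens = dists.argmin(dim=1)             # nearest codebook entry = discrete skill index
quantized = codebook[tokens]             # decoded skill vectors fed to the policy

# Codebook collapse shows up when only a handful of indices are ever selected.
print(tokens.unique())
```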

2025-06 · 8 min · 1609 words
[Learning Dynamics under Environmental Constraints via Measurement-Induced Bundle Structures 🔗](https://arxiv.org/abs/2505.19521)

When Geometry Meets Uncertainty: A New Framework for Safe Robot Learning

Imagine you are trying to walk through a crowded room in the dark. You can’t see perfectly; perhaps you only have a dim flashlight that flickers. You know roughly how your legs work (your dynamics), but your perception of where the furniture is (the environment) is noisy and uncertain. If you assume you know exactly where everything is, you will likely stub your toe. If you are paralyzed by fear, you won’t move at all. ...

2025-05 · 14 min · 2830 words
[Invariant Deep Uplift Modeling for Incentive Assignment in Online Marketing via Probability of Necessity and Sufficiency 🔗](https://openreview.net/pdf?id=mruyFvKDKq)

Beyond Correlation—How Invariant Deep Uplift Modeling (IDUM) Solves the Out-of-Distribution Crisis in Online Marketing

Imagine you run a massive online platform—perhaps a short-video app or an e-commerce giant. You have a budget to distribute coupons or high-quality video streams to keep users engaged. The central question for your marketing team is simple: “If we give User X a coupon, will they buy something they wouldn’t have bought otherwise?” This is not a prediction of purchase; it is a prediction of influence. This field is called Uplift Modeling. ...
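As a point of reference for what uplift estimation means mechanically, here is a minimal two-model (T-learner) baseline: predicted purchase probability with the coupon minus the probability without it. This is a generic sketch on synthetic data, not IDUM itself; IDUM's invariance and necessity/sufficiency machinery sit on top of this basic idea.

```python
# Two-model (T-learner) uplift sketch on synthetic data; illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))                        # user features
t = rng.integers(0, 2, size=1000)                     # 1 = got the coupon, 0 = control
y = (rng.random(1000) < 0.2 + 0.1 * t).astype(int)    # synthetic purchase outcomes

model_treat = LogisticRegression().fit(X[t == 1], y[t == 1])
model_ctrl = LogisticRegression().fit(X[t == 0], y[t == 0])

# Uplift for each user: P(buy | coupon) - P(buy | no coupon).
uplift = model_treat.predict_proba(X)[:, 1] - model_ctrl.predict_proba(X)[:, 1]
print(uplift[:5])
```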

9 min · 1816 words
[Continual Reinforcement Learning by Planning with Online World Models 🔗](https://arxiv.org/abs/2507.09177)

How to Build Robots That Never Forget: Planning with Online World Models

Imagine you are teaching a robot to make coffee. After weeks of training, it finally masters the art of grinding beans and pouring water. Next, you teach it to load the dishwasher. It learns quickly, but when you ask it to make coffee again, it stares blankly at the machine. It has completely overwritten its “coffee-making” neurons with “dishwasher-loading” neurons. This phenomenon is known as catastrophic forgetting, and it is the Achilles’ heel of Artificial Intelligence. ...

2025-07 · 8 min · 1598 words
[Towards Practical Defect-Focused Automated Code Review 🔗](https://arxiv.org/abs/2505.17928)

From Nitpicks to Key Bugs: How to Build a Practical Automated Code Reviewer

Code review is the gatekeeper of software quality. In a perfect world, a senior engineer meticulously checks every line of code you write, catching subtle logic errors, security vulnerabilities, and potential performance bottlenecks before they merge. In the real world, code review is often a bottleneck. Reviewers are busy, context is hard to gather, and “LGTM” (Looks Good To Me) is sometimes typed a bit too quickly. This has driven a massive surge in research into Automated Code Review. If an AI can write code, surely it can review it? However, most existing tools fall into a trap: they treat code review as a simple translation task. They look at a small snippet of code and try to generate a sentence that “sounds” like a review. The result? A flood of “nitpicks”—comments about variable naming or formatting—while critical bugs (like null pointer dereferences or logic errors) slip through. ...

2025-05 · 10 min · 2062 words
[Fishers for Free? Approximating the Fisher Information Matrix by Recycling the Squared Gradient Accumulator 🔗](https://openreview.net/pdf?id=m3zrHhiCCj)

Fishers for Free: Recycling Your Optimizer State to Estimate Parameter Importance

In the world of deep learning, we often treat model parameters as a means to an end. We train them, save them, and run inference. But not all parameters are created equal. Some weights in your neural network are critical load-bearing columns; others are decorative trim that can be removed or altered without collapsing the structure. Determining which parameters matter most is the domain of parameter sensitivity, and the “gold standard” tool for measuring this is the Fisher Information Matrix (FIM). The Fisher diagonal tells us how much the model’s output distribution would change if we perturbed a specific parameter. It is crucial for advanced techniques like Model Merging, Network Pruning, and Continual Learning. ...
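The paper's title hints at the trick: optimizers such as Adam already maintain an exponential moving average of squared gradients, which can be reused as a diagonal-Fisher-style sensitivity score. Below is a minimal sketch of that reuse, assuming PyTorch's Adam (whose per-parameter state exposes an `exp_avg_sq` buffer); it shows the general idea rather than the paper's exact recipe.

```python
# Recycle Adam's squared-gradient accumulator as a cheap proxy for the diagonal Fisher.
import torch
import torch.nn as nn

model = nn.Linear(10, 2)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

# One (stand-in) training step so the optimizer state gets populated.
x, y = torch.randn(32, 10), torch.randint(0, 2, (32,))
nn.functional.cross_entropy(model(x), y).backward()
opt.step()

fisher_proxy = {}
for group in opt.param_groups:
    for p in group["params"]:
        state = opt.state[p]
        if "exp_avg_sq" in state:
            # EMA of squared gradients, tracked by Adam "for free" during training.
            fisher_proxy[p] = state["exp_avg_sq"].clone()

# Parameters with large proxy values are treated as more important, e.g., when
# merging models, pruning weights, or penalizing drift in continual learning.
```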

9 min · 1916 words
[Feature learning from non-Gaussian inputs: the case of Independent Component Analysis in high dimensions 🔗](https://arxiv.org/abs/2503.23896)

Why Deep Networks Learn Gabor Filters: Unpacking ICA, High Dimensions, and Sample Complexity

Have you ever wondered why the first layer of almost every Convolutional Neural Network (CNN) looks the same? Whether you train a network to classify dogs, recognize cars, or detect tumors, the filters in the very first layer almost invariably converge to specific patterns: oriented edges and oscillating textures known as Gabor filters. This phenomenon is one of the most robust empirical facts in deep learning. It mirrors the biology of the mammalian visual cortex, which also processes visual information using similar edge detectors. But why does this happen? And more importantly, what are the mathematical mechanics driving the learning of these features from raw pixels? ...

2025-03 · 9 min · 1889 words
[Better to Teach than to Give: Domain Generalized Semantic Segmentation via Agent Queries with Diffusion Model Guidance 🔗](https://openreview.net/pdf?id=jvP1wbD0xh)

QueryDiff: Teaching Segmentation Models to Generalize with Diffusion Guidance

In the world of deep learning, there is an old proverb that fits surprisingly well: “Give a man a fish, and you feed him for a day. Teach a man to fish, and you feed him for a lifetime.” In the context of computer vision, specifically Domain Generalized Semantic Segmentation (DGSS), “giving a fish” is analogous to data augmentation or generating synthetic data. If you want your self-driving car model (trained on a sunny simulator) to recognize a rainy street, the standard approach is to generate thousands of rainy images and feed them to the model. While this works to an extent, it is computationally expensive and limited by the diversity of the data you can generate. ...

8 min · 1663 words
[Procurement Auctions via Approximately Optimal Submodular Optimization 🔗](https://arxiv.org/abs/2411.13513)

Designing Truthful Auctions for Submodular Procurement

In the world of algorithmic game theory and large-scale logistics, a fundamental problem persists: how do you buy services from multiple people who might lie about their prices, while ensuring you get the best possible “bang for your buck”? Imagine a government agency trying to procure medical supplies from various vendors, or a crowdsourcing platform hiring freelancers to label data. The buyer (auctioneer) wants to select a set of sellers to maximize the total value of the service minus the cost paid. This is known as a procurement auction. ...
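To ground the objective, here is a generic greedy baseline for picking sellers under a submodular value function, selecting whoever adds the most value beyond their cost. This is only an illustrative sketch of the optimization side; the paper's contribution is wrapping such approximately optimal rules into truthful payment schemes, which this snippet does not implement. All names here are hypothetical.

```python
# Greedy "marginal value minus cost" selection for a submodular value function.
from typing import Callable, Dict, Set

def greedy_procure(costs: Dict[str, float],
                   value: Callable[[Set[str]], float]) -> Set[str]:
    chosen: Set[str] = set()
    remaining = set(costs)
    while remaining:
        best, best_gain = None, 0.0
        for s in remaining:
            gain = value(chosen | {s}) - value(chosen) - costs[s]  # marginal value minus cost
            if gain > best_gain:
                best, best_gain = s, gain
        if best is None:          # nobody adds positive surplus; stop buying
            break
        chosen.add(best)
        remaining.remove(best)
    return chosen

# Toy example: coverage-style (submodular) value over the items each seller supplies.
supply = {"A": {1, 2}, "B": {2, 3}, "C": {4}}
costs = {"A": 0.5, "B": 1.5, "C": 0.4}
value = lambda S: float(len(set().union(*(supply[s] for s in S)))) if S else 0.0
print(greedy_procure(costs, value))   # selects sellers A and C
```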

2024-11 · 9 min · 1728 words
[scSSL-Bench: Benchmarking Self-Supervised Learning for Single-Cell Data 🔗](https://arxiv.org/abs/2506.10031)

Cracking the Cellular Code: A Deep Dive into Self-Supervised Learning for Single-Cell Genomics

Imagine trying to understand a complex city by looking at a satellite photo of the whole metropolitan area. You see the general layout, the highways, and the density, but you miss the individual people who make the city function. For a long time, this was the state of genomics. “Bulk sequencing” gave us an average view of millions of cells mashed together—a biological smoothie. ...

2025-06 · 8 min · 1608 words
[On Learning Parallel Pancakes with Mostly Uniform Weights 🔗](https://arxiv.org/abs/2504.15251)

Unstacking the Parallel Pancakes: The Complexity of Learning Mostly Uniform Gaussian Mixtures

In the world of high-dimensional statistics and machine learning, few problems are as classic or as stubborn as learning Gaussian Mixture Models (GMMs). We use them everywhere—from astrophysics to marketing—to model populations made up of different sub-groups. The theoretical landscape of GMMs is a tale of two extremes. On one hand, if the components are spherical (perfectly round blobs), we can learn them efficiently. On the other hand, if the components are arbitrary “pancakes” (highly flattened Gaussians) that are stacked parallel to each other, the problem becomes exponentially hard. In the worst case, learning a mixture of \(k\) Gaussians in \(d\) dimensions requires time \(d^{O(k)}\). ...

2025-04 · 9 min · 1890 words
[Geometric Hyena Network for Large-scale Equivariant Learning 🔗](https://arxiv.org/abs/2505.22560)

Beyond Self-Attention: Scaling Geometric Deep Learning with Geometric Hyena

In the world of deep learning for science, structure is everything. Whether it’s the folding of a protein, the twisting of an RNA strand, or the dynamics of a particle system, the geometric arrangement of atoms dictates function. To model these systems effectively, neural networks must understand two things: global context (how distant parts of a molecule interact) and equivariance (the laws of physics shouldn’t change just because you rotated the molecule). ...

2025-05 · 8 min · 1528 words
[Elucidating the Design Space of Multimodal Protein Language Models 🔗](https://arxiv.org/abs/2504.11454)

Building Better Protein Models: How to Fix the 'Structure Gap' in Multimodal AI

Proteins are the molecular machinery of life. To understand biology—and to design new drugs—we need to understand two different “languages” of proteins: their sequence (the string of amino acids) and their structure (how they fold into 3D shapes). Historically, AI has treated these as separate problems. You had models like ESM for reading sequences and models like AlphaFold for predicting structures. But recently, researchers have been trying to merge these into Multimodal Protein Language Models (PLMs). Ideally, a single model should be able to read a sequence, understand its geometry, and generate new proteins that are both chemically valid and structurally sound. ...

2025-04 · 7 min · 1309 words
[BAXBENCH: Can LLMs Generate Correct and Secure Backends? 🔗](https://openreview.net/pdf?id=il3KRr4H9u)

Why AI Can't Build Your Backend Yet: A Deep Dive into BAXBENCH

The software development world is currently in the grip of a revolution driven by Large Language Models (LLMs). Tools like GitHub Copilot and ChatGPT have demonstrated an uncanny ability to auto-complete functions, write unit tests, and even solve complex algorithmic puzzles. It is tempting to believe that we are on the verge of fully autonomous software engineering, where an AI can take a high-level requirement and produce a deployment-ready application. ...

9 min · 1906 words
[Automatically Identify and Rectify: Robust Deep Contrastive Multi-view Clustering in Noisy Scenarios 🔗](https://arxiv.org/abs/2505.21387)

Cleaning the Noise: How AIRMVC Revolutionizes Multi-View Clustering

In the era of big data, we rarely rely on a single source of information to understand the world. Consider an autonomous vehicle: it doesn’t just look through a camera; it listens to sonar, measures distance with LiDAR, and checks GPS coordinates. This aggregation of diverse data sources is the foundation of Multi-View Clustering (MVC). By fusing information from different “views” (e.g., audio, video, text), machine learning models can achieve a level of understanding that a single view simply cannot match. ...

2025-05 · 10 min · 2013 words
[Primal-Dual Neural Algorithmic Reasoning 🔗](https://arxiv.org/abs/2505.24067)

Can Neural Networks Solve NP-Hard Problems? The Primal-Dual Approach

The intersection of classical algorithms and deep learning is one of the most fascinating frontiers in computer science. On one hand, we have classical algorithms—rigorous, interpretable, and guaranteed to work, but often rigid and unable to digest raw, messy real-world data. On the other hand, we have neural networks—flexible, adaptable, and capable of handling complex inputs, but often opaque and prone to hallucinating incorrect answers. Neural Algorithmic Reasoning (NAR) attempts to fuse these two worlds. The goal is to train neural networks to “reason” like algorithms. By teaching a network to mimic the steps of a classical algorithm (like Breadth-First Search or Bellman-Ford), we hope to create systems that generalize better than standard deep learning models. ...

2025-05 · 9 min · 1882 words
[Do Multiple Instance Learning Models Transfer? 🔗](https://openreview.net/pdf?id=hfLqdquVt3)

Why You Should Stop Training MIL Models from Scratch: The Power of Transfer Learning in Pathology

In the world of deep learning, particularly in computer vision and natural language processing (NLP), starting from scratch is almost a cardinal sin. You wouldn’t train a language model on a blank slate when you can fine-tune BERT or GPT; you wouldn’t train an image classifier from scratch when you can start from ImageNet weights. This concept, known as transfer learning, is the engine driving modern AI. However, in Computational Pathology (CPath)—the field dedicated to analyzing digitized tissue slides for cancer diagnosis—this standard practice hasn’t fully taken hold. When researchers build Multiple Instance Learning (MIL) models to analyze gigapixel whole slide images (WSIs), they almost exclusively initialize the aggregation networks with random weights. ...

7 min · 1438 words
[On the Benefits of Active Data Collection in Operator Learning 🔗](https://arxiv.org/abs/2410.19725)

Why Random Sampling Isn't Enough: The Power of Active Learning in Solving PDEs

If you have ever dabbled in scientific computing or machine learning for physics, you know the drill. You have a Partial Differential Equation (PDE), like the heat equation or the Navier-Stokes equation, that describes a physical system. Traditionally, solving these requires heavy numerical solvers that chew up computational resources. Enter Operator Learning. The goal here is to train a machine learning model to approximate the “solution operator” of the PDE. Instead of solving the equation from scratch every time, you feed the initial conditions or source terms into a neural network (or another estimator), and it spits out the solution almost instantly. ...

2024-10 · 8 min · 1658 words