[Do We Really Need Message Passing in Brain Network Modeling? 🔗](https://openreview.net/pdf?id=KRosBwvhDx)

Rethinking Brain Networks: Why We Might Be Doing Graph Learning Wrong

The human brain is arguably the most complex network in existence. To understand it, researchers have turned to Graph Neural Networks (GNNs) and Transformers. These deep learning architectures have revolutionized how we process graph data, from social networks to molecular structures. It seems only logical to apply them to the “connectome”—the map of neural connections in the brain. But a recent paper poses a provocative question that challenges this standard approach: “Do we really need message passing in brain network modeling?” ...

9 min · 1746 words
[AXBENCH: Steering LLMs? Even Simple Baselines Outperform Sparse Autoencoders 🔗](https://openreview.net/pdf?id=K2CckZjNy0)

Steering the Giant: Why Simple Baselines Are Beating Sparse Autoencoders in LLM Control

Large Language Models (LLMs) are powerful, but controlling them—making sure they follow instructions, avoid toxicity, or adhere to specific themes—remains one of the biggest challenges in AI safety. Currently, the industry relies heavily on prompting (asking the model nicely) and finetuning (retraining the model on new data). While effective, these methods have significant drawbacks: prompting can be circumvented by “jailbreaks,” and finetuning is computationally expensive and opaque. Enter Representation Engineering. This emerging field hopes to open the “black box” of the neural network, identify the specific internal activations (or “neurons”) responsible for a concept, and manually tweak them to steer the model’s behavior. The most celebrated tool in this field has recently been Sparse Autoencoders (SAEs)—unsupervised models that decompose activations into interpretable features. ...
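
To make the idea concrete, here is a minimal numpy sketch of the two ingredients representation engineering relies on: decomposing a hidden activation into sparse features, and steering by adding a decoder direction back into the activation. The toy sizes, random weights, and `steer` helper are illustrative assumptions, not AXBENCH's implementation or any particular model's SAE.

```python
import numpy as np

rng = np.random.default_rng(0)
d_model, d_sae = 16, 64                # toy sizes; real models use thousands of dims

# Hypothetical "pretrained" sparse autoencoder weights (random here for illustration).
W_enc = rng.normal(size=(d_model, d_sae))
W_dec = rng.normal(size=(d_sae, d_model))
b_enc = np.zeros(d_sae)

def sae_features(activation):
    """Decompose a hidden activation into sparse, non-negative features (ReLU encoder)."""
    return np.maximum(activation @ W_enc + b_enc, 0.0)

def steer(activation, feature_idx, strength=5.0):
    """Steer by adding the decoder direction of one interpretable feature."""
    return activation + strength * W_dec[feature_idx]

x = rng.normal(size=d_model)           # stand-in for a residual-stream activation
feats = sae_features(x)
x_steered = steer(x, feature_idx=int(feats.argmax()))
```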

9 min · 1812 words
[Adaptive Learn-then-Test: Statistically Valid and Efficient Hyperparameter Selection 🔗](https://arxiv.org/abs/2409.15844)

Betting on the Best: How Adaptive Learn-then-Test Revolutionizes Safe AI Deployment

Introduction Imagine you have trained a machine learning model for a critical task—perhaps detecting tumors in medical scans or controlling a robotic arm in a factory. During training, the model seemed to perform well. But is “seeming to perform well” enough when safety is on the line? In the real world, the gap between training performance and deployment reliability can be dangerous. To bridge this gap, we often perform calibration: selecting the right hyperparameters (settings) to ensure the model meets a strict safety standard, such as “95% accuracy on the true population.” ...
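
As a rough illustration of what “statistically valid hyperparameter selection” means, here is a sketch of a plain (non-adaptive) Learn-then-Test style check: each candidate hyperparameter is a hypothesis, a Hoeffding bound turns its empirical risk on held-out calibration data into a p-value, and only candidates that pass a multiplicity-corrected test are certified. The candidate names, risk values, and Bonferroni correction are assumptions for illustration; the paper's adaptive procedure improves on exactly this kind of fixed scheme.

```python
import numpy as np

def hoeffding_pvalue(emp_risk, n, alpha):
    """P-value for H0: population risk > alpha, for a loss bounded in [0, 1]
    with empirical mean emp_risk over n i.i.d. calibration points."""
    return float(np.exp(-2.0 * n * max(alpha - emp_risk, 0.0) ** 2))

alpha, delta = 0.05, 0.10                    # target risk level and error budget (assumed)
candidates = {"lam=0.1": 0.020, "lam=0.5": 0.035, "lam=0.9": 0.080}  # empirical risks
n = 5000                                     # calibration set size

# Bonferroni-corrected testing: certify a hyperparameter only if its p-value
# clears delta split across all candidates.
certified = [lam for lam, risk in candidates.items()
             if hoeffding_pvalue(risk, n, alpha) <= delta / len(candidates)]
print(certified)                             # -> ['lam=0.1']
```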

2024-09 · 9 min · 1854 words
[DPO Meets PPO: Reinforced Token Optimization for RLHF 🔗](https://arxiv.org/abs/2404.18922)

When DPO Meets PPO: Unlocking Dense Rewards for Better LLM Alignment

Reinforcement Learning from Human Feedback (RLHF) is the secret sauce behind the modern revolution in Large Language Models (LLMs). It is the process that turns a raw, text-predicting model into a helpful assistant like ChatGPT, Claude, or Gemini. The standard recipe for RLHF has been established for years: take a pre-trained model, collect human preferences (e.g., “Answer A is better than Answer B”), train a reward model, and then optimize the language model using an algorithm called PPO (Proximal Policy Optimization). ...
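
For context, the reward-model step of that recipe is typically trained with a Bradley-Terry style preference loss; the sketch below (PyTorch, with toy scores standing in for real reward-model outputs) shows that loss, which underlies both the classic PPO pipeline and DPO-style reformulations.

```python
import torch
import torch.nn.functional as F

def reward_model_loss(r_chosen, r_rejected):
    """Bradley-Terry preference loss: push the reward of the preferred answer
    above the rejected one. Inputs are reward-model scores of shape (batch,)."""
    return -F.logsigmoid(r_chosen - r_rejected).mean()

# Toy scores in place of reward-model outputs on a batch of preference pairs.
r_chosen = torch.tensor([1.2, 0.3, 0.8])
r_rejected = torch.tensor([0.4, 0.5, -0.1])
print(reward_model_loss(r_chosen, r_rejected))
```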

2024-04 · 9 min · 1818 words
[Catch Your Emotion: Sharpening Emotion Perception in Multimodal Large Language Models 🔗](https://openreview.net/pdf?id=IYOksPHJKT)

Can AI Feel? Improving Emotion Perception in MLLMs Without Training

Introduction In the rapidly evolving world of Artificial Intelligence, Multimodal Large Language Models (MLLMs) like LLaVA and GPT-4V have become incredibly adept at describing the world. Show them a picture of a crowded street, and they can list the objects, read the signs, and even deduce the time of day. However, there is a frontier where these powerful models still stumble: Emotional Intelligence. While an MLLM can identify a person smiling, it often struggles to distinguish the nuance between “amusement” and “excitement,” or between “sadness” and “fear.” Why? Because unlike object detection, emotion is abstract, subjective, and often hidden in subtle cues rather than obvious shapes. ...

9 min · 1773 words
[PCEvolve: Private Contrastive Evolution for Synthetic Dataset Generation via Few-Shot Private Data and Generative APIs 🔗](https://arxiv.org/abs/2506.05407)

Solving the Data Scarcity Paradox: How PCEvolve Generates Private Synthetic Data from Few-Shot Inputs

In the current era of Artificial Intelligence, we are witnessing a paradox. On one hand, we have incredibly powerful Generative APIs (like Stable Diffusion or DALL-E) that can create almost any image from a simple text prompt. On the other hand, the specialized domains that need these tools the most—such as healthcare and high-precision manufacturing—are often starved for data. Clinics may have only a handful of X-rays for a rare condition. Factories might have only a few dozen images of a specific defect on a production line. This is the “few-shot” data problem. To make matters more complicated, this data is often highly sensitive. A hospital cannot simply upload patient records to a public cloud API to generate more training data due to privacy regulations like HIPAA or GDPR. ...

2025-06 · 9 min · 1727 words
[The Number of Trials Matters in Infinite-Horizon General-Utility Markov Decision Processes 🔗](https://openreview.net/pdf?id=I4jNAbqHnM)

Why Your Sample Size Matters: The Hidden Trap in General-Utility RL

In the world of Reinforcement Learning (RL), we usually frame problems as maximizing the sum of rewards. You take an action, get a reward, and try to get as much of it as possible over time. But what if your goal isn’t just about accumulating points? What if you want an agent to explore an environment as diversely as possible, or imitate a human expert’s behavior distribution? These complex goals fall under the umbrella of General-Utility Markov Decision Processes (GUMDPs). Unlike standard RL, where the objective is linear in the agent’s occupancy, GUMDPs allow objective functions that depend non-linearly on the statistics (or occupancy measure) of the agent’s behavior. ...
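
In symbols, the contrast looks roughly like this (a minimal sketch using $d_\pi$ for the agent's state-action occupancy; the entropy example is one common instance, not the paper's specific objective):

```latex
% Standard RL: the objective is linear in the occupancy measure d_\pi
\max_{\pi} \ \langle r, d_\pi \rangle \;=\; \sum_{s,a} d_\pi(s,a)\, r(s,a)

% GUMDP: any (possibly non-linear) function of the occupancy, e.g. an entropy
% bonus for diverse exploration or a divergence to an expert's occupancy
\max_{\pi} \ f(d_\pi), \qquad \text{e.g. } f(d_\pi) = -\sum_{s,a} d_\pi(s,a) \log d_\pi(s,a)
```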

10 min · 1986 words
[Towards a Mechanistic Explanation of Diffusion Model Generalization 🔗](https://arxiv.org/abs/2411.19339)

Why Do Diffusion Models Generalize? It’s All About the Patches

Generative AI, particularly diffusion models like Stable Diffusion or DALL-E, often feels like magic. You input noise (and perhaps a text prompt), and out pops a coherent, novel image. But from a mathematical perspective, this “novelty” is actually a bit of a puzzle. Ideally, if a diffusion model is mathematically “perfect,” it shouldn’t generate new images at all—it should simply memorize and reproduce its training data. Yet, in practice, neural networks do generalize. They create images that look like they belong to the training distribution but aren’t exact copies. ...
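
The puzzle can be stated precisely: under a variance-exploding noising $x_t = y + \sigma_t \varepsilon$, the mathematically optimal denoiser for the empirical training set $\{y_1, \dots, y_N\}$ is a softmax-weighted average of the training images themselves, so a “perfect” model would only ever blend back toward memorized data. This is a standard derivation, sketched here as background rather than the paper's patch-based argument:

```latex
\hat{x}(x_t) \;=\; \mathbb{E}\!\left[\, y \mid x_t \,\right]
\;=\; \sum_{i=1}^{N}
\frac{\exp\!\big(-\lVert x_t - y_i \rVert^2 / 2\sigma_t^2\big)}
     {\sum_{j=1}^{N} \exp\!\big(-\lVert x_t - y_j \rVert^2 / 2\sigma_t^2\big)}\; y_i
```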

2024-11 · 8 min · 1526 words
[Reducing Variance of Stochastic Optimization for Approximating Nash Equilibria in Normal-Form Games 🔗](https://openreview.net/pdf?id=Hp53p5AU7X)

Taming the Variance: How 'Nash Advantage Loss' Accelerates Game Solving with Machine Learning

Introduction In the intersection of Economics, Computer Science, and AI, few concepts are as pivotal as the Nash Equilibrium (NE). It describes a state in a game where no player can benefit by changing their strategy while everyone else keeps theirs unchanged. From poker bots to automated financial trading and multi-agent robotics, finding an NE is often the ultimate goal. However, calculating an NE is notoriously difficult. As games scale up—think of a game with dozens of players or thousands of possible moves—the computational complexity explodes. This is where Machine Learning (ML) enters the arena. Modern ML has revolutionized optimization, solving complex non-convex problems in image recognition and NLP. Naturally, researchers have asked: Can we use the stochastic optimization power of ML to find Nash Equilibria in massive games? ...
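
Formally, for an $n$-player normal-form game with utilities $u_i$ and a mixed-strategy profile $\sigma = (\sigma_1, \dots, \sigma_n)$, the defining condition is the standard one:

```latex
\sigma^{*} \text{ is a Nash equilibrium} \iff
u_i\!\left(\sigma_i^{*}, \sigma_{-i}^{*}\right) \;\ge\; u_i\!\left(\sigma_i, \sigma_{-i}^{*}\right)
\quad \text{for every player } i \text{ and every alternative strategy } \sigma_i .
```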

10 min · 1969 words
[Towards Robustness and Explainability of Automatic Algorithm Selection 🔗](https://openreview.net/pdf?id=Gp7NfP7Erm)

Opening the Black Box: How Causal Graphs Are Revolutionizing Algorithm Selection

In the world of computer science, the “No Free Lunch” theorem is a hard truth: no single algorithm performs best on every possible problem. Whether you are solving the Traveling Salesman Problem, training a neural network, or tackling SAT instances, the “best” tool for the job depends entirely on the specific characteristics of the problem at hand. This reality gave rise to the field of Automatic Algorithm Selection (AS). The goal of AS is simple but ambitious: given a specific problem instance, automatically predict which algorithm from a portfolio will solve it most efficiently. ...
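
A toy sketch of that per-instance prediction step might look like the following; the instance features, linear runtime model, and solver names are illustrative assumptions, not the paper's causal-graph approach.

```python
import numpy as np

# Toy per-instance algorithm selection: predict each portfolio member's runtime
# from instance features, then pick the predicted-fastest solver.
portfolio = ["solver_A", "solver_B", "solver_C"]
W = np.array([[0.5, 1.2], [0.9, 0.3], [0.7, 0.7]])   # one weight row per solver (toy)
b = np.array([1.0, 0.5, 0.8])

def select_algorithm(instance_features):
    predicted_runtimes = W @ instance_features + b
    return portfolio[int(np.argmin(predicted_runtimes))]

print(select_algorithm(np.array([0.4, 2.0])))   # e.g. [clause/var ratio, log #vars]
```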

9 min · 1850 words
[Policy Regularization on Globally Accessible States in Cross-Dynamics Reinforcement Learning 🔗](https://arxiv.org/abs/2503.06893)

Stop Imitating the Impossible—How ASOR Solves Cross-Dynamics RL

Introduction Imagine training an autonomous vehicle in a simulation where the roads are empty. The car learns that driving at 80 mph is perfectly safe and efficient. Now, you deploy that same policy into a city with heavy traffic. Suddenly, that “optimal” behavior of driving 80 mph is no longer efficient—it’s catastrophic. The state of “driving fast safely” has become inaccessible due to the change in environment dynamics (traffic density). ...

2025-03 · 8 min · 1521 words
[TimeBase: The Power of Minimalism in Efficient Long-term Time Series Forecasting 🔗](https://openreview.net/pdf?id=GhTdNOMfOD)

Less is More: How TimeBase Revolutionizes Time Series Forecasting with Minimalism

In the current landscape of Artificial Intelligence, the prevailing mantra has largely been “bigger is better.” From Large Language Models (LLMs) like GPT-4 to massive Vision Transformers (ViTs), the trend is to scale up parameters into the billions to capture complex dependencies. It is natural to assume that this logic applies everywhere—including Long-term Time Series Forecasting (LTSF). But does it? Time series data—like the fluctuation of electricity usage, traffic flow, or weather patterns—is fundamentally different from language or images. It is often repetitive, periodic, and governed by simpler underlying rules. Do we really need a billion parameters to predict that traffic will peak at 5:00 PM? ...

10 min · 1946 words
[Learning Parametric Distributions from Samples and Preferences 🔗](https://arxiv.org/abs/2505.23557)

Why Preferences Matter: Breaking the 1/√n Barrier in Statistical Learning

Recent advances in Generative AI, particularly Large Language Models (LLMs), have cemented “learning from preferences” (like Reinforcement Learning from Human Feedback, or RLHF) as a critical step in model training. We know empirically that telling a model “Response A is better than Response B” often yields better results than simply showing it “Response A” as a good example. But from a statistical perspective, why is this the case? Does preference data simply add more information, or does it fundamentally change the mathematical nature of the learning process? ...
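
The “barrier” in question is the classical minimax rate for estimating a parameter from $n$ i.i.d. samples in a regular parametric family, stated loosely here as background:

```latex
\inf_{\hat{\theta}} \; \sup_{\theta \in \Theta} \;
\mathbb{E}_{\theta}\!\left[ \lVert \hat{\theta}_n - \theta \rVert \right]
\;=\; \Theta\!\left( n^{-1/2} \right)
```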

2025-05 · 7 min · 1392 words
[Which Agent Causes Task Failures and When? On Automated Failure Attribution of LLM Multi-Agent Systems 🔗](https://openreview.net/pdf?id=GazlTYxZss)

Who Broke the Code? The Quest to Automate Failure Attribution in Multi-Agent Systems

Introduction: The Detective Work of AI Development Imagine managing a team of software developers, researchers, and data analysts. You assign them a complex project—say, analyzing the housing market in a specific city—and wait for the results. But when the report comes back, it’s wrong. The data is hallucinated, or the code failed to execute. Now, you have to figure out who on the team dropped the ball and when exactly things went south. Was it the analyst who pulled the wrong file? The coder who wrote a buggy script? Or the manager who gave unclear instructions? ...

10 min · 2093 words
[Improving Consistency Models with Generator-Augmented Flows 🔗](https://arxiv.org/abs/2406.09570)

Closing the Gap in Generative AI—How Generator-Augmented Flows Fix Consistency Training

Introduction Generative AI has undergone a massive transformation with the advent of diffusion models. These models, which power tools like Stable Diffusion and DALL-E, generate stunning images by gradually removing noise from a signal. However, they suffer from a well-known bottleneck: speed. Generating a single image often requires dozens or hundreds of sequential steps. To solve this, researchers introduced Consistency Models (CMs). The promise of a consistency model is alluring: it aims to generate high-quality data in a single step (or very few steps) by learning to map any point on a noisy trajectory directly to its clean starting point. ...
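
Concretely, a consistency model $f_\theta$ is trained so that every point on the same noising trajectory maps to the same clean endpoint, with an identity boundary condition at the smallest noise level $\epsilon$ (standard consistency-model notation, sketched independently of the paper's generator-augmented flows):

```latex
f_\theta(x_t, t) \;=\; f_\theta(x_{t'}, t') \quad \forall\, t, t' \in [\epsilon, T],
\qquad f_\theta(x_\epsilon, \epsilon) \;=\; x_\epsilon .
```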

2024-06 · 9 min · 1843 words
[Benign Samples Matter! Fine-tuning On Outlier Benign Samples Severely Breaks Safety 🔗](https://arxiv.org/abs/2505.06843)

The Benign Trojan Horse: How Innocent Data Can Break LLM Safety

Introduction In the world of Large Language Models (LLMs), “safety alignment” is the guardrail that prevents your AI assistant from teaching you how to build a bomb or launder money. Companies spend millions on Reinforcement Learning from Human Feedback (RLHF) to ensure these models refuse harmful requests. For a long time, the assumption has been straightforward: to break this safety alignment during fine-tuning, you need malicious data. If you fine-tune a safe model on a dataset full of hate speech or illegal instructions, the model will naturally become harmful. Consequently, the defense strategy has been equally straightforward: filter the training data. If we scan datasets for toxicity and remove the bad apples, the model should remain safe. ...

2025-05 · 8 min · 1580 words
[Ad-Hoc Human-AI Coordination Challenge 🔗](https://arxiv.org/abs/2506.21490)

Breaking the Self-Play Bubble: The Ad-Hoc Human-AI Coordination Challenge

Introduction We are witnessing a golden age of Artificial Intelligence. From Large Language Models (LLMs) drafting emails to reinforcement learning agents mastering complex strategy games like Go and Dota 2, AI capabilities are skyrocketing. However, a critical gap remains between an AI’s ability to solve a problem in isolation and its ability to solve a problem with us. Consider a self-driving car. It is not enough for the vehicle to navigate a track perfectly when alone; it must interpret the subtle, unwritten rules of negotiation with human drivers at a four-way stop. Similarly, an AI assistant in a hospital cannot simply optimize for the fastest procedure; it must coordinate with doctors and nurses, understanding their intent and adapting to their workflow. ...

2025-06 · 9 min · 1856 words
[Learning Safety Constraints for Large Language Models 🔗](https://arxiv.org/abs/2505.24445)

Geometry as a Shield - Ensuring LLM Safety with the Safety Polytope (SaP)

Introduction Large Language Models (LLMs) have become ubiquitous, demonstrating incredible prowess in reasoning, coding, and creative writing. However, this power comes with a significant “dual-use” risk. The same model that can write a helpful medical summary can, if prompted maliciously, generate hate speech, instructions for illegal acts, or biological weapon recipes. To combat this, the AI community has largely relied on methods like Reinforcement Learning from Human Feedback (RLHF). While effective to a degree, RLHF has fundamental limitations. Models can learn to “game” the reward function—optimizing for the metric rather than genuine safety—and the process is expensive, requiring massive amounts of annotated data and retraining. Furthermore, adversarial attacks (or “jailbreaks”) have proven that even aligned models can be tricked into bypassing their safety filters using cleverly crafted prompts. ...

2025-05 · 9 min · 1813 words
[Penalizing Infeasible Actions and Reward Scaling in Reinforcement Learning with Offline Data 🔗](https://arxiv.org/abs/2507.08761)

Taming the Unknown - How PARS Solves Extrapolation Error in Offline RL

Introduction Imagine trying to learn how to ride a bicycle just by watching videos of professional cyclists. You’ve never touched the pedals yourself. If you suddenly hopped on a bike, you might assume you can perform a wheelie because you saw it in a video, but in reality, you’d likely fall over. This is the central challenge of Offline Reinforcement Learning (Offline RL). We want to train agents to make optimal decisions using only a static dataset of previously collected experiences, without letting them interact with the dangerous real world during training. ...

2025-07 · 10 min · 1993 words
[Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D 🔗](https://arxiv.org/abs/2504.14151)

Bridging Language and the Physical World: A Deep Dive into Locate 3D and 3D-JEPA

Imagine asking a robot to “pick up the small coffee table between the sofa and the lamp.” For a human, this is trivial. We instantly parse the scene, identify the sofa, the lamp, and the specific table that sits between them. For an AI, however, this task—known as 3D Referential Grounding (or 3D-REFEXP)—is notoriously difficult. The AI must understand natural language, perceive 3D geometry, handle the noise inherent in real-world sensors, and reason about spatial relationships. Historically, models attempting this have relied on “cheats,” such as pre-processed, perfect 3D meshes or human-annotated segmentation maps. But the real world doesn’t come with pre-labeled meshes; it comes as a messy stream of RGB-D (color and depth) sensor data. ...

2025-04 · 9 min · 1705 words