[Is Complex Query Answering Really Complex? 🔗](https://openreview.net/pdf?id=F8NTPAz5HH)

The Illusion of Progress in Complex Query Answering

In the field of Artificial Intelligence, one of the holy grails is complex reasoning. We don’t just want machines that can recognize a cat in a picture; we want systems that can navigate complex chains of logic to answer questions that require multiple steps. In the domain of Knowledge Graphs (KGs), this task is known as Complex Query Answering (CQA). For years, researchers have been developing neural networks that map queries and entities into latent spaces, aiming to solve intricate logical puzzles. If you look at the leaderboards for standard benchmarks like FB15k-237 or NELL995, it looks like we are making incredible progress. Accuracy is going up, and models seem to be mastering reasoning. ...

8 min · 1690 words
[Layer-wise Alignment: Examining Safety Alignment Across Image Encoder Layers in Vision Language Models 🔗](https://arxiv.org/abs/2411.04291)

The Hidden Backdoor in Vision-Language Models: How "Early Exits" Break Safety

In the rapidly evolving landscape of Artificial Intelligence, Vision-Language Models (VLMs) like LLaVA and Llama 3.2 have become the new standard. These models can “see” an image and answer complex questions about it, from diagnosing medical X-rays to explaining memes. To make these powerful models safe for public use, researchers invest heavily in safety alignment—training the model to refuse harmful requests, like “how to build a bomb” or “how to evade taxes.” ...

2024-11 · 9 min · 1899 words
[Identifying Causal Direction via Variational Bayesian Compression 🔗](https://arxiv.org/abs/2505.07503)

Causality as Compression: How Bayesian Neural Networks Find the Arrow of Time

Imagine you are handed a spreadsheet with two columns of data: Column A and Column B. You plot them, and they are perfectly correlated. As Column A increases, Column B increases. Now, answer this question: Does A cause B, or does B cause A? This is one of the most fundamental challenges in science, economics, and artificial intelligence. We have all heard the mantra “correlation does not imply causation.” Usually, to figure out the direction of causality, we rely on interventions—we poke the system (like a randomized controlled trial in medicine) and see what breaks. ...

2025-05 · 11 min · 2219 words
[Understanding and Mitigating Memorization in Generative Models via Sharpness of Probability Landscapes 🔗](https://arxiv.org/abs/2412.04140)

Why Diffusion Models Memorize: A Geometric Perspective and How to Fix It

Generative AI has seen a meteoric rise, with diffusion models like Stable Diffusion and Midjourney creating stunning visuals from simple text prompts. However, beneath the impressive capabilities lies a persistent and potentially dangerous problem: memorization. Occasionally, these models do not generate new images; instead, they regurgitate exact copies of their training data. This poses significant privacy risks (e.g., leaking medical data or private photos) and copyright challenges. While researchers have proposed various heuristics to detect this, we have lacked a unified framework to explain why it happens mathematically and where it occurs in the model’s learned distribution. ...

2024-12 · 7 min · 1472 words
[Arena-based Evaluation is a fundamental yet significant evaluation paradigm for modern AI models, especially large language models (LLMs) 🔗](https://arxiv.org/abs/2505.03475)

Fixing the Referee: A Stable Framework for LLM Arena Evaluation

In the rapidly evolving landscape of Artificial Intelligence, a critical question arises every time a new Large Language Model (LLM) is released: Is it better than the rest? To answer this, the community has turned to “Model Arenas.” Platforms like Chatbot Arena allow users to prompt two anonymous models simultaneously and vote on which response is better. It is a digital colosseum where models battle for supremacy. To quantify these wins and losses into a leaderboard, researchers rely on the Elo rating system—the same algorithm used to rank chess players and video game competitors. ...
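
As a quick illustration of how such pairwise votes become ratings, here is a minimal, generic Elo update in Python. This is only the textbook formula, not the stabilized framework the paper proposes; the K-factor of 32 and the starting rating of 1000 are arbitrary assumptions for the example.

```python
def expected_score(r_a: float, r_b: float) -> float:
    """Probability that model A beats model B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((r_b - r_a) / 400))


def elo_update(r_a: float, r_b: float, a_won: bool, k: float = 32.0) -> tuple[float, float]:
    """Fold one head-to-head vote into both ratings (standard Elo, K-factor assumed)."""
    e_a = expected_score(r_a, r_b)
    s_a = 1.0 if a_won else 0.0
    return r_a + k * (s_a - e_a), r_b + k * ((1.0 - s_a) - (1.0 - e_a))


# Toy example: two models start at 1000; A wins a single vote.
print(elo_update(1000.0, 1000.0, a_won=True))  # (1016.0, 984.0)
```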

2025-05 · 8 min · 1642 words
[Flopping for FLOPs: Leveraging Equivariance for Computational Efficiency 🔗](https://arxiv.org/abs/2502.05169)

How Mirror Symmetry Can Slice Your Neural Network's Compute in Half

If you have been following the trajectory of deep learning over the last decade, you are likely familiar with “The Bitter Lesson” by Rich Sutton. The core argument is simple: historically, the only technique that consistently matters in the long run is scaling computation. Human ingenuity—hand-crafting features or encoding domain knowledge—eventually gets crushed by massive models trained on massive compute. But what if we could use human knowledge to improve how we scale computation? ...

2025-02 · 9 min · 1715 words
[Enforcing Latent Euclidean Geometry in Single-Cell VAEs for Manifold Interpolation 🔗](https://arxiv.org/abs/2507.11789)

FlatVI: Making Sense of Cellular Maps by Flattening the Latent Space

Imagine you are trying to navigate across a globe, but you only have a flat paper map. You draw a straight line with a ruler from New York to London. On your map, it looks like the shortest path. But if you were to fly that route in reality, you would realize that because the Earth is curved, your “straight line” on the map is actually a longer, inefficient curve on the globe. The shortest path on the sphere (a geodesic) looks curved on your flat map. ...

2025-07 · 10 min · 2052 words
[Monte-Carlo Tree Search with Uncertainty Propagation via Optimal Transport 🔗](https://openreview.net/pdf?id=DUGFTH9W8B)

Taming Uncertainty in MCTS with Wasserstein Barycenters and Power Means

Monte-Carlo Tree Search (MCTS) is the engine behind some of the most impressive feats in modern AI, most notably the superhuman performance of AlphaGo and AlphaZero in games like Go and Chess. These algorithms work by building a search tree of possibilities, simulating future outcomes, and backing up the values to make the best decision at the root. But there is a catch. Traditional MCTS excels in deterministic environments—where moving a piece to square E4 always puts the piece on E4. However, the real world is messy. It is stochastic (actions have random outcomes) and partially observable (we can’t see everything). In these “fog of war” scenarios, standard MCTS struggles. It often relies on simple averages to estimate the value of a state, effectively smoothing over the risks and high-variance outcomes that define stochastic environments. ...
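
To make the "simple averages" point concrete, here is a minimal, generic sketch of the standard MCTS backup step in Python: each visited node keeps a count and a running mean of simulated returns. The Node fields and function names are illustrative assumptions, not the Wasserstein-barycenter or power-mean backup studied in the paper.

```python
from dataclasses import dataclass


@dataclass
class Node:
    visits: int = 0
    value: float = 0.0  # running mean of the returns backed up through this node


def backup(path: list[Node], ret: float) -> None:
    """Standard MCTS backup: fold one simulated return into the running average
    of every node on the visited path. All variance information is discarded."""
    for node in path:
        node.visits += 1
        node.value += (ret - node.value) / node.visits


# Two simulations with opposite outcomes look identical to a "safe" state afterwards.
root = Node()
backup([root], ret=+1.0)
backup([root], ret=-1.0)
print(root.visits, root.value)  # 2 0.0 -- the risk of the -1 outcome is averaged away
```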

8 min · 1543 words
[Re-ranking Reasoning Context with Tree Search Makes Large Vision-Language Models Stronger 🔗](https://arxiv.org/abs/2506.07785)

Beyond Simple Retrieval: How Tree Search and Reasoning Contexts Boost Multimodal AI

“One example speaks louder than a thousand words.” This adage is particularly true in the world of Artificial Intelligence. When we want a model to solve a complex problem—like analyzing a geometry diagram or interpreting a historical chart—showing it a similar, solved example often works better than giving it a long list of instructions. This technique is known as In-Context Learning (ICL). However, Large Vision-Language Models (LVLMs)—the AI systems that can see and speak—still face significant hurdles. Despite their impressive capabilities, they are prone to hallucinations. They might confidently misstate a historical event or fail to answer a user’s question entirely because they lack specific domain knowledge. ...

2025-06 · 9 min · 1727 words
[Robust Automatic Modulation Classification with Fuzzy Regularization 🔗](https://openreview.net/pdf?id=DDIGCk25BO)

Solving the 'Maybe' Problem: How Fuzzy Regularization Sharpens Signal Classification

In the modern world, the air around us is saturated with invisible data. From cellular networks to military radar, electromagnetic signals are constantly crisscrossing the atmosphere. Managing this chaotic spectrum requires Automatic Modulation Classification (AMC). This technology acts as a digital gatekeeper, identifying the type of modulation (the method used to encode data onto a radio wave) of a detected signal. Whether it is for dynamic spectrum allocation or surveillance, the system must know: Is this a 64QAM signal? Or perhaps QPSK? ...

8 min · 1521 words
[Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain 🔗](https://arxiv.org/abs/2505.01267)

Frequency-Guided Purification: A New Paradigm for Defending Against Adversarial Attacks

In the evolving landscape of artificial intelligence, computer vision models have achieved superhuman performance in tasks ranging from medical diagnosis to autonomous driving. However, these models possess a startling vulnerability: adversarial examples. Imagine taking a photo of a panda, adding a layer of static noise imperceptible to the human eye, and feeding it to a state-of-the-art AI. Suddenly, the AI is 100% convinced the panda is a gibbon. This is not a hypothetical scenario; it is the fundamental premise of adversarial attacks. These perturbations are designed to exploit the specific mathematical sensitivities of neural networks, causing catastrophic failures in classification. ...

2025-05 · 9 min · 1896 words
[Not All Wrong is Bad: Using Adversarial Examples for Unlearning 🔗](https://openreview.net/pdf?id=BkrIQPREkn)

How to Make AI Forget: The Paradox of Adversarial Unlearning

In the era of GDPR, the California Consumer Privacy Act (CCPA), and increasing digital surveillance, the “right to be forgotten” has transitioned from a philosophical concept to a technical necessity. When a user deletes their account from a service, they expect their data to vanish not just from the database, but also from the “brain” of the AI models trained on that data. This process is known as Machine Unlearning. ...

9 min · 1886 words
[TLLC: Transfer Learning-based Label Completion for Crowdsourcing 🔗](https://openreview.net/pdf?id=BkdAnSKNoX)

Solving the Lazy Worker Problem: How Transfer Learning Fills the Gaps in Crowdsourcing

In the era of deep learning, data is the new oil. But raw data is useless without accurate labels. While we would love to have domain experts annotate every single image or document, that is often prohibitively expensive and slow. Enter crowdsourcing: a method where tasks are distributed to a large pool of non-expert workers (like on Amazon Mechanical Turk). Crowdsourcing is cost-effective, but it introduces two major headaches: ...

7 min · 1366 words
[Distribution-aware Fairness Learning in Medical Image Segmentation From A Control-Theoretic Perspective 🔗](https://arxiv.org/abs/2502.00619)

Balancing the Scales: How Control Theory is Solving Fairness in Medical AI

Artificial Intelligence has made massive strides in medical imaging, particularly in segmentation—the process of identifying and outlining boundaries of tumors or organs in scans. However, a persistent shadow hangs over these advancements: bias. Deep learning models are data-hungry. In clinical practice, data is rarely balanced. We often have an abundance of data from specific demographics (e.g., White patients) or common disease stages (e.g., T2 tumors), but a scarcity of data from minority groups or varying disease severities. When a standard neural network trains on this skewed data, it becomes a “lazy learner.” It optimizes for the majority and fails to generalize to the underrepresented groups. In a medical context, this isn’t just an accuracy problem; it’s an ethical crisis. A model that works perfectly for one patient group but fails for another is unsafe for deployment. ...

2025-02 · 8 min · 1637 words
[Learning Soft Sparse Shapes for Efficient Time-Series Classification 🔗](https://arxiv.org/abs/2505.06892)

SoftShape: Bridging Accuracy and Interpretability in Time-Series Classification with Soft Sparsification

In the world of machine learning, Time-Series Classification (TSC) is a ubiquitous challenge. From detecting heart arrhythmias in ECG signals to recognizing gestures from smartwatches or classifying robot movements on different surfaces, time-series data is everywhere. However, practitioners often face a difficult trade-off: Accuracy vs. Interpretability. Deep neural networks generally provide the highest accuracy, acting as powerful “black boxes” that ingest data and output predictions. But in critical fields like healthcare, a “black box” isn’t enough. A doctor needs to know why a model thinks a heartbeat is irregular. This is where Shapelets come in. Shapelets are specific, discriminative subsequences—little “shapes” within the data—that act as signatures for a class. While highly interpretable, discovering them is computationally expensive, and traditional methods often discard a vast amount of potentially useful contextual information to save time. ...

2025-05 · 8 min · 1671 words
[Optimizing Adaptive Attacks against Watermarks for Language Models 🔗](https://arxiv.org/abs/2410.02440)

Breaking the Seal: How Adaptive Attacks Crush LLM Watermarks

In the rapidly evolving landscape of Artificial Intelligence, a new arms race has begun. On one side, we have Large Language Model (LLM) providers like OpenAI and Google, who are striving to watermark their generated content. Their goal is noble: to label AI-generated text invisibly, helping to curb misinformation, academic dishonesty, and spam. On the other side are the adversaries—users who want to strip these watermarks away to pass off AI text as human-written. ...

2024-10 · 8 min · 1659 words
[Robust ML Auditing using Prior Knowledge 🔗](https://arxiv.org/abs/2505.04796)

The Cheating AI: How to Audit Machine Learning Models That Lie

In the rapidly evolving landscape of Artificial Intelligence, a new and somewhat disturbing game of cat-and-mouse is emerging. We rely on Machine Learning (ML) models for high-stakes decisions—from approving loan applications to moderating hate speech on social media. Consequently, regulators and society at large demand that these models be “fair.” They should not discriminate based on gender, race, or age. But here is the problem: How do you verify if a black-box model is fair? ...

2025-05 · 10 min · 2112 words
[Adjusting Model Size in Continual Gaussian Processes: How Big is Big Enough? 🔗](https://openreview.net/pdf?id=9vYGZX4OVN)

The Goldilocks Problem in AI: How to Automatically Size Models for Streaming Data

In machine learning, we often face a “Goldilocks” dilemma before we even start training a model: How big should the model be? If the model is too small (too few neurons, too few parameters), it fails to capture the complexity of the data, leading to poor predictions. If the model is too large, it wastes computational resources, memory, and energy without providing any additional accuracy. In a standard setting where you have all your training data sitting on a hard drive, you can solve this via cross-validation—trying different sizes and picking the best one. ...

9 min · 1744 words
[LotteryCodec: Searching the Implicit Representation in a Random Network for Low-Complexity Image Compression 🔗](https://arxiv.org/abs/2507.01204)

Winning the Compression Lottery: How Random Networks Can Outperform State-of-the-Art Codecs

In the world of digital media, we are constantly fighting a battle between quality and size. We want crystal-clear 4K images, but we want them to load instantly and take up zero space on our phones. For decades, manually designed algorithms like JPEG, HEVC, and VTM have reigned supreme. But recently, a challenger has entered the arena: Neural Image Compression. Neural codecs typically use deep learning to figure out how to squeeze image data better than any human-designed formula could. However, they come with a catch. They are often computationally heavy, requiring massive neural networks that drain battery life and slow down processors. ...

2025-07 · 9 min · 1720 words
[MODA: MOdular Duplex Attention for Multimodal Perception, Cognition, and Emotion Understanding 🔗](https://arxiv.org/abs/2507.04635)

Fixing Attention Deficit in AI: How MODA Teaches LLMs to Truly See and Feel

We are currently witnessing a golden age of Multimodal Large Language Models (MLLMs). From GPT-4V to Gemini, these models promise a future where Artificial Intelligence can perceive the world just as humans do—integrating text, images, and audio into a seamless stream of understanding. We often assume that because a model can see an image, it fully understands it. However, if you push these models slightly beyond surface-level description, cracks begin to appear. Ask a standard MLLM to explain the subtle sarcasm in a movie scene or the micro-expression on a poker player’s face, and it often resorts to hallucination or generic guesswork. ...

2025-07 · 8 min · 1574 words