[ACE-RL: Adaptive Constraint-Enhanced Reward for Long-form Generation Reinforcement Learning 🔗](https://arxiv.org/abs/2509.04903)

Beyond 'Good Enough': How ACE-RL Teaches LLMs to Master Long-Form Writing

Large Language Models (LLMs) have become incredibly adept at understanding vast amounts of text. Give them a 100-page document, and they can summarize it, answer questions about it, and find needles in the haystack. But when you flip the script and ask them to generate a long, high-quality document—like a detailed report, a compelling story, or a legal brief—they often stumble. The output might be coherent at the sentence level, yet it can quickly lose focus, become repetitive, or fail to meet the specific, nuanced requirements of the prompt. ...

2025-09
[REFRAG: Rethinking RAG based Decoding 🔗](https://arxiv.org/abs/2509.01092)

REFRAG: Supercharging RAG with 30× Faster First-Token Generation

Large Language Models (LLMs) have transformed how we interact with information, but they have a well-known Achilles’ heel: their appetite for computational resources. This becomes especially apparent in Retrieval-Augmented Generation (RAG) systems, where large amounts of external text are injected into the model to help it answer questions. The more context we provide, the better the potential answer—but the slower and more expensive the process becomes. This creates a frustrating trade-off between knowledge and efficiency. ...
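
To make that trade-off concrete, here is a back-of-envelope sketch (illustrative constants of my own, not figures from the paper): prefill attention cost grows roughly quadratically with prompt length, so every retrieved passage inflates the work done before the first answer token appears.

```python
# Back-of-envelope sketch (illustrative constants, not the paper's numbers):
# why time-to-first-token (TTFT) balloons with RAG context. Prefill attention
# cost grows roughly quadratically with prompt length, so retrieved passages
# are paid for before a single answer token is generated.
def attention_flops(tokens: int, layers: int = 32, d_model: int = 4096) -> float:
    # ~2 * n^2 * d per layer for the QK^T and attention-times-V matmuls;
    # this omits the MLP, whose cost scales only linearly in n.
    return layers * 2 * tokens**2 * d_model

question_tokens = 50                 # a short user question
for passages in (0, 4, 16):          # retrieved chunks of ~256 tokens each
    n = question_tokens + passages * 256
    ratio = attention_flops(n) / attention_flops(question_tokens)
    print(f"{passages:2d} passages -> {n:5d} tokens, ~{ratio:,.0f}x prefill attention cost")
```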

2025-09
[Emergent Hierarchical Reasoning in LLMs through Reinforcement Learning 🔗](https://arxiv.org/abs/2509.03646)

How LLMs Learn to Think: Unpacking Hierarchical Reasoning in AI

Reinforcement Learning (RL) has been a game-changer for Large Language Models (LLMs), dramatically boosting their ability to solve complex reasoning problems. As models improve, a fundamental question has remained unanswered: how exactly does this improvement happen? The training process often feels like a black box, producing curious phenomena such as sudden “aha moments” where a model appears to acquire a new emergent skill, or “length-scaling,” where longer, more detailed solutions lead to higher accuracy. ...

2025-09
[SINQ: Sinkhorn-Normalized Quantization for Calibration-Free Low-Precision LLM Weights 🔗](https://arxiv.org/abs/2509.22944)

Beyond Single Scales: Unpacking SINQ for Better, Faster LLM Quantization

Large Language Models (LLMs) have transformed artificial intelligence, enabling breathtaking capabilities in text generation, reasoning, and understanding. But this power comes with a heavy price: gigantic model sizes, high memory demands, and substantial computational costs. Deploying these models efficiently—especially on constrained hardware—is a major engineering challenge. One of the most effective tools to shrink these models is quantization—reducing the precision of model weights from high-precision floating-point numbers (like bfloat16) to low-precision integers (like int4). This can slash memory usage by 4× or more, enabling powerful models to run on consumer-grade hardware. ...
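
As a concrete (if simplified) illustration of what quantization does, here is a minimal round-to-nearest int4 sketch in PyTorch with a single scale per row. This is the plain baseline the post's title alludes to, not the paper's Sinkhorn-normalized SINQ method:

```python
import torch

def quantize_rtn_int4(w: torch.Tensor):
    """Baseline round-to-nearest int4 quantization, one scale per row.

    NOT the paper's SINQ method (which normalizes with Sinkhorn-style
    dual scales); just the single-scale baseline it improves on.
    """
    qmax = 7                                            # signed int4 range is [-8, 7]
    scale = w.abs().amax(dim=1, keepdim=True) / qmax    # one scale per output row
    q = torch.clamp(torch.round(w / scale), -8, 7)      # integer codes
    return q.to(torch.int8), scale                      # int8 tensor as a 4-bit container

w = torch.randn(1024, 1024, dtype=torch.bfloat16).float()
q, s = quantize_rtn_int4(w)
w_hat = q.float() * s                                   # dequantized approximation
print("max abs error:", (w - w_hat).abs().max().item())
# Memory: 16 bits/weight (bf16) -> 4 bits/weight (int4) = 4x smaller,
# ignoring the small per-row scale overhead.
```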

2025-09
[SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents 🔗](https://arxiv.org/abs/2509.06283)

Beyond Chatbots: How Reinforcement Learning Creates Autonomous AI Researchers

We’re living in an era where Large Language Models (LLMs) are becoming incredibly powerful. Yet for many users, interacting with them still feels like a simple Q&A: you ask, they answer. But what if an AI could go further? Imagine posing a complex question—such as “What are the long-term economic impacts of quantum computing on the financial sector?”—and having the AI autonomously research it, browse relevant sources, analyze data, and present a comprehensive, evidence-backed report. ...

2025-09
[Aligning Generalisation Between Humans and Machines 🔗](https://arxiv.org/abs/2411.15626)

Why AI Doesn’t 'Get It' Like We Do: Aligning How Humans and Machines Generalise

We live in an age of incredible AI. Generative models can write poetry, create stunning art, and even help scientists discover new medicines. These powerful tools are increasingly positioned as partners in human-AI teams, where they augment our abilities to solve complex problems. But for any team to work, the members need to be on the same page. In AI, this is known as the alignment problem: making sure AI systems act according to our goals and preferences. ...

2024-11
[A Neural Algorithm of Artistic Style 🔗](https://arxiv.org/abs/1508.06576)

Content vs. Style: The Algorithm That Taught Computers to Paint Like van Gogh

Have you ever looked at a painting by Vincent van Gogh and wondered what makes it so distinctively his? It’s not just the subject—the swirling starry nights or vibrant sunflowers—but the brushstrokes, the color palette, the texture that defines his work. This essence, separate from the subject matter, is what we call “style.” For centuries, the interplay between the content of an image (what it depicts) and its style (how it’s depicted) has been the domain of human artists. But what if we could teach a machine to understand this distinction—then create art of its own? ...
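
For readers who want to peek at the mechanics, here is a minimal PyTorch sketch of the paper's central idea as it is commonly implemented: content is compared through a CNN's raw feature activations, while style is compared through their Gram matrices (channel-wise feature correlations). The single style layer and the loss weight below are simplifying assumptions; the paper sums the style loss over several layers.

```python
import torch
import torch.nn.functional as F
from torchvision import models

# Pretrained VGG-19 as a fixed feature extractor, as in the paper.
vgg = models.vgg19(weights=models.VGG19_Weights.IMAGENET1K_V1).features.eval()
for p in vgg.parameters():
    p.requires_grad_(False)

def features(img, layer=21):                # index 21 = conv4_2, the paper's content layer
    x = img
    for i, module in enumerate(vgg):
        x = module(x)
        if i == layer:
            return x

def gram(feat):
    b, c, h, w = feat.shape
    f = feat.view(b, c, h * w)
    return f @ f.transpose(1, 2) / (c * h * w)   # channel correlations = "style" signature

content_img = torch.randn(1, 3, 224, 224)   # stand-ins for real, preprocessed images
style_img = torch.randn(1, 3, 224, 224)
generated = content_img.clone().requires_grad_(True)

fc, fs, fg = features(content_img), features(style_img), features(generated)
loss = F.mse_loss(fg, fc) + 1e3 * F.mse_loss(gram(fg), gram(fs))
loss.backward()                             # one step of optimizing the image itself
```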

2015-08
[Neural Style Transfer: A Review 🔗](https://arxiv.org/abs/1705.04058)

From Pixels to Picasso: A Deep Dive into Neural Style Transfer

What if you could take your favorite vacation photo and have it repainted in the style of Vincent van Gogh’s The Starry Night? Or transform a simple portrait into a cubist masterpiece worthy of Picasso? This isn’t science fiction—it’s the magic of Neural Style Transfer (NST), a revolutionary computer vision technique that blends the content of one image with the artistic style of another. Since its introduction in a groundbreaking 2015 paper by Gatys et al., NST has exploded in popularity, powering viral apps like Prisma and inspiring a massive wave of academic research. It fundamentally reshaped computational art and creativity. But how does it actually work? How can a machine understand something as abstract and human as “style”? ...

2017-05
[How transferable are features in deep neural networks? 🔗](https://arxiv.org/abs/1411.1792)

General vs. Specific: A Deep Dive into Feature Transferability in Neural Networks

If you’ve spent any time training convolutional neural networks (CNNs) for image tasks, you’ve probably noticed something peculiar. Whether you’re classifying cats, detecting cars, or segmenting medical images, the filters learned by the very first layer often look remarkably similar: a collection of edge detectors, color blobs, and Gabor-like patterns. This phenomenon is so common that it raises a fundamental question. We know the first layer learns these simple, seemingly universal patterns. We also know the final layer must be highly specialized for its specific task — a neuron firing to say “this is a Siberian Husky” is of no use in a network trying to identify different types of chairs. So, if the network starts out general and ends up specific, where does this transition happen? Does it occur abruptly at one layer, or is it a gradual shift across the network’s depth? ...
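
The paper probes exactly this by transplanting the first n layers of a trained network and retraining the rest. Here is a minimal modern sketch of that recipe, using a torchvision ResNet-18 for illustration (an assumption on my part; the original experiments used AlexNet-style networks on ImageNet splits):

```python
import torch.nn as nn
from torchvision import models

def make_transfer_model(n_frozen_stages: int, num_target_classes: int) -> nn.Module:
    """Keep the first (general) stages of a source-pretrained network frozen
    and retrain the remaining (task-specific) stages on the target task."""
    model = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)

    # The network's main stages, in depth order.
    stages = [model.conv1, model.bn1, model.layer1, model.layer2,
              model.layer3, model.layer4]
    for stage in stages[:n_frozen_stages]:
        for p in stage.parameters():
            p.requires_grad = False          # transferred "general" features stay fixed

    # Replace the source-specific head with a fresh target-task classifier.
    model.fc = nn.Linear(model.fc.in_features, num_target_classes)
    return model

# Sweeping n_frozen_stages tests how deep into the network features stay general.
model = make_transfer_model(n_frozen_stages=2, num_target_classes=10)
```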

2014-11
[A Comprehensive Survey on Transfer Learning 🔗](https://arxiv.org/abs/1911.02685)

Can You Teach an Old Model New Tricks? A Deep Dive into Transfer Learning

In modern machine learning, more labeled data usually means better models. But collecting and labeling massive datasets is expensive, slow, and sometimes impossible. That leaves practitioners stranded: how do you build accurate models when the target task only has a handful of labeled examples? Transfer learning provides a pragmatic answer. The central idea: reuse knowledge learned from a related, data-rich task (the source domain) to help learning in a low-data task (the target domain). Like a violinist learning piano faster because of shared musical concepts, a model trained on one domain can accelerate learning on another. ...
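
As one minimal sketch of that idea (a single recipe among the many the survey taxonomizes): reuse an ImageNet-pretrained backbone as a frozen feature extractor and train only a tiny head on the scarce target data. The target task below (5 classes, a random stand-in batch) is hypothetical:

```python
import torch
import torch.nn as nn
from torchvision import models

# Source domain: ImageNet (data-rich). Target domain: a hypothetical
# 5-class task with few labeled examples.
backbone = models.resnet18(weights=models.ResNet18_Weights.IMAGENET1K_V1)
backbone.fc = nn.Identity()              # expose the 512-d penultimate features
backbone.eval()
for p in backbone.parameters():
    p.requires_grad = False              # source knowledge is reused, not retrained

head = nn.Linear(512, 5)                 # the only trainable parameters
optimizer = torch.optim.Adam(head.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch; replace with the real (small) labeled target set.
images, labels = torch.randn(16, 3, 224, 224), torch.randint(0, 5, (16,))
with torch.no_grad():
    feats = backbone(images)             # transferred representation
optimizer.zero_grad()
loss = loss_fn(head(feats), labels)
loss.backward()
optimizer.step()
```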

2019-11