[Private Language Models via Truncated Laplacian Mechanism 🔗](https://arxiv.org/abs/2410.08027)

Keeping Secrets in High Dimensions - A New Approach to Private Word Embeddings

Natural Language Processing (NLP) has become deeply embedded in our daily lives, from the predictive text on your smartphone to the large language models (LLMs) analyzing medical records. However, these models have a tendency to be a bit too good at remembering things. They often memorize specific details from their training data, which leads to a critical problem: privacy leakage. If a model is trained on sensitive emails or clinical notes, there is a risk that an attacker could extract that private information. ...
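
The paper's title names the tool: the truncated Laplacian mechanism, which perturbs embedding vectors with noise that is bounded rather than heavy-tailed. As a rough intuition only, here is a minimal NumPy sketch of adding truncated Laplace noise to an embedding; the scale, truncation bound, and rejection-sampling approach are illustrative assumptions, not the paper's calibrated mechanism.

```python
import numpy as np

rng = np.random.default_rng(0)

def truncated_laplace_noise(shape, scale=0.1, bound=0.5):
    """Sample Laplace noise restricted to [-bound, bound].

    Illustrative only: the paper calibrates scale and truncation to a formal
    privacy guarantee; the numbers here are placeholders.
    """
    noise = rng.laplace(loc=0.0, scale=scale, size=shape)
    # Resample out-of-range entries until everything lies inside the interval.
    mask = np.abs(noise) > bound
    while mask.any():
        noise[mask] = rng.laplace(loc=0.0, scale=scale, size=mask.sum())
        mask = np.abs(noise) > bound
    return noise

embedding = rng.normal(size=300)          # stand-in for a word embedding
private = embedding + truncated_laplace_noise(embedding.shape)
print(np.abs(private - embedding).max())  # never exceeds the truncation bound
```

Bounding the noise is the point of truncation: no coordinate of the embedding can be pushed arbitrarily far from its original value, which helps keep high-dimensional embeddings usable after perturbation.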

2024-10 · 9 min · 1843 words
[Preserving Multi-Modal Capabilities of Pre-trained VLMs for Improving Vision-Linguistic Compositionality 🔗](https://arxiv.org/abs/2410.05210)

Can AI Understand Grammar? Improving Compositionality in VLMs Without Breaking Them

Introduction Humans possess an innate ability to understand the world through multiple senses. We effortlessly combine visual cues with language to interpret complex scenes. If you see a picture of “a horse riding a man,” you immediately recognize the absurdity and distinguish it from “a man riding a horse.” This ability to understand how different components (objects, attributes, relations) combine to form meaning is called compositional reasoning. In the world of Artificial Intelligence, Vision-Language Models (VLMs) like CLIP have revolutionized how computers understand images and text. They are fantastic at recognizing objects and matching images to captions in a general sense. However, they have a “bag-of-words” problem. To a standard VLM, “a horse riding a man” and “a man riding a horse” look mathematically almost identical because they contain the same words. ...
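
You can see the "bag-of-words" problem directly by comparing CLIP's text embeddings for the two captions. Below is a minimal probe using a Hugging Face transformers CLIP checkpoint (the specific checkpoint is an arbitrary choice, and the snippet is a quick sanity check rather than anything from the paper):

```python
import torch
from transformers import CLIPModel, CLIPTokenizer

model_id = "openai/clip-vit-base-patch32"   # assumed checkpoint; any CLIP variant works
model = CLIPModel.from_pretrained(model_id)
tokenizer = CLIPTokenizer.from_pretrained(model_id)

captions = ["a man riding a horse", "a horse riding a man"]
inputs = tokenizer(captions, padding=True, return_tensors="pt")

with torch.no_grad():
    emb = model.get_text_features(**inputs)

emb = emb / emb.norm(dim=-1, keepdim=True)   # unit-normalize for cosine similarity
print(f"cosine similarity: {(emb[0] @ emb[1]).item():.3f}")
```

If the similarity comes out close to 1, the text encoder is treating the caption as an unordered bag of words, which is exactly the failure mode the paper targets.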

2024-10 · 8 min · 1651 words
[Preserving Generalization of Language Models in Few-shot Continual Relation Extraction 🔗](https://arxiv.org/abs/2410.00334)

Don't Lose Your Head: How Keeping the Language Model Head Solves Catastrophic Forgetting

Imagine learning how to ride a bicycle. Now, imagine that learning to ride that bike caused you to immediately forget how to walk. This absurdity is a reality for many Artificial Intelligence models. This phenomenon, known as Catastrophic Forgetting, is a major hurdle in the field of Continual Learning (CL), where models must learn a sequence of tasks without erasing their prior knowledge. This problem becomes even harder when you don’t have much data to learn from—a scenario called Few-shot Continual Relation Extraction (FCRE). Here, a model must identify relationships in text (e.g., “Person A is the mother of Person B”) based on just a handful of examples, all while handling new relationships that appear over time. ...

2024-10 · 7 min · 1437 words
[Preference-Guided Reflective Sampling for Aligning Language Models 🔗](https://arxiv.org/abs/2408.12163)

Beyond Random Guessing: How Preference-Guided Reflective Sampling Aligns LLMs

Introduction Imagine you are a professor asking a student to write an essay. If the student writes a single draft and hands it in immediately, the quality might be decent, but it likely misses some nuance. Now, imagine you ask the student to write a draft, read it over, critique their own work based on specific criteria (like “be more concise” or “add references”), and then write a final version. The result is almost guaranteed to be better. ...

2024-08 · 9 min · 1771 words
[Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model 🔗](https://arxiv.org/abs/2408.10764)

Otter: Taming LLMs with Non-Disruptive Parameter Insertion

Large Language Models (LLMs) are undeniably impressive. They can write poetry, debug code, and summarize history. However, anyone who has worked with them extensively knows they are not without their flaws. They can hallucinate, produce toxic content, or fail at complex reasoning tasks. To fix this, we generally have two options: finetuning the model (which risks “catastrophic forgetting,” where the model loses its original knowledge) or inference intervention. Inference intervention involves using a separate, smaller “calibration” model (often a Reward Model) to guide the main LLM during text generation. ...
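
Otter's contribution is to fold the reward prediction into the LLM itself via inserted parameters; for contrast, here is a toy sketch of the conventional inference-intervention baseline it streamlines, where a separate reward model reranks sampled candidates (both functions below are stand-ins, not real models):

```python
def generate_candidates(prompt: str, n: int = 3) -> list[str]:
    # Stand-in for sampling n continuations from the base LLM.
    return [f"{prompt} [sampled continuation {i}]" for i in range(n)]

def reward_model(text: str) -> float:
    # Stand-in for a separate calibration/reward model's scalar score.
    return float(len(set(text.split())))  # placeholder: rewards lexical variety

def best_of_n(prompt: str, n: int = 3) -> str:
    # Classic inference intervention: sample several candidates, keep the best-scored one.
    candidates = generate_candidates(prompt, n)
    return max(candidates, key=reward_model)

print(best_of_n("Explain quantum entanglement in one sentence."))
```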

2024-08 · 7 min · 1469 words
[Predicate Debiasing in Vision-Language Models Integration for Scene Graph Generation Enhancement 🔗](https://arxiv.org/abs/2403.16184)

Taming the Bias: How to Successfully Integrate Vision-Language Models into Scene Graph Generation

Introduction Imagine walking into a messy living room. You don’t just see a “sofa,” a “cat,” and a “remote.” You instantly understand the web of connections: the cat is sleeping on the sofa, the remote is under the cushion, and the painting is hanging on the wall. This structured understanding of objects and their relationships is what computer vision researchers call a Scene Graph. Scene Graph Generation (SGG) is a pivotal task in AI, bridging the gap between raw pixel data and high-level language description. It transforms an image into a structured graph where nodes are objects and edges are relationships (predicates). This structure is essential for downstream tasks like robotic navigation (“robot, pick up the cup on the table”) or assisting the visually impaired. ...
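
Concretely, a scene graph is just a typed graph over detected objects. A minimal sketch of the data structure, using the living-room example above (the contents are illustrative):

```python
# Minimal scene-graph data structure: nodes are detected objects,
# edges are (subject, predicate, object) triples.
scene_graph = {
    "objects": ["cat", "sofa", "remote", "cushion", "painting", "wall"],
    "relations": [
        ("cat", "sleeping on", "sofa"),
        ("remote", "under", "cushion"),
        ("painting", "hanging on", "wall"),
    ],
}

# Downstream tasks query the graph, e.g. "what is on the sofa?"
on_sofa = [s for s, p, o in scene_graph["relations"] if o == "sofa"]
print(on_sofa)  # ['cat']
```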

2024-03 · 14 min · 2865 words
[Precise Model Benchmarking with Only a Few Observations 🔗](https://arxiv.org/abs/2410.05222)

How Good is Your LLM on Niche Topics? Solving the Small Sample Size Problem with Empirical Bayes

Introduction In the era of Large Language Models (LLMs), we are obsessed with benchmarks. We look at massive leaderboards and see that a model achieves “85% accuracy on MMLU” or “90% on HellaSwag.” These aggregate numbers give us a general sense of capability, but they often hide a critical problem: models are not equally good at everything. A practitioner often cares about specific, granular topics. You might not care about general “Law,” but you care deeply about “Intellectual Property Law in the 19th Century.” The problem is data scarcity. While we have thousands of questions for broad categories, niche subgroups might only have ten or twenty examples available for testing. ...
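
The empirical Bayes intuition is to shrink a noisy subgroup accuracy toward the model's overall accuracy, with the amount of shrinkage controlled by how little data the subgroup has. Here is a minimal beta-binomial-style sketch of that idea; the pseudo-count and the numbers are illustrative, not the paper's estimator.

```python
def shrunk_accuracy(correct, total, prior_mean, prior_strength=20.0):
    """Empirical-Bayes-style shrinkage of a subgroup accuracy estimate.

    Behaves like a beta-binomial posterior mean: with few observations the
    estimate stays close to the overall (prior) accuracy; with many, it
    approaches the raw subgroup accuracy. `prior_strength` is an assumed
    pseudo-count, not a value from the paper.
    """
    return (correct + prior_strength * prior_mean) / (total + prior_strength)

overall_accuracy = 0.85                               # pooled over the whole benchmark
print(shrunk_accuracy(9, 10, overall_accuracy))       # tiny niche subgroup: pulled toward 0.85
print(shrunk_accuracy(900, 1000, overall_accuracy))   # large subgroup: stays near 0.90
```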

2024-10 · 9 min · 1728 words
[PREALIGN: Boosting Cross-Lingual Transfer by Early Establishment of Multilingual Alignment 🔗](https://arxiv.org/abs/2407.16222)

PreAlign: Teaching LLMs to Translate Before They Learn to Read

Large Language Models (LLMs) like LLaMA and GPT-4 have transformed how we interact with technology. While these models are technically multilingual, there is a catch: they are predominantly trained on English text. They often treat other languages as second-class citizens, picking them up spontaneously rather than systematically. This results in a phenomenon known as weak cross-lingual alignment. An LLM might know a fact in English (e.g., “The piano was invented in Italy”) but fail to recall that same fact when queried in Chinese or Russian. The knowledge is “stuck” in the English part of the model’s brain. ...

2024-07 · 3 min · 1465 words
[Pre-trained Language Models Do Not Help Auto-regressive Text-to-Image Generation 🔗](https://arxiv.org/abs/2311.16201)

Why Your LLM Can't Draw: The Limits of Pre-training in Auto-Regressive Image Generation

Introduction: The Great Divide in AI Art If you have been following the explosion of AI-generated imagery over the last few years, you likely know the big names: DALL-E, Midjourney, Stable Diffusion. What you might not know is that under the hood, there is a fundamental split in how these models work. On one side, we have Diffusion Models (like Stable Diffusion and DALL-E 2/3). These work by removing noise from a chaotic image to reveal a clear picture. On the other side, we have Auto-Regressive Models (like the original DALL-E and Google’s Parti). These treat images like language: they break an image into a sequence of “tokens” and predict them one by one, just like ChatGPT predicts the next word in a sentence. ...

2023-11 · 10 min · 1964 words
[Pragmatic Norms Are All You Need - Why The Symbol Grounding Problem Does Not Apply to LLMs 🔗](https://aclanthology.org/2024.emnlp-main.651.pdf)

Meaning Without Objects: Why LLMs Don't Need to See a Dog to Know What 'Dog' Means

In the last few years, the field of Natural Language Processing (NLP) has experienced a seismic shift. We have moved from systems that struggle to construct a coherent sentence to Large Language Models (LLMs) like GPT-4, which can pass the Uniform Bar Exam in the 90th percentile. This performance creates a cognitive dissonance for researchers and students alike. On one hand, these models generate text that appears deeply knowledgeable, reasoned, and coherent. On the other hand, we know they are, at their core, statistical engines predicting the next token in a sequence. They have never seen a sunset, felt a “stick,” or petted a “dog.” ...

11 min · 2235 words
[PrExMe! Large Scale Prompt Exploration of Open Source LLMs for Machine Translation and Summarization Evaluation 🔗](https://arxiv.org/abs/2406.18528)

PrExMe: Unlocking the Secrets of Prompt Engineering for LLM-Based Evaluation

In the rapidly evolving world of Natural Language Processing (NLP), we have reached a point where we are using Artificial Intelligence to evaluate Artificial Intelligence. Large Language Models (LLMs) have become so capable that researchers now use them as “judges” to grade the quality of machine translation (MT) and text summarization. This is known as LLM-based evaluation. However, using an LLM as a judge introduces a new variable: the prompt. If you ask ChatGPT to “grade this translation,” you might get a different score than if you ask it to “act as an expert translator and critique this text.” This variability creates a problem for scientific rigor. Which prompt is the “correct” one? Do different models require different prompting strategies? ...
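
The "new variable" really is just the template string. Two hypothetical judge prompts for the same translation pair illustrate the kind of variation PrExMe grids over (these are examples, not prompts from the paper):

```python
# Two hypothetical judge prompts for the same (source, translation) pair.
# PrExMe's point is that the downstream score can change with the template.
source = "Der Hund schläft auf dem Sofa."
translation = "The dog is sleeping on the sofa."

plain_prompt = (
    "Grade this German-to-English translation from 0 to 100.\n"
    f"Source: {source}\nTranslation: {translation}\nScore:"
)

persona_prompt = (
    "You are an expert professional translator. Critique the translation "
    "below, then output a single score from 0 to 100.\n"
    f"Source: {source}\nTranslation: {translation}\nScore:"
)

for name, prompt in [("plain", plain_prompt), ("persona", persona_prompt)]:
    print(f"--- {name} ---\n{prompt}\n")
```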

2024-06 · 8 min · 1609 words
[POSTMARK: A Robust Blackbox Watermark for Large Language Models 🔗](https://arxiv.org/abs/2406.14517)

Can We Watermark AI Text Without Model Access? Deep Dive into POSTMARK

Large Language Models (LLMs) are reshaping the internet. From generating news articles to writing code, the volume of machine-generated content is exploding. But this capability comes with a shadow side: hallucinations, bias, and the potential for mass-produced disinformation. If the web becomes flooded with AI-generated text, how can we trust what we read? Furthermore, if future AI models are trained on the output of today’s AI, we risk a feedback loop of degrading quality. ...

2024-06 · 9 min · 1884 words
[Position Engineering: Boosting Large Language Models through Positional Information Manipulation 🔗](https://arxiv.org/abs/2404.11216)

Beyond Prompt Engineering: How "Ghost Tokens" Unlock LLM Potential

If you have spent any time working with Large Language Models (LLMs) like GPT-4 or Llama 2, you are likely familiar with the dark art of Prompt Engineering. We spend hours tweaking phrases, adding “Let’s think step by step,” or restructuring paragraphs just to get the model to output the correct answer. It is a process that feels less like engineering and more like casting spells—change one word, and the magic works; change another, and it fails. ...

2024-04 · 8 min · 1702 words
[Analysis of Behavior Patterns of LLMs in (Non-)offensive Speech Identification 🔗](https://aclanthology.org/2024.emnlp-main.1019.pdf)

Can LLMs Actually Detect Hate Speech? An Analysis of Behavior Patterns and Failures

Imagine you are a content moderator for a social media platform, or perhaps a developer building a chatbot intended for elderly companionship. You want to ensure that the content processed or generated by your system is safe. Naturally, you turn to Large Language Models (LLMs) to help filter out offensive speech. You feed a comment into the model and ask: “Is this text offensive?” You expect a simple “Yes” or “No.” Instead, the model refuses to answer, lectures you on morality, or hallucinates a response that has nothing to do with the question. ...

7 min · 1484 words
[Pixology: Probing the Linguistic and Visual Capabilities of Pixel-based Language Models 🔗](https://arxiv.org/abs/2410.12011)

Can Images Read? A Deep Dive into the Linguistic Brain of Pixel-Based Models

Imagine trying to read a book not by recognizing letters or words, but by looking at a continuous screenshot of the pages. This is essentially how Pixel-based Language Models work. Instead of breaking text down into a vocabulary of “tokens” (like subwords or characters) as models like BERT or GPT do, these models treat text as images. Why would we do this? The standard approach of using subwords creates a “vocabulary bottleneck.” If you want a model to understand 100 languages, you need a massive vocabulary list that competes for space. Pixel-based models bypass this entirely. If a script can be rendered on a screen, the model can process it. ...
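
To make "text as images" concrete, here is a minimal sketch that renders a sentence with Pillow and slices it into fixed-size patches, the unit a pixel-based encoder would consume. The canvas size, patch size, and font are arbitrary choices, not the actual rendering pipeline of these models.

```python
import numpy as np
from PIL import Image, ImageDraw

# Render a sentence onto a small grayscale canvas (white background, black text).
canvas = Image.new("L", (256, 16), color=255)
ImageDraw.Draw(canvas).text((2, 2), "Pixel models read rendered text.", fill=0)

# Slice the rendering into 16x16 patches, the input unit for a pixel-based encoder.
arr = np.array(canvas)                                 # shape (16, 256)
patches = arr.reshape(16, -1, 16).transpose(1, 0, 2)   # shape (num_patches, 16, 16)
print(patches.shape)                                   # (16, 16, 16): 16 patches of 16x16 pixels
```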

2024-10 · 7 min · 1311 words
[PhiloGPT: A Philology-Oriented Large Language Model for Ancient Chinese Manuscripts with Dunhuang as Case Study 🔗](https://aclanthology.org/2024.emnlp-main.163.pdf)

Decoding the Past: How PhiloGPT is Revolutionizing the Study of Ancient Chinese Manuscripts

Imagine trying to read a letter written a thousand years ago. The paper is tattered, characters are missing due to wormholes or water damage, and the grammar follows rules that haven’t been used for centuries. Furthermore, the author used a slang term specific to a small village in the 7th century that appears in no modern dictionary. This is the daily reality for philologists—scholars who dedicate their lives to studying ancient texts. It is a field requiring decades of training, immense memorization, and the patience of a saint. ...

5 min · 2270 words
[PERSONALIZED PIECES: Efficient Personalized Large Language Models through Collaborative Efforts 🔗](https://arxiv.org/abs/2406.10471)

Building Your Own LLM: How Personalized Pieces (PER-PCS) Revolutionizes Model Customization

Introduction Imagine you have a personal assistant who has read every email you’ve ever written, knows exactly which movies you like, and understands your writing style perfectly. Now, imagine trying to build that assistant using today’s Large Language Models (LLMs). You face a difficult dilemma. Option one is to use a prompt-based approach (like RAG), where you feed your private history into a centralized model like ChatGPT. This works, but it raises serious privacy concerns—do you really want to send your personal data to a remote server for every query? Option two is to fine-tune your own personal model. This keeps your data safer and provides deeper personalization, but it is computationally expensive. If a service has one million users, maintaining one million separate fine-tuned models (a paradigm known as “One-PEFT-Per-User”) creates a storage and cost nightmare. ...
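
A back-of-the-envelope calculation shows why "One-PEFT-Per-User" gets expensive. The numbers below are assumptions for illustration (a 7B-class model with 32 layers and hidden size 4096, rank-8 LoRA on four attention projections, fp16 weights, one million users), not figures from the paper:

```python
# Back-of-the-envelope cost of keeping one LoRA adapter per user.
layers, hidden, rank, targets = 32, 4096, 8, 4
params_per_adapter = layers * targets * (hidden * rank + rank * hidden)  # A and B matrices
bytes_per_adapter = params_per_adapter * 2        # fp16 = 2 bytes per parameter

users = 1_000_000
total_gb = users * bytes_per_adapter / 1e9

print(f"{params_per_adapter / 1e6:.1f}M params per adapter, "
      f"{bytes_per_adapter / 1e6:.1f} MB each, "
      f"~{total_gb / 1e3:.1f} TB for {users:,} users")
```

Even a small adapter per user adds up to tens of terabytes at this scale, which is the storage-and-cost nightmare the excerpt refers to.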

2024-06 · 9 min · 1797 words
[Personality-aware Student Simulation for Conversational Intelligent Tutoring Systems 🔗](https://arxiv.org/abs/2404.06762)

Can AI Simulate the Classroom? Teaching LLMs to Act Like Students with Personalities

Imagine trying to train a new teacher. You wouldn’t want their very first interaction to be with a struggling student who needs delicate, specialized attention. You would want them to practice first. The same logic applies to Intelligent Tutoring Systems (ITS)—AI-driven educational tools designed to provide personalized instruction. To build truly effective AI tutors, developers need to test them against a wide variety of student behaviors. But recruiting hundreds of real students for pilot studies is slow, expensive, and difficult to scale. Furthermore, testing how an AI handles a frustrated, shy, or over-eager student is challenging when relying solely on available datasets. ...

2024-04 · 8 min · 1641 words
[Performance-Guided LLM Knowledge Distillation for Efficient Text Classification at Scale 🔗](https://arxiv.org/abs/2411.05045)

Distilling Giants—How to Train Efficient Models Using Feedback Loops and Hard Negatives

In the current landscape of Artificial Intelligence, we are often faced with a dilemma: do we choose intelligence or efficiency? Large Language Models (LLMs) like GPT-4 or Claude are incredibly smart, capable of understanding nuance and context that smaller models miss. However, they are also slow, expensive, and computationally heavy—often too much so for high-volume production environments. On the other hand, smaller Pre-trained Language Models (PLMs) like BERT are lightning-fast and cheap to run, but they often struggle with complex tasks, specifically when labeled training data is scarce or when the task involves hundreds of different categories. ...
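
The underlying distillation loop is simple: let the expensive LLM label unlabeled text, then train a cheap student on those silver labels. The sketch below uses a stubbed teacher and an off-the-shelf scikit-learn classifier as a generic illustration; the paper's performance-guided feedback loop and hard-negative mining are not shown.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def llm_teacher(text: str) -> str:
    # Placeholder for an expensive LLM call that returns a class label.
    return "billing" if "invoice" in text or "charge" in text else "technical"

# Unlabeled production-style texts get silver labels from the teacher...
unlabeled = [
    "My invoice shows a double charge this month.",
    "The app crashes when I open the settings page.",
    "Why was my card charged twice?",
    "Login fails with a timeout error.",
]
silver_labels = [llm_teacher(t) for t in unlabeled]

# ...and a small, fast student model is trained on them.
student = make_pipeline(TfidfVectorizer(), LogisticRegression())
student.fit(unlabeled, silver_labels)
print(student.predict(["I was charged for a plan I cancelled."]))
```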

2024-11 · 6 min · 1252 words
[Perceptions to Beliefs: Exploring Precursory Inferences for Theory of Mind in Large Language Models 🔗](https://arxiv.org/abs/2407.06004)

Seeing to Believing: Why LLMs Struggle with Theory of Mind and How to Fix It

Imagine you are watching a child named Sally place a marble into a basket and leave the room. While she is gone, another child, Anne, moves the marble to a box. When Sally returns, where will she look for her marble? If you answered “the basket,” congratulations—you have a functioning Theory of Mind (ToM). You understand that Sally holds a false belief because she didn’t see the switch. You can model her mental state separate from your own knowledge of reality. ...

2024-07 · 7 min · 1396 words