Archive

2025 ⁸⁸

October ³⁰

Wasted Work: How DeepPrune Slashes LLM Reasoning Costs by Over 80%

2025-10 · 7 min · 1480 words

More Than Just Correct: Why Your AI Coding Assistant Needs a ‘Vibe Check’

2025-10 · 8 min · 1524 words

Learning to Learn, One Batch at a Time: A Deep Dive into Iterative Amortized Inference

2025-10 · 7 min · 1387 words

HyperAdaLoRA: A Hypernetwork-Powered Upgrade for Faster, Smarter LLM Fine-Tuning

2025-10 · 7 min · 1438 words

Beyond Chain-of-Thought: How Parallel Thinking and Self-Refinement Unlock Smarter LLMs

2025-10 · 8 min · 1612 words

AgentFlow: Training LLM Agents to Think, Plan, and Use Tools Effectively

2025-10 · 8 min · 1519 words

Teaching AI to Think, Backtrack, and Try Again: A Deep Dive into MM-HELIX

2025-10 · 6 min · 1135 words

Beyond Imitation: How Early Experience Lets Agents Learn from Their Own Mistakes

2025-10 · 9 min · 1891 words

MemMamba: Teaching AI to ‘Take Notes’ and Conquer the Challenge of Ultra-Long Sequences

2025-10 · 8 min · 1635 words

Beyond the First Frame: Introducing VideoCanvas for Arbitrary Video Creation

2025-10 · 7 min · 1481 words

DreamOmni2: Teaching AI to Edit and Create Images with Both Words and Pictures

2025-10 · 8 min · 1504 words

Beyond Text-to-Video: How UniVideo Unifies Understanding, Generation, and Editing

2025-10 · 8 min · 1605 words

MASA: Teaching AI Models to ‘Think About Their Thinking’

2025-10 · 6 min · 1107 words

Beyond Bigger Contexts: Teaching LCLMs to Think with Reusable Reasoning

2025-10 · 7 min · 1291 words

Reasoning Sparks: How Tiny Probabilities Unlock AI’s Problem-Solving Superpowers

2025-10 · 6 min · 1149 words

ARTDECO: Bridging SLAM and Foundation Models for Flawless On-the-Fly 3D Worlds

2025-10 · 6 min · 1174 words

Brain-Inspired AI: How Artificial Hippocampus Networks Give LLMs Long-Term Memory

2025-10 · 7 min · 1347 words

Why AI Assistants Make Terrible Simulated Users — And How ‘Flipping the Dialogue’ Fixes It

2025-10 · 7 min · 1332 words

Forget Pixels, Let’s Generate Code: A Deep Dive into Code2Video for Creating Educational Videos

2025-10 · 6 min · 1215 words

One Agent Is Good, Ten Are Better: How Scaling Unlocks Near-Human Performance in AI Computer Assistants

2025-10 · 7 min · 1430 words

Less is More: How Tiny Recursive Networks Outsmart Giant AI Models on Complex Puzzles

2025-10 · 7 min · 1432 words

The Tortoise and the Hare of AI: How Gradual Learning Makes Visual AI Faster

2025-10 · 6 min · 1157 words

RECAP: Teaching AI to Think Critically by Showing It Flawed Reasoning

2025-10 · 6 min · 1214 words

Mid-Training is All You Need: How a 15B Model Reached the AI Frontier

2025-10 · 6 min · 1267 words

LongCodeZip: Making LLMs Read Your Entire Codebase Without Breaking the Bank

2025-10 · 6 min · 1114 words

Hiding in the Void: How StealthAttack Poisons 3D Scenes

2025-10 · 7 min · 1321 words

Small is Mighty: How ModernVBERT Redefines Visual Document Retrieval

2025-10 · 4 min · 663 words

From Seconds to Minutes: How Self-Forcing++ Teaches AI to Generate Long Videos

2025-10 · 6 min · 1224 words

Can AI Beat Wall Street? Testing LLM Agents in the Stock Market with STOCKBENCH

2025-10 · 6 min · 1172 words

Don’t Waste Your Mistakes: How Smart Experience Replay Unlocks Reasoning in LLMs

2025-10 · 6 min · 1272 words

September ³⁵

Beyond Chain-of-Thought: Unpacking the Silent Reasoning of LLMs

2025-09 · 12 min · 2492 words

ChemMAS: Teaching AI to Reason Like a Chemist

2025-09 · 8 min · 1617 words

Evolution Strikes Back: A Surprisingly Powerful Way to Fine-Tune LLMs

2025-09 · 6 min · 1076 words

The Dragon Hatchling: A New AI Architecture Bridging Transformers and the Brain

2025-09 · 12 min · 2407 words

Knapsack RL: A Computational ‘Free Lunch’ for Training Smarter Language Models

2025-09 · 6 min · 1110 words

Beyond Math Puzzles: How Teaching LLMs to ‘Think’ Unlocks Superior Chat Performance

2025-09 · 6 min · 1217 words

Meet ARK-V1: An LLM Agent That Navigates Knowledge Graphs for Smarter QA

2025-09 · 7 min · 1444 words

Can LLMs Learn a Trick from Computer Vision? Introducing LLM-JEPA

2025-09 · 6 min · 1123 words

Teaching Language Models to Think Before They Act: A Deep Dive into the PDDL-INSTRUCT Framework

2025-09 · 6 min · 1142 words

One Tokenizer to Rule Them All? A Deep Dive into ATOKEN for Images, Videos, and 3D

2025-09 · 6 min · 1272 words

Beyond the ReAct Loop: Building and Testing Smarter AI Agents with ARE and Gaia2

2025-09 · 7 min · 1477 words

AgentScaler: How Scaling Environments, Not Just Models, Unlocks Advanced AI Agents

2025-09 · 6 min · 1167 words

Beyond the Hype: Do LLMs Actually Learn, or Just Memorize? A Deep Dive into In-Context Learning

2025-09 · 6 min · 1139 words

GP-hy-T: The Dawn of a Universal Physics Engine?

2025-09 · 6 min · 1186 words

Beyond Google: How DeepDive Teaches LLMs to Be Expert Researchers

2025-09 · 6 min · 1267 words

K2-THINK: How a 32B Model Punches Above Its Weight to Rival AI Giants

2025-09 · 6 min · 1129 words

Balancing on a Razor’s Edge: How AI is Discovering Elusive Singularities in Fluid Dynamics

2025-09 · 8 min · 1578 words

Beyond Majority Rule: Training LLMs to Synthesize the Best Answer from Many Guesses

2025-09 · 6 min · 1251 words

When More AI Brains Are Worse Than One: The Hidden Dangers of AI Debate

2025-09 · 6 min · 1246 words

Breaking the ‘Tunnel Vision’ of LLMs: An In-depth Look at ParaThinker’s Parallel Reasoning

2025-09 · 7 min · 1367 words

Learning by Doing: How AgentGym-RL Teaches LLMs to Solve Real-World Problems

2025-09 · 7 min · 1333 words

Beyond ‘Good Enough’: How ACE-RL Teaches LLMs to Master Long-Form Writing

2025-09 · 7 min · 1340 words

REFRAG: Supercharging RAG with 30× Faster First-Token Generation

2025-09 · 6 min · 1133 words

How LLMs Learn to Think – Unpacking the Hierarchical Reasoning in AI

2025-09 · 6 min · 1123 words

Beyond Single Scales: Unpacking SINQ for Better, Faster LLM Quantization

2025-09 · 6 min · 1145 words

Beyond Chatbots: How Reinforcement Learning Creates Autonomous AI Researchers

2025-09 · 6 min · 1258 words

HuMo: Generate Lifelike Human Videos from Text, Photos, and Voice

2025-09 · 7 min · 1371 words

Small Model, Big Impact: How VLA-Adapter Shrinks Robot Brains by 14×

2025-09 · 5 min · 867 words

SAPO: How a Swarm of AI Models Learned 94% Faster by Sharing Experiences

2025-09 · 6 min · 1240 words

Teaching AI to Browse Like a Researcher: The Two-Stage Recipe for Superhuman Web Agents

2025-09 · 7 min · 1279 words

Think Backwards, Write Better: How REER Teaches AI Creative Reasoning

2025-09 · 7 min · 1285 words

Silent Thinking: How LLMs Reason Without Writing It Down

2025-09 · 10 min · 2067 words

Take Control: Build Your Own AI Research Assistant

2025-09 · 5 min · 1062 words

Drivelology: When AI Meets ‘Nonsense with Depth’

2025-09 · 6 min · 1229 words

UI-TARS-2: Teaching AI to Master Your Computer Through Trial and Error

2025-09 · 6 min · 1142 words

August ⁷

Beyond Left-to-Right: Introducing Dream 7B, a Powerful New Diffusion LLM

2025-08 · 7 min · 1473 words

WebWatcher: Training AI Agents to See, Read, and Reason Like a Pro Researcher

2025-08 · 5 min · 983 words

Putting AI Agents to the Test: Inside LiveMCP-101’s Gauntlet of Real-World Challenges

2025-08 · 6 min · 1276 words

PILOT: Smart LLM Routing That Learns and Saves Money

2025-08 · 6 min · 1168 words

The Hidden Math Behind Search: Why Even Perfect AI Can’t Retrieve Everything

2025-08 · 6 min · 1269 words

rStar2-Agent: Teaching AI to Think Smarter, Not Just Longer

2025-08 · 5 min · 1057 words

How AI Vision Models Learn to See Like Humans: The Three Keys to Brain-Like Intelligence

2025-08 · 5 min · 1009 words

July ²

More Thinking, More Problems? When Extra Compute Hurts LLM Robustness

2025-07 · 7 min · 1455 words

Beyond Guesswork: How WebShaper Engineers Smarter AI Web Agents with Mathematical Precision

2025-07 · 6 min · 1228 words

June ²

Teaching LLMs to Teach Themselves: A Deep Dive into Self-Adapting Language Models (SEAL)

2025-06 · 8 min · 1521 words

Beyond Transformers: How MesaNet Learns In-Context by Optimizing on the Fly

2025-06 · 13 min · 2622 words

May ²

Train on the Fly: How LLMs Can Continuously Improve Themselves During Testing

2025-05 · 8 min · 1505 words

LaCT: Why Bigger Is Better for Test-Time Training and Long-Context AI

2025-05 · 8 min · 1598 words

April ²

Agent S2: How a Team of AI Specialists is Mastering Your Computer

2025-04 · 6 min · 1200 words

CoProSketch: The AI Sketch Generator That Actually Lets You Edit

2025-04 · 6 min · 1199 words

March ²

Bridging the Gaps: How RISE Handles Missing Data in Simulation-Based Inference

2025-03 · 8 min · 1506 words

Generating AI Brains on Demand: How ORAL Crafts LoRA Adapters for Evolving LLMs

2025-03 · 8 min · 1492 words

February ³

Stop Repeating Mistakes: How LLMs Can Learn from Feedback in Real Time

2025-02 · 7 min · 1369 words

Beyond the Training Loop: Unlocking LLM Reasoning with Inference-Time Tricks

2025-02 · 7 min · 1299 words

YOLOv12: The First Attention-Powered Real-Time Detector That Breaks the CNN Monopoly

2025-02 · 5 min · 1002 words

January ³

Why Your Cat Is Still Smarter Than the Most Advanced AI

2025-01 · 8 min · 1620 words

Beyond the Black Box: A Deep Dive into Self-Interpretable Neural Networks

2025-01 · 12 min · 2549 words

Beyond Pretraining: How LLMs Remap Their ‘Brains’ On-the-Fly

2025-01 · 8 min · 1593 words

2024 ⁴⁵

December ³

AI, Brain Models, and Messy Data: Building Robust Amortized Bayesian Inference

2024-12 · 9 min · 1855 words

TRELLIS: Weaving High-Quality 3D Worlds with a Unified Latent Structure

2024-12 · 6 min · 1110 words

Teaching AI to Paint Without Ever Seeing a Painting

2024-12 · 6 min · 1227 words

November ⁵

When Models Meet Reality: The Ultimate Guide to Test‑Time Adaptation

2024-11 · 12 min · 2453 words

Training an LLM to Be Its Own Toughest Critic

2024-11 · 9 min · 1773 words

Beyond the Prompt: Unpacking Shortcut Learning in Large Language Models

2024-11 · 7 min · 1488 words

A-BLINK: Using Neural Networks to Turbocharge Gaussian Process Inference

2024-11 · 7 min · 1367 words

Why AI Doesn’t ‘Get It’ Like We Do: Aligning How Humans and Machines Generalise

2024-11 · 8 min · 1515 words

October ⁸

The Secret to Faster Diffusion Models: How AdaptiveDiffusion Skips Steps Intelligently

2024-10 · 6 min · 1251 words

Taming Two Adversaries: A Breakthrough in Robust Sparse Regression

2024-10 · 10 min · 1973 words

Unlocking In-Context Reinforcement Learning with Random Data — A Deep Dive into State-Action Distillation (SAD)

2024-10 · 8 min · 1558 words

AlphaGateau: Training Chess Engines Faster and Smarter with Graphs

2024-10 · 8 min · 1621 words

How LLMs Can Teach Themselves to Be More Trustworthy

2024-10 · 7 min · 1481 words

ACE: The One Transformer Model for Vision, Optimization, and Scientific Simulation

2024-10 · 8 min · 1551 words

FLASHMASK: Taming Long Sequences with Ultra-Efficient Attention Masks

2024-10 · 6 min · 1116 words

Visual Story-Writing: Editing Narratives by Manipulating Interactive Story Maps

2024-10 · 7 min · 1280 words

September ¹

Beyond Static Models: How TTT-UNet Adapts on the Fly for Superior Medical Image Segmentation

2024-09 · 7 min · 1350 words

July ⁶

RNNs Are Back? How Making Hidden States into Learners Unlocks Long-Context Potential

2024-07 · 8 min · 1637 words

The Self-Awareness Paradox: How Teaching Neural Networks to Model Themselves Makes Them Simpler

2024-07 · 7 min · 1446 words

Train-Attention: Teaching LLMs to Focus on What Matters for Continual Learning

2024-07 · 3 min · 561 words

Longhorn: Reimagining State Space Models as Online Learners

2024-07 · 8 min · 1521 words

From 30 Minutes to 3: How MInference Slashes LLM Wait Times for Million-Token Prompts

2024-07 · 6 min · 1066 words

Unpacking FlashAttention-3: How Asynchrony and FP8 Supercharge Transformers

2024-07 · 6 min · 1187 words

June ²

Beyond Chain-of-Thought: How CPO Makes LLMs Smarter Without Slowing Them Down

2024-06 · 8 min · 1518 words

Beyond Pixels: How MASt3R Grounds 2D Image Matching in 3D Reality

2024-06 · 7 min · 1372 words

May ⁴

Meet MicroAdam: The Memory-Miser Optimizer with Provable Convergence

2024-05 · 6 min · 1243 words

The Return of the RNN? A Deep Dive into xLSTM

2024-05 · 14 min · 2905 words

Mamba‑2 Explained: The Duality Connecting State‑Space Models and Attention

2024-05 · 14 min · 2923 words

Teaching LLMs to Code the World: A New Path for Smarter AI Agents

2024-05 · 8 min · 1509 words

April ³

Learning to Generalize: How Meta-Learning Is Cracking the Code of Domain Generalization

2024-04 · 9 min · 1735 words

Train Once, Infer Forever: A Deep Dive into Amortized Neural Inference

2024-04 · 12 min · 2442 words

From 2D Pixels to 3D Splats: How GS-LRM Reconstructs Worlds from a Handful of Images

2024-04 · 6 min · 1187 words

March ³

MVSplat: Building Stunning 3D Worlds from Just a Handful of Photos

2024-03 · 6 min · 1221 words

Beyond Transformers: How VideoMamba Unlocks Efficient Long-Video Understanding

2024-03 · 6 min · 1106 words

Beyond Transformers: How LocalMamba Unlocks the Power of State Space Models for Vision

2024-03 · 6 min · 1243 words

February ⁷

Beyond Gradient Descent: How Transformers Discover Their Own Optimization Algorithms

2024-02 · 7 min · 1311 words

Learning from Scraps – A Deep Dive into Few-Shot Learning on Graphs

2024-02 · 8 min · 1577 words

How Do LLMs Learn on the Fly? A Deep Dive into In-Context Learning

2024-02 · 7 min · 1437 words

The Power of Patience: How Blocking Updates Can Solve Delayed Bandit Feedback

2024-02 · 8 min · 1529 words

Beyond Text: How GITA Teaches AI to See and Reason with Graphs

2024-02 · 7 min · 1370 words

Beyond the Sunny Day: How G-NAS Teaches Object Detectors to See in the Dark

2024-02 · 6 min · 1185 words

LGM: Creating High-Resolution 3D Models in 5 Seconds with Gaussian Splatting

2024-02 · 6 min · 1174 words

January ³

Shrinking 3D Gaussian Splatting Scenes 31× and Rendering Them 4× Faster

2024-01 · 6 min · 1194 words

Vision Mamba: A New Challenger to Transformers for Computer Vision?

2024-01 · 6 min · 1191 words

VMamba: A New Challenger to CNNs and Transformers in Computer Vision

2024-01 · 6 min · 1149 words

2023 ³⁰

December ²

Beyond Photorealism: Feature 3DGS Brings AI Understanding to 3D Scenes

2023-12 · 7 min · 1341 words

How DUSt3R is Redefining 3D Reconstruction — No Camera Info Required

2023-12 · 7 min · 1306 words

November ⁷

Unlocking the Black Box: A Deep Dive into How LLMs Learn on the Fly

2023-11 · 9 min · 1744 words

Untangling the Web: A Guide to Meta, Online, and Continual Learning

2023-11 · 9 min · 1822 words

Distilling Fairness: How Fair Wasserstein Coresets Tackle Bias in Big Data

2023-11 · 9 min · 1905 words

LightGaussian: Shrinking 3D Scenes by 15x While Boosting Rendering Speed

2023-11 · 6 min · 1134 words

GaussianShader: Bringing Realistic Reflections to Real-Time Rendering

2023-11 · 6 min · 1233 words

GS-SLAM: A New Era for Real-Time 3D Mapping with Gaussian Splatting

2023-11 · 5 min · 972 words

Mip-Splatting: The Secret to Crystal-Clear Zooms in 3D Gaussian Splatting

2023-11 · 5 min · 905 words

October ⁴

ClusT3: Adapting to the Unknown with Information-Invariant Clustering

2023-10 · 8 min · 1596 words

Beyond Transformers: Scaling Deep Learning Sub-Quadratically with the Monarch Mixer

2023-10 · 11 min · 2260 words

GaussianDreamer: From Text to Stunning 3D in 15 Minutes by Fusing 2D and 3D AI

2023-10 · 6 min · 1173 words

Unlocking Massive Contexts: A Deep Dive into DISTFLASHATTN

2023-10 · 6 min · 1187 words

September ¹

Promptbreeder: How LLMs Teach Themselves to Become Better Problem Solvers

2023-09 · 8 min · 1559 words

August ¹

Real-Time Radiance Fields: A Deep Dive into 3D Gaussian Splatting

2023-08 · 6 min · 1257 words

July ¹

FlashAttention-2: Even Faster, Even More Efficient Attention for Transformers

2023-07 · 8 min · 1584 words

June ³

How We Learn So Much From So Little: A Bayesian Model That Thinks in Natural Language

2023-06 · 7 min · 1447 words

Solving Giant Gaussian Processes with… SGD? A Deep Dive into Benign Non-Convergence

2023-06 · 7 min · 1290 words

Beyond FlashAttention: Making Transformers Even Faster with Dynamic Sparsity

2023-06 · 5 min · 990 words

May ³

Why Do Deep Ensembles Work? A New Theory Unites Them with Bayesian Methods

2023-05 · 6 min · 1220 words

Unlocking the Black Box: The Theory Behind Chain-of-Thought in LLMs

2023-05 · 12 min · 2359 words

Beyond Fine-Tuning: A Deep Dive into Task Arithmetic and Weight Disentanglement

2023-05 · 6 min · 1167 words

April ²

Why Random Slices Are the Best Way to Explain Your Clusters

2023-04 · 6 min · 1221 words

From v1 to v8 and Beyond: The Complete Story of YOLO

2023-04 · 7 min · 1477 words

March ³

When Your Model Meets the Real World — A Deep Dive into Test‑Time Adaptation

2023-03 · 13 min · 2622 words

Beyond Trial and Error: How LLM Agents Can Learn by Talking to Themselves

2023-03 · 7 min · 1298 words

Zero-1-to-3: How AI Can Imagine a 3D Object from a Single Photo

2023-03 · 7 min · 1284 words

February ²

EvoPrompting: How to Evolve Language Models into Expert AI Architects

2023-02 · 6 min · 1217 words

TPVFormer: Reconstructing a 3D World from 2D Snapshots with Tri-Perspective View

2023-02 · 6 min · 1252 words

January ¹

Learning to Learn: A Deep Dive into Meta‑Reinforcement Learning

2023-01 · 11 min · 2132 words

2022 ¹⁰

December ¹

Hungry Hippos on the Pile: A New Challenger to the Transformer Throne

2022-12 · 7 min · 1460 words

November ²

From Schrödinger’s Bridge to Neural Nets: A New End-to-End Solver for Entropic Optimal Transport

2022-11 · 6 min · 1249 words

Rethinking Neural Network Design: A Deep Dive into Gradient Path Analysis

2022-11 · 7 min · 1344 words

October ¹

NeRF, Gaussian Splatting, and Beyond: A Guided Tour of Neural Radiance Fields

2022-10 · 12 min · 2455 words

August ¹

Beyond the Gaps: A Deep Dive into SSSD for Time Series Imputation and Forecasting

2022-08 · 6 min · 1223 words

June ¹

S4, But Simpler: How Diagonal State Space Models (S4D) Match Performance with Less Complexity

2022-06 · 7 min · 1335 words

May ¹

FlashAttention: Is IO-Awareness the Key to Unlocking Long-Context Transformers?

2022-05 · 6 min · 1271 words

February ²

Learning to Learn: How Self-Modifying Networks Unlock True AI Adaptability

2022-02 · 8 min · 1672 words

SASHIMI: Slicing Through Raw Audio with State-Space Models

2022-02 · 2 min · 410 words

January ¹

Making Every Pixel Count: A Deep Dive into Efficient Non-Local Contrastive Attention

2022-01 · 6 min · 1136 words

2021 ⁸

November ¹

Teaching Machines to Describe Videos: A Deep Dive into SWINBERT

2021-11 · 6 min · 1103 words

October ¹

The Swiss Army Knife of Sequence Models: A Deep Dive into Linear State-Space Layers

2021-10 · 6 min · 1273 words

September ¹

Just Tell the Model What to Do: How Instruction Tuning Unlocks Zero-Shot Learning

2021-09 · 10 min · 1924 words

July ¹

Inside Codex: The AI Pair Programmer That Powers GitHub Copilot

2021-07 · 6 min · 1215 words

June ²

Decision Transformer: When Language Models Learn to Play Games

2021-06 · 7 min · 1435 words

LoRA: Fine-Tune Giant AI Models with 10,000× Fewer Parameters

2021-06 · 5 min · 1050 words

February ¹

Find Top Neural Networks in Hours, Not Days: A Deep Dive into Training-Free NAS

2021-02 · 7 min · 1324 words

January ¹

The Switch Transformer: A Trillion-Parameter AI Model that’s Surprisingly Efficient

2021-01 · 7 min · 1476 words

2020 ¹⁵

December ¹

SpAtten: Making Transformers Spartan by Pruning Redundant Language

2020-12 · 7 min · 1423 words

October ¹

LEAStereo – How AI Learned to Design State-of-the-Art 3D Vision Models

2020-10 · 8 min · 1563 words

June ⁴

Making Transformers Fly - A Deep Dive into Linear Attention

2020-06 · 8 min · 1560 words

Finding Top Neural Networks in Seconds—Without a Single Training Step

2020-06 · 6 min · 1231 words

From Noise to High-Fidelity Images — A Deep Dive into Denoising Diffusion Models

2020-06 · 6 min · 1128 words

Learning from the Past: How Conservative Q-Learning Unlocks Offline Reinforcement Learning

2020-06 · 6 min · 1185 words

April ³

Learning from Pixels Just Got a Lot Faster: A Deep Dive into CURL

2020-04 · 6 min · 1141 words

Beyond Online Training: Introducing D4RL for Real-World Offline Reinforcement Learning

2020-04 · 7 min · 1291 words

YOLOv4: Real-Time Object Detection That Breaks the Speed-Accuracy Trade-Off

2020-04 · 5 min · 1039 words

March ²

Taming the Quadratic Beast — How Routing Transformers Scale to Massive Sequences

2020-03 · 7 min · 1342 words

BigNAS: Train Once, Deploy Anywhere with Single-Stage Neural Architecture Search

2020-03 · 7 min · 1421 words

February ¹

Backpropamine: Teaching Neural Networks to Rewire Themselves

2020-02 · 8 min · 1526 words

January ³

Cracking the Code of One-Shot NAS: A Deep Dive into the NAS-Bench-1Shot1 Benchmark

2020-01 · 7 min · 1433 words

A Fair Playground for Neural Networks: A Deep Dive into NAS-Bench-201

2020-01 · 6 min · 1254 words

More is Different — The Surprising Predictability of Language Model Performance

2020-01 · 8 min · 1511 words

2019 ⁴

November ¹

Can You Teach an Old Model New Tricks? A Deep Dive into Transfer Learning

2019-11 · 11 min · 2233 words

October ¹

ZeRO to Trillion: A Deep Dive into the Memory Optimizations Behind Massive AI Models

2019-10 · 7 min · 1324 words

September ²

Don’t Just Test — Train! Adapting to New Data on the Fly with Self-Supervision

2019-09 · 8 min · 1587 words

Megatron-LM: Scaling Language Models to Billions of Parameters with Elegant PyTorch Parallelism

2019-09 · 6 min · 1186 words

2018 ⁴

December ¹

ProxylessNAS: Searching for Optimal Neural Networks Directly on Your Target Hardware

2018-12 · 7 min · 1461 words

May ¹

Beyond Flipping and Cropping: How AutoAugment Teaches AI to Augment Its Own Data

2018-05 · 7 min · 1314 words

April ¹

YOLOv3: Engineering Excellence Through Incremental Improvements

2018-04 · 6 min · 1083 words

February ¹

ENAS: Making Neural Architecture Search 1000x Faster

2018-02 · 7 min · 1488 words

2017 ⁵

December ¹

PNAS: How to Find Top-Performing Neural Networks Without Breaking the Bank

2017-12 · 6 min · 1161 words

October ¹

Beyond ReLU: How Automated Search Discovered the Swish Activation Function

2017-10 · 6 min · 1249 words

June ¹

Dissecting the Transformer: The Paper That Revolutionized NLP

2017-06 · 7 min · 1466 words

May ¹

From Pixels to Picasso: A Deep Dive into Neural Style Transfer

2017-05 · 5 min · 1050 words

March ¹

Beyond Bounding Boxes: A Deep Dive into Mask R-CNN

2017-03 · 8 min · 1498 words

2016 ⁴

December ¹

YOLO9000: The Real-Time Detector That Recognizes 9,000 Object Categories

2016-12 · 6 min · 1242 words

November ²

How to Train an AI to Design Other AIs: A Deep Dive into Neural Architecture Search

2016-11 · 7 min · 1301 words

ResNeXt: Adding a New Dimension to Deep Neural Network Design

2016-11 · 6 min · 1189 words

June ¹

Beyond the Slice: How V-Net Revolutionized 3D Medical Image Segmentation

2016-06 · 7 min · 1291 words

2015 ¹¹

December ²

Why Your RNNs Overfit—and How to Fix It with Bayesian Dropout

2015-12 · 7 min · 1427 words

Smarter, Not Harder: How Google’s Inception V2 and V3 Rethought Deep Learning Architecture

2015-12 · 6 min · 1241 words

November ¹

DCGANs Explained: Unlocking the Power of Unsupervised Learning with Generative AI

2015-11 · 6 min · 1252 words

August ¹

Content vs. Style: The Algorithm That Taught Computers to Paint Like van Gogh

2015-08 · 7 min · 1348 words

June ⁴

Opening the Black Box: How LSTMs Learn Long-Range Dependencies

2015-06 · 6 min · 1236 words

Faster R-CNN: The Breakthrough That Made Real-Time Object Detection Possible

2015-06 · 6 min · 1123 words

YOLO: The Revolution That Made Computer Vision See in Real-Time

2015-06 · 7 min · 1314 words

YOLO: The Model That Changed Object Detection with a Single Glance

2015-06 · 6 min · 1204 words

May ¹

U-Net: The Architecture That Made Deep Learning Work With Tiny Datasets

2015-05 · 6 min · 1115 words

March ¹

The Ultimate LSTM Showdown: A Deep Dive into ‘A Search Space Odyssey’

2015-03 · 8 min · 1556 words

February ¹

Rethinking Deep RNNs: The Power of Gated Feedback Connections

2015-02 · 6 min · 1200 words

2014 ⁷

December ²

LSTM vs. GRU: The Battle of Gated RNNs

2014-12 · 7 min · 1352 words

Why Adam Became Deep Learning’s Go-To Optimizer

2014-12 · 5 min · 1024 words
