[NEURAL ARCHITECTURE SEARCH ON IMAGENET IN FOUR GPU HOURS: A THEORETICALLY INSPIRED PERSPECTIVE 🔗](https://arxiv.org/abs/2102.11535)

Find Top Neural Networks in Hours, Not Days: A Deep Dive into Training-Free NAS

Neural Architecture Search (NAS) is one of the most exciting frontiers in deep learning. Its promise is simple yet profound: to automatically design the best possible neural network for a given task, freeing humans from the tedious and often intuition-driven process of manual architecture design. But this promise has always come with a hefty price tag—traditional NAS methods can consume thousands of GPU-hours, scouring vast search spaces by training and evaluating countless candidate architectures. This immense computational cost has limited NAS to a handful of well-funded research labs. ...

2021-02 · 7 min · 1324 words
[Hierarchical Neural Architecture Search for Deep Stereo Matching 🔗](https://arxiv.org/abs/2010.13501)

LEAStereo – How AI Learned to Design State-of-the-Art 3D Vision Models

For decades, getting computers to see the world in 3D like humans do has been a central goal of computer vision. This capability—stereo vision—powers self-driving cars navigating complex streets, robots grasping objects with precision, and augmented reality systems blending virtual objects seamlessly into our surroundings. At its core, stereo vision solves a seemingly simple problem: given two images of the same scene taken from slightly different angles (like our two eyes), can we calculate the depth of everything in the scene? ...
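The geometry behind that question fits in a few lines: under the usual rectified pinhole stereo model, depth follows from disparity as Z = f·B/d. A quick sketch with illustrative numbers (roughly a KITTI-style rig, not values from the paper):

```python
# Depth from disparity under the standard rectified pinhole-stereo model.
# The focal length and baseline below are illustrative, not from the paper.

def depth_from_disparity(disparity_px: float, focal_px: float, baseline_m: float) -> float:
    """Z = f * B / d: depth is inversely proportional to disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a visible point")
    return focal_px * baseline_m / disparity_px

# A point that shifts 64 px between views of a rig with a 720 px focal length
# and a 0.54 m baseline lies about 6 m away.
print(depth_from_disparity(64.0, 720.0, 0.54))  # ≈ 6.08
```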

2020-10 · 8 min · 1563 words
[BigNAS: Scaling Up Neural Architecture Search with Big Single-Stage Models 🔗](https://arxiv.org/abs/2003.11142)

BigNAS: Train Once, Deploy Anywhere with Single-Stage Neural Architecture Search

Deploying machine learning models in the real world is a messy business. The perfect model for a high-end cloud GPU might be a terrible fit for a smartphone, which in turn is overkill for a tiny microcontroller. Each device has its own unique constraints — on latency, memory, and power — and this diversity has sparked rapid growth in Neural Architecture Search (NAS), a field dedicated to automatically designing neural networks tailored for specific hardware. ...

2020-03 · 7 min · 1421 words
[Neural Architecture Search without Training 🔗](https://arxiv.org/abs/2006.04647)

Finding Top Neural Networks in Seconds—Without a Single Training Step

Designing a high-performing neural network has long been part art, part science, and a whole lot of trial and error. For years, the best deep learning models were forged through immense human effort, intuition, and countless hours of GPU-powered experimentation. This manual design process is a significant bottleneck—one that sparked the rise of an exciting field: Neural Architecture Search (NAS). The goal of NAS is straightforward: automate the design of neural networks. Instead of a human painstakingly choosing layers, connections, and operations, a NAS algorithm explores a vast space of possible architectures to find the best one for a given task. Early NAS methods were revolutionary, discovering state-of-the-art models like NASNet. But they came with staggering computational costs. The original NAS paper required 800 GPUs running for 28 days straight—over 60 GPU-years—for a single search. ...
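To make the headline less mysterious, here is a rough sketch in the spirit of the paper's training-free score: push one minibatch through an untrained ReLU network, record each input's binary activation pattern, and score the architecture by the log-determinant of the pattern-similarity kernel. The toy random MLP, widths, and batch size below are stand-ins, not the paper's search space:

```python
# A sketch of training-free scoring in the spirit of NASWOT; the random MLP,
# widths, and batch size are illustrative stand-ins, not the paper's setup.
import numpy as np

rng = np.random.default_rng(0)

def relu_codes(x, widths):
    """Untrained random MLP; returns each input's concatenated ReLU on/off code."""
    codes, h = [], x
    for w in widths:
        h = h @ rng.standard_normal((h.shape[1], w)) / np.sqrt(h.shape[1])
        codes.append(h > 0)                       # binary activation pattern
        h = np.maximum(h, 0)
    return np.concatenate(codes, axis=1)

def naswot_score(x, widths):
    c = relu_codes(x, widths)
    hamming = (c[:, None, :] != c[None, :, :]).sum(-1)    # pairwise Hamming distances
    k = (c.shape[1] - hamming).astype(np.float64)         # similarity kernel K_H
    return np.linalg.slogdet(k)[1]                        # score = log |K_H|

batch = rng.standard_normal((32, 64))                     # one minibatch of inputs
print(naswot_score(batch, widths=[128, 128, 128]))        # higher ≈ more promising
```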

2020-06 · 6 min · 1231 words
[NAS-BENCH-201: EXTENDING THE SCOPE OF REPRODUCIBLE NEURAL ARCHITECTURE SEARCH 🔗](https://arxiv.org/abs/2001.00326)

A Fair Playground for Neural Networks: A Deep Dive into NAS-Bench-201

Neural Architecture Search (NAS) has transformed the way we design deep learning models. Instead of relying solely on human intuition and years of experience, NAS algorithms can automatically discover powerful and efficient network architectures — often surpassing their hand-crafted predecessors. This paradigm shift has sparked an explosion of new NAS methods, spanning reinforcement learning, evolutionary strategies, and differentiable optimization. But this rapid progress comes with a hidden cost: a crisis of comparability. ...

2020-01 · 6 min · 1254 words
[Progressive Neural Architecture Search 🔗](https://arxiv.org/abs/1712.00559)

PNAS: How to Find Top-Performing Neural Networks Without Breaking the Bank

Designing the architecture of a neural network has long been considered a dark art — a blend of intuition, experience, and trial-and-error. But what if we could automate this process? What if an AI could design an even better AI? This is the promise of Neural Architecture Search (NAS), a field that has produced some of the best-performing models in computer vision. However, this power has historically come at a staggering cost. Early state-of-the-art methods like Google’s NASNet required enormous computational resources — training and evaluating 20,000 different architectures on 500 high-end GPUs over four days. Such requirements put NAS far beyond the reach of most researchers or organizations without access to a massive data center. ...

2017-12 · 6 min · 1161 words
[ProxylessNAS: Direct Neural Architecture Search on Target Task and Hardware 🔗](https://arxiv.org/abs/1812.00332)

ProxylessNAS: Searching for Optimal Neural Networks Directly on Your Target Hardware

Neural Architecture Search (NAS) is one of the most exciting frontiers in deep learning. Imagine an algorithm that can automatically design a state-of-the-art neural network for you—perfectly tailored to your specific task. The promise of NAS is to replace the tedious, intuition-driven process of manual network design with a principled, automated search. For years, however, this promise came with a colossal price tag. Early NAS methods required tens of thousands of GPU hours to discover a single architecture—a cost so prohibitive that it was out of reach for most researchers and engineers. To make NAS feasible, the community developed a clever workaround: instead of searching directly on the massive target task (like ImageNet), researchers would search on a smaller, more manageable proxy task—such as using CIFAR-10 instead of ImageNet, training for fewer epochs, or searching for a single reusable block rather than an entire network. ...

2018-12 · 7 min · 1461 words
[Efficient Neural Architecture Search via Parameter Sharing 🔗](https://arxiv.org/abs/1802.03268)

ENAS: Making Neural Architecture Search 1000x Faster

Designing a high-performing neural network is often described as a dark art. It requires deep expertise, intuition, and a whole lot of trial and error. What if we could automate this process? This is the promise of Neural Architecture Search (NAS), a field that aims to automatically discover the best network architecture for a given task. The original NAS paper by Zoph & Le (2017) was a landmark achievement. It used reinforcement learning to discover state-of-the-art architectures for image classification and language modeling, surpassing designs created by human experts. But it came with a colossal price tag: the search process required hundreds of GPUs running for several days. For example, NASNet (Zoph et al., 2018) used 450 GPUs for 3–4 days. This level of computational resources is simply out of reach for most researchers, students, and companies. ...

2018-02 · 7 min · 1488 words
[G-NAS: Generalizable Neural Architecture Search for Single Domain Generalization Object Detection 🔗](https://arxiv.org/abs/2402.04672)

Beyond the Sunny Day: How G-NAS Teaches Object Detectors to See in the Dark

Imagine an autonomous car, its AI trained on thousands of hours of footage from bright, sunny California days. It can spot pedestrians, cars, and cyclists with incredible accuracy. Now, transport that same car to a foggy London morning, a rainy dusk in Seattle, or a dimly lit street in Tokyo at midnight. Will it still perform flawlessly? This is the crux of one of the biggest challenges in modern computer vision: domain generalization. Models trained in one specific environment (a “domain”) often fail dramatically when deployed in a new, unseen one. The problem is even harder when you only have data from a single source domain to learn from. This specific, realistic, and tough challenge is called Single Domain Generalization Object Detection (S-DGOD). ...

2024-02 · 6 min · 1185 words
[EvoPrompting: Language Models for Code-Level Neural Architecture Search 🔗](https://arxiv.org/abs/2302.14838)

EvoPrompting: How to Evolve Language Models into Expert AI Architects

Large Language Models (LLMs) like GPT-4 and PaLM have become astonishingly good at writing code. Give them a description, and they can generate a functional script, a web component, or even a complex algorithm. But writing code from a clear specification is one thing—designing something truly novel and high-performing from scratch is another. Can an LLM invent a new, state-of-the-art neural network architecture? If you simply ask an LLM to “design a better neural network,” the results are often underwhelming. The task is too complex, the search space of possible architectures is astronomically vast, and the model lacks a structured way to iterate and improve. This is the challenge that a fascinating new paper, EvoPrompting: Language Models for Code-Level Neural Architecture Search, tackles head-on. ...
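The paper's answer is to give the LLM exactly that structure: an evolutionary outer loop in which the model acts as the crossover and mutation operator over code, while measured fitness drives selection. A rough sketch with stand-in `llm_generate` and `evaluate_accuracy` callables (EvoPrompting additionally soft prompt-tunes the LM on its best outputs, omitted here):

```python
# A sketch of the EvoPrompting-style outer loop: the LLM is the crossover and
# mutation operator over code, and measured fitness drives selection. The stubs
# below are stand-ins; the paper's soft prompt-tuning step is omitted.
import random

def llm_generate(prompt: str) -> str:
    """Stand-in for a code LLM that writes a child architecture from its parents."""
    return prompt.splitlines()[0] + "  # mutated"

def evaluate_accuracy(program: str) -> float:
    """Stand-in fitness; the paper trains each generated model and measures it."""
    return random.random()

def evolve(seeds, generations=5, children_per_gen=8, n_parents=2, pool_size=10):
    population = [(p, evaluate_accuracy(p)) for p in seeds]
    for _ in range(generations):
        for _ in range(children_per_gen):
            parents = random.sample(population, n_parents)
            prompt = "\n\n".join(code for code, _ in parents)  # few-shot parent programs
            child = llm_generate(prompt)
            population.append((child, evaluate_accuracy(child)))
        population = sorted(population, key=lambda p: p[1], reverse=True)[:pool_size]
    return population[0]                                       # best (program, fitness)

print(evolve(["def net_a(): ...", "def net_b(): ..."]))
```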

2023-02 · 6 min · 1217 words
[NEURAL ARCHITECTURE SEARCH WITH REINFORCEMENT LEARNING 🔗](https://arxiv.org/abs/1611.01578)

How to Train an AI to Design Other AIs: A Deep Dive into Neural Architecture Search

Designing a state-of-the-art neural network has often been described as a “dark art.” It requires deep expertise, countless hours of experimentation, and a healthy dose of intuition. From AlexNet and VGGNet to ResNet and DenseNet, each breakthrough architecture has been the product of painstaking human design. But what if we could automate this process? What if, instead of manually designing architectures, we could design an algorithm that learns to design architectures for us? ...

2016-11 · 7 min · 1301 words
[Less is More: Recursive Reasoning with Tiny Networks 🔗](https://arxiv.org/abs/2510.04871)

Less is More: How Tiny Recursive Networks Outsmart Giant AI Models on Complex Puzzles

Large Language Models (LLMs) like GPT-4 and Gemini are computational powerhouses, capable of writing code, composing poetry, and answering a vast range of questions. But for all their might, they have an Achilles’ heel: complex, multi-step reasoning puzzles. Tasks like solving a tricky Sudoku or deciphering the abstract patterns in the ARC-AGI benchmark can cause even the most advanced LLMs to stumble. Their auto-regressive, token-by-token generation process means a single mistake can derail the entire solution, with no easy way to backtrack and correct course. ...

2025-10 · 7 min · 1432 words
[UNSUPERVISED REPRESENTATION LEARNING WITH DEEP CONVOLUTIONAL GENERATIVE ADVERSARIAL NETWORKS 🔗](https://arxiv.org/abs/1511.06434)

DCGANs Explained: Unlocking the Power of Unsupervised Learning with Generative AI

In the world of computer vision, Convolutional Neural Networks (CNNs) have been the undisputed champions for years. Give a CNN enough labeled images of cats and dogs, and it will learn to tell them apart with superhuman accuracy. This is supervised learning, and it has powered modern AI applications from photo tagging to medical imaging. But what happens when you don’t have labels? The internet is overflowing with billions of images, but only a tiny fraction are neatly categorized. This is the challenge of unsupervised learning: can a model learn meaningful, reusable knowledge about the visual world from a massive, messy pile of unlabeled data? ...

2015-11 · 6 min · 1252 words
[Denoising Diffusion Probabilistic Models 🔗](https://arxiv.org/abs/2006.11239)

From Noise to High-Fidelity Images — A Deep Dive into Denoising Diffusion Models

In the last decade, AI has dazzled the world with deep generative models capable of producing realistic images, audio, and text from scratch. We’ve seen Generative Adversarial Networks (GANs) generate lifelike portraits and Variational Autoencoders (VAEs) learn rich latent representations. But in 2020, a paper titled Denoising Diffusion Probabilistic Models from researchers at UC Berkeley reshaped the conversation. This work introduced a class of models, based on ideas from nonequilibrium thermodynamics first explored in 2015, that were shown for the first time to produce exceptionally high-quality images, rivaling — and in some cases surpassing — the best GANs. ...
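The mechanics behind that quality are strikingly simple: corrupt an image with a fixed noise schedule, and train a network to predict the noise that was added. A minimal sketch of the closed-form forward process and the paper's simplified objective, with a stand-in denoiser:

```python
# A minimal sketch of the DDPM forward process and simplified eps-prediction
# loss; the linear schedule (1e-4 to 0.02 over 1000 steps) follows the paper,
# while the denoiser below is a stand-in for the real U-Net.
import torch

T = 1000
betas = torch.linspace(1e-4, 0.02, T)             # the paper's linear noise schedule
alpha_bar = torch.cumprod(1.0 - betas, dim=0)     # cumulative signal fraction

def ddpm_loss(model, x0):
    t = torch.randint(0, T, (x0.shape[0],))
    eps = torch.randn_like(x0)
    a = alpha_bar[t].view(-1, *[1] * (x0.dim() - 1))
    x_t = a.sqrt() * x0 + (1 - a).sqrt() * eps    # closed-form sample of q(x_t | x_0)
    return ((eps - model(x_t, t)) ** 2).mean()    # predict the injected noise

# Usage with a stand-in denoiser (a real model would be a U-Net eps_theta(x_t, t)).
dummy = lambda x_t, t: torch.zeros_like(x_t)
print(ddpm_loss(dummy, torch.randn(8, 3, 32, 32)))  # ≈ 1.0 for the zero predictor
```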

2020-06 · 6 min · 1128 words
[Reflexion: Language Agents with Verbal Reinforcement Learning 🔗](https://arxiv.org/abs/2303.11366)

Beyond Trial and Error: How LLM Agents Can Learn by Talking to Themselves

Large Language Models (LLMs) are breaking out of the chatbot box. We’re increasingly seeing them power autonomous agents that can interact with software, play games, and browse the web to accomplish complex goals. But there’s a catch: when these agents make a mistake, how do they learn not to repeat it? Traditionally, the answer in AI has been Reinforcement Learning (RL)—a process of trial and error where an agent is rewarded for good actions and penalized for bad ones. However, applying traditional RL to massive LLMs is incredibly slow and computationally expensive, often requiring months of training and enormous GPU resources to fine-tune billions of parameters. As a result, most LLM agents today learn only from a handful of carefully designed examples in their prompt. ...
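Reflexion's alternative, sketched below, keeps the weights frozen: after a failed attempt the agent verbalizes what went wrong, stores that note in an episodic memory, and retries with the notes in context. The `llm` and `run_task` callables and the demo are stand-ins, not the paper's API:

```python
# A sketch of the Reflexion loop: no weight updates, just text. The `llm` and
# `run_task` callables and the demo below are stand-ins, not the paper's API.

def reflexion_agent(task: str, llm, run_task, max_trials: int = 3):
    memory: list[str] = []                        # episodic buffer of self-reflections
    for _ in range(max_trials):
        context = "\n".join(memory)               # verbal lessons from earlier trials
        attempt = llm(f"{context}\nTask: {task}\nAttempt:")
        success, feedback = run_task(attempt)     # outcome signal from the environment
        if success:
            return attempt
        memory.append(llm(                        # ask the model to critique itself
            f"Task: {task}\nFailed attempt: {attempt}\nFeedback: {feedback}\n"
            "In a sentence, what should be done differently?"
        ))
    return None

# Trivial demo: the stored "reflection" steers the second attempt to the other door.
def demo_llm(prompt: str) -> str:
    if "what should be done differently" in prompt:
        return "The first door was locked; try the other door."
    return "try the other door" if "other door" in prompt else "open the first door"

demo_task = lambda attempt: ("other" in attempt, "the first door is locked")
print(reflexion_agent("escape the room", demo_llm, demo_task))
```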

2023-03 · 7 min · 1298 words
[CURL: Contrastive Unsupervised Representations for Reinforcement Learning 🔗](https://arxiv.org/abs/2004.04136)

Learning from Pixels Just Got a Lot Faster: A Deep Dive into CURL

Reinforcement Learning (RL) has given us agents that can master complex video games, control simulated robots, and even grasp real-world objects. However, there’s a catch that has long plagued the field: RL is notoriously data-hungry. An agent often needs millions of interactions with its environment to learn a task. In a fast simulation, that’s fine—but in the real world, where a robot arm might take seconds to perform a single action, this can translate to months or years of training. ...

2020-04 · 6 min · 1141 words
[Decision Transformer: Reinforcement Learning via Sequence Modeling 🔗](https://arxiv.org/abs/2106.01345)

Decision Transformer: When Language Models Learn to Play Games

What if you could tackle a complex reinforcement learning problem the same way you’d complete a sentence? This is the radical and powerful idea behind the Decision Transformer—a paper that reframes the entire field of sequential decision-making. For decades, Reinforcement Learning (RL) has been dominated by algorithms that learn value functions and policy gradients, often wrestling with complex issues like temporal credit assignment, bootstrapping instability, and discounting. But what if we could sidestep all of that? ...
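Concretely, the trick is to serialize each trajectory as (return-to-go, state, action) triples and train a causal transformer to predict the action tokens; at test time you seed the sequence with the return you want the agent to achieve. A toy sketch of the data layout, with stand-in states and actions:

```python
# A toy sketch of the Decision Transformer data layout: each timestep becomes a
# (return-to-go, state, action) triple, and a causal transformer is trained to
# predict the action tokens. States and actions here are stand-in integers.
import numpy as np

def returns_to_go(rewards):
    """R_hat_t: the sum of rewards from step t to the end of the episode."""
    return np.cumsum(rewards[::-1])[::-1]

rewards = np.array([0.0, 0.0, 1.0, 0.0, 1.0])
states  = np.arange(5)                     # stand-in state ids
actions = np.arange(5) % 2                 # stand-in action ids

rtg = returns_to_go(rewards)               # [2. 2. 2. 1. 1.]
sequence = [tok for t in range(len(rtg))
            for tok in (("R", rtg[t]), ("s", states[t]), ("a", actions[t]))]
# Training: predict each "a" token from everything before it.
# Test time: seed the sequence with the return you want the agent to achieve.
print(sequence[:6])                        # first two timesteps of the token stream
```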

2021-06 · 7 min · 1435 words
[D4RL: DATASETS FOR DEEP DATA-DRIVEN REINFORCEMENT LEARNING 🔗](https://arxiv.org/abs/2004.07219)

Beyond Online Training: Introducing D4RL for Real-World Offline Reinforcement Learning

The past decade has shown us the incredible power of large datasets. From ImageNet fueling the computer vision revolution to massive text corpora enabling models like GPT, it’s clear: data is the lifeblood of modern machine learning. Yet one of the most exciting fields—Reinforcement Learning (RL)—has largely been excluded from this data-driven paradigm. Traditionally, RL agents learn through active, online interaction with an environment—playing games, controlling robots, simulating trades—building policies through trial and error. This approach is powerful but often impractical, expensive, or dangerous in real-world contexts. We can’t let a self-driving car “explore” by crashing thousands of times or experiment recklessly in healthcare. ...

2020-04 · 7 min · 1291 words
[Conservative Q-Learning for Offline Reinforcement Learning 🔗](https://arxiv.org/abs/2006.04779)

Learning from the Past: How Conservative Q-Learning Unlocks Offline Reinforcement Learning

Imagine training a robot to cook a meal. The traditional approach in Reinforcement Learning (RL) is trial and error. The robot might try picking up an egg — sometimes succeeding, sometimes dropping it and making a mess. After thousands of attempts, it eventually learns. But what if we already have a massive dataset of a human chef cooking? Could the robot learn just by watching, without ever cracking an egg itself? ...

2020-06 · 6 min · 1185 words
[NeRF: Neural Radiance Field in 3D Vision: A Comprehensive Review 🔗](https://arxiv.org/abs/2210.00379)

NeRF, Gaussian Splatting, and Beyond: A Guided Tour of Neural Radiance Fields

In March 2020, “NeRF: Representing Scenes as Neural Radiance Fields for View Synthesis” introduced a deceptively simple idea that reshaped how we think about 3D scene representation. From a set of posed 2D photos, a compact neural network could learn a continuous, view-consistent model of scene appearance and geometry, then synthesize photorealistic novel views. Over the next five years, NeRF inspired a torrent of follow-up work: faster training, better geometry, robust sparse-view methods, generative 3D synthesis, and application-focused systems for urban scenes, human avatars, and SLAM. ...
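At the core of the original recipe is a differentiable volume-rendering quadrature: a field f(x, d) → (color, density) is sampled along each camera ray and alpha-composited into a pixel. A minimal sketch, with a random field standing in for the trained MLP:

```python
# A minimal sketch of NeRF's volume-rendering quadrature: sample a field
# f(x, d) -> (rgb, sigma) along a ray and alpha-composite the samples.
# The random "field" below stands in for the trained MLP.
import numpy as np

rng = np.random.default_rng(0)

def render_ray(field, origin, direction, t_near=0.0, t_far=4.0, n_samples=64):
    t = np.linspace(t_near, t_far, n_samples)
    pts = origin + t[:, None] * direction                  # sample points along the ray
    rgb, sigma = field(pts, direction)                     # query the radiance field
    delta = np.diff(t, append=t_far + 1e10)                # spacing between samples
    alpha = 1.0 - np.exp(-sigma * delta)                   # per-segment opacity
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))  # T_i, light reaching i
    weights = trans * alpha
    return (weights[:, None] * rgb).sum(axis=0)            # composited pixel color

def toy_field(pts, d):
    """Random stand-in for the trained MLP f(x, d) -> (rgb, sigma)."""
    return rng.random((len(pts), 3)), rng.random(len(pts))

print(render_ray(toy_field, np.zeros(3), np.array([0.0, 0.0, 1.0])))
```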

2022-10 · 12 min · 2455 words