[Rethinking the Evaluation of In-Context Learning for LLMs 🔗](https://aclanthology.org/2024.emnlp-main.779.pdf)

The Hidden Cost of Prompting: Why We Need a New Standard for In-Context Learning

If you have ever played around with Large Language Models (LLMs) like GPT-4 or Llama, you have likely encountered In-Context Learning (ICL). It is the fascinating ability of these models to learn a new task simply by seeing a few examples in the prompt, without any gradient updates or weight changes. For instance, if you want a model to classify movie reviews, you might provide three examples of reviews and their sentiment (Positive/Negative) before asking it to classify a fourth one. This process seems magical and, crucially, it seems “free” compared to fine-tuning a model. ...
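A minimal sketch of what that looks like in practice, since the post builds on this setup: the prompt simply stacks labeled demonstrations before the query. The `build_icl_prompt` helper and the example reviews below are illustrative assumptions, not code from the paper.

```python
# Illustrative sketch of in-context learning for sentiment classification.
# The model sees the task only through demonstrations in the prompt;
# no gradient updates or weight changes are involved.

def build_icl_prompt(demonstrations, query):
    """Assemble a few-shot prompt from (review, label) pairs plus a query."""
    lines = []
    for review, label in demonstrations:
        lines.append(f"Review: {review}\nSentiment: {label}\n")
    lines.append(f"Review: {query}\nSentiment:")
    return "\n".join(lines)

demos = [
    ("A moving story with brilliant acting.", "Positive"),
    ("Two hours of my life I will never get back.", "Negative"),
    ("The soundtrack alone is worth the ticket.", "Positive"),
]

prompt = build_icl_prompt(demos, "Flat characters and a predictable plot.")
print(prompt)  # send this string to any instruction-following LLM
```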

7 min · 1348 words
[Rethinking Token Reduction for State Space Models 🔗](https://arxiv.org/abs/2410.14725)

Why Transformer Optimizations Fail on Mamba (and How to Fix It)

The landscape of sequence modeling is shifting. For years, the Transformer architecture has reigned supreme, driving the revolution in Large Language Models (LLMs). However, a new contender has emerged: State Space Models (SSMs), most notably the Mamba architecture. Mamba has generated significant excitement because it solves the Transformer’s biggest bottleneck: the quadratic computational cost of attention. Mamba scales linearly with sequence length, making it a potential “Transformer killer” for processing massive contexts. Yet scaling Mamba to billions of parameters still presents a massive computational challenge. To deploy these models in real-world applications, we need to make them more efficient. ...

2024-10 · 8 min · 1626 words
[Rethinking Pruning Large Language Models: Benefits and Pitfalls of Reconstruction Error Minimization 🔗](https://arxiv.org/abs/2406.15524)

The Pruning Paradox: Why Minimizing Error in LLMs Can Backfire

Large Language Models (LLMs) like LLaMA and GPT have revolutionized artificial intelligence, but they come with a heavy price tag: computational cost. Running these massive models requires significant memory and energy, creating a barrier to entry for many researchers and developers. To solve this, the field has turned to Neural Network Pruning—the art of removing parameters (weights) from a model to make it smaller and faster without losing too much intelligence. The standard approach is to treat pruning as a math problem: remove weights in a way that minimizes the difference between the original “dense” model and the new “sparse” model. This difference is called reconstruction error. ...
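To make "reconstruction error" concrete, here is a minimal NumPy sketch of the layer-wise version: prune a weight matrix, then measure how much the layer's output on calibration inputs changes. The magnitude-pruning rule and variable names are illustrative assumptions, not the specific procedure analyzed in the paper.

```python
# Minimal sketch of layer-wise reconstruction error for pruning.
# W is a dense weight matrix, X a batch of calibration inputs; the
# 50% magnitude-pruning mask here is illustrative only.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(256, 512))   # dense layer weights
X = rng.normal(size=(512, 64))    # calibration inputs (features x samples)

# Zero out the 50% smallest-magnitude weights.
threshold = np.quantile(np.abs(W), 0.5)
W_sparse = np.where(np.abs(W) >= threshold, W, 0.0)

# Reconstruction error: squared difference between dense and sparse layer outputs.
reconstruction_error = np.linalg.norm(W @ X - W_sparse @ X) ** 2
print(f"Sparsity: {(W_sparse == 0).mean():.2%}, error: {reconstruction_error:.2f}")
```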

2024-06 · 8 min · 1507 words
[Rethinking Pragmatics in Large Language Models: Towards Open-Ended Evaluation and Preference Tuning 🔗](https://aclanthology.org/2024.emnlp-main.1258.pdf)

Teaching AI Subtlety: Why Multiple Choice Fails Social Reasoning

Imagine you are sitting in a room with a window open. A friend walks in, shivers slightly, and says, “It’s chilly in here.” If you are a literal thinker, you might simply agree: “Yes, the temperature is low.” But if you have social-pragmatic awareness, you understand the implicature: your friend wants you to close the window. This gap between literal meaning and intended meaning is the domain of Pragmatics. For humans, navigating these social nuances—implicatures, irony, humor, and metaphors—is intuitive. For Large Language Models (LLMs), it is notoriously difficult. While LLMs have mastered syntax and semantics, they often struggle to grasp the “unsaid” rules of human interaction. ...

7 min · 1454 words
[Resampled Datasets Are Not Enough: Mitigating Societal Bias Beyond Single Attributes 🔗](https://arxiv.org/abs/2407.03623)

Can We Fix AI Bias by Hallucinating Better Data? A Deep Dive into Synthetic Dataset Generation

Introduction: The Motorcycle Problem. Imagine showing an AI model a picture of a person riding a motorcycle. You ask the model to describe what it sees. It replies: “A man riding a motorcycle.” Now, imagine that the rider is actually a woman. Why did the AI get it wrong? The answer lies in spurious correlations. In the vast datasets used to train these models, motorcycles appear significantly more often with men than with women. The model stops looking at the person and starts relying on the context: if there is a motorcycle, the model bets it’s a man. ...

2024-07 · 9 min · 1766 words
[Representational Analysis of Binding in Language Models 🔗](https://arxiv.org/abs/2409.05448)

Geometry of Thought: How Language Models Use 'Ordering Subspaces' to Track Entities

Introduction: The “Coffee in Box Z” Problem. Imagine you are given a logic puzzle: “The coffee is in Box Z, the stone is in Box M, the map is in Box H. What does Box Z contain?” For a human, this is trivial. You scan the sentence, find “Box Z,” look at what is associated with it (“coffee”), and give the answer. In cognitive science and linguistics, this process is known as binding. You are binding an entity (Box Z) to an attribute (coffee). ...

2024-09 · 9 min · 1738 words
[Repairs in a Block World: A New Benchmark for Handling User Corrections with Multi-Modal Language Models 🔗](https://arxiv.org/abs/2409.14247)

Oops, Not That One! Teaching Vision-Language Models to Handle Human Corrections

Imagine you are cooking with a robot assistant. You ask it to “pass the large bowl.” The robot reaches for a colander. You immediately say, “No, the ceramic one on the left.” The robot pauses, processes your correction, and successfully hands you the mixing bowl. This interaction seems trivial for humans. We constantly negotiate meaning in conversation. If we misunderstand something, we fix it and move on. However, for Artificial Intelligence—specifically Vision-Language Models (VLMs)—this process is incredibly difficult. Most current AI benchmarks focus on getting things right the first time based on a single instruction. But what happens when the AI gets it wrong? Can it recover? ...

2024-09 · 7 min · 1460 words
[RepMatch: Quantifying Cross-Instance Similarities in Representation Space 🔗](https://arxiv.org/abs/2410.09642)

Seeing Data Through the Model's Eyes: An Introduction to RepMatch

In the world of machine learning, the mantra “data is fuel” has become a cliché, but it remains fundamentally true. The characteristics of a training dataset—its quality, diversity, and hidden biases—dictate the capabilities of the final model. However, analyzing this “fuel” is notoriously difficult. Currently, if a data scientist wants to understand their training data, they often look at attributes like “difficulty” (how hard is this sample to learn?) or “noisiness.” While useful, these metrics usually focus on individual instances in isolation. They fail to answer broader questions: How similar is Dataset A to Dataset B? Does this specific subset of 100 examples represent the knowledge of the entire dataset? ...

2024-10 · 8 min · 1685 words
[RepEval: Effective Text Evaluation with LLM Representation 🔗](https://arxiv.org/abs/2404.19563)

Look Inside the Model: Why LLM Hidden States Are Better Judges Than the LLMs Themselves

In the rapidly evolving landscape of Large Language Models (LLMs), generating text is only half the battle. The other half—and arguably the harder half—is evaluating that text. How do we know if a response is harmful, helpful, fluent, or consistent? Traditionally, we relied on metrics like BLEU or ROUGE, which simply count word overlaps between a model’s output and a human reference. But these metrics are rigid; they fail to capture nuance or semantic meaning. Recently, the industry has shifted toward “LLM-as-a-Judge,” where we ask a powerful model like GPT-4 to score a response. While effective, this approach is incredibly expensive, slow, and relies heavily on the model’s ability to articulate a critique. ...
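To see why overlap counting is rigid, here is a toy unigram-precision score, a heavily simplified stand-in for BLEU/ROUGE (no higher-order n-grams, no brevity penalty, all strings invented for illustration): a correct paraphrase scores near zero while a contradiction scores high.

```python
# Toy unigram-overlap score showing why surface metrics miss meaning.
from collections import Counter

def unigram_precision(candidate: str, reference: str) -> float:
    cand, ref = Counter(candidate.lower().split()), Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in cand)
    return overlap / max(sum(cand.values()), 1)

reference = "the medication should be taken after meals"
paraphrase = "take this drug once you have eaten"                 # same meaning, few shared words
contradiction = "the medication should not be taken after meals"  # opposite meaning, many shared words

print(unigram_precision(paraphrase, reference))     # low score despite being correct
print(unigram_precision(contradiction, reference))  # high score despite being wrong
```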

2024-04 · 8 min · 1614 words
[Relevance Is a Guiding Light: Relevance-aware Adaptive Learning for End-to-end Task-oriented Dialogue System 🔗](https://aclanthology.org/2024.emnlp-main.309.pdf)

Solving the Distraction Problem: How Relevance-Aware Adaptive Learning Improves Task-Oriented Dialogue

Imagine you are asking a digital assistant to book a hotel. You specify “The Grand Budapest” in the “East” district. The bot replies confidently, “I have booked a room at The Grand Budapest.” But when you ask for the address, it gives you the location of a different hotel, simply because that other hotel is also in the East district and has a similar price range. In the world of Natural Language Processing (NLP), this is a classic case of a Task-Oriented Dialogue (TOD) system failing due to “distractive attributes.” The system retrieved the wrong knowledge entity because it looked deceptively similar to the right one. ...

8 min · 1550 words
[Related Work and Citation Text Generation: A Survey 🔗](https://arxiv.org/abs/2404.11588)

Automating the Literature Review: Can AI Write Your Related Work Section?

Every researcher knows the feeling. You have a brilliant idea, you’ve run the experiments, and you’ve drafted the core methodology. Then, you hit the wall: the Related Work Section (RWS). To write a good RWS, you cannot simply list papers that sound similar to yours. You must craft a coherent narrative. You have to explain the history of the problem, group existing solutions by their approach, point out their flaws, and seamlessly transition into how your work fills the gap. It is a task that requires deep domain expertise, high-level synthesis skills, and the time to read hundreds of papers. ...

2024-04 · 9 min · 1814 words
[Red Teaming Language Models for Processing Contradictory Dialogues 🔗](https://arxiv.org/abs/2405.10128)

When AI Changes Its Mind: Solving Self-Contradiction in Dialogues with Red Teaming

Imagine you are texting a friend for dinner recommendations. They tell you, “I absolutely hate spicy food; I can’t handle it at all.” You agree to go to a mild Italian place. Then, five minutes later, they text, “Actually, let’s get Indian, I eat spicy curry every single day.” You would probably be confused. You might scroll up to check if you misread the first message. You might ask them, “Wait, didn’t you just say you hate spice?” ...

2024-05 · 8 min · 1603 words
[Recurrent Alignment with Hard Attention for Hierarchical Text Rating 🔗](https://arxiv.org/abs/2402.08874)

Can LLMs Grade Papers? Introducing Recurrent Alignment and Hard Attention

Large Language Models (LLMs) like GPT-4 and Llama have revolutionized how we interact with text. They can write poetry, summarize emails, and even code. However, when you ask an LLM to perform a task that requires analyzing a complex, structured document—like an academic paper with dozens of citations—and assign it a specific numerical rating (such as a “disruption score”), the model often falters. The struggle stems from two main issues: structure and precision. First, standard LLMs read text linearly, but real-world documents are often hierarchical (trees of information). Second, LLMs are probabilistic text generators, not calculators; they struggle to output precise, continuous numerical values directly. ...

2024-02 · 8 min · 1510 words
[Reconstruct Your Previous Conversations! Comprehensively Investigating Privacy Leakage Risks in Conversations with GPT Models 🔗](https://arxiv.org/abs/2402.02987)

Your Chat History Isn't Safe: How Attackers Reconstruct Private Conversations from LLMs

In the rapidly evolving landscape of Large Language Models (LLMs), we have moved beyond simple Q&A sessions. Users now engage in complex, multi-round conversations to refine code, write stories, or analyze data. Even more significantly, OpenAI’s introduction of “Custom GPTs” allows users to upload private data or have preparatory conversations to “prime” a bot for specific tasks. Ideally, these interactions are private. We assume that the context of our conversation—the “state” of the chat—is invisible to the outside world. ...

2024-02 · 7 min · 1457 words
[Reconsidering Sentence-Level Sign Language Translation 🔗](https://arxiv.org/abs/2406.11049)

Why Context is King—Re-evaluating How We Teach Machines to Translate Sign Language

Imagine you are watching a movie, but instead of seeing the whole film, you are shown random, five-second clips in a shuffled order. In one clip, a character points to their left and laughs. In the next, they form a specific handshape and move it rapidly through the air. If you were asked to translate exactly what those actions meant, could you do it? Likely not. Without knowing who is standing to the left, or what object was established in the scene prior, you are guessing. ...

2024-06 · 9 min · 1771 words
[Rebuilding ROME : Resolving Model Collapse during Sequential Model Editing 🔗](https://arxiv.org/abs/2403.07175)

Rebuilding ROME: How a Single Line of Code Fixed Model Collapse in LLMs

Large Language Models (LLMs) suffer from a critical limitation: they are frozen in time. Once trained, their knowledge is static. If the President of the United States changes, or if a new scientific discovery corrects a previous theory, the model remains ignorant until it undergoes expensive retraining or fine-tuning. To solve this, researchers developed Model Editing—techniques to surgically update specific facts inside a model without retraining the whole network. One of the most popular methods is ROME (Rank-One Model Editing). It has been hailed as a breakthrough for its ability to locate and edit specific factual associations. ...

2024-03 · 7 min · 1477 words
[Reasoning or a Semblance of it? A Diagnostic Study of Transitive Reasoning in LLMs 🔗](https://arxiv.org/abs/2410.20200)

The Illusion of Intelligence: Deconstructing Transitive Reasoning in Large Language Models

When we interact with Large Language Models (LLMs) like GPT-4 or LLaMA, it is easy to be seduced by their apparent intelligence. You ask a complex multi-step question, and the model produces a coherent, logical answer. It feels like thinking. But under the hood, is the model actually reasoning? Or is it simply engaging in a sophisticated form of pattern matching, stitching together cues from your prompt to hallucinate a logical structure? ...

2024-10 · 8 min · 1606 words
[Reasoning in Token Economies: Budget-Aware Evaluation of LLM Reasoning Strategies 🔗](https://arxiv.org/abs/2406.06461)

Is Your LLM Smarter or Just Richer? A Budget-Aware Look at Reasoning Strategies

In the rapidly evolving landscape of Large Language Models (LLMs), a new “reasoning strategy” seems to drop every week. We’ve moved far beyond simple prompts. We now have agents that debate each other, algorithms that build “trees” of thoughts, and systems that reflect on their own errors to self-correct. Papers introducing these methods often show impressive tables where their new, complex strategy dominates the leaderboard, leaving simple prompting techniques in the dust. But there is a catch—a hidden variable that is often overlooked in academic comparisons: Compute Budget. ...

2024-06 · 10 min · 1945 words
[Reasoning Paths with Reference Objects Elicit Quantitative Spatial Reasoning in Large Vision-Language Models 🔗](https://arxiv.org/abs/2409.09788)

Can AI Measure the World? How Reference Objects Unlock Spatial Reasoning in VLMs

Imagine you are looking at a photograph of a room you’ve never visited. Someone asks, “Will that couch fit through the doorway?” Even though you don’t have a tape measure, you can probably make a very good guess. You intuitively know that a standard door is about 80 inches high, and using that mental “ruler,” you estimate the size of the couch. This ability to use context clues to measure the world is second nature to humans. ...

2024-09 · 8 min · 1605 words
[RealVul: Can We Detect Vulnerabilities in Web Applications with LLM? 🔗](https://arxiv.org/abs/2410.07573)

RealVul: A New Era for PHP Vulnerability Detection using Large Language Models

If you are studying software security or machine learning, you have likely noticed the explosion of interest in Large Language Models (LLMs). We know LLMs can write code, explain algorithms, and even translate languages. But can they act as security auditors? Can they look at a piece of code and tell you, “Hey, there’s a dangerous SQL injection right here”? The short answer is yes, but with major caveats. While there has been significant research into using Deep Learning for vulnerability detection in languages like C and C++, the web’s most dominant language—PHP—has been largely left behind. This is a critical gap. PHP powers nearly 80% of the top ten million websites, including giants like WordPress and Wikipedia. ...

2024-10 · 9 min · 1785 words