Designing the architecture of a neural network has long been considered a dark art — a blend of intuition, experience, and trial-and-error. But what if we could automate this process? What if an AI could design an even better AI? This is the promise of Neural Architecture Search (NAS), a field that has produced some of the best-performing models in computer vision.
However, this power has historically come at a staggering cost. Early state-of-the-art methods like Google’s NASNet required enormous computational resources — training and evaluating 20,000 different architectures on 500 high-end GPUs over four days. Such requirements put NAS far beyond the reach of most researchers or organizations without access to a massive data center.
Enter Progressive Neural Architecture Search (PNAS): a smarter, more efficient way to find high-performing architectures. PNAS can match the accuracy of previous methods while being 5× more efficient in model evaluations and 8× faster in total computation. Instead of blindly searching a massive space of possibilities, PNAS starts simple and gradually increases complexity, using a learned model to guide its way.
In this article, we’ll explore how PNAS works — from its progressive search strategy to its performance predictor — and the results that make it a landmark in automated machine learning.
Background: Why Architecture Search is So Hard
Before PNAS, the dominant NAS approaches were:
Reinforcement Learning (RL):
An RNN-based controller learns a policy to generate sequences describing neural architectures. Each generated architecture is trained, and its validation accuracy is used as a reward to update the controller. Over thousands of iterations, better architectures emerge. This method was famously used in NASNet.
Evolutionary Algorithms (EA):
Architectures are treated like a population of genomes. Top-performing models “reproduce” via mutation (random changes) and crossover (mixing parts of two architectures), evolving toward better solutions.
Both methods are powerful but expensive: they search fully-specified, complex architectures from the start. This means slow feedback and enormous compute costs.
A pivotal idea from NASNet was to search for cells instead of whole CNNs.
A cell is a small network module that can be stacked to form the full CNN — drastically reducing search space and enabling transfer to other datasets.
PNAS adopts this cell-based search space but innovates on the strategy.
The Core Idea: Progressive Neural Architecture Search
PNAS is built on a simple idea:
Start small, grow progressively, and only train what looks promising.
The Search Space: Cells and Blocks
A cell is a directed acyclic graph of B blocks.
Each block:
- Takes two inputs.
- Applies a chosen operation to each input.
- Combines the results via element-wise addition.
Inputs can be:
- Outputs of earlier blocks in the same cell.
- Outputs from the two previous cells in the full CNN.
Operations are chosen from a compact, effective set of 8 options:
- Depthwise-separable convolutions (3×3, 5×5, 7×7).
- Average pooling (3×3).
- Max pooling (3×3).
- 1×7 convolution followed by 7×1 convolution.
- 3×3 dilated convolution.
- Identity.
The CNN is constructed by stacking copies of the learned cell — sometimes with stride 2 for spatial downsampling.
Fig. 1. Left: The best cell structure found by PNAS (PNASNet-5).
Right: CNN construction by stacking the cell for CIFAR-10 and ImageNet.
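To make the stacking concrete, here is a small PyTorch sketch of this construction. The cell itself is a placeholder module (a plain 3×3 conv standing in for the searched cell), and the stack sizes and channel counts are illustrative rather than the paper's exact configuration; only the pattern of repeated cells with occasional stride-2 copies is the point.

```python
import torch
import torch.nn as nn

# Hypothetical sketch of how one learned cell is stacked into a full CNN (cf. Fig. 1, right).
# PlaceholderCell stands in for the searched cell; only the stacking pattern matters here.
class PlaceholderCell(nn.Module):
    def __init__(self, c_in, c_out, stride=1):
        super().__init__()
        self.op = nn.Sequential(nn.Conv2d(c_in, c_out, 3, stride=stride, padding=1),
                                nn.BatchNorm2d(c_out), nn.ReLU())
    def forward(self, x):
        return self.op(x)

def build_network(num_stacks=3, cells_per_stack=2, channels=32, num_classes=10):
    layers, c_in = [], 3
    for s in range(num_stacks):
        for i in range(cells_per_stack):
            stride = 2 if (s > 0 and i == 0) else 1   # stride-2 copy between stacks downsamples
            layers.append(PlaceholderCell(c_in, channels, stride))
            c_in = channels
        channels *= 2                                  # widen after each reduction
    layers += [nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(c_in, num_classes)]
    return nn.Sequential(*layers)

net = build_network()
print(net(torch.randn(1, 3, 32, 32)).shape)   # torch.Size([1, 10])
```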
Even with this simplification, searching all possible 5-block cells yields ~\(10^{12}\) unique structures — far too many for brute-force search.
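Where does that number come from? A quick back-of-the-envelope calculation in Python, following the block structure above: each block picks 2 inputs and 2 of the 8 operations, and gains one more candidate input per block. The raw count lands near \(10^{14}\); the reduction to roughly \(10^{12}\) unique cells after removing symmetric duplicates is the paper's figure and is not computed here.

```python
# Back-of-the-envelope count of 5-block cell structures (symmetry not handled).
NUM_OPS = 8      # the 8 operations listed above
B = 5            # blocks per cell

total = 1
for b in range(1, B + 1):
    num_inputs = (b - 1) + 2                      # earlier blocks + outputs of the two previous cells
    total *= (num_inputs ** 2) * (NUM_OPS ** 2)   # choose 2 inputs and 2 operations

print(f"{total:.1e} raw structures")              # ~5.6e14; ~1e12 unique after removing symmetries
```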
Progressive Search Strategy
PNAS searches level-by-level:
Level 1 (B=1):
Evaluate all possible 1-block cells (136 unique). They train quickly, providing essential initial data.
Level 2 (B=2):
Expand each 1-block cell by adding every possible 2nd block — yielding over 100,000 2-block candidates.
Prediction & Selection:
Instead of training all candidates, use a performance predictor to score them instantly. Pick top-K (e.g., K=256) for actual training.
Train & Update:
Train these top-K cells and use their results to refine the predictor.
Repeat:
Expand, predict, select, train, and update until reaching target depth (e.g., B=5).
Fig. 2. Progressive search with predictor guidance: start at S1 (B=1), expand to S′2, predict scores, select top-K to form S2, train, and repeat until depth B.
Advantages:
- Efficiency: Skip unpromising architectures.
- Faster feedback: Smaller models train quickly, improving the predictor early.
- Focused extrapolation: The predictor only needs to rank models slightly larger than those it has already seen.
Algorithm Overview
Fig. 3. Simplified pseudocode of the Progressive Neural Architecture Search algorithm.
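The control flow is easy to express in code. Below is a toy, self-contained Python sketch of the loop: the cell encoding, the "training" step (a random score here), and the surrogate predictor are all stand-in placeholders rather than the authors' implementation, and symmetry deduplication is skipped.

```python
import random

# Toy sketch of the PNAS search loop. A cell is a tuple of blocks, a block is
# (input1, op1, input2, op2), and "training" is faked with a random score; only
# the control flow (expand -> predict -> select -> train -> update) mirrors PNAS.

OPS = list(range(8))   # stand-ins for the 8 operations
K = 4                  # top-K kept per level (the paper uses K = 256)
MAX_BLOCKS = 5         # target cell depth B

def expand(cell):
    """All ways to append one more block to a cell."""
    num_inputs = len(cell) + 2          # earlier blocks + two previous-cell outputs
    return [cell + ((i1, o1, i2, o2),)
            for i1 in range(num_inputs) for o1 in OPS
            for i2 in range(num_inputs) for o2 in OPS]

def train_and_evaluate(cells):
    """Placeholder: in PNAS this trains each cell briefly and reads off validation accuracy."""
    return [random.random() for _ in cells]

class MeanPredictor:
    """Placeholder surrogate: predicts the running mean of observed accuracies plus noise."""
    def __init__(self): self.history = []
    def fit(self, cells, scores): self.history.extend(scores)
    def predict(self, cells):
        mean = sum(self.history) / len(self.history)
        return [mean + random.gauss(0, 0.01) for _ in cells]

def progressive_search():
    predictor = MeanPredictor()
    cells = expand(())                                  # all 1-block cells
    scores = train_and_evaluate(cells)
    predictor.fit(cells, scores)
    for _ in range(2, MAX_BLOCKS + 1):
        candidates = [child for c in cells for child in expand(c)]
        predicted = predictor.predict(candidates)       # cheap surrogate scores
        ranked = sorted(zip(candidates, predicted), key=lambda x: -x[1])
        cells = [c for c, _ in ranked[:K]]              # keep only the top-K
        scores = train_and_evaluate(cells)              # the expensive step
        predictor.fit(cells, scores)                    # refine the surrogate
    return max(zip(cells, scores), key=lambda x: x[1])[0]

best_cell = progressive_search()
print(f"best cell has {len(best_cell)} blocks")
```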
The Secret Weapon: Performance Predictor
The predictor scores candidate cells so only the top-K proceed to training. Perfect accuracy isn’t necessary — correct ranking is the goal.
Requirements
- Variable-length input support: Score cells larger than those in the training set.
- Rank correlation with true performance: Predictions align with actual rankings.
- Sample efficiency: Learn from few trained examples.
Tested Models
RNN Predictor (LSTM):
Reads sequences of tokens (block inputs & ops). Naturally handles variable lengths.
MLP Predictor:
Tokens embedded per block, averaged across blocks to form fixed-length vector.
To minimize variance, PNAS uses an ensemble of 5 predictors, each trained on a subset of data.
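As an illustration, here is a minimal PyTorch sketch of an MLP-style predictor in this spirit: each block's four tokens (two inputs, two operations) are embedded, the block vectors are averaged into a fixed-length cell vector regardless of cell size, and a small MLP regresses accuracy, with an ensemble of five averaging the predictions. The vocabulary size, layer widths, and shared token embedding are assumptions for the sketch, not details from the paper's code.

```python
import torch
import torch.nn as nn

class MLPPredictor(nn.Module):
    """Sketch of an MLP surrogate: embed block tokens, average over blocks, regress accuracy."""
    def __init__(self, num_tokens=16, dim=32):
        super().__init__()
        self.embed = nn.Embedding(num_tokens, dim)      # shared embedding for inputs and ops (a simplification)
        self.mlp = nn.Sequential(nn.Linear(4 * dim, 64), nn.ReLU(), nn.Linear(64, 1))

    def forward(self, cells):
        # cells: LongTensor of shape (batch, num_blocks, 4) holding token ids per block
        block_vecs = self.embed(cells).flatten(2)       # (batch, num_blocks, 4*dim)
        cell_vec = block_vecs.mean(dim=1)               # average over blocks -> fixed-length vector
        return self.mlp(cell_vec).squeeze(-1)           # one predicted accuracy per cell

# Ensemble of 5 predictors (each would be fit on a different data subset) to reduce variance;
# the final score is the mean of their outputs.
ensemble = [MLPPredictor() for _ in range(5)]

def predict(cells):
    with torch.no_grad():
        return torch.stack([m(cells) for m in ensemble]).mean(dim=0)

scores = predict(torch.randint(0, 16, (10, 3, 4)))      # 10 three-block cells -> 10 scores
```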
How Good is the Predictor?
Researchers measured Spearman rank correlation between predicted and actual performance.
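For reference, the metric itself is a one-liner with SciPy; the accuracies below are made-up numbers purely to show the call.

```python
from scipy.stats import spearmanr

predicted = [0.913, 0.905, 0.921, 0.899]   # surrogate scores (made-up values)
actual    = [0.917, 0.902, 0.925, 0.901]   # accuracies after actually training each cell
rho, _ = spearmanr(predicted, actual)
print(f"Spearman rank correlation: {rho:.3f}")   # 1.000 here, since the two rankings agree
```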
Fig. 4. Top: High correlation for same-size models (current level).
Bottom: Lower but positive correlation when extrapolating to larger models (next level).
Table 1. MLP vs RNN predictors: MLP-ensemble slightly better at extrapolation — the critical PNAS use case.
The Efficiency Race: PNAS vs NAS vs Random Search
Within the same search space, PNAS was compared against:
- RL-based NASNet.
- Random search.
Fig. 5. PNAS finds top models faster; curves rise more steeply.
Table 2. Efficiency comparison: PNAS uses far fewer model evaluations to match NAS accuracy.
PNAS is:
- Up to 5× more efficient by models evaluated.
- ~8× faster in total compute: avoids NAS’s costly re-ranking stage.
Final Performance: CIFAR-10 & ImageNet
CIFAR-10
The best cell found, PNASNet-5, achieves 3.41% test error — matching NASNet-A with 21× less compute.
Table 3. CIFAR-10 results: Comparable or better accuracy with drastically lower search cost.
ImageNet Transfer
Does a CIFAR-found cell work on ImageNet? Yes. Performance correlates strongly (ρ=0.727) between datasets.
Fig. 6. Strong correlation confirms searching on CIFAR-10 is a valid proxy for larger datasets.
ImageNet Results
Mobile Setting (224×224 inputs, <600M Mult-Adds):
Table 4. Mobile setting: PNASNet-5 competitive with NASNet-A and top evolutionary models.
Large Setting (331×331 inputs):
Table 5. Large setting: PNASNet-5 achieves top-1 = 82.9%, top-5 = 96.2%, surpassing NASNet-A and matching SENet.
Conclusion & Future Directions
PNAS is a major step toward practical, accessible automated architecture search. By progressing from simple to complex and using a predictive guide:
- It achieves state-of-the-art results with a fraction of the budget.
- It enables researchers without massive compute to run effective NAS experiments.
- It shows that smart search beats brute force.
Key Takeaways:
- Simple-to-complex search is highly efficient in huge spaces.
- Surrogate models can effectively guide exploration.
- Efficiency unlocks accessibility for broader ML research.
Future possibilities include:
- Better predictors (e.g., Gaussian Processes with string kernels).
- Early stopping for unpromising architectures.
- Warm-starting larger models from smaller parents.
- Bayesian optimization to select candidates.
- Automatic speed–accuracy trade-off exploration.
PNAS doesn’t just find a top model — it provides a blueprint for finding many more, without breaking the bank.