Designing the architecture of a neural network has long been considered a dark art — a blend of intuition, experience, and trial-and-error. But what if we could automate this process? What if an AI could design an even better AI? This is the promise of Neural Architecture Search (NAS), a field that has produced some of the best-performing models in computer vision.

However, this power has historically come at a staggering cost. Early state-of-the-art methods like Google’s NASNet required enormous computational resources — training and evaluating 20,000 different architectures on 500 high-end GPUs over four days. Such requirements put NAS far beyond the reach of most researchers or organizations without access to a massive data center.

Enter Progressive Neural Architecture Search (PNAS): a smarter, more efficient way to find high-performing architectures. PNAS can match the accuracy of previous methods while being 5× more efficient in model evaluations and 8× faster in total computation. Instead of blindly searching a massive space of possibilities, PNAS starts simple and gradually increases complexity, using a learned model to guide its way.

In this article, we’ll explore how PNAS works — from its progressive search strategy to its performance predictor — and the results that make it a landmark in automated machine learning.


Background: Why Architecture Search is So Hard

Before PNAS, the dominant NAS approaches were:

  • Reinforcement Learning (RL):
    An RNN-based controller learns a policy to generate sequences describing neural architectures. Each generated architecture is trained, and its validation accuracy is used as a reward to update the controller. Over thousands of iterations, better architectures emerge. This method was famously used in NASNet.

  • Evolutionary Algorithms (EA):
    Architectures are treated like a population of genomes. Top-performing models “reproduce” via mutation (random changes) and crossover (mixing parts of two architectures), evolving toward better solutions.

Both methods are powerful but expensive: they search fully-specified, complex architectures from the start. This means slow feedback and enormous compute costs.
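To see where the cost comes from, here is a generic sketch of an evolutionary search loop (an illustration only, not the exact algorithm of any particular paper): every call to `fitness` stands for a full training run of one architecture, and thousands of such calls are needed.

```python
import random

def evolutionary_search(random_architecture, mutate, crossover, fitness,
                        population_size=50, generations=100):
    """Generic evolutionary architecture search (a sketch, not a specific method).

    `fitness` stands for "train this architecture and return validation accuracy",
    which is why each generation is so expensive.
    """
    population = [random_architecture() for _ in range(population_size)]
    scores = [fitness(a) for a in population]

    for _ in range(generations):
        # Tournament selection: pick two good parents, breed and mutate a child.
        i, j = (max(random.sample(range(population_size), 5), key=lambda k: scores[k])
                for _ in range(2))
        child = mutate(crossover(population[i], population[j]))
        # Replace the worst individual with the freshly trained child.
        worst = min(range(population_size), key=lambda k: scores[k])
        population[worst], scores[worst] = child, fitness(child)

    best = max(range(population_size), key=lambda k: scores[k])
    return population[best], scores[best]
```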

A pivotal idea from NASNet was to search for cells instead of whole CNNs.
A cell is a small network module that can be stacked to form the full CNN — drastically reducing search space and enabling transfer to other datasets.

PNAS adopts this cell-based search space but innovates on the strategy.


How PNAS Works

PNAS is built on a simple idea:
Start small, grow progressively, and only train what looks promising.

The Search Space: Cells and Blocks

A cell is a directed acyclic graph of B blocks.

Each block:

  1. Takes two inputs.
  2. Applies a chosen operation to each input.
  3. Combines the two results via element-wise addition.

Inputs can be:

  • Outputs of earlier blocks in the same cell.
  • Outputs from the two previous cells in the full CNN.

Operations are chosen from a compact, effective set of 8 options:

  • Depthwise-separable convolutions (3×3, 5×5, 7×7).
  • Average pooling (3×3).
  • Max pooling (3×3).
  • 1×7 convolution followed by 7×1 convolution.
  • 3×3 dilated convolution.
  • Identity.

The CNN is constructed by stacking copies of the learned cell — sometimes with stride 2 for spatial downsampling.
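To make this concrete, here is a minimal sketch (in Python, with names invented for illustration) of how a block and a cell could be represented: a block is a pair of (input, operation) branches whose outputs are added, and a cell is just an ordered list of B blocks.

```python
from dataclasses import dataclass
from typing import List

# The 8 candidate operations in the PNAS search space.
OPS = [
    "sep_conv_3x3", "sep_conv_5x5", "sep_conv_7x7",  # depthwise-separable convolutions
    "avg_pool_3x3", "max_pool_3x3",
    "conv_1x7_then_7x1",
    "dilated_conv_3x3",
    "identity",
]

@dataclass(frozen=True)
class Block:
    """One block: two (input, operation) branches whose outputs are added."""
    input1: int   # index of an earlier block, or -1 / -2 for the two previous cells
    op1: str
    input2: int
    op2: str

# A cell is an ordered list of B blocks; the full CNN stacks copies of the cell,
# occasionally with stride 2 for spatial downsampling.
Cell = List[Block]

# A hypothetical 1-block cell: add a 3x3 separable conv of the previous cell's
# output to a max-pooled copy of the cell before that.
tiny_cell: Cell = [Block(input1=-1, op1="sep_conv_3x3", input2=-2, op2="max_pool_3x3")]
```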

Fig. 1. Left: the best cell structure found by the search algorithm (PNASNet-5).
Right: full CNNs for CIFAR-10 and ImageNet, built by stacking copies of that cell.

Even with this simplification, the space of all possible 5-block cells contains roughly 10¹² unique structures, far too many for brute-force search.
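That count follows from the definitions above: block b can draw each of its two inputs from the two previous cells plus the b−1 earlier blocks (b+1 choices) and apply any of the 8 operations to each branch. A quick back-of-the-envelope check in Python (assuming the combination step is always addition):

```python
import math

NUM_OPS = 8   # size of the operation set
B = 5         # blocks per cell

def ordered_choices(b: int) -> int:
    """Ordered count of block b's two (input, op) branches.

    Each branch picks one of (b + 1) inputs: the two previous cells
    plus the b - 1 earlier blocks in this cell, and one of 8 operations.
    """
    return ((b + 1) * NUM_OPS) ** 2

total = math.prod(ordered_choices(b) for b in range(1, B + 1))
print(f"Ordered 5-block cells: {total:.1e}")      # ~5.6e14

# Because the two branches of a block are interchangeable (their outputs are
# simply added), many of these are duplicates. For B = 1 there are 16 possible
# (input, op) branches, so the number of *unique* 1-block cells is:
print(16 * 15 // 2 + 16)                          # 136, as used at level 1
```

Accounting for such symmetries across all five blocks shrinks the ordered count of about 5.6 × 10¹⁴ down to the roughly 10¹² unique cells quoted above.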


Progressive Search Strategy

PNAS searches level-by-level:

  1. Level 1 (B=1):
    Evaluate all possible 1-block cells (136 unique). They train quickly, providing essential initial data.

  2. Level 2 (B=2):
    Expand each 1-block cell by adding every possible 2nd block — yielding over 100,000 2-block candidates.

  3. Prediction & Selection:
    Instead of training all candidates, use a performance predictor to score them instantly. Pick top-K (e.g., K=256) for actual training.

  4. Train & Update:
    Train these top-K cells and use their results to refine the predictor.

  5. Repeat:
    Expand, predict, select, train, and update until reaching target depth (e.g., B=5).

Fig. 2. The PNAS search procedure, illustrated for B=3: start at S1 (B=1), expand to S′2, predict scores, select the top-K to form S2, train, and repeat until reaching depth B. Only the most promising candidates (solid blue circles) are trained at each stage.

Advantages:

  • Efficiency: Skip unpromising architectures.
  • Faster feedback: Smaller models train quickly, improving predictor early.
  • Focused extrapolation: Predictor ranks models just slightly larger than seen before.

Algorithm Overview

Fig. 3. Simplified pseudocode of PNAS (Algorithm 1): iterate from b=2 to B, expanding cells, predicting their performance, selecting the top-K, training them, and updating the predictor.
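Since the pseudocode figure does not survive well in text form, here is a small, self-contained Python sketch of the same loop. The training step and the predictor are random-number stand-ins purely so the sketch runs; in PNAS proper they are real CNN training runs and the learned predictor described in the next section.

```python
import random
from itertools import product

NUM_OPS = 8   # operations, represented here as integer ids 0..7

def branch_pairs(inputs):
    """All unordered pairs of (input, op) branches; order doesn't matter
    because the two branch outputs are simply added."""
    combos = list(product(inputs, range(NUM_OPS)))
    return [(combos[i], combos[j])
            for i in range(len(combos)) for j in range(i, len(combos))]

def one_block_cells():
    """Every 1-block cell: both branches read from the two previous cells."""
    return [[pair] for pair in branch_pairs([-1, -2])]          # 136 cells

def expansions(cell):
    """Every way of appending one more block to an existing cell."""
    inputs = [-1, -2] + list(range(len(cell)))                  # prev cells + earlier blocks
    return [cell + [pair] for pair in branch_pairs(inputs)]

def train_and_evaluate(cell):
    """Placeholder for 'build a CNN from this cell and train it'."""
    return random.random()                                      # fake validation accuracy

def predict(cell, history):
    """Placeholder for the learned performance predictor."""
    return random.random()

def pnas_search(B=5, K=256):
    cells = one_block_cells()
    history = [(c, train_and_evaluate(c)) for c in cells]       # level-1 training data
    for b in range(2, B + 1):
        # Expand every surviving cell by one block, then score all candidates.
        candidates = [bigger for c in cells for bigger in expansions(c)]
        candidates.sort(key=lambda c: predict(c, history), reverse=True)
        cells = candidates[:K]                                  # keep only the top-K
        history += [(c, train_and_evaluate(c)) for c in cells]  # train them; refit predictor
    return max(history[-len(cells):], key=lambda t: t[1])       # best B-block cell

best_cell, best_acc = pnas_search(B=3, K=16)   # toy settings; the paper uses B=5, K=256
print(f"{len(best_cell)} blocks, (fake) accuracy {best_acc:.3f}")
```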


The Secret Weapon: Performance Predictor

The predictor scores candidate cells so only the top-K proceed to training. Perfect accuracy isn’t necessary — correct ranking is the goal.

Requirements

  1. Variable-length input support: Score cells larger than those in the training set.
  2. Rank correlation with true performance: Predictions align with actual rankings.
  3. Sample efficiency: Learn from few trained examples.

Tested Models

  • RNN Predictor (LSTM):
    Reads sequences of tokens (block inputs & ops). Naturally handles variable lengths.

  • MLP Predictor:
    Tokens are embedded per block and the block vectors are averaged across blocks, giving a fixed-length input regardless of cell size.

To reduce variance, PNAS uses an ensemble of 5 predictors, each trained on a different subset of the data.
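As a rough illustration of the MLP variant (a minimal PyTorch sketch, not the authors' implementation; dimensions, vocabulary sizes, and the ensemble averaging are assumptions), each block's four tokens are embedded, the block embeddings are averaged into a fixed-length vector, and a small MLP regresses the accuracy:

```python
import torch
import torch.nn as nn

NUM_OPS = 8
NUM_INPUT_IDS = 7   # hypothetical vocab for input indices (-2, -1, 0.. remapped to 0..6)

class MLPPredictor(nn.Module):
    """Score a cell: embed each block's tokens, average over blocks, regress accuracy."""
    def __init__(self, d=32, hidden=64):
        super().__init__()
        self.input_emb = nn.Embedding(NUM_INPUT_IDS, d)
        self.op_emb = nn.Embedding(NUM_OPS, d)
        self.mlp = nn.Sequential(nn.Linear(4 * d, hidden), nn.ReLU(), nn.Linear(hidden, 1))

    def forward(self, inputs, ops):
        # inputs, ops: (batch, num_blocks, 2) integer tensors of token ids
        x = torch.cat([self.input_emb(inputs), self.op_emb(ops)], dim=-1)  # (B, nb, 2, 2d)
        x = x.flatten(start_dim=2)     # (B, nb, 4d): one vector per block
        x = x.mean(dim=1)              # average over blocks -> fixed-length vector
        return self.mlp(x).squeeze(-1) # predicted accuracy per cell

# An ensemble of 5 predictors, each to be trained on a different subset of the data.
ensemble = [MLPPredictor() for _ in range(5)]

def ensemble_predict(inputs, ops):
    with torch.no_grad():
        return torch.stack([m(inputs, ops) for m in ensemble]).mean(dim=0)

# Toy usage: score a batch of two 3-block cells (token ids are made up).
inputs = torch.randint(0, NUM_INPUT_IDS, (2, 3, 2))
ops = torch.randint(0, NUM_OPS, (2, 3, 2))
print(ensemble_predict(inputs, ops))
```

Averaging over blocks is what lets the same model score a 2-block cell during one level and a 3-block cell at the next.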


How Good is the Predictor?

Researchers measured Spearman rank correlation between predicted and actual performance.
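Spearman's ρ compares only the orderings, which is exactly what matters when the scores are used solely to pick a top-K. A tiny example with SciPy (the numbers are made up):

```python
from scipy.stats import spearmanr

# Hypothetical validation accuracies for five cells, and a predictor's scores for them.
true_acc  = [0.912, 0.934, 0.901, 0.945, 0.920]
predicted = [0.60,  0.71,  0.55,  0.69,  0.63]

rho, _ = spearmanr(true_acc, predicted)
print(rho)   # 0.9: the ranking is almost right even though the raw values are way off
```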

Fig. 4. True vs. predicted accuracies for the MLP-ensemble predictor.
Top: high correlation on models of the sizes seen during training (the current level).
Bottom: lower but still positive correlation when extrapolating to the larger, unseen models of the next level.

Table 1. Spearman rank correlations for the different predictors. The MLP ensemble generally performs best on the extrapolation task (ρ̃), which is the critical one for PNAS.


Search Efficiency: PNAS vs. NAS

Within the same search space, PNAS was compared against:

  • RL-based NASNet.
  • Random search.

Fig. 5. Search efficiency of PNAS, NAS, and random search. PNAS consistently finds higher-accuracy models after the same number of evaluations; its curves rise more steeply.

Table 2. Efficiency comparison: to reach the same accuracy, NAS must evaluate 3–5× more models than PNAS.

PNAS is:

  • Up to 5× more efficient in terms of the number of models evaluated.
  • ~8× faster in total compute, in part because it avoids NAS's costly re-ranking stage.

Final Performance: CIFAR-10 & ImageNet

CIFAR-10

The best cell found, PNASNet-5, achieves 3.41% test error on CIFAR-10, matching NASNet-A while using roughly 21× less search compute.

Table 3. Performance of top models on the CIFAR-10 test set. PNASNet-5 matches the accuracy of NASNet-A while requiring a small fraction of the search cost.

ImageNet Transfer

Does a CIFAR-found cell work on ImageNet? Yes. Performance correlates strongly (ρ=0.727) between datasets.

Fig. 6. Architecture performance on CIFAR-10 is strongly correlated with performance on ImageNet, confirming that searching on the smaller dataset is a valid proxy for larger ones.


ImageNet Results

Mobile Setting (224×224 inputs, <600M Mult-Adds)

Table 4. ImageNet classification results in the “Mobile” setting, where the computational budget is constrained. PNASNet-5 is competitive with NASNet-A and the best evolutionary models.

Large Setting (331×331 inputs)

Table 5. ImageNet classification results in the “Large” setting. PNASNet-5 achieves a new state-of-the-art 82.9% top-1 / 96.2% top-5 accuracy, surpassing NASNet-A and SENet.


Conclusion & Future Directions

PNAS is a major step toward practical, accessible automated architecture search. By progressing from simple to complex and using a predictive guide:

  • It achieves state-of-the-art results with a fraction of the budget.
  • It enables researchers without massive compute to run effective NAS experiments.
  • It shows that smart search beats brute force.

Key Takeaways:

  • Simple-to-complex search is highly efficient in huge spaces.
  • Surrogate models can effectively guide exploration.
  • Efficiency unlocks accessibility for broader ML research.

Future possibilities include:

  • Better predictors (e.g., Gaussian Processes with string kernels).
  • Early stopping for unpromising architectures.
  • Warm-starting larger models from smaller parents.
  • Bayesian optimization to select candidates.
  • Automatic speed–accuracy trade-off exploration.

PNAS doesn’t just find a top model — it provides a blueprint for finding many more, without breaking the bank.