Humans have a remarkable ability to learn new things from just one or two examples. See a single picture of a toucan, and you can likely identify other toucans, even if they differ in pose, lighting, or background. For machine learning models—especially deep neural networks—this is a monumental challenge. They are notoriously data-hungry, often requiring thousands of labeled examples to achieve similar feats. This gap between human and machine learning is the battleground for a field called few-shot learning.
One of the most promising approaches to this problem is meta-learning, or “learning to learn.” Instead of training a model to solve one specific task (like classifying cats vs. dogs), a meta-learning algorithm learns a process for learning new tasks quickly. It trains on a wide variety of tasks so that when a new, unseen task arrives, it can adapt with minimal data.
While meta-learning has made significant strides, its performance still lags far behind human capabilities. A 2018 paper from researchers at Huawei Noah’s Ark Lab, titled Deep Meta-Learning: Learning to Learn in the Concept Space, argues that this is because we’ve been asking our models to learn in the wrong place. They propose that the key isn’t just learning how to learn, but learning where to learn. Instead of grappling with the messy, complex world of raw pixels, what if a model could learn in a cleaner, more abstract concept space? This is the core idea behind their framework, Deep Meta-Learning (DEML).
In this article, we’ll unpack how DEML works, why it’s so effective, and what it means for the future of building more flexible and efficient AI.
The Problem with Learning from Pixels
Imagine you’re trying to describe the concept of a “dog” to a computer using only a handful of images. One image might be a golden retriever in a park, another a chihuahua in a dimly lit room, and a third a cartoon drawing of a dalmatian. In the raw pixel space, these images are wildly different. The colors, lighting, backgrounds, and textures have very little in common. A standard learning algorithm would struggle to find consistent patterns from such limited examples.
This is the fundamental challenge of few-shot learning in the instance space—the space of raw data. The high-level concept (e.g., “dog”) is obscured by low-level variations such as pose, lighting, and background.
Meta-learning helps by exposing the model to many different tasks (e.g., cat vs. rabbit, chair vs. table), allowing it to learn general strategies for identifying useful features. However, it still operates on the same chaotic, pixel-level data. The authors of DEML argue that we can do much better. Instead of making the meta-learner better at navigating the instance space, we should transform that space itself into something simpler: a concept space.
In this new space, the golden retriever and the chihuahua would share similar representations because they both embody the abstract concept of “dog.”
The Deep Meta-Learning (DEML) Framework
To achieve this transformation, the researchers propose a framework with three synergistically trained components: a Concept Generator, a Meta-Learner, and a Concept Discriminator.

Figure 1: Deep meta-learning architecture. Raw images are encoded into concept-level representations used by both a Meta-Learner for few-shot tasks and a Concept Discriminator for classification.
1. The Concept Generator (\(\mathcal{G}\))
This is the heart of DEML. It’s a deep neural network (the paper uses a ResNet-50) that maps raw input images to vectors in a concept space. The goal is to train this generator to produce representations that make few-shot learning easier and more effective.
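To make this concrete, here is a minimal PyTorch-style sketch (not the authors' code) of a concept generator: a ResNet-50 backbone with its classification head removed, so each image is mapped to a 2048-dimensional concept vector.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ConceptGenerator(nn.Module):
    """Maps raw images to concept-space vectors (illustrative sketch)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet50()     # ResNet-50, as used in the paper
        backbone.fc = nn.Identity()      # drop the classification head
        self.backbone = backbone         # output: 2048-d concept vectors

    def forward(self, images):           # images: (B, 3, H, W)
        return self.backbone(images)     # concepts: (B, 2048)

# A batch of 4 images becomes 4 concept vectors of dimension 2048.
concepts = ConceptGenerator()(torch.randn(4, 3, 224, 224))
print(concepts.shape)                    # torch.Size([4, 2048])
```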
2. The Meta-Learner (\(\mathcal{M}\))
This module is any standard meta-learning algorithm, such as Matching Networks, MAML, or Meta-SGD. Crucially, it doesn’t see the raw images—only the concept vectors from the generator. Its task is to perform few-shot classification using these “pre-digested” features. The error signal from this task flows back to guide the generator: “produce concepts that make my job easier.”
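As one illustration of a meta-learner working purely in concept space, here is a simplified metric-based episode loss in the spirit of Matching/Prototypical Networks. The prototype averaging and the temperature value are assumptions made for this sketch, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def episode_loss(support_c, support_y, query_c, query_y, n_way, temperature=10.0):
    """Few-shot classification loss computed entirely on concept vectors.

    support_c: (n_way * k_shot, D) concept vectors of the support set
    query_c:   (n_query, D)        concept vectors of the query set
    """
    # One prototype per class: the mean concept vector of its support examples.
    prototypes = torch.stack(
        [support_c[support_y == c].mean(dim=0) for c in range(n_way)])  # (n_way, D)
    # Cosine similarity to each prototype serves as the classification logits.
    logits = F.normalize(query_c, dim=-1) @ F.normalize(prototypes, dim=-1).T
    return F.cross_entropy(logits * temperature, query_y)
```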
3. The Concept Discriminator (\(\mathcal{D}\))
The discriminator ensures the Concept Generator learns broadly useful representations. It’s a classifier trained on a large external dataset (like a subset of ImageNet). It takes the concept vectors and predicts their class labels. Its error signal tells the generator: “produce concepts that make sense for general classification.”
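A sketch of the discriminator, assuming a simple linear head over concept vectors trained with cross-entropy; the 200-way output matches the external ImageNet subset used in the paper, but the head design itself is an assumption.

```python
import torch.nn as nn
import torch.nn.functional as F

class ConceptDiscriminator(nn.Module):
    """Predicts external class labels from concept vectors (head design assumed)."""
    def __init__(self, concept_dim=2048, num_external_classes=200):
        super().__init__()
        self.head = nn.Linear(concept_dim, num_external_classes)

    def forward(self, concepts):          # concepts: (B, concept_dim)
        return self.head(concepts)        # logits over the external classes

def discrimination_loss(discriminator, concepts, labels):
    return F.cross_entropy(discriminator(concepts), labels)
```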
The Power of Joint Training
The magic of DEML lies in training all three modules jointly. The Concept Generator (\(\mathcal{G}\)) is pulled in two complementary directions:
- The Meta-Learner (\(\mathcal{M}\)) pushes it to create representations useful for specific few-shot tasks — meta-level knowledge.
- The Concept Discriminator (\(\mathcal{D}\)) pushes it to create representations that are general and robust — external knowledge.
By combining these objectives, the generator learns representations that are both general enough to apply across domains and specific enough to enable rapid adaptation.
Mathematically, the joint objective can be written as:
\[
\min_{\boldsymbol{\theta}_{\mathcal{G}},\, \boldsymbol{\theta}_{\mathcal{M}},\, \boldsymbol{\theta}_{\mathcal{D}}} \; \mathbb{E}_{\mathcal{T} \sim p(\mathcal{T}),\; (\mathbf{x},\mathbf{y}) \sim \mathbb{D}} \left[ J\!\left( \mathcal{L}_{\mathcal{T}}(\boldsymbol{\theta}_{\mathcal{M}}, \boldsymbol{\theta}_{\mathcal{G}}),\; \mathcal{L}_{(\mathbf{x},\mathbf{y})}(\boldsymbol{\theta}_{\mathcal{D}}, \boldsymbol{\theta}_{\mathcal{G}}) \right) \right]
\]

Here, \(J\) combines two losses:
- Meta-Learning Loss (\(\mathcal{L}_{\mathcal{T}}\)), which measures how well the meta-learner performs few-shot tasks in the concept space.
- Concept Discrimination Loss (\(\mathcal{L}_{(\mathbf{x},\mathbf{y})}\)), which measures classification accuracy on external data.
A weighting hyperparameter (\(\lambda\)) balances these two terms during optimization.
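Putting the pieces together, here is a minimal sketch of one joint update, assuming \(J\) is a simple weighted sum \(\mathcal{L}_{\mathcal{T}} + \lambda\, \mathcal{L}_{(\mathbf{x},\mathbf{y})}\) and reusing the illustrative modules sketched above; the optimizer choice and \(\lambda\) value are not from the paper.

```python
import torch

# Reuses ConceptGenerator, ConceptDiscriminator, episode_loss and
# discrimination_loss from the sketches above (all illustrative).
generator = ConceptGenerator()
discriminator = ConceptDiscriminator()
lam = 0.5  # the weighting hyperparameter lambda (value chosen for illustration)

optimizer = torch.optim.Adam(
    list(generator.parameters()) + list(discriminator.parameters()), lr=1e-4)

def joint_step(task, external, n_way):
    """One joint update on a sampled few-shot task plus a batch of external data."""
    support_x, support_y, query_x, query_y = task
    ext_x, ext_y = external

    # Both branches consume concept vectors produced by the *same* generator,
    # so gradients from both losses shape the concept space.
    meta_loss = episode_loss(
        generator(support_x), support_y, generator(query_x), query_y, n_way)
    disc_loss = discrimination_loss(discriminator, generator(ext_x), ext_y)

    loss = meta_loss + lam * disc_loss   # J(.,.) as a weighted sum (assumed form)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return meta_loss.item(), disc_loss.item()
```

In this sketch the metric-based meta-learner has no parameters of its own; with a MAML- or Meta-SGD-style meta-learner, its parameters \(\boldsymbol{\theta}_{\mathcal{M}}\) would simply be added to the optimizer alongside those of the generator and discriminator.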
A Plug-and-Play Framework
One of DEML’s strengths is its flexibility—you can plug in any meta-learning algorithm. The paper demonstrates DEML using:
- Matching Networks for metric-based classification in the concept space.
- MAML (Model-Agnostic Meta-Learning), which learns an initialization from which a few gradient steps on concept vectors suffice to adapt to a new task.
- Meta-SGD, which additionally learns per-parameter learning rates alongside the initialization.
This shows that DEML’s advantage comes from learning in the concept space, not from any particular algorithmic trick.
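The plug-and-play property amounts to a narrow contract: anything that can turn a concept-space episode into a differentiable loss can fill the Meta-Learner slot. The interface below is just one way to write that contract down and is not from the paper.

```python
from typing import Protocol
import torch

class MetaLearnerSlot(Protocol):
    """Contract for the Meta-Learner module: consume concept vectors, return a loss."""
    def episode_loss(self, support_c: torch.Tensor, support_y: torch.Tensor,
                     query_c: torch.Tensor, query_y: torch.Tensor) -> torch.Tensor:
        ...

# Matching-Networks-, MAML-, or Meta-SGD-style learners would each implement
# episode_loss; the joint training loop never needs to know which one it got,
# because all of them operate on concept vectors rather than raw pixels.
```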
Putting DEML to the Test
The researchers evaluated DEML across several benchmark datasets: MiniImageNet, Caltech-256, CIFAR-100, and CUB-200, along with an external set of 200 ImageNet classes for concept discrimination.

Figure 2: Architecture used in experiments. The ResNet-50 backbone generates 2048-dimensional features fed into two branches: the Meta-Learner for few-shot learning and the Image Classifier for concept discrimination.
DEML vs. Vanilla Meta-Learning
How much does DEML help? Quite a lot.

Table 1: DEML significantly boosts accuracy for all meta-learners and datasets. For example, Meta-SGD on CUB-200 improves from 53.34% to 66.95%.
Across all tested datasets and meta-learners, the DEML-enhanced variants consistently outperformed their vanilla counterparts: the concept-space representations made few-shot learning substantially easier.
Is It Just a Deeper Network?
Could these gains simply be due to using a powerful backbone like ResNet-50? The authors tested this by creating “Deep” versions of the baselines trained without the concept discriminator or joint loss.

Table 2: DEML vs. Deep baselines. Despite using the same network depth, DEML achieves superior results thanks to joint learning in the concept space.
The results confirm that depth alone isn’t the secret—synergistic joint training is.
DEML vs. Transfer Learning
Traditional transfer learning pre-trains a network on a large dataset (e.g., ImageNet) and then reuses its features for new tasks. The authors compared two transfer baselines—Decaf+kNN and Decaf+Meta-SGD—against DEML.

Table 3: DEML vs. transfer learning. Transfer learning excels on similar datasets but struggles on dissimilar ones. DEML consistently provides higher, more robust accuracy.
Transfer learning works well when the target dataset resembles the source (e.g., MiniImageNet vs. ImageNet), but fails when domains differ. DEML’s joint training enables the concept generator to adapt across domains, retaining generality and task relevance simultaneously.

Figure 3: DEML+Meta-SGD consistently outperforms transfer learning and fine-tuning variants, highlighting the benefit of joint optimization.
Tuning the Balance (\(\lambda\))
The hyperparameter \(\lambda\) controls the balance between external concept learning and few-shot task learning. How sensitive is the model to this trade-off?

Figure 4: Few-shot learning (red) and classification (blue) accuracy on CIFAR-100 at different \(\lambda\) values. A moderate \(\lambda\) balances external and internal learning best.
As \(\lambda\) increases, concept classification improves, but few-shot performance first rises and then declines. Too much reliance on external data makes the generator less task-specific; a balanced \(\lambda\) yields the best few-shot performance.
Conclusion: Learning Where It Matters
The Deep Meta-Learning framework delivers a striking insight: where a model learns matters as much as how it learns. By shifting few-shot learning into a richer concept space, DEML empowers models to learn faster and more effectively using fewer examples.
Its three key modules work together:
- Concept Generator — builds the concept space.
- Meta-Learner — provides task-specific feedback.
- Concept Discriminator — integrates external, general-purpose knowledge.
Beyond improving few-shot learning, DEML points toward life-long learning systems. The Concept Generator isn’t static—it can evolve as new data and tasks appear, continually refining its concept space and enabling ongoing adaptation. In doing so, DEML moves machine learning closer to human-like continual learning.
By teaching our models to first learn concepts, we may finally teach them to learn new tasks with the versatility and efficiency that humans achieve naturally.