Introduction
If you want to know how much carbon a forest stores, or how healthy an ecosystem is, you need to know the height of the trees. It sounds simple, but measuring canopy height at a global—or even national—scale is a logistical nightmare. You cannot send a team of researchers with tape measures into every hectare of woodland.
Traditionally, we have relied on two extremes. On one end, we have Airborne Laser Scanning (ALS), that is, LiDAR flown on planes. It is incredibly accurate, creating dense 3D point clouds of the forest structure, but it is prohibitively expensive and rarely updated. On the other end, we have satellite imagery. Satellites like Sentinel-2 pass over frequently and freely, but their resolution (10 to 30 meters per pixel) is often too coarse to distinguish individual trees or detect subtle logging activities.
There is a middle ground: Very High Resolution (VHR) satellite imagery, which captures details at the meter level. However, developing AI models to estimate tree height from these images has been blocked by a massive barrier: the data is usually proprietary, expensive, or locked behind non-commercial licenses.
This brings us to a breakthrough paper: “Open-Canopy: Towards Very High Resolution Forest Monitoring.” In this work, the researchers introduce the first open-access, country-scale benchmark for estimating canopy height at a stunning 1.5-meter resolution. By combining satellite imagery with aerial LiDAR data across France, they have created a dataset that allows us to train modern computer vision models to “see” the height of forests from space with unprecedented clarity.
In this post, we will walk through how Open-Canopy was built, the deep learning architectures used to crack the code of forest structure, and why this matters for the future of environmental monitoring.
Background: The Data Dilemma
To understand why Open-Canopy is significant, we have to look at the current state of forest monitoring.
Most existing global canopy height maps rely on GEDI (Global Ecosystem Dynamics Investigation), a laser mounted on the International Space Station. GEDI provides precise samples of tree height, but it only captures sparse footprints (dots on the ground), not a continuous map. Researchers typically take these sparse dots and use them to train models on lower-resolution satellite images (like Landsat). The result is a map that gives you a general idea of forest height but lacks the granularity needed for precision forestry or biodiversity monitoring.
The “Holy Grail” is training models directly on ALS data. ALS provides a “ground truth” so precise it can map individual branches. If we could train a deep learning model to look at a satellite image and predict the ALS height map, we could monitor forests cheaply and frequently.
Attempts have been made before, but they suffer from reproducibility issues. Datasets like those used in previous state-of-the-art papers often rely on private satellite providers (like Maxar) or do not release their training splits. This makes it impossible for the scientific community to verify results or build upon them.
The French Connection
France offered a unique opportunity to solve this. Recent national initiatives made two critical datasets public:
- LiDAR-HD: A massive government project mapping France in 3D using aerial lasers.
- DINAMIS: A portal providing open access to SPOT 6-7 satellite imagery (1.5m resolution) for research.
The authors of Open-Canopy combined these resources to create a dataset covering over 87,000 km² of territory.

As shown in Figure 2, the dataset is not just a random collection of images. It is carefully split into training, validation, and test sets distributed across the diverse bioclimatic regions of France, from the Alps to the Mediterranean. The top panel (a) shows the geographic distribution, while (b) and (c) visualize the input (satellite image) and the target (LiDAR height map).
The Open-Canopy Dataset
Building a benchmark of this scale requires rigorous data engineering. The researchers didn’t just crop images; they had to align distinct data sources perfectly.
The Inputs
The model inputs are SPOT 6-7 satellite images. While standard computer vision models usually expect Red, Green, and Blue (RGB) channels, forestry applications benefit heavily from a fourth channel: Near-Infrared (NIR). Vegetation reflects NIR light strongly, making it a powerful signal for plant health and density. The images in Open-Canopy are “pansharpened,” a process that combines lower-resolution color data with high-resolution black-and-white data to produce sharp 1.5m resolution images with four spectral bands (R, G, B, NIR).
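As a quick illustration of why that fourth band matters, the sketch below reads a 4-band patch with rasterio and computes NDVI, a standard vegetation index built from the Red and NIR bands. The file path and band order are assumptions for illustration, not part of the dataset specification.

```python
import numpy as np
import rasterio

# Hypothetical 4-band pansharpened patch; band order assumed to be R, G, B, NIR.
with rasterio.open("spot_patch.tif") as src:
    img = src.read().astype(np.float32)  # shape: (4, H, W)

red, nir = img[0], img[3]

# NDVI: healthy vegetation reflects NIR strongly and absorbs red,
# so values near +1 indicate dense, healthy canopy.
ndvi = (nir - red) / (nir + red + 1e-6)
```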
The Ground Truth
The target for the AI models is the Canopy Height Model (CHM). This is derived from the ALS point clouds. By subtracting the ground elevation from the highest point in the laser scan for every pixel, the researchers created a raster map representing the absolute height of the vegetation.
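The underlying arithmetic is simple. Here is a minimal NumPy sketch, assuming the ALS point cloud has already been rasterized into a digital surface model (DSM, top of canopy) and a digital terrain model (DTM, bare ground) on the same grid:

```python
import numpy as np

def canopy_height_model(dsm: np.ndarray, dtm: np.ndarray) -> np.ndarray:
    """CHM = surface elevation minus ground elevation, per pixel (meters)."""
    chm = dsm - dtm
    # Small negative values are rasterization noise, not real heights.
    return np.clip(chm, 0.0, None)
```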
Masking the Vegetation
One challenge in remote sensing is distinguishing a 10-meter tree from a 10-meter building. To focus the benchmark on forestry, the researchers created a comprehensive Vegetation Mask.

Figure 3 illustrates this process. They combined raw LiDAR vegetation detection (a) with official government forest maps (b). The result is a precise mask (c) that includes not just dense forests, but also hedgerows, urban parks, and scattered trees that official maps often miss. This ensures the model is evaluated on its ability to measure trees, not skyscrapers.
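One plausible way to express this fusion, assuming both sources are rasterized as boolean arrays on the CHM grid (the paper's exact combination rules may be more involved):

```python
import numpy as np

def build_vegetation_mask(lidar_veg: np.ndarray, forest_map: np.ndarray) -> np.ndarray:
    """Union of two co-registered boolean rasters.

    lidar_veg:  vegetation pixels from the ALS point classification,
                which catches hedgerows, parks, and isolated trees.
    forest_map: pixels inside official forest database polygons.
    """
    return lidar_veg.astype(bool) | forest_map.astype(bool)
```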
Core Method: Adapting Vision Models for Forests
Now that we have the data, how do we predict tree height? The problem is framed as a dense regression task. For every pixel in the input satellite image, the model must predict a continuous value representing height in meters.
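Concretely, training might minimize a per-pixel regression loss restricted to vegetated pixels. The sketch below uses an L1 objective in PyTorch; the paper's exact loss may differ.

```python
import torch

def masked_l1_loss(pred: torch.Tensor, target: torch.Tensor,
                   mask: torch.Tensor) -> torch.Tensor:
    """Per-pixel L1 regression loss over vegetated pixels only.

    pred, target: (B, 1, H, W) canopy heights in meters.
    mask:         (B, 1, H, W) boolean vegetation mask.
    """
    abs_err = (pred - target).abs() * mask
    return abs_err.sum() / mask.sum().clamp(min=1)
```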
From UNets to Transformers
Historically, the UNet architecture has been the workhorse of satellite image segmentation. It uses Convolutional Neural Networks (CNNs) to downsample an image to capture context and then upsample it to predict details.
However, the field of Computer Vision has recently been revolutionized by Vision Transformers (ViTs). Unlike CNNs, which look at local neighborhoods of pixels, Transformers use “attention” mechanisms to understand global relationships across the entire image. This is crucial for forests, where the height of a tree might depend on the density of the surrounding canopy or the texture of the terrain.
The authors benchmarked several architectures:
- UNet & DeepLabv3: Standard CNN baselines.
- ViT (Vision Transformer): The standard transformer architecture.
- Hierarchical Transformers (Swin, PVTv2): These are hybrids that combine the global context of transformers with the multi-scale processing of CNNs.
The Challenge of the 4th Channel
Most of these models are pre-trained on ImageNet (a massive database of everyday photos like cats and cars). ImageNet images have 3 channels (RGB). SPOT images have 4 (RGB + Near-Infrared).
To use pre-trained weights, the researchers had to adapt the first layer of the networks. They kept the RGB weights from ImageNet and initialized the weights for the new NIR channel with small random values. This allows the model to start with a strong understanding of visual features while gradually learning how to use the infrared data.
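In PyTorch, that adaptation might look like the sketch below. The initialization scale for the NIR filters is an assumption; the key idea is copying the pretrained RGB weights and starting the new channel near zero.

```python
import torch
import torch.nn as nn

def adapt_first_conv(conv_rgb: nn.Conv2d) -> nn.Conv2d:
    """Extend a pretrained 3-channel conv layer to accept an extra NIR channel."""
    conv_rgbn = nn.Conv2d(
        in_channels=4,
        out_channels=conv_rgb.out_channels,
        kernel_size=conv_rgb.kernel_size,
        stride=conv_rgb.stride,
        padding=conv_rgb.padding,
        bias=conv_rgb.bias is not None,
    )
    with torch.no_grad():
        conv_rgbn.weight[:, :3] = conv_rgb.weight  # reuse ImageNet RGB filters
        conv_rgbn.weight[:, 3:].normal_(std=0.01)  # small random init for NIR
        if conv_rgb.bias is not None:
            conv_rgbn.bias.copy_(conv_rgb.bias)
    return conv_rgbn
```

For ViT-style models the same trick applies to the patch-embedding projection, which is typically implemented as a convolution.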
Why Architecture Matters
The results revealed a fascinating hierarchy of performance. Standard Transformers (ViT) struggled, producing blocky artifacts. UNets were better but lacked precision. The clear winners were the Hierarchical Transformers, specifically PVTv2 (Pyramid Vision Transformer v2).

Figure 4 demonstrates the difference visually. The top row shows absolute error. Look at the column for ViT-B: it is covered in “hot” yellow/red spots, indicating errors of up to 40 meters. The column for PVTv2 is much darker, showing that the hierarchical transformer creates a far more accurate and consistent height map.
Experiments & Results
The researchers evaluated their models against existing global forest products. The comparison is visually striking.
Qualitative Comparison
The image below compares the Open-Canopy method against other state-of-the-art maps.

In Figure 1, panel (a) is the satellite image, and (b) is the ground truth LiDAR. Panel (c) is the prediction from the PVTv2 model trained on Open-Canopy. Notice how it captures the fine texture of the canopy and the sharp boundaries of the clearing.
Compare this to panels (e), (f), and (g), which represent other well-known products (like those from Lang et al. or Potapov et al.). Those maps are much blurrier and blockier. This isn’t just because those models are worse; it’s because they are limited by lower-resolution training data (10m or 30m). Open-Canopy proves that training on 1.5m data yields a massive leap in fidelity.
Quantitative Success
The numbers back up the visuals. The PVTv2 model achieved a Mean Absolute Error (MAE) of 2.52 meters. In comparison, global maps often have errors ranging from 6 to 9 meters when evaluated at this resolution.
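The metric itself is straightforward to compute. A minimal NumPy sketch, under the assumption that evaluation is restricted to the vegetation mask described earlier:

```python
import numpy as np

def masked_mae(pred: np.ndarray, target: np.ndarray, mask: np.ndarray) -> float:
    """Mean absolute error in meters, over vegetated pixels only."""
    return float(np.abs(pred - target)[mask.astype(bool)].mean())
```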
The researchers also analyzed where the errors were coming from.

Figure 5 shows the error distribution across different tree heights. The boxplots reveal that the Open-Canopy model (the first one in each group) consistently has the tightest distribution (smallest boxes) centered around zero. It performs significantly better on tall trees (30-60m) compared to other methods, which often underestimate the height of majestic old-growth forests.
Does it Generalize?
A major critique of regional datasets is that models trained on them might fail elsewhere. To test this, the researchers took their model (trained only on France) and applied it to a forest in Utah, USA.
Despite the difference in geography and tree species, the model performed remarkably well, achieving accuracy comparable to models specifically trained on US data. This suggests that the structural features of forests learned by the Transformer are robust and transferable.
Open-Canopy-\(\Delta\): Detecting Change
Forest monitoring isn’t just about static height; it’s about dynamics. Illegal logging, storm damage, and dieback from drought all require detecting changes in canopy height over time.
The researchers introduced a second benchmark: Open-Canopy-\(\Delta\). They focused on the Forêt de Chantilly, an area suffering from climate-induced dieback. They acquired LiDAR data from 2022 and 2023, giving them a perfect “before and after” ground truth.
This is an incredibly difficult task. Trees grow slowly, so the signal for growth is weak. However, finding areas where height decreased (due to cutting or death) is critical.
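One simple way to turn two height maps into a change product is to difference them and threshold the losses, as sketched below. The threshold value is illustrative, not taken from the paper.

```python
import numpy as np

def detect_canopy_loss(chm_before: np.ndarray, chm_after: np.ndarray,
                       threshold: float = -5.0) -> np.ndarray:
    """Flag pixels whose height dropped by more than |threshold| meters.

    Annual growth is only decimeters, so a conservative negative threshold
    isolates cutting, storm damage, and dieback rather than noise.
    """
    delta = chm_after - chm_before
    return delta < threshold
```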

Figure 6 visualizes this challenge. Panel (b) shows the ground truth change: red areas indicate height loss. Panel (c) shows the prediction from the Open-Canopy model. While not perfect, it successfully identifies the major zones of canopy loss.
Crucially, the Open-Canopy model significantly outperformed methods based on Sentinel-2 data (Panel d). Because Sentinel-2 has low resolution, it misses smaller patches of dieback that the VHR model picks up.
Conclusion and Implications
The “Open-Canopy” paper represents a pivotal moment for environmental AI. By releasing a massive, high-quality, open-access dataset, the authors have removed the barrier to entry for researchers worldwide.
Key Takeaways:
- Resolution is King: Training on 1.5m imagery allows for measuring individual trees and small-scale disturbances that 10m satellite data simply misses.
- Transformers Work: Hierarchical Vision Transformers (like PVTv2) are especially well suited to interpreting complex forest textures, balancing global context with fine spatial detail.
- Open Science Wins: By moving away from proprietary data, this benchmark allows for fair comparison and faster iteration in the scientific community.
As we face an accelerating climate crisis, tools like this are essential. They allow governments and conservationists to move from rough estimates of forest carbon to precise, actionable measurements—pixel by pixel, tree by tree.