Geospatial foundation models enable data-efficient tree species mapping in temperate montane forests, substantially outperforming conventional satellite-based approaches.
The Autonomous Province of Trento in northern Italy provides a demanding test case for species-level mapping: steep environmental gradients spanning 65–3,769 m elevation, mixed-species stands, and strong phenological variability throughout the year. Its dense parcel-level forest inventory records species composition for over 83,000 forest parcels.
We classify 18 species and species groups, including both conifers (Picea abies, Larix decidua, Pinus sylvestris) and broadleaf species (Fagus sylvatica, Quercus ilex, Ostrya–Fraxinus ornus), capturing the full range of Alpine forest diversity.
Bidirectional reflectance distribution function (BRDF) effects from variable illumination, steep slopes, and shadowing make conventional spectral approaches unreliable in mountainous terrain — exactly the conditions where foundation model embeddings can demonstrate their advantage.
Wall-to-wall species predictions across the Trentino landscape.
We compare two globally pre-trained foundation-model embeddings against conventional Sentinel-1/2 composites, evaluating performance along five experimental axes.
Tessera: 128-dimensional embeddings from multi-sensor seasonal composites (Sentinel-1 + Sentinel-2), capturing spectral-temporal dynamics at 10 m with annual global coverage.
AlphaEarth: 64-dimensional embeddings from Google's AlphaEarth Foundations system, encoding multi-source satellite data (Sentinel-1, Sentinel-2, Landsat) with auxiliary targets including LiDAR structure and climate variables.
Sentinel-1/2 baseline: conventional seasonal median composites from Sentinel-1 SAR and Sentinel-2 multispectral imagery, the standard approach in ecological remote sensing.
Pixel-level targets are defined by parcel-level species proportions, enabling mixed parcels to contribute training signal via probabilistic soft targets rather than hard dominant-species assignments.
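To make the soft-target idea concrete, here is a minimal sketch of how a mixed parcel could contribute to training. It assumes the loss is a cross-entropy between the classifier's softmax output and the parcel's species proportions; the source does not state the exact loss, and the function names, logits, and proportions below are hypothetical.

```python
import math

def soft_targets(proportions):
    """Normalise parcel-level species proportions into a probability vector."""
    total = sum(proportions)
    if total <= 0:
        raise ValueError("parcel has no recorded species cover")
    return [p / total for p in proportions]

def log_softmax(logits):
    """Numerically stable log-softmax over a list of class scores."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    log_probs = log_softmax(logits)
    return -sum(t * lp for t, lp in zip(target, log_probs))

# Hypothetical parcel: 70% species 0, 30% species 1, 0% species 2.
target = soft_targets([70.0, 30.0, 0.0])

# Hypothetical classifier scores for one pixel inside that parcel.
logits = [2.0, 1.0, -1.0]

soft_loss = soft_cross_entropy(logits, target)
# A hard dominant-species label is the one-hot special case of the same loss.
hard_loss = soft_cross_entropy(logits, [1.0, 0.0, 0.0])
```

With a one-hot target the function reduces to ordinary cross-entropy, so pure and mixed parcels can share one training loop under this assumption.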
Classification accuracy & ecological structure
Label efficiency as data is reduced
Robustness to label impurity
Contribution of ancillary covariates
Cross-year temporal transfer
Both Tessera and AlphaEarth embeddings achieve significantly higher classification accuracy than conventional Sentinel-1/2 composites across all metrics. Performance gains are largely driven by embedding quality rather than downstream model capacity.
A compact neural network (MLP) provides the best results, outperforming Random Forest while matching deeper architectures — suggesting the embeddings already encode rich, species-discriminative structure.
Classification performance across representations and classifier types. Foundation model embeddings (Tessera, AlphaEarth) vs conventional Sentinel composites.
UMAP projections of foundation model embeddings reveal that the latent space naturally separates species along ecologically meaningful axes. Conifers and broadleaves form distinct macro-clusters, while individual species and genera occupy well-defined regions — all without any ecological supervision during pre-training.
Figure panels (UMAP projections, 2018 embeddings). Tessera: species, genus, conifer vs broadleaf, elevation gradient. AlphaEarth: species, conifer vs broadleaf.
Classification performance as training data is progressively reduced.
When training data is progressively reduced, foundation model embeddings degrade far more gracefully than conventional composites. Tessera and AlphaEarth maintain strong performance with a fraction of the labels that baselines require.
This is critical for practical applications where labelled ecological data is expensive, sparse, and regionally uneven — opening the door to species mapping in data-poor regions.
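A label-efficiency experiment of this kind can be sketched as repeated subsampling of the training labels. The example below assumes per-class (stratified) subsampling so that rare species are never dropped entirely; the source does not say how the reduction was performed, and the label counts and fractions are hypothetical.

```python
import random
from collections import defaultdict

def stratified_subsample(labels, fraction, seed=0):
    """Return sorted indices keeping roughly `fraction` of each class (at least one)."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    kept = []
    for label in sorted(by_class):
        indices = by_class[label]
        n_keep = max(1, round(fraction * len(indices)))
        kept.extend(rng.sample(indices, n_keep))
    return sorted(kept)

# Hypothetical training labels with an imbalanced class distribution.
labels = ["Picea abies"] * 60 + ["Fagus sylvatica"] * 30 + ["Larix decidua"] * 10

# One training subset per label fraction; a classifier would be refit on each
# subset and scored on the same held-out test set.
subsets = {f: stratified_subsample(labels, f) for f in (1.0, 0.5, 0.1)}
subset_sizes = {f: len(idx) for f, idx in subsets.items()}
```

Fixing the seed keeps the subsets reproducible, so differences between representations are not confounded by different label draws.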
Classification accuracy remains stable even when moderately impure (mixed-species) parcels are included in the training data. This allows the full forest inventory to be utilised without aggressive filtering, maximising data volume.
Furthermore, incorporating parcel-level species fractions as soft labels during training helps the model extract information from mixed stands, improving performance for rarer species that typically occur in mixed parcels.
Performance across label purity thresholds for different representations.
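The purity thresholds can be illustrated with a small filter over parcel records. This sketch assumes purity means the fraction of a parcel covered by its dominant species, which is one plausible reading and not a definition given in the source; the parcel identifiers and proportions are hypothetical.

```python
def dominant_fraction(proportions):
    """Share of the parcel covered by its most abundant species."""
    total = sum(proportions.values())
    if total <= 0:
        raise ValueError("parcel has no recorded species cover")
    return max(proportions.values()) / total

def filter_by_purity(parcels, threshold):
    """Keep parcels whose dominant-species share is at least `threshold`."""
    return {
        pid: props
        for pid, props in parcels.items()
        if dominant_fraction(props) >= threshold
    }

# Hypothetical inventory records: parcel id -> species proportions.
parcels = {
    "p1": {"Picea abies": 1.0},
    "p2": {"Picea abies": 0.6, "Larix decidua": 0.4},
    "p3": {"Fagus sylvatica": 0.8, "Picea abies": 0.2},
}

# Lower thresholds admit more mixed parcels into the training set.
kept_by_threshold = {
    t: sorted(filter_by_purity(parcels, t)) for t in (0.9, 0.7, 0.5)
}
```

Sweeping the threshold from strict to permissive shows how much training data is gained at each step, which is the trade-off the purity experiment examines.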
Classification performance when models trained in one year are applied to subsequent years.
Cross-year transfer reveals notable performance degradation, particularly for rare species. Interannual variability in phenology and satellite acquisition conditions affects model transferability across years.
This highlights the importance of multi-year training strategies and temporal alignment in operational forest monitoring — a key direction for future work.
The final species-prediction map across the entire Trentino landscape at 10 m resolution, produced using Tessera embeddings and an MLP classifier trained with soft labels.
Foundation model representations capture rich ecological structure aligned with functional and taxonomic groupings, without any ecological supervision.
Foundation models reduce the need for hand-crafted features. The primary challenge becomes the availability, quality, and temporal alignment of reference data.
A compact MLP matches or exceeds deeper models when applied to foundation-model embeddings, suggesting the heavy lifting is done by the pre-trained representations.
The data-efficient pipeline can be adapted to new study areas with limited local labels — opening the door to global species-level habitat mapping.
Geospatial foundation models enable data-efficient tree species mapping in temperate montane forests.
Ball, Wicklein, Feng, Knezevic, Atzberger, Dalponte & Coomes.