Continuous biome representations from Earth observation embeddings
Maxwell B. Joseph, Flávia De Souza Mendes, Dieu My T. Nguyen, Camile Sothe, Christopher B. Anderson (Planet Labs PBC)
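AI summary: To address the way discrete biome maps compress ecological continuity, the authors learn continuous probabilistic representations from satellite image embeddings. Validated on six Brazilian biomes and 4,672 plant species, these representations outperform discrete labels for predicting species distributions.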
Details
- Comments
- 8 pages, 4 figures
Biotic communities vary continuously across space, yet biome maps impose categorical boundaries that compress this variation, particularly at ecotones where transitional communities are ecologically distinct. Could Earth observation (EO) foundation models, which encode spectral, spatial, and temporal information with dense embeddings, convert discrete biome maps into continuous representations that better capture ecological variation? Here, we fit a linear classifier on Clay v1.5 satellite image embeddings to predict biome labels from a categorical map. The softmax output yields a continuous probability vector whose dimensions correspond to named biome classes. We evaluate this approach using six Brazilian biomes, 1.3 million embeddings, and 10,015 withheld forest inventory plots spanning 4,672 plant species. The continuous biome representation outperforms discrete biome labels for predicting species occurrence (mean per-species AUC 0.618 vs. 0.570 across 10 spatial cross-validation folds). Decomposing this gain shows that continuity in the graded probability output, rather than label reassignment, accounts for the improvement; the pattern holds across all distances from biome boundaries. The raw 1024-dimensional embedding remains the strongest predictor we tested (mean AUC 0.646 vs. 0.618), but the continuous representation recovers most of the embedding's gain over discrete labels. This simple approach provides a probabilistic replacement for categorical map labels, preserving their meaning while encoding graded variation that discrete maps suppress.
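The core step in the abstract, a linear classifier whose softmax output serves as a continuous biome vector, can be illustrated with a minimal sketch. This is not the authors' code: the 2-D toy "embeddings", the three unnamed classes, the cluster centres, and the helper names (`softmax`, `predict_proba`, `fit_linear_classifier`) are all assumptions for illustration, standing in for 1024-dimensional Clay v1.5 embeddings and six biome labels. The training procedure (plain full-batch gradient descent) is likewise an assumption; the abstract does not say how the classifier was fitted.

```python
import math
import random


def softmax(logits):
    """Convert raw class scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def predict_proba(weights, biases, embedding):
    """Continuous class vector: softmax of a linear map of the embedding."""
    logits = [
        sum(w * x for w, x in zip(row, embedding)) + b
        for row, b in zip(weights, biases)
    ]
    return softmax(logits)


def fit_linear_classifier(embeddings, labels, n_classes, lr=0.5, epochs=200):
    """Fit multinomial logistic regression by full-batch gradient descent."""
    dim = len(embeddings[0])
    weights = [[0.0] * dim for _ in range(n_classes)]
    biases = [0.0] * n_classes
    n = len(embeddings)
    for _ in range(epochs):
        grad_w = [[0.0] * dim for _ in range(n_classes)]
        grad_b = [0.0] * n_classes
        for x, y in zip(embeddings, labels):
            probs = predict_proba(weights, biases, x)
            for k in range(n_classes):
                # Gradient of cross-entropy w.r.t. logit k: p_k - 1[k == y]
                err = probs[k] - (1.0 if k == y else 0.0)
                grad_b[k] += err
                for j in range(dim):
                    grad_w[k][j] += err * x[j]
        for k in range(n_classes):
            biases[k] -= lr * grad_b[k] / n
            for j in range(dim):
                weights[k][j] -= lr * grad_w[k][j] / n
    return weights, biases


# Toy stand-in data (hypothetical): 2-D points around three class centres.
random.seed(0)
CENTRES = [(-2.0, 0.0), (2.0, 0.0), (0.0, 3.0)]
embeddings, labels = [], []
for label, (cx, cy) in enumerate(CENTRES):
    for _ in range(40):
        embeddings.append([cx + random.gauss(0, 0.5), cy + random.gauss(0, 0.5)])
        labels.append(label)

weights, biases = fit_linear_classifier(embeddings, labels, n_classes=3)

# A point deep inside class 0 versus a point between classes 0 and 1.
core = predict_proba(weights, biases, [-2.0, 0.0])
ecotone = predict_proba(weights, biases, [0.0, 0.0])
print("core:   ", [round(p, 3) for p in core])
print("ecotone:", [round(p, 3) for p in ecotone])
```

In this toy setting the point between two clusters receives a less peaked probability vector than the point at a cluster centre. That is the graded, ecotone-like signal which a discrete label (the argmax alone) would discard.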