Free-Grained Hierarchical Visual Recognition
Seulki Park, Zilin Wang, Stella X. Yu
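AI summary: This paper studies how to perform hierarchical visual recognition when real-world labels are incomplete and of mixed granularity. It introduces free-grain training, combined with text-based supervision and semi-supervised learning, to improve the performance of conventional hierarchical methods under incomplete supervision, and it proposes a free-grained inference mechanism that adapts the prediction depth as needed.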
Details
- Comments: Accepted to CVPR 2026. 31 pages
Hierarchical image recognition seeks to predict class labels along a semantic taxonomy, from broad categories to specific ones, typically under the tidy assumption that every training image is fully annotated along its taxonomy path. Reality is messier: A distant bird may be labeled only bird, while a clear close-up may justify bald eagle. We introduce free-grain training, where labels may appear at any level of the taxonomy and models must learn consistent hierarchical predictions from incomplete, mixed-granularity supervision. We build benchmark datasets with varying label granularity and show that existing hierarchical methods deteriorate sharply in this setting. To make up for missing supervision, we propose two simple solutions: One adds broad text-based supervision that captures visual attributes, and the other treats missing labels at specific taxonomy levels as a semi-supervised learning problem. We also study free-grained inference, where the model chooses how deep to predict, returning a reliable coarse label when a fine-grained one is uncertain. Together, our task, datasets, and methods move hierarchical recognition closer to the way labels arise in the real world.
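The setting described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy taxonomy `PARENT`, the helper names, and the fixed confidence threshold are all assumptions made for illustration. The first helper expands a label given at any taxonomy level into per-level targets, leaving deeper levels as missing; the second predicts top-down and stops at the first level where the model is uncertain.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent); None marks the root.
PARENT: Dict[str, Optional[str]] = {
    "animal": None,
    "bird": "animal",
    "dog": "animal",
    "bald eagle": "bird",
    "sparrow": "bird",
}


def path_to_root(label: str) -> List[str]:
    """Return the taxonomy path from the root down to `label`."""
    path: List[str] = []
    node: Optional[str] = label
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def level_targets(label: str, depth: int) -> List[Optional[str]]:
    """Expand a label given at any level into one target per level.

    Levels above the label are implied by the taxonomy. Levels below it
    are unknown and returned as None (missing supervision).
    """
    path = path_to_root(label)
    return path + [None] * (depth - len(path))


def free_grained_predict(
    level_probs: List[Dict[str, float]], threshold: float
) -> List[str]:
    """Predict top-down and stop at the first unreliable level.

    Assumed stopping rule (not taken from the paper): stop when the top
    class at a level falls below `threshold`, or when it is not a child
    of the class already predicted one level above.
    """
    prediction: List[str] = []
    for probs in level_probs:
        best = max(probs, key=lambda name: probs[name])
        if probs[best] < threshold:
            break
        if prediction and PARENT[best] != prediction[-1]:
            break
        prediction.append(best)
    return prediction


# A distant bird labeled only "bird": the finest level has no target.
coarse_targets = level_targets("bird", depth=3)

# Made-up per-level probabilities where the finest level is uncertain.
probs_per_level = [
    {"animal": 1.0},
    {"bird": 0.9, "dog": 0.1},
    {"bald eagle": 0.55, "sparrow": 0.45},
]
cautious = free_grained_predict(probs_per_level, threshold=0.7)
```

Under these assumptions, `coarse_targets` carries supervision for the two coarser levels only, and `cautious` stops at the coarser label because the finest level does not clear the threshold. A fixed threshold is only one possible stopping rule; the abstract does not specify how the prediction depth is chosen.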