ODOV: Benchmark the Open-Domain Open-Vocabulary Object Detection
Yupeng Zhang, Ruize Han, Fangnan Zhou, Wei Feng, Liang Wan
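AI summary: To address domain shift and category shift occurring together in real-world scenes, the paper proposes the open-domain open-vocabulary object detection task, builds the OD-LVIS benchmark dataset, and designs a VLM-based baseline that improves detection performance through a domain-agnostic category prompt and a domain projection and grafting module.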
Existing studies typically investigate domain shift and category shift as independent problems. In real-world scenarios, however, the two types of shift often occur simultaneously and interact, leading to significant degradation in detection performance. To address this, we propose and systematically study a novel problem, Open-Domain Open-Vocabulary (ODOV) object detection, which aims to evaluate a model's ability to adapt to compound domain and category shifts in real-world environments. We construct a new benchmark, OD-LVIS, which contains 46,949 images spanning 15 diverse real-world scenarios and 1,203 categories, for assessing object detection performance. Furthermore, we propose a novel ODOV detection baseline that leverages the strong multi-modal alignment capabilities of vision-language models (VLMs) and introduces two key mechanisms to enhance both category and domain generalization. The first is the Domain-Agnostic Category Prompt (DAPmt), which strengthens category semantics while attenuating domain representations, yielding a purer category representation. The second is the Domain Projection and Grafting (DP&G) module, which incorporates domain-specific features from the input image, allowing the model to generalize dynamically across diverse open domains. Together, these two components enable the model to maintain effective detection performance under simultaneous category and domain variations in real-world scenarios. We provide extensive benchmark evaluations for the proposed ODOV detection task and report experimental results. These results validate the soundness of the ODOV task, the practicality of the OD-LVIS dataset, and the superiority of the proposed method.
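The abstract does not give implementation details for DAPmt or DP&G, so the following is only a toy sketch of the general idea, not the authors' method. It assumes that each category has text embeddings under several domain templates, that the shared per-domain component can be approximated by the mean over categories, and that "grafting" can be approximated by adding a projected image-level feature back onto the category embeddings. All function names, the `proj` matrix, and the `alpha` weight are hypothetical.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors along `axis` to (approximately) unit length."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def domain_agnostic_prompts(text_emb):
    """Hypothetical stand-in for a domain-agnostic category prompt.

    text_emb: array of shape (C, D, E) holding the text embedding of each of
    C categories rendered under each of D domain templates.
    Returns an array of shape (C, E).
    """
    # Assumption: the mean over categories approximates what each domain
    # template contributes regardless of category.
    domain_mean = text_emb.mean(axis=0, keepdims=True)   # (1, D, E)
    residual = text_emb - domain_mean                    # (C, D, E)
    # Average the residuals over domains to get one vector per category.
    return l2_normalize(residual.mean(axis=1))           # (C, E)


def graft_domain(category_emb, image_feat, proj, alpha=0.1):
    """Hypothetical stand-in for domain projection and grafting.

    Projects an image-level feature of shape (F,) with `proj` of shape (F, E)
    and adds it, scaled by `alpha`, to every category embedding.
    """
    domain_vec = l2_normalize(image_feat @ proj)         # (E,)
    return l2_normalize(category_emb + alpha * domain_vec)


def classify_regions(region_emb, class_emb):
    """Assign each region to the class with the highest cosine similarity."""
    sims = l2_normalize(region_emb) @ class_emb.T        # (R, C)
    return sims.argmax(axis=1), sims
```

In a real detector the embeddings would come from a VLM's text and image encoders; here any arrays of matching shape can be used to check the plumbing.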