STORM: Segment, Track, and Object Re-Localization from a Single Image
Affiliations * Department of Computer Science, Technical University of Darmstadt, Darmstadt, Hesse, Germany; Hessian Center for Artificial Intelligence (hessian.AI), Darmstadt, Hesse, Germany; German Research Center for Artificial Intelligence (DFKI), Darmstadt, Hesse, Germany; Centre for Cognitive Science, Technical University of Darmstadt, Darmstadt, Hesse, Germany; Google Intrinsic AI Research, Germany. † Work done while at the AIML research lab, now working at Intrinsic, Google.
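AI summary STORM is a unified framework for 6D pose estimation and tracking conditioned on a single reference image, offering high robustness while requiring little manual input. The method combines a hierarchical spatial fusion attention mechanism with a tracking verifier trained with binary cross-entropy (BCE), which lets it stably recover the target pose in challenging scenarios such as occlusion and fast motion. Experiments show that STORM outperforms existing methods without requiring annotations and copes effectively with severe occlusion and viewpoint changes.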
Comments 21 pages. Accepted at the 43rd International Conference on Machine Learning (ICML 2026); camera-ready version