Seeing Before Reasoning: Decoupling Perception and Reasoning for Shortcut-Resilient Multimodal On-Policy Self-Distillation
Affiliations: State Key Laboratory of Robotics and Intelligent Systems, Shenyang Institute of Automation, Chinese Academy of Sciences; University of Chinese Academy of Sciences
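AI summary: Proposes the ViGOS framework, which decouples perception from reasoning to avoid textual shortcuts during MLLM post-training and to strengthen image-dependent behavior.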
Comments: 29 pages, 5 figures, 8 tables