2605.19342
2026-05-28
cs.CV
Semantic-Enriched Latent Visual Reasoning
Tianrun Xu, Yue Sun, Qixun Wang, Jingyi Lu, Yuan Wang, Tianren Zhang, Longteng Guo, Fengyun Rao, Jing Lyu, Feng Chen, Jing Liu
Affiliations:
Department of Automation, Tsinghua University, Beijing, China
Department of Electronic Engineering, Tsinghua University, Beijing, China
Zhongguancun Academy, Beijing, China
China Agricultural University, Beijing, China
Peking University, Beijing, China
Beijing Institute of Technology, Beijing, China
Institute of Automation, Chinese Academy of Sciences, Beijing, China
School of Artificial Intelligence, University of Chinese Academy of Sciences, Beijing, China
WeChat Vision, Tencent Inc, Beijing, China
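AI Summary:
Proposes SLVR, a two-stage learning framework that enriches the semantic content of latent representations through attribute-level semantic supervision and multi-query Group Relative Policy Optimization (GRPO), improving the robustness and semantic consistency of latent visual reasoning.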
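The summary names Group Relative Policy Optimization (GRPO) as part of the second stage. As a minimal sketch, assuming the standard GRPO formulation (several responses are sampled for each query, and each response's reward is normalized by the mean and standard deviation of its own group), the advantage computation could look like the following. The function names are hypothetical, and the sketch does not reproduce SLVR's multi-query variant or its attribute-level supervision, whose details are not given here.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    """Standard GRPO-style advantages for one group (all responses to one query).

    Each reward is centered on the group mean and scaled by the group's
    population standard deviation; eps avoids division by zero when all
    rewards in the group are equal.
    """
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def advantages_per_query(rewards_by_query):
    """Hypothetical helper: apply the per-group normalization independently
    to each query's group of sampled responses (not SLVR's multi-query scheme)."""
    return {q: group_relative_advantages(rs) for q, rs in rewards_by_query.items()}


if __name__ == "__main__":
    batch = {"query_a": [1.0, 0.0, 1.0, 0.0], "query_b": [0.2, 0.8, 0.5]}
    for query, adv in advantages_per_query(batch).items():
        print(query, [round(a, 3) for a in adv])
```

Normalizing within each group means a response is rewarded only for doing better than its siblings on the same query, which removes the need for a learned value baseline.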