2603.09292
2026-06-02
cs.RO
cs.CV
See, Plan, Rewind: Progress-Aware Vision-Language-Action Models for Robust Robotic Manipulation
Tingjun Dai, Mingfei Han, Tingwen Du, Zhiheng Liu, Zihao Zhang, Zhihui Li, Salman Khan, Jun Yu, Xiaojun Chang
Affiliations
*
School of Information Science and Technology, University of Science and Technology of China
;
University of Technology Sydney
;
Department of Computer Vision, Mohamed Bin Zayed University of Artificial Intelligence
;
The University of Hong Kong
;
Institute of AI for Industry, Chinese Academy of Sciences
;
School of Intelligent Science and Engineering, Harbin Institute of Technology (Shenzhen)
AI Summary
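Proposes SPR, a progress-aware vision-language-action framework that dynamically maps language instructions to a sequence of spatial subgoals and uses closed-loop progress monitoring to recover from errors. It improves performance by 5% on the LIBERO benchmark and shows state-of-the-art robustness on LIBERO-Plus.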