2605.30280
2026-06-02
cs.RO
cs.AI
cs.CL
Version update
Qwen-VLA: Unifying Vision-Language-Action Modeling across Tasks, Environments, and Robot Embodiments
Qiuyue Wang, Mingsheng Li, Jian Guan, Jinhui Ye, Sicheng Xie, Yitao Liu, Junhao Chen, Zhixuan Liang, Jie Zhang, Xintong Hu, Xuhong Huang, Pei Lin, Junyang Lin, Dayiheng Liu, Shuai Bai, Jingren Zhou, Jiazhao Zhang, Haoqi Yuan, Gengze Zhou, Hang Yin, Ye Wang, Yiyang Huang, Zixing Lei, Wujian Peng, Delin Chen, Yingming Zheng, Jingyang Fan, Xianwei Zhuang, Xin Zhou, Haoyang Li, Anzhe Chen, Tong Zhang, Xuejing Liu, Yuchong Sun, Ruizhe Chen, Zhaohai Li, Chenxu Lü, Zhibo Yang, Tao Yu, Xionghui Chen
Affiliation
Qwen Team (Tongyi Lab)
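AI summary
Proposes Qwen-VLA, a unified embodied foundation model built on a DiT action decoder. Through large-scale joint pretraining and embodiment-aware prompts, it casts manipulation, navigation, and trajectory prediction as a single action-trajectory prediction framework, enabling generalization across tasks, environments, and robot embodiments.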