Hand-in-the-Loop: Improving VLA Policies for Dexterous Manipulation via Seamless Hand-Arm Intervention
Zhuohang Li, Liqun Huang, Wei Xu, Zhengming Zhu, Nie Lin, Xiao Ma, Xinjun Sheng, Ruoshi Wen
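AI summary: This paper proposes Hand-in-the-Loop, a method that seamlessly integrates human intervention with autonomous policy execution, reducing abrupt changes in hand motion and improving the robustness and efficiency of bimanual dexterous manipulation.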
Vision-Language-Action (VLA) models are prone to compounding errors in dexterous manipulation, where high-dimensional action spaces and contact-rich dynamics amplify small policy deviations over long horizons. While Interactive Imitation Learning (IIL) can refine policies through human correction data, applying it to high-degree-of-freedom (DoF) robotic hands remains challenging due to a command mismatch between human teleoperation and policy execution at the intervention moment, which causes abrupt robot-hand configuration changes, or "gesture jumps". We present Hand-in-the-Loop (HandITL), a seamless human-in-the-loop intervention method that blends human corrective intent with autonomous policy execution to avoid gesture jumps during bimanual dexterous manipulation. Compared with taking over control using direct teleoperation, HandITL reduces intervention jitter by 99.8% and preserves robust post-intervention manipulation, reducing grasp failures by 87.5% and mean completion time by 19.1%. We validate HandITL on tasks requiring bimanual coordination, tool use, and fine-grained long-horizon manipulation. When used to collect correction data for policy refinement, HandITL yields policies that outperform those trained with standard teleoperation data by 19% on average across three long-horizon dexterous tasks.