VLA / Vision-Language-Action Models

2605.05925 2026-06-18 cs.RO Version update Topic 60

DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions


Hyesung Lee, Hyunwoo Jung, Si-Hwan Heo, Sungwook Yang

Affiliations: Korea Institute of Science and Technology (KIST); KAIST; Hanyang University

Topic match: VLA models: involves vision-language-action, but focuses mainly on manipulation.

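AI summary: Proposes the DexSynRefine framework, which synthesizes hand-object trajectories with the HOI-MMFP motion prior, then applies task-space residual reinforcement learning and contact-dynamics adaptation to turn human-object interaction data into physically feasible dexterous manipulation. Evaluated on five tasks, it improves real-world success rates over kinematic retargeting by 50-70 percentage points.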

Comments Project page: https://dexsynrefine.github.io/



English abstract

Learning dexterous manipulation from human-object interaction (HOI) data offers a scalable alternative to robot teleoperation, but HOI demonstrations are typically sparse and purely kinematic, making direct retargeting unreliable under embodiment mismatch and contact-rich dynamics. We present DexSynRefine, a coupled framework that treats HOI data as structured motion priors rather than executable robot actions. DexSynRefine first synthesizes hand-object trajectories conditioned on the task and initial object state using HOI Motion Manifold Flow Primitives (HOI-MMFP), a motion prior for coupled hand-object motion. It then physically grounds them with task-space residual reinforcement learning and adapts execution by inferring missing contact-dynamics context from proprioceptive history. Across five dexterous manipulation tasks, each stage addresses a complementary bottleneck: HOI-MMFP improves trajectory consistency and smoothness, task-space residuals provide the strongest grounding representation among the tested alternatives, and contact-dynamics adaptation enables robust real-world execution. Together, DexSynRefine improves real-world success rates over kinematic retargeting by 50-70 percentage points.
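
The abstract does not spell out how the task-space residual is formulated, so the following is only a minimal sketch of the generic residual-policy pattern it appears to refer to: a prior trajectory supplies task-space waypoints, and a policy adds a small bounded correction to each one. The function names, the 2 cm bound, and the toy policy are all hypothetical and are not taken from the paper.

```python
from typing import Callable, List, Sequence

Vec = List[float]


def clip(x: float, limit: float) -> float:
    """Clamp x to the interval [-limit, limit]."""
    return max(-limit, min(limit, x))


def residual_targets(
    prior: Sequence[Vec],
    residual_policy: Callable[[int, Vec], Vec],
    max_residual: float = 0.02,  # assumed bound in metres, illustrative only
) -> List[Vec]:
    """Add a bounded task-space correction to every waypoint of a prior trajectory."""
    targets = []
    for t, waypoint in enumerate(prior):
        delta = residual_policy(t, waypoint)
        # The prior stays dominant: each coordinate moves by at most max_residual.
        targets.append([p + clip(d, max_residual) for p, d in zip(waypoint, delta)])
    return targets


def toy_policy(t: int, waypoint: Vec) -> Vec:
    """Hypothetical stand-in for a learned policy: always asks for 5 cm down in z."""
    return [0.0, 0.0, -0.05]


# Two Cartesian waypoints (x, y, z) standing in for a synthesized prior trajectory.
prior_traj = [[0.10, 0.00, 0.30], [0.12, 0.00, 0.25]]
targets = residual_targets(prior_traj, toy_policy)
```

Bounding the residual is one common design choice in residual reinforcement learning, since it keeps the executed motion close to the prior while still letting the policy correct for contact and embodiment effects. Whether DexSynRefine bounds its residual in this way is not stated in the abstract.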

