arXivDaily: arXiv daily academic digest, updated Monday through Friday

Vision and Robotics

VLA / Vision-Language-Action Models

Vision-language-action models, robot foundation models, and language-conditioned robot control.

Entries for today/current date: 1. Sources: cs.RO, cs.CV, cs.AI, cs.LG
2605.05925 2026-06-18 cs.RO Version update Topic 60

DexSynRefine: Synthesizing and Refining Human-Object Interaction Motion for Physically Feasible Dexterous Robot Actions

Hyesung Lee, Hyunwoo Jung, Si-Hwan Heo, Sungwook Yang

Affiliations * Korea Institute of Science and Technology (KIST), KAIST, Hanyang University

Topic match: VLA models. Involves vision-language-action, but focuses mainly on manipulation.

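AI summary: Proposes the DexSynRefine framework, which synthesizes hand-object trajectories with the HOI-MMFP motion prior and combines task-space residual reinforcement learning with contact-dynamics adaptation to turn human-object interaction data into physically feasible dexterous manipulation. Across five tasks, real-world success rates improve by 50-70 percentage points over kinematic retargeting.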

Comments Project page: https://dexsynrefine.github.io/

Details
English abstract

Learning dexterous manipulation from human-object interaction (HOI) data offers a scalable alternative to robot teleoperation, but HOI demonstrations are typically sparse and purely kinematic, making direct retargeting unreliable under embodiment mismatch and contact-rich dynamics. We present DexSynRefine, a coupled framework that treats HOI data as structured motion priors rather than executable robot actions. DexSynRefine first synthesizes hand-object trajectories conditioned on the task and initial object state using HOI Motion Manifold Flow Primitives (HOI-MMFP), a motion prior for coupled hand-object motion. It then physically grounds them with task-space residual reinforcement learning and adapts execution by inferring missing contact-dynamics context from proprioceptive history. Across five dexterous manipulation tasks, each stage addresses a complementary bottleneck: HOI-MMFP improves trajectory consistency and smoothness, task-space residuals provide the strongest grounding representation among the tested alternatives, and contact-dynamics adaptation enables robust real-world execution. Together, DexSynRefine improves real-world success rates over kinematic retargeting by 50-70 percentage points.
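
The grounding stage described in the abstract can be illustrated with a small sketch. The Python below is not the authors' implementation. It only shows the general idea of a task-space residual, where a policy adds a bounded correction to each waypoint of a prior trajectory instead of producing the full action. The names `residual_target`, `rollout`, and `toy_policy`, the 0.02 bound `max_residual`, and the use of past targets as a stand-in for proprioceptive history are all assumptions made for illustration.

```python
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical policy signature: (prior waypoint, recent history) -> residual.
Policy = Callable[[Vector, List[Vector]], Vector]


def clip(values: Sequence[float], limit: float) -> Vector:
    """Clamp every component to the interval [-limit, limit]."""
    return [max(-limit, min(limit, v)) for v in values]


def residual_target(prior_pose: Vector, residual: Vector,
                    max_residual: float = 0.02) -> Vector:
    """Return the prior waypoint plus a bounded task-space correction."""
    bounded = clip(residual, max_residual)
    return [p + r for p, r in zip(prior_pose, bounded)]


def rollout(prior_trajectory: List[Vector], policy: Policy,
            history_len: int = 3, max_residual: float = 0.02) -> List[Vector]:
    """Apply the residual policy along a prior trajectory.

    The history here is just the list of past targets. It is an assumed
    placeholder for real proprioceptive readings.
    """
    history: List[Vector] = []
    targets: List[Vector] = []
    for waypoint in prior_trajectory:
        context = history[-history_len:]
        residual = policy(waypoint, context)
        target = residual_target(waypoint, residual, max_residual)
        targets.append(target)
        history.append(target)
    return targets


def toy_policy(waypoint: Vector, context: List[Vector]) -> Vector:
    """Hypothetical stand-in for a learned policy: a constant offset."""
    return [0.05, -0.01, 0.0]


# Two waypoints of an assumed prior trajectory (x, y, z).
prior = [[0.0, 0.0, 0.1], [0.0, 0.0, 0.2]]
targets = rollout(prior, toy_policy)
```

Bounding the residual keeps the executed motion close to the prior, so the policy only has to learn small corrections rather than the whole trajectory. In this toy run the x offset of 0.05 is clipped to 0.02, while the y offset of -0.01 passes through unchanged.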