World-Task Factorization for Robot Learning
Eduardo Sebastián, Adrian Pfisterer, Vito Mengers, Oliver Brock, Amanda Prorok
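AI summary: Proposes factoring the policy into world and task factors, combining the differentiable graph model AICON with a compact learned policy to achieve zero-shot generalization to new configurations and transfer to real hardware.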
Robot learning must produce policies that generalize to new combinations of constraints, teammates, and environments. To achieve this, we must structurally factor the policy, a choice that dictates what generalizes, what requires retraining, and what remains entangled. Existing methods span a wide spectrum, from expecting structure to emerge from data scaling to hand-designing it via hierarchies, skill libraries, or learned specializations. In this paper, we study what we argue is the most fundamental factorization in robotics: separating the world from the task. We investigate the conditions under which this factorization is principled. World factors are properties of the embodied system and the environment; they exist independently of intent. Task factors are defined by the task's logic over what the world admits. We formalize this asymmetry through Bayesian model evidence: the factorization aligns with the data-generating process, maintains high likelihood through an analytical world model, and reduces the Occam's razor penalty on task parameters. We instantiate this factorization by pairing AICON with a compact, learned policy that modulates gradient paths. AICON is a differentiable graph of recursive estimators and interconnections that is compositional, operates without task-specific data, and propagates cost gradients to actuators. Gradients serve as the interface between the two factors: they carry world structure through the graph and task structure through costs, enabling low-dimensional learning while preserving structural generalization. We test the world/task factorization on three problems that span heterogeneous robots, environments, task logic, and sensorimotor modalities. Our framework outperforms end-to-end baselines and analytical heuristics in all settings, generalizes zero-shot to out-of-distribution configurations, and transfers to real hardware without retraining.
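The gradient interface described in the abstract can be illustrated with a deliberately tiny sketch. It is not AICON and not the authors' implementation. The 1-D point robot, the `World` class, the reach/avoid costs, `command`, `rollout`, and the fixed `gain` are all hypothetical. The analytic world model supplies the Jacobian dx/du, and the task supplies cost gradients. A two-element weight vector stands in for the compact learned policy that modulates which gradient paths reach the actuator.

```python
import math
from dataclasses import dataclass
from typing import Sequence


# Hypothetical analytic world model (not AICON): a 1-D point robot with
# x_next = x + dt * u. It encodes embodiment only and has no task parameters.
@dataclass
class World:
    dt: float = 0.1

    def step(self, x: float, u: float) -> float:
        return x + self.dt * u

    def dstep_du(self) -> float:
        # Jacobian of the next state w.r.t. the actuator command.
        return self.dt


def task_cost_grads(x: float, goal: float, obstacle: float) -> list[float]:
    """Gradients w.r.t. x of two hypothetical task costs."""
    # Reach cost (x - goal)^2.
    g_reach = 2.0 * (x - goal)
    # Avoid cost exp(-(x - obstacle)^2).
    d = x - obstacle
    g_avoid = -2.0 * d * math.exp(-d * d)
    return [g_reach, g_avoid]


def command(world: World, x: float, weights: Sequence[float],
            goal: float, obstacle: float, gain: float = 10.0) -> float:
    """Actuator command from one gradient step on u, starting at u = 0.

    `weights` stands in for the compact learned policy: it only scales
    the task-cost gradient paths, while the world Jacobian stays fixed."""
    x_pred = world.step(x, 0.0)
    grads = task_cost_grads(x_pred, goal, obstacle)
    dcost_dx = sum(w * g for w, g in zip(weights, grads))
    # Chain rule through the world model: dc/du = dc/dx * dx/du.
    dcost_du = dcost_dx * world.dstep_du()
    return -gain * dcost_du


def rollout(world: World, x0: float, weights: Sequence[float],
            goal: float, obstacle: float, steps: int = 60) -> float:
    """Apply the gradient-based command in closed loop; return the final x."""
    x = x0
    for _ in range(steps):
        x = world.step(x, command(world, x, weights, goal, obstacle))
    return x


weights = (1.0, 0.0)  # reach path on, avoid path off
print(rollout(World(), 0.0, weights, goal=1.0, obstacle=5.0))   # converges near 1.0
print(rollout(World(), 0.0, weights, goal=-2.0, obstacle=5.0))  # converges near -2.0
```

In this toy, the world model carries no task-specific parameters. The same `World` and the same weights therefore reach a new goal without any change. Only the low-dimensional weight vector would need to be learned.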