Learning to Balance Motor Thermal Safety and Quadrupedal Locomotion Performance with Residual Policy
Yuhang Wan, Weixian Lin, Letian Qian, Yiqi Zou, Weiwei Wu, Shengwei Wu, Chuanlin Zhao, Xin Luo
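AI summary: Proposes a two-stage training framework that combines a whole-body thermal model with a residual policy, preventing motor overheating while preserving locomotion performance and enabling long-duration locomotion under payload.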
Motor thermal management is often overlooked in electrically actuated robots, particularly legged robots, yet motor overheating is a key factor limiting long-duration locomotion, especially under payload. This paper integrates a whole-body thermal model of a quadruped robot into the reinforcement learning pipeline to update motor temperatures, and proposes a two-stage training framework for motor thermal management. In this framework, a nominal policy is first pre-trained as a locomotion baseline capable of traversing diverse terrains. A residual policy is then trained on top of the nominal policy to provide corrective actions based on the robot's thermal state, maintaining high performance when motor temperatures are low and preventing overheating when they are high. Simulation results demonstrate that the proposed policy achieves an effective balance between motor thermal safety and locomotion performance. Real-world experiments on a Unitree A1 quadruped robot further validate the approach: under a 3 kg payload, the robot achieves stable locomotion across multiple terrains for over 13 minutes, whereas the nominal policy alone leads to motor overheating in about 5 minutes.
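The two ingredients named in the abstract, a thermal model that updates motor temperatures and a residual action added to a nominal action, can be illustrated with a minimal Python sketch. This is not the paper's model: the abstract gives no equations, so the first-order lumped per-motor model, the torque-squared heating term, and every name and constant here (`step_motor_temperature`, `combined_action`, `heat_gain`, `cooling_rate`, `residual_scale`) are assumptions chosen only for illustration.

```python
def step_motor_temperature(temp_c, torque_nm, ambient_c=25.0, dt=0.02,
                           heat_gain=0.05, cooling_rate=0.01):
    """Advance one motor's temperature by one explicit-Euler step.

    Assumed model (not from the paper): heating grows with torque squared,
    cooling is proportional to the difference from ambient temperature.
    """
    heating = heat_gain * torque_nm ** 2
    cooling = cooling_rate * (temp_c - ambient_c)
    return temp_c + dt * (heating - cooling)


def combined_action(nominal_action, residual_action, residual_scale=0.5):
    """Add a scaled residual correction to the nominal policy's action.

    The scaling factor is a hypothetical design choice that keeps the
    correction small relative to the nominal action.
    """
    return [a + residual_scale * r
            for a, r in zip(nominal_action, residual_action)]


if __name__ == "__main__":
    # Hold a constant 10 N*m torque for 100 steps and watch the motor warm up.
    temp = 25.0
    for _ in range(100):
        temp = step_motor_temperature(temp, torque_nm=10.0)
    print(f"temperature after 100 steps: {temp:.2f} C")

    # A zero residual leaves the nominal action unchanged.
    print(combined_action([0.1, -0.2, 0.3], [0.0, 0.0, 0.0]))
```

In a training loop of this shape, the temperatures produced by the thermal step would be part of the observation given to the residual policy, so that its correction can depend on the thermal state as the abstract describes.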