Zero-Shot Scalable Resilience in UAV Swarms: A Decentralized Imitation Learning Framework with Physics-Informed Graph Interactions
Huan Lin, Lianghui Ding
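AI summary: This paper proposes a decentralized imitation learning framework that encodes local interactions with a physics-informed graph neural network, enabling robust recovery of UAV swarms under large-scale failures and fragmented topologies.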
Details
Large-scale unmanned aerial vehicle (UAV) failures can split a UAV swarm network into disconnected sub-networks, making decentralized recovery both urgent and difficult. Centralized recovery methods depend on global topology information and become communication-heavy after severe fragmentation. Decentralized heuristics and multi-agent reinforcement learning methods are easier to deploy, but their performance often degrades as swarm scale and damage severity vary. We present the Physics-informed Graph Adversarial Imitation Learning algorithm (PhyGAIL), which adopts centralized training with decentralized execution. PhyGAIL builds bounded local interaction graphs from heterogeneous observations and uses a physics-informed graph neural network to encode directional local interactions as gated message passing with explicit attraction and repulsion. This gives the policy a physically grounded coordination bias while keeping local observations scale-invariant. It also uses scenario-adaptive imitation learning to improve training under fragmented topologies and variable-length recovery episodes. Our analysis establishes bounded local graph amplification, bounded interaction dynamics, and controlled variance of the terminal success signal. A policy trained on 20-UAV swarms transfers directly to swarms of up to 500 UAVs without fine-tuning, and it outperforms representative baselines in reconnection reliability, recovery speed, motion safety, and runtime efficiency.
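To make the idea of a bounded local interaction graph with attraction and repulsion concrete, the following is a minimal NumPy sketch. It is not the PhyGAIL implementation: the abstract gives no equations, so the function names (`local_graph`, `interaction_step`), the parameters (`radius`, `k_max`, `d_safe`), and the fixed `tanh`/`exp` gates are all illustrative assumptions standing in for the learned gated message passing. The sketch only shows why capping each agent at `k_max` neighbors keeps the per-agent input and the aggregated interaction bounded regardless of swarm size.

```python
import numpy as np


def local_graph(pos, radius, k_max):
    """For each agent, list up to k_max nearest neighbors within radius.

    All names and parameters here are hypothetical, not taken from the paper.
    """
    diff = pos[None, :, :] - pos[:, None, :]   # diff[i, j] = pos[j] - pos[i]
    dist = np.linalg.norm(diff, axis=-1)
    np.fill_diagonal(dist, np.inf)             # an agent is not its own neighbor
    neighbors = []
    for i in range(pos.shape[0]):
        nearest = np.argsort(dist[i])[:k_max]  # bounded degree
        neighbors.append([int(j) for j in nearest if dist[i, j] <= radius])
    return neighbors, diff, dist


def interaction_step(pos, radius=10.0, k_max=4, d_safe=2.0):
    """Aggregate directional attraction/repulsion messages per agent.

    The gates are fixed hand-written functions (an assumption); a learned
    model would replace them with trainable gating.
    """
    neighbors, diff, dist = local_graph(pos, radius, k_max)
    out = np.zeros_like(pos)
    for i, nbrs in enumerate(neighbors):
        for j in nbrs:
            d = max(dist[i, j], 1e-9)          # guard against coincident agents
            unit = diff[i, j] / d              # direction from i toward j
            attract = np.tanh(d / radius)      # in [0, 1), grows with distance
            repel = np.exp(-d / d_safe)        # in (0, 1], dominates when close
            out[i] += (attract - repel) * unit
        out[i] /= k_max                        # keeps the output norm <= 1
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    swarm = rng.uniform(0.0, 100.0, size=(500, 2))
    step = interaction_step(swarm)
    print(step.shape, float(np.linalg.norm(step, axis=1).max()))
```

Because every message has norm at most 1 and at most `k_max` messages are averaged, the output norm per agent never exceeds 1, whether the swarm has 20 or 500 agents. This mirrors, in a simplified way, the scale-invariance and boundedness properties the abstract attributes to the local graph construction.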