Evolutionary Enhanced Multi-Agent Reinforcement Learning for Cooperative Air Combat
Chengwei Li, Junlin Liu, Yang Gao
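AI summary: To address the low exploration efficiency, low sample utilization, and poor policy generalization of existing MARL methods in multi-aircraft cooperative air combat, the paper proposes ACE-MAPPO, a hybrid learning framework that integrates evolutionary algorithms with MAPPO and improves performance through a genetic soft update, evolutionary prioritized trajectory replay, and adversarial evolutionary curriculum learning.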
Details
As modern air combat evolves toward beyond-visual-range (BVR) multi-aircraft cooperative engagements, autonomous decision-making for unmanned combat aerial vehicles (UCAVs) faces significant challenges due to high-dimensional state spaces, discrete action commands, and strongly adversarial dynamic environments. To overcome the limitations of existing multi-agent reinforcement learning (MARL) methods in such settings, namely insufficient exploration efficiency, low sample utilization, and poor policy generalization, we propose Adversarial Curriculum and Evolutionary-enhanced Multi-agent Proximal Policy Optimization (ACE-MAPPO), a hybrid learning framework that integrates evolutionary algorithms with MAPPO. Specifically, a genetic soft update mechanism is introduced to enhance population diversity and mitigate convergence to local optima. An evolutionary-augmented prioritized trajectory replay strategy is further employed to improve the utilization of sparse high-value samples. In addition, an adversarial evolutionary curriculum learning mechanism is designed to enable adaptive training with progressively increasing difficulty. Extensive experimental results demonstrate that the proposed method outperforms MAPPO and other baseline algorithms in terms of training stability, convergence speed, and win rate, validating its effectiveness in multi-aircraft cooperative air combat scenarios.
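The abstract does not specify how the genetic soft update is computed, so the following is only an illustrative sketch of one plausible reading: each non-elite policy in a population is moved a small fraction toward a mutated copy of the best-performing policy, rather than being overwritten by it. The function name `genetic_soft_update`, the blend factor `tau`, the mutation scale `sigma`, and the flat-list parameter representation are all assumptions made for this example and are not taken from the paper.

```python
import random


def genetic_soft_update(population, fitness, tau=0.1, sigma=0.02, rng=None):
    """Hypothetical sketch: softly pull non-elite members toward a mutated elite.

    population: list of parameter vectors (lists of floats), one per policy.
    fitness:    list of scalar scores, e.g. evaluation win rates (assumed).
    tau:        blend factor in [0, 1]; tau = 1 would be a hard replacement.
    sigma:      standard deviation of the Gaussian mutation (assumed operator).
    """
    rng = rng or random.Random(0)

    # Selection: the member with the highest fitness is treated as the elite.
    elite_idx = max(range(len(population)), key=lambda i: fitness[i])
    elite = population[elite_idx]

    updated = []
    for i, theta in enumerate(population):
        if i == elite_idx:
            # Elitism: the best member is copied through unchanged.
            updated.append(list(theta))
            continue
        # Mutation: perturb a copy of the elite with Gaussian noise.
        child = [w + rng.gauss(0.0, sigma) for w in elite]
        # Soft update: move only a fraction tau of the way toward the child,
        # which preserves part of the member's own parameters.
        updated.append([(1.0 - tau) * w + tau * c for w, c in zip(theta, child)])
    return updated
```

Under these assumptions, a small `tau` keeps the population members distinct from one another while still propagating information from the elite, which is one way a soft update could support the population diversity the abstract refers to.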