CCKS: Consensus-based Communication and Knowledge Sharing
Jinyuan Zu, Xiaowei Lv, Yongcai Wang, Deying Li, Yunjun Han, Wenping Chen, Fengyi Zhang, Naiqi Wu
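AI summary: To address the over-reliance on teacher guidance in action advising for multi-agent reinforcement learning, this work proposes a consensus-based communication and knowledge sharing framework that builds a consensus model through contrastive learning, balancing exploration and learning and improving cooperation efficiency and performance.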
Details
In Decentralized Training and Decentralized Execution (DTDE) for cooperative Multi-Agent Reinforcement Learning (MARL), action-advising-based knowledge sharing promotes interpretable and scalable cooperation among agents. However, current action-advising approaches often follow the teacher's guidance too closely without evaluating teacher-student compatibility, which causes excessive advising, suboptimal stability, and degraded performance. To overcome these challenges, this paper presents a Consensus-based Communication and Knowledge Sharing (CCKS) framework, which lets agents adopt recommendations subject to consensus-derived constraints and follow the teacher's instructions more selectively. This mechanism enables agents to balance exploration against learning from experienced teachers, improving overall performance. The key component is the consensus model: we propose using contrastive learning to construct consensus models from local observations during the agents' training phase. In action selection, agents score and choose actions based on consensus and shared knowledge. Designed as a plug-and-play solution, CCKS integrates seamlessly with existing DTDE algorithms. Experiments in the Google Research Football environment and the complex StarCraft II Multi-Agent Challenge demonstrate that integrating CCKS significantly improves cooperation efficiency, learning speed, and overall performance compared with current DTDE baselines. The code is available at this https URL.
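The abstract names two mechanisms without giving formulas: a consensus model trained with contrastive learning, and action scoring that combines consensus with shared knowledge. The sketch below is one possible reading, not the authors' implementation. It assumes InfoNCE as the contrastive objective (a common choice; the source does not say which loss CCKS uses), cosine similarity between observation embeddings as the consensus score, and a thresholded additive bonus as the scoring rule. The names `cosine`, `info_nce`, `select_action`, `threshold`, and `bonus` are all hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u > 0 and norm_v > 0 else 0.0


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE loss for one anchor embedding (assumed objective, not from the source).

    The loss is small when the anchor is closer to the positive
    than to any of the negatives.
    """
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # Numerically stable log-sum-exp over all candidates.
    peak = max(logits)
    log_sum = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_sum - logits[0]


def select_action(q_values, advised_action, consensus, threshold=0.5, bonus=1.0):
    """Pick an action from the student's own values plus gated teacher advice.

    Hypothetical rule: the advised action gets a bonus scaled by the
    consensus score, but only when consensus reaches `threshold`.
    Below the threshold the student ignores the advice entirely.
    """
    scores = list(q_values)
    if advised_action is not None and consensus >= threshold:
        scores[advised_action] += bonus * consensus
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    # Toy embeddings of local observations (illustrative values only).
    student_obs = [1.0, 0.0]
    teacher_obs = [0.9, 0.1]
    consensus = cosine(student_obs, teacher_obs)

    q_values = [1.0, 0.8, 0.2]
    print(select_action(q_values, advised_action=1, consensus=consensus))
    print(select_action(q_values, advised_action=1, consensus=0.2))
```

With these toy values the two embeddings are nearly aligned, so the first call adopts the advised action 1, while the second call, with consensus below the threshold, keeps the student's own greedy choice 0. Gating on a threshold is the simplest way to express "adopt advice only under sufficient consensus"; a soft weighting without a threshold would be an equally plausible reading of the abstract.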