MAStrike: Shapley-Guided Collusive Red-Teaming on Multi-Agent Systems
Chejian Xu, Zhaorun Chen, Jingyang Zhang, Freddy Lecue, Avni Kothari, Sarah Tan, Wenbo Guo, Bo Li
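AI summary: Proposes the MAStrike framework, which uses Shapley value analysis to identify vulnerable agent coalitions in multi-agent systems, generates role-aware adversarial attacks, and iteratively refines them to bypass defenses, significantly outperforming heuristic baselines.
Details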
Hierarchical multi-agent systems (MAS) are rapidly being deployed in high-stakes workflows across domains such as finance and software engineering. In these systems, safety and security are inherently distributed across role-specialized agents, significantly expanding the attack surface, particularly under coordinated adversarial behaviors such as privilege escalation and cross-agent collusion. Existing red-teaming approaches for MAS remain limited: they rely on heuristic selection of target agents and perturb isolated message streams, leaving two critical questions unanswered: which agents are most responsible for system safety, and how compromised agents can coordinate to bypass defenses. We propose MAStrike, a closed-loop framework for collusive red-teaming in hierarchical MAS. We introduce the first agent-level Shapley value analysis for MAS, quantifying each agent's marginal contribution to system robustness under task-specific distributions. Guided by this attribution, MAStrike identifies vulnerable agent coalitions and generates coordinated, role-aware adversarial manipulations. These attacks are iteratively refined through structured causal diagnosis, which attributes failure cases to the uncompromised agents that block adversarial attempts. We further build a comprehensive MAS red-teaming benchmark and controllable environments spanning diverse hierarchical topologies and domains, including finance, software engineering, and CRM. Extensive experiments across MAS built on multiple frontier models show that MAStrike substantially outperforms heuristic baselines. Our analysis further uncovers non-trivial Shapley value distributions and higher-order interaction structures among agents, revealing critical vulnerabilities and coordination patterns that are overlooked by prior single-agent or template-based methods.
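To make the notion of an agent's "marginal contribution to system robustness" concrete, the following is a minimal sketch of the textbook Shapley value computation over a small set of agents. It is not MAStrike's implementation, which the abstract does not describe. The agent names, the function `toy_robustness`, and its scores are hypothetical and chosen only for illustration; the sketch assumes a characteristic function that maps the set of uncompromised agents to a robustness score.

```python
from itertools import combinations
from math import factorial


def shapley_values(agents, value):
    """Exact Shapley values for a characteristic function value(frozenset) -> float.

    Enumerates every coalition, so it is only practical for a handful of agents.
    """
    n = len(agents)
    phi = {a: 0.0 for a in agents}
    for a in agents:
        others = [b for b in agents if b != a]
        for k in range(n):
            # Standard Shapley weight for coalitions of size k that exclude `a`.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                s = frozenset(subset)
                # Marginal contribution of `a` when it joins coalition `s`.
                phi[a] += weight * (value(s | {a}) - value(s))
    return phi


def toy_robustness(intact):
    """Hypothetical robustness score given the set of uncompromised agents."""
    score = 0.0
    if "reviewer" in intact:
        score += 0.5  # the reviewer blocks attacks on its own
    if "manager" in intact and "executor" in intact:
        score += 0.4  # these two only block attacks when both are intact
    if "executor" in intact:
        score += 0.1  # small independent contribution
    return score


agents = ["manager", "reviewer", "executor"]
phi = shapley_values(agents, toy_robustness)
for name in agents:
    print(f"{name}: {phi[name]:.3f}")
```

In this toy setting the attributions sum to the robustness of the fully intact system (the efficiency property of Shapley values), and the jointly earned 0.4 is split evenly between the manager and the executor. An attribution of this kind could, under these assumptions, suggest which agents or coalitions matter most for robustness; how MAStrike estimates and uses such values in practice is specified in the paper, not here.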