POIROT: Interrogating Agents for Failure Detection in Multi-Agent Systems
Iñaki Dellibarda Varela, R. Sendra-Arranz, Pablo Romero-Sorozabal, J. M. Valverde-García, Annemarie F. Laudanski, Álvaro Gutiérrez, Eduardo Rocon, Manuel Cebrian
- Comments: 44 pages, 6 figures
Orchestrating Large Language Models into Multi-Agent Systems (LLM-MAS) has unlocked remarkable reasoning capabilities, yet emergent failures and hallucinations that resist characterisation block their deployment in safety-critical domains, a gap that emerging AI regulation makes legally untenable. Existing evaluation paradigms share a common flaw: centralised judgment creates single points of failure and demands domain-specific expertise. Here we present POIROT, a protocol that repurposes a system's own agents as its diagnostic layer, leveraging the epistemic diversity already present in the architecture. Across evaluated settings, POIROT outperforms single-LLM evaluator baselines, with gains that scale with problem complexity (OR = 1.60, $p = 0.008$), agent count, and fault dimensionality, persisting under compound fault conditions. These results demonstrate that safety oversight need not be externalised: the agents executing a role carry sufficient collective intelligence to audit it. We release POIROT as an open-source library alongside BLAME, a benchmark for fault attribution in safety-critical multi-agent systems.
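The abstract gives the complexity effect as an odds ratio (OR = 1.60) but does not state the model behind it. The sketch below assumes it is a per-unit odds ratio from a logistic-style model, with complexity as the predictor. That assumption is not confirmed by the abstract. Under it, the sketch shows how such a ratio translates into probabilities. The function name `apply_odds_ratio` and the 0.5 baseline probability are hypothetical illustrations, not values from the paper.

```python
def apply_odds_ratio(p_baseline: float, odds_ratio: float, steps: int = 1) -> float:
    """Hypothetical helper: multiply the odds of p_baseline by odds_ratio `steps` times
    and convert the result back to a probability.

    Assumes (not stated in the abstract) that OR = 1.60 is a per-unit odds ratio.
    """
    if not 0.0 < p_baseline < 1.0:
        raise ValueError("p_baseline must be strictly between 0 and 1")
    odds = p_baseline / (1.0 - p_baseline)    # probability -> odds
    new_odds = odds * odds_ratio ** steps     # an odds ratio scales odds, not probability
    return new_odds / (1.0 + new_odds)        # odds -> probability


if __name__ == "__main__":
    # Hypothetical baseline: a 50% probability at some reference complexity level.
    for k in range(4):
        print(k, round(apply_odds_ratio(0.5, 1.60, k), 3))
```

Because the ratio acts on odds, each unit step raises the probability by a shrinking amount as it approaches 1. Under these assumed inputs the printed values are 0.5, 0.615, 0.719 and 0.804.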