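arXivDaily: arXiv Daily Academic Digest, updated Monday through Friday

AI Large Models

AI Agent

Agents, tool calling, planning, workflows, multi-agent systems, and autonomous task execution.

Indexed for today/current date: 22. Signal sources: cs.AI, cs.CL, cs.LG, cs.SE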
2606.18837 2026-06-18 cs.MA cs.AI cs.LG New submission 90%

Skill-MAS: Evolving Meta-Skill for Automatic Multi-Agent Systems

Hehai Lin, Qi Yang, Chengwei Qin

Affiliations: Ant Group; The Hong Kong University of Science and Technology (Guangzhou)

Topic match: Multi-agent. Automatic multi-agent system generation; Meta-Skill evolution.

AI summary: Proposes Skill-MAS, which decouples high-level orchestration capability into an evolvable Meta-Skill so that experience is retained without parameter updates. The Meta-Skill is refined through Multi-Trajectory Rollout and Selective Reflection, yielding marked performance gains at manageable cost across multiple benchmarks and LLMs.

Abstract

Large Language Model (LLM)-based automatic Multi-Agent Systems (MAS) generation has become a crucial frontier for tackling complex tasks. However, existing methods face a dilemma between model capability and experience retention. Inference-time MAS leverages frozen frontier LLMs but repeats identical searches without learning from past experience. Conversely, Training-time MAS internalizes experience via gradient updates but is constrained by the low capability ceiling of smaller models, and is hard to scale to large frontier LLMs. To bridge this gap, we propose Skill-MAS, a novel third path that decouples experience retention from parametric updates by conceptualizing the high-level orchestration capability as an evolvable Meta-Skill. Skill-MAS refines this architectural knowledge through a closed optimization loop: (1) Multi-Trajectory Rollout samples a behavioral distribution for each task under the current Meta-Skill; and (2) Selective Reflection adaptively selects priority tasks and applies hierarchical contrastive analysis to distill systemic experience into generalizable, strategy-level principles. Extensive experiments across four complex benchmarks and four distinct LLMs demonstrate that Skill-MAS not only achieves remarkable performance gains but also maintains a favorable cost-performance trade-off. Further analysis reveals that the evolved Meta-Skills are highly robust and exhibit strong transferability across unseen tasks and different LLMs.

2606.18668 2026-06-18 cs.MA cs.CL New submission 90%

EARS: Explanatory Abstention for Reliable Sub-Agent Modeling in Large-scale Multi-Agent Systems

Shuang Xie, Yunan Lu, Han Li, Lingyun Wang

Affiliations: Shopify; Columbia University

Topic match: Multi-agent. Sub-agent abstention mechanism in multi-agent systems.

AI summary: Addresses sub-agents in large-scale multi-agent systems that over-answer and hallucinate. The EARS framework reframes abstention as an inter-agent communication protocol, uses calibrated LLM-as-a-Judge models to generate structured abstention labels and rationales, and fine-tunes sub-agents to detect failure conditions and return rationales. In an e-commerce assistant, the response pass rate rises from 68.5% to 78.9%.

Abstract

In large-scale enterprise settings, centralized multi-agent systems (MAS) are increasingly adopted, in which a coordinator delegates user requests to lightweight, domain-specialized sub-agents. While this architecture improves modularity, scalability, and cost efficiency, its reliability depends not only on accurate routing but also on sub-agents' ability to calibrate their responses to capability constraints. In particular, sub-agents built on smaller fine-tuned models often struggle with such calibration, leading them to over-answer ambiguous, underspecified, misrouted, or unsupported requests and produce hallucinated outputs instead of actionable feedback. To address this challenge, we present EARS (Explanatory Abstention for Reliable Sub-Agent Modeling), a production-oriented framework that reframes sub-agent abstention as an inter-agent communication protocol: a sub-agent does not merely abstain, but exposes an actionable failure state to the coordinator. EARS curates human-agent interaction data using an ensemble of calibrated LLM-as-a-Judge models, producing structured abstention labels and rationales under a taxonomy of sub-agent failure modes. These data are used to fine-tune sub-agents to detect failure conditions and return rationales for coordinator-level clarification, rerouting, or fallback. We evaluate EARS in a large-scale production e-commerce assistant supporting enterprise business intelligence workflows. EARS improves the overall response pass rate from 68.5% to 78.9%, demonstrating that sub-agent-side explanatory abstention improves MAS reliability.
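
The abstention-as-protocol idea in the EARS abstract above can be pictured with a minimal Python sketch. The four failure modes come from the abstract's list of over-answered requests; everything else is an assumption made for illustration, including the names `FailureMode`, `SubAgentReply`, and `coordinator_handle` and the mapping from failure mode to coordinator action. It is not the paper's taxonomy or implementation.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class FailureMode(Enum):
    # Request types the abstract says sub-agents tend to over-answer.
    AMBIGUOUS = "ambiguous"
    UNDERSPECIFIED = "underspecified"
    MISROUTED = "misrouted"
    UNSUPPORTED = "unsupported"


@dataclass
class SubAgentReply:
    answer: Optional[str] = None           # set when the sub-agent answers
    abstain: Optional[FailureMode] = None  # set when the sub-agent abstains
    rationale: str = ""                    # explanation exposed to the coordinator


def coordinator_handle(reply: SubAgentReply) -> str:
    """Choose a coordinator action (hypothetical mapping, not from the paper)."""
    if reply.abstain is None:
        return "deliver"
    if reply.abstain in (FailureMode.AMBIGUOUS, FailureMode.UNDERSPECIFIED):
        return "clarify"   # ask the user for the missing detail
    if reply.abstain is FailureMode.MISROUTED:
        return "reroute"   # hand the request to a different sub-agent
    return "fallback"      # unsupported request: use a fallback path
```

For example, `coordinator_handle(SubAgentReply(abstain=FailureMode.MISROUTED, rationale="inventory question sent to the billing agent"))` selects the rerouting branch, while a reply that carries an answer is delivered unchanged.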

2606.18648 2026-06-18 physics.comp-ph New submission 90%

Deep Research in Physical Sciences: A Multi-Agent Framework and Comprehensive Benchmark

Yigeng Jiang, Tengchao Yang, Taoyong Cui, Jiaxing Wan, Yuan Wang, Weida Wang, Zhiyu Liu, Chuyi Peng, Binzhao Luo, Maoli Gao, Huaihai Huang, Yuqianer Zeng, Ziyang Zheng, Dongchen Huang, Chao Chen, Zichao Liu, Weiping Shen, Shuchen Pu, Siyu Zhou, Runmin Ma, Yusong Hu, Fei Chao, Bo Zhang, Xiawu Zheng, Zifu Wang, Lei Bai, Yunqi Cai, Shufei Zhang

Topic match: Multi-agent. The DelveAgent multi-agent framework for deep research in the physical sciences.

AI summary: Introduces the PhySciBench benchmark for evaluating LLM deep-research capability in the physical sciences and develops DelveAgent, a multi-agent framework that raises accuracy and lowers inference cost through adaptive planning, dual-granularity memory, and a hierarchical reflection mechanism.

Comments 19 pages, 5 figures, 1 table

Abstract

Deep research agents are Large Language Model (LLM)-based systems designed for autonomous, multi-step scientific reasoning, and they hold immense potential for accelerating research in the physical sciences. However, comprehensive and in-depth evaluations of their capabilities within this domain remain lacking. To address this gap, we introduce PhySciBench, a benchmark highly relevant to physical science research, comprising 200 expert-curated questions, balanced between physics and chemistry, across six task categories that reflect real-world scientific workflows. Evaluations of state-of-the-art models and agent systems on PhySciBench reveal limited performance; even the strongest baseline, Gemini Deep Research, achieves an accuracy of only 33.5%. Analysis of failure cases identifies three recurrent deficiencies: fragility in extended reasoning chains, limited knowledge transfer across steps, and a lack of physics-grounded self-verification. Motivated by these findings, we develop DelveAgent, a modular multi-agent framework equipped with an adaptive planning loop, dual-granularity memory, and a hierarchical physics-grounded reflection mechanism. Across four scientific benchmarks, DelveAgent improves accuracy by up to 7.5 percentage points while reducing inference costs to approximately one-third of the strongest baseline. These results establish the significance of PhySciBench as a critical benchmark for evaluating AI systems in the physical sciences and demonstrate that architectural specialization can effectively enhance the reliability of autonomous scientific research.

2506.09046 2026-06-18 cs.LG cs.AI cs.MA Version update 90%

Self-Evolving Multi-Agent Systems via Textual Backpropagation

Xiaowen Ma, Yunpu Ma, Chenyang Lin, Sikuan Yan, Jinhe Bi, Zixuan Cao, Yijun Tian, Volker Tresp, Hinrich Schuetze

Affiliations: Ludwig Maximilian University of Munich; Technical University of Munich; Munich Center for Machine Learning; University of Notre Dame

Topic match: Multi-agent. Proposes self-evolving multi-agent systems that optimize collaboration through textual backpropagation.

AI summary: Proposes the Agentic Neural Network framework, which models multi-agent collaboration as a layered neural network. Forward task decomposition and backward propagation of feedback let agents self-evolve their roles, prompts, and coordination; the framework surpasses existing methods on seven benchmark datasets.

Abstract

Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative team focused on a specific subtask. Our framework follows a two-phase optimization strategy: (1) Forward Phase - Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase - Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables our framework to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across seven benchmark datasets, our work surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements.

2606.19308 2026-06-18 cs.CL cs.MA New submission 85%

Enhancing Decision-Making with Large Language Models through Multi-Agent Fictitious Play

Leyang Shen, Yang Zhang, Xiaoyan Zhao, Chun Kai Ling, Tat-Seng Chua

Affiliations: National University of Singapore

Topic match: Multi-agent. Multi-agent fictitious play for enhanced decision-making.

AI summary: Addresses decision-making tasks in multi-agent systems that resist decomposition because of stance entanglement. Proposes Multi-Agent Fictitious Play (MAFP), a paradigm built on fictitious play that seeks equilibrium through iterated best responses, improving decision quality and robustness.

Comments 18 pages, 8 figures

Abstract

Large language model (LLM)-based multi-agent systems (MAS) have demonstrated great potential in solving tasks with execution complexity, by distributing subtasks across cooperative agents. However, this divide-and-conquer paradigm falls short on decision-making tasks that are also prevalent in the real world. These tasks require simultaneous reasoning from the stances of all involved stakeholders whose decisions are mutually dependent and thus cannot be solved in isolation. We characterize this challenge as stance entanglement, a form of decision complexity distinct from execution complexity. To address it, we propose Multi-Agent Fictitious Play (MAFP), a novel MAS paradigm that represents stakeholder stances as agents and formulates decision-making as an equilibrium-seeking process. Built on the game-theoretic principle of fictitious play, MAFP iteratively updates each agent's decision by best responding to the empirical mixture of other agents' past decisions. This enables agents to expose and address one another's weaknesses, progressively improving decision quality and robustness. We evaluate MAFP on challenging decision-making tasks that test the capability of deciding strategies for competitive scenarios prior to acting. MAFP outperforms both single-round and multi-round baselines on two complementary metrics, tournament strength and robustness, demonstrating its effectiveness in addressing stance entanglement.
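
The game-theoretic principle named in the MAFP abstract above, best responding to the empirical mixture of the other players' past decisions, is easiest to see in its textbook form. The sketch below runs classical fictitious play on matching pennies, a two-player zero-sum matrix game. It does not reproduce MAFP itself, where the agents are LLMs and decisions are presumably not matrix-game actions; the function names and the choice of game are assumptions made for illustration.

```python
def best_response(payoff, opponent_counts):
    """Return the action with the highest expected payoff against the
    empirical mixture implied by opponent_counts (ties go to the lowest index)."""
    total = sum(opponent_counts)
    expected = [
        sum(row[b] * opponent_counts[b] for b in range(len(opponent_counts))) / total
        for row in payoff
    ]
    return max(range(len(expected)), key=expected.__getitem__)


def fictitious_play(row_payoff, col_payoff, rounds):
    """Run simultaneous fictitious play and return both empirical mixtures.

    row_payoff[a][b]: row player's payoff for playing a against column action b.
    col_payoff[b][a]: column player's payoff for playing b against row action a.
    """
    # Seed each history with one play of action 0 so the mixtures are defined.
    row_counts = [1] + [0] * (len(row_payoff) - 1)
    col_counts = [1] + [0] * (len(col_payoff) - 1)
    for _ in range(rounds):
        a = best_response(row_payoff, col_counts)
        b = best_response(col_payoff, row_counts)
        row_counts[a] += 1
        col_counts[b] += 1
    row_total, col_total = sum(row_counts), sum(col_counts)
    return ([c / row_total for c in row_counts],
            [c / col_total for c in col_counts])


# Matching pennies: the row player wins on a match, the column player on a mismatch.
ROW = [[1, -1], [-1, 1]]
COL = [[-1, 1], [1, -1]]
row_mix, col_mix = fictitious_play(ROW, COL, rounds=10_000)
```

In this game each player's pure best response is exploitable, yet the empirical mixtures drift toward the mixed equilibrium in which both actions are played about half the time. That is the sense in which repeated best responses to past play can expose and correct weaknesses.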

2606.19111 2026-06-18 cs.CL cs.AI cs.MA New submission 85%

Leadership as Coordination Control: Behavioral Signatures and the Recovery-Advantage Boundary in Multi-Agent LLM Teams

Haewoon Kwak

Affiliations: Indiana University Bloomington

Topic match: Multi-agent. Leadership as coordination control in multi-agent LLM teams.

AI summary: Studies when process-level coordination control adds value in multi-agent LLM teams. Behavioral signatures and ablations show that a controller's advantage appears only when the initial majority vote is unreliable, the task is recoverable, and undirected interaction does not already repair it, supporting contingency theory.

Comments 33 pages

Abstract

Team science holds that leadership is contingent: it helps only under specific conditions, and capable, autonomous teams may need none at all. We ask the analogous question for multi-agent LLM teams: under what measurable conditions does process-level coordination control add value, and do those conditions match what team science predicts? We use behavioral signatures (majority lock-in, exploration, recovery from an incorrect round-0 consensus) and per-action ablations, clean because each controller is an explicit action set, not a monolithic prompt. We operationalize three classical leadership styles (transactional, transformational, situational) as controllers over a shared action vocabulary (explore, revise, accept, synthesize). A matched controller with the same actions but an arbitrary rule recovers no better than majority voting, so the theory-derived rule, not the vocabulary, does the work. Across four task regimes and three open-weight model families, no controller dominates by accuracy, as the contingency view predicts: transactional control matches a shared round-0 vote on all 12 (model, regime) combinations to within 1.3pp, and gains appear only on the one combination where the round-0 majority is unreliable (llama-4-scout social; situational +8pp over flat). A recovery-advantage account, tested with four boundary probes, says a controller beats plain interaction only where the round-0 majority is unreliable, the task is recoverable, and undirected interaction does not already repair it. These regions map onto contingency theory (leadership substitutes, path-goal redundancy, the situational readiness gap), so a largely null accuracy result is what the theory predicts, not a failure of the controllers. We read process-level coordination control as a contingency to be measured and theory-mapped, not a leaderboard to be topped.

2606.18268 2026-06-18 cs.SI cs.AI New submission 85%

Towards Multi-Agent-Simulation-Based Community Note Evaluation

Changxi Wen, Shuning Zhang, Bohao Chu, Yuwei Chuai, Hui Wang, Dai Shi, Xin Yi, Hewu Li

Affiliations: Tsinghua University, Beijing, China; University of Duisburg-Essen, Duisburg, Germany; University of Luxembourg, Luxembourg; Tongji University, Shanghai, China

Topic match: Multi-agent. Proposes MultiCom, a multi-agent framework that simulates community note evaluation.

AI summary: Addresses the delay and low share of cross-consensus ratings in community fact-checking. Introduces the ComRate dataset and the MultiCom multi-agent framework, which combines clustering in a matrix-factorized rater space with calibrated aggregation to achieve high-accuracy evaluation.

Abstract

Community-based fact-checking that relies on cross-consensus is expanding rapidly on social media platforms. However, the delay and low ratio of cross-consensus community fact-checks rated by human contributors remain a significant challenge. To address this, we first created ComRate, a large-scale dataset comprising 2.5 million community notes and over 209 million ratings sourced from X. We then propose MultiCom, a persona-guided multi-agent rating framework for community note evaluation. MultiCom simulates a diverse rater population by clustering contributors in a matrix-factorized rater space and prompting persona agents to generate structured assessments based on the official community notes rating schema. These agents output structured and explainable judgments, such as confidence, agreement signals and reasons. An out-of-fold calibrated aggregation algorithm combines features such as raw votes and diagnostic reason signals for reliable prediction. Extensive evaluations demonstrate that MultiCom outperforms alternative methods, achieving an average accuracy of 84.7% (balanced accuracy 68.3%, macro-F1 60.1%) on the evaluation set.
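
The abstract above reports three metrics that can diverge sharply: accuracy, balanced accuracy, and macro-F1. The sketch below computes all three from label lists using their standard definitions. The toy labels are invented purely to show how class imbalance separates the metrics; they are not ComRate data, and the function name `classification_metrics` is hypothetical.

```python
def classification_metrics(y_true, y_pred):
    """Return (accuracy, balanced accuracy, macro-F1) for paired label lists."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true))
    recalls, f1_scores = [], []
    for c in labels:
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        recall = tp / (tp + fn) if tp + fn else 0.0
        precision = tp / (tp + fp) if tp + fp else 0.0
        denom = precision + recall
        f1_scores.append(2 * precision * recall / denom if denom else 0.0)
        recalls.append(recall)
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    balanced_accuracy = sum(recalls) / len(labels)  # mean per-class recall
    macro_f1 = sum(f1_scores) / len(labels)         # unweighted mean of per-class F1
    return accuracy, balanced_accuracy, macro_f1


# Toy, imbalanced example: 90 items of class 0 and 10 items of class 1.
y_true = [0] * 90 + [1] * 10
y_pred = [0] * 88 + [1] * 2 + [1] * 4 + [0] * 6
accuracy, balanced_accuracy, macro_f1 = classification_metrics(y_true, y_pred)
```

On this toy data accuracy is 0.92, while balanced accuracy is about 0.69 and macro-F1 about 0.73, because the minority class is recalled only 40% of the time. A gap of that shape between accuracy and the class-averaged metrics is commonly a sign of imbalanced classes.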

2606.18264 2026-06-18 cs.SI cs.AI cs.CL New submission 85%

Simulating Hate Speech Cascades with Multi-LLM Agents: Empirical Grounding, Modeling Fidelity, and Intervention Strategies

Fan Huang

Affiliations: Indiana University Bloomington

Topic match: Multi-agent. Simulating hate speech propagation and intervention strategies with multi-LLM agents.

AI summary: Simulates online hate speech propagation with a multi-LLM-agent system and finds that it reproduces the stance monoculture and toxicity homophily seen in empirical data. Ablations identify agent heterogeneity as the key fidelity factor, and the study proposes an amplifier-targeting intervention for dense networks.

Abstract

Faithful modeling of hateful content propagation on online platforms remains an open problem for moderation research. Classical cascade models that do not explicitly represent the profile, community, and content factors associated with hateful-content propagation may yield moderation strategies that behave less effectively when deployed in real-world scenarios. Multi-agent large language model (LLM) systems can, in principle, make each reshare decision depend on the user's profile, the surrounding community, and the post's content, but it remains unclear whether this added flexibility actually reproduces real hateful cascades more faithfully than classical baselines. We study three hateful Bluesky cascades and a size-matched benign control. In the empirical Bluesky data, we found that: 97.4–99.7% of reposters take a hostile stance; toxicity-engagement homophily is higher on the diffusion tree than on the follower graph for hateful cascades; topology is star-like for the hateful cascades (most reposts come directly from the root) versus tree-like for the benign cascade (reposts propagate through multi-hop chains). In simulation, a multi-LLM-agent simulator reproduces the stance monoculture and the toxicity-delta direction. A structured ablation identifies agent heterogeneity as the leading fidelity factor, and amplifier targeting on dense networks yields 7.5–12.9% reduction at 5.7% benign collateral.

2606.15504 2026-06-18 cs.AI New submission 85%

Toward Vibe Medicine: A Self-Evolving Multi-Agent Framework for Clinical Decision Support

Qianxue Zhang, Yiming Ren, Shihuan Qin, Xiao Zhang, Liao Zhang, Jinyang Huang, Zhengliang Liu, Chenbin Liu, Hongying Feng, Jingyuan Chen, Yuzhen Ding, Weihang You, Hanqi Jiang, Yi Pan, Yifan Zhou, Junhao Chen, Lifeng Chen, Wei Liu, Tianming Liu, Zengren Zhao, Lian Zhang

Affiliations: Medical AI Lab, The First Hospital of Hebei Medical University; Hebei Provincial Engineering Research Center for AI-Based Cancer Treatment Decision-Making, The First Hospital of Hebei Medical University; State Key Laboratory of Neurology and Oncology Drug Development; School of Computing, University of Georgia; Department of Radiation Oncology, National Cancer Center/National Clinical Research Center for Cancer/Cancer Hospital and Shenzhen Hospital, Chinese Academy of Medical Sciences and Peking Union Medical College; Department of Radiation Oncology, Mayo Clinic; College of Mechanical and Power Engineering, China Three Gorges University; Department of Radiation Oncology, Guangzhou Concord Cancer Center; Gastrointestinal Disease Diagnosis and Treatment Center, The First Hospital of Hebei Medical University; Department of General Surgery, The First Hospital of Hebei Medical University

Topic match: Multi-agent. Proposes a multi-agent framework with three specialized agents.

AI summary: Proposes VIBEMed, a multi-agent framework that uses a self-evolution mechanism and an architecture-level safety sandbox to learn dynamically from interaction history and deliver personalized clinical decision support.

Abstract

In recent years, advances in large language models and autonomous agents have revolutionized the healthcare field, facilitating diagnosis and improving treatment results. However, most existing AI systems rely on pre-trained knowledge and predefined pipelines, which struggle to learn dynamically from the interactive chat session history that contains patient outcomes and past failures. To address this limitation, we propose VIBEMed, a multi-agent framework with a built-in self-evolution mechanism and architecture-level safety sandbox for robust clinical decision support. The system integrates three specialized agents, including a Clinical Diagnostic Agent (CDA) for hypothesis generation, a Therapeutic Execution Agent (TEA) for treatment planning, and a Clinical Evolution Manager Agent (CEMA) that distills longitudinal clinical feedback into reusable knowledge, transforming multimodal patient information into personalized medical decisions. Through its self-evolution mechanism, the framework enables iterative updates across memory, model behavior, and decision strategies, allowing the system to improve over time. Experimental results show that VIBEMed demonstrates superior performance through its evolving mechanism in complex clinical cases, particularly in tasks that require integrated decision-making and longitudinal planning. The framework also supports reliable end-to-end decisions in challenging scenarios such as oncology treatment planning, highlighting its feasibility in real-world clinical contexts. Overall, VIBEMed provides a practical path beyond static AI systems toward adaptive, experience-driven clinical decision support, demonstrating the value of combining multi-agent collaboration with continuous evolution for advancing precision medicine.

2606.07150 2026-06-18 cs.CR cs.AI cs.MA cs.NI New submission 85%

From Privacy to Workflow Integrity: Communication-Graph Metadata in Autonomous Agent Interoperability

Bijaya Dangol

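Affiliations: Independent Researcher

Topic match: Multi-agent. Studies communication-graph metadata threats in agent-interoperability protocols.

AI summary: Addresses metadata leakage from agent communication graphs. Proposes a workflow-integrity threat model, defines transport- and bootstrap-layer privacy properties, and uses an A2A case study to show that metadata protection suppresses task inference.

Comments 22 pages, 7 figures, 6 tables

AI abstract (translated from Chinese)

Agent-interoperability protocols such as A2A and MCP standardize what agents say to one another but assume address-based HTTP(S) transport. Such transports protect message content and increasingly adopt end-to-end encryption. What they leave exposed in the clear is the communication graph: which agent contacts which, when, and how often. In agent systems this graph is more consequential than a privacy framing suggests. Endpoints are typically capability-labeled, workflows are structured and chained, and interactions are coupled to real actions, so an observer recovers more than past relationships. It can infer pending workflows, tasks being assembled, and actions that may be imminent. At machine speed, it can act on that inference before the workflow completes. The threat is therefore one of workflow integrity, not privacy alone: predictive leverage over autonomous action. We give a threat model for the agent communication graph; identify what makes agent metadata distinctively revealing (semantic, forward-looking, actuating); define transport- and bootstrap-layer privacy properties and evaluate how well candidate transports (SimpleX/SMP, Tor, mixnets) match them; and present an A2A case study in which a metadata-protecting binding is expressible but surfaces the protocol's identity assumptions. We test these on a generative model based on real A2A captures. From passive metadata alone, with no payload, a classifier recovers the task class well above chance from the opening of a workflow; applying the properties pulls that recovery sharply back toward chance. Beyond what an observer can recover, we measure the leverage gained by exploiting the leak: from a workflow's opening and under a fixed budget, an adversary choosing which workflows to act on achieves, in this model, most of a clairvoyant attacker's advantage over a metadata-blind attacker, and the same properties suppress it.

English abstract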

Agent-interoperability protocols such as A2A and MCP standardize what agents say to one another but assume address-based transport. Whether over HTTP(S) or a content-protecting binding such as MLS-based SLIM, these transports protect message content yet leave the communication graph exposed: which agent contacts which, when, and how often. In agent systems this graph is more consequential than a privacy framing suggests. Endpoints are capability-labeled, workflows are structured and chained, and interactions are coupled to actions, so an observer recovers more than past relationships: it can recognize a recurring pending workflow from its opening and, at machine speed, act on it before it completes. The threat is one of workflow integrity, not privacy alone. We give a threat model for the communication graph and locate what makes its metadata distinctively consequential: not stronger fingerprinting but exposure across independent trust domains, coupled to autonomous action. We define transport- and bootstrap-layer privacy properties, give them an indistinguishability-game semantics, evaluate transports, and give an A2A case study where a metadata-protecting binding surfaces its implicit identity assumptions. On a corpus of real multi-agent A2A traffic from the official reference agents, on a live A2A binding, and with a generative model as a controlled instrument, a label-blind classifier recovers a task's class from passive metadata at 6x chance, and from only its opening; a defense-aware adversary does not overturn this, and only the full set of properties drives recovery toward chance. Acting on the leak is distinct from recoverability: under a fixed budget an adversary captures 0.63 of a clairvoyant attacker's advantage on the corpus (0.41 from a workflow's opening), governed by top-ranked precision rather than overall accuracy, so integrity and privacy come apart under defense.
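
To make "which agent contacts which, when, and how often" concrete, the sketch below builds a communication graph from passive metadata alone. The record layout `(timestamp, source, destination)`, the endpoint names, and the function name `communication_graph` are assumptions made for illustration. The paper's corpus, classifier, and defenses are not reproduced here.

```python
from collections import Counter


def communication_graph(records):
    """Summarize who contacts whom, when, and how often.

    records: iterable of (timestamp, source, destination) tuples.
    Message payloads are deliberately absent: only metadata is used.
    """
    ordered = sorted(records)  # chronological order by timestamp
    edge_counts = Counter((src, dst) for _, src, dst in ordered)
    first_seen = {}
    for ts, src, dst in ordered:
        first_seen.setdefault((src, dst), ts)  # keep the earliest contact time
    return edge_counts, first_seen


# Hypothetical capability-labeled endpoints observed on the wire.
observed = [
    (3.0, "planner", "payments"),
    (1.0, "planner", "flight-search"),
    (2.0, "planner", "hotel-search"),
    (4.0, "planner", "payments"),
]
counts, first_seen = communication_graph(observed)
# The first two distinct edges stand in for a workflow's "opening".
opening = [edge for edge, _ in sorted(first_seen.items(), key=lambda kv: kv[1])][:2]
```

Even in this toy trace, the opening edges (a planner contacting flight and hotel search) hint at the kind of task being assembled before the payment step occurs, which is the sort of signal the abstract describes an observer exploiting.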

2605.25929 2026-06-18 cs.MA cs.LG Version update 85%

Multi-Agent Systems are Mixtures of Experts: Who Becomes an Influencer?

Franka Bause, Jonas Niederle, Martin Pawelczyk, Rebekka Burkholz

Affiliations: CISPA Helmholtz Center for Information Security; Faculty of Computer Science, University of Vienna

Topic match: Multi-agent. Studies the deliberation mechanism of multi-agent LLMs, which falls under multi-agent systems.

AI summary: Analyzes multi-agent LLM deliberation through the Friedkin-Johnsen (FJ) opinion dynamics model, shows that input-dependent FJ parameters turn the system into a mixture of experts, and examines how influencers emerge from self-assessed confidence, perceived confidence, and initial alignment of views.

Comments Accepted at the 2nd Workshop on Compositional Learning at ICML 2026

Abstract

The effectiveness of multi-agent LLM deliberation depends not only on the agents' individual predictions, but also on how they communicate and collaborate. We study this mechanism through the lens of Friedkin-Johnsen (FJ) opinion dynamics, a tractable model for analyzing stubbornness, influence, and opinion change in multi-agent systems that captures empirically observed deliberation patterns. We show that the FJ parameters are input-dependent, turning multi-agent deliberation into a mixture of experts. This perspective implies that multi-agent systems can outperform single agents and static ensembles when routing reflects agent competence. Since competence is latent in practice, we analyze how influence is established through observable proxies: agents' self-assessed confidence, their perceived confidence, and initial alignment with other agents' views.
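
For readers unfamiliar with Friedkin-Johnsen dynamics, the sketch below implements the standard update in which each agent mixes its neighbors' current opinions with its own initial opinion: x_i(t+1) = s_i * sum_j W[i][j] * x_j(t) + (1 - s_i) * x_i(0), where s_i is the agent's susceptibility and 1 - s_i its stubbornness. The parameters here are fixed constants, so the input-dependent parameters that the abstract identifies as the source of mixture-of-experts behavior are not modeled; the paper's exact parameterization may also differ. The function name and example values are assumptions.

```python
def friedkin_johnsen(W, susceptibility, x0, steps):
    """Iterate the Friedkin-Johnsen opinion update for a fixed number of steps.

    W: row-stochastic influence matrix (list of rows).
    susceptibility: per-agent weight on social influence, in [0, 1].
    x0: initial opinions, which stubborn agents keep returning to.
    """
    x = list(x0)
    for _ in range(steps):
        x = [
            s * sum(w * xj for w, xj in zip(row, x)) + (1.0 - s) * xi0
            for row, s, xi0 in zip(W, susceptibility, x0)
        ]
    return x


# Agent 0 is fully stubborn; agent 1 is fully susceptible.
W = [[0.5, 0.5],
     [0.5, 0.5]]
final = friedkin_johnsen(W, susceptibility=[0.0, 1.0], x0=[1.0, 0.0], steps=50)
```

In this two-agent example the stubborn agent never moves and the susceptible agent is pulled to its opinion, so the stubborn agent ends up determining the group outcome. That is the simplest picture of an "influencer" in the sense of the title.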

2605.18185 2026-06-18 cs.MA Version update 85%

The Dynamics of Policy Gradient in Social Dilemmas with Partner Selection

Benedict Russell, Chin-wing Leung, Paolo Turrini

Topic match: Multi-agent. Studies policy-gradient dynamics in multi-agent social dilemmas.

AI summary: Studies policy-gradient dynamics in multi-agent environments with partner selection, shows how partner selection changes the opponent distribution and the reward landscape, and proves that population variance is a necessary condition for cooperation to emerge under simple rules.

Abstract

In social dilemmas, self-interested learning agents face the choice between the societal benefit of cooperation and the immediate reward of defection. Significant evidence exists on the benefits of assortment mechanisms such as partner selection for the emergence of cooperation, but this is largely available through agent-based simulations. In this paper, we provide an analytical solution to the problem, studying the policy-gradient dynamics in a multi-agent environment with partner selection. We show how partner selection changes the opponent distribution and hence the reward landscape, and prove this promotes cooperation under simple rules known from the literature. In particular, we find that population variance is a necessary condition for cooperation to emerge. Using a two-dimensional Wiener process, we extend the dynamics to capture the stochastic effects of partner selection and the resulting opponent distribution. We derive a sufficient condition for the population to be cooperation-promoting and prove the existence of a stationary distribution. Simulations confirm that the stochastic model accurately captures the policy-gradient dynamics and clarify how the learning rate affects the emergence of cooperation.
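
The claim that partner selection "changes the opponent distribution and hence the reward landscape" can be illustrated with an exact expected-payoff gradient in a one-shot Prisoner's Dilemma. The sketch assumes a sigmoid policy with a single parameter, conventional payoffs T > R > P > S, and two hypothetical conditional probabilities standing in for a partner-selection rule: the chance of meeting a cooperator when cooperating and when defecting. The paper's actual selection rules, population model, and Wiener-process extension are not reproduced.

```python
import math


def cooperation_gradient(theta, q_if_coop, q_if_defect,
                         R=3.0, S=0.0, T=5.0, P=1.0):
    """Gradient of expected payoff with respect to theta, where the
    probability of cooperating is p = sigmoid(theta).

    q_if_coop / q_if_defect: probability that the partner cooperates,
    conditional on the agent's own action (a stand-in for partner selection).
    """
    p = 1.0 / (1.0 + math.exp(-theta))
    payoff_coop = q_if_coop * R + (1.0 - q_if_coop) * S
    payoff_defect = q_if_defect * T + (1.0 - q_if_defect) * P
    # Chain rule: dE/dtheta = dp/dtheta * (E[payoff | C] - E[payoff | D]).
    return p * (1.0 - p) * (payoff_coop - payoff_defect)


def train(theta, q_if_coop, q_if_defect, learning_rate, steps):
    """Gradient ascent on theta; returns the final cooperation probability."""
    for _ in range(steps):
        theta += learning_rate * cooperation_gradient(theta, q_if_coop, q_if_defect)
    return 1.0 / (1.0 + math.exp(-theta))


no_selection = train(0.0, 0.5, 0.5, learning_rate=0.1, steps=1000)
with_assortment = train(0.0, 0.9, 0.1, learning_rate=0.1, steps=1000)
```

When the partner's behavior is independent of the agent's action, the gradient is negative for any Prisoner's Dilemma payoffs, so cooperation decays. When cooperators are sufficiently more likely to meet cooperators, the sign flips and the same update rule increases the cooperation probability.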

2508.21720 2026-06-18 cs.AI Version update 85%

PosterForest: Hierarchical Multi-Agent Collaboration for Scientific Poster Generation

Jiho Choi, Seojeong Park, Seongjong Song, Hyunjung Shim

Affiliations: Graduate School of Artificial Intelligence, KAIST; School of Integrated Technology, Yonsei University

Topic match: Multi-agent. Hierarchical multi-agent collaboration for scientific poster generation.

AI summary: Proposes PosterForest, a training-free framework for scientific poster generation. A Poster Tree represents document structure hierarchically, and content and layout agents perform hierarchical reasoning and recursive refinement to optimize content and layout jointly, improving semantic coherence, logical flow, and visual balance.

Comments ACL 2026

Abstract

Automating scientific poster generation requires hierarchical document understanding and coherent content-layout planning. Existing methods often rely on flat summarization or optimize content and layout separately. As a result, they often suffer from information loss, weak logical flow, and poor visual balance. We present PosterForest, a training-free framework for scientific poster generation. Our method introduces the Poster Tree, a structured intermediate representation that captures document hierarchy and visual-textual semantics across multiple levels. Building on this representation, content and layout agents perform hierarchical reasoning and recursive refinement, progressively optimizing the poster from global organization to local composition. This joint optimization improves semantic coherence, logical flow, and visual harmony. Experiments show that PosterForest outperforms prior methods in both automatic and human evaluations, without additional training or domain-specific supervision.

2606.19135 2026-06-18 cs.MA cs.AI cs.NI New submission 80%

A Technical Taxonomy of LLM Agent Communication Protocols

Linus Sander, Habtom Kahsay Gidey, Alexander Lenz, Alois Knoll

发表机构 * Technische Universität München(慕尼黑工业大学)

专题命中 多智能体 :分类LLM智能体通信协议,核心是Agent通信

AI总结 针对大语言模型智能体通信协议碎片化问题,提出包含五个维度的技术分类法,分析九种开源协议,揭示架构模式并预测协议演进趋势。

详情
AI中文摘要

随着大语言模型(LLM)的进步以及多智能体系统旨在克服单智能体的局限性,健壮的通信协议正成为分布式智能体网络的关键基础设施。然而,碎片化的协议格局带来了显著的互操作性挑战。本研究开发了一种技术分类法,用于分类和分析LLM智能体通信协议。遵循既定的迭代方法,我们定义了分类法的目的、元特征和终止条件,然后在九个积极维护且已有实际采用的开源协议上执行了五次迭代(三次从经验到概念,两次从概念到经验)。该分类法包含五个维度:通信对方、有效载荷、交互状态、发现机制和模式灵活性。分类揭示了重复出现的架构模式:所有采样的智能体间协议都将混合有效载荷与会话状态持久性相结合;大多数协议支持多个预定义模式,其中两个协议在运行时协商模式,表明向模式灵活性的趋势;去中心化发现仍然罕见。分析表明,短期内存在向统一智能体间和智能体-上下文(工具和数据)通信的协议收敛压力。然而,长期来看,没有单一协议能同时最大化通用性、效率和可移植性。该领域更可能演变为联邦式分层协议栈。该框架指导协议选择,并突出开放的研究空白,如隐私和策略执行。

英文摘要

As large language models (LLMs) advance and multi-agent systems aim to overcome the limits of standalone agents, robust communication protocols are becoming essential infrastructure for distributed agent networks. Nonetheless, the fragmented protocol landscape presents a significant interoperability challenge. This study develops a technical taxonomy to classify and analyze LLM agent communication protocols. Following an established iterative method, we defined the taxonomy's purpose, meta-characteristic, and ending conditions, then performed five iterations, three empirical-to-conceptual and two conceptual-to-empirical, on nine actively maintained open-source protocols with demonstrable adoption. The taxonomy comprises five dimensions: counterparty, payload, interaction state, discovery mechanism, and schema flexibility. Classification reveals recurring architectural patterns: all sampled agent-to-agent protocols combine hybrid payloads with session-state persistence; most protocols support multiple predefined schemas, and two negotiate schemas at runtime, indicating a trend toward schema flexibility; decentralized discovery remains rare. Analysis suggests short-term convergence pressure toward protocols unifying agent-to-agent and agent-to-context (tool and data) communication. Long-term, however, no single protocol is likely to maximize versatility, efficiency, and portability simultaneously. The field will more likely evolve toward a federated, layered protocol stack. The framework guides protocol selection and highlights open research gaps such as privacy and policy enforcement.

2606.19080 2026-06-18 eess.SY cs.SY 新提交 80%

Byzantine-Resilient Federated Multi-Agent Optimization Framework for Cyber-Secure Interconnected Microgrids

面向网络安全互联微电网的拜占庭弹性联邦多智能体优化框架

Ali Peivand, Seyyed Mostafa Nosratabadi

专题命中 多智能体 :联邦多智能体优化,拜占庭弹性。

AI总结 提出BR-FedMAPPO框架,结合三重表面移动目标防御与自适应隔离策略,通过两阶段拜占庭弹性聚合规则抵御隐蔽虚假数据注入攻击,保护分布式学习通道并维持经济调度性能。

详情
AI中文摘要

配电网络日益数字化,使得互联微电网集群面临隐蔽虚假数据注入攻击,这些攻击绕过不良数据检测器,通过联络线耦合和共享学习通道传播。本文提出BR-FedMAPPO,一种拜占庭弹性联邦多智能体近端策略优化框架,学习三重表面移动目标防御和自适应隔离策略以实现网络安全运行。每个微电网托管一个本地Actor-Critic智能体,其策略被划分为全局联邦共享编码器和私有保留动作头,因此没有微电网暴露其D-FACTS线路、电池储能单元或联络线容量的配置、基数或位置。动作向量扰动D-FACTS电抗、重定向BES注入、重塑微电网间交换,并包含连续孤岛信号。两阶段拜占庭弹性聚合规则结合了修剪均值滤波和奖励加权更新。该方案基于F1分数和假阳性率纳入检测质量分数,以惩罚引起误报的客户端。在基于IEEE 30节点和118节点测试系统的四个互联微电网上的仿真结果表明,该框架能有效缓解协调的S-FDI攻击,通过自适应隔离遏制级联中断,保护分布式学习通道免受恶意模型操纵,同时保持成本感知的调度性能。

英文摘要

The escalating digitalization of distribution networks has exposed interconnected Microgrid (MG) clusters to Stealthy False Data Injection Attacks that bypass Bad Data Detectors and propagate through tie-line couplings and shared learning channels. This paper proposes BR-FedMAPPO, a Byzantine-Resilient Federated Multi-Agent Proximal Policy Optimization framework that learns a triple-surface Moving Target Defense and an adaptive isolation strategy for cyber-secure operation. Each MG hosts a local Actor-Critic Agent whose policy is partitioned into a globally federated shared encoder and a privately retained action head, so no MG exposes the configurations, cardinality, or locations of its D-FACTS lines, Battery Energy Storage (BES) units, or tie-line capacities. The action vector perturbs D-FACTS reactances, redirects BES injections, reshapes inter-MG exchanges, and includes a continuous islanding signal. A two-stage Byzantine-resilient aggregation rule combines trimmed-mean filtering with reward-weighted updates. This scheme incorporates a detection-quality score based on the F1-score and False Positive Rate to penalize clients causing false alarms. Simulation results on four interconnected MGs based on the IEEE 30- and 118-bus test systems demonstrate effective mitigation of coordinated S-FDI attacks, containment of cascading disruptions through adaptive isolation, and protection of distributed learning channels against malicious model manipulations while maintaining cost-aware dispatch performance.
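The two-stage aggregation rule mentioned in the abstract can be illustrated with a minimal sketch. It assumes a coordinate-wise trimmed mean followed by reward-weighted averaging of the surviving client values; how BR-FedMAPPO actually combines the two stages is not specified in the abstract, so the function name `trimmed_reward_weighted_mean` and its parameters are hypothetical.

```python
def trimmed_reward_weighted_mean(updates, rewards, trim):
    """Aggregate client updates coordinate by coordinate.

    updates: list of equal-length lists, one per client (assumed layout).
    rewards: non-negative score per client (assumed to reflect client quality).
    trim:    number of extreme values dropped at each end of every coordinate.
    """
    n = len(updates)
    if n != len(rewards):
        raise ValueError("one reward per client is required")
    if 2 * trim >= n:
        raise ValueError("trim removes every client")
    aggregated = []
    for j in range(len(updates[0])):
        # Stage 1: rank clients on this coordinate and drop the extremes.
        ranked = sorted(range(n), key=lambda i: updates[i][j])
        kept = ranked[trim:n - trim]
        # Stage 2: reward-weighted average over the surviving clients.
        total = sum(rewards[i] for i in kept)
        if total > 0:
            value = sum(rewards[i] * updates[i][j] for i in kept) / total
        else:
            # Fall back to a plain mean when all surviving rewards are zero.
            value = sum(updates[i][j] for i in kept) / len(kept)
        aggregated.append(value)
    return aggregated
```

In this sketch a single client sending an extreme value on some coordinate is discarded before its reward can influence the result, which is the intuition behind pairing the two stages.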

2606.18829 2026-06-18 cs.LG cs.CL 新提交 80%

GateMem: Benchmarking Memory Governance in Multi-Principal Shared-Memory Agents

GateMem:多主体共享内存代理中的内存治理基准

Zhe Ren, Yibo Yang, Yimeng Chen, Zijun Zhao, Benshuo Fu, Zhihao Shu, Bingjie Zhang, Yangyang Xu, Dandan Guo, Shuicheng Yan

发表机构 * School of Artificial Intelligence, Jilin University(吉林大学人工智能学院) Shanghai Jiao Tong University(上海交通大学) King Abdullah University of Science and Technology (KAUST)(阿卜杜拉国王科技大学) Tsinghua University(清华大学) National University of Singapore(新加坡国立大学)

专题命中 多智能体 :多主体共享内存代理的记忆治理基准

AI总结 提出GateMem基准,评估多主体共享内存代理在效用、访问控制和遗忘三方面的治理能力,发现现有方法无法同时满足三者。

Comments 24 pages, 8 figures. Code and dataset are available at https://github.com/rzhub/GateMem and https://huggingface.co/datasets/Ray368/GateMem

详情
AI中文摘要

LLM代理的内存基准主要假设单用户设置,而医院、工作场所、校园和家庭中的共享助手研究不足。在这些部署中,多个主体写入公共内存池并根据不同角色、范围和关系进行查询,因此内存质量需要治理和召回。我们引入GateMem,一个多主体共享内存代理的基准。GateMem联合评估合法长期请求的效用(含状态更新)、跨上下文授权边界的访问控制,以及显式删除请求后的主动遗忘。它涵盖医疗、办公、教育和家庭领域,包含长形式多方情节、增量内存注入、隐藏检查点、结构化评判和泄漏目标注释。在多种基线和骨干模型上,没有方法能同时实现强效用、鲁棒访问控制和可靠遗忘。长上下文提示通常以高令牌成本获得最佳治理分数,而基于检索和外部内存的方法降低成本但仍泄漏未授权或已删除信息。这些结果表明,当前内存代理远未达到可靠的共享机构部署水平。

英文摘要

Memory benchmarks for LLM agents largely assume single-user settings, leaving shared assistants for hospitals, workplaces, campuses, and households understudied. In these deployments, multiple principals write to a common memory pool and query it under different roles, scopes, and relationships, so memory quality requires governance as well as recall. We introduce GateMem, a benchmark for multi-principal shared-memory agents. GateMem jointly evaluates utility for legitimate long-horizon requests with state updates, access control across contextual authorization boundaries, and agent-facing active forgetting after explicit deletion requests. It spans medical, office, education, and household domains, with long-form multi-party episodes, incremental memory injection, hidden checkpoints, structured judging, and leak-target annotations. Across diverse baselines and backbone models, no method simultaneously achieves strong utility, robust access control, and reliable forgetting. Long-context prompting often yields the best governance score at high token cost, while retrieval-based and external-memory methods reduce cost yet still leak unauthorized or deleted information. These results show current memory agents remain far from reliable shared institutional deployment.

2606.18276 2026-06-18 cs.MA cs.SI physics.soc-ph 新提交 80%

Characterizing Opinion Evolution of Networked LLMs

表征网络化大语言模型的意见演化

Caleb Probine, Yigit Ege Bayiz, Filippos Fotiadis, Samuel Li, Yunhao Yang, Ufuk Topcu

专题命中 多智能体 :研究网络化LLM多智能体系统中的意见演化动力学。

AI总结 研究经典意见动力学模型能否描述多智能体系统中大语言模型(LLM)的意见传播,发现引入偏置项可显著提升建模精度,将平均意见误差降低高达88%。

Comments 19 pages, 2 figures

详情
AI中文摘要

大语言模型(LLM)在多智能体系统中日益相互交互,从人类话语模拟到影响力操作以及完全由LLM驱动的社交平台。这些交互产生了尚未被充分理解的新的意见传播机制。我们研究了长期以来用于解释人类社会中互动如何塑造集体信念的经典意见动力学模型是否能够捕捉LLM网络的行为。我们发现,虽然朴素的平均式模型无法跟踪LLM的意见动态,但简单的修改在建模保真度上带来了显著提升。特别是,偏置——智能体回归的内在意见——成为LLM意见动态的重要驱动因素,其引入将累积估计平均意见误差降低了高达88%。我们还发现,这些结论在不同模型家族、讨论主题和网络中具有普遍性。

英文摘要

Large language models (LLMs) increasingly interact with one another in multi-agent systems, from simulations of human discourse to influence operations and fully LLM-driven social platforms. These interactions give rise to new regimes of opinion propagation that are not yet well understood. We investigate whether classical opinion dynamics models, which have long been used to explain how interactions shape collective beliefs in human societies, can capture the behavior of LLM networks. We find that, while naive averaging-style models fail to track LLMs' opinion dynamics, simple modifications yield substantial gains in modeling fidelity. In particular, bias, an innate opinion toward which agents regress, emerges as a significant driver of LLM opinion dynamics, with its inclusion reducing cumulative estimated mean opinion error by up to 88%. We additionally find that these conclusions generalize across model families, discussion topics, and networks.
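The contrast between naive averaging and averaging with a bias term can be sketched as follows. This is an illustration only: it assumes a DeGroot-style weighted-mean update and a Friedkin-Johnsen-style pull toward an innate opinion, and the names `averaging_step`, `biased_step`, `simulate`, and the mixing parameter `alpha` are hypothetical rather than taken from the paper.

```python
def averaging_step(opinions, weights):
    """One averaging update: each agent adopts a weighted mean of all opinions.

    weights[i][j] is the influence of agent j on agent i; each row is
    assumed to sum to 1.
    """
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]


def biased_step(opinions, weights, bias, alpha):
    """Averaging followed by regression toward each agent's innate opinion.

    alpha in [0, 1] is an assumed mixing weight: 0 recovers plain averaging,
    1 pins every agent to its bias.
    """
    averaged = averaging_step(opinions, weights)
    return [(1 - alpha) * a + alpha * b for a, b in zip(averaged, bias)]


def simulate(opinions, weights, bias, alpha, steps):
    """Iterate the biased update for a fixed number of rounds."""
    for _ in range(steps):
        opinions = biased_step(opinions, weights, bias, alpha)
    return opinions
```

With `alpha = 0` the agents on a connected network drift toward consensus, while any positive `alpha` keeps them anchored near their innate opinions, which is the qualitative difference the abstract attributes to bias.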

2605.01818 2026-06-18 nlin.AO physics.soc-ph 版本更新 80%

Emergent Macro-Criticality from Micro-Critical Agents

从微观临界主体涌现的宏观临界性

Nicolas Bessone, Erwan Plantec

专题命中 多智能体 :多智能体系统,微观临界性涌现宏观临界

AI总结 通过多智能体系统研究微观临界性如何影响集体行为,发现宏观临界性依赖于交互网络的连接性,而非单个智能体的临界动力学。

详情
AI中文摘要

临界性已被提出作为生物和人工系统中复杂行为的关键原则;然而,临界性如何从个体动力学转化为集体行为仍不清楚。我们使用一个具有空间约束交互的多智能体系统来研究这个问题,其中智能体通过外感受器感知邻近的光信号,并通过开关自身的光来行动,从而在宏观层面形成一个动态交互网络。智能体的内部状态在微观层面由储层动力系统控制。通过在动力学临界点附近改变微观参数,并同时改变宏观交互拓扑,我们系统地研究了这两个层面之间的关系。我们发现,单个智能体内的近临界动力学不足以产生集体层面的类临界雪崩统计。相反,无标度行为取决于控制活动传播的宏观交互网络的有效连接性。因此,宏观类临界动力学是由偏离临界性的微观机制实现的,所需的偏离程度取决于交互网络的特性。进一步研究这种关系,我们发现略微亚临界的微观机制能够在更广泛的宏观参数范围内支持近临界动力学。这些结果表明,在这个多智能体系统中,集体近临界行为取决于内部动力学与控制活动传播的交互结构之间的相互作用。

英文摘要

Criticality has been proposed as a key principle underlying complex behavior in biological and artificial systems; however, how criticality translates from individual dynamics to collective behavior remains unclear. We study this question using a multi-agent system with spatially constrained interactions in which agents sense neighboring light signals through exteroceptors and act by switching their own light on or off, thereby forming a dynamical interaction network at the macroscopic level. The agents' internal states are themselves governed by a reservoir dynamical system at the microscopic level. By varying the microscopic parameters around dynamical criticality, together with the macroscopic interaction topology, we systematically investigate the relation between the two levels. We find that near-critical dynamics within individual agents is not sufficient to produce collective critical-like avalanche statistics. Instead, scale-free behavior depends on the effective connectivity of the macroscopic interaction network, which controls activity propagation. As a result, macroscopic critical-like dynamics are enabled by microscopic regimes that deviate from criticality, with the required deviation depending on the properties of the interaction network. Investigating this relation, we find that slightly subcritical micro-level regimes support near-critical dynamics across a wider range of macroscopic parameters. These results show that in this multi-agent system, collective near-critical behavior depends on the interplay between internal dynamics and the interaction structure that governs activity propagation.

2606.19152 2026-06-18 cond-mat.mtrl-sci cs.AI 新提交 80%

AdsMind: A Physics-Grounded Multi-Agent System for Self-Correcting Discovery of Adsorption Configurations on Heterogeneous Catalyst Surfaces

AdsMind: 一种基于物理的多智能体系统,用于异质催化剂表面吸附构型的自校正发现

Zongmin Zhang, Yuyang Lou, Bowen Zhang, Junwu Chen, Ryo Kuroki, Xuan Vu Nguyen, Edvin Fako, Lixue Cheng, Philippe Schwaller

发表机构 * Department of Computer Science Engineering, Hong Kong University of Science Department of Chemistry, Hong Kong University of Science Laboratory of Artificial Chemical Intelligence (LIAC), EPFL, Lausanne, Switzerland Platform Laboratory for Science & Technology, Asahi Kasei Corporation, Tokyo, Japan IAS Center for AI for Scientific Discoveries, Hong Kong University of Science

专题命中 多智能体 :提出闭环多智能体框架,自主纠错搜索。

AI总结 提出AdsMind闭环多智能体框架,利用机器学习力场弛豫反馈实现吸附构型搜索的自主纠错,在基准测试中成功率高达100%和98.8%,且仅需少量弛豫步骤,显著优于启发式枚举和单次方法。

Comments 37 pages, 5 figures

详情
AI中文摘要

识别最低能量的表面-吸附物构型对于模拟异质催化至关重要,然而使用从头计算方法进行穷举探索在计算上是不可行的。机器学习力场(MLFF)加速了结构弛豫,但将广阔构型空间中的搜索留作主要瓶颈,而开环的大语言模型(LLM)智能体缺乏基于物理的反馈机制来纠正错误的初始猜测。我们提出了AdsMind(基于机器智能和弛豫反馈的吸附构型发现),这是一个闭环多智能体框架,通过MLFF弛豫反馈实现自主纠错。在四个LLM后端上,AdsMind实现了持续的高搜索可靠性,在基准AA20和OCD-GMAE62上的成功率分别为100%和98.8%。相对于其单次(1-Shot)消融,它降低了跨后端的能量分散,并且每个案例仅分别使用4.11和4.67次MLFF弛豫——相比启发式枚举基线减少了约14倍。使用VASP/PBE对六个代表性AA20系统进行的密度泛函理论(DFT)验证表明,所报告的开环Adsorb-Agent输出对分子吸附物存在定性的吸附能符号错误,而AdsMind在所有测试案例中均保持正确的符号,且定量一致性更佳。因此,AdsMind同时提供了可靠性、自我反思和可解释性,支持更多基于DFT的自主化学工作流程。

英文摘要

Identifying the lowest-energy surface-adsorbate configuration is critical for modeling heterogeneous catalysis, yet exhaustive exploration with ab initio calculations is computationally prohibitive. Machine-learning force fields (MLFFs) accelerate structural relaxation but leave the search over the vast configurational space a major bottleneck, and open-loop large language model (LLM) agents lack a physics-grounded feedback mechanism to correct erroneous initial guesses. We propose AdsMind (Adsorption configuration discovery with Machine intelligence and relaxation feedback), a closed-loop multi-agent framework that enables autonomous error correction through MLFF relaxation feedback. Across four LLM backends, AdsMind achieves consistently high search reliability, with success rates of 100% and 98.8% on the benchmarks AA20 and OCD-GMAE62. Relative to its single-pass (1-Shot) ablation it reduces cross-backend energy dispersion, and it uses only 4.11 and 4.67 MLFF relaxations per case, respectively -- an approximately 14-fold reduction over heuristic enumeration baselines. Density functional theory (DFT) validation using VASP/PBE on six representative AA20 systems shows that the reported open-loop Adsorb-Agent outputs exhibit qualitative adsorption-energy sign errors for molecular adsorbates, whereas AdsMind preserves the correct sign in all tested cases with closer quantitative agreement. AdsMind thus delivers reliability, self-reflection, and interpretability simultaneously, supporting more DFT-informed autonomous chemistry workflows.

2606.05882 2026-06-18 q-fin.TR 版本更新 80%

Market Informedness and Market-Maker Profitability: The Trade-Off Between Adverse Selection and Price Discovery

市场知情度与做市商盈利能力:逆向选择与价格发现之间的权衡

Konrad Ochędzan, Nino Antulov-Fantulin

专题命中 多智能体 :多智能体强化学习研究市场知情度影响

AI总结 本文通过多智能体强化学习框架研究市场知情度对做市商盈利能力的影响,发现知情订单流在低知情市场中导致严重逆向选择风险,但整体上市场知情度提高带来的价格发现效应抵消了逆向选择的负面影响,使做市商盈利能力呈上升趋势。

详情
AI中文摘要

本文研究了市场知情度对做市商盈利能力的影响。与现有文献不同,分析是在一个复杂的市场环境中进行的:做市智能体具有异质性,在信息集和库存风险厌恶程度上各不相同;价格内生形成,基本面价值外生演化,市价订单流具有自激励特性。本文还为由此产生的状态依赖霍克斯市场接受者过程建立了有限时间范围内的稳定性保证,包括非爆炸性、错误定价的指数可积性、占用时间界限以及逐路径的错误定价尾部估计。为了解决做市问题,该研究采用了基于多智能体近端策略优化(MAPPO)算法的强化学习框架,并采用集中训练与分散执行(CTDE)设置。研究表明,知情市价订单流在低知情市场中尤其危险,会导致严重的逆向选择风险。尽管复杂的市场动态加上随机训练导致了局部非单调的结果,但做市商盈利能力仍随市场知情度的提高呈整体上升趋势,这表明更高市场知情度带来的价格发现效应抵消了逆向选择的负面影响。

英文摘要

This paper studies how market informedness affects market makers' profitability in a computational market environment with heterogeneous learning agents. We develop an agent-based market model in which market makers differ in their information sets and inventory-risk aversion, prices form endogenously, fundamental values evolve exogenously, and market-taker order flow follows a state-dependent self-exciting process. The model provides a controlled computational laboratory for analyzing the interaction between informed trading, adverse selection, price discovery, and liquidity provision. We establish finite-horizon stability properties of the market-taker order-flow process and solve the market-making problem using multi-agent reinforcement learning with centralized training and decentralized execution. The results show that informed market order flow is particularly harmful when aggregate market informedness is low, exposing market makers to severe adverse-selection risk. However, as market informedness increases, market-maker profitability displays an overall upward trend despite local non-monotonicities arising from complex market dynamics and stochastic learning. This suggests that the price-discovery benefits of informed trading can offset its adverse-selection costs. The findings contribute to computational economics by showing how agent heterogeneity, endogenous price formation, and learning-based liquidity provision jointly shape market outcomes.

2603.01221 2026-06-18 cs.MA 版本更新 80%

Epistemic Gain, Aleatoric Cost: Uncertainty Decomposition in Multi-Agent Debate for Math Reasoning

认知增益,偶然成本:面向数学推理的多智能体辩论中的不确定性分解

Dan Qiao, Binbin Chen, Fengyu Cai, Jianlong Chen, Wenhao Li, Fuxin Jiang, Zuzhi Chen, Hongyuan Zha, Tieying Zhang, Baoxiang Wang

专题命中 多智能体 :多智能体辩论框架,强化学习优化

AI总结 本文提出贝叶斯不确定性分析框架,将多智能体辩论中的预测不确定性分解为认知不确定性和偶然不确定性,并设计不确定性引导的多智能体强化学习算法,在控制偶然成本的同时提升认知增益,从而提高推理准确性和辩论效率。

Comments ICML 2026

详情
AI中文摘要

多智能体辩论(MAD)在改善推理和减少幻觉方面显示出前景,但信息交换如何塑造个体推理行为仍不清楚。经验上,MAD表现出矛盾现象,包括准确率随token熵增加而上升,以及同质和异质智能体组合之间的显著差异。在本文中,我们引入了一个用于MAD的贝叶斯不确定性分析框架,该框架将答案级别的预测不确定性分解为认知不确定性和偶然不确定性,分别对应辩论的潜在增益和成本。在多种智能体配置下,我们发现有效的辩论取决于在受控的偶然成本下实现高认知增益。基于这一见解,我们设计了一种不确定性引导的多智能体强化学习算法,鼓励更低的偶然成本和更有效的认知信息利用。实验表明,我们的方法同时提高了每个智能体的准确性,并促进了更富有成效的辩论过程,为理解和改进MAD提供了一个可操作的贝叶斯视角。

英文摘要

Multi-Agent Debate (MAD) has shown promise in improving reasoning and reducing hallucinations, yet it remains unclear how information exchange shapes individual reasoning behavior. Empirically, MAD exhibits paradoxical phenomena, including rising accuracy with increasing token entropy and marked differences between homogeneous and heterogeneous agent combinations. In this paper, we introduce a Bayesian uncertainty analysis framework for MAD, which decomposes answer-level predictive uncertainty into epistemic uncertainty and aleatoric uncertainty, corresponding to the potential gain and cost of debate. Across multiple agent configurations, we find that effective debate depends on achieving high epistemic gain under controlled aleatoric cost. Building on this insight, we design an uncertainty-guided multi-agent reinforcement learning algorithm that encourages lower aleatoric cost and more effective epistemic information utilization. Experiments show that our approach simultaneously enhances each agent's accuracy and promotes a more productive debate process, providing an operational Bayesian perspective for understanding and improving MAD.
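The split of predictive uncertainty into epistemic and aleatoric parts can be illustrated with the standard entropy decomposition: total entropy of the averaged answer distribution equals the mean per-sample entropy (aleatoric) plus the remaining mutual-information term (epistemic). This is a generic sketch, not the paper's answer-level Bayesian estimator; the function names and the assumption that each sample is a probability vector over the same candidate answers are hypothetical.

```python
import math


def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(x * math.log(x) for x in p if x > 0)


def decompose(distributions):
    """Split predictive uncertainty over a set of answer distributions.

    distributions: list of probability vectors over the same answers,
    e.g. one per agent or per posterior sample (an assumed setup).
    Returns (total, aleatoric, epistemic) with total = aleatoric + epistemic.
    """
    n = len(distributions)
    k = len(distributions[0])
    mean = [sum(d[j] for d in distributions) / n for j in range(k)]
    total = entropy(mean)                                 # entropy of the averaged prediction
    aleatoric = sum(entropy(d) for d in distributions) / n  # expected per-sample entropy
    epistemic = total - aleatoric                         # disagreement between samples
    return total, aleatoric, epistemic
```

Two confident samples that disagree yield purely epistemic uncertainty, whereas identical uniform samples yield purely aleatoric uncertainty, which matches the gain-versus-cost reading in the abstract.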

2606.18836 2026-06-18 cs.HC cs.AI 新提交 70%

Improving Human-Robot Teamwork in Urban Search and Rescue Through Episodic Memory of Prior Collaboration

通过先前协作的片段记忆改善城市搜索与救援中的人机团队合作

Taewoon Kim, Emma van Zoelen, Mark Neerincx

发表机构 * HumemAI, The Netherlands(荷兰HumemAI) Vrije Universiteit Amsterdam, The Netherlands(荷兰阿姆斯特丹自由大学) TNO, The Netherlands(荷兰TNO)

专题命中 多智能体 :人机团队,记忆复用。

AI总结 提出利用知识图谱片段记忆存储历史协作模式,通过图表示学习选择代表性记忆初始化机器人,在MATRX USAR环境中将救援成功率从25.7%提升至41.3%,任务时间减少283秒。

详情
AI中文摘要

有效的人机团队合作要求机器人从交互开始就适应伙伴、情境和任务动态。在MATRX城市搜索与救援(USAR)环境中,人们可以通过聊天和反思界面将他们在团队合作中发现的协作模式(CPs)外部化。我们研究机器人是否可以利用这种先前的团队经验,在未来的交互中成为更好的队友。为此,我们将历史CPs表示为知识图谱片段记忆,并使用具有节点分类目标的图表示学习来识别一个代表性且有效的记忆以供重用。然后,在新的协作片段开始之前,我们用该记忆初始化机器人。在20名参与者和160轮次观察中,用单个自动选择的先前CP初始化机器人将救援成功率从25.7%提高到41.3%,并将平均任务时间减少283秒。最强的提升出现在交互开始时,表明可重用的片段记忆可以帮助机器人以更有效的任务知识进入协作,并支持更顺畅的早期团队合作。

英文摘要

Effective human-robot teamwork requires robots to adapt to partners, situations, and task dynamics from the start of an interaction. In the MATRX Urban Search and Rescue (USAR) environment, people can externalize collaboration patterns (CPs) they discover during teamwork through a chat and reflection interface. We study whether a robot can use such prior team experience to become a better teammate in future interactions. To this end, we represent historical CPs as knowledge-graph episodic memories and use graph representation learning with a node-classification objective to identify a representative and effective memory for reuse. We then initialize the robot with this memory before a new collaboration episode begins. Across 20 participants and 160 round-level observations, initializing the robot with a single automatically selected prior CP increases rescue success from 25.7% to 41.3% and reduces average task time by 283 seconds. The strongest gains appear at the beginning of interaction, suggesting that reusable episodic memory can help robots enter collaboration with more effective task knowledge and support smoother early teamwork.