AI Agent - arXivDaily Topic

2506.09046 2026-06-18 cs.LG cs.AI cs.MA Version update 90%

Self-Evolving Multi-Agent Systems via Textual Backpropagation

Xiaowen Ma, Yunpu Ma, Chenyang Lin, Sikuan Yan, Jinhe Bi, Zixuan Cao, Yijun Tian, Volker Tresp, Hinrich Schuetze

Affiliations: Ludwig Maximilian University of Munich; Technical University of Munich; Munich Center for Machine Learning; University of Notre Dame

Topic match (multi-agent): Proposes a self-evolving multi-agent system that optimizes collaboration through textual backpropagation.

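AI summary: Proposes the Agentic Neural Network framework, which models multi-agent collaboration as a layered neural network. A forward pass decomposes the task and a backward pass propagates feedback, so that agents' roles, prompts, and coordination self-evolve. The framework outperforms existing methods on seven benchmark datasets.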

English abstract

Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative team focused on a specific subtask. Our framework follows a two-phase optimization strategy: (1) Forward Phase - Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase - Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables our framework to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across seven benchmark datasets, our work surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements.
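
The two phases described in the abstract can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the `Agent` and `Layer` classes, the `join_outputs` aggregation, and the `toy_llm` and `toy_critic` stubs are all hypothetical names introduced here, and a real system would call an actual LLM in their place.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an LLM call: (prompt, input text) -> output text.
LLM = Callable[[str, str], str]


def join_outputs(outputs: List[str]) -> str:
    """Assumed aggregation method: concatenate the team's outputs."""
    return " | ".join(outputs)


@dataclass
class Agent:
    role: str
    prompt: str


@dataclass
class Layer:
    agents: List[Agent]
    aggregate: Callable[[List[str]], str] = join_outputs


def forward(layers, llm, task):
    """Forward phase: each layer's team processes the previous layer's output."""
    trace = []
    x = task
    for layer in layers:
        outputs = [llm(agent.prompt, x) for agent in layer.agents]
        x = layer.aggregate(outputs)
        trace.append(outputs)
    return x, trace


def backward(layers, trace, critic, global_feedback):
    """Backward phase: walk the layers in reverse and turn textual
    feedback into prompt revisions (the 'textual gradient')."""
    feedback = global_feedback
    for layer, outputs in zip(reversed(layers), reversed(trace)):
        for agent, output in zip(layer.agents, outputs):
            agent.prompt += "\n[revision] " + critic(agent.role, output, feedback)
        # Pass a (trivially rewritten) feedback message to the earlier layer.
        feedback = "upstream of: " + feedback


# Toy stubs so the sketch runs without a model.
def toy_llm(prompt, x):
    return prompt.splitlines()[0] + ": " + x


def toy_critic(role, output, feedback):
    return role + " should address '" + feedback + "'"


layers = [
    Layer([Agent("planner", "Plan the task"), Agent("solver", "Solve the task")]),
    Layer([Agent("checker", "Check the answer")]),
]
answer, trace = forward(layers, toy_llm, "2+2")
backward(layers, trace, toy_critic, "answer lacks justification")
```

After one forward and backward pass, every agent's prompt carries a revision note derived from the feedback, which is the sense in which the prompts "self-evolve" in this toy version.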

2605.25929 2026-06-18 cs.MA cs.LG Version update 85%

Multi-Agent Systems are Mixtures of Experts: Who Becomes an Influencer?

Franka Bause, Jonas Niederle, Martin Pawelczyk, Rebekka Burkholz

Affiliations: CISPA Helmholtz Center for Information Security; Faculty of Computer Science, University of Vienna

Topic match (multi-agent): Studies deliberation mechanisms among multiple LLM agents, which falls under multi-agent systems.

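AI summary: Analyzes deliberation among multiple LLM agents through the Friedkin-Johnsen (FJ) opinion-dynamics model, shows that input-dependent FJ parameters turn the system into a mixture of experts, and examines how influencers emerge from confidence, perceived confidence, and the alignment of initial opinions.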

Comments: Accepted at the 2nd Workshop on Compositional Learning at ICML 2026
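
For readers unfamiliar with the Friedkin-Johnsen model named in the summary, the sketch below shows its standard update rule, x_i(t+1) = lam_i * sum_j W[i][j] * x_j(t) + (1 - lam_i) * x_i(0), in plain Python. The function names and the two-agent example are assumptions made for illustration; how the paper estimates the parameters from LLM behaviour is not shown here.

```python
def fj_step(x, x0, W, lam):
    """One Friedkin-Johnsen update.

    W is a row-stochastic influence matrix, lam[i] is agent i's
    susceptibility (lam[i] = 0 means fully stubborn), x0 holds the
    initial opinions.
    """
    n = len(x)
    return [
        lam[i] * sum(W[i][j] * x[j] for j in range(n)) + (1 - lam[i]) * x0[i]
        for i in range(n)
    ]


def fj_fixed_point(x0, W, lam, iters=200):
    """Iterate the update until it has (numerically) settled."""
    x = list(x0)
    for _ in range(iters):
        x = fj_step(x, x0, W, lam)
    return x


def influence_weights(W, lam):
    """Row i gives the weights with which agent i's final opinion mixes
    everyone's initial opinion. The map from x0 to the fixed point is
    linear, so column j is obtained by starting from the unit vector e_j."""
    n = len(W)
    columns = []
    for j in range(n):
        unit = [1.0 if k == j else 0.0 for k in range(n)]
        columns.append(fj_fixed_point(unit, W, lam))
    return [[columns[j][i] for j in range(n)] for i in range(n)]


# Two agents who listen only to each other, each half-susceptible.
W = [[0.0, 1.0], [1.0, 0.0]]
lam = [0.5, 0.5]
final = fj_fixed_point([1.0, 0.0], W, lam)
V = influence_weights(W, lam)
```

Each row of `V` sums to one, so every final opinion is a weighted mixture of the initial opinions. If `W` and `lam` depend on the input, the weights act like an input-dependent gate, which is one way to read the "mixture of experts" claim; an agent with a large column in `V` plays the role of an influencer.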

2605.18185 2026-06-18 cs.MA Version update 85%

The Dynamics of Policy Gradient in Social Dilemmas with Partner Selection

Benedict Russell, Chin-wing Leung, Paolo Turrini

Topic match (multi-agent): Studies policy-gradient dynamics in multi-agent social dilemmas.

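AI summary: Studies policy-gradient dynamics in a multi-agent environment with partner selection. The paper shows how partner selection changes the opponent distribution and the reward landscape, proves that this promotes cooperation under simple rules, and finds that population variance is a necessary condition for cooperation to emerge.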

English abstract

In social dilemmas self-interested learning agents face the choice between the societal benefit of cooperation and the immediate reward of defection. Significant evidence exists on the benefits of assortment mechanisms such as partner selection for the emergence of cooperation, but this is largely available through agent-based simulations. In this paper, we provide an analytical solution to the problem, studying the policy-gradient dynamics in a multi-agent environment with partner selection. We show how partner selection changes the opponent distribution and hence the reward landscape, and prove this promotes cooperation under simple rules known from the literature. In particular, we find that population variance is a necessary condition for cooperation to emerge. Using a two-dimensional Wiener process, we extend the dynamics to capture the stochastic effects of partner selection and the resulting opponent distribution. We derive a sufficient condition for the population to be cooperation-promoting and prove the existence of a stationary distribution. Simulations confirm that the stochastic model accurately captures the policy-gradient dynamics and clarifies how the learning rate affects the emergence of cooperation.
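
One way to build intuition for why population variance matters is to look at how a selection rule reshapes the opponent distribution. The sketch below assumes a specific rule chosen purely for illustration (a partner is picked with probability proportional to that partner's cooperation probability); the abstract does not say which rules the paper analyzes, so the rule and the function names are assumptions.

```python
def mean(values):
    return sum(values) / len(values)


def variance(values):
    """Population variance (divides by n, not n - 1)."""
    m = mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def selected_partner_cooperation(p):
    """Expected cooperation probability of the chosen partner when
    partner j is picked with probability p[j] / sum(p).

    Algebraically this equals mean(p) + variance(p) / mean(p), so the
    selection rule only helps when the population is heterogeneous.
    """
    return sum(pj * pj for pj in p) / sum(p)


mixed = [0.2, 0.8]        # heterogeneous population
uniform = [0.5, 0.5]      # same mean, zero variance

boosted = selected_partner_cooperation(mixed)
unchanged = selected_partner_cooperation(uniform)
```

With the heterogeneous population the expected partner cooperates with probability 0.68 instead of the population mean 0.5, while the uniform population sees no change. Under this assumed rule, zero variance means selection cannot shift the opponent distribution at all.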

2508.21720 2026-06-18 cs.AI Version update 85%

PosterForest: Hierarchical Multi-Agent Collaboration for Scientific Poster Generation

Jiho Choi, Seojeong Park, Seongjong Song, Hyunjung Shim

Affiliations: Graduate School of Artificial Intelligence, KAIST; School of Integrated Technology, Yonsei University

Topic match (multi-agent): Hierarchical multi-agent collaboration for generating scientific posters.

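AI summary: Proposes PosterForest, a training-free framework for scientific poster generation. A Poster Tree represents the document structure hierarchically, and content and layout agents perform hierarchical reasoning and recursive refinement to optimize content and layout jointly, improving semantic coherence, logical flow, and visual balance.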

Comments: ACL 2026

English abstract

Automating scientific poster generation requires hierarchical document understanding and coherent content-layout planning. Existing methods often rely on flat summarization or optimize content and layout separately. As a result, they often suffer from information loss, weak logical flow, and poor visual balance. We present PosterForest, a training-free framework for scientific poster generation. Our method introduces the Poster Tree, a structured intermediate representation that captures document hierarchy and visual-textual semantics across multiple levels. Building on this representation, content and layout agents perform hierarchical reasoning and recursive refinement, progressively optimizing the poster from global organization to local composition. This joint optimization improves semantic coherence, logical flow, and visual harmony. Experiments show that PosterForest outperforms prior methods in both automatic and human evaluations, without additional training or domain-specific supervision.
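
To make the idea of a hierarchical intermediate representation concrete, here is a toy tree in which a layout budget is refined from the whole poster down to individual sections. It is only a sketch of the general "global organization to local composition" pattern: the `PosterNode` class, the word-count weighting, and the `allocate` function are hypothetical and are not taken from PosterForest.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PosterNode:
    """Hypothetical node of a poster tree: a section with optional children."""
    title: str
    text: str = ""
    children: List["PosterNode"] = field(default_factory=list)
    area: float = 0.0  # fraction of the poster assigned to this node


def weight(node: PosterNode) -> int:
    """Assumed importance proxy: number of words in the subtree."""
    return len(node.text.split()) + sum(weight(c) for c in node.children)


def allocate(node: PosterNode, area: float) -> None:
    """Top-down refinement: split a node's area among its children
    in proportion to their weights, then recurse."""
    node.area = area
    total = sum(weight(c) for c in node.children)
    for child in node.children:
        share = area * weight(child) / total if total else 0.0
        allocate(child, share)


method = PosterNode("Method", text="word " * 30)
intro = PosterNode("Introduction", text="word " * 10)
poster = PosterNode("Poster", children=[intro, method])
allocate(poster, 1.0)
```

Here the introduction receives a quarter of the poster and the method section three quarters. In a fuller system, content and layout agents would revise the text and the shares at each level instead of using a fixed word-count rule.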

2605.01818 2026-06-18 nlin.AO physics.soc-ph Version update 80%

Emergent Macro-Criticality from Micro-Critical Agents

Nicolas Bessone, Erwan Plantec

Topic match (multi-agent): A multi-agent system in which macroscopic criticality emerges from micro-critical agents.

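AI summary: Uses a multi-agent system to study how microscopic criticality affects collective behavior. Near-critical dynamics within individual agents are not sufficient on their own; macroscopic criticality depends on the effective connectivity of the interaction network.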

English abstract

Criticality has been proposed as a key principle underlying complex behavior in biological and artificial systems; however, how criticality translates from individual dynamics to collective behavior remains unclear. We study this question using a multi-agent system with spatially constrained interactions in which agents sense neighboring light signals through exteroceptors and act by switching their own light on or off, thereby forming a dynamical interaction network at the macroscopic level. The agents' internal states are themselves governed by a reservoir dynamical system at the microscopic level. By varying the microscopic parameters around dynamical criticality, together with the macroscopic interaction topology, we systematically investigate the relation between the two levels. We find that near-critical dynamics within individual agents is not sufficient to produce collective critical-like avalanche statistics. Instead, scale-free behavior depends on the effective connectivity of the macroscopic interaction network, which controls activity propagation. As a result, macroscopic critical-like dynamics are enabled by microscopic regimes that deviate from criticality, with the required deviation depending on the properties of the interaction network. Investigating this relation, we find that slightly subcritical micro-level regimes support near-critical dynamics across a wider range of macroscopic parameters. These results show that in this multi-agent system, collective near-critical behavior depends on the interplay between internal dynamics and the interaction structure that governs activity propagation.
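
The role of connectivity in activity propagation can be illustrated with a textbook branching process, which is a deliberate simplification and not the reservoir-based agent model of the paper. In this sketch each active unit has `k` neighbors and activates each with probability `p`, so the branching ratio is `k * p`: below 1 avalanches die out quickly, near 1 they become large and broadly distributed. All names and parameter values are assumptions for illustration.

```python
import random


def avalanche_size(k, p, rng, cap=10_000):
    """Total number of activations triggered by one seed unit.

    Each active unit tries to activate k neighbors, each succeeding
    with probability p. The cap stops runaway supercritical avalanches.
    """
    active, size = 1, 1
    while active and size < cap:
        new = sum(1 for _ in range(active * k) if rng.random() < p)
        size += new
        active = new
    return size


def mean_avalanche_size(k, p, n=5000, seed=0):
    """Average avalanche size over n independent seeds (fixed RNG seed)."""
    rng = random.Random(seed)
    return sum(avalanche_size(k, p, rng) for _ in range(n)) / n


# Same branching ratio (0.5) reached with different connectivity.
sparse = mean_avalanche_size(k=2, p=0.25)
dense = mean_avalanche_size(k=4, p=0.125)
# Closer to criticality (branching ratio 0.9): much larger avalanches.
near_critical = mean_avalanche_size(k=2, p=0.45, n=2000)
```

For a branching ratio below 1 the expected avalanche size is 1 / (1 - k * p), so both of the first two settings average about 2 even though their per-link probabilities differ. The point of the toy model is that propagation is governed by the product of connectivity and per-link activation, not by either factor alone.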

2606.05882 2026-06-18 q-fin.TR Version update 80%

Market Informedness and Market-Maker Profitability: The Trade-Off Between Adverse Selection and Price Discovery

Konrad Ochędzan, Nino Antulov-Fantulin

Topic match (multi-agent): Uses multi-agent reinforcement learning to study the effect of market informedness.

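AI summary: Uses a multi-agent reinforcement learning framework to study how market informedness affects market-maker profitability. Informed order flow creates severe adverse-selection risk in markets with low informedness, but overall the price-discovery effect of higher informedness offsets the adverse-selection cost, so market-maker profitability trends upward.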

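AI abstract (translated from Chinese)

This paper studies how market informedness affects market-maker profitability. Unlike the existing literature, the analysis is carried out in a complex market environment with heterogeneous market-making agents that differ in their information sets and inventory-risk aversion, together with endogenous price formation, exogenous fundamental-value dynamics, and self-exciting market order flow. The paper also establishes finite-horizon stability guarantees for the resulting state-dependent Hawkes market-taker process, including non-explosion, exponential integrability of the mispricing, occupation-time bounds, and pathwise tail estimates for the mispricing. To solve the market-making problem, the study uses a reinforcement learning framework based on the Multi-Agent Proximal Policy Optimization (MAPPO) algorithm in a centralized-training, decentralized-execution (CTDE) setting. The results show that informed market order flow is especially dangerous in markets with low informedness, where it causes severe adverse-selection risk. Although complex market dynamics combined with stochastic training produce locally non-monotonic results, market-maker profitability still shows an overall upward trend as market informedness increases, which suggests that the price-discovery effect of higher informedness offsets the negative impact of adverse selection.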

English abstract

This paper studies how market informedness affects market makers' profitability in a computational market environment with heterogeneous learning agents. We develop an agent-based market model in which market makers differ in their information sets and inventory-risk aversion, prices form endogenously, fundamental values evolve exogenously, and market-taker order flow follows a state-dependent self-exciting process. The model provides a controlled computational laboratory for analyzing the interaction between informed trading, adverse selection, price discovery, and liquidity provision. We establish finite-horizon stability properties of the market-taker order-flow process and solve the market-making problem using multi-agent reinforcement learning with centralized training and decentralized execution. The results show that informed market order flow is particularly harmful when aggregate market informedness is low, exposing market makers to severe adverse-selection risk. However, as market informedness increases, market-maker profitability displays an overall upward trend despite local non-monotonicities arising from complex market dynamics and stochastic learning. This suggests that the price-discovery benefits of informed trading can offset its adverse-selection costs. The findings contribute to computational economics by showing how agent heterogeneity, endogenous price formation, and learning-based liquidity provision jointly shape market outcomes.
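
As background on self-exciting order flow, the sketch below implements a plain Hawkes process with an exponential kernel, where each arrival temporarily raises the arrival intensity: lambda(t) = mu + sum over past events s of alpha * exp(-beta * (t - s)). This is the standard, state-independent version; the paper's process is state-dependent, and the function names and parameter values here are assumptions for illustration.

```python
import math
import random


def hawkes_intensity(t, events, mu, alpha, beta):
    """Arrival intensity at time t given strictly earlier event times."""
    return mu + sum(alpha * math.exp(-beta * (t - s)) for s in events if s < t)


def simulate_hawkes(horizon, mu, alpha, beta, seed=0):
    """Simulate event times on [0, horizon) by thinning.

    Between events the intensity only decays, so the intensity just
    after the current time is a valid upper bound for proposing the
    next candidate arrival. Requires mu > 0.
    """
    rng = random.Random(seed)
    t, events = 0.0, []
    while True:
        # Upper bound: counts an event at time t itself, if there is one.
        bound = mu + sum(alpha * math.exp(-beta * (t - s)) for s in events)
        t += rng.expovariate(bound)
        if t >= horizon:
            return events
        # Accept the candidate with probability intensity / bound.
        if rng.random() * bound <= hawkes_intensity(t, events, mu, alpha, beta):
            events.append(t)


# alpha / beta = 0.8 < 1: each order triggers fewer than one follow-up
# order on average, so the process does not explode.
orders = simulate_hawkes(horizon=50.0, mu=0.5, alpha=0.8, beta=1.0)
```

The ratio `alpha / beta` is the expected number of follow-up arrivals per arrival; keeping it below 1 is the usual stability condition for this basic model, which gives a feel for the kind of non-explosion property the paper establishes for its more general process.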

2603.01221 2026-06-18 cs.MA Version update 80%

Epistemic Gain, Aleatoric Cost: Uncertainty Decomposition in Multi-Agent Debate for Math Reasoning

Dan Qiao, Binbin Chen, Fengyu Cai, Jianlong Chen, Wenhao Li, Fuxin Jiang, Zuzhi Chen, Hongyuan Zha, Tieying Zhang, Baoxiang Wang

Topic match (multi-agent): A multi-agent debate framework optimized with reinforcement learning.

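AI summary: Proposes a Bayesian uncertainty analysis framework that decomposes predictive uncertainty in multi-agent debate into epistemic and aleatoric uncertainty, and designs an uncertainty-guided multi-agent reinforcement learning algorithm that raises epistemic gain while controlling aleatoric cost, improving reasoning accuracy and debate efficiency.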

Comments: ICML 2026

English abstract

Multi-Agent Debate (MAD) has shown promise in improving reasoning and reducing hallucinations, yet it remains unclear how information exchange shapes individual reasoning behavior. Empirically, MAD exhibits paradoxical phenomena, including rising accuracy with increasing token entropy and marked differences between homogeneous and heterogeneous agent combinations. In this paper, we introduce a Bayesian uncertainty analysis framework for MAD, which decomposes answer-level predictive uncertainty into epistemic uncertainty and aleatoric uncertainty, corresponding to the potential gain and cost of debate. Across multiple agent configurations, we find that effective debate depends on achieving high epistemic gain under controlled aleatoric cost. Building on this insight, we design an uncertainty-guided multi-agent reinforcement learning algorithm that encourages lower aleatoric cost and more effective epistemic information utilization. Experiments show that our approach simultaneously enhances each agent's accuracy and promotes a more productive debate process, providing an operational Bayesian perspective for understanding and improving MAD.
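
A common way to split answer-level predictive uncertainty is the entropy decomposition sketched below: total uncertainty is the entropy of the averaged answer distribution, aleatoric uncertainty is the average entropy of the individual distributions, and epistemic uncertainty is the difference (a mutual information, which is never negative). The paper's exact estimator may differ; treating each agent's answer distribution as one sample is an assumption made here for illustration.

```python
import math


def entropy(dist):
    """Shannon entropy in nats, skipping zero-probability answers."""
    return -sum(q * math.log(q) for q in dist if q > 0)


def decompose(answer_dists):
    """Return (total, aleatoric, epistemic) for a list of answer
    distributions, one per agent (or per sampled reasoning path)."""
    n = len(answer_dists)
    k = len(answer_dists[0])
    mean_dist = [sum(d[j] for d in answer_dists) / n for j in range(k)]
    total = entropy(mean_dist)
    aleatoric = sum(entropy(d) for d in answer_dists) / n
    return total, aleatoric, total - aleatoric


# Two confident agents that disagree: all uncertainty is epistemic,
# so exchanging information has something to gain.
disagree = decompose([[1.0, 0.0], [0.0, 1.0]])

# Two unsure agents that agree: all uncertainty is aleatoric,
# so debate has little to resolve.
unsure = decompose([[0.5, 0.5], [0.5, 0.5]])
```

Both cases have the same total uncertainty (ln 2), yet they call for different treatment, which matches the abstract's framing of epistemic uncertainty as the potential gain of debate and aleatoric uncertainty as its cost.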
