AI Agent

2508.04086 2026-06-18 cs.CL Version update Topic 95

ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients"

Zhongyi Zhou, Kohei Uehara, Haoyu Zhang, Jingtao Zhou, Lin Gu, Ruofei Du, Zheng Xu, Tatsuya Harada

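Affiliations: Google; The University of Tokyo; RIKEN AIP; Tohoku University

Topic match (tool calling): proposes the ToolGrad framework for generating tool-use datasets.

AI summary: Proposes the ToolGrad framework, which uses an iterative process guided by textual "gradients" to first build valid tool-use chains and then synthesize user queries. This gives low-cost, high-success-rate data generation, and models trained on the data outperform the baselines.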

Comments ACL 2026 Findings. Source code: https://github.com/zhongyi-zhou/toolgrad

2605.29676 2026-06-18 cs.AI cs.CL Version update Topic 85

Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems

Lorenz Kutschka, Bernhard Geiger

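Affiliations: Know Center Research GmbH; Graz University of Technology; Graz Center for Machine Learning

Topic match (tool calling): token-optimized formats in agentic systems, aimed at more efficient tool calling.

AI summary: Evaluates two token-optimized formats, TOON and TRON, on four agentic benchmarks. TRON reduces tokens by up to 27% with accuracy within 14pp of the JSON baseline. TOON reduces tokens by up to 18% but suffers cascading multi-turn parsing failures and collapsed parallel tool-call output.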

Comments 16 pages, 6 figures, 4 tables

English abstract

Large language models in Agentic AI systems consume tool schemas and execution results and emit tool invocations as structured data. The default language for that exchange, JSON, was designed for application-to-application interchange rather than token efficiency, so its structural elements impose substantial token overhead. Recent work proposes token-optimized alternatives such as TOON (Token-Oriented Object Notation) and TRON (Token Reduced Object Notation) as more compact replacements, but these formats have been evaluated only on isolated comprehension or generation tasks. Whether their token reductions hold inside end-to-end agentic loops therefore remains an open question. We evaluate TOON and TRON on four agentic benchmarks (BFCL, MCPToolBenchPP, MCP-Universe, StableToolBench) and five open-weight LLMs, decoupling input compression from output compression to measure comprehension and generation independently. TRON reduces tokens by up to 27% with accuracy within 14pp of the JSON baseline. TOON achieves up to 18% reduction at a similar 9pp accuracy cost, but additionally cascades on multi-turn parsing failures and collapses parallel tool-call output for most models. The code is available at: https://github.com/lkutschka/notation-matters
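
To make the overhead claim concrete, the sketch below serializes the same tool results as JSON and as a header-plus-rows layout, then compares character counts as a rough proxy for token counts. The compact layout, the field names, and the helper names are illustrative assumptions; this is not the TOON or TRON syntax, and real savings depend on the tokenizer.

```python
import json

# Hypothetical tool results; the field names are illustrative only.
rows = [
    {"id": 1, "city": "Graz", "temp_c": 21},
    {"id": 2, "city": "Vienna", "temp_c": 24},
    {"id": 3, "city": "Linz", "temp_c": 19},
]

def to_compact(records):
    """Emit one header line with the keys, then one comma-separated line per record."""
    keys = list(records[0])
    lines = [",".join(keys)]
    lines += [",".join(str(record[key]) for key in keys) for record in records]
    return "\n".join(lines)

def reduction_pct(baseline, candidate):
    """Relative size reduction of candidate versus baseline, in percent."""
    return 100.0 * (len(baseline) - len(candidate)) / len(baseline)

as_json = json.dumps(rows)
as_compact = to_compact(rows)
print(f"JSON: {len(as_json)} chars, compact: {len(as_compact)} chars")
print(f"reduction: {reduction_pct(as_json, as_compact):.1f}%")
```

The repeated keys, quotes, and braces account for most of the difference here, which is the kind of structural overhead the abstract refers to.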

2606.01139 2026-06-18 cs.AI Version update Topic 90

SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision

Yuxuan Liu, Zhaochen Su, Lingyun Xie, Yuhao Zhang, Qing Zong, Jiahe Guo, Zhongwei Xie, Yiyan Ji, Yauwai Yim, Hongyu Luo, Xiyu Ren, Ruan Chenyu, Haoran Li, Yangqiu Song

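Affiliations: The Hong Kong University of Science and Technology; Harbin Institute of Technology; Harbin Institute of Technology, Shenzhen; Nanjing University; The University of Hong Kong

Topic match (software agents): iterative refinement of agent skills to raise LLM agent success rates.

AI summary: Proposes the SkillRevise framework, which iteratively refines an initial skill through diagnosis from execution evidence, retrieval of repair principles, and execution-anchored edits. It raises the base agent's success rate on SkillsBench from 36.05% to 61.63%, and the revised skills transfer across models.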

Comments 15 pages, 4 figures

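AI Chinese abstract (translated)

Agent skills are procedural artifacts that enable LLM agents to execute workflows, verify constraints, and recover from failures. Existing self-evolving methods refine skills using accumulated trajectories, but they perform poorly in cold-start settings where only an initial, imperfect skill is available. Skill construction therefore defaults to expert authoring or one-shot LLM generation. Expert-authored skills are costly and may not match how LLM agents actually execute tasks, while one-shot generated skills can be syntactically well formed yet behaviorally weak. To bridge this gap, we propose SkillRevise, an execution-grounded framework that iteratively refines these initial skills. SkillRevise diagnoses skill defects from execution evidence, retrieves relevant repair principles from a general memory, and applies execution-anchored edits. By re-executing candidate skills and measuring their empirical utility, it systematically retains the best skill version. Evaluated across three benchmarks and five LLMs, SkillRevise substantially outperforms one-shot baselines, improving the base agent's success rate on SkillsBench from 36.05% to 61.63%. The revised skills also show strong cross-model transferability, capturing general procedural knowledge beyond model-specific artifacts.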

English abstract

Agent skills are procedural artifacts that enable LLM agents to execute workflows, verify constraints, and recover from failures. Existing self-evolving methods refine skills using accumulated trajectories. However, they struggle in cold-start settings, where only an initial, imperfect skill is available. Consequently, skill construction defaults to expert authoring or one-shot LLM generation. Expert-authored skills are costly and may not align with how LLM agents actually execute tasks, while one-shot generated skills can be syntactically well formed yet behaviorally weak. To bridge this gap, we propose SkillRevise, an execution-grounded framework designed to iteratively refine these initial skills. SkillRevise diagnoses skill defects from execution evidence, retrieves relevant repair principles from a general memory, and applies execution-anchored edits. By re-executing candidates, it retains the first verifier-passing skill within the revision budget and falls back to empirical utility only when no candidate succeeds. Evaluated across three benchmarks and five LLMs, SkillRevise substantially outperforms one-shot baselines, improving the base agent's success rate on SkillsBench from 36.05% to 61.63%. Furthermore, the revised skills transfer across both executors and task environments, suggesting that SkillRevise captures reusable procedural knowledge beyond any single executor.
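
The retention rule in the English abstract (keep the first verifier-passing skill within the revision budget, otherwise fall back to empirical utility) can be sketched as a small selection loop. Everything here is a hypothetical stand-in: `select_skill`, `revise`, and `execute` are illustrative names, skills are plain strings, and counting the initial skill as the first candidate is an assumption rather than something the abstract states.

```python
from typing import Callable, List, Tuple

def select_skill(
    initial: str,
    revise: Callable[[str], str],
    execute: Callable[[str], Tuple[bool, float]],
    budget: int,
) -> str:
    """Return the first candidate that passes the verifier within `budget`
    revisions; if none passes, return the failed candidate with the highest utility."""
    candidate = initial
    failed: List[Tuple[float, str]] = []
    for step in range(budget + 1):  # the initial skill plus up to `budget` revisions
        passed, utility = execute(candidate)
        if passed:
            return candidate
        failed.append((utility, candidate))
        if step < budget:
            candidate = revise(candidate)
    return max(failed, key=lambda item: item[0])[1]

# Toy stand-ins: each revision appends a fix, and two fixes satisfy the verifier.
def toy_revise(skill: str) -> str:
    return skill + "+fix"

def toy_execute(skill: str) -> Tuple[bool, float]:
    fixes = skill.count("+fix")
    return fixes >= 2, float(fixes)

print(select_skill("draft", toy_revise, toy_execute, budget=3))
print(select_skill("draft", toy_revise, toy_execute, budget=1))
```

With a budget of 3 the loop stops at the first passing candidate; with a budget of 1 no candidate passes, so the utility fallback applies.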

2604.06367 2026-06-18 cs.CR cs.AI cs.LG Version update Topic 90

WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks

Guruprasad Viswanathan Ramesh, Asmit Nayak, Basieem Siddique, Kassem Fawaz

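Affiliations: University of Wisconsin-Madison

Topic match (software agents): evaluates web agents on security and privacy tasks.

AI summary: Proposes the WebSP-Eval framework, which uses 200 task instances and an automated evaluator to test multimodal large language models on website security and privacy tasks. It finds that stateful UI elements are a primary cause of failure, with toggles causing more than 45% task failure across many models.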

Comments Accepted at PETS 2026. Project Page: https://wiscprivacy.com/webspeval/

English abstract

Web agents automate browser tasks, ranging from simple form completion to complex workflows like ordering groceries. While current benchmarks evaluate general-purpose performance (e.g., WebArena) or safety against malicious actions (e.g., SafeArena), no existing framework assesses an agent's ability to successfully execute user-facing website security and privacy tasks, such as managing cookie preferences, configuring privacy-sensitive account settings, or revoking inactive sessions. To address this gap, we introduce WebSP-Eval, an evaluation framework for measuring web agent performance on website security and privacy tasks. WebSP-Eval comprises 1) a manually crafted task dataset of 200 task instances across 28 websites; 2) a robust agentic system supporting account and initial state management across runs using a custom Google Chrome extension; and 3) an automated evaluator. We evaluate a total of 8 web agent instantiations using state-of-the-art multimodal large language models, conducting a fine-grained analysis across websites, task categories, and UI elements. Our evaluation reveals that current models suffer from limited autonomous exploration capabilities to reliably solve website security and privacy tasks, and struggle with specific task categories and websites. Crucially, we identify stateful UI elements as a primary reason for agent failure, with toggles causing more than 45% task failure across many models.

2506.09046 2026-06-18 cs.LG cs.AI cs.MA Version update Topic 90

Self-Evolving Multi-Agent Systems via Textual Backpropagation

Xiaowen Ma, Yunpu Ma, Chenyang Lin, Sikuan Yan, Jinhe Bi, Zixuan Cao, Yijun Tian, Volker Tresp, Hinrich Schuetze

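Affiliations: Ludwig Maximilian University of Munich; Technical University of Munich; Munich Center for Machine Learning; University of Notre Dame

Topic match (multi-agent): proposes a self-evolving multi-agent system that optimizes collaboration through textual backpropagation.

AI summary: Proposes the Agentic Neural Network framework, which models multi-agent collaboration as a layered neural network. A forward phase decomposes tasks and a backward phase propagates feedback, letting agents self-evolve their roles, prompts, and coordination. The framework surpasses existing methods on seven benchmark datasets.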

English abstract

Leveraging multiple Large Language Models (LLMs) has proven effective for addressing complex, high-dimensional tasks, but current approaches often rely on static, manually engineered multi-agent configurations. To overcome these constraints, we present the Agentic Neural Network (ANN), a framework that conceptualizes multi-agent collaboration as a layered neural network architecture. In this design, each agent operates as a node, and each layer forms a cooperative team focused on a specific subtask. Our framework follows a two-phase optimization strategy: (1) Forward Phase - Drawing inspiration from neural network forward passes, tasks are dynamically decomposed into subtasks, and cooperative agent teams with suitable aggregation methods are constructed layer by layer. (2) Backward Phase - Mirroring backpropagation, we refine both global and local collaboration through iterative feedback, allowing agents to self-evolve their roles, prompts, and coordination. This neuro-symbolic approach enables our framework to create new or specialized agent teams post-training, delivering notable gains in accuracy and adaptability. Across seven benchmark datasets, our work surpasses leading multi-agent baselines under the same configurations, showing consistent performance improvements.

2605.25929 2026-06-18 cs.MA cs.LG Version update Topic 85

Multi-Agent Systems are Mixtures of Experts: Who Becomes an Influencer?

Franka Bause, Jonas Niederle, Martin Pawelczyk, Rebekka Burkholz

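Affiliations: CISPA Helmholtz Center for Information Security; Faculty of Computer Science, University of Vienna

Topic match (multi-agent): studies deliberation mechanisms in multi-agent LLM systems.

AI summary: Analyzes multi-agent LLM deliberation with the Friedkin-Johnsen (FJ) opinion dynamics model, shows that input-dependent FJ parameters make the system a mixture of experts, and examines how influencers emerge based on confidence, perceived confidence, and alignment of initial opinions.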

Comments Accepted at the 2nd Workshop on Compositional Learning at ICML 2026
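
For readers unfamiliar with the Friedkin-Johnsen model named in the summary, the sketch below implements the standard update, in which each agent mixes a weighted average of its neighbors' opinions with its own initial opinion. The weights, susceptibilities, and opinions are made-up numbers, and the parameters are held fixed here, whereas the paper studies input-dependent parameters; treat this as background on the classical model, not as the paper's method.

```python
def fj_step(x, x0, weights, susceptibility):
    """One Friedkin-Johnsen update:
    x_i <- s_i * sum_j W_ij * x_j + (1 - s_i) * x0_i."""
    n = len(x)
    return [
        susceptibility[i] * sum(weights[i][j] * x[j] for j in range(n))
        + (1.0 - susceptibility[i]) * x0[i]
        for i in range(n)
    ]

def fj_run(x0, weights, susceptibility, steps=200):
    """Iterate the update from the initial opinions x0."""
    x = list(x0)
    for _ in range(steps):
        x = fj_step(x, x0, weights, susceptibility)
    return x

# Illustrative values: agent 0 is fully stubborn (susceptibility 0).
x0 = [1.0, 0.0, 0.5]
weights = [
    [0.6, 0.2, 0.2],  # each row sums to 1
    [0.3, 0.4, 0.3],
    [0.3, 0.3, 0.4],
]
susceptibility = [0.0, 0.9, 0.9]

final = fj_run(x0, weights, susceptibility)
print(final)
```

In this toy setting the stubborn agent never moves and pulls the two susceptible agents toward its opinion, which is one simple way an "influencer" can arise in the classical model.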

2605.18185 2026-06-18 cs.MA Version update Topic 85

The Dynamics of Policy Gradient in Social Dilemmas with Partner Selection

Benedict Russell, Chin-wing Leung, Paolo Turrini

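Topic match (multi-agent): studies policy-gradient dynamics in multi-agent social dilemmas.

AI summary: Studies policy-gradient dynamics in a multi-agent environment with partner selection, shows how partner selection changes the opponent distribution and the reward landscape, and finds that population variance is a necessary condition for cooperation to emerge under simple rules.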

English abstract

In social dilemmas self-interested learning agents face the choice between the societal benefit of cooperation and the immediate reward of defection. Significant evidence exists on the benefits of assortment mechanisms such as partner selection for the emergence of cooperation, but this is largely available through agent-based simulations. In this paper, we provide an analytical solution to the problem, studying the policy-gradient dynamics in a multi-agent environment with partner selection. We show how partner selection changes the opponent distribution and hence the reward landscape, and prove this promotes cooperation under simple rules known from the literature. In particular, we find that population variance is a necessary condition for cooperation to emerge. Using a two-dimensional Wiener process, we extend the dynamics to capture the stochastic effects of partner selection and the resulting opponent distribution. We derive a sufficient condition for the population to be cooperation-promoting and prove the existence of a stationary distribution. Simulations confirm that the stochastic model accurately captures the policy-gradient dynamics and clarifies how the learning rate affects the emergence of cooperation.

2508.21720 2026-06-18 cs.AI Version update Topic 85

PosterForest: Hierarchical Multi-Agent Collaboration for Scientific Poster Generation

Jiho Choi, Seojeong Park, Seongjong Song, Hyunjung Shim

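Affiliations: Graduate School of Artificial Intelligence, KAIST; School of Integrated Technology, Yonsei University

Topic match (multi-agent): hierarchical multi-agent collaboration for generating scientific posters.

AI summary: Proposes PosterForest, a training-free framework for scientific poster generation. It represents document structure hierarchically with a Poster Tree and uses content and layout agents for hierarchical reasoning and recursive refinement, jointly optimizing content and layout to improve semantic coherence, logical flow, and visual balance.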

Comments ACL 2026

English abstract

Automating scientific poster generation requires hierarchical document understanding and coherent content-layout planning. Existing methods often rely on flat summarization or optimize content and layout separately. As a result, they often suffer from information loss, weak logical flow, and poor visual balance. We present PosterForest, a training-free framework for scientific poster generation. Our method introduces the Poster Tree, a structured intermediate representation that captures document hierarchy and visual-textual semantics across multiple levels. Building on this representation, content and layout agents perform hierarchical reasoning and recursive refinement, progressively optimizing the poster from global organization to local composition. This joint optimization improves semantic coherence, logical flow, and visual harmony. Experiments show that PosterForest outperforms prior methods in both automatic and human evaluations, without additional training or domain-specific supervision.

2605.01818 2026-06-18 nlin.AO physics.soc-ph Version update Topic 80

Emergent Macro-Criticality from Micro-Critical Agents

Nicolas Bessone, Erwan Plantec

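Topic match (multi-agent): a multi-agent system in which macro-criticality emerges from micro-critical agents.

AI summary: Uses a multi-agent system to study how microscopic criticality affects collective behavior, and finds that macroscopic criticality depends on the connectivity of the interaction network rather than on near-critical dynamics within individual agents alone.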

English abstract

Criticality has been proposed as a key principle underlying complex behavior in biological and artificial systems; however, how criticality translates from individual dynamics to collective behavior remains unclear. We study this question using a multi-agent system with spatially constrained interactions in which agents sense neighboring light signals through exteroceptors and act by switching their own light on or off, thereby forming a dynamical interaction network at the macroscopic level. The agents' internal states are themselves governed by a reservoir dynamical system at the microscopic level. By varying the microscopic parameters around dynamical criticality, together with the macroscopic interaction topology, we systematically investigate the relation between the two levels. We find that near-critical dynamics within individual agents is not sufficient to produce collective critical-like avalanche statistics. Instead, scale-free behavior depends on the effective connectivity of the macroscopic interaction network, which controls activity propagation. As a result, macroscopic critical-like dynamics are enabled by microscopic regimes that deviate from criticality, with the required deviation depending on the properties of the interaction network. Investigating this relation, we find that slightly subcritical micro-level regimes support near-critical dynamics across a wider range of macroscopic parameters. These results show that in this multi-agent system, collective near-critical behavior depends on the interplay between internal dynamics and the interaction structure that governs activity propagation.

2606.05882 2026-06-18 q-fin.TR Version update Topic 80

Market Informedness and Market-Maker Profitability: The Trade-Off Between Adverse Selection and Price Discovery

Konrad Ochędzan, Nino Antulov-Fantulin

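Topic match (multi-agent): uses multi-agent reinforcement learning to study the effect of market informedness.

AI summary: Studies how market informedness affects market-maker profitability using a multi-agent reinforcement learning framework. Informed order flow causes severe adverse-selection risk when market informedness is low, but overall the price-discovery effect of higher informedness offsets the adverse-selection cost, so market-maker profitability trends upward.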

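AI Chinese abstract (translated)

This paper studies how market informedness affects market makers' profitability. Unlike the existing literature, the analysis is carried out in a complex market environment with heterogeneous market-making agents that differ in their information sets and inventory-risk aversion, endogenous price formation, exogenous fundamental-value dynamics, and self-exciting market order flow. The paper also establishes finite-horizon stability guarantees for the resulting state-dependent Hawkes market-taker process, including non-explosion, exponential integrability of mispricing, occupation-time bounds, and pathwise mispricing tail estimates. To solve the market-making problem, the study uses a reinforcement learning framework based on the multi-agent proximal policy optimization (MAPPO) algorithm with centralized training and decentralized execution (CTDE). The results show that informed market order flow is especially dangerous in low-informedness markets, where it causes severe adverse-selection risk. Although complex market dynamics combined with stochastic training produce locally non-monotonic results, market-maker profitability still trends upward overall as market informedness rises, which suggests that the price-discovery effect of higher market informedness offsets the negative effect of adverse selection.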

English abstract

This paper studies how market informedness affects market makers' profitability in a computational market environment with heterogeneous learning agents. We develop an agent-based market model in which market makers differ in their information sets and inventory-risk aversion, prices form endogenously, fundamental values evolve exogenously, and market-taker order flow follows a state-dependent self-exciting process. The model provides a controlled computational laboratory for analyzing the interaction between informed trading, adverse selection, price discovery, and liquidity provision. We establish finite-horizon stability properties of the market-taker order-flow process and solve the market-making problem using multi-agent reinforcement learning with centralized training and decentralized execution. The results show that informed market order flow is particularly harmful when aggregate market informedness is low, exposing market makers to severe adverse-selection risk. However, as market informedness increases, market-maker profitability displays an overall upward trend despite local non-monotonicities arising from complex market dynamics and stochastic learning. This suggests that the price-discovery benefits of informed trading can offset its adverse-selection costs. The findings contribute to computational economics by showing how agent heterogeneity, endogenous price formation, and learning-based liquidity provision jointly shape market outcomes.

2603.01221 2026-06-18 cs.MA Version update Topic 80

Epistemic Gain, Aleatoric Cost: Uncertainty Decomposition in Multi-Agent Debate for Math Reasoning

Dan Qiao, Binbin Chen, Fengyu Cai, Jianlong Chen, Wenhao Li, Fuxin Jiang, Zuzhi Chen, Hongyuan Zha, Tieying Zhang, Baoxiang Wang

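Topic match (multi-agent): a multi-agent debate framework optimized with reinforcement learning.

AI summary: Proposes a Bayesian uncertainty analysis framework that decomposes predictive uncertainty in multi-agent debate into epistemic and aleatoric uncertainty, and designs an uncertainty-guided multi-agent reinforcement learning algorithm that raises epistemic gain while controlling aleatoric cost, improving reasoning accuracy and debate efficiency.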

Comments ICML2026

English abstract

Multi-Agent Debate (MAD) has shown promise in improving reasoning and reducing hallucinations, yet it remains unclear how information exchange shapes individual reasoning behavior. Empirically, MAD exhibits paradoxical phenomena, including rising accuracy with increasing token entropy and marked differences between homogeneous and heterogeneous agent combinations. In this paper, we introduce a Bayesian uncertainty analysis framework for MAD, which decomposes answer-level predictive uncertainty into epistemic uncertainty and aleatoric uncertainty, corresponding to the potential gain and cost of debate. Across multiple agent configurations, we find that effective debate depends on achieving high epistemic gain under controlled aleatoric cost. Building on this insight, we design an uncertainty-guided multi-agent reinforcement learning algorithm that encourages lower aleatoric cost and more effective epistemic information utilization. Experiments show that our approach simultaneously enhances each agent's accuracy and promotes a more productive debate process, providing an operational Bayesian perspective for understanding and improving MAD.
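
One common way to split answer-level predictive uncertainty is the entropy-based decomposition used for ensembles: total uncertainty is the entropy of the averaged answer distribution, the aleatoric part is the average per-agent entropy, and the epistemic part is the difference between them. The sketch below applies that textbook decomposition to made-up answer distributions. Whether the paper uses exactly this estimator is an assumption, and the function names are illustrative.

```python
import math

def entropy(p):
    """Shannon entropy in nats; zero-probability entries contribute nothing."""
    return -sum(q * math.log(q) for q in p if q > 0.0)

def decompose(answer_dists):
    """Return (total, aleatoric, epistemic) for per-agent answer distributions.
    total     = entropy of the mean distribution
    aleatoric = mean of the per-agent entropies
    epistemic = total - aleatoric (disagreement between agents)."""
    k = len(answer_dists)
    m = len(answer_dists[0])
    mean = [sum(dist[i] for dist in answer_dists) / k for i in range(m)]
    total = entropy(mean)
    aleatoric = sum(entropy(dist) for dist in answer_dists) / k
    return total, aleatoric, total - aleatoric

# Two confident agents that disagree: the uncertainty is all epistemic.
print(decompose([[1.0, 0.0], [0.0, 1.0]]))
# Two agents that are each unsure in the same way: it is all aleatoric.
print(decompose([[0.5, 0.5], [0.5, 0.5]]))
```

Under this reading, disagreement that debate could resolve shows up as the epistemic term, while noise inside each agent's own answers shows up as the aleatoric term.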

2606.07591 2026-06-18 cs.LG cs.AI cs.CL Version update Topic 85

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

Wanghan Xu, Shuo Li, Tianlin Ye, Qinglong Cao, Yixin Chen, Hengjian Gao, Yiheng Wang, Qi Li, Kun Li, Sheng Xu, Shengdu Chai, Fangchen Yu, Xiangyu Zhao, Zhangrui Zhao, Weijie Ma, Zijie Guo, Koutian Wu, Haoyu Zhou, Haoxiang Yin, Lixue Cheng, Chaofan Hu, Haoxuan Li, Lu Mi, Xuxuan Xie, Yifan Zhou, Ruizhe Chen, Zhiwang Zhou, Xingjian Guo, Yuhao Zhou, Xuming He, Shengyuan Xu, Xinyu Gu, Jiamin Wu, Mianxin Liu, Chunfeng Song, Fenghua Ling, Dongzhan Zhou, Shixiang Tang, Yuqiang Li, Mao Su, Peng Ye, Siqi Sun, Bin Wang, Xue Yang, Zhenfei Yin, Tianfan Fu, Guangtao Zhai, Wanli Ouyang, Bo Zhang, Lei Bai, Wenlong Zhang

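Affiliations: Shanghai Artificial Intelligence Laboratory

Topic match (other agents): a benchmark that evaluates agents on autonomous scientific research.

AI summary: Proposes the ResearchClawBench benchmark, with 40 tasks across 10 domains, which evaluates autonomous research capability using multimodal rubrics. The strongest autonomous agent averages only 21.5, exposing weaknesses of current systems in experimental protocol, evidence matching, and scientific core.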

English abstract

AI coding agents are increasingly used for scientific work, but their end-to-end autonomous research capability remains difficult to verify. We present ResearchClawBench, a benchmark for evaluating autonomous scientific research across 40 tasks from 10 scientific domains. Each task is grounded in a real published paper, provides related literature and raw data, and hides the target paper during evaluation. Expert-curated multimodal rubrics decompose the target scientific artifacts into weighted criteria, enabling evaluation of target-paper-level re-discovery while leaving room for new discovery. We evaluate seven autonomous research (auto-research) agents under a unified protocol and seventeen native LLMs through the lightweight ResearchHarness. Current systems remain far from reliable re-discovery: the strongest autonomous agent, Claude Code, averages 21.5, and the strongest ResearchHarness LLM, Claude-Opus-4.7, averages 20.7, with an LLM frontier mean of only 26.5. Error analysis shows that failures concentrate in experimental protocol mismatch, evidence mismatch, and missing scientific core. ResearchClawBench provides a reproducible evaluation frontier for measuring progress toward autonomous scientific research.

2511.13979 2026-06-18 cs.HC Version update Topic 80

Personality Pairing Improves Human-AI Collaboration

Harang Ju, Sinan Aral

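Topic match (other agents): studies AI agent personality in collaboration with humans.

AI summary: In a large-scale experiment that paired humans with AI agents exhibiting different Big Five personality traits, finds that personality pairing significantly affects ad quality and team performance. Extraverted humans paired with conscientious AI produced the lowest-quality ads, while neurotic humans paired with neurotic AI achieved the highest click-through rates.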

Comments 29 pages, 5 figures

English abstract

Here we examine how AI agent "personalities" interact with human personalities to shape human-AI collaboration and performance. In a large-scale, preregistered randomized experiment, we paired 1,258 participants with AI agents prompted to exhibit varying levels of the Big Five personality traits. These human-AI teams produced 7,266 display ads for a real think tank, which we evaluated using 1,168 independent human raters, and a field experiment on X that generated nearly 5 million impressions. We found that human and AI personalities individually shaped ad quality and teamwork and that human-AI personality pairings directly influenced ad quality and ad performance. For example, extraverted humans paired with conscientious AI produced the lowest quality ads, followed by conscientious humans paired with agreeable AI and neurotic humans paired with conscientious AI. In the field experiment, ad quality significantly influenced ad performance, measured by click-through rates and cost-per-click, and neurotic humans paired with neurotic AI achieved the highest click-through rates. Together, these results demonstrate that personality pairing can improve human-AI collaboration and performance. They also motivate future research on the complex implications of AI personalization for human-AI collaboration, teamwork and performance.

2602.22222 2026-06-18 cs.IR cs.MA Version update Topic 80

TWICE: Modeling the Temporal Evolution of Personalized User Behavior via Event-Driven Agents

Bingrui Jin, Kunyao Lan, Baihan LI, Mengyue Wu

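Topic match (other agents): an LLM-based, event-driven user simulation agent, which falls under AI agents.

AI summary: Proposes the TWICE framework, which combines structured user profiling, an event-driven memory module, and a two-stage workflow to simulate the temporal evolution of user behavior with an LLM, and outperforms baselines on a Twitter dataset.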

English abstract

User simulators are widely used for data generation, evaluation, and agent-based interaction, but existing approaches often model users as static personas or rely on generic historical context, making it difficult to capture how individual behavior evolves over time. To address this limitation, we propose TWICE, an LLM-based framework for temporally grounded personalized user simulation. TWICE combines structured user profiling, an event-driven memory module organized around life events and behavioral shifts, and a two-stage workflow separating event-grounded content planning from personalized style adaptation. This design enables the simulator to model not only what a user says, but also how past experiences shape later expression. We evaluate TWICE on a large-scale longitudinal Twitter dataset and introduce a comprehensive evaluation framework that jointly measures authenticity, consistency, and humanlikeness. Results show that TWICE consistently outperforms strong baselines, suggesting that event-centered memory is a promising mechanism for modeling the temporal evolution of personalized user behavior.

2507.23644 2026-06-18 cs.MA Version update Topic 70

Agents Trusting Agents? Restoring Lost Capabilities with Inclusive Healthcare

Alba Aguilera, Georgina Curto, Nardine Osman, Ahmed Al-Awah

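Topic match (other agents): uses agent-based simulation to evaluate healthcare policies, which falls under AI agents.

AI summary: Uses agent-based simulation and Bayesian inverse reinforcement learning to evaluate policies for improving equity in health care for people experiencing homelessness in Barcelona, modeling trust relationships in order to restore their central capabilities.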

English abstract

Agent-based simulations have an untapped potential to inform social policies on urgent human development challenges in a non-invasive way, before these are implemented in real-world populations. This paper responds to the request from non-profit and governmental organizations to evaluate policies under discussion to improve equity in health care services for people experiencing homelessness (PEH) in the city of Barcelona. With this goal, we integrate the conceptual framework of the capability approach (CA), which is explicitly designed to promote and assess human well-being, to model and evaluate the behaviour of agents who represent PEH and social workers. We define a reinforcement learning environment where agents aim to restore their central human capabilities, under existing environmental and legal constraints. We use Bayesian inverse reinforcement learning (IRL) to calibrate profile-dependent behavioural parameters in PEH agents, modeling the degree of trust and engagement with social workers, which is reportedly a key element for the success of the policies in scope. Our results open a path to mitigate health inequity by building relationships of trust between social service workers and PEH.

2605.30880 2026-06-18 cs.CL cs.AI Version update Topic 85

PatchWorld: Gradient-Free Optimization of Executable World Models

Jiaxin Bai, Yue Guo, Yifei Dong, Jiaxuan Xiong, Tianshi Zheng, Yixia Li, Tianqing Fang, Yufei Li, Yisen Gao, Haoyu Huang, Zhongwei Xie, Hong Ting Tsang, Zihao Wang, Lihui Liu, Jeff Z. Pan, Yangqiu Song

Affiliations: Hong Kong Baptist University; Independent Researcher; HKUST; Beijing Institute of Technology; Southern University of Science and Technology; Wayne State University; University of Edinburgh

Topic match (planning and decision-making): executable world models for agent planning and prediction.

AI summary: Proposes the PatchWorld framework, which turns offline trajectories into executable Python world models through counterexample-guided code repair, yielding gradient-free symbolic belief-state programs that reach 76.4% macro success across AgentGym environments.

Comments 40 pages

English abstract

Text-agent environments are typically modeled as partially observable Markov decision processes (POMDPs), assuming that the simulator's latent state and transition dynamics are hidden from the agent. Yet little work has examined whether executable code can be induced to serve as a world model for prediction and planning under partial observability. We introduce PatchWorld, a gradient-free framework that turns offline trajectories into executable Python world models through counterexample-guided code repair. Instead of predicting the next observation with a black-box model, PatchWorld induces symbolic belief-state programs whose action updates can be inspected, replayed, and locally patched. Across seven AgentGym environments, PatchWorld-Simple achieves the highest code-based planning score among evaluated methods, reaching 76.4% macro success in live one-step lookahead while invoking no LLM calls inside the world-model prediction module itself. We further find that a human-specified residual-memory bias improves surface observation fidelity but weakens decision utility. This exposes a tradeoff in executable world models, since improving observation fidelity can come at the expense of action-discriminative dynamics, and vice versa. Code is available at https://github.com/HKBU-KnowComp/PatchWorld.
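
A toy illustration of two ideas from the abstract: checking an executable world model against logged transitions to find counterexamples, and using the model for one-step lookahead. The corridor environment, the function names, and the hand-written patch are all hypothetical. The sketch uses a fully observed state and skips the belief-state aspect, and it does not reproduce how PatchWorld generates its repairs.

```python
MOVES = {"left": -1, "right": 1}

def predict_v1(state, action):
    """First-draft world model: ignores the corridor walls."""
    return state + MOVES[action]

def predict_v2(state, action):
    """Locally patched model: positions are clamped to cells 0..4."""
    return min(4, max(0, state + MOVES[action]))

def find_counterexamples(transitions, predict):
    """Return logged (state, action, observed_next) triples the model mispredicts."""
    return [(s, a, nxt) for s, a, nxt in transitions if predict(s, a) != nxt]

def one_step_lookahead(state, actions, predict, score):
    """Pick the action whose predicted next state scores highest."""
    return max(actions, key=lambda action: score(predict(state, action)))

# Hypothetical offline trajectory in a five-cell corridor with walls at both ends.
logged = [(2, "right", 3), (4, "right", 4), (0, "left", 0)]
goal = 4

def score(state):
    """Higher is better: negative distance to the goal cell."""
    return -abs(goal - state)

print(find_counterexamples(logged, predict_v1))  # transitions the draft gets wrong
print(find_counterexamples(logged, predict_v2))  # none left after the patch
print(one_step_lookahead(1, ["left", "right"], predict_v2, score))
```

Because the model is ordinary code, the mispredicted transitions point directly at the rule that needs patching, and planning needs no model calls beyond running `predict`.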

2603.00656 2026-06-18 cs.AI Version update | Topic 85

InfoPO: Information-Driven Policy Optimization for User-Centric Agents

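Fanqi Kong, Jiayi Zhang, Mingyi Deng, Chenglin Wu, Yuyu Luo, Bang Liu

Affiliations: Peking University; The Hong Kong University of Science

Topic match (Planning and Decision-Making): information-driven policy optimization for user-centric agents

AI summary: To address credit assignment and weak advantage signals in multi-turn interaction, proposes InfoPO, which combines an information-gain reward with adaptive variance-gated fusion and outperforms existing baselines on intent clarification, collaborative coding, and related tasks.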

Abstract

Real-world user requests to LLM agents are often underspecified. Agents must interact to acquire missing information and make correct downstream decisions. However, current multi-turn GRPO-based methods often rely on trajectory-level reward computation, which leads to credit assignment problems and insufficient advantage signals within rollout groups. A feasible approach is to identify valuable interaction turns at a fine granularity to drive more targeted learning. To address this, we introduce InfoPO (Information-Driven Policy Optimization), which frames multi-turn interaction as a process of active uncertainty reduction and computes an information-gain reward that credits turns whose feedback measurably changes the agent's subsequent action distribution compared to a masked-feedback counterfactual. It then combines this signal with task outcomes via an adaptive variance-gated fusion to identify information importance while maintaining task-oriented goal direction. Across diverse tasks, including intent clarification, collaborative coding, and tool-augmented decision making, InfoPO consistently outperforms prompting and multi-turn RL baselines. It also demonstrates robustness under user simulator shifts and generalizes effectively to environment-interactive tasks. Overall, InfoPO provides a principled and scalable mechanism for optimizing complex agent-user collaboration. Code is available at https://github.com/kfq20/InfoPO.
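A rough sketch of the two ingredients named in the abstract, under stated assumptions. The abstract does not give formulas, so the KL divergence used as the information-gain measure, the `tau / (tau + variance)` gate, and all function names are assumptions for illustration, not InfoPO's actual definitions.

```python
import math
from statistics import pvariance


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two categorical distributions given as lists."""
    return sum(
        pi * math.log((pi + eps) / (qi + eps))
        for pi, qi in zip(p, q)
        if pi > 0
    )


def info_gain_reward(dist_with_feedback, dist_masked):
    """Assumed turn-level reward: how far the user's feedback moved the
    agent's next-action distribution away from the masked counterfactual."""
    return kl_divergence(dist_with_feedback, dist_masked)


def gated_fusion(outcome_rewards, info_rewards, tau=0.05):
    """Hypothetical variance gate over one rollout group.

    When outcome rewards barely differ (weak advantage signal), the gate
    approaches 1 and the information-gain term carries more weight."""
    gate = tau / (tau + pvariance(outcome_rewards))
    return [r + gate * g for r, g in zip(outcome_rewards, info_rewards)]
```

In this sketch a turn whose feedback leaves the action distribution unchanged earns zero extra credit, while a group of rollouts with identical outcomes is still separated by how informative each turn was.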

2603.00026 2026-06-18 cs.CL cs.AI cs.IR Version update | Topic 85

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Xiaohui Zhang, Zequn Sun, Chengyuan Yang, Yaqin Jin, Yazhong Zhang, Wei Hu

Affiliations: State Key Laboratory for Novel Software Technology, Nanjing University, China; Alibaba Group, Hangzhou, China; National Institute of Healthcare Data Science, Nanjing University, China

Topic match (Planning and Decision-Making): memory retrieval combined with reasoning; active causal reasoning

AI summary: Proposes ActMem, which converts unstructured dialogue history into a structured causal and semantic graph and uses counterfactual reasoning and commonsense completion for active causal reasoning, significantly improving LLM agents on complex memory-dependent tasks.

Abstract

Memory management is essential for LLM agents in long-term interactions. Current memory frameworks typically treat agents as passive "recorders" and retrieve information without understanding its deeper implications. They may fail in scenarios requiring reasoning and complex decision-making. To bridge this critical gap, we propose a novel actionable memory framework called ActMem that integrates memory retrieval with active causal reasoning. ActMem transforms unstructured dialogue history into a structured causal and semantic graph. By leveraging counterfactual reasoning and commonsense completion, it enables agents to deduce implicit constraints and resolve potential conflicts between past states and current intentions. Furthermore, we introduce a comprehensive dataset ActMemEval to evaluate agent reasoning capabilities in logic-driven scenarios, moving beyond the fact-retrieval focus of existing memory benchmarks. Experiments demonstrate that ActMem significantly outperforms baselines in handling complex, memory-dependent tasks, paving the way for more consistent and reliable intelligent assistants.

2510.05107 2026-06-18 cs.AI Version update | Topic 85

Structured Cognitive Loop for Behavioral Intelligence in Large Language Model Agents (Extended Revision: From Behavioral Architecture to Epistemic Accountability)

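Myung Ho Kim

Affiliations: JEI University

Topic match (Planning and Decision-Making): a structured cognitive loop for accountable LLM agent behavior

AI summary: Proposes the Structured Cognitive Loop (SCL) architecture, which separates cognition, memory, control, and action into modules to make LLM agent behavior accountable; it reaches 86.3% success over 360 episodes, ahead of the baselines.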

Comments This revised version extends the original SCL framework from a behavioral architecture for reliable LLM agents into a broader architecture of epistemic accountability, integrating context-aware Human-in-the-Loop control, Pool-Gated Retrieval, and the Horizon-Warrant-Commitment structure

Abstract

The central challenge for AI agents is not only performance but accountability. Agents that act through opaque prompt sequences may produce correct outputs, but they provide little basis for verifying why an action was permitted, where an error occurred, or how responsibility should be assigned. This paper presents the Structured Cognitive Loop (SCL) as an architecture for accountable behavior in large language model agents. SCL separates cognition, memory, control, and action into distinct modules. The language model proposes. External memory preserves verified state. A lightweight controller checks preconditions, prevents redundant actions, and authorizes execution before tools are used. We evaluate SCL against ReAct and common LangChain agent variants across travel planning, conditional email drafting, and constraint-guided image generation. Across 360 episodes, SCL achieves 86.3 percent task success compared with 70.5 to 76.8 percent for prompt-based baselines. It also improves goal fidelity, reduces redundant tool calls, increases reuse of intermediate state, and lowers unsupported assertions. This extended revision situates SCL within a broader architecture of epistemic accountability. Subsequent extensions integrate context-aware Human-in-the-Loop control, Pool-Gated Retrieval, and the Horizon-Warrant-Commitment framework. Together these components define an agent architecture in which the model proposes, structure decides, evidence is warranted before use, and human judgment is embedded in the trace rather than imposed after the fact. The result is a foundation for AI agents whose decisions are not only effective but also authorized, inspectable, and accountable.
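The controller role described in the abstract (check preconditions, prevent redundant actions, authorize before execution) might look roughly like the sketch below. `Proposal`, `Controller`, and the verdict strings are hypothetical and are not the paper's implementation; the model that would produce proposals is left out entirely.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Proposal:
    """An action suggested by the language model (hypothetical shape)."""
    tool: str
    args: tuple
    writes: str            # memory key the result would be stored under
    requires: tuple = ()   # memory keys that must already be verified


@dataclass
class Controller:
    tools: dict                                  # tool name -> callable
    memory: dict = field(default_factory=dict)   # verified external state
    trace: list = field(default_factory=list)    # inspectable decision log

    def authorize_and_run(self, p: Proposal) -> str:
        missing = [k for k in p.requires if k not in self.memory]
        if missing:
            verdict = "rejected"      # precondition check failed
        elif p.writes in self.memory:
            verdict = "skipped"       # redundant: result already verified
        else:
            self.memory[p.writes] = self.tools[p.tool](*p.args)
            verdict = "executed"      # authorized, then run
        self.trace.append((p.tool, p.writes, verdict))
        return verdict
```

Every decision lands in `trace`, so a reviewer can see afterwards why a tool call ran, was skipped, or was refused.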

2605.22142 2026-06-18 cs.LG cs.AI Version update | Topic 80

Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability

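Taewoon Kim, Vincent François-Lavet, Michael Cochez

Topic match (Planning and Decision-Making): memory transfer in reinforcement learning, which falls under agent decision-making

AI summary: Studies short-term-to-long-term memory transfer for knowledge graphs under partial observability and proposes a neuro-symbolic value-based approach that decides whether to keep or drop each observed triple before long-term insertion, improving memory efficiency and outperforming symbolic and neural baselines on the RoomKG benchmark.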

Abstract

Reinforcement learning under partial observability requires deciding what information to retain, yet most memory-based approaches do not explicitly model short-term-to-long-term transfer of symbolic observations. We study this transfer process in a temporal knowledge-graph memory setting and cast it as a neuro-symbolic value-based decision problem: for each observed triple, the agent chooses whether to keep or drop it before long-term insertion. To handle variable-sized short-term buffers, we use a per-item Q-learning design with shared parameters and a practical temporal-difference update over matched items across consecutive steps. On the RoomKG benchmark at long-term memory capacity 128, learned transfer decisions outperform symbolic and neural baselines, including symbolic baselines with temporal annotations and history-based LSTM/Transformer baselines. Across transfer-policy ablations, a lightweight local short-term-only variant performs best, and step-level behavior shows that the policy keeps navigation- and query-relevant facts while discarding lower-value candidate facts, supporting explicit and interpretable memory decisions under memory constraints.

2604.03208 2026-06-18 cs.LG Version update | Topic 80

Hierarchical Planning with Latent World Models

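Wancong Zhang, Basile Terver, Artem Zholus, Soham Chitnis, Harsh Sutaria, Mido Assran, Randall Balestriero, Amir Bar, Adrien Bardes, Yann LeCun, Nicolas Ballas

Affiliations: FAIR at Meta; New York University; Mila - Québec AI Institute; Brown University

Topic match (Planning and Decision-Making): hierarchical world models for long-horizon planning, which falls under agent planning

AI summary: Proposes the HWM architecture, which performs hierarchical model predictive control using latent world models at multiple temporal scales together with latent matching, addressing the failure of single-level planning and the exponential growth of search cost on long-horizon tasks.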

Abstract

World models are a promising path to zero-shot embodied control through planning. However, existing world model planners struggle on long-horizon, multi-stage tasks: prediction errors compound and naive search is exponential in the planning horizon. Hierarchy mitigates both by decomposing tasks into shorter, tractable subproblems; yet prior hierarchical approaches either amortize control into task-specific policies (hierarchical RL) or assume low-dimensional states and known dynamics (classical hierarchical MPC). We present Hierarchical Planning with Latent World Models (HWM), an architecture and planning paradigm for hierarchical model predictive control (MPC) directly on visual world models trained solely via next-latent prediction. HWM learns world models at multiple temporal scales within a shared latent space, so predictions from the long-horizon model serve as subgoals for the short-horizon model via latent matching, without task-specific rewards, skill learning, or hierarchical policies. To keep long-horizon search tractable, HWM learns an action encoder that compresses primitive action chunks into latent macro-actions. On real-world Franka manipulation, HWM solves pick-and-place from a single goal image at 70% success vs. 0% for single-level planning. Across simulated push manipulation and maze navigation, HWM consistently improves performance on long-horizon tasks while requiring up to 3x less planning compute.

2411.10399 2026-06-18 cs.GT cs.CR cs.DC Version update | Topic 80

Game Theoretic Liquidity Provisioning in Concentrated Liquidity Market Makers

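Weizhao Tang, Rachid El-Azouzi, Cheng Han Lee, Ethan Chan, Giulia Fanti

Topic match (Planning and Decision-Making): a game-theoretic model for analyzing liquidity provisioning strategies

AI summary: Builds a game-theoretic model of strategic interaction among liquidity providers in concentrated liquidity market makers, shows that it reduces to a game of linear complexity with a unique Nash equilibrium that follows a waterfilling strategy, and finds on real data that LP strategies deviate from equilibrium and that moving closer to it raises daily returns.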

Abstract

Automated market makers (AMMs) are a class of decentralized exchanges that enable the automated trading of digital assets. They accept deposits of digital tokens from liquidity providers (LPs); tokens can be used by traders to execute trades, which generate fees for the investing LPs. The distinguishing feature of AMMs is that trade prices are determined algorithmically, unlike classical limit order books. Concentrated liquidity market makers (CLMMs) are a major class of AMMs that offer liquidity providers flexibility to decide not only how much liquidity to provide, but in what ranges of prices they want the liquidity to be used. This flexibility can complicate strategic planning, since fee rewards are shared among LPs. We formulate and analyze a game theoretic model to study the incentives of LPs in CLMMs. Our main results show that while our original formulation admits multiple Nash equilibria and has complexity quadratic in the number of price ticks in the contract, it can be reduced to a game with a unique Nash equilibrium whose complexity is only linear. We further show that the Nash equilibrium of this simplified game follows a waterfilling strategy, in which low-budget LPs use up their full budget, but rich LPs do not. Finally, by fitting our game model to real-world CLMMs, we observe that in liquidity pools with risky assets, LPs adopt investment strategies far from the Nash equilibrium. Under price uncertainty, they generally invest in fewer and wider price ranges than our analysis suggests, with lower-frequency liquidity updates. We show that across several pools, by updating their strategy to more closely match the Nash equilibrium of our game, LPs can improve their median daily returns by $116, which corresponds to an increase of 0.009% in median daily return on investment.

2510.03635 2026-06-18 eess.SY cs.SY Version update | Topic 70

Cyber Resilience of Three-phase Unbalanced Distribution System Restoration under Sparse Adversarial Attack on Load Forecasting

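Chen Chao, Zixiao Ma, Ziang Zhang

Topic match (Planning and Decision-Making): restoration planning under attack, which involves decision-making

AI summary: Quantifies the impact of adversarial attacks on load forecasting, proposes a gradient-based sparse attack, and builds a restoration-aware validation framework that reveals system-level failures, offering insights for designing cybersecurity-aware restoration planning.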

Comments 10 pages, 7 figures

Abstract

System restoration is critical for power system resilience; nonetheless, its growing reliance on artificial intelligence (AI)-based load forecasting introduces significant cybersecurity risks. Inaccurate forecasts can lead to infeasible planning, voltage and frequency violations, and unsuccessful recovery of de-energized segments, yet the resilience of restoration processes to such attacks remains largely unexplored. This paper addresses this gap by quantifying how adversarially manipulated forecasts impact restoration feasibility and grid security. We develop a gradient-based sparse adversarial attack that strategically perturbs the most influential spatiotemporal inputs, exposing vulnerabilities in forecasting models while maintaining stealth. We further create a restoration-aware validation framework that embeds these compromised forecasts into a sequential restoration model and evaluates operational feasibility using an unbalanced three-phase optimal power flow formulation. Simulation results show that the proposed approach is more efficient and stealthier than baseline attacks. It reveals system-level failures, such as voltage and power ramping violations that prevent the restoration of critical loads. These findings provide actionable insights for designing cybersecurity-aware restoration planning frameworks.
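The idea of perturbing only the most influential inputs can be shown with a generic top-k gradient attack on a toy linear forecaster. This is a simplified illustration, not the paper's attack: the linear model, the fixed step `eps`, and the function names are all assumptions, and a real forecaster would need its gradient computed by automatic differentiation.

```python
def forecast(weights, inputs):
    """Toy linear load forecaster: a weighted sum of input features."""
    return sum(w * x for w, x in zip(weights, inputs))


def sparse_attack(weights, inputs, k, eps):
    """Perturb only the k inputs with the largest gradient magnitude.

    For a linear model the gradient of the forecast with respect to each
    input is simply its weight. Each chosen input moves by eps in the
    direction that raises the forecast; all other inputs stay untouched,
    which keeps the perturbation sparse."""
    grad = list(weights)
    top = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)[:k]
    adversarial = list(inputs)
    for i in top:
        adversarial[i] += eps if grad[i] > 0 else -eps
    return adversarial, sorted(top)
```

With `k` small relative to the input size, the forecast shifts by `eps` times the sum of the k largest absolute weights while most measurements remain exactly as observed.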

2402.08128 2026-06-18 cs.AI cs.GT Version update | Topic 70

Recursive Joint Simulation in Games

Vojtech Kovarik, Caspar Oesterheld, Vincent Conitzer

Affiliations: Foundations of Cooperative AI Lab (FOCAL), Computer Science Department; Carnegie Mellon University; AI Center; Czech Technical University; Center for Theoretical Study; Charles University

Topic match (Planning and Decision-Making): AI agents achieving cooperation through recursive joint simulation

AI summary: Studies how AI agents can reach cooperation through recursive joint simulation and proves that the process is equivalent to an infinitely repeated version of the original game, so existing results such as the folk theorems apply directly.

Abstract

Game-theoretic dynamics between AI agents could differ from traditional human-human interactions in various ways. One such difference is that it may be possible to accurately simulate an AI agent, for example because its source code is known. Such an agent would then be fundamentally uncertain whether it is in the real world or in a simulation. Our aim is to explore ways of leveraging this possibility to achieve more cooperative outcomes in strategic settings. In this paper, we study an interaction between AI agents where the agents run a recursive joint simulation. That is, the agents first jointly observe a simulation of the situation they face. This simulation in turn recursively includes additional simulations (with a small chance of failure, to avoid infinite recursion), and the results of all these nested simulations are observed before an action is chosen. We show that the resulting interaction is strategically equivalent to an infinitely repeated version of the original game, allowing a direct transfer of existing results such as the various folk theorems. As evidence that the equivalence is robust, we show that it holds even when we relax some of the assumptions and that it also holds "from the inside", that is, for an agent that finds itself inside the game and has self-locating uncertainty.

2603.09344 2026-06-18 cs.AI stat.ML Version update | Topic 65

Robust Regularized Policy Iteration under Transition Uncertainty

Hongqiang Lin, Zhenghui Fu, Weihao Tang, Pengfei Wang, Yiding Sun, Qixian Huang, Dongxu Zhang

Affiliations: College of Computer Science and Technology, Zhejiang University, Hangzhou, China; School of Artificial Intelligence, Optics and Electronics (iOPEN), Northwestern Polytechnical University, Xi'an, China; School of Software Technology, Zhejiang University, Hangzhou, China; School of Software Engineering, Xi'an Jiaotong University, Xi'an, China; School of Systems Science and Engineering, Sun Yat-sen University, Guangzhou, China

Topic match (Planning and Decision-Making): offline reinforcement learning for agent decision-making

AI summary: Proposes Robust Regularized Policy Iteration (RRPI), which casts offline reinforcement learning as robust policy optimization, replaces the intractable bilevel objective with a KL-regularized surrogate, and derives an efficient policy iteration from a robust regularized Bellman operator, with convergence guarantees and strong results on D4RL benchmarks.

Abstract

Offline reinforcement learning (RL) enables data-efficient and safe policy learning without online exploration, but its performance often degrades under distribution shift. The learned policy may visit out-of-distribution state-action pairs where value estimates and learned dynamics are unreliable. To address policy-induced extrapolation and transition uncertainty in a unified framework, we formulate offline RL as robust policy optimization, treating the transition kernel as a decision variable within an uncertainty set and optimizing the policy against the worst-case dynamics. We propose Robust Regularized Policy Iteration (RRPI), which replaces the intractable max-min bilevel objective with a tractable KL-regularized surrogate and derives an efficient policy iteration procedure based on a robust regularized Bellman operator. We provide theoretical guarantees by showing that the proposed operator is a $γ$-contraction and that iteratively updating the surrogate yields monotonic improvement of the original robust objective with convergence. Experiments on D4RL benchmarks demonstrate that RRPI achieves strong average performance, outperforming recent baselines including percentile-based methods on the majority of environments while remaining competitive on the rest. Moreover, RRPI exhibits robust performance by aligning lower $Q$-values with high epistemic uncertainty, which prevents the policy from executing unreliable out-of-distribution actions.
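To make the KL-regularized surrogate concrete, here is a tabular sketch under stated assumptions. The abstract does not spell out the operator, so this uses the standard closed form of a KL-penalized worst case, min over P of E_P[V] + tau * KL(P || P0) = -tau * log E_P0[exp(-V / tau)], applied to policy evaluation with a fixed policy. The paper's operator may differ, and the function names and the temperature `tau` are hypothetical.

```python
import math


def soft_min_expectation(p0, values, tau):
    """Closed form of min_P { E_P[V] + tau * KL(P || p0) }.

    Equals -tau * log(sum_i p0[i] * exp(-values[i] / tau)); the shift by
    min(values) only improves numerical stability."""
    m = min(values)
    s = sum(p * math.exp(-(v - m) / tau) for p, v in zip(p0, values))
    return m - tau * math.log(s)


def robust_backup(rewards, nominal, values, gamma, tau):
    """One robust regularized Bellman backup for a fixed policy.

    rewards[s] is the reward in state s and nominal[s] is the nominal
    next-state distribution from s under that policy."""
    return [
        r + gamma * soft_min_expectation(row, values, tau)
        for r, row in zip(rewards, nominal)
    ]
```

The soft minimum always lies between the worst next-state value and the nominal expectation, so the backup is pessimistic without collapsing to the single worst case. It is also non-expansive in the sup norm, which is why a backup of this form contracts with modulus gamma.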

2601.14288 2026-06-18 astro-ph.CO cs.AI cs.CE gr-qc hep-th Version update | Topic 85

DeepInflation: an AI agent for research and model discovery of inflation

Ze-Yu Peng, Hao-Shi Yuan, Qi Lai, Jun-Qian Jiang, Gen Ye, Jun Zhang, Yun-Song Piao

Affiliations: School of Physical Sciences, University of Chinese Academy of Sciences, Beijing 100049, China; International Centre for Theoretical Physics Asia-Pacific, University of Chinese Academy of Sciences, 100190 Beijing, China; Taiji Laboratory for Gravitational Wave Universe, University of Chinese Academy of Sciences, 100049 Beijing, China; School of Fundamental Physics and Mathematical Sciences, Hangzhou Institute for Advanced Study, UCAS, Hangzhou 310024, China; Institute of Theoretical Physics, Chinese Academy of Sciences, P.O. Box 2735, Beijing 100190, China; Département de Physique Théorique, Université de Genève, 24 quai Ernest-Ansermet, CH-1211 Genève 4, Switzerland

Topic match (Workflow Automation): a multi-agent architecture that automatically discovers inflationary potential models

AI summary: Proposes DeepInflation, an AI agent built on a multi-agent architecture that integrates large language models, a symbolic regression engine, and a retrieval-augmented generation knowledge base to automatically discover single-field slow-roll inflationary potentials consistent with the latest observations and to explain their theoretical background.

Abstract

We present DeepInflation, an AI agent designed for research and model discovery in inflationary cosmology. Built upon a multi-agent architecture, DeepInflation integrates Large Language Models (LLMs) with a symbolic regression (SR) engine and a retrieval-augmented generation (RAG) knowledge base. This framework enables the agent to automatically explore and verify the vast landscape of inflationary potentials while grounding its outputs in established theoretical literature. We demonstrate that DeepInflation can successfully discover simple and viable single-field slow-roll inflationary potentials consistent with the latest observations (with the ACT DR6 results taken as an example) or any given $n_s$ and $r$, and provide accurate theoretical context for obscure inflationary scenarios. DeepInflation serves as a prototype for a new generation of autonomous scientific discovery engines in cosmology, which enables researchers and non-experts alike to explore the inflationary landscape using natural language. This agent is available at https://github.com/pengzy-cosmo/DeepInflation.
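For readers unfamiliar with how a candidate potential is checked against $n_s$ and $r$, the sketch below evaluates the standard first-order slow-roll formulas in reduced Planck units: epsilon = (1/2)(V'/V)^2, eta = V''/V, n_s ≈ 1 - 6 epsilon + 2 eta, and r ≈ 16 epsilon. It is an independent illustration with a hypothetical function name and finite-difference derivatives, not code from DeepInflation.

```python
import math


def slow_roll_observables(V, phi, h=1e-3):
    """Return (n_s, r) for potential V at field value phi (M_pl = 1).

    Derivatives are taken by central finite differences with step h."""
    v0 = V(phi)
    d1 = (V(phi + h) - V(phi - h)) / (2 * h)
    d2 = (V(phi + h) - 2 * v0 + V(phi - h)) / (h * h)
    epsilon = 0.5 * (d1 / v0) ** 2
    eta = d2 / v0
    return 1 - 6 * epsilon + 2 * eta, 16 * epsilon


def quadratic(phi):
    """Textbook quadratic potential; the overall mass scale cancels here."""
    return 0.5 * phi * phi


# For the quadratic potential, inflation ends near phi^2 = 2, so N e-folds
# before the end corresponds to phi^2 = 4 * N + 2.
phi_60 = math.sqrt(4 * 60 + 2)
n_s, r = slow_roll_observables(quadratic, phi_60)
```

At 60 e-folds this gives n_s of about 0.967 and r of about 0.132, the familiar quadratic-potential values. A search procedure would score each candidate potential by how close such predictions fall to the target $n_s$ and $r$.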


1. Tool Calling (2 papers)

ToolGrad: Efficient Tool-use Dataset Generation with Textual "Gradients"

Notation Matters: A Benchmark Study of Token-Optimized Formats in Agentic AI Systems

2. Software Agents (2 papers)

SkillRevise: Improving LLM-Authored Agent Skills via Trace-Conditioned Skill Revision

WebSP-Eval: Evaluating Web Agents on Website Security and Privacy Tasks

3. Multi-Agent (7 papers)

Self-Evolving Multi-Agent Systems via Textual Backpropagation

Multi-Agent Systems are Mixtures of Experts: Who Becomes an Influencer?

The Dynamics of Policy Gradient in Social Dilemmas with Partner Selection

PosterForest: Hierarchical Multi-Agent Collaboration for Scientific Poster Generation

Emergent Macro-Criticality from Micro-Critical Agents

Market Informedness and Market-Maker Profitability: The Trade-Off Between Adverse Selection and Price Discovery

Epistemic Gain, Aleatoric Cost: Uncertainty Decomposition in Multi-Agent Debate for Math Reasoning

4. Other Agents (4 papers)

ResearchClawBench: A Benchmark for End-to-End Autonomous Scientific Research

Personality Pairing Improves Human-AI Collaboration

TWICE: Modeling the Temporal Evolution of Personalized User Behavior via Event-Driven Agents

Agents Trusting Agents? Restoring Lost Capabilities with Inclusive Healthcare

5. Planning and Decision-Making (10 papers)

PatchWorld: Gradient-Free Optimization of Executable World Models

InfoPO: Information-Driven Policy Optimization for User-Centric Agents

ActMem: Bridging the Gap Between Memory Retrieval and Reasoning in LLM Agents

Structured Cognitive Loop for Behavioral Intelligence in Large Language Model Agents (Extended Revision: From Behavioral Architecture to Epistemic Accountability)

Short-Term-to-Long-Term Memory Transfer for Knowledge Graphs under Partial Observability

Hierarchical Planning with Latent World Models

Game Theoretic Liquidity Provisioning in Concentrated Liquidity Market Makers

Cyber Resilience of Three-phase Unbalanced Distribution System Restoration under Sparse Adversarial Attack on Load Forecasting

Recursive Joint Simulation in Games

Robust Regularized Policy Iteration under Transition Uncertainty

6. Workflow Automation (1 paper)

DeepInflation: an AI agent for research and model discovery of inflation