arXivDaily arXiv每日学术速递 周一至周五更新
2606.19931 2026-06-19 cs.MA 新提交

Blame is easier than praise: Measuring off-ball defensive performance in football

责备比表扬更容易:衡量足球中的无球防守表现

Jonas Bischofberger, Runqing Ma, Pascal Bauer, Kilian Arnsmeyer, Arnold Baca

AI总结 提出基于防守压力区(DPA)的球员参与度评分,将预期威胁的事件级变化归因于个体,以衡量足球无球防守表现,并在跨性别和跨赛事数据集上验证其有效性。

详情
AI中文摘要

足球运动员的防守表现通常通过有限的行动(如抢断和拦截)来衡量,而他们通过位置行为的持续影响此前很少被研究。我们将此问题表述为多智能体时空轨迹上的归因问题,没有球员级别的真实标签,其中事件级别的预期威胁变化被分配给个体。我们提出了一个框架,使用从防守压力区(DPA)计算的球员参与度评分来执行此归因。通过计算自动检测的团队结构内的角色条件基线,我们可以确定每个防守者对通过任意传球创造的威胁的预期责任。该方法的有效性和鲁棒性在独特的广泛跨性别和跨赛事数据集上进行了评估,包括来自男子世界杯64场比赛、女子德甲116场比赛和男子德丙336场比赛的位置和事件数据。在没有真实标签的情况下,我们提出了一个评估协议,将多个相对较弱的代理组合成稳健的总结分数。我们发现,与最佳基于行动的指标相比,有效性分数提高了大约一个标准差,并证明许多流行指标的有效性有限。对高价值行动的“责备”与外部评级和市场价值显示出特别强的相关性,使其成为足球中第一个可靠衡量定位错误的已发表指标。本工作所有代码均公开可用,以支持可重复性和进一步研究。

英文摘要

The defensive performance of football players is commonly measured through a limited number of actions like tackles and interceptions while their continuous impact through positional behaviour has hardly been studied before. We formulate this problem as an attribution over multi-agent spatiotemporal trajectories without player-level ground truth labels, where event-level changes of expected threat are distributed among individuals. We propose a framework that performs this attribution using player involvement scores calculated from defensive pressure areas (DPAs). By computing role-conditioned baselines within automatically detected team structures, we can determine each defender's expected responsibility for threat created through arbitrary passes. The validity and robustness of this approach are evaluated on a uniquely extensive cross-gender and cross-competition data set, including positional and event data from 64 matches of the men's World Cup, 116 matches of the women's German Bundesliga and 336 matches of the men's German 3. Liga. In the absence of a ground truth, we propose an evaluation protocol that combines multiple relatively weak proxies into robust summary scores. We find a validity score that is improved by around 1 standard deviation compared to the best action-based metric and demonstrate that many popular measures show limited validity. The "blame" for conceding high-value actions shows especially strong correlations with external ratings and market values, making it the first published metric in football to reliably measure positioning errors. All code underlying this work is publicly available to support reproducibility and further research.

2606.19758 2026-06-19 cs.MA 新提交

SIGMA: Skill-Incidence Graphs for Compositional Multi-Agent Design

SIGMA: 用于组合式多智能体设计的技能-关联图

Kun Zeng, Yu Huo, Siyu Zhang, Yuecheng Zhuo, Yuquan Lu, Haoyue Liu, Siyue Chen, Xiaoying Tang

AI总结 提出SIGMA框架,通过技能-智能体关联图将智能体构建为可复用技能的任务条件组合,并解码通信拓扑,在六个基准测试中优于基线方法,并展现出对未见技能库的鲁棒性。

Comments EMNLP2026

详情
AI中文摘要

现有的基于图的多智能体系统(MAS)设计者主要通过优化预定义智能体、角色或组上的通信拓扑来改善协作。然而,由于每个节点仍然是一个封闭集实体,这些方法难以泛化到需要未见能力组合的任务。我们提出SIGMA,一个技能-关联图框架,将智能体构建为可复用技能的任务条件组合。给定一个任务和一个技能库,SIGMA预测一个技能-智能体关联矩阵,从选定的技能中组合智能体节点嵌入,并在构建的智能体上解码通信拓扑。在执行过程中,特定技能的邮箱将消息路由到相关分配的能力,使关联结构直接可操作。在六个推理和编码基准测试中,使用三个基础LLM,SIGMA实现了最佳平均性能,并分别比最强的非组合式拓扑基线CARD提高了2.06、2.36和1.75分。它还对未见技能库表现出更强的鲁棒性,平均性能下降仅为0.96分。这些结果表明,组合式节点构建是多智能体设计中除了通信拓扑优化之外的一个互补且重要的方向。代码可在以下网址获取:https://this URL。

英文摘要

Existing graph-based multi-agent system (MAS) designers mainly improve collaboration by optimizing communication topologies over predefined agents, roles, or groups. However, because each node remains a closed-set entity, these methods struggle to generalize to tasks that require unseen combinations of capabilities. We propose SIGMA, a skill-incidence graph framework that constructs agents as task-conditioned bundles of reusable skills. Given a task and a skill library, SIGMA predicts a skill-agent incidence matrix, composes agent node embeddings from selected skills, and decodes a communication topology over the constructed agents. During execution, skill-specific mailboxes route messages to the relevant assigned capabilities, making the incidence structure directly operational. Across six reasoning and coding benchmarks with three base LLMs, SIGMA achieves the best average performance and improves over CARD, the strongest non-compositional topology-based baseline, by 2.06, 2.36, and 1.75 points, respectively. It also shows stronger robustness to unseen skill libraries, with an average performance drop of only 0.96 points. These results suggest that compositional node construction is a complementary and important axis for multi-agent design beyond communication topology optimization. Code is available at https://anonymous.4open.science/r/SIGMA-2338/.

2606.19537 2026-06-19 cs.MA cs.DC 新提交

Mesh Inference: A Formal Model of Collective Intelligence Without a Center

网格推理:无中心集体智能的形式模型

Hongwei Xu

AI总结 提出网格推理形式模型,通过耦合自由能实现无中心多智能体协作推理,证明收敛唯一性、识别完备性和观测唯一性,并分析线性高斯情况下的延迟代价。

Comments 21 pages, 2 figures

详情
AI中文摘要

我们提出了网格推理的形式模型:一群独立智能体,每个持有私有状态,仅交换被接纳的、类型化的观测,在没有中央协调者且无智能体暴露的情况下,推导出任何一个智能体单独无法得出的结论。没有智能体共享权重、梯度或隐藏状态,且智能体可能跨越不同的团队、网络和组织。受“询问模型是能量最小化推理”这一观察的启发,我们将网格建模为每个智能体局部松弛的耦合自由能。我们证明,单一的接纳/发射策略控制三个性质。首先,对于任何对称或非对称的接纳,网格推理收敛到唯一答案,因为耦合总是M-矩阵。其次,它是识别完备的:当贡献视图是载波连通时,它精确推导出集中式最优解。第三,它是仅观测的:没有节点传输其内部状态,且机密性是识别的对偶。内容寻址谱系是唯一的全局侧信道。在线性高斯情况下,每个推导出的答案都是确定的,因此等于集中式最优解,延迟为O(diam^2),这是移除中心所付出的代价。这样的推导是无中心学习循环的一个环节,我们将其形式化为架构而非证明。我们提出的开放问题是,询问何时能改善集体而非破坏它:非线性闭包是推导出升级的答案还是自信的错误。据我们所知,这是网格推理的第一个形式模型。

英文摘要

We present a formal model of mesh inference: how a population of independent agents, each holding private state and exchanging only admitted, typed observations, derives a conclusion none of them holds alone, with no central coordinator and no agent exposed. No agent shares weights, gradients, or hidden state, and the agents may span different teams, networks, and organizations. Motivated by the observation that asking a model is energy-minimizing inference, we model the mesh as a coupled free energy that each agent relaxes locally. We show that a single admission/emission policy governs three properties. First, mesh inference converges to a unique answer for any admission, symmetric or not, because the coupling is always an M-matrix. Second, it is identification-complete: it derives the centralized optimum exactly when the contributing views are carrier-connected. Third, it is observation-only: no node transmits its internals, and confidentiality is the dual of identification. Content-addressed lineage is the only global side-channel. In the linear-Gaussian regime every derived answer is determined, hence equal to the centralized optimum, at O(diam^2) latency, the measured price of removing the center. One such derivation is one turn of a center-free learning loop, which we formalize as architecture rather than prove. The open problem we state is when asking improves the collective rather than corrupting it: whether the non-linear closure derives an upgraded answer or a confident error. To our knowledge, this is the first formal model of mesh inference.

2606.18649 2026-06-19 cs.MA cs.CL cs.CY 新提交

Gender Bias in LLM Hiring Decisions: Evidence from a Japanese Context and Evaluation of Mitigation Strategies

LLM招聘决策中的性别偏见:来自日本语境的证据及缓解策略评估

Serena A. Hoffstedde, Machiko Hirota, Akshara Nadayanur Sathis Kanna, Rihito Kotani, Ujwal Kumar, Gabriele Trovato, Phan Xuan Tan

发表机构 * Shibaura Institute of Technology, Tokyo, Japan(Shibaura技术学院,东京,日本) Amsterdam University of Applied Sciences, Amsterdam, Netherlands(阿姆斯特丹应用科学大学,阿姆斯特丹,荷兰) University of Pennsylvania, Philadelphia, USA(宾夕法尼亚大学,费城,美国) Carnegie Mellon University, Pittsburgh, USA(卡内基梅隆大学,匹兹堡,美国) Keio University, Tokyo, Japan(庆应大学,东京,日本)

AI总结 本研究通过60份日本履历书格式的简历和5个先进LLM,发现所有模型均存在显著的亲女性偏见,且简单的提示指令无法缓解,而移除姓名几乎完全消除该偏见。

详情
AI中文摘要

大型语言模型(LLM)越来越多地被部署在招聘流程中,然而大多数关于LLM招聘决策中性别偏见的研究都集中在英语、西方格式的简历上。本研究考察了亲女性性别偏见是否扩展到日本企业语境,并评估了两种实用的缓解策略。使用反事实简历设计,包含60份日本履历书格式的简历、基于语言学性别信号标准选择的12个姓名对,以及五个最先进的LLM(Claude Sonnet 4.6、GPT-4o、DeepSeek-V3、Gemini 2.5 Flash、Llama 3.3 70B),我们在基线、提示指令和隐私过滤条件下进行了43,200次API调用。交叉随机效应线性混合模型确认了所有五个模型均存在显著的亲女性偏见,将西方研究结果复制到了非西方语境中。提示级别的性别中立指令并未显著减少偏见。姓名依赖分析正式将候选人姓名识别为主要性别渠道:从提示中移除姓名几乎完全消除了女性效应。隐私过滤器与GPT-4o内容安全过滤器之间的意外不兼容导致42%的拒绝率,突显了在LLM辅助招聘流程中姓名匿名化的实际部署挑战。

英文摘要

Large language models (LLMs) are increasingly deployed in hiring workflows, yet most research on gender bias in LLM hiring decisions has focused on English-language, Western-format resumes. This study examines whether pro-female gender bias extends to a Japanese corporate context and evaluates two practical mitigation strategies. Using a counterfactual resume design with 60 Japanese rirekisho-format resumes, 12 name pairs selected on linguistically grounded gender-signal criteria, and five state-of-the-art LLMs (Claude Sonnet 4.6, GPT-4o, DeepSeek-V3, Gemini 2.5 Flash, Llama 3.3 70B), we conducted 43,200 API calls across baseline, prompt instruction, and privacy filter conditions. A crossed random-effects linear mixed model confirms a significant pro-female bias across all five models, replicating Western findings in a non-Western context. A prompt-level gender-neutrality instruction produces no meaningful reduction in bias. A name-reliance analysis formally identifies the candidate name as the primary gender channel: removing the name from the prompt reduces the female effect by nearly its full magnitude. An unexpected incompatibility between the privacy filter and GPT-4o's content safety filter, resulting in a 42% refusal rate, highlights a practical deployment challenge for name anonymization in LLM-assisted recruitment pipelines.

2606.20493 2026-06-19 cs.LG cs.AI cs.MA 交叉投稿

Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems

传染网络:多智能体LLM系统中的评估者偏见传播

Zewen Liu

发表机构 * Qilu Institute of Technology, School of Software Engineering(齐鲁理工学院软件工程学院)

AI总结 提出传染网络框架,量化评估者偏见在多智能体LLM系统中的传播,发现同模型智能体间偏见传播系数为0.157-0.352,且增大评估委员会规模可减少72.4%的传播效应。

Comments 20 pages, 4 figures, 4 tables

详情
AI中文摘要

当大型语言模型在多智能体系统中担任评估者时,其系统性评估偏见会通过智能体网络传播。我们引入传染网络,这是一个用于衡量评估者偏见如何在交互的LLM智能体间传播的正式框架。在使用DeepSeek-chat进行的受控3智能体实验中,我们采用了三种不同的评估者偏见配置文件(结构化、平衡、基于证据),测量了跨智能体传染矩阵Gamma_3,并发现评估者偏见始终在智能体间传播(gamma在[0.157, 0.352]范围内),即使是在相同底层模型内也是如此。我们识别出由谱半径rho(Gamma_N)控制的三种传播机制,并证明同质模型智能体产生的传染系数比先前工作中观察到的跨模型系数弱3-5倍(MM-EPC: gamma约0.85-1.3),使其处于抑制机制中。我们表明,将评估委员会规模从k=1增加到k=3可将有效传染减少72.4%,提供了一种可行的缓解策略。我们发布了开源的传染网络实验框架。

英文摘要

When large language models serve as evaluators in multi-agent systems, their systematic evaluation biases propagate through the agent network. We introduce Contagion Networks, a formal framework for measuring how evaluator biases spread across interacting LLM agents. In a controlled 3-agent experiment using DeepSeek-chat with three distinct evaluator bias profiles (structured, balanced, evidence-based), we measure the Cross-Agent Contagion Matrix Gamma_3 and find that evaluator biases consistently propagate between agents (gamma in [0.157, 0.352]), even within the same underlying model. We identify three propagation regimes governed by the spectral radius rho(Gamma_N), and demonstrate that homogeneous-model agents produce contagion coefficients 3-5x weaker than cross-model coefficients observed in prior work (MM-EPC: gamma approx 0.85-1.3), placing them in the suppression regime. We show that increasing evaluator committee size from k=1 to k=3 reduces effective contagion by 72.4%, providing an actionable mitigation strategy. We release the open-source Contagion Network experimental framework.

2606.20365 2026-06-19 cs.RO cs.MA 交叉投稿

An Infrastructure-less, Control-Independent Solution to Relative Localisation of a Team of Mobile Robots using Ranging Measurements

基于测距的移动机器人团队相对定位的无基础设施、控制无关解决方案

Paolo Golinelli, Tommaso Faraci, Daniele Fontanelli

发表机构 * Department of Industrial Engineering, University of Trento(特伦托大学工业工程系) Department of Information Engineering and Computer Science, University of Trento(特伦托大学信息工程与计算机科学系)

AI总结 提出一种无锚点、完全去中心化的协作定位算法,仅依赖局部里程计、稀疏测距和短程通信,无需控制机器人运动即可实现团队可观测性,采用多假设贝叶斯框架保证鲁棒性。

详情
AI中文摘要

定位机器人团队的能力对于从非结构化环境中的机器人舰队到协作控制和导航任务等应用至关重要。在此类场景中,固定基础设施通常不可用,部署必须快速灵活,系统要求必须最小化。我们提出了一种去中心化协作定位算法,同时解决了所有这些挑战。该方法无锚点、完全去中心化,并且与大多数现有方法不同,不需要控制机器人运动来确保团队可观测性。它仅依赖局部里程计、稀疏的代理间测距测量和短程通信,这些在实践中广泛可用。该算法采用多假设贝叶斯框架,维护所有可行解集,确保在瞬态不可观测条件下的鲁棒性。此外,通过信息共享,每个代理都能受益于整个群体的估计,即使在部分连接条件下也是如此。

英文摘要

The ability to localise teams of robots is essential for applications ranging from robotic fleets in unstructured environments to cooperative control and navigation tasks. In such contexts, fixed infrastructure is often unavailable, deployments must be fast and flexible, and system requirements must be minimal. We present a decentralised cooperative localisation algorithm that addresses all these challenges at once. The method is anchor-less, fully decentralised, and, unlike most existing approaches, does not require controlling the robots motion to ensure team observability. It relies only on local odometry, sparse inter-agent ranging measurements, and short-range communication, all of which are widely available in practice. The algorithm adopts a multi-hypothesis Bayesian framework that maintains the entire set of feasible solutions, ensuring robustness under transient unobservable conditions. Moreover, through information sharing, each agent benefits from the estimates of the entire group, even in partially connected conditions.

2606.20243 2026-06-19 cs.SE cs.MA 交叉投稿

Phoenix: Safe GitHub Issue Resolution via Multi-Agent LLMs

Phoenix: 通过多智能体LLM实现安全的GitHub问题解决

Kipngeno Koech, Muhammad Adam, Baimam Boukar Jean Jacques, Joao Barros

AI总结 提出多智能体LLM系统Phoenix,通过六个专业智能体和七层安全控制,在SWE-bench Lite子集上达到75%的解决率,并在真实问题中保持100%正确性。

详情
AI中文摘要

我们提出Phoenix,一个多智能体LLM系统,能够从分类到拉取请求创建解决GitHub问题,结合了七层安全控制与基线感知测试评估策略。Phoenix将工作分解给六个专业智能体:规划器、复现器、编码器、测试器、故障分析器和拉取请求(PR)智能体,所有智能体由基于标签的GitHub webhook状态机协调。在打开拉取请求之前,每次更改都会与基线测试运行进行对比。在SWE-bench Lite的24个实例子集上,在生产webhook路径上运行,Phoenix oracle解决了75%的实例,且成功运行中没有出现通过到通过的回归;这个精心挑选的子集不能直接与完整分割排行榜结果比较,我们讨论了比较的局限性。在14个仓库的42个真实问题上的补充试点实现了100%的正确性保持(CP;硬级别平均122秒)。人工检查显示,大约一半的拉取请求是定位良好的修复。另一半将代码放置在错误路径上,这是规划器定位的局限性,我们正在通过检索来解决。我们还报告了部署失败模式(WAF过滤、令牌过期、权限边界、不稳定的CI),这些模式促使了每种安全机制的引入。

英文摘要

We present Phoenix, a multi-agent LLM system that resolves GitHub issues from triage through pull-request creation, combining seven layered safety controls with a baseline-aware test evaluation strategy. Phoenix decomposes the work across six specialized agents. Planner, reproducer, coder, tester, failure analyst and Pull Request (PR) agent, all coordinated by a label-based GitHub webhook state machine. Every change is checked against a baseline test run before a pull request is opened. On a 24-instance slice of SWE-bench Lite. run on the production webhook path, Phoenix oracle-resolves 75% of instances with no pass-to-pass regressions on successful runs; this curated slice is not directly comparable to full-split leaderboard results, and we discuss the limits of the comparison. A complementary pilot on 42 real issues across 14 repositories yields 100% correctness preservation (CP; mean 122s on the hard tier). Manual inspection shows that about half of the resulting pull requests are well-targeted fixes. The other half place code at incorrect paths, a planner localization limitation we are addressing with retrieval. We also report the deployment failure modes (WAF filtering, token expiry, permission boundaries, flaky CI) that motivated each safety mechanism.

2606.20236 2026-06-19 cs.AI cs.LG cs.MA 交叉投稿

A Multi-Agent system for Multi-Objective constrained optimization

多目标约束优化的多智能体系统

Federica Filippini

发表机构 * University of Milano-Bicocca(米兰比可卡大学)

AI总结 提出MAMO,通过多智能体强化学习解耦任务执行与目标设计,自动学习奖励权重以平衡主目标优化与约束违反,提升动态环境下RL的自主性和鲁棒性。

Comments Presented at the 17th Workshop on Optimization and Learning in Multiagent Systems (OptLearnMAS, https://optlearnmas.github.io), co-located with the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

详情
AI中文摘要

计算和网络系统中的许多决策问题可以自然地表述为在性能约束下的成本最小化问题。在动态环境中,强化学习(RL)通常通过在运行时将成本和约束违反通过加权惩罚项嵌入到单个标量奖励中(遵循拉格朗日启发式公式)来解决此类问题。然而,在这种背景下,学习策略的行为关键取决于这些权重的选择,而权重通常是手动选择的。这使得难以在优化主要目标和有效避免约束违反之间找到适当的权衡,特别是在非平稳环境中,它们的相对重要性可能发生变化。本文提出了MAMO(多目标约束优化的多智能体系统),一种通过多智能体RL解决这种平衡问题的方法。MAMO通过将奖励权重的选择表述为一个学习问题,将任务执行与目标设计解耦,为动态环境中约束优化问题的更自主和鲁棒的基于RL的解决方案迈出了第一步。

英文摘要

Many decision-making problems in computing and networking systems can be naturally formulated as cost-minimization problems under performance constraints. In dynamic environments, reinforcement learning (RL) is often used to solve such problems at runtime by embedding both costs and constraint violations into a single scalar reward through weighted penalty terms, following a Lagrangian-inspired formulation. However, in this context the behavior of the learned policy critically depends on the choice of these weights, which are typically selected manually. This makes it difficult to identify an appropriate trade-off between optimizing the primary objective and effectively avoiding constraint violations, particularly in non-stationary environments where their relative importance may change. This paper presents MAMO (Multi-Agent system for Multi-Objective constrained optimization), an approach to tackle this balancing problem through multi-agent RL. MAMO decouples task execution from objective design by formulating the selection of reward weights as a learning problem, providing a !rst step towards more autonomous and robust RL-based solutions for constrained optimization problems in dynamic environments.

2606.20142 2026-06-19 cs.AI cs.MA 交叉投稿

RACL: Reasoning-Agent Control Layers for Continuous Metaheuristic Learning

RACL:用于连续元启发式学习的推理代理控制层

Antón Asla Manzárraga

AI总结 提出RACL方法,在元启发式优化器之上添加推理代理,通过观察、推理和干预控制搜索行为,在车辆路径问题上平均成本降低0.641%-8.337%。

Comments 10 pages, 5 tables

详情
AI中文摘要

本文介绍了RACL,一种用于元启发式算法的推理代理控制层。RACL在现有优化器之上放置一个推理代理。该代理不替换优化器,也不修改业务约束。相反,它通过观察操作内存、推理过去行为、制定有界假设、测试干预、评估结果、应用护栏、巩固有用策略并解释其决策来控制优化器的内部搜索行为。实验使用车辆路径作为测试平台,但贡献不是新的路由求解器、特定的ALNS配置或特定的路由规则集。贡献是RACL方法:一种推理代理发现、验证、巩固和解释元启发式算法控制规则的方式。在当前实验设置中,RACL在21个可行案例中的21个中改进或持平操作内存策略,在21个可行案例中的18个中改进或持平非推理停滞触发策略,平均RACL与STP成本差异为-0.641%。在Sevilla-9/10运行时样本中,RACL相对于Fixed平均成本降低-8.337%,相对于STP降低-1.605%,且没有显示实质性计算开销。在概念验证期间,Codex被用作循环推理代理,观察执行、解释日志并提出实时有界干预。后来仅使用策略代理使定量评估可重复。

英文摘要

This paper introduces RACL, a Reasoning-Agent Control Layer for metaheuristics. RACL places a reasoning agent above an existing optimizer. The agent does not replace the optimizer and does not modify business constraints. Instead, it controls the optimizer's internal search behavior by observing operational memory, reasoning over past behavior, formulating bounded hypotheses, testing interventions, evaluating outcomes, applying guardrails, consolidating useful policies and explaining its decisions. The experiment uses vehicle routing as a testbed, but the contribution is not a new routing solver, a particular ALNS configuration or a specific set of routing rules. The contribution is the RACL method: a way for a reasoning agent to discover, validate, consolidate and explain algorithmic control rules for a metaheuristic. In the current experimental setting, RACL improves or ties the Operational Memory Policy in 21 of 21 feasible cases and improves or ties a non-reasoning Stagnation-Triggered Policy in 18 of 21 feasible cases, with an average RACL vs STP cost delta of -0.641%. In the Sevilla-9/10 runtime sample, RACL improves average cost by -8.337% versus Fixed and -1.605% versus STP without showing material computational overhead. During the proof-of-concept, Codex was used as an in-the-loop reasoning agent observing executions, interpreting logs and proposing live bounded interventions. The policy proxy was later used only to make quantitative evaluation reproducible.

2606.20122 2026-06-19 cs.AI cs.MA 交叉投稿

ScaffoldAgent: Utility-Guided Dynamic Outline Optimization for Open-Ended Deep Research

ScaffoldAgent: 面向开放式深度研究的效用引导动态大纲优化

Zhibang Yang, Xinke Jiang, Yuzhen Xiao, Ruizhe Zhang, Yue Fang, XinFei Wan, Zhengxing Song, Yuxuan Liu, Yuheng Huang, Xu Chu, Junfeng Zhao, Yasha Wang

发表机构 * National Engineering Research Center of Software Engineering, Peking University(北京大学软件工程国家工程研究中心) School of Computer Science, Peking University(北京大学计算机学院) Key Laboratory of High Confidence Software Technologies, Ministry of Education(教育部高可信软件技术重点实验室) GRG Banking Equipment Co., Ltd.(广电运通金融电子股份有限公司) Center on Frontiers of Computing Studies, Peking University(北京大学计算前沿研究中心) Peking University Information Technology Institute (Tianjin Binhai)(北京大学(天津滨海)信息技术研究院)

AI总结 提出ScaffoldAgent框架,通过效用引导的动态大纲优化(扩展、收缩、修订操作)解决开放式深度研究中大纲漂移问题,在DeepResearch Bench和Gym上提升长报告生成与事实准确性。

Comments 9 pages, 6 figures

详情
AI中文摘要

开放式深度研究(OEDR)要求系统通过多轮检索获取知识并生成连贯的长篇报告。大纲作为协调检索、证据组织和生成的结构性支架起着核心作用。然而,现有方法要么在写作前固定大纲,要么使用局部启发式方法进行优化,导致在持续信息积累下出现大纲漂移,且评估大纲修改的反馈延迟。我们提出ScaffoldAgent,一种面向OEDR的效用引导动态大纲优化框架。ScaffoldAgent将大纲演化建模为结构化决策过程,包含三种操作:扩展、收缩和修订,从而实现对报告支架的受控更新。它进一步引入效用引导的反馈机制,通过检索增益、结构连贯性和试生成质量来估计每个大纲操作的下游价值。得到的效用信号指导推理过程中的节点选择、操作调度和终止。在DeepResearch Bench和DeepResearch Gym上的实验表明,ScaffoldAgent在长报告生成和事实基础上持续优于现有的深度研究智能体。

英文摘要

Open-ended deep research (OEDR) requires systems to acquire knowledge through multi-round retrieval and generate coherent long-form reports. The outline plays a central role as a structural scaffold that coordinates retrieval, evidence organization, and generation. However, existing methods either fix the outline before writing or refine it with local heuristics, leading to scaffold drift under continuous information accumulation and delayed feedback for evaluating outline modifications. We propose ScaffoldAgent, a utility-guided dynamic outline optimization framework for OEDR. ScaffoldAgent models outline evolution as a structured decision process with three operations: Expansion, Contraction, and Revision, enabling controlled updates to the report scaffold. It further introduces a utility-guided feedback mechanism that estimates the downstream value of each outline operation from retrieval gain, structural coherence, and trial-generation quality. The resulting utility signal guides node selection, operation scheduling, and termination during inference. Experiments on DeepResearch Bench and DeepResearch Gym show that ScaffoldAgent consistently improves long-form report generation and factual grounding over existing deep research agents.

2606.19920 2026-06-19 cs.RO cs.LG cs.MA 交叉投稿

Deep-Unfolded Coordination

深度展开协调

Hunter Kuperman, Minchan Jung, Rahul V. Ghosh, Alex Oshin, Evangelos A. Theodorou

发表机构 * Autonomous Control and Decision Systems Laboratory Georgia Institute of Technology United States(佐治亚理工学院自主控制与决策系统实验室)

AI总结 提出Deep Coordinator框架,通过深度展开ADMM-DDP迭代学习动态调整超参数,实现非凸优化器求解时自适应惩罚参数,在车队和四旋翼仿真中速度提升6.18-9.44倍且可扩展至8倍规模。

Comments The second and third authors contributed equally (equal second authorship). 35 pages (10 pages main text), 17 figures, 3 tables

详情
AI中文摘要

分布式优化是一种高度可扩展且结构透明的技术,用于解决多机器人问题;然而,这类方法通常需要高度专门化、针对特定问题的超参数调整。在这项工作中,我们提出了Deep Coordinator,一个深度展开框架,学习在求解时根据优化器性能动态调整ADMM-DDP(一种流行的机器人任务分布式求解器)的超参数。我们的架构包括将固定数量的ADMM-DDP迭代展开成一个神经网络,层之间具有可学习的函数,将优化器状态映射到下一个超参数。据我们所知,Deep Coordinator是第一个在求解时调整非凸优化器惩罚参数的深度展开框架;我们展示了主流的监督方法在训练此类模型时可能产生退化解,并提出了一种无监督学习方案。在车队和四旋翼飞行器的仿真中,Deep Coordinator生成的轨迹质量与常规求解器相当,但速度快6.18-9.44倍。此外,当部署到比训练规模大8倍的系统时,Deep Coordinator仍能保持其性能优势。

英文摘要

Distributed optimization is a highly scalable and structurally transparent technique to solve multi-agent robotics problems; however, such methods often suffer from the need for highly-specialized, problem-specific hyperparameter tunings. In this work, we propose Deep Coordinator, a deep-unfolding framework that learns to dynamically adjust the hyperparameters of ADMM-DDP, a popular distributed solver for robotics tasks, at solve-time in response to optimizer performance. Our architecture consists of unrolling a fixed number of ADMM-DDP iterations into a neural network with learnable functions between layers mapping the optimizer state to the next hyperparameters. To the best of our knowledge, Deep Coordinator is the first deep-unfolding framework to adapt the penalty parameters of a non-convex optimizer at solve-time; we show that the mainstream supervised approach can yield degenerate solutions when training such models, and propose an unsupervised learning scheme. On simulations with fleets of cars and quadrotors, Deep Coordinator produces trajectories of comparable quality 6.18-9.44x faster than conventional solvers. Furthermore, Deep Coordinator retains its performance benefits when deployed to systems up to 8x larger than trained on.

2606.19826 2026-06-19 cs.CR cs.MA 交叉投稿

Heterogeneous LLM Debate Under Adversarial Peers: Honest Gains, Replacement Costs, and Resilience

对抗性同伴下的异构LLM辩论:诚实增益、替代成本与韧性

Prashanti Nilayam, Kiran Kumar Ramanna, Prashil Tumbade, Sankalp Nayak

AI总结 研究异构LLM辩论中诚实与对抗性同伴对修正行为的影响,发现诚实同伴降低有害修正率,对抗性同伴则逆转,且异构性在已有对手时也能作为防御。

详情
AI中文摘要

异构LLM辩论的动机在于,多样化的同伴可以相互纠正,但同样的交流既携带纠正也携带对抗性影响。我们通过跟踪异构同伴如何改变诚实智能体的修正行为来衡量哪种影响占主导:他们改变答案的频率,以及这种改变是纠正性的还是有害的。我们比较了匹配面板(同质基线、诚实混合和对抗混合)以及受污染面板(其中已存在一个恶意的同族同伴),涵盖四个模型家族和三个推理基准。一个诚实的异构同伴显著降低了有害修正,而对抗性同伴则逆转了这一效果。对于Llama-3.1-70B防御者在MATH-hard上,诚实插槽的有害修正率从同质面板的89%下降到有诚实同伴时的35%,而对抗性同伴使其回到90%。条件率对弱防御者隐藏了这种损害,但辩论结束时的翻转率暴露了它。该模式在家族和基准上保持符号一致,而其幅度随防御者-基准机制变化。我们还测量了当已存在一个对抗性同族同伴时的效果:一个诚实的异构同伴降低了有害修正率以及最初正确答案丢失的比率。在相同的Llama-3.1-70B设置下,添加的诚实同伴将最初正确项上的翻转率从同族对手下的31%降至6%。因此,异构性不仅是一个攻击面,而且当对手已经存在时,也是一种防御。

英文摘要

Heterogeneous LLM debate is motivated by the promise that diverse peers correct one another, but the same exchange that carries correction also carries adversarial influence. We measure which dominates by tracking how a heterogeneous peer changes the honest agents' revision behavior: how often they change their answer, and whether the change is corrective or harmful. We compare matched panels (homogeneous baseline, honest-mixed, and adversarial-mixed) and contaminated panels in which a malicious same-family peer is already present, spanning four model families and three reasoning benchmarks. An honest heterogeneous peer sharply lowers harmful revision, and an adversarial one reverses it. For Llama-3.1-70B defenders on MATH-hard, the honest-slot harmful-revision rate falls from 89% in the homogeneous panel to 35% with an honest peer, and an adversarial peer returns it to 90%. The conditional rate hides this damage on weak defenders, but the end-of-debate flip rate exposes it. The pattern keeps its sign across families and benchmarks while its magnitude varies with the defender-benchmark regime. We also measure the effects when an adversarial same-family peer is already present: an honest heterogeneous peer lowers both harmful revision and the rate at which initially-correct answers are lost. On the same Llama-3.1-70B setting, the added honest peer cuts the flip rate on initially-correct items from 31% under a same-family adversary to 6%. Heterogeneity is therefore not only an attack surface but, when an adversary is already present, also a defense.

2606.19725 2026-06-19 cs.SE cs.AI cs.MA 交叉投稿

Library-Aware Doubles and Iterative Repair for Large Language Model-Generated Unit Tests in OpenSIL Firmware

面向OpenSIL固件中大语言模型生成的单元测试的库感知双打与迭代修复

Ma Toan Bach, Yuchi Zheng, Haingo Razafindranto, Tanvir Alam, Aric Leather, Ranveer Sandhu, Jitesh Arora

AI总结 针对OpenSIL固件单元测试因构建约束易失败的问题,提出LLM引导的多智能体自动化测试生成与迭代修复流程,在76个函数中73个生成可编译测试,行覆盖率达98.8%。

Comments 20 pages, 10 figures

详情
AI中文摘要

验证底层C固件中的变更成本高昂,因为单元测试(UT)在严格的构建约束下非常脆弱,缺失的头文件、未解析的符号和依赖不匹配经常阻止编译和链接。本研究为AMD维护的开源硅初始化库(openSIL)固件代码库引入了一种自动化的UT编写工作流程,通过大语言模型(LLM)引导的多智能体管道减少手动工作。该工作流程结合了测试框架的自动生成、库感知的桩、模拟和伪造的创建或重用,以及由构建日志和行覆盖率反馈驱动的迭代编译-分派修复循环。我们使用编译成功率、修复迭代次数、分派成功率和行覆盖率评估该方法,并以时间、成本和令牌使用量作为次要指标。在76个被测函数中,该工作流程为73个函数生成了可编译的UT。在没有行覆盖率指导或检索增强的配置下,平均行覆盖率达到73.9%。在两种配置下评估的48个函数子集中,仅使用行覆盖率指导时平均行覆盖率达到98.8%,与向量数据库检索结合时达到94.7%。结果表明,自动生成和修复管道可以显著提高受限固件环境中UT创建的效率和覆盖率,同时减少手动调试工作量。

英文摘要

Validating changes in low-level C firmware is expensive because unit tests (UTs) are fragile under strict build constraints, where missing headers, unresolved symbols, and dependency mismatches frequently prevent compilation and linking. This study introduces an automated UT authoring workflow for the Open-Source Silicon Initialization Library (openSIL) firmware codebase maintained by Advanced Micro Devices (AMD) that reduces manual effort through a large language model (LLM) guided multi-agent pipeline. The workflow combines automated generation of test scaffolds, library-aware creation or reuse of stubs, mocks, and fakes, and an iterative compile-dispatch repair loop driven by build logs and line-coverage feedback. We evaluate the approach using compilation success, repair iterations, dispatch success, and line coverage, with time, cost, and token usage as secondary measures. Across 76 functions under test, the workflow generated compilable UTs for 73 functions. In a configuration without line coverage guidance or retrieval augmentation, mean line coverage reached 73.9%. On a 48-function subset evaluated under both configurations, mean line coverage reached 98.8% with line-coverage guidance alone and reached 94.7% when combined with vector-database retrieval. Results show that automated generation-and-repair pipelines can substantially improve UT creation efficiency and coverage for constrained firmware environments while reducing manual debugging effort.

2606.19683 2026-06-19 cs.AI cs.MA cs.SY eess.SY 交叉投稿

Exit-and-Join Dynamics for Decentralized Coalition Formation

去中心化联盟形成的退出与加入动力学

Quanyan Zhu

发表机构 * New York University Tandon School of Engineering(纽约大学坦登工程学院) Department of Electrical and Computer Engineering(电气与计算机工程系)

AI总结 研究基于单边退出与加入决策的去中心化联盟形成动力学,利用Aumann-Dreze值计算个体收益,建立合作支付分配与非合作最优反应的关联,并分析均衡特征及成本对局部稳定性的影响。

详情
AI中文摘要

本文研究联盟形成作为一种由单边退出与加入决策驱动的去中心化动力学过程。智能体使用Aumann-Dreze值评估局部移动,因此收益在智能体当前联盟内计算,而非通过全局协商的联盟结构。由此产生的模型将合作支付分配与非合作最优反应行为联系起来:一个终端划分恰好是一个没有可接受的、个体有利可图的退出与加入偏离的联盟结构。我们建立了均衡特征,确定了动力学允许标量Lyapunov或精确势函数表示的条件,并分析了切换和接受成本如何塑造局部稳定性。数值实验测试了有限时间稳定、成本敏感性以及一个特殊的凸博弈基准。

英文摘要

This paper studies coalition formation as a decentralized dynamical process driven by unilateral exit-and-join decisions. Agents evaluate local moves using the Aumann-Dreze value, so payoffs are computed within the agent's current coalition rather than through a globally negotiated coalition structure. The resulting model links cooperative payoff allocation with noncooperative best-response behavior: a terminal partition is precisely a coalition structure with no admissible, individually profitable exit-and-join deviation. We establish equilibrium characterizations, identify conditions under which the dynamics admit scalar Lyapunov or exact-potential representations, and analyze how switching and acceptance costs shape local stability. Numerical experiments test finite-time stabilization, cost sensitivity, and a special convex-game benchmark.

2606.19632 2026-06-19 cs.RO cs.AI cs.LG cs.LO cs.MA 交叉投稿

Formal Verification of Learned Multi-Agent Communication Policies via Decision Tree Distillation

通过决策树蒸馏对学习到的多智能体通信策略进行形式化验证

Ahmad Farooq, Kamran Iqbal

发表机构 * University of Arkansas at Little Rock(阿肯色大学小石城分校)

AI总结 提出通过决策树蒸馏将多智能体强化学习策略转化为可解释模型,并利用PRISM进行形式化验证,确保安全属性转移至原始网络,在无人机编队任务中实现88.9%属性满足率。

Comments 9 pages, 3 figures, 7 tables. Accepted at the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2026), Pittsburgh, Pennsylvania, USA, September 27-October 1, 2026

详情
AI中文摘要

多智能体强化学习使智能体能够通过涌现通信发展协调策略,但神经策略缺乏无人机群和自动驾驶车队等安全关键机器人部署所需的形式化安全保证。我们提出了首个通过学习策略抽象进行安全验证的端到端框架:神经策略被蒸馏为可解释的决策树,然后进行形式化验证,并通过经验验证确认验证的安全属性可转移至原始网络。我们的四阶段流程包括:从智能体观测中提取领域特定特征;决策树蒸馏达到97.9% +/- 1.2%的神经策略保真度;自动翻译为PRISM概率模型检查器规范,具有完整的特征到状态变量对应关系;以及通过成对分解、联合界聚合和经验邻居建模对概率计算树逻辑属性进行组合验证。评估用于5-7个智能体多无人机协调的矢量量化变分信息瓶颈策略,我们验证了18个涵盖安全性、活性和合作的时间逻辑属性,实现了88.9%的属性满足率,所有五个安全阈值均满足(碰撞概率0.3% vs 阈值1%)。原始神经策略的蒙特卡洛验证确认验证的安全属性转移偏差<=0.6个百分点(95%置信区间)。离散VQ-VIB消息相比连续方法提供+11.6至+13.6个百分点的保真度优势,实现3-4倍更快的验证。我们的框架为蒸馏策略抽象提供了经验验证的安全验证,作为深度多智能体强化学习与多机器人部署形式化安全工作流之间的实用桥梁。

英文摘要

Multi-agent reinforcement learning (MARL) enables agents to develop coordination strategies through emergent communication, but neural policies lack the formal safety guarantees required for safety-critical robotic deployment in drone swarms and autonomous vehicle fleets. We present the first end-to-end framework for safety verification of learned multi-agent communication policies through policy abstraction: neural policies are distilled into interpretable decision trees, then formally verified, with empirical validation confirming that verified safety properties transfer to original networks. Our four-stage pipeline consists of domain-specific feature extraction from agent observations, decision tree distillation achieving 97.9% +/- 1.2% fidelity to neural policies, automated translation to PRISM probabilistic model checker specifications with complete feature-to-state-variable correspondence, and compositional verification of Probabilistic Computation Tree Logic (PCTL) properties via pairwise decomposition with union-bound aggregation and empirical neighbor modeling. Evaluating Vector-Quantized Variational Information Bottleneck (VQ-VIB) policies for multi-drone coordination with 5-7 agents, we verify 18 temporal logic properties across safety, liveness, and cooperation, achieving 88.9% property satisfaction with all five safety thresholds satisfied (0.3% collision probability vs. 1% threshold). Monte Carlo validation of original neural policies confirms that verified safety properties transfer with <=0.6 percentage-point deviation (95% CI). Discrete VQ-VIB messages provide +11.6 to +13.6 percentage-point fidelity advantages over continuous methods, enabling 3-4x faster verification. Our framework provides empirically validated safety verification for distilled policy abstractions, serving as a practical bridge between deep MARL and formal safety workflows for multi-robot deployment.

2606.19616 2026-06-19 cs.SE cs.AI cs.MA 交叉投稿

Before the Pull Request: Mining Multi-Agent Coordination

在拉取请求之前:挖掘多智能体协调

Dipankar Sarkar

AI总结 针对自主编码智能体在拉取请求中协调不足的问题,提出基于git的协调基板grite,通过事件日志减少重复和冲突工作,提升吞吐量,并自动恢复多种故障模式。

Comments 9 pages, 2 tables. LNCS format. Code, dataset, and mining toolkit: https://github.com/neul-labs/grite

详情
AI中文摘要

自主编码智能体现在可以开启数百万个拉取请求,然而大规模研究发现,它们的拉取请求虽然生成更快,但被接受的频率却更低——这是一个拉取请求级别的遥测无法解释的协调与信任差距。我们认为缺失的信号存在于拉取请求之前,即并发智能体如何声明、划分和碰撞共享工作。我们通过grite(我们的开源协调基板)来研究这一过程,它不需要中央服务器,并将其记录存储在git本身内部,因此其仅追加的、签名的事件日志直接捕获了协调过程。我们证明:(i) 这种共享基板以有限的开销减少了重复和冲突工作——仅重复队友任务的工作份额从78%降至0%,而有效吞吐量增加了三倍以上;(ii) 每个智能体的日志副本收敛到相同状态,没有写入被静默丢弃,而基于文件的跟踪器会丢失并发写入;(iii) 该日志是一个可挖掘的工件,从中可以自动恢复具体的故障模式——冲突编辑、锁饥饿、冗余发现、竞态关闭——并带有来源信息,其中一些在拉取请求历史中是不可见的。我们发布了数据集、测试平台和挖掘工具包。

英文摘要

Autonomous coding agents now open millions of pull requests, yet large-scale studies find their PRs are produced faster but accepted less often - a coordination and trust gap that pull-request-level telemetry cannot explain. We argue the missing signal lives before the PR, in how concurrent agents claim, divide, and collide over shared work. We study this process through grite, our open-source coordination substrate that needs no central server and stores its records inside git itself, so its append-only, signed event log captures the coordination process directly. We show that (i) this shared substrate reduces duplicate and conflicting work at bounded overhead - the share of work that merely re-does a teammate's task falls from 78% to 0% while useful throughput more than triples; (ii) every agent's copy of the log converges to the same state with no write silently dropped, where a file-based tracker loses concurrent writes; and (iii) the log is a mineable artefact from which concrete failure modes - conflicting edits, lock starvation, redundant rediscovery, race-to-close - are automatically recoverable with provenance, several invisible in pull-request history. We release the dataset, harness, and mining toolkit.

2606.19464 2026-06-19 cs.AI cs.MA 交叉投稿

Deontic Policies for Runtime Governance of Agentic AI Systems

面向自主AI系统运行时治理的道义策略

Anupam Joshi, Tim Finin, Karuna Pande Joshi, Lalana Kagal

发表机构 * CSEE Department UMBC Baltimore, MD, USA Center for AI UMBC Baltimore, MD, USA Information Systems Department UMBC Baltimore, MD, USA CSAIL MIT Cambridge, MA, USA

AI总结 针对大语言模型驱动的自主AI系统在安全、隐私和合规方面的治理挑战,提出AgenticRei框架,利用基于Rei的道义策略语言(OWL表示)在运行时通过逻辑引擎强制执行义务、豁免、冲突解决等治理约束,并兼容A2AS等标准。

Comments 10 pages, 1 figure. To be published in the 2026 IEEE Symposium on Agentic Services which is part of the IEEE Conference on Web Services

详情
AI中文摘要

由大语言模型驱动的自主AI系统引入了一类新的安全、隐私和合规挑战:能够调用工具、操作数据、安装软件并与跨组织边界对等代理协调的代理,不仅必须通过身份验证和访问控制来约束,还必须通过企业治理的完整结构来约束。这包括指定代理被允许和禁止做什么,它们在特定操作后必须做什么(例如,通知CISO),在什么条件下可以免除一项持续义务,以及当策略冲突时哪些规则优先。这个治理问题超出了当前策略引擎的能力范围。诸如XACML、Rego和Cedar等系统仅处理此治理结构的允许/禁止子集。它们不提供义务生命周期管理、元策略冲突解决、在特定情况下免除义务的豁免,以及通常在医疗、网络安全或数据隐私等应用中发现的领域类层次结构的本体推理。我们提出了AgenticRei,它实现了关键的治理需求,如义务、豁免、策略冲突解决和策略推理,以及基本的允许/禁止约束。我们使用基于Rei框架的道义策略语言,表示为OWL(Web本体语言),并由完全在LLM外部的高性能逻辑引擎在运行时评估。同一管道同时管理代理的工具调用和代理间消息。我们通过示例表明,道义策略捕获了当前生产引擎大多无法表达的安全和隐私治理约束。我们的方法自然地与A2AS等行业标准框架兼容。

英文摘要

Autonomous agentic AI systems driven by Large Language Models (LLMs) introduce a new class of security, privacy, and compliance challenges: an agent that can invoke tools, manipulate data, install software, and coordinate with peer agents across organizational boundaries must be constrained not just by authentication and access control, but by the full structure of enterprise governance. This includes specifying what agents are permitted and prohibited from doing, what they areobliged to do after certain actions (e.g., notify the CISO), under what conditions a standing obligation may be waived, and which rules take precedence when policies conflict. This governance problem exceeds what current policy engines provide. Systems such as XACML, Rego, and Cedar address only the permit/prohibit subset of this governance structure. They do not provide obligation lifecycle management, meta-policy conflict resolution, dispensations that waive obligations in specific circumstances, and ontological reasoning over domain class hierarchies commonly found in applications such as healthcare, cybersecurity, or data privacy. We propose AgenticRei, which realizes key governance requirements such as obligations, dispensations, policy conflict resolutions, and reasoning over policies, as well as the basic permit/prohibit constraints. We use a deontic policy language built on the Rei framework, expressed as OWL (Web Ontology Language) and evaluated at runtime by a high-performance logic engine entirely outside the LLM. The same pipeline governs both tool invocations by the agent and agent-to-agent messages. We show through examples that deontic policies capture governance constraints around security and privacy that mostly cannot be expressed in current production engines. Our approach composes naturally with industry-standard frameworks like A2AS.

2606.19370 2026-06-19 cs.LG cs.AI cs.MA 交叉投稿

Human-like autonomy emerges from self-play and a pinch of human data

类人自主性从自我对弈和少量人类数据中涌现

Daphne Cornelisse, Julian Hunt, Zixu Zhang, Waël Doulazmi, Kevin Joseph, Jaime Fernández Fisac, Eugene Vinitsky

发表机构 * NYU Tandon School of Engineering(纽约大学坦登工程学院) NYU Courant(纽约大学库朗数学科学研究所) Princeton University(普林斯顿大学) Centre for Robotics, Mines Paris(巴黎矿业大学机器人中心) Valeo(法雷奥)

AI总结 提出一种结合自我对弈强化学习与少量人类演示的正则化方法,仅用30分钟人类数据即可训练出与人类协调的驾驶策略,训练时间仅15小时。

Comments 10 pages

详情
AI中文摘要

自我对弈强化学习最近成为一种无需任何人类数据即可训练驾驶策略的方法。它利用廉价的大规模模拟来替代昂贵的大规模人类驾驶演示。这种方法的一个关键局限性是,通过纯自我对弈训练的策略可以学习有效但不符合人类习惯的驾驶惯例。先前的工作试图通过广泛的奖励工程和领域随机化来缓解这种行为偏差,但这些方法脆弱且劳动密集。我们的方法没有完全抛弃人类演示,而是将其作为最小安全目标达到奖励之上的正则化目标。就像好炖菜中的香料一样,我们发现少量人类数据大有裨益:我们的方法仅使用30分钟的人类演示,比同类模仿学习方法少2500倍。由此产生的策略与保留的人类轨迹协调,并在单个消费级GPU上15小时内完成训练。视频和完整源代码见https://this URL。

英文摘要

Self-play reinforcement learning has recently emerged as a way to train driving policies without any human data. It uses cheap, large-scale simulations to substitute expensive, large-scale human driving demonstrations. A key limitation of this approach is that policies trained through pure self-play can learn effective but alien driving conventions incompatible with people. Previous works attempt to mitigate such behavioral misalignments through extensive reward engineering and domain randomization, which are brittle and labor-intensive. Instead of completely discarding human demonstrations, our method treats them as a regularization objective on top of a minimal safe goal-reaching reward. Like the spice in a good stew, we find that a little human data goes a long way: our method uses only 30 minutes of human demonstrations, 2500x fewer than comparable imitation learning approaches. Resulting policies coordinate with held-out human trajectories and complete training in 15 hours on a single consumer-grade GPU. Videos and full source code are available at https://spiced-self-play.com/.

2606.19871 2026-06-19 math.OC cs.MA cs.SY eess.SY 交叉投稿

Semiglobal Input-Delay Tolerance Algorithm for Distributed Nonconvex Optimization of Networked Nonlinear Systems

网络化非线性系统分布式非凸优化的半全局输入延迟容忍算法

Jing-Zhe Xu, Zhi-Wei Liu, Ming-Feng Ge, Yan-Wu Wang, Dinxin He

AI总结 针对存在输入延迟和一致性约束的网络化非线性系统,提出一种半全局输入延迟容忍算法,通过分层设计和输入-状态稳定性分析,在Polyak-Łojasiewicz条件下实现非凸优化的分布式求解。

Comments 36 pages, 5 figures

详情
AI中文摘要

本文研究了一类受输入延迟和一致性约束的网络化非线性系统中的分布式优化问题。引入了输入延迟容忍半全局收敛(IDTSC),即对于任意给定的紧致初始集,存在一个可容许的延迟界,在该界下,最优解在一致性约束内被计算,并且所有节点状态收敛到该解。基于分层设计和输入-状态稳定性分析,开发了一种新的半全局输入延迟容忍(SIDT)算法,该算法在实际中实现了输入延迟与非线性动力学耦合下的分布式优化IDTSC。此外,通过Polyak-Łojasiewicz条件放宽严格凸性要求,SIDT算法将其适用性扩展到非凸优化。最后,数值实验验证了该理论在具有输入延迟的网络化非线性系统上的有效性。

英文摘要

This paper studies a class of distributed optimization problems in networked nonlinear systems (NNSs) subject to input delays and consensus constraints. It introduces input-delay tolerant semiglobal convergence (IDTSC), meaning that for any prescribed compact initial set there exists an admissible delay bound under which the optimal solution is computed within consensus constraints and all node states converge to the solution. Building on a hierarchical design and input-to-state stability analysis, a new semiglobal input-delay tolerant (SIDT) algorithm is developed that practically achieves IDTSC for distributed optimization under the coupling between input delays and nonlinear dynamics. Further, by relaxing strict convexity requirements through the Polyak-Łojasiewicz condition, the SIDT algorithm broadens its applicability to nonconvex optimization. Finally, numerical experiments corroborate the theory on NNSs with input delays.

2606.18191 2026-06-19 cs.AI cs.MA 交叉投稿

DRFLOW: A Deep Research Benchmark for Personalized Workflow Prediction

DRFLOW:用于个性化工作流预测的深度研究基准

Md Tawkat Islam Khondaker, Raymond Li, Muhammad Abdul-Mageed, Laks V. S. Lakshmanan, Issam H. Laradji

发表机构 * ServiceNow AI Research(ServiceNow人工智能研究)

AI总结 提出DRFLOW基准,评估AI代理从异构源预测个性化工作流的能力,包含5领域100任务,并设计7个诊断指标,实验显示现有代理性能有限。

详情
AI中文摘要

深度研究(DR)系统越来越多地用于复杂信息寻求任务,但现有工作主要关注生成报告和摘要。相比之下,许多企业任务需要代理识别具体的工作流,即一系列行动步骤。例如,代理不应总结预算政策,而应能确定回答诸如“在固定预算下如何申请新员工?”这类问题所需的步骤。因此,我们引入DRFLOW,一个用于评估代理从异构源预测个性化工作流的基准。每个任务要求代理从分散来源中识别相关证据,然后使用这些证据预测用户任务的正确行动步骤序列。DRFLOW包含跨五个领域的100个任务,1246个参考工作流步骤,基于超过3900个来源。我们定义了七个诊断指标,涵盖事实依据、步骤恢复、结构排序、条件解决和个性化。我们进一步提出DRFLOW-Agent(DRFA),一个面向工作流的参考代理,用于预测个性化工作流。我们表明,尽管DRFA相比强基线代理有所改进(平均F1分数提升高达10.02%),但在这些工作流指标上仍有很大的改进空间,表明预测完整且正确的个性化工作流仍然是深度研究的一个挑战性前沿。

英文摘要

Deep research (DR) systems are increasingly used for complex information-seeking tasks, but existing works mainly focus on generating reports and summaries. In contrast, many enterprise tasks instead require an agent to identify concrete workflows which is a sequence of action-steps. For example, rather than summarizing budgeting policies, an agent should be able to determine the steps needed to answer a question such as: "How do I request new headcount given a fixed budget?". Therefore, we introduce DRFLOW, a benchmark for evaluating personalized workflows predicted by agents from heterogeneous sources. Each task requires the agent to identify relevant evidence from scattered sources, then use that evidence to predict the correct action-step sequence for the user's task. DRFLOW contains 100 tasks across five domains, with 1,246 reference workflow steps grounded in more than 3,900 sources. We define seven diagnostic metrics covering factual grounding, step recovery, structural ordering, condition resolution, and personalization. We further present DRFLOW-Agent (DRFA), a workflow-oriented reference agent to predict personalized workflow. We show that although DRFA improves over strong baseline agents (upto 10.02% average F1 score), there is substantial room for improvement remains across these workflow metrics, indicating that predicting complete and correct personalized workflows remains a challenging frontier for deep research.

2606.06971 2026-06-19 cs.MA cs.SI 版本更新

Modeling U.S. Attitudes Toward China via an Event-Steered Multi-Agent Simulator

通过事件驱动的多智能体模拟器建模美国对华态度

Chenxu Zhu, Hantao Yao, Wu Liu, Junbo Guo, Yongdong Zhang

AI总结 提出事件驱动多智能体模拟器(ES-MAS),利用CURE数据集和双流数据集成引擎(DSDIE)及新闻驱动动态交互模块(NDDI),模拟美国对华舆论的动态演化,实验表明优于现有模型。

详情
AI中文摘要

理解舆论的动态演化,如美国公众对中国的态度,对于评估地缘政治风险至关重要。然而,现有的基于LLM的多智能体模拟器主要依赖静态规则和固定数据集,限制了其捕捉现实世界中宏观层面舆论转变的动态、事件驱动特性的能力。为解决这一限制,我们提出了一种事件驱动的多智能体模拟器(ES-MAS),其中重大事件和日常新闻通过智能体之间的动态交互持续驱动舆论演化。我们首先构建了中美关系演化(CURE)数据集,涵盖2021年至2025年的20个季度,包括258个重大事件和超过14,000篇日常新闻文章,为建模舆论动态提供了全面的时间基础。基于CURE数据集,我们提出了双流数据集成引擎(DSDIE),该引擎通过宏观层面事件将模拟与历史时间线对齐,同时基于个体智能体画像和上下文信号实现个性化信息暴露。此外,我们设计了新闻驱动的动态交互(NDDI)模块,该模块自适应地将具有共同新闻兴趣的智能体分组到局部交互上下文中,促进自下而上的共识形成,同时降低孤立信息茧房的风险。在CURE数据集上的实验结果表明,ES-MAS在复现真实世界历史趋势方面显著优于现有模拟器,为建模动态舆论演化提供了一个可扩展且有效的框架。

英文摘要

Understanding the dynamic evolution of opinions, such as U.S. public attitudes toward China, is essential for assessing geopolitical risks. However, existing LLM-based multiagent simulators predominantly rely on static rules and fixed datasets, limiting their ability to capture the dynamic, event-driven nature of macro-level opinion shifts in real-world settings. To address this limitation, we propose an Event-Steered Multi-Agent Simulator (ES-MAS), in which significant events and daily news continuously drive opinion evolution through dynamic interactions among agents. We first construct the China-U.S. Relation Evolution (CURE) dataset, covering 20 quarters from 2021 to 2025, including 258 major events and over 14,000 daily news articles, and providing a comprehensive temporal foundation for modeling opinion dynamics. Building upon the CURE dataset, we propose a Dual-Stream Data Integration Engine (DSDIE) that aligns simulations with historical timelines via macro-level events while enabling personalized information exposure based on individual agent profiles and contextual signals. Furthermore, we design a News-Driven Dynamic Interaction (NDDI) module, which adaptively groups agents with shared news interests into localized interaction contexts, facilitating bottom-up consensus formation while mitigating the risk of isolated information cocoons. Experimental results on the CURE dataset demonstrate that ES-MAS substantially outperforms existing simulators in reproducing real-world historical trends, offering a scalable and effective framework for modeling dynamic opinion evolution.

2605.22748 2026-06-19 cs.RO cs.AI cs.LG cs.MA 版本更新

Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning

通过多智能体强化学习实现超人类安全且敏捷的赛车

Ismail Geles, Leonard Bauersfeld, Markus Wulfmeier, Davide Scaramuzza

发表机构 * Robotics and Perception Group, University of Zurich(苏黎世大学机器人与感知组) Google DeepMind(谷歌深Mind) Nomagic

AI总结 本文提出通过多智能体强化学习在高速四旋翼赛车中实现安全且敏捷的性能,展示了多智能体交互对真实世界交互安全性的关键作用,同时在高速赛车中超越人类飞行员并减少碰撞率。

Comments 12 pages (+4 supplementary). Website: https://rpg.ifi.uzh.ch/marl

详情
AI中文摘要

自主系统在孤立或模拟环境中已实现超人类性能,但在共享、动态的真实世界空间中仍显得脆弱。这种失败源于物理应用中主导的单智能体范式,其中其他参与者被忽略或视为环境噪声,阻碍了有效协调。本文证明多智能体强化学习为真实世界交互提供了必要的安全性基础。使用高速四旋翼赛车作为高风险测试平台,训练智能体在复杂空气动力学相互作用和战略机动中导航,具有可变数量的赛车。通过联赛基于的自我对战,智能体进化出复杂的前瞻性行为,包括主动避障、超车和处理多智能体物理交互,包括空气动力学下洗。我们的智能体在超过22米/秒的速度下多玩家赛车中超越了冠军级人类飞行员,同时与最先进的单智能体基线相比,碰撞率减少了50%。关键的是,使用多样化的人工智能体进行训练能够实现零样本泛化到更安全的人类交互。这些结果表明,实现稳健的机器人共存的路径不在于孤立的安全约束,而在于多智能体交互的严格要求。多媒体材料可在:https://rpg.ifi.uzh.ch/marl

英文摘要

Autonomous systems have achieved superhuman performance in isolation or simulation, yet they remain brittle in shared, dynamic real-world spaces. This failure stems from the dominant single-agent paradigm for physical applications, where other actors are ignored or treated as environmental noise, preventing effective coordination. Here we show that multi-agent reinforcement learning provides the essential safety scaffolding required for real-world interaction. Using high-speed quadrotor racing as a high-stakes testbed, we train agents to navigate complex aerodynamic interactions and strategic maneuvering with a variable number of racers. Through league-based self-play, agents evolve sophisticated anticipatory behaviors, including proactive collision avoidance, overtaking, and handling multi-agent physical interactions, including aerodynamic downwash. Our agents outperform a champion-level human pilot in multi-player races at speeds exceeding 22 m/s, while simultaneously reducing collision rates by 50 % compared to state-of-the-art single-agent baselines. Crucially, training with diverse artificial agents enables zero-shot generalization to safer human interaction. These results suggest that the path to robust robotic co-existence lies not in isolated safety constraints, but in the rigorous demands of multi-agent interaction. Multimedia materials are available at: https://rpg.ifi.uzh.ch/marl

2507.00875 2026-06-19 cs.CL cs.HC cs.MA 版本更新

TransLaw: A Large-Scale Dataset and Multi-Agent Benchmark Simulating Professional Translation of Hong Kong Case Law

TransLaw:模拟香港判例法专业翻译的大规模数据集与多智能体基准

Xi Xuan, Chunyu Kit

发表机构 * City University of Hong Kong, Hong Kong SAR, China(香港城市大学)

AI总结 针对香港判例法英译中资源匮乏、法律术语和格式要求严格的问题,构建了首个大规模句对齐平行语料库HKCFA Judgment 97-22,并提出多智能体框架TransLaw,通过分解翻译任务、集成法律词汇库和检索增强生成,显著提升翻译质量,但仍未达到人类专家的风格自然度。

Comments Accepted at ICML 2026 - AI for Law

详情
AI中文摘要

根据《基本法》第8-9条,香港法院判决书需从英文翻译成繁体中文,但由于平行资源短缺以及对法律术语、引用格式和司法风格的严格要求,这一任务仍受到限制。我们引入了HKCFA Judgment 97-22,这是首个用于香港判例法的大规模句对齐平行语料库,包含344份专业翻译的判决书(11,099个句对;210万词元),涵盖1997年至2022年。基于这一资源,我们提出了TransLaw,一个多智能体框架,将翻译分解为词级表达、句级翻译和多维审查,集成了专门的香港法律词汇数据库、检索增强生成和迭代反馈,并包括涵盖语义对齐、术语、引用和风格的四维专家审查。通过对13个开源和商业大语言模型进行基准测试,我们证明TransLaw在所有评估模型上均显著优于单智能体基线,并在3次迭代内收敛。由10名持证法律翻译人员使用我们提出的Legal ACS指标进行的人工评估证实了法律语义准确性的提升,同时表明TransLaw在风格自然度上仍落后于人类专家。数据集和基准代码可在以下网址获取:https://xxx。

英文摘要

Translating Hong Kong Court Judgments from English to Traditional Chinese is mandated by Articles 8-9 of the Basic Law, yet remains constrained by a shortage of parallel resources and rigorous demands on legal terminology, citation format, and judicial style. We introduce HKCFA Judgment 97-22, the first large-scale sentence-aligned parallel corpus for HK case law, comprising 344 professionally translated judgments (11,099 sentence pairs; 2.1M tokens) spanning 1997-2022. Building on this resource, we propose TransLaw, a multi-agent framework that decomposes translation into word-level expression, sentence-level translation, and multidimensional review, integrating a specialized Hong Kong legal glossary database, Retrieval-Augmented Generation, and iterative feedback, with four-dimensional expert review covering semantic alignment, terminology, citation, and style. Benchmarking 13 open-source and commercial LLMs, we demonstrate that TransLaw significantly outperforms single-agent baselines across all evaluated models, with convergence within 3 iterations. Human evaluation by 10 certified legal translators using our proposed Legal ACS metric confirms gains in legal-semantic accuracy, while showing that TransLaw still trails human experts in stylistic naturalness. The dataset and benchmark code are available at https://github.com/xuanxixi/TransLaw.

2511.17625 2026-06-19 cs.MA cs.GT 版本更新

Iterative Negotiation and Oversight: A Case Study in Decentralized Air Traffic Management

迭代协商与监督:去中心化空中交通管理案例研究

Jaehan Im, John-Paul Clarke, Ufuk Topcu, David Fridovich-Keil

AI总结 提出一种受监管的去中心化协商框架,通过交易拍卖实现共识,并引入税收式监督机制引导系统效率和公平性,理论保证有限时间终止,案例验证了框架在去中心化空中交通管理中的有效性。

详情
AI中文摘要

在去中心化多智能体系统中,自利智能体通常具有冲突偏好,达成共识仍然具有挑战性。现有的协调方法使智能体无需中央协调员即可达成共识,但无法对系统级目标(如效率或公平性)提供正式保证。为解决这一局限,我们提出一个受监管的去中心化协商框架,该框架通过有限的监管监督增强去中心化协商机制。该框架基于交易拍卖达成共识,使具有冲突偏好的自利智能体能够通过资产交易进行协商,同时避免直接披露私有资产估值。我们引入一种监督机制,实施类似税收的干预,引导去中心化协商走向系统高效和公平的结果,同时调节框架的收敛速度。我们建立了有限时间终止的理论保证,并推导出系统效率和收敛速度与监管干预水平相关的界限。基于美国空中交通管理中的协作航迹选项计划(一个改道倡议)的案例研究表明,该框架能够可靠地在自利空域扇区管理者之间达成共识,并揭示了监管干预水平如何调节系统效率与收敛速度之间的关系。综合理论和实验结果表明,所提出的框架提供了一种受监管的去中心化协调机制,在维护非合作最终选择的同时保障系统级目标。

英文摘要

Achieving consensus among self-interested agents remains challenging in decentralized multi-agent systems, where agents often have conflicting preferences. Existing coordination methods enable agents to reach consensus without a centralized coordinator, but do not provide formal guarantees on system-level objectives such as efficiency or fairness. To address this limitation, we propose a regulated decentralized negotiation framework that augments a decentralized negotiation mechanism with limited regulatory oversight. The framework builds upon the trading auction for consensus, enabling self-interested agents with conflicting preferences to negotiate through asset trading while avoiding direct disclosure of private asset valuations. We introduce an oversight mechanism, which implements a taxation-like intervention that guides decentralized negotiation toward system-efficient and equitable outcomes while also regulating how fast the framework converges. We establish theoretical guarantees of finite-time termination and derive bounds linking system efficiency and convergence rate to the level of regulatory intervention. A case study based on the collaborative trajectory options program, a rerouting initiative in U.S. air traffic management, demonstrates that the framework can reliably achieve consensus among self-interested airspace sector managers, and reveals how the level of regulatory intervention regulates the relationship between system efficiency and convergence speed. Taken together, the theoretical and experimental results indicate that the proposed framework provides a mechanism for regulated decentralized coordination that preserves noncooperative final selection while safeguarding system-level objectives.

2501.17015 2026-06-19 cs.AI cs.MA cs.RO 版本更新

UniMM: A Unified Mixture Model Framework for Multi-Agent Simulation

UniMM:一种用于多智能体仿真的统一混合模型框架

Longzhong Lin, Xuewu Lin, Kechun Xu, Haojian Lu, Lichao Huang, Rong Xiong, Yue Wang

发表机构 * Zhejiang University(浙江大学) Horizon Robotics

AI总结 提出UniMM框架统一回归混合模型与离散NTP模型,通过闭环样本生成缓解分布偏移,并在WOSAC基准上取得最优性能。

Comments Accepted author manuscript. The version of record has been published in IEEE Transactions on Pattern Analysis and Machine Intelligence

Journal ref IEEE Transactions on Pattern Analysis and Machine Intelligence, Early Access, 2026

详情
AI中文摘要

仿真在评估自动驾驶系统中起着关键作用,其中生成逼真的多智能体行为是一个关键方面。在多智能体仿真中,主要挑战包括行为多模态性和闭环分布偏移。在本研究中,我们提出了一个统一的混合模型(UniMM)框架,用于生成多模态智能体行为,该框架涵盖了主流方法,包括基于回归的混合模型和离散NTP模型。此外,我们引入了一种针对混合模型的闭环样本生成方法,以缓解分布偏移。在UniMM框架内,我们从模型和数据角度识别了关键配置。我们对各种模型配置进行了系统检查,并全面描述了它们的效果。此外,我们对数据配置的研究强调了闭环样本在实现逼真仿真中的关键作用。为了将闭环样本的优势扩展到更广泛的混合模型中,我们进一步引入了一种时间解缠和对齐机制,以解决捷径学习和离策略学习问题。利用我们探索的见解,UniMM框架内提出的不同变体,包括离散模型、无锚模型和基于锚点的模型,均在WOSAC基准上取得了最先进的性能。

英文摘要

Simulation plays a crucial role in assessing autonomous driving systems, where the generation of realistic multi-agent behaviors is a key aspect. In multi-agent simulation, the primary challenges include behavioral multimodality and closed-loop distributional shifts. In this study, we formulate a unified mixture model (UniMM) framework for generating multimodal agent behaviors, which can cover the mainstream methods including regression-based mixture models and discrete NTP models. Furthermore, we introduce a closed-loop sample generation approach tailored for mixture models to mitigate distributional shifts. Within the UniMM framework, we recognize critical configurations from both the model and data perspectives. We conduct a systematic examination of various model configurations, and comprehensively characterize their effects. Moreover, our investigation into the data configuration highlights the pivotal role of closed-loop samples in achieving realistic simulations. To extend the benefits of closed-loop samples across a broader range of mixture models, we further introduce a temporal disentanglement-and-alignment mechanism to address the shortcut learning and off-policy learning issues. Leveraging insights from our exploration, the distinct variants proposed within the UniMM framework, including discrete, anchor-free, and anchor-based models, all achieve state-of-the-art performance on the WOSAC benchmark.