arXivDaily: arXiv daily academic digest, updated Monday through Friday

AI Large Models

AI Agent

Agents, tool calling, planning, workflows, multi-agent systems, and autonomous task execution.

Indexed today (current date): 65 papers. Sources: cs.AI, cs.CL, cs.LG, cs.SE

1. Planning and Decision-Making (4 papers)

2606.18537 2026-06-18 cs.LG New submission 65%

Do as the Romans Do: Learning Universal Behaviors from Heterogeneous Agents

Caleb Chang, Davin Win Kyi, Natasha Jaques, Karen Leung

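Affiliations: University of Washington, NVIDIA

Topic match: Planning and Decision-Making. Extracts a general reward to train a generalist agent.

AI summary: Proposes GRID, a method that extracts a general reward from heterogeneous demonstrators pursuing different goals and trains a generalist agent on it to learn universal environmental competencies, avoiding mode-averaging bias and making fine-tuning to downstream tasks more efficient.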

Abstract

Humans often acquire new skills by observing others, since observed behaviors implicitly reveal how to act in an environment. However, observations drawn from a heterogeneous population introduce conflicting behavioral signals, making it difficult to determine which behaviors are worth imitating. We address this challenge with General Reward Inference and Disentanglement (GRID), a social learning method that extracts universally useful behaviors from a heterogeneous population of demonstrators pursuing different goals. GRID decomposes per-agent reward functions into a general reward, capturing behaviors shared across all agents, and specific rewards, capturing individual preferences and objectives. Training exclusively on the general reward provides a new paradigm of generalist pretraining. It yields a generalist agent that internalizes universal environmental competencies, such as safety and basic task proficiency, without the mode-averaging bias that afflicts standard learning from demonstration techniques. This generalist serves as a superior prior for fine-tuning to downstream tasks, including preferences unseen during training. Experiments across a synthetic basis function decomposition, multi-agent Craftax, and a continuous autonomous driving simulator (Highway-Env) confirm that GRID successfully disentangles reward structure in a semantically meaningful way, outperforms standard learning from demonstration baselines, and enables more efficient and stable specialization.

2606.18730 2026-06-18 cs.RO cs.AI math.CO math.OC New submission 60%

Two-Phase Bilevel Search for the Moving-Target Traveling Salesman Problem with Moving Obstacles

Allen George Philip, Anoop Bhat, Sivakumar Rathinam, Howie Choset

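Affiliations: Texas A&M University, Carnegie Mellon University

Topic match: Planning and Decision-Making. Two-phase bilevel search algorithm for the moving-target TSP.

AI summary: For the Moving-Target Traveling Salesman Problem with Moving Obstacles, proposes a mixed-integer conic programming formulation and a two-phase bilevel search algorithm, both of which significantly outperform the baseline in success rate, solution cost, and computation time.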

Abstract

The Moving-Target Traveling Salesman Problem (MT-TSP) seeks a minimum cost trajectory for an agent that departs from a static depot, visits a set of moving targets, each within one of their assigned time windows, and returns to the depot. In this article, we study the Moving-Target Traveling Salesman Problem with Moving Obstacles (MT-TSP-MO), a generalization of the MT-TSP where the agent trajectory must avoid moving obstacles. We present a Mixed-Integer Conic Programming (MICP) formulation that can be solved using off-the-shelf solvers, as well as a fast and scalable Two-Phase Bilevel Search (TPBS) algorithm that computes high-quality feasible solutions for the problem. We evaluate our approaches against an existing baseline algorithm on a broad range of problem instances with up to 40 targets and 40 obstacles. The results demonstrate that both the proposed methods significantly outperform the baseline with respect to success rates, solution costs, and computation time.
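One geometric building block behind moving-target routing can be sketched in a few lines: the earliest time at which a speed-limited agent can reach a target that moves in a straight line. This is only an illustration under stated assumptions, not the paper's MICP formulation or TPBS algorithm. The constant-velocity target model, the function name `earliest_intercept_time`, and all parameters are hypothetical, and time windows and obstacles are ignored.

```python
import math

def earliest_intercept_time(agent_pos, target_pos, target_vel, agent_speed):
    """Earliest t >= 0 at which an agent moving at agent_speed can meet a
    target that starts at target_pos and moves with constant target_vel.
    Returns None if the target can never be reached. (Illustrative helper.)"""
    dx = target_pos[0] - agent_pos[0]
    dy = target_pos[1] - agent_pos[1]
    vx, vy = target_vel
    # Meeting condition |d + v t| = s t, squared:
    # (v.v - s^2) t^2 + 2 (d.v) t + d.d = 0
    a = vx * vx + vy * vy - agent_speed ** 2
    b = 2.0 * (dx * vx + dy * vy)
    c = dx * dx + dy * dy
    if c == 0:
        return 0.0  # agent already at the target
    if abs(a) < 1e-12:
        # Equal speeds: the equation is linear, b t + c = 0
        return -c / b if b < 0 else None
    disc = b * b - 4.0 * a * c
    if disc < 0:
        return None
    root = math.sqrt(disc)
    candidates = [(-b - root) / (2.0 * a), (-b + root) / (2.0 * a)]
    feasible = [t for t in candidates if t >= 0]
    return min(feasible) if feasible else None
```

A tour cost for a fixed visiting order could then be accumulated by chaining such intercepts, which is the kind of inner computation a search over visiting orders would call repeatedly.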

2412.15472 2026-06-18 cs.GT econ.TH 60%

On the Fairness of Additive Welfarist Rules

Karen Frilya Celine, Warut Suksompong, Sheung Man Yuen

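Topic match: Planning and Decision-Making. Studies fair allocation rules, related to multi-agent systems.

AI summary: Studies the fairness of additive welfarist rules in fair division, showing that MNW remains the only additive welfarist rule that guarantees EF1 on identical-good instances, two-value instances, and normalized instances with three or more agents, and characterizes the other rules that guarantee EF1 when utilities are integers.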

Comments Appears in the 24th International Conference on Autonomous Agents and Multiagent Systems (AAMAS), 2025

Journal ref ACM Transactions on Economics and Computation, 14(2):5 (2026)

Abstract

Allocating indivisible goods is a ubiquitous task in fair division. We study additive welfarist rules, an important class of rules which choose an allocation that maximizes the sum of some function of the agents' utilities. Prior work has shown that the maximum Nash welfare (MNW) rule is the unique additive welfarist rule that guarantees envy-freeness up to one good (EF1). We strengthen this result by showing that MNW remains the only additive welfarist rule that ensures EF1 for identical-good instances, two-value instances, as well as normalized instances with three or more agents. On the other hand, if the agents' utilities are integers, we demonstrate that several other rules offer the EF1 guarantee, and provide characterizations of these rules for various classes of instances.
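To make the two notions in the abstract concrete, the following sketch brute-forces a maximum Nash welfare allocation on a tiny instance and checks envy-freeness up to one good. It is a toy illustration, not code from the paper. The helper names and the example utilities are made up, utilities are assumed additive, and the instance is assumed to admit an allocation in which every agent has positive utility, so no tie-breaking for zero products is implemented.

```python
from itertools import product
from math import prod

def bundles(assignment, n_agents):
    """Turn a tuple 'owner of good g' into a list of bundles, one per agent."""
    result = [[] for _ in range(n_agents)]
    for good, owner in enumerate(assignment):
        result[owner].append(good)
    return result

def utility(utils, agent, bundle):
    """Additive utility of `agent` for a bundle of goods."""
    return sum(utils[agent][g] for g in bundle)

def mnw_allocation(utils):
    """Brute-force the allocation maximizing the product of utilities."""
    n, m = len(utils), len(utils[0])
    best, best_value = None, -1
    for assignment in product(range(n), repeat=m):
        alloc = bundles(assignment, n)
        value = prod(utility(utils, i, alloc[i]) for i in range(n))
        if value > best_value:
            best, best_value = alloc, value
    return best

def is_ef1(utils, alloc):
    """EF1: any envy of i toward j vanishes after removing one good from j."""
    n = len(utils)
    for i in range(n):
        own = utility(utils, i, alloc[i])
        for j in range(n):
            if i == j or not alloc[j]:
                continue
            other = utility(utils, i, alloc[j])
            best_removed = max(utils[i][g] for g in alloc[j])
            if own < other - best_removed:
                return False
    return True

# Hypothetical instance: utils[i][g] is agent i's value for good g.
example_utils = [[6, 1, 3], [2, 5, 4]]
example_alloc = mnw_allocation(example_utils)
```

On this instance the search returns the bundles `[[0], [1, 2]]` (Nash product 6 × 9 = 54), and `is_ef1(example_utils, example_alloc)` holds. The enumeration is exponential in the number of goods, so it is only suitable for checking small examples by hand.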

2606.19175 2026-06-18 econ.TH New submission 55%

To Gamble, Perchance to Grow

Mark Whitmeyer

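Topic match: Planning and Decision-Making. Studies the growth-optimal portfolio problem, which involves decision optimization.

AI summary: Studies transformations of returns in the growth-optimal (Kelly) portfolio problem, characterizes the transforms that yield more conservative portfolios, and derives a comparative risk aversion result for rationally inattentive agents.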

Abstract

I study transformations of returns in the growth-optimal (Kelly) portfolio problem. In the one-safe-one-risky-asset problem, a return transform f universally produces a more conservative portfolio if and only if f is concave and strictly increasing and r/f is convex. As a corollary, I characterize comparative risk aversion for a rationally-inattentive agent: a more risk-averse agent is one who is sufficiently more risk averse in the Pratt (1964) sense.
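For readers unfamiliar with the baseline problem, here is a minimal sketch of the growth-optimal (Kelly) choice with one safe and one risky asset. It assumes a binary risky bet that pays `b` per unit staked with probability `p` and loses the stake otherwise, and a safe asset with zero return. These modelling choices and the function names are assumptions for illustration only. The sketch does not implement the paper's characterization of return transforms.

```python
import math

def kelly_fraction(p, b):
    """Closed-form growth-optimal fraction of wealth to stake on a bet that
    pays b per unit with probability p and loses the stake otherwise."""
    return max(0.0, p - (1.0 - p) / b)

def expected_log_growth(fraction, p, b):
    """Expected log growth of wealth when staking `fraction` on the bet."""
    return p * math.log(1.0 + b * fraction) + (1.0 - p) * math.log(1.0 - fraction)

def grid_search_fraction(p, b, steps=10_000):
    """Numerical cross-check: maximize expected log growth over a grid.
    The grid stops short of 1.0 to avoid log(0)."""
    candidates = [k / steps for k in range(steps)]
    return max(candidates, key=lambda x: expected_log_growth(x, p, b))
```

For an even-money bet (`b = 1`) won with probability 0.6, the closed form gives a stake of 0.2 of wealth, and the grid search lands on the same value. A "more conservative portfolio" in the abstract's sense corresponds to a smaller stake in the risky asset than this baseline.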

2. Other Agents (1 paper)

2505.03863 2026-06-18 cs.CR cs.AI 55%

Data-Driven Falsification of Cyber-Physical Systems

Atanu Kundu, Sauvik Gon, Rajarshi Ray

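Affiliations: Indian Association for the Cultivation of Science

Topic match: Other Agents. Data-driven falsification of cyber-physical systems, related to agent verification.

AI summary: Proposes a framework that connects the falsification of cyber-physical systems with the falsification of deep neural networks and uses the interpretability of decision trees to speed up falsification, with promising results in efficiently finding multiple counterexamples on the ARCH-COMP 2024 falsification benchmarks.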

Abstract

Cyber-Physical Systems (CPS) are abundant in safety-critical domains such as healthcare, avionics, and autonomous vehicles. Formal verification of their operational safety is, therefore, of utmost importance. In this paper, we address the falsification problem, where the focus is on searching for an unsafe execution in the system instead of proving that none exists. The contribution of this paper is a framework that (a) connects the falsification of CPS with the falsification of deep neural networks (DNNs) and (b) leverages the inherent interpretability of Decision Trees for faster falsification of CPS. This is achieved by: (1) building a surrogate model of the CPS under test, either as a DNN model or a Decision Tree, (2) applying various DNN falsification tools to falsify CPS, and (3) a novel falsification algorithm guided by the explanations of safety violations of the CPS model extracted from its Decision Tree surrogate. The proposed framework has the potential to exploit a repertoire of adversarial attack algorithms designed to falsify robustness properties of DNNs, as well as state-of-the-art falsification algorithms for DNNs. Although the presented methodology is applicable to systems that can be executed/simulated in general, we demonstrate its effectiveness, particularly in CPS. We show that our framework, implemented as a tool FlexiFal, can detect hard-to-find counterexamples in CPS that have linear and non-linear dynamics. Decision tree-guided falsification shows promising results in efficiently finding multiple counterexamples in the ARCH-COMP 2024 falsification benchmarks.
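The falsification problem itself, searching for an unsafe execution rather than proving safety, can be illustrated with a very small sketch. What follows is plain random search on a made-up scalar plant. It is not FlexiFal and uses neither a DNN surrogate nor a decision tree. The plant, the safety limit, the input ranges, and every function name are assumptions chosen for the example.

```python
import random

def simulate(x0, u, steps=50):
    """Toy plant x[k+1] = 0.9 * x[k] + u with constant input u.
    Returns the full state trajectory, starting at x0."""
    xs = [x0]
    for _ in range(steps):
        xs.append(0.9 * xs[-1] + u)
    return xs

def robustness(trajectory, limit=8.0):
    """Margin to the safety property 'x never exceeds limit'.
    A negative value means the trajectory violates the property."""
    return min(limit - x for x in trajectory)

def falsify(num_samples=1000, seed=0):
    """Sample initial states and inputs uniformly from [0, 1] and return the
    first sample whose simulated trajectory is unsafe, or None if none is found."""
    rng = random.Random(seed)
    for _ in range(num_samples):
        x0 = rng.uniform(0.0, 1.0)
        u = rng.uniform(0.0, 1.0)
        rho = robustness(simulate(x0, u))
        if rho < 0:
            return {"x0": x0, "u": u, "robustness": rho}
    return None
```

Calling `falsify()` returns a counterexample dictionary when the search succeeds. In this toy plant the state settles near `10 * u`, so large inputs drive it past the limit, and a guided method would aim to concentrate samples in that region instead of sampling uniformly.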