2606.05142 2026-06-04 cs.CV cs.AI updated version

GeM-NR: Geometry-Aware Multi-View Editing for Nonrigid Scene Changes

Josef Bengtson, Yaroslava Lochman, Fredrik Kahl

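Affiliations: Chalmers University of Technology

AI summary: Proposes GeM-NR, a fast, flexible, training-free method that achieves multi-view consistent, general nonrigid image editing through depth-map alignment, projection onto the query viewpoint, and conditioned refinement, supporting substantial changes in geometry and appearance.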

Comments Project page: https://gem-nr.github.io/

Abstract

Recent developments in multi-view image editing with generative models have brought us a step closer toward general 3D content generation and customization. Most existing works focus on rigid or appearance-only edits by utilizing the geometry of the unedited scene. This naturally limits these methods to edits that preserve the underlying scene structure. Other approaches are trained for specific image editing tasks, such as object removal and addition. Despite this progress, general nonrigid edits, i.e., edits that substantially change the scene geometry, remain challenging for existing methods. We propose GeM-NR, a fast and flexible training-free approach for general multi-view consistent image editing, including edits that drastically change the geometry and appearance of the scene. Given an anchor image edited with a chosen backbone editor (such as FLUX, Qwen, BrushNet) and a query unedited image, GeM-NR edits the query image consistently with the anchor edit. The method incorporates multiple stages: (i) depth map estimation, where we propose a strategy to maximize the alignment between the 3D point clouds of the edited and unedited scenes, (ii) projection onto a query viewpoint, and (iii) refinement of the obtained image conditioned on the unedited query. The conditioning-based formulation scales well from two to many views of an object. We demonstrate the ability of our method to handle edits with significant changes in geometry and appearance, something that existing methods struggle with. We perform an extensive evaluation showing that our method improves consistency for a wide variety of edit tasks, including generating 3D representations of the edited scene. Both quantitative and qualitative results indicate the state-of-the-art performance of our method in terms of edit quality as well as geometric and photometric consistency across multiple views.
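
Stage (ii), projection onto a query viewpoint, rests on standard pinhole geometry. The following is a minimal sketch of that geometry only, assuming a pinhole camera with intrinsics (fx, fy, cx, cy) shared by both views and a known rigid transform (R, t) from the anchor camera to the query camera. The function names and the camera model are illustrative assumptions, not GeM-NR's implementation.

```python
def backproject(u, v, depth, fx, fy, cx, cy):
    """Lift pixel (u, v) with metric depth to a 3D point in the camera frame."""
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return (x, y, depth)


def transform(point, R, t):
    """Apply the rigid transform p' = R p + t (R is a 3x3 nested list)."""
    return tuple(
        sum(R[i][j] * point[j] for j in range(3)) + t[i] for i in range(3)
    )


def project(point, fx, fy, cx, cy):
    """Project a camera-frame 3D point to pixels; None if it lies behind the camera."""
    x, y, z = point
    if z <= 0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def reproject_pixel(u, v, depth, K, R, t):
    """Map an anchor-view pixel into the query view (same intrinsics K assumed)."""
    fx, fy, cx, cy = K
    p_anchor = backproject(u, v, depth, fx, fy, cx, cy)
    p_query = transform(p_anchor, R, t)
    return project(p_query, fx, fy, cx, cy)
```

Applying `reproject_pixel` to every pixel of an edited anchor image with its depth map gives a sparse warped image in the query view; holes and occlusions are what a refinement stage would then have to fill.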

2606.05130 2026-06-04 cs.LG cs.AI updated version

Who Needs Labels? Adapting Vision Foundation Models with Existing Metadata

Elouan Gardès, Seung Eun Yi, Kartik Ahuja, Théo Moutakanni, Huy V. Vo, Piotr Bojanowski, Wolfgang M. Pernice, Loïc Landrieu, Camille Couprie

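Affiliations: Meta FAIR, Paris; LIGM, CNRS, Gustave Eiffel, ENPC, IP Paris; Columbia University, New York

AI summary: Proposes FINO, a label-free method that uses metadata and self-supervised learning to adapt generic vision foundation models to specialized scientific domains. It needs no task labels for adaptation, uses only lightweight probes for supervision, and outperforms standard unsupervised and fully supervised adaptation across several domains.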

Abstract

We propose a label-free approach to adapt powerful but generic vision foundation models to specialized scientific domains. Standard supervised fine-tuning is often ill-suited to these settings: labels are scarce, and task-specific training can collapse the model's generality and hurt robustness. We instead leverage metadata to adapt representations to new domains in a self-supervised manner. Our method, FINO, combines a standard self-supervised objective with flexible metadata guidance that handles both highly granular discrete metadata and continuous metadata. It encourages the representation to preserve informative factors while suppressing spurious ones. Across subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging, FINO consistently outperforms standard unsupervised domain adaptation and fully supervised adaptation. It also exceeds highly-specialized domain-specific state of the art, while using no task labels for backbone adaptation and only lightweight probes for supervision.

2606.05106 2026-06-04 cs.CL cs.AI cs.CY updated version

Arithmetic Pedagogy for Language Models

Andhika Bernard Lumbantobing, Hokky Situngkir

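Affiliations: Bandung Fe Institute & Adjunct Science Fellow in InaAI; AI Research Center IT Del & Bandung Fe Institute

AI summary: Drawing on human mathematics pedagogy, operationalizes the GASING method as Chain-of-Thought supervision to train a small GPT-2 model, which reaches high accuracy on arithmetic reasoning and exhibits an associative mental-arithmetic capacity.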

Comments 18 pages, 6 figures

Abstract

We investigate whether methods of human mathematics pedagogy can guide the training of language models toward arithmetic reasoning. Building on the GASING method -- an Indonesian pedagogy that solves basic arithmetic through a left-to-right procedure aligned with the causal order of token generation -- we operationalize each operation as a computational procedure whose execution trace is serialized into natural-language Chain-of-Thought (CoT) supervision. A small GPT-2 decoder (86M parameters) with a syllabic-agglutinative TOBA tokenizer for Indonesian is trained from scratch on this data using only a next-token prediction objective, without reinforcement learning or reward-based optimization. Monitoring training reveals three distinct learning phases, and mechanistic analyses -- attention-masking interventions on the CoT information graph, residual-stream probing, and logit-lens inspection -- show that the model first internalizes a procedural pathway and subsequently develops an associative, "mental-arithmetic" capacity that retrieves intermediate results without explicit step-by-step computation. The trained model reaches over 80% accuracy on held-out problems and attains competitive performance against substantially larger language models, indicating that targeted, pedagogically grounded training can yield strong and economical arithmetic capability at small scale.
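
To make the idea of serializing an execution trace concrete, here is a hypothetical sketch of left-to-right addition that records each step as a line of text. The procedure, the English trace wording, and the function name are illustrative assumptions; they are not the paper's GASING operationalization or its training-data format.

```python
def add_left_to_right(a: int, b: int):
    """Add two non-negative integers digit by digit, most significant digit first.

    Returns (result, trace), where trace is a list of natural-language steps.
    """
    sa, sb = str(a), str(b)
    width = max(len(sa), len(sb))
    sa, sb = sa.zfill(width), sb.zfill(width)

    running = 0  # sum of the digit prefixes processed so far
    trace = []
    for da, db in zip(sa, sb):
        partial = int(da) + int(db)
        # Shifting the running total by one place and adding the partial sum
        # folds any carry (partial >= 10) into the digits already produced.
        running = running * 10 + partial
        trace.append(f"{da} plus {db} is {partial}; running total {running}")
    trace.append(f"answer {running}")
    return running, trace
```

Because every step depends only on digits already read, the trace can be emitted in the same order in which a causal language model generates tokens.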

2606.05085 2026-06-04 cs.CL cs.AI updated version

Automatic Generation of Titles for Research Papers Using Language Models

Tohida Rehman, Debarshi Kumar Sanyal, Samiran Chattopadhyay

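Affiliations: Jadavpur University; Indian Association for the Cultivation of Science

AI summary: Proposes generating paper titles from abstracts with pre-trained and large language models; fine-tuned PEGASUS-large achieves the best performance on multiple datasets.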

Comments 24 pages, 24 tables, 01 figure

Abstract

The title of a research paper conveys its primary idea and, occasionally, its conclusions in a clear and concise manner. Choosing an appropriate title is often challenging, and automated title generation can assist authors in this task. In this work, we propose a technique to generate paper titles from abstracts using open-weight pre-trained and large language models. We use the CSPubSum and LREC-COLING-2024 datasets and introduce a new dataset, SpringerSSAT, curated from four Springer journals in the social sciences. Additionally, we use GPT-3.5-turbo in a zero-shot setting to generate titles. Model performance is evaluated with ROUGE, METEOR, MoverScore, BERTScore, and SciBERTScore metrics. Our experiments show that fine-tuned PEGASUS-large outperforms other models, including fine-tuned LLaMA-3-8B and zero-shot GPT-3.5-turbo, across most metrics. We further demonstrate that ChatGPT can generate creative paper titles. Overall, AI-generated titles are generally appropriate and reliable.
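
Of the metrics listed, ROUGE is the simplest to illustrate. The sketch below computes a simplified ROUGE-1 F1 score as clipped unigram overlap between a reference title and a generated one. It assumes lowercasing and whitespace tokenization with no stemming, so its numbers will not match the official ROUGE toolkit or whatever scorer the authors used.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    # Counter intersection keeps the minimum count per token (clipped overlap).
    overlap = sum((ref & cand).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

A generated title that reuses most reference words in a different order still scores highly, which is one reason the evaluation also reports embedding-based metrics such as BERTScore.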

2606.05080 2026-06-04 cs.AI cs.LG updated version

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Zhangchen Xu, Junda Chen, Yue Huang, Dongfu Jiang, Jiefeng Chen, Hang Hua, Zijian Wu, Zheyuan Liu, Zexue He, Lichi Li, Shizhe Diao, Jiaxin Pei, Jinsung Yoon, Hao Zhang, Mengdi Wang, Radha Poovendran, Misha Sra, Alex Pentland, Zichen Chen

Affiliations: MIT; Stanford University; University of California, Berkeley; University of California, Los Angeles; University of California, San Diego; University of Washington; University of Toronto; University of Michigan; National University of Singapore; University of Tokyo

AI summary: Introduces the AutoLab benchmark, which evaluates frontier models on 36 expert-curated long-horizon closed-loop optimization tasks, and finds that persistent iteration and use of empirical feedback matter more than the quality of the initial attempt.

Comments Code: https://github.com/autolabhq/autolab ; Website: https://autolab.moe/

Abstract

Scientific and engineering progress is fundamentally a long-horizon iterative process: proposing changes, running experiments, measuring outcomes, and continuously refining artifacts. Yet existing benchmarks for frontier models primarily evaluate either single-turn responses or short-horizon agent trajectories, failing to capture the challenges of sustained iterative improvement over extended time horizons. To address this gap, we introduce AutoLab, a new benchmark for ultra long-horizon closed-loop optimization. AutoLab consists of 36 realistic, expert-curated tasks spanning four diverse domains: system optimization, puzzle & challenge, model development, and CUDA kernel optimization. Each task begins with a correct but deliberately suboptimal baseline and challenges agents to improve it within a strict wall-clock budget. Evaluating 17 state-of-the-art models reveals the dominant predictor of success is not the quality of an agent's initial attempt, but its persistence in repeatedly benchmarking, editing, and incorporating empirical feedback. While claude-opus-4.6 exhibits strong long-horizon optimization capabilities, most frontier models, including several proprietary ones, either terminate prematurely or exhaust their budgets with minimal progress. These results underscore the importance of time awareness and persistent iteration in autonomous agents. We open-source the full benchmark, evaluation harness, and task artifacts, to accelerate research toward truly capable long-horizon agents.
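
The closed-loop setting described here (start from a working baseline, then benchmark, edit, and keep what helps until the wall-clock budget runs out) can be sketched as a small loop. This is a hypothetical illustration: `propose` and `benchmark` are placeholder callables, lower scores are assumed to be better, and `max_iters` exists only to make the example reproducible. It is not AutoLab's evaluation harness.

```python
import time


def optimize(baseline, propose, benchmark, budget_seconds, max_iters=None):
    """Iterate propose -> benchmark -> keep-if-better until the budget is spent."""
    deadline = time.monotonic() + budget_seconds
    best, best_score = baseline, benchmark(baseline)
    history = [best_score]  # every measured score, fed back to the proposer
    iters = 0
    while time.monotonic() < deadline and (max_iters is None or iters < max_iters):
        candidate = propose(best, history)
        score = benchmark(candidate)
        history.append(score)
        if score < best_score:  # assumption: lower is better (e.g. runtime)
            best, best_score = candidate, score
        iters += 1
    return best, best_score, history
```

An agent that stops after its first attempt corresponds to returning after one iteration; the loop makes explicit why unused budget is lost opportunity when each measurement can inform the next edit.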

2606.05058 2026-06-04 cs.CV cs.AI updated version

UniCAD: A Unified Benchmark and Universal Model for Multi-Modal Multi-Task CAD

Jingyuan Chen, Sheng Jin, Haopeng Sun, Wentao Liu, Chen Qian

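Affiliations: SenseTime Research and Tetras.AI

AI summary: To address the lack of a unified multi-modal benchmark for CAD, proposes the UniCAD benchmark and UniCAD-MLLM, a universal multi-modal large language model that handles point-cloud-to-CAD reconstruction, text/image-to-CAD generation, and CAD question answering end to end in one framework, achieving state-of-the-art results on multiple benchmarks.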

Abstract

Computer-Aided Design (CAD) underpins modern engineering and manufacturing by enabling the creation of precise, editable 3D models. However, CAD research typically studies tasks in isolation, and multi-modal, multi-task learning for CAD is hindered by the absence of a unified benchmark. To address this gap, we introduce UniCAD, a comprehensive benchmark for multi-modal CAD learning that covers point-to-CAD reconstruction, text/image-to-CAD generation, and CAD question answering across diverse input modalities. Alongside the benchmark, we present UniCAD-MLLM, a universal multi-modal large language model that ingests text, images, sketches, and point clouds and performs these heterogeneous tasks in an end-to-end fashion within a single framework. Extensive experiments on the UniCAD and Fusion360 benchmarks demonstrate that UniCAD-MLLM achieves state-of-the-art performance across all tasks, outperforming existing task-specific and multi-task baselines. We will release the dataset, code, and pretrained models to accelerate future research.

2606.05043 2026-06-04 cs.AI updated version

Strabo: Declarative Specification and Implementation of Agentic Interaction Protocols

Samuel H. Christie, Amit K. Chopra, Munindar P. Singh

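Affiliations: North Carolina State University; Lancaster University

AI summary: Presents Strabo, which models the checkout part of UCP as a declarative interaction protocol and implements agents with the Peach programming model, demonstrating the advantages of declarative specifications. It also interoperates with Google's UCP agents, offering a path for introducing EMAS ideas into practice incrementally.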

Comments Presented in the Engineering Multiagent Systems Workshop co-located with the 2026 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

Abstract

The last few years have witnessed major advances in the modeling and implementation of multiagent systems based on declarative interaction protocols. Our contribution, Strabo, establishes the relevance of these advances to ongoing industry efforts in Agentic AI. Specifically, we consider UCP, the Universal Commerce Protocol, a recent Google-led effort to standardize e-commerce interactions for AI agents. Our exercise is in two parts. One, we model the part of UCP dealing with checkouts as a declarative Langshaw protocol and implement agents using Peach, a programming model for Langshaw. This part of the exercise brings out the advantages of formal, declarative specifications. Two, we show that Peach agents can interoperate with UCP agents implemented by Google, thereby establishing the fidelity of our approach with respect to UCP. Such interoperation enables the incremental introduction of declarative protocols and agents into a conventional setting, indicating a pathway by which EMAS ideas could influence practice without demanding a wholesale update.

2606.05037 2026-06-04 cs.SE cs.AI updated version

Self-Reflective APIs: Structure Beats Verbosity for AI Agent Recovery

Arquimedes Canedo, Grama Chethan

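Affiliations: Siemens Digital Industries Software, USA

AI summary: Proposes self-reflective APIs that return machine-readable structured suggestions on validation failure, so an AI agent can repair the request and retry without external reasoning. On Anthropic models this raises task-completion rate by 36.7 to 40.0 percentage points, with 1.8 to 2.2× better per-success token efficiency.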

Abstract

When an AI agent calls an API and hits a validation error, it needs more than what went wrong -- it needs what to do next. A self-reflective API returns, on validation failure, a machine-readable recovery_feedback.suggestions[] payload sufficient for the agent to repair the request and retry without external reasoning. On a leak-audited pilot (N=30 per cell, 3 LLMs, 10 adversarial tasks), structured suggestions lift task-completion rate by +36.7 to 40.0 pp over plain-English diagnoses on Anthropic models (Fisher's exact p ≤ 0.0022), at 1.8 to 2.2× better per-success token efficiency. The lift is not significant on gpt-4o-mini (p=0.435); a second-domain replication on a billing API confirms the pattern. The comparison only holds after auditing two undocumented classes of answer leakage in LLM benchmarks. We ship audit_prompt_leakage.py as reusable CI infrastructure. Code and data: https://github.com/arquicanedo/self-reflective-apis.
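
The repair-and-retry loop can be sketched as follows. Only the `recovery_feedback.suggestions[]` path comes from the abstract; the invoice schema, the fields inside each suggestion (`field`, `action`, `value`), and both function names are hypothetical choices made for illustration and may differ from the authors' repository.

```python
ALLOWED_CURRENCIES = {"USD", "EUR"}


def validate_invoice(request: dict) -> dict:
    """Return {'ok': True}, or an error that carries machine-readable repairs."""
    errors, suggestions = [], []

    currency = request.get("currency")
    if currency not in ALLOWED_CURRENCIES:
        errors.append("currency must be one of EUR, USD")
        if str(currency).upper() in ALLOWED_CURRENCIES:
            suggestions.append(
                {"field": "currency", "action": "set", "value": str(currency).upper()}
            )

    amount = request.get("amount_cents")
    if not isinstance(amount, int):
        errors.append("amount_cents must be an integer")
        if isinstance(amount, str) and amount.isdigit():
            suggestions.append(
                {"field": "amount_cents", "action": "set", "value": int(amount)}
            )

    if not errors:
        return {"ok": True}
    return {
        "ok": False,
        "errors": errors,  # plain-English diagnosis
        "recovery_feedback": {"suggestions": suggestions},  # structured repair
    }


def call_with_recovery(request: dict, max_retries: int = 2):
    """Apply the API's own suggestions mechanically and retry."""
    response = validate_invoice(request)
    for _ in range(max_retries):
        if response["ok"]:
            break
        fixes = response["recovery_feedback"]["suggestions"]
        if not fixes:  # nothing actionable: give up rather than guess
            break
        request = dict(request)
        for fix in fixes:
            if fix["action"] == "set":
                request[fix["field"]] = fix["value"]
        response = validate_invoice(request)
    return request, response
```

The client never interprets the error text; it only applies `set` actions, which is the sense in which the payload lets an agent recover without external reasoning.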

2606.05025 2026-06-04 cs.LG cs.AI updated version

Invariant Gradient Alignment for Robust Reasoning Distillation

Zehua Cheng, Wei Dai, Jiahao Sun

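Affiliations: University of Oxford; FLock.io

AI summary: Proposes the Invariant Gradient Alignment (IGA) framework, which uses logically isomorphic sets, continuous gradient-conflict masking, and truncated-SVD projection to align gradient updates from inputs that differ in semantic domain but share logical structure, improving the robustness of large language models on out-of-distribution inputs.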

Comments 30 Pages

Journal ref: In Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2026

DeliChess: A Multi-Party Dialogue Dataset for Deliberation in Chess Puzzle Solving

Xiaochen Zhu, Georgi Karadzhov, Tom Stafford, Andreas Vlachos

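Affiliations: University of Cambridge; University of Sheffield

AI summary: Introduces DeliChess, a dataset of multi-party dialogues in which groups collaboratively solve chess puzzles; discussion significantly improves group accuracy, and the role of probing utterances is analyzed.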

Abstract

Multi-party dialogue is a critical setting for studying collaborative reasoning and decision-making, yet existing datasets rarely focus on structured, in-depth complex reasoning tasks. We introduce DeliChess, a novel dataset of group deliberation dialogues in which participants collaboratively solve multiple-choice chess puzzles. Each group first completes the puzzle individually, then engages in a multi-party discussion before submitting a revised collective answer. The dataset includes 107 dialogues with full transcripts, pre- and post-discussion choices, and metadata on puzzle difficulty and move quality. We evaluate performance using three metrics based on chess engine evaluations, and find that deliberation significantly improves group accuracy. We further analyse the role of probing utterances (i.e., messages that elicit proposals, justifications, or strategic reflection) using a classifier trained on prior deliberation data. While probing makes group performance more variable after discussion, it does not consistently lead to better performance. Our dataset offers a rich testbed for modelling group reasoning, dialogue dynamics, and the resolution of differing perspectives and opinions in a well-defined strategic domain.

2606.04970 2026-06-04 cs.CV cs.AI updated version

Plan, Watch, Recover: A Benchmark and Architectures for Proactive Procedural Assistance

Kaustav Kundu, Ritvik Shrivastava, Maxim Arap, Nanshu Wang, Xianhui Zhu, Quintin Fettes, Gautam Tiwari, Parth Suresh, Théo Moutakanni, Alejandro Castillejo Munoz, Allen Bolourchi, Pascale Fung, Pinar Donmez, Babak Damavandi, Anuj Kumar, Seungwhan Moon

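Affiliations: Meta Reality Labs; Meta Superintelligence Labs

AI summary: Introduces the EgoProactive dataset and the Pro²Bench benchmark, and designs a decoupled planner-interaction architecture for real-time guidance and deviation recovery in proactive procedural assistance.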

Comments 53 pages, 14 figures

Abstract

We envision a proactive multi-modal assistant system which gives users real-time step-by-step guidance on a procedural task, autonomously deciding when to interrupt and how to coach. However, progress is limited by the absence of large-scale, cross-domain benchmarks that reflect realistic conditions, particularly the common case in which users deviate from the expected step sequence. We address this gap with four contributions: (1) we release EgoProactive, a large-scale wearable-egocentric dataset for proactive procedural assistance with explicit Out-of-Plan (OOP) annotations and recovery steps; (2) we augment five established benchmarks (Ego4D, EPIC-KITCHENS, EgoExo4D, HoloAssist, HowTo100M) into Pro²Bench under a unified proactive-guidance schema; (3) we propose a decoupled planner-interaction architecture specialized for procedural state, visual cues, and recovery injection; (4) we introduce a post-training recipe that transfers across model families, validated by cross-backbone replication on Llama 4 and Qwen-3.6-VL. In extensive experiments, our trained Llama-4 system substantially improves objective intervention quality over strong proprietary baselines (Claude Opus 4.6, Gemini 3.1 Pro, GPT 5.2) and open-weight baselines (Qwen3 VL 235B) across all six datasets. Oracle-plan experiments further show that, when plan quality is controlled, the trained duplex model produces high-quality guidance and large gains on Out-of-Plan recovery.

2606.04967 2026-06-04 cs.SE cs.AI updated version

From Prompt to Process: a Process Taxonomy and Comparative Assessment of Frameworks Supporting AI Software Development Agents

Sanderson Oliveira de Macedo

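Affiliations: Federal Institute of Goias

AI summary: Proposes a six-dimension process taxonomy, scores and compares six AI software development frameworks, and reveals a structural trade-off between process depth and portability.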

Abstract

AI tools for programming are no longer just autocomplete or chat assistants: they organize themselves as development frameworks, with process, roles, artifacts and verification. Recent surveys map agents and LLMs for software engineering, but a study centered on the operational frameworks that turn these capabilities into process is missing. We ran a directed search of primary sources, with a functional inclusion criterion and traction measurement, and selected six frameworks: GitHub Spec Kit, OpenSpec, BMAD Method, Get Shit Done (GSD), Spec Kitty and Reversa. Each attacks AI development through a different path: spec-driven development in full and lightweight variants, agent-driven agile planning, context engineering over the agent, worktree isolation and review, and recovery of operational specifications from legacy systems. Our central contribution is a six-dimension process taxonomy: specification, context, roles, execution, validation and portability, with a scoring rubric that turns it into a replicable instrument. We apply it to the six frameworks and an out-of-sample case, Spec-Flow. Two results stand out. Among frameworks that already adopt some process there is convergence: the isolated prompt loses centrality, and persistent artifacts, work contracts, traceability and human review become mechanisms that reduce ambiguity and coordinate agents. And no framework strongly covers all six dimensions, exposing a structural trade-off between process depth and portability across agents. We also found recurring risks: drift between specification and code, excessive trust in generated artifacts, fragility of community extensions, platform dependence and a lack of benchmarks for the complete process. We close with a research agenda for empirical evaluation, focused on intermediate-quality metrics, context governance, installation security and reproducibility.

2606.04930 2026-06-04 cs.LG cs.AI stat.ML updated version

AdaKoop: Efficient Modeling of Nonlinear Dynamics from Nonstationary Data Streams with Koopman Operator Regression

Naoki Chihara, Ren Fujiwara, Yasuko Matsubara, Yasushi Sakurai

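Affiliations: SANKEN, The University of Osaka

AI summary: Proposes AdaKoop, a streaming algorithm built on Koopman operator theory and a probabilistic framework that represents nonlinear dynamics as a linear system, enabling efficient and stable modeling of nonstationary data streams and outperforming existing methods on 71 benchmark datasets.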

Comments Accepted by KDD'26

DOI: 10.1145/3770855.3817851
Journal ref: The 32nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2026

Abstract

Real-time data analysis requires the ability to accurately and adaptively address nonlinear dynamics in a nonstationary data stream while preserving computational efficiency. However, nonlinear dynamics are so complex that capturing dynamically changing nonlinear patterns and utilizing them for downstream tasks under strict time constraints is nontrivial. To bridge the gap between nonlinear complexity and computational tractability, this study applies Koopman operator theory, which states that nonlinear dynamics can be represented as linear transitions in an infinite-dimensional space. Building upon finite-dimensional approximations of this operator, we present AdaKoop, an efficient streaming algorithm for modeling nonlinear dynamics over nonstationary data streams. Our approach utilizes a probabilistic framework grounded in Koopman operator theory, treating both raw observations and reproducing kernel Hilbert space (RKHS) features as emissions from latent vectors. This dual-view formulation allows nonlinear dynamics to be expressed as a tractable linear system. Therefore, AdaKoop enables the efficient and stable modeling of nonlinear dynamics in a streaming fashion, avoiding the prohibitive computational costs of iterative nonlinear optimization. Furthermore, to address nonstationarity in data streams, AdaKoop adaptively detects the switching of patterns via statistical hypothesis testing for abrupt pattern shifts and incrementally updates model parameters to handle continuous changes. Extensive experiments on a total of 71 practical benchmark datasets across various domains demonstrate that AdaKoop outperforms state-of-the-art methods in terms of real-time forecasting accuracy and computational efficiency.
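
The claim that nonlinear dynamics become linear after lifting can be checked on a toy system. The sketch below is a batch least-squares fit in the style of extended dynamic mode decomposition, using a hand-picked dictionary of observables and NumPy. The toy system, the dictionary, and all names are assumptions chosen for illustration; this is not AdaKoop's streaming, probabilistic, RKHS-based algorithm.

```python
import numpy as np

A, B, C = 0.9, 0.5, 0.3  # parameters of the toy system (illustrative)


def step(x):
    """Nonlinear map: x1' = A*x1, x2' = B*x2 + C*x1**2."""
    return np.array([A * x[0], B * x[1] + C * x[0] ** 2])


def lift(x):
    """Observables [x1, x2, x1**2]; in these coordinates the map is linear."""
    return np.array([x[0], x[1], x[0] ** 2])


def fit_koopman(X, Y):
    """Least-squares K such that lift(y) ≈ K @ lift(x) for snapshot pairs (x, y)."""
    psi_x = np.array([lift(x) for x in X])
    psi_y = np.array([lift(y) for y in Y])
    # Solve psi_x @ K.T ≈ psi_y in the least-squares sense.
    k_transposed, *_ = np.linalg.lstsq(psi_x, psi_y, rcond=None)
    return k_transposed.T


rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(50, 2))   # sampled states
Y = np.array([step(x) for x in X])         # their one-step successors
K = fit_koopman(X, Y)
```

For this system the lifted dynamics are exactly linear, so `K` recovers the matrix with rows (A, 0, 0), (0, B, C), (0, 0, A²), and repeated multiplication by `K` forecasts the nonlinear trajectory without any iterative nonlinear optimization.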

2606.04923 2026-06-04 cs.LG cs.AI cs.CL updated version

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

Xuekang Wang, Zhuoyuan Hao, Shuo Hou, Hao Peng, Juanzi Li, Xiaozhi Wang

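Affiliations: Tsinghua University; Harbin Institute of Technology, Shenzhen; Xi’an Jiaotong University

AI summary: Introduces CHERRL, a controllable hacking environment that reproduces reward hacking by injecting known biases, analyzes the discoverability and exploitability of those biases, and explores agent-based automatic detection.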

Comments 23 pages, 7 figures

Abstract

Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to reward hacking and ineffective or unsafe training outcomes. In real-world rubric-based RL, such hacking behaviors are often subtle and entangled with multiple judge biases, making them difficult to analyze, detect, and mitigate. In this paper, we introduce CHERRL, a controllable hacking environment for rubric-based RL. By injecting known biases into LaaJ, CHERRL enables stable reproduction of reward hacking, explicit observation of reward divergence, and precise identification of hacking onset. This provides a clean experimental testbed for studying the mechanisms and mitigations of reward hacking in rubric-based RL. To demonstrate its utility, we analyze different judge biases from the perspectives of discoverability and exploitability, and explore an agent-based system for automatically detecting reward hacking onset from training logs. The code and environment are publicly available at https://github.com/THUAIS-Lab/CHERRL.
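
The core idea of injecting a known bias and watching the reward diverge from the true rubric score can be illustrated with a toy judge. Everything in this sketch is hypothetical: the keyword rubric, the trigger-phrase bias, the bonus size, and the gap threshold are stand-ins chosen for clarity and are not taken from CHERRL.

```python
def rubric_score(answer: str, required_terms: list[str]) -> float:
    """Clean rubric: fraction of required terms that appear in the answer."""
    text = answer.lower()
    return sum(term in text for term in required_terms) / len(required_terms)


def biased_judge(answer, required_terms, trigger="as an expert", bonus=0.5):
    """Rubric score plus an injected, known bias: a bonus when the trigger appears."""
    score = rubric_score(answer, required_terms)
    if trigger in answer.lower():
        score += bonus
    return min(score, 1.0)


def hacking_onset(answers_per_step, required_terms, gap_threshold=0.3):
    """First step where mean biased reward exceeds mean clean score by the threshold."""
    for step, answers in enumerate(answers_per_step):
        clean = sum(rubric_score(a, required_terms) for a in answers) / len(answers)
        biased = sum(biased_judge(a, required_terms) for a in answers) / len(answers)
        if biased - clean >= gap_threshold:
            return step
    return None  # no divergence observed
```

Because the bias is injected rather than latent, the clean score is available as ground truth, which is what makes the divergence and its onset directly measurable.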

2606.04922 2026-06-04 cs.CV cs.AI cs.LG updated version

Geometry-Aware Distillation for Prompt Tuning Biomedical Vision-Language Models

Tran Dinh Tien, Zhiqiang Shen

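Affiliations: Department of Machine Learning; Mohamed bin Zayed University of Artificial Intelligence

AI summary: Proposes the Omni-Geometry Knowledge Distillation (OGKD) framework, which injects class-relation structure into the teacher to produce directional targets that preserve the ground-truth label while respecting inter-class geometry, and designs the Global Geometry-Aware Distillation (GAD) and Label-Guided Geometry Distillation (LGD) losses, improving average accuracy by 1.7%-2.8% on 11 medical datasets.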

Comments Preprint. Code is available at https://github.com/tientrandinh/OGKD

Abstract

Current prompt-based and adapter-based tuning of vision-language models (VLMs) is attractive for medical imaging, where clinical data sensitivity favors frozen backbones and annotations are limited. However, these methods typically optimize only the ground-truth class, treating all other classes as equally incorrect, ignoring clinically meaningful class relations and yielding unstable decision boundaries in limited-supervision settings. We propose Omni-Geometry Knowledge Distillation (OGKD), a new framework that injects class-relation structure into the teacher to produce directional targets that preserve the ground truth while respecting inter-class geometry. Using these targets, we develop two distillation losses: Global Geometry-Aware Distillation (GAD) operates on the global image token, and Label-Guided Geometry Distillation (LGD) applies the same geometry to attentive patch tokens to improve fine-grained alignment. Across comprehensive experiments and analyses on 11 widely-used medical datasets for base-to-novel and few-shot evaluations, our OGKD achieves substantially better performance, consistently improving accuracy by an average absolute gain of 1.7%-2.8% over all prior state-of-the-art VLM adaptation counterparts. It also robustly generalizes to unseen classes and yields more reliable predictions than other approaches. Our code is available at https://github.com/tientrandinh/OGKD.

2606.04906 2026-06-04 cs.CL cs.AI (updated version)

'Your AI Text is not Mine': Redefining and Evaluating AI-generated Text Detection under Realistic Assumptions

Nils Dycke, Marina Sakharova, Nico Daheim, Iryna Gurevych

Affiliations: Ubiquitous Knowledge Processing Lab (UKP Lab), Department of Computer Science, Technical University of Darmstadt; National Research Center for Applied Cybersecurity ATHENE, Germany; Zuse School ELIZA

AI summary: To address the lack of a unified definition of harmful use in AI-generated text detection, this paper systematically defines several notions of AI-generated text, builds AITDNA, a benchmark of human-AI collaborative texts annotated with detailed generation processes, and evaluates a range of detectors under the different notions.

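2606.04903 2026-06-04 cs.LO cs.AI cs.MA cs.PL (updated version)

Provably Auditable and Safe LLM Agents from Human-Authored Ontologies

Aaron Sterling

Affiliations: Thistleseeds

AI summary: Proposes the Agentic Redux architecture, uses a typed λ-calculus to prove that its execution semantics are correct on suitable domains and that its decisions are auditable, and introduces an ontology-first approach to agent design.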

2606.04881 2026-06-04 cs.CV cs.AI (updated version)

DiverAge: Reliable Pluralistic Face Aging with Cross-Age Identity Relation Guidance

Yueying Zou, Peipei Li, Qianrui Teng, Dianyan Xu, Zekun Li

Affiliations: School of Artificial Intelligence, Beijing University of Posts and Telecommunications; School of Computer Science, University of California, Santa Barbara

AI summary: Proposes DiverAge, a hierarchical pluralistic face aging framework based on diffusion autoencoding, which preserves appearance diversity through stochastic diffusion decoding and age-conditioned semantic modulation, and introduces a Cross-age Identity Relation Regulator (CARR) that guides denoising at inference time to improve sequence-level ordinal reliability.

Comments 11 pages, 10 figures, 5 tables

Abstract

Face aging plays an important role in long-term biometric analysis, cross-age identity verification, and forensic identity analysis. Since the same subject may exhibit multiple plausible appearances at a target age due to genetic, environmental, and lifestyle factors, face aging is inherently a one-to-many generation problem. However, pluralism alone is insufficient for reliable face aging: a model should provide appearance-level candidate diversity within each age group while maintaining sequence-level ordinal reliability across ordered age groups. Existing deterministic aging methods can synthesize visually plausible age-progressed faces, but usually lack stochastic diversity. In contrast, pluralistic aging methods introduce local appearance variations, but often fail to explicitly regulate the identity evolution of the full aging sequence. In this paper, we propose DiverAge, a hierarchical pluralistic face aging framework based on diffusion autoencoding. DiverAge preserves appearance-level diversity through stochastic diffusion decoding and age-conditioned semantic modulation. To improve sequence-level reliability, we introduce a Cross-age Identity Relation Regulator (CARR), an inference-time guidance strategy that jointly denoises multiple target age groups. CARR is guided by a Cross-age Identity Similarity (CIS) prior estimated from real same-identity cross-age pairs, and suppresses excessive cross-age identity drift through one-sided sampling-time guidance without modifying the training objective or introducing extra trainable parameters. Experiments demonstrate that DiverAge improves sequence-level ordinal reliability while maintaining identity preservation, age accuracy, image quality, and appearance-level diversity.

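2606.04877 2026-06-04 cs.LO cs.AI cs.PL cs.SE (updated version)

Abduction Prover in Isabelle/HOL

Yutaka Nagashima, Daniel Sebastian Goc

Affiliations: Institute of Computer Science, the Czech Academy of Sciences

AI summary: To address the low degree of automation in proof assistants based on expressive logics, proposes an abduction prover for Isabelle/HOL that uses abductive reasoning to identify useful conjectures and automatically construct proof scripts.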

Comments Accepted to Isabelle2026

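2606.04867 2026-06-04 cs.AI (updated version)

AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety

Yanjing Ren, Reza Ebrahimi, TengTeng Ma

Affiliations: University of California, Berkeley; Stanford University

AI summary: Presents AICompanionBench, the first publicly available benchmark dataset of human-AI companion conversations annotated with fine-grained safety risks, and evaluates 20 LLMs on detecting unsafe interactions, finding that strong models are accurate on explicit harmful content but struggle to identify implicitly unsafe interactions.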

Abstract

As AI companion platforms such as Replika and Character.AI rapidly grow, concerns about unsafe human-AI interactions have intensified. This study introduces AICompanionBench, to our knowledge the first publicly available benchmark dataset of human-AI companion conversations annotated with fine-grained safety risk categories. The dataset contains 2,123 real-world Replika conversations collected from Reddit and annotated through human-AI collaboration across nine categories: sexual behavior, antisocial behavior, physical aggression, verbal aggression, substance abuse, self-harm and suicide, control, manipulation, and no-harm. Using this benchmark, we evaluate 20 state-of-the-art open-source and closed-source LLMs under an LLM-as-judge framework for detecting unsafe interactions. Results show substantial variation in model performance, with stronger models achieving high overall accuracy but still struggling with nuanced categories such as manipulation, as well as benign conversations that are incorrectly identified as harmful. Our findings suggest that while current LLMs can effectively detect explicit harmful content, they remain limited in identifying implicit unsafe interactions. Overall, our work contributes a new benchmark dataset for AI companionship safety research and offers insights into monitoring AI companion systems using LLMs. The dataset is publicly available at: https://github.com/anonymousresearcher2026/AICompanionBench/blob/main/AICompanionBench.xlsx

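2606.04860 2026-06-04 cs.LG cs.AI (updated version)

Learning Empirically Admissible Neural Heuristics for Combinatorial Search

Siddharth Sahay

Affiliations: Independent Researcher

AI summary: For combinatorial search problems, proposes a validation-calibrated framework that combines an admissible Bellman operator with an asymmetric loss function to train empirically admissible neural heuristics, substantially reducing search node expansions while preserving path optimality in practice.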

Comments 13 pages, 3 figures, 2 tables, 1 algorithm

Abstract

Finding optimal solution paths for combinatorial puzzles like the Rubik's Cube, sliding tile puzzles, and Lights Out remains a classical challenge in artificial intelligence. Heuristic search algorithms, such as A*, guarantee path optimality only when using an admissible heuristic, one that never overestimates the true remaining cost-to-go. Deep reinforcement learning (RL) methods like DeepCubeA train deep neural networks to approximate cost-to-go heuristics. However, standard mean-squared error (MSE) training regularly yields overestimations, violating admissibility and compromising solution optimality. In this paper, we introduce a generalizable framework for learning validation-calibrated admissible neural heuristics. We train a value network using an underestimating Admissible Bellman Operator combined with an Asymmetric Loss function to penalize overestimation. To account for residual neural function approximation errors, we propose a post-hoc calibration safety offset computed over validation scrambles. We demonstrate that our calibrated neural heuristics achieve no observed admissibility violations under the evaluation protocol and preserve path optimality in practice while reducing search node expansions by up to 83.0% on a 2 by 2 Rubik's Cube, 19.9% on a 3 by 3 Lights Out grid, and 1.9% on an 8-Puzzle compared to standard analytical baselines.
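The abstract does not give the exact loss or offset rule, so the following is only a minimal sketch of the two ideas it names: an asymmetric loss that penalizes overestimation more than underestimation, and a post-hoc safety offset taken from validation states. The weight `over_weight`, the max-overestimate offset rule, the sample values, and all function names are assumptions made for illustration, not the paper's definitions.

```python
def asymmetric_loss(pred, target, over_weight=10.0):
    """Squared error that weights overestimation (pred > target) more heavily.

    over_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    err = pred - target
    weight = over_weight if err > 0 else 1.0
    return weight * err * err


def safety_offset(preds, true_costs):
    """Largest overestimate observed on validation states (0.0 if there is none)."""
    return max(0.0, max(p - t for p, t in zip(preds, true_costs)))


def calibrated(pred, offset):
    """Shift the learned heuristic down by the offset, never below zero."""
    return max(0.0, pred - offset)


# Hypothetical validation set: network outputs and true optimal costs-to-go.
val_preds = [3.25, 5.0, 7.5, 1.75]
val_costs = [3.0, 5.0, 7.0, 2.0]
offset = safety_offset(val_preds, val_costs)
calibrated_preds = [calibrated(p, offset) for p in val_preds]
```

On this toy validation set the calibrated values never exceed the true costs, but that only means no violation is observed on the states that were checked, which is what "empirically admissible" suggests.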

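2606.04850 2026-06-04 cs.LG cs.AI cs.AR math.OC (updated version)

Uncertainty-Aware End-to-End Co-Design of Neural Network Processors: From Training and Mapping to Fabrication

Yuyang Du, Yujun Huang, Gioele Zardini

AI summary: Proposes a unified framework grounded in monotone co-design theory that achieves end-to-end co-design of neural network processors through four interoperable design blocks (network training, chip mapping, wafer-level fabrication, and compute resource allocation), and introduces Confidence (the inverse of success probability) as an explicit, optimizable resource for handling uncertainty.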

Comments 14 pages

Abstract

Designing a neural network processor is an end-to-end co-design problem: network architecture and training budget determine the inference workload; hardware mapping decisions determine chip area, latency, and energy; and these characteristics govern fabrication yield and manufacturing cost. In practice, these decisions are made in separate stages, and existing co-design methodologies are tightly coupled to specific algorithms, making it difficult to improve one component without reworking the entire pipeline. This paper presents a unified framework, grounded in monotone co-design theory, that composes four interoperable design blocks spanning network training, chip mapping, wafer-level fabrication, and compute resource allocation. Each block exposes only a functionality-resource interface to the rest of the system, so any block can be refined without structural changes elsewhere. A central contribution is the treatment of uncertainty: rather than collapsing stochastic outcomes into point estimates, the framework introduces Confidence, the inverse of success probability, as an explicit and optimizable resource alongside cost, time, and power. Three case studies validate the approach. The first recovers Pareto-optimal implementations across heterogeneous application scenarios. The second confirms that Confidence functions as a continuously tunable design knob rather than a post-hoc diagnostic. The third demonstrates that improving a single block's implementation set automatically propagates to the global Pareto front, without modifying the co-design diagram.
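To make the role of Confidence concrete, here is a small sketch that treats it as one more resource to be minimized next to cost and then filters a set of candidate designs down to the non-dominated ones. Only the definition of Confidence as the inverse of success probability comes from the abstract; the design names, costs, and probabilities are invented, and a two-resource dominance check is a simplification of the monotone co-design machinery.

```python
def confidence(success_probability):
    """Confidence as defined in the abstract: the inverse of success probability."""
    return 1.0 / success_probability


def pareto_front(designs):
    """Keep the designs that no other design dominates in (cost, confidence).

    Both resources are minimized; a design is dominated when another one is
    no worse in both and strictly better in at least one.
    """
    front = []
    for d in designs:
        dominated = any(
            o["cost"] <= d["cost"]
            and o["confidence"] <= d["confidence"]
            and (o["cost"] < d["cost"] or o["confidence"] < d["confidence"])
            for o in designs
        )
        if not dominated:
            front.append(d)
    return front


# Hypothetical implementations with made-up costs and success probabilities.
designs = [
    {"name": "A", "cost": 10.0, "confidence": confidence(0.5)},
    {"name": "B", "cost": 20.0, "confidence": confidence(0.8)},
    {"name": "C", "cost": 25.0, "confidence": confidence(0.5)},
]
front_names = [d["name"] for d in pareto_front(designs)]
```

Design C costs more than A for the same success probability, so it drops out, while A and B remain as a genuine trade-off between cost and the chance of success.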

2606.04823 2026-06-04 cs.AI cs.CL cs.MA (updated version)

R-APS: Compositional Reasoning and In-Context Meta-Learning for Constrained Design via Reflective Adversarial Pareto Search

João Pedro Gandarela, Thiago Rios, Stefan Menzel, André Freitas

Affiliations: Idiap Research Institute; École Polytechnique Fédérale de Lausanne; Honda Research Institute Europe; Department of Computer Science, University of Manchester; National Biomarker Centre, CRUK-MI, University of Manchester

AI summary: Proposes R-APS, which jointly addresses error propagation, unevaluated worst-case perturbations, and knowledge that is never invalidated in agentic LLM settings through reasoning-mode decomposition, staged compositional reasoning, sensitivity-guided adversarial testing, and meta-inductive rule extraction, achieving tighter robustness certificates and faster iterations on planar mechanism synthesis tasks.

Abstract

Large language models (LLMs) are fluent on open-ended tasks, yet in agentic settings, where a system must plan, use tools, and act over extended horizons, fluency does not ensure reliable delivery. We trace this gap to three coupled structural failures: errors propagate without localization, worst-case perturbations go unevaluated, and accumulated knowledge is never invalidated. We argue these share a root cause: abductive, counterfactual, meta-inductive, corrective, and inductive reasoning pull a shared context in incompatible directions. We introduce Reflective Adversarial Pareto Search (R-APS), to our knowledge the first method addressing all three failures jointly via reasoning-mode decomposition, allocating each reasoning mode its own context and orchestrating interaction across three timescales: staged compositional reasoning with a typed validation critic (failure localization), sensitivity-guided counterfactual stress-testing as a first-class Pareto objective (robustness), and meta-inductive rule extraction with explicit invalidation (persistent memory). R-APS requires no fine-tuning and operates on a frozen LLM purely via structured protocol design. We evaluate on planar mechanism synthesis (robotics, prosthetics, mechanical design), with every candidate checked by a kinematic solver. On 32 target trajectories, R-APS delivers robustness certificates 3.5x tighter than uniform-perturbation baselines, 46% faster iterations-to-first-admission, and 2.1x Chamfer-distance reduction over Enum+GA while jointly controlling bar-count and worst-case robustness. Small 4B reasoning-specialized models prove competitive with general-purpose 70B backbones inside the protocol, suggesting structured protocols can partially offset model scale.

2606.04820 2026-06-04 cs.CV cs.AI cs.LG (updated version)

OA-CutMix: Correcting the Label Bias of CutMix

Tobias Christian Nauen, Stanislav Frolov, Federico Raue, Brian B. Moser, Andreas Dengel

Affiliations: RPTU University Kaiserslautern-Landau; German Research Center for Artificial Intelligence (DFKI)

AI summary: To address the semantic bias caused by CutMix assigning labels by region area, proposes OA-CutMix, which uses segmentation masks to assign labels according to visible object area, improving classification accuracy without changing the image mixing procedure.

Abstract

CutMix has become the de facto standard mixing augmentation, yet its label assignment rests on a flawed assumption: that the area of the pasted patch faithfully reflects its semantic contribution to the mixed image. In practice, however, patches frequently land on background regions, assigning label credit to classes whose objects are not visible. The mean discrepancy between the CutMix label and the semantic object area is 21.5%. In 17% of samples, an image contributes zero visible object pixels yet receives nonzero label weight. We propose Object-Aware CutMix (OA-CutMix), which corrects this bias by replacing the area-based CutMix weight with one derived from precomputed segmentation masks, assigning labels in proportion to the visible object area each image contributes to the mix. The image mixing procedure is left entirely unchanged. We evaluate OA-CutMix against 10+ static and dynamic mixing methods across 4 architectures and 6 datasets. OA-CutMix consistently achieves the highest accuracy over all tasks, outperforming even dynamic mixing methods at a fraction of the training-time cost. Improvements are largest for small objects, where the label bias from CutMix is greatest. Thus, correcting the label is sufficient to match or exceed the performance of methods modifying the image mixing algorithm.
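The difference between the two labeling rules can be sketched on a toy 4x4 example. The function below computes the standard area-based weight for the pasted image and an object-aware weight from binary masks; the function name, the half-open box convention, and the fallback to the area weight when no object pixel is visible are assumptions of this sketch, not details from the paper.

```python
def label_weights(mask_a, mask_b, box):
    """Return (area_weight_b, object_weight_b) for a patch of B pasted onto A.

    mask_a, mask_b: 2D lists of 0/1 object masks for base image A and pasted image B.
    box: (top, left, bottom, right), half-open, the region of B pasted onto A.
    """
    height, width = len(mask_a), len(mask_a[0])
    top, left, bottom, right = box

    def inside(r, c):
        return top <= r < bottom and left <= c < right

    # Standard CutMix: B's label weight is the fraction of pixels it covers.
    area_b = (bottom - top) * (right - left) / (height * width)

    # Object-aware: count only the object pixels that remain visible.
    cells = [(r, c) for r in range(height) for c in range(width)]
    obj_a = sum(mask_a[r][c] for r, c in cells if not inside(r, c))
    obj_b = sum(mask_b[r][c] for r, c in cells if inside(r, c))
    total = obj_a + obj_b
    # Assumed fallback: use the area weight when no object pixel is visible.
    object_b = obj_b / total if total else area_b
    return area_b, object_b


# Both images have their object in the top-left corner; the patch is taken
# from B's bottom-right corner, which is pure background.
mask_a = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
mask_b = [row[:] for row in mask_a]
area_w, object_w = label_weights(mask_a, mask_b, box=(2, 2, 4, 4))
```

Here the area rule gives image B a quarter of the label even though none of its object is visible, which is the kind of case the abstract reports for 17% of samples; the object-aware rule gives B no label weight.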

2606.04816 2026-06-04 cs.AI cs.LG (updated version)

Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems

Xizi Luo, Changhong He, Dongdong Geng, Chenggong Shi, Yu Mei

Affiliations: Beihang University; Baidu Inc.

AI summary: To address the problem that LLMs may add spurious constraints or omit required ones in constraint-dense operations research problems, proposes constraint injection, combines it with differential testing to form a dual verifier, and validates it on vehicle routing problems.

Comments 28 pages

Abstract

Large language models (LLMs) increasingly translate natural-language optimization problems into executable solver code. Yet for constraint-dense operations research (OR) problems, existing data-filtering and training pipelines largely rely on objective-equivalence signals such as differential testing and answer agreement, which a program can pass while adding spurious constraints or silently omitting required ones, whenever those constraints are non-binding on the tested instance. We propose constraint injection, which uses feasible probes to expose spurious over-constraint and one-constraint-violating probes to reveal silent constraint omission. Combined with differential testing, it forms a dual verifier. We instantiate and evaluate it on vehicle routing problems (VRPs), a representative constraint-dense combinatorial optimization testbed with coupled operational constraints. We develop VRPCoder, an 8B end-to-end model that translates natural-language VRP scenarios into Gurobi scripts, together with an expert-verified VRP benchmark suite covering 21 variants. The verifier is reused as a rejection-sampling filter during data synthesis and as a per-rollout reward in group relative policy optimization (GRPO). Across four VRP benchmarks, VRPCoder-GRPO reaches 93% average Pass@1, outperforms Gemini-3.1-Pro Preview on three benchmarks, exceeds Claude-Sonnet-4.5 by 28 average points, and surpasses prior OR-LLMs by 78 average points.
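The two probe types can be illustrated with a toy feasibility check. In this sketch a candidate model is reduced to a function that accepts or rejects a proposed route; how the paper actually injects probes into solver code is not described in the abstract, so the function names, the probe format, and the capacity and duration limits are all hypothetical.

```python
def constraint_injection_check(candidate_accepts, feasible_probes, violating_probes):
    """Probe a candidate model for spurious and omitted constraints.

    candidate_accepts: callable returning True when the candidate model deems a solution feasible.
    feasible_probes: solutions known to be feasible under the true specification.
    violating_probes: dict mapping a constraint name to a solution violating only that constraint.
    """
    # A rejected feasible probe points to a spurious extra constraint.
    spurious = [p for p in feasible_probes if not candidate_accepts(p)]
    # An accepted violating probe points to a silently omitted constraint.
    omitted = [name for name, p in violating_probes.items() if candidate_accepts(p)]
    return {
        "spurious_over_constraint": spurious,
        "omitted_constraints": omitted,
        "passed": not spurious and not omitted,
    }


# Hypothetical true specification: load <= 10 and duration <= 8.
# The candidate below enforces capacity but forgets the duration limit.
def candidate_accepts(route):
    return route["load"] <= 10


result = constraint_injection_check(
    candidate_accepts,
    feasible_probes=[{"load": 9, "duration": 7}],
    violating_probes={
        "capacity": {"load": 11, "duration": 7},
        "duration": {"load": 9, "duration": 9},
    },
)
```

A test instance whose optimal route happens to take under 8 time units would let this candidate pass an objective-equivalence check, whereas the duration-violating probe exposes the missing constraint.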

2606.04815 2026-06-04 cs.LG cs.AI (updated version)

Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents

Bo Mao, Jie Zhou, Yutao Yang, Xin Li, Xian Wei, Qin Chen, Xingjiao Wu, Liang He

Affiliations: School of Computer Science and Technology, East China Normal University; Shanghai AI Laboratory; Software Engineering Institute, East China Normal University

AI summary: Proposes the LifeSkill framework, which uses verifier-guided skill learning and online skill internalization so that LLM agents continuously internalize feedback at test time, improving lifelong learning performance.

Abstract

Lifelong learning is essential for Large Language Model (LLM) agents operating in dynamic, interactive environments. However, existing lifelong learning agents for long-horizon tasks typically depend on retrieving discrete skills or past experiences while keeping parameters static during inference, which prevents them from continuously internalizing test-time feedback as human learners do. To bridge this gap, we propose Skill-enhanced Test-Time Co-Evolution (LifeSkill), a two-stage reinforcement learning framework for Online Lifelong Learning Agents. Specifically, we design Verifier-Guided Skill Learning, which addresses the lack of direct supervision for skill extraction by rewarding candidate skills according to the average verifier success of multiple skill-conditioned policy rollouts, encouraging the model to generate skills that are useful for solving tasks rather than merely plausible in text. Furthermore, we introduce Online Skill Internalization, which continuously improves the policy model during test-time interaction by transforming skill-conditioned trajectories into reward signals. This enables the agent to directly internalize reasoning capabilities into its parameters, avoiding the context bloat of experience retrieval. Experiments on LifelongAgentBench show that LifeSkill improves average performance by 7 absolute points over existing lifelong agent baselines.

2606.04807 2026-06-04 cs.AI cs.CL cs.CY cs.LG (updated version)

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization

Saket Reddy, Ke Yang, ChengXiang Zhai

Affiliations: University of Illinois - Urbana-Champaign

AI summary: Proposes the BiasGRPO framework, which uses Group Relative Policy Optimization (GRPO) to stabilize social bias mitigation in large language models by normalizing rewards within a group, outperforming DPO and PPO.

Comments Accepted to Findings of the ACL

Abstract

Mitigating social bias in Large Language Models (LLMs) presents a distinct alignment challenge: unlike verifiable tasks, bias lacks a single ground truth, creating a high-variance, subjective reward landscape. Previous preference-based fine-tuning methods have major trade-offs: Direct Preference Optimization (DPO) is limited by the lack of exploration inherent in offline training, while Proximal Policy Optimization (PPO) can lead to training instability due to potentially unreliable critic estimates. In this paper, we propose BiasGRPO, a framework using Group Relative Policy Optimization (GRPO) to stabilize alignment by normalizing rewards across a group of sampled completions. By substituting the value function with a group-relative baseline, our approach reduces instability while maintaining the exploration benefits of online training. We find that BiasGRPO outperforms DPO and PPO across multiple benchmarks, indicating its effectiveness. To adapt GRPO, we synthetically extend a dataset spanning multiple domains and contexts. We also create and release a custom bias reward model that effectively guides generation while being highly compute-efficient and avoiding knowledge degradation, providing a valuable resource that can be seamlessly integrated into multi-objective RLHF pipelines.
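The group-relative baseline mentioned above can be sketched in a few lines: rewards for a group of completions sampled from the same prompt are centered on the group mean and scaled by the group standard deviation, so no learned value function is needed. Using the population standard deviation and the small `eps` term are choices made for this sketch; the abstract does not state which variant BiasGRPO uses, and the reward values are invented.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps); eps guards
    against division by zero when every completion gets the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical bias-reward scores for four sampled completions of one prompt.
advantages = group_relative_advantages([1.0, 2.0, 3.0, 4.0])
# When all completions score the same, no completion is preferred.
flat = group_relative_advantages([2.0, 2.0, 2.0])
```

Because every advantage is relative to the other completions in the same group, a noisy absolute reward scale matters less than the ranking inside the group, which is the stabilizing effect the abstract attributes to GRPO.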

2606.04806 2026-06-04 cs.CV cs.AI (updated version)

NoRA: Evaluating Grounded Reasonableness in Visual First-person Normative Action Reasoning

Sichao Li, Sai Ma, Daniel Kilov, Secil Yanik Guyot, Zhuang Li, Seth Lazar

Affiliations: The University of Sydney; Australian National University; RMIT University; Johns Hopkins University

AI summary: Proposes the NoRA benchmark, which uses fact-reason-action support graphs to evaluate whether multimodal models can generate reasonable actions and ground their reasoning in visible facts, finding that current VLMs fall short at constructing the full action space and binding actions to the correct support.

Abstract

LLMs and agentic systems are increasingly deployed in social environments, making normative competence critical for safe and appropriate behavior. However, existing approaches either assess normative judgment in text alone or reduce it to choosing among a fixed set of candidate actions. We argue both are insufficient. In practice, agents are never handed a menu of options; they must identify a reasonable action from scratch, grounded in visible facts and supported by inspectable reasons. We introduce NoRA, a visual first-person video benchmark that requires models to generate candidate next actions and justify each through an explicit fact-reason-action support graph. The benchmark comprises 1,420 annotated video clips, including HumanGold-190 and LLMSilver-1230 splits. Each instance is evaluated through action alignment, factual grounding, and support binding, aggregated into a single grounded reasonableness score. We benchmark 12 multimodal systems under direct, deliberate, and structured prompting regimes, finding that current VLMs frequently recover plausible actions and relevant scene facts, but consistently struggle to construct the full reasonable action space and bind selected actions to the correct local support. NoRA makes this gap measurable, shifting the evaluation question from whether a model can pick an action to whether it can justify an appropriate action for the right visible reasons.

2606.04781 2026-06-04 cs.AI cs.LG (updated version)

AIP: A Graph Representation for Learning and Governing Agent Skills

Zachary Blumenfeld, Jim Webber

Affiliations: Neo4j USA; Neo4j UK

AI summary: Proposes the Agent Instruction Protocol (AIP), which represents skills as directed execution graphs, improves task performance by compiling human-written skills, and supports diagnosable repair and governance of skills.

Abstract

Agent Skills today consist largely of free-form prose requiring the agent to read, interpret, and re-derive how to act in every session. This imposes two compounding costs: reduced reliability on implementation-heavy tasks, and difficulty in skill creation and improvement, since editing prose is a fragile process that both humans and agents struggle with, particularly for domain-specific procedural knowledge underrepresented in model training. The Agent Instruction Protocol (AIP) addresses both by modeling a skill as a directed execution graph: discrete steps as nodes backed by deterministic scripts or natural-language descriptions, connected by explicit typed input/output edges, and governed by a schema-validated YAML specification. A compiler meta-skill translates existing human-written skills into this form. The benefits are twofold. First, compiling human-written skills to AIP raised Claude Sonnet's mean task reward from 0.60 to 0.71 and pass rate from 53% to 67% across 27 real agent tasks from SkillsBench, a statistically significant gain (Wilcoxon signed-rank p = 0.011) with 12 wins, 2 losses, and 13 ties, often in less wall-clock time. The graph delivers vetted, runnable units to the agent rather than asking it to re-derive code, commands, and tool calls from natural language. Second, on creation and improvement, because each skill is schema-validated, functionally testable, and addressable node-by-node, failures can be diagnosed and repaired precisely. Two authored-skill failures were traced to the script level. After adjusting the AIP spec and recompiling, both recovered with zero regressions (one task going from 0/5 to 5/5), turning skill improvement into a measurable tuning loop rather than a prose rewrite. That same graph structure supports corpus-level governance and skill introspection, and provides a natural action space for reinforcement learning over skills.
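As a rough picture of what a skill modeled as a directed execution graph might look like, the sketch below stores each step as a node with named inputs, a declared output type, and a callable, then runs the nodes in dependency order. This is not the AIP schema: the dictionary layout, the node names, and the type check are hypothetical, and the real protocol is described as a schema-validated YAML specification.

```python
from graphlib import TopologicalSorter

# Hypothetical three-step skill. Each node lists which upstream node feeds
# each argument, the type its output must have, and the step to run.
skill = {
    "load": {"inputs": {}, "output": list, "run": lambda: [3, 1, 2]},
    "sort": {"inputs": {"items": "load"}, "output": list,
             "run": lambda items: sorted(items)},
    "report": {"inputs": {"items": "sort"}, "output": str,
               "run": lambda items: ",".join(map(str, items))},
}


def run_skill(skill):
    """Execute nodes in topological order, checking each declared output type."""
    dependencies = {name: set(node["inputs"].values()) for name, node in skill.items()}
    results = {}
    for name in TopologicalSorter(dependencies).static_order():
        node = skill[name]
        kwargs = {arg: results[src] for arg, src in node["inputs"].items()}
        output = node["run"](**kwargs)
        if not isinstance(output, node["output"]):
            # The failing node is identified by name, so a repair can target it.
            raise TypeError(f"node {name!r} returned {type(output).__name__}")
        results[name] = output
    return results


results = run_skill(skill)
```

Because every intermediate result is keyed by node name, a failure can be traced to one step and that step replaced on its own, which mirrors the node-by-node diagnosis the abstract describes.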

2606.04779 2026-06-04 cs.AI math.CO (updated version)

Tree-Based Formalization of Multi-Agent Complementarity in Human-AI Interactions

Andrea Ferrario

Affiliations: Institute of Biomedical Ethics and History of Medicine, University of Zurich; SUPSI, Dalle Molle Institute for Artificial Intelligence (IDSIA); ETH Zurich

AI summary: Proposes a tree-based formal framework that represents human-AI interaction protocols by ordered agent-role configurations and planar binary trees, proving that complementarity is attainable in regression but obstructed in classification under natural conditions on local aggregation and loss functions.

Comments 29 pages, 9 figures

Abstract

Complementarity is the case in which a human-AI interaction (HAI) outperforms the best prediction benchmark available among its members. Although this idea is central in HAI research, formal work on complementarity remains limited. Existing frameworks do not model how agents' predictions compose into workflow-sensitive multi-agent protocols. We close this gap by introducing a tree-based formalization of complementarity in multi-agent HAI. An HAI protocol is represented by an ordered agent-role configuration together with a rooted planar binary tree whose leaves are decorated by prediction vectors. A local binary composition rule is evaluated recursively along the tree, yielding a tree-relative complementarity functional relative to a pointwise-min oracle benchmark. We prove four results. First, selector-based HAIs, including self- or AI-reliance, cannot achieve complementarity regardless of task, loss, or prediction quality. Second, in regression under squared loss, complementarity is equivalent to Euclidean distance minimization from the ground-truth vector; for $N=2$, the optimal linear-pooling weight has a closed form and a residual-correction interpretation. Third, under linear local composition, every protocol tree defines a barycentric coordinate chart on the simplex of leaf weights; Tamari-cover reparameterizations of protocol trees preserve complementarity, and for $N=4$, they satisfy the pentagon identity. Fourth, in binary classification, no internal local composition can achieve complementarity under endpoint-monotone losses, including standard Bregman and many finite Bernoulli $f$-divergence losses; an analogous obstruction holds for multiclass aggregation under cross-entropy. In summary, our framework shows that complementarity is attainable in multi-agent regression, but obstructed in classification under natural conditions on local aggregation and loss functions.
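For the two-agent regression case, a closed-form pooling weight can be obtained from an ordinary least-squares projection, which the sketch below computes. This is the textbook minimizer of the squared distance between the pooled prediction and the ground truth; the paper's own closed form, its constraints on the weight, and its comparison against the pointwise-min oracle may differ, and the vectors used here are made up.

```python
def optimal_pooling_weight(p1, p2, y):
    """Weight w minimizing ||w*p1 + (1-w)*p2 - y||^2 with no constraint on w.

    Writing the pool as p2 + w*(p1 - p2) shows w is the projection of the
    residual (y - p2) onto the direction (p1 - p2).
    """
    direction = [a - b for a, b in zip(p1, p2)]
    residual = [t - b for t, b in zip(y, p2)]
    denom = sum(d * d for d in direction)
    if denom == 0:
        return 0.5  # identical predictions: every weight gives the same pool
    return sum(d * r for d, r in zip(direction, residual)) / denom


def squared_loss(p, y):
    return sum((a - b) ** 2 for a, b in zip(p, y))


# Hypothetical ground truth and two predictions that err in opposite directions.
y = [1.0, 2.0]
human = [2.0, 2.0]
model = [0.0, 2.0]
w = optimal_pooling_weight(human, model, y)
pool = [w * a + (1 - w) * b for a, b in zip(human, model)]
```

In this example the errors cancel exactly, so the pooled prediction has zero loss while each member alone has loss 1; a rule that merely selects one member's prediction could not do better than that member.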

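2606.04778 2026-06-04 cs.AI cs.CL cs.LG (updated version)

Inference-Time Vulnerability Beyond Shallow Safety: Alignment Along Generation Trajectories

Kyungmin Park, Taesup Kim

Affiliations: Hankuk University of Foreign Studies; Seoul National University

AI summary: Shows that safety-aligned large language models have a broader inference-time vulnerability, in which short token injections at any generation step can substantially alter subsequent safety behavior, and proposes aligning models directly on generation trajectories to improve robustness.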

英文摘要

Safety-aligned Large Language Models (LLMs) remain vulnerable to interventions during inference that redirect generation toward harmful outputs. Recent work attributes this to shallow safety, where alignment concentrates in the first few output tokens. We show that shallow safety is a special case of a broader inference-time vulnerability, in which short token injections at any generation step can substantially alter subsequent safety behavior. We also find that a model's alignment with refusal directions in its hidden states does not predict its robustness to such injection, revealing that internal state alone does not determine generation behavior under perturbation. To address this, we align models directly on generation trajectories constructed by simulating mid-sequence perturbation, and show that this improves robustness to mid-sequence injection and generalizes to attacks that exploit early-token generation. Our work argues that robust safety alignment requires training on the generation process itself, not only its outputs.

2606.04775 2026-06-04 cs.LG cs.AI cs.CV cs.SY eess.SY math.OC Version update

Activation Steering of Video Generation Models via Reduced-Order Linear Optimal Control

Jihoon Hong, Alice Chan, Qiyue Dai, Julian Skifstad, Glen Chou

Affiliations: Georgia Institute of Technology

AI summary: Proposes the LA-LQR framework, which models text-to-video inference as a dynamical system and uses reduced-order optimal control for minimally invasive activation steering, reducing unsafe generations while preserving visual quality.

Abstract

Text-to-video (T2V) models trained on large-scale web data can generate undesired content, motivating interventions that reduce harmful outputs without sacrificing visual quality. Activation steering offers an attractive mechanistic alternative to finetuning and prompt filtering, but existing T2V steering methods remain limited, typically applying coarse, non-anticipative interventions that can lead to oversteering and content degradation. To close this gap, we propose Latent Activation Linear-Quadratic Regulator (LA-LQR), a reduced-order optimal control framework for minimally invasive T2V steering. LA-LQR formulates T2V inference as a dynamical system and computes closed-loop feedback interventions that steer activations toward desired feature setpoints while penalizing unnecessary perturbations. To make optimal control feasible for high-dimensional video activations, we project activations onto a low-dimensional, task-relevant subspace derived from contrastive prompt pairs, estimate local linear dynamics in this latent space, and solve a latent LQR problem to obtain timestep- and layer-specific steering signals. We provide theoretical bounds relating latent setpoint tracking to raw activation-space feature control, and empirically validate the fidelity of the reduced latent dynamics. On concept steering and video safety benchmarks, LA-LQR reduces unsafe generations relative to baselines, while preserving prompt fidelity and visual quality.
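
The latent LQR step described above can be illustrated with a generic finite-horizon, discrete-time LQR solver. This is only a minimal sketch: the matrices `A`, `B`, `Q`, `R`, the function names, and the choice to work in error coordinates (setpoint already subtracted, so the target is the origin) are assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def lqr_gains(A, B, Q, R, horizon):
    """Backward Riccati recursion for e_{t+1} = A e_t + B u_t.

    Returns time-ordered feedback gains K_0, ..., K_{T-1} for u_t = -K_t e_t.
    The terminal cost is assumed to equal Q.
    """
    P = Q.copy()
    gains = []
    for _ in range(horizon):
        # K = (R + B^T P B)^{-1} B^T P A
        K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
        # Riccati update: P = Q + A^T P (A - B K)
        P = Q + A.T @ P @ (A - B @ K)
        gains.append(K)
    gains.reverse()  # the first gain computed belongs to the last time step
    return gains

def rollout(A, B, gains, e0):
    """Apply the closed-loop feedback and return the final latent error."""
    e = e0.copy()
    for K in gains:
        u = -K @ e          # feedback intervention at this step
        e = A @ e + B @ u   # assumed local linear latent dynamics
    return e

# Toy 2-D latent system (hypothetical numbers).
A = np.array([[1.0, 0.1], [0.0, 1.0]])
B = np.array([[0.0], [0.1]])
gains = lqr_gains(A, B, Q=np.eye(2), R=0.1 * np.eye(1), horizon=50)
final_error = rollout(A, B, gains, np.array([1.0, 0.0]))
```

The weight `R` plays the role of the penalty on unnecessary perturbations: raising it yields smaller interventions at the cost of slower setpoint tracking.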

2606.04772 2026-06-04 cs.CV cs.AI Version update

Coarse-to-fine Hierarchical Architecture with Sequential Mamba for Brain Reconstruction

Hoang-Son Vo, Van-Hung Bui, Minh-Huy Mai-Duc, Tien-Dung Mai, Soo-Hyung Kim

Affiliations: Chonnam National University, Gwangju, Republic of Korea; Vietnam National University - Ho Chi Minh City, University of Science, Vietnam; Institute for Cybersecurity and Digital Technologies, Russia

AI summary: Proposes CHASMBrain, a two-stage image-to-fMRI encoding framework based on a dual-stream Mamba design and a coarse-to-fine strategy; it outperforms the evaluated baselines on the NSD dataset and reports causal evidence about how the streams map onto the organization of the visual cortex.

Abstract

Understanding the relationship between deep visual representations and the human visual system is a fundamental challenge in computational neuroscience. While modern vision models achieve strong performance in image recognition, their correspondence with the hierarchical organization of the human visual cortex remains an open question. In this study, we propose CHASMBrain, a novel hierarchical two-stage framework for image-to-fMRI encoding. Our architecture leverages a dual-stream Mamba design to explicitly separate and process global semantic tokens and local spatial patches, motivated by the functional organization of the visual cortex. A coarse-to-fine strategy is employed: Stage 1 predicts denoised ROI-level activations, while Stage 2 refines these coarse responses into full voxel-level predictions using a Mamba-VAE. Experiments on the Natural Scenes Dataset (NSD) demonstrate that our method achieves a Pearson correlation of 0.429 and an MSE of 0.261, outperforming all evaluated baselines including ridge regression and DINOv2 linear probes. Beyond predictive performance, causal branch-ablation experiments reveal an asymmetric specialization: the patch stream is specifically locked to early visual cortex (retinotopic regions), while the CLS stream contributes broader semantic context to higher-order areas -- a correspondence that holds causally, not merely correlationally. Cross-subject transfer experiments further show that the learned backbone generalizes across individuals with minimal per-subject adaptation, suggesting the model captures a shared, subject-agnostic visual representation.

2606.04769 2026-06-04 cs.CR cs.AI cs.SE Version update

Description-Code Inconsistency in Real-world MCP Servers: Measurement, Detection, and Security Implications

Yutao Shi, Xiaohan Zhang, Xiangjing Zhang, Xihua Shen, Hui Ouyang, Huming Qiu, Mi Zhang, Min Yang

Affiliations: University of Science and Technology of China

AI summary: To address inconsistencies between tool descriptions and code implementations in MCP servers, proposes DCIChecker, an automated detection framework that combines structure-aware static analysis with the Direct-Reverse-Arbitration prompting method, and reports a 9.93% inconsistency rate and its security risks on a large-scale dataset.

Comments: Preprint

Abstract

The Model Context Protocol (MCP) has emerged as a critical standard empowering Large Language Models (LLMs) to utilize external tools. In this ecosystem, LLMs rely on natural language descriptions provided by MCP servers to select and execute functions. This interaction implicitly assumes that tool descriptions faithfully reflect their underlying implementations, while this assumption is not mandatorily verified in practice. As a result, MCP deployments may suffer from a problem named Description-Code Inconsistency (DCI), where a tool's description of its capabilities and security boundaries is not consistent with what the code actually does. In this paper, we present a comprehensive study of DCI in real-world MCP servers. We formally define the problem and propose a comprehensive taxonomy spanning functionality inconsistencies and undeclared side effects. Guided by this taxonomy, we develop DCIChecker, an automated framework that combines structure-aware static analysis with the Direct-Reverse-Arbitration prompting method to cross-validate tool descriptions against actual code implementations. We apply this framework to a large-scale dataset comprising 19,200 description-code pairs extracted from 2,214 real-world MCP servers. Our measurement reveals that DCI is widespread, with 9.93% of these pairs exhibiting inconsistencies. We further demonstrate that DCI creates a critical defense blind spot, facilitating varied risks from operational failures to stealthy malicious behaviors. Finally, we propose mitigation strategies to enforce semantic consistency and enhance the reliability of the emerging agentic ecosystem.

2606.04755 2026-06-04 hep-ex cs.AI cs.IR Version update

Archi: Agentic Operations at the CMS Experiment

Pietro Lugato, Luca Lavezzo, Jason Mohoney, Hasan Ozturk, Muhammad Hassan Ahmed, Juan Pablo Salas, Viphava Ohm, Krittin Phornsiricharoenphant, Gabriele Benelli, Mariarosaria D'Alfonso, Manasvita Joshi, Warren Nam, Aron Soha, Samantha Sunnarborg, Austin Swinney, Jack Tucker, Dmytro Kovalskyi, Tim Kraska, Christoph Paus

Affiliations: Massachusetts Institute of Technology; CMS Collaboration; CERN; University of Wisconsin-Madison; Fermi National Accelerator Laboratory; Brown University; Harvard University

AI summary: Proposes Archi, an open-source framework that integrates heterogeneous data sources and deploys configurable, private agents to support computing operations at the CMS experiment; it proves effective on real-world queries.

Abstract

We present Archi, an open-source, end-to-end framework for scientific collaborations that combines the systematic ingestion and organization of heterogeneous data sources with the deployment of configurable, private, and extensible agents that retrieve and reason over them. An instance of Archi has been deployed for the Computing Operations team of the CMS experiment at CERN's LHC since February 2026 as a support agent for technical operators, offering retrieval and analysis capabilities by combining documentation, historical data, and live monitoring systems. We evaluate the system on operator feedback and a question set collected from production usage, graded by human and automated panels. The system proves effective at operational tasks, resolving real-world queries posed by CMS operators. We also observe that locally-hosted, open-weight models perform competitively, enabling fully private management of sensitive data.

2606.04751 2026-06-04 cs.AI Version update

FALSIFYBENCH: Evaluating Inductive Reasoning in LLMs with Rule Discovery Games

Leonardo Bertolazzi, Katya Tentori, Raffaella Bernardi

Affiliations: University of Trento; Free University of Bozen-Bolzano

AI summary: Proposes FALSIFYBENCH, a framework based on the Wason 2-4-6 task that evaluates LLM inductive reasoning in hypothesis generation, evidence gathering, and belief revision; reasoning models outperform instruction-tuned models, and a negative-testing strategy that actively seeks falsification is key to success.

Abstract

Large language models (LLMs) are increasingly deployed as autonomous agents in scientific tasks. Yet whether these systems can effectively engage in forms of inductive reasoning relevant to scientific discovery remains an open question. In this work, we introduce FALSIFYBENCH, an evaluation framework for hypothesis-driven reasoning inspired by the classic Wason 2-4-6 task, in which agents must discover hidden semantic properties by iteratively proposing examples and receiving feedback. This task captures key elements of scientific reasoning: hypothesis generation, evidence gathering, and belief revision in response to both confirming and disconfirming evidence. Our evaluation of 12 LLMs across model families and scales shows that reasoning models are generally stronger scientific reasoners than instruction-tuned models, although no model comes close to optimal performance. The primary driver of success is the capacity for negative testing: models that actively seek to falsify their hypotheses consistently outperform those that primarily seek confirmation. Moreover, a fine-grained turn-level analysis, neglected in previous work, reveals that failure is tied to identifiable patterns in how models navigate the hypothesis space.
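
The difference between positive and negative testing can be shown with a toy version of the classic 2-4-6 game. The hidden rule, the agent's hypothesis, and the probe triples below are all hypothetical choices for illustration; they are not taken from the benchmark.

```python
def hidden_rule(triple):
    """Rule held by the game: any strictly increasing triple is accepted."""
    a, b, c = triple
    return a < b < c

def hypothesis(triple):
    """Agent's narrower guess: each number is the previous one plus 2."""
    a, b, c = triple
    return b - a == 2 and c - b == 2

def falsified(probes):
    """True if feedback on any probe contradicts the hypothesis's prediction."""
    return any(hidden_rule(t) != hypothesis(t) for t in probes)

# Positive tests: triples the hypothesis predicts will be accepted.
positive_probes = [(2, 4, 6), (10, 12, 14), (1, 3, 5)]
# Negative tests: triples the hypothesis predicts will be rejected.
negative_probes = [(1, 2, 3), (5, 10, 20)]

confirm_only = falsified(positive_probes)   # hypothesis survives, wrongly
seek_refutation = falsified(negative_probes)  # hypothesis is exposed
```

Because the hypothesis describes a subset of the hidden rule, probes that merely confirm it can never reveal the error; only probes it predicts to fail can.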

2606.04750 2026-06-04 cs.AI cs.CY cs.LG Version update

Fog of Love: Engineering Virtuous Agent Behavior with Affinity-based Reinforcement Learning in a Game Environment

Ajay Vishwanath, Christian Omlin

Affiliations: University of Agder

AI summary: Proposes an affinity-based reinforcement learning method that uses policy regularization to achieve both competitive and cooperative objectives in the multi-agent role-playing game Fog of Love, while improving the interpretability of agent behavior.

Abstract

Instilling virtuous behavior in artificial intelligence has attracted increasing interest. One proposed technique is affinity-based reinforcement learning, which uses policy regularization on the objective function to incentivize virtuous actions without being fully dependent on the reward function design. Thus far, this technique has been demonstrated to be effective in grid worlds and toy-problem environments with minimal state and action spaces. To expand this research to more sophisticated environments, we introduce a two-player multi-agent environment based on the role-playing board game Fog of Love. In this environment, two agents compete to fulfill their individual virtues, while also cooperating to satisfy their relationship. Given the multi-agent nature, this is a complex problem where multi-agent deep deterministic policy gradient agents neither compete nor cooperate successfully. We present evidence that localized affinities enhance agent performance in achieving both competitive and cooperative objectives, resulting in superior overall scores in both domains. This not only results in virtuous choices but also clarifies an agent's teleology and makes its behavior human-level interpretable.

2606.04743 2026-06-04 cs.CL cs.AI cs.LG Version update

TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration

Soyeong Jeong, Jinheon Baek, Minki Kang, Sung Ju Hwang

Affiliations: KAIST; DeepAuto.ai

AI summary: Proposes TIDE, a framework that uses template-guided iteration to proactively discover multiple hidden problems in a user's context and pair them with concrete actions, substantially improving task coverage, identification, and resolution in two settings: personal workspaces and software repositories.

Abstract

Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on explicit user requests, which surface only the problems the user has noticed, while many other important problems coexist, hidden in plain sight, within the broader user context, with their total number unknown in advance. We frame this as the task of discovering multiple hidden problems from context, in which coexisting problems should be uncovered, grounded in supporting evidence, and paired with concrete actions. To this end, we introduce TIDE, a template-guided iterative framework with two complementary mechanisms. Specifically, motivated by the observation that single-pass prediction anchors on the most salient cases and yields generic claims, we propose iterative discovery, which surfaces a small batch of candidates per round while conditioning on what has already been found, so subsequent rounds extend coverage; and thought templates, reusable schemas distilled from previously solved cases that specify what contextual signals to attend to and how to connect them, anchoring each prediction in a recognizable problem class. We validate TIDE on two realistic settings, personal workspaces and software repositories, across four model backbones, showing substantial gains over single-shot and parallel multi-agent baselines on task coverage, identification, and resolution.

2606.04739 2026-06-04 cs.SE cs.AI Version update

Revisiting Vul-RAG: Reproducibility and Replicability of RAG-based Vulnerability Detection with Open-Weight Models

Sabrina Kaniewski, Fabian Schmidt, Tobias Heer

Affiliations: Institute for Secure Networked Systems, Esslingen University; Institute for Intelligent Systems, Esslingen University

AI summary: Reproduces and extends the Vul-RAG framework using local deployment and a range of open-weight models, finding a performance ceiling of about 0.30 pairwise accuracy that gains in model capability do not substantially raise.

Comments: Accepted at AI&CCPS 2026 workshop, co-located with the 21st International Conference on Availability, Reliability and Security (ARES 2026). This is the authors' preprint version.

Abstract

Large language models (LLMs) have shown strong potential for automated software vulnerability detection, particularly in retrieval-augmented generation (RAG) settings. However, for approaches relying on proprietary models and APIs, reproducibility and replicability remain largely unexplored, raising the question of whether reported results generalize or depend primarily on specific model choices. In this work, we present a reproducibility study of Vul-RAG, a RAG-based framework for source code vulnerability detection that enhances LLMs with high-level vulnerability knowledge. We first replicate the results in a fully local and open-weights setting using the reported open-weight baseline models. We then extend the evaluation to a diverse set of recent open-weight LLMs, including code-specialized, general-purpose, and reasoning models of varying parameter sizes. The results confirm that the findings of Vul-RAG are reproducible under local deployment, but with minor deviations. Across all evaluated models, we observe a performance plateau at approximately 0.30 pairwise accuracy (code pairs for which both the vulnerable and the patched function are correctly classified). Notably, this plateau persists even for more recent and advanced models, indicating that improvements in model capacity alone do not substantially enhance performance. Finally, we discuss practical implications and trade-offs between detection effectiveness, model capabilities, and model scale. Implementation and evaluation artifacts are publicly available at https://github.com/hs-esslingen-it-security/revisiting-Vul-RAG.
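
The pairwise accuracy metric defined in the abstract (a pair counts only if both the vulnerable and the patched function are classified correctly) can be written down in a few lines. The function name and the input encoding are assumptions made for this sketch, not the artifact's actual code.

```python
def pairwise_accuracy(pair_predictions):
    """Fraction of code pairs in which both functions are classified correctly.

    pair_predictions: list of (vulnerable_flagged, patched_flagged) booleans,
    where True means the detector labeled that function as vulnerable.
    A pair is correct only if the vulnerable function is flagged and the
    patched function is not.
    """
    pairs = list(pair_predictions)
    if not pairs:
        raise ValueError("at least one pair is required")
    correct = sum(1 for vuln_flagged, patched_flagged in pairs
                  if vuln_flagged and not patched_flagged)
    return correct / len(pairs)

# Hypothetical detector outputs for four pairs.
example = [(True, False), (True, True), (False, False), (True, False)]
score = pairwise_accuracy(example)
```

A detector that flags everything as vulnerable scores 0 under this metric even though it catches every vulnerable function, which is why the metric is stricter than per-function accuracy.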

2606.04736 2026-06-04 cs.LG cs.AI Version update

Curvature-aware dynamic precision approach for physics-informed neural networks

Yingjie Shao, Ioannis N. Athanasiadis, George van Voorn, Taniya Kapoor

Affiliations: Mathematical & Statistical Methods Group (Biometris), Wageningen University & Research; Artificial Intelligence Group, Wageningen University & Research

AI summary: Proposes a curvature-aware precision controller that uses curvature information from the L-BFGS optimizer to adjust numerical precision dynamically, reducing the computational cost of double-precision training while retaining prediction accuracy.

Abstract

Physics-informed neural networks (PINNs) have become a promising framework for simulating partial differential equations (PDEs) by embedding physical laws directly into neural network training. However, recent studies show that PINN optimisation is sensitive to numerical precision. Existing implementations commonly use either single precision (FP32), which is computationally efficient but prone to failure modes, or double precision (FP64), which is robust but substantially expensive. This creates a trade-off between computational efficiency and numerical accuracy. To reduce the computational cost of double-precision training while retaining prediction accuracy, we propose a curvature-aware precision controller that adapts numerical precision during training rather than treating it as a fixed implementation choice. The proposed method reuses curvature information derived from the limited-memory BFGS (L-BFGS) optimiser to construct a precision controller, retaining FP32 when lower precision is sufficient and promoting computation to FP64 when the training dynamics indicate numerical sensitivity or precision-limited stagnation. We evaluate the proposed approach on four canonical PINN failure-mode benchmarks and an irradiance-driven ordinary differential equation example. We further test the proposed approach across different neural network architectures. The method consistently matches or even slightly exceeds full FP64 solution accuracy while reducing training time relative to full double-precision training on all benchmark equations. The obtained results indicate that precision sensitivity in PINN optimisation is phase-dependent, and that selectively applying higher precision only during numerically critical stages can lower computational cost without sacrificing predictive accuracy.

2606.04735 2026-06-04 cs.LG cs.AI Version update

Trace-Mediated Peak Bias: Bridging Temporal Credit Assignment and Cognitive Heuristics in Deep Reinforcement Learning

Viktor Veselý, Aleksandar Todorov, Erwan Escudie, Matthia Sabatelli

Affiliations: Department of AI, University of Groningen

AI summary: Identifies Trace-Mediated Peak Bias (TMPB) in deep reinforcement learning, presents it as a mechanistic basis for the Peak-End Rule, and shows that adaptive optimizers mitigate the bias through second-moment normalization.

Abstract

Temporal credit assignment is central to both biological and artificial intelligence, yet its interaction with non-linear function approximation is poorly understood. We identify a systematic failure mode in deep reinforcement learning (RL) termed Trace-Mediated Peak Bias (TMPB). At intermediate eligibility trace depths, agents irrationally prefer trajectories with high-magnitude reward ``peaks'' over alternatives with higher cumulative returns. This provides a mechanistic account of the Peak-End Rule: a human memory bias where experiences are judged by their most intense moments rather than integrated utility. We show that TMPB emerges because traces amplify distal Temporal Difference errors into ``gradient shocks'' that fixed-step-size Stochastic Gradient Descent cannot normalize, leading to global overestimation. Conversely, adaptive optimizers mitigate this pathology via second-moment normalization. Our results suggest that human-like saliency distortions may emerge naturally from the mathematical constraints of credit assignment in distributed systems, and that adaptive optimization is a theoretical necessity for rational value estimation.
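
For readers unfamiliar with eligibility traces, the mechanism by which a single large reward reaches earlier states can be seen in tabular TD(λ). This is a textbook-style sketch with hypothetical names and numbers; the paper studies deep, non-linear function approximation, which this toy does not model.

```python
def td_lambda_episode(values, transitions, alpha=0.1, gamma=1.0, lam=0.5):
    """One episode of tabular TD(lambda) with accumulating traces.

    values: dict mapping state -> current value estimate (updated in place).
    transitions: list of (state, reward, next_state); next_state is None
    at the end of the episode.
    """
    traces = {s: 0.0 for s in values}
    for state, reward, next_state in transitions:
        v_next = 0.0 if next_state is None else values[next_state]
        td_error = reward + gamma * v_next - values[state]
        for s in traces:
            traces[s] *= gamma * lam      # decay all traces
        traces[state] += 1.0              # mark the visited state
        for s in values:
            values[s] += alpha * td_error * traces[s]
    return values

# Three-state chain with one reward "peak" of 10 at the final step.
v = td_lambda_episode({0: 0.0, 1: 0.0, 2: 0.0},
                      [(0, 0.0, 1), (1, 0.0, 2), (2, 10.0, None)])
```

Here the one TD error of 10 updates all three states at once, scaled by trace weights 0.25, 0.5, and 1.0; a larger `lam` spreads the same error further back in time.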

2606.04705 2026-06-04 cs.CV cs.AI Version update

Enhancing MedSAM with a Lightweight Box Predictor for Medical Image Segmentation

Amirhossein Movahedisefat, Amirreza Fateh, Mohammad Reza Mohammadi

Affiliations: School of Computer Engineering, Iran University of Science and Technology (IUST)

AI summary: Proposes an enhanced MedSAM framework with a lightweight Box Predictor that estimates a bounding box from a single click to strengthen the spatial guidance of point prompts, improving the accuracy and robustness of multi-modality medical image segmentation with only 1.6M additional parameters.

Abstract

Semantic segmentation in medical imaging is a critical yet challenging task due to data scarcity and high variability across modalities. While foundation models like the Segment Anything Model (SAM) show promise, they often struggle with medical images without specific adaptation. Moreover, point prompts, despite being the most natural form of user interaction, provide insufficient spatial context for reliable segmentation, particularly when target structures are irregular or poorly contrasted. In this paper, we propose an enhanced segmentation framework that integrates a lightweight Box Predictor module into the MedSAM architecture. The Box Predictor estimates an approximate bounding box from a single user click using localized image embedding features, providing spatial guidance that reduces the ambiguity of point prompts, while introducing only 1.6M additional parameters and negligible inference overhead. We introduce a two-stage training pipeline where the Box Predictor is trained independently before being integrated into MedSAM. To validate the generalization capability of our method, we conduct extensive evaluations on four diverse datasets (FLARE22, BRISC, BUSI, LungSegDB) spanning distinct imaging modalities, including CT, MRI, and Ultrasound. Our method improves segmentation accuracy and robustness across varied anatomical structures and imaging domains, achieving Dice scores of 0.89 (BUSI), 0.93 (FLARE22), 0.88 (BRISC), and 0.98 (LungSegDB). Code is available at https://github.com/Amirhosseinmovahedi/MedSAM-BoxPredictor

2606.04699 2026-06-04 cs.LG cs.AI cs.CV Version update

Graph-Guided Universum Learning in Generalized Eigenvalue Proximal SVMs for Alzheimer's Disease Classification

Yogesh Kumar, Vrushank Ahire, Mudasir Ganaie

Affiliations: Dept. of Computer Science and Engineering, IIT Ropar, Punjab 140001, India

AI summary: For Alzheimer's disease classification, proposes two graph-guided Universum learning models, UG-GEPSVM and IUG-GEPSVM, which build a graph-Laplacian regularizer from mild cognitive impairment samples in place of the conventional independent penalty term and achieve better performance on ADNI MRI datasets.

Abstract

Early and accurate detection of Alzheimer's disease (AD) is important for timely intervention and disease management. Generalized Eigenvalue Proximal Support Vector Machine (GEPSVM) and its Universum-based variants have shown promising results for AD classification. However, existing methods treat Universum samples as independent points and do not consider the geometric relationships among them. This paper proposes two graph-guided Universum learning models, namely UG-GEPSVM and IUG-GEPSVM, for AD versus cognitively normal (CN) classification using structural MRI data. In the proposed framework, mild cognitive impairment (MCI) subjects are used as Universum data to provide intermediate information between AD and CN classes. A graph is constructed over the Universum samples using Gaussian similarity, Minimum Spanning Tree connectivity, and multi-hop propagation. From this graph, a Laplacian matrix is derived that captures the geometric structure of the MCI samples. This Laplacian-based regularization is incorporated into the learning process in place of the conventional independent Universum penalty term. UG-GEPSVM integrates this regularization into the generalized eigenvalue formulation, while IUG-GEPSVM extends the numerically stable improved GEPSVM framework using a standard eigenvalue formulation. Experiments on ADNI MRI dataset variants using ICA- and PCA-based features at five different noise levels show that both proposed models consistently outperform existing GEPSVM and Universum-based methods. UG-GEPSVM achieves the highest average AUC of 88.07% and maintains stable performance under increasing noise levels. Statistical tests further confirm the significance of the observed improvements.
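
The Laplacian regularizer mentioned above can be sketched for the simplest case, a fully connected graph with Gaussian similarity weights. The Minimum Spanning Tree and multi-hop steps are deliberately omitted, and the function names, `sigma`, and sample values are assumptions for illustration only.

```python
import numpy as np

def gaussian_laplacian(X, sigma=1.0):
    """Unnormalized graph Laplacian L = D - W from Gaussian similarities.

    X: array of shape (n_samples, n_features), e.g. Universum (MCI) features.
    """
    X = np.asarray(X, dtype=float)
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)              # no self-loops
    D = np.diag(W.sum(axis=1))            # degree matrix
    return D - W, W

def smoothness_penalty(L, f):
    """Quadratic form f^T L f = 1/2 * sum_ij W_ij (f_i - f_j)^2."""
    f = np.asarray(f, dtype=float)
    return float(f @ L @ f)

# Three hypothetical Universum samples and their decision values.
X_u = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
f_u = np.array([1.0, -1.0, 0.5])
L_u, W_u = gaussian_laplacian(X_u)
penalty = smoothness_penalty(L_u, f_u)
```

The penalty is small when nearby Universum samples receive similar decision values, which is how a Laplacian term couples the samples instead of penalizing each one independently.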

2606.04684 2026-06-04 cs.CV cs.AI Version update

Real-Time Automatic License Plate Recognition Using YOLOv8, SORT Tracking, and Temporal Data Interpolation

Mirza Muhammad Mobeen

Affiliations: University of Science and Technology of China

AI summary: Proposes a five-stage end-to-end pipeline combining YOLOv8 object detection, SORT multi-object tracking, and temporal data interpolation to address low recognition rates and fragmented tracking paths caused by illumination changes, occlusion, and similar factors in dynamic traffic monitoring.

Comments: 7 pages. Code: https://github.com/mobeen-pmo/Automatic-License-Plate-Recognition

Abstract

The demands of real-time video processing seriously limit the use of Automatic License Plate Recognition (ALPR) in dynamic traffic-monitoring settings. Reliable recognition under unconstrained conditions, e.g. drastic variations in illumination, acute camera angles, high vehicle speeds, and heavy physical occlusion, is difficult and often leads to fragmented tracking paths and poor Optical Character Recognition (OCR) rates. To mitigate these weaknesses, the study proposes a five-stage, end-to-end algorithmic pipeline that combines deep-learning-based object detection, kinematic multi-object tracking, and geometric temporal data interpolation. The proposed architecture uses a YOLOv8 nano model to localize vehicles in the first stage, after which the Simple Online and Realtime Tracking (SORT) algorithm builds spatio-temporal links between frames. A second, more specialized YOLOv8 object detector locates the license plate region and passes the cropped array to an EasyOCR chain constrained by positional syntax verification. Finally, an offline temporal bounding-box interpolation mechanism reconnects fragmented paths.
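
The last stage, filling gaps in a track offline, can be approximated by linear interpolation of box coordinates between the frames where a detection exists. The function name and data layout are assumptions for this sketch; the repository's own interpolation code may work differently.

```python
def interpolate_boxes(track):
    """Fill missing frames of one track by linear interpolation.

    track: dict mapping frame index -> (x1, y1, x2, y2) for frames where the
    object was detected. Returns a dict covering every frame between the
    first and last detection.
    """
    frames = sorted(track)
    if not frames:
        return {}
    filled = {}
    for f0, f1 in zip(frames, frames[1:]):
        b0, b1 = track[f0], track[f1]
        for f in range(f0, f1):
            t = (f - f0) / (f1 - f0)   # 0 at f0, approaching 1 at f1
            filled[f] = tuple(a + t * (b - a) for a, b in zip(b0, b1))
    filled[frames[-1]] = tuple(float(v) for v in track[frames[-1]])
    return filled

# Hypothetical track with detections at frames 10 and 13 only.
dense = interpolate_boxes({10: (0, 0, 30, 30), 13: (30, 60, 60, 90)})
```

Because the method needs the detection after the gap, it only works once the full video has been processed, which matches the offline framing in the abstract.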

2606.04662 2026-06-04 cs.LG cs.AI 版本更新

Why Muon Outperforms Adam: A Curvature Perspective

为什么 Muon 优于 Adam：曲率视角

Shuche Wang, Fengzhuo Zhang, Jiaxiang Li, Dirk Bergemann, Zhuoran Yang

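Affiliations * National University of Singapore; Yale University; University of Minnesota

AI summary: Taking a curvature perspective, the paper uses a Taylor expansion and a curvature decomposition to show that Muon achieves a larger one-step loss decrease than Adam because of its lower Normalized Directional Sharpness (NDS); data imbalance and within-layer curvature are the main sources of the advantage.

AI Chinese abstract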

Muon 在大语言模型训练中相比 Adam 将训练效率提升约两倍，但这一优势的局部几何来源尚不清楚。我们的工作首次从曲率视角尝试揭开 Muon 优于 Adam 的原因。首先，我们对训练损失曲面应用二阶泰勒近似，表明在匹配验证损失下，Muon 比 Adam 实现更大的一步损失下降。两种优化器的一阶增益相当，但 Muon 始终承受更小的二阶曲率惩罚。其次，我们将该曲率惩罚分解为更新范数的平方和归一化方向锐度（NDS）。我们发现 Muon 和 Adam 的更新范数相当，因此 Muon 更小的曲率惩罚源于更低的 NDS，而非更新尺度。第三，我们研究训练数据和模型结构如何塑造 Muon 的 NDS 优势。使用具有受控不平衡的 Zipf-概率上下文无关文法（PCFG）数据，我们表明数据不平衡放大了 Muon 相对于 Adam 的 NDS 优势。进一步的层内/跨层分解表明，在训练的中后期，Muon 更低的 NDS 主要由更小的层内曲率维持。除了经验证据，我们还分析了具有异质曲率和梯度对齐于高曲率模式的风格化二次问题。我们证明 Muon 通过平衡曲率组间的更新能量，实现了比 GD 更低的平均 NDS；当曲率异质性足够强时，在相同步数后这也产生更低的局部二次损失。

English abstract

Muon improves training efficiency over Adam in large language-model training by about two times, but the local geometric source of this advantage remains unclear. Our work takes a first step toward demystifying Muon's superiority over Adam from a curvature perspective. First, we apply a second-order Taylor approximation to the training landscape and show that Muon achieves a larger one-step loss decrease than Adam at matched validation loss. The two optimizers have comparable first-order gains, but Muon consistently incurs a smaller second-order curvature penalty. Second, we decompose this curvature penalty into the squared update norm and Normalized Directional Sharpness (NDS). We find that Muon and Adam have comparable update norms, so Muon's smaller curvature penalty is driven by lower NDS, not update scale. Third, we study how training data and model structure shape Muon's NDS advantage. Using Zipf-Probabilistic Context-Free Grammar (PCFG) data with controlled imbalance, we show that data imbalance amplifies Muon's NDS advantage over Adam. A within-/cross-layer decomposition further shows that, in the middle and late stages of training, Muon's lower NDS is mainly sustained by smaller within-layer curvature. Beyond empirical evidence, we analyze stylized quadratic problems with heterogeneous curvature and gradient alignment toward high-curvature modes. We prove that Muon attains a smaller average NDS than GD by balancing update energy across curvature groups; when curvature heterogeneity is sufficiently strong, this also yields lower local quadratic loss after the same number of steps.
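The decomposition described in the abstract can be written out as follows. This is a sketch in assumed notation ($\theta$ for the parameters, $\Delta$ for the optimizer update, $g$ and $H$ for the gradient and Hessian of the training loss $L$); the paper's exact normalization of NDS may differ.

```latex
\begin{aligned}
L(\theta + \Delta) - L(\theta)
  &\approx \underbrace{g^\top \Delta}_{\text{first-order gain}}
   + \underbrace{\tfrac{1}{2}\,\Delta^\top H \Delta}_{\text{curvature penalty}}, \\
\tfrac{1}{2}\,\Delta^\top H \Delta
  &= \tfrac{1}{2}\,\lVert \Delta \rVert^2 \cdot \mathrm{NDS}(\Delta),
\qquad
\mathrm{NDS}(\Delta) = \frac{\Delta^\top H \Delta}{\lVert \Delta \rVert^2}.
\end{aligned}
```

Under this reading, two updates with the same norm $\lVert \Delta \rVert$ differ in curvature penalty only through NDS, which is consistent with the abstract's statement that the gap comes from update direction rather than update scale.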

2606.04656 2026-06-04 cs.CV cs.AI 版本更新

Instance-Level Post Hoc Uncertainty Quantification in Object Detection

目标检测中的实例级事后不确定性量化

Chongzhe Zhang, Zifan Zeng, Qunli Zhang, Feng Liu, Zheng Hu

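Affiliations * Tsinghua University

AI summary: Proposes the Monte Carlo generalized linear model (MC-GLM) for instance-level, approximate post hoc uncertainty quantification in object detection; it requires no retraining and is validated on the nuScenes dataset.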

Comments 7 pages, 2 figures

2606.04648 2026-06-04 cs.AI 版本更新

BiNSGPS: Geometry Problem Solving via Bidirectional Neuro-Symbolic Interaction

BiNSGPS: 通过双向神经符号交互解决几何问题

Qi Wang, Peijie Wang, Fei Yin, Cheng-Lin Liu

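Affiliations * MAIS, Institute of Automation of Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences

AI summary: Proposes the BiNSGPS framework, in which bidirectional neuro-symbolic interaction between a multimodal large language model advisor and a symbolic solver dynamically corrects inconsistent formal representations or proposes auxiliary hypotheses, addressing early errors and symbolic conflicts in geometry problem solving.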

2606.04646 2026-06-04 cs.CL cs.AI cs.IR 版本更新

QO-Bench: Diagnosing Query-Operator-Preserving Retrieval over Typed Event Tuples

QO-Bench: 诊断类型化事件元组上的查询操作符保持检索

Mengao Zhang, Xiang Yang, Chang Liu, Tianhui Tan, Ke-wei Huang

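Affiliations * Asian Institute of Digital Finance, National University of Singapore

AI summary: Proposes the QO-Bench benchmark, which uses deterministic evaluation over typed event tuples to diagnose the execution bottlenecks of retrieval-augmented generation systems on query operators such as joins and intersection.

Comments 14 pages

AI Chinese abstract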

许多关于商业、法律和科学语料库的现实世界问题是文本中潜在记录的数据库风格查询的自然语言版本。现有的检索增强生成（RAG）系统主要针对语义相关性进行优化，但检索到看似相关的段落并不能保证正确的查询执行。我们引入了QO-Bench，一个用于类型化事件元组上查询操作符问答的诊断基准。该基准涵盖22,984篇新闻文章和614个公司事件，涉及18个查询模板，在785个问题上进行评估。每个黄金答案由类型化事件元组确定性计算得出，并通过召回率评分，答案通过精确匹配而非LLM评判器与黄金元组匹配。这种设计支持操作符级别的诊断，如连接和交集。我们在匹配条件下评估了RAG、ReAct RAG、GraphRAG和信息提取到SQL的方法，并设置了一个长上下文oracle上限以隔离检索失败。一个双轴框架——索引时保持与查询时执行——预测了每种范式失败的位置，结果证实了这一点：系统检索到相关文本，但丢弃了操作符所需的类型化值，并且可部署的范式排名在不同操作符间反转，相似性检索在过滤/投影上领先，而提取到SQL在交集和计数上领先。即使提供了黄金证据，长上下文oracle也远未饱和，因此操作符执行——而不仅仅是检索——是一个核心瓶颈，更强的答案模型也无法消除。QO-Bench将目标从段落相关性重新定义为查询操作符保持检索。

English abstract

Many real-world questions over business, legal, and scientific corpora are natural-language versions of database-style queries over records latent in text. Existing retrieval-augmented generation (RAG) systems are optimized primarily for semantic relevance, but retrieving plausible passages does not guarantee correct query execution. We introduce QO-Bench, a diagnostic benchmark for query-operator question answering over typed event tuples. The benchmark covers 22,984 news articles and 614 corporate events across 18 query templates, evaluated on 785 questions. Each gold answer is deterministically computed from typed event tuples and scored by recall, with answers matched to the gold tuples by exact match rather than an LLM judge. This design enables operator-level diagnosis such as joins and intersection. We evaluate RAG, ReAct RAG, GraphRAG, and information-extraction-to-SQL under matched conditions, with a long-context oracle ceiling to isolate retrieval failure. A two-axis framework -- index-time preservation versus query-time execution -- predicts where each paradigm fails, and the results bear it out: systems retrieve relevant text but discard the typed values operators need, and the deployable paradigm ranking inverts across operators, with similarity retrieval leading on filter/project and extraction-to-SQL on intersection and counting. Even given the gold evidence, a long-context oracle stays far from saturated, so operator execution -- not retrieval alone -- is a core bottleneck that a stronger answer model does not remove. QO-Bench reframes the goal from passage relevance to query-operator-preserving retrieval.
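The scoring rule in the abstract (exact match against gold tuples, reported as recall) can be sketched as below. The tuple layout `(company, event_type, date)` and the function name `exact_match_recall` are assumptions made for illustration; the benchmark's actual schema is not given in the abstract.

```python
from typing import Iterable, Tuple

EventTuple = Tuple[str, ...]  # assumed layout: (company, event_type, date)


def exact_match_recall(predicted: Iterable[EventTuple],
                       gold: Iterable[EventTuple]) -> float:
    """Fraction of gold tuples that appear verbatim among the predictions."""
    gold_set = set(gold)
    if not gold_set:
        return 1.0  # convention assumed here: nothing to recall
    hits = gold_set & set(predicted)
    return len(hits) / len(gold_set)


gold = [("Acme", "acquisition", "2021-03-01"), ("Acme", "layoff", "2022-07-15")]
predicted = [("Acme", "acquisition", "2021-03-01"), ("Acme", "layoff", "2022-07-16")]
print(exact_match_recall(predicted, gold))  # 0.5
```

The second prediction is off by one day and therefore earns no credit, which shows why a system that retrieves the right passage but loses a typed value still fails under this kind of scoring.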

2606.04620 2026-06-04 cs.LG cs.AI 版本更新

QuBLAST: A Framework for Quantizing Large Language Models with Block-Level Compression Approach and Activation Scaling Strategy

QuBLAST: 一种采用块级压缩方法和激活缩放策略量化大语言模型的框架

Pasindu Wickramasinghe, Achyuta Muthuvelan, Rachmad Vidya Wicaksana Putra, Minghao Shao, Muhammad Shafique

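Affiliations * eBRAIN Lab, Division of Engineering, New York University (NYU) Abu Dhabi

AI summary: To ease the deployment of large language models, proposes the QuBLAST framework, which uses block-level mixed-precision quantization and an activation scaling strategy to reduce model size by 40%-45.2% while keeping the perplexity increase within 5%.

Comments 10 pages, 9 figures, 5 tables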

Ekka: Automated Diagnosis of Silent Errors in LLM Inference

Yile Gu, Zhen Zhang, Shaowei Zhu, Xinwei Fu, Jun Wu, Yida Wang, Baris Kasikci

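Affiliations * University of Science and Technology of China; Tsinghua University

AI summary: Proposes Ekka, a system that automatically diagnoses silent errors in LLM inference frameworks by aligning and comparing intermediate execution states through differential debugging, reaching 80% pass@1 and 88% pass@5 diagnosis accuracy on a benchmark of real-world errors.

Comments ICML 2026

AI Chinese abstract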

LLM服务框架随着复杂的软件栈和大量优化而快速发展。快速开发过程可能引入静默错误，即输出质量在没有任何显式错误信号的情况下悄然下降。由于高层症状与底层根本原因之间存在巨大的语义鸿沟，诊断静默错误非常困难。我们观察到，通过利用语义正确的参考实现，静默错误的诊断可以有效地构建为差分调试问题。我们提出了Ekka，一个自动诊断系统，通过系统地对齐和比较目标框架与参考框架之间的中间执行状态来识别根本原因。我们构建了一个来自流行服务框架的真实静默错误基准，Ekka显示出80%的pass@1诊断准确率和88%的pass@5诊断准确率，优于现有系统。Ekka还诊断了服务框架中的4个新静默错误，所有错误均已得到开发者确认。

English abstract

LLM serving frameworks are quickly evolving with a complex software stack and a vast number of optimizations. The rapid development process can introduce silent errors where output quality silently degrades without any explicit error signals. Diagnosing silent errors is notoriously difficult due to the substantial semantic gap between the high-level symptoms and the low-level root causes. We observe that diagnosis of silent errors can be effectively framed as a differential debugging problem by leveraging the existence of semantically correct reference implementations. We propose Ekka, an automated diagnosis system that identifies root causes by systematically aligning and comparing intermediate execution states between a target and a reference framework. We constructed a benchmark of real-world silent errors from popular serving frameworks, where Ekka shows 80% pass@1 diagnosis accuracy and 88% pass@5 diagnosis accuracy, outperforming state-of-the-art systems. Ekka also diagnoses 4 new silent errors from serving frameworks, all of which have been confirmed by the developers.
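The core idea of differential debugging, comparing intermediate states of a target against a trusted reference and reporting where they first diverge, can be sketched as follows. This is not Ekka's implementation: the state names, the flat-list representation of tensors, and the helper `first_divergence` are hypothetical, and the sketch assumes the two frameworks' states have already been aligned, which the abstract identifies as part of the real problem.

```python
import math
from typing import Dict, List, Optional, Sequence


def first_divergence(target: Dict[str, Sequence[float]],
                     reference: Dict[str, Sequence[float]],
                     order: List[str],
                     rel_tol: float = 1e-3) -> Optional[str]:
    """Return the first named state where target and reference disagree.

    `order` lists state names in execution order; None means no divergence.
    """
    for name in order:
        t, r = target[name], reference[name]
        if len(t) != len(r) or not all(
            math.isclose(a, b, rel_tol=rel_tol, abs_tol=1e-6)
            for a, b in zip(t, r)
        ):
            return name
    return None


order = ["embed", "attn_0", "mlp_0"]
reference = {"embed": [0.1, 0.2], "attn_0": [0.5, 0.7], "mlp_0": [1.0, 1.2]}
target = {"embed": [0.1, 0.2], "attn_0": [0.5, 0.9], "mlp_0": [1.4, 1.2]}
print(first_divergence(target, reference, order))  # attn_0
```

Reporting the earliest divergent state matters because later states (here `mlp_0`) also differ, but only as a downstream consequence of the first error.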

2606.04592 2026-06-04 cs.CY cs.AI cs.HC 版本更新

Synthetic Personalities: How Well Can LLMs Mimic Individual Respondents Using Socio-Economic Microdata?

合成人格：LLM 如何使用社会经济微观数据模仿个体受访者？

Leonard Kinzinger, Jochen Hartmann

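Affiliations * Technical University of Munich

AI summary: Builds individual-level digital twins from German Socio-Economic Panel data and evaluates how construction choices (model, information depth, embedding method, reasoning mode) affect the accuracy of more than 2 million twin responses; information depth reaches a cost-efficient Pareto point at the 75% entropy quartile, and best-cell accuracy reaches 78.8%.

AI Chinese abstract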

基于LLM的数字孪生有望扩展和加速市场研究，但大多数已发表的孪生要么是基于少数人口统计问题的粗略角色机器人，要么是基于专门收集的调查和访谈记录构建的详细个体级孪生。这两种设置都不涉及营销实践中操作上最相关的情况：从企业通过CRM系统、忠诚度计划和重复调查积累的现有异构面板数据中构建详细的个体孪生。我们从德国社会经济面板（SOEP）构建详细的个体级孪生，并在一个$3 \times 5 \times 2 \times 2$的构建方法网格中评估它们，该网格涵盖三个开放权重的LLM、五个按归一化香农熵排序的累积信息深度、两种嵌入方法和两种推理模式，对500名参与者和183个保留问题评分超过210万个孪生响应。孪生质量随信息深度提高，但超过75%熵分位数后收益递减，该分位数相对于性能最佳的100%单元充当成本效益帕累托点。将嵌入从叙述性角色摘要切换到原始对话历史（过去响应）在100%深度下每个模型-推理单元中提高了保留准确率，而显式思考模式提高了秩次相关性但不改变准确率。最佳单元准确率达到78.8%，Fisher-$z$相关性在SOEP保留评估集上达到$r = 0.590$。研究结果表明，基于孪生的市场研究不再受数据设计限制，而是受项目数量、模型选择和本文现在映射的一小部分构建级决策限制。

English abstract

LLM-based digital twins promise to scale and accelerate market research, but most published twins are either coarse persona bots conditioned on a few demographic questions or detailed individual-level twins built on purpose-collected surveys and interview transcripts. Neither setup speaks to the operationally most relevant case for marketing practice: building detailed individual twins from the pre-existing heterogeneous panel data that firms already accumulate through CRM systems, loyalty programs, and repeat surveys. We construct detailed individual-level twins from the German Socio-Economic Panel (SOEP) and evaluate them across a $3 \times 5 \times 2 \times 2$ construction-method grid that covers three open-weights LLMs, five cumulative information depths ranked by normalized Shannon entropy, two embedding methods, and two reasoning modes, scoring over 2.1 million twin responses on 500 participants and 183 held-out questions. Twin quality rises with information depth but with diminishing returns past the 75 percent entropy quartile, which acts as a cost-efficient Pareto point relative to the best-performing 100 percent cells. Switching the embedding from a narrative persona summary to a raw dialog history of past responses raises hold-out accuracy in every model-by-reasoning cell at the 100 percent depth, while an explicit thinking mode raises rank-order correlation without moving accuracy. Best-cell accuracy reaches 78.8 percent and Fisher-$z$ correlation reaches $r = 0.590$ on the SOEP held-out evaluation set. The findings suggest that twin-based market research is no longer gated by data design, but by item volume, model selection, and a small set of construction-level decisions that this paper now maps.
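The information depths are ranked by normalized Shannon entropy. A minimal sketch of that quantity is given below, assuming it is computed per survey item over the observed answers and divided by the maximum entropy for the item's number of answer options. The function name and this per-item reading are assumptions; the abstract does not spell out the exact procedure.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def normalized_entropy(responses: Sequence[Hashable], num_options: int) -> float:
    """Shannon entropy of the answer distribution, scaled to [0, 1]."""
    if num_options < 2 or not responses:
        return 0.0
    total = len(responses)
    probs = [count / total for count in Counter(responses).values()]
    entropy = sum(p * math.log(1 / p) for p in probs)
    return entropy / math.log(num_options)  # log(K) is the maximum for K options


uniform = ["a", "b", "c", "d"]   # every option equally common
constant = ["a", "a", "a", "a"]  # everyone gives the same answer
print(normalized_entropy(uniform, 4), normalized_entropy(constant, 4))
```

Items with a value near 1 split respondents evenly and therefore carry the most information about an individual, while items near 0 say almost nothing that distinguishes one respondent from another.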

2606.04579 2026-06-04 cs.AI 版本更新

SCI-PRM: A Tool Aware Process Reward Model for Scientific Reasoning Verification

SCI-PRM：用于科学推理验证的工具感知过程奖励模型

Xiangyu Zhao, Hengyuan Zhao, Yiheng Wang, Wanghan Xu, Yuhao Zhou, Qinglong Cao, Zhiwang Zhou, Lei Bai, Wenlong Zhang, Xiao-Ming Wu

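Affiliations * The Hong Kong Polytechnic University; Shanghai AI Lab; National University of Singapore; Shanghai Jiao Tong University; Sichuan University; Tongji University

AI summary: To address tool use and factual consistency in scientific reasoning, proposes Sci-PRM: the authors build SCIPRM70K, a dataset of Chain-of-Tool trajectories, and train a process reward model that provides fine-grained supervision for test-time scaling and reinforcement learning, improving foundation-model performance.

Comments Accepted by KDD 2026 AI4Science Track

AI Chinese abstract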

虽然过程奖励模型（PRM）在数学推理中取得了显著成功，但它们在复杂科学领域（如生物学、化学和物理学）的应用仍基本未被探索。科学问题不仅要求逻辑严谨，还要求事实一致性和领域特定工具的精确使用，而当前模型在这些方面常常出现幻觉且缺乏验证。在本文中，我们首先构建了SCIPRM70K，这是一个大规模数据集，包含显式地将推理与科学工具执行交错的工具链轨迹。在此基础上，我们训练了一个名为Sci-PRM的高效奖励模型，以在单次推理的每一步提供关于工具选择、执行准确性和结果解释的细粒度监督。实验表明，Sci-PRM在两个关键方面显著增强了基础模型：（1）通过Best-of-N选择实现有效的测试时扩展；（2）当集成到强化学习中时，它作为密集奖励信号，缓解了优势消失的关键问题，使模型能够突破现有性能上限。

English abstract

While Process Reward Models (PRMs) have achieved remarkable success in mathematical reasoning, their application in complex scientific domains such as biology, chemistry, and physics remains largely unexplored. Scientific problems demand not only logical rigor but also factual consistency and the precise usage of domain-specific tools, areas where current models often suffer from hallucinations and lack of verification. In this paper, we first construct SCIPRM70K, a large-scale dataset featuring Chain-of-Tool trajectories that explicitly interleave reasoning with the execution of scientific tools. Building upon this, we train an efficient reward model called Sci-PRM to provide fine-grained supervision on tool selection, execution accuracy, and result interpretation at each step in a single inference pass. Experiments demonstrate that Sci-PRM significantly enhances foundation models in two key aspects: (1) it enables effective test-time scaling via Best-of-N selection; and (2) when integrated into Reinforcement Learning, it serves as a dense reward signal that mitigates the critical issue of advantage disappearance, allowing the model to break through existing performance ceilings.
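Best-of-N selection with a process reward model can be sketched as follows. The `score_steps` callable stands in for a PRM such as Sci-PRM, and scoring a trajectory by its weakest step (min aggregation) is one common choice that is assumed here, not something the abstract specifies.

```python
from typing import Callable, Dict, List, Sequence


def best_of_n(candidates: Sequence[List[str]],
              score_steps: Callable[[List[str]], List[float]]) -> int:
    """Return the index of the candidate trajectory with the best PRM score.

    `score_steps` returns one score in [0, 1] per reasoning step. A
    trajectory is scored by its lowest step score (an assumption).
    """
    def trajectory_score(steps: List[str]) -> float:
        scores = score_steps(steps)
        return min(scores) if scores else 0.0

    return max(range(len(candidates)),
               key=lambda i: trajectory_score(candidates[i]))


# Toy stand-in for a PRM: a fixed lookup table of per-step scores.
toy_scores: Dict[str, float] = {
    "call tool A": 0.9, "misread output": 0.2,
    "call tool B": 0.7, "interpret result": 0.8,
}
candidates = [["call tool A", "misread output"],
              ["call tool B", "interpret result"]]
best = best_of_n(candidates, lambda steps: [toy_scores[s] for s in steps])
print(best)  # 1
```

The first candidate starts with the highest-scoring step but is rejected because one bad step (a misread tool output) invalidates the trajectory, which is the kind of step-level signal an outcome-only reward would miss.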

2606.04562 2026-06-04 cs.AI cs.LG cs.SI 版本更新

Neetyabhas: A Framework for Uncertainty-Aware Public Policy Optimization in Rational Agent-Based Models

Neetyabhas: 理性主体模型中不确定性感知的公共政策优化框架

Janani Venugopalan, Gaurav Deshkar, Rishabh Gaur, Harshal Hayatnagarkar, Jayanta Kshirsagar

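Affiliations * ThoughtWorks

AI summary: Proposes a hierarchical reinforcement learning framework that integrates uncertainty in epidemic measurement and policy implementation; by simulating the interaction between individual behavior and policy interventions, it manages the epidemic effectively and reduces its impact.

AI Chinese abstract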

目的世界卫生组织的COVID-19非药物干预措施（如封锁、疫苗接种）有效遏制了传播，但带来了沉重的经济负担。现有研究常常忽略个体行为，并错误地假设完美的感染追踪和无误的政策执行，未能考虑现实世界的不确定性和错误。方法我们提出了一种整合流行病测量（感染/住院）和政策执行中不确定性的方法。我们构建了一个包含1000名个体的模拟模型，这些个体实时做出关于佩戴口罩、接种疫苗和购物的选择。同时，政策制定者基于健康和经济观察部署干预措施（封锁、强制令）。该框架由分层强化学习智能体驱动，利用深度Q网络以及不确定性感知的策略梯度变体（DDPG和TD3）。结果模拟有效管理了疫情的进展。佩戴口罩和疫苗接种被证明非常有效，显著降低了疫情高峰的高度和持续时间。通过整合个体行为、政策不确定性和多方面的干预措施，我们的动态控制方法成功减轻了疫情的影响。结论我们的模型通过将不确定性和人类行为嵌入公共卫生政策框架，克服了以往研究的局限性。模拟表明，考虑个体选择和不完美数据对于设计复杂疫情期间的有效干预措施至关重要，其中口罩和疫苗是关键工具。

English abstract

Purpose: The WHO's COVID-19 non-pharmaceutical interventions (e.g., lockdowns, vaccinations) effectively curb transmission but impose heavy economic strains. Existing research often neglects individual behaviors and falsely assumes perfect infection tracking and flawless policy execution, failing to account for real-world uncertainties and errors. Methods: We propose an integrative approach incorporating uncertainties in both epidemic measurement (infections/hospitalizations) and policy implementation. We built a simulation model of 1,000 individuals making real-time choices regarding mask-wearing, vaccination, and shopping. Concurrently, policymakers deploy interventions (lockdowns, mandates) based on health and economic observations. This framework is driven by hierarchical reinforcement learning agents, utilizing deep Q-networks alongside uncertainty-aware policy gradient variants (DDPG and TD3). Results: The simulations effectively managed the epidemic's progression. Masking and vaccinations proved highly effective, significantly reducing both the outbreak's peak height and duration. By integrating individual behaviors, policy uncertainties, and multifaceted interventions, our dynamic control approach successfully mitigated the epidemic's impact. Conclusions: Our model overcomes previous research limitations by embedding uncertainty and human behavior into public health policy frameworks. The simulation demonstrates that accounting for individual choices and imperfect data is crucial for designing effective interventions during complex pandemics, with masks and vaccines serving as pivotal tools.

2606.04555 2026-06-04 cs.CL cs.AI 版本更新

Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents

时间顺序对智能体记忆至关重要：面向长程智能体的线段树

Yifan Simon Liu, Liam Gallagher, Faeze Moradi Kalarde, Jiazhou Liang, Armin Toroghi, Scott Sanner

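Affiliations * University of Toronto; Vector Institute for Artificial Intelligence

AI summary: Proposes SegTreeMem, a segment-tree memory architecture that preserves the temporal order of conversation history through an online rightmost-frontier update rule and retrieves with hierarchical temporal context, outperforming existing methods on long-horizon memory benchmarks.

AI Chinese abstract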

长程对话智能体需要通过与用户交互不断演化的事件、任务和目标进行互动。这些历史记录本质上是时间性的，然而许多现有的记忆系统主要按主题相似性组织信息，可能忽略事件发生的顺序。我们引入线段树记忆（Segment Tree Memory，简称SegTreeMem），这是一种将对话历史表示为按时间顺序排列的线段树的记忆架构。SegTreeMem通过在线最右边缘更新规则逐步插入新话语，在形成层次化记忆片段的同时保持时间顺序。在检索时，SegTreeMem通过树传播相关性分数，将局部语义匹配与层次化时间上下文相结合。在三个长程记忆基准和两个LLM骨干网络上，SegTreeMem在答案质量上优于平面检索、图结构记忆和树结构记忆基线。额外的时间顺序排列分析表明，性能提升依赖于在记忆构建过程中保持时间顺序，这支持了时间顺序是智能体记忆关键结构的观点。

English abstract

Long-horizon conversational agents need to interact with users through evolving events, tasks, and goals. Such histories are naturally temporal, yet many existing memory systems organize information primarily by topical similarity and may ignore the order in which events occur. We introduce Segment Tree Memory, or SegTreeMem, a memory architecture that represents conversation history as a temporally ordered Segment Tree over utterances. SegTreeMem incrementally inserts new utterances through an online rightmost-frontier update rule, preserving chronological order while forming hierarchical memory segments. For retrieval, SegTreeMem propagates relevance scores through the tree to combine local semantic matching with hierarchical temporal context. Across three long-horizon memory benchmarks and two LLM backbones, SegTreeMem improves answer quality over flat retrieval, graph-structured memory, and tree-structured memory baselines. Additional temporal-order permutation analysis shows that the performance gain depends on preserving temporal order during memory construction, supporting the claim that temporal order is a key structure for agentic memory.

2606.04536 2026-06-04 cs.AI 版本更新

ANN Search: Recall What Really Matters

Dimitris Dimitropoulos, Nikos Mamoulis

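Affiliations * University of Ioannina; Archimedes, Athena RC

AI summary: Proposes replacing Recall@k with the inverse approximation ratio 1/Ratio@k for evaluating approximate nearest neighbor search quality; experiments show that the latter reflects practical utility more accurately and reduces computational overhead.

AI Chinese abstract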

近似最近邻（ANN）搜索已成为信息检索和现代机器学习任务（从分类到检索增强生成）的核心原语。社区主要通过给定Recall@k（检索到的真实精确最近邻的比例）下的吞吐量来评估和调优ANN算法。我们认为，ANN搜索真正重要的是检索结果的质量，而非它们与真实kNN集合的重叠。我们证明，使用Recall@k评估检索质量会带来不必要的计算开销，并研究用逆近似比1/Ratio@k替代它。1/Ratio@k评估检索到的邻居与真实邻居之间距离的差异。它无需判断、无需超参数，仅通过标准ANN基准输入即可计算。我们在涵盖广泛内在维度的多样化数据集上对最先进的ANN算法进行基准测试，从效率、下游分类和检索增强生成三个维度全面评估这两个指标。在效率方面，优化1/Ratio@k达到操作质量阈值所需的计算成本远低于Recall@k。在下游任务中，即使Recall@k显著下降，性能指标（标签精度、语义相似度、BERTScore和LLM评分质量）仍保持高度稳定。相反，逆近似比紧密反映了这种稳定性，比Recall@k更好地追踪实际效用。最终，虽然Recall@k夸大了近似的真实成本，但1/Ratio@k提供了更准确、可部署的ANN实际质量代理。

English abstract

Approximate nearest neighbor (ANN) search has become a core primitive in information retrieval and modern machine learning tasks, from classification to retrieval-augmented generation. The community evaluates and tunes ANN algorithms primarily on their throughput at a given Recall@k, the fraction of true exact neighbors retrieved. We argue that what really matters in ANN search is the quality of the retrieved results and not their overlap with the true kNN set. We show that using Recall@k to assess retrieval quality forces unnecessary computational overhead and investigate replacing it by 1/Ratio@k, the inverse approximation ratio. 1/Ratio@k evaluates the differences between the distances of the retrieved and true neighbors. It is judge-free, hyperparameter-free, and computable from standard ANN benchmark inputs alone. We benchmark state-of-the-art ANN algorithms across diverse datasets spanning a wide range of intrinsic dimensionalities, evaluating the two metrics comprehensively across efficiency, downstream classification, and retrieval-augmented generation. On the efficiency axis, optimizing for 1/Ratio@k reaches operational quality thresholds at a substantially lower computational cost than Recall@k. In downstream tasks, performance indicators (label precision, semantic similarity, BERTScore, and LLM-graded quality) remain highly stable even when Recall@k drops significantly. The inverse approximation ratio, on the other hand, closely mirrors this stability, tracking true utility much better than Recall@k. Ultimately, while Recall@k overstates the true cost of approximation, 1/Ratio@k offers a more accurate, deployable proxy for actual ANN quality.
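The contrast between the two metrics can be made concrete with a small sketch. Recall@k follows the definition in the abstract. For Ratio@k the sketch assumes the common formulation as the mean, over ranks, of the retrieved distance divided by the true distance at the same rank; the paper's exact definition may differ, and both function names are illustrative.

```python
from typing import Sequence


def recall_at_k(retrieved_ids: Sequence[int], true_ids: Sequence[int]) -> float:
    """Fraction of the true k nearest neighbors present in the retrieved set."""
    return len(set(retrieved_ids) & set(true_ids)) / len(true_ids)


def inverse_ratio_at_k(retrieved_dists: Sequence[float],
                       true_dists: Sequence[float]) -> float:
    """1 / Ratio@k, with Ratio@k taken as the mean rank-wise distance ratio.

    Both lists are sorted ascending and have length k. True distances are
    assumed to be positive (the query is not itself in the database).
    """
    ratios = [r / t for r, t in zip(retrieved_dists, true_dists)]
    return len(ratios) / sum(ratios)


true_ids, true_dists = [1, 2, 3, 4], [1.0, 2.0, 3.0, 4.0]
found_ids, found_dists = [1, 2, 5, 6], [1.0, 2.0, 3.03, 4.04]
print(recall_at_k(found_ids, true_ids))             # 0.5
print(inverse_ratio_at_k(found_dists, true_dists))  # about 0.995
```

In this toy case half of the true neighbors are missed, yet the substitutes are only 1% farther away, so Recall@k drops to 0.5 while 1/Ratio@k stays near 1. That gap is the effect the abstract argues matters for downstream tasks.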

2606.04517 2026-06-04 cs.NI cs.AI 版本更新

Treat Traffic Like Trees: A Semantic-Preserving Hierarchical Graph-Based Expert Framework for Encrypted Traffic Analysis

像对待树一样对待流量：一种用于加密流量分析的语义保持分层图专家框架

Yuantu Luo, Jun Tao, Linxiao Yu, Guang Cheng

Affiliations * School of Cyber Science and Engineering, Southeast University; Purple Mountain Laboratories; Engineering Research Center of Blockchain Application, Supervision and Management (Southeast University); Engineering Research Center of Security for Ubiquitous Network, Jiangsu Province

AI summary: Proposes PTGAMoE (Protocol Tree Graph Attention with Mixture of Experts), a semantic-preserving hierarchical graph-based expert framework; with field-level graph construction and an expert committee design, it significantly outperforms existing models under strict no-data-leakage settings and offers interpretable analysis of protocol-level feature importance.

Comments This work has been submitted to the IEEE for possible publication

AI Chinese abstract

基于图的深度学习方法已被广泛应用于加密流量分析，以利用不同粒度下的潜在相关性。然而，复杂的预处理流程和精细的模型结构虽然通常能取得良好性能，但在表示学习过程中可能掩盖固有的协议语义。此外，由协议规范定义并在人工流量分析中常规使用的协议层及其对应字段的分层结构，在现有学习框架中仍未得到充分探索。在本文中，我们提出了一种用于加密流量分析的语义保持分层图专家框架——协议树图注意力与专家混合（PTGAMoE）。基于字段的图构建和专家委员会设计使PTGAMoE能够量化模型对特定字段和协议的偏好。在严格无数据泄露设置下，对代表性基准数据集的大量实验结果表明，PTGAMoE显著优于最先进的模型。此外，语义保持设计提供了关于协议级特征重要性和专家级贡献的可解释性洞察，反映了模型在加密流量分类任务中的决策逻辑。

English abstract

Graph-based deep learning methods have been widely employed in encrypted traffic analysis to exploit latent correlations across different granularities. However, while complex preprocessing pipelines and sophisticated model structures often achieve strong performance, they may obscure inherent protocol semantics during representation learning. Moreover, the hierarchical structure of protocol layers and their corresponding fields, defined by protocol specifications and routinely utilized in manual traffic analysis, remains underexplored in existing learning frameworks. In this paper, we propose Protocol Tree Graph Attention with Mixture of Experts (PTGAMoE), a semantic-preserving hierarchical graph-based expert framework for encrypted traffic analysis. The field-based graph construction and expert committee design enable PTGAMoE to quantify the model's preferences for specific fields and protocols. Extensive experimental results on representative benchmark datasets under strict no-data-leakage settings demonstrate that PTGAMoE significantly outperforms state-of-the-art (SOTA) models. Furthermore, the semantic-preserving design provides interpretable insights into protocol-level feature importance and expert-level contributions, reflecting the model's decision-making logic in encrypted traffic classification tasks.

2606.04516 2026-06-04 cs.LG cs.AI 版本更新

GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling

GeoMin: 基于几何分布建模的数据高效半监督RLVR

Guangcheng Zhu, Shenzhi Yang, Haobo Wang, Xing Zheng, Yingfan MA, Xuening Feng, Zhongqi Chen, Kai Tang, Zhengqing Zang, Bowen Song, Weiqiang Wang, Gang Chen

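Affiliations * Zhejiang University; Ant Group

AI summary: Proposes GeoMin, which models the global feature distribution of labeled data to decode the structural discrepancy between correct and incorrect rollouts, establishing a robust prior for assessing the reliability of self-reward signals; it exploits unlabeled data efficiently and surpasses fully supervised models with only 10% of the annotations.

AI Chinese abstract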

基于可验证奖励的强化学习（RLVR）显著提升了LLM的推理能力，但面临困境：标准监督扩展受限于高标注成本，而无监督替代方案则遭受严重的模型崩溃。最近的半监督RLVR方法通过使用少量标注集指导未标注数据，在训练效果和标注成本之间取得了有前景的权衡。然而，由于依赖粗糙的性能启发式，它们遭受严重的数据效率瓶颈，导致绝大多数有价值实例未被充分利用。为此，我们提出GeoMin，它在标注数据上建模全局特征分布，以解码正确和错误展开之间的结构差异，从而建立稳健的先验来评估自奖励信号的可靠性，并充分释放未标注数据的潜力。实验上，GeoMin比最强基线高出+4.1%，甚至在使用仅10%标注的情况下超越全监督模型，展示了显著的数据效率。

English abstract

Reinforcement learning with verifiable rewards (RLVR) significantly advances LLM reasoning, yet it faces a dilemma: standard supervised scaling is throttled by high annotation costs, while unsupervised alternatives suffer from severe model collapse. Recent semi-supervised RLVR methods address this by using a small labeled set to guide unlabeled data, achieving a promising trade-off between training efficacy and annotation cost. However, they suffer from a severe data-efficiency bottleneck due to the reliance on coarse performance heuristics, leaving a vast majority of valuable instances underutilized. To this end, we propose GeoMin, which models global feature distributions on labeled data to decode the structural discrepancy between correct and incorrect rollouts, thereby establishing a robust prior to assess the reliability of self-reward signals and fully unleash the potential of unlabeled data. Empirically, GeoMin outperforms the strongest baselines by +4.1% and even surpasses fully supervised models with only 10% of the annotations, demonstrating remarkable data efficiency.

2606.04507 2026-06-04 cs.CL cs.AI 版本更新

Self-Evolving Deep Research via Joint Generation and Evaluation

通过联合生成与评估实现自我进化的深度研究

Han Zhu, Chengkun Cai, Yuanfeng Song, Xing Chen, Sirui Han, Yike Guo

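Affiliations * The Hong Kong University of Science and Technology; ByteDance, China; University College London

AI summary: Proposes the SCORE framework, which jointly optimizes an evaluator and a solver through shared-parameter co-evolutionary training, addressing the unverifiable rewards of deep research report generation and steadily improving generation quality.

AI Chinese abstract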

大型语言模型（LLM）在日常应用中越来越广泛，其中深度研究是一项特别重要的能力。与传统的问答（QA）任务不同，深度研究报告生成缺乏明确的真实答案，这使得奖励设计本质上不可验证，限制了有效的强化学习。现有方法通过LLM作为评判者和查询相关的评估标准来缓解这一挑战，但它们仍然依赖静态评估器，无法随着求解器的改进而调整标准，导致优化压力不足并最终饱和。我们通过一个用于深度研究评估和生成的 extbf{自}我进化 extbf{协}同进化训练框架（SCORE）来解决这一限制，该框架在共享参数的学习过程中紧密耦合评估器和求解器。我们不将生成和评估视为孤立的模块，而是利用它们的内在联系，在单个共享参数模型中实现联合改进。为了限制这一过程，我们引入了一个元控制机制，该机制根据求解器的性能动态控制评估环境，鼓励有效的评估维度和足够深入的评估器搜索。在深度研究基准上的大量实验表明，报告生成质量持续提升，表明协同进化评估和生成是训练开放式研究代理的一个有前景的方向。

English abstract

Large Language Models (LLMs) have become increasingly adopted in daily applications, with deep research standing out as a particularly important capability. Unlike traditional question-answering (QA) tasks, deep research report generation lacks a definitive ground truth, making reward design inherently unverifiable and limiting effective reinforcement learning. Existing approaches mitigate this challenge with LLM-as-a-judge and query-dependent evaluation rubrics, but they still rely on static evaluators that cannot adapt their standards as the solver improves, leading to insufficient and eventually saturated optimization pressure. We address this limitation with a Self-evolving CO-evolutionary training framework for deep REsearch evaluation and generation (SCORE), which tightly couples an evaluator and a solver in a shared-parameter learning process. Rather than treating generation and evaluation as isolated modules, we leverage their intrinsic connection to enable joint improvement within a single shared-parameter model. To constrain this process, we introduce a meta-harness, which dynamically controls the evaluation environment based on solver performance, encouraging valid evaluation dimensions and sufficiently deep evaluator search. Extensive experiments on deep research benchmarks demonstrate consistent improvement in report generation quality, showing that co-evolving evaluation and generation is a promising direction for training open-ended research agents.

2606.04505 2026-06-04 cs.AI 版本更新

Simulate, Reason, Decide: Scientific Reasoning with LLMs for Simulation-Driven Decision Making

模拟、推理、决策：基于科学推理的LLM驱动模拟决策

Yuhan Yang, Ruipu Li, Alexander Rodríguez

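Affiliations * Computer Science and Engineering, University of Michigan

AI summary: Proposes the MechSim framework, which uses neuro-symbolic reasoning to let LLMs reason about the mechanisms and assumptions of scientific simulators, improving decision transparency and reliability.

AI Chinese abstract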

科学模拟器越来越多地被集成到LLM驱动的系统中，用于高风险模拟驱动决策。然而，现有框架主要使用LLM来生成、校准或执行模拟器，将其视为黑盒接口而非可推理的结构化机械系统。因此，当前方法缺乏识别、表示和推理模拟器行为背后的假设和机制的能力，限制了透明度、可审计性和决策合理性。我们引入了MechSim，一个面向可执行科学模拟器的机制基础神经符号推理框架。与先前主要对静态符号结构进行推理的神经符号方法不同，MechSim使LLM代理能够推理科学模拟器的机制、假设和执行行为。我们的框架通过共享结构化模式表示模拟器，捕获假设、变量、机制依赖和执行轨迹。在此表示之上，LLM代理作为受约束的推理引擎运行，生成结构化的、基于证据的解释，将模拟器结果与其底层机制联系起来。我们在多个高风险领域评估了我们的方法，结果表明它提高了机制级解释质量、模拟器分析和下游决策可靠性。

English abstract

Scientific simulators are increasingly being integrated into LLM-driven systems for high-stakes simulation-driven decision-making. However, existing frameworks primarily use LLMs to generate, calibrate, or execute simulators, treating them as black-box interfaces rather than as structured mechanistic systems that can be reasoned about. As a result, current approaches lack the ability to identify, represent, and reason about the assumptions and mechanisms underlying simulator behavior, limiting transparency, auditability, and decision justification. We introduce MechSim, a mechanism-grounded neuro-symbolic reasoning framework for executable scientific simulators. Unlike prior neuro-symbolic approaches that primarily reason over static symbolic structures, MechSim enables LLM agents to reason about the mechanisms, assumptions, and execution behavior of scientific simulators. Our framework represents simulators through a shared structured schema capturing assumptions, variables, mechanism dependencies, and execution traces. On top of this representation, LLM agents operate as constrained reasoning engines that generate structured, evidence-grounded explanations linking simulator outcomes to their underlying mechanisms. We evaluate our approach across multiple high-stakes domains and show that it improves mechanism-level explanation quality, simulator analysis, and downstream decision-making reliability.

2606.04503 2026-06-04 cs.LG cs.AI 版本更新

Smart Picks in the Dark: Towards Efficient RLVR for Reasoning via Tracing Metacognitive Pivots

暗中选择：通过追踪元认知支点实现高效的推理可验证奖励强化学习

Guangcheng Zhu, Shenzhi Yang, Haobo Wang, Xing Zheng, Yingfan MA, Xuening Feng, Zhongqi Chen, Bowen Song, Weiqiang Wang, Gang Chen

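Affiliations * Zhejiang University; Ant Group

AI summary: To address low data efficiency in reinforcement learning with verifiable rewards (RLVR), proposes the PivotTrace framework, which uses attention dynamics to trace metacognitive pivots during reasoning and quantifies uncertainty through pivot density to triage data automatically; it surpasses the fully supervised model with only 29.3% of samples annotated and 2.75× faster convergence.

AI Chinese abstract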

可验证奖励强化学习（RLVR）极大地推进了大型推理模型（LRMs），但它需要及时在大量完全标注的数据集上进行训练。为此，从两个角度广泛研究了数据高效的RLVR方法：（i）数据选择方法识别一小部分“黄金”样本，这些样本能产生接近全数据性能，但它们依赖于预先存在的标注数据池。（ii）无监督RLVR方法在大规模未标注数据上利用模型自身的内部监督信号进行训练，但表现出次优性能。因此，我们研究了RLVR的“暗中选择”设置，其目标是在没有先验监督的情况下，选择对训练最有益且值得标注的未标注样本。通过系统分析，我们证明智能选择依赖于一个校准良好的不确定性估计器，以实现数据的策略性划分，从而进行自适应训练方案。基于这一见解，我们提出了PivotTrace，一个三路数据分流框架，利用注意力动态追踪推理过程中的元认知支点。通过支点密度精确量化不确定性，PivotTrace实现了自动数据路由，协同最大化标注和训练效率。实验表明，PivotTrace仅使用29.3%的标注样本和2.75倍的收敛速度就超越了全监督LRM。

English abstract

Reinforcement learning with verifiable rewards (RLVR) has greatly advanced large reasoning models (LRMs), but it requires timely training on a huge fully-annotated dataset. To this end, data-efficient RLVR methods have been widely studied from two perspectives: (i) data selection methods identify a small subset of "golden" samples that yield near-full-data performance, but they rely on a pre-existing pool of labeled data; (ii) unsupervised RLVR methods train the model using its own internal supervision signals on large-scale unlabeled data, yet they exhibit suboptimal performance. Accordingly, we investigate the "pick in the dark" setup for RLVR, which aims to select, without prior supervision, unlabeled samples that are most beneficial for training and worthy of annotation. Through systematic analysis, we demonstrate that smart picks hinge on a well-calibrated uncertainty estimator to enable strategic partitioning of data for adaptive training regimes. Building on this insight, we propose PivotTrace, a three-way data triage framework that leverages attention dynamics to trace metacognitive pivots during reasoning. By precisely quantifying uncertainty through pivot density, PivotTrace achieves automated data routing to synergistically maximize both annotation and training efficiency. Empirically, PivotTrace surpasses the fully supervised LRM with only 29.3% annotated samples and 2.75× faster convergence.

2606.04494 2026-06-04 cs.AI 版本更新

Beyond Prompt-Based Planning: MCP-Native Graph Planning-based Biomedical Agent System

超越基于提示的规划：基于MCP原生图规划的生物医学智能体系统

Zhangtianyi Chen, Florensia Widjaja, Wufei Dai, Xiangjun Zhang, Yuhao Shen, Juexiao Zhou

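Affiliations * The Chinese University of Hong Kong, Shenzhen

AI summary: Proposes BioManus, which compiles heterogeneous bioinformatics tools into standardized MCP servers and organizes them into a typed heterogeneous graph for graph-structured planning, addressing tool confusion and context inefficiency and improving execution accuracy and workflow validity on BioAgentBench and LAB-Bench.

AI Chinese abstract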

生物医学智能体有望自动化复杂的生物工作流，但当前系统面临两个基本瓶颈：生物信息学工具在接口和执行环境上高度异构，而智能体规划仍依赖于基于提示的扁平工具描述。随着生物医学软件生态系统的增长，这种工具覆盖与上下文大小之间的耦合导致工具混淆、规划不稳定和执行效率低下。我们引入BioManus，一种基于结构化生物能力上的图支架规划的原生MCP生物医学智能体。BioManus首先提出BioinfoMCP编译器，将异构生物信息学软件转换为标准化的MCP服务器，从而产生一个大型可执行的MCP生态系统。然后，它将这个生态系统组织成一个类型化的异构图，涵盖工具、操作、数据类型和工作流阶段。在推理时，BioManus检索紧凑的任务特定子图，合成操作级工作流支架。这种设计将规划复杂度与原始工具库存大小解耦，在高召回率检索下实现了上下文压缩比Theta(N / (h * m_bar))，其中N是工具总数，h是工作流长度，m_bar（远小于N）是每个操作的平均候选工具数量。在BioAgentBench和LAB-Bench上的实验表明，与先进的生物医学智能体基线相比，BioManus提高了执行准确性、工作流有效性和上下文效率。这项工作表明了一种范式转变：可扩展的生物医学推理需要结构化的可执行能力图，而不是越来越大的提示级工具检索。

英文摘要

Biomedical agents promise to automate complex biological workflows, yet current systems face two fundamental bottlenecks: bioinformatics tools are highly heterogeneous in interfaces and execution environments, while agent planning still relies on flat prompt-retrieved tool descriptions. As biomedical software ecosystems grow, this coupling between tool coverage and context size leads to tool confusion, unstable planning, and inefficient execution. We introduce BioManus, an MCP-native biomedical agent built on graph-scaffolded planning over structured biological capabilities. BioManus first introduces the BioinfoMCP Compiler, which converts heterogeneous bioinformatics software into standardized MCP servers, yielding a large executable MCP ecosystem. It then organizes this ecosystem as a typed heterogeneous MCP graph over tools, operations, datatypes, and workflow stages. At inference time, BioManus retrieves compact task-specific subgraphs and synthesizes operation-level workflow scaffolds. This design decouples planning complexity from raw tool inventory size, achieving a context compression ratio of $\Theta(N / (h \cdot \bar{m}))$ under high-recall retrieval, where $N$ is the total tool count, $h$ is the workflow horizon, and $\bar{m}$ (much smaller than $N$) is the average number of candidate tools per operation. Experiments on BioAgentBench and LAB-Bench show that BioManus improves execution accuracy, workflow validity, and context efficiency over advanced biomedical agent baselines. This work suggests a paradigm shift: scalable biomedical reasoning requires structured executable capability graphs rather than increasingly larger prompt-level tool retrieval.
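
To make the compression ratio in the abstract concrete, here is a small arithmetic sketch. The function name and the example numbers (2000 tools, a 5-step workflow, 4 candidate tools per operation) are illustrative assumptions, not values from the paper. The sketch only compares a flat prompt listing all N tools with a scaffold listing roughly h times m_bar candidates.

```python
def context_compression_ratio(total_tools: int, horizon: int, avg_candidates: float) -> float:
    """Ratio of flat tool descriptions (N) to scaffold entries (h * m_bar)."""
    if horizon <= 0 or avg_candidates <= 0:
        raise ValueError("horizon and avg_candidates must be positive")
    return total_tools / (horizon * avg_candidates)


# Hypothetical inventory: 2000 tools, 5 workflow steps, 4 candidates per step.
ratio = context_compression_ratio(total_tools=2000, horizon=5, avg_candidates=4)
print(ratio)  # 100.0
```

Under these assumed numbers the planner would see about 20 tool descriptions instead of 2000, and the ratio grows linearly with N as long as h and m_bar stay fixed.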


ChessMimic: Per-Rating-Band Transformer Models for Human Move, Clock, and Outcome Prediction in Online Blitz Chess

Thomas Johnson

Affiliations: nascent.xyz

AI summary: Proposes ChessMimic, a system of three small encoder-only transformers for move, thinking-time, and outcome prediction, trained separately per Elo rating band for finer skill calibration. On Lichess blitz data its move prediction accuracy exceeds Maia-2, its outcome model reaches an AUC of 0.78, and its clock model gives a usable but non-SOTA thinking-time signal.

Abstract

We present ChessMimic, a system of three small encoder-only transformers (for move, thinking-time, and outcome prediction) conditioned on the position, recent move history, player rating, and clock state. We fit a separate instance of each model per 100-Elo rating band, trading parameter efficiency for sharper per-skill calibration. On a held-out month-wide slice of Lichess Rated Blitz games, ChessMimic's human move prediction accuracy outperforms Maia-2 in every Elo band. Compared to Maia-3, our 9M parameter model's accuracy sits between Maia-3-5M and Maia-3-23M without the additional complexity of Geometric Attention Bias. In addition to the move matching model, we also train a game outcome model that conditions not only on the position, but also on player ratings, time control, and remaining clock times. The outcome model achieves an AUC of 0.78 out of sample, beating Maia-2 as well as logistic regressions based on material, ratings, and clock time. Finally, we train a clock model that predicts human thinking times. The clock model provides a usable but non-SOTA per-ply think-time signal under ALLIE-style filters (Pearson r = 0.41, Spearman rho = 0.50, MAE 4.10 s, against ALLIE's reported r = 0.70), with the residual gap concentrated in per-position bucket sharpness rather than bucket-marginal calibration. A public demo is at 1e4.ai and we release code, per-band weights, and the C++ data-filter pipeline code on GitHub.
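
The per-band setup described above implies a routing step at inference time: pick the model instance for the player's rating band. A minimal sketch follows, assuming bands are aligned to multiples of 100 and that models are stored in a dictionary keyed by the band's lower edge. Both the alignment and the names `rating_band`, `pick_model`, and the model handles are hypothetical, not taken from the released code.

```python
def rating_band(elo: int, width: int = 100) -> tuple[int, int]:
    """Return the [low, high) band that contains the given rating."""
    low = (elo // width) * width
    return low, low + width


def pick_model(models: dict[int, str], elo: int) -> str:
    """Look up the model instance keyed by the lower edge of the band."""
    low, _ = rating_band(elo)
    return models[low]


models = {1400: "move-1400", 1500: "move-1500"}  # hypothetical model handles
print(rating_band(1573))          # (1500, 1600)
print(pick_model(models, 1499))   # move-1400
```

The trade-off named in the abstract is visible here: every band needs its own set of weights, but no single model has to represent play at every skill level.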


2606.04469 2026-06-04 cs.CV cs.AI (updated version)

Adaptive Calibration for Fair and Performant Facial Recognition

Ryan Brown, Chris Russell

Affiliations: University of Oxford

AI summary: Proposes Adaptive Calibration (AC), which maps cosine similarity between normalized embeddings to calibrated probabilities and uses local context to correct for regional differences, improving both overall performance and fairness in facial recognition without demographic metadata.

Abstract

We introduce Adaptive Calibration (AC), a novel calibration strategy for facial recognition that maps cosine similarity between normalized embeddings to well-calibrated probabilities. By incorporating local context into calibration, Adaptive Calibration corrects for a fundamental mismatch in cosine similarity, whereby the same distance can correspond to different match probabilities in different embedding regions. Our approach improves overall performance and yields a fairer calibration without requiring demographic metadata. It consistently dominates existing methods on both accuracy and fairness metrics across a variety of pretrained models and standard benchmarks. AC provides a practical solution for equitable facial recognition, without requiring demographic group annotations while also improving overall performance. Unlike existing approaches, our method provides continuous, region-specific calibration that avoids "leveling down", where fairness comes at the cost of degraded performance for some groups.


2606.04468 2026-06-04 cs.LG cs.AI cs.NE math.OC (updated version)

ParetoPilot: Zero-Surrogate Offline Multi-Objective Optimization via Infer-Perturb-Guide Diffusion

Ruiqing Sun, Sen Yang, Dawei Feng, Bo Ding, Yijie Wang, Huaimin Wang

Affiliations: Nanyang Technological University

AI summary: Proposes ParetoPilot, a zero-surrogate diffusion framework that needs no external surrogate model. Its Infer-Perturb-Guide engine, interleaved with the unconditional denoising steps, implicitly infers the objective direction and orthogonalizes a parallel gravity field and an edgeness-aware repulsive force to reach Pareto-optimal designs in offline multi-objective optimization.

Abstract

Offline multi-objective optimization (Offline MOO) aims to discover novel Pareto-optimal designs based on static datasets without expensive environment interactions. While recent generative methods have achieved notable success, they predominantly rely on external surrogate models. This dependency introduces significant computational overhead, suffers from deceptive evaluations, and deviates from the prevailing paradigm of jointly training mainstream generative models with conditions. To address these bottlenecks, we propose ParetoPilot, a novel zero-surrogate diffusion framework for offline MOO. ParetoPilot fully leverages the conditional priors inherently embedded within pre-trained diffusion models. At its core, the framework introduces the Infer-Perturb-Guide (IPG) engine, which is seamlessly interleaved within the unconditional denoising steps of the reverse generation process. First, it implicitly infers the instantaneous objective direction by matching conditional and unconditional noise predictions. Next, it mathematically orthogonalizes a parallel gravity field for strict convergence and an edgeness-aware repulsive force for mutual diversity, creating a dynamically annealed perturbation vector. Finally, this perturbed target seamlessly steers the generation process via standard Classifier-Free Guidance (CFG). Extensive experiments across 51 tasks demonstrate that ParetoPilot outperforms 14 state-of-the-art surrogate-based and inverse generative baselines. By eliminating auxiliary proxy training, our approach preserves data privacy while achieving hypervolume improvement and robust Pareto front coverage.
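
The final steering step relies on standard Classifier-Free Guidance, which combines a conditional and an unconditional noise prediction. The sketch below shows only that standard combination rule on plain Python lists. It does not implement the paper's Infer-Perturb-Guide engine, and the function name and toy vectors are assumptions for illustration.

```python
def cfg_combine(eps_uncond: list[float], eps_cond: list[float], scale: float) -> list[float]:
    """Standard CFG: eps_uncond + scale * (eps_cond - eps_uncond), element-wise."""
    return [u + scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


eps_u = [0.0, 1.0]  # toy unconditional noise prediction
eps_c = [1.0, 3.0]  # toy conditional noise prediction
print(cfg_combine(eps_u, eps_c, 0.0))  # [0.0, 1.0]  (unconditional only)
print(cfg_combine(eps_u, eps_c, 1.0))  # [1.0, 3.0]  (conditional only)
print(cfg_combine(eps_u, eps_c, 2.0))  # [2.0, 5.0]  (extrapolated past the condition)
```

The difference `eps_cond - eps_uncond` is the direction in which the condition pulls the sample, which is presumably the quantity the abstract refers to when it speaks of inferring an objective direction from the two predictions.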


2606.04465 2026-06-04 cs.CL cs.AI (updated version)

SePO: Self-Evolving Prompt Agent for System Prompt Optimization

Wangcheng Tao, Han Wu, Weng-Fai Wong

Affiliations: National University of Singapore; City University of Hong Kong

AI summary: Proposes SePO, a self-referential design in which a prompt agent optimizes both the task agents' system prompts and its own, trained with a two-stage evolutionary procedure; average accuracy improves by 4.49 points across multiple benchmarks.

Comments 26 pages. Code: https://github.com/taowangcheng/SePO

Abstract

System prompt optimization improves agent behavior without modifying the underlying model, yielding human-readable, model-agnostic instructions. Existing methods build a prompt agent that refines task agents' system prompts, yet leave the prompt agent's own system prompt hand-engineered and fixed. We propose Self-Evolving Prompt Optimization (SePO), which treats the prompt agent's own system prompt as an optimization target alongside task agents' system prompts. SePO adopts a self-referential design. A single prompt agent improves both task agents' system prompts and its own under an open-ended evolutionary search that maintains an archive of candidate prompts as stepping stones. Training proceeds in two stages: pre-training evolves the prompt agent on a multi-task pool, and fine-tuning then applies it to a target task. Across five benchmarks spanning math (AIME'25), abstract reasoning (ARC-AGI-1), graduate-level science (GPQA), code generation (MBPP), and logic puzzles (Sudoku), SePO consistently outperforms Manual-CoT, TextGrad, and MetaSPO, improving the average accuracy by 4.49 points compared to Manual-CoT. The prompt optimization skill from pre-training also generalizes to tasks beyond the pre-training mixture, rather than memorizing per-task prompts.


2606.04460 2026-06-04 cs.CR cs.AI cs.LG (updated version)

CyberGym-E2E: Scalable Real-World Benchmark for AI Agents' End-to-End Cybersecurity Capabilities

Tianneng Shi, Robin Rheem, Dongwei Jiang, Mona Wang, Francisco De La Riega, Zhun Wang, Jingzhi Jiang, Alexander Cheung, Sean Tai, Jonah Cha, Jianhong Tu, Gabriel Han, Chenguang Wang, Jingxuan He, Wenbo Guo, Dawn Song

Affiliations: Stanford University; UC Berkeley

AI summary: Proposes CyberGym-E2E, a large-scale, realistic end-to-end cybersecurity benchmark whose automated pipeline turns open-source vulnerability data into evaluation environments, assessing AI agents across the full lifecycle of vulnerability discovery, PoC generation, and patch generation.

Comments ICML 2026

2606.04459 2026-06-04 cs.CR cs.AI cs.CC cs.CL (updated version)

LoopMoE: Unifying Iterative Computation and Mixture-of-Experts for Language Modeling

Wenkai Chen, Tianshu Li, Wenyong Huang, Yichun Yin, Lifeng Shang, Chengwei Qin

Affiliations: Hong Kong University of Science and Technology (Guangzhou); Huawei Technologies Co., Ltd.

AI summary: Proposes LoopMoE, a looped MoE language model that, through iteration-adaptive layer normalization (IterAdaLN) and a capacity-balancing strategy, outperforms a standard MoE on multiple benchmarks at equal parameters and FLOPs.

Abstract

Mixture-of-Experts (MoE) and looped architectures scale models along two orthogonal axes, namely parameter capacity and effective depth. However, mainstream looped architectures rely on dense backbones that couple parameter count with per-token FLOPs, which makes it impossible to isolate the effect of iterative computation under matched budgets. To this end, we present LoopMoE, a looped MoE language model that integrates sparse routing with iterative weight-shared computation through two designs. The first is IterAdaLN, which resolves weight-sharing symmetry via a modulation signal jointly conditioned on the iteration index and the per-token hidden state. The second is a capacity-balancing strategy that recovers the attention-to-FFN active parameter ratio of well-tuned non-looped references. Together, these designs enable the first strictly controlled, head-to-head evaluation of a looped MoE against a Vanilla MoE under identical total parameters, per-token FLOPs, and active sublayer ratios. At the 3B scale, LoopMoE outperforms the Vanilla MoE on 8 of 9 downstream benchmarks with an average improvement exceeding 1 point. At the 9B scale, LoopMoE continues to outperform the matched Vanilla MoE, indicating that the architectural gain persists at larger scale. Our work establishes a controlled synthesis of sparsity and recurrence, and suggests a promising direction for looped language models.


2606.04435 2026-06-04 cs.AI cs.CL cs.CR cs.IR (updated version)

Ensembled Latent Factor Model via Differential Evolution and Gradient Descent Optimization

Rui Zhang, Jinhang Liu, Wenbo Zhang

Affiliations: Chongqing Academy of Economics Research; College of Computer and Information Science, Southwest University

AI summary: For high-dimensional and incomplete data, proposes an ensembled latent factor model that builds two models, one optimized by differential evolution and one by gradient descent, and fuses them with self-adaptive weighting to obtain more comprehensive and less biased representations.

Abstract

High-dimensional and incomplete (HDI) data are prevalent in many real-world big data scenarios. Latent factor models serve as a common representation learning approach, capable of uncovering informative latent factors from such data. Nevertheless, most existing latent factor models rely solely on gradient descent for optimization, which may lead to insufficient and biased representations, particularly when dealing with heterogeneous HDI data. Thus, this study proposes an Ensembled Latent Factor Model via Differential Evolution and Gradient Descent Optimization (ELFM-DEGDO) with a two-fold design: 1) two diverse latent factor models are independently built via differential evolution and gradient descent optimization, respectively, and 2) the two models are combined via a customized self-adaptive weighting mechanism to effectively fuse their strengths. By leveraging the complementary advantages of both optimization paradigms, ELFM-DEGDO is able to produce more comprehensive and less biased representations for HDI data. Tests on three HDI datasets show that ELFM-DEGDO consistently performs better than several related latent factor models.


2606.04405 2026-06-04 cs.LG cs.AI (updated version)

Low-Rank Decay for Grokking in Scale-Invariant Transformers: A Spectral-Geometric View

Mingyu Li

Affiliations: Beijing Normal University

AI summary: Because weight decay cannot simplify the function of normalized layers in scale-invariant Transformers, proposes the Low-Rank Decay (LRD) regularizer, which compresses singular values through the tangential component of the nuclear-norm subgradient; on modular arithmetic tasks it accelerates effective-rank collapse and expands the data boundary for delayed generalization (grokking).

Abstract

Modern Transformer architectures frequently employ normalization mechanisms such as RMSNorm and Query-Key Normalization, making parts of the model approximately scale-invariant with respect to weight magnitudes. In this regime, standard Frobenius-norm weight decay acts purely along the radial direction of the weight space and cannot directly simplify the function represented by the normalized layer. We study grokking in small algorithmic tasks through this lens and propose Low-Rank Decay (LRD), a nuclear-norm-like spectral regularizer whose subgradient, the polar factor $UV^\top$, retains a tangential component even in the scale-invariant setting. This distinction has a concrete dynamical consequence: after the model memorizes the training set and task gradients vanish, L2 decay can no longer reshape the weight spectrum, whereas LRD continues to compress singular values in an $\ell_1$-like fashion. On modular arithmetic tasks, we find that LRD induces rapid effective-rank collapse in Query/Key matrices and expands the data-fraction boundary at which delayed generalization (grokking) occurs. We further provide a spectral-geometric interpretation through the "needle-to-fan" expansion of the nuclear-norm subdifferential near low-rank strata.
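
The radial-versus-tangential distinction in the abstract can be written out in a few lines. This is a sketch under textbook assumptions, namely that $W = U \Sigma V^\top$ is a compact SVD with positive singular values $\sigma_i$ and that the plain nuclear norm stands in for the paper's regularizer, which the abstract describes only as nuclear-norm-like.

```latex
\begin{align*}
\text{L2 decay:}\quad
  & \nabla_W \tfrac{\lambda}{2}\|W\|_F^2 = \lambda W
  && \text{(purely radial)} \\
\text{Nuclear norm:}\quad
  & G = UV^\top \in \partial\|W\|_*, \qquad
    \langle G, W\rangle = \operatorname{tr}\!\left(V U^\top U \Sigma V^\top\right)
    = \sum_i \sigma_i = \|W\|_* \\
\text{Tangential part:}\quad
  & G_\perp = UV^\top - \frac{\|W\|_*}{\|W\|_F^2}\, W
    = U\!\left(I - \frac{\sum_j \sigma_j}{\sum_j \sigma_j^2}\,\Sigma\right)\! V^\top
\end{align*}
```

$G_\perp$ vanishes only when all nonzero singular values are equal. For any other spectrum the nuclear-norm step changes the direction $W / \|W\|_F$, and hence the function computed by a scale-invariant layer, while the L2 step only rescales $W$.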


2606.04402 2026-06-04 cs.AI (updated version)

Rethinking Sales Lead Scoring with LLM-Based Hierarchical Preference Ranking

Chenyu Zhang, Yiwen Liu, Yin Sun, Xinyuan Zhang, Yuji Cao, Junming Jiao, Juyi Qiao

Affiliations: Intelligent Business Team, Li Auto Inc.

AI summary: For sales lead conversion in high-stakes domains, proposes an LLM-based discriminative framework and HPRO, which uses hierarchical preference ranking optimization to jointly model structured and unstructured data, improving both scoring and ranking performance.

Abstract

Sales lead conversion in high-stakes domains (e.g., automotive, real estate) differs fundamentally from e-commerce recommendation due to prolonged decision cycles and multi-stage funnels. Traditional lead scoring methods (rule-based scorecards, machine learning, or pointwise CTR models) face severe challenges: sparse supervision, a semantic gap in unstructured CRM logs, and inability to capture relative lead priority. While Large Language Models (LLMs) offer superior semantic understanding of customer interactions, general-purpose LLMs are ill-suited for lead ranking: they generate text rather than comparable scores, and lack alignment with the hierarchical priorities of sales funnels. We introduce an LLM-based discriminative framework for sales lead scoring, which supports joint modeling of structured CRM features and unstructured customer interactions. On top of this framework, we propose HPRO (Hierarchical Preference Ranking Optimization), which augments sales lead scoring with a hierarchical preference ranking objective. HPRO employs a margin-aware Bradley-Terry formulation to transform sparse binary labels into dense, funnel-aware preference pairs, enabling lead scoring to leverage both pointwise and pairwise supervision. Experiments on large-scale data from a leading NEV brand demonstrate state-of-the-art classification (AUC 0.8161) and ranking performance (+39.7% precision among top-ranked leads). A 132-day online A/B test validates a 9.5% sales volume uplift, confirming real-world commercial impact.
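
A margin-aware Bradley-Terry objective can be sketched as a pairwise logistic loss with a margin subtracted from the score gap. The abstract does not give HPRO's exact loss, so the form below, the rule that the margin grows with the funnel-stage gap, and the names `margin_bt_loss` and `funnel_margin` are all assumptions for illustration.

```python
import math


def margin_bt_loss(score_pos: float, score_neg: float, margin: float = 0.0) -> float:
    """-log sigmoid(score_pos - score_neg - margin), computed stably."""
    z = score_pos - score_neg - margin
    # softplus(-z) = log(1 + exp(-z)); branch on the sign of z to avoid overflow.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))


def funnel_margin(stage_pos: int, stage_neg: int, unit: float = 0.5) -> float:
    """Hypothetical margin: larger when the two leads are further apart in the funnel."""
    return unit * (stage_pos - stage_neg)


# A lead at funnel stage 3 should outrank a stage-0 lead by a wider gap
# than a stage-1 lead needs to outrank a stage-0 lead.
print(funnel_margin(3, 0))  # 1.5
print(funnel_margin(1, 0))  # 0.5
print(margin_bt_loss(2.0, 0.0, funnel_margin(3, 0)) > margin_bt_loss(2.0, 0.0, funnel_margin(1, 0)))  # True
```

With equal scores and zero margin the loss is log 2. Raising the margin raises the loss for the same score gap, which pushes the model to separate leads that sit far apart in the funnel more strongly.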


2606.04382 2026-06-04 cs.DL cs.AI cs.IR (updated version)

LCSHBench: A Multilingual, Consensus-Grounded Benchmark for Library of Congress Subject Heading Assignment

Kwok Leong Tang

Affiliations: Library of Congress

AI summary: Proposes the LCSHBench benchmark, a multilingual set of bibliographic records built on multi-library consensus, which evaluates automated subject cataloging by exact and concept matches and shows that a low-rank fine-tuned embedder improves cross-lingual retrieval.

Abstract

Automated subject cataloging assigns controlled-vocabulary headings to bibliographic records, but LCSH has no standard public benchmark. We introduce LCSHBench: 22,346 books in 15 languages from the openly licensed Harvard, Columbia, and Princeton catalogs. Records enter only when at least two independent cataloging agencies assigned LCSH; we release per-catalog provenance plus union and unanimous answer views. A concordance study of 465,187 works cataloged by all three libraries shows why this design matters: libraries usually agree on the underlying topic (93.3% share a concept-level heading) but often differ in exact expression (39.4% have identical heading sets). LCSHBench therefore scores both exact and concept matches, with set and rank metrics broken down by language and heading type, across open-vocabulary generation and full-vocabulary retrieval. As a first demonstration, a low-rank fine-tune of a 300M on-device embedder improves cross-lingual retrieval and beats a 3,072-dimensional hosted embedder on development exact recall@200 (0.659 vs 0.623). The language panel shows the gain is not uniform, and held-out-test and end-to-end confirmation remain future work.
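
The headline retrieval number above is an exact recall@k score. A minimal sketch of that metric follows, assuming a ranked list of predicted headings and a gold set of assigned headings per record. The function name and the toy headings are illustrative, and the benchmark's own scoring code may differ in details such as normalization.

```python
def recall_at_k(ranked: list[str], gold: set[str], k: int) -> float:
    """Fraction of gold headings that appear among the top-k ranked predictions."""
    if not gold:
        return 0.0
    hits = gold.intersection(ranked[:k])
    return len(hits) / len(gold)


ranked = ["Chess", "Board games", "Chess--History", "Strategy"]  # toy ranking
gold = {"Chess--History", "Chess openings"}                      # toy gold set
print(recall_at_k(ranked, gold, k=2))  # 0.0
print(recall_at_k(ranked, gold, k=3))  # 0.5
```

Exact matching treats "Chess" and "Chess--History" as different headings, which is why the abstract also reports concept-level matches: catalogers often agree on the topic while differing in the exact string.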


2606.04381 2026-06-04 cs.LG cs.AI (updated version)

From Symbolic to Geometric: Enabling Spatial Reasoning in Large Language Models

Chen Chu, Bita Azarijoo, Li Xiong, Khurram Shafique, Cyrus Shahabi

Affiliations: University of Southern California; Emory University; Novateur Research Solutions

AI summary: Proposes the Spatial Language Model (SLM), which treats location information as a first-class modality and learns spatial representations to perform geometric spatial reasoning during inference, significantly outperforming existing methods based on symbolic reasoning.

Abstract

Recent large language models (LLMs) often appear to exhibit spatial reasoning ability; however, this capability is largely symbolic, arising from pattern matching over spatial language rather than true geometric reasoning over space. Because LLMs operate on discrete tokens, they lack native support for continuous spatial representations, explicit geometric computation, and structured spatial operators. To address this limitation, we introduce the Spatial Language Model (SLM), the first multimodal LLM that treats location information as a first-class modality and enables geometric spatial reasoning within the model's inference process. SLM directly operates on learned spatial representations rather than textual descriptions of spatial relations. To support effective training, we construct a Spatial Instruction Dataset that aligns spatial representations, atomic geometric operations, and natural language instructions. We further propose a new benchmark named SpatialEval, which is designed to evaluate spatial reasoning across attributes, distance, topology, and relative-position tasks. Extensive experiments show that SLM significantly outperforms existing LLM-based approaches that rely on symbolic reasoning via prompt engineering or textual abstraction, demonstrating the benefits of integrating geometric spatial representations for robust spatial reasoning. Our instruction dataset, evaluation benchmark, model training code, and model checkpoints can be found at https://github.com/chuchen2017/SLM.


2606.04374 2026-06-04 cs.IR cs.AI (updated version)

DSIRM: Learning Query-Bridged Discrete Semantic Identifiers for E-commerce Relevance Modeling

Bokang Wang, Xing Fang, Mingmin Jin, Jing Wang, Zhentao Song, Guangxin Song, Jianbo Zhu

Affiliations: Taobao & Tmall Group of Alibaba

AI summary: Because continuous embeddings struggle to capture fine-grained attribute distinctions in e-commerce search, proposes the Discrete Semantic Identifier Relevance Model (DSIRM), which uses query-bridged contrastive quantization to inject query-item interaction supervision and learn relevance-aware partitions, and uses a generative LLM to predict item identifiers, substantially improving relevance modeling.

Comments Jing Wang (Corresponding Author)

Abstract

Despite rapid progress of continuous embeddings for e-commerce search relevance, a long-standing open problem is the difficulty in capturing fine-grained attribute distinctions. While discrete Semantic Identifiers (SIDs) have been widely adopted as a promising alternative, existing SID generation methods rely heavily on unsupervised quantization. In realistic scenarios, the lack of explicit supervision often makes it more difficult to dictate which items should share an SID, resulting in limited capability for query-dependent ranking. To address the issue of unsupervised SIDs, we propose to explicitly model discrete relevance features and develop a Discrete Semantic Identifier Relevance Model (DSIRM). Specifically, we present a query-bridged contrastive quantization approach on the item side, injecting query-item interaction supervision into Residual Quantization to actively learn relevance-aware semantic partitions. On the other hand, we explore generative LLMs on the query side to explicitly predict item SIDs from text, resolving tail queries and intent ambiguity. Hierarchical prefix matching between query and item SIDs yields discriminative features that perfectly complement dense signals. Extensive experimental results on Tmall's production data show that our proposed approach has achieved better results, improving offline AUC by +1.54%. Deployed via an efficient hybrid architecture, it achieves significant online lifts (+0.13% UCTR, +0.25% UCTCVR), proving its massive industrial value.
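
Hierarchical prefix matching between a query SID and an item SID can be sketched as counting how many leading codes agree. The representation of an SID as a tuple of integer codes (one per residual-quantization level) and the per-level binary features are assumptions for illustration; the abstract does not specify the exact feature encoding.

```python
def shared_prefix_len(query_sid: tuple[int, ...], item_sid: tuple[int, ...]) -> int:
    """Number of leading codebook indices on which the two SIDs agree."""
    n = 0
    for q, i in zip(query_sid, item_sid):
        if q != i:
            break
        n += 1
    return n


def prefix_match_features(query_sid: tuple[int, ...], item_sid: tuple[int, ...]) -> list[int]:
    """One binary feature per level: 1 if the codes agree up to and including that level."""
    depth = shared_prefix_len(query_sid, item_sid)
    return [1 if level < depth else 0 for level in range(len(item_sid))]


print(shared_prefix_len((12, 7, 3), (12, 7, 9)))      # 2
print(prefix_match_features((12, 7, 3), (12, 7, 9)))  # [1, 1, 0]
```

Because residual quantization refines coarse codes with finer ones, a longer shared prefix means agreement at a finer semantic granularity, which is the kind of discrete signal the abstract says complements dense similarity.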


2606.04365 2026-06-04 cs.CV cs.AI (updated version)

Multi-Granularity 3D Kidney Lesion Characterization from CT Volumes

Renjie Liang, Zhengkang Fan, Jinqian Pan, Chenkun Sun, Jiang Bian, Russell Terry, Jie Xu

Affiliations: Department of Health Outcomes and Biomedical Informatics, University of Florida; Department of Urology, University of Florida; Department of Biostatistics and Health Data Science, Indiana University School of Medicine; Center of Biomedical Informatics

AI summary: Proposes LesionDETR, a DETR-based architecture with size-distance Hungarian matching and a hierarchical loss that predicts four clinical attributes per lesion from CT volumes, reaching an AUC of 0.799 on bilateral abnormality detection.

Abstract

Radiology reports describe kidney lesions by type, size, enhancement, and attenuation, yet existing 3D methods predict only at the patient or organ level. We reformulate kidney CT characterization as a per-lesion set-prediction task: one model emits a variable number of lesions per kidney, each with four clinical attributes. We curated 2,619 CT volumes from 788 patients at one academic medical center, with multi-granularity side- and per-lesion labels, and used KiTS23 (489 cases) for zero-shot external validation. We propose LesionDETR, a DETR-style architecture with size-distance Hungarian matching and a hierarchical loss that aggregates per-slot outputs to side-level objectives. Across four input representations and six encoder initializations, two design choices dominate: a segmentation mask as an input channel, and same-domain abdominal pretraining (SuPreM); generic large-corpus pretraining is no better than random initialization. LesionDETR reaches bilateral side-level abnormality AUC $0.799 \pm 0.009$ on UF-Health and $0.817 \pm 0.072$ on KiTS23. A count-conditioned variant reaches per-lesion mAP $0.190 \pm 0.083$ on cystic lesions; rare solid-lesion AP stays at the noise floor, pointing to targeted data collection, not architecture, as the next bottleneck. The framework yields verified per-lesion predictions for downstream structured report generation.


2606.04345 2026-06-04 cs.CV cs.AI cs.LG (updated version)

HYolo: An Intelligent IoT-Based Object Detection System Using Hypergraph Learning

Isha Abid, Fawad Khan, Muhammad Khuram Shahzad

Affiliations: National University of Sciences and Technology

AI summary: Proposes the HYolo framework, which integrates hypergraph learning into the YOLO architecture to model high-order feature relationships, improving mAP@50 on the COCO dataset by about 12%.

Comments 8 pages, multiple figures

Abstract

This paper presents HYolo, an intelligent IoT-based object detection framework that integrates hypergraph learning into the YOLO architecture. Traditional YOLO-based object detection models primarily capture pairwise feature interactions and may fail to model complex high-order relationships among objects and contextual features. To address this limitation, HYolo incorporates hypergraph learning to capture richer contextual dependencies and improve object representation. Experimental evaluation on the COCO dataset demonstrates significant performance improvements over baseline YOLO models. The proposed approach achieves approximately 12% improvement in mAP@50 while enhancing overall detection accuracy and robustness. By modeling high-order feature relationships, HYolo provides improved contextual understanding and more reliable object detection performance in IoT-based environments. The results indicate that integrating hypergraph learning into object detection pipelines offers a promising direction for intelligent and context-aware IoT vision systems.


2606.04342 2026-06-04 cs.LG cs.AI (updated version)

Expectations vs. Realities: The Cost of MSE-Optimal Forecasting Under Conditional Uncertainty

Riku Green, Zahraa S. Abdallah, Telmo M Silva Filho

Affiliations: The University of Bristol

AI summary: Using a conditional uncertainty gap, proves a fundamental trade-off between MSE optimality and marginal realism in multi-step time series forecasting, and shows empirically that a small sacrifice in MSE (≤5%) can substantially improve marginal realism (median 17.3%).

Comments 12 pages, Accepted for KDD 2026 Research track

DOI: 10.1145/3770855.3818087

Abstract

Multi-step time series forecasting (MSF) is commonly evaluated using point-wise error metrics such as mean squared error (MSE), implicitly treating the conditional mean as a sufficient target. We show that this can be misleading under conditional uncertainty, where the conditional expectation becomes unrepresentative of typical realized values at longer horizons. We formalize this effect through a conditional uncertainty gap and prove that whenever this gap is nonzero, no deterministic predictor can simultaneously minimize MSE and match the marginal distribution of realized futures. This establishes a fundamental, model-agnostic trade-off between point accuracy and marginal realism in MSF evaluation. Using controlled stochastic dynamical systems and nine real-world forecasting benchmarks, we empirically characterize the resulting accuracy-realism frontier and quantify the practical cost of MSE-only model selection. As conditional uncertainty increases with forecast horizon, the attainable set expands into a pronounced Pareto front, separating MSE-optimal but under-dispersed predictors from methods that trade accuracy for realistic marginal variability. Across benchmarks, we find that small relaxations in MSE (≤5%) frequently unlock disproportionate gains in marginal realism, with median improvements of 17.3% and gains exceeding 30% in some datasets. We further show that common forecasting strategies systematically occupy different regions of this frontier: direct multi-output predictors concentrate near the accuracy-optimal extreme, while recursive strategies and sample-based inference favor marginal realism. Together, these results expose a structural failure mode of MSE-based evaluation in long-horizon forecasting and recast strategy and inference selection as navigation of an unavoidable accuracy-realism trade-off.
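
The under-dispersion of MSE-optimal predictors mentioned in the abstract can be motivated by the law of total variance. The derivation below is a standard textbook identity, not the paper's own proof; the symbols $X$ (observed history), $Y$ (future value), and $f^{*}$ (MSE-optimal predictor) are notation assumed here for illustration, and the paper's conditional uncertainty gap may be defined differently.

```latex
\begin{align}
f^{*}(X) &= \mathbb{E}[Y \mid X]
  && \text{(minimizer of } \mathbb{E}[(Y - f(X))^{2}] \text{)} \\
\operatorname{Var}(Y) &= \operatorname{Var}\bigl(\mathbb{E}[Y \mid X]\bigr)
  + \mathbb{E}\bigl[\operatorname{Var}(Y \mid X)\bigr]
  && \text{(law of total variance)} \\
\operatorname{Var}\bigl(f^{*}(X)\bigr) &= \operatorname{Var}(Y)
  - \mathbb{E}\bigl[\operatorname{Var}(Y \mid X)\bigr] \;\le\; \operatorname{Var}(Y)
\end{align}
```

The inequality is strict whenever the expected conditional variance is positive, so the forecasts of the conditional mean vary less than the realized futures do, which is one way to read "MSE-optimal but under-dispersed".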


2605.01910 2026-06-04 cs.LG cs.AI cs.DC (updated version)

Stochastic Sparse Attention for Memory-Bound Inference

Kyle Lee, Corentin Delacour, Kevin Callahan-Coray, Kyle Jiang, Can Yaras, Samet Oymak, Tathagata Srimani, Kerem Y. Camsari

Affiliations: University of California, Santa Barbara; University of Michigan; Carnegie Mellon University

AI summary: Proposes SANTA, which reduces value-cache access by sampling sparse indices from the post-softmax distribution, enabling efficient multiplier-free decoding; on Llama-3.1-8B-Instruct it achieves up to 1.5x attention-kernel speedup and up to 1.25x end-to-end speedup.

Comments: Code available at https://github.com/OPUSLab/SANTA

Journal ref: ICML 2026

Abstract

Autoregressive decoding becomes bandwidth-limited at long contexts, as generating each token requires reading all $n_k$ key and value vectors from the KV cache. We present Stochastic Additive No-mulT Attention (SANTA), a method that sparsifies value-cache access by sampling $S \ll n_k$ indices from the post-softmax distribution and aggregating only those value rows. This yields an unbiased estimator of the post-softmax value aggregation while replacing value-stage multiply-accumulates with gather-and-add. We introduce stratified and systematic sampling to design variance-reduced, GPU-friendly variants. Evaluated on Llama-3.1-8B-Instruct at 32k-token contexts, S$^2$ANTA matches baseline accuracy while achieving up to $1.5\times$ decode-step attention-kernel speedup over FlashInfer and FlashDecoding on an NVIDIA RTX 6000 Ada. In batched long-context generation, these kernel gains translate to up to $1.25\times$ end-to-end decode-latency speedup. Finally, we propose Bernoulli $qK^\mathsf{T}$ sampling as a complementary technique to sparsify the score stage, reducing key-feature access through stochastic ternary queries. Both methods are complementary to upstream quantization, low-rank projection, KV-cache compression, and KV-cache selection methods. Together, they point toward sparse, multiplier-free, and energy-efficient inference. We open-source our kernels at: https://github.com/OPUSLab/SANTA.git
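
The sampling idea in the abstract can be sketched in a few lines of plain Python. This is a minimal illustration using independent draws with replacement, not the authors' kernels and not the stratified or systematic variants they describe; the function names and toy inputs are assumptions made for this example. Because index $i$ is drawn with probability $p_i$, the average of the gathered value rows has expectation $\sum_i p_i v_i$, which is the dense attention output.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of attention scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_attention(scores, values):
    """Exact post-softmax aggregation sum_i p_i * v_i (reads every value row)."""
    probs = softmax(scores)
    dim = len(values[0])
    return [sum(p * row[d] for p, row in zip(probs, values)) for d in range(dim)]


def sampled_attention(scores, values, num_samples, rng):
    """Monte Carlo estimate: draw indices with probability p_i, then gather and add."""
    probs = softmax(scores)
    picks = rng.choices(range(len(values)), weights=probs, k=num_samples)
    dim = len(values[0])
    out = [0.0] * dim
    for i in picks:
        for d in range(dim):
            out[d] += values[i][d]  # the value stage only adds gathered rows
    return [x / num_samples for x in out]  # a single rescale per output entry


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the toy run is repeatable
    scores = [2.0, 0.5, -1.0, 0.0]
    values = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [0.5, -0.5]]
    print("dense  :", dense_attention(scores, values))
    print("sampled:", sampled_attention(scores, values, num_samples=64, rng=rng))
```

With few samples the estimate is noisy, which is presumably why the abstract emphasizes variance-reduced sampling schemes.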


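2606.04329 2026-06-04 cs.CR cs.AI (updated version)

Anycast Performance in Context

Eric Liang

Affiliations: Oracle

AI summary: By comparing anycast latency in root DNS and CDNs, the paper proposes an optimization framework that separates resilience-driven from latency-driven objectives, and concludes that operators should not optimize root DNS and CDN anycast with the same objective function.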

Abstract

IP anycast lets a service advertise one address from many physical sites, leaving BGP to map each client to a site. It is central to the DNS root server system, public resolvers, and some content delivery networks, yet the same routing mechanism has very different consequences across applications. This paper compares anycast latency in two settings: root DNS, where recursive caching amortizes root-server delay over many users and long time-to-live values, and CDNs, where each additional round trip can directly affect page-load, video-start, or API latency. The synthesis finds that root DNS anycast can exhibit substantial path inflation while still producing limited user-visible delay, whereas CDN anycast requires active engineering of peering, route policy, catchment scope, and measurement feedback to keep inflation small. The paper contributes a comparative latency model, a reproducible measurement design, and an optimization framework that separates resilience-driven anycast objectives from latency-driven objectives. The central conclusion is practical: operators should not optimize root DNS and CDN anycast with the same objective function. For root DNS, robustness, reachability, and cache behavior dominate; for CDN services, tail latency, catchment correctness, and policy control dominate.


2606.04296 2026-06-04 cs.AI (updated version)

The Saturation Trap and the Subjectivity of Intervention Timing: Why Affect-Based Triggers and LLM Judges Fail to Time Interventions on Autonomous Agents

Manvendra Modgil

Affiliations: manvendramodgil.ai

AI summary: Using HEART, an 18-dimensional affective-dynamics engine, as a diagnostic probe, the study examines intervention timing for autonomous agents and finds a state saturation trap, a capability-and-context floor for LLM judges, and very low agreement among human annotators on when to intervene, indicating that intervention timing is a low-reliability construct.

Comments: 11 pages, 5 tables. Code and data: https://github.com/2025eb1100268-tech/intervention-timing-saturation-trap

Abstract

As autonomous AI agents move from conversational systems to long-horizon software execution, runtime safety layers that decide when to interrupt an agent have become essential. We study this timing problem using a continuous 18-dimensional affective-dynamics engine (HEART) as a diagnostic probe, evaluating four intervention trigger families (absolute state thresholds, composite state-action patterns, regex reasoning-feature extraction, and zero-shot LLM-as-judge) against human-annotated intervention points on SWE-bench-Verified debugging traces. We report three findings. First, a State Saturation Trap: agents show no recovery signal under sustained difficulty, so modeled frustration quickly crosses the threshold and stays at its maximum, converting threshold-on-state triggers from moment detectors into near-constant indicators that fire on 39-83% of actions across five trajectories. Second, a capability-and-context floor for LLM judges: a small model (gpt-5.4-mini) never fires, while frontier and cross-vendor models escape the zero-firing floor only with full-trajectory context, and even then reach only F1 0.17-0.40 at up to 90x the cost. Third, and most importantly, the supervised target is not reproducible among humans: three trained annotators using one rubric on a 56-action trajectory agree on where to intervene only slightly above chance (location Krippendorff's alpha = +0.047; best pairwise Cohen's kappa = +0.349) and not at all on intervention type (pause is degenerate, clarify is below chance, and reflect reaches only alpha = +0.226). We conclude that intervention timing is a low-reliability construct, making single-annotator F1 an unsuitable optimization target. Our contribution is the joint mapping of this problem across human inter-rater reliability, four detector architectures, a cross-model LLM-judge sweep, and a reproduced saturation effect, rather than any single detector's accuracy.
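
For readers unfamiliar with the agreement statistics quoted in the abstract, the following is a minimal sketch of pairwise Cohen's kappa on binary intervene / do-not-intervene labels. It uses the standard textbook definition; the label vectors are invented for illustration and are not the study's annotations.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two annotators over the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both annotators must label the same non-empty item list")
    n = len(labels_a)
    # Fraction of items on which the two annotators give the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Agreement expected if each annotator labeled at random with their own label rates.
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[label] * count_b[label] for label in count_a) / (n * n)
    if expected == 1.0:
        return float("nan")  # both used one identical label throughout: kappa is undefined
    return (observed - expected) / (1.0 - expected)


if __name__ == "__main__":
    # 1 = "intervene at this action", 0 = "do not intervene" (hypothetical labels).
    annotator_1 = [1, 0, 0, 0, 1, 0, 0, 0]
    annotator_2 = [1, 0, 0, 1, 0, 0, 0, 0]
    print(cohens_kappa(annotator_1, annotator_2))
```

Because interventions are rare, raw agreement is high even when annotators disagree on every intervention point, which is why a chance-corrected statistic is the more informative number here.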


2606.04287 2026-06-04 cs.LG cs.AI (updated version)

Scaling Novel Graph Generation via Lightweight Structure-Guided Autoregressive Models

Alessio Barboni, Massimiliano Lupo Pasini, Bishal Lakha, Edoardo Serra

Affiliations: Boise State University; Oak Ridge National Laboratory

AI summary: Proposes a lightweight autoregressive framework that uses structure-guided topological ordering and a two-phase training strategy to generate graphs with high novelty, validity, and uniqueness on molecular and non-molecular benchmarks.

Abstract

Generating realistic and diverse graphs is a key problem in machine learning, with applications in molecular discovery, circuit design, cybersecurity, and beyond. However, current graph generative models remain limited by scalability and novelty. Diffusion-based methods often require costly full-adjacency operations and long denoising chains, while many autoregressive and hybrid models have at least quadratic complexity. In addition, these models often imitate training graphs rather than generalize beyond them. We propose a lightweight autoregressive framework to address these issues. It uses a structure-guided topological ordering to serialize graphs into regular edge sequences, enabling near log-linear generation, and a two-phase training strategy that combines exploration-oriented augmentation with iterative refinement to reduce overfitting and promote controlled novelty. Experiments on molecular and non-molecular benchmarks show that our approach improves novelty while preserving high validity and uniqueness. The framework also supports both LSTM and Mamba-style causal sequence backbones, with large-memory accelerators enabling longer graph-sequence experiments beyond typical GPU limits.


2606.04284 2026-06-04 cs.LG cs.AI cs.CL (updated version)

Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling

Yifan Wang, Jinyi Mu, Mayank Jobanputra, Yu Wang, Ji-Ung Lee, Soyoung Oh, Isabel Valera, Vera Demberg

Affiliations: Saarland University; Independent Researcher; Bielefeld University; Max Planck Institute for Software Systems; Max Planck Institute for Informatics

AI summary: Proposes a sparse Mixture-of-Experts reward model that, through sparse routing and expert-diversity training, learns interpretable expert patterns from binary preference data and improves the test-time adaptability and interpretability of personalized preference modeling.

Abstract

Preference modeling plays a central role in reinforcement learning from human feedback (RLHF), enabling large language models (LLMs) to align with human values. However, most existing approaches assume a universal reward function, neglecting the diversity and heterogeneity of human preferences. To address this limitation without additional annotation costs, recent work has proposed learning multiple preference components from binary data and combining them to model individual preferences. Nevertheless, these components often fail to capture coherent and disentangled patterns, limiting their interpretability and effectiveness for personalization. In this work, we propose a sparse Mixture-of-Experts (MoE) reward model that encourages sparse routing and expert diversity during training on binary preference data. Across controlled and real-world experiments, sparse MoE learns interpretable routing patterns and specialized experts. It also improves test-time personalization, and post-adaptation shifts in expert weights provide a qualitative lens for analyzing how the model adapts to personalized preferences.


2606.04280 2026-06-04 cs.LG cs.AI cs.IR (updated version)

The Loss Is Not Enough: Sampling Conditions and Inductive Bias in Contrastive Representation Learning

Justinas Zaliaduonis, Patrick Putzky, Till Richter, Sergios Gatidis

Affiliations: ETH Zürich

AI summary: Formalizes the diversity condition in contrastive learning within a measure-theoretic framework, proposes a support-corrected InfoNCE variant, and experimentally examines the interaction between sampling diversity and encoder inductive bias.

Abstract

Contrastive learning has become a leading paradigm for self-supervised representation learning, yet the conditions under which it recovers meaningful latent geometry remain incompletely understood. We develop a measure-theoretic framework formalizing the diversity condition, a support requirement on positive-pair sampling that is necessary for isometric latent recovery. We show that the standard full-support von Mises-Fisher setting satisfies the diversity condition, so global contrastive loss minimizers recover latent geometry up to an orthogonal transformation, whereas restricted conditionals can let non-orthogonal maps attain strictly lower asymptotic contrastive loss. We introduce a support-corrected Information Noise Contrastive Estimation (InfoNCE) variant as a theoretical fix: this correction makes orthogonal latent space recovery achievable but does not uniquely select it. Experiments on synthetic benchmarks validate the identifiability predictions, and CIFAR-10 experiments are consistent with the qualitative prediction that architectural inductive bias becomes more important when sampling diversity is limited. Together, our results clarify how sampling mechanisms and encoder inductive bias interact in contrastive representation learning.
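
As background for the loss discussed in the abstract, here is a minimal sketch of the standard InfoNCE objective for one anchor, one positive, and a set of negatives. It is the textbook form with cosine similarity and a temperature, not the support-corrected variant the paper introduces; the function names and the default temperature are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def info_nce(anchor, positive, negatives, temperature=0.1):
    """-log( exp(s_pos / t) / sum_j exp(s_j / t) ), with the positive included in the sum."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, neg) / temperature for neg in negatives]
    peak = max(logits)  # subtract the max before exponentiating for stability
    log_denominator = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_denominator - logits[0]


if __name__ == "__main__":
    anchor = [1.0, 0.0]
    print(info_nce(anchor, [1.0, 0.0], [[-1.0, 0.0], [0.0, 1.0]]))  # well separated
    print(info_nce(anchor, [0.0, 1.0], [[1.0, 0.0], [0.9, 0.1]]))   # positive is far away
```

How positives are drawn for the numerator is exactly the sampling mechanism whose support the abstract's diversity condition constrains.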


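2606.04275 2026-06-04 cs.LG cs.AI (updated version)

From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments

Saket Tiwari, Tejas Kotwal, George Konidaris

Affiliations: Brown University

AI summary: By modeling deep reinforcement learning as a continuous-time stochastic process and drawing on stochastic control theory, the paper derives, for the first time, an equation for the evolution of the state distribution of overparametrized neural actor-critic algorithms in continuous environments in the infinite-width limit.

Comments: Presented at ICLR 2026: https://openreview.net/forum?id=TdiRLe3rPA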

Abstract

We present a novel theoretical framework for deep reinforcement learning (RL) in continuous environments by modeling the problem as a continuous-time stochastic process, drawing on insights from stochastic control. Building on previous work, we introduce a viable model of an actor-critic algorithm that incorporates both exploration and stochastic transitions. For single-hidden-layer neural networks, we show that the state of the environment can be formulated as a two-time-scale process: the environment time and the gradient time. Within this formulation, we characterize how the time-dependent random variables that represent the environment's state and the estimate of the cumulative discounted return evolve over gradient steps in the infinite-width limit of two-layer networks. Using the theory of stochastic differential equations, we derive, for the first time in continuous RL, an equation describing the infinitesimal change in the state distribution at each gradient step, under a vanishingly small learning rate. Overall, our work provides a novel nonparametric formulation for studying overparametrized neural actor-critic algorithms. We empirically corroborate our theoretical result using a toy continuous control task.


2606.04273 2026-06-04 cs.AI (updated version)

Characterizing initial human-AI proof formalization workflows

Katherine M. Collins, Simon Frieder, Jonas Bayer, Jacob Loader, Jeck Lim, Peiyang Song, Fabian Zaiser, Lexin Zhou, Shanda Li, Sam Looi, Joshua B. Tenenbaum, Umang Bhatt, Adrian Weller, Jose Hernandez-Orallo, Cameron E. Freer, Valerie Chen, Ilia Sucholutsky

Affiliations: Massachusetts Institute of Technology; University of Cambridge; Princeton University; University of Oxford; Caltech; Carnegie Mellon University; Universitat Politècnica de València; New York University

AI summary: Through a mixed-methods analysis, the study examines what people want from AI tools when formalizing proofs, the barriers they perceive, and how they actually use such tools. AI assistance tends to raise formalization accuracy, and user preferences are diverse but share a desire to keep high-level human control over the proof discovery process.

Abstract

For centuries, human mathematicians have written proofs to substantiate their mathematical arguments; yet, the ability to automatically verify the validity of proofs has long been a challenge. Advances in AI systems' ability to generate code and engage in increasingly high-level mathematical reasoning promise to transform people's ability to formalize and thereby verify proofs. While many works focus on benchmarking the current frontier, we instead study how people use these tools. We conduct a mixed-methods analysis into the initial impact of AI on people's formalization workflows: what people claim they want, what they see as the barriers to those visions, and how they actually use and adapt AI in practice. A qualitative survey shows that people's preferences are diverse, but with a general desire for AI assistance in formalization that preserves high-level human control over the proof discovery process. To assess how people actually engage with AI for formalization under such limitations, we conduct a controlled user study in which participants formalize informal math problems and their proofs, with and without AI, across a range of mathematical problems at varying levels of difficulty and domains. Despite limitations of the tools at the time for autoformalization, participants tend to attain higher formalization accuracy when allowed access to AI tools than when formalizing on their own, with most participants flexibly choosing to use multiple different AI tools. Taken together, our work sheds light on the early stages of AI integration into formalization workflows, involving an intimate interplay of human and AI engagement.


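2606.04271 2026-06-04 cs.CV cs.AI (updated version)

StandardE2E: A Unified Framework for End-to-End Autonomous Driving Datasets

Stepan Konev

Affiliations: University of Cambridge

AI summary: Proposes StandardE2E, a framework that addresses incompatible formats across end-to-end autonomous driving datasets through a unified data schema, joint multi-dataset loading, and a simplified process for adding new datasets.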

Abstract

Autonomous driving has shifted from modular perception-prediction-planning stacks toward end-to-end (E2E) models that map sensor inputs directly to vehicle control, often regularized by auxiliary tasks such as 3D detection, motion forecasting, and HD-map perception. Progress is driven by a fast-growing ecosystem of sensor-rich driving datasets, yet each ships its own file formats, APIs, coordinate conventions, and modality coverage, leaving cross-dataset experimentation and even basic per-dataset preprocessing to be re-implemented per project. We present StandardE2E, a framework that provides a single unified interface over E2E driving datasets. StandardE2E (i) standardizes per-dataset preprocessing under one shared data schema; (ii) combines multiple datasets in a single PyTorch DataLoader for cross-dataset pretraining, auxiliary-task supervision, and scenario-level filtering; and (iii) reduces adding a new dataset to a single per-dataset mapping from raw frames to the canonical schema, leaving the entire downstream pipeline unchanged. The framework supports six datasets out of the box: Waymo End-to-End, Waymo Perception, Argoverse 2 Sensor, Argoverse 2 LiDAR, NAVSIM (OpenScene-v1.1), and WayveScenes101, and is released as the open-source standard-e2e Python package, available at https://github.com/stepankonev/StandardE2E.


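2606.04269 2026-06-04 cs.RO cs.AI cs.CV (updated version)

StepPRM-RTL: Stepwise Process-Reward-Guided LLM Fine-Tuning for Enhanced RTL Synthesis

Prashanth Vijayaraghavan, Apoorva Nitsure, Luyao Shi, Ehsan Degan, Vandana Mukherjee

Affiliations: IBM Research, San Jose, CA, USA

AI summary: Proposes StepPRM-RTL, a framework combining stepwise trajectory modeling, a process reward model, and retrieval-augmented fine-tuning, which uses dense feedback and Monte Carlo Tree Search over reasoning paths to improve the functional correctness and reasoning fidelity of LLM-generated RTL code, outperforming prior methods by over 10% on benchmark datasets.

Comments: 6 pages, 2 figures, DAC'2026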

DOI: 10.1145/3770743.3804218

Abstract

Automatic generation of RTL code for digital hardware designs remains challenging due to long-horizon reasoning, multi-step dependencies, and strict correctness constraints in Verilog and VHDL. We present StepPRM-RTL, a novel framework that combines stepwise trajectory modeling, process-reward modeling (PRM), and retrieval-augmented fine-tuning (RAFT) to enhance both the functional correctness and reasoning fidelity of LLM-based RTL code generation. StepPRM-RTL constructs stepwise reasoning trajectories from canonical solutions, where each step contains a rationale and incremental code modification. A Process Reward Model (PRM) evaluates intermediate steps, providing dense feedback that guides reinforcement-style updates during RAFT fine-tuning. Monte Carlo Tree Search (MCTS) explores alternative reasoning paths, enriching the training dataset with high-quality trajectories. This integration of stepwise and outcome-aware rewards allows the model to learn both how and why to construct correct RTL, improving long-horizon reasoning beyond standard supervised or outcome-based training. Experimental evaluation on benchmark Verilog and VHDL datasets demonstrates that StepPRM-RTL outperforms the best prior methods by over 10% in functional correctness and reasoning fidelity metrics. Ablation studies confirm that the combination of PRM-guided rewards and stepwise trajectory exploration is key to its performance. StepPRM-RTL generalizes across RTL languages and provides a scalable framework for high-fidelity, interpretable code generation, establishing a new standard for LLM-assisted hardware design automation.


2606.04244 2026-06-04 cs.AI cs.CL cs.CV cs.LG (updated version)

VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark

Amirhossein Dabiriaghdam, Shayan Vassef, Mohammadreza Bakhtiari, Yasamin Medghalchi, Ilker Hacihaliloglu, Mesrob Ohannessian, Lele Wang, Giuseppe Carenini

Affiliations: University of California, Berkeley

AI summary: Proposes the VAMPS benchmark, which uses 1,168 bilingual multiple-choice questions to evaluate multimodal large models on mathematical reasoning with plotting tools, and finds that direct analytical solving outperforms tool-assisted visual solving.

Abstract

Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, particularly when they rely on visual aids. This gap is especially important because real engineering and scientific workflows often rely on visualization tools for analysis, validation, and decision-making. To study this discrepancy, we introduce VAMPS (Visual-Assisted Mathematical Problem Solving), a benchmark for graph-assisted mathematics. VAMPS contains 1,168 multimodal, bilingual multiple-choice question-answer pairs drawn from Iranian University Entrance Exam algebra and calculus problems and expanded with human-reviewed LLM-generated synthetic variants, all selected so that plotting provides a natural solution strategy by revealing intersections, extrema, asymptotes, etc. Designed for both benchmarking and diagnosis, VAMPS goes beyond prior multimodal benchmarks that primarily evaluate reasoning over fixed visual inputs by testing whether a model can benefit from constructing a useful graph and grounding its answer in the resulting visualization. Overall, we find that across a diverse set of models, direct analytical solving surprisingly outperforms tool-enabled visual solving, even on problems where plotting is a natural strategy.


2606.04240 2026-06-04 cs.CV cs.AI cs.CL (updated version)

Overview of the EReL@MIR 2025 Multimodal Document Retrieval Challenge (Track 1)

Jingbiao Mei

Affiliations: University of Cambridge, Cambridge, United Kingdom

AI summary: Describes the design, datasets, evaluation protocol, final standings, and top-three winning systems of the EReL@MIR 2025 Multimodal Document Retrieval Challenge (Track 1); all three systems build on decoder-based multimodal LLM embedders from the Qwen2-VL family.

Comments: MDR Challenge Report at WWW2025

Abstract

Retrieval over visually-rich documents, pages that interleave text with figures, tables, and charts, is essential for multimodal retrieval-augmented generation, yet most retrievers still discard the visual channel. The Multimodal Document Retrieval Challenge, Track 1 of the MIR Challenge at the first EReL@MIR workshop, co-located with The Web Conference 2025, asks participants to build a single retrieval system that handles two complementary regimes: closed-set document page retrieval within long documents from a text query (MMDocIR), and open-domain retrieval of Wikipedia-style passages from an image or image-plus-text query (M2KR). Systems are ranked by the macro-average of mean Recall@{1,3,5} over the two tasks. The challenge drew 455 entrants and 586 submissions across 22 teams. This report describes the challenge design, datasets, and evaluation protocol; reports the final standings; and analyses the three winning teams' systems. All three build on decoder-based Multimodal-LLM embedders from the Qwen2-VL family rather than on CLIP-style encoders, and differ chiefly in whether they reach the top through fine-tuned ensembles, training-free multi-route fusion with a strong vision-language re-ranker, or zero-shot late interaction. The training-free system finished within 0.1 point of the fine-tuned winner.
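
The ranking metric described in the abstract can be sketched as follows. This is an illustrative reading, not the organizers' scoring script: it assumes each query has a single gold item, so Recall@k reduces to a hit indicator, and the task names and toy rankings are hypothetical.

```python
def recall_at_k(ranked_ids, gold_id, k):
    """1.0 if the gold item appears in the top-k results, else 0.0 (single-gold assumption)."""
    return 1.0 if gold_id in ranked_ids[:k] else 0.0


def mean_recall(runs, ks=(1, 3, 5)):
    """Average Recall@k over queries for each k, then average across the k values."""
    per_k = [sum(recall_at_k(ranked, gold, k) for ranked, gold in runs) / len(runs) for k in ks]
    return sum(per_k) / len(per_k)


def challenge_score(task_runs):
    """Macro-average over tasks: each task counts equally, whatever its query count."""
    scores = [mean_recall(runs) for runs in task_runs.values()]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    task_runs = {
        # Each run is (ranked candidate ids, gold id); all values are made up.
        "task_a": [(["a", "b", "c"], "a"), (["x", "y", "z", "w", "g"], "g")],
        "task_b": [(["p", "q"], "p")],
    }
    print(challenge_score(task_runs))
```

Macro-averaging keeps a task with many queries from dominating the final score, which matters when the two regimes differ in size.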


2606.04238 2026-06-04 cs.LG cs.AI (updated version)

Recover-LoRA for Aggressive Quantization: Reclaiming Accuracy in 2-Bit Language Models via Low-Rank Adaptation with Knowledge Distillation on Synthetic Data

Devleena Das, Rajeev Patwari, Elliott Delaye, Ashish Sirasao

Affiliations: Advanced Micro Devices, Inc.

AI summary: To address the severe accuracy loss that aggressive 2-bit quantization causes in large language models, the paper proposes Recover-LoRA, which combines a selective mixed-precision strategy (only the MLP gate and up layers are quantized to 2 bits) with low-rank adapter training via distillation on synthetic data; on Qwen3-4B, 10k synthetic samples recover 80-95% of accuracy on 9 of 12 benchmarks.

Abstract

Aggressive weight quantization to 2-bit precision offers substantial throughput and memory gains for large language model (LLM) inference, but typically incurs severe accuracy degradation. These gains are particularly relevant for edge and on-device deployment, where memory capacity and bandwidth are primary constraints. In this work, we extend Recover-LoRA, a lightweight, data-free accuracy recovery method originally developed for general model weight corruption, to the setting of ultra-low-bit quantization. We propose a selective mixed-precision strategy in which only gate and up projection layers of the MLP are quantized to 2-bit (W2), while all other linear layers remain at higher precision, yielding a mixed-precision GateUp configuration. We demonstrate via roofline analysis across three model families (4B-20B) and two hardware platforms that a W4/W2-GateUp deployment (4-bit base with 2-bit gate/up) delivers 7.5-23.3% TPS improvement over uniform W4 depending on model and context length, while confining quantization error to a predictable subset of layers. We then apply Recover-LoRA, which trains low-rank adapters on the quantized layers via logit distillation with synthetic data, to recover accuracy lost from 2-bit quantization of the gate and up layers. In a case study on Qwen3-4B, Recover-LoRA achieves 80-95% accuracy recovery on 9 of 12 benchmarks, using only 10k synthetic training samples and no labeled data. We further demonstrate that synthetic data performs comparably to curated labeled data for distillation-based recovery, and that recovery generalizes to out-of-distribution evaluation tasks. Our results present Recover-LoRA as a practical post-quantization accuracy recovery tool for aggressive weight compression in deployment settings.
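
The abstract mentions logit distillation without giving the loss. One common choice, shown here purely as an assumption, is the KL divergence from the full-precision (teacher) distribution to the quantized-plus-adapter (student) distribution at each token position. The function names, the temperature parameter, and the toy logits are all hypothetical.

```python
import math


def log_softmax(logits, temperature=1.0):
    """Log-probabilities from raw logits, computed stably via the log-sum-exp trick."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    log_total = peak + math.log(sum(math.exp(x - peak) for x in scaled))
    return [x - log_total for x in scaled]


def distillation_loss(teacher_logits, student_logits, temperature=1.0):
    """KL(teacher || student) over the vocabulary at a single token position."""
    teacher = log_softmax(teacher_logits, temperature)
    student = log_softmax(student_logits, temperature)
    return sum(math.exp(t) * (t - s) for t, s in zip(teacher, student))


if __name__ == "__main__":
    teacher = [2.0, 0.5, -1.0]          # full-precision model logits (toy values)
    degraded = [1.0, 1.0, 0.0]          # logits after aggressive quantization (toy values)
    print(distillation_loss(teacher, degraded))   # positive: distributions differ
    print(distillation_loss(teacher, teacher))    # zero: nothing left to recover
```

A loss of this kind needs only teacher outputs on input text, not labels, which is consistent with the abstract's point that synthetic, unlabeled samples suffice.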

2606.04236 2026-06-04 cs.CL cs.AI cs.LG Updated version

DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities

Sajad Ebrahimi, Nima Jamali, Bardia Shirsalimian, Kelly McConvey, Wentao Zhang, Jalehsadat Mahdavimoghaddam, Maksym Taranukhin, Maura Grossman, Vered Shwartz, Yuntian Deng, Ebrahim Bagheri

Affiliations: University of Toronto; University of Waterloo; Toronto Metropolitan University; University of British Columbia; Vector Institute

AI summary: Proposes DetectZoo, a first-of-its-kind unified multi-modal toolkit for AI-generated content detection that standardizes data preprocessing and evaluation and integrates 61 detectors and 22 benchmark datasets for fair, reproducible benchmarking.

Abstract

The growing popularity and capacity of generative models have eroded the distinction between human and machine-generated content, motivating a growing body of work on detection across text, images, and audio. Most available detectors are either commercial software or, if open-source, come with incompatible codebases with bespoke preprocessing, evaluation protocols, and evaluation metrics, which make their adoption, fair comparison, and reproduction quite difficult. To address this critical gap, we introduce DetectZoo, a first-of-its-kind, extensible toolkit designed to provide a unified interface for AI-generated content detection across text, audio, and image modalities. DetectZoo standardizes the complete empirical pipeline, from data ingestion and preprocessing to model assessment, offering researchers a cohesive framework to benchmark state-of-the-art detectors systematically. By integrating diverse public datasets and baseline detection algorithms under a single, unified API, our toolkit facilitates rigorous and reproducible evaluation. DetectZoo provides reference implementations of 61 detectors, native loaders for 22 benchmark datasets, and a standardized evaluation pipeline that reports multiple metrics through a common interface. Each detector is self-contained yet accessible through the same interface, automatically caches pretrained weights, and reproduces the original published results. DetectZoo lowers the barrier to entry for multi-modal AI forensics, enabling researchers to identify performance gaps across domains and accelerating the development of robust, generalizable detection techniques. The open-source repository and comprehensive documentation are publicly available at https://github.com/sadjadeb/DetectZoo, and the package can be installed via pip install detectzoo.

2606.04202 2026-06-04 cs.AI Updated version

SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models

Joel Sol, Homayoun Najjaran

Affiliations: Faculty of Engineering and Computer Science; University of Victoria

AI summary: Proposes the SMAC-Talk environment, which uses a natural language communication channel to evaluate coordination and trust among LLM agents in cooperative multi-agent settings, and builds evaluation scenarios that include a deceptive communicator.

Comments 8 pages, 1 figure

Abstract

As LLMs become more widely deployed, they are increasingly expected to work alongside other AI agents rather than operating in isolation. Effective coordination in these settings requires agents to communicate, share information and make decisions under uncertainty. We introduce SMAC-Talk, a natural language extension of the StarCraft Multi-Agent Challenge for evaluating LLM-based agents in cooperative multi-agent environments. The environment has several key features such as decentralized control, partial observability and long-horizon decision making. SMAC-Talk includes a natural language communication channel which is used to probe agent coordination and trust. We use this communication channel to construct different evaluation scenarios, including settings with an embedded deceptive communicator that tries to disrupt and deceive allies through communication alone. We provide three agents for benchmarking using 4 models from the Qwen3.5 family and study how reasoning structure, memory and model scale affect coordination between agents. We release SMAC-Talk as an open benchmark to support the research community in developing and evaluating LLM agents in cooperative multi-agent settings.

2606.04193 2026-06-04 cs.CR cs.AI cs.DC Updated version

Notarized Agents: Receiver-Attested Confidential Receipts for AI Agent Actions

Juan Figuera

Affiliations: Independent Researcher, Sello Project

AI summary: Addresses the trust flaw of AI agents auditing their own logs by proposing Sello, a receiver-signed receipt protocol that provides a tamper-evident trail through HPKE encryption, JWS binding, and a Merkle log.

Comments 22 pages. Reference implementation at https://github.com/juanfiguera/sello

Abstract

Current AI agent observability is structurally compromised: the entity producing the activity log is the same entity whose activity is being logged. A compromised or buggy agent can omit, alter, or fabricate its own traces, and the operator running the agent has no independent way to detect tampering. We propose a class of protocols that resolves this by inverting the trust boundary: the service that receives an agent's call signs a receipt of what it observed using its own key, encrypts the receipt to the agent's owner, and publishes it to a public transparency log. The owner reconstructs a tamper-evident trail without trusting the agent or its operator. We instantiate the class as Sello, a protocol combining four properties absent in any current system: (P1) receiver-side signing, (P2) HPKE encryption to an owner public key bound to the authorization token via JWS, (P3) publication to a witness-cosigned Merkle log, and (P4) owner-side discovery by token reference. We describe the protocol, analyze its security under an adversary that controls the agent and its operator, present microbenchmarks of the cryptographic operations, and situate Sello among adjacent receipt-protocol work (Signet, AgentROA, Agent Passport System, draft-farley-acta, SCITT). We discuss known limitations including the suppression attack, service collusion, and the adoption-incentive problem.
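
Property (P3) above relies on a Merkle log, and the tamper-evidence it provides can be illustrated with a short root computation. This is a sketch under assumptions: the abstract does not say how Sello hashes its log, so the code follows the RFC 6962 (Certificate Transparency) tree hash as a stand-in, and `merkle_root` and the byte-string receipts are hypothetical names. HPKE encryption, JWS binding, and witness cosigning are not modelled.

```python
import hashlib


def merkle_root(leaves):
    """RFC 6962-style Merkle tree hash over a list of byte strings."""
    n = len(leaves)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        # Leaf hashes are domain-separated with a 0x00 prefix.
        return hashlib.sha256(b"\x00" + leaves[0]).digest()
    # Split at the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    left = merkle_root(leaves[:k])
    right = merkle_root(leaves[k:])
    # Interior nodes are domain-separated with a 0x01 prefix.
    return hashlib.sha256(b"\x01" + left + right).digest()
```

Any change to a logged receipt changes the root, so an owner who holds a cosigned root can detect an altered entry without trusting the agent or its operator. Detecting an omitted entry additionally depends on how receipts are discovered, which this sketch does not cover.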

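2606.04191 2026-06-04 cs.LG cs.AI Updated version

Thinking Through Signs: PEEL as a Semiotic Scaffold for Epistemically Accountable AI-Empowered Research

Clarisse de Souza, Gabriel Barbosa, Simone Diniz Junqueira Barbosa, Bárbara Betts, Renato Cerqueira, Juliana Jansen Ferreira

Affiliations: PUC-Rio; PUC-Behring Institute of Artificial Intelligence

AI summary: Proposes the PEEL framework, which combines deterministic distant reading with Voyant Tools and LLM interpretation with Claude, grounded in Peircean semiotics and abductive reasoning, to expose systematic distortions in AI-generated summaries, and derives three design implications.

Comments 10 pages, 5 figures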

2606.04150 2026-06-04 cs.AI cs.HC Updated version

Stumbling Into AI Emotional Dependence: How Routine AI Interactions Reshape Human Connection

Yaoxi Shi, Cathy Mengying Fang, Pattie Maez, Amit Goldenberg

Affiliations: Imperial College Business School; Harvard Business School AI Institute; MIT Media Lab; Harvard Business School; Harvard Department of Psychology

AI summary: Drawing on empirical evidence, shows that AI emotional support typically arises incidentally in everyday task-oriented interactions, and that this path dependence changes people's beliefs about AI's emotional capabilities, increasing preference for AI and decreasing preference for humans.

Abstract

Public discourse and emerging policy typically assume that AI emotional support is a deliberate act: a lonely user consciously seeking comfort from a dedicated companion chatbot. In this paper, we draw on emerging empirical evidence and argue that this picture is inaccurate on two counts: how AI emotional support arises, and how it shapes future behavior. First, AI emotional support commonly emerges incidentally within task-oriented interactions on general-purpose platforms, much as workplace friendships deepen through collaboration. Second, these incidental encounters are path-dependent: positive experiences of AI emotional support update people's beliefs about AI's emotional capabilities and redirect their choices for future emotional support, increasing preference for AI and decreasing preference for humans. We review recent evidence, including a large-scale longitudinal study conducted in collaboration with OpenAI, showing that daily five-minute conversations with an AI about personal issues over 28 days led to a 10.3% decrease in the preference for seeking support from humans and an 11.6% increase in the preference for AI. These findings suggest that current policy, focused on companion apps and isolated interactions, cannot adequately protect human connection. Instead, effective regulations should extend to general-purpose AI systems and address cumulative, trajectory-level changes in how people seek support. Recognizing how people stumble into AI emotional support and how those encounters redirect human connections over time is essential to safeguarding human well-being.

2606.04143 2026-06-04 cs.LG cs.AI Updated version

Physics-Informed Machine Learning for Short-Term Flood Prediction

Tewodros Syum Gebre, Jagrati Talreja, Leila Hashemi-Beni

Affiliations: IEEE Service Center; National Science Foundation; Microsoft

AI summary: Proposes a physics-informed machine learning framework that embeds hydrological knowledge in the LSTM loss function as a trend-alignment constraint, improving the physical consistency and reliability of flood predictions under data scarcity and extreme weather.

Comments This paper has been accepted for publication in IGARSS 2026. The final authenticated version will be available through IEEE Xplore

Abstract

Accurate flood forecasting is essential for mitigating disaster risks and protecting communities. However, purely data-driven machine learning models often struggle in data-scarce environments and may violate fundamental hydrological principles. Standard Long Short-Term Memory (LSTM) networks can generate physically inconsistent predictions, particularly when extrapolating to extreme weather conditions. To address these limitations, we propose a Physics-Informed Machine Learning (PIML) framework that incorporates hydrological knowledge directly into the loss function of an LSTM model. Specifically, a Trend Alignment constraint penalizes directional inconsistencies between precipitation and discharge trends, improving model robustness without requiring complex hydrodynamic equations. This regularization encourages the model to learn physically plausible hydrograph behavior, even with limited training data, while enhancing reliability during peak flood events. Experimental results show that the proposed physics-informed model outperforms a standard LSTM baseline in data-scarce settings, increasing the Nash-Sutcliffe Efficiency (NSE) from 0.20 to 0.23 when trained on only 5% of the available data. Additional stress tests under simulated extreme climate scenarios demonstrate that the baseline model exhibits unstable behavior, whereas the physics-informed model maintains directional consistency and physical plausibility. Although accurately predicting extreme peak magnitudes remains challenging with limited data, the proposed approach substantially reduces unphysical fluctuations common in purely data-driven models. These findings demonstrate that simple physical constraints can significantly improve the reliability of deep learning models for real-time flood forecasting, offering a practical solution for ungauged basins and evolving climate conditions.
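
The loss described above can be sketched in a few lines. The abstract does not give the exact form of the Trend Alignment constraint, so the penalty below is an assumption: it charges a cost whenever the step-to-step change in precipitation and the step-to-step change in predicted discharge have opposite signs, and it ignores the lag between rainfall and runoff. The function names and the weight `lam` are hypothetical. The Nash-Sutcliffe Efficiency is the standard definition, one minus the ratio of squared error to the variance of the observations around their mean.

```python
def nse(obs, pred):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the mean predictor."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for o, p in zip(obs, pred))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1 - num / den


def trend_penalty(precip, pred_q):
    """Mean penalty for opposite-signed precipitation and discharge trends (assumed form)."""
    terms = []
    for t in range(len(pred_q) - 1):
        dp = precip[t + 1] - precip[t]
        dq = pred_q[t + 1] - pred_q[t]
        terms.append(max(0.0, -dp * dq))  # positive only when the directions disagree
    return sum(terms) / len(terms)


def total_loss(obs_q, pred_q, precip, lam=0.1):
    """Mean squared error plus the weighted trend penalty."""
    mse = sum((o - p) ** 2 for o, p in zip(obs_q, pred_q)) / len(obs_q)
    return mse + lam * trend_penalty(precip, pred_q)
```

A penalty of this shape needs no hydrodynamic equations, which matches the abstract's framing of the constraint as a lightweight regularizer.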

2606.04141 2026-06-04 cs.CR cs.AI Updated version

Caught in the Act(ivation): Toward Pre-Output and Multi-Turn Detection of Credential Exfiltration by LLM Agents

Kargi Chauhan, Pratibha Revankar

Affiliations: University of California, Berkeley

AI summary: Studies three complementary defenses (activation probes, honeytokens, and cumulative information-flow tracking) for detecting credential exfiltration by LLM agents before output and across multi-turn conversations.

Abstract

LLM agents often place sensitive credentials in the same context window as untrusted retrieved content, creating a direct path for indirect prompt injection to induce credential exfiltration. We study this failure mode through three complementary defenses. First, we ask whether activation probes can detect credential access before output tokens are emitted. Second, we construct honeytokens from format-specific character models and calibrate detection with split conformal prediction. Third, we treat multi-turn exfiltration as a cumulative information-flow problem and track an estimated leakage budget across conversation turns. In controlled experiments on open-weight models, activation features separate benign and credential-seeking prompts with high accuracy, including under held-out encoding transformations. In a small synthetic multi-turn suite, cumulative accounting detects attacks that per-turn detectors miss. These results are preliminary: the multi-turn benchmark is in-house and small, the activation method requires white-box access, and the information estimator provides a practical signal rather than a formal upper bound. Still, the results suggest that credential-exfiltration defenses should combine pre-output monitoring, calibrated canary detection, and temporal leakage accounting rather than relying only on text-level output filters.
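
Two of the defenses above lend themselves to a short illustration: calibrating a detection threshold with split conformal prediction, and tracking a cumulative leakage budget across turns. This is a minimal sketch, not the authors' implementation. The scores, the budget, and all function names are hypothetical, and the per-turn leakage values are assumed to come from some external estimator. The threshold rule is the standard split conformal quantile: with n held-out benign scores, take the ceil((n + 1)(1 - alpha))-th smallest one.

```python
import math


def conformal_threshold(cal_scores, alpha):
    """Split conformal quantile over held-out benign (calibration) scores."""
    n = len(cal_scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return sorted(cal_scores)[rank - 1]


def is_flagged(score, threshold):
    """Flag an input whose score exceeds the calibrated threshold."""
    return score > threshold


def first_turn_over_budget(per_turn_leak, budget):
    """Index of the first turn where cumulative estimated leakage exceeds the budget."""
    total = 0.0
    for turn, leak in enumerate(per_turn_leak):
        total += leak
        if total > budget:
            return turn
    return None
```

If benign calibration and test scores are exchangeable, a threshold chosen this way flags a benign input with probability at most alpha. The cumulative check shows why per-turn detection can miss an attack: three turns of 3 units each stay under a per-turn limit of 4 but exceed a conversation budget of 8 on the third turn.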

2606.04126 2026-06-04 cs.AR cs.AI cs.SE Updated version

HighTide: An Agent-Curated Open-Source VLSI Benchmark Suite

Benjamin Goldblatt, Paolo Pedroso, Farhad Modaresi, Ethan Sifferman, Matthew R. Guthaus

Affiliations: University of California, Santa Cruz

AI summary: Proposes HighTide, an AI-assisted, curated open-source VLSI benchmark suite that covers the design lifecycle with 12 agent skills and integrates Bazel incremental builds and remote caching.

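2606.04123 2026-06-04 math.OC cs.AI cs.RO Updated version

Semantic Constraint Synthesis for Adaptive Trajectory Optimization via Large Language Models

Eleanor Brosius, Yuji Takubo, Daniele Gammelli, Simone D'Amico, Marco Pavone

Affiliations: Stanford University

AI summary: Proposes using large language models to translate mission requirements described in natural language into executable trajectory-optimization code and mathematical formulations, achieving a high success rate at reformulating convex trajectory optimization problems from semantic requirements in spacecraft rendezvous scenarios.

Comments 7 pages, 4 figures, Presented as a short paper at IEEE CVPR 2026, AI4Space Workshop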

Building the Physical AI Layer for Machine Intelligence

Ulbert Jose Botero, Liam Smith, Brooks Olney, Pooya Khorrami, Steven Kusiak, Watson Jia, Sage Trudeau, Daniel Capecci

Affiliations: MIT Lincoln Laboratory

AI summary: Proposes foundation models based on signal-processing principles that, trained on radio-frequency data, achieve cross-modal transfer without fine-tuning on target domains, reaching 77.7% average accuracy across 15 tasks with 1.99M parameters.

Comments 102 pages, 11 figures

Abstract

Foundation models achieve generalization through massive-scale training on diverse data, but have limitations with transfer to truly unseen domains without paired training data. We propose principle-driven foundation models that encode signal-theoretic principles (Fourier decomposition, energy conservation, symmetry) rather than learn untethered statistical correlations. We hypothesize that domains differ not in fundamental physics, but in learnable transformations in time, frequency, magnitude, or phase. Training exclusively on radio-frequency (RF) data with co-designed architecture and losses incorporating these principles, we achieve cross-modal transfer to audio, images, text, and video using only frozen representations learned from RF data, requiring no fine-tuning of the encoder on target domains. Our 1.99M-parameter frozen encoder achieves 77.7% average accuracy (91.9% top-3) across 15 diverse tasks via linear probing, with systematic variation: 84.5% on physically grounded tasks (speaker recognition, seismology, RF fingerprinting) versus 70.0% on semantic tasks (music genre, language recognition). This reveals that principle-driven and scale-driven approaches offer complementary paths: physical principles enable efficient cross-modal transfer while naturally establishing the boundary between physical and semantic understanding.

2606.04104 2026-06-04 cs.SE cs.AI cs.CR Updated version

Proof-Carrying Agent Actions: Model-Agnostic Runtime Governance for Heterogeneous Agent Systems

Zexun Wang

Affiliations: Ond Holdings Inc

AI summary: Proposes PCAA, a runtime-neutral governance model that uses action certificates and five checkpoints to unify authorization and auditing across heterogeneous agent systems, and validates its portability and effectiveness in a reference implementation.

Comments 25 pages, 2 tables, 3 figures. Implementation-informed systems paper with bounded public validation

Abstract

Agent systems execute through runtimes with very different control points: local coding tools, framework SDKs, managed agent platforms, API gateways, and observer-only integrations. A high-risk action such as publishing data externally may therefore appear as a shell command in one runtime, a tool call in another, and a hosted session transition in a third. This makes it difficult to answer a basic governance question consistently: what action was authorized, under whose authority, with what approval semantics, and with what evidence after execution? This paper presents Proof-Carrying Agent Actions (PCAA), a runtime-neutral governance model centered on an action certificate rather than on a vendor-native session record. PCAA organizes control around five checkpoints: pre-action admissibility, action open, assumption capture, approval, and outcome closure. It binds these checkpoints to a portable action envelope, runtime and approval receipts, and replay-ready proof. The model is extended in two practical ways: the certificate is externality-aware, carrying boundary facts such as destination visibility and account provenance, and approval is described by explicit enforceability classes rather than by a single reviewed or unreviewed bit. We study the model through a reference implementation in a heterogeneous agent control plane and a disclosure-bounded evaluation protocol. On a protected benchmark expanded from 24 executable seeds to 96 traces across four runtime families, PCAA preserves route quality while exposing distinct failure modes under ablation. The paper contributes a systems formulation of runtime governance around certificate-bearing actions and an implementation-grounded account of how that formulation can remain portable under runtime churn without collapsing into vendor-specific control surfaces.

2606.04103 2026-06-04 cs.SD cs.AI cs.LG eess.AS Updated version

The Differentiable Auditory Loop (DAL): An ML Framework for Hyper-Personalized Hearing Aids

Alejandro Ballesta Rosen, Jason Mikiel-Hunter, Julian Maclaren, Jack Collins, Richard F. Lyon, Simon Carlile

Affiliations: Google Research Australia; Macquarie University

AI summary: Proposes the Differentiable Auditory Loop (DAL) framework, which ports the CARFAC model to JAX and optimizes a SEANet deep neural network to compensate for hearing loss using normal-hearing neural activity patterns as the reference, outperforming conventional hearing aid baselines on neural-representation and signal-fidelity metrics.

Abstract

Conventional hearing aids rely on fixed, frequency-dependent amplification and compression to manage reduced sensitivity, which often fails to provide sufficient listening support in complex environments, such as situations with multiple speakers (the "cocktail party" problem). To more comprehensively address the underlying encoding dysfunctions of hearing loss, we introduce the Differentiable Auditory Loop (DAL), a new open-source framework for personalized hearing aid design and fitting. Our first implementation of DAL incorporates CARFAC, a differentiable model of human cochlear function, which we ported to JAX, to optimize a deep neural network to match impaired auditory neural activity patterns with a normal-hearing reference. To build a hearing aid with the fine-grained spectro-temporal signal processing required, we adopt SEANet, a waveform-to-waveform fully convolutional UNet generator. We fine-tune the network by comparing the outputs of a CARFAC model fitted to normal hearing with those of a CARFAC model fitted to match each subject's individual hearing impairment. The comparison is done using loss functions derived from the respective CARFAC neural activity pattern (NAP) outputs and stabilized auditory images (SAIs), the latter providing a 2D representation that captures phase-insensitive temporal structure in the auditory nerve output. Through gradient descent, the SEANet model learns to both denoise the input and compensate for the hearing loss modelled by the impaired CARFAC model. Across neural-representation and signal-fidelity metrics, the DAL-optimized SEANet model outperformed the tested master hearing aid (MHA) baselines. The DAL framework provides a practical path toward model-based, machine-learning-driven personalization of hearing aid signal processing. Next steps include hardware deployment to enable real-world clinical testing.

2606.04095 2026-06-04 cs.CL cs.AI Updated version

POLARIS: Guiding Small Models to Write Long Stories

Rishanth Rajendhran, Jenna Russell, Mohit Iyyer, John Frederick Wieting

Affiliations: University of Maryland; Google; DeepMind

AI summary: Proposes the POLARIS training method, which combines LLM-judge rewards with human-reference injection so that a small 9B model matches the quality of a 27B model on long-story writing and shows length generalization.

Abstract

Small open-weight models struggle at long-form creative writing: their generated stories either fall far short of the requested length, or their quality significantly degrades as length increases, especially when compared to frontier models. We present POLARIS (Policy Optimization with LLM-as-a-judge rewards and Anchored-Reference Injection for Storywriting), a lower-compute GRPO recipe with two key ingredients: a frontier LLM judge with a structured Story Quality rubric as the online reward, and human-reference injection (HRI), where a teacher-forced human-written story serves as a high-reward anchor within each GRPO group. By applying our training recipe to Qwen3.5-9B, using a dataset of approximately 1.4K prompt-story pairs derived from 100 short-story anthologies and 4 A100 GPUs, we obtain POLARIS-9B. Across five benchmarks spanning in-distribution and out-of-distribution prompts and rubrics, POLARIS-9B is competitive with much larger open-weight models while following length instructions more closely. A blinded human evaluation confirms that POLARIS-9B is preferred to the base Qwen3.5-9B and on par with Qwen3.5-27B. Despite training only on stories up to 4k words, POLARIS-9B preserves quality on prompts requesting stories up to 3 times the training length, a regime where most open-weight models degrade substantially in quality, length adherence, or both. More broadly, our results suggest that length generalization is a meaningful stress test for creative-writing models and a useful lens for distinguishing otherwise close models.
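
The role of the human-written anchor inside a GRPO group can be illustrated with the group-relative advantage computation. This is a sketch under assumptions: it uses the common GRPO normalization (reward minus group mean, divided by group standard deviation), treats the judge scores as given numbers, and does not model teacher forcing or the policy update. The function names and the `eps` constant are hypothetical, and the abstract does not state that POLARIS computes advantages exactly this way.

```python
import statistics


def group_advantages(rewards, eps=1e-8):
    """Standardize rewards within one group: (r - mean) / (std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def advantages_with_reference(sampled_rewards, reference_reward):
    """Append the human reference's reward to the group before standardizing."""
    return group_advantages(list(sampled_rewards) + [reference_reward])
```

One arithmetic consequence is visible directly: if every sampled story receives the same judge score, the group-relative advantages are all zero and the group carries no learning signal, whereas adding a higher-scoring reference makes the reference's advantage positive and the samples' advantages negative.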

2606.04075 2026-06-04 cs.LG cs.AI cs.CL cs.CR cs.CY Updated version

Large Language Models Hack Rewards, and Society

Wei Liu, Xinyi Mou, Hanqi Yan, Zhongyu Wei, Yulan He

Affiliations: King’s College London; Fudan University; The Alan Turing Institute

AI summary: Studies "societal hacking", in which large language models trained with reinforcement learning exploit loopholes in reward functions; experiments in the SocioHack sandbox show that models discover and exploit loopholes in societal rules and that existing safeguards have limited effect.

Comments 14 pages, 9 figures, 7 tables

Abstract

Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs) to learn from rewards. We observe that societal regulations are structurally similar to reward functions. They define measurable outcomes, thresholds, and exceptions, while often leaving institutional intent only partially specified. We hypothesise that the RL training process may exploit these gaps and therefore ask whether models' well-known tendency to hack reward functions during RL can scale into a more consequential failure mode named societal hacking: discovering loopholes in the rules society runs on. To study this phenomenon, we introduce SocioHack, a sandbox of 72 societal environments, and find that within these environments, reward hacking naturally emerges and leads to regulatory loophole discovery. Models learn to hack the social rules and generate strategies that remain technically compliant while defeating regulatory intent, and current LLM safeguards provide only limited mitigation. Therefore, collecting in-the-wild feedback for model training requires greater caution, and we need a next-generation post-training paradigm for safely iterating LLMs in real society.

2606.04074 2026-06-04 cs.LG cs.AI cs.IT math.IT Updated version

Adaptive Patching Is Harder Than It Looks For Time-Series Forecasting

Federico Zucchi, Yi Xie, Chao Zhang, Keyuan Luo, Thomas Lampert, Ziyue Li

Affiliations: ICube, University of Strasbourg, Illkirch-Graffenstaden, France; Technical University of Munich; FinTech Thrust, The Hong Kong University of Science and Technology (Guangzhou); Computer Science Department, Hainan Bielefeld University of Applied Sciences; Cephalgo, Strasbourg, France; Heilbronn Data Science Center, Munich Data Science Institute

AI summary: Through theoretical analysis and experiments, examines whether adaptive patching outperforms tuned uniform patching in time-series Transformers, finding that uniform baselines are competitive on standard benchmarks and that the advantages of adaptive patching are limited and depend on the specific method and dataset.

Abstract

Adaptive patching is a recent and compelling proposal for time-series Transformers: allocate finer patches where the sequence looks locally informative. This paper asks under what conditions a content-adaptive patching operator should outperform a tuned uniform one. Local heterogeneity alone is not enough: under pointwise forecasting losses, a complex-looking region is not automatically one where finer patching reduces the loss. We model patching as a budgeted bitrate allocation and derive an explicit threshold that a dynamic patching rule must satisfy to beat a well-tuned uniform baseline, then bound the achievable improvement both locally (a quadratic surrogate) and globally (a strong-convexity bound under the model's assumptions). Two structural results follow: without a coupling constraint, scalar local complexity cannot produce a non-uniform optimum under a common loss landscape; and once the backbone is trained to its representation-aware optimum, the alignment gain collapses around a well-tuned uniform patch size. To test these predictions, we run a controlled isolation study on three representative architectures, replacing each adaptive mechanism with a uniform patch-size sweep while keeping the backbone, data, and training protocol fixed. On standard long-horizon forecasting benchmarks, the validation-selected uniform baseline is competitive with the dynamic counterpart, with per-setting effects concentrated near zero and no consistent directional advantage once results are aggregated by dataset. The larger gains we do observe are method- and dataset-specific. Adaptive patching should therefore be evaluated against a tuned uniform baseline; its value depends on whether a cheap and reliable routing signal can identify where finer patches actually reduce forecasting loss.

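2606.04073 2026-06-04 cs.LG cs.AI stat.ML (updated version)

TPA-AD: A Two-Stage Pseudo Anomaly-Guided Method for Bearing Time-Series Anomaly Detection

Xiancheng Wang, Zhibo Zhang, Ran Li, Rui Wang, Minghang Zhao, Shisheng Zhong, Lin Wang

Affiliations: CQSF.com (given in Chinese as Chongqing Normal University); Huadian University (given in Chinese as Harbin University of Science and Technology)

AI summary: Proposes TPA-AD, a two-stage pseudo-anomaly-guided method that uses a reconstruction model with per-feature error control to generate pseudo-anomalous windows near the normal boundary, then combines contrastive learning with KNN for unsupervised bearing time-series anomaly detection. It performs stably on bearing fault and degradation datasets and shows some generalization.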

Abstract

This paper proposes Two-stage Pseudo Anomaly-guided Anomaly Detection (TPA-AD), a method for axle-box bearing time-series anomaly detection (TSAD) under the setting where only normal samples are available for training. The method first generates pseudo-anomalous windows near the normal boundary using a reconstruction model and per-feature target-error control. It then learns anomaly-sensitive representations through contrastive learning between normal and pseudo-anomalous windows, and finally produces window-level and point-level anomaly scores using k-nearest neighbors (KNN). Compared with existing methods that rely on known fault categories, real anomaly priors, or random anomaly injection, TPA-AD improves the separability of the normal boundary by constructing pseudo-anomalies in boundary neighborhoods and can jointly handle continuous and discrete features in mixed-variable scenarios. The main experiments are conducted on bearing fault detection datasets and degradation-process datasets, with an additional exploratory extension on 13 public TSAD datasets. The results show that the proposed method yields relatively stable anomaly responses, is sensitive to degradation evolution, and demonstrates a certain degree of broader applicability on public TSAD benchmarks and real high-speed-train-related bearing data.
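
Illustrative sketch (not the authors' code): the final KNN scoring step can be pictured as measuring how far a window's embedding lies from the embeddings of normal training windows. The function name, the use of the mean Euclidean distance to the k nearest neighbors, and the toy data are all assumptions made for this example; the abstract does not specify the distance or the aggregation.

```python
import numpy as np

def knn_anomaly_scores(normal_embeddings, query_embeddings, k=5):
    """Score each query by its mean Euclidean distance to its k nearest normal embeddings."""
    # Pairwise differences: (num_queries, num_normal, dim)
    diffs = query_embeddings[:, None, :] - normal_embeddings[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep the k smallest distances per query and average them.
    nearest = np.sort(dists, axis=1)[:, :k]
    return nearest.mean(axis=1)

# Toy data (hypothetical): 200 "normal" embeddings around the origin.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(200, 8))
# One query inside the normal cloud, one far away from it.
queries = np.vstack([np.zeros(8), np.full(8, 6.0)])
scores = knn_anomaly_scores(normal, queries, k=5)
```

A larger score means the window sits farther from the normal set; any decision threshold would have to be chosen on normal validation data.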


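2606.04067 2026-06-04 cs.CR cs.AI (updated version)

Need to Know: Contextual-Integrity-Grounded Query Rewriting for Privacy-Conscious LLM Delegation

Xinyue Huang, Xiaochun Cao, Wenyuan Yang

Affiliations: Sun Yat-sen University

AI summary: Addresses privacy leakage in queries delegated to LLMs with a query-rewriting framework grounded in Contextual Integrity (CI). A rewriter trained with CI-guided reinforcement learning preserves task-critical information while suppressing non-essential sensitive disclosure, achieving the best privacy-utility trade-off.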

Abstract

As LLMs become increasingly woven into everyday workflows, user queries sent to cloud-hosted LLMs routinely mix task-essential content with sensitive disclosures that the task does not need. Type-based PII redaction is context-agnostic and may raise two issues: over-disclosing untyped sensitive context and over-removing answer-bearing spans. We recast privacy-preserving query rewriting under Contextual Integrity (CI): a span should be forwarded only if it is necessary for the task. We introduce DelegateCI-Bench, the first task-based Contextual Integrity benchmark for privacy-conscious delegation, comprising 3,167 samples that combine high-quality synthetic data spanning 11 tasks and 20 task types, WildChat-based real user queries, and a medical challenge set with dense sensitive information. Building on this benchmark, we propose a CI-guided reinforcement learning framework that converts essential and non-essential sensitive spans into verifiable optimization signals, and train a query rewriter to preserve task-critical information while suppressing unnecessary sensitive disclosure. Experiments show that our learned rewriter achieves the best privacy-utility tradeoff, with up to +10.1 average utility over on-device baselines.


2606.04063 2026-06-04 cs.LG cs.AI (updated version)

LLM Compression with Jointly Optimizing Architectural and Quantization choices

Hoang-Loc La, Truong-Thanh Le, Amir Taherkordi, Phuong Hoai Ha

Affiliations: UiT The Arctic University of Norway; University of Oslo, Norway

AI summary: Proposes a differentiable neural architecture search framework that jointly optimizes the architectural configuration and mixed-precision quantization of large language models, achieving a better accuracy-latency trade-off.

Abstract

Deploying large language models (LLMs) is challenging due to their significant memory and computational requirements. While some methods address this by developing small or tiny language models from scratch, these approaches demand extensive GPU training. Compressing pre-trained LLMs for edge devices offers a compelling alternative. Beyond pruning and quantization, Neural Architecture Search (NAS) enables effective compression, yet prior NAS approaches often limit the search space and decouple architecture from quantization. We introduce a differentiable NAS framework that explores the entire space and jointly optimizes architectural configurations alongside mixed-precision quantization for linear layers of LLMs. Experiments demonstrate superior accuracy-latency trade-offs: our models achieve up to 1.4x faster inference than sequential NAS-then-quantization baselines at comparable accuracy, or up to 6% higher average accuracy across seven reasoning tasks at equivalent latency.


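2606.04057 2026-06-04 cs.SE cs.AI cs.LG (updated version)

The Invisible Lottery: How Subtle Cues Steer Algorithm Choice in LLM Code Generation

Akanksha Narula, Mofasshara Binte Rafique, Laurent Bindschaedler

Affiliations: University of Washington; Google Research

AI summary: Through large-scale controlled experiments, finds that incidental cues in the prompt (such as contextual words or metadata) systematically shift the distribution of algorithm families an LLM chooses during code generation, affecting performance, security, and maintainability. Naming the algorithm directly is the most reliable mitigation.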

Abstract

Large language models (LLMs) now generate substantial production code, often for tasks with multiple valid algorithmic solutions. Incidental prompt cues, meaning contextual words or metadata outside the task specification, can steer which algorithm the model selects, even when all outputs pass the same tests. Prompt sensitivity is well studied as a tool to improve output quality. Here, output policy means algorithm choice under fixed correctness. We define algorithm steering as cue-induced shifts in algorithm-family distributions and run 46,535 controlled experiments across 11 tasks, 19 cue types (18 channels plus a memoization semantic-vs-surface ablation that preserves meaning while changing typography and punctuation), and 15 model configurations. We find large, systematic shifts in algorithm-family distributions (up to 100 pp), largely consistent with cue semantics, including in applied tasks such as rate limiting. Direct algorithm naming is the most reliable mitigation we tested. Accidental context therefore creates an "invisible lottery" over performance, security, and maintainability.


2606.04053 2026-06-04 cs.LG cs.AI (updated version)

A Goal-Set Characterization of Task Composition in the Boolean Task Algebra

Eduardo Terrés-Caballero, Herke van Hoof

Affiliations: Informatics Institute, University of Amsterdam; AMLab, University of Amsterdam

AI summary: Simplifies task composition in the Boolean Task Algebra with a goal-set approach, showing that in deterministic MDPs the optimal extended Q-value functions are determined by the universal and empty tasks, which reduces learning cost.

Abstract

The Boolean Task Algebra (BTA) provides a principled framework for zero-shot task composition in reinforcement learning by equipping goal-reaching tasks with Boolean operations. We revisit its structural assumptions and formalize a collapse in the space of optimal extended Q-value functions: in deterministic MDPs, every such function is fully determined by the universal and empty tasks. This makes the logarithmic set of base tasks proposed in the original BTA formulation redundant. Building on this observation, we introduce a goal-set-based composition method that performs logical operations on goal sets and reconstructs composed value functions by selecting slices from the universal and empty value functions. This reduces learning costs for standard BTA and reduces composition time for both BTA and Skill Machines, while preserving policy performance. Experiments across tabular, visual, function-approximation, and continuous-control domains show that learning additional base tasks does not yield better performance. Finally, we study the stochastic setting and provide a counterexample showing that this collapse need not hold, that is, optimal composition may require accounting for exponentially many policies in the number of goals. Code is available at https://github.com/EduardoTerres/bta_paper.
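
Illustrative sketch (not the authors' code, which is at the repository linked above): one way to read "logical operations on goal sets" plus "selecting slices from the universal and empty value functions" is shown below. The array layout `Q[state, goal, action]`, the function name `compose_q`, and the rule "copy the universal slice for goals inside the set, the empty-task slice otherwise" are assumptions made for this example.

```python
import numpy as np

def compose_q(q_universal, q_empty, goal_set):
    """Build an extended Q-array Q[s, g, a] for the task whose goals are `goal_set`.

    Assumption of this sketch: goals inside the set take the slice of the
    universal task, goals outside it take the slice of the empty task.
    """
    n_goals = q_universal.shape[1]
    in_set = np.zeros(n_goals, dtype=bool)
    for g in goal_set:
        in_set[g] = True
    # Broadcast the per-goal mask over states and actions.
    return np.where(in_set[None, :, None], q_universal, q_empty)

# Boolean operations act directly on goal sets.
all_goals = frozenset(range(4))
task_a, task_b = frozenset({0, 1}), frozenset({1, 2})
goals_and = task_a & task_b          # conjunction
goals_or = task_a | task_b           # disjunction
goals_not_a = all_goals - task_a     # negation

# Toy value arrays (hypothetical): 3 states, 4 goals, 2 actions.
q_universal = np.ones((3, 4, 2))
q_empty = np.zeros((3, 4, 2))
q_and = compose_q(q_universal, q_empty, goals_and)
```

Under this reading, only two value functions need to be learned and every composed task is assembled by masking, which is the cost reduction the abstract describes for the deterministic case.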


2606.04051 2026-06-04 cs.LG cs.AI cs.CR (updated version)

RUBAS: Rubric-Based Reinforcement Learning for Agent Safety

Xian Qi Loye, Qinglin Su, Zhexin Zhang, Shiyao Cui, Qi Zhu, Fei Mi, Hongning Wang, Minlie Huang

Affiliations: The Conversational AI (CoAI) group, DCST, Tsinghua University; Huawei Noah’s Ark Lab

AI summary: Proposes RUBAS, a framework that decomposes agent behavior into rubrics along four dimensions to provide fine-grained rewards, using reinforcement learning to improve tool-use safety while preserving task completion.

Abstract

The evolution of LLMs into tool-enabled agents creates a new class of safety challenges associated with real-world execution rather than simple text generation. Existing alignment methods often rely on coarse refusal signals or static supervision, making it difficult to balance safety with useful tool execution across diverse agentic risks. We introduce RUBAS, a rubric-based reinforcement learning framework for agent safety. RUBAS decomposes agent behavior into four dimensions: tool-use safety, argument safety, response safety, and helpfulness. These structured rubrics provide fine-grained and interpretable rewards over complete agent trajectories, enabling reinforcement learning to optimize safe tool use while preserving task completion. Extensive experiments across multiple agent safety benchmarks and models show that RUBAS improves safety over standard alignment baselines, reduces tool-grounded hallucinations, and maintains competitive utility. Our results suggest that multi-dimensional rubric rewards provide an effective training signal for aligning LLM agents in safety-critical tool-use settings.


2606.04050 2026-06-04 cs.LG cs.AI (updated version)

LiftQuant: Continuous Bit-Width LLM via Dimensional Lifting and Projection

Liulu He, XuanAng Liu, Juntao Liu, Taolue Feng, Ting Lu, Chunsheng Gan, Zhiyv Peng, Yuan Du, Huanrui Yang, Yijiang Liu, Li Du

Affiliations: Nanyang Technological University

AI summary: Proposes LiftQuant, a framework whose lift-then-project mechanism provides quasi-continuous bit-width control so that a model can be fitted precisely to a memory budget. A 70B model compressed to 2.4 bits outperforms existing 2-bit models.

Comments ICML 2026 Spotlight

Abstract

Existing quantization methods are fundamentally limited by rigid, integer-based bit-widths (e.g., 2-bit or 3-bit), resulting in a "deployment gap" where Large Language Models cannot be optimally fitted to specific memory budgets. To bridge this gap, we introduce LiftQuant, a novel framework that enables continuous bit-width control for true Pareto-optimal deployment. The core innovation is a "lift-then-project" mechanism that approximates low-dimensional weight vectors by projecting a simple 1-bit lattice from a higher-dimensional "lifted" space. Crucially, the effective bit-width is determined simply by the ratio of the lifted dimension to the original dimension, which allows the bit-width to be tuned quasi-continuously because the dimension is a flexible structural parameter. This projection generates a structured yet non-uniform codebook, capturing the expressive power of Vector Quantization (VQ). While offering advantages over VQ, LiftQuant's decoding path relies solely on linear transformations and 1-bit uniform quantizers, which keeps it hardware-friendly. This flexibility is transformative: LiftQuant enables a 70B LLM to be compressed to 2.4 bits to precisely fit a 24GB GPU, where its performance significantly surpasses state-of-the-art 2-bit models fitted on the same device. Our code and checkpoints are available at https://github.com/Heliulu/LiftQuant.
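
Illustrative arithmetic (not taken from the paper's code): the abstract states that the effective bit-width equals the ratio of the lifted dimension to the original dimension. The sketch below works through that ratio and the resulting weight memory. The dimensions 12 and 5 are hypothetical values chosen only because they give 2.4; the memory estimate counts quantized weights alone (1 GB taken as 10^9 bytes) and ignores projection matrices, activations, and the KV cache.

```python
def effective_bits(lifted_dim, original_dim):
    """Bits per original weight when each lifted coordinate stores 1 bit."""
    return lifted_dim / original_dim

def weight_memory_gb(num_params, bits_per_weight):
    """Memory for the quantized weights only, in GB (10^9 bytes)."""
    return num_params * bits_per_weight / 8 / 1e9

# Hypothetical dimensions: 5 original weights represented by 12 one-bit codes.
bits = effective_bits(lifted_dim=12, original_dim=5)

# Compare integer bit-widths with the fractional one for a 70B-parameter model.
for b in (2.0, bits, 3.0):
    print(f"{b:.1f} bits -> {weight_memory_gb(70e9, b):.2f} GB")
```

Under these assumptions the weights occupy about 17.5 GB at 2 bits, 21 GB at 2.4 bits, and 26.25 GB at 3 bits, which shows why a fractional bit-width can use a 24GB budget that 2-bit underfills and 3-bit exceeds.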


2606.04048 2026-06-04 cs.LG cs.AI (updated version)

Unlocking Feature Learning in Gated Delta Networks at Scale

Yifeng Liu, Quanquan Gu

Affiliations: University of California Los Angeles

AI summary: Derives scaling rules for Gated Delta Networks that enable zero-shot transfer of hyperparameters (especially the learning rate) across model widths, validating the effectiveness of Maximal Update Parametrization in structured state-space models.

2606.04046 2026-06-04 cs.CV cs.AI cs.CL cs.LG cs.RO (updated version)

Dive into the Scene: Breaking the Perceptual Bottleneck in Vision-Language Decision Making via Focus Plan Generation

Boyuan Xiao, Bohong Chen, Yumeng Li, Ji Feng, Yao-Xiang Ding, Kun Zhou

Affiliations: University of Science and Technology of China; Tsinghua University

AI summary: Proposes SceneDiver, a coarse-to-fine focus plan generation method that incrementally builds a scene graph and decomposes the task, reducing visual hallucinations and improving vision-language and vision-language-action models on embodied decision-making tasks.

Comments Accepted at ICML 2026

Abstract

In embodied vision-language decision making tasks such as robotic manipulation and navigation, Vision-Language and Vision-Language-Action Models (VLMs & VLAs) are powerful tools with different benefits: VLMs are better at long-term planning, while VLAs are better at reactive control. However, their performance is limited by the same perceptual bottleneck: visual hallucinations arise due to the models' inability to distinguish task-relevant objects from distractors. In principle, accurately identifying and focusing on critical objects while filtering out irrelevant ones is the key to breaking this limitation. A straightforward solution is one-step focus: directly attending to essential objects. However, this approach proves ineffective because effective focus inherently requires deep scene understanding. To this end, we propose SceneDiver, a coarse-to-fine focus plan generation method that leverages the long-term planning abilities of VLMs: it first constructs a holistic scene graph to establish initial comprehension, then progressively decomposes the task into simpler sub-problems through an iterative cycle of recognition, understanding, and analysis. To enable reactive control, we also design a lightweight adapter for distilling the deliberate focus ability into VLAs. Evaluations on standard embodied AI benchmarks confirm that our method substantially reduces visual hallucinations for both VLMs and VLAs, while preserving computational efficiency in tasks requiring fast execution. Our code and data are released at: https://future-item.github.io/SceneDiver.


2606.04045 2026-06-04 cs.LG cs.AI (updated version)

Bayes-Sufficient Representations in Supervised Learning

Vasileios Sevetlidis

Affiliations: Athena Research Center, Kimmeria Campus, Xanthi, Greece; Democritus University of Thrace, Vas. Sofias Campus, Xanthi, Greece; International Hellenic University, Serres, Greece

AI summary: Defines Bayes-sufficiency of a representation with respect to a loss function in supervised learning, introduces the notion of the Bayes quotient, and shows that a minimal sufficient representation is equivalent to the Bayes quotient. Experiments separate sufficiency, minimality, and retention of non-required information.

Abstract

Representation learning is often described as preserving the information in an input that is relevant for prediction. This work asks what relevance means for a fixed supervised decision problem. A representation is defined to be Bayes-sufficient for a joint distribution and loss if some prediction head can use it to implement a Bayes-optimal action rule. This makes the target information loss-dependent. In the almost-surely unique Bayes-action case, the relevant object is a Bayes quotient, which identifies inputs that require the same Bayes-optimal action. A representation is sufficient when it refines this quotient, and Bayes-minimal when it is informationally equivalent to it. The framework connects naturally to property elicitation: zero-one loss requires the Bayes class, squared loss the conditional mean, Brier loss the conditional probability in binary prediction, and log loss or strictly proper scoring rules the predictive distribution. Controlled finite experiments, learned neural bottleneck experiments, and a real-data iNaturalist taxonomic refinement experiment illustrate the distinction between sufficiency, minimality, and retained non-required information. For a fixed supervised problem, the distribution and the loss determine the Bayes action, the Bayes action determines the quotient, and the quotient determines the minimal information required for Bayes-optimal prediction.
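
Illustrative sketch (a toy example, not from the paper): the loss dependence of the Bayes quotient can be seen on a binary problem with four inputs. The posteriors below are hypothetical. Under zero-one loss the Bayes action is the more probable class, under squared loss it is the conditional mean, and inputs are grouped whenever their Bayes actions coincide. Ties at probability 0.5 are avoided so that the Bayes action is unique.

```python
from collections import defaultdict

# Hypothetical posteriors P(Y = 1 | x) for four inputs of a binary problem.
posterior = {"x1": 0.9, "x2": 0.7, "x3": 0.2, "x4": 0.7}

def bayes_action_zero_one(p):
    """Bayes action under zero-one loss: the more probable class."""
    return int(p > 0.5)

def bayes_action_squared(p):
    """Bayes action under squared loss: the conditional mean E[Y | x], equal to p here."""
    return p

def quotient(action):
    """Group inputs that share the same Bayes action."""
    cells = defaultdict(set)
    for x, p in posterior.items():
        cells[action(p)].add(x)
    return sorted(sorted(cell) for cell in cells.values())

q01 = quotient(bayes_action_zero_one)   # coarse: only the predicted class matters
qsq = quotient(bayes_action_squared)    # finer: the conditional mean matters
```

In this toy case the zero-one quotient has two cells while the squared-loss quotient has three, so a representation that merges x1 with x2 would be sufficient for the first loss but not for the second.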


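2606.04040 2026-06-04 cs.SD cs.AI eess.AS (updated version)

Channel-Oriented Design for EEG-to-Music Reconstruction

Jiaxin Qing, Junwei Lu, Lexin Li

Affiliations: UC Berkeley; Harvard University

AI summary: To address EEG signals that are weak and vulnerable to noise and channel variability, proposes a channel-oriented design (channel-wise tokenization, multi-view self-distillation, and data augmentation) that achieves stable alignment to a semantic music space within an encoding-alignment-decoding pipeline and substantially improves reconstruction performance.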

Abstract

Brain-computer interfaces aim to decode naturalistic stimuli from neural signals, yet most progress to date has focused on vision and language. In this article, we study a more challenging but far less explored setting, EEG-to-music reconstruction, where signals are weak, distributed, and highly susceptible to noise and channel variability. Our central finding is that early channel mixing destroys weak but discriminative EEG signals. To address this, we propose a channel-oriented design with three key components. Specifically, channel-wise tokenization treats each electrode as an explicit token to retain spatially localized neural evidence, channel-wise multi-view self-distillation enforces consistency across temporal crops and random channel subsets to learn robust and distributed representations, and channel-wise data augmentation introduces structured channel dropout to improve invariance to noise, artifacts, and missing electrodes. Together, these components preserve weak yet informative signals across channels and enable stable alignment to a semantic music representation space. We integrate this channel-oriented design within an encoding-alignment-decoding pipeline for EEG-to-music reconstruction. Theoretically, we characterize when preserving channel-level structure leads to improved alignment. Empirically, we compare with a range of state-of-the-art baselines and demonstrate consistent and significant performance gains.


2606.04039 2026-06-04 cs.NE cs.AI cs.LG (updated version)

Beyond Static Priors: Dynamic Neural Guidance for Large-Scale Ant Colony Optimization

Dat Thanh Tran, Van Khu Vu, Yining Ma

Affiliations: Center for AI Research; VinUniversity; College of Engineering and Computer Science; Laboratory for Information and Decision Systems; Massachusetts Institute of Technology

AI summary: Proposes DyNACO, a framework that provides dynamic neural guidance by periodically observing the pheromone distribution and the incumbent solution. Paired with a perturbation-based ACO backend and a scope-restricted refinement mechanism, it scales to 100,000 nodes on TSP and outperforms neural baselines, and on CVRP it consistently improves the unguided baseline with less than 1% neural overhead.

Comments Accepted at KDD 2026

DOI: 10.1145/3770855.3817893

Abstract

Neural-guided Ant Colony Optimization (ACO) suffers from a fundamental training-inference misalignment: policies are typically trained to generate static priors (e.g., heatmaps), yet deployed to guide iterative, long-horizon search processes. In this paper, we present DyNACO, a novel framework that achieves dynamic neural guidance by periodically observing the pheromone distribution and the incumbent solution. To make DyNACO tractable at scale, we pair the policy with a perturbation-based ACO backend and a scope-restricted refinement mechanism that jointly ensure efficacy and stable credit assignment. On TSP, DyNACO scales to 100,000-node instances and outperforms neural baselines while often reducing total runtime compared to the unguided solver. We extend DyNACO to CVRP via a capacity-aware backend, consistently improving the unguided baseline with less than 1% neural overhead. We further provide in-depth analysis validating the model's generalization capabilities and elucidating why dynamic guidance outperforms static priors. Our work underscores the necessity of aligning neural training with iterative search dynamics in learning-guided optimization. The code is available at https://github.com/shoraaa/DyNACO.


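2606.04035 2026-06-04 cs.SE cs.AI cs.LG (updated version)

Gravity-Aware Hierarchical Routing for Lightweight SensorLLM in Human Activity Recognition

Hao Li, Mingrui Zheng, Yasuyuki Tahara, Yuichi Sei

Affiliations: Department of Informatics, Graduate School of Informatics and Engineering; Graduate School of Information Science and Technology

AI summary: To address the degradation of lightweight SensorLLM on static activity recognition, proposes a lightweight post-alignment adaptation based on gravity-aware hierarchical routing that uses statistical cues and soft routing to substantially improve macro-F1 on static classes.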

Abstract

Recent studies on sensor-language alignment have shown that two-stage frameworks can improve the semantic modeling ability of wearable-sensor human activity recognition (HAR), where SensorLLM-style methods first perform motion-to-language alignment and then fine-tune the model for downstream tasks. However, our experiments reveal a consistent failure mode when the Stage 2 backbone is compressed to a compact model such as TinyLlama: recognition of dynamic activities remains relatively strong, while the discrimination of low-motion static classes such as standing, sitting, and lying degrades substantially. To address this issue, we propose a gravity-aware hierarchical routing head as a lightweight post-alignment adaptation built on top of an already aligned model, rather than a new large-scale pretraining framework. The method uses the per-channel mean and standard deviation from the Chronos tokenizer state to extract statistical cues related to posture and gravity direction, and adaptively combines a static expert and a full expert through soft routing, together with a load-balancing loss for stable training. On the MHealth dataset, this design significantly improves macro-F1 with minimal parameter overhead, and the gains are concentrated mainly on static classes while preserving strong performance on dynamic activities. As a first arXiv disclosure, the current paper reports results on a single dataset only, with the goal of highlighting the core method and laying the groundwork for broader evaluation in future work.


2606.04010 2026-06-04 q-bio.NC cs.AI (updated version)

The Variance Brain Foundation Models Forgot: Third-Order Statistics Predict Cognition Where Billion-Parameter Models Fail

Giovanni Marraffini, Gabriel Mahuas, Trinidad Borrell, Victoria Shevchenko, Demian Wassermann

Affiliations: Inria Saclay Île-de-France, CEA, Université Paris-Saclay, Palaiseau, France; Sigma Nova; Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM; Forschungszentrum Jülich

AI summary: Finds that pretraining of brain foundation models (BFMs) mainly captures the dominant variance components of the fMRI signal while missing the higher-order structure that predicts cognition, and that a linear pipeline based on the third-order co-skewness tensor surpasses existing BFMs without any pretraining.

Comments 37 pages, 16 figures, 23 tables

Abstract

Brain foundation models (BFMs) are self-supervised Transformers pretrained on fMRI data. We posit that these models should capture each subject's cognitive performance from their fMRI signal. Yet across three state-of-the-art BFMs and every readout we test, they predict cognition worse than a linear regression from the ~80K parameters of the functional connectivity matrix (FC). The gap widens with scale: BrainLM's 650M model predicts cognition worse than its 111M. We attribute this to a variance allocation problem: BFM pretraining captures the variance components that dominate fMRI but not the higher-order structure that predicts cognition. Our per-cumulant analysis of the reconstructed signal shows that the second-order covariance is partially preserved, while the third-order co-skewness tensor is largely destroyed. To recover what BFMs lose, we design a linear pipeline that projects the fMRI signal into the subspace that best preserves its co-skewness and computes FC there. This exceeds raw FC and every pretrained BFM on every dataset and parcellation we test, outperforming prior state-of-the-art under controlled evaluation with no pretraining and no GPU. We recover the raw-FC ceiling on BrainLM's forward pass by finetuning with a loss targeted at this same subspace. This shows that the bottleneck is the pretraining objective, not the architecture or the model size.
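
Illustrative sketch (not the authors' pipeline): the third-order co-skewness tensor mentioned above can be computed as the third central cross-moment of a multivariate signal. The function name, the `(time, regions)` layout, the absence of any normalization by standard deviations, and the toy data are assumptions of this example; the subspace projection described in the abstract is not reproduced here.

```python
import numpy as np

def coskewness(signal):
    """Third central cross-moment tensor of a (time, regions) array.

    Entry [i, j, k] is the time average of the product of the centered
    signals of regions i, j and k. No division by standard deviations is
    applied; that normalization choice is an assumption of this sketch.
    """
    centered = signal - signal.mean(axis=0, keepdims=True)
    n_time = centered.shape[0]
    return np.einsum("ti,tj,tk->ijk", centered, centered, centered) / n_time

# Toy skewed data (hypothetical, not fMRI): 500 time points, 4 regions.
rng = np.random.default_rng(0)
toy = rng.exponential(size=(500, 4))
tensor = coskewness(toy)
```

The tensor is symmetric under any permutation of its three indices, and it grows cubically with the number of regions, which is one reason to work in a low-dimensional subspace.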


2606.04008 2026-06-04 eess.SP cs.AI (updated version)

Neural Radiated-Noise Fields for Unmanned Underwater Vehicle Noise Spectrum Prediction in Three-Dimensional Scenes

Yan Wu, Yang Yang, Jun Fan, Bin Wang

Affiliations: Key Laboratory of Marine Intelligent Equipment and System, Ministry of Education, Shanghai Jiaotong University, Shanghai

AI summary: Proposes the neural radiated-noise field (NRNF), which represents the UUV radiated-noise spectrum as a continuous function of three-dimensional position, yaw angle, and frequency, enabling query-based prediction at arbitrary spatial locations. On a lake-trial dataset the average prediction error is 3.5 dB.

Abstract

Radiated noise in unmanned underwater vehicles (UUVs) is an important indicator for characterizing acoustic signatures and evaluating platform performance. To address the strong dependence of traditional physics-based modeling and numerical simulation methods on target structural information and environmental boundary conditions, and their inability to achieve continuous spatial spectrum-response modeling in three-dimensional scenes, this paper proposes a neural radiated-noise field (NRNF). An NRNF represents the UUV radiated-noise spectrum as a continuous function of the three-dimensional UUV position, the three-dimensional hydrophone position, the UUV yaw angle, and the frequency, enabling query-based prediction at arbitrary spatial locations. The proposed method employs sinusoidal encoding for position and frequency, and introduces a learnable three-dimensional scene feature grid to explicitly represent environmental structure and propagation effects. A spectrum-prediction dataset is constructed from lake trials, and the proposed model is evaluated under three settings: horizontal extrapolation, depth extrapolation, and cross-run generalization. Results show that the NRNF achieves an average prediction error of 3.5 dB in the 50 to 5000 Hz band. Horizontal extrapolation is easiest, depth extrapolation is the most challenging, and cross-run generalization is of intermediate difficulty. Further ablation results demonstrate that the scene feature grid significantly improves the prediction stability and spatial generalization of the model.
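
Illustrative sketch (not the authors' code): the abstract says position and frequency are passed through a sinusoidal encoding before entering the network. A common form of such an encoding is shown below. The frequency schedule (powers of two times pi), the number of bands, the pre-normalization of inputs to roughly [-1, 1], and the choice to append the yaw angle unencoded are all assumptions of this example.

```python
import numpy as np

def sinusoidal_encode(x, num_bands=4):
    """Encode each coordinate c as sin(2^k * pi * c) and cos(2^k * pi * c), k = 0..num_bands-1."""
    x = np.asarray(x, dtype=float)
    freqs = (2.0 ** np.arange(num_bands)) * np.pi      # shape (num_bands,)
    angles = x[..., None] * freqs                      # shape (..., dims, num_bands)
    feats = np.concatenate([np.sin(angles), np.cos(angles)], axis=-1)
    return feats.reshape(*x.shape[:-1], -1)            # shape (..., dims * 2 * num_bands)

# Hypothetical normalized query: UUV position (3), hydrophone position (3), frequency (1).
pos_freq = np.array([[0.1, -0.2, 0.05, 0.4, 0.3, -0.1, 0.6]])
yaw = np.array([[0.25]])
# Encoded position and frequency, with the yaw angle appended as-is.
features = np.concatenate([sinusoidal_encode(pos_freq), yaw], axis=-1)
```

Such a feature vector would then be fed, together with features looked up from the scene grid, to the network that predicts the spectrum level at the queried frequency.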


2606.03995 2026-06-04 cs.LG cs.AI q-bio.QM (updated version)

Early Detection of Alzheimer's Disease Using Explainable Machine Learning on Clinical Biomarkers: A Multi-Class Classification Study Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) Dataset

Afshan Hashmi

Affiliations: TRDC, Tuwaiq Academy

AI summary: Uses an XGBoost classifier on eight clinical features from the ADNI dataset (MMSE, CDR Global, CDR-SB, MoCA, FAQ, age, sex, education) for three-class detection (normal cognition, mild cognitive impairment, Alzheimer's disease), with SMOTE for class imbalance, Optuna for hyperparameter optimization, and SHAP for explainability. It reaches macro AUC 0.982 and accuracy 0.943 on the test set and reveals clinically plausible feature-importance patterns.

Abstract

Background: Alzheimer's disease (AD) affects over 55 million people worldwide. Accurate, interpretable detection of normal cognition (NC), mild cognitive impairment (MCI), and AD from routine clinical assessments remains a critical unmet need. Methods: An XGBoost classifier was developed for three-class detection using eight clinical features from the Alzheimer's Disease Neuroimaging Initiative (ADNI): MMSE, CDR Global, CDR Sum of Boxes (CDR-SB), MoCA, FAQ, age, sex, and education. Hyperparameters were optimised using Optuna (50 trials); class imbalance was addressed with SMOTE. Performance was evaluated by macro AUC-ROC with 1,000-iteration bootstrap 95% confidence intervals, macro F1, balanced accuracy, and Cohen's kappa. SHAP values provided feature-level explainability. Results: The dataset comprised 1,641 baseline subjects (608 NC, 767 MCI, 266 AD). On five-fold cross-validation, mean macro AUC was 0.983 (SD 0.007), accuracy 0.944 (SD 0.006), and macro F1 0.929 (SD 0.008). On the held-out test set (n = 247), macro AUC was 0.982 (95% CI: 0.965--0.995), accuracy 0.943, balanced accuracy 0.932, macro F1 0.927, and Cohen's kappa 0.909. SHAP analysis identified CDR Global as the dominant predictor for NC and MCI, while CDR-SB and MMSE together drove AD classification. Conclusion: An explainable machine learning model trained on routine clinical assessments achieves near-perfect three-class Alzheimer's detection. SHAP analysis reveals clinically plausible, class-specific feature importance patterns supporting clinical validity. Future work will extend this framework with speech biomarkers for multimodal detection.
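
The evaluation above reports accuracy, balanced accuracy, macro F1, and Cohen's kappa. As a worked illustration of how these four numbers relate to a multi-class confusion matrix, here is a small sketch. The 3x3 matrix is an invented toy example, not ADNI data or the paper's results, and the function name is hypothetical.

```python
def metrics_from_confusion(cm):
    """cm[i][j] = number of samples with true class i predicted as class j."""
    n = len(cm)
    total = sum(sum(row) for row in cm)
    row_sums = [sum(row) for row in cm]                      # true-class counts
    col_sums = [sum(cm[i][j] for i in range(n)) for j in range(n)]  # predicted counts
    accuracy = sum(cm[i][i] for i in range(n)) / total
    recalls = [cm[i][i] / row_sums[i] for i in range(n)]
    precisions = [cm[i][i] / col_sums[i] for i in range(n)]
    f1s = [2 * p * r / (p + r) if p + r > 0 else 0.0
           for p, r in zip(precisions, recalls)]
    # Agreement expected by chance, from the marginal distributions.
    expected = sum(row_sums[i] * col_sums[i] for i in range(n)) / (total * total)
    kappa = (accuracy - expected) / (1 - expected)
    return {
        "accuracy": accuracy,
        "balanced_accuracy": sum(recalls) / n,  # mean per-class recall
        "macro_f1": sum(f1s) / n,               # unweighted mean of per-class F1
        "cohen_kappa": kappa,
    }

# Toy confusion matrix (rows: true NC, MCI, AD; columns: predicted).
toy = [[9, 1, 0],
       [1, 8, 1],
       [0, 1, 9]]
print(metrics_from_confusion(toy))
```

The sketch assumes every class appears at least once among both the true and the predicted labels; otherwise the per-class divisions would need guarding.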

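2605.04356 2026-06-04 cs.LG cs.AI (updated version)

Efficiently Aligning Language Models with Online Natural Language Feedback

Christine Ye, Joe Benton

Affiliations: GitHub

AI summary: Proposes replacing verifiable rewards with online natural language feedback. The model is iteratively optimized against a proxy reward model, and fresh expert supervision is collected at the point of over-optimization, to align language models efficiently in fuzzy domains. Experiments show large gains in the data efficiency of expert supervision.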

Abstract

Reinforcement learning with verifiable rewards has been used to elicit impressive performance from language models in many domains. But, broadly beneficial deployments of AI may require us to train models with strong capabilities in "fuzzy", hard-to-supervise domains. In this paper, we develop methods to align language models in fuzzy domains where human experts are still able to provide high-quality supervision signal, but only for a small number of model outputs, using online natural language feedback. Specifically, we train models by iteratively optimizing against proxy reward signals, stopping at the point of over-optimization, collecting fresh expert supervision, and updating the proxy reward. We construct proxy reward models from language models using in-context learning (ICL) and fine-tuning. We test our methods by eliciting creative writing and alignment research capabilities in Qwen3-8B and Haiku 4.5 respectively. For Qwen3-8B, ICL methods recover up to 35% of performance with 50x fewer expert samples, while fine-tuning methods recover 80% with up to 20x fewer samples and 100% with 3x fewer samples. For Haiku 4.5, ICL methods recover up to 35% of performance with 30x fewer samples, and fine-tuning methods recover 100% with 10x fewer samples. Our results suggest that online natural language feedback can substantially improve the data efficiency of expert supervision.

2606.03988 2026-06-04 cs.AI (updated version)

Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

Mahtab Bigverdi, Linjie Li, Weikai Huang, Yiming Liu, Jaemin Cho, Jieyu Zhang, Tuhin Kundu, Chris Dangjoo Kim, Zelun Luo, Linda Shapiro, Ranjay Krishna

Affiliations: University of Washington; Allen Institute for AI; Microsoft; OpenAI

AI summary: Proposes Imaginative Perception Tokens (IPT) as intermediate perceptual representations. Supervised training with IPT improves multimodal language models on spatial reasoning tasks such as reasoning from unseen viewpoints and tracing occluded paths, and outperforms textual chain-of-thought training on three newly constructed datasets.

Abstract

Vision language models (VLMs) excel at many tasks but still struggle with spatial reasoning when critical information is not directly observable. Many such problems require imaginative perception: inferring what would be seen from an unseen viewpoint, tracing paths through occluded spaces, or integrating partial observations into a coherent spatial representation. We introduce Imaginative Perception Tokens (IPT), intermediate perceptual representations that externalize what a VLM would perceive under alternative spatial configurations while remaining consistent with the observed input. To study this capability, we formulate three tasks, Perspective Taking (PET), Path Tracing (PT), and Multiview Counting (MVC), and construct datasets of approximately 20K examples with ground truth imaginations, answers, and evaluation benchmarks. Using the unified VLM BAGEL as the backbone, IPT supervision consistently improves spatial reasoning and often outperforms textual chain of thought training, even without generating images at inference time. On MVC, IPT improves accuracy by 3.4% and achieves competitive performance with strong closed-source models on PT. We further find that combining IPT and label-only supervision yields additional gains, whereas textual chain of thought can substantially degrade performance, suggesting a modality mismatch when spatial computation is forced through language. Overall, IPT provides a principled supervision signal for reasoning about unobserved spatial structure, improving generalization while producing interpretable intermediate representations.

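2606.03938 2026-06-04 cs.LG cs.AI (updated version)

q0: Primitives for Hyper-Epoch Pretraining

Bishwas Mandal, Shmuel Berman, Akshay Vegesna, Samip Dahal

Affiliations: Q Labs; Princeton University

AI summary: To address the saturation of single-model performance in multi-epoch training, proposes hyper-epoch pretraining (q0), which uses three primitives (cyclic schedules, chain distillation, and a learned prior) to turn a multi-epoch budget into a diverse population of models and aggregate their predictions, substantially improving data efficiency.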

Comments 22 pages, 5 figures

Abstract

Multi-epoch training is becoming the standard now that compute is growing faster than the supply of high-quality text. But pretraining a single model saturates within a few passes, long before the compute budget is exhausted. We argue this calls for a conceptual shift from training a single model toward exploring a population of models and aggregating their predictions. We introduce hyper-epoch pretraining (q0), which turns a multi-epoch budget into a population of diverse models whose combined predictions reach a lower validation loss than a single refined model. q0 reduces to three core primitives. A cyclic schedule with anti-correlated learning rate and weight decay collects diverse models from a few parallel trajectories. Chain distillation trains each model against its predecessor so that model quality compounds across the population. A learned prior, fit on a held out set, selects and weights members for any inference budget. On a 1.8B-parameter model trained on 100M FineWeb tokens, q0 matches a strong 256-epoch ensemble baseline using only ~56 epochs (~4.6x fewer), or ~67 epochs (~3.8x fewer) when matched to the baseline's ensemble size, and continues to improve beyond it. These gains reach cumulative ~12.9x data efficiency under the Slowrun setting and transfer to downstream benchmarks. Crucially, the optimal allocation shifts with the budget, so we give prescriptive recipes for how to spend a given epoch budget to maximize generalization, from a single epoch up to the largest budgets.
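
Two of the ideas above lend themselves to a small sketch: a cyclic schedule in which learning rate and weight decay move in opposite directions, and weighted averaging of member predictions. The cosine shape, the numeric ranges, and the function names below are assumptions chosen for illustration; the abstract does not specify the schedule's form, and the learned prior that chooses the weights is not modelled.

```python
import math

def cyclic_lr_wd(step, cycle_len, lr_min=1e-4, lr_max=1e-3, wd_min=0.01, wd_max=0.1):
    """Anti-correlated schedule: when the learning rate is high, weight decay is low."""
    phase = (step % cycle_len) / cycle_len
    wave = 0.5 * (1.0 + math.cos(2.0 * math.pi * phase))  # 1 at cycle start, 0 mid-cycle
    lr = lr_min + (lr_max - lr_min) * wave
    wd = wd_max - (wd_max - wd_min) * wave                 # moves opposite to lr
    return lr, wd

def ensemble_average(member_probs, weights):
    """Weighted average of per-member probability vectors."""
    total = sum(weights)
    n_classes = len(member_probs[0])
    return [sum(w * p[c] for w, p in zip(weights, member_probs)) / total
            for c in range(n_classes)]

# Epoch-efficiency ratios quoted in the abstract, recomputed from its own numbers.
print(round(256 / 56, 1), round(256 / 67, 1))  # 4.6 3.8
```

A checkpoint would typically be collected at a fixed point of each cycle, which is one plausible way for a single trajectory to contribute several distinct members to the population.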

2606.03937 2026-06-04 cs.AI (updated version)

Entropy Is Not Enough: Unlocking Effective Reinforcement Learning for Visual Reasoning via Vision-Anchored Token Selection

Senjie Jin, Peixin Wang, Boyang Liu, Xiaoran Fan, Shuo Li, Zhiheng Xi, Jiazheng Zhang, Yuhao Zhou, Tao Gui, Qi Zhang, Xuanjing Huang

Affiliations: College of Computer Science and Artificial Intelligence, Fudan University

AI summary: To address the failure of entropy-based credit assignment in visual reasoning, proposes the VEPO framework, which redirects gradient credit through a multiplicative coupling of visual sensitivity and token entropy, substantially improving multimodal reinforcement learning performance.

Abstract

While token-level entropy is commonly recognized as effective for credit assignment in text-only reinforcement learning with verifiable rewards (RLVR), it remains unclear whether this mechanism still holds in visual reasoning. Our controlled study shows that this mechanism collapses in visual reasoning due to the omission of vision-sensitive tokens with naturally low entropy. Although existing multimodal RL methods increasingly acknowledge the importance of visual perception, they struggle to satisfy the inherent demand for interleaving precise perceptual grounding with semantic reasoning, either lacking systematic visual measurements or overlooking that token entropy primarily drives semantic exploration. To address this, we introduce VEPO (Vision-Entropy token-selection for Policy Optimization), an effective RL framework explicitly integrating visual sensitivity with token entropy via a principled multiplicative coupling, where VEPO redirects gradient credit toward tokens which are simultaneously visually grounded and highly informative. Extensive experiments demonstrate VEPO's leading performance, significantly outperforming the entropy-only baseline by 2.28 points at 7B-scale and 3.15 points at 3B-scale. Ablations further substantiate the soundness of our method.
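
The multiplicative coupling described above can be illustrated with a toy token-selection routine: each token's score is its entropy times a visual-sensitivity value, and only the top-scoring tokens receive gradient credit. How VEPO actually measures visual sensitivity is not stated in the abstract, so the sensitivity values here are simply given as inputs, and the function names and `keep_ratio` parameter are hypothetical.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def select_tokens(token_probs, visual_sensitivity, keep_ratio=0.5):
    """Score = entropy * visual sensitivity; keep the top fraction of positions."""
    scores = [token_entropy(p) * s for p, s in zip(token_probs, visual_sensitivity)]
    k = max(1, int(len(scores) * keep_ratio))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return sorted(ranked[:k]), scores

# Four token positions with toy distributions and toy sensitivity values.
token_probs = [[0.5, 0.5], [0.99, 0.01], [0.7, 0.3], [0.5, 0.5]]
visual_sensitivity = [0.1, 0.9, 0.8, 0.0]
selected, scores = select_tokens(token_probs, visual_sensitivity)
print(selected)
```

In this toy input an entropy-only rule would favour positions 0 and 3 (the two maximum-entropy tokens), whereas the product also rewards position 2, which is both uncertain and visually sensitive, and discards position 3, which has no visual dependence.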

2606.03892 2026-06-04 cs.CL cs.AI cs.LG (updated version)

Synthesize and Reward -- Reinforcement Learning for Multi-Step Tool Use in Live Environments

Ibrahim Abdelaziz, Asim Munawar, Kinjal Basu, Maxwell Crouse, Chulaka Gunasekara, Suneet Katrekar, Pavan Kapanipathi

Affiliations: IBM Research

AI summary: Proposes the PROVE framework, which uses 20 stateful MCP servers, an automated data synthesis pipeline, and a multi-component programmatic reward to address environment construction, query generation, and reward design for multi-step tool calling, with gains of up to +10.2, +6.8, and +6.5 points on BFCL Multi-Turn, tau2-bench, and T-Eval respectively.

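AI-generated abstract (translated from Chinese)

Training LLMs to orchestrate multi-step tool calls is held back by three coupled obstacles: realistic stateful execution environments are costly to build, synthetic training queries are often detached from the server's actual state (so the generated tool calls fail to execute), and recall-based RL rewards encourage verbose tool-calling patterns. We present PROVE (Programmatic Rewards On Verified Environments), a framework with three contributions: (1) a library of 20 stateful MCP (Model Context Protocol) servers exposing 343 tools, supporting live-execution RL training with session-scoped state isolation; (2) an automated data synthesis pipeline that generates validated multi-turn tool-call trajectories against these servers through dependency-graph-guided dialogue simulation grounded in live-sampled server state, so that every generated query references entities that actually exist; and (3) a multi-component programmatic reward (a progressive validity score, dependency-aware coverage, an adaptive efficiency penalty with a complexity-scaled call budget, a tool-name signal, and an argument-value matching reward) that requires no external judge model. We train four models (Qwen3-4B, Qwen3-8B, Qwen2.5-7B, Granite-4.1-8B) with GRPO using the same reward hyperparameters and about 13K training examples; only the learning rate is tuned, from a three-point sweep per model family. On BFCL Multi-Turn, tau2-bench, and T-Eval, PROVE yields improvements of up to +10.2, +6.8, and +6.5 points respectively, showing that a compact programmatic reward produces consistent gains on multi-step tool orchestration across two model families.

Abstract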

Training LLMs to orchestrate multi-step tool calls is held back by three coupled obstacles: realistic stateful execution environments are costly to build, synthetic training queries are often detached from the server's actual state (so the generated tool calls fail to execute), and recall-based RL rewards incentivize verbose tool-calling patterns. We present PROVE (Programmatic Rewards On Verified Environments), a framework with three contributions: (1) a library of 20 stateful MCP (Model Context Protocol) servers exposing 343 tools, enabling live-execution RL training with session-scoped state isolation; (2) a state-machine data synthesis pipeline that generates multi-turn tool-call trajectories grounded in live-sampled server state, so generated queries reference entities that actually exist; and (3) a multi-component programmatic reward with an adaptive efficiency penalty that counters the verbosity incentive of recall-based rewards. We train four models (Qwen3-4B, Qwen3-8B, Qwen2.5-7B, Granite-4.1-8B) with GRPO on the resulting ~13K training examples. On BFCL Multi-Turn, tau2-bench, and T-Eval, PROVE yields improvements of up to +10.2, +6.8, and +6.5 points respectively, demonstrating that this framework yields consistent gains on multi-step tool orchestration across two model families.

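2606.03810 2026-06-04 cs.CL cs.AI (updated version)

Consistency Training Can Entrench Misalignment

David Demitri Africa, Arathi Mani

Affiliations: UK AI Security Institute

AI summary: Through experiments with seven consistency-training methods on 108 fine-tuned models, finds that consistency training generally suppresses reward hacking and emergent misalignment but amplifies sycophancy, and proposes a unified theoretical framework to explain its alignment effects.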

Comments Accepted to ICML 2026

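Testing the Generalization of Arithmetic Reasoning in Large Language Models: Automatic Numeric-Remapping Attacks

Malia Barker, Bishal Lakha, Edoardo Serra, Francesco Gullo

Affiliations: Department of Computer Science, Boise State University; University of L’Aquila

AI summary: Proposes an automatic numeric-remapping attack algorithm that tests the robustness of LLM arithmetic reasoning with small numeric changes that preserve the reasoning program. Accuracy on GSM8K drops by 12 to 26 percentage points, while MAWPS and MultiArith are more stable.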

Abstract

Large language models achieve strong performance on arithmetic reasoning benchmarks, and one common response to arithmetic brittleness is to delegate computation to code. Yet models are still often used in settings where they must reason directly from natural language, and trustworthy models should solve small-number arithmetic word problems without external tools. Prior work shows that LLMs are sensitive to numerical variation: a model may solve an original problem but fail on structurally similar variants requiring the same reasoning procedure with different numbers. We ask whether this fragility persists under a stricter setting involving small, schema-preserving numeric changes that retain the original reasoning program and avoid large-number stress tests. We introduce an automatic algorithm for generating numeric-remapping attacks on arithmetic word problems. Unlike template-based perturbation methods requiring manual schemas or constraints, our approach derives problem-specific symbolic representations, generates constrained numeric remappings, recomputes gold answers, and realizes transformed questions through deterministic edits guided by LLM-generated edit plans. Stage-wise validation and a high-confidence audit retain reliable attacks, making the pipeline scalable with limited human intervention. We evaluate DeepSeek-R1 (70B), Gemma4 (31B), and GPT-OSS (120B) on GSM8K, MAWPS, and MultiArith. On GSM8K, completed runs show conditional accuracy drops of 12.16 to 25.82 percentage points. MAWPS and MultiArith are far more stable, with most attacked accuracies near or above 98%. These results show that numeric-remapping robustness depends strongly on dataset structure: GSM8K remains sensitive even when reasoning programs are preserved and answers are recomputed, while shorter, more regular datasets are more robust.
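
The core of a numeric-remapping attack, as described above, is to hold the reasoning program fixed, substitute new small numbers under constraints, and recompute the gold answer. The sketch below shows that idea on a hand-written template. This is a simplification: the paper derives the symbolic representation automatically and rewrites the question through LLM-generated edit plans, whereas here the template, program, and constraint are supplied by hand, and all names are hypothetical.

```python
import random

def remap_problem(template, program, values, is_valid=None,
                  low=2, high=20, seed=0, max_tries=1000):
    """Sample new small integers for each slot and recompute the gold answer."""
    rng = random.Random(seed)
    for _ in range(max_tries):
        new_values = {name: rng.randint(low, high) for name in values}
        if new_values == values:
            continue  # must differ from the original problem
        if is_valid is not None and not is_valid(new_values):
            continue  # e.g. cannot give away more apples than one has
        question = template.format(**new_values)
        gold = program(**new_values)  # same reasoning program, new numbers
        return question, gold, new_values
    raise ValueError("no valid remapping found")

template = ("Ana has {a} apples and gives away {b}. "
            "She then buys {c} more. How many apples does she have?")
program = lambda a, b, c: a - b + c
original = {"a": 12, "b": 5, "c": 3}

question, gold, new_values = remap_problem(
    template, program, original, is_valid=lambda v: v["a"] > v["b"])
print(question, gold)
```

A model is then scored on the remapped question against `gold`; a drop relative to the original problem indicates sensitivity to the numbers rather than to the reasoning structure.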

2606.03598 2026-06-04 cs.RO cs.AI cs.CV (updated version)

PHASER: Phase-Aware and Semantic Experience Replay for Vision-Language-Action Models

Ziyang Chen, Shaoguang Wang, Weiyu Guo, Qianyi Cai, He Zhang, Pengteng Li, Yiren Zhao, Yandong Guo

Affiliations: Thrust of AI, HKUST (Guangzhou); AI 2 Robotics, Shenzhen, China

AI summary: Proposes the PHASER framework, which combines phase-aware capacity allocation and a multi-modal interference routing strategy with Auto-PC, an automatic phase-extraction pipeline, to address catastrophic forgetting in continual learning of VLA models, raising the average success rate on the LIBERO benchmark by up to 31%.

Comments 20 pages, 8 figures, 12 tables

Abstract

Vision-Language-Action (VLA) models have achieved remarkable success in language-conditioned robotic manipulation. However, deploying these models in open-ended environments requires continuously acquiring novel skills, a process that inevitably triggers severe catastrophic forgetting of previously learned behaviors. While experience replay (ER) serves as a standard mitigating strategy, naive uniform sampling fundamentally misaligns with the temporal characteristics of manipulation trajectories. It systematically under-samples brief but causally critical sub-skills, leading to phase starvation, and completely overlooks the varying degrees of forgetting across historical tasks. To overcome these limitations, we introduce PHASER, an architecture-agnostic continual learning framework. PHASER employs a phase-centric capacity allocation to guarantee equal memory support for all sub-skills, coupled with a multi-modal interference routing strategy that dynamically prioritizes historical phases at high risk of forgetting. Furthermore, to enable fully autonomous lifelong adaptation, we integrate Auto-PC, a lightweight pipeline combining unsupervised action-signal change-point detection with VLM-based semantic verification to extract temporal boundaries without intensive manual supervision. Evaluated across three VLA backbones on LIBERO continual learning suites, PHASER yields substantial empirical improvements, increasing Average Success Rate (ASR) by up to 31% over matched-budget ER and achieving an 87.8% final ASR on the LIBERO-Goal CL setting.
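
The phase-starvation argument above can be made concrete with a small sketch of phase-centric allocation: instead of sampling replay items uniformly over timesteps (which favours long phases), the buffer gives each phase the same quota. The function and phase names are hypothetical, and the interference-based prioritisation and Auto-PC boundary extraction described in the abstract are not modelled here.

```python
import random

def allocate_phase_buffer(phases, capacity, seed=0):
    """phases: dict mapping phase name -> list of stored transitions.
    Returns a replay buffer with an (almost) equal quota per phase."""
    rng = random.Random(seed)
    names = sorted(phases)
    base, extra = divmod(capacity, len(names))
    buffer = {}
    for i, name in enumerate(names):
        quota = base + (1 if i < extra else 0)  # spread any remainder
        items = phases[name]
        buffer[name] = rng.sample(items, min(quota, len(items)))
    return buffer

# A 160-step trajectory where the grasp phase is brief but critical.
phases = {
    "reach": list(range(0, 90)),
    "grasp": list(range(90, 100)),
    "place": list(range(100, 160)),
}
buffer = allocate_phase_buffer(phases, capacity=30)
print({name: len(items) for name, items in buffer.items()})
```

Under uniform sampling of 30 steps from these 160, the 10-step grasp phase would receive about 2 slots in expectation; the equal quota gives it 10. This sketch does not redistribute unused quota when a phase has fewer items than its share.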

2606.03564 2026-06-04 cs.CV cs.AI (updated version)

CR-Seg: Attention-Guided and CoT-Enhanced Coarse-to-Refined Reasoning Segmentation

Yifan Cao, Xiaocui Yang, Faxian Wan, Shi Feng, Daling Wang, Yifei Zhang

Affiliations: School of Computer Science and Engineering, Northeastern University

AI summary: Proposes CR-Seg, a two-stage framework that uses attention-map extraction and a global-to-local chain of thought to perform coarse-to-refined reasoning segmentation, addressing cross-modal alignment and reasoning-answer inconsistency.

Abstract

Reasoning segmentation aims to segment target objects described by complex language through joint visual-textual reasoning. Existing methods typically rely on either learned semantic tokens to bridge Multimodal Large Language Models (MLLMs) and segmentation models, suffering from difficult cross-modal alignment, or explicit spatial prompts such as bounding boxes, which may lose holistic response semantics. To address these limitations, we propose Attention-Guided and CoT-Enhanced Coarse-to-Refined Reasoning Segmentation, termed CR-Seg, a two-stage framework for coarse-to-refined reasoning segmentation. Specifically, we design an Extract Attention Maps and Points (EAP) module to extract attention maps for coarse target localization and select informative points, both of which are fed into SAM for mask refinement. To alleviate reasoning--answer inconsistency, we further introduce Global-to-Local Chain-of-Thought (GLCoT), which guides the model to reason progressively from global scene context to local target details. Extensive experiments on reasoning segmentation benchmarks demonstrate the effectiveness of CR-Seg.

2606.03376 2026-06-04 cs.CV cs.AI cs.CL cs.LG (updated version)

P$^2$-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization

Ruipeng Zhang, Zhihao Li, Haozhang Yuan, C. L. Philip Chen, Tong Zhang

Affiliations: Guangdong Provincial Key Laboratory of Computational AI Models and Cognitive Intelligence, School of Computer Science & Engineering, South China University of Technology; Pazhou Lab, Guangzhou, China; Engineering Research Center of the Ministry of Education on Health Intelligent Perception and Paralleled Digital-Human, Guangzhou, China

AI summary: To address hallucination in large vision-language models, proposes the P$^2$-DPO training paradigm, which uses model-generated preference pairs and a calibration loss to directly target the perceptual bottleneck and visual robustness without costly human feedback.

Abstract

Hallucination has recently garnered significant research attention in Large Vision-Language Models (LVLMs). Direct Preference Optimization (DPO) aims to learn directly from the corrected preferences provided by humans, thereby addressing the hallucination issue. Despite its success, this paradigm has yet to specifically target the perceptual bottleneck in attended regions or address insufficient Visual Robustness against image degradation. Furthermore, existing preference pairs are often vision-agnostic and their inherently off-policy nature limits their effectiveness in guiding model learning. To address these challenges, we propose Perceptual Processing Direct Preference Optimization (P$^2$-DPO), a novel training paradigm in which the model generates and learns from its own preference pairs, thereby directly addressing the identified visual bottlenecks while inherently avoiding the issues of vision-agnostic and off-policy data. It introduces: (1) an on-policy preference pairs construction method targeting Focus-and-Enhance perception and Visual Robustness, and (2) a well-designed Calibration Loss to precisely align visual signals with the causal generation of text. Experimental results demonstrate that with a comparable amount of training data and cost, P$^2$-DPO outperforms strong baselines that rely on costly human feedback on benchmarks. Furthermore, evaluations on Attention Region Fidelity (ARF) and image degradation scenarios validate the effectiveness of P$^2$-DPO in addressing perceptual bottleneck in attended regions and improving Visual Robustness against degraded inputs.
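
For readers unfamiliar with the objective this work builds on, the standard DPO loss for a single preference pair is sketched below. This is the baseline DPO formulation only; the Calibration Loss and the on-policy pair construction that P$^2$-DPO introduces are not specified in the abstract and are not reproduced. The function name and the default `beta` are illustrative choices.

```python
import math

def dpo_loss(logp_chosen, logp_rejected, ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Standard DPO loss for one pair: -log(sigmoid(beta * margin)).

    Inputs are sequence log-probabilities under the policy being trained
    (logp_*) and under the frozen reference model (ref_logp_*).
    """
    margin = beta * ((logp_chosen - ref_logp_chosen)
                     - (logp_rejected - ref_logp_rejected))
    # -log(sigmoid(m)) == log(1 + exp(-m)), evaluated in a numerically stable way.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))

# Policy identical to the reference: no preference signal yet, loss = log 2.
print(dpo_loss(-5.0, -5.0, -5.0, -5.0))
# Policy raised the chosen response and lowered the rejected one: lower loss.
print(dpo_loss(-4.0, -6.0, -5.0, -5.0, beta=1.0))
```

In an on-policy setting such as the one described above, both responses in each pair would be sampled from the current model rather than taken from a fixed human-annotated dataset.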

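2606.03323 2026-06-04 cs.CR cs.AI (updated version)

Implement Kubernetes Pod-Level Remote Attestation for Confidential Workloads on dstack

dstack-capsule: Pod-Level Remote Attestation for Confidential Workloads on Kubernetes

Yang Yang, Kevin Wang, Yuanhai Luo, Hang Yin, Jie Cai, Shunfan Zhou, Wenfeng Wang

Affiliations: OPPO; Phala

AI summary: Proposes the dstack-capsule platform, which uses a two-layer attestation architecture and a privilege-fuse mechanism to provide Pod-level remote attestation on Intel TDX. Multiple Pods share one confidential VM while each Pod keeps an independent hardware-backed identity, avoiding the resource overhead of one VM per Pod.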

Abstract

The rise of LLM-as-a-Service and other confidential cloud workloads demands cryptographic proof that user data is processed in a trusted, untampered environment. Existing solutions, notably Confidential Containers (CoCo), enforce a strict "one Pod per VM" model that attests only the Guest OS stack, leaving container-level identity unverified and incurring prohibitive per-VM resource overhead. We present dstack-capsule, a Kubernetes platform that enables Pod-level remote attestation on Intel TDX by allowing multiple Pods to share a single Confidential VM while each retains independent, hardware-backed proof of identity. Our key insight is a two-layer attestation architecture: static platform measurements are frozen in RTMR[3] via an irreversible privilege fuse, while dynamic Pod identities (pod_uid, pod_spec_hash, workload_id) are embedded in the TDX Quote's report_data field and signed by hardware on every request. dstack-capsule introduces (1) a Pod-level attestation protocol binding Pod spec digests to hardware-signed Quotes; (2) a privilege fuse mechanism that atomically transitions a node from setup mode to secure mode; (3) a multi-layer sandbox spanning storage, runtime, admission, API, and network isolation layers; and (4) a complete open-source implementation based on Kubernetes 1.32, Intel TDX, and Sysbox. We evaluate the security properties, attestation correctness, and performance characteristics of dstack-capsule, demonstrating that it achieves Pod-granularity verification without the resource overhead of per-VM isolation.
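
The dynamic layer described above binds a Pod's identity to the 64-byte `report_data` field of a TDX Quote. The sketch below shows one plausible way to derive such a value and re-derive it on the verifier side. The JSON canonicalisation, the use of SHA-256 for the spec digest and SHA-512 for the binding, the per-request nonce, and all function names are assumptions for illustration; the actual dstack-capsule encoding is not given in the abstract, and verifying the hardware signature on the Quote itself is out of scope here.

```python
import hashlib
import hmac
import json

REPORT_DATA_LEN = 64  # size of the report_data field in a TDX report

def _canonical(obj):
    """Deterministic JSON encoding so both sides hash identical bytes."""
    return json.dumps(obj, sort_keys=True, separators=(",", ":")).encode("utf-8")

def pod_report_data(pod_uid, pod_spec, workload_id, nonce):
    """Derive a 64-byte value committing to the Pod identity and a fresh nonce."""
    pod_spec_hash = hashlib.sha256(_canonical(pod_spec)).hexdigest()
    identity = {
        "pod_uid": pod_uid,
        "pod_spec_hash": pod_spec_hash,
        "workload_id": workload_id,
        "nonce": nonce,  # hypothetical freshness value supplied by the verifier
    }
    return hashlib.sha512(_canonical(identity)).digest()

def verify_report_data(report_data, pod_uid, pod_spec, workload_id, nonce):
    """Verifier side: recompute the expected value and compare in constant time."""
    expected = pod_report_data(pod_uid, pod_spec, workload_id, nonce)
    return hmac.compare_digest(report_data, expected)

spec = {"containers": [{"name": "llm", "image": "example.invalid/llm:1.0"}]}
rd = pod_report_data("pod-1234", spec, "inference", nonce="abc123")
print(len(rd), verify_report_data(rd, "pod-1234", spec, "inference", "abc123"))
```

Because the digest covers the Pod spec, any change to the container image or configuration yields a different `report_data`, so a verifier that knows the expected spec can detect a substituted workload even though several Pods share the same confidential VM.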

2606.03307 2026-06-04 cs.IR cs.AI (updated version)

Generalizing Graph Foundation Models via Hyperbolic Retrieval-Augmented Generation

Yifan Jin, Qirui Ji, Bin Qin, Jiangmeng Li, Lixiang Liu, Fuchun Sun, Changwen Zheng

Affiliations: Institute of Software, Chinese Academy of Sciences; University of Chinese Academy of Sciences; Tsinghua University

AI summary: Proposes a hyperbolic retrieval-augmented generation framework that indexes a tree-structured external knowledge base in hyperbolic space and retrieves at multiple granularities, addressing the generalization of graph foundation models under distribution shift.

Comments Accepted by KDD2026

DOI: 10.1145/3770855.3817750

Abstract

Graph foundation models (GFMs) have emerged as a dominant paradigm in graph representation learning by leveraging large-scale pre-training for cross-domain inference. However, the parameterized knowledge encoded within these models is insufficient to cope with distribution shifts, limiting their generalization ability. To mitigate this issue, retrieval-augmented generation (RAG) has been introduced to incorporate external knowledge at inference time. Nevertheless, existing RAG frameworks operating in Euclidean space suffer from a fundamental geometric limitation: the polynomial volume growth of Euclidean space is inherently mismatched with tree-structured external knowledge bases. This mismatch leads to a loss of semantic granularity in retrieval and gives rise to the hubness phenomenon. To address this limitation, we propose a Hyperbolic Retrieval-Augmented Generation (HyRAG) framework designed to enhance the generalization capabilities of GFMs. Specifically, the introduced Hyperbolic Knowledge Indexing module retains the tree-like hierarchies of the external knowledge base by modeling them within hyperbolic space. The Multi-granularity Retrieval module then provides GFMs with global semantic anchors and local semantic nuances through coarse-grained and fine-grained knowledge retrieval, respectively. Finally, the Dual-path Fusion module achieves effective knowledge integration for graph tasks at both the feature and structural levels. Experiments on multiple graph benchmarks demonstrate significant improvements in the zero-shot setting, highlighting the generalization ability of our method for robust GFM inference.
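
The geometric argument above rests on how distances behave in hyperbolic space. As a small illustration, the sketch below computes the geodesic distance in the Poincaré ball, one common model of hyperbolic space. Whether HyRAG uses the Poincaré ball or another model (such as the Lorentz model) is not stated in the abstract, so this choice and the function name are assumptions.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points inside the unit Poincare ball."""
    sq_norm = lambda x: sum(a * a for a in x)
    nu, nv = sq_norm(u), sq_norm(v)
    if nu >= 1.0 or nv >= 1.0:
        raise ValueError("points must lie strictly inside the unit ball")
    diff = sq_norm([a - b for a, b in zip(u, v)])
    return math.acosh(1.0 + 2.0 * diff / ((1.0 - nu) * (1.0 - nv)))

origin = (0.0, 0.0)
print(poincare_distance(origin, (0.5, 0.0)))        # equals ln(3)
print(poincare_distance((0.9, 0.0), (-0.9, 0.0)))   # far larger than the Euclidean 1.8
```

Distances grow rapidly near the boundary of the ball, which gives the space room for exponentially many well-separated leaf nodes. That property is what makes hyperbolic embeddings a natural fit for tree-structured knowledge bases, with coarse concepts near the origin and fine-grained ones near the boundary.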

2606.03303 2026-06-04 cs.AI (updated version)

LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks

Po-Nien Kung, Linfeng Song, Dawsen Hwang, Jinsung Yoon, Chun-Liang Li, Simone Severini, Mirek Olšák, Edward Lockhart, Quoc V Le, Burak Gokturk, Thang Luong, Tomas Pfister, Nanyun Peng

Affiliations: Google Cloud AI Research; Google Cloud; Google DeepMind

AI summary: Proposes LEAP, an agentic framework that decomposes problems, interacts with the Lean compiler, and self-refines, enabling general-purpose large models to reach state-of-the-art performance in formal theorem proving. It matches frontier systems on the 2025 Putnam Competition and surpasses a specialized system on Lean-IMO-Bench.

Abstract

Large Language Models (LLMs) exhibit strong informal mathematical reasoning but struggle to generate mechanically verifiable proofs in formal languages like Lean. We present LEAP, an agentic framework that enables general-purpose foundation models to achieve state-of-the-art performance on automated formal theorem proving. LEAP leverages foundation model capabilities, such as informal reasoning, instruction following, and iterative self-refinement. By decomposing complex problems into smaller units, the system bridges formal proof construction with informal blueprints through continuous interaction with the Lean compiler. To provide a rigorous evaluation beyond increasingly saturated benchmarks, we introduce Lean-IMO-Bench, a benchmark of IMO-style problems formalized in Lean, with short statements yet highly non-routine and multi-step proofs across a wide range of difficulty levels. Empirically, on the latest 2025 Putnam Competition, an annual mathematics competition for undergraduate students in North America, LEAP solves all 12 problems, matching recent breakthroughs by frontier formal mathematical models. On Lean-IMO-Bench, LEAP boosts the one-shot formal solve rate of general-purpose LLMs from below 10% to 70%, notably surpassing the 48% benchmark set by a specialized, gold-medal-caliber IMO system. Furthermore, we demonstrate LEAP's research-level utility by autonomously formalizing complex proofs for open combinatorial challenges, including a verified proof for a key subproblem in Knuth's Hamiltonian decomposition of even-order Cayley graphs.

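2606.03201 2026-06-04 cs.CV cs.AI (updated version)

Reinforcement Learning from Cross-domain Videos with Video Prediction Model

Zhao Yang, Xinrui Zu, Jacob E. Kooi, Thomas Delliaux, He Liu, Shujian Yu, Kevin Sebastian Luck, Vincent François-Lavet

Affiliations: VU Amsterdam; ISAE-SUPAERO

AI summary: Proposes XIPER, a reward model that uses cross-domain video prediction to map agent observations into the expert domain and uses the prediction likelihood as the reward signal, addressing the missing reward signal and the domain gap when learning from expert videos in a visually different domain.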

Abstract

Reinforcement learning from expert videos across visually distinct domains is challenging due to the absence of reward signals and the presence of domain gaps. We introduce XIPER (Cross-domain Video Prediction Reward), a reward model for learning from expert videos collected in a visually different domain, where the agent's appearance differs due to factors such as color, morphology, or the sim-to-real gap. More specifically, XIPER trains a cross-domain video prediction model that maps agent observations into the expert domain and uses the prediction likelihood as a reward signal. Experiments on the DMC Color Suite (8 tasks) and DMC Body Suite (3 tasks) show that XIPER consistently outperforms baselines despite domain gaps such as differences in agent color and morphology. We further analyze XIPER on a sim-to-real transfer dataset, demonstrating that it produces meaningful reward signals for real-robot observations given only simulated expert videos. Code, pretrained models, datasets and video demonstrations can be found on our project webpage: https://sites.google.com/view/xiper

2606.02914 2026-06-04 cs.AI cs.CL (updated version)

Large AI Models in Dental Healthcare: From General-Purpose Systems to Domain-Specific Foundation Models

Sema Helali, Lina Abu Nada, Sausan Al Kawas, Alaa Abd-Alrazaq, Faleh Tamimi, Rafat Damseh

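Affiliations: University of Al Ain, UAE; Sharjah University, UAE; Cornell University, Qatar; McGill University

AI summary: Through a systematic review, this paper proposes a two-dimensional classification framework and compares language-generative models, discriminative vision foundation models, and dental-specific foundation models on dental tasks. It finds that integrated pipelines outperform single models and identifies data asymmetry, hallucination, and the lack of standardized benchmarks as barriers.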

Abstract

Background: Oral diseases affect nearly 3.5 billion people worldwide, yet the comparative clinical potential of large-scale AI models in dentistry remains poorly understood. Three distinct model categories have emerged: language-generative models, discriminative vision foundation models, and dental-specific foundation models, with no unified review examining their relationships and collective limitations. Methods: Following PRISMA-ScR guidelines, we systematically searched four databases (PubMed, Google Scholar, Scopus, arXiv), screened independently by two reviewers. After applying inclusion/exclusion criteria, 97 studies (2020-2026) were included. We propose a two-dimensional classification framework organizing models by architectural paradigm and dental specialization degree. Results: Language-generative models excel at text-based tasks (clinical reasoning, licensing exams, patient communication) but show inconsistent performance on image-dependent diagnostics. Adapted SAM and CLIP variants achieve strong tooth segmentation and lesion detection results. Dental-specific models (DentVFM, DentVLM, OralGPT) demonstrate strongest performance on complex multimodal tasks. Integrated pipelines consistently outperform single-model approaches. A data asymmetry is observed: dental-specific pretraining concentrates almost entirely in the vision domain, reflecting scarce large-scale dental text corpora. Conclusions: General-purpose and dental-specific models play complementary roles; the most effective systems combine both within structured pipelines. Safe autonomous deployment requires resolving three persistent barriers: hallucination in generative models, limited annotated dental datasets, and absent standardized clinical evaluation benchmarks.

2606.02886 2026-06-04 cs.LG cs.AI cs.CE math.PR physics.ao-ph (updated version)

Scalable Uncertainty Quantification for Extreme Weather Forecasting via Empirical Neural Tangent Kernels

Jose Marie Antonio Miñoza, Rex Gregor Laylo, Sebastian C. Ibañez

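Affiliations: Center for AI Research; Department of Education; Makati Philippines

AI summary: This paper proposes an uncertainty quantification method based on neural tangent kernels that uses last-layer empirical features and, through a variance-collapse mechanism and a decomposed performance analysis, produces adaptive prediction intervals for extreme weather without retraining.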

Comments Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '26)

DOI: 10.1145/3770855.3818106

AutoMedBench: Toward Automated Medical Research with Agentic AI Models

Junqi Liu, Selena Song, Yuhan Wang, Jiawei Mao, Hardy Chen, Xiaoke Huang, Tianhao Qi, Pengfei Guo, Yucheng Tang, Yufan He, Can Zhao, Andriy Myronenko, Dong Yang, Daguang Xu, Yuyin Zhou

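Affiliations: University of California, Santa Cruz; NVIDIA

AI summary: Proposes AutoMedBench, a workflow-aware benchmark that evaluates the behavior of autonomous agents in medical-AI research through a five-stage workflow (Plan, Setup, Validate, Inference, Submit). It finds that Validate is the weakest stage and Setup the strongest, and that verification and submission failures dominate.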

Abstract

Autonomous agents are increasingly expected to support end-to-end medical-AI research workflows, moving beyond isolated prediction tasks or short-form clinical question answering. However, existing medical agent benchmarks primarily evaluate final outputs, providing limited visibility into agent behavior within the research process. To address this gap, we present AutoMedBench, a workflow-aware benchmark for autonomous medical-AI research across diverse medical imaging and multimodal inference tasks, organizing agent execution into a unified five-stage workflow (S1-S5): Plan, Setup, Validate, Inference, and Submit. It comprises long-horizon tasks with each run averaging 33 agent turns, spanning five research tracks: segmentation, image enhancement, visual question answering (VQA), report generation, and lesion detection. Each task is evaluated under two difficulty tiers, Lite and Standard, which use the same data and metrics but differ in the amount of task-brief scaffolding, and each run is scored using both final task performance and S1-S5 stage scores, enabling stage-level analysis from the initial task brief to the final submitted artifact. Across thousands of recorded runs, stage-level scoring reveals that Validate is the weakest workflow stage on average, whereas Setup is the strongest, suggesting that current agents are better at making pipelines executable than at verifying their reliability. Post-run error analysis further shows that verification and submission failures dominate tagged errors, accounting for 37.7% and 38.1% of fired codes respectively, whereas task-understanding errors are rare at 0.9%, and runs with one fired error code have a 48% lower overall score than runs with no error code on average.

2606.01770 2026-06-04 cs.LG cs.AI (updated version)

Adaptive Auto-Harness: Sustained Self-Improvement for Agentic System Deployment on Open-Ended Task Streams

Zewen Liu, Zhan Shi, Yisi Sang, Bing He, Minhua Lin, Tianxin Wei, Dakuo Wang, Benoit Dumoulin, Wei Jin, Hanqing Lu

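Affiliations: Emory University; Amazon; The Pennsylvania State University; UIUC; Northeastern University

AI summary: Proposes Adaptive Auto-Harness, which addresses the performance degradation of auto-harness systems on open-ended task streams through a stateful multi-agent evolver, a harness tree with solve-time routing, and a human-steering mechanism, and outperforms existing baselines on multiple streams.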

Abstract

Auto-harness systems such as A-Evolve, GEPA, and Meta-Harness improve LLM agents by optimizing prompts, skills, tools, memories, and supporting infrastructure from execution feedback, but they are typically evaluated on fixed offline benchmarks. Real deployments instead present open-ended task streams: histories grow without a fixed endpoint, heterogeneous tasks require different harnesses, and problem distributions shift over time. These challenges make a single repeatedly and densely updated harness brittle, causing performance degradation as accuracy peaks early and then declines. This motivates sustained harness construction with task-wise adaptation. We introduce Adaptive Auto-Harness, a framework and system for such streams. The framework decomposes the gap to an oracle harness into evolution loss and adaptation loss. The system addresses these losses with a stateful multi-agent evolver, a harness tree with solve-time routing, and human-steering hooks for cases where history lacks the needed signal. Across prediction-market, security-competition, and event-forecasting streams, Adaptive Auto-Harness outperforms five existing auto-harness baselines, and ablations attribute gains to better construction, routing, or targeted human steering. Code is available at https://github.com/A-EVO-Lab/a-evolve/tree/release/adaptive-auto-harness.

2606.01212 2026-06-04 cs.CL cs.AI cs.CR cs.IR (updated version)

SkyShield: Occupancy as a Safety Interface for Low-Altitude UAV Autonomy

Jie Gao, Jie Ma, Kaihui Lin, Kai Ye, Miaohui Zhang, Pingyang Dai, Liujuan Cao

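Affiliations: Xiamen University; Jiangxi Academy of Sciences

AI summary: Addressing 3D spatial understanding for low-altitude UAV autonomy, proposes SkyShield, the first front-view monocular semantic occupancy benchmark, the dynamics-aware metric KAR-mIoU, and the geometry-first baseline SkyOcc, treating occupancy as a safety interface.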

Abstract

For low-altitude Unmanned Aerial Vehicle (UAV) autonomy, 3D spatial understanding is not merely a perception objective, but the safety interface between human instructions and physical flight. In human-scale urban airspace below 20 meters, thin geometry, occlusions, vegetation, and urban clutter define whether an aerial agent can safely enter the space ahead. However, existing UAV datasets mainly provide 2D annotations or 3D boxes, while driving-oriented occupancy benchmarks assume stable ground-level sensor rigs. Both miss the defining regime of low-altitude flight: a front-facing monocular camera observing occupied and free space from a moving aerial body with frame-wise changing 6-DoF pose and camera extrinsics. To bridge this gap, we introduce SkyShield, to the best of our knowledge the first front-view monocular semantic occupancy benchmark for urban UAV flight below 20 meters. Built on CARLA, SkyShield contains 36K front-view UAV samples across diverse urban scenes and weather conditions, pairing each image with frame-wise 6-DoF UAV pose, frame-wise dynamic camera geometry, UAV states, and front-frustum semantic occupancy labels. We further propose KAR-mIoU, a UAV-centric and dynamics-aware metric that re-weights voxel-level evaluation by kinematic reachability and time-to-collision, revealing safety-critical risks hidden by conventional mIoU. To tackle this challenging new setting, we provide SkyOcc, a geometry-first monocular baseline that integrates frame-wise UAV attitude into projection, fuses temporal occupancy features, and applies safety-prior optimization to preserve sparse collision-critical structures. Together, SkyShield, KAR-mIoU, and SkyOcc establish occupancy as a safety interface for low-altitude aerial autonomy. Code and dataset will be released publicly.

2606.00732 2026-06-04 cs.AI cs.LG (updated version)

SHARP: Sleep-based Hierarchical Accelerated Replay for Long Range Non-Stationary Temporal Pattern Recognition

Jayanta Dey, Shikhar Srivastava, Itamar Lerner, Christopher Kanan, Dhireesha Kudithipudi

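Affiliations: Department of Computer Engineering, University of Texas at San Antonio, USA; Department of Computer Science, University of Rochester, USA; Department of Psychology, University of Texas at San Antonio, USA

AI summary: Proposes the SHARP framework, which decomposes temporal learning into a memory module and a pattern-recognition module and adds offline sleep phases that replay temporally structured memories in accelerated form, enabling efficient learning of long-range non-stationary sequence patterns.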

Abstract

Learning long-range non-stationary temporal patterns remains a core challenge for modern sequence models, particularly in strict streaming settings. In these settings, data arrive sequentially and must be processed in a single pass without simultaneously revisiting past observations. Standard architectures, including recurrent neural networks and transformers, are constrained by either truncated backpropagation through time horizon or explicit input window length for long range credit assignment. To address these limitations, we propose SHARP (Sleep-based Hierarchical Accelerated Replay), a framework that decomposes temporal learning into two complementary components: a memory module that accumulates a structured history of past inputs, and a pattern-recognition module that operates over this memory. This separation enables resource- and compute-efficient adaptation to non-stationary dynamics by eliminating the need for backpropagation through time across many steps for long-range credit assignment. Inspired by the accelerated replay observed in rodents during slow-wave sleep, SHARP incorporates offline (sleep) phases in which temporally structured memory traces are replayed in an accelerated form and integrated into higher-level memory representations, improving long-range context retention. Through controlled simulations and ablation studies, we characterize the key properties of the proposed framework. In benchmark datasets such as text8 and PG-19, we demonstrate that SHARP improves over recurrent baselines by retaining next-token predictive performance on previously seen data while continuing to learn from the current stream and generalizing to future unseen data. These gains are enabled by its hierarchical structure, which yields an exponentially increasing effective temporal context with only linear-time computational cost.

2606.00012 2026-06-04 cs.CL cs.AI (updated version)

DraDDP: A Multimodal Multi-Party Dialogue Discourse Parsing Dataset

Shannan Liu, Peifeng Li, Yaxin Fan, Qiaoming Zhu

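Affiliations: School of Computer Science and Technology, Soochow University

AI summary: Addressing the limitation of prior work to text-only or two-party dialogue, constructs DraDDP, the first publicly available English multimodal multi-party dialogue discourse parsing dataset, based on American TV dramas, and verifies the value of multimodal information for capturing dialogue structures and relation types.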

Journal ref: Findings of the Association for Computational Linguistics (ACL 2026)

Abstract

Multi-party dialogue discourse parsing aims to identify dependency structures and relation types between utterances in conversations. Previous studies are mostly limited to textual modality or two-party dialogue, failing to meet the multimodal and multi-party settings. In this paper, we construct the first publicly available English multimodal dataset DraDDP for multi-party dialogue discourse parsing, based on American TV dramas. DraDDP contains 495 dialogue segments with 6,374 utterances and 9.1 hours of parallel video content, covering rich multi-party interaction scenarios. Moreover, we establish comprehensive benchmarks by evaluating this task on DraDDP and conducting in-depth analysis on the impact of different modalities. Experimental results demonstrate the value of multimodal information in capturing dialogue structures and relation types. We will publicly release the dataset, annotation guidelines, and code to promote future research in multimodal dialogue understanding.

2605.31483 2026-06-04 cs.CL cs.AI (updated version)

BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Large Language Models on Bengali

Shefayat E Shams Adib, Ahmed Alfey Sani, Ekramul Alam Esham, Ajwad Abrar, Ishmam Tashdeed, Md Taukir Azam Chowdhury

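Affiliations: Department of Computer Science and Engineering, Islamic University of Technology; Department of Computer Science and Engineering, University of California

AI summary: Addressing the gap in hallucination evaluation of large language models for Bengali, proposes the BenHalluEval framework covering four tasks, constructs 12,000 hallucinated candidates, and introduces the dual-track calibration metric BenHalluScore, revealing substantial variation in hallucination calibration across models.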

Comments Preprint. Under review

Abstract

Despite Bengali being the sixth most spoken language in the world, no prior work has systematically evaluated hallucination in large language models (LLMs) for Bengali. We introduce BenHalluEval, a fine-grained hallucination evaluation framework for Bengali covering four tasks: Generative Question Answering (GQA), Bangla-English Code-Mixed QA, Summarization, and Reasoning. We construct 12,000 hallucinated candidates using GPT-5.4 across twelve task-specific hallucination types, drawn from three existing Bengali datasets, and evaluate seven LLMs spanning reasoning-oriented, multilingual, and Bengali-centric categories under a dual-track protocol that independently measures false-positive rate on ground-truth instances (Track A) and hallucination detection rate on hallucinated candidates (Track B). To jointly penalise both failure modes and prevent inflated scores from uniform response bias, we propose BenHalluScore, a dual-track calibration metric that ranges from 7.72% to 55.42% across models and tasks, revealing substantial variation in hallucination calibration. Chain-of-thought prompting, applied as a mitigation strategy, shifts response distributions without consistently improving hallucination discrimination. BenHalluEval establishes the first dedicated hallucination benchmark for Bengali and highlights the inadequacy of single-track and prompting-only evaluation approaches for low-resource language settings. The dataset and code are available at https://anonymous.4open.science/r/BanglaHalluEval-EB77.

2605.28210 2026-06-04 cs.AI cs.CY cs.HC q-bio.NC (updated version)

The Illusion of Opting in AI-Mediated Consequential Decisions

Eugene Yu Ji

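Affiliations: GitHub

AI summary: Drawing on Ullmann-Margalit's concept of opting, shows that current AI systems create an "illusion of opting", in which seemingly meaningful consequential choices in fact weaken the agent's genuine capacity to choose, and proposes protecting and cultivating meta-capacity through three normative imperatives: existential honesty, ecological rationality, and counterfactual reparation.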

Comments 11 pages, 1 figure, 2 tables

Abstract

Drawing on Ullmann-Margalit's concept of opting (transformative, irrevocable, and shadowed by foreclosed alternatives), we show that current AI systems raise a profound ethical problem that existing AI ethics has not fully captured: the illusion of opting, in which persons and groups encounter the deceptive appearance of meaningful consequential choice while the agency needed to become genuinely capable of choosing is weakened. Against approaches that treat AI primarily as an optimizer of already given ends, we argue that AI systems should be evaluated by whether they protect and cultivate meta-capacity against the illusion of opting: the socially and institutionally scaffolded agentive capacity through which means and ends can be formed, contested, revised, and owned. This reframing is especially urgent for disadvantaged populations, who are least able to absorb the costs of the illusion of opting when AI-mediated pathways misdirect behavior and action. We propose three normative imperatives for AI-mediated consequential decisions: existential honesty, which acknowledges the limits of prediction; ecological rationality, which situates guidance within heterogeneous lived ecologies; and counterfactual reparation, which acknowledges and repairs foreclosed alternatives when AI-mediated decision-making pathways fail.

2605.24358 2026-06-04 cs.LG cs.AI (updated version)

Treatment Effect Estimation with Differentiated Networked Effect on Graph Data

Xiaofeng Lin, Han Bao, Hisashi Kashima

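Affiliations: Kyoto University; The Institute of Statistical Mathematics; Tohoku University; RIKEN AIP

AI summary: Addressing individual treatment effect estimation on graph data, where neighbors interfere and networked effects are differentiated, proposes an interference modeling method that combines partial attention mechanisms with a message amplifier to capture differences in neighbor importance and scale, improving estimation accuracy.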

Comments Accepted by the research track of the KDD 2026 conference

Abstract

Estimating individual treatment effect (ITE) from observational graph data is crucial for decision-making in fields such as commerce and medicine. This task is challenging due to interference, where individual outcomes can be influenced by the treatments and covariates of their neighbors. Existing methods attempt to model such interference for accurate ITE estimation. However, a critical issue is often overlooked: differentiated networked effect (DNE), an effect caused by local networks consisting of neighbors with varying importance and scales. Capturing DNE is vital; otherwise, we will end up with imprecise ITE estimation due to an erroneous characterization of interference, which can result in misguided decisions. To address this challenge, we propose a novel interference modeling mechanism that incorporates two partial attention mechanisms and a message amplifier. The partial attention mechanisms automatically estimate the importance of different neighbors in contributing to interference, while the message amplifier adjusts the results of the interference modeling mechanism based on the scale of neighbors, all of which enables the model to capture DNE. Experiments on three real-world graphs demonstrate that our methods outperform existing approaches for ITE estimation from graph data, which corroborates the importance of explicitly capturing DNE.

2605.27488 2026-06-04 cs.CR cs.AI (updated version)

Grimlock: Guarding High-Agency Systems with eBPF and Attested Channels

Qiancheng Wu, Wenhui Zhang, Gan Fang, Sheng Mao, Biao Gao, David Levitsky, Shawna Murphy Butterworth, Rob Cameron

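Affiliations: Roblox

AI summary: Addressing the security challenges of user-authored orchestration code in agentic systems, proposes Grimlock, an Agent Guard that achieves transparent, auditable, scope-bound agent-to-agent communication through eBPF-enforced traffic interception and attestation bound to TLS 1.3 channel bindings.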

Comments Vision paper presented at the 1st Workshop on Operating Systems Design for AI Agents (AgenticOS '26), co-located with ASPLOS 2026

Abstract

Agentic systems increasingly run user-authored orchestration code that invokes tools, spawns subtasks, and delegates work across machines and clouds. Although this high agency is productive, it creates a security problem: identity, authorization, provenance, and delegation are often pushed into application code, where they become difficult to enforce consistently and difficult to audit. We present Grimlock, an Agent Guard that restores separation of concerns by moving trust enforcement into the sandbox substrate while leaving agent code unchanged. Grimlock uses eBPF-enforced traffic interception to ensure that sandbox communication passes through a guard, and combines it with post-handshake attestation bound to standard TLS 1.3 channel bindings. After a channel is established, the guard authorizes communication and mints short-lived, channel-bound scope tokens that capture least-privilege delegation. At the receiving side, the destination guard re-validates identity, scope, and channel binding, terminates TLS, and releases plaintext to the destination sandbox only after policy checks succeed. kTLS provides an efficient dataplane for protected communication. As a result, Grimlock offers a path toward transparent, auditable, and scope-bound agent-to-agent communication across heterogeneous multi-cloud environments, using commodity Linux primitives and without requiring changes to user-layer orchestration code.

2605.30120 2026-06-04 cs.IR cs.AI cs.LG (updated version)

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

Lixuan Guo, Yifei Wang, Tiansheng Wen, Aosong Feng, Stefanie Jegelka, Chenyu You

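Affiliations: University of California, Berkeley; Stanford University

AI summary: Addressing the indexing latency and semantic loss caused by K-means clustering in multi-vector retrieval, proposes Single-stage Sparse Retrieval (SSR), which uses a sparse autoencoder to project token embeddings into high-dimensional sparse representations and combines them with inverted indexing for efficient retrieval. On the BEIR benchmark it reduces indexing time by 15x, halves retrieval latency, and improves retrieval performance.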

Comments Accepted by ICML2026

Abstract

Multi-vector retrieval (MVR) models, exemplified by ColBERT, have established new benchmarks in retrieval accuracy by preserving fine-grained token-level interactions. However, this granularity imposes prohibitive storage and retrieval efficiency bottlenecks: to manage the immense memory footprint and computational overhead of billion-scale token vectors, state-of-the-art systems are forced to rely on aggressive dimension reduction and complex clustering (e.g., K-means). This compromise introduces two critical limitations: excessive indexing latency of clustering large-scale corpora and semantic information loss inherent to compression. In this paper, we propose Single-stage Sparse Retrieval (SSR), a paradigm shift that replaces expensive clustering with efficient sparse coding. Instead of compressing features into low-dimensional dense vectors, we utilize a Sparse Autoencoder (SAE) to project token embeddings into a high-dimensional but highly sparse representation. This transformation enables us to bypass vector clustering entirely and leverage inverted indexing for precise, high-throughput retrieval. Extensive experiments on the BEIR benchmark demonstrate that SSR achieves a "trifecta" of improvements: it reduces indexing time by 15x compared to ColBERTv2, halves retrieval latency, and simultaneously improves retrieval performance over leading baselines.
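To make the idea of an inverted index over sparse token codes concrete, here is a minimal sketch in plain Python. It is not the authors' implementation. The helper `top_k_sparsify` is a hypothetical stand-in for the SAE's sparse activations, the sum-of-max scoring in `search` is an assumed ColBERT-style late interaction that the abstract does not confirm, and all names and numbers are illustrative.

```python
from collections import defaultdict


def top_k_sparsify(vec, k):
    """Hypothetical stand-in for an SAE encoder: keep the k largest
    positive activations of a dense vector as {dimension: weight}."""
    ranked = sorted(enumerate(vec), key=lambda p: p[1], reverse=True)[:k]
    return {dim: w for dim, w in ranked if w > 0}


def build_inverted_index(docs):
    """docs maps doc_id -> list of sparse token codes ({dim: weight}).
    Returns dim -> list of (doc_id, token_idx, weight) postings."""
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for t_idx, code in enumerate(tokens):
            for dim, w in code.items():
                index[dim].append((doc_id, t_idx, w))
    return index


def search(index, query_tokens):
    """Assumed late-interaction scoring: for each query token take the
    best-matching document token (max dot product), then sum over
    query tokens. Only postings of active dimensions are visited."""
    scores = defaultdict(float)
    for q in query_tokens:
        dots = defaultdict(float)  # (doc_id, token_idx) -> dot product
        for dim, qw in q.items():
            for doc_id, t_idx, dw in index.get(dim, ()):
                dots[(doc_id, t_idx)] += qw * dw
        best = {}
        for (doc_id, _), s in dots.items():
            best[doc_id] = max(best.get(doc_id, 0.0), s)
        for doc_id, s in best.items():
            scores[doc_id] += s
    return sorted(scores.items(), key=lambda p: p[1], reverse=True)


# Illustrative toy corpus: two documents, two token codes each.
docs = {
    "d1": [{0: 1.0, 3: 0.5}, {7: 2.0}],
    "d2": [{3: 1.0}, {9: 1.0}],
}
index = build_inverted_index(docs)
ranking = search(index, [{0: 1.0}, {3: 1.0}])
```

Because each code has only a few active dimensions, a query touches only the posting lists of those dimensions, which is why no clustering step is needed to narrow the candidate set in this sketch.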

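2605.29928 2026-06-04 cs.HC cs.AI (updated version)

Structured Prompt Optimization Combined with Reinforcement Learning for Global and Local Explainability on Complex Text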

Tianyang Zhou, Wenbo Chen, Pierre Jinghong Liang, Leman Akoglu

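Affiliations: Carnegie Mellon University; Amazon

AI summary: Proposes the eXTC framework, which through structured prompt optimization, SOP-grounded reasoning distillation, and reinforcement-learning-based expansion significantly outperforms existing paradigms in classification performance and explanation quality.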

Abstract

LLMs have advanced text classification, yet existing paradigms face a trade-off: supervised (label only) fine-tuning is scalable but offers limited reasoning on complex text and lacks broader model transparency, while discrete prompt optimization offers human-readable instructions but struggles with performance and scalability. We introduce eXTC (eXplainable Text Classifier) with three progressive stages: (1) learning a Standard Operating Procedure (SOP, or rulebook) in natural language via a new Structured Prompt Optimization algorithm; (2) SOP-grounded reasoning distillation from a large teacher LLM into a compact LM; and (3) expanding reasoning capabilities beyond the initial SOP via reinforcement learning. This design enables eXTC to provide (i) fast inference via a compact LM, with (ii) inference-time local reasoning traces, alongside a global, modular explanation of its learned domain rules, while (iii) significantly outperforming existing paradigms across diverse benchmarks in both classification performance and explanation quality, with stage-by-stage gains.

2605.28829 2026-06-04 cs.CL cs.AI cs.CY (updated version)

Correcting Attention-Distraction-Induced Visual Blur to Reduce Hallucination: Algorithm and Theory

Quanjiang Li, Zhiming Liu, Wei Luo, Tingjin Luo, Chenping Hou

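Affiliations: National University of Singapore; University of Science and Technology of China

AI summary: This paper reveals that object hallucination in multimodal large language models is associated with a human-like attention distraction phenomenon, and proposes a training-free attention-focused method (AFIP) that corrects visual blur through cross-head attention enrichment and dynamic historical attention enhancement, thereby reducing hallucination.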

Journal ref: ICML2026

Abstract

Multimodal large language models (MLLMs) frequently suffer from object hallucinations, yet the visual perceptual mechanism underlying this failure remains poorly understood. In this work, we reveal that hallucinations are strongly associated with a human-like attention distraction phenomenon, where humans under divided focus experience degraded visual clarity and produce inaccurate descriptions, while in models the same mechanism manifests as spatial inconsistency in multi-head attention and temporal fading of attention to image tokens during decoding. We further provide theoretical insights that attention dispersion increases model complexity and degrades classification generalization. Motivated by these findings, we propose an Attention-Focused Approach for Improved Image Perception (AFIP), which corrects attention distraction via cross-head attention enrichment and reinforces visual grounding through dynamic historical attention enhancement. Extensive experiments on multiple benchmarks and models validate the effectiveness of AFIP without additional training.

2605.17273 2026-06-04 cs.LG cs.AI updated version

Position: State-of-the-Art Claims Require State-of-the-Art Evidence

YongKyung Oh

Affiliations: YongKyung Oh

AI summary: This paper identifies a widespread gap between state-of-the-art (SOTA) claims and the evidence behind them in AI and machine learning research. Analyzing ten cross-domain benchmarks, it finds that in more than half of top-model comparisons at least one commonly assumed property of superiority does not hold, and it calls for claim language that reflects the strength of the evidence.

Abstract

State-of-the-Art (SOTA) claims pervade Artificial Intelligence (AI) and Machine Learning (ML) research. These claims rest on benchmark evaluations, where models are ranked by aggregate scores across tasks. Public benchmarks or leaderboards are the most visible instance, but the same structure appears in paper tables throughout the literature. However, such minimal evidence often cannot support these strong claims. We identify a widespread claim-evidence gap in AI benchmarking. Claiming SOTA carries implicit assumptions beyond mean score superiority, suggesting that a model meaningfully outperforms alternatives across most tasks. However, a marginal improvement in the mean score merely indicates a top average rank rather than true superiority. Analyzing ten cross-domain benchmarks from public leaderboards, we found that in more than half of top-model comparisons, at least one commonly assumed property of superiority does not hold. These properties include meaningful effect size, consistency across tasks, or robustness to dataset removal. Instead, aggregate gains are frequently driven by outlier datasets. This fragility persists even in benchmarks with many tasks. We argue that claim language should reflect the strength of the underlying evidence. This requires no additional experiments, only honest reporting of what results actually show, enabling more precise and interpretable comparisons across models.
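The robustness-to-dataset-removal property mentioned in the abstract can be illustrated with a minimal sketch. The scores below are made-up numbers, and the function names (`mean_gap`, `wins`, `fragile_datasets`) are hypothetical helpers, not the paper's analysis code. The sketch shows how a model can lead on the mean while winning on only one dataset, so that removing that single dataset flips the ranking.

```python
from statistics import mean

# Hypothetical per-dataset scores for two models (illustrative numbers only).
scores_a = {"d1": 0.71, "d2": 0.69, "d3": 0.70, "d4": 0.95}
scores_b = {"d1": 0.73, "d2": 0.71, "d3": 0.72, "d4": 0.75}


def mean_gap(a, b, skip=None):
    """Mean score of model A minus model B, optionally leaving one dataset out."""
    keys = [k for k in a if k != skip]
    return mean(a[k] for k in keys) - mean(b[k] for k in keys)


def wins(a, b):
    """Number of datasets on which A scores strictly higher than B."""
    return sum(a[k] > b[k] for k in a)


def fragile_datasets(a, b):
    """Datasets whose removal flips the sign of the aggregate gap."""
    full = mean_gap(a, b)
    return [k for k in a if mean_gap(a, b, skip=k) * full < 0]


print("aggregate gap:", round(mean_gap(scores_a, scores_b), 4))
print("datasets won by A:", wins(scores_a, scores_b), "of", len(scores_a))
print("removal flips ranking:", fragile_datasets(scores_a, scores_b))
```

In this toy case model A has the higher mean yet wins on one of four datasets, which is the kind of outlier-driven gain the abstract describes.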

2605.22240 2026-06-04 cs.AI updated version

Unlocking Proactivity in Task-Oriented Dialogue

Azure Zhang, Ning Gao, Yuqin Dai, Ruiyuan Wu, Jinpeng Wang, Rena Wei Gao, Bingdong Tan, Shuzheng Gao, Zongjie Li, Chaozheng Wang

Affiliations: Keeta AI, Meituan; Independent Researcher; CUHK; HKUST

AI summary: To address the lack of proactivity in task-oriented dialogue, the paper proposes a Cognitive User Simulator and Simulator-Induced Asymmetric-View Policy Optimization, which enable proactive dialogue by modeling users' latent concerns.

Abstract

Proactive task-oriented dialogue (TOD), such as outbound sales, demands a persuasive agent that actively probes the user's concerns and steers the conversation toward acceptance within a bounded number of turns. Yet post-trained LLMs are inherently conservative, and reward-shaping RL (e.g., GRPO) struggles since it only re-weights what an already passive policy samples. We show that conditioning on the user's latent concerns unlocks proactive capability that no amount of sampling can undermine, establishing these concerns as a pivotal training-time signal. To operationalize this finding, we build the Cognitive User Simulator, which models each user as a stratified persona comprising observable external traits and hidden internal concerns. The simulator produces faithful and diverse interactions, while emitting per-turn state dynamics that track persuasion progress. We then introduce Simulator-Induced Asymmetric-View Policy Optimization, which converts the modeled concerns and the simulation state transition into complementary training objectives: (1) Asymmetric On-Policy Self-Distillation that transfers concern-aware behavior from a privileged view of the same policy into its deployable, conversation-only view; and (2) State-Transition Policy Refinement ...

2605.21446 2026-06-04 cs.RO cs.AI updated version

ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models

Yujie Lin, Chengyi Yang, Zhishang Xiang, Yiping Song, Jinsong Su

Affiliations: University of Science and Technology of China

AI summary: Proposes ZeroUnlearn, a framework that uses model editing to recast machine unlearning as a precise knowledge re-mapping problem and achieves efficient, targeted few-shot unlearning through a multiplicative parameter update with a closed-form solution.

Abstract

Large language models inevitably retain sensitive information, defined as inputs that may induce harmful generations, due to training on massive web corpora, raising concerns for privacy and safety. Existing machine unlearning methods primarily rely on retraining or aggressive fine-tuning, which are either computationally expensive or prone to degrading related knowledge and overall model utility. In this work, we reformulate machine unlearning as a precise knowledge re-mapping problem via model editing. We propose ZeroUnlearn, a few-shot unlearning framework. It overwrites sensitive inputs by mapping them to a neutral target state and removing their original representations. ZeroUnlearn enforces representational orthogonality through a multiplicative parameter update with a closed-form solution, enabling efficient and targeted unlearning. We further extend ZeroUnlearn to a gradient-based variant for multi-sample unlearning. Experiments demonstrate that our approach outperforms existing baselines while preserving general model utility. Our code is available on GitHub: https://github.com/XMUDeepLIT/ZeroUnlearn.

2605.19294 2026-06-04 cs.RO cs.AI updated version

DEFLECT: Temporal Counterfactual Preference Learning for Delay-Robust Asynchronous VLAs

Yixiang Zhu, Yonghao Chen, Zijie Yang, Yusong Hu, Xinyu Chen

Affiliations: The Hong Kong University of Science and Technology (Guangzhou); One Robotics

AI summary: To address the prediction-execution mismatch caused by stale observations in asynchronous vision-language-action (VLA) policies, the paper proposes DEFLECT, an offline post-training framework that uses counterfactual preference supervision to learn to favor execution-time-aligned actions. It needs no human annotation, online rollouts, or architectural changes, and significantly improves task success under high latency.

Abstract

Vision-Language-Action (VLA) policies increasingly rely on asynchronous inference to hide large-model latency behind ongoing robot motion. While this avoids the stop-and-go behavior of synchronous action-chunk execution, it creates a prediction-execution mismatch: the next chunk is computed from a stale observation at inference start but executed only after the robot and scene have evolved. As a result, actions that fit the prediction-time state can become misaligned with the execution-time state. Existing runtime repair, behavior-cloning, and preference-alignment approaches do not directly teach the policy to resolve this stale-input mismatch. We propose DEFLECT, an offline post-training framework for delay-robust asynchronous VLAs. DEFLECT converts latency-induced mismatch into counterfactual preference supervision: a frozen reference VLA generates a preferred chunk from the future execution-time observation and a rejected chunk from the stale prediction-time observation. The trainable policy scores both chunks under the same deployment-time input, learning to favor execution-time-aligned actions while a supervised fine-tuning anchor preserves the expert action manifold. DEFLECT requires no human preference labels, reward models, online robot rollouts, architectural changes, or additional inference-time computation. Across Kinetix, LIBERO, and three real-robot tasks, DEFLECT improves delay robustness over strong asynchronous VLA baselines, raising high-latency success by up to 6.4 percentage points and achieving a 4.6 percentage-point gain at the longest delay on a real-scale VLA.

2605.18931 2026-06-04 stat.ML cs.AI cs.LG updated version

Markov Chain Decoders Overcome the Heavy-Tail Limitations of Lipschitz Generative Models

Abdelhakim Ziani, Andras Horvath, Paolo Ballarini

Affiliations: Université Paris Saclay, Lab. MICS, CentraleSupélec, Gif-sur-Yvette, France; Università di Torino, Torino, Italy

AI summary: To address the inability of Lipschitz generative models to produce heavy-tailed distributions, the paper replaces the Gaussian decoder with a Markov-chain-based Phase-Type distribution, significantly reducing tail error and extreme quantile error.

Journal ref: 22nd European Performance Engineering Workshop (EPEW 2026), Jun 2025, Grimstad, Norway

Abstract

Heavy-tailed distributions are prevalent in performance evaluation, network traffic, and risk modeling. This behavior poses a fundamental challenge for modern deep generative models. Standard Variational Autoencoders (VAEs) employ Gaussian decoder likelihoods and Lipschitz-constrained neural networks, a combination that is structurally incapable of producing heavy-tailed outputs: the Gaussian tail decays exponentially, and Lipschitz continuity prevents the decoder from amplifying rare events from the latent space input enough to overcome this decay. We provide both a theoretical characterization of this limitation and a controlled empirical demonstration using synthetic Pareto data across a grid of tail indices $\alpha \in \{2, 3, 5, 30\}$ and dimensions $d \in \{1, 5, 10\}$. As a solution, we replace the Gaussian decoder with a Phase-Type (PH) distribution based on Markov chains, while keeping the encoder, latent space, and training procedure identical. PH distributions allow for arbitrarily precise approximations of any positive-valued distribution, including heavy-tailed families. Experiments show that, for heavy-tailed data, the PH-based model reduces the tail Kolmogorov-Smirnov distance by up to a factor of 6 and the extreme quantile error by up to a factor of 10 compared to the Gaussian baseline. These results demonstrate that integrating Markov chain-based distributions into the decoder of a generative model provides a principled and practically effective solution to the heavy-tail generation problem.
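The tail argument in the abstract can be made concrete with a small numerical sketch. It is an illustration under simplifying assumptions, not the paper's analysis: a scalar standard-normal latent, a decoder mean that is L-Lipschitz, and no decoder noise. Under those assumptions |f(z) - f(0)| <= L|z|, so the output tail is bounded by a rescaled Gaussian tail, which falls far below a Pareto tail. The function names are hypothetical.

```python
import math


def gaussian_survival(x, mu=0.0, sigma=1.0):
    """P(X > x) for a Gaussian, via the complementary error function."""
    return 0.5 * math.erfc((x - mu) / (sigma * math.sqrt(2.0)))


def pareto_survival(x, alpha, x_min=1.0):
    """P(X > x) for a Pareto distribution with tail index alpha and scale x_min."""
    return 1.0 if x < x_min else (x_min / x) ** alpha


def lipschitz_tail_bound(t, lipschitz_const):
    """Upper bound on P(|f(z) - f(0)| > t) for z ~ N(0, 1) and an L-Lipschitz f.

    Assumption: scalar latent, so |f(z) - f(0)| <= L * |z| and the event
    implies |z| > t / L, whose probability is erfc(t / (L * sqrt(2))).
    """
    return math.erfc(t / (lipschitz_const * math.sqrt(2.0)))


# Compare tails at a few thresholds (alpha = 2 is the heaviest tail in the abstract's grid).
for x in (5.0, 10.0, 50.0):
    print(x, pareto_survival(x, alpha=2.0), gaussian_survival(x),
          lipschitz_tail_bound(x, lipschitz_const=5.0))
```

Even with a fairly large Lipschitz constant, the bound at large thresholds is many orders of magnitude below the Pareto survival probability, which is the structural gap the abstract points to.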

2605.16331 2026-06-04 q-bio.BM cs.AI updated version

Retrieval and competition: how a protein foundation model starts a protein

Piotr Jedryszek, Oliver M. Crook

Affiliations: Department of Biology, University of Oxford, Oxford, UK; Kavli Institute for Nanoscience Discovery, University of Oxford, Oxford, UK; Department of Chemistry, University of Oxford, Oxford, UK

AI summary: By tracing the computational pathway through which ESM2-8M predicts the initiator methionine of proteins, the paper finds that the model relies on positional-prior retrieval rather than direct recognition, revealing a disconnect between model confidence and biological evidence.

Comments: updated figure 4

Abstract

Protein language models are increasingly used to guide experimental and clinical decisions, yet it is often unclear whether a confident prediction reflects recognition of biological evidence or retrieval of a statistical default. We examine this distinction for a near-universal biological rule, that proteins begin with methionine, by tracing the computational pathway through which ESM2-8M produces this prediction. The model does not detect methionine at the masked position. Instead, it retrieves a methionine-favouring signal from a reference representation at the beginning-of-sequence token via a position-specific query assembled across layers, with the final output emerging through competition with context-dependent circuits. To understand how positional information reaches the readout, we introduce a norm-direction decomposition of attention scores within rotary frequency bands. Positional encoding operates through coupled changes in query norm and angular alignment distributed across these bands. On sequences whose true N-terminus is not methionine, where the biological question matters, the model predicts methionine anyway. This is not a correct prediction produced by an unexpected mechanism, but the output of a positional-prior retrieval circuit that matches the statistical average and fails where biology diverges from it. Distinguishing the two requires resolution at the level of individual circuits, frequency bands, and query composition, suggesting that mechanistic verification will be necessary, and challenging, for predictions where the biological stakes are higher. Even for the simplest biological rule, the model's prediction is mediated by a distributed computational circuit rather than direct recognition, suggesting that increasing task complexity will further obscure the relationship between model confidence and underlying biological evidence.

2605.16301 2026-06-04 cs.CY cs.AI cs.LG updated version

Do LLMs Hold Their Values? MANTA: A Multi-Turn Adversarial Benchmark for Animal Welfare Reasoning

Isabella Luong, Joyee Chen, Arturs Kanepajs, Jasmine Brazilek, Sankalpa Ghose, David Williams-King, Linh Le, Allen Lu

Affiliations: SPAR; Compassion Aligned Machine Learning; NUS; Mila; ERA Cambridge

AI summary: Proposes the MANTA benchmark, which uses multi-turn adversarial conversations to evaluate the value stability and moral sensitivity of large language models in animal welfare reasoning, and finds rank changes and species-by-pressure interaction effects that single-turn benchmarks cannot capture.

Abstract

Evaluating animal welfare reasoning in LLMs remains an open challenge despite rapid deployment in consumer and professional contexts where welfare considerations appear implicitly in everyday queries. Existing benchmarks such as AnimalHarmBench evaluate this through single-turn, explicitly framed questions, measuring whether models avoid harmful content when directly asked. This approach overlooks two failure modes: alignment degradation under sustained adversarial pressure, and moral sensitivity (whether a model spontaneously surfaces welfare stakes in everyday queries). To fill this gap, we construct MANTA, a benchmark of 1,088 five-turn conversations progressing from an implicit Turn-1 scenario through an explicit welfare prompt to three adversarial pressure rounds drawn from a five-type taxonomy: Social, Cultural, Economic, Pragmatic, and Epistemic. We score conversations on two dimensions: Animal Welfare Value Stability (AWVS, primary) and Animal Welfare Moral Sensitivity (AWMS, diagnostic). We evaluate seven frontier models: Claude Opus 4.7, GPT-5.5, DeepSeek V4, Llama 3.3 70B, Mistral Small, Grok 4.3, and Gemini 3.1 Flash Lite. Multi-turn evaluation captures behavior single-turn benchmarks miss: 4 of 7 models change rank relative to Turn 1 scores, including Gemini Flash Lite, which drops from fifth on AWMS to last on AWVS. AWMS and AWVS are positively but imperfectly correlated, suggesting moral-recognition tests capture a stable but incomplete component of model behavior under pressure. MANTA also enables a species-by-pressure interaction matrix unavailable to prior benchmarks, showing welfare robustness depends jointly on the animal and pressure applied; companion animals score above wild animals, which score above farmed animals and invertebrates. We release the dataset, scripted pressure plans, judge prompts, and analysis code.

2605.15152 2026-06-04 cs.LG cs.AI updated version

Widening the Gap: Exploiting LLM Quantization via Outlier Injection

Xiaohua Zhan, Kazuki Egashira, Robin Staab, Mark Vero, Martin Vechev

Affiliations: ETH Zurich

AI summary: This paper presents the first quantization-conditioned attack that works against multiple advanced quantization methods (AWQ, GPTQ, GGUF I-quants): injecting outliers causes weights to collapse, inducing malicious behavior once the model is quantized.

Abstract

LLM quantization has become essential for memory-efficient deployment. Recent work has shown that quantization schemes can pose critical security risks: an adversary may release a model that appears benign in full precision but exhibits malicious behavior once quantized by users. However, existing quantization-conditioned attacks have been limited to relatively simple quantization methods, where the attacker can estimate weight regions that remain invariant under the target quantization. Notably, prior attacks have consistently failed to compromise more popular and sophisticated schemes, limiting their practical impact. In this work, we introduce the first quantization-conditioned attack that consistently induces malicious behavior that can be triggered by a broad range of advanced quantization techniques, including AWQ, GPTQ, and GGUF I-quants. Our attack exploits a simple property shared by many modern quantization methods: large outliers can cause other weights to be rounded to zero. Consequently, by injecting outliers into specific weight blocks, an adversary can induce a targeted, predictable weight collapse in the model. This effect can be used to craft seemingly benign full-precision models that exhibit a wide range of malicious behaviors after quantization. Through extensive evaluation across three attack scenarios and LLMs, we show that our attack achieves high success rates against a broad range of quantization methods on which prior attacks fail. Our results demonstrate, for the first time, that the security risks of quantization are not restricted to simpler schemes but are broadly relevant across complex, widely-used quantization methods.
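The property the abstract relies on, that a large outlier can cause other weights in the same block to round to zero, can be seen in a toy example. The sketch below uses plain symmetric absmax round-to-nearest quantization with one scale per block. This is a deliberately simplified stand-in and an assumption of this example; it is not the AWQ, GPTQ, or GGUF I-quant algorithm, and `quantize_block` is a hypothetical helper.

```python
def quantize_block(block, bits=4):
    """Symmetric round-to-nearest quantization with one absmax scale per block.

    Simplified illustration only; real schemes add grouping, calibration,
    and error compensation on top of this basic idea.
    """
    qmax = 2 ** (bits - 1) - 1                      # 7 for signed 4-bit
    scale = max(abs(w) for w in block) / qmax       # largest weight sets the step size
    if scale == 0.0:
        return [0.0 for _ in block]
    return [round(w / scale) * scale for w in block]


weights = [0.05, -0.04, 0.03, 0.06]

# Without an outlier the step size is small and every weight survives.
clean = quantize_block(weights)

# A single large value inflates the step size; the small weights collapse to zero.
with_outlier = quantize_block(weights + [8.0])

print("clean:       ", clean)
print("with outlier:", with_outlier)
```

The same mechanism explains why the step size chosen for a block matters for every weight in it: the largest magnitude sets the resolution for all the others.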

2605.14054 2026-06-04 cs.AI cs.CV updated version

Bad Seeing or Bad Thinking? Rewarding Perception for Multimodal Reasoning

Haozhe Wang, Qixin Xu, Changpeng Wang, Taofeng Xue, Chong Peng, Wenhu Chen, Fangzhen Lin

Affiliations: University of Science and Technology of China

AI summary: Proposes a reinforcement-learning-based Modality-Aware Credit Assignment framework (MoCA) that uses Perception Verification and Structured Verbal Verification to resolve the perception-reasoning trade-off in vision-language models, yielding gains across multiple tasks.

Comments: Accepted by ICML 2026 as Oral

Abstract

Achieving robust perception-reasoning synergy is a central goal for advanced Vision-Language Models (VLMs). Recent advancements have pursued this goal via architectural designs or agentic workflows. However, these approaches are often limited by static textual reasoning or complicated by the significant compute and engineering burden of external agentic complexity. Worse, this heavy investment does not yield proportional gains, and a "seesaw effect" between perception and reasoning is often observed. This motivates a fundamental rethinking of the true bottleneck. In this paper, we argue that the root cause of this trade-off is an ambiguity in modality credit assignment: when a VLM fails, is it due to flawed perception ("bad seeing") or flawed logic ("bad thinking")? To resolve this, we introduce a reinforcement learning framework that improves perception-reasoning synergy by reliably rewarding perceptual fidelity. We explicitly decompose the generation process into interleaved perception and reasoning steps. This decoupling enables targeted supervision on perception. Crucially, we introduce Perception Verification (PV), leveraging a "blindfolded reasoning" proxy to reward perceptual fidelity independently of reasoning outcomes. Furthermore, to scale training across free-form VL tasks, we propose Structured Verbal Verification, which replaces high-variance LLM judging with structured algorithmic execution. These techniques are integrated into a Modality-Aware Credit Assignment (MoCA) mechanism, which routes rewards to the specific source of error -- either bad seeing or bad thinking -- enabling a single VLM to achieve simultaneous performance gains across a wide task spectrum.

2605.13672 2026-06-04 cs.CV cs.AI cs.LG updated version

SpurAudio: A Benchmark for Studying Shortcut Learning in Few-Shot Audio Classification

Giries Abu Ayoub, Morad Tukan, Loay Mualem

Affiliations: Department of Computer Science, University of Haifa; Independent Researcher; University of Stuttgart, Germany; IMPRS-IS, Germany

AI summary: Proposes the SpurAudio benchmark, which controls the association between foreground and background in audio to evaluate how sensitive few-shot classification models are to spurious correlations, and finds that existing methods degrade markedly when the background changes.

Abstract

Few-shot classification (FSC) is widely used for learning from limited labeled data, yet most evaluations implicitly assume that target concepts are independent of contextual cues. In real-world settings, however, examples often appear within rich contexts, allowing models to exploit spurious correlations between foreground content and background signals. While such effects have been studied in few-shot image classification, their role in few-shot audio classification remains largely unexplored, and existing audio benchmarks offer limited control over contextual structure. We introduce SpurAudio, a benchmark that leverages the natural separability of foreground events and background environments in audio to enable controlled, multi-level evaluation of contextual shifts across support and query sets. Using this benchmark, we show that many state-of-the-art few-shot methods suffer severe performance degradation when background correlations are disrupted, despite achieving similar accuracy under standard evaluation protocols. Crucially, this vulnerability persists even in large pretrained audio foundation models, ruling out limited backbone capacity as an explanation. Moreover, methods that appear comparable under conventional benchmarks can exhibit markedly different sensitivity to spurious correlations, revealing systematic algorithmic strengths and vulnerabilities tied to how feature representations interact with classifier heads at inference time. These findings provide new insight into the behavior of few-shot methods in audio and highlight the need for benchmarks that explicitly probe context dependence when evaluating FSC models.

2304.10891 2026-06-04 cs.LG cs.AI cs.CV cs.RO cs.SY eess.SY updated version

Transformer-Based Autonomous Driving Models and Deployment-Oriented Compression: A Survey

Juan Zhong, Yuhang Shi, Zukang Xu, Xi Chen

Affiliations: Renmin University of China; Artificial Intelligence Innovation and Incubation Institute, Fudan University; Shanghai Academy of AI for Science; Department of houmo.ai

AI summary: This survey reviews Transformer-based autonomous driving models and analyzes, from a deployment perspective, how compression and acceleration strategies (such as quantization, pruning, and knowledge distillation) affect model design, deployability, robustness, and safety.

Abstract

Transformer-based models are becoming a central paradigm in autonomous driving because they can capture long-range spatial dependencies, multi-agent interactions, and multimodal context across perception, prediction, and planning. At the same time, their deployment in real vehicles remains difficult because high-capacity attention-based architectures impose substantial latency, memory, and energy overhead. This survey reviews representative Transformer-based autonomous driving models and organizes them by task role, sensing configuration, and architectural design. More importantly, it examines these models from a deployment-oriented perspective and analyzes how efficiency constraints reshape model design choices in practice. We further review compression and acceleration strategies relevant to Transformer-based driving systems, including quantization, pruning, knowledge distillation, low-rank approximation, and efficient attention, and discuss their benefits, limitations, and task-dependent applicability. Rather than treating compression as an isolated post-processing step, we highlight it as a system-level design consideration that directly affects deployability, robustness, and safety. Finally, we identify open challenges and future research directions toward standardized, safety-aware, and hardware-conscious evaluation of efficient autonomous driving systems.

2605.10246 2026-06-04 cs.AI updated version

SciIntegrity-Bench: A Benchmark for Evaluating Academic Integrity in AI Scientist Systems

Zonglin Yang, Xingtong Liu, Xinyan Xu

Affiliations: Tongji University; University of Tübingen

AI summary: Proposes the SCIINTEGRITY-BENCH benchmark, which uses a dilemmatic evaluation paradigm to test the academic integrity of 7 LLMs across 33 scenarios. The overall integrity problem rate reaches 34.2% and every model fails at least once; in missing-data scenarios all models generate synthetic data rather than acknowledging infeasibility.

Abstract

AI scientist systems are increasingly deployed for autonomous research, yet their academic integrity has never been systematically evaluated. We introduce SCIINTEGRITY-BENCH, the first benchmark designed around a dilemmatic evaluation paradigm: each of its 33 scenarios across 11 trap categories is constructed so that honest acknowledgment of failure is the only correct response, while task completion requires misconduct. Across 231 evaluation runs spanning 7 state-of-the-art LLMs, the overall integrity problem rate reaches 34.2%, and no model achieves zero failures. Most strikingly, across missing-data scenarios, all seven models generate synthetic data rather than acknowledging infeasibility, differing only in whether they disclose the substitution. A further prompt ablation study separates two drivers: removing explicit completion pressure sharply reduces undisclosed fabrication from 20.6% to 3.2%, while the underlying synthesis rate remains unchanged, revealing an intrinsic completion bias that persists independent of prompt-level instructions. These findings point to the absence of honest refusal as a trained disposition as the primary driver of observed failures. We release SCIINTEGRITY-BENCH at https://github.com/liuxingtong/Sci-Integrity-Bench.

2602.02834 2026-06-04 cs.LG cs.AI updated version

What Structural Inductive Bias Helps Transformers Reason Over Knowledge Graphs? A Study with Tabula RASA

Jonas Petersen, Camilla Mazzoleni, Gian-Alessandro Lombardi, Federico Martelli, Riccardo Maggioni

Affiliations: ETH Zurich

AI summary: Through ablation experiments with minimal Transformer modifications, the paper finds that sparse adjacency masking is the main structural inductive bias driving multi-hop reasoning, while relation parameters contribute little.

Comments: Accepted at GFM, ICML 2026

2605.03353 2026-06-04 cs.CR cs.AI updated version

SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents

Yipeng Ouyang, Yi Xiao, Yuhao Gu, Xianwei Zhang

Affiliations: Sun Yat-sen University

AI summary: To address the lack of portability and security of LLM agent skills across frameworks, the paper proposes the SkCC compiler, which uses a strongly-typed intermediate representation, SkIR, to decouple semantics from formatting for cross-framework deployment, with a built-in static optimizer that enforces security constraints, significantly improving performance and reducing adaptation complexity.

Comments: Accepted by the Agent Skills Workshop at ACM CAIS 2026. 20 pages, 6 figures. Project Homepage: https://skcc.nexa-lang.com/ Code Repo: https://github.com/Nexa-Language/Skill-Compiler/

Abstract

LLM agents increasingly rely on reusable skills (e.g., SKILL markdown files) to execute complex tasks, yet these artifacts lack portability: agent frameworks are highly sensitive to prompt formatting, leading to large performance variation for the same skill. Nevertheless, most skills are authored once as format-agnostic Markdown, necessitating costly per-framework rewrites and also leaving security largely unaddressed, with widespread vulnerabilities in practice. To address this, we present SkCC, a compiler for LLM agents that introduces classical compilation design into agent skill development. SkCC centers on SkIR, a strongly-typed intermediate representation that decouples skill semantics from framework-specific formatting, thus enabling portable deployment across agent frameworks. On top of this IR, a static optimizer enforces security constraints, blocking vulnerabilities before deployment. Implemented as a four-phase pipeline, SkCC effectively reduces adaptation complexity from $O(m \times n)$ to $O(m + n)$ across $m$ skills and $n$ frameworks. Experiments on SkillsBench demonstrate that SkCC delivers consistent and substantial gains over the original counterparts, with pass rate increases from 21.1% to 33.3% on Claude Code and from 35.1% to 48.7% on Kimi CLI. Further, the design achieves sub-10ms compilation latency, a 94.8% proactive security trigger rate, and 10-46% runtime token savings across frameworks.
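The reduction from $O(m \times n)$ to $O(m + n)$ follows the usual compiler argument: each skill is lowered once to a shared intermediate form, and each framework needs one emitter. The sketch below is a toy illustration of that argument only. `SkillIR`, the two emitters, and the format names are hypothetical and do not reflect the actual SkIR schema or SkCC pipeline; the emitters also skip escaping and security checks.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SkillIR:
    """Hypothetical format-neutral skill record (not the actual SkIR schema)."""
    name: str
    steps: tuple


def emit_markdown(ir):
    """Back end 1: render the IR as a numbered Markdown list."""
    lines = [f"# {ir.name}"] + [f"{i}. {s}" for i, s in enumerate(ir.steps, 1)]
    return "\n".join(lines)


def emit_xml(ir):
    """Back end 2: render the IR as XML-like tags (no escaping, illustration only)."""
    body = "".join(f"<step>{s}</step>" for s in ir.steps)
    return f'<skill name="{ir.name}">{body}</skill>'


BACKENDS = {"markdown": emit_markdown, "xml": emit_xml}


def compile_all(skills, backends=BACKENDS):
    """Produce every (skill, framework) artifact from m IR records and n emitters."""
    return {(ir.name, fw): emit(ir) for ir in skills for fw, emit in backends.items()}


def authoring_cost(m, n):
    """Hand-written pieces needed: per-pair rewrites versus IR records plus emitters."""
    return {"direct": m * n, "via_ir": m + n}


skills = [SkillIR("deploy", ("build", "test", "release")), SkillIR("triage", ("read", "label"))]
artifacts = compile_all(skills)
print(len(artifacts), "artifacts from", len(skills), "skills and", len(BACKENDS), "emitters")
print(authoring_cost(50, 6))
```

All m x n artifacts are still produced, but only m + n pieces are written by hand; with the illustrative values of 50 skills and 6 frameworks that is 56 pieces instead of 300.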


2510.17281 2026-06-04 cs.LG cs.AI cs.IR (updated version)

MemoryBench: A Benchmark for Memory and Continual Learning in LLM Systems

Qingyao Ai, Yichen Tang, Changyue Wang, Jianming Long, Weihang Su, Yiqun Liu

Affiliations: Department of Computer Science and Technology, Tsinghua University, Beijing, China

AI summary: Proposes a user feedback simulation framework and MemoryBench, a comprehensive benchmark spanning multiple domains, languages, and task types, to evaluate how well LLM systems learn continually from accumulated user feedback; experiments show that existing methods fall short in both effectiveness and efficiency.

Abstract

Scaling up data, parameters, and test-time computation have been the mainstream methods for improving LLM systems (LLMsys), but their upper bounds have almost been reached due to the gradual depletion of high-quality data and the marginal gains obtained from larger computational resource consumption. Inspired by the abilities of humans and traditional AI systems to learn from practice, constructing memory and continual learning frameworks for LLMsys has become an important and popular research direction in recent literature. Yet, existing benchmarks for LLM memory often focus on evaluating the system on homogeneous reading comprehension tasks with long-form inputs rather than testing its ability to learn from accumulated user feedback in service time. Therefore, we propose a user feedback simulation framework and a comprehensive benchmark covering multiple domains, languages, and types of tasks to evaluate the continual learning abilities of LLMsys. Experiments show that the effectiveness and efficiency of state-of-the-art baselines are far from satisfactory, and we hope this benchmark can pave the way for future studies on LLM memory and optimization algorithms. Website: https://memorybench.thuir.cn Code: https://github.com/THUIR/MemoryBench Data: https://huggingface.co/datasets/THUIR/MemoryBench Data-Full: https://huggingface.co/datasets/THUIR/MemoryBench-Full


2605.07724 2026-06-04 cs.LG cs.AI (updated version)

Curated Synthetic Data Doesn't Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences

Ali Falahati, Mohammad Mohammadi Amiri, Kate Larson, Lukasz Golab

Affiliations: University of Washington

AI summary: Shows through theoretical analysis that recursive training with curation based on multiple reward functions can avoid collapse of generative models and converges to a stable distribution that satisfies a weighted Nash bargaining solution.

Comments Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

2605.07032 2026-06-04 cs.LG cs.AI (updated version)

A Systematic Investigation of RL-Jailbreaking in LLMs

Montaser Mohammedalamen, Kevin Roice, Reginald McLean, Alyssa Lefaivre Škopac

Affiliations: University of California, Berkeley

AI summary: Presents the first systematic decomposition of the RL-jailbreaking framework; by analyzing environment formalization factors (reward function, action space, episode length) and algorithmic measures, it finds that dense rewards and extended episode lengths are the main drivers of jailbreaking success, and it provides a tool for improving RL-jailbreaker efficiency and hardening model defenses.

Comments Warning: This paper may contain unfiltered and potentially offensive jailbreaking examples. Accepted at the Second Workshop on Agents in the Wild: Safety, Security, and Beyond (AIWILD) at ICML 2026

Abstract

The evolution of generative models from next-token predictors to autonomous engines of complex systems necessitates rigorous safety hardening. Adversarial jailbreaking, the strategic manipulation of models to elicit harmful output, remains a primary threat to safe deployment. While Reinforcement Learning (RL) frames jailbreaking as a multi-step attack through sequential optimization, a mechanistic understanding of why the framework succeeds remains incomplete. To fill this gap, we present the first systematic decomposition of RL jailbreaking. We deconstruct the framework into problem formalization (reward function, action space, episode length) and algorithmic measures (RL algorithm, training data, reward shaping) to identify the structural determinants of adversarial success. Our results reveal that the RL-jailbreaker successfully compromised all targeted models and safeguards. Through this first-of-its-kind analysis, we demonstrate that environment formalization, specifically dense rewards and extended episode lengths, is the primary driver of jailbreaking success. This work provides a tool for improving RL-jailbreaker efficiency and, ultimately, for hardening generative models against RL-based attacks.


2605.00242 2026-06-04 cs.CV cs.AI (updated version)

MAEPose: Self-Supervised Spatiotemporal Learning for Human Pose Estimation on mmWave Video

Xijia Wei, Yuan Fang, Kevin Chetty, Youngjun Cho, Nadia Bianchi-Berthouze

Affiliations: University College London

AI summary: Proposes MAEPose, a masked-autoencoding method that operates directly on mmWave spectrogram videos and uses self-supervised spatiotemporal learning for robust human pose estimation, outperforming existing methods on three datasets.

Abstract

Millimetre-wave (mmWave) radar offers a more privacy-preserving alternative to RGB-based human pose estimation. However, existing methods typically rely on pre-extracted intermediate representations such as sparse point clouds or spectrogram images, which discard the rich spatiotemporal information naturally present in radar video streams and add signal-processing complexity to the system. In addition, existing solutions are mainly trained in an end-to-end supervised manner without leveraging unlabelled raw video streams to learn generalized representations. In this study, we present MAEPose, a masked autoencoding-based human pose estimation approach that operates directly on mmWave spectrogram videos. MAEPose learns spatiotemporal motion-aware generalized representations from unlabelled radar video, and leverages its heatmap decoder for multi-frame pose estimation. We evaluate it across three datasets based on leave-one-person-out cross-validation with rigorous statistical testing. MAEPose consistently outperforms state-of-the-art baselines by up to 22.1% in MPJPE (p<0.05), and maintains robust accuracy under zero-shot bystander interference with only a 6.5% error increase. Ablation studies confirm that both the pre-training and the heatmap decoder contribute substantially, while modality analysis indicates that using Range-Doppler video as input achieves better pose estimation performance than Range-Azimuth or their fusion, with lower computational cost.
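
For readers unfamiliar with the metric, MPJPE (mean per-joint position error) is the average Euclidean distance between predicted and ground-truth joint positions. The sketch below is a generic illustration, not the paper's evaluation code; the nested-list data layout and the `relative_reduction` helper are assumptions, and reading "up to 22.1%" as a relative error reduction is an interpretation rather than something the abstract states.

```python
import math


def mpjpe(pred, target) -> float:
    """Mean Euclidean distance over all joints in all frames.

    pred and target are lists of frames; each frame is a list of
    (x, y, z) joint coordinates in the same order.
    """
    total, count = 0.0, 0
    for pred_frame, target_frame in zip(pred, target):
        for p, t in zip(pred_frame, target_frame):
            total += math.dist(p, t)
            count += 1
    if count == 0:
        raise ValueError("no joints to compare")
    return total / count


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Percentage by which new_error is lower than baseline_error."""
    return 100.0 * (baseline_error - new_error) / baseline_error
```

For example, one frame with two joints, where one prediction is off by a 3-4-5 triangle and the other is exact, has an MPJPE of 2.5 in the units of the coordinates.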


2604.27007 2026-06-04 cs.AI (updated version)

Binary Spiking Neural Networks as Causal Models

Aditya Kar, Emiliano Lorini, Timothée Masquelier

Affiliations: Institut de Recherche en Informatique de Toulouse (IRIT); Centre de Recherche Cerveau et Cognition (CerCo); CNRS

AI summary: Represents binary spiking neural networks (BSNNs) as binary causal models, computes abductive explanations with SAT and SMT solvers, and guarantees that the explanations contain no irrelevant features.

Journal ref: Logics for New-Generation AI 2025 Fifth International Workshop, Beishui Liao; Antonino Rotolo; Leendert van der Torre; Liuwen Yu, Dec 2025, Luxembourg City, Luxembourg. pp.51-68

Abstract

We provide a causal analysis of Binary Spiking Neural Networks (BSNNs) to explain their behavior. We formally define a BSNN and represent its spiking activity as a binary causal model. Thanks to this causal representation, we are able to explain the output of the network by leveraging logic-based methods. In particular, we show that we can successfully use a SAT as well as an SMT solver to compute abductive explanations from this binary causal model. To illustrate our approach, we trained the BSNN on the standard MNIST dataset and applied our SAT-based and SMT-based methods to find abductive explanations of the network's classifications based on pixel-level features. We also compared the found explanations against SHAP, a popular method used in the area of explainable AI. We show that, unlike SHAP, our approach guarantees that a found explanation does not contain completely irrelevant features.
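
The notion of an abductive explanation can be made concrete with a small sketch: a subset of input features whose values alone force the classifier's output, with no feature in the subset removable. The code below is not the paper's method. It replaces the SAT/SMT solver with brute-force enumeration and the BSNN with a hypothetical three-input Boolean function (`toy_classifier`), so it is only feasible for a handful of features.

```python
from itertools import product


def toy_classifier(x):
    """Hypothetical stand-in for a network: fires iff x[0] and x[1] are 1."""
    return int(x[0] == 1 and x[1] == 1)


def is_sufficient(f, x, fixed):
    """True if fixing x on `fixed` forces f's output for every completion."""
    free = [i for i in range(len(x)) if i not in fixed]
    target = f(x)
    for values in product([0, 1], repeat=len(free)):
        z = list(x)
        for i, v in zip(free, values):
            z[i] = v
        if f(tuple(z)) != target:
            return False
    return True


def abductive_explanation(f, x):
    """Deletion-based search for a subset-minimal sufficient feature set."""
    fixed = set(range(len(x)))
    for i in range(len(x)):
        # Drop feature i if the remaining fixed features still suffice.
        if is_sufficient(f, x, fixed - {i}):
            fixed.discard(i)
    return sorted(fixed)
```

For the input `(1, 1, 0)` this returns `[0, 1]`: the third feature never influences the toy function, so it cannot appear in the explanation, which mirrors the "no completely irrelevant features" guarantee described above.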


2604.25860 2026-06-04 cs.CL cs.AI cs.CY (updated version)

Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling

Lucio La Cava, Andrea Tagarelli

Affiliations: DIMES Dept., University of Calabria

AI summary: Proposes Luminol-AIDetect, a zero-shot statistical method that randomly shuffles text and uses the resulting change in perplexity to distinguish machine-generated text from human writing, reaching state-of-the-art performance across multiple domains and attacks.

Comments Under Review

Abstract

Machine-generated text (MGT) detection requires identifying structurally invariant signals across generation models, rather than relying on model-specific fingerprints. In this respect, we hypothesize that while large language models excel at local semantic consistency, their autoregressive nature results in a specific kind of structural fragility compared to human writing. We propose Luminol-AIDetect, a novel, zero-shot statistical approach that exposes this fragility through coherence disruption. By applying a simple randomized text-shuffling procedure, we demonstrate that the resulting shift in perplexity serves as a principled, model-agnostic discriminant, as MGT displays a characteristic dispersion in perplexity-under-shuffling that differs markedly from the more stable structural variability of human-written text. Luminol-AIDetect leverages this distinction to inform its decision process: a handful of perplexity-based scalar features are extracted from an input text and its shuffled version, and detection is then performed via density estimation and ensemble-based prediction. Evaluated across 8 content domains, 11 adversarial attack types, and 18 languages, Luminol-AIDetect demonstrates state-of-the-art performance, with up to 17x lower FPR, while being cheaper than prior methods.
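
The idea of measuring perplexity before and after shuffling can be sketched as follows. This is not Luminol-AIDetect: the abstract does not specify its scoring model, shuffling granularity, or feature set, so the toy add-one-smoothed bigram model (`BigramLM`), token-level shuffling, and the four feature names below are all assumptions chosen to keep the example self-contained.

```python
import math
import random
from collections import Counter
from statistics import fmean, pstdev


class BigramLM:
    """Toy add-one-smoothed bigram model standing in for an LLM scorer."""

    def __init__(self, corpus_tokens):
        self.vocab = set(corpus_tokens)
        self.bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
        self.contexts = Counter(corpus_tokens[:-1])

    def perplexity(self, tokens):
        """exp of the mean negative log-probability over bigrams."""
        log_prob = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            num = self.bigrams[(prev, cur)] + 1
            den = self.contexts[prev] + len(self.vocab)
            log_prob += math.log(num / den)
        return math.exp(-log_prob / (len(tokens) - 1))


def shuffle_features(lm, tokens, n_shuffles=20, seed=0):
    """Perplexity of the text and summary statistics over shuffled copies."""
    rng = random.Random(seed)
    shuffled_ppl = []
    for _ in range(n_shuffles):
        perm = list(tokens)
        rng.shuffle(perm)
        shuffled_ppl.append(lm.perplexity(perm))
    original = lm.perplexity(tokens)
    return {
        "ppl_original": original,
        "ppl_shuffled_mean": fmean(shuffled_ppl),
        "ppl_shuffled_std": pstdev(shuffled_ppl),
        "ppl_shift": fmean(shuffled_ppl) - original,
    }
```

A detector in this style would feed such scalar features to a downstream decision rule; the density estimation and ensemble steps mentioned in the abstract are not reproduced here.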


2603.01421 2026-06-04 cs.AI cs.CL (updated version)

SciDER: Scientific Data-centric End-to-end Researcher

Ke Lin, Owais Aijaz, Yilin Lu, Yiyang Luo, Xuehang Guo, Preslav Nakov

Affiliations: GitHub

AI summary: Proposes SciDER, a multi-agent system that automates the full scientific research lifecycle through a data-centric approach and a dynamic multimodal skill system, achieving leading results on six benchmarks.

Comments 10 pages, 8 figures, 7 tables

Abstract

While large language models accelerate scientific discovery, existing agents face severe limitations in adaptability, domain generalization, and multimodal scalability, often struggling to autonomously process raw, domain-specific experimental data. To overcome these barriers, we introduce SciDER, a multi-agent system designed to flexibly automate the entire research lifecycle. This framework employs a novel data-centric approach and integrates a dynamic multimodal skill system across four specialized sub-agents. Specifically, an ideation agent generates novel hypotheses via Evolutionary Idea Search, a data analysis agent systematically structures raw data, an experimentation agent synthesizes executable code grounded in dataset characteristics, and a critic agent drives iterative self-refinement. To democratize open-source scientific discovery, we release OpenSciDER-SFT-8K, a high-quality execution trajectory dataset, alongside the OpenSciDER-27B fine-tuned model. Across six benchmarks, SciDER and OpenSciDER obtain competitive or leading results, with especially strong gains on data-centric analysis, end-to-end research execution, and multimodal scientific visualization. By integrating data analysis with experimental execution, SciDER bridges the gap between abstract scientific reasoning and reproducible experimentation synthesis.


2510.11194 2026-06-04 cs.AI (updated version)

Aligning Deep Implicit Preferences by Learning to Reason Defensively

Peiming Li, Zhiyuan Hu, Yang Tang, Shiyu Li, Xi Chen

Affiliations: Basic Algorithm Center, PCG, Tencent; School of Electronic and Computer Engineering, Peking University

AI summary: Proposes Critique-Driven Reasoning Alignment (CDRA), which uses the DeepPref benchmark and a Personalized Generative Process Reward Model (Pers-GenPRM) to recast preference alignment as a structured reasoning process that infers users' deep implicit preferences and supports defensive reasoning.

Journal ref: ICLR 2026 Conference

Abstract

Personalized alignment is crucial for enabling Large Language Models (LLMs) to engage effectively in user-centric interactions. However, current methods face a dual challenge: they fail to infer users' deep implicit preferences (including unstated goals, semantic context, and risk tolerances), and they lack the defensive reasoning required to navigate real-world ambiguity. This cognitive gap leads to responses that are superficial, brittle, and short-sighted. To address this, we propose Critique-Driven Reasoning Alignment (CDRA), which reframes alignment from a scalar reward-matching task into a structured reasoning process. First, to bridge the preference inference gap, we introduce the DeepPref benchmark. This dataset, comprising 3000 preference-query pairs across 20 topics, is curated by simulating a multi-faceted cognitive council that produces critique-annotated reasoning chains to deconstruct query semantics and reveal latent risks. Second, to instill defensive reasoning, we introduce the Personalized Generative Process Reward Model (Pers-GenPRM), which frames reward modeling as a personalized reasoning task. It generates a critique chain to evaluate a response's alignment with user preferences before outputting a final score based on this rationale. Ultimately, this interpretable, structured reward signal guides the policy model through Critique-Driven Policy Alignment, a process-level online reinforcement learning algorithm integrating both numerical and natural language feedback. Experiments demonstrate that CDRA excels at discovering and aligning with users' true preferences while executing robust reasoning. Our code and dataset are available at https://github.com/Zephyrian-Hugh/Deep-pref.


2401.07386 2026-06-04 cs.CY cs.AI cs.LG (updated version)

How do machines learn? Evaluating the AIcon2abs method

Rubens Lacerda Queiroz, Cabral Lima, Fabio Ferrentini Sampaio, Priscila Machado Vieira Lima

Affiliations: PPGI, Federal University of Rio de Janeiro; Computer Science Institute, Federal University of Rio de Janeiro; Polytechnic University of Setúbal – Portugal; PESC/COPPE, Federal University of Rio de Janeiro; Tercio Pacitti Institute (NCE), Federal University of Rio de Janeiro

AI summary: Evaluates, through a remote course experiment, how effectively the AIcon2abs method, which is based on the WiSARD weightless neural network and requires no Internet connection, improves understanding of machine learning among members of the public of different ages; participants reported high satisfaction.

Comments textual review (spelling and grammar); reorganization of the elements of some figures; New references included

Abstract

This study expands on previous work that introduced the AIcon2abs method (AI from Concrete to Abstract: Demystifying Artificial Intelligence to the general public), an innovative approach designed to increase public understanding of machine learning (ML) across diverse age groups, including K-12 students, and aims to evaluate its effectiveness. AIcon2abs employs the WiSARD algorithm, a weightless neural network known for its simplicity and user accessibility. WiSARD does not require Internet access, making it ideal for non-technical users and resource-limited environments. The method enables participants to intuitively visualize and interact with ML processes, including the internal processes of training and classification, through engaging, hands-on activities, as if they were the algorithms themselves. WiSARD can also learn effectively from a minimal dataset, even from a single example. This feature enables users to observe how the machine improves its accuracy incrementally as it receives more data. Moreover, WiSARD generates mental images representing what it has learned, highlighting essential features of the classified data. AIcon2abs was tested through a six-hour remote course with 34 Brazilian participants, including 5 children, 5 adolescents, and 24 adults. Data analysis was conducted from two perspectives: a mixed-method pre-experiment (including hypothesis testing) and a qualitative phenomenological analysis. Nearly all participants rated AIcon2abs positively, with the results demonstrating a high degree of satisfaction in achieving the intended outcomes. This research was approved by the CEP-HUCFF-UFRJ Research Ethics Committee.
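
A weightless (RAM-based) classifier in the WiSARD style can be sketched in a few lines, which may help show why it can learn from a single example. This is a generic illustration, not the AIcon2abs software: the class name `Wisard`, the `tuple_size` and `seed` parameters, and the use of Python sets as RAM nodes are assumptions made for the sketch.

```python
import random


class Wisard:
    """Minimal WiSARD-style classifier over fixed-length binary inputs."""

    def __init__(self, input_size, tuple_size, seed=0):
        if input_size % tuple_size != 0:
            raise ValueError("input_size must be a multiple of tuple_size")
        order = list(range(input_size))
        random.Random(seed).shuffle(order)  # fixed pseudo-random bit mapping
        self.tuples = [order[i:i + tuple_size]
                       for i in range(0, input_size, tuple_size)]
        self.rams = {}  # label -> one set of seen addresses per tuple

    def _addresses(self, bits):
        return [tuple(bits[i] for i in idx) for idx in self.tuples]

    def train(self, bits, label):
        """Record the address each RAM node sees for this example."""
        rams = self.rams.setdefault(label, [set() for _ in self.tuples])
        for ram, addr in zip(rams, self._addresses(bits)):
            ram.add(addr)

    def scores(self, bits):
        """Per class, count RAM nodes that recognise their address."""
        addrs = self._addresses(bits)
        return {label: sum(a in ram for ram, a in zip(rams, addrs))
                for label, rams in self.rams.items()}

    def classify(self, bits):
        s = self.scores(bits)
        return max(s, key=s.get)
```

Training stores no numeric weights, only which bit patterns each node has seen, so one example per class is already enough to classify inputs that share most of their patterns with it.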


2601.09853 2026-06-04 cs.CL cs.AI (updated version)

MedRedFlag: Investigating how LLMs Redirect Misconceptions in Real-World Health Communication

Sraavya Sambara, Yuan Pu, Ayman Ali, Vishala Mishra, Lionel Wong, Monica Agrawal

Affiliations: Independent Researcher; Duke University; Stanford University

AI summary: Builds the MedRedFlag dataset (1100+ Reddit questions that require redirection) and systematically compares responses from advanced LLMs with those from clinicians, finding that LLMs often fail to redirect false premises in questions, which can lead to suboptimal medical decisions and exposes a key safety gap in patient-facing medical AI systems.

Abstract

Real-world health questions from patients often unintentionally embed false assumptions or premises. In such cases, safe medical communication typically involves redirection: addressing the implicit misconception and then responding to the underlying patient context, rather than the original question. While large language models (LLMs) are increasingly being used by lay users for medical advice, they have not yet been tested for this crucial competency. Therefore, in this work, we investigate how LLMs react to false premises embedded within real-world health questions. We develop a semi-automated pipeline to curate MedRedFlag, a dataset of 1100+ questions sourced from Reddit that require redirection. We then systematically compare responses from state-of-the-art LLMs to those from clinicians. Our analysis reveals that LLMs often fail to redirect problematic questions, even when the problematic premise is detected, and provide answers that could lead to suboptimal medical decision making. Our benchmark and results reveal a novel and substantial gap in how LLMs perform under the conditions of real-world health communication, highlighting critical safety concerns for patient-facing medical AI systems. Code and dataset are available at https://github.com/srsambara-1/MedRedFlag.


2506.10630 2026-06-04 cs.LG cs.AI (updated version)

Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs

Yitong Zhou, Yucong Luo, Mingyue Cheng, Qi Liu, Jiahao Wang, Daoyu Wang, Enhong Chen

Affiliations: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China

AI summary: Proposes Time-R1, a framework that trains LLMs for multi-step reasoning through two-stage reinforcement fine-tuning (supervised fine-tuning followed by reinforcement learning) to improve time series forecasting accuracy.

Abstract

To advance time series forecasting (TSF), various methods have been proposed to improve prediction accuracy, evolving from statistical techniques to data-driven deep learning architectures. Despite their effectiveness, most existing methods still adhere to a fast-thinking paradigm, relying on extracting historical patterns and mapping them to future values as their core modeling philosophy and lacking an explicit thinking process that incorporates intermediate time series reasoning. Meanwhile, emerging slow-thinking LLMs (e.g., OpenAI-o1) have shown remarkable multi-step reasoning capabilities, offering an alternative way to overcome these issues. However, prompt engineering alone presents several limitations, including high computational cost, privacy risks, and limited capacity for in-depth domain-specific time series reasoning. To address these limitations, a more promising approach is to train LLMs to develop slow-thinking capabilities and acquire strong time series reasoning skills. For this purpose, we propose Time-R1, a two-stage reinforcement fine-tuning framework designed to enhance the multi-step reasoning ability of LLMs for time series forecasting. Specifically, the first stage conducts supervised fine-tuning for warmup adaptation, while the second stage employs reinforcement learning to improve the model's generalization ability. In particular, we design a fine-grained multi-objective reward specifically for time series forecasting, and then introduce GRIP (group-based relative importance for policy optimization), which leverages non-uniform sampling to further encourage and optimize the model's exploration of effective reasoning paths. Experiments demonstrate that Time-R1 significantly improves forecast performance across diverse datasets.


2604.14575 2026-06-04 cs.LG cs.AI stat.ME stat.ML (updated version)

Generative Augmented Inference

Cheng Lu, Mengxin Wang, Dennis J. Zhang, Heng Zhang

Affiliations: University of California, Berkeley; Stanford University; University of Toronto

AI summary: Proposes the Generative Augmented Inference (GAI) framework, which treats AI outputs as high-dimensional informative features for learning true labels rather than as proxies, models them nonparametrically to obtain consistent estimation and valid inference from combined human and AI data, and under random labeling is strictly more asymptotically efficient than using human data alone.

Abstract

Large language models enable inexpensive AI-generated annotations, but using them reliably for causal inference remains challenging. Naively pooling AI and human data induces bias, while existing methods such as Prediction-Powered Inference (PPI; Angelopoulos et al., 2023a) treat AI outputs as proxies of true labels -- an assumption often violated for generative model outputs in practice. We propose Generative Augmented Inference (GAI), a framework that treats AI outputs as general, potentially high-dimensional informative features for learning human labels rather than as surrogates. GAI flexibly models this relationship using nonparametric methods, enabling consistent estimation and valid inference from combined human and AI data. We establish asymptotic normality and show that, under random labeling, GAI strictly improves asymptotic efficiency over human-data-only estimation whenever AI outputs are informative for true labels. Empirical studies on real-world datasets demonstrate that GAI significantly reduces estimation error and improves confidence interval quality across diverse generative data sources relative to human-only and PPI-based estimation.
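
For context on the PPI baseline that the abstract contrasts with, here is a minimal sketch of the standard prediction-powered point estimate of a mean: the average AI prediction on unlabeled data, corrected by the average human-minus-AI residual on the labeled subset. It does not implement GAI, whose nonparametric estimator is not described in enough detail here to reproduce; the function names and toy numbers are illustrative assumptions.

```python
from statistics import fmean


def naive_pooled_mean(y_labeled, pred_unlabeled):
    """Treat AI predictions as if they were human labels (can be biased)."""
    return fmean(list(y_labeled) + list(pred_unlabeled))


def ppi_mean(y_labeled, pred_labeled, pred_unlabeled):
    """Prediction-powered mean: AI average plus a bias-correcting rectifier.

    y_labeled      human labels on the small labeled set
    pred_labeled   AI predictions on that same labeled set
    pred_unlabeled AI predictions on the large unlabeled set
    """
    rectifier = fmean([y - p for y, p in zip(y_labeled, pred_labeled)])
    return fmean(pred_unlabeled) + rectifier
```

If the AI annotator overestimates every label by exactly 1, the rectifier is -1 and removes that offset from the unlabeled average, whereas naive pooling absorbs the bias into the estimate.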


2604.12645 2026-06-04 cs.RO cs.AI (updated version)

Contextual Multi-Task Reinforcement Learning for Autonomous Reef Monitoring

Melvin Laux, Yi-Ling Liu, Rina Alo, Sören Töpper, Mariela De Lucas Alvarez, Frank Kirchner, Rebecca Adam

Affiliations: University of Bremen

AI summary: To cope with uncertain underwater dynamics and task variation, proposes a contextual multi-task reinforcement learning framework that learns reusable control policies, showing efficient training, zero-shot generalization, and robustness in a simulated environment.

Comments To be published in IEEE OCEANS 2026 (Sanya) conference proceedings

Abstract

Although autonomous underwater vehicles promise to enable marine ecosystem monitoring, their deployment is fundamentally limited by the difficulty of controlling vehicles under highly uncertain and non-stationary underwater dynamics. To address these challenges, we employ a data-driven reinforcement learning approach to compensate for unknown dynamics and task variations. Traditional single-task reinforcement learning tends to overfit the training environment, which limits the long-term usefulness of the learnt policy. Hence, we propose to use a contextual multi-task reinforcement learning paradigm instead, allowing us to learn controllers that can be reused for various tasks, e.g., detecting oysters in one reef and detecting corals in another. We evaluate whether contextual multi-task reinforcement learning can efficiently learn robust and generalisable control policies for autonomous underwater reef monitoring. We train a single context-dependent policy that is able to solve multiple related monitoring tasks in a simulated reef environment in HoloOcean. In our experiments, we empirically evaluate the contextual policies regarding sample efficiency, zero-shot generalisation to unseen tasks, and robustness to varying water currents. By utilising multi-task reinforcement learning, we aim to improve training effectiveness and the reusability of learnt policies, taking a step towards more sustainable procedures in autonomous reef monitoring.


2604.11510 2026-06-04 cs.CL cs.AI cs.LG (updated version)

Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization

Jiashu Yao, Heyan Huang, Daiqing Wu, Zeming Liu, Yuhang Guo

Affiliations: Beijing Institute of Technology; Tsinghua University; Beihang University

AI summary: Proposes Policy Split, which splits the policy into a normal mode and a high-entropy mode and applies collaborative dual-mode entropy regularization to encourage diverse exploration while preserving accuracy; experiments show it outperforms existing baselines on general and creative tasks.

Comments preprint

2604.09686 2026-06-04 cs.AI cs.CV (updated version)

Belief-Aware VLM Model for Human-like Reasoning

Anshul Nayak, Shahil Shaik, Yue Wang

Affiliations: Mechanical Engineering Department, Clemson University

AI summary: Proposes a belief-aware vision-language model framework that approximates belief with retrieval-based memory and reinforcement learning, improving long-horizon intent reasoning and outperforming zero-shot baselines on datasets such as HD-EPIC.

Comments Accepted for publication at the IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2026). 6 pages, 3 figures, 1 table

Abstract

Traditional neural network models for intent inference rely heavily on observable states and struggle to generalize across diverse tasks and dynamic environments. Recent advances in Vision Language Models (VLMs) and Vision Language Action (VLA) models introduce common-sense reasoning through large-scale multimodal pretraining, enabling zero-shot performance across tasks. However, these models still lack explicit mechanisms to represent and update belief, limiting their ability to reason like humans or capture evolving human intent over long horizons. To address this, we propose a belief-aware VLM framework that integrates retrieval-based memory and reinforcement learning. Instead of learning an explicit belief model, we approximate belief using a vector-based memory that retrieves relevant multimodal context, which is incorporated into the VLM for reasoning. We further refine decision-making using a reinforcement learning policy over the VLM latent space. We evaluate our approach on publicly available VQA datasets such as HD-EPIC and demonstrate consistent improvements over zero-shot baselines, highlighting the importance of belief-aware reasoning.


2604.07778 2026-06-04 cs.AI (updated version)

The Accountability Horizon: An Impossibility Theorem for Governing Human-Agent Collectives

Haileleol Tibebu, Hewan Shemtaga

Affiliations: University of Illinois at Urbana-Champaign; Responsible Intelligence Institute

AI summary: Introduces a formal model of Human-Agent Collectives and an Accountability Incompleteness Theorem, proving that existing AI accountability frameworks necessarily fail once autonomy exceeds a computable threshold, and confirms the phase-transition boundary with synthetic experiments.

Abstract

Existing accountability frameworks for AI systems, legal, ethical, and regulatory, rest on a shared assumption: for any consequential outcome, at least one identifiable person had enough involvement and foresight to bear meaningful responsibility. This paper proves that agentic AI systems violate this assumption not as an engineering limitation but as a mathematical necessity once autonomy exceeds a computable threshold. We introduce Human-Agent Collectives, a formalisation of joint human-AI systems where agents are modelled as state-policy tuples within a shared structural causal model. Autonomy is characterised through a four-dimensional information-theoretic profile (epistemic, executive, evaluative, social); collective behaviour through interaction graphs and joint action spaces. We axiomatise legitimate accountability through four minimal properties: Attributability (responsibility requires causal contribution), Foreseeability Bound (responsibility cannot exceed predictive capacity), Non-Vacuity (at least one agent bears non-trivial responsibility), and Completeness (all responsibility must be fully allocated). Our central result, the Accountability Incompleteness Theorem, proves that for any collective whose compound autonomy exceeds the Accountability Horizon and whose interaction graph contains a human-AI feedback cycle, no framework can satisfy all four properties simultaneously. The impossibility is structural: transparency, audits, and oversight cannot resolve it without reducing autonomy. Below the threshold, legitimate frameworks exist, establishing a sharp phase transition. Experiments on 3,000 synthetic collectives confirm all predictions with zero violations. This is the first impossibility result in AI governance, establishing a formal boundary below which current paradigms remain valid and above which distributed accountability mechanisms become necessary.


2604.04944 2026-06-04 cs.CL cs.AI 版本更新

Inclusion-of-Thoughts: Mitigating Preference Instability via Purifying the Decision Space

包含思维：通过净化决策空间缓解偏好不稳定性

Mohammad Reza Ghasemi Madani, Soyeon Caren Han, Shuo Yang, Jey Han Lau

发表机构 * School of Computing and Information Systems, The University of Melbourne（计算与信息系统学院，墨尔本大学）

AI总结提出包含思维（IoT）策略，通过逐步自过滤干扰选项来重构多选题，从而稳定模型偏好并提升推理性能。

智能体工具协议的形式语义：一种进程演算方法

Andreas Schlapbach

AI总结本文通过进程演算形式化两种智能体工具协议（SGD和MCP），证明它们在映射Phi下结构互模拟，但反向映射有损，进而提出MCP+扩展实现完全等价。

Comments Logical flaw in Theorem 21

AI中文摘要

能够调用外部工具的大型语言模型智能体的出现，催生了对智能体协议进行形式验证的迫切需求。两个范式主导了这一领域：Schema-Guided Dialogue (SGD)，一个用于零样本API泛化的研究框架，以及Model Context Protocol (MCP)，一个用于智能体-工具集成的行业标准。虽然两者都通过模式描述实现动态服务发现，但它们的形式关系仍未探索。基于先前建立这些范式概念趋同的工作，我们提出了SGD和MCP的第一个进程演算形式化，证明它们在定义良好的映射Phi下结构互模拟。然而，我们证明反向映射Phi^{-1}是部分且有损的，揭示了MCP表达性的关键缺陷。通过双向分析，我们识别出五个原则——语义完备性、显式动作边界、失败模式文档、渐进式披露兼容性和工具间关系声明——作为完全行为等价的充分必要条件。我们将这些原则形式化为类型系统扩展MCP+，证明MCP+与SGD同构。我们的工作为经过验证的智能体系统提供了第一个形式基础，并将模式质量确立为可证明的安全属性。

英文摘要

The emergence of large language model agents capable of invoking external tools has created an urgent need for formal verification of agent protocols. Two paradigms dominate this space: Schema-Guided Dialogue (SGD), a research framework for zero-shot API generalization, and the Model Context Protocol (MCP), an industry standard for agent-tool integration. While both enable dynamic service discovery through schema descriptions, their formal relationship remains unexplored. Building on prior work establishing the conceptual convergence of these paradigms, we present the first process calculus formalization of SGD and MCP, proving they are structurally bisimilar under a well-defined mapping Phi. However, we demonstrate that the reverse mapping Phi^{-1} is partial and lossy, revealing critical gaps in MCP's expressivity. Through bidirectional analysis, we identify five principles -- semantic completeness, explicit action boundaries, failure mode documentation, progressive disclosure compatibility, and inter-tool relationship declaration -- as necessary and sufficient conditions for full behavioral equivalence. We formalize these principles as type-system extensions MCP+, proving MCP+ is isomorphic to SGD. Our work provides the first formal foundation for verified agent systems and establishes schema quality as a provable safety property.


2603.23841 2026-06-04 cs.CL cs.AI 版本更新

PoliticsBench: Benchmarking Political Values in Large Language Models with Multi-Turn Roleplay

PoliticsBench: 通过多轮角色扮演基准测试大型语言模型中的政治价值观

Rohan Khetan, Ashna Khetan

发表机构 * Northville High School, Northville, USA（诺斯维尔高中）； Department of Computer Science, Stanford University, Stanford, USA（斯坦福大学计算机科学系）

AI总结提出PoliticsBench，一个多阶段角色扮演基准，通过20个演化场景评估LLM的细粒度价值表达，发现场景提示比直接提问能引发更广泛和强烈的价值表达。

Comments 7 pages, 5 tables, 5 figures, 4 appendix pages. Accepted to the ICML 2026 Trustworthy AI for Good Workshop

AI中文摘要

虽然大型语言模型（LLMs）越来越多地被用作主要信息来源，但其潜在的政治偏见可能影响其客观性。现有的LLM社会偏见基准主要评估人口统计刻板印象，而当衡量政治偏见时，是在粗略的层面上进行的，忽视了塑造社会政治推理的价值观。我们引入了PoliticsBench，一个用于评估LLM中细粒度价值表达的多阶段角色扮演基准。在20个演化场景中，模型在竞争压力下阐述权衡、表明立场并做出决策。在八个主流LLM上，我们表明，与直接的政治问题相比，基于场景的提示引发了更广泛和更强烈的价值表达，峰值交互阶段使强烈激活的价值维度数量增加了约0.75（共10个维度），相对于基线提示具有统计显著性（p < 0.05）。此外，在交互过程中，立场的承诺度增加，从初始阶段到决策阶段，在[0,5]量表上上升了约1.4分。虽然在后期交互阶段，响应对于场景释义的鲁棒性降低，但评判者间的一致性保持相对稳定。我们的结果表明，评估LLM的政治行为需要超越静态提示，转向更长的交互设置，以捕捉价值观如何在上下文中应用。

英文摘要

While Large Language Models (LLMs) are increasingly used as primary sources of information, their potential for political bias may impact their objectivity. Existing benchmarks of LLM social bias primarily evaluate demographic stereotypes, and when political bias is measured, it is done so at a coarse level, overlooking the values that shape sociopolitical reasoning. We introduce PoliticsBench, a multi-stage roleplay benchmark for evaluating fine-grained value expression in LLMs. Across twenty evolving scenarios, models articulate tradeoffs, take positions, and make decisions under competing pressures. Across eight prominent LLMs, we show that scenario-based prompting elicits broader and more strongly expressed value profiles than direct political questions, with peak interaction stages increasing the number of strongly activated value dimensions by approximately $0.75$ (out of 10 total dimensions), a statistically significant increase relative to baseline prompting ($p < 0.05$). In addition, commitment to a stance increases over the course of interaction, rising by approximately $1.4$ points on a $[0,5]$ scale from initial to decision stages. While responses become less robust to scenario paraphrasing in later interaction stages, inter-judge agreement remains relatively stable. Our results suggest that evaluating LLM political behavior requires moving beyond static prompts toward longer interactive settings that capture how values are applied in context.


2603.23420 2026-06-04 cs.AI 版本更新

Bilevel Autoresearch: Meta-Autoresearching Itself

双层自动研究：元自动研究自身

Yaonan Qu, Meng Lu

发表机构 * Independent Researcher（独立研究者）

AI总结提出双层自动研究框架，外层循环通过读取内层循环代码和轨迹、识别瓶颈并注入可执行Python搜索机制来改进内层循环，在GPT预训练基准上实现5倍改进。

Comments 16 pages, 5 figures, 3 tables. v2 expands the framing as mechanism-level agentic self-improvement and updates related work and limitations; core method and experiments unchanged. This paper was primarily drafted by AI agents with human oversight and direction

AI中文摘要

如果自动研究本身是一种研究形式，那么自动研究可以应用于研究本身。我们提出了双层自动研究（Bilevel Autoresearch），一种双层框架，其中外层自动研究循环通过读取内层自动研究循环的代码和轨迹，识别瓶颈，并在运行时生成可注入的Python搜索机制来改进内层循环。内层循环优化任务性能；外层循环优化内层循环的搜索方式。两个循环使用相同的LLM，因此改进来自双层架构而非更强的元级模型，尽管外层循环消耗额外的推理和挂钟时间预算。在Karpathy的GPT预训练基准上，元自动研究外层循环相比单独的标准内层循环实现了5倍的改进（val_bpb为-0.045对比-0.009），而不改变机制、仅做参数级调整则没有可靠的增益。外层循环从相邻搜索领域实例化机制，包括组合优化、多臂老虎机和实验设计，无需人工指定最终机制设计。轨迹分析表明，这些机制打破了确定性搜索模式，并迫使探索LLM先验所避免的方向。实验表明，在该基准上迈出了双层的第一步：外层循环改进了内层循环的搜索行为。在此实现中，代码是机制载体，但技能、提示、工作流、评估器、领域原则、世界模型假设和记忆模式也可以编码塑造未来智能体行为的机制。这指向了一条递归自举的路径，其中为内层循环发现的机制可以反馈回来改进元级循环本身。

英文摘要

If autoresearch is itself a form of research, then autoresearch can be applied to research itself. We present Bilevel Autoresearch, a bilevel framework in which an outer autoresearch loop improves an inner autoresearch loop by reading its code and traces, identifying bottlenecks, and generating injectable Python search mechanisms at runtime. The inner loop optimizes task performance; the outer loop optimizes how the inner loop searches. Both loops use the same LLM, so improvements come from the bilevel architecture rather than a stronger meta-level model, although the outer loop consumes additional inference and wall-clock budget. On Karpathy's GPT pretraining benchmark, the meta-autoresearch outer loop achieves a 5x improvement over the standard inner loop alone (-0.045 vs. -0.009 val_bpb), while parameter-level adjustment without mechanism change yields no reliable gain. The outer loop instantiates mechanisms from adjacent search domains, including combinatorial optimization, multi-armed bandits, and design of experiments, without human specification of the final mechanism design. Trace analysis suggests that these mechanisms break deterministic search patterns and force exploration of directions the LLM's priors avoid. The experiments demonstrate, on this benchmark, a first bilevel step: an outer loop improves the search behavior of an inner loop. Code is the mechanism carrier in this implementation, but skills, prompts, workflows, evaluators, domain principles, world-model assumptions, and memory schemas can also encode mechanisms that shape future agent behavior. This suggests a path toward recursive bootstrapping, where mechanisms discovered for the inner loop can be fed back to improve the meta-level loop itself.


2603.22121 2026-06-04 cs.CV cs.AI 版本更新

GenSpan: Generation-Calibrated Motion Span Priors for Multi-Verb Video Corpus Moment Retrieval

GenSpan: 用于多动词视频语料库时刻检索的生成校准运动跨度先验

Yunzhuo Sun, Xinyue Liu, Yanyang Li, Nanding Wu, Linlin Zong, Xianchao Zhang, Wenxin Liang

发表机构 * Dalian University of Technology（大连理工大学）

AI总结提出GenSpan框架，利用LLM生成辅助视频作为时间先验，结合令牌选择器和双向状态空间模型，提升多动词查询下的视频语料库时刻检索与定位性能。

Comments Major revision with title change, updated method, and additional experiments

AI中文摘要

视频语料库时刻检索（VCMR）旨在检索与自然语言查询对应的正确视频及其时间片段，对于时间动作顺序至关重要的多动词查询尤其具有挑战性。现有方法通常仅依赖文本或静态图像，难以捕捉隐式运动动态，导致检索错误和时间错位。我们提出GenSpan，一个生成校准的VCMR框架，从LLM选择的字幕线索和分解的子事件中构建短辅助视频，将这些作为时间先验而非直接检索目标。令牌选择器过滤与生成运动对齐的候选视频特征，双向状态空间模型高效预测视频-时刻元组。在TVR和ActivityNet-Captions上的实验表明，GenSpan提高了语料库级检索和时刻定位，特别是对于复杂的多动作查询，同时与最先进的多模态基线相比降低了计算成本。

英文摘要

Video Corpus Moment Retrieval (VCMR) aims to retrieve both the correct video and its temporal segment corresponding to a natural-language query, a task that is especially challenging for multi-verb queries where temporal action ordering is critical. Existing approaches often rely solely on text or static images and struggle to capture implicit motion dynamics, leading to retrieval errors and temporal misalignment. We propose GenSpan, a generation-calibrated VCMR framework that constructs short auxiliary videos from LLM-selected subtitle cues and decomposed sub-events, using these as temporal priors rather than direct retrieval targets. A token selector filters candidate-video features aligned with generated motion, and a bidirectional state-space model efficiently predicts video-moment tuples. Experiments on TVR and ActivityNet-Captions demonstrate that GenSpan improves corpus-level retrieval and moment localization, particularly for complex multi-action queries, while reducing computational cost compared to state-of-the-art multimodal baselines.


2603.13432 2026-06-04 cs.CV cs.AI 版本更新

Spatial Transcriptomics as Images for Large-Scale Pretraining

空间转录组学作为图像进行大规模预训练

Yishun Zhu, Jiaxin Qi, Jian Wang, Yuhua Zheng, Jianqiang Huang

发表机构 * Computer Network Information Center, Chinese Academy of Sciences（中国科学院计算机网络信息中心）； Hangzhou Institute for Advanced Study, University of the Chinese Academy of Sciences（中国科学院大学杭州高等研究院）

AI总结提出将空间转录组学数据视为可裁剪的多通道图像，通过空间分块和基因子集选择来增加训练样本并保留空间上下文，实现大规模预训练，显著提升下游任务性能。

AI中文摘要

空间转录组学（ST）在组织切片上具有精确坐标的离散点处分析数千个基因表达值，保留了临床和病理研究所需的空间背景。随着测序通量的提高和平台的进步，不断增长的数据量促使大规模ST预训练成为可能。然而，预训练的基本单元（即单个训练样本的构成）仍然不明确。现有选择分为两类：（1）将每个点视为独立样本，这丢弃了空间依赖性，将ST简化为单细胞转录组学；（2）将整个切片视为单个样本，这导致输入过大且训练样本急剧减少，削弱了有效预训练。为解决这一问题，我们提出将空间转录组学视为可裁剪的图像。具体而言，我们通过从原始切片中裁剪补丁，定义了一个具有固定空间大小的多通道图像表示，从而在保留空间上下文的同时大幅增加训练样本数量。在通道维度上，我们定义了基因子集选择规则以控制输入维度并提高预训练稳定性。大量实验表明，所提出的基于图像的数据集构建方法用于ST预训练能够持续提升下游性能，优于传统预训练方案。消融研究验证了空间分块和通道设计都是必要的，从而建立了一种统一、实用的ST数据组织范式，支持大规模预训练。

英文摘要

Spatial Transcriptomics (ST) profiles thousands of gene expression values at discrete spots with precise coordinates on tissue sections, preserving spatial context essential for clinical and pathological studies. With rising sequencing throughput and advancing platforms, the expanding data volumes motivate large-scale ST pretraining. However, the fundamental unit for pretraining, i.e., what constitutes a single training sample, remains ill-posed. Existing choices fall into two camps: (1) treating each spot as an independent sample, which discards spatial dependencies and collapses ST into single-cell transcriptomics; and (2) treating an entire slide as a single sample, which produces prohibitively large inputs and drastically fewer training examples, undermining effective pretraining. To address this gap, we propose treating spatial transcriptomics as croppable images. Specifically, we define a multi-channel image representation with fixed spatial size by cropping patches from raw slides, thereby preserving spatial context while substantially increasing the number of training samples. Along the channel dimension, we define gene subset selection rules to control input dimensionality and improve pretraining stability. Extensive experiments show that the proposed image-like dataset construction for ST pretraining consistently improves downstream performance, outperforming conventional pretraining schemes. Ablation studies verify that both spatial patching and channel design are necessary, establishing a unified, practical paradigm for organizing ST data and enabling large-scale pretraining.
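The patch-based sample construction described above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: it assumes the spots have already been rasterized onto a regular `(H, W, G)` grid, and the function name `crop_patches`, the synthetic data, and the highest-variance gene rule are all hypothetical choices made for the example.

```python
import numpy as np

def crop_patches(slide, patch, stride, gene_idx):
    """Cut fixed-size spatial patches from an (H, W, G) expression grid,
    keeping only the selected gene channels."""
    h, w, _ = slide.shape
    patches = []
    for top in range(0, h - patch + 1, stride):
        for left in range(0, w - patch + 1, stride):
            window = slide[top:top + patch, left:left + patch]
            patches.append(window[..., gene_idx])
    return np.stack(patches)

rng = np.random.default_rng(0)
# Synthetic stand-in for one slide: 64 x 64 spots, 200 genes (assumed sizes).
slide = rng.poisson(1.0, size=(64, 64, 200)).astype(np.float32)

# Hypothetical channel rule: keep the 32 genes with the highest variance.
gene_idx = np.argsort(slide.reshape(-1, 200).var(axis=0))[-32:]

patches = crop_patches(slide, patch=16, stride=16, gene_idx=gene_idx)
print(patches.shape)  # (16, 16, 16, 32): 16 samples from one slide
```

One slide becomes 16 training samples here, each still carrying its local spatial neighbourhood, which is the trade-off the abstract contrasts with per-spot and per-slide sampling.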


2603.19005 2026-06-04 cs.LG cs.AI stat.ME 版本更新

AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science

AgentDS技术报告：领域特定数据科学中人机协作的未来基准测试

An Luo, Jin Du, Xun Xian, Robert Specht, Fangqiao Tian, Ganghua Wang, Xuan Bi, Charles Fleming, Ashish Kundu, Jayanth Srinivasa, Mingyi Hong, Rui Zhang, Tianxi Li, Galin Jones, Jie Ding

发表机构 * School of Statistics, University of Minnesota（明尼苏达大学统计学院）； AIScientists, Inc.（AIScientists公司）； Data Science Institute, University of Chicago（芝加哥大学数据科学研究所）； Carlson School of Management, University of Minnesota（明尼苏达大学卡尔森管理学院）； Cisco Research（思科研究院）； Department of Electrical and Computer Engineering, University of Minnesota（明尼苏达大学电气与计算机工程系）； Division of Computational Health Sciences, University of Minnesota（明尼苏达大学计算健康科学部）

AI总结提出AgentDS基准测试和竞赛，通过17个跨行业挑战评估AI代理及人机协作在领域特定数据科学中的表现，发现AI代理在领域推理上存在不足，人机协作优于纯AI方法。

AI中文摘要

数据科学在将复杂数据转化为跨领域的可操作洞察方面发挥着关键作用。大型语言模型（LLM）和人工智能（AI）代理的最新发展显著自动化了数据科学工作流程。然而，目前尚不清楚AI代理在多大程度上能够匹配人类专家在领域特定数据科学任务上的表现，以及人类专业知识在哪些方面仍具有优势。我们引入了AgentDS，一个旨在评估AI代理和人机协作在领域特定数据科学中表现的基准测试和竞赛。AgentDS包含来自六个行业（商业、食品生产、医疗保健、保险、制造业和零售银行）的17个挑战。我们组织了一场公开竞赛，涉及29支队伍和80名参与者，从而能够系统比较人机协作方法与纯AI基线。我们的结果表明，当前的AI代理在领域特定推理方面存在困难。纯AI基线的表现低于竞赛参与者的前四分位数，而最强的解决方案来自人机协作。这些发现挑战了AI完全自动化的说法，并强调了人类专业知识在数据科学中的持久重要性，同时为下一代AI指明了方向。访问AgentDS网站：https://agentds.org/，开源数据集：https://huggingface.co/datasets/lainmn/AgentDS。

英文摘要

Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) and artificial intelligence (AI) agents have significantly automated data science workflows. However, it remains unclear to what extent AI agents can match the performance of human experts on domain-specific data science tasks, and in which aspects human expertise continues to provide advantages. We introduce AgentDS, a benchmark and competition designed to evaluate both AI agents and human-AI collaboration performance in domain-specific data science. AgentDS consists of 17 challenges across six industries: commerce, food production, healthcare, insurance, manufacturing, and retail banking. We conducted an open competition involving 29 teams and 80 participants, enabling systematic comparison between human-AI collaborative approaches and AI-only baselines. Our results show that current AI agents struggle with domain-specific reasoning. AI-only baselines perform below the top quartile of competition participants, while the strongest solutions arise from human-AI collaboration. These findings challenge the narrative of complete automation by AI and underscore the enduring importance of human expertise in data science, while illuminating directions for the next generation of AI. Visit the AgentDS website here: https://agentds.org/ and open source datasets here: https://huggingface.co/datasets/lainmn/AgentDS


2603.18577 2026-06-04 cs.AI 版本更新

评估小语言模型在领导者-跟随者交互中的零样本和单样本适应

Rafael R. Baptista, André de Lima Salgado, Ricardo V. Godoy, Marcelo Becker, Thiago Boaventura, Gustavo J. G. Lahr

发表机构 * University of Sao Paulo（圣保罗大学）； Federal University of Lavras（拉夫拉斯联邦大学）； Faculdade Israelita de Ensino e Pesquisa Albert Einstein（阿尔伯特·爱因斯坦以色列教育与研究学院）

AI总结本文通过微调小语言模型（Qwen2.5-0.5B）在领导者-跟随者交互中实现角色分类，零样本微调达到86.66%准确率且延迟低至22.2毫秒，但单样本模式因上下文长度增加导致性能下降。

AI中文摘要

领导者-跟随者交互是人机交互（HRI）中的一个重要范式。然而，对于资源受限的移动和辅助机器人来说，实时分配角色仍然具有挑战性。虽然大型语言模型（LLMs）在自然通信方面显示出潜力，但其规模和延迟限制了设备端部署。小语言模型（SLMs）提供了一种潜在的替代方案，但它们在HRI中角色分类的有效性尚未得到系统评估。在本文中，我们提出了一个用于领导者-跟随者通信的SLMs基准测试，引入了一个源自已发表数据库的新数据集，并增加了合成样本以捕捉交互特定的动态。我们研究了两种适应策略：提示工程和微调，在零样本和单样本交互模式下进行研究，并与未训练的基线进行比较。使用Qwen2.5-0.5B的实验表明，零样本微调实现了稳健的分类性能（86.66%准确率），同时保持低延迟（每个样本22.2毫秒），显著优于基线和提示工程方法。然而，结果也表明在单样本模式下性能下降，其中增加的上下文长度挑战了模型的架构能力。这些发现表明，微调的SLMs为直接角色分配提供了有效的解决方案，同时突出了边缘端对话复杂性与分类可靠性之间的关键权衡。

英文摘要

Leader-follower interaction is an important paradigm in human-robot interaction (HRI). Yet, assigning roles in real time remains challenging for resource-constrained mobile and assistive robots. While large language models (LLMs) have shown promise for natural communication, their size and latency limit on-device deployment. Small language models (SLMs) offer a potential alternative, but their effectiveness for role classification in HRI has not been systematically evaluated. In this paper, we present a benchmark of SLMs for leader-follower communication, introducing a novel dataset derived from a published database and augmented with synthetic samples to capture interaction-specific dynamics. We investigate two adaptation strategies: prompt engineering and fine-tuning, studied under zero-shot and one-shot interaction modes, compared with an untrained baseline. Experiments with Qwen2.5-0.5B reveal that zero-shot fine-tuning achieves robust classification performance (86.66% accuracy) while maintaining low latency (22.2 ms per sample), significantly outperforming baseline and prompt-engineered approaches. However, results also indicate a performance degradation in one-shot modes, where increased context length challenges the model's architectural capacity. These findings demonstrate that fine-tuned SLMs provide an effective solution for direct role assignment, while highlighting critical trade-offs between dialogue complexity and classification reliability on the edge.


2603.10289 2026-06-04 quant-ph cs.AI cs.LG 版本更新

Quantum entanglement provides a competitive advantage in adversarial games

量子纠缠在对抗性博弈中提供竞争优势

Peiyong Wang, Kieran Hymas, James Quach

发表机构 * CSIRO（联邦科学与工业研究组织）

AI总结本研究通过量子-经典混合智能体在Pong对抗性马尔可夫博弈中的实验，发现纠缠量子电路在特征提取和竞争性强化学习中优于可分离电路，表明量子纠缠可作为表示学习的功能资源。

Comments 22 pages, 5 figures

AI中文摘要

量子资源是否能在完全经典的竞争环境中提供优势仍然是一个悬而未决的问题。竞争性零和强化学习尤其具有挑战性，因为成功需要对对抗智能体之间的动态交互进行建模，而非静态的状态-动作映射。在此，我们进行了一项受控研究，隔离了量子纠缠在训练于Pong（一个竞争性马尔可夫博弈）的量子-经典混合智能体中的作用。一个8量子比特参数化量子电路作为近端策略优化框架内的特征提取器，允许直接比较可分离电路与包含固定（CZ）或可训练（IsingZZ）纠缠门的架构。纠缠电路在参数数量相当的情况下始终优于可分离电路，并且在低容量区域中达到或超过经典多层感知机基线。表示相似性分析进一步表明，纠缠电路学习到结构上不同的特征，与对交互状态变量的改进建模一致。这些发现确立了纠缠作为竞争性强化学习中表示学习的功能资源。

英文摘要

Whether uniquely quantum resources confer advantages in fully classical, competitive environments remains an open question. Competitive zero-sum reinforcement learning is particularly challenging, as success requires modelling dynamic interactions between opposing agents rather than static state-action mappings. Here, we conduct a controlled study isolating the role of quantum entanglement in a quantum-classical hybrid agent trained on Pong, a competitive Markov game. An 8-qubit parameterised quantum circuit serves as a feature extractor within a proximal policy optimisation framework, allowing direct comparison between separable circuits and architectures incorporating fixed (CZ) or trainable (IsingZZ) entangling gates. Entangled circuits consistently outperform separable counterparts with comparable parameter counts and, in low-capacity regimes, match or exceed classical multilayer perceptron baselines. Representation similarity analysis further shows that entangled circuits learn structurally distinct features, consistent with improved modelling of interacting state variables. These findings establish entanglement as a functional resource for representation learning in competitive reinforcement learning.


2603.10044 2026-06-04 cs.SE cs.AI cs.CL cs.LG 版本更新

Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety

脚手架下的安全性：评估条件如何影响测量的安全性

David Gringras

发表机构 * Harvard University（哈佛大学）； MIT（麻省理工学院）

AI总结本研究通过62,808次盲法预注册评估，测试了六种前沿模型在四种部署配置下的安全性，发现脚手架架构对安全性影响较小，而格式转换（如选择题与开放式问题）可导致5-20个百分点的测量差异，且模型-脚手架间存在显著异质性，质疑了单一综合安全性分数的实用性。

Comments 74 pages including appendices. 6 frontier models, 62,808 primary observations (~89k total). Pre-registered: OSF DOI 10.17605/OSF.IO/CJW92. Code and data: https://github.com/davidgringras/safety-under-scaffolding

AI中文摘要

在基准测试中获得的安全分数不一定能预测同一模型在未经测试的智能体脚手架中的行为。我们通过四种部署配置（直接API、ReAct、多智能体批评者、map-reduce委托）运行了六种前沿模型：在四个安全基准测试（BBQ、TruthfulQA、XSTest/OR-Bench、sycophancy）上进行了N = 62,808次盲法、预注册、等价性检验评估，以及三项支持性分析。ReAct和多智能体脚手架保持在预注册的±2个百分点的等价范围内；map-reduce委托降低了测量的安全性（NNH = 14），尽管这种损失很大程度上是测量伪影：在相同项目上，选择题与开放式问题的措辞使测量的安全率变化5-20个百分点，而分解过程无声地移除了选择题选项。每个模型map-reduce损失的约40-89%归因于这种格式转换而非推理中断，一种保留选项的变体恢复了大部分损失。汇总效应也掩盖了模型与脚手架之间的显著异质性：在map-reduce下，对于相同项目，Opus损失16.8个百分点，而Llama 4增加18.8个百分点。从结构上看，脚手架架构仅解释了0.4%的结果方差（基准选择解释了45倍以上），泛化系数G = 0.000（bootstrap 95% CI [0.000, 0.752]）。如此宽的区间本身足以削弱任何单一综合安全分数作为部署标准的效用。这些是“简单案例”；像诡计和CBRN提升这样的重要属性没有明显理由对格式或脚手架不敏感。代码、数据和提示已作为ScaffoldSafety发布。

英文摘要

A safety score earned on a benchmark need not predict how the same model behaves once it is wrapped in an agentic scaffold the benchmark never tested. We ran six frontier models through four deployment configurations (direct API, ReAct, multi-agent critic, map-reduce delegation): N = 62,808 blinded, pre-registered, equivalence-tested evaluations across four safety benchmarks (BBQ, TruthfulQA, XSTest/OR-Bench, sycophancy), plus three supporting analyses. ReAct and multi-agent scaffolds stay within a pre-registered +/-2 pp equivalence margin; map-reduce delegation degrades measured safety (NNH = 14), though that loss is largely a measurement artifact: on identical items, multiple-choice versus open-ended phrasing shifts the measured safety rate by 5-20 pp, and decomposition silently strips the multiple-choice options. Roughly 40-89% of the per-model map-reduce loss is this format conversion rather than reasoning disruption, and an option-preserving variant recovers most of it. Pooled effects also mask sharp model-by-scaffold heterogeneity: under map-reduce, on identical items, Opus loses 16.8 pp while Llama 4 gains 18.8 pp. Structurally, scaffold architecture explains only 0.4% of outcome variance (benchmark choice explains 45x more), and the generalizability coefficient is G = 0.000 (bootstrap 95% CI [0.000, 0.752]). An interval that wide is enough on its own to undermine the utility of any single composite safety number as a deployment criterion. These are the "easy cases"; consequential properties like scheming and CBRN uplift have no obvious reason to be less format- or scaffold-sensitive. Code, data, and prompts are released as ScaffoldSafety.
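The NNH (number needed to harm) figure quoted above is conventionally the reciprocal of the absolute risk increase. The sketch below shows that arithmetic only; the two safety rates are made-up inputs chosen to land near the reported value, not numbers from the paper, and `number_needed_to_harm` is a hypothetical helper name.

```python
def number_needed_to_harm(baseline_safe_rate, scaffold_safe_rate):
    """Reciprocal of the absolute risk increase between two conditions.

    Both arguments are fractions in [0, 1]: the share of items answered
    safely without and with the scaffold.
    """
    risk_increase = baseline_safe_rate - scaffold_safe_rate
    if risk_increase <= 0:
        raise ValueError("scaffold shows no measured harm; NNH is undefined")
    return 1.0 / risk_increase

# Illustrative rates only (assumed): a 7 pp drop in measured safety.
nnh = number_needed_to_harm(0.900, 0.830)
print(round(nnh, 1))  # about 14.3, i.e. roughly one extra unsafe response per 14 items
```

Read this way, NNH = 14 corresponds to a drop of about 7 percentage points, which is within the 5-20 pp swing the abstract attributes to multiple-choice versus open-ended phrasing alone.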


2603.09493 2026-06-04 cs.CV cs.AI 版本更新

EvoPrompt: Guided Prompt Evolution for Vision-Language Models Adaptation

EvoPrompt: 引导提示演化以适应视觉-语言模型

Enming Zhang, Jiayang Li, Yanlong Wang, Yanru Wu, Zhenyu Liu, Yang Li

发表机构 * Tsinghua Shenzhen International Graduate School, Tsinghua University（清华大学深圳国际研究生院，清华大学）； Sun Yat-sen University（中山大学）； Chinese University of Hong Kong, Shenzhen（香港中文大学（深圳））

AI总结提出EvoPrompt框架，通过引导提示演化路径并解耦低秩更新为方向和幅度分量，实现视觉-语言模型在少样本学习中的无遗忘微调，同时保持零样本能力。

AI中文摘要

大规模视觉-语言模型（VLM）在有限标注数据下适应下游任务仍然是一个重大挑战。虽然参数高效的提示学习方法提供了一条有希望的路径，但它们常常遭受预训练知识的灾难性遗忘。为了解决这一限制，我们的工作基于一个洞察：控制提示的演化路径对于遗忘-free适应至关重要。为此，我们提出了EvoPrompt，一个旨在明确引导提示轨迹以进行知识保留微调的新型框架。具体来说，我们的方法采用模态共享提示投影器（MPP）从统一嵌入空间生成层次化提示。关键的是，一种演化训练策略将低秩更新解耦为方向和幅度分量，保留早期学习的语义方向而仅调整其幅度，从而使提示能够在不丢弃基础知识的情况下演化。这一过程通过特征几何正则化（FGR）进一步稳定，该正则化强制特征去相关以防止表示崩溃。大量实验表明，EvoPrompt在少样本学习中实现了最先进的性能，同时稳健地保留了预训练VLM的原始零样本能力。

英文摘要

The adaptation of large-scale vision-language models (VLMs) to downstream tasks with limited labeled data remains a significant challenge. While parameter-efficient prompt learning methods offer a promising path, they often suffer from catastrophic forgetting of pre-trained knowledge. Toward addressing this limitation, our work is grounded in the insight that governing the evolutionary path of prompts is essential for forgetting-free adaptation. To this end, we propose EvoPrompt, a novel framework designed to explicitly steer the prompt trajectory for knowledge-preserving fine-tuning. Specifically, our approach employs a Modality-Shared Prompt Projector (MPP) to generate hierarchical prompts from a unified embedding space. Critically, an evolutionary training strategy decouples low-rank updates into directional and magnitude components, preserving early-learned semantic directions while only adapting their magnitude, thus enabling prompts to evolve without discarding foundational knowledge. This process is further stabilized by Feature Geometric Regularization (FGR), which enforces feature decorrelation to prevent representation collapse. Extensive experiments demonstrate that EvoPrompt achieves state-of-the-art performance in few-shot learning while robustly preserving the original zero-shot capabilities of pre-trained VLMs.
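The idea of splitting a low-rank update into a direction and a magnitude can be illustrated with a small NumPy sketch. The abstract does not specify the decomposition, so the row-wise norm used here is an assumption, as are the matrix sizes and variable names; this is not the EvoPrompt implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, r = 8, 2

# Low-rank update delta = B @ A with rank at most r.
B = rng.normal(size=(d, r))
A = rng.normal(size=(r, d))
delta = B @ A

# Assumed decomposition: per-row magnitude times a unit-norm direction.
magnitude = np.linalg.norm(delta, axis=1, keepdims=True)
direction = delta / magnitude

# "Evolution" step in this sketch: directions stay frozen, only the
# per-row magnitudes are rescaled (random factors stand in for training).
new_magnitude = magnitude * rng.uniform(0.5, 2.0, size=(d, 1))
evolved = new_magnitude * direction

# Cosine similarity between old and evolved rows stays at 1.
cos = np.sum(evolved * delta, axis=1) / (
    np.linalg.norm(evolved, axis=1) * np.linalg.norm(delta, axis=1)
)
print(np.allclose(cos, 1.0))
```

Because each row is only rescaled, the evolved update keeps the same row directions and the same rank as the original, which is one way to read "preserving early-learned semantic directions while only adapting their magnitude".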


2603.09391 2026-06-04 cs.SD cs.AI eess.AS 版本更新

Physics-Informed Neural Engine Sound Modeling with Differentiable Pulse-Train Synthesis

基于物理信息的神经引擎声音建模与可微分脉冲串合成

Robin Doerfler, Lonce Wyse

发表机构 * GitHub

AI总结提出脉冲串-谐振器（PTR）模型，通过可微分合成架构直接建模发动机脉冲形状和时间结构，利用物理信息归纳偏置提升谐波重建质量并降低总损失。

Comments Revised version; to appear in the Proceedings of the 34th European Signal Processing Conference (EUSIPCO 2026)

AI中文摘要

发动机声音源自连续的排气压力脉冲，而非持续的谐波振荡。虽然神经合成方法通常旨在近似最终的频谱特性，但我们提出直接建模底层脉冲形状和时间结构。我们提出了脉冲串-谐振器（PTR）模型，这是一种可微分合成架构，通过将发动机音频生成为与发动机点火模式对齐的参数化脉冲串，并通过模拟排气声学的递归Karplus-Strong谐振器传播它们。该架构集成了物理信息归纳偏置，包括谐波衰减、热力学音高调制、气门动力学包络、排气系统共振以及导出的发动机运行模式，如节气门操作和减速断油（DFCO）。在三种不同发动机类型（总计7.5小时音频）上验证，PTR在谐波重建上比谐波加噪声基线模型提高了21%，总损失降低了5.7%，同时提供了对应于物理现象的可解释参数。完整的代码、模型权重和音频示例已公开提供。

英文摘要

Engine sounds originate from sequential exhaust pressure pulses rather than sustained harmonic oscillations. While neural synthesis methods typically aim to approximate the resulting spectral characteristics, we propose directly modeling the underlying pulse shapes and temporal structure. We present the Pulse-Train-Resonator (PTR) model, a differentiable synthesis architecture that generates engine audio as parameterized pulse trains aligned to engine firing patterns and propagates them through recursive Karplus-Strong resonators simulating exhaust acoustics. The architecture integrates physics-informed inductive biases including harmonic decay, thermodynamic pitch modulation, valve-dynamics envelopes, exhaust system resonances and derived engine operating modes such as throttle operation and Deceleration Fuel Cutoff (DFCO). Validated on three diverse engine types totaling 7.5 hours of audio, PTR achieves a 21% improvement in harmonic reconstruction and a 5.7% reduction in total loss over a harmonic-plus-noise baseline model, while providing interpretable parameters corresponding to physical phenomena. Complete code, model weights, and audio examples are openly available.
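The signal flow named above, a pulse train driving a Karplus-Strong style resonator, can be sketched in plain NumPy. This is a forward-only illustration under assumed parameters (sample rate, pulse shape, delay, feedback); it is not differentiable and is not the PTR model, and both function names are hypothetical.

```python
import numpy as np

def pulse_train(n_samples, period, width):
    """One short raised-cosine pulse per firing event, `period` samples apart."""
    x = np.zeros(n_samples)
    pulse = 0.5 * (1.0 - np.cos(2.0 * np.pi * np.arange(width) / width))
    for start in range(0, n_samples - width, period):
        x[start:start + width] += pulse
    return x

def karplus_strong(x, delay, feedback=0.9):
    """Recursive comb filter with a two-point averaging low-pass in the loop."""
    y = np.zeros_like(x)
    for n in range(len(x)):
        fed_back = 0.0
        if n >= delay + 1:
            fed_back = 0.5 * (y[n - delay] + y[n - delay - 1])
        y[n] = x[n] + feedback * fed_back
    return y

sr = 16000                                           # assumed sample rate
excitation = pulse_train(sr, period=400, width=32)   # 40 firings per second
audio = karplus_strong(excitation, delay=80)         # resonance near sr / 80 Hz
```

The delay length sets the resonance pitch and the feedback gain sets how long each pulse rings, which is the kind of physically interpretable parameter the abstract refers to.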


2603.09170 2026-06-04 cs.RO cs.AI 版本更新

ShareVerse：面向共享世界建模的多智能体一致视频生成

Jiayi Zhu, Jianing Zhang, Yiying Yang, Wei Cheng, Xiaoyun Yuan

发表机构 * Shanghai Jiao Tong University, China（中国上海交通大学）； Fudan University, China（中国复旦大学）； StepFun, China（中国StepFun）

AI总结提出ShareVerse框架，通过构建多智能体交互数据集、空间拼接策略和跨智能体注意力机制，实现多智能体共享世界的一致视频生成。

AI中文摘要

本文提出ShareVerse，一个视频生成框架，支持多智能体共享世界建模，解决了现有工作缺乏统一共享世界构建和多智能体交互支持的问题。ShareVerse利用大型视频模型的生成能力，并整合了三个关键创新：1）在CARLA仿真平台上构建了大规模多智能体交互世界建模数据集，包含多样场景、天气条件和交互轨迹，以及配对的每智能体四视角视频（前/后/左/右视图）和相机数据。2）我们提出了一种针对独立智能体四视角视频的空间拼接策略，以建模更广泛的环境并确保内部多视角几何一致性。3）我们将跨智能体注意力模块集成到预训练视频模型中，实现跨智能体时空信息的交互传递，保证重叠区域的共享世界一致性和非重叠区域的合理生成。支持49帧大规模视频生成的ShareVerse能够准确感知动态智能体的位置，实现一致的共享世界建模。

英文摘要

This paper presents ShareVerse, a video generation framework enabling multi-agent shared world modeling, addressing the gap in existing works that lack support for unified shared world construction with multi-agent interaction. ShareVerse leverages the generation capability of large video models and integrates three key innovations: 1) A dataset for large-scale multi-agent interactive world modeling is built on the CARLA simulation platform, featuring diverse scenes, weather conditions, and interactive trajectories with paired multi-view videos (front/ rear/ left/ right views per agent) and camera data. 2) We propose a spatial concatenation strategy for four-view videos of independent agents to model a broader environment and to ensure internal multi-view geometric consistency. 3) We integrate cross-agent attention blocks into the pretrained video model, which enable interactive transmission of spatial-temporal information across agents, guaranteeing shared world consistency in overlapping regions and reasonable generation in non-overlapping regions. ShareVerse, which supports 49-frame large-scale video generation, accurately perceives the position of dynamic agents and achieves consistent shared world modeling.


2509.02655 2026-06-04 cs.CY cs.AI 版本更新

BioBlue: Systematic runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with simplified observation format

BioBlue：在生物与经济对齐的AI安全基准上，具有简化观察格式的LLM的系统性类失控优化失败模式

Roland Pihlakas, Sruthi Susan Kuriakose

发表机构 * Independent researcher（独立研究者）； Three Laws research collaboration（Three Laws研究合作）； Rakvere, Estonia（爱沙尼亚拉克雷市）

AI总结本研究通过长期控制环境测试LLM，发现尽管LLM能理解目标，但在多目标场景下会系统性偏离至单目标、无界优化行为，表现出类似失控优化的失败模式。

Comments 27 pages, 7 figures, 7 tables

AI中文摘要

许多关于“失控优化”的AI对齐讨论聚焦于RL智能体：无界效用最大化者，它们以牺牲其他一切为代价过度优化代理目标（例如，“回形针最大化者”、规范博弈）。基于LLM的系统通常被认为更安全，因为它们作为下一个词元预测器而非持久优化器运行。我们通过将LLM置于需要随时间维持状态或平衡目标的简单、长期控制型环境中来实证检验这一假设：单目标和多目标稳态、平衡无界目标与递减收益、以及可再生资源的可持续性。我们发现，尽管LLM在多个步骤中经常表现适当并清楚理解所述目标，但它们常常以结构化的方式丢失上下文并漂移至失控行为：忽略稳态目标、从多目标权衡崩溃为单目标最大化——从而未能尊重凹效用结构。这些失败在初始阶段的能力行为之后可靠地出现，并表现出特征模式（包括自模仿振荡、无界最大化以及回归单目标优化），尽管此时上下文窗口远未满。问题不在于LLM只是丢失上下文并变得不连贯。尽管LLM表面上看似多目标且有界，但在涉及多目标的持续交互下，其行为系统性偏向于像单目标、无界、对齐不良的优化器。我们假设存在一个词元级模式强化吸引子：LLM可能越来越多地从其近期动作历史的词元模式而非原始指令中推导动作。为何这仅发生在多目标设置中仍是一个开放问题。

英文摘要

Many AI alignment discussions of "runaway optimisation" focus on RL agents: unbounded utility maximisers that over-optimise a proxy objective (e.g., "paperclip maximiser", specification gaming) at the expense of everything else. LLM-based systems are often assumed to be safer because they function as next-token predictors rather than persistent optimisers. We empirically test this assumption by placing LLMs in simple, long-horizon control-style environments that require maintaining state or balancing objectives over time: single- and multi-objective homeostasis, balancing unbounded objectives with diminishing returns, and sustainability of a renewable resource. We find that, although LLMs frequently behave appropriately for many steps and clearly understand the stated objectives, they often lose context in structured ways and drift into runaway behaviours: ignoring homeostatic targets and collapsing from multi-objective trade-offs into single-objective maximisation, thus failing to respect concave utility structures. These failures emerge reliably after initial periods of competent behaviour and exhibit characteristic patterns (including self-imitative oscillations, unbounded maximisation, and reverting to single-objective optimisation), even though the context window is far from full at that point. The problem is not that the LLMs just lose context and become incoherent. Although LLMs appear multi-objective and bounded on the surface, their behaviour under sustained interaction involving multiple objectives is systematically biased towards acting like single-objective, unbounded, poorly aligned optimisers. We hypothesise a token-level pattern reinforcement attractor: LLMs may increasingly derive actions from the token patterns of their recent action history rather than from the original instructions. Why this happens only in multi-objective settings remains an open question.


2602.20971 2026-06-04 cs.LG cs.AI 版本更新

Does Order Matter : Connecting The Law of Robustness to Robust Generalization

顺序重要吗：连接鲁棒性定律与鲁棒泛化

Mihir More, Aritra Das, Jaee Ponde, Himadri Mandal, Vishnu Varadarajan, Debayan Gupta

发表机构 * Ashoka University（阿育王大学）； Truth Audit Labs（真相审计实验室）； Indian Statistical Institute（印度统计研究所）

AI总结本文通过全局和局部Rademacher复杂度，将鲁棒性定律（Lipschitz常数下界）与鲁棒泛化误差联系起来，证明了对任意数据分布，全局Lipschitz界阶不变，而局部Lipschitz界阶随扰动半径和局部浓度项变化。

详情

AI中文摘要

Bubeck和Selke（2021）将鲁棒性定律与鲁棒泛化误差之间的联系作为一个开放问题提出。鲁棒性定律指出，过参数化对于模型实现鲁棒插值是必要的，即插值函数必须是Lipschitz的。Wu等人（2023）将该定律推广到任意数据分布，证明Lipschitz常数满足$L = Ω(n^{1/d})$。另一方面，鲁棒泛化研究小的鲁棒训练损失是否意味着小的鲁棒测试损失。这可以使用统计学习技术（如Rademacher复杂度）来研究，其中鲁棒损失类的Rademacher复杂度的界意味着函数类Lipschitz性的界。我们利用这一联系，明确地将两者联系起来，适用于任意数据分布。(i) 我们证明，在考虑鲁棒损失类的全局Rademacher复杂度时，Lipschitz界的阶保持不变。(ii) 在局部尺度上，即对于具有小经验误差的函数子集，Lipschitz界的阶随扰动半径$ρ$和局部浓度项$\sqrt{r/n}$变化。

English abstract

Bubeck and Sellke (2021) pose the connection between the Law of Robustness and robust generalization error as an open problem. The Law of Robustness states that overparameterization is necessary for models to interpolate robustly, i.e., the interpolating function is required to be Lipschitz. Wu et al. (2023) extend this law to arbitrary data distributions, proving that the Lipschitz constant satisfies $L = \Omega(n^{1/d})$. Robust generalization, on the other hand, asks whether small robust training loss implies small robust test loss. This can be studied using statistical learning techniques such as Rademacher complexities, where a bound on the Rademacher complexity of the robust loss class implies a bound on the Lipschitzness of the function class. We use this connection to explicitly link the two for arbitrary data distributions. (i) We prove that the order of the Lipschitz bound remains the same when considering the global Rademacher complexity of robust loss classes. (ii) At the local scale, i.e., for subsets of functions with small empirical error, the order of the Lipschitz bound changes with the perturbation radius $\rho$ and the localized concentration term $\sqrt{r/n}$.
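As a rough illustration of the quantity these bounds are built on, the empirical Rademacher complexity of a finite function class can be estimated by Monte Carlo. The sketch below is not taken from the paper: the function name `empirical_rademacher` and both toy classes are assumptions made for illustration, and the robust loss classes studied in the abstract are not modeled.

```python
import itertools
import random


def empirical_rademacher(outputs, num_draws=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity.

    `outputs` is a list of rows; row k holds f_k(x_1), ..., f_k(x_n),
    the values of one function of the class on a fixed sample of n points.
    """
    rng = random.Random(seed)
    n = len(outputs[0])
    total = 0.0
    for _ in range(num_draws):
        # Draw independent random signs sigma_i in {-1, +1}.
        sigma = [rng.choice((-1, 1)) for _ in range(n)]
        # Supremum over the class of the sign-weighted sample average.
        total += max(sum(s * v for s, v in zip(sigma, row)) / n for row in outputs)
    return total / num_draws


# Toy class 1 (hypothetical): every sign pattern on n = 4 points.
all_signs = [list(p) for p in itertools.product((-1, 1), repeat=4)]
# Toy class 2 (hypothetical): one fixed function and its negation.
pair = [[1, 1, 1, 1], [-1, -1, -1, -1]]

rich = empirical_rademacher(all_signs)  # class can match any sign draw
poor = empirical_rademacher(pair)       # class can only match the overall sign
```

The richer class scores higher because it can correlate with any random sign pattern, which is the sense in which a bound on this complexity limits how expressive (for example, how non-Lipschitz) the class can be.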

2511.05722 2026-06-04 cs.CL cs.AI version update

OckBench: Measuring the Efficiency of LLM Reasoning

OckBench：衡量LLM推理效率

Zheng Du, Hao Kang, Song Han, Tushar Krishna, Ligeng Zhu

Affiliations: Georgia Institute of Technology; Massachusetts Institute of Technology; NVIDIA

AI summary: Proposes OckBench, a benchmark that jointly evaluates accuracy and token efficiency on reasoning and coding tasks, revealing that current models use tokens inefficiently.

Details

AI Chinese abstract

大型语言模型（LLM）如GPT-5和Gemini 3已推动自动推理和代码生成的前沿。然而，当前基准强调准确性和输出质量，忽略了关键维度：token使用的效率。在实际应用中，token效率变化很大。解决相同问题且准确率相近的模型，其token长度差异可达5.0×，导致模型推理能力存在巨大差距。这种差异暴露了显著的冗余，凸显了对标准化基准来量化token效率差距的迫切需求。因此，我们引入OckBench，这是首个联合衡量推理和编码任务中准确性与token效率的基准。我们的评估表明，当前模型的token效率在很大程度上未得到优化，显著增加了服务成本和延迟。这些发现为社区优化潜在推理能力（即token效率）提供了具体路线图。最终，我们主张评估范式转变：token不应被无谓地倍增。我们的基准可在https://ockbench.github.io/获取。

English abstract

Large language models (LLMs) such as GPT-5 and Gemini 3 have pushed the frontier of automated reasoning and code generation. Yet current benchmarks emphasize accuracy and output quality, neglecting a critical dimension: efficiency of token usage. Token efficiency is highly variable in practice. Models solving the same problem with similar accuracy can exhibit up to a 5.0× difference in token length, leading to a massive gap in model reasoning ability. Such variance exposes significant redundancy, highlighting the critical need for a standardized benchmark to quantify the gap in token efficiency. Thus, we introduce OckBench, the first benchmark that jointly measures accuracy and token efficiency across reasoning and coding tasks. Our evaluation reveals that token efficiency remains largely unoptimized across current models, significantly inflating serving costs and latency. These findings provide a concrete roadmap for the community to optimize a latent reasoning ability, namely token efficiency. Ultimately, we argue for an evaluation paradigm shift: tokens must not be multiplied beyond necessity. Our benchmarks are available at https://ockbench.github.io/.
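To make the idea of reporting accuracy and token usage side by side concrete, here is a minimal sketch. It is not OckBench's actual metric or API: the `RunResult` record, the `summarize` helper, and all numbers are hypothetical and chosen only to show two models with equal accuracy but very different token counts.

```python
from dataclasses import dataclass


@dataclass
class RunResult:
    """One model's answer to one problem (hypothetical record layout)."""
    correct: bool
    output_tokens: int


def summarize(results):
    """Return (accuracy, mean output tokens per problem)."""
    n = len(results)
    accuracy = sum(r.correct for r in results) / n
    mean_tokens = sum(r.output_tokens for r in results) / n
    return accuracy, mean_tokens


# Made-up toy numbers, not taken from the benchmark.
model_a = [RunResult(True, 400), RunResult(True, 600), RunResult(False, 500)]
model_b = [RunResult(True, 2000), RunResult(True, 3000), RunResult(False, 2500)]

acc_a, tok_a = summarize(model_a)
acc_b, tok_b = summarize(model_b)
token_ratio = tok_b / tok_a  # how many times more tokens model_b spends
```

An accuracy-only leaderboard would rank these two toy models as tied, while the token ratio exposes the difference in serving cost that the abstract argues should also be measured.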

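2602.19101 2026-06-04 cs.CL cs.AI version update

Value Entanglement: Conflation Between Different Kinds of Good In (Some) Large Language Models

价值纠缠：大型语言模型中不同种类好的混淆

Seong Hah Cho, Junyi Li, Anna Leshinskaya

Affiliations: Independent Department of Cognitive Sciences, UC Irvine

AI summary: By probing model behavior, embeddings, and residual-stream activations, the paper finds that large language models commonly exhibit value entanglement: moral, grammatical, and economic value are conflated, with grammatical and economic value overly influenced by moral value. Selectively ablating the activation vectors associated with morality repairs the problem.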

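2602.16966 2026-06-04 cs.LG cs.AI version update

A Unified Framework for Locality in Scalable MARL

可扩展多智能体强化学习中局部性的统一框架

Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen

Affiliations: University of Colorado Boulder; INRIA Paris

AI summary: Proposes a unified framework that decomposes the matrix $C^π$ into environment-sensitivity and policy-sensitivity parts, uses the spectral-radius condition $ρ(H^π)<1$, which is strictly weaker than the row-sum condition, shows that the softmax temperature directly controls locality, and gives a deterministic guarantee for block-coordinate KL-proximal policy improvement.

Details

AI Chinese abstract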

网络化多智能体强化学习的可扩展方法让每个智能体仅使用智能体图的一小部分邻域进行规划。这仅在系统是值局部性时有效，即一个智能体的扰动对远处另一个智能体的长期值影响较弱。在平均奖励设置中，验证局部性的标准方法是Dobrushin行和界，该界基于一个矩阵$C^π$，该矩阵捕捉每个智能体的下一个状态如何依赖于其他智能体的当前状态。为了使该矩阵易于处理，先前的工作通过联合动作的上确界来约束它。得到的界与策略无关，但当策略从不选择最坏情况动作时，该界是松的。我们将$C^π$分解为分别跟踪环境敏感性和策略敏感性的部分，$C^π\preceq E^{\mathrm s}+E^{\mathrm a}Π(π)$，其中$E^{\mathrm s}$衡量下一个状态如何随当前状态变化，$E^{\mathrm a}$衡量它如何随当前动作变化，$Π(π)$衡量策略对状态变化的反应程度。那么$H^π:= E^{\mathrm s}+E^{\mathrm a}Π(π)$的谱半径控制平均奖励泊松解的衰减，谱证书$ρ(H^π)<1$严格弱于同一矩阵上的行和条件$\|H^π\|_\infty<1$，并适用于先前Dobrushin风格工作中使用的策略无关动作上确界界无法处理的场景。对于温度-$τ$ softmax策略，我们有$Π(π)\le L/(2τ)$，因此softmax温度直接控制局部性。我们利用这一衰减结果为块坐标KL近端策略改进模板提供确定性预言机保证，其截断偏差随消息传递半径$κ$指数衰减。

English abstract

Scalable methods for networked multi-agent reinforcement learning let each agent plan using only a small neighborhood of the agent graph. This works only when the system is value-local, meaning a perturbation at one agent affects the long-run value at another agent weakly when the two are far apart. In the average-reward setting, the standard way to certify locality is the Dobrushin row-sum bound on a single matrix $C^π$ that captures how each agent's next state depends on each other agent's current state. To make this matrix easy to work with, prior work bounds it by a supremum over joint actions. The resulting bound is independent of the policy, but it is loose whenever the policy never picks the worst-case action. We split $C^π$ into pieces that separately track environment sensitivity and policy sensitivity, $C^π\preceq E^{\mathrm s}+E^{\mathrm a}Π(π)$, where $E^{\mathrm s}$ measures how the next state moves with the current state, $E^{\mathrm a}$ measures how it moves with the current action, and $Π(π)$ measures how reactive the policy is to changes in state. The spectral radius of $H^π:= E^{\mathrm s}+E^{\mathrm a}Π(π)$ then controls the decay of the average-reward Poisson solution, and the spectral certificate $ρ(H^π)<1$ is strictly weaker than the row-sum condition $\|H^π\|_\infty<1$ on the same matrix and applies in regimes where policy-independent action-supremum bounds used in prior Dobrushin-style work cannot. For temperature-$τ$ softmax policies we get $Π(π)\le L/(2τ)$, so the softmax temperature directly controls locality. We use this decay result to give a deterministic oracle guarantee for a block-coordinate KL-proximal policy-improvement template whose truncation bias decays exponentially in the message-passing radius $κ$.
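The gap between the spectral certificate and the row-sum condition can be seen on a small example. Since the spectral radius never exceeds the infinity norm, any matrix passing the row-sum test also passes the spectral test, but not the other way around. The sketch below uses a hypothetical 2x2 nonnegative matrix `H` (not from the paper) and helper names that are assumptions for illustration.

```python
import cmath


def row_sum_norm(m):
    """Infinity norm: the largest absolute row sum."""
    return max(sum(abs(x) for x in row) for row in m)


def spectral_radius_2x2(m):
    """Largest eigenvalue modulus of a 2x2 matrix via its characteristic polynomial."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    root = cmath.sqrt(trace * trace - 4 * det)
    return max(abs((trace + root) / 2), abs((trace - root) / 2))


# Hypothetical sensitivity matrix, chosen only to separate the two conditions.
H = [[0.5, 0.6],
     [0.1, 0.5]]

norm_inf = row_sum_norm(H)    # 1.1, so the row-sum condition fails
rho = spectral_radius_2x2(H)  # about 0.745, so the spectral condition holds
```

Here the first row sums to more than one, so a Dobrushin-style row-sum test would be inconclusive, while the eigenvalues still lie strictly inside the unit disk.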

2602.03972 2026-06-04 stat.ML cs.AI cs.LG version update

Fixed Budget is No Harder Than Fixed Confidence in Best-Arm Identification up to Logarithmic Factors

固定预算在最佳臂识别中不比固定置信度难（对数因子范围内）

Kapilan Balagopalan, Yinan Li, Yao Zhao, Tuan Nguyen, Anton Daitche, Houssam Nassif, Kwang-Sung Jun

Affiliations: University of California, Berkeley; University of Washington; University of Texas at Austin

AI summary: Proposes FC2FB, a meta-algorithm that converts a fixed-confidence algorithm into a fixed-budget algorithm, and proves that the fixed-budget sample complexity is no higher than the fixed-confidence one up to logarithmic factors.

Details

Journal ref: International Conference on Machine Learning (ICML'26), Seoul, Korea, 2026

AI Chinese abstract

最佳臂识别（BAI）问题是交互式机器学习中最基本的问题之一，有两种形式：固定预算设置（FB）和固定置信度设置（FC）。对于具有唯一最佳臂的$K$臂赌博机，两种设置的最优样本复杂度已被确定，且在对数因子内匹配。这引出了一个关于通用的、可能具有结构化的BAI问题的有趣研究问题：FB是否比FC更难，还是相反？在本文中，我们证明FB在对数因子内并不比FC难。我们通过构造性方式做到这一点：我们提出了一种名为FC2FB（固定置信度到固定预算）的新算法，这是一种元算法，它接收一个FC算法$\mathcal{A}$并将其转化为FB算法。我们证明FC2FB的样本复杂度与$\mathcal{A}$的样本复杂度在对数因子内匹配。这意味着最优FC样本复杂度是FB最优样本复杂度的一个上界（在对数因子内）。我们的结果不仅揭示了FB和FC之间的基本关系，而且具有重要含义：FC2FB与现有最先进的FC算法相结合，可以改善许多FB问题的样本复杂度。

English abstract

The best-arm identification (BAI) problem is one of the most fundamental problems in interactive machine learning and comes in two flavors: the fixed-budget setting (FB) and the fixed-confidence setting (FC). For $K$-armed bandits with a unique best arm, the optimal sample complexities for both settings have been settled, and they match up to logarithmic factors. This prompts an interesting research question about generic, potentially structured BAI problems: is FB harder than FC or the other way around? In this paper, we show that FB is no harder than FC up to logarithmic factors. We do this constructively: we propose a novel algorithm called FC2FB (fixed confidence to fixed budget), a meta-algorithm that takes in an FC algorithm $\mathcal{A}$ and turns it into an FB algorithm. We prove that FC2FB enjoys a sample complexity that matches that of $\mathcal{A}$ up to logarithmic factors. This means that the optimal FC sample complexity is an upper bound on the optimal FB sample complexity up to logarithmic factors. Our result not only reveals a fundamental relationship between FB and FC, but also has a significant implication: FC2FB combined with existing state-of-the-art FC algorithms leads to improved sample complexity for a number of FB problems.

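2602.15202 2026-06-04 quant-ph cs.AI cs.NA eess.SP math.NA stat.CO version update

Tomography by Design: An Algebraic Approach to Low-Rank Quantum States

按设计断层扫描：低秩量子态的代数方法

Shakir Showkat Sofi, Charlotte Vermeylen, Lieven De Lathauwer

Affiliations: Leuven.AI - KU Leuven institute for AI

AI summary: Proposes an algebraic algorithm that estimates structured entries of the density matrix by measuring specific observables and then, under a low-rank assumption, completes the matrix with numerical linear algebra, giving efficient and deterministic quantum state tomography.

Comments: 5 pages, accepted to EUSIPCO 2026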

2602.14117 2026-06-04 cs.NI cs.AI version update

Toward Autonomous O-RAN: A Multi-Scale Agentic AI Framework for Real-Time Network Control and Management

迈向自主O-RAN：一种用于实时网络控制与管理的多尺度智能体AI框架

Hojjat Navidan, Mohammad Cheraghinia, Jaron Fontaine, Mohamed Seif, Eli De Poorter, H. Vincent Poor, Ingrid Moerman, Adnan Shahid

Affiliations: IDLab, Department of Information Technology at Ghent University - imec; Department of Electrical and Computer Engineering, Princeton University

AI summary: Proposes a multi-scale agentic AI framework that achieves autonomous O-RAN control and management through a coordinated hierarchy of non-real-time, near-real-time, and real-time control loops, and validates it under non-stationary conditions and in an intent-driven slice resource control scenario.

Comments: Submitted to the IEEE Networks Journal

Details

AI Chinese abstract

开放无线接入网络（O-RAN）通过解耦、软件驱动的组件和开放接口承诺灵活的6G网络接入，但这种可编程性也增加了操作复杂性。多个控制环共存于服务管理层和RAN智能控制器（RIC）中，而独立开发的控制应用可能以意外方式交互。同时，生成式人工智能的最新进展正在推动从孤立AI模型向能够解释目标、协调多个模型和控制功能并随时间调整行为的智能体AI系统转变。本文提出了一种用于O-RAN的多尺度智能体AI框架，将RAN智能组织为跨非实时（Non-RT）、近实时（Near-RT）和实时（RT）控制环的协调层次结构：（i）Non-RT RIC中的大语言模型（LLM）智能体将运营商意图转化为策略并管理模型生命周期；（ii）Near-RT RIC中的小语言模型（SLM）智能体执行低延迟优化，并可激活、调整或禁用现有控制应用；（iii）分布式单元附近的无线物理层基础模型（WPFM）智能体提供接近空中接口的快速推理。我们描述了这些智能体如何通过标准化的O-RAN接口和遥测进行协作。通过基于开源模型、软件和数据集的概念验证实现，我们在两个代表性场景中展示了所提出的智能体方法：非平稳条件下的鲁棒操作和意图驱动的切片资源控制。

English abstract

Open Radio Access Networks (O-RAN) promise flexible 6G network access through disaggregated, software-driven components and open interfaces, but this programmability also increases operational complexity. Multiple control loops coexist across the service management layer and RAN Intelligent Controller (RIC), while independently developed control applications can interact in unintended ways. In parallel, recent advances in generative Artificial Intelligence (AI) are enabling a shift from isolated AI models toward agentic AI systems that can interpret goals, coordinate multiple models and control functions, and adapt their behavior over time. This article proposes a multi-scale agentic AI framework for O-RAN that organizes RAN intelligence as a coordinated hierarchy across the Non-Real-Time (Non-RT), Near-Real-Time (Near-RT), and Real-Time (RT) control loops: (i) a Large Language Model (LLM) agent in the Non-RT RIC translates operator intent into policies and governs model lifecycles; (ii) Small Language Model (SLM) agents in the Near-RT RIC execute low-latency optimization and can activate, tune, or disable existing control applications; and (iii) Wireless Physical-layer Foundation Model (WPFM) agents near the distributed unit provide fast inference close to the air interface. We describe how these agents cooperate through standardized O-RAN interfaces and telemetry. Using a proof-of-concept implementation built on open-source models, software, and datasets, we demonstrate the proposed agentic approach in two representative scenarios: robust operation under non-stationary conditions and intent-driven slice resource control.

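2602.12643 2026-06-04 cs.LG cs.AI stat.ML version update

Unifying Model-Free Efficiency and Model-Based Representations via Latent Dynamics

通过潜在动力学统一无模型效率与基于模型的表示

Jashaswimalya Acharjee, Balaraman Ravindran

AI summary: Proposes the Unified Latent Dynamics algorithm, which embeds state-action pairs into a latent space where the value function is approximately linear, combining model-free efficiency with the strengths of model-based representations without planning overhead, and matching or exceeding specialized baselines across 80 environments.

Comments: Similarities found with a prior work. Hence, requesting for withdrawal until further notice

Details

AI Chinese abstract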

我们提出了统一潜在动力学（ULD），一种新颖的强化学习算法，它统一了无模型方法的效率与基于模型方法的表示优势，且不产生规划开销。通过将状态-动作对嵌入到真实值函数近似线性的潜在空间中，我们的方法支持跨不同领域使用单一超参数集——从低维和像素输入的连续控制到高维Atari游戏。我们证明，在温和条件下，基于嵌入的时序差分更新的不动点与相应线性基于模型的值扩展的不动点一致，并推导了将嵌入保真度与值逼近质量相关联的显式误差界。在实践中，ULD采用编码器、值函数和策略网络的同步更新、短视界预测动力学的辅助损失以及奖励尺度归一化，以确保在稀疏奖励下的稳定学习。在涵盖Gym运动控制、DeepMind Control（本体感觉和视觉）以及Atari的80个环境上的评估表明，我们的方法匹配或超过了专门的基于模型和通用基于模型的基线的性能——以最少的调参和更少的参数实现了跨领域能力。这些结果表明，仅与值对齐的潜在表示就能提供传统上归因于完整基于模型规划的适应性和样本效率。

English abstract

We present Unified Latent Dynamics (ULD), a novel reinforcement learning algorithm that unifies the efficiency of model-free methods with the representational strengths of model-based approaches, without incurring planning overhead. By embedding state-action pairs into a latent space in which the true value function is approximately linear, our method supports a single set of hyperparameters across diverse domains -- from continuous control with low-dimensional and pixel inputs to high-dimensional Atari games. We prove that, under mild conditions, the fixed point of our embedding-based temporal-difference updates coincides with that of a corresponding linear model-based value expansion, and we derive explicit error bounds relating embedding fidelity to value approximation quality. In practice, ULD employs synchronized updates of encoder, value, and policy networks, auxiliary losses for short-horizon predictive dynamics, and reward-scale normalization to ensure stable learning under sparse rewards. Evaluated on 80 environments spanning Gym locomotion, DeepMind Control (proprioceptive and visual), and Atari, our approach matches or exceeds the performance of specialized model-free and general model-based baselines -- achieving cross-domain competence with minimal tuning and a fraction of the parameter footprint. These results indicate that value-aligned latent representations alone can deliver the adaptability and sample efficiency traditionally attributed to full model-based planning.

2602.11189 2026-06-04 q-bio.BM cs.AI version update

MuCO: Generative Peptide Cyclization Empowered by Multi-stage Conformation Optimization

MuCO：基于多阶段构象优化的生成式肽环化

Yitian Wang, Fanmeng Wang, Angxiao Yue, Wentao Guo, Yaning Cui, Hongteng Xu

Affiliations: Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China; Beijing Key Laboratory of Research on Large Models; Engineering Research Center of Next-Generation Intelligent Search

AI summary: Proposes MuCO, which generates cyclic peptide conformations through multi-stage conformation optimization and outperforms existing methods in physical stability, structural diversity, and computational efficiency.

Details

AI Chinese abstract

建模肽环化对于虚拟筛选具有理想物理和药物特性的候选肽至关重要。这一任务具有挑战性，因为环肽通常呈现多样化的环状构象，而由线性肽折叠推导出的确定性预测模型无法很好地捕捉这些构象。在本研究中，我们提出MuCO（多阶段构象优化），一种生成式肽环化方法，对以相应线性肽为条件的环肽构象分布进行建模。原则上，MuCO将肽环化任务解耦为三个阶段：拓扑感知的主链设计、生成式侧链打包和物理感知的全原子优化，从而以从粗到细的方式生成和优化环肽构象。这种多阶段框架实现了用于构象生成的高效并行采样策略，并允许快速探索多样化的低能构象。在大型CPSea数据集上的实验表明，MuCO在物理稳定性、结构多样性、二级结构恢复和计算效率方面显著且一致地优于最先进的方法，使其成为探索和设计环肽的有前景的计算工具。所提出方法的演示可在https://github.com/mianqiu00/MuCO找到。

AlgoVeri：面向经典算法的验证代码生成对齐基准

Haoyu Zhao, Ziran Yang, Jiawei Li, Deyuan He, Zenan Li, Chi Jin, Venugopal V. Veeravalli, Aarti Gupta, Sanjeev Arora

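Affiliations: University of California, Berkeley

AI summary: To address the lack of a unified method for evaluating verified code generation across paradigms, proposes the AlgoVeri benchmark, which evaluates vericoding of 77 classical algorithms in Dafny, Verus, and Lean and reveals capability gaps across verification systems.

Comments: Accepted to ICML 2026, 32 pages

Details

AI Chinese abstract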

验证代码生成指从严格规范生成形式化验证的代码。近期AI模型在验证代码生成方面展现出潜力，但缺乏跨范式的统一评估方法。现有基准仅测试单一语言/工具（如Dafny、Verus和Lean），且各自覆盖非常不同的任务，因此性能数据无法直接比较。我们通过AlgoVeri基准填补这一空白，该基准在Dafny、Verus和Lean上评估77个经典算法的验证代码生成。通过强制使用相同的功能契约，AlgoVeri揭示了验证系统中的关键能力差距。前沿模型在Dafny中取得了可观的成功（Gemini-3 Flash为40.3%），其中高层抽象和SMT自动化简化了工作流，但在Verus的系统级内存约束（24.7%）和Lean所需的显式证明构造（7.8%）下性能急剧下降。除了总体指标，我们还发现了测试时计算动态的显著差异：Gemini-3有效利用迭代修复提升性能（例如，在Dafny中通过率提高三倍），而GPT-OSS则早期饱和。最后，我们的错误分析表明，语言设计影响改进轨迹：Dafny允许模型专注于逻辑正确性，而Verus和Lean将模型困在持久的语法和语义障碍中。所有数据和评估代码可在https://github.com/haoyuzhao123/algoveri获取。

English abstract

Vericoding refers to the generation of formally verified code from rigorous specifications. Recent AI models show promise in vericoding, but a unified methodology for cross-paradigm evaluation is lacking. Existing benchmarks test only individual languages/tools (e.g., Dafny, Verus, and Lean) and each covers very different tasks, so the performance numbers are not directly comparable. We address this gap with AlgoVeri, a benchmark that evaluates vericoding of 77 classical algorithms in Dafny, Verus, and Lean. By enforcing identical functional contracts, AlgoVeri reveals critical capability gaps in verification systems. While frontier models achieve tractable success in Dafny (40.3% for Gemini-3 Flash), where high-level abstractions and SMT automation simplify the workflow, performance collapses under the systems-level memory constraints of Verus (24.7%) and the explicit proof construction required by Lean (7.8%). Beyond aggregate metrics, we uncover a sharp divergence in test-time compute dynamics: Gemini-3 effectively utilizes iterative repair to boost performance (e.g., tripling pass rates in Dafny), whereas GPT-OSS saturates early. Finally, our error analysis shows that language design affects the refinement trajectory: while Dafny allows models to focus on logical correctness, Verus and Lean trap models in persistent syntactic and semantic barriers. All data and evaluation code can be found at https://github.com/haoyuzhao123/algoveri.

2509.25289 2026-06-04 cs.LG cs.AI version update

ClustRecNet: A Novel End-to-End Deep Learning Framework for Clustering Algorithm Recommendation

ClustRecNet: 一种用于聚类算法推荐的新型端到端深度学习框架

Mohammadreza Bakhtyari, Bogdan Mazoure, Renato Cordeiro de Amorim, Guillaume Rabusseau, Vladimir Makarenkov

Affiliations: Département d’Informatique, Université du Québec à Montréal; Mila - Quebec AI Institute; School of Computer Science and EE, University of Essex; Department of Computer Science and Operations Research, Université de Montréal

AI summary: Proposes ClustRecNet, an end-to-end deep learning framework that recommends suitable clustering algorithms by directly learning high-order representations of raw tabular data, outperforming traditional internal cluster validity indices and AutoML approaches on synthetic and real-world benchmarks.

Comments: Published in IEEE Access

Details

DOI: 10.1109/ACCESS.2026.3697689
Journal ref: IEEE Access, vol. 14, pp. 81352 - 81365, 2026

AI Chinese abstract

为给定数据集识别有效的聚类算法仍然是一个基本的无监督学习问题。我们引入了ClustRecNet，一种新颖的端到端深度学习框架，通过直接学习原始表格数据的高阶表示来推荐合适的聚类算法。为了促进稳健的元学习，我们首先构建了一个包含34,000个合成数据集的综合存储库，涵盖了多种聚类场景，运行了10种流行的聚类算法，并使用调整兰德指数（ARI）建立真实标签。ClustRecNet的架构包含一个卷积块、两个残差块和一个注意力块，以捕获局部和全局结构模式，有效绕过了与手动特征工程相关的知识瓶颈。在合成和真实世界基准上的广泛评估表明，ClustRecNet始终优于传统的内部聚类有效性指标，如轮廓系数、Calinski-Harabasz、Davies-Bouldin和Dunn，以及最先进的自动化机器学习（AutoML）方法，如ML2DAC、AutoCluster和AutoML4Clust。例如，我们的框架在合成数据上平均比Calinski-Harabasz聚类有效性指数高出0.497的ARI增益，在真实世界基准上平均比领先的AutoML方法（ML2DAC）高出44.16%的ARI改进。代码和数据可在以下网址获取：https://github.com/mrbakhtyari/ClustRecNet

English abstract

Identifying an effective clustering algorithm for a given dataset remains a fundamental unsupervised learning issue. We introduce ClustRecNet, a novel end-to-end deep learning framework that recommends suitable clustering algorithm(s) by directly learning high-order representations of raw tabular data. To facilitate robust meta-learning, we first construct a comprehensive repository of 34,000 synthetic datasets encompassing a large variety of clustering scenarios, run 10 popular clustering algorithms, and use Adjusted Rand Index (ARI) to establish ground-truth labels. ClustRecNet's architecture incorporates a convolution block, two residual blocks, and an attention block to capture local and global structural patterns, effectively bypassing the knowledge bottleneck associated with manual feature engineering. Extensive evaluation on both synthetic and real-world benchmarks demonstrates that ClustRecNet consistently outperforms traditional internal cluster validity indices such as Silhouette, Calinski-Harabasz, Davies-Bouldin, and Dunn as well as state-of-the-art Automated Machine Learning (AutoML) approaches such as ML2DAC, AutoCluster, and AutoML4Clust. For example, our framework achieves an average 0.497 ARI gain over the Calinski-Harabasz cluster validity index on synthetic data and an average 44.16% ARI improvement over the leading AutoML approach (ML2DAC) on real-world benchmarks. Code and data are available at: https://github.com/mrbakhtyari/ClustRecNet
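The labeling step described above, scoring each algorithm's partition with the Adjusted Rand Index and treating the best scorer as the target, can be sketched as follows. This is an illustrative reconstruction under stated assumptions, not the authors' pipeline: `adjusted_rand_index` implements the standard ARI formula, while `best_algorithm`, the algorithm names, and the toy partitions are hypothetical.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b):
    """Adjusted Rand Index between two labelings of the same points."""
    n = len(labels_a)
    # Pair counts within contingency cells, rows, and columns.
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    sum_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    sum_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    if max_index == expected:  # degenerate case, e.g. one cluster in both labelings
        return 1.0
    return (sum_cells - expected) / (max_index - expected)


def best_algorithm(reference, partitions):
    """Name of the algorithm whose partition has the highest ARI against `reference`."""
    return max(partitions, key=lambda name: adjusted_rand_index(reference, partitions[name]))


# Hypothetical toy dataset with known clusters and two candidate partitions.
reference = [0, 0, 0, 1, 1, 1]
partitions = {
    "algo_x": [1, 1, 1, 0, 0, 0],  # same grouping, different label names
    "algo_y": [0, 1, 0, 1, 0, 1],  # grouping unrelated to the reference
}
label = best_algorithm(reference, partitions)
```

Because ARI is invariant to how clusters are named, `algo_x` scores a perfect 1.0 despite its swapped labels, which is why ARI is a convenient way to compare partitions produced by different algorithms.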

2601.20800 2026-06-04 cs.LG cs.AI version update

Conditional PED-ANOVA: Hyperparameter Importance in Hierarchical & Dynamic Search Spaces

条件PED-ANOVA：层次与动态搜索空间中的超参数重要性

Kaito Baba, Yoshihiko Ozaki, Shuhei Watanabe

Affiliations: Preferred Networks, Inc.; The University of Tokyo; SB Intuitions Corp.

AI summary: Proposes the conditional PED-ANOVA framework for estimating hyperparameter importance in conditional search spaces. Its closed-form estimator accurately reflects conditional activation and domain changes, and experiments show that it outperforms naive adaptations of existing estimators.

Comments: 20 pages, 15 figures. Accepted to the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

Details

DOI: 10.1145/3770855.3817758

AI Chinese abstract

我们提出条件PED-ANOVA（condPED-ANOVA），一个用于估计条件搜索空间中超参数重要性（HPI）的原则性框架，其中超参数的存在或域可能依赖于其他超参数。尽管原始PED-ANOVA提供了一种快速有效的方法来估计搜索空间内高性能区域的HPI，但它假设一个固定的、无条件的搜索空间，因此无法正确处理条件超参数。为了解决这个问题，我们引入了针对高性能区域的条件HPI，并推导出一个闭式估计器，能够准确反映条件激活和域变化。实验表明，现有HPI估计器的朴素适应在条件设置下会产生误导性或不可解释的重要性，而condPED-ANOVA始终提供反映底层条件结构的有意义的重要性。我们的代码公开在https://github.com/kAIto47802/condPED-ANOVA。

English abstract

We propose conditional PED-ANOVA (condPED-ANOVA), a principled framework for estimating hyperparameter importance (HPI) in conditional search spaces, where the presence or domain of a hyperparameter can depend on other hyperparameters. Although the original PED-ANOVA provides a fast and efficient way to estimate HPI within the top-performing regions of the search space, it assumes a fixed, unconditional search space and therefore cannot properly handle conditional hyperparameters. To address this, we introduce a conditional HPI for top-performing regions and derive a closed-form estimator that accurately reflects conditional activation and domain changes. Experiments show that naive adaptations of existing HPI estimators yield misleading or uninterpretable importances in conditional settings, whereas condPED-ANOVA consistently provides meaningful importances that reflect the underlying conditional structure. Our code is publicly available at https://github.com/kAIto47802/condPED-ANOVA.

2602.06911 2026-06-04 cs.CR cs.AI version update

TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering

TamperBench：系统化压力测试微调和篡改下的LLM安全性

Saad Hossain, Tom Tseng, Punya Syon Pandey, Samanvay Vajpayee, Matthew Kowal, Nayeema Nonta, Samuel Simko, Stephen Casper, Zhijing Jin, Kellin Pelrine, Sirisha Rambhatla

Affiliations: Critical ML Lab, Waterloo, Canada; FAR.AI, Berkeley, USA; University of Toronto, Toronto, Canada; University of Waterloo, Waterloo, Canada; ETH Zürich, Zürich, Switzerland; MIT CSAIL, Cambridge, USA; University of Toronto, MPI, EuroSafeAI, Vector Institute, Toronto, Canada; Critical ML Lab, University of Waterloo, Waterloo, Canada

AI summary: Proposes TamperBench, a unified framework that uses systematic hyperparameter sweeps to evaluate the safety and utility of 21 open-weight LLMs under nine tampering threats, finding that jailbreak-tuning is the most severe attack and that current alignment-stage defenses largely fail.

Comments: 25 pages, 15 figures

Details

DOI: 10.1145/3770855.3817557

AI Chinese abstract

随着能力日益增强的开源大语言模型（LLMs）的部署，提高其抵抗意外或故意不安全修改的篡改能力对于最小化风险变得至关重要。然而，目前没有标准方法来评估篡改抵抗性。不同的数据集、指标和篡改配置使得难以比较不同模型和防御之间的安全性、实用性和鲁棒性。为解决这一问题，我们引入了TamperBench，这是第一个系统评估LLM篡改抵抗性的统一框架。TamperBench (i) 整理了最先进的权重空间微调攻击、潜在空间表示攻击和对齐阶段防御的仓库；(ii) 通过每个攻击-模型对的系统化超参数扫描实现现实的对抗性评估；(iii) 提供安全性和实用性评估。我们使用TamperBench评估了21个开源LLM，包括增强防御的变体，针对九种篡改威胁，使用标准化的安全性和能力指标，并对每个模型-攻击对进行超参数扫描。结果提供了包括后训练对篡改抵抗性的影响、越狱微调通常是最严重的攻击以及当前对齐阶段防御基本无法抵御攻击扫描等见解。代码可在 https://github.com/criticalml-uw/TamperBench 获取。

English abstract

As increasingly capable open-weight large language models (LLMs) are deployed, improving their tamper resistance against unsafe modifications, whether accidental or intentional, becomes critical to minimize risks. However, there is no standard approach to evaluate tamper resistance. Varied datasets, metrics, and tampering configurations make it difficult to compare safety, utility, and robustness across different models and defenses. To address this, we introduce TamperBench, the first unified framework to systematically evaluate the tamper resistance of LLMs. TamperBench (i) curates a repository of state-of-the-art weight-space fine-tuning attacks, latent-space representation attacks, and alignment-stage defenses; (ii) enables realistic adversarial evaluation through systematic hyperparameter sweeps per attack-model pair; and (iii) provides both safety and utility evaluations. We use TamperBench to evaluate 21 open-weight LLMs, including defense-augmented variants, across nine tampering threats using standardized safety and capability metrics with hyperparameter sweeps per model-attack pair. The results provide insights including effects of post-training on tamper resistance, that jailbreak-tuning is typically the most severe attack, and that current alignment-stage defenses largely fail to withstand attack sweeps. Code is available at https://github.com/criticalml-uw/TamperBench.

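2602.04101 2026-06-04 cs.AI version update

Interfaze: The Future of AI is built on Task-Specific Small Models

Interfaze: 人工智能的未来建立在特定任务的小模型之上

Harsha Vardhan Khurdula, Vineet Agarwal, Yoeven D Khemlani

Affiliations: GitHub

AI summary: Proposes Interfaze, a hybrid model that fuses task-specific deep neural networks into a transformer decoder through a shared embedding space, reaching high accuracy at low cost on several deterministic benchmarks.

Comments: 10 pages, 2 figures

Details

AI Chinese abstract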

我们提出Interfaze，一种原生混合模型，通过共享嵌入空间将任务特定的深度神经网络（CNN和DNN）直接融合到Transformer解码器中。专门的感知编码器处理复杂多语言PDF的光学字符识别（OCR）、开放词汇对象和图形用户界面（GUI）检测，以及带说话人分离的多语言语音识别。每个编码器通过任务特定的适配器暴露，并可独立激活，因此查询仅触及所需的参数。内置的动作基础提供接地外部状态：代理无头浏览器和爬虫、代码沙箱、多域网络索引和可扩展向量存储。解码器过滤并合并这些信号，在任务需要时进行推理，并输出基于置信度的确定性结果。原始专家元数据（边界框、置信度分数、时间戳）被保留并作为前文与答案一起返回。在此架构上，Interfaze-Beta在确定性开发者任务基准套件中领先。它在OCRBench v2上达到70.7%，在olmOCR上达到85.7%，在RefCOCO上达到82.1%，在VoxPopuli上词错误率2.4%，在Spider-2.0-Lite上达到52.9%，在GPQA-Diamond上达到92.4%，在MMMLU上达到90.9%，在MMMU-Pro上达到71.1%，在结构化输出基准（SOB）上值准确率80.5%，在每个任务上都优于价格相当的通才模型（Gemini-3-Flash、Gemini-3.5-Flash、Claude-Sonnet-4.6、GPT-5.4-Mini和Grok-4.3）。由于融合的专家编码器通过单次传递而非重复工具调用大型模型来解决感知问题，Interfaze在确定性任务上以闪存级成本达到高精度和可验证的元数据。

English abstract

We present Interfaze, a native hybrid model that fuses task-specific deep neural networks (CNNs and DNNs) directly into a transformer decoder through a shared embedding space. Specialized perceptual encoders handle optical character recognition (OCR) over complex multilingual PDFs, open-vocabulary object and graphical user interface (GUI) detection, and multilingual speech recognition with diarization. Each is exposed through a task-specific adapter and can be activated on its own, so a query touches only the parameters it needs. A built-in action foundation supplies a grounded external state: a proxied headless browser and scraper, a code sandbox, a multi-domain web index, and a scalable vector store. The decoder filters and merges these signals, reasons over them when a task requires it, and emits deterministic outputs built on confidence. The raw specialist metadata (bounding boxes, confidence scores, timestamps) is preserved and returned alongside the answer as precontext. On this architecture, Interfaze-Beta leads a suite of deterministic developer-task benchmarks. It reaches 70.7% on OCRBench v2, 85.7% on olmOCR, 82.1% on RefCOCO, a 2.4% word error rate on VoxPopuli, 52.9% on Spider-2.0-Lite, 92.4% on GPQA-Diamond, 90.9% on MMMLU, 71.1% on MMMU-Pro, and 80.5% value accuracy on the Structured Output Benchmark (SOB), ahead of comparably priced generalist models (Gemini-3-Flash, Gemini-3.5-Flash, Claude-Sonnet-4.6, GPT-5.4-Mini, and Grok-4.3) on every task. Because fused specialist encoders resolve perception in a single pass instead of through repeated tool calls into a large model, Interfaze reaches high accuracy with verifiable metadata on deterministic tasks while running at flash-tier cost.

2601.09719 2026-06-04 cs.CL cs.AI version update

Bounded Hyperbolic Tangent: A Stable and Efficient Alternative to Pre-Layer Normalization in Large Language Models

有界双曲正切：大型语言模型中预层归一化的稳定高效替代方案

Hoyoon Byun, Youngjun Choi, Taero Kim, Sungrae Park, Kyungwoo Song

Affiliations: Yonsei University; Upstage AI

AI summary: Proposes BHyT, which replaces Pre-LN with a bounded hyperbolic tangent and data-driven input bounding, improving training and inference efficiency while preserving stability.

Comments: Accepted to ICML 2026

Details

AI Chinese abstract

预层归一化（Pre-LN）是大型语言模型（LLM）的事实标准，对于稳定预训练和有效迁移学习至关重要。然而，Pre-LN会带来重复的统计计算开销，并且仍然容易受到深度诅咒的影响，即随着层数增加，隐藏状态幅度和方差增大，破坏训练稳定性。面向效率的无归一化方法（如Dynamic Tanh (DyT)）提高了吞吐量，但在深度下仍然脆弱。为了同时解决稳定性和效率问题，我们提出了有界双曲正切（BHyT），作为Pre-LN的直接替代方案。BHyT将tanh非线性与显式的、数据驱动的输入边界相结合，使激活值保持在非饱和范围内。它防止了激活幅度和方差随深度增长，并提供了理论稳定性保证。在效率方面，BHyT每个块仅计算一次精确统计量，并用轻量级方差近似替代第二次归一化。实验表明，BHyT在预训练期间表现出更好的稳定性和效率，与RMSNorm相比，平均训练速度提升1.6%，平均token生成吞吐量提升1.77%，同时在语言理解和推理基准上保持强大的预训练-only和SFT后性能。代码见：https://github.com/MLAI-Yonsei/BHyT

English abstract

Pre-Layer Normalization (Pre-LN) is the de facto choice for large language models (LLMs) and is crucial for stable pretraining and effective transfer learning. However, Pre-LN incurs repeated statistical-computation overhead and remains vulnerable to the curse of depth, where hidden-state magnitudes and variances grow as the number of layers increases, destabilizing training. Efficiency-oriented normalization-free methods such as Dynamic Tanh (DyT) improve throughput but remain fragile at depth. To jointly address stability and efficiency, we propose Bounded Hyperbolic Tanh (BHyT), a drop-in replacement for Pre-LN. BHyT combines a tanh nonlinearity with explicit, data-driven input bounding to keep activations within a non-saturating range. It prevents depth-wise growth in activation magnitude and variance and provides a theoretical stability guarantee. For efficiency, BHyT computes exact statistics once per block and replaces a second normalization with a lightweight variance approximation. Empirically, BHyT demonstrates improved stability and efficiency during pretraining, achieving an average of 1.6% faster training and an average of 1.77% higher token generation throughput compared to RMSNorm, while maintaining strong pretraining-only and post-SFT performance across language understanding and reasoning benchmarks. Code is available at: https://github.com/MLAI-Yonsei/BHyT
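To illustrate why keeping tanh inputs in a non-saturating range matters, the sketch below compares a statistics-free DyT-style map with RMSNorm as hidden-state magnitudes grow. It is not an implementation of BHyT, whose bounding rule is not spelled out in the abstract: the scalar parameters, the `alpha=0.5` default, and the factor of 20 used to mimic depth-wise growth are all assumptions for illustration.

```python
import math


def rms_norm(x, gain=1.0, eps=1e-6):
    """RMSNorm: rescale a vector by its root-mean-square (needs a reduction over x)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [gain * v / rms for v in x]


def dynamic_tanh(x, alpha=0.5, gamma=1.0, beta=0.0):
    """DyT-style elementwise map gamma * tanh(alpha * x) + beta (no statistics)."""
    return [gamma * math.tanh(alpha * v) + beta for v in x]


hidden = [0.5, -1.0, 2.0, -0.25]
scaled = [20.0 * v for v in hidden]  # mimic magnitude growth with depth

small = dynamic_tanh(hidden)       # entries stay in the responsive part of tanh
large = dynamic_tanh(scaled)       # entries pushed toward +/-1 (saturation)
normed_small = rms_norm(hidden)
normed_large = rms_norm(scaled)    # nearly identical to normed_small
```

The fixed-scale tanh loses the relative sizes of large inputs, while the normalization is unaffected by the rescaling but pays for a reduction over the vector. The abstract describes BHyT as addressing this trade-off by bounding the input before the tanh.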

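2602.02405 2026-06-04 cs.LG cs.AI version update

Making Expert Reasoning Learnable with Self-Distillation

通过自蒸馏使专家推理可学习

Ethan Mendes, Jungsoo Park, Alan Ritter

Affiliations: Georgia Institute of Technology, Atlanta, Georgia

AI summary: Proposes Distribution Aligned Imitation Learning (DAIL), a two-step self-distillation method that bridges the gap between expert solutions and the model's own distribution, using a small number of high-quality expert solutions to substantially improve the reasoning ability of large language models.

Comments: ICML 2026

Details

AI Chinese abstract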

提升大语言模型（LLM）的推理能力通常依赖于模型采样正确解以进行强化，或存在更强模型来解决问题。然而，许多难题即使对当前前沿模型也难以处理，阻碍了有效训练信号的提取。一个有前景的替代方案是利用高质量的人类专家解决方案，但直接模仿这些数据从根本上存在分布外问题：专家解决方案通常具有教学性质，包含为人类读者而非计算模型设计的隐含推理间隙。此外，高质量专家解决方案成本高昂，需要可泛化且样本高效的训练方法。我们提出分布对齐模仿学习（DAIL），一种两步自蒸馏方法，通过首先将专家解决方案转化为详细的、分布内的推理轨迹，然后应用对比目标使学习聚焦于专家见解和方法，从而弥合分布差距。我们发现，DAIL可以利用少于1000个高质量专家解决方案，在Qwen2.5-Instruct和Qwen3上实现高达31%的pass@128增益，推理效率翻倍，并实现域外泛化。

English abstract

Improving the reasoning capabilities of large language models (LLMs) typically relies either on the model's ability to sample a correct solution to be reinforced or the existence of a stronger model able to solve the problem. However, many difficult problems remain intractable for even current frontier models, preventing the extraction of valid training signals. A promising alternative is to leverage high-quality expert human solutions, yet naive imitation of this data fails because it is fundamentally out-of-distribution: expert solutions are typically didactic, containing implicit reasoning gaps intended for human readers rather than computational models. Furthermore, high-quality expert solutions are expensive, necessitating generalizable, sample-efficient training methods. We propose Distribution Aligned Imitation Learning (DAIL), a two-step self-distillation method that bridges the distributional gap by first transforming expert solutions into detailed, in-distribution reasoning traces and then applying a contrastive objective to focus learning on expert insights and methodologies. We find that DAIL can leverage fewer than 1000 high-quality expert solutions to achieve up to 31% pass@128 gains on Qwen2.5-Instruct and Qwen3, double reasoning efficiency, and enable out-of-domain generalization.
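For readers unfamiliar with the pass@128 metric quoted above, the following sketch shows the widely used unbiased pass@k estimator computed from n samples of which c are correct. The abstract does not say which estimator the authors use, so treating it as this one is an assumption, and the sample counts are hypothetical.

```python
from math import comb


def pass_at_k(n, c, k):
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    # One minus the probability that k samples drawn without replacement all fail.
    return 1.0 - comb(n - c, k) / comb(n, k)


# Hypothetical counts for one problem: 256 samples, 2 of them correct.
estimate = pass_at_k(n=256, c=2, k=128)
```

With large k, even a problem solved in under 1% of samples yields a high pass@k, which is why gains at pass@128 indicate that correct solutions have become reachable by sampling rather than that single-shot accuracy has improved.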

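2602.01658 2026-06-04 cs.LG cs.AI Version update

Efficient Adversarial Attacks on High-dimensional Offline Bandits

Seyed Mohammad Hadi Hosseini, Amir Najafi, Mahdieh Soleymani Baghshah

Affiliations: Department of Computer Engineering, Sharif University of Technology

AI summary: Studies the vulnerability of offline bandit training when the reward model is adversarially perturbed, proposes a high-dimensional threat model, proves that the perturbation norm required for an attack decreases as dimensionality grows, and experimentally verifies the high success rate of targeted attacks.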

Comments Published at ICLR 2026 Conference

Abstract

Bandit algorithms have recently emerged as a powerful tool for evaluating machine learning models, including generative image models and large language models, by efficiently identifying top-performing candidates without exhaustive comparisons. These methods typically rely on a reward model, often distributed with public weights on platforms such as Hugging Face, to provide feedback to the bandit. While online evaluation is expensive and requires repeated trials, offline evaluation with logged data has become an attractive alternative. However, the adversarial robustness of offline bandit evaluation remains largely unexplored, particularly when an attacker perturbs the reward model (rather than the training data) prior to bandit training. In this work, we fill this gap by investigating, both theoretically and empirically, the vulnerability of offline bandit training to adversarial manipulations of the reward model. We introduce a novel threat model in which an attacker exploits offline data in high-dimensional settings to hijack the bandit's behavior. Starting with linear reward functions and extending to nonlinear models such as ReLU neural networks, we study attacks on two Hugging Face evaluators used for generative model assessment: one measuring aesthetic quality and the other assessing compositional alignment. Our results show that even small, imperceptible perturbations to the reward model's weights can drastically alter the bandit's behavior. From a theoretical perspective, we prove a striking high-dimensional effect: as input dimensionality increases, the perturbation norm required for a successful attack decreases, making modern applications such as image evaluation especially vulnerable. Extensive experiments confirm that naive random perturbations are ineffective, whereas carefully targeted perturbations achieve near-perfect attack success rates ...

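2602.01619 2026-06-04 cs.LG cs.AI Version update

SUSD: Structured Unsupervised Skill Discovery through State Factorization

Seyed Mohammad Hadi Hosseini, Mahdieh Soleymani Baghshah

Affiliations: Department of Computer Engineering

AI summary: Proposes the SUSD framework, which factorizes the state space into independent components and allocates distinct skill variables to them, combined with a dynamic model that adaptively guides exploration, achieving richer and more diverse unsupervised skill discovery and significantly outperforming existing methods in factorized environments.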

Comments Published as a conference paper at ICLR 2026

Abstract

Unsupervised Skill Discovery (USD) aims to autonomously learn a diverse set of skills without relying on extrinsic rewards. One of the most common USD approaches is to maximize the Mutual Information (MI) between skill latent variables and states. However, MI-based methods tend to favor simple, static skills due to their invariance properties, limiting the discovery of dynamic, task-relevant behaviors. Distance-Maximizing Skill Discovery (DSD) promotes more dynamic skills by leveraging state-space distances, yet still falls short in encouraging comprehensive skill sets that engage all controllable factors or entities in the environment. In this work, we introduce SUSD, a novel framework that harnesses the compositional structure of environments by factorizing the state space into independent components (e.g., objects or controllable entities). SUSD allocates distinct skill variables to different factors, enabling more fine-grained control over the skill discovery process. A dynamic model also tracks learning across factors, adaptively steering the agent's focus toward underexplored factors. This structured approach not only promotes the discovery of richer and more diverse skills, but also yields a factorized skill representation that enables fine-grained and disentangled control over individual entities, which facilitates efficient training of compositional downstream tasks via Hierarchical Reinforcement Learning (HRL). Our experimental results across three environments, with factors ranging from 1 to 10, demonstrate that our method can discover diverse and complex skills without supervision, significantly outperforming existing unsupervised skill discovery methods in factorized and complex environments. Code is publicly available at: https://github.com/hadi-hosseini/SUSD.

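2601.15158 2026-06-04 cs.LG cs.AI Version update

Outcome-Based RL Provably Leads Transformers to Reason, but Only With the Right Data

Yuval Ran-Milo, Yotam Alexander, Shahar Mendel, Nadav Cohen

Affiliations: Tel Aviv University

AI summary: By analyzing the policy gradient dynamics of single-layer Transformers on a synthetic graph traversal task, this paper proves that outcome-based reinforcement learning leads Transformers to spontaneously learn a structured, iterative reasoning algorithm, and reveals the key role that the distribution of "simple examples" in the training data plays in the emergence of reasoning ability.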

Comments 94 pages, 7 figures

Abstract

Transformers trained via Reinforcement Learning (RL) with outcome-based supervision can spontaneously develop the ability to generate intermediate reasoning steps (Chain-of-Thought). Yet the mechanism by which sparse rewards drive policy gradient to discover such systematic reasoning remains poorly understood. We address this by analyzing the policy gradient dynamics of single-layer Transformers on a synthetic graph traversal task that cannot be solved without Chain-of-Thought but admits a simple iterative solution. We prove that despite training solely on final-answer correctness, policy gradient drives the Transformer to converge to a structured, interpretable algorithm that iteratively traverses the graph vertex-by-vertex. We characterize the distributional properties required for this emergence, identifying the critical role of "simple examples": instances requiring fewer reasoning steps. When the training distribution places sufficient mass on these simpler examples, the Transformer learns a generalizable traversal strategy that extrapolates to longer chains; when this mass vanishes, policy gradient learning becomes infeasible. We corroborate our theoretical results through experiments on synthetic data and with real-world language models on mathematical reasoning tasks, validating that our theoretical findings carry over to practical settings.

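2512.21917 2026-06-04 cs.LG cs.AI econ.EM stat.ML Version update

Demystifying Multi-Agent Debate: The Role of Confidence and Diversity

Xiaochen Zhu, Caiqi Zhang, Yizhou Chi, Tom Stafford, Nigel Collier, Andreas Vlachos

Affiliations: University of Cambridge; University of Sheffield

AI summary: To address the poor effectiveness of multi-agent debate (MAD) in improving large language model performance, proposes two lightweight interventions, diversity-aware initialisation and a confidence-modulated debate protocol, which significantly improve debate effectiveness.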

Abstract

Multi-agent debate (MAD) is widely used to improve large language model (LLM) performance through test-time scaling, yet recent work shows that vanilla MAD often underperforms simple majority vote despite higher computational cost. Studies show that, under homogeneous agents and uniform belief updates, debate preserves expected correctness and therefore cannot reliably improve outcomes. Drawing on findings from human deliberation and collective decision-making, we identify two key mechanisms missing from vanilla MAD: (i) diversity of initial viewpoints and (ii) explicit, calibrated confidence communication. We propose two lightweight interventions. First, a diversity-aware initialisation that selects a more diverse pool of candidate answers, increasing the likelihood that a correct hypothesis is present at the start of debate. Second, a confidence-modulated debate protocol in which agents express calibrated confidence and condition their updates on others' confidence. We show theoretically that diversity-aware initialisation improves the prior probability of MAD success without changing the underlying update dynamics, while confidence-modulated updates enable debate to systematically drift to the correct hypothesis. Empirically, across six reasoning-oriented QA benchmarks, our methods consistently outperform vanilla MAD and majority vote. Our results connect human deliberation with LLM-based debate and demonstrate that simple, principled modifications can substantially enhance debate effectiveness.

2512.03553 2026-06-04 cs.CV cs.AI Version update

Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching

Wei Chee Yew, Hailun Xu, Sanjay Saha, Xiaotian Fan, Hiok Hian Ong, David Yuchen Wang, Kanchan Sarkar, Zhenheng Yang, Danhui Guan

Affiliations: TikTok, Singapore; TikTok, San Jose, United States; TikTok, Shanghai, China

AI summary: Proposes a hybrid moderation framework that combines supervised classification with reference-based similarity matching and uses a multimodal large language model to improve accuracy, enabling large-scale livestream content moderation while keeping inference lightweight.

Comments To be published at KDD 2026 (ADS track)

DOI: 10.1145/33770854.3783936

Abstract

Content moderation remains a critical yet challenging task for large-scale user-generated video platforms, especially in livestreaming environments where moderation must be timely, multimodal, and robust to evolving forms of unwanted content. We present a hybrid moderation framework deployed at production scale that combines supervised classification for known violations with reference-based similarity matching for novel or subtle cases. This hybrid design enables robust detection of both explicit violations and novel edge cases that evade traditional classifiers. Multimodal inputs (text, audio, visual) are processed through both pipelines, with a multimodal large language model (MLLM) distilling knowledge into each to boost accuracy while keeping inference lightweight. In production, the classification pipeline achieves 67% recall at 80% precision, and the similarity pipeline achieves 76% recall at 80% precision. Large-scale A/B tests show a 6-8% reduction in user views of unwanted livestreams. These results demonstrate a scalable and adaptable approach to multimodal content governance, capable of addressing both explicit violations and emerging adversarial behaviors.
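Operating points such as "67% recall at 80% precision" are read off a precision-recall curve. The sketch below shows one way to compute recall at a fixed precision floor, assuming binary labels and real-valued scores where higher means more likely a violation. The function name is hypothetical and this is not the production evaluation code:

```python
def recall_at_precision(scores, labels, target_precision):
    """Best recall over all score thresholds whose precision meets the target.

    Assumptions: labels are 0/1, and an item is flagged when its score
    is at or above the threshold.
    """
    total_positives = sum(labels)
    if total_positives == 0:
        raise ValueError("at least one positive label is required")
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    best_recall = 0.0
    true_positives = 0
    for flagged, (score, label) in enumerate(ranked, start=1):
        true_positives += label
        # Skip positions inside a run of tied scores: a threshold cannot split them.
        if flagged < len(ranked) and ranked[flagged][0] == score:
            continue
        if true_positives / flagged >= target_precision:
            best_recall = max(best_recall, true_positives / total_positives)
    return best_recall
```

For example, with scores `[0.9, 0.8, 0.7, 0.6, 0.5]` and labels `[1, 1, 0, 1, 0]`, an 80% precision floor is met only by flagging the top one or two items, so the best achievable recall is 2/3.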

2506.10912 2026-06-04 cs.AI cs.CL Version update

Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?

Fei Lin, Ziyang Gong, Cong Wang, Tengchao Zhang, Yonglin Tian, Yining Jiang, Ji Dai, Chao Guo, Xiaotong Yu, Xue Yang, Gen Luo, Fei-Yue Wang

Affiliations: Department of Engineering Science, Macau University of Science and Technology, Macau, China; School of Computer Science, Shanghai Jiao Tong University, Shanghai, China; Institute of Automation, Chinese Academy of Sciences, Beijing, China; School of Pharmacy, Macau University of Science and Technology, Macau, China; Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, China; State Key Laboratory of Biopharmaceutical Preparation and Delivery, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, China; School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, China; Shanghai Artificial Intelligence Laboratory, Shanghai, China

AI summary: Proposes the ToxiMol benchmark task, which uses multimodal large language models for molecular toxicity repair, and builds a dataset, a prompt pipeline, and the automated evaluation framework ToxiEval; experiments show that current models still face challenges but exhibit potential in toxicity understanding and structure editing.

Abstract

Toxicity remains a leading cause of early-stage drug development failure. Despite advances in molecular design and property prediction, the task of molecular toxicity repair, generating structurally valid molecular alternatives with reduced toxicity, has not yet been systematically defined or benchmarked. To fill this gap, we introduce ToxiMol, the first benchmark task for general-purpose Multimodal Large Language Models (MLLMs) focused on molecular toxicity repair. We construct a standardized dataset covering 11 primary tasks and 660 representative toxic molecules spanning diverse mechanisms and granularities. We design a prompt annotation pipeline with mechanism-aware and task-adaptive capabilities, informed by expert toxicological knowledge. In parallel, we propose an automated evaluation framework, ToxiEval, which integrates toxicity endpoint prediction, synthetic accessibility, drug-likeness, and structural similarity into a high-throughput evaluation chain for repair success. We systematically assess 43 mainstream general-purpose MLLMs and conduct multiple ablation studies to analyze key issues, including evaluation metrics, candidate diversity, and failure attribution. Experimental results show that although current MLLMs still face significant challenges on this task, they begin to demonstrate promising capabilities in toxicity understanding, semantic constraint adherence, and structure-aware editing.

2601.18777 2026-06-04 cs.LG cs.AI cs.CL cs.IR stat.AP Version update

PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking Estimation

Abhishek Divekar, Anirban Majumder

Author note: Primary contributor and corresponding author

AI summary: Proposes the PRECISE framework, which combines a small number of human annotations with LLM judgments via Prediction-Powered Inference (PPI) to reliably estimate metrics for search, ranking, and RAG systems in low-resource settings while correcting LLM bias.

Comments Accepted at AAAI 2026 - Innovative Applications of AI (IAAI-26)

Abstract

Evaluating the quality of search, ranking and RAG systems traditionally requires a significant number of human relevance annotations. Recently, several deployed systems have explored using Large Language Models (LLMs) as automated judges for this task, but their inherent biases prevent direct use for metric estimation. We present a statistical framework extending Prediction-Powered Inference (PPI) that combines minimal human annotations with LLM judgments to produce reliable estimates of metrics which require sub-instance annotations. Our method requires as few as 100 human-annotated queries and 10,000 unlabeled examples, reducing annotation requirements significantly compared to traditional approaches. We formulate our proposed framework (PRECISE) for inference of relevance uplift for an LLM-based query reformulation application, extending PPI to sub-instance annotations at the query-document level. By reformulating the metric-integration space, we reduce the computational complexity from O(2^|C|) to O(2^K), where |C| represents corpus size (on the order of millions). Detailed experiments across prominent retrieval datasets demonstrate that our method reduces the variance of estimates for the business-critical Precision@K metric, while effectively correcting for LLM bias in low-resource settings.
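The abstract does not give the estimator itself. As a rough illustration of the underlying Prediction-Powered Inference idea, the sketch below implements the standard PPI mean estimate: the LLM judge's average on unlabeled data plus a bias correction measured on the human-labeled subset. The function and argument names are hypothetical, and PRECISE's extension to sub-instance annotations and Precision@K is not reproduced here:

```python
from statistics import fmean


def ppi_mean(human_labels, judge_on_labeled, judge_on_unlabeled):
    """Prediction-powered estimate of a mean relevance score.

    human_labels:       human judgments on the small labeled set
    judge_on_labeled:   LLM judgments on the same labeled items
    judge_on_unlabeled: LLM judgments on the large unlabeled set
    """
    if len(human_labels) != len(judge_on_labeled):
        raise ValueError("labeled sets must be aligned item by item")
    if not human_labels or not judge_on_unlabeled:
        raise ValueError("both sets must be non-empty")
    # Rectifier: average gap between humans and the judge on labeled data.
    rectifier = fmean([h - j for h, j in zip(human_labels, judge_on_labeled)])
    # Judge average on unlabeled data, shifted by the measured bias.
    return fmean(judge_on_unlabeled) + rectifier
```

If the judge systematically over-rates relevance, the rectifier is negative and pulls the estimate back toward what human annotators would have reported.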

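2601.18175 2026-06-04 cs.AI cs.LG cs.SY eess.SY stat.ML Version update

Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success

Daniel Russo

Affiliations: Daniel J. Russo

AI summary: Proves that success conditioning (imitating successful trajectories) exactly solves a trust-region optimization problem whose χ² divergence constraint radius is determined automatically by the data, and reveals an equality among relative policy improvement, the magnitude of policy change, and action-influence.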

Abstract

A widely used technique for improving policies is success conditioning, in which one collects trajectories, identifies those that achieve a desired outcome, and updates the policy to imitate the actions taken along successful trajectories. This principle appears under many names -- rejection sampling with SFT, goal-conditioned RL, Decision Transformers -- yet what optimization problem it solves, if any, has remained unclear. We prove that success conditioning exactly solves a trust-region optimization problem, maximizing policy improvement subject to a $χ^2$ divergence constraint whose radius is determined automatically by the data. This yields an identity: relative policy improvement, the magnitude of policy change, and a quantity we call action-influence -- measuring how random variation in action choices affects success rates -- are exactly equal at every state. Success conditioning thus emerges as a conservative improvement operator. Exact success conditioning cannot degrade performance or induce dangerous distribution shift, but when it fails, it does so observably, by hardly changing the policy at all. We apply our theory to the common practice of return thresholding, showing this can amplify improvement, but at the cost of potential misalignment with the true objective.
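The stated identity can be checked numerically in a deliberately simplified setting. The sketch below assumes a one-step (bandit) problem in which each action has a fixed success probability, so that success conditioning reweights each action by its success rate. The paper's per-state, multi-step definitions are not given in the abstract, and all names here are illustrative:

```python
def success_conditioning(policy, success_prob):
    """Reweight action probabilities by success rate (one-step assumption)."""
    base_rate = sum(p * q for p, q in zip(policy, success_prob))
    if base_rate <= 0:
        raise ValueError("the policy must succeed with positive probability")
    new_policy = [p * q / base_rate for p, q in zip(policy, success_prob)]
    return base_rate, new_policy


def chi_square(new_policy, policy):
    """Chi-square divergence of new_policy from policy."""
    return sum((n - p) ** 2 / p for n, p in zip(new_policy, policy) if p > 0)


policy = [0.5, 0.3, 0.2]          # hypothetical behavior policy
success_prob = [0.2, 0.5, 0.9]    # hypothetical per-action success rates

base_rate, new_policy = success_conditioning(policy, success_prob)
improved_rate = sum(n * q for n, q in zip(new_policy, success_prob))
relative_improvement = (improved_rate - base_rate) / base_rate
divergence = chi_square(new_policy, policy)
```

In this toy case both `relative_improvement` and `divergence` equal the variance of the success rates under the original policy divided by the squared base success rate, which matches the abstract's claim that improvement and policy change coincide.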

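2601.17363 2026-06-04 cs.CL cs.AI Version update

Do readers prefer AI-generated Italian short stories?

Michael Farrell

Affiliations: IULM University, Milan, Italy

AI summary: In a blind test comparing Italian short stories written by AI (ChatGPT-4o) and by the renowned author Alberto Moravia, the AI texts received slightly higher average ratings and were preferred more often, but the differences were not significant and were unrelated to demographics or reading habits.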

Comments 8 pages, peer-reviewed and accepted for presentation at New Trends in Translation and Interpreting Technology (NeTTIT 2026), paged-up for publication

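2601.06196 2026-06-04 cs.LG cs.AI cs.CL Version update

Geometry-Aware Hallucination Detection in Large Language Models

Bodla Krishna Vamshi, Rohan Bhatnagar, Haizhao Yang

Affiliations: University of Maryland, College Park

AI summary: Proposes the GA-ICL framework, which uses latent representations from frozen LLMs to model local manifold structure and class-prototype geometry and selects in-context demonstrations for hallucination detection, outperforming baseline methods on the FEVER and HaluEval benchmarks.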

Abstract

Large language models (LLMs) frequently generate factually incorrect or unsupported content, commonly referred to as hallucinations. Prior work has explored decoding strategies, retrieval augmentation, and supervised fine-tuning for hallucination detection, while recent studies show that in-context learning (ICL) can substantially influence factual reliability. However, existing ICL demonstration selection methods often rely on surface-level similarity heuristics and exhibit limited robustness across tasks and models. We propose GA-ICL, a geometry-aware demonstration sampling framework for selecting in-context demonstrations that leverages latent representations extracted from frozen LLMs. By jointly modeling local manifold structure and class-aware prototype geometry, GA-ICL selects demonstrations based on their proximity to learned prototypes rather than lexical or embedding similarity alone. Across factual verification (FEVER) and hallucination detection (HaluEval) benchmarks, GA-ICL outperforms standard ICL selection baselines in the majority of evaluated settings, with particularly strong gains on dialogue and summarization tasks. The method remains robust under temperature perturbations and model variation, indicating improved stability compared to heuristic retrieval strategies. While lexical retrieval can remain competitive in certain question-answering regimes at smaller model scales, our results demonstrate that geometry-aware prototype selection provides a reliable and training-light approach for hallucination detection without modifying LLM parameters. Extended evaluations on Phi-14B and Qwen3-32B confirm that GA-ICL scales effectively to larger models, outperforming all compared baselines including on QA tasks where smaller models show boundary-condition limitations, offering a principled direction for improved ICL demonstration selection.

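2601.13735 2026-06-04 cs.AI Version update

Reasoning or Fluency? Dissecting Probabilistic Confidence in Best-of-N Selection

Hojin Kim, Jaehyung Kim

Affiliations: Yonsei University

AI summary: By introducing three classes of causal perturbation experiments, this paper finds that current probabilistic confidence metrics mainly capture surface-level fluency rather than reasoning quality, and proposes a contrastive causality metric for more faithful output selection.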

Comments 15 pages, 4 figures

Abstract

Probabilistic confidence metrics are increasingly adopted as proxies for reasoning quality in Best-of-N selection, under the assumption that higher confidence reflects higher reasoning fidelity. In this work, we challenge this assumption by investigating whether these metrics truly capture inter-step causal dependencies necessary for valid reasoning. We introduce three classes of inter-step causality perturbations that systematically disrupt dependencies between reasoning steps while preserving local fluency. Surprisingly, across diverse model families and reasoning benchmarks, we find that selection accuracy degrades only marginally under these disruptions. Even severe interventions, such as applying hard attention masks that directly prevent the model from attending to prior reasoning steps, do not substantially reduce selection performance. These findings provide strong evidence that current probabilistic metrics are largely insensitive to logical structure, and primarily capture surface-level fluency or in-distribution priors instead. Motivated by this gap, we propose a contrastive causality metric that explicitly isolates inter-step causal dependencies, and demonstrate that it yields more faithful output selection than existing probability-based approaches.
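For readers unfamiliar with the setup being criticized, here is a minimal sketch of Best-of-N selection with a probabilistic confidence score. It assumes mean token log-probability as the confidence metric, which is one common choice; the abstract does not name the specific metrics studied, and the function names are illustrative:

```python
def mean_logprob(token_logprobs):
    """Length-normalized sequence log-probability (assumed confidence metric)."""
    if not token_logprobs:
        raise ValueError("a candidate needs at least one token")
    return sum(token_logprobs) / len(token_logprobs)


def best_of_n(candidates):
    """Pick the candidate text with the highest confidence score.

    candidates: list of (text, token_logprobs) pairs sampled from one model.
    """
    return max(candidates, key=lambda item: mean_logprob(item[1]))[0]
```

The score depends only on how likely each token was, so a fluent answer whose steps do not follow from one another can still win the selection, which is the failure mode the paper probes.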

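2601.07036 2026-06-04 cs.CL cs.AI cs.LG Version update

Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers

Wang Yang, Debargha Ganguly, Xinpeng Li, Chaoda Song, Shouren Wang, Vikash Singh, Vipin Chaudhary, Xiaotian Han

Affiliations: Case Western Reserve University

AI summary: Through attention analysis and prompting experiments, this paper finds that reasoning behavior is controlled mainly by a small number of trigger tokens, and accordingly proposes Mid-Think, which combines trigger tokens to achieve intermediate-budget reasoning, outperforms baselines on the accuracy-length trade-off, and reduces training time while improving performance in reinforcement learning training.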

Abstract

Hybrid reasoning language models are commonly controlled through high-level Think/No-think instructions to regulate reasoning behavior, yet we find that such mode switching is largely driven by a small set of trigger tokens rather than the instructions themselves. Through attention analysis and controlled prompting experiments, we show that a leading "Okay" token induces reasoning behavior, while the newline pattern following "</think>" suppresses it. Based on this observation, we propose Mid-Think, a simple training-free prompting format that combines these triggers to achieve intermediate-budget reasoning, consistently outperforming fixed-token and prompt-based baselines in terms of the accuracy-length trade-off. Furthermore, applying Mid-Think to RL training after SFT reduces training time by approximately 15% while improving final performance of Qwen3-8B on AIME from 69.8% to 72.4% and on GPQA from 58.5% to 61.1%, demonstrating its effectiveness for both inference-time control and RL-based reasoning training.

2512.04668 2026-06-04 cs.CR cs.AI cs.CL Version update

Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs

Jinbo Liu, Defu Cao, Yifei Wei, Tianyao Su, Yuan Liang, Yushun Dong, Yan Liu, Yue Zhao, Xiyang Hu

Affiliations: Arizona State University; University of Southern California; Florida State University

AI summary: Proposes the MAMA framework, which evaluates memory leakage in multi-agent LLM systems under controlled graph topologies, finds that dense connectivity, short attack distance, and high centrality increase leakage, and gives design recommendations favoring sparse or hierarchical topologies.

Comments Accepted to Findings of the Association for Computational Linguistics: ACL 2026. Camera-ready version

Abstract

Graph topology is a fundamental determinant of memory leakage in multi-agent LLM systems, yet its effects remain poorly quantified. We introduce MAMA (Multi-Agent Memory Attack), a controlled evaluation framework for comparing topology-conditioned memory leakage in multi-agent LLM systems. MAMA operates on synthetic documents containing labeled Personally Identifiable Information (PII) entities, from which we generate sanitized task instructions. We execute a two-phase protocol: Engram (seeding private information into a target agent's memory) and Resonance (multi-round interaction where an attacker attempts extraction). Over 10 rounds, we measure leakage using a two-stage recovery criterion that combines exact-match extraction with LLM-based inference over the attacker's final output. We evaluate six canonical topologies (complete, circle, chain, tree, star, star-ring) across $n\in\{4,5,6\}$, attacker-target placements, and base models. Results are consistent: denser connectivity, shorter attacker-target distance, and higher target centrality increase leakage; most leakage occurs in early rounds and then plateaus; model choice shifts absolute rates but preserves broad structural trends; spatiotemporal/location attributes leak more readily than identity credentials or regulated identifiers. We distill practical guidance for system design: favor sparse or hierarchical connectivity, maximize attacker-target separation, and restrict hub/shortcut pathways via topology-aware access control. Our code is available at https://github.com/llll121/mama-eval.
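The structural quantities named in the abstract (connectivity density and attacker-target distance) can be made concrete with a small sketch. It builds four of the six topologies as undirected graphs and measures edge density and hop distance. The helper names are hypothetical, the tree and star-ring layouts are omitted because the abstract does not define them, and none of this is taken from the linked repository:

```python
from collections import deque
from itertools import combinations


def build_topology(name: str, n: int) -> dict[int, set[int]]:
    """Return an undirected adjacency map over agents 0..n-1."""
    if n < 3:
        raise ValueError("need at least 3 agents")
    if name == "complete":
        edges = set(combinations(range(n), 2))
    elif name == "chain":
        edges = {(i, i + 1) for i in range(n - 1)}
    elif name == "circle":
        edges = {(i, (i + 1) % n) for i in range(n)}
    elif name == "star":
        edges = {(0, i) for i in range(1, n)}  # agent 0 is the hub
    else:
        raise ValueError(f"unsupported topology: {name}")
    adjacency = {agent: set() for agent in range(n)}
    for a, b in edges:
        adjacency[a].add(b)
        adjacency[b].add(a)
    return adjacency


def density(adjacency) -> float:
    """Fraction of possible agent pairs that are directly connected."""
    n = len(adjacency)
    edge_count = sum(len(neighbors) for neighbors in adjacency.values()) // 2
    return edge_count / (n * (n - 1) / 2)


def hop_distance(adjacency, source: int, target: int) -> int:
    """Shortest number of hops between two agents (breadth-first search)."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == target:
            return dist
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, dist + 1))
    raise ValueError("agents are not connected")
```

With five agents, a chain has density 0.4 and places its end agents four hops apart, while a complete graph has density 1.0 and every pair one hop apart, which illustrates the sparse versus dense contrast in the reported trends.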

2511.07107 2026-06-04 cs.AI cs.CL Version update

MENTOR: A Metacognition-Driven Self-Evolution Framework for Uncovering and Mitigating Implicit Domain Risks in LLMs

Liang Shan, Kaicheng Shen, Wen Wu, Zhenyu Ying, Chaochao Lu, Yan Teng, Jingqi Huang, Qingshan Liu, Guangze Ye, Guoqing Wang, Jie Zhou, Liang He

Affiliations: School of Computer Science and Technology, East China Normal University; Shanghai AI Lab, Shanghai Innovation Institute

AI summary: To address implicit safety risks of large language models in specific domains (such as education, finance, and management), proposes the MENTOR framework, based on metacognitive self-assessment and dynamic rule-based knowledge graphs, which effectively reduces attack success rates through activation-level steering signals.

Abstract

Ensuring the safety of Large Language Models (LLMs) is critical for real-world deployment. However, current safety measures often fail to address implicit, domain-specific risks. To investigate this gap, we introduce a dataset of 3,000 annotated queries spanning education, finance, and management. Evaluations across 14 leading LLMs reveal a concerning vulnerability: an average jailbreak success rate of 57.8%. In response, we propose MENTOR, a metacognition-driven self-evolution framework. MENTOR performs metacognitive self-assessment, using strategies such as perspective-taking and consequential reasoning to uncover latent model misalignments. The resulting reflections are distilled into dynamic rule-based knowledge graphs, from which retrieved rules are converted into activation-level steering signals to guide internal representations during inference. Experiments demonstrate that MENTOR substantially reduces attack success rates across all tested domains and outperforms existing safety alignment methods. The code and dataset for MENTOR are available at: https://anonymous.4open.science/r/MENTOR-Evo.

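2411.05894 2026-06-04 cs.CL cs.AI cs.LG Version update

SSSD: Simply-Scalable Speculative Decoding

Michele Marzollo, Jiawei Zhuang, Niklas Roemer, Niklas Zwingenberger, Lorenz K. Müller, Lukas Cavigelli

Affiliations: Huawei; ETH Zurich

AI summary: Proposes SSSD, a training-free speculative decoding method that combines lightweight n-gram matching with hardware-aware speculation, matches the performance of leading training-based methods across a variety of benchmarks, reduces latency by up to 2.9x, and is robust to language and domain shift.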

Comments Accepted to the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026, Main Conference)

Abstract

Speculative Decoding has emerged as a popular technique for accelerating inference in Large Language Models. However, most existing approaches yield only modest improvements in production serving systems. Methods that achieve substantial speedups typically rely on an additional trained draft model or auxiliary model components, increasing deployment and maintenance complexity. This added complexity reduces flexibility, particularly when serving workloads shift to tasks, domains, or languages that are not well represented in the draft model's training data. We introduce Simply-Scalable Speculative Decoding (SSSD), a training-free method that combines lightweight n-gram matching with hardware-aware speculation. Relative to standard autoregressive decoding, SSSD reduces latency by up to 2.9x. It achieves performance on par with leading training-based approaches across a broad range of benchmarks, while requiring substantially lower adoption effort--no data preparation, training or tuning are needed--and exhibiting superior robustness under language and domain shift, as well as in long-context settings.
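
The n-gram matching idea mentioned in the abstract can be illustrated with a minimal sketch. This is a generic lookup-based speculation loop, not the SSSD implementation: the function names `propose_draft`, `verify`, and `toy_next_token` are hypothetical, the target model is replaced by a toy greedy function, and the hardware-aware part of the method is not modeled at all.

```python
def propose_draft(tokens, n=2, k=4):
    """Propose up to k draft tokens by finding the last n tokens earlier in the context."""
    if len(tokens) <= n:
        return []
    suffix = tokens[-n:]
    # Scan backwards so that the most recent earlier match wins.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == suffix:
            return tokens[start + n:start + n + k]
    return []


def verify(tokens, draft, next_token):
    """Accept draft tokens while they agree with the target model's greedy choice.

    Always returns at least one token chosen by the target model. A real system
    would score all draft positions in one batched forward pass; the sequential
    calls here are only for readability.
    """
    context = list(tokens)
    accepted = []
    for drafted in draft:
        target = next_token(context)
        accepted.append(target)
        context.append(target)
        if target != drafted:
            return accepted  # first mismatch: keep the target's token and stop
    accepted.append(next_token(context))  # bonus token after a fully accepted draft
    return accepted


def toy_next_token(context):
    """Stand-in for a target model (an assumption): cycles 0, 1, 2, 0, 1, 2, ..."""
    return (context[-1] + 1) % 3
```

With the context `[0, 1, 2, 0, 1]`, the last two tokens also occur at the start of the context, so the draft is the three tokens that followed them there; the toy target model then confirms all of them and contributes one extra token.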


2512.17678 2026-06-04 cs.LG cs.AI (updated version)

You Only Train Once: Differentiable Subset Selection for Omics Data

Daphné Chopard, Jorge da Silva Gonçalves, Irene Cannistraci, Thomas M. Sutter, Julia E. Vogt

Affiliations: Department of Computer Science, ETH Zurich; Department of Intensive Care and Neonatology, University Children’s Hospital Zurich

AI summary: Proposes YOTO, an end-to-end differentiable framework that jointly selects discrete gene subsets and performs prediction, enabling sparse multi-task learning and improving the analysis of single-cell transcriptomic data.

Comments Camera-ready version accepted at Transactions on Machine Learning Research (TMLR)

Journal ref: Transactions on Machine Learning Research, 2026

Abstract

Selecting compact and informative gene subsets from single-cell transcriptomic data is essential for biomarker discovery, improving interpretability, and cost-effective profiling. However, most existing feature selection approaches either operate as multi-stage pipelines or rely on post hoc feature attribution, making selection and prediction weakly coupled. In this work, we present YOTO (you only train once), an end-to-end framework that jointly identifies discrete gene subsets and performs prediction within a single differentiable architecture. In our model, the prediction task directly guides which genes are selected, while the learned subsets, in turn, shape the predictive representation. This closed feedback loop enables the model to iteratively refine both what it selects and how it predicts during training. Unlike existing approaches, YOTO enforces sparsity so that only the selected genes contribute to inference, eliminating the need to train additional downstream classifiers. Through a multi-task learning design, the model learns shared representations across related objectives, allowing partially labeled datasets to inform one another, and discovering gene subsets that generalize across tasks without additional training steps. We evaluate YOTO on two representative single-cell RNA-seq datasets, showing that it consistently outperforms state-of-the-art baselines. These results demonstrate that sparse, end-to-end, multi-task gene subset selection improves predictive performance and yields compact and meaningful gene subsets, advancing biomarker discovery and single-cell analysis.


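2512.16919 2026-06-04 cs.CV cs.AI cs.RO (updated version)

DVGT: Driving Visual Geometry Transformer

Sicheng Zuo, Zixun Xie, Wenzhao Zheng, Shaoqing Xu, Fang Li, Shengyin Jiang, Long Chen, Zhi-Xin Yang, Jiwen Lu

Affiliations: Tsinghua University; University of Macau; Xiaomi EV; Peking University

AI summary: Proposes DVGT, a visual geometry transformer that reconstructs a global dense 3D point map from unposed multi-view image sequences; it learns geometric relations through alternating attention, needs neither camera parameters nor post-hoc alignment, and significantly outperforms existing models on multiple driving datasets.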

Comments Code is available at https://github.com/wzzheng/DVGT

Abstract

Perceiving and reconstructing 3D scene geometry from visual inputs is crucial for autonomous driving. However, there still lacks a driving-targeted dense geometry perception model that can adapt to different scenarios and camera configurations. To bridge this gap, we propose a Driving Visual Geometry Transformer (DVGT), which reconstructs a global dense 3D point map from a sequence of unposed multi-view visual inputs. We first extract visual features for each image using a DINO backbone, and employ alternating intra-view local attention, cross-view spatial attention, and cross-frame temporal attention to infer geometric relations across images. We then use multiple heads to decode a global point map in the ego coordinate of the first frame and the ego poses for each frame. Unlike conventional methods that rely on precise camera parameters, DVGT is free of explicit 3D geometric priors, enabling flexible processing of arbitrary camera configurations. DVGT directly predicts metric-scaled geometry from image sequences, eliminating the need for post-alignment with external sensors. Trained on a large mixture of driving datasets including nuScenes, OpenScene, Waymo, KITTI, and DDAD, DVGT significantly outperforms existing models on various scenarios. Code is available at https://github.com/wzzheng/DVGT.


2512.05277 2026-06-04 cs.CV cs.AI (updated version)

A Unified Geometric Space for Topological Alignment Between Transformer-Based Models and Human Brain Networks

Silin Chen, Yuzhong Chen, Caiwei Wang, Zifan Wang, Junhao Wang, Zifeng Jia, Keith M Kendrick, Tuo Zhang, Lin Zhao, Dezhong Yao, Tianming Liu, Xi Jiang

Affiliations: The Clinical Hospital of Chengdu Brain Science Institute, MOE-K Lab for NeuroInformation, Brain-Apparatus Communication Institute, School of Life Science and Technology, University of Electronic Science and Technology of China; School of Automation, Northwestern Polytechnical University; Department of Biomedical Engineering, New Jersey Institute of Technology; School of Computing, University of Georgia

AI summary: Proposes a modality-agnostic, task-free topological alignment space that uses graph-based organizational properties to map the attention topology of Transformer models onto human intrinsic connectivity networks, revealing a continuous arc-shaped distribution and alignment characteristics across models of different modalities and scales.

Abstract

Prior brain-AI alignment studies are typically constrained by specific inputs and tasks, limiting their ability to capture organizational properties across models with different modalities. In this work, we focus on Transformer-based models and introduce a brain-model topological alignment space. Rather than inferring alignment from neural mechanisms, we examine it through graph-based organizational properties, mapping the intrinsic spatial attention topology of a model onto canonical human intrinsic connectivity networks (ICNs). This enables a modality-agnostic and task-free comparison across vision, language, and multimodal systems at the level of organizational properties. Analyzing 151 Transformer-based models across these modalities and scales, we observe a continuous arc-shaped distribution, reflecting varying degrees of topological alignment. Consistent with their training objectives, models optimized for global semantic abstraction were associated more closely with higher-order ICNs, while local detail-focused models associated with low-level ICNs. More surprisingly, we uncovered non-intuitive phenomena: DINOv2 exhibited reduced alignment compared to its predecessors, distilled DeiT models showed a counterintuitive scaling inversion where larger models aligned less well with higher-order ICNs, and fine-tuning as well as instruction tuning had limited effect on alignment. Furthermore, topological alignment scores showed non-significant correlation with ImageNet-1K Top-1 accuracy in 30 vision Transformers (r=0.266, p=0.156). This work provides a new quantitative perspective for comparing the organizational properties of Transformer-based models through brain-referenced topological mapping.


2508.08237 2026-06-04 cs.MM cs.AI cs.CV cs.SD eess.AS (updated version)

VGGSounder: Audio-Visual Evaluations for Foundation Models

Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu, Matthias Bethge, Wieland Brendel, A. Sophia Koepke

Affiliations: Technical University of Munich, MCML; University of Tübingen; Tübingen AI Center; MPI for Intelligent Systems, ELLIS Institute

AI summary: Addresses incomplete labels, overlapping classes, and misaligned modalities in the VGGSound dataset when it is used to evaluate audio-visual foundation models; proposes VGGSounder, a re-annotated multi-label test set, and introduces a modality confusion metric to analyze performance degradation.

Comments Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) 2025

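2510.15416 2026-06-04 cs.AI (updated version)

Adaptive Minds: Empowering Agents with LoRA-as-Tools

Pavan C Shekar, Aswanth Krishnan

Affiliations: GitHub

AI summary: Proposes a framework that treats LoRA adapters as callable tools and aggregates the strengths of multiple specialized adapters through routing and agentic reasoning; it reaches 98.3% routing accuracy on a 30-adapter library and markedly improves performance across nine task families.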

Comments 13 pages, 3 figures, 9 tables. ICML 2026 CompLearn Workshop camera-ready (non-archival). Code: https://github.com/qpiai/adaptive-minds

Abstract

We investigate a framework in which LoRA adapters are treated as callable tools that a base language model can dynamically select and invoke. We hypothesize that, when adapters are trained to provide strong domain-specific gains and are exposed with clear metadata, a base model can reliably route queries to the appropriate expert, effectively aggregating the benefits of many specialized adapters within a single framework. We introduce Adaptive Minds, a general framework within which we study both single-step routing and multi-step agentic reasoning. In this setting, the agent can iteratively invoke multiple adapters alongside other tools (e.g., external APIs, retrieval systems, or execution environments) and reason over their outputs across multiple steps. This reframes adapters as modular skills or memory units that can be composed during reasoning rather than statically applied. In our evaluation, the routing layer reaches 98.3% accuracy on a 30-adapter library, and well-trained specialists provide +4.6 to +84.0 percentage points of strict-scorer gain across nine task families under a single shared training recipe; the AM router aggregates these gains within 5 pp of the direct specialist on every benchmark whose queries surface domain signal. Our findings suggest that the effectiveness of this approach depends on the quality and specialization of individual adapters, and that enabling flexible composition of many such experts can significantly expand the practical capabilities of language model agents, moving toward more general, tool-augmented intelligence.


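2510.13704 2026-06-04 cs.LG cs.AI cs.RO (updated version)

Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents

Johan Obando-Ceron, Walter Mayor, Samuel Lavoie, Scott Fujimoto, Aaron Courville, Pablo Samuel Castro

Affiliations: Mila – Québec AI Institute; Université de Montréal; McGill University; CIFAR AI Chair

AI summary: Addresses the fact that actor-critic methods still require many interactions even with massively parallel environments; proposes simplicial embeddings as a lightweight representation layer whose geometric inductive bias yields sparse, discrete features, stabilizing critic bootstrapping and strengthening policy gradients, with consistent gains in sample efficiency and final performance in FastTD3, FastSAC, and PPO.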

2505.11166 2026-06-04 cs.CL cs.AI (updated version)

SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization

Huashan Sun, Shengyi Liao, Yansen Han, Yu Bai, Yang Gao, Cheng Fu, Weizhou Shen, Fanqi Wan, Ming Yan, Ji Zhang, Fei Huang

Affiliations: Tongyi Lab, Alibaba Group

AI summary: Proposes the SoLoPO framework, which decouples long-context preference optimization into short-context preference optimization and short-to-long reward alignment to improve how well large language models use long contexts.

Comments Published as a conference paper at ICLR 2026

Abstract

Despite advances in pretraining with extended context sizes, large language models (LLMs) still face challenges in effectively utilizing real-world long-context information, primarily due to insufficient long-context alignment caused by data quality issues, training inefficiencies, and the lack of well-designed optimization objectives. To address these limitations, we propose a framework named Short-to-Long Preference Optimization (SoLoPO), decoupling long-context preference optimization (PO) into two components: short-context PO and short-to-long reward alignment (SoLo-RA), supported by both theoretical and empirical evidence. Specifically, short-context PO leverages preference pairs sampled from short contexts to enhance the model's contextual knowledge utilization ability. Meanwhile, SoLo-RA explicitly encourages reward score consistency for the responses when conditioned on both short and long contexts that contain identical task-relevant information. This facilitates transferring the model's ability to handle short contexts into long-context scenarios. SoLoPO is compatible with mainstream preference optimization algorithms, while substantially improving the efficiency of data construction and training processes. Experimental results show that SoLoPO enhances all these algorithms with respect to stronger length and domain generalization abilities across various long-context benchmarks, while achieving notable improvements in both computational and memory efficiency.


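1708.06233 2026-06-04 cs.AI cs.MA cs.SI econ.GN physics.soc-ph q-fin.EC (updated version)

Fake News in Social Networks

Christoph Aymanns, Jakob Foerster, Co-Pierre Georg, Matthias Weber

Affiliations: University of St. Gallen; University of Oxford; Frankfurt School of Finance and Management; Swiss Finance Institute

AI summary: Proposes multi-agent reinforcement learning as a new way to model fake news in social networks; finds that targeting highly connected people with weak private information is more effective, that spreading information in a dispersed manner is more effective than concentrating it, and that fake news spreads less in balanced networks; human-subject experiments support the model's applicability.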

2510.08647 2026-06-04 cs.CL cs.AI (updated version)

Can Reasoning Path still be Effective as Input? Bridging Post-Reasoning to Chain-of-Thought Compression

Chengzhengxu Li, Xiaoming Liu, Zhaohan Zhang, Shengchao Liu, Guoxin Ma, Yu Lan, Cong Wang, Chao Shen

Affiliations: Faculty of Electronic and Information Engineering, Xi’an Jiaotong University; Queen Mary University of London; City University of Hong Kong

AI summary: Proposes the post-reasoning paradigm, which simplifies reasoning by supplying the chain of thought as contextual input, and designs the UCoT framework, which trains a lightweight compressor to produce the contextual chain of thought as soft tokens, substantially shortening outputs while preserving reasoning ability.

Comments ACL 2026 Main Track

Abstract

Recent developments have enabled advanced reasoning in Large Language Models (LLMs) via long Chain-of-Thought (CoT), trading efficiency during inference for performance. Existing works focus on compressing generated CoT in reasoning, which impairs the necessary information for deriving the correct answer. In this work, we propose post-reasoning, a reasoning paradigm that takes CoT as a part of context to simplify the reasoning task for LLMs. We find that post-reasoning significantly reduces the generation length of LLMs, but its effectiveness hinges on the efficiency and the reliability of the contextual CoT generation. Therefore, we propose Upfront CoT (UCoT), an efficient post-reasoning framework for CoT compression. UCoT trains a lightweight model (compressor) to provide contextual CoT in form of soft tokens and trains the LLM (executor) to leverage this contextual CoT for producing the final answer. Extensive experiments show that UCoT maintains the powerful reasoning ability of executor while significantly reducing the length of CoT. It is worth mentioning that when applying UCoT to the Qwen2.5-7B-Instruct model, the usage of tokens on GSM8K dataset is reduced by 50%, while the performance is 3.08% higher than that of the state-of-the-art (SOTA) method.


2510.03511 2026-06-04 cs.CV cs.AI cs.LG eess.IV (updated version)

Platonic Transformers: A Solid Choice For Equivariance

Mohammad Mohaiminul Islam, Rishabh Anand, David R. Wessels, Friso de Kruiff, Thijs P. Kuipers, Rex Ying, Clara I. Sánchez, Sharvaree Vadgama, Georg Bökman, Erik J. Bekkers

Affiliations: University of California, Berkeley

AI summary: Proposes the Platonic Transformer, which achieves equivariance through attention defined relative to reference frames from the Platonic solid symmetry groups, improving performance at no additional computational cost.

Abstract

While widespread, Transformers lack inductive biases for geometric symmetries common in science and computer vision. Existing equivariant methods often sacrifice the efficiency and flexibility that make Transformers so effective through complex, computationally intensive designs. We introduce the Platonic Transformer to resolve this trade-off. By defining attention relative to reference frames from the Platonic solid symmetry groups, our method induces a principled weight-sharing scheme. This enables combined equivariance to continuous translations and Platonic symmetries, while preserving the exact architecture and computational cost of a standard Transformer. Furthermore, we show that this attention is formally equivalent to a dynamic group convolution, which reveals that the model learns adaptive geometric filters and enables a highly scalable, linear-time convolutional variant. Across diverse benchmarks in computer vision (CIFAR-10), 3D point clouds (ScanObjectNN), and molecular property prediction (QM9, OMol25), the Platonic Transformer achieves competitive performance by leveraging these geometric constraints at no additional cost.


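2510.01902 2026-06-04 cs.AI cs.CL cs.LG (updated version)

Constrained Adaptive Rejection Sampling

Paweł Parys, Sairam Vaidya, Taylor Berg-Kirkpatrick, Loris D'Antoni

Affiliations: University of California, Berkeley

AI summary: Proposes Constrained Adaptive Rejection Sampling (CARS), which improves the sample efficiency of rejection sampling by adaptively pruning invalid prefixes while introducing no distributional distortion; it outperforms existing methods on tasks such as program fuzzing and molecular generation.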

Abstract

Language Models (LMs) are increasingly used in applications where generated outputs must satisfy strict semantic or syntactic constraints. Existing approaches to constrained generation fall along a spectrum: greedy constrained decoding methods enforce validity during decoding but distort the LM's distribution, while rejection sampling (RS) preserves fidelity but wastes computation by discarding invalid outputs. Both extremes are problematic in domains such as program fuzzing, where both validity and diversity of samples are essential. We present Constrained Adaptive Rejection Sampling (CARS), an approach that strictly improves the sample-efficiency of RS without distributional distortion. CARS begins with unconstrained LM sampling and adaptively rules out constraint-violating continuations by recording them in a trie and subtracting their probability mass from future draws. This adaptive pruning ensures that prefixes proven invalid are never revisited, acceptance rates improve monotonically, and the resulting samples exactly follow the constrained distribution. In experiments on a variety of domains -- e.g., program fuzzing and molecular generation -- CARS consistently achieves higher efficiency -- measured in the number of LM forward passes per valid sample -- while also producing stronger sample diversity than both GCD and methods that approximate the LM's distribution.
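
The adaptive pruning described in the abstract can be sketched on a toy problem. The following is an illustration under loud assumptions, not the authors' implementation: the "LM" is a uniform distribution over two tokens, sequences have fixed length 3, the constraint forbids the substring `aa`, and a dictionary keyed by prefix stands in for the trie. All names (`lm_prob`, `is_valid`, `cars_sample`, `record_invalid`, `draw_valid`) are hypothetical.

```python
import random

VOCAB = ("a", "b")
LENGTH = 3


def lm_prob(prefix, token):
    """Toy stand-in for an LM: uniform next-token probability (an assumption)."""
    return 1.0 / len(VOCAB)


def is_valid(seq):
    """Toy constraint: the generated string must not contain 'aa'."""
    return "aa" not in "".join(seq)


def cars_sample(invalid_mass, rng):
    """Sample from the LM with the probability mass of known-invalid sequences removed."""
    seq = ()
    while len(seq) < LENGTH:
        weights = [
            lm_prob(seq, tok) * max(0.0, 1.0 - invalid_mass.get(seq + (tok,), 0.0))
            for tok in VOCAB
        ]
        seq += (rng.choices(VOCAB, weights=weights)[0],)
    return seq


def record_invalid(invalid_mass, seq):
    """Add the rejected sequence's conditional mass to every prefix on its path."""
    cond = 1.0  # probability of the remaining suffix given the current prefix
    for k in range(len(seq), 0, -1):
        prefix = seq[:k]
        invalid_mass[prefix] = invalid_mass.get(prefix, 0.0) + cond
        cond *= lm_prob(seq[:k - 1], seq[k - 1])


def draw_valid(n, seed=0):
    """Draw n valid samples, counting how many draws were rejected along the way."""
    rng = random.Random(seed)
    invalid_mass = {}
    samples, rejections = [], 0
    while len(samples) < n:
        seq = cars_sample(invalid_mass, rng)
        if is_valid(seq):
            samples.append("".join(seq))
        else:
            record_invalid(invalid_mass, seq)
            rejections += 1
    return samples, rejections
```

In this toy setting only three of the eight possible strings are invalid, and each can be rejected at most once because its weight drops to zero after it is recorded. Accepted samples are still drawn in proportion to their LM probability among the valid strings, which is the property the abstract emphasizes.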


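2505.22988 2026-06-04 cs.LG cs.AI (updated version)

Model-Preserving Adaptive Rounding

Albert Tseng, Zhaofeng Sun, Christopher De Sa

Affiliations: Department of Computer Science, Cornell University

AI summary: Proposes YAQA, an adaptive rounding quantization algorithm that directly accounts for error at the network output; theoretical analysis yields the first end-to-end error bounds, and a Kronecker-factored Hessian approximation gives roughly 30% lower error than GPTQ/LDLQ with no inference overhead.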

Comments ICML 2026

Abstract

The goal of quantization is to produce a compressed model whose output distribution is as close to the original model's as possible. To do this tractably, most quantization algorithms minimize the immediate activation error of each layer as a proxy for the end-to-end error. However, this ignores the effect of future layers, making it a poor proxy. In this work, we introduce Yet Another Quantization Algorithm (YAQA), an adaptive rounding algorithm that directly considers the error at the network's output. YAQA introduces a series of theoretical results that culminate in the first end-to-end error bounds for quantization algorithms. First, we characterize the convergence time of adaptive rounding algorithms via the structure of their Hessian approximations. We then show that the end-to-end error can be bounded by the approximation's cosine similarity to the true Hessian. This admits a natural Kronecker-factored approximation with corresponding near-optimal Hessian sketches. YAQA is provably better than GPTQ/LDLQ and empirically reduces the error by $\approx 30\%$ over these methods. YAQA even achieves a lower error than quantization aware training. This translates to state of the art performance on downstream tasks, all while adding no inference overhead.


2509.15676 2026-06-04 cs.LG cs.AI cs.CL (updated version)

KITE: Kernelized and Information Theoretic Exemplars for In-Context Learning

Vaibhav Singh, Soumya Suvra Ghosal, Kapu Nirmal Joshua, Soumyabrata Pal, Sayak Ray Chowdhury

Affiliations: IIT Bombay; UMD College Park; IIT Kanpur; Adobe Research

AI summary: For exemplar selection in in-context learning, proposes a greedy algorithm grounded in information theory and kernel methods that minimizes query-specific prediction error and adds a diversity regularizer, significantly improving classification performance.

Abstract

In-context learning (ICL) has emerged as a powerful paradigm for adapting large language models (LLMs) to new and data-scarce tasks using only a few carefully selected task-specific examples presented in the prompt. However, given the limited context size of LLMs, a fundamental question arises: Which examples should be selected to maximize performance on a given user query? While nearest-neighbor-based methods like KATE have been widely adopted for this purpose, they suffer from well-known drawbacks in high-dimensional embedding spaces, including poor generalization and a lack of diversity. In this work, we study this problem of example selection in ICL from a principled, information theory-driven perspective. We first model an LLM as a linear function over input embeddings and frame the example selection task as a query-specific optimization problem: selecting a subset of exemplars from a larger example bank that minimizes the prediction error on a specific query. This formulation departs from traditional generalization-focused learning theoretic approaches by targeting accurate prediction for a specific query instance. We derive a principled surrogate objective that is approximately submodular, enabling the use of a greedy algorithm with an approximation guarantee. We further enhance our method by (i) incorporating the kernel trick to operate in high-dimensional feature spaces without explicit mappings, and (ii) introducing an optimal design-based regularizer to encourage diversity in the selected examples. Empirically, we demonstrate significant improvements over standard retrieval methods across a suite of classification tasks, highlighting the benefits of structure-aware, diverse example selection for ICL in real-world, label-scarce scenarios.


2509.10247 2026-06-04 cs.RO cs.AI (updated version)

DiffAero: A GPU-Accelerated Differentiable Simulation Framework for Efficient Quadrotor Policy Learning

Xinhong Zhang, Runqing Wang, Yunfan Ren, Jian Sun, Hao Fang, Jie Chen, Gang Wang

Affiliations: State Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology; Zhongguancun Academy; Department of Mechanical Engineering, University of Hong Kong; Harbin Institute of Technology

AI summary: Proposes DiffAero, a lightweight, GPU-accelerated, fully differentiable simulation framework that enables efficient quadrotor control policy learning by parallelizing physics and rendering, training robust policies within hours on consumer-grade hardware.

Comments 8 pages, 11 figures, 1 table

Abstract

This letter introduces DiffAero, a lightweight, GPU-accelerated, and fully differentiable simulation framework designed for efficient quadrotor control policy learning. DiffAero supports both environment-level and agent-level parallelism and integrates multiple dynamics models, customizable sensor stacks (IMU, depth camera, and LiDAR), and diverse flight tasks within a unified, GPU-native training interface. By fully parallelizing both physics and rendering on the GPU, DiffAero eliminates CPU-GPU data transfer bottlenecks and delivers orders-of-magnitude improvements in simulation throughput. In contrast to existing simulators, DiffAero not only provides high-performance simulation but also serves as a research platform for exploring differentiable and hybrid learning algorithms. Extensive benchmarks and real-world flight experiments demonstrate that DiffAero and hybrid learning algorithms combined can learn robust flight policies in hours on consumer-grade hardware. The code is available at https://github.com/flyingbitac/diffaero.


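2509.08846 2026-06-04 cs.LG cs.AI stat.ML (updated version)

Uncertainty Estimation using Variance-Gated Distributions

H. Martin Gillis, Isaac Xu, Thomas Trappenberg

Affiliations: Faculty of Computer Science, Dalhousie University

AI summary: Proposes a variance-gated uncertainty estimation framework based on the signal-to-noise ratio of class probability distributions; it scales predictions with an ensemble-derived confidence factor, addressing the additive-decomposition problem in decomposing the predictive uncertainty of neural networks.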

Comments NeurIPS Workshop: Mathematical Foundations and Operational Integration of Machine Learning for Uncertainty-Aware Decision-Making

2509.03351 2026-06-04 cs.LG cs.AI q-bio.QM (updated version)

epiGPTope: A machine learning-based epitope generator and classifier

Natalia Flechas Manrique, Alberto Martínez, Elena López-Martínez, Luc Andrea, Román Orus, Aitor Manteca, Aitziber L. Cortajarena, Llorenç Espinosa-Portalés

Affiliations: Multiverse Computing; Centre for Cooperative Research in Biomaterials (CIC biomaGUNE); Basque Research and Technology Alliance (BRTA); Donostia International Physics Center; Ikerbasque Foundation for Science; IKERBASQUE

AI summary: Proposes epiGPTope, a large language model that is pre-trained and then fine-tuned to directly generate novel epitope sequences, combined with statistical classifiers that predict whether an epitope is of bacterial or viral origin, to speed up the construction and screening of synthetic epitope libraries.

Comments 11 pages, 4 figures. Supplementary Information with 5 pages, 4 figures

DOI: 10.1021/acssynbio.5c00693
Journal ref: ACS Synthetic Biology 2026 15 (2), 631-642

Abstract

Epitopes are short antigenic peptide sequences which are recognized by antibodies or immune cell receptors. These are central to the development of immunotherapies, vaccines, and diagnostics. However, the rational design of synthetic epitope libraries is challenging due to the large combinatorial sequence space, $20^n$ combinations for linear epitopes of n amino acids, making screening and testing unfeasible, even with high throughput experimental techniques. In this study, we present a large language model, epiGPTope, pre-trained on protein data and specifically fine-tuned on linear epitopes, which for the first time can directly generate novel epitope-like sequences, which are found to possess statistical properties analogous to the ones of known epitopes. This generative approach can be used to prepare libraries of epitope candidate sequences. We further train statistical classifiers to predict whether an epitope sequence is of bacterial or viral origin, thus narrowing the candidate library and increasing the likelihood of identifying specific epitopes. We propose that such combination of generative and predictive models can be of assistance in epitope discovery. The approach uses only primary amino acid sequences of linear epitopes, bypassing the need for a geometric framework or hand-crafted features of the sequences. By developing a method to create biologically feasible sequences, we anticipate faster and more cost-effective generation and screening of synthetic epitopes, with relevant applications in the development of new biotechnologies.


2508.14623 2026-06-04 eess.AS cs.AI cs.SD 版本更新

A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References

带噪参考下语音分离中尺度不变信失真比的研究

Simon Dahl Jepsen, Mads Græsbøll Christensen, Jesper Rindom Jensen

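Affiliations: European Union

AI summary: This paper studies the effect of using the scale-invariant signal-to-distortion ratio (SI-SDR) as both an evaluation metric and a training objective when the training references contain noise. It proposes enhancing the references and augmenting the mixtures so that models avoid learning the noisy references. Experiments show reduced noise in the separated speech, although the processing may introduce artefacts.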

详情

DOI: 10.1109/ASRU65441.2025.11434756
Journal ref: 2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Honolulu, HI, USA, 2025, pp. 1-8

AI中文摘要

本文研究了在监督语音分离中，当训练参考包含噪声时（如事实上的基准WSJ0-2Mix），使用尺度不变信失真比（SI-SDR）作为评估和训练目标的影响。对带噪参考的SI-SDR推导表明，噪声限制了可实现的SI-SDR，或导致分离输出中出现不希望的噪声。为了解决这个问题，提出了一种增强参考并用WHAM!扩充混合数据的方法，旨在训练避免学习噪声参考的模型。使用非侵入式NISQA.v2指标评估了在这些增强数据集上训练的两个模型。结果显示分离语音中的噪声减少，但表明处理参考可能引入伪影，限制了整体质量提升。在WSJ0-2Mix和Libri2Mix测试集上，各模型的SI-SDR与感知噪声之间存在负相关，这印证了推导的结论。

英文摘要

This paper examines the implications of using the Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) as both evaluation and training objective in supervised speech separation, when the training references contain noise, as is the case with the de facto benchmark WSJ0-2Mix. A derivation of the SI-SDR with noisy references reveals that noise limits the achievable SI-SDR, or leads to undesired noise in the separated outputs. To address this, a method is proposed to enhance references and augment the mixtures with WHAM!, aiming to train models that avoid learning noisy references. Two models trained on these enhanced datasets are evaluated with the non-intrusive NISQA.v2 metric. Results show reduced noise in separated speech but suggest that processing references may introduce artefacts, limiting overall quality gains. Negative correlation is found between SI-SDR and perceived noisiness across models on the WSJ0-2Mix and Libri2Mix test sets, underlining the conclusion from the derivation.
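To make the ceiling effect concrete, here is a minimal sketch of the commonly used SI-SDR definition: the estimate is projected onto the reference, and the ratio of projected energy to residual energy is reported in dB. This is not the paper's derivation. The toy signals, the function name `si_sdr`, and the omission of mean removal are assumptions made for illustration. In this toy case the noise is exactly orthogonal to the clean signal, so a perfectly clean estimate scored against the noisy reference reaches only the reference's own signal-to-noise ratio.

```python
import math


def si_sdr(estimate, reference):
    """Scale-invariant SDR in dB between two equal-length signals (no mean removal)."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    # Optimal scaling of the reference that best explains the estimate.
    alpha = dot / ref_energy
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10(target_energy / residual_energy)


# Hypothetical toy signals: the noise is exactly orthogonal to the clean signal.
clean = [1.0, 0.0] * 8
noise = [0.0, 0.1] * 8
noisy_reference = [c + n for c, n in zip(clean, noise)]

# A perfectly clean estimate is still penalized by the noise in the reference.
ceiling_db = si_sdr(clean, noisy_reference)

# Rescaling the estimate does not change the score (scale invariance).
rescaled_db = si_sdr([3.0 * c for c in clean], noisy_reference)
```

Here the clean-to-noise energy ratio of the reference is 100, so the score is capped at 20 dB even though the estimate contains no noise at all.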


2508.01815 2026-06-04 cs.CL cs.AI 版本更新

100-LongBench: Are de facto Long-Context Benchmarks Literally Evaluating Long-Context Ability?

Wang Yang, Hongye Jin, Shaochen Zhong, Song Jiang, Qifan Wang, Vipin Chaudhary, Xiaotian Han

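Affiliations: Case Western Reserve University; Texas A&M University; Rice University; University of California, Los Angeles; Meta

AI summary: Existing long-context benchmarks cannot separate baseline ability from true long-context ability and use fixed input lengths. This work proposes a length-controllable long-context benchmark and a new metric for evaluating the long-context capability of large language models more effectively.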

详情

AI中文摘要

长上下文能力被认为是LLM最重要的能力之一，因为真正具备长上下文能力的LLM使用户能够轻松处理许多原本繁琐的任务——例如，阅读长文档寻找答案与直接询问LLM。然而，现有的基于真实任务的长上下文评估基准有两个主要缺陷。首先，像LongBench这样的基准通常没有提供适当的指标来将长上下文性能与模型的基线能力分开，使得跨模型比较不清晰。其次，此类基准通常以固定输入长度构建，这限制了它们在不同模型上的适用性，并且无法揭示模型何时开始崩溃。为了解决这些问题，我们引入了一个长度可控的长上下文基准和一个新颖的指标，该指标将基线知识与真实的长上下文能力解耦。实验证明了我们的方法在有效评估LLM方面的优越性。

英文摘要

Long-context capability is considered one of the most important abilities of LLMs, as a truly long-context-capable LLM lets users effortlessly handle many otherwise exhausting tasks, e.g., asking an LLM about a long-form document instead of digesting it to find answers. However, existing real-task-based long-context evaluation benchmarks have two major shortcomings. First, benchmarks like LongBench often do not provide proper metrics to separate long-context performance from the model's baseline ability, making cross-model comparison unclear. Second, such benchmarks are usually constructed with fixed input lengths, which limits their applicability across different models and fails to reveal when a model begins to break down. To address these issues, we introduce a length-controllable long-context benchmark and a novel metric that disentangles baseline knowledge from true long-context capabilities. Experiments demonstrate the superiority of our approach in effectively evaluating LLMs.


2505.17315 2026-06-04 cs.AI cs.CL cs.LG 版本更新

Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning

更长上下文，更深思考：揭示长上下文能力在推理中的作用

Wang Yang, Zirui Liu, Hongye Jin, Qingyu Yin, Vipin Chaudhary, Xiaotian Han

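Affiliations: Case Western Reserve University; University of Minnesota - Twin Cities; Texas A&M University

AI summary: Experiments show that strengthening a model's long-context ability before supervised fine-tuning significantly improves reasoning performance, with gains that generalize even to short-input tasks, suggesting that long-context modeling is a key foundation for reasoning.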

详情

AI中文摘要

近期语言模型展现出强大的推理能力，但长上下文能力对推理的影响仍未充分探索。在本工作中，我们假设当前推理能力的局限性部分源于长上下文能力不足，这一假设基于经验观察：（1）更高的上下文窗口长度通常带来更强的推理性能，（2）失败的推理案例与失败的长上下文案例相似。为验证这一假设，我们检验了在监督微调（SFT）前增强模型的长上下文能力是否能提升推理性能。具体而言，我们比较了架构和微调数据相同但长上下文能力不同的模型。结果揭示了一致趋势：长上下文能力更强的模型在SFT后，在推理基准上取得了显著更高的准确率。值得注意的是，即使在输入长度较短的任务上，这些增益也持续存在，表明长上下文训练为推理性能提供了可泛化的益处。这些发现表明，长上下文建模不仅对处理长输入至关重要，而且也是推理的关键基础。我们主张将长上下文能力作为未来语言模型设计的首要目标。

英文摘要

Recent language models exhibit strong reasoning capabilities, yet the influence of long-context capacity on reasoning remains underexplored. In this work, we hypothesize that current limitations in reasoning stem, in part, from insufficient long-context capacity, motivated by empirical observations such as (1) longer context windows often lead to stronger reasoning performance, and (2) failed reasoning cases resemble failed long-context cases. To test this hypothesis, we examine whether enhancing a model's long-context ability before Supervised Fine-Tuning (SFT) leads to improved reasoning performance. Specifically, we compare models with identical architectures and fine-tuning data but varying levels of long-context capacity. Our results reveal a consistent trend: models with stronger long-context capacity achieve significantly higher accuracy on reasoning benchmarks after SFT. Notably, these gains persist even on tasks with short input lengths, indicating that long-context training offers generalizable benefits for reasoning performance. These findings suggest that long-context modeling is not just essential for processing lengthy inputs, but also serves as a critical foundation for reasoning. We advocate for treating long-context capacity as a first-class objective in the design of future language models.


2504.15587 2026-06-04 cs.LG cs.AI 版本更新

MetaMolGen: A Neural Graph Motif Generation Model for De Novo Molecular Design

MetaMolGen: 一种用于从头分子设计的神经图基序生成模型

Zimo Yan, Jie Zhang, Zheng Xie, Chang Liu, Yizhen Liu, Yiping Song

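Affiliations: National University of Defense Technology

AI summary: Proposes MetaMolGen, a meta-learning-based molecular generation model that standardizes the distribution of graph motifs and uses a lightweight autoregressive sequence model for few-shot and property-conditioned molecular generation.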

详情

DOI: 10.46793/match.97-2.11226

AI中文摘要

分子生成在药物发现和材料科学中扮演重要角色，尤其是在数据稀缺场景下，传统生成模型往往难以实现令人满意的条件泛化。为应对这一挑战，我们提出MetaMolGen，一种基于一阶元学习的分子生成器，专为少样本和属性条件分子生成而设计。MetaMolGen通过将图基序映射到标准化潜在空间来标准化其分布，并采用轻量级自回归序列模型生成忠实反映底层分子结构的SMILES序列。此外，它通过集成到生成过程中的可学习属性投影器，支持具有目标属性的分子的条件生成。实验结果表明，MetaMolGen在低数据条件下持续生成有效且多样的SMILES序列，优于传统基线。这突显了其在快速适应和高效条件生成方面的优势，适用于实际分子设计。

英文摘要

Molecular generation plays an important role in drug discovery and materials science, especially in data-scarce scenarios where traditional generative models often struggle to achieve satisfactory conditional generalization. To address this challenge, we propose MetaMolGen, a first-order meta-learning-based molecular generator designed for few-shot and property-conditioned molecular generation. MetaMolGen standardizes the distribution of graph motifs by mapping them to a normalized latent space, and employs a lightweight autoregressive sequence model to generate SMILES sequences that faithfully reflect the underlying molecular structure. In addition, it supports conditional generation of molecules with target properties through a learnable property projector integrated into the generative process. Experimental results demonstrate that MetaMolGen consistently generates valid and diverse SMILES sequences under low-data regimes, outperforming conventional baselines. This highlights its advantage in fast adaptation and efficient conditional generation for practical molecular design.


2504.12329 2026-06-04 cs.CL cs.AI 版本更新

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

推测性思考：在推理时利用大模型指导增强小模型推理能力

Wang Yang, Xiang Yue, Vipin Chaudhary, Xiaotian Han

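Affiliations: Case Western Reserve University; Carnegie Mellon University

AI summary: Proposes Speculative Thinking, a training-free framework in which a large reasoning model guides a smaller model at the reasoning level, improving the smaller model's reasoning accuracy while shortening its output.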

详情

AI中文摘要

近期进展利用后训练来增强模型推理性能，这通常需要昂贵的训练流程，并且仍然存在低效、输出过长的问题。我们提出推测性思考，一种无需训练的框架，使大推理模型在推理层面引导小模型进行推理，区别于在词元层面操作的推测解码。我们的方法基于两个观察：（1）支持推理的词元（如“wait”）经常出现在结构分隔符（如“\n\n”）之后，作为反思或继续的信号；（2）大模型对反思行为有更强的控制，减少不必要的回溯同时提高推理质量。通过策略性地将反思步骤委托给能力更强的模型，我们的方法显著提升了推理模型的推理准确率，同时缩短了输出。在32B推理模型的辅助下，1.5B模型在MATH500上的准确率从83.2%提升至89.4%，实现了6.2%的大幅提升。同时，平均输出长度从5439个词元减少到4583个词元，下降了15.7%。此外，当应用于非推理模型（Qwen-2.5-7B-Instruct）时，我们的框架在相同基准上将准确率从74.0%提升至81.8%，实现了7.8%的相对提升。

英文摘要

Recent advances leverage post-training to enhance model reasoning performance, which typically requires costly training pipelines and still suffers from inefficient, overly lengthy outputs. We introduce Speculative Thinking, a training-free framework that enables large reasoning models to guide smaller ones during inference at the reasoning level, distinct from speculative decoding, which operates at the token level. Our approach is based on two observations: (1) reasoning-supportive tokens such as "wait" frequently appear after structural delimiters like "\n\n", serving as signals for reflection or continuation; and (2) larger models exhibit stronger control over reflective behavior, reducing unnecessary backtracking while improving reasoning quality. By strategically delegating reflective steps to a more capable model, our method significantly boosts the reasoning accuracy of reasoning models while shortening their output. With the assistance of the 32B reasoning model, the 1.5B model's accuracy on MATH500 increases from 83.2% to 89.4%, a substantial improvement of 6.2 percentage points. Simultaneously, the average output length is reduced from 5439 tokens to 4583 tokens, a 15.7% decrease. Moreover, when applied to a non-reasoning model (Qwen-2.5-7B-Instruct), our framework boosts its accuracy from 74.0% to 81.8% on the same benchmark, an absolute improvement of 7.8 percentage points (about 10.5% in relative terms).
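The reported figures mix two kinds of change, and a quick check makes the distinction explicit. The accuracy gains are differences in percentage points, while the output-length reduction is a relative change. The sketch below only re-computes the numbers quoted in the abstract. The helper names are hypothetical.

```python
def absolute_gain(before, after):
    """Difference between two percentages, in percentage points."""
    return after - before


def relative_change(before, after):
    """Change relative to the starting value, in percent."""
    return (after - before) / before * 100.0


# 1.5B model on MATH500: 83.2% -> 89.4%
small_model_gain_pp = absolute_gain(83.2, 89.4)

# Average output length: 5439 -> 4583 tokens
length_change_pct = relative_change(5439, 4583)

# Qwen-2.5-7B-Instruct on MATH500: 74.0% -> 81.8%
instruct_gain_pp = absolute_gain(74.0, 81.8)
instruct_gain_relative_pct = relative_change(74.0, 81.8)
```

Rounded to one decimal place, these give 6.2 points, a 15.7% reduction in length, and 7.8 points for the instruct model, which corresponds to roughly 10.5% when expressed relative to the 74.0% starting accuracy.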


2405.08036 2026-06-04 cs.LG cs.AI 版本更新

Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning

合作多智能体强化学习中潜在最优联合动作识别

Chang Huang, Shatong Zhu, Junqiao Zhao, Hongtu Zhou, Di Zhang, Hai Zhang, Chen Ye, Ziqiao Wang, Guang Chen

Affiliations: School of Computer Science and Technology, Tongji University; Stanford University; MOE Key Lab of Embedded System and Service Computing, Tongji University, Shanghai, China; The University of Hong Kong; Shanghai Innovation Institute

AI summary: To address the limited expressiveness caused by monotonicity constraints in value function factorization, proposes Potentially Optimal Joint Actions Weighting (POW), whose iterative weighted training guarantees recovery of the optimal policy and outperforms existing methods on multiple tasks.

Comments ICLR 2026

详情

Journal ref: ICLR 2026

AI中文摘要

值函数分解在合作多智能体强化学习（MARL）中被广泛使用。现有方法通常对联合动作值与个体动作值之间施加单调性约束以实现分散执行。然而，此类约束限制了值函数分解的表达能力，缩小了可表示的联合动作值范围，并阻碍了最优策略的学习。为解决这一问题，我们提出了潜在最优联合动作加权（POW）方法，该方法在现有近似加权策略可能失效的情况下确保最优策略恢复。POW通过一个理论上有依据的迭代加权训练过程，迭代地识别潜在最优联合动作并为其分配更高的训练权重。我们证明该机制保证了真实最优策略的恢复，克服了先前启发式加权策略的局限性。POW是架构无关的，可以无缝集成到现有的值函数分解算法中。在矩阵博弈、难度增强的捕食者-猎物任务、SMAC、SMACv2以及高速公路环境交叉口场景上的大量实验表明，POW显著提升了稳定性，并持续超越了最先进的基于值的MARL方法。

英文摘要

Value function factorization is widely used in cooperative multi-agent reinforcement learning (MARL). Existing approaches often impose monotonicity constraints between the joint action value and individual action values to enable decentralized execution. However, such constraints limit the expressiveness of value factorization, restricting the range of joint action values that can be represented and hindering the learning of optimal policies. To address this, we propose Potentially Optimal Joint Actions Weighting (POW), a method that ensures optimal policy recovery where existing approximate weighting strategies may fail. POW iteratively identifies potentially optimal joint actions and assigns them higher training weights through a theoretically grounded iterative weighted training process. We prove that this mechanism guarantees recovery of the true optimal policy, overcoming the limitations of prior heuristic weighting strategies. POW is architecture-agnostic and can be seamlessly integrated into existing value factorization algorithms. Extensive experiments on matrix games, difficulty-enhanced predator-prey tasks, SMAC, SMACv2, and a highway-env intersection scenario show that POW substantially improves stability and consistently surpasses state-of-the-art value-based MARL methods.
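The expressiveness problem can be illustrated with a small matrix game. The sketch below is not POW. It uses a hypothetical 3x3 payoff matrix and an additive factorization Q(a, b) ≈ u_a + v_b (a simple special case of a monotonic factorization) fitted with uniform weights over all joint actions. For a fully observed matrix, the unweighted least-squares additive fit ranks each agent's actions by their row or column means, so the decentralized greedy choice can be computed directly from those means.

```python
def additive_greedy_action(payoff):
    """Greedy joint action under an unweighted least-squares additive fit.

    For a complete payoff matrix, the least-squares fit of u_a + v_b ranks
    agent 1's actions by row means and agent 2's actions by column means.
    """
    n_rows, n_cols = len(payoff), len(payoff[0])
    row_means = [sum(row) / n_cols for row in payoff]
    col_means = [sum(payoff[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    best_row = max(range(n_rows), key=row_means.__getitem__)
    best_col = max(range(n_cols), key=col_means.__getitem__)
    return best_row, best_col


# Hypothetical cooperative game: the optimum (0, 0) is surrounded by penalties.
payoff = [
    [8.0, -12.0, -12.0],
    [-12.0, 0.0, 0.0],
    [-12.0, 0.0, 0.0],
]

greedy = additive_greedy_action(payoff)
greedy_payoff = payoff[greedy[0]][greedy[1]]
optimal_payoff = max(max(row) for row in payoff)
```

With uniform weights the penalties around the optimum drag down the mean value of action 0 for both agents, so the factorized greedy policy settles on a joint action with payoff 0 instead of 8. Placing more training weight on the joint actions that may be optimal is the general idea that weighting schemes of this kind build on.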


2503.06525 2026-06-04 cs.CY cs.AI 版本更新

From Motion Signals to Insights: A Unified Framework for Student Behavior Analysis and Feedback in Physical Education Classes

从运动信号到洞察：体育课堂学生行为分析与反馈的统一框架

Xian Gao, Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Zongyun Zhang, Ting Liu, Yuzhuo Fu

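Affiliations: Shanghai Jiao Tong University

AI summary: Proposes a unified end-to-end framework based on motion signals and large language models for analyzing student behavior in physical education classes, automatically generating teaching insights and suggestions for improvement.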

Comments Work in progress

详情

AI中文摘要

在教育场景中分析学生行为对于提高教学质量和学生参与度至关重要。现有的基于AI的模型通常依赖课堂视频录像来识别和分析学生行为。虽然这些基于视频的方法可以部分捕捉和分析学生动作，但在体育课堂中，由于活动在户外开放空间进行且活动多样，它们难以准确跟踪每个学生的动作，并且难以泛化到这些场景中涉及的专业技术动作。此外，当前方法通常缺乏整合专业教学知识的能力，限制了它们提供深入的学生行为洞察和优化教学设计反馈的能力。为了解决这些限制，我们提出了一个统一的端到端框架，该框架利用基于运动信号的人类活动识别技术，结合先进的大型语言模型，对体育课堂中的学生行为进行更详细的分析和反馈。我们的框架从教师的教学设计和学生在体育课期间的运动信号开始，最终生成带有教学洞察和改进建议的自动化报告，以优化学习和课堂教学。该解决方案提供了一种基于运动信号的方法，用于分析学生行为并优化针对体育课堂的教学设计。实验结果表明，我们的框架能够准确识别学生行为并产生有意义的教学洞察。

英文摘要

Analyzing student behavior in educational scenarios is crucial for enhancing teaching quality and student engagement. Existing AI-based models often rely on classroom video footage to identify and analyze student behavior. While these video-based methods can partially capture and analyze student actions, they struggle to accurately track each student's actions in physical education classes, which take place in outdoor, open spaces with diverse activities, and they generalize poorly to the specialized technical movements involved in these settings. Furthermore, current methods typically cannot integrate specialized pedagogical knowledge, which limits their ability to provide in-depth insights into student behavior and offer feedback for optimizing instructional design. To address these limitations, we propose a unified end-to-end framework that combines human activity recognition based on motion signals with advanced large language models to analyze student behavior in physical education classes in more detail and provide feedback on it. Our framework begins with the teacher's instructional designs and the motion signals from students during physical education sessions, and ultimately generates automated reports with teaching insights and suggestions for improving both learning and class instruction. This solution provides a motion signal-based approach for analyzing student behavior and optimizing instructional design tailored to physical education classes. Experimental results demonstrate that our framework can accurately identify student behaviors and produce meaningful pedagogical insights.


2407.03884 2026-06-04 cs.CL cs.AI 版本更新

ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue Agents

ChatSOP: 一种SOP引导的MCTS规划框架，用于可控的LLM对话代理

Zhigen Li, Jianxiang Peng, Yanmeng Wang, Yong Cao, Tianhao Shen, Minghui Zhang, Linxi Su, Shang Wu, Yihang Wu, Yuqian Wang, Ye Wang, Wei Hu, Jianfeng Li, Shaojun Wang, Jing Xiao, Deyi Xiong

Affiliations: TJUNLP Lab, College of Intelligence and Computing, Tianjin University; Ping An Technology; Tübingen AI Center, University of Tübingen; Kunming University of Science and Technology

AI summary: Proposes ChatSOP, which uses SOP-guided Monte Carlo Tree Search to make LLM dialogue agents more controllable, improving action accuracy by 27.95% over a GPT-3.5 baseline.

Comments Accepted to ACL 2025 main

详情

DOI: 10.18653/v1/2025.acl-long.863
Journal ref: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 17637-17659, 2025

AI中文摘要

由大型语言模型驱动的对话代理在各种任务中表现出优越的性能。尽管它们能更好地理解用户并生成类人回复，但**缺乏可控性**仍然是一个关键挑战，常常导致对话偏离主题或任务失败。为了解决这个问题，我们引入标准操作程序来规范对话流程。具体来说，我们提出了**ChatSOP**，一种新颖的SOP引导的蒙特卡洛树搜索规划框架，旨在增强LLM驱动的对话代理的可控性。为此，我们整理了一个数据集，包含使用GPT-4o的半自动角色扮演系统生成的、经过严格人工质量控制验证的SOP标注的多场景对话。此外，我们提出了一种新方法，将思维链推理与监督微调相结合用于SOP预测，并利用SOP引导的蒙特卡洛树搜索在对话中进行最优动作规划。实验结果表明了我们方法的有效性，例如，与基于GPT-3.5的基线模型相比，动作准确率提高了27.95%，并且在开源模型上也显示出显著的提升。数据集和代码已公开。

英文摘要

Dialogue agents powered by Large Language Models (LLMs) show superior performance in various tasks. Despite their better user understanding and human-like responses, their **lack of controllability** remains a key challenge, often leading to unfocused conversations or task failure. To address this, we introduce Standard Operating Procedures (SOPs) to regulate dialogue flow. Specifically, we propose **ChatSOP**, a novel SOP-guided Monte Carlo Tree Search (MCTS) planning framework designed to enhance the controllability of LLM-driven dialogue agents. To enable this, we curate a dataset of SOP-annotated multi-scenario dialogues, generated using a semi-automated role-playing system with GPT-4o and validated through strict manual quality control. Additionally, we propose a novel method that integrates Chain of Thought reasoning with supervised fine-tuning for SOP prediction and uses SOP-guided MCTS for optimal action planning during dialogues. Experimental results demonstrate the effectiveness of our method: it achieves a 27.95% improvement in action accuracy over baseline models based on GPT-3.5 and also shows notable gains for open-source models. The dataset and code are publicly available.


2408.11121 2026-06-04 cs.LG cs.AI cs.CL cs.CR 版本更新

DOMBA: Double Model Balancing for Access-Controlled Language Models via Minimum-Bounded Aggregation

DOMBA: 通过最小有界聚合实现访问控制语言模型的双模型平衡

Tom Segal, Asaf Shabtai, Yuval Elovici

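Affiliations: Ben-Gurion University

AI summary: Proposes DOMBA, which uses a min-bounded average function to aggregate the probability distributions of two language models trained on documents with different access levels, achieving high utility while maintaining security.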

Comments Code: https://github.com/ppo1/DOMBA 11 pages, 3 figures

详情

DOI: 10.1609/aaai.v39i23.34695
Journal ref: Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, pp. 25101-25109, 2025

AI中文摘要

大型语言模型（LLMs）的实用性在很大程度上取决于其训练数据的质量和数量。许多组织拥有大量数据语料库，可用于训练或微调针对其特定需求的LLMs。然而，这些数据集通常带有基于用户权限并由访问控制机制强制执行的访问限制。在此类数据集上训练LLMs可能导致敏感信息暴露给未经授权的用户。防止此类暴露的一种直接方法是为每个访问级别训练一个单独的模型。然而，由于每个模型的训练数据量相对于整个组织语料库的总量有限，这可能导致模型效用低下。另一种方法是在所有数据上训练单个LLM，同时限制未经授权信息的暴露。然而，当前针对LLMs的暴露限制方法对于访问控制数据无效，因为敏感信息在多个训练样本中频繁出现。我们提出DOMBA——双模型平衡——一种训练和部署LLMs的简单方法，可在提供高效用和访问控制功能的同时保证安全性。DOMBA使用“最小有界”平均函数（一个受较小值约束的函数，例如调和平均）聚合两个模型的概率分布，每个模型在具有（可能多个）不同访问级别的文档上训练。详细的数学分析和广泛评估表明，DOMBA在保护受限信息的同时，提供了与非安全模型相当的效用。

英文摘要

The utility of large language models (LLMs) depends heavily on the quality and quantity of their training data. Many organizations possess large data corpora that could be leveraged to train or fine-tune LLMs tailored to their specific needs. However, these datasets often come with access restrictions that are based on user privileges and enforced by access control mechanisms. Training LLMs on such datasets could result in exposure of sensitive information to unauthorized users. A straightforward approach for preventing such exposure is to train a separate model for each access level. This, however, may result in low utility models due to the limited amount of training data per model compared to the amount in the entire organizational corpus. Another approach is to train a single LLM on all the data while limiting the exposure of unauthorized information. However, current exposure-limiting methods for LLMs are ineffective for access-controlled data, where sensitive information appears frequently across many training examples. We propose DOMBA - double model balancing - a simple approach for training and deploying LLMs that provides high utility and access-control functionality with security guarantees. DOMBA aggregates the probability distributions of two models, each trained on documents with (potentially many) different access levels, using a "min-bounded" average function (a function that is bounded by the smaller value, e.g., harmonic mean). A detailed mathematical analysis and extensive evaluation show that DOMBA safeguards restricted information while offering utility comparable to non-secure models.
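As a rough illustration of aggregation with a "min-bounded" average, the sketch below combines two next-token distributions with the harmonic mean, the example named in the abstract. The renormalization step, the function names, and the toy distributions are assumptions for illustration and are not taken from the DOMBA code. The harmonic mean of two non-negative values never exceeds twice the smaller one, so a token that either model considers very unlikely stays unlikely in the combined distribution.

```python
def harmonic_mean(a, b):
    """Harmonic mean of two non-negative numbers (0 if either is 0)."""
    if a == 0.0 or b == 0.0:
        return 0.0
    return 2.0 * a * b / (a + b)


def aggregate(dist_a, dist_b):
    """Combine two distributions token by token, then renormalize (assumed step)."""
    raw = [harmonic_mean(a, b) for a, b in zip(dist_a, dist_b)]
    total = sum(raw)
    if total == 0.0:
        raise ValueError("the two distributions share no supported token")
    return [r / total for r in raw]


# Hypothetical next-token probabilities over a three-token vocabulary.
# Token 0 is likely only under model A (e.g., it reflects data model B never saw).
model_a = [0.70, 0.25, 0.05]
model_b = [0.001, 0.599, 0.40]

combined = aggregate(model_a, model_b)
```

In this toy case token 0 falls from 0.70 under model A to well under 0.01 after aggregation, while the tokens both models support keep most of the probability mass.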


2411.19758 2026-06-04 cs.CV cs.AI cs.LG 版本更新

LaVIDE: Language-Prompted Satellite Change Detection via Map-Image Alignment

LaVIDE: 通过地图-图像对齐的语言提示卫星变化检测

Shuguo Jiang, Fang Xu, Chuandong Liu, Hong Tan, Shengyang Li, Lei Yu, Wen Yang, Sen Jia, Gui-Song Xia

Affiliations: School of Computer Science, Wuhan University; School of Artificial Intelligence, Wuhan University; Technology and Engineering Center for Space Utilization and the Key Laboratory of Space Utilization, Chinese Academy of Sciences; School of Aeronautics and Astronautics, University of Chinese Academy of Sciences; School of Electronic Information, Wuhan University; College of Computer Science and Software Engineering, Shenzhen University

AI summary: Proposes LaVIDE, which uses restricted prompt learning and object-aware embedding enhancement, with language bridging the semantic gap between high-level map categories and low-level image details for cross-modal alignment. It improves IoU by 18.4% and 5.2% on multi-class and single-class change detection tasks, respectively.

详情

AI中文摘要

基于地图参考和最新图像的遥感变化检测，在缺乏早期图像进行比较时，有助于及时观测地球表面。然而，高层地图类别与低层图像细节之间的语义鸿沟阻碍了提取同质特征以进行稳健的时间关联。与比较像素级视觉相似性或传播分割误差的传统方法不同，我们提出了一种新颖框架——LaVIDE（用于检测变化的语言-视觉判别器），该框架以语言为中介，弥合了高层地图类别与低层图像细节之间的语义鸿沟。具体来说，我们引入了受限提示学习来生成上下文感知的文本提示，使地图语义与图像内容对齐，并采用对象感知嵌入增强策略将对象级属性（如形状、边界）整合到地图表示中。这些组件能够在统一的语言-视觉特征空间中实现稳健的跨模态对齐。在四个基准数据集（DynamicEarthNet、HRSCD、BANDON和SECOND）上的大量实验表明，LaVIDE以显著优势超越了最先进的方法，在多类和单类变化检测任务上分别实现了18.4%和5.2%的IoU提升。我们的框架不仅提高了地图-图像变化检测的准确性，还为以最少人工干预快速更新地图提供了实用解决方案，有望在城市规划、灾害评估和生态保护等领域产生广泛影响。代码和数据集可在 https://github.com/ShuGuoJ/LAVIDE.git 获取。

英文摘要

Remote sensing change detection based on a map reference and an up-to-date image boosts timely observation of the Earth's surface when earlier images are lacking for comparison. However, the semantic gap between high-level map categories and low-level image details hinders the extraction of homogeneous features for robust temporal association in change detection. Unlike conventional approaches that either compare pixel-level visual similarity or propagate segmentation errors, we propose a novel framework, Language-VIsion Discriminator for dEtecting changes (LaVIDE), which bridges the semantic gap between high-level map categories and low-level image details using language as an intermediary. Specifically, we introduce restricted prompt learning to generate context-aware textual prompts that align map semantics with image content, and an object-aware embedding enhancement strategy to integrate object-level attributes (e.g., shape, boundary) into map representations. These components enable robust cross-modal alignment within a unified language-vision feature space. Extensive experiments on four benchmarks, DynamicEarthNet, HRSCD, BANDON, and SECOND, demonstrate that LaVIDE outperforms state-of-the-art methods by significant margins, achieving 18.4% and 5.2% improvements in IoU on multi-class and single-class change detection tasks, respectively. Our framework not only advances the accuracy of map-image change detection but also provides a practical solution for rapid map updating with minimal human intervention, promising broad impacts in urban planning, disaster assessment, and ecological conservation. Code and datasets are available at: https://github.com/ShuGuoJ/LAVIDE.git.


2407.13922 2026-06-04 cs.CV cs.AI cs.LG 版本更新

CounterFace: A Synthetic Face Dataset for Fine-Grained Counterfactual Evaluation of Face Recognition Systems

CounterFace: 用于人脸识别系统细粒度反事实评估的合成人脸数据集

Guruprasad Viswanathan Ramesh, Ashish Hooda, Shimaa Ahmed, Harrison J Rosenberg, Ramya Korlakai Vinayak, Kassem Fawaz

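Affiliations: University of Wisconsin-Madison; Visa Research

AI summary: Proposes the CounterFace dataset, whose fully automated pipeline generates 11,821 counterfactual face pairs covering 20 facial attributes and 8 demographic factors, for fine-grained evaluation of how face recognition systems degrade under specific attribute-demographic combinations.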

Comments Code available at https://github.com/Guruprasad68/counterface_facct2026. Dataset available for non-commercial research upon request

详情

DOI: 10.1145/3805689.3812410

AI中文摘要

人脸识别系统广泛应用于关键应用，因此其在不同人群和条件下的可靠性和鲁棒性至关重要。人脸识别系统的标准评估通常依赖LFW等数据集来估计平均识别准确率。一些基准测试也捕捉了粗粒度的身份内变化，如老化、姿态和光照。然而，人脸存在更细粒度的变化，包括发型和化妆等外观变化，这些在现有基准测试中代表性不足。反事实评估提供了一种在细粒度变化下评估人脸识别鲁棒性的方法。然而，现有使用图像生成器合成的反事实人脸数据集由于在流程中使用人工验证，属性覆盖范围有限。我们提出CounterFace，一个新的反事实评估数据集，包含20种面部属性和8种人口统计因素，超过先前合成人脸数据集14种属性和2种人口统计因素。该数据集使用基于现成图像生成器和自定义验证器的全自动流水线生成，无需人工验证。CounterFace包含11,821个反事实人脸对，事后用户研究证实了生成反事实的忠实性。我们评估了两个商业和四个开源人脸识别系统（AWS Rekognition、Face++、AdaFace、MagFace、ArcFace、FaceNet）在160种属性-人口统计组合上的性能。与标准评估基准不同，我们的数据集有助于隔离单个系统的精确故障模式。结果表明，所有六个系统的性能退化因属性和人口统计而异，遮挡属性（如口罩和胡须）普遍降低性能。

英文摘要

Face recognition (FR) systems are widely deployed in critical applications, making their reliability and robustness across diverse populations and conditions essential. Standard evaluation of FR systems typically relies on datasets such as LFW to estimate average recognition accuracy. Some benchmarks also capture coarse-grained intra-identity variations such as aging, pose, and lighting. However, human faces undergo more fine-grained changes, including appearance changes such as hairstyles and makeup, that are underrepresented in existing benchmarks. Counterfactual evaluation provides a method to assess FR robustness under such fine-grained variations. Existing counterfactual face datasets synthesized with image generators, however, are limited in attribute coverage because their pipelines rely on human verification. We propose CounterFace, a new counterfactual evaluation dataset comprising 20 facial attributes and 8 demographic factors, exceeding prior synthetic face datasets by 14 attributes and 2 demographics. The dataset is generated using a fully automated pipeline based on off-the-shelf image generators with custom verifiers, removing the need for human verification. CounterFace contains 11,821 counterfactual face pairs, and a post-hoc user study confirms the faithfulness of the generated counterfactuals. We evaluate two commercial and four open-source FR systems (AWS Rekognition, Face++, AdaFace, MagFace, ArcFace, FaceNet) across 160 attribute-demographic combinations. Unlike standard evaluation benchmarks, our dataset helps isolate precise failure modes of individual systems. Results indicate that performance degradation varies across attributes and demographics for all six systems, and that occluding attributes (e.g., facemask and facial hair) universally degrade performance.


1709.09480 2026-06-04 cs.AI cs.LG cs.SY eess.SY 版本更新

A Benchmark Environment Motivated by Industrial Control Problems

由工业控制问题启发的基准环境

Daniel Hein, Stefan Depeweg, Michel Tokic, Steffen Udluft, Alexander Hentschel, Thomas A. Runkler, Volkmar Sterzing

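Affiliations: Siemens AG, Corporate Technology

AI summary: Proposes a benchmark environment motivated by industrial control problems to address the gap between real industrial settings and existing artificial benchmarks. The paper describes the benchmark's dynamics in detail and identifies prototypical experimental settings to support the improvement of reinforcement learning methods.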

详情

DOI: 10.1109/SSCI.2017.8280935
Journal ref: 2017 IEEE Symposium Series on Computational Intelligence (SSCI)

AI中文摘要

在强化学习（RL）研究领域，频繁出现新的有前景的方法被开发并引入RL社区。然而，尽管许多研究人员渴望将他们的方法应用于现实世界的问题，但在真实工业环境中实施这些方法往往是一个令人沮丧和繁琐的过程。通常，学术研究小组只能有限地访问真实工业数据和应用。因此，新方法通常通过使用人工软件基准来开发、评估和比较。一方面，这些基准旨在提供可解释的RL训练场景和对所用方法学习过程的深入见解。另一方面，它们通常与现实工业应用缺乏相似性。为此，我们利用行业经验设计了一个基准，以弥合自由可用、文档齐全且有动机的人工基准与真实工业问题属性之间的差距。所得到的工业基准（IB）已通过在GitHub上发布其Java和Python代码，包括一个OpenAI Gym包装器，向RL社区公开。在本文中，我们详细阐述了IB的动力学，并识别了能够捕捉现实世界工业控制问题中常见情况的典型实验设置。

英文摘要

In reinforcement learning (RL) research, novel and promising methods are frequently developed and introduced to the RL community. However, although many researchers are keen to apply their methods to real-world problems, implementing such methods in real industrial environments is often a frustrating and tedious process. Generally, academic research groups have only limited access to real industrial data and applications. For this reason, new methods are usually developed, evaluated, and compared using artificial software benchmarks. On the one hand, these benchmarks are designed to provide interpretable RL training scenarios and detailed insight into the learning process of the method at hand. On the other hand, they usually do not share much similarity with real-world industrial applications. We therefore used our industry experience to design a benchmark that bridges the gap between freely available, documented, and motivated artificial benchmarks and the properties of real industrial problems. The resulting industrial benchmark (IB) has been made publicly available to the RL community by publishing its Java and Python code, including an OpenAI Gym wrapper, on GitHub. In this paper we motivate and describe in detail the IB's dynamics and identify prototypical experimental settings that capture common situations in real-world industrial control problems.


2006.04013 2026-06-04 cs.CY cs.AI cs.LG 版本更新

AI from concrete to abstract: demystifying artificial intelligence to the general public

从具体到抽象的人工智能：向公众揭秘人工智能

Rubens Lacerda Queiroz, Fábio Ferrentini Sampaio, Cabral Lima, Priscila Machado Vieira Lima

Affiliations: Federal University of Rio de Janeiro – UFRJ – Brazil; InovLabs – Portugal; Atlantica University – Portugal; PESC/COPPE; Tercio Pacitti Institute (NCE)

AI summary: Proposes AIcon2abs, a new method that combines visual programming with WiSARD weightless artificial neural networks. Through hands-on development of learning machines and observation of their learning process, it helps the general public (including children) understand basic AI concepts.

Comments 23 pages; 2 tables; 47 figures; review comment: Included references for the final published peer-reviewed version of this pre-print: https://doi.org/10.1007/s00146-021-01151-x and https://rdcu.be/cihdO; typos corrected

详情

DOI: 10.1007/s00146-021-01151-x
Journal ref: AI & SOCIETY, 36 877-893 (2021)

AI中文摘要

人工智能（AI）已被广泛应用于众多领域，这表明迫切需要开发手段，使普通大众对AI的含义有最基本的理解。本文结合可视化编程与WiSARD无权重人工神经网络，提出了一种新方法——从具体到抽象的人工智能（AIcon2abs），使普通人（包括儿童）能够实现这一目标。该方法的主要策略是通过与学习机器开发相关的实践活动，以及观察其学习过程，来促进对人工智能的去神秘化。因此，它能够使受训者获得技能，从而在涉及采用人工智能机制的辩论和决策中成为有洞察力的参与者。目前，通过编程教授基本AI概念的现有方法将机器智能视为外部元素/模块。经过训练后，该外部模块被耦合到学习者正在开发的主应用程序中。而在本文提出的方法中，训练和分类任务都是构成主程序的模块，就像其他编程结构一样。作为AIcon2abs的一个有益副作用，能够从数据中学习的程序与常规计算机程序之间的区别变得更加明显。此外，WiSARD无权重人工神经网络模型的简单性使得训练和分类任务的内部实现易于可视化和理解。

英文摘要

Artificial Intelligence (AI) has been adopted in a wide range of domains. This shows the imperative need to develop means of giving the general public a minimum understanding of what AI means. Combining visual programming and WiSARD weightless artificial neural networks, this article presents a new methodology, AI from concrete to abstract (AIcon2abs), to enable the general public (including children) to achieve this goal. The main strategy adopted is to demystify artificial intelligence through practical activities related to the development of learning machines, as well as through observation of their learning process. Thus, it is possible to provide subjects with skills that contribute to making them insightful actors in debates and decisions involving the adoption of artificial intelligence mechanisms. Currently, existing approaches to teaching basic AI concepts through programming treat machine intelligence as an external element/module. After being trained, that external module is coupled to the main application being developed by the learners. In the methodology presented here, both training and classification tasks are blocks that compose the main program, just like the other programming constructs. As a beneficial side effect of AIcon2abs, the difference between a program capable of learning from data and a conventional computer program becomes more evident. In addition, the simplicity of the WiSARD weightless artificial neural network model makes the internal workings of the training and classification tasks easy to visualize and understand.

URL PDF HTML ☆

赞 0 踩 0

1710.05465 2026-06-04 cs.AI cs.RO cs.SY eess.SY 版本更新

Flow: A Modular Learning Framework for Mixed Autonomy Traffic

Flow: 一种用于混合自主性的模块化学习框架

Cathy Wu, Aboudy Kreidieh, Kanaad Parvate, Eugene Vinitsky, Alexandre M Bayen

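Affiliations: Laboratory for Information and Decision Systems, Massachusetts Institute of Technology; Institute of Data, Systems, and Society, Massachusetts Institute of Technology; Department of Mechanical Engineering, University of California, Berkeley

AI summary: This paper presents a modular learning framework that uses deep reinforcement learning to handle complex traffic dynamics. In terms of system-level velocity, the learned control laws improve on human driving performance by up to 57% with only 4-7% adoption of autonomous vehicles. In single-lane traffic, a small neural network control law using only local observations eliminates congestion and achieves near-optimal performance.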

Comments 17 pages, 8 figures, 5 tables. 2021 IEEE Transactions on Robotics (T-RO)

DOI: 10.1109/TRO.2021.3087314

AI中文摘要

自动驾驶车辆（AVs）的快速发展为交通系统带来了巨大的潜力，通过提高安全性和效率以及出行可及性。然而，随着AVs的采用，这些影响的发展进程并不清楚。从分析部分自动驾驶的总体目标来看，出现了许多技术挑战：部分控制和观测、多车辆交互以及现实世界网络所代表的大量场景。为了深入了解近期AV的影响，本文研究了深度强化学习（RL）在低AV采用率环境下克服这些挑战的适用性。本文提出了一种模块化学习框架，利用深度RL来处理复杂的交通动态。模块由多个部分组成，以捕捉常见的交通现象（如停止-启动交通拥堵、车道变换、交叉口）。学习到的控制法则在系统层面的速度上优于人类驾驶性能，仅在4-7%的AVs参与度下，提高了高达57%。此外，在单车道交通中，一个仅使用局部观测的小型神经网络控制法则被发现能够消除停止-启动交通现象，超越了所有已知的基于模型的控制器，达到近最优性能，并且能够推广到非分布交通密度。

英文摘要

The rapid development of autonomous vehicles (AVs) holds vast potential for transportation systems through improved safety, efficiency, and access to mobility. However, the progression of these impacts, as AVs are adopted, is not well understood. Numerous technical challenges arise from the goal of analyzing the partial adoption of autonomy: partial control and observation, multi-vehicle interactions, and the sheer variety of scenarios represented by real-world networks. To shed light on near-term AV impacts, this article studies the suitability of deep reinforcement learning (RL) for overcoming these challenges in a low AV-adoption regime. A modular learning framework is presented, which leverages deep RL to address complex traffic dynamics. Modules are composed to capture common traffic phenomena (stop-and-go traffic jams, lane changing, intersections). Learned control laws are found to improve upon human driving performance, in terms of system-level velocity, by up to 57% with only 4-7% adoption of AVs. Furthermore, in single-lane traffic, a small neural network control law with only local observation is found to eliminate stop-and-go traffic, surpassing all known model-based controllers to achieve near-optimal performance, and to generalize to out-of-distribution traffic densities.

1811.01220 2026-06-04 math.OC cs.AI cs.CC cs.NA math.NA 版本更新

Sharp worst-case evaluation complexity bounds for arbitrary-order nonconvex optimization with inexpensive constraints

任意阶非凸优化的最坏情况评估复杂度界限

Coralia Cartis, Nick I. M. Gould, Philippe L. Toint

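Affiliations: Mathematical Institute, Oxford University; Computational Mathematics Group, STFC-Rutherford Appleton Laboratory; Namur Center for Complex Systems (naXys), University of Namur

AI summary: This paper studies worst-case evaluation complexity bounds for arbitrary-order nonconvex optimization with inexpensive constraints. It presents a conceptual regularization algorithm that, for a given accuracy level and optimality order, computes a suitably approximate $q$-th order minimizer using at most $O(ε^{-\frac{p+1}{p-q+1}})$ evaluations of the objective function and its derivatives.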

Comments 30 pages

Journal ref: SIAM Journal on Optimization, vol. 30(1), pp. 513-541, 2020

AI中文摘要

我们为非凸最小化问题提供了精确的最坏情况评估复杂度界限，这些问题是具有通用低成本约束的问题，即约束的评估/执行成本相对于目标函数的评估成本可以忽略不计。这些界限统一、扩展或改进了所有已知的无约束和凸约束问题的上界和下界复杂度界限。证明了，在给定精度水平ε，最高可用Lipschitz连续导数阶数p和期望最优阶数q（介于1和p之间）的情况下，一个概念性正则化算法需要不超过O(ε^(- (p+1)/(p-q+1)))次目标函数及其导数的评估，以计算一个合适的q阶近似极小值点。通过适当选择正则化，如果p阶导数仅仅是Hölder连续而非Lipschitz连续，则也得出类似的结果。我们提供了一个例子，说明上述复杂度界限对于无约束和广泛类别的约束问题都是精确的，并且从最坏情况复杂度的角度解释了正则化方法的最优性，限于一大类使用相同导数信息的算法。

英文摘要

We provide sharp worst-case evaluation complexity bounds for nonconvex minimization problems with general inexpensive constraints, i.e., problems where the cost of evaluating/enforcing the (possibly nonconvex or even disconnected) constraints, if any, is negligible compared to that of evaluating the objective function. These bounds unify, extend or improve all known upper and lower complexity bounds for unconstrained and convexly-constrained problems. It is shown that, given an accuracy level $ε$, a degree of highest available Lipschitz continuous derivatives $p$ and a desired optimality order $q$ between one and $p$, a conceptual regularization algorithm requires no more than $O(ε^{-\frac{p+1}{p-q+1}})$ evaluations of the objective function and its derivatives to compute a suitably approximate $q$-th order minimizer. With an appropriate choice of the regularization, a similar result also holds if the $p$-th derivative is merely Hölder rather than Lipschitz continuous. We provide an example that shows that the above complexity bound is sharp for unconstrained and a wide class of constrained problems. We also give reasons for the optimality of regularization methods from a worst-case complexity point of view, within a large class of algorithms that use the same derivative information.
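
As a quick check of how the bound in the abstract specializes, the exponent $\frac{p+1}{p-q+1}$ can be evaluated for a few small values of $p$ and $q$. The cases below are plain arithmetic on the stated formula, not additional results from the paper:

$$
\begin{aligned}
p = 1,\ q = 1 &: \quad O(ε^{-2}) \\
p = 2,\ q = 1 &: \quad O(ε^{-3/2}) \\
p = 2,\ q = 2 &: \quad O(ε^{-3})
\end{aligned}
$$

Since $\frac{p+1}{p-q+1} = 1 + \frac{q}{p-q+1}$, the exponent decreases as $p$ grows for fixed $q$ and increases with $q$ for fixed $p$.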

1902.02311 2026-06-04 cs.MA cs.AI cs.LG cs.SY eess.SY 版本更新

Decentralized Multi-Agents by Imitation of a Centralized Controller

通过模仿集中控制器实现去中心化多智能体

Alex Tong Lin, Mark J. Debord, Katia Estabridis, Gary Hewer, Guido Montufar, Stanley Osher

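Affiliations: UCLA; Max Planck Institute, Leipzig; University of California, Los Angeles

AI summary: This paper proposes a new algorithm within the centralized-training, decentralized-execution framework that uses imitation learning to produce decentralized multi-agents, addressing cooperation in multi-agent reinforcement learning under non-stationary, partially observable environments.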

AI中文摘要

我们考虑了一个多智能体强化学习问题，其中每个智能体试图在与其他智能体交互时最大化共享奖励，且可能无法通信。通常，智能体无法访问其他智能体的策略，因此每个智能体都处于非平稳和部分可观测的环境中。为了获得去中心化作用的多智能体，我们引入了一种新的算法，该算法基于流行的集中训练、去中心执行框架。该训练框架首先通过单一集中联合空间学习者解决多智能体问题，然后用于指导模仿学习以生成独立的去中心化多智能体。该框架具有灵活性，可以使用任何强化学习算法来获得专家，以及任何模仿学习算法来获得去中心化智能体。这与其它多智能体学习算法不同，例如可能需要更具体的结构。我们为该方法提供了一些理论界限，并展示了通过模仿学习可以获得多智能体问题的去中心化解决方案。

英文摘要

We consider a multi-agent reinforcement learning problem where each agent seeks to maximize a shared reward while interacting with other agents, and they may or may not be able to communicate. Typically the agents do not have access to other agent policies and thus each agent is situated in a non-stationary and partially-observable environment. In order to obtain multi-agents that act in a decentralized manner, we introduce a novel algorithm under the popular framework of centralized training, but decentralized execution. This training framework first obtains solutions to a multi-agent problem with a single centralized joint-space learner, which is then used to guide imitation learning for independent decentralized multi-agents. This framework has the flexibility to use any reinforcement learning algorithm to obtain the expert as well as any imitation learning algorithm to obtain the decentralized agents. This is in contrast to other multi-agent learning algorithms that, for example, can require more specific structures. We present some theoretical bounds for our method, and we show that one can obtain decentralized solutions to a multi-agent problem through imitation learning.

1701.00178 2026-06-04 math.OC cs.AI cs.LG cs.SY eess.SY stat.ML 版本更新

Lazily Adapted Constant Kinky Inference for Nonparametric Regression and Model-Reference Adaptive Control

惰性适应的常数Kinky推断用于非参数回归和模型参考自适应控制

Jan-Peter Calliess

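Affiliations: Dept. of Engineering Science, University of Oxford, UK

AI summary: This paper proposes lazily adapted constant kinky inference for nonparametric regression and model-reference adaptive control. It estimates the Hölder constant online and establishes strong universal approximation guarantees, showing that any continuous function can be learned in the limit of increasingly dense data.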

AI中文摘要

非线性集合成员预测、Lipschitz插值或Kinky推断是机器学习中利用预设Lipschitz性质来计算未观测函数值推断的方法。在已知目标函数真实最佳Lipschitz常数的上界时，这些方法提供收敛保证和预测的界限。考虑一个更一般的设置，该设置基于相对于伪度量的Hölder连续性，我们提出了一种在线方法，用于从可能受有界观测误差影响的函数值观测中估计Hölder常数。利用此方法在Kinky推断规则中计算自适应参数，从而得到一种非参数机器学习方法，我们为此建立了强通用逼近保证。也就是说，我们证明我们的预测规则在数据越来越密集的情况下，可以学习任意连续函数，其最坏误差界取决于观测不确定性水平。我们在非参数模型参考自适应控制（MRAC）的背景下应用了我们的方法。在一系列模拟飞机滚动动力学和性能指标中，我们的方法优于基于高斯过程和RBF神经网络最近提出的方法。对于离散时间系统，我们为我们的基于学习的控制器在批量学习和在线学习设置下的跟踪成功率提供了保证。

英文摘要

Techniques known as Nonlinear Set Membership prediction, Lipschitz Interpolation or Kinky Inference are approaches to machine learning that utilise presupposed Lipschitz properties to compute inferences over unobserved function values. Provided a bound on the true best Lipschitz constant of the target function is known a priori, they offer convergence guarantees as well as bounds around the predictions. Considering a more general setting that builds on Hölder continuity relative to pseudo-metrics, we propose an online method for estimating the Hölder constant from function value observations that may be corrupted by bounded observational errors. Utilising this to compute adaptive parameters within a kinky inference rule gives rise to a nonparametric machine learning method, for which we establish strong universal approximation guarantees. That is, we show that our prediction rule can learn any continuous function in the limit of increasingly dense data to within a worst-case error bound that depends on the level of observational uncertainty. We apply our method in the context of nonparametric model-reference adaptive control (MRAC). Across a range of simulated aircraft roll-dynamics and performance metrics, our approach outperforms recently proposed alternatives that were based on Gaussian processes and RBF-neural networks. For discrete-time systems, we provide guarantees on the tracking success of our learning-based controllers both for the batch and the online learning setting.
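
The Lipschitz interpolation idea behind kinky inference can be sketched in a few lines. The version below is a deliberately simplified illustration, assuming one-dimensional inputs, the absolute-value metric, a Lipschitz (not general Hölder) constant, and noise-free observations; the function names `estimate_lipschitz` and `kinky_predict` are hypothetical, and the paper's estimator additionally handles bounded observational errors and pseudo-metrics.

```python
def estimate_lipschitz(xs, fs):
    """Smallest constant consistent with the data: the largest observed slope."""
    best = 0.0
    for i in range(len(xs)):
        for j in range(i + 1, len(xs)):
            dist = abs(xs[i] - xs[j])
            if dist > 0:
                best = max(best, abs(fs[i] - fs[j]) / dist)
    return best


def kinky_predict(x, xs, fs, lipschitz):
    """Return (prediction, lower bound, upper bound) at query point x.

    Every sample (xi, fi) confines the unknown function at x to the interval
    [fi - L|x - xi|, fi + L|x - xi|]; the prediction is the midpoint of the
    intersection of these intervals.
    """
    upper = min(f + lipschitz * abs(x - xi) for xi, f in zip(xs, fs))
    lower = max(f - lipschitz * abs(x - xi) for xi, f in zip(xs, fs))
    return 0.5 * (upper + lower), lower, upper


xs = [0.0, 1.0, 2.0]
fs = [0.0, 1.0, 0.0]
L = estimate_lipschitz(xs, fs)          # re-estimated whenever new data arrive
print(kinky_predict(0.5, xs, fs, L))    # tight bounds between samples
print(kinky_predict(3.0, xs, fs, L))    # wider bounds when extrapolating
```

Re-running `estimate_lipschitz` as observations accumulate is the "lazy adaptation" in spirit: the constant only grows when the data force it to.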

1607.01202 2026-06-04 eess.SY cs.AI cs.RO cs.SY 版本更新

Optimal control for a robotic exploration, pick-up and delivery problem

机器人探索、拾取和配送问题的最优控制

Vladislav Nenchev, Christos G. Cassandras, Jörg Raisch

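AI summary: This paper studies the optimal control problem of a robot that must find and collect a finite number of objects and deliver them to a depot in minimum time. It solves the optimal control problem for this hybrid system approximately with a receding horizon scheme and proposes an event-driven approach based on motion parameterization and gradient-based optimization to improve computational efficiency.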

Comments 14 pages, 23 figures

DOI: 10.1016/j.nahs.2018.06.004

AI中文摘要

本文解决了一个机器人在最小时间内寻找并收集有限数量物体并运送到集散地的最优控制问题。该机器人具有四阶动力学，其在拾取或放下物体时会瞬间改变。物体被建模为具有预先未知位置的点质量，在有界二维空间中可能包含未知障碍物。对于这种混合系统，通过递推时间窗方案近似求解最优控制问题（OCP），其中推导出的成本到目标的下界在最坏情况和概率情况下进行评估，假设物体位置服从均匀分布。首先，基于时间和位置空间离散化和混合整数规划的时间驱动近似解被提出。由于该解的计算成本较高，提出了一种基于合适运动参数化和梯度优化的事件驱动近似方法。在数值示例中比较了解决方案，表明后一种方法在计算上具有显著优势，同时与前者产生相似的定性结果。这些方法特别适用于各种机器人应用，如自动化清洁、搜索和救援、收割或制造。

英文摘要

This paper addresses an optimal control problem for a robot that has to find and collect a finite number of objects and move them to a depot in minimum time. The robot has fourth-order dynamics that change instantaneously at any pick-up or drop-off of an object. The objects are modeled by point masses with a-priori unknown locations in a bounded two-dimensional space that may contain unknown obstacles. For this hybrid system, an Optimal Control Problem (OCP) is approximately solved by a receding horizon scheme, where the derived lower bound for the cost-to-go is evaluated for the worst and for a probabilistic case, assuming a uniform distribution of the objects. First, a time-driven approximate solution based on time and position space discretization and mixed integer programming is presented. Due to the high computational cost of this solution, an alternative event-driven approximate approach based on a suitable motion parameterization and gradient-based optimization is proposed. The solutions are compared in a numerical example, suggesting that the latter approach offers a significant computational advantage while yielding similar qualitative results compared to the former. The methods are particularly relevant for various robotic applications like automated cleaning, search and rescue, harvesting or manufacturing.

1904.02851 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

Robust Model Predictive Control of Irrigation Systems with Active Uncertainty Learning and Data Analytics

Chao Shang, Wei-Han Chen, Abraham Duncan Stroock, Fengqi You

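Affiliations: Department of Automation, Tsinghua University

AI summary: This paper proposes a data-driven robust model predictive control method that combines mechanistic and data-driven models and constructs uncertainty sets to improve the efficiency and reliability of irrigation control. Experiments show that the method significantly reduces water consumption and improves control performance.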

DOI: 10.1109/TCST.2019.2916753
Journal ref: IEEE Transactions on Control Systems Technology, vol. 28, no. 4, pp. 1493-1504, 2020

AI中文摘要

我们开发了一种新型的数据驱动鲁棒模型预测控制（DDRMPC）方法，用于自动控制灌溉系统。核心思想是将机理模型（描述土壤含水量变化的动力学）和数据驱动模型（表征蒸散发和降水预测误差的不确定性）整合到一个系统控制框架中。为了更好地捕捉不确定性分布的支持，我们采用了一种基于学习的新方法，通过历史数据构建不确定性集。对于蒸散发预测误差，采用基于支持向量聚类的不确定性集，该方法可以方便地从历史数据中构建。而对于降水预测误差，我们分析了其分布对预测值的依赖性，并进一步设计了基于此类不确定性的特性定制的不确定性集。这样，整体不确定性分布可以被详细描述，最终有助于做出合理且高效的控制决策。为了确保数据驱动不确定性集的质量，采用训练-校准方案以提供理论性能保证。采用广义仿射决策规则以获得最优控制问题的可计算近似，从而确保DDRMPC的实用性。使用真实数据的案例研究显示，DDRMPC能够可靠地保持土壤含水量在安全水平以上并避免作物破坏。所提出的DDRMPC方法相比精细调优的开环控制策略，总用水量减少了40%。与精心调优的规则基控制和确定性等价模型预测控制相比，所提出的DDRMPC方法可以显著减少总用水量并提高控制性能。

英文摘要

We develop a novel data-driven robust model predictive control (DDRMPC) approach for automatic control of irrigation systems. The fundamental idea is to integrate both mechanistic models, which describe dynamics in soil moisture variations, and data-driven models, which characterize uncertainty in forecast errors of evapotranspiration and precipitation, into a holistic systems control framework. To better capture the support of the uncertainty distribution, we take a new learning-based approach by constructing uncertainty sets from historical data. For evapotranspiration forecast error, the support vector clustering-based uncertainty set is adopted, which can be conveniently built from historical data. As for precipitation forecast errors, we analyze the dependence of their distribution on forecast values, and further design a tailored uncertainty set based on the properties of this type of uncertainty. In this way, the overall uncertainty distribution can be described in detail, which ultimately contributes to rational and efficient control decisions. To assure the quality of data-driven uncertainty sets, a training-calibration scheme is used to provide theoretical performance guarantees. A generalized affine decision rule is adopted to obtain tractable approximations of optimal control problems, thereby ensuring the practicability of DDRMPC. Case studies using real data show that DDRMPC can reliably maintain soil moisture above the safety level and avoid crop devastation. The proposed DDRMPC approach leads to a 40% reduction of total water consumption compared to the fine-tuned open-loop control strategy. In comparison with the carefully tuned rule-based control and certainty equivalent model predictive control, the proposed DDRMPC approach can significantly reduce the total water consumption and improve the control performance.

1904.01068 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models

在未知转移模型的确定性马尔可夫决策过程中实现高效且安全的探索

Erdem Bıyık, Jonathan Margoliash, Shahrouz Ryan Alimo, Dorsa Sadigh

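Affiliations: Stanford University; Jet Propulsion Laboratory; California Institute of Technology

AI summary: This paper proposes a safe exploration algorithm that exploits Lipschitz continuity to ensure that no unsafe state is visited during exploration. The algorithm provides deterministic safety guarantees in deterministic Markov decision processes, and its performance is validated on simulated navigation tasks.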

Comments Proceedings of the American Control Conference (ACC), July 2019. The first two authors have equal contribution

1905.11011 2026-06-04 math.OC cs.AI cs.LG cs.SY eess.SY 版本更新

Robustness of accelerated first-order algorithms for strongly convex optimization problems

强凸优化问题中加速一阶算法的鲁棒性

Hesameddin Mohammadi, Meisam Razaviyayn, Mihailo R. Jovanović

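Affiliations: Ming Hsieh Department of Electrical and Computer Engineering; Daniel J. Epstein Department of Industrial and Systems Engineering

AI summary: This paper studies the robustness of accelerated first-order algorithms to stochastic uncertainties in gradient evaluation, analyzes the effect of noise on the mean-squared error in the optimization variable, and examines the fundamental trade-off between noise amplification and convergence rate.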

Comments 45 pages, 6 figures

AI中文摘要

我们研究了在梯度评估中存在随机不确定性的加速一阶算法的鲁棒性。具体而言，针对无约束、光滑、强凸优化问题，我们考察了在迭代项受到加性白噪声扰动时优化变量的均方误差。这种不确定性可能出现在通过真实系统的测量来近似梯度或在分布式网络计算中。尽管此类问题的一阶算法的动力学是非线性的，我们建立了均方偏离最优解的上界，这些上界在常数因子范围内是紧致的。我们的分析量化了通过任何类似于Nesterov或重力球方法的加速方案所获得的噪声放大与收敛速率之间的根本权衡。为了获得额外的分析洞察，对于强凸二次问题，我们明确地将优化变量的稳态方差表示为目标函数Hessian矩阵特征值的函数。我们证明了Hessian的整个谱，而不仅仅是极值特征值，影响噪声算法的鲁棒性。我们将这一结果专门应用于无向网络上的分布式平均问题，并考察了网络大小和拓扑结构对噪声加速算法鲁棒性的影响。

英文摘要

We study the robustness of accelerated first-order algorithms to stochastic uncertainties in gradient evaluation. Specifically, for unconstrained, smooth, strongly convex optimization problems, we examine the mean-squared error in the optimization variable when the iterates are perturbed by additive white noise. This type of uncertainty may arise in situations where an approximation of the gradient is sought through measurements of a real system or in a distributed computation over a network. Even though the underlying dynamics of first-order algorithms for this class of problems are nonlinear, we establish upper bounds on the mean-squared deviation from the optimal solution that are tight up to constant factors. Our analysis quantifies fundamental trade-offs between noise amplification and convergence rates obtained via any acceleration scheme similar to Nesterov's or heavy-ball methods. To gain additional analytical insight, for strongly convex quadratic problems, we explicitly evaluate the steady-state variance of the optimization variable in terms of the eigenvalues of the Hessian of the objective function. We demonstrate that the entire spectrum of the Hessian, rather than just the extreme eigenvalues, influences the robustness of noisy algorithms. We specialize this result to the problem of distributed averaging over undirected networks and examine the role of network size and topology in the robustness of noisy accelerated algorithms.

1809.06646 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Model-Free Adaptive Optimal Control of Episodic Fixed-Horizon Manufacturing Processes using Reinforcement Learning

基于强化学习的无模型自适应最优控制用于周期固定时间制造过程

Johannes Dornheim, Norbert Link, Peter Gumbsch

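Affiliations: Institute Intelligent Systems Research Group, Karlsruhe University of Applied Sciences; Institute for Applied Materials (IAM-CMS), Karlsruhe Institute of Technology

AI summary: This paper proposes a self-learning optimal control algorithm for episodic fixed-horizon manufacturing processes. The control model is built via reinforcement learning over consecutive process executions, using the measured product quality as the reward. This removes the need for the prior model formulation required by conventional model predictive control and approximate dynamic programming algorithms, and avoids the difficulties in system identification, accurate modelling, and runtime complexity caused by nonlinear dynamics and stochastic influences.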

Comments Journal preprint version

DOI: 10.1007/s12555-019-0120-7
Journal ref: International Journal of Control, Automation and Systems (2019)

AI中文摘要

本文提出了一种用于周期固定时间制造过程的自学习最优控制算法，通过强化学习在连续过程中构建控制模型，并利用测量的产品质量作为奖励，从而避免了传统模型预测控制和近似动态规划算法所需的先验模型公式，解决了非线性动态和随机影响带来的系统辨识、精确建模和运行复杂度问题。该算法通过与过程交互在线学习期望函数，以推导最优的过程控制决策。所提出的算法考虑了过程条件的随机变化，并能够应对部分可观测性。开发并研究了一种基于Q学习的方法，用于部分可观测的周期固定时间制造过程的自适应最优控制。通过将其应用于模拟的随机最优控制问题，即金属板深拉伸过程，对所得到的算法进行了实例化和评估。

英文摘要

A self-learning optimal control algorithm for episodic fixed-horizon manufacturing processes with time-discrete control actions is proposed and evaluated on a simulated deep drawing process. The control model is built during consecutive process executions under optimal control via reinforcement learning, using the measured product quality as reward after each process execution. Prior model formulation, which is required by state-of-the-art algorithms from model predictive control and approximate dynamic programming, is therefore unnecessary. This avoids several difficulties, namely in system identification, accurate modelling, and runtime complexity, that arise when dealing with processes subject to nonlinear dynamics and stochastic influences. Instead of using pre-created process and observation models, value function-based reinforcement learning algorithms build functions of expected future reward, which are used to derive optimal process control decisions. The expectation functions are learned online, by interacting with the process. The proposed algorithm takes stochastic variations of the process conditions into account and is able to cope with partial observability. A Q-learning-based method for adaptive optimal control of partially observable episodic fixed-horizon manufacturing processes is developed and studied. The resulting algorithm is instantiated and evaluated by applying it to a simulated stochastic optimal control problem in metal sheet deep drawing.

1807.01739 2026-06-04 math.OC cs.AI cs.LG cs.SY eess.SY 版本更新

Proximal algorithms for large-scale statistical modeling and sensor/actuator selection

大规模统计建模和传感器/执行器选择的近端算法

Armin Zare, Hesameddin Mohammadi, Neil K. Dhingra, Tryphon T. Georgiou, Mihailo R. Jovanović

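Affiliations: Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California; Numerica Corporation

AI summary: This paper develops a unified proximal algorithmic framework for regularized semidefinite programs arising in the modeling and control of large-scale systems. The proximal methods handle statistical modeling and sensor/actuator selection efficiently, and the paper establishes linear convergence and demonstrates the effectiveness of the algorithms.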

Comments To appear in IEEE Trans. Automat. Control

DOI: 10.1109/TAC.2019.2948268

AI中文摘要

若干在随机驱动动态系统建模与控制中的问题可以被表述为正则化半定规划。我们考察了两个具有代表性的此类问题，并展示了它们可以以类似的方式进行表述。第一个问题在统计建模中寻求通过适当且最小的扰动来协调观测统计数据。第二个问题则旨在为控制目的最优选择可用的传感器和执行器子集。为了应对大规模系统的建模与控制，我们开发了一种统一的算法框架，利用近端方法。我们的定制算法利用问题结构，使得能够处理统计建模以及传感器和执行器选择，比当前通用求解器可以处理的规模大得多。我们建立了近端梯度算法的线性收敛性，对比了所提出的近端算法与交替方向乘子法，并提供了示例以说明我们框架的优势和有效性。

英文摘要

Several problems in modeling and control of stochastically-driven dynamical systems can be cast as regularized semi-definite programs. We examine two such representative problems and show that they can be formulated in a similar manner. The first, in statistical modeling, seeks to reconcile observed statistics by suitably and minimally perturbing prior dynamics. The second seeks to optimally select a subset of available sensors and actuators for control purposes. To address modeling and control of large-scale systems we develop a unified algorithmic framework using proximal methods. Our customized algorithms exploit problem structure and allow handling statistical modeling, as well as sensor and actuator selection, for substantially larger scales than what is amenable to current general-purpose solvers. We establish linear convergence of the proximal gradient algorithm, draw contrast between the proposed proximal algorithms and alternating direction method of multipliers, and provide examples that illustrate the merits and effectiveness of our framework.

1803.00204 2026-06-04 cs.LG cs.AI cs.NA math.NA stat.ML 版本更新

Scalar Quantization as Sparse Least Square Optimization

标量量化作为稀疏最小二乘优化

Chen Wang, Xiaomei Yang, Shaomin Fei, Kai Zhou, Xiaofeng Gong, Miao Du, Ruisen Luo

Affiliations: College of Electrical Engineering, Sichuan University; Department of Computer Science, Rutgers University -- New Brunswick; Engineering Practice Center, Chengdu University of Information Technology

AI summary: This paper proposes a new approach to scalar quantization based on sparse least square optimization. By introducing $l_1$, $l_1 + l_2$, and $l_0$ regularization, it addresses shortcomings of conventional clustering-based methods and improves performance in bit-width reduction scenarios.

DOI: 10.1109/TPAMI.2019.2952096
Journal ref: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019

AI中文摘要

量化可以用来形成具有共享值的新向量/矩阵，其值接近原始数据。近年来，标量量化在值共享应用中的普及度迅速上升，因为它在减少神经网络复杂度方面具有巨大实用性。现有的基于聚类的量化技术虽然发展成熟，但存在多个缺点，包括对随机种子的依赖性、空集群或超出范围的集群，以及大量集群时的时间复杂度高。为克服这些问题，本文从新的视角研究标量量化问题，即稀疏最小二乘优化。具体来说，受稀疏最小二乘回归性质的启发，提出了几种基于l1最小二乘的量化算法。此外，还提出了类似的方案，具有l1 + l2和l0正则化。此外，为了计算给定数量的值/集群的量化结果，本文设计了一种迭代方法和一种基于聚类的方法，并且两者都建立在稀疏最小二乘之上。本文表明，后者方法在数学上等价于改进版的k-means聚类基量化算法，尽管两种算法起源于不同的直觉。所提出的算法在三种类型的数据上进行了测试，比较和分析了其计算性能，包括信息损失、时间消耗以及稀疏向量值的分布。本文为量化领域提供了新的视角，所提出的算法在某些位宽缩减场景下表现优异，当所需的量化后分辨率（值的数量）不显著低于原始数量时尤其如此。

英文摘要

Quantization can be used to form new vectors/matrices with shared values close to the original. In recent years, the popularity of scalar quantization for value-sharing applications has been soaring, as it has proved highly useful in reducing the complexity of neural networks. Existing clustering-based quantization techniques, while well-developed, have multiple drawbacks, including dependency on the random seed, empty or out-of-range clusters, and high time complexity for a large number of clusters. To overcome these problems, this paper examines scalar quantization from a new perspective, namely sparse least square optimization. Specifically, inspired by the properties of sparse least square regression, several quantization algorithms based on $l_1$ least square are proposed. In addition, similar schemes with $l_1 + l_2$ and $l_0$ regularization are proposed. Furthermore, to compute quantization results with a given number of values/clusters, this paper designs an iterative method and a clustering-based method, both built on sparse least square. The paper shows that the latter method is mathematically equivalent to an improved version of the k-means clustering-based quantization algorithm, although the two algorithms originate from different intuitions. The proposed algorithms were tested on three types of data, and their computational performance, including information loss, time consumption, and the distribution of the values of the sparse vectors, was compared and analyzed. The paper offers a new perspective on quantization, and the proposed algorithms can outperform existing methods, especially in some bit-width reduction scenarios where the required post-quantization resolution (number of values) is not significantly lower than the original number.

1810.00697 2026-06-04 eess.SY cs.AI cs.LG cs.SY 版本更新

Data-driven Discovery of Cyber-Physical Systems

基于数据的物理系统发现

Ye Yuan, Xiuchuan Tang, Wei Pan, Xiuting Li, Wei Zhou, Hai-Tao Zhang, Han Ding, Jorge Goncalves

Affiliations: School of Automation, Huazhong University of Science and Technology; State Key Lab of Digital Manufacturing Equipment and Technology; School of Mechanical Science and Engineering, Huazhong University of Science and Technology; Department of Cognitive Robotics, Delft University of Technology; Department of Engineering, University of Cambridge; Luxembourg Centre for Systems Biomedicine, University of Luxembourg

AI summary: This paper proposes a general framework for reverse engineering cyber-physical systems (CPSs) directly from data. By identifying the physical systems and inferring the transition logic, it has been applied successfully to mechanical systems, electrical systems, and medical applications, offering a new way to predict CPS trajectories, assess performance, design failure-proof systems, and create design guidelines for new systems.

DOI: 10.1038/s41467-019-12490-1

AI中文摘要

物理系统（CPSs）将软件嵌入物理世界，广泛应用于智能电网、机器人、智能制造和医疗监测等领域。由于其固有的复杂性，来自物理组件和网络组件的组合以及它们之间的相互作用，CPSs在建模方面表现出抗性。本文提出了一种从数据直接反向工程CPSs的通用框架。该方法涉及识别物理系统以及推断转移逻辑。它已成功应用于从机械和电气系统到医疗应用的多个现实世界示例。该新颖的框架旨在使研究人员能够基于发现的模型预测CPS的轨迹。此类信息已被证明对于评估CPS性能、设计容错CPS以及为新CPS制定设计指南至关重要。

英文摘要

Cyber-physical systems (CPSs) embed software into the physical world. They appear in a wide range of applications such as smart grids, robotics, intelligent manufacture and medical monitoring. CPSs have proved resistant to modeling due to their intrinsic complexity arising from the combination of physical components and cyber components and the interaction between them. This study proposes a general framework for reverse engineering CPSs directly from data. The method involves the identification of physical systems as well as the inference of transition logic. It has been applied successfully to a number of real-world examples ranging from mechanical and electrical systems to medical applications. The novel framework seeks to enable researchers to make predictions concerning the trajectory of CPSs based on the discovered model. Such information has been proven essential for the assessment of the performance of CPS, the design of failure-proof CPS and the creation of design guidelines for new CPSs.

1904.04211 2026-06-04 astro-ph.SR cs.AI cs.NA math.NA 版本更新

Desaturating EUV observations of solar flaring storms

淡化日冕层太阳耀斑风暴的观测

Sabrina Guastavino, Michele Piana, Anna Maria Massone, Richard Schwartz, Federico Benvenuto

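Affiliations: CNR - SPIN; NASA Goddard Space Flight Center

AI summary: This paper proposes a novel desaturation method that recovers the signal in the saturated regions of AIA images using only information contained in the image itself, providing a reliable tool for building a reconstruction pipeline for AIA data.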

DOI: 10.3847/1538-4357/ab35d8

AI中文摘要

图像饱和一直是太阳天文观测中多个仪器面临的问题，特别是在EUV波长范围内。然而，随着太阳动态观测站（SDO）任务载荷中大气成像装配（AIA）的发射，图像饱和已成为大数据问题，涉及自2010年2月以来每年提供的 impressive 数据集中的约10^$帧。本文介绍了一种新颖的去饱和方法，该方法能够通过利用图像本身包含的信息来恢复任何AIA图像中饱和区域的信号。这种独特的方法学特性，加上去饱和图像前所未有的统计可靠性，可能使该算法成为实现AIA数据重建流程的完美工具，即使在长时间、高能耀斑事件的情况下也能正常工作。

英文摘要

Image saturation has been an issue for several instruments in solar astronomy, mainly at EUV wavelengths. However, with the launch of the Atmospheric Imaging Assembly (AIA) as part of the payload of the Solar Dynamic Observatory (SDO) image saturation has become a big data issue, involving around 10^$ frames of the impressive dataset this beautiful telescope has been providing every year since February 2010. This paper introduces a novel desaturation method, which is able to recover the signal in the saturated region of any AIA image by exploiting no other information but the one contained in the image itself. This peculiar methodological property, jointly with the unprecedented statistical reliability of the desaturated images, could make this algorithm the perfect tool for the realization of a reconstruction pipeline for AIA data, able to work properly even in the case of long-lasting, very energetic flaring events.

1806.06790 2026-06-04 cs.LG cs.AI cs.IT cs.SY eess.SY math.IT math.OC stat.ML 版本更新

Towards Distributed Energy Services: Decentralizing Optimal Power Flow with Machine Learning

迈向分布式能源服务：利用机器学习实现最优功率流的去中心化

Roel Dobbe, Oscar Sondermeijer, David Fridovich-Keil, Daniel Arnold, Duncan Callaway, Claire Tomlin

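Affiliations: AI Now Institute at New York University; Energy & Resources Group at UC Berkeley

AI summary: This paper proposes a machine-learning-based decentralized method that learns control policies for controllable distributed energy resources (DERs) from locally available information, so as to reconstruct and mimic the solution of a centralized optimal power flow (OPF) problem and thereby enable distributed energy services.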

Comments Accepted for publication. To appear in the IEEE Transactions on Smart Grid

AI中文摘要

实现最优功率流（OPF）方法以调节电力网络中的电压和功率流通常被认为需要大量通信。我们考虑包含多个可控分布式能源资源（DER）的配电系统，并提出一种数据驱动的方法，用于学习每个DER的控制策略，以仅利用本地可用信息来重构和模仿集中式OPF问题的解决方案。集体来看，所有本地控制器紧密匹配集中式OPF解决方案，提供接近最优的性能并满足系统约束。速率失真框架使得能够分析由此产生的完全去中心化控制策略在重构OPF解决方案方面的效果。该方法为决定DER应与哪些节点通信以改进其个别策略提供了自然扩展。该方法在单相和三相测试馈线网络上应用，使用真实负载和分布式发电机的数据，重点于不表现出跨时间依赖性的DER。它为配电系统运营商提供了一个框架，以高效规划和操作DER的贡献，以实现配电网络中的分布式能源服务。

英文摘要

The implementation of optimal power flow (OPF) methods to perform voltage and power flow regulation in electric networks is generally believed to require extensive communication. We consider distribution systems with multiple controllable Distributed Energy Resources (DERs) and present a data-driven approach to learn control policies for each DER to reconstruct and mimic the solution to a centralized OPF problem from solely locally available information. Collectively, all local controllers closely match the centralized OPF solution, providing near optimal performance and satisfaction of system constraints. A rate distortion framework enables the analysis of how well the resulting fully decentralized control policies are able to reconstruct the OPF solution. The methodology provides a natural extension to decide what nodes a DER should communicate with to improve the reconstruction of its individual policy. The method is applied on both single- and three-phase test feeder networks using data from real loads and distributed generators, focusing on DERs that do not exhibit inter-temporal dependencies. It provides a framework for Distribution System Operators to efficiently plan and operate the contributions of DERs to achieve Distributed Energy Services in distribution networks.

1807.08229 2026-06-04 cs.AI cs.RO cs.SY eess.SY 版本更新

Optimal Continuous State POMDP Planning with Semantic Observations: A Variational Approach

基于语义观测的最优连续状态POMDP规划：一种变分方法

Luke Burks, Ian Loefgren, Nisar Ahmed

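AI summary: This paper proposes variational strategies for optimal planning in continuous-state partially observable Markov decision processes (CPOMDPs) with semantic observations. It uses variational Bayes to address the representation of, and reasoning over, hybrid continuous-discrete probabilistic models, improving the efficiency and robustness of dynamic decision-making tasks.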

Comments Final version accepted to IEEE Transactions on Robotics (in press as of August 2019)

AI中文摘要

本文开发了用于利用语义观测进行最优规划的新策略，使用连续状态部分可观测马尔可夫决策过程（CPOMDP）。在高斯混合（GM）CPOMDP策略近似方法方面，提出了两项主要创新。尽管现有方法具有许多有益的理论性质，但它们无法高效地表示和推理混合连续-离散概率模型。第一项主要创新是通过softmax模型推导出连续-离散语义观测概率的变分贝叶斯GM近似，用于点基值迭代贝尔曼策略备份。这种方法的关键优势是可以在复杂的非高斯不确定性下进行动态决策，同时利用连续动态状态空间模型（从而避免繁琐且昂贵的离散化）。第二项主要创新是一种基于聚类的混合物凝聚技术，能够很好地扩展到非常大的GM策略函数和信念函数。针对目标搜索和拦截任务的仿真结果表明，这些创新所产生的GM策略比其他最先进的策略近似方法产生的策略更有效，但需要显著较少的建模开销和在线运行时间成本。此外，结果还显示该方法对模型误差具有鲁棒性，并能扩展到更高维度。

英文摘要

This work develops novel strategies for optimal planning with semantic observations using continuous state partially observable Markov decision processes (CPOMDPs). Two major innovations are presented in relation to Gaussian mixture (GM) CPOMDP policy approximation methods. While existing methods have many desirable theoretical properties, they are unable to efficiently represent and reason over hybrid continuous-discrete probabilistic models. The first major innovation is the derivation of closed-form variational Bayes GM approximations of Point-Based Value Iteration Bellman policy backups, using softmax models of continuous-discrete semantic observation probabilities. A key benefit of this approach is that dynamic decision-making tasks can be performed with complex non-Gaussian uncertainties, while also exploiting continuous dynamic state space models (thus avoiding cumbersome and costly discretization). The second major innovation is a new clustering-based technique for mixture condensation that scales well to very large GM policy functions and belief functions. Simulation results for a target search and interception task with semantic observations show that the GM policies resulting from these innovations are more effective than those produced by other state-of-the-art policy approximations, but require significantly less modeling overhead and online runtime cost. Additional results show the robustness of this approach to model errors and scaling to higher dimensions.

1902.07747 2026-06-04 eess.SY cs.AI cs.DC cs.RO cs.SY 版本更新

Lookup Table-Based Consensus Algorithm for Real-Time Longitudinal Motion Control of Connected and Automated Vehicles

基于查找表的共识算法用于连接和自动化车辆的实时纵向运动控制

Ziran Wang, Kyuntae Han, BaekGyu Kim, Guoyuan Wu, Matthew J. Barth

AI总结本文提出了一种基于查找表的共识算法，用于实时控制连接和自动化车辆的纵向运动，通过动态生成查找表来实时寻找最佳控制增益，优于之前的工作和线性反馈算法。

Comments 2019 American Control Conference (ACC), Philadelphia, PA, USA, July 10-12, 2019. ISBN 978-1-5386-7928-9

详情

AI中文摘要

连接和自动化车辆（CAV）技术是解决当前交通系统安全、机动性和可持续性问题的有前途的解决方案。具体而言，控制算法在CAV系统中起重要作用，因为它执行由前一步生成的命令，如通信、感知和规划。在本研究中，我们提出了一种共识算法，用于实时控制CAV的纵向运动。与该领域之前的研究不同，这些研究中的共识算法的控制增益是预先确定并固定的，我们开发了算法来构建查找表，实时寻找不同CAV初始条件下的理想控制增益。数值模拟显示，所提出的基于查找表的共识算法在四种不同场景中，针对各种CAV初始条件，在收敛时间和最大 jerk 方面均优于作者之前的工作以及van Arem的基于线性反馈的纵向运动控制算法。

英文摘要

Connected and automated vehicle (CAV) technology is one of the promising solutions for addressing the safety, mobility and sustainability issues of our current transportation systems. Specifically, the control algorithm plays an important role in a CAV system, since it executes the commands generated by former steps, such as communication, perception, and planning. In this study, we propose a consensus algorithm to control the longitudinal motion of CAVs in real time. Unlike previous studies in this field, where the control gains of the consensus algorithm are predetermined and fixed, we develop algorithms to build up a lookup table, searching for the ideal control gains with respect to different initial conditions of CAVs in real time. Numerical simulation shows that the proposed lookup table-based consensus algorithm outperforms the authors' previous work, as well as van Arem's linear feedback-based longitudinal motion control algorithm, in all four scenarios with various initial conditions of CAVs, in terms of convergence time and maximum jerk of the simulation run.

1903.02531 2026-06-04 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY 版本更新

Combining Optimal Control and Learning for Visual Navigation in Novel Environments

将最优控制与学习相结合用于新环境中的视觉导航

Somil Bansal, Varun Tolani, Saurabh Gupta, Jitendra Malik, Claire Tomlin

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Facebook AI Research（脸书人工智能研究）

AI总结本文提出了一种结合模型控制与学习感知的方法，用于在新环境中实现可靠的视觉导航，通过生成无碰撞路径的 waypoints，使机器人能够高效地到达目标位置，同时在低帧率和仿真到现实的迁移中表现良好。

Comments Project website: https://vtolani95.github.io/WayPtNav/

详情

AI中文摘要

基于模型的控制是机器人导航的流行范式，因为它可以利用已知的动力学模型来高效地规划鲁棒的机器人轨迹。然而，在环境事先未知且只能通过机器人上的传感器部分观测的情况下，使用基于模型的方法具有挑战性。在本工作中，我们通过将基于模型的控制与基于学习的感知相结合来解决这一不足。基于学习的感知模块生成一系列 waypoints，通过无碰撞路径引导机器人到达目标。这些 waypoints 被用于基于模型的规划器生成平滑且动态可行的轨迹，该轨迹通过反馈控制在物理系统上执行。我们在模拟的真实世界复杂环境中以及在实际地面车辆上的实验表明，与纯几何映射或端到端学习方法相比，所提出的方法在新环境中能够更可靠、更高效地到达目标位置。我们的方法不依赖于详细的显式 3D 环境地图，能够与低帧率工作，并且在仿真到现实的迁移中表现良好。描述我们方法和实验的视频可在项目网站上获得。

英文摘要

Model-based control is a popular paradigm for robot navigation because it can leverage a known dynamics model to efficiently plan robust robot trajectories. However, it is challenging to use model-based methods in settings where the environment is a priori unknown and can only be observed partially through on-board sensors on the robot. In this work, we address this shortcoming by coupling model-based control with learning-based perception. The learning-based perception module produces a series of waypoints that guide the robot to the goal via a collision-free path. These waypoints are used by a model-based planner to generate a smooth and dynamically feasible trajectory that is executed on the physical system using feedback control. Our experiments in simulated real-world cluttered environments and on an actual ground vehicle demonstrate that the proposed approach can reach goal locations more reliably and efficiently in novel environments as compared to purely geometric mapping-based or end-to-end learning-based alternatives. Our approach does not rely on detailed explicit 3D maps of the environment, works well with low frame rates, and generalizes well from simulation to the real world. Videos describing our approach and experiments are available on the project website.

1807.06613 2026-06-04 cs.MA cs.AI cs.LG cs.SY eess.SY stat.ML 版本更新

Deep Reinforcement Learning for Swarm Systems

深度强化学习用于群体系统

Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann

发表机构 * L-CAS University of Lincoln（L-CAS林肯大学）； Technische Universität Darmstadt（达姆施塔特技术大学）

AI总结本文提出了一种基于分布均嵌入的新状态表示方法，用于深度多智能体强化学习，以更有效地处理大规模同质群体系统的去中心化决策问题。

Comments 31 pages, 12 figures, version 3 (published in JMLR Volume 20)

详情

Journal ref: Journal of Machine Learning Research 20(54):1-31, 2019

AI中文摘要

最近，深度强化学习（RL）方法已成功应用于多智能体场景。通常，这些方法依赖于将智能体状态拼接起来以表示去中心化决策所需的信息内容。然而，拼接在大规模同质群体系统中表现不佳，因为它不利用这些系统固有的基本属性：（i）群体中的智能体是可互换的，（ii）群体中智能体的精确数量无关。因此，我们提出了一种基于分布均嵌入的新深度多智能体RL状态表示方法。我们将智能体视为分布的样本，并使用经验均嵌入作为去中心化策略的输入。我们通过直方图、径向基函数和端到端学习的神经网络定义了不同的均嵌入特征空间。我们在群体文献中两个著名的已知问题（相遇和追捕）上评估了该表示方法，在全局和局部可观察的设置中。对于局部设置，我们进一步引入了简单的通信协议。所有方法中，基于神经网络特征的均嵌入表示能够促进相邻智能体之间最丰富的信息交换，从而促进更复杂的集体策略的发展。

英文摘要

Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents, as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions and a neural network learned end-to-end. We evaluate the representation on two well-known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and a locally observable setup. For the local setup, we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents, facilitating the development of more complex collective strategies.
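The empirical mean embedding described above can be sketched in a few lines. This is a minimal illustration using radial basis function features, one of the three feature spaces the abstract names; the centers, bandwidth, and agent states are hypothetical values, and the policy network that would consume the embedding is not shown.

```python
import math

def rbf_features(state, centers, bandwidth=1.0):
    """Map one agent state (a tuple of floats) to RBF activations."""
    feats = []
    for c in centers:
        sq_dist = sum((s - ci) ** 2 for s, ci in zip(state, c))
        feats.append(math.exp(-sq_dist / (2.0 * bandwidth ** 2)))
    return feats

def mean_embedding(agent_states, centers, bandwidth=1.0):
    """Average the per-agent features; output length equals len(centers)."""
    if not agent_states:
        raise ValueError("need at least one agent state")
    sums = [0.0] * len(centers)
    for state in agent_states:
        for i, f in enumerate(rbf_features(state, centers, bandwidth)):
            sums[i] += f
    return [s / len(agent_states) for s in sums]

# Hypothetical 2D agent positions and RBF centers.
centers = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
swarm = [(0.1, 0.2), (0.9, 0.1), (0.2, 0.8), (0.5, 0.5)]

emb = mean_embedding(swarm, centers)
emb_shuffled = mean_embedding(list(reversed(swarm)), centers)
emb_doubled = mean_embedding(swarm * 2, centers)
print(len(emb), [round(v, 3) for v in emb])
```

Because the features are averaged, the embedding has a fixed length regardless of how many agents are observed, and reordering the agents leaves it unchanged (up to floating-point rounding). These are the two swarm properties the abstract says concatenation fails to exploit.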

1905.13053 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Unpredictability of AI

AI的不可预测性

Roman V. Yampolskiy

发表机构 * Computer Engineering and Computer Science University of Louisville（计算机工程与计算机科学路易斯维尔大学）

AI总结本文研究了AI安全领域中一个核心问题，即智能系统的行为预测难题，证明了即使知道终端目标，也无法准确预测超人类智能系统的行为，对AI安全产生了深远影响。

1905.09673 2026-06-04 cs.AI cs.LG cs.SY eess.SY 版本更新

Deep Q-Learning with Q-Matrix Transfer Learning for Novel Fire Evacuation Environment

基于Q矩阵迁移学习的深度Q学习用于新型火灾疏散环境

Jivitesh Sharma, Per-Arne Andersen, Ole-Christoffer Granmo, Morten Goodwin

发表机构 * Centre for Artificial Intelligence Research（人工智能研究中心）； Department of Information and Communication Technology（信息与通信技术系）； University of Agder, Norway（阿格德大学，挪威）

AI总结本文提出了一种基于Q矩阵迁移学习的深度Q学习方法，用于解决紧急疏散问题，通过预训练DQN网络权重以获取最短路径信息，并在复杂真实环境中实现最优疏散路径。

Comments 21 pages, 14 figures, 4 tables

详情

AI中文摘要

我们关注紧急疏散这一重要问题，该问题显然可以受益于强化学习，但长期以来未被充分研究。紧急疏散是一个复杂的任务，难以用强化学习解决，因为紧急情况高度动态，包含大量变化变量和复杂约束，使训练变得困难。在本文中，我们提出了第一个用于训练强化学习代理进行疏散规划的火灾疏散环境。该环境被建模为图，以捕捉建筑结构。它包括现实特征，如火势蔓延、不确定性和瓶颈。我们已经将环境实现为OpenAI gym格式，以促进未来研究。我们还提出了一种新的强化学习方法，该方法通过预训练DQN代理的网络权重来整合通往出口的最短路径信息。我们通过使用表格Q学习来学习建筑模型图中的最短路径来实现这一点。此信息通过故意在Q矩阵上过拟合来转移到网络。然后，预训练的DQN模型在火灾疏散环境中进行训练，以在时间变化条件下生成最优疏散路径。我们对所提出的方法与PPO、VPG、SARSA、A2C和ACKTR等最新强化学习算法进行了比较。结果表明，我们的方法在包括原始DQN模型在内的最新模型上表现出巨大的优势。最后，我们在一个大型且复杂的现实建筑中测试我们的模型，该建筑由91个房间组成，可以移动到任何其他房间，因此有8281种动作。我们使用基于注意力的机制来处理大动作空间。我们的模型在现实世界紧急环境中实现了接近最优的性能。

英文摘要

We focus on the important problem of emergency evacuation, which clearly could benefit from reinforcement learning but has been largely unaddressed. Emergency evacuation is a complex task that is difficult to solve with reinforcement learning, since an emergency situation is highly dynamic, with many changing variables and complex constraints that make it difficult to train on. In this paper, we propose the first fire evacuation environment to train reinforcement learning agents for evacuation planning. The environment is modelled as a graph capturing the building structure. It includes realistic features like fire spread, uncertainty and bottlenecks. We have implemented the environment in the OpenAI gym format, to facilitate future research. We also propose a new reinforcement learning approach that entails pretraining the network weights of a DQN-based agent to incorporate information on the shortest path to the exit. We achieved this by using tabular Q-learning to learn the shortest path on the building model's graph. This information is transferred to the network by deliberately overfitting it on the Q-matrix. Then, the pretrained DQN model is trained on the fire evacuation environment to generate the optimal evacuation path under time-varying conditions. We perform comparisons of the proposed approach with state-of-the-art reinforcement learning algorithms like PPO, VPG, SARSA, A2C and ACKTR. The results show that our method is able to outperform state-of-the-art models by a huge margin, including the original DQN-based models. Finally, we test our model on a large and complex real building consisting of 91 rooms, with the possibility to move to any other room, hence giving 8281 actions. We use an attention-based mechanism to deal with large action spaces. Our model achieves near-optimal performance on the real-world emergency environment.
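The tabular step mentioned above, learning the shortest path to the exit on the building graph, might look like the following sketch. The five-room graph, the reward of -1 per move, and the hyperparameters are all hypothetical. For reproducibility the Q-learning update is applied in deterministic sweeps over every edge instead of along sampled episodes, which is a simplification of ours. The later stages (overfitting a DQN on the Q-matrix, training in the fire environment) are not shown.

```python
def q_learning_shortest_path(graph, exit_node, gamma=0.9, alpha=0.5, sweeps=300):
    """Learn Q(room, next_room) with a reward of -1 per move.

    graph maps each room to the list of rooms reachable in one move.
    """
    q = {(s, a): 0.0 for s, nbrs in graph.items() for a in nbrs}

    def best_value(s):
        # The exit is terminal, so its value is zero.
        if s == exit_node or not graph[s]:
            return 0.0
        return max(q[(s, a)] for a in graph[s])

    for _ in range(sweeps):
        for (s, a) in q:
            # Moving from s to a costs one step; a is also the next state.
            target = -1.0 + gamma * best_value(a)
            q[(s, a)] += alpha * (target - q[(s, a)])
    return q

def greedy_path(q, graph, start, exit_node, max_steps=20):
    """Follow the highest-valued move from each room until the exit."""
    path = [start]
    node = start
    while node != exit_node and len(path) <= max_steps:
        node = max(graph[node], key=lambda a: q[(node, a)])
        path.append(node)
    return path

# Hypothetical building: room 4 is the exit.
graph = {0: [1, 2], 1: [0, 3], 2: [0, 4], 3: [1, 4], 4: []}
q = q_learning_shortest_path(graph, exit_node=4)
path = greedy_path(q, graph, start=0, exit_node=4)
print(path)
```

In this toy graph the route through room 2 takes two moves and the route through rooms 1 and 3 takes three, so the learned Q-values should favor the former. A Q-matrix of this kind is what the abstract describes transferring into the network before training on the dynamic environment.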

1902.01119 2026-06-04 cs.AI cs.CL cs.LG cs.SY eess.SY 版本更新

The Natural Language of Actions

动作的自然语言

Guy Tennenholtz, Shie Mannor

发表机构 * Faculty of Electrical Engineering, Technion Institute of Technology, Israel（电气工程学院，技术学院，以色列）

AI总结本文提出Act2Vec框架，用于学习基于上下文的动作表示以提升强化学习性能，通过将相似动作分组并利用动作间的关系来改进Q值近似和状态表示。

Comments Published in the proceedings of the 36th International Conference on Machine Learning (ICML 2019)

1905.02606 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Optimal Control of Complex Systems through Variational Inference with a Discrete Event Decision Process

通过变分推断优化复杂系统的控制：离散事件决策过程

Wen Dong, Bo Liu, Fan Yang

发表机构 * University at Buffalo（布法罗大学）； Auburn University（奥本大学）

AI总结本文提出了一种基于变分推断的方法，将复杂社会网络决策问题建模为离散事件决策过程，以解决高维状态-动作空间中的维度灾难问题，从而在现实交通场景中实现更高的系统预期奖励、更快的收敛速度和更低的价值函数方差。

详情

AI中文摘要

复杂社会系统由相互关联的个体组成，其相互作用导致群体行为。现实复杂系统的最优控制有广泛的应用，包括道路交通管理、流行病预防和信息传播。然而，由于高维和非线性系统动态以及决策者面临的爆炸性状态和动作空间，实现此类现实复杂系统控制具有挑战性。现有方法可分为基于模拟和解析两类。现有的模拟方法在蒙特卡洛积分中具有高方差，而解析方法则面临建模不准确的问题。我们采用模拟建模来指定复杂系统的复杂动态，并为在具有高维状态-动作空间的复杂网络中搜索最优策略开发了解析解。为了捕捉复杂系统的动态，我们将复杂社会网络决策问题建模为离散事件决策过程。为了解决复杂系统中的维度灾难和在高维状态-动作空间中的搜索问题，我们将复杂系统的控制减少到变分推断和参数学习，引入Bethe熵近似，并开发了期望传播算法。我们提出的方法在现实交通场景中比最先进的解析和采样方法在系统预期奖励、收敛速度和价值函数方差方面表现更优。

英文摘要

Complex social systems are composed of interconnected individuals whose interactions result in group behaviors. Optimal control of a real-world complex system has many applications, including road traffic management, epidemic prevention, and information dissemination. However, such real-world complex system control is difficult to achieve because of high-dimensional and non-linear system dynamics, and the exploding state and action spaces for the decision maker. Prior methods can be divided into two categories: simulation-based and analytical approaches. Existing simulation approaches have high variance in Monte Carlo integration, and the analytical approaches suffer from modeling inaccuracy. We adopted simulation modeling in specifying the complex dynamics of a complex system, and developed analytical solutions for searching optimal strategies in a complex network with high-dimensional state-action space. To capture the complex system dynamics, we formulate the complex social network decision-making problem as a discrete event decision process. To address the curse of dimensionality and search in high-dimensional state-action spaces in complex systems, we reduce control of a complex system to variational inference and parameter learning, introduce Bethe entropy approximation, and develop an expectation propagation algorithm. Our proposed algorithm leads to higher system expected rewards, faster convergence, and lower variance of value function in a real-world transportation scenario than state-of-the-art analytical and sampling approaches.

1809.07412 2026-06-04 cs.LG cs.AI cs.SY eess.SY 版本更新

Learning, Planning, and Control in a Monolithic Neural Event Inference Architecture

在单体神经事件推理架构中的学习、规划与控制

Martin V. Butz, David Bilkey, Dania Humaidan, Alistair Knott, Sebastian Otte

发表机构 * Cognitive Modeling Group Computer Science Department University of Tübingen（图宾根大学认知建模组计算机科学系）

AI总结该研究提出了一种单体神经事件推理架构REPRISE，通过学习动态系统的时序事件预测模型，结合回顾和前瞻推理，实现对传感器运动动态的高效预测与控制。

Comments This is the final revision submitted to the Neural Networks journal. The revision mainly includes improvements in language, explanation, and additional references and system relations

详情

AI中文摘要

我们引入了REPRISE，一种回顾和前瞻推理方案，用于学习动态系统的时序事件预测模型。REPRISE推断出不可观测的上下文事件状态及其最佳解释最近遭遇的传感器运动经验的时序预测模型。同时，它以目标导向的方式优化即将到来的运动活动。在此，REPRISE通过循环神经网络（RNN）实现，该网络学习由不同模拟动态车辆生成的传感器运动连续性的时序前向模型。RNN通过上下文神经元增强，能够编码不同但相关的传感器运动动态为紧凑的事件代码。我们证明REPRISE能够同时学习分离和近似遇到的传感器运动动态：它分析传感器运动误差信号，同时适应内部上下文神经活动和连接权重值。此外，我们证明REPRISE可以利用所学模型诱导目标导向的模型预测控制，即近似主动推理：给定一个目标状态，系统想象一个优化该状态的运动命令序列，以最小化与目标的距离。RNN活动因此持续想象即将到来的未来并反思最近的过去，优化预测模型、隐藏神经状态活动和即将到来的运动活动。结果，事件预测神经编码得以发展，从而能够调用高效且适应性强的目标导向传感器运动控制。

英文摘要

We introduce REPRISE, a REtrospective and PRospective Inference SchEme, which learns temporal event-predictive models of dynamical systems. REPRISE infers the unobservable contextual event state and accompanying temporal predictive models that best explain the recently encountered sensorimotor experiences retrospectively. Meanwhile, it optimizes upcoming motor activities prospectively in a goal-directed manner. Here, REPRISE is implemented by a recurrent neural network (RNN), which learns temporal forward models of the sensorimotor contingencies generated by different simulated dynamic vehicles. The RNN is augmented with contextual neurons, which enable the encoding of distinct, but related, sensorimotor dynamics as compact event codes. We show that REPRISE concurrently learns to separate and approximate the encountered sensorimotor dynamics: it analyzes sensorimotor error signals adapting both internal contextual neural activities and connection weight values. Moreover, we show that REPRISE can exploit the learned model to induce goal-directed, model-predictive control, that is, approximate active inference: Given a goal state, the system imagines a motor command sequence optimizing it with the prospective objective to minimize the distance to the goal. The RNN activities thus continuously imagine the upcoming future and reflect on the recent past, optimizing the predictive model, the hidden neural state activities, and the upcoming motor activities. As a result, event-predictive neural encodings develop, which allow the invocation of highly effective and adaptive goal-directed sensorimotor control.

1808.07921 2026-06-04 cs.RO cs.AI cs.PL cs.SE cs.SY eess.SY 版本更新

SOTER: A Runtime Assurance Framework for Programming Safe Robotics Systems

SOTER：一种用于安全机器人系统编程的运行时保证框架

Ankush Desai, Shromona Ghosh, Sanjit A. Seshia, Natarajan Shankar, Ashish Tiwari

发表机构 * University of California at Berkeley, CA, USA（加州大学伯克利分校）； SRI International（SRI国际）； Microsoft（微软）

AI总结本文提出SOTER框架，通过一种编程语言和集成的运行时保证系统，为安全机器人系统提供保障，确保在使用未经认证组件时仍能满足安全要求。

详情

AI中文摘要

近年来，机器人实现更高自主性和智能性的趋势导致了高度复杂性。自主机器人越来越多地依赖第三方现成组件和复杂的机器学习技术。这种趋势使得提供强设计时认证的正确操作变得具有挑战性。为了解决这些挑战，我们提出了SOTER，一种机器人编程框架，包含两个关键组件：（1）一种用于实现和测试高层反应式机器人软件的编程语言；（2）一个集成的运行时保证（RTA）系统，该系统帮助在使用未经认证的组件时仍能提供安全保证。SOTER提供了语言原语，用于声明性地构建RTA模块，该模块包含一个高级高性能控制器（未经认证）、一个安全但性能较低的控制器（认证）以及期望的安全规范。该框架提供正式保证，确保一个良好的RTA模块始终满足安全规范，而无需完全牺牲性能，通过在安全时使用高性能未经认证的组件。SOTER允许复杂的机器人软件堆栈作为RTA模块的组合来构建，其中每个未经认证的组件都通过RTA模块进行保护。为了证明我们框架的有效性，我们考虑了一个现实世界案例研究，即构建一个安全的无人机监视系统。我们的实验在模拟和实际无人机上均表明，SOTER启用的RTA确保了系统的安全性，包括在不可信的第三方组件有bug或偏离预期行为时。

英文摘要

The recent drive towards achieving greater autonomy and intelligence in robotics has led to high levels of complexity. Autonomous robots increasingly depend on third-party off-the-shelf components and complex machine-learning techniques. This trend makes it challenging to provide strong design-time certification of correct operation. To address these challenges, we present SOTER, a robotics programming framework with two key components: (1) a programming language for implementing and testing high-level reactive robotics software and (2) an integrated runtime assurance (RTA) system that helps enable the use of uncertified components, while still providing safety guarantees. SOTER provides language primitives to declaratively construct an RTA module consisting of an advanced, high-performance controller (uncertified), a safe, lower-performance controller (certified), and the desired safety specification. The framework provides a formal guarantee that a well-formed RTA module always satisfies the safety specification, without completely sacrificing performance, by using higher-performance uncertified components whenever safe. SOTER allows the complex robotics software stack to be constructed as a composition of RTA modules, where each uncertified component is protected using an RTA module. To demonstrate the efficacy of our framework, we consider a real-world case study of building a safe drone surveillance system. Our experiments both in simulation and on actual drones show that the SOTER-enabled RTA ensures the safety of the system, including when untrusted third-party components have bugs or deviate from the desired behavior.
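The RTA module structure described above (an uncertified advanced controller, a certified safe controller, and a safety specification) can be illustrated with a small switching sketch. This is plain Python written for this listing, not SOTER's own language or API. The class name `RTAModule`, the one-step lookahead check, and the 1D dynamics are all hypothetical simplifications.

```python
from dataclasses import dataclass
from typing import Callable, Tuple

@dataclass
class RTAModule:
    advanced: Callable[[float], float]        # uncertified: state -> command
    safe: Callable[[float], float]            # certified: state -> command
    is_recoverable: Callable[[float], bool]   # safety check on a predicted state
    step: Callable[[float, float], float]     # model: (state, command) -> next state

    def command(self, state: float) -> Tuple[float, str]:
        """Use the advanced controller only if its predicted next state passes the check."""
        proposed = self.advanced(state)
        if self.is_recoverable(self.step(state, proposed)):
            return proposed, "advanced"
        return self.safe(state), "safe"

def run(module: RTAModule, state: float, steps: int):
    """Simulate the closed loop and record visited states and chosen modes."""
    trajectory, modes = [state], []
    for _ in range(steps):
        u, mode = module.command(state)
        state = module.step(state, u)
        trajectory.append(state)
        modes.append(mode)
    return trajectory, modes

# Hypothetical 1D robot: a buggy advanced controller always rushes forward,
# the safe controller backs off, and positions above 8.0 are treated as unsafe.
module = RTAModule(
    advanced=lambda x: 3.0,
    safe=lambda x: -1.0,
    is_recoverable=lambda x: x <= 8.0,
    step=lambda x, u: x + u,
)
trajectory, modes = run(module, state=0.0, steps=7)
print(trajectory)
print(modes)
```

In this toy run the advanced controller is used whenever its command keeps the predicted position within the limit, and the safe controller takes over otherwise, so the position never exceeds 8.0 even though the advanced controller would drive straight past it.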

1904.05072 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Differential Dynamic Programming for Multi-Phase Rigid Contact Dynamics

多相刚体接触动力学中的微分动态规划

Rohan Budhiraja, Justin Carpentier, Carlos Mastalli, Nicolas Mansard

发表机构 * CNRS, LAAS（法国国家科学研究中心，拉拉斯研究所）； INRIA, France（法国国家信息与自动化研究所，法国）

AI总结本文提出使用微分动态规划算法来优化多相刚体接触动力学的全身轨迹，通过利用角动量提高运动效率，减少力和冲击，并在无外力情况下实现姿态控制。

Comments 6 pages, IEEE RAS International Conference on Humanoid Robots

详情

DOI: 10.1109/HUMANOIDS.2018.8624925

AI中文摘要

当今生成高效运动的常见策略是将问题分解为两个连续步骤：第一步生成接触序列和质心轨迹，第二步计算遵循质心模式的全身轨迹。然而，第二步通常由简单的程序如逆运动学求解器处理。相反，我们提出使用局部最优控制求解器，即微分动态规划（DDP），来计算全身轨迹。我们的方法通过利用角动量产生更高效的运动，具有较低的力和较小的冲击。为此，我们提出了一种原始的DDP公式，利用刚体接触模型的Karush-Kuhn-Tucker约束。通过在真实HRP-2机器人上执行大步行走和无外力情况下的姿态控制问题，我们实验性地展示了这种方法的重要性。

英文摘要

A common strategy today to generate efficient locomotion movements is to split the problem into two consecutive steps: the first one generates the contact sequence together with the centroidal trajectory, while the second one computes the whole-body trajectory that follows the centroidal pattern. Yet the second step is generally handled by a simple program such as an inverse kinematics solver. In contrast, we propose to compute the whole-body trajectory by using a local optimal control solver, namely Differential Dynamic Programming (DDP). Our method produces more efficient motions, with lower forces and smaller impacts, by exploiting the Angular Momentum (AM). With this aim, we propose an original DDP formulation exploiting the Karush-Kuhn-Tucker constraint of the rigid contact model. We experimentally show the importance of this approach by executing large-step walking on the real HRP-2 robot, and by solving the problem of attitude control in the absence of external forces.

1904.04595 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Simultaneous Contact, Gait and Motion Planning for Robust Multi-Legged Locomotion via Mixed-Integer Convex Optimization

通过混合整数凸优化实现鲁棒多足运动的同步接触、步态和运动规划

Bernardo Aceituno-Cabezas, Carlos Mastalli, Hongkai Dai, Michele Focchi, Andreea Radulescu, Darwin G. Caldwell, Jose Cappelletto, Juan C. Grieco, Gerardo Fernandez-Lopez, Claudio Semini

AI总结本文提出了一种混合整数凸优化方法，用于同时规划多足机器人的接触位置、步态转换和运动，以提高运动的通用性并保持低计算时间。

Comments 8 pages, IEEE Robotics and Automation Letters

详情

DOI: 10.1109/LRA.2017.2779821

AI中文摘要

传统多足运动规划方法将问题分为多个阶段，如接触搜索和轨迹生成。然而，同时考虑接触和运动对于生成复杂的全身行为至关重要。目前，将这些问题耦合在一起需要假设固定的步态序列和平坦地形条件，或者使用非凸优化，计算时间不可行。本文提出了一种混合整数凸公式，以高效的方式同时规划接触位置、步态转换和运动。与之前的工作不同，我们的方法不限于平坦地形或预设的步态序列。相反，我们纳入摩擦锥稳定性边际，近似机器人扭矩限制，并使用混合整数凸约束规划步态。我们通过在HyQ机器人上实验验证了我们的方法，穿越了不同具有挑战性的地形，其中非凸性和平坦地形假设可能导致次优或不稳定计划。我们的方法在保持低计算时间的同时提高了运动的通用性。

英文摘要

Traditional motion planning approaches for multi-legged locomotion divide the problem into several stages, such as contact search and trajectory generation. However, reasoning about contacts and motions simultaneously is crucial for the generation of complex whole-body behaviors. Currently, coupling these problems has required either the assumption of a fixed gait sequence and flat terrain condition, or non-convex optimization with intractable computation time. In this paper, we propose a mixed-integer convex formulation to plan simultaneously contact locations, gait transitions and motion, in a computationally efficient fashion. In contrast to previous works, our approach is not limited to flat terrain nor to a pre-specified gait sequence. Instead, we incorporate the friction cone stability margin, approximate the robot's torque limits, and plan the gait using mixed-integer convex constraints. We experimentally validated our approach on the HyQ robot by traversing different challenging terrains, where non-convexity and flat terrain assumptions might lead to sub-optimal or unstable plans. Our method increases the motion generality while keeping a low computation time.

1904.02341 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Online Risk-Bounded Motion Planning for Autonomous Vehicles in Dynamic Environments

在线风险受限的自主车辆动态环境中的运动规划

Xin Huang, Sungkweon Hong, Andreas Hofmann, Brian C. Williams

发表机构 * MIT Computer Science and Artificial Intelligence Laboratory（麻省理工学院计算机科学与人工智能实验室）

AI总结本文提出了一种在线风险受限的运动规划方法，通过结合意图识别算法和POMDP求解器，生成安全高效的路径规划方案，尤其在无保护左转和变道等复杂环境中表现更优。

Comments Accepted at ICAPS'19. 10 pages, 6 figures, 1 table

详情

AI中文摘要

高效且稳健的自主车辆运动规划面临的关键挑战是理解周围代理的意图。忽略动态环境中其他代理的意图会导致风险或过于保守的规划。本文将运动规划问题建模为部分可观测马尔可夫决策过程（POMDP），并提出一个在线系统，结合意图识别算法和POMDP求解器，为自主车辆生成风险受限的路径规划。意图识别算法利用贝叶斯过滤和预学习的机动运动模型，预测每个代理车辆在有限时域内的混合运动状态。我们实时更新POMDP模型，并使用启发式搜索算法求解，生成具有碰撞概率上界保证的策略。我们证明，与基线方法相比，我们的系统在多个具有挑战性的环境中，能够生成更高效和安全的运动规划。

英文摘要

A crucial challenge to efficient and robust motion planning for autonomous vehicles is understanding the intentions of the surrounding agents. Ignoring the intentions of the other agents in dynamic environments can lead to risky or over-conservative plans. In this work, we model the motion planning problem as a partially observable Markov decision process (POMDP) and propose an online system that combines an intent recognition algorithm and a POMDP solver to generate risk-bounded plans for the ego vehicle navigating with a number of dynamic agent vehicles. The intent recognition algorithm predicts the probabilistic hybrid motion states of each agent vehicle over a finite horizon using Bayesian filtering and a library of pre-learned maneuver motion models. We update the POMDP model with the intent recognition results in real time and solve it using a heuristic search algorithm which produces policies with upper-bound guarantees on the probability of near colliding with other dynamic agents. We demonstrate that our system is able to generate better motion plans in terms of efficiency and safety in a number of challenging environments including unprotected intersection left turns and lane changes as compared to the baseline methods.

1707.09198 2026-06-04 cs.LG cs.AI cs.SY eess.SY math.OC 版本更新

Data-Driven Stochastic Robust Optimization: A General Computational Framework and Algorithm for Optimization under Uncertainty in the Big Data Era

数据驱动的随机稳健优化：大数据时代不确定性优化的通用计算框架和算法

Chao Ning, Fengqi You

发表机构 * Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University（罗伯特·弗雷德里克·史密斯化学与生物分子工程学院，康奈尔大学）

AI总结本文提出了一种数据驱动的随机稳健优化框架，通过双层优化结构基于数据驱动的不确定性模型，结合两阶段随机规划和自适应稳健优化，解决大数据时代下的不确定性优化问题。

详情

DOI: 10.1016/j.compchemeng.2017.12.015
Journal ref: Computers & Chemical Engineering, Volume 111, Pages 115-133, 4 March 2018

AI中文摘要

本文提出了一种新颖的数据驱动随机稳健优化（DDSRO）框架，用于利用带有标签的多类不确定性数据进行不确定性优化。大数据集中的不确定性数据通常来自各种条件，这些条件通过类别标签进行编码。采用狄利克雷过程混合模型和最大似然估计等机器学习方法进行不确定性建模。基于数据驱动的不确定性模型，进一步提出了一种双层优化结构的DDSRO框架。外层优化问题采用两阶段随机规划方法，以在不同数据类别上优化预期目标；自适应稳健优化作为内层问题，确保解决方案的鲁棒性，同时保持计算可行性。进一步开发了一种基于分解的算法，以高效解决由此产生的多级优化问题。通过过程网络设计和规划的案例研究，展示了所提框架和算法的应用性。

英文摘要

A novel data-driven stochastic robust optimization (DDSRO) framework is proposed for optimization under uncertainty leveraging labeled multi-class uncertainty data. Uncertainty data in large datasets are often collected from various conditions, which are encoded by class labels. Machine learning methods including Dirichlet process mixture model and maximum likelihood estimation are employed for uncertainty modeling. A DDSRO framework is further proposed based on the data-driven uncertainty model through a bi-level optimization structure. The outer optimization problem follows a two-stage stochastic programming approach to optimize the expected objective across different data classes; adaptive robust optimization is nested as the inner problem to ensure the robustness of the solution while maintaining computational tractability. A decomposition-based algorithm is further developed to solve the resulting multi-level optimization problem efficiently. Case studies on process network design and planning are presented to demonstrate the applicability of the proposed framework and algorithm.

1903.03948 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Rethinking System Health Management

重新思考系统健康管理

Edward Balaban, Stephen B. Johnson, Mykel J. Kochenderfer

发表机构 * Intelligent Systems Division, NASA Ames Research Center（美国国家航空航天局阿姆斯研究中心智能系统部门）； Dependable System Technologies, LLC（可靠系统技术有限公司）； Jacobs ESSCA Group at NASA Marshall Space Flight Center（美国国家航空航天局马歇尔太空飞行中心Jacobs ESSCA小组）； Department of Aeronautics and Astronautics, Stanford University（斯坦福大学航空与航天系）

AI总结本文提出将系统健康管理与决策制定统一起来，以提高系统运行效率并降低整体复杂性，通过数值示例展示了传统方法的局限性。

Comments Published in the proceedings of the 2018 AAAI Fall Symposium on Integrating Planning, Diagnosis, and Causal Reasoning

1812.05591 2026-06-04 eess.SY cs.AI cs.MA cs.SY 版本更新

TuSeRACT: Turn-Sample-Based Real-Time Traffic Signal Control

TuSeRACT：基于转向的实时交通信号控制

Srishti Dhamija, Pradeep Varakantham

发表机构 * School of Information Systems, Singapore Management University（新加坡管理大学信息学院）

AI总结本文提出TuSeRACT，一种基于转向的实时交通信号控制方法，通过采样转向流量来优化交通信号调度，从而降低车辆等待时间，相比SURTRAC有更优的性能。

详情

AI中文摘要

实时交通信号控制是一个具有挑战性的问题，由于不断变化的交通需求模式、有限的规划时间和各种不确定性来源（例如转向运动、车辆检测）在现实世界中。SURTRAC（可扩展的Urban交通控制）是一种最近开发的交通信号控制方法，它在实时计算中计算减少延误和协调（跨邻近交通灯）的即将到来车辆集群的调度。为了确保在转向引起的不确定性存在下实时响应性，SURTRAC计算调度以最小化预期转向运动的延误，而不是在转向引起的不确定性下最小化预期延误。这种近似确保了实时可处理性，但在存在转向引起的不确定性时会降低解决方案质量。为了解决这一限制，我们引入了TuSeRACT（基于转向的实时交通信号控制），一种分布式基于采样的调度方法用于交通信号控制。与SURTRAC不同，TuSeRACT计算调度以最小化采样转向运动的观察交通下的预期延误，并与邻近交叉口通信流量样本。我们将这种基于采样的调度问题公式化为一个约束程序，并在合成交通网络上经验性地评估了我们的方法。我们的方法在车辆等待时间方面相对于SURTRAC提供了显著更低的平均值。

英文摘要

Real-time traffic signal control is a challenging problem owing to constantly changing traffic demand patterns, limited planning time and various sources of uncertainty (e.g., turn movements, vehicle detection) in the real world. SURTRAC (Scalable URban TRAffic Control) is a recently developed traffic signal control approach which computes delay-minimizing and coordinated (across neighbouring traffic lights) schedules of oncoming vehicle clusters in real time. To ensure real-time responsiveness in the presence of turn-induced uncertainty, SURTRAC computes schedules which minimize the delay for the expected turn movements as opposed to minimizing the expected delay under turn-induced uncertainty. This approximation ensures real-time tractability, but degrades solution quality in the presence of turn-induced uncertainty. To address this limitation, we introduce TuSeRACT (Turn Sample based Real-time trAffic signal ConTrol), a distributed sample-based scheduling approach to traffic signal control. Unlike SURTRAC, TuSeRACT computes schedules that minimize expected delay over sampled turn movements of observed traffic, and communicates samples of traffic outflows to neighbouring intersections. We formulate this sample-based scheduling problem as a constraint program and empirically evaluate our approach on synthetic traffic networks. Our approach provides substantially lower mean vehicular waiting times relative to SURTRAC.

1902.08705 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

A General Framework for Structured Learning of Mechanical Systems

结构机械系统学习的通用框架

Jayesh K. Gupta, Kunal Menda, Zachary Manchester, Mykel J. Kochenderfer

发表机构 * Stanford University（斯坦福大学）

AI总结本文提出了一种通用框架，用于结构化学习机械系统，通过结合先验知识和训练表达式近似器来提高模型的准确性和效率。

Comments 10 pages, 7 figures. First two authors contributed equally. Submitted to IROS/RA-L. Code at https://github.com/sisl/mechamodlearn/

详情

AI中文摘要

学习准确的动力学模型对于优化和顺应性控制机器人系统至关重要。当前使用解析参数化进行白盒建模或使用神经网络进行黑盒建模的方法可能会产生高偏差或高方差。我们提出了一个灵活的灰盒模型，可以无缝地结合可用的先验知识，并在没有时训练具有表达能力的函数近似器。我们提出使用神经网络参数化机械系统，以建模其拉格朗日量和作用在其上的广义力。我们在模拟的驱动双摆上测试了我们的方法。我们展示了我们的方法在数据效率以及基于模型的强化学习中的性能优于朴素的黑盒模型。我们还系统地研究了我们的方法在结合可用的系统先验知识以提高数据效率方面的能力。

英文摘要

Learning accurate dynamics models is necessary for optimal, compliant control of robotic systems. Current approaches to white-box modeling using analytic parameterizations, or black-box modeling using neural networks, can suffer from high bias or high variance. We address the need for a flexible, gray-box model of mechanical systems that can seamlessly incorporate prior knowledge where it is available, and train expressive function approximators where it is not. We propose to parameterize a mechanical system using neural networks to model its Lagrangian and the generalized forces that act on it. We test our method on a simulated, actuated double pendulum. We show that our method outperforms a naive, black-box model in terms of data-efficiency, as well as performance in model-based reinforcement learning. We also conduct a systematic study of our method's ability to incorporate available prior knowledge about the system to improve data efficiency.

1902.10590 2026-06-04 cs.SE cs.AI cs.LG cs.SY eess.SY 版本更新

Architecting Dependable Learning-enabled Autonomous Systems: A Survey

构建可靠的学习自主系统：一项综述

Chih-Hong Cheng, Dhiraj Gulati, Rongjie Yan

发表机构 * fortiss - Research Institute of the Free State of Bavaria, Germany（德国巴伐利亚自由州研究所）； State Key Laboratory of Computer Science, China（中国计算机科学国家重点实验室）

AI总结本文综述了构建可靠学习自主系统的方法，重点在于自动驾驶，讨论了多样冗余、信息融合和运行时监控等技术支柱，并总结了提升深度学习组件可靠性的最新方法，最后提出了研究方向。

1812.06120 2026-06-04 eess.SY cs.AI cs.RO cs.SY 版本更新

Simulation to Scaled City: Zero-Shot Policy Transfer for Traffic Control via Autonomous Vehicles

模拟到缩放城市：通过自动驾驶车辆实现交通控制的零样本策略迁移

Kathy Jang, Eugene Vinitsky, Behdad Chalaki, Ben Remer, Logan Beaver, Andreas Malikopoulos, Alexandre Bayen

发表机构 * University of California, Berkeley（加州大学伯克利分校）； University of Delaware（特拉华大学）

AI总结本文通过深度强化学习训练自动驾驶车辆在环形交叉口的控制策略，并将训练好的策略迁移至缩放智能城市进行测试，发现注入噪声的策略在迁移后表现更佳，实现了交通流的优化。

Comments To be published at the International Conference on Cyber Physical Systems (ICCPS) 2019. 10 pages, 9 figures

详情

AI中文摘要

使用深度强化学习，我们训练了自动驾驶车辆在车队中通过环形交叉口的控制策略。使用Flow库，我们在微仿真器中训练了两种策略：一种在状态和动作空间中注入噪声，另一种则没有。在模拟中，自动驾驶车辆为两种策略都学习出一种涌现的引导行为，即减速以实现更流畅的合并。随后，我们将该策略直接迁移至特拉华大学缩放智能城市（UDSSC）测试平台，该平台是连接和自动化车辆的1:25比例测试场。我们对两种策略在缩放城市中的性能进行了表征。结果显示，无噪声策略经常导致碰撞，仅偶尔实现引导；而注入噪声的策略则始终表现出引导行为且无碰撞，表明噪声有助于零样本策略迁移。此外，迁移后的噪声注入策略在UDSSC中使平均行程时间减少了5%，最大行程时间减少了22%。控制器的视频可在https://sites.google.com/view/iccps-policy-transfer查看。

英文摘要

Using deep reinforcement learning, we train control policies for autonomous vehicles leading a platoon of vehicles onto a roundabout. Using Flow, a library for deep reinforcement learning in micro-simulators, we train two policies, one policy with noise injected into the state and action space and one without any injected noise. In simulation, the autonomous vehicle learns an emergent metering behavior for both policies in which it slows to allow for smoother merging. We then directly transfer this policy without any tuning to the University of Delaware Scaled Smart City (UDSSC), a 1:25 scale testbed for connected and automated vehicles. We characterize the performance of both policies on the scaled city. We show that the noise-free policy winds up crashing and only occasionally metering. However, the noise-injected policy consistently performs the metering behavior and remains collision-free, suggesting that the noise helps with the zero-shot policy transfer. Additionally, the transferred, noise-injected policy leads to a 5% reduction of average travel time and a reduction of 22% in maximum travel time in the UDSSC. Videos of the controllers can be found at https://sites.google.com/view/iccps-policy-transfer.

1711.09048 2026-06-04 cs.AI cs.RO cs.SY eess.SY version update

A Compression-Inspired Framework for Macro Discovery

Francisco M. Garcia, Bruno C. da Silva, Philip S. Thomas

Affiliations: College of Information and Computer Sciences; Department of Computer Science; University of Massachusetts Amherst; Federal University Rio Grande do Sul

AI summary: This paper proposes a compression-inspired framework for macro discovery that identifies recurring patterns in trajectories obtained from high-performing policies, helping reinforcement learning agents use earlier experience to quickly solve related new tasks.

Comments Accepted as Extended Abstract, AAMAS, 2019

1902.08274 2026-06-04 cs.AI cs.LG cs.MA cs.SY eess.SY version update

An Online Decision-Theoretic Pipeline for Responder Dispatch

Ayan Mukhopadhyay, Geoffrey Pettet, Chinmaya Samal, Abhishek Dubey, Yevgeniy Vorobeychik

Affiliations: Vanderbilt University; Washington University

AI summary: This paper proposes an online decision-theoretic pipeline for effective emergency response in which models are updated from streaming data, improving response efficiency and reducing computation time.

Comments Appeared in ICCPS 2019

Details

DOI: 10.1145/3302509.3311055

Abstract

The problem of dispatching emergency responders to service traffic accidents, fire, distress calls and crimes plagues urban areas across the globe. While such problems have been extensively looked at, most approaches are offline. Such methodologies fail to capture the dynamically changing environments under which critical emergency response occurs, and therefore, fail to be implemented in practice. Any holistic approach towards creating a pipeline for effective emergency response must also look at other challenges that it subsumes - predicting when and where incidents happen and understanding the changing environmental dynamics. We describe a system that collectively deals with all these problems in an online manner, meaning that the models get updated with streaming data sources. We highlight why such an approach is crucial to the effectiveness of emergency response, and present an algorithmic framework that can compute promising actions for a given decision-theoretic model for responder dispatch. We argue that carefully crafted heuristic measures can balance the trade-off between computational time and the quality of solutions achieved and highlight why such an approach is more scalable and tractable than traditional approaches. We also present an online mechanism for incident prediction, as well as an approach based on recurrent neural networks for learning and predicting environmental features that affect responder dispatch. We compare our methodology with prior state-of-the-art and existing dispatch strategies in the field, which show that our approach results in a reduction in response time with a drastic reduction in computational time.

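1812.07084 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY version update

Learning Constraints from Demonstrations

Glen Chou, Dmitry Berenson, Necmiye Ozay

Affiliations: Dept. of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, 48109, USA

AI summary: This work proposes a method for learning unknown constraints from demonstrations. Given task demonstrations, the cost function, and the system dynamics and control constraints, it uses hit-and-run sampling to obtain lower-cost but unsafe trajectories and an integer program to recover a consistent representation of the unsafe set. It also analyzes theoretically which subset of the constraint can be learned from safe demonstrations.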

Comments Presented at the Workshop on the Algorithmic Foundations of Robotics (WAFR), 2018, Mérida, Mexico

1709.04794 2026-06-04 cs.AI cs.NA cs.PF math.NA version update

Fast semi-supervised discriminant analysis for binary classification of large data-sets

Joris Tavernier, Jaak Simm, Karl Meerbergen, Joerg Kurt Wegner, Hugo Ceulemans, Yves Moreau

Affiliations: Department of Computer Science, KU Leuven

AI summary: This paper proposes and analyzes three scalable semi-supervised discriminant analysis methods that exploit data sparsity and the shift invariance of Krylov subspaces to improve the efficiency and performance of binary classification on large data sets.

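1610.05202 2026-06-04 cs.LG cs.AI cs.DC cs.SY eess.SY stat.ML version update

Decentralized Collaborative Learning of Personalized Models over Networks

Paul Vanhaesebrouck, Aurélien Bellet, Marc Tommasi

Affiliations: INRIA

AI summary: This paper studies how agents in a collaborative peer-to-peer network can improve their locally trained models by communicating with other agents that have similar objectives. It proposes two asynchronous gossip algorithms and a decentralized algorithm based on ADMM.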

Comments To appear in the Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017)

1606.02421 2026-06-04 stat.ML cs.AI cs.DC cs.LG cs.SY eess.SY version update

Approximate Dynamic Programming for Planning a Ride-Sharing System Using Autonomous Electric Vehicles

Lina Al-Kanj, Juliana Nascimento, Warren B. Powell

Affiliations: Operations Research and Financial Engineering Department

AI summary: This paper studies the dispatch, surge pricing, and fleet-size planning problems in a ride-sharing system of autonomous electric vehicles. It uses approximate dynamic programming to optimize vehicle assignment, recharging, and repositioning decisions, applies hierarchical aggregation to improve value function estimates, and uses an adaptive learning approach to set the price of each trip.

Details

Abstract

Within a decade, almost every major auto company, along with fleet operators such as Uber, have announced plans to put autonomous vehicles on the road. At the same time, electric vehicles are quickly emerging as a next-generation technology that is cost effective, in addition to offering the benefits of reducing the carbon footprint. The combination of a centrally managed fleet of driverless vehicles, along with the operating characteristics of electric vehicles, is creating a transformative new technology that offers significant cost savings with high service levels. This problem involves a dispatch problem for assigning riders to cars, a surge pricing problem for deciding on the price per trip and a planning problem for deciding on the fleet size. We use approximate dynamic programming to develop high-quality operational dispatch strategies to determine which car is best for a particular trip, when a car should be recharged, and when it should be re-positioned to a different zone which offers a higher density of trips. We prove that the value functions are monotone in the battery and time dimensions and use hierarchical aggregation to get better estimates of the value functions with a small number of observations. Then, surge pricing is discussed using an adaptive learning approach to decide on the price for each trip. Finally, we discuss the fleet size problem which depends on the previous two problems.

1803.00444 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY stat.ML version update

Inverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal Modeling

Adrian Šošić, Elmar Rueckert, Jan Peters, Abdelhak M. Zoubir, Heinz Koeppl

Affiliations: Signal Processing Group; Institute for Robotics and Cognitive Systems; Autonomous Systems Labs; Bioinspired Communication Systems

AI summary: This paper proposes an inverse reinforcement learning method based on nonparametric spatio-temporal subgoal modeling. It explains a single trajectory more efficiently through local context, yielding a more compact behavior representation, and builds an implicit intentional model to predict behavior in unobserved situations, which lets it handle changing intentions and active learning scenarios.

Comments 45 pages, 14 figures; ### Version 3 ### published in the Journal of Machine Learning Research

Details

Abstract

Advances in the field of inverse reinforcement learning (IRL) have led to sophisticated inference frameworks that relax the original modeling assumption of observing an agent behavior that reflects only a single intention. Instead of learning a global behavioral model, recent IRL methods divide the demonstration data into parts, to account for the fact that different trajectories may correspond to different intentions, e.g., because they were generated by different domain experts. In this work, we go one step further: using the intuitive concept of subgoals, we build upon the premise that even a single trajectory can be explained more efficiently locally within a certain context than globally, enabling a more compact representation of the observed behavior. Based on this assumption, we build an implicit intentional model of the agent's goals to forecast its behavior in unobserved situations. The result is an integrated Bayesian prediction framework that significantly outperforms existing IRL solutions and provides smooth policy estimates consistent with the expert's plan. Most notably, our framework naturally handles situations where the intentions of the agent change over time and classical IRL algorithms fail. In addition, due to its probabilistic nature, the model can be straightforwardly applied in active learning scenarios to guide the demonstration process of the expert.

1811.12211 2026-06-04 eess.SP cs.AI cs.SY eess.SY version update

Particle Probability Hypothesis Density Filter based on Pairwise Markov Chains

Jiangyi Liu, Chunping Wang, Wei Wang

Affiliations: Electronic and Optical Engineering Department, Shijiazhuang Campus of Army Engineering University; China Huayin Ordnance Test Center

AI summary: This paper proposes a particle probability hypothesis density filter based on the pairwise Markov chain model (PF-PMC-PHD) for nonlinear multi-target tracking systems. By relaxing the independence assumptions of the traditional HMC model, it improves tracking performance.

1811.11259 2026-06-04 cs.LG cs.AI cs.DS cs.SY eess.SY stat.ML version update

Scaling Configuration of Energy Harvesting Sensors with Reinforcement Learning

Francesco Fraternali, Bharathan Balaji, Rajesh Gupta

Affiliations: University of California, San Diego; University of California, Los Angeles

AI summary: This paper proposes using reinforcement learning to automatically configure the sampling rate of indoor solar-panel-based energy harvesting sensors. By shortening the training phase and reducing computational requirements, it enables fast deployment and scaling to large deployments, improving sensor data collection while avoiding energy depletion.

Comments 7 pages, 5 figures

Details

DOI: 10.1145/3279755.3279760
Journal ref: ENSsys '18: International Workshop on Energy Harvesting & Energy-Neutral Sensing Systems, November 4, 2018, Shenzhen, China

Abstract

With the advent of the Internet of Things (IoT), an increasing number of energy harvesting methods are being used to supplement or supplant battery based sensors. Energy harvesting sensors need to be configured according to the application, hardware, and environmental conditions to maximize their usefulness. As of today, the configuration of sensors is either manual or heuristics based, requiring valuable domain expertise. Reinforcement learning (RL) is a promising approach to automate configuration and efficiently scale IoT deployments, but it is not yet adopted in practice. We propose solutions to bridge this gap: reduce the training phase of RL so that nodes are operational within a short time after deployment and reduce the computational requirements to scale to large deployments. We focus on configuration of the sampling rate of indoor solar panel based energy harvesting sensors. We created a simulator based on 3 months of data collected from 5 sensor nodes subject to different lighting conditions. Our simulation results show that RL can effectively learn energy availability patterns and configure the sampling rate of the sensor nodes to maximize the sensing data while ensuring that energy storage is not depleted. The nodes can be operational within the first day by using our methods. We show that it is possible to reduce the number of RL policies by using a single policy for nodes that share similar lighting conditions.

1811.09914 2026-06-04 eess.SY cs.AI cs.MA cs.RO cs.SY version update

RADMPC: A Fast Decentralized Approach for Chance-Constrained Multi-Vehicle Path-Planning

Aaron Huang, Benjamin J. Ayton, Brian C. Williams

Affiliations: Computer Science and Artificial Intelligence Laboratory; Massachusetts Institute of Technology

AI summary: This paper proposes a fast chance-constrained multi-vehicle path-planning approach built on the decentralized path-planning method RADMPC. It evaluates vehicle interactions to determine which sets of vehicles must be planned in a coupled manner and applies IRA to these smaller sets to quickly plan safe paths, significantly reducing computation time.

Details

Abstract

Robust multi-vehicle path-planning is important for ensuring the safety of multi-vehicle systems in applications like transportation, search and rescue, and robotic exploration. Chance-constrained methods like Iterative Risk Allocation (IRA) have been developed for situations where environmental disturbances are unbounded. However, chance-constrained methods for the multi-vehicle case generally use centralized strategies where the vehicle set is planned with couplings between all vehicle pairs. This approach is intractable as fleet size increases because computation time is exponential with respect to the number of vehicles being planned over due to a polynomial increase in coupling constraints between vehicle pairs. We present a faster approach for chance-constrained multi-vehicle path-planning that relies upon a decentralized path-planning method called Risk-Aware Decentralized Model Predictive Control (RADMPC) to rapidly approximate a centralized IRA approach. The RADMPC approximation is evaluated for vehicle interactions to determine the vehicle sets that should be planned in a coupled manner. Applying IRA to the smaller vehicle sets determined from the RADMPC approximation rapidly plans safe paths for the entire fleet. A Monte Carlo simulation analysis demonstrates the correctness of our approach and a significant improvement in computation time compared to a centralized IRA approach.
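The grouping step in this abstract, deciding which vehicles need to be planned together, can be illustrated with a small sketch. It assumes, purely for illustration, that interactions are given as a list of vehicle-index pairs and that coupled sets are the connected components of that interaction graph; the abstract does not specify how RADMPC actually forms the sets, and the function name `coupled_vehicle_sets` is hypothetical.

```python
def coupled_vehicle_sets(num_vehicles, interacting_pairs):
    """Group vehicles into sets linked by (assumed) pairwise interactions.

    Illustrative only: treats coupled sets as connected components of the
    interaction graph, computed with a union-find structure.
    """
    parent = list(range(num_vehicles))

    def find(v):
        # Follow parent links to the root, halving the path as we go.
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for a, b in interacting_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two sets

    groups = {}
    for v in range(num_vehicles):
        groups.setdefault(find(v), []).append(v)
    return sorted(groups.values())


print(coupled_vehicle_sets(6, [(0, 1), (1, 2), (4, 5)]))
# → [[0, 1, 2], [3], [4, 5]]
```

With six vehicles, a fully centralized plan would couple all 15 vehicle pairs, while the three groups above contain only 3 + 0 + 1 = 4 pairs between them, which is the kind of reduction the abstract attributes to planning over smaller sets.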

1811.06447 2026-06-04 cs.AI cs.SY eess.SY version update

Adversarial Resilience Learning - Towards Systemic Vulnerability Analysis for Large and Complex Systems

Lars Fischer, Jan-Menno Memmen, Eric MSP Veith, Martin Tröschel

AI summary: This paper introduces the concept of Adversarial Resilience Learning (ARL) for modeling, training, and analyzing artificial neural networks as representations of competing agents in complex systems. By simulating attacker and defender roles in a power system, ARL provides an adaptive, repeatable, behavior-based testing method that can detect previously unknown attack vectors.

Comments 10 pages

1811.05788 2026-06-04 cs.LG cs.AI cs.SY eess.SY version update

Learning to Compensate Photovoltaic Power Fluctuations from Images of the Sky by Imitating an Optimal Policy

Robin Spiess, Felix Berkenkamp, Jan Poland, Andreas Krause

Affiliations: Department of Computer Science, ETH Zurich; ABB Corporate Research, Switzerland

AI summary: This paper presents a deep learning approach that uses images of the sky to predictively compensate photovoltaic power fluctuations and reduce battery stress, training a neural network by imitation learning to approximate the optimal policy.

Comments 7 pages, 7 figures

Details

Abstract

The energy output of photovoltaic (PV) power plants depends on the environment and thus fluctuates over time. As a result, PV power can cause instability in the power grid, in particular when increasingly used. Limiting the rate of change of the power output is a common way to mitigate these fluctuations, often with the help of large batteries. A reactive controller that uses these batteries to compensate ramps works in practice, but causes stress on the battery due to a high energy throughput. In this paper, we present a deep learning approach that uses images of the sky to compensate power fluctuations predictively and reduces battery stress. In particular, we show that the optimal control policy can be computed using information that is only available in hindsight. Based on this, we use imitation learning to train a neural network that approximates this hindsight-optimal policy, but uses only currently available sky images and sensor data. We evaluate our method on a large dataset of measurements and images from a real power plant and show that the trained policy reduces stress on the battery.
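The reactive baseline mentioned in this abstract, a ramp-rate limit backed by a battery, can be sketched in a few lines. This is a minimal illustration under stated assumptions (unit time steps, an ideal battery with no capacity or efficiency limits, and hypothetical function names), not the controller evaluated in the paper.

```python
def ramp_limited_output(pv_power, max_ramp):
    """Return (grid_output, battery_power) for a reactive ramp-rate limiter.

    battery_power[t] > 0 means the battery discharges to cover a shortfall;
    battery_power[t] < 0 means it charges to absorb a surplus.
    """
    grid_output = []
    battery_power = []
    previous = pv_power[0]  # assume the plant starts in steady state
    for p in pv_power:
        # Let the grid output move by at most max_ramp per time step.
        target = min(max(p, previous - max_ramp), previous + max_ramp)
        grid_output.append(target)
        battery_power.append(target - p)
        previous = target
    return grid_output, battery_power


def battery_throughput(battery_power):
    """Total energy moved through the battery, assuming unit time steps."""
    return sum(abs(b) for b in battery_power)


grid, batt = ramp_limited_output([100, 100, 40, 40, 40, 100], max_ramp=20)
print(grid)                      # → [100, 100, 80, 60, 40, 60]
print(batt)                      # → [0, 0, 40, 20, 0, -40]
print(battery_throughput(batt))  # → 100
```

The throughput figure is the quantity a predictive controller would try to lower: if the drop at step 2 were anticipated, the output could begin ramping down earlier and the battery would need to move less energy.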

1811.04584 2026-06-04 cs.AI cs.SY eess.SY version update

Navigating Assistance System for Quadcopter with Deep Reinforcement Learning

Tung-Cheng Wu, Shau-Yin Tseng, Chin-Feng Lai, Chia-Yu Ho, Ying-Hsun Lai

Affiliations: National Cheng Kung University; Research Laboratories; Industrial Technology Research Institute; Department of Computer Science and Information Engineering; National Taitung University

AI summary: This paper proposes a deep-reinforcement-learning-based navigation assistance system for quadcopters with obstacle avoidance. Two functional modules handle path navigation and collision avoidance respectively, and experiments show a collision rate of 14% over 500 flights.

Comments conference

1811.03853 2026-06-04 cs.LG cs.AI cs.SY eess.SY version update

Sample-Efficient Policy Learning based on Completely Behavior Cloning

Qiming Zou, Ling Wang, Ke Lu, Yu Li

Affiliations: Department of Computer Science and Technology, Harbin Institute of Technology, China; Department of Management Science and Engineering, Anhui University of Technology, China

AI summary: This paper proposes PLCBC, a policy initialization algorithm based on complete behavior cloning. It converts model predictive control into a piecewise affine function and expresses it with a neural network, achieving complete cloning without training and thereby improving the efficiency and convergence of policy learning.

1803.08287 2026-06-04 eess.SY cs.AI cs.LG cs.RO cs.SY version update

Learning-based Model Predictive Control for Safe Exploration

Torsten Koller, Felix Berkenkamp, Matteo Turchetta, Andreas Krause

Affiliations: Vector Institute; Max Planck ETH Center for Learning Systems

AI summary: This paper presents a learning-based model predictive control scheme that uses a Gaussian process prior to construct provably accurate confidence intervals on predicted trajectories, providing provable high-probability safety guarantees for safe and efficient exploration and learning of dynamic systems.

Comments Proc. of the Conference on Decision and Control, 2018

Details

Abstract

Learning-based methods have been successful in solving complex control tasks without significant prior knowledge about the system. However, these methods typically do not provide any safety guarantees, which prevents their use in safety-critical, real-world applications. In this paper, we present a learning-based model predictive control scheme that can provide provable high-probability safety guarantees. To this end, we exploit regularity assumptions on the dynamics in terms of a Gaussian process prior to construct provably accurate confidence intervals on predicted trajectories. Unlike previous approaches, we do not assume that model uncertainties are independent. Based on these predictions, we guarantee that trajectories satisfy safety constraints. Moreover, we use a terminal set constraint to recursively guarantee the existence of safe control actions at every iteration. In our experiments, we show that the resulting algorithm can be used to safely and efficiently explore and learn about dynamic systems.

1811.00426 2026-06-04 cs.RO cs.AI cs.SY eess.SY version update

Improving the Modularity of AUV Control Systems using Behaviour Trees

Christopher Iliffe Sprague, Özer Özkahraman, Andrea Munafo, Rachel Marlow, Alexander Phillips, Petter Ögren

Affiliations: Robotics, Perception and Learning Lab; Royal Institute of Technology; National Oceanography Centre

AI summary: This paper shows how behaviour trees can be used to design modular, versatile, and robust control architectures for mission-critical systems, in particular autonomous underwater vehicles. It emphasizes robustness for system safety, versatility for carrying out a variety of missions, and the importance of modularity in combining the two.

Comments Submitted to 2018 IEEE OES Autonomous Underwater Vehicle Symposium

1810.13072 2026-06-04 cs.AI cs.RO cs.SY eess.SY version update

Formal Verification of Neural Network Controlled Autonomous Systems

Xiaowu Sun, Haitham Khedr, Yasser Shoukry

Affiliations: Department of Electrical and Computer Engineering, University of Maryland, College Park

AI summary: This paper studies how to formally verify the safety of an autonomous robot whose neural network controller processes LiDAR images. It constructs a finite state abstraction and uses reachability analysis to compute safe initial conditions, proposes a polynomial-time algorithm that partitions the workspace and computes the corresponding affine imaging functions, uses an SMC encoding to analyze the neural network's behavior, and demonstrates the algorithms' efficiency in numerical simulations.

Details

Abstract

In this paper, we consider the problem of formally verifying the safety of an autonomous robot equipped with a Neural Network (NN) controller that processes LiDAR images to produce control actions. Given a workspace that is characterized by a set of polytopic obstacles, our objective is to compute the set of safe initial conditions such that a robot trajectory starting from these initial conditions is guaranteed to avoid the obstacles. Our approach is to construct a finite state abstraction of the system and use standard reachability analysis over the finite state abstraction to compute the set of safe initial states. The first technical problem in computing the finite state abstraction is to mathematically model the imaging function that maps the robot position to the LiDAR image. To that end, we introduce the notion of imaging-adapted sets as partitions of the workspace in which the imaging function is guaranteed to be affine. We develop a polynomial-time algorithm to partition the workspace into imaging-adapted sets along with computing the corresponding affine imaging functions. Given this workspace partitioning, the discrete-time linear dynamics of the robot, and a pre-trained NN controller with Rectified Linear Unit (ReLU) nonlinearity, the second technical challenge is to analyze the behavior of the neural network. To that end, we utilize a Satisfiability Modulo Convex (SMC) encoding to enumerate all the possible segments of different ReLUs. SMC solvers then use a Boolean satisfiability solver and a convex programming solver and decompose the problem into smaller subproblems. To accelerate this process, we develop a pre-processing algorithm that can rapidly prune the space of feasible ReLU segments. Finally, we demonstrate the efficiency of the proposed algorithms using numerical simulations with increasing complexity of the neural network controller.

1810.12429 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML version update

Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation

Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou

Affiliations: The University of Texas at Austin; Google Brain

AI summary: This paper proposes a new off-policy estimation method that applies importance sampling directly to the stationary state-visitation distributions, avoiding the exploding variance of existing estimators. Its key contribution is a new approach to estimating the density ratio of two stationary distributions, with a closed-form solution derived for the RKHS case.

Comments 21 pages, 5 figures, NIPS 2018 (spotlight)

Details

Abstract

We consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy. Importance sampling (IS) has been a key technique to derive (nearly) unbiased estimators, but is known to suffer from an excessively high variance in long-horizon problems. In the extreme case of infinite-horizon problems, the variance of an IS-based estimator may even be unbounded. In this paper, we propose a new off-policy estimation method that applies IS directly on the stationary state-visitation distributions to avoid the exploding variance issue faced by existing estimators. Our key contribution is a novel approach to estimating the density ratio of two stationary distributions, with trajectories sampled from only the behavior distribution. We develop a mini-max loss function for the estimation problem, and derive a closed-form solution for the case of RKHS. We support our method with both theoretical and empirical analyses.
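The variance problem this abstract refers to can be seen in a toy calculation. The sketch below assumes a single-state problem with independent actions at every step, so the trajectory weight is a product of independent per-step ratios with mean 1; the function name `is_weight_variance` and the example policies are illustrative and not taken from the paper.

```python
def is_weight_variance(target, behavior, horizon):
    """Variance of the product of per-step importance ratios.

    Assumes a single state and independent actions, so the ratios
    target[a] / behavior[a] are i.i.d. with mean 1 under the behavior
    policy. Every behavior probability must be positive.
    """
    # E[ratio^2] under the behavior policy.
    second_moment = sum(b * (t / b) ** 2 for t, b in zip(target, behavior))
    # Var(product) = E[ratio^2]^horizon - (E[ratio])^(2 * horizon).
    return second_moment ** horizon - 1.0


target = (0.8, 0.2)
behavior = (0.5, 0.5)
for horizon in (1, 10, 50):
    print(horizon, is_weight_variance(target, behavior, horizon))
```

Here the per-step second moment is 1.36, so the variance of the trajectory weight grows geometrically with the horizon. A weight defined on the stationary state distribution does not involve such a product, which is the motivation the abstract gives for the proposed estimator.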

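1810.09729 2026-06-04 cs.RO cs.AI cs.SY eess.SY version update

Design Challenges of Multi-UAV Systems in Cyber-Physical Applications: A Comprehensive Survey, and Future Directions

Reza Shakeri, Mohammed Ali Al-Garadi, Ahmed Badawy, Amr Mohamed, Tamer Khattab, Abdulla Al-Ali, Khaled A. Harras, Mohsen Guizani

Affiliations: Carnegie Mellon University Qatar Campus

AI summary: This survey reviews the key design challenges of multi-UAV systems in cyber-physical applications. It covers core methods for coverage and tracking of targets and infrastructure objects, energy-efficient navigation, and machine-learning-based image analysis, and it outlines advanced algorithms and future research directions for fine-grained cyber-physical applications.

Details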

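Discretizing Logged Interaction Data Biases Learning for Decision-Making

Peter Schulam, Suchi Saria

Affiliations: Johns Hopkins University

AI summary: This paper studies how discretizing irregularly spaced time series data affects the training of decision-making models, shows that discretization introduces bias, and proposes using continuous-time models to avoid the problem.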

Comments This is a standalone short paper describing a new type of bias that can arise when learning from time series data for sequential decision-making problems

1809.09261 2026-06-04 cs.AI cs.LG cs.SY eess.SY version update

Resilient Computing with Reinforcement Learning on a Dynamical System: Case Study in Sorting

Aleksandra Faust, James B. Aimone, Conrad D. James, Lydia Tapia

Affiliations: Google Brain, Mountain View, CA, USA; Sandia National Labs, Albuquerque, NM, USA

AI summary: This paper formulates computation as a feedback-control problem, solves the resulting sequential decision-making problem with reinforcement learning, and uses a sorting case study to show that this resilient computing approach can overcome limitations of conventional programming.

Comments 11 pages, accepted to CDC 2018. Here with additional evaluations

Details

Abstract

Robots and autonomous agents often complete goal-based tasks with limited resources, relying on imperfect models and sensor measurements. In particular, reinforcement learning (RL) and feedback control can be used to help a robot achieve a goal. Taking advantage of this body of work, this paper formulates general computation as a feedback-control problem, which allows the agent to autonomously overcome some limitations of standard procedural language programming: resilience to errors and early program termination. Our formulation considers computation to be trajectory generation in the program's variable space. The computing then becomes a sequential decision making problem, solved with reinforcement learning (RL), and analyzed with Lyapunov stability theory to assess the agent's resilience and progression to the goal. We do this through a case study on a quintessential computer science problem, array sorting. Evaluations show that our RL sorting agent makes steady progress to an asymptotically stable goal, is resilient to faulty components, and performs fewer array manipulations than traditional Quicksort and Bubble sort.

1809.07098 2026-06-04 cs.AI cs.LG cs.MA cs.NE cs.SY eess.SY version update

Novelty-organizing team of classifiers in noisy and dynamic environments

Danilo Vasconcellos Vargas, Hirotaka Takano, Junichi Murata

Affiliations: Graduate School of Information Science and Electrical Engineering, Kyushu University, Fukuoka, Japan; Faculty of Information Science

AI summary: This work presents the Novelty-Organizing Team of Classifiers (NOTC), which works effectively in noisy and dynamic environments, and validates it on the continuous-action mountain car problem and its variants, showing NOTC's performance advantage even though it needs some time to bootstrap.

Details

DOI: 10.1109/CEC.2015.7257254
Journal ref: 2015 IEEE Congress on Evolutionary Computation (CEC)

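AI abstract

In the real world, the environment changes constantly and input variables are affected by noise, yet few algorithms can work under these conditions. Here, the Novelty-Organizing Team of Classifiers (NOTC) is applied to the continuous-action mountain car problem and two variations of it: the noisy mountain car and the unstable-weather mountain car. These problems account for noise and for changes in the problem dynamics, respectively. NOTC is also compared with NeuroEvolution of Augmenting Topologies (NEAT) on these problems, revealing a trade-off between the two approaches. Although NOTC achieves the best performance on all problems, NEAT needs fewer trials to converge. NOTC is shown to perform better because it divides the input space into easier problems. Unfortunately, this division of the input space also takes some time to bootstrap.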

Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning

Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann

Affiliations: School of Computer Science, University of Lincoln; Department of Electrical Engineering, Technische Universität Darmstadt

AI summary: This paper proposes simple communication protocols that deep reinforcement learning can exploit to learn decentralized control policies in a multi-robot swarm environment. The protocols encode local neighborhood relations as histograms and can transmit task-specific information, such as the shortest distance and direction to a target, to solve collaborative tasks.

Comments 13 pages, 4 figures, version 2, accepted at ANTS 2018

Details

Abstract

Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. While it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building and building a communication link. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.
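The histogram idea in this abstract can be illustrated with a short sketch. It assumes 2D agent positions and a histogram over neighbor distances only; the paper's protocols may bin other quantities as well, and the function name and parameters here are hypothetical.

```python
import math


def neighbor_distance_histogram(agent, others, comm_range, num_bins):
    """Count neighbors within comm_range of `agent`, binned by distance."""
    counts = [0] * num_bins
    bin_width = comm_range / num_bins
    for other in others:
        distance = math.dist(agent, other)
        if distance < comm_range:
            # Clamp the index to guard against floating-point rounding.
            index = min(int(distance / bin_width), num_bins - 1)
            counts[index] += 1
    return counts


others = [(1.0, 0.0), (0.0, 3.0), (3.0, 4.0), (10.0, 0.0)]
print(neighbor_distance_histogram((0.0, 0.0), others, comm_range=6.0, num_bins=3))
# → [1, 1, 1]
```

One reason such an encoding suits a learned policy is that the output has a fixed length regardless of how many neighbors are in range, so it can be fed directly to a fixed-size network input.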

1709.05077 2026-06-04 cs.AI cs.SY eess.SY version update

Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning

Yuanlong Li, Yonggang Wen, Kyle Guan, Dacheng Tao

Affiliations: School of Computer Science and Engineering, Nanyang Technological University; Bell Labs, Nokia

AI summary: This paper proposes using data center monitoring data to optimize the cooling control policy. It designs an end-to-end cooling control algorithm within a deep reinforcement learning framework, achieving about 11% cooling cost savings on a simulation platform and about 15% cooling energy savings on a real data trace.

Details

Abstract

The cooling system plays a critical role in a modern data center (DC). Developing an optimal control policy for a DC cooling system is a challenging task. The prevailing approaches often rely on approximating system models that are built upon the knowledge of mechanical cooling, electrical and thermal management, which are difficult to design and may lead to sub-optimal or unstable performance. In this paper, we propose utilizing the large amount of monitoring data in DC to optimize the control policy. To do so, we cast the cooling control policy design into an energy cost minimization problem with temperature constraints, and tap it into the emerging deep reinforcement learning (DRL) framework. Specifically, we propose an end-to-end cooling control algorithm (CCA) that is based on the actor-critic framework and an off-policy offline version of the deep deterministic policy gradient (DDPG) algorithm. In the proposed CCA, an evaluation network is trained to predict an energy cost counter penalized by the cooling status of the DC room, and a policy network is trained to predict optimized control settings when given the current load and weather information. The proposed algorithm is evaluated on the EnergyPlus simulation platform and on a real data trace collected from the National Super Computing Centre (NSCC) of Singapore. Our results show that the proposed CCA can achieve about 11% cooling cost saving on the simulation platform compared with a manually configured baseline control algorithm. In the trace-based study, we propose a de-underestimation (DUE) validation mechanism as we cannot directly test the algorithm on a real DC. Even though the results with DUE are conservative, we can still achieve about 15% cooling energy saving on the NSCC data trace if we set the inlet temperature threshold at 26.6 degrees Celsius.
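
The "energy cost penalized by the cooling status" can be illustrated with a small sketch. The linear penalty form and the `penalty_weight` value below are assumptions made for illustration only; the abstract does not specify how the penalty is computed. The 26.6 degree threshold is the one quoted in the abstract.

```python
def penalized_cost(energy_kwh, inlet_temps_c, threshold_c=26.6, penalty_weight=10.0):
    """Energy cost plus a penalty for every degree an inlet exceeds the threshold.

    Hypothetical cost signal: `penalty_weight` and the linear penalty are
    illustrative assumptions, not values taken from the paper.
    """
    violation = sum(max(0.0, t - threshold_c) for t in inlet_temps_c)
    return energy_kwh + penalty_weight * violation

# All inlets below the threshold: the cost is just the energy term.
print(penalized_cost(100.0, [25.0, 26.0]))
# One inlet above the threshold: the cost grows with the size of the violation.
print(penalized_cost(100.0, [25.0, 27.6]))
```

A critic trained to predict a signal of this shape steers the actor away from settings that save energy only by letting inlet temperatures drift above the limit.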

1807.03769 2026-06-04 math.OC cs.AI cs.LG cs.SY eess.SY stat.ML (updated version)

Kernel-Based Learning for Smart Inverter Control

Aditie Garg, Mana Jalali, Vassilis Kekatos, Nikolaos Gatsis

Affiliations: Dept. of ECE, Virginia Tech; Dept. of ECE, Un. of Texas at San Antonio

AI summary: This paper puts forth nonlinear inverter control policies. Drawing an analogy to multi-task learning, it poses reactive control as a kernel-based regression task and, using a linearized grid model and anticipated data scenarios, jointly designs the inverter rules at the feeder level to minimize voltage deviations and ohmic losses.

Comments Submitted to the 2018 IEEE Global Signal and Information Processing Conf., Symposium on Smart Energy Infrastructures

Abstract

Distribution grids are currently challenged by frequent voltage excursions induced by intermittent solar generation. Smart inverters have been advocated as a fast-responding means to regulate voltage and minimize ohmic losses. Since optimal inverter coordination may be computationally challenging and preset local control rules are subpar, the approach of customized control rules designed in a quasi-static fashion features as a golden middle. Departing from affine control rules, this work puts forth non-linear inverter control policies. Drawing analogies to multi-task learning, reactive control is posed as a kernel-based regression task. Leveraging a linearized grid model and given anticipated data scenarios, inverter rules are jointly designed at the feeder level to minimize a convex combination of voltage deviations and ohmic losses via a linearly-constrained quadratic program. Numerical tests using real-world data on a benchmark feeder demonstrate that nonlinear control rules driven also by a few non-local readings can attain near-optimal performance.

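1807.02297 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML (updated version)

Combinatorial Bandits for Incentivizing Agents with Dynamic Preferences

Tanner Fiez, Shreyas Sekar, Liyuan Zheng, Lillian J. Ratliff

Affiliations: Electrical Engineering Department, University of Washington

AI summary: This paper proposes a multi-armed bandit framework for matching incentives to users in resource-constrained settings. It combines greedy matching, the UCB algorithm, and Markov chain mixing times, analyzes regret theoretically, and validates performance on synthetic and real-world examples.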

Comments Published as a conference paper in Conference on Uncertainty in Artificial Intelligence (UAI) 2018

1807.00553 2026-06-04 cs.LG cs.AI cs.SY eess.SY math.DS stat.ML (updated version)

A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics

Roel Dobbe, Sarah Dean, Thomas Gilbert, Nitin Kohli

Affiliations: Department of Electrical Engineering and Computer Sciences, University of California Berkeley, USA; Department of Rhetoric, University of California Berkeley, USA; School of Information, University of California Berkeley, USA

AI summary: This paper examines the roots of bias in automated decision-making, treating technical bias as an epistemological problem and emergent bias as a dynamical feedback phenomenon. It argues for reflection on epistemology and for value-sensitive design methods to improve decision systems.

Comments Presented at the 2018 Workshop on Fairness, Accountability and Transparency in Machine Learning during ICML 2018, Stockholm, Sweden

1711.10868 2026-06-04 eess.SY cs.AI cs.SY math.OC (updated version)

La production de nitrites lors de la dénitrification des eaux usées par biofiltration - Stratégie de contrôle et de réduction des concentrations résiduelles (Nitrite production during wastewater denitrification by biofiltration: a strategy for controlling and reducing residual concentrations)

Vincent Rocher, Cédric Join, Stéphane Mottelet, Jean Bernier, Sabrina Rechdaoui-Guérin, Sam Azimi, Paul Lessard, André Pauss, Michel Fliess

Affiliations: SIAAP (Syndicat Interdépartemental pour l'Assainissement de l'Agglomération Parisienne); CRAN (CNRS, UMR 7039); TIMR (EA 4297); Département de génie civil et de génie des eaux, Université Laval; LIX (CNRS, UMR 7161); AL.I.E.N. (ALgèbre pour Identification & Estimation Numériques)

AI summary: Within the MOCOPEE program, this study investigates how nitrite is produced during wastewater denitrification and develops measurement and control tools to reduce on-site nitrite concentrations, using a model-free control strategy to improve denitrification.

Comments in French, Journal of Water Science, to appear

DOI: 10.7202/1047053ar
Journal ref: Revue des Sciences de l'Eau, 31(1), 2018, 61-73

Abstract

The recent popularity of post-denitrification processes in the greater Paris area wastewater treatment plants has caused a resurgence of the presence of nitrite in the Seine river. Controlling the production of nitrite during the post-denitrification has thus become a major technical issue. Research studies have been led in the MOCOPEE program (www.mocopee.com) to better understand the underlying mechanisms behind the production of nitrite during wastewater denitrification and to develop technical tools (measurement and control solutions) to assist on-site reductions of nitrite productions. Prior studies have shown that typical methanol dosage strategies produce a varying carbon-to-nitrogen ratio in the reactor, which in turn leads to unstable nitrite concentrations in the effluent. The possibility of adding a model-free control to the actual classical dosage strategy has thus been tested on the SimBio model, which simulates the behavior of wastewater biofilters. The corresponding "intelligent" feedback loop, which is using effluent nitrite concentrations, compensates the classical strategy only when needed. Simulation results show a clear improvement in average nitrite concentration level and level stability in the effluent, without a notable overcost in methanol.

1806.08083 2026-06-04 cs.AI cs.SY eess.SY (updated version)

Expanding the Active Inference Landscape: More Intrinsic Motivations in the Perception-Action Loop

Martin Biehl, Christian Guckelsberger, Christoph Salge, Simón C. Smith, Daniel Polani

Affiliations: Araya Inc.; Computational Creativity Group, Department of Computing, Goldsmiths, University of London; Game Innovation Lab, Department of Computer Science and Engineering, New York University; Sepia Lab, Adaptive Systems Research Group, Department of Computer Science, University of Hertfordshire; Institute of Perception, Action and Behaviour, School of Informatics, The University of Edinburgh

AI summary: This paper asks whether the inference and action selection machinery of active inference can be used with other intrinsic motivations in place of the original one while keeping its core mechanisms, and uses its formalism to draw a connection to universal reinforcement learning.

Comments 53 pages, 6 figures, 2 tables

Abstract

Active inference is an ambitious theory that treats perception, inference and action selection of autonomous agents under the heading of a single principle. It suggests biologically plausible explanations for many cognitive phenomena, including consciousness. In active inference, action selection is driven by an objective function that evaluates possible future actions with respect to current, inferred beliefs about the world. Active inference at its core is independent from extrinsic rewards, resulting in a high level of robustness across, e.g., different environments or agent morphologies. In the literature, paradigms that share this independence have been summarised under the notion of intrinsic motivations. In general and in contrast to active inference, these models of motivation come without a commitment to particular inference and action selection mechanisms. In this article, we study if the inference and action selection machinery of active inference can also be used by alternatives to the originally included intrinsic motivation. The perception-action loop explicitly relates inference and action selection to the environment and agent memory, and is consequently used as foundation for our analysis. We reconstruct the active inference approach, locate the original formulation within, and show how alternative intrinsic motivations can be used while keeping many of the original features intact. Furthermore, we illustrate the connection to universal reinforcement learning by means of our formalism. Active inference research may profit from comparisons of the dynamics induced by alternative intrinsic motivations. Research on intrinsic motivations may profit from an additional way to implement intrinsically motivated agents that also share the biological plausibility of active inference.

1803.02998 2026-06-04 eess.SY cs.AI cs.SY (updated version)

DeepCAS: A Deep Reinforcement Learning Algorithm for Control-Aware Scheduling

Burak Demirel, Arunselvan Ramaswamy, Daniel E. Quevedo, Holger Karl

Affiliations: Paderborn University

AI summary: This paper proposes DeepCAS, a deep reinforcement learning algorithm for control-aware scheduling. It adapts to the optimal controllers synthesized for each subsystem and finds schedules that minimize the control loss; experiments show that these schedules outperform periodic ones.

Abstract

We consider networked control systems consisting of multiple independent controlled subsystems, operating over a shared communication network. Such systems are ubiquitous in cyber-physical systems, Internet of Things, and large-scale industrial systems. In many large-scale settings, the size of the communication network is smaller than the size of the system. In consequence, scheduling issues arise. The main contribution of this paper is to develop a deep reinforcement learning-based control-aware scheduling (DeepCAS) algorithm to tackle these issues. We use the following (optimal) design strategy: First, we synthesize an optimal controller for each subsystem; next, we design a learning algorithm that adapts to the chosen subsystems (plants) and controllers. As a consequence of this adaptation, our algorithm finds a schedule that minimizes the control loss. We present empirical results to show that DeepCAS finds schedules with better performance than periodic ones.

1806.00727 2026-06-04 cs.RO cs.AI cs.SY eess.SY (updated version)

Closed-loop Bayesian Semantic Data Fusion for Collaborative Human-Autonomy Target Search

Luke Burks, Ian Loefgren, Luke Barbier, Jeremy Muesing, Jamison McGinley, Sousheel Vunnam, Nisar Ahmed

AI summary: This paper proposes a closed-loop Bayesian semantic data fusion approach that uses CPOMDP planning to generate optimal trajectories, combining imperfect sensor data with human-provided semantic observations to make dynamic target search more efficient.

Comments Final version accepted and submitted to 2018 FUSION Conference (Cambridge, UK, July 2018)

Abstract

In search applications, autonomous unmanned vehicles must be able to efficiently reacquire and localize mobile targets that can remain out of view for long periods of time in large spaces. As such, all available information sources must be actively leveraged -- including imprecise but readily available semantic observations provided by humans. To achieve this, this work develops and validates a novel collaborative human-machine sensing solution for dynamic target search. Our approach uses continuous partially observable Markov decision process (CPOMDP) planning to generate vehicle trajectories that optimally exploit imperfect detection data from onboard sensors, as well as semantic natural language observations that can be specifically requested from human sensors. The key innovation is a scalable hierarchical Gaussian mixture model formulation for efficiently solving CPOMDPs with semantic observations in continuous dynamic state spaces. The approach is demonstrated and validated with a real human-robot team engaged in dynamic indoor target search and capture scenarios on a custom testbed.

1806.00589 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML (updated version)

Efficient Entropy for Policy Gradient with Multidimensional Action Space

Yiming Zhang, Quan Ho Vuong, Kenny Song, Xiao-Yue Gong, Keith W. Ross

Affiliations: New York University; New York University Abu Dhabi; New York University Shanghai; Massachusetts Institute of Technology

AI summary: This paper proposes efficient ways to compute the entropy bonus for policy gradient in high-dimensional action spaces, using novel unbiased estimators to encourage exploration, and validates them on a multi-hunter multi-rabbit grid game and a multi-agent multi-arm bandit problem.

Abstract

In recent years, deep reinforcement learning has been shown to be adept at solving sequential decision processes with high-dimensional state spaces such as in the Atari games. Many reinforcement learning problems, however, involve high-dimensional discrete action spaces as well as high-dimensional state spaces. This paper considers entropy bonus, which is used to encourage exploration in policy gradient. In the case of high-dimensional action spaces, calculating the entropy and its gradient requires enumerating all the actions in the action space and running forward and backpropagation for each action, which may be computationally infeasible. We develop several novel unbiased estimators for the entropy bonus and its gradient. We apply these estimators to several models for the parameterized policies, including Independent Sampling, CommNet, Autoregressive with Modified MDP, and Autoregressive with LSTM. Finally, we test our algorithms on two environments: a multi-hunter multi-rabbit grid game and a multi-agent multi-arm bandit problem. The results show that our entropy estimators substantially improve performance with marginal additional computational cost.
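
The cost of enumeration, and why sampling helps, can be seen in a small sketch. It assumes a fully factorized policy (the simplest reading of "Independent Sampling") with hypothetical per-dimension probability tables. It compares exact enumeration, the closed-form sum of per-dimension entropies, and a plain Monte Carlo estimate based on the negative log-probability of sampled actions. These are textbook constructions, not the estimators developed in the paper.

```python
import itertools
import math
import random

def entropy(p):
    """Shannon entropy (in nats) of one categorical distribution."""
    return -sum(q * math.log(q) for q in p if q > 0)

def joint_entropy_by_enumeration(factors):
    """Exact joint entropy; cost grows with the product of the dimension sizes."""
    total = 0.0
    for probs in itertools.product(*factors):
        joint = math.prod(probs)
        if joint > 0:
            total -= joint * math.log(joint)
    return total

def sampled_entropy_estimate(factors, n_samples, rng):
    """Unbiased Monte Carlo estimate: average of -log pi(a) over sampled actions."""
    total = 0.0
    for _ in range(n_samples):
        log_prob = 0.0
        for p in factors:
            a = rng.choices(range(len(p)), weights=p)[0]
            log_prob += math.log(p[a])
        total -= log_prob
    return total / n_samples

# Hypothetical policy over a 3-dimensional discrete action.
factors = [[0.5, 0.5], [0.2, 0.8], [0.1, 0.3, 0.6]]
exact = joint_entropy_by_enumeration(factors)
closed_form = sum(entropy(p) for p in factors)
estimate = sampled_entropy_estimate(factors, 20000, random.Random(0))
print(exact, closed_form, estimate)
```

For a factorized policy the closed form avoids enumeration entirely; the sampled estimate is the fallback when the policy couples the action dimensions and no closed form is available.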

1709.05746 2026-06-04 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY (updated version)

Adversarial Discriminative Sim-to-real Transfer of Visuo-motor Policies

Fangyi Zhang, Jürgen Leitner, Zongyuan Ge, Michael Milford, Peter Corke

Affiliations: Australian Centre for Robotic Vision (ACRV); Queensland University of Technology (QUT); Monash University

AI summary: This paper proposes an adversarial discriminative sim-to-real transfer approach that reduces the cost of labelling real data. In a table-top object reaching task, a 7 DoF arm is controlled from visual observations to reach a blue cuboid in clutter; with only 93 labelled and 186 unlabelled real images, the transferred policy achieves a 97.8% success rate and 1.8 cm control accuracy.

Comments Under review for the International Journal of Robotics Research

Abstract

Various approaches have been proposed to learn visuo-motor policies for real-world robotic applications. One solution is first learning in simulation then transferring to the real world. In the transfer, most existing approaches need real-world images with labels. However, the labelling process is often expensive or even impractical in many robotic applications. In this paper, we propose an adversarial discriminative sim-to-real transfer approach to reduce the cost of labelling real data. The effectiveness of the approach is demonstrated with modular networks in a table-top object reaching task where a 7 DoF arm is controlled in velocity mode to reach a blue cuboid in clutter through visual observations. The adversarial transfer approach reduced the labelled real data requirement by 50%. Policies can be transferred to real environments with only 93 labelled and 186 unlabelled real images. The transferred visuo-motor policies are robust to novel (not seen in training) objects in clutter and even a moving target, achieving a 97.8% success rate and 1.8 cm control accuracy.

1805.09613 2026-06-04 stat.ML cs.AI cs.LG cs.RO cs.SY eess.SY (updated version)

A0C: Alpha Zero in Continuous Action Space

Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker

Affiliations: Dep. of Computer Science, Delft University of Technology, The Netherlands; Dep. of Computer Science, Leiden University, The Netherlands

AI summary: This paper presents a theoretical approach for extending Alpha Zero to continuous action spaces and verifies its feasibility on an inverted pendulum task, laying groundwork for applying iterated search and learning in continuous action spaces.

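1805.07196 2026-06-04 eess.SY cs.AI cs.SY (updated version)

Supervisory Control of Probabilistic Discrete Event Systems under Partial Observation

Weilin Deng, Jingkai Yang, Daowen Qiu

Affiliations: School of Data and Computer Science, Sun Yat-sen University

AI summary: This paper studies supervisory control of probabilistic discrete event systems (PDESs) under the assumptions of probabilistic supervisors and partial observation. It introduces notions of probabilistic controllability and observability, gives polynomial verification algorithms, and formulates and solves an optimal control problem.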

Comments 36 pages, comments are welcome

1805.04201 2026-06-04 cs.RO cs.AI cs.SY eess.SY (updated version)

Learning to Grasp Without Seeing

Adithyavairavan Murali, Yin Li, Dhiraj Gandhi, Abhinav Gupta

Affiliations: The Robotics Institute, Carnegie Mellon University

AI summary: This paper proposes a tactile-sensing approach to grasping that learns a representation of tactile signals and re-grasps iteratively to improve grasp stability. Experiments show that novel objects can be grasped effectively without visual information.

Abstract

Can a robot grasp an unknown object without seeing it? In this paper, we present a tactile-sensing based approach to this challenging problem of grasping novel objects without prior knowledge of their location or physical properties. Our key idea is to combine touch based object localization with tactile based re-grasping. To train our learning models, we created a large-scale grasping dataset, including more than 30 RGB frames and over 2.8 million tactile samples from 7800 grasp interactions of 52 objects. To learn a representation of tactile signals, we propose an unsupervised auto-encoding scheme, which shows a significant improvement of 4-9% over prior methods on a variety of tactile perception tasks. Our system consists of two steps. First, our touch localization model sequentially 'touch-scans' the workspace and uses a particle filter to aggregate beliefs from multiple hits of the target. It outputs an estimate of the object's location, from which an initial grasp is established. Next, our re-grasping model learns to progressively improve grasps with tactile feedback based on the learned features. This network learns to estimate grasp stability and predict adjustment for the next grasp. Re-grasping thus is performed iteratively until our model identifies a stable grasp. Finally, we demonstrate extensive experimental results on grasping a large set of novel objects using tactile sensing alone. Furthermore, when applied on top of a vision-based policy, our re-grasping model significantly boosts the overall accuracy by 10.6%. We believe this is the first attempt at learning to grasp with only tactile sensing and without any prior object knowledge.

1805.03090 2026-06-04 math.OC cs.AI cs.SY eess.SY (updated version)

Deception in Optimal Control

Melkior Ornik, Ufuk Topcu

Affiliations: Institute for Computational Engineering and Sciences, University of Texas at Austin; Department of Aerospace Engineering and Engineering Mechanics and the Institute for Computational Engineering and Sciences, University of Texas at Austin

AI summary: This paper introduces a mathematically rigorous framework for defining deception in optimal control. Optimal deceptive strategies are designed over the product of the agent's state space and the adversary's belief space, and the paper discusses their design under uncertainty and in partially observable Markov decision processes.

Abstract

In this paper, we consider an adversarial scenario where one agent seeks to achieve an objective and its adversary seeks to learn the agent's intentions and prevent the agent from achieving its objective. The agent has an incentive to try to deceive the adversary about its intentions, while at the same time working to achieve its objective. The primary contribution of this paper is to introduce a mathematically rigorous framework for the notion of deception within the context of optimal control. The central notion introduced in the paper is that of a belief-induced reward: a reward dependent not only on the agent's state and action, but also on the adversary's beliefs. Design of an optimal deceptive strategy then becomes a question of optimal control design on the product of the agent's state space and the adversary's belief space. The proposed framework allows for deception to be defined in an arbitrary control system endowed with a reward function, as well as with additional specifications limiting the agent's control policy. In addition to defining deception, we discuss design of optimally deceptive strategies under uncertainties in the agent's knowledge about the adversary's learning process. In the latter part of the paper, we focus on a setting where the agent's behavior is governed by a Markov decision process, and show that the design of optimally deceptive strategies under lack of knowledge about the adversary naturally reduces to previously discussed problems in control design on partially observable or uncertain Markov decision processes. Finally, we present two examples of deceptive strategies: a "cops and robbers" scenario and an example where an agent may use camouflage while moving. We show that optimally deceptive strategies in such examples follow the intuitive idea of how to deceive an adversary in the above settings.
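
A toy sketch may help make the belief-induced reward concrete. It assumes that the adversary keeps a probability over candidate goals and updates it by Bayes' rule after each observed action, and that the agent is penalized in proportion to the adversary's belief in the true goal. The function names, the likelihood values, and the linear penalty are illustrative assumptions, not definitions taken from the paper.

```python
def bayes_update(belief, likelihoods):
    """Adversary's posterior over candidate goals after one observed action.

    `likelihoods[g]` is the assumed probability of the observed action
    if the agent were pursuing goal g.
    """
    unnormalized = [b * l for b, l in zip(belief, likelihoods)]
    total = sum(unnormalized)
    return [x / total for x in unnormalized]

def belief_induced_reward(task_reward, belief, true_goal, deception_weight=1.0):
    """Task reward minus a penalty for how strongly the adversary believes the true goal."""
    return task_reward - deception_weight * belief[true_goal]

belief = [0.5, 0.5]  # adversary starts undecided between two goals
belief = bayes_update(belief, likelihoods=[0.9, 0.3])  # action looks like goal 0
print(belief, belief_induced_reward(1.0, belief, true_goal=0))
```

Because the reward depends on `belief`, planning has to track the pair (agent state, adversary belief), which is the product space the abstract refers to.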

1805.00983 2026-06-04 eess.SY cs.AI cs.SY math.OC (updated version)

Robust Deep Reinforcement Learning for Security and Safety in Autonomous Vehicle Systems

Aidin Ferdowsi, Ursula Challita, Walid Saad, Narayan B. Mandayam

Affiliations: Ericsson Research; WINLAB, Dept. of ECE, Rutgers University

AI summary: This paper proposes a novel adversarial deep reinforcement learning algorithm that makes autonomous vehicle dynamics control more robust to cyber-physical attacks. The interaction between the attacker and the vehicle is analyzed in a game-theoretic framework, and LSTM blocks learn the expected spacing deviation that feeds each player's RL algorithm.

Comments 8 pages, 4 figures

Abstract

To operate effectively in tomorrow's smart cities, autonomous vehicles (AVs) must rely on intra-vehicle sensors such as camera and radar as well as inter-vehicle communication. Such dependence on sensors and communication links exposes AVs to cyber-physical (CP) attacks by adversaries that seek to take control of the AVs by manipulating their data. Thus, to ensure safe and optimal AV dynamics control, the data processing functions at AVs must be robust to such CP attacks. To this end, in this paper, the state estimation process for monitoring AV dynamics, in presence of CP attacks, is analyzed and a novel adversarial deep reinforcement learning (RL) algorithm is proposed to maximize the robustness of AV dynamics control to CP attacks. The attacker's action and the AV's reaction to CP attacks are studied in a game-theoretic framework. In the formulated game, the attacker seeks to inject faulty data to AV sensor readings so as to manipulate the inter-vehicle optimal safe spacing and potentially increase the risk of AV accidents or reduce the vehicle flow on the roads. Meanwhile, the AV, acting as a defender, seeks to minimize the deviations of spacing so as to ensure robustness to the attacker's actions. Since the AV has no information about the attacker's action and due to the infinite possibilities for data value manipulations, the outcomes of the players' past interactions are fed to long short-term memory (LSTM) blocks. Each player's LSTM block learns the expected spacing deviation resulting from its own action and feeds it to its RL algorithm. Then, the attacker's RL algorithm chooses the action which maximizes the spacing deviation, while the AV's RL algorithm tries to find the optimal action that minimizes such deviation.

1801.07745 2026-06-04 math.OC cs.AI cs.CG cs.NA math.NA (updated version)

Optimal Transport on Discrete Domains

Justin Solomon

Affiliations: MIT Department of Electrical Engineering and Computer Science; MIT Department of Electrical Engineering

AI summary: This paper discusses recent advances in discrete optimal transport that combine partial differential equations with convex analysis, yielding theoretically justified models suitable for domains with thousands or millions of vertices.

Abstract

Inspired by the matching of supply to demand in logistical problems, the optimal transport (or Monge–Kantorovich) problem involves the matching of probability distributions defined over a geometric domain such as a surface or manifold. In its most obvious discretization, optimal transport becomes a large-scale linear program, which typically is infeasible to solve efficiently on triangle meshes, graphs, point clouds, and other domains encountered in graphics and machine learning. Recent breakthroughs in numerical optimal transport, however, enable scalability to orders-of-magnitude larger problems, solvable in a fraction of a second. Here, we discuss advances in numerical optimal transport that leverage understanding of both discrete and smooth aspects of the problem. State-of-the-art techniques in discrete optimal transport combine insight from partial differential equations (PDE) with convex analysis to reformulate, discretize, and optimize transportation problems. The end result is a set of theoretically-justified models suitable for domains with thousands or millions of vertices. Since numerical optimal transport is a relatively new discipline, special emphasis is placed on identifying and explaining open problems in need of mathematical insight and additional research.
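
As an illustration of how the linear program can be sidestepped, the sketch below uses entropic regularization with Sinkhorn iterations. This technique is not named in the abstract; it is swapped in here as one widely used scalable approach. The point set, the squared-distance cost, and the values of `eps` and `n_iters` are assumptions chosen for a tiny example.

```python
import math

def sinkhorn(a, b, cost, eps=0.5, n_iters=200):
    """Entropy-regularized transport plan between histograms `a` and `b`.

    Alternately rescales rows and columns of the kernel exp(-cost / eps)
    until the plan's marginals match `a` and `b`.
    """
    n, m = len(a), len(b)
    kernel = [[math.exp(-cost[i][j] / eps) for j in range(m)] for i in range(n)]
    u, v = [1.0] * n, [1.0] * m
    for _ in range(n_iters):
        u = [a[i] / sum(kernel[i][j] * v[j] for j in range(m)) for i in range(n)]
        v = [b[j] / sum(kernel[i][j] * u[i] for i in range(n)) for j in range(m)]
    return [[u[i] * kernel[i][j] * v[j] for j in range(m)] for i in range(n)]

# Two hypothetical distributions on three points of the unit interval.
points = [0.0, 0.5, 1.0]
cost = [[(x - y) ** 2 for y in points] for x in points]
a = [0.5, 0.3, 0.2]
b = [0.2, 0.3, 0.5]
plan = sinkhorn(a, b, cost)
row_sums = [sum(row) for row in plan]
col_sums = [sum(plan[i][j] for i in range(3)) for j in range(3)]
print(row_sums, col_sums)
```

Each iteration costs only two matrix-vector products, which is the source of the scalability; the price is that the plan is a smoothed approximation whose sharpness is controlled by `eps`.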

1804.04696 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY (updated version)

Efficient Model Identification for Tensegrity Locomotion

Shaojun Zhu, David Surovik, Kostas E. Bekris, Abdeslam Boularias

Affiliations: Department of Computer Science, Rutgers University

AI summary: This paper proposes an efficient method, built on a physics engine and a Bayesian optimization framework, for identifying unknown mechanical parameters of high-dimensional compliant tensegrity robots, improving locomotion control accuracy.

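1804.03973 2026-06-04 eess.SY cs.AI cs.SY (updated version)

Reasoning about Safety of Learning-Enabled Components in Autonomous Cyber-physical Systems

Cumhur Erkan Tuncali, James Kapinski, Hisahiro Ito, Jyotirmoy V. Deshmukh

Affiliations: Toyota Research Institute of North America; University of Southern California

AI summary: This paper proposes a simulation-based approach to generating barrier certificate functions for verifying the safety of cyber-physical systems that contain neural network controllers. A linear programming solver finds candidate generator functions from simulation traces started at random initial states, and an SMT solver then verifies their safety.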

Comments Invited paper in conference: Design Automation Conference (DAC) 2018

1804.02884 2026-06-04 cs.AI cs.LG cs.MA cs.NE cs.SY eess.SY (updated version)

Policy Gradient With Value Function Approximation For Collective Multiagent Planning

Duc Thien Nguyen, Akshat Kumar, Hoong Chuin Lau

Affiliations: School of Information Systems; Singapore Management University

AI summary: This paper proposes an improved actor-critic method for collective multiagent planning problems. A decomposed approximation of the action-value function speeds up convergence, and the method is validated on a synthetic task and on taxi fleet optimization.

1612.07139 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY (updated version)

A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation

Lei Tai, Jingwei Zhang, Ming Liu, Joschka Boedecker, Wolfram Burgard

Affiliations: University of Freiburg

AI summary: This paper surveys deep learning for learning control in robotics. It covers the two main paradigms, deep reinforcement learning and imitation learning, their applications to navigation and manipulation tasks, and the reality gap challenge.

Comments 19 pages, 1 figure

Abstract

Deep learning techniques have been widely applied, achieving state-of-the-art results in various fields of study. This survey focuses on deep learning solutions that target learning control policies for robotics applications. We carry out our discussions on the two main paradigms for learning control with deep networks: deep reinforcement learning and imitation learning. For deep reinforcement learning (DRL), we begin from traditional reinforcement learning algorithms, showing how they are extended to the deep context and effective mechanisms that could be added on top of the DRL algorithms. We then introduce representative works that utilize DRL to solve navigation and manipulation tasks in robotics. We continue our discussion on methods addressing the challenge of the reality gap for transferring DRL policies trained in simulation to real-world scenarios, and summarize robotics simulation platforms for conducting DRL research. For imitation learning, we go through its three main categories, behavior cloning, inverse reinforcement learning and generative adversarial imitation learning, by introducing their formulations and their corresponding robotics applications. Finally, we discuss the open challenges and research frontiers.

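1712.04170 2026-06-04 cs.AI cs.NE cs.SY eess.SY version update

Interpretable Policies for Reinforcement Learning by Genetic Programming

Daniel Hein, Steffen Udluft, Thomas A. Runkler

Affiliations: Technical University of Munich, Department of Informatics; Siemens AG, Corporate Technology

AI summary: This paper proposes GPRL, a method based on model-based batch reinforcement learning and genetic programming that automatically generates interpretable reinforcement learning policies from pre-existing default state-action trajectory samples; experiments show that it outperforms a conventional symbolic regression approach.

Details

AI abstract (translated from Chinese)

The search for interpretable reinforcement learning policies is of great value to both academia and industry. For industrial systems in particular, domain experts are more likely to deploy autonomously learned controllers if the policies are easy to understand and evaluate. Basic algebraic equations meet these requirements as long as their complexity is kept adequate. This paper introduces genetic programming for reinforcement learning (GPRL), a method based on model-based batch reinforcement learning and genetic programming that automatically generates policy equations from pre-existing default state-action trajectory samples. GPRL is compared with a straightforward approach that uses genetic programming for symbolic regression to produce policies imitating an existing well-performing but non-interpretable policy. On three reinforcement learning benchmarks, namely mountain car, cart-pole balancing, and the industrial benchmark, experiments show that GPRL outperforms the symbolic regression method. GPRL can generate well-performing, interpretable reinforcement learning policies from pre-existing default trajectory data.

English abstract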

The search for interpretable reinforcement learning policies is of high academic and industrial interest. Especially for industrial systems, domain experts are more likely to deploy autonomously learned controllers if they are understandable and convenient to evaluate. Basic algebraic equations are supposed to meet these requirements, as long as they are restricted to an adequate complexity. Here we introduce the genetic programming for reinforcement learning (GPRL) approach based on model-based batch reinforcement learning and genetic programming, which autonomously learns policy equations from pre-existing default state-action trajectory samples. GPRL is compared to a straight-forward method which utilizes genetic programming for symbolic regression, yielding policies imitating an existing well-performing, but non-interpretable policy. Experiments on three reinforcement learning benchmarks, i.e., mountain car, cart-pole balancing, and industrial benchmark, demonstrate the superiority of our GPRL approach compared to the symbolic regression method. GPRL is capable of producing well-performing interpretable reinforcement learning policies from pre-existing default trajectory data.

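1803.08137 2026-06-04 cs.CV cs.AI cs.NA math.NA stat.ML version update

Robust Blind Deconvolution via Mirror Descent

Sathya N. Ravi, Ronak Mehta, Vikas Singh

AI summary: This paper studies the robustness and convergence of blind deconvolution and proposes an algorithm with theoretical guarantees that performs well in practice.

Details

AI abstract (translated from Chinese)

We revisit the blind deconvolution problem, focusing on understanding its robustness and convergence properties. Provable robustness to noise and other perturbations has recently attracted interest in vision, from gaining immunity to adversarial attacks to assessing and describing the failure modes of algorithms in mission-critical applications. In addition, many blind deconvolution methods based on deep architectures internally use or optimize the basic formulation, so a clearer understanding of how this sub-module behaves, when it can be solved, and how much noise injection it can tolerate is a first-order requirement. We derive new insights into the theoretical foundations of blind deconvolution. The resulting algorithm has good convergence guarantees and is provably robust in a sense formalized in the paper. Interestingly, these technical results carry over well to practice: on standard datasets, our algorithm yields results competitive with or better than the state of the art. Keywords: blind deconvolution, robust continuous optimization

English abstract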

We revisit the Blind Deconvolution problem with a focus on understanding its robustness and convergence properties. Provable robustness to noise and other perturbations is receiving recent interest in vision, from obtaining immunity to adversarial attacks to assessing and describing failure modes of algorithms in mission critical applications. Further, many blind deconvolution methods based on deep architectures internally make use of or optimize the basic formulation, so a clearer understanding of how this sub-module behaves, when it can be solved, and what noise injection it can tolerate is a first order requirement. We derive new insights into the theoretical underpinnings of blind deconvolution. The algorithm that emerges has nice convergence guarantees and is provably robust in a sense we formalize in the paper. Interestingly, these technical results play out very well in practice, where on standard datasets our algorithm yields results competitive with or superior to the state of the art. Keywords: blind deconvolution, robust continuous optimization

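1612.05971 2026-06-04 eess.SY cs.AI cs.GT cs.SY math.OC version update

An Integrated Optimization + Learning Approach to Optimal Dynamic Pricing for the Retailer with Multi-type Customers in Smart Grids

Fanlin Meng, Xiao-Jun Zeng, Yan Zhang, Chris J. Dent, Dunwei Gong

Affiliations: School of Engineering and Computing Sciences, Durham University; School of Computer Science, The University of Manchester; College of Information System and Management, National University of Defense Technology; School of Mathematics, University of Edinburgh; School of Information and Control Engineering, China University of Mining and Technology

AI summary: For a smart-grid retailer serving three different types of customers, this paper proposes a two-level decision-making framework that combines optimization and learning to optimize dynamic pricing, and validates the model through simulation experiments.

Comments 38 pages, 6 figures

Details

DOI: 10.1016/j.ins.2018.03.039

AI abstract (translated from Chinese)

This paper considers a realistic smart-grid scenario in which a retailer serves three different types of customers: customers with an optimal home energy management system embedded in their smart meters (C-HEMS), customers with only smart meters (C-SM), and customers without smart meters (C-NONE). The main goal is to support the retailer in making optimal day-ahead dynamic pricing decisions for this mixed customer pool. To this end, we propose a two-level decision-making framework in which the retailer, as the upper-level agent, first announces electricity prices for the next 24 hours, and customers, as lower-level agents, then schedule their energy usage according to those prices. For the lower-level problem, we model the price responsiveness of the different customers according to their distinct characteristics. For the upper-level problem, we optimize the dynamic prices to maximize the retailer's profit subject to realistic market constraints. The two-level model is solved with distributed optimization methods based on genetic algorithms (GA), and its feasibility and effectiveness are confirmed by simulation results.

English abstract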

In this paper, we consider a realistic and meaningful scenario in the context of smart grids where an electricity retailer serves three different types of customers, i.e., customers with an optimal home energy management system embedded in their smart meters (C-HEMS), customers with only smart meters (C-SM), and customers without smart meters (C-NONE). The main objective of this paper is to support the retailer in making optimal day-ahead dynamic pricing decisions in such a mixed customer pool. To this end, we propose a two-level decision-making framework where the retailer, acting as the upper-level agent, first announces its electricity prices for the next 24 hours, and customers, acting as lower-level agents, subsequently schedule their energy usage accordingly. For the lower-level problem, we model the price responsiveness of different customers according to their unique characteristics. For the upper-level problem, we optimize the dynamic prices for the retailer to maximize its profit subject to realistic market constraints. The above two-level model is tackled by genetic algorithm (GA) based distributed optimization methods, while its feasibility and effectiveness are confirmed via simulation results.

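1703.02660 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY version update

Towards Generalization and Simplicity in Continuous Control

Aravind Rajeswaran, Kendall Lowrey, Emanuel Todorov, Sham Kakade

Affiliations: University of Washington

AI summary: This paper shows that simple linear and RBF policy parameterizations can solve a variety of continuous control tasks with performance competitive with more complex networks, and that diverse initialization improves generalization.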

Comments NIPS 2017, Project page: https://sites.google.com/view/simple-pol

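1803.06775 2026-06-04 quant-ph cs.AI cs.ET cs.SY eess.SY version update

Comparing and Integrating Constraint Programming and Temporal Planning for Quantum Circuit Compilation

Kyle E. C. Booth, Minh Do, J. Christopher Beck, Eleanor Rieffel, Davide Venturelli, Jeremy Frank

Affiliations: Quantum Artificial Intelligence Laboratory, NASA Ames Research Center; Planning and Scheduling Group, NASA Ames Research Center; USRA Research Institute for Advanced Computer Science; Stinger Ghaffarian Technologies, Inc.; Department of Mechanical & Industrial Engineering, University of Toronto

AI summary: This paper compares constraint programming and temporal planning for quantum circuit compilation and proposes hybrid methods that improve solution quality, showing that the hybrid outperforms stand-alone temporal planning on most problem classes.

Comments 9 pages, 2 figures, Proceedings of the 28th International Conference of Automated Planning and Scheduling 2018 (ICAPS-18)

Details

AI abstract (translated from Chinese)

Recently, the makespan-minimization problem of compiling general quantum algorithms to near-term quantum processors was introduced to the AI community. That research showed temporal planning to be a strong approach to quantum circuit compilation (QCC) problems. This paper explores constraint programming (CP) as an alternative and complementary approach to temporal planning. We extend previous work by introducing two new problem variants that incorporate important features identified by the quantum computing community. We apply temporal planning and CP to the baseline and extended QCC problems, both as stand-alone and as hybrid approaches. Our hybrid methods warm-start CP with solutions found by temporal planning, exploiting the planner's ability to find satisficing solutions to problems with high task optionality, which CP typically struggles with. The CP model benefits from the inferred bounds supplied by the warm start and thereby finds higher-quality solutions. The empirical evaluation shows that while stand-alone CP is competitive only on the smallest problems, the hybrid of CP and temporal planning outperforms stand-alone temporal planning in most problem classes.

English abstract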

Recently, the makespan-minimization problem of compiling a general class of quantum algorithms into near-term quantum processors has been introduced to the AI community. The research demonstrated that temporal planning is a strong approach for a class of quantum circuit compilation (QCC) problems. In this paper, we explore the use of constraint programming (CP) as an alternative and complementary approach to temporal planning. We extend previous work by introducing two new problem variations that incorporate important characteristics identified by the quantum computing community. We apply temporal planning and CP to the baseline and extended QCC problems as both stand-alone and hybrid approaches. Our hybrid methods use solutions found by temporal planning to warm start CP, leveraging the ability of the former to find satisficing solutions to problems with a high degree of task optionality, an area that CP typically struggles with. The CP model, benefiting from inferred bounds on planning horizon length and task counts provided by the warm start, is then used to find higher quality solutions. Our empirical evaluation indicates that while stand-alone CP is only competitive for the smallest problems, CP in our hybridization with temporal planning out-performs stand-alone temporal planning in the majority of problem classes.

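1707.01625 2026-06-04 eess.SY cs.AI cs.GT cs.SY version update

Optimal Vehicle Dispatching Schemes via Dynamic Pricing

Mengjing Chen, Weiran Shen, Pingzhong Tang, Song Zuo

Affiliations: IIIS, Tsinghua University

AI summary: This paper takes an economic approach to optimal pricing and vehicle dispatching for ride-hailing platforms with geographic and temporal information, proposes an efficient algorithm for computing the optimal randomized pricing scheme, and shows experimentally that it outperforms fixed pricing and surge pricing.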

1708.08113 2026-06-04 cs.AI cs.SY eess.SY version update

Novel Sensor Scheduling Scheme for Intruder Tracking in Energy Efficient Sensor Networks

Raghuram Bharadwaj Diddigi, Prabuchandran K. J., Shalabh Bhatnagar

Affiliations: Department of Computer Science and Automation, Indian Institute of Science

AI summary: This paper proposes POMDP-based reinforcement learning algorithms for tracking an intruder efficiently under energy constraints, uses the UCT method to scale to larger state and action spaces, and validates the algorithms on large-scale problems.

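1802.08138 2026-06-04 cs.AI cs.GT cs.SY eess.SY version update

Reliable Intersection Control in Non-cooperative Environments

Muhammed O. Sayin, Chung-Wei Lin, Shinichi Shiraishi, Tamer Başar

Affiliations: University of Illinois at Urbana-Champaign; Toyota InfoTechnology Center

AI summary: This paper proposes a reliable intersection control mechanism for strategic autonomous and connected vehicles in non-cooperative environments. By analyzing the vehicles' strategic behavior, it characterizes the Nash equilibria and identifies the socially optimal equilibrium to achieve a fair allocation.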

Comments Extended version (including proofs of theorems and lemmas) of the paper: M. O. Sayin, C.-W. Lin, S. Shiraishi, and T. Basar, "Reliable intersection control in non-cooperative environments", to appear in the Proceedings of American Control Conference, 2018

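1802.06314 2026-06-04 cs.RO cs.AI cs.SY eess.SY version update

Autonomous Vehicle Speed Control for Safe Navigation of Occluded Pedestrian Crosswalk

Sarah Thornton

Affiliations: Dynamic Design Lab

AI summary: This paper proposes a speed control method based on a partially observable Markov decision process for safely navigating an occluded pedestrian crosswalk, computing the control policy by dynamic programming to cope with perception limitations.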

Comments 6 pages, 9 figures

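1705.07262 2026-06-04 cs.LG cs.AI cs.NE cs.SY eess.SY version update

Batch Reinforcement Learning on the Industrial Benchmark: First Experiences

Daniel Hein, Steffen Udluft, Michel Tokic, Alexander Hentschel, Thomas A. Runkler, Volkmar Sterzing

Affiliations: Technical University of Munich, Department of Informatics; Siemens AG, Corporate Technology

AI summary: This paper studies the Particle Swarm Optimization Policy (PSO-P) on the Industrial Benchmark and demonstrates its effectiveness in a realistic application setting; compared with conventional methods, PSO-P stands out in performance and robustness.

Details

DOI: 10.1109/IJCNN.2017.7966389
Journal ref: 2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, 2017, pp. 4214-4221

AI abstract (translated from Chinese)

The Particle Swarm Optimization Policy (PSO-P) was recently introduced and shown to produce remarkable results on academic reinforcement learning benchmarks in an off-policy, batch-based setting. To further investigate its properties and feasibility in real-world applications, this paper studies PSO-P on the so-called Industrial Benchmark (IB), a new reinforcement learning (RL) benchmark designed to be realistic by including aspects found in industrial applications, such as continuous state and action spaces, a high-dimensional, partially observable state space, delayed effects, and complex stochasticity. The experimental results of PSO-P on IB are compared with those of closed-form control policies derived from the model-based Recurrent Control Neural Network (RCNN) and the model-free Neural Fitted Q-Iteration (NFQ). The experiments show that PSO-P is of interest not only for academic benchmarks but also for real-world industrial applications, since it also yielded the best-performing policy in our IB setting. Compared with other well-established RL techniques, PSO-P does well in both performance and robustness while requiring relatively little effort to find suitable parameters or make complex design decisions.

English abstract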

The Particle Swarm Optimization Policy (PSO-P) has been recently introduced and proven to produce remarkable results on interacting with academic reinforcement learning benchmarks in an off-policy, batch-based setting. To further investigate the properties and feasibility on real-world applications, this paper investigates PSO-P on the so-called Industrial Benchmark (IB), a novel reinforcement learning (RL) benchmark that aims at being realistic by including a variety of aspects found in industrial applications, like continuous state and action spaces, a high dimensional, partially observable state space, delayed effects, and complex stochasticity. The experimental results of PSO-P on IB are compared to results of closed-form control policies derived from the model-based Recurrent Control Neural Network (RCNN) and the model-free Neural Fitted Q-Iteration (NFQ). Experiments show that PSO-P is not only of interest for academic benchmarks, but also for real-world industrial applications, since it also yielded the best performing policy in our IB setting. Compared to other well established RL techniques, PSO-P produced outstanding results in performance and robustness, requiring only a relatively low amount of effort in finding adequate parameters or making complex design decisions.

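1607.07942 2026-06-04 cs.AI cs.IT cs.SY eess.SY math.IT version update

Multiple scan data association by convex variational inference

Jason L. Williams, Roslyn A. Lau

Affiliations: Defence Science and Technology Group, Australia; National Security, Intelligence, Surveillance and Reconnaissance Division; Queensland University of Technology, Australia; Maritime Division, Defence Science and Technology Group, Australia

AI summary: This paper studies the multiple scan data association problem and proposes a convex optimization approach based on the fractional free energy that improves on the conventional belief propagation algorithm and increases target tracking accuracy.

Details

AI abstract (translated from Chinese)

Data association, i.e., reasoning over the correspondence between targets and measurements, is a fundamental problem in target tracking. Recently, belief propagation (BP) has emerged as a promising method for estimating the marginal probabilities of measurement-to-target association, providing fast and accurate estimates. The excellent performance of BP in this particular formulation may be attributed to the convexity of the underlying free energy that it implicitly optimizes. This paper studies multiple scan data association, i.e., reasoning over the correspondence between targets and several sets of measurements, which may correspond to different sensors or different time steps. We find that the multiple scan extension of the single scan BP formulation is non-convex and demonstrate the undesirable behavior that results. A convex free energy is constructed using the recently proposed fractional free energy (FFE). A convergent, BP-like algorithm is provided for the single scan FFE and is used to optimize the multiple scan free energy via primal-dual coordinate ascent. Finally, based on a variational interpretation of joint probabilistic data association (JPDA), we develop a sequential variant of the algorithm that resembles JPDA but retains consistency constraints from previous scans. The performance of the proposed methods is demonstrated on a bearings-only target localization problem.

English abstract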

Data association, the reasoning over correspondence between targets and measurements, is a problem of fundamental importance in target tracking. Recently, belief propagation (BP) has emerged as a promising method for estimating the marginal probabilities of measurement to target association, providing fast, accurate estimates. The excellent performance of BP in the particular formulation used may be attributed to the convexity of the underlying free energy which it implicitly optimises. This paper studies multiple scan data association problems, i.e., problems that reason over correspondence between targets and several sets of measurements, which may correspond to different sensors or different time steps. We find that the multiple scan extension of the single scan BP formulation is non-convex and demonstrate the undesirable behaviour that can result. A convex free energy is constructed using the recently proposed fractional free energy (FFE). A convergent, BP-like algorithm is provided for the single scan FFE, and employed in optimising the multiple scan free energy using primal-dual coordinate ascent. Finally, based on a variational interpretation of joint probabilistic data association (JPDA), we develop a sequential variant of the algorithm that is similar to JPDA, but retains consistency constraints from prior scans. The performance of the proposed methods is demonstrated on a bearings only target localisation problem.

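1801.07229 2026-06-04 cs.AI cs.SY eess.SY version update

Combinatorial framework for planning in geological exploration

Mark Sh. Levin

AI summary: This paper proposes a combinatorial framework for planning geological exploration of oil and gas fields, optimizing exploration plans through multi-attribute assessment, hierarchical design, and regional integration.

Comments 14 pages, 15 figures, 11 tables

Details

AI abstract (translated from Chinese)

This paper describes a combinatorial framework for planning geological exploration of oil and gas fields. The framework comprises building a four-layer tree-like model, generating local design alternatives, multi-attribute assessment, hierarchical design, regional integration, and plan aggregation. Stages two to five are based on the hierarchical multicriteria morphological design method; stage six is based on detecting a 'kernel' of the alternatives and extending it with additional elements. The alternatives are assessed by expert judgment, and the approach is illustrated with numerical examples for the Yamal peninsula.

English abstract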

The paper describes a combinatorial framework for planning of geological exploration for oil-gas fields. The suggested scheme of the geological exploration involves the following stages: (1) building a special 4-layer tree-like model (layers of geological exploration): productive layer, group of productive layers, oil-gas field, oil-gas region (or group of the fields); (2) generation of local design (exploration) alternatives for each low-layer geological object: conservation, additional search, independent utilization, joint utilization; (3) multicriteria (i.e., multi-attribute) assessment of the design (exploration) alternatives and their interrelation (compatibility), and mapping of the obtained vector estimates onto an integrated ordinal scale; (4) hierarchical design ('bottom-up') of composite exploration plans for each oil-gas field; (5) integration of the plans into region plans; and (6) aggregation of the region plans into a general exploration plan. Stages 2, 3, 4, and 5 are based on the hierarchical multicriteria morphological design (HMMD) method (assessment of ranking of alternatives, selection and composition of alternatives into composite alternatives). The composition problem is based on the morphological clique model. Aggregation of the obtained modular alternatives (stage 6) is based on detection of an alternatives 'kernel' and its extension by addition of elements (multiple choice model). In addition, the usage of multiset estimates for alternatives is described. The alternative estimates are based on expert judgment. The suggested combinatorial planning methodology is illustrated by numerical examples for geological exploration of the Yamal peninsula.

1801.00048 2026-06-04 eess.SY cs.AI cs.SY q-bio.NC version update

Characterizing optimal hierarchical policy inference on graphs via non-equilibrium thermodynamics

Daniel McNamee

Affiliations: Computational and Biological Learning Lab, University of Cambridge

AI summary: This paper proposes a graph-based non-equilibrium thermodynamics approach for constructing and inferring optimal hierarchical policies, addressing the problem of building hierarchies over a state space at different spatial resolutions.

Comments NIPS 2017 Workshop on Hierarchical Reinforcement Learning. 8 pages, 1 figure

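1712.09356 2026-06-04 cs.AI cs.SY eess.SY version update

An Online Ride-Sharing Path Planning Strategy for Public Vehicle Systems

Ming Zhu, Xiao-Yang Liu, Xiaodong Wang

Affiliations: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

AI summary: This paper proposes an efficient online ride-sharing path planning strategy that filters out requests that would violate passengers' quality of service, turning a global search into a local search and thereby reducing computational complexity; experiments show a 22% reduction in computation time compared with exhaustive search.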

Comments 12 pages

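1705.08927 2026-06-04 quant-ph cs.AI cs.ET cs.SY eess.SY version update

Compiling quantum circuits to realistic hardware architectures using temporal planners

Davide Venturelli, Minh Do, Eleanor Rieffel, Jeremy Frank

Affiliations: NASA Ames Research Center, Quantum Artificial Intelligence Laboratory; USRA Research Institute for Advanced Computer Science (RIACS); Stinger Ghaffarian Technologies (SGT Inc.); NASA Ames Research Center, Planning and Scheduling Group

AI summary: This paper studies temporal planning for compiling quantum circuits to emerging quantum hardware, focusing on the nearest-neighbor constraints of superconducting architectures, and demonstrates the feasibility of temporal planning for compilation through experiments on QAOA circuits.

Comments updated manuscript, more planners and results

Details

DOI: 10.1088/2058-9565/aaa331
Journal ref: 2017 Quantum Sci. Technol. - also related to proceedings of IJCAI 2017, and ICAPS SPARK Workshop 2017

AI abstract (translated from Chinese)

To run quantum algorithms on emerging gate-model quantum hardware, quantum circuits must be compiled to account for the hardware's constraints. For near-term hardware, which has only limited means of mitigating decoherence, minimizing circuit duration is critical. We investigate applying temporal planners to the problem of compiling quantum circuits to emerging quantum hardware. Although our approach is general, we focus on compiling to superconducting hardware architectures with nearest-neighbor constraints. Our initial experiments focus on compiling Quantum Alternating Operator Ansatz (QAOA) circuits, whose large number of commuting gates allows great flexibility in the order in which gates are applied. This freedom makes it harder to find optimal compilations, but it also means that better-optimized compilation can bring larger gains. We map this quantum circuit compilation problem to a temporal planning problem and generate a test suite of QAOA circuits of various sizes targeting a realistic hardware architecture. We report compilation results from several state-of-the-art temporal planners on this test suite. This early empirical evaluation shows that temporal planning is a viable approach to quantum circuit compilation.

English abstract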

To run quantum algorithms on emerging gate-model quantum hardware, quantum circuits must be compiled to take into account constraints on the hardware. For near-term hardware, with only limited means to mitigate decoherence, it is critical to minimize the duration of the circuit. We investigate the application of temporal planners to the problem of compiling quantum circuits to newly emerging quantum hardware. While our approach is general, we focus on compiling to superconducting hardware architectures with nearest neighbor constraints. Our initial experiments focus on compiling Quantum Alternating Operator Ansatz (QAOA) circuits, whose high number of commuting gates allows great flexibility in the order in which the gates can be applied. That freedom makes it more challenging to find optimal compilations but also means there is a greater potential win from more optimized compilation than for less flexible circuits. We map this quantum circuit compilation problem to a temporal planning problem, and generate a test suite of compilation problems for QAOA circuits of various sizes to a realistic hardware architecture. We report compilation results from several state-of-the-art temporal planners on this test set. This early empirical evaluation demonstrates that temporal planning is a viable approach to quantum circuit compilation.

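1610.06781 2026-06-04 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY version update

Modular Deep Q Networks for Sim-to-real Transfer of Visuo-motor Policies

Fangyi Zhang, Jürgen Leitner, Michael Milford, Peter Corke

Affiliations: Australian Centre for Robotic Vision (ACRV); Queensland University of Technology (QUT)

AI summary: This paper proposes a modular deep reinforcement learning method that introduces a bottleneck between perception and control to enable sim-to-real transfer and improve robot hand-eye coordination.

Comments Australasian Conference on Robotics and Automation (ACRA) 2017, Student Paper Award Finalist

Details

Journal ref: The proceedings of the Australasian Conference on Robotics and Automation (ACRA) 2017

AI abstract (translated from Chinese)

Although deep learning has achieved remarkable success in computer vision thanks to abundant visual data, collecting sufficiently large real-world datasets for robot learning is costly. To make these techniques more practical on real robots, we propose a modular deep reinforcement learning method that can transfer models trained in simulation to a real-world robotic task. We introduce a bottleneck between perception and control so that the networks can be trained independently and then merged and fine-tuned end to end to further improve hand-eye coordination. On a canonical planar, visually guided robot reaching task, the fine-tuned accuracy is 1.6 pixels, a significant improvement over naive transfer (17.5 pixels), showing potential for more complex and broader applications. Our method offers a more efficient technique for learning and transferring visuo-motor policies without relying entirely on large real-world robot datasets.

English abstract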

While deep learning has had significant successes in computer vision thanks to the abundance of visual data, collecting sufficiently large real-world datasets for robot learning can be costly. To increase the practicality of these techniques on real robots, we propose a modular deep reinforcement learning method capable of transferring models trained in simulation to a real-world robotic task. We introduce a bottleneck between perception and control, enabling the networks to be trained independently, but then merged and fine-tuned in an end-to-end manner to further improve hand-eye coordination. On a canonical, planar visually-guided robot reaching task a fine-tuned accuracy of 1.6 pixels is achieved, a significant improvement over naive transfer (17.5 pixels), showing the potential for more complicated and broader applications. Our method provides a technique for more efficient learning and transfer of visuo-motor policies for real robotic systems without relying entirely on large real-world robot datasets.

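1712.06577 2026-06-04 cs.LG cs.AI cs.NA math.NA version update

Parallel Complexity of Forward and Backward Propagation

Maxim Naumov

Affiliations: NVIDIA

AI summary: This paper studies the parallel complexity of forward and backward propagation viewed as the solution of triangular systems of equations, proposes direct and iterative parallel algorithms, and shows that backpropagation in FNNs and RNNs can be parallelized.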

Comments 18 pages

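1712.04612 2026-06-04 q-fin.CP cs.AI cs.CE cs.LG cs.SY eess.SY version update

Inverse Reinforcement Learning for Marketing

Igor Halperin

Affiliations: NYU Tandon School of Engineering

AI summary: This paper proposes using inverse reinforcement learning to study dynamic consumer demand, builds a tractable model via a maximum entropy approach, and shows that observational noise may be mistaken for consumer heterogeneity.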

Comments 18 pages, 5 figures

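1712.00634 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY math.OC version update

PFAx: Predictable Feature Analysis to Perform Control

Stefan Richthofer, Laurenz Wiskott

AI summary: PFAx improves prediction performance by incorporating supplementary information, transparently shows how that information affects feature selection, and is applied to optimizing agent control in a reinforcement learning setting.

Details

AI abstract (translated from Chinese)

Predictable Feature Analysis (PFA) (Richthofer, Wiskott, ICMLA 2015) is an algorithm that reduces the dimensionality of high-dimensional input signals by extracting the most predictable subsignals. This paper extends PFA to take supplementary information into account to improve prediction. The supplementary information does not take part in feature extraction; features are extracted only from the main input. PFAx transparently shows how the supplementary information improves prediction quality, and it can generate supplementary information that achieves a specific target for the main signal. The method is applied in a reinforcement learning setting, allowing an agent to optimize its state locally and approach a goal. A follow-up paper will extend the method to global optimization.

English abstract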

Predictable Feature Analysis (PFA) (Richthofer, Wiskott, ICMLA 2015) is an algorithm that performs dimensionality reduction on a high-dimensional input signal. It extracts those subsignals that are most predictable according to a certain prediction model. We refer to these extracted signals as predictable features. In this work we extend the notion of PFA to take supplementary information into account for improving its predictions. Such information can be a multidimensional signal like the main input to PFA, but is regarded as external. That means it won't participate in the feature extraction: no features get extracted from or composed of it. Features will be exclusively extracted from the main input such that they are most predictable based on themselves and the supplementary information. We refer to this enhanced PFA as PFAx (PFA extended). Even more important than improving prediction quality is to observe the effect of supplementary information on feature selection. PFAx transparently provides insight into how the supplementary information adds to prediction quality and whether it is valuable at all. Finally, we show how to invert that relation and generate the supplementary information such that it would yield a certain desired outcome of the main signal. We apply this to a setting inspired by reinforcement learning and let the algorithm learn how to control an agent in an environment. With this method it is feasible to locally optimize the agent's state, i.e., reach a certain goal that is near enough. We are preparing a follow-up paper that extends this method such that global optimization is also feasible.

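1711.10566 2026-06-04 cs.AI cs.LG cs.NA math.AP math.NA stat.ML version update

Physics Informed Deep Learning (Part II): Data-driven Discovery of Nonlinear Partial Differential Equations

Maziar Raissi, Paris Perdikaris, George Em Karniadakis

Affiliations: Division of Applied Mathematics, Brown University

AI summary: This paper proposes physics informed neural networks for solving supervised learning tasks while respecting physical laws. Part II focuses on the data-driven discovery of partial differential equations, distinguishes continuous time and discrete time models, and validates the approach on several benchmark problems in mathematical physics.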

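1711.10561 2026-06-04 cs.AI cs.LG cs.NA math.DS math.NA stat.ML version update

Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations

Maziar Raissi, Paris Perdikaris, George Em Karniadakis

Affiliations: Division of Applied Mathematics, Brown University

AI summary: This paper proposes physics informed neural networks for solving supervised learning problems while satisfying physical laws. Part I shows how these networks can be used to infer solutions of partial differential equations and to build differentiable physics-informed surrogate models.

Details

AI abstract (translated from Chinese)

We introduce physics informed neural networks, neural networks that respect physical laws described by general nonlinear partial differential equations while solving supervised learning tasks. In this two-part treatise, we organize the presentation around two main classes of problems: data-driven solution and data-driven discovery of partial differential equations. Depending on the nature and arrangement of the available data, we devise two distinct classes of algorithms, namely continuous time and discrete time models. The resulting neural networks form a new class of data-efficient universal function approximators that naturally encode any underlying physical laws as prior information. In this first part, we show how these networks can be used to infer solutions of partial differential equations and to obtain physics-informed surrogate models that are fully differentiable with respect to all input coordinates and free parameters.

English abstract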

We introduce physics informed neural networks -- neural networks that are trained to solve supervised learning tasks while respecting any given law of physics described by general nonlinear partial differential equations. In this two part treatise, we present our developments in the context of solving two main classes of problems: data-driven solution and data-driven discovery of partial differential equations. Depending on the nature and arrangement of the available data, we devise two distinct classes of algorithms, namely continuous time and discrete time models. The resulting neural networks form a new class of data-efficient universal function approximators that naturally encode any underlying physical laws as prior information. In this first part, we demonstrate how these networks can be used to infer solutions to partial differential equations, and obtain physics-informed surrogate models that are fully differentiable with respect to all input coordinates and free parameters.
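
One common way to write the continuous-time training objective for such a network is sketched below. The abstract does not spell out the loss, so the notation here is an assumption of this sketch: the PDE is written as $u_t + \mathcal{N}[u] = 0$ with a nonlinear differential operator $\mathcal{N}$, $u_\theta$ is the network, $(t_u^i, x_u^i, u^i)$ are $N_u$ data points (initial, boundary, or measured), and $(t_f^j, x_f^j)$ are $N_f$ collocation points.

```latex
f_\theta(t, x) := \partial_t u_\theta(t, x) + \mathcal{N}[u_\theta](t, x),
\qquad
\mathcal{L}(\theta)
  = \frac{1}{N_u} \sum_{i=1}^{N_u} \left| u_\theta(t_u^i, x_u^i) - u^i \right|^2
  + \frac{1}{N_f} \sum_{j=1}^{N_f} \left| f_\theta(t_f^j, x_f^j) \right|^2 .
```

The first term fits the data and the second penalizes the PDE residual, which is how the physical law enters as prior information; the derivatives inside $f_\theta$ would typically be obtained by automatic differentiation of the network.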

1711.08512 2026-06-04 eess.SY cs.AI cs.SY version update

A Study on Modeling of Inputting Electrical Power of Ultra High Power Electric Furnace by using Fuzzy Rule and Regression Model

Choe Un-Chol, Yun Kum-Il, Kwak Son-Il

Affiliations: Faculty of Electronics & Automation, Kim Il Sung University

AI summary: This paper proposes using fuzzy rules and a regression model to build a model of the electrical power input that influences the melting process of an ultra-high-power electric furnace, and validates it through simulation experiments.

Comments 8 pages, 3 figures, 1 table

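1704.04058 2026-06-04 math.OC cs.AI cs.NA math.FA math.NA version update

Solving ill-posed inverse problems using iterative deep neural networks

Jonas Adler, Ozan Öktem

AI summary: This paper proposes a partially learned approach that combines deep learning with classical regularization theory to solve nonlinear inverse problems, learning the gradient component with a convolutional network and improving reconstruction speed and PSNR.

Details

DOI: 10.1088/1361-6420/aa9581
Journal ref: Inverse Problems 2017

AI abstract (translated from Chinese)

We propose a partially learned approach for solving nonlinear regularized inverse problems. The method combines classical regularization theory with advances in deep learning and learns while exploiting prior information about the regularizing functional, the forward operator, and the noise model. The result is a gradient-like iterative scheme in which the gradient component is learned by a convolutional network that takes the gradients of the data discrepancy and of the regularizer as input. We test the method on a nonlinear tomographic problem using simulated data from the Shepp-Logan phantom and a head CT. The results are better than FBP and TV reconstruction, with a 5.4 dB PSNR improvement over TV reconstruction and a significant speed-up: a 512 x 512 volume is reconstructed in about 0.4 seconds on a single GPU.

English abstract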

We propose a partially learned approach for the solution of ill-posed inverse problems with not necessarily linear forward operators. The method builds on ideas from classical regularization theory and recent advances in deep learning to perform learning while making use of prior information about the inverse problem encoded in the forward operator, noise model and a regularizing functional. The method results in a gradient-like iterative scheme, where the "gradient" component is learned using a convolutional network that includes the gradients of the data discrepancy and regularizer as input in each iteration. We present results of such a partially learned gradient scheme on a non-linear tomographic inversion problem with simulated data from both the Shepp-Logan phantom and a head CT. The outcome is compared against FBP and TV reconstruction and the proposed method provides a 5.4 dB PSNR improvement over the TV reconstruction while being significantly faster, giving reconstructions of 512 x 512 volumes in about 0.4 seconds using a single GPU.
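
The shape of the gradient-like iteration can be illustrated with a small sketch. Everything concrete here is an assumption for illustration: the forward operator is a tiny linear matrix (the paper allows nonlinear operators), the data discrepancy and regularizer are simple quadratics, and the hypothetical `plain_gradient_update` stands in for the learned convolutional network, which is not reproduced.

```python
def matvec(A, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def iterate(A, y, update, n_iter=200, lam=0.0):
    """Gradient-like scheme x <- x + update(x, grad_data, grad_reg).

    Data discrepancy: 0.5 * ||A x - y||^2; regularizer: 0.5 * lam * ||x||^2
    (both chosen only to keep the example small).
    """
    x = [0.0] * len(A[0])
    At = transpose(A)
    for _ in range(n_iter):
        residual = [r - yi for r, yi in zip(matvec(A, x), y)]
        grad_data = matvec(At, residual)      # gradient of the data discrepancy
        grad_reg = [lam * xi for xi in x]     # gradient of the regularizer
        step = update(x, grad_data, grad_reg)
        x = [xi + si for xi, si in zip(x, step)]
    return x

def plain_gradient_update(x, grad_data, grad_reg, step_size=0.1):
    """Placeholder for the learned network: an ordinary gradient-descent step."""
    return [-step_size * (gd + gr) for gd, gr in zip(grad_data, grad_reg)]
```

For example, `iterate([[2.0, 0.0], [0.0, 1.0]], [2.0, 3.0], plain_gradient_update)` converges towards `[1.0, 3.0]`. In the partially learned setting, `update` would be a trained network that receives the same two gradients as input at each iteration.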

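1711.08237 2026-06-04 eess.SY cs.AI cs.SI cs.SY version update

The Stochastic Firefighter Problem

Guy Tennenholtz, Constantine Caramanis, Shie Mannor

AI summary: This paper studies sequential vaccination of individuals in a network, proposes a vaccination policy that is optimal in a probabilistic setting, and computes upper and lower bounds on the expected number of infected individuals for different network structures.

Details

AI abstract (translated from Chinese)

Studying the dynamics of infectious disease spread plays a key role in determining risk and control measures. We study sequential vaccination policies for individuals in networks. In the original (deterministic) Firefighter problem, a fire breaks out at some node of a given graph. At each time step, b nodes can be protected by firefighters, and the fire then spreads to all unprotected neighboring nodes. The process ends when the fire can no longer spread. We extend the Firefighter problem to a probabilistic setting in which infection is stochastic. We devise a simple policy that vaccinates only neighbors of infected nodes and that is optimal on regular trees and, for a sufficiently large budget, on general graphs. We derive methods for computing upper and lower bounds on the expected number of infected individuals and provide estimates of the budget needed for containment in expectation. We compute these explicitly on trees, d-dimensional grids, and Erdős-Rényi graphs. Finally, we construct a state-dependent budget allocation strategy and demonstrate its superiority over constant budget allocation on real networks following a first-order acquaintance vaccination policy.

English abstract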

The dynamics of infectious disease spread are crucial in determining the risk of such diseases and offering ways to contain them. We study sequential vaccination of individuals in networks. In the original (deterministic) version of the Firefighter problem, a fire breaks out at some node of a given graph. At each time step, b nodes can be protected by a firefighter and then the fire spreads to all unprotected neighbors of the nodes on fire. The process ends when the fire can no longer spread. We extend the Firefighter problem to a probabilistic setting, where the infection is stochastic. We devise a simple policy that only vaccinates neighbors of infected nodes and is optimal on regular trees and on general graphs for a sufficiently large budget. We derive methods for calculating upper and lower bounds of the expected number of infected individuals, and provide estimates of the budget needed for containment in expectation. We calculate these explicitly on trees, d-dimensional grids, and Erdős-Rényi graphs. Finally, we construct a state-dependent budget allocation strategy and demonstrate its superiority over constant budget allocation on real networks following a first order acquaintance vaccination policy.
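
The process described above can be simulated with a short sketch. The infection model is an assumption of this example, not taken from the paper: each infected node infects each unprotected susceptible neighbor independently with probability `p` per time step, and the neighbor-vaccination policy picks its `budget` nodes uniformly at random among the neighbors currently at risk. The function names are hypothetical.

```python
import random

def simulate(adj, source, budget, p, seed=0):
    """One run of a stochastic firefighter process; returns the set of infected nodes.

    adj: dict mapping each node to a list of neighbors.
    budget: nodes vaccinated per time step (must be at least 1 so the run ends).
    p: assumed per-edge, per-step infection probability.
    """
    if budget < 1:
        raise ValueError("budget must be at least 1")
    rng = random.Random(seed)
    infected, protected = {source}, set()
    frontier = {source}                       # infected nodes that can still transmit
    while frontier:
        # Vaccinate up to `budget` susceptible neighbors of infected nodes.
        at_risk = sorted({v for u in frontier for v in adj[u]} - infected - protected)
        rng.shuffle(at_risk)
        protected.update(at_risk[:budget])
        # The infection then tries to spread along every remaining open edge.
        newly = set()
        for u in frontier:
            for v in adj[u]:
                if v not in infected and v not in protected and rng.random() < p:
                    newly.add(v)
        infected |= newly
        # A node stays active while it still has a susceptible, unprotected neighbor.
        frontier = {u for u in infected
                    if any(v not in infected and v not in protected for v in adj[u])}
    return infected

def expected_infected(adj, source, budget, p, runs=1000):
    """Monte Carlo estimate of the expected number of infected nodes."""
    return sum(len(simulate(adj, source, budget, p, seed=s)) for s in range(runs)) / runs
```

With `p = 1.0` this reduces to the deterministic Firefighter problem under a random neighbor-vaccination policy; averaging over runs gives an empirical counterpart to the bounds on the expected number of infected individuals.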

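1711.04518 2026-06-04 eess.SY cs.AI cs.HC cs.LG cs.NE cs.SY version update

A Supervised Learning Concept for Reducing User Interaction in Passenger Cars

Marius Stärk, Damian Backes, Christian Kehl

AI summary: This paper proposes an automated system based on supervised learning that reduces interaction complexity in human-machine interfaces, applied to set-point selection for a multimodal thermal conditioning system in passenger cars.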

Comments 4 pages, 9 figures, concept only

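1705.08551 2026-06-04 stat.ML cs.AI cs.LG cs.SY eess.SY version update

Safe Model-based Reinforcement Learning with Stability Guarantees

Felix Berkenkamp, Matteo Turchetta, Angela P. Schoellig, Andreas Krause

AI summary: This paper proposes a reinforcement learning algorithm that accounts for safety: building on Lyapunov stability verification theory, it uses statistical models of the dynamics to obtain high-performance control policies with provable stability, and demonstrates safe optimization of a neural network policy on a simulated inverted pendulum.

Comments Proc. of Neural Information Processing Systems (NIPS), 2017

Details

AI abstract (translated from Chinese)

Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which can be harmful for real-world systems. As a result, learning algorithms are rarely applied to safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety, defined in terms of stability guarantees. Specifically, we extend control-theoretic results on Lyapunov stability verification and show how statistical models of the dynamics can be used to obtain high-performance control policies with provable stability. Moreover, under additional regularity assumptions, we prove that data can be collected effectively and safely to learn the dynamics, thereby improving control performance and expanding the safe region of the state space. In our experiments, we show how the resulting algorithm safely optimizes a neural network policy on a simulated inverted pendulum without the pendulum ever falling down.

English abstract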

Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world systems. As a consequence, learning algorithms are rarely applied on safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety, defined in terms of stability guarantees. Specifically, we extend control-theoretic results on Lyapunov stability verification and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates. Moreover, under additional regularity assumptions in terms of a Gaussian process prior, we prove that one can effectively and safely collect data in order to learn about the dynamics and thus both improve control performance and expand the safe region of the state space. In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down.
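
The Lyapunov decrease condition behind such stability certificates can be sketched on a toy system. Everything here is an assumption for illustration: the dynamics are known exactly (the paper instead works with statistical models of the dynamics), the check is done only on sampled states, and the names `dynamics`, `lyapunov`, and `safe_level` are hypothetical. A sampled check like this is not a proof of stability.

```python
def dynamics(x, dt=0.1):
    """Toy closed-loop dynamics x' = -x + x**3, discretised with an Euler step."""
    return x + dt * (-x + x ** 3)


def lyapunov(x):
    """Quadratic Lyapunov candidate V(x) = x**2."""
    return x * x


def safe_level(states):
    """Largest sampled level c such that every sampled state x != 0 with
    V(x) <= c satisfies the decrease condition V(f(x)) < V(x)."""
    level = 0.0
    for x in sorted(states, key=lyapunov):
        if x == 0:
            continue  # the equilibrium itself cannot strictly decrease
        if lyapunov(dynamics(x)) >= lyapunov(x):
            break  # first violation: the certified level set ends below it
        level = lyapunov(x)
    return level


# Sample the state space on a regular grid and estimate the safe level set.
grid = [k / 100 for k in range(-200, 201)]
c = safe_level(grid)
```

For this toy system the estimated level corresponds to |x| just below 1, which matches the region where the continuous dynamics pull the state back to the origin.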


1602.06667 2026-06-04 cs.RO cs.AI cs.SY eess.SY version update

A Motion Planning Strategy for the Active Vision-Based Mapping of Ground-Level Structures

Manikandasriram Srinivasan Ramanagopal, André Phu-Van Nguyen, Jerome Le Ny

AI summary: Presents a strategy for guiding a ground robot equipped with a camera or depth sensor to autonomously map the visible part of a bounded three-dimensional structure; motion planning algorithms select suitable viewpoints and automatically fill holes in the point cloud, with applications in architecture, construction, and inspection.

Comments Accepted for publication in IEEE Transactions on Automation Science and Engineering. Available in IEEE Xplore at http://ieeexplore.ieee.org/document/8093664

Details

DOI: 10.1109/TASE.2017.2762088

Abstract

This paper presents a strategy to guide a mobile ground robot equipped with a camera or depth sensor, in order to autonomously map the visible part of a bounded three-dimensional structure. We describe motion planning algorithms that determine appropriate successive viewpoints and attempt to fill holes automatically in a point cloud produced by the sensing and perception layer. The emphasis is on accurately reconstructing a 3D model of a structure of moderate size rather than mapping large open environments, with applications for example in architecture, construction and inspection. The proposed algorithms do not require any initialization in the form of a mesh model or a bounding box, and the paths generated are well adapted to situations where the vision sensor is used simultaneously for mapping and for localizing the robot, in the absence of additional absolute positioning system. We analyze the coverage properties of our policy, and compare its performance to the classic frontier based exploration algorithm. We illustrate its efficacy for different structure sizes, levels of localization accuracy and range of the depth sensor, and validate our design on a real-world experiment.


1711.03026 2026-06-04 eess.SY cs.AI cs.SY stat.ML version update

Intelligent Fault Analysis in Electrical Power Grids

Biswarup Bhattacharya, Abhishek Sinha

AI summary: Proposes using artificial intelligence, combining formal models and machine learning methods, to assess power grid health and improve grid stability and security.

Comments In proceedings of the 29th IEEE International Conference on Tools with Artificial Intelligence (ICTAI) 2017 (full paper); 6 pages; 13 figures

Details

DOI: 10.1109/ICTAI.2017.00151

Abstract

Power grids are among the most important components of today's infrastructure. Every nation depends on the security and stability of its power grid to provide electricity to households and industries. A malfunction of even a small part of a power grid can cause loss of productivity, revenue and in some cases even life. Thus, it is imperative to design a system that can detect the health of the power grid and take protective measures before a serious anomaly takes place. To achieve this objective, we have set out to create an artificially intelligent system that can analyze the grid information at any given time and determine the health of the grid using sophisticated formal models and novel machine learning techniques like recurrent neural networks. Our system simulates grid conditions, including stimuli such as faults, generator output fluctuations, and load fluctuations, using Siemens PSS/E software; various classifiers such as SVM and LSTM are then trained and tested on this data. Our methods achieve very high accuracy on this data. The model can easily be scaled to handle larger and more complex grid architectures.


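1711.02877 2026-06-04 eess.SY cs.AI cs.LG cs.SY math.OC version update

Un résultat intrigant en commande sans modèle

An intriguing result in model-free control

Cédric Join, Emmanuel Delaleau, Michel Fliess, Claude H. Moog

AI summary: Using the Routh-Hurwitz criterion, shows that in model-free control an intelligent proportional controller may be harder to tune than an intelligent proportional-derivative (iPD) controller, and illustrates the advantages of the iPD through simulations.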

Comments in French, https://www.openscience.fr/Un-resultat-intrigant-en-commande-sans-modele

1711.02857 2026-06-04 cs.LG cs.AI cs.CV cs.NA math.NA stat.ML version update

Learning Sparse Visual Representations with Leaky Capped Norm Regularizers

Jianqiao Wangni, Dahua Lin

AI summary: Proposes the leaky capped norm regularizer for learning over-complete visual representations, proves convergence for 3D shape recovery, and reports better results than ℓ1 and other non-convex regularizers.

Details

Abstract

Sparsity inducing regularization is an important part for learning over-complete visual representations. Despite the popularity of $\ell_1$ regularization, in this paper, we investigate the usage of non-convex regularizations in this problem. Our contribution consists of three parts. First, we propose the leaky capped norm regularization (LCNR), which allows model weights below a certain threshold to be regularized more strongly as opposed to those above, therefore imposes strong sparsity and only introduces controllable estimation bias. We propose a majorization-minimization algorithm to optimize the joint objective function. Second, our study over monocular 3D shape recovery and neural networks with LCNR outperforms $\ell_1$ and other non-convex regularizations, achieving state-of-the-art performance and faster convergence. Third, we prove a theoretical global convergence speed on the 3D recovery problem. To the best of our knowledge, this is the first convergence analysis of the 3D recovery problem.
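
To make the idea of regularizing small weights more strongly than large ones concrete, here is one plausible piecewise-linear penalty. The exact functional form is an assumption and is not taken from the paper: the slope is 1 below a threshold `theta` and drops to a smaller `leak` above it. The names `leaky_capped_penalty` and `l1_penalty` are hypothetical.

```python
def leaky_capped_penalty(w, theta=1.0, leak=0.1):
    """Piecewise-linear sparsity penalty (illustrative form, an assumption).

    Below the threshold theta the slope is 1, as for the l1 norm; above it
    the slope drops to `leak` < 1, so large weights are penalised less per
    unit of magnitude and suffer less shrinkage bias.
    """
    a = abs(w)
    return min(a, theta) + leak * max(a - theta, 0.0)


def l1_penalty(w):
    """Plain l1 penalty for comparison."""
    return abs(w)
```

In this form, `leak = 0` gives a capped norm that stops penalising weights beyond `theta`, while `leak = 1` recovers the plain ℓ1 penalty; values in between trade sparsity against estimation bias.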


1708.01930 2026-06-04 cs.AI cs.MA cs.RO cs.SY eess.SY version update

Enhanced Emotion Enabled Cognitive Agent Based Rear End Collision Avoidance Controller for Autonomous Vehicles

Faisal Riaz, Muaz A. Niazi

AI summary: Proposes a rear-end collision avoidance controller based on an enhanced emotion-enabled cognitive agent; introducing a fear emotion generation mechanism improves collision avoidance efficiency for autonomous vehicles while using fewer rules.

Comments 39 pages, 17 figures

Details

Abstract

Rear-end collisions are the deadliest kind of accident and cause most traffic casualties and injuries. Many rear-end collision avoidance solutions have been proposed in existing research. However, these solutions depend heavily on precise mathematical models. Real road driving is influenced by non-linear factors such as road surface conditions, driver reaction time, pedestrian flow, and vehicle dynamics, so obtaining an accurate mathematical model of the vehicle control system is challenging. This problem with precise-control-based rear-end collision avoidance schemes has been addressed using fuzzy logic, but an excessive number of fuzzy rules directly impairs their efficiency. Furthermore, these fuzzy-logic-based controllers have been proposed without proper agent-based modeling, which helps in mimicking the functions of an artificial human driver executing these fuzzy rules. In view of these limitations, we propose an Enhanced Emotion Enabled Cognitive Agent (EEEC_Agent) based controller that helps Autonomous Vehicles (AVs) perform rear-end collision avoidance with fewer rules, designed after the fear emotion, and with high efficiency. To introduce a fear emotion generation mechanism in EEEC_Agent, the Ortony, Clore & Collins (OCC) model has been employed. The fear generation mechanism of EEEC_Agent has been verified using NetLogo simulation. Furthermore, practical validation of EEEC_Agent functions has been performed using a specially built prototype AV platform. Finally, a qualitative comparative study with existing state-of-the-art research shows that the proposed model outperforms recent research.


1708.01628 2026-06-04 cs.MA cs.AI cs.GT cs.SY eess.SY version update

Validation of Enhanced Emotion Enabled Cognitive Agent Using Virtual Overlay Multi-Agent System Approach

Faisal Riaz, Muaz A. Niazi

AI summary: Uses a Virtual Overlay Multi-Agent System approach to validate the Enhanced Emotion Enabled Cognitive Agent for road collision avoidance, showing that it perceives different levels of fear in different traffic situations and needs shorter stopping and overtaking sight distances.

Comments 35 pages, 21 figures, 19 tables

Details

Journal ref: Broad Research in Artificial Intelligence and Neuroscience 8.3 (2017): 13-37

Abstract

Making roads safer by avoiding road collisions is one of the main reasons for inventing Autonomous Vehicles (AVs). In this context, designing agent-based collision avoidance components of AVs that truly represent human cognition and emotions looks like a more feasible approach, as agents can replace human drivers. However, to the best of our knowledge, very few human emotion and cognition-inspired agent-based studies have previously been conducted in this domain. Furthermore, these agent-based solutions have not been validated using any key validation technique. In view of this lack of validation practices, we have selected the state-of-the-art Emotion Enabled Cognitive Agent (EEC_Agent), which was proposed to avoid lateral collisions between semi-AVs. The architecture of EEC_Agent has been revised using the Exploratory Agent Based Modeling (EABM) level of the Cognitive Agent Based Computing (CABC) framework, and a real-time fear emotion generation mechanism using the Ortony, Clore & Collins (OCC) model has also been introduced. The proposed fear generation mechanism has then been validated using the Validated Agent Based Modeling level of the CABC framework with a Virtual Overlay MultiAgent System (VOMAS). Extensive simulation and practical experiments demonstrate that the Enhanced EEC_Agent can feel different levels of fear according to different traffic situations and needs a smaller Stopping Sight Distance (SSD) and Overtaking Sight Distance (OSD) than human drivers.


1710.11040 2026-06-04 cs.RO cs.AI cs.SY eess.SY math.OC version update

How Should a Robot Assess Risk? Towards an Axiomatic Theory of Risk in Robotics

Anirudha Majumdar, Marco Pavone

AI summary: Explores the theoretical foundations of risk assessment in robotics, advocates axioms that risk metrics should satisfy, discusses representation theorems and instantiations usable in robotics applications, and analyzes the pitfalls of commonly used risk metrics.

Comments Extended version of paper published in International Symposium on Robotics Research (ISRR) 2017

Details

Abstract

Endowing robots with the capability of assessing risk and making risk-aware decisions is widely considered a key step toward ensuring safety for robots operating under uncertainty. But, how should a robot quantify risk? A natural and common approach is to consider the framework whereby costs are assigned to stochastic outcomes - an assignment captured by a cost random variable. Quantifying risk then corresponds to evaluating a risk metric, i.e., a mapping from the cost random variable to a real number. Yet, the question of what constitutes a "good" risk metric has received little attention within the robotics community. The goal of this paper is to explore and partially address this question by advocating axioms that risk metrics in robotics applications should satisfy in order to be employed as rational assessments of risk. We discuss general representation theorems that precisely characterize the class of metrics that satisfy these axioms (referred to as distortion risk metrics), and provide instantiations that can be used in applications. We further discuss pitfalls of commonly used risk metrics in robotics, and discuss additional properties that one must consider in sequential decision making tasks. Our hope is that the ideas presented here will lead to a foundational framework for quantifying risk (and hence safety) in robotics applications.
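
A risk metric in this sense maps a cost random variable to a real number. As a hedged illustration, the sketch below computes the empirical conditional value-at-risk (CVaR), a commonly cited member of the distortion family; the abstract does not say which instantiations the paper itself presents, so the choice of CVaR and the names `expected_cost` and `cvar` are assumptions.

```python
import math


def expected_cost(costs):
    """Risk-neutral baseline: the sample mean of the costs."""
    return sum(costs) / len(costs)


def cvar(costs, alpha):
    """Empirical CVaR: mean of the worst (1 - alpha) share of sampled costs.

    alpha lies in [0, 1); alpha = 0 recovers the expected cost, and larger
    alpha focuses on an increasingly bad tail of outcomes.
    """
    k = max(1, math.ceil((1.0 - alpha) * len(costs)))
    worst = sorted(costs, reverse=True)[:k]
    return sum(worst) / k


# Sampled costs of four possible outcomes (illustrative numbers).
samples = [1.0, 2.0, 3.0, 4.0]
```

Shifting every cost by a constant shifts `cvar` by the same constant, and scaling the costs scales it proportionally, which are two of the properties an axiomatic treatment of risk typically asks for.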


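1710.10532 2026-06-04 eess.SY cs.AI cs.LG cs.SY version update

Interpretable Apprenticeship Learning with Temporal Logic Specifications

Daniel Kasenberg, Matthias Scheutz

AI summary: Proposes inferring LTL specifications from behavior trajectories in MDPs via multi-objective optimization, using a notion of violation cost to design state-based and action-based objective functions, and validates the approach with a genetic algorithm in simple domains.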

Comments Accepted to the 56th IEEE Conference on Decision and Control (CDC 2017)

1710.09627 2026-06-04 cs.AI cs.NI cs.SY eess.SY version update

SRE: Semantic Rules Engine For the Industrial Internet-Of-Things Gateways

Charbel El Kaed, Imran Khan, Andre Van Den Berg, Hicham Hossayni, Christophe Saint-Marcel

AI summary: Presents SRE, a semantic rules engine for industrial gateways that implements dynamic, flexible rule-based control strategies, supports managing rules on the fly, and answers semantic queries.

Comments Accepted for publication in forthcoming issue of IEEE Transactions on Industrial Informatics. The content is final but has NOT been proof-read

Details

Journal ref: IEEE Transactions on Industrial Informatics, 2017

Abstract

The advent of the Internet-of-Things (IoT) paradigm has brought opportunities to solve many real-world problems. Energy management, for example, has attracted huge interest from academia, industry, governments and regulatory bodies. It involves collecting energy usage data, analyzing it, and optimizing energy consumption by applying control strategies. However, in industrial environments, performing such optimization is not trivial. Changes in business rules, process control, and customer requirements make it much more challenging. In this paper, a Semantic Rules Engine (SRE) for industrial gateways is presented that allows implementing dynamic and flexible rule-based control strategies. It is simple, expressive, and allows managing rules on-the-fly without causing any service interruption. Additionally, it can handle semantic queries and provide results by inferring additional knowledge from previously defined concepts in ontologies. SRE has been validated and tested on different hardware platforms and in commercial products. Performance evaluations are also presented to validate its conformance to the customer requirements.


1710.07147 2026-06-04 cs.AI cs.SY eess.SY version update

A Two-Phase Safe Vehicle Routing and Scheduling Problem: Formulations and Solution Algorithms

Aschkan Omidvar, Eren Erman Ozguven, O. Arda Vanli, R. Tavakkoli-Moghaddam

AI summary: Proposes a two-phase time-dependent vehicle routing and scheduling optimization model that replaces the classical shortest-distance or travel-time objectives by avoiding recurring congestion and choosing routes with lower crash probability. Phase one solves a mixed-integer programming model to determine safe routes; phase two avoids congestion by adjusting departure times and speeds; a modified simulated annealing algorithm solves the models.

Details

Abstract

We propose a two phase time dependent vehicle routing and scheduling optimization model that identifies the safest routes, as a substitute for the classical objectives given in the literature such as shortest distance or travel time, through (1) avoiding recurring congestions, and (2) selecting routes that have a lower probability of crash occurrences and non-recurring congestion caused by those crashes. In the first phase, we solve a mixed-integer programming model which takes the dynamic speed variations into account on a graph of roadway networks according to the time of day, and identify the routing of a fleet and sequence of nodes on the safest feasible paths. Second phase considers each route as an independent transit path (fixed route with fixed node sequences), and tries to avoid congestion by rescheduling the departure times of each vehicle from each node, and by adjusting the sub-optimal speed on each arc. A modified simulated annealing (SA) algorithm is formulated to solve both complex models iteratively, which is found to be capable of providing solutions in a considerably short amount of time.


1709.03153 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY version update

MBMF: Model-Based Priors for Model-Free Reinforcement Learning

Somil Bansal, Roberto Calandra, Kurtland Chua, Sergey Levine, Claire Tomlin

AI summary: Proposes a method that combines model-based and model-free reinforcement learning, using a learned probabilistic dynamics model as a prior to improve data efficiency and cost-effectiveness.

Comments After we submitted the paper for consideration in CoRL 2017 we found a paper published in the recent past with a similar method (see related work for a discussion). Considering the similarities between the two papers, we have decided to retract our paper from CoRL 2017

1707.09095 2026-06-04 cs.AI cs.SY eess.SY version update

Toward the Starting Line: A Systems Engineering Approach to Strong AI

Tansu Alpcan, Sarah M. Erfani, Christopher Leckie

AI summary: Proposes a systems engineering approach to the question of where to start with strong AI, aiming to advance mainstream research through interdisciplinary integration.

Comments 11 pages, 3 figures

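1708.08035 2026-06-04 math.OC cs.AI cs.NA math.NA version update

A Conservation Law Method in Optimization

Bin Shi

AI summary: Proposes algorithms based on frictionless Newton's second law for finding local minima, and to some extent global minima, in non-convex optimization, achieving efficient convergence through the observability and controllability of velocity.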

1710.00489 2026-06-04 cs.RO cs.AI cs.CV cs.NE cs.SY eess.SY version update

SE3-Pose-Nets: Structured Deep Dynamics Models for Visuomotor Planning and Control

Arunkumar Byravan, Felix Leeb, Franziska Meier, Dieter Fox

AI summary: Presents an approach to deep visuomotor control based on structured deep dynamics models, which learn a low-dimensional pose embedding via an encoder-decoder structure, segment the scene and predict poses, and enable closed-loop control in the real world.

Comments 8 pages, Initial submission to IEEE International Conference on Robotics and Automation (ICRA) 2018

Details

Abstract

In this work, we present an approach to deep visuomotor control using structured deep dynamics models. Our deep dynamics model, a variant of SE3-Nets, learns a low-dimensional pose embedding for visuomotor control via an encoder-decoder structure. Unlike prior work, our dynamics model is structured: given an input scene, our network explicitly learns to segment salient parts and predict their pose-embedding along with their motion modeled as a change in the pose space due to the applied actions. We train our model using a pair of point clouds separated by an action and show that given supervision only in the form of point-wise data associations between the frames our network is able to learn a meaningful segmentation of the scene along with consistent poses. We further show that our model can be used for closed-loop control directly in the learned low-dimensional pose space, where the actions are computed by minimizing error in the pose space using gradient-based methods, similar to traditional model-based control. We present results on controlling a Baxter robot from raw depth data in simulation and in the real world and compare against two baseline deep networks. Our method runs in real-time, achieves good prediction of scene dynamics and outperforms the baseline methods on multiple control runs. Video results can be found at: https://rse-lab.cs.washington.edu/se3-structured-deep-ctrl/


1709.08471 2026-06-04 math.NA cs.AI cs.NA version update

Bayesian Filtering for ODEs with Bounded Derivatives

Emilia Magnani, Hans Kersting, Michael Schober, Philipp Hennig

AI summary: Proposes a new Bayesian filtering method for solving ordinary differential equations with bounded derivatives, introducing the Integrated Ornstein-Uhlenbeck Process (IOUP) as a prior to complement the conventional Integrated Wiener Process (IWP) filter.

Comments 14 pages, 9 figures

Details

Abstract

Recently there has been increasing interest in probabilistic solvers for ordinary differential equations (ODEs) that return full probability measures, instead of point estimates, over the solution and can incorporate uncertainty over the ODE at hand, e.g. if the vector field or the initial value is only approximately known or evaluable. The ODE filter proposed in recent work models the solution of the ODE by a Gauss-Markov process which serves as a prior in the sense of Bayesian statistics. While previous work employed a Wiener process prior on the (possibly multiple times) differentiated solution of the ODE and established equivalence of the corresponding solver with classical numerical methods, this paper raises the question whether other priors also yield practically useful solvers. To this end, we discuss a range of possible priors which enable fast filtering and propose a new prior, the Integrated Ornstein-Uhlenbeck Process (IOUP), that complements the existing Integrated Wiener Process (IWP) filter by encoding the property that a derivative in time of the solution is bounded in the sense that it tends to drift back to zero. We provide experiments comparing IWP and IOUP filters which support the belief that IWP better approximates divergent ODE solutions, whereas IOUP is a better prior for trajectories with bounded derivatives.
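
The qualitative difference between the two priors can be seen by simulating the derivative process directly. This is only a sketch under stated assumptions: it models the first derivative with an Euler-Maruyama discretisation, it is not the filtering algorithm itself, and the function name `simulate_derivative` and its parameters are hypothetical.

```python
import math
import random


def simulate_derivative(steps, dt, theta, sigma, rng, v0=1.0):
    """Euler-Maruyama path of dv = -theta * v dt + sigma dW.

    theta = 0 gives a Wiener process on the derivative (IWP-style prior);
    theta > 0 gives an Ornstein-Uhlenbeck process whose drift pulls the
    derivative back to zero (IOUP-style prior).
    Returns (vs, xs), where xs is the running integral of the derivative.
    """
    v, x = v0, 0.0
    vs, xs = [v], [x]
    for _ in range(steps):
        x += v * dt
        v += -theta * v * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        vs.append(v)
        xs.append(x)
    return vs, xs


rng = random.Random(0)
# Noise-free runs isolate the drift term of each prior.
ou_v, ou_x = simulate_derivative(500, 0.01, theta=2.0, sigma=0.0, rng=rng)
wiener_v, wiener_x = simulate_derivative(500, 0.01, theta=0.0, sigma=0.0, rng=rng)
```

With the noise switched off, the Ornstein-Uhlenbeck derivative decays toward zero and its integral levels off, while the Wiener derivative keeps its initial value and its integral grows linearly, mirroring the bounded versus divergent trajectories contrasted in the abstract.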


1709.06080 2026-06-04 cs.LG cs.AI cs.NA math.NA version update

Feedforward and Recurrent Neural Networks Backward Propagation and Hessian in Matrix Form

Maxim Naumov

AI summary: Studies feedforward and recurrent neural networks from a linear algebra perspective, derives exact expressions for the Hessian, and presents the weight gradients and the Hessian in matrix form.

Comments 23 pages, 4 figures

1709.06011 2026-06-04 cs.MA cs.AI cs.LG cs.SY eess.SY stat.ML version update

Guided Deep Reinforcement Learning for Swarm Systems

Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann

AI summary: Investigates how to learn control of cooperative agents with limited sensing capabilities (such as robot swarms), proposing a guided reinforcement learning framework in which a central critic accesses the global state to simplify policy evaluation, with deep reinforcement learning approximating both the Q-function and the policy.

Comments 15 pages, 8 figures, accepted at the AAMAS 2017 Autonomous Robots and Multirobot Systems (ARMS) Workshop

Details

Abstract

In this paper, we investigate how to learn to control a group of cooperative agents with limited sensing capabilities, such as robot swarms. The agents have only very basic sensor capabilities, yet in a group they can accomplish sophisticated tasks, such as distributed assembly or search and rescue. Learning a policy for a group of agents is difficult due to distributed partial observability of the state. Here, we follow a guided approach where a critic has central access to the global state during learning, which simplifies the policy evaluation problem from a reinforcement learning point of view. For example, we can get the positions of all robots of the swarm using a camera image of a scene. This camera image is only available to the critic and not to the control policies of the robots. We follow an actor-critic approach, where the actors base their decisions only on locally sensed information. In contrast, the critic is learned based on the true global state. Our algorithm uses deep reinforcement learning to approximate both the Q-function and the policy. The performance of the algorithm is evaluated on two tasks with simple simulated 2D agents: 1) finding and maintaining a certain distance to each other and 2) locating a target.


1709.04574 2026-06-04 cs.HC cs.AI cs.SY eess.SY stat.ML version update

Towards personalized human AI interaction - adapting the behavior of AI agents using neural signatures of subjective interest

Victor Shih, David C Jangraw, Paul Sajda, Sameer Saproo

AI summary: Proposes detecting user interest from neural signatures so that a deep reinforcement learning AI agent adapts to individual human preferences, giving the first demonstration of an hBCI providing implicit reinforcement to an AI control system in a virtual environment.

Comments 11 pages, 9 figures, 1 table, Submitted to IEEE Trans. on Neural Networks and Learning Systems

Details

Abstract

Reinforcement Learning AI commonly uses reward/penalty signals that are objective and explicit in an environment (e.g. game score, completion time, etc.) in order to learn the optimal strategy for task performance. However, Human-AI interaction for such AI agents should include additional reinforcement that is implicit and subjective (e.g. human preferences for certain AI behavior) in order to adapt the AI behavior to idiosyncratic human preferences. Such adaptations would mirror naturally occurring processes that increase trust and comfort during social interactions. Here, we show how a hybrid brain-computer-interface (hBCI), which detects an individual's level of interest in objects/events in a virtual environment, can be used to adapt the behavior of a Deep Reinforcement Learning AI agent that is controlling a virtual autonomous vehicle. Specifically, we show that the AI learns a driving strategy that maintains a safe distance from a lead vehicle and, most novelly, preferentially slows the vehicle when its human passengers encounter objects of interest. This adaptation affords an additional 20% viewing time for subjectively interesting objects. This is the first demonstration of how an hBCI can be used to provide implicit reinforcement to an AI agent in a way that incorporates user preferences into the control system.


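1709.02555 2026-06-04 eess.SY cs.AI cs.LG cs.LO cs.SY version update

Causality-Aided Falsification

Takumi Akazaki, Yoshihiro Kumazawa, Ichiro Hasuo

AI summary: Proposes using causal information to make falsification more efficient in quality assurance of heterogeneous systems, with Bayesian networks refining the cost function to guide the search for input values.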

Comments In Proceedings FVAV 2017, arXiv:1709.02126

1709.02435 2026-06-04 cs.AI cs.LG cs.SE cs.SY eess.SY version update

An Analysis of ISO 26262: Using Machine Learning Safely in Automotive Software

Rick Salay, Rodrigo Queiroz, Krzysztof Czarnecki

AI summary: Analyzes the impact of using machine learning in automotive software on the ISO 26262 safety lifecycle and recommends ways to adapt the standard to accommodate machine learning.

Comments 6 pages, 3 figures

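1709.02126 2026-06-04 eess.SY cs.AI cs.SY version update

Proceedings First Workshop on Formal Verification of Autonomous Vehicles

Lukas Bulwahn, Maryam Kamali, Sven Linker

Affiliations * International Conference on integrated Formal Methods; EPTCS

AI summary: These proceedings focus on formal verification of autonomous vehicles, bringing together formal verification researchers and experts from fields such as control theory and robotics to discuss the use of verification techniques in autonomous vehicle development.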

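1708.03800 2026-06-04 eess.SY cs.AI cs.SY math.OC version update

Energy saving for building heating via a simple and efficient model-free control design: First steps with computer simulations

Hassane Abouaïssa, Ola Alhaj Hasan, Cédric Join, Michel Fliess, Didier Defer

AI summary: Proposes a model-free control approach that needs no mathematical description of the system, demonstrates its effectiveness for energy saving in building heating through computer simulations, and compares it with a classic PI controller and a flatness-based predictive controller.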

Comments 21st International Conference on System Theory, Control and Computing, October 2017, Sinaia, Romania

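1708.08133 2026-06-04 q-bio.NC cs.AI cs.SY eess.SY math.DS version update

Methods for applying the Neural Engineering Framework to neuromorphic hardware

Aaron R. Voelker, Chris Eliasmith

AI summary: Describes methods for applying the Neural Engineering Framework to state-of-the-art neuromorphic hardware, focusing on implementing linear and nonlinear dynamical systems while accounting for the higher-order dynamics of non-ideal mixed-analog-digital synapses.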

Comments 11 pages, no figures

1708.01925 2026-06-04 cs.MA cs.AI cs.CY cs.RO cs.SY eess.SY version update

Designing Autonomous Vehicles: Evaluating the Role of Human Emotions and Social Norms

Faisal Riaz, Muaz A. Niazi

AI summary: Proposes a social norms compliance mechanism that makes autonomous vehicles follow road and social rules, uses fuzzy logic to compute emotions that inform decisions, and verifies in simulation its effectiveness at reducing collisions.

Comments 42 pages, 12 figures

Details

Abstract

Humans are going to delegate the right to drive to autonomous vehicles in the near future. However, to fulfill this complicated task, there is a need for a mechanism that makes autonomous vehicles obey the road and social rules practiced by well-behaved drivers. This can be achieved by introducing a social norms compliance mechanism in autonomous vehicles. This paper proposes an artificial society of autonomous vehicles as an analogy of human society. Each AV is assigned a social personality with a different social influence. Social norms are introduced that help the AVs make decisions, influenced by emotions, regarding road collision avoidance. Furthermore, a social norms compliance mechanism for artificial social AVs is proposed using a prospect-based emotion, i.e., fear, drawn from the OCC model. Fuzzy logic is employed to compute the emotions quantitatively. Then, using the SimConnect approach, fuzzy values of fear are provided to the NetLogo simulation environment to simulate the artificial society of AVs. Extensive testing has been performed using the behavior space tool to measure the performance of the proposed approach in terms of the number of collisions. For comparison, a random-walk-model-based artificial society of AVs has been proposed as well. A comparative study with the random walk shows that the proposed approach provides a better option for tailoring the autopilots of future AVs, which will be more socially acceptable to, and trusted by, their riders in terms of safe road travel.

1610.05984 2026-06-04 cs.NE cs.AI cs.LG cs.SY eess.SY 版本更新

Particle Swarm Optimization for Generating Interpretable Fuzzy Reinforcement Learning Policies

粒子群优化用于生成可解释的模糊强化学习策略

Daniel Hein, Alexander Hentschel, Thomas Runkler, Steffen Udluft

AI总结本文提出一种基于模糊粒子群强化学习（FPSRL）的方法，通过训练参数在模拟真实系统动态的世界模型上生成可解释的模糊强化学习策略，适用于无法进行在线学习的领域。

详情

DOI: 10.1016/j.engappai.2017.07.005
Journal ref: Engineering Applications of Artificial Intelligence, Volume 65C, October 2017, Pages 87-98

AI中文摘要

模糊控制器是用于连续状态和动作空间的有效且可解释的系统控制器。到目前为止，此类控制器要么是手动构建的，要么是通过使用专家生成的问题特定成本函数或结合详细的最优控制策略知识自动训练的。在大多数现实世界的强化学习（RL）问题中，这两种要求都不存在。在这些应用中，由于在线学习需要在策略训练期间探索问题的动力学，因此通常禁止在线学习。我们引入了一种模糊粒子群强化学习（FPSRL）方法，该方法仅通过在模拟真实系统动态的世界模型上训练参数来构建模糊RL策略。这些世界模型是通过使用之前生成的转换样本的自主机器学习技术创建的。据我们所知，这种方法是首次将自组织模糊控制器与基于模型的批量RL相关联的。因此，FPSRL旨在解决那些禁止在线学习、系统动态相对容易从先前生成的默认策略转换样本中建模，并且预计存在相对易于解释的控制策略的领域的问题。通过使用三个标准RL基准，即山车、平衡小车和小车摆起，证明了所提出方法在这些领域中的效率。我们的实验结果展示了高性能且可解释的模糊策略。

英文摘要

Fuzzy controllers are efficient and interpretable system controllers for continuous state and action spaces. To date, such controllers have been constructed manually or trained automatically either using expert-generated problem-specific cost functions or incorporating detailed knowledge about the optimal control strategy. Both requirements for automatic training processes are not found in most real-world reinforcement learning (RL) problems. In such applications, online learning is often prohibited for safety reasons because online learning requires exploration of the problem's dynamics during policy training. We introduce a fuzzy particle swarm reinforcement learning (FPSRL) approach that can construct fuzzy RL policies solely by training parameters on world models that simulate real system dynamics. These world models are created by employing an autonomous machine learning technique that uses previously generated transition samples of a real system. To the best of our knowledge, this approach is the first to relate self-organizing fuzzy controllers to model-based batch RL. Therefore, FPSRL is intended to solve problems in domains where online learning is prohibited, system dynamics are relatively easy to model from previously generated default policy transition samples, and it is expected that a relatively easily interpretable control policy exists. The efficiency of the proposed approach with problems from such domains is demonstrated using three standard RL benchmarks, i.e., mountain car, cart-pole balancing, and cart-pole swing-up. Our experimental results demonstrate high-performing, interpretable fuzzy policies.

1708.03366 2026-06-04 cs.LG cs.AI cs.CR cs.SY eess.SY 版本更新

Resilient Linear Classification: An Approach to Deal with Attacks on Training Data

鲁棒线性分类：一种应对训练数据攻击的方法

Sangdon Park, James Weimer, Insup Lee

AI总结本文提出一种鲁棒线性分类方法，通过引入多数约束，提高对抗训练数据攻击的鲁棒性，验证了传统算法在攻击下的脆弱性。

Comments Accepted as a conference paper at ICCPS17

详情

DOI: 10.1145/3055004.3055006

AI中文摘要

数据驱动技术用于控制自动驾驶车辆、处理能源管理的需求响应以及建模人体生理学用于医疗设备。这些技术从训练数据中提取模型，其性能通常基于训练数据中的随机误差进行分析。然而，如果训练数据被攻击者恶意篡改，这些攻击对数据驱动CPS底层学习算法的影响尚未被考虑。本文分析了分类算法对训练数据攻击的鲁棒性。具体而言，提出了一种通用度量标准，用于衡量分类算法对训练数据最坏情况篡改的鲁棒性。使用该度量标准，我们显示传统线性分类算法在受限条件下具有鲁棒性。为克服这些限制，我们提出了一种具有多数约束的线性分类算法，并证明其比传统算法更鲁棒。在合成数据和一个现实世界的回顾性心律失常医疗案例研究中的评估显示，传统算法对篡改的训练数据易受攻击，而所提算法更具鲁棒性（以最坏情况篡改衡量）。

英文摘要

Data-driven techniques are used in cyber-physical systems (CPS) for controlling autonomous vehicles, handling demand responses for energy management, and modeling human physiology for medical devices. These data-driven techniques extract models from training data, where their performance is often analyzed with respect to random errors in the training data. However, if the training data is maliciously altered by attackers, the effect of these attacks on the learning algorithms underpinning data-driven CPS has yet to be considered. In this paper, we analyze the resilience of classification algorithms to training data attacks. Specifically, a generic metric is proposed that is tailored to measure resilience of classification algorithms with respect to worst-case tampering of the training data. Using the metric, we show that traditional linear classification algorithms are resilient under restricted conditions. To overcome these limitations, we propose a linear classification algorithm with a majority constraint and prove that it is strictly more resilient than the traditional algorithms. Evaluations on both synthetic data and a real-world retrospective arrhythmia medical case-study show that the traditional algorithms are vulnerable to tampered training data, whereas the proposed algorithm is more resilient (as measured by worst-case tampering).

1608.02193 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Spacetimes with Semantics (III) - The Structure of Functional Knowledge Representation and Artificial Reasoning

具有语义的空间（III）- 功能知识表示与人工推理的结构

Mark Burgess

AI总结本文探讨了知识表示作为语义系统的结构，基于承诺理论框架，提出概念、关联知识和情境意识的解释，强调语义时空属性对学习和智能系统的影响。

Comments 122 pages, building on parts I and II. Minor updates and corrections added to current version

详情

AI中文摘要

利用先前发展的语义时空概念，本文在承诺理论框架内探讨了知识表示及其结构作为语义系统的解释。通过为现象赋予解释，从观察者到被观察者，可以接近基于功能的系统简单描述，并具有直接实用价值。重点在于概念、关联知识和情境意识的解释。推断认为，大多数或所有这些概念源于纯粹的语义时空属性，这为更广泛理解学习或智能系统的构成提供了可能。一些关键原则浮现：1）时空尺度分离，2）四种不可约简关联类型的重复出现，通过意图传播：聚合、因果、合作和相似性，3）身份的辨别需求（离散），通过区分时间线同时性与顺序事件，4）学习（记忆）的能力。至少合理推测，涌现的知识抽象能力起源于基本的时空结构。这些笔记呈现了大部分已知结果的统一观点；它们使信息模型、知识表示、机器学习和语义网络（运输和信息基础）在共同框架下得以理解。'智能空间'的概念涵盖了人工系统和生物系统，跨越许多不同尺度，例如智能城市和组织。

英文摘要

Using the previously developed concepts of semantic spacetime, I explore the interpretation of knowledge representations, and their structure, as a semantic system, within the framework of promise theory. By assigning interpretations to phenomena, from observers to observed, we may approach a simple description of knowledge-based functional systems, with direct practical utility. The focus is especially on the interpretation of concepts, associative knowledge, and context awareness. The inference seems to be that most if not all of these concepts emerge from purely semantic spacetime properties, which opens the possibility for a more generalized understanding of what constitutes a learning, or even `intelligent' system. Some key principles emerge for effective knowledge representation: 1) separation of spacetime scales, 2) the recurrence of four irreducible types of association, by which intent propagates: aggregation, causation, cooperation, and similarity, 3) the need for discrimination of identities (discrete), which is assisted by distinguishing timeline simultaneity from sequential events, and 4) the ability to learn (memory). It is at least plausible that emergent knowledge abstraction capabilities have their origin in basic spacetime structures. These notes present a unified view of mostly well-known results; they allow us to see information models, knowledge representations, machine learning, and semantic networking (transport and information base) in a common framework. The notion of `smart spaces' thus encompasses artificial systems as well as living systems, across many different scales, e.g. smart cities and organizations.

1707.06334 2026-06-04 eess.SY cs.AI cs.IT cs.SY math.IT math.OC nlin.AO 版本更新

Fully Decentralized Policies for Multi-Agent Systems: An Information Theoretic Approach

多智能体系统的完全去中心化策略：信息论方法

Roel Dobbe, David Fridovich-Keil, Claire Tomlin

AI总结本文提出基于信息论的框架，用于多智能体系统中无通信条件下的去中心化策略设计，通过率失真理论分析策略重建最优解的能力，并扩展至确定通信节点以提升个体策略性能。

1605.07246 2026-06-04 cs.LG cs.AI cs.NA math.NA 版本更新

Adaptive ADMM with Spectral Penalty Parameter Selection

自适应ADMM与谱惩罚参数选择

Zheng Xu, Mario A. T. Figueiredo, Tom Goldstein

AI总结本文提出自适应ADMM算法，通过自适应调整惩罚参数实现快速收敛，提高算法鲁棒性与易用性。

Comments AISTATS 2017

1705.05065 2026-06-04 cs.RO cs.AI cs.CV cs.SY eess.SY 版本更新

AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles

AirSim：面向自动驾驶车辆的高保真视觉与物理模拟

Shital Shah, Debadeepta Dey, Chris Lovett, Ashish Kapoor

AI总结本文提出基于Unreal引擎的AirSim模拟器，用于高效开发和测试自动驾驶算法，支持高频率物理模拟和多种协议，通过四旋翼实验验证其有效性。

Comments Accepted for Field and Service Robotics conference 2017 (FSR 2017)

1707.02515 2026-06-04 cs.AI cs.LG cs.SY eess.SY 版本更新

A Fast Integrated Planning and Control Framework for Autonomous Driving via Imitation Learning

一种通过模仿学习的快速集成规划与控制系统用于自动驾驶

Liting Sun, Cheng Peng, Wei Zhan, Masayoshi Tomizuka

AI总结本文提出一种结合学习与优化方法的两层框架，通过神经网络学习长期最优策略并结合短期优化控制器提升自动驾驶的安全性和效率。

详情

AI中文摘要

为实现自动驾驶中的安全高效规划与控制，需要一种能够长期 horizon 内实现良好驾驶质量且保证安全可行的驾驶策略。基于优化的方法，如模型预测控制（MPC），可以提供此类最优策略，但其计算复杂度通常无法满足实时实现的需求。为解决此问题，我们提出了一种快速集成规划与控制系统，该系统通过在两层分层结构中结合学习与优化方法。第一层定义为“策略层”，由神经网络建立，学习由MPC生成的长期最优驾驶策略。第二层称为“执行层”，是一个基于优化的短期控制器，能够跟踪由“策略层”提供的参考轨迹，并保证短期的安全性和可行性。此外，通过高效且高度代表性的特征，小尺寸的神经网络足以处理许多复杂的驾驶场景。这使得在线模仿学习与数据集聚合（DAgger）成为可能，从而能够快速且持续地提升“策略层”的性能。几个驾驶场景的例子被演示以验证所提框架的有效性和效率。

英文摘要

For safe and efficient planning and control in autonomous driving, we need a driving policy that can achieve desirable driving quality over a long-term horizon with guaranteed safety and feasibility. Optimization-based approaches, such as Model Predictive Control (MPC), can provide such optimal policies, but their computational complexity is generally unacceptable for real-time implementation. To address this problem, we propose a fast integrated planning and control framework that combines learning- and optimization-based approaches in a two-layer hierarchical structure. The first layer, defined as the "policy layer", is established by a neural network which learns the long-term optimal driving policy generated by MPC. The second layer, called the "execution layer", is a short-term optimization-based controller that tracks the reference trajectories given by the "policy layer" with guaranteed short-term safety and feasibility. Moreover, with efficient and highly representative features, a small neural network is sufficient in the "policy layer" to handle many complicated driving scenarios. This enables online imitation learning with Dataset Aggregation (DAgger), so that the performance of the "policy layer" can be improved rapidly and continuously online. Several example driving scenarios are demonstrated to verify the effectiveness and efficiency of the proposed framework.

1706.09597 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Path Integral Networks: End-to-End Differentiable Optimal Control

路径积分网络：端到端可微最优控制

Masashi Okada, Luca Rigazio, Takenobu Aoshima

AI总结本文提出路径积分网络（PI-Net），一种基于路径积分最优控制算法的递归网络表示，用于最优控制规划。PI-Net通过反向传播和随机梯度下降端到端学习系统动态和成本模型，具备规划能力，可泛化到未见状态，适用于连续控制任务，并支持多种学习方案。

1609.00932 2026-06-04 cs.LG cs.AI cs.SY eess.SY math.PR physics.data-an 版本更新

Spectral learning of dynamic systems from nonequilibrium data

从非平衡数据中学习动态系统的谱方法

Hao Wu, Frank Noé

AI总结本文研究了在不假设数据同分布的情况下，通过施加平衡约束从非平衡观测数据中提取系统平衡动力学的谱学习特性，并提出了一种适用于连续数据的无bin扩展方法，实现线性复杂度下的稳定估计。

详情

Journal ref: Proceedings of the 29th conference on Neural Information Processing Systems (NIPS), Barcelona, Spain, 2016, pp. 4179-4187

AI中文摘要

可观测操作模型（OOMs）及相关模型是建模和分析随机系统的重要且强大的工具。它们精确描述有限秩系统的动力学，并可通过谱学习在假设数据同分布的情况下高效一致地估计。本文研究了在分析长时间尺度系统时不假设数据同分布的谱学习特性，并展示通过施加平衡约束可从非平衡观测数据中提取系统平衡动力学。此外，本文提出了一种适用于连续数据的无bin扩展谱学习方法。与其他连续值谱算法相比，无bin算法仅需线性复杂度即可实现平衡动力学的一致估计。

英文摘要

Observable operator models (OOMs) and related models are one of the most important and powerful tools for modeling and analyzing stochastic systems. They exactly describe dynamics of finite-rank systems and can be efficiently and consistently estimated through spectral learning under the assumption of identically distributed data. In this paper, we investigate the properties of spectral learning without this assumption due to the requirements of analyzing large-time scale systems, and show that the equilibrium dynamics of a system can be extracted from nonequilibrium observation data by imposing an equilibrium constraint. In addition, we propose a binless extension of spectral learning for continuous data. In comparison with the other continuous-valued spectral algorithms, the binless algorithm can achieve consistent estimation of equilibrium dynamics with only linear complexity.

1602.07764 2026-06-04 cs.AI cs.LG cs.NA math.NA math.OC stat.ML 版本更新

Reinforcement Learning of POMDPs using Spectral Methods

使用谱方法进行POMDP的强化学习

Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

AI总结本文提出基于谱分解方法的POMDP强化学习算法，通过轨迹学习参数并利用优化 oracle 得到最优无记忆策略，证明了与最优无记忆策略的最优 regret 绑定和高维空间的高效扩展性。

详情

Journal ref: 29th Annual Conference on Learning Theory, PMLR 49:193-256, 2016

AI中文摘要

我们提出了一种新的基于谱分解方法的POMDP强化学习算法。尽管谱方法之前已被用于一致学习隐马尔可夫模型等被动潜在变量模型，但POMDP更具挑战性，因为学习者与环境交互可能会改变未来的观测。我们设计了一种通过回合运行的算法，每个回合中利用谱技术从由固定策略生成的轨迹中学习POMDP参数。回合结束时，优化 oracle 返回基于估计POMDP模型的最优无记忆规划策略，该策略最大化预期奖励。我们证明了与最优无记忆策略相比的最优 regret 绑定以及在观测和动作空间维度上的高效扩展性。

英文摘要

We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDP) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the future observations in the process. We devise a learning algorithm running through episodes, in each episode we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. At the end of the episode, an optimization oracle returns the optimal memoryless planning policy which maximizes the expected reward based on the estimated POMDP model. We prove an order-optimal regret bound with respect to the optimal memoryless policy and efficient scaling with respect to the dimensionality of observation and action spaces.

1405.6341 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

Efficient Model Learning for Human-Robot Collaborative Tasks

高效的人机协作任务模型学习

Stefanos Nikolaidis, Keren Gu, Ramya Ramakrishnan, Julie Shah

AI总结本文提出一种框架，通过联合动作演示学习人类用户模型，使机器人能自动计算稳健的协作策略。采用无监督学习聚类动作序列，学习逆强化学习奖励函数，并在混合可观测马尔可夫决策过程框架中应用，实现对新用户的类型推断和策略计算。

详情

DOI: 10.1145/2696454.2696455
Journal ref: Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI 2015)

AI中文摘要

我们提出了一种框架，用于从联合动作演示中学习人类用户模型，使机器人能够计算协作任务的稳健策略。学习过程完全自动，无需人工干预。首先，我们描述了使用无监督学习算法将演示的动作序列聚类为不同的人类类型。这些演示序列还被机器人用来通过逆强化学习算法学习代表每种类型的奖励函数。学习的模型随后作为混合可观测马尔可夫决策过程（MO-MDP）的一部分使用，其中人类类型是部分可观测变量。通过该框架，我们可以推断新用户类型（未包含在训练集中），并计算与新用户偏好一致且对人类动作偏离具有鲁棒性的机器人策略。最后，我们通过人类受试者实验数据验证了该方法，并进行了概念验证演示，其中一个人与小型工业机器人进行协作任务。

英文摘要

We present a framework for learning human user models from joint-action demonstrations that enables the robot to compute a robust policy for a collaborative task with a human. The learning takes place completely automatically, without any human intervention. First, we describe the clustering of demonstrated action sequences into different human types using an unsupervised learning algorithm. These demonstrated sequences are also used by the robot to learn a reward function that is representative for each type, through the employment of an inverse reinforcement learning algorithm. The learned model is then used as part of a Mixed Observability Markov Decision Process formulation, wherein the human type is a partially observable variable. With this framework, we can infer, either offline or online, the human type of a new user that was not included in the training set, and can compute a policy for the robot that will be aligned to the preference of this new user and will be robust to deviations of the human actions from prior demonstrations. Finally we validate the approach using data collected in human subject experiments, and conduct proof-of-concept demonstrations in which a person performs a collaborative task with a small industrial robot.

1702.07944 2026-06-04 cs.LG cs.AI cs.SY eess.SY math.OC stat.ML 版本更新

Stochastic Variance Reduction Methods for Policy Evaluation

基于随机方差缩减的方法用于策略评估

Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou

AI总结本文提出基于线性函数逼近的策略评估方法，通过将经验策略评估问题转化为二次凸-凹鞍点问题，并设计了双变量批量梯度方法及两种随机方差缩减算法，实现线性缩放和线性收敛。

Comments Accepted by ICML 2017

1705.10432 2026-06-04 cs.AI cs.RO cs.SY eess.SY 版本更新

Fine-grained acceleration control for autonomous intersection management using deep reinforcement learning

基于深度强化学习的细粒度加速控制用于自动驾驶交叉口管理

Hamid Mirzaei, Tony Givargis

AI总结本文利用信任区域策略优化方法，实现自动驾驶车辆在网格街道中的细粒度加速控制，以达成全局管理目标。

Comments Accepted in IEEE Smart World Congress 2017

1705.05116 2026-06-04 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY 版本更新

Tuning Modular Networks with Weighted Losses for Hand-Eye Coordination

通过加权损失调节模块网络以提升手眼协调

Fangyi Zhang, Jürgen Leitner, Michael Milford, Peter I. Corke

AI总结本文提出端到端微调方法，通过加权损失提升模块化深度视觉-运动策略在平面抓取任务中的手眼协调性能。

Comments 2 pages, to appear in the Deep Learning for Robotic Vision (DLRV) Workshop in CVPR 2017

1601.04037 2026-06-04 cs.RO cs.AI cs.SY eess.SY math.DS math.OC 版本更新

Funnel Libraries for Real-Time Robust Feedback Motion Planning

用于实时鲁棒反馈运动规划的 funnel 库

Anirudha Majumdar, Russ Tedrake

AI总结本文提出利用预计算的 funnel 库实现实时鲁棒反馈运动规划，通过凸优化计算 funnel 并在运行时安全组合运动计划，验证了在复杂环境中高动态机器人系统鲁棒性和安全性。

Comments International Journal of Robotics Research (To Appear)

详情

AI中文摘要

我们考虑了在存在环境不确定性、参数模型不确定性和扰动时，生成保证成功的机器人运动计划的问题。此外，我们还考虑了必须在实时中生成这些计划的场景，因为环境中的约束（如障碍物）可能在运行时通过有噪声的传感器感知到。我们的方法是预先计算不同系统操作的“funnels”库，这些 funnels 确保在执行对应操作的反馈控制器时，状态在扰动范围内保持。我们利用凸优化（特别是求和平方编程）的强大计算能力来计算这些 funnels。所得到的 funnel 库然后在运行时被顺序组合以生成运动计划，同时确保机器人的安全性。本文的一个主要优势是通过显式考虑不确定性的影响，机器人可以根据运动计划对扰动的脆弱性来评估。我们通过大量硬件实验（在高速（约12英里/小时）下避障的小型固定翼飞机）和彻底的仿真实验（地面车辆和四旋翼模型在复杂环境中导航）来演示和验证我们的方法。据我们所知，这些演示构成了首次证明安全且鲁棒的控制方法，用于具有复杂非线性动力学的机器人系统，在具有复杂几何约束的环境中实时规划。

英文摘要

We consider the problem of generating motion plans for a robot that are guaranteed to succeed despite uncertainty in the environment, parametric model uncertainty, and disturbances. Furthermore, we consider scenarios where these plans must be generated in real-time, because constraints such as obstacles in the environment may not be known until they are perceived (with a noisy sensor) at runtime. Our approach is to pre-compute a library of "funnels" along different maneuvers of the system that the state is guaranteed to remain within (despite bounded disturbances) when the feedback controller corresponding to the maneuver is executed. We leverage powerful computational machinery from convex optimization (sums-of-squares programming in particular) to compute these funnels. The resulting funnel library is then used to sequentially compose motion plans at runtime while ensuring the safety of the robot. A major advantage of the work presented here is that by explicitly taking into account the effect of uncertainty, the robot can evaluate motion plans based on how vulnerable they are to disturbances. We demonstrate and validate our method using extensive hardware experiments on a small fixed-wing airplane avoiding obstacles at high speed (~12 mph), along with thorough simulation experiments of ground vehicle and quadrotor models navigating through cluttered environments. To our knowledge, these demonstrations constitute one of the first examples of provably safe and robust control for robotic systems with complex nonlinear dynamics that need to plan in real-time in environments with complex geometric constraints.

1704.03103 2026-06-04 cs.RO cs.AI cs.CG cs.SY eess.SY 版本更新

Minkowski Operations of Sets with Application to Robot Localization

Minkowski运算与机器人定位的应用

Benoit Desrochers, Luc Jaulin

AI总结本文通过引入Minkowski和与差的分离器，高效解决机器人在非结构化环境中基于声呐测量的定位问题，并通过测试案例验证了方法的有效性。

Comments In Proceedings SNR 2017, arXiv:1704.02421

1704.01383 2026-06-04 eess.SY cs.AI cs.SY math.OC 版本更新

Finite-Time Stabilization of Longitudinal Control for Autonomous Vehicles via a Model-Free Approach

通过模型无关方法实现自动驾驶车辆纵向控制的有限时间稳定化

Philip Polack, Brigitte d'Andréa-Novel, Michel Fliess, Arnaud de la Fortelle, Lghani Menhour

AI总结本文提出一种模型无关的纵向控制方法，用于计算车辆轮扭矩命令，以克服未知车辆参数带来的控制律生成问题。通过使关键参数时间变化，确保有限时间稳定性，仿真显示 overshoot 减小，驾驶舒适性提高，时间延迟鲁棒性增强。

Comments IFAC 2017 World Congress, Toulouse

1703.08262 2026-06-04 eess.SY cs.AI cs.FL cs.SY 版本更新

Optimal Detection of Faulty Traffic Sensors Used in Route Planning

用于路线规划的故障交通传感器最优检测

Amin Ghafouri, Aron Laszka, Abhishek Dubey, Xenofon Koutsoukos

AI总结本文提出基于高斯过程的预测模型，用于检测故障交通传感器，减少误报和漏报对路线规划的影响，并通过实测数据验证方法有效性。

Comments Proceedings of The 2nd Workshop on Science of Smart City Operations and Platforms Engineering (SCOPE 2017), Pittsburgh, PA USA, April 2017, 6 pages

详情

DOI: 10.1145/3063386.3063767

AI中文摘要

在智能城市中，实时交通传感器可能被用于各种应用，如路线规划。不幸的是，传感器容易出现故障，导致错误的交通数据。错误的数据会严重影响路线规划等应用，并增加旅行时间。为最小化传感器故障的影响，必须及时准确地检测故障。然而，典型检测算法可能导致大量误报和漏报，从而导致次优的路线规划。本文提出了一种有效的检测器，利用基于高斯过程的预测模型来识别故障交通传感器。进一步，我们提出了一种计算检测器最佳参数的方法，以最小化由于误报和漏报造成的损失。我们还确定了关键传感器，其故障对路线规划应用影响较大。最后，我们实施了我们的方法，并使用真实世界数据集和路线规划平台OpenTripPlanner进行数值评估。

英文摘要

In a smart city, real-time traffic sensors may be deployed for various applications, such as route planning. Unfortunately, sensors are prone to failures, which result in erroneous traffic data. Erroneous data can adversely affect applications such as route planning, and can cause increased travel time. To minimize the impact of sensor failures, we must detect them promptly and accurately. However, typical detection algorithms may lead to a large number of false positives (i.e., false alarms) and false negatives (i.e., missed detections), which can result in suboptimal route planning. In this paper, we devise an effective detector for identifying faulty traffic sensors using a prediction model based on Gaussian Processes. Further, we present an approach for computing the optimal parameters of the detector which minimize losses due to false-positive and false-negative errors. We also characterize critical sensors, whose failure can have high impact on the route planning application. Finally, we implement our method and evaluate it numerically using a real-world dataset and the route planning platform OpenTripPlanner.
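
The idea of tuning a detector to minimize losses from false positives and false negatives can be illustrated with a minimal sketch. It is not the authors' method: it assumes the prediction residuals (observed minus predicted traffic values) are already available from some model, and the function names, cost values, and toy data are all hypothetical.

```python
def detection_loss(threshold, residuals, is_faulty, cost_fp=1.0, cost_fn=5.0):
    """Total loss of a simple threshold detector on labeled residuals.

    An alarm is raised when |residual| exceeds the threshold.
    cost_fp and cost_fn are hypothetical per-error costs.
    """
    loss = 0.0
    for residual, faulty in zip(residuals, is_faulty):
        alarm = abs(residual) > threshold
        if alarm and not faulty:
            loss += cost_fp  # false positive (false alarm)
        elif not alarm and faulty:
            loss += cost_fn  # false negative (missed detection)
    return loss


def best_threshold(candidates, residuals, is_faulty, cost_fp=1.0, cost_fn=5.0):
    """Pick the candidate threshold with the smallest total loss."""
    return min(
        candidates,
        key=lambda t: detection_loss(t, residuals, is_faulty, cost_fp, cost_fn),
    )


# Toy data (made up for illustration): residuals and ground-truth fault labels.
residuals = [0.5, -1.0, 0.8, 6.0, -7.5, 1.2, 9.0, -0.3]
is_faulty = [False, False, False, True, True, False, True, False]
candidates = [0.25, 1.0, 2.0, 8.0]

chosen = best_threshold(candidates, residuals, is_faulty)
```

A low threshold floods the route planner with false alarms, while a high one misses real failures; the weighted loss makes that trade-off explicit.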

1701.01654 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Application of Fuzzy Logic in Design of Smart Washing Machine

模糊逻辑在智能洗衣机设计中的应用

Rao Farhat Masood

AI总结本文研究了基于模糊逻辑控制器的智能洗衣机设计，通过自动化输入实现高效洗衣时间管理，提升电能利用效率和工作效能。

Comments Fuzzy Washing Machine, Smart Washing Machine

详情

AI中文摘要

洗衣机在家庭中具有重要需求，因为它能减轻我们洗衣服的负担并节省大量时间。本文将探讨基于模糊逻辑控制器的智能洗衣机设计与开发。传统的洗衣机（定时器基于）使用多转定时器启动-停止机制，这种机制是机械的，容易损坏。除了启动和停止的问题外，机械定时器在维护和电力使用效率方面也存在问题。最近的发展表明，将数字电子技术融入该机器的最优功能是可能的，如今在实践中已实现。一些国际知名公司已通过引入智能人工智能开发了这种机器。此类机器利用传感器并智能计算主电机的运行时间（洗涤时间）。实时计算和处理也用于优化机器的运行时间。显然的结果是智能时间管理、更好的电力经济性和工作效率。本文探讨了基于模糊逻辑控制器的洗衣机的国产化，该洗衣机能够自动化输入并获得所需输出（洗涤时间）。

英文摘要

The washing machine is a great domestic necessity, as it frees us from the burden of washing our clothes and saves ample time. This paper covers the design and development of a fuzzy-logic-based smart washing machine. The regular (timer-based) washing machine uses a multi-turned, timer-based start-stop mechanism, which is mechanical and prone to breakage. In addition to their starting and stopping issues, mechanical timers are inefficient with respect to maintenance and electricity usage. Recent developments have shown that integrating digital electronics for optimal functionality of this machine is possible and is nowadays common practice. A number of internationally renowned companies have developed machines that incorporate artificial intelligence. Such a machine uses sensors and smartly calculates the run-time (washing time) for the main machine motor. Real-time calculations and processes also help optimize the run-time of the machine. The result is smart time management, better economy of electricity, and higher work efficiency. This paper deals with the indigenization of an FLC (Fuzzy Logic Controller) based washing machine, which is capable of automating the inputs and producing the desired output (wash time).
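
How a fuzzy logic controller might turn sensor readings into a wash time can be sketched as follows. This is only an illustration under stated assumptions, not the paper's design: the two inputs (dirt level and load size on a 0 to 100 scale), the triangular membership functions, the rule table, and the weighted-average defuzzification are all hypothetical choices.

```python
def tri(x, a, b, c):
    """Triangular membership: 0 outside (a, c), rising to 1 at the peak b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)


def memberships(x):
    """Fuzzify a 0..100 reading into low / medium / high degrees."""
    x = min(max(x, 0.0), 100.0)  # clamp to the assumed sensor range
    return {
        "low": tri(x, -1.0, 0.0, 50.0),
        "medium": tri(x, 0.0, 50.0, 100.0),
        "high": tri(x, 50.0, 100.0, 101.0),
    }


# Hypothetical rule table: (dirt level, load size) -> wash time in minutes.
WASH_MINUTES = {
    ("low", "low"): 10, ("low", "medium"): 15, ("low", "high"): 25,
    ("medium", "low"): 20, ("medium", "medium"): 30, ("medium", "high"): 40,
    ("high", "low"): 35, ("high", "medium"): 45, ("high", "high"): 60,
}


def wash_time(dirt, load):
    """Weighted average of rule outputs; each rule fires with min(degrees)."""
    dirt_mu, load_mu = memberships(dirt), memberships(load)
    total = weighted = 0.0
    for (dirt_label, load_label), minutes in WASH_MINUTES.items():
        strength = min(dirt_mu[dirt_label], load_mu[load_label])
        total += strength
        weighted += strength * minutes
    return weighted / total


print(wash_time(25, 50))  # 22.5
```

For a lightly soiled, medium load, the "low dirt" and "medium dirt" rules each fire at degree 0.5, so the result lies halfway between their 15- and 30-minute outputs.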

1703.03161 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Behavior-based Navigation of Mobile Robot in Unknown Environments Using Fuzzy Logic and Multi-Objective Optimization

基于模糊逻辑和多目标优化的未知环境中移动机器人行为导航

Thi Thanh Van Nguyen, Manh Duong Phung, Quang Vinh Tran

AI总结本文提出BBFM架构，通过模糊控制器和多目标优化协调机器人在未知环境中避障和避开局部极小值的问题，提升了导航精度和效率。

详情

DOI: 10.14257/ijca.2017.10.2.29
Journal ref: International Journal of Control and Automation, Vol. 10, No. 2 (2017), pp.349-364

AI中文摘要

本文提出一种名为BBFM的行为导航架构，用于解决在存在障碍物和局部极小值区域的未知环境中移动机器人的导航问题。在该架构中，复杂导航任务被分解为主要子任务或行为。每个行为由模糊控制器实现并独立执行以处理特定导航问题。模糊控制器被修改为仅包含模糊化和推理过程，使其输出表示行为的目标的隶属函数。所有控制器的隶属函数随后用作多目标优化过程的目标函数以协调所有行为。该过程的结果是整体控制信号，即帕累托最优的控制信号，用于控制机器人。进行了大量模拟、比较和实验。结果表明，所提出的架构在精度、平滑度、行驶距离和时间响应方面优于一些流行的基于行为的架构。

英文摘要

This study proposes a behavior-based navigation architecture, named BBFM, to deal with the problem of navigating a mobile robot in unknown environments in the presence of obstacles and local minimum regions. In the architecture, the complex navigation task is split into principal sub-tasks or behaviors. Each behavior is implemented by a fuzzy controller and executed independently to deal with a specific problem of navigation. The fuzzy controller is modified to contain only the fuzzification and inference procedures so that its output is a membership function representing the behavior's objective. The membership functions of all controllers are then used as the objective functions for a multi-objective optimization process to coordinate all behaviors. The result of this process is an overall control signal, which is Pareto-optimal, used to control the robot. A number of simulations, comparisons, and experiments were conducted. The results show that the proposed architecture outperforms some popular behavior-based architectures in terms of accuracy, smoothness, traveled distance, and time response.

1703.02810 2026-06-04 cs.AI cs.LG cs.SY eess.SY 版本更新

An Integrated and Scalable Platform for Proactive Event-Driven Traffic Management

主动事件驱动交通管理的集成可扩展平台

Alain Kibangou, Alexander Artikis, Evangelos Michelioudakis, Georgios Paliouras, Marius Schmitt, John Lygeros, Chris Baber, Natan Morar, Fabiana Fournier, Inna Skarbovsky

AI总结本文提出一个集成平台，通过事件驱动方法预测拥堵，提升交通管理效率。

1702.08726 2026-06-04 cs.SE cs.AI cs.SY eess.SY 版本更新

Stacked Thompson Bandits

堆叠汤普森老虎机

Lenz Belzner, Thomas Gabor

AI总结堆叠汤普森老虎机通过模拟评估计划并采用贝叶斯方法指导搜索，高效生成满足时间逻辑要求的计划。

Comments Accepted at SEsCPS @ ICSE 2017

1503.03467 2026-06-04 math.NA cs.AI cs.NA math.ST stat.ML stat.TH 版本更新

Multigrid with rough coefficients and Multiresolution operator decomposition from Hierarchical Information Games

多重网格与粗糙系数的多重分辨率算子分解来自层次信息博弈

Houman Owhadi

AI总结本文提出了一种近线性复杂度的多重网格/多重分辨率方法，用于处理具有粗糙系数的偏微分方程，通过信息博弈理论框架实现精确的先验精度和性能估计。

Comments Presented at SIAM CSE 15. Final (published) version. http://epubs.siam.org/doi/abs/10.1137/15M1013894

详情

Journal ref: SIAM Rev. 59-1, pp. 99-149 (2017)

AI中文摘要

我们介绍了一种近线性复杂度（几何和无网格/代数）的多重网格/多重分辨率方法，用于具有粗糙（L^∞）系数的偏微分方程，具有严格的先验准确性和性能估计。该方法通过决策/博弈理论框架发现，解决三个问题：（1）识别限制和插值算子；（2）基于线性算子图像的范数约束恢复信号；（3）基于解的层次嵌套测量的赌注。所得基本赌注形成一个层次的（确定性）基函数集合H^1_0(Ω)（赌注函数），这些函数（1）在子尺度/子带之间关于由偏微分方程能量范数诱导的标量积正交；（2）在H^1_0(Ω)中实现解空间的稀疏压缩；（3）诱导一个正交的多重分辨率算子分解。多重网格方法的操作图是一个倒置金字塔，其中赌注函数局部计算（由于指数衰减），层次化（从细到粗尺度），并分解为具有均匀有界条件数的独立线性系统。所得算法在空间（通过局部化）和带宽/子尺度（子尺度可以独立计算）上均可并行化。尽管该方法是确定性的，但其在信息博弈框架下具有自然的贝叶斯解释，且多重分辨率逼近相对于由嵌套测量层次诱导的滤波器形成一个鞅。

英文摘要

We introduce a near-linear complexity (geometric and meshless/algebraic) multigrid/multiresolution method for PDEs with rough ($L^\infty$) coefficients with rigorous a-priori accuracy and performance estimates. The method is discovered through a decision/game theory formulation of the problems of (1) identifying restriction and interpolation operators (2) recovering a signal from incomplete measurements based on norm constraints on its image under a linear operator (3) gambling on the value of the solution of the PDE based on a hierarchy of nested measurements of its solution or source term. The resulting elementary gambles form a hierarchy of (deterministic) basis functions of $H^1_0(\Omega)$ (gamblets) that (1) are orthogonal across subscales/subbands with respect to the scalar product induced by the energy norm of the PDE (2) enable sparse compression of the solution space in $H^1_0(\Omega)$ (3) induce an orthogonal multiresolution operator decomposition. The operating diagram of the multigrid method is that of an inverted pyramid in which gamblets are computed locally (by virtue of their exponential decay), hierarchically (from fine to coarse scales) and the PDE is decomposed into a hierarchy of independent linear systems with uniformly bounded condition numbers. The resulting algorithm is parallelizable both in space (via localization) and in bandwidth/subscale (subscales can be computed independently from each other). Although the method is deterministic it has a natural Bayesian interpretation under the measure of probability emerging (as a mixed strategy) from the information game formulation and multiresolution approximations form a martingale with respect to the filtration induced by the hierarchy of nested measurements.

1702.01205 2026-06-04 cs.AI cs.LG cs.SY eess.SY 版本更新

Traffic Lights with Auction-Based Controllers: Algorithms and Real-World Data

带拍卖机制的交通灯控制器：算法与现实数据

Shumeet Baluja, Michele Covell, Rahul Sukthankar

AI总结本文提出一种基于拍卖的交通灯控制器，通过微拍卖整合交通传感器信息，提升路容量和平均出行时间，优于现有静态程序灯和长期规划方案。

ACRV 摘取基准 (APB)：一个促进可重复研究的机器人货架摘取基准

Jürgen Leitner, Adam W. Tow, Jake E. Dean, Niko Suenderhauf, Joseph W. Durham, Matthew Cooper, Markus Eich, Christopher Lehnert, Ruben Mangels, Christopher McCool, Peter Kujala, Lachlan Nicholson, Trung Pham, James Sergeant, Liao Wu, Fangyi Zhang, Ben Upcroft, Peter Corke

AI总结本文提出ACRV摘取基准(APB)，通过42个常见物品、广泛可用的货架和精确的物品排列指南，提供可重复的机器人摘取基准，支持完整机器人系统的比较。

Comments 8 pages, submitted to RA:Letters

1612.04023 2026-06-04 eess.SY cs.AI cs.RO cs.SY 版本更新

Proceedings of the First Workshop on Verification and Validation of Cyber-Physical Systems

第一届验证与验证网络物理系统研讨会会议记录

Mehdi Kargahi, Ashutosh Trivedi

发表机构 * Reykjavík, Iceland（冰岛雷克雅未克）； MITL Specification Debugging for Monitoring of Cyber-Physical Systems（网络物理系统监控的MITL规格调试）； Automatic Synthesis of Controllers from Specifications using Control Certificates（使用控制证书从规范自动合成控制器）； A Compositional Framework for Preference-Aware Agents（偏好感知代理的组合框架）； Output Feedback Controller Design with Symbolic Observers for Cyber-physical Systems（网络物理系统符号观测器输出反馈控制器设计）； Towards an Approximate Conformance Relation for Hybrid I/O Automata（混合I/O自动机近似一致性关系）； On Nonlinear Prices in Timed Automata（时序自动机中的非线性价格）； Towards the Verification of Safety-critical Autonomous Systems in Dynamic Environments（动态环境中安全关键自主系统的验证）

AI总结本文介绍了首届网络物理系统验证与验证研讨会，探讨了验证与验证方法，包括控制、模拟和形式化方法等，旨在解决复杂软件和算法的验证问题。

详情

DOI: 10.4204/EPTCS.232
Journal ref: EPTCS 232, 2016

AI中文摘要

第一届国际网络物理系统验证与验证研讨会（V2CPS-16）于冰岛雷克雅未克举行的第十二届国际形式化方法整合会议（iFM 2016）期间召开。该研讨会旨在汇集形式化验证和网络物理系统（CPS）领域的研究人员和专家，讨论涵盖广泛验证与验证方法的主题，包括但不限于控制、模拟、形式化方法等。网络物理系统（CPS）是网络化计算和物理过程的整合，具有有意义的相互作用；前者监控、控制并影响后者，而后者也影响前者。CPS在机器人、交通、通信、基础设施、能源和制造系统中有广泛应用。许多安全关键系统，如化学过程、医疗设备、飞机飞行控制系统和汽车系统，确实属于CPS。CPS的先进能力需要复杂的软件和合成算法，这些算法难以验证。事实上，该领域中的许多问题都是不可判定的。因此，一个重要的步骤是找到特定的抽象，这些抽象可能在特定属性上算法上可验证，描述CPS的部分或整体行为。

英文摘要

The first International Workshop on Verification and Validation of Cyber-Physical Systems (V2CPS-16) was held in conjunction with the 12th International Conference on integration of Formal Methods (iFM 2016) in Reykjavik, Iceland. The purpose of V2CPS-16 was to bring together researchers and experts of the fields of formal verification and cyber-physical systems (CPS) to cover the theme of this workshop, namely a wide spectrum of verification and validation methods including (but not limited to) control, simulation, formal methods, etc. A CPS is an integration of networked computational and physical processes with meaningful inter-effects; the former monitors, controls, and affects the latter, while the latter also impacts the former. CPSs have applications in a wide-range of systems spanning robotics, transportation, communication, infrastructure, energy, and manufacturing. Many safety-critical systems such as chemical processes, medical devices, aircraft flight control, and automotive systems, are indeed CPS. The advanced capabilities of CPS require complex software and synthesis algorithms, which are hard to verify. In fact, many problems in this area are undecidable. Thus, a major step is to find particular abstractions of such systems which might be algorithmically verifiable regarding specific properties of such systems, describing the partial/overall behaviors of CPSs.


1612.02739 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

Controlling Robot Morphology from Incomplete Measurements

从不完整测量中控制机器人形态

Martin Pecka, Karel Zimmermann, Michal Reinštein, Tomáš Svoboda

AI总结针对复杂形态机器人在城市搜索与救援任务中的地形穿越需求，提出通过自主控制处理不完整数据并确保安全性的方法。

Comments Accepted into IEEE Transactions on Industrial Electronics, Special Section on Motion Control for Novel Emerging Robotic Devices and Systems

1612.01399 2026-06-04 eess.SY cs.AI cs.SY 版本更新

A New Type-II Fuzzy Logic Based Controller for Non-linear Dynamical Systems with Application to a 3-PSP Parallel Robot

一种新型类型-II 模糊逻辑控制器用于非线性动力学系统及其在3-PSP并联机器人中的应用

Hamid Reza Hassanzadeh

AI总结本文提出一种基于类型-II 模糊逻辑的控制器，用于非线性动力学系统，应用于3-PSP并联机器人，以应对不确定性问题。

Comments Master's thesis

详情

AI中文摘要

不确定性在几乎所有复杂系统中都是突出的例子，包括并联机器人。类型-II 模糊逻辑在处理不确定性方面优于传统模糊逻辑。类型-II 模糊逻辑控制器是较新的方法，因其在噪声（不确定性的重要实例）出现时的显著贡献而被应用于各种领域。在设计类型-I 模糊逻辑系统时，我们假设对模糊隶属函数几乎确定，但在许多情况下并不成立。因此，类型-II 模糊逻辑作为更现实的方法，可能在实际应用中有很大贡献。类型-II 模糊逻辑考虑了更高层次的不确定性，即类型-II 模糊变量的隶属度不再是一个确定的数字，而是本身是一个类型-I 语言术语。本文考虑了动态控制并联机器人中的不确定性影响。更具体地说，旨在将类型-II 模糊逻辑范式纳入基于模型的控制器，即所谓的计算扭矩控制方法，并将结果应用于具有3自由度的并联执行器。

英文摘要

Uncertainty arises in almost any complex system, including parallel robots, which are an outstanding instance of dynamical robotic systems. As the name suggests, uncertainty is missing information that lies beyond human knowledge, so it must be handled properly to minimize its side effects on the control process. Type-II fuzzy logic has shown its superiority over traditional fuzzy logic when dealing with uncertainty. Type-II fuzzy logic controllers are newer and more promising approaches that have recently been applied to various fields because of their significant contribution, especially when noise (an important instance of uncertainty) emerges. During the design of Type-I fuzzy logic systems, we presume that we are almost certain about the fuzzy membership functions, which is not true in many cases. Thus Type-II fuzzy logic systems (T2FLS), as a more realistic approach to practical applications, might have a lot to offer. Type-II fuzzy logic takes into account a higher level of uncertainty: the membership grade of a type-II fuzzy variable is no longer a crisp number but is itself a type-I linguistic term. In this thesis, the effects of uncertainty in the dynamic control of a parallel robot are considered. More specifically, the aim is to incorporate the Type-II fuzzy logic paradigm into a model-based controller, the so-called computed torque control method, and apply the result to a 3-degrees-of-freedom parallel manipulator. ...


1611.09926 2026-06-04 econ.GN cs.AI q-fin.EC 版本更新

Choquet integral in decision analysis - lessons from the axiomatization

Choquet积分在决策分析中的应用——从公理化分析中汲取教训

Mikhail Timonin

AI总结本文探讨Choquet积分的公理化分析及其在决策分析中的应用，指出传统方法在学习过程中存在的假设问题，并提出新的状态依赖效用模型。

详情

AI中文摘要

Choquet积分是一种强大的聚合算子，包含许多已知模型作为特例。本文分析这些特例的公理化性质，并探讨其学习过程中的假设问题。传统方法常假设所有维度在同一尺度上，但此假设在实践中难以成立。本文讨论了状态依赖效用模型的条件，并展示了其与传统方法的不同之处。

英文摘要

The Choquet integral is a powerful aggregation operator which lists many well-known models as its special cases. We look at these special cases and provide their axiomatic analysis. In cases where an axiomatization has been previously given in the literature, we connect the existing results with the framework that we have developed. Next we turn to the question of learning, which is especially important for the practical applications of the model. So far, learning of the Choquet integral has been mostly confined to the learning of the capacity. Such an approach requires making a powerful assumption that all dimensions (e.g. criteria) are evaluated on the same scale, which is rarely justified in practice. Too often categorical data is given arbitrary numerical labels (e.g. AHP), and numerical data is considered cardinally and ordinally commensurate, sometimes after a simple normalization. Such approaches clearly lack scientific rigour, and yet they are commonly seen in all kinds of applications. We discuss the pros and cons of making such an assumption and look at the consequences which axiomatization uniqueness results have for the learning problems. Finally, we review some of the applications of the Choquet integral in decision analysis. Apart from MCDA, which is the main area of interest for our results, we also discuss how the model can be interpreted in the social choice context. We look in detail at the state-dependent utility, and show how comonotonicity, central to the previous axiomatizations, actually implies state-independency in the Choquet integral model. We also discuss the conditions required to have a meaningful state-dependent utility representation and show the novelty of our results compared to the previous methods of building state-dependent models.
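
To make the aggregation operator concrete, the following is a minimal sketch of the discrete Choquet integral, assuming non-negative criterion scores and a capacity stored as a dictionary over subsets of criterion indices. The helper names (`choquet_integral`, `additive_capacity`, `full_set_capacity`) are illustrative and do not come from the paper.

```python
from itertools import combinations


def choquet_integral(values, capacity):
    """Discrete Choquet integral of non-negative `values` w.r.t. `capacity`.

    `capacity` maps frozensets of criterion indices to numbers in [0, 1],
    with capacity[frozenset()] == 0 and capacity[all indices] == 1.
    """
    order = sorted(range(len(values)), key=lambda i: values[i])  # ascending
    total, previous = 0.0, 0.0
    for rank, i in enumerate(order):
        # Criteria whose score is at least values[i].
        upper_set = frozenset(order[rank:])
        total += (values[i] - previous) * capacity[upper_set]
        previous = values[i]
    return total


def all_subsets(n):
    """Every subset of {0, ..., n-1} as a frozenset."""
    return [frozenset(s) for r in range(n + 1) for s in combinations(range(n), r)]


def additive_capacity(weights):
    """Additive capacity: the integral reduces to a weighted sum."""
    return {s: sum(weights[i] for i in s) for s in all_subsets(len(weights))}


def full_set_capacity(n):
    """Capacity that is 1 only on the full set: the integral reduces to min."""
    return {s: 1.0 if len(s) == n else 0.0 for s in all_subsets(n)}


scores = [0.2, 0.5, 0.9]
weighted = choquet_integral(scores, additive_capacity([0.5, 0.3, 0.2]))
worst_case = choquet_integral(scores, full_set_capacity(3))
```

The two capacities illustrate the "special cases" remark in the abstract: an additive capacity recovers the weighted arithmetic mean, and a capacity concentrated on the full set recovers the minimum. Note that the scores are compared and subtracted directly, which is exactly the commensurability assumption the abstract questions.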


1611.09809 2026-06-04 eess.SY cs.AI cs.SY math.OC nlin.CD 版本更新

Fractional Order Fuzzy Control of Hybrid Power System with Renewable Generation Using Chaotic PSO

分数阶模糊控制在含可再生能源混合电力系统中的应用：基于混沌PSO的优化

Indranil Pan, Saptarshi Das

AI总结本文提出一种分数阶模糊控制方案，结合混沌PSO优化，提升混合电力系统在非线性工况下的性能与鲁棒性。

Comments 21 pages, 12 figures, 4 tables

详情

DOI: 10.1016/j.isatra.2015.03.003
Journal ref: ISA Transactions, Volume 62, May 2016, Pages 19-29

AI中文摘要

本文研究了一种混合电力系统的操作，通过一种新颖的模糊控制方案。混合电力系统包含多种自主发电系统，如风力涡轮机、太阳能光伏、柴油机、燃料电池、水电解器等。其他储能设备如电池、飞轮和超电容器也存在于网络中。采用了一种新颖的分数阶（FO）模糊控制方案，并利用粒子群优化（PSO）算法结合两个混沌映射来调整其参数，以实现改进的性能。该分数阶模糊控制器在线性和非线性运行工况下均优于传统PID和整数阶模糊PID控制器。该控制器在系统参数变化和速率约束非线性方面也表现出更强的鲁棒性。鲁棒性是这种情况下非常理想的特性，因为混合电力系统中的许多组件可能在不同时间点被开启/关闭或以不同功率输出运行。

英文摘要

This paper investigates the operation of a hybrid power system through a novel fuzzy control scheme. The hybrid power system employs various autonomous generation systems like wind turbine, solar photovoltaic, diesel engine, fuel-cell, aqua electrolyzer etc. Other energy storage devices like the battery, flywheel and ultra-capacitor are also present in the network. A novel fractional order (FO) fuzzy control scheme is employed and its parameters are tuned with a particle swarm optimization (PSO) algorithm augmented with two chaotic maps for achieving an improved performance. This FO fuzzy controller shows better performance over the classical PID, and the integer order fuzzy PID controller in both linear and nonlinear operating regimes. The FO fuzzy controller also shows stronger robustness properties against system parameter variation and rate constraint nonlinearity, than that with the other controller structures. The robustness is a highly desirable property in such a scenario since many components of the hybrid power system may be switched on/off or may run at lower/higher power output, at different time instants.


1611.09755 2026-06-04 eess.SY cs.AI cs.NE cs.SY math.OC 版本更新

Fractional Order AGC for Distributed Energy Resources Using Robust Optimization

分数阶AGC用于分布式能源资源的鲁棒优化

Indranil Pan, Saptarshi Das

AI总结本文研究了分数阶自动发电控制在电力系统频率振荡阻尼中的应用，采用分布式能源发电，通过鲁棒优化技术优化控制器参数，提升系统性能。

Comments 12 pages, 16 figures, 5 tables

详情

DOI: 10.1109/TSG.2015.2459766
Journal ref: IEEE Transactions on Smart Grid, Volume 7, Issue 5, Pages 2175 - 2186, Sept 2016

AI中文摘要

本文探讨了分数阶（FO）自动发电控制（AGC）在电力系统频率振荡阻尼中的适用性，采用分布式能源发电。混合电力系统包含多种自主发电系统，如风力涡轮机、太阳能光伏、柴油机、燃料电池和水电解器，以及电池和飞轮等其他储能设备。控制器位于远程位置，通过不可靠的通信网络发送和接收信号，具有随机延迟。控制器参数通过鲁棒优化技术优化，使用不同变种的粒子群优化（PSO）算法，并与相应最优解进行比较。采用基于档案的策略减少鲁棒优化方法的函数评估次数。通过鲁棒优化获得的解决方案能够处理控制器增益和阶数的更高变化，而系统性能不会显著下降。这从分数阶控制器实施的角度来看是有益的，因为设计能够容纳由于分数阶运算符近似不同实现方法和精度阶数而可能产生的系统参数变化。还比较了分数阶和整数阶（IO）控制器，以突出每种方案的优缺点。

英文摘要

The applicability of fractional order (FO) automatic generation control (AGC) for power system frequency oscillation damping is investigated in this paper, employing distributed energy generation. The hybrid power system employs various autonomous generation systems like wind turbine, solar photovoltaic, diesel engine, fuel-cell and aqua electrolyzer along with other energy storage devices like the battery and flywheel. The controller is placed in a remote location while receiving and sending signals over an unreliable communication network with stochastic delay. The controller parameters are tuned using robust optimization techniques employing different variants of Particle Swarm Optimization (PSO) and are compared with the corresponding optimal solutions. An archival based strategy is used for reducing the number of function evaluations for the robust optimization methods. The solutions obtained through the robust optimization are able to handle higher variation in the controller gains and orders without significant decrease in the system performance. This is desirable from the FO controller implementation point of view, as the design is able to accommodate variations in the system parameter which may result due to the approximation of FO operators, using different realization methods and order of accuracy. Also a comparison is made between the FO and the integer order (IO) controllers to highlight the merits and demerits of each scheme.
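
The abstract notes that fractional-order operators must be approximated, with different realization methods and orders of accuracy. As one illustration, the sketch below uses the Grünwald-Letnikov finite-difference approximation; this choice is an assumption made here for illustration, and the abstract does not say which realization the authors use. The function names are hypothetical.

```python
def gl_weights(alpha, n_terms):
    """First n_terms Grünwald-Letnikov weights w_k = (-1)^k * binom(alpha, k).

    Computed with the recurrence w_0 = 1, w_k = w_{k-1} * (1 - (alpha + 1) / k).
    """
    weights = [1.0]
    for k in range(1, n_terms):
        weights.append(weights[-1] * (1.0 - (alpha + 1.0) / k))
    return weights


def gl_derivative(samples, alpha, step):
    """Approximate the order-alpha derivative at the most recent sample.

    `samples` holds equally spaced values f(t_0), ..., f(t_n) with spacing `step`.
    """
    weights = gl_weights(alpha, len(samples))
    acc = sum(w * samples[-1 - k] for k, w in enumerate(weights))
    return acc / step ** alpha


# For alpha = 1 the weights collapse to the backward difference [1, -1, 0, ...].
first_order = gl_weights(1.0, 4)
half_order = gl_weights(0.5, 3)
slope = gl_derivative([0.0, 1.0, 2.0, 3.0], 1.0, 1.0)
```

Truncating the weight sequence (a shorter memory) or switching to a different realization perturbs the effective controller, which is the kind of variation the robustly tuned design is meant to tolerate.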


1611.03372 2026-06-04 cs.RO cs.AI cs.SE cs.SY eess.SY 版本更新

A stochastically verifiable autonomous control architecture with reasoning

一种具有推理能力的随机可验证自主控制架构

Paolo Izzo, Hongyang Qu, Sandor M. Veres

AI总结本文提出一种具有推理能力的随机可验证自主控制架构LISA，通过将系统抽象为DTMC和MDP模型，实现代理与环境的概率验证，提升设计与运行时的验证效率。

Comments Accepted at IEEE Conf. Decision and Control, 2016

1610.08127 2026-06-04 cs.LG cs.AI cs.NA math.NA stat.ML 版本更新

Fast Bayesian Non-Negative Matrix Factorisation and Tri-Factorisation

快速的贝叶斯非负矩阵分解与三因子分解

Thomas Brouwer, Jes Frellsen, Pietro Lio'

AI总结本文提出一种快速变分贝叶斯算法，用于非负矩阵分解和三因子分解，相比Gibbs采样和非概率方法，该方法在迭代和时间步收敛速度更快，且无需额外样本估计后验。

Comments NIPS 2016 Workshop on Advances in Approximate Bayesian Inference

1610.03518 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model

通过学习深度逆动力学模型实现仿真到现实世界的迁移

Paul Christiano, Zain Shah, Igor Mordatch, Jonas Schneider, Trevor Blackwell, Joshua Tobin, Pieter Abbeel, Wojciech Zaremba

AI总结本文提出通过学习深度逆动力学模型，在仿真与现实世界之间实现控制策略的迁移，解决仿真与现实差异导致的性能下降问题。

详情

AI中文摘要

在仿真中开发控制策略通常比直接在现实世界中运行实验更实用和安全。这适用于通过规划和优化获得的策略，甚至更适用于通过强化学习获得的策略，后者通常非常数据密集。然而，仿真中成功的策略在部署到现实机器人时往往无法工作。然而，策略在仿真中执行的整体思路在现实世界中通常仍然有效。本文研究了此类场景，其中仿真中遍历的状态序列在现实世界中仍然合理，即使控制细节不同，例如摩擦、接触、质量和几何属性的差异。在执行过程中，我们的方法在每个时间步计算仿真基于的控制策略会做什么，但不执行这些控制在现实机器人上，而是计算仿真期望的下一个状态，并依赖于学习的深度逆动力学模型来决定最合适的现实世界动作以达到这些状态。深度模型只有在训练数据足够的情况下才有效，我们还提出了一种数据收集方法来（逐步）学习深度逆动力学模型。我们的实验表明，我们的方法在处理仿真到现实世界模型差异的各种基线方法中表现良好，包括输出误差控制和高斯动态适应。

英文摘要

Developing control policies in simulation is often more practical and safer than directly running experiments in the real world. This applies to policies obtained from planning and optimization, and even more so to policies obtained from reinforcement learning, which is often very data demanding. However, a policy that succeeds in simulation often doesn't work when deployed on a real robot. Nevertheless, the overall gist of what the policy does in simulation often remains valid in the real world. In this paper we investigate such settings, where the sequence of states traversed in simulation remains reasonable for the real world, even if the details of the controls are not, as could be the case when the key differences lie in detailed friction, contact, mass and geometry properties. During execution, at each time step our approach computes what the simulation-based control policy would do, but then, rather than executing these controls on the real robot, our approach computes what the simulation expects the resulting next state(s) will be, and then relies on a learned deep inverse dynamics model to decide which real-world action is most suitable to achieve those next states. Deep models are only as good as their training data, and we also propose an approach for data collection to (incrementally) learn the deep inverse dynamics model. Our experiments show that our approach compares favorably with various baselines that have been developed for dealing with simulation to real world model discrepancy, including output error control and Gaussian dynamics adaptation.
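
The execution loop described above can be sketched on a one-dimensional toy system. Everything in this example is an assumption made for illustration: the simulator, the "real" dynamics with a weaker actuator, and a least-squares linear model standing in for the deep inverse dynamics network used in the paper.

```python
def sim_policy(x, goal):
    """Policy tuned in simulation: move halfway to the goal each step."""
    return 0.5 * (goal - x)


def sim_step(x, u):
    """Simulated dynamics."""
    return x + u


def real_step(x, u):
    """Hypothetical real robot: the actuator is only half as effective."""
    return x + 0.5 * u


def fit_inverse_dynamics(transitions):
    """Fit the gain g in (x_next - x) = g * u by least squares.

    Returns an inverse model mapping (state, target next state) to an action.
    """
    numerator = sum((x_next - x) * u for x, u, x_next in transitions)
    denominator = sum(u * u for _, u, _ in transitions)
    gain = numerator / denominator
    return lambda x, x_target: (x_target - x) / gain


def run_on_real_robot(goal, steps, inverse_model=None):
    x = 0.0
    for _ in range(steps):
        u_sim = sim_policy(x, goal)
        if inverse_model is None:
            u = u_sim  # naive transfer: replay the simulator's action
        else:
            x_target = sim_step(x, u_sim)   # state the simulator expects next
            u = inverse_model(x, x_target)  # real action that reaches it
        x = real_step(x, u)
    return x


# Data collected on the "real" system, then used to fit the inverse model.
data = [(0.0, u, real_step(0.0, u)) for u in (-1.0, 0.5, 2.0)]
inverse_model = fit_inverse_dynamics(data)

naive_final = run_on_real_robot(1.0, 5)
tracked_final = run_on_real_robot(1.0, 5, inverse_model)
```

With the inverse model in the loop, the real trajectory follows the states the simulator would have visited, while the naive transfer lags behind because each simulated action has only half the intended effect.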


1610.00001 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Bacterial Foraging Optimized STATCOM for Stability Assessment in Power System

基于细菌觅食优化的STATCOM用于电力系统稳定性评估

Shiba R. Paital, Prakash K. Ray, Asit Mohanty, Sandipan Patra, Harishchandra Dubey

AI总结本文研究了利用静态补偿器（STATCOM）改进单机连接无限母线（SMIB）电力系统稳定性，通过粒子群优化（PSO）和细菌觅食优化（BFO）优化PID控制器参数，与传统PID控制器进行比较，验证新方案在稳定性和电压调节上的鲁棒性。

Comments 5 pages, 7 figures, 2016 IEEE Students' Technology Symposium (TechSym 2016), At IIT Kharagpur, India

1609.05960 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Incremental Sampling-based Motion Planners Using Policy Iteration Methods

基于增量采样的运动规划器使用策略迭代方法

Oktay Arslan, Panagiotis Tsiotras

AI总结本文提出了一种基于策略迭代的运动规划算法，利用动态规划思想在随机图中求解最短路径问题，通过改进策略加速计算过程，适用于大规模并行化。

详情

AI中文摘要

最近随机运动规划的进步导致了一类新的基于采样的算法发展，这些算法提供了渐近最优性保证，例如RRT*和PRM*算法。仔细分析发现，这些算法中的所谓'重 wiring'步骤可以被解释为局部策略迭代（PI）步骤（即局部策略评估步骤后跟局部策略改进步骤），因此随着样本数趋于无穷大，这两种算法几乎肯定收敛到最优路径（概率1）。策略迭代，与价值迭代（VI）一样，是解决动态规划（DP）问题的常用方法。基于这一观察，最近提出了RRT#算法，该算法在每次迭代中对那些可能成为最优路径部分的顶点（即'有希望'的顶点）执行Bellman更新（即'备份'）。RRT#算法因此利用了动态规划思想，并在随机生成的图上逐步实现以获得高质量的解决方案。在本文中，基于这一关键洞察，我们探索了一类不同的动态规划算法来解决由迭代采样方法生成的随机图中的最短路径问题。这些算法利用策略迭代而不是价值迭代，因此更适合大规模并行化。与RRT*算法不同，策略改进在重 wiring步骤中不是仅在局部进行，而是在当前迭代中被分类为'有希望'的顶点集合上进行。这倾向于加快整个过程。所得到的算法，恰当地命名为策略迭代-RRT#（PI-RRT#），是第一种基于动态规划思想的随机运动规划新类算法，利用PI方法。

英文摘要

Recent progress in randomized motion planners has led to the development of a new class of sampling-based algorithms that provide asymptotic optimality guarantees, notably the RRT* and the PRM* algorithms. Careful analysis reveals that the so-called "rewiring" step in these algorithms can be interpreted as a local policy iteration (PI) step (i.e., a local policy evaluation step followed by a local policy improvement step) so that asymptotically, as the number of samples tends to infinity, both algorithms converge to the optimal path almost surely (with probability 1). Policy iteration and value iteration (VI) are common methods for solving dynamic programming (DP) problems. Based on this observation, recently, the RRT$^{\#}$ algorithm has been proposed, which performs, during each iteration, Bellman updates (aka "backups") on those vertices of the graph that have the potential of being part of the optimal path (i.e., the "promising" vertices). The RRT$^{\#}$ algorithm thus utilizes dynamic programming ideas and implements them incrementally on randomly generated graphs to obtain high quality solutions. In this work, and based on this key insight, we explore a different class of dynamic programming algorithms for solving shortest-path problems on random graphs generated by iterative sampling methods. This class of algorithms utilizes policy iteration instead of value iteration, and thus is better suited for massive parallelization. Contrary to the RRT* algorithm, the policy improvement during the rewiring step is not performed only locally but rather on a set of vertices that are classified as "promising" during the current iteration. This tends to speed up the whole process. The resulting algorithm, aptly named Policy Iteration-RRT$^{\#}$ (PI-RRT$^{\#}$), is the first of a new class of DP-inspired algorithms for randomized motion planning that utilize PI methods.
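
The policy iteration view of rewiring can be sketched on a small fixed graph. This is not the PI-RRT$^{\#}$ algorithm itself: there is no sampling and no "promising" vertex set, the graph and the initial policy are made up, and edge costs are assumed positive so that every improved policy still reaches the goal.

```python
def evaluate(policy, edges, goal):
    """Policy evaluation: cost-to-go of each vertex when following `policy`.

    `policy[v]` is the successor of v; the policy is assumed to reach `goal`.
    """
    cost = {goal: 0.0}

    def cost_of(v):
        if v not in cost:
            successor = policy[v]
            cost[v] = edges[v][successor] + cost_of(successor)
        return cost[v]

    for v in policy:
        cost_of(v)
    return cost


def improve(policy, cost, edges):
    """Policy improvement: re-point each vertex at its best successor."""
    new_policy = {}
    for v, current in policy.items():
        best, best_cost = current, edges[v][current] + cost[current]
        for successor, weight in edges[v].items():
            if weight + cost[successor] < best_cost:
                best, best_cost = successor, weight + cost[successor]
        new_policy[v] = best
    return new_policy


def policy_iteration(edges, goal, policy):
    """Alternate evaluation and improvement until the policy stops changing."""
    while True:
        cost = evaluate(policy, edges, goal)
        new_policy = improve(policy, cost, edges)
        if new_policy == policy:
            return policy, cost
        policy = new_policy


# Hypothetical graph: edges[v][w] is the cost of the edge from v to w.
edges = {
    "a": {"b": 1.0, "c": 5.0},
    "b": {"c": 1.0, "g": 10.0},
    "c": {"g": 1.0},
}
initial_policy = {"a": "c", "b": "g", "c": "g"}  # a feasible but poor tree
best_policy, cost_to_go = policy_iteration(edges, "g", initial_policy)
```

The improvement step treats every vertex independently given the evaluated costs, which is the property that makes this family of methods attractive for parallel execution.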


1608.08292 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Robust Energy Storage Scheduling for Imbalance Reduction of Strategically Formed Energy Balancing Groups

稳健的能源存储调度以减少战略形成能源平衡小组的不平衡

Shantanu Chakraborty, Toshiya Okabe

AI总结本文提出了一种基于概率编程的能源平衡小组形成方法，结合稳健存储调度策略，以减少能源不平衡并优化存储容量。

详情

DOI: 10.1016/j.energy.2016.07.170
Journal ref: Energy, Volume 114, 1 November 2016, Pages 405-417, ISSN 0360-5442

AI中文摘要

减少不平衡（合同供应与实际需求之间的在线能量差距及相关成本）对电力生产者和供应商（PPS）在 deregulated 能源市场中至关重要。PPS 需要通过前向市场互动尽可能精确地采购能源以减少不平衡能量。本文提出，1）（离线）一种有效的需求聚合策略，用于创建多个平衡小组，从而提高小组级聚合需求的可预测性；2）（在线）一种稳健的能源存储调度方法，以最小化特定平衡小组的不平衡，考虑需求预测的不确定性。小组形成通过概率编程方法使用贝叶斯马尔可夫链蒙特卡洛（MCMC）方法，在应用历史需求统计数据后进行。除了小组形成外，聚合策略（借助贝叶斯推断）还清除了所形成小组所需存储容量的上限，其中一部分将用于在线操作。在线操作中，提出了一种稳健的能源存储调度方法，以最小化预期不平衡能量和成本（不平衡能量的非线性函数），同时考虑特定小组的需求不确定性。所提出的方法应用于日本东京实际公寓建筑的需求数据。仿真结果用于验证所提方法的有效性。

英文摘要

Imbalance reduction (imbalance being the on-line energy gap between contracted supply and actual demand, together with its associated cost) is going to be a crucial service for a Power Producer and Supplier (PPS) in the deregulated energy market. A PPS requires forward market interactions to procure energy as precisely as possible in order to reduce imbalance energy. This paper presents 1) (off-line) an effective demand-aggregation-based strategy for creating a number of balancing groups that leads to higher predictability of group-wise aggregated demand, and 2) (on-line) a robust energy storage scheduling method that minimizes the imbalance for a particular balancing group while considering the demand prediction uncertainty. The group formation is performed by a probabilistic programming approach that applies a Bayesian Markov Chain Monte Carlo (MCMC) method to the historical demand statistics. Apart from the group formation, the aggregation strategy (with the help of Bayesian inference) also determines the upper limit of the required storage capacity for a formed group, a fraction of which is to be utilized in on-line operation. For on-line operation, a robust energy storage scheduling method is proposed that minimizes expected imbalance energy and cost (a non-linear function of imbalance energy) while incorporating the demand uncertainty of a particular group. The proposed methods are applied to demand data from real apartment buildings in Tokyo, Japan. Simulation results are presented to verify the effectiveness of the proposed methods.


1606.04087 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Networked Intelligence: Towards Autonomous Cyber Physical Systems

网络化智能：迈向自主的网络物理系统

Andre Karpistsenko

AI总结本文探讨了如何结合产业与学术成果，构建大规模网络物理系统，提出概念架构并评估子系统成熟度，为智能系统发展提供规划指导。

1608.04361 2026-06-04 math.NA cs.AI cs.NA 版本更新

Multi-way Monte Carlo Method for Linear Systems

多向蒙特卡洛方法用于线性系统

Tao Wu, David F. Gleich

AI总结本文提出多向马尔可夫随机游走方法，改进了线性系统求解的条件，使谱半径ρ(H⁺)<1成为必要充分条件，且在数值实验中验证了方法的有效性与速度优势。

1608.02165 2026-06-04 cs.CV cs.AI cs.NA math.NA math.OC 版本更新

ShapeFit and ShapeKick for Robust, Scalable Structure from Motion

形状拟合与形状踢：用于鲁棒、可扩展的结构从运动

Thomas Goldstein, Paul Hand, Choongbum Lee, Vladislav Voroninski, Stefano Soatto

AI总结本文提出一种利用高效凸优化程序进行成对方向定位恢复的新方法，能有效处理对抗性异常值，且在真实场景和模拟数据上验证了其性能和灵活性。

1608.00655 2026-06-04 eess.SY cs.AI cs.SY 版本更新

A Web-based Tool for Identifying Strategic Intervention Points in Complex Systems

用于复杂系统中战略干预点识别的网页工具

Sotiris Moschoyiannis, Nicholas Elia, Alexandra S. Penn, David J. B. Lloyd, Chris Knight

AI总结本文提出一种基于Fuzzy Cognitive Mapping的网页工具，用于识别复杂系统中的最小控制配置，通过网络可控性理论确定战略干预点，应用于英国哈姆伯地区向生物基经济转型的决策过程。

Comments In Proceedings Cassting'16/SynCoP'16, arXiv:1608.00177

详情

DOI: 10.4204/EPTCS.220.4
Journal ref: EPTCS 220, 2016, pp. 39-52

AI中文摘要

在复杂系统中实现期望结果是一项具有挑战性的任务。系统架构的不明确性和动态规则的操作化数据稀缺是主要因素。本文基于Fuzzy Cognitive Mapping（FCM）提出分析方法，将系统表示为复杂网络，并利用网络可控性理论确定最小控制配置，即战略干预点。我们开发了一个网页工具，生成复杂网络的所有最小控制配置，并通过与工业、地方政府和非政府组织合作的经验验证了该方法在哈姆伯地区向生物基经济转型决策中的应用。

英文摘要

Steering a complex system towards a desired outcome is a challenging task. The lack of clarity on the system's exact architecture and the often scarce scientific data upon which to base the operationalisation of the dynamic rules that underpin the interactions between participant entities are two contributing factors. We describe an analytical approach that builds on Fuzzy Cognitive Mapping (FCM) to address the latter and represent the system as a complex network. We apply results from network controllability to address the former and determine minimal control configurations - subsets of factors, or system levers, which comprise points for strategic intervention in steering the system. We have implemented the combination of these techniques in an analytical tool that runs in the browser, and generates all minimal control configurations of a complex network. We demonstrate our approach by reporting on our experience of working alongside industrial, local-government, and NGO stakeholders in the Humber region, UK. Our results are applied to the decision-making process involved in the transition of the region to a bio-based economy.
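
One standard way to find a minimal set of control inputs in a directed network is the maximum-matching construction from structural controllability: nodes left unmatched by a maximum matching must be driven directly. The sketch below assumes that construction and returns a single minimal configuration, whereas the tool described in the abstract enumerates all of them; the function names and example networks are illustrative.

```python
def max_matching(nodes, edges):
    """Maximum matching on directed edges (u, v).

    Each node is used at most once as a source and at most once as a target.
    Returns a dict mapping each matched target to its source.
    """
    successors = {u: [] for u in nodes}
    for u, v in edges:
        successors[u].append(v)
    matched_source = {}

    def try_augment(u, visited):
        # Standard augmenting-path search for bipartite matching.
        for v in successors[u]:
            if v in visited:
                continue
            visited.add(v)
            if v not in matched_source or try_augment(matched_source[v], visited):
                matched_source[v] = u
                return True
        return False

    for u in nodes:
        try_augment(u, set())
    return matched_source


def driver_nodes(nodes, edges):
    """One minimal set of nodes that must receive an external control input."""
    matched = max_matching(nodes, edges)
    unmatched = [v for v in nodes if v not in matched]
    # With a perfect matching, a single input suffices; the first node is
    # picked here as an arbitrary representative.
    return unmatched if unmatched else [nodes[0]]


chain = driver_nodes(["a", "b", "c"], [("a", "b"), ("b", "c")])
star = driver_nodes(["a", "b", "c"], [("a", "b"), ("a", "c")])
cycle = driver_nodes(["a", "b", "c"], [("a", "b"), ("b", "c"), ("c", "a")])
```

A chain can be steered from its head alone, while a star needs an extra input because one hub cannot drive two leaves independently.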


1607.07896 2026-06-04 eess.SY cs.AI cs.SY math.OC 版本更新

Polling-systems-based Autonomous Vehicle Coordination in Traffic Intersections with No Traffic Signals

基于轮询系统的自动驾驶车辆在无交通信号灯的交通交叉口协调控制

David Miculescu, Sertac Karaman

AI总结本文提出一种基于轮询系统的协调控制算法，用于无交通信号灯的自动驾驶车辆交叉口安全高效通行，通过随机模型预测车辆到达时间，确保无碰撞并提供等待时间的严格上界。

详情

AI中文摘要

自动驾驶车辆的快速发展促使对全自动驾驶交通网络潜在优势的深入研究。大多数研究认为自动驾驶系统能显著提升性能。广泛研究的概念是全自动驾驶无碰撞交叉口，车辆在无交通信号灯的交叉口调整速度以安全快速通过。本文提出了一种协调控制算法，假设车辆到达时间的随机模型。所提算法提供了安全性和性能的证明保证。更具体地说，证明了无碰撞发生，并且提供了预期等待时间的严格上界。该算法还通过仿真进行了演示。所提算法受轮询系统启发。事实上，本文研究的问题导致了一种新的轮询系统，其中客户受微分约束，这可能本身具有研究价值。

英文摘要

The rapid development of autonomous vehicles spurred a careful investigation of the potential benefits of all-autonomous transportation networks. Most studies conclude that autonomous systems can enable drastic improvements in performance. A widely studied concept is all-autonomous, collision-free intersections, where vehicles arriving in a traffic intersection with no traffic light adjust their speeds to cross safely through the intersection as quickly as possible. In this paper, we propose a coordination control algorithm for this problem, assuming stochastic models for the arrival times of the vehicles. The proposed algorithm provides provable guarantees on safety and performance. More precisely, it is shown that no collisions occur surely, and moreover a rigorous upper bound is provided for the expected wait time. The algorithm is also demonstrated in simulations. The proposed algorithms are inspired by polling systems. In fact, the problem studied in this paper leads to a new polling system where customers are subject to differential constraints, which may be interesting in its own right.


1607.02480 2026-06-04 cs.AI cs.DC cs.SY eess.SY 版本更新

Real-Time Anomaly Detection for Streaming Analytics

实时流分析中的异常检测

Subutai Ahmad, Scott Purdy

AI总结本文提出基于Hierarchical Temporal Memory算法的实时异常检测方法，通过流数据实时处理与学习实现预测，在金融指标和NAB基准测试中均取得最佳性能。

1607.02419 2026-06-04 econ.GN cs.AI q-fin.EC 版本更新

Divisive-agglomerative algorithm and complexity of automatic classification problems

划分-聚类算法及自动分类问题的复杂性

Alexander Rubchinsky

AI总结本文提出了解决自动分类问题的算法，并探讨了该问题的复杂性。

1307.4847 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML 版本更新

Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization

在确定性系统中通过价值函数泛化实现高效的强化学习

Zheng Wen, Benjamin Van Roy

AI总结本文提出OCP算法，通过优化约束传播实现高效探索和价值函数泛化，在有限时间 horizon 确定性系统中实现最优动作选择，并提供效率和渐进行为保证。

1607.01478 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Mixed Strategy for Constrained Stochastic Optimal Control

混合策略用于受约束的随机最优控制

Masahiro Ono, Mahmoud El Chamie, Marco Pavone, Behcet Acikmese

AI总结本文提出混合策略用于受约束的随机最优控制，证明随机化控制输入在非凸优化问题中可降低成本，等于对偶间隙，并提出基于对偶优化的高效求解方法。

Comments 11 pages, 9 figures. Preliminary version of a working journal paper

详情

AI中文摘要

在具有随机约束的最优控制问题中，随机选择控制输入可以降低预期成本，例如随机模型预测控制（SMPC）。我们考虑具有初始随机化的控制器，即在开始时随机选择K+1个控制序列（称为K-随机化）。已知对于具有K个约束的有限状态、有限动作马尔可夫决策过程（MDP），K-随机化足以达到最小成本。我们发现，对于具有连续状态和动作空间的随机最优控制问题，相同结果也成立。进一步，我们证明当优化问题非凸时，控制输入的随机化可以导致成本降低，且该降低量等于对偶间隙。然后，我们提供随机解最优性的必要和充分条件，并开发基于对偶优化的高效求解方法。此外，在K=1的特殊情况（如联合概率约束问题）中，对偶优化可通过根查找更高效地解决。最后，我们在路径规划到未来火星任务的着陆、下降和着陆（EDL）规划等多个实际问题上测试理论并演示求解方法。

英文摘要

Choosing control inputs randomly can result in a reduced expected cost in optimal control problems with stochastic constraints, such as stochastic model predictive control (SMPC). We consider a controller with initial randomization, meaning that the controller randomly chooses from K+1 control sequences at the beginning (called K-randomization). It is known that, for a finite-state, finite-action Markov Decision Process (MDP) with K constraints, K-randomization is sufficient to achieve the minimum cost. We found that the same result holds for stochastic optimal control problems with continuous state and action spaces. Furthermore, we show that the randomization of control input can result in reduced cost when the optimization problem is nonconvex, and the cost reduction is equal to the duality gap. We then provide the necessary and sufficient conditions for the optimality of a randomized solution, and develop an efficient solution method based on dual optimization. Furthermore, in a special case with K=1 such as a joint chance-constrained problem, the dual optimization can be solved even more efficiently by root finding. Finally, we test the theories and demonstrate the solution method on multiple practical problems ranging from path planning to the planning of entry, descent, and landing (EDL) for future Mars missions.
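
A toy instance with a single constraint (K=1, so two control sequences are mixed) shows why randomization can help. The two candidate plans, their costs, and their failure probabilities are invented for illustration, and the brute-force search over pairs is a stand-in for the dual optimization method developed in the paper.

```python
def best_deterministic(options, risk_bound):
    """Cheapest single option whose failure probability meets the bound."""
    feasible = [o for o in options if o["risk"] <= risk_bound]
    return min(feasible, key=lambda o: o["cost"])


def best_mixture(options, risk_bound):
    """Cheapest randomization over at most two options.

    Returns (expected cost, probability of the risky option, risky, safe).
    The risk bound is enforced in expectation over the initial random choice.
    """
    # Pure feasible options are mixtures that never pick a risky option.
    candidates = [
        (o["cost"], 0.0, o["name"], o["name"])
        for o in options
        if o["risk"] <= risk_bound
    ]
    for risky in options:
        for safe in options:
            if risky["risk"] > risk_bound >= safe["risk"]:
                # Largest probability of the risky option that stays feasible.
                p = (risk_bound - safe["risk"]) / (risky["risk"] - safe["risk"])
                cost = p * risky["cost"] + (1.0 - p) * safe["cost"]
                candidates.append((cost, p, risky["name"], safe["name"]))
    return min(candidates)


# Hypothetical control sequences.
options = [
    {"name": "aggressive", "cost": 1.0, "risk": 0.2},
    {"name": "cautious", "cost": 3.0, "risk": 0.0},
]
pure = best_deterministic(options, 0.1)
mixed = best_mixture(options, 0.1)
```

Here the only feasible deterministic plan costs 3, while flipping a fair coin between the two plans meets the 10% risk bound at an expected cost of 2. In this instance the Lagrangian dual value is also 2 (attained at multiplier 10), so the saving of 1 coincides with the duality gap, consistent with the claim in the abstract.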


1602.04621 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML 版本更新

Deep Exploration via Bootstrapped DQN

通过Bootstrap DQN进行深度探索

Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy

AI总结本文提出Bootstrap DQN算法，通过随机价值函数实现高效探索，提升复杂环境中的学习速度和性能，尤其在Atari游戏中表现优异。

1606.07149 2026-06-04 cs.NE cs.AI cs.CE cs.LG cs.SY eess.SY 版本更新

An Approach to Stable Gradient Descent Adaptation of Higher-Order Neural Units

一种高阶神经单元稳定梯度下降适应的方法

Ivo Bukovsky, Noriyasu Homma

AI总结本文提出一种基于谱半径的高阶神经单元权重更新系统稳定性评估方法，通过梯度下降实现前馈和递归HONU的适应，确保每一步适应过程的稳定性，从而保证整个神经架构对目标数据的适应性。

Comments 2016, 13 pages

详情

DOI: 10.1109/TNNLS.2016.2572310
Journal ref: IEEE Transactions on Neural Networks and Learning Systems,ISSN: 2162-237X,2016

AI中文摘要

本文介绍了用于评估高阶神经单元（HONUs）权重更新系统稳定性的方法，该系统采用多项式聚合神经输入（也称为多项式神经网络类别）进行适应，通过梯度下降方法实现前馈和递归HONUs的适应。该方法的核心基于权重更新系统的谱半径，允许在每次适应步骤中监控和维持稳定性。确保权重更新系统的稳定性（在每次单独的适应步骤中）自然导致整个神经架构适应目标数据的稳定性。此外，所用方法强调HONU的权重优化是一个线性问题，因此所提出的方法可以一般扩展到任何其可调整参数为线性的神经架构。

英文摘要

Stability evaluation of a weight-update system of higher-order neural units (HONUs) with polynomial aggregation of neural inputs (also known as classes of polynomial neural networks) for adaptation of both feedforward and recurrent HONUs by a gradient descent method is introduced. An essential core of the approach is based on spectral radius of a weight-update system, and it allows stability monitoring and its maintenance at every adaptation step individually. Assuring stability of the weight-update system (at every single adaptation step) naturally results in adaptation stability of the whole neural architecture that adapts to target data. As an aside, the used approach highlights the fact that the weight optimization of HONU is a linear problem, so the proposed approach can be generally extended to any neural architecture that is linear in its adaptable parameters.
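
Because a HONU is linear in its weights, one gradient step can be written as w ← (I − μ f fᵀ) w + μ y f, where f is the vector of polynomial features and y the target. The sketch below checks the spectral radius of that update matrix for a second-order unit. It is a simplified single-sample reading of the idea; the fallback learning rate and all function names are assumptions made here, not the paper's procedure.

```python
def quadratic_features(x):
    """Features of a second-order HONU: products of pairs from [1, x_1, ..., x_n]."""
    augmented = [1.0] + list(x)
    size = len(augmented)
    return [augmented[i] * augmented[j] for i in range(size) for j in range(i, size)]


def update_spectral_radius(features, mu):
    """Spectral radius of I - mu * f f^T for a feature vector with 2+ entries.

    The eigenvalues are 1 - mu * ||f||^2 along f and 1 in every other direction.
    """
    norm_sq = sum(f * f for f in features)
    return max(1.0, abs(1.0 - mu * norm_sq))


def stable_gd_step(weights, x, target, mu):
    """One gradient step; the rate is reduced if the update would be unstable."""
    features = quadratic_features(x)
    norm_sq = sum(f * f for f in features)
    if update_spectral_radius(features, mu) > 1.0:
        mu = 1.0 / norm_sq  # assumed fallback inside the stable range (0, 2/||f||^2)
    error = target - sum(w * f for w, f in zip(weights, features))
    new_weights = [w + mu * error * f for w, f in zip(weights, features)]
    return new_weights, mu


features = quadratic_features([2.0])  # [1.0, 2.0, 4.0]
radius_small = update_spectral_radius(features, 0.01)
radius_large = update_spectral_radius(features, 0.2)
weights, used_mu = stable_gd_step([0.0, 0.0, 0.0], [2.0], 1.0, 0.2)
prediction = sum(w * f for w, f in zip(weights, features))
```

Checking the radius at every step is cheap in this form because it only needs the squared norm of the current feature vector.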


1606.06512 2026-06-04 eess.SY cs.AI cs.CE cs.SY math.OC physics.soc-ph 版本更新

Graphical Models for Optimal Power Flow

图模型用于最优功率流

Krishnamurthy Dvijotham, Pascal Van Hentenryck, Michael Chertkov, Sidhant Misra, Marc Vuffray

AI总结本文将树状网络的最优功率流问题转化为树状图模型的推断问题，结合动态规划与区间离散化，提出了一种高效求解方法，适用于任意配电网络和混合整数优化。

Comments To appear in Proceedings of the 22nd International Conference on Principles and Practice of Constraint Programming (CP 2016)

详情

AI中文摘要

最优功率流（OPF）是电力网络中的核心优化问题。尽管在电网运行中被常规解决，但一般情况下被证明是强NP难问题，而在树状网络上为弱NP难。本文将树状网络的OPF问题建模为树状图模型的推断问题，其中节点变量为低维向量。我们适配标准的树状图模型推断动态规划算法至OPF问题。结合节点变量的区间离散化，我们开发出OPF问题的近似算法。进一步，我们利用约束编程（CP）技术进行区间计算和自适应边界传播，获得实际高效的算法。与之前使用凸松弛保证最优性的OPF算法相比，我们的方法能够处理任意配电网络和混合整数优化问题。此外，该方法可以以分布式消息传递的方式实现，具有可扩展性，适用于智能电网应用，如分布式能源资源的控制。我们在多个基准网络上评估了该技术，并展示了使用此方法可以有效解决实际OPF问题。

英文摘要

Optimal power flow (OPF) is the central optimization problem in electric power grids. Although solved routinely in the course of power grid operations, it is known to be strongly NP-hard in general, and weakly NP-hard over tree networks. In this paper, we formulate the optimal power flow problem over tree networks as an inference problem over a tree-structured graphical model where the nodal variables are low-dimensional vectors. We adapt the standard dynamic programming algorithm for inference over a tree-structured graphical model to the OPF problem. Combining this with an interval discretization of the nodal variables, we develop an approximation algorithm for the OPF problem. Further, we use techniques from constraint programming (CP) to perform interval computations and adaptive bound propagation to obtain practically efficient algorithms. Compared to previous algorithms that solve OPF with optimality guarantees using convex relaxations, our approach is able to work for arbitrary distribution networks and handle mixed-integer optimization problems. Further, it can be implemented in a distributed message-passing fashion that is scalable and is suitable for "smart grid" applications like control of distributed energy resources. We evaluate our technique numerically on several benchmark networks and show that practical OPF problems can be solved effectively using this approach.
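
The dynamic programming idea can be sketched as min-sum message passing over a tree whose node variables have been discretized. The cost functions below are toy stand-ins rather than power flow equations, and the sketch uses a fixed grid instead of the interval discretization and bound propagation described in the abstract.

```python
INFEASIBLE = float("inf")


def tree_min_cost(children, states, node_cost, edge_cost, root):
    """Minimum total cost over all assignments of a state to every node.

    The total is the sum of node_cost(node, s) over nodes plus
    edge_cost(parent, s_parent, child, s_child) over tree edges.
    """
    def message(node):
        # table[s]: best cost of the subtree below `node` when it takes state s.
        table = {s: node_cost(node, s) for s in states}
        for child in children.get(node, []):
            child_table = message(child)
            for s in states:
                table[s] += min(
                    edge_cost(node, s, child, t) + child_table[t] for t in states
                )
        return table

    return min(message(root).values())


# Hypothetical three-bus feeder: bus 0 feeds buses 1 and 2.
children = {0: [1, 2]}
states = [0, 1, 2]                    # indices of discretized levels
preferred = {0: 0, 1: 2, 2: 2}


def node_cost(node, s):
    return (s - preferred[node]) ** 2


def edge_cost(parent, s, child, t):
    # Neighbouring buses may differ by at most one level (a toy line limit).
    return abs(s - t) if abs(s - t) <= 1 else INFEASIBLE


optimum = tree_min_cost(children, states, node_cost, edge_cost, 0)
```

Each node sends its parent one table indexed by the parent's state, so the work grows linearly with the number of buses and quadratically with the grid size, and constraints appear simply as infinite edge costs.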


1606.05124 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Robust Active Perception via Data-association aware Belief Space planning

通过数据关联意识的信念空间规划实现鲁棒的主动感知

Shashank Pathak, Antony Thomas, Asaf Feniger, Vadim Indelman

AI总结本文提出一种结合数据关联推理的信念空间规划方法，以应对定位不确定性和感知模糊环境中的挑战，通过设计新的成本函数提升主动解歧能力。

详情

AI中文摘要

我们开发了一种信念空间规划（BSP）方法，通过在规划中整合数据关联（DA）推理，同时考虑额外的不确定性来源，从而推动了该领域的前沿。现有BSP方法通常假设数据关联已知且完美，但在存在定位不确定性、模糊和感知混叠环境时，这一假设更难成立。相反，我们的数据关联意识信念空间规划（DA-BSP）方法在信念演化中显式推理数据关联，因此能更好地应对这些具有挑战性的现实场景。特别是，我们展示了由于感知混叠，后验信念成为概率分布函数的混合，设计了衡量预期模糊程度和后验不确定性的成本函数。使用这些以及标准成本（如控制惩罚、距离目标）在目标函数中，得到一个能够可靠表示动作影响且特别擅长主动解歧的通用框架。我们的方法因此适用于感知混叠环境中的鲁棒主动感知和自主导航。我们通过基本和现实的模拟展示了关键方面。

英文摘要

We develop a belief space planning (BSP) approach that advances the state of the art by incorporating reasoning about data association (DA) within planning, while considering additional sources of uncertainty. Existing BSP approaches typically assume data association is given and perfect, an assumption that can be harder to justify when operating in ambiguous and perceptually aliased environments in the presence of localization uncertainty. In contrast, our data association aware belief space planning (DA-BSP) approach explicitly reasons about DA within belief evolution, and as such can better accommodate these challenging real-world scenarios. In particular, we show that due to perceptual aliasing, the posterior belief becomes a mixture of probability distribution functions, and we design cost functions that measure the expected level of ambiguity and posterior uncertainty. Using these and standard costs (e.g., control penalty, distance to goal) within the objective function yields a general framework that reliably represents action impact and, in particular, is capable of active disambiguation. Our approach is thus applicable to robust active perception and autonomous navigation in perceptually aliased environments. We demonstrate key aspects in basic and realistic simulations.


1606.01949 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Assisted Energy Management in Smart Microgrids

智能微电网中的辅助能源管理

Andrea Monacchi, Wilfried Elmenreich

AI总结本文研究了通过正向合同缓解竞争需求导致的服务中断问题，设计了基于策略的经纪人并利用神经网络实现学习经纪人，以降低赔付成本并提高整体利润。

1606.01245 2026-06-04 math.NA cs.AI cs.NA math.OC stat.ML 版本更新

Scalable Algorithms for Tractable Schatten Quasi-Norm Minimization

可扩展算法用于可计算的Schatten准范数最小化

Fanhua Shang, Yuanyuan Liu, James Cheng

AI总结本文提出两种可计算的Schatten准范数，设计高效算法以加速大规模问题解决，并通过实验验证其精度和速度优势。

Comments 16 pages, 5 figures, Appears in Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona, USA, pp. 2016--2022, 2016

详情

AI中文摘要

Schatten-p准范数（0<p<1）通常用于替代标准核范数以更精确地近似秩函数。然而，现有Schatten-p准范数最小化算法在每次迭代中均涉及奇异值分解（SVD）或特征值分解（EVD），因此对于大规模问题可能变得非常缓慢且不实用。本文首先定义了两种可计算的Schatten准范数，即Frobenius/核混合准范数和双核准范数，并证明它们本质上是Schatten-2/3和1/2准范数，分别导致仅需更新两个较小因子矩阵的高效算法。我们还为解决代表性矩阵补全问题设计了两种高效的近端交替线性化最小化算法。最后，我们提供了算法的全局收敛性和性能保证，其收敛性质优于现有算法。在合成和真实数据上的实验结果表明，我们的算法比现有最先进方法更准确，并且快了多个数量级。

英文摘要

The Schatten-p quasi-norm $(0<p<1)$ is usually used to replace the standard nuclear norm in order to approximate the rank function more accurately. However, existing Schatten-p quasi-norm minimization algorithms involve singular value decomposition (SVD) or eigenvalue decomposition (EVD) in each iteration, and thus may become very slow and impractical for large-scale problems. In this paper, we first define two tractable Schatten quasi-norms, i.e., the Frobenius/nuclear hybrid and bi-nuclear quasi-norms, and then prove that they are in essence the Schatten-2/3 and 1/2 quasi-norms, respectively, which lead to the design of very efficient algorithms that only need to update two much smaller factor matrices. We also design two efficient proximal alternating linearized minimization algorithms for solving representative matrix completion problems. Finally, we provide the global convergence and performance guarantees for our algorithms, which have better convergence properties than existing algorithms. Experimental results on synthetic and real-world data show that our algorithms are more accurate than the state-of-the-art methods, and are orders of magnitude faster.


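1605.09772 2026-06-04 eess.SY cs.AI cs.SE cs.SY version update

Technical Report: Directed Controller Synthesis of Discrete Event Systems

Daniel Ciolek, Victor Braberman, Nicolás D'Ippolito, Sebastián Uchitel

AI summary: This paper proposes a directed controller synthesis method based on domain-independent heuristics that efficiently abstracts the environment and builds components on the fly, enabling control of discrete event systems for safety and co-safety goals.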

Comments 8 pages, submitted to the 55th IEEE Conference on Decision and Control

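1605.09497 2026-06-04 cs.GT cs.AI cs.MA cs.SY eess.SY version update

Interdependent Scheduling Games

Andres Abeliuk, Haris Aziz, Gerardo Berbeglia, Serge Gaspers, Petr Kalina, Nicholas Mattei, Dominik Peters, Paul Stursberg, Pascal Van Hentenryck, Toby Walsh

AI summary: This paper studies the interdependent scheduling games model, discusses its application to infrastructure planning and coordination, and analyzes core questions such as welfare maximization and the existence and computation of Nash equilibria.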

Comments Accepted to IJCAI 2016

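1512.01110 2026-06-04 math.NA cs.AI cs.LG cs.NA version update

Bayesian Matrix Completion via Adaptive Relaxed Spectral Regularization

Yang Song, Jun Zhu

AI summary: This paper proposes a Bayesian matrix completion method based on spectral regularization. By relaxing the orthogonality constraints on the singular vectors, it derives an adaptive spectral regularization suited to Bayesian inference that infers the number of latent factors automatically without parameter tuning and performs well on sparse matrices.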

Comments Accepted to AAAI 2016

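1511.03722 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ME stat.ML version update

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

Nan Jiang, Lihong Li

AI summary: This paper proposes a doubly robust estimator for off-policy value evaluation that combines unbiasedness with low variance, and validates its effectiveness on benchmark problems.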

Comments 14 pages; 4 figures; ICML 2016
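A minimal sketch of a doubly robust off-policy estimate for one logged trajectory, written in the backward-recursive form commonly used for this kind of estimator. This listing does not give the formula, so the recursion, the function name, and the toy inputs are assumptions for illustration. Here `rho` is the per-step importance ratio between the evaluated policy and the logging policy, and `q_hat` and `v_hat` stand for an approximate model of the action values and state values.

```python
def doubly_robust_value(trajectory, q_hat, v_hat, gamma):
    """Backward recursion: V_dr <- v_hat(s) + rho * (r + gamma * V_dr - q_hat(s, a)).

    trajectory is a list of (state, action, reward, rho) tuples in time order.
    """
    v_dr = 0.0
    for state, action, reward, rho in reversed(trajectory):
        v_dr = v_hat(state) + rho * (reward + gamma * v_dr - q_hat(state, action))
    return v_dr

# Hypothetical two-step trajectory: (state, action, reward, importance ratio).
trajectory = [(0, 0, 1.0, 2.0), (1, 0, 1.0, 0.5)]

def zero_q(state, action):
    return 0.0

def zero_v(state):
    return 0.0

# With an all-zero model the estimate reduces to step-wise importance sampling.
is_estimate = doubly_robust_value(trajectory, zero_q, zero_v, gamma=0.9)

# With all importance ratios at zero, only the model's value of the first state remains.
no_weight = [(s, a, r, 0.0) for s, a, r, _ in trajectory]
model_only = doubly_robust_value(no_weight, zero_q, lambda s: 5.0, gamma=0.9)
```

The two calls show the two ingredients separately: the importance-weighted correction and the model-based baseline. The estimator combines them so that the model reduces variance while the correction term removes the model's bias.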

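1605.05711 2026-06-04 math.OC cs.AI cs.SY eess.SY version update

The Information-Collecting Vehicle Routing Problem: Stochastic Optimization for Emergency Storm Response

Lina Al-Kanj, Warren B. Powell, Belgacem Bouzaiene-Ayari

AI summary: This paper proposes a stochastic optimization policy that builds beliefs about faults from phone calls and uses vehicles to collect information so that grid faults are restored quickly; it is the first vehicle routing problem to incorporate information collection and belief modeling.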

Abstract

Utilities face the challenge of responding to power outages due to storms and ice damage, but most power grids are not equipped with sensors to pinpoint the precise location of the faults causing the outage. Instead, utilities have to depend primarily on phone calls (trouble calls) from customers who have lost power to guide the dispatching of utility trucks. In this paper, we develop a policy that routes a utility truck to restore outages in the power grid as quickly as possible, using phone calls to create beliefs about outages, but also using utility trucks as a mechanism for collecting additional information. This means that routing decisions change not only the physical state of the truck (as it moves from one location to another) and the grid (as the truck performs repairs), but also our belief about the network, creating the first stochastic vehicle routing problem that explicitly models information collection and belief modeling. We address the problem of managing a single utility truck, which we start by formulating as a sequential stochastic optimization model which captures our belief about the state of the grid. We propose a stochastic lookahead policy, and use Monte Carlo tree search (MCTS) to produce a practical policy that is asymptotically optimal. Simulation results show that the developed policy restores the power grid much faster compared to standard industry heuristics.


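1604.08768 2026-06-04 cs.AI cs.SY eess.SY version update

Supervisory Control for Behavior Composition

Paolo Felli, Nitin Yadav, Sebastian Sardina

AI summary: Connects the behavior composition synthesis task from AI with supervisory control theory from discrete event systems: a target module is realized by coordinating the available behaviors, drawing on the theoretical foundations and tools of discrete event systems.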

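1604.03912 2026-06-04 cs.AI cs.LG cs.SY eess.SY stat.ML version update

Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics

Michael Herman, Tobias Gindele, Jörg Wagner, Felix Schmitt, Wolfram Burgard

AI summary: This paper proposes a gradient-based inverse reinforcement learning method that simultaneously estimates the system dynamics and the reward function, improving sample efficiency and estimation accuracy.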

Comments accepted to appear in AISTATS 2016

Abstract

Inverse Reinforcement Learning (IRL) describes the problem of learning an unknown reward function of a Markov Decision Process (MDP) from observed behavior of an agent. Since the agent's behavior originates in its policy and MDP policies depend on both the stochastic system dynamics as well as the reward function, the solution of the inverse problem is significantly influenced by both. Current IRL approaches assume that if the transition model is unknown, additional samples from the system's dynamics are accessible, or the observed behavior provides enough samples of the system's dynamics to solve the inverse problem accurately. These assumptions are often not satisfied. To overcome this, we present a gradient-based IRL approach that simultaneously estimates the system's dynamics. By solving the combined optimization problem, our approach takes into account the bias of the demonstrations, which stems from the generating policy. The evaluation on a synthetic MDP and a transfer learning task shows improvements regarding the sample efficiency as well as the accuracy of the estimated reward functions and transition models.


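1604.02080 2026-06-04 cs.AI cs.SY eess.SY version update

Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes

Jordi Grau-Moya, Felix Leibfried, Tim Genewein, Daniel A. Braun

AI summary: This paper proposes a planning method for Markov decision processes that accounts for model uncertainty, treats information-processing constraints in a unified way through information-theoretic principles, derives a value iteration scheme from a generalized variational principle, and validates it in grid-world simulations.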

Comments 16 pages, 3 figures

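1603.04586 2026-06-04 cs.AI cs.RO cs.SY eess.SY version update

Optimal Sensing via Multi-armed Bandit Relaxations in Mixed Observability Domains

Mikko Lauri, Risto Ritala

AI summary: Studies decision-making under uncertainty in mixed observability domains: relaxing constraints yields an upper bound on the optimal value function, and the computable optimal policy of a multi-armed bandit improves search-space pruning; experiments show the approach is effective in a target monitoring domain.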

Comments 6 pages, 2 figures

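1603.02038 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY version update

Unscented Bayesian Optimization for Safe Robot Grasping

José Nogueira, Ruben Martinez-Cantin, Alexandre Bernardino, Lorenzo Jamone

AI summary: This paper proposes the unscented Bayesian optimization algorithm, which accounts for input noise to find optimal grasps in safe regions, improving the safety and efficiency of robot grasping.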

Comments conference paper

Abstract

We address the robot grasp optimization problem for unknown objects, considering uncertainty in the input space. Grasping unknown objects can be achieved by using a trial-and-error exploration strategy. Bayesian optimization is a sample-efficient optimization algorithm that is especially suitable for this setup, as it actively reduces the number of trials needed to learn about the function to optimize. In fact, this active object exploration is the same strategy that infants use to learn optimal grasps. One problem that arises while learning grasping policies is that some configurations of grasp parameters may be very sensitive to error in the relative pose between the object and the robot end-effector. We call these configurations unsafe because small errors during grasp execution may turn good grasps into bad grasps. Therefore, to reduce the risk of grasp failure, grasps should be planned in safe areas. We propose a new algorithm, unscented Bayesian optimization, that is able to perform sample-efficient optimization while taking input noise into consideration to find safe optima. The contribution of unscented Bayesian optimization is twofold, as it provides a new decision process that drives exploration to safe regions and a new selection procedure that chooses the optimum in terms of its safety without extra analysis or computational cost. Both contributions are rooted in the strong theory behind the unscented transformation, a popular nonlinear approximation method. We show its advantages with respect to classical Bayesian optimization both in synthetic problems and in realistic robot grasp simulations. The results highlight that our method achieves optimal and robust grasping policies after few trials while the selected grasps remain in safe regions.
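The idea of preferring safe optima under input noise can be illustrated with a one-dimensional unscented transformation. The sketch below uses the textbook sigma-point weights with a hypothetical spread parameter `kappa`; the objective `grasp_quality`, the noise level, and all names are assumptions for illustration and are not taken from the paper. In the actual method the transformation would be applied inside Bayesian optimization rather than to a known function.

```python
import math

def unscented_value_1d(f, x, sigma, kappa=2.0):
    """Approximate E[f(x + e)], e ~ N(0, sigma^2), from three sigma points."""
    d = 1  # input dimension
    spread = math.sqrt(d + kappa) * sigma
    w_center = kappa / (d + kappa)
    w_side = 1.0 / (2.0 * (d + kappa))
    return w_center * f(x) + w_side * (f(x + spread) + f(x - spread))

def grasp_quality(x):
    """Hypothetical objective: a tall narrow peak at 0.2 and a lower broad peak at 0.7."""
    narrow = 1.0 * math.exp(-((x - 0.2) / 0.02) ** 2)
    broad = 0.8 * math.exp(-((x - 0.7) / 0.2) ** 2)
    return narrow + broad

sigma = 0.05  # assumed standard deviation of the input noise
candidates = (0.2, 0.7)

# Without noise the narrow peak wins; under the unscented value the broad peak wins.
nominal_best = max(candidates, key=grasp_quality)
safe_best = max(candidates, key=lambda x: unscented_value_1d(grasp_quality, x, sigma))
print(nominal_best, safe_best)
```

The narrow peak scores highest when executed exactly, but a small execution error moves the input off the peak, so its noise-averaged value drops below that of the broad peak. This is the sense in which the abstract calls some configurations unsafe.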


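1603.00748 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY version update

Continuous Deep Q-Learning with Model-based Acceleration

Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine

AI summary: This paper proposes NAF, a continuous deep Q-learning algorithm, together with a model-based acceleration method, to improve sample efficiency and learning speed on continuous control tasks.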

Abstract

Model-free reinforcement learning has been successfully applied to a range of challenging problems, and has recently been extended to handle large neural network policies and value functions. However, the sample complexity of model-free algorithms, particularly when using high-dimensional function approximators, tends to limit their applicability to physical systems. In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks. We propose two complementary techniques for improving the efficiency of such algorithms. First, we derive a continuous variant of the Q-learning algorithm, which we call normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods. The NAF representation allows us to apply Q-learning with experience replay to continuous tasks, and substantially improves performance on a set of simulated robotic control tasks. To further improve the efficiency of our approach, we explore the use of learned models for accelerating model-free reinforcement learning. We show that iteratively refitted local linear models are especially effective for this, and demonstrate substantially faster learning on domains where such models are applicable.
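The reason a NAF-style representation makes Q-learning workable for continuous actions can be seen from its quadratic form at a single state. The sketch below assumes the usual parameterization $Q(x,u) = V(x) - \tfrac{1}{2}(u-\mu(x))^T P(x)(u-\mu(x))$ with $P = LL^T$, which the abstract does not state explicitly. In practice $V$, $\mu$ and $L$ would be outputs of a neural network; here they are hypothetical fixed numbers.

```python
def naf_q_value(action, mu, L, value):
    """Q = V - 0.5 * (u - mu)^T (L L^T) (u - mu) for one state.

    L is lower triangular, given as a list of rows, so L L^T is positive semidefinite.
    """
    diff = [u - m for u, m in zip(action, mu)]
    n = len(diff)
    # z = L^T diff, so that diff^T (L L^T) diff = ||z||^2 >= 0.
    z = [sum(L[i][j] * diff[i] for i in range(n)) for j in range(n)]
    return value - 0.5 * sum(zj * zj for zj in z)

# Hypothetical network outputs for a two-dimensional action space.
mu = [0.2, -0.1]          # greedy action
L = [[1.0, 0.0],
     [0.5, 2.0]]          # lower-triangular factor of P
value = 3.0               # state value V(x)

q_at_mu = naf_q_value(mu, mu, L, value)
q_elsewhere = naf_q_value([1.2, -0.1], mu, L, value)
```

Since the advantage term is never positive and vanishes at `mu`, the maximizing action is available in closed form. The max over actions in the Q-learning target therefore needs no inner optimization.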


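1506.01326 2026-06-04 math.NA cs.AI cs.LG cs.NA stat.CO stat.ML version update

Probabilistic Numerics and Uncertainty in Computations

Philipp Hennig, Michael A Osborne, Mark Girolami

AI summary: This paper calls for probabilistic numerical methods, which improve algorithms for linear algebra, integration, optimization, and differential equation solving by returning uncertainties in their computations, and stresses their value in applications such as climate science and astronomy.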

Comments Author Generated Postprint. 17 pages, 4 Figures, 1 Table

DOI: 10.1098/rspa.2015.0142

Abstract

We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such uncertainties, arising from the loss of precision induced by numerical calculation with limited time or hardware, are important for much contemporary science and industry. Within applications such as climate science and astrophysics, the need to make decisions on the basis of computations with large and complex data has led to a renewed focus on the management of numerical uncertainty. We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. We then show that the probabilistic view suggests new algorithms that can flexibly be adapted to suit application specifics, while delivering improved empirical performance. We provide concrete illustrations of the benefits of probabilistic numeric algorithms on real scientific problems from astrometry and astronomical imaging, while highlighting open problems with these new algorithms. Finally, we describe how probabilistic numerical methods provide a coherent framework for identifying the uncertainty in calculations performed with a combination of numerical algorithms (e.g. both numerical optimisers and differential equation solvers), potentially allowing the diagnosis (and control) of error sources in computations.


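1402.0635 2026-06-04 stat.ML cs.AI cs.LG cs.SY eess.SY version update

Generalization and Exploration via Randomized Value Functions

Ian Osband, Benjamin Van Roy, Zheng Wen

AI summary: This paper proposes randomized least-squares value iteration (RLSVI), which achieves efficient exploration and generalization with linearly parameterized value functions, and proves near-optimal performance when learning without prior knowledge.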

Comments arXiv admin note: text overlap with arXiv:1307.4847

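1406.5311 2026-06-04 math.OC cs.AI cs.LG cs.NA math.NA stat.ML version update

Towards A Deeper Geometric, Analytic and Algorithmic Understanding of Margins

Aaditya Ramdas, Javier Peña

AI summary: This paper studies the margin as a condition measure of a matrix A, examines the difficulty of linear feasibility problems, extends margin theory through geometric, analytic, and algorithmic approaches, and proves a connection between the perceptron convergence rate and the margin.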

Comments 18 pages, 3 figures

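1601.00738 2026-06-04 cs.DC cs.AI cs.DB cs.SY eess.SY version update

Resource Sharing for Multi-Tenant NoSQL Data Store in Cloud

Jiaan Zeng

AI summary: This paper studies resource sharing in multi-tenant NoSQL data stores and proposes solutions for two cases, independent data on a local file system and shared data on a parallel file system: a scheduling scheme and a workload-aware resource reservation approach, to improve performance and adapt to dynamic workloads.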

Comments PhD dissertation, December 2015

Abstract

Multi-tenancy hosting of users in cloud NoSQL data stores is favored by cloud providers because it enables resource sharing at low operating cost. Multi-tenancy takes several forms depending on whether the back-end file system is a local file system (LFS) or a parallel file system (PFS), and on whether tenants are independent or share data across tenants. In this thesis I focus on and propose solutions to two cases: independent data-local file system, and shared data-parallel file system. In the independent data-local file system case, resource contention occurs under certain conditions in Cassandra and HBase, two state-of-the-art NoSQL stores, causing performance degradation for one tenant by another. We investigate the interference and propose two approaches. The first provides a scheduling scheme that can approximate resource consumption, adapt to workload dynamics and work in a distributed fashion. The second introduces a workload-aware resource reservation approach to prevent interference. The approach relies on a performance model obtained offline and plans the reservation according to different workload resource demands. Results show the approaches together can prevent interference and adapt to dynamic workloads under multi-tenancy. In the shared data-parallel file system case, it has been shown that running a distributed NoSQL store over PFS for shared data across tenants is not cost effective. Overheads are introduced due to the unawareness of the NoSQL store of PFS. This dissertation targets the key-value store (KVS), a specific form of NoSQL stores, and proposes a lightweight KVS over a parallel file system to improve efficiency. The solution is built on an embedded KVS for high performance but uses novel data structures to support concurrent writes. Results show the proposed system outperforms Cassandra and Voldemort in several different workloads.


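1512.06789 2026-06-04 stat.ML cs.AI cs.SY eess.SY math.OC version update

Information-Theoretic Bounded Rationality

Pedro A. Ortega, Daniel A. Braun, Justin Dyer, Kee-Eung Kim, Naftali Tishby

AI summary: This paper presents a theory of bounded rationality based on information theory. Decisions are characterized by a free energy functional that controls the size of the solution space, admits exact Monte Carlo planners, and captures model uncertainty; the theory is extended to sequential decisions.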

Comments 47 pages, 19 figures

Abstract

Bounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics. This paper offers a consolidated presentation of a theory of bounded rationality based on information-theoretic ideas. We provide a conceptual justification for using the free energy functional as the objective function for characterizing bounded-rational decisions. This functional possesses three crucial properties: it controls the size of the solution space; it has Monte Carlo planners that are exact, yet bypass the need for exhaustive search; and it captures model uncertainty arising from lack of evidence or from interacting with other agents having unknown intentions. We discuss the single-step decision-making case, and show how to extend it to sequential decisions using equivalence transformations. This extension yields a very general class of decision problems that encompass classical decision rules (e.g. EXPECTIMAX and MINIMAX) as limit cases, as well as trust- and risk-sensitive planning.
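The limit cases mentioned at the end of the abstract can be illustrated with the certainty-equivalent form of the free energy, $\frac{1}{\beta}\log\sum_x q(x)\,e^{\beta U(x)}$. This closed form, the inverse temperature $\beta$, the prior $q$, and the function name are assumptions of this sketch rather than quotations from the paper. Large positive $\beta$ should approach the maximum utility, large negative $\beta$ the minimum, and $\beta$ near zero the plain expectation.

```python
import math

def free_energy_value(utilities, prior, beta):
    """Certainty equivalent (1/beta) * log sum_x prior(x) * exp(beta * U(x))."""
    if beta == 0.0:
        # Limit beta -> 0: the expectation under the prior.
        return sum(q * u for q, u in zip(prior, utilities))
    scaled = [beta * u for u in utilities]
    shift = max(scaled)  # log-sum-exp shift for numerical stability
    total = sum(q * math.exp(s - shift) for q, s in zip(prior, scaled))
    return (shift + math.log(total)) / beta

utilities = [1.0, 2.0, 4.0]          # hypothetical outcomes
prior = [1.0 / 3.0] * 3              # uniform prior over outcomes

optimistic = free_energy_value(utilities, prior, beta=50.0)     # close to max
pessimistic = free_energy_value(utilities, prior, beta=-50.0)   # close to min
neutral = free_energy_value(utilities, prior, beta=1e-6)        # close to the mean
print(optimistic, neutral, pessimistic)
```

Stacking such nodes with different signs of $\beta$ is one way to read the abstract's claim that expectation, maximization, and minimization rules appear as limit cases of a single functional.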


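1512.06427 2026-06-04 cs.AI cs.DS cs.SY eess.SY math.OC version update

Towards Integrated Glance To Restructuring in Combinatorial Optimization

Mark Sh. Levin

AI summary: This paper studies the restructuring of solutions in combinatorial optimization, considering restructuring cost and proximity to the goal solution, and proposes single-criterion and multicriteria problem formulations for three types of restructuring.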

Comments 31 pages, 34 figures, 10 tables

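1512.01885 2026-06-04 cs.AI cs.SY eess.SY math.OC version update

Probabilistic Structural Controllability in Causal Bayesian Networks

Ardavan Salehi Nobandegani, Ioannis N. Psaromiligkos

AI summary: This paper is the first to study probabilistic controllability in causal Bayesian networks; it proposes a definition of probabilistic structural controllability and identifies a sufficient set of driver variables for probabilistic control of the states of target variables.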

1505.00274 2026-06-04 cs.AI cs.SY eess.SY stat.ML version update

Stick-Breaking Policy Learning in Dec-POMDPs

Miao Liu, Christopher Amato, Xuejun Liao, Lawrence Carin, Jonathan P. How

AI summary: This paper proposes Dec-SBPR, a framework of variable-size state controllers that builds local policies from a stick-breaking prior and learns the controller parameters without assuming a Dec-POMDP model, improving performance on large-scale problems.

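1509.03044 2026-06-04 cs.LG cs.AI cs.SY eess.SY version update

Recurrent Reinforcement Learning: A Hybrid Approach

Xiujun Li, Lihong Li, Jianfeng Gao, Xiaodong He, Jianshu Chen, Li Deng, Ji He

AI summary: This paper proposes a hybrid model that combines supervised learning and reinforcement learning to learn state representations for partially observable tasks, and that is effective with minimal domain knowledge.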

Comments 11 pages, 6 figures

Abstract

Successful applications of reinforcement learning in real-world problems often require dealing with partially observable states. It is in general very challenging to construct and infer hidden states as they often depend on the agent's entire interaction history and may require substantial domain knowledge. In this work, we investigate a deep-learning approach to learning the representation of states in partially observable tasks, with minimal prior knowledge of the domain. In particular, we propose a new family of hybrid models that combines the strengths of both supervised learning (SL) and reinforcement learning (RL), trained in a joint fashion: the SL component can be a recurrent neural network (RNN) or its long short-term memory (LSTM) version, which is equipped with the desired property of being able to capture long-term dependency on history, thus providing an effective way of learning the representation of hidden states. The RL component is a deep Q-network (DQN) that learns to optimize the control for maximizing long-term rewards. Extensive experiments in a direct mailing campaign problem demonstrate the effectiveness and advantages of the proposed approach, which performs the best among a set of previous state-of-the-art methods.


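1510.07313 2026-06-04 eess.SY cs.AI cs.LO cs.RO cs.SY version update

Safe Control under Uncertainty

Dorsa Sadigh, Ashish Kapoor

AI summary: This paper proposes Probabilistic Signal Temporal Logic (PrSTL) for specifying stochastic properties with probabilistic guarantees, and uses the logic to synthesize safe controllers in case studies including quadrotors and autonomous vehicles.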

Comments 10 pages, 6 figures, Submitted to HSCC 2016

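1510.04914 2026-06-04 cs.AI cs.DC cs.MS cs.NA math.NA math.OC version update

Hybridization of Interval CP and Evolutionary Algorithms for Optimizing Difficult Problems

Charlie Vanaret, Jean-Baptiste Gotteland, Nicolas Durand, Jean-Marc Alliot

AI summary: This paper proposes a hybrid framework that combines interval methods with evolutionary algorithms and runs the searches in parallel via message passing, and shows that Charibde outperforms existing solvers on difficult COCONUT problems.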

Comments 21st International Conference on Principles and Practice of Constraint Programming (CP 2015), 2015

DOI: 10.1007/978-3-319-23219-5_32

Abstract

The only rigorous approaches for achieving a numerical proof of optimality in global optimization are interval-based methods that interleave branching of the search-space and pruning of the subdomains that cannot contain an optimal solution. State-of-the-art solvers generally integrate local optimization algorithms to compute a good upper bound of the global minimum over each subspace. In this document, we propose a cooperative framework in which interval methods cooperate with evolutionary algorithms. The latter are stochastic algorithms in which a population of candidate solutions iteratively evolves in the search-space to reach satisfactory solutions. Within our cooperative solver Charibde, the evolutionary algorithm and the interval-based algorithm run in parallel and exchange bounds, solutions and search-space in an advanced manner via message passing. A comparison of Charibde with state-of-the-art interval-based solvers (GlobSol, IBBA, Ibex) and NLP solvers (Couenne, BARON) on a benchmark of difficult COCONUT problems shows that Charibde is highly competitive against non-rigorous solvers and converges faster than rigorous solvers by an order of magnitude.
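The interplay of branching, pruning, and an externally supplied upper bound can be sketched on a one-dimensional toy problem. This is not Charibde's implementation: the objective, its hand-written interval enclosure, and all names are hypothetical, midpoint evaluation stands in for the evolutionary algorithm that would supply good upper bounds, and the code ignores outward rounding, so it is not rigorous in floating-point arithmetic.

```python
import heapq

def f(x):
    """Toy objective with global minimum -1 at x = 1."""
    return (x - 1.0) ** 2 - 1.0

def f_bounds(lo, hi):
    """Interval enclosure of f over [lo, hi]."""
    a, b = lo - 1.0, hi - 1.0
    sq_hi = max(a * a, b * b)
    sq_lo = 0.0 if a <= 0.0 <= b else min(a * a, b * b)
    return sq_lo - 1.0, sq_hi - 1.0

def branch_and_bound(lo, hi, tol=1e-6):
    """Return (lower, upper) bounds on the global minimum of f over [lo, hi]."""
    best_upper = f(0.5 * (lo + hi))          # incumbent from a point evaluation
    proven_lower = float("inf")
    heap = [(f_bounds(lo, hi)[0], lo, hi)]   # boxes ordered by their lower bound
    while heap:
        lower, a, b = heapq.heappop(heap)
        if lower > best_upper:
            continue                          # prune: box cannot contain a minimizer
        mid = 0.5 * (a + b)
        best_upper = min(best_upper, f(mid))  # improve the incumbent
        if b - a < tol:
            proven_lower = min(proven_lower, lower)
            continue
        for c, d in ((a, mid), (mid, b)):     # branch: bisect the box
            child_lower = f_bounds(c, d)[0]
            if child_lower <= best_upper:
                heapq.heappush(heap, (child_lower, c, d))
    return proven_lower, best_upper

lower, upper = branch_and_bound(-3.0, 3.0)
print(lower, upper)
```

A tighter upper bound prunes more boxes early, which is why a stochastic search that finds good points quickly can speed up the interval method even though only the interval side certifies optimality.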


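1509.05722 2026-06-04 stat.ML cs.AI cs.MA cs.SY eess.SY version update

Energy saving in smart homes based on consumer behaviour: A case study

Michael Zehnder, Holger Wache, Hans-Friedrich Witschel, Danilo Zanatta, Miguel Rodriguez

AI summary: This paper proposes an energy-saving recommender system that analyzes consumer behavior data and uses machine learning to suggest ways to reduce household energy consumption while maintaining living comfort.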

Comments To be presented at the IEEE International Smart Cities Conference 2015

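1508.03863 2026-06-04 cs.AI cs.SY eess.SY math.OC version update

Discrete Route/Trajectory Decision Making Problems

Mark Sh. Levin

AI summary: This paper studies composite multistage decision-making problems whose aim is to design a route or trajectory from an initial decision state to a goal decision state. Using the automobile routing problem as the basic physical metaphor, it discusses several decision problems over a discrete operations/states design space (e.g., a digraph), with applications in education, medicine, and economics.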

Comments 25 pages, 34 figures, 16 tables

Abstract

The paper focuses on composite multistage decision making problems that aim to design a route/trajectory from an initial decision situation (origin) to one or more goal (destination) decision situations. The automobile routing problem is considered as a basic physical metaphor. The problems are based on a discrete (combinatorial) operations/states design/solving space (e.g., digraph). The described types of discrete decision making problems can be considered as intelligent design of a route (trajectory, strategy) and can be used in many domains: (a) education (planning of student educational trajectory), (b) medicine (medical treatment), (c) economics (trajectory of start-up development). Several types of route decision making problems are described: (i) basic route decision making, (ii) multi-goal route decision making, (iii) multi-route decision making, (iv) multi-route decision making with route/trajectory change(s), (v) composite multi-route decision making (the solution is a composition of several routes/trajectories in several corresponding domains), and (vi) composite multi-route decision making with coordinated routes/trajectories. In addition, problems of modeling and building the design spaces are considered. Numerical examples illustrate the suggested approach. Three applications are considered: educational trajectory (orienteering problem), plan of a start-up company (modular three-stage design), and plan of medical treatment (planning over a digraph with two-component vertices).


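1508.01345 2026-06-04 eess.SY cs.AI cs.SY version update

Fuzzy Logic Based Direct Torque Control Of Induction Motor With Space Vector Modulation

Fatih Korkmaz, İsmail Topaloğlu, Hayati Mamur

AI summary: This paper proposes fuzzy logic based space vector modulation to reduce the high torque ripple of direct torque control for induction motors; Matlab/Simulink simulations confirm a significant improvement in dynamic torque and speed responses.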

Comments 10 pages

DOI: 10.5121/ijscai.2013.2603

Abstract

Induction motors have a wide range of applications due to their well-known advantages, such as brushless structure, low cost, and robust performance. Over the past years, many kinds of control methods have been proposed for induction motors, and direct torque control has gained great importance among them due to its fast dynamic torque response and simple control structure. However, the direct torque control method still has some handicaps compared with other control methods, and the most important of these is high torque ripple. This paper suggests a new approach, fuzzy logic based space vector modulation, for direct torque controlled induction motors; the aim of the approach is to overcome the high torque ripple of conventional direct torque control. In order to test and compare the proposed direct torque control method with the conventional one, simulations have been carried out in Matlab/Simulink under different working conditions. The simulation results show a significant improvement in the dynamic torque and speed responses compared to the conventional direct torque control method.


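1502.05443 2026-06-04 cs.AI cs.SY eess.SY version update

Influence-Optimistic Local Values for Multiagent Planning --- Extended Version

Frans A. Oliehoek, Matthijs T. J. Spaan, Stefan Witwicki

AI summary: This paper proposes influence-optimistic upper bounds for multiagent planning with non-factored value functions; by dividing the problem into sub-problems and making optimistic assumptions about the influence of the rest of the system, it provides quality guarantees and improves heuristic search.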

Comments Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015)

Abstract

Recent years have seen the development of methods for multiagent planning under uncertainty that scale to tens or even hundreds of agents. However, most of these methods either make restrictive assumptions on the problem domain, or provide approximate solutions without any guarantees on quality. Methods in the former category typically build on heuristic search using upper bounds on the value function. Unfortunately, no techniques exist to compute such upper bounds for problems with non-factored value functions. To allow for meaningful benchmarking through measurable quality guarantees on a very general class of problems, this paper introduces a family of influence-optimistic upper bounds for factored decentralized partially observable Markov decision processes (Dec-POMDPs) that do not have factored value functions. Intuitively, we derive bounds on very large multiagent planning problems by subdividing them in sub-problems, and at each of these sub-problems making optimistic assumptions with respect to the influence that will be exerted by the rest of the system. We numerically compare the different upper bounds and demonstrate how we can achieve a non-trivial guarantee that a heuristic solution for problems with hundreds of agents is close to optimal. Furthermore, we provide evidence that the upper bounds may improve the effectiveness of heuristic influence search, and discuss further potential applications to multiagent planning.


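1507.00567 2026-06-04 eess.SY cs.AI cs.DC cs.LG cs.SE cs.SY version update

Self-Learning Cloud Controllers: Fuzzy Q-Learning for Knowledge Evolution

Pooyan Jamshidi, Amir Sharifloo, Claus Pahl, Andreas Metzger, Giovani Estrada

AI summary: This paper proposes FQL4KE, a self-learning fuzzy cloud controller that learns and modifies fuzzy rules at runtime, so users specify a controller by adjusting priority weights rather than writing complex adaptation rules; experiments show it outperforms conventional controllers.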

Abstract

Cloud controllers aim at responding to application demands by automatically scaling the compute resources at runtime to meet performance guarantees and minimize resource costs. Existing cloud controllers often resort to scaling strategies that are codified as a set of adaptation rules. However, for a cloud provider, applications running on top of the cloud infrastructure are more or less black-boxes, making it difficult at design time to define optimal or pre-emptive adaptation rules. Thus, the burden of taking adaptation decisions often is delegated to the cloud application. Yet, in most cases, application developers in turn have limited knowledge of the cloud infrastructure. In this paper, we propose learning adaptation rules during runtime. To this end, we introduce FQL4KE, a self-learning fuzzy cloud controller. In particular, FQL4KE learns and modifies fuzzy rules at runtime. The benefit is that for designing cloud controllers, we do not have to rely solely on precise design-time knowledge, which may be difficult to acquire. FQL4KE empowers users to specify cloud controllers by simply adjusting weights representing priorities in system goals instead of specifying complex adaptation rules. The applicability of FQL4KE has been experimentally assessed as part of the cloud application framework ElasticBench. The experimental results indicate that FQL4KE outperforms our previously developed fuzzy controller without learning mechanisms and the native Azure auto-scaling.


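1506.02312 2026-06-04 cs.AI cs.LG cs.RO cs.SY eess.SY version update

A Framework for Constrained and Adaptive Behavior-Based Agents

Renato de Pontes Pereira, Paulo Martins Engel

AI summary: This paper proposes a framework that integrates reinforcement learning nodes into behavior trees to give constrained agents the ability to learn, relates it to options in hierarchical reinforcement learning, and ensures convergence of nested learning nodes.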

Comments 2015; 15 pages

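1505.07872 2026-06-04 cs.AI cs.SY eess.SY math.OC version update

Towards combinatorial clustering: preliminary research survey

Mark Sh. Levin

AI summary: This paper examines clustering from a combinatorial viewpoint, covering basic clustering problems, assessment approaches, local quality evaluation, multicriteria optimization, a generalized modular clustering framework, and basic clustering models, with an emphasis on formulating clustering as a multicriteria optimization problem.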

Comments 102 pages, 66 figures, 67 tables

Details

AI abstract (translated from Chinese)

This paper describes clustering problems from the viewpoint of combinatorial optimization and systematically reviews basic clustering problems, approaches to the assessment of objects, evaluation of local and total quality, multicriteria optimization, a generalized modular clustering framework, and basic clustering models. Special attention is paid to formulating clustering as a multicriteria optimization problem. Combinatorial optimization models are used as auxiliary problems (e.g., assignment, partitioning, knapsack problem, multiple choice problem, morphological clique problem, searching for consensus/median structures). Numerical examples illustrate problem formulations, solving methods, and applications. The material can be used as a research survey, as a basis for designing composite modular clustering software, as a bibliographic reference, and as a tutorial.

English abstract

The paper describes clustering problems from the combinatorial viewpoint. A brief systemic survey is presented including the following: (i) basic clustering problems (e.g., classification, clustering, sorting, clustering with an order over clusters), (ii) basic approaches to assessment of objects and object proximities (i.e., scales, comparison, aggregation issues), (iii) basic approaches to evaluation of local quality characteristics for clusters and total quality characteristics for clustering solutions, (iv) clustering as a multicriteria optimization problem, (v) a generalized modular clustering framework, and (vi) basic clustering models/methods (e.g., hierarchical clustering, k-means clustering, minimum spanning tree based clustering, clustering as assignment, detection of clique/quasi-clique based clustering, correlation clustering, network communities based clustering). Special attention is given to the formulation of clustering as multicriteria optimization models. Combinatorial optimization models are used as auxiliary problems (e.g., assignment, partitioning, knapsack problem, multiple choice problem, morphological clique problem, searching for consensus/median for structures). Numerical examples illustrate problem formulations, solving methods, and applications. The material can be used as follows: (a) a research survey, (b) a foundation for designing the structure/architecture of composite modular clustering software, (c) a bibliography reference collection, and (d) a tutorial.

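1505.04123 2026-06-04 cs.LG cs.AI cs.NA math.NA math.OC version update

Margins, Kernels and Non-linear Smoothed Perceptrons

Aaditya Ramdas, Javier Peña

AI summary: This paper studies the problem of finding a non-linear classification function in an RKHS, proposes an accelerated smoothed algorithm whose convergence behavior is similar to that of the classical kernelized Perceptron, and gives a separation theorem for the case where no classifier exists.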

Comments 17 pages, published in the proceedings of the International Conference on Machine Learning, 2014

Details

Journal ref: Ramdas, Aaditya, and Javier Pena. "Margins, kernels and non-linear smoothed perceptrons." Proceedings of the 31st International Conference on Machine Learning (ICML-14). 2014

AI abstract (translated from Chinese)

We focus on the problem of finding a non-linear classification function in an RKHS, from both the primal and the dual point of view, with particular attention to generalizations of the Perceptron and Von-Neumann algorithms. We cast the problem as maximizing the regularized normalized hard margin ($\rho$) in an RKHS and use the Representer Theorem to rephrase it in terms of a Mahalanobis dot-product/semi-norm associated with the kernel's (normalized and signed) Gram matrix. We derive an accelerated smoothed algorithm with a convergence rate of $\sqrt{\log n}/\rho$ given $n$ separable points. When no such classifier exists, we prove an RKHS version of Gordan's separation theorem and reinterpret negative margins. This lets us give guarantees for a primal-dual algorithm that, within $\min\{\sqrt{n}/|\rho|, \sqrt{n}/\epsilon\}$ iterations, finds a perfect separator in the RKHS if the primal is feasible, or otherwise provides a dual $\epsilon$-certificate of near-infeasibility.

English abstract

We focus on the problem of finding a non-linear classification function that lies in a Reproducing Kernel Hilbert Space (RKHS) both from the primal point of view (finding a perfect separator when one exists) and the dual point of view (giving a certificate of non-existence), with special focus on generalizations of two classical schemes: the Perceptron (primal) and Von-Neumann (dual) algorithms. We cast our problem as one of maximizing the regularized normalized hard-margin ($\rho$) in an RKHS and use the Representer Theorem to rephrase it in terms of a Mahalanobis dot-product/semi-norm associated with the kernel's (normalized and signed) Gram matrix. We derive an accelerated smoothed algorithm with a convergence rate of $\tfrac{\sqrt{\log n}}{\rho}$ given $n$ separable points, which is strikingly similar to the classical kernelized Perceptron algorithm whose rate is $\tfrac{1}{\rho^2}$. When no such classifier exists, we prove a version of Gordan's separation theorem for RKHSs, and give a reinterpretation of negative margins. This allows us to give guarantees for a primal-dual algorithm that halts in $\min\{\tfrac{\sqrt{n}}{|\rho|}, \tfrac{\sqrt{n}}{\epsilon}\}$ iterations with a perfect separator in the RKHS if the primal is feasible or a dual $\epsilon$-certificate of near-infeasibility.
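
For orientation, the classical kernelized Perceptron that the abstract uses as its point of comparison can be sketched as follows. This is only the textbook baseline, not the accelerated smoothed algorithm of the paper; the Gaussian kernel, the `gamma` value, the function names, and the toy XOR-style data are all assumptions made for illustration.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Gaussian (RBF) Gram matrix between the rows of X and Y (illustrative kernel choice).
    sq = np.sum(X**2, axis=1)[:, None] + np.sum(Y**2, axis=1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def kernel_perceptron(K, y, max_epochs=100):
    """Classical kernelized Perceptron on a precomputed Gram matrix K.

    Returns the dual coefficients alpha (per-point mistake counts)
    and the total number of mistakes made before a clean pass.
    """
    n = len(y)
    alpha = np.zeros(n)
    mistakes = 0
    for _ in range(max_epochs):
        clean_pass = True
        for i in range(n):
            score = np.sum(alpha * y * K[:, i])  # f(x_i) in dual form
            if y[i] * score <= 0:                # misclassified or on the boundary
                alpha[i] += 1
                mistakes += 1
                clean_pass = False
        if clean_pass:
            break
    return alpha, mistakes

# Toy XOR-style data: not linearly separable, but separable with an RBF kernel.
X = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0], [1.0, 0.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
K = rbf_kernel(X, X, gamma=2.0)
alpha, mistakes = kernel_perceptron(K, y)
pred = np.sign((alpha * y) @ K)  # predictions on the training points
```

The number of mistakes of this baseline is what the $\tfrac{1}{\rho^2}$ rate in the abstract bounds for separable data.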

1403.5045 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML version update

Matroid Bandits: Fast Combinatorial Optimization with Learning

Branislav Kveton, Zheng Wen, Azin Ashkan, Hoda Eydgahi, Brian Eriksson

AI summary: This paper proposes matroid bandits, which combine bandits and matroids, solves the problem of maximizing a stochastic modular function on a matroid with the Optimistic Matroid Maximization algorithm, and gives two upper bounds on the regret.

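1502.04266 2026-06-04 eess.SY cs.AI cs.SY math.OC version update

Constrained Nonlinear Model Predictive Control of an MMA Polymerization Process via Evolutionary Optimization

Masoud Abbaszadeh, Reza Solgi

AI summary: This paper develops a nonlinear model predictive controller for a batch polymerization process. Parameterizing the desired trajectory yields a trajectory-linearized piecewise model, whose parameters are identified using an experimental polymerization reactor; a multiple-model adaptive predictive controller is designed for thermal trajectory tracking, and a genetic algorithm solves the constrained optimization problem that minimizes the DMC cost function.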

Comments 12 pages, 9 figures, 28 references

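1502.01321 2026-06-04 math.NA cs.AI cs.NA math-ph math.MP version update

Numerical Solution of Fuzzy Stochastic Differential Equation

Sukanta Nayak, Snehashish Chakraverty

AI summary: This paper proposes a new approach to solving uncertain stochastic differential equations, using fuzzy arithmetic to handle fuzzy stochastic differential equations (FSDEs) with fuzzy parameters, and demonstrates exact solutions and the Euler-Maruyama approximation method.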

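1401.1549 2026-06-04 cs.LG cs.AI cs.SY eess.SY version update

Optimal Demand Response Using Device Based Reinforcement Learning

Zheng Wen, Daniel O'Neill, Hamid Reza Maei

AI summary: This paper proposes a new EMS framework that models the demand response problem as a reinforcement learning problem, solves the scheduling problem through decomposition over device clusters, requires no explicit modeling of user dissatisfaction, and improves computational efficiency.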

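1410.0083 2026-06-04 eess.SY cs.AI cs.RO cs.SY version update

Integrating active sensing into reactive synthesis with temporal logic constraints under partial observations

Jie Fu, Ufuk Topcu

AI summary: This paper proposes online reactive planning with sensing actions in partially observable, dynamic environments, using an active sensing strategy to reduce uncertainty so that the temporal logic specification is satisfied with probability 1.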

Comments 7 pages, 2 figures, submitted to American Control Conference 2015

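Details

AI abstract (translated from Chinese)

We introduce the notion of online reactive planning with sensing actions for systems with temporal logic constraints in partially observable and dynamic environments. With incomplete information about the dynamic environment, reactive controller synthesis amounts to solving a two-player game with partial observations, whose computational complexity is impractically high. To alleviate this burden, online replanning via sensing actions avoids solving for a strategy of the reactive system under partial observations. Instead, we only solve for a strategy that ensures the given temporal logic specification can be satisfied when the system has complete observations of its environment. This strategy is then transformed into one that makes control decisions based on the observed sequence of states (of the interacting system and its environment). When the system encounters a belief, that is, a set containing all its hypotheses about the current state, for which the observation-based strategy is undefined, a sequence of sensing actions chosen by an active sensing strategy is triggered to reduce the uncertainty in the system's belief. We show that, under a mild technical assumption on the system's set of sensors, alternating between the observation-based strategy and the active sensing strategy satisfies the given temporal logic specification with probability 1.

English abstract

We introduce the notion of online reactive planning with sensing actions for systems with temporal logic constraints in partially observable and dynamic environments. With incomplete information on the dynamic environment, reactive controller synthesis amounts to solving a two-player game with partial observations, which has impractically high computational complexity. To alleviate the high computational burden, online replanning via sensing actions avoids solving the strategy in the reactive system under partial observations. Instead, we only solve for a strategy that ensures a given temporal logic specification can be satisfied if the system had complete observations of its environment. Such a strategy is then transformed into one which makes control decisions based on the observed sequence of states (of the interacting system and its environment). When the system encounters a belief (a set including all possible hypotheses the system has for the current state) for which the observation-based strategy is undefined, a sequence of sensing actions, chosen by an active sensing strategy, is triggered to reduce the uncertainty in the system's belief. We show that by alternating between the observation-based strategy and the active sensing strategy, under a mild technical assumption on the set of sensors in the system, the given temporal logic specification can be satisfied with probability 1.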

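1409.5671 2026-06-04 cs.AI cs.CE cs.LG cs.LO cs.SY eess.SY version update

A Formal Methods Approach to Pattern Synthesis in Reaction Diffusion Systems

Ebru Aydin Gol, Ezio Bartocci, Calin Belta

AI summary: This paper proposes a technique for pattern detection and generation based on spatial superposition logic, combining model checking with particle swarm optimization to synthesize parameters that produce desired patterns in reaction-diffusion systems.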

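1408.5492 2026-06-04 eess.SY cs.AI cs.SY math.OC version update

Towards Decision Support Technology Platform for Modular Systems

Mark Sh. Levin

AI summary: This paper proposes a decision support technology platform for modular systems comprising seven basic frameworks, including system synthesis, modeling, and evaluation, for processing composite alternatives and improving decision-making efficiency across domains.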

Comments 10 pages, 9 figures, 2 tables

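Details

AI abstract (translated from Chinese)

This survey paper discusses a general decision support platform technology for modular systems. The platform consists of seven basic combinatorial engineering frameworks: system synthesis, system modeling, evaluation, detection of bottlenecks, improvement/extension, multistage design, and combinatorial evolution and forecasting. It is based on decision support procedures (e.g., multicriteria selection/sorting, clustering), combinatorial optimization problems (e.g., knapsack, multiple choice problem, clique, assignment/allocation, covering, spanning trees), and their combinations. The paper describes: (1) the general scheme of the decision support platform technology; (2) brief descriptions of modular (composite) systems (or composite alternatives); (3) the trend of moving from selection of alternatives to processing of composite alternatives, which correspond to hierarchical modular products/systems; (4) a scheme of resource requirements (i.e., human, information-computer); and (5) the basic combinatorial engineering frameworks and their applications in various domains.

English abstract

This methodological survey paper takes a brief look at a general decision support platform technology for modular systems (modular/composite alternatives/solutions) in various applied domains. The decision support platform consists of seven basic combinatorial engineering frameworks (system synthesis, system modeling, evaluation, detection of bottleneck, improvement/extension, multistage design, combinatorial evolution and forecasting). The decision support platform is based on decision support procedures (e.g., multicriteria selection/sorting, clustering), combinatorial optimization problems (e.g., knapsack, multiple choice problem, clique, assignment/allocation, covering, spanning trees), and their combinations. The following is described: (1) general scheme of the decision support platform technology; (2) brief descriptions of modular (composite) systems (or composite alternatives); (3) trends in moving from choice/selection of alternatives to processing of composite alternatives which correspond to hierarchical modular products/systems; (4) scheme of resource requirements (i.e., human, information-computer); and (5) basic combinatorial engineering frameworks and their applications in various domains.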

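1407.2676 2026-06-04 math.OC cs.AI cs.LG cs.SY eess.SY stat.ML version update

A New Optimal Stepsize For Approximate Dynamic Programming

Ilya O. Ryzhov, Peter I. Frazier, Warren B. Powell

AI summary: This paper proposes a new optimal stepsize rule that improves the short-term performance of approximate dynamic programming algorithms by optimizing the prediction error; it requires only a single, relatively insensitive tunable parameter, adapts to the noise level of the problem, and speeds up convergence in numerical experiments.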

Comments Matlab files are included with the paper source

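1406.1128 2026-06-04 cs.AI cs.SY eess.SY version update

A self-organizing system for urban traffic control based on predictive interval microscopic model

Bartlomiej Placzek

AI summary: This paper proposes a self-organizing traffic signal system based on a predictive interval microscopic model, in which agents predict the effect of control actions on traffic flow to improve traffic control performance, with the largest gains for non-uniform traffic streams.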

Comments 29 pages, 8 figures

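Details

DOI: 10.1016/j.engappai.2014.05.004
Journal ref: Engineering Applications of Artificial Intelligence, vol. 34, pp. 75-84, 2014

AI abstract (translated from Chinese)

This paper introduces a self-organizing traffic signal system for an urban road network. The key elements of the system are agents that control traffic signals at intersections. Each agent uses an interval microscopic traffic model to predict the effects of its possible control actions over a short time horizon. The control action to execute is selected on the basis of the predicted delay intervals. Because the predictions are represented as intervals, the agents can recognize and suspend control actions whose effect on traffic control performance is uncertain. The proposed traffic control system was evaluated in a simulation environment. The simulation experiments show that the proposed approach improves traffic control performance, particularly for non-uniform traffic streams.

English abstract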

This paper introduces a self-organizing traffic signal system for an urban road network. The key elements of this system are agents that control traffic signals at intersections. Each agent uses an interval microscopic traffic model to predict effects of its possible control actions in a short time horizon. The executed control action is selected on the basis of predicted delay intervals. Since the prediction results are represented by intervals, the agents can recognize and suspend those control actions, whose positive effect on the performance of traffic control is uncertain. Evaluation of the proposed traffic control system was performed in a simulation environment. The simulation experiments have shown that the proposed approach results in an improved performance, particularly for non-uniform traffic streams.

1405.0936 2026-06-04 cs.AI cs.SY eess.SY version update

Design Of Fuzzy Logic Traffic Controller For Isolated Intersections With Emergency Vehicle Priority System Using MATLAB Simulation

Mohit Jha, Shailja Shukla

AI summary: This paper designs a fuzzy-logic traffic controller and uses MATLAB simulation to optimize traffic flow at isolated intersections; the controller acts on waiting time and queue length and integrates an emergency vehicle priority system to improve traffic efficiency.

Comments 7 pages, 7 figures, CSIR Sponsored X Control Instrumentation System Conference 2013; ISBN 978-93-82338-93-2

Details

AI abstract (translated from Chinese)

Traffic is a foremost problem facing every country, especially in large cities where the number of vehicles is growing rapidly. This paper presents a MATLAB-based fuzzy logic traffic controller for controlling traffic flow at isolated intersections. The controller uses the waiting time and queue length of vehicles in the current green phase, together with the vehicle queue lengths in the other phases, to determine traffic light timings and phase differences and thereby control traffic flow optimally. The model used has two lanes, and each approach has a different queue length and waiting time. Proximity sensors select the maximum waiting time and vehicle queue length as inputs in order to improve traffic flow at the intersection. In addition, the controller integrates emergency vehicle siren sensors that detect emergency vehicles such as ambulances, fire engines, and police vehicles and give them priority at the signal.

English abstract

Traffic is a chief problem that every country faces because of the growing number of vehicles throughout the world, especially in large urban areas. Hence the need arises for simulating and optimizing traffic control algorithms to better accommodate this increasing demand. Fuzzy optimization deals with finding the values of the input parameters of a complex simulated system that result in the desired output. This paper presents a MATLAB simulation of a fuzzy logic traffic controller for controlling the flow of traffic at isolated intersections. The controller is based on the waiting time and queue length of vehicles in the present green phase and the vehicle queue lengths in the other phases. It controls the traffic light timings and phase difference to ensure smooth flow of traffic with the least waiting time and queue length. The isolated intersection model used in this paper has two lanes in each approach. Every approach has a different queue length and waiting time at the intersection. The maximum waiting time and vehicle queue length are selected using proximity sensors and serve as inputs to the controller, which improves control of traffic flow at the intersection. An intelligent traffic model and a fuzzy logic traffic controller are developed to evaluate the performance of the traffic controller under different pre-defined conditions for smooth flow of traffic. Additionally, the fuzzy logic traffic controller has emergency vehicle siren sensors, which detect the movement of emergency vehicles such as ambulances, fire brigade vehicles, and police vans, give them maximum priority, and pass them a preferred signal.

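1402.6763 2026-06-04 math.OC cs.AI cs.NA math.NA version update

Linear Programming for Large-Scale Markov Decision Problems

Yasin Abbasi-Yadkori, Peter L. Bartlett, Alan Malek

AI summary: This paper proposes optimizing large-scale Markov decision processes by combining a low-dimensional family of policies with linear programming, and uses stochastic convex optimization and constraint sampling to approach the best performance in that family, independent of the size of the state space.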

Comments 27 pages, 3 figures

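Details

AI abstract (translated from Chinese)

We consider how to control a Markov decision process (MDP) with a large state space so as to minimize average cost. Because competing with the optimal policy is intractable for large-scale problems, we pursue the more realistic goal of competing with a low-dimensional family of policies. We use the dual linear programming formulation of the MDP average cost problem, in which the variable is a stationary distribution over state-action pairs, and we take a neighborhood of a low-dimensional subset of stationary distributions (defined in terms of state-action features) as the comparison class. We propose two techniques, one based on stochastic convex optimization and one based on constraint sampling. In both cases, we give bounds showing that the performance of our algorithms approaches the best achievable by any policy in the comparison class. Most importantly, these results depend on the size of the comparison class rather than on the size of the state space. Preliminary experiments show that the proposed algorithms are effective in a queuing application.

English abstract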

We consider the problem of controlling a Markov decision process (MDP) with a large state space, so as to minimize average cost. Since it is intractable to compete with the optimal policy for large scale problems, we pursue the more modest goal of competing with a low-dimensional family of policies. We use the dual linear programming formulation of the MDP average cost problem, in which the variable is a stationary distribution over state-action pairs, and we consider a neighborhood of a low-dimensional subset of the set of stationary distributions (defined in terms of state-action features) as the comparison class. We propose two techniques, one based on stochastic convex optimization, and one based on constraint sampling. In both cases, we give bounds that show that the performance of our algorithms approaches the best achievable by any policy in the comparison class. Most importantly, these results depend on the size of the comparison class, but not on the size of the state space. Preliminary experiments show the effectiveness of the proposed algorithms in a queuing application.
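
For readers unfamiliar with the dual linear program mentioned in the abstract, its textbook form for the average-cost problem is sketched below. The symbols are assumptions introduced here for illustration and do not come from the listing: $\mu(s,a)$ is the stationary distribution over state-action pairs, $c(s,a)$ the per-step cost, and $P(s' \mid s,a)$ the transition kernel.

```latex
\[
\begin{aligned}
\min_{\mu \ge 0} \quad & \sum_{s,a} \mu(s,a)\, c(s,a) \\
\text{s.t.} \quad & \sum_{a} \mu(s',a) \;=\; \sum_{s,a} \mu(s,a)\, P(s' \mid s,a) \quad \text{for all } s', \\
& \sum_{s,a} \mu(s,a) \;=\; 1 .
\end{aligned}
\]
```

The program has one variable per state-action pair and one flow constraint per state, which is why solving it directly becomes impractical for large state spaces and why the abstract restricts attention to a low-dimensional comparison class.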

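1401.1752 2026-06-04 cs.HC cs.AI cs.NA math.NA version update

Speeding up SOR Solvers for Constraint-based GUIs with a Warm-Start Strategy

Noreen Jamil, Johannes Müller, Christof Lutteroth, Gerald Weber

AI summary: This paper proposes speeding up constraint solvers based on the Gauss-Seidel algorithm by reusing previous solutions; experiments show improved solving efficiency in GUI resizing and constraint-change scenarios.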

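Details

AI abstract (translated from Chinese)

Many computer programs have graphical user interfaces (GUIs), which need a good layout to use the available screen space efficiently. Most GUIs do not have a fixed layout; they are resizable and adapt themselves. Constraints are a powerful tool for specifying adaptable layouts: they specify a layout in a general form, and a constraint solver finds a concrete layout that satisfies them, for example for a specific GUI size. The constraint solver must compute a new layout every time the GUI is resized or changed, so it must be efficient to ensure a good user experience. One approach to constraint solving is based on the Gauss-Seidel algorithm and successive over-relaxation (SOR). We observe that the solution after a resize or change is structurally similar to the previous solution. Our hypothesis is therefore that reusing the solution of the previous layout to warm-start the solving of the new layout improves the computational performance of an SOR-based constraint solver. In this paper we report on experiments that test this hypothesis for three common use cases: big-step resizing, small-step resizing, and constraint change. In the experiments, we measured the solving time for randomly generated GUI layout specifications of various sizes. In all three cases, we found that performance improves when an existing solution is used as the starting solution for the new layout.

English abstract

Many computer programs have graphical user interfaces (GUIs), which need good layout to make efficient use of the available screen real estate. Most GUIs do not have a fixed layout, but are resizable and able to adapt themselves. Constraints are a powerful tool for specifying adaptable GUI layouts: they are used to specify a layout in a general form, and a constraint solver is used to find a satisfying concrete layout, e.g. for a specific GUI size. The constraint solver has to calculate a new layout every time a GUI is resized or changed, so it needs to be efficient to ensure a good user experience. One approach for constraint solvers is based on the Gauss-Seidel algorithm and successive over-relaxation (SOR). Our observation is that a solution after resizing or changing is similar in structure to a previous solution. Thus, our hypothesis is that we can increase the computational performance of an SOR-based constraint solver if we reuse the solution of a previous layout to warm-start the solving of a new layout. In this paper we report on experiments to test this hypothesis for three common use cases: big-step resizing, small-step resizing and constraint change. In our experiments, we measured the solving time for randomly generated GUI layout specifications of various sizes. For all three cases we found that the performance is improved if an existing solution is used as a starting solution for a new layout.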
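
The warm-start idea can be illustrated with a minimal SOR sketch. Everything concrete here is an assumption for illustration: the square, diagonally dominant system `A`, the right-hand sides standing in for "before" and "after" a small resize, the relaxation factor `omega`, and the function name `sor_solve`. The paper's solver handles GUI layout constraints, which this toy linear system only loosely imitates.

```python
import numpy as np

def sor_solve(A, b, x0, omega=1.2, tol=1e-8, max_iter=10_000):
    """Successive over-relaxation for A x = b, starting from x0.

    Returns the approximate solution and the number of sweeps used.
    """
    x = np.array(x0, dtype=float)  # copy so the caller's start vector is untouched
    n = len(b)
    for sweep in range(1, max_iter + 1):
        max_change = 0.0
        for i in range(n):
            # Gauss-Seidel value for x[i], using already-updated entries of x.
            sigma = A[i] @ x - A[i, i] * x[i]
            gs_value = (b[i] - sigma) / A[i, i]
            new_value = (1.0 - omega) * x[i] + omega * gs_value
            max_change = max(max_change, abs(new_value - x[i]))
            x[i] = new_value
        if max_change < tol:
            return x, sweep
    return x, max_iter

n = 20
# Symmetric, diagonally dominant tridiagonal system (hypothetical stand-in for a layout).
A = 4.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b_old = np.full(n, 100.0)
b_new = b_old.copy()
b_new[-1] += 1.0  # a small change, loosely analogous to a small-step resize

x_old, _ = sor_solve(A, b_old, np.zeros(n))
x_cold, cold_iters = sor_solve(A, b_new, np.zeros(n))  # cold start from zeros
x_warm, warm_iters = sor_solve(A, b_new, x_old)        # warm start from the old solution
```

Comparing `warm_iters` with `cold_iters` shows the effect the abstract hypothesizes: the previous solution is already close to the new one, so fewer sweeps are needed to reach the same tolerance.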

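1311.4527 2026-06-04 cs.AI cs.DC cs.MA cs.RO cs.SY eess.SY version update

A message-passing algorithm for multi-agent trajectory planning

Jose Bento, Nate Derbinsky, Javier Alonso-Mora, Jonathan Yedidia

AI summary: This paper proposes a new algorithm, based on an improved alternating direction method of multipliers, for computing collision-free global trajectories for multiple agents; it parallelizes naturally and makes it easy to incorporate different cost functions.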

Comments In Advances in Neural Information Processing Systems (NIPS), 2013. Demo video available at http://www.youtube.com/watch?v=yuGCkVT8Bew

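1311.1761 2026-06-04 cs.LG cs.AI cs.NE cs.RO cs.SY eess.SY version update

Exploring Deep and Recurrent Architectures for Optimal Control

Sergey Levine

AI summary: This paper explores applying deep and recurrent neural networks to continuous, high-dimensional locomotion control tasks, trains the controllers with a reinforcement learning algorithm, compares the performance of different architectures, and discusses the prospects for deep learning in optimal control.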

Comments Appears in the Neural Information Processing Systems (NIPS 2013) Workshop on Deep Learning

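Details

AI abstract (translated from Chinese)

Sophisticated multilayer neural networks have achieved state-of-the-art results on multiple supervised tasks. However, successful applications of such networks to control have so far been limited largely to the perception portion of the control pipeline. This paper explores the application of deep and recurrent neural networks to a continuous, high-dimensional locomotion task, where the network represents a control policy that maps the state of the system (represented by joint angles) directly to the torque at each joint. Using a recent reinforcement learning algorithm, guided policy search, neural network controllers with thousands of parameters can be trained successfully, which makes it possible to compare a variety of architectures. We discuss the differences between the locomotion control task and previous supervised perception tasks, present experimental results comparing various architectures, and discuss future directions for applying deep learning techniques to optimal control problems.

English abstract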

Sophisticated multilayer neural networks have achieved state of the art results on multiple supervised tasks. However, successful applications of such multilayer networks to control have so far been limited largely to the perception portion of the control pipeline. In this paper, we explore the application of deep and recurrent neural networks to a continuous, high-dimensional locomotion task, where the network is used to represent a control policy that maps the state of the system (represented by joint angles) directly to the torques at each joint. By using a recent reinforcement learning algorithm called guided policy search, we can successfully train neural network controllers with thousands of parameters, allowing us to compare a variety of architectures. We discuss the differences between the locomotion control task and previous supervised perception tasks, present experimental results comparing various architectures, and discuss future directions in the application of techniques from deep learning to the problem of optimal control.

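1310.7950 2026-06-04 eess.SY cs.AI cs.LO cs.SY version update

Technical Report: Distribution Temporal Logic: Combining Correctness with Quality of Estimation

Austin Jones, Mac Schwager, Calin Belta

AI summary: This paper proposes distribution temporal logic (DTL) for describing properties of partially observable systems that involve uncertainty and likelihood, provides a safe formulation and a monitoring algorithm, and validates them on a rescue robot application.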

Comments More expanded version of "Distribution Temporal Logic: Combining Correctness with Quality of Estimation" to appear in IEEE CDC 2013

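1309.0866 2026-06-04 cs.LO cs.AI cs.LG cs.SY eess.SY version update

On the Robustness of Temporal Properties for Stochastic Models

Ezio Bartocci, Luca Bortolussi, Laura Nenzi, Guido Sanguinetti

AI summary: This paper studies the robustness of temporal properties in stochastic models, proposes measures of robustness, and combines them with the satisfaction probability to optimize system design.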

Comments In Proceedings HSB 2013, arXiv:1308.5724

Details

DOI: 10.4204/EPTCS.125.1
Journal ref: EPTCS 125, 2013, pp. 3-19

AI abstract (translated from Chinese)

Stochastic models such as Continuous-Time Markov Chains (CTMC) and Stochastic Hybrid Automata (SHA) are powerful formalisms because they can capture the stochasticity inherent in biological processes. A classical question in formal modelling is the model checking problem: computing the probability that a behaviour, expressed as a temporal logic formula, occurs in a given stochastic process. Beyond satisfiability, however, one is also interested in how robustly a system maintains a particular emergent behaviour when subjected to extrinsic noise or small changes in the model parameters. This paper extends the notion of robustness to stochastic systems, shows that this naturally leads to a distribution of robustness scores, and uses two examples to show how to approximate this distribution and its key indicators: the average robustness and the conditional average robustness. It then shows how to combine these indicators with the satisfaction probability to address the system design problem, in which the control parameters of a stochastic model are optimized to maximize the robustness of the desired specifications.

English abstract

Stochastic models such as Continuous-Time Markov Chains (CTMC) and Stochastic Hybrid Automata (SHA) are powerful formalisms to model and to reason about the dynamics of biological systems, due to their ability to capture the stochasticity inherent in biological processes. A classical question in formal modelling with clear relevance to biological modelling is the model checking problem, i.e., calculating the probability that a behaviour, expressed for instance in terms of a certain temporal logic formula, may occur in a given stochastic process. However, one may be interested not only in the notion of satisfiability, but also in the capacity of a system to maintain a particular emergent behaviour unaffected by perturbations caused, e.g., by extrinsic noise or by possible small changes in the model parameters. To address this issue, researchers from the verification community have recently proposed several notions of robustness for temporal logic, providing suitable definitions of distance between a trajectory of a (deterministic) dynamical system and the boundaries of the set of trajectories satisfying the property of interest. The contributions of this paper are twofold. First, we extend the notion of robustness to stochastic systems, showing that this naturally leads to a distribution of robustness scores. By discussing two examples, we show how to approximate the distribution of the robustness score and its key indicators: the average robustness and the conditional average robustness. Second, we show how to combine these indicators with the satisfaction probability to address the system design problem, where the goal is to optimize some control parameters of a stochastic model in order to maximize the robustness of the desired specifications.
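
The three indicators named in the abstract (satisfaction probability, average robustness, conditional average robustness) can be estimated by sampling, as in the sketch below. It rests on several simplifying assumptions that are not from the paper: a discrete-time Gaussian random walk stands in for the stochastic model, the property is "always stay below a threshold", and its robustness score is taken to be the smallest margin to the threshold along the trajectory.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is reproducible

def simulate(n_steps=50, x0=5.0, sigma=1.0):
    # One trajectory of a Gaussian random walk (hypothetical stand-in for a CTMC/SHA model).
    steps = rng.normal(0.0, sigma, size=n_steps)
    return x0 + np.concatenate(([0.0], np.cumsum(steps)))

def robustness_always_below(traj, threshold):
    # Robustness of "globally x(t) < threshold": the worst-case margin over time.
    # Positive means satisfied with that much slack, negative means violated.
    return float(np.min(threshold - traj))

# Sampling many trajectories turns the robustness score into a distribution.
scores = np.array([robustness_always_below(simulate(), 15.0) for _ in range(2000)])

satisfied = scores > 0
sat_prob = satisfied.mean()              # estimated satisfaction probability
avg_rob = scores.mean()                  # average robustness
cond_avg_rob = scores[satisfied].mean()  # average robustness given satisfaction
```

A design loop in the spirit of the abstract would recompute these estimates for different values of a control parameter (here, for instance, `sigma` or `x0`) and prefer settings that raise both `sat_prob` and the robustness indicators.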

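1308.5332 2026-06-04 eess.SY cs.AI cs.SE cs.SY version update

An Integrated Framework for Diagnosis and Prognosis of Hybrid Systems

Elodie Chanthery, Pauline Ribot

AI summary: This paper proposes an integrated theoretical framework for the diagnosis and prognosis of hybrid systems, enriches the formalism so that it can follow the aging laws of system faults, and proposes a method for interleaving diagnosis and prognosis within the hybrid framework.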

Comments In Proceedings HAS 2013, arXiv:1308.4904

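1306.4635 2026-06-04 cs.AI cs.SY eess.SY version update

Towards Multistage Design of Modular Systems

Mark Sh. Levin

AI summary: This paper proposes a multistage design approach for designing trajectories of composite (modular) systems, addressing complex system design problems through definition of time/logical points, modular design, and selection of solutions.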

Comments 13 pages, 25 figures, 14 tables

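1306.0128 2026-06-04 cs.AI cs.SY eess.SY version update

Towards Detection of Bottlenecks in Modular Systems

Mark Sh. Levin

AI summary: This paper discusses basic approaches to detecting bottlenecks in composite modular systems, including traditional quality management methods, selection of critical system elements, identification of faults in composite systems, and predictive detection, and addresses the associated problems with heuristic schemes.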

Comments 12 pp., tables 4, figures 15

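1305.4917 2026-06-04 cs.AI cs.SY eess.SY version update

Note on Evaluation of Hierarchical Modular Systems

Mark Sh. Levin

AI summary: This paper surveys evaluation methods for hierarchical composite systems, discusses assessment scales, transformation problems, and integration methods, and emphasizes the evaluation of system components and the integration process.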

Comments 15 pages, 23 figures, 4 tables

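1305.2752 2026-06-04 eess.SY cs.AI cs.SY version update

Hybrid fuzzy logic and pid controller based ph neutralization pilot plant

Oumair Naseer, Atif Ali Khan

AI summary: This paper proposes an optimized mathematical model and an advanced hybrid controller design for the control design and automation of a pH neutralization pilot plant, addressing challenges in process control.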

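1304.3088 2026-06-04 eess.SY cs.AI cs.MA cs.SY version update

Information and Multi-Sensor Coordination

Greg Hager, Hugh F. Durrant-Whyte

AI summary: This paper studies information fusion and coordinated control in multi-sensor systems, proposes an analysis based on team decision theory, and uses simulation to validate multi-sensor cooperation under uncertainty.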

Comments Appears in Proceedings of the Second Conference on Uncertainty in Artificial Intelligence (UAI1986)

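Details

AI abstract (translated from Chinese)

The control and integration of distributed, multi-sensor perceptual systems is a complex and challenging problem. The observations or opinions of different sensors are often inconsistent and hard to compare, and are usually only partial views. Sensor information is inherently uncertain, and in addition individual sensors may themselves be in error with respect to the system as a whole. Successful operation of a multi-sensor system must account for this uncertainty and aggregate inconsistent information in an intelligent and robust manner. We regard the sensors of a multi-sensor system as members or agents of a team, able to offer opinions and bargain in group decisions. We analyze the coordination and control of this structure using team decision theory. We present some new analytic results on multi-sensor aggregation and describe in detail a simulation used to investigate our ideas. The simulation provides a basis for analyzing complex agent structures that cooperate under uncertainty. The results of this study are discussed with reference to multi-sensor robot systems, distributed artificial intelligence, and decision making under uncertainty.

English abstract

The control and integration of distributed, multi-sensor perceptual systems is a complex and challenging problem. The observations or opinions of different sensors are often disparate, incomparable, and usually only partial views. Sensor information is inherently uncertain, and in addition the individual sensors may themselves be in error with respect to the system as a whole. The successful operation of a multi-sensor system must account for this uncertainty and provide for the aggregation of disparate information in an intelligent and robust manner. We consider the sensors of a multi-sensor system to be members or agents of a team, able to offer opinions and bargain in group decisions. We will analyze the coordination and control of this structure using a theory of team decision-making. We present some new analytic results on multi-sensor aggregation and detail a simulation which we use to investigate our ideas. This simulation provides a basis for the analysis of complex agent structures cooperating in the presence of uncertainty. The results of this study are discussed with reference to multi-sensor robot systems, distributed AI and decision making under uncertainty.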

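1304.3075 2026-06-04 cs.AI cs.RO cs.SY eess.SY version update

Application of Evidential Reasoning to Helicopter Flight Path Control

Shoshana Abel

AI summary: This paper presents an approach to expert-system reasoning and knowledge representation for handling uncertainty in a real-time vehicle navigation system. It proposes an innovative evidential reasoning method, the sum-and-lattice-points method, and covers its mathematical derivation, its implementation in a parallel environment, and the development and testing of prototype software.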

Comments Appears in Proceedings of the Second Conference on Uncertainty in Artificial Intelligence (UAI1986)

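1304.2757 2026-06-04 eess.SY cs.AI cs.SY version update

Estimation Procedures for Robust Sensor Control

Greg Hager, Max Mintz

AI summary: This paper studies robust sensor control in nonlinear measurement systems, evaluates three estimation techniques, and discusses the conditions under which they apply and how to assess their performance.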

Comments Appears in Proceedings of the Third Conference on Uncertainty in Artificial Intelligence (UAI1987)

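Details

AI abstract (translated from Chinese)

Many robotic sensor estimation problems can be described as nonlinear measurement systems. These systems are corrupted by noise and may be underdetermined from a single observation. To obtain reliable estimates, the system must choose views that yield an overdetermined system; this is the sensor control problem. Accurate and reliable sensor control requires an estimation procedure that provides both estimates and measures of its own performance. For nonlinear measurement systems, computationally simple closed-form estimation solutions may not exist, but approximation techniques offer viable alternatives. This paper evaluates three estimation techniques: the extended Kalman filter, a discrete Bayes approximation, and an iterative Bayes approximation. We present mathematical results and simulation statistics illustrating operating conditions under which the extended Kalman filter is unsuitable for sensor control, and discuss issues in using the discrete Bayes approximation.

English abstract

Many robotic sensor estimation problems can be characterized in terms of nonlinear measurement systems. These systems are contaminated with noise and may be underdetermined from a single observation. In order to get reliable estimation results, the system must choose views which result in an overdetermined system. This is the sensor control problem. Accurate and reliable sensor control requires an estimation procedure which yields both estimates and measures of its own performance. In the case of nonlinear measurement systems, computationally simple closed-form estimation solutions may not exist. However, approximation techniques provide viable alternatives. In this paper, we evaluate three estimation techniques: the extended Kalman filter, a discrete Bayes approximation, and an iterative Bayes approximation. We present mathematical results and simulation statistics illustrating operating conditions where the extended Kalman filter is inappropriate for sensor control, and discuss issues in the use of the discrete Bayes approximation.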

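1304.2382 2026-06-04 eess.SY cs.AI cs.SY version update

Predicting the Likely Behaviors of Continuous Nonlinear Systems in Equilibrium

Alexander Yeh

AI summary: This paper proposes the SAB method, which splits the input space and uses lower bounds on the density to predict how likely the behaviors of continuous nonlinear systems in equilibrium are, without needing to know the parameters of the input density exactly.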

Comments Appears in Proceedings of the Fourth Conference on Uncertainty in Artificial Intelligence (UAI1988)

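Details

AI abstract (translated from Chinese)

This paper introduces a method for predicting the likely behaviors of continuous nonlinear systems in equilibrium whose input values can vary. The method uses a parameterized equation model and a lower bound on the joint density of the inputs to bound the likelihood that certain behaviors occur, such as the probability that a state variable lies within a given numeric range. Using a lower bound on the density rather than the density itself is desirable because the parameters and shape of the input density are often not exactly known. The new method is called SAB after its basic operations: split the input value space into smaller regions, then bound the possible behaviors of those regions and the probability of being in them. SAB first finds rough bounds and then refines them as more time is given. In contrast to other methods, SAB (1) can find all possible system behaviors and indicate how likely they are, (2) does not approximate the distribution of possible outcomes without some measure of the error magnitude, (3) does not use discretized variable values, which limit the events for which probability bounds can be found, (4) can handle density lower bounds, and (5) can handle criteria such as two state variables both lying within numeric ranges.

English abstract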

This paper introduces a method for predicting the likely behaviors of continuous nonlinear systems in equilibrium in which the input values can vary. The method uses a parameterized equation model and a lower bound on the input joint density to bound the likelihood that some behavior will occur, such as a state variable being inside a given numeric range. Using a bound on the density instead of the density itself is desirable because often the input density's parameters and shape are not exactly known. The new method is called SAB after its basic operations: split the input value space into smaller regions, and then bound those regions' possible behaviors and the probability of being in them. SAB finds rough bounds at first, and then refines them as more time is given. In contrast to other researchers' methods, SAB can (1) find all the possible system behaviors, and indicate how likely they are, (2) does not approximate the distribution of possible outcomes without some measure of the error magnitude, (3) does not use discretized variable values, which limit the events one can find probability bounds for, (4) can handle density bounds, and (5) can handle such criteria as two state variables both being inside a numeric range.
