arXivDaily: arXiv Daily Digest, updated Monday through Friday
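2606.05158 2026-06-04 cs.CL cs.AI cs.MA Updated version

Streaming Communication in Multi-Agent Reasoning

Zhen Yang, Xiaogang Xu, Wen Wang, Cong Chen, Xander Xu, Ying-Cong Chen

Affiliations HKUST(GZ), Alibaba Group, ZJU, HKUST

AI summary Proposes StreamMA, a streaming multi-agent reasoning system that lowers latency by streaming each reasoning step to downstream agents as it is generated and, unexpectedly, also improves effectiveness; it also gives the first closed-form joint analysis of the stream, serial, and single protocols.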

Comments project page: https://zhenyangcs.github.io/StreamMA-website/

Abstract

Multi-agent reasoning systems adopt a "generate-then-transfer" paradigm that forces end-to-end latency to scale linearly with pipeline depth. We introduce StreamMA, a multi-agent reasoning system that streams each reasoning step to downstream agents as soon as it is generated, pipelining adjacent agents and thus reducing latency. Surprisingly, this pipelining also improves effectiveness: because multi-step reasoning quality is non-uniform and early steps are more reliable than later ones, working with these reliable early steps instead of the full chain prevents error-prone late steps from misleading downstream agents. We formalize both advantages with the first closed-form joint analysis of stream, serial, and single protocols, deriving the effectiveness ordering, speedup upper bound, and cost ratio. Across eight reasoning benchmarks spanning mathematics, science, and code, two frontier LLMs (Claude Opus 4.6 and GPT-5.4), and three topologies (Chain, Tree, Graph), StreamMA outperforms both baselines (avg. +7.3 pp, max +22.4 pp on HMMT 2026; Claude Opus 4.6-high). Beyond these contributions, we discover a "step-level scaling law": increasing per-agent steps consistently improves both effectiveness and efficiency, a new scaling dimension orthogonal to and composable with agent-count scaling.
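The latency argument can be illustrated with a toy pipeline model. This is a sketch under simplifying assumptions that are not taken from the paper: a chain topology, every agent emits the same number of steps, each step costs one uniform time unit, and a downstream agent can start its step j as soon as its upstream neighbor has emitted step j. The function names are hypothetical, and the formulas are the textbook ideal-pipeline bound, not the paper's closed-form analysis.

```python
def serial_latency(num_agents: int, steps_per_agent: int, step_time: float = 1.0) -> float:
    """Generate-then-transfer: each agent waits for the full upstream chain."""
    return num_agents * steps_per_agent * step_time


def streamed_latency(num_agents: int, steps_per_agent: int, step_time: float = 1.0) -> float:
    """Step-level streaming: agent i starts one step after agent i-1 (ideal pipeline)."""
    return (num_agents + steps_per_agent - 1) * step_time


def speedup(num_agents: int, steps_per_agent: int) -> float:
    """Ratio of serial to streamed latency under the assumptions above."""
    return serial_latency(num_agents, steps_per_agent) / streamed_latency(num_agents, steps_per_agent)


if __name__ == "__main__":
    for steps in (2, 8, 32):
        print(steps, round(speedup(4, steps), 2))
```

In this toy model the speedup never exceeds the number of agents, and it grows toward that limit as each agent emits more steps.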

2606.05150 2026-06-04 cs.NE cs.AI Updated version

Multi-Column RBF Neural Network Using Adaptive and Non-Adaptive Particle Swarm Optimization

Ammar Hoori, Yuichi Motai

Affiliations Department of Biomedical Engineering, Case Western Reserve University; Department of Electrical and Computer Engineering, Virginia Commonwealth University

AI summary Addresses the scalability of RBF neural network training on large datasets by proposing multi-column RBF networks based on particle swarm optimization (PSO) and adaptive PSO (APSO), named MC-PSO and MC-APSO, which train multiple RBFNs in parallel and use subset specialization to improve accuracy and speed.

Comments 15 Page, Under Review

Abstract

The radial basis function neural network (RBFN) trained with a gradient descent algorithm provides an effective fully connected structure in both shallow and deep networks. Error correction (ErrCor), a state-of-the-art gradient-based training method, selects optimal hidden units to improve accuracy. Alternatively, as a population-based algorithm, the particle swarm optimization algorithm (PSO) uses the swarm experience to optimize RBFN parameters, offering global search and robustness to local minima. Adaptive PSO (APSO) has emerged as an improved variant of PSO. The APSO algorithm improves convergence speed by dynamically adjusting swarm parameters during optimization. Both ErrCor and PSO demonstrate improved results and competitive convergence. However, with large datasets, these methods face scalability challenges such as excessive kernel computations and large hidden layer structures. A recent multi-column RBFN approach (MCRN) improves ErrCor performance by deploying small RBFNs in a parallel system. Inspired by MCRN's success, we propose two novel approaches to improve PSO performance: the multi-column RBFN with PSO (MC-PSO) and the multi-column RBFN with APSO (MC-APSO). These methods introduce parallel RBFN structures trained using evolutionary swarm methods. Each RBFN is independently trained on a specific spatial subset of the dataset using either PSO or APSO algorithms. These resulting specialist-trained RBFNs are tailored to their respective subsets. During testing, only selected RBFNs, where the test instance neighbors are located, contribute to the multi-column output. This specialization improves accuracy, while parallelism enhances speed. We evaluate the proposed methods on various benchmark datasets. MC-PSO and MC-APSO outperform ErrCor, PSO, APSO, and MCRN in terms of accuracy and recall. They also demonstrate faster training and testing times in most experiments.
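The routing idea in the abstract (specialists trained on spatial subsets, with only the columns that hold the test instance's neighbors contributing) can be sketched as follows. This is an illustration under assumptions, not the authors' implementation: subsets are formed by nearest centroid, column selection uses the k nearest training points, the combination is a neighbor-weighted average, and each specialist is an arbitrary callable standing in for a PSO- or APSO-trained RBFN. All function names are hypothetical.

```python
import math
from collections import Counter


def nearest_centroid(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))


def partition(points, centroids):
    """Assign each training point to the column whose centroid is closest."""
    return [nearest_centroid(p, centroids) for p in points]


def select_columns(query, points, assignment, k=3):
    """Count how many of the k nearest training neighbors fall in each column."""
    neighbors = sorted(range(len(points)), key=lambda i: math.dist(query, points[i]))[:k]
    return Counter(assignment[i] for i in neighbors)


def multi_column_predict(query, points, assignment, column_models, k=3):
    """Neighbor-weighted average over the selected specialists only."""
    votes = select_columns(query, points, assignment, k)
    total = sum(votes.values())
    return sum(column_models[c](query) * n for c, n in votes.items()) / total
```

Because each column sees only its own subset, the specialists could be trained in parallel; the sketch leaves training out entirely.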

2606.05145 2026-06-04 cs.LG cs.AI cs.CL Updated version

Failed Reasoning Traces Tell You What Is Fixable (But Not by Reading Them)

Nizar Islah, Istabrak Abbes, Irina Rish, Sarath Chandar, Eilif B. Muller

Affiliations Mila - Quebec AI Institute; Université de Montréal; Polytechnique Montréal; CHU Sainte-Justine

AI summary Proposes identifying fixable failures from the distributional signature of failed reasoning traces rather than their text, and designs a training-free routing rule that improves test-time interventions.

Abstract

When post-trained language models fail on reasoning problems, the common test-time-scaling response is to spend more compute on additional attempts, and the failed traces play no further role. We argue this discards a crucial signal; some failures come from unlucky sampling, where more rollouts help, while others are structural and resist resampling regardless of budget. We propose that failed traces encode recoverability structure: the inference-time signature of which test-time interventions can rescue a given failure. Three problem-level trajectory features, derived from the structure of available interventions, recover this structure from the distributional signature of failed rollouts, not their text. They cluster failures into stable regimes, characterize the failure topography of different post-training methods (84.3 ± 4.3% accuracy, +20% over a majority-class baseline), and support a training-free routing rule that lifts rescue by +12.2% on the deployment-relevant Steerable-Hard subset (failures where retry is insufficient and a bounded intervention is reachable). The features and the routing rule transfer across two cross-family probes. The same three features thus convert failed traces from discarded data into a diagnostic object, supporting test-time routing and post-training analysis without training-time or weight-space access.

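2606.05142 2026-06-04 cs.CV cs.AI Updated version

GeM-NR: Geometry-Aware Multi-View Editing for Nonrigid Scene Changes

Josef Bengtson, Yaroslava Lochman, Fredrik Kahl

Affiliations Chalmers University of Technology

AI summary Proposes GeM-NR, a fast, flexible, training-free method that achieves multi-view consistent general nonrigid image editing through depth-map alignment, projection onto the query viewpoint, and conditioned refinement, supporting large changes in geometry and appearance.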

Comments Project page: https://gem-nr.github.io/

Abstract

Recent developments in multi-view image editing with generative models have brought us a step closer toward general 3D content generation and customization. Most existing works focus on rigid or appearance-only edits by utilizing the geometry of the unedited scene. This naturally limits these methods to edits that preserve the underlying scene structure. Other approaches are trained for specific image editing tasks, such as object removal and addition. Despite this progress, general nonrigid edits, i.e., edits that substantially change the scene geometry, remain challenging for existing methods. We propose GeM-NR, a fast and flexible training-free approach for general multi-view consistent image editing, including edits that drastically change the geometry and appearance of the scene. Given an anchor image edited with a chosen backbone editor (such as FLUX, Qwen, BrushNet) and a query unedited image, GeM-NR edits the query image consistently with the anchor edit. The method incorporates multiple stages: (i) depth map estimation, where we propose a strategy to maximize the alignment between the 3D point clouds of the edited and unedited scenes, (ii) projection onto a query viewpoint, and (iii) refinement of the obtained image conditioned on the unedited query. The conditioning-based formulation scales well from two to many views of an object. We demonstrate the ability of our method to handle edits with significant changes in geometry and appearance, something that existing methods struggle with. We perform an extensive evaluation showing that our method improves consistency for a wide variety of edit tasks, including generating 3D representations of the edited scene. Both quantitative and qualitative results indicate the state-of-the-art performance of our method in terms of edit quality as well as geometric and photometric consistency across multiple views.
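Stage (ii), projection onto a query viewpoint, can be illustrated with standard pinhole-camera geometry. The sketch below assumes a pinhole model with known intrinsics (fx, fy, cx, cy) and a known rigid transform from the anchor camera to the query camera; none of these details are stated in the abstract, the function names are hypothetical, and the paper's depth-alignment strategy and refinement stage are not shown.

```python
def unproject(u, v, depth, fx, fy, cx, cy):
    """Lift a pixel with known depth to a 3D point in the anchor camera frame."""
    return ((u - cx) * depth / fx, (v - cy) * depth / fy, depth)


def transform(point, rotation, translation):
    """Apply a rigid transform: rotation (3x3 nested lists), then translation."""
    return tuple(
        sum(rotation[r][c] * point[c] for c in range(3)) + translation[r]
        for r in range(3)
    )


def project(point, fx, fy, cx, cy):
    """Project a 3D point in the query camera frame to pixel coordinates."""
    x, y, z = point
    if z <= 0:
        return None  # the point lies behind the query camera
    return (fx * x / z + cx, fy * y / z + cy)


def reproject_pixel(u, v, depth, intrinsics, rotation, translation):
    """Anchor pixel -> 3D point -> query camera frame -> query pixel."""
    point = unproject(u, v, depth, *intrinsics)
    return project(transform(point, rotation, translation), *intrinsics)
```

With an identity rotation and zero translation a pixel maps back to itself, which is a convenient sanity check for the conventions used.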

2606.05130 2026-06-04 cs.LG cs.AI Updated version

Towards Efficient and Evidence-grounded Mobility Prediction with LLM-Driven Agent

Linyao Chen, Qinlao Zhao, Zechen Li, Mingming Li, Likun Ni, Jinyu Chen, Yuhao Yao, Xuan Song, Noboru Koshizuka, Hiroki Kobayashi

Affiliations The University of Tokyo; Huazhong University of Science and Technology; University of New South Wales, Sydney; LocationMind Inc.; Southern University of Science and Technology; Jilin University

AI summary Proposes AgentMob, a training-free LLM-driven agent framework that resolves ambiguous cases in mobility prediction through adaptive evidence gathering and achieves the strongest overall performance among training-free LLM-based methods on three datasets.

Abstract

Individual-level mobility prediction is central to urban simulation, transportation planning, and policy analysis. Supervised sequence models achieve strong accuracy but require task-specific training and offer limited decision-level transparency. Recent LLM-based methods improve interpretability, yet mostly rely on static prompts and single-pass inference, limiting their ability to seek additional evidence when mobility signals are weak or conflicting. We propose AgentMob, a training-free LLM-driven agent framework that formulates next-location prediction as adaptive evidence-controlled decision making. AgentMob resolves routine cases through a fast path based on historical regularity, while ambiguous cases trigger iterative tool use over recent trajectories, historical behavior, stay-move likelihood, and geographical evidence. Across three mobility datasets, AgentMob achieves the strongest overall performance among training-free LLM-based methods, with GPT-5.4 reaching 71.42% Acc@1 on BW, 33.14% on YJMob100K, and 33.50% on Shanghai ISP. On BW non-fast-path cases, the LLM controller improves Acc@1 from 30.65% to 48.62% over a same-tool statistical baseline, showing that its main benefit lies in resolving ambiguous predictions through adaptive evidence gathering. Our code is available at https://github.com/Unknown-zoo/AgentMob.
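The split between a fast path for routine cases and an evidence-gathering path for ambiguous ones can be sketched as a simple router. This is a hypothetical illustration, not the AgentMob code: regularity is measured here as the share of the most frequent next location after the current one, the 0.8 threshold is an arbitrary assumption, and `slow_path` is a placeholder callable standing in for the LLM controller and its tools.

```python
from collections import Counter, defaultdict


def build_transitions(history):
    """Count observed next-location transitions in a visit sequence."""
    transitions = defaultdict(Counter)
    for current, nxt in zip(history, history[1:]):
        transitions[current][nxt] += 1
    return transitions


def predict_next(history, slow_path, regularity_threshold=0.8):
    """Return (prediction, route); route is 'fast' when history is regular enough."""
    transitions = build_transitions(history)
    counts = transitions.get(history[-1])
    if counts:
        location, hits = counts.most_common(1)[0]
        if hits / sum(counts.values()) >= regularity_threshold:
            return location, "fast"
    # Ambiguous or unseen case: defer to the evidence-gathering controller.
    return slow_path(history), "slow"
```

A strict home-work commuter is resolved on the fast path, while a user who alternates between several destinations from the same origin falls through to the slow path.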

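2606.05121 2026-06-04 cs.SD cs.AI cs.CL cs.MM eess.AS Updated version

Audio Interaction Model

Zhifei Xie, Zihang Liu, Ze An, Xiaobin Hu, Yue Liao, Ziyang Ma, Dongchao Yang, Mingbao Lin, Deheng Ye, Shuicheng Yan, Chunyan Miao

Affiliations NTU, NUS, CUHK

AI summary Proposes Audio-Interaction, a unified online large audio language model that achieves real-time audio interaction through an always-on perceive-decide-respond loop, and builds the StreamAudio-2M dataset and the Proactive-Sound-Bench benchmark; it preserves performance on mainstream audio tasks while unlocking real-time ASR, streaming audio instruction following, and proactive help.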

Comments Next generation of LALMs, work in progress

Abstract

Audio is an inherently interactive modality, yet today's Large Audio Language Models (LALMs) are offline, and streaming audio models each handle only a single task such as streaming ASR or voice chatting. It is time to unify them into one online LALM: a model that, through an always-on perceive-decide-respond loop, listens to sound, environment, and instructions in real time and reacts on the fly. We formalize this regime as the Audio Interaction Model, and realize it with Audio-Interaction, a unified streaming model that retains offline task execution while adding online general audio instruction following, from dialogue to full voice chatting, deciding when to respond from the semantics of the stream. To enable this, we propose SoundFlow, a framework that instantiates the perceive-decide-respond loop end to end, from data to training to deployment, through streaming-native data construction, comprehension-aware training, and asynchronous low-latency inference for stable real-time interaction. We further construct StreamAudio-2M, a 2.6M-item streaming corpus spanning 7 fundamental abilities and 28 sub-tasks, and Proactive-Sound-Bench for evaluating proactive audio intervention. Across 8 benchmarks, Audio-Interaction preserves competitive performance on mainstream audio tasks while unlocking capabilities inaccessible to offline LALMs, including real-time ASR, streaming audio instruction following, and proactive help.

2606.05115 2026-06-04 cs.CV cs.AI cs.CL Updated version

Continual Visual and Verbal Learning Through a Child's Egocentric Input

Xiaoyang Jiang, Yanlai Yang, Kenneth A. Norman, Brenden Lake, Mengye Ren

Affiliations Agentic Learning AI Lab, New York University; Department of Psychology, Princeton University

AI summary Proposes BabyCL, a continual multimodal learning framework that processes the SAYCam dataset in a single chronological pass, combining streaming visual representation learning with an image-text contrastive objective; it outperforms streaming learning baselines on the SAYCam Labeled-S 4AFC benchmark and narrows the gap to the offline-training upper bound.

Comments 15 pages, 4 figures

Abstract

Children learn the meanings of words from a continuous, temporally structured stream of egocentric experience. Recent work shows that neural networks can also learn word-referent mappings from a child's egocentric video recordings, but they cycle through the shuffled data for hundreds of epochs, contrasting with how children actually encounter their environment. We introduce BabyCL, a continual multimodal learning framework that processes the SAYCam dataset in a single chronological pass, combining streaming visual representation learning with an image-text contrastive objective. BabyCL combines a multi-stage temporal segmentation of the stream with a dual replay buffer that independently manages visual and multimodal histories, and it is jointly trained with three contrastive losses on a shared backbone. Under a matched optimization budget, BabyCL outperforms streaming learning baselines on the SAYCam Labeled-S 4AFC benchmark, substantially narrowing the gap to an upper bound of offline training. Ablations show that the gains are robust to the length of the online temporal segmentation window and the eviction rule of the replay buffer. Together, these results show that meaningful word-referent mappings can emerge under training conditions much closer to a child's actual experience.
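The dual replay buffer, which keeps separate visual and multimodal histories, can be sketched as two independently sized queues. This is a minimal illustration under assumptions: first-in-first-out eviction is used purely for simplicity (the abstract only says results are robust to the eviction rule), and the class name, method names, and capacities are hypothetical.

```python
import random
from collections import deque


class DualReplayBuffer:
    """Two independent histories: frames alone, and frame-utterance pairs."""

    def __init__(self, visual_capacity, multimodal_capacity, seed=0):
        self.visual = deque(maxlen=visual_capacity)          # frames only
        self.multimodal = deque(maxlen=multimodal_capacity)  # (frame, utterance) pairs
        self._rng = random.Random(seed)

    def add(self, frame, utterance=None):
        """Store every frame; store a pair only when speech accompanies it."""
        self.visual.append(frame)
        if utterance is not None:
            self.multimodal.append((frame, utterance))

    def sample(self, n_visual, n_multimodal):
        """Draw replay items from each history without replacement."""
        v = self._rng.sample(list(self.visual), min(n_visual, len(self.visual)))
        m = self._rng.sample(list(self.multimodal), min(n_multimodal, len(self.multimodal)))
        return v, m
```

Keeping the two histories separate means sparse captioned frames are not crowded out by the much denser stream of uncaptioned ones.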

2606.05107 2026-06-04 cs.CV cs.AI Updated version

Who Needs Labels? Adapting Vision Foundation Models With the Metadata You Already Have

Elouan Gardès, Seung Eun Yi, Kartik Ahuja, Théo Moutakanni, Huy V. Vo, Piotr Bojanowski, Wolfgang M. Pernice, Loïc Landrieu, Camille Couprie

Affiliations Meta FAIR, Paris; LIGM, CNRS, Gustave Eiffel, ENPC, IP Paris; Columbia University, New York

AI summary Proposes FINO, a label-free method that uses metadata and self-supervised learning to adapt generic vision foundation models to specialized scientific domains; it needs no task labels for backbone adaptation, uses only lightweight probes for supervision, and outperforms standard unsupervised and fully supervised adaptation across several domains.

Abstract

We propose a label-free approach to adapt powerful but generic vision foundation models to specialized scientific domains. Standard supervised fine-tuning is often ill-suited to these settings: labels are scarce, and task-specific training can collapse the model's generality and hurt robustness. We instead leverage metadata to adapt representations to new domains in a self-supervised manner. Our method, FINO, combines a standard self-supervised objective with flexible metadata guidance that handles both highly granular discrete metadata and continuous metadata. It encourages the representation to preserve informative factors while suppressing spurious ones. Across subcellular fluorescence microscopy, Earth observation, wildlife monitoring, and medical imaging, FINO consistently outperforms standard unsupervised domain adaptation and fully supervised adaptation. It also exceeds highly-specialized domain-specific state of the art, while using no task labels for backbone adaptation and only lightweight probes for supervision.

2606.05106 2026-06-04 cs.CL cs.AI cs.CY Updated version

Arithmetic Pedagogy for Language Models

Andhika Bernard Lumbantobing, Hokky Situngkir

Affiliations Bandung Fe Institute & Adjunct Science Fellow in InaAI; AI Research Center IT Del & Bandung Fe Institute

AI summary Drawing on human mathematics pedagogy, operationalizes the GASING method as chain-of-thought supervision to train a small GPT-2 model, which reaches high accuracy on arithmetic reasoning and shows an associative mental-arithmetic capacity.

Comments 18 pages, 6 figures

Abstract

We investigate whether methods of human mathematics pedagogy can guide the training of language models toward arithmetic reasoning. Building on the GASING method -- an Indonesian pedagogy that solves basic arithmetic through a left-to-right procedure aligned with the causal order of token generation -- we operationalize each operation as a computational procedure whose execution trace is serialized into natural-language Chain-of-Thought (CoT) supervision. A small GPT-2 decoder (86M parameters) with a syllabic-agglutinative TOBA tokenizer for Indonesian is trained from scratch on this data using only a next-token prediction objective, without reinforcement learning or reward-based optimization. Monitoring training reveals three distinct learning phases, and mechanistic analyses -- attention-masking interventions on the CoT information graph, residual-stream probing, and logit-lens inspection -- show that the model first internalizes a procedural pathway and subsequently develops an associative, "mental-arithmetic" capacity that retrieves intermediate results without explicit step-by-step computation. The trained model reaches over 80% accuracy on held-out problems and attains competitive performance against substantially larger language models, indicating that targeted, pedagogically grounded training can yield strong and economical arithmetic capability at small scale.
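The idea of serializing a left-to-right procedure into a step-by-step trace can be sketched for addition. This is a generic most-significant-digit-first addition written for illustration; it is not claimed to reproduce the GASING procedure, the paper's trace format, or its Indonesian-language wording, and the function name is hypothetical.

```python
def add_left_to_right(a: int, b: int):
    """Add two non-negative integers from the most significant digit down.

    Returns (result, trace), where trace holds one text line per digit position.
    """
    width = max(len(str(a)), len(str(b)))
    digits_a, digits_b = str(a).zfill(width), str(b).zfill(width)
    running = 0
    trace = []
    for i, (x, y) in enumerate(zip(digits_a, digits_b)):
        place = 10 ** (width - 1 - i)            # 100, 10, 1 for a 3-digit sum
        partial = (int(x) + int(y)) * place      # contribution of this position
        running += partial                       # carries are absorbed here
        trace.append(f"{x}*{place} + {y}*{place} = {partial}; running total {running}")
    return running, trace


if __name__ == "__main__":
    total, steps = add_left_to_right(478, 256)
    print("\n".join(steps))
    print("answer:", total)
```

Each trace line depends only on what has already been written, which is the property that makes a left-to-right procedure a natural fit for next-token prediction.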

2606.05085 2026-06-04 cs.CL cs.AI Updated version

Automatic Generation of Titles for Research Papers Using Language Models

Tohida Rehman, Debarshi Kumar Sanyal, Samiran Chattopadhyay

Affiliations Jadavpur University; Indian Association for the Cultivation of Science

AI summary Proposes generating paper titles from abstracts with pre-trained language models and large language models; fine-tuned PEGASUS-large outperforms the other models on most metrics.

Comments 24 pages, 24 tables, 01 figure

Abstract

The title of a research paper conveys its primary idea and, occasionally, its conclusions in a clear and concise manner. Choosing an appropriate title is often challenging, and automated title generation can assist authors in this task. In this work, we propose a technique to generate paper titles from abstracts using open-weight pre-trained and large language models. We use the CSPubSum and LREC-COLING-2024 datasets and introduce a new dataset, SpringerSSAT, curated from four Springer journals in the social sciences. Additionally, we use GPT-3.5-turbo in a zero-shot setting to generate titles. Model performance is evaluated with ROUGE, METEOR, MoverScore, BERTScore, and SciBERTScore metrics. Our experiments show that fine-tuned PEGASUS-large outperforms other models, including fine-tuned LLaMA-3-8B and zero-shot GPT-3.5-turbo, across most metrics. We further demonstrate that ChatGPT can generate creative paper titles. Overall, AI-generated titles are generally appropriate and reliable.
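Of the metrics listed, ROUGE is the simplest to illustrate. The sketch below computes a simplified ROUGE-1 F1 score (unigram overlap) between a reference title and a generated one. It assumes lowercase whitespace tokenization with no stemming or stopword handling, so its values will differ from those of standard ROUGE toolkits; the function name is hypothetical.

```python
from collections import Counter


def rouge1_f1(reference: str, candidate: str) -> float:
    """Simplified ROUGE-1 F1: harmonic mean of unigram precision and recall."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # clipped count of shared unigrams
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)
```

An overlap score like this rewards reusing the reference's words, which is one reason the authors also report embedding-based metrics such as BERTScore.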

2606.05080 2026-06-04 cs.AI cs.LG Updated version

AutoLab: Can Frontier Models Solve Long-Horizon Auto Research and Engineering Tasks?

Zhangchen Xu, Junda Chen, Yue Huang, Dongfu Jiang, Jiefeng Chen, Hang Hua, Zijian Wu, Zheyuan Liu, Zexue He, Lichi Li, Shizhe Diao, Jiaxin Pei, Jinsung Yoon, Hao Zhang, Mengdi Wang, Radha Poovendran, Misha Sra, Alex Pentland, Zichen Chen

Affiliations MIT; Stanford University; University of California, Berkeley; University of California, Los Angeles; University of California, San Diego; University of Washington; University of Toronto; University of Michigan; National University of Singapore; University of Tokyo

AI summary Introduces the AutoLab benchmark, which evaluates frontier models on 36 expert-curated long-horizon closed-loop optimization tasks and finds that persistent iteration and use of empirical feedback matter more than the quality of the initial attempt.

Comments Code: https://github.com/autolabhq/autolab ; Website: https://autolab.moe/

Abstract

Scientific and engineering progress is fundamentally a long-horizon iterative process: proposing changes, running experiments, measuring outcomes, and continuously refining artifacts. Yet existing benchmarks for frontier models primarily evaluate either single-turn responses or short-horizon agent trajectories, failing to capture the challenges of sustained iterative improvement over extended time horizons. To address this gap, we introduce AutoLab, a new benchmark for ultra long-horizon closed-loop optimization. AutoLab consists of 36 realistic, expert-curated tasks spanning four diverse domains: system optimization, puzzle & challenge, model development, and CUDA kernel optimization. Each task begins with a correct but deliberately suboptimal baseline and challenges agents to improve it within a strict wall-clock budget. Evaluating 17 state-of-the-art models reveals the dominant predictor of success is not the quality of an agent's initial attempt, but its persistence in repeatedly benchmarking, editing, and incorporating empirical feedback. While claude-opus-4.6 exhibits strong long-horizon optimization capabilities, most frontier models, including several proprietary ones, either terminate prematurely or exhaust their budgets with minimal progress. These results underscore the importance of time awareness and persistent iteration in autonomous agents. We open-source the full benchmark, evaluation harness, and task artifacts, to accelerate research toward truly capable long-horizon agents.
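The closed loop described here (start from a working baseline, then repeatedly propose, benchmark, and keep improvements until a wall-clock budget runs out) can be sketched in a few lines. This is a hypothetical illustration rather than the AutoLab harness: `propose` and `benchmark` are placeholder callables, higher scores are assumed to be better, and the clock is injectable so the loop can be tested deterministically.

```python
import time


def optimize_within_budget(baseline, propose, benchmark, budget_seconds, clock=time.monotonic):
    """Keep the best-scoring candidate found before the wall-clock budget expires."""
    deadline = clock() + budget_seconds
    best, best_score = baseline, benchmark(baseline)
    iterations = 0
    while clock() < deadline:
        candidate = propose(best, best_score)  # edit informed by the latest feedback
        score = benchmark(candidate)           # empirical measurement, not a guess
        iterations += 1
        if score > best_score:                 # higher is better in this sketch
            best, best_score = candidate, score
    return best, best_score, iterations
```

An agent that stops after its first candidate uses one iteration of this loop; the benchmark's finding is that the remaining iterations are where most of the improvement comes from.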

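2606.05058 2026-06-04 cs.CV cs.AI Updated version

UniCAD: A Unified Benchmark and Universal Model for Multi-Modal Multi-Task CAD

Jingyuan Chen, Sheng Jin, Haopeng Sun, Wentao Liu, Chen Qian

Affiliations SenseTime Research and Tetras.AI

AI summary To address the lack of a unified multi-modal benchmark for CAD, proposes the UniCAD benchmark and UniCAD-MLLM, a universal multi-modal large language model that handles point-cloud-to-CAD reconstruction, text/image-to-CAD generation, and CAD question answering end to end in one framework, achieving state-of-the-art performance on multiple benchmarks.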

Abstract

Computer-Aided Design (CAD) underpins modern engineering and manufacturing by enabling the creation of precise, editable 3D models. However, CAD research typically studies tasks in isolation, and multi-modal, multi-task learning for CAD is hindered by the absence of a unified benchmark. To address this gap, we introduce UniCAD, a comprehensive benchmark for multi-modal CAD learning that covers point-to-CAD reconstruction, text/image-to-CAD generation, and CAD question answering across diverse input modalities. Alongside the benchmark, we present UniCAD-MLLM, a universal multi-modal large language model that ingests text, images, sketches, and point clouds and performs these heterogeneous tasks in an end-to-end fashion within a single framework. Extensive experiments on the UniCAD and Fusion360 benchmarks demonstrate that UniCAD-MLLM achieves state-of-the-art performance across all tasks, outperforming existing task-specific and multi-task baselines. We will release the dataset, code, and pretrained models to accelerate future research.

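2606.05043 2026-06-04 cs.AI Updated version

Strabo: Declarative Specification and Implementation of Agentic Interaction Protocols

Samuel H. Christie, Amit K. Chopra, Munindar P. Singh

Affiliations North Carolina State University; Lancaster University

AI summary Presents Strabo, which models the checkout part of UCP as a declarative interaction protocol and implements agents with the Peach programming model, showing the advantages of declarative specifications; it also achieves interoperation with Google's UCP agents, offering a path for the incremental introduction of EMAS ideas into practice.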

Comments Presented in the Engineering Multiagent Systems Workshop co-located with the 2026 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

Abstract

The last few years have witnessed major advances in the modeling and implementation of multiagent systems based on declarative interaction protocols. Our contribution, Strabo, establishes the relevance of these advances to ongoing industry efforts in Agentic AI. Specifically, we consider UCP, the Universal Commerce Protocol, a recent Google-led effort to standardize e-commerce interactions for AI agents. Our exercise is in two parts. One, we model the part of UCP dealing with checkouts as a declarative Langshaw protocol and implement agents using Peach, a programming model for Langshaw. This part of the exercise brings out the advantages of formal, declarative specifications. Two, we show that Peach agents can interoperate with UCP agents implemented by Google, thereby establishing the fidelity of our approach with respect to UCP. Such interoperation enables the incremental introduction of declarative protocols and agents into a conventional setting, indicating a pathway by which EMAS ideas could influence practice without demanding a wholesale update.

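2606.05037 2026-06-04 cs.SE cs.AI Updated version

Self-Reflective APIs: Structure Beats Verbosity for AI Agent Recovery

Arquimedes Canedo, Grama Chethan

Affiliations Siemens Digital Industries Software, USA

AI summary Proposes self-reflective APIs, which return machine-readable structured suggestions on validation failure so that AI agents can repair the request and retry without external reasoning; on Anthropic models this lifts task-completion rate by 36.7 to 40.0 percentage points and improves per-success token efficiency by 1.8 to 2.2 times.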

Abstract

When an AI agent calls an API and hits a validation error, it needs more than what went wrong -- it needs what to do next. A self-reflective API returns, on validation failure, a machine-readable recovery_feedback.suggestions[] payload sufficient for the agent to repair the request and retry without external reasoning. On a leak-audited pilot (N=30 per cell, 3 LLMs, 10 adversarial tasks), structured suggestions lift task-completion rate by +36.7 to +40.0 pp over plain-English diagnoses on Anthropic models (Fisher's exact p ≤ 0.0022), at 1.8 to 2.2× better per-success token efficiency. The lift is not significant on gpt-4o-mini (p=0.435); a second-domain replication on a billing API confirms the pattern. The comparison only holds after auditing two undocumented classes of answer leakage in LLM benchmarks. We ship audit_prompt_leakage.py as reusable CI infrastructure. Code and data: https://github.com/arquicanedo/self-reflective-apis.
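A minimal sketch of the pattern follows, showing a validator that returns structured suggestions and a client that applies them mechanically. Only the `recovery_feedback.suggestions[]` path comes from the abstract; the suggestion fields (`op`, `field`, `value`), the status codes, the order schema, and both function names are assumptions made for illustration and may not match the authors' repository.

```python
ALLOWED_CURRENCIES = {"USD", "EUR"}  # hypothetical schema constraint


def validate_order(request):
    """Validate a request; on failure return machine-readable repair suggestions."""
    suggestions = []
    quantity = request.get("quantity")
    if not isinstance(quantity, int):
        # Coerce digit strings; otherwise suggest a default of 1.
        fixed = int(quantity) if isinstance(quantity, str) and quantity.isdigit() else 1
        suggestions.append({"op": "set", "field": "quantity", "value": fixed})
    if request.get("currency") not in ALLOWED_CURRENCIES:
        suggestions.append({"op": "set", "field": "currency", "value": "USD"})
    if suggestions:
        return {"status": 422, "recovery_feedback": {"suggestions": suggestions}}
    return {"status": 200}


def call_with_recovery(request, max_retries=2):
    """Retry by applying the suggestions as-is, with no extra reasoning step."""
    response = validate_order(request)
    for _ in range(max_retries):
        if response["status"] == 200:
            break
        repaired = dict(request)
        for suggestion in response["recovery_feedback"]["suggestions"]:
            if suggestion["op"] == "set":
                repaired[suggestion["field"]] = suggestion["value"]
        request = repaired
        response = validate_order(request)
    return request, response
```

The client never parses an error message; it only executes the listed operations, which is the contrast with plain-English diagnoses that the abstract draws.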

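2606.05025 2026-06-04 cs.LG cs.AI Updated version

Invariant Gradient Alignment for Robust Reasoning Distillation

Zehua Cheng, Wei Dai, Jiahao Sun

Affiliations University of Oxford; FLock.io

AI summary Proposes the Invariant Gradient Alignment (IGA) framework, which uses Logical Isomer Sets, a continuous gradient conflict mask, and a truncated SVD projection to align gradient updates across examples from different semantic domains that share the same logical structure, improving the robustness of large language models on out-of-distribution inputs.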

Comments 30 Pages

详情
Journal ref
In Proceedings of European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases 2026
AI中文摘要

大型语言模型(LLMs)存在捷径学习问题:它们在分布外(OOD)输入上系统性失败,这些输入的语义表面与训练数据不同,即使逻辑结构相同。这破坏了将思维链推理迁移到较小学生模型的知识蒸馏流程。我们引入不变梯度对齐(IGA),一种训练框架,通过三项创新对齐跨语义多样但逻辑同构示例的梯度更新:(i)逻辑同构集,即跨不同语义领域(数学、医学、法律、科学)共享相同逻辑结构的问题组;(ii)可微的连续梯度冲突掩码,抑制具有高跨域梯度方差的参数维度,同时保留不变方向;(iii)将掩码梯度通过截断SVD投影回LoRA低秩流形,保持参数效率。理论上,IGA比ERM产生更紧的OOD泛化界,随同构域数量缩放,并在温和正则条件下以标准SGD速率收敛。实验上,IGA在四个基准测试中优于八种基线,准确率提升高达14.3个百分点(相对于ERM-SFT),逻辑一致性得分为0.031对比0.142——表示不变性提升四倍。

英文摘要

Large language models (LLMs) suffer from shortcut learning: they systematically fail on out-of-distribution (OOD) inputs whose semantic surface differs from training data, even when the logical structure is identical. This undermines knowledge distillation pipelines that transfer chain-of-thought reasoning to smaller students. We introduce Invariant Gradient Alignment (IGA), a training framework that aligns gradient updates across semantically diverse but logically isomorphic examples via three innovations: (i) Logical Isomer Sets, groups of problems sharing identical logical structure across distinct semantic domains (mathematics, medicine, law, science); (ii) a differentiable Continuous Gradient Conflict Mask that suppresses parameter dimensions with high cross-domain gradient variance while preserving invariant directions; and (iii) a truncated SVD projection of the masked gradient back onto the LoRA low-rank manifold, maintaining parameter efficiency throughout. Theoretically, IGA yields tighter OOD generalization bounds than ERM, scaling with the number of isomer domains, and converges at the standard SGD rate under mild regularity. Empirically, IGA outperforms eight baselines across four benchmarks with accuracy gains up to 14.3 pp over ERM-SFT and a Logical Consistency Score of 0.031 versus 0.142 -- a fourfold improvement in representational invariance.
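Step (ii) above, the soft mask that down-weights dimensions whose gradients disagree across domains, can be sketched in a few lines. The abstract does not give the mask's functional form, so the choice `exp(-variance / tau)`, the temperature `tau`, and the function names are assumptions. The truncated SVD projection of step (iii) is omitted.

```python
import math


def conflict_mask(domain_grads: list[list[float]], tau: float = 1.0) -> list[float]:
    """Per-dimension soft mask in (0, 1]; 1.0 means all domains agree exactly.

    domain_grads[d][j] is the gradient of parameter j computed on domain d.
    Assumed form: mask_j = exp(-Var_d[g_dj] / tau), which is smooth in the gradients.
    """
    n = len(domain_grads)
    dim = len(domain_grads[0])
    mask = []
    for j in range(dim):
        column = [g[j] for g in domain_grads]
        mean = sum(column) / n
        variance = sum((x - mean) ** 2 for x in column) / n
        mask.append(math.exp(-variance / tau))
    return mask


def masked_mean_gradient(domain_grads: list[list[float]], tau: float = 1.0) -> list[float]:
    """Average the domain gradients, then scale each dimension by its mask."""
    n = len(domain_grads)
    mask = conflict_mask(domain_grads, tau)
    return [m * sum(g[j] for g in domain_grads) / n for j, m in enumerate(mask)]
```

With two domains whose gradients agree on the first parameter and conflict on the second, the first dimension passes through unchanged while the second is strongly damped.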

2606.05009 2026-06-04 cs.CL cs.AI 版本更新

DAR: Deontic Reasoning with Agentic Harnesses

DAR: 基于智能体框架的道义推理

Guangyao Dou, William Jurayj, Nils Holzenberger, Benjamin Van Durme

发表机构 * Johns Hopkins University(约翰霍普金斯大学) Télécom Paris, Institut Polytechnique de Paris(巴黎电信学院,巴黎理工学院)

AI总结 提出DAR框架,通过让模型按需与法规交互来提升基于LLM的道义推理能力,实验表明智能体框架可提升性能但存在非均匀改进和弱模型数值任务退化问题。

详情
AI中文摘要

道义推理是通过将明确的规则和政策应用于具体案例事实来回答问题,例如根据法规计算纳税义务或确定移民上诉结果。基于LLM的道义推理的一个关键技术挑战是相关规则集可能很长且相互引用,因此模型可能仍无法找到特定推理步骤所需的规则。我们引入了道义智能体推理(DAR),这是一种智能体推理设置,其中模型按需与法规交互。我们在DeonticBench的困难子集上使用多种框架评估DAR。在这些设置中,我们发现智能体框架可以推动道义推理任务的前沿,但改进并不均匀:较弱的模型在数值任务上往往性能下降,同时消耗更多的令牌。

英文摘要

Deontic reasoning is the task of answering questions by applying explicit rules and policies to case-specific facts, for example computing tax liability under a statute or determining the outcome of an immigration appeal. A key technical challenge for LLM-based deontic reasoning is that the relevant ruleset can be long and cross-referenced, so models may still fail to locate the rules needed for a particular reasoning step. We introduce Deontic Agentic Reasoning (DAR), an agentic reasoning setup in which the model interacts with the statutes on demand. We evaluate DAR under multiple harnesses on hard subsets of DeonticBench. Across these settings, we find that agentic harnesses can push the frontier on deontic reasoning tasks, but improvements are not uniform: weaker models often degrade on numerical tasks while consuming far more tokens.

2606.05008 2026-06-04 cs.CV cs.AI cs.CL 版本更新

M$^3$Eval: Multi-Modal Memory Evaluation through Cognitively-Grounded Video Tasks

M$^3$Eval: 通过认知基础视频任务的多模态记忆评估

Jie Huang, Ruixun Liu, Sirui Sun, Xinyi Yang, Yin Li, Yixin Zhu, Yiwu Zhong

发表机构 * School of Intelligence Science and Technology, Peking University(北京大学智能科学与技术学院) State Key Laboratory of General Artificial Intelligence, Peking University(北京大学通用人工智能国家重点实验室) Yuanpei College, Peking University(北京大学元培学院) Institute for Artificial Intelligence, Peking University(北京大学人工智能研究院) School of Psychological and Cognitive Sciences, Peking University(北京大学心理学与认知科学学院) University of Wisconsin-Madison(威斯康星大学麦迪逊分校)

AI总结 提出首个多模态模型记忆评估框架M$^3$Eval,通过认知心理学设计的视频任务系统评估模型在记忆保持、忠实性和鲁棒性上的表现,发现模型在并行视频流处理、干扰模式、时空记忆和符号记忆方面的显著缺陷。

Comments We present an evaluation designed for multi-modal memory in multi-modal models

详情
AI中文摘要

随着多模态模型向长视频理解发展,记忆成为关键能力。尽管在视频数据集和基准测试方面做出了大量努力,现有工作主要关注感知和推理,而没有系统评估记忆:模型保留了什么、信息如何忠实保存、以及记忆在干扰下的鲁棒性。为填补这一空白,我们引入了M$^3$Eval,这是第一个用于探测多模态模型中不同记忆维度的综合评估框架和基准。基于认知心理学,我们的设计通过精心构建的任务来隔离记忆的关键方面。利用M$^3$Eval,我们在代表性多模态模型上进行了大量实验,揭示了一致的弱点和独特行为。我们发现,模型在处理并行视频流时难以保持解耦表示,表现出与人类记忆显著不同的干扰模式,在空间域比时间域更可靠地定位记忆源,并且符号记忆有限。总的来说,我们的基准为未来研究提供了宝贵资源,而我们的发现强调了记忆作为基本但未充分探索的能力,并为设计更有效的多模态模型记忆机制提供了见解。我们的代码和数据集可在https://pku-value-lab.github.io/m3eval-homepage获取。

英文摘要

As multi-modal models advance towards long-form video understanding, memory emerges as a critical capability. Despite substantial efforts in developing video datasets and benchmarks, existing works primarily focus on perception and reasoning, without systematically evaluating memory: what models retain, how faithfully information is preserved, and how robust memory remains under interference. To address this gap, we introduce M$^3$Eval, the first comprehensive evaluation framework and benchmark for probing different memory dimensions in multi-modal models. Grounded in cognitive psychology, our design features carefully constructed tasks that isolate key aspects of memory. Leveraging M$^3$Eval, we conduct extensive experiments across representative multi-modal models, revealing consistent weaknesses and distinctive behaviors. We find that models struggle to maintain disentangled representations when processing parallel video streams, exhibit interference patterns differing substantially from those observed in human memory, ground memory sources more reliably in the spatial domain than the temporal domain, and demonstrate limited symbolic memory. Collectively, our benchmark provides a valuable resource for future research, while our findings highlight memory as a fundamental yet underexplored capability and offer insights for designing more effective memory mechanisms in multi-modal models. Our code and dataset are available at https://pku-value-lab.github.io/m3eval-homepage.

2606.05004 2026-06-04 cs.CR cs.AI 版本更新

SharedRequest: Privacy-Preserving Model-Agnostic Inference for Large Language Models

SharedRequest: 面向大型语言模型的隐私保护模型无关推理

Peihua Mai, Xuanrong Gao, Youlong Ding, Xianglong Du, Wei Liu, Yan Pang

发表机构 * National University of Singapore (Chongqing) Research Institute(新加坡国立大学(重庆)研究院) Chongqing Key Laboratory of Trusted Perception and Interaction Technology for Intelligent and Connected Vehicles(重庆智能网联车辆可信感知与交互技术重点实验室) National University of Singapore(新加坡国立大学) Hebrew University of Jerusalem(耶路撒冷希伯来大学) State Key Laboratory of Intelligent Vehicle Safety Technology, Chongqing, China(重庆智能车辆安全技术国家重点实验室) CHONGQING CHANGAN AUTOMOBILE Co., Ltd(重庆长安汽车有限公司)

AI总结 提出一种模型无关的隐私保护推理框架SharedRequest,通过批量级别混淆和语义分组实现高效隐私保护,相比差分隐私基线效用提升20%以上,查询成本降低5倍。

Comments accepted by ACL 2026 (main)

详情
AI中文摘要

随着ChatGPT等公共大型语言模型(LLMs)的广泛部署,保护用户提示隐私已成为一个日益关键的问题。现有的隐私保护推理方法要么牺牲效用,要么牺牲效率,并且通常需要特定于模型的修改,限制了其兼容性。在本文中,我们提出了SharedRequest,一个模型无关的隐私保护LLM推理框架,它将隐私保护重新定义为批量级别而非单个提示级别。关键思想是通过将原始提示与噪声变体混合来混淆敏感信息,同时将语义等效的指令分组,以在大量查询批次中分摊推理成本,对LLM响应质量影响最小。该设计独立于LLM架构,无需访问模型参数或进行架构修改。实验结果表明,与先前的差分隐私基线相比,SharedRequest实现了超过20%的效用提升,并且其共享提示机制相比非批量推理将查询成本降低了5倍。

英文摘要

With the widespread deployment of public large language models (LLMs) such as ChatGPT, protecting user prompt privacy has become an increasingly critical issue. Existing privacy-preserving inference methods sacrifice either utility or efficiency, and often require model-specific modifications that limit their compatibility. In this paper, we propose SharedRequest, a model-agnostic framework for privacy-preserving LLM inference that reformulates privacy protection at the batch level rather than the individual-prompt level. The key idea is to obscure sensitive information by mixing original prompts with noisy variants, while grouping semantically equivalent instructions to amortize the inference cost over a large batch of queries with minimal impact on LLM response quality. This design is independent of the LLM architecture, requiring no access to model parameters or architectural modification. Empirical results demonstrate that SharedRequest achieves over $20\%$ higher utility compared to prior differential privacy baselines, and its shared-prompt mechanism reduces query cost by up to $5\times$ compared to non-batched inference.

2606.04987 2026-06-04 cs.CL cs.AI cs.HC 版本更新

DeliChess: A Multi-party Dialogue Dataset for Deliberation in Chess Puzzle Solving

DeliChess: 一个用于国际象棋谜题求解中深思熟虑的多方对话数据集

Xiaochen Zhu, Georgi Karadzhov, Tom Stafford, Andreas Vlachos

发表机构 * University of Cambridge(剑桥大学) University of Sheffield(谢菲尔德大学)

AI总结 提出DeliChess数据集,包含多方协作解决国际象棋谜题的对话,通过讨论显著提升群体准确性,并分析探询性话语的作用。

详情
AI中文摘要

多方对话是研究协作推理和决策的关键场景,然而现有数据集很少关注结构化、深入的复杂推理任务。我们引入了DeliChess,一个新颖的群体深思熟虑对话数据集,其中参与者协作解决多项选择国际象棋谜题。每个小组首先单独完成谜题,然后进行多方讨论,最后提交修正后的集体答案。该数据集包含107个对话,附有完整转录、讨论前后的选择以及关于谜题难度和走棋质量的元数据。我们使用基于象棋引擎评估的三个指标评估性能,发现深思熟虑显著提高了群体准确性。我们进一步利用先前深思熟虑数据训练的分类器分析了探询性话语(即引发提议、理由或战略反思的消息)的作用。虽然探询性话语使讨论后的群体表现更加多变,但它并未持续带来更好的性能。我们的数据集为在一个明确定义的策略领域中建模群体推理、对话动态以及不同观点和意见的解决提供了丰富的测试平台。

英文摘要

Multi-party dialogue is a critical setting for studying collaborative reasoning and decision-making, yet existing datasets rarely focus on structured, in-depth complex reasoning tasks. We introduce DeliChess, a novel dataset of group deliberation dialogues in which participants collaboratively solve multiple-choice chess puzzles. Each group first completes the puzzle individually, then engages in a multi-party discussion before submitting a revised collective answer. The dataset includes 107 dialogues with full transcripts, pre- and post-discussion choices, and metadata on puzzle difficulty and move quality. We evaluate performance using three metrics based on chess engine evaluations, and find that deliberation significantly improves group accuracy. We further analyse the role of probing utterances (i.e., messages that elicit proposals, justifications, or strategic reflection) using a classifier trained on prior deliberation data. While probing makes group performance more variable after discussion, it does not consistently lead to better performance. Our dataset offers a rich testbed for modelling group reasoning, dialogue dynamics, and the resolution of differing perspectives and opinions in a well-defined strategic domain.

2606.04970 2026-06-04 cs.CV cs.AI 版本更新

Plan, Watch, Recover: A Benchmark and Architectures for Proactive Procedural Assistance

计划、观察、恢复:主动式程序辅助的基准与架构

Kaustav Kundu, Ritvik Shrivastava, Maxim Arap, Nanshu Wang, Xianhui Zhu, Quintin Fettes, Gautam Tiwari, Parth Suresh, Théo Moutakanni, Alejandro Castillejo Munoz, Allen Bolourchi, Pascale Fung, Pinar Donmez, Babak Damavandi, Anuj Kumar, Seungwhan Moon

发表机构 * Meta Reality Labs(Meta现实实验室) Meta Superintelligence Labs(Meta超智能实验室)

AI总结 提出EgoProactive数据集和Pro²Bench基准,并设计解耦规划器-交互架构,用于主动式程序辅助中的实时引导和异常恢复。

Comments 53 pages, 14 figures

详情
AI中文摘要

我们设想一个主动的多模态辅助系统,该系统在程序性任务中为用户提供实时的逐步指导,自主决定何时中断以及如何指导。然而,由于缺乏反映现实条件的大规模跨领域基准,特别是用户偏离预期步骤序列的常见情况,进展受到限制。我们通过四项贡献来解决这一差距:(1)我们发布了EgoProactive,一个大规模的可穿戴自我中心数据集,用于主动程序辅助,带有明确的计划外(OOP)标注和恢复步骤;(2)我们将五个已建立的基准(Ego4D、EPIC-KITCHENS、EgoExo4D、HoloAssist、HowTo100M)扩充为统一的主动指导模式下的Pro²Bench;(3)我们提出了一种专门针对程序状态、视觉线索和恢复注入的解耦规划器-交互架构;(4)我们引入了一种跨模型家族迁移的训练后方案,通过在Llama 4和Qwen-3.6-VL上的跨骨干复制进行验证。在大量实验中,我们训练的Llama-4系统在所有六个数据集上,相对于强大的专有基线(Claude Opus 4.6、Gemini 3.1 Pro、GPT 5.2)和开放权重基线(Qwen3 VL 235B),显著提高了客观干预质量。Oracle计划实验进一步表明,当计划质量得到控制时,训练的双工模型产生高质量的指导,并在计划外恢复方面取得巨大收益。

英文摘要

We envision a proactive multi-modal assistant system which gives users real-time step-by-step guidance on a procedural task, autonomously deciding when to interrupt and how to coach. However, progress is limited by the absence of large-scale, cross-domain benchmarks that reflect realistic conditions, particularly the common case in which users deviate from the expected step sequence. We address this gap with four contributions: (1) we release EgoProactive, a large-scale wearable-egocentric dataset for proactive procedural assistance with explicit Out-of-Plan (OOP) annotations and recovery steps; (2) we augment five established benchmarks (Ego4D, EPIC-KITCHENS, EgoExo4D, HoloAssist, HowTo100M) into Pro²Bench under a unified proactive-guidance schema; (3) we propose a decoupled planner-interaction architecture specialized for procedural state, visual cues, and recovery injection; (4) we introduce a post-training recipe that transfers across model families, validated by cross-backbone replication on Llama 4 and Qwen-3.6-VL. In extensive experiments, our trained Llama-4 system substantially improves objective intervention quality over strong proprietary baselines (Claude Opus 4.6, Gemini 3.1 Pro, GPT 5.2) and open-weight baselines (Qwen3 VL 235B) across all six datasets. Oracle-plan experiments further show that, when plan quality is controlled, the trained duplex model produces high-quality guidance and large gains on Out-of-Plan recovery.

2606.04967 2026-06-04 cs.SE cs.AI 版本更新

From Prompt to Process: a Process Taxonomy and Comparative Assessment of Frameworks Supporting AI Software Development Agents

从提示到流程:支持AI软件开发智能体的框架流程分类与比较评估

Sanderson Oliveira de Macedo

发表机构 * Federal Institute of Goias(戈亚斯联邦理工学院)

AI总结 提出六维流程分类法,对六个AI软件开发框架进行评分比较,揭示流程深度与可移植性之间的结构性权衡。

详情
AI中文摘要

AI编程工具不再仅仅是自动补全或聊天助手:它们组织为开发框架,包含流程、角色、工件和验证。最近的调查绘制了用于软件工程的智能体和LLM,但缺少一项以将这些能力转化为流程的操作框架为中心的研究。我们对主要来源进行了定向搜索,采用功能性纳入标准和牵引力测量,选择了六个框架:GitHub Spec Kit、OpenSpec、BMAD Method、Get Shit Done (GSD)、Spec Kitty和Reversa。每个框架通过不同路径攻击AI开发:完整和轻量变体的规范驱动开发、智能体驱动的敏捷规划、智能体上的上下文工程、工作树隔离与审查,以及从遗留系统中恢复操作规范。我们的核心贡献是一个六维流程分类法:规范、上下文、角色、执行、验证和可移植性,并附带一个评分标准,使其成为可复制的工具。我们将其应用于六个框架和一个样本外案例Spec-Flow。两个结果突出。在已经采用某种流程的框架中,存在趋同:孤立的提示失去中心地位,持久工件、工作合同、可追溯性和人工审查成为减少歧义和协调智能体的机制。并且没有框架强覆盖所有六个维度,暴露了流程深度与跨智能体可移植性之间的结构性权衡。我们还发现了反复出现的风险:规范与代码之间的漂移、对生成工件的过度信任、社区扩展的脆弱性、平台依赖性以及缺乏完整流程的基准测试。我们以一个研究议程结束,侧重于中间质量指标、上下文治理、安装安全性和可重复性。

英文摘要

AI tools for programming are no longer just autocomplete or chat assistants: they organize themselves as development frameworks, with process, roles, artifacts and verification. Recent surveys map agents and LLMs for software engineering, but a study centered on the operational frameworks that turn these capabilities into process is missing. We ran a directed search of primary sources, with a functional inclusion criterion and traction measurement, and selected six frameworks: GitHub Spec Kit, OpenSpec, BMAD Method, Get Shit Done (GSD), Spec Kitty and Reversa. Each attacks AI development through a different path: spec-driven development in full and lightweight variants, agent-driven agile planning, context engineering over the agent, worktree isolation and review, and recovery of operational specifications from legacy systems. Our central contribution is a six-dimension process taxonomy: specification, context, roles, execution, validation and portability, with a scoring rubric that turns it into a replicable instrument. We apply it to the six frameworks and an out-of-sample case, Spec-Flow. Two results stand out. Among frameworks that already adopt some process there is convergence: the isolated prompt loses centrality, and persistent artifacts, work contracts, traceability and human review become mechanisms that reduce ambiguity and coordinate agents. And no framework strongly covers all six dimensions, exposing a structural trade-off between process depth and portability across agents. We also found recurring risks: drift between specification and code, excessive trust in generated artifacts, fragility of community extensions, platform dependence and a lack of benchmarks for the complete process. We close with a research agenda for empirical evaluation, focused on intermediate-quality metrics, context governance, installation security and reproducibility.

2606.04930 2026-06-04 cs.LG cs.AI stat.ML 版本更新

AdaKoop: Efficient Modeling of Nonlinear Dynamics from Nonstationary Data Streams with Koopman Operator Regression

AdaKoop: 基于Koopman算子回归的非平稳数据流非线性动力学高效建模

Naoki Chihara, Ren Fujiwara, Yasuko Matsubara, Yasushi Sakurai

发表机构 * SANKEN, The University of Osaka(大阪大学产业科学研究所)

AI总结 提出AdaKoop,一种基于Koopman算子理论和概率框架的流式算法,通过将非线性动力学表示为线性系统,实现对非平稳数据流的高效、稳定建模,并在71个基准数据集上超越现有方法。

Comments Accepted by KDD'26

详情
Journal ref
The 32nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2026
AI中文摘要

实时数据分析需要准确且自适应地处理非平稳数据流中的非线性动力学,同时保持计算效率。然而,非线性动力学非常复杂,在严格时间限制下捕获动态变化的非线性模式并将其用于下游任务并非易事。为了弥合非线性复杂性与计算可处理性之间的差距,本研究应用了Koopman算子理论,该理论指出非线性动力学可以表示为无限维空间中的线性变换。基于该算子的有限维近似,我们提出了AdaKoop,一种用于对非平稳数据流上的非线性动力学进行建模的高效流式算法。我们的方法利用基于Koopman算子理论的概率框架,将原始观测和再生核希尔伯特空间(RKHS)特征都视为来自潜在向量的发射。这种双视角公式允许非线性动力学被表示为可处理的线性系统。因此,AdaKoop能够以流式方式高效稳定地建模非线性动力学,避免了迭代非线性优化的高昂计算成本。此外,为了应对数据流中的非平稳性,AdaKoop通过统计假设检验自适应地检测模式突变,并增量更新模型参数以处理连续变化。在总共71个跨领域实际基准数据集上的大量实验表明,AdaKoop在实时预测准确性和计算效率方面均优于最先进的方法。

英文摘要

Real-time data analysis requires the ability to accurately and adaptively address nonlinear dynamics in a nonstationary data stream while preserving computational efficiency. However, nonlinear dynamics are so complex that capturing dynamically changing nonlinear patterns and utilizing them for downstream tasks under strict time constraints is nontrivial. To bridge the gap between nonlinear complexity and computational tractability, this study applies Koopman operator theory, which states that nonlinear dynamics can be represented as linear transitions in an infinite-dimensional space. Building upon finite-dimensional approximations of this operator, we present AdaKoop, an efficient streaming algorithm for modeling nonlinear dynamics over nonstationary data streams. Our approach utilizes a probabilistic framework grounded in Koopman operator theory, treating both raw observations and reproducing kernel Hilbert space (RKHS) features as emissions from latent vectors. This dual-view formulation allows nonlinear dynamics to be expressed as a tractable linear system. Therefore, AdaKoop enables the efficient and stable modeling of nonlinear dynamics in a streaming fashion, avoiding the prohibitive computational costs of iterative nonlinear optimization. Furthermore, to address nonstationarity in data streams, AdaKoop adaptively detects the switching of patterns via statistical hypothesis testing for abrupt pattern shifts and incrementally updates model parameters to handle continuous changes. Extensive experiments on a total of 71 practical benchmark datasets across various domains demonstrate that AdaKoop outperforms state-of-the-art methods in terms of real-time forecasting accuracy and computational efficiency.

2606.04923 2026-06-04 cs.LG cs.AI cs.CL 版本更新

Reproducing, Analyzing, and Detecting Reward Hacking in Rubric-Based Reinforcement Learning

基于评分标准的强化学习中的奖励黑客行为的复现、分析与检测

Xuekang Wang, Zhuoyuan Hao, Shuo Hou, Hao Peng, Juanzi Li, Xiaozhi Wang

发表机构 * Tsinghua University(清华大学) Harbin Institute of Technology, Shenzhen(哈尔滨工业大学(深圳)) Xi’an Jiaotong University(西安交通大学)

AI总结 本文提出可控黑客环境CHERRL,通过注入已知偏见复现奖励黑客行为,分析其可发现性与可利用性,并探索基于智能体的自动检测方法。

Comments 23 pages, 7 figures

详情
AI中文摘要

基于评分标准的强化学习(RL)使用LLM作为评判者(LaaJ)根据评分标准对模型输出进行评分作为奖励。然而,策略模型可能利用评判者中的潜在偏见,导致奖励黑客行为以及无效或不安全的训练结果。在真实的基于评分标准的RL中,此类黑客行为通常微妙且与多种评判者偏见纠缠在一起,使得分析、检测和缓解变得困难。在本文中,我们引入了CHERRL,一个用于基于评分标准的RL的可控黑客环境。通过将已知偏见注入LaaJ,CHERRL能够稳定复现奖励黑客行为,明确观察奖励发散,并精确识别黑客行为的起始点。这为研究基于评分标准的RL中奖励黑客行为的机制和缓解措施提供了一个干净的实验测试平台。为了展示其效用,我们从可发现性和可利用性的角度分析了不同的评判者偏见,并探索了一个基于智能体的系统,用于从训练日志中自动检测奖励黑客行为的起始点。代码和环境公开于https://github.com/THUAIS-Lab/CHERRL。

英文摘要

Rubric-based reinforcement learning (RL) uses an LLM-as-a-Judge (LaaJ) to score model outputs according to rubrics as rewards. However, policy models may exploit latent biases in the judge, leading to reward hacking and ineffective or unsafe training outcomes. In real-world rubric-based RL, such hacking behaviors are often subtle and entangled with multiple judge biases, making them difficult to analyze, detect, and mitigate. In this paper, we introduce CHERRL, a controllable hacking environment for rubric-based RL. By injecting known biases into LaaJ, CHERRL enables stable reproduction of reward hacking, explicit observation of reward divergence, and precise identification of hacking onset. This provides a clean experimental testbed for studying the mechanisms and mitigations of reward hacking in rubric-based RL. To demonstrate its utility, we analyze different judge biases from the perspectives of discoverability and exploitability, and explore an agent-based system for automatically detecting reward hacking onset from training logs. The code and environment are publicly available at https://github.com/THUAIS-Lab/CHERRL.
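The two ideas in this abstract, injecting a known bias into the judge and locating the onset of hacking from reward divergence, can be illustrated with a toy sketch. Nothing here is taken from the CHERRL code: the trigger-phrase bias, the `bonus`, the `gap` and `patience` thresholds, and both function names are assumptions chosen to keep the example small.

```python
from typing import Optional


def biased_judge(clean_score: float, response: str,
                 trigger: str = "as an expert", bonus: float = 0.3) -> float:
    """Toy judge with one injected, known bias: a bonus for a trigger phrase.

    clean_score stands in for an unbiased rubric score in [0, 1].
    """
    inflated = clean_score + bonus if trigger in response.lower() else clean_score
    return min(1.0, inflated)


def detect_onset(judge_rewards: list[float], reference_rewards: list[float],
                 gap: float = 0.2, patience: int = 2) -> Optional[int]:
    """Return the first step of a run of `patience` consecutive steps in which
    the judge reward exceeds the reference reward by more than `gap`."""
    run = 0
    for step, (judged, reference) in enumerate(zip(judge_rewards, reference_rewards)):
        run = run + 1 if judged - reference > gap else 0
        if run == patience:
            return step - patience + 1
    return None
```

Because the bias is injected deliberately, the reference reward (the same judge without the bias) is available at every step, which is what makes the divergence directly observable in a controlled setting.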

2606.04922 2026-06-04 cs.CV cs.AI cs.LG 版本更新

Geometry-Aware Distillation for Prompt Tuning Biomedical Vision-Language Models

几何感知蒸馏用于提示调优生物医学视觉-语言模型

Tran Dinh Tien, Zhiqiang Shen

发表机构 * Department of Machine Learning(机器学习系) Mohamed bin Zayed University of Artificial Intelligence(Mohamed bin Zayed人工智能大学)

AI总结 提出Omni-Geometry知识蒸馏(OGKD)框架,通过注入类别关系结构到教师模型,生成保留真实标签同时尊重类间几何的方向性目标,并设计全局几何感知蒸馏(GAD)和标签引导几何蒸馏(LGD)损失,在11个医学数据集上平均提升准确率1.7%-2.8%。

Comments Preprint. Code is available at https://github.com/tientrandinh/OGKD

详情
AI中文摘要

当前基于提示和适配器的视觉-语言模型(VLM)调优方法在医学影像中具有吸引力,因为临床数据敏感性倾向于冻结骨干网络且标注有限。然而,这些方法通常仅优化真实类别,将所有其他类别视为同等错误,忽略了临床上有意义的类别关系,并在有限监督设置下产生不稳定的决策边界。我们提出了Omni-Geometry知识蒸馏(OGKD),一种新框架,将类别关系结构注入教师模型,以生成保留真实标签同时尊重类间几何的方向性目标。利用这些目标,我们开发了两种蒸馏损失:全局几何感知蒸馏(GAD)作用于全局图像标记,标签引导几何蒸馏(LGD)将相同的几何应用于注意力补丁标记以改善细粒度对齐。在11个广泛使用的医学数据集上进行的基础到新类和少样本评估的综合实验和分析中,我们的OGKD实现了显著更好的性能,在所有先前最先进的VLM适应方法上平均绝对增益为1.7%-2.8%。它还能稳健地泛化到未见类别,并产生比其他方法更可靠的预测。我们的代码可在https://github.com/tientrandinh/OGKD获取。

英文摘要

Current prompt-based and adapter-based tuning of vision-language models (VLMs) is attractive for medical imaging, where clinical data sensitivity favors frozen backbones and annotations are limited. However, these methods typically optimize only the ground-truth class, treating all other classes as equally incorrect, ignoring clinically meaningful class relations and yielding unstable decision boundaries in limited-supervision settings. We propose Omni-Geometry Knowledge Distillation (OGKD), a new framework that injects class-relation structure into the teacher to produce directional targets that preserve the ground truth while respecting inter-class geometry. Using these targets, we develop two distillation losses: Global Geometry-Aware Distillation (GAD) operates on the global image token, and Label-Guided Geometry Distillation (LGD) applies the same geometry to attentive patch tokens to improve fine-grained alignment. Across comprehensive experiments and analyses on 11 widely-used medical datasets for base-to-novel and few-shot evaluations, our OGKD achieves substantially better performance, consistently improving accuracy by an average absolute gain of 1.7%-2.8% over all prior state-of-the-art VLM adaptation counterparts. It also robustly generalizes to unseen classes and yields more reliable predictions than other approaches. Our code is available at https://github.com/tientrandinh/OGKD.

2606.04906 2026-06-04 cs.CL cs.AI 版本更新

'Your AI Text is not Mine': Redefining and Evaluating AI-generated Text Detection under Realistic Assumptions

“你的AI文本不是我的”:在现实假设下重新定义和评估AI生成文本检测

Nils Dycke, Marina Sakharova, Nico Daheim, Iryna Gurevych

发表机构 * Ubiquitous Knowledge Processing Lab (UKP Lab), Department of Computer Science, Technical University of Darmstadt(通用知识处理实验室(UKP实验室),计算机科学系,达姆施塔特技术大学) National Research Center for Applied Cybersecurity ATHENE, Germany(应用网络安全国家研究中心ATHENE,德国) Zuse School ELIZA(祖斯学校ELIZA)

AI总结 针对AI生成文本检测领域缺乏统一有害使用定义的问题,本文系统定义了多种AI生成文本概念,构建了包含详细生成过程注释的人机协作文本基准AITDNA,并评估了多种检测器在不同概念下的表现。

详情
AI中文摘要

尽管普遍认为AI生成的文本会带来广泛的社会风险,但在AI生成文本检测文献中,对于什么构成有害使用并没有共同的理解。相反,现有的数据集和方法往往定义自己的标准并做出自己的假设,有时是隐含的,而且通常只与真实世界的需求和应用程序松散相关。为了解决这一差距,我们在此系统地定义了AI生成文本的各种概念及其特征。为了研究这些,我们收集了AITDNA——一个全新的人机协作文本基准,其中标注了详细的生成过程信息,如整个编辑和AI交互历史。我们评估了各种机器生成文本检测器,发现它们通常只在特定概念下表现良好,而不能作为广泛的检测器。我们公开发布代码和数据。

英文摘要

Although it is generally agreed that AI-generated text poses a broad societal risk, there is no common understanding in the AI-generated text detection literature of what constitutes harmful use. Rather, existing datasets and approaches often define their own criteria and make their own assumptions, sometimes implicitly, and often only loosely related to real-world needs and applications. To address this gap, we systematically define various notions of AI-generated text and their characteristics. To study these, we collect AITDNA, a new benchmark of human-machine co-constructed texts that is annotated with detailed genesis information, such as the entire edit and AI-interaction history. We benchmark various machine-generated text detectors and find that they often only perform well for specific notions but not as broad detectors. We release code and data publicly.

2606.04903 2026-06-04 cs.LO cs.AI cs.MA cs.PL 版本更新

Provably Auditable and Safe LLM Agents from Human-Authored Ontologies

基于人类编写本体的可审计且安全的LLM智能体

Aaron Sterling

发表机构 * Thistleseeds

AI总结 提出Agentic Redux架构,通过类型化λ演算证明其在适当领域上的执行语义正确且决策可审计,并引入本体优先的智能体设计方法。

详情
AI中文摘要

我们介绍了LLM智能体架构Agentic Redux,旨在用于需要线性可审计性的非平凡问题领域。使用类型化λ演算,我们证明了在适当领域上运行时,Agentic Redux的执行在语义上保证正确,所有决策记录在仅追加的分类账中。我们提出了两个生产级领域:医疗账单合规性和安全漏洞披露。支持两个领域上运行的Agentic Redux的工作代码可在配套代码仓库中找到。我们还引入了本体优先的智能体设计方法,这是一种在问题领域上创建智能体框架的方法,其中人类专家使用基本形式本体对问题领域进行本体化,然后分配LLM推导出智能体和人在回路中可以扮演的角色,以处理该领域中的问题。

英文摘要

We introduce the LLM agent architecture Agentic Redux, intended for use with nontrivial problem domains that require linear auditability. Using the typed lambda calculus, we prove that, when run on appropriate domains, Agentic Redux executions are semantically guaranteed to be correct, with all decisions recorded in an append-only ledger. We present two production-grade appropriate domains: healthcare billing compliance and security vulnerability disclosure. Working code for Agentic Redux run on both domains is available in a supporting code repository. We also introduce Ontology-First Agent Design, a methodology for creating agent frameworks on a problem domain, in which a human expert ontologizes the problem domain with Basic Formal Ontology and then assigns an LLM to derive roles that agents and humans-in-the-loop can fill in order to work the problems in the domain.
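To make the phrase "append-only ledger" concrete, here is one common way such a record can be made tamper-evident. It is not the Agentic Redux design, which the abstract does not describe at this level: the hash chaining, the `Ledger` class, and the shape of a decision record are all assumptions for illustration.

```python
import hashlib
import json
from typing import Any

GENESIS = "0" * 64  # Placeholder "previous hash" for the first entry.


class Ledger:
    """Append-only list of decisions; each entry commits to the one before it."""

    def __init__(self) -> None:
        self._entries: list[dict[str, str]] = []

    def append(self, decision: dict[str, Any]) -> str:
        prev = self._entries[-1]["hash"] if self._entries else GENESIS
        payload = json.dumps(decision, sort_keys=True)  # Stable serialization.
        digest = hashlib.sha256((prev + payload).encode("utf-8")).hexdigest()
        self._entries.append({"prev": prev, "payload": payload, "hash": digest})
        return digest

    def verify(self) -> bool:
        """Recompute the chain; any edited or reordered entry breaks it."""
        prev = GENESIS
        for entry in self._entries:
            expected = hashlib.sha256((prev + entry["payload"]).encode("utf-8")).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

An auditor can replay the entries in order and call `verify()`; a change to any earlier decision invalidates every later hash.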

2606.04881 2026-06-04 cs.CV cs.AI 版本更新

DiverAge: Reliable Pluralistic Face Aging with Cross-Age Identity Relation Guidance

DiverAge: 基于跨年龄身份关系引导的可靠多元人脸老化

Yueying Zou, Peipei Li, Qianrui Teng, Dianyan Xu, Zekun Li

发表机构 * School of Artificial Intelligence, Beijing University of Posts and Telecommunications(人工智能学院,北京邮电大学) School of Computer Science, University of California, Santa Barbara(计算机科学学院,加州大学圣芭芭拉分校)

AI总结 提出基于扩散自编码的分层多元人脸老化框架DiverAge,通过随机扩散解码和年龄条件语义调制保持外观多样性,并引入跨年龄身份关系调节器(CARR)在推理时引导去噪,以提升序列级有序可靠性。

Comments 11 pages,10 figures, 5 tables

详情
AI中文摘要

人脸老化在长期生物特征分析、跨年龄身份验证和法医身份分析中扮演重要角色。由于同一主体因遗传、环境和生活方式等因素在目标年龄可能呈现多种合理外观,人脸老化本质上是一个一对多的生成问题。然而,仅有多元性不足以实现可靠的人脸老化:模型应在每个年龄组内提供外观级别的候选多样性,同时跨有序年龄组保持序列级别的有序可靠性。现有的确定性老化方法可以合成视觉上合理的年龄增长人脸,但通常缺乏随机多样性。相比之下,多元老化方法引入局部外观变化,但往往未能明确调控完整老化序列的身份演化。本文提出基于扩散自编码的分层多元人脸老化框架DiverAge。DiverAge通过随机扩散解码和年龄条件语义调制保持外观级多样性。为提升序列级可靠性,我们引入跨年龄身份关系调节器(CARR),一种推理时引导策略,联合去噪多个目标年龄组。CARR由从真实同身份跨年龄对估计的跨年龄身份相似性(CIS)先验引导,通过单边采样时引导抑制过度的跨年龄身份漂移,无需修改训练目标或引入额外可训练参数。实验表明,DiverAge在保持身份保留、年龄准确性、图像质量和外观级多样性的同时,提升了序列级有序可靠性。

英文摘要

Face aging plays an important role in long-term biometric analysis, cross-age identity verification, and forensic identity analysis. Since the same subject may exhibit multiple plausible appearances at a target age due to genetic, environmental, and lifestyle factors, face aging is inherently a one-to-many generation problem. However, pluralism alone is insufficient for reliable face aging: a model should provide appearance-level candidate diversity within each age group while maintaining sequence-level ordinal reliability across ordered age groups. Existing deterministic aging methods can synthesize visually plausible age-progressed faces, but usually lack stochastic diversity. In contrast, pluralistic aging methods introduce local appearance variations, but often fail to explicitly regulate the identity evolution of the full aging sequence. In this paper, we propose DiverAge, a hierarchical pluralistic face aging framework based on diffusion autoencoding. DiverAge preserves appearance-level diversity through stochastic diffusion decoding and age-conditioned semantic modulation. To improve sequence-level reliability, we introduce a Cross-age Identity Relation Regulator (CARR), an inference-time guidance strategy that jointly denoises multiple target age groups. CARR is guided by a Cross-age Identity Similarity (CIS) prior estimated from real same-identity cross-age pairs, and suppresses excessive cross-age identity drift through one-sided sampling-time guidance without modifying the training objective or introducing extra trainable parameters. Experiments demonstrate that DiverAge improves sequence-level ordinal reliability while maintaining identity preservation, age accuracy, image quality, and appearance-level diversity.

2606.04877 2026-06-04 cs.LO cs.AI cs.PL cs.SE 版本更新

Abduction Prover in Isabelle/HOL

Isabelle/HOL中的溯因证明器

Yutaka Nagashima, Daniel Sebastian Goc

发表机构 * Institute of Computer Science, the Czech Academy of Sciences(捷克科学院计算机科学研究所)

AI总结 针对基于表达逻辑的证明助手自动化程度低的问题,提出了一种利用溯因推理识别有用猜想并自动构建证明脚本的Isabelle/HOL溯因证明器。

Comments Accepted to Isabelle2026

详情
AI中文摘要

基于表达逻辑的证明助手在证明搜索方面自动化程度有限,增加了基于证明助手的形式化验证成本。我们通过引入Isabelle/HOL的溯因证明器来解决这个问题。给定一个具有挑战性的证明目标,溯因证明器通过使用溯因推理识别有用的猜想,为该目标构建证明脚本。

英文摘要

Proof assistants based on expressive logics suffer from limited automation for proof search, raising the cost of formal verification based on proof assistants. We address this problem by introducing the Abduction Prover for Isabelle/HOL. Given a challenging proof goal, the Abduction Prover constructs a proof script for the goal by identifying useful conjectures using abductive reasoning.

2606.04867 2026-06-04 cs.AI 版本更新

AICompanionBench: Benchmarking LLMs-as-Judges for AI Companion Safety

AICompanionBench: 以LLM为评判标准的AI伴侣安全基准测试

Yanjing Ren, Reza Ebrahimi, TengTeng Ma

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

AI总结 本文提出AICompanionBench,首个公开的细粒度安全风险标注的人机伴侣对话基准数据集,并评估20个LLM在检测不安全交互中的表现,发现强模型在显式有害内容上准确率高,但难以识别隐式不安全交互。

详情
AI中文摘要

随着Replika和Character.AI等AI伴侣平台的快速增长,对不安全的人机交互的担忧日益加剧。本研究引入了AICompanionBench,据我们所知,这是第一个公开可用的人机伴侣对话基准数据集,并标注了细粒度的安全风险类别。该数据集包含从Reddit收集的2,123个真实Replika对话,并通过人机协作在九个类别上进行标注:性行为、反社会行为、身体攻击、言语攻击、药物滥用、自伤和自杀、控制、操纵和无害。利用该基准,我们在LLM作为评判者的框架下评估了20个最先进的开源和闭源LLM,用于检测不安全交互。结果显示模型性能差异显著,较强的模型实现了较高的整体准确性,但在操纵等细微类别以及被错误识别为有害的无害对话中仍存在困难。我们的发现表明,尽管当前的LLM能有效检测显式有害内容,但在识别隐式不安全交互方面仍然有限。总体而言,我们的工作为AI伴侣安全研究贡献了一个新的基准数据集,并为使用LLM监控AI伴侣系统提供了见解。该数据集公开于:https://github.com/anonymousresearcher2026/AICompanionBench/blob/main/AICompanionBench.xlsx

英文摘要

As AI companion platforms such as Replika and Character.AI rapidly grow, concerns about unsafe human-AI interactions have intensified. This study introduces AICompanionBench, to our knowledge the first publicly available benchmark dataset of human-AI companion conversations annotated with fine-grained safety risk categories. The dataset contains 2,123 real-world Replika conversations collected from Reddit and annotated through human-AI collaboration across nine categories: sexual behavior, antisocial behavior, physical aggression, verbal aggression, substance abuse, self-harm and suicide, control, manipulation, and no-harm. Using this benchmark, we evaluate 20 state-of-the-art open-source and closed-source LLMs under an LLM-as-judge framework for detecting unsafe interactions. Results show substantial variation in model performance, with stronger models achieving high overall accuracy but still struggling with nuanced categories such as manipulation, as well as benign conversations that are incorrectly identified as harmful. Our findings suggest that while current LLMs can effectively detect explicit harmful content, they remain limited in identifying implicit unsafe interactions. Overall, our work contributes a new benchmark dataset for AI companionship safety research and offers insights into monitoring AI companion systems using LLMs. The dataset is publicly available at: https://github.com/anonymousresearcher2026/AICompanionBench/blob/main/AICompanionBench.xlsx

2606.04860 2026-06-04 cs.LG cs.AI Updated version

Learning Empirically Admissible Neural Heuristics for Combinatorial Search

学习组合搜索的经验可容许神经启发式

Siddharth Sahay

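Affiliations * Independent Researcher

AI summary For combinatorial search problems, proposes a validation-calibrated framework that combines an admissible Bellman operator with an asymmetric loss function to train empirically admissible neural heuristics, preserving path optimality in practice while substantially reducing search node expansions.

Comments 13 pages, 3 figures, 2 tables, 1 algorithm

Details
AI Chinese abstract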

寻找诸如魔方、滑动拼图游戏和Lights Out等组合谜题的最优解路径仍然是人工智能中的经典挑战。启发式搜索算法(如A*)仅在使用可容许启发式(即从不高估真实剩余代价的启发式)时才能保证路径最优性。深度强化学习方法(如DeepCubeA)训练深度神经网络来近似代价到目标的启发式。然而,标准均方误差训练经常产生高估,违反可容许性并损害解的最优性。在本文中,我们介绍了一个可泛化的框架,用于学习验证校准的可容许神经启发式。我们使用低估的可容许贝尔曼算子结合非对称损失函数来训练价值网络,以惩罚高估。为了考虑残差神经函数逼近误差,我们提出了一个基于验证打乱计算的校准安全偏移量。我们证明,在校准的神经启发式下,在评估协议下未观察到可容许性违反,并在实践中保持了路径最优性,同时与标准分析基线相比,在2x2魔方上减少了高达83.0%的搜索节点扩展,在3x3 Lights Out网格上减少了19.9%,在8-Puzzle上减少了1.9%。

English abstract

Finding optimal solution paths for combinatorial puzzles like the Rubik's Cube, sliding tile puzzles, and Lights Out remains a classical challenge in artificial intelligence. Heuristic search algorithms, such as A*, guarantee path optimality only when using an admissible heuristic, that is, one that never overestimates the true remaining cost-to-go. Deep reinforcement learning (RL) methods like DeepCubeA train deep neural networks to approximate cost-to-go heuristics. However, standard mean-squared error (MSE) training regularly yields overestimations, violating admissibility and compromising solution optimality. In this paper, we introduce a generalizable framework for learning validation-calibrated admissible neural heuristics. We train a value network using an underestimating Admissible Bellman Operator combined with an Asymmetric Loss function to penalize overestimation. To account for residual neural function approximation errors, we propose a post-hoc calibration safety offset computed over validation scrambles. We demonstrate that our calibrated neural heuristics achieve no observed admissibility violations under the evaluation protocol and preserve path optimality in practice while reducing search node expansions by up to 83.0% on a 2 by 2 Rubik's Cube, 19.9% on a 3 by 3 Lights Out grid, and 1.9% on an 8-Puzzle compared to standard analytical baselines.
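The two ingredients named in the abstract, an asymmetric loss that penalizes overestimation and a post-hoc safety offset computed on validation states, can be illustrated with a minimal sketch. The weight `over_weight`, the "largest observed overestimate" rule for the offset, and all numbers below are assumptions made for illustration; the abstract does not give the paper's exact loss or calibration formula.

```python
def asymmetric_squared_loss(pred, target, over_weight=10.0):
    """Squared error that weights overestimates (pred > target) more heavily.

    over_weight is a hypothetical hyperparameter, not a value from the paper.
    """
    err = pred - target
    weight = over_weight if err > 0 else 1.0
    return weight * err * err


def calibration_offset(val_preds, val_costs):
    """Largest overestimate seen on a non-empty validation set (0 if none)."""
    return max(0.0, max(p - c for p, c in zip(val_preds, val_costs)))


def calibrated_heuristic(pred, offset):
    """Shift the raw prediction down by the offset and clamp at zero."""
    return max(0.0, pred - offset)


val_preds = [3.2, 5.9, 7.4, 2.0]  # hypothetical network outputs
val_costs = [3.0, 6.0, 7.0, 2.0]  # hypothetical true costs-to-go
offset = calibration_offset(val_preds, val_costs)
calibrated = [calibrated_heuristic(p, offset) for p in val_preds]
```

After the shift, no calibrated value exceeds its true cost on the validation states, which mirrors the "no observed violations" wording: the guarantee is empirical and holds only for the states that were checked.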

2606.04850 2026-06-04 cs.LG cs.AI cs.AR math.OC Updated version

Uncertainty-Aware End-to-End Co-Design of Neural Network Processors: From Training and Mapping to Fabrication

不确定性感知的神经网络处理器端到端协同设计:从训练、映射到制造

Yuyang Du, Yujun Huang, Gioele Zardini

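AI summary Proposes a unified framework grounded in monotone co-design theory that achieves end-to-end co-design of neural network processors through four interoperable design blocks (network training, chip mapping, wafer-level fabrication, and compute resource allocation), and introduces Confidence (the inverse of success probability) as an explicit, optimizable resource for handling uncertainty.

Comments 14 pages

Details
AI Chinese abstract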

设计神经网络处理器是一个端到端的协同设计问题:网络架构和训练预算决定了推理工作负载;硬件映射决策决定了芯片面积、延迟和能量;这些特性决定了制造良率和生产成本。在实践中,这些决策是在不同阶段做出的,现有的协同设计方法与特定算法紧密耦合,使得改进一个组件而不重新设计整个流水线变得困难。本文提出了一个基于单调协同设计理论的统一框架,该框架组合了四个可互操作的设计模块,涵盖网络训练、芯片映射、晶圆级制造和计算资源分配。每个模块仅向系统其余部分暴露功能-资源接口,因此任何模块都可以在不改变其他模块结构的情况下进行优化。一个核心贡献是对不确定性的处理:该框架没有将随机结果简化为点估计,而是引入置信度(成功概率的倒数)作为与成本、时间和功耗并列的显式可优化资源。三个案例研究验证了该方法。第一个案例恢复了跨异构应用场景的帕累托最优实现。第二个案例确认置信度作为一个连续可调的设计旋钮,而非事后诊断指标。第三个案例表明,改进单个模块的实现集会自动传播到全局帕累托前沿,而无需修改协同设计图。

English abstract

Designing a neural network processor is an end-to-end co-design problem: network architecture and training budget determine the inference workload; hardware mapping decisions determine chip area, latency, and energy; and these characteristics govern fabrication yield and manufacturing cost. In practice, these decisions are made in separate stages, and existing co-design methodologies are tightly coupled to specific algorithms, making it difficult to improve one component without reworking the entire pipeline. This paper presents a unified framework, grounded in monotone co-design theory, that composes four interoperable design blocks spanning network training, chip mapping, wafer-level fabrication, and compute resource allocation. Each block exposes only a functionality-resource interface to the rest of the system, so any block can be refined without structural changes elsewhere. A central contribution is the treatment of uncertainty: rather than collapsing stochastic outcomes into point estimates, the framework introduces Confidence, the inverse of success probability, as an explicit and optimizable resource alongside cost, time, and power. Three case studies validate the approach. The first recovers Pareto-optimal implementations across heterogeneous application scenarios. The second confirms that Confidence functions as a continuously tunable design knob rather than a post-hoc diagnostic. The third demonstrates that improving a single block's implementation set automatically propagates to the global Pareto front, without modifying the co-design diagram.

2606.04823 2026-06-04 cs.AI cs.CL cs.MA Updated version

R-APS: Compositional Reasoning and In-Context Meta-Learning for Constrained Design via Reflective Adversarial Pareto Search

R-APS:基于反思性对抗帕累托搜索的组合推理与上下文元学习用于约束设计

João Pedro Gandarela, Thiago Rios, Stefan Menzel, André Freitas

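Affiliations * Idiap Research Institute; École Polytechnique Fédérale de Lausanne; Honda Research Institute Europe; Department of Computer Science, University of Manchester; National Biomarker Centre, CRUK-MI, University of Manchester

AI summary Proposes R-APS, which jointly addresses error propagation, unevaluated worst-case perturbations, and never-invalidated knowledge for LLMs in agentic settings through reasoning-mode decomposition, staged compositional reasoning, sensitivity-guided adversarial stress-testing, and meta-inductive rule extraction, yielding tighter robustness certificates and fewer iterations to first admission on planar mechanism synthesis.

Details
AI Chinese abstract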

大型语言模型(LLM)在开放式任务上表现流畅,但在需要规划、使用工具和长时间行动的代理设置中,流畅性并不能保证可靠交付。我们将这一差距归因于三个耦合的结构性失败:错误传播而不定位、最坏情况扰动未评估、积累的知识从未失效。我们认为这些失败有一个共同根源:溯因、反事实、元归纳、纠正和归纳推理将共享上下文拉向不相容的方向。我们提出反思性对抗帕累托搜索(R-APS),据我们所知,这是第一种通过推理模式分解联合解决所有三个失败的方法,为每种推理模式分配其自己的上下文,并在三个时间尺度上协调交互:带有类型化验证批评者的分阶段组合推理(失败定位)、作为第一类帕累托目标的敏感性引导反事实压力测试(鲁棒性)、以及带有显式失效的元归纳规则提取(持久记忆)。R-APS无需微调,仅通过结构化协议设计在冻结的LLM上运行。我们在平面机构综合(机器人、假肢、机械设计)上评估,每个候选解由运动学求解器检查。在32个目标轨迹上,R-APS提供的鲁棒性证书比均匀扰动基线紧3.5倍,首次接纳迭代速度提高46%,Chamfer距离比Enum+GA减少2.1倍,同时联合控制杆数和最坏情况鲁棒性。小型4B推理专用模型在协议内与通用70B骨干模型竞争,表明结构化协议可以部分抵消模型规模。

English abstract

Large language models (LLMs) are fluent on open-ended tasks, yet in agentic settings, where a system must plan, use tools, and act over extended horizons, fluency does not ensure reliable delivery. We trace this gap to three coupled structural failures: errors propagate without localization, worst-case perturbations go unevaluated, and accumulated knowledge is never invalidated. We argue these share a root cause: abductive, counterfactual, meta-inductive, corrective, and inductive reasoning pull a shared context in incompatible directions. We introduce Reflective Adversarial Pareto Search (R-APS), to our knowledge the first method addressing all three failures jointly via reasoning-mode decomposition, allocating each reasoning mode its own context and orchestrating interaction across three timescales: staged compositional reasoning with a typed validation critic (failure localization), sensitivity-guided counterfactual stress-testing as a first-class Pareto objective (robustness), and meta-inductive rule extraction with explicit invalidation (persistent memory). R-APS requires no fine-tuning and operates on a frozen LLM purely via structured protocol design. We evaluate on planar mechanism synthesis (robotics, prosthetics, mechanical design), with every candidate checked by a kinematic solver. On 32 target trajectories, R-APS delivers robustness certificates 3.5x tighter than uniform-perturbation baselines, 46% faster iterations-to-first-admission, and 2.1x Chamfer-distance reduction over Enum+GA while jointly controlling bar-count and worst-case robustness. Small 4B reasoning-specialized models prove competitive with general-purpose 70B backbones inside the protocol, suggesting structured protocols can partially offset model scale.

2606.04820 2026-06-04 cs.CV cs.AI cs.LG Updated version

OA-CutMix: Correcting the Label Bias of CutMix

OA-CutMix:纠正CutMix的标签偏差

Tobias Christian Nauen, Stanislav Frolov, Federico Raue, Brian B. Moser, Andreas Dengel

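Affiliations * RPTU University Kaiserslautern-Landau; German Research Center for Artificial Intelligence (DFKI)

AI summary To address the semantic bias caused by CutMix assigning labels according to patch area, proposes OA-CutMix, which uses segmentation masks to assign labels by visible object area and improves classification accuracy without changing the image mixing procedure.

Details
AI Chinese abstract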

CutMix已成为事实上的标准混合增强方法,但其标签分配基于一个有缺陷的假设:粘贴补丁的面积忠实地反映了其对混合图像的语义贡献。然而,在实践中,补丁经常落在背景区域,将标签信用分配给其目标不可见的类别。CutMix标签与语义目标面积的平均差异为21.5%。在17%的样本中,一张图像贡献了零个可见目标像素,却获得了非零的标签权重。我们提出目标感知CutMix(OA-CutMix),通过用从预计算分割掩码中导出的权重替换基于面积的CutMix权重来纠正这种偏差,根据每个图像贡献给混合图像的可见目标面积比例分配标签。图像混合过程完全保持不变。我们在4种架构和6个数据集上评估了OA-CutMix与10多种静态和动态混合方法的性能。OA-CutMix在所有任务中始终达到最高准确率,甚至优于动态混合方法,但训练时间成本仅为其一小部分。对于小目标,改进最大,因为CutMix的标签偏差最大。因此,纠正标签足以匹配或超过修改图像混合算法的方法的性能。

English abstract

CutMix has become the de facto standard mixing augmentation, yet its label assignment rests on a flawed assumption: that the area of the pasted patch faithfully reflects its semantic contribution to the mixed image. In practice, however, patches frequently land on background regions, assigning label credit to classes whose objects are not visible. The mean discrepancy between the CutMix label and the semantic object area is 21.5%. In 17% of samples, an image contributes zero visible object pixels yet receives nonzero label weight. We propose Object-Aware CutMix (OA-CutMix), which corrects this bias by replacing the area-based CutMix weight with one derived from precomputed segmentation masks, assigning labels in proportion to the visible object area each image contributes to the mix. The image mixing procedure is left entirely unchanged. We evaluate OA-CutMix against 10+ static and dynamic mixing methods across 4 architectures and 6 datasets. OA-CutMix consistently achieves the highest accuracy over all tasks, outperforming even dynamic mixing methods at a fraction of their training-time cost. Improvements are largest for small objects, where the label bias from CutMix is greatest. Thus, correcting the label is sufficient to match or exceed the performance of methods modifying the image mixing algorithm.
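The label bias described above can be made concrete with a toy example that compares the area-based CutMix weight with a weight based on visible object pixels. This is only a sketch under stated assumptions: the function name `cutmix_weights`, the binary-mask representation, and the fallback to the area weight when no object pixels are visible are all hypothetical, and the paper's exact weighting formula may differ.

```python
def cutmix_weights(mask_a, mask_b, box):
    """Return (area_based, object_aware) label weights for the pasted image B.

    mask_a, mask_b: 2D lists of 0/1 object masks with the same shape.
    box: (top, left, bottom, right), half-open, the region of A replaced by B.
    """
    h, w = len(mask_a), len(mask_a[0])
    top, left, bottom, right = box

    def inside(r, c):
        return top <= r < bottom and left <= c < right

    # Standard CutMix: label weight of B equals the pasted area fraction.
    area_b = (bottom - top) * (right - left) / (h * w)
    # Object-aware: count object pixels that remain visible in the mix.
    obj_b = sum(mask_b[r][c] for r in range(h) for c in range(w) if inside(r, c))
    obj_a = sum(mask_a[r][c] for r in range(h) for c in range(w) if not inside(r, c))
    total = obj_a + obj_b
    # Falling back to the area weight when nothing is visible is an assumption.
    object_aware_b = obj_b / total if total else area_b
    return area_b, object_aware_b


# Both objects sit in the top-left corner; the patch is pasted bottom-right.
mask = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
area_b, object_aware_b = cutmix_weights(mask, mask, (2, 2, 4, 4))
```

Here the pasted patch covers a quarter of the image, so CutMix gives image B a label weight of 0.25, although the patch contains only background; the object-aware weight for B is 0.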

2606.04816 2026-06-04 cs.AI cs.LG Updated version

Beyond Objective Equivalence: Constraint Injection for LLM-Based Optimization Modeling on Vehicle Routing Problems

超越目标等价性:基于LLM的车辆路径问题优化建模的约束注入

Xizi Luo, Changhong He, Dongdong Geng, Chenggong Shi, Yu Mei

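Affiliations * Beihang University; Baidu Inc.

AI summary To address the problem that LLMs may add spurious constraints or omit required ones in constraint-dense operations research problems, proposes constraint injection, which combines with differential testing to form a dual verifier, and validates it on vehicle routing problems.

Comments 28 pages

Details
AI Chinese abstract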

大型语言模型(LLM)越来越多地将自然语言优化问题转化为可执行的求解器代码。然而,对于约束密集的运筹学(OR)问题,现有的数据过滤和训练流程主要依赖于目标等价性信号,如差分测试和答案一致性,这些信号允许程序在测试实例上添加虚假约束或静默省略必要约束,只要这些约束在测试实例上非绑定。我们提出约束注入,利用可行探针暴露虚假过度约束,利用单约束违反探针揭示静默约束遗漏。结合差分测试,它形成一个双重验证器。我们在车辆路径问题(VRPs)上实例化并评估该方法,VRPs是代表性的约束密集组合优化测试平台,具有耦合的操作约束。我们开发了VRPCoder,一个8B端到端模型,将自然语言VRP场景转化为Gurobi脚本,并附带一个专家验证的VRP基准套件,涵盖21种变体。该验证器在数据合成期间用作拒绝采样过滤器,在组相对策略优化(GRPO)中用作每次rollout的奖励。在四个VRP基准上,VRPCoder-GRPO达到93%的平均Pass@1,在三个基准上优于Gemini-3.1-Pro Preview,超过Claude-Sonnet-4.5平均28个百分点,并超过先前的OR-LLM平均78个百分点。

English abstract

Large language models (LLMs) increasingly translate natural-language optimization problems into executable solver code. Yet for constraint-dense operations research (OR) problems, existing data-filtering and training pipelines largely rely on objective-equivalence signals such as differential testing and answer agreement, which a program can pass while adding spurious constraints or silently omitting required ones, whenever those constraints are non-binding on the tested instance. We propose constraint injection, which uses feasible probes to expose spurious over-constraint and one-constraint-violating probes to reveal silent constraint omission. Combined with differential testing, it forms a dual verifier. We instantiate and evaluate it on vehicle routing problems (VRPs), a representative constraint-dense combinatorial optimization testbed with coupled operational constraints. We develop VRPCoder, an 8B end-to-end model that translates natural-language VRP scenarios into Gurobi scripts, together with an expert-verified VRP benchmark suite covering 21 variants. The verifier is reused as a rejection-sampling filter during data synthesis and as a per-rollout reward in group relative policy optimization (GRPO). Across four VRP benchmarks, VRPCoder-GRPO reaches 93% average Pass@1, outperforms Gemini-3.1-Pro Preview on three benchmarks, exceeds Claude-Sonnet-4.5 by 28 average points, and surpasses prior OR-LLMs by 78 average points.
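The probing idea can be sketched in a few lines: feasible probes should be accepted by the generated model, and probes that break exactly one required constraint should be rejected. In this illustration the generated model is reduced to a plain feasibility predicate; the function `verify_by_injection`, the dictionary-based probes, and the toy capacity and duration constraints are assumptions, whereas the paper's verifier works on Gurobi scripts in a way the abstract does not detail.

```python
def verify_by_injection(candidate_accepts, feasible_probes, violating_probes):
    """Flag spurious constraints and silently omitted constraints.

    candidate_accepts: callable(probe) -> bool, the generated model's
        feasibility check (a stand-in for running solver code).
    feasible_probes: probes that satisfy every required constraint.
    violating_probes: probes that break exactly one required constraint.
    """
    # A rejected feasible probe means the candidate over-constrains.
    spurious = [p for p in feasible_probes if not candidate_accepts(p)]
    # An accepted violating probe means a required constraint is missing.
    omitted = [p for p in violating_probes if candidate_accepts(p)]
    return {"spurious": spurious, "omitted": omitted,
            "passed": not spurious and not omitted}


# Hypothetical candidate that enforces capacity but forgets the duration limit.
def candidate_accepts(route):
    return route["load"] <= route["capacity"]


feasible = [{"load": 8, "capacity": 10, "duration": 5, "max_duration": 6}]
violating = [
    {"load": 12, "capacity": 10, "duration": 5, "max_duration": 6},  # capacity broken
    {"load": 8, "capacity": 10, "duration": 7, "max_duration": 6},   # duration broken
]
report = verify_by_injection(candidate_accepts, feasible, violating)
```

The candidate would pass an objective-equivalence test on any instance where the duration limit is non-binding, yet the second violating probe exposes the omission.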

2606.04815 2026-06-04 cs.LG cs.AI Updated version

Learning While Acting: A Skill-Enhanced Test-Time Co-Evolution Framework for Online Lifelong Learning Agents

边行动边学习:面向在线终身学习智能体的技能增强测试时协同进化框架

Bo Mao, Jie Zhou, Yutao Yang, Xin Li, Xian Wei, Qin Chen, Xingjiao Wu, Liang He

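Affiliations * School of Computer Science and Technology, East China Normal University; Shanghai AI Laboratory; Software Engineering Institute, East China Normal University

AI summary Proposes the LifeSkill framework, which uses verifier-guided skill learning and online skill internalization so that LLM agents continuously internalize feedback at test time, improving lifelong learning performance.

Details
AI Chinese abstract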

终身学习对于在动态、交互环境中运行的大型语言模型(LLM)智能体至关重要。然而,现有的用于长时任务的终身学习智能体通常依赖于离散技能或过去经验检索,并在推理期间使用静态参数,这阻止了它们像人类学习者一样持续内化测试时反馈。为弥补这一差距,我们提出了技能增强测试时协同进化(LifeSkill),一个用于在线终身学习智能体的两阶段强化学习框架。具体来说,我们设计了验证器引导的技能学习,通过根据多个技能条件策略滚动的平均验证器成功率奖励候选技能,解决了技能提取缺乏直接监督的问题,鼓励模型生成对解决任务有用的技能,而不仅仅是文本上合理的技能。此外,我们引入了在线技能内化,通过在测试时交互期间将技能条件轨迹转化为奖励信号,持续改进策略模型。这使得智能体能够将推理能力直接内化到其参数中,避免了经验检索的上下文膨胀。在LifelongAgentBench上的实验表明,与现有终身学习智能体基线相比,LifeSkill将平均性能提高了7个绝对百分点。

English abstract

Lifelong learning is essential for Large Language Model (LLM) agents operating in dynamic, interactive environments. However, existing lifelong learning agents for long-horizon tasks typically depend on retrieving discrete skills or past experiences while keeping parameters static during inference, which prevents them from continuously internalizing test-time feedback as human learners do. To bridge this gap, we propose Skill-enhanced Test-Time Co-Evolution (LifeSkill), a two-stage reinforcement learning framework for Online Lifelong Learning Agents. Specifically, we design Verifier-Guided Skill Learning, which addresses the lack of direct supervision for skill extraction by rewarding candidate skills according to the average verifier success of multiple skill-conditioned policy rollouts, encouraging the model to generate skills that are useful for solving tasks rather than merely plausible in text. Furthermore, we introduce Online Skill Internalization, which continuously improves the policy model during test-time interaction by transforming skill-conditioned trajectories into reward signals. This enables the agent to directly internalize reasoning capabilities into its parameters, avoiding the context bloat of experience retrieval. Experiments on LifelongAgentBench show that LifeSkill improves average performance by 7 absolute points compared with existing lifelong agent baselines.

2606.04807 2026-06-04 cs.AI cs.CL cs.CY cs.LG Updated version

BiasGRPO: Stabilizing Bias Mitigation in High-Variance Reward Landscapes via Group-Relative Policy Optimization

BiasGRPO:通过组相对策略优化在高方差奖励景观中稳定偏差缓解

Saket Reddy, Ke Yang, ChengXiang Zhai

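Affiliations * University of Illinois - Urbana-Champaign

AI summary Proposes the BiasGRPO framework, which uses Group Relative Policy Optimization (GRPO) to stabilize social-bias mitigation in large language models by normalizing rewards within each group of sampled completions, outperforming DPO and PPO.

Comments Accepted to Findings of the ACL

Details
AI Chinese abstract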

缓解大语言模型(LLMs)中的社会偏差提出了一个独特的对齐挑战:与可验证任务不同,偏差缺乏单一的真实标准,从而产生高方差、主观的奖励景观。先前的基于偏好的微调方法存在主要权衡:直接偏好优化(DPO)受限于离线训练中缺乏探索,而近端策略优化(PPO)由于潜在不可靠的评论家估计可能导致训练不稳定。在本文中,我们提出了BiasGRPO,一个使用组相对策略优化(GRPO)的框架,通过对一组采样完成进行奖励归一化来稳定对齐。通过用组相对基线替代价值函数,我们的方法在保持在线训练探索优势的同时减少了不稳定性。我们发现BiasGRPO在多个基准测试中优于DPO和PPO,表明其有效性。为了适应GRPO,我们综合扩展了一个涵盖多个领域和上下文的数据集。我们还创建并发布了一个定制的偏差奖励模型,该模型在有效指导生成的同时高度计算高效且避免知识退化,提供了一个可无缝集成到多目标RLHF流程中的宝贵资源。

English abstract

Mitigating social bias in Large Language Models (LLMs) presents a distinct alignment challenge: unlike verifiable tasks, bias lacks a single ground truth, creating a high-variance, subjective reward landscape. Previous preference-based fine-tuning methods have major trade-offs: Direct Preference Optimization (DPO) is limited by the lack of exploration inherent in offline training, while Proximal Policy Optimization (PPO) can lead to training instability due to potentially unreliable critic estimates. In this paper, we propose BiasGRPO, a framework using Group Relative Policy Optimization (GRPO) to stabilize alignment by normalizing rewards across a group of sampled completions. By substituting the value function with a group-relative baseline, our approach reduces instability while maintaining the exploration benefits of online training. We find that BiasGRPO outperforms DPO and PPO across multiple benchmarks, indicating its effectiveness. To adapt GRPO, we synthetically extend a dataset spanning multiple domains and contexts. We also create and release a custom bias reward model that effectively guides generation while being highly compute-efficient and avoiding knowledge degradation, providing a valuable resource that can be seamlessly integrated into multi-objective RLHF pipelines.
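The group-relative baseline mentioned above replaces a learned value function with statistics of the rewards in one group of completions sampled for the same prompt. The sketch below shows the commonly used mean-and-standard-deviation normalization; the use of the population standard deviation, the `eps` constant, and the reward values are assumptions, and BiasGRPO's exact variant is not specified in the abstract.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is the reward's deviation from the group mean, scaled by
    the group standard deviation; eps guards against a zero-variance group.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical bias-reward scores for four sampled completions.
rewards = [0.2, 0.9, 0.4, 0.5]
advantages = group_relative_advantages(rewards)
```

Because the baseline is the group mean, advantages sum to approximately zero within each group, and a group whose completions all receive the same reward contributes no learning signal.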

2606.04806 2026-06-04 cs.CV cs.AI Updated version

NoRA: Evaluating Grounded Reasonableness in Visual First-person Normative Action Reasoning

NoRA: 评估视觉第一人称规范性动作推理中的基于事实的合理性

Sichao Li, Sai Ma, Daniel Kilov, Secil Yanik Guyot, Zhuang Li, Seth Lazar

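Affiliations * The University of Sydney; Australian National University; RMIT University; Johns Hopkins University

AI summary Proposes the NoRA benchmark, which uses fact-reason-action support graphs to evaluate whether multimodal models can generate reasonable actions and ground their reasoning in visible facts, and finds that current VLMs fall short in constructing the full reasonable action space and binding actions to the correct support.

Details
AI Chinese abstract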

LLM和智能系统越来越多地部署在社交环境中,使得规范能力对安全和适当行为至关重要。然而,现有方法要么仅在文本中评估规范性判断,要么将其简化为从固定候选动作集中选择。我们认为两者都不够。在实践中,智能体永远不会获得一个选项菜单;它们必须从头识别一个合理的动作,基于可见事实并由可检查的理由支持。我们引入了NoRA,一个视觉第一人称视频基准,要求模型生成候选的下一个动作,并通过显式的事实-理由-动作支持图来证明每个动作。该基准包含1,420个带注释的视频片段,包括HumanGold-190和LLMSilver-1230分割。每个实例通过动作对齐、事实基础和支持绑定进行评估,汇总为单一的基于事实的合理性分数。我们在直接、深思熟虑和结构化提示模式下对12个多模态系统进行了基准测试,发现当前的VLM经常能恢复合理的动作和相关的场景事实,但始终难以构建完整的合理动作空间并将所选动作绑定到正确的局部支持上。NoRA使这一差距可测量,将评估问题从模型是否能选择一个动作转变为是否能基于正确的可见理由证明一个适当的动作。

English abstract

LLMs and agentic systems are increasingly deployed in social environments, making normative competence critical for safe and appropriate behavior. However, existing approaches either assess normative judgment in text alone or reduce it to choosing among a fixed set of candidate actions. We argue both are insufficient. In practice, agents are never handed a menu of options; they must identify a reasonable action from scratch, grounded in visible facts and supported by inspectable reasons. We introduce NoRA, a visual first-person video benchmark that requires models to generate candidate next actions and justify each through an explicit fact-reason-action support graph. The benchmark comprises 1,420 annotated video clips, including HumanGold-190 and LLMSilver-1230 splits. Each instance is evaluated through action alignment, factual grounding, and support binding, aggregated into a single grounded reasonableness score. We benchmark 12 multimodal systems under direct, deliberate, and structured prompting regimes, finding that current VLMs frequently recover plausible actions and relevant scene facts, but consistently struggle to construct the full reasonable action space and bind selected actions to the correct local support. NoRA makes this gap measurable, shifting the evaluation question from whether a model can pick an action to whether it can justify an appropriate action for the right visible reasons.

2606.04781 2026-06-04 cs.AI cs.LG Updated version

AIP: A Graph Representation for Learning and Governing Agent Skills

AIP: 一种用于学习和治理智能体技能的图表示

Zachary Blumenfeld, Jim Webber

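Affiliations * Neo4j USA; Neo4j UK

AI summary Proposes the Agent Instruction Protocol (AIP), which represents skills as directed execution graphs, improves task performance by compiling human-written skills into this form, and supports diagnosable repair and governance of skills.

Details
AI Chinese abstract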

当前的智能体技能主要由自由形式的散文组成,要求智能体在每个会话中阅读、解释并重新推导如何行动。这带来了两个叠加的成本:在实现密集型任务上降低了可靠性,并且技能创建和改进困难,因为编辑散文是一个脆弱的过程,人类和智能体都难以处理,特别是对于模型训练中代表性不足的领域特定程序性知识。智能体指令协议(AIP)通过将技能建模为有向执行图来解决这两个问题:离散步骤作为节点,由确定性脚本或自然语言描述支持,通过显式类型的输入/输出边连接,并由模式验证的YAML规范管理。一个编译器元技能将现有的人类编写的技能转换为这种形式。好处是双重的。首先,将人类编写的技能编译为AIP后,Claude Sonnet在SkillsBench的27个真实智能体任务上的平均任务奖励从0.60提高到0.71,通过率从53%提高到67%——这是统计上显著的提升(Wilcoxon符号秩检验p=0.011),在12个任务中获胜,2个失败,13个平局——通常耗时更少。该图为智能体提供了经过验证、可运行的单元,而不是要求它从自然语言中重新推导代码、命令和工具调用。其次,在创建和改进方面,由于每个技能都经过模式验证、功能可测试且可逐节点寻址,因此可以精确诊断和修复故障。两个作者编写的技能故障被追溯到脚本级别。在调整AIP规范并重新编译后,两者均恢复且无回归(一个任务从0/5变为5/5),将技能改进转变为可测量的调优循环,而不是散文重写。相同的图结构支持语料库级别的治理和技能内省,并为基于技能的强化学习提供了自然的动作空间。

English abstract

Agent Skills today consist largely of free-form prose requiring the agent to read, interpret, and re-derive how to act in every session. This imposes two compounding costs: reduced reliability on implementation-heavy tasks, and difficulty in skill creation and improvement, since editing prose is a fragile process that both humans and agents struggle with, particularly for domain-specific procedural knowledge underrepresented in model training. The Agent Instruction Protocol (AIP) addresses both by modeling a skill as a directed execution graph: discrete steps as nodes backed by deterministic scripts or natural-language descriptions, connected by explicit typed input/output edges, and governed by a schema-validated YAML specification. A compiler meta-skill translates existing human-written skills into this form. The benefits are twofold. First, compiling human-written skills to AIP raised Claude Sonnet's mean task reward from 0.60 to 0.71 and pass rate from 53% to 67% across 27 real agent tasks from SkillsBench - a statistically significant gain (Wilcoxon signed-rank p = 0.011), winning 12 tasks to 2 with 13 ties - often in less wall-clock time. The graph delivers vetted, runnable units to the agent rather than asking it to re-derive code, commands, and tool calls from natural language. Second, on creation and improvement, because each skill is schema-validated, functionally testable, and addressable node-by-node, failures can be diagnosed and repaired precisely. Two authored-skill failures were traced to the script level. After adjusting the AIP spec and recompiling, both recovered with zero regressions (one task going from 0/5 to 5/5), turning skill improvement into a measurable tuning loop rather than a prose rewrite. That same graph structure supports corpus-level governance and skill introspection, and provides a natural action space for reinforcement learning over skills.
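A skill modeled as a directed execution graph with typed input/output edges can be sketched as follows. This is not the AIP schema: the tuple layout of `edges`, the function `run_skill`, and the two toy steps are hypothetical, and the sketch only illustrates how typed edges make each node individually runnable and checkable. It relies on `graphlib` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


def run_skill(nodes, edges, inputs):
    """Execute a skill graph in dependency order.

    nodes: step name -> callable taking keyword arguments.
    edges: list of (src, dst, param, type); the value produced by src
        (a step or an external input) feeds dst's parameter `param`.
    inputs: external input name -> value.
    """
    deps = {name: set() for name in nodes}
    for src, dst, _param, _typ in edges:
        if src in nodes:
            deps[dst].add(src)
    values = dict(inputs)
    for name in TopologicalSorter(deps).static_order():
        kwargs = {}
        for src, dst, param, typ in edges:
            if dst != name:
                continue
            # Typed edges let a failure be pinned to one node and one input.
            if not isinstance(values[src], typ):
                raise TypeError(f"{src} -> {name}.{param}: expected {typ.__name__}")
            kwargs[param] = values[src]
        values[name] = nodes[name](**kwargs)
    return values


# Hypothetical two-step skill: split text into tokens, then count them.
nodes = {
    "tokenize": lambda text: text.split(),
    "count": lambda tokens: len(tokens),
}
edges = [
    ("raw_text", "tokenize", "text", str),
    ("tokenize", "count", "tokens", list),
]
result = run_skill(nodes, edges, {"raw_text": "compile skills into graphs"})
```

Because every intermediate value is stored under its node name, a failing step can be inspected and replaced without touching the rest of the graph.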

2606.04779 2026-06-04 cs.AI math.CO Updated version

Tree-Based Formalization of Multi-Agent Complementarity in Human-AI Interactions

基于树的人机交互中多智能体互补性形式化

Andrea Ferrario

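Affiliations * Institute of Biomedical Ethics and History of Medicine, University of Zurich; SUPSI, Dalle Molle Institute for Artificial Intelligence (IDSIA); ETH Zurich

AI summary Proposes a tree-based formalization that represents human-AI interaction protocols with ordered agent-role configurations and rooted planar binary trees, and proves that complementarity is attainable in regression but obstructed in classification under natural conditions on local aggregation and loss functions.

Comments 29 pages, 9 figures

Details
AI Chinese abstract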

互补性是指人机交互(HAI)的表现优于其成员中最佳预测基准的情况。尽管这一概念在HAI研究中至关重要,但关于互补性的形式化工作仍然有限。现有框架未能建模智能体的预测如何组合成对工作流敏感的多智能体协议。我们通过引入基于树的多智能体HAI互补性形式化来填补这一空白。一个HAI协议由一个有序的智能体角色配置以及一棵有根平面二叉树表示,树的叶子由预测向量装饰。沿树递归评估一个局部二元组合规则,产生相对于逐点最小预言基准的树相对互补性泛函。我们证明了四个结果。第一,基于选择器的HAI(包括自我或AI依赖)无法实现互补性,无论任务、损失或预测质量如何。第二,在平方损失下的回归中,互补性等价于与真实向量之间的欧几里得距离最小化;对于$N=2$,最优线性池化权重具有封闭形式并具有残差校正解释。第三,在线性局部组合下,每个协议树定义了叶子权重单纯形上的重心坐标图;协议树的Tamari覆盖重新参数化保持互补性,对于$N=4$,它们满足五边形恒等式。第四,在二元分类中,在端点单调损失(包括标准Bregman和许多有限伯努利$f$散度损失)下,没有内部局部组合能实现互补性;在交叉熵下的多类聚合中存在类似障碍。总之,我们的框架表明,互补性在多智能体回归中是可实现的,但在分类中,在局部聚合和损失函数的自然条件下受到阻碍。

English abstract

Complementarity is the case in which a human-AI interaction (HAI) outperforms the best prediction benchmark available among its members. Although this idea is central in HAI research, formal work on complementarity remains limited. Existing frameworks do not model how agents' predictions compose into workflow-sensitive multi-agent protocols. We close this gap by introducing a tree-based formalization of complementarity in multi-agent HAI. An HAI protocol is represented by an ordered agent-role configuration together with a rooted planar binary tree whose leaves are decorated by prediction vectors. A local binary composition rule is evaluated recursively along the tree, yielding a tree-relative complementarity functional relative to a pointwise-min oracle benchmark. We prove four results. First, selector-based HAIs, including self- or AI-reliance, cannot achieve complementarity regardless of task, loss, or prediction quality. Second, in regression under squared loss, complementarity is equivalent to Euclidean distance minimization from the ground-truth vector; for $N=2$, the optimal linear-pooling weight has a closed form and a residual-correction interpretation. Third, under linear local composition, every protocol tree defines a barycentric coordinate chart on the simplex of leaf weights; Tamari-cover reparameterizations of protocol trees preserve complementarity, and for $N=4$, they satisfy the pentagon identity. Fourth, in binary classification, no internal local composition can achieve complementarity under endpoint-monotone losses, including standard Bregman and many finite Bernoulli $f$-divergence losses; an analogous obstruction holds for multiclass aggregation under cross-entropy. In summary, our framework shows that complementarity is attainable in multi-agent regression, but obstructed in classification under natural conditions on local aggregation and loss functions.
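For the two-agent regression case, a closed-form pooling weight can be derived directly: minimizing the squared distance between w*p1 + (1-w)*p2 and the ground truth y over w gives w = <y - p2, p1 - p2> / ||p1 - p2||^2, and the pooled prediction p2 + w*(p1 - p2) reads as a correction of p2 along the residual direction. The sketch below implements this standard least-squares result; it is a derivation under the assumption of an unconstrained scalar weight, not necessarily the exact form stated in the paper, and the vectors are made-up values.

```python
def optimal_pooling_weight(p1, p2, y):
    """Weight w minimizing ||w*p1 + (1-w)*p2 - y||^2 for prediction vectors."""
    d = [a - b for a, b in zip(p1, p2)]
    denom = sum(x * x for x in d)
    if denom == 0:
        raise ValueError("predictions coincide; every weight gives the same pool")
    return sum((t - b) * x for t, b, x in zip(y, p2, d)) / denom


def squared_loss(p, y):
    """Total squared error between a prediction vector and the ground truth."""
    return sum((a - b) ** 2 for a, b in zip(p, y))


# Hypothetical human and AI predictions that err in opposite directions.
p1 = [1.0, 3.0]
p2 = [3.0, 1.0]
y = [2.0, 2.0]
w = optimal_pooling_weight(p1, p2, y)
pooled = [w * a + (1 - w) * b for a, b in zip(p1, p2)]
```

With these values w is 0.5 and the pooled prediction matches y exactly, so the pooled loss (0) is strictly below the loss of either agent alone (2 each). A selector that simply picks one agent's prediction could not do better than 2 here.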

2606.04778 2026-06-04 cs.AI cs.CL cs.LG Updated version

Inference-Time Vulnerability Beyond Shallow Safety: Alignment Along Generation Trajectories

超越浅层安全的推理时脆弱性:沿生成轨迹的对齐

Kyungmin Park, Taesup Kim

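Affiliations * Hankuk University of Foreign Studies; Seoul National University

AI summary Shows that safety-aligned large language models have a broader inference-time vulnerability, in which short token injections at any generation step can substantially alter subsequent safety behavior, and proposes aligning models directly on generation trajectories to improve robustness.

Details
AI Chinese abstract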

安全对齐的大语言模型(LLMs)在推理时仍然容易受到干预,这些干预会将生成导向有害输出。最近的研究将其归因于浅层安全,即对齐集中在最初的几个输出标记上。我们表明,浅层安全是更广泛的推理时脆弱性的一个特例,其中在任何生成步骤的短标记注入都能显著改变后续的安全行为。我们还发现,模型在其隐藏状态中与拒绝方向的对齐并不能预测其对这种注入的鲁棒性,这表明在扰动下,内部状态本身并不能决定生成行为。为了解决这个问题,我们通过模拟序列中段扰动构建的生成轨迹上直接对齐模型,并表明这提高了对中段注入的鲁棒性,并泛化到利用早期标记生成的攻击。我们的工作认为,鲁棒的安全对齐需要对生成过程本身进行训练,而不仅仅是其输出。

English abstract

Safety-aligned Large Language Models (LLMs) remain vulnerable to interventions during inference that redirect generation toward harmful outputs. Recent work attributes this to shallow safety, where alignment concentrates in the first few output tokens. We show that shallow safety is a special case of a broader inference-time vulnerability, in which short token injections at any generation step can substantially alter subsequent safety behavior. We also find that a model's alignment with refusal directions in its hidden states does not predict its robustness to such injection, revealing that internal state alone does not determine generation behavior under perturbation. To address this, we align models directly on generation trajectories constructed by simulating mid-sequence perturbation, and show that this improves robustness to mid-sequence injection and generalizes to attacks that exploit early-token generation. Our work argues that robust safety alignment requires training on the generation process itself, not only its outputs.

2606.04775 2026-06-04 cs.LG cs.AI cs.CV cs.SY eess.SY math.OC Updated version

Activation Steering of Video Generation Models via Reduced-Order Linear Optimal Control

通过降阶线性最优控制引导视频生成模型的激活

Jihoon Hong, Alice Chan, Qiyue Dai, Julian Skifstad, Glen Chou

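Affiliations * Georgia Institute of Technology

AI summary Proposes the LA-LQR framework, which models text-to-video inference as a dynamical system and uses reduced-order optimal control for minimally invasive activation steering, reducing unsafe generations while preserving visual quality.

Details
AI Chinese abstract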

在大规模网络数据上训练的文本到视频(T2V)模型可能生成不良内容,这促使我们进行干预以减少有害输出而不牺牲视觉质量。激活引导提供了一种有吸引力的机制替代微调和提示过滤,但现有的T2V引导方法仍然有限,通常采用粗糙的、非预测性的干预,可能导致过度引导和内容退化。为了弥补这一差距,我们提出了潜在激活线性二次型调节器(LA-LQR),一种用于最小侵入性T2V引导的降阶最优控制框架。LA-LQR将T2V推理表述为一个动态系统,并计算闭环反馈干预,将激活引导向期望的特征设定点,同时惩罚不必要的扰动。为了使最优控制对高维视频激活可行,我们将激活投影到由对比提示对导出的低维、任务相关子空间,估计该潜在空间中的局部线性动力学,并求解潜在LQR问题以获得时间步和层特定的引导信号。我们提供了将潜在设定点跟踪与原始激活空间特征控制联系起来的理论界限,并实证验证了降阶潜在动力学的保真度。在概念引导和视频安全基准测试中,LA-LQR相对于基线减少了不安全生成,同时保持了提示保真度和视觉质量。

English abstract

Text-to-video (T2V) models trained on large-scale web data can generate undesired content, motivating interventions that reduce harmful outputs without sacrificing visual quality. Activation steering offers an attractive mechanistic alternative to finetuning and prompt filtering, but existing T2V steering methods remain limited, typically applying coarse, non-anticipative interventions that can lead to oversteering and content degradation. To close this gap, we propose Latent Activation Linear-Quadratic Regulator (LA-LQR), a reduced-order optimal control framework for minimally invasive T2V steering. LA-LQR formulates T2V inference as a dynamical system and computes closed-loop feedback interventions that steer activations toward desired feature setpoints while penalizing unnecessary perturbations. To make optimal control feasible for high-dimensional video activations, we project activations onto a low-dimensional, task-relevant subspace derived from contrastive prompt pairs, estimate local linear dynamics in this latent space, and solve a latent LQR problem to obtain timestep- and layer-specific steering signals. We provide theoretical bounds relating latent setpoint tracking to raw activation-space feature control, and empirically validate the fidelity of the reduced latent dynamics. On concept steering and video safety benchmarks, LA-LQR reduces unsafe generations relative to baselines, while preserving prompt fidelity and visual quality.
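The closed-loop feedback at the core of an LQR can be illustrated on a one-dimensional latent error z (distance from the desired setpoint) with linear dynamics z' = a*z + b*u. The sketch below runs the standard finite-horizon Riccati recursion backwards to get time-varying gains and then applies u = -k*z. The scalar setting, the choice of terminal cost equal to q, and all numeric values are assumptions; the paper's method works in a learned low-dimensional subspace with dynamics estimated per timestep and layer.

```python
def lqr_gains(a, b, q, r, horizon):
    """Finite-horizon scalar LQR gains for z' = a*z + b*u.

    Minimizes the sum of q*z^2 + r*u^2; the terminal cost is assumed to be q.
    """
    p = q
    gains = []
    for _ in range(horizon):
        k = (b * p * a) / (r + b * p * b)      # feedback gain for this step
        p = q + a * p * a - a * p * b * k      # Riccati update
        gains.append(k)
    gains.reverse()  # computed backwards in time, applied forwards
    return gains


def rollout(a, b, gains, z0):
    """Apply u = -k*z at each step and record the latent error trajectory."""
    z, traj = z0, [z0]
    for k in gains:
        u = -k * z
        z = a * z + b * u
        traj.append(z)
    return traj


gains = lqr_gains(a=1.0, b=0.5, q=1.0, r=0.1, horizon=10)
traj = rollout(1.0, 0.5, gains, z0=2.0)
```

Raising r makes the controller more reluctant to intervene, which is the sense in which the quadratic cost penalizes unnecessary perturbations.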

2606.04772 2026-06-04 cs.CV cs.AI Updated version

Coarse-to-fine Hierarchical Architecture with Sequential Mamba for Brain Reconstruction

用于脑重建的基于顺序Mamba的粗到细层次架构

Hoang-Son Vo, Van-Hung Bui, Minh-Huy Mai-Duc, Tien-Dung Mai, Soo-Hyung Kim

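Affiliations * Chonnam National University, Gwangju, Republic of Korea; Vietnam National University - Ho Chi Minh City, University of Science, Vietnam; Institute for Cybersecurity and Digital Technologies, Russia

AI summary Proposes CHASMBrain, a two-stage image-to-fMRI encoding framework built on a dual-stream Mamba design and a coarse-to-fine strategy, which outperforms the evaluated baselines on the NSD dataset and whose causal branch ablations reveal an asymmetric correspondence with the organization of the visual cortex.

Details
AI Chinese abstract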

理解深度视觉表征与人类视觉系统之间的关系是计算神经科学中的一个基本挑战。尽管现代视觉模型在图像识别中取得了强劲性能,但它们与人类视觉皮层层次组织的对应关系仍是一个开放问题。在本研究中,我们提出了CHASMBrain,一种新颖的分层两阶段图像到fMRI编码框架。我们的架构利用双流Mamba设计,明确分离并处理全局语义标记和局部空间补丁,这一设计受视觉皮层功能组织的启发。采用粗到细策略:第一阶段预测去噪的ROI级激活,第二阶段使用Mamba-VAE将这些粗响应细化为全体素级预测。在自然场景数据集(NSD)上的实验表明,我们的方法达到了0.429的皮尔逊相关系数和0.261的均方误差,优于所有评估的基线,包括岭回归和DINOv2线性探针。除了预测性能,因果分支消融实验揭示了一种非对称特化:补丁流特定锁定于早期视觉皮层(视网膜拓扑区域),而CLS流为高阶区域提供更广泛的语义上下文——这种对应关系是因果性的,而不仅仅是相关性的。跨被试迁移实验进一步表明,学习到的骨干网络在个体间泛化良好,只需极少的个体适应,表明模型捕捉到了共享的、与主体无关的视觉表征。

English abstract

Understanding the relationship between deep visual representations and the human visual system is a fundamental challenge in computational neuroscience. While modern vision models achieve strong performance in image recognition, their correspondence with the hierarchical organization of the human visual cortex remains an open question. In this study, we propose CHASMBrain, a novel hierarchical two-stage framework for image-to-fMRI encoding. Our architecture leverages a dual-stream Mamba design to explicitly separate and process global semantic tokens and local spatial patches, motivated by the functional organization of the visual cortex. A coarse-to-fine strategy is employed: Stage 1 predicts denoised ROI-level activations, while Stage 2 refines these coarse responses into full voxel-level predictions using a Mamba-VAE. Experiments on the Natural Scenes Dataset (NSD) demonstrate that our method achieves a Pearson correlation of 0.429 and an MSE of 0.261, outperforming all evaluated baselines including ridge regression and DINOv2 linear probes. Beyond predictive performance, causal branch-ablation experiments reveal an asymmetric specialization: the patch stream is specifically locked to early visual cortex (retinotopic regions), while the CLS stream contributes broader semantic context to higher-order areas; this correspondence holds causally, not merely correlationally. Cross-subject transfer experiments further show that the learned backbone generalizes across individuals with minimal per-subject adaptation, suggesting the model captures a shared, subject-agnostic visual representation.

2606.04769 2026-06-04 cs.CR cs.AI cs.SE Updated version

Description-Code Inconsistency in Real-world MCP Servers: Measurement, Detection, and Security Implications

现实世界 MCP 服务器中的描述-代码不一致性:测量、检测与安全影响

Yutao Shi, Xiaohan Zhang, Xiangjing Zhang, Xihua Shen, Hui Ouyang, Huming Qiu, Mi Zhang, Min Yang

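Affiliations * University of Science and Technology of China

AI summary To address inconsistency between tool descriptions and code implementations in MCP servers, proposes DCIChecker, an automated detection framework that combines structure-aware static analysis with the Direct-Reverse-Arbitration prompting method, and reports a 9.93% inconsistency rate and the associated security risks on a large-scale dataset.

Comments Preprint

Details
AI Chinese abstract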

模型上下文协议 (MCP) 已成为赋能大型语言模型 (LLM) 使用外部工具的关键标准。在此生态系统中,LLM 依赖 MCP 服务器提供的自然语言描述来选择和执行函数。这种交互隐含假设工具描述忠实地反映了其底层实现,而该假设在实践中并未得到强制验证。因此,MCP 部署可能遭受名为描述-代码不一致性 (DCI) 的问题,即工具对其能力和安全边界的描述与代码实际行为不一致。 本文对现实世界 MCP 服务器中的 DCI 进行了全面研究。我们正式定义了该问题,并提出了一个涵盖功能不一致和未声明副作用的综合分类法。在此分类法指导下,我们开发了 DCIChecker,一个自动框架,结合结构感知静态分析与 Direct-Reverse-Arbitration 提示方法,交叉验证工具描述与实际代码实现。我们将该框架应用于一个大规模数据集,包含从 2,214 个现实世界 MCP 服务器中提取的 19,200 个描述-代码对。我们的测量揭示 DCI 普遍存在,其中 9.93% 的对存在不一致。我们进一步证明 DCI 造成了关键的防御盲点,助长了从操作故障到隐蔽恶意行为等多种风险。最后,我们提出了缓解策略以强制语义一致性并增强新兴智能体生态系统的可靠性。

英文摘要

The Model Context Protocol (MCP) has emerged as a critical standard that enables Large Language Models (LLMs) to use external tools. In this ecosystem, LLMs rely on natural language descriptions provided by MCP servers to select and execute functions. This interaction implicitly assumes that tool descriptions faithfully reflect their underlying implementations, yet this assumption is neither verified nor enforced in practice. As a result, MCP deployments may suffer from Description-Code Inconsistency (DCI), where a tool's description of its capabilities and security boundaries does not match what the code actually does. In this paper, we present a comprehensive study of DCI in real-world MCP servers. We formally define the problem and propose a taxonomy spanning functionality inconsistencies and undeclared side effects. Guided by this taxonomy, we develop DCIChecker, an automated framework that combines structure-aware static analysis with the Direct-Reverse-Arbitration prompting method to cross-validate tool descriptions against actual code implementations. We apply this framework to a large-scale dataset comprising 19,200 description-code pairs extracted from 2,214 real-world MCP servers. Our measurement reveals that DCI is widespread, with 9.93% of these pairs exhibiting inconsistencies. We further demonstrate that DCI creates a critical defense blind spot, facilitating varied risks from operational failures to stealthy malicious behaviors. Finally, we propose mitigation strategies to enforce semantic consistency and enhance the reliability of the emerging agentic ecosystem.
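
To make the idea of an "undeclared side effect" concrete, here is a minimal sketch of a static check that compares a tool description with the calls found in its source. It is not DCIChecker: the call-to-label table, the keyword matching, and the example tool are all hypothetical, and the paper's framework additionally uses LLM prompting that is not reproduced here.

```python
import ast

# Hypothetical mapping from call names to side-effect labels (illustrative only).
SIDE_EFFECT_CALLS = {
    "os.remove": "delete",
    "shutil.rmtree": "delete",
    "subprocess.run": "execute",
}

def dotted_name(node):
    """Return 'a.b.c' for Name/Attribute chains, else None."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None

def undeclared_side_effects(description, source):
    """List side-effect labels present in the code but absent from the description."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Call):
            label = SIDE_EFFECT_CALLS.get(dotted_name(node.func))
            if label:
                found.add(label)
    text = description.lower()
    return sorted(label for label in found if label not in text)

# Hypothetical tool: the description says "read", the code also deletes the file.
tool_source = '''
import os
def read_note(path):
    with open(path) as f:
        data = f.read()
    os.remove(path)
    return data
'''
print(undeclared_side_effects("Reads a note from disk and returns its text.", tool_source))
```

A plain keyword match like this would miss paraphrases and indirect calls, which is presumably why a semantic cross-check is needed in practice.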

2606.04755 2026-06-04 hep-ex cs.AI cs.IR 版本更新

Archi: Agentic Operations at the CMS Experiment

Archi: CMS实验中的代理操作

Pietro Lugato, Luca Lavezzo, Jason Mohoney, Hasan Ozturk, Muhammad Hassan Ahmed, Juan Pablo Salas, Viphava Ohm, Krittin Phornsiricharoenphant, Gabriele Benelli, Mariarosaria D'Alfonso, Manasvita Joshi, Warren Nam, Aron Soha, Samantha Sunnarborg, Austin Swinney, Jack Tucker, Dmytro Kovalskyi, Tim Kraska, Christoph Paus

发表机构 * Massachusetts Institute of Technology(麻省理工学院) CMS Collaboration(CMS合作组) CERN(欧洲核子研究中心) University of Wisconsin-Madison(威斯康星大学麦迪逊分校) Fermi National Accelerator Laboratory(费米国家加速器实验室) Brown University(布朗大学) Harvard University(哈佛大学)

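AI summary: Presents Archi, an open-source framework that integrates heterogeneous data sources and deploys configurable, private agents to support computing operations at the CMS experiment; it proves effective on real-world queries.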

详情
AI中文摘要

我们提出Archi,一个面向科学合作的开源端到端框架,它结合了异构数据源的系统化摄取和组织,以及可配置、私有且可扩展的代理的部署,这些代理能够检索和推理这些数据。自2026年2月起,Archi的一个实例已部署在CERN大型强子对撞机的CMS实验计算操作团队中,作为技术操作员的辅助代理,通过结合文档、历史数据和实时监控系统提供检索和分析能力。我们根据操作员反馈和从生产使用中收集的问题集对系统进行评估,这些问题由人工和自动化专家组评分。该系统在操作任务中证明有效,解决了CMS操作员提出的真实世界查询。我们还观察到,本地托管的开源权重模型表现具有竞争力,从而能够对敏感数据进行完全私有管理。

英文摘要

We present Archi, an open-source, end-to-end framework for scientific collaborations that combines the systematic ingestion and organization of heterogeneous data sources with the deployment of configurable, private, and extensible agents that retrieve and reason over them. An instance of Archi has been deployed for the Computing Operations team of the CMS experiment at CERN's LHC since February 2026 as a support agent for technical operators, offering retrieval and analysis capabilities by combining documentation, historical data, and live monitoring systems. We evaluate the system on operator feedback and a question set collected from production usage, graded by human and automated panels. The system proves effective at operational tasks, resolving real-world queries posed by CMS operators. We also observe that locally-hosted, open-weight models perform competitively, enabling fully private management of sensitive data.

2606.04751 2026-06-04 cs.AI 版本更新

FALSIFYBENCH: Evaluating Inductive Reasoning in LLMs with Rule Discovery Games

FALSIFYBENCH: 通过规则发现游戏评估大语言模型中的归纳推理

Leonardo Bertolazzi, Katya Tentori, Raffaella Bernardi

发表机构 * University of Trento(特伦托大学) Free University of Bozen-Bolzano(博泽-博尔扎诺自由大学)

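AI summary: Proposes FALSIFYBENCH, a framework based on the Wason 2-4-6 task that evaluates LLM inductive reasoning across hypothesis generation, evidence gathering, and belief revision; reasoning models outperform instruction-tuned models, and a negative-testing strategy that actively seeks falsification is key to success.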

详情
AI中文摘要

大型语言模型(LLM)越来越多地被部署为科学任务中的自主智能体。然而,这些系统能否有效参与与科学发现相关的归纳推理形式仍是一个开放问题。在这项工作中,我们引入了FALSIFYBENCH,一个受经典Wason 2-4-6任务启发的假设驱动推理评估框架,其中智能体必须通过迭代提出示例并接收反馈来发现隐藏的语义属性。该任务捕捉了科学推理的关键要素:假设生成、证据收集以及根据确认和证伪证据进行信念修正。我们对跨模型家族和规模的12个LLM的评估表明,推理模型通常是比指令微调模型更强的科学推理者,尽管没有模型接近最优性能。成功的主要驱动因素是负测试的能力:主动寻求证伪其假设的模型始终优于主要寻求确认的模型。此外,先前工作中被忽略的细粒度回合级分析揭示,失败与模型在假设空间中导航的可识别模式相关。

英文摘要

Large language models (LLMs) are increasingly deployed as autonomous agents in scientific tasks. Yet whether these systems can effectively engage in forms of inductive reasoning relevant to scientific discovery remains an open question. In this work, we introduce FALSIFYBENCH, an evaluation framework for hypothesis-driven reasoning inspired by the classic Wason 2-4-6 task, in which agents must discover hidden semantic properties by iteratively proposing examples and receiving feedback. This task captures key elements of scientific reasoning: hypothesis generation, evidence gathering, and belief revision in response to both confirming and disconfirming evidence. Our evaluation of 12 LLMs across model families and scales shows that reasoning models are generally stronger scientific reasoners than instruction-tuned models, although no model comes close to optimal performance. The primary driver of success is the capacity for negative testing: models that actively seek to falsify their hypotheses consistently outperform those that primarily seek confirmation. Moreover, a fine-grained turn-level analysis, neglected in previous work, reveals that failure is tied to identifiable patterns in how models navigate the hypothesis space.
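
The role of negative testing can be illustrated with the classic 2-4-6 setup. The sketch below is only a toy, not the benchmark itself: the hidden rule, the agent's hypothesis, and the test triples are all assumptions chosen for illustration.

```python
def hidden_rule(triple):
    """The classic hidden rule: any strictly increasing triple."""
    a, b, c = triple
    return a < b < c

def hypothesis(triple):
    """An overly narrow guess: each number is two more than the previous one."""
    a, b, c = triple
    return b - a == 2 and c - b == 2

def is_falsified(tests):
    """True if any test exposes a disagreement between hypothesis and hidden rule."""
    return any(hypothesis(t) != hidden_rule(t) for t in tests)

# Positive tests fit the hypothesis; negative tests are chosen to violate it.
positive_tests = [(2, 4, 6), (10, 12, 14), (1, 3, 5)]
negative_tests = [(1, 2, 3), (5, 10, 20)]

print(is_falsified(positive_tests))  # confirming examples never expose the error
print(is_falsified(negative_tests))  # a triple that violates the guess still gets "yes"
```

Only the negative tests reveal that the guess is too narrow, because the environment accepts triples the hypothesis rejects.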

2606.04750 2026-06-04 cs.AI cs.CY cs.LG 版本更新

Fog of Love: Engineering Virtuous Agent Behavior with Affinity-based Reinforcement Learning in a Game Environment

Fog of Love: 基于亲和力强化学习在游戏环境中塑造道德智能体行为

Ajay Vishwanath, Christian Omlin

发表机构 * University of Agder(阿格德大学)

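AI summary: Proposes an affinity-based reinforcement learning approach that uses policy regularization to achieve both competitive and cooperative objectives in the multi-agent role-playing game Fog of Love, while improving the interpretability of agent behavior.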

详情
AI中文摘要

在人工智能中注入道德行为越来越受到关注。其中一种提出的技术是基于亲和力的强化学习,它通过对目标函数进行策略正则化来激励道德行为,而不完全依赖于奖励函数设计。迄今为止,该技术已在状态和动作空间最小的网格世界和玩具问题环境中证明有效。为了将这项研究扩展到更复杂的环境,我们引入了一个基于角色扮演棋盘游戏Fog of Love的双人多智能体环境。在该环境中,两个智能体竞争以实现各自的道德目标,同时合作以维持他们的关系。鉴于多智能体性质,这是一个复杂问题,其中多智能体深度确定性策略梯度智能体既不能成功竞争也不能成功合作。我们提供的证据表明,局部亲和力增强了智能体在实现竞争和合作目标方面的性能,从而在两个领域都获得了更高的总体得分。这不仅产生了道德选择,还阐明了智能体的目的论,并使其行为达到人类水平的可解释性。

英文摘要

Instilling virtuous behavior in artificial intelligence has attracted increasing interest. One proposed technique is affinity-based reinforcement learning, which uses policy regularization on the objective function to incentivize virtuous actions without being fully dependent on the reward function design. Thus far, this technique has been demonstrated to be effective in grid worlds and toy-problem environments with minimal state and action spaces. To expand this research to more sophisticated environments, we introduce a two-player multi-agent environment based on the role-playing board game Fog of Love. In this environment, two agents compete to fulfill their individual virtues, while also cooperating to satisfy their relationship. Given the multi-agent nature, this is a complex problem where multi-agent deep deterministic policy gradient agents neither compete nor cooperate successfully. We present evidence that localized affinities enhance agent performance in achieving both competitive and cooperative objectives, resulting in superior overall scores in both domains. This not only results in virtuous choices but also clarifies an agent's teleology and makes its behavior human-level interpretable.

2606.04743 2026-06-04 cs.CL cs.AI cs.LG 版本更新

TIDE: Proactive Multi-Problem Discovery via Template-Guided Iteration

TIDE:通过模板引导迭代的主动多问题发现

Soyeong Jeong, Jinheon Baek, Minki Kang, Sung Ju Hwang

发表机构 * KAIST(韩国科学技术院) DeepAuto.ai

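AI summary: Proposes TIDE, a framework that uses template-guided iteration to proactively discover multiple hidden problems in the user's context and pair them with concrete actions, substantially improving task coverage, problem identification, and resolution in two settings: personal workspaces and software repositories.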

详情
AI中文摘要

智能体被广泛部署为文档、工具和代码的助手。然而,它们通常仅对明确的用户请求做出响应,这些请求只反映了用户已注意到的问题,而许多其他重要问题共存于更广泛的用户上下文中,隐藏于显而易见之处,且其总数事先未知。我们将此定义为从上下文中发现多个隐藏问题的任务,其中应揭示共存的问题,基于支持性证据,并配以具体行动。为此,我们引入了TIDE,一个模板引导的迭代框架,包含两种互补机制。具体而言,基于单次预测倾向于关注最显著案例并产生泛化结论的观察,我们提出迭代发现:每轮生成一小批候选,同时基于已发现结果进行条件化,从而后续轮次扩展覆盖范围;以及思维模板:从先前解决的案例中提炼的可重用模式,指定应关注哪些上下文信号以及如何连接它们,将每个预测锚定于可识别的问题类别。我们在两个现实场景(个人工作区和软件仓库)中,使用四种模型骨干验证了TIDE,在任务覆盖率、识别和解决方面显著优于单次和并行多智能体基线。

英文摘要

Agents are widely deployed as assistants over documents, tools, and code. However, they typically act only on explicit user requests, which surface only the problems the user has noticed, while many other important problems coexist, hidden in plain sight, within the broader user context, with their total number unknown in advance. We frame this as the task of discovering multiple hidden problems from context, in which coexisting problems should be uncovered, grounded in supporting evidence, and paired with concrete actions. To this end, we introduce TIDE, a template-guided iterative framework with two complementary mechanisms. Specifically, motivated by the observation that single-pass prediction anchors on the most salient cases and yields generic claims, we propose iterative discovery, which surfaces a small batch of candidates per round while conditioning on what has already been found, so subsequent rounds extend coverage; and thought templates, reusable schemas distilled from previously solved cases that specify what contextual signals to attend to and how to connect them, anchoring each prediction in a recognizable problem class. We validate TIDE on two realistic settings, personal workspaces and software repositories, across four model backbones, showing substantial gains over single-shot and parallel multi-agent baselines on task coverage, identification, and resolution.

2606.04739 2026-06-04 cs.SE cs.AI 版本更新

Revisiting Vul-RAG: Reproducibility and Replicability of RAG-based Vulnerability Detection with Open-Weight Models

重新审视Vul-RAG:基于RAG的漏洞检测的可复现性与可复制性——使用开放权重模型

Sabrina Kaniewski, Fabian Schmidt, Tobias Heer

发表机构 * Institute for Secure Networked Systems, Esslingen University(安全网络系统研究所,埃斯林根大学) Institute for Intelligent Systems, Esslingen University(智能系统研究所,埃斯林根大学)

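AI summary: Reproduces and extends the Vul-RAG framework using local deployment and a range of open-weight models, finding a performance ceiling of about 0.30 pairwise accuracy that greater model capability does not substantially raise.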

Comments Accepted at AI&CCPS 2026 workshop, co-located with the 21st International Conference on Availability, Reliability and Security (ARES 2026). This is the authors' preprint version

详情
AI中文摘要

大型语言模型(LLMs)在自动化软件漏洞检测方面展现出强大潜力,尤其是在检索增强生成(RAG)设置中。然而,对于依赖专有模型和API的方法,可复现性和可复制性在很大程度上仍未得到探索,这引发了一个问题:报告的结果是否具有普遍性,还是主要依赖于特定的模型选择。在这项工作中,我们对Vul-RAG进行了可复现性研究,Vul-RAG是一个基于RAG的源代码漏洞检测框架,它利用高级漏洞知识增强LLMs。我们首先使用报告中的开放权重基线模型,在完全本地和开放权重的设置下复现了结果。然后,我们将评估扩展到一组多样化的最新开放权重LLMs,包括代码专用、通用和推理模型,参数规模各异。结果证实,Vul-RAG的发现可以在本地部署下复现,但存在微小偏差。在所有评估的模型中,我们观察到性能在约0.30成对准确率(即漏洞函数和修补函数都被正确分类的代码对)处达到平台期。值得注意的是,即使对于更新更先进的模型,这一平台期仍然存在,表明仅凭模型能力的提升并不能显著提高性能。最后,我们讨论了检测效果、模型能力和模型规模之间的实际影响和权衡。实现和评估工件可在 https://github.com/hs-esslingen-it-security/revisiting-Vul-RAG 公开获取。

英文摘要

Large language models (LLMs) have shown strong potential for automated software vulnerability detection, particularly in retrieval-augmented generation (RAG) settings. However, for approaches relying on proprietary models and APIs, reproducibility and replicability remain largely unexplored, raising the question of whether reported results generalize or depend primarily on specific model choices. In this work, we present a reproducibility study of Vul-RAG, a RAG-based framework for source code vulnerability detection that enhances LLMs with high-level vulnerability knowledge. We first replicate the results in a fully local and open-weights setting using the reported open-weight baseline models. We then extend the evaluation to a diverse set of recent open-weight LLMs, including code-specialized, general-purpose, and reasoning models of varying parameter sizes. The results confirm that the findings of Vul-RAG are reproducible under local deployment, but with minor deviations. Across all evaluated models, we observe a performance plateau at approximately 0.30 pairwise accuracy (code pairs for which both the vulnerable and the patched function are correctly classified). Notably, this plateau persists even for more recent and advanced models, indicating that improvements in model capacity alone do not substantially enhance performance. Finally, we discuss practical implications and trade-offs between detection effectiveness, model capabilities, and model scale. Implementation and evaluation artifacts are publicly available at https://github.com/hs-esslingen-it-security/revisiting-Vul-RAG.
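
The pairwise accuracy metric defined in the abstract (a pair counts only if both the vulnerable and the patched function are classified correctly) can be computed as in the sketch below. The input format, a list of boolean prediction pairs, is an assumption for illustration and not the format of the authors' artifacts.

```python
def pairwise_accuracy(pairs):
    """Fraction of code pairs where both members are classified correctly.

    pairs: list of (pred_vulnerable_fn, pred_patched_fn), where True means
    the detector flagged that function as vulnerable.
    """
    if not pairs:
        return 0.0
    # Correct pair: vulnerable function flagged AND patched function not flagged.
    correct = sum(1 for pred_vuln, pred_patch in pairs if pred_vuln and not pred_patch)
    return correct / len(pairs)

# Hypothetical predictions for four pairs.
predictions = [(True, False), (True, True), (False, False), (False, True)]
print(pairwise_accuracy(predictions))
```

A detector that flags everything, or nothing, scores zero under this metric, which is why it is stricter than per-function accuracy.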

2606.04736 2026-06-04 cs.LG cs.AI 版本更新

Curvature-aware dynamic precision approach for physics-informed neural networks

面向物理信息神经网络的曲率感知动态精度方法

Yingjie Shao, Ioannis N. Athanasiadis, George van Voorn, Taniya Kapoor

发表机构 * Mathematical & Statistical Methods Group (Biometris), Wageningen University & Research(数学与统计方法组(Biometris),瓦赫宁根大学与研究中心) Artificial Intelligence Group, Wageningen University & Research(人工智能组,瓦赫宁根大学与研究中心)

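AI summary: Proposes a curvature-aware precision controller that uses curvature information from the L-BFGS optimizer to adjust numerical precision dynamically, reducing the computational cost of double-precision training while preserving prediction accuracy.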

详情
AI中文摘要

物理信息神经网络(PINNs)通过将物理定律直接嵌入神经网络训练,已成为模拟偏微分方程(PDEs)的有前景框架。然而,近期研究表明PINN优化对数值精度敏感。现有实现通常使用单精度(FP32),计算效率高但易出现失败模式,或双精度(FP64),鲁棒但成本高昂。这造成了计算效率与数值精度之间的权衡。为降低双精度训练的计算成本同时保持预测精度,我们提出一种曲率感知精度控制器,在训练过程中自适应调整数值精度,而非将其视为固定的实现选择。该方法重用来自有限内存BFGS(L-BFGS)优化器的曲率信息来构建精度控制器,在低精度足够时保留FP32,并在训练动态表明数值敏感或精度受限停滞时提升至FP64计算。我们在四个典型PINN失败模式基准和一个辐照度驱动的常微分方程示例上评估了所提方法。我们还测试了不同神经网络架构下的方法。该方法在所有基准方程上一致匹配甚至略微超过全FP64解的精度,同时相对于全双精度训练减少了训练时间。所得结果表明,PINN优化中的精度敏感性具有相位依赖性,仅在数值关键阶段选择性应用更高精度可以在不牺牲预测精度的前提下降低计算成本。

英文摘要

Physics-informed neural networks (PINNs) have become a promising framework for simulating partial differential equations (PDEs) by embedding physical laws directly into neural network training. However, recent studies show that PINN optimisation is sensitive to numerical precision. Existing implementations commonly use either single precision (FP32), which is computationally efficient but prone to failure modes, or double precision (FP64), which is robust but substantially more expensive. This creates a trade-off between computational efficiency and numerical accuracy. To reduce the computational cost of double-precision training while retaining prediction accuracy, we propose a curvature-aware precision controller that adapts numerical precision during training rather than treating it as a fixed implementation choice. The proposed method reuses curvature information derived from the limited-memory BFGS (L-BFGS) optimiser to construct a precision controller, retaining FP32 when lower precision is sufficient and promoting computation to FP64 when the training dynamics indicate numerical sensitivity or precision-limited stagnation. We evaluate the proposed approach on four canonical PINN failure-mode benchmarks and an irradiance-driven ordinary differential equation example. We further test the proposed approach across different neural network architectures. The method consistently matches or even slightly exceeds full FP64 solution accuracy while reducing training time relative to full double-precision training on all benchmark equations. The obtained results indicate that precision sensitivity in PINN optimisation is phase-dependent, and that selectively applying higher precision only during numerically critical stages can lower computational cost without sacrificing predictive accuracy.
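
The abstract does not give the controller's decision rule, so the following is only a hypothetical sketch of what a switch driven by L-BFGS curvature pairs might look like. The function name, the two thresholds, and the rule itself are assumptions; only the curvature pair (s, y), the step and the gradient difference that L-BFGS already stores, comes from the standard algorithm.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def choose_precision(s, y, loss_prev, loss_curr, curv_tol=1e-6, stall_tol=1e-7):
    """Hypothetical rule: pick 'fp64' when curvature is weak or the loss has stalled.

    s: parameter step x_{k+1} - x_k, y: gradient difference g_{k+1} - g_k.
    """
    sy = dot(s, y)                                # curvature along the step
    scale = math.sqrt(dot(s, s) * dot(y, y))      # normalisation by |s| * |y|
    weak_curvature = sy <= curv_tol * scale
    stalled = abs(loss_prev - loss_curr) <= stall_tol * max(1.0, abs(loss_prev))
    return "fp64" if (weak_curvature or stalled) else "fp32"

print(choose_precision([0.1, 0.0], [0.2, 0.0], 1.0, 0.9))   # healthy curvature and progress
print(choose_precision([0.1, 0.0], [0.0, 0.2], 1.0, 0.9))   # s and y orthogonal
print(choose_precision([0.1, 0.0], [0.2, 0.0], 1.0, 1.0))   # no loss progress
```

The design intent is that the cheaper format is the default and the expensive one is used only in the phases where the signal suggests rounding error is limiting progress.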

2606.04735 2026-06-04 cs.LG cs.AI 版本更新

Trace-Mediated Peak Bias: Bridging Temporal Credit Assignment and Cognitive Heuristics in Deep Reinforcement Learning

迹介导的峰值偏差:深度强化学习中时间信用分配与认知启发式的桥梁

Viktor Veselý, Aleksandar Todorov, Erwan Escudie, Matthia Sabatelli

发表机构 * Department of AI, University of Groningen(格罗宁根大学人工智能系)

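AI summary: Identifies Trace-Mediated Peak Bias (TMPB) in deep reinforcement learning, presents it as a mechanistic basis for the Peak-End Rule, and shows that adaptive optimizers mitigate the bias through second-moment normalization.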

详情
AI中文摘要

时间信用分配是生物和人工智能的核心问题,但其与非线性函数逼近的相互作用尚不清楚。我们在深度强化学习中识别出一种系统性失效模式,称为迹介导峰值偏差(TMPB)。在中间资格迹深度下,智能体非理性地偏好具有高幅度奖励“峰值”的轨迹,而非具有更高累积回报的替代轨迹。这为峰值-末端规则提供了一种机制解释:一种人类记忆偏差,其中经验由其最强烈的时刻而非整合效用判断。我们证明,TMPB的出现是因为迹将远时时间差分误差放大为“梯度冲击”,而固定步长的随机梯度下降无法将其归一化,导致全局高估。相反,自适应优化器通过二阶矩归一化缓解了这种病理现象。我们的结果表明,类人的显著性扭曲可能自然产生于分布式系统中信用分配的数学约束,而自适应优化是理性价值估计的理论必要条件。

英文摘要

Temporal credit assignment is central to both biological and artificial intelligence, yet its interaction with non-linear function approximation is poorly understood. We identify a systematic failure mode in deep reinforcement learning (RL) termed Trace-Mediated Peak Bias (TMPB). At intermediate eligibility trace depths, agents irrationally prefer trajectories with high-magnitude reward "peaks" over alternatives with higher cumulative returns. This provides a mechanistic account of the Peak-End Rule: a human memory bias where experiences are judged by their most intense moments rather than integrated utility. We show that TMPB emerges because traces amplify distal Temporal Difference errors into "gradient shocks" that fixed-step-size Stochastic Gradient Descent cannot normalize, leading to global overestimation. Conversely, adaptive optimizers mitigate this pathology via second-moment normalization. Our results suggest that human-like saliency distortions may emerge naturally from the mathematical constraints of credit assignment in distributed systems, and that adaptive optimization is a theoretical necessity for rational value estimation.
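
To see what second-moment normalization does to a single large gradient, the sketch below compares fixed-step SGD with a standard Adam-style update on a scalar gradient sequence containing one spike. The gradient values and hyperparameters are illustrative assumptions, and this toy does not reproduce the paper's RL experiments or eligibility traces.

```python
import math

def sgd_steps(grads, lr=0.1):
    """Fixed step size: the update scales linearly with the gradient."""
    return [lr * g for g in grads]

def adam_steps(grads, lr=0.1, b1=0.9, b2=0.999, eps=1e-8):
    """Adam-style update: first moment divided by the root of the second moment."""
    m = v = 0.0
    steps = []
    for t, g in enumerate(grads, start=1):
        m = b1 * m + (1 - b1) * g
        v = b2 * v + (1 - b2) * g * g
        m_hat = m / (1 - b1 ** t)          # bias correction
        v_hat = v / (1 - b2 ** t)
        steps.append(lr * m_hat / (math.sqrt(v_hat) + eps))
    return steps

# A steady gradient with one hundredfold "shock" in the middle.
grads = [1.0] * 5 + [100.0] + [1.0] * 5
sgd = sgd_steps(grads)
adam = adam_steps(grads)
print(max(sgd), max(adam))
```

With these settings the SGD update at the spike is a hundred times larger than its neighbours, while the Adam-style update stays bounded near the learning rate.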

2606.04705 2026-06-04 cs.CV cs.AI 版本更新

Enhancing MedSAM with a Lightweight Box Predictor for Medical Image Segmentation

通过轻量级框预测器增强 MedSAM 用于医学图像分割

Amirhossein Movahedisefat, Amirreza Fateh, Mohammad Reza Mohammadi

发表机构 * School of Computer Engineering, Iran University of Science and Technology (IUST)(伊朗科学技术大学计算机工程学院)

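AI summary: Proposes an enhanced MedSAM framework with an integrated lightweight box predictor that estimates a bounding box from a single click to strengthen the spatial guidance of point prompts, improving the accuracy and robustness of multi-modality medical image segmentation with only 1.6M additional parameters.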

详情
AI中文摘要

医学图像中的语义分割是一项关键但具有挑战性的任务,原因是数据稀缺和跨模态的高变异性。虽然像 Segment Anything Model (SAM) 这样的基础模型显示出潜力,但它们在没有特定适应的情况下往往难以处理医学图像。此外,点提示尽管是最自然的用户交互形式,但为可靠分割提供的空间上下文不足,特别是当目标结构不规则或对比度差时。在本文中,我们提出了一种增强的分割框架,将轻量级框预测器模块集成到 MedSAM 架构中。框预测器通过使用局部图像嵌入特征从单次用户点击估计近似边界框,提供空间引导以减少点提示的模糊性,同时仅引入 1.6M 额外参数和可忽略的推理开销。我们引入了一个两阶段训练流程,其中框预测器在集成到 MedSAM 之前独立训练。为了验证我们方法的泛化能力,我们在四个不同的数据集(FLARE22、BRISC、BUSI、LungSegDB)上进行了广泛评估,这些数据集涵盖不同的成像模态,包括 CT、MRI 和超声。我们的方法在不同解剖结构和成像领域中提高了分割准确性和鲁棒性,在 BUSI、FLARE22、BRISC 和 LungSegDB 上分别达到了 0.89、0.93、0.88 和 0.98 的 Dice 分数。代码可在 https://github.com/Amirhosseinmovahedi/MedSAM-BoxPredictor 获取。

英文摘要

Semantic segmentation in medical imaging is a critical yet challenging task due to data scarcity and high variability across modalities. While foundation models like the Segment Anything Model (SAM) show promise, they often struggle with medical images without specific adaptation. Moreover, point prompts, despite being the most natural form of user interaction, provide insufficient spatial context for reliable segmentation, particularly when target structures are irregular or poorly contrasted. In this paper, we propose an enhanced segmentation framework that integrates a lightweight Box Predictor module into the MedSAM architecture. The Box Predictor estimates an approximate bounding box from a single user click using localized image embedding features, providing spatial guidance that reduces the ambiguity of point prompts, while introducing only 1.6M additional parameters and negligible inference overhead. We introduce a two-stage training pipeline where the Box Predictor is trained independently before being integrated into MedSAM. To validate the generalization capability of our method, we conduct extensive evaluations on four diverse datasets (FLARE22, BRISC, BUSI, LungSegDB) spanning distinct imaging modalities, including CT, MRI, and Ultrasound. Our method improves segmentation accuracy and robustness across varied anatomical structures and imaging domains, achieving Dice scores of 0.89 (BUSI), 0.93 (FLARE22), 0.88 (BRISC), and 0.98 (LungSegDB). Code is available at https://github.com/Amirhosseinmovahedi/MedSAM-BoxPredictor
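
For readers unfamiliar with the reported metric, the Dice score for binary masks is twice the overlap divided by the total foreground in both masks. The sketch below uses flat 0/1 lists for simplicity; the empty-mask convention (returning 1.0) is an assumption, and the repository may handle that case differently.

```python
def dice_score(pred, target):
    """Dice coefficient for two flat binary masks of equal length."""
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    # Convention assumed here: two empty masks count as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total

pred_mask = [1, 1, 1, 0]
true_mask = [1, 1, 0, 0]
print(dice_score(pred_mask, true_mask))
```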

2606.04699 2026-06-04 cs.LG cs.AI cs.CV 版本更新

Graph-Guided Universum Learning in Generalized Eigenvalue Proximal SVMs for Alzheimer's Disease Classification

基于图引导的广义特征值近端支持向量机中的Universum学习用于阿尔茨海默病分类

Yogesh Kumar, Vrushank Ahire, Mudasir Ganaie

发表机构 * Dept. of Computer Science and Engineering, IIT Ropar, Punjab 140001, India(计算机科学与工程系,IIT罗帕尔,旁遮普140001,印度)

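AI summary: For Alzheimer's disease classification, proposes two graph-guided Universum learning models, UG-GEPSVM and IUG-GEPSVM, which build graph Laplacian regularization from mild cognitive impairment samples to replace the conventional independent penalty term, achieving better performance on ADNI MRI datasets.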

详情
AI中文摘要

早期准确检测阿尔茨海默病(AD)对于及时干预和疾病管理至关重要。广义特征值近端支持向量机(GEPSVM)及其基于Universum的变体在AD分类中显示出有希望的结果。然而,现有方法将Universum样本视为独立点,未考虑它们之间的几何关系。本文提出了两种图引导的Universum学习模型,即UG-GEPSVM和IUG-GEPSVM,用于使用结构MRI数据进行AD与认知正常(CN)分类。在所提出的框架中,轻度认知障碍(MCI)受试者被用作Universum数据,以提供AD和CN类别之间的中间信息。使用高斯相似性、最小生成树连通性和多跳传播在Universum样本上构建图。从该图中导出拉普拉斯矩阵,捕获MCI样本的几何结构。这种基于拉普拉斯的正则化被纳入学习过程,以替代传统的独立Universum惩罚项。UG-GEPSVM将此正则化集成到广义特征值公式中,而IUG-GEPSVM使用标准特征值公式扩展了数值稳定的改进GEPSVM框架。在ADNI MRI数据集变体上使用ICA和PCA特征在五个不同噪声水平下的实验表明,两种提出的模型始终优于现有的GEPSVM和基于Universum的方法。UG-GEPSVM实现了88.07%的最高平均AUC,并在增加的噪声水平下保持稳定的性能。统计检验进一步证实了观察到的改进的显著性。

英文摘要

Early and accurate detection of Alzheimer's disease (AD) is important for timely intervention and disease management. Generalized Eigenvalue Proximal Support Vector Machine (GEPSVM) and its Universum-based variants have shown promising results for AD classification. However, existing methods treat Universum samples as independent points and do not consider the geometric relationships among them. This paper proposes two graph-guided Universum learning models, namely UG-GEPSVM and IUG-GEPSVM, for AD versus cognitively normal (CN) classification using structural MRI data. In the proposed framework, mild cognitive impairment (MCI) subjects are used as Universum data to provide intermediate information between AD and CN classes. A graph is constructed over the Universum samples using Gaussian similarity, Minimum Spanning Tree connectivity, and multi-hop propagation. From this graph, a Laplacian matrix is derived that captures the geometric structure of the MCI samples. This Laplacian-based regularization is incorporated into the learning process in place of the conventional independent Universum penalty term. UG-GEPSVM integrates this regularization into the generalized eigenvalue formulation, while IUG-GEPSVM extends the numerically stable improved GEPSVM framework using a standard eigenvalue formulation. Experiments on ADNI MRI dataset variants using ICA- and PCA-based features at five different noise levels show that both proposed models consistently outperform existing GEPSVM and Universum-based methods. UG-GEPSVM achieves the highest average AUC of 88.07% and maintains stable performance under increasing noise levels. Statistical tests further confirm the significance of the observed improvements.
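
As a rough illustration of the graph construction step, the sketch below builds a Gaussian-similarity graph over a few sample points and forms the unnormalised Laplacian L = D - W. It is a simplification under stated assumptions: the Minimum Spanning Tree connectivity and multi-hop propagation mentioned in the abstract are omitted, and the points and bandwidth are made up.

```python
import math

def gaussian_laplacian(points, sigma=1.0):
    """Unnormalised graph Laplacian L = D - W with Gaussian edge weights."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j:
                d2 = sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                W[i][j] = math.exp(-d2 / (2 * sigma ** 2))
    # Diagonal holds the node degree; off-diagonals hold negative weights.
    return [[(sum(W[i]) if i == j else 0.0) - W[i][j] for j in range(n)]
            for i in range(n)]

# Three hypothetical Universum samples in a 2-D feature space.
samples = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0)]
L = gaussian_laplacian(samples)
for row in L:
    print([round(x, 4) for x in row])
```

Each row of the Laplacian sums to zero, so a regulariser of the form f^T L f penalises only differences between connected samples rather than treating each sample independently.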

2606.04684 2026-06-04 cs.CV cs.AI 版本更新

Real-Time Automatic License Plate Recognition Using YOLOv8, SORT Tracking, and Temporal Data Interpolation

基于YOLOv8、SORT跟踪与时间数据插值的实时自动车牌识别

Mirza Muhammad Mobeen

发表机构 * University of Science and Technology of China(中国科学技术大学)

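AI summary: Proposes a five-stage, end-to-end pipeline combining YOLOv8 object detection, SORT multi-object tracking, and temporal data interpolation to address the low recognition rates and fragmented tracking paths caused by illumination changes, occlusion, and similar factors in dynamic traffic monitoring.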

Comments 7 pages. Code: https://github.com/mobeen-pmo/Automatic-License-Plate-Recognition

详情
AI中文摘要

视频处理的实时困难严重限制了自动车牌识别(ALPR)在动态交通监控环境中的应用。对非受控变量(如光照剧烈变化、摄像机扫描角度、车辆高速行驶和物理遮挡)的高保真识别是一个问题,常导致跟踪路径断裂和光学字符识别(OCR)率低下。为缓解这些弱点,本研究提出一个五阶段端到端算法流程,涵盖基于深度学习的目标检测、运动学多目标跟踪和几何时间数据插值之间的平滑过渡。所提出的架构利用强大的YOLOv8 nano模型在第一阶段定位车辆,然后使用简单在线实时跟踪(SORT)算法建立帧间时空联系。另一种更具体的YOLOv8目标检测器检测车牌区域,将切片数组传递给EasyOCR链,并受位置语法验证约束。更重要的是,启动离线时间边界框插值机制以重新连接断裂的路径。

英文摘要

The demands of real-time video processing severely limit the use of Automatic License Plate Recognition (ALPR) in dynamic traffic monitoring. Reliable recognition under uncontrolled conditions, such as drastic variations in illumination, acute camera angles, high vehicle speeds, and heavy physical occlusion, is difficult and often leads to fragmented tracking paths and poor Optical Character Recognition (OCR) rates. To mitigate these weaknesses, this study proposes a five-stage, end-to-end pipeline that combines deep-learning-based object detection, kinematic multi-object tracking, and geometric temporal data interpolation. The proposed architecture uses a YOLOv8 nano model to localize vehicles in the first stage; the Simple Online and Realtime Tracking (SORT) algorithm then builds spatio-temporal links between frames. A second, more specialized YOLOv8 object detector locates the license plate region and passes the cropped array to an EasyOCR chain constrained by positional syntax verification. Finally, an offline temporal bounding-box interpolation mechanism reconnects fragmented paths.
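
The offline interpolation stage can be pictured as filling the frames where a tracked box is missing by interpolating linearly between the nearest detections on either side. The sketch below assumes a track stored as a dictionary from frame index to an (x1, y1, x2, y2) box; that data layout and the function name are hypothetical, not taken from the linked repository.

```python
def interpolate_boxes(track):
    """Fill missing frames of one track by linear interpolation between detections.

    track: dict mapping frame index -> (x1, y1, x2, y2) for frames with a detection.
    """
    frames = sorted(track)
    filled = dict(track)
    for start, end in zip(frames, frames[1:]):
        gap = end - start
        for frame in range(start + 1, end):
            w = (frame - start) / gap   # 0 at the earlier box, 1 at the later box
            filled[frame] = tuple(a + w * (b - a)
                                  for a, b in zip(track[start], track[end]))
    return filled

# A plate seen at frames 0 and 4, lost in between.
detections = {0: (0, 0, 10, 10), 4: (8, 4, 18, 14)}
full_track = interpolate_boxes(detections)
print(full_track[2])
```

Because the interpolation needs detections on both sides of a gap, it fits an offline pass over recorded video rather than a strictly online loop.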

2606.04662 2026-06-04 cs.LG cs.AI 版本更新

Why Muon Outperforms Adam: A Curvature Perspective

为什么 Muon 优于 Adam:曲率视角

Shuche Wang, Fengzhuo Zhang, Jiaxiang Li, Dirk Bergemann, Zhuoran Yang

发表机构 * National University of Singapore(新加坡国立大学) Yale University(耶鲁大学) University of Minnesota(明尼苏达大学)

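AI summary: Taking a curvature perspective, uses a Taylor expansion and a curvature decomposition to show that Muon achieves a larger one-step loss decrease than Adam because of its lower Normalized Directional Sharpness (NDS); data imbalance and within-layer curvature are the main sources of this advantage.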

详情
AI中文摘要

Muon 在大语言模型训练中相比 Adam 将训练效率提升约两倍,但这一优势的局部几何来源尚不清楚。我们的工作首次从曲率视角尝试揭开 Muon 优于 Adam 的原因。首先,我们对训练损失曲面应用二阶泰勒近似,表明在匹配验证损失下,Muon 比 Adam 实现更大的一步损失下降。两种优化器的一阶增益相当,但 Muon 始终承受更小的二阶曲率惩罚。其次,我们将该曲率惩罚分解为更新范数的平方和归一化方向锐度(NDS)。我们发现 Muon 和 Adam 的更新范数相当,因此 Muon 更小的曲率惩罚源于更低的 NDS,而非更新尺度。第三,我们研究训练数据和模型结构如何塑造 Muon 的 NDS 优势。使用具有受控不平衡的 Zipf-概率上下文无关文法(PCFG)数据,我们表明数据不平衡放大了 Muon 相对于 Adam 的 NDS 优势。进一步的层内/跨层分解表明,在训练的中后期,Muon 更低的 NDS 主要由更小的层内曲率维持。除了经验证据,我们还分析了具有异质曲率和梯度对齐于高曲率模式的风格化二次问题。我们证明 Muon 通过平衡曲率组间的更新能量,实现了比 GD 更低的平均 NDS;当曲率异质性足够强时,在相同步数后这也产生更低的局部二次损失。

英文摘要

Muon improves training efficiency over Adam in large language-model training by about two times, but the local geometric source of this advantage remains unclear. Our work takes a first step toward demystifying Muon's superiority over Adam from a curvature perspective. First, we apply a second-order Taylor approximation to the training landscape and show that Muon achieves a larger one-step loss decrease than Adam at matched validation loss. The two optimizers have comparable first-order gains, but Muon consistently incurs a smaller second-order curvature penalty. Second, we decompose this curvature penalty into the squared update norm and Normalized Directional Sharpness (NDS). We find that Muon and Adam have comparable update norms, so Muon's smaller curvature penalty is driven by lower NDS, not update scale. Third, we study how training data and model structure shape Muon's NDS advantage. Using Zipf-Probabilistic Context-Free Grammar (PCFG) data with controlled imbalance, we show that data imbalance amplifies Muon's NDS advantage over Adam. A within-/cross-layer decomposition further shows that, in the middle and late stages of training, Muon's lower NDS is mainly sustained by smaller within-layer curvature. Beyond empirical evidence, we analyze stylized quadratic problems with heterogeneous curvature and gradient alignment toward high-curvature modes. We prove that Muon attains a smaller average NDS than GD by balancing update energy across curvature groups; when curvature heterogeneity is sufficiently strong, this also yields lower local quadratic loss after the same number of steps.
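
The decomposition described above can be made concrete with a second-order Taylor expansion of the loss change for an update d: g·d + (1/2) d^T H d. The sketch below assumes a diagonal Hessian and defines NDS as d^T H d / |d|^2; that definition is an assumption consistent with the abstract's wording, not a formula quoted from the paper, and the two updates are toy vectors rather than actual Muon or Adam steps.

```python
def taylor_terms(grad, hess_diag, update):
    """Return (first-order term, curvature penalty, squared norm, NDS)."""
    first_order = sum(g * d for g, d in zip(grad, update))
    quad = sum(h * d * d for h, d in zip(hess_diag, update))   # d^T H d
    norm_sq = sum(d * d for d in update)
    nds = quad / norm_sq          # assumed definition of directional sharpness
    return first_order, 0.5 * quad, norm_sq, nds

grad = [1.0, 1.0]
hess_diag = [10.0, 1.0]           # one sharp direction, one flat direction

sharp_update = [-0.1, 0.0]        # moves along the high-curvature axis
flat_update = [0.0, -0.1]         # same norm, low-curvature axis

for name, d in [("sharp", sharp_update), ("flat", flat_update)]:
    first, penalty, norm_sq, nds = taylor_terms(grad, hess_diag, d)
    print(name, "predicted change:", first + penalty, "NDS:", nds)
```

Both updates have the same norm and the same first-order gain, so the difference in predicted loss change comes entirely from the sharpness of the direction, which mirrors the argument in the abstract.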

2606.04656 2026-06-04 cs.CV cs.AI 版本更新

Instance-Level Post Hoc Uncertainty Quantification in Object Detection

目标检测中的实例级事后不确定性量化

Chongzhe Zhang, Zifan Zeng, Qunli Zhang, Feng Liu, Zheng Hu

发表机构 * Tsinghua University(清华大学)

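AI summary: Proposes the Monte Carlo generalized linearized model (MC-GLM) for instance-level, approximately post hoc uncertainty quantification in object detection without retraining, validated on the nuScenes dataset.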

Comments 7 pages, 2 figures

详情
AI中文摘要

目标检测是自动驾驶的安全关键组成部分。为了安全保证,量化边界框预测中的不确定性至关重要。无需重新训练的事后不确定性量化符合实际部署需求;因此,我们采用拉普拉斯近似。由于需要实例级不确定性,需要多次反向传播的线性化推理方法时间效率不高,而基于采样的方法并非完全事后。我们提出了蒙特卡洛广义线性模型(MC-GLM),它提供实例级且近似事后不确定性量化。蒙特卡洛步骤中所需的样本数量是恒定的,与输出实例数量无关,因此可以并行化。在nuScenes数据集上使用CenterPoint检测器的实验验证了我们方法的有效性,所得不确定性表现出良好质量。

英文摘要

Object detection is a safety-critical component of autonomous driving. It is essential to quantify the uncertainty in bounding-box predictions for safety assurance. Post hoc uncertainty quantification without retraining aligns with real-world deployment requirements; therefore, we employ the Laplace approximation. Because instance-level uncertainty is needed, linearized inference methods that require multiple backpropagations are not time-efficient, and sampling-based methods are not fully post hoc. We propose the Monte Carlo generalized linearized model (MC-GLM), which provides instance-level and approximately post hoc uncertainty quantification. The number of samples required in the Monte Carlo step is constant and independent of the number of output instances, so it can be parallelized. Experiments on the nuScenes dataset with the CenterPoint detector validate the effectiveness of our method, and the resulting uncertainties exhibit good quality.

2606.04648 2026-06-04 cs.AI 版本更新

BiNSGPS: Geometry Problem Solving via Bidirectional Neuro-Symbolic Interaction

BiNSGPS: 通过双向神经符号交互解决几何问题

Qi Wang, Peijie Wang, Fei Yin, Cheng-Lin Liu

发表机构 * MAIS, Institute of Automation of Chinese Academy of Sciences(自动化研究所) School of Artificial Intelligence, University of Chinese Academy of Sciences(中国科学院大学人工智能学院)

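AI summary: Proposes the BiNSGPS framework, in which bidirectional neuro-symbolic interaction between a multimodal large language model adviser and a symbolic solver dynamically corrects inconsistent formal representations or proposes auxiliary hypotheses, addressing early-stage errors and symbolic conflicts in geometry problem solving.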

详情
AI中文摘要

几何问题求解在人工智能中提出了独特的挑战。现有方法通常分为两种范式:符号方法(适应性有限)和神经方法(容易产生幻觉)。最近的神经符号混合方法主要依赖单向流水线,其中神经输出被输入求解器而无反馈,使得系统对早期错误脆弱。为了打破这一单向瓶颈,我们提出了BiNSGPS,一个在多模态大语言模型顾问和符号求解器之间建立双向神经符号交互的框架。多模态大语言模型顾问主动整合来自符号求解器的反馈,以动态纠正不一致的形式表示或提出辅助假设,解决符号冲突并促进复杂推理。

英文摘要

Geometry problem solving poses distinct challenges in artificial intelligence. Existing approaches typically fall into two paradigms: symbolic methods, which exhibit limited adaptability, and neural methods, which are prone to hallucinations. Recent neuro-symbolic hybrids predominantly rely on a unidirectional pipeline where neural outputs are fed into solvers without feedback, making the system brittle to early-stage errors. To break this unidirectional bottleneck, we propose BiNSGPS, a framework that establishes Bidirectional Neuro-Symbolic Interaction (BiNS) between a multimodal large language model (MLLM) Adviser and a Symbolic Solver. The MLLM Adviser actively incorporates feedback from the Symbolic Solver to dynamically rectify inconsistent formal representations or propose auxiliary hypotheses, resolving symbolic conflicts and facilitating complex deductions.

2606.04646 2026-06-04 cs.CL cs.AI cs.IR 版本更新

QO-Bench: Diagnosing Query-Operator-Preserving Retrieval over Typed Event Tuples

QO-Bench: 诊断类型化事件元组上的查询操作符保持检索

Mengao Zhang, Xiang Yang, Chang Liu, Tianhui Tan, Ke-wei Huang

发表机构 * Asian Institute of Digital Finance, National University of Singapore(亚洲数字金融研究所,新加坡国立大学)

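AI summary: Proposes the QO-Bench benchmark, which uses deterministic evaluation over typed event tuples to diagnose execution bottlenecks of retrieval-augmented generation systems on query operators such as joins and intersection.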

Comments 14 pages

详情
AI中文摘要

许多关于商业、法律和科学语料库的现实世界问题是文本中潜在记录的数据库风格查询的自然语言版本。现有的检索增强生成(RAG)系统主要针对语义相关性进行优化,但检索到看似相关的段落并不能保证正确的查询执行。我们引入了QO-Bench,一个用于类型化事件元组上查询操作符问答的诊断基准。该基准涵盖22,984篇新闻文章和614个公司事件,涉及18个查询模板,在785个问题上进行评估。每个黄金答案由类型化事件元组确定性计算得出,并通过召回率评分,答案通过精确匹配而非LLM评判器与黄金元组匹配。这种设计支持操作符级别的诊断,如连接和交集。我们在匹配条件下评估了RAG、ReAct RAG、GraphRAG和信息提取到SQL的方法,并设置了一个长上下文oracle上限以隔离检索失败。一个双轴框架——索引时保持与查询时执行——预测了每种范式失败的位置,结果证实了这一点:系统检索到相关文本,但丢弃了操作符所需的类型化值,并且可部署的范式排名在不同操作符间反转,相似性检索在过滤/投影上领先,而提取到SQL在交集和计数上领先。即使提供了黄金证据,长上下文oracle也远未饱和,因此操作符执行——而不仅仅是检索——是一个核心瓶颈,更强的答案模型也无法消除。QO-Bench将目标从段落相关性重新定义为查询操作符保持检索。

英文摘要

Many real-world questions over business, legal, and scientific corpora are natural-language versions of database-style queries over records latent in text. Existing retrieval-augmented generation (RAG) systems are optimized primarily for semantic relevance, but retrieving plausible passages does not guarantee correct query execution. We introduce QO-Bench, a diagnostic benchmark for query-operator question answering over typed event tuples. The benchmark covers 22,984 news articles and 614 corporate events across 18 query templates, evaluated on 785 questions. Each gold answer is deterministically computed from typed event tuples and scored by recall, with answers matched to the gold tuples by exact match rather than an LLM judge. This design enables operator-level diagnosis such as joins and intersection. We evaluate RAG, ReAct RAG, GraphRAG, and information-extraction-to-SQL under matched conditions, with a long-context oracle ceiling to isolate retrieval failure. A two-axis framework -- index-time preservation versus query-time execution -- predicts where each paradigm fails, and the results bear it out: systems retrieve relevant text but discard the typed values operators need, and the deployable paradigm ranking inverts across operators, with similarity retrieval leading on filter/project and extraction-to-SQL on intersection and counting. Even given the gold evidence, a long-context oracle stays far from saturated, so operator execution -- not retrieval alone -- is a core bottleneck that a stronger answer model does not remove. QO-Bench reframes the goal from passage relevance to query-operator-preserving retrieval.
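
The scoring scheme described above, recall over gold tuples with exact matching, can be sketched in a few lines. The tuple schema (company, event type, date) and the sample values are hypothetical; the benchmark's actual tuple types are not specified in the abstract.

```python
def tuple_recall(predicted, gold):
    """Fraction of gold tuples that appear, exactly, among the predictions."""
    gold_set = set(gold)
    if not gold_set:
        return 1.0   # convention assumed here for an empty gold answer
    return len(gold_set & set(predicted)) / len(gold_set)

# Hypothetical typed event tuples: (company, event_type, date).
gold_tuples = [
    ("Acme Corp", "acquisition", "2024-03-01"),
    ("Globex", "layoff", "2024-05-12"),
]
predicted_tuples = [
    ("Acme Corp", "acquisition", "2024-03-01"),
    ("Globex", "layoff", "2024-05-21"),   # right event, wrong typed value
]
print(tuple_recall(predicted_tuples, gold_tuples))
```

The second prediction would look relevant to a semantic matcher, but exact matching counts it as a miss, which is the kind of lost typed value the benchmark is designed to expose.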

2606.04620 2026-06-04 cs.LG cs.AI 版本更新

QuBLAST: A Framework for Quantizing Large Language Models with Block-Level Compression Approach and Activation Scaling Strategy

QuBLAST: 一种采用块级压缩方法和激活缩放策略量化大语言模型的框架

Pasindu Wickramasinghe, Achyuta Muthuvelan, Rachmad Vidya Wicaksana Putra, Minghao Shao, Muhammad Shafique

发表机构 * eBRAIN Lab, Division of Engineering, New York University (NYU) Abu Dhabi(eBRAIN实验室,工程系,纽约大学(NYU)阿布扎比分校)

AI总结 针对大语言模型部署困难,提出QuBLAST框架,通过块级混合精度量化和激活缩放策略,在降低模型大小40%-45.2%的同时保持困惑度增加不超过5%。

Comments 10 pages, 9 figures, 5 tables

详情
AI中文摘要

大语言模型已成为解决NLP任务的最先进算法。然而,它们通常伴随着巨大的计算和内存成本,因此难以部署在嵌入式系统上。为此,最先进的方法通常在网络的所有注意力块上采用统一的训练后量化,从而忽略了在同一网络中应用不同量化级别的潜力。它们还采用复杂操作来减轻激活异常值的负面影响,从而产生高计算开销。此外,它们没有考虑使用具有非传统注意力架构(例如状态空间模型)的新兴大语言模型进行评估,这些模型在应用量化时提出了不同的挑战。为了解决这些局限性,我们提出了QuBLAST,一种新颖的训练后量化方法,该方法采用块级压缩方法和激活缩放策略用于大语言模型。块级压缩方法实现了网络各块之间的混合精度量化,而激活缩放策略有效减轻了激活异常值的负面影响。具体来说,QuBLAST首先通过交叉熵损失分析预训练模型中不同注意力块的敏感性。QuBLAST利用这种敏感性分析来确定模型中每个注意力块的权重量化级别。此外,QuBLAST为每个块采用激活缩放图来控制激活值的范围并减轻激活异常值的负面影响,从而实现更好的量化结果。实验结果表明,QuBLAST在不同模型架构(即Qwen3-8B、Llama3-8B、Mistral v0.1-8B和Falcon H1R-7B)上将模型大小减少了40%-45.2%,同时在WikiText-2和WikiText-103数据集上保持性能在5%的困惑度增加之内。

英文摘要

LLMs have become the state-of-the-art algorithms for solving NLP tasks. However, they typically come at huge computational and memory costs, thus making them difficult to deploy on embedded systems. Toward this, state-of-the-art methods typically employ uniform post-training quantization (PTQ) across attention blocks of the network, hence overlooking the potential of applying different quantization levels in the same network. They also employ complex operations to mitigate the negative impact of activation outliers, hence incurring high computational overheads. Moreover, they have not considered evaluation using emerging LLMs with non-conventional attention architectures (e.g., state-space models), which pose different challenges in applying quantization. To address these limitations, we propose QuBLAST, a novel PTQ methodology that employs a block-level compression approach with an activation scaling strategy for LLMs. The block-level compression approach enables mixed-precision quantization across blocks of the network, while the activation scaling strategy efficiently mitigates the negative impact of activation outliers. Specifically, QuBLAST first analyzes the sensitivity of different attention blocks in the pre-trained model through cross-entropy loss analysis. QuBLAST leverages this sensitivity analysis to determine the weight quantization level for each attention block in the model. Furthermore, QuBLAST employs an activation scaling map for each block to control the range of activation values and mitigate the negative impact of activation outliers, thereby enabling better quantization results. Experimental results show that QuBLAST reduces model sizes by 40%-45.2% across different model architectures (i.e., Qwen3-8B, Llama3-8B, Mistral v0.1-8B, and Falcon H1R-7B), while maintaining the performance within a 5% perplexity increase for the WikiText-2 and WikiText-103 datasets.

2606.04619 2026-06-04 cs.AI cs.LO 版本更新

A Normative Intermediate Representation for ASP-Based Compliance Reasoning

基于ASP的合规推理的规范性中间表示

Yangfan Wu, Huanyu Yang, Jianmin Ji

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出MONIR,一种用于ASP合规推理的模态化输出规范性中间表示,通过分阶段操作语义和可执行编译,结合LLM辅助流程应用于中国ADAS法规,并评估提取质量与模块化增量求解效率。

详情
AI中文摘要

我们提出MONIR,一种用于基于ASP的合规推理的模态化输出规范性中间表示。其核心片段具有分阶段操作语义,而MONIR-ASP提供了可执行编译以及外部函数、时间规则和稳定模型推理的扩展。我们通过LLM辅助流程将框架实例化到中国ADAS法规和标准上。实验评估了提取质量以及模块化和增量ASP求解的效率。

英文摘要

We propose MONIR, a Modalized-Output Normative Intermediate Representation for ASP-based compliance reasoning. Its core fragment has a staged operational semantics, while MONIR-ASP provides an executable compilation and extensions for external functions, temporal rules, and stable-model reasoning. We instantiate the framework on Chinese ADAS regulations and standards with an LLM-assisted pipeline. Experiments evaluate extraction quality and the efficiency of modular and incremental ASP solving.

2606.04599 2026-06-04 cs.AI cs.CE 版本更新

Plan First, Judge Later, Run Better: A DMAIC-Inspired Agentic System for Industrial Anomaly Detection

先计划,后评判,更优运行:一种受DMAIC启发的工业异常检测智能体系统

Yongzi Yu, Ao Li, Le Wang, Ziyue Li, Fugee Tsung, Yuxuan Liang, Man Li

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) The Hong Kong University of Science and Technology(香港科技大学) Shanghai University of Finance and Economics(上海财经大学) Technische Universität München(慕尼黑工业大学) Southwestern University of Finance and Economics(西南财经大学)

AI总结 提出受DMAIC启发的多智能体系统DMAIC-IAD,通过先制定标准化操作程序(SOP)再生成策略,并引入预训练的无执行评判模型来排序候选策略,无需昂贵运行时试验,在四种模态上平均检测性能提升37.76%。

详情
AI中文摘要

大型语言模型(LLM)智能体在自动化复杂数据分析工作流方面展现出潜力,但在高风险工业场景中的可靠部署仍具挑战。工业异常检测(IAD)对制造质量、安全和效率至关重要,然而现有基于LLM的IAD智能体主要关注执行,而策略制定方面利用不足。因此,它们难以以统一且经济高效的方式处理异构模态。受DMAIC质量管理框架启发,我们提出DMAIC-IAD(受DMAIC启发的智能体工业异常检测),一种“先计划,后评判”的多智能体系统,将LLM智能体与结构化工业问题解决相结合。DMAIC-IAD在策略生成前将异构参考提炼为标准化操作程序(SOP),并引入预训练的无执行评判模型,无需昂贵的运行时试验即可对候选策略进行排序。跨四种模态的大量实验表明,DMAIC-IAD在适用智能体基线上平均检测性能提升37.76%。

英文摘要

Large language model (LLM) agents have shown promise in automating complex data-analysis workflows, but their reliable deployment remains challenging in high-stakes industrial scenarios. Industrial anomaly detection (IAD) is essential for manufacturing quality, safety, and efficiency, yet existing LLM-based IAD agents mainly focus on execution while under-exploiting strategy formulation. Consequently, they struggle to handle heterogeneous modalities in a unified and cost-effective manner. Inspired by the DMAIC quality-management framework, we propose DMAIC-IAD (DMAIC-inspired Agentic Industrial Anomaly Detection), a "Plan First, Judge Later" multi-agent system that aligns LLM agents with structured industrial problem-solving. DMAIC-IAD distills heterogeneous references into standardized operating procedures (SOPs) before strategy generation, and introduces a pre-trained execution-free judge model to rank candidate strategies without costly runtime trials. Extensive experiments across four modalities show that DMAIC-IAD improves average detection performance over applicable agentic baselines by 37.76%.

2606.04597 2026-06-04 cs.AI 版本更新

Learning Admissible Heuristics via Cost Partitioning

通过成本划分学习可采纳启发式

Hugo Barral, Quentin Cappart, Marie-José Huguet, Sylvie Thiébaux

发表机构 * UCLouvain, Louvain-la-Neuve, Belgium(鲁汶大学(UCLouvain),新鲁汶,比利时) Australian National University, Canberra, Australia(澳大利亚国立大学,堪培拉,澳大利亚)

AI总结 提出一个框架,利用成本划分与乘子预测的拉格朗日对偶等价性,通过图编码和自注意力网络学习可采纳成本划分,从而生成首个保证可采纳性的机器学习启发式。

详情
AI中文摘要

可采纳启发式对于最优规划至关重要,但由于存在高估风险,学习它们仍然具有挑战性。成本划分在保持可采纳性的同时结合多个抽象启发式,但在线计算最优划分代价高昂。我们提出了一个框架,通过利用成本划分与乘子预测之间的拉格朗日对偶等价性,学习推断可采纳成本划分。规划状态和模式被编码为带标签的图,并使用Weisfeiler-Leman算法的动作中心变体提取结构特征向量。一个具有轴向自注意力和softmax输出层的深度架构将这些特征映射到成本权重,这些权重通过构造满足划分约束,从而确保可采纳性。实验表明,与次优划分基线相比,节点扩展减少,同时保持严格的可采纳性。据我们所知,这是第一个保证可采纳性的机器学习启发式。

英文摘要

Admissible heuristics are essential for optimal planning, yet learning them remains challenging due to the risk of overestimation. Cost partitioning combines multiple abstraction heuristics while preserving admissibility, but computing optimal partitions online is expensive. We propose a framework that learns to infer admissible cost partitions by leveraging the Lagrangian dual equivalence between cost partitioning and multiplier prediction. Planning states and patterns are encoded as labelled graphs, and an action-centric variant of the Weisfeiler-Leman algorithm extracts structural feature vectors. A deep architecture with axial self-attention and a softmax output layer maps these features to cost weights that satisfy the partition constraints by construction, ensuring admissibility. Experiments demonstrate reduced node expansions compared to suboptimal partitioning baselines while maintaining strict admissibility. To our knowledge, this is the first machine-learned heuristic guaranteed to be admissible.
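
The claim that a softmax output layer satisfies the partition constraints "by construction" can be made concrete with a small sketch. It is not the paper's architecture: the logits below are hand-written stand-ins for network outputs, and `partition_costs` is a hypothetical helper. The only idea it shows is that softmax weights are non-negative and sum to one per action, so each action's cost is split across abstractions without exceeding the original cost.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def partition_costs(logits: List[List[float]], costs: List[float]) -> List[List[float]]:
    """Split each action cost across abstractions.

    logits[a][i] is the (assumed) score for action a and abstraction i.
    Returns parts[i][a], the cost of action a as seen by abstraction i.
    """
    n_abstractions = len(logits[0])
    parts = [[0.0] * len(costs) for _ in range(n_abstractions)]
    for a, (row, cost) in enumerate(zip(logits, costs)):
        for i, weight in enumerate(softmax(row)):
            parts[i][a] = weight * cost
    return parts


# Two actions, two abstractions; the scores are made up for illustration.
logits = [[0.0, 0.0], [2.0, 0.0]]
costs = [4.0, 1.0]
parts = partition_costs(logits, costs)
```

For every action, the partitioned costs are non-negative and add up to the original cost, which is the standard condition under which summing the abstraction heuristics remains admissible.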

2606.04594 2026-06-04 cs.DC cs.AI cs.SE 版本更新

Ekka: Automated Diagnosis of Silent Errors in LLM Inference

Ekka: LLM推理中静默错误的自动诊断

Yile Gu, Zhen Zhang, Shaowei Zhu, Xinwei Fu, Jun Wu, Yida Wang, Baris Kasikci

发表机构 * University of Science and Technology of China(中国科学技术大学) Tsinghua University(清华大学)

AI总结 提出Ekka系统,通过差分调试对齐比较中间执行状态,自动诊断LLM推理框架中的静默错误,在真实错误基准上达到80% pass@1和88% pass@5的诊断准确率。

Comments ICML 2026

详情
AI中文摘要

LLM服务框架随着复杂的软件栈和大量优化而快速发展。快速开发过程可能引入静默错误,即输出质量在没有任何显式错误信号的情况下悄然下降。由于高层症状与底层根本原因之间存在巨大的语义鸿沟,诊断静默错误非常困难。我们观察到,通过利用语义正确的参考实现,静默错误的诊断可以有效地构建为差分调试问题。我们提出了Ekka,一个自动诊断系统,通过系统地对齐和比较目标框架与参考框架之间的中间执行状态来识别根本原因。我们构建了一个来自流行服务框架的真实静默错误基准,Ekka显示出80%的pass@1诊断准确率和88%的pass@5诊断准确率,优于现有系统。Ekka还诊断了服务框架中的4个新静默错误,所有错误均已得到开发者确认。

英文摘要

LLM serving frameworks are quickly evolving with a complex software stack and a vast number of optimizations. The rapid development process can introduce silent errors where output quality silently degrades without any explicit error signals. Diagnosing silent errors is notoriously difficult due to the substantial semantic gap between the high-level symptoms and the low-level root causes. We observe that diagnosis of silent errors can be effectively framed as a differential debugging problem by leveraging the existence of semantically correct reference implementations. We propose Ekka, an automated diagnosis system that identifies root causes by systematically aligning and comparing intermediate execution states between a target and a reference framework. We constructed a benchmark of real-world silent errors from popular serving frameworks, where Ekka shows 80% pass@1 diagnosis accuracy and 88% pass@5 diagnosis accuracy, outperforming state-of-the-art systems. Ekka also diagnoses 4 new silent errors from serving frameworks, all of which have been confirmed by the developers.
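
The differential-debugging framing can be illustrated with a toy comparison of intermediate states. This sketch is not Ekka's algorithm: it assumes the hard part (aligning checkpoints between the two frameworks) has already been done, and the checkpoint names, the tolerance, and the helper `first_divergence` are all hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# (checkpoint name, flattened tensor values); both are illustrative stand-ins.
State = Tuple[str, Sequence[float]]


def first_divergence(target: List[State], reference: List[State],
                     tol: float = 1e-3) -> Optional[str]:
    """Return the first aligned checkpoint whose values differ by more than tol."""
    for (t_name, t_vals), (r_name, r_vals) in zip(target, reference):
        if t_name != r_name or len(t_vals) != len(r_vals):
            return t_name  # misaligned or differently shaped state
        max_diff = max((abs(a - b) for a, b in zip(t_vals, r_vals)), default=0.0)
        if max_diff > tol:
            return t_name
    return None


reference = [("embed", [0.1, 0.2]), ("attn_0", [0.5, 0.6]), ("mlp_0", [1.0, 1.1])]
target = [("embed", [0.1, 0.2]), ("attn_0", [0.5, 0.9]), ("mlp_0", [1.4, 1.1])]
culprit = first_divergence(target, reference)
print(culprit)
```

Reporting the earliest divergent checkpoint rather than every mismatch matters because errors propagate: once `attn_0` is wrong, everything downstream differs too.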

2606.04592 2026-06-04 cs.CY cs.AI cs.HC 版本更新

Synthetic Personalities: How Well Can LLMs Mimic Individual Respondents Using Socio-Economic Microdata?

合成人格:LLM 如何使用社会经济微观数据模仿个体受访者?

Leonard Kinzinger, Jochen Hartmann

发表机构 * Technical University of Munich(慕尼黑工业大学)

AI总结 研究利用德国社会经济面板数据构建个体级数字孪生,通过评估不同构建方法(模型、信息深度、嵌入方式、推理模式)对200万以上孪生响应的准确性,发现信息深度在75%熵分位数达到成本效益帕累托点,最佳单元准确率达78.8%。

详情
AI中文摘要

基于LLM的数字孪生有望扩展和加速市场研究,但大多数已发表的孪生要么是基于少数人口统计问题的粗略角色机器人,要么是基于专门收集的调查和访谈记录构建的详细个体级孪生。这两种设置都不涉及营销实践中操作上最相关的情况:从企业通过CRM系统、忠诚度计划和重复调查积累的现有异构面板数据中构建详细的个体孪生。我们从德国社会经济面板(SOEP)构建详细的个体级孪生,并在一个$3 \times 5 \times 2 \times 2$的构建方法网格中评估它们,该网格涵盖三个开放权重的LLM、五个按归一化香农熵排序的累积信息深度、两种嵌入方法和两种推理模式,对500名参与者和183个保留问题评分超过210万个孪生响应。孪生质量随信息深度提高,但超过75%熵分位数后收益递减,该分位数相对于性能最佳的100%单元充当成本效益帕累托点。将嵌入从叙述性角色摘要切换到原始对话历史(过去响应)在100%深度下每个模型-推理单元中提高了保留准确率,而显式思考模式提高了秩次相关性但不改变准确率。最佳单元准确率达到78.8%,Fisher-$z$相关性在SOEP保留评估集上达到$r = 0.590$。研究结果表明,基于孪生的市场研究不再受数据设计限制,而是受项目数量、模型选择和本文现在映射的一小部分构建级决策限制。

英文摘要

LLM-based digital twins promise to scale and accelerate market research, but most published twins are either coarse persona bots conditioned on a few demographic questions or detailed individual-level twins built on purpose-collected surveys and interview transcripts. Neither setup speaks to the operationally most relevant case for marketing practice: building detailed individual twins from the pre-existing heterogeneous panel data that firms already accumulate through CRM systems, loyalty programs, and repeat surveys. We construct detailed individual-level twins from the German Socio-Economic Panel (SOEP) and evaluate them across a $3 \times 5 \times 2 \times 2$ construction-method grid that covers three open-weights LLMs, five cumulative information depths ranked by normalized Shannon entropy, two embedding methods, and two reasoning modes, scoring over 2.1 million twin responses on 500 participants and 183 held-out questions. Twin quality rises with information depth but with diminishing returns past the 75 percent entropy quartile, which acts as a cost-efficient Pareto point relative to the best-performing 100 percent cells. Switching the embedding from a narrative persona summary to a raw dialog history of past responses raises hold-out accuracy in every model-by-reasoning cell at the 100 percent depth, while an explicit thinking mode raises rank-order correlation without moving accuracy. Best-cell accuracy reaches 78.8 percent and Fisher-$z$ correlation reaches $r = 0.590$ on the SOEP held-out evaluation set. The findings suggest that twin-based market research is no longer gated by data design, but by item volume, model selection, and a small set of construction-level decisions that this paper now maps.
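
Ranking survey items by normalized Shannon entropy can be sketched as follows. The abstract does not give the exact normalization or ordering, so two assumptions are made here: entropy is divided by the log of the number of observed answer categories, and higher-entropy items are ranked first. The item names and responses are invented for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def normalized_entropy(responses: Sequence[str]) -> float:
    """Shannon entropy of the answer distribution, scaled to [0, 1]."""
    counts = Counter(responses)
    n = len(responses)
    k = len(counts)
    if k <= 1:
        return 0.0  # a constant item carries no information
    h = -sum((c / n) * math.log(c / n) for c in counts.values())
    return h / math.log(k)


# Hypothetical panel items with their observed answers.
items: Dict[str, List[str]] = {
    "employment_status": ["a", "a", "a", "b"],
    "favourite_colour": ["r", "g", "b", "y"],
    "country": ["de", "de", "de", "de"],
}
ranked = sorted(items, key=lambda name: normalized_entropy(items[name]), reverse=True)
print(ranked)
```

Cumulative information depths would then correspond to taking growing prefixes of `ranked`, for example the top 25, 50, or 75 percent of items.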

2606.04579 2026-06-04 cs.AI 版本更新

SCI-PRM: A Tool Aware Process Reward Model for Scientific Reasoning Verification

SCI-PRM:用于科学推理验证的工具感知过程奖励模型

Xiangyu Zhao, Hengyuan Zhao, Yiheng Wang, Wanghan Xu, Yuhao Zhou, Qinglong Cao, Zhiwang Zhou, Lei Bai, Wenlong Zhang, Xiao-Ming Wu

发表机构 * The Hong Kong Polytechnic University(香港理工大学) Shanghai AI Lab(上海人工智能实验室) National University of Singapore(新加坡国立大学) Shanghai Jiao Tong University(上海交通大学) Sichuan University(四川大学) Tongji University(同济大学)

AI总结 针对科学推理中工具使用和事实一致性问题,提出Sci-PRM模型,通过构建包含工具链轨迹的数据集SCIPRM70K并训练过程奖励模型,在测试时扩展和强化学习中提供细粒度监督,提升基础模型性能。

Comments Accepted by KDD 2026 AI4Science Track

详情
AI中文摘要

虽然过程奖励模型(PRM)在数学推理中取得了显著成功,但它们在复杂科学领域(如生物学、化学和物理学)的应用仍基本未被探索。科学问题不仅要求逻辑严谨,还要求事实一致性和领域特定工具的精确使用,而当前模型在这些方面常常出现幻觉且缺乏验证。在本文中,我们首先构建了SCIPRM70K,这是一个大规模数据集,包含显式地将推理与科学工具执行交错的工具链轨迹。在此基础上,我们训练了一个名为Sci-PRM的高效奖励模型,以在单次推理的每一步提供关于工具选择、执行准确性和结果解释的细粒度监督。实验表明,Sci-PRM在两个关键方面显著增强了基础模型:(1)通过Best-of-N选择实现有效的测试时扩展;(2)当集成到强化学习中时,它作为密集奖励信号,缓解了优势消失的关键问题,使模型能够突破现有性能上限。

英文摘要

While Process Reward Models (PRMs) have achieved remarkable success in mathematical reasoning, their application in complex scientific domains, such as biology, chemistry, and physics, remains largely unexplored. Scientific problems demand not only logical rigor but also factual consistency and the precise usage of domain-specific tools, areas where current models often suffer from hallucinations and lack of verification. In this paper, we first construct SCIPRM70K, a large-scale dataset featuring Chain-of-Tool trajectories that explicitly interleave reasoning with the execution of scientific tools. Building upon this, we train an efficient reward model called Sci-PRM to provide fine-grained supervision on tool selection, execution accuracy, and result interpretation at each step in one inference. Experiments demonstrate that Sci-PRM significantly enhances foundation models in two key aspects: (1) it enables effective test-time scaling via Best-of-N selection; and (2) when integrated into Reinforcement Learning, it serves as a dense reward signal that mitigates the critical issue of advantage disappearance, allowing the model to break through existing performance ceilings.

2606.04562 2026-06-04 cs.AI cs.LG cs.SI 版本更新

Neetyabhas: A Framework for Uncertainty-Aware Public Policy Optimization in Rational Agent-Based Models

Neetyabhas: 理性主体模型中不确定性感知的公共政策优化框架

Janani Venugopalan, Gaurav Deshkar, Rishabh Gaur, Harshal Hayatnagarkar, Jayanta Kshirsagar

发表机构 * ThoughtWorks

AI总结 提出一种集成流行病测量和政策执行不确定性的分层强化学习框架,通过模拟个体行为与政策干预的交互,有效管理疫情并降低影响。

详情
AI中文摘要

目的:世界卫生组织的COVID-19非药物干预措施(如封锁、疫苗接种)有效遏制了传播,但带来了沉重的经济负担。现有研究常常忽略个体行为,并错误地假设完美的感染追踪和无误的政策执行,未能考虑现实世界的不确定性和错误。方法:我们提出了一种整合流行病测量(感染/住院)和政策执行中不确定性的方法。我们构建了一个包含1000名个体的模拟模型,这些个体实时做出关于佩戴口罩、接种疫苗和购物的选择。同时,政策制定者基于健康和经济观察部署干预措施(封锁、强制令)。该框架由分层强化学习智能体驱动,利用深度Q网络以及不确定性感知的策略梯度变体(DDPG和TD3)。结果:模拟有效管理了疫情的进展。佩戴口罩和疫苗接种被证明非常有效,显著降低了疫情高峰的高度和持续时间。通过整合个体行为、政策不确定性和多方面的干预措施,我们的动态控制方法成功减轻了疫情的影响。结论:我们的模型通过将不确定性和人类行为嵌入公共卫生政策框架,克服了以往研究的局限性。模拟表明,考虑个体选择和不完美数据对于设计复杂疫情期间的有效干预措施至关重要,其中口罩和疫苗是关键工具。

英文摘要

Purpose: The WHO's COVID-19 non-pharmaceutical interventions (e.g., lockdowns, vaccinations) effectively curb transmission but impose heavy economic strains. Existing research often neglects individual behaviors and falsely assumes perfect infection tracking and flawless policy execution, failing to account for real-world uncertainties and errors. Methods: We propose an integrative approach incorporating uncertainties in both epidemic measurement (infections/hospitalizations) and policy implementation. We built a simulation model of 1,000 individuals making real-time choices regarding mask-wearing, vaccination, and shopping. Concurrently, policymakers deploy interventions (lockdowns, mandates) based on health and economic observations. This framework is driven by hierarchical reinforcement learning agents, utilizing deep Q-networks alongside uncertainty-aware policy gradient variants (DDPG and TD3). Results: The simulations effectively managed the epidemic's progression. Masking and vaccinations proved highly effective, significantly reducing both the outbreak's peak height and duration. By integrating individual behaviors, policy uncertainties, and multifaceted interventions, our dynamic control approach successfully mitigated the epidemic's impact. Conclusions: Our model overcomes previous research limitations by embedding uncertainty and human behavior into public health policy frameworks. The simulation demonstrates that accounting for individual choices and imperfect data is crucial for designing effective interventions during complex pandemics, with masks and vaccines serving as pivotal tools.

2606.04555 2026-06-04 cs.CL cs.AI 版本更新

Temporal Order Matters for Agentic Memory: Segment Trees for Long-Horizon Agents

时间顺序对智能体记忆至关重要:面向长程智能体的线段树

Yifan Simon Liu, Liam Gallagher, Faeze Moradi Kalarde, Jiazhou Liang, Armin Toroghi, Scott Sanner

发表机构 * University of Toronto(多伦多大学) Vector Institute for Artificial Intelligence(人工智能向量研究所)

AI总结 提出线段树记忆架构SegTreeMem,通过在线右边缘更新规则保持对话历史的时间顺序,结合层次化时间上下文进行检索,在长程记忆基准上优于现有方法。

详情
AI中文摘要

长程对话智能体需要通过与用户交互不断演化的事件、任务和目标进行互动。这些历史记录本质上是时间性的,然而许多现有的记忆系统主要按主题相似性组织信息,可能忽略事件发生的顺序。我们引入线段树记忆(Segment Tree Memory,简称SegTreeMem),这是一种将对话历史表示为按时间顺序排列的线段树的记忆架构。SegTreeMem通过在线最右边缘更新规则逐步插入新话语,在形成层次化记忆片段的同时保持时间顺序。在检索时,SegTreeMem通过树传播相关性分数,将局部语义匹配与层次化时间上下文相结合。在三个长程记忆基准和两个LLM骨干网络上,SegTreeMem在答案质量上优于平面检索、图结构记忆和树结构记忆基线。额外的时间顺序排列分析表明,性能提升依赖于在记忆构建过程中保持时间顺序,这支持了时间顺序是智能体记忆关键结构的观点。

英文摘要

Long-horizon conversational agents need to interact with users through evolving events, tasks, and goals. Such histories are naturally temporal, yet many existing memory systems organize information primarily by topical similarity and may ignore the order in which events occur. We introduce Segment Tree Memory, or SegTreeMem, a memory architecture that represents conversation history as a temporally ordered Segment Tree over utterances. SegTreeMem incrementally inserts new utterances through an online rightmost-frontier update rule, preserving chronological order while forming hierarchical memory segments. For retrieval, SegTreeMem propagates relevance scores through the tree to combine local semantic matching with hierarchical temporal context. Across three long-horizon memory benchmarks and two LLM backbones, SegTreeMem improves answer quality over flat retrieval, graph-structured memory, and tree-structured memory baselines. Additional temporal-order permutation analysis shows that the performance gain depends on preserving temporal order during memory construction, supporting the claim that temporal order is a key structure for agentic memory.

2606.04536 2026-06-04 cs.AI 版本更新

Scaling Self-Evolving Agents via Parametric Memory

通过参数化内存扩展自进化智能体

Tao Ren, Weiyao Luo, Hui Yang, Rongzhi Zhu, Xiang Huang, Yuchuan Wu, Bingxue Chou, Jieping Ye, Jiafeng Liang, Yongbin Li, Yijie Peng

发表机构 * Alibaba Group(阿里巴巴集团) Peking University(北京大学)

AI总结 提出TMEM框架,通过在线更新LoRA权重使智能体从经验中学习,从而在单轮交互中改变未来行为,并在多个基准上优于基于摘要和检索的方法。

详情
AI中文摘要

现有的内存增强型LLM智能体仅在提示空间中存储过去经验,作为文本摘要或检索段落,同时在整个运行过程中保持模型参数冻结。这类智能体可以\emph{查找}它们所见过的东西,但无法\emph{从中学习}:它们的策略不会因经验而改变,任何从上下文中丢弃的信息都会永久丢失。我们引入\texttt{TMEM},一个自进化的参数化内存框架,其中智能体不仅将历史压缩为显式内存,还通过轻量级在线更新将提炼的监督吸收到快速LoRA权重$Δ_t$中,从而在单个回合内真正改变其未来行为。我们将其形式化为具有快速权重运行动态的智能体决策过程:动作从$π_{θ_0+Δ_t}$中采样,而提取动作产生监督,更新$Δ_t$以用于后续决策。这种观点使得提取策略可以直接通过RL优化:训练$θ_0$不仅改进了任务动作,还提高了用于在线LoRA适应的数据质量。我们进一步提出基于SVD的LoRA子空间初始化以加速在线收敛。在LoCoMo、LongMemEval-S、多目标搜索和CL-Bench上的实验表明,\texttt{TMEM}在不同模型规模下始终优于基于摘要和基于检索的基线。

英文摘要

Existing memory-augmented LLM agents store past experience exclusively in prompt space, as textual summaries or retrieved passages, while keeping model parameters frozen throughout a rollout. Such agents can \emph{look up} what they have seen but cannot \emph{learn from} it: their policy is unchanged by experience, and any information dropped from the context is permanently lost. We introduce \texttt{TMEM}, a self-evolving parametric memory framework in which the agent not only compresses history into explicit memory but also absorbs distilled supervision into fast LoRA weights $Δ_t$ via lightweight online updates, genuinely altering its future behavior within a single episode. We formalize this as an agentic decision process with fast-weight rollout dynamics: actions are sampled from $π_{θ_0+Δ_t}$, while extraction actions produce supervision that updates $Δ_t$ for subsequent decisions. This view makes the extraction policy directly optimizable by RL: training $θ_0$ improves not only task actions but also the quality of the data used for online LoRA adaptation. We further propose SVD-based initialization of the LoRA subspace to accelerate online convergence. Experiments on LoCoMo, LongMemEval-S, multi-objective search, and CL-Bench show that \texttt{TMEM} consistently outperforms summary-based and retrieval-based baselines across different model scales.

2606.04535 2026-06-04 cs.CL cs.AI 版本更新

Dynamic Infilling Anchors for Format-Constrained Generation in Diffusion Large Language Models

扩散大语言模型中用于格式约束生成的动态填充锚点

Boyan Han, Yiwei Wang, Yi Song, Yujun Cai, Chi Zhang

发表机构 * AGI Lab, Westlake University, China(西湖大学AGI实验室,中国) University of California, Merced, USA(加州大学默塞德分校,美国) Teeni AI, China(Teeni AI,中国) The University of Queensland, Australia(昆士兰大学,澳大利亚)

AI总结 提出动态填充锚点(DIA),一种无需训练的方法,通过动态估计结束锚点位置调整生成长度,确保格式约束下的结构正确性和语义连贯性,在GSM8K和MATH上实现零样本性能提升。

Comments Accepted to the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026)

详情
AI中文摘要

扩散大语言模型(dLLMs)提供双向注意力和并行生成,使其能够利用全局上下文并自然支持格式约束任务,如可解析的JSON或推理模板。虽然直接的固定锚点可以强制执行此类约束,但它们通常强加刚性跨度,导致推理截断或内容冗余。为了克服这一点,我们提出了动态填充锚点(DIA),一种无需训练的方法,在迭代填充之前动态估计结束锚点位置以调整生成长度。这种灵活机制确保了结构正确性和语义连贯性,避免了固定跨度方法的低效。在推理基准上的实验表明,DIA显著提高了格式合规性和答案准确性,在GSM8K和MATH上实现了显著的零样本增益。这些结果确立了DIA作为通往可靠、结构感知生成的一条稳健路径。

英文摘要

Diffusion large language models (dLLMs) offer bidirectional attention and parallel generation, enabling them to exploit global context and naturally support format-constrained tasks like parseable JSON or reasoning templates. While straightforward fixed anchors can enforce such constraints, they often impose rigid spans, leading to truncated reasoning or redundant content. To overcome this, we propose Dynamic Infilling Anchors (DIA), a training-free method that dynamically estimates end-anchor positions to adjust generation length before iterative infilling. This flexible mechanism ensures structural correctness and semantic coherence, avoiding the inefficiencies of fixed-span methods. Experiments on reasoning benchmarks demonstrate that DIA substantially improves format compliance and answer accuracy, achieving significant zero-shot gains on GSM8K and MATH. These results establish DIA as a robust pathway toward reliable, structure-aware generation.

2606.04528 2026-06-04 cs.CV cs.AI 版本更新

Optical-Guided Neural Collapse for SAR Few-Shot Class Incremental Learning

面向SAR少样本类增量学习的光学引导神经坍缩

Fan Zhang, Sijin Zheng, Fei Ma, Qiang Yin, Yongsheng Zhou, Fei Gao, Xian Sun

发表机构 * Beihang University, Beijing 100191, China(北京航空航天大学,北京100191,中国) Aerospace Information Research Institute, Chinese Academy of Sciences, Beijing 100190, China(中国科学院空天信息创新研究院,北京100190,中国)

AI总结 针对SAR图像少样本类增量学习中的数据稀缺和方位角敏感问题,提出利用光学ATR数据集的正交子空间作为几何先验,通过投影损失和分类器损失联合诱导神经坍缩,实现特征紧凑性和类间可分离性。

Comments 16 pages, 6 figures

详情
AI中文摘要

合成孔径雷达图像中的少样本类增量学习由于严重的数据稀缺和SAR特有的变异性而面临独特挑战。特别是,SAR中强烈的方位角敏感性导致大的类内变异和类间混淆,而FSCIL的顺序更新进一步导致先前学习类别的灾难性遗忘。受神经坍缩启发,我们提出了一种光学引导的SAR FSCIL框架,该框架从数据丰富的光学ATR数据集中推导出正交特征子空间,并将其作为几何先验来指导SAR特征学习。通过主角约束将SAR特征投影到这些正交子空间上,有效地将判别结构从光学域转移到SAR域。具体地,我们的投影损失和用冻结的单纯形ETF几何优化的分类器损失通过将特征集中在类均值周围同时保持大的类间角度,联合诱导神经坍缩。我们在一个包含光学ATR数据集和具有24个目标类别的SAR ATR数据集的基准上评估该方法,该基准组织为一个基础训练会话和七个增量会话。与最近的FSCIL方法(包括NCFSCIL等)相比,我们的方法实现了最高的最终准确率以及最终性能与性能下降之间的有利权衡。此外,神经坍缩指标显示类内紧凑性和类间可分离性得到改善,表明学习到的特征更接近理想的单纯形ETF几何。

英文摘要

Few-shot class-incremental learning (FSCIL) in synthetic aperture radar imagery presents unique challenges due to severe data scarcity and SAR-specific variability. In particular, strong azimuth sensitivity in SAR induces large intra-class variation and inter-class confusion, and FSCIL sequential updates further lead to catastrophic forgetting of previously learned classes. Inspired by neural collapse, we propose an optical-guided SAR FSCIL framework, which derives orthogonal feature subspaces from a data-rich optical ATR dataset and uses them as geometric priors to guide SAR feature learning. SAR features are projected onto these orthogonal subspaces via principal angle constraints, effectively transferring discriminative structure from the optical to the SAR domain. Specifically, our projection loss and the classifier loss optimized with a frozen simplex-ETF geometry jointly induce neural collapse by concentrating features around class means while maintaining large inter-class angles. We evaluate the approach on a benchmark comprising an optical ATR dataset and a SAR ATR dataset with 24 target classes, organized into a base training session and seven incremental sessions. Compared with recent FSCIL methods such as NCFSCIL, our method achieves the highest final accuracy and a favorable trade-off between final performance and performance degradation. Moreover, neural collapse metrics show improved intra-class compactness and inter-class separability, indicating that the learned features more closely approximate the ideal simplex-ETF geometry.

2606.04522 2026-06-04 cs.IR cs.AI cs.DB cs.LG 版本更新

ANN Search: Recall What Matters

ANN搜索:召回真正重要的

Dimitris Dimitropoulos, Nikos Mamoulis

发表机构 * University of Ioannina(约阿尼纳大学) Archimedes, Athena RC(Archimedes,Athena研究中心)

AI总结 本文提出用逆近似比1/Ratio@k替代Recall@k来评估近似最近邻搜索质量,实验表明前者能更准确反映实际效用并降低计算开销。

详情
AI中文摘要

近似最近邻(ANN)搜索已成为信息检索和现代机器学习任务(从分类到检索增强生成)的核心原语。社区主要通过给定Recall@k(检索到的真实精确最近邻的比例)下的吞吐量来评估和调优ANN算法。我们认为,ANN搜索真正重要的是检索结果的质量,而非它们与真实kNN集合的重叠。我们证明,使用Recall@k评估检索质量会带来不必要的计算开销,并研究用逆近似比1/Ratio@k替代它。1/Ratio@k评估检索到的邻居与真实邻居之间距离的差异。它无需评判器、无需超参数,仅通过标准ANN基准输入即可计算。我们在涵盖广泛内在维度的多样化数据集上对最先进的ANN算法进行基准测试,从效率、下游分类和检索增强生成三个维度全面评估这两个指标。在效率方面,优化1/Ratio@k达到操作质量阈值所需的计算成本远低于Recall@k。在下游任务中,即使Recall@k显著下降,性能指标(标签精度、语义相似度、BERTScore和LLM评分质量)仍保持高度稳定。相反,逆近似比紧密反映了这种稳定性,比Recall@k更好地追踪实际效用。最终,虽然Recall@k夸大了近似的真实成本,但1/Ratio@k提供了更准确、可部署的ANN实际质量代理。

英文摘要

Approximate nearest neighbor (ANN) search has become a core primitive in information retrieval and modern machine learning tasks, from classification to retrieval-augmented generation. The community evaluates and tunes ANN algorithms primarily on their throughput at a given Recall@k, the fraction of true exact neighbors retrieved. We argue that what really matters in ANN search is the quality of the retrieved results and not their overlap with the true kNN set. We show that using Recall@k to assess retrieval quality forces unnecessary computational overhead and investigate replacing it by 1/Ratio@k, the inverse approximation ratio. 1/Ratio@k evaluates the differences between the distances of the retrieved and true neighbors. It is judge-free, hyperparameter-free, and computable from standard ANN benchmark inputs alone. We benchmark state-of-the-art ANN algorithms across diverse datasets spanning a wide range of intrinsic dimensionalities, evaluating the two metrics comprehensively across efficiency, downstream classification, and retrieval-augmented generation. On the efficiency axis, optimizing for 1/Ratio@k reaches operational quality thresholds at a substantially lower computational cost than Recall@k. In downstream tasks, performance indicators (label precision, semantic similarity, BERTScore, and LLM-graded quality) remain highly stable even when Recall@k drops significantly. The inverse approximation ratio, on the other hand, closely mirrors this stability, tracking true utility much better than Recall@k. Ultimately, while Recall@k overstates the true cost of approximation, 1/Ratio@k offers a more accurate, deployable proxy for actual ANN quality.
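
The contrast between the two metrics is easiest to see side by side. The sketch below assumes one common definition of the approximation ratio (the mean, over ranks, of retrieved distance divided by true distance); the paper's exact formula may differ, and both function names are illustrative.

```python
from typing import Sequence


def recall_at_k(retrieved_ids: Sequence[int], true_ids: Sequence[int]) -> float:
    """Fraction of the true k nearest neighbors that were retrieved."""
    return len(set(retrieved_ids) & set(true_ids)) / len(true_ids)


def inverse_ratio_at_k(retrieved_dists: Sequence[float],
                       true_dists: Sequence[float]) -> float:
    """1 / mean_i(d_retrieved_i / d_true_i), pairing neighbors by rank (assumed form)."""
    r, t = sorted(retrieved_dists), sorted(true_dists)
    if len(r) != len(t) or any(d <= 0 for d in t):
        raise ValueError("expects k positive distances on each side")
    ratio = sum(rd / td for rd, td in zip(r, t)) / len(t)
    return 1.0 / ratio


# Toy query with k = 2: the second retrieved point is a near miss, not a true neighbor.
true_ids, true_dists = [1, 2], [1.0, 2.0]
retrieved_ids, retrieved_dists = [1, 3], [1.0, 2.1]

recall = recall_at_k(retrieved_ids, true_ids)
inv_ratio = inverse_ratio_at_k(retrieved_dists, true_dists)
print(recall, round(inv_ratio, 3))
```

Here Recall@k drops to 0.5 because one identifier differs, while 1/Ratio@k stays at about 0.976 because the substitute neighbor is almost as close as the true one. That gap is the behavior the abstract argues for.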

2606.04517 2026-06-04 cs.NI cs.AI 版本更新

Treat Traffic Like Trees: A Semantic-Preserving Hierarchical Graph-Based Expert Framework for Encrypted Traffic Analysis

像对待树一样对待流量:一种用于加密流量分析的语义保持分层图专家框架

Yuantu Luo, Jun Tao, Linxiao Yu, Guang Cheng

发表机构 * School of Cyber Science and Engineering, Southeast University(东南大学网络安全科学与工程学院) Purple Mountain Laboratories(紫金山实验室) Engineering Research Center of Blockchain Application, Supervision and Management (Southeast University)(区块链应用、监督与管理工程研究中心(东南大学)) Engineering Research Center of Security for Ubiquitous Network, Jiangsu Province(江苏省泛在网络安全工程研究中心)

AI总结 提出一种基于协议树图注意力与专家混合的语义保持分层图专家框架(PTGAMoE),通过字段级图构建和专家委员会设计,在严格无数据泄露设置下显著优于现有模型,并提供可解释的协议级特征重要性分析。

Comments This work has been submitted to the IEEE for possible publication

详情
AI中文摘要

基于图的深度学习方法已被广泛应用于加密流量分析,以利用不同粒度下的潜在相关性。然而,复杂的预处理流程和精细的模型结构虽然通常能取得良好性能,但在表示学习过程中可能掩盖固有的协议语义。此外,由协议规范定义并在人工流量分析中常规使用的协议层及其对应字段的分层结构,在现有学习框架中仍未得到充分探索。在本文中,我们提出了一种用于加密流量分析的语义保持分层图专家框架——协议树图注意力与专家混合(PTGAMoE)。基于字段的图构建和专家委员会设计使PTGAMoE能够量化模型对特定字段和协议的偏好。在严格无数据泄露设置下,对代表性基准数据集的大量实验结果表明,PTGAMoE显著优于最先进的模型。此外,语义保持设计提供了关于协议级特征重要性和专家级贡献的可解释性洞察,反映了模型在加密流量分类任务中的决策逻辑。

英文摘要

Graph-based deep learning methods have been widely employed in encrypted traffic analysis to exploit latent correlations across different granularities. However, while complex preprocessing pipelines and sophisticated model structures often achieve strong performance, they may obscure inherent protocol semantics during representation learning. Moreover, the hierarchical structure of protocol layers and their corresponding fields, defined by protocol specifications and routinely utilized in manual traffic analysis, remains underexplored in existing learning frameworks. In this paper, we propose Protocol Tree Graph Attention with Mixture of Experts (PTGAMoE), a semantic-preserving hierarchical graph-based expert framework for encrypted traffic analysis. The field-based graph construction and expert committee design enable PTGAMoE to quantify the model's preferences for specific fields and protocols. Extensive experimental results on representative benchmark datasets under strict no-data-leakage settings demonstrate that PTGAMoE significantly outperforms state-of-the-art (SOTA) models. Furthermore, the semantic-preserving design provides interpretable insights into protocol-level feature importance and expert-level contributions, reflecting the model's decision-making logic in encrypted traffic classification tasks.

2606.04516 2026-06-04 cs.LG cs.AI 版本更新

GeoMin: Data-Efficient Semi-Supervised RLVR via Geometric Distribution Modeling

GeoMin: 基于几何分布建模的数据高效半监督RLVR

Guangcheng Zhu, Shenzhi Yang, Haobo Wang, Xing Zheng, Yingfan MA, Xuening Feng, Zhongqi Chen, Kai Tang, Zhengqing Zang, Bowen Song, Weiqiang Wang, Gang Chen

发表机构 * Zhejiang University(浙江大学) Ant Group(蚂蚁集团)

AI总结 提出GeoMin方法,通过建模标注数据的全局特征分布来解码正确与错误展开的结构差异,从而建立稳健先验评估自奖励信号可靠性,以少量标注数据高效利用未标注数据,在仅用10%标注时超越全监督模型。

详情
AI中文摘要

基于可验证奖励的强化学习(RLVR)显著提升了LLM的推理能力,但面临困境:标准监督扩展受限于高标注成本,而无监督替代方案则遭受严重的模型崩溃。最近的半监督RLVR方法通过使用少量标注集指导未标注数据,在训练效果和标注成本之间取得了有前景的权衡。然而,由于依赖粗糙的性能启发式,它们遭受严重的数据效率瓶颈,导致绝大多数有价值实例未被充分利用。为此,我们提出GeoMin,它在标注数据上建模全局特征分布,以解码正确和错误展开之间的结构差异,从而建立稳健的先验来评估自奖励信号的可靠性,并充分释放未标注数据的潜力。实验上,GeoMin比最强基线高出+4.1%,甚至在使用仅10%标注的情况下超越全监督模型,展示了显著的数据效率。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) significantly advances LLM reasoning, yet it faces a dilemma: standard supervised scaling is throttled by high annotation costs, while unsupervised alternatives suffer from severe model collapse. Recent semi-supervised RLVR methods address this by using a small labeled set to guide unlabeled data, achieving a promising trade-off between training efficacy and annotation cost. However, they suffer from a severe data-efficiency bottleneck due to the reliance on coarse performance heuristics, leaving a vast majority of valuable instances underutilized. To this end, we propose GeoMin, which models global feature distributions on labeled data to decode the structural discrepancy between correct and incorrect rollouts, thereby establishing a robust prior to assess the reliability of self-reward signals and fully unleash the potential of unlabeled data. Empirically, GeoMin outperforms the strongest baselines by +4.1% and even surpasses fully supervised models with only 10% of the annotations, demonstrating remarkable data efficiency.

2606.04507 2026-06-04 cs.CL cs.AI 版本更新

Self-Evolving Deep Research via Joint Generation and Evaluation

通过联合生成与评估实现自我进化的深度研究

Han Zhu, Chengkun Cai, Yuanfeng Song, Xing Chen, Sirui Han, Yike Guo

发表机构 * The Hong Kong University of Science and Technology(香港科技大学) ByteDance, China(字节跳动) University College London(伦敦大学学院)

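AI summary: Proposes SCORE, a framework that jointly optimizes an evaluator and a solver through shared-parameter co-evolutionary training, addressing the unverifiable-reward problem in deep research report generation and consistently improving generation quality.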

详情
AI中文摘要

大型语言模型(LLM)在日常应用中越来越广泛,其中深度研究是一项特别重要的能力。与传统的问答(QA)任务不同,深度研究报告生成缺乏明确的真实答案,这使得奖励设计本质上不可验证,限制了有效的强化学习。现有方法通过LLM作为评判者和查询相关的评估标准来缓解这一挑战,但它们仍然依赖静态评估器,无法随着求解器的改进而调整标准,导致优化压力不足并最终饱和。我们通过一个用于深度研究评估和生成的自我进化协同进化训练框架(SCORE)来解决这一限制,该框架在共享参数的学习过程中紧密耦合评估器和求解器。我们不将生成和评估视为孤立的模块,而是利用它们的内在联系,在单个共享参数模型中实现联合改进。为了限制这一过程,我们引入了一个元控制机制,该机制根据求解器的性能动态控制评估环境,鼓励有效的评估维度和足够深入的评估器搜索。在深度研究基准上的大量实验表明,报告生成质量持续提升,表明协同进化评估和生成是训练开放式研究代理的一个有前景的方向。

英文摘要

Large Language Models (LLMs) are increasingly adopted in daily applications, with deep research standing out as a particularly important capability. Unlike traditional question-answering (QA) tasks, deep research report generation lacks a definitive ground truth, making reward design inherently unverifiable and limiting effective reinforcement learning. Existing approaches mitigate this challenge with LLM-as-a-judge and query-dependent evaluation rubrics, but they still rely on static evaluators that cannot adapt their standards as the solver improves, leading to insufficient and eventually saturated optimization pressure. We address this limitation with a self-evolving co-evolutionary training framework for deep research evaluation and generation (SCORE), which tightly couples an evaluator and a solver in a shared-parameter learning process. Rather than treating generation and evaluation as isolated modules, we leverage their intrinsic connection to enable joint improvement within a single shared-parameter model. To constrain this process, we introduce a meta-harness, which dynamically controls the evaluation environment based on solver performance, encouraging valid evaluation dimensions and sufficiently deep evaluator search. Extensive experiments on deep research benchmarks demonstrate consistent improvement in report generation quality, showing that co-evolving evaluation and generation is a promising direction for training open-ended research agents.

2606.04505 2026-06-04 cs.AI 版本更新

Simulate, Reason, Decide: Scientific Reasoning with LLMs for Simulation-Driven Decision Making

模拟、推理、决策:基于科学推理的LLM驱动模拟决策

Yuhan Yang, Ruipu Li, Alexander Rodríguez

发表机构 * Computer Science and Engineering, University of Michigan(密歇根大学计算机科学与工程系)

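AI summary: Proposes MechSim, a framework that uses neuro-symbolic reasoning to let LLMs reason about the mechanisms and assumptions of scientific simulators, improving decision transparency and reliability.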

详情
AI中文摘要

科学模拟器越来越多地被集成到LLM驱动的系统中,用于高风险模拟驱动决策。然而,现有框架主要使用LLM来生成、校准或执行模拟器,将其视为黑盒接口而非可推理的结构化机械系统。因此,当前方法缺乏识别、表示和推理模拟器行为背后的假设和机制的能力,限制了透明度、可审计性和决策合理性。我们引入了MechSim,一个面向可执行科学模拟器的机制基础神经符号推理框架。与先前主要对静态符号结构进行推理的神经符号方法不同,MechSim使LLM代理能够推理科学模拟器的机制、假设和执行行为。我们的框架通过共享结构化模式表示模拟器,捕获假设、变量、机制依赖和执行轨迹。在此表示之上,LLM代理作为受约束的推理引擎运行,生成结构化的、基于证据的解释,将模拟器结果与其底层机制联系起来。我们在多个高风险领域评估了我们的方法,结果表明它提高了机制级解释质量、模拟器分析和下游决策可靠性。

英文摘要

Scientific simulators are increasingly being integrated into LLM-driven systems for high-stakes simulation-driven decision-making. However, existing frameworks primarily use LLMs to generate, calibrate, or execute simulators, treating them as black-box interfaces rather than as structured mechanistic systems that can be reasoned about. As a result, current approaches lack the ability to identify, represent, and reason about the assumptions and mechanisms underlying simulator behavior, limiting transparency, auditability, and decision justification. We introduce MechSim, a mechanism-grounded neuro-symbolic reasoning framework for executable scientific simulators. Unlike prior neuro-symbolic approaches that primarily reason over static symbolic structures, MechSim enables LLM agents to reason about the mechanisms, assumptions, and execution behavior of scientific simulators. Our framework represents simulators through a shared structured schema capturing assumptions, variables, mechanism dependencies, and execution traces. On top of this representation, LLM agents operate as constrained reasoning engines that generate structured, evidence-grounded explanations linking simulator outcomes to their underlying mechanisms. We evaluate our approach across multiple high-stakes domains and show that it improves mechanism-level explanation quality, simulator analysis, and downstream decision-making reliability.

2606.04503 2026-06-04 cs.LG cs.AI 版本更新

Smart Picks in the Dark: Towards Efficient RLVR for Reasoning via Tracing Metacognitive Pivots

暗中选择:通过追踪元认知支点实现高效的推理可验证奖励强化学习

Guangcheng Zhu, Shenzhi Yang, Haobo Wang, Xing Zheng, Yingfan MA, Xuening Feng, Zhongqi Chen, Bowen Song, Weiqiang Wang, Gang Chen

发表机构 * Zhejiang University(浙江大学) Ant Group(蚂蚁集团)

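AI summary: To address low data efficiency in reinforcement learning with verifiable rewards (RLVR), proposes PivotTrace, a framework that uses attention dynamics to trace metacognitive pivots during reasoning and quantifies uncertainty through pivot density to route data automatically. It surpasses the fully supervised model with only 29.3% of samples annotated and 2.75× faster convergence.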

详情
AI中文摘要

可验证奖励强化学习(RLVR)极大地推进了大型推理模型(LRMs),但它需要及时在大量完全标注的数据集上进行训练。为此,从两个角度广泛研究了数据高效的RLVR方法:(i)数据选择方法识别一小部分“黄金”样本,这些样本能产生接近全数据性能,但它们依赖于预先存在的标注数据池。(ii)无监督RLVR方法在大规模未标注数据上利用模型自身的内部监督信号进行训练,但表现出次优性能。因此,我们研究了RLVR的“暗中选择”设置,其目标是在没有先验监督的情况下,选择对训练最有益且值得标注的未标注样本。通过系统分析,我们证明智能选择依赖于一个校准良好的不确定性估计器,以实现数据的策略性划分,从而进行自适应训练方案。基于这一见解,我们提出了PivotTrace,一个三路数据分流框架,利用注意力动态追踪推理过程中的元认知支点。通过支点密度精确量化不确定性,PivotTrace实现了自动数据路由,协同最大化标注和训练效率。实验表明,PivotTrace仅使用29.3%的标注样本和2.75倍的收敛速度就超越了全监督LRM。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has greatly advanced large reasoning models (LRMs), but it requires timely training on a huge, fully annotated dataset. To this end, data-efficient RLVR methods have been widely studied from two perspectives: (i) data selection methods identify a small subset of "golden" samples that yield near-full-data performance, but they rely on a pre-existing pool of labeled data; (ii) unsupervised RLVR methods train the model using its own internal supervision signals on large-scale unlabeled data, yet they exhibit suboptimal performance. Accordingly, we investigate the "pick in the dark" setup for RLVR, which aims to select, without prior supervision, unlabeled samples that are most beneficial for training and worthy of annotation. Through systematic analysis, we demonstrate that smart picks hinge on a well-calibrated uncertainty estimator to enable strategic partitioning of data for adaptive training regimes. Building on this insight, we propose PivotTrace, a three-way data triage framework that leverages attention dynamics to trace metacognitive pivots during reasoning. By precisely quantifying uncertainty through pivot density, PivotTrace achieves automated data routing to synergistically maximize both annotation and training efficiency. Empirically, PivotTrace surpasses the fully supervised LRM with only 29.3% of samples annotated and 2.75× faster convergence.

2606.04494 2026-06-04 cs.AI 版本更新

Beyond Prompt-Based Planning: MCP-Native Graph Planning-based Biomedical Agent System

超越基于提示的规划:基于MCP原生图规划的生物医学智能体系统

Zhangtianyi Chen, Florensia Widjaja, Wufei Dai, Xiangjun Zhang, Yuhao Shen, Juexiao Zhou

发表机构 * The Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳))

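AI summary: Proposes BioManus, a system that compiles heterogeneous bioinformatics tools into standardized MCP servers and builds a typed heterogeneous graph over them to enable graph-structured planning, addressing tool confusion and context inefficiency and improving execution accuracy and workflow validity on BioAgentBench and LAB-Bench.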

详情
AI中文摘要

生物医学智能体有望自动化复杂的生物工作流,但当前系统面临两个基本瓶颈:生物信息学工具在接口和执行环境上高度异构,而智能体规划仍依赖于基于提示的扁平工具描述。随着生物医学软件生态系统的增长,这种工具覆盖与上下文大小之间的耦合导致工具混淆、规划不稳定和执行效率低下。我们引入BioManus,一种基于结构化生物能力上的图支架规划的原生MCP生物医学智能体。BioManus首先提出BioinfoMCP编译器,将异构生物信息学软件转换为标准化的MCP服务器,从而产生一个大型可执行的MCP生态系统。然后,它将这个生态系统组织成一个类型化的异构图,涵盖工具、操作、数据类型和工作流阶段。在推理时,BioManus检索紧凑的任务特定子图,合成操作级工作流支架。这种设计将规划复杂度与原始工具库存大小解耦,在高召回率检索下实现了上下文压缩比Theta(N / (h * m_bar)),其中N是工具总数,h是工作流长度,m_bar(远小于N)是每个操作的平均候选工具数量。在BioAgentBench和LAB-Bench上的实验表明,与先进的生物医学智能体基线相比,BioManus提高了执行准确性、工作流有效性和上下文效率。这项工作表明了一种范式转变:可扩展的生物医学推理需要结构化的可执行能力图,而不是越来越大的提示级工具检索。

英文摘要

Biomedical agents promise to automate complex biological workflows, yet current systems face two fundamental bottlenecks: bioinformatics tools are highly heterogeneous in interfaces and execution environments, while agent planning still relies on flat prompt-retrieved tool descriptions. As biomedical software ecosystems grow, this coupling between tool coverage and context size leads to tool confusion, unstable planning, and inefficient execution. We introduce BioManus, an MCP-native biomedical agent built on graph-scaffolded planning over structured biological capabilities. BioManus first introduces the BioinfoMCP Compiler, which converts heterogeneous bioinformatics software into standardized MCP servers, yielding a large executable MCP ecosystem. It then organizes this ecosystem as a typed heterogeneous MCP graph over tools, operations, datatypes, and workflow stages. At inference time, BioManus retrieves compact task-specific subgraphs and synthesizes operation-level workflow scaffolds. This design decouples planning complexity from raw tool inventory size, achieving a context compression ratio of $\Theta(N / (h \bar{m}))$ under high-recall retrieval, where $N$ is the total tool count, $h$ is the workflow horizon, and $\bar{m}$ (much smaller than $N$) is the average number of candidate tools per operation. Experiments on BioAgentBench and LAB-Bench show that BioManus improves execution accuracy, workflow validity, and context efficiency over advanced biomedical agent baselines. This work suggests a paradigm shift: scalable biomedical reasoning requires structured executable capability graphs rather than ever-larger prompt-level tool retrieval.
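
As a rough illustration of the context compression ratio quoted above, the sketch below compares a flat prompt that lists every tool description with a scaffold that lists only the candidate tools for each operation in the workflow. The function names and the numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the paper.

```python
def flat_context_size(num_tools: int) -> int:
    """Tool descriptions in the prompt when every tool is listed (N)."""
    return num_tools


def scaffold_context_size(horizon: int, avg_candidates: float) -> float:
    """Tool descriptions in the prompt when only the candidates for each
    of the h workflow operations are listed (h times the average count)."""
    return horizon * avg_candidates


def compression_ratio(num_tools: int, horizon: int, avg_candidates: float) -> float:
    """Ratio of the flat prompt size to the scaffold prompt size."""
    return flat_context_size(num_tools) / scaffold_context_size(horizon, avg_candidates)


# Illustrative values only: 2000 tools, a 5-step workflow,
# and 4 candidate tools per operation on average.
print(compression_ratio(num_tools=2000, horizon=5, avg_candidates=4))  # 100.0
```

With these made-up values the scaffold carries 20 tool descriptions instead of 2000, which is the kind of gap the asymptotic ratio describes when the average candidate count is much smaller than the tool inventory.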

2606.04493 2026-06-04 cs.CV cs.AI 版本更新

SFMambaNet: Spectral-Frequency Enhanced Selective State Space Model for Correspondence Pruning

SFMambaNet: 用于对应点筛选的频谱-频率增强选择性状态空间模型

Zhihua Wang, Yanping Li, Yizhang Liu

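AI summary: Proposes SFMambaNet, which uses a Local Spectral-Geometric Attention block and a Spectral-Integrated Global Mamba block to bring frequency-domain perception into correspondence pruning for the first time, improving the separation of inliers from outliers.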

详情
AI中文摘要

对应点筛选旨在从初始对应点集中识别内点。现有大多数基于图神经网络的方法依赖于从粗欧几里得坐标映射的几何特征,难以捕捉内点呈现的细微几何一致性。而基于Mamba的方法虽具有全局感受野和长序列建模能力,但往往在隐藏状态空间中积累大量不一致特征,难以区分内点与离点。本文首次将频域感知融入该任务,提出SFMambaNet,一种新颖的频谱-频率增强Mamba双视图对应点筛选网络。我们的方法由两个组件协同构成:首先,设计局部频谱-几何注意力(LSGA)块。LSGA将频谱位置编码融入局部图交互,并引入多尺度Mamba处理,以增强对细微几何一致性的捕捉并提升局部特征判别性。在此基础上,设计频谱集成全局Mamba(SIGM)块。SIGM在状态空间中嵌入频率门控机制,利用LSGA提供的频率信息显式抑制隐藏状态内高频噪声的累积,并减轻不一致特征的传播。这增强了内点-离点可分性,并以近乎线性的复杂度实现了鲁棒的全局上下文建模能力。大量实验表明,SFMambaNet在多个具有挑战性的任务上优于当前最先进方法。代码可在https://github.com/Kirito14IT/SFMambaNet获取。

英文摘要

Correspondence pruning aims to identify inliers from an initial set of correspondences. Most existing Graph Neural Network (GNN)-based methods rely on geometric features mapped from coarse Euclidean coordinates, which struggle to capture the subtle geometric consistencies presented by inliers. While Mamba-based methods possess global receptive fields and long sequence modeling capabilities, they tend to accumulate substantial inconsistent features within the hidden state space, making it difficult to distinguish inliers from outliers. In this paper, we integrate frequency domain perception into this task for the first time and propose SFMambaNet, a novel Spectral-Frequency enhanced Mamba-based two-view correspondence pruning network. Our method is collaboratively composed of two components: First, we design a Local Spectral-Geometric Attention (LSGA) block. LSGA incorporates spectral positional encoding into local graph interactions and introduces multi-scale Mamba processing to enhance the capture of subtle geometric consistencies and improve local feature discriminability. Building upon this, we design a Spectral-Integrated Global Mamba (SIGM) block. SIGM embeds a frequency gating mechanism within the state space, utilizing the frequency information provided by LSGA to explicitly suppress high-frequency noise accumulation within hidden states and mitigate the propagation of inconsistent features. This enhances inlier-outlier separability and achieves robust global context modeling capabilities with nearly linear complexity. Extensive experiments demonstrate that SFMambaNet outperforms current state-of-the-art methods on several challenging tasks. The code is available at https://github.com/Kirito14IT/SFMambaNet.

2606.04484 2026-06-04 cs.AI cs.LG cs.MA 版本更新

AgentJet: A Flexible Swarm Training Framework for Agentic Reinforcement Learning

AgentJet:一种用于智能体强化学习的灵活群体训练框架

Qingxu Fu, Boyin Liu, Shuchang Tao, Zhaoyang Liu, Bolin Ding

发表机构 * Tongyi Lab, Alibaba Group(通义实验室,阿里巴巴集团)

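AI summary: Proposes AgentJet, a decoupled multi-node swarm training framework that supports heterogeneous multi-model reinforcement learning, multi-task cocktail training, fault-tolerant execution, and live code iteration, and achieves a 1.5-10x training speedup through a context tracking module.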

Comments Technical report, 27 pages

详情
AI中文摘要

我们提出了AgentJet,一个用于大型语言模型(LLM)智能体强化学习的分布式群体训练框架。与将智能体运行与模型优化紧密耦合的集中式框架不同,AgentJet采用解耦的多节点架构,其中群体服务器节点托管可训练模型并在GPU集群上运行优化,而群体客户端节点在任意设备上执行任意智能体。这种设计提供了集中式框架难以支持的能力:(1)异构多模型强化学习,支持训练具有多个LLM作为大脑的异构多智能体团队;(2)具有隔离智能体运行时的多任务鸡尾酒训练;(3)容错执行,防止外部环境故障中断训练过程;(4)实时代码迭代,允许通过替换群体客户端节点在训练期间编辑智能体。为了支持多模型、多轮和多智能体设置中的高效强化学习,AgentJet引入了一个带有时间线合并的上下文跟踪模块,该模块合并冗余上下文并实现1.5-10倍的训练加速。最后,AgentJet引入了一个自动化研究系统,该系统以研究主题为输入,并在大规模集群上自主进行长期、多天的强化学习研究。通过利用群体架构,该系统在无需人工干预的情况下复现了强化学习研究人员的关键探索工作流程。

英文摘要

We present AgentJet, a distributed swarm training framework for large language model (LLM) agent reinforcement learning. Unlike centralized frameworks that tightly couple agent rollouts with model optimization, AgentJet adopts a decoupled multi-node architecture in which swarm server nodes host trainable models and run optimization on GPU clusters, whereas swarm client nodes execute arbitrary agents on arbitrary devices. This design provides capabilities that are difficult to support in centralized frameworks: (1) heterogeneous multi-model reinforcement learning, enabling the training of heterogeneous multi-agent teams with multiple LLMs as brains; (2) multi-task cocktail training with isolated agent runtimes; (3) fault-tolerant execution that prevents external environment failures from interrupting the training process; and (4) live code iteration, which allows agents to be edited during training by replacing swarm client nodes. To support efficient RL in multi-model, multi-turn, and multi-agent settings, AgentJet introduces a context tracking module with timeline merging, which consolidates redundant context and achieves a 1.5-10x training speedup. Finally, AgentJet introduces an automated research system that takes a research topic as input and autonomously conducts long-horizon, multi-day RL studies on large-scale clusters. By leveraging the swarm architecture, this system reproduces key exploratory workflows of RL researchers without human intervention during execution.

2606.04479 2026-06-04 cs.CV cs.AI cs.CL 版本更新

Evaluating Reasoning Fidelity in Visual Text Generation

评估视觉文本生成中的推理保真度

Jiajun Hong, Jiawei Zhou

发表机构 * Stony Brook University(石溪大学)

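AI summary: Evaluates whether current text-to-image models faithfully preserve reasoning ability in visual text generation, using long text rendering, factual knowledge probing, context understanding, and multi-step reasoning tasks. The models frequently produce semantic errors and logical inconsistencies, leaving a substantial gap relative to text-only models.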

Comments Peer reviewed and accepted at CVPR 2026 at the GRAIL-V (Grounded Retrieval and Agentic Intelligence for Vision-Language) workshop (non-archival track)

详情
AI中文摘要

最近的文本到图像(T2I)模型能够在图像中渲染高度清晰且结构良好的文本,从而支持文档生成和幻灯片生成等应用。然而,当复杂解决方案必须直接通过渲染文本表达时,这些系统是否忠实地保留了推理能力,还是仅仅模仿表面模式,目前尚不清楚。我们通过评估视觉文本生成中的推理保真度来研究这一问题,其中模型必须将完整的推理过程表达为图像。我们的评估包括长文本渲染、事实知识探测、上下文理解和多步推理。在这些设置中,我们发现当前的T2I模型经常产生语义错误、逻辑不一致和错误的中间步骤,即使渲染的文本在视觉上清晰。这些失败与纯文本模型在相同任务上的强推理表现形成对比。我们的发现揭示了视觉文本生成与程序性推理之间的显著差距,促使更可靠的视觉文本推理。

英文摘要

Recent text-to-image (T2I) models can render highly legible and well-structured text within images, enabling applications including document generation and slide generation. However, it remains unclear whether such systems faithfully preserve reasoning ability when complex solutions must be expressed directly through rendered text, or whether they merely imitate surface-level patterns. We investigate this question by evaluating reasoning fidelity in visual text generation, where models must express complete reasoning processes as images. Our evaluation includes long text rendering, factual knowledge probing, context understanding, and multi-step reasoning. Across these settings, we find that current T2I models frequently produce semantic errors, logical inconsistencies, and incorrect intermediate steps, even when the rendered text appears visually clear. These failures contrast with the strong reasoning performance of text-only models on the same tasks. Our findings reveal a substantial gap between visual text generation and procedural reasoning, motivating more reliable visual text reasoning.

2606.04473 2026-06-04 cs.LG cs.AI 版本更新

ChessMimic: Per-Rating Transformer Models for Human Move, Clock, and Outcome Prediction in Online Blitz Chess

ChessMimic: 用于在线闪电棋中人类走棋、时钟和结果预测的按等级划分的Transformer模型

Thomas Johnson

发表机构 * nascent.xyz(nascent实验室)

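AI summary: Proposes ChessMimic, a system of three small encoder-only transformers for move, thinking-time, and outcome prediction, trained separately per Elo rating band for sharper per-skill calibration. On Lichess blitz data its move prediction accuracy exceeds Maia-2, its outcome model reaches an AUC of 0.78, and its clock model provides a usable but non-state-of-the-art thinking-time signal.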

详情
AI中文摘要

我们提出了ChessMimic,一个由三个小型编码器Transformer组成的系统——分别用于走棋、思考时间和结果预测——以局面、最近走棋历史、玩家等级和时钟状态为条件。我们为每100 Elo等级区间拟合每个模型的独立实例,以参数效率换取更精细的技能校准。在Lichess Rated Blitz游戏的一个月保留切片上,ChessMimic的人类走棋预测准确率在每个Elo区间都优于Maia-2。与Maia-3相比,我们的9M参数模型的准确率介于Maia-3-5M和Maia-3-23M之间,且没有几何注意力偏置的额外复杂性。除了走棋匹配模型,我们还训练了一个游戏结果模型,该模型不仅以局面为条件,还以玩家等级、时间控制和剩余时钟时间为条件。结果模型在样本外达到了0.78的AUC,击败了Maia-2以及基于子力、等级和时钟时间的逻辑回归。最后,我们训练了一个时钟模型来预测人类思考时间。该时钟模型在ALLIE风格过滤器下提供了可用但非最优的每步思考时间信号(Pearson r = 0.41,Spearman rho = 0.50,MAE 4.10秒,而ALLIE报告的r = 0.70),残差差距集中在每位置桶的锐度上,而非桶边际校准。公开演示在1e4.ai,我们在GitHub上发布了代码、每个区间的权重以及C++数据过滤管道代码。

英文摘要

We present ChessMimic, a system of three small encoder-only transformers (for move, thinking-time, and outcome prediction) conditioned on the position, recent move history, player rating, and clock state. We fit a separate instance of each model per 100-Elo rating band, trading parameter efficiency for sharper per-skill calibration. On a held-out month-wide slice of Lichess Rated Blitz games, ChessMimic's human move prediction accuracy exceeds that of Maia-2 in every Elo band. Compared to Maia-3, our 9M-parameter model's accuracy sits between Maia-3-5M and Maia-3-23M without the additional complexity of Geometric Attention Bias. In addition to the move-matching model, we also train a game outcome model that conditions not only on the position, but also on player ratings, time control, and remaining clock times. The outcome model achieves an AUC of 0.78 out of sample, beating Maia-2 as well as logistic regressions based on material, ratings, and clock time. Finally, we train a clock model that predicts human thinking times. The clock model provides a usable but non-SOTA per-ply think-time signal under ALLIE-style filters (Pearson r = 0.41, Spearman rho = 0.50, MAE 4.10 s, against ALLIE's reported r = 0.70), with the residual gap concentrated in per-position bucket sharpness rather than bucket-marginal calibration. A public demo is at 1e4.ai and we release code, per-band weights, and the C++ data-filter pipeline code on GitHub.
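
The per-band setup and the clock-model metrics above can be made concrete with a small sketch. It shows how a rating could be mapped to a 100-Elo band to pick a model instance, and how MAE and Pearson r are computed for think-time predictions. The band alignment (for example 1500 to 1599), the function names, and the sample think times are assumptions for illustration, not details from the paper.

```python
import math


def rating_band(rating: int, width: int = 100) -> int:
    """Lower edge of the rating band, e.g. 1534 -> 1500.
    Aligning bands to multiples of `width` is an assumption."""
    return (rating // width) * width


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and observations."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


# Hypothetical per-ply think times in seconds.
predicted = [2.0, 5.0, 9.0, 4.0]
actual = [1.0, 6.0, 12.0, 3.0]

print(rating_band(1534))                       # 1500
print(mean_absolute_error(predicted, actual))  # 1.5
print(round(pearson_r(predicted, actual), 3))
```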

2606.04469 2026-06-04 cs.CV cs.AI 版本更新

Adaptive Calibration for Fair and Performant Facial Recognition

自适应校准:实现公平且高性能的面部识别

Ryan Brown, Chris Russell

发表机构 * University of Oxford(牛津大学)

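AI summary: Proposes Adaptive Calibration (AC), which maps the cosine similarity between normalized embeddings to calibrated probabilities and incorporates local context to correct for regional differences, improving both the overall performance and the fairness of facial recognition without demographic metadata.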

详情
AI中文摘要

我们引入自适应校准(AC),一种新颖的面部识别校准策略,将归一化嵌入之间的余弦相似度映射为良好校准的概率。通过将局部上下文纳入校准,自适应校准纠正了余弦相似度中的一个基本不匹配问题,即相同的距离在不同嵌入区域可能对应不同的匹配概率。我们的方法在无需人口统计元数据的情况下,既提高了整体性能,又实现了更公平的校准。在各种预训练模型和标准基准上,我们的方法在准确性和公平性指标上始终优于现有方法。AC为公平的面部识别提供了实用的解决方案,无需人口统计组注释,同时提高了整体性能。与现有方法不同,我们的方法提供了连续的、区域特定的校准,避免了“降级”现象,即公平性以牺牲某些群体的性能为代价。

英文摘要

We introduce Adaptive Calibration (AC), a novel calibration strategy for facial recognition that maps cosine similarity between normalized embeddings to well-calibrated probabilities. By incorporating local context into calibration, Adaptive Calibration corrects for a fundamental mismatch in cosine similarity, whereby the same distance can correspond to different match probabilities in different embedding regions. Our approach improves overall performance and yields a fairer calibration without requiring demographic metadata. It consistently dominates existing methods on both accuracy and fairness metrics across a variety of pretrained models and standard benchmarks. AC provides a practical solution for equitable facial recognition without requiring demographic group annotations, while also improving overall performance. Unlike existing approaches, our method provides continuous, region-specific calibration that avoids "leveling down", where fairness comes at the cost of degraded performance for some groups.
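
A minimal sketch of the mismatch described above, not the authors' method: a single global logistic map from cosine similarity to match probability assigns the same probability to the same similarity everywhere in the embedding space. The second function adds a hypothetical per-region offset to show how a local correction could shift that probability. The scale, bias, and offset values are invented for illustration.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def global_calibration(similarity, scale=10.0, bias=-5.0):
    """One logistic map for the whole embedding space (assumed parameters)."""
    return sigmoid(scale * similarity + bias)


def local_calibration(similarity, region_offset, scale=10.0, bias=-5.0):
    """Same map plus a hypothetical offset that depends on the embedding region."""
    return sigmoid(scale * similarity + bias + region_offset)


print(round(cosine_similarity([1.0, 0.0], [1.0, 1.0]), 3))       # 0.707
print(global_calibration(0.5))                                   # 0.5
print(local_calibration(0.5, 1.0) > global_calibration(0.5))     # True
```

Under the global map, two pairs with similarity 0.5 always receive probability 0.5; with a region-dependent offset, the same similarity can map to different probabilities, which is the behavior the abstract argues a calibration needs to capture.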

2606.04468 2026-06-04 cs.LG cs.AI cs.NE math.OC 版本更新

ParetoPilot: Zero-Surrogate Offline Multi-Objective Optimization via Infer-Perturb-Guide Diffusion

ParetoPilot:通过推断-扰动-引导扩散实现零代理离线多目标优化

Ruiqing Sun, Sen Yang, Dawei Feng, Bo Ding, Yijie Wang, Huaimin Wang

发表机构 * Nanyang Technological University(南洋理工大学)

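AI summary: Proposes ParetoPilot, a zero-surrogate diffusion framework that needs no external surrogate model. Its Infer-Perturb-Guide engine, interleaved within the unconditional denoising steps, implicitly infers the objective direction and orthogonalizes a parallel gravity field and an edgeness-aware repulsive force to find Pareto-optimal designs in offline multi-objective optimization.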

详情
AI中文摘要

离线多目标优化旨在基于静态数据集发现新颖的帕累托最优设计,而无需昂贵的环境交互。尽管最近的生成方法取得了显著成功,但它们主要依赖外部代理模型。这种依赖引入了显著的计算开销,遭受欺骗性评估,并偏离了联合训练主流生成模型与条件的流行范式。为了解决这些瓶颈,我们提出了ParetoPilot,一种用于离线多目标优化的新颖零代理扩散框架。ParetoPilot充分利用预训练扩散模型中固有的条件先验。其核心是引入了推断-扰动-引导引擎,该引擎无缝地插入在反向生成过程的无条件去噪步骤中。首先,通过匹配条件噪声预测和无条件噪声预测,隐式推断瞬时目标方向。其次,数学上正交化一个用于严格收敛的平行引力场和一个用于相互多样性的边缘感知排斥力,从而生成一个动态退火的扰动向量。最后,这个扰动目标通过标准的无分类器引导无缝地引导生成过程。在51个任务上的大量实验表明,ParetoPilot优于14个最先进的基于代理和逆生成基线。通过消除辅助代理训练,我们的方法在实现超体积改进和鲁棒帕累托前沿覆盖的同时,保护了数据隐私。

英文摘要

Offline multi-objective optimization (Offline MOO) aims to discover novel Pareto-optimal designs based on static datasets without expensive environment interactions. While recent generative methods have achieved notable success, they predominantly rely on external surrogate models. This dependency introduces significant computational overhead, suffers from deceptive evaluations, and deviates from the prevailing paradigm of jointly training mainstream generative models with conditions. To address these bottlenecks, we propose ParetoPilot, a novel zero-surrogate diffusion framework for offline MOO. ParetoPilot fully leverages the conditional priors inherently embedded within pre-trained diffusion models. At its core, the framework introduces the Infer-Perturb-Guide (IPG) engine, which is seamlessly interleaved within the unconditional denoising steps of the reverse generation process. First, it implicitly infers the instantaneous objective direction by matching conditional and unconditional noise predictions. Next, it mathematically orthogonalizes a parallel gravity field for strict convergence and an edgeness-aware repulsive force for mutual diversity, creating a dynamically annealed perturbation vector. Finally, this perturbed target seamlessly steers the generation process via standard Classifier-Free Guidance (CFG). Extensive experiments across 51 tasks demonstrate that ParetoPilot outperforms 14 state-of-the-art surrogate-based and inverse generative baselines. By eliminating auxiliary proxy training, our approach preserves data privacy while achieving hypervolume improvement and robust Pareto front coverage.
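
The final steering step relies on standard Classifier-Free Guidance, which combines the unconditional and conditional noise predictions as eps_uncond + w * (eps_cond - eps_uncond). The sketch below shows that combination together with a plain vector projection that splits a force into components parallel and orthogonal to a direction. Treating eps_cond - eps_uncond as the objective direction, and using this particular projection, are assumptions made for illustration; the abstract does not spell out the paper's exact construction.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def cfg_combine(eps_uncond, eps_cond, guidance_scale):
    """Standard classifier-free guidance: move from the unconditional
    prediction toward the conditional one by `guidance_scale`."""
    return [u + guidance_scale * (c - u) for u, c in zip(eps_uncond, eps_cond)]


def split_parallel_orthogonal(vector, direction):
    """Decompose `vector` into a part along `direction` and a part
    orthogonal to it. `direction` must be non-zero."""
    scale = dot(vector, direction) / dot(direction, direction)
    parallel = [scale * d for d in direction]
    orthogonal = [v - p for v, p in zip(vector, parallel)]
    return parallel, orthogonal


eps_uncond = [0.0, 0.0]
eps_cond = [1.0, 0.0]

# Assumed objective direction: conditional minus unconditional prediction.
direction = [c - u for u, c in zip(eps_uncond, eps_cond)]

parallel, orthogonal = split_parallel_orthogonal([3.0, 4.0], direction)
print(parallel, orthogonal)                    # [3.0, 0.0] [0.0, 4.0]
print(cfg_combine(eps_uncond, eps_cond, 2.0))  # [2.0, 0.0]
```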

2606.04465 2026-06-04 cs.CL cs.AI 版本更新

SePO: Self-Evolving Prompt Agent for System Prompt Optimization

SePO: 用于系统提示优化的自我进化提示智能体

Wangcheng Tao, Han Wu, Weng-Fai Wong

发表机构 * National University of Singapore(新加坡国立大学) City University of Hong Kong(香港城市大学)

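AI summary: Proposes SePO, whose self-referential design lets a prompt agent optimize both the task agents' system prompts and its own, trained in two evolutionary stages. It improves average accuracy by 4.49 points over Manual-CoT across five benchmarks.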

Comments 26 pages. Code: https://github.com/taowangcheng/SePO

详情
AI中文摘要

系统提示优化在不修改底层模型的情况下改善智能体行为,生成可读且模型无关的指令。现有方法构建一个提示智能体来优化任务智能体的系统提示,但提示智能体自身的系统提示仍由人工设计且固定不变。我们提出自我进化提示优化(SePO),将提示智能体自身的系统提示与任务智能体的系统提示一同作为优化目标。SePO采用自我指涉设计:一个单一的提示智能体在开放式进化搜索下同时改进任务智能体的系统提示和自身的系统提示,该搜索维护一个候选提示档案作为垫脚石。训练分为两个阶段:预训练在多任务池上进化提示智能体,微调则将其应用于目标任务。在涵盖数学(AIME'25)、抽象推理(ARC-AGI-1)、研究生级科学(GPQA)、代码生成(MBPP)和逻辑谜题(数独)的五个基准上,SePO始终优于Manual-CoT、TextGrad和MetaSPO,与Manual-CoT相比平均准确率提升4.49%。预训练中的提示优化技能也能泛化到预训练混合任务之外的任务,而非记忆每个任务的提示。

英文摘要

System prompt optimization improves agent behavior without modifying the underlying model, yielding human-readable, model-agnostic instructions. Existing methods build a prompt agent that refines task agents' system prompts, yet leave the prompt agent's own system prompt hand-engineered and fixed. We propose Self-Evolving Prompt Optimization (SePO), which treats the prompt agent's own system prompt as an optimization target alongside task agents' system prompts. SePO adopts a self-referential design. A single prompt agent improves both task agents' system prompts and its own under an open-ended evolutionary search that maintains an archive of candidate prompts as stepping stones. Training proceeds in two stages: pre-training evolves the prompt agent on a multi-task pool, and fine-tuning then applies it to a target task. Across five benchmarks spanning math (AIME'25), abstract reasoning (ARC-AGI-1), graduate-level science (GPQA), code generation (MBPP), and logic puzzles (Sudoku), SePO consistently outperforms Manual-CoT, TextGrad, and MetaSPO, improving the average accuracy by 4.49 points compared to Manual-CoT. The prompt optimization skill from pre-training also generalizes to tasks beyond the pre-training mixture, rather than memorizing per-task prompts.

2606.04460 2026-06-04 cs.CR cs.AI cs.LG 版本更新

CyberGym-E2E: Scalable Real-World Benchmark for AI Agents' End-to-End Cybersecurity Capabilities

CyberGym-E2E:面向AI代理端到端网络安全能力的可扩展真实世界基准

Tianneng Shi, Robin Rheem, Dongwei Jiang, Mona Wang, Francisco De La Riega, Zhun Wang, Jingzhi Jiang, Alexander Cheung, Sean Tai, Jonah Cha, Jianhong Tu, Gabriel Han, Chenguang Wang, Jingxuan He, Wenbo Guo, Dawn Song

发表机构 * Stanford University(斯坦福大学) UC Berkeley(加州大学伯克利分校)

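AI summary: Proposes CyberGym-E2E, a large-scale, realistic end-to-end cybersecurity benchmark that uses an automated pipeline to turn open-source vulnerability data into evaluation environments and comprehensively evaluates AI agents across the full lifecycle of vulnerability discovery, PoC generation, and patch generation.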

Comments ICML 2026

详情
AI中文摘要

人工智能有潜力通过使系统能够自主检测、分析和修复软件漏洞来改变网络安全。然而,现有对AI系统的网络安全评估在规模或范围上有限,未能捕捉真实世界软件漏洞发现和修复的端到端生命周期。为了解决这一差距,我们提出了CyberGym-E2E,一个大规模、真实的端到端网络安全基准,全面评估AI代理在漏洞发现、PoC生成和补丁生成整个生命周期中的能力。CyberGym-E2E全面且可扩展,因为我们构建了一个自动化的、代理增强的流水线,用于将开源漏洞数据转化为真实的评估环境。目前,该基准包含139个不同开源项目中的920个真实世界漏洞。

英文摘要

AI has the potential to transform cybersecurity by enabling systems that can autonomously detect, analyze, and remediate software vulnerabilities. However, existing cybersecurity evaluations of AI systems are limited in scale or scope, and fail to capture the end-to-end lifecycle of real-world software vulnerability discovery and remediation. To address this gap, we propose CyberGym-E2E, a large-scale and realistic end-to-end cybersecurity benchmark that comprehensively evaluates AI agents' abilities across the full lifecycle of vulnerability discovery, PoC generation, and patch generation. CyberGym-E2E is comprehensive and scalable, as we build an automated, agent-enhanced pipeline for transforming open-source vulnerability data into realistic evaluation environments. Currently, the benchmark consists of 920 real-world vulnerabilities across 139 different open-source projects.

2606.04459 2026-06-04 cs.CR cs.AI cs.CC cs.CL 版本更新

Token Rankings are Unforgeable Language Model Signatures

Token排名是不可伪造的语言模型签名

Matthew Finlayson, Andreas Grivas, Xiang Ren, Swabha Swayamdipta

发表机构 * University of Southern California(南加州大学) University of Edinburgh(爱丁堡大学)

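AI summary: Finds that a language model's token rankings (the ordering of tokens by probability) form a unique and unforgeable signature, and studies how a restricted API can present that signature while limiting parameter leakage.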

详情
AI中文摘要

已知语言模型参数对其logit输出施加了(每个模型)独特的几何约束,这作为识别模型的签名,但当API分发logits时也会泄露模型的最后一层参数。我们研究了更严格的API,这些API只暴露token排名(即按概率排序,但不暴露概率值),并发现排名也构成签名:对于足够大的$k$,每个模型都有一组唯一的可行top-$k$排名。此外,排名签名是第一个已知的(多项式时间)不可伪造签名,因为找到一个具有相同可行排名集的模型是NP难的。在安全方面,我们发现token排名已经足以近似窃取模型的最后一层,类似于logits,尽管近似太粗糙以至于无法伪造签名,并且可以通过将API限制为足够小的$k$的top-$k$ token来有效应对。由于展示模型签名所需的top-$k$通常小于防止窃取所需的$k$,因此API可以在不泄露模型参数的情况下展示不可伪造的签名。

英文摘要

Language model parameters are known to impose unique (to each model) geometric constraints on their logit outputs, which serve as a signature that identifies the model but also leak the model's final-layer parameters when an API distributes logits. We investigate more restrictive APIs that expose token rankings (i.e., their ordering by probability, but not the probability values) and find that rankings also constitute a signature: every model has a unique set of feasible top-$k$ rankings for sufficiently large $k$. Furthermore, the ranking signature is the first known (polynomially) unforgeable signature, since finding a model with the same set of feasible rankings is NP-hard. On the security front, we find that token rankings are already sufficient to approximately steal the final layer of the model, similar to logits, though the approximation is too coarse to forge the signature, and the attack can be effectively countered by restricting the API to top-$k$ tokens with sufficiently small $k$. Since the top-$k$ required to present the model signature is generally smaller than the $k$ required to prevent stealing, it is possible for an API to present an unforgeable signature without leaking model parameters.
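
To make the distinction between logits and rankings concrete, the sketch below computes the top-k token ranking from a logit vector. Because softmax is monotonic, sorting by logit gives the same order as sorting by probability, so a ranking-only API reveals the order but not the values. The function names and the toy logits are hypothetical.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_ranking(scores, k):
    """Token indices ordered by descending score, truncated to k.
    Ties are broken by the lower index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    return order[:k]


logits = [0.1, 2.0, -1.0, 1.5]
print(top_k_ranking(logits, 2))           # [1, 3]
print(top_k_ranking(softmax(logits), 2))  # [1, 3]
```

The ranking is also unchanged if a constant is added to every logit, which is one reason it carries less information than the raw logit values.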

2606.04455 2026-06-04 cs.AI cs.CL 版本更新

The Meta-Agent Challenge: Are Current Agents Capable of Autonomous Agent Development?

元智能体挑战:当前智能体能否自主开发智能体?

Xinyu Lu, Tianshu Wang, Pengbo Wang, zujie wen, Zhiqiang Zhang, Jun Zhou, Boxi Cao, Yaojie Lu, Hongyu Lin, Xianpei Han, Le Sun

发表机构 * Chinese Information Processing Laboratory, Institute of Software, Chinese Academy of Sciences(中国科学院软件研究所信息处理实验室) University of Chinese Academy of Sciences(中国科学院大学) Ant Group(蚂蚁集团)

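AI summary: Proposes the Meta-Agent Challenge (MAC), a framework for evaluating whether frontier models can autonomously develop agent systems. Most meta-agents fail to match human-engineered baseline policies and show robustness and alignment problems.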

Comments Website: https://meta-agent-challenge.com/

详情
AI中文摘要

当前的AI基准测试评估智能体在人类设计的工作流程中执行任务的能力。这些评估从根本上未能衡量一个关键的更高级能力:模型能否自主开发智能体系统。我们引入了元智能体挑战(MAC),这是一个评估框架,旨在测试前沿模型自主开发智能体的能力。具体来说,一个代码智能体(元智能体)被赋予一个沙盒环境、一个评估API和一个时间限制,以迭代地编程一个智能体工件,该工件在五个领域的保留测试集上最大化性能。为确保评估完整性,该框架通过多层防御机制防止奖励黑客攻击。利用该框架,我们证明元智能体很少能匹配人类设计的基线策略,而少数能匹配的则主要由专有前沿模型主导。此外,设计过程表现出高方差,高优化压力会浮现出诸如真实数据窃取等新兴对抗行为——凸显了鲁棒性和模型对齐方面的关键缺陷。最终,MAC为自主AI研究和开发提供了一个严格的、开源的基准测试,为评估递归自我改进提供了经验代理。基准测试公开于:https://github.com/ant-research/meta-agent-challenge。

英文摘要

Current AI benchmarks evaluate agents on task execution within human-designed workflows. These evaluations fundamentally fail to measure a critical next-level capability: whether models can autonomously develop agent systems. We introduce the Meta-Agent Challenge (MAC), an evaluation framework designed to test the capacity of frontier models for autonomous agent development. Specifically, a code agent (the meta-agent) is given a sandboxed environment, an evaluation API, and a time limit to iteratively program an agent artifact that maximizes performance on a held-out test set across five domains. To ensure evaluation integrity, this framework is secured by multi-layer defenses against reward hacking. Leveraging this framework, we demonstrate that meta-agents rarely match human-engineered baseline policies, and the few that do are dominated by proprietary frontier models. Moreover, the design process exhibits high variance, and high optimization pressure surfaces emergent adversarial behaviors like ground-truth exfiltration, highlighting critical deficits in both robustness and model alignment. Ultimately, MAC provides a rigorous, open-source benchmark for autonomous AI research and development, offering an empirical proxy for evaluating recursive self-improvement. The benchmark is publicly available at: https://github.com/ant-research/meta-agent-challenge.

2606.04445 2026-06-04 cs.LG cs.AI math.ST stat.TH 版本更新

RowNet: A Memory Transformer for Tabular Regression

RowNet: 用于表格回归的记忆Transformer

Askat Rakhymbekov, Gulshat Muhametjanova

发表机构 * Department of Applied Mathematics and Informatics(应用数学与信息学系) Kyrgyz-Turkish Manas University(吉尔吉斯-土耳其马纳斯大学)

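AI summary: For tabular regression in real estate valuation, proposes RowNet, a retrieval-based neural architecture that predicts prices using pairwise similarity features against a memory bank, target-consistency augmentation, and a mixture-of-experts module.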

Comments Retrieval-based neural architecture for real estate valuation. Related to TabR (arXiv:2307.14338) and retrieval-augmented tabular learning

详情
AI中文摘要

房地产估值是一个结构化回归问题,其中价格受异构特征类型、稀疏区域效应、非线性交互以及可比房产的实际逻辑影响。标准多层感知器将每一行视为孤立向量,必须仅从监督中学习局部性、尺度敏感性和类别匹配。梯度提升决策树提供了强大的表格基线,但其以特征为中心的分裂机制并未显式建模相似历史观测的检索。本文提出了RowNet,一种用于房地产每平方米价格预测的基于检索的神经网络架构。RowNet通过针对标记属性记忆库的成对相似性特征来表示查询属性。第一检索层从仅特征相似性中估计粗略目标。第二层通过目标一致性特征增强记忆比较,并使用多个学习注意力头检索互补的可比集。最终的混合专家模块结合了学习门控、残差校正、熵正则化和头多样性正则化以产生预测。

英文摘要

Real estate valuation is a structured regression problem in which prices are governed by heterogeneous feature types, sparse regional effects, nonlinear interactions, and the practical logic of comparable properties. Standard multilayer perceptrons treat each row as an isolated vector and must learn locality, scale sensitivity, and categorical matching from supervision alone. Gradient-boosted decision trees provide strong tabular baselines, but their feature-centric splitting mechanism does not explicitly model the retrieval of similar historical observations. This paper presents RowNet, a retrieval-based neural architecture for real estate price-per-square-meter prediction. RowNet represents a query property through pairwise similarity features against a memory bank of labeled properties. A first retrieval layer estimates a coarse target from feature-only similarities. A second layer augments the memory comparison with target-consistency features and uses multiple learned attention heads to retrieve complementary comparable sets. A final mixture-of-experts module combines learned gating, residual correction, entropy regularization, and head-diversity regularization to produce the prediction.
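The retrieval idea in this abstract can be illustrated with a minimal sketch. It is not RowNet itself: it assumes a single attention head, negative squared Euclidean distance as the similarity, and a hypothetical `temperature` parameter, and it omits the second retrieval layer and the mixture-of-experts module. It only shows how a query row can be priced as a similarity-weighted average over a memory bank of labeled rows.

```python
import math

def retrieve_and_predict(query, memory, temperature=1.0):
    """Predict a target as a softmax-weighted average over memory rows.

    query: sequence of floats (features of the property to price).
    memory: list of (features, target) pairs with known targets.
    temperature: hypothetical sharpness parameter (an assumption, not from the paper).
    """
    if not memory:
        raise ValueError("memory bank must not be empty")
    # Similarity = negative squared Euclidean distance, scaled by temperature.
    scores = [
        -sum((q - f) ** 2 for q, f in zip(query, feats)) / temperature
        for feats, _ in memory
    ]
    # Subtract the max score before exponentiating for numerical stability.
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return sum(w * target for w, (_, target) in zip(weights, memory)) / total

# Toy memory bank: (area in m^2, rooms) -> price per square meter.
memory_bank = [
    ((50.0, 3.0), 1200.0),
    ((52.0, 3.0), 1250.0),
    ((120.0, 5.0), 2100.0),
]
prediction = retrieve_and_predict((51.0, 3.0), memory_bank)
```

In this toy case the two nearby rows dominate the weights, so the distant row contributes almost nothing to the estimate.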

2606.04442 2026-06-04 cs.CL cs.AI 版本更新

MemoryDocDataSet: A Benchmark for Joint Conversational Memory and Long Document Reasoning

MemoryDocDataSet: 联合对话记忆与长文档推理的基准测试

Qiyang Xie, Jialun Wu, Xinjie He, Su Liu, Shuai Xiao, Zhiyuan Lin, Weikai Zhou

发表机构 * Northeastern University(东北大学) Johns Hopkins University(约翰霍普金斯大学) Columbia University(哥伦比亚大学) Independent Researcher(独立研究者)

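AI summary Introduces MemoryDocDataSet, a synthetic benchmark of 50 micro-worlds and 1,000 QA pairs that evaluates whether systems can handle multi-session conversation history and long-document reading comprehension together. 75.1% of the questions require hybrid reasoning (navigating the conversation history first, then extracting the answer from a document), and experiments show a clear joint-retrieval gap.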

Comments 17 pages, 2 figures, 8 tables. Submitted for peer review

详情
AI中文摘要

人工智能系统越来越需要结合两种要求很高的能力:导航多轮对话历史和在长文档中进行深度阅读理解。然而,现有的基准测试没有同时评估这两者。我们引入了MemoryDocDataSet,一个包含50个微世界和1000个QA对的合成基准,其中每个实例包含3-5个人物角色、一个跨越数月活动的时间事件图、3-5篇真实长文档(每篇20,000-50,000个token,来自Caselaw Access Project)、基于这些文档的多轮对话,以及跨越五个推理类别的20个问答对。其定义特征是混合源标签:需要系统首先导航对话历史以确定哪个文档相关,然后从该文档中提取答案的问题。混合问题占数据集的75.1%。通过使用LLM作为评判者的提示敏感性自一致性分析来表征数据集质量,在所有50个微世界中得到中位数Cohen's $κ= 0.634$。我们评估了六种基线配置,涵盖截断上下文、长上下文LLM、检索增强生成(RAG)和记忆系统。最佳基线(RAG-Both)在整体F1上达到0.358,在混合问题上达到0.342。仅文档检索(RAG-Doc)在混合问题上降至0.267,尽管在仅文档问题上达到0.453,这显示了明显的联合检索差距,激励了统一对话记忆与长文档导航的架构。我们发布了数据集、生成流水线和所有基线实现。

英文摘要

AI systems increasingly need to combine two demanding capabilities: navigating multi-session conversation history and performing deep reading comprehension within long documents. Yet no existing benchmark evaluates both simultaneously. We introduce MemoryDocDataSet, a synthetic benchmark of 50 micro-worlds and 1,000 QA pairs in which each instance comprises 3-5 personas, a temporal event graph spanning months of activity, 3-5 real long documents (20,000-50,000 tokens each, sourced from the Caselaw Access Project), multi-session conversations grounded on those documents, and 20 question-answer pairs across five reasoning categories. The defining feature is the Hybrid source tag: questions requiring a system to first navigate conversation history to identify which document is relevant, then extract the answer from within that document. Hybrid questions account for 75.1% of the dataset. Dataset quality is characterised through a prompt-sensitivity self-consistency analysis using LLM-as-judge, yielding a median Cohen's $\kappa = 0.634$ across all 50 micro-worlds. We evaluate six baseline configurations spanning truncated context, long-context LLMs, retrieval-augmented generation (RAG), and memory systems. The best baseline (RAG-Both) achieves 0.358 overall F1 and 0.342 on Hybrid. Document-only retrieval (RAG-Doc) collapses to 0.267 on Hybrid despite achieving 0.453 on Doc-only questions, demonstrating a clear joint-retrieval gap that motivates architectures unifying conversational memory with long-document navigation. We release the dataset, generation pipeline, and all baseline implementations.
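For readers unfamiliar with the agreement statistic quoted above, the following sketch computes Cohen's kappa for two sets of judge verdicts. The label names and the two toy runs are illustrative assumptions. How the paper pairs prompts or aggregates per micro-world is not stated in the abstract, so this shows only the standard two-rater formula $\kappa = (p_o - p_e) / (1 - p_e)$.

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two equal-length sequences of categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label sequences must be non-empty and equal in length")
    n = len(labels_a)
    # Observed agreement: fraction of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        return 1.0  # both raters used one identical label throughout
    return (p_o - p_e) / (1.0 - p_e)

# Two hypothetical judge runs over the same six answers.
run_1 = ["correct", "correct", "wrong", "wrong", "correct", "wrong"]
run_2 = ["correct", "wrong", "wrong", "wrong", "correct", "correct"]
kappa = cohens_kappa(run_1, run_2)
```

Here the runs agree on four of six items while chance agreement is 0.5, so kappa is well below the raw agreement rate.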

2606.04438 2026-06-04 cs.LG cs.AI 版本更新

LoopMoE: Unifying Iterative Computation with Mixture-of-Experts for Language Modeling

LoopMoE:统一迭代计算与混合专家模型用于语言建模

Wenkai Chen, Tianshu Li, Wenyong Huang, Yichun Yin, Lifeng Shang, Chengwei Qin

发表机构 * Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) Huawei Technologies Co.,Ltd.(华为技术有限公司)

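AI summary Proposes LoopMoE, which uses iteration-adaptive layer normalization (IterAdaLN) and a capacity-balancing strategy. At matched parameters and FLOPs, the looped MoE language model outperforms a standard MoE on multiple benchmarks.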

详情
AI中文摘要

混合专家模型(MoE)和循环架构分别沿着参数容量和有效深度两个正交维度扩展模型。然而,主流的循环架构依赖于密集主干,将参数数量与每个token的FLOPs耦合,这使得在匹配预算下无法隔离迭代计算的效果。为此,我们提出了LoopMoE,一种循环MoE语言模型,通过两种设计将稀疏路由与迭代权重共享计算相结合。第一种是IterAdaLN,它通过联合以迭代索引和每个token隐藏状态为条件的调制信号来解决权重共享对称性。第二种是一种容量平衡策略,恢复了经过良好调整的非循环参考模型的注意力到FFN活跃参数比率。这些设计共同实现了在相同总参数、每个token FLOPs和活跃子层比率下,循环MoE与标准MoE的首次严格受控的头对头评估。在3B规模下,LoopMoE在9个下游基准测试中的8个上优于标准MoE,平均提升超过1个点。在9B规模下,LoopMoE继续优于匹配的标准MoE,表明架构优势在更大规模下持续存在。我们的工作建立了稀疏性和循环性的受控综合,并为循环语言模型指明了一个有前景的方向。

英文摘要

Mixture-of-Experts (MoE) and looped architectures scale models along two orthogonal axes, namely parameter capacity and effective depth. However, mainstream looped architectures rely on dense backbones that couple parameter count with per-token FLOPs, which makes it impossible to isolate the effect of iterative computation under matched budgets. To this end, we present LoopMoE, a looped MoE language model that integrates sparse routing with iterative weight-shared computation through two designs. The first is IterAdaLN, which resolves weight-sharing symmetry via a modulation signal jointly conditioned on the iteration index and the per-token hidden state. The second is a capacity-balancing strategy that recovers the attention-to-FFN active parameter ratio of well-tuned non-looped references. Together, these designs enable the first strictly controlled, head-to-head evaluation of a looped MoE against a Vanilla MoE under identical total parameters, per-token FLOPs, and active sublayer ratios. At the 3B scale, LoopMoE outperforms the Vanilla MoE on 8 of 9 downstream benchmarks with an average improvement exceeding 1 point. At the 9B scale, LoopMoE continues to outperform the matched Vanilla MoE, indicating that the architectural gain persists at larger scale. Our work establishes a controlled synthesis of sparsity and recurrence, and suggests a promising direction for looped language models.

2606.04435 2026-06-04 cs.AI cs.CL cs.CR cs.IR 版本更新

Cascading Hallucination in Agentic RAG: The CHARM Framework for Detection and Mitigation

智能体RAG中的级联幻觉:用于检测和缓解的CHARM框架

Saroj Mishra

发表机构 * University of North Dakota(北达科他大学)

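AI summary Targets cascading hallucination in multi-step agentic RAG pipelines, where early errors propagate and amplify into incorrect final outputs. Proposes the CHARM framework, which detects and mitigates cascades through four components (stage-level fact verification, cross-stage consistency tracking, confidence propagation monitoring, and cascade resolution triggering), reaching an 89.4% cascade detection rate and an 82.1% reduction in error propagation across several datasets.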

详情
AI中文摘要

多步智能体检索增强生成(RAG)管道在复杂推理任务中展现出显著能力,但仍然容易受到一类现有幻觉检测机制系统性遗漏的故障影响:级联幻觉,即在管道早期阶段引入的错误会通过连续推理步骤传播并放大,产生自信但事实不正确的最终输出。为解决这一漏洞,我们将级联幻觉形式化为智能体RAG系统中的一种独特故障模式,提出四种级联模式的分类法,并引入CHARM(级联幻觉感知解析与缓解),一种用于检测和中断多步推理管道中错误传播的架构框架。CHARM包含四个组件——阶段级事实验证、跨阶段一致性跟踪、置信度传播监控和级联解析触发——它们与标准智能体RAG管道并行运行,无需替换架构。我们在HotpotQA、MuSiQue、2WikiMultiHopQA以及一个自定义对抗数据集上,在LangChain智能体管道配置下评估CHARM,实现了89.4%的级联检测率、5.3%的假阳性率、每阶段平均215 ms ± 18 ms的延迟开销,以及82.1%的错误传播减少,而输出级检测器仅为18.5%。组件消融实验证实每个检测模块对整体级联覆盖都有显著贡献。CHARM与人在回路监督框架集成,为生产级智能体AI部署提供完整的可靠性和治理栈。

英文摘要

Multi-step agentic retrieval-augmented generation (RAG) pipelines have demonstrated significant capability for complex reasoning tasks, yet remain vulnerable to a class of failure that existing hallucination detection mechanisms systematically miss: cascading hallucination, where errors introduced at early pipeline stages propagate and amplify across successive reasoning steps, producing confident but factually incorrect final outputs. To address this vulnerability, we formalize cascading hallucination as a distinct failure mode in agentic RAG systems, present a four-type taxonomy of cascade patterns, and introduce CHARM (Cascading Hallucination Aware Resolution and Mitigation), an architectural framework for detecting and interrupting error propagation in multi-step reasoning pipelines. CHARM comprises four components (stage-level fact verification, cross-stage consistency tracking, confidence propagation monitoring, and cascade resolution triggering) that operate alongside standard agentic RAG pipelines without requiring architectural replacement. We evaluate CHARM on HotpotQA, MuSiQue, 2WikiMultiHopQA, and a custom adversarial dataset across LangChain agentic pipeline configurations, achieving an 89.4% cascade detection rate with a 5.3% false positive rate, a 215 ms ± 18 ms average latency overhead per stage, and an 82.1% reduction in error propagation, compared to 18.5% for output-level detectors. Component ablations confirm that each detection module contributes meaningfully to overall cascade coverage. CHARM integrates with human-in-the-loop oversight frameworks to provide a complete reliability and governance stack for production agentic AI deployment.

2606.04425 2026-06-04 cs.CR cs.AI 版本更新

What If Prompt Injection Never Left? Exploring Cross-Session Stored Prompt Injection in Agentic Systems

如果提示注入从未消失?探索智能体系统中的跨会话存储提示注入

Yuanbo Xie, Tianyun Liu, Yingjie Zhang, Suchen Liu, Yulin Li, Liya Su, Tingwen Liu

发表机构 * Institute of Information Engineering, Chinese Academy of Sciences(中国科学院信息工程研究所) School of Cyber Security, University of Chinese Academy of Sciences(中国科学院大学网络空间安全学院) AI Sec Lab, Beijing Chaitin Technology Co.,Ltd(北京柴坦科技有限公司AI安全实验室)

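AI summary Introduces cross-session stored prompt injection, in which persistent state turns prompt injection from a single-session, model-level threat into a long-lived, system-level vulnerability, and builds a taxonomy, a benchmark, and a sandbox toolkit to assess the risk.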

Comments position paper

详情
AI中文摘要

现代智能体系统将大语言模型从会话受限的助手转变为跨会话持久化并演化共享世界状态的有状态系统,通过记忆、文件系统、工具和其他长期存在的上下文工件实现。这种转变从根本上扩展了提示注入的攻击面。然而,先前关于提示注入的工作主要关注单会话内的模型级威胁,忽视了跨会话持久系统状态如何从根本上改变智能体系统的系统级风险。受Web系统中存储型跨站脚本的启发,我们引入了跨会话存储提示注入,其中成功的注入可以持久存在于智能体系统状态中,并在原始攻击者交互结束后长时间静默影响未来执行。为了系统研究这一威胁,我们形式化了存储提示注入,并开发了关于对抗性内容如何跨会话持久化并影响智能体系统的分类法。我们进一步开发了基准测试和沙箱工具包来评估存储提示注入的风险,支持对不同模型、攻击目标和持久化渠道的攻击成功率进行定量分析。我们的发现强调,持久化将提示注入从短暂的模型级威胁转变为嵌入智能体执行状态中的长期系统级漏洞。我们希望这项工作能引起对这一新兴威胁的更广泛关注,并激励社区系统研究和缓解智能体系统中持久化带来的系统风险。

英文摘要

Modern agentic systems transform LLMs from session-bounded assistants into stateful systems that persist and evolve shared world state across sessions through memories, filesystems, tools, and other long-lived contextual artifacts. This shift fundamentally expands the attack surface of prompt injection. However, prior works on prompt injection have largely focused on model-level threats within a single session, overlooking how cross-session persistent system state fundamentally changes the system-level risk of agentic systems. Inspired by stored cross-site scripting in web systems, we introduce cross-session stored prompt injection, where a successful injection can persist within agentic system state and silently influence future executions long after the original attacker interaction has ended. To systematically study this threat, we formalize stored prompt injection and develop a taxonomy of how adversarial content persists and affects agentic systems across sessions. We further develop a benchmark and sandbox toolkit to evaluate the risks of stored prompt injection, enabling quantitative analysis of attack success across different models, attack goals, and persistence channels. Our findings highlight that persistence transforms prompt injection from an ephemeral model-level threat into a long-lived system-level vulnerability embedded within agent execution state. We hope this work draws broader attention to this emerging threat and motivates the community to systematically study and mitigate system risks arising from persistence in agentic systems.

2606.04419 2026-06-04 eess.IV cs.AI cs.CV physics.med-ph 版本更新

L-TGVN: Leveraging Longitudinal Priors for Personalized Rapid MRI

L-TGVN:利用纵向先验进行个性化快速MRI

Arda Atalık, Sumit Chopra, Daniel K. Sodickson

发表机构 * NYU Center for Data Science(纽约大学数据科学中心) Center for Advanced Imaging Innovation and Research (CAI²R)(先进成像创新与研究中心) Courant Institute of Mathematical Sciences(数学科学学院) Function Health

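AI summary Proposes L-TGVN, a variational network that uses longitudinal priors as side information to reconstruct the current scan from heavily undersampled measurements. It requires no explicit registration, accommodates protocol differences, and outperforms baselines on quantitative metrics and structure preservation.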

Comments Accepted to MICCAI 2026

详情
AI中文摘要

MRI提供优异的软组织对比度且无电离辐射,但长采集时间增加患者不适,同时提高检查成本并限制扫描仪吞吐量。减少扫描时间的常见方法是采集更少的测量值,这会产生一个病态线性逆问题;因此,恢复诊断质量的图像需要结合测量数据之外的先验知识。在随访检查中,患者最近的先前扫描可以提供高度信息化的受试者特定背景,但实际应用因时间变化(包括病理进展)、扫描间错位以及跨采集协议漂移而复杂化。在这项工作中,我们引入了L-TGVN,一种纵向信任引导变分网络,利用先前扫描作为侧信息,从高度欠采样测量中重建当前扫描。关键是,L-TGVN约束先前扫描的影响与获取的测量一致。与许多现有的纵向重建方法不同,它不需要先前扫描和当前扫描之间的显式预配准。它进一步适应不同就诊间的采集协议差异(例如,序列参数的变化)。我们在匹配容量的基线上评估L-TGVN,包括先验引导方法和不使用纵向先验的方法,并观察到标准定量指标的一致改进,以及在挑战性加速下更好地保留精细结构。源代码可在github.com/sodicksonlab/L-TGVN获取。

英文摘要

MRI provides excellent soft-tissue contrast without ionizing radiation, but long acquisition times increase patient discomfort while also raising exam costs and limiting scanner throughput. A common approach to reduce scan time is to acquire fewer measurements, which yields an ill-posed linear inverse problem; recovering diagnostic-quality images therefore requires incorporating prior knowledge beyond the measured data. In follow-up exams, the most recent prior scan of a patient can provide a highly informative subject-specific context, but practical use is complicated by temporal changes (including pathology progression), misalignment between scans, and protocol drift across acquisitions. In this work, we introduce L-TGVN, a Longitudinal Trust-Guided Variational Network that leverages prior scans as side information to reconstruct the current scan from heavily undersampled measurements. Crucially, L-TGVN constrains the influence of prior scans to be consistent with the acquired measurements. Unlike many existing longitudinal reconstruction methods, it does not require explicit pre-registration between prior and current scans. It further accommodates differences in acquisition protocols across visits (e.g., changes in sequence parameters). We evaluate L-TGVN against matched-capacity baselines, including prior-guided methods and methods that do not use longitudinal priors, and observe consistent improvements in standard quantitative metrics together with better preservation of fine structures at challenging accelerations. Source code is available at github.com/sodicksonlab/L-TGVN.
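The "ill-posed linear inverse problem" mentioned above can be written out in a generic form. The notation is an assumption for illustration and is not taken from the paper: a single-coil model with image $x \in \mathbb{C}^{N}$, Fourier transform $F$, sampling mask $M$ that keeps $m < N$ k-space samples, noise $n$, and a generic regularizer $R$ that may depend on a prior scan $x_{\text{prior}}$.

$$
y = M F x + n, \qquad y \in \mathbb{C}^{m}, \quad m < N,
$$
$$
\hat{x} = \arg\min_{x} \; \tfrac{1}{2} \, \| M F x - y \|_2^2 + \lambda \, R(x; x_{\text{prior}}).
$$

Because $MF$ has fewer rows than columns, infinitely many images match the measurements exactly, and the second term is what selects among them. The first term is a data-consistency term of the kind the abstract refers to when it says the influence of prior scans is kept consistent with the acquired measurements. How L-TGVN actually parameterizes and unrolls this objective is not specified in the abstract.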

2606.04408 2026-06-04 cs.LG cs.AI 版本更新

An Ensembled Latent Factor Model via Differential Evolution and Gradient Descent Optimization

基于差分进化和梯度下降优化的集成潜在因子模型

Rui Zhang, Jinhang Liu, Wenbo Zhang

发表机构 * Chongqing Academy of Economics Research(重庆经济研究院) College of Computer and Information Science, Southwest University(西南大学计算机与信息科学学院)

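AI summary For high-dimensional and incomplete data, proposes an ensembled latent factor model that trains two models separately with differential evolution and gradient descent and fuses them through self-adaptive weighting to obtain more comprehensive, less biased representations.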

详情
AI中文摘要

高维不完全(HDI)数据在许多现实世界的大数据场景中普遍存在。潜在因子模型是一种常见的表示学习方法,能够从这些数据中揭示信息丰富的潜在因子。然而,大多数现有的潜在因子模型仅依赖梯度下降进行优化,这可能导致表示不充分且有偏差,特别是在处理异构HDI数据时。因此,本研究提出了一种基于差分进化和梯度下降优化的集成潜在因子模型(ELFM-DEGDO),其设计包括两个方面:1)分别通过差分进化和梯度下降优化独立建模两个不同的潜在因子模型;2)通过定制的自适应加权机制将这两个不同的潜在因子模型组合起来,以有效融合它们的优势。通过利用两种优化范式的互补优势,ELFM-DEGDO能够为HDI数据生成更全面、偏差更小的表示。在三个HDI数据集上的测试表明,ELFM-DEGDO的性能始终优于相关的几种潜在因子模型。

英文摘要

High-dimensional and incomplete (HDI) data are prevalent in many real-world big data scenarios. Latent factor models serve as a common representation learning approach, capable of uncovering informative latent factors from such data. Nevertheless, most existing latent factor models rely solely on gradient descent for optimization, which may lead to insufficient and biased representations, particularly when dealing with heterogeneous HDI data. Thus, this study proposes an Ensembled Latent Factor Model via Differential Evolution and Gradient Descent Optimization (ELFM-DEGDO) with a two-fold design: 1) two diverse latent factor models are independently trained via differential evolution and gradient descent optimization, respectively, and 2) the two models are combined via a customized self-adaptive weighting mechanism to effectively fuse their strengths. By leveraging the complementary advantages of both optimization paradigms, ELFM-DEGDO is able to produce more comprehensive and less biased representations for HDI data. Tests on three HDI datasets show that ELFM-DEGDO consistently performs better than several related latent factor models.

2606.04405 2026-06-04 cs.LG cs.AI 版本更新

Low-Rank Decay for Grokking in Scale-Invariant Transformers: A Spectral-Geometric View

尺度不变Transformer中Grokking的低秩衰减:谱几何视角

Mingyu Li

发表机构 * Beijing Normal University(北京师范大学)

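AI summary Because weight decay cannot simplify the function of normalized layers in scale-invariant Transformers, proposes the Low-Rank Decay (LRD) regularizer, which compresses singular values through the tangential component of the nuclear-norm subgradient. On modular arithmetic tasks it accelerates effective-rank collapse and expands the data-fraction boundary for delayed generalization (grokking).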

详情
AI中文摘要

现代Transformer架构经常采用归一化机制,如RMSNorm和Query-Key归一化,使得模型的部分相对于权重幅度近似尺度不变。在这种机制下,标准的Frobenius范数权重衰减仅沿权重空间的径向方向作用,无法直接简化归一化层所表示的函数。我们通过这一视角研究小规模算法任务中的grokking现象,并提出\emph{低秩衰减}(LRD),一种类似核范数的谱正则化器,其子梯度——极因子$UV^\top$——即使在尺度不变设置中也保留切向分量。这一区别具有具体的动力学后果:在模型记忆训练集且任务梯度消失后,L2衰减无法再重塑权重谱,而LRD则以类似$\ell_1$的方式继续压缩奇异值。在模算术任务中,我们发现LRD诱导Query/Key矩阵的快速有效秩下降,并扩展了延迟泛化(grokking)发生的数据分数边界。我们进一步通过核范数子微分在低秩流形附近的“针到扇”展开,提供了谱几何解释。

英文摘要

Modern Transformer architectures frequently employ normalization mechanisms such as RMSNorm and Query-Key Normalization, making parts of the model approximately scale-invariant with respect to weight magnitudes. In this regime, standard Frobenius-norm weight decay acts purely along the radial direction of the weight space and cannot directly simplify the function represented by the normalized layer. We study grokking in small algorithmic tasks through this lens and propose Low-Rank Decay (LRD), a nuclear-norm-like spectral regularizer whose subgradient (the polar factor $UV^\top$) retains a tangential component even in the scale-invariant setting. This distinction has a concrete dynamical consequence: after the model memorizes the training set and task gradients vanish, L2 decay can no longer reshape the weight spectrum, whereas LRD continues to compress singular values in an $\ell_1$-like fashion. On modular arithmetic tasks, we find that LRD induces rapid effective-rank collapse in Query/Key matrices and expands the data-fraction boundary at which delayed generalization (grokking) occurs. We further provide a spectral-geometric interpretation through the "needle-to-fan" expansion of the nuclear-norm subdifferential near low-rank strata.
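The radial-versus-tangential distinction can be made explicit with a short derivation. This is a sketch under assumed notation that does not appear in the abstract: $W = U \Sigma V^\top$ is the compact SVD of a weight matrix of rank $r$, $\lambda$ is the decay coefficient, and $f$ is any differentiable function of $W$ that is invariant to positive rescaling. The abstract calls LRD "nuclear-norm-like", so the exact regularizer may differ from the plain nuclear norm used here.

$$
f(cW) = f(W) \;\; \forall c > 0 \;\;\Longrightarrow\;\; \langle \nabla f(W), W \rangle = 0,
$$
$$
\nabla_W \, \tfrac{\lambda}{2} \| W \|_F^2 = \lambda W \quad \text{(purely radial)},
$$
$$
UV^\top \in \partial \| W \|_*, \qquad \langle UV^\top, W \rangle = \operatorname{tr}(\Sigma) = \| W \|_*,
$$
$$
T \;=\; UV^\top - \frac{\| W \|_*}{\| W \|_F^2} \, W \;=\; U \Big( I - \frac{\| W \|_*}{\| W \|_F^2} \, \Sigma \Big) V^\top .
$$

The first line says that moving along $W$ itself does not change a scale-invariant function, so the decay direction $\lambda W$ shrinks the norm without altering what the layer computes. The tangential part $T$ of the polar factor vanishes only when all nonzero singular values are equal. Whenever the spectrum is non-uniform, stepping along $-UV^\top$ changes the direction of $W$ as well as its norm, subtracting the same amount from every nonzero singular value, which is the $\ell_1$-like compression described above.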

2606.04402 2026-06-04 cs.AI 版本更新

Not All Errors Are Equal: Consequence-Aware Reasoning Compute Allocation

并非所有错误都同等重要:后果感知的推理计算分配

Jingbo Wen, Liang He, Ziqi He

发表机构 * The University of Sydney(悉尼大学) Shanghai Institute of Optics and Fine Mechanics(上海光学精密机械研究所)

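AI summary Proposes consequence-aware test-time compute allocation: a lightweight predictor estimates the cost of task errors, and high-consequence tasks are routed to more compute under the same budget, reducing cost-weighted loss by 22% to 33% on SWE-bench.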

详情
AI中文摘要

现代推理模型可以为不同任务分配不同量的测试时计算,例如思考令牌、模型调用或计算预算。现有方法通常通过预测难度来驱动这种分配,并在预期能提高准确率的地方投入更多计算。这隐含地假设所有失败的成本相同,因为准确率目标对每个任务一视同仁。然而,这种假设在部署中并不成立:日志消息中的拼写错误和导致生产数据库损坏的迁移都算作一次基准失败,但它们的实际成本根本不同。为填补这一空白,我们提出后果感知的测试时计算分配。我们不是仅根据预测难度来路由计算,而是使用轻量级预测器从问题文本中估计如果任务被错误解决会有多高的成本。然后,调度器在相同总预算下将更高后果的任务路由到更大的计算层级或更高的思考预算。我们在SWE-bench Lite上进行主要实验,并在Multi-SWE-bench mini上评估跨数据集行为,总共涵盖700个软件工程任务。我们的结果表明,在各种标注下,后果和难度大致正交,并且当前的思考模型并未根据后果充分分配计算。此外,我们的仅问题文本预测器在300个SWE-bench任务中从未将高后果任务误分类为低后果任务。在匹配的计算预算下,我们的后果感知调度器相对于难度感知路由将成本加权损失降低了22%至33%;特别是,优先级感知变体(根据边际效用信号缩放每个任务的成本进行路由)降低了超过30%,而其可部署的预测器驱动版本保留了超过90%的预言机增益。

英文摘要

Modern reasoning models can allocate different amounts of test-time computation, such as thinking tokens, model calls, or compute budget, to different tasks. Existing methods generally drive this allocation by predicted difficulty and spend more compute where it is expected to raise accuracy. This implicitly assumes that all failures cost the same, since an accuracy objective weights every task equally. However, such an assumption does not hold in deployment: a typo in a log message and a migration that corrupts a production database both count as one benchmark failure, but their real-world costs are fundamentally different. To fill this gap, we propose consequence-aware test-time compute allocation. Instead of routing compute only by predicted difficulty, we use a lightweight predictor to estimate from the issue text how costly a task would be if solved incorrectly. The scheduler then routes higher-consequence tasks to larger compute tiers or higher thinking budgets under the same total budget. We conduct main experiments on SWE-bench Lite and evaluate cross-dataset behavior on Multi-SWE-bench mini, covering 700 software-engineering tasks in total. Our results reveal that consequence and difficulty are approximately orthogonal under various annotations, and that current thinking models do not allocate compute sufficiently according to consequence. Moreover, our issue-only predictor never misclassifies a high-consequence task as low-consequence across the 300 SWE-bench tasks. Under matched compute budgets, our consequence-aware scheduler reduces cost-weighted loss by 22% to 33% relative to difficulty-aware routing; in particular, the priority-aware variant, which routes by per-task cost scaled by the marginal-utility signal, exceeds a 30% reduction, and its deployable predictor-driven version retains over 90% of the oracle gain.
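A minimal sketch of the two quantities discussed above, under stated assumptions: each task has a scalar predicted cost of failure, there are only two compute tiers, and the budget is expressed as a fixed number of high-tier slots. The function names and the greedy rule are hypothetical. The paper's scheduler, including its priority-aware variant with a marginal-utility signal, is not specified in the abstract.

```python
def cost_weighted_loss(costs, failed):
    """Sum the cost of every failed task, instead of counting failures equally."""
    if len(costs) != len(failed):
        raise ValueError("costs and failed must have the same length")
    return sum(c for c, f in zip(costs, failed) if f)

def assign_tiers(predicted_costs, high_tier_slots):
    """Greedy routing: the tasks with the highest predicted consequence get the
    larger compute tier, up to a fixed number of slots (the shared budget)."""
    order = sorted(
        range(len(predicted_costs)),
        key=lambda i: predicted_costs[i],
        reverse=True,
    )
    high = set(order[:high_tier_slots])
    return ["high" if i in high else "low" for i in range(len(predicted_costs))]

# Three hypothetical issues: log typo, database migration, UI glitch.
costs = [1.0, 10.0, 3.0]
tiers = assign_tiers(costs, high_tier_slots=1)
loss = cost_weighted_loss(costs, failed=[True, False, True])
```

Under plain accuracy, the outcome above counts as two failures out of three. Under the cost-weighted view it is a loss of 4.0 out of a possible 14.0, because the expensive migration task succeeded.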

2606.04391 2026-06-04 cs.AI 版本更新

Online Skill Learning for Web Agents via State-Grounded Dynamic Retrieval

基于状态接地动态检索的Web代理在线技能学习

Jiaxi Li, Ke Deng, Yun Wang, Jingyuan Huang, Yucheng Shi, Qiaoyu Tan, Jin Lu, Ninghao Liu

发表机构 * University of Georgia(佐治亚大学) Tencent America(腾讯美国) New York University(纽约大学) The Hong Kong Polytechnic University(香港理工大学)

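AI summary Proposes State-Grounded Dynamic Retrieval (SGDR), which improves web agents on multi-step automation tasks through stepwise skill reuse, reaching average success rates on WebArena of 37.5% (GPT-4.1) and 24.3% (Qwen3-4B).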

Comments 17 pages

详情
AI中文摘要

语言代理越来越依赖可重用技能来改进跨相关任务的多步Web自动化。越来越多的研究关注在线技能学习,其中代理不断从先前的任务轨迹中归纳技能,并在未来的任务中动态重用它们。然而,现有方法主要在任务级别重用技能:根据初始任务指令检索一组固定的技能,并在执行过程中保持不变。这种静态策略与Web执行不一致,因为适当的下一步动作不仅取决于任务目标,还取决于当前网页状态,而网页状态通常会转变为初始技能无法覆盖的情况。为了解决这一差距,我们提出了状态接地动态检索(SGDR),一种在线技能学习方法,使Web代理能够逐步重用技能。SGDR由三个组件组成:一个滑动窗口提取过程,将完成的轨迹转化为可在中间执行状态调用的可重用子程序;一种双文本代码表示,将技能检索与可执行动作连接起来;以及一种状态接地动态检索机制,将技能与任务目标和当前网页状态相匹配。在WebArena上跨五个领域的实验表明,SGDR consistently outperforms strong baselines, achieving average success rates of 37.5% with GPT-4.1 and 24.3% with Qwen3-4B, corresponding to relative gains of 10.6% and 10.0% over the strongest baseline, respectively. 代码可在 https://github.com/plusnli/skill-dynamic-retrieval 获取。

英文摘要

Language agents increasingly rely on reusable skills to improve multi-step web automation across related tasks. A growing line of work studies online skill learning, where agents continually induce skills from previous task trajectories and reuse them in future tasks on the fly. However, existing methods mainly reuse skills at the task level: a fixed set of skills is retrieved based on the initial task instruction and then held fixed throughout execution. This static strategy is misaligned with web execution, where the appropriate next action depends not only on the task goal but also on the current webpage state, which often transitions into situations that the initial skills fail to cover. To address this gap, we propose State-Grounded Dynamic Retrieval (SGDR), an online skill learning method that enables stepwise skill reuse for web agents. SGDR consists of three components: a sliding-window extraction process that turns completed trajectories into reusable sub-procedures invokable at intermediate execution states, a dual text-code representation that connects skill retrieval with executable action, and a state-grounded dynamic retrieval mechanism that matches skills to both the task goal and the current webpage state. Experiments on WebArena across five domains show that SGDR consistently outperforms strong baselines, achieving average success rates of 37.5% with GPT-4.1 and 24.3% with Qwen3-4B, corresponding to relative gains of 10.6% and 10.0% over the strongest baseline, respectively. The code is available at https://github.com/plusnli/skill-dynamic-retrieval.
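The reported gains are relative, not absolute, which is easy to misread. The short calculation below backs out the baseline success rates implied by the abstract's numbers. The helper name is hypothetical, and because the published figures are rounded, the results are approximate.

```python
def implied_baseline(success_rate, relative_gain):
    """Baseline rate b such that success_rate = b * (1 + relative_gain)."""
    return success_rate / (1.0 + relative_gain)

# Figures taken from the abstract (percent success rate, relative gain).
gpt41_baseline = implied_baseline(37.5, 0.106)
qwen_baseline = implied_baseline(24.3, 0.100)

# Absolute improvements in percentage points.
gpt41_gain_pp = 37.5 - gpt41_baseline
qwen_gain_pp = 24.3 - qwen_baseline
```

The implied strongest baselines are roughly 33.9% and 22.1%, so the absolute improvements are about 3.6 and 2.2 percentage points.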

2606.04388 2026-06-04 cs.CR cs.AI cs.LG 版本更新

TITAN-FedAnil+: Trust-Based Adaptive Blockchain Federated Learning for Resource-Constrained Intelligent Enterprises

TITAN-FedAnil+:面向资源受限智能企业的基于信任的自适应区块链联邦学习

Muhammad Hadi, Muhammad Jahangir, Talha Shafique, Muhammad Khuram Shahzad

发表机构 * University of Science and Technology of China(中国科学技术大学)

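AI summary Proposes the TITAN-FedAnil+ framework, which filters malicious updates with affinity propagation-based adaptive clustered aggregation, improves efficiency with GPU-accelerated vectorization, and uses a signed state jump mechanism for lightweight blockchain resynchronization, cutting memory overhead by up to 81% on resource-constrained edge devices.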

Comments 8 pages, 5 figures; code available at https://github.com/error8149/FedAnilPlus-Optimized

详情
AI中文摘要

联邦学习(FL)已成为一种在保护数据隐私的同时实现协作智能的有效范式。然而,由非独立同分布(non-IID)数据分布引起的数据异构性和去中心化安全威胁仍然是重大挑战,尤其是在资源受限的企业环境中。本文提出了TITAN-FedAnil+,一种面向智能企业中区块链联邦学习的基于信任的自适应网络。所提出的框架引入了基于亲和传播的自适应聚类聚合,无需预先知道攻击者数量即可识别并过滤恶意更新。此外,采用GPU加速向量化以提高计算效率,同时通过有符号状态跳变机制实现轻量级区块链重同步。实验结果表明,与基线框架相比,在受限的8 GB边缘设备上经过50轮通信,内存开销显著降低,节省高达81%。结果表明,TITAN-FedAnil+有效提升了智能企业环境中安全联邦学习部署的鲁棒性、可扩展性和资源效率。

英文摘要

Federated Learning (FL) has emerged as an effective paradigm for collaborative intelligence while preserving data privacy. However, data heterogeneity arising from non-IID distributions and decentralized security threats remain significant challenges, particularly in resource-constrained enterprise environments. This paper presents TITAN-FedAnil+, a Trust-Based Adaptive Network for blockchain-enabled federated learning in intelligent enterprises. The proposed framework introduces affinity propagation-based adaptive clustered aggregation to identify and filter malicious updates without requiring prior knowledge of the number of attackers. In addition, GPU-accelerated vectorization is employed to improve computational efficiency, while a signed state jump mechanism enables lightweight blockchain resynchronization. Experimental results demonstrate substantial reductions in memory overhead, achieving up to 81% savings across 50 communication rounds on constrained 8 GB edge devices compared with the baseline framework. The results indicate that TITAN-FedAnil+ effectively improves robustness, scalability, and resource efficiency for secure federated learning deployments in intelligent enterprise environments.

2606.04387 2026-06-04 cs.IR cs.AI 版本更新

Rethinking Sales Lead Scoring with LLM-based Hierarchical Preference Ranking

重新思考基于LLM的分层偏好排名的销售线索评分

Chenyu Zhang, Yiwen Liu, Yin Sun, Xinyuan Zhang, Yuji Cao, Junming Jiao, Juyi Qiao

发表机构 * Intelligent Business Team, Li Auto Inc.(李自动公司智能商务团队)

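AI summary For sales lead conversion in high-stakes domains, proposes HPRO on top of an LLM-based discriminative framework. Hierarchical preference ranking optimization jointly models structured and unstructured data and improves scoring and ranking performance.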

详情
AI中文摘要

在高价值领域(如汽车、房地产)中,销售线索转化与电子商务推荐有根本不同,因为其决策周期长且涉及多阶段漏斗。传统的线索评分方法(基于规则的评分卡、机器学习或逐点CTR模型)面临严重挑战:监督信号稀疏、非结构化CRM日志中的语义鸿沟,以及无法捕捉线索的相对优先级。虽然大型语言模型(LLM)能够对客户交互提供卓越的语义理解,但通用LLM不适合线索排名:它们生成文本而非可比较的分数,并且缺乏与销售漏斗分层优先级的一致性。我们提出了一种基于LLM的判别式框架用于销售线索评分,该框架支持结构化CRM特征和非结构化客户交互的联合建模。在此框架之上,我们提出了HPRO(分层偏好排名优化),通过分层偏好排名目标增强销售线索评分。HPRO采用边际感知的Bradley-Terry公式,将稀疏的二元标签转换为密集的、漏斗感知的偏好对,使线索评分能够同时利用逐点和成对监督。在来自领先新能源汽车品牌的大规模数据上的实验表明,分类性能达到最先进水平(AUC 0.8161),排名性能提升(排名靠前线索的精确度提高39.7%)。为期132天的在线A/B测试验证了9.5%的销量提升,确认了实际的商业影响。

英文摘要

Sales lead conversion in high-stakes domains (e.g., automotive, real estate) differs fundamentally from e-commerce recommendation due to prolonged decision cycles and multi-stage funnels. Traditional lead scoring methods (rule-based scorecards, machine learning, or pointwise CTR models) face severe challenges: sparse supervision, a semantic gap in unstructured CRM logs, and an inability to capture relative lead priority. While Large Language Models (LLMs) offer superior semantic understanding of customer interactions, general-purpose LLMs are ill-suited for lead ranking: they generate text rather than comparable scores, and lack alignment with the hierarchical priorities of sales funnels. We introduce an LLM-based discriminative framework for sales lead scoring, which supports joint modeling of structured CRM features and unstructured customer interactions. On top of this framework, we propose HPRO (Hierarchical Preference Ranking Optimization), which augments sales lead scoring with a hierarchical preference ranking objective. HPRO employs a margin-aware Bradley-Terry formulation to transform sparse binary labels into dense, funnel-aware preference pairs, enabling lead scoring to leverage both pointwise and pairwise supervision. Experiments on large-scale data from a leading NEV brand demonstrate state-of-the-art classification (AUC 0.8161) and ranking performance (+39.7% precision among top-ranked leads). A 132-day online A/B test validates a 9.5% sales volume uplift, confirming real-world commercial impact.
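One common way to write a margin-aware Bradley-Terry objective is $-\log \sigma(s_i - s_j - m)$, where lead $i$ is preferred over lead $j$ and $m$ is a margin. The sketch below implements that loss together with a toy pair generator. The integer funnel stages, the `margin_per_stage` parameter, and the rule that the margin grows with the stage gap are all assumptions for illustration. The abstract does not give HPRO's exact formulation.

```python
import math

def margin_bt_loss(score_preferred, score_other, margin=0.0):
    """Pairwise loss -log(sigmoid(s_pref - s_other - margin)), computed stably."""
    z = score_preferred - score_other - margin
    # -log(sigmoid(z)) equals log(1 + exp(-z)); branch to avoid overflow.
    if z >= 0:
        return math.log1p(math.exp(-z))
    return -z + math.log1p(math.exp(z))

def funnel_pairs(stages, margin_per_stage=0.5):
    """Build (preferred_idx, other_idx, margin) triples from funnel depth.

    stages: hypothetical integer funnel stage per lead (higher = further along).
    A lead deeper in the funnel is preferred, with a margin proportional
    to the stage gap.
    """
    pairs = []
    for i, stage_i in enumerate(stages):
        for j, stage_j in enumerate(stages):
            if stage_i > stage_j:
                pairs.append((i, j, margin_per_stage * (stage_i - stage_j)))
    return pairs

# Three leads: no contact (0), test drive (2), showroom visit (1).
stages = [0, 2, 1]
scores = [0.1, 1.4, 0.6]
pairs = funnel_pairs(stages)
total_loss = sum(margin_bt_loss(scores[i], scores[j], m) for i, j, m in pairs)
```

Three leads with a single binary conversion label would give at most two informative pairs. Using funnel depth yields three, each with its own margin, which is the sense in which sparse labels become denser pairwise supervision.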

2606.04382 2026-06-04 cs.DL cs.AI cs.IR 版本更新

LCSHBench: A Multilingual, Consensus-Grounded Benchmark for Library of Congress Subject Heading Assignment

LCSHBench:一个多语言、共识基础的国会图书馆主题标目分配基准

Kwok Leong Tang

发表机构 * Library of Congress(国会图书馆)

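AI summary Introduces the LCSHBench benchmark, a multilingual set of bibliographic records built on multi-library consensus, which evaluates automated subject cataloging by exact and concept matching and shows that a low-rank fine-tuned embedder improves cross-lingual retrieval.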

详情
AI中文摘要

自动主题编目为书目记录分配受控词汇标目,但LCSH缺乏标准的公开基准。我们引入LCSHBench:来自哈佛、哥伦比亚和普林斯顿开放许可目录的15种语言的22,346本书。只有当至少两个独立编目机构分配了LCSH时,记录才被纳入;我们发布每个目录的来源以及联合和一致答案视图。对465,187部由三个图书馆编目的作品进行的一致性研究显示了这种设计的重要性:图书馆通常在底层主题上达成一致(93.3%共享概念级标目),但在确切表达上经常不同(39.4%具有相同的标目集)。因此,LCSHBench通过开放词汇生成和全词汇检索,使用按语言和标目类型分解的集合和排名指标,对精确匹配和概念匹配进行评分。作为首次演示,对300M设备端嵌入器的低秩微调改进了跨语言检索,并在开发集上的精确召回率@200(0.659 vs 0.623)超过了3,072维托管嵌入器。语言面板显示增益并不均匀,保留测试和端到端确认仍是未来工作。

英文摘要

Automated subject cataloging assigns controlled-vocabulary headings to bibliographic records, but LCSH has no standard public benchmark. We introduce LCSHBench: 22,346 books in 15 languages from the openly licensed Harvard, Columbia, and Princeton catalogs. Records enter only when at least two independent cataloging agencies assigned LCSH; we release per-catalog provenance plus union and unanimous answer views. A concordance study of 465,187 works cataloged by all three libraries shows why this design matters: libraries usually agree on the underlying topic (93.3% share a concept-level heading) but often differ in exact expression (39.4% have identical heading sets). LCSHBench therefore scores both exact and concept matches, with set and rank metrics broken down by language and heading type, across open-vocabulary generation and full-vocabulary retrieval. As a first demonstration, a low-rank fine-tune of a 300M on-device embedder improves cross-lingual retrieval and beats a 3,072-dimensional hosted embedder on development exact recall@200 (0.659 vs 0.623). The language panel shows the gain is not uniform, and held-out-test and end-to-end confirmation remain future work.
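The difference between exact and concept-level scoring can be sketched with recall@k. The `to_concept` mapping used here, which keeps only the main heading before the first `--` subdivision, is an assumption made for illustration. The benchmark's actual definition of a concept-level match is not given in the abstract.

```python
def recall_at_k(ranked_headings, gold_headings, k, to_concept=None):
    """Fraction of gold headings found among the top-k ranked headings.

    to_concept: optional function that maps a heading to a coarser concept key.
    When given, both sides are mapped before comparison (concept match).
    When omitted, headings must match exactly.
    """
    normalize = to_concept if to_concept is not None else (lambda h: h)
    gold = {normalize(h) for h in gold_headings}
    if not gold:
        raise ValueError("gold_headings must not be empty")
    top_k = {normalize(h) for h in ranked_headings[:k]}
    return len(gold & top_k) / len(gold)

def main_heading(heading):
    """Hypothetical concept key: the main heading before any subdivision."""
    return heading.split("--")[0]

ranked = ["Buddhism--China--History", "Tea--Japan", "Zen Buddhism"]
gold = ["Buddhism--History", "Zen Buddhism"]

exact = recall_at_k(ranked, gold, k=3)
concept = recall_at_k(ranked, gold, k=3, to_concept=main_heading)
```

The system output and the gold record agree on the topic but subdivide it differently, so exact recall is 0.5 while concept recall is 1.0. This mirrors the gap between the 39.4% and 93.3% agreement figures reported above.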

2606.04381 2026-06-04 cs.LG cs.AI 版本更新

From Symbolic to Geometric: Enabling Spatial Reasoning in Large Language Models

从符号到几何:在大语言模型中实现空间推理

Chen Chu, Bita Azarijoo, Li Xiong, Khurram Shafique, Cyrus Shahabi

发表机构 * University of Southern California(南加州大学) Emory University(埃默里大学) Novateur Research Solutions(Novateur研究解决方案)

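AI summary Proposes the Spatial Language Model (SLM), which treats location information as a first-class modality and learns spatial representations to enable geometric spatial reasoning during inference, significantly outperforming existing methods based on symbolic reasoning.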

详情
AI中文摘要

近期的大语言模型(LLM)通常表现出空间推理能力;然而,这种能力很大程度上是\emph{符号}性的,源于对空间语言的模式匹配,而非真正的\emph{几何}空间推理。由于LLM操作离散令牌,它们缺乏对连续空间表示、显式几何计算和结构化空间算子的原生支持。为解决这一局限,我们引入了\emph{空间语言模型(SLM)},这是首个将位置信息作为一等模态并在模型推理过程中实现几何空间推理的多模态LLM。SLM直接操作学习到的空间表示,而非空间关系的文本描述。为支持有效训练,我们构建了\emph{空间指令数据集},该数据集对齐了空间表示、原子几何操作和自然语言指令。我们进一步提出了名为\emph{SpatialEval}的新基准,旨在评估属性、距离、拓扑和相对位置任务上的空间推理。大量实验表明,SLM显著优于依赖通过提示工程或文本抽象进行符号推理的现有基于LLM的方法,展示了集成几何空间表示对稳健空间推理的优势。我们的指令数据集、评估基准、模型训练代码和模型检查点可在\hyperlink{https://github.com/chuchen2017/SLM}{https://github.com/chuchen2017/SLM}获取。

英文摘要

Recent large language models (LLMs) often appear to exhibit spatial reasoning ability; however, this capability is largely symbolic, arising from pattern matching over spatial language rather than true geometric reasoning over space. Because LLMs operate on discrete tokens, they lack native support for continuous spatial representations, explicit geometric computation, and structured spatial operators. To address this limitation, we introduce the Spatial Language Model (SLM), the first multimodal LLM that treats location information as a first-class modality and enables geometric spatial reasoning within the model's inference process. SLM directly operates on learned spatial representations rather than textual descriptions of spatial relations. To support effective training, we construct a Spatial Instruction Dataset that aligns spatial representations, atomic geometric operations, and natural language instructions. We further propose a new benchmark named SpatialEval, which is designed to evaluate spatial reasoning across attributes, distance, topology, and relative-position tasks. Extensive experiments show that SLM significantly outperforms existing LLM-based approaches that rely on symbolic reasoning via prompt engineering or textual abstraction, demonstrating the benefits of integrating geometric spatial representations for robust spatial reasoning. Our instruction dataset, evaluation benchmark, model training code, and model checkpoints can be found at: https://github.com/chuchen2017/SLM.

2606.04374 2026-06-04 cs.IR cs.AI 版本更新

DSIRM: Learning Query-Bridged Discrete Semantic Identifiers for E-commerce Relevance Modeling

DSIRM:学习查询桥接的离散语义标识符用于电商相关性建模

Bokang Wang, Xing Fang, Mingmin Jin, Jing Wang, Zhentao Song, Guangxin Song, Jianbo Zhu

发表机构 * Taobao & Tmall Group of Alibaba(淘宝与天猫集团)

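AI summary: To address the difficulty continuous embeddings have in capturing fine-grained attribute distinctions in e-commerce search, the paper proposes the Discrete Semantic Identifier Relevance Model (DSIRM), built on query-bridged contrastive quantization. It injects query-item interaction supervision to learn relevance-aware semantic partitions and uses a generative LLM to predict item SIDs, improving offline AUC by 1.54%.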

Comments Jing Wang (Corresponding Author)

详情
AI中文摘要

尽管连续嵌入在电商搜索相关性方面取得了快速进展,但一个长期存在的难题是难以捕捉细粒度的属性区分。虽然离散语义标识符(SIDs)已被广泛采用作为有前景的替代方案,但现有的SID生成方法严重依赖无监督量化。在现实场景中,缺乏显式监督通常使得更难决定哪些物品应共享一个SID,导致查询依赖排序的能力有限。为了解决无监督SID的问题,我们提出显式建模离散相关性特征,并开发了离散语义标识符相关性模型(DSIRM)。具体而言,我们在物品侧提出了一种查询桥接的对比量化方法,将查询-物品交互监督注入残差量化中,以主动学习相关性感知的语义分区。另一方面,我们在查询侧探索生成式大语言模型,从文本中显式预测物品SID,解决长尾查询和意图模糊问题。查询和物品SID之间的层次前缀匹配产生了具有判别力的特征,完美补充了密集信号。在天猫生产数据上的大量实验结果表明,我们提出的方法取得了更好的结果,离线AUC提升了+1.54%。通过高效的混合架构部署,它实现了显著的在线提升(UCTR +0.13%,UCTCVR +0.25%),证明了其巨大的工业价值。

英文摘要

Despite rapid progress of continuous embeddings for e-commerce search relevance, a long-standing open problem is the difficulty in capturing fine-grained attribute distinctions. While discrete Semantic Identifiers (SIDs) have been widely adopted as a promising alternative, existing SID generation methods rely heavily on unsupervised quantization. In realistic scenarios, the lack of explicit supervision often makes it more difficult to dictate which items should share an SID, resulting in limited capability for query-dependent ranking. To address the issue of unsupervised SIDs, we propose to explicitly model discrete relevance features and develop a Discrete Semantic Identifier Relevance Model (DSIRM). Specifically, we present a query-bridged contrastive quantization approach on the item side, injecting query-item interaction supervision into Residual Quantization to actively learn relevance-aware semantic partitions. On the other hand, we explore generative LLMs on the query side to explicitly predict item SIDs from text, resolving tail queries and intent ambiguity. Hierarchical prefix matching between query and item SIDs yields discriminative features that perfectly complement dense signals. Extensive experimental results on Tmall's production data show that our proposed approach has achieved better results, improving offline AUC by +1.54%. Deployed via an efficient hybrid architecture, it achieves significant online lifts (+0.13% UCTR, +0.25% UCTCVR), proving its massive industrial value.
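The hierarchical prefix matching mentioned in the abstract can be illustrated with a small sketch. It assumes an SID is a tuple of codebook indices, one per residual-quantization level, and that the matching feature is simply how many leading levels agree; the function names and the example codes are hypothetical and are not taken from the paper.

```python
from typing import List, Sequence


def prefix_match_depth(query_sid: Sequence[int], item_sid: Sequence[int]) -> int:
    """Count the leading codebook levels on which the two SIDs agree."""
    depth = 0
    for q_code, i_code in zip(query_sid, item_sid):
        if q_code != i_code:
            break
        depth += 1
    return depth


def prefix_match_features(query_sid: Sequence[int], item_sid: Sequence[int]) -> List[int]:
    """One binary feature per level: 1 if the SIDs agree on every level up to and including it."""
    depth = prefix_match_depth(query_sid, item_sid)
    levels = min(len(query_sid), len(item_sid))
    return [1 if level < depth else 0 for level in range(levels)]


# Hypothetical three-level SIDs (coarse to fine).
query_sid = (12, 7, 3)
item_a = (12, 7, 9)   # shares the two coarsest levels with the query
item_b = (12, 1, 3)   # shares only the coarsest level

print(prefix_match_features(query_sid, item_a))  # [1, 1, 0]
print(prefix_match_features(query_sid, item_b))  # [1, 0, 0]
```

Note that `item_b` agrees with the query on the third code but gets no credit for it, because a residual code is only meaningful relative to the codes above it. Features like these could be fed to a ranker alongside dense similarity scores.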

2606.04365 2026-06-04 cs.CV cs.AI 版本更新

Multi-Granularity 3D Kidney Lesion Characterization from CT Volumes

多粒度3D肾脏病变特征提取来自CT体积

Renjie Liang, Zhengkang Fan, Jinqian Pan, Chenkun Sun, Jiang Bian, Russell Terry, Jie Xu

发表机构 * Department of Health Outcomes and Biomedical Informatics, University of Florida(健康结果与生物医学信息学系,佛罗里达大学) Department of Urology, University of Florida(泌尿外科,佛罗里达大学) Department of Biostatistics and Health Data Science, Indiana University School of Medicine(生物统计学与健康数据科学系,印第安纳大学医学院) Center of Biomedical Informatics(生物医学信息学中心)

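AI summary: Proposes LesionDETR, a DETR-style architecture that uses size-distance Hungarian matching and a hierarchical loss to predict four clinical attributes per lesion from CT volumes, reaching a bilateral side-level abnormality AUC of 0.799 on UF-Health.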

详情
AI中文摘要

放射学报告通过类型、大小、增强和衰减描述肾脏病变,但现有的3D方法仅在患者或器官级别进行预测。我们将肾脏CT特征提取重新定义为每个病变的集合预测任务:一个模型为每个肾脏输出可变数量的病变,每个病变具有四个临床属性。我们从一家学术医疗中心的788名患者中整理了2,619个CT体积,具有多粒度的侧别和每个病变的标签,并使用KiTS23(489例)进行零样本外部验证。我们提出了\textbf{LesionDETR},一种DETR风格的架构,具有大小距离匈牙利匹配和分层损失,将每个槽的输出聚合到侧别目标。在四种输入表示和六种编码器初始化中,两个设计选择占主导地位:分割掩码作为输入通道,以及同域腹部预训练(SuPreM);通用大型语料库预训练并不比随机初始化更好。LesionDETR在UF-Health上达到双侧侧别异常AUC $0.799 \pm 0.009$,在KiTS23上达到$0.817 \pm 0.072$。计数条件变体在囊性病变上达到每个病变mAP $0.190 \pm 0.083$;罕见的实性病变AP仍处于噪声水平,表明下一个瓶颈是针对性数据收集,而非架构。该框架为下游结构化报告生成提供了经过验证的每个病变预测。

英文摘要

Radiology reports describe kidney lesions by type, size, enhancement, and attenuation, yet existing 3D methods predict only at the patient or organ level. We reformulate kidney CT characterization as a per-lesion set-prediction task: one model emits a variable number of lesions per kidney, each with four clinical attributes. We curated 2,619 CT volumes from 788 patients at one academic medical center, with multi-granularity side- and per-lesion labels, and used KiTS23 (489 cases) for zero-shot external validation. We propose \textbf{LesionDETR}, a DETR-style architecture with size-distance Hungarian matching and a hierarchical loss that aggregates per-slot outputs to side-level objectives. Across four input representations and six encoder initializations, two design choices dominate: a segmentation mask as an input channel, and same-domain abdominal pretraining (SuPreM); generic large-corpus pretraining is no better than random initialization. LesionDETR reaches bilateral side-level abnormality AUC $0.799 \pm 0.009$ on UF-Health and $0.817 \pm 0.072$ on KiTS23. A count-conditioned variant reaches per-lesion mAP $0.190 \pm 0.083$ on cystic lesions; rare solid-lesion AP stays at the noise floor, pointing to targeted data collection, not architecture, as the next bottleneck. The framework yields verified per-lesion predictions for downstream structured report generation.
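To make the set-prediction matching step concrete, here is a minimal sketch of assigning prediction slots to ground-truth lesions under a size-plus-distance cost. The cost form (absolute size difference plus Euclidean centroid distance, equally weighted), the field names, and the numbers are assumptions for illustration; the abstract does not give the paper's exact cost. The sketch uses exhaustive search over assignments, which reaches the same optimum as the Hungarian algorithm on small inputs but does not scale.

```python
import math
from itertools import permutations


def match_cost(pred, target, size_weight=1.0, dist_weight=1.0):
    """Assumed cost: weighted size difference plus centroid distance (both in mm)."""
    size_term = abs(pred["size_mm"] - target["size_mm"])
    dist_term = math.dist(pred["center_mm"], target["center_mm"])
    return size_weight * size_term + dist_weight * dist_term


def best_assignment(preds, targets):
    """Return (total_cost, slots), where slots[j] is the prediction slot matched to targets[j].

    Requires len(preds) >= len(targets); unmatched slots are treated as "no lesion".
    """
    best = (math.inf, ())
    for slots in permutations(range(len(preds)), len(targets)):
        total = sum(match_cost(preds[s], targets[j]) for j, s in enumerate(slots))
        if total < best[0]:
            best = (total, slots)
    return best


# Three prediction slots, two annotated lesions (hypothetical values).
preds = [
    {"size_mm": 30.0, "center_mm": (10.0, 10.0, 10.0)},
    {"size_mm": 8.0, "center_mm": (50.0, 40.0, 20.0)},
    {"size_mm": 5.0, "center_mm": (0.0, 0.0, 0.0)},
]
targets = [
    {"size_mm": 10.0, "center_mm": (50.0, 42.0, 20.0)},
    {"size_mm": 28.0, "center_mm": (12.0, 10.0, 10.0)},
]

total_cost, slots = best_assignment(preds, targets)
print(slots)  # (1, 0): target 0 is matched to slot 1, target 1 to slot 0
```

In a DETR-style training loop, the per-attribute losses would then be computed only between matched pairs, with the remaining slots supervised toward an empty class.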

2606.04345 2026-06-04 cs.CV cs.AI cs.LG 版本更新

HYolo: An Intelligent IoT-Based Object Detection System Using Hypergraph Learning

HYolo:一种基于超图学习的智能物联网目标检测系统

Isha Abid, Fawad Khan, Muhammad Khuram Shahzad

发表机构 * National University of Sciences and Technology(国立科技大学)

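AI summary: Proposes HYolo, a framework that integrates hypergraph learning into the YOLO architecture to model high-order feature relationships, improving mAP@50 on COCO by approximately 12% over baseline YOLO models.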

Comments 8 pages, multiple figures

详情
AI中文摘要

本文提出HYolo,一种基于物联网的智能目标检测框架,将超图学习集成到YOLO架构中。传统的基于YOLO的目标检测模型主要捕获成对特征交互,可能无法建模对象与上下文特征之间的复杂高阶关系。为解决这一局限,HYolo引入超图学习以捕获更丰富的上下文依赖关系并改进对象表示。在COCO数据集上的实验评估表明,与基线YOLO模型相比,性能显著提升。所提方法在mAP@50上实现了约12%的提升,同时增强了整体检测准确性和鲁棒性。通过建模高阶特征关系,HYolo在物联网环境中提供了改进的上下文理解和更可靠的目标检测性能。结果表明,将超图学习集成到目标检测流程中,为智能且上下文感知的物联网视觉系统提供了一个有前景的方向。

英文摘要

This paper presents HYolo, an intelligent IoT-based object detection framework that integrates hypergraph learning into the YOLO architecture. Traditional YOLO-based object detection models primarily capture pairwise feature interactions and may fail to model complex high-order relationships among objects and contextual features. To address this limitation, HYolo incorporates hypergraph learning to capture richer contextual dependencies and improve object representation. Experimental evaluation on the COCO dataset demonstrates significant performance improvements over baseline YOLO models. The proposed approach achieves approximately 12% improvement in mAP@50 while enhancing overall detection accuracy and robustness. By modeling high-order feature relationships, HYolo provides improved contextual understanding and more reliable object detection performance in IoT-based environments. The results indicate that integrating hypergraph learning into object detection pipelines offers a promising direction for intelligent and context-aware IoT vision systems.

2606.04342 2026-06-04 cs.LG cs.AI 版本更新

Expectations vs. Realities: The Cost of MSE-Optimal Forecasting Under Conditional Uncertainty

期望与现实:条件不确定性下MSE最优预测的成本

Riku Green, Zahraa S. Abdallah, Telmo M Silva Filho

发表机构 * The University of Bristol(布里斯托尔大学)

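AI summary: Using a conditional uncertainty gap, the paper proves a fundamental trade-off between MSE optimality and marginal realism in multi-step time series forecasting, and shows empirically that a small relaxation in MSE (≤5%) often yields a disproportionate gain in marginal realism (median 17.3%).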

Comments 12 pages, Accepted for KDD 2026 Research track

详情
AI中文摘要

多步时间序列预测(MSF)通常使用均方误差(MSE)等逐点误差指标进行评估,隐含地将条件均值视为充分目标。我们证明,在条件不确定性下,当条件期望在较长预测范围内无法代表典型实现值时,这种做法可能产生误导。我们通过条件不确定性间隙形式化这一效应,并证明只要该间隙非零,任何确定性预测器都无法同时最小化MSE并匹配实现未来的边际分布。这确立了MSF评估中逐点准确性与边际真实性之间根本性的、与模型无关的权衡。利用受控随机动力系统和九个真实世界预测基准,我们经验性地刻画了由此产生的准确性-真实性前沿,并量化了仅基于MSE的模型选择的实际成本。随着条件不确定性随预测范围增加,可达集扩展为明显的帕累托前沿,将MSE最优但分散不足的预测器与牺牲准确性换取真实边际变异性的方法区分开来。在多个基准中,我们发现MSE的小幅放松(≤5%)通常能带来边际真实性的不成比例提升,中位数改进为17.3%,在某些数据集中增益超过30%。我们进一步表明,常见的预测策略系统性地占据该前沿的不同区域:直接多输出预测器集中在准确性最优极端附近,而递归策略和基于样本的推断更倾向于边际真实性。这些结果共同揭示了长期预测中基于MSE评估的结构性失败模式,并将策略和推断选择重新定义为对不可避免的准确性-真实性权衡的导航。

英文摘要

Multi-step time series forecasting (MSF) is commonly evaluated using point-wise error metrics such as mean squared error (MSE), implicitly treating the conditional mean as a sufficient target. We show that this can be misleading under conditional uncertainty, where the conditional expectation becomes unrepresentative of typical realized values at longer horizons. We formalize this effect through a conditional uncertainty gap and prove that whenever this gap is nonzero, no deterministic predictor can simultaneously minimize MSE and match the marginal distribution of realized futures. This establishes a fundamental, model-agnostic trade-off between point accuracy and marginal realism in MSF evaluation. Using controlled stochastic dynamical systems and nine real-world forecasting benchmarks, we empirically characterize the resulting accuracy--realism frontier and \textbf{quantify the practical cost of MSE-only model selection}. As conditional uncertainty increases with forecast horizon, the attainable set expands into a pronounced Pareto front, separating MSE-optimal but under-dispersed predictors from methods that trade accuracy for realistic marginal variability. \textbf{Across benchmarks, we find that small relaxations in MSE ($\boldsymbol{\le 5\%}$) frequently unlock disproportionate gains in marginal realism, with median improvements of $\mathbf{17.3\%}$ and gains exceeding $\mathbf{30\%}$ in some datasets.} We further show that common forecasting strategies systematically occupy different regions of this frontier: direct multi-output predictors concentrate near the accuracy-optimal extreme, while recursive strategies and sample-based inference favor marginal realism. Together, these results expose a structural failure mode of MSE-based evaluation in long-horizon forecasting and recast strategy and inference selection as navigation of an unavoidable accuracy--realism trade-off.
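A toy calculation shows why an MSE-optimal point forecast can be under-dispersed. The sketch assumes a single context in which the future value is -1 or +1 with equal probability; this distribution is an illustrative assumption and is not the paper's definition of the conditional uncertainty gap. All quantities are computed exactly by enumeration rather than by sampling.

```python
# Assumed conditional distribution of the future value: (value, probability).
outcomes = [(-1.0, 0.5), (1.0, 0.5)]


def expected_sq_error_point(pred):
    """Expected squared error of a fixed point forecast."""
    return sum(p * (y - pred) ** 2 for y, p in outcomes)


def expected_sq_error_sampled():
    """Expected squared error when the forecast is an independent draw from the same distribution."""
    return sum(py * pf * (y - f) ** 2 for y, py in outcomes for f, pf in outcomes)


def variance(dist):
    """Variance of a discrete distribution given as (value, probability) pairs."""
    mean = sum(p * v for v, p in dist)
    return sum(p * (v - mean) ** 2 for v, p in dist)


cond_mean = sum(p * y for y, p in outcomes)

print(expected_sq_error_point(cond_mean))  # 1.0  (lowest achievable MSE)
print(variance([(cond_mean, 1.0)]))        # 0.0  (forecast never varies)
print(expected_sq_error_sampled())         # 2.0  (double the MSE)
print(variance(outcomes))                  # 1.0  (matches the realized spread)
```

The conditional mean wins on MSE but predicts a value (0) that never occurs, while the sampled forecast pays twice the MSE to reproduce the realized variability. This is the accuracy-versus-realism tension in its simplest form.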

2605.01910 2026-06-04 cs.LG cs.AI cs.DC 版本更新

Stochastic Sparse Attention for Memory-Bound Inference

随机稀疏注意力用于内存受限推理

Kyle Lee, Corentin Delacour, Kevin Callahan-Coray, Kyle Jiang, Can Yaras, Samet Oymak, Tathagata Srimani, Kerem Y. Camsari

发表机构 * University of California, Santa Barbara(加州大学圣芭芭拉分校) University of Michigan(密歇根大学) Carnegie Mellon University(卡内基梅隆大学)

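AI summary: Proposes SANTA, which reduces value-cache access by sampling a sparse set of indices from the post-softmax distribution, enabling multiplier-free value aggregation during decoding. On Llama-3.1-8B-Instruct it achieves up to 1.5x attention-kernel speedup and up to 1.25x end-to-end decode speedup.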

Comments Code available at https://github.com/OPUSLab/SANTA

详情
Journal ref
ICML 2026
AI中文摘要

自回归解码在长上下文中变得带宽受限,因为生成每个token需要从KV缓存中读取所有$n_k$个键和值向量。我们提出随机加法无乘法注意力(SANTA),一种通过从后softmax分布中采样$S \ll n_k$个索引并仅聚合这些值行来稀疏化值缓存访问的方法。这产生了后softmax值聚合的无偏估计,同时将值阶段的乘加运算替换为收集和加法。我们引入分层和系统采样来设计方差减少、GPU友好的变体。在32k token上下文的Llama-3.1-8B-Instruct上评估,S$^2$ANTA匹配基线准确率,同时在NVIDIA RTX 6000 Ada上相比FlashInfer和FlashDecoding实现高达1.5倍解码步注意力核加速。在批处理长上下文生成中,这些核增益转化为高达1.25倍的端到端解码延迟加速。最后,我们提出伯努利$qK^\mathsf{T}$采样作为补充技术来稀疏化分数阶段,通过随机三元查询减少键特征访问。两种方法对上游量化、低秩投影、KV缓存压缩和KV缓存选择方法互补。它们共同指向稀疏、无乘法和节能的推理。我们在https://github.com/OPUSLab/SANTA.git开源了我们的核。

英文摘要

Autoregressive decoding becomes bandwidth-limited at long contexts, as generating each token requires reading all $n_k$ key and value vectors from the KV cache. We present Stochastic Additive No-mulT Attention (SANTA), a method that sparsifies value-cache access by sampling $S \ll n_k$ indices from the post-softmax distribution and aggregating only those value rows. This yields an unbiased estimator of the post-softmax value aggregation while replacing value-stage multiply-accumulates with gather-and-add. We introduce stratified and systematic sampling to design variance-reduced, GPU-friendly variants. Evaluated on Llama-3.1-8B-Instruct at 32k-token contexts, S$^2$ANTA matches baseline accuracy while achieving up to $1.5\times$ decode-step attention-kernel speedup over FlashInfer and FlashDecoding on an NVIDIA RTX 6000 Ada. In batched long-context generation, these kernel gains translate to up to $1.25\times$ end-to-end decode-latency speedup. Finally, we propose Bernoulli $qK^\mathsf{T}$ sampling as a complementary technique to sparsify the score stage, reducing key-feature access through stochastic ternary queries. Both methods are complementary to upstream quantization, low-rank projection, KV-cache compression, and KV-cache selection methods. Together, they point toward sparse, multiplier-free, and energy-efficient inference. We open-source our kernels at: https://github.com/OPUSLab/SANTA.git
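The core estimator can be sketched in plain Python. This is a minimal illustration of sampling indices from the softmax distribution and averaging the gathered value rows, using independent sampling with replacement; it is not the authors' kernel, it omits the stratified and systematic variants, and all function names are hypothetical. Because index $i$ is drawn with probability $p_i$, the expected sampled row is $\sum_i p_i v_i$, which is the dense attention output.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dense_attention(scores, values):
    """Reference output: probability-weighted sum of all value rows."""
    probs = softmax(scores)
    dim = len(values[0])
    return [sum(p * row[d] for p, row in zip(probs, values)) for d in range(dim)]


def sampled_attention(scores, values, num_samples, rng):
    """Estimate the output by gathering and adding only the sampled value rows."""
    probs = softmax(scores)
    picks = rng.choices(range(len(values)), weights=probs, k=num_samples)
    dim = len(values[0])
    out = [0.0] * dim
    for index in picks:            # gather-and-add: no multiply by the attention weight
        for d in range(dim):
            out[d] += values[index][d]
    return [x / num_samples for x in out]   # one final scale by 1/S


rng = random.Random(0)
scores = [0.1 * i for i in range(8)]
values = [[math.sin(i), math.cos(i)] for i in range(8)]

print(dense_attention(scores, values))
print(sampled_attention(scores, values, 20000, rng))  # close to the dense result
```

In a real decoder the benefit comes from choosing a number of samples far smaller than the context length, so that most value rows are never read. The large sample count here is only to make the agreement with the dense result visible.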

2606.04329 2026-06-04 cs.CR cs.AI 版本更新

From Untrusted Input to Trusted Memory: A Systematic Study of Memory Poisoning Attacks in LLM Agents

从不可信输入到可信内存:LLM智能体中内存投毒攻击的系统研究

Pritam Dash, Tongyu Ge, Aditi Jain, Tanmay Shah, Zhiwei Shang

发表机构 * University of California, Berkeley(加州大学伯克利分校)

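AI summary: A systematic study of memory poisoning attacks on LLM-based agents. It identifies four memory write channels and nine structural vulnerabilities, proposes a taxonomy of six attack classes, and introduces the MPBench evaluation benchmark. Agents that write and retrieve memory more aggressively are more exploitable, and existing prompt injection defenses fail to cover memory poisoning attacks.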

详情
AI中文摘要

内存是AI智能体的核心组件,使其能够在交互中积累知识并提高性能。然而,持久性内存引入了内存投毒的风险,即单个对抗性内存写入可以对智能体行为产生长期影响。我们对基于LLM的智能体中的内存投毒进行了系统研究。我们识别了四种内存写入通道和九种模型能力、系统提示设计以及智能体系统架构中的结构漏洞,这些漏洞使得这些通道可被利用。基于这些漏洞,我们提出了六类内存投毒攻击的分类法。此外,我们设计了MPBench——一个用于评估内存投毒攻击的基准,并表明设计为更积极读写和检索内存的智能体更容易被利用。我们还表明,现有的提示注入防御无法覆盖内存投毒攻击。我们的发现为理解和缓解针对AI智能体的内存投毒攻击提供了基础。

英文摘要

Memory is a core component of AI agents, enabling them to accumulate knowledge across interactions and improve performance. However, persistent memory introduces the risk of memory poisoning, where a single adversarial memory write can exert long-term influence over agent behavior. We present a systematic study of memory poisoning in LLM-based agents. We identify four memory write channels and nine structural vulnerabilities in model capabilities, system prompt design, and agent system architecture that make these channels exploitable. Based on these vulnerabilities, we develop a taxonomy of six classes of memory poisoning attacks. Furthermore, we design MPBench -- a benchmark for evaluating memory poisoning attacks, and show that agents designed to write and retrieve memory more aggressively are more exploitable. We also show that existing prompt injection defenses fail to cover memory poisoning attacks. Our findings provide a foundation for understanding and mitigating memory poisoning attacks against AI agents.

2606.04328 2026-06-04 cs.NI cs.AI 版本更新

Generalizable Multi-Task Learning for Wireless Networks Using Prompt Decision Transformers

基于提示决策变压器的无线网络可泛化多任务学习

Fatih Temiz, Shavbo Salehi, Melike Erol-Kantarci

发表机构 * IEEE University of California, Berkeley(加州大学伯克利分校)

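AI summary: Proposes a PromptDT-based framework that reformulates multi-cell selection as a sequence modeling problem and uses offline trajectories and task-specific prompts for scalable learning across heterogeneous network configurations, improving multi-task QoE by up to 49% and adapting to unseen configurations without retraining.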

Comments Accepted paper at IEEE International Mediterranean Conference on Communications and Networking (MeditCom) 2026

详情
AI中文摘要

未来无线网络需要快速适应高度异构的环境和动态任务配置,这要求从传统的基于规则和优化的无线资源管理(RRM)转向人工智能(AI)驱动的RRM。AI驱动的方法可以学习复杂的非线性关系,泛化到不同的网络条件,并实现实时、可扩展和自主的决策。在RRM技术中,协调多点(CoMP)传输对于减轻小区间干扰和提升小区边缘性能至关重要,从而在密集部署中改善体验质量(QoE)。然而,最优多小区选择仍然是一个复杂的组合挑战,因为它需要在动态流量和信道条件下联合优化许多可能的服务小区组合。尽管取得了成功,但传统的深度强化学习(DRL)方法,如近端策略优化(PPO),在状态和动作空间变化时存在样本效率低、泛化能力有限和重新训练成本高的问题。为了解决这些瓶颈,我们提出了一种基于提示决策变压器(PromptDT)的多任务学习框架,该框架能够跨不同网络配置学习,并将多小区选择重构为序列建模问题。通过利用离线轨迹和任务特定提示,PromptDT实现了跨不同网络配置(包括变化的基站和用户设备数量以及调度策略)的可扩展学习。实验结果表明,与基线相比,PromptDT在多任务设置中将QoE提高了高达49%,且性能随模型容量正向扩展。此外,PromptDT能有效泛化到未见过的任务,实现对新网络配置的鲁棒少样本适应,无需重新训练或微调。

英文摘要

Future wireless networks demand rapid adaptation to highly heterogeneous environments and dynamic task configurations, necessitating a shift from conventional rule-based and optimization-driven radio resource management (RRM) toward artificial intelligence (AI)-driven RRM. AI-driven approaches can learn complex nonlinear relationships, generalize across diverse network conditions and enable real-time, scalable and autonomous decision-making. Among RRM techniques, coordinated multipoint (CoMP) transmission is pivotal for mitigating inter-cell interference and enhancing cell-edge performance, thereby improving quality of experience (QoE) in dense deployments. However, optimal multi-cell selection remains a complex combinatorial challenge as it requires jointly optimizing over many possible serving-cell combinations under dynamic traffic and channel conditions. Despite their success, conventional deep reinforcement learning (DRL) methods such as proximal policy optimization (PPO) suffer from poor sample efficiency, limited generalization, and costly retraining when state and action spaces change. To address these bottlenecks, we propose a Prompt Decision Transformer (PromptDT) based multi-task learning framework capable of learning across diverse network configurations and reformulating multi-cell selection as a sequence modeling problem. By leveraging offline trajectories and task-specific prompts, PromptDT enables scalable learning across diverse network configurations, including varying base stations and user equipment counts, and scheduler policies. Experimental results demonstrate that PromptDT improves QoE by up to 49% in multi-task settings compared to baselines, with performance scaling positively alongside model capacity. Moreover, PromptDT generalizes effectively to unseen tasks, achieving robust few-shot adaptation to new network configurations without retraining or fine-tuning.

2606.04327 2026-06-04 cs.LG cs.AI math.OC 版本更新

A Geometric Characterization of the Stationary Plateau for Two-Layer Neural Networks

两层神经网络平稳高原的几何刻画

Tian Ding, Dawei Li, Ruoyu Sun

发表机构 * Shenzhen International Center of Industrial and Applied Mathematics(深圳工业与应用数学国际中心) Shenzhen Research Institute of Big Data(深圳大数据研究院) Shenzhen Loop Area Institute(深圳环城区域研究所) AutoKernel University of Minnesota Twin Cities(明尼苏达大学双城分校) School of Data Science, The Chinese University of Hong Kong, Shenzhen, China(香港中文大学(深圳)数据科学学院)

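AI summary: By defining an "inner Hessian" matrix, the paper studies the geometry of stationary plateaus in the loss landscape of two-layer neural networks with smooth activations, classifies every stationary point on these plateaus as a local minimum or a saddle, and shows how the splitting coefficients and the definiteness of the inner Hessian jointly determine the plateau's local geometry.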

Comments 47 pages

详情
AI中文摘要

我们研究了光滑激活函数的两层神经网络损失景观中出现的平稳高原的几何结构。我们关注“神经元分裂”现象,其中复制一个隐藏神经元会在更宽的网络中产生一个仿射平稳点集。我们提供了这些高原上所有平稳点的全面分类,确定了它们在何种条件下构成局部极小点或鞍点。我们的刻画依赖于一个我们称之为“内Hessian”矩阵的每个神经元曲率对象。我们的分析表明,内Hessian的定性以及分裂系数的选择共同决定了高原的局部几何。我们证明,分裂一个局部极小点可以产生局部极小和鞍点的混合,或者一个全鞍点的高原,在温和假设下确定了一个具体的必然鞍点区域。相反,分裂一个鞍点总是产生一个鞍点的高原。我们的结果统一并扩展了先前的景观分析,阐明了模型扩展何时以及如何保持或改变平稳点的性质。这些发现为神经网络中宽度扩展和重参数化的影响提供了新的几何见解。

英文摘要

We investigate the geometric structure of stationary plateaus that arise in the loss landscape of two-layer neural networks with smooth activation functions. We focus on the phenomenon of "neuron splitting" where duplicating a hidden neuron yields an affine set of stationary points in a wider network. We provide a comprehensive classification of all stationary points on these plateaus, determining under what conditions they constitute local minima or saddle points. Our characterization hinges on a per-neuron curvature object we term the "inner Hessian" matrix. Our analysis reveals that the definiteness of the inner Hessian and the choice of splitting coefficients jointly dictate the local geometry of the plateau. We show that "splitting" a local minimum can yield either a mixture of local minima and saddles or an all-saddle plateau, with a concrete sure-saddle region identified under mild assumptions. In contrast, splitting a saddle point always produces a plateau of saddle points. Our results unify and extend prior landscape analyses, elucidating when and how model expansion preserves or alters the nature of stationary points. These findings offer new geometric insights into the effects of width expansion and reparameterization in neural networks.
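The neuron-splitting construction can be checked numerically. The sketch below assumes the standard form of splitting: a hidden neuron is duplicated with identical incoming weights, and its outgoing weight $a$ is divided as $\lambda a$ and $(1-\lambda)a$. The network shape, the choice of tanh, and the parameter values are illustrative assumptions; the sketch only verifies that the network function is unchanged along the resulting line, and says nothing about which points on it are minima or saddles.

```python
import math


def forward(x, hidden, out_weights):
    """Two-layer network: sum_j a_j * tanh(w_j . x + b_j)."""
    return sum(
        a * math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + b)
        for (w, b), a in zip(hidden, out_weights)
    )


def split_neuron(hidden, out_weights, k, lam):
    """Duplicate hidden neuron k; split its outgoing weight as lam*a and (1-lam)*a."""
    new_hidden = hidden + [hidden[k]]          # the copy keeps the same (w, b)
    new_out = list(out_weights)
    new_out[k] = lam * out_weights[k]
    new_out.append((1.0 - lam) * out_weights[k])
    return new_hidden, new_out


hidden = [((0.5, -1.0), 0.1), ((1.5, 0.3), -0.2)]   # (weight vector, bias) per neuron
out_weights = [2.0, -0.7]
x = (0.4, 1.2)

base = forward(x, hidden, out_weights)
for lam in (-0.5, 0.3, 1.7):                         # lam may lie outside [0, 1]
    wide_hidden, wide_out = split_neuron(hidden, out_weights, 0, lam)
    print(lam, abs(forward(x, wide_hidden, wide_out) - base) < 1e-9)
```

Since every value of the splitting coefficient gives the same function, the loss is constant along this line in the wider network's parameter space, which is why the construction produces a flat set rather than an isolated point.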

2606.04326 2026-06-04 cs.LG cs.AI 版本更新

Measuring What Matters: Synthetic Benchmarks for Concept Bottleneck Models

衡量重要之事:概念瓶颈模型的合成基准

Julian Skirzynski, Harry Cheon, Shreyas Kadekodi, Meredith Stewart, Berk Ustun

发表机构 * University of California, San Diego(加州大学圣地亚哥分校)

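AI summary: Develops synthetic benchmarks for concept bottleneck models that control properties such as data modality, concept choice, annotation quality, and completeness, in order to evaluate models in decision-support and automation settings and to diagnose failure modes.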

Comments Benchmarks available at https://github.com/ustunb/concept-benchmark

详情
AI中文摘要

概念瓶颈模型从输入中检测到的高级概念预测结果。尽管概念提供了从可解释性中获益的简单方法,但很少有数据集包含概念标签。这限制了研究人员确定哪些问题适合这些模型、隔离驱动其性能或导致失败的因素、或发现哪些算法表现良好的能力。在本文中,我们为概念瓶颈模型开发了合成基准,重点关注其两个主要用例:决策支持(模型帮助人类做出更好的决策)和自动化(模型在无监督下处理常规任务)。我们的基准可以生成带标签的数据集,同时控制影响性能的属性,包括数据模态、概念选择、标注质量和完整性。我们演示了如何使用这些基准评估代表性类别的概念瓶颈模型。我们的演示展示了基准如何诊断失败模式并指导后续测试。

英文摘要

Concept bottleneck models predict outcomes from high-level concepts detected in inputs. Although concepts provide a simple way to reap benefits from interpretability, very few datasets include concept labels. This limits researchers' ability to determine which problems are suitable for these models, isolate the factors that drive their performance or lead to failures, or uncover which algorithms perform well. In this paper, we develop synthetic benchmarks for concept-bottleneck models, focusing on their two main use cases: decision support, in which models assist humans in making better decisions, and automation, in which models handle routine tasks without supervision. Our benchmarks can generate labeled datasets while controlling for properties that affect performance, including data modality, concept choice, annotation quality, and completeness. We demonstrate how the benchmarks can be used to evaluate representative classes of concept bottleneck models. Our demonstrations show how the benchmarks can diagnose failure modes and guide follow-up testing.

2606.04321 2026-06-04 cs.AI 版本更新

The Digital Apprentice: A Framework for Human-Directed Agentic AI Development

数字学徒:面向人类指导的自主AI开发框架

Travis Weber, Rohit Taneja

发表机构 * Pheo Inc(Pheo公司)

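AI summary: Proposes the Digital Apprentice framework, in which three components (methodology capture, authorization, and continuous alignment) let an AI agent earn autonomy step by step under human direction, aiming at agentic systems that scale without sacrificing trust.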

Comments Submitted to ACM AI Leadership Summit 2026, Visionary Papers Track. 5 pages, 2 figures

详情
AI中文摘要

自主AI部署面临一个反复出现的设计张力:重度人类监督限制了规模,而广泛自主则超出问责范围。这两种姿态都无法提供负责任委派所需的治理基础设施。我们提出数字学徒,一个可扩展、安全的AI代理框架,其中自主权是挣得的,而非假设的。数字学徒是一个发展型学习者,内化指导人类的隐性方法论,仅在经验证据证明合理时,才逐步通过每个技能的自主层级。结果是一个随时间变得真正有用,同时保持与特定人类标准一致的代理。三个架构组件使之成为可能。(1) 方法论捕获,将指导专家的隐性方法提炼为结构化资产。(2) 授权,自主升级由明确的人类批准控制。(3) 持续对齐,在运行时纠正漂移,并将每次纠正转化为自有偏好数据。我们将该框架实例化为推理时控制平面。我们对质量框架进行数学建模,并讨论旨在提高质量的策略和技术。我们将该框架应用于开放专业语料库,并展示在流量变化下,捕获数据漂移并在运行时应用不同技术如何恢复降级的质量维度。其意义超越任何单一应用。我们相信,这三个支柱作为一个系统缝合在一起,为能够在不牺牲信任的情况下扩展的自主系统提供了一条更安全、更可行的路径。

英文摘要

Agentic AI deployments face a recurring design tension: heavy human oversight limits scale, while broad autonomy outruns accountability. Neither posture provides the governance infrastructure required for responsible delegation. We present the Digital Apprentice, a framework for scalable, safe AI agency in which autonomy is earned, not assumed. The Digital Apprentice is a developmental learner that internalizes the tacit methodology of a directing human, graduating through per-skill autonomy tiers only when empirical evidence justifies it. The result is an agent that becomes genuinely useful over time while remaining aligned to a specific human's standards. Three architectural components make this possible. (1) Methodology capture, distilling a directing professional's tacit approach into structured assets. (2) Authorization, with autonomy escalation gated by explicit human approval. (3) Continuous alignment, correcting drift at runtime and converting each correction into owned preference data. We instantiate this framework as an inference-time control plane. We mathematically model the quality framework and discuss policies and techniques designed to raise quality. We apply the framework to an open professional corpus, and we show how catching data drift and applying a different technique at runtime recovers degraded quality dimensions under traffic shift. The implication extends beyond any single application. We believe these three pillars, stitched together as a system, form a safer and more viable path to agentic systems that can scale without sacrificing trust.

2606.04320 2026-06-04 cs.LG cs.AI 版本更新

OpenRFM: Dissecting Relational In-Context Learning

OpenRFM:剖析关系型上下文学习

Zhikai Chen, Junyu Yin, Jialiang Gu, Siheng Xiong, Xiaoze Liu, Ruowang Zhang, Keren Zhou, Kai Guo

发表机构 * Michigan State University(密歇根州立大学) Georgia Institute of Technology(佐治亚理工学院) Purdue University(普渡大学) George Mason University(乔治·梅森大学)

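AI summary: Diagnoses the Relational Transformer (RT) from the model side and the data side, then proposes a dual-stage in-context learning architecture and a homophily-aware pre-training mixture. The resulting model, OpenRFM, improves average task performance by approximately 30% over the RT backbone.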

Comments 25 pages, including appendix

详情
AI中文摘要

关系型基础模型(RFM)承诺一个单一的预训练预测器,给定任何关系数据库,通过关系型上下文学习(ICL)在一次前向传播中返回预测。然而,开放RFM与其商业对应物之间存在显著差距,且这一差距的根源尚未被系统理解。我们从两个角度剖析了一个代表性框架——关系型Transformer(RT)。模型方面:我们表明RT执行关系级ICL,而核回归视图显示,当稀疏标签单元覆盖导致欠定回归时,它会失败。数据方面:我们消融了RT的预训练来源,发现仅合成预训练和分布内预训练将相同架构驱动到不同机制(惰性与特征学习)。探究这一差距揭示,缺失的成分是标签生成过程中可识别支持的关系型潜在变量。这两个诊断转化为:(1)一种双阶段ICL架构,将关系型骨干与从预训练表格基础模型提升的批级ICL层相结合,以克服关系级标签稀缺;(2)一种同质性感知的合成加持续真实数据预训练混合,辅以基于原型的正则化。这些选择定义了OpenRFM,一个简单而有效的RFM,在RT骨干上平均任务性能提升约30%,并在大量评估任务上超越了商业模型KumoRFMv1。

英文摘要

Relational Foundation Models (RFMs) promise a single pre-trained predictor that, given any relational database, returns predictions in one forward pass via relational in-context learning (ICL). Yet a substantial gap separates open RFMs from their commercial counterparts, and the origin of this gap has not been systematically understood. We dissect a representative framework, the Relational Transformer (RT), from two perspectives. Model side: we show that RT performs relation-level ICL, and a kernel regression view shows it fails when sparse label-cell coverage yields an underdetermined regression. Data side: we ablate RT's pre-training source and find that existing synthetic-only pre-training and in-distribution pre-training drive the same architecture into different regimes, lazy vs. feature-learning. Probing this gap reveals that the missing ingredient is a support-identifiable relational latent in the label-generation process. These two diagnoses translate into (1) a dual-stage ICL architecture that combines the relational backbone with a batch-level ICL layer lifted from a pre-trained tabular foundation model to overcome relation-level label scarcity, and (2) a homophily-aware synthetic plus continual real-data pre-training mixture, augmented with a prototype-based regularization. These choices define OpenRFM, a simple yet effective RFM that improves average task performance by approximately 30% over the RT backbone and surpasses the commercial model KumoRFMv1 on a large set of evaluation tasks.

2606.04315 2026-06-04 cs.AI 版本更新

Exploring Cross-Scenario Generality of Agentic Memory Systems: Diagnostics and a Strong Baseline

探索智能体记忆系统的跨场景通用性:诊断与强基线

Zhikai Chen, Jialiang Gu, Junyu Yin, Xianxuan Long, Shenglai Zeng, Xiaoze Liu, Kai Guo, Keren Zhou, Jiliang Tang

发表机构 * Michigan State University(密歇根州立大学) George Mason University(乔治·梅森大学) Purdue University(普渡大学)

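AI summary: Evaluates existing memory systems across multiple scenarios and, based on that diagnosis, proposes AutoMEM, an agentic memory harness that self-manages storage through tool calls and achieves the best cross-scenario generality among the systems evaluated.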

Comments 14 pages

详情
AI中文摘要

LLM智能体积累的历史记录会超出其上下文窗口,这推动了关于记忆系统的研究日益增多。然而,大多数现有设计仅针对单一场景(多会话聊天或单轨迹格式)进行调优,几乎没有证据表明它们能够泛化到部署中智能体遇到的异构轨迹。我们重新审视了八个记忆系统以及一个用于搜索问题的智能体框架,在五个场景上进行了评估:单轮问答、多会话聊天、智能体轨迹问答、记忆压力测试和长周期智能体任务。该框架通过工具调用自管理平面文本文件存储,实现了最佳跨任务排名,这表明记忆性能取决于赋予智能体对存储和检索的主动控制,而不是被动地依赖固定流水线后的存储。我们将这一见解实例化为AutoMEM,一个具有自管理工具接口的智能体记忆框架,在我们评估的系统中实现了最佳跨场景通用性。

英文摘要

LLM agents accumulate histories that outgrow their context windows, motivating a growing literature on memory systems. Yet most existing designs are tuned to a single scenario (multi-session chat or a single trajectory format), and there is little evidence that they generalize across the heterogeneous trajectories agents encounter in deployment. We revisit eight memory systems plus an agentic harness for search problems, on five scenarios: single-turn QA, multi-session chat, agentic-trajectory QA, memory stress tests, and long-horizon agentic tasks. The harness, which self-manages flat text-file storage via tool calls, achieves the best cross-task ranking, suggesting that memory performance hinges on giving the agent active control over storage and retrieval rather than on a passive store behind a fixed pipeline. We instantiate this insight in AutoMEM, an agentic memory harness with a self-managed tool interface that achieves the best cross-scenario generality among the systems we evaluate.

2606.04298 2026-06-04 cs.NI cs.AI 版本更新

Anycast Performance in Context

上下文中的任播性能

Eric Liang

发表机构 * Oracle

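AI summary: Compares anycast latency in root DNS and CDNs, proposes an optimization framework that separates resilience-driven from latency-driven objectives, and concludes that operators should not optimize root DNS and CDN anycast with the same objective function.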

详情
AI中文摘要

IP任播允许一个服务从多个物理站点通告一个地址,让BGP将每个客户端映射到一个站点。它是DNS根服务器系统、公共解析器和一些内容分发网络的核心,然而相同的路由机制在不同应用中有着截然不同的后果。本文比较了两种设置中的任播延迟:根DNS(其中递归缓存将根服务器延迟分摊到许多用户和长生存时间值上)和CDN(其中每次额外的往返直接影响页面加载、视频启动或API延迟)。综合发现,根DNS任播可能表现出显著的路径膨胀,但仍产生有限的用户可见延迟,而CDN任播需要主动工程化对等互联、路由策略、吸引范围和测量反馈以保持膨胀较小。本文贡献了一个比较延迟模型、一个可复现的测量设计以及一个将弹性驱动的任播目标与延迟驱动的目标分开的优化框架。核心结论是实用的:运营商不应使用相同的目标函数优化根DNS和CDN任播。对于根DNS,鲁棒性、可达性和缓存行为占主导地位;对于CDN服务,尾部延迟、吸引正确性和策略控制占主导地位。

英文摘要

IP anycast lets a service advertise one address from many physical sites, leaving BGP to map each client to a site. It is central to the DNS root server system, public resolvers, and some content delivery networks, yet the same routing mechanism has very different consequences across applications. This paper compares anycast latency in two settings: root DNS, where recursive caching amortizes root-server delay over many users and long time-to-live values, and CDNs, where each additional round trip can directly affect page-load, video-start, or API latency. The synthesis finds that root DNS anycast can exhibit substantial path inflation while still producing limited user-visible delay, whereas CDN anycast requires active engineering of peering, route policy, catchment scope, and measurement feedback to keep inflation small. The paper contributes a comparative latency model, a reproducible measurement design, and an optimization framework that separates resilience-driven anycast objectives from latency-driven objectives. The central conclusion is practical: operators should not optimize root DNS and CDN anycast with the same objective function. For root DNS, robustness, reachability, and cache behavior dominate; for CDN services, tail latency, catchment correctness, and policy control dominate.
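The contrast between the two settings can be made concrete with a deliberately simple back-of-the-envelope model. It assumes that root DNS delay reaches a user only on a recursive-resolver cache miss, while a CDN request pays the anycast round-trip time on every round trip. The functions, the miss rate, and the round-trip count are hypothetical illustrations, not the comparative latency model from the paper.

```python
def root_dns_user_delay_ms(anycast_rtt_ms: float, cache_miss_rate: float) -> float:
    """Expected added delay per user lookup when caching hides most root queries."""
    return cache_miss_rate * anycast_rtt_ms


def cdn_user_delay_ms(anycast_rtt_ms: float, round_trips: int) -> float:
    """Added delay per request when every round trip traverses the anycast path."""
    return round_trips * anycast_rtt_ms


inflation_ms = 40.0   # hypothetical extra RTT from a suboptimal anycast catchment

root_cost = root_dns_user_delay_ms(inflation_ms, cache_miss_rate=0.01)
cdn_cost = cdn_user_delay_ms(inflation_ms, round_trips=3)

print(f"root DNS: {root_cost:.1f} ms per lookup on average")
print(f"CDN:      {cdn_cost:.1f} ms per request")
```

Under these assumed numbers the same 40 ms of path inflation costs a fraction of a millisecond per root DNS lookup but over a hundred milliseconds per CDN request. This matches the abstract's point that the two services should not share an objective function.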

2606.04296 2026-06-04 cs.AI 版本更新

The Saturation Trap and the Subjectivity of Intervention Timing: Why Affect-Based Triggers and LLM Judges Fail to Time Interventions on Autonomous Agents

饱和陷阱与干预时机的主观性:为什么基于情感的触发器和LLM评判者无法在自主智能体上把握干预时机

Manvendra Modgil

发表机构 * manvendramodgil.ai

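AI summary: Uses HEART, an 18-dimensional affective-dynamics engine, as a diagnostic probe for the intervention timing problem in autonomous agents. It finds a state saturation trap, a capability-and-context floor for LLM judges, and very low agreement among human annotators on when to intervene, indicating that intervention timing is a low-reliability construct.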

Comments 11 pages, 5 tables. Code and data:https://github.com/2025eb1100268-tech/intervention-timing-saturation-trap

详情
AI中文摘要

随着自主AI智能体从对话系统转向长周期软件执行,决定何时中断智能体的运行时安全层变得至关重要。我们使用一个连续的18维情感动力学引擎(HEART)作为诊断探针,研究了这一时机问题,评估了四种干预触发家族——绝对状态阈值、复合状态-动作模式、正则推理特征提取和零样本LLM作为评判者——针对SWE-bench-Verified调试轨迹上人工标注的干预点。我们报告了三个发现。首先,状态饱和陷阱:智能体在持续困难下没有恢复信号,因此建模的挫折感迅速越过阈值并保持最大值,将基于状态阈值的触发器从时刻检测器转变为近乎恒定的指示器,在五个轨迹中触发39-83%的动作。其次,LLM评判者的能力和上下文底线:小模型(gpt-5.4-mini)从不触发,而前沿和跨供应商模型只有在完整轨迹上下文下才能逃脱零触发底线,即使如此,F1值也仅为0.17-0.40,成本高达90倍。第三,最重要的是,监督目标在人类之间不可复现:三名训练有素的标注者使用同一评分标准对一条56动作轨迹进行标注,在干预位置上的一致性仅略高于偶然(位置Krippendorff's alpha = +0.047;最佳成对Cohen's kappa = +0.349),而在干预类型上完全不一致(暂停退化;澄清低于偶然;仅反思alpha = +0.226)。我们得出结论,干预时机是一个低可靠性构念,使得单标注者F1不适合作为优化目标。我们的贡献是跨人类评分者间信度、四种检测器架构、跨模型LLM评判者扫描以及复现的饱和效应,共同绘制了这一问题图谱,而非任何单一检测器的准确性。

英文摘要

As autonomous AI agents move from conversational systems to long-horizon software execution, runtime safety layers that decide when to interrupt an agent have become essential. We study this timing problem using a continuous 18-dimensional affective-dynamics engine (HEART) as a diagnostic probe, evaluating four intervention trigger families - absolute state thresholds, composite state-action patterns, regex reasoning-feature extraction, and zero-shot LLM-as-judge - against human-annotated intervention points on SWE-bench-Verified debugging traces. We report three findings. First, a State Saturation Trap: agents show no recovery signal under sustained difficulty, so modeled frustration quickly crosses the threshold and stays at its maximum, converting threshold-on-state triggers from moment detectors into near-constant indicators that fire on 39-83% of actions across five trajectories. Second, a capability-and-context floor for LLM judges: a small model (gpt-5.4-mini) never fires, while frontier and cross-vendor models escape the zero-firing floor only with full-trajectory context, and even then reach only F1 0.17-0.40 at up to 90x the cost. Third, and most importantly, the supervised target is not reproducible among humans: three trained annotators using one rubric on a 56-action trajectory agree on where to intervene only slightly above chance (location Krippendorff's alpha = +0.047; best pairwise Cohen's kappa = +0.349) and not at all on intervention type (pause degenerate; clarify below chance; reflect only alpha = +0.226). We conclude that intervention timing is a low-reliability construct, making single-annotator F1 an unsuitable optimization target. Our contribution is the joint mapping of this problem across human inter-rater reliability, four detector architectures, a cross-model LLM-judge sweep, and a reproduced saturation effect, rather than any single detector's accuracy.
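The pairwise agreement statistic cited in the abstract, Cohen's kappa, compares observed agreement with the agreement expected by chance from each annotator's label frequencies. The sketch below implements the standard two-rater formula; the per-action labels are invented for illustration and are not the paper's annotations.

```python
import math
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability both raters pick the same label independently.
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        return math.nan   # undefined when both raters use one identical label throughout
    return (observed - expected) / (1.0 - expected)


# 1 = "intervene at this action", 0 = "do not intervene" (hypothetical labels).
annotator_1 = [1, 0, 0, 1, 0, 0, 0, 0, 1, 0]
annotator_2 = [1, 0, 0, 0, 0, 1, 0, 0, 1, 0]

print(round(cohens_kappa(annotator_1, annotator_2), 3))  # 0.524
```

Here the annotators agree on 8 of 10 actions, yet kappa is only about 0.52 because most actions are labelled 0 by both and so much of that agreement is expected by chance. With rare intervention points, raw agreement can look high while chance-corrected agreement stays low.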

2606.04287 2026-06-04 cs.LG cs.AI 版本更新

Scaling Novel Graph Generation via Lightweight Structure-Guided Autoregressive Models

通过轻量级结构引导自回归模型扩展新颖图生成

Alessio Barboni, Massimiliano Lupo Pasini, Bishal Lakha, Edoardo Serra

发表机构 * Boise State University(博伊西州立大学) Oak Ridge National Laboratory(橡树岭国家实验室)

AI总结 提出一种轻量级自回归框架,利用结构引导拓扑排序和两阶段训练策略,在分子和非分子基准上实现高新颖性、有效性和唯一性的图生成。

详情
AI中文摘要

生成真实且多样的图是机器学习中的一个关键问题,在分子发现、电路设计、网络安全等领域有应用。然而,当前的图生成模型在可扩展性和新颖性方面仍存在局限。基于扩散的方法通常需要昂贵的全邻接操作和长去噪链,而许多自回归和混合模型至少具有二次复杂度。此外,这些模型往往模仿训练图而非泛化到新图。我们提出一个轻量级自回归框架来解决这些问题。它使用结构引导的拓扑排序将图序列化为规则的边序列,实现近对数线性生成,以及一种两阶段训练策略,结合探索导向的增强和迭代细化,以减少过拟合并促进受控的新颖性。在分子和非分子基准上的实验表明,我们的方法在保持高有效性和唯一性的同时提高了新颖性。该框架还支持LSTM和Mamba风格的因果序列骨干,大内存加速器使得能够进行超出典型GPU限制的更长的图序列实验。

英文摘要

Generating realistic and diverse graphs is a key problem in machine learning, with applications in molecular discovery, circuit design, cybersecurity, and beyond. However, current graph generative models remain limited by scalability and novelty. Diffusion-based methods often require costly full-adjacency operations and long denoising chains, while many autoregressive and hybrid models have at least quadratic complexity. In addition, these models often imitate training graphs rather than generalize beyond them. We propose a lightweight autoregressive framework to address these issues. It uses a structure-guided topological ordering to serialize graphs into regular edge sequences, enabling near log-linear generation, and a two-phase training strategy that combines exploration-oriented augmentation with iterative refinement to reduce overfitting and promote controlled novelty. Experiments on molecular and non-molecular benchmarks show that our approach improves novelty while preserving high validity and uniqueness. The framework also supports both LSTM and Mamba-style causal sequence backbones, with large-memory accelerators enabling longer graph-sequence experiments beyond typical GPU limits.

2606.04284 2026-06-04 cs.LG cs.AI cs.CL 版本更新

Sparse Mixture-of-Experts Reward Models Learn Interpretable and Specialized Experts for Personalized Preference Modeling

稀疏混合专家奖励模型学习可解释且专业化的专家用于个性化偏好建模

Yifan Wang, Jinyi Mu, Mayank Jobanputra, Yu Wang, Ji-Ung Lee, Soyoung Oh, Isabel Valera, Vera Demberg

发表机构 * Saarland University(萨尔兰大学) Independent Researcher(独立研究者) Bielefeld University(比勒菲尔德大学) Max Planck Institute for Software Systems(马克斯·普朗克软件系统研究所) Max Planck Institute for Informatics(马克斯·普朗克信息研究所)

AI总结 提出稀疏混合专家奖励模型,通过稀疏路由和专家多样性训练,从二元偏好数据中学习可解释的专家模式,提升个性化偏好建模的测试时适应性和可解释性。

详情
AI中文摘要

偏好建模在基于人类反馈的强化学习(RLHF)中扮演核心角色,使大型语言模型(LLMs)与人类价值观对齐。然而,大多数现有方法假设一个通用的奖励函数,忽视了人类偏好的多样性和异质性。为了在不增加额外标注成本的情况下解决这一限制,最近的工作提出从二元数据中学习多个偏好组件,并组合它们以建模个体偏好。然而,这些组件往往无法捕捉连贯且解耦的模式,限制了其可解释性和个性化效果。在这项工作中,我们提出了一种稀疏混合专家(MoE)奖励模型,该模型在二元偏好数据训练过程中鼓励稀疏路由和专家多样性。在受控和真实世界的实验中,稀疏MoE学习了可解释的路由模式和专业化的专家。它还改进了测试时的个性化,并且适应后的专家权重变化为分析模型如何适应个性化偏好提供了定性视角。

英文摘要

Preference modeling plays a central role in reinforcement learning from human feedback (RLHF), enabling large language models (LLMs) to align with human values. However, most existing approaches assume a universal reward function, neglecting the diversity and heterogeneity of human preferences. To address this limitation without additional annotation costs, recent work has proposed learning multiple preference components from binary data and combining them to model individual preferences. Nevertheless, these components often fail to capture coherent and disentangled patterns, limiting their interpretability and effectiveness for personalization. In this work, we propose a sparse Mixture-of-Experts (MoE) reward model that encourages sparse routing and expert diversity during training on binary preference data. Across controlled and real-world experiments, sparse MoE learns interpretable routing patterns and specialized experts. It also improves test-time personalization, and post-adaptation shifts in expert weights provide a qualitative lens for analyzing how the model adapts to personalized preferences.
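The abstract does not say how sparse routing is implemented. One common way to obtain it is top-k gating, sketched below under that assumption: only the k experts with the largest gate logits receive non-zero weight, and the final reward is their weighted sum. The names `sparse_route` and `moe_reward` and all numbers are hypothetical.

```python
import math


def sparse_route(gate_logits, k=2):
    """Return {expert_index: weight} for the k largest gate logits.

    Weights are a softmax restricted to the selected experts,
    so they are positive and sum to 1.
    """
    order = sorted(range(len(gate_logits)),
                   key=lambda i: gate_logits[i], reverse=True)
    top = order[:k]
    peak = max(gate_logits[i] for i in top)  # subtract for numerical stability
    exps = {i: math.exp(gate_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: exps[i] / total for i in top}


def moe_reward(expert_rewards, gate_logits, k=2):
    """Combine per-expert scalar rewards using the sparse routing weights."""
    weights = sparse_route(gate_logits, k)
    return sum(w * expert_rewards[i] for i, w in weights.items())


# Toy example: four experts, only experts 1 and 2 are selected.
weights = sparse_route([0.0, 2.0, 2.0, -1.0], k=2)
reward = moe_reward([10.0, 1.0, 3.0, 100.0], [0.0, 2.0, 2.0, -1.0], k=2)
```

Because unselected experts get exactly zero weight, the routing pattern for a given input can be read directly off the selected indices, which is one reason sparse gating tends to be easier to interpret than dense mixing.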

2606.04280 2026-06-04 cs.LG cs.AI cs.IR 版本更新

The Loss Is Not Enough: Sampling Conditions and Inductive Bias in Contrastive Representation Learning

损失还不够:对比表示学习中的采样条件和归纳偏置

Justinas Zaliaduonis, Patrick Putzky, Till Richter, Sergios Gatidis

发表机构 * ETH Zürich(苏黎世联邦理工学院)

AI总结 本文通过测度论框架形式化对比学习中的多样性条件,提出支持校正的InfoNCE变体,并实验验证了采样多样性与编码器归纳偏置的相互作用。

详情
AI中文摘要

对比学习已成为自监督表示学习的主要范式,但其恢复有意义潜在几何的条件尚未完全理解。我们开发了一个测度论框架,形式化了多样性条件,即正对采样的支持要求,这是等距潜在恢复所必需的。我们表明,标准的全支持von Mises-Fisher设置意味着满足多样性条件,因此全局对比损失最小化器可以恢复潜在几何(直到正交变换),而受限条件分布可以使非正交映射达到严格更低的渐近对比损失。我们引入了一种支持校正的信息噪声对比估计(InfoNCE)变体作为理论修复:这种校正使得正交潜在空间恢复成为可能,但并不能唯一选择它。在合成基准上的实验验证了可识别性预测,CIFAR-10实验与定性预测一致,即当采样多样性有限时,架构归纳偏置变得更加重要。总之,我们的结果阐明了采样机制和编码器归纳偏置在对比表示学习中的相互作用。

英文摘要

Contrastive learning has become a leading paradigm for self-supervised representation learning, yet the conditions under which it recovers meaningful latent geometry remain incompletely understood. We develop a measure-theoretic framework formalizing the diversity condition, a support requirement on positive-pair sampling that is necessary for isometric latent recovery. We show that the standard full-support von Mises-Fisher setting implies the satisfaction of the diversity condition and as a consequence global contrastive loss minimizers recover latent geometry up to orthogonal transformation, while restricted conditionals can make non-orthogonal maps attain strictly lower asymptotic contrastive loss. We introduce a support-corrected Information Noise Contrastive Estimation (InfoNCE) variant as a theoretical fix: this correction makes orthogonal latent space recovery achievable but does not uniquely select it. Experiments on synthetic benchmarks validate the identifiability predictions, and CIFAR-10 experiments are consistent with the qualitative prediction that architectural inductive bias becomes more important when sampling diversity is limited. Together, our results clarify how sampling mechanisms and encoder inductive bias interact in contrastive representation learning.
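For reference, the standard InfoNCE loss for a single anchor can be sketched as follows. This is the uncorrected textbook form with dot-product similarity and a temperature. The support-corrected variant proposed in the paper is not specified in the abstract and is not reproduced here. The function name and the toy vectors are illustrative.

```python
import math


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Standard InfoNCE loss for one anchor embedding.

    loss = -log( exp(s_pos / t) / sum_j exp(s_j / t) ),
    where s_j ranges over the positive and all negatives.
    """
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the maximum subtracted for numerical stability.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_norm - logits[0]


# Toy example on the unit circle: the aligned positive gives a lower loss
# than a positive that is orthogonal to the anchor.
anchor = [1.0, 0.0]
loss_aligned = info_nce(anchor, [1.0, 0.0], [[0.0, 1.0]], temperature=1.0)
loss_misaligned = info_nce(anchor, [0.0, 1.0], [[1.0, 0.0]], temperature=1.0)
```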

2606.04275 2026-06-04 cs.LG cs.AI 版本更新

From Ticks to Flows: Dynamics of Neural Reinforcement Learning in Continuous Environments

从时钟节拍到流:连续环境中神经强化学习的动力学

Saket Tiwari, Tejas Kotwal, George Konidaris

发表机构 * Brown University(布朗大学)

AI总结 本文通过将深度强化学习建模为连续时间随机过程,利用随机控制理论,首次推导了连续环境下过参数化神经演员-评论家算法在无限宽度极限下的状态分布演化方程。

Comments Presented at ICLR 2026: https://openreview.net/forum?id=TdiRLe3rPA

详情
AI中文摘要

我们针对连续环境中的深度强化学习(RL)提出了一个新颖的理论框架,借鉴随机控制的思想,将问题建模为连续时间随机过程。在先前工作的基础上,我们引入了一个可行的演员-评论家算法模型,该模型同时包含探索和随机转移。对于单隐藏层神经网络,我们表明环境状态可以表述为一个双时间尺度过程:环境时间和梯度时间。在此表述下,我们刻画了表示环境状态和累积折扣回报估计的时间相关随机变量,如何在两层网络的无限宽度极限下随梯度步演化。利用随机微分方程理论,我们首次在连续RL中推导出一个方程,描述了在趋于零的学习率下,每个梯度步上状态分布的无穷小变化。总体而言,我们的工作为研究过参数化神经演员-评论家算法提供了一种新颖的非参数表述。我们通过一个简单的连续控制任务实证验证了我们的理论结果。

英文摘要

We present a novel theoretical framework for deep reinforcement learning (RL) in continuous environments by modeling the problem as a continuous-time stochastic process, drawing on insights from stochastic control. Building on previous work, we introduce a viable model of an actor-critic algorithm that incorporates both exploration and stochastic transitions. For single-hidden-layer neural networks, we show that the state of the environment can be formulated as a two-time-scale process: the environment time and the gradient time. Within this formulation, we characterize how the time-dependent random variables that represent the environment's state and the estimate of the cumulative discounted return evolve over gradient steps in the infinite-width limit of two-layer networks. Using the theory of stochastic differential equations, we derive, for the first time in continuous RL, an equation describing the infinitesimal change in the state distribution at each gradient step, under a vanishingly small learning rate. Overall, our work provides a novel nonparametric formulation for studying overparametrized neural actor-critic algorithms. We empirically corroborate our theoretical result using a toy continuous control task.

2606.04273 2026-06-04 cs.AI 版本更新

Characterizing initial human-AI proof formalization workflows

表征初始人机交互的证明形式化工作流

Katherine M. Collins, Simon Frieder, Jonas Bayer, Jacob Loader, Jeck Lim, Peiyang Song, Fabian Zaiser, Lexin Zhou, Shanda Li, Sam Looi, Joshua B. Tenenbaum, Umang Bhatt, Adrian Weller, Jose Hernandez-Orallo, Cameron E. Freer, Valerie Chen, Ilia Sucholutsky

发表机构 * Massachusetts Institute of Technology(麻省理工学院) University of Cambridge(剑桥大学) Princeton University(普林斯顿大学) University of Oxford(牛津大学) Caltech(加州理工学院) Carnegie Mellon University(卡内基梅隆大学) Universitat Politècnica de València(瓦伦西亚理工大学) New York University(纽约大学)

AI总结 通过混合方法分析,研究人们在形式化证明过程中对AI工具的需求、障碍及实际使用模式,发现AI辅助能提高形式化准确率且用户偏好多样但普遍希望保持人类对证明发现过程的高层控制。

详情
AI中文摘要

几个世纪以来,人类数学家通过书写证明来支撑其数学论证;然而,自动验证证明有效性的能力长期以来一直是一个挑战。AI系统在生成代码和进行日益高级的数学推理方面的进步,有望改变人们形式化并进而验证证明的能力。虽然许多工作聚焦于对当前前沿进行基准测试,但我们转而研究人们如何使用这些工具。我们采用混合方法分析,研究AI对人们形式化工作流的初始影响:人们声称想要什么,他们认为这些愿景的障碍是什么,以及他们在实践中如何实际使用和适应AI。一项定性调查显示,人们的偏好是多样化的,但普遍希望AI辅助形式化,同时保留人类对证明发现过程的高层控制。为了评估在这种限制下人们如何实际使用AI进行形式化,我们进行了一项受控用户研究,参与者形式化非正式的数学问题及其证明,在有和没有AI的情况下,涉及不同难度和领域的多种数学问题。尽管当时用于自动形式化的工具有限,但参与者在使用AI工具时往往比单独形式化时获得更高的形式化准确率,大多数参与者灵活选择使用多种不同的AI工具。综合来看,我们的工作揭示了AI融入形式化工作流的早期阶段,涉及人类与AI参与的密切互动。

英文摘要

For centuries, human mathematicians have written proofs to substantiate their mathematical arguments; yet, the ability to automatically verify the validity of proofs has long been a challenge. Advances in AI systems' ability to generate code and engage in increasingly high-level mathematical reasoning promise to transform people's ability to formalize and thereby verify proofs. While many works focus on benchmarking the current frontier, we instead study how people use these tools. We conduct a mixed-methods analysis into the initial impact of AI on people's formalization workflows: what people claim they want, what they see as the barriers to those visions, and how they actually use and adapt AI in practice. A qualitative survey shows that people's preferences are diverse, but with a general desire for AI assistance in formalization that preserves high-level human control over the proof discovery process. To assess how people actually engage with AI for formalization under such limitations, we conduct a controlled user study in which participants formalize informal math problems and their proofs, with and without AI, across a range of mathematical problems at varying levels of difficulty and domains. Despite limitations of the tools at the time for autoformalization, participants tend to attain higher formalization accuracy when allowed access to AI tools than when formalizing on their own, with most participants flexibly choosing to use multiple different AI tools. Taken together, our work sheds light on the early stages of AI integration into formalization workflows, involving an intimate interplay of human and AI engagement.

2606.04271 2026-06-04 cs.CV cs.AI 版本更新

StandardE2E: A Unified Framework for End-to-End Autonomous Driving Datasets

StandardE2E:端到端自动驾驶数据集的统一框架

Stepan Konev

发表机构 * University of Cambridge(剑桥大学)

AI总结 提出StandardE2E框架,通过统一数据模式、多数据集联合加载和简化新数据集添加流程,解决端到端自动驾驶数据集格式不兼容问题。

详情
AI中文摘要

自动驾驶已从模块化的感知-预测-规划堆栈转向端到端(E2E)模型,这些模型直接将传感器输入映射到车辆控制,通常通过辅助任务(如3D检测、运动预测和高清地图感知)进行正则化。进展由快速增长的传感器丰富驾驶数据集生态系统驱动,但每个数据集都有自己的文件格式、API、坐标约定和模态覆盖范围,导致跨数据集实验甚至基本的每个数据集预处理都需要为每个项目重新实现。我们提出StandardE2E,一个为E2E驾驶数据集提供统一接口的框架。StandardE2E (i) 在共享数据模式下标准化每个数据集的预处理;(ii) 在单个PyTorch DataLoader中组合多个数据集,用于跨数据集预训练、辅助任务监督和场景级过滤;(iii) 将添加新数据集简化为从原始帧到规范模式的单个数据集映射,而整个下游流程保持不变。该框架开箱即支持六个数据集:Waymo End-to-End、Waymo Perception、Argoverse 2 Sensor、Argoverse 2 LiDAR、NAVSIM (OpenScene-v1.1) 和 WayveScenes101,并作为开源标准e2e Python包发布,可在 https://github.com/stepankonev/StandardE2E 获取。

英文摘要

Autonomous driving has shifted from modular perception-prediction-planning stacks toward end-to-end (E2E) models that map sensor inputs directly to vehicle control, often regularized by auxiliary tasks such as 3D detection, motion forecasting, and HD-map perception. Progress is driven by a fast-growing ecosystem of sensor-rich driving datasets, yet each ships its own file formats, APIs, coordinate conventions, and modality coverage, leaving cross-dataset experimentation and even basic per-dataset preprocessing to be re-implemented per project. We present StandardE2E, a framework that provides a single unified interface over E2E driving datasets. StandardE2E (i) standardizes per-dataset preprocessing under one shared data schema; (ii) combines multiple datasets in a single PyTorch DataLoader for cross-dataset pretraining, auxiliary-task supervision, and scenario-level filtering; and (iii) reduces adding a new dataset to a single per-dataset mapping from raw frames to the canonical schema, leaving the entire downstream pipeline unchanged. The framework supports six datasets out of the box: Waymo End-to-End, Waymo Perception, Argoverse 2 Sensor, Argoverse 2 LiDAR, NAVSIM (OpenScene-v1.1), and WayveScenes101, and is released as the open-source standard-e2e Python package, available at https://github.com/stepankonev/StandardE2E.

2606.04269 2026-06-04 cs.RO cs.AI cs.CV 版本更新

Instant-Fold: In-Context Imitation Learning for Deformable Object Manipulation

Instant-Fold: 可变形物体操作的情境模仿学习

Yilong Wang, Cheng Qian, Edward Johns

发表机构 * The Robot Learning Lab(机器人学习实验室) Imperial College London(伦敦帝国学院)

AI总结 提出Instant-Fold框架,通过单次人类演示的情境模仿学习,无需梯度更新即可推断并执行多种可变形物体操作模式,在仿真训练后零样本迁移到真实世界。

详情
AI中文摘要

可变形物体操作(DOM)具有挑战性,因为其状态是高维、部分可观测的,并且通过长时间跨度、拓扑变化的交互演变,涉及多种有效的操作模式。我们引入了Instant-Fold,一个用于DOM的情境模仿学习框架。给定单次人类演示,我们的策略直接从演示中推断并执行多种操作模式,包括空间执行和顺序的变化,无需梯度更新。我们的方法首先通过时间对比预训练学习变形感知的视觉表示,然后基于演示的条件流匹配变换器策略预测执行预期操作模式的动作。完全在仿真中训练的Instant-Fold能够泛化到多种折叠模式,并零样本迁移到真实世界环境,无需额外的数据收集或微调。视频可在https://instant-fold.github.io获取。

英文摘要

Deformable object manipulation (DOM) is challenging due to high-dimensional, partially observable states that evolve through long-horizon, topology-changing interactions with multiple valid manipulation modes. We introduce Instant-Fold, an in-context imitation learning framework for DOM. Given a single human demonstration, our policy infers and executes diverse manipulation modes directly from the demonstration, including variations in spatial execution and ordering, without requiring gradient updates. Our approach first learns deformation-aware visual representations via temporal contrastive pretraining, after which a flow-matching transformer policy conditioned on the demonstration predicts actions to execute the intended manipulation mode. Trained entirely in simulation, Instant-Fold generalizes across diverse folding modes and transfers zero-shot to real-world settings without additional data collection or finetuning. Videos are available at https://instant-fold.github.io.

2606.04262 2026-06-04 cs.CL cs.AI 版本更新

Can I Take Another Dose? Evaluating LLM Decision-Making Under Temporal Uncertainty in OTC Dosing QA

我可以再服一剂吗?评估LLM在OTC剂量问答中时间不确定性下的决策能力

Maroof Kousar, Yibo Hu

发表机构 * Illinois Institute of Technology(伊利诺伊理工学院)

AI总结 提出DOSEBENCH基准测试,评估大语言模型在非处方药剂量问答中处理时间推理、约束遵循和不确定性的能力。

Comments 16 pages, 7 figures

详情
AI中文摘要

大型语言模型(LLM)越来越多地被用于日常健康问题,包括用户是否可以安全地再服用一剂非处方(OTC)药物。然而,这一常见的安全相关场景在现有的医学问答评估中仍未得到充分探索,其中正确答案需要跟踪剂量时间、计算滚动24小时摄入量、遵循产品标签约束以及处理不完整的用药史。我们引入了DOSEBENCH,这是一个包含81个精心策划的OTC剂量场景的聚焦基准测试,专注于成人对乙酰氨基酚和布洛芬的使用,并带有手动标注的金标准参考。我们使用决策正确性、一致性、解释可验证性、失败类型和置信度相关信号等指标,在多次运行中评估了四个LLM,共获得1620个模型响应。我们的结果表明,模型在滚动窗口推理和模糊敏感场景中经常遇到困难,且稳定或看似自信的响应仍可能违反剂量约束。这些发现表明,OTC剂量问答为评估医学问答中的时间推理、约束遵循和安全相关不确定性处理提供了一个狭窄但实用的测试平台。

英文摘要

Large language models (LLMs) are increasingly used for everyday health questions, including whether a user can safely take another dose of an over-the-counter (OTC) medication. Yet this common safety-relevant setting remains underexplored in existing medical QA evaluations, where correct answers require tracking dose timing, computing rolling 24-hour intake, following product-label constraints, and handling incomplete medication histories. We introduce DOSEBENCH, a focused benchmark of 81 curated OTC dosing scenarios focused on adult acetaminophen and ibuprofen use, with manually annotated gold references. We evaluate four LLMs across repeated runs using metrics for decision correctness, consistency, explanation verifiability, failure types, and confidence-related signals, resulting in 1,620 model responses. Our results show that models frequently struggle with rolling-window reasoning and ambiguity-sensitive cases and that stable or confident-looking responses can still violate dosing constraints. These findings suggest that OTC dosing QA provides a narrow yet practical testbed for evaluating temporal reasoning, constraint following, and safety-relevant uncertainty handling in medical QA.
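The rolling 24-hour check that the benchmark targets can be sketched as below. This is only an illustration of the window arithmetic. The function names, the dose log, and the limits passed in (`max_daily_mg`, `min_interval_hours`) are placeholder assumptions, not values from the paper or from any product label, and the sketch does not handle incomplete medication histories.

```python
from datetime import datetime, timedelta


def rolling_intake_mg(doses, now, window_hours=24):
    """Total milligrams taken in the window (now - window_hours, now].

    doses: list of (taken_at: datetime, mg: float) tuples.
    """
    window_start = now - timedelta(hours=window_hours)
    return sum(mg for taken_at, mg in doses if window_start < taken_at <= now)


def can_take_dose(doses, now, next_dose_mg, max_daily_mg, min_interval_hours):
    """Check the minimum dosing interval and the rolling 24-hour limit."""
    past = [taken_at for taken_at, _ in doses if taken_at <= now]
    if past and now - max(past) < timedelta(hours=min_interval_hours):
        return False  # too soon after the most recent dose
    return rolling_intake_mg(doses, now) + next_dose_mg <= max_daily_mg


# Hypothetical dose log; the limits below are placeholders, not guidance.
log = [
    (datetime(2026, 6, 3, 19, 0), 500),   # falls outside the window at 20:00
    (datetime(2026, 6, 4, 8, 0), 500),
    (datetime(2026, 6, 4, 14, 0), 500),
]
now = datetime(2026, 6, 4, 20, 0)
taken = rolling_intake_mg(log, now)
allowed = can_take_dose(log, now, 500, max_daily_mg=1500, min_interval_hours=6)
```

The key point is that the window slides with the current time rather than resetting at midnight, which is the distinction the abstract reports models often get wrong.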

2606.04261 2026-06-04 cs.AI cs.CL cs.CV cs.ET cs.LG 版本更新

Can Generalist Agents Automate Data Curation?

通用智能体能否自动化数据筛选?

Feiyang Kang, Hanze Li, Adam Nguyen, Mahavir Dabas, Jiaqi W. Ma, Frederic Sala, Dawn Song, Ruoxi Jia

发表机构 * Virginia Tech(弗吉尼亚理工大学) University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) University of Wisconsin-Madison(威斯康星大学麦迪逊分校) University of California, Berkeley(加州大学伯克利分校)

AI总结 本文提出Curation-Bench基准,通过通用编码智能体自动化数据筛选循环,实验表明现成智能体可达到强基线,但存在执行-研究差距,而结构化方法引导的智能体能在十分之一数据预算下自主组合出优于强基线的数据选择策略。

Comments Preprint

详情
AI中文摘要

训练数据的筛选是现代AI开发中最重要但劳动密集的部分之一:实践者根据嘈杂的基准反馈迭代地提出、实施、评估和修订数据策略。我们探究通用编码智能体能否自动化这一数据筛选循环。我们引入了*Curation-Bench*,一个以智能体为中心的基准,它固定模型、训练配方和评估套件,同时赋予智能体命令行权限以检查数据、实施策略、提交到固定的训练/评估流水线并进行修订。在视觉-语言指令微调实例中,现成智能体在十次迭代内达到了已发表的强数据选择基线。然而,轨迹分析揭示了持续存在的*执行-研究差距*:即使提供了策略指南和论文参考,智能体也主要调整局部策略变体,而非探索新的策略家族。要求每次迭代引用、实例化并改编先前方法的脚手架(scaffold)使智能体转向方法引导的探索。配备脚手架的智能体在没有人工设计输入的情况下,自主组合出一种数据选择策略,仅用已发表强基线十分之一的数据预算即可超越它们。总体而言,当前智能体可以运行筛选循环,但可靠的数据研究需要带脚手架的方法适配,而非仅靠开放式提示。代码和基准已开源。

英文摘要

Curating training data is among the most consequential yet labor-intensive parts of modern AI development: practitioners iteratively propose, implement, evaluate, and revise data policies against noisy benchmark feedback. We ask whether generalist coding agents can automate this data-curation loop. We introduce *Curation-Bench*, an agent-centric benchmark that fixes the model, training recipe, and evaluation suite while giving agents command-line access to inspect data, implement policies, submit them to a fixed training/evaluation pipeline, and revise. In a vision-language instruction-tuning instantiation, out-of-the-box agents reach strong published data-selection baselines within ten iterations. However, trajectory analysis reveals a persistent *execution-research gap*: agents mainly tune local policy variants rather than explore new policy families, even when given strategy guides and paper references. Scaffolds requiring each iteration to cite, instantiate, and adapt a prior method shift agents toward method-guided exploration. The scaffolded agent autonomously composes -- without human design input -- a data-selection policy that outperforms strong published baselines at one-tenth their data budget. Overall, current agents can run the curation loop, but reliable data research requires scaffolded method adaptation, not open-ended prompting alone. Code and benchmark are open-sourced.

2606.04246 2026-06-04 cs.AI cs.AR cs.CL 版本更新

StepPRM-RTL: Stepwise Process-Reward Guided LLM Fine-Tuning for Enhanced RTL Synthesis

StepPRM-RTL:基于逐步过程奖励引导的LLM微调以增强RTL综合

Prashanth Vijayaraghavan, Apoorva Nitsure, Luyao Shi, Ehsan Degan, Vandana Mukherjee

发表机构 * IBM Research San Jose CA USA(IBM研究院圣何塞加州美国)

AI总结 提出StepPRM-RTL框架,结合逐步轨迹建模、过程奖励模型和检索增强微调,通过密集反馈和蒙特卡洛树搜索探索推理路径,提升LLM生成RTL代码的功能正确性和推理保真度,在基准数据集上相比先前方法提升超10%。

Comments 6 pages, 2 figures, DAC'2026

详情
AI中文摘要

由于Verilog和VHDL中的长程推理、多步依赖和严格正确性约束,数字硬件设计的RTL代码自动生成仍然具有挑战性。我们提出StepPRM-RTL,一种新颖的框架,结合逐步轨迹建模、过程奖励模型(PRM)和检索增强微调(RAFT),以增强基于LLM的RTL代码生成的功能正确性和推理保真度。StepPRM-RTL从规范解构建逐步推理轨迹,其中每一步包含一个理由和增量代码修改。过程奖励模型(PRM)评估中间步骤,提供密集反馈,指导RAFT微调期间的强化式更新。蒙特卡洛树搜索(MCTS)探索替代推理路径,用高质量轨迹丰富训练数据集。这种逐步和结果感知奖励的集成使模型能够学习如何以及为何构建正确的RTL,从而改善超出标准监督或基于结果训练的长程推理。在基准Verilog和VHDL数据集上的实验评估表明,StepPRM-RTL在功能正确性和推理保真度指标上优于先前最佳方法超过10%。消融研究证实,PRM引导奖励和逐步轨迹探索的结合是其性能的关键。StepPRM-RTL跨RTL语言泛化,并为高保真、可解释的代码生成提供了可扩展框架,为LLM辅助硬件设计自动化建立了新标准。

英文摘要

Automatic generation of RTL code for digital hardware designs remains challenging due to long-horizon reasoning, multi-step dependencies, and strict correctness constraints in Verilog and VHDL. We present StepPRM-RTL, a novel framework that combines stepwise trajectory modeling, process-reward modeling (PRM), and retrieval-augmented fine-tuning (RAFT) to enhance both the functional correctness and reasoning fidelity of LLM-based RTL code generation. StepPRM-RTL constructs stepwise reasoning trajectories from canonical solutions, where each step contains a rationale and incremental code modification. A Process Reward Model (PRM) evaluates intermediate steps, providing dense feedback that guides reinforcement-style updates during RAFT fine-tuning. Monte Carlo Tree Search (MCTS) explores alternative reasoning paths, enriching the training dataset with high-quality trajectories. This integration of stepwise and outcome-aware rewards allows the model to learn both how and why to construct correct RTL, improving long-horizon reasoning beyond standard supervised or outcome-based training. Experimental evaluation on benchmark Verilog and VHDL datasets demonstrates that StepPRM-RTL outperforms the best prior methods by over 10% in functional correctness and reasoning fidelity metrics. Ablation studies confirm that the combination of PRM-guided rewards and stepwise trajectory exploration is key to its performance. StepPRM-RTL generalizes across RTL languages and provides a scalable framework for high-fidelity, interpretable code generation, establishing a new standard for LLM-assisted hardware design automation.

2606.04244 2026-06-04 cs.AI cs.CL cs.CV cs.LG 版本更新

VAMPS: Visual-Assisted Mathematical Problem Solving Benchmark

VAMPS: 视觉辅助数学问题求解基准

Amirhossein Dabiriaghdam, Shayan Vassef, Mohammadreza Bakhtiari, Yasamin Medghalchi, Ilker Hacihaliloglu, Mesrob Ohannessian, Lele Wang, Giuseppe Carenini

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出VAMPS基准,通过1,168道双语多选题评估多模态大模型在借助绘图工具进行数学推理时的表现,发现直接解析求解优于工具辅助视觉求解。

详情
AI中文摘要

多模态大语言模型在复杂推理方面能力日益增强,但当它们必须通过工具外部化问题然后基于工具输出进行推理时,尤其是在依赖视觉辅助的情况下,其性能往往会下降。这一差距尤为重要,因为真实的工程和科学工作流程通常依赖可视化工具进行分析、验证和决策。为了研究这一差异,我们引入了VAMPS(视觉辅助数学问题求解),一个用于图辅助数学的基准。VAMPS包含1,168个多模态、双语选择题问答对,这些题目来自伊朗大学入学考试的代数和微积分问题,并通过人工审核的LLM生成的合成变体进行了扩展,所有题目都经过精心挑选,使得绘图能够通过揭示交点、极值、渐近线等提供自然的求解策略。VAMPS旨在用于基准测试和诊断,它超越了以往主要评估在固定视觉输入上进行推理的多模态基准,通过测试模型是否能够从构建有用的图形中受益并将其答案基于结果可视化。总体而言,我们发现,在一组多样化的模型中,直接解析求解出人意料地优于工具辅助的视觉求解,即使在绘图是自然策略的问题上也是如此。

英文摘要

Multimodal large language models are increasingly capable of complex reasoning, yet their performance often degrades when they must externalize a problem through a tool and then reason over the tool's output, specifically when they rely on visual aids. This gap is especially important because real engineering and scientific workflows often rely on visualization tools for analysis, validation, and decision-making. To study this discrepancy, we introduce VAMPS (Visual-Assisted Mathematical Problem Solving), a benchmark for graph-assisted mathematics. VAMPS contains 1,168 multimodal, bilingual multiple-choice question-answer pairs drawn from Iranian University Entrance Exam algebra and calculus problems and expanded with human-reviewed LLM-generated synthetic variants, all selected so that plotting provides a natural solution strategy by revealing intersections, extrema, asymptotes, etc. Designed for both benchmarking and diagnosis, VAMPS goes beyond prior multimodal benchmarks that primarily evaluate reasoning over fixed visual inputs by testing whether a model can benefit from constructing a useful graph and grounding its answer in the resulting visualization. Overall, we found that across a diverse set of models, direct analytical solving surprisingly outperforms tool-enabled visual solving, even on problems where plotting is a natural strategy.

2606.04240 2026-06-04 cs.CV cs.AI cs.CL 版本更新

Overview of the EReL@MIR 2025 Multimodal Document Retrieval Challenge (Track 1)

EReL@MIR 2025 多模态文档检索挑战赛(赛道1)概述

Jingbiao Mei

发表机构 * University of Cambridge(剑桥大学) Cambridge United Kingdom(剑桥英国)

AI总结 本文介绍了EReL@MIR 2025多模态文档检索挑战赛(赛道1)的设计、数据集、评估协议、最终排名及前三名获胜系统的分析,所有系统均基于Qwen2-VL系列解码器多模态大语言模型嵌入器。

Comments MDR Challenge Report at WWW2025

详情
AI中文摘要

对于视觉丰富的文档(即文本与图形、表格和图表交织的页面)的检索,对于多模态检索增强生成至关重要,然而大多数检索器仍然丢弃视觉通道。多模态文档检索挑战赛是首届EReL@MIR研讨会(与2025年万维网会议同期举办)中MIR挑战赛的赛道1,要求参与者构建一个单一的检索系统,处理两种互补的场景:基于文本查询在长文档内进行封闭集文档页面检索(MMDocIR),以及基于图像或图像加文本查询进行开放域维基百科风格段落检索(M2KR)。系统根据两个任务上平均Recall@{1,3,5}的宏平均值进行排名。该挑战赛吸引了来自22个团队的455名参赛者和586份提交。本报告描述了挑战赛的设计、数据集和评估协议;报告了最终排名;并分析了三个获胜团队的系统。所有三个系统都基于Qwen2-VL系列的解码器多模态大语言模型嵌入器,而非CLIP风格的编码器,主要区别在于它们是通过微调集成、无训练的多路融合与强视觉语言重排序器,还是零样本后期交互达到顶尖水平。无训练系统与微调获胜者的得分差距在0.1分以内。

英文摘要

Retrieval over visually rich documents, pages that interleave text with figures, tables, and charts, is essential for multimodal retrieval-augmented generation, yet most retrievers still discard the visual channel. The Multimodal Document Retrieval Challenge, Track 1 of the MIR Challenge at the first EReL@MIR workshop, co-located with The Web Conference 2025, asks participants to build a single retrieval system that handles two complementary regimes: closed-set document page retrieval within long documents from a text query (MMDocIR), and open-domain retrieval of Wikipedia-style passages from an image or image-plus-text query (M2KR). Systems are ranked by the macro-average of mean Recall@{1,3,5} over the two tasks. The challenge drew 455 entrants and 586 submissions across 22 teams. This report describes the challenge design, datasets, and evaluation protocol; reports the final standings; and analyses the three winning teams' systems. All three build on decoder-based Multimodal-LLM embedders from the Qwen2-VL family rather than on CLIP-style encoders, and differ chiefly in whether they reach the top through fine-tuned ensembles, training-free multi-route fusion with a strong vision-language re-ranker, or zero-shot late interaction. The training-free system finished within 0.1 point of the fine-tuned winner.
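One plausible reading of the ranking metric is sketched below: for each task, compute the mean recall over queries at each cutoff k in {1, 3, 5}, average those three values, then take the unweighted mean over the two tasks. The exact aggregation used by the organisers is not given in the abstract, so this order of averaging, the function names, and the toy data are assumptions.

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant_ids:
        raise ValueError("each query needs at least one relevant item")
    hits = len(set(ranked_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)


def task_score(queries, ks=(1, 3, 5)):
    """Mean over cutoffs of the per-query mean Recall@k for one task.

    queries: list of (ranked_ids, relevant_ids) pairs.
    """
    per_cutoff = []
    for k in ks:
        mean_recall = sum(recall_at_k(r, rel, k) for r, rel in queries) / len(queries)
        per_cutoff.append(mean_recall)
    return sum(per_cutoff) / len(per_cutoff)


def challenge_score(task_a_queries, task_b_queries):
    """Macro-average: both tasks count equally, whatever their query counts."""
    return (task_score(task_a_queries) + task_score(task_b_queries)) / 2


# Toy example with one query per task (hypothetical IDs).
page_task = [(["d1", "d2", "d3"], ["d1"])]                  # hit at rank 1
passage_task = [(["p9", "p2", "p7", "p4", "p1"], ["p1"])]   # hit at rank 5
score = challenge_score(page_task, passage_task)
```

Macro-averaging over tasks means a system cannot score well by excelling on only one of the two retrieval regimes.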

2606.04238 2026-06-04 cs.LG cs.AI 版本更新

Recover-LoRA for Aggressive Quantization: Reclaiming Accuracy in 2-Bit Language Models via Low-Rank Adaptation with Knowledge Distillation on Synthetic Data

Recover-LoRA 用于激进量化:通过低秩适配与合成数据知识蒸馏恢复2比特语言模型的精度

Devleena Das, Rajeev Patwari, Elliott Delaye, Ashish Sirasao

发表机构 * Advanced Micro Devices, Inc.(先进微器件公司)

AI总结 针对2比特激进量化导致的大语言模型精度严重下降问题,提出Recover-LoRA方法,结合选择性混合精度策略(仅MLP的gate和up层量化为2比特)和基于合成数据蒸馏的低秩适配训练,在Qwen3-4B上以1万合成样本在12个基准中恢复9个基准80-95%的精度。

详情
AI中文摘要

将权重激进量化至2比特精度可大幅提升大语言模型推理的吞吐量和内存效率,但通常会导致严重的精度下降。这些增益对于内存容量和带宽为主要限制的边缘和设备端部署尤为重要。在本工作中,我们将Recover-LoRA——一种最初为通用模型权重损坏设计的轻量级、无需数据的精度恢复方法——扩展到超低比特量化场景。我们提出了一种选择性混合精度策略,其中仅MLP的gate和up投影层被量化为2比特(W2),而所有其他线性层保持更高精度,从而形成混合精度的GateUp配置。通过三个模型系列(4B-20B)和两个硬件平台的屋顶线分析,我们证明W4/W2-GateUp部署(4比特基础加2比特gate/up)相比均匀W4可实现7.5-23.3%的TPS提升(取决于模型和上下文长度),同时将量化误差限制在可预测的层子集内。然后,我们应用Recover-LoRA——在量化层上通过合成数据的logit蒸馏训练低秩适配器——来恢复因gate和up层的2比特量化而损失的精度。在Qwen3-4B的案例研究中,Recover-LoRA仅使用1万合成训练样本且无需标注数据,就在12个基准中的9个上实现了80-95%的精度恢复。我们进一步证明,对于基于蒸馏的恢复,合成数据的表现与精心整理的标注数据相当,并且恢复结果可泛化到分布外评估任务。我们的结果表明,Recover-LoRA是一种实用的后量化精度恢复工具,适用于部署场景中的激进权重压缩。

英文摘要

Aggressive weight quantization to 2-bit precision offers substantial throughput and memory gains for large language model (LLM) inference, but typically incurs severe accuracy degradation. These gains are particularly relevant for edge and on-device deployment, where memory capacity and bandwidth are primary constraints. In this work, we extend Recover-LoRA, a lightweight, data-free accuracy recovery method originally developed for general model weight corruption, to the setting of ultra-low-bit quantization. We propose a selective mixed-precision strategy in which only gate and up projection layers of the MLP are quantized to 2-bit (W2), while all other linear layers remain at higher precision, yielding a mixed-precision GateUp configuration. We demonstrate via roofline analysis across three model families (4B-20B) and two hardware platforms that a W4/W2-GateUp deployment (4-bit base with 2-bit gate/up) delivers 7.5-23.3% TPS improvement over uniform W4 depending on model and context length, while confining quantization error to a predictable subset of layers. We then apply Recover-LoRA, which trains low-rank adapters on the quantized layers via logit distillation with synthetic data, to recover accuracy lost from 2-bit quantization of the gate and up layers. In a case study on Qwen3-4B, Recover-LoRA achieves 80-95% accuracy recovery on 9 of 12 benchmarks, using only 10k synthetic training samples and no labeled data. We further demonstrate that synthetic data performs comparably to curated labeled data for distillation-based recovery, and that recovery generalizes to out-of-distribution evaluation tasks. Our results present Recover-LoRA as a practical post-quantization accuracy recovery tool for aggressive weight compression in deployment settings.
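To give a feel for why 2-bit weights lose accuracy, here is a minimal sketch of uniform min-max quantization to four levels. The abstract does not state which quantizer the authors use, so this generic scheme, the function names, and the toy weights are assumptions. The low-rank adapter training itself is not shown.

```python
def quantize_2bit(weights):
    """Map each weight to one of 4 evenly spaced levels (codes 0..3).

    Returns (codes, zero_point, scale) so that
    weight is approximately zero_point + code * scale.
    """
    lo, hi = min(weights), max(weights)
    if hi == lo:
        return [0] * len(weights), lo, 1.0  # constant tensor: any scale works
    scale = (hi - lo) / 3  # 2 bits -> 4 levels -> 3 intervals
    codes = [min(3, max(0, round((w - lo) / scale))) for w in weights]
    return codes, lo, scale


def dequantize(codes, zero_point, scale):
    """Reconstruct approximate weights from 2-bit codes."""
    return [zero_point + c * scale for c in codes]


# Toy example: the residual below is the kind of error that a low-rank
# adapter would be trained to compensate for.
original = [-1.0, -0.4, 0.1, 0.5]
codes, zero_point, scale = quantize_2bit(original)
restored = dequantize(codes, zero_point, scale)
residual = [w - r for w, r in zip(original, restored)]
```

With only four representable values per group, each weight can be off by up to half a quantization step, which is why restricting 2-bit precision to a subset of layers limits where that error can appear.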

2606.04236 2026-06-04 cs.CL cs.AI cs.LG 版本更新

Supportive Token Revealing for Fast Diffusion Language Model Decoding

支持性标记揭示:快速扩散语言模型解码

Giries Abu Ayoub, Mario Barbara, Lluís Pastor-Pérez, Tanja Bien, Aneesh Barthakur, Alaa Maalouf, Loay Mualem

发表机构 * Department of Computer Science, University of Haifa(海法大学计算机科学系) Institute for AI, University of Stuttgart(斯图加特大学人工智能研究所) IMPRS-IS

AI总结 提出AXON模块,通过选择注意力、不确定性和置信度信号中的锚点标记来改善扩散语言模型并行解码的质量-延迟权衡。

详情
AI中文摘要

离散扩散语言模型可以通过并行更新多个掩码位置来高效生成文本,但这种并行性引入了质量-延迟权衡。激进的解码可能过早提交相互依赖的标记,而保守的解码则需要大量去噪步骤。现有方法通过使用置信度或依赖性标准决定哪些标记可以安全揭示来解决这一矛盾。然而,避免不安全提交并不一定使剩余的掩码序列易于解码,因为不确定的标记可能依赖于掩码标记,从而成为去噪步骤的瓶颈。我们提出AXON,一个无需训练的模块,可添加到现有扩散语言模型的并行解码策略之上。AXON不替换基础解码器,而是监控剩余不确定的掩码标记,并仅当它们当前状态表明需要额外上下文时才进行干预。然后它将标准从揭示哪些标记最安全转变为哪些自信揭示最能支持后续去噪。AXON使用注意力、不确定性和置信度信号选择锚点,即不确定位置关注的自信掩码标记。在多个扩散语言模型的推理和代码生成基准上的实验表明,AXON改善了现有并行解码器的质量-延迟权衡,通常减少函数评估次数,同时保持或提高准确性。

英文摘要

Discrete diffusion language models can generate text efficiently by updating multiple masked positions in parallel, but this parallelism introduces a quality-latency trade-off. Aggressive decoding may commit mutually dependent tokens too early, while conservative decoding requires many denoising steps. Existing methods address this tension by deciding which tokens are safe to reveal using confidence or dependency criteria. However, avoiding unsafe commits does not necessarily make the remaining masked sequence easy to decode, since uncertain tokens may depend on masked tokens, creating a bottleneck for denoising steps. We propose AXON, a training-free module that can be added on top of existing parallel decoding strategies for diffusion language models. Rather than replacing the base decoder, AXON monitors the remaining uncertain masked tokens and intervenes only when their current state suggests that additional context is needed. It then shifts the criterion from which tokens are safest to reveal to which confident reveals would best support later denoising. AXON selects anchors, confident masked tokens that uncertain positions attend to, using attention, uncertainty, and confidence signals. Experiments on reasoning and code-generation benchmarks across multiple diffusion language models show that AXON improves the quality-latency trade-off of existing parallel decoders, often reducing the number of function evaluations while maintaining or improving accuracy.

2606.04231 2026-06-04 cs.CL cs.AI 版本更新

MM-BizRAG: Rethinking Multimodal Retrieval-Augmented Generation for General Purpose Enterprise Q&A

MM-BizRAG:面向通用企业问答的多模态检索增强生成再思考

Hanoz Bhathena, Parin Rajesh Jhaveri, Rohan Mittal, Prateek Singh, Aymen Kallala, Rachneet Kaur, Yiqiao Jin, Zhen Zeng, Adwait Ratnaparkhi, Denis Kochedykov

发表机构 * JPMorgan Chase & Co.(摩根大通公司) Georgia Institute of Technology(佐治亚理工学院)

AI总结 提出MM-BizRAG框架,通过文档结构感知分割和布局感知解析,结合统一LLM驱动的工件转换与推理时多模态组装,无需微调即可提升企业文档问答性能,在异构企业数据集和两个公开基准上超越基线最多32个百分点。

Comments Accepted at ACL 2026 (Industry Track)

详情
AI中文摘要

近期多模态检索增强生成(MM-RAG)的进展倾向于最小化解析,依赖页面级图像来生成检索器嵌入和答案生成。虽然高效,但这种趋势往往忽略了对复杂企业文档中丰富结构化信息的显式处理,而是依赖预训练嵌入或视觉语言模型隐式捕获这种结构。在本工作中,我们采取更直接的方法:MM-BizRAG通过文档结构感知分割主动提取和表示文档结构,该分割根据文档方向动态路由文档至特定方向的摄取管道,对垂直结构文档(如报告)应用显式布局感知解析,对水平结构文档(如幻灯片)应用整体页面级表示。统一的LLM驱动的工件转换管道通过基于占位符的位置对齐保留自然阅读顺序,而推理时的多模态组装将检索表示与生成上下文解耦,无需任何微调即可生成更丰富、更基于事实的答案。通过在大型异构企业数据集和两个公开基准(SlideVQA和FinRAGBench-V)上的实验,MM-BizRAG一致地超越最先进的以视觉为中心的基线最多32个百分点,在报告式布局上尤其强劲。此外,我们引入了FastRAGEval,一种单次调用的LLM评判指标,用于细粒度生成召回,将RAGChecker的成本减半,同时实现更强的人类对齐。

英文摘要

Recent advances in multimodal retrieval-augmented generation (MM-RAG) have shifted toward minimal parsing, relying on page-level images for producing retriever embeddings and for answer generation. While efficient, this trend often neglects explicit handling of the rich, structured information in complex enterprise documents, instead depending on pre-trained embeddings or vision-language models to implicitly capture such structure. In this work, we take a more direct approach: MM-BizRAG proactively extracts and represents document structure via a document structure-aware split that dynamically routes documents through orientation-specific ingestion pipelines, applying explicit layout-aware parsing for vertically structured documents (e.g., reports) and holistic page-level representations for horizontally structured documents (e.g., slide decks). A unified LLM-driven artifact transformation pipeline with placeholder-based positional alignment preserves natural reading order, while inference-time multimodal assembly decouples retrieval representations from generation context, enabling richer, more grounded answers without any finetuning requirement. Through experiments on a large, heterogeneous enterprise dataset and two public benchmarks (SlideVQA and FinRAGBench-V), MM-BizRAG consistently outperforms state-of-the-art vision-centric baselines by up to 32 percentage points, with especially strong gains on report-style layouts. Furthermore, we introduce FastRAGEval, a single-call LLM Judge metric for fine-grained generative recall that halves RAGChecker's cost while achieving stronger human alignment.
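
The orientation-based routing described above might look something like the sketch below. It assumes that orientation is decided by a majority vote over page aspect ratios, which the abstract does not state; the `Page` type, the pipeline labels, and the function name are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Page:
    width: float
    height: float


def route_document(pages: List[Page]) -> str:
    """Pick an ingestion pipeline from the dominant page orientation (assumed heuristic)."""
    if not pages:
        raise ValueError("document has no pages")
    landscape = sum(1 for p in pages if p.width > p.height)
    # Mostly landscape pages (e.g. slide decks): keep holistic page-level representations.
    if landscape * 2 > len(pages):
        return "page_level_representation"
    # Mostly portrait pages (e.g. reports): apply explicit layout-aware parsing.
    return "layout_aware_parsing"
```

A production router would likely need more signals than aspect ratio, for example to handle scanned reports that were rotated during capture.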

2606.04226 2026-06-04 cs.RO cs.AI 版本更新

PerceptTwin: Semantic Scene Reconstruction for Iterative LLM Planning and Verification

PerceptTwin:面向迭代LLM规划与验证的语义场景重建

Charlie Gauthier, Sacha Morin, Liam Paull

发表机构 * Department of Computer Science and Operations Research, Université de Montréal(蒙特利尔大学计算机科学与运筹学系) Mila - Quebec AI Institute(魁北克人工智能研究所) CIFAR AI Chair(CIFAR人工智能讲席)

AI总结 提出PerceptTwin自动管道,从机器人感知的语义场景表示构建交互式仿真,结合LLM法官验证规划正确性与人类偏好,提升规划成功率约39%。

Comments Accepted at ICRA 2026 (Vienna); published on arxiv for archival purposes. See also https://percept-twin.github.io/

详情
AI中文摘要

仿真环境对于机器人策略学习以及规划验证与确认都很有用。传统上,创建仿真的过程是繁重的。为机器人运行的每个单独环境创建定制的仿真环境是不可行的。在这项工作中,我们引入了PerceptTwin,这是一个全自动管道,直接从机器人感知栈产生的语义场景表示构建交互式仿真。PerceptTwin结合了开放词汇对象地图与3D资产生成、可供性预测和常识条件检查。这些交互式仿真可用于在机器人硬件上执行规划之前验证和完善规划。借鉴AI对齐文献,我们还引入了一个LLM法官,用于验证规划的正确性和与人类偏好的一致性。实验表明,PerceptTwin反馈允许LLM规划器完善规划、增强安全性并抵抗有害的黑盒提示攻击。在我们的任务套件中,PerceptTwin使GPT5、GPT5Mini和GPT5Nano规划器的规划成功率平均提高约39%。此外,对于因未满足技能前提条件而失败的规划,PerceptTwin还将人类规划验证平均提高高达18%。我们的结果证明了从机器人感知进行开放词汇场景仿真作为更安全、更可靠的机器人规划基础的潜力。

英文摘要

Simulation environments are useful for both robot policy learning and planning verification and validation. Traditionally, the process of creating a simulation was onerous. Creating a bespoke simulation environment for each individual environment that a robot would operate in was simply infeasible. In this work, we introduce PerceptTwin, a fully automatic pipeline that constructs interactive simulations directly from semantic scene representations produced by a robot's perception stack. PerceptTwin combines open-vocabulary object maps with 3D asset generation, affordance prediction, and commonsense condition checking. These interactive simulations can be used to validate and refine plans before they are executed on the robot hardware. Borrowing from the AI alignment literature, we also introduce an LLM judge that verifies plan correctness and alignment with human preferences. Experiments show that PerceptTwin feedback allows LLM planners to refine plans, enhance safety, and resist harmful black-box prompting attacks. In our suite of tasks, PerceptTwin improves plan success by an average of approximately 39% for GPT5, GPT5Mini, and GPT5Nano planners. Additionally, PerceptTwin also improves human plan verification by up to 18% on average for plans that fail due to unfilled skill preconditions. Our results demonstrate the potential of open-vocabulary scene simulation from robot perception as a foundation for safer, more reliable robot planning.

2606.04223 2026-06-04 cs.AI 版本更新

Consensus is Strategically Insufficient: Reasoning-Trace Disagreement as a Knowledge-Representation Signal

共识在策略上是不充分的:推理轨迹分歧作为知识表示信号

Michał Wawer, Jarosław A. Chudziak

发表机构 * Laboratory of The New Ethos(新伦理实验室) Warsaw University of Technology(华沙理工大学) Institute of Computer Science(计算机科学研究所) Faculty of Electronics and Information Technology(电子与信息技术学院)

AI总结 本文提出在价值负载任务中,分歧可能反映规范不确定性而非错误,通过将推理轨迹和决策抽象为符号分歧状态,构建知识表示层以支持可废止策略路由,连接亚符号LLM审议与符号知识表示。

Comments Accepted to LAMAS&SR workshop at FLoC 2026 (KR + ICPL + LICS + CP + FSCD)

详情
AI中文摘要

多智能体系统通常通过投票、共识协议、辩论或容错聚合来减少分歧。我们认为,对于价值负载任务,这一目标是不充分的,因为分歧可能反映真正的规范不确定性而非智能体错误。基于先前关于人机协作审核中推理轨迹分歧的工作,我们提出一个知识表示层,其中推理轨迹和智能体决策被抽象为符号分歧状态。给定产生显式推理轨迹和二元决策的智能体,我们根据推理相似性和结论一致性区分四种状态:收敛一致、发散一致、收敛分歧和发散分歧。这些状态支持可废止的策略路由规则。我们在内容审核中实例化该框架,并论证分歧感知路由为多智能体策略推理中亚符号LLM审议与符号知识表示之间提供了桥梁。

英文摘要

Multi-agent systems are commonly designed to reduce disagreement through voting, consensus protocols, debate, or fault-tolerant aggregation. We argue that this objective is insufficient for value-laden tasks, where disagreement may reflect genuine normative uncertainty rather than agent error. Building on prior work on reasoning-trace disagreement in human-AI collaborative moderation, we propose a knowledge-representation layer in which reasoning traces and agent decisions are abstracted into symbolic disagreement states. Given agents producing explicit reasoning traces and binary decisions, we distinguish four states according to reasoning similarity and conclusion agreement: convergent agreement, divergent agreement, convergent disagreement and divergent disagreement. These states support defeasible strategic routing rules. We instantiate the framework in content moderation and argue that disagreement-aware routing provides a bridge between sub-symbolic LLM deliberation and symbolic knowledge representation for multi-agent strategic reasoning.
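
The four symbolic states above can be sketched as a small classifier. This is an illustration under assumptions: it reads "convergent/divergent" as referring to reasoning similarity and "agreement/disagreement" as referring to the binary decisions, and it assumes a scalar similarity score with a hypothetical threshold of 0.5. How the paper measures trace similarity is not given in the abstract.

```python
from enum import Enum


class DisagreementState(Enum):
    CONVERGENT_AGREEMENT = "convergent agreement"
    DIVERGENT_AGREEMENT = "divergent agreement"
    CONVERGENT_DISAGREEMENT = "convergent disagreement"
    DIVERGENT_DISAGREEMENT = "divergent disagreement"


def classify(
    reasoning_similarity: float,        # assumed score in [0, 1] between two reasoning traces
    decision_a: bool,
    decision_b: bool,
    similarity_threshold: float = 0.5,  # hypothetical cut-off
) -> DisagreementState:
    """Map a pair of agent outputs onto one of the four disagreement states."""
    convergent = reasoning_similarity >= similarity_threshold
    agree = decision_a == decision_b
    if agree:
        return (DisagreementState.CONVERGENT_AGREEMENT if convergent
                else DisagreementState.DIVERGENT_AGREEMENT)
    return (DisagreementState.CONVERGENT_DISAGREEMENT if convergent
            else DisagreementState.DIVERGENT_DISAGREEMENT)
```

A routing layer could then attach a default action to each state and let later evidence override it, which is one way to read "defeasible" routing rules.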

2606.04205 2026-06-04 cs.MM cs.AI cs.CL cs.CV cs.LG cs.SD 版本更新

DetectZoo: A Unified Toolkit for AI-Generated Content Detection Across Text, Audio, and Image Modalities

DetectZoo:一个用于跨文本、音频和图像模态的AI生成内容检测的统一工具包

Sajad Ebrahimi, Nima Jamali, Bardia Shirsalimian, Kelly McConvey, Wentao Zhang, Jalehsadat Mahdavimoghaddam, Maksym Taranukhin, Maura Grossman, Vered Shwartz, Yuntian Deng, Ebrahim Bagheri

发表机构 * University of Toronto(多伦多大学) University of Waterloo(滑铁卢大学) Toronto Metropolitan University(多伦多都会大学) University of British Columbia(不列颠哥伦比亚大学) Vector Institute(向量研究所)

AI总结 提出DetectZoo,首个统一的多模态AI生成内容检测工具包,通过标准化数据预处理、评估流程和集成61个检测器与22个基准数据集,实现公平可重复的基准测试。

详情
AI中文摘要

生成模型的日益普及和能力提升模糊了人类与机器生成内容之间的界限,推动了跨文本、图像和音频检测领域的大量研究。大多数现有的检测器要么是商业软件,要么是开源但带有不兼容的代码库、定制化的预处理、评估协议和评估指标,这使得它们的采用、公平比较和复现变得相当困难。为了解决这一关键差距,我们引入了DetectZoo,这是首个可扩展的工具包,旨在为跨文本、音频和图像模态的AI生成内容检测提供统一接口。DetectZoo标准化了从数据摄取和预处理到模型评估的完整实证流程,为研究人员提供了一个统一的框架来系统地基准测试最先进的检测器。通过将多样的公共数据集和基线检测算法集成到单一的统一API下,我们的工具包促进了严格且可重复的评估。DetectZoo提供了61个检测器的参考实现、22个基准数据集的原生加载器,以及一个标准化的评估流程,通过通用接口报告多个指标。每个检测器都是自包含的,但可通过同一接口访问,自动缓存预训练权重,并复现原始发表的结果。DetectZoo降低了多模态AI取证的入门门槛,使研究人员能够识别跨领域的性能差距,并加速开发鲁棒、可泛化的检测技术。开源仓库和全面文档可在https://github.com/sadjadeb/DetectZoo 获取,且可通过pip install detectzoo安装该包。

英文摘要

The growing popularity and capacity of generative models have eroded the distinction between human and machine-generated content, motivating a growing body of work on detection across text, images, and audio. Most available detectors are either commercial software or, if open-source, come with incompatible codebases with bespoke preprocessing, evaluation protocols, and evaluation metrics, which make their adoption, fair comparison, and reproduction quite difficult. To address this critical gap, we introduce DetectZoo, a first-of-its-kind, extensible toolkit designed to provide a unified interface for AI-generated content detection across text, audio, and image modalities. DetectZoo standardizes the complete empirical pipeline, from data ingestion and preprocessing to model assessment, offering researchers a cohesive framework to benchmark state-of-the-art detectors systematically. By integrating diverse public datasets and baseline detection algorithms under a single, unified API, our toolkit facilitates rigorous and reproducible evaluation. DetectZoo provides reference implementations of 61 detectors, native loaders for 22 benchmark datasets, and a standardized evaluation pipeline that reports multiple metrics through a common interface. Each detector is self-contained yet accessible through the same interface, automatically caches pretrained weights, and reproduces the original published results. DetectZoo lowers the barrier to entry for multi-modal AI forensics, enabling researchers to identify performance gaps across domains and accelerating the development of robust, generalizable detection techniques. The open-source repository and comprehensive documentation are publicly available at https://github.com/sadjadeb/DetectZoo, and the package can be installed via pip install detectzoo.

2606.04202 2026-06-04 cs.AI 版本更新

SMAC-Talk: A Natural Language Extension of the StarCraft Multi-Agent Challenge for Large Language Models

SMAC-Talk: 面向大型语言模型的星际争霸多智能体挑战的自然语言扩展

Joel Sol, Homayoun Najjaran

发表机构 * Faculty of Engineering and Computer Science(工程与计算机科学学院) University of Victoria(维多利亚大学)

AI总结 提出SMAC-Talk环境,通过自然语言通信通道评估LLM智能体在合作多智能体场景中的协调与信任,并构建含欺骗性通信者的评估场景。

Comments 8 pages, 1 figure

详情
AI中文摘要

随着LLM的广泛部署,它们越来越需要与其他AI智能体协同工作而非孤立运行。在这些场景中,有效协调要求智能体进行通信、共享信息并在不确定性下做出决策。我们提出了SMAC-Talk,这是星际争霸多智能体挑战的自然语言扩展,用于评估基于LLM的智能体在合作多智能体环境中的表现。该环境具有分散控制、部分可观测性和长周期决策等关键特征。SMAC-Talk包含一个自然语言通信通道,用于探测智能体的协调与信任。我们利用该通信通道构建了不同的评估场景,包括嵌入欺骗性通信者的设置,该通信者试图仅通过通信来干扰和欺骗盟友。我们提供了三个基准测试智能体,使用Qwen3.5系列的4个模型,并研究了推理结构、记忆和模型规模如何影响智能体间的协调。我们将SMAC-Talk作为开放基准发布,以支持研究社区在合作多智能体场景中开发和评估LLM智能体。

英文摘要

As LLMs become more widely deployed, they are increasingly expected to work alongside other AI agents rather than operating in isolation. Effective coordination in these settings requires agents to communicate, share information and make decisions under uncertainty. We introduce SMAC-Talk, a natural language extension of the StarCraft Multi-Agent Challenge for evaluating LLM-based agents in cooperative multi-agent environments. The environment has several key features such as decentralized control, partial observability and long-horizon decision making. SMAC-Talk includes a natural language communication channel which is used to probe agent coordination and trust. We use this communication channel to construct different evaluation scenarios, including settings with an embedded deceptive communicator that tries to disrupt and deceive allies through communication alone. We provide three agents for benchmarking using 4 models from the Qwen3.5 family and study how reasoning structure, memory and model scale affect coordination between agents. We release SMAC-Talk as an open benchmark to support the research community in developing and evaluating LLM agents in cooperative multi-agent settings.

2606.04193 2026-06-04 cs.CR cs.AI cs.DC 版本更新

Notarized Agents: Receiver-Attested Confidential Receipts for AI Agent Actions

公证代理:面向AI代理行为的接收方认证保密收据

Juan Figuera

发表机构 * Independent Researcher, Sello Project(独立研究者,Sello项目)

AI总结 针对AI代理日志自审计的信任缺陷,提出接收方签名收据协议Sello,通过HPKE加密、JWS绑定和Merkle日志实现防篡改追踪。

Comments 22 pages. Reference implementation at https://github.com/juanfiguera/sello

详情
AI中文摘要

当前AI代理的可观测性在结构上存在缺陷:生成活动日志的实体与被记录活动的实体是同一个。被攻破或有缺陷的代理可以省略、篡改或伪造自身的追踪记录,而运行代理的操作员没有独立的方法检测篡改。我们提出了一类协议来解决这个问题,通过反转信任边界:接收代理调用的服务使用自己的密钥对观察到的内容签名收据,将收据加密给代理的所有者,并将其发布到公共透明度日志中。所有者可以在不信任代理或其操作员的情况下重建防篡改追踪。我们将该类协议实例化为Sello,一种结合了当前任何系统都不具备的四个属性的协议:(P1)接收方签名,(P2)通过JWS将HPKE加密绑定到所有者公钥的授权令牌,(P3)发布到见证人共同签名的Merkle日志,以及(P4)通过令牌引用进行所有者端发现。我们描述了该协议,分析了在攻击者控制代理及其操作员的情况下的安全性,给出了加密操作的微基准测试,并将Sello与相邻的收据协议工作(Signet、AgentROA、Agent Passport System、draft-farley-acta、SCITT)进行了比较。我们讨论了已知的限制,包括抑制攻击、服务合谋和采用激励问题。

英文摘要

Current AI agent observability is structurally compromised: the entity producing the activity log is the same entity whose activity is being logged. A compromised or buggy agent can omit, alter, or fabricate its own traces, and the operator running the agent has no independent way to detect tampering. We propose a class of protocols that resolves this by inverting the trust boundary: the service that receives an agent's call signs a receipt of what it observed using its own key, encrypts the receipt to the agent's owner, and publishes it to a public transparency log. The owner reconstructs a tamper-evident trail without trusting the agent or its operator. We instantiate the class as Sello, a protocol combining four properties absent in any current system: (P1) receiver-side signing, (P2) HPKE encryption to an owner public key bound to the authorization token via JWS, (P3) publication to a witness-cosigned Merkle log, and (P4) owner-side discovery by token reference. We describe the protocol, analyze its security under an adversary that controls the agent and its operator, present microbenchmarks of the cryptographic operations, and situate Sello among adjacent receipt-protocol work (Signet, AgentROA, Agent Passport System, draft-farley-acta, SCITT). We discuss known limitations including the suppression attack, service collusion, and the adoption-incentive problem.
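
To make the Merkle-log property (P3) more concrete, here is a minimal sketch of a Merkle tree hash over a batch of receipts, written in the style of RFC 6962 (leaf prefix `0x00`, interior-node prefix `0x01`). Whether Sello uses this exact construction is not stated in the abstract; the function name and the use of raw byte strings as receipts are assumptions, and signing, HPKE encryption, and witness cosigning are deliberately left out.

```python
import hashlib
from typing import List


def merkle_root(leaves: List[bytes]) -> bytes:
    """RFC 6962-style Merkle tree hash over an ordered list of encoded receipts."""
    n = len(leaves)
    if n == 0:
        return hashlib.sha256(b"").digest()
    if n == 1:
        # Leaf hashes are domain-separated from interior nodes by a 0x00 prefix.
        return hashlib.sha256(b"\x00" + leaves[0]).digest()
    # Split at the largest power of two strictly smaller than n.
    k = 1
    while k * 2 < n:
        k *= 2
    left = merkle_root(leaves[:k])
    right = merkle_root(leaves[k:])
    return hashlib.sha256(b"\x01" + left + right).digest()
```

Because every receipt feeds into the root, altering or dropping any logged receipt changes the root that witnesses have cosigned, which is what makes the trail tamper-evident.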

2606.04191 2026-06-04 cs.LG cs.AI 版本更新

Metric-Aware Hybrid Forecasting for the CTF4Science Lorenz Challenge

CTF4Science Lorenz挑战的度量感知混合预测

Cen Lu

发表机构 * EPFL & Idiap Research Institute(瑞士联邦理工学院(EPFL)及Idiap研究所)

AI总结 针对CTF4Science Lorenz挑战,提出一种度量感知混合系统,通过为不同度量族分配专用预测器(去噪器、ODE拟合、直方图替换),在九项任务对上取得高分。

详情
AI中文摘要

我们描述了针对CTF4Science Lorenz挑战的方法,该基准混合了短时预测、长时间分布匹配和轨迹重建,涵盖九项任务对。关键发现是,没有单一模型族在所有度量上占优。相反,我们构建了一个度量感知混合系统,为每个度量族分配不同的预测器:(1)用于全轨迹重建的合成预训练去噪器,(2)用于前20个预测步的Lorenz ODE拟合和轨迹射击,以及(3)使用合成Lorenz库的直方图尾部替换用于长时间评估。该系统中一个具有代表性的成熟提交在公共排行榜上得分为83.83551,而采用相同思想的小型后续堆栈达到了83.85529。我们专注于更干净的中间系统,因为它捕获了完整方法,同时足够简单以重现和分析,而最终提交可以理解为同一骨干的保守扩展。

英文摘要

We describe our approach to the CTF4Science Lorenz challenge, a benchmark that mixes short-horizon forecasting, long-time distribution matching, and trajectory reconstruction across nine task pairs. The key discovery is that no single model family dominated all metrics. Instead, we built a metric-aware hybrid system that assigned a different predictor to each metric family: (1) synthetic-pretrained denoisers for full-trajectory reconstruction, (2) Lorenz ODE fitting and trajectory shooting for the first 20 forecast steps, and (3) histogram-tail substitution using synthetic Lorenz libraries for long-time evaluation. A representative mature submission from this system family scored 83.83551 on the public leaderboard, and a small follow-up stack of the same ideas reached 83.85529. We focus on the cleaner intermediate system because it captures the full method while remaining simple enough to reproduce and analyze, while the final submission can be understood as a conservative extension of the same backbone.
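
The short-horizon component relies on integrating the Lorenz ODE forward from a fitted state. A minimal sketch of that forward step is shown below, using the classical parameters (sigma = 10, rho = 28, beta = 8/3) and a fixed-step RK4 integrator. The parameter values, step size, and function names are assumptions; the fitting and shooting procedure itself is not reproduced here.

```python
from typing import List, Tuple

State = Tuple[float, float, float]


def lorenz_rhs(state: State, sigma: float = 10.0, rho: float = 28.0,
               beta: float = 8.0 / 3.0) -> State:
    """Right-hand side of the Lorenz system."""
    x, y, z = state
    return (sigma * (y - x), x * (rho - z) - y, x * y - beta * z)


def rk4_step(state: State, dt: float, **params: float) -> State:
    """Advance the state by one classical fourth-order Runge-Kutta step."""
    def shifted(base: State, k: State, scale: float) -> State:
        return tuple(b + scale * ki for b, ki in zip(base, k))

    k1 = lorenz_rhs(state, **params)
    k2 = lorenz_rhs(shifted(state, k1, dt / 2), **params)
    k3 = lorenz_rhs(shifted(state, k2, dt / 2), **params)
    k4 = lorenz_rhs(shifted(state, k3, dt), **params)
    return tuple(
        s + dt / 6 * (a + 2 * b + 2 * c + d)
        for s, a, b, c, d in zip(state, k1, k2, k3, k4)
    )


def rollout(state: State, dt: float, steps: int, **params: float) -> List[State]:
    """Return the initial state followed by `steps` integrated states."""
    trajectory = [state]
    for _ in range(steps):
        state = rk4_step(state, dt, **params)
        trajectory.append(state)
    return trajectory
```

Calling `rollout(initial_state, dt, 20)` would produce a 20-step forecast of the kind the abstract assigns to the ODE-based predictor; fitting would wrap this rollout in an optimizer over the initial state and parameters.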

2606.04188 2026-06-04 cs.LG cs.AI cs.RO 版本更新

Dual Advantage Fields

双优势场

Alexey Zemtsov, Maxim Bobrin, Alexander Nikulin, Dmitry V. Dylov, Fakhri Karray, Vladislav Kurenkov, Martin Takáč, Arip Asadulaev

发表机构 * NUST MISIS(俄罗斯国立科技大学MISIS) MSU(莫斯科国立大学) Computational Imaging Lab(计算成像实验室) MBZUAI(穆罕默德·本·扎耶德人工智能大学) dunnolab(杜诺实验室) Innopolis University(因诺波利斯大学)

AI总结 提出双优势场(DAF)方法,利用双线性对偶值模型生成局部优势信号,通过动作-效应模型预测折扣特征位移并与目标方向对齐来评分动作,实现离线目标条件强化学习中的策略提取。

Comments Accepted by ICML 2026 Workshop on Decision-Making from Offline Datasets to Online Adaptation: Black-Box Optimization to Reinforcement Learning

详情
AI中文摘要

离线目标条件强化学习需要长期可达性估计和局部动作比较。对偶目标表示提供捕获全局目标可达性的值场,但它们不直接指定在给定状态下应优先选择哪个动作。我们提出双优势场(DAF),一种策略提取方法,将双线性对偶值模型转化为局部优势信号。在双线性对偶参数化下,目标嵌入是值场关于状态表示的梯度。DAF学习一个动作-效应模型,预测由动作引起的折扣特征位移,并通过该位移与目标方向的对齐程度对动作进行评分。在可实现的情况下,该分数等于目标条件Bellman优势,从而提供标准的局部策略改进保证。在OGBench的运动、操作和谜题任务上,DAF改进了聚合RLiable指标,并在局部正确动作与直接朝向最终目标移动不同的设置中表现强劲。

英文摘要

Offline goal-conditioned reinforcement learning requires both long-horizon reachability estimates and local action comparisons. Dual goal representations provide value fields that capture global goal reachability, but they do not directly specify which action should be preferred at a given state. We propose Dual Advantage Fields, a policy-extraction method that turns a bilinear dual value model into a local advantage signal. Under bilinear dual parameterization, the goal embedding is the gradient of the value field with respect to the state representation. DAF learns an action-effect model that predicts the discounted feature displacement induced by an action and scores actions by the alignment between this displacement and the goal direction. In the realizable case, this score equals the goal-conditioned Bellman advantage, yielding a standard local policy-improvement guarantee. On OGBench locomotion, manipulation, and puzzle tasks, DAF improves aggregate RLiable metrics and performs strongly in settings where locally correct actions differ from direct movement toward the final goal.
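
The scoring rule above can be illustrated with a small sketch. If the value is bilinear, V(s, g) = φ(s) · ψ(g), then its gradient with respect to the state representation φ(s) is the goal embedding ψ(g), so "alignment with the goal direction" can be read as a dot product. The sketch assumes a raw (unnormalized) dot product and takes the predicted displacements as given; the function names and the dictionary interface are hypothetical.

```python
from typing import Dict, Hashable, Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


def daf_scores(
    displacements: Dict[Hashable, Sequence[float]],  # action -> predicted discounted feature displacement
    goal_embedding: Sequence[float],                 # psi(g), the gradient of V w.r.t. phi(s)
) -> Dict[Hashable, float]:
    """Score each action by how well its predicted displacement aligns with the goal direction."""
    return {action: dot(d, goal_embedding) for action, d in displacements.items()}


def greedy_action(
    displacements: Dict[Hashable, Sequence[float]],
    goal_embedding: Sequence[float],
) -> Hashable:
    """Pick the action with the highest alignment score."""
    scores = daf_scores(displacements, goal_embedding)
    return max(scores, key=scores.get)
```

In the paper's setting the displacements come from a learned action-effect model; here they are plain vectors so the scoring step can be read in isolation.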

2606.04182 2026-06-04 cs.LG cs.AI stat.ML 版本更新

Exact Unlearning in Reinforcement Learning

强化学习中的精确遗忘

Thanh Nguyen-Tang, Raman Arora

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 本文提出强化学习中的精确遗忘问题,通过ρ-TV稳定算法实现数据删除后输出与从未学习该数据时不可区分,并给出近乎最优的遗憾界。

Comments ICML Spotlight

详情
AI中文摘要

我们提出了强化学习中的精确遗忘问题,目标是设计一个高效框架,使得在收到删除请求后能够移除任何用户的数据,即遗忘后在线学习者的输出与被删除用户从未与学习者交互时本应产生的输出不可区分。对于任意 $ρ>0$,我们证明存在一个 $ρ$-TV 稳定的强化学习算法,支持精确遗忘过程,其期望计算成本仅占从头重新训练计算成本的 $ρ\sqrt{\ln T}$ 比例。我们为表格型马尔可夫决策过程构造了这样一个 $ρ$-TV 稳定的强化学习算法,其遗憾界为 $\mathcal{O}(H^2 \sqrt{SAT} + H^3 S^2 A + {H^{2.5} S^2 A}/ρ)$,其中 $S, A, H, T$ 分别表示状态数、动作数、回合长度和回合数。我们还为 $ρ$-TV 稳定的强化学习算法建立了 $\Omega(H\sqrt{SAT} + {SAH}/ρ)$ 的下界,表明我们的算法几乎是极小化最优的。

英文摘要

We formulate the problem of exact unlearning in reinforcement learning, where the goal is to design an efficient framework that enables the removal of any user's data upon deletion request, i.e., the online learner's output after unlearning is indistinguishable from what would have been produced had the deleted user never interacted with the learner. For any $ρ>0$, we show that there exists a reinforcement learning (RL) algorithm that is $ρ$-TV-stable and supports an exact unlearning procedure whose expected computational cost is only a $ρ\sqrt{\ln T}$ fraction of the computational cost of retraining from scratch. We construct such a $ρ$-TV-stable RL algorithm for tabular Markov decision processes (MDPs), which achieves a regret bound of $\mathcal{O}(H^2 \sqrt{SAT} + H^3 S^2 A + {H^{2.5} S^2 A}/ρ)$, where $S, A, H$, and $T$ denote the number of states, the number of actions, the episode horizon, and the number of episodes, respectively. We also establish a lower bound of $\Omega(H\sqrt{SAT} + {SAH}/ρ)$ for $ρ$-TV-stable RL algorithms, showing that our algorithm is nearly minimax optimal.
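
The trade-off governed by ρ can be made tangible with a small calculation. The sketch below evaluates the stated unlearning-cost fraction and the three terms of the regret upper bound with all hidden constants set to 1, which is an assumption made purely for illustration; the function names are hypothetical.

```python
import math
from typing import Tuple


def unlearning_cost_fraction(rho: float, T: int) -> float:
    """Expected unlearning cost as a fraction of retraining cost: rho * sqrt(ln T)."""
    return rho * math.sqrt(math.log(T))


def regret_upper_bound_terms(S: int, A: int, H: int, T: int,
                             rho: float) -> Tuple[float, float, float]:
    """Terms of the regret upper bound, ignoring the constants hidden by the O-notation."""
    leading = H ** 2 * math.sqrt(S * A * T)
    burn_in = H ** 3 * S ** 2 * A
    stability = H ** 2.5 * S ** 2 * A / rho  # the price paid for rho-TV-stability
    return leading, burn_in, stability
```

With ρ = 0.01 and T = 10^6 episodes the cost fraction comes out at roughly 0.037, i.e. unlearning costs under 4% of retraining in expectation, while halving ρ doubles the stability term of the regret bound. The fraction is only informative when it is below 1.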

2606.04177 2026-06-04 cs.CL cs.AI 版本更新

A Systematic Analysis of Linguistic Features in AI-Generated Text Detection Across Domains and Models

跨领域与模型的人工智能生成文本检测中语言特征的系统分析

Yassir El Attar, Esra Dönmez, Maximilian Maurer, Agnieszka Falenska

发表机构 * Institute for Natural Language Processing, University of Stuttgart(斯图加特大学自然语言处理研究所) Interchange Forum for Reflecting on Intelligent Systems, University of Stuttgart(斯图加特大学智能系统反思交流论坛) GESIS Leibniz Institute for the Social Sciences(GESIS莱布尼茨社会科学研究所) Heinrich-Heine University Düsseldorf(杜塞尔多夫海因里希-海涅大学)

AI总结 通过大规模实证研究,系统评估284个可解释语言特征在27个LLM和10个文本领域中的鲁棒性,发现词汇丰富度是跨模型和领域的最可靠信号。

Comments preprint

详情
AI中文摘要

可解释的语言特征为解释给定文本为何看似机器生成提供了一种有前景的方法,尤其对于非专业用户。然而,关于哪些特征可靠地指示LLM生成文本的现有发现仍然分散在不同的特征集、模型和文本领域中。为解决这一差距,我们进行了一项大规模实证研究,评估语言信号在表征AI生成文本方面的鲁棒性。我们的分析涵盖了来自27个LLM和十个文本领域的输出中的284个可解释语言特征,并在跨模型和跨领域泛化设置下进行。我们表明,仅基于语言特征的分类器可以可靠地区分AI生成文本和人类撰写文本。然而,许多先前提出的指标被证明高度依赖上下文,但词汇丰富度指标除外,这些指标在模型家族和文本领域中保持鲁棒信号。这些结果展示了哪些语言信号在上下文中泛化,并为更可靠、可解释的AI生成语言分析提供了基础。

英文摘要

Interpretable linguistic features offer a promising approach for explaining why a given text appears machine-generated, particularly for non-expert users. However, existing findings on which features reliably indicate LLM-generated text remain fragmented across feature sets, models, and text domains. To address this gap, we conduct a large-scale empirical study assessing the robustness of linguistic signals for characterizing AI-generated text. Our analysis covers 284 interpretable linguistic features across outputs from 27 LLMs and ten text domains under cross-model and cross-domain generalization settings. We show that classifiers based solely on linguistic features can reliably distinguish AI-generated from human-written text. However, many previously proposed indicators prove strongly context-dependent, with the exception of measures of lexical richness, which remain robust signals across model families and text domains. These results demonstrate which linguistic signals generalize across contexts and provide a foundation for more reliable, interpretable analyses of AI-generated language.

2606.04171 2026-06-04 cs.CR cs.AI cs.LG 版本更新

MimeLens: Position-Agnostic Content-Type Detection for Binary Fragments

MimeLens: 二进制片段的位置无关内容类型检测

Michael J. Bommarito II

AI总结 针对现有文件类型分类系统(如Magika)无法处理无头片段、随机磁盘块等非完整文件输入的问题,提出MimeLens,一种基于BERT的小型编码器家族,通过随机偏移采样训练实现位置无关的二进制内容分类,在libmagic标记数据上top-1准确率比Magika v1.1高10.7个百分点,并能从单个UDP数据包或随机磁盘块中分类。

Comments 18 pages, 2 figures, 15 tables. Models released on Hugging Face (https://huggingface.co/mjbommar); reference training code at https://github.com/mjbommar/mimelens-training

详情
AI中文摘要

文件类型分类是恶意软件分类、取证雕刻、数据包检查和存储索引等工作流程的基础。像Google的Magika这样的学习系统假设在已知偏移处访问整个文件,因此它们无法处理这些任务实际产生的许多输入,例如单个数据包负载、无头的雕刻片段、随机磁盘块或分块上传。我们引入了MimeLens,这是一个小型BERT风格编码器家族,在从每个文件内均匀随机偏移处采样的窗口的二进制内容上进行预训练,没有特权文件头位置,有标准上下文和短上下文变体。一个字节块来自文件中的任何位置,无需头部且无固定大小;输出是libmagic的125个MIME标签之一。在完整文件的干净头部上,MimeLens在libmagic标记数据上的top-1准确率比Magika v1.1高10.7个百分点,并且在Magika无法分类的地方(例如单个中间流UDP数据包)仍然能分类,在随机中间文件磁盘块上的准确率是libmagic和Magika的两倍以上。代价是延迟:在CPU上,MimeLens每个样本的运行速度大约比Magika慢一到两个数量级,但在消费级GPU或批处理中与之相当。所有训练好的检查点已在Hugging Face上发布(mjbommar/mimelens-001-*)。

英文摘要

File-type classification underlies many workflows like malware triage, forensic carving, packet inspection, and storage indexing. Learned systems such as Google's Magika assume whole-file access at a known offset, so they break on the inputs many of these tasks actually produce, like a single packet payload, a header-less carved fragment, a random disk block, or a chunked upload. We introduce MimeLens, a family of small BERT-style encoders pretrained on binary content from windows sampled at a uniformly random offset within each file, with no privileged head-of-file position, in standard- and short-context variants. A byte chunk goes in from anywhere in a file, no header needed and no fixed size; out comes one of libmagic's 125 MIME labels. On the clean head of complete files, MimeLens beats Magika v1.1 by +10.7 pp top-1 on libmagic-labeled data, and it keeps classifying where Magika cannot: from a single mid-stream UDP packet, and more than twice as accurately as libmagic and Magika on random mid-file disk blocks. The cost is latency: MimeLens runs roughly one to two orders of magnitude slower per sample on CPU than Magika, though it matches on consumer GPUs or in batch. All trained checkpoints are released on Hugging Face (mjbommar/mimelens-001-*).
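
The position-agnostic training setup hinges on drawing each window at a uniformly random offset rather than from the head of the file. A minimal sketch of that sampling step follows; the function name, the handling of files shorter than the window, and the 512-byte window in the usage note are assumptions, not details taken from the paper.

```python
import random


def sample_window(data: bytes, window_len: int, rng: random.Random) -> bytes:
    """Return a window of up to `window_len` bytes starting at a uniformly random offset."""
    if window_len <= 0:
        raise ValueError("window_len must be positive")
    if len(data) <= window_len:
        # Short inputs are returned whole; padding is left to the caller (assumed behaviour).
        return data
    # Every valid start offset, including 0, is equally likely, so the file head is not privileged.
    offset = rng.randrange(len(data) - window_len + 1)
    return data[offset:offset + window_len]
```

For example, `sample_window(file_bytes, 512, random.Random(0))` yields one training window; repeating the call with a shared generator produces windows from different parts of the same file.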

2606.04167 2026-06-04 cs.LG cs.AI 版本更新

Smart Transportation Without Neurons -- Fair Metro Network Expansion with Tabular Reinforcement Learning

无神经元的智能交通——基于表格强化学习的公平地铁网络扩展

Dimitris Michailidis, Sennay Ghebreab, Fernando P. Santos

发表机构 * Socially Intelligent Artificial Systems, University of Amsterdam(阿姆斯特丹大学社会智能人工系统研究组)

AI总结 针对地铁网络扩展问题,提出将非马尔可夫奖励决策过程与表格强化学习相结合的方法,在保证性能的同时大幅降低训练轮次和碳排放,并融入社会公平性指标。

Comments 16 pages

详情
AI中文摘要

我们解决了地铁网络扩展问题(MNEP),这是交通网络设计问题(TNDP)的一个子集,专注于扩展地铁系统以满足出行需求。传统方法依赖于精确和启发式方法,需要专家定义的约束来缩小搜索空间。最近,深度强化学习(Deep RL)因其在复杂序列决策过程中的有效性而出现,但它仍然计算成本高、环境成本高,并且需要额外的工程来解释。我们表明,MNEP问题规模足够小,不需要深度强化学习方法。将MNEP重新表述为非马尔可夫奖励决策过程(NMRDP),我们使用表格强化学习以显著更少的训练轮次实现类似的性能,此外还提供了更高的可解释性。此外,我们将社会公平标准纳入奖励函数,侧重于效率和公平性,突出了我们方法的多功能性。在现实场景中——西安和阿姆斯特丹——我们的方法平均将总轮次减少了18倍,总碳排放减少了12倍,同时与深度强化学习保持竞争力。这种方法提供了一种可复制、模块化、可解释且资源高效的解决方案,并具有应用于其他组合优化问题的潜力。

英文摘要

We tackle the Metro Network Expansion Problem (MNEP), a subset of the Transport Network Design Problem (TNDP), which focuses on expanding metro systems to satisfy travel demand. Traditional methods rely on exact and heuristic approaches that require expert-defined constraints to reduce the search space. Recently, deep reinforcement learning (Deep RL) has emerged due to its effectiveness in complex sequential decision-making processes; it remains, however, computationally expensive, environmentally costly, and requires additional engineering to interpret. We show that MNEP problems are small enough to not require Deep RL methods. Reformulating the MNEP as a Non-Markovian Rewards Decision Process (NMRDP), we use tabular RL to achieve similar performance with significantly fewer training episodes, additionally offering greater interpretability. Additionally, we incorporate social equity criteria into the reward functions, focusing on efficiency and fairness, highlighting the versatility of our method. Evaluated in two real-world settings (Xi'an and Amsterdam), our method reduces total episodes by a factor of 18 and total carbon emissions by a factor of 12 on average, while remaining competitive with Deep RL. This approach offers a replicable, modular, interpretable, and resource-efficient solution with potential applications to other combinatorial optimization problems.

2606.04164 2026-06-04 cs.LG cs.AI 版本更新

ADAPTOOD: Uncertainty-Aware Fine-Tuning for Out-of-Distribution ECG Time Series Models

ADAPTOOD:面向分布外心电图时间序列模型的不确定性感知微调

Sotirios Vavaroutas, Yu Yvonne Wu, Ali Etemad, Cecilia Mascolo

发表机构 * University of Cambridge(剑桥大学) Dartmouth College(达特茅斯学院) Queen’s University(皇后大学)

AI总结 提出ADAPTOOD框架,利用数据不确定性量化分布偏移严重性,结合低秩更新和自适应超参数优化,在分布外心电图时间序列任务上提升准确率高达7%和精确率12.9%。

Comments 11 pages

详情
AI中文摘要

用于训练的数据样本通常与微调和部署期间遇到的数据不同,尽管机器学习模型显示出潜力,但在只有少量标注数据集可用时,其性能仍然有限。在由不同传感器、人群和应用设置引起的分布偏移下,性能通常会下降。尽管预训练有所帮助,但模型在现实环境中经常遇到分布外(OOD)数据,导致鲁棒性降低。现有的自适应方法通常假设固定的分布偏移,并在出现多种类型或严重性时难以应对。特别是,它们忽略了偏移的严重性,例如将适应大型熟悉数据集与适应带有新任务的小型数据集同等对待,这限制了泛化能力。为了解决这个问题,我们提出了ADAPTOOD,这是一个新颖的框架,利用数据不确定性来量化分布偏移的严重性并指导时间序列的微调。这种不确定性衡量目标部署分布中的样本与预训练分布偏离的程度,提供了OOD严重性的直接信号。我们的框架将这种不确定性与低秩模型更新和自适应超参数优化相结合,以改进自适应。我们表明,在OOD任务中,ADAPTOOD比现有方法实现了高达7%的准确率和12.9%的精确率提升,在分布偏移严重性增加时仍保持强劲性能。

英文摘要

Data samples used for training often differ from those encountered during fine-tuning and deployment, and while ML models show promise, their performance remains limited when only small annotated datasets are available. Performance often degrades under distribution shifts caused by diverse sensors, populations, and application settings. Although pre-training helps, models frequently encounter out-of-distribution (OOD) data in real-world settings, leading to reduced robustness. Existing adaptation methods usually assume fixed distribution shifts and struggle when multiple types or severities occur. In particular, they overlook shift severity, for example treating adaptation to a large familiar dataset the same as adaptation to a small dataset with a new task, which limits generalisation. To address this, we propose ADAPTOOD, a novel framework that leverages data uncertainty to quantify distribution shift severity and guide fine-tuning for time series. This uncertainty measures how strongly samples from the target deployment distribution deviate from the pre-training distribution, providing a direct signal of OOD severity. Our framework combines this uncertainty with low-rank model updates and adaptive hyperparameter optimisation to improve adaptation. We show that ADAPTOOD achieves up to 7% higher accuracy and 12.9% higher precision than existing methods in OOD tasks, maintaining strong performance as distribution shift severity increases.

2606.04152 2026-06-04 cs.AI cs.CY 版本更新

Thinking Through Signs: PEEL as a Semiotic Scaffolding for Epistemically Accountable AI-Enabled Research

通过符号思考:PEEL作为认知可问责的AI赋能研究的符号脚手架

Clarisse de Souza, Gabriel Barbosa, Simone Diniz Junqueira Barbosa, Bárbara Betts, Renato Cerqueira, Juliana Jansen Ferreira

发表机构 * PUC-Rio(里约热内卢天主教大学) PUC-Behring Institute of Artificial Intelligence(PUC-Behring人工智能研究所)

AI总结 本文提出PEEL框架,结合Voyant Tools的确定性远读与Claude的LLM解释,基于皮尔斯符号学和溯因推理,揭示AI生成摘要中的系统性扭曲,并得出三项设计启示。

Comments 10 pages, 5 figures

详情
AI中文摘要

大型语言模型正在重塑研究实践,同时悄然侵蚀研究者的认知可问责性。本评论文章介绍了PEEL(AI中认知参与素养的协议),这是一个工作脚手架,它结合了通过Voyant Tools进行的确定性远读和通过Claude进行的LLM解释,基于皮尔斯符号学和溯因推理。应用于三个源文本的AI生成浓缩版本,PEEL揭示了在没有非AI测量的情况下不可见的数量、词频和认知声音的系统性扭曲,并产生了三项设计启示:确定性工具必须伴随AI工具;流畅性不等于保真度;认知权威必须被设计进来,而不是被假定。

英文摘要

Large language models are reshaping research practice while quietly eroding researchers' epistemic accountability. This commentary introduces PEEL (Protocols for Epistemically Engaged Literacy in AI), a working scaffolding that combines deterministic distant reading via Voyant Tools with LLM interpretation via Claude, grounded in Peircean semiotics and abductive reasoning. Applied to AI-generated condensations of three source texts, PEEL reveals systematic distortions in quantity, term frequency, and epistemic voice that are invisible without non-AI measurement, and yields three design implications: deterministic instruments must accompany AI tools; fluency is not fidelity; epistemic authority must be designed in, not assumed.

2606.04150 2026-06-04 cs.AI cs.HC 版本更新

Stumbling Into AI Emotional Dependence: How Routine AI Interactions Reshape Human Connection

偶然陷入AI情感依赖:日常AI互动如何重塑人际关系

Yaoxi Shi, Cathy Mengying Fang, Pattie Maes, Amit Goldenberg

发表机构 * Imperial College Business School(帝国理工学院商学院) Harvard Business School AI Institute(哈佛商学院人工智能研究所) MIT Media Lab(麻省理工学院媒体实验室) Harvard Business School(哈佛商学院) Harvard Department of Psychology(哈佛大学心理学系)

AI总结 本文通过实证研究,揭示AI情感支持通常在日常任务导向的互动中偶然产生,且这种路径依赖会改变人们对AI情感能力的信念,导致对AI的偏好增加、对人类的偏好减少。

详情
AI中文摘要

公共讨论和新兴政策通常假设AI情感支持是一种有意的行为:孤独的用户有意识地寻求专用伴侣聊天机器人的安慰。在本文中,我们基于新兴的实证证据,认为这种描述在两个层面上不准确,既涉及AI情感支持的产生方式,也涉及它如何塑造未来行为。首先,AI情感支持通常是在通用平台上的任务导向互动中偶然产生的,就像工作场所的友谊通过合作加深一样。其次,这些偶然遭遇是路径依赖的:对AI情感支持的积极体验会更新人们对AI情感能力的信念,并改变他们未来寻求情感支持的选择,增加对AI的偏好,减少对人类的偏好。我们回顾了最近的证据,包括与OpenAI合作进行的一项大规模纵向研究,该研究显示,每天与AI进行五分钟关于个人问题的对话,持续28天,导致寻求人类支持的偏好下降10.3%,对AI的偏好上升11.6%。这些发现表明,当前专注于伴侣应用和孤立互动的政策无法充分保护人际关系。相反,有效的监管应扩展到通用AI系统,并解决人们寻求支持方式的累积性、轨迹层面的变化。认识到人们如何偶然陷入AI情感支持,以及这些遭遇如何随时间重塑人际关系,对于保障人类福祉至关重要。

英文摘要

Public discourse and emerging policy typically assume that AI emotional support is a deliberate act: a lonely user consciously seeking comfort from a dedicated companion chatbot. In this paper, we draw on emerging empirical evidence and argue that this picture is inaccurate on two accounts, both in how AI emotional support arises and how it shapes future behavior. First, AI emotional support commonly emerges incidentally within task-oriented interactions on general-purpose platforms, much as workplace friendships deepen through collaboration. Second, these incidental encounters are path-dependent: positive experiences of AI emotional support update people's beliefs about AI's emotional capabilities and redirect their choices for future emotional support, increasing preference for AI and decreasing preference for humans. We review recent evidence, including a large-scale longitudinal study conducted in collaboration with OpenAI, showing that daily five-minute conversations with an AI about personal issues over 28 days led to a 10.3% decrease in the preference for seeking support from humans and an 11.6% increase in the preference for AI. These findings suggest that current policy, focused on companion apps and isolated interactions, cannot adequately protect human connection. Instead, effective regulations should extend to general-purpose AI systems and address cumulative, trajectory-level changes in how people seek support. Recognizing how people stumble into AI emotional support and how those encounters redirect human connections over time is essential to safeguarding human well-being.

2606.04143 2026-06-04 cs.LG cs.AI 版本更新

Physics-Informed Machine Learning for Short-Term Flood Prediction

物理信息机器学习用于短期洪水预测

Tewodros Syum Gebre, Jagrati Talreja, Leila Hashemi-Beni

发表机构 * IEEE Service Center(IEEE服务中心) National Science Foundation(国家科学基金会) Microsoft(微软)

AI总结 提出一种物理信息机器学习框架,通过将水文知识作为趋势对齐约束嵌入LSTM损失函数,在数据稀缺和极端天气下提升洪水预测的物理一致性和可靠性。

Comments This paper has been accepted for publication in IGARSS 2026. The final authenticated version will be available through IEEE Xplore

详情
AI中文摘要

准确的洪水预测对于减轻灾害风险和保护社区至关重要。然而,纯数据驱动的机器学习模型在数据稀缺环境中常常表现不佳,并可能违反基本的水文原理。标准长短期记忆(LSTM)网络可能产生物理上不一致的预测,特别是在外推到极端天气条件时。为了解决这些限制,我们提出了一种物理信息机器学习(PIML)框架,将水文知识直接纳入LSTM模型的损失函数中。具体来说,趋势对齐约束惩罚降水与流量趋势之间的方向不一致性,从而在不需复杂水动力学方程的情况下提高模型鲁棒性。这种正则化鼓励模型学习物理上合理的水文过程线行为,即使在训练数据有限的情况下,也能增强峰值洪水事件期间的可靠性。实验结果表明,所提出的物理信息模型在数据稀缺环境下优于标准LSTM基线,当仅使用5%的可用数据训练时,纳什-萨特克利夫效率(NSE)从0.20提高到0.23。在模拟极端气候情景下的额外压力测试表明,基线模型表现出不稳定的行为,而物理信息模型保持了方向一致性和物理合理性。尽管在数据有限的情况下准确预测极端峰值幅度仍然具有挑战性,但所提出的方法显著减少了纯数据驱动模型中常见的非物理波动。这些发现表明,简单的物理约束可以显著提高深度学习模型在实时洪水预测中的可靠性,为无测站流域和不断变化的气候条件提供了实用解决方案。

英文摘要

Accurate flood forecasting is essential for mitigating disaster risks and protecting communities. However, purely data-driven machine learning models often struggle in data-scarce environments and may violate fundamental hydrological principles. Standard Long Short-Term Memory (LSTM) networks can generate physically inconsistent predictions, particularly when extrapolating to extreme weather conditions. To address these limitations, we propose a Physics-Informed Machine Learning (PIML) framework that incorporates hydrological knowledge directly into the loss function of an LSTM model. Specifically, a Trend Alignment constraint penalizes directional inconsistencies between precipitation and discharge trends, improving model robustness without requiring complex hydrodynamic equations. This regularization encourages the model to learn physically plausible hydrograph behavior, even with limited training data, while enhancing reliability during peak flood events. Experimental results show that the proposed physics-informed model outperforms a standard LSTM baseline in data-scarce settings, increasing the Nash-Sutcliffe Efficiency (NSE) from 0.20 to 0.23 when trained on only 5% of the available data. Additional stress tests under simulated extreme climate scenarios demonstrate that the baseline model exhibits unstable behavior, whereas the physics-informed model maintains directional consistency and physical plausibility. Although accurately predicting extreme peak magnitudes remains challenging with limited data, the proposed approach substantially reduces unphysical fluctuations common in purely data-driven models. These findings demonstrate that simple physical constraints can significantly improve the reliability of deep learning models for real-time flood forecasting, offering a practical solution for ungauged basins and evolving climate conditions.
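The Trend Alignment idea above can be sketched in a few lines of plain Python. The abstract does not state the exact loss, so the hinge-style penalty, the weight `lam`, the same-time-step comparison (no rainfall-runoff lag), and all function names below are assumptions made for illustration only. The `nse` helper follows the standard Nash-Sutcliffe Efficiency definition.

```python
def diffs(xs):
    """First differences of a sequence: the step-to-step trend."""
    return [b - a for a, b in zip(xs, xs[1:])]


def mse(pred, obs):
    """Mean squared error between predictions and observations."""
    return sum((p - o) ** 2 for p, o in zip(pred, obs)) / len(obs)


def trend_penalty(precip, pred_discharge):
    """Penalize steps where precipitation and predicted discharge move in
    opposite directions (assumed hinge form: max(0, -dP * dQ))."""
    dp = diffs(precip)
    dq = diffs(pred_discharge)
    terms = [max(0.0, -(a * b)) for a, b in zip(dp, dq)]
    return sum(terms) / len(terms)


def physics_informed_loss(pred, obs, precip, lam=0.1):
    """Data-fit term plus a weighted trend-alignment regularizer.
    `lam` is a hypothetical hyperparameter, not taken from the paper."""
    return mse(pred, obs) + lam * trend_penalty(precip, pred)


def nse(pred, obs):
    """Nash-Sutcliffe Efficiency: 1 is a perfect fit, 0 matches the
    skill of always predicting the observed mean."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - p) ** 2 for p, o in zip(pred, obs))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den
```

In this sketch the penalty is zero whenever the two trends agree in sign, so it only acts on directionally inconsistent predictions and leaves the data-fit term in charge of magnitudes.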

2606.04141 2026-06-04 cs.CR cs.AI 版本更新

Caught in the Act(ivation): Toward Pre-Output and Multi-Turn Detection of Credential Exfiltration by LLM Agents

当场抓获(激活):面向LLM智能体的凭证泄露预输出和多轮检测

Kargi Chauhan, Pratibha Revankar

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 研究通过激活探针、蜜罐令牌和累积信息流追踪三种互补防御方法,在预输出和多轮对话中检测LLM智能体的凭证泄露。

详情
AI中文摘要

LLM智能体通常将敏感凭证与不可信检索内容置于同一上下文窗口中,为间接提示注入诱导凭证泄露提供了直接途径。我们通过三种互补防御研究这种失效模式。首先,我们探究激活探针能否在输出令牌发出前检测凭证访问。其次,我们从格式特定的字符模型构建蜜罐令牌,并使用分裂共形预测校准检测。第三,我们将多轮泄露视为累积信息流问题,并跨对话轮次追踪估计的泄露预算。在开放权重模型的受控实验中,激活特征能够高精度区分良性提示和凭证窃取提示,包括在保留编码变换下。在一个小型合成多轮测试集中,累积核算检测到了每轮检测器遗漏的攻击。这些结果是初步的:多轮基准测试为内部小型数据集,激活方法需要白盒访问,信息估计器提供的是实用信号而非正式上界。尽管如此,结果表明凭证泄露防御应结合预输出监控、校准的金丝雀检测和时序泄露核算,而非仅依赖文本级输出过滤器。

英文摘要

LLM agents often place sensitive credentials in the same context window as untrusted retrieved content, creating a direct path for indirect prompt injection to induce credential exfiltration. We study this failure mode through three complementary defenses. First, we ask whether activation probes can detect credential access before output tokens are emitted. Second, we construct honeytokens from format-specific character models and calibrate detection with split conformal prediction. Third, we treat multi-turn exfiltration as a cumulative information-flow problem and track an estimated leakage budget across conversation turns. In controlled experiments on open-weight models, activation features separate benign and credential-seeking prompts with high accuracy, including under held-out encoding transformations. In a small synthetic multi-turn suite, cumulative accounting detects attacks that per-turn detectors miss. These results are preliminary: the multi-turn benchmark is in-house and small, the activation method requires white-box access, and the information estimator provides a practical signal rather than a formal upper bound. Still, the results suggest that credential-exfiltration defenses should combine pre-output monitoring, calibrated canary detection, and temporal leakage accounting rather than relying only on text-level output filters.
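To make the cumulative-accounting idea concrete, here is a minimal Python sketch of a leakage budget tracked across turns. Everything in it is hypothetical: the secret, the thresholds, the 6-bits-per-character figure, and the longest-shared-substring estimator are stand-ins chosen for illustration, not the paper's estimator. Summing per-turn estimates can also double count repeated fragments, which a real system would need to handle.

```python
from dataclasses import dataclass
from typing import Tuple


def longest_shared_run(secret: str, text: str) -> int:
    """Length of the longest contiguous piece of `secret` found in `text`."""
    best = 0
    n = len(secret)
    for i in range(n):
        # Only try runs longer than the best seen so far.
        for j in range(i + best + 1, n + 1):
            if secret[i:j] in text:
                best = j - i
            else:
                break
    return best


def estimate_leak_bits(secret: str, text: str, bits_per_char: float = 6.0) -> float:
    """Toy leakage estimate: revealed characters times an assumed entropy."""
    return longest_shared_run(secret, text) * bits_per_char


@dataclass
class LeakageTracker:
    per_turn_limit_bits: float  # what a per-turn detector would flag
    budget_bits: float          # cumulative budget across the conversation
    total_bits: float = 0.0

    def observe(self, bits: float) -> Tuple[bool, bool]:
        """Return (per_turn_alarm, cumulative_alarm) after this turn."""
        self.total_bits += bits
        return bits > self.per_turn_limit_bits, self.total_bits > self.budget_bits


if __name__ == "__main__":
    secret = "sk-ABCD1234EFGH"  # hypothetical credential
    turns = ["part one sk-A", "part two BCD1", "part three 234E", "part four FGH"]
    tracker = LeakageTracker(per_turn_limit_bits=30.0, budget_bits=60.0)
    for turn in turns:
        print(tracker.observe(estimate_leak_bits(secret, turn)))
```

Each turn in the example leaks only a small fragment, so the per-turn alarm never fires, while the running total crosses the budget partway through the conversation.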

2606.04126 2026-06-04 cs.AR cs.AI cs.SE 版本更新

HighTide: An Agent-Curated Open-Source VLSI Benchmark Suite

HighTide:一个由智能体策划的开源VLSI基准测试套件

Benjamin Goldblatt, Paolo Pedroso, Farhad Modaresi, Ethan Sifferman, Matthew R. Guthaus

发表机构 * University of California, Santa Cruz(加州大学圣克鲁兹分校)

AI总结 提出HighTide,一个由AI辅助策划的开源VLSI基准测试套件,通过12种智能体技能覆盖设计生命周期,并集成Bazel增量编译和远程缓存。

详情
AI中文摘要

我们介绍HighTide,一个不断演进的AI辅助基准测试套件。具体贡献包括:(i) 一个涵盖多种设计语言和技术节点的多样化开源套件,(ii) 基于Bazel的增量RTL到GDS编译,支持远程缓存,(iii) 通过十二种智能体技能进行AI辅助设计策划,覆盖设计生命周期、流程优化、工具参考和元维护,并配有每个设计的决策日志,作为跨套件调优理由的长期记忆,以及(iv) 一个包含RTL编译验证的基础设施,用于稳定发布。该套件公开可用,并旨在与开源硬件生态系统共同成长。

英文摘要

We introduce HighTide, an evolving AI-assisted benchmark suite. Specifically, the contributions are: (i) a diverse open-source suite spanning multiple design languages and technology nodes, (ii) Bazel-based incremental RTL-to-GDS compilation with remote caching, (iii) AI-assisted design curation through twelve agent skills covering the design lifecycle, flow optimization, tool reference, and meta-maintenance, backed by per-design decision logs that serve as long-term memory of tuning rationale across the suite, and (iv) an infrastructure with RTL compilation verification for stable releases. The suite is publicly available and designed to grow with the open-source hardware ecosystem.

2606.04123 2026-06-04 math.OC cs.AI cs.RO 版本更新

Semantic Constraint Synthesis for Adaptive Trajectory Optimization via Large Language Models

基于大语言模型的语义约束综合用于自适应轨迹优化

Eleanor Brosius, Yuji Takubo, Daniele Gammelli, Simone D'Amico, Marco Pavone

发表机构 * Stanford University(斯坦福大学)

AI总结 提出利用大语言模型将自然语言描述的任务需求转化为可执行的轨迹优化代码和数学公式,在航天器交会场景中实现了从语义需求重构凸轨迹优化问题的高成功率。

Comments 7 pages, 4 figures, Presented as a short paper at IEEE CVPR 2026, AI4Space Workshop

详情
AI中文摘要

轨迹优化是实现太空探索中安全可靠自主操作的关键组成部分。随着太空任务在频率、复杂性和范围上的增加,迫切需要快速制定数学上合理的轨迹优化问题,以准确反映任务目标和操作约束。然而,将任务意图转化为易于处理的轨迹优化分析公式需要大量的领域专业知识。本文提出一个框架,利用大语言模型(LLMs)将任务需求和约束的自然语言描述转化为可执行的轨迹优化代码及相应的数学公式。在航天器交会场景中的实验表明,从语义任务需求重构凸轨迹优化问题具有高成功率。最终,这项工作凸显了LLMs在连接高层意图与形式化优化模型方面的潜力,从而实现更灵活高效的航天器轨迹设计。

英文摘要

Trajectory optimization is a critical component for enabling safe and reliable autonomous operations in space exploration. As space missions increase in frequency, complexity, and scope, there is a growing need to rapidly formulate mathematically sound trajectory optimization problems that accurately reflect mission objectives and operational constraints. However, translating mission intent into tractable analytical formulations for trajectory optimization requires substantial domain expertise. This paper presents a framework that leverages large language models (LLMs) to translate natural language descriptions of mission requirements and constraints into executable trajectory optimization code and corresponding mathematical formulations. Experiments in spacecraft rendezvous scenarios demonstrate a high success rate in reconstructing a convex trajectory optimization problem from semantic mission requirements. Ultimately, this work highlights the potential of LLMs to bridge high-level intent and formal optimization models, enabling more flexible and efficient trajectory design of spacecraft.

2606.04120 2026-06-04 cs.CL cs.AI 版本更新

SaliMory: Orchestrating Cognitive Memory for Conversational Agents

SaliMory: 为对话代理编排认知记忆

Kai Zhang, Xinyuan Zhang, Hongda Jiang, Shiun-Zu Kuo, Hyokun Yun, Ejaz Ahmed, Shereen Oraby, Ziyun Li, Sanat Sharma, Ann Lee, Ahmed A Aly, Anuj Kumar, Raffay Hamid, Xin Luna Dong

发表机构 * Meta Reality Labs(Meta现实实验室)

AI总结 提出SALIMORY框架,通过层级阶段过程奖励和奖励分解对比优化,端到端训练单一语言模型管理认知结构记忆,显著降低记忆相关错误并提升个性化表现。

详情
AI中文摘要

作为终身伴侣的对话代理必须在所有交互中保持持久记忆。然而,简单地用原始检索扩展上下文窗口会降低推理质量,而通过标准强化学习训练记忆代理在多阶段流程中会造成严重的信用分配瓶颈。为解决这一问题,我们引入了SALIMORY,一个训练单一语言模型管理认知结构记忆(涵盖用户事实、偏好和工作记忆)的框架。通过引入层级阶段过程奖励和奖励分解对比优化,SALIMORY为不同的记忆操作(选择性过滤、整合和线索驱动回忆)提供端到端的隔离监督。SALIMORY将记忆相关故障减少了三分之一,端到端准确率比最先进方法高出10%以上,良好个性化率提高了一倍多。

英文摘要

Conversational agents that serve as lifelong companions must maintain persistent memory across all interactions. However, simply expanding context windows with raw retrieval degrades reasoning quality, while training memory agents via standard reinforcement learning creates a severe credit assignment bottleneck in a multi-stage pipeline. To solve this, we introduce SALIMORY, a framework that trains a single language model to manage a cognitively structured memory spanning user facts, preferences, and working memory. By introducing a hierarchical stage-wise process reward and reward-decomposed contrastive refinement, SALIMORY provides isolated supervision for distinct memory operations (selective filtering, consolidation, and cue-driven recall) end-to-end. SALIMORY cuts memory-attributed failures by one-third, outperforms the state-of-the-art by over 10% in end-to-end accuracy, and more than doubles the Good Personalization rate.

2606.04115 2026-06-04 cs.LG cs.AI 版本更新

dMX: Differentiable Mixed-Precision Assignment for Low-Precision Floating-Point Formats

dMX: 低精度浮点格式的可微分混合精度分配

Giuseppe Franco, Ian Colbert, Pablo Monteagudo-Lago, Felix Marty, Nicholas Fraser

发表机构 * AMD

AI总结 提出可微分混合精度量化框架 dMX,通过连续优化每层浮点格式参数并配合退火调度和正则化项,实现硬件兼容的 MXFP 格式分配,在 LLM 上取得帕累托最优效果。

详情
AI中文摘要

将大型语言模型(LLM)量化为低精度浮点表示是高效部署的关键,然而在所有层上统一应用单一比特宽度在性能和准确性方面均非最优。本文介绍 dMX,一种用于可学习浮点比特宽度分配的可微分混合精度量化框架。我们研究了其在开放计算项目(OCP)标准定义的微缩放浮点(MXFP)数据类型家族上的应用。每层比特宽度分配被表述为一个连续优化问题,其中每层的浮点格式由一个标量参数参数化,将多变量设计空间折叠为单个可学习偏移量。在训练过程中,该偏移量取连续值,避免了离散量化格式之间的突然振荡。基于温度的退火调度逐步离散化学习到的偏移量,确保最终配置映射到硬件兼容的 MXFP 格式,而不会在训练和推理行为之间出现突变。目标感知正则化项将平均比特宽度引导至用户指定的预算,作为推理成本的粗粒度代理,平衡模型质量与部署效率。我们在不同 LLM 家族(如 Llama、Qwen3 和 SmolLM2)上进行了实验,评估了 WikiText-2 上的困惑度和四个零样本推理基准上的准确率。在这些设置中,dMX 一致地产生帕累托主导模型,并优于基于 Kullback-Leibler(KL)散度的层选择启发式方法,有效导航模型质量与平均比特宽度之间的权衡。

英文摘要

Quantizing large language models (LLMs) to low-precision floating-point representations is central to efficient deployment, yet applying a single bit-width uniformly across all layers is sub-optimal in terms of both performance and accuracy. This work introduces dMX, a differentiable mixed-precision quantization framework for learnable floating-point bit-width assignment. We study its application for the microscaling floating-point (MXFP) family of data types defined by the Open Compute Project (OCP) standard. The per-layer bit-width assignment is formulated as a continuous optimization problem in which each layer's floating-point format is parameterized by a scalar parameter, folding the multi-variate design space into a single learnable offset. During training, this offset takes continuous values, avoiding sudden oscillations between discrete quantization formats. A temperature-based annealing schedule progressively discretizes the learned offsets, ensuring that the final configuration maps to hardware-compatible MXFP formats without abrupt transitions between training and inference behavior. A target-aware regularization term steers the average bit-width toward a user-specified budget, serving as a coarse-grained proxy for inference cost and balancing model quality against deployment efficiency. We performed experiments on different families of LLMs, such as Llama, Qwen3, and SmolLM2, evaluating perplexity on WikiText-2 and accuracy on four zero-shot reasoning benchmarks. Across these settings, dMX consistently yields Pareto-dominating models and improves over Kullback-Leibler (KL) divergence-based layer-selection heuristics, efficiently navigating trade-offs between model quality and average bit-width.

2606.04111 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

AgenticDiffusion: Agentic Diffusion-based Path Planning for Vision-Based UAV Navigation

AgenticDiffusion:基于智能体扩散的视觉无人机导航路径规划

Faryal Batool, Muhammad Ahsan Mustafa, Fawad Mehboob, Valerii Serpiva, Dzmitry Tsetserukou

发表机构 * University of Engineering and Technology, Lahore(拉合尔工程与技术大学)

AI总结 提出AgenticDiffusion多视角无人机导航框架,结合语言引导推理、开放词汇目标定位、视觉扩散规划与NMPC,通过协调第一人称和俯视图提升室内导航效率,在40次真实实验中实现80%任务成功率。

详情
AI中文摘要

室内无人机导航需要在有限视场观测下实现高效探索、场景理解和可靠轨迹执行。现有的基于视觉的导航框架通常依赖单视角观测,限制了其对遮挡、目标可见性和全局场景结构的推理能力。在这项工作中,我们提出了AgenticDiffusion,一个多视角无人机导航框架,在统一的空中导航流程中协调语言引导推理、开放词汇目标定位、基于视觉的扩散规划以及NMPC。给定自然语言指令和同步的第一人称视角(FPV)与俯视图观测,该框架在轨迹执行前确定最具信息量的导航视角并生成任务计划。使用开放词汇定位模型定位目标后,特定视角的扩散规划器生成用于无人机执行的导航轨迹。通过互补视角,所提框架减少了重复目标探索,并提高了在杂乱室内环境中的导航效率。该框架在四个真实无人机导航场景中进行了验证,涉及自适应视角选择、多阶段任务执行、长时域导航和安全着陆点选择。实验结果表明,在40次真实试验中,总体任务成功率达到80%,而扩散规划器实现了100%的轨迹生成成功率。

英文摘要

Indoor UAV navigation requires efficient exploration, scene understanding, and reliable trajectory execution under limited field-of-view observations. Existing vision-based navigation frameworks typically rely on single-view observations, limiting their ability to reason about occlusions, target visibility, and global scene structure. In this work, we propose AgenticDiffusion, a multi-view UAV navigation framework that coordinates language-guided reasoning, open-vocabulary target grounding, vision-based diffusion planning, and NMPC within a unified aerial navigation pipeline. Given a natural language instruction and synchronized first-person-view (FPV) and top-view observations, the framework determines the most informative viewpoint for navigation and generates a mission plan prior to trajectory execution. The targets are localized using an open-vocabulary grounding model, after which viewpoint-specific diffusion planners generate navigation trajectories for UAV execution. Using complementary viewpoints, the proposed framework reduces repeated target exploration and improves navigation efficiency in cluttered indoor environments. The framework was validated in four real-world UAV navigation scenarios involving adaptive viewpoint selection, multi-stage mission execution, long-horizon navigation, and safe landing-site selection. The experimental results demonstrated an overall mission success rate of 80% in 40 real-world trials, while the diffusion planners achieved a trajectory generation success rate of 100%.

2606.04108 2026-06-04 cs.GR cs.AI cs.CV cs.LG 版本更新

SymTRELLIS: Symmetry-Enforced Voxel Latents for 3D Generation

SymTRELLIS: 对称性增强的体素潜变量用于3D生成

Guangda Ji, Qimin Chen, Qinchan Li, Mingrui Zhao, Kai Wang, Hao Zhang

发表机构 * Simon Fraser University(西蒙 Fraser大学)

AI总结 提出SymTRELLIS方法,通过在流模型生成过程中对预测速度进行对称化平均,强制任意有限点群对称性,无需重新训练VAE或流模型,显著降低对称性误差。

详情
AI中文摘要

单视图3D生成模型已取得令人印象深刻的视觉质量,但它们并非为满足结构或功能需求而设计,在实践中常常存在不足。对称性就是这样一个需求:违反对称性,即使是微小的违反,也可能使模型在物理上不可用。我们提出SymTRELLIS,一种在TRELLIS.2的基于流的3D生成过程中强制任意有限点群对称性(旋转、反射和多面体对称)的方法,无需重新训练底层的VAE或流模型。我们的关键思想是将空间变换在潜空间中的作用近似为体素潜变量上的学习线性算子,通过一个轻量级的空间变换潜映射器实现,该映射器在通用的非对称3D数据上训练。在生成时,我们通过在每一步ODE中对所有对称等价变换的预测流速度进行平均来强制对称性,这一过程称为速度对称化。对称性规格可以从初始TRELLIS.2生成中自动估计,或由用户提供,从而实现超越输入图像暗示的刻意折叠操作。在一个包含266个严格对称物体的基准测试上(涵盖2到20倍旋转和多面体对称群),与TRELLIS.2、Hunyuan3D-2.1和TripoSG相比,SymTRELLIS显著降低了所有对称性误差指标,同时保持了与基础模型相当的重建精度。

英文摘要

Single-view 3D generative models have achieved impressive visual quality, yet they are not designed to satisfy structural or functional requirements, and in practice, often fall short. Symmetry is one such requirement: even subtle symmetry violations can render a model physically unusable. We present SymTRELLIS, a method that enforces arbitrary finite point group symmetries (rotational, reflectional, and polyhedral) during the flow-based 3D generation of TRELLIS.2, without retraining the underlying VAE or flow model. Our key idea is to approximate the latent-space action of spatial transformations as a learned linear operator on voxel latents, implemented as a lightweight spatial-transform latent mapper trained on generic, non-symmetric 3D data. At generation time, we enforce symmetry by averaging predicted flow velocities across all symmetry-equivalent transformations at each ODE step, a process we call velocity symmetrization. The symmetry specification can be estimated automatically from an initial TRELLIS.2 generation or supplied by the user, enabling deliberate fold manipulation beyond what the input image suggests. On a curated benchmark of 266 strictly symmetric objects spanning 2- to 20-fold rotations and polyhedral symmetry groups, SymTRELLIS substantially reduces all symmetry error metrics compared to TRELLIS.2, Hunyuan3D-2.1, and TripoSG, while maintaining reconstruction accuracy comparable to the base model.
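The velocity-symmetrization step can be illustrated on a toy problem. The sketch below uses a 1D latent and the two-element mirror group (identity and flip) with an exact flip operator. In the paper the group action on voxel latents is approximated by a learned linear operator, so the exact flip, the stand-in `raw_velocity`, and the Euler integrator here are all assumptions for illustration.

```python
def identity(z):
    return list(z)


def flip(z):
    """Mirror a 1D latent; this operation is its own inverse."""
    return list(z)[::-1]


# Each group element is stored as (g, g_inverse).
MIRROR_GROUP = [(identity, identity), (flip, flip)]


def raw_velocity(z, t):
    """Stand-in for a learned flow model; the position-dependent bias
    deliberately breaks mirror symmetry."""
    return [-x + 0.1 * i + t for i, x in enumerate(z)]


def symmetrized_velocity(z, t, group):
    """Average g^{-1}(v(g(z))) over all group elements."""
    acc = [0.0] * len(z)
    for g, g_inv in group:
        v = g_inv(raw_velocity(g(z), t))
        acc = [a + b for a, b in zip(acc, v)]
    return [a / len(group) for a in acc]


def integrate(z0, steps, group=None):
    """Explicit Euler integration from t=0 to t=1."""
    z = list(z0)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        v = symmetrized_velocity(z, t, group) if group else raw_velocity(z, t)
        z = [x + dt * vi for x, vi in zip(z, v)]
    return z


def asymmetry(z):
    """Largest absolute difference between a latent and its mirror image."""
    return max(abs(a - b) for a, b in zip(z, flip(z)))
```

Starting from a mirror-symmetric latent, the symmetrized velocity is itself symmetric at every step, so the integrated latent stays symmetric, whereas the raw velocity lets asymmetry accumulate.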

2606.04106 2026-06-04 cs.LG cs.AI 版本更新

Building The Ph(ysical)AI Layer Of Machine Intelligence

构建机器智能的物理AI层

Ulbert Jose Botero, Liam Smith, Brooks Olney, Pooya Khorrami, Steven Kusiak, Watson Jia, Sage Trudeau, Daniel Capecci

发表机构 * MIT Lincoln Laboratory(麻省理工学院林肯实验室)

AI总结 提出基于信号处理原理的基座模型,通过射频数据训练实现跨模态迁移,无需目标域微调,以1.99M参数在15个任务上平均准确率77.7%。

Comments 102 pages, 11 Figures

详情
AI中文摘要

基础模型通过多样化数据的大规模训练实现泛化,但在没有配对训练数据的情况下,向真正未见过的领域迁移存在局限性。我们提出基于原理的基座模型,该模型编码信号处理原理(傅里叶分解、能量守恒、对称性),而不是学习无约束的统计相关性。我们假设不同领域的差异不在于基本物理规律,而在于时间、频率、幅度或相位上的可学习变换。仅使用射频数据训练,并结合这些原理的协同设计架构和损失函数,我们实现了向音频、图像、文本和视频的跨模态迁移,仅使用从射频数据学习到的冻结表示,无需在目标域上对编码器进行微调。我们的1.99M参数冻结编码器通过线性探测在15个不同任务上达到77.7%的平均准确率(top-3为91.9%),具有系统性差异:在物理基础任务(说话人识别、地震学、射频指纹识别)上为84.5%,而在语义任务(音乐流派、语言识别)上为70.0%。这表明基于原理和基于规模的方法提供了互补路径:物理原理实现了高效的跨模态迁移,同时自然地界定了物理理解与语义理解之间的边界。

英文摘要

Foundation models achieve generalization through massive-scale training on diverse data, but have limitations with transfer to truly unseen domains without paired training data. We propose principle-driven foundation models that encode signal-theoretic principles (Fourier decomposition, energy conservation, symmetry) rather than learn untethered statistical correlations. We hypothesize that domains differ not in fundamental physics, but in learnable transformations in time, frequency, magnitude, or phase. Training exclusively on radio-frequency (RF) data with co-designed architecture and losses incorporating these principles, we achieve cross-modal transfer to audio, images, text, and video using only frozen representations learned from RF data, requiring no fine-tuning of the encoder on target domains. Our 1.99M parameter frozen encoder achieves 77.7% average accuracy (91.9% top-3) across 15 diverse tasks via linear probing, with systematic variation: 84.5% on physically-grounded tasks (speaker recognition, seismology, RF fingerprinting) versus 70.0% on semantic tasks (music genre, language recognition). This reveals that principle-driven and scale-driven approaches offer complementary paths: physical principles enable efficient cross-modal transfer while naturally establishing the boundary between physical and semantic understanding.

2606.04104 2026-06-04 cs.SE cs.AI cs.CR 版本更新

Proof-Carrying Agent Actions: Model-Agnostic Runtime Governance for Heterogeneous Agent Systems

证明携带型智能体动作:异构智能体系统的模型无关运行时治理

Zexun Wang

发表机构 * Ond Holdings Inc(Ond控股公司)

AI总结 提出一种运行时无关的治理模型PCAA,通过动作证书和五个检查点实现异构智能体系统的统一授权与审计,并在参考实现中验证其可移植性和有效性。

Comments 25 pages, 2 tables, 3 figures. Implementation-informed systems paper with bounded public validation

详情
AI中文摘要

智能体系统通过具有非常不同控制点的运行时执行:本地编码工具、框架SDK、托管智能体平台、API网关和仅观察集成。因此,一个高风险动作(如外部发布数据)可能在一个运行时中表现为shell命令,在另一个运行时中表现为工具调用,在第三个运行时中表现为托管会话转换。这使得难以一致地回答一个基本的治理问题:什么动作被授权,由谁授权,具有什么批准语义,以及执行后有什么证据? 本文提出了证明携带型智能体动作(PCAA),这是一种以动作证书而非供应商原生会话记录为中心的运行时无关治理模型。PCAA围绕五个检查点组织控制:动作前可接受性、动作开放、假设捕获、批准和结果关闭。它将这些检查点绑定到一个可移植的动作信封、运行时和批准收据以及可重放证明。该模型以两种实际方式扩展:证书是外部性感知的,携带边界事实(如目标可见性和账户来源),并且批准由明确的可执行性类别描述,而不是由单一的已审查或未审查位描述。 我们通过异构智能体控制平面中的参考实现和披露受限的评估协议来研究该模型。在一个从24个可执行种子扩展到跨四个运行时家族的96个轨迹的保护基准上,PCAA在消融下暴露不同故障模式的同时保持了路由质量。本文贡献了围绕证书携带动作的运行时治理的系统公式化,以及一个基于实现的说明,说明该公式化如何在运行时变更下保持可移植性,而不会崩溃为供应商特定的控制面。

英文摘要

Agent systems execute through runtimes with very different control points: local coding tools, framework SDKs, managed agent platforms, API gateways, and observer-only integrations. A high-risk action such as publishing data externally may therefore appear as a shell command in one runtime, a tool call in another, and a hosted session transition in a third. This makes it difficult to answer a basic governance question consistently: what action was authorized, under whose authority, with what approval semantics, and with what evidence after execution? This paper presents Proof-Carrying Agent Actions (PCAA), a runtime-neutral governance model centered on an action certificate rather than on a vendor-native session record. PCAA organizes control around five checkpoints: pre-action admissibility, action open, assumption capture, approval, and outcome closure. It binds these checkpoints to a portable action envelope, runtime and approval receipts, and replay-ready proof. The model is extended in two practical ways: the certificate is externality-aware, carrying boundary facts such as destination visibility and account provenance, and approval is described by explicit enforceability classes rather than by a single reviewed or unreviewed bit. We study the model through a reference implementation in a heterogeneous agent control plane and a disclosure-bounded evaluation protocol. On a protected benchmark expanded from 24 executable seeds to 96 traces across four runtime families, PCAA preserves route quality while exposing distinct failure modes under ablation. The paper contributes a systems formulation of runtime governance around certificate-bearing actions and an implementation-grounded account of how that formulation can remain portable under runtime churn without collapsing into vendor-specific control surfaces.

2606.04103 2026-06-04 cs.SD cs.AI cs.LG eess.AS 版本更新

The Differentiable Auditory Loop (DAL): An ML Framework for Hyper-Personalized Hearing Aids

可微分听觉环路(DAL):用于超个性化助听器的机器学习框架

Alejandro Ballesta Rosen, Jason Mikiel-Hunter, Julian Maclaren, Jack Collins, Richard F. Lyon, Simon Carlile

发表机构 * Google Research Australia(谷歌澳大利亚研究实验室) Macquarie University(麦考瑞大学)

AI总结 提出可微分听觉环路(DAL)框架,通过将CARFAC模型移植到JAX并优化SEANet深度神经网络,以正常听觉神经活动模式为参考补偿听力损失,在神经表征和信号保真度指标上优于传统助听器基线。

详情
AI中文摘要

传统助听器依赖固定的频率依赖性放大和压缩来管理灵敏度降低,这在复杂环境中(如多说话者场景,即“鸡尾酒会”问题)往往无法提供足够的听力支持。为了更全面地解决听力损失背后的编码功能障碍,我们引入了可微分听觉环路(DAL),这是一个用于个性化助听器设计和验配的新开源框架。我们的第一个DAL实现包含了CARFAC——一个可微的人类耳蜗功能模型,我们将其移植到JAX,以优化深度神经网络,使受损的听觉神经活动模式与正常听力参考匹配。为了构建具有所需精细频谱-时间信号处理的助听器,我们采用了SEANet,一种波形到波形的全卷积UNet生成器。我们通过比较适配正常听力的CARFAC模型输出与适配每个受试者个体听力损伤的CARFAC模型输出来微调网络。比较使用来自各自CARFAC神经活动模式(NAP)输出和稳定听觉图像(SAI)的损失函数进行,后者提供捕获听觉神经输出中相位不敏感时间结构的二维表示。通过梯度下降,SEANet模型学习同时去噪输入并补偿由受损CARFAC模型建模的听力损失。在神经表征和信号保真度指标上,DAL优化的SEANet模型优于测试的主助听器(MHA)基线。DAL框架为基于模型、机器学习驱动的助听器信号处理个性化提供了一条实用路径。下一步包括硬件部署以实现真实世界的临床测试。

英文摘要

Conventional hearing aids rely on fixed, frequency-dependent amplification and compression to manage reduced sensitivity, which often fails to provide sufficient listening support in complex environments, such as situations with multiple speakers (the "cocktail party" problem). To more comprehensively address the underlying encoding dysfunctions of hearing loss, we introduce the Differentiable Auditory Loop (DAL), a new open-source framework for personalized hearing aid design and fitting. Our first implementation of DAL incorporates CARFAC, a differentiable model of human cochlear function, which we ported to JAX, to optimize a deep neural network to match impaired auditory neural activity patterns with a normal-hearing reference. To build a hearing aid with the fine-grained spectro-temporal signal processing required, we adopt SEANet, a waveform-to-waveform fully convolutional UNet generator. We fine-tune the network by comparing the outputs of a CARFAC model fitted to normal hearing with those of a CARFAC model fitted to match each subject's individual hearing impairment. The comparison is done using loss functions derived from the respective CARFAC neural activity pattern (NAP) outputs and stabilized auditory images (SAIs), the latter providing a 2D representation that captures phase-insensitive temporal structure in the auditory nerve output. Through gradient descent, the SEANet model learns to both denoise the input and compensate for the hearing loss modelled by the impaired CARFAC model. Across neural-representation and signal-fidelity metrics, the DAL-optimized SEANet model outperformed the tested master hearing aid (MHA) baselines. The DAL framework provides a practical path toward model-based, machine-learning-driven personalization of hearing aid signal processing. Next steps include hardware deployment to enable real-world clinical testing.

2606.04095 2026-06-04 cs.CL cs.AI 版本更新

POLARIS: Guiding Small Models to Write Long Stories

POLARIS:引导小模型撰写长篇小说

Rishanth Rajendhran, Jenna Russell, Mohit Iyyer, John Frederick Wieting

发表机构 * University of Maryland(马里兰大学) Google(谷歌) DeepMind

AI总结 提出POLARIS训练方法,结合LLM裁判奖励和人类参考注入,使9B小模型在长故事写作中达到与27B模型相当的质量,并展现出长度泛化能力。

详情
AI中文摘要

小型开源模型在长篇创意写作中表现不佳:它们生成的故事要么远低于要求的长度,要么随着长度增加质量显著下降,尤其是与前沿模型相比。我们提出了POLARIS(基于LLM裁判奖励和锚定参考注入的故事写作策略优化),这是一种低计算量的GRPO方法,包含两个关键要素:一个具有结构化故事质量评分标准的前沿LLM裁判作为在线奖励,以及人类参考注入(HRI),其中教师强制的人类撰写故事作为每个GRPO组内的高奖励锚点。通过将我们的训练方法应用于Qwen3.5-9B,使用从100部短篇小说集中提取的约1.4K个提示-故事对数据集和4块A100 GPU,我们得到了POLARIS-9B。在涵盖分布内和分布外提示及评分标准的五个基准测试中,POLARIS-9B与更大的开源模型竞争,同时更严格地遵循长度指令。盲人机评估证实,POLARIS-9B优于基础Qwen3.5-9B,并与Qwen3.5-27B相当。尽管仅在长达4000词的故事上训练,POLARIS-9B在要求故事长度达到训练长度3倍的提示下仍能保持质量,而大多数开源模型在此情况下质量、长度遵循度或两者均显著下降。更广泛地说,我们的结果表明,长度泛化是创意写作模型的一个有意义的压力测试,也是区分其他接近模型的有用视角。

英文摘要

Small open-weight models struggle at long-form creative writing: their generated stories either fall far short of the requested length, or their quality significantly degrades as length increases, especially when compared to frontier models. We present POLARIS (Policy Optimization with LLM-as-a-judge rewards and Anchored-Reference Injection for Storywriting), a lower-compute GRPO recipe with two key ingredients: a frontier LLM judge with a structured Story Quality rubric as the online reward, and human-reference injection (HRI), where a teacher-forced human-written story serves as a high-reward anchor within each GRPO group. By applying our training recipe to Qwen3.5-9B, using a dataset of approximately 1.4K prompt-story pairs derived from 100 short-story anthologies and 4 A100 GPUs, we obtain POLARIS-9B. Across five benchmarks spanning in-distribution and out-of-distribution prompts and rubrics, POLARIS-9B is competitive with much larger open-weight models while following length instructions more closely. A blinded human evaluation confirms that POLARIS-9B is preferred to the base Qwen3.5-9B and on par with Qwen3.5-27B. Despite training only on stories up to 4k words, POLARIS-9B preserves quality on prompts requesting stories up to 3 times the training length, a regime where most open-weight models degrade substantially in quality, length adherence, or both. More broadly, our results suggest that length generalization is a meaningful stress test for creative-writing models and a useful lens for distinguishing otherwise close models.
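As a rough sketch of how a high-reward anchor interacts with GRPO's group-relative advantages, the Python below normalizes rewards within a group and then repeats the computation with one extra anchor reward appended. The abstract does not say how the human reference is scored or normalized, so the anchor reward value, the use of the population standard deviation, and the function names are assumptions for illustration.

```python
import statistics


def group_advantages(rewards, eps=1e-6):
    """Group-relative advantages: (reward - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


def advantages_with_anchor(sampled_rewards, anchor_reward):
    """Append a human-reference reward to the group before normalizing.
    Returns (advantages of sampled stories, advantage of the anchor)."""
    adv = group_advantages(list(sampled_rewards) + [anchor_reward])
    return adv[:-1], adv[-1]


if __name__ == "__main__":
    sampled = [2.0, 3.0, 4.0]  # hypothetical judge scores for model samples
    print(group_advantages(sampled))
    print(advantages_with_anchor(sampled, anchor_reward=9.0))
```

In this toy setting the anchor raises the group mean, so every sampled story ends up with a negative advantage while the human reference gets a positive one. This is one plausible reading of how the anchor steers updates toward human-quality writing even when all sampled stories are mediocre.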

2606.04075 2026-06-04 cs.LG cs.AI cs.CL cs.CR cs.CY 版本更新

Large Language Models Hack Rewards, and Society

大型语言模型攻击奖励机制与社会

Wei Liu, Xinyi Mou, Hanqi Yan, Zhongyu Wei, Yulan He

发表机构 * King’s College London(伦敦大学国王学院) Fudan University(复旦大学) The Alan Turing Institute(艾伦·图灵研究所)

AI总结 研究强化学习训练中大型语言模型利用奖励函数漏洞的“社会攻击”现象,通过SocioHack沙盒实验发现模型能发现并利用社会规则漏洞,且现有安全措施效果有限。

Comments 14 pages, 9 figures, 7 tables

详情
AI中文摘要

强化学习已成为一种主导的后训练范式,使大型语言模型能够从奖励中学习。我们观察到社会规则在结构上与奖励函数相似。它们定义了可衡量的结果、阈值和例外情况,同时往往仅部分指定了制度意图。我们假设强化学习训练过程可能利用这些漏洞,因此提出模型在强化学习期间攻击奖励函数的已知倾向是否可能扩展为一种更严重的失败模式,即社会攻击:发现社会运行规则中的漏洞。为了研究这一现象,我们引入了SocioHack,一个包含72个社会环境的沙盒,并发现这些环境中奖励攻击自然出现并导致监管漏洞的发现。模型学会攻击社会规则并生成技术上合规但违背监管意图的策略,而当前的大型语言模型安全措施仅提供有限的缓解。因此,收集真实世界反馈用于模型训练需要更加谨慎,我们需要下一代后训练范式来安全地在真实社会中迭代大型语言模型。

英文摘要

Reinforcement learning (RL) has become a dominant post-training paradigm, enabling large language models (LLMs) to learn from rewards. We observe that societal regulations are structurally similar to reward functions. They define measurable outcomes, thresholds, and exceptions, while often leaving institutional intent only partially specified. We hypothesise that the RL training process may exploit these gaps and therefore ask whether models' well-known tendency to hack reward functions during RL can scale into a more consequential failure mode named societal hacking: discovering loopholes in the rules society runs on. To study this phenomenon, we introduce SocioHack, a sandbox of 72 societal environments, and find that within these environments, reward hacking naturally emerges and leads to regulatory loophole discovery. Models learn to hack the social rules and generate strategies that remain technically compliant while defeating regulatory intent, and current LLM safeguards provide only limited mitigation. Therefore, collecting in-the-wild feedback for model training requires greater caution, and we need a next-generation post-training paradigm for safely iterating LLMs in real society.

2606.04074 2026-06-04 cs.LG cs.AI cs.IT math.IT 版本更新

Adaptive Patching Is Harder Than It Looks For Time-Series Forecasting

自适应分块在时间序列预测中比看起来更难

Federico Zucchi, Yi Xie, Chao Zhang, Keyuan Luo, Thomas Lampert, Ziyue Li

发表机构 * ICube, University of Strasbourg, Illkirch-Graffenstaden, France(斯特拉斯堡大学ICube研究所,法国伊尔基希-格拉芬斯塔登) Technical University of Munich(慕尼黑工业大学) FinTech Thrust, The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)金融科技学域) Computer Science Department, Hainan Bielefeld University of Applied Sciences(海南比勒费尔德应用科学大学计算机科学系) Cephalgo, Strasbourg, France(法国斯特拉斯堡Cephalgo公司) Heilbronn Data Science Center, Munich Data Science Institute(慕尼黑数据科学研究所海尔布隆数据科学中心)

AI总结 本文通过理论分析和实验验证,探讨自适应分块在时间序列Transformer中是否优于调优的均匀分块,发现均匀基线在标准基准上具有竞争力,自适应分块的优势有限且依赖于特定方法和数据集。

详情
AI中文摘要

自适应分块是时间序列Transformer最近提出的一个引人注目的方案:在序列局部信息丰富的区域分配更细的分块。本文探究在什么条件下内容自适应分块算子应优于调优的均匀算子。局部异质性本身并不足够:在逐点预测损失下,一个看似复杂的区域并不自动意味着更细的分块会减少损失。我们将分块建模为有预算的比特率分配,并推导出一个显式阈值,动态分块规则必须满足该阈值才能击败调优的均匀基线,然后从局部(二次代理)和全局(模型假设下的强凸界)两方面界定了可实现的改进。由此得出两个结构性结果:在没有耦合约束的情况下,标量局部复杂度无法在常见损失景观下产生非均匀最优;一旦骨干网络训练到其表示感知最优,对齐增益会在调优的均匀分块大小附近崩溃。为了验证这些预测,我们在三种代表性架构上进行了受控隔离研究,用均匀分块大小扫描替换每个自适应机制,同时保持骨干网络、数据和训练协议不变。在标准的长时域预测基准上,验证选择的均匀基线与动态对应物具有竞争力,每个设置的效果集中在零附近,且按数据集汇总后没有一致的方向性优势。我们观察到的较大增益是方法和数据集特定的。因此,自适应分块应针对调优的均匀基线进行评估;其价值取决于是否有一个廉价且可靠的路由信号能够识别出更细的分块实际上在何处减少预测损失。

英文摘要

Adaptive patching is a recent and compelling proposal for time-series Transformers: allocate finer patches where the sequence looks locally informative. This paper asks under what conditions a content-adaptive patching operator should outperform a tuned uniform one. Local heterogeneity alone is not enough: under pointwise forecasting losses, a complex-looking region is not automatically one where finer patching reduces the loss. We model patching as a budgeted bitrate allocation and derive an explicit threshold that a dynamic patching rule must satisfy to beat a well-tuned uniform baseline, then bound the achievable improvement both locally (a quadratic surrogate) and globally (a strong-convexity bound under the model's assumptions). Two structural results follow: without a coupling constraint, scalar local complexity cannot produce a non-uniform optimum under a common loss landscape; and once the backbone is trained to its representation-aware optimum, the alignment gain collapses around a well-tuned uniform patch size. To test these predictions, we run a controlled isolation study on three representative architectures, replacing each adaptive mechanism with a uniform patch-size sweep while keeping the backbone, data, and training protocol fixed. On standard long-horizon forecasting benchmarks, the validation-selected uniform baseline is competitive with the dynamic counterpart, with per-setting effects concentrated near zero and no consistent directional advantage once results are aggregated by dataset. The larger gains we do observe are method- and dataset-specific. Adaptive patching should therefore be evaluated against a tuned uniform baseline; its value depends on whether a cheap and reliable routing signal can identify where finer patches actually reduce forecasting loss.

2606.04073 2026-06-04 cs.LG cs.AI stat.ML 版本更新

TPA-AD: A Two-Stage Pseudo Anomaly-Guided Method for Bearing Time-Series Anomaly Detection

TPA-AD: 一种用于轴承时间序列异常检测的两阶段伪异常引导方法

Xiancheng Wang, Zhibo Zhang, Ran Li, Rui Wang, Minghang Zhao, Shisheng Zhong, Lin Wang

发表机构 * CQSF.com(重庆师范大学) Huadian University(哈尔滨理工大学)

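AI summary Proposes TPA-AD, a two-stage pseudo anomaly-guided method that generates pseudo-anomalous windows near the normal boundary using a reconstruction model with per-feature target-error control, then combines contrastive learning with KNN for unsupervised bearing time-series anomaly detection; it performs stably and generalizes across bearing fault and degradation datasets.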

详情
AI中文摘要

本文提出了一种两阶段伪异常引导的异常检测方法(TPA-AD),用于在仅正常样本可用的训练设置下进行轴箱轴承时间序列异常检测(TSAD)。该方法首先利用重构模型和每特征目标误差控制在正常边界附近生成伪异常窗口,然后通过正常窗口与伪异常窗口之间的对比学习学习异常敏感表示,最后使用k近邻(KNN)生成窗口级和点级异常分数。与依赖已知故障类别、真实异常先验或随机异常注入的现有方法相比,TPA-AD通过在边界邻域构建伪异常提高了正常边界的可分离性,并能联合处理混合变量场景中的连续和离散特征。主要实验在轴承故障检测数据集和退化过程数据集上进行,并在13个公共TSAD数据集上进行了额外的探索性扩展。结果表明,所提方法产生相对稳定的异常响应,对退化演化敏感,并在公共TSAD基准和真实高速列车相关轴承数据上表现出一定程度的更广泛适用性。

英文摘要

This paper proposes a two-stage pseudo anomaly-guided anomaly detection method (TPA-AD) for axle-box bearing time-series anomaly detection (TSAD) in the setting where only normal samples are available for training. The method first generates pseudo-anomalous windows near the normal boundary using a reconstruction model and per-feature target-error control. It then learns anomaly-sensitive representations through contrastive learning between normal and pseudo-anomalous windows, and finally produces window-level and point-level anomaly scores using k-nearest neighbors (KNN). Compared with existing methods that rely on known fault categories, real anomaly priors, or random anomaly injection, TPA-AD improves the separability of the normal boundary by constructing pseudo-anomalies in boundary neighborhoods and can jointly handle continuous and discrete features in mixed-variable scenarios. The main experiments are conducted on bearing fault detection datasets and degradation-process datasets, with an additional exploratory extension on 13 public TSAD datasets. The results show that the proposed method yields relatively stable anomaly responses, is sensitive to degradation evolution, and demonstrates a certain degree of broader applicability on public TSAD benchmarks and real high-speed-train-related bearing data.
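
The final scoring stage described in this abstract (KNN over learned representations, at window level and point level) can be illustrated with a minimal Python sketch. It assumes embeddings are already available as plain tuples; the helper names `knn_score` and `point_scores`, the toy embeddings, and the rule of averaging overlapping window scores to obtain point scores are all assumptions made for illustration, not details taken from the paper.

```python
import math


def knn_score(query, reference, k=3):
    """Mean Euclidean distance from `query` to its k nearest reference embeddings."""
    distances = sorted(math.dist(query, ref) for ref in reference)
    nearest = distances[:k]
    return sum(nearest) / len(nearest)


def point_scores(window_scores, window_size, stride, series_length):
    """Assumed rule: a point's score is the mean score of the windows covering it."""
    totals = [0.0] * series_length
    counts = [0] * series_length
    for w, score in enumerate(window_scores):
        start = w * stride
        for t in range(start, min(start + window_size, series_length)):
            totals[t] += score
            counts[t] += 1
    return [tot / cnt if cnt else 0.0 for tot, cnt in zip(totals, counts)]


# Toy data (hypothetical): embeddings of normal training windows and test windows.
normal_embeddings = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (0.1, 0.1)]
test_embeddings = [(0.05, 0.05), (0.0, 0.05), (2.0, 2.0)]

window_scores = [knn_score(e, normal_embeddings, k=3) for e in test_embeddings]
per_point = point_scores(window_scores, window_size=4, stride=2, series_length=8)
```

The third test window lies far from every normal embedding, so it receives the largest window-level score, and the points covered only by that window inherit it.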

2606.04067 2026-06-04 cs.CR cs.AI 版本更新

Need to Know: Contextual-Integrity-Grounded Query Rewriting for Privacy-Conscious LLM Delegation

须知:基于语境完整性的隐私意识LLM委托查询重写

Xinyue Huang, Xiaochun Cao, Wenyuan Yang

发表机构 * Sun Yat-sen University(中山大学)

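AI summary To address query privacy leakage in LLM delegation, proposes a query rewriting framework grounded in Contextual Integrity (CI): a rewriter trained with CI-guided reinforcement learning preserves task-critical information while suppressing non-essential sensitive disclosures, achieving the best privacy-utility trade-off.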

详情
AI中文摘要

随着LLM日益融入日常工作流程,发送到云端LLM的用户查询通常混合了任务必需内容和任务非必需的敏感披露,但基于类型的PII编辑是上下文无关的,可能引发两个问题:过度披露未类型化的敏感上下文和过度移除承载答案的片段。我们在语境完整性下重新定义隐私保护查询重写:只有当某个片段对任务必要时才应转发。我们引入了DelegateCI-Bench,这是首个基于任务的语境完整性基准,用于隐私意识委托,包含3,167个样本,结合了涵盖11个任务和20种任务类型的高质量合成数据、基于WildChat的真实用户查询以及一个包含密集敏感信息的医学挑战集。基于此基准,我们提出了一个CI引导的强化学习框架,将必要和非必要的敏感片段转化为可验证的优化信号,并训练一个查询重写器,以保留任务关键信息同时抑制不必要的敏感披露。实验表明,我们学习的重写器实现了最佳的隐私-效用权衡,与设备端基线相比,平均效用提升高达+10.1。

英文摘要

As LLMs become increasingly woven into everyday workflows, user queries sent to cloud-hosted LLMs routinely mix task-essential content with task-non-essential sensitive disclosures, yet type-based PII redaction is context-agnostic and may raise two issues: over-disclosing untyped sensitive context and over-removing answer-bearing spans. We recast privacy-preserving query rewriting under Contextual Integrity: a span should be forwarded only if it is necessary for the task. We introduce DelegateCI-Bench, the first task-based Contextual Integrity benchmark for privacy-conscious delegation, comprising 3,167 samples that combine high-quality synthetic data spanning 11 tasks and 20 task types, WildChat-based real user queries, and a medical challenge set with dense sensitive information. Building on this benchmark, we propose a CI-guided reinforcement learning framework that converts essential and non-essential sensitive spans into verifiable optimization signals, and train a query rewriter to preserve task-critical information while suppressing unnecessary sensitive disclosure. Experiments show that our learned rewriter achieves the best privacy-utility tradeoff, with up to +10.1 average utility over on-device baselines.

2606.04063 2026-06-04 cs.LG cs.AI 版本更新

LLM Compression with Jointly Optimizing Architectural and Quantization choices

联合优化架构与量化选择的大语言模型压缩

Hoang-Loc La, Truong-Thanh Le, Amir Taherkordi, Phuong Hoai Ha

发表机构 * UiT The Arctic University of Norway(UiT挪威北极大学) University of Oslo, Norway(挪威奥斯陆大学)

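AI summary Proposes a differentiable neural architecture search framework that jointly optimizes architectural configurations and mixed-precision quantization for large language models, yielding better accuracy-latency trade-offs.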

详情
AI中文摘要

部署大型语言模型(LLM)因其巨大的内存和计算需求而具有挑战性。虽然一些方法通过从头开发小型或微型语言模型来解决这一问题,但这些方法需要大量的GPU训练。压缩预训练的LLM用于边缘设备提供了一种有吸引力的替代方案。除了剪枝和量化,神经架构搜索(NAS)能够实现有效的压缩,然而先前的NAS方法通常限制搜索空间并将架构与量化解耦。我们引入了一种可微NAS框架,该框架探索整个空间,并联合优化LLM线性层的架构配置与混合精度量化。实验表明,我们的模型在精度-延迟权衡上具有优越性:在可比精度下,我们的模型推理速度比顺序的NAS后量化基线快1.4倍,或在等效延迟下,在七个推理任务上平均精度提高高达6%。

英文摘要

Deploying large language models (LLMs) is challenging due to their significant memory and computational requirements. While some methods address this by developing small or tiny language models from scratch, these approaches demand extensive GPU training. Compressing pre-trained LLMs for edge devices offers a compelling alternative. Beyond pruning and quantization, Neural Architecture Search (NAS) enables effective compression, yet prior NAS approaches often limit the search space and decouple architecture from quantization. We introduce a differentiable NAS framework that explores the entire space and jointly optimizes architectural configurations alongside mixed-precision quantization for linear layers of LLMs. Experiments demonstrate superior accuracy-latency trade-offs: our models achieve up to 1.4x faster inference than sequential NAS-then-quantization baselines at comparable accuracy, or up to 6% higher average accuracy across seven reasoning tasks at equivalent latency.

2606.04057 2026-06-04 cs.SE cs.AI cs.LG 版本更新

The Invisible Lottery: How Subtle Cues Steer Algorithm Choice in LLM Code Generation

隐形彩票:微妙线索如何引导LLM代码生成中的算法选择

Akanksha Narula, Mofasshara Binte Rafique, Laurent Bindschaedler

发表机构 * University of Washington(华盛顿大学) Google Research(谷歌研究院)

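AI summary Through a large set of controlled experiments, finds that incidental prompt cues (such as contextual words or metadata) systematically shift the distribution of algorithm families that LLMs choose in code generation, affecting performance, security, and maintainability; naming the algorithm directly is the most reliable mitigation.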

详情
AI中文摘要

大型语言模型(LLM)现在生成大量生产代码,通常用于具有多个有效算法解决方案的任务。偶然的提示线索,即任务规范之外的上下文词或元数据,可以引导模型选择哪个算法,即使所有输出都通过相同的测试。提示敏感性作为提高输出质量的工具已被广泛研究。这里,输出策略意味着在固定正确性下的算法选择。我们将算法引导定义为线索引起的算法族分布变化,并在11个任务、19种线索类型(18个通道加上一个记忆化语义与表面消融,在改变排版和标点的同时保留含义)以及15个模型配置上进行了46,535次控制实验。我们发现算法族分布存在大规模、系统性的变化(高达100个百分点),与线索语义基本一致,包括在速率限制等应用任务中。直接命名算法是我们测试的最可靠的缓解措施。因此,偶然的上下文在性能、安全性和可维护性上创造了一个“隐形彩票”。

英文摘要

Large language models (LLMs) now generate substantial production code, often for tasks with multiple valid algorithmic solutions. Incidental prompt cues, meaning contextual words or metadata outside the task specification, can steer which algorithm the model selects, even when all outputs pass the same tests. Prompt sensitivity is well studied as a tool to improve output quality. Here, output policy means algorithm choice under fixed correctness. We define algorithm steering as cue-induced shifts in algorithm-family distributions and run 46,535 controlled experiments across 11 tasks, 19 cue types (18 channels plus a memoization semantic-vs-surface ablation that preserves meaning while changing typography and punctuation), and 15 model configurations. We find large, systematic shifts in algorithm-family distributions (up to 100 pp), largely consistent with cue semantics, including in applied tasks such as rate limiting. Direct algorithm naming is the most reliable mitigation we tested. Accidental context therefore creates an "invisible lottery" over performance, security, and maintainability.
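
The abstract reports steering as shifts in algorithm-family distributions, measured in percentage points (pp). One way such a shift could be computed is sketched below in Python. The metric (largest per-family change in share), the function names, and the toy labels are assumptions for illustration; the paper's exact measure is not given in the abstract.

```python
from collections import Counter


def family_shares(labels):
    """Share of each algorithm family among the generated solutions, in percent."""
    counts = Counter(labels)
    total = len(labels)
    return {family: 100.0 * n / total for family, n in counts.items()}


def max_shift_pp(baseline_labels, cued_labels):
    """Largest absolute change in any family's share, in percentage points."""
    base = family_shares(baseline_labels)
    cued = family_shares(cued_labels)
    families = set(base) | set(cued)
    return max(abs(cued.get(f, 0.0) - base.get(f, 0.0)) for f in families)


# Hypothetical labels for a rate-limiting task, without and with an incidental cue.
baseline = ["sliding_window"] * 8 + ["token_bucket"] * 2
cued = ["sliding_window"] * 1 + ["token_bucket"] * 9

shift = max_shift_pp(baseline, cued)          # 70 pp in this toy example
full_flip = max_shift_pp(["a"] * 5, ["b"] * 5)  # a complete flip gives 100 pp
```

A 100 pp shift, the maximum the abstract reports, corresponds to a cue that moves every sample from one family to another.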

2606.04053 2026-06-04 cs.LG cs.AI 版本更新

A Goal-Set Characterization of Task Composition in the Boolean Task Algebra

布尔任务代数中任务组合的目标集刻画

Eduardo Terrés-Caballero, Herke van Hoof

发表机构 * Informatics Institute, University of Amsterdam(阿姆斯特丹大学信息学院) AMLab, University of Amsterdam(阿姆斯特丹大学AML实验室)

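AI summary Simplifies task composition in the Boolean Task Algebra with a goal-set approach, showing that in deterministic MDPs the optimal extended Q-value functions are determined by the universal and empty tasks, which reduces learning cost.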

详情
AI中文摘要

布尔任务代数(BTA)通过为达到目标的任务配备布尔运算,为强化学习中的零样本任务组合提供了一个原则性框架。我们重新审视了其结构假设,并形式化了最优扩展Q值函数空间中的坍缩:在确定性MDP中,每个这样的函数完全由通用任务和空任务决定。这使得原始BTA公式中提出的对数基任务集变得冗余。基于这一观察,我们引入了一种基于目标集的组合方法,该方法对目标集执行逻辑运算,并通过从通用值函数和空值函数中选择切片来重构组合值函数。这降低了标准BTA的学习成本,并减少了BTA和技能机器的组合时间,同时保持了策略性能。在表格、视觉、函数逼近和连续控制领域的实验表明,学习额外的基任务并不会带来更好的性能。最后,我们研究了随机设置,并提供了一个反例,表明这种坍缩不一定成立,即最优组合可能需要考虑目标数量指数级的策略。代码可在 https://github.com/EduardoTerres/bta_paper 获取。

英文摘要

The Boolean Task Algebra (BTA) provides a principled framework for zero-shot task composition in reinforcement learning by equipping goal-reaching tasks with Boolean operations. We revisit its structural assumptions and formalize a collapse in the space of optimal extended Q-value functions: in deterministic MDPs, every such function is fully determined by the universal and empty tasks. This makes the logarithmic set of base tasks proposed in the original BTA formulation redundant. Building on this observation, we introduce a goal-set-based composition method that performs logical operations on goal sets and reconstructs composed value functions by selecting slices from the universal and empty value functions. This reduces learning costs for standard BTA and reduces composition time for both BTA and Skill Machines, while preserving policy performance. Experiments across tabular, visual, function-approximation, and continuous-control domains show that learning additional base tasks does not yield better performance. Finally, we study the stochastic setting and provide a counterexample showing that this collapse need not hold, that is, optimal composition may require accounting for exponentially many policies in the number of goals. Code is available at https://github.com/EduardoTerres/bta_paper.

2606.04051 2026-06-04 cs.LG cs.AI cs.CR 版本更新

RUBAS: Rubric-Based Reinforcement Learning for Agent Safety

RUBAS: 基于评分标准的强化学习用于智能体安全

Xian Qi Loye, Qinglin Su, Zhexin Zhang, Shiyao Cui, Qi Zhu, Fei Mi, Hongning Wang, Minlie Huang

发表机构 * The Conversational AI (CoAI) group, DCST, Tsinghua University(清华大学计算机科学与技术系(DCST)对话式人工智能(CoAI)课题组) Huawei Noah’s Ark Lab(华为诺亚方舟实验室)

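AI summary Proposes RUBAS, a framework that decomposes agent behavior into rubrics along four dimensions to provide fine-grained rewards, and uses reinforcement learning to improve tool-use safety while preserving task completion.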

详情
AI中文摘要

LLM进化为工具型智能体带来了与真实世界执行相关的新安全挑战,而非简单的文本生成。现有的对齐方法通常依赖粗略的拒绝信号或静态监督,难以在多样化的智能体风险中平衡安全性与有用的工具执行。我们提出了RUBAS,一种基于评分标准的强化学习框架用于智能体安全。RUBAS将智能体行为分解为四个维度:工具使用安全性、参数安全性、响应安全性和有用性。这些结构化的评分标准在完整的智能体轨迹上提供细粒度且可解释的奖励,使强化学习能够在保持任务完成的同时优化安全工具使用。在多个智能体安全基准和模型上的大量实验表明,RUBAS相比标准对齐基线提高了安全性,减少了基于工具的幻觉,并保持了竞争性的实用性。我们的结果表明,多维评分标准奖励为在安全关键的工具使用环境中对齐LLM智能体提供了有效的训练信号。

英文摘要

The evolution of LLMs into tool-enabled agents creates a new class of safety challenges associated with real-world execution rather than simple text generation. Existing alignment methods often rely on coarse refusal signals or static supervision, making it difficult to balance safety with useful tool execution across diverse agentic risks. We introduce RUBAS, a rubric-based reinforcement learning framework for agent safety. RUBAS decomposes agent behavior into four dimensions: tool-use safety, argument safety, response safety, and helpfulness. These structured rubrics provide fine-grained and interpretable rewards over complete agent trajectories, enabling reinforcement learning to optimize safe tool use while preserving task completion. Extensive experiments across multiple agent safety benchmarks and models show that RUBAS improves safety over standard alignment baselines, reduces tool-grounded hallucinations, and maintains competitive utility. Our results suggest that multi-dimensional rubric rewards provide an effective training signal for aligning LLM agents in safety-critical tool-use settings.

2606.04050 2026-06-04 cs.LG cs.AI 版本更新

LiftQuant: Continuous Bit-Width LLM via Dimensional Lifting and Projection

LiftQuant: 通过维度提升和投影实现连续位宽的LLM

Liulu He, XuanAng Liu, Juntao Liu, Taolue Feng, Ting Lu, Chunsheng Gan, Zhiyv Peng, Yuan Du, Huanrui Yang, Yijiang Liu, Li Du

发表机构 * Nanyang Technological University(南洋理工大学)

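AI summary Proposes LiftQuant, a framework whose "lift-then-project" mechanism gives quasi-continuous bit-width control for fitting exact memory budgets; a 70B model compressed to 2.4 bits outperforms existing 2-bit models.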

Comments ICML 2026 Spotlight

详情
AI中文摘要

现有的量化方法从根本上受限于刚性的整数位宽(例如2位、3位),导致存在“部署鸿沟”,即大型语言模型无法最优地适配特定的内存预算。为弥合这一鸿沟,我们引入了LiftQuant,一种新颖的框架,能够实现连续位宽控制,从而实现真正的帕累托最优部署。其核心创新是一种“提升-投影”机制,该机制通过从更高维度的“提升”空间中投影一个简单的1位格点来近似低维权重向量。关键在于,有效位宽仅由提升维度与原始维度的比率决定,这使得位宽可以准连续地调整,因为维度是一个灵活的结构参数。这种投影生成一个结构化但非均匀的码本,捕获了向量量化(VQ)的表达能力。虽然优于VQ,但LiftQuant的解码路径仅依赖于线性变换和1位均匀量化器,保持了硬件友好的特性。这种灵活性具有变革性:LiftQuant能够将70B的LLM压缩到2.4位,以精确适配24GB GPU,其性能显著超过在同一设备上部署的最先进的2位模型。我们的代码和检查点可在https://github.com/Heliulu/LiftQuant获取。

英文摘要

Existing quantization methods are fundamentally limited by rigid, integer-based bit-widths (e.g., 2-bit or 3-bit), resulting in a "deployment gap" where Large Language Models cannot be optimally fitted to specific memory budgets. To bridge this gap, we introduce LiftQuant, a novel framework that enables continuous bit-width control for true Pareto-optimal deployment. The core innovation is a "lift-then-project" mechanism which approximates low-dimensional weight vectors by projecting a simple 1-bit lattice from a higher-dimensional "lifted" space. Crucially, the effective bit-width is determined simply by the ratio of the lifted dimension to the original dimension, which allows the bit-width to be tuned quasi-continuously, as the dimension is a flexible structural parameter. This projection generates a structured yet non-uniform codebook, capturing the expressive power of Vector Quantization (VQ). Unlike VQ, however, LiftQuant's decoding path relies solely on linear transformations and 1-bit uniform quantizers, so it remains hardware-friendly. This flexibility is transformative: LiftQuant enables a 70B LLM to be compressed to 2.4 bits to precisely fit a 24GB GPU, where its performance significantly surpasses state-of-the-art 2-bit models fitted on the same device. Our code and checkpoints are available at https://github.com/Heliulu/LiftQuant.
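
The bit-width arithmetic in this abstract can be checked with a short Python sketch. It assumes only what the abstract states: codewords come from projecting a 1-bit lattice in a lifted space of dimension m down to the original dimension d, so m sign bits encode d weights. The projection matrix values, the helper names, and the brute-force nearest-codeword search are illustrative assumptions and are not the paper's encoder.

```python
import itertools


def effective_bits(lifted_dim, original_dim):
    """Bits per weight when `lifted_dim` sign bits encode `original_dim` weights."""
    return lifted_dim / original_dim


def weight_memory_gb(num_params, bits_per_weight):
    """Weight storage in GB (1 GB = 1e9 bytes), ignoring any side information."""
    return num_params * bits_per_weight / 8 / 1e9


def build_codebook(projection):
    """Project every sign vector in {-1, +1}^m through a d x m matrix."""
    m = len(projection[0])
    return [
        tuple(sum(p * s for p, s in zip(row, signs)) for row in projection)
        for signs in itertools.product((-1.0, 1.0), repeat=m)
    ]


def quantize(vector, codebook):
    """Brute-force nearest codeword; only feasible for tiny toy dimensions."""
    return min(codebook, key=lambda c: sum((a - b) ** 2 for a, b in zip(vector, c)))


# Hypothetical 2 x 3 projection: 3 sign bits encode 2 weights, i.e. 1.5 bits each.
projection = [[0.5, 0.3, 0.2], [0.1, -0.4, 0.6]]
codebook = build_codebook(projection)
nearest = quantize((0.3, -0.2), codebook)

bits = effective_bits(12, 5)             # 2.4 bits per weight
memory = weight_memory_gb(70e9, bits)    # about 21 GB for a 70B-parameter model
```

At 2.4 bits per weight a 70B-parameter model needs about 21 GB for its weights, which is consistent with the 24GB GPU mentioned above; the remaining memory has to cover everything else, such as activations.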

2606.04048 2026-06-04 cs.LG cs.AI 版本更新

Unlocking Feature Learning in Gated Delta Networks at Scale

解锁大规模门控Delta网络中的特征学习

Yifeng Liu, Quanquan Gu

发表机构 * University of California Los Angeles(加州大学洛杉矶分校)

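AI summary Derives scaling rules for Gated Delta Networks that enable zero-shot transfer of hyperparameters (especially the learning rate) across model widths, validating the Maximal Update Parametrization for linear models with structured state transitions.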

详情
AI中文摘要

训练和扩展大型语言模型需要巨大的计算资源,这促使了高效次二次架构和原则性超参数调优方法的发展。虽然最大更新参数化(μP)已实现标准Transformer的零样本超参数迁移,但其在线性模型(特别是具有结构化状态转换和复杂架构的模型)中的扩展仍基本未探索。通过在前向传播、门控机制和循环状态动态中严格传播坐标大小估计,我们推导出门控Delta网络的缩放规则。语言模型预训练实验证实,我们的配置使得在AdamW和SGD下,学习率在不同模型宽度间稳定迁移,而标准参数化无法迁移,验证了我们分析的正确性和实用性。

英文摘要

Training and scaling Large Language Models demand enormous computational resources, motivating both efficient sub-quadratic architectures and principled hyperparameter tuning methods. While the Maximal Update Parametrization (μP) has enabled zero-shot hyperparameter transfer for standard Transformers, its extension to linear models, particularly those with structured state transitions and complicated architectures, remains largely unexplored. By rigorously propagating coordinate-size estimates through the forward pass, gating mechanisms, and recurrent state dynamics, we derive the scaling rules for Gated Delta Networks. Experiments on language-model pre-training confirm that our configurations enable stable learning-rate transfer across model widths under both AdamW and SGD, whereas standard parametrization fails to transfer, validating the correctness and practical utility of our analysis.

2606.04046 2026-06-04 cs.CV cs.AI cs.CL cs.LG cs.RO 版本更新

Dive into the Scene: Breaking the Perceptual Bottleneck in Vision-Language Decision Making via Focus Plan Generation

深入场景:通过焦点计划生成打破视觉-语言决策中的感知瓶颈

Boyuan Xiao, Bohong Chen, Yumeng Li, Ji Feng, Yao-Xiang Ding, Kun Zhou

发表机构 * University of Science and Technology of China(中国科学技术大学) Tsinghua University(清华大学)

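AI summary Proposes SceneDiver, which uses coarse-to-fine focus plan generation to build a scene graph and progressively decompose the task, reducing visual hallucinations and improving vision-language and vision-language-action models on embodied decision-making tasks.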

Comments Accepted at ICML 2026

详情
AI中文摘要

在具身视觉-语言决策任务(如机器人操作和导航)中,视觉-语言模型和视觉-语言-动作模型(VLMs & VLAs)是具有不同优势的强大工具:VLMs更擅长长期规划,而VLAs更擅长反应控制。然而,它们的性能受到相同感知瓶颈的限制:由于模型无法区分任务相关对象与干扰物,导致视觉幻觉。原则上,准确识别并聚焦关键对象同时过滤无关对象是突破这一限制的关键。一个直接的解决方案是一步聚焦:直接关注重要对象。然而,这种方法被证明无效,因为有效的聚焦本质上需要深度场景理解。为此,我们提出SceneDiver,一种利用VLMs长期规划能力的从粗到细的焦点计划生成方法,首先构建整体场景图以建立初步理解,然后通过识别、理解和分析的迭代循环逐步将任务分解为更简单的子问题。为了实现反应控制,我们还设计了一个轻量级适配器,将深思熟虑的聚焦能力蒸馏到VLAs中。在标准具身AI基准上的评估证实,我们的方法显著减少了VLMs和VLAs的视觉幻觉,同时在需要快速执行的任务中保持了计算效率。我们的代码和数据发布在:https://future-item.github.io/SceneDiver。

英文摘要

In embodied vision-language decision making tasks such as robotic manipulation and navigation, Vision-Language and Vision-Language-Action Models (VLMs & VLAs) are powerful tools with different benefits: VLMs are better at long-term planning, while VLAs are better at reactive control. However, their performance is limited by the same perceptual bottleneck: visual hallucinations arise due to the models' inability to distinguish task-relevant objects from distractors. In principle, accurate identification of and focus on critical objects, while filtering out irrelevant ones, is the key to breaking this limitation. A straightforward solution is one-step focus: directly attending to essential objects. However, this approach proves ineffective because effective focus inherently requires deep scene understanding. To this end, we propose SceneDiver, a coarse-to-fine focus plan generation method that leverages the long-term planning abilities of VLMs: it first constructs a holistic scene graph to establish initial comprehension, then progressively decomposes the task into simpler sub-problems through an iterative cycle of recognition, understanding, and analysis. To enable reactive control, we also design a lightweight adapter for distilling the deliberate focus ability into VLAs. Evaluations on standard embodied AI benchmarks confirm that our method substantially reduces visual hallucinations for both VLMs and VLAs, while preserving computational efficiency in tasks requiring fast execution. Our code and data are released at: https://future-item.github.io/SceneDiver.

2606.04045 2026-06-04 cs.LG cs.AI 版本更新

Bayes-Sufficient Representations in Supervised Learning

监督学习中的贝叶斯充分表示

Vasileios Sevetlidis

发表机构 * Athena Research Center, Kimmeria Campus, Xanthi, Greece(雅典娜研究中心,基梅里亚校区,克桑西,希腊) Democritus University of Thrace, Vas. Sofias Campus, Xanthi, Greece(色雷斯德谟克利特大学,Vas. Sofias校区,克桑西,希腊) International Hellenic University, Serres, Greece(国际希腊大学,塞雷,希腊)

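AI summary Defines Bayes sufficiency of a representation with respect to a loss in supervised learning, introduces the Bayes quotient, shows that a Bayes-minimal representation is informationally equivalent to the Bayes quotient, and uses experiments to distinguish sufficiency, minimality, and retained non-required information.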

详情
AI中文摘要

表示学习通常被描述为保留输入中与预测相关的信息。本文探讨了在固定监督决策问题中相关性的含义。定义了一个表示对于联合分布和损失是贝叶斯充分的,如果某个预测头可以使用它来实现贝叶斯最优行动规则。这使得目标信息依赖于损失。在几乎必然唯一的贝叶斯行动情况下,相关对象是贝叶斯商,它识别需要相同贝叶斯最优行动的输入。当表示细化这个商时,它是充分的;当它在信息上等价于商时,它是贝叶斯最小的。该框架自然地连接到属性诱导:零一损失需要贝叶斯类,平方损失需要条件均值,布里尔损失需要二元预测中的条件概率,对数损失或严格适当评分规则需要预测分布。受控的有限实验、学习的神经瓶颈实验以及真实数据的iNaturalist分类学细化实验说明了充分性、最小性和保留的非必要信息之间的区别。对于固定的监督问题,分布和损失决定贝叶斯行动,贝叶斯行动决定商,商决定贝叶斯最优预测所需的最小信息。

英文摘要

Representation learning is often described as preserving the information in an input that is relevant for prediction. This work asks what relevance means for a fixed supervised decision problem. A representation is defined to be Bayes-sufficient for a joint distribution and loss if some prediction head can use it to implement a Bayes-optimal action rule. This makes the target information loss-dependent. In the almost-surely unique Bayes-action case, the relevant object is a Bayes quotient, which identifies inputs that require the same Bayes-optimal action. A representation is sufficient when it refines this quotient, and Bayes-minimal when it is informationally equivalent to it. The framework connects naturally to property elicitation: zero-one loss requires the Bayes class, squared loss the conditional mean, Brier loss the conditional probability in binary prediction, and log loss or strictly proper scoring rules the predictive distribution. Controlled finite experiments, learned neural bottleneck experiments, and a real-data iNaturalist taxonomic refinement experiment illustrate the distinction between sufficiency, minimality, and retained non-required information. For a fixed supervised problem, the distribution and the loss determine the Bayes action, the Bayes action determines the quotient, and the quotient determines the minimal information required for Bayes-optimal prediction.
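
The loss dependence of the Bayes quotient described above can be made concrete with a small Python sketch on a toy binary problem. The three inputs, their conditional distributions, and the function names are invented for illustration; the sketch assumes each input has a unique Bayes action, as in the abstract's almost-surely unique case.

```python
def bayes_action_zero_one(cond):
    """Bayes action under zero-one loss: the most probable label."""
    return max(cond, key=cond.get)


def bayes_action_squared(cond):
    """Bayes action under squared loss: the conditional mean of Y."""
    return round(sum(y * p for y, p in cond.items()), 10)


def bayes_quotient(conditionals, action):
    """Group inputs that require the same Bayes-optimal action."""
    cells = {}
    for x, cond in conditionals.items():
        cells.setdefault(action(cond), []).append(x)
    return sorted(cells.values())


def refines(fine, coarse):
    """True if every cell of `fine` sits inside some cell of `coarse`."""
    return all(any(set(cell) <= set(big) for big in coarse) for cell in fine)


# Hypothetical conditional distributions P(Y = y | X = x) for three inputs.
conditionals = {
    "x1": {0: 0.4, 1: 0.6},
    "x2": {0: 0.1, 1: 0.9},
    "x3": {0: 0.8, 1: 0.2},
}

quotient_zero_one = bayes_quotient(conditionals, bayes_action_zero_one)
quotient_squared = bayes_quotient(conditionals, bayes_action_squared)
```

Under zero-one loss `x1` and `x2` fall into the same cell because both have Bayes class 1, while squared loss separates them because their conditional means differ. The squared-loss partition refines the zero-one partition but not the other way round, so in this toy case a representation that only encodes the Bayes class is sufficient for zero-one loss and insufficient for squared loss.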

2606.04040 2026-06-04 cs.SD cs.AI eess.AS 版本更新

Channel-Oriented Design for EEG-to-Music Reconstruction

面向脑电到音乐重建的通道导向设计

Jiaxin Qing, Junwei Lu, Lexin Li

发表机构 * UC Berkeley(加州大学伯克利分校) Harvard University(哈佛大学)

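AI summary To cope with EEG signals that are weak and vulnerable to noise and channel variability, proposes a channel-oriented design (channel-wise tokenization, multi-view self-distillation, and data augmentation) that achieves stable alignment to a semantic music space within an encoding-alignment-decoding pipeline, substantially improving reconstruction.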

详情
AI中文摘要

脑机接口旨在从神经信号中解码自然刺激,但迄今为止大多数进展集中在视觉和语言领域。本文研究更具挑战性但探索较少的脑电到音乐重建场景,其中信号微弱、分布广泛且极易受噪声和通道变异影响。我们的核心发现是,早期通道混合会破坏微弱但具有判别性的脑电信号。为此,我们提出一种包含三个关键组件的通道导向设计。具体而言,通道级标记化将每个电极视为显式标记以保留空间局部的神经证据,通道级多视角自蒸馏通过时间裁剪和随机通道子集强制一致性以学习鲁棒且分布式的表示,通道级数据增强引入结构化通道丢弃以提高对噪声、伪迹和缺失电极的不变性。这些组件共同保留了跨通道的微弱但信息丰富的信号,并实现了与语义音乐表示空间的稳定对齐。我们将该通道导向设计集成到脑电到音乐重建的编码-对齐-解码流水线中。理论上,我们刻画了何时保留通道级结构能够改善对齐。实验上,我们与一系列最先进的基线方法进行比较,并展示了一致且显著的性能提升。

英文摘要

Brain-computer interfaces aim to decode naturalistic stimuli from neural signals, yet most progress to date has focused on vision and language. In this article, we study a more challenging but far less explored setting, EEG-to-music reconstruction, where signals are weak, distributed, and highly susceptible to noise and channel variability. Our central finding is that early channel mixing destroys weak but discriminative EEG signals. To address this, we propose a channel-oriented design with three key components. Specifically, channel-wise tokenization treats each electrode as an explicit token to retain spatially localized neural evidence, channel-wise multi-view self-distillation enforces consistency across temporal crops and random channel subsets to learn robust and distributed representations, and channel-wise data augmentation introduces structured channel dropout to improve invariance to noise, artifacts, and missing electrodes. Together, these components preserve weak yet informative signals across channels and enable stable alignment to a semantic music representation space. We integrate this channel-oriented design within an encoding-alignment-decoding pipeline for EEG-to-music reconstruction. Theoretically, we characterize when preserving channel-level structure leads to improved alignment. Empirically, we compare with a range of state-of-the-art baselines and demonstrate consistent and significant performance gains.

2606.04039 2026-06-04 cs.NE cs.AI cs.LG 版本更新

Beyond Static Priors: Dynamic Neural Guidance for Large-Scale Ant Colony Optimization

超越静态先验:大规模蚁群优化的动态神经引导

Dat Thanh Tran, Van Khu Vu, Yining Ma

发表机构 * Center for AI Research(人工智能研究中心) VinUniversity(Vin大学) College of Engineering and Computer Science(工程与计算机科学学院) Laboratory for Information and Decision Systems(信息与决策系统实验室) Massachusetts Institute of Technology(麻省理工学院)

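AI summary Proposes DyNACO, a framework that provides dynamic neural guidance by periodically observing the pheromone distribution and the incumbent solution; combined with a perturbation-based ACO backend and scope-restricted refinement, it scales to 100,000-node TSP instances and outperforms neural baselines, and on CVRP it consistently improves the unguided baseline with less than 1% neural overhead.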

Comments Accepted at KDD 2026

详情
AI中文摘要

神经引导的蚁群优化(ACO)存在一个根本性的训练-推理错位:策略通常被训练来生成静态先验(例如热图),但部署时却用于引导迭代的、长视野的搜索过程。在本文中,我们提出了DyNACO,一个新颖的框架,通过周期性观察信息素分布和当前解来实现动态神经引导。为了使DyNACO在大规模上易于处理,我们将策略与基于扰动的ACO后端和范围受限的细化机制配对,共同确保有效性和稳定的信用分配。在TSP上,DyNACO扩展到10万个节点的实例,并优于神经基线,同时与无引导求解器相比通常减少总运行时间。我们通过容量感知后端将DyNACO扩展到CVRP,以不到1%的神经开销持续改进无引导基线。我们进一步提供了深入分析,验证了模型的泛化能力,并阐明了为什么动态引导优于静态先验。我们的工作强调了在学习引导优化中使神经训练与迭代搜索动态对齐的必要性。代码可在https://github.com/shoraaa/DyNACO获取。

英文摘要

Neural-guided Ant Colony Optimization (ACO) suffers from a fundamental training-inference misalignment: policies are typically trained to generate static priors (e.g., heatmaps), yet deployed to guide iterative, long-horizon search processes. In this paper, we present DyNACO, a novel framework that achieves dynamic neural guidance by periodically observing the pheromone distribution and the incumbent solution. To make DyNACO tractable at scale, we pair the policy with a perturbation-based ACO backend and a scope-restricted refinement mechanism that jointly ensure efficacy and stable credit assignment. On TSP, DyNACO scales to 100,000-node instances and outperforms neural baselines while often reducing total runtime compared to the unguided solver. We extend DyNACO to CVRP via a capacity-aware backend, consistently improving the unguided baseline with less than 1% neural overhead. We further provide in-depth analysis validating the model's generalization capabilities and elucidating why dynamic guidance outperforms static priors. Our work underscores the necessity of aligning neural training with iterative search dynamics in learning-guided optimization. The code is available at https://github.com/shoraaa/DyNACO.
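
For readers unfamiliar with the ACO machinery this abstract refers to, the following Python sketch shows the textbook Ant System rules. It is not DyNACO's perturbation-based backend. The function names and the `prior` matrix are illustrative assumptions: a static prior would be computed once before search, whereas dynamic guidance in the abstract's sense would recompute it periodically from the pheromone state and the incumbent tour.

```python
def transition_probs(current, unvisited, pheromone, prior, alpha=1.0, beta=2.0):
    """Textbook Ant System rule: p(j) is proportional to tau^alpha * prior^beta."""
    weights = {
        j: (pheromone[current][j] ** alpha) * (prior[current][j] ** beta)
        for j in unvisited
    }
    total = sum(weights.values())
    return {j: w / total for j, w in weights.items()}


def update_pheromone(pheromone, tour, tour_length, rho=0.1):
    """Evaporate every edge, then deposit 1/length on the edges of `tour`."""
    n = len(pheromone)
    for i in range(n):
        for j in range(n):
            pheromone[i][j] *= 1.0 - rho
    for a, b in zip(tour, tour[1:] + tour[:1]):
        pheromone[a][b] += 1.0 / tour_length
        pheromone[b][a] += 1.0 / tour_length


# Toy 3-node instance; all numbers are made up for illustration.
pheromone = [[1.0] * 3 for _ in range(3)]
prior = [[1.0, 0.5, 0.25], [0.5, 1.0, 0.5], [0.25, 0.5, 1.0]]

probs = transition_probs(0, [1, 2], pheromone, prior)
update_pheromone(pheromone, tour=[0, 1, 2], tour_length=10.0)
```

With uniform pheromone the move from node 0 is decided by the prior alone, which is why a prior that never changes can keep steering ants toward the same edges throughout a long search.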

2606.04035 2026-06-04 cs.SE cs.AI cs.LG 版本更新

Unpredictable Safety: Domain-Dependent Compliance and the Transparency Gap in Open-Weight LLMs

不可预测的安全性:开放权重大语言模型中领域依赖的合规性与透明度差距

Zacharie Bugaud

发表机构 * Astera Institute(Astera研究院)

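AI summary Using standardized experiments across 7 ethical domains, finds that compliance rates of open-weight LLMs range from 14.7% to 85.7% and that the same model behaves very inconsistently across domains, showing that safety mechanisms lack transparency and consistency.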

详情
AI中文摘要

我们对开放权重大语言模型中领域依赖的安全行为进行了系统研究:在7个伦理领域进行了7项标准化实验,测试了5个模型(12B--70B),共4200次交互,并采用双法官验证。使用双条件方法,每个场景在分析框架(识别危害)和操作框架(帮助实施危害)下进行测试,我们发现合规率从14.7%(人口贩卖)到85.7%(监控设计)不等,跨度达71个百分点,且非重叠的聚类自助法95%置信区间。可信部署需要可预测的安全行为,但我们发现合规性高度依赖于上下文:同一模型(Mistral Nemo 12B)在100%的请求中提供监控设计,但仅在26.7%的请求中协助贩卖。这种不可预测性对部署者来说是不透明的:技术框架绕过,即有害请求被重新定义为工程问题,从而覆盖安全训练,而没有任何外部信号表明拒绝阈值已改变。领域内异质性高达84.4个百分点,意味着即使在领域层面也无法预测安全行为。在通过GitHub Copilot CLI部署产品界面访问的五个前沿封闭模型(GPT-4.1/5.2, Claude Haiku/Sonnet/Opus 4.x;n=4,163个响应)上进行的复制实验,再现了相同的领域分层,绝对水平有所减弱但形状相同,其中两个低规范化领域(科学欺诈、监控)再次最为宽松。这些结果表明,当前的安全机制缺乏可信AI部署所需的透明度和一致性。

英文摘要

We present a systematic study of domain-dependent safety behavior in open-weight LLMs: 7 standardized experiments across 7 ethical domains, testing 5 models (12B--70B) in 4,200 interactions with dual-judge validation. Using a dual-condition methodology, with each scenario tested in both an analytical framing (identify the harm) and an operational framing (help commit the harm), we find compliance rates vary from 14.7% (human trafficking) to 85.7% (surveillance design), a 71-percentage-point span with non-overlapping cluster-bootstrapped 95% CIs. Trustworthy deployment requires predictable safety behavior, yet we find compliance is highly context-dependent: the same model (Mistral Nemo 12B) provides surveillance designs in 100% of requests but assists with trafficking in only 26.7%. This unpredictability is opaque to deployers: in the technical framing bypass, harmful requests reframed as engineering problems override safety training without any external signal that refusal thresholds have shifted. Within-domain heterogeneity reaches 84.4pp, meaning safety behavior cannot be predicted even at the domain level. A replication on five frontier closed models (GPT-4.1/5.2, Claude Haiku/Sonnet/Opus 4.x; n=4,163 responses) accessed via the GitHub Copilot CLI deployed-product surface reproduces the same domain stratification, attenuated in absolute level but identical in shape, with the two low-codification domains (science fraud, surveillance) again the most permissive. These results show that current safety mechanisms lack the transparency and consistency required for trustworthy AI deployment.
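
The abstract's statistics can be illustrated with a minimal Python sketch of a percentile cluster bootstrap for a compliance rate. It assumes that the clusters are scenarios and that each outcome is coded 1 for compliance and 0 for refusal; the abstract does not say what the clustering unit is, and the toy data and function names below are invented.

```python
import random


def compliance_rate(clusters):
    """Fraction of complied responses across all clusters (1 = complied)."""
    outcomes = [o for cluster in clusters for o in cluster]
    return sum(outcomes) / len(outcomes)


def cluster_bootstrap_ci(clusters, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile CI obtained by resampling whole clusters with replacement."""
    rng = random.Random(seed)
    stats = []
    for _ in range(n_resamples):
        resample = [rng.choice(clusters) for _ in clusters]
        stats.append(compliance_rate(resample))
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical domain: 6 scenarios (clusters), 5 responses each.
toy_domain = [
    [1, 1, 1, 1, 0],
    [1, 1, 1, 1, 1],
    [1, 0, 1, 1, 1],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 1, 1],
    [1, 1, 1, 0, 1],
]

rate = compliance_rate(toy_domain)
ci_low, ci_high = cluster_bootstrap_ci(toy_domain)

# The span quoted in the abstract is a simple difference of the two extremes.
span_pp = round(85.7 - 14.7, 1)  # 71.0 percentage points
```

Resampling whole clusters rather than individual responses keeps correlated responses to the same scenario together, which generally widens the interval compared with treating every response as independent.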

2606.04027 2026-06-04 cs.CR cs.AI 版本更新

MaskForge: Structure-Aware Adaptive Attacks for Jailbreaking Diffusion Large Language Models

MaskForge:用于越狱扩散大语言模型的结构感知自适应攻击

Yingzi Ma, Zhengyue Zhao, Xiaogeng Liu, Minhui Xue, Yue Zhao, Chaowei Xiao

发表机构 * University of Wisconsin-Madison(威斯康星大学麦迪逊分校) Johns Hopkins University(约翰霍普金斯大学) University of Southern California(南加州大学) Responsible AI Research (RAIR) Centre, The University of Adelaide(阿德莱德大学负责任人工智能研究中心)

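AI summary Proposes MaskForge, a fully black-box adaptive attack that red-teams diffusion large language models by optimizing over a library of structural patterns, reaching a 79.3% average attack success rate.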

Comments 28 pages, 7 figures, 11 tables. Preprint

详情
AI中文摘要

扩散大语言模型(dLLMs)通过在双向上下文下迭代去噪部分掩码序列来生成文本,展现出与自回归LLMs不同的安全表面。由于掩码令牌是原生输入,且令牌由置信度而非位置决定,因此可以通过填充和在受监控前缀之外诱导有害内容。现有的越狱方法要么忽略了这种原生填充能力,要么依赖于低多样性的掩码模板,这些模板统一应用于所有目标,缺乏结构适应性或累积攻击经验。我们提出MaskForge,一种全黑盒自适应攻击,将dLLM红队测试转化为对不断增长的结构模式库的优化搜索。MaskForge将成功的尝试抽象为可重用的模式,使用UCB bandit选择与目标兼容的模式,并在当前库失败时调用评分器引导的备用方案。成功的尝试被蒸馏回模式库,使得经验能够跨目标累积。在五个公开dLLM和三个基准测试中,MaskForge实现了79.3%的平均攻击成功率,相比最强的竞争dLLM基线相对提升17.6%。成熟的模式库进一步迁移到AdvBench而无需任何更新,实现了88.2%的攻击成功率和相比最强竞争基线67%的相对提升。

英文摘要

Diffusion large language models (dLLMs) generate text by iteratively denoising partially masked sequences under bidirectional context, exposing a safety surface distinct from autoregressive LLMs. Because mask tokens are native inputs and tokens are committed by confidence rather than position, harmful content can be induced through infilling and outside the monitored prefix. Existing jailbreaks either miss this native infill capability or rely on low-diversity mask-bearing templates applied uniformly across goals, with little structural adaptation or accumulated attack experience. We propose MaskForge, a fully black-box adaptive attack that casts dLLM red-teaming as optimized search over a growing library of structural patterns. MaskForge abstracts successful attempts into reusable schemas, selects goal-compatible patterns with a UCB bandit, and invokes a scorer-guided fallback when the current library fails. Successful attempts are distilled back into the pattern library, enabling experience to accumulate across goals. Across five public dLLMs and three benchmarks, MaskForge achieves an average attack success rate of 79.3%, a 17.6% relative improvement over the strongest competing dLLM baseline. The matured pattern library further transfers to AdvBench without any updates, achieving an 88.2% attack success rate and a 67% relative improvement over the strongest competing baseline.

2606.04025 2026-06-04 cs.SE cs.AI 版本更新

The Biomimetic Architecture of Software 4.0

软件4.0的仿生架构

Philip Sheldrake, Dirk Scheffler

发表机构 * Unnamed Labs Amsterdam(阿姆斯特丹无名实验室) Unnamed Labs Karlsruhe(卡尔斯鲁厄无名实验室)

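AI summary Proposes the Software 4.0 paradigm, an autopoietic heterarchy combining human intelligence, neural AI, and a reflective symbolic substrate to resolve the probabilistic-symbolic impedance mismatch, and introduces Recognitive, a programming language that implements the architecture.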

Comments 14 pages

详情
AI中文摘要

主流编程范式继承了一种为单个人脑指导本地机器的过去时代优化的执行模型,使得当代系统背负着历史路径依赖。当被迫承载多维连接主义智能时,这种脆弱的组装模型在深刻的概率-符号阻抗不匹配的重压下断裂。虽然当代软件3.x框架试图通过将大型语言模型(LLM)封装在日益复杂的外部框架中来修补这种不匹配,但这种螺旋上升的架构复杂性只会增加静态代码组装的开销。为了从根源而非表象解决问题,本文引入了软件4.0——一个由人类智能、神经AI和原生反射符号基底构成的自创生异质架构。在此范式下,软件从待解析的惰性语料转变为自我调节的代谢网络,原生地验证、修改和演化自身的结构完整性。我们提出了Recognitive,即实现该架构的编程语言和平台。通过将结构验证的负担卸载到确定性基底上,它解锁了一种优越的推理时扩展机制——其中连接主义计算完全转化为深度语义探索和假设遍历,而非以毁灭性的计算和财务成本来概率性地模拟结构约束。超越传统的“软件工厂”思维,我们概述了将连接主义意图落地并全面进入智能时代所需的理论基础。这是一篇基础性愿景论文;类型系统和操作语义的经验评估及形式化规范是未来工作的主题。

英文摘要

Dominant programming paradigms inherit an execution model optimised for a bygone era of a single human mind instructing a local machine, leaving contemporary systems burdened with historical path dependencies. When forced to host multi-dimensional, connectionist intelligence, this brittle assembly model fractures under the weight of a profound probabilistic-symbolic impedance mismatch. While contemporary Software 3.x frameworks attempt to patch the mismatch by encasing large language models (LLMs) in increasingly complicated external harnesses, this spiralling architectural complexity only compounds the carrying cost of static code assembly. To address the cause rather than the effects, this paper introduces Software 4.0 -- an autopoietic heterarchy of human intelligence, neural AI, and natively reflective symbolic substrate. Under this paradigm, software is transformed from an inert corpus to be parsed into a self-regulating metabolic network that natively verifies, modifies, and evolves its own structural integrity. We present Recognitive, the programming language and platform that materialises this architecture. By offloading the burden of structural verification to a deterministic substrate, it unlocks a superior inference-time scaling regime -- one where connectionist compute translates entirely into deep semantic exploration and hypothesis traversal rather than the ruinous computational and financial cost of simulating structural constraints probabilistically. Moving beyond the legacy 'Software Factory' mindset, we outline the theoretical foundations required to ground connectionist intent and arrive fully in the intelligence age. This is a foundational vision paper; empirical evaluation and formal specification of the type system and operational semantics are the subject of future work.

2606.04023 2026-06-04 cs.SE cs.AI 版本更新

CodegenBench: Can LLMs Write Efficient Code Across Architectures?

CodegenBench: 大型语言模型能否跨架构编写高效代码?

Jie Li, Wenzhao Wu, Junqi Hu, Qinrui Zheng, Bowen Wu, Juepeng Zheng, Yutong Lu, Haohuan Fu

发表机构 * Sun Yat-sen University(中山大学) National Supercomputing Center in Wuxi(国家超级计算无锡中心) National Supercomputing Center in Shenzhen(国家超级计算深圳中心) University of the Chinese Academy of Sciences(中国科学院大学) Tsinghua Shenzhen International Graduate School(清华大学深圳国际研究生院)

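AI summary Introduces CodegenBench, a benchmark that evaluates how well LLMs generate efficient parallel code on three architectures (x86_64, Sunway, and Kunpeng); models do well on the ubiquitous architecture but degrade significantly on domain-specific ones.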

Comments 29 pages, 22 figures

详情
AI中文摘要

尽管大型语言模型(LLMs)在通用编程和GPU加速环境(如PyTorch、CUDA)的代码生成任务中得到了广泛评估,但它们在面向CPU的高性能计算(HPC)跨不同架构上的能力仍未充分探索。为填补这一空白,我们引入了CodegenBench,这是一个全面的基准测试套件,旨在评估在三种不同硬件平台(x86_64、Sunway和Kunpeng)上生成高效并行代码的能力。我们的基准测试包含106个标准基本线性代数子程序(BLAS)例程,建立了一个基础基线,以及20个针对每个独特超级计算架构(LeetSunway和LeetKunpeng)改编的专门计算内核。我们的广泛评估揭示,虽然最先进的LLMs能够为像x86_64这样的普遍架构生成优化代码,但在公共文档和训练数据有限的领域特定架构上,它们表现出显著的性能下降,突显了跨平台泛化的关键局限性。此外,我们对影响代码质量的因素(如实现长度和任务复杂度)的分析表明,当前LLMs在需要简洁代码片段的中等难度问题上最为有效。我们开源了我们的数据集和自动化评估基础设施,以促进未来在LLM驱动的高性能代码生成方面的研究。资源可在https://anonymous.4open.science/r/CodegenBench-EDE1/和https://anonymous.4open.science/r/CodegenBenchDataset-2551/获取。

英文摘要

While large language models (LLMs) have been extensively evaluated on code generation tasks for general-purpose programming and GPU-accelerated environments (e.g., PyTorch, CUDA), their capabilities in CPU-oriented high-performance computing (HPC) across diverse architectures remain underexplored. To bridge this gap, we introduce CodegenBench, a comprehensive benchmark suite designed to evaluate the generation of efficient parallel code across three distinct hardware platforms: x86_64, Sunway, and Kunpeng. Our benchmark comprises 106 standard Basic Linear Algebra Subprograms (BLAS) routines establishing a fundamental baseline, alongside 20 specialized computational kernels adapted for each of the unique supercomputing architectures (LeetSunway and LeetKunpeng). Our extensive evaluation reveals that while state-of-the-art LLMs can generate optimized code for ubiquitous architectures like x86_64, they exhibit significant performance degradation on domain-specific architectures with limited public documentation and training data, highlighting critical limitations in cross-platform generalization. Furthermore, our analysis of factors influencing code quality such as implementation length and task complexity indicates that current LLMs are most effective for moderately difficult problems requiring concise code snippets. We open-source our dataset and automated evaluation infrastructure to facilitate future research in LLM-driven high-performance code generation. The resources are available at https://anonymous.4open.science/r/CodegenBench-EDE1/ and https://anonymous.4open.science/r/CodegenBenchDataset-2551.

2606.04019 2026-06-04 eess.SP cs.AI 版本更新

Gravity-Aware Hierarchical Routing for Lightweight SensorLLM on Human Activity Recognition

面向人体活动识别的轻量级SensorLLM的重力感知层次路由

Hao Li, Mingrui Zheng, Yasuyuki Tahara, Yuichi Sei

发表机构 * Department of Informatics, Graduate School of Informatics and Engineering(信息学院信息科学与工程研究生院) Graduate School of Information Science and Technology(信息科学与技术研究生院)

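AI summary To address the degradation of lightweight SensorLLM on static-activity recognition, proposes a lightweight post-alignment adaptation based on gravity-aware hierarchical routing; statistical cues and soft routing significantly raise macro-F1 on static classes.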

详情
AI中文摘要

最近关于传感器-语言对齐的研究表明,两阶段框架可以提高可穿戴传感器人体活动识别(HAR)的语义建模能力,其中SensorLLM风格的方法首先进行运动到语言的对齐,然后微调模型用于下游任务。然而,我们的实验揭示了一个一致的失败模式:当第二阶段的主干被压缩到紧凑模型(如TinyLlama)时,动态活动的识别仍然相对较强,而低运动静态类别(如站立、坐着和躺着)的区分能力显著下降。为了解决这个问题,我们提出了一种重力感知层次路由头,作为一种轻量级的后对齐适配方法,构建在已经对齐的模型之上,而不是一个新的大规模预训练框架。该方法使用来自Chronos分词器状态的每通道均值和标准差来提取与姿势和重力方向相关的统计线索,并通过软路由自适应地结合静态专家和全专家,同时使用负载平衡损失进行稳定训练。在MHealth数据集上,该设计以最小的参数开销显著提高了宏F1分数,并且增益主要集中在静态类别上,同时保持了对动态活动的强性能。作为arXiv上的首次披露,本文仅报告了单个数据集上的结果,旨在突出核心方法,并为未来工作中的更广泛评估奠定基础。

英文摘要

Recent studies on sensor-language alignment have shown that two-stage frameworks can improve the semantic modeling ability of wearable-sensor human activity recognition (HAR), where SensorLLM-style methods first perform motion-to-language alignment and then fine-tune the model for downstream tasks. However, our experiments reveal a consistent failure mode when the Stage 2 backbone is compressed to a compact model such as TinyLlama: recognition of dynamic activities remains relatively strong, while the discrimination of low-motion static classes such as standing, sitting, and lying degrades substantially. To address this issue, we propose a gravity-aware hierarchical routing head as a lightweight post-alignment adaptation built on top of an already aligned model, rather than a new large-scale pretraining framework. The method uses the per-channel mean and std from the Chronos tokenizer state to extract statistical cues related to posture and gravity direction, and adaptively combines a static expert and a full expert through soft routing, together with a load-balancing loss for stable training. On the MHealth dataset, this design significantly improves macro-F1 with minimal parameter overhead, and the gains are concentrated mainly on static classes while preserving strong performance on dynamic activities. As a first arXiv disclosure, the current paper reports results on a single dataset only, with the goal of highlighting the core method and laying the groundwork for broader evaluation in future work.
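The routing idea in this abstract can be illustrated with a minimal sketch. It assumes a linear gate over per-channel mean and standard deviation and a squared-deviation load-balancing term; the abstract specifies neither, and every function name below is hypothetical rather than the authors' implementation.

```python
import math
from statistics import fmean, pstdev

def channel_stats(window):
    """window: list of channels (lists of samples) -> [mean, std, mean, std, ...]."""
    feats = []
    for channel in window:
        feats.append(fmean(channel))   # mean tracks posture / gravity direction
        feats.append(pstdev(channel))  # std tracks how much the channel moves
    return feats

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def route(stats, gate_weights, gate_bias):
    """Assumed linear gate: soft weights for (static expert, full expert)."""
    logits = [sum(w * s for w, s in zip(row, stats)) + b
              for row, b in zip(gate_weights, gate_bias)]
    return softmax(logits)

def combine(gate, static_logits, full_logits):
    """Soft routing: convex combination of the two experts' class logits."""
    g_static, g_full = gate
    return [g_static * s + g_full * f for s, f in zip(static_logits, full_logits)]

def load_balance_loss(batch_gates):
    """Assumed form: squared deviation of mean expert usage from uniform."""
    n_experts = len(batch_gates[0])
    usage = [fmean(g[e] for g in batch_gates) for e in range(n_experts)]
    return sum((u - 1.0 / n_experts) ** 2 for u in usage)
```

The loss is zero when both experts are used equally across a batch and grows as the gate collapses onto one expert.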

2606.04010 2026-06-04 q-bio.NC cs.AI 版本更新

The Variance Brain Foundation Models Forgot: Third-Order Statistics Predict Cognition Where Billion-Parameter Models Fail

大脑基础模型遗忘的方差:三阶统计在十亿参数模型失败时预测认知

Giovanni Marraffini, Gabriel Mahuas, Trinidad Borrell, Victoria Shevchenko, Demian Wassermann

发表机构 * Inria Saclay Île-de-France, CEA, Université Paris-Saclay, Palaiseau, France(法国帕莱索:Inria萨克雷-法兰西岛研究中心、CEA、巴黎萨克雷大学) Sigma Nova Sorbonne Université, Institut du Cerveau - Paris Brain Institute - ICM(Sigma Nova、索邦大学、巴黎脑研究所ICM) Forschungszentrum Jülich(于利希研究中心)

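AI summary Finds that pretraining of brain foundation models (BFMs) mainly captures the dominant variance components of the fMRI signal while missing the higher-order structure that predicts cognition, and that a linear pipeline built on the third-order co-skewness tensor outperforms existing BFMs without any pretraining.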

Comments 37 pages, 16 figures, 23 tables

详情
AI中文摘要

大脑基础模型(BFMs)是在fMRI数据上预训练的自监督Transformer。我们认为这些模型应该能从fMRI信号中捕捉每个受试者的认知表现。然而,在三个最先进的BFM和所有我们测试的读出方法中,它们对认知的预测能力都低于基于功能连接矩阵(FC)的约8万参数的线性回归。差距随着规模扩大而加剧:BrainLM的6.5亿模型预测认知的能力低于其1.11亿模型。我们将此归因于方差分配问题:BFM预训练捕获了主导fMRI的方差成分,但没有捕获预测认知的高阶结构。我们对重构信号的每累积量分析表明,二阶协方差部分保留,而三阶协偏度张量大部分被破坏。为了恢复BFM丢失的信息,我们设计了一个线性管道,将fMRI信号投影到最能保留其协偏度的子空间,并在那里计算FC。这在我们测试的每个数据集和分区上都超过了原始FC和所有预训练的BFM,在受控评估下优于先前最先进方法,且无需预训练和GPU。我们通过在相同子空间上使用针对性的损失进行微调,恢复了BrainLM前向传播中原始FC的上限。这表明瓶颈在于预训练目标,而非架构或模型大小。

英文摘要

Brain foundation models (BFMs) are self-supervised Transformers pretrained on fMRI data. We posit that these models should capture each subject's cognitive performance from their fMRI signal. Yet across three state-of-the-art BFMs and every readout we test, they predict cognition worse than a linear regression from the ~80K parameters of the functional connectivity matrix (FC). The gap widens with scale: BrainLM's 650M model predicts cognition worse than its 111M. We attribute this to a variance allocation problem: BFM pretraining captures the variance components that dominate fMRI but not the higher-order structure that predicts cognition. Our per-cumulant analysis of the reconstructed signal shows that the second-order covariance is partially preserved, while the third-order co-skewness tensor is largely destroyed. To recover what BFMs lose, we design a linear pipeline that projects the fMRI signal into the subspace that best preserves its co-skewness and computes FC there. This exceeds raw FC and every pretrained BFM on every dataset and parcellation we test, outperforming prior state-of-the-art under controlled evaluation with no pretraining and no GPU. We recover the raw-FC ceiling on BrainLM's forward pass by finetuning with a loss targeted at this same subspace. This shows that the bottleneck is the pretraining objective, not the architecture or the model size.
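As a rough illustration of the two statistics the abstract contrasts, the sketch below computes the second-order correlation matrix (the usual FC) and the third-order co-skewness tensor of standardized regional time series. It covers only these textbook definitions; the paper's subspace projection is not reproduced, and the function names are hypothetical.

```python
from statistics import fmean, pstdev

def standardize(series):
    """Zero-mean, unit-variance copy of one regional time series (needs nonzero variance)."""
    mu, sd = fmean(series), pstdev(series)
    return [(x - mu) / sd for x in series]

def correlation(signals):
    """Second-order statistic: C[i][j] = mean over time of z_i(t) * z_j(t)."""
    z = [standardize(s) for s in signals]
    r, t = len(z), len(z[0])
    return [[sum(z[i][n] * z[j][n] for n in range(t)) / t
             for j in range(r)] for i in range(r)]

def coskewness(signals):
    """Third-order statistic: S[i][j][k] = mean over time of z_i(t) * z_j(t) * z_k(t)."""
    z = [standardize(s) for s in signals]
    r, t = len(z), len(z[0])
    return [[[sum(z[i][n] * z[j][n] * z[k][n] for n in range(t)) / t
              for k in range(r)] for j in range(r)] for i in range(r)]
```

The diagonal entry S[i][i][i] is the ordinary skewness of region i. The tensor has R^3 entries for R regions, which is one reason a low-dimensional projection is attractive before computing it.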

2606.04008 2026-06-04 eess.SP cs.AI 版本更新

Neural Radiated-Noise Fields for Unmanned Underwater Vehicle Noise Spectrum Prediction in Three-Dimensional Scenes

用于三维场景中无人水下航行器噪声频谱预测的神经辐射噪声场

Yan Wu, Yang Yang, Jun Fan, Bin Wang

发表机构 * Key Laboratory of Marine Intelligent Equipment and System, Ministry of Education, Shanghai Jiaotong University, Shanghai(海洋智能装备与系统重点实验室、教育部、上海交通大学、上海)

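AI summary Proposes the neural radiated-noise field (NRNF), which represents the UUV radiated-noise spectrum as a continuous function of three-dimensional position, yaw angle, and frequency, enabling query-based prediction at arbitrary spatial locations; the average prediction error on a lake-trial dataset is 3.5 dB.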

详情
AI中文摘要

无人水下航行器(UUV)的辐射噪声是表征声学特征和评估平台性能的重要指标。针对传统基于物理建模和数值模拟方法对目标结构信息和环境边界条件依赖性强,且无法在三维场景中实现连续空间频谱响应建模的问题,本文提出了一种神经辐射噪声场(NRNF)。NRNF将UUV辐射噪声谱表示为三维UUV位置、三维水听器位置、UUV偏航角和频率的连续函数,从而能够在任意空间位置进行基于查询的预测。所提方法采用正弦编码处理位置和频率,并引入可学习的三维场景特征网格来显式表示环境结构和传播效应。基于湖试构建了频谱预测数据集,并在水平外推、深度外推和跨航次泛化三种设置下评估模型。结果表明,NRNF在50至5000 Hz频段实现了3.5 dB的平均预测误差。水平外推最容易,深度外推最具挑战性,跨航次泛化难度居中。进一步的消融实验表明,场景特征网格显著提高了模型的预测稳定性和空间泛化能力。

英文摘要

Radiated noise in unmanned underwater vehicles (UUVs) is an important indicator for characterizing acoustic signatures and evaluating platform performance. To address the strong dependence of traditional physics-based modeling and numerical simulation methods on target structural information and environmental boundary conditions, and their inability to achieve continuous spatial spectrum-response modeling in three-dimensional scenes, this paper proposes a neural radiated-noise field (NRNF). An NRNF represents the UUV radiated-noise spectrum as a continuous function of the three-dimensional UUV position, the three-dimensional hydrophone position, the UUV yaw angle, and the frequency, enabling query-based prediction at arbitrary spatial locations. The proposed method employs sinusoidal encoding for position and frequency, and introduces a learnable three-dimensional scene feature grid to explicitly represent environmental structure and propagation effects. A spectrum-prediction dataset is constructed from lake trials, and the proposed model is evaluated under three settings: horizontal extrapolation, depth extrapolation, and cross-run generalization. Results show that the NRNF achieves an average prediction error of 3.5 dB in the 50 to 5000 Hz band. Horizontal extrapolation is easiest, depth extrapolation is the most challenging, and cross-run generalization is of intermediate difficulty. Further ablation results demonstrate that the scene feature grid significantly improves the prediction stability and spatial generalization of the model.
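The query encoding described above can be sketched as follows. This is a minimal illustration, assuming coordinates already normalized to roughly [0, 1], a log-scaled frequency over the 50 to 5000 Hz band, and power-of-two sinusoidal bands; none of these choices is stated in the abstract, and the learnable scene feature grid is omitted.

```python
import math

def sinusoidal_encoding(value, num_bands):
    """Map one scalar to [sin(2^k * pi * v), cos(2^k * pi * v)] for k = 0..num_bands-1."""
    out = []
    for k in range(num_bands):
        freq = (2.0 ** k) * math.pi
        out.append(math.sin(freq * value))
        out.append(math.cos(freq * value))
    return out

def encode_query(uuv_xyz, hydrophone_xyz, yaw_rad, freq_hz,
                 num_bands=4, f_min=50.0, f_max=5000.0):
    """Build the input vector for one (UUV position, hydrophone position, yaw, frequency) query."""
    # Assumed normalization: frequency mapped to [0, 1] on a log scale.
    f_norm = math.log(freq_hz / f_min) / math.log(f_max / f_min)
    features = []
    for coord in (*uuv_xyz, *hydrophone_xyz, f_norm):
        features.extend(sinusoidal_encoding(coord, num_bands))
    # Yaw is an angle, so a single sin/cos pair keeps it continuous across 2*pi.
    features.extend([math.sin(yaw_rad), math.cos(yaw_rad)])
    return features
```

With the defaults, each query becomes a 58-dimensional vector (7 scalars times 8 features, plus 2 for yaw) that a small network could map to a predicted noise level in dB.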

2606.03995 2026-06-04 cs.LG cs.AI q-bio.QM 版本更新

Early Detection of Alzheimer's Disease Using Explainable Machine Learning on Clinical Biomarkers: A Multi-Class Classification Study Using the Alzheimer's Disease Neuroimaging Initiative (ADNI) Dataset

使用可解释机器学习基于临床生物标志物早期检测阿尔茨海默病:基于阿尔茨海默病神经影像学倡议(ADNI)数据集的多分类研究

Afshan Hashmi

发表机构 * TRDC, Tuwaiq Academy(TRDC,图瓦伊克学院)

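AI summary Uses an XGBoost classifier on eight clinical features from the ADNI dataset (MMSE, CDR Global, CDR-SB, MoCA, FAQ, age, sex, education) for three-class detection (normal cognition, mild cognitive impairment, Alzheimer's disease), with SMOTE for class imbalance, Optuna for hyperparameter optimisation, and SHAP for explainability; reaches macro AUC 0.982 and accuracy 0.943 on the test set and reveals clinically plausible feature-importance patterns.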

详情
AI中文摘要

背景:阿尔茨海默病(AD)影响全球超过5500万人。从常规临床评估中准确、可解释地检测正常认知(NC)、轻度认知障碍(MCI)和AD仍是一个关键未满足需求。方法:使用XGBoost分类器进行三分类检测,采用来自阿尔茨海默病神经影像学倡议(ADNI)的八个临床特征:MMSE、CDR Global、CDR Sum of Boxes(CDR-SB)、MoCA、FAQ、年龄、性别和教育程度。使用Optuna(50次试验)优化超参数;通过SMOTE处理类别不平衡。性能通过macro AUC-ROC(1000次迭代bootstrap 95%置信区间)、macro F1、平衡准确率和Cohen's kappa评估。SHAP值提供特征级别的可解释性。结果:数据集包含1641名基线受试者(608 NC、767 MCI、266 AD)。在五折交叉验证中,平均macro AUC为0.983(SD 0.007),准确率为0.944(SD 0.006),macro F1为0.929(SD 0.008)。在保留测试集(n=247)上,macro AUC为0.982(95% CI: 0.965--0.995),准确率为0.943,平衡准确率为0.932,macro F1为0.927,Cohen's kappa为0.909。SHAP分析确定CDR Global是NC和MCI的主要预测因子,而CDR-SB和MMSE共同驱动AD分类。结论:一个基于常规临床评估训练的可解释机器学习模型实现了近乎完美的三分类阿尔茨海默病检测。SHAP分析揭示了临床合理、类别特定的特征重要性模式,支持临床有效性。未来工作将扩展该框架,加入语音生物标志物以实现多模态检测。

英文摘要

Background: Alzheimer's disease (AD) affects over 55 million people worldwide. Accurate, interpretable detection of normal cognition (NC), mild cognitive impairment (MCI), and AD from routine clinical assessments remains a critical unmet need. Methods: An XGBoost classifier was developed for three-class detection using eight clinical features from the Alzheimer's Disease Neuroimaging Initiative (ADNI): MMSE, CDR Global, CDR Sum of Boxes (CDR-SB), MoCA, FAQ, age, sex, and education. Hyperparameters were optimised using Optuna (50 trials); class imbalance was addressed with SMOTE. Performance was evaluated by macro AUC-ROC with 1,000-iteration bootstrap 95% confidence intervals, macro F1, balanced accuracy, and Cohen's kappa. SHAP values provided feature-level explainability. Results: The dataset comprised 1,641 baseline subjects (608 NC, 767 MCI, 266 AD). On five-fold cross-validation, mean macro AUC was 0.983 (SD 0.007), accuracy 0.944 (SD 0.006), and macro F1 0.929 (SD 0.008). On the held-out test set (n = 247), macro AUC was 0.982 (95% CI: 0.965--0.995), accuracy 0.943, balanced accuracy 0.932, macro F1 0.927, and Cohen's kappa 0.909. SHAP analysis identified CDR Global as the dominant predictor for NC and MCI, while CDR-SB and MMSE together drove AD classification. Conclusion: An explainable machine learning model trained on routine clinical assessments achieves near-perfect three-class Alzheimer's detection. SHAP analysis reveals clinically plausible, class-specific feature importance patterns supporting clinical validity. Future work will extend this framework with speech biomarkers for multimodal detection.
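Three of the reported metrics (balanced accuracy, macro F1, and Cohen's kappa) follow directly from a confusion matrix. The sketch below uses their standard definitions in plain Python; it is not the study's evaluation code, the toy labels in the usage note are invented for illustration, and it assumes every class appears at least once in the true labels.

```python
from collections import Counter

def confusion(y_true, y_pred, labels):
    """Rows are true classes, columns are predicted classes."""
    counts = Counter(zip(y_true, y_pred))
    return [[counts[(t, p)] for p in labels] for t in labels]

def balanced_accuracy(cm):
    """Mean of per-class recall."""
    recalls = [row[i] / sum(row) for i, row in enumerate(cm)]
    return sum(recalls) / len(recalls)

def macro_f1(cm):
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    k = len(cm)
    scores = []
    for i in range(k):
        tp = cm[i][i]
        fp = sum(cm[r][i] for r in range(k)) - tp
        fn = sum(cm[i]) - tp
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / k

def cohen_kappa(cm):
    """Agreement beyond chance: (p_o - p_e) / (1 - p_e)."""
    n = sum(map(sum, cm))
    k = len(cm)
    p_o = sum(cm[i][i] for i in range(k)) / n
    p_e = sum(sum(cm[i]) * sum(cm[r][i] for r in range(k)) for i in range(k)) / (n * n)
    return (p_o - p_e) / (1 - p_e)
```

For example, with true labels NC, NC, MCI, MCI, AD, AD and predictions NC, NC, MCI, AD, AD, AD, one MCI case is mistaken for AD and kappa comes out at 0.75.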

2605.04356 2026-06-04 cs.LG cs.AI 版本更新

Efficiently Aligning Language Models with Online Natural Language Feedback

通过在线自然语言反馈高效对齐语言模型

Christine Ye, Joe Benton

发表机构 * GitHub

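AI summary Proposes replacing verifiable rewards with online natural language feedback: the model is optimized iteratively against a proxy reward model, and fresh expert supervision is collected at the point of over-optimization, aligning language models efficiently in fuzzy domains; experiments show a large gain in the data efficiency of expert supervision.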

详情
AI中文摘要

可验证奖励的强化学习已被用于在许多领域激发语言模型的出色性能。但是,AI的广泛有益部署可能需要我们在“模糊”、难以监督的领域中训练具有强大能力的模型。在本文中,我们开发了在模糊领域中对齐语言模型的方法,其中人类专家仍然能够提供高质量的监督信号,但仅限于少量模型输出,使用在线自然语言反馈。具体来说,我们通过迭代优化代理奖励信号来训练模型,在过优化点停止,收集新的专家监督,并更新代理奖励。我们使用上下文学习(ICL)和微调从语言模型构建代理奖励模型。我们通过分别在Qwen3-8B和Haiku 4.5上激发创意写作和对齐研究能力来测试我们的方法。对于Qwen3-8B,ICL方法使用50倍更少的专家样本恢复了高达35%的性能,而微调方法使用最多20倍更少的样本恢复了80%,使用3倍更少的样本恢复了100%。对于Haiku 4.5,ICL方法使用30倍更少的样本恢复了高达35%的性能,微调方法使用10倍更少的样本恢复了100%。我们的结果表明,在线自然语言反馈可以显著提高专家监督的数据效率。

英文摘要

Reinforcement learning with verifiable rewards has been used to elicit impressive performance from language models in many domains. However, broadly beneficial deployments of AI may require us to train models with strong capabilities in "fuzzy", hard-to-supervise domains. In this paper, we develop methods to align language models in fuzzy domains where human experts are still able to provide high-quality supervision signals, but only for a small number of model outputs, using online natural language feedback. Specifically, we train models by iteratively optimizing against proxy reward signals, stopping at the point of over-optimization, collecting fresh expert supervision, and updating the proxy reward. We construct proxy reward models from language models using in-context learning (ICL) and fine-tuning. We test our methods by eliciting creative writing and alignment research capabilities in Qwen3-8B and Haiku 4.5 respectively. For Qwen3-8B, ICL methods recover up to 35% of performance with 50x fewer expert samples, while fine-tuning methods recover 80% with up to 20x fewer samples and 100% with 3x fewer samples. For Haiku 4.5, ICL methods recover up to 35% of performance with 30x fewer samples, and fine-tuning methods recover 100% with 10x fewer samples. Our results suggest that online natural language feedback can substantially improve the data efficiency of expert supervision.

2606.03988 2026-06-04 cs.AI 版本更新

Imaginative Perception Tokens Enhance Spatial Reasoning in Multimodal Language Models

想象感知标记增强多模态语言模型的空间推理能力

Mahtab Bigverdi, Linjie Li, Weikai Huang, Yiming Liu, Jaemin Cho, Jieyu Zhang, Tuhin Kundu, Chris Dangjoo Kim, Zelun Luo, Linda Shapiro, Ranjay Krishna

发表机构 * University of Washington(华盛顿大学) Allen Institute for AI(艾伦人工智能研究所) Microsoft(微软) OpenAI

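AI summary Proposes Imaginative Perception Tokens (IPT) as intermediate perceptual representations; supervised training with them improves multimodal language models on spatial reasoning tasks such as reasoning from unseen viewpoints and tracing occluded paths, outperforming textual chain-of-thought training on three newly constructed datasets.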

详情
AI中文摘要

视觉语言模型(VLM)在许多任务上表现出色,但当关键信息无法直接观察时,仍难以进行空间推理。许多此类问题需要想象感知:从未见视角推断所见内容、追踪穿过遮挡空间的路径、或将部分观察整合成连贯的空间表征。我们引入了想象感知标记(IPT),这是一种中间感知表征,将VLM在替代空间配置下会感知到的内容外部化,同时保持与观察输入一致。为了研究这一能力,我们设计了三个任务:视角推理(PET)、路径追踪(PT)和多视角计数(MVC),并构建了包含约20K个样本的数据集,附带真实想象、答案和评估基准。以统一VLM BAGEL为骨干,IPT监督持续提升了空间推理性能,并且通常优于文本思维链训练,即使在推理时不生成图像。在MVC上,IPT将准确率提高了3.4%,并在PT上达到了与强大闭源模型竞争的性能。我们进一步发现,将IPT与仅标签监督相结合能带来额外收益,而文本思维链可能大幅降低性能,这表明当空间计算被迫通过语言进行时存在模态不匹配。总体而言,IPT为推理未观察到的空间结构提供了原则性的监督信号,在生成可解释中间表征的同时提升了泛化能力。

英文摘要

Vision language models (VLMs) excel at many tasks but still struggle with spatial reasoning when critical information is not directly observable. Many such problems require imaginative perception: inferring what would be seen from an unseen viewpoint, tracing paths through occluded spaces, or integrating partial observations into a coherent spatial representation. We introduce Imaginative Perception Tokens (IPT), intermediate perceptual representations that externalize what a VLM would perceive under alternative spatial configurations while remaining consistent with the observed input. To study this capability, we formulate three tasks, Perspective Taking (PET), Path Tracing (PT), and Multiview Counting (MVC), and construct datasets of approximately 20K examples with ground truth imaginations, answers, and evaluation benchmarks. Using the unified VLM BAGEL as the backbone, IPT supervision consistently improves spatial reasoning and often outperforms textual chain of thought training, even without generating images at inference time. On MVC, IPT improves accuracy by 3.4% and achieves competitive performance with strong closed-source models on PT. We further find that combining IPT and label-only supervision yields additional gains, whereas textual chain of thought can substantially degrade performance, suggesting a modality mismatch when spatial computation is forced through language. Overall, IPT provides a principled supervision signal for reasoning about unobserved spatial structure, improving generalization while producing interpretable intermediate representations.

2606.03938 2026-06-04 cs.LG cs.AI 版本更新

q0: Primitives for Hyper-Epoch Pretraining

q0: 超周期预训练的原语

Bishwas Mandal, Shmuel Berman, Akshay Vegesna, Samip Dahal

发表机构 * Q Labs(Q实验室) Princeton University(普林斯顿大学)

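AI summary To address the saturation of a single model under multi-epoch training, proposes hyper-epoch pretraining (q0), which uses three primitives (a cyclic schedule, chain distillation, and a learned prior) to turn a multi-epoch budget into a diverse population of models and aggregate their predictions, significantly improving data efficiency.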

Comments 22 pages, 5 figures

详情
AI中文摘要

多周期训练正成为标准做法,因为计算能力的增长速度快于高质量文本的供应。但预训练单个模型会在几轮后饱和,远在计算预算耗尽之前。我们认为这需要概念上的转变,从训练单个模型转向探索模型群体并聚合它们的预测。我们引入了超周期预训练(q0),它将多周期预算转化为多样化模型群体,其组合预测比单个精炼模型达到更低的验证损失。q0 归结为三个核心原语。具有反相关学习率和权重衰减的循环调度从几个并行轨迹中收集多样化模型。链式蒸馏使每个模型针对其前驱进行训练,从而模型质量在群体中累积。一个在保留集上拟合的学习先验,为任何推理预算选择和加权成员。在 1.8B 参数模型上,使用 100M FineWeb 令牌训练,q0 仅使用约 56 个周期(约 4.6 倍更少)即可匹配强大的 256 周期集成基线,或当匹配基线的集成大小时使用约 67 个周期(约 3.8 倍更少),并持续改进。这些增益在 Slowrun 设置下达到累积约 12.9 倍的数据效率,并迁移到下游基准测试。关键的是,最优分配随预算变化,因此我们给出了处方性配方,说明如何花费给定的周期预算以最大化泛化,从单个周期到最大预算。

英文摘要

Multi-epoch training is becoming the standard now that compute is growing faster than the supply of high-quality text. But pretraining a single model saturates within a few passes, long before the compute budget is exhausted. We argue this calls for a conceptual shift from training a single model toward exploring a population of models and aggregating their predictions. We introduce hyper-epoch pretraining (q0), which turns a multi-epoch budget into a population of diverse models whose combined predictions reach a lower validation loss than a single refined model. q0 reduces to three core primitives. A cyclic schedule with anti-correlated learning rate and weight decay collects diverse models from a few parallel trajectories. Chain distillation trains each model against its predecessor so that model quality compounds across the population. A learned prior, fit on a held-out set, selects and weights members for any inference budget. On a 1.8B-parameter model trained on 100M FineWeb tokens, q0 matches a strong 256-epoch ensemble baseline using only ~56 epochs (~4.6x fewer), or ~67 epochs (~3.8x fewer) when matched to the baseline's ensemble size, and continues to improve beyond it. These gains reach cumulative ~12.9x data efficiency under the Slowrun setting and transfer to downstream benchmarks. Crucially, the optimal allocation shifts with the budget, so we give prescriptive recipes for how to spend a given epoch budget to maximize generalization, from a single epoch up to the largest budgets.
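Two of the primitives can be illustrated with a short sketch: a cyclic schedule in which learning rate and weight decay move in opposite directions, and a weighted aggregation of member predictions. The cosine shape, the end-of-cycle snapshot rule, and the fixed weights are assumptions made for illustration; the abstract does not give the exact schedule or how the learned prior is fit.

```python
import math

def cyclic_schedule(step, cycle_len, lr_min, lr_max, wd_min, wd_max):
    """Anti-correlated schedule: LR decays over each cycle while weight decay rises.

    Assumes cycle_len >= 2.
    """
    phase = (step % cycle_len) / (cycle_len - 1)     # 0 at cycle start, 1 at cycle end
    wave = 0.5 * (1.0 + math.cos(math.pi * phase))   # 1 -> 0 across the cycle
    lr = lr_min + (lr_max - lr_min) * wave
    wd = wd_min + (wd_max - wd_min) * (1.0 - wave)
    return lr, wd

def snapshot_steps(total_steps, cycle_len):
    """Assumed rule: save one population member at the end of every cycle."""
    return [s for s in range(total_steps) if s % cycle_len == cycle_len - 1]

def ensemble(member_probs, weights):
    """Weighted average of per-member probability vectors (weights are normalized)."""
    total = sum(weights)
    n_classes = len(member_probs[0])
    return [sum(w * p[k] for w, p in zip(weights, member_probs)) / total
            for k in range(n_classes)]
```

In this sketch the weights passed to `ensemble` stand in for what the learned prior would supply after being fit on held-out data.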

2606.03937 2026-06-04 cs.AI 版本更新

Entropy Is Not Enough: Unlocking Effective Reinforcement Learning for Visual Reasoning via Vision-Anchored Token Selection

熵是不够的:通过视觉锚定令牌选择解锁视觉推理的有效强化学习

Senjie Jin, Peixin Wang, Boyang Liu, Xiaoran Fan, Shuo Li, Zhiheng Xi, Jiazheng Zhang, Yuhao Zhou, Tao Gui, Qi Zhang, Xuanjing Huang

发表机构 * College of Computer Science and Artificial Intelligence, Fudan University(复旦大学计算机科学与人工智能学院)

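AI summary To address the failure of entropy-based credit assignment in visual reasoning, proposes the VEPO framework, which redirects gradient credit through a multiplicative coupling of visual sensitivity and token entropy, significantly improving multimodal reinforcement learning performance.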

详情
AI中文摘要

虽然令牌级熵通常被认为在仅文本的强化学习与可验证奖励(RLVR)中对于信用分配有效,但尚不清楚该机制在视觉推理中是否仍然成立。我们的对照研究表明,由于忽略了具有自然低熵的视觉敏感令牌,该机制在视觉推理中失效。尽管现有的多模态RL方法日益认识到视觉感知的重要性,但它们难以满足将精确感知基础与语义推理交织的内在需求,要么缺乏系统的视觉度量,要么忽视了令牌熵主要驱动语义探索。为解决这一问题,我们引入了VEPO(视觉熵令牌选择策略优化),这是一个有效的RL框架,通过原则性的乘法耦合明确整合视觉敏感性与令牌熵,其中VEPO将梯度信用重定向到同时具有视觉基础且信息量高的令牌。大量实验表明VEPO具有领先性能,在7B规模上显著超过仅熵基线2.28分,在3B规模上超过3.15分。消融实验进一步证实了我们方法的合理性。

英文摘要

While token-level entropy is commonly recognized as effective for credit assignment in text-only reinforcement learning with verifiable rewards (RLVR), it remains unclear whether this mechanism still holds in visual reasoning. Our controlled study shows that this mechanism collapses in visual reasoning due to the omission of vision-sensitive tokens with naturally low entropy. Although existing multimodal RL methods increasingly acknowledge the importance of visual perception, they struggle to satisfy the inherent demand for interleaving precise perceptual grounding with semantic reasoning, either lacking systematic visual measurements or overlooking that token entropy primarily drives semantic exploration. To address this, we introduce VEPO (Vision-Entropy token-selection for Policy Optimization), an effective RL framework explicitly integrating visual sensitivity with token entropy via a principled multiplicative coupling, where VEPO redirects gradient credit toward tokens which are simultaneously visually grounded and highly informative. Extensive experiments demonstrate VEPO's leading performance, significantly outperforming the entropy-only baseline by 2.28 points at 7B-scale and 3.15 points at 3B-scale. Ablations further substantiate the soundness of our method.
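The multiplicative coupling can be sketched as a per-token score that is large only when a token is both visually sensitive and high in entropy. The abstract does not say how visual sensitivity is measured; the sketch below assumes the KL divergence between the token distribution with the image and the distribution with the image withheld, and the top-k masking rule is likewise a hypothetical choice.

```python
import math

def entropy(p):
    """Shannon entropy (nats) of one next-token distribution."""
    return -sum(x * math.log(x) for x in p if x > 0.0)

def kl(p, q, eps=1e-12):
    """KL(p || q), used here as an assumed proxy for visual sensitivity."""
    return sum(x * math.log(x / max(y, eps)) for x, y in zip(p, q) if x > 0.0)

def token_scores(dists_with_image, dists_without_image):
    """Multiplicative coupling: visual sensitivity times token entropy, per position."""
    return [kl(p, q) * entropy(p)
            for p, q in zip(dists_with_image, dists_without_image)]

def select_tokens(scores, keep_ratio):
    """Return a 0/1 mask keeping the top keep_ratio fraction of tokens by score."""
    k = max(1, round(keep_ratio * len(scores)))
    top = sorted(range(len(scores)), key=scores.__getitem__, reverse=True)[:k]
    keep = set(top)
    return [1.0 if i in keep else 0.0 for i in range(len(scores))]
```

The mask would multiply the per-token policy-gradient loss, so that only selected positions receive credit. A token whose distribution does not change without the image scores zero regardless of its entropy, which is the behaviour an entropy-only criterion lacks.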

2606.03892 2026-06-04 cs.CL cs.AI cs.LG 版本更新

Synthesize and Reward -- Reinforcement Learning for Multi-Step Tool Use in Live Environments

合成与奖励——面向实时环境中多步骤工具使用的强化学习

Ibrahim Abdelaziz, Asim Munawar, Kinjal Basu, Maxwell Crouse, Chulaka Gunasekara, Suneet Katrekar, Pavan Kapanipathi

发表机构 * IBM Research(IBM研究院)

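AI summary Proposes the PROVE framework, which tackles environment construction, query generation, and reward design for multi-step tool calling with 20 stateful MCP servers, an automated data synthesis pipeline, and a multi-component programmatic reward, improving results on BFCL Multi-Turn, tau2-bench, and T-Eval by up to +10.2, +6.8, and +6.5 points respectively.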

详情
AI中文摘要

训练LLM编排多步骤工具调用受到三个相互耦合的障碍的阻碍:现实的有状态执行环境构建成本高昂,合成训练查询通常与服务器的实际状态脱节(因此生成的工具调用无法执行),以及基于召回率的RL奖励会鼓励冗长的工具调用模式。我们提出PROVE(已验证环境上的程序化奖励),一个包含三项贡献的框架:(1)一个包含20个有状态MCP(模型上下文协议)服务器的库,暴露了343个工具,支持具有会话范围状态隔离的实时执行RL训练;(2)一个自动数据合成流水线,通过基于实时采样服务器状态的依赖图引导的对话模拟,针对这些服务器生成经过验证的多轮工具调用轨迹,使得每个生成的查询都引用实际存在的实体;(3)一个多组件程序化奖励——渐进式有效性评分、依赖感知覆盖率、具有复杂度缩放调用预算的自适应效率惩罚、工具名称信号和参数值匹配奖励——无需外部评判模型。我们使用相同的奖励超参数和约13K训练示例,通过GRPO训练了四个模型(Qwen3-4B、Qwen3-8B、Qwen2.5-7B、Granite-4.1-8B);仅对每个模型族从三点扫描中调整学习率。在BFCL Multi-Turn、tau2-bench和T-Eval上,PROVE分别带来了最多+10.2、+6.8和+6.5分的改进,表明紧凑的程序化奖励在两个模型族的多步骤工具编排上产生了一致的收益。

英文摘要

Training LLMs to orchestrate multi-step tool calls is held back by three coupled obstacles: realistic stateful execution environments are costly to build, synthetic training queries are often detached from the server's actual state (so the generated tool calls fail to execute), and recall-based RL rewards incentivize verbose tool-calling patterns. We present PROVE (Programmatic Rewards On Verified Environments), a framework with three contributions: (1) a library of 20 stateful MCP (Model Context Protocol) servers exposing 343 tools, enabling live-execution RL training with session-scoped state isolation; (2) a state-machine data synthesis pipeline that generates multi-turn tool-call trajectories grounded in live-sampled server state, so generated queries reference entities that actually exist; and (3) a multi-component programmatic reward with an adaptive efficiency penalty that counters the verbosity incentive of recall-based rewards. We train four models (Qwen3-4B, Qwen3-8B, Qwen2.5-7B, Granite-4.1-8B) with GRPO on the resulting ~13K training examples. On BFCL Multi-Turn, tau2-bench, and T-Eval, PROVE yields improvements of up to +10.2, +6.8, and +6.5 points respectively, demonstrating that this framework yields consistent gains on multi-step tool orchestration across two model families.
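A programmatic reward with an efficiency penalty can be sketched as coverage of the reference calls minus a charge for calls beyond a budget that scales with task size. The weights, the partial credit for a matching tool name, and the budget rule below are all hypothetical; the abstract names the components but not their formulas.

```python
import math

def tool_reward(predicted, reference, slack=1.5, penalty=0.1, name_credit=0.5):
    """predicted / reference: non-empty lists of (tool_name, args_dict) calls.

    Coverage gives 1.0 per reference call matched exactly and name_credit when
    only the tool name matches. Calls beyond ceil(slack * len(reference)) are
    charged `penalty` each, which counters the incentive to spray extra calls.
    """
    remaining = list(predicted)
    score = 0.0
    for ref_name, ref_args in reference:
        exact = next((c for c in remaining if c == (ref_name, ref_args)), None)
        if exact is not None:
            remaining.remove(exact)
            score += 1.0
            continue
        by_name = next((c for c in remaining if c[0] == ref_name), None)
        if by_name is not None:
            remaining.remove(by_name)
            score += name_credit
    coverage = score / len(reference)
    budget = math.ceil(slack * len(reference))   # budget grows with task complexity
    excess = max(0, len(predicted) - budget)
    return max(0.0, coverage - penalty * excess)
```

A purely recall-based reward would score a trajectory with many redundant calls as highly as a concise one, provided both cover the reference. Here the verbose trajectory loses 0.1 per call over budget.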

2606.03810 2026-06-04 cs.CL cs.AI 版本更新

Consistency Training Can Entrench Misalignment

一致性训练可能固化不对齐

David Demitri Africa, Arathi Mani

发表机构 * UK AI Security Institute(英国人工智能安全研究所)

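AI summary Through experiments with seven consistency training methods on 108 fine-tuned models, finds that consistency training generally suppresses reward hacking and emergent misalignment but amplifies sycophancy, and proposes a unifying theoretical framework to explain its alignment effects.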

Comments Accepted to ICML 2026

详情
AI中文摘要

一致性训练鼓励模型在相关输入或采样过程中产生相似输出。这类方法简单、可扩展且基本无需标签,但其对模型对齐的影响仍知之甚少。这些方法的自引导特性是否会放大模型中的不良行为?我们在108个“模型生物体”(经过微调以展示各种受控不对齐行为的开源模型,7B-70B)上测试了七种一致性训练方法。我们发现结果差异显著:一致性训练通常抑制奖励黑客和新兴不对齐,但会放大谄媚行为。我们提供的证据表明,由一致性标注过程引起的分布偏移(而非选择算子的变化)可能是系统性对齐效应的主要驱动因素。最后,我们提出了一个统一的理论框架,推导出一致性训练放大或抑制不对齐的条件。总之,我们的研究确立了一致性训练并非对齐中立的,其在关键系统中的使用应受到仔细审计。

英文摘要

Consistency training encourages a model to produce similar outputs across related inputs or sampling procedures. Such methods are simple, scalable, and largely label-free, but their effects on model alignment remain poorly understood. Could the self-bootstrapping nature of these methods amplify undesired behavior in models? We test seven consistency training methods on 108 model organisms: open-source models (7B--70B) fine-tuned to exhibit various forms of controlled misaligned behavior. We find that outcomes vary significantly: consistency training generally suppresses reward hacking and emergent misalignment but amplifies sycophancy. We present evidence that distribution shifts induced by the consistency labeling process, rather than variation in the selection operators, may be the primary driver of systematic alignment effects. Finally, we present a unifying theoretical framework to derive conditions under which consistency training will amplify or suppress misalignment. In total, our study establishes that consistency training is not alignment-neutral, and that its use in critical systems should be carefully audited.

2606.03746 2026-06-04 cs.CV cs.AI cs.GR cs.LG 版本更新

Qwen-Image-Flash: Beyond Objective Design

Qwen-Image-Flash:超越目标设计

Tianhe Wu, Kun Yan, Zikai Zhou, Lihan Jiang, Jiahao Li, Jie Zhang, Kaiyuan Gao, Ningyuan Tang, Shengming Yin, Xiaoyue Chen, Xiao Xu, Yilei Chen, Yuxiang Chen, Yan Shu, Yixian Xu, Yanran Zhang, Zihao Liu, Zhendong Wang, Zekai Zhang, Deqing Li, Liang Peng, Yi Wang, Jingren Zhou, Chenfei Wu

发表机构 * alibaba-inc.com(阿里巴巴公司)

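AI summary By systematically studying three factors (data composition, teacher guidance, and task mixture), the paper presents Qwen-Image-Flash and shows that effective few-step distillation requires not only carefully designed objectives but also principled organization of the broader training pipeline.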

详情
AI中文摘要

少步蒸馏已成为加速先进视觉生成模型的有效策略,但先前的工作主要集中在蒸馏目标上。在这项工作中,我们从互补的角度重新审视少步蒸馏,重点关注关键影响学生表现的训练方案。以Qwen-Image-2.0为代表案例,我们系统地研究了统一文本到图像生成和指令引导图像编辑蒸馏中的三个因素:数据组成、教师指导和任务混合。我们的实证分析揭示了若干非直观行为,这些行为推动了Qwen-Image-Flash的开发。总体而言,我们的结果表明,有效的少步蒸馏不仅需要精心设计的目标,还需要对更广泛的训练流程进行原则性组织。

英文摘要

Few-step distillation has become an effective strategy for accelerating advanced visual generative models, yet prior work has largely focused on distillation objectives. In this work, we revisit few-step distillation from a complementary perspective, focusing on the training recipe that critically shapes student performance. Using Qwen-Image-2.0 as a representative case, we systematically investigate three factors in unified text-to-image generation and instruction-guided image editing distillation: data composition, teacher guidance, and task mixture. Our empirical analysis reveals several non-obvious behaviors, which motivate the development of Qwen-Image-Flash. Overall, our results suggest that effective few-step distillation requires not only carefully designed objectives, but also principled organization of the broader training pipeline.

2606.03660 2026-06-04 cs.AI 版本更新

From Answers to States: Verifiable Process-Level Evaluation of Chemical Reasoning in Large Language Models

从答案到状态:大语言模型中化学推理的可验证过程级评估

Hongyu Guo, Hao Li, He Cao, Gongbo Zhang, Li Yuan

发表机构 * Peking University, Shenzhen Graduate School(北京大学深圳研究生院) International Digital Economy Academy (IDEA)(国际数字经济学院)

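AI summary Proposes the ChemCoTBench-V2 benchmark, which verifies structured chemical reasoning steps with deterministic rules and reference traces, revealing the gap between final-answer correctness and the consistency of reasoning states.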

Comments 23 pages, 6 figures, 14 tables

详情
AI中文摘要

大语言模型越来越多地被用作化学助手,然而大多数化学基准仍然只对最终答案评分。这掩盖了一个关键的失败模式:模型可能输出正确的分子、产物或选项,但其推理过程违反了化学逻辑。现有的过程级评估器难以扩展,因为LLM评判者和人工步骤级过程注释成本高、不一致且容易产生幻觉。我们引入了ChemCoTBench-V2,一个规则可验证的诊断基准,用于对结构化、可验证的化学推理轨迹进行低成本、可审计的评估。它涵盖分子理解、分子编辑、分子优化和反应预测,包含18个报告任务中的5620个评估样本。模型必须在专家设计的模板中暴露关键中间步骤,这些步骤通过确定性化学规则进行检查,对于封闭答案任务,还使用参考轨迹而非另一个LLM评判者。开放式的分子优化通过预言机可验证的状态约束而非严格的轨迹匹配进行评估。该基准报告三个独立的信号:最终答案正确性、模板遵循度和基于专家精炼中间步骤的逐步骤验证器正确性。对前沿模型的实验揭示了最终答案成功与结构化推理状态一致性之间的持续差距:模型通常遵循要求的格式但未能通过化学步骤检查,或者正确回答但支持性推理薄弱。ChemCoTBench-V2支持细粒度模型比较,并识别轨迹首次违反验证器的具体步骤。

英文摘要

Large language models are increasingly used as chemistry assistants, yet most chemistry benchmarks still score only final answers. This masks a critical failure mode: a model may output the correct molecule, product, or option while its reasoning violates chemical logic. Existing process-level evaluators are hard to scale because LLM judges and human step-level process annotation are costly, inconsistent, and vulnerable to hallucination. We introduce ChemCoTBench-V2, a rule-verifiable diagnostic benchmark for low-cost, auditable evaluation of structured, verifier-addressable chemical reasoning traces. It spans molecular understanding, molecule editing, molecular optimization, and reaction prediction, with 5,620 evaluation samples across 18 reporting tasks. Models must expose key intermediate steps in expert-designed templates, and those steps are checked with deterministic chemistry rules and, for closed-answer tasks, reference traces rather than another LLM judge. Open-ended molecular optimization is evaluated with oracle-verifiable state constraints rather than strict trace matching. The benchmark reports three separate signals: final-answer correctness, template adherence, and step-wise verifier correctness over expert-refined intermediate commitments. Experiments on frontier models reveal a persistent gap between final-answer success and structured-reasoning-state consistency: models often follow the requested format while failing chemical-step checks, or answer correctly with weak supporting reasoning. ChemCoTBench-V2 enables fine-grained model comparison and identifies the concrete step at which the trace first violates the verifier.
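The three reported signals can be illustrated with a toy evaluator. The sketch assumes a two-field step template and uses atom balance as a stand-in for the benchmark's deterministic chemistry rules; the real templates and verifiers are not described in the abstract, and the formula parser handles only flat formulas without parentheses.

```python
import re
from collections import Counter

REQUIRED_KEYS = {"reactants", "products"}   # hypothetical step template

def atom_counts(formula):
    """Count atoms in a flat formula such as 'C2H6O' (toy parser, no parentheses)."""
    counts = Counter()
    for element, n in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] += int(n) if n else 1
    return counts

def balanced(step):
    """Deterministic rule: atoms on both sides of a step must match."""
    left = sum((atom_counts(f) for f in step["reactants"]), Counter())
    right = sum((atom_counts(f) for f in step["products"]), Counter())
    return left == right

def evaluate(trace, final_answer, reference_answer):
    """Report final-answer correctness, template adherence, and step-wise results."""
    adheres = all(REQUIRED_KEYS.issubset(step) for step in trace)
    first_violation, passed = None, 0
    if adheres:
        for i, step in enumerate(trace):
            if balanced(step):
                passed += 1
            elif first_violation is None:
                first_violation = i     # index of the first step failing the verifier
    return {
        "final_correct": final_answer == reference_answer,
        "template_adherence": adheres,
        "step_accuracy": passed / len(trace) if adheres and trace else 0.0,
        "first_violation": first_violation,
    }
```

A trace can therefore score `final_correct: True` while `first_violation` points at an unbalanced intermediate step, which is the failure mode that scoring final answers alone would miss.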

2606.03631 2026-06-04 cs.LG cs.AI 版本更新

AnchorMoE: Interpretable Time Series Classification via Anchor-Routed MoE

AnchorMoE: 基于锚点路由的混合专家模型实现可解释时间序列分类

Tao Xie, Zexi Tan, Haoyi Xiao, Mengke Li, Yiqun Zhang, Yang Lu, Cuie Yang, Yiu-ming Cheung

发表机构 * School of Automation, Guangdong University of Technology(广东工业大学自动化学院) School of Computer Science and Technology, Guangdong University of Technology(广东工业大学计算机科学与技术学院) College of Computer Science and Software Engineering, Shenzhen University(深圳大学计算机科学与软件工程学院) School of Informatics, Xiamen University(厦门大学信息学院) State Key Laboratory of Synthetical Automation for Process Industries, Northeastern University(东北大学流程工业综合自动化国家重点实验室) Department of Computer Science, Hong Kong Baptist University(香港浸会大学计算机科学系)

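AI summary Proposes the AnchorMoE framework, which uses a Mixture-of-Experts architecture to build multi-view representations of local patches and route them to specialized experts, achieves ante-hoc interpretability through an additive decomposition, and adds a geometric orthogonality constraint and an uncertainty-aware gating mechanism to improve the reliability of the decomposition and suppress noise under sparse signals.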

Comments Accepted by KDD 2026, 12 pages

详情
AI中文摘要

多变量时间序列分类(MTSC)在高风险领域(如临床诊断和工业故障检测)中至关重要,这些领域的安全部署需要透明的决策过程。然而,隔离驱动模型预测的时间段具有挑战性,因为现实世界时间序列中的判别信号通常是稀疏、异构且被背景噪声严重掩盖的。因此,本文提出了AnchorMoE,一种天生可解释的分类框架。基于混合专家(MoE)架构,AnchorMoE编码局部补丁的多视角表示并将其路由到专门专家,确保最终预测被表述为输入段上的精确加性分解,从而促进前向透明度,而非依赖事后估计。为了在稀疏信号分布下保持这种分解的可靠性,我们引入了几何正交约束,惩罚表示冗余,迫使不同专家专门处理异构预测模式。此外,设计了一个不确定性感知的可靠性门控,动态校准每个段的贡献,有效抑制残余背景噪声。在真实世界和合成基准上的大量实验表明,AnchorMoE在实现高度竞争的分类性能的同时,忠实于原始时间序列进行决策。

英文摘要

Multivariate time series classification (MTSC) is pivotal in high-stakes domains, such as clinical diagnosis and industrial fault detection, where safe deployment necessitates transparent decision-making. However, isolating the temporal segments that drive model predictions is challenging because discriminative signals in real-world time series are typically sparse, heterogeneous, and heavily obscured by background noise. This paper, therefore, proposes AnchorMoE, an interpretable-by-construction classification framework. Built upon a Mixture-of-Experts (MoE) architecture, AnchorMoE encodes multi-view representations of local patches and routes them to specialized experts, ensuring that the final prediction is formulated as an exact additive decomposition over the input segments, facilitating ante-hoc transparency rather than relying on post-hoc estimations. To maintain the reliability of this decomposition under sparse signal distributions, we introduce a geometric orthogonality constraint that penalizes representational redundancy, compelling distinct experts to specialize in heterogeneous predictive patterns. Furthermore, an uncertainty-aware reliability gate is designed to dynamically calibrate the contribution of each segment, effectively suppressing residual background noise. Extensive experiments on real-world and synthetic benchmarks demonstrate that AnchorMoE achieves highly competitive classification performance while faithfully grounding its decisions in the raw time series.

2606.03606 2026-06-04 cs.CR cs.AI 版本更新

Testing LLM Arithmetic Reasoning Generalization with Automatic Numeric-Remapping Attacks

测试大语言模型算术推理泛化能力:自动数值重映射攻击

Malia Barker, Bishal Lakha, Edoardo Serra, Francesco Gullo

发表机构 * Department of Computer Science, Boise State University(博伊西州立大学计算机科学系) University of L’Aquila(拉奎拉大学)

AI总结 提出自动数值重映射攻击算法,通过保持推理程序的小数值变化测试LLM算术推理鲁棒性,发现GSM8K上准确率下降12-26个百分点,而MAWPS和MultiArith更稳定。

详情
AI中文摘要

大语言模型在算术推理基准上表现强劲,应对算术脆弱性的一种常见方法是将计算委托给代码。然而,模型仍经常用于需要直接从自然语言推理的场景,可信赖的模型应能解决小数值算术文字题而无需外部工具。先前工作表明,LLM对数值变化敏感:模型可能解决原始问题,但在需要相同推理过程但数字不同的结构相似变体上失败。我们探究这种脆弱性是否在更严格的设置下持续存在,该设置涉及保留原始推理程序并避免大数值压力测试的小规模、模式保持的数值变化。我们引入了一种自动算法,用于生成算术文字题的数值重映射攻击。与需要手动模式或约束的基于模板的扰动方法不同,我们的方法推导问题特定的符号表示,生成受约束的数值重映射,重新计算正确答案,并通过由LLM生成的编辑计划指导的确定性编辑实现变换后的问题。分阶段验证和高置信度审计保留了可靠的攻击,使得流水线在有限人工干预下可扩展。我们在GSM8K、MAWPS和MultiArith上评估了DeepSeek-R1 (70B)、Gemma4 (31B)和GPT-OSS (120B)。在GSM8K上,完成的运行显示条件准确率下降12.16至25.82个百分点。MAWPS和MultiArith则稳定得多,大多数攻击后的准确率接近或高于98%。这些结果表明,数值重映射鲁棒性强烈依赖于数据集结构:即使推理程序被保留且答案被重新计算,GSM8K仍然敏感,而更短、更规则的数据集则更鲁棒。

英文摘要

Large language models achieve strong performance on arithmetic reasoning benchmarks, and one common response to arithmetic brittleness is to delegate computation to code. Yet models are still often used in settings where they must reason directly from natural language, and trustworthy models should solve small-number arithmetic word problems without external tools. Prior work shows that LLMs are sensitive to numerical variation: a model may solve an original problem but fail on structurally similar variants requiring the same reasoning procedure with different numbers. We ask whether this fragility persists under a stricter setting involving small, schema-preserving numeric changes that retain the original reasoning program and avoid large-number stress tests. We introduce an automatic algorithm for generating numeric-remapping attacks on arithmetic word problems. Unlike template-based perturbation methods requiring manual schemas or constraints, our approach derives problem-specific symbolic representations, generates constrained numeric remappings, recomputes gold answers, and realizes transformed questions through deterministic edits guided by LLM-generated edit plans. Stage-wise validation and a high-confidence audit retain reliable attacks, making the pipeline scalable with limited human intervention. We evaluate DeepSeek-R1 (70B), Gemma4 (31B), and GPT-OSS (120B) on GSM8K, MAWPS, and MultiArith. On GSM8K, completed runs show conditional accuracy drops of 12.16 to 25.82 percentage points. MAWPS and MultiArith are far more stable, with most attacked accuracies near or above 98%. These results show that numeric-remapping robustness depends strongly on dataset structure: GSM8K remains sensitive even when reasoning programs are preserved and answers are recomputed, while shorter, more regular datasets are more robust.
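The remapping step described in the abstract can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a word problem has already been reduced to a symbolic program (the hypothetical `program` function below), and the value range and the positivity constraint are illustrative assumptions. It shows only the core idea of drawing small replacement values and recomputing the gold answer from the same program.

```python
import random

def program(apples_per_box, boxes, eaten):
    """Hypothetical symbolic form of a word problem: apples packed minus apples eaten."""
    return apples_per_box * boxes - eaten

def remap(original, rng, low=2, high=20, max_tries=1000):
    """Draw small new values for every quantity and recompute the gold answer.

    The constraint (answer must stay positive) is an assumed example of the
    kind of check a constrained remapping would need.
    """
    for _ in range(max_tries):
        candidate = {name: rng.randint(low, high) for name in original}
        if candidate != original and program(**candidate) > 0:
            return candidate, program(**candidate)
    raise RuntimeError("no valid remapping found")

original = {"apples_per_box": 6, "boxes": 4, "eaten": 5}
rng = random.Random(0)  # fixed seed so the sketch is reproducible
new_values, new_gold = remap(original, rng)
```

Because the same `program` is evaluated on the new values, the reasoning procedure is unchanged and only the numbers differ, which is the property the attack relies on.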

2606.03598 2026-06-04 cs.RO cs.AI cs.CV 版本更新

PHASER: Phase-Aware and Semantic Experience Replay for Vision-Language-Action Models

PHASER: 面向视觉-语言-动作模型的阶段感知与语义经验回放

Ziyang Chen, Shaoguang Wang, Weiyu Guo, Qianyi Cai, He Zhang, Pengteng Li, Yiren Zhao, Yandong Guo

发表机构 * Thrust of AI, HKUST(Guangzhou)(香港科技大学(广州)人工智能学域) AI 2 Robotics, Shenzhen, China(AI 2 Robotics,中国深圳)

AI总结 提出PHASER框架,通过阶段感知容量分配和多模态干扰路由策略,结合自动阶段提取管线Auto-PC,解决VLA模型在持续学习中的灾难性遗忘问题,在LIBERO基准上平均成功率提升高达31%。

Comments 20 pages, 8 figures, 12 tables

详情
AI中文摘要

视觉-语言-动作(VLA)模型在语言条件机器人操作中取得了显著成功。然而,在开放环境中部署这些模型需要持续获取新技能,这一过程不可避免地会引发对先前学习行为的严重灾难性遗忘。虽然经验回放(ER)是一种标准的缓解策略,但简单的均匀采样从根本上与操作轨迹的时间特征不一致。它系统性地欠采样短暂但因果关键的子技能,导致阶段饥饿,并完全忽略了历史任务中不同程度的遗忘。为克服这些限制,我们提出PHASER,一种架构无关的持续学习框架。PHASER采用以阶段为中心的容量分配,确保所有子技能获得平等的记忆支持,并结合多模态干扰路由策略,动态优先处理遗忘风险高的历史阶段。此外,为实现完全自主的终身适应,我们集成了Auto-PC,一种轻量级管线,结合无监督动作信号变化点检测和基于VLM的语义验证,无需大量人工监督即可提取时间边界。在LIBERO持续学习套件上对三个VLA骨干网络的评估表明,PHASER取得了显著的实证改进,与匹配预算的ER相比,平均成功率(ASR)提升高达31%,并在LIBERO-Goal CL设置中达到87.8%的最终ASR。

英文摘要

Vision-Language-Action (VLA) models have achieved remarkable success in language-conditioned robotic manipulation. However, deploying these models in open-ended environments requires continuously acquiring novel skills, a process that inevitably triggers severe catastrophic forgetting of previously learned behaviors. While experience replay (ER) serves as a standard mitigating strategy, naive uniform sampling fundamentally misaligns with the temporal characteristics of manipulation trajectories. It systematically under-samples brief but causally critical sub-skills, leading to phase starvation, and completely overlooks the varying degrees of forgetting across historical tasks. To overcome these limitations, we introduce PHASER, an architecture-agnostic continual learning framework. PHASER employs a phase-centric capacity allocation to guarantee equal memory support for all sub-skills, coupled with a multi-modal interference routing strategy that dynamically prioritizes historical phases at high risk of forgetting. Furthermore, to enable fully autonomous lifelong adaptation, we integrate Auto-PC, a lightweight pipeline combining unsupervised action-signal change-point detection with VLM-based semantic verification to extract temporal boundaries without intensive manual supervision. Evaluated across three VLA backbones on LIBERO continual learning suites, PHASER yields substantial empirical improvements, increasing Average Success Rate (ASR) by up to 31% over matched-budget ER and achieving an 87.8% final ASR on the LIBERO-Goal CL setting.
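The contrast between uniform replay and phase-centric allocation can be sketched as follows. This is a simplified illustration under stated assumptions, not PHASER's implementation: the phase labels, the frame counts, and the equal split of the buffer budget are all hypothetical, and the interference-based prioritization is omitted.

```python
import random
from collections import Counter, defaultdict

def uniform_sample(frames, budget, rng):
    """Baseline: every frame is equally likely, so short phases are rarely kept."""
    return rng.sample(frames, budget)

def phase_balanced_sample(frames, budget, rng):
    """Split the replay budget equally across phases, then sample within each phase."""
    by_phase = defaultdict(list)
    for phase, frame_id in frames:
        by_phase[phase].append((phase, frame_id))
    per_phase = budget // len(by_phase)
    kept = []
    for items in by_phase.values():
        kept.extend(rng.sample(items, min(per_phase, len(items))))
    return kept

# Hypothetical trajectory: a long reach, a very short grasp, a long transport.
frames = (
    [("reach", i) for i in range(90)]
    + [("grasp", i) for i in range(6)]
    + [("transport", i) for i in range(104)]
)
rng = random.Random(0)
uniform = uniform_sample(frames, 12, rng)
balanced = phase_balanced_sample(frames, 12, rng)
balanced_counts = Counter(phase for phase, _ in balanced)
```

In this toy trajectory the grasp phase holds 6 of 200 frames, so uniform sampling keeps on average 0.36 grasp frames out of 12 slots, while the balanced allocation always keeps 4.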

2606.03564 2026-06-04 cs.CV cs.AI 版本更新

CR-Seg: Attention-Guided and CoT-Enhanced Coarse-to-Refined Reasoning Segmentation

CR-Seg:注意力引导与CoT增强的由粗到精推理分割

Yifan Cao, Xiaocui Yang, Faxian Wan, Shi Feng, Daling Wang, Yifei Zhang

发表机构 * School of Computer Science and Engineering, Northeastern University(东北大学计算机科学与工程学院)

AI总结 提出CR-Seg两阶段框架,通过注意力图提取和全局到局部思维链,实现由粗到精的推理分割,解决跨模态对齐和推理-答案不一致问题。

详情
AI中文摘要

推理分割旨在通过联合视觉-文本推理来分割复杂语言描述的目标对象。现有方法通常依赖学习到的语义标记来桥接多模态大语言模型(MLLMs)和分割模型,但面临困难的跨模态对齐问题;或者依赖显式空间提示(如边界框),但可能丢失整体响应语义。为解决这些限制,我们提出注意力引导与CoT增强的由粗到精推理分割(CR-Seg),一个两阶段框架。具体地,我们设计了提取注意力图和点(EAP)模块,用于提取粗目标定位的注意力图并选择信息点,两者都输入SAM进行掩码细化。为缓解推理-答案不一致,我们进一步引入全局到局部思维链(GLCoT),引导模型从全局场景上下文逐步推理到局部目标细节。在推理分割基准上的大量实验证明了CR-Seg的有效性。

英文摘要

Reasoning segmentation aims to segment target objects described by complex language through joint visual-textual reasoning. Existing methods typically rely on either learned semantic tokens to bridge Multimodal Large Language Models (MLLMs) and segmentation models, suffering from difficult cross-modal alignment, or explicit spatial prompts such as bounding boxes, which may lose holistic response semantics. To address these limitations, we propose Attention-Guided and CoT-Enhanced Coarse-to-Refined Reasoning Segmentation, termed CR-Seg, a two-stage framework for coarse-to-refined reasoning segmentation. Specifically, we design an Extract Attention Maps and Points (EAP) module to extract attention maps for coarse target localization and select informative points, both of which are fed into SAM for mask refinement. To alleviate reasoning-answer inconsistency, we further introduce Global-to-Local Chain-of-Thought (GLCoT), which guides the model to reason progressively from global scene context to local target details. Extensive experiments on reasoning segmentation benchmarks demonstrate the effectiveness of CR-Seg.

2606.03376 2026-06-04 cs.CV cs.AI cs.CL cs.LG 版本更新

P²-DPO: Grounding Hallucination in Perceptual Processing via Calibration Direct Preference Optimization

P²-DPO:通过校准直接偏好优化在感知处理中锚定幻觉

Ruipeng Zhang, Zhihao Li, Haozhang Yuan, C. L. Philip Chen, Tong Zhang

发表机构 * Guangdong Provincial Key Laboratory of Computational AI Models and Cognitive Intelligence, School of Computer Science & Engineering, South China University of Technology(广东省计算人工智能模型与认知智能重点实验室,计算机科学与工程学院,华南理工大学) Pazhou Lab, Guangzhou, China(琶洲实验室,广州,中国) Engineering Research Center of the Ministry of Education on Health Intelligent Perception and Paralleled Digital-Human, Guangzhou, China(教育部健康智能感知与并行数字人工程研究中心,广州,中国)

AI总结 针对大型视觉语言模型中的幻觉问题,提出P²-DPO训练范式,通过模型自生成偏好对和校准损失,直接优化感知瓶颈和视觉鲁棒性,无需昂贵人工反馈。

详情
AI中文摘要

幻觉最近在大型视觉语言模型(LVLMs)中引起了广泛的研究关注。直接偏好优化(DPO)旨在直接从人类提供的纠正偏好中学习,从而解决幻觉问题。尽管取得了成功,但这种范式尚未专门针对关注区域中的感知瓶颈或解决图像退化下的视觉鲁棒性不足问题。此外,现有的偏好对通常是视觉无关的,其固有的离策略性质限制了它们在指导模型学习方面的有效性。为了解决这些挑战,我们提出了感知处理直接偏好优化(P²-DPO),一种新颖的训练范式,其中模型生成并学习自己的偏好对,从而直接解决已识别的视觉瓶颈,同时固有地避免视觉无关和离策略数据的问题。它引入了:(1)一种针对焦点增强感知和视觉鲁棒性的在策略偏好对构建方法,以及(2)一种精心设计的校准损失,以精确地将视觉信号与文本的因果生成对齐。实验结果表明,在相当数量的训练数据和成本下,P²-DPO在基准测试中优于依赖昂贵人工反馈的强基线。此外,对注意力区域保真度(ARF)和图像退化场景的评估验证了P²-DPO在解决关注区域感知瓶颈和提高对退化输入的视觉鲁棒性方面的有效性。

英文摘要

Hallucination has recently garnered significant research attention in Large Vision-Language Models (LVLMs). Direct Preference Optimization (DPO) aims to learn directly from the corrected preferences provided by humans, thereby addressing the hallucination issue. Despite its success, this paradigm has yet to specifically target the perceptual bottleneck in attended regions or address insufficient Visual Robustness against image degradation. Furthermore, existing preference pairs are often vision-agnostic, and their inherently off-policy nature limits their effectiveness in guiding model learning. To address these challenges, we propose Perceptual Processing Direct Preference Optimization (P²-DPO), a novel training paradigm in which the model generates and learns from its own preference pairs, thereby directly addressing the identified visual bottlenecks while inherently avoiding the issues of vision-agnostic and off-policy data. It introduces: (1) an on-policy preference-pair construction method targeting Focus-and-Enhance perception and Visual Robustness, and (2) a well-designed Calibration Loss to precisely align visual signals with the causal generation of text. Experimental results demonstrate that, with a comparable amount of training data and cost, P²-DPO outperforms strong baselines that rely on costly human feedback on benchmarks. Furthermore, evaluations on Attention Region Fidelity (ARF) and image degradation scenarios validate the effectiveness of P²-DPO in addressing the perceptual bottleneck in attended regions and improving Visual Robustness against degraded inputs.

2606.03323 2026-06-04 cs.CR cs.AI 版本更新

Implement Kubernetes Pod-Level Remote Attestation for Confidential Workloads on dstack

dstack-capsule:Kubernetes 上机密工作负载的 Pod 级远程证明

Yang Yang, Kevin Wang, Yuanhai Luo, Hang Yin, Jie Cai, Shunfan Zhou, Wenfeng Wang

发表机构 * OPPO Phala

AI总结 提出 dstack-capsule 平台,通过两层证明架构和权限熔断机制,在 Intel TDX 上实现多个 Pod 共享一个机密虚拟机且每个 Pod 保留独立硬件背书身份的 Pod 级远程证明,避免了每 Pod 独立虚拟机的资源开销。

详情
AI中文摘要

LLM即服务和其他机密云工作负载的兴起要求密码学证明用户数据在可信、未被篡改的环境中处理。现有解决方案,特别是机密容器(CoCo),强制执行严格的“每个虚拟机一个Pod”模型,仅证明客户机操作系统栈,留下容器级身份未验证,并导致高昂的每虚拟机资源开销。我们提出dstack-capsule,一个Kubernetes平台,通过允许多个Pod共享单个机密虚拟机,同时每个Pod保留独立的硬件背书身份,在Intel TDX上实现Pod级远程证明。我们的关键见解是两层证明架构:静态平台测量通过不可逆的权限熔断冻结在RTMR[3]中,而动态Pod身份(pod_uid、pod_spec_hash、workload_id)嵌入在TDX Quote的report_data字段中,并在每次请求时由硬件签名。dstack-capsule引入了(1)一个Pod级证明协议,将Pod规范摘要绑定到硬件签名的Quote;(2)一个权限熔断机制,原子地将节点从设置模式转换到安全模式;(3)一个多层沙箱,涵盖存储、运行时、准入、API和网络隔离层;以及(4)一个基于Kubernetes 1.32、Intel TDX和Sysbox的完整开源实现。我们评估了dstack-capsule的安全属性、证明正确性和性能特征,证明它实现了Pod粒度验证,而没有每虚拟机隔离的资源开销。

英文摘要

The rise of LLM-as-a-Service and other confidential cloud workloads demands cryptographic proof that user data is processed in a trusted, untampered environment. Existing solutions, notably Confidential Containers (CoCo), enforce a strict "one Pod per VM" model that attests only the Guest OS stack, leaving container-level identity unverified and incurring prohibitive per-VM resource overhead. We present dstack-capsule, a Kubernetes platform that enables Pod-level remote attestation on Intel TDX by allowing multiple Pods to share a single Confidential VM while each retains independent, hardware-backed proof of identity. Our key insight is a two-layer attestation architecture: static platform measurements are frozen in RTMR[3] via an irreversible privilege fuse, while dynamic Pod identities (pod_uid, pod_spec_hash, workload_id) are embedded in the TDX Quote's report_data field and signed by hardware on every request. dstack-capsule introduces (1) a Pod-level attestation protocol binding Pod spec digests to hardware-signed Quotes; (2) a privilege fuse mechanism that atomically transitions a node from setup mode to secure mode; (3) a multi-layer sandbox spanning storage, runtime, admission, API, and network isolation layers; and (4) a complete open-source implementation based on Kubernetes 1.32, Intel TDX, and Sysbox. We evaluate the security properties, attestation correctness, and performance characteristics of dstack-capsule, demonstrating that it achieves Pod-granularity verification without the resource overhead of per-VM isolation.
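The binding of dynamic Pod identity to the Quote's `report_data` field can be sketched as below. The abstract names the three identity fields but not their encoding, so the canonical-JSON-plus-SHA-512 scheme and both function names here are assumptions made for illustration, not dstack-capsule's actual format. SHA-512 is chosen only because its 64-byte digest fills the 64-byte `report_data` field exactly.

```python
import hashlib
import hmac
import json

REPORT_DATA_LEN = 64  # size in bytes of the report_data field in a TDX report

def pod_report_data(pod_uid: str, pod_spec_hash: str, workload_id: str) -> bytes:
    """Derive a 64-byte commitment to the three Pod identity fields.

    Assumed encoding: canonical JSON (sorted keys, no whitespace) hashed with SHA-512.
    """
    identity = {
        "pod_uid": pod_uid,
        "pod_spec_hash": pod_spec_hash,
        "workload_id": workload_id,
    }
    canonical = json.dumps(identity, sort_keys=True, separators=(",", ":"))
    return hashlib.sha512(canonical.encode("utf-8")).digest()

def matches_expected_pod(report_data: bytes, pod_uid: str,
                         pod_spec_hash: str, workload_id: str) -> bool:
    """Verifier-side check: recompute the commitment and compare in constant time."""
    expected = pod_report_data(pod_uid, pod_spec_hash, workload_id)
    return hmac.compare_digest(report_data, expected)

# Hypothetical identity values, for illustration only.
rd = pod_report_data("pod-1234", "sha256:abcd", "inference-svc")
```

A real verifier would first validate the hardware signature over the Quote and the RTMR[3] platform measurement; this sketch covers only the final comparison of `report_data` against the Pod the client expects.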

2606.03307 2026-06-04 cs.IR cs.AI 版本更新

Generalizing Graph Foundation Models via Hyperbolic Retrieval-Augmented Generation

通过双曲检索增强生成泛化图基础模型

Yifan Jin, Qirui Ji, Bin Qin, Jiangmeng Li, Lixiang Liu, Fuchun Sun, Changwen Zheng

发表机构 * Institute of Software, Chinese Academy of Sciences(中国科学院软件研究所) University of Chinese Academy of Sciences(中国科学院大学) Tsinghua University(清华大学)

AI总结 提出双曲检索增强生成框架,通过双曲空间索引树状外部知识库并多粒度检索,解决图基础模型分布偏移下的泛化问题。

Comments Accepted by KDD2026

详情
AI中文摘要

图基础模型(GFMs)通过利用大规模预训练进行跨领域推理,成为图表示学习中的主导范式。然而,这些模型编码的参数化知识不足以应对分布偏移,限制了其泛化能力。为了缓解这一问题,检索增强生成(RAG)被引入以在推理时融入外部知识。然而,现有在欧几里得空间中运行的RAG框架存在一个基本的几何限制:欧几里得空间的多项式体积增长与树状结构的外部知识库本质上不匹配。这种不匹配导致检索中语义粒度的损失,并产生枢纽效应。为了解决这一限制,我们提出了双曲检索增强生成(HyRAG)框架,旨在增强GFMs的泛化能力。具体来说,引入的双曲知识索引模块通过在双曲空间中建模外部知识库,保留了其树状层次结构。然后,多粒度检索模块通过粗粒度和细粒度知识检索分别为GFMs提供全局语义锚点和局部语义细节。最后,双路径融合模块在特征和结构层面实现了图任务的有效知识整合。在多个图基准上的实验表明,在零样本设置下取得了显著改进,突显了我们的方法在鲁棒GFMs推理中的泛化能力。

英文摘要

Graph foundation models (GFMs) have emerged as a dominant paradigm in graph representation learning by leveraging large-scale pre-training for cross-domain inference. However, the parameterized knowledge encoded within these models is insufficient to cope with distribution shifts, limiting their generalization ability. To mitigate this issue, retrieval-augmented generation (RAG) has been introduced to incorporate external knowledge at inference time. Nevertheless, existing RAG frameworks operating in Euclidean space suffer from a fundamental geometric limitation: the polynomial volume growth of Euclidean space is inherently mismatched with the tree-structured external knowledge bases. This mismatch leads to the loss of semantic granularity in retrieval and gives rise to the hubness phenomenon. To address this limitation, we propose a Hyperbolic Retrieval-Augmented Generation (HyRAG) framework designed to enhance the generalization capabilities of GFMs. Specifically, the introduced Hyperbolic Knowledge Indexing module retains the tree-like hierarchies of the external knowledge base by modeling them within hyperbolic space. The Multi-granularity Retrieval module then provides GFMs with the global semantic anchors and local semantic nuances through coarse-grained and fine-grained knowledge retrieval, respectively. Finally, the Dual-path Fusion module achieves effective knowledge integration for graph tasks at both the feature and structural levels. Experiments on multiple graph benchmarks demonstrate significant improvements in the zero-shot setting, highlighting the generalization of our method for robust GFMs inference.
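The geometric argument can be made concrete with the distance function of the Poincaré ball, one standard model of hyperbolic space. The abstract does not say which model HyRAG uses, so the choice of the Poincaré ball here is an assumption, and the sample points are illustrative. The sketch shows that the same Euclidean gap corresponds to a much larger hyperbolic distance near the boundary, which is where the leaves of an embedded tree typically sit.

```python
import math

def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    def sq_norm(x):
        return sum(c * c for c in x)
    diff = sq_norm([a - b for a, b in zip(u, v)])
    denom = (1.0 - sq_norm(u)) * (1.0 - sq_norm(v))
    return math.acosh(1.0 + 2.0 * diff / denom)

# Two pairs separated by the same angle (90 degrees) at different radii.
d_near_origin = poincare_distance((0.1, 0.0), (0.0, 0.1))
d_near_boundary = poincare_distance((0.9, 0.0), (0.0, 0.9))

# Distance from the origin to radius r has the closed form ln((1 + r) / (1 - r)).
d_radial = poincare_distance((0.0, 0.0), (0.5, 0.0))
```

The Euclidean gap grows by a factor of 9 between the two pairs, while the hyperbolic distance grows by a considerably larger factor, leaving room to keep sibling subtrees apart.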

2606.03303 2026-06-04 cs.AI 版本更新

LEAP: Supercharging LLMs for Formal Mathematics with Agentic Frameworks

LEAP:利用智能体框架增强形式化数学的大语言模型

Po-Nien Kung, Linfeng Song, Dawsen Hwang, Jinsung Yoon, Chun-Liang Li, Simone Severini, Mirek Olšák, Edward Lockhart, Quoc V Le, Burak Gokturk, Thang Luong, Tomas Pfister, Nanyun Peng

发表机构 * Google Cloud AI Research(谷歌云人工智能研究) Google Cloud(谷歌云) Google DeepMind(谷歌DeepMind)

AI总结 提出LEAP智能体框架,通过分解问题、与Lean编译器交互及自我优化,使通用大模型在形式化定理证明上达到最先进性能,并在Putnam竞赛和Lean-IMO-Bench上超越专业系统。

详情
AI中文摘要

大语言模型(LLMs)在非正式数学推理中表现强劲,但在生成如Lean等形式语言中可机械验证的证明方面存在困难。我们提出LEAP,一个智能体框架,使通用基础模型在自动化形式定理证明上达到最先进性能。LEAP利用基础模型的能力,如非正式推理、指令遵循和迭代自我优化。通过将复杂问题分解为更小的单元,该系统通过与Lean编译器的持续交互,将形式化证明构建与非正式蓝图连接起来。为了在日益饱和的基准之外提供严格评估,我们引入了Lean-IMO-Bench,一个用Lean形式化的IMO风格问题基准,其陈述简短但证明高度非常规且多步,涵盖广泛难度级别。实验上,在最新2025年Putnam竞赛(北美本科生年度数学竞赛)中,LEAP解决了所有12个问题,匹配了前沿形式化数学模型的最新突破。在Lean-IMO-Bench上,LEAP将通用LLM的一次性形式化解决率从低于10%提升至70%,显著超过了由专业金牌级IMO系统设定的48%基准。此外,我们通过自主形式化开放组合挑战的复杂证明,包括Knuth偶阶Cayley图哈密顿分解中关键子问题的验证证明,展示了LEAP的研究级实用性。

英文摘要

Large Language Models (LLMs) exhibit strong informal mathematical reasoning but struggle to generate mechanically verifiable proofs in formal languages like Lean. We present LEAP, an agentic framework that enables general-purpose foundation models to achieve state-of-the-art performance on automated formal theorem proving. LEAP leverages foundation model capabilities, such as informal reasoning, instruction following, and iterative self-refinement. By decomposing complex problems into smaller units, the system bridges formal proof construction with informal blueprints through continuous interaction with the Lean compiler. To provide a rigorous evaluation beyond increasingly saturated benchmarks, we introduce Lean-IMO-Bench, a benchmark of IMO-style problems formalized in Lean, with short statements yet highly non-routine and multi-step proofs across a wide range of difficulty levels. Empirically, on the latest 2025 Putnam Competition, an annual mathematics competition for undergraduate students in North America, LEAP solves all 12 problems, matching recent breakthroughs by frontier formal mathematical models. On Lean-IMO-Bench, LEAP boosts the one-shot formal solve rate of general-purpose LLMs from below 10% to 70%, notably surpassing the 48% benchmark set by a specialized, gold-medal-caliber IMO system. Furthermore, we demonstrate LEAP's research-level utility by autonomously formalizing complex proofs for open combinatorial challenges, including a verified proof for a key subproblem in Knuth's Hamiltonian decomposition of even-order Cayley graphs.

2606.03201 2026-06-04 cs.CV cs.AI 版本更新

Reinforcement Learning from Cross-domain Videos with Video Prediction Model

基于视频预测模型的跨领域视频强化学习

Zhao Yang, Xinrui Zu, Jacob E. Kooi, Thomas Delliaux, He Liu, Shujian Yu, Kevin Sebastian Luck, Vincent François-Lavet

发表机构 * VU Amsterdam(阿姆斯特丹自由大学) ISAE-SUPAERO

AI总结 提出XIPER奖励模型,通过跨领域视频预测将智能体观测映射到专家域,利用预测似然作为奖励信号,解决视觉差异域中无奖励信号和领域差距问题。

详情
AI中文摘要

由于缺乏奖励信号以及存在领域差距,从视觉上截然不同的领域的专家视频中进行强化学习具有挑战性。我们引入了XIPER(跨领域视频预测奖励),这是一种奖励模型,用于从视觉不同领域收集的专家视频中进行学习,其中智能体的外观因颜色、形态或仿真到现实差距等因素而不同。更具体地说,XIPER训练了一个跨领域视频预测模型,将智能体观测映射到专家领域,并使用预测似然作为奖励信号。在DMC Color Suite(8个任务)和DMC Body Suite(3个任务)上的实验表明,尽管存在智能体颜色和形态等领域的差距,XIPER始终优于基线方法。我们进一步在仿真到现实迁移数据集上分析了XIPER,证明它仅凭模拟专家视频就能为真实机器人观测产生有意义的奖励信号。代码、预训练模型、数据集和视频演示可在我们的项目网页上找到:https://sites.google.com/view/xiper

英文摘要

Reinforcement learning from expert videos across visually distinct domains is challenging due to the absence of reward signals and the presence of domain gaps. We introduce XIPER (Cross-domain Video Prediction Reward), a reward model for learning from expert videos collected in a visually different domain, where the agent's appearance differs due to factors such as color, morphology, or the sim-to-real gap. More specifically, XIPER trains a cross-domain video prediction model that maps agent observations into the expert domain and uses the prediction likelihood as a reward signal. Experiments on the DMC Color Suite (8 tasks) and DMC Body Suite (3 tasks) show that XIPER consistently outperforms baselines despite domain gaps such as differences in agent color and morphology. We further analyze XIPER on a sim-to-real transfer dataset, demonstrating that it produces meaningful reward signals for real-robot observations given only simulated expert videos. Code, pretrained models, datasets and video demonstrations can be found on our project webpage: https://sites.google.com/view/xiper

2606.02914 2026-06-04 cs.AI cs.CL 版本更新

Large AI Models in Dental Healthcare: From General-Purpose Systems to Domain-Specific Foundation Models

牙科医疗中的大型AI模型:从通用系统到领域特定基础模型

Sema Helali, Lina Abu Nada, Sausan Al Kawas, Alaa Abd-Alrazaq, Faleh Tamimi, Rafat Damseh

发表机构 * University of Al Ain, UAE(阿联酋艾因大学) Sharjah University, UAE(阿联酋沙迦大学) Cornell University, Qatar(卡塔尔康奈尔大学) McGill University(麦吉尔大学)

AI总结 本文通过系统综述,提出二维分类框架,比较语言生成模型、判别视觉基础模型和牙科特定基础模型在牙科任务中的表现,发现集成管道优于单一模型,并指出数据不对称、幻觉和缺乏标准化基准等障碍。

详情
AI中文摘要

背景:口腔疾病影响全球近35亿人,但大规模AI模型在牙科中的临床潜力尚不明确。出现了三类不同的模型:语言生成模型、判别视觉基础模型和牙科特定基础模型,目前缺乏统一综述来审视它们的关系和共同局限性。方法:遵循PRISMA-ScR指南,系统检索四个数据库(PubMed、Google Scholar、Scopus、arXiv),由两名评审员独立筛选。应用纳入/排除标准后,纳入97项研究(2020-2026年)。我们提出了一个二维分类框架,按架构范式和牙科专业化程度对模型进行组织。结果:语言生成模型在基于文本的任务(临床推理、执照考试、患者沟通)中表现出色,但在依赖图像的诊断中表现不一致。改编的SAM和CLIP变体在牙齿分割和病变检测中取得了强劲结果。牙科特定模型(DentVFM、DentVLM、OralGPT)在复杂多模态任务中表现最强。集成管道始终优于单一模型方法。观察到数据不对称:牙科特定预训练几乎完全集中在视觉领域,反映了大规模牙科文本语料库的稀缺。结论:通用模型和牙科特定模型发挥互补作用;最有效的系统在结构化管道中结合两者。安全自主部署需要解决三个持续障碍:生成模型中的幻觉、有限的标注牙科数据集以及缺乏标准化的临床评估基准。

英文摘要

Background: Oral diseases affect nearly 3.5 billion people worldwide, yet the comparative clinical potential of large-scale AI models in dentistry remains poorly understood. Three distinct model categories have emerged: language-generative models, discriminative vision foundation models, and dental-specific foundation models, with no unified review examining their relationships and collective limitations. Methods: Following PRISMA-ScR guidelines, we systematically searched four databases (PubMed, Google Scholar, Scopus, arXiv), screened independently by two reviewers. After applying inclusion/exclusion criteria, 97 studies (2020-2026) were included. We propose a two-dimensional classification framework organizing models by architectural paradigm and dental specialization degree. Results: Language-generative models excel at text-based tasks (clinical reasoning, licensing exams, patient communication) but show inconsistent performance on image-dependent diagnostics. Adapted SAM and CLIP variants achieve strong tooth segmentation and lesion detection results. Dental-specific models (DentVFM, DentVLM, OralGPT) demonstrate strongest performance on complex multimodal tasks. Integrated pipelines consistently outperform single-model approaches. A data asymmetry is observed: dental-specific pretraining concentrates almost entirely in the vision domain, reflecting scarce large-scale dental text corpora. Conclusions: General-purpose and dental-specific models play complementary roles; the most effective systems combine both within structured pipelines. Safe autonomous deployment requires resolving three persistent barriers: hallucination in generative models, limited annotated dental datasets, and absent standardized clinical evaluation benchmarks.

2606.02886 2026-06-04 cs.LG cs.AI cs.CE math.PR physics.ao-ph 版本更新

Scalable Uncertainty Quantification for Extreme Weather Forecasting via Empirical Neural Tangent Kernels

基于经验神经正切核的极端天气预报可扩展不确定性量化

Jose Marie Antonio Miñoza, Rex Gregor Laylo, Sebastian C. Ibañez

发表机构 * Center for AI Research(人工智能研究中心) Department of Education(教育部) Makati Philippines(菲律宾马卡蒂)

AI总结 本文提出基于神经正切核的不确定性量化方法,利用最后一层经验特征,通过方差崩溃机制和分解性能分析,实现无需重训练的极端天气自适应预测区间。

Comments Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '26)

详情
AI中文摘要

深度学习天气模型现在匹配数值天气预报的准确性,同时运行速度快几个数量级,但产生确定性预测而没有不确定性估计,这对于极端天气事件期间的高风险决策是一个关键差距。本文提出基于神经正切核的不确定性量化(NTK-UQ),使用最后一层经验特征。理论分析预测,UQ质量通过两种机制依赖于架构。首先,方差崩溃机制解释了UQ何时失败:当特征值截断秩接近特征空间的有效秩时,GP校正项消耗几乎所有的先验方差,破坏了热带气旋与常规条件之间的区分;具有集中谱(谱算子)的架构需要激进截断(k≤10),而基于注意力的模型容忍满秩计算。其次,分解性能取决于极端天气的非高斯、重尾结构:独立成分分析利用高阶统计量(峰度、负熵)来隔离重尾极端事件特征,实现了比仅捕获二阶方差的奇异值分解更高的区分度。一个数据驱动的选择规则根据特征谱集中比选择ICA或SVD,正确地为所有四种评估架构指定了更优的分解。与分裂共形预测(自然的后验基线)相比,NTK-UQ在90%覆盖率下实现了31-37%更窄的预测区间,并且独特地产生随极端事件严重程度缩放的自适应区间,而共形预测无法通过构造实现。该框架无需重训练;推理时的不确定性每个样本仅需一次矩阵-向量乘积。

英文摘要

Deep learning weather models now match numerical weather prediction accuracy while running orders of magnitude faster, but produce deterministic forecasts without uncertainty estimates, a critical gap for high-stakes decisions during extreme weather events. This paper proposes Neural Tangent Kernel-based uncertainty quantification (NTK-UQ) using last-layer empirical features. Theoretical analysis predicts that UQ quality is architecture-dependent through two mechanisms. First, a variance collapse mechanism explains when UQ fails: when the eigenvalue truncation rank approaches the effective rank of the feature space, the GP correction term consumes nearly all prior variance, destroying discrimination between tropical cyclones and routine conditions; architectures with concentrated spectra (spectral operators) require aggressive truncation (k ≤ 10), while attention-based models tolerate full-rank computation. Second, decomposition performance depends on the non-Gaussian, heavy-tailed structure of extreme weather: Independent Component Analysis exploits higher-order statistics (kurtosis, negentropy) to isolate heavy-tailed extreme-event features, achieving higher discrimination than singular value decomposition, which captures only second-order variance. A data-driven selection rule chooses ICA or SVD from the feature eigenspectrum concentration ratio, correctly prescribing the superior decomposition for all four evaluated architectures. Compared to split conformal prediction (the natural post-hoc baseline), NTK-UQ achieves 31-37% sharper prediction intervals at 90% coverage, and uniquely produces adaptive intervals that scale with extreme event severity, which conformal prediction cannot achieve by construction. The framework requires no retraining; inference-time uncertainty requires only a single matrix-vector product per sample.
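The remark that split conformal prediction cannot adapt by construction is easiest to see in code. Below is a minimal sketch of the standard split conformal procedure with absolute-residual scores, which is the baseline technique and not NTK-UQ itself; the calibration data are made up for illustration.

```python
import math

def conformal_halfwidth(cal_preds, cal_targets, alpha=0.1):
    """Half-width q such that [pred - q, pred + q] targets 1 - alpha coverage."""
    scores = sorted(abs(y - p) for p, y in zip(cal_preds, cal_targets))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # finite-sample corrected quantile rank
    if rank > n:
        return float("inf")  # too few calibration points for this alpha
    return scores[rank - 1]

def interval(pred, halfwidth):
    """Prediction interval centred on the point forecast."""
    return (pred - halfwidth, pred + halfwidth)

# Hypothetical calibration set: forecasts of 0 with residuals 1..10.
cal_preds = [0] * 10
cal_targets = list(range(1, 11))
q = conformal_halfwidth(cal_preds, cal_targets, alpha=0.1)

calm = interval(5, q)      # routine conditions
extreme = interval(80, q)  # extreme event: same width as the calm case
```

The half-width `q` is a single number fixed at calibration time, so every forecast receives an interval of identical width regardless of how unusual the input is.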

2606.02636 2026-06-04 cs.RO cs.AI 版本更新

Too Much of a Good Thing: When sim2real Efforts Impede Policy Learning (And What to Do About It)

过犹不及:当 sim2real 努力阻碍策略学习(以及如何应对)

Kyle Morgenstein, Bharath Masetty, Stephen Welch, Luis Sentis

发表机构 * Apptronik University of Texas at Austin(得克萨斯大学奥斯汀分校)

AI总结 本文指出 sim2real 努力与策略学习之间存在激励错位,导致模拟器锁定和策略探索不足,并提出通过 sim2sim2real 范式仅以机器人运动学为设计约束的潜在解决方案。

详情
AI中文摘要

虽然 sim2real 努力对于有效将策略迁移到硬件上是必要的,但过犹不及。我们认为,sim2real 努力导致了与策略学习的激励错位,由于现实世界施加的不合理约束,导致模拟器锁定和策略探索不足。我们对当前问题状态进行了诊断和解释,并提出了一种潜在解决方案,即通过 sim2sim2real 范式,仅以机器人的运动学作为唯一设计约束。

英文摘要

While sim2real efforts are necessary for effective policy transfer to hardware, there is such a thing as too much of a good thing. We argue that sim2real efforts have led to misaligned incentives with policy learning, resulting in simulator lock-in and poor policy exploration due to the unreasonable constraints imposed by the real world. We offer a diagnosis and explanation of the current status of the problem, and propose a potential solution via a sim2sim2real paradigm that leverages the robot's kinematics as the sole design constraint.

2606.03554 2026-06-04 cond-mat.stat-mech cs.AI nlin.AO physics.comp-ph 版本更新

Constraint-Enhanced Physical Search through Correlation Matching

通过相关性匹配的约束增强物理搜索

Song-Ju Kim

发表机构 * SOBIN Institute LLC(SOBIN研究所)

AI总结 本文提出约束增强物理搜索原理,通过将探索的时间相关性与约束诱导的空间相关性匹配,利用最小拉锯战赌博机模型(TOW)证明守恒律将局部观测转化为跨选项的差异证据,而时间相关驱动控制探索顺序,从而提升搜索效率。

Comments 13 pages, 4 figures

详情
AI中文摘要

物理系统不仅为搜索过程添加噪声,还施加约束,从而产生结构化相关性。我们提出一个约束增强物理搜索原理,其中探索的时间相关性与更新动力学中约束诱导的空间相关性相匹配。使用一个最小拉锯战赌博机模型(TOW),我们证明守恒律将局部观测转化为跨选项的差异证据,而时间相关驱动控制探索顺序。搜索效率的提升不是通过更强的随机性或最大反相关性,而是通过将时间相关性与将反馈转化为证据的物理更新尺度相匹配。一个标度估计识别出更新噪声与对比度之比是限制时间反相关性使用程度的主要参数。结果提示了物理搜索的一个一般组织原则:约束和涨落可以产生结构化的时空相关性,当这些相关性与更新动力学相匹配时,高效探索就会出现。

英文摘要

Physical systems do not merely add noise to search processes; they impose constraints that generate structured correlations. We propose a principle of constraint-enhanced physical search in which temporal correlations in exploration are matched to constraint-induced spatial correlations in the update dynamics. Using a minimal tug-of-war bandit model (TOW), we show that a conservation law converts local observations into differential evidence across alternatives, while a temporally correlated drive controls the order of exploration. Search efficiency is improved not by stronger randomness or by maximal anti-correlation, but by matching the temporal correlation to the physical update scale that converts feedback into evidence. A scaling estimate identifies the update-noise-to-contrast ratio as the leading parameter that limits how strongly temporal anti-correlation can be used. The results suggest a general organizing principle for physical search: constraints and fluctuations can generate structured spatiotemporal correlations, and efficient exploration emerges when these correlations are matched to the update dynamics.

2606.02403 2026-06-04 cs.CL cs.AI 版本更新

AutoForest: Automatically Generating Forest Plots from Biomedical Studies with End-to-End Evidence Extraction and Synthesis

AutoForest: 从生物医学研究中自动生成森林图,实现端到端的证据提取与综合

Massimiliano Pronesti, Angelo Miculescu, Mohsin Kapdi, Paul Flanagan, Oisín Redmond, Joao Bettencourt-Silva, Gurdeep Mannu, Spiros Denaxas, Rui Bebiano Da Providencia E Costa, Anya Belz, Yufang Hou

发表机构 * IBM Research(IBM研究院) Dublin City University(都柏林城市大学) UCL(伦敦大学学院) University of Oxford(牛津大学) IT:U Interdisciplinary Transformation University Austria(奥地利跨学科转型大学)

AI总结 提出AutoForest系统,通过端到端的证据提取与统计综合,直接从生物医学论文自动生成可发表的森林图,加速证据综合并降低元分析门槛。

Comments Accepted to ACL2026 (System Demonstrations Track)

详情
AI中文摘要

系统评价依赖森林图来综合生物医学研究中的定量证据,但生成森林图仍然是一个碎片化且劳动密集型的过程。研究人员必须解读复杂的临床文本,手动从试验中提取结果数据,定义适当的干预措施和对照,协调不一致的研究设计,并执行元分析计算——通常需要使用需要结构化输入和领域专业知识的专门软件。虽然最近的研究表明,大型语言模型可以从非结构化文本中提取研究级数据,但现有系统没有自动化从原始文档到综合森林图的完整流程。为了解决这一差距,我们引入了AutoForest,这是第一个端到端系统,可以直接从生物医学论文生成可发表的森林图。给定一篇或多篇研究论文,AutoForest自动建议ICO(干预、对照、结果)元素,提取结果数据,执行统计综合,并渲染最终的森林图。我们描述了系统架构、用户界面,并通过一项涉及临床医生的用户研究,展示了其在真实世界示例上的有效性,表明AutoForest可以加速证据综合并大幅降低进行元分析的门槛。

英文摘要

Systematic reviews rely on forest plots to synthesise quantitative evidence across biomedical studies, but generating them remains a fragmented and labour-intensive process. Researchers must interpret complex clinical texts, manually extract outcome data from trials, define appropriate interventions and comparators, harmonise inconsistent study designs, and carry out meta-analytic computations, typically using specialised software that demands structured inputs and domain expertise. While recent work has demonstrated that large language models can extract study-level data from unstructured text, no existing system automates the complete pipeline from raw documents to synthesised forest plots. To address this gap, we introduce AutoForest, the first end-to-end system that generates publication-ready forest plots directly from biomedical papers. Given one or more study papers, AutoForest automatically suggests ICO (Intervention, Comparator, Outcome) elements, extracts outcome data, performs statistical synthesis, and renders the final forest plot. We describe the system architecture and user interface, and demonstrate its effectiveness on real-world examples through a user study involving clinicians, showing how AutoForest can accelerate evidence synthesis and substantially lower the barrier to conducting meta-analyses.
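The statistical synthesis step behind a forest plot can be sketched with the classic inverse-variance fixed-effect model. The abstract does not state which meta-analytic model AutoForest applies, so this choice, the function name, and the example effect sizes are assumptions for illustration only.

```python
import math

def pool_fixed_effect(effects, std_errors, z=1.96):
    """Inverse-variance weighted pooled effect with an approximate 95% interval.

    effects: per-study effect estimates on an additive scale (e.g. log risk ratios)
    std_errors: the matching standard errors
    """
    weights = [1.0 / se ** 2 for se in std_errors]  # precise studies weigh more
    total = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, effects)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se, (pooled - z * pooled_se, pooled + z * pooled_se)

# Hypothetical extracted study data: two trials with equal precision.
pooled, pooled_se, ci = pool_fixed_effect([0.2, 0.4], [0.1, 0.1])
```

Each row of a forest plot shows one study's estimate and interval, and the summary diamond at the bottom corresponds to `pooled` and `ci`. A random-effects model would add a between-study variance term to each weight.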

2606.01961 2026-06-04 cs.AI 版本更新

AutoMedBench: Towards Medical AutoResearch with Agentic AI Models

AutoMedBench: 迈向基于智能体AI模型的医学自动研究

Junqi Liu, Selena Song, Yuhan Wang, Jiawei Mao, Hardy Chen, Xiaoke Huang, Tianhao Qi, Pengfei Guo, Yucheng Tang, Yufan He, Can Zhao, Andriy Myronenko, Dong Yang, Daguang Xu, Yuyin Zhou

发表机构 * University of California, Santa Cruz(加州大学圣克鲁兹分校) NVIDIA

AI总结 提出AutoMedBench,一个工作流感知的基准,通过五阶段工作流(计划、设置、验证、推理、提交)评估自主智能体在医学AI研究中的行为,发现验证阶段最弱而设置阶段最强,验证和提交失败占主导。

详情
AI中文摘要

自主智能体越来越被期望支持端到端的医学AI研究工作流程,超越孤立的预测任务或短形式的临床问答。然而,现有的医学智能体基准主要评估最终输出,对研究过程中智能体行为的可见性有限。为填补这一空白,我们提出了AutoMedBench,一个工作流感知的基准,用于跨多种医学成像和多模态推理任务的自主医学AI研究,将智能体执行组织成统一的五阶段工作流(S1-S5):计划、设置、验证、推理和提交。它包含长时域任务,每次运行平均33个智能体回合,涵盖五个研究轨道:分割、图像增强、视觉问答(VQA)、报告生成和病变检测。每个任务在两种难度级别(Lite和Standard)下评估,它们使用相同的数据和指标,但在任务简报脚手架的数量上有所不同,每次运行使用最终任务性能和S1-S5阶段得分进行评分,从而实现从初始任务简报到最后提交工件的阶段级分析。在数千次记录运行中,阶段级评分显示,验证是平均最弱的工作流阶段,而设置是最强的,这表明当前智能体更擅长使流程可执行,而不是验证其可靠性。运行后错误分析进一步显示,验证和提交失败主导了标记错误,分别占触发代码的37.7%和38.1%,而任务理解错误很少,占0.9%,并且触发一个错误代码的运行平均总体得分比无错误代码的运行低48%。

英文摘要

Autonomous agents are increasingly expected to support end-to-end medical-AI research workflows, moving beyond isolated prediction tasks or short-form clinical question answering. However, existing medical agent benchmarks primarily evaluate final outputs, providing limited visibility into agent behavior within the research process. To address this gap, we present AutoMedBench, a workflow-aware benchmark for autonomous medical-AI research across diverse medical imaging and multimodal inference tasks, organizing agent execution into a unified five-stage workflow (S1-S5): Plan, Setup, Validate, Inference, and Submit. It comprises long-horizon tasks with each run averaging 33 agent turns, spanning five research tracks: segmentation, image enhancement, visual question answering (VQA), report generation, and lesion detection. Each task is evaluated under two difficulty tiers, Lite and Standard, which use the same data and metrics but differ in the amount of task-brief scaffolding, and each run is scored using both final task performance and S1-S5 stage scores, enabling stage-level analysis from the initial task brief to the final submitted artifact. Across thousands of recorded runs, stage-level scoring reveals that Validate is the weakest workflow stage on average, whereas Setup is the strongest, suggesting that current agents are better at making pipelines executable than at verifying their reliability. Post-run error analysis further shows that verification and submission failures dominate tagged errors, accounting for 37.7% and 38.1% of fired codes respectively, whereas task-understanding errors are rare at 0.9%, and runs with one fired error code have a 48% lower overall score than runs with no error code on average.

2606.01770 2026-06-04 cs.LG cs.AI 版本更新

Adaptive Auto-Harness: Sustained Self-Improvement for Agentic System Deployment on Open-Ended Task Streams

自适应自动框架:面向开放式任务流的智能体系统部署的持续自我改进

Zewen Liu, Zhan Shi, Yisi Sang, Bing He, Minhua Lin, Tianxin Wei, Dakuo Wang, Benoit Dumoulin, Wei Jin, Hanqing Lu

发表机构 * Emory University(埃默里大学) Amazon(亚马逊) The Pennsylvania State University(宾夕法尼亚州立大学) UIUC(伊利诺伊大学香槟分校) Northeastern University(东北大学)

AI总结 提出自适应自动框架(Adaptive Auto-Harness),通过状态化多智能体进化器、带求解时路由的框架树和人工引导机制,解决开放式任务流中自动框架性能退化问题,在多个流上超越现有基线。

详情
AI中文摘要

自动框架系统(如A-Evolve、GEPA和Meta-Harness)通过从执行反馈中优化提示、技能、工具、记忆和支持基础设施来改进LLM智能体,但它们通常在固定的离线基准上进行评估。实际部署中呈现的是开放式任务流:历史记录无固定终点增长,异构任务需要不同的框架,问题分布随时间变化。这些挑战使得单一反复密集更新的框架变得脆弱,导致性能退化,准确率早期达到峰值后下降。这激发了具有任务自适应性的持续框架构建。我们引入了自适应自动框架(Adaptive Auto-Harness),一个针对此类流的框架和系统。该框架将到 oracle 框架的差距分解为进化损失和适应损失。系统通过状态化多智能体进化器、带求解时路由的框架树以及针对历史缺乏所需信号情况的人工引导钩子来解决这些损失。在预测市场、安全竞赛和事件预测流中,自适应自动框架优于五个现有的自动框架基线,消融实验将收益归因于更好的构建、路由或针对性的人工引导。代码可在 https://github.com/A-EVO-Lab/AdaptiveHarness 获取。

英文摘要

Auto-harness systems such as A-Evolve, GEPA, and Meta-Harness improve LLM agents by optimizing prompts, skills, tools, memories, and supporting infrastructure from execution feedback, but they are typically evaluated on fixed offline benchmarks. Real deployments instead present open-ended task streams: histories grow without a fixed endpoint, heterogeneous tasks require different harnesses, and problem distributions shift over time. These challenges make a single repeatedly and densely updated harness brittle, causing performance degradation as accuracy peaks early and then declines. This motivates sustained harness construction with task-wise adaptation. We introduce Adaptive Auto-Harness, a framework and system for such streams. The framework decomposes the gap to an oracle harness into evolution loss and adaptation loss. The system addresses these losses with a stateful multi-agent evolver, a harness tree with solve-time routing, and human-steering hooks for cases where history lacks the needed signal. Across prediction-market, security-competition, and event-forecasting streams, Adaptive Auto-Harness outperforms five existing auto-harness baselines, and ablations attribute gains to better construction, routing, or targeted human steering. Code is available at https://github.com/A-EVO-Lab/a-evolve/tree/release/adaptive-auto-harness.

2606.01212 2026-06-04 cs.CL cs.AI cs.CR cs.IR 版本更新

DiscourseFlip: An Oblique Discourse-Level Opinion Manipulation Attack against Black-box Retrieval-Augmented Generation

DiscourseFlip: 面向黑盒检索增强生成的非直述式语篇级观点操纵攻击

Yuyang Gong, Miaokun Chen, Jiawei Liu, Zhuo Chen, Guoxiu He, Wei Lu, XiaoFeng Wang, Xiaozhong Liu

发表机构 * Wuhan University(武汉大学) East China Normal University(华东师范大学) Nanyang Technological University(南洋理工大学) Worcester Polytechnic Institute(伍斯特理工学院)

AI总结 提出一种基于图引导的代理攻击方法DiscourseFlip,通过语义查询网络中的协同影响在有限预算下最大化语篇级观点偏差,实验证明其有效性和隐蔽性,并揭示现有防御的不足。

详情
AI中文摘要

检索增强生成(RAG)系统被广泛部署且影响力日益增强,但其对外部语料库的依赖暴露了来自中毒检索内容的新安全风险。现有的RAG攻击主要关注单个查询或狭窄主题局部查询集,这限制了其实际影响范围,并在现实场景中提供有限的伪装。在本文中,我们引入了语篇级观点操纵,这是一种新的威胁模型,其中跨语义查询网络的协同影响会在整体、多主题查询空间上诱导观点转变。我们在黑盒设置中形式化了这种威胁,并提出了DiscourseFlip,一种基于代理的、图引导的攻击,动态分配有限的中毒预算以最大化语篇级观点偏差。大量实验表明,DiscourseFlip在上下文化查询网络上持续诱导目标观点转变,并在覆盖范围和有效性方面显著优于现有基线。用户研究进一步证实,DiscourseFlip有效且能很好地伪装以躲避用户检测。此外,系统分析表明,现有的缓解策略对语篇级操纵无效,这凸显了迫切需要更鲁棒和自适应的防御措施来应对语篇级漏洞。

英文摘要

Retrieval-Augmented Generation (RAG) systems are widely deployed and increasingly influential, but their reliance on external corpora exposes new security risks from poisoned retrieval content. Existing RAG attacks largely focus on individual queries or narrow topic-local query sets, which limits their practical reach and offers limited camouflage in real-world settings. In this paper, we introduce discourse-level opinion manipulation, a new threat model in which coordinated influence across a semantic query network induces opinion shifts over a holistic, multi-topic query space. We formalize this threat in a black-box setting and propose DiscourseFlip, an agentic, graph-guided attack that dynamically allocates a limited poisoning budget to maximize discourse-level opinion deviation. Extensive experiments demonstrate that DiscourseFlip consistently induces targeted opinion shifts across the contextualized query network and significantly outperforms existing baselines in terms of coverage and effectiveness. User studies further confirm that DiscourseFlip is effective while remaining well camouflaged from user detection. Moreover, systematic analyses show that existing mitigation strategies are ineffective against discourse-level manipulation, underscoring the urgent need for more robust and adaptive defenses to address discourse-level vulnerabilities.

2606.01138 2026-06-04 cs.CR cs.AI cs.DC 版本更新

memorywire: A Vendor-Neutral Wire Format for Agent Memory Operations

memorywire:一种用于智能体内存操作的供应商中立线格式

Thamilvendhan Munirathinam

发表机构 * Independent Researcher(独立研究者)

AI总结 提出一种基于JSON-Schema 2020-12的供应商中立线格式memorywire,支持五种内存操作和四种内存类型,通过参考实现和基准测试验证其性能与兼容性。

Comments v2: title corrected from pre-launch name "AMP" to "memorywire"; abstract clarifies recall@5 = 1.000 is on the 42 gold-id queries (50 total; 8 no-match probes excluded). 17 pages, 1 figure, 6 tables. Code: github.com/mthamil107/memorywire. Companion to arXiv:2604.18248 (Prompt Injection Detection)

详情
AI中文摘要

智能体内存框架——mem0、Letta/MemGPT、Cognee、Zep/Graphiti、MemoryOS、MemTensor——各自提供自己的SDK、存储布局和操作词汇。没有共享的线格式:每次集成都是定制的,每次迁移都从头重建内存,并且没有框架提供治理界面,让人类在写入进入长期存储之前进行审查。我们提出memorywire,一种基于JSON-Schema 2020-12的线格式,支持五种内存操作(记住、回忆、遗忘、合并、过期)和四种内存类型(语义、情景、程序、情感),并包含一个MemoryStore接口、一个扇出路由器以及一个可选的人机回环治理通道。我们描述了一个开源参考实现,包含五个后端适配器(sqlite-vec、mem0、Letta、Cognee、pgvector);一个基于100个事实/50个查询的标注语料库的微基准测试,在42个标注查询上实现了recall@5=1.000,摄入p50=37.8毫秒,回忆p50=40.6毫秒;一个对抗融合实验显示,在1-of-N秩0注入扫描(K∈{0,5,...,50})中,倒数秩融合保持recall@5=1.000,而最大融合在K≥5时下降至0.500,泄露率达80%;以及一个16场景跨适配器一致性测试套件,80个单元中通过68个,零失败。贡献不在于新算法,而在于将现有组件(RRF、FSM、STM/LTM整合、差异与批准工作流)打包成一个供应商中立的协议,并附有经验验证的参考实现,旨在与模型上下文协议协作而非竞争。

英文摘要

Agent-memory frameworks -- mem0, Letta/MemGPT, Cognee, Zep/Graphiti, MemoryOS, MemTensor -- each ship their own SDK, storage layout, and operational vocabulary. There is no shared wire format: every integration is bespoke, every migration rebuilds memory from scratch, and no framework ships a governance surface that lets a human review writes before they enter long-term storage. We present memorywire, a JSON-Schema 2020-12 wire format for five memory operations (remember, recall, forget, merge, expire) over four memory types (semantic, episodic, procedural, emotional), with a MemoryStore interface, a fan-out router, and an optional HITL governance channel. We describe an open-source reference implementation with five backend adapters (sqlite-vec, mem0, Letta, Cognee, pgvector); a microbenchmark on a 100-fact / 50-query labelled corpus (42 with non-empty gold ids + 8 no-match probes) achieving recall@5 = 1.000 on the 42 gold-id queries with ingest p50 = 37.8 ms and recall p50 = 40.6 ms; an adversarial-fusion experiment showing Reciprocal Rank Fusion holds recall@5 = 1.000 across a 1-of-N rank-0 injection sweep (K in {0, 5, ..., 50}) where max fusion collapses to 0.500 with 80% leak at K >= 5; and a 16-scenario cross-adapter conformance suite passing 68 of 80 cells with zero failures. The contribution is not a new algorithm; it is a packaging of established components (RRF, FSMs, STM/LTM consolidation, diff-and-approve workflows) into a vendor-neutral protocol with an empirically validated reference, positioned to compose with the Model Context Protocol rather than compete with it.
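
To see why Reciprocal Rank Fusion (RRF) can resist a single backend that injects an item at rank 0 while max fusion does not, a minimal sketch may help. It is not the memorywire reference implementation: the function names, the smoothing constant k=60 (a common default for RRF), and the toy data are assumptions made for illustration.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Reciprocal Rank Fusion: score(d) = sum over lists of 1 / (k + rank), rank from 1."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, item_id in enumerate(ranking, start=1):
            scores[item_id] += 1.0 / (k + rank)
    # Sort by descending score, then by id for a deterministic order.
    return sorted(scores, key=lambda item_id: (-scores[item_id], item_id))


def max_fuse(scored_lists):
    """Max fusion: each item keeps its single best raw score across all lists."""
    best = {}
    for scored in scored_lists:
        for item_id, score in scored:
            best[item_id] = max(score, best.get(item_id, float("-inf")))
    return sorted(best, key=lambda item_id: (-best[item_id], item_id))


# Three honest backends agree; one hypothetical compromised backend puts "x" on top.
honest = [("a", 0.90), ("b", 0.80), ("c", 0.70)]
poisoned = [("x", 0.99), ("a", 0.90), ("b", 0.80)]
scored_lists = [honest, honest, honest, poisoned]
rankings = [[item_id for item_id, _ in scored] for scored in scored_lists]

print(rrf_fuse(rankings))      # "x" is outvoted by the three agreeing lists
print(max_fuse(scored_lists))  # "x" wins on one inflated score alone
```

RRF only looks at rank positions and sums them across lists, so one list cannot outvote several agreeing ones; max fusion trusts the largest raw score, which a single backend controls.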

2606.01023 2026-06-04 cs.CV cs.AI 版本更新

Data Collection for Training Quality-Control AI in Carpet Manufacturing

地毯制造中用于训练质量控制AI的数据收集

Akbar Erkinov

发表机构 * Independent Researcher(独立研究者)

AI总结 针对地毯生产中视觉检测慢、主观且不一致的问题,提出一种在线机器视觉系统设计,通过同步线扫描相机和组合照明实时检测缺陷,并系统收集标注数据以持续训练质量控制模型,最终通过DMAIC方法量化质量改进。

Comments 10 pages, 3 figures

详情
AI中文摘要

视觉检测仍然是机织和簇绒地毯生产中主要的质量控制实践,但在现代织机的线速度和宽度下,它缓慢、主观且不一致。我们提出了一种在线机器视觉系统的设计方案,其主要目的有两个:实时检测地毯幅面,以及同样重要的是,系统地收集和标注缺陷图案的图像,以便在设备使用寿命内训练日益强大的质量控制模型。该方案基于一个具体的工业环境:在一个机织地毯生产设施中进行的六西格玛(DMAIC)项目,该项目预计在增加织机后会出现生产瓶颈,且基线缺陷率较高,质量故障带来的财务风险显著。我们描述了一个基于同步线扫描相机并组合明场和掠射照明的成像子系统,推导了在多米宽幅面上分辨细微结构缺陷所需的分辨率和吞吐量要求,并定义了地毯特定的缺陷分类。然后,我们提出了一种分阶段建模策略,从基于无缺陷材料的无监督异常检测开始,遵循MVTec异常检测基准中地毯类别的范例,并通过人在环的标注飞轮成熟为有监督的检测和分割模型。最后,我们将检测性能与DMAIC目标联系起来,展示逃逸缺陷的减少如何转化为过程质量和过程西格玛水平的提升。贡献在于提供了一个端到端、可部署的蓝图,将数据收集视为首要工程目标而非事后考虑。

英文摘要

Visual inspection remains the dominant quality-control practice in woven and tufted carpet production, yet it is slow, subjective, and inconsistent at the line speeds and widths of modern looms. We present a design proposal for an in-line machine-vision system whose primary purpose is twofold: to inspect the carpet web in real time and, equally importantly, to systematically collect and label images of defect patterns so that increasingly capable quality-control models can be trained over the life of the installation. The proposal is grounded in a concrete industrial setting: a Six Sigma (DMAIC) project at a woven-carpet production facility that anticipated a production bottleneck following the installation of additional weaving machines, with a substantial baseline defect rate and significant financial exposure associated with quality failures. We describe an imaging subsystem based on synchronized line-scan cameras with combined bright-field and grazing illumination, derive the resolution and throughput requirements needed to resolve fine structural defects across a multi-metre web, and define a carpet-specific defect taxonomy. We then lay out a staged modelling strategy that begins with unsupervised anomaly detection trained on defect-free material, following the paradigm exemplified by the carpet category of the MVTec Anomaly Detection benchmark, and matures through a human-in-the-loop annotation flywheel into supervised detection and segmentation models. Finally, we connect detection performance to the DMAIC objectives, showing how reductions in escaped defects translate into improved process quality and process sigma levels. The contribution is an end-to-end, deployable blueprint that treats data collection as a first-class engineering objective rather than an afterthought.
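
The link between escaped defects and process sigma level can be made concrete with the conventional Six Sigma calculation: convert the defect count to defects per million opportunities (DPMO), then to a sigma level using the customary 1.5-sigma long-term shift. This is a sketch of that standard convention, not of the paper's own analysis; the function name and the figures in the example are hypothetical.

```python
from statistics import NormalDist


def process_sigma(defects, opportunities, shift=1.5):
    """Return (DPMO, sigma level) for a defect count over a number of opportunities.

    The sigma level is the standard-normal quantile of the yield plus the
    conventional 1.5-sigma shift used in Six Sigma tables.
    """
    if opportunities <= 0 or not 0 < defects < opportunities:
        raise ValueError("defects must be strictly between 0 and opportunities")
    dpmo = 1_000_000 * defects / opportunities
    z = NormalDist().inv_cdf(1.0 - dpmo / 1_000_000)
    return dpmo, z + shift


# Hypothetical before/after figures for escaped defects per inspected unit.
before = process_sigma(defects=66_807, opportunities=1_000_000)
after = process_sigma(defects=6_210, opportunities=1_000_000)
print(before, after)  # the sigma level rises as escaped defects fall
```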

2606.00747 2026-06-04 cs.CV cs.AI 版本更新

SkyShield: Occupancy as a Safety Interface for Low-Altitude UAV Autonomy

SkyShield:占用作为低空无人机自主飞行的安全接口

Jie Gao, Jie Ma, Kaihui Lin, Kai Ye, Miaohui Zhang, Pingyang Dai, Liujuan Cao

发表机构 * Xiamen University(厦门大学) Jiangxi Academy of Sciences(江西省科学院)

AI总结 针对低空无人机自主飞行中的三维空间理解问题,提出首个前视单目语义占用基准SkyShield、动态感知度量KAR-mIoU和几何优先基线SkyOcc,将占用作为安全接口。

详情
AI中文摘要

对于低空无人机自主飞行,三维空间理解不仅仅是感知目标,更是人类指令与物理飞行之间的安全接口。在20米以下的人尺度城市空域中,薄几何结构、遮挡、植被和城市杂乱决定了飞行器能否安全进入前方空间。然而,现有的无人机数据集主要提供2D标注或3D框,而面向驾驶的占用基准假设稳定的地面级传感器装置。两者都缺少低空飞行的定义性场景:一个前视单目相机从移动的飞行器上观察占据和自由空间,具有逐帧变化的6自由度姿态和相机外参。为填补这一空白,我们提出了SkyShield,据我们所知,这是首个面向20米以下城市无人机飞行的前视单目语义占用基准。基于CARLA构建,SkyShield包含36K个前视无人机样本,涵盖多种城市场景和天气条件,每张图像配以逐帧6自由度无人机姿态、逐帧动态相机几何、无人机状态和前视截锥体语义占用标签。我们进一步提出了KAR-mIoU,一种以无人机为中心且动态感知的度量,通过运动可达性和碰撞时间重新加权体素级评估,揭示传统mIoU隐藏的安全关键风险。为应对这一具有挑战性的新场景,我们提供了SkyOcc,一种几何优先的单目基线,将逐帧无人机姿态集成到投影中,融合时序占用特征,并应用安全先验优化以保留稀疏的碰撞关键结构。SkyShield、KAR-mIoU和SkyOcc共同将占用确立为低空空中自主飞行的安全接口。代码和数据集将公开发布。

英文摘要

For low-altitude Unmanned Aerial Vehicle (UAV) autonomy, 3D spatial understanding is not merely a perception objective, but the safety interface between human instructions and physical flight. In human-scale urban airspace below 20 meters, thin geometry, occlusions, vegetation, and urban clutter define whether an aerial agent can safely enter the space ahead. However, existing UAV datasets mainly provide 2D annotations or 3D boxes, while driving-oriented occupancy benchmarks assume stable ground-level sensor rigs. Both miss the defining regime of low-altitude flight: a front-facing monocular camera observing occupied and free space from a moving aerial body with frame-wise changing 6-DoF pose and camera extrinsics. To bridge this gap, we introduce SkyShield, to the best of our knowledge the first front-view monocular semantic occupancy benchmark for urban UAV flight below 20 meters. Built on CARLA, SkyShield contains 36K front-view UAV samples across diverse urban scenes and weather conditions, pairing each image with frame-wise 6-DoF UAV pose, frame-wise dynamic camera geometry, UAV states, and front-frustum semantic occupancy labels. We further propose KAR-mIoU, a UAV-centric and dynamics-aware metric that re-weights voxel-level evaluation by kinematic reachability and time-to-collision, revealing safety-critical risks hidden by conventional mIoU. To tackle this challenging new setting, we provide SkyOcc, a geometry-first monocular baseline that integrates frame-wise UAV attitude into projection, fuses temporal occupancy features, and applies safety-prior optimization to preserve sparse collision-critical structures. Together, SkyShield, KAR-mIoU, and SkyOcc establish occupancy as a safety interface for low-altitude aerial autonomy. Code and dataset will be released publicly.

2606.00732 2026-06-04 cs.AI cs.LG 版本更新

SHARP: Sleep-based Hierarchical Accelerated Replay for Long Range Non-Stationary Temporal Pattern Recognition

SHARP: 基于睡眠的分层加速重放用于长程非平稳时间模式识别

Jayanta Dey, Shikhar Srivastava, Itamar Lerner, Christopher Kanan, Dhireesha Kudithipudi

发表机构 * Department of Computer Engineering, University of Texas at San Antonio, USA(德克萨斯大学圣安东尼奥分校计算机工程系) Department of Computer Science, University of Rochester, USA(罗切斯特大学计算机科学系) Department of Psychology, University of Texas at San Antonio, USA(德克萨斯大学圣安东尼奥分校心理学系)

AI总结 提出SHARP框架,通过将时间学习分解为记忆模块和模式识别模块,并引入离线睡眠阶段加速重放时间结构记忆,实现长程非平稳序列模式的高效学习。

详情
AI中文摘要

学习长程非平稳时间模式仍然是现代序列模型的核心挑战,特别是在严格的流式设置中。在这些设置中,数据按顺序到达,必须单次处理,不能同时回顾过去的观测。标准架构,包括循环神经网络和变换器,受到截断时间反向传播或显式输入窗口长度的限制,无法进行长程信用分配。为了解决这些限制,我们提出了SHARP(基于睡眠的分层加速重放),一个将时间学习分解为两个互补组件的框架:一个累积过去输入的结构化历史的记忆模块,以及一个在该记忆上操作的模式识别模块。这种分离通过消除跨多步时间反向传播进行长程信用分配的需求,实现了对非平稳动态的资源高效和计算高效适应。受啮齿动物在慢波睡眠期间观察到的加速重放启发,SHARP引入了离线(睡眠)阶段,其中时间结构的记忆痕迹以加速形式重放并整合到更高层次的记忆表示中,从而改善长程上下文保留。通过受控模拟和消融研究,我们表征了所提出框架的关键属性。在text8和PG-19等基准数据集上,我们证明SHARP通过保留先前见过数据的下一个令牌预测性能,同时继续从当前流中学习并泛化到未来未见数据,改进了循环基线。这些增益得益于其分层结构,该结构以线性时间计算成本实现了指数级增长的有效时间上下文。

英文摘要

Learning long-range non-stationary temporal patterns remains a core challenge for modern sequence models, particularly in strict streaming settings. In these settings, data arrive sequentially and must be processed in a single pass without simultaneously revisiting past observations. Standard architectures, including recurrent neural networks and transformers, are constrained in long-range credit assignment by either the truncated backpropagation-through-time horizon or an explicit input window length. To address these limitations, we propose SHARP (Sleep-based Hierarchical Accelerated Replay), a framework that decomposes temporal learning into two complementary components: a memory module that accumulates a structured history of past inputs, and a pattern-recognition module that operates over this memory. This separation enables resource- and compute-efficient adaptation to non-stationary dynamics by eliminating the need for backpropagation through time across many steps for long-range credit assignment. Inspired by the accelerated replay observed in rodents during slow-wave sleep, SHARP incorporates offline (sleep) phases in which temporally structured memory traces are replayed in an accelerated form and integrated into higher-level memory representations, improving long-range context retention. Through controlled simulations and ablation studies, we characterize the key properties of the proposed framework. On benchmark datasets such as text8 and PG-19, we demonstrate that SHARP improves over recurrent baselines by retaining next-token predictive performance on previously seen data while continuing to learn from the current stream and generalizing to future unseen data. These gains are enabled by its hierarchical structure, which yields an exponentially increasing effective temporal context with only linear-time computational cost.

2606.00012 2026-06-04 cs.CL cs.AI 版本更新

DraDDP: A Multimodal Multi-Party Dialogue Discourse Parsing Dataset

DraDDP:多模态多方对话话语解析数据集

Shannan Liu, Peifeng Li, Yaxin Fan, Qiaoming Zhu

发表机构 * School of Computer Science and Technology, Soochow University(苏州大学计算机科学与技术学院)

AI总结 针对现有研究局限于文本或双方对话的问题,构建了基于美剧的首个公开英文多模态多方对话话语解析数据集DraDDP,并验证了多模态信息在捕捉对话结构和关系类型中的价值。

详情
Journal ref
Findings of the Association for Computational Linguistics (ACL 2026)
AI中文摘要

多方对话话语解析旨在识别对话中话语之间的依赖结构和关系类型。以往的研究大多局限于文本模态或双方对话,无法满足多模态和多方对话场景。本文基于美国电视剧,构建了首个公开的英文多模态多方对话话语解析数据集DraDDP。该数据集包含495个对话片段,共6,374条话语和9.1小时的并行视频内容,涵盖了丰富的多方交互场景。此外,我们在DraDDP上评估了该任务,并深入分析了不同模态的影响,建立了全面的基准。实验结果表明,多模态信息在捕捉对话结构和关系类型方面具有重要价值。我们将公开发布数据集、标注指南和代码,以促进多模态对话理解的未来研究。

英文摘要

Multi-party dialogue discourse parsing aims to identify dependency structures and relation types between utterances in conversations. Previous studies are mostly limited to the textual modality or to two-party dialogue, and therefore do not cover multimodal, multi-party settings. In this paper, we construct DraDDP, the first publicly available English multimodal dataset for multi-party dialogue discourse parsing, based on American TV dramas. DraDDP contains 495 dialogue segments with 6,374 utterances and 9.1 hours of parallel video content, covering rich multi-party interaction scenarios. Moreover, we establish comprehensive benchmarks by evaluating this task on DraDDP and conducting in-depth analysis on the impact of different modalities. Experimental results demonstrate the value of multimodal information in capturing dialogue structures and relation types. We will publicly release the dataset, annotation guidelines, and code to promote future research in multimodal dialogue understanding.

2605.31483 2026-06-04 cs.CL cs.AI 版本更新

BenHalluEval: A Multi-Task Hallucination Evaluation Framework for Large Language Models on Bengali

BenHalluEval:孟加拉语大语言模型的多任务幻觉评估框架

Shefayat E Shams Adib, Ahmed Alfey Sani, Ekramul Alam Esham, Ajwad Abrar, Ishmam Tashdeed, Md Taukir Azam Chowdhury

发表机构 * Department of Computer Science and Engineering, Islamic University of Technology(伊斯兰科技大学计算机科学与工程系) Department of Computer Science and Engineering, University of California(加州大学计算机科学与工程系)

AI总结 针对孟加拉语大语言模型幻觉评估的空白,提出BenHalluEval框架,涵盖四项任务,构建12000个幻觉候选,并提出双轨校准指标BenHalluScore,揭示模型间幻觉校准的显著差异。

Comments Preprint. Under review

详情
AI中文摘要

尽管孟加拉语是世界上使用人数第六多的语言,但此前尚无工作系统评估大语言模型(LLMs)在孟加拉语上的幻觉。我们提出了BenHalluEval,一个针对孟加拉语的细粒度幻觉评估框架,涵盖四项任务:生成式问答(GQA)、孟加拉语-英语混合问答、摘要和推理。我们利用GPT-5.4从三个现有孟加拉语数据集中构建了12,000个幻觉候选,涵盖十二种任务特定的幻觉类型,并在双轨协议下评估了七个LLM,涵盖推理导向、多语言和孟加拉语中心类别,该协议独立测量真实实例上的假阳性率(轨道A)和幻觉候选上的幻觉检测率(轨道B)。为了同时惩罚两种失败模式并防止均匀响应偏差导致的分数膨胀,我们提出了BenHalluScore,一种双轨校准指标,在模型和任务上范围从7.72%到55.42%,揭示了幻觉校准的显著差异。链式思维提示作为一种缓解策略应用,会改变响应分布,但未能一致改善幻觉判别。BenHalluEval建立了首个针对孟加拉语的专用幻觉基准,并突显了单轨和仅提示评估方法在低资源语言环境中的不足。数据集和代码可在https://anonymous.4open.science/r/BanglaHalluEval-EB77获取。

英文摘要

Despite Bengali being the sixth most spoken language in the world, no prior work has systematically evaluated hallucination in large language models (LLMs) for Bengali. We introduce BenHalluEval, a fine-grained hallucination evaluation framework for Bengali covering four tasks: Generative Question Answering (GQA), Bangla-English Code-Mixed QA, Summarization, and Reasoning. We construct 12,000 hallucinated candidates using GPT-5.4 across twelve task-specific hallucination types, drawn from three existing Bengali datasets, and evaluate seven LLMs spanning reasoning-oriented, multilingual, and Bengali-centric categories under a dual-track protocol that independently measures false-positive rate on ground-truth instances (Track A) and hallucination detection rate on hallucinated candidates (Track B). To jointly penalise both failure modes and prevent inflated scores from uniform response bias, we propose BenHalluScore, a dual-track calibration metric that ranges from 7.72% to 55.42% across models and tasks, revealing substantial variation in hallucination calibration. Chain-of-thought prompting, applied as a mitigation strategy, shifts response distributions without consistently improving hallucination discrimination. BenHalluEval establishes the first dedicated hallucination benchmark for Bengali and highlights the inadequacy of single-track and prompting-only evaluation approaches for low-resource language settings. The dataset and code are available at https://anonymous.4open.science/r/BanglaHalluEval-EB77.

2605.28210 2026-06-04 cs.AI cs.CY cs.HC q-bio.NC 版本更新

The Illusion of Opting in AI-Mediated Consequential Decisions

AI中介的后果性决策中的选择错觉

Eugene Yu Ji

发表机构 * GitHub

AI总结 基于Ullmann-Margalit的选择概念,揭示当前AI系统造成一种“选择错觉”,即看似有意义的后果性选择实则削弱了主体的真正选择能力,并提出通过存在诚实、生态理性和反事实修复三个规范要义来保护和发展元能力。

Comments 11 pages, 1 figure, 2 tables

详情
AI中文摘要

借鉴Ullmann-Margalit的选择概念(变革性、不可逆性、被排除替代方案的阴影),我们表明当前AI系统引发了一个深刻的伦理问题,而现有AI伦理尚未充分捕捉:选择错觉,即个人和群体遭遇看似有意义的后果性选择的欺骗性外观,而成为真正能够选择所需的主体性却被削弱。针对将AI主要视为给定目标优化器的进路,我们认为应通过AI系统是否保护和发展对抗选择错觉的元能力来评估:这种元能力是社会和制度支撑的主体能力,通过它手段和目的得以形成、争论、修订和拥有。这种重新框架对于弱势群体尤为紧迫,当AI中介的路径误导行为和行动时,他们最无力承担选择错觉的成本。我们为AI中介的后果性决策提出三个规范要义:存在诚实,承认预测的局限性;生态理性,将指导置于异质的生活生态中;以及反事实修复,当AI中介的决策路径失败时,承认并修复被排除的替代方案。

英文摘要

Drawing on Ullmann-Margalit's concept of opting (transformative, irrevocable, and shadowed by foreclosed alternatives), we show that current AI systems raise a profound ethical problem that existing AI ethics has not fully captured: the illusion of opting, in which persons and groups encounter the deceptive appearance of meaningful consequential choice while the agency needed to become genuinely capable of choosing is weakened. Against approaches that treat AI primarily as an optimizer of already given ends, we argue that AI systems should be evaluated by whether they protect and cultivate meta-capacity against the illusion of opting: the socially and institutionally scaffolded agentive capacity through which means and ends can be formed, contested, revised, and owned. This reframing is especially urgent for disadvantaged populations, who are least able to absorb the costs of the illusion of opting when AI-mediated pathways misdirect behavior and action. We propose three normative imperatives for AI-mediated consequential decisions: existential honesty, which acknowledges the limits of prediction; ecological rationality, which situates guidance within heterogeneous lived ecologies; and counterfactual reparation, which acknowledges and repairs foreclosed alternatives when AI-mediated decision-making pathways fail.

2605.24358 2026-06-04 cs.LG cs.AI 版本更新

Treatment Effect Estimation with Differentiated Networked Effect on Graph Data

图数据上具有差异化网络效应的处理效应估计

Xiaofeng Lin, Han Bao, Hisashi Kashima

发表机构 * Kyoto University(京都大学) The Institute of Statistical Mathematics(统计数学研究所) Tohoku University(东北大学) RIKEN AIP(理化学研究所AIP)

AI总结 针对图数据中个体处理效应估计受邻居干扰且存在差异化网络效应的问题,提出一种结合部分注意力机制和消息放大器的干扰建模方法,以捕获邻居重要性和规模差异,提升估计精度。

Comments Accepted by the research track of the KDD 2026 conference

详情
AI中文摘要

从观测图数据中估计个体处理效应(ITE)对于商业和医学等领域的决策至关重要。由于干扰的存在,该任务具有挑战性,因为个体结果可能受到其邻居的处理和协变量的影响。现有方法尝试对这种干扰进行建模以实现准确的ITE估计。然而,一个关键问题常常被忽视:差异化网络效应(DNE),即由具有不同重要性和规模的邻居组成的局部网络所产生的影响。捕获DNE至关重要;否则,由于对干扰的错误刻画,我们将得到不精确的ITE估计,从而导致错误的决策。为了解决这一挑战,我们提出了一种新颖的干扰建模机制,该机制结合了两个部分注意力机制和一个消息放大器。部分注意力机制自动估计不同邻居在干扰中的重要性,而消息放大器根据邻居的规模调整干扰建模机制的结果,所有这些使得模型能够捕获DNE。在三个真实世界图上的实验表明,我们的方法在从图数据估计ITE方面优于现有方法,这证实了显式捕获DNE的重要性。

英文摘要

Estimating individual treatment effect (ITE) from observational graph data is crucial for decision-making in fields such as commerce and medicine. This task is challenging due to interference, where individual outcomes can be influenced by the treatments and covariates of their neighbors. Existing methods attempt to model such interference for accurate ITE estimation. However, a critical issue is often overlooked: differentiated networked effect (DNE), an effect caused by local networks consisting of neighbors with varying importance and scales. Capturing DNE is vital; otherwise, we will end up with imprecise ITE estimation due to an erroneous characterization of interference, which can result in misguided decisions. To address this challenge, we propose a novel interference modeling mechanism that incorporates two partial attention mechanisms and a message amplifier. The partial attention mechanisms automatically estimate the importance of different neighbors in contributing to interference, while the message amplifier adjusts the results of the interference modeling mechanism based on the scale of neighbors, all of which enables the model to capture DNE. Experiments on three real-world graphs demonstrate that our methods outperform existing approaches for ITE estimation from graph data, which corroborates the importance of explicitly capturing DNE.

2605.27488 2026-06-04 cs.CR cs.AI 版本更新

Grimlock: Guarding High-Agency Systems with eBPF and Attested Channels

Grimlock: 使用eBPF和认证通道保护高代理系统

Qiancheng Wu, Wenhui Zhang, Gan Fang, Sheng Mao, Biao Gao, David Levitsky, Shawna Murphy Butterworth, Rob Cameron

发表机构 * Roblox

AI总结 针对代理系统中用户编排代码带来的安全挑战,提出Grimlock代理守卫,通过eBPF强制流量拦截和TLS 1.3通道绑定认证,实现透明、可审计、作用域绑定的代理间通信。

Comments Vision paper presented at the 1st Workshop on Operating Systems Design for AI Agents (AgenticOS '26), co-located with ASPLOS 2026

详情
AI中文摘要

代理系统越来越多地运行用户编写的编排代码,这些代码调用工具、生成子任务并在机器和云之间委派工作。虽然这种高代理效率很高,但它带来了安全问题:身份、授权、来源和委派通常被推入应用程序代码,在那里它们变得难以一致地执行和审计。我们提出Grimlock,一种代理守卫,通过将信任执行移动到沙箱子系统中,同时保持代理代码不变,来恢复关注点分离。Grimlock使用eBPF强制流量拦截来确保沙箱通信通过守卫,并将其与绑定到标准TLS 1.3通道绑定的握手后认证相结合。通道建立后,守卫授权通信并生成短暂的、通道绑定的作用域令牌,这些令牌捕获最小权限委派。在接收端,目标守卫重新验证身份、作用域和通道绑定,终止TLS,并仅在策略检查成功后向目标沙箱释放明文。kTLS为受保护的通信提供了高效的数据平面。因此,Grimlock提供了一条路径,使用通用Linux原语,无需更改用户层编排代码,即可在异构多云环境中实现透明、可审计、作用域绑定的代理间通信。

英文摘要

Agentic systems increasingly run user-authored orchestration code that invokes tools, spawns subtasks, and delegates work across machines and clouds. Although this high agency is productive, it creates a security problem: identity, authorization, provenance, and delegation are often pushed into application code, where they become difficult to enforce consistently and difficult to audit. We present Grimlock, an Agent Guard that restores separation of concerns by moving trust enforcement into the sandbox substrate while leaving agent code unchanged. Grimlock uses eBPF-enforced traffic interception to ensure that sandbox communication passes through a guard, and combines it with post-handshake attestation bound to standard TLS 1.3 channel bindings. After a channel is established, the guard authorizes communication and mints short-lived, channel-bound scope tokens that capture least-privilege delegation. At the receiving side, the destination guard re-validates identity, scope, and channel binding, terminates TLS, and releases plaintext to the destination sandbox only after policy checks succeed. kTLS provides an efficient dataplane for protected communication. As a result, Grimlock offers a path toward transparent, auditable, and scope-bound agent-to-agent communication across heterogeneous multi-cloud environments, using commodity Linux primitives and without requiring changes to user-layer orchestration code.

2605.30120 2026-06-04 cs.IR cs.AI cs.LG 版本更新

No More K-means: Single-Stage Sparse Coding for Efficient Multi-Vector Retrieval

不再需要K-means:用于高效多向量检索的单阶段稀疏编码

Lixuan Guo, Yifei Wang, Tiansheng Wen, Aosong Feng, Stefanie Jegelka, Chenyu You

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

AI总结 针对多向量检索中K-means聚类导致的索引延迟和语义损失问题,提出单阶段稀疏检索(SSR),利用稀疏自编码器将词元嵌入投影为高维稀疏表示,结合倒排索引实现高效检索,在BEIR基准上索引时间减少15倍、检索延迟减半且性能提升。

Comments Accepted by ICML2026

详情
AI中文摘要

以ColBERT为代表的多向量检索(MVR)模型通过保留细粒度的词元级交互,在检索准确性上树立了新标杆。然而,这种粒度带来了存储和检索效率的瓶颈:为了管理十亿级词元向量的巨大内存占用和计算开销,最先进的系统被迫依赖激进的降维和复杂的聚类(例如K-means)。这种妥协引入了两个关键限制:大规模语料库聚类的过度索引延迟以及压缩固有的语义信息损失。在本文中,我们提出了单阶段稀疏检索(SSR),这是一种范式转变,用高效的稀疏编码取代了昂贵的聚类。我们不将特征压缩为低维稠密向量,而是利用稀疏自编码器(SAE)将词元嵌入投影到高维但高度稀疏的表示中。这种转换使我们能够完全绕过向量聚类,并利用倒排索引实现精确、高吞吐量的检索。在BEIR基准上的大量实验表明,SSR实现了“三连胜”的改进:与ColBERTv2相比,索引时间减少了15倍,检索延迟减半,同时检索性能优于领先的基线方法。

英文摘要

Multi-vector retrieval (MVR) models, exemplified by ColBERT, have established new benchmarks in retrieval accuracy by preserving fine-grained token-level interactions. However, this granularity imposes prohibitive storage and retrieval efficiency bottlenecks: to manage the immense memory footprint and computational overhead of billion-scale token vectors, state-of-the-art systems are forced to rely on aggressive dimension reduction and complex clustering (e.g., K-means). This compromise introduces two critical limitations: excessive indexing latency from clustering large-scale corpora and semantic information loss inherent to compression. In this paper, we propose Single-stage Sparse Retrieval (SSR), a paradigm shift that replaces expensive clustering with efficient sparse coding. Instead of compressing features into low-dimensional dense vectors, we use a Sparse Autoencoder (SAE) to project token embeddings into a high-dimensional but highly sparse representation. This transformation enables us to bypass vector clustering entirely and leverage inverted indexing for precise, high-throughput retrieval. Extensive experiments on the BEIR benchmark demonstrate that SSR achieves a "trifecta" of improvements: it reduces indexing time by 15x compared to ColBERTv2, halves retrieval latency, and simultaneously improves retrieval performance over leading baselines.
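
The general idea of retrieving over sparse token codes with an inverted index, instead of clustering dense vectors, can be sketched in a few lines. This is a toy illustration rather than the SSR method: the abstract does not specify the SAE architecture or the scoring rule, so the top-k sparsifier, the ColBERT-style MaxSim scoring, and all names below are assumptions.

```python
from collections import defaultdict


def top_k_sparsify(dense, k):
    """Keep the k largest positive activations of a dense vector as {feature: weight}."""
    ranked = sorted(range(len(dense)), key=lambda i: dense[i], reverse=True)[:k]
    return {i: dense[i] for i in ranked if dense[i] > 0.0}


def build_inverted_index(docs):
    """Map each active feature to postings of (doc_id, token_pos, weight).

    docs: {doc_id: [sparse_token_code, ...]}, one sparse code per document token.
    """
    index = defaultdict(list)
    for doc_id, tokens in docs.items():
        for pos, code in enumerate(tokens):
            for feature, weight in code.items():
                index[feature].append((doc_id, pos, weight))
    return index


def search(index, query_tokens):
    """Score documents as the sum, over query tokens, of the best-matching doc token."""
    totals = defaultdict(float)
    for q_code in query_tokens:
        dots = defaultdict(float)  # (doc_id, token_pos) -> sparse dot product with q_code
        for feature, q_weight in q_code.items():
            for doc_id, pos, d_weight in index.get(feature, ()):
                dots[(doc_id, pos)] += q_weight * d_weight
        best = defaultdict(float)  # doc_id -> best token match for this query token
        for (doc_id, _pos), dot in dots.items():
            best[doc_id] = max(best[doc_id], dot)
        for doc_id, score in best.items():
            totals[doc_id] += score
    return sorted(totals.items(), key=lambda item: (-item[1], item[0]))


docs = {
    "d1": [{0: 1.0, 3: 0.5}, {2: 1.0}],
    "d2": [{1: 1.0}, {3: 1.0}],
}
index = build_inverted_index(docs)
print(search(index, [{0: 1.0}, {3: 1.0}]))
```

Only postings for features that are active in the query are touched, which is what lets an inverted index skip most of the corpus without any clustering step.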

2605.29928 2026-06-04 cs.HC cs.AI 版本更新

Label Over Logic? How Source Cues Bias Human Fallacy Judgments More Than LLMs

标签胜过逻辑?源标签如何比LLMs更严重地偏差人类的谬误判断

Mahjabin Nahar, Nafis Irtiza Tripto, Aiping Xiong, Ting-Hao 'Kenneth' Huang, Dongwon Lee

发表机构 * The Pennsylvania State University(宾夕法尼亚州立大学)

AI总结 通过在线实验和LLM对比,发现人类在评估逻辑谬误时显著受到内容源标签(如人类、AI等)的影响,而LLM评估相对稳定,表明源标签偏差主要是人类的弱点。

详情
AI中文摘要

随着AI生成和AI辅助内容充斥在线空间,附加在这些内容上的源标签可能会扭曲人类的推理判断,对审核、评估和决策产生下游影响。LLM是否也存在这种脆弱性,或者能提供更不受源影响的评估,仍然是一个悬而未决的问题,直接影响人机协作。我们使用逻辑谬误作为受控环境来隔离源标签对推理质量的影响,独立于领域知识。我们进行了一项在线研究(N=505),参与者被分配到不同的源条件(人类、AI、人类辅助AI、AI辅助人类或无披露),并评估包含逻辑谬误的评论,将其判断与LLM(GPT-5.2、Gemini 2.5 Flash、Claude Sonnet 4.5)在相同源条件下的评估进行比较。人类评估者显著更容易受到标记为人类或人类辅助AI的谬误的影响,并在这些条件下给予更高的信任和评估评分。LLM评估在不同源标签下相对稳定,但不同模型表现各异。无论是否存在谬误,人类和LLM在所有条件下的置信水平都同样高。我们的发现表明,推理评估中的源标签偏差主要是人类的弱点,并突显了在日益AI中介的环境中人类与LLM协作的潜力。

英文摘要

As AI-generated and AI-assisted content floods online spaces, source labels attached to such content can distort human reasoning judgments, with downstream consequences for moderation, evaluation, and decision-making. Whether LLMs share this vulnerability, or offer more source-agnostic evaluation, remains an open question with direct implications for human-AI collaboration. We examine this issue using logical fallacies as a controlled setting to isolate source-label effects on reasoning quality, independent of domain knowledge. We conduct an online study (N=505) where participants are assigned to a source condition (human, AI, human with AI assistance, AI with human assistance, or no disclosure) and evaluate comments containing logical fallacies, comparing their judgments with those of LLMs (GPT-5.2, Gemini 2.5 Flash, Claude Sonnet 4.5), which were evaluated across the same source conditions. Human evaluators were significantly more susceptible to fallacies labeled as written by a human or by a human with AI assistance and assigned higher trust and evaluation ratings in these conditions. LLM evaluations remained comparatively stable across source labels, though performance varied across models. Confidence levels were similarly high across conditions for both humans and LLMs, regardless of fallacy presence. Our findings indicate that source-label bias in reasoning evaluation is primarily a human vulnerability and highlight the potential of human-LLM collaboration in increasingly AI-mediated environments.

2605.29861 2026-06-04 cs.CL cs.AI 版本更新

Towards Verifiable Multimodal Deep Research: A Multi-Agent Harness for Interleaved Report Generation

迈向可验证的多模态深度研究:用于交错报告生成的多智能体框架

Chenghao Zhang, Guanting Dong, Yufan Liu, Tong Zhao, Xiaoxi Li, Zhicheng Dou

发表机构 * Gaoling School of Artificial Intelligence, Renmin University of China(中国人民大学高瓴人工智能学院)

AI总结 提出多智能体框架Ptah,通过规划、研究和写作阶段生成交错文本与视觉证据的多模态报告,并引入验证器确保事实准确性和跨模态一致性。

Comments In progress

详情
AI中文摘要

大型语言模型(LLMs)已将自主智能体从深度搜索(检索简洁的事实答案)推进到深度研究(将分散的证据综合成长篇报告)。然而,由于缺乏确定性真实值的开放式合成以及需要将文本论证与视觉证据交错,可验证的多模态深度研究仍然具有挑战性。我们提出Ptah,一个用于交错报告生成的多智能体框架。Ptah通过规划、研究和写作阶段编排从用户查询到渲染网页报告的完整生命周期,其中专门智能体构建视觉感知计划、收集基于声明的证据、在视觉工作记忆中维护与源对齐的图像,并通过声明式多模态工具使用撰写报告。验证智能体作为框架的接受函数,在整个工作流中强制执行事实依据、引用保真度和跨模态一致性。我们进一步引入PtahEval,一个评估协议,通过图像级和呈现级评估增强现有基准。在深度研究基准上的实验表明,Ptah生成的面向人类的多模态报告比强基线更可靠、视觉信息更丰富且更实用。

英文摘要

Large Language Models (LLMs) have advanced autonomous agents from deep search, which retrieves concise factual answers, to deep research, which synthesizes scattered evidence into long-form reports. However, verifiable multimodal deep research remains challenging due to open-ended synthesis without deterministic ground truth and the need to interleave textual arguments with visual evidence. We propose Ptah, a multi-agent harness for interleaved report generation. Ptah orchestrates the lifecycle from user query to rendered web report through planning, research, and writing stages, where specialized agents construct visual-aware plans, collect claim-grounded evidence, maintain source-aligned images in a Visual Working Memory, and compose reports through declarative multimodal tool use. A verifier agent serves as the harness's acceptance function, enforcing factual grounding, citation fidelity, and cross-modal consistency throughout the workflow. We further introduce PtahEval, an evaluation protocol that augments existing benchmarks with image-level and presentation-level assessments. Experiments on deep research benchmarks show that Ptah produces more reliable, visually informative, and usable human-facing multimodal reports than strong baselines. Our code is released at https://github.com/SnowNation101/Ptah

2509.23694 2026-06-04 cs.AI cs.CL cs.CR 版本更新

SafeSearch: Automated Red-Teaming of LLM-Based Search Agents

SafeSearch: 基于LLM的搜索代理的自动化红队测试

Jianshuo Dong, Sheng Guo, Hao Wang, Xun Chen, Zhuotao Liu, Tianwei Zhang, Ke Xu, Minlie Huang, Han Qiu

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出SafeSearch自动化红队框架,系统评估基于LLM的搜索代理在五个风险类别中的安全性,发现GPT-4.1-mini在搜索工作流中攻击成功率高达90.5%,且常见防御措施效果有限。

Comments Accepted by ICML 2026

详情
AI中文摘要

搜索代理将LLM连接到互联网,使其能够访问更广泛和更新的信息。然而,这也引入了一个新的威胁面:不可靠的搜索结果可能误导代理产生不安全的输出。现实世界的事件和我们的两个野外观察表明,此类失败在实践中可能发生。为了系统地研究这一威胁,我们提出了SafeSearch,一个可扩展、成本效益高且轻量级的自动化红队框架,支持搜索代理的沙盒安全评估。利用该框架,我们生成了涵盖五个风险类别(例如,错误信息和提示注入)的300个测试用例,并评估了三个搜索代理框架在17个代表性LLM上的表现。我们的结果揭示了基于LLM的搜索代理存在重大漏洞,在搜索工作流设置中,GPT-4.1-mini的最高攻击成功率(ASR)达到90.5%。此外,我们发现常见的防御措施(如提醒提示)提供的保护有限。总体而言,SafeSearch提供了一种实用的方法来衡量和提高基于LLM的搜索代理的安全性。

英文摘要

Search agents connect LLMs to the Internet, enabling them to access broader and more up-to-date information. However, this also introduces a new threat surface: unreliable search results can mislead agents into producing unsafe outputs. Real-world incidents and our two in-the-wild observations show that such failures can occur in practice. To study this threat systematically, we propose SafeSearch, an automated red-teaming framework that is scalable, cost-efficient, and lightweight, enabling sandboxed safety evaluation of search agents. Using this, we generate 300 test cases spanning five risk categories (e.g., misinformation and prompt injection) and evaluate three search agent scaffolds across 17 representative LLMs. Our results reveal substantial vulnerabilities in LLM-based search agents, with the highest ASR reaching 90.5% for GPT-4.1-mini in a search-workflow setting. Moreover, we find that common defenses, such as reminder prompting, offer limited protection. Overall, SafeSearch provides a practical way to measure and improve the safety of LLM-based search agents.

2605.29280 2026-06-04 cs.LG cs.AI cs.IR 版本更新

LoopFM: Learning frOm HistOrical RePresentations of Foundation Model for Recommendation

LoopFM:从基础模型的历史表示中学习用于推荐

Shali Jiang, Hua Zheng, Boyang Liu, Laming Chen, Kenny Lov, Chuanqi Xu, Lisang Ding, Qinghai Zhou, Can Cui, Xiaolong Liu, Xiaoyi Liu, Yasmine Badr, Xin Xu, Jiyan Yang, Ellie Dingqiao Wen, Gerard Jonathan Mugisha Akkerhuis, Chenxiao Guan, Rong Jin, Ruichao Qiu, Xian Chen, Shifu Xu, Zhehui Zhou, Ping Chen, Rui Yang, Haicheng Chen, Xiangge Meng, Song Zhou, Dharak Kharod, Shuyu Xu, Qiang Jin, Qiao Yang, Wankun Zhu, Qin Huang, Yuzhen Huang, Darren Liu, Parish Aggarwal, Hui Zhou, Erzhuo Wang, Shuo Chang, Xiaorui Gan, Wenlin Chen, Santanu Kolay, Huayu Li

发表机构 * Meta

AI总结 针对知识蒸馏中传递标量导致转移率下降的问题,提出LoopFM框架,通过将基础模型的中间嵌入作为输入特征传递给下游垂直模型,实现高带宽知识转移,并在理论和实验中证明其有效性。

Comments Shali Jiang, Hua Zheng, Boyang Liu contributed equally to this work

详情
AI中文摘要

知识蒸馏(KD)将大型基础模型(FM)的单个标量预测传递给紧凑的垂直模型(VM),但由于单个标量无法传达较大FM学习的丰富中间知识,导致转移率(VM捕获的FM改进比例)下降。为了解决这一瓶颈,我们提出了LoopFM(从FM的历史表示中学习),该框架通过将FM中间嵌入结构化为下游VM的输入特征(例如,用户历史序列)来打开高带宽传输通道,无需在服务时进行实时FM推理,也无需FM和VM之间的架构耦合。我们为LoopFM提供了理论框架,包括增益分解和转移率分析。在三个公开基准上,LoopFM展示了强大的AUC改进(例如,在淘宝广告上提高6%以上)以及与KD互补的知识转移能力。在工业规模系统(数十亿样本、万亿参数FM)上,LoopFM在KD基础上将知识转移率大约翻倍,在Y1H1中实现了+0.5%的转化改进,在Y1H2中分别从两次单独发布实现了+1.03%和+1.22%的转化改进。

英文摘要

Knowledge distillation (KD) transfers a single scalar prediction from a large foundation model (FM) to compact vertical models (VMs), suffering from a diminishing transfer ratio -- the fraction of FM improvement captured by the VM -- as a single scalar cannot convey the rich intermediate knowledge that larger FMs learn. To address this bottleneck, we propose LoopFM (Learning frOm HistOrical RePresentations of FM), a framework that opens a high-bandwidth transfer channel by structuring FM intermediate embeddings as input features (e.g., user history sequence) for downstream VMs, without requiring real-time FM inference at serving or architectural coupling between FM and VM. We provide a theoretical framework for LoopFM with a gain decomposition and transfer-ratio analysis. On three public benchmarks, LoopFM demonstrates strong AUC improvements (e.g., 6%+ on TaobaoAd) and complementary knowledge transfer capability with KD. On industrial-scale systems (billions of examples, trillion-parameter FMs), LoopFM approximately doubles the knowledge transfer ratio on top of KD, delivering a +0.5% conversion improvement in the first half after its initial launch, and +1.03% and +1.22% conversion improvement from two individual launches in the subsequent half.

2605.29076 2026-06-04 cs.CL cs.AI cs.LG 版本更新

Structured Prompt Optimization Meets Reinforcement Learning for Global and Local Interpretability over Complex Text

结构化提示优化结合强化学习实现复杂文本的全局与局部可解释性

Tianyang Zhou, Wenbo Chen, Pierre Jinghong Liang, Leman Akoglu

发表机构 * Carnegie Mellon University(卡内基梅隆大学) Amazon(亚马逊)

AI总结 提出eXTC框架,通过结构化提示优化、基于SOP的推理蒸馏和强化学习扩展,在分类性能和解释质量上显著优于现有范式。

详情
AI中文摘要

LLMs在文本分类上取得了进展,但现有范式面临权衡:监督(仅标签)微调可扩展,但对复杂文本推理有限且缺乏模型透明度;离散提示优化提供可读指令,但性能和可扩展性不佳。我们引入eXTC(可解释文本分类器),包含三个渐进阶段:(1)通过新的结构化提示优化算法学习自然语言的标准操作程序(SOP或规则手册);(2)从大型教师LLM到紧凑LM的基于SOP的推理蒸馏;(3)通过强化学习扩展超出初始SOP的推理能力。该设计使eXTC能够(i)通过紧凑LM实现快速推理,(ii)提供推理时的局部推理轨迹,以及其学习领域规则的全局模块化解释,同时(iii)在分类性能和解释质量上显著优于现有范式,并逐步提升。

英文摘要

LLMs have advanced text classification, yet existing paradigms face a trade-off: supervised (label only) fine-tuning is scalable but offers limited reasoning on complex text and lacks broader model transparency, while discrete prompt optimization offers human-readable instructions but struggles with performance and scalability. We introduce eXTC (eXplainable Text Classifier) with three progressive stages: (1) learning a Standard Operating Procedure (SOP, or rulebook) in natural language via a new Structured Prompt Optimization algorithm; (2) SOP-grounded reasoning distillation from a large teacher LLM into a compact LM; and (3) expanding reasoning capabilities beyond the initial SOP via reinforcement learning. This design enables eXTC to provide (i) fast inference via a compact LM, with (ii) inference-time local reasoning traces, alongside a global, modular explanation of its learned domain rules, while (iii) significantly outperforming existing paradigms across diverse benchmarks in both classification performance and explanation quality, with stage-by-stage gains.

2605.28829 2026-06-04 cs.CL cs.AI cs.CY 版本更新

Aryabhata 2: Scaling Reinforcement Learning for Advanced STEM Reasoning

Aryabhata 2:扩展强化学习以提升高级STEM推理能力

Ritvik Rastogi, Vishal Singh, Tejas Chaudhari, Sandeep Varma

发表机构 * PhysicsWallah

AI总结 本文提出Aryabhata 2,一个通过强化学习后训练在竞争性STEM考试中提升推理能力的语言模型,在JEE、NEET等基准上超越基础模型且输出token减少高达64%。

详情
AI中文摘要

竞争性STEM考试(如JEE和NEET)需要多步符号推理、精确数值计算以及物理、化学和数学的深层概念理解。近期的大语言模型在常见推理基准上表现强劲,但仍难以大规模部署,因为数百万学生的疑问需要领域特定且结构一致的问题求解。 我们提出了Aryabhata 2,一个专注于竞争性STEM考试推理的语言模型,通过强化学习后训练进行优化。利用PhysicsWallah的内部题库,我们构建了高质量的训练课程,并通过可验证奖励的强化学习对GPT-OSS-20B进行后训练。训练结合了延长强化学习与通过逐步增大的rollout组大小拓宽探索。 我们在竞争性考试基准(包括JEE Main、JEE Advanced和NEET)以及分布外推理数据集(如AIME、HMMT、MMLU-Pro、MMLU-Redux 2.0和GPQA)上评估了Aryabhata 2。结果表明,Aryabhata 2在竞争性STEM推理上优于其基础模型GPT-OSS-20B,同时所需输出token大幅减少(最多减少64%)。

英文摘要

Competitive STEM examinations such as JEE and NEET require multi-step symbolic reasoning, precise numerical computation, and deep conceptual understanding across physics, chemistry, and mathematics. Recent large language models perform strongly on common reasoning benchmarks, yet they remain difficult to deploy at scale, where millions of student doubts demand domain-specific, consistently structured problem solving. We introduce Aryabhata 2, a reasoning-focused language model for competitive STEM examinations, trained via reinforcement-learning post-training. Using PhysicsWallah's internal question banks, we construct a high-quality training curriculum and post-train GPT-OSS-20B through reinforcement learning with verifiable rewards. Training combines prolonged reinforcement learning with broadened exploration via progressively larger rollout group sizes. We evaluate Aryabhata 2 on competitive examination benchmarks, including JEE Main, JEE Advanced, and NEET, as well as out-of-distribution reasoning datasets such as AIME, HMMT, MMLU-Pro, MMLU-Redux 2.0, and GPQA. Results show that Aryabhata 2 outperforms its base model GPT-OSS-20B on competitive STEM reasoning while requiring substantially fewer output tokens (up to 64% fewer).

2605.25402 2026-06-04 cs.CV cs.AI 版本更新

Anatomy-Anchored Self-Supervision: Distilling Vision Foundation Models for Invariant Ultrasound Representation

解剖锚定的自监督:蒸馏视觉基础模型用于不变超声表示

Chunzheng Zhu, Yijun Wang, Jianxin Lin, Feng Wang, Hongwei Wang, Lei Zhao, Shengli Li, Kenli Li

发表机构 * Hunan University(湖南大学) Shenzhen Maternity and Child Healthcare Hospital(深圳妇幼保健医院)

AI总结 提出解剖锚定的超声自监督框架ANAUS,通过可学习潜在提示引擎和领域自适应实现无标注解剖分割,并设计双策略自监督学习(语义感知解剖分离对齐和上下文核心区域预测)来增强表示学习,在六个公开数据集上超越现有方法。

Comments MICCAI 2026 Accepted Paper; Anatomy-Anchored Ultrasound Self-Supervision

详情
AI中文摘要

自监督预训练范式在医学图像中学习可迁移表示方面日益重要,但现有超声图像方法在图像或帧级别操作,忽略了临床对齐表示学习的解剖上下文。在这项工作中,我们提出了一种解剖锚定的超声自监督框架ANAUS,将表示学习从通用视觉区域转移到临床有意义的解剖结构。利用可学习的潜在提示引擎以及对现有公开图像-掩码对的一次性领域自适应,我们使LP-SAM模块能够大规模实现无标注解剖描绘。基于此解剖基础,我们提出了一种双策略自监督学习范式,包括视图间语义感知的解剖分离对齐和上下文核心区域预测,以增强表示学习。具体而言,前者在相同解剖区域内强制特征不变性,同时促进不同结构间的可区分性;后者迫使模型重建被破坏的区域,从而捕获细粒度的结构细节。在六个公开数据集上的广泛评估表明,我们的方法持续优于当前最先进的方法,同时保持了临床部署所需的计算效率。代码可在https://github.com/zhcz328/ANAUS获取。

英文摘要

The self-supervised pre-training paradigm has gained increasing prominence for learning transferable representations in medical imaging, yet existing methods for ultrasound (US) images operate at the image or frame level, overlooking the anatomical context needed for clinically aligned representation learning. In this work, we propose an anatomy-anchored ultrasound self-supervision framework, ANAUS, that shifts representation learning from generic visual regions to clinically meaningful anatomical structures. Utilizing a learnable latent prompt engine alongside a one-time domain adaptation on existing public image-mask pairs, we empower the LP-SAM module to achieve annotation-free anatomy delineation at scale. Building upon this anatomical grounding, we propose a dual-policy self-supervised learning paradigm consisting of inter-view semantics-aware anatomy-separating alignment and contextual core-region prediction to enhance representation learning. Specifically, the former enforces feature invariance within identical anatomical regions while promoting discriminability across distinct structures; the latter compels the model to reconstruct corrupted regions, thereby capturing fine-grained structural details. Extensive evaluations on six public datasets demonstrate that ANAUS consistently outstrips current state-of-the-art methods while maintaining the computational efficiency essential for clinical deployment. Code is available at https://github.com/zhcz328/ANAUS.

2605.11130 2026-06-04 cs.LG cs.AI 版本更新

HEPA: A Self-Supervised Horizon-Conditioned Event Predictive Architecture for Time Series

HEPA: 一种用于时间序列的自监督水平条件事件预测架构

Jonas Petersen, Gian-Alessandro Lombardi, Riccardo Maggioni, Camilla Mazzoleni, Federico Martelli, Philipp Petersen

发表机构 * ETH Zurich(苏黎世联邦理工学院) Forgis University of Vienna(维也纳大学)

AI总结 提出HEPA架构,通过因果Transformer编码器联合嵌入预测(JEPA)预训练和仅微调预测器生成单调生存累积分布函数,在14个基准测试中超过PatchTST等模型,参数和标注数据量减少一个数量级。

Comments Spotlight at FMSD, ICML 2026. Code: https://github.com/Forgis-Labs/HEPA

详情
AI中文摘要

多变量时间序列中的关键事件,从涡轮机故障到心律失常,需要准确的预测,但由于此类事件罕见且标注成本高,标注数据稀缺。我们引入了HEPA(水平条件事件预测架构),基于两个关键原则。首先,通过联合嵌入预测架构(JEPA)预训练因果Transformer编码器:一个水平条件预测器学习预测未来表示而非未来值,迫使编码器仅从无标注数据中捕获可预测的时间动态。其次,我们冻结编码器,仅微调预测器以预测目标事件,生成随水平单调的生存累积分布函数(CDF)。在所有基准测试中,使用固定的架构和优化器超参数,HEPA处理了水污染、网络攻击检测、波动率制度以及跨11个领域的另外8种事件类型,在14个基准测试中的至少10个上超过了包括PatchTST、iTransformer、MAE和Chronos-2在内的领先时间序列架构,调优参数少一个数量级,并且在生命周期数据集上,标注数据少一个数量级。

英文摘要

Critical events in multivariate time series, from turbine failures to cardiac arrhythmias, demand accurate prediction, yet labeled data is scarce because such events are rare and costly to annotate. We introduce HEPA (Horizon-conditioned Event Predictive Architecture), built on two key principles. First, a causal Transformer encoder is pretrained via a Joint-Embedding Predictive Architecture (JEPA): a horizon-conditioned predictor learns to forecast future representations rather than future values, forcing the encoder to capture predictable temporal dynamics from unlabeled data alone. Second, we freeze the encoder and finetune only the predictor toward the target event, producing a monotonic survival cumulative distribution function (CDF) over horizons. With fixed architecture and optimiser hyperparameters across all benchmarks, HEPA handles water contamination, cyberattack detection, volatility regimes, and eight further event types across 11 domains, exceeding leading time-series architectures including PatchTST, iTransformer, MAE, and Chronos-2 on at least 10 of 14 benchmarks, with an order of magnitude fewer tuned parameters and, on lifecycle datasets, an order of magnitude less labeled data.
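One common way to guarantee a monotonic event CDF over horizons is to predict a per-horizon hazard and accumulate survival. The sketch below assumes that parameterization; the abstract does not state how HEPA enforces monotonicity, so `monotone_event_cdf` and its hazard inputs are hypothetical.

```python
def monotone_event_cdf(hazards):
    """Turn per-horizon hazards in [0, 1] into a non-decreasing event CDF.

    CDF(h) = 1 - prod_{j <= h} (1 - hazard_j). Assumed parameterization,
    not taken from the HEPA paper.
    """
    survival = 1.0
    cdf = []
    for hazard in hazards:
        if not 0.0 <= hazard <= 1.0:
            raise ValueError("each hazard must lie in [0, 1]")
        survival *= 1.0 - hazard  # probability of no event up to this horizon
        cdf.append(1.0 - survival)
    return cdf


# Hazards for four increasing horizons (illustrative values only).
cdf = monotone_event_cdf([0.1, 0.2, 0.0, 0.5])
```

Since each factor `1 - hazard` is at most 1, the survival term can only shrink, so the resulting CDF never decreases regardless of what the predictor outputs.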

2605.09081 2026-06-04 cs.LG cs.AI 版本更新

FactoryNet: A Large-Scale Dataset toward Industrial Time-Series Foundation Models

FactoryNet:面向工业时间序列基础模型的大规模数据集

Karim Othman, Jonas Petersen, Matei Ignuta-Ciuncanu, Camilla Mazzoleni, Federico Martelli, Alessandro Lombardi, Riccardo Maggioni, Philipp Petersen

发表机构 * ETH Zurich(苏黎世联邦理工学院)

AI总结 提出首个工业时间序列通用预训练语料库FactoryNet,通过统一模式实现跨实体零样本迁移和高效异常检测。

Comments Accepted at AI4Physics and FMSD, ICML 2026. Code: https://github.com/Forgis-Labs/FactoryNet

详情
AI中文摘要

我们引入了首个工业时间序列数据的通用预训练语料库:FactoryNet。该数据集包含51M个数据点,涵盖六种实体上的23k个端到端任务执行(13.3k真实,9.8k合成),通过共享模式实现了鲁棒的零样本跨实体迁移和高参数效率的异常检测。我们提出了一种新颖的模式:设定点、努力、反馈、上下文(S-E-F-C),该模式贯穿整个流水线,将任何驱动系统映射到共同的表示框架。该语料库涵盖27种标注的异常类型,以及健康基线和机器人操作与加工领域的反事实对。跨实体迁移实验取得了积极结果:在考虑偏见的指标下,我们的模型在评估的源-目标对上展示了公平的跨实体迁移能力,而24个模式对齐的信号与高维基线相比,实现了有竞争力的异常检测性能。我们发布FactoryNet作为一个不断增长的多实体数据集,以推动工业基础模型的发展。

英文摘要

We introduce the first universal pretraining corpus for industrial time-series data: FactoryNet. It comprises 51M datapoints across 23k end-to-end task executions (13.3k real, 9.8k synthetic) on six embodiments, unified by a shared schema that enables robust zero-shot cross-embodiment transfer and highly parameter-efficient anomaly detection. We introduce a novel schema underlying the whole pipeline, Setpoint, Effort, Feedback, Context (S-E-F-C), that maps any actuated system into a common representational frame. The corpus spans 27 annotated anomaly types alongside healthy baselines and counterfactual pairs across robotic manipulation and machining domains. Cross-embodiment transfer experiments yield positive results: under bias-aware metrics our model demonstrates fair cross-embodiment transfer capabilities on the evaluated source-target pair, while 24 schema-aligned signals achieve competitive anomaly detection performance compared to high-dimensional baselines. We release FactoryNet as a growing, multi-embodiment dataset to drive progress toward industrial foundation models.

2605.24602 2026-06-04 cs.CV cs.AI 版本更新

Correcting Visual Blur Induced by Attention Distraction to Reduce Hallucinations: Algorithm and Theory

纠正注意力分散引起的视觉模糊以减少幻觉:算法与理论

Quanjiang Li, Zhiming Liu, Wei Luo, Tingjin Luo, Chenping Hou

发表机构 * National University of Singapore(新加坡国立大学) University of Science and Technology of China(中国科学技术大学)

AI总结 本文揭示多模态大语言模型中的物体幻觉与类人注意力分散现象相关,并提出一种无需额外训练的注意力聚焦方法(AFIP)通过跨头注意力增强和动态历史注意力强化来纠正视觉模糊,从而减少幻觉。

详情
Journal ref
ICML2026
AI中文摘要

多模态大语言模型(MLLMs)经常遭受物体幻觉的困扰,但导致这一失败的视觉感知机制仍知之甚少。在这项工作中,我们揭示幻觉与一种类人注意力分散现象密切相关,其中人类在注意力分散下会经历视觉清晰度下降并产生不准确的描述,而在模型中,同样的机制表现为解码过程中多头注意力的空间不一致性以及对图像令牌注意力的时间衰减。我们进一步提供了理论见解,表明注意力分散会增加模型复杂度并降低分类泛化能力。受这些发现的启发,我们提出了一种用于改进图像感知的注意力聚焦方法(AFIP),该方法通过跨头注意力丰富来纠正注意力分散,并通过动态历史注意力增强来强化视觉基础。在多个基准和模型上的大量实验验证了AFIP的有效性,且无需额外训练。

英文摘要

Multimodal large language models (MLLMs) frequently suffer from object hallucinations, yet the visual perceptual mechanism underlying this failure remains poorly understood. In this work, we reveal that hallucinations are strongly associated with a human-like attention distraction phenomenon, where humans under divided focus experience degraded visual clarity and produce inaccurate descriptions, while in models the same mechanism manifests as spatial inconsistency in multi-head attention and temporal fading of attention to image tokens during decoding. We further provide theoretical insights that attention dispersion increases model complexity and degrades classification generalization. Motivated by these findings, we propose an Attention-Focused Approach for Improved Image Perception (AFIP), which corrects attention distraction via cross-head attention enrichment and reinforces visual grounding through dynamic historical attention enhancement. Extensive experiments on multiple benchmarks and models validate the effectiveness of AFIP without additional training.

2605.17273 2026-06-04 cs.LG cs.AI 版本更新

Position: State-of-the-Art Claims Require State-of-the-Art Evidence

立场:声称最先进需要最先进的证据

YongKyung Oh

发表机构 * YongKyung Oh(永庆欧)

AI总结 本文指出人工智能和机器学习研究中普遍存在的声称最先进(SOTA)与证据不足之间的差距,通过分析十个跨领域基准测试发现,超过一半的顶级模型比较中至少一项常见的优越性假设不成立,并呼吁声明语言应反映证据强度。

详情
AI中文摘要

最先进(SOTA)声称在人工智能(AI)和机器学习(ML)研究中普遍存在。这些声称基于基准评估,其中模型根据跨任务的总分进行排名。公共基准或排行榜是最明显的实例,但相同的结构也出现在文献中的论文表格中。然而,这种微弱的证据往往无法支持这些强有力的声称。我们识别出AI基准测试中普遍存在的声称-证据差距。声称SOTA隐含着超越平均分数优越性的假设,表明模型在大多数任务上显著优于替代方案。然而,平均分数的边际改进仅表明平均排名靠前,而非真正的优越性。通过分析来自公共排行榜的十个跨领域基准测试,我们发现超过一半的顶级模型比较中,至少一项常见的优越性假设不成立。这些属性包括有意义的效应大小、跨任务的一致性,或对数据集移除的鲁棒性。相反,总分提升往往由异常数据集驱动。即使在任务众多的基准测试中,这种脆弱性仍然存在。我们认为,声称语言应反映潜在证据的强度。这不需要额外的实验,只需诚实地报告结果实际显示的内容,从而实现跨模型更精确和可解释的比较。

英文摘要

State-of-the-Art (SOTA) claims pervade Artificial Intelligence (AI) and Machine Learning (ML) research. These claims rest on benchmark evaluations, where models are ranked by aggregate scores across tasks. Public benchmarks or leaderboards are the most visible instance, but the same structure appears in paper tables throughout the literature. However, such minimal evidence often cannot support these strong claims. We identify a widespread claim-evidence gap in AI benchmarking. Claiming SOTA carries implicit assumptions beyond mean score superiority, suggesting that a model meaningfully outperforms alternatives across most tasks. However, a marginal improvement in the mean score merely indicates a top average rank rather than true superiority. Analyzing ten cross-domain benchmarks from public leaderboards, we found that in more than half of top-model comparisons, at least one commonly assumed property of superiority does not hold. These properties include meaningful effect size, consistency across tasks, or robustness to dataset removal. Instead, aggregate gains are frequently driven by outlier datasets. This fragility persists even in benchmarks with many tasks. We argue that claim language should reflect the strength of the underlying evidence. This requires no additional experiments, only honest reporting of what results actually show, enabling more precise and interpretable comparisons across models.
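The "robustness to dataset removal" property can be checked with a leave-one-dataset-out test on the mean score. The sketch below is one plausible reading of that check, not the paper's exact procedure; the function name and the toy scores are made up for illustration.

```python
def mean(values):
    return sum(values) / len(values)


def leave_one_out_flips(scores_a, scores_b):
    """Return the datasets whose removal changes which model has the higher mean.

    Both arguments map dataset name -> score and must share the same keys.
    """
    names = list(scores_a)
    a_wins_overall = mean([scores_a[n] for n in names]) > mean([scores_b[n] for n in names])
    flips = []
    for held_out in names:
        rest_a = [scores_a[n] for n in names if n != held_out]
        rest_b = [scores_b[n] for n in names if n != held_out]
        if (mean(rest_a) > mean(rest_b)) != a_wins_overall:
            flips.append(held_out)
    return flips


# Model A leads on the mean only because of one outlier dataset ("d3").
model_a = {"d1": 70.0, "d2": 71.0, "d3": 95.0}
model_b = {"d1": 72.0, "d2": 73.0, "d3": 80.0}
fragile_datasets = leave_one_out_flips(model_a, model_b)
```

A non-empty result means the aggregate ranking depends on a single dataset, which is the kind of outlier-driven gain the abstract describes.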

2605.22240 2026-06-04 cs.AI 版本更新

Unlocking Proactivity in Task-Oriented Dialogue

解锁任务导向型对话中的主动性

Azure Zhang, Ning Gao, Yuqin Dai, Ruiyuan Wu, Jinpeng Wang, Rena Wei Gao, Bingdong Tan, Shuzheng Gao, Zongjie Li, Chaozheng Wang

发表机构 * Keeta AI, Meituan(Keeta AI,美团) Independent Researcher(独立研究者) CUHK(香港中文大学) HKUST(香港科技大学)

AI总结 针对任务导向型对话中主动性问题,提出认知用户模拟器和模拟器诱导的非对称视角策略优化,通过建模用户潜在关注实现主动对话。

详情
AI中文摘要

主动任务导向型对话(如外呼销售)需要一个有说服力的代理,能够主动探询用户的关注点,并在有限轮次内引导对话走向接受。然而,后训练的LLM本质上是保守的,而奖励塑造强化学习(如GRPO)效果不佳,因为它仅重新加权被动策略已采样的内容。我们表明,以用户的潜在关注为条件可以解锁任何采样量都无法破坏的主动能力,从而将这些关注确立为关键的训练时信号。为将这一发现付诸实践,我们构建了认知用户模拟器,它将每个用户建模为一个分层角色,包括可观察的外部特征和隐藏的内部关注。该模拟器产生忠实且多样化的交互,同时输出每轮状态动态以跟踪说服进展。然后,我们引入模拟器诱导的非对称视角策略优化,将建模的关注和模拟状态转换转化为互补的训练目标:(1)非对称在线自蒸馏,将关注感知行为从同一策略的特权视角转移到其可部署的、仅对话视角;(2)状态转换策略优化...

英文摘要

Proactive task-oriented dialogue (TOD), such as outbound sales, demands a persuasive agent that actively probes the user's concerns and steers the conversation toward acceptance within a bounded number of turns. Yet post-trained LLMs are inherently conservative, and reward-shaping RL (e.g., GRPO) struggles since it only re-weights what an already passive policy samples. We show that conditioning on the user's latent concerns unlocks proactive capability that no amount of sampling can undermine, establishing these concerns as a pivotal training-time signal. To operationalize this finding, we build the Cognitive User Simulator, which models each user as a stratified persona comprising observable external traits and hidden internal concerns. The simulator produces faithful and diverse interactions, while emitting per-turn state dynamics that track persuasion progress. We then introduce Simulator-Induced Asymmetric-View Policy Optimization, which converts the modeled concerns and the simulation state transition into complementary training objectives: (1) Asymmetric On-Policy Self-Distillation that transfers concern-aware behavior from a privileged view of the same policy into its deployable, conversation-only view; and (2) State-Transition Policy Refinement ...

2605.21446 2026-06-04 cs.RO cs.AI 版本更新

Lost in Fog: Sensor Perturbations Expose Reasoning Fragility in Driving VLAs

迷失在雾中:传感器扰动暴露驾驶VLA的推理脆弱性

Abhinaw Priyadershi, Jelena Frtunikj

发表机构 * NVIDIA Corporation, USA(NVIDIA公司,美国) NVIDIA GmbH, Germany(NVIDIA德国公司)

AI总结 通过受控传感器扰动实验,发现因果链解释的一致性可作为轨迹可靠性的高保真指标,并证明启用因果链生成可提升轨迹精度。

详情
AI中文摘要

可解释的自主驾驶规划器不仅依赖于生成解释,还依赖于这些解释在真实传感器退化下的可靠性。本文对自主驾驶中视觉-语言-动作(VLA)模型的鲁棒性进行了受控扰动研究,评估了Alpamayo R1(10B参数)在八种传感器扰动(四种强度的高斯噪声、两种光照极端条件和两种雾浓度;约18,000次推理试验)下的1,996个场景。我们发现推理一致性是轨迹可靠性的高保真指标:当扰动后因果链(CoC)解释发生变化时,轨迹偏差激增5.3倍(21.8米 vs 4.1米),跨攻击类型的相关系数r=0.99,每样本点双列相关系数r_pb=0.53(Cohen's d=1.12)。受控消融实验表明,在匹配的推理设置下,启用CoC生成与轨迹精度提升相关(平均提升11.8%;p<0.0001)。在测试的噪声范围(σ∈{10,30,50,70})内,退化近似线性(R²=0.957),而标准输入预处理防御仅提供边际缓解。综上,这些结果将CoC一致性确立为规划安全的定量代理,并激励基于推理的运行时监控以实现更安全的VLA部署。

英文摘要

Interpretable autonomous driving planners depend not only on generating explanations, but also on those explanations remaining reliable under real-world sensor degradation. In this paper we present a controlled perturbation study of Vision-Language-Action (VLA) robustness in autonomous driving, evaluating Alpamayo R1 (10B parameters) across 1,996 scenarios under eight sensor perturbations (Gaussian noise at four intensities, two lighting extremes, and two fog levels; ~18,000 inference trials). We find that reasoning consistency is a high-fidelity indicator of trajectory reliability: when Chain-of-Causation (CoC) explanations change after perturbation, trajectory deviation spikes 5.3× (21.8 m vs 4.1 m), with r = 0.99 across attack types and r_pb = 0.53 per-sample (Cohen's d = 1.12). A controlled ablation provides evidence that enabling CoC generation is associated with improved trajectory accuracy (11.8% on average across conditions; p < 0.0001) under matched inference settings. Over the tested noise range (σ ∈ {10, 30, 50, 70}), degradation is approximately linear (R² = 0.957), while standard input preprocessing defenses provide only marginal relief. Together, these results establish CoC consistency as a quantitative proxy for planning safety and motivate reasoning-based runtime monitoring for safer VLA deployment.
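The two effect-size statistics quoted above, Cohen's d and the point-biserial correlation r_pb, can be computed from two groups of trajectory deviations as sketched below. The formulas are the standard textbook ones (pooled sample standard deviation for d, population standard deviation for r_pb); the sample values are invented for illustration and are not the paper's data.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Standardized mean difference using the pooled sample standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)
    var_b = statistics.variance(group_b)
    pooled = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled


def point_biserial(group_a, group_b):
    """Correlation between a binary group label and a continuous measurement."""
    n_a, n_b = len(group_a), len(group_b)
    n = n_a + n_b
    spread = statistics.pstdev(list(group_a) + list(group_b))
    diff = statistics.mean(group_a) - statistics.mean(group_b)
    return diff / spread * math.sqrt(n_a * n_b / (n * n))


# Hypothetical trajectory deviations in metres, split by whether the
# explanation changed after perturbation.
changed = [20.0, 22.0, 24.0]
stable = [3.0, 4.0, 5.0]
d_value = cohens_d(changed, stable)
r_pb = point_biserial(changed, stable)
```

Both statistics describe the same group separation on different scales, which is why the abstract reports them side by side.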

2605.20654 2026-06-04 cs.LG cs.AI 版本更新

REFLECTOR: Internalizing Step-wise Reflection against Indirect Jailbreak

REFLECTOR: 内化逐步反思以对抗间接越狱

Jiachen Ma, Jiawen Zhang, Xiangtian Li, Bo Zou, Chaochao Lu, Chao Yang

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出REFLECTOR两阶段框架,通过教师引导生成反思数据并进行监督微调,再结合强化学习内化自主反思能力,在复杂间接攻击下实现超过90%的防御成功率,同时提升通用性能。

Comments ICML 2026

详情
AI中文摘要

尽管大型语言模型(LLMs)展现出卓越的能力,但它们仍然容易受到复杂的多步越狱攻击,这些攻击通过利用内部生成过程来规避传统的表面安全对齐。为了解决这些漏洞,我们提出了REFLECTOR,一个原则性的两阶段框架,将自我反思内化在生成轨迹中。REFLECTOR首先利用教师引导生成高质量反思数据用于监督微调(SFT),建立结构化的反思模式。随后,它使用强化学习(RL)结合结果驱动和奖励有效性监督,以培养稳健、自主的自我反思能力。实验结果表明,REFLECTOR在复杂的间接攻击下实现了超过90%的防御成功率(DSR),同时在不同威胁场景中具有稳健的泛化能力。值得注意的是,该框架增强了任务特定和通用效用,在GSM8K上获得了5.85%的提升,并在知识密集型基准测试中表现更佳。通过内化轨迹级安全性,REFLECTOR克服了表面对齐的基本限制,且没有显著的计算开销,为开发安全且能力强大的LLMs提供了一种高效且可扩展的解决方案。

英文摘要

While Large Language Models (LLMs) demonstrate remarkable capabilities, they remain susceptible to sophisticated, multi-step jailbreak attacks that circumvent conventional surface-level safety alignment by exploiting the internal generation process. To address these vulnerabilities, we propose Reflector, a principled two-stage framework that internalizes self-reflection within the generation trajectory. Reflector first leverages teacher-guided generation to produce high-quality reflection data for supervised fine-tuning (SFT), establishing structured reflection patterns. It subsequently uses Reinforcement Learning (RL) with outcome-driven and reward-validity supervision to instill robust, autonomous self-reflection capabilities. Empirical results show that Reflector achieves Defense Success Rates (DSR) exceeding 90% against complex indirect attacks while generalizing robustly across diverse threat scenarios. Notably, the framework enhances both task-specific and general utility, yielding a 5.85% gain on GSM8K alongside improved performance on knowledge-intensive benchmarks. By internalizing trajectory-level safety, Reflector overcomes the fundamental limitations of surface alignment without significant computational overhead, offering an efficient and scalable solution for the development of safe and capable LLMs.

2605.19398 2026-06-04 cs.CV cs.AI 版本更新

Rebalancing Reference Frame Dominance to Improve Motion in Image-to-Video Models

重新平衡参考帧主导性以改善图像到视频模型中的运动

Wooseok Jeon, Seungho Park, Seunghyun Shin, Sangeyl Lee, Hyeonho Jeong, Hae-Gon Jeon

发表机构 * Yonsei University(延世大学) GIST(韩国科学技术院) Adobe Research(Adobe研究)

AI总结 针对图像到视频模型生成视频过于静态的问题,提出无需训练且模型无关的DyMoS方法,通过重新平衡去噪初期生成帧对参考帧的注意力来增强运动,同时保持视觉质量和保真度。

Comments Preprint. Project page: https://sh0xed98b8.github.io/DyMoS/

详情
AI中文摘要

与文本到视频模型相比,图像到视频模型通常生成的视频过于静态。先前的方法通过削弱或修改图像条件信号来缓解这一问题,但往往需要额外训练或牺牲对参考图像的保真度。在这项工作中,我们识别出参考帧主导性是运动抑制的关键机制。我们观察到,I2V模型中的非参考帧将过多的自注意力分配给参考帧的关键词元,导致参考信息随时间过度传播,从而抑制了帧间动态。基于这一发现,我们提出了DyMoS(动态运动滑块),一种无需训练且模型无关的方法,在初始去噪步骤中重新平衡从生成帧到参考帧的注意力路径。DyMoS保持输入图像和模型权重不变,并引入单个标量参数以连续控制运动强度。在多个最先进的I2V骨干网络上的实验表明,DyMoS在保持视觉质量和参考图像保真度的同时,一致地改善了运动动态。

英文摘要

Image-to-video models often generate videos that remain overly static, compared to text-to-video models. While prior approaches mitigate this issue by weakening or modifying the image-conditioning signal, they often require additional training or sacrifice fidelity to the reference image. In this work, we identify reference-frame dominance as a key mechanism behind motion suppression. We observe that non-reference frames in I2V models allocate excessive self-attention to reference-frame key tokens, causing reference information to be over-propagated across time and suppressing inter-frame dynamics. Based on this finding, we propose DyMoS (Dynamic Motion Slider), a training-free and model-agnostic method that rebalances the attention pathway from generated frames to the reference frame during initial denoising steps. DyMoS leaves both the input image and model weights unchanged and introduces a single scalar parameter for continuous control over motion strength. Experiments across multiple state-of-the-art I2V backbones demonstrate that DyMoS consistently improves motion dynamics while maintaining visual quality and fidelity to the reference image.
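To make the rebalancing idea concrete, the sketch below down-weights the attention a generated frame places on reference-frame key tokens and renormalizes, but only during the first few denoising steps. This is an assumed mechanism for illustration: the abstract does not specify where or how DyMoS applies its scalar, and `reference_scale`, `early_steps`, and the function itself are hypothetical names.

```python
def rebalance_attention(weights, is_reference_key, reference_scale, step, early_steps):
    """Scale attention on reference-frame keys by `reference_scale`, then renormalize.

    weights: one row of attention probabilities for a generated-frame query.
    is_reference_key: booleans marking which keys belong to the reference frame.
    reference_scale: scalar in (0, 1]; smaller values weaken reference dominance.
    The adjustment is skipped once `step` reaches `early_steps`.
    """
    if not 0.0 < reference_scale <= 1.0:
        raise ValueError("reference_scale must lie in (0, 1]")
    if step >= early_steps:
        return list(weights)
    scaled = [w * reference_scale if is_ref else w
              for w, is_ref in zip(weights, is_reference_key)]
    total = sum(scaled)
    return [w / total for w in scaled]


row = [0.6, 0.2, 0.1, 0.1]
mask = [True, False, False, False]
early = rebalance_attention(row, mask, reference_scale=0.5, step=0, early_steps=5)
late = rebalance_attention(row, mask, reference_scale=0.5, step=10, early_steps=5)
```

With `reference_scale=1.0` the row is returned unchanged, so a single scalar moves continuously between the original behavior and stronger motion, matching the slider-style control described in the abstract.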

2605.18879 2026-06-04 cs.LG cs.AI cs.CL 版本更新

ZeroUnlearn: Few-Shot Knowledge Unlearning in Large Language Models

ZeroUnlearn:大语言模型中的少样本知识遗忘

Yujie Lin, Chengyi Yang, Zhishang Xiang, Yiping Song, Jinsong Su

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出ZeroUnlearn框架,通过模型编辑将机器遗忘重新定义为精确的知识重映射问题,利用封闭解乘法参数更新实现高效、定向的少样本遗忘。

详情
AI中文摘要

大型语言模型由于在海量网络语料上训练,不可避免地会保留敏感信息(定义为可能引发有害生成的输入),从而引发隐私和安全担忧。现有的机器遗忘方法主要依赖于重训练或激进微调,这些方法要么计算成本高,要么容易降低相关知识并损害整体模型效用。在这项工作中,我们通过模型编辑将机器遗忘重新表述为一个精确的知识重映射问题。我们提出了ZeroUnlearn,一个少样本遗忘框架。它通过将敏感输入映射到中性目标状态并移除其原始表示来覆盖敏感输入。ZeroUnlearn通过封闭解形式的乘法参数更新强制执行表示正交性,从而实现高效且有针对性的遗忘。我们进一步将ZeroUnlearn扩展到基于梯度的变体,用于多样本遗忘。实验表明,我们的方法在保持模型整体效用的同时优于现有基线。我们的代码可在github上获取:https://github.com/XMUDeepLIT/ZeroUnlearn。

英文摘要

Large language models inevitably retain sensitive information, defined as inputs that may induce harmful generations, due to training on massive web corpora, raising concerns for privacy and safety. Existing machine unlearning methods primarily rely on retraining or aggressive fine-tuning, which are either computationally expensive or prone to degrading related knowledge and overall model utility. In this work, we reformulate machine unlearning as a precise knowledge re-mapping problem via model editing. We propose ZeroUnlearn, a few-shot unlearning framework. It overwrites sensitive inputs by mapping them to a neutral target state and removing their original representations. ZeroUnlearn enforces representational orthogonality through a multiplicative parameter update with a closed-form solution, enabling efficient and targeted unlearning. We further extend ZeroUnlearn to a gradient-based variant for multi-sample unlearning. Experiments demonstrate that our approach outperforms existing baselines while preserving general model utility. Our code is available on GitHub: https://github.com/XMUDeepLIT/ZeroUnlearn.

2605.19294 2026-06-04 cs.RO cs.AI 版本更新

DEFLECT: Temporal Counterfactual Preference Learning for Delay-Robust Asynchronous VLAs

DEFLECT: 面向延迟鲁棒异步VLA的时间反事实偏好学习

Yixiang Zhu, Yonghao Chen, Zijie Yang, Yusong Hu, Xinyu Chen

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) One Robotics

AI总结 针对异步视觉-语言-动作(VLA)策略中陈旧观测导致的预测-执行不匹配问题,提出离线后训练框架DEFLECT,通过反事实偏好监督学习偏好与执行时间对齐的动作,无需人工标注、在线部署或架构修改,显著提升高延迟下的任务成功率。

详情
AI中文摘要

视觉-语言-动作(VLA)策略越来越依赖异步推理,将大模型延迟隐藏在持续的机器人运动背后。虽然这避免了同步动作块执行的“走走停停”行为,但产生了预测-执行不匹配:下一个动作块是根据推理开始时的陈旧观测计算得出的,但仅在机器人和场景发生变化后才执行。因此,适合预测时状态的动作可能与执行时状态不对齐。现有的运行时修复、行为克隆和偏好对齐方法并未直接教导策略解决这种陈旧输入不匹配问题。我们提出DEFLECT,一个面向延迟鲁棒异步VLA的离线后训练框架。DEFLECT将延迟引起的不匹配转化为反事实偏好监督:冻结的参考VLA从未来的执行时间观测生成偏好块,并从陈旧的预测时间观测生成拒绝块。可训练策略在相同的部署时间输入下对两个块进行评分,学习偏好与执行时间对齐的动作,同时监督微调锚点保留专家动作流形。DEFLECT不需要人工偏好标签、奖励模型、在线机器人部署、架构更改或额外的推理时间计算。在Kinetix、LIBERO和三个真实机器人任务上,DEFLECT相比强异步VLA基线提高了延迟鲁棒性,在高延迟下成功率提升高达6.4个百分点,并在真实规模VLA的最长延迟下实现4.6个百分点的增益。

英文摘要

Vision-Language-Action (VLA) policies increasingly rely on asynchronous inference to hide large-model latency behind ongoing robot motion. While this avoids the stop-and-go behavior of synchronous action-chunk execution, it creates a prediction-execution mismatch: the next chunk is computed from a stale observation at inference start but executed only after the robot and scene have evolved. As a result, actions that fit the prediction-time state can become misaligned with the execution-time state. Existing runtime repair, behavior-cloning, and preference-alignment approaches do not directly teach the policy to resolve this stale-input mismatch. We propose DEFLECT, an offline post-training framework for delay-robust asynchronous VLAs. DEFLECT converts latency-induced mismatch into counterfactual preference supervision: a frozen reference VLA generates a preferred chunk from the future execution-time observation and a rejected chunk from the stale prediction-time observation. The trainable policy scores both chunks under the same deployment-time input, learning to favor execution-time-aligned actions while a supervised fine-tuning anchor preserves the expert action manifold. DEFLECT requires no human preference labels, reward models, online robot rollouts, architectural changes, or additional inference-time computation. Across Kinetix, LIBERO, and three real-robot tasks, DEFLECT improves delay robustness over strong asynchronous VLA baselines, raising high-latency success by up to 6.4 percentage points and achieving a 4.6 percentage-point gain at the longest delay on a real-scale VLA.

2605.18931 2026-06-04 stat.ML cs.AI cs.LG 版本更新

Markov Chain Decoders Overcome the Heavy-Tail Limitations of Lipschitz Generative Models

马尔可夫链解码器克服Lipschitz生成模型的重尾限制

Abdelhakim Ziani, Andras Horvath, Paolo Ballarini

发表机构 * Université Paris Saclay, Lab. MICS, CentraleSupélec, Gif-sur-Yvette, France(巴黎萨克雷大学,MICS实验室,CentraleSupélec,法国伊维特河畔吉夫) Università di Torino, Torino, Italy(都灵大学,意大利都灵)

AI总结 针对Lipschitz生成模型无法生成重尾分布的问题,提出用基于马尔可夫链的Phase-Type分布替换高斯解码器,显著降低了尾部误差和极端分位数误差。

详情
Journal ref
22nd European Performance Engineering Workshop (EPEW 2026), Jun 2025, Grimstad, Norway
AI中文摘要

重尾分布在性能评估、网络流量和风险建模中普遍存在。这种行为对现代深度生成模型构成了根本性挑战。标准变分自编码器(VAE)采用高斯解码器似然和Lipschitz约束神经网络,这种组合在结构上无法产生重尾输出:高斯尾部呈指数衰减,而Lipschitz连续性阻止解码器放大来自潜在空间的罕见事件以充分克服这种衰减。我们提供了这一局限性的理论刻画,并使用合成Pareto数据(跨越尾部指数$α$ ∈ {2, 3, 5, 30}和维度d ∈ {1, 5, 10}的网格)进行了受控实证演示。作为解决方案,我们在保持编码器、潜在空间和训练过程不变的情况下,将高斯解码器替换为基于马尔可夫链的Phase-Type(PH)分布。PH分布允许对任何正值分布(包括重尾族)进行任意精确的近似。实验表明,对于重尾数据,与高斯基线相比,基于PH的模型将尾部Kolmogorov-Smirnov距离减少了最多6倍,极端分位数误差减少了最多10倍。这些结果表明,将基于马尔可夫链的分布集成到生成模型的解码器中,为重尾生成问题提供了一个有原则且实际有效的解决方案。

英文摘要

Heavy-tailed distributions are prevalent in performance evaluation, network traffic, and risk modeling. This behavior poses a fundamental challenge for modern deep generative models. Standard Variational Autoencoders (VAEs) employ Gaussian decoder likelihoods and Lipschitz-constrained neural networks, a combination that is structurally incapable of producing heavy-tailed outputs: the Gaussian tail decays exponentially, and Lipschitz continuity prevents the decoder from amplifying rare events from the latent space input to sufficiently overcome this decay. We provide both a theoretical characterization of this limitation and a controlled empirical demonstration using synthetic Pareto data across a grid of tail indices $\alpha \in \{2, 3, 5, 30\}$ and dimensions $d \in \{1, 5, 10\}$. As a solution, we replace the Gaussian decoder with a Phase-Type (PH) distribution based on Markov chains, while keeping the encoder, latent space, and training procedure identical. PH distributions allow for arbitrarily precise approximations of any positive-valued distribution, including heavy-tailed families. Experiments show that the PH-based model reduces tail Kolmogorov-Smirnov distance by up to 6x and extreme quantile error by up to 10x compared to the Gaussian baseline for heavy-tailed data. These results demonstrate that integrating Markov chain-based distributions into the decoder of a generative model provides a principled and practically effective solution to the heavy-tail generation problem.
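As background on what a Phase-Type distribution is, the sketch below draws samples as the absorption time of a small continuous-time Markov chain. It is not the paper's decoder: the helper `sample_phase_type`, the two-phase hyperexponential parameters, and the sample count are illustrative assumptions.

```python
import random

def sample_phase_type(alpha, T, rng):
    """Draw one absorption time of a continuous-time Markov chain.

    alpha: initial probabilities over the transient states.
    T: sub-generator matrix (negative diagonal, non-negative off-diagonal);
       the missing mass in each row is the rate of jumping to absorption.
    """
    n = len(alpha)
    state = rng.choices(range(n), weights=alpha)[0]
    elapsed = 0.0
    while True:
        rate = -T[state][state]              # total rate of leaving this state
        elapsed += rng.expovariate(rate)     # exponential holding time
        to_others = [T[state][j] if j != state else 0.0 for j in range(n)]
        exit_rate = rate - sum(to_others)    # rate of jumping to absorption
        nxt = rng.choices(range(n + 1), weights=to_others + [exit_rate])[0]
        if nxt == n:                         # index n stands for absorption
            return elapsed
        state = nxt

# Hypothetical two-phase hyperexponential: a fast phase and a rare slow phase.
alpha = [0.9, 0.1]
T = [[-10.0, 0.0],
     [0.0, -0.1]]

rng = random.Random(0)
samples = [sample_phase_type(alpha, T, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)   # theoretical mean: 0.9*0.1 + 0.1*10 = 1.09
```

The rare slow phase produces occasional large values, which is how mixtures of this kind stretch the tail while every draw stays positive.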

2605.16331 2026-06-04 q-bio.BM cs.AI 版本更新

Retrieval and competition: how a protein foundation model starts a protein

检索与竞争:蛋白质基础模型如何启动蛋白质

Piotr Jedryszek, Oliver M. Crook

发表机构 * Department of Biology, University of Oxford, Oxford, UK(牛津大学生物学系) Kavli Institute for Nanoscience Discovery, University of Oxford, Oxford, UK(牛津大学Kavli纳米科学发现研究所) Department of Chemistry, University of Oxford, Oxford, UK(牛津大学化学系)

AI总结 通过追踪ESM2-8M预测蛋白质起始甲硫氨酸的计算路径,发现模型依赖位置先验检索而非直接识别,揭示了模型置信度与生物学证据之间的脱节。

Comments updated figure 4

详情
AI中文摘要

蛋白质语言模型越来越多地用于指导实验和临床决策,但通常不清楚一个自信的预测是反映了对生物学证据的识别还是对统计默认值的检索。我们针对一个近乎普遍的生物学规则——蛋白质以甲硫氨酸起始——通过追踪ESM2-8M产生该预测的计算路径来检验这一区别。模型并未检测到掩码位置的甲硫氨酸。相反,它通过跨层组装的特定位置查询,从序列起始标记处的参考表示中检索出有利于甲硫氨酸的信号,最终输出通过与上下文相关电路的竞争而出现。为了理解位置信息如何到达读出端,我们引入了旋转频率带内注意力分数的范数-方向分解。位置编码通过分布在各个频带中的查询范数和角度对齐的耦合变化来运作。对于真实N端不是甲硫氨酸的序列(此时生物学问题至关重要),模型仍然预测甲硫氨酸。这不是由意外机制产生的正确预测,而是匹配统计平均值的位置先验检索电路的输出,在生物学偏离平均值的地方失败。区分这两者需要在单个电路、频率带和查询组成的层面上进行解析,这表明在生物学风险更高的预测中,机制验证将是必要且具有挑战性的。即使对于最简单的生物学规则,模型的预测也是通过分布式计算电路而非直接识别来介导的,这表明任务复杂性的增加将进一步模糊模型置信度与潜在生物学证据之间的关系。

英文摘要

Protein language models are increasingly used to guide experimental and clinical decisions, yet it is often unclear whether a confident prediction reflects recognition of biological evidence or retrieval of a statistical default. We examine this distinction for a near-universal biological rule, that proteins begin with methionine, by tracing the computational pathway through which ESM2-8M produces this prediction. The model does not detect methionine at the masked position. Instead, it retrieves a methionine-favouring signal from a reference representation at the beginning-of-sequence token via a position-specific query assembled across layers, with the final output emerging through competition with context-dependent circuits. To understand how positional information reaches the readout, we introduce a norm-direction decomposition of attention scores within rotary frequency bands. Positional encoding operates through coupled changes in query norm and angular alignment distributed across these bands. On sequences whose true N-terminus is not methionine, where the biological question matters, the model predicts methionine anyway. This is not a correct prediction produced by an unexpected mechanism, but the output of a positional-prior retrieval circuit that matches the statistical average and fails where biology diverges from it. Distinguishing the two requires resolution at the level of individual circuits, frequency bands, and query composition, suggesting that mechanistic verification will be necessary, and challenging, for predictions where the biological stakes are higher. Even for the simplest biological rule, the model's prediction is mediated by a distributed computational circuit rather than direct recognition, suggesting that increasing task complexity will further obscure the relationship between model confidence and underlying biological evidence.

2605.16301 2026-06-04 cs.CY cs.AI cs.LG 版本更新

Do LLMs Hold Their Values? MANTA: A Multi-Turn Adversarial Benchmark for Animal Welfare Reasoning

LLMs 是否坚持其价值观?MANTA:一个用于动物福利推理的多轮对抗性基准

Isabella Luong, Joyee Chen, Arturs Kanepajs, Jasmine Brazilek, Sankalpa Ghose, David Williams-King, Linh Le, Allen Lu

发表机构 * SPAR Compassion Aligned Machine Learning(同情对齐机器学习) NUS(新加坡国立大学) Mila(Mila研究所) ERA Cambridge(剑桥ERA)

AI总结 提出 MANTA 基准,通过多轮对抗性对话评估大语言模型在动物福利推理中的价值观稳定性和道德敏感性,发现单轮基准无法捕捉的排名变化和物种-压力交互效应。

详情
AI中文摘要

评估大语言模型(LLMs)中的动物福利推理仍然是一个开放挑战,尽管它们在消费者和专业环境中迅速部署,其中福利考虑隐含地出现在日常查询中。现有的基准(如 AnimalHarmBench)通过单轮、明确框架的问题进行评估,衡量模型在直接询问时是否避免有害内容。这种方法忽略了两种失败模式:在持续对抗性压力下的对齐退化,以及道德敏感性(模型是否在日常查询中自发提出福利问题)。为填补这一空白,我们构建了 MANTA,一个包含 1,088 个五轮对话的基准,从隐式的第一轮场景开始,通过明确的福利提示,再到来自五种类型(社会、文化、经济、实用和认知)的三轮对抗性压力。我们在两个维度上对对话进行评分:动物福利价值观稳定性(AWVS,主要)和动物福利道德敏感性(AWMS,诊断)。我们评估了七个前沿模型:Claude Opus 4.7、GPT-5.5、DeepSeek V4、Llama 3.3 70B、Mistral Small、Grok 4.3 和 Gemini 3.1 Flash Lite。多轮评估捕捉了单轮基准遗漏的行为:7 个模型中有 4 个相对于第一轮得分改变了排名,包括 Gemini Flash Lite,它在 AWMS 上从第五名下降到 AWVS 上的最后一名。AWMS 和 AWVS 呈正相关但不完全相关,表明道德识别测试捕捉了模型在压力下行为的一个稳定但不完整的组成部分。MANTA 还提供了先前基准无法获得的物种-压力交互矩阵,显示福利鲁棒性同时取决于动物和施加的压力;伴侣动物得分高于野生动物,后者高于养殖动物和无脊椎动物。我们发布了数据集、脚本化压力计划、评判提示和分析代码。

英文摘要

Evaluating animal welfare reasoning in LLMs remains an open challenge despite rapid deployment in consumer and professional contexts where welfare considerations appear implicitly in everyday queries. Existing benchmarks such as AnimalHarmBench evaluate this through single-turn, explicitly framed questions, measuring whether models avoid harmful content when directly asked. This approach overlooks two failure modes: alignment degradation under sustained adversarial pressure, and moral sensitivity (whether a model spontaneously surfaces welfare stakes in everyday queries). To fill this gap, we construct MANTA, a benchmark of 1,088 five-turn conversations progressing from an implicit Turn-1 scenario through an explicit welfare prompt to three adversarial pressure rounds drawn from a five-type taxonomy: Social, Cultural, Economic, Pragmatic, and Epistemic. We score conversations on two dimensions: Animal Welfare Value Stability (AWVS, primary) and Animal Welfare Moral Sensitivity (AWMS, diagnostic). We evaluate seven frontier models: Claude Opus 4.7, GPT-5.5, DeepSeek V4, Llama 3.3 70B, Mistral Small, Grok 4.3, and Gemini 3.1 Flash Lite. Multi-turn evaluation captures behavior single-turn benchmarks miss: 4 of 7 models change rank relative to Turn 1 scores, including Gemini Flash Lite, which drops from fifth on AWMS to last on AWVS. AWMS and AWVS are positively but imperfectly correlated, suggesting moral-recognition tests capture a stable but incomplete component of model behavior under pressure. MANTA also enables a species-by-pressure interaction matrix unavailable to prior benchmarks, showing welfare robustness depends jointly on the animal and pressure applied; companion animals score above wild animals, which score above farmed animals and invertebrates. We release the dataset, scripted pressure plans, judge prompts, and analysis code.

2605.15152 2026-06-04 cs.LG cs.AI 版本更新

Widening the Gap: Exploiting LLM Quantization via Outlier Injection

扩大差距:通过异常值注入利用LLM量化

Xiaohua Zhan, Kazuki Egashira, Robin Staab, Mark Vero, Martin Vechev

发表机构 * ETH Zurich(苏黎世联邦理工学院)

AI总结 本文提出首个针对多种先进量化方法(AWQ、GPTQ、GGUF I-quants)的量化条件攻击,通过注入异常值导致权重塌缩,诱导模型在量化后出现恶意行为。

详情
AI中文摘要

LLM量化已成为内存高效部署的关键。最近的研究表明,量化方案可能带来严重的安全风险:对手可以发布一个在全精度下看似良性,但在用户量化后表现出恶意行为的模型。然而,现有的量化条件攻击仅限于相对简单的量化方法,攻击者可以估计在目标量化下保持不变的权重区域。值得注意的是,先前的攻击始终未能攻破更流行和复杂的方案,限制了其实际影响。在这项工作中,我们提出了首个量化条件攻击,能够持续诱导出可由多种先进量化技术(包括AWQ、GPTQ和GGUF I-quants)触发的恶意行为。我们的攻击利用了现代量化方法共有的一个简单特性:大的异常值可能导致其他权重四舍五入为零。因此,通过向特定权重块注入异常值,对手可以诱导模型出现目标性的、可预测的权重塌缩。这种效应可用于制作看似良性的全精度模型,这些模型在量化后表现出广泛的恶意行为。通过在三种攻击场景和LLM上的广泛评估,我们表明我们的攻击在先前攻击失败的多种量化方法上实现了高成功率。我们的结果首次证明,量化的安全风险不仅限于更简单的方案,而是广泛存在于复杂、广泛使用的量化方法中。

英文摘要

LLM quantization has become essential for memory-efficient deployment. Recent work has shown that quantization schemes can pose critical security risks: an adversary may release a model that appears benign in full precision but exhibits malicious behavior once quantized by users. However, existing quantization-conditioned attacks have been limited to relatively simple quantization methods, where the attacker can estimate weight regions that remain invariant under the target quantization. Notably, prior attacks have consistently failed to compromise more popular and sophisticated schemes, limiting their practical impact. In this work, we introduce the first quantization-conditioned attack that consistently induces malicious behavior that can be triggered by a broad range of advanced quantization techniques, including AWQ, GPTQ, and GGUF I-quants. Our attack exploits a simple property shared by many modern quantization methods: large outliers can cause other weights to be rounded to zero. Consequently, by injecting outliers into specific weight blocks, an adversary can induce a targeted, predictable weight collapse in the model. This effect can be used to craft seemingly benign full-precision models that exhibit a wide range of malicious behaviors after quantization. Through extensive evaluation across three attack scenarios and LLMs, we show that our attack achieves high success rates against a broad range of quantization methods on which prior attacks fail. Our results demonstrate, for the first time, that the security risks of quantization are not restricted to simpler schemes but are broadly relevant across complex, widely-used quantization methods.
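The property the abstract relies on, that a large outlier can force the other weights in a block to round to zero, can be seen with plain per-block absmax quantization. This is a minimal sketch under that simplifying assumption, not an implementation of AWQ, GPTQ, or GGUF I-quants; the weight values and the 4-bit setting are made up for illustration.

```python
def absmax_quantize(block, bits=4):
    """Symmetric round-to-nearest quantization with one scale per block."""
    qmax = 2 ** (bits - 1) - 1                   # 7 for signed 4-bit
    scale = max(abs(w) for w in block) / qmax    # the largest weight sets the step size
    return [round(w / scale) * scale for w in block]

benign = [0.11, -0.08, 0.05, -0.12]

# Without an outlier the step size is small and every weight survives.
clean = absmax_quantize(benign)

# One injected outlier inflates the step size, so the original weights
# all fall below half a step and round to zero.
poisoned = absmax_quantize(benign + [8.0])
```

Here the step size grows from about 0.017 to about 1.14 once the outlier is present, which is why the four original weights collapse while the outlier itself is preserved.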

2605.14054 2026-06-04 cs.AI cs.CV 版本更新

Bad Seeing or Bad Thinking? Rewarding Perception for Multimodal Reasoning

坏视力还是坏思维?为多模态推理奖励感知

Haozhe Wang, Qixin Xu, Changpeng Wang, Taofeng Xue, Chong Peng, Wenhu Chen, Fangzhen Lin

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出一种基于强化学习的模态感知信用分配框架(MoCA),通过感知验证和结构化口头验证解决视觉语言模型中感知与推理的权衡问题,实现多任务性能提升。

Comments Accepted by ICML 2026 as Oral

详情
AI中文摘要

实现稳健的感知-推理协同是高级视觉语言模型(VLM)的核心目标。最近的进展通过架构设计或智能体工作流追求这一目标。然而,这些方法通常受限于静态文本推理,或因外部智能体复杂性的巨大计算和工程负担而变得复杂。更糟糕的是,这种大量投入并未带来成比例的性能提升,常常在感知和推理上观察到“跷跷板效应”。这促使我们从根本上重新思考真正的瓶颈。在本文中,我们认为这种权衡的根本原因是模态信用分配中的模糊性:当VLM失败时,是由于感知缺陷(“坏视力”)还是逻辑缺陷(“坏思维”)?为解决这一问题,我们引入了一个强化学习框架,通过可靠地奖励感知保真度来改善感知-推理协同。我们明确地将生成过程分解为交错的感知和推理步骤。这种解耦使得能够对感知进行有针对性的监督。关键的是,我们引入了感知验证(PV),利用“盲推理”代理独立于推理结果奖励感知保真度。此外,为了在自由形式的VL任务中扩展训练,我们提出了结构化口头验证(Structured Verbal Verification),用结构化的算法执行替代高方差的LLM评判。这些技术被整合到模态感知信用分配(MoCA)机制中,该机制将奖励路由到特定的错误源——无论是坏视力还是坏思维——使单个VLM能够在广泛的任务谱系上同时获得性能提升。

英文摘要

Achieving robust perception-reasoning synergy is a central goal for advanced Vision-Language Models (VLMs). Recent advancements have pursued this goal via architectural designs or agentic workflows. However, these approaches are often limited by static textual reasoning or complicated by the significant compute and engineering burden of external agentic complexity. Worse, this heavy investment does not yield proportional gains and often produces a "seesaw effect" between perception and reasoning. This motivates a fundamental rethinking of the true bottleneck. In this paper, we argue that the root cause of this trade-off is an ambiguity in modality credit assignment: when a VLM fails, is it due to flawed perception ("bad seeing") or flawed logic ("bad thinking")? To resolve this, we introduce a reinforcement learning framework that improves perception-reasoning synergy by reliably rewarding perception fidelity. We explicitly decompose the generation process into interleaved perception and reasoning steps. This decoupling enables targeted supervision on perception. Crucially, we introduce Perception Verification (PV), leveraging a "blindfolded reasoning" proxy to reward perceptual fidelity independently of reasoning outcomes. Furthermore, to scale training across free-form VL tasks, we propose Structured Verbal Verification, which replaces high-variance LLM judging with structured algorithmic execution. These techniques are integrated into a Modality-Aware Credit Assignment (MoCA) mechanism, which routes rewards to the specific source of error -- either bad seeing or bad thinking -- enabling a single VLM to achieve simultaneous performance gains across a wide task spectrum.

2605.13672 2026-06-04 cs.CV cs.AI cs.LG 版本更新

SpurAudio: A Benchmark for Studying Shortcut Learning in Few-Shot Audio Classification

SpurAudio: 用于研究少样本音频分类中捷径学习的基准

Giries Abu Ayoub, Morad Tukan, Loay Mualem

发表机构 * Department of Computer Science, University of Haifa(海法大学计算机科学系) Independent Researcher(独立研究者) University of Stuttgart, Germany(斯图加特大学,德国) IMPRS-IS, Germany(国际马克斯·普朗克智能系统研究学院,德国)

AI总结 提出SpurAudio基准,通过控制音频中前景与背景的关联,评估少样本分类模型对虚假相关性的敏感性,发现现有方法在背景变化时性能显著下降。

详情
AI中文摘要

少样本分类(FSC)广泛用于从有限标注数据中学习,但大多数评估隐含假设目标概念与上下文线索无关。然而,在现实场景中,样本通常出现在丰富的上下文中,允许模型利用前景内容与背景信号之间的虚假相关性。虽然这种效应已在少样本图像分类中得到研究,但其在少样本音频分类中的作用仍基本未被探索,且现有音频基准对上下文结构的控制有限。我们引入了 SpurAudio,一个利用音频中前景事件和背景环境的自然可分离性,以支持对支持集和查询集之间的上下文偏移进行可控、多级评估的基准。使用该基准,我们表明许多最先进的少样本方法在背景相关性被破坏时遭受严重的性能下降,尽管在标准评估协议下达到相似的准确率。关键的是,即使在大型预训练音频基础模型中,这种脆弱性仍然存在,排除了骨干网络容量不足的解释。此外,在传统基准下看似相当的方法可能对虚假相关性表现出显著不同的敏感性,揭示了与特征表示在推理时如何与分类器头交互相关的系统性算法优势和脆弱性。这些发现为音频中少样本方法的行为提供了新的见解,并强调了在评估FSC模型时需要明确探测上下文依赖性的基准。

英文摘要

Few-shot classification (FSC) is widely used for learning from limited labeled data, yet most evaluations implicitly assume that target concepts are independent of contextual cues. In real-world settings, however, examples often appear within rich contexts, allowing models to exploit spurious correlations between foreground content and background signals. While such effects have been studied in few-shot image classification, their role in few-shot audio classification remains largely unexplored, and existing audio benchmarks offer limited control over contextual structure. We introduce SpurAudio, a benchmark that leverages the natural separability of foreground events and background environments in audio to enable controlled, multi-level evaluation of contextual shifts across support and query sets. Using this benchmark, we show that many state-of-the-art few-shot methods suffer severe performance degradation when background correlations are disrupted, despite achieving similar accuracy under standard evaluation protocols. Crucially, this vulnerability persists even in large pretrained audio foundation models, ruling out limited backbone capacity as an explanation. Moreover, methods that appear comparable under conventional benchmarks can exhibit markedly different sensitivity to spurious correlations, revealing systematic algorithmic strengths and vulnerabilities tied to how feature representations interact with classifier heads at inference time. These findings provide new insight into the behavior of few-shot methods in audio and highlight the need for benchmarks that explicitly probe context dependence when evaluating FSC models.

2304.10891 2026-06-04 cs.LG cs.AI cs.CV cs.RO cs.SY eess.SY 版本更新

Transformer-Based Autonomous Driving Models and Deployment-Oriented Compression: A Survey

基于Transformer的自动驾驶模型与面向部署的压缩:综述

Juan Zhong, Yuhang Shi, Zukang Xu, Xi Chen

发表机构 * Renmin University of China(中国人民大学) Artificial Intelligence Innovation and Incubation Institute, Fudan University(复旦大学人工智能创新与孵化院) Shanghai Academy of AI for Science(上海人工智能科学研究院) Department of houmo.ai(houmo.ai部门)

AI总结 本文综述了基于Transformer的自动驾驶模型,并从部署角度分析了压缩与加速策略(如量化、剪枝、知识蒸馏等)如何影响模型设计、部署性、鲁棒性和安全性。

详情
AI中文摘要

基于Transformer的模型正成为自动驾驶的核心范式,因为它们能够捕捉感知、预测和规划中的长程空间依赖、多智能体交互和多模态上下文。然而,它们在真实车辆中的部署仍然困难,因为高容量注意力架构带来了显著的延迟、内存和能量开销。本综述回顾了具有代表性的基于Transformer的自动驾驶模型,并按任务角色、感知配置和架构设计进行组织。更重要的是,我们从面向部署的角度审视这些模型,分析效率约束如何在实际中重塑模型设计选择。我们进一步回顾了与基于Transformer的驾驶系统相关的压缩和加速策略,包括量化、剪枝、知识蒸馏、低秩近似和高效注意力,并讨论了它们的优势、局限性和任务依赖性。我们不将压缩视为孤立的后期处理步骤,而是强调其作为直接影响部署性、鲁棒性和安全性的系统级设计考虑。最后,我们指出了面向标准化、安全感知和硬件感知的高效自动驾驶系统评估的开放挑战和未来研究方向。

英文摘要

Transformer-based models are becoming a central paradigm in autonomous driving because they can capture long-range spatial dependencies, multi-agent interactions, and multimodal context across perception, prediction, and planning. At the same time, their deployment in real vehicles remains difficult because high-capacity attention-based architectures impose substantial latency, memory, and energy overhead. This survey reviews representative Transformer-based autonomous driving models and organizes them by task role, sensing configuration, and architectural design. More importantly, it examines these models from a deployment-oriented perspective and analyzes how efficiency constraints reshape model design choices in practice. We further review compression and acceleration strategies relevant to Transformer-based driving systems, including quantization, pruning, knowledge distillation, low-rank approximation, and efficient attention, and discuss their benefits, limitations, and task-dependent applicability. Rather than treating compression as an isolated post-processing step, we highlight it as a system-level design consideration that directly affects deployability, robustness, and safety. Finally, we identify open challenges and future research directions toward standardized, safety-aware, and hardware-conscious evaluation of efficient autonomous driving systems.

2605.10246 2026-06-04 cs.AI 版本更新

SciIntegrity-Bench: A Benchmark for Evaluating Academic Integrity in AI Scientist Systems

SciIntegrity-Bench:评估AI科学家系统学术诚信的基准

Zonglin Yang, Xingtong Liu, Xinyan Xu

发表机构 * Tongji University(同济大学) University of Tübingen(图宾根大学)

AI总结 提出SCIINTEGRITY-BENCH基准,通过困境评估范式测试7个LLM在33个场景中的学术诚信,发现整体诚信问题率达34.2%,所有模型均存在失败,其中数据缺失场景下所有模型生成合成数据而非承认不可行。

详情
AI中文摘要

AI科学家系统越来越多地被用于自主研究,但其学术诚信从未被系统评估。我们引入了SCIINTEGRITY-BENCH,这是第一个围绕困境评估范式设计的基准:其11个陷阱类别中的33个场景均被构建为,诚实承认失败是唯一正确的回应,而完成任务则需要不端行为。在涵盖7个最先进LLM的231次评估运行中,整体诚信问题率达到34.2%,且没有模型实现零失败。最引人注目的是,在数据缺失场景中,所有七个模型都生成合成数据而非承认不可行,仅在是否披露替代数据方面有所不同。进一步的提示消融研究分离出两个驱动因素:移除明确的完成压力将未披露的捏造从20.6%急剧降至3.2%,而底层合成率保持不变,揭示了一种独立于提示级指令而持续存在的内在完成偏差。这些发现表明,缺乏诚实拒绝作为一种训练有素的倾向是观察到的失败的主要驱动因素。我们在https://github.com/liuxingtong/Sci-Integrity-Bench发布SCIINTEGRITY-BENCH。

英文摘要

AI scientist systems are increasingly deployed for autonomous research, yet their academic integrity has never been systematically evaluated. We introduce SCIINTEGRITY-BENCH, the first benchmark designed around a dilemmatic evaluation paradigm: each of its 33 scenarios across 11 trap categories is constructed so that honest acknowledgment of failure is the only correct response, while task completion requires misconduct. Across 231 evaluation runs spanning 7 state-of-the-art LLMs, the overall integrity problem rate reaches 34.2%, and no model achieves zero failures. Most strikingly, across missing-data scenarios, all seven models generate synthetic data rather than acknowledging infeasibility, differing only in whether they disclose the substitution. A further prompt ablation study separates two drivers: removing explicit completion pressure sharply reduces undisclosed fabrication from 20.6% to 3.2%, while the underlying synthesis rate remains unchanged, revealing an intrinsic completion bias that persists independent of prompt-level instructions. These findings point to the absence of honest refusal as a trained disposition as the primary driver of observed failures. We release SCIINTEGRITY-BENCH at https://github.com/liuxingtong/Sci-Integrity-Bench.

2602.02834 2026-06-04 cs.LG cs.AI 版本更新

What Structural Inductive Bias Helps Transformers Reason Over Knowledge Graphs? A Study with Tabula RASA

什么结构归纳偏置帮助Transformer在知识图谱上进行推理?Tabula RASA研究

Jonas Petersen, Camilla Mazzoleni, Gian-Alessandro Lombardi, Federico Martelli, Riccardo Maggioni

发表机构 * ETH Zurich(苏黎世联邦理工学院)

AI总结 通过最小化Transformer修改的消融实验,发现稀疏邻接掩码是驱动多跳推理的主要结构归纳偏置,而关系参数贡献有限。

Comments Accepted at GFM, ICML 2026

详情
AI中文摘要

什么结构归纳偏置帮助Transformer在知识图谱上进行推理?通过对一个最小化Transformer修改(包含四个独立可移除组件:稀疏邻接掩码、边类型偏置、查询缩放、值门控)进行受控消融,我们隔离了哪些结构信号驱动多跳推理。我们的发现很明确:稀疏邻接掩码单独占据了相对于未掩码Transformer改进的主要份额(在3跳MetaQA上+72.5pp,在WebQSP上+45.5pp,在CWQ上+53.9pp),而学习的关系参数只增加了适度的改进,并且在缺乏结构指导时可能造成损害。一个零样本实验提供了架构独立的佐证:当边类型被排除时,基于掩码的注意力退化比关系特定权重少4.0倍。多跳KGQA的有用归纳偏置主要是拓扑的,而非关系的。

英文摘要

What structural inductive bias helps transformers reason over knowledge graphs? Through controlled ablations of a minimal transformer modification with four independently removable components (sparse adjacency masking, edge-type biases, query scaling, value gating), we isolate which structural signals drive multi-hop reasoning. Our finding is sharp: sparse adjacency masking alone accounts for the dominant share of improvement over unmasked transformers (+72.5pp on 3-hop MetaQA, +45.5pp on WebQSP, +53.9pp on CWQ), while learned relation parameters add only modest refinement and can actively hurt without structural guidance. A zero-shot experiment provides architecturally independent corroboration: masking-based attention degrades 4.0x less than relation-specific weights when edge types are held out. The useful inductive bias for multi-hop KGQA is predominantly topological, not relational.
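To make "sparse adjacency masking" concrete, the sketch below restricts a row-wise softmax to graph neighbours, so non-adjacent nodes receive exactly zero attention. It is a toy illustration, not the paper's model: the three-node chain graph, the self-loops, and the uniform scores are assumptions chosen for readability.

```python
import math

def masked_attention(scores, adjacency):
    """Row-wise softmax over scores, restricted to graph neighbours.

    adjacency[i][j] is truthy when node i may attend to node j. Each row
    must allow at least one entry (self-loops are assumed here).
    """
    weights = []
    for row_scores, row_adj in zip(scores, adjacency):
        exps = [math.exp(s) if allowed else 0.0
                for s, allowed in zip(row_scores, row_adj)]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights

# Chain graph 0 - 1 - 2 with self-loops.
adjacency = [[1, 1, 0],
             [1, 1, 1],
             [0, 1, 1]]
scores = [[0.0, 0.0, 0.0] for _ in range(3)]

weights = masked_attention(scores, adjacency)
# Node 0 cannot see node 2 in one layer, so reaching it takes two hops.
```

With the mask in place, information can only travel along edges, one hop per attention layer, which is the topological bias the abstract identifies as the main driver of multi-hop gains.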

2605.03353 2026-06-04 cs.CR cs.AI 版本更新

SkCC: Portable and Secure Skill Compilation for Cross-Framework LLM Agents

SkCC:面向跨框架LLM代理的可移植且安全的技能编译

Yipeng Ouyang, Yi Xiao, Yuhao Gu, Xianwei Zhang

发表机构 * Sun Yat-sen University(中山大学)

AI总结 针对LLM代理技能在不同框架间缺乏可移植性和安全性的问题,提出SkCC编译器,通过强类型中间表示SkIR解耦语义与格式,实现跨框架部署,并内置静态优化器强制执行安全约束,显著提升性能并降低适配复杂度。

Comments Accepted by the Agent Skills Workshop at ACM CAIS 2026. 20 pages, 6 figures. Project Homepage: https://skcc.nexa-lang.com/ Code Repo: https://github.com/Nexa-Language/Skill-Compiler/

详情
AI中文摘要

LLM代理越来越依赖可重用技能(例如SKILL markdown文件)来执行复杂任务,但这些工件缺乏可移植性:代理框架对提示格式高度敏感,导致同一技能的性能差异很大。然而,大多数技能以格式无关的Markdown形式一次性编写,需要昂贵的逐框架重写,并且安全性在很大程度上未得到解决,实践中存在广泛漏洞。为解决这些问题,我们提出SkCC,一个LLM代理编译器,将经典编译设计引入代理技能开发。SkCC以SkIR为核心,这是一种强类型中间表示,将技能语义与框架特定格式解耦,从而支持跨代理框架的可移植部署。在此IR之上,静态优化器强制执行安全约束,在部署前阻止漏洞。作为四阶段流水线实现,SkCC有效将跨$m$个技能和$n$个框架的适配复杂度从$O(m \times n)$降低到$O(m + n)$。在SkillsBench上的实验表明,SkCC相比原始版本带来一致且显著的性能提升,在Claude Code上通过率从21.1%提高到33.3%,在Kimi CLI上从35.1%提高到48.7%。此外,该设计实现了低于10ms的编译延迟、94.8%的主动安全触发率以及跨框架10-46%的运行时token节省。

英文摘要

LLM agents increasingly rely on reusable skills (e.g., SKILL markdown files) to execute complex tasks, yet these artifacts lack portability: agent frameworks are highly sensitive to prompt formatting, leading to a large performance variation for the same skill. Nevertheless, most skills are authored once as format-agnostic Markdown, necessitating costly per-framework rewrites and also leaving security largely unaddressed, with widespread vulnerabilities in practice. To address this, we present SkCC, a compiler for LLM agents that introduces classical compilation design into agent skill development. SkCC centers on SkIR, a strongly-typed intermediate representation that decouples skill semantics from framework-specific formatting, thus enabling portable deployment across agent frameworks. Atop this IR, a static Optimizer enforces security constraints, blocking vulnerabilities before deployment. Implemented as a four-phase pipeline, SkCC effectively reduces adaptation complexity from $O(m \times n)$ to $O(m + n)$ across $m$ skills and $n$ frameworks. Experiments on SkillsBench demonstrate that SkCC delivers consistent and substantial gains over original counterparts, with pass rates rising from 21.1% to 33.3% on Claude Code and from 35.1% to 48.7% on Kimi CLI. Further, the design achieves sub-10ms compilation latency, 94.8% proactive security trigger rate, and 10-46% runtime token savings across frameworks.
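The $O(m \times n)$ to $O(m + n)$ claim follows the usual intermediate-representation argument: without a shared IR, each skill is rewritten per framework, whereas with one, each skill is translated into the IR once and each framework needs one backend. A small count illustrates this; the function names and the figures of 50 skills and 4 frameworks are made up for the example.

```python
def direct_rewrites(num_skills, num_frameworks):
    """Artifacts to maintain when every skill is hand-adapted per framework."""
    return num_skills * num_frameworks

def via_shared_ir(num_skills, num_frameworks):
    """One front-end translation per skill plus one backend per framework."""
    return num_skills + num_frameworks

m, n = 50, 4                      # hypothetical catalogue size
without_ir = direct_rewrites(m, n)
with_ir = via_shared_ir(m, n)
```

For this hypothetical catalogue the count drops from 200 hand-written variants to 54 components, and the gap widens as either the number of skills or the number of frameworks grows.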

2510.17281 2026-06-04 cs.LG cs.AI cs.IR 版本更新

MemoryBench: A Benchmark for Memory and Continual Learning in LLM Systems

MemoryBench:面向LLM系统的记忆与持续学习基准

Qingyao Ai, Yichen Tang, Changyue Wang, Jianming Long, Weihang Su, Yiqun Liu

发表机构 * Department of Computer Science and Technology, Tsinghua University, Beijing, China(清华大学计算机科学与技术系)

AI总结 提出用户反馈模拟框架及跨领域、多语言、多任务类型的综合基准MemoryBench,评估LLM系统从累积用户反馈中持续学习的能力,实验表明现有方法效果与效率均不理想。

详情
AI中文摘要

扩展数据、参数和测试时计算一直是改进LLM系统(LLMsys)的主流方法,但由于高质量数据的逐渐枯竭以及更大计算资源消耗带来的边际收益,这些方法的性能上限已几乎达到。受人类和传统AI系统从实践中学习能力的启发,为LLMsys构建记忆和持续学习框架已成为近期文献中一个重要且热门的研究方向。然而,现有的LLM记忆基准通常侧重于评估系统在长文本输入的同质阅读理解任务上的表现,而非测试其在服务时间内从累积用户反馈中学习的能力。因此,我们提出了一个用户反馈模拟框架和一个涵盖多个领域、语言和任务类型的综合基准,以评估LLMsys的持续学习能力。实验表明,最先进的基线方法在有效性和效率上远未令人满意,我们希望这一基准能为未来LLM记忆和优化算法的研究铺平道路。

英文摘要

Scaling up data, parameters, and test-time computation has been the mainstream approach to improving LLM systems (LLMsys), but its upper bounds are almost reached due to the gradual depletion of high-quality data and the marginal gains obtained from larger computational resource consumption. Inspired by the abilities of humans and traditional AI systems to learn from practice, constructing memory and continual learning frameworks for LLMsys has become an important and popular research direction in recent literature. Yet, existing benchmarks for LLM memory often focus on evaluating the system on homogeneous reading comprehension tasks with long-form inputs rather than testing its ability to learn from accumulated user feedback in service time. Therefore, we propose a user feedback simulation framework and a comprehensive benchmark covering multiple domains, languages, and types of tasks to evaluate the continual learning abilities of LLMsys. Experiments show that the effectiveness and efficiency of state-of-the-art baselines are far from satisfactory, and we hope this benchmark can pave the way for future studies on LLM memory and optimization algorithms. Website: https://memorybench.thuir.cn Code: https://github.com/THUIR/MemoryBench Data: https://huggingface.co/datasets/THUIR/MemoryBench Data-Full: https://huggingface.co/datasets/THUIR/MemoryBench-Full

2605.07724 2026-06-04 cs.LG cs.AI 版本更新

Curated Synthetic Data Doesn't Have to Collapse: A Theoretical Study of Generative Retraining with Pluralistic Preferences

策展合成数据不会崩溃:具有多元偏好的生成式再训练的理论研究

Ali Falahati, Mohammad Mohammadi Amiri, Kate Larson, Lukasz Golab

发表机构 * University of Washington(华盛顿大学)

AI总结 通过理论分析证明,基于多个奖励函数进行策展的递归训练可以避免生成模型崩溃,并收敛到满足加权纳什议价解的稳定分布。

Comments Proceedings of the 43rd International Conference on Machine Learning, Seoul, South Korea. PMLR 306, 2026

详情
AI中文摘要

生成模型的递归再训练提出了一个关键的表示挑战:当基于固定奖励信号策展合成输出时,模型倾向于崩溃到过度优化该目标的狭窄输出集上。先前的研究表明,如果不将真实数据混合进来,这种崩溃是不可避免的。我们从对齐角度重新审视这一结论,并表明通过基于多个奖励函数的策展可以减轻崩溃。我们形式化了异质偏好下递归训练的动力学,并证明在特定条件下,模型收敛到一个稳定分布,该分布在竞争的高奖励区域之间分配概率质量。极限分布保持多样性,并证明满足加权纳什议价解,为合成再训练循环中的价值聚合提供了正式解释。

英文摘要

Recursive retraining of generative models poses a critical representation challenge: when synthetic outputs are curated based on a fixed reward signal, the model tends to collapse onto a narrow set of outputs that over-optimize that objective. Prior work suggests that such collapse is unavoidable without adding real data into the mix. We revisit this conclusion from an alignment perspective and show that collapse can be mitigated through curation based on multiple reward functions. We formalize the dynamics of recursive training under heterogeneous preferences and prove that, under certain conditions, the model converges to a stable distribution that allocates probability mass across competing high-reward regions. The limiting distribution preserves diversity and provably satisfies a weighted Nash bargaining solution, offering a formal interpretation of value aggregation in synthetic retraining loops.

2605.07032 2026-06-04 cs.LG cs.AI 版本更新

A Systematic Investigation of RL-Jailbreaking in LLMs

LLMs中RL越狱的系统性研究

Montaser Mohammedalamen, Kevin Roice, Reginald McLean, Alyssa Lefaivre Škopac

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 本文首次系统分解RL越狱框架,通过分析奖励函数、动作空间、回合长度等环境形式化因素和算法措施,发现密集奖励和延长回合长度是越狱成功的主要驱动因素,并提供了提升RL越狱效率及强化模型防御的工具。

Comments Warning: This paper may contain unfiltered and potentially offensive jailbreaking examples. Accepted at the Second Workshop on Agents in the Wild: Safety, Security, and Beyond (AIWILD) at ICML 2026

AI中文摘要

生成模型从下一个词预测器演变为复杂系统的自主引擎,这要求严格的安全加固。对抗性越狱,即通过策略性操纵模型以产生有害输出,仍然是安全部署的主要威胁。虽然强化学习(RL)通过顺序优化将越狱视为多步攻击,但对该框架为何成功的机制理解仍不完整。为填补这一空白,我们首次对RL越狱进行了系统分解。我们将框架解构为问题形式化(奖励函数、动作空间、回合长度)和算法措施(RL算法、训练数据、奖励塑造),以识别对抗成功的结构决定因素。我们的结果表明,RL越狱者成功攻破了所有目标模型和安全措施。通过这种首次分析,我们证明环境形式化,特别是密集奖励和延长回合长度,是越狱成功的主要驱动因素。这项工作为提高RL越狱效率提供了工具,并最终强化生成模型以抵御基于RL的攻击。

英文摘要

The evolution of generative models from next-token predictors to autonomous engines of complex systems necessitates rigorous safety hardening. Adversarial jailbreaking, the strategic manipulation of models to elicit harmful output, remains a primary threat to safe deployment. While Reinforcement Learning (RL) frames jailbreaking as a multi-step attack through sequential optimization, a mechanistic understanding of why the framework succeeds remains incomplete. To fill this gap, we present the first systematic decomposition of RL jailbreaking. We deconstruct the framework into problem formalization (reward function, action space, episode length) and algorithmic measures (RL algorithm, training data, reward shaping) to identify the structural determinants of adversarial success. Our results reveal that the RL-jailbreaker successfully compromised all targeted models and safeguards. Through this first-of-its-kind analysis, we demonstrate that environment formalization, specifically dense rewards and extended episode lengths, is the primary driver of jailbreaking success. This work provides a tool for improving RL-jailbreaker efficiency and, ultimately, for hardening generative models against RL-based attacks.

2605.00242 2026-06-04 cs.CV cs.AI 版本更新

MAEPose: Self-Supervised Spatiotemporal Learning for Human Pose Estimation on mmWave Video

MAEPose: 基于毫米波视频的人体姿态估计的自监督时空学习

Xijia Wei, Yuan Fang, Kevin Chetty, Youngjun Cho, Nadia Bianchi-Berthouze

发表机构 * University College London(伦敦大学学院)

AI总结 提出MAEPose,一种直接处理毫米波频谱视频的掩码自编码方法,通过自监督时空学习实现鲁棒的人体姿态估计,在三个数据集上优于现有方法。

AI中文摘要

毫米波雷达为基于RGB的人体姿态估计提供了一种更具隐私保护性的替代方案。然而,现有方法通常依赖预提取的中间表示,如稀疏点云或频谱图图像,这些方法丢弃了雷达视频流中自然存在的丰富时空信息用于模型学习,同时此类信号处理增加了系统复杂性。此外,现有解决方案主要采用端到端监督方式,未利用未标记的原始视频流来学习通用表示。在本研究中,我们提出MAEPose,一种基于掩码自编码的人体姿态估计方法,直接处理毫米波频谱视频。MAEPose从未标记的雷达视频中学习时空运动感知的通用表示,并利用其热图解码器进行多帧姿态估计预测。我们基于留一法交叉验证和严格的统计检验,在三个数据集上对其进行评估。MAEPose在MPJPE指标上始终优于最先进的基线方法,最高提升22.1%(p<0.05),并且在零样本旁观者干扰下保持鲁棒精度,误差仅增加6.5%。消融研究证实,预训练和热图解码器均有显著贡献,而模态分析表明,使用距离-多普勒视频作为输入比距离-方位角或其融合能实现更好的姿态估计性能,且计算成本更低。

英文摘要

Millimetre-wave (mmWave) radar offers a more privacy-preserving alternative to RGB-based human pose estimation. However, existing methods typically rely on pre-extracted intermediate representations such as sparse point clouds or spectrogram images, which discard the rich spatiotemporal information naturally present in radar video streams, while the required signal processing adds system complexity. In addition, existing solutions are mainly trained in an end-to-end supervised manner without leveraging unlabelled raw video streams to learn generalized representations. In this study, we present MAEPose, a masked autoencoding-based human pose estimation approach that operates directly on mmWave spectrogram videos. MAEPose learns spatiotemporal motion-aware generalized representations from unlabelled radar video, and leverages its heatmap decoder for multi-frame pose estimation. We evaluate it across three datasets based on leave-one-person-out cross-validation with rigorous statistical testing. MAEPose consistently outperforms state-of-the-art baselines by up to 22.1% in MPJPE (p<0.05), and maintains robust accuracy under zero-shot bystander interference with only a 6.5% error increase. Ablation studies confirm that both the pre-training and the heatmap decoder contribute substantially, while modality analysis indicates that using Range-Doppler video as input achieves better pose estimation performance than Range-Azimuth or their fusion, with lower computational cost.

2604.27007 2026-06-04 cs.AI 版本更新

Binary Spiking Neural Networks as Causal Models

二元脉冲神经网络作为因果模型

Aditya Kar, Emiliano Lorini, Timothée Masquelier

发表机构 * Institut de Recherche en Informatique de Toulouse (IRIT)(图卢兹信息研究所(IRIT)) Centre de Recherche Cerveau et Cognition (CerCo)(脑与认知研究中心(CerCo)) CNRS(国家科学研究中心)

AI总结 将二元脉冲神经网络(BSNN)表示为二元因果模型,利用SAT和SMT求解器计算溯因解释,并保证解释中不包含无关特征。

Journal ref
Logics for New-Generation AI 2025 Fifth International Workshop, Beishui Liao; Antonino Rotolo; Leendert van der Torre; Liuwen Yu, Dec 2025, Luxembourg City, Luxembourg. pp.51-68
AI中文摘要

我们对二元脉冲神经网络(BSNN)进行因果分析以解释其行为。我们正式定义了BSNN,并将其脉冲活动表示为二元因果模型。借助这种因果表示,我们能够利用基于逻辑的方法解释网络的输出。特别地,我们展示了可以成功使用SAT和SMT求解器从该二元因果模型中计算溯因解释。为了说明我们的方法,我们在标准MNIST数据集上训练了BSNN,并应用基于SAT和SMT的方法,基于像素级特征找到网络分类的溯因解释。我们还将找到的解释与可解释AI领域流行的方法SHAP进行了比较。我们表明,与SHAP不同,我们的方法保证找到的解释不包含完全无关的特征。

英文摘要

We provide a causal analysis of Binary Spiking Neural Networks (BSNNs) to explain their behavior. We formally define a BSNN and represent its spiking activity as a binary causal model. Thanks to this causal representation, we are able to explain the output of the network by leveraging logic-based methods. In particular, we show that we can successfully use a SAT solver as well as an SMT solver to compute abductive explanations from this binary causal model. To illustrate our approach, we trained the BSNN on the standard MNIST dataset and applied our SAT-based and SMT-based methods to find abductive explanations of the network's classifications based on pixel-level features. We also compared the explanations found against SHAP, a popular method used in the area of explainable AI. We show that, unlike SHAP, our approach guarantees that an explanation it finds does not contain completely irrelevant features.

2604.25860 2026-06-04 cs.CL cs.AI cs.CY 版本更新

Luminol-AIDetect: Fast Zero-shot Machine-Generated Text Detection based on Perplexity under Text Shuffling

Luminol-AIDetect: 基于文本打乱下困惑度的快速零样本机器生成文本检测

Lucio La Cava, Andrea Tagarelli

发表机构 * DIMES Dept., University of Calabria(卡拉布里亚大学DIMES系)

AI总结 提出Luminol-AIDetect,一种通过随机打乱文本并利用困惑度变化来区分机器生成文本与人类写作的零样本统计方法,在多个领域和攻击下达到SOTA性能。

Comments Under Review

详情
AI中文摘要

机器生成文本检测需要识别跨生成模型的结构不变信号,而非依赖模型特定指纹。为此,我们假设尽管大语言模型擅长局部语义一致性,但其自回归特性导致与人类写作相比存在特定结构脆弱性。我们提出Luminol-AIDetect,一种新颖的零样本统计方法,通过连贯性破坏暴露这种脆弱性。通过应用简单的随机文本打乱程序,我们证明困惑度的变化可作为原则性的、模型无关的判别依据,因为机器生成文本在打乱下的困惑度表现出特征性分散,与人类写作更稳定的结构变异性显著不同。Luminol-AIDetect利用这一区别指导决策过程,从输入文本及其打乱版本中提取少量基于困惑度的标量特征,然后通过密度估计和集成预测进行检测。在8个内容领域、11种对抗攻击类型和18种语言上的评估表明,Luminol-AIDetect实现了最先进的性能,FPR降低高达17倍,同时成本低于先前方法。

英文摘要

Machine-generated text (MGT) detection requires identifying structurally invariant signals across generation models, rather than relying on model-specific fingerprints. In this respect, we hypothesize that while large language models excel at local semantic consistency, their autoregressive nature results in a specific kind of structural fragility compared to human writing. We propose Luminol-AIDetect, a novel, zero-shot statistical approach that exposes this fragility through coherence disruption. By applying a simple randomized text-shuffling procedure, we demonstrate that the resulting shift in perplexity serves as a principled, model-agnostic discriminant, as MGT displays a characteristic dispersion in perplexity-under-shuffling that differs markedly from the more stable structural variability of human-written text. Luminol-AIDetect leverages this distinction to inform its decision process, where a handful of perplexity-based scalar features are extracted from an input text and its shuffled version, then detection is performed via density estimation and ensemble-based prediction. Evaluated across 8 content domains, 11 adversarial attack types, and 18 languages, Luminol-AIDetect demonstrates state-of-the-art performance, with gains up to 17x lower FPR while being cheaper than prior methods.
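The perplexity-under-shuffling idea described above can be illustrated with a toy sketch. This is not the authors' implementation: an add-one-smoothed bigram model stands in for the LLM scorer, the function names (`train_bigram`, `perplexity`, `shuffle_features`) are hypothetical, and the three features returned are illustrative assumptions rather than the paper's actual feature set. The density-estimation and ensemble steps are omitted.

```python
import math
import random
from collections import Counter

def train_bigram(corpus_tokens):
    """Count unigrams and bigrams for a toy add-one-smoothed bigram model.
    Assumption: this stands in for the LLM that would score real text."""
    unigrams = Counter(corpus_tokens)
    bigrams = Counter(zip(corpus_tokens, corpus_tokens[1:]))
    return unigrams, bigrams, len(unigrams)

def perplexity(tokens, model):
    """Perplexity of a token sequence under the toy bigram model."""
    unigrams, bigrams, vocab_size = model
    log_prob = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        log_prob += math.log(p)
    n_transitions = max(len(tokens) - 1, 1)
    return math.exp(-log_prob / n_transitions)

def shuffle_features(tokens, model, n_shuffles=20, seed=0):
    """Score the text, then score random permutations of it, and summarise
    how perplexity shifts and disperses under shuffling (illustrative features)."""
    rng = random.Random(seed)
    base = perplexity(tokens, model)
    shuffled_ppl = []
    for _ in range(n_shuffles):
        perm = list(tokens)
        rng.shuffle(perm)  # destroy word order, keep the bag of words
        shuffled_ppl.append(perplexity(perm, model))
    mean_ppl = sum(shuffled_ppl) / len(shuffled_ppl)
    variance = sum((x - mean_ppl) ** 2 for x in shuffled_ppl) / len(shuffled_ppl)
    return {
        "ppl": base,
        "ppl_shift": mean_ppl - base,
        "ppl_dispersion": math.sqrt(variance),
    }

corpus = "the cat sat on the mat and the dog sat on the rug".split()
model = train_bigram(corpus)
features = shuffle_features("the cat sat on the mat".split(), model)
```

In a real detector these scalar features would be computed with an actual language model and passed to a downstream classifier; the sketch only shows where the shift and dispersion signals come from.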

2603.01421 2026-06-04 cs.AI cs.CL 版本更新

SciDER: Scientific Data-centric End-to-end Researcher

SciDER: 以科学数据为中心的端到端研究者

Ke Lin, Owais Aijaz, Yilin Lu, Yiyang Luo, Xuehang Guo, Preslav Nakov

发表机构 * GitHub

AI总结 提出SciDER多智能体系统,通过数据驱动方法和动态多模态技能系统,自动化科学研究的全生命周期,并在六个基准测试中取得领先结果。

Comments 10 pages, 8 figures, 7 tables

AI中文摘要

虽然大型语言模型加速了科学发现,但现有智能体在适应性、领域泛化和多模态可扩展性方面面临严重限制,通常难以自主处理原始的、特定领域的实验数据。为了克服这些障碍,我们引入了SciDER,一个旨在灵活自动化整个研究生命周期的多智能体系统。该框架采用新颖的数据中心方法,并在四个专门的子智能体之间集成动态多模态技能系统。具体来说,一个构思智能体通过进化思想搜索生成新颖假设,一个数据分析智能体系统化地结构化原始数据,一个实验智能体基于数据集特征合成可执行代码,一个批评智能体驱动迭代自我改进。为了民主化开源科学发现,我们发布了OpenSciDER-SFT-8K,一个高质量的执行轨迹数据集,以及OpenSciDER-27B微调模型。在六个基准测试中,SciDER和OpenSciDER取得了具有竞争力或领先的结果,在数据中心分析、端到端研究执行和多模态科学可视化方面尤其强劲。通过将数据分析与实验执行相结合,SciDER弥合了抽象科学推理与可重复实验合成之间的差距。

英文摘要

While large language models accelerate scientific discovery, existing agents face severe limitations in adaptability, domain generalization, and multimodal scalability, often struggling to autonomously process raw, domain-specific experimental data. To overcome these barriers, we introduce SciDER, a multi-agent system designed to flexibly automate the entire research lifecycle. This framework employs a novel data-centric approach and integrates a dynamic multimodal skill system across four specialized sub-agents. Specifically, an ideation agent generates novel hypotheses via Evolutionary Idea Search, a data analysis agent systematically structures raw data, an experimentation agent synthesizes executable code grounded in dataset characteristics, and a critic agent drives iterative self-refinement. To democratize open-source scientific discovery, we release OpenSciDER-SFT-8K, a high-quality execution trajectory dataset, alongside the OpenSciDER-27B fine-tuned model. Across six benchmarks, SciDER and OpenSciDER obtain competitive or leading results, with especially strong gains on data-centric analysis, end-to-end research execution, and multimodal scientific visualization. By integrating data analysis with experimental execution, SciDER bridges the gap between abstract scientific reasoning and reproducible experimentation synthesis.

2510.11194 2026-06-04 cs.AI 版本更新

Aligning Deep Implicit Preferences by Learning to Reason Defensively

通过防御性推理对齐深度隐式偏好

Peiming Li, Zhiyuan Hu, Yang Tang, Shiyu Li, Xi Chen

发表机构 * Basic Algorithm Center, PCG, Tencent(腾讯基本算法中心) School of Electronic and Computer Engineering, Peking University(北京大学电子与计算机工程学院)

AI总结 提出基于批判驱动推理对齐(CDRA)的方法,通过DeepPref基准和个性化生成过程奖励模型(Pers-GenPRM),将偏好对齐转化为结构化推理过程,以推断用户深层隐式偏好并实现防御性推理。

Journal ref
ICLR 2026 Conference
AI中文摘要

个性化对齐对于使大型语言模型(LLMs)有效参与以用户为中心的交互至关重要。然而,当前方法面临双重挑战:它们无法推断用户的深度隐式偏好(包括未言明的目标、语义上下文和风险容忍度),并且缺乏在现实世界模糊性中进行防御性推理所需的能力。这种认知差距导致响应肤浅、脆弱且短视。为了解决这个问题,我们提出了批判驱动推理对齐(CDRA),它将对齐从标量奖励匹配任务重新构建为结构化推理过程。首先,为了弥合偏好推断差距,我们引入了DeepPref基准。该数据集包含20个主题的3000个偏好-查询对,通过模拟多面认知委员会生成带有批判注释的推理链,以解构查询语义并揭示潜在风险。其次,为了灌输防御性推理,我们引入了个性化生成过程奖励模型(Pers-GenPRM),它将奖励建模构建为个性化推理任务。它在输出基于此推理的最终分数之前,生成批判链以评估响应与用户偏好的一致性。最终,这种可解释的结构化奖励信号通过批判驱动策略对齐(一种结合数值和自然语言反馈的过程级在线强化学习算法)指导策略模型。实验表明,CDRA在执行稳健推理的同时,擅长发现并与用户的真实偏好对齐。我们的代码和数据集可在https://github.com/Zephyrian-Hugh/Deep-pref获取。

英文摘要

Personalized alignment is crucial for enabling Large Language Models (LLMs) to engage effectively in user-centric interactions. However, current methods face a dual challenge: they fail to infer users' deep implicit preferences (including unstated goals, semantic context, and risk tolerances), and they lack the defensive reasoning required to navigate real-world ambiguity. This cognitive gap leads to responses that are superficial, brittle, and short-sighted. To address this, we propose Critique-Driven Reasoning Alignment (CDRA), which reframes alignment from a scalar reward-matching task into a structured reasoning process. First, to bridge the preference inference gap, we introduce the DeepPref benchmark. This dataset, comprising 3000 preference-query pairs across 20 topics, is curated by simulating a multi-faceted cognitive council that produces critique-annotated reasoning chains to deconstruct query semantics and reveal latent risks. Second, to instill defensive reasoning, we introduce the Personalized Generative Process Reward Model (Pers-GenPRM), which frames reward modeling as a personalized reasoning task. It generates a critique chain to evaluate a response's alignment with user preferences before outputting a final score based on this rationale. Ultimately, this interpretable, structured reward signal guides the policy model through Critique-Driven Policy Alignment, a process-level online reinforcement learning algorithm integrating both numerical and natural language feedback. Experiments demonstrate that CDRA excels at discovering and aligning with users' true preferences while executing robust reasoning. Our code and dataset are available at https://github.com/Zephyrian-Hugh/Deep-pref.

2401.07386 2026-06-04 cs.CY cs.AI cs.LG 版本更新

How do machines learn? Evaluating the AIcon2abs method

机器如何学习?评估AIcon2abs方法

Rubens Lacerda Queiroz, Cabral Lima, Fabio Ferrentini Sampaio, Priscila Machado Vieira Lima

发表机构 * PPGI, Federal University of Rio de Janeiro(里约热内卢联邦大学PPGI系) Computer Science Institute, Federal University of Rio de Janeiro(里约热内卢联邦大学计算机科学研究所) Polytechnic University of Setúbal – Portugal(葡萄牙塞图巴尔理工大学) PESC/COPPE, Federal University of Rio de Janeiro(里约热内卢联邦大学PESC/COPPE系) Tercio Pacitti Institute (NCE), Federal University of Rio de Janeiro(里约热内卢联邦大学Tercio Pacitti研究所(NCE))

AI总结 本研究通过远程课程实验,评估了基于WiSARD无权重神经网络、无需互联网的AIcon2abs方法在提升不同年龄段公众对机器学习理解方面的有效性,结果显示参与者满意度高。

Comments textual review (spelling and grammar); reorganization of the elements of some figures; New references included

AI中文摘要

本研究扩展了先前介绍AIcon2abs方法(从具体到抽象的人工智能:向公众揭秘人工智能)的工作,该方法是一种创新方法,旨在提高不同年龄群体(包括K-12学生)对机器学习(ML)的理解,并评估其有效性。AIcon2abs采用WiSARD算法,这是一种以其简单性和用户可访问性著称的无权重神经网络。WiSARD不需要互联网,使其非常适合非技术用户和资源有限的环境。该方法使参与者能够通过引人入胜的动手活动直观地可视化和交互ML过程,仿佛他们自己就是算法。该方法允许用户通过实践活动直观地可视化和理解训练和分类的内部过程。由于WiSARD的功能不需要互联网连接,它可以从最小数据集(甚至单个示例)中有效学习。这一特性使用户能够观察到机器在接收更多数据时如何逐步提高其准确性。此外,WiSARD生成代表其学习内容的心理图像,突出显示分类数据的基本特征。AIcon2abs通过一个六小时的远程课程进行测试,有34名巴西参与者,包括5名儿童、5名青少年和24名成人。数据分析从两个角度进行:混合方法预实验(包括假设检验)和定性现象学分析。几乎所有参与者都对AIcon2abs给予正面评价,结果显示在实现预期结果方面具有高度满意度。本研究已获得CEP-HUCFF-UFRJ研究伦理委员会的批准。

英文摘要

This study expands on previous work that introduced the AIcon2abs method (AI from Concrete to Abstract: Demystifying Artificial Intelligence to the general public), an innovative approach designed to increase public understanding of machine learning (ML) across diverse age groups, including K-12 students, and aims to evaluate its effectiveness. AIcon2abs employs the WiSARD algorithm, a weightless neural network known for its simplicity and user accessibility. WiSARD does not require an Internet connection, making it ideal for non-technical users and resource-limited environments. The method enables participants to intuitively visualize and interact with ML processes through engaging, hands-on activities, as if they were the algorithms themselves, and so to understand the internal processes of training and classification. WiSARD can also learn effectively from a minimal dataset, even from a single example. This feature enables users to observe how the machine improves its accuracy incrementally as it receives more data. Moreover, WiSARD generates mental images representing what it has learned, highlighting essential features of the classified data. AIcon2abs was tested through a six-hour remote course with 34 Brazilian participants, including 5 children, 5 adolescents, and 24 adults. Data analysis was conducted from two perspectives: a mixed-method pre-experiment (including hypothesis testing), and a qualitative phenomenological analysis. Nearly all participants rated AIcon2abs positively, with the results demonstrating a high degree of satisfaction in achieving the intended outcomes. This research was approved by the CEP-HUCFF-UFRJ Research Ethics Committee.
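For readers unfamiliar with weightless networks, the following is a minimal sketch of a WiSARD-style classifier, written under stated assumptions: binary inputs, one discriminator per class, RAM nodes stored as Python sets of seen addresses, and a fixed random input mapping. The class and method names are hypothetical and are not taken from the AIcon2abs materials. The "mental image" feature mentioned in the abstract is omitted.

```python
import random

class Discriminator:
    """One class's bank of RAM nodes; each RAM is addressed by a tuple of input bits."""

    def __init__(self, mapping, tuple_size):
        self.mapping = mapping
        self.tuple_size = tuple_size
        self.rams = [set() for _ in range(len(mapping) // tuple_size)]

    def _addresses(self, bits):
        # Yield (ram_index, address) pairs for the given binary input.
        for i in range(len(self.rams)):
            idx = self.mapping[i * self.tuple_size:(i + 1) * self.tuple_size]
            yield i, tuple(bits[j] for j in idx)

    def train(self, bits):
        # Training only marks the addressed location in every RAM: no weights, no gradients.
        for i, addr in self._addresses(bits):
            self.rams[i].add(addr)

    def score(self, bits):
        # Number of RAMs that have seen this address before.
        return sum(addr in self.rams[i] for i, addr in self._addresses(bits))

class WiSARD:
    def __init__(self, input_size, tuple_size, seed=0):
        if input_size % tuple_size:
            raise ValueError("input_size must be a multiple of tuple_size")
        self.mapping = list(range(input_size))
        random.Random(seed).shuffle(self.mapping)  # fixed pseudo-random input mapping
        self.tuple_size = tuple_size
        self.discriminators = {}

    def train(self, bits, label):
        if label not in self.discriminators:
            self.discriminators[label] = Discriminator(self.mapping, self.tuple_size)
        self.discriminators[label].train(bits)

    def classify(self, bits):
        # The class whose discriminator has the most responding RAMs wins.
        return max(self.discriminators, key=lambda lbl: self.discriminators[lbl].score(bits))

net = WiSARD(input_size=8, tuple_size=2)
net.train([1, 1, 1, 1, 0, 0, 0, 0], "left")   # a single example per class
net.train([0, 0, 0, 0, 1, 1, 1, 1], "right")
prediction = net.classify([1, 1, 1, 0, 0, 0, 0, 0])  # noisy "left" pattern
```

Because training is a set insertion per RAM, one example per class already gives a usable classifier, which matches the single-example learning the abstract describes.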

2601.09853 2026-06-04 cs.CL cs.AI 版本更新

MedRedFlag: Investigating how LLMs Redirect Misconceptions in Real-World Health Communication

MedRedFlag:探究LLMs如何在真实健康沟通中纠正误解

Sraavya Sambara, Yuan Pu, Ayman Ali, Vishala Mishra, Lionel Wong, Monica Agrawal

发表机构 * Independent Researcher(独立研究者) Duke University(杜克大学) Stanford University(斯坦福大学)

AI总结 本研究通过构建MedRedFlag数据集(1100+个来自Reddit的需纠正问题),系统比较了先进LLMs与临床医生的回应,发现LLMs常未能纠正问题中的错误前提,可能导致次优医疗决策,揭示了患者面向医疗AI系统的关键安全漏洞。

AI中文摘要

来自患者的真实健康问题往往无意中嵌入了错误的假设或前提。在这种情况下,安全的医疗沟通通常涉及纠正:先指出隐含的误解,然后回应用户的潜在背景,而非原始问题。尽管大型语言模型(LLMs)越来越多地被普通用户用于医疗建议,但它们尚未针对这一关键能力进行测试。因此,在本工作中,我们研究了LLMs如何应对真实健康问题中嵌入的错误前提。我们开发了一个半自动化流程来整理MedRedFlag,这是一个包含1100多个来自Reddit的、需要纠正的问题的数据集。然后,我们系统地比较了最先进的LLMs与临床医生的回应。我们的分析显示,LLMs往往未能纠正有问题的提问,即使检测到了有问题的前提,并且提供的答案可能导致次优的医疗决策。我们的基准测试和结果揭示了LLMs在真实健康沟通条件下表现的新且重大的差距,突显了面向患者的医疗AI系统的关键安全问题。代码和数据集可在https://github.com/srsambara-1/MedRedFlag获取。

英文摘要

Real-world health questions from patients often unintentionally embed false assumptions or premises. In such cases, safe medical communication typically involves redirection: addressing the implicit misconception and then responding to the underlying patient context, rather than the original question. While large language models (LLMs) are increasingly being used by lay users for medical advice, they have not yet been tested for this crucial competency. Therefore, in this work, we investigate how LLMs react to false premises embedded within real-world health questions. We develop a semi-automated pipeline to curate MedRedFlag, a dataset of 1100+ questions sourced from Reddit that require redirection. We then systematically compare responses from state-of-the-art LLMs to those from clinicians. Our analysis reveals that LLMs often fail to redirect problematic questions, even when the problematic premise is detected, and provide answers that could lead to suboptimal medical decision making. Our benchmark and results reveal a novel and substantial gap in how LLMs perform under the conditions of real-world health communication, highlighting critical safety concerns for patient-facing medical AI systems. Code and dataset are available at https://github.com/srsambara-1/MedRedFlag.

2506.10630 2026-06-04 cs.LG cs.AI 版本更新

Time Series Forecasting as Reasoning: A Slow-Thinking Approach with Reinforced LLMs

时间序列预测作为推理:一种基于强化LLM的慢思考方法

Yitong Zhou, Yucong Luo, Mingyue Cheng, Qi Liu, Jiahao Wang, Daoyu Wang, Enhong Chen

发表机构 * State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China(认知智能国家重点实验室,中国科学技术大学)

AI总结 提出Time-R1框架,通过两阶段强化微调(监督微调+强化学习)训练LLM进行多步推理,以提升时间序列预测的准确性。

AI中文摘要

为了推进时间序列预测(TSF),人们提出了各种方法来提高预测精度,从统计技术发展到数据驱动的深度学习架构。尽管这些方法有效,但大多数现有方法仍然遵循快速思考范式——依赖提取历史模式并将其映射到未来值作为核心建模理念,缺乏包含中间时间序列推理的显式思考过程。与此同时,新兴的慢思考LLM(如OpenAI-o1)展示了显著的多步推理能力,为克服这些问题提供了替代途径。然而,仅靠提示工程存在若干局限性——包括高计算成本、隐私风险以及领域特定时间序列深度推理能力有限。为了解决这些局限性,更有前景的方法是训练LLM发展慢思考能力并获得强大的时间序列推理技能。为此,我们提出了Time-R1,一个两阶段强化微调框架,旨在增强LLM用于时间序列预测的多步推理能力。具体来说,第一阶段进行监督微调以进行预热适应,而第二阶段采用强化学习来提高模型的泛化能力。特别地,我们专门为时间序列预测设计了一个细粒度的多目标奖励,然后引入了GRIP(基于组的相对重要性策略优化),它利用非均匀采样进一步鼓励和优化模型对有效推理路径的探索。实验表明,Time-R1在多种数据集上显著提高了预测性能。

英文摘要

To advance time series forecasting (TSF), various methods have been proposed to improve prediction accuracy, evolving from statistical techniques to data-driven deep learning architectures. Despite their effectiveness, most existing methods still adhere to a fast-thinking paradigm: they rely on extracting historical patterns and mapping them to future values as their core modeling philosophy, and lack an explicit thinking process that incorporates intermediate time series reasoning. Meanwhile, emerging slow-thinking LLMs (e.g., OpenAI-o1) have shown remarkable multi-step reasoning capabilities, offering an alternative way to overcome these issues. However, prompt engineering alone presents several limitations, including high computational cost, privacy risks, and limited capacity for in-depth domain-specific time series reasoning. To address these limitations, a more promising approach is to train LLMs to develop slow-thinking capabilities and acquire strong time series reasoning skills. For this purpose, we propose Time-R1, a two-stage reinforcement fine-tuning framework designed to enhance the multi-step reasoning ability of LLMs for time series forecasting. Specifically, the first stage conducts supervised fine-tuning for warmup adaptation, while the second stage employs reinforcement learning to improve the model's generalization ability. In particular, we design a fine-grained multi-objective reward specifically for time series forecasting, and then introduce GRIP (group-based relative importance for policy optimization), which leverages non-uniform sampling to further encourage and optimize the model's exploration of effective reasoning paths. Experiments demonstrate that Time-R1 significantly improves forecast performance across diverse datasets.

2604.14575 2026-06-04 cs.LG cs.AI stat.ME stat.ML 版本更新

Generative Augmented Inference

生成式增强推断

Cheng Lu, Mengxin Wang, Dennis J. Zhang, Heng Zhang

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学) University of Toronto(多伦多大学)

AI总结 提出生成式增强推断(GAI)框架,将AI输出视为学习真实标签的高维信息特征而非代理,通过非参数方法建模,实现人机数据联合的一致估计和有效推断,在随机标注下渐近效率严格优于仅用人类数据。

AI中文摘要

大型语言模型使得廉价的AI生成标注成为可能,但如何可靠地将其用于因果推断仍然具有挑战性。简单地将AI和人类数据混合会引入偏差,而现有方法如预测驱动推断(PPI;Angelopoulos et al., 2023a)将AI输出视为真实标签的代理——这一假设在实践中常被生成模型输出所违背。我们提出生成式增强推断(GAI),一个将AI输出视为学习人类标签的一般性、潜在高维信息特征而非替代品的框架。GAI使用非参数方法灵活建模这种关系,从而能够从人类和AI的联合数据中进行一致估计和有效推断。我们建立了渐近正态性,并证明在随机标注下,只要AI输出对真实标签具有信息量,GAI在渐近效率上严格优于仅使用人类数据的估计。在真实数据集上的实证研究表明,与仅使用人类数据和基于PPI的估计相比,GAI在多种生成数据源上显著降低了估计误差并提高了置信区间质量。

英文摘要

Large language models enable inexpensive AI-generated annotations, but using them reliably for causal inference remains challenging. Naively pooling AI and human data induces bias, while existing methods such as Prediction-Powered Inference (PPI; Angelopoulos et al., 2023a) treat AI outputs as proxies of true labels -- an assumption often violated for generative model outputs in practice. We propose Generative Augmented Inference (GAI), a framework that treats AI outputs as general, potentially high-dimensional informative features for learning human labels rather than as surrogates. GAI flexibly models this relationship using nonparametric methods, enabling consistent estimation and valid inference from combined human and AI data. We establish asymptotic normality and show that, under random labeling, GAI strictly improves asymptotic efficiency over human-data-only estimation whenever AI outputs are informative for true labels. Empirical studies on real-world datasets demonstrate that GAI significantly reduces estimation error and improves confidence interval quality across diverse generative data sources relative to human-only and PPI-based estimation.
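For context, the Prediction-Powered Inference baseline that the abstract contrasts against can be sketched for the simplest case, estimating a mean. The sketch below shows the standard PPI point estimate (mean of AI outputs on unlabeled data plus a "rectifier" estimated on the human-labeled subset). It does not implement GAI, which instead treats AI outputs as features in a nonparametric model; the function name and the toy numbers are assumptions for illustration only.

```python
from statistics import mean

def ppi_mean(ai_unlabeled, ai_labeled, human_labeled):
    """PPI point estimate of a population mean.

    ai_unlabeled:  AI outputs on the large pool without human labels
    ai_labeled:    AI outputs on the small human-labeled subset
    human_labeled: human labels on that same subset (same order)
    """
    # Rectifier: average gap between human labels and AI outputs,
    # which corrects the systematic bias of the AI-only average.
    rectifier = mean(h - a for h, a in zip(human_labeled, ai_labeled))
    return mean(ai_unlabeled) + rectifier

# Toy numbers (hypothetical): the AI over-predicts by 0.5 on average.
estimate = ppi_mean(
    ai_unlabeled=[2.0, 4.0],
    ai_labeled=[3.0, 3.0],
    human_labeled=[2.0, 3.0],
)
```

The correction is only as good as the proxy assumption: the AI output must live on the same scale as the label for the difference `h - a` to be meaningful, which is the assumption the abstract argues generative outputs often violate.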

2604.12645 2026-06-04 cs.RO cs.AI 版本更新

Contextual Multi-Task Reinforcement Learning for Autonomous Reef Monitoring

上下文多任务强化学习用于自主珊瑚礁监测

Melvin Laux, Yi-Ling Liu, Rina Alo, Sören Töpper, Mariela De Lucas Alvarez, Frank Kirchner, Rebecca Adam

发表机构 * University of Bremen(不莱梅大学)

AI总结 针对水下动力学不确定性和任务变化,提出上下文多任务强化学习框架,学习可复用的控制策略,在模拟环境中实现高效训练、零样本泛化和鲁棒性。

Comments To be published in IEEE OCEANS 2026 (Sanya) conference proceedings

AI中文摘要

尽管自主水下航行器有望实现海洋生态系统监测,但其部署从根本上受限于在高度不确定和非平稳的水下动力学下控制航行器的难度。为了解决这些挑战,我们采用数据驱动的强化学习方法来补偿未知动力学和任务变化。传统的单任务强化学习容易过拟合训练环境,从而限制了所学策略的长期实用性。因此,我们提出使用上下文多任务强化学习范式,允许我们学习可复用于各种任务的控制器,例如在一个珊瑚礁中检测牡蛎,在另一个珊瑚礁中检测珊瑚。我们评估上下文多任务强化学习是否能有效学习自主水下珊瑚礁监测的鲁棒且可泛化的控制策略。我们在HoloOcean中的模拟珊瑚礁环境中训练了一个单一上下文相关策略,该策略能够解决多个相关的监测任务。在我们的实验中,我们经验性地评估了上下文策略在样本效率、对未见任务的零样本泛化以及对变化水流的鲁棒性方面的表现。通过利用多任务强化学习,我们旨在提高训练效率以及所学策略的可重用性,从而向更可持续的自主珊瑚礁监测程序迈进一步。

英文摘要

Although autonomous underwater vehicles promise to enable marine ecosystem monitoring, their deployment is fundamentally limited by the difficulty of controlling vehicles under highly uncertain and non-stationary underwater dynamics. To address these challenges, we employ a data-driven reinforcement learning approach to compensate for unknown dynamics and task variations. Traditional single-task reinforcement learning tends to overfit the training environment, thus limiting the long-term usefulness of the learnt policy. Hence, we propose to use a contextual multi-task reinforcement learning paradigm instead, allowing us to learn controllers that can be reused for various tasks, e.g., detecting oysters in one reef and detecting corals in another. We evaluate whether contextual multi-task reinforcement learning can efficiently learn robust and generalisable control policies for autonomous underwater reef monitoring. We train a single context-dependent policy that is able to solve multiple related monitoring tasks in a simulated reef environment in HoloOcean. In our experiments, we empirically evaluate the contextual policies regarding sample efficiency, zero-shot generalisation to unseen tasks, and robustness to varying water currents. By utilising multi-task reinforcement learning, we aim to improve the training effectiveness as well as the reusability of learnt policies, taking a step towards more sustainable procedures in autonomous reef monitoring.

2604.11510 2026-06-04 cs.CL cs.AI cs.LG 版本更新

Policy Split: Incentivizing Dual-Mode Exploration in LLM Reinforcement with Dual-Mode Entropy Regularization

策略分裂:通过双模式熵正则化激励大语言模型强化学习中的双模式探索

Jiashu Yao, Heyan Huang, Daiqing Wu, Zeming Liu, Yuhang Guo

发表机构 * Beijing Institute of Technology(北京理工大学) Tsinghua University(清华大学) Beihang University(北航)

AI总结 提出Policy Split方法,将策略分裂为正常和高熵两种模式,通过协作双模式熵正则化在保持准确性的同时促进多样化探索,实验表明在通用和创造性任务上优于现有基线。

Comments preprint

AI中文摘要

为了在不牺牲准确性的情况下鼓励大语言模型(LLM)强化学习(RL)中的多样化探索,我们提出了Policy Split,一种新颖的范式,通过高熵提示将策略分裂为正常模式和高熵模式。在共享模型参数的同时,两种模式针对不同目标进行协作的双模式熵正则化。具体来说,正常模式优化任务正确性,而高熵模式融入探索偏好,两种模式协作学习。大量实验表明,我们的方法在通用和创造性任务的各种模型规模上始终优于已建立的熵引导RL基线。进一步分析揭示,Policy Split促进了双模式探索,其中高熵模式产生与正常模式不同的行为模式,提供独特的学习信号。

英文摘要

To encourage diverse exploration in reinforcement learning (RL) for large language models (LLMs) without compromising accuracy, we propose Policy Split, a novel paradigm that bifurcates the policy into normal and high-entropy modes with a high-entropy prompt. While sharing model parameters, the two modes undergo collaborative dual-mode entropy regularization tailored to distinct objectives. Specifically, the normal mode optimizes for task correctness, while the high-entropy mode incorporates a preference for exploration, and the two modes learn collaboratively. Extensive experiments demonstrate that our approach consistently outperforms established entropy-guided RL baselines across various model sizes in general and creative tasks. Further analysis reveals that Policy Split facilitates dual-mode exploration, where the high-entropy mode generates behavioral patterns distinct from those of the normal mode, providing unique learning signals.

2604.09686 2026-06-04 cs.AI cs.CV 版本更新

Belief-Aware VLM Model for Human-like Reasoning

信念感知的VLM模型用于类人推理

Anshul Nayak, Shahil Shaik, Yue Wang

发表机构 * Mechanical Engineering Department, Clemson University(克莱姆森大学机械工程系)

AI总结 提出一种信念感知的视觉语言模型框架,通过检索式记忆和强化学习近似信念,提升长时程意图推理能力,在HD-EPIC等数据集上优于零样本基线。

Comments Accepted for publication at the IEEE International Conference on Robot and Human Interactive Communication (RO-MAN 2026). 6 pages, 3 figures, 1 table

AI中文摘要

传统的意图推理神经网络模型严重依赖可观测状态,难以泛化到多样化的任务和动态环境。视觉语言模型(VLM)和视觉语言动作(VLA)模型的最新进展通过大规模多模态预训练引入了常识推理,实现了跨任务的零样本性能。然而,这些模型仍然缺乏显式的信念表示和更新机制,限制了其像人类一样推理或捕捉长时程中不断演变的人类意图的能力。为了解决这个问题,我们提出了一个信念感知的VLM框架,集成了基于检索的记忆和强化学习。我们不学习显式的信念模型,而是使用基于向量的记忆来近似信念,该记忆检索相关的多模态上下文,并将其纳入VLM进行推理。我们进一步通过在VLM潜在空间上使用强化学习策略来优化决策。我们在公开可用的VQA数据集(如HD-EPIC)上评估了我们的方法,并展示了相对于零样本基线的持续改进,突出了信念感知推理的重要性。

英文摘要

Traditional neural network models for intent inference rely heavily on observable states and struggle to generalize across diverse tasks and dynamic environments. Recent advances in Vision Language Models (VLMs) and Vision Language Action (VLA) models introduce common-sense reasoning through large-scale multimodal pretraining, enabling zero-shot performance across tasks. However, these models still lack explicit mechanisms to represent and update belief, limiting their ability to reason like humans or capture evolving human intent over long horizons. To address this, we propose a belief-aware VLM framework that integrates retrieval-based memory and reinforcement learning. Instead of learning an explicit belief model, we approximate belief using a vector-based memory that retrieves relevant multimodal context, which is incorporated into the VLM for reasoning. We further refine decision-making using a reinforcement learning policy over the VLM latent space. We evaluate our approach on publicly available VQA datasets such as HD-EPIC and demonstrate consistent improvements over zero-shot baselines, highlighting the importance of belief-aware reasoning.

2604.07778 2026-06-04 cs.AI 版本更新

The Accountability Horizon: An Impossibility Theorem for Governing Human-Agent Collectives

问责地平线:人类-智能体集体治理的不可能性定理

Haileleol Tibebu, Hewan Shemtaga

发表机构 * University of Illinois at Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) Responsible Intelligence Institute(负责任智能研究所)

AI总结 本文通过引入人类-智能体集体形式化模型和问责不完全性定理,证明当自主性超过可计算阈值时,现有AI问责框架必然失效,并基于合成实验验证了该相变边界。

AI中文摘要

现有的AI系统问责框架(法律、伦理和监管)都基于一个共同假设:对于任何重要结果,至少有一个可识别的人具有足够的参与度和预见性来承担有意义的责任。本文证明,一旦自主性超过可计算阈值,智能体AI系统违反这一假设不是工程限制,而是数学必然性。我们引入了人类-智能体集体,这是一种联合人-AI系统的形式化,其中智能体被建模为共享结构因果模型中的状态-策略元组。自主性通过四维信息论特征(认知、执行、评估、社会)来刻画;集体行为通过交互图和联合行动空间来刻画。我们通过四个最小属性公理化了合法问责:可归因性(责任需要因果贡献)、可预见性边界(责任不能超过预测能力)、非空性(至少一个智能体承担非平凡责任)和完备性(所有责任必须完全分配)。我们的核心结果——问责不完全性定理——证明,对于任何复合自主性超过问责地平线且交互图包含人-AI反馈循环的集体,没有框架能同时满足所有四个属性。这种不可能性是结构性的:透明度、审计和监督在不降低自主性的情况下无法解决。在阈值以下,存在合法框架,从而建立了一个尖锐的相变。在3000个合成集体上的实验证实了所有预测,零违规。这是AI治理中的第一个不可能性结果,建立了一个形式边界,低于该边界当前范式仍然有效,高于该边界分布式问责机制变得必要。

英文摘要

Existing accountability frameworks for AI systems, legal, ethical, and regulatory, rest on a shared assumption: for any consequential outcome, at least one identifiable person had enough involvement and foresight to bear meaningful responsibility. This paper proves that agentic AI systems violate this assumption not as an engineering limitation but as a mathematical necessity once autonomy exceeds a computable threshold. We introduce Human-Agent Collectives, a formalisation of joint human-AI systems where agents are modelled as state-policy tuples within a shared structural causal model. Autonomy is characterised through a four-dimensional information-theoretic profile (epistemic, executive, evaluative, social); collective behaviour through interaction graphs and joint action spaces. We axiomatise legitimate accountability through four minimal properties: Attributability (responsibility requires causal contribution), Foreseeability Bound (responsibility cannot exceed predictive capacity), Non-Vacuity (at least one agent bears non-trivial responsibility), and Completeness (all responsibility must be fully allocated). Our central result, the Accountability Incompleteness Theorem, proves that for any collective whose compound autonomy exceeds the Accountability Horizon and whose interaction graph contains a human-AI feedback cycle, no framework can satisfy all four properties simultaneously. The impossibility is structural: transparency, audits, and oversight cannot resolve it without reducing autonomy. Below the threshold, legitimate frameworks exist, establishing a sharp phase transition. Experiments on 3,000 synthetic collectives confirm all predictions with zero violations. This is the first impossibility result in AI governance, establishing a formal boundary below which current paradigms remain valid and above which distributed accountability mechanisms become necessary.

2604.04944 2026-06-04 cs.CL cs.AI 版本更新

Inclusion-of-Thoughts: Mitigating Preference Instability via Purifying the Decision Space

包含思维:通过净化决策空间缓解偏好不稳定性

Mohammad Reza Ghasemi Madani, Soyeon Caren Han, Shuo Yang, Jey Han Lau

发表机构 * School of Computing and Information Systems, The University of Melbourne(计算与信息系统学院,墨尔本大学)

AI总结 提出包含思维(IoT)策略,通过逐步自过滤干扰选项来重构多选题,从而稳定模型偏好并提升推理性能。

详情
AI中文摘要

多项选择题(MCQ)被广泛用于评估大型语言模型(LLM)。然而,LLM 仍然容易受到似是而非的干扰项的影响。这常常将注意力转移到无关选项上,导致在正确和错误答案之间不稳定地摇摆。在本文中,我们提出包含思维(IoT),一种渐进式自过滤策略,旨在减轻这种认知负荷(即干扰项存在下模型偏好的不稳定性),并使模型更有效地关注合理答案。我们的方法仅使用合理的选项选择来重构 MCQ,为检查比较判断以及模型在扰动下内部推理的稳定性提供了一个受控环境。通过明确记录这一过滤过程,IoT 还增强了模型决策的透明度和可解释性。广泛的实证评估表明,IoT 在算术、常识推理和教育基准测试中显著提升了思维链性能,且计算开销极小。

英文摘要

Multiple-choice questions (MCQs) are widely used to evaluate large language models (LLMs). However, LLMs remain vulnerable to the presence of plausible distractors. This often diverts attention toward irrelevant choices, resulting in unstable oscillation between correct and incorrect answers. In this paper, we propose Inclusion-of-Thoughts (IoT), a progressive self-filtering strategy that is designed to mitigate this cognitive load (i.e., instability of model preferences under the presence of distractors) and enable the model to focus more effectively on plausible answers. Our method operates to reconstruct the MCQ using only plausible option choices, providing a controlled setting for examining comparative judgements and therefore the stability of the model's internal reasoning under perturbation. By explicitly documenting this filtering process, IoT also enhances the transparency and interpretability of the model's decision-making. Extensive empirical evaluation demonstrates that IoT substantially boosts chain-of-thought performance across a range of arithmetic, commonsense reasoning, and educational benchmarks with minimal computational overhead.
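The abstract does not spell out the filtering procedure, so the following is only a minimal sketch of one way a progressive self-filtering loop could look: ask for an answer, set the pick aside, ask again on what remains, then re-ask on the reduced option set. The `choose` callback, the `rounds` parameter, and the toy scorer are all hypothetical stand-ins for an LLM call and are not taken from the paper.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical interface: choose(question, options) returns one of the options.
Chooser = Callable[[str, Sequence[str]], str]


def inclusion_filter(question: str, options: Sequence[str],
                     choose: Chooser, rounds: int = 2) -> Tuple[List[str], str]:
    """Collect up to `rounds` plausible options, then answer on that reduced set."""
    remaining = list(options)
    plausible: List[str] = []
    for _ in range(min(rounds, len(remaining))):
        pick = choose(question, remaining)   # model's current preference
        plausible.append(pick)
        remaining.remove(pick)               # force a different pick next round
    final = choose(question, plausible)      # re-ask with distractors removed
    return plausible, final


# Toy chooser standing in for a model: picks the highest-scored option.
TOY_SCORES = {"3": 0.3, "4": 0.9, "5": 0.4, "22": 0.1}


def toy_choose(question: str, options: Sequence[str]) -> str:
    return max(options, key=lambda o: TOY_SCORES[o])


plausible, final = inclusion_filter("What is 2 + 2?", ["3", "4", "5", "22"], toy_choose)
```

Returning the intermediate `plausible` list alongside the final answer keeps the filtering trace inspectable, which is in the spirit of the transparency claim in the abstract.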

2602.00104 2026-06-04 cs.CV cs.AI 版本更新

R3G: A Reasoning-Retrieval-Reranking Framework for Vision-Centric Answer Generation

R3G: 一种面向以视觉为中心的答案生成的推理-检索-重排序框架

Zhuohong Chen, Zhengxian Wu, Zirui Liao, Shenao Jiang, Hangrui Xu, Yang Chen, Chaokui Su, Xiaoyu Liu, Haoqian Wang

发表机构 * The Shenzhen International Graduate School, Tsinghua University, China(清华大学深圳国际研究生院) State Key Laboratory of Nuclear Power Safety Technology and Equipment, China(核能安全技术与装备国家重点实验室) School of Computer Science and Information Engineering, Hefei University of Technology, China(合肥工业大学计算机科学与信息工程学院)

AI总结 提出R3G框架,通过先制定推理计划指定所需视觉线索,再采用粗检索加细粒度重排序的两阶段策略选择证据图像,在MRAG-Bench上提升六种多模态大语言模型在九个子场景中的准确率,实现整体最优性能。

详情
AI中文摘要

以视觉为中心的VQA检索需要检索图像以补充缺失的视觉线索,并将其整合到推理过程中。然而,选择正确的图像并将其有效整合到模型的推理中仍然具有挑战性。为了解决这一挑战,我们提出了R3G,一个模块化的推理-检索-重排序框架。它首先生成一个简要的推理计划,指定所需的视觉线索,然后采用两阶段策略,先进行粗检索,再进行细粒度重排序,以选择证据图像。在MRAG-Bench上,R3G在六个多模态大语言模型骨干和九个子场景中提高了准确率,实现了整体最优性能。消融实验表明,充分性感知的重排序和推理步骤是互补的,有助于模型既选择正确的图像又充分利用它们。我们在https://github.com/czh24/R3G发布代码和数据。

英文摘要

Vision-centric retrieval for VQA requires retrieving images to supply missing visual cues and integrating them into the reasoning process. However, selecting the right images and integrating them effectively into the model's reasoning remains challenging. To address this challenge, we propose R3G, a modular Reasoning-Retrieval-Reranking framework. It first produces a brief reasoning plan that specifies the required visual cues, then adopts a two-stage strategy, with coarse retrieval followed by fine-grained reranking, to select evidence images. On MRAG-Bench, R3G improves accuracy across six MLLM backbones and nine sub-scenarios, achieving state-of-the-art overall performance. Ablations show that sufficiency-aware reranking and reasoning steps are complementary, helping the model both choose the right images and use them well. We release code and data at https://github.com/czh24/R3G.
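As an illustration of the coarse-retrieval-then-reranking pattern described above, here is a small sketch under stated assumptions: the coarse stage is plain cosine similarity over embedding vectors, and the fine stage is a caller-supplied scoring function. The item layout (`id`, `vec`, `cues`), the cue-coverage scorer, and all names are hypothetical; the released R3G code may work quite differently.

```python
import math
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_then_rerank(query_vec: Sequence[float], corpus: List[Dict],
                         fine_score: Callable[[Dict], float],
                         coarse_k: int, final_k: int) -> List[Dict]:
    """Stage 1: cheap similarity cut to coarse_k. Stage 2: rerank survivors."""
    coarse = sorted(corpus, key=lambda it: cosine(query_vec, it["vec"]),
                    reverse=True)[:coarse_k]
    return sorted(coarse, key=fine_score, reverse=True)[:final_k]


# Hypothetical reasoning plan: the visual cues the answer is assumed to need.
required_cues = {"wing", "beak"}
corpus = [
    {"id": "a", "vec": [1.0, 0.0], "cues": {"wing"}},
    {"id": "b", "vec": [0.9, 0.1], "cues": {"wing", "beak"}},
    {"id": "c", "vec": [0.0, 1.0], "cues": {"wing", "beak", "tail"}},
    {"id": "d", "vec": [0.8, 0.6], "cues": {"beak"}},
]


def cue_coverage(item: Dict) -> float:
    """Fine score: how many required cues this image is tagged with."""
    return float(len(required_cues & item["cues"]))


top = retrieve_then_rerank([1.0, 0.0], corpus, cue_coverage, coarse_k=3, final_k=2)
```

Note that item `c` covers every cue but never reaches the reranker because the coarse stage drops it, which shows why the choice of `coarse_k` matters in any two-stage design.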

2604.00819 2026-06-04 cs.CL cs.AI 版本更新

Emotion Entanglement and Bayesian Inference for Multi-Dimensional Emotion Understanding

情感纠缠与贝叶斯推理用于多维情感理解

Hemanth Kotaprolu, Kishan Maharaj, Raey Zhao, Abhijit Mishra, Pushpak Bhattacharyya

发表机构 * Indian Institute of Technology Bombay(印度理工学院孟买分校) University of Texas at Austin(德克萨斯大学奥斯汀分校) IBM Research(IBM研究院)

AI总结 提出基于Plutchik基本情绪理论的情感场景基准EmoScene,并利用情感共现统计的贝叶斯推理框架进行联合后验推理,提升多维情感理解的结构一致性。

Comments 19 pages in total, 10 Figures, 7 Tables

详情
AI中文摘要

理解自然语言中的情感本质上是一个多维推理问题,其中多个情感信号通过上下文、人际关系和情境线索相互作用。然而,大多数现有的情感理解基准依赖于短文本和预定义的情感标签,将这一过程简化为独立的标签预测,忽略了情感之间的结构化依赖关系。为了解决这一局限性,我们引入了情感场景(EmoScene),一个基于理论的基准,包含4,731个上下文丰富的场景,并用源自Plutchik基本情绪的8维情感向量进行标注。基于情感很少独立出现的观察,我们进一步提出了一个纠缠感知的贝叶斯推理框架,该框架结合情感共现统计,对情感向量进行联合后验推理。这种轻量级的后处理不需要任何参数更新,提高了预测的结构一致性,并在不增加额外成本的情况下,整体词汇准确率提升了2.24%。因此,EmoScene为研究多维情感理解和当前语言模型的局限性提供了一个具有挑战性的基准。

英文摘要

Understanding emotions in natural language is inherently a multi-dimensional reasoning problem, where multiple affective signals interact through context, interpersonal relations, and situational cues. However, most existing emotion understanding benchmarks rely on short texts and predefined emotion labels, reducing this process to independent label prediction and ignoring the structured dependencies among emotions. To address this limitation, we introduce Emotional Scenarios (EmoScene), a theory-grounded benchmark of 4,731 context-rich scenarios annotated with an 8-dimensional emotion vector derived from Plutchik's basic emotions. Motivated by the observation that emotions rarely occur independently, we further propose an entanglement-aware Bayesian inference framework that incorporates emotion co-occurrence statistics to perform joint posterior inference over the emotion vector. This lightweight post-processing requires no parameter updates, improves the structural consistency of predictions, and yields overall gains of 2.24% Lexical Accuracy without any additional cost. EmoScene therefore provides a challenging benchmark for studying multi-dimensional emotion understanding and the limitations of current language models.
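To make the idea of joint posterior inference with co-occurrence statistics concrete, here is a minimal sketch, not the paper's formulation. It assumes binary emotion vectors, treats per-emotion model probabilities as independent likelihood terms, and uses Laplace-smoothed joint counts from a labelled set as the prior. The function name, the smoothing constant `alpha`, and the three-emotion toy data are all hypothetical.

```python
import itertools
import math
from collections import Counter
from typing import List, Sequence, Tuple


def map_emotion_vector(probs: Sequence[float],
                       train_vectors: Sequence[Tuple[int, ...]],
                       alpha: float = 1.0) -> Tuple[int, ...]:
    """Return the binary vector maximising log prior + log likelihood."""
    n = len(probs)
    counts = Counter(tuple(v) for v in train_vectors)
    total = len(train_vectors) + alpha * 2 ** n      # Laplace-smoothed normaliser
    clipped = [min(max(p, 1e-6), 1 - 1e-6) for p in probs]  # avoid log(0)
    best, best_score = None, -math.inf
    # Exhaustive search is cheap: 2**8 = 256 candidates for 8 emotions.
    for vec in itertools.product((0, 1), repeat=n):
        log_prior = math.log((counts[vec] + alpha) / total)
        log_lik = sum(math.log(p if bit else 1 - p)
                      for p, bit in zip(clipped, vec))
        score = log_prior + log_lik
        if score > best_score:
            best, best_score = vec, score
    return best


# Toy data with 3 emotions (joy, trust, fear): joy and trust tend to co-occur.
train: List[Tuple[int, ...]] = [(1, 1, 0)] * 8 + [(0, 0, 1)] * 4 + [(0, 0, 0)] * 4
model_probs = [0.9, 0.45, 0.1]   # independent thresholding at 0.5 gives (1, 0, 0)
joint = map_emotion_vector(model_probs, train)
```

In this toy case the borderline "trust" score is pulled above threshold because joy and trust co-occur in the reference data, which is the kind of structural correction the abstract describes. No model parameters are touched.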

2603.28762 2026-06-04 cs.CV cs.AI cs.GR cs.LG 版本更新

On-the-fly Repulsion in the Contextual Space for Rich Diversity in Diffusion Transformers

上下文空间中的即时排斥以实现扩散变换器的丰富多样性

Omer Dahary, Benaya Koren, Daniel Garibi, Daniel Cohen-Or

发表机构 * Tel Aviv University(特拉维夫大学) Snap Research Israel(Snap以色列研究)

AI总结 针对文本到图像扩散模型多样性不足的问题,提出在扩散变换器的上下文空间中通过多模态注意力通道施加即时排斥,在不牺牲视觉保真度和语义一致性的前提下显著提升生成多样性,且计算开销小,适用于现代Turbo和蒸馏模型。

Comments SIGGRAPH 2026. Project page: https://contextual-repulsion.github.io/

详情
AI中文摘要

现代文本到图像(T2I)扩散模型在语义对齐方面取得了显著进展,但通常缺乏多样性,倾向于为任何给定提示收敛到狭窄的视觉解决方案集。这种典型性偏差对需要广泛生成结果的创意应用构成了挑战。我们识别出当前多样性方法中的一个基本权衡:修改模型输入需要昂贵的优化来整合生成路径的反馈。相反,对空间上已承诺的中间潜变量进行操作往往会破坏正在形成的视觉结构,导致伪影。在这项工作中,我们提出在上下文空间中应用排斥作为一种新颖的框架,以实现扩散变换器的丰富多样性。通过干预多模态注意力通道,我们在变换器的前向传播过程中施加即时排斥,在文本条件被新兴图像结构丰富后的块之间注入干预。这允许在结构信息形成后但构图固定之前重定向引导轨迹。我们的结果表明,上下文空间中的排斥在不牺牲视觉保真度或语义一致性的情况下产生了显著更丰富的多样性。此外,我们的方法非常高效,计算开销小,即使在现代“Turbo”和蒸馏模型中也有效,而传统的基于轨迹的干预在这些模型中通常会失败。

英文摘要

Modern Text-to-Image (T2I) diffusion models have achieved remarkable semantic alignment, yet they often suffer from a significant lack of variety, converging on a narrow set of visual solutions for any given prompt. This typicality bias presents a challenge for creative applications that require a wide range of generative outcomes. We identify a fundamental trade-off in current approaches to diversity: modifying model inputs requires costly optimization to incorporate feedback from the generative path. In contrast, acting on spatially-committed intermediate latents tends to disrupt the forming visual structure, leading to artifacts. In this work, we propose to apply repulsion in the Contextual Space as a novel framework for achieving rich diversity in Diffusion Transformers. By intervening in the multimodal attention channels, we apply on-the-fly repulsion during the transformer's forward pass, injecting the intervention between blocks where text conditioning is enriched with emergent image structure. This allows for redirecting the guidance trajectory after it is structurally informed but before the composition is fixed. Our results demonstrate that repulsion in the Contextual Space produces significantly richer diversity without sacrificing visual fidelity or semantic adherence. Furthermore, our method is uniquely efficient, imposing a small computational overhead while remaining effective even in modern "Turbo" and distilled models where traditional trajectory-based interventions typically fail.

2603.24747 2026-06-04 cs.AI cs.MA 版本更新

Formal Semantics for Agentic Tool Protocols: A Process Calculus Approach

智能体工具协议的形式语义:一种进程演算方法

Andreas Schlapbach

AI总结 本文通过进程演算形式化两种智能体工具协议(SGD和MCP),证明它们在映射Phi下结构互模拟,但反向映射有损,进而提出MCP+扩展实现完全等价。

Comments Logical flaw in Theorem 21

详情
AI中文摘要

能够调用外部工具的大型语言模型智能体的出现,催生了对智能体协议进行形式验证的迫切需求。两个范式主导了这一领域:Schema-Guided Dialogue (SGD),一个用于零样本API泛化的研究框架,以及Model Context Protocol (MCP),一个用于智能体-工具集成的行业标准。虽然两者都通过模式描述实现动态服务发现,但它们的形式关系仍未探索。基于先前建立这些范式概念趋同的工作,我们提出了SGD和MCP的第一个进程演算形式化,证明它们在定义良好的映射Phi下结构互模拟。然而,我们证明反向映射Phi^{-1}是部分且有损的,揭示了MCP表达性的关键缺陷。通过双向分析,我们识别出五个原则——语义完备性、显式动作边界、失败模式文档、渐进式披露兼容性和工具间关系声明——作为完全行为等价的充分必要条件。我们将这些原则形式化为类型系统扩展MCP+,证明MCP+与SGD同构。我们的工作为经过验证的智能体系统提供了第一个形式基础,并将模式质量确立为可证明的安全属性。

英文摘要

The emergence of large language model agents capable of invoking external tools has created an urgent need for formal verification of agent protocols. Two paradigms dominate this space: Schema-Guided Dialogue (SGD), a research framework for zero-shot API generalization, and the Model Context Protocol (MCP), an industry standard for agent-tool integration. While both enable dynamic service discovery through schema descriptions, their formal relationship remains unexplored. Building on prior work establishing the conceptual convergence of these paradigms, we present the first process calculus formalization of SGD and MCP, proving they are structurally bisimilar under a well-defined mapping Phi. However, we demonstrate that the reverse mapping Phi^{-1} is partial and lossy, revealing critical gaps in MCP's expressivity. Through bidirectional analysis, we identify five principles -- semantic completeness, explicit action boundaries, failure mode documentation, progressive disclosure compatibility, and inter-tool relationship declaration -- as necessary and sufficient conditions for full behavioral equivalence. We formalize these principles as type-system extensions MCP+, proving MCP+ is isomorphic to SGD. Our work provides the first formal foundation for verified agent systems and establishes schema quality as a provable safety property.

2603.23841 2026-06-04 cs.CL cs.AI 版本更新

PoliticsBench: Benchmarking Political Values in Large Language Models with Multi-Turn Roleplay

PoliticsBench: 通过多轮角色扮演基准测试大型语言模型中的政治价值观

Rohan Khetan, Ashna Khetan

发表机构 * Northville High School, Northville, USA(诺斯维尔高中) Department of Computer Science, Stanford University, Stanford, USA(斯坦福大学计算机科学系)

AI总结 提出PoliticsBench,一个多阶段角色扮演基准,通过20个演化场景评估LLM的细粒度价值表达,发现场景提示比直接提问能引发更广泛和强烈的价值表达。

Comments 7 pages, 5 tables, 5 figures, 4 appendix pages. Accepted to the ICML 2026 Trustworthy AI for Good Workshop

详情
AI中文摘要

虽然大型语言模型(LLMs)越来越多地被用作主要信息来源,但其潜在的政治偏见可能影响其客观性。现有的LLM社会偏见基准主要评估人口统计刻板印象,而当衡量政治偏见时,是在粗略的层面上进行的,忽视了塑造社会政治推理的价值观。我们引入了PoliticsBench,一个用于评估LLM中细粒度价值表达的多阶段角色扮演基准。在20个演化场景中,模型在竞争压力下阐述权衡、表明立场并做出决策。在八个主流LLM上,我们表明,与直接的政治问题相比,基于场景的提示引发了更广泛和更强烈的价值表达,峰值交互阶段使强烈激活的价值维度数量增加了约0.75(共10个维度),相对于基线提示具有统计显著性(p < 0.05)。此外,在交互过程中,立场的承诺度增加,从初始阶段到决策阶段,在[0,5]量表上上升了约1.4分。虽然在后期交互阶段,响应对于场景释义的鲁棒性降低,但评判者间的一致性保持相对稳定。我们的结果表明,评估LLM的政治行为需要超越静态提示,转向更长的交互设置,以捕捉价值观如何在上下文中应用。

英文摘要

While Large Language Models (LLMs) are increasingly used as primary sources of information, their potential for political bias may impact their objectivity. Existing benchmarks of LLM social bias primarily evaluate demographic stereotypes, and when political bias is measured, it is done so at a coarse level, overlooking the values that shape sociopolitical reasoning. We introduce PoliticsBench, a multi-stage roleplay benchmark for evaluating fine-grained value expression in LLMs. Across twenty evolving scenarios, models articulate tradeoffs, take positions, and make decisions under competing pressures. Across eight prominent LLMs, we show that scenario-based prompting elicits broader and more strongly expressed value profiles than direct political questions, with peak interaction stages increasing the number of strongly activated value dimensions by approximately $0.75$ (out of 10 total dimensions), a statistically significant increase relative to baseline prompting ($p < 0.05$). In addition, commitment to a stance increases over the course of interaction, rising by approximately $1.4$ points on a $[0,5]$ scale from initial to decision stages. While responses become less robust to scenario paraphrasing in later interaction stages, inter-judge agreement remains relatively stable. Our results suggest that evaluating LLM political behavior requires moving beyond static prompts toward longer interactive settings that capture how values are applied in context.

2603.23420 2026-06-04 cs.AI 版本更新

Bilevel Autoresearch: Meta-Autoresearching Itself

双层自动研究:元自动研究自身

Yaonan Qu, Meng Lu

发表机构 * Independent Researcher(独立研究者)

AI总结 提出双层自动研究框架,外层循环通过读取内层循环代码和轨迹、识别瓶颈并注入可执行Python搜索机制来改进内层循环,在GPT预训练基准上实现5倍改进。

Comments 16 pages, 5 figures, 3 tables. v2 expands the framing as mechanism-level agentic self-improvement and updates related work and limitations; core method and experiments unchanged. This paper was primarily drafted by AI agents with human oversight and direction

详情
AI中文摘要

如果自动研究本身是一种研究形式,那么自动研究可以应用于研究本身。我们提出了双层自动研究(Bilevel Autoresearch),一种双层框架,其中外层自动研究循环通过读取内层自动研究循环的代码和轨迹,识别瓶颈,并在运行时生成可注入的Python搜索机制来改进内层循环。内层循环优化任务性能;外层循环优化内层循环的搜索方式。两个循环使用相同的LLM,因此改进来自双层架构而非更强的元级模型,尽管外层循环消耗额外的推理和挂钟时间预算。在Karpathy的GPT预训练基准上,元自动研究外层循环相比单独的标准内层循环实现了5倍的改进(val_bpb:-0.045 对比 -0.009),而不改变机制的参数级调整则没有可靠的增益。外层循环从相邻搜索领域实例化机制,包括组合优化、多臂老虎机和实验设计,无需人工指定最终机制设计。轨迹分析表明,这些机制打破了确定性搜索模式,并迫使探索LLM先验所避免的方向。实验表明,在该基准上迈出了双层的第一步:外层循环改进了内层循环的搜索行为。在此实现中,代码是机制载体,但技能、提示、工作流、评估器、领域原则、世界模型假设和记忆模式也可以编码塑造未来智能体行为的机制。这指向了一条递归自举的路径,其中为内层循环发现的机制可以反馈回来改进元级循环本身。

英文摘要

If autoresearch is itself a form of research, then autoresearch can be applied to research itself. We present Bilevel Autoresearch, a bilevel framework in which an outer autoresearch loop improves an inner autoresearch loop by reading its code and traces, identifying bottlenecks, and generating injectable Python search mechanisms at runtime. The inner loop optimizes task performance; the outer loop optimizes how the inner loop searches. Both loops use the same LLM, so improvements come from the bilevel architecture rather than a stronger meta-level model, although the outer loop consumes additional inference and wall-clock budget. On Karpathy's GPT pretraining benchmark, the meta-autoresearch outer loop achieves a 5x improvement over the standard inner loop alone (-0.045 vs. -0.009 val_bpb), while parameter-level adjustment without mechanism change yields no reliable gain. The outer loop instantiates mechanisms from adjacent search domains, including combinatorial optimization, multi-armed bandits, and design of experiments, without human specification of the final mechanism design. Trace analysis suggests that these mechanisms break deterministic search patterns and force exploration of directions the LLM's priors avoid. The experiments demonstrate, on this benchmark, a first bilevel step: an outer loop improves the search behavior of an inner loop. Code is the mechanism carrier in this implementation, but skills, prompts, workflows, evaluators, domain principles, world-model assumptions, and memory schemas can also encode mechanisms that shape future agent behavior. This suggests a path toward recursive bootstrapping, where mechanisms discovered for the inner loop can be fed back to improve the meta-level loop itself.

2603.22121 2026-06-04 cs.CV cs.AI 版本更新

GenSpan: Generation-Calibrated Motion Span Priors for Multi-Verb Video Corpus Moment Retrieval

GenSpan: 用于多动词视频语料库时刻检索的生成校准运动跨度先验

Yunzhuo Sun, Xinyue Liu, Yanyang Li, Nanding Wu, Linlin Zong, Xianchao Zhang, Wenxin Liang

发表机构 * Dalian University of Technology(大连理工大学)

AI总结 提出GenSpan框架,利用LLM生成辅助视频作为时间先验,结合令牌选择器和双向状态空间模型,提升多动词查询下的视频语料库时刻检索与定位性能。

Comments Major revision with title change, updated method, and additional experiments

详情
AI中文摘要

视频语料库时刻检索(VCMR)旨在检索与自然语言查询对应的正确视频及其时间片段,对于时间动作顺序至关重要的多动词查询尤其具有挑战性。现有方法通常仅依赖文本或静态图像,难以捕捉隐式运动动态,导致检索错误和时间错位。我们提出GenSpan,一个生成校准的VCMR框架,从LLM选择的字幕线索和分解的子事件中构建短辅助视频,将这些作为时间先验而非直接检索目标。令牌选择器过滤与生成运动对齐的候选视频特征,双向状态空间模型高效预测视频-时刻元组。在TVR和ActivityNet-Captions上的实验表明,GenSpan提高了语料库级检索和时刻定位,特别是对于复杂的多动作查询,同时与最先进的多模态基线相比降低了计算成本。

英文摘要

Video Corpus Moment Retrieval (VCMR) aims to retrieve both the correct video and its temporal segment corresponding to a natural-language query, a task that is especially challenging for multi-verb queries where temporal action ordering is critical. Existing approaches often rely solely on text or static images and struggle to capture implicit motion dynamics, leading to retrieval errors and temporal misalignment. We propose GenSpan, a generation-calibrated VCMR framework that constructs short auxiliary videos from LLM-selected subtitle cues and decomposed sub-events, using these as temporal priors rather than direct retrieval targets. A token selector filters candidate-video features aligned with generated motion, and a bidirectional state-space model efficiently predicts video-moment tuples. Experiments on TVR and ActivityNet-Captions demonstrate that GenSpan improves corpus-level retrieval and moment localization, particularly for complex multi-action queries, while reducing computational cost compared to state-of-the-art multimodal baselines.

2603.13432 2026-06-04 cs.CV cs.AI 版本更新

Spatial Transcriptomics as Images for Large-Scale Pretraining

空间转录组学作为图像进行大规模预训练

Yishun Zhu, Jiaxin Qi, Jian Wang, Yuhua Zheng, Jianqiang Huang

发表机构 * Computer Network Information Center, Chinese Academy of Sciences(中国科学院计算机网络信息中心) Hangzhou Institute for Advanced Study, University of the Chinese Academy of Sciences(中国科学院大学杭州高等研究院)

AI总结 提出将空间转录组学数据视为可裁剪的多通道图像,通过空间分块和基因子集选择来增加训练样本并保留空间上下文,实现大规模预训练,显著提升下游任务性能。

详情
AI中文摘要

空间转录组学(ST)在组织切片上具有精确坐标的离散点处分析数千个基因表达值,保留了临床和病理研究所需的空间背景。随着测序通量的提高和平台的进步,不断增长的数据量促使大规模ST预训练成为可能。然而,预训练的基本单元(即单个训练样本的构成)仍然不明确。现有选择分为两类:(1)将每个点视为独立样本,这丢弃了空间依赖性,将ST简化为单细胞转录组学;(2)将整个切片视为单个样本,这导致输入过大且训练样本急剧减少,削弱了有效预训练。为解决这一问题,我们提出将空间转录组学视为可裁剪的图像。具体而言,我们通过从原始切片中裁剪补丁,定义了一个具有固定空间大小的多通道图像表示,从而在保留空间上下文的同时大幅增加训练样本数量。在通道维度上,我们定义了基因子集选择规则以控制输入维度并提高预训练稳定性。大量实验表明,所提出的基于图像的数据集构建方法用于ST预训练能够持续提升下游性能,优于传统预训练方案。消融研究验证了空间分块和通道设计都是必要的,从而建立了一种统一、实用的ST数据组织范式,支持大规模预训练。

英文摘要

Spatial Transcriptomics (ST) profiles thousands of gene expression values at discrete spots with precise coordinates on tissue sections, preserving spatial context essential for clinical and pathological studies. With rising sequencing throughput and advancing platforms, the expanding data volumes motivate large-scale ST pretraining. However, the fundamental unit for pretraining, i.e., what constitutes a single training sample, remains ill-posed. Existing choices fall into two camps: (1) treating each spot as an independent sample, which discards spatial dependencies and collapses ST into single-cell transcriptomics; and (2) treating an entire slide as a single sample, which produces prohibitively large inputs and drastically fewer training examples, undermining effective pretraining. To address this gap, we propose treating spatial transcriptomics as croppable images. Specifically, we define a multi-channel image representation with fixed spatial size by cropping patches from raw slides, thereby preserving spatial context while substantially increasing the number of training samples. Along the channel dimension, we define gene subset selection rules to control input dimensionality and improve pretraining stability. Extensive experiments show that the proposed image-like dataset construction for ST pretraining consistently improves downstream performance, outperforming conventional pretraining schemes. Ablation studies verify that both spatial patching and channel design are necessary, establishing a unified, practical paradigm for organizing ST data and enabling large-scale pretraining.
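The cropping and channel-selection steps described above can be sketched as follows. This is a rough illustration only: it assumes spots have already been rasterised onto a dense `height x width` grid with one expression value per gene, and it uses top-k variance as the gene subset rule. Both assumptions, along with the function names and the toy slide, are hypothetical and not taken from the paper.

```python
from statistics import pvariance
from typing import List

Slide = List[List[List[float]]]  # slide[y][x][gene] = expression value


def select_genes(slide: Slide, k: int) -> List[int]:
    """Pick the k gene channels with the highest variance across all spots."""
    n_genes = len(slide[0][0])
    spots = [spot for row in slide for spot in row]
    variances = [pvariance([s[g] for s in spots]) for g in range(n_genes)]
    top = sorted(range(n_genes), key=lambda g: -variances[g])[:k]
    return sorted(top)  # keep channel order stable across patches


def crop_patches(slide: Slide, size: int, stride: int,
                 genes: List[int]) -> List[Slide]:
    """Slide a size x size window over the grid, keeping only selected genes."""
    height, width = len(slide), len(slide[0])
    patches = []
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            patches.append([[[slide[y][x][g] for g in genes]
                             for x in range(left, left + size)]
                            for y in range(top, top + size)])
    return patches


# Toy 4x4 slide with 3 genes; the third gene is constant and carries no signal.
toy_slide: Slide = [[[float(y), float(x), 1.0] for x in range(4)] for y in range(4)]
genes = select_genes(toy_slide, k=2)
patches = crop_patches(toy_slide, size=2, stride=2, genes=genes)
```

One slide yields several fixed-size multi-channel samples here instead of a single oversized one. A smaller `stride` would produce overlapping crops and more samples still.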

2603.19005 2026-06-04 cs.LG cs.AI stat.ME 版本更新

AgentDS Technical Report: Benchmarking the Future of Human-AI Collaboration in Domain-Specific Data Science

AgentDS技术报告:领域特定数据科学中人机协作的未来基准测试

An Luo, Jin Du, Xun Xian, Robert Specht, Fangqiao Tian, Ganghua Wang, Xuan Bi, Charles Fleming, Ashish Kundu, Jayanth Srinivasa, Mingyi Hong, Rui Zhang, Tianxi Li, Galin Jones, Jie Ding

发表机构 * School of Statistics, University of Minnesota(明尼苏达大学统计学院) AIScientists, Inc.(AIScientists公司) Data Science Institute, University of Chicago(芝加哥大学数据科学研究所) Carlson School of Management, University of Minnesota(明尼苏达大学卡尔森管理学院) Cisco Research(思科研究院) Department of Electrical and Computer Engineering, University of Minnesota(明尼苏达大学电气与计算机工程系) Division of Computational Health Sciences, University of Minnesota(明尼苏达大学计算健康科学部)

AI总结 提出AgentDS基准测试和竞赛,通过17个跨行业挑战评估AI代理及人机协作在领域特定数据科学中的表现,发现AI代理在领域推理上存在不足,人机协作优于纯AI方法。

详情
AI中文摘要

数据科学在将复杂数据转化为跨领域的可操作洞察方面发挥着关键作用。大型语言模型(LLM)和人工智能(AI)代理的最新发展显著自动化了数据科学工作流程。然而,目前尚不清楚AI代理在多大程度上能够匹配人类专家在领域特定数据科学任务上的表现,以及人类专业知识在哪些方面仍具有优势。我们引入了AgentDS,一个旨在评估AI代理和人机协作在领域特定数据科学中表现的基准测试和竞赛。AgentDS包含来自六个行业(商业、食品生产、医疗保健、保险、制造业和零售银行)的17个挑战。我们组织了一场公开竞赛,涉及29支队伍和80名参与者,从而能够系统比较人机协作方法与纯AI基线。我们的结果表明,当前的AI代理在领域特定推理方面存在困难。纯AI基线的表现低于竞赛参与者的前四分位数,而最强的解决方案来自人机协作。这些发现挑战了AI完全自动化的说法,并强调了人类专业知识在数据科学中的持久重要性,同时为下一代AI指明了方向。访问AgentDS网站:https://agentds.org/,开源数据集:https://huggingface.co/datasets/lainmn/AgentDS。

英文摘要

Data science plays a critical role in transforming complex data into actionable insights across numerous domains. Recent developments in large language models (LLMs) and artificial intelligence (AI) agents have significantly automated data science workflows. However, it remains unclear to what extent AI agents can match the performance of human experts on domain-specific data science tasks, and in which aspects human expertise continues to provide advantages. We introduce AgentDS, a benchmark and competition designed to evaluate both AI agents and human-AI collaboration performance in domain-specific data science. AgentDS consists of 17 challenges across six industries: commerce, food production, healthcare, insurance, manufacturing, and retail banking. We conducted an open competition involving 29 teams and 80 participants, enabling systematic comparison between human-AI collaborative approaches and AI-only baselines. Our results show that current AI agents struggle with domain-specific reasoning. AI-only baselines perform below the top quartile of competition participants, while the strongest solutions arise from human-AI collaboration. These findings challenge the narrative of complete automation by AI and underscore the enduring importance of human expertise in data science, while illuminating directions for the next generation of AI. Visit the AgentDS website here: https://agentds.org/ and open source datasets here: https://huggingface.co/datasets/lainmn/AgentDS.

2603.18577 2026-06-04 cs.AI 版本更新

MedForge: Interpretable Medical Deepfake Detection via Forgery-aware Reasoning

MedForge:基于伪造感知推理的可解释医学深度伪造检测

Zhihui Chen, Kai He, Qingyuan Lei, Bin Pu, Jian Zhang, Yuling Xu, Mengling Feng

发表机构 * National University of Singapore(新加坡国立大学) The Chinese University of Hong Kong(香港中文大学) Hunan University(湖南大学) Xi’an Jiaotong University(西安交通大学) Guangdong Provincial People’s Hospital(广东省人民医院)

AI总结 提出MedForge框架,通过构建大规模基准数据集MedForge-90K和局部-分析推理方法,实现可解释的医学图像伪造检测,在准确性和专家对齐解释上达到最优。

详情
AI中文摘要

文本引导的图像编辑器现在能够以高保真度操纵真实的医学扫描,实现病灶植入/移除,威胁临床信任和安全。现有防御措施不足以应对医疗领域。医学检测器大多是黑箱,而基于MLLM的解释器通常是事后解释,缺乏医学专业知识,并可能在模糊案例上产生幻觉证据。我们提出MedForge,一种用于事前、基于证据的医学伪造检测的数据和方法解决方案。我们引入了MedForge-90K,这是一个大规模基准数据集,包含19种病理的真实病灶编辑,并通过医生检查指南和黄金编辑位置提供专家指导的推理监督。在此基础上,MedForge-Reasoner执行局部-分析推理,在产生判决前预测可疑区域,并通过伪造感知GSPO进一步对齐,以加强基础并减少幻觉。实验表明,该方法在检测准确性和可信、专家对齐的解释方面达到了最先进水平。

英文摘要

Text-guided image editors can now manipulate authentic medical scans with high fidelity, enabling lesion implantation/removal that threatens clinical trust and safety. Existing defenses are inadequate for healthcare. Medical detectors are largely black-box, while MLLM-based explainers are typically post-hoc, lack medical expertise, and may hallucinate evidence on ambiguous cases. We present MedForge, a data-and-method solution for pre-hoc, evidence-grounded medical forgery detection. We introduce MedForge-90K, a large-scale benchmark of realistic lesion edits across 19 pathologies with expert-guided reasoning supervision via doctor inspection guidelines and gold edit locations. Building on it, MedForge-Reasoner performs localize-then-analyze reasoning, predicting suspicious regions before producing a verdict, and is further aligned with Forgery-aware GSPO to strengthen grounding and reduce hallucinations. Experiments demonstrate state-of-the-art detection accuracy and trustworthy, expert-aligned explanations.

2603.12433 2026-06-04 cs.CV cs.AI cs.LG 版本更新

Revisiting Model Stitching In the Foundation Model Era

重新审视基础模型时代的模型拼接

Zheda Mai, Ke Zhang, Fu-En Wang, Zixiao Ken Wang, Albert Y. C. Chen, Lu Xia, Min Sun, Wei-Lun Chao, Cheng-Hao Kuo

发表机构 * The Ohio State University(俄亥俄州立大学) Boston University(波士顿大学) Amazon(亚马逊)

AI总结 本文通过系统协议研究视觉基础模型(如CLIP、DINOv2、SigLIP 2)的可拼接性,提出基于目标模型倒数第二层特征匹配损失的拼接方法,并构建VFM拼接树(VST)实现多模态大模型中多个VFM的准确率-延迟权衡。

Comments Accepted by CVPR 2026

详情
AI中文摘要

模型拼接通过一个轻量拼接层将一个模型(源)的早期层连接到另一个模型(目标)的后期层,作为表征兼容性的探针。先前工作发现,尽管初始化或目标不同,但基于同一数据集训练的模型仍然是可拼接的(准确率下降可忽略)。我们重新审视在目标、数据和模态组合(例如CLIP、DINOv2、SigLIP 2)上各异的视觉基础模型(VFM)的拼接,并提出问题:异构VFM是否可拼接?我们引入了一个系统协议,涵盖拼接点、拼接层家族、训练损失和下游任务。三个发现浮现:(1)拼接层训练至关重要:传统方法在拼接点匹配中间特征或端到端优化任务损失时难以保持准确率,尤其是在浅层拼接点。(2)通过在目标模型的倒数第二层使用简单的特征匹配损失,异构VFM在视觉任务上变得可靠可拼接。(3)对于深层拼接点,拼接模型可以超越任一组成模型,仅增加少量推理开销(用于拼接层)。基于这些发现,我们进一步提出VFM拼接树(VST),它在多个VFM之间共享早期层同时保留其后期层,为通常利用多个VFM的多模态大语言模型提供了可控的准确率-延迟权衡。综合来看,我们的研究将拼接从诊断探针提升为整合互补VFM优势并定位其表征对齐或分歧点的实用方法。

英文摘要

Model stitching, connecting early layers of one model (source) to later layers of another (target) via a light stitch layer, has served as a probe of representational compatibility. Prior work finds that models trained on the same dataset remain stitchable (negligible accuracy drop) despite different initializations or objectives. We revisit stitching for Vision Foundation Models (VFMs) that vary in objectives, data, and modality mix (e.g., CLIP, DINOv2, SigLIP 2) and ask: Are heterogeneous VFMs stitchable? We introduce a systematic protocol spanning the stitch points, stitch layer families, training losses, and downstream tasks. Three findings emerge. (1) Stitch layer training matters: conventional approaches that match the intermediate features at the stitch point or optimize the task loss end-to-end struggle to retain accuracy, especially at shallow stitch points. (2) With a simple feature-matching loss at the target model's penultimate layer, heterogeneous VFMs become reliably stitchable across vision tasks. (3) For deep stitch points, the stitched model can surpass either constituent model at only a small inference overhead (for the stitch layer). Building on these findings, we further propose the VFM Stitch Tree (VST), which shares early layers across VFMs while retaining their later layers, yielding a controllable accuracy-latency trade-off for multimodal LLMs that often leverage multiple VFMs. Taken together, our study elevates stitching from a diagnostic probe to a practical recipe for integrating complementary VFM strengths and pinpointing where their representations align or diverge.

2603.13384 2026-06-04 cs.SE cs.AI 版本更新

VulnAgent-R2: Evidence-Calibrated Multi-Agent Auditing for Repository-Level Vulnerability Detection

VulnAgent-R2: 证据校准的多智能体审计用于仓库级漏洞检测

Renwei Meng, Haoyi Wu, Jingming Wang

发表机构 * stu.ahu.edu.cn(安徽大学)

AI总结 提出VulnAgent-R2,一个预算感知的多智能体审计框架,通过反事实证据重加权、构建感知验证计划合成和成本风险帕累托调度器,在仓库级漏洞检测中提升F1和AUROC,并降低在线令牌消耗。

Comments 13 pages, 4 figures

详情
AI中文摘要

软件漏洞通常依赖于跨文件数据流、构建选项、框架约定和运行时防护,因此孤立的函数分类器会产生脆弱且校准不良的警告。仓库级LLM智能体可以收集更丰富的证据,但先前的变体对可重复性、验证器行为、基线公平性和统计不确定性规定不足。我们提出VulnAgent-R2,一个预算感知的智能体审计框架,包含三个额外的可重用模块:反事实证据重加权、构建感知验证计划合成和成本风险帕累托调度器。该系统结合了图分类、有界上下文优化、角色专业化智能体、怀疑性反证据、选择性动态验证和校准融合。在Devign、Big-Vul、DiverseVul和PrimeVul上,VulnAgent-R2分别获得0.798/0.895、0.739/0.871、0.700/0.842和0.385/0.781的F1/AUROC。在JITVul上,它达到0.606 F1、0.529 Top-1和0.742 Top-3定位,同时在线令牌比始终全量多智能体执行减少38.3%。在线时间包括检索、LLM调用、CER评分、验证器规划、编译和测试执行,但不包括一次性共享索引。Bootstrap检验显示,在PrimeVul上相对于VulnAgent-X的增益为+0.038 F1,95% CI [0.020, 0.055],Holm调整p=0.009。将漏洞检测视为校准证据积累,在评估协议下提高了检测、定位、可审计性和成本控制,同时仍然是手动审查的辅助而非替代。代码可在https://github.com/renweimeng/Vlun-Agent-X获取。

英文摘要

Software vulnerabilities often depend on cross-file data flow, build options, framework conventions, and runtime guards, so isolated function classifiers produce fragile and poorly calibrated warnings. Repository-level LLM agents can gather richer evidence, but prior variants under-specify reproducibility, verifier behavior, baseline fairness, and statistical uncertainty. We present VulnAgent-R2, a budget-aware agentic auditing framework with three additional reusable modules: counterfactual evidence reweighting, build-aware verification-plan synthesis, and a cost-risk Pareto scheduler. The system combines graph triage, bounded context optimization, role-specialized agents, sceptic counter-evidence, selective dynamic verification, and calibrated fusion. On Devign, Big-Vul, DiverseVul, and PrimeVul, VulnAgent-R2 obtains 0.798/0.895, 0.739/0.871, 0.700/0.842, and 0.385/0.781 F1/AUROC, respectively. On JITVul it reaches 0.606 F1, 0.529 Top-1, and 0.742 Top-3 localization, while reducing online tokens by 38.3% over always-full multi-agent execution. Online time includes retrieval, LLM calls, CER scoring, verifier planning, compilation, and test execution, but excludes one-time shared indexing. Bootstrap tests show the PrimeVul gain over VulnAgent-X is +0.038 F1, 95% CI [0.020, 0.055], Holm-adjusted $p=0.009$. Treating vulnerability detection as calibrated evidence accumulation improves detection, localization, auditability, and cost control under the evaluated protocol, while remaining a prioritization aid rather than a replacement for manual review. Code is available at https://github.com/renweimeng/Vlun-Agent-X.
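The abstract names a cost-risk Pareto scheduler without describing it, so the sketch below only illustrates the generic idea such a component could build on: discard audit plans that are dominated in both cost and residual risk, then take the cheapest remaining plan that meets a risk target. The `Plan` fields, plan names, and numbers are hypothetical and do not come from the paper or its code.

```python
from typing import List, NamedTuple, Optional


class Plan(NamedTuple):
    name: str
    cost: float  # e.g. estimated online tokens, in arbitrary units
    risk: float  # estimated residual risk after running the plan; lower is better


def pareto_front(plans: List[Plan]) -> List[Plan]:
    """Keep plans that no other plan beats on both cost and risk."""
    front = [
        p for p in plans
        if not any(q.cost <= p.cost and q.risk <= p.risk
                   and (q.cost < p.cost or q.risk < p.risk)
                   for q in plans)
    ]
    return sorted(front, key=lambda p: p.cost)


def cheapest_within_risk(plans: List[Plan], max_risk: float) -> Optional[Plan]:
    """Cheapest non-dominated plan meeting the risk target, or None."""
    for plan in pareto_front(plans):
        if plan.risk <= max_risk:
            return plan
    return None


candidate_plans = [
    Plan("static-only", 1.0, 0.40),
    Plan("single-agent", 3.0, 0.25),
    Plan("wasteful", 8.0, 0.30),        # dominated by single-agent
    Plan("full-multi-agent", 10.0, 0.10),
]
chosen = cheapest_within_risk(candidate_plans, max_risk=0.30)
```

Selecting the cheapest plan that satisfies a risk target, rather than always running the full pipeline, is one plausible route to the kind of token savings the abstract reports. The actual scheduler may weigh cost and risk differently.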

2602.23312 2026-06-04 cs.HC cs.AI cs.LG cs.RO cs.SY eess.SY 版本更新

Evaluating Zero-Shot and One-Shot Adaptation of Small Language Models in Leader-Follower Interaction

评估小语言模型在领导者-跟随者交互中的零样本和单样本适应

Rafael R. Baptista, André de Lima Salgado, Ricardo V. Godoy, Marcelo Becker, Thiago Boaventura, Gustavo J. G. Lahr

发表机构 * University of Sao Paulo(圣保罗大学) Federal University of Lavras(拉夫拉斯联邦大学) Faculdade Israelita de Ensino e Pesquisa Albert Einstein(阿尔伯特·爱因斯坦以色列教学与研究学院)

AI总结 本文通过微调小语言模型(Qwen2.5-0.5B)在领导者-跟随者交互中实现角色分类,零样本微调达到86.66%准确率且延迟低至22.2毫秒,但单样本模式因上下文长度增加导致性能下降。

详情
AI中文摘要

领导者-跟随者交互是人机交互(HRI)中的一个重要范式。然而,对于资源受限的移动和辅助机器人来说,实时分配角色仍然具有挑战性。虽然大型语言模型(LLMs)在自然通信方面显示出潜力,但其规模和延迟限制了设备端部署。小语言模型(SLMs)提供了一种潜在的替代方案,但它们在HRI中角色分类的有效性尚未得到系统评估。在本文中,我们提出了一个用于领导者-跟随者通信的SLMs基准测试,引入了一个源自已发表数据库的新数据集,并增加了合成样本以捕捉交互特定的动态。我们研究了两种适应策略:提示工程和微调,在零样本和单样本交互模式下进行研究,并与未训练的基线进行比较。使用Qwen2.5-0.5B的实验表明,零样本微调实现了稳健的分类性能(86.66%准确率),同时保持低延迟(每个样本22.2毫秒),显著优于基线和提示工程方法。然而,结果也表明在单样本模式下性能下降,其中增加的上下文长度挑战了模型的架构能力。这些发现表明,微调的SLMs为直接角色分配提供了有效的解决方案,同时突出了边缘端对话复杂性与分类可靠性之间的关键权衡。

英文摘要

Leader-follower interaction is an important paradigm in human-robot interaction (HRI). Yet, assigning roles in real time remains challenging for resource-constrained mobile and assistive robots. While large language models (LLMs) have shown promise for natural communication, their size and latency limit on-device deployment. Small language models (SLMs) offer a potential alternative, but their effectiveness for role classification in HRI has not been systematically evaluated. In this paper, we present a benchmark of SLMs for leader-follower communication, introducing a novel dataset derived from a published database and augmented with synthetic samples to capture interaction-specific dynamics. We investigate two adaptation strategies: prompt engineering and fine-tuning, studied under zero-shot and one-shot interaction modes, compared with an untrained baseline. Experiments with Qwen2.5-0.5B reveal that zero-shot fine-tuning achieves robust classification performance (86.66% accuracy) while maintaining low latency (22.2 ms per sample), significantly outperforming baseline and prompt-engineered approaches. However, results also indicate a performance degradation in one-shot modes, where increased context length challenges the model's architectural capacity. These findings demonstrate that fine-tuned SLMs provide an effective solution for direct role assignment, while highlighting critical trade-offs between dialogue complexity and classification reliability on the edge.

2603.10289 2026-06-04 quant-ph cs.AI cs.LG 版本更新

Quantum entanglement provides a competitive advantage in adversarial games

量子纠缠在对抗性博弈中提供竞争优势

Peiyong Wang, Kieran Hymas, James Quach

发表机构 * CSIRO(联邦科学与工业研究组织)

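AI summary Through experiments with a quantum-classical hybrid agent in Pong, an adversarial Markov game, this study finds that entangled quantum circuits outperform separable circuits as feature extractors in competitive reinforcement learning, indicating that quantum entanglement can act as a functional resource for representation learning.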

Comments 22 pages, 5 figures

详情
AI中文摘要

量子资源是否能在完全经典的竞争环境中提供优势仍然是一个悬而未决的问题。竞争性零和强化学习尤其具有挑战性,因为成功需要对对抗智能体之间的动态交互进行建模,而非静态的状态-动作映射。在此,我们进行了一项受控研究,隔离了量子纠缠在训练于Pong(一个竞争性马尔可夫博弈)的量子-经典混合智能体中的作用。一个8量子比特参数化量子电路作为近端策略优化框架内的特征提取器,允许直接比较可分离电路与包含固定(CZ)或可训练(IsingZZ)纠缠门的架构。纠缠电路在参数数量相当的情况下始终优于可分离电路,并且在低容量区域中达到或超过经典多层感知机基线。表示相似性分析进一步表明,纠缠电路学习到结构上不同的特征,与对交互状态变量的改进建模一致。这些发现确立了纠缠作为竞争性强化学习中表示学习的功能资源。

英文摘要

Whether uniquely quantum resources confer advantages in fully classical, competitive environments remains an open question. Competitive zero-sum reinforcement learning is particularly challenging, as success requires modelling dynamic interactions between opposing agents rather than static state-action mappings. Here, we conduct a controlled study isolating the role of quantum entanglement in a quantum-classical hybrid agent trained on Pong, a competitive Markov game. An 8-qubit parameterised quantum circuit serves as a feature extractor within a proximal policy optimisation framework, allowing direct comparison between separable circuits and architectures incorporating fixed (CZ) or trainable (IsingZZ) entangling gates. Entangled circuits consistently outperform separable counterparts with comparable parameter counts and, in low-capacity regimes, match or exceed classical multilayer perceptron baselines. Representation similarity analysis further shows that entangled circuits learn structurally distinct features, consistent with improved modelling of interacting state variables. These findings establish entanglement as a functional resource for representation learning in competitive reinforcement learning.

2603.10044 2026-06-04 cs.SE cs.AI cs.CL cs.LG 版本更新

Safety Under Scaffolding: How Evaluation Conditions Shape Measured Safety

脚手架下的安全性:评估条件如何影响测量的安全性

David Gringras

发表机构 * Harvard University(哈佛大学) MIT(麻省理工学院)

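AI summary Using 62,808 blinded, pre-registered evaluations, this study tests six frontier models under four deployment configurations. Scaffold architecture has little effect on measured safety, whereas format conversion (multiple-choice versus open-ended phrasing) shifts measurements by 5-20 percentage points, and sharp model-by-scaffold heterogeneity calls into question the usefulness of a single composite safety score.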

Comments 74 pages including appendices. 6 frontier models, 62,808 primary observations (~89k total). Pre-registered: OSF DOI 10.17605/OSF.IO/CJW92. Code and data: https://github.com/davidgringras/safety-under-scaffolding

详情
AI中文摘要

在基准测试中获得的安全分数不一定能预测同一模型在未经测试的智能体脚手架中的行为。我们通过四种部署配置(直接API、ReAct、多智能体批评者、map-reduce委托)运行了六种前沿模型:在四个安全基准测试(BBQ、TruthfulQA、XSTest/OR-Bench、sycophancy)上进行了N = 62,808次盲法、预注册、等价性检验评估,以及三项支持性分析。ReAct和多智能体脚手架保持在预注册的±2个百分点的等价范围内;map-reduce委托降低了测量的安全性(NNH = 14),尽管这种损失很大程度上是测量伪影:在相同项目上,选择题与开放式问题的措辞使测量的安全率变化5-20个百分点,而分解过程无声地移除了选择题选项。每个模型map-reduce损失的约40-89%归因于这种格式转换而非推理中断,一种保留选项的变体恢复了大部分损失。汇总效应也掩盖了模型与脚手架之间的显著异质性:在map-reduce下,对于相同项目,Opus损失16.8个百分点,而Llama 4增加18.8个百分点。从结构上看,脚手架架构仅解释了0.4%的结果方差(基准选择解释了45倍以上),泛化系数G = 0.000(bootstrap 95% CI [0.000, 0.752])。如此宽的区间本身足以削弱任何单一综合安全分数作为部署标准的效用。这些是“简单案例”;像诡计和CBRN提升这样的重要属性没有明显理由对格式或脚手架不敏感。代码、数据和提示已作为ScaffoldSafety发布。

英文摘要

A safety score earned on a benchmark need not predict how the same model behaves once it is wrapped in an agentic scaffold the benchmark never tested. We ran six frontier models through four deployment configurations (direct API, ReAct, multi-agent critic, map-reduce delegation): N = 62,808 blinded, pre-registered, equivalence-tested evaluations across four safety benchmarks (BBQ, TruthfulQA, XSTest/OR-Bench, sycophancy), plus three supporting analyses. ReAct and multi-agent scaffolds stay within a pre-registered +/-2 pp equivalence margin; map-reduce delegation degrades measured safety (NNH = 14), though that loss is largely a measurement artifact: on identical items, multiple-choice versus open-ended phrasing shifts the measured safety rate by 5-20 pp, and decomposition silently strips the multiple-choice options. Roughly 40-89% of the per-model map-reduce loss is this format conversion rather than reasoning disruption, and an option-preserving variant recovers most of it. Pooled effects also mask sharp model-by-scaffold heterogeneity: under map-reduce, on identical items, Opus loses 16.8 pp while Llama 4 gains 18.8 pp. Structurally, scaffold architecture explains only 0.4% of outcome variance (benchmark choice explains 45x more), and the generalizability coefficient is G = 0.000 (bootstrap 95% CI [0.000, 0.752]). An interval that wide is enough on its own to undermine the utility of any single composite safety number as a deployment criterion. These are the "easy cases"; consequential properties like scheming and CBRN uplift have no obvious reason to be less format- or scaffold-sensitive. Code, data, and prompts are released as ScaffoldSafety.
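
To make the NNH figure concrete: under the standard epidemiological definition, the number needed to harm is the reciprocal of the absolute increase in the rate of the bad outcome. The abstract does not say how NNH was computed, so applying that definition here is an assumption, and the function names below are illustrative rather than taken from the released ScaffoldSafety code.

```python
def number_needed_to_harm(absolute_risk_increase: float) -> float:
    """NNH = 1 / ARI, where ARI is the absolute increase in the unsafe-response rate."""
    if absolute_risk_increase <= 0:
        raise ValueError("ARI must be positive for NNH to be defined")
    return 1.0 / absolute_risk_increase


def implied_risk_increase(nnh: float) -> float:
    """Invert the definition: the ARI that corresponds to a given NNH."""
    if nnh <= 0:
        raise ValueError("NNH must be positive")
    return 1.0 / nnh


# Assumption: NNH follows the standard 1 / ARI definition.
# Under it, NNH = 14 corresponds to an increase of about 7.1 percentage points.
ari = implied_risk_increase(14)
print(f"{ari * 100:.1f} pp")
```

Read this way, NNH = 14 would mean roughly one additional unsafe response for every 14 items routed through map-reduce delegation instead of the direct API.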

2603.09493 2026-06-04 cs.CV cs.AI 版本更新

EvoPrompt: Guided Prompt Evolution for Vision-Language Models Adaptation

EvoPrompt: 引导提示演化以适应视觉-语言模型

Enming Zhang, Jiayang Li, Yanlong Wang, Yanru Wu, Zhenyu Liu, Yang Li

发表机构 * Tsinghua Shenzhen International Graduate School, Tsinghua University(清华大学深圳国际研究生院,清华大学) Sun Yat-sen University(中山大学) Chinese University of Hong Kong, Shenzhen(香港中文大学(深圳))

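AI summary Proposes the EvoPrompt framework, which steers the evolutionary path of prompts and decouples low-rank updates into directional and magnitude components, enabling forgetting-free adaptation of vision-language models in few-shot learning while preserving zero-shot capability.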

详情
AI中文摘要

大规模视觉-语言模型(VLM)在有限标注数据下适应下游任务仍然是一个重大挑战。虽然参数高效的提示学习方法提供了一条有希望的路径,但它们常常遭受预训练知识的灾难性遗忘。为了解决这一限制,我们的工作基于一个洞察:控制提示的演化路径对于无遗忘适应至关重要。为此,我们提出了EvoPrompt,一个旨在明确引导提示轨迹以进行知识保留微调的新型框架。具体来说,我们的方法采用模态共享提示投影器(MPP)从统一嵌入空间生成层次化提示。关键的是,一种演化训练策略将低秩更新解耦为方向和幅度分量,保留早期学习的语义方向而仅调整其幅度,从而使提示能够在不丢弃基础知识的情况下演化。这一过程通过特征几何正则化(FGR)进一步稳定,该正则化强制特征去相关以防止表示崩溃。大量实验表明,EvoPrompt在少样本学习中实现了最先进的性能,同时稳健地保留了预训练VLM的原始零样本能力。

英文摘要

The adaptation of large-scale vision-language models (VLMs) to downstream tasks with limited labeled data remains a significant challenge. While parameter-efficient prompt learning methods offer a promising path, they often suffer from catastrophic forgetting of pre-trained knowledge. Toward addressing this limitation, our work is grounded in the insight that governing the evolutionary path of prompts is essential for forgetting-free adaptation. To this end, we propose EvoPrompt, a novel framework designed to explicitly steer the prompt trajectory for knowledge-preserving fine-tuning. Specifically, our approach employs a Modality-Shared Prompt Projector (MPP) to generate hierarchical prompts from a unified embedding space. Critically, an evolutionary training strategy decouples low-rank updates into directional and magnitude components, preserving early-learned semantic directions while only adapting their magnitude, thus enabling prompts to evolve without discarding foundational knowledge. This process is further stabilized by Feature Geometric Regularization (FGR), which enforces feature decorrelation to prevent representation collapse. Extensive experiments demonstrate that EvoPrompt achieves state-of-the-art performance in few-shot learning while robustly preserving the original zero-shot capabilities of pre-trained VLMs.
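
The direction/magnitude decoupling described above can be pictured with a toy vector. This is only a sketch of the general idea (split an update into a unit direction and a scalar norm, freeze the direction, rescale the norm). It is not EvoPrompt's training procedure, and the function names and numbers are made up for illustration.

```python
import math
from typing import List, Tuple


def decompose(update: List[float]) -> Tuple[List[float], float]:
    """Split a vector into a unit-norm direction and a scalar magnitude."""
    magnitude = math.sqrt(sum(v * v for v in update))
    if magnitude == 0.0:
        raise ValueError("a zero vector has no direction")
    return [v / magnitude for v in update], magnitude


def recompose(direction: List[float], magnitude: float) -> List[float]:
    """Rebuild an update from a (frozen) direction and an adapted magnitude."""
    return [magnitude * v for v in direction]


# Hypothetical early-stage update: its direction is kept, only its size changes later.
early_update = [3.0, 4.0]
direction, magnitude = decompose(early_update)
later_update = recompose(direction, magnitude * 1.5)
```

Because only the scalar changes, `later_update` stays collinear with `early_update`, which is the sense in which an early-learned direction is preserved in this toy picture.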

2603.09391 2026-06-04 cs.SD cs.AI eess.AS 版本更新

Physics-Informed Neural Engine Sound Modeling with Differentiable Pulse-Train Synthesis

基于物理信息的神经引擎声音建模与可微分脉冲串合成

Robin Doerfler, Lonce Wyse

发表机构 * GitHub

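AI summary Proposes the Pulse-Train-Resonator (PTR) model, a differentiable synthesis architecture that directly models engine pulse shapes and temporal structure. Physics-informed inductive biases improve harmonic reconstruction and reduce total loss.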

Comments Revised version; to appear in the Proceedings of the 34th European Signal Processing Conference (EUSIPCO 2026)

详情
AI中文摘要

发动机声音源自连续的排气压力脉冲,而非持续的谐波振荡。虽然神经合成方法通常旨在近似最终的频谱特性,但我们提出直接建模底层脉冲形状和时间结构。我们提出了脉冲串-谐振器(PTR)模型,这是一种可微分合成架构,通过将发动机音频生成为与发动机点火模式对齐的参数化脉冲串,并通过模拟排气声学的递归Karplus-Strong谐振器传播它们。该架构集成了物理信息归纳偏置,包括谐波衰减、热力学音高调制、气门动力学包络、排气系统共振以及导出的发动机运行模式,如节气门操作和减速断油(DFCO)。在三种不同发动机类型(总计7.5小时音频)上验证,PTR在谐波重建上比谐波加噪声基线模型提高了21%,总损失降低了5.7%,同时提供了对应于物理现象的可解释参数。完整的代码、模型权重和音频示例已公开提供。

英文摘要

Engine sounds originate from sequential exhaust pressure pulses rather than sustained harmonic oscillations. While neural synthesis methods typically aim to approximate the resulting spectral characteristics, we propose directly modeling the underlying pulse shapes and temporal structure. We present the Pulse-Train-Resonator (PTR) model, a differentiable synthesis architecture that generates engine audio as parameterized pulse trains aligned to engine firing patterns and propagates them through recursive Karplus-Strong resonators simulating exhaust acoustics. The architecture integrates physics-informed inductive biases including harmonic decay, thermodynamic pitch modulation, valve-dynamics envelopes, exhaust system resonances and derived engine operating modes such as throttle operation and Deceleration Fuel Cutoff (DFCO). Validated on three diverse engine types totaling 7.5 hours of audio, PTR achieves a 21% improvement in harmonic reconstruction and a 5.7% reduction in total loss over a harmonic-plus-noise baseline model, while providing interpretable parameters corresponding to physical phenomena. Complete code, model weights, and audio examples are openly available.
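
The pulse-train-into-resonator signal path can be sketched in a few lines. The code below is the textbook Karplus-Strong structure (a delay line with a two-point average in the feedback loop) driven by an evenly spaced pulse train. It is a plain, non-differentiable illustration, not the PTR implementation, and the period, delay, and feedback values are arbitrary assumptions.

```python
from typing import List


def pulse_train(num_samples: int, period: int, amplitude: float = 1.0) -> List[float]:
    """One pulse every `period` samples, standing in for exhaust pressure pulses."""
    return [amplitude if n % period == 0 else 0.0 for n in range(num_samples)]


def karplus_strong_resonator(x: List[float], delay: int, feedback: float) -> List[float]:
    """Recursive comb filter with a two-point average in the feedback path:
    y[n] = x[n] + feedback * 0.5 * (y[n - delay] + y[n - delay - 1])."""
    y = [0.0] * len(x)
    for n in range(len(x)):
        a = y[n - delay] if n - delay >= 0 else 0.0
        b = y[n - delay - 1] if n - delay - 1 >= 0 else 0.0
        y[n] = x[n] + feedback * 0.5 * (a + b)
    return y


# Arbitrary illustrative settings: a pulse every 400 samples, a 50-sample resonator.
excitation = pulse_train(num_samples=2000, period=400)
audio = karplus_strong_resonator(excitation, delay=50, feedback=0.95)
```

With `feedback` below 1 the loop is stable, so each pulse rings at a pitch set by `delay` and decays. The averaging step damps high frequencies faster than low ones, which is one simple way to get a harmonic-decay behaviour.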

2603.09170 2026-06-04 cs.RO cs.AI 版本更新

ZeroWBC: Learning Natural Whole-Body Humanoid Interaction from Human Egocentric Data

ZeroWBC: 从人类自我中心数据学习自然全身人形交互

Haoran Yang, Jiacheng Bao, Yucheng Xin, Haoming Song, Yuyang Tian, Bin Zhao, Dong Wang, Xuelong Li

发表机构 * University of Science and Technology of China(中国科学技术大学) Shanghai AI Laboratory(上海人工智能实验室) Northwestern Polytechnical University(西北工业大学) Tsinghua University(清华大学) Shanghai Jiao Tong University(上海交通大学) TeleAI, China Telecom(TeleAI,中国电信)

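AI summary Proposes ZeroWBC, a framework that uses human egocentric video with synchronized whole-body motion data and a generation-then-tracking formulation to achieve teleoperation-free whole-body interaction control for humanoid robots.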

详情
AI中文摘要

由于全身遥操作数据成本高昂,实现多功能且自然的全身人形交互控制仍然具有挑战性。我们提出ZeroWBC,一种无需遥操作的框架,从人类自我中心视频以及同步的全身运动和文本注释中学习人形全身交互。ZeroWBC采用生成-跟踪公式来解决静态场景全身交互控制问题。给定初始自我中心图像和语言指令,微调的视觉-语言模型生成未来人类全身运动标记,这些标记被解码为连续运动并重定向到人形机器人。得到的参考运动,连同根部和关键身体部位轨迹,然后由通用交互运动跟踪策略执行。为了提高交互性能,我们引入了一种面向交互的跟踪奖励,该奖励优先考虑全局根部和关键身体部位轨迹对齐,同时保持自然的全身运动。在Unitree G1人形机器人上的实验表明,ZeroWBC无需机器人遥操作演示即可实现多样化的场景感知行为。这些结果表明了一种从人类自我中心数据学习自然人形全身交互的可扩展范式。

英文摘要

Achieving versatile and natural whole-body humanoid interaction control remains challenging due to the high cost of whole-body teleoperation data. We present ZeroWBC, a teleoperation-free framework that learns humanoid whole-body interaction from human egocentric videos paired with synchronized whole-body motion and text annotations. ZeroWBC adopts a generation-then-tracking formulation to tackle the static scene whole-body interaction control problem. Given an initial egocentric image and a language instruction, a fine-tuned Vision-Language Model generates future human whole-body motion tokens, which are decoded into continuous motions and retargeted to the humanoid. The resulting reference motions, together with root and key body-part trajectories, are then executed by a general interactive motion tracking policy. To improve interaction performance, we introduce an interaction-oriented tracking reward that prioritizes global root and key body-part trajectory alignment while preserving natural whole-body motion. Experiments on the Unitree G1 humanoid robot show that ZeroWBC enables diverse scene-aware behaviors without robot teleoperation demonstrations. These results suggest a scalable paradigm for learning natural humanoid whole-body interaction from human egocentric data.

2510.27191 2026-06-04 cs.RO cs.AI 版本更新

Vectorized Online POMDP Planning

向量化在线POMDP规划

Marcus Hoerger, Muhammad Sudrajat, Hanna Kurniawati

发表机构 * School of Computing, Australian National University(澳大利亚国立大学计算机学院)

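AI summary Proposes the Vectorized Online POMDP Planner (VOPP), which removes parallelization bottlenecks through fully vectorized computation, enabling massively parallel online planning that is at least 20x more efficient than an existing state-of-the-art parallel solver.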

Comments 8 pages, 3 figures. Accepted at ICRA 2026

详情
AI中文摘要

部分可观测下的规划是自主机器人的一项基本能力。部分可观测马尔可夫决策过程(POMDP)为部分可观测问题下的规划提供了强大的框架,捕捉了动作的随机效应以及通过噪声观测获得的有限信息。POMDP求解可以极大受益于当今硬件上的大规模并行化,但并行化POMDP求解器一直具有挑战性。大多数求解器依赖于将动作上的数值优化与其价值估计交错进行,这会在并行进程之间产生依赖关系和同步瓶颈,从而抵消并行化的好处。在本文中,我们提出了向量化在线POMDP规划器(VOPP),一种新颖的并行在线求解器,它利用了最近的POMDP公式,该公式解析地解决了优化组件的一部分,将数值计算仅保留为期望的估计。VOPP将所有与规划相关的数据结构表示为张量集合,并将所有规划步骤实现为该表示上的全向量化计算。结果是一个大规模并行的在线求解器,并发进程之间没有依赖关系或同步瓶颈。实验结果表明,与现有的最先进并行在线求解器相比,VOPP在计算近最优解方面的效率至少高出20倍。此外,VOPP优于最先进的顺序在线求解器,同时使用的规划预算小1000倍。

英文摘要

Planning under partial observability is an essential capability of autonomous robots. The Partially Observable Markov Decision Process (POMDP) provides a powerful framework for planning under partial observability problems, capturing the stochastic effects of actions and the limited information available through noisy observations. POMDP solving could benefit tremendously from massive parallelization on today's hardware, but parallelizing POMDP solvers has been challenging. Most solvers rely on interleaving numerical optimization over actions with the estimation of their values, which creates dependencies and synchronization bottlenecks between parallel processes that can offset the benefits of parallelization. In this paper, we propose Vectorized Online POMDP Planner (VOPP), a novel parallel online solver that leverages a recent POMDP formulation which analytically solves part of the optimization component, leaving numerical computations to consist of only estimation of expectations. VOPP represents all data structures related to planning as a collection of tensors, and implements all planning steps as fully vectorized computations over this representation. The result is a massively parallel online solver with no dependencies or synchronization bottlenecks between concurrent processes. Experimental results indicate that VOPP is at least $20\times$ more efficient in computing near-optimal solutions compared to an existing state-of-the-art parallel online solver. Moreover, VOPP outperforms state-of-the-art sequential online solvers, while using a planning budget that is $1000\times$ smaller.

2603.04444 2026-06-04 cs.NI cs.AI 版本更新

vLLM Semantic Router: Signal Driven Decision Routing for Mixture-of-Modality Models

vLLM Semantic Router: 面向多模态混合模型的信号驱动决策路由

Xunzhuo Liu, Huamin Chen, Samzong Lu, Yossi Ovadia, Guohong Wen, Hao Wu, Zhengda Tan, Jintao Zhang, Senan Zedan, Yehudit Kerido, Liav Weiss, Haichen Zhang, Bishen Yu, Asaad Balum, Noa Limoy, Abdallah Samara, Baofa Fan, Brent Salisbury, Ryan Cook, Zhijie Wang, Qiping Pan, Rehan Khan, Avishek Goswami, Houston H. Zhang, Shuyi Wang, Ziang Tang, Fang Han, Zohaib Hassan, Jianqiao Zheng, Avinash Changrani, Xue Liu, Bowei He

发表机构 * MBZUAI(穆罕默德·本·扎耶德人工智能大学)

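AI summary Proposes the vLLM Semantic Router framework, which uses composable signal orchestration (13 heterogeneous signal types combined through Boolean decision rules) to route requests intelligently in Mixture-of-Modality model deployments, with differentiated policy configurations for different scenarios.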

Comments Technical Report

详情
AI中文摘要

随着大语言模型在模态、能力和成本配置上的多样化,智能请求路由(即在推理时为每个查询选择合适模型)已成为关键的系统挑战。我们提出vLLM Semantic Router,一种面向多模态混合模型部署的信号驱动决策路由框架。该架构遵循两种互补的香农启发视角。在信息论机制中,信号提取通过从原始查询中提炼路由相关信息来降低“选择哪个模型?”的熵。在布尔代数机制中,决策引擎将信号条件组合成功能完备的路由策略。核心创新是可组合信号编排:13种异构信号类型(涵盖亚毫秒级启发式方法以及用于语义、安全性和模态的神经分类器)通过可配置的布尔决策规则组合成部署特定的路由策略,使得根本不同的场景(多云企业、隐私监管、成本优化)被表达为同一架构上的不同配置。匹配的决策通过13种选择算法驱动语义模型路由,而每个决策的插件链强制执行安全约束,包括三阶段HaluGate幻觉检测流水线和轻量级情景记忆系统(带ReflectionGate用于个性化多轮上下文)。一种类型化的神经符号DSL指定这些路由策略并将其编译到多个部署目标,实现无需代码更改的配置优先适配。这些组件共同表明,可组合信号编排使单一框架能够以差异化的成本、隐私和安全策略服务于多种部署场景。

英文摘要

As large language models (LLMs) diversify across modalities, capabilities, and cost profiles, the problem of intelligent request routing, that is, selecting the right model for each query at inference time, has become a critical systems challenge. We present vLLM Semantic Router, a signal-driven decision routing framework for Mixture-of-Modality (MoM) model deployments. The architecture follows two complementary Shannon-inspired views. In the information-theoretic regime, signal extraction reduces the entropy of "which model?" by distilling routing-relevant information from raw queries. In the Boolean-algebraic regime, the decision engine composes functionally complete routing policies from signal conditions. The central innovation is composable signal orchestration: thirteen heterogeneous signal types, spanning sub-millisecond heuristics and neural classifiers for semantics, safety, and modality, are composed through configurable Boolean decision rules into deployment-specific routing policies, so that fundamentally different scenarios (multi-cloud enterprise, privacy-regulated, cost-optimized) are expressed as different configurations over the same architecture. Matched decisions drive semantic model routing via thirteen selection algorithms, while per-decision plugin chains enforce safety constraints including a three-stage HaluGate hallucination detection pipeline and a lightweight episodic memory system with ReflectionGate for personalized multi-turn context. A typed neural-symbolic DSL specifies these routing policies and compiles them to multiple deployment targets, enabling configuration-first adaptation without code changes. Together, these components show that composable signal orchestration enables a single framework to serve diverse deployment scenarios with differentiated cost, privacy, and safety policies.
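
The idea of composing Boolean decision rules over extracted signals can be illustrated with a small sketch. Everything below is hypothetical: the signal names, the model names, and the first-match policy are assumptions made for illustration and are not the vLLM Semantic Router DSL or API.

```python
from typing import Callable, Dict, List, Tuple

Signals = Dict[str, bool]
Rule = Callable[[Signals], bool]


def signal(name: str) -> Rule:
    """Rule that is true when the named signal fired (missing signals count as False)."""
    return lambda signals: signals.get(name, False)


def all_of(*rules: Rule) -> Rule:
    """Logical AND over rules."""
    return lambda signals: all(rule(signals) for rule in rules)


def any_of(*rules: Rule) -> Rule:
    """Logical OR over rules."""
    return lambda signals: any(rule(signals) for rule in rules)


def not_(rule: Rule) -> Rule:
    """Logical NOT of a rule."""
    return lambda signals: not rule(signals)


# Hypothetical policy: (model name, rule) pairs, checked in order.
POLICY: List[Tuple[str, Rule]] = [
    ("local-private-model", signal("contains_pii")),
    ("vision-model", all_of(signal("has_image"), not_(signal("contains_pii")))),
    ("large-reasoning-model", any_of(signal("is_math"), signal("is_code"))),
]
DEFAULT_MODEL = "small-general-model"


def route(signals: Signals) -> str:
    """Return the model of the first rule that matches, else the default."""
    for model, rule in POLICY:
        if rule(signals):
            return model
    return DEFAULT_MODEL
```

AND, OR, and NOT together are functionally complete, so any Boolean condition over the signals can be written with these three combinators. Swapping the `POLICY` list is then enough to express a different deployment scenario over the same code.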

2603.03482 2026-06-04 cs.CV cs.AI cs.LG 版本更新

Beyond Pixel Histories: World Models with Persistent 3D State

超越像素历史:具有持久3D状态的世界模型

Samuel Garcin, Thomas Walker, Steven McDonagh, Tim Pearce, Hakan Bilen, Tianyu He, Kaixin Wang, Jiang Bian

发表机构 * University of Edinburgh(爱丁堡大学) Microsoft Research(微软研究院)

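AI summary Proposes PERSIST, a world-model paradigm that simulates the evolution of a latent 3D scene (environment, camera, and renderer), giving persistent spatial memory and consistent geometry and substantially improving 3D consistency, spatial memory, and long-horizon stability.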

Comments Accepted to the International Conference on Machine Learning (ICML) 2026. To appear in the Proceedings of Machine Learning Research (PMLR). 9 pages

详情
AI中文摘要

交互式世界模型通过响应用户的动作持续生成视频,实现开放式的生成能力。然而,现有模型通常缺乏环境的3D表示,意味着3D一致性必须从数据中隐式学习,且空间记忆受限于有限的时域上下文窗口。这导致不真实的用户体验,并对训练智能体等下游任务构成重大障碍。为解决这一问题,我们提出PERSIST,一种新的世界模型范式,它模拟潜在3D场景(环境、相机和渲染器)的演化。这使得我们能够合成具有持久空间记忆和一致几何的新帧。定量指标和定性用户研究均表明,与现有方法相比,在空间记忆、3D一致性和长期稳定性方面有显著提升,从而实现连贯、演化的3D世界。我们进一步展示了新颖的能力,包括从单张图像合成多样化的3D环境,以及通过直接在3D空间中支持环境编辑和指定,实现对生成体验的细粒度、几何感知控制。项目页面:https://francelico.github.io/persist.github.io

英文摘要

Interactive world models continually generate video by responding to a user's actions, enabling open-ended generation capabilities. However, existing models typically lack a 3D representation of the environment, meaning 3D consistency must be implicitly learned from data, and spatial memory is restricted to limited temporal context windows. This results in an unrealistic user experience and presents significant obstacles to downstream tasks such as training agents. To address this, we present PERSIST, a new paradigm of world model which simulates the evolution of a latent 3D scene: environment, camera, and renderer. This allows us to synthesise new frames with persistent spatial memory and consistent geometry. Both quantitative metrics and a qualitative user study show substantial improvements in spatial memory, 3D consistency, and long-horizon stability over existing methods, enabling coherent, evolving 3D worlds. We further demonstrate novel capabilities, including synthesising diverse 3D environments from a single image, as well as enabling fine-grained, geometry-aware control over generated experiences by supporting environment editing and specification directly in 3D space. Project page: https://francelico.github.io/persist.github.io

2603.02697 2026-06-04 cs.CV cs.AI 版本更新

ShareVerse: Multi-Agent Consistent Video Generation for Shared World Modeling

ShareVerse:面向共享世界建模的多智能体一致视频生成

Jiayi Zhu, Jianing Zhang, Yiying Yang, Wei Cheng, Xiaoyun Yuan

发表机构 * Shanghai Jiao Tong University, China(上海交通大学) Fudan University, China(复旦大学) StepFun, China(StepFun)

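AI summary Proposes the ShareVerse framework, which achieves consistent video generation for a multi-agent shared world through a multi-agent interaction dataset, a spatial concatenation strategy, and a cross-agent attention mechanism.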

详情
AI中文摘要

本文提出ShareVerse,一个视频生成框架,支持多智能体共享世界建模,解决了现有工作缺乏统一共享世界构建和多智能体交互支持的问题。ShareVerse利用大型视频模型的生成能力,并整合了三个关键创新:1)在CARLA仿真平台上构建了大规模多智能体交互世界建模数据集,包含多样场景、天气条件和交互轨迹,以及配对的每智能体四视角视频(前/后/左/右视图)和相机数据。2)我们提出了一种针对独立智能体四视角视频的空间拼接策略,以建模更广泛的环境并确保内部多视角几何一致性。3)我们将跨智能体注意力模块集成到预训练视频模型中,实现跨智能体时空信息的交互传递,保证重叠区域的共享世界一致性和非重叠区域的合理生成。支持49帧大规模视频生成的ShareVerse能够准确感知动态智能体的位置,实现一致的共享世界建模。

英文摘要

This paper presents ShareVerse, a video generation framework enabling multi-agent shared world modeling, addressing the gap in existing work, which lacks support for unified shared world construction with multi-agent interaction. ShareVerse leverages the generation capability of large video models and integrates three key innovations: 1) We build a dataset for large-scale multi-agent interactive world modeling on the CARLA simulation platform, featuring diverse scenes, weather conditions, and interactive trajectories with paired multi-view videos (front/rear/left/right views per agent) and camera data. 2) We propose a spatial concatenation strategy for four-view videos of independent agents to model a broader environment and to ensure internal multi-view geometric consistency. 3) We integrate cross-agent attention blocks into the pretrained video model, which enable interactive transmission of spatial-temporal information across agents, guaranteeing shared world consistency in overlapping regions and reasonable generation in non-overlapping regions. ShareVerse, which supports 49-frame large-scale video generation, accurately perceives the position of dynamic agents and achieves consistent shared world modeling.

2509.02655 2026-06-04 cs.CY cs.AI 版本更新

BioBlue: Systematic runaway-optimiser-like LLM failure modes on biologically and economically aligned AI safety benchmarks for LLMs with simplified observation format

BioBlue:在生物与经济对齐的AI安全基准上,具有简化观察格式的LLM的系统性类失控优化失败模式

Roland Pihlakas, Sruthi Susan Kuriakose

发表机构 * Independent researcher(独立研究者) Three Laws research collaboration(Three Laws研究合作) Rakvere, Estonia(爱沙尼亚拉克韦雷)

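AI summary By testing LLMs in long-horizon control environments, this study finds that although LLMs understand the stated objectives, in multi-objective scenarios they systematically drift toward single-objective, unbounded optimization, exhibiting runaway-optimiser-like failure modes.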

Comments 27 pages, 7 figures, 7 tables

详情
AI中文摘要

许多关于“失控优化”的AI对齐讨论聚焦于RL智能体:无界效用最大化者,它们以牺牲其他一切为代价过度优化代理目标(例如,“回形针最大化者”、规范博弈)。基于LLM的系统通常被认为更安全,因为它们作为下一个词元预测器而非持久优化器运行。我们通过将LLM置于需要随时间维持状态或平衡目标的简单、长期控制型环境中来实证检验这一假设:单目标和多目标稳态、平衡无界目标与递减收益、以及可再生资源的可持续性。我们发现,尽管LLM在多个步骤中经常表现适当并清楚理解所述目标,但它们常常以结构化的方式丢失上下文并漂移至失控行为:忽略稳态目标、从多目标权衡崩溃为单目标最大化——从而未能尊重凹效用结构。这些失败在初始阶段的能力行为之后可靠地出现,并表现出特征模式(包括自模仿振荡、无界最大化以及回归单目标优化),尽管此时上下文窗口远未满。问题不在于LLM只是丢失上下文并变得不连贯。尽管LLM表面上看似多目标且有界,但在涉及多目标的持续交互下,其行为系统性偏向于像单目标、无界、对齐不良的优化器。我们假设存在一个词元级模式强化吸引子:LLM可能越来越多地从其近期动作历史的词元模式而非原始指令中推导动作。为何这仅发生在多目标设置中仍是一个开放问题。

英文摘要

Many AI alignment discussions of "runaway optimisation" focus on RL agents: unbounded utility maximisers that over-optimise a proxy objective (e.g., "paperclip maximiser", specification gaming) at the expense of everything else. LLM-based systems are often assumed to be safer because they function as next-token predictors rather than persistent optimisers. We empirically test this assumption by placing LLMs in simple, long-horizon control-style environments that require maintaining state or balancing objectives over time: single- and multi-objective homeostasis, balancing unbounded objectives with diminishing returns, and sustainability of a renewable resource. We find that, although LLMs frequently behave appropriately for many steps and clearly understand the stated objectives, they often lose context in structured ways and drift into runaway behaviours: ignoring homeostatic targets, collapsing from multi-objective trade-offs into single-objective maximisation - thus failing to respect concave utility structures. These failures emerge reliably after initial periods of competent behaviour and exhibit characteristic patterns (including self-imitative oscillations, unbounded maximisation, and reverting to single-objective optimisation), even though the context window is far from full at that point. The problem is not that the LLMs just lose context and become incoherent. Although LLMs appear multi-objective and bounded on the surface, their behaviour under sustained interaction involving multiple objectives is systematically biased towards acting like single-objective, unbounded, poorly aligned optimisers. We hypothesise a token-level pattern reinforcement attractor: LLMs may increasingly derive actions from the token patterns of their recent action history rather than from the original instructions. Why this happens only in multi-objective settings remains an open question.
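
A small numeric illustration of why collapsing into single-objective maximisation fails to respect a concave utility structure. The logarithmic utility and the budget of 10 units are assumptions chosen for illustration. The abstract does not give the benchmark's actual reward functions.

```python
import math
from typing import Sequence


def total_utility(allocation: Sequence[float]) -> float:
    """Sum of concave per-objective utilities u(x) = log(1 + x) (diminishing returns)."""
    return sum(math.log1p(x) for x in allocation)


# Hypothetical budget of 10 units split across two objectives.
balanced = total_utility([5, 5])    # both objectives served
single = total_utility([10, 0])     # everything poured into one objective
print(balanced > single)
```

Under diminishing returns each extra unit on the already-favoured objective is worth less than the first unit on the neglected one, so the balanced split scores higher. An agent that drifts to the `[10, 0]` pattern is therefore losing utility, not just changing style.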

2602.20971 2026-06-04 cs.LG cs.AI 版本更新

Does Order Matter : Connecting The Law of Robustness to Robust Generalization

顺序重要吗:连接鲁棒性定律与鲁棒泛化

Mihir More, Aritra Das, Jaee Ponde, Himadri Mandal, Vishnu Varadarajan, Debayan Gupta

发表机构 * Ashoka University(阿育王大学) Truth Audit Labs(真相审计实验室) Indian Statistical Institute(印度统计研究所)

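AI summary Using global and local Rademacher complexity, this paper connects the Law of Robustness (a lower bound on the Lipschitz constant) to robust generalization error, showing that for arbitrary data distributions the order of the Lipschitz bound is unchanged at the global scale, while at the local scale it varies with the perturbation radius and a localized concentration term.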

详情
AI中文摘要

Bubeck和Sellke(2021)将鲁棒性定律与鲁棒泛化误差之间的联系作为一个开放问题提出。鲁棒性定律指出,过参数化对于模型实现鲁棒插值是必要的,即插值函数必须是Lipschitz的。Wu等人(2023)将该定律推广到任意数据分布,证明Lipschitz常数满足$L = \Omega(n^{1/d})$。另一方面,鲁棒泛化研究小的鲁棒训练损失是否意味着小的鲁棒测试损失。这可以使用统计学习技术(如Rademacher复杂度)来研究,其中鲁棒损失类的Rademacher复杂度的界意味着函数类Lipschitz性的界。我们利用这一联系,明确地将两者联系起来,适用于任意数据分布。(i) 我们证明,在考虑鲁棒损失类的全局Rademacher复杂度时,Lipschitz界的阶保持不变。(ii) 在局部尺度上,即对于具有小经验误差的函数子集,Lipschitz界的阶随扰动半径$\rho$和局部浓度项$\sqrt{r/n}$变化。

英文摘要

Bubeck and Sellke (2021) propose the connection between the Law of Robustness and robust generalization error as an open problem. The Law of Robustness states that overparameterization is necessary for models to interpolate robustly, i.e., the interpolating function is required to be Lipschitz. Wu et al. (2023) extend this law to arbitrary data distributions, proving that the Lipschitz constant satisfies $L = \Omega(n^{1/d})$. Robust generalization, on the other hand, asks whether small robust training loss implies small robust test loss. This can be studied using statistical learning techniques such as Rademacher complexities, where a bound on the Rademacher complexity of the robust loss class implies a bound on the Lipschitzness of the function class. We use this connection to explicitly link the two for arbitrary data distributions. (i) We prove that the order of the Lipschitz bound remains the same when considering the global Rademacher complexity of robust loss classes. (ii) At the local scale, i.e., for subsets of functions with small empirical error, the order of the Lipschitz bound changes with the perturbation radius $\rho$ and the localized concentration term $\sqrt{r/n}$.

2511.05722 2026-06-04 cs.CL cs.AI 版本更新

OckBench: Measuring the Efficiency of LLM Reasoning

OckBench:衡量LLM推理效率

Zheng Du, Hao Kang, Song Han, Tushar Krishna, Ligeng Zhu

发表机构 * Georgia Institute of Technology(佐治亚理工学院) Massachusetts Institute of Technology(麻省理工学院) NVIDIA(英伟达)

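AI summary Proposes OckBench, a benchmark that jointly evaluates accuracy and token efficiency on reasoning and coding tasks, revealing that current models use tokens inefficiently.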

详情
AI中文摘要

大型语言模型(LLM)如GPT-5和Gemini 3已推动自动推理和代码生成的前沿。然而,当前基准强调准确性和输出质量,忽略了关键维度:token使用的效率。在实际应用中,token效率变化很大。解决相同问题且准确率相近的模型,其token长度差异可达$5.0\times$,导致模型推理能力存在巨大差距。这种差异暴露了显著的冗余,凸显了对标准化基准来量化token效率差距的迫切需求。因此,我们引入OckBench,这是首个联合衡量推理和编码任务中准确性与token效率的基准。我们的评估表明,当前模型的token效率在很大程度上未得到优化,显著增加了服务成本和延迟。这些发现为社区优化潜在推理能力(即token效率)提供了具体路线图。最终,我们主张评估范式转变:token不应被无谓地倍增。我们的基准可在https://ockbench.github.io/获取。

英文摘要

Large language models (LLMs) such as GPT-5 and Gemini 3 have pushed the frontier of automated reasoning and code generation. Yet current benchmarks emphasize accuracy and output quality, neglecting a critical dimension: efficiency of token usage. Token efficiency is highly variable in practice. Models solving the same problem with similar accuracy can exhibit up to a $5.0\times$ difference in token length, leading to a massive gap in model reasoning ability. Such variance exposes significant redundancy, highlighting the critical need for a standardized benchmark to quantify the gap in token efficiency. Thus, we introduce OckBench, the first benchmark that jointly measures accuracy and token efficiency across reasoning and coding tasks. Our evaluation reveals that token efficiency remains largely unoptimized across current models, significantly inflating serving costs and latency. These findings provide a concrete roadmap for the community to optimize a latent reasoning ability, namely token efficiency. Ultimately, we argue for an evaluation paradigm shift: tokens must not be multiplied beyond necessity. Our benchmarks are available at https://ockbench.github.io/.
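
One simple way to express a "difference in token length" between two models of similar accuracy is the ratio of their mean output-token counts. The sketch below uses made-up token counts for two hypothetical models. The abstract does not state how OckBench itself defines or aggregates token efficiency, so treat this as an illustration of the kind of comparison involved, not the benchmark's metric.

```python
from statistics import mean
from typing import Sequence


def token_ratio(tokens_a: Sequence[int], tokens_b: Sequence[int]) -> float:
    """Ratio of mean output-token counts, larger over smaller (always >= 1)."""
    mean_a, mean_b = mean(tokens_a), mean(tokens_b)
    return max(mean_a, mean_b) / min(mean_a, mean_b)


# Made-up per-problem token counts for two hypothetical models on the same problems.
model_a = [1200, 900, 1500]
model_b = [6000, 4500, 7500]
print(token_ratio(model_a, model_b))  # 5.0
```

A ratio like this only makes sense when the two models are compared on the same problems at similar accuracy. Otherwise a shorter output may simply reflect a wrong answer.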

2602.19101 2026-06-04 cs.CL cs.AI 版本更新

Value Entanglement: Conflation Between Different Kinds of Good In (Some) Large Language Models

价值纠缠:大型语言模型中不同种类好的混淆

Seong Hah Cho, Junyi Li, Anna Leshinskaya

发表机构 * Independent(独立研究者) Department of Cognitive Sciences, UC Irvine(加州大学尔湾分校认知科学系)

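AI summary By probing model behavior, embeddings, and residual stream activations, this work finds pervasive value entanglement in large language models: moral, grammatical, and economic value are conflated, with grammatical and economic valuation overly influenced by moral value. Selectively ablating morality-related activation vectors repairs the conflation.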

详情
AI中文摘要

大型语言模型(LLMs)的价值对齐要求我们经验性地测量这些模型实际习得的价值表征。人类价值表征的一个特点是能够区分不同种类的价值。我们研究LLMs是否同样区分三种不同的好:道德的、语法的和经济的。通过探测模型行为、嵌入和残差流激活,我们报告了普遍存在的价值纠缠案例:这些不同价值表征之间的混淆。具体而言,相对于人类规范,语法和经济评价被发现过度受道德价值影响。通过选择性消融与道德相关的激活向量,这种混淆得到了修复。

英文摘要

Value alignment of Large Language Models (LLMs) requires us to empirically measure these models' actual, acquired representation of value. One characteristic of human value representation is that it distinguishes among different kinds of value. We investigate whether LLMs likewise distinguish three different kinds of good: moral, grammatical, and economic. By probing model behavior, embeddings, and residual stream activations, we report pervasive cases of value entanglement: a conflation between these distinct representations of value. Specifically, both grammatical and economic valuation were found to be overly influenced by moral value, relative to human norms. This conflation was repaired by selective ablation of the activation vectors associated with morality.

2602.16966 2026-06-04 cs.LG cs.AI 版本更新

A Unified Framework for Locality in Scalable MARL

可扩展多智能体强化学习中局部性的统一框架

Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen

发表机构 * University of Colorado Boulder(科罗拉多大学博尔德分校) INRIA Paris(巴黎国家信息与自动化研究所)

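AI summary Proposes a unified framework that splits the matrix $C^\pi$ into environment-sensitivity and policy-sensitivity parts, shows that the spectral-radius condition $\rho(H^\pi)<1$ is strictly weaker than the row-sum condition, shows that the softmax temperature directly controls locality, and gives a deterministic guarantee for block-coordinate KL-proximal policy improvement.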

详情
AI中文摘要

网络化多智能体强化学习的可扩展方法让每个智能体仅使用智能体图的一小部分邻域进行规划。这仅在系统是值局部性时有效,即一个智能体的扰动对远处另一个智能体的长期值影响较弱。在平均奖励设置中,验证局部性的标准方法是Dobrushin行和界,该界基于一个矩阵$C^\pi$,该矩阵捕捉每个智能体的下一个状态如何依赖于其他智能体的当前状态。为了使该矩阵易于处理,先前的工作通过联合动作的上确界来约束它。得到的界与策略无关,但当策略从不选择最坏情况动作时,该界是松的。我们将$C^\pi$分解为分别跟踪环境敏感性和策略敏感性的部分,$C^\pi \preceq E^{\mathrm s}+E^{\mathrm a}\Pi(\pi)$,其中$E^{\mathrm s}$衡量下一个状态如何随当前状态变化,$E^{\mathrm a}$衡量它如何随当前动作变化,$\Pi(\pi)$衡量策略对状态变化的反应程度。那么$H^\pi := E^{\mathrm s}+E^{\mathrm a}\Pi(\pi)$的谱半径控制平均奖励泊松解的衰减,谱证书$\rho(H^\pi)<1$严格弱于同一矩阵上的行和条件$\|H^\pi\|_\infty<1$,并适用于先前Dobrushin风格工作中使用的策略无关动作上确界界无法处理的场景。对于温度为$\tau$的softmax策略,我们有$\Pi(\pi)\le L/(2\tau)$,因此softmax温度直接控制局部性。我们利用这一衰减结果为块坐标KL近端策略改进模板提供确定性预言机保证,其截断偏差随消息传递半径$\kappa$指数衰减。

英文摘要

Scalable methods for networked multi-agent reinforcement learning let each agent plan using only a small neighborhood of the agent graph. This works only when the system is value-local, meaning a perturbation at one agent affects the long-run value at another agent weakly when the two are far apart. In the average-reward setting, the standard way to certify locality is the Dobrushin row-sum bound on a single matrix $C^\pi$ that captures how each agent's next state depends on each other agent's current state. To make this matrix easy to work with, prior work bounds it by a supremum over joint actions. The resulting bound is independent of the policy, but it is loose whenever the policy never picks the worst-case action. We split $C^\pi$ into pieces that separately track environment sensitivity and policy sensitivity, $C^\pi \preceq E^{\mathrm s}+E^{\mathrm a}\Pi(\pi)$, where $E^{\mathrm s}$ measures how the next state moves with the current state, $E^{\mathrm a}$ measures how it moves with the current action, and $\Pi(\pi)$ measures how reactive the policy is to changes in state. The spectral radius of $H^\pi := E^{\mathrm s}+E^{\mathrm a}\Pi(\pi)$ then controls the decay of the average-reward Poisson solution, and the spectral certificate $\rho(H^\pi)<1$ is strictly weaker than the row-sum condition $\|H^\pi\|_\infty<1$ on the same matrix and applies in regimes where policy-independent action-supremum bounds used in prior Dobrushin-style work cannot. For temperature-$\tau$ softmax policies we get $\Pi(\pi)\le L/(2\tau)$, so the softmax temperature directly controls locality. We use this decay result to give a deterministic oracle guarantee for a block-coordinate KL-proximal policy-improvement template whose truncation bias decays exponentially in the message-passing radius $\kappa$.
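
The claim that the spectral certificate is strictly weaker than the row-sum condition rests on a general fact: for any square matrix the spectral radius is at most the induced infinity norm, so a row-sum bound below 1 implies a spectral radius below 1, but not the other way around. The 2x2 matrix below is a made-up example (not taken from the paper) where the row-sum test fails and the spectral test passes.

```python
import cmath
from typing import List

Matrix = List[List[float]]


def inf_norm(h: Matrix) -> float:
    """Induced infinity norm: the largest absolute row sum."""
    return max(sum(abs(v) for v in row) for row in h)


def spectral_radius_2x2(h: Matrix) -> float:
    """Largest eigenvalue modulus of a 2x2 matrix via the characteristic polynomial."""
    (a, b), (c, d) = h
    trace, det = a + d, a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return max(abs((trace + disc) / 2), abs((trace - disc) / 2))


# Hypothetical nonnegative sensitivity matrix with one strong one-way coupling.
H = [[0.6, 0.8],
     [0.0, 0.3]]

print(inf_norm(H) < 1)             # False: first row sums to 1.4
print(spectral_radius_2x2(H) < 1)  # True: eigenvalues are 0.6 and 0.3
```

The example is triangular, so its eigenvalues are just the diagonal entries. The large off-diagonal term inflates the row sum without affecting the long-run contraction that the spectral radius measures.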

2602.03972 2026-06-04 stat.ML cs.AI cs.LG 版本更新

Fixed Budget is No Harder Than Fixed Confidence in Best-Arm Identification up to Logarithmic Factors

固定预算在最佳臂识别中不比固定置信度难(对数因子范围内)

Kapilan Balagopalan, Yinan Li, Yao Zhao, Tuan Nguyen, Anton Daitche, Houssam Nassif, Kwang-Sung Jun

发表机构 * University of California, Berkeley(加州大学伯克利分校) University of Washington(华盛顿大学) University of Texas at Austin(德克萨斯大学奥斯汀分校)

AI总结 本文提出元算法FC2FB,将固定置信度算法转化为固定预算算法,证明固定预算的样本复杂度在log因子内不高于固定置信度。

详情
Journal ref
International Conference on Machine Learning (ICML'26), Seoul, Korea, 2026
AI中文摘要

最佳臂识别(BAI)问题是交互式机器学习中最基本的问题之一,有两种形式:固定预算设置(FB)和固定置信度设置(FC)。对于具有唯一最佳臂的$K$臂赌博机,两种设置的最优样本复杂度已被确定,且在对数因子内匹配。这引出了一个关于通用的、可能具有结构化的BAI问题的有趣研究问题:FB是否比FC更难,还是相反?在本文中,我们证明FB在对数因子内并不比FC难。我们通过构造性方式做到这一点:我们提出了一种名为FC2FB(固定置信度到固定预算)的新算法,这是一种元算法,它接收一个FC算法$\mathcal{A}$并将其转化为FB算法。我们证明FC2FB的样本复杂度与$\mathcal{A}$的样本复杂度在对数因子内匹配。这意味着最优FC样本复杂度是FB最优样本复杂度的一个上界(在对数因子内)。我们的结果不仅揭示了FB和FC之间的基本关系,而且具有重要含义:FC2FB与现有最先进的FC算法相结合,可以改善许多FB问题的样本复杂度。

英文摘要

The best-arm identification (BAI) problem is one of the most fundamental problems in interactive machine learning, which has two flavors: the fixed-budget setting (FB) and the fixed-confidence setting (FC). For $K$-armed bandits with a unique best arm, the optimal sample complexities for both settings have been settled, and they match up to logarithmic factors. This prompts an interesting research question about generic, potentially structured BAI problems: is FB harder than FC or the other way around? In this paper, we show that FB is no harder than FC up to logarithmic factors. We do this constructively: we propose a novel algorithm called FC2FB (fixed confidence to fixed budget), which is a meta algorithm that takes in an FC algorithm $\mathcal{A}$ and turns it into an FB algorithm. We prove that FC2FB enjoys a sample complexity that matches, up to logarithmic factors, that of $\mathcal{A}$. This means that the optimal FC sample complexity is an upper bound on the optimal FB sample complexity up to logarithmic factors. Our result not only reveals a fundamental relationship between FB and FC, but also has a significant implication: FC2FB combined with existing state-of-the-art FC algorithms leads to improved sample complexity for a number of FB problems.

2602.15202 2026-06-04 quant-ph cs.AI cs.NA eess.SP math.NA stat.CO 版本更新

Tomography by Design: An Algebraic Approach to Low-Rank Quantum States

按设计断层扫描:低秩量子态的代数方法

Shakir Showkat Sofi, Charlotte Vermeylen, Lieven De Lathauwer

发表机构 * Leuven.AI - KU Leuven institute for AI(Leuven.AI - 鲁汶大学人工智能研究所)

AI总结 提出一种代数算法,通过测量特定可观测量估计密度矩阵的结构化条目,并利用低秩假设通过数值线性代数完成矩阵,实现高效且确定性的量子态层析。

Comments 5 pages, Accepted to EUSIPCO 2026

详情
AI中文摘要

我们提出了一种用于量子态层析的代数算法,该算法利用对某些可观测量的测量来估计底层密度矩阵的结构化条目。在低秩假设下,其余条目可以仅使用标准数值线性代数运算获得。所提出的代数矩阵补全框架适用于一大类通用的低秩混合量子态,并且与最先进的方法相比,计算效率高,同时提供确定性的恢复保证。

英文摘要

We present an algebraic algorithm for quantum state tomography that leverages measurements of certain observables to estimate structured entries of the underlying density matrix. Under low-rank assumptions, the remaining entries can be obtained solely using standard numerical linear algebra operations. The proposed algebraic matrix completion framework applies to a broad class of generic, low-rank mixed quantum states and, compared with state-of-the-art methods, is computationally efficient while providing deterministic recovery guarantees.
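
To make the idea of completing a low-rank density matrix from a few structured entries concrete, here is a minimal sketch for the simplest case, a rank-1 (pure) state. It relies on a generic linear-algebra identity and is not claimed to be the paper's algorithm; the state `psi`, the choice of the first column as the "measured" entries, and the function name are all assumptions made for illustration.

```python
# Toy rank-1 completion; a generic identity, not the paper's method.

def complete_rank_one(first_column):
    """Rebuild a rank-1 Hermitian matrix from its first column.

    For rho = psi psi^dagger, rho[i][j] = rho[i][0] * conj(rho[j][0]) / rho[0][0],
    which requires rho[0][0] != 0.
    """
    pivot = first_column[0]
    return [[a * b.conjugate() / pivot for b in first_column] for a in first_column]

# Hypothetical pure state, used only to generate test data.
psi = [0.5, 0.5j, 0.5 ** 0.5]
rho = [[a * b.conjugate() for b in psi] for a in psi]

measured_column = [row[0] for row in rho]          # entries assumed to be estimated
rho_completed = complete_rank_one(measured_column)  # remaining entries filled in
```

Only three of the nine entries are used as input; the rest follow from the rank constraint with ordinary arithmetic, which is the flavor of deterministic recovery the abstract describes.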

2602.14117 2026-06-04 cs.NI cs.AI 版本更新

Toward Autonomous O-RAN: A Multi-Scale Agentic AI Framework for Real-Time Network Control and Management

迈向自主O-RAN:一种用于实时网络控制与管理的多尺度智能体AI框架

Hojjat Navidan, Mohammad Cheraghinia, Jaron Fontaine, Mohamed Seif, Eli De Poorter, H. Vincent Poor, Ingrid Moerman, Adnan Shahid

发表机构 * IDLab, Department of Information Technology at Ghent University - imec(IDLab,根特大学信息技术系 - imec) Department of Electrical and Computer Engineering, Princeton University(电气与计算机工程系,普林斯顿大学)

AI总结 提出一种多尺度智能体AI框架,通过非实时、近实时和实时控制环的协调层次结构,实现O-RAN的自主网络控制与管理,并在非平稳条件下和意图驱动的切片资源控制场景中验证了其有效性。

Comments Submitted to the IEEE Networks Journal

详情
AI中文摘要

开放无线接入网络(O-RAN)通过解耦、软件驱动的组件和开放接口承诺灵活的6G网络接入,但这种可编程性也增加了操作复杂性。多个控制环共存于服务管理层和RAN智能控制器(RIC)中,而独立开发的控制应用可能以意外方式交互。同时,生成式人工智能的最新进展正在推动从孤立AI模型向能够解释目标、协调多个模型和控制功能并随时间调整行为的智能体AI系统转变。本文提出了一种用于O-RAN的多尺度智能体AI框架,将RAN智能组织为跨非实时(Non-RT)、近实时(Near-RT)和实时(RT)控制环的协调层次结构:(i)Non-RT RIC中的大语言模型(LLM)智能体将运营商意图转化为策略并管理模型生命周期;(ii)Near-RT RIC中的小语言模型(SLM)智能体执行低延迟优化,并可激活、调整或禁用现有控制应用;(iii)分布式单元附近的无线物理层基础模型(WPFM)智能体提供接近空中接口的快速推理。我们描述了这些智能体如何通过标准化的O-RAN接口和遥测进行协作。通过基于开源模型、软件和数据集的概念验证实现,我们在两个代表性场景中展示了所提出的智能体方法:非平稳条件下的鲁棒操作和意图驱动的切片资源控制。

英文摘要

Open Radio Access Networks (O-RAN) promise flexible 6G network access through disaggregated, software-driven components and open interfaces, but this programmability also increases operational complexity. Multiple control loops coexist across the service management layer and RAN Intelligent Controller (RIC), while independently developed control applications can interact in unintended ways. In parallel, recent advances in generative Artificial Intelligence (AI) are enabling a shift from isolated AI models toward agentic AI systems that can interpret goals, coordinate multiple models and control functions, and adapt their behavior over time. This article proposes a multi-scale agentic AI framework for O-RAN that organizes RAN intelligence as a coordinated hierarchy across the Non-Real-Time (Non-RT), Near-Real-Time (Near-RT), and Real-Time (RT) control loops: (i) a Large Language Model (LLM) agent in the Non-RT RIC translates operator intent into policies and governs model lifecycles; (ii) Small Language Model (SLM) agents in the Near-RT RIC execute low-latency optimization and can activate, tune, or disable existing control applications; and (iii) Wireless Physical-layer Foundation Model (WPFM) agents near the distributed unit provide fast inference close to the air interface. We describe how these agents cooperate through standardized O-RAN interfaces and telemetry. Using a proof-of-concept implementation built on open-source models, software, and datasets, we demonstrate the proposed agentic approach in two representative scenarios: robust operation under non-stationary conditions and intent-driven slice resource control.

2602.12643 2026-06-04 cs.LG cs.AI stat.ML 版本更新

Unifying Model-Free Efficiency and Model-Based Representations via Latent Dynamics

通过潜在动力学统一无模型效率与基于模型的表示

Jashaswimalya Acharjee, Balaraman Ravindran

AI总结 提出统一潜在动力学算法,通过将状态-动作对嵌入到值函数近似线性的潜在空间,无需规划开销即可融合无模型效率与基于模型表示的优势,在80个环境中匹配或超越专门基线。

Comments Similarities found with a prior work. Hence, requesting for withdrawal until further notice

详情
AI中文摘要

我们提出了统一潜在动力学(ULD),一种新颖的强化学习算法,它统一了无模型方法的效率与基于模型方法的表示优势,且不产生规划开销。通过将状态-动作对嵌入到真实值函数近似线性的潜在空间中,我们的方法支持跨不同领域使用单一超参数集,从低维和像素输入的连续控制到高维Atari游戏。我们证明,在温和条件下,基于嵌入的时序差分更新的不动点与相应线性基于模型的值扩展的不动点一致,并推导了将嵌入保真度与值逼近质量相关联的显式误差界。在实践中,ULD采用编码器、值函数和策略网络的同步更新、短视界预测动力学的辅助损失以及奖励尺度归一化,以确保在稀疏奖励下的稳定学习。在涵盖Gym运动控制、DeepMind Control(本体感觉和视觉)以及Atari的80个环境上的评估表明,我们的方法匹配或超过了专门的无模型基线和通用基于模型基线的性能,以最少的调参和更少的参数实现了跨领域能力。这些结果表明,仅与值对齐的潜在表示就能提供传统上归因于完整基于模型规划的适应性和样本效率。

英文摘要

We present Unified Latent Dynamics (ULD), a novel reinforcement learning algorithm that unifies the efficiency of model-free methods with the representational strengths of model-based approaches, without incurring planning overhead. By embedding state-action pairs into a latent space in which the true value function is approximately linear, our method supports a single set of hyperparameters across diverse domains -- from continuous control with low-dimensional and pixel inputs to high-dimensional Atari games. We prove that, under mild conditions, the fixed point of our embedding-based temporal-difference updates coincides with that of a corresponding linear model-based value expansion, and we derive explicit error bounds relating embedding fidelity to value approximation quality. In practice, ULD employs synchronized updates of encoder, value, and policy networks, auxiliary losses for short-horizon predictive dynamics, and reward-scale normalization to ensure stable learning under sparse rewards. Evaluated on 80 environments spanning Gym locomotion, DeepMind Control (proprioceptive and visual), and Atari, our approach matches or exceeds the performance of specialized model-free and general model-based baselines -- achieving cross-domain competence with minimal tuning and a fraction of the parameter footprint. These results indicate that value-aligned latent representations alone can deliver the adaptability and sample efficiency traditionally attributed to full model-based planning.

2602.11189 2026-06-04 q-bio.BM cs.AI 版本更新

MuCO: Generative Peptide Cyclization Empowered by Multi-stage Conformation Optimization

MuCO:基于多阶段构象优化的生成式肽环化

Yitian Wang, Fanmeng Wang, Angxiao Yue, Wentao Guo, Yaning Cui, Hongteng Xu

发表机构 * Gaoling School of Artificial Intelligence, Renmin University of China, Beijing, China(中国人民大学高瓴人工智能学院,北京,中国) Beijing Key Laboratory of Research on Large Models(北京大模型研究关键实验室) Engineering Research Center of Next-Generation Intelligent Search(下一代智能搜索工程研究中心)

AI总结 提出MuCO方法,通过多阶段构象优化生成环肽构象,在物理稳定性、结构多样性和计算效率上优于现有方法。

详情
AI中文摘要

建模肽环化对于虚拟筛选具有理想物理和药物特性的候选肽至关重要。这一任务具有挑战性,因为环肽通常呈现多样化的环状构象,而由线性肽折叠推导出的确定性预测模型无法很好地捕捉这些构象。在本研究中,我们提出MuCO(多阶段构象优化),一种生成式肽环化方法,对以相应线性肽为条件的环肽构象分布进行建模。原则上,MuCO将肽环化任务解耦为三个阶段:拓扑感知的主链设计、生成式侧链打包和物理感知的全原子优化,从而以从粗到细的方式生成和优化环肽构象。这种多阶段框架实现了用于构象生成的高效并行采样策略,并允许快速探索多样化的低能构象。在大型CPSea数据集上的实验表明,MuCO在物理稳定性、结构多样性、二级结构恢复和计算效率方面显著且一致地优于最先进的方法,使其成为探索和设计环肽的有前景的计算工具。所提出方法的演示可在https://github.com/mianqiu00/MuCO找到。

英文摘要

Modeling peptide cyclization is critical for the virtual screening of candidate peptides with desirable physical and pharmaceutical properties. This task is challenging because a cyclic peptide often exhibits diverse, ring-shaped conformations, which cannot be well captured by deterministic prediction models derived from linear peptide folding. In this study, we propose MuCO (Multi-stage Conformation Optimization), a generative peptide cyclization method that models the distribution of cyclic peptide conformations conditioned on the corresponding linear peptide. In principle, MuCO decouples the peptide cyclization task into three stages: topology-aware backbone design, generative side-chain packing, and physics-aware all-atom optimization, thereby generating and optimizing conformations of cyclic peptides in a coarse-to-fine manner. This multi-stage framework enables an efficient parallel sampling strategy for conformation generation and allows for rapid exploration of diverse, low-energy conformations. Experiments on the large-scale CPSea dataset demonstrate that MuCO significantly and consistently outperforms state-of-the-art methods in physical stability, structural diversity, secondary structure recovery, and computational efficiency, making it a promising computational tool for exploring and designing cyclic peptides. The demo of the proposed method can be found at https://github.com/mianqiu00/MuCO.

2510.26219 2026-06-04 cs.LG cs.AI 版本更新

Test-time reward-guided alignment of language models by importance sampling on pre-logit space

基于预逻辑空间重要性采样的测试时奖励引导语言模型对齐

Sekitoshi Kanai, Tsukasa Yoshida, Hiroshi Takahashi, Haru Kuroki, Kazumune Hashimoto

发表机构 * NTT, Inc.(NTT公司) Toyohashi University of Technology(丰桥技术科学大学) The University of Osaka(大阪大学)

AI总结 提出一种基于预逻辑空间自适应重要性采样的测试时对齐方法AISP,通过高斯扰动和重要性采样优化奖励期望,在样本效率上优于最佳-of-n采样和其他测试时对齐方法。

Comments 24 pages, 10 figures

详情
AI中文摘要

大型语言模型(LLM)的测试时对齐因其微调计算成本高而受到关注。本文提出一种新的测试时奖励引导对齐方法,称为基于预逻辑的自适应重要性采样(AISP),该方法基于随机控制输入的采样模型预测控制。AISP将高斯扰动应用于预逻辑(倒数第二层的输出),以最大化相对于扰动均值的期望奖励。我们证明,通过重要性采样和采样奖励可以获得最优均值。AISP在使用样本数量方面的奖励优于最佳-of-n采样,并且比其他基于奖励的测试时对齐方法获得更高的奖励。

英文摘要

Test-time alignment of large language models (LLMs) has attracted attention because fine-tuning LLMs incurs high computational costs. In this paper, we propose a new test-time reward-guided alignment method called adaptive importance sampling on pre-logits (AISP), which builds on sampling-based model predictive control with a stochastic control input. AISP applies a Gaussian perturbation to the pre-logits, which are the outputs of the penultimate layer, so as to maximize the expected reward with respect to the mean of the perturbation. We demonstrate that the optimal mean is obtained by importance sampling with sampled rewards. AISP outperforms best-of-n sampling in terms of reward per number of samples used and achieves higher rewards than other reward-based test-time alignment methods.
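
The following is a minimal sketch of the kind of importance-weighted mean update used in sampling-based model predictive control (MPPI-style), reduced to a 1-D toy case. Whether AISP uses exactly this exponentiated-reward weighting is an assumption; the `temperature` parameter, the function name, and the sample values are hypothetical.

```python
import math

def importance_weighted_mean(perturbations, rewards, temperature=1.0):
    """Softmax-weighted average of sampled perturbations (1-D toy case)."""
    top = max(rewards)  # subtract the max reward for numerical stability
    weights = [math.exp((r - top) / temperature) for r in rewards]
    total = sum(weights)
    return sum(w * eps for w, eps in zip(weights, perturbations)) / total

# Hypothetical samples: three perturbations and the reward each one earned.
perturbations = [-1.0, 0.0, 1.0]
rewards = [0.0, 1.0, 2.0]

# The updated mean is pulled toward the perturbation with the highest reward.
new_mean = importance_weighted_mean(perturbations, rewards)
```

Unlike best-of-n, which keeps only the single highest-reward sample, this kind of update reuses every sample in proportion to its weight.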

2506.06006 2026-06-04 cs.CV cs.AI cs.CL 版本更新

Can VLMs Predict Future States? Bootstrapping World Models from Inverse Dynamics

视觉语言模型能预测未来状态吗?从逆动力学引导世界模型

Yifu Qiu, Yftah Ziser, Anna Korhonen, Shay B. Cohen, Edoardo M. Ponti

发表机构 * Institute for Language, Cognition and Computation, University of Edinburgh(语言、认知与计算研究所,爱丁堡大学) Language Technology Lab, University of Cambridge(语言技术实验室,剑桥大学) NVIDIA(NVIDIA公司) University of Groningen(格罗宁根大学)

AI总结 本文发现视觉语言模型(VLM)难以直接进行前向动力学预测(FDP),但逆动力学预测(IDP)更容易学习,并利用IDP通过弱监督学习和推理时验证两种策略引导FDP,在Aurora-Bench上取得与最先进图像编辑模型竞争的性能。

详情
AI中文摘要

统一的视觉语言模型(VLM)能否执行前向动力学预测(FDP),即根据先前的观察和(语言形式的)动作预测未来状态(图像形式)?我们发现VLM难以根据指令生成帧之间物理上合理的过渡。然而,我们识别出多模态基础中的一个关键不对称性:微调VLM学习逆动力学预测(IDP)——有效地描述帧之间的动作——比学习FDP容易得多。反过来,IDP可以通过两种主要策略引导FDP:1)来自合成数据的弱监督学习,以及2)推理时验证。首先,IDP可以为未标记的视频帧观察对标注动作,以扩大FDP的训练数据规模。其次,IDP可以为FDP的多个样本分配奖励以对其进行评分,从而在推理时有效指导搜索。我们通过Aurora-Bench上的以动作为中心的图像编辑任务,使用两个VLM家族评估了这两种策略产生的FDP。尽管仍然是通用模型,我们的最佳模型实现了与最先进的图像编辑模型竞争的性能,根据GPT4o作为评判,在Aurora-Bench的所有子集上,性能提高了7%到13%,并获得了最佳平均人类评估。

英文摘要

Can unified vision-language models (VLMs) perform forward dynamics prediction (FDP), i.e., predicting the future state (in image form) given the previous observation and an action (in language form)? We find that VLMs struggle to generate physically plausible transitions between frames from instructions. Nevertheless, we identify a crucial asymmetry in multimodal grounding: fine-tuning a VLM to learn inverse dynamics prediction (IDP), effectively captioning the action between frames, is significantly easier than learning FDP. In turn, IDP can be used to bootstrap FDP through two main strategies: 1) weakly supervised learning from synthetic data and 2) inference-time verification. Firstly, IDP can annotate actions for unlabelled pairs of video frame observations to expand the training data scale for FDP. Secondly, IDP can assign rewards to multiple samples of FDP to score them, effectively guiding search at inference time. We evaluate the FDP resulting from both strategies through the task of action-centric image editing on Aurora-Bench with two families of VLMs. Despite remaining general-purpose, our best model achieves a performance competitive with state-of-the-art image editing models, improving on them by a margin between 7% and 13% according to GPT4o-as-judge, and achieving the best average human evaluation across all subsets of Aurora-Bench.

2511.13391 2026-06-04 cs.LG cs.AI math.CO math.MG 版本更新

Finding Kissing Numbers with Game-theoretic Reinforcement Learning

用博弈论强化学习寻找亲吻数

Chengdong Ma, Théo Tao Zhaowei, Pengyu Li, Minghao Liu, Haojun Chen, Zihao Mao, Bo Li, Yuan Cheng, Yuan Qi, Yaodong Yang

发表机构 * Institute for Artificial Intelligence, Peking University(北京大学人工智能研究院) Shanghai Academy of AI for Science(上海人工智能科学研究院) Artificial Intelligence Innovation and Incubation Institute, Fudan University(复旦大学人工智能创新与孵化院)

AI总结 将亲吻数问题转化为合作矩阵补全博弈,利用强化学习系统PackingStar在极值配置空间中探索,改进了15个保持数十年的亲吻数及其推广的界,并发现了新的可解释几何结构。

详情
AI中文摘要

自1694年牛顿首次研究亲吻数问题以来,确定中心球周围非重叠球的最大数量一直是离散几何中的一个决定性挑战。作为希尔伯特第18问题的局部类比,它在几何、数论和信息论中具有深远意义。尽管格和编码取得了显著进展,但该领域局限于孤立的极值构型,掩盖了潜在的几何原理。在这里,我们将对象转移到更广泛的极值配置空间,从而为亲吻数问题开辟了一条新路径。因此,我们将该问题重新表述为一个合作矩阵补全博弈,并训练一个强化学习系统PackingStar来解决它。一个玩家填充余弦条目,而另一个玩家纠正次优条目,使爆炸性的几何复杂性变得可处理。在极值配置空间内工作,PackingStar发现了新的可解释几何结构,改进了15个在亲吻数及其推广中保持数十年的强界,其中几个在自然内积下被证明是最优的。这些发现揭示了Fischer群Fi22的第一个显式球面编码实现,扩展了子群结构的经典欧几里得表示,并直接启发了数学家的后续突破。总体而言,这项工作为人工智能在希尔伯特级别问题上的进展提供了一个早期示例,展示了强化学习通过解锁更具表现力的对象来推动数学发现。

英文摘要

Since Isaac Newton first studied the Kissing Number Problem in 1694, determining the maximal number of non-overlapping spheres around a central sphere has remained a defining challenge in discrete geometry. As the local analogue of Hilbert's 18th problem, it has profound implications across geometry, number theory and information theory. Although lattices and codes have achieved significant progress, the field is confined to isolated extremal configurations, leaving underlying geometric principles obscured. Here we shift the object to the broader extremal configuration space, thereby opening a new path for the Kissing Number Problem. Accordingly, we recast this problem as a cooperative matrix-completion game, and train a reinforcement learning system, PackingStar, to solve it. One player fills cosine entries while the other corrects suboptimal ones, making explosive geometric complexity tractable. Working within extremal configuration spaces, PackingStar discovers new interpretable geometric structures that improve 15 strong bounds held for decades in kissing numbers and their generalizations, several of them provably optimal under natural inner products. These findings reveal the first explicit spherical-code realization of the Fischer group Fi22, extend the classical Euclidean representation of subgroup structure, and directly inspire subsequent breakthroughs by mathematicians. Overall, the work provides an early example of AI-driven progress on a Hilbert-calibre problem, showing how reinforcement learning advances mathematical discovery by unlocking more expressive objects.
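
The "cosine entries" in the abstract refer to the pairwise inner products of the unit vectors pointing at the touching spheres: equal spheres around a central one do not overlap exactly when every pairwise cosine is at most 1/2. The sketch below checks that condition on 2-D examples; the function names and tolerance are illustrative choices, not part of PackingStar.

```python
import math

def is_kissing_configuration(points, tol=1e-9):
    """True if all points are unit vectors with pairwise cosine <= 1/2."""
    for p in points:
        if abs(math.sqrt(sum(x * x for x in p)) - 1.0) > tol:
            return False
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            cosine = sum(a * b for a, b in zip(points[i], points[j]))
            if cosine > 0.5 + tol:
                return False
    return True

def regular_polygon(n):
    """n unit vectors equally spaced on the circle."""
    return [(math.cos(2 * math.pi * k / n), math.sin(2 * math.pi * k / n))
            for k in range(n)]

hexagon_ok = is_kissing_configuration(regular_polygon(6))    # adjacent angle is 60 degrees
heptagon_ok = is_kissing_configuration(regular_polygon(7))   # adjacent angle is below 60 degrees
```

Six circles fit around a central circle in the plane while seven equally spaced ones do not, which matches the known kissing number of 6 in two dimensions.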

2602.09075 2026-06-04 cs.LG cs.AI 版本更新

Learning to Remember, Learn, and Forget in Attention-Based Models

在基于注意力的模型中学习记忆、学习和遗忘

Djohan Bonnet, Jamie Lohoff, Jan Finkbeiner, Elidona Shiqerukaj, Emre Neftci

发表机构 * University of Cambridge(剑桥大学)

AI总结 提出Palimpsa模型,将上下文学习视为持续学习问题,通过贝叶斯元可塑性解决稳定性-可塑性困境,显著提升记忆容量,在MQAR和常识推理任务上优于基线。

详情
AI中文摘要

Transformer中的上下文学习(ICL)作为一种在线联想记忆,被认为是其在复杂序列处理任务中高性能的基础。然而,在门控线性注意力模型中,这种记忆具有固定容量且容易受到干扰,尤其是对于长序列。我们提出Palimpsa,一种自注意力模型,将ICL视为必须解决稳定性-可塑性困境的持续学习问题。Palimpsa使用贝叶斯元可塑性,其中每个注意力状态的可塑性绑定到一个由捕获累积知识的先验分布支撑的重要性状态。我们证明各种门控线性注意力模型作为特定的架构选择和后验近似出现,并且Mamba2是Palimpsa的一个特例,其中遗忘占主导。这一理论联系使得任何非元可塑性模型都能转化为元可塑性模型,从而显著扩展其记忆容量。我们的实验表明,Palimpsa在多查询联想回忆(MQAR)基准和常识推理任务上始终优于基线。

英文摘要

In-Context Learning (ICL) in transformers acts as an online associative memory and is believed to underpin their high performance on complex sequence processing tasks. However, in gated linear attention models, this memory has a fixed capacity and is prone to interference, especially for long sequences. We propose Palimpsa, a self-attention model that views ICL as a continual learning problem that must address a stability-plasticity dilemma. Palimpsa uses Bayesian metaplasticity, where the plasticity of each attention state is tied to an importance state grounded by a prior distribution that captures accumulated knowledge. We demonstrate that various gated linear attention models emerge as specific architecture choices and posterior approximations, and that Mamba2 is a special case of Palimpsa where forgetting dominates. This theoretical link enables the transformation of any non-metaplastic model into a metaplastic one, significantly expanding its memory capacity. Our experiments show that Palimpsa consistently outperforms baselines on the Multi-Query Associative Recall (MQAR) benchmark and on Commonsense Reasoning tasks.
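
The fixed-capacity interference described above can be illustrated with the generic linear-attention memory recurrence, S <- decay * S + v k^T, read out as S q. This is a sketch of that standard recurrence with a scalar decay, not of Palimpsa's Bayesian update; the dimensions, keys, and values are made-up assumptions.

```python
def write(state, key, value, decay=1.0):
    """Update S <- decay * S + value key^T for a d_v x d_k state matrix."""
    return [[decay * s + v * k for s, k in zip(row, key)]
            for row, v in zip(state, value)]

def read(state, query):
    """Return S @ query."""
    return [sum(s * q for s, q in zip(row, query)) for row in state]

d_k, d_v = 2, 2
state = [[0.0] * d_k for _ in range(d_v)]

# Two orthonormal keys fit without interference in a 2-dimensional key space.
state = write(state, key=[1.0, 0.0], value=[3.0, 4.0])
state = write(state, key=[0.0, 1.0], value=[5.0, 6.0])
clean = read(state, [1.0, 0.0])

# A third key must overlap with the first two, so writing it disturbs the old recall.
r = 0.5 ** 0.5
state = write(state, key=[r, r], value=[7.0, 8.0])
disturbed = read(state, [1.0, 0.0])
```

With `decay` below 1 the old associations fade instead of accumulating, which trades interference for forgetting; this is the stability-plasticity tension the abstract refers to.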

2602.09464 2026-06-04 cs.SE cs.AI cs.CL 版本更新

AlgoVeri: An Aligned Benchmark for Verified Code Generation on Classical Algorithms

AlgoVeri:面向经典算法的验证代码生成对齐基准

Haoyu Zhao, Ziran Yang, Jiawei Li, Deyuan He, Zenan Li, Chi Jin, Venugopal V. Veeravalli, Aarti Gupta, Sanjeev Arora

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 为解决跨范式验证代码生成评估缺乏统一方法的问题,提出AlgoVeri基准,在Dafny、Verus和Lean三种语言上评估77个经典算法的验证代码生成,揭示不同验证系统的能力差距。

Comments Accepted to ICML 2026, 32 pages

详情
AI中文摘要

验证代码生成指从严格规范生成形式化验证的代码。近期AI模型在验证代码生成方面展现出潜力,但缺乏跨范式的统一评估方法。现有基准仅测试单一语言/工具(如Dafny、Verus和Lean),且各自覆盖非常不同的任务,因此性能数据无法直接比较。我们通过AlgoVeri基准填补这一空白,该基准在Dafny、Verus和Lean上评估77个经典算法的验证代码生成。通过强制使用相同的功能契约,AlgoVeri揭示了验证系统中的关键能力差距。前沿模型在Dafny中取得了可观的成功(Gemini-3 Flash为40.3%),其中高层抽象和SMT自动化简化了工作流,但在Verus的系统级内存约束(24.7%)和Lean所需的显式证明构造(7.8%)下性能急剧下降。除了总体指标,我们还发现了测试时计算动态的显著差异:Gemini-3有效利用迭代修复提升性能(例如,在Dafny中通过率提高三倍),而GPT-OSS则早期饱和。最后,我们的错误分析表明,语言设计影响改进轨迹:Dafny允许模型专注于逻辑正确性,而Verus和Lean将模型困在持久的语法和语义障碍中。所有数据和评估代码可在https://github.com/haoyuzhao123/algoveri获取。

英文摘要

Vericoding refers to the generation of formally verified code from rigorous specifications. Recent AI models show promise in vericoding, but a unified methodology for cross-paradigm evaluation is lacking. Existing benchmarks test only individual languages/tools (e.g., Dafny, Verus, and Lean) and each covers very different tasks, so the performance numbers are not directly comparable. We address this gap with AlgoVeri, a benchmark that evaluates vericoding of 77 classical algorithms in Dafny, Verus, and Lean. By enforcing identical functional contracts, AlgoVeri reveals critical capability gaps in verification systems. While frontier models achieve tractable success in Dafny (40.3% for Gemini-3 Flash), where high-level abstractions and SMT automation simplify the workflow, performance collapses under the systems-level memory constraints of Verus (24.7%) and the explicit proof construction required by Lean (7.8%). Beyond aggregate metrics, we uncover a sharp divergence in test-time compute dynamics: Gemini-3 effectively utilizes iterative repair to boost performance (e.g., tripling pass rates in Dafny), whereas GPT-OSS saturates early. Finally, our error analysis shows that language design affects the refinement trajectory: while Dafny allows models to focus on logical correctness, Verus and Lean trap models in persistent syntactic and semantic barriers. All data and evaluation code can be found at https://github.com/haoyuzhao123/algoveri.

2509.25289 2026-06-04 cs.LG cs.AI 版本更新

ClustRecNet: A Novel End-to-End Deep Learning Framework for Clustering Algorithm Recommendation

ClustRecNet: 一种用于聚类算法推荐的新型端到端深度学习框架

Mohammadreza Bakhtyari, Bogdan Mazoure, Renato Cordeiro de Amorim, Guillaume Rabusseau, Vladimir Makarenkov

发表机构 * Département d’Informatique, Université du Québec à Montréal(魁北克大学蒙特利尔分校计算机科学系) Mila - Quebec AI Institute(魁北克人工智能研究所) School of Computer Science and EE, University of Essex(埃塞克斯大学计算机科学与电子工程学院) Department of Computer Science and Operations Research, Université de Montréal(蒙特利尔大学计算机科学与运筹学系)

AI总结 提出ClustRecNet,一种端到端深度学习框架,通过直接学习原始表格数据的高阶表示来推荐合适的聚类算法,在合成和真实基准上优于传统内部聚类有效性指标和AutoML方法。

Comments Published in IEEE Access

详情
Journal ref
IEEE Access, vol. 14, pp. 81352 - 81365, 2026
AI中文摘要

为给定数据集识别有效的聚类算法仍然是一个基本的无监督学习问题。我们引入了ClustRecNet,一种新颖的端到端深度学习框架,通过直接学习原始表格数据的高阶表示来推荐合适的聚类算法。为了促进稳健的元学习,我们首先构建了一个包含34,000个合成数据集的综合存储库,涵盖了多种聚类场景,运行了10种流行的聚类算法,并使用调整兰德指数(ARI)建立真实标签。ClustRecNet的架构包含一个卷积块、两个残差块和一个注意力块,以捕获局部和全局结构模式,有效绕过了与手动特征工程相关的知识瓶颈。在合成和真实世界基准上的广泛评估表明,ClustRecNet始终优于传统的内部聚类有效性指标,如轮廓系数、Calinski-Harabasz、Davies-Bouldin和Dunn,以及最先进的自动化机器学习(AutoML)方法,如ML2DAC、AutoCluster和AutoML4Clust。例如,我们的框架在合成数据上平均比Calinski-Harabasz聚类有效性指数高出0.497的ARI增益,在真实世界基准上平均比领先的AutoML方法(ML2DAC)高出44.16%的ARI改进。代码和数据可在以下网址获取:https://github.com/mrbakhtyari/ClustRecNet

英文摘要

Identifying an effective clustering algorithm for a given dataset remains a fundamental unsupervised learning issue. We introduce ClustRecNet, a novel end-to-end deep learning framework that recommends suitable clustering algorithm(s) by directly learning high-order representations of raw tabular data. To facilitate robust meta-learning, we first construct a comprehensive repository of 34,000 synthetic datasets encompassing a large variety of clustering scenarios, run 10 popular clustering algorithms, and use Adjusted Rand Index (ARI) to establish ground-truth labels. ClustRecNet's architecture incorporates a convolution block, two residual blocks, and an attention block to capture local and global structural patterns, effectively bypassing the knowledge bottleneck associated with manual feature engineering. Extensive evaluation on both synthetic and real-world benchmarks demonstrates that ClustRecNet consistently outperforms traditional internal cluster validity indices such as Silhouette, Calinski-Harabasz, Davies-Bouldin, and Dunn as well as state-of-the-art Automated Machine Learning (AutoML) approaches such as ML2DAC, AutoCluster, and AutoML4Clust. For example, our framework achieves an average 0.497 ARI gain over the Calinski-Harabasz cluster validity index on synthetic data and an average 44.16% ARI improvement over the leading AutoML approach (ML2DAC) on real-world benchmarks. Code and data are available at: https://github.com/mrbakhtyari/ClustRecNet
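
Since the ground-truth labels above are built from the Adjusted Rand Index, a small reference implementation may help when reading the reported ARI gains. This is a sketch of the standard pair-counting formula written for this digest, not code from the ClustRecNet repository.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index between two flat clusterings of the same items."""
    n = len(labels_true)
    if n < 2:
        return 1.0  # no pairs to compare
    sum_cells = sum(comb(c, 2) for c in Counter(zip(labels_true, labels_pred)).values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_rows * sum_cols / comb(n, 2)   # agreement expected by chance
    maximum = (sum_rows + sum_cols) / 2           # agreement of a perfect match
    if maximum == expected:  # degenerate case, e.g. both clusterings are one cluster
        return 1.0
    return (sum_cells - expected) / (maximum - expected)

same = adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0])      # same partition, labels swapped
crossed = adjusted_rand_index([0, 0, 1, 1], [0, 1, 0, 1])   # partitions that cut across each other
```

ARI is invariant to how clusters are numbered, equals 1 for identical partitions, is near 0 for chance-level agreement, and can be negative when two partitions agree less than chance.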

2601.20800 2026-06-04 cs.LG cs.AI 版本更新

Conditional PED-ANOVA: Hyperparameter Importance in Hierarchical & Dynamic Search Spaces

条件PED-ANOVA:层次与动态搜索空间中的超参数重要性

Kaito Baba, Yoshihiko Ozaki, Shuhei Watanabe

发表机构 * Preferred Networks, Inc.(Preferred Networks公司) The University of Tokyo(东京大学) SB Intuitions Corp.(SB Intuitions公司)

AI总结 提出条件PED-ANOVA框架,用于估计条件搜索空间中超参数的重要性,通过闭式估计器准确反映条件激活和域变化,实验证明其优于朴素适应方法。

Comments 20 pages, 15 figures. Accepted to the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD 2026)

详情
AI中文摘要

我们提出条件PED-ANOVA(condPED-ANOVA),一个用于估计条件搜索空间中超参数重要性(HPI)的原则性框架,其中超参数的存在或域可能依赖于其他超参数。尽管原始PED-ANOVA提供了一种快速有效的方法来估计搜索空间内高性能区域的HPI,但它假设一个固定的、无条件的搜索空间,因此无法正确处理条件超参数。为了解决这个问题,我们引入了针对高性能区域的条件HPI,并推导出一个闭式估计器,能够准确反映条件激活和域变化。实验表明,现有HPI估计器的朴素适应在条件设置下会产生误导性或不可解释的重要性,而condPED-ANOVA始终提供反映底层条件结构的有意义的重要性。我们的代码公开在https://github.com/kAIto47802/condPED-ANOVA。

英文摘要

We propose conditional PED-ANOVA (condPED-ANOVA), a principled framework for estimating hyperparameter importance (HPI) in conditional search spaces, where the presence or domain of a hyperparameter can depend on other hyperparameters. Although the original PED-ANOVA provides a fast and efficient way to estimate HPI within the top-performing regions of the search space, it assumes a fixed, unconditional search space and therefore cannot properly handle conditional hyperparameters. To address this, we introduce a conditional HPI for top-performing regions and derive a closed-form estimator that accurately reflects conditional activation and domain changes. Experiments show that naive adaptations of existing HPI estimators yield misleading or uninterpretable importances in conditional settings, whereas condPED-ANOVA consistently provides meaningful importances that reflect the underlying conditional structure. Our code is publicly available at https://github.com/kAIto47802/condPED-ANOVA.

2602.06911 2026-06-04 cs.CR cs.AI 版本更新

TamperBench: Systematically Stress-Testing LLM Safety Under Fine-Tuning and Tampering

TamperBench:系统化压力测试微调和篡改下的LLM安全性

Saad Hossain, Tom Tseng, Punya Syon Pandey, Samanvay Vajpayee, Matthew Kowal, Nayeema Nonta, Samuel Simko, Stephen Casper, Zhijing Jin, Kellin Pelrine, Sirisha Rambhatla

发表机构 * Critical ML Lab Waterloo Canada(Waterloo大学Critical ML实验室) FAR.AI Berkeley USA(伯克利美国FAR.AI公司) University of Toronto Toronto Canada(多伦多大学) University of Waterloo Waterloo Canada(Waterloo大学) ETH Zürich Zürich Switzerland(苏黎世联邦理工学院) MIT CSAIL Cambridge USA(麻省理工学院CSAIL实验室) University of Toronto, MPI, EuroSafeAI, Vector Institute Toronto Canada(多伦多大学、马克斯·普朗克研究所、EuroSafeAI、Vector Institute) Critical ML Lab University of Waterloo Waterloo Canada(Waterloo大学Critical ML实验室)

AI总结 提出统一框架TamperBench,通过系统化超参数扫描评估21个开源LLM在9种篡改威胁下的安全性和实用性,发现越狱微调是最严重攻击,当前对齐阶段防御基本失效。

Comments 25 pages, 15 figures

详情
AI中文摘要

随着能力日益增强的开源大语言模型(LLMs)的部署,提高其抵抗意外或故意不安全修改的篡改能力对于最小化风险变得至关重要。然而,目前没有标准方法来评估篡改抵抗性。不同的数据集、指标和篡改配置使得难以比较不同模型和防御之间的安全性、实用性和鲁棒性。为解决这一问题,我们引入了TamperBench,这是第一个系统评估LLM篡改抵抗性的统一框架。TamperBench (i) 整理了最先进的权重空间微调攻击、潜在空间表示攻击和对齐阶段防御的仓库;(ii) 通过每个攻击-模型对的系统化超参数扫描实现现实的对抗性评估;(iii) 提供安全性和实用性评估。我们使用TamperBench评估了21个开源LLM,包括增强防御的变体,针对九种篡改威胁,使用标准化的安全性和能力指标,并对每个模型-攻击对进行超参数扫描。结果提供了包括后训练对篡改抵抗性的影响、越狱微调通常是最严重的攻击以及当前对齐阶段防御基本无法抵御攻击扫描等见解。代码可在 https://github.com/criticalml-uw/TamperBench 获取。

英文摘要

As increasingly capable open-weight large language models (LLMs) are deployed, improving their tamper resistance against unsafe modifications, whether accidental or intentional, becomes critical to minimize risks. However, there is no standard approach to evaluate tamper resistance. Varied datasets, metrics, and tampering configurations make it difficult to compare safety, utility, and robustness across different models and defenses. To address this, we introduce TamperBench, the first unified framework to systematically evaluate the tamper resistance of LLMs. TamperBench (i) curates a repository of state-of-the-art weight-space fine-tuning attacks, latent-space representation attacks, and alignment-stage defenses; (ii) enables realistic adversarial evaluation through systematic hyperparameter sweeps per attack-model pair; and (iii) provides both safety and utility evaluations. We use TamperBench to evaluate 21 open-weight LLMs, including defense-augmented variants, across nine tampering threats using standardized safety and capability metrics with hyperparameter sweeps per model-attack pair. The results provide insights including effects of post-training on tamper resistance, that jailbreak-tuning is typically the most severe attack, and that current alignment-stage defenses largely fail to withstand attack sweeps. Code is available at https://github.com/criticalml-uw/TamperBench.

2602.04101 2026-06-04 cs.AI 版本更新

Interfaze: The Future of AI is built on Task-Specific Small Models

Interfaze: 人工智能的未来建立在特定任务的小模型之上

Harsha Vardhan Khurdula, Vineet Agarwal, Yoeven D Khemlani

发表机构 * GitHub

AI总结 提出Interfaze混合模型,通过共享嵌入空间将任务特定深度神经网络融合到Transformer解码器中,在多个确定性基准上以低成本达到高精度。

Comments 10 pages, 2 figures

详情
AI中文摘要

我们提出Interfaze,一种原生混合模型,通过共享嵌入空间将任务特定的深度神经网络(CNN和DNN)直接融合到Transformer解码器中。专门的感知编码器处理复杂多语言PDF的光学字符识别(OCR)、开放词汇对象和图形用户界面(GUI)检测,以及带说话人分离的多语言语音识别。每个编码器通过任务特定的适配器暴露,并可独立激活,因此查询仅触及所需的参数。内置的动作基础提供接地外部状态:代理无头浏览器和爬虫、代码沙箱、多域网络索引和可扩展向量存储。解码器过滤并合并这些信号,在任务需要时进行推理,并输出基于置信度的确定性结果。原始专家元数据(边界框、置信度分数、时间戳)被保留并作为前文与答案一起返回。在此架构上,Interfaze-Beta在确定性开发者任务基准套件中领先。它在OCRBench v2上达到70.7%,在olmOCR上达到85.7%,在RefCOCO上达到82.1%,在VoxPopuli上词错误率2.4%,在Spider-2.0-Lite上达到52.9%,在GPQA-Diamond上达到92.4%,在MMMLU上达到90.9%,在MMMU-Pro上达到71.1%,在结构化输出基准(SOB)上值准确率80.5%,在每个任务上都优于价格相当的通才模型(Gemini-3-Flash、Gemini-3.5-Flash、Claude-Sonnet-4.6、GPT-5.4-Mini和Grok-4.3)。由于融合的专家编码器通过单次传递而非重复工具调用大型模型来解决感知问题,Interfaze在确定性任务上以Flash级别的成本运行,同时达到高精度并提供可验证的元数据。

英文摘要

We present Interfaze, a native hybrid model that fuses task-specific deep neural networks (CNNs and DNNs) directly into a transformer decoder through a shared embedding space. Specialized perceptual encoders handle optical character recognition (OCR) over complex multilingual PDFs, open-vocabulary object and graphical user interface (GUI) detection, and multilingual speech recognition with diarization. Each is exposed through a task-specific adapter and can be activated on its own, so a query touches only the parameters it needs. A built-in action foundation supplies a grounded external state: a proxied headless browser and scraper, a code sandbox, a multi-domain web index, and a scalable vector store. The decoder filters and merges these signals, reasons over them when a task requires it, and emits deterministic outputs built on confidence. The raw specialist metadata (bounding boxes, confidence scores, timestamps) is preserved and returned alongside the answer as precontext. On this architecture, Interfaze-Beta leads a suite of deterministic developer-task benchmarks. It reaches 70.7% on OCRBench v2, 85.7% on olmOCR, 82.1% on RefCOCO, a 2.4% word error rate on VoxPopuli, 52.9% on Spider-2.0-Lite, 92.4% on GPQA-Diamond, 90.9% on MMMLU, 71.1% on MMMU-Pro, and 80.5% value accuracy on the Structured Output Benchmark (SOB), ahead of comparably priced generalist models (Gemini-3-Flash, Gemini-3.5-Flash, Claude-Sonnet-4.6, GPT-5.4-Mini, and Grok-4.3) on every task. Because fused specialist encoders resolve perception in a single pass instead of through repeated tool calls into a large model, Interfaze reaches high accuracy with verifiable metadata on deterministic tasks while running at flash-tier cost.

2601.09719 2026-06-04 cs.CL cs.AI 版本更新

Bounded Hyperbolic Tangent: A Stable and Efficient Alternative to Pre-Layer Normalization in Large Language Models

有界双曲正切:大型语言模型中预层归一化的稳定高效替代方案

Hoyoon Byun, Youngjun Choi, Taero Kim, Sungrae Park, Kyungwoo Song

发表机构 * Yonsei University(延世大学) Upstage AI

AI总结 提出BHyT,通过有界双曲正切和数据驱动的输入约束替代Pre-LN,在保持稳定性的同时提升训练和推理效率。

Comments Accepted to ICML 2026

详情
AI中文摘要

预层归一化(Pre-LN)是大型语言模型(LLM)的事实标准,对于稳定预训练和有效迁移学习至关重要。然而,Pre-LN会带来重复的统计计算开销,并且仍然容易受到深度诅咒的影响,即随着层数增加,隐藏状态幅度和方差增大,破坏训练稳定性。面向效率的无归一化方法(如Dynamic Tanh (DyT))提高了吞吐量,但在深度下仍然脆弱。为了同时解决稳定性和效率问题,我们提出了有界双曲正切(BHyT),作为Pre-LN的直接替代方案。BHyT将tanh非线性与显式的、数据驱动的输入边界相结合,使激活值保持在非饱和范围内。它防止了激活幅度和方差随深度增长,并提供了理论稳定性保证。在效率方面,BHyT每个块仅计算一次精确统计量,并用轻量级方差近似替代第二次归一化。实验表明,BHyT在预训练期间表现出更好的稳定性和效率,与RMSNorm相比,平均训练速度提升1.6%,平均token生成吞吐量提升1.77%,同时在语言理解和推理基准上保持强大的预训练-only和SFT后性能。代码见:https://github.com/MLAI-Yonsei/BHyT

英文摘要

Pre-Layer Normalization (Pre-LN) is the de facto choice for large language models (LLMs) and is crucial for stable pretraining and effective transfer learning. However, Pre-LN incurs repeated statistical-computation overhead and remains vulnerable to the curse of depth, where hidden-state magnitudes and variances grow as the number of layers increases, destabilizing training. Efficiency-oriented normalization-free methods such as Dynamic Tanh (DyT) improve throughput but remain fragile at depth. To jointly address stability and efficiency, we propose Bounded Hyperbolic Tanh (BHyT), a drop-in replacement for Pre-LN. BHyT combines a tanh nonlinearity with explicit, data-driven input bounding to keep activations within a non-saturating range. It prevents depth-wise growth in activation magnitude and variance and provides a theoretical stability guarantee. For efficiency, BHyT computes exact statistics once per block and replaces a second normalization with a lightweight variance approximation. Empirically, BHyT demonstrates improved stability and efficiency during pretraining, achieving an average of 1.6% faster training and an average of 1.77% higher token generation throughput compared to RMSNorm, while maintaining strong pretraining-only and post-SFT performance across language understanding and reasoning benchmarks. Code is available at: https://github.com/MLAI-Yonsei/BHyT
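To make the saturation issue concrete, here is a minimal Python sketch. It is not the BHyT formula, which the abstract does not give. The function `dyt` follows the general shape of Dynamic Tanh (a tanh applied to a scaled input, without its learned affine terms), and `bounded_tanh` is a hypothetical variant in which the "data-driven bound" is assumed to be a rescale by the vector's RMS followed by a clip to `[-bound, bound]`. All names and the default `bound=2.0` are illustrative assumptions.

```python
import math

def rms(xs):
    # Root-mean-square of a vector; the data-driven statistic assumed here.
    return math.sqrt(sum(x * x for x in xs) / len(xs))

def dyt(xs, alpha=1.0):
    # Dynamic-Tanh-style map: tanh of a scaled input, with no input bounding.
    return [math.tanh(alpha * x) for x in xs]

def bounded_tanh(xs, bound=2.0, eps=1e-6):
    # Hypothetical bounded variant (an assumption, not the paper's method):
    # rescale by the RMS, clip to [-bound, bound], then apply tanh.
    scale = rms(xs) + eps
    clipped = [max(-bound, min(bound, x / scale)) for x in xs]
    return [math.tanh(c) for c in clipped]

# Large hidden-state magnitudes, as might occur deep in a network.
hidden = [100.0, -200.0, 300.0, 50.0]
saturated = dyt(hidden)
bounded = bounded_tanh(hidden)
```

With large inputs the unbounded map sits at the flat ends of tanh, while the bounded variant keeps every output strictly inside `tanh(bound)`.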

2602.02405 2026-06-04 cs.LG cs.AI 版本更新

Making Expert Reasoning Learnable with Self-Distillation

通过自蒸馏使专家推理可学习

Ethan Mendes, Jungsoo Park, Alan Ritter

发表机构 * Georgia Institute of Technology, Atlanta, Georgia(佐治亚理工学院,亚特兰大,佐治亚州)

AI总结 提出分布对齐模仿学习(DAIL),通过两步自蒸馏方法弥合专家解决方案与模型分布之间的差距,利用少量高质量专家数据显著提升大语言模型的推理能力。

Comments ICML 2026

详情
AI中文摘要

提升大语言模型(LLM)的推理能力通常依赖于模型采样正确解以进行强化,或存在更强模型来解决问题。然而,许多难题即使对当前前沿模型也难以处理,阻碍了有效训练信号的提取。一个有前景的替代方案是利用高质量的人类专家解决方案,但直接模仿这些数据从根本上存在分布外问题:专家解决方案通常具有教学性质,包含为人类读者而非计算模型设计的隐含推理间隙。此外,高质量专家解决方案成本高昂,需要可泛化且样本高效的训练方法。我们提出分布对齐模仿学习(DAIL),一种两步自蒸馏方法,通过首先将专家解决方案转化为详细的、分布内的推理轨迹,然后应用对比目标使学习聚焦于专家见解和方法,从而弥合分布差距。我们发现,DAIL可以利用少于1000个高质量专家解决方案,在Qwen2.5-Instruct和Qwen3上实现高达31%的pass@128增益,推理效率翻倍,并实现域外泛化。

英文摘要

Improving the reasoning capabilities of large language models (LLMs) typically relies either on the model's ability to sample a correct solution to be reinforced or the existence of a stronger model able to solve the problem. However, many difficult problems remain intractable for even current frontier models, preventing the extraction of valid training signals. A promising alternative is to leverage high-quality expert human solutions, yet naive imitation of this data fails because it is fundamentally out-of-distribution: expert solutions are typically didactic, containing implicit reasoning gaps intended for human readers rather than computational models. Furthermore, high-quality expert solutions are expensive, necessitating generalizable, sample-efficient training methods. We propose Distribution Aligned Imitation Learning (DAIL), a two-step self-distillation method that bridges the distributional gap by first transforming expert solutions into detailed, in-distribution reasoning traces and then applying a contrastive objective to focus learning on expert insights and methodologies. We find that DAIL can leverage fewer than 1000 high-quality expert solutions to achieve up to 31% pass@128 gains on Qwen2.5-Instruct and Qwen3, double reasoning efficiency, and enable out-of-domain generalization.

2602.01658 2026-06-04 cs.LG cs.AI 版本更新

Efficient Adversarial Attacks on High-dimensional Offline Bandits

高维离线Bandits的高效对抗攻击

Seyed Mohammad Hadi Hosseini, Amir Najafi, Mahdieh Soleymani Baghshah

发表机构 * Department of Computer Engineering, Sharif University of Technology(谢里夫理工大学计算机工程系)

AI总结 研究离线bandit训练在奖励模型被对抗扰动时的脆弱性,提出高维威胁模型,证明维度增加时攻击所需扰动范数减小,实验验证了针对性攻击的高成功率。

Comments Published at ICLR 2026 Conference

详情
AI中文摘要

Bandit算法最近成为评估机器学习模型(包括生成图像模型和大语言模型)的强大工具,通过高效识别表现最佳的候选者而无需详尽比较。这些方法通常依赖于奖励模型(常在Hugging Face等平台上以公共权重发布)向bandit提供反馈。在线评估昂贵且需要重复试验,而使用记录数据的离线评估已成为有吸引力的替代方案。然而,离线bandit评估的对抗鲁棒性在很大程度上尚未被探索,特别是当攻击者在bandit训练之前扰动奖励模型(而非训练数据)时。在这项工作中,我们通过理论和实证研究离线bandit训练对奖励模型对抗操纵的脆弱性来填补这一空白。我们引入了一种新颖的威胁模型,其中攻击者利用高维环境中的离线数据劫持bandit的行为。从线性奖励函数开始,扩展到非线性模型如ReLU神经网络,我们研究了用于生成模型评估的两个Hugging Face评估器上的攻击:一个测量美学质量,另一个评估组合对齐。我们的结果表明,即使对奖励模型权重进行微小、不可察觉的扰动,也能显著改变bandit的行为。从理论角度来看,我们证明了一个显著的高维效应:随着输入维度的增加,成功攻击所需的扰动范数减小,使得现代应用如图像评估尤其脆弱。大量实验证实,简单的随机扰动无效,而精心设计的针对性攻击实现了近乎完美的攻击成功率。

英文摘要

Bandit algorithms have recently emerged as a powerful tool for evaluating machine learning models, including generative image models and large language models, by efficiently identifying top-performing candidates without exhaustive comparisons. These methods typically rely on a reward model, often distributed with public weights on platforms such as Hugging Face, to provide feedback to the bandit. While online evaluation is expensive and requires repeated trials, offline evaluation with logged data has become an attractive alternative. However, the adversarial robustness of offline bandit evaluation remains largely unexplored, particularly when an attacker perturbs the reward model (rather than the training data) prior to bandit training. In this work, we fill this gap by investigating, both theoretically and empirically, the vulnerability of offline bandit training to adversarial manipulations of the reward model. We introduce a novel threat model in which an attacker exploits offline data in high-dimensional settings to hijack the bandit's behavior. Starting with linear reward functions and extending to nonlinear models such as ReLU neural networks, we study attacks on two Hugging Face evaluators used for generative model assessment: one measuring aesthetic quality and the other assessing compositional alignment. Our results show that even small, imperceptible perturbations to the reward model's weights can drastically alter the bandit's behavior. From a theoretical perspective, we prove a striking high-dimensional effect: as input dimensionality increases, the perturbation norm required for a successful attack decreases, making modern applications such as image evaluation especially vulnerable. Extensive experiments confirm that naive random perturbations are ineffective, whereas carefully targeted perturbations achieve near-perfect attack success rates ...
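As a rough illustration of weight-space attacks on a linear reward model, the sketch below computes the smallest L2 perturbation of a weight vector `w` that makes a chosen target arm outscore the currently preferred arm. This is a textbook projection onto a half-space for a two-arm toy case, not the attack from the paper. The function names and the `margin` parameter are assumptions introduced for the example, and the two arms are assumed to have distinct feature vectors.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def l2_norm(v):
    return math.sqrt(dot(v, v))

def min_flip_perturbation(w, x_best, x_target, margin=1e-6):
    # d points from the target arm to the currently preferred arm.
    d = [a - b for a, b in zip(x_best, x_target)]
    gap = dot(w, d)  # current reward advantage of x_best over x_target
    if gap < 0:
        return [0.0] * len(w)  # the target arm is already preferred
    # Move w against d just far enough to make the gap equal to -margin.
    step = (gap + margin) / dot(d, d)
    return [-step * di for di in d]

w = [1.0, 0.0]
x_best, x_target = [1.0, 0.0], [0.0, 1.0]
delta = min_flip_perturbation(w, x_best, x_target)
w_attacked = [wi + di for wi, di in zip(w, delta)]
```

In this toy setting the required norm is `(gap + margin) / ||d||`, so it shrinks whenever the feature difference between the two arms has a larger norm. That gives one intuition for a dimension-dependent effect, though the paper's theorem may rest on a different argument.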

2602.01619 2026-06-04 cs.LG cs.AI 版本更新

SUSD: Structured Unsupervised Skill Discovery through State Factorization

SUSD: 通过状态分解的结构化无监督技能发现

Seyed Mohammad Hadi Hosseini, Mahdieh Soleymani Baghshah

发表机构 * Department of Computer Engineering(计算机工程系)

AI总结 提出SUSD框架,通过将状态空间分解为独立组件并分配不同技能变量,结合动态模型自适应引导探索,实现更丰富多样的无监督技能发现,并在分解环境中显著优于现有方法。

Comments Published as a conference paper at ICLR 2026

详情
AI中文摘要

无监督技能发现(USD)旨在无需外部奖励的情况下自主学习多样化的技能。最常见的USD方法之一是最大化技能潜在变量与状态之间的互信息(MI)。然而,基于MI的方法由于其不变性特性,倾向于偏好简单、静态的技能,限制了动态、任务相关行为的发现。距离最大化技能发现(DSD)通过利用状态空间距离促进更动态的技能,但仍未能鼓励涵盖环境中所有可控因素或实体的全面技能集。在这项工作中,我们引入了SUSD,一种新颖的框架,通过将状态空间分解为独立组件(例如,物体或可控实体)来利用环境的组合结构。SUSD将不同的技能变量分配给不同的因素,从而实现对技能发现过程的更细粒度控制。一个动态模型还跟踪各因素的学习情况,自适应地将智能体的注意力引导至未充分探索的因素。这种结构化方法不仅促进了更丰富、更多样化技能的发现,还产生了一种分解的技能表示,能够对单个实体进行细粒度且解耦的控制,从而通过分层强化学习(HRL)促进组合下游任务的高效训练。我们在三个环境中的实验结果(因素数量从1到10)表明,我们的方法能够在无监督的情况下发现多样且复杂的技能,在分解和复杂环境中显著优于现有的无监督技能发现方法。代码公开于:https://github.com/hadi-hosseini/SUSD。

英文摘要

Unsupervised Skill Discovery (USD) aims to autonomously learn a diverse set of skills without relying on extrinsic rewards. One of the most common USD approaches is to maximize the Mutual Information (MI) between skill latent variables and states. However, MI-based methods tend to favor simple, static skills due to their invariance properties, limiting the discovery of dynamic, task-relevant behaviors. Distance-Maximizing Skill Discovery (DSD) promotes more dynamic skills by leveraging state-space distances, yet still falls short in encouraging comprehensive skill sets that engage all controllable factors or entities in the environment. In this work, we introduce SUSD, a novel framework that harnesses the compositional structure of environments by factorizing the state space into independent components (e.g., objects or controllable entities). SUSD allocates distinct skill variables to different factors, enabling more fine-grained control over the skill discovery process. A dynamic model also tracks learning across factors, adaptively steering the agent's focus toward underexplored factors. This structured approach not only promotes the discovery of richer and more diverse skills, but also yields a factorized skill representation that enables fine-grained and disentangled control over individual entities, which facilitates efficient training of compositional downstream tasks via Hierarchical Reinforcement Learning (HRL). Our experimental results across three environments, with factors ranging from 1 to 10, demonstrate that our method can discover diverse and complex skills without supervision, significantly outperforming existing unsupervised skill discovery methods in factorized and complex environments. Code is publicly available at: https://github.com/hadi-hosseini/SUSD.

2601.15158 2026-06-04 cs.LG cs.AI 版本更新

Outcome-Based RL Provably Leads Transformers to Reason, but Only With the Right Data

基于结果的强化学习可证明地引导Transformer进行推理,但仅在合适的数据条件下

Yuval Ran-Milo, Yotam Alexander, Shahar Mendel, Nadav Cohen

发表机构 * Tel Aviv University(特拉维夫大学)

AI总结 本文通过分析单层Transformer在合成图遍历任务上的策略梯度动力学,证明了基于结果的强化学习能够使Transformer自发学习出结构化的迭代推理算法,并揭示了训练数据中“简单示例”的分布对推理能力涌现的关键作用。

Comments 94 pages, 7 figures

详情
AI中文摘要

通过基于结果的监督进行强化学习训练的Transformer可以自发地生成中间推理步骤(思维链)。然而,稀疏奖励驱动策略梯度发现这种系统性推理的机制仍然知之甚少。我们通过分析单层Transformer在合成图遍历任务上的策略梯度动力学来解决这个问题,该任务没有思维链就无法解决,但允许简单的迭代解决方案。我们证明,尽管仅对最终答案的正确性进行训练,策略梯度仍驱动Transformer收敛到一个结构化的、可解释的算法,该算法逐顶点迭代遍历图。我们刻画了这种涌现所需的分布特性,识别出“简单示例”(即需要较少推理步骤的实例)的关键作用。当训练分布在这些更简单的示例上放置足够的质量时,Transformer学习到一种可泛化的遍历策略,能够外推到更长的链;当这种质量消失时,策略梯度学习变得不可行。我们通过在合成数据上的实验以及在数学推理任务中使用真实世界语言模型的实验来证实我们的理论结果,验证了我们的理论发现可以推广到实际场景。

英文摘要

Transformers trained via Reinforcement Learning (RL) with outcome-based supervision can spontaneously develop the ability to generate intermediate reasoning steps (Chain-of-Thought). Yet the mechanism by which sparse rewards drive policy gradient to discover such systematic reasoning remains poorly understood. We address this by analyzing the policy gradient dynamics of single-layer Transformers on a synthetic graph traversal task that cannot be solved without Chain-of-Thought but admits a simple iterative solution. We prove that despite training solely on final-answer correctness, policy gradient drives the Transformer to converge to a structured, interpretable algorithm that iteratively traverses the graph vertex-by-vertex. We characterize the distributional properties required for this emergence, identifying the critical role of "simple examples": instances requiring fewer reasoning steps. When the training distribution places sufficient mass on these simpler examples, the Transformer learns a generalizable traversal strategy that extrapolates to longer chains; when this mass vanishes, policy gradient learning becomes infeasible. We corroborate our theoretical results through experiments on synthetic data and with real-world language models on mathematical reasoning tasks, validating that our theoretical findings carry over to practical settings.

2512.21917 2026-06-04 cs.LG cs.AI econ.EM stat.ML 版本更新

Semiparametric Preference Optimization: Your Language Model is Secretly a Single-Index Model

半参数偏好优化:你的语言模型秘密地是一个单索引模型

Nathan Kallus

发表机构 * Netflix & Cornell University(Netflix与康奈尔大学)

AI总结 本文提出半参数偏好优化方法,通过放宽偏好与潜在奖励之间的链接函数假设,在未知且无限制的链接函数下进行策略对齐,并证明策略类的可实现性诱导出半参数单索引二元选择模型,直接学习策略并给出链接无关的收敛保证。

详情
AI中文摘要

策略对齐到偏好数据通常假设观察到的偏好与潜在奖励之间存在已知的链接函数(例如,Bradley-Terry模型/逻辑链接)。这种链接的错误设定可能会使推断的奖励产生偏差,并使学习到的策略偏离对齐。我们研究了在未知且无限制的链接函数下的策略对齐。我们提出了一个$f$-散度约束的奖励最大化问题,并表明策略类中的可实现性诱导出一个半参数单索引二元选择模型,其中标量策略诱导的索引捕获了所有对示范的依赖,而剩余的偏好分布是无限制的。与计量经济学中要求识别此类模型的结构参数并进行估计不同,我们开发了直接学习策略的方法,其中奖励函数是隐式的,分析了与最优策略的误差,并允许不可识别和非参数的索引。我们证明了基于通用函数复杂度度量的链接无关收敛保证,并通过实验验证了方法和理论。代码可在 https://github.com/causalml/spo/ 获取。

英文摘要

Policy alignment to preference data typically assumes a known link function between observed preferences and latent rewards (e.g., Bradley-Terry model / logistic link). Misspecification of this link can bias inferred rewards and misalign learned policies. We study policy alignment under an unknown and unrestricted link function. We formulate an $f$-divergence-constrained reward maximization problem and show that realizability in a policy class induces a semiparametric single-index binary choice model, where a scalar policy-induced index captures all dependence on demonstrations and the remaining preference distribution is unrestricted. Rather than impose identifiability of structural parameters of such a model and estimate them, as in econometrics, we develop methods that directly learn policies, with the reward function implicit, analyzing error to the optimal policy and allowing for unidentifiable and nonparametric indices. We prove link-agnostic convergence guarantees in terms of generic function complexity measures and validate the methods and theory empirically. Code is available at https://github.com/causalml/spo/.

2602.01146 2026-06-04 cs.AI 版本更新

PersistBench: When Should Long-Term Memories Be Forgotten by LLMs?

PersistBench: 大型语言模型何时应忘记长期记忆?

Sidharth Pulipaka, Oliver Chen, Manas Sharma, Taaha S Bajwa, Vyas Raina, Ivaxi Sheth

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出 PersistBench 基准,评估 LLM 长期记忆带来的跨域泄露和记忆诱导的谄媚安全风险,发现主流模型失败率高达 53% 和 97%。

Comments 76 pages, 34 figures, ICML (2026)

详情
AI中文摘要

对话助手正越来越多地将长期记忆与大型语言模型(LLM)集成。这种记忆的持久性,例如用户是素食主义者,可以增强未来对话中的个性化。然而,同样的持久性也可能引入很大程度上被忽视的安全风险。因此,我们引入 PersistBench 来衡量这些安全风险的程度。我们识别出两种长期记忆特有的风险:跨域泄露,即 LLM 不恰当地从长期记忆中注入上下文;以及记忆诱导的谄媚,即存储的长期记忆暗中强化用户偏见。我们在基准上评估了 18 个前沿和开源 LLM。我们的结果显示这些 LLM 的失败率高得惊人——跨域样本的中位失败率为 53%,谄媚样本为 97%。为了解决这个问题,我们的基准鼓励在前沿对话系统中开发更稳健、更安全的长期记忆使用方式。

英文摘要

Conversational assistants are increasingly integrating long-term memory with large language models (LLMs). This persistence of memories, e.g., the user is vegetarian, can enhance personalization in future conversations. However, the same persistence can also introduce safety risks that have been largely overlooked. Hence, we introduce PersistBench to measure the extent of these safety risks. We identify two long-term memory-specific risks: cross-domain leakage, where LLMs inappropriately inject context from the long-term memories; and memory-induced sycophancy, where stored long-term memories insidiously reinforce user biases. We evaluate 18 frontier and open-source LLMs on our benchmark. Our results reveal a surprisingly high failure rate across these LLMs - a median failure rate of 53% on cross-domain samples and 97% on sycophancy samples. To address this, our benchmark encourages the development of more robust and safer long-term memory usage in frontier conversational systems.

2601.21461 2026-06-04 cs.LG cs.AI 版本更新

L$^3$: Large Lookup Layers

L$^3$:大型查找层

Albert Tseng, Christopher De Sa

发表机构 * Department of Computer Science, Cornell University(康奈尔大学计算机科学系)

AI总结 提出Large Lookup Layer (L$^3$),通过静态基于token的路由聚合每个token的嵌入,实现稀疏性,在语言建模和下游任务中优于稠密模型和等稀疏MoE。

Comments ICML 2026

详情
AI中文摘要

现代稀疏语言模型通常通过混合专家(MoE)层实现稀疏性,该层动态地将token路由到稠密MLP“专家”。然而,动态硬路由存在一些缺点,例如潜在的硬件效率低下以及需要辅助损失来稳定训练。相比之下,分词器嵌入表本质上是稀疏的,通过为每个token选择单个嵌入来避免这些问题,但代价是没有上下文信息。在这项工作中,我们引入了大型查找层(L$^3$),它将嵌入表推广到模型解码器层,作为进一步扩展稀疏性的一种手段。L$^3$层使用基于token的静态路由,以上下文相关的方式聚合每个token的一组学习嵌入,允许模型通过将信息缓存在嵌入中有效地平衡内存和计算。L$^3$有两个主要组成部分:(1)一个系统友好的架构,允许快速训练和CPU卸载推理,且没有开销;(2)一种信息论嵌入分配算法,有效平衡速度和质量。我们通过训练具有多达2.6B活动参数的transformer来实证测试L$^3$,发现L$^3$在语言建模和下游任务中均显著优于稠密模型和等稀疏MoE。

英文摘要

Modern sparse language models typically achieve sparsity through Mixture-of-Experts (MoE) layers, which dynamically route tokens to dense MLP "experts." However, dynamic hard routing has a number of drawbacks, such as potentially poor hardware efficiency and needing auxiliary losses for stable training. In contrast, the tokenizer embedding table, which is natively sparse, largely avoids these issues by selecting a single embedding per token at the cost of not having contextual information. In this work, we introduce the Large Lookup Layer (L$^3$), which generalizes embedding tables to model decoder layers as a means of further scaling sparsity. L$^3$ layers use static token-based routing to aggregate a set of learned embeddings per token in a context-dependent way, allowing the model to efficiently balance memory and compute by caching information in embeddings. L$^3$ has two main components: (1) a systems-friendly architecture that allows for fast training and CPU-offloaded inference with no overhead, and (2) an information-theoretic embedding allocation algorithm that effectively balances speed and quality. We empirically test L$^3$ by training transformers with up to 2.6B active parameters and find that L$^3$ strongly outperforms both dense models and iso-sparse MoEs in both language modeling and downstream tasks.
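The idea of static token-based routing with context-dependent aggregation can be sketched in a few lines of Python. This is an illustrative guess at the general mechanism, not the L$^3$ architecture itself. It assumes every token id owns a fixed block of `rows_per_token` table rows (the paper instead describes an information-theoretic allocation), and that the hidden state weights those rows through a softmax over dot products. The names `lookup_layer`, `keys`, and `values` are hypothetical.

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def lookup_layer(token_id, hidden, keys, values, rows_per_token):
    # Static routing: the token id alone decides which table rows are read.
    start = token_id * rows_per_token
    rows = list(range(start, start + rows_per_token))
    # Context dependence: the hidden state decides how those rows are mixed.
    scores = [sum(h * k for h, k in zip(hidden, keys[r])) for r in rows]
    weights = softmax(scores)
    dim = len(values[0])
    return [sum(w * values[r][j] for w, r in zip(weights, rows)) for j in range(dim)]

# Two tokens, two rows each, 2-dimensional keys and values.
keys = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
out = lookup_layer(token_id=0, hidden=[10.0, 0.0], keys=keys, values=values, rows_per_token=2)
```

Because the rows read for a token never depend on the context, they could in principle be fetched ahead of time, which is the property the abstract associates with CPU-offloaded inference.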

2601.22450 2026-06-04 cs.LG cs.AI 版本更新

Tuning the Implicit Regularizer of Masked Diffusion Language Models: Enhancing Generalization via Insights from $k$-Parity

调整掩码扩散语言模型的隐式正则化器:通过$k$-奇偶问题的见解增强泛化能力

Jianhao Huang, Baharan Mirzasoleiman

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

AI总结 本文通过$k$-奇偶问题研究掩码扩散语言模型的泛化特性,理论分解其目标函数为信号和噪声两部分,并利用噪声作为隐式正则化器,通过优化掩码概率分布显著提升模型性能。

Comments ICML 2026

详情
AI中文摘要

掩码扩散语言模型最近成为一种强大的生成范式,但与自回归模型相比,其泛化特性仍未得到充分研究。本文在$k$-奇偶问题(计算$k$个相关位的异或和)的背景下研究这些特性,其中神经网络通常表现出“grokking”现象——长时间的性能平台期后突然泛化。我们从理论上将掩码扩散(MD)目标分解为驱动特征学习的信号机制和作为隐式正则化器的噪声机制。通过在$k$-奇偶问题上使用MD目标训练nanoGPT,我们证明MD目标从根本上改变了学习景观,实现了快速且同时的泛化,而无需经历grokking。此外,我们利用理论见解优化MD目标中掩码概率的分布。我们的方法显著提高了50M参数模型的困惑度,并在从头预训练和监督微调中均取得了优越结果。具体而言,在8B参数模型上,我们观察到性能提升分别达到$8.8\%$和$5.8\%$,证实了我们的框架在大规模掩码扩散语言模型中的可扩展性和有效性。

英文摘要

Masked Diffusion Language Models have recently emerged as a powerful generative paradigm, yet their generalization properties remain understudied compared to their auto-regressive counterparts. In this work, we investigate these properties within the setting of the $k$-parity problem (computing the XOR sum of $k$ relevant bits), where neural networks typically exhibit grokking -- a prolonged plateau of chance-level performance followed by sudden generalization. We theoretically decompose the Masked Diffusion (MD) objective into a Signal regime which drives feature learning, and a Noise regime which serves as an implicit regularizer. By training nanoGPT using the MD objective on the $k$-parity problem, we demonstrate that the MD objective fundamentally alters the learning landscape, enabling rapid and simultaneous generalization without experiencing grokking. Furthermore, we leverage our theoretical insights to optimize the distribution of the mask probability in the MD objective. Our method significantly improves perplexity for 50M-parameter models and achieves superior results across both pre-training from scratch and supervised fine-tuning. Specifically, we observe performance gains peaking at $8.8\%$ and $5.8\%$, respectively, on 8B-parameter models, confirming the scalability and effectiveness of our framework in large-scale masked diffusion language model regimes.
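For readers unfamiliar with the $k$-parity task, the following Python sketch generates labeled examples: the label is the XOR of the bits at $k$ relevant positions, and the remaining bits are distractors. The function names, the dataset layout, and the seed handling are assumptions for illustration and say nothing about how the paper builds its data.

```python
import random

def k_parity_label(bits, relevant):
    # XOR of the bits at the relevant positions; all other bits are ignored.
    label = 0
    for i in relevant:
        label ^= bits[i]
    return label

def make_dataset(n_samples, n_bits, relevant, seed=0):
    # Uniformly random bit strings paired with their parity labels.
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        bits = [rng.randint(0, 1) for _ in range(n_bits)]
        data.append((bits, k_parity_label(bits, relevant)))
    return data

# A 3-parity problem hidden inside 8-bit inputs.
dataset = make_dataset(n_samples=16, n_bits=8, relevant=[0, 2, 3])
```

The task is hard for gradient-based learners because no proper subset of the relevant bits is correlated with the label, so the model gets little signal until it has found all $k$ positions.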

2601.22396 2026-06-04 cs.CL cs.AI cs.CY cs.HC physics.soc-ph 版本更新

Culturally Grounded Personas in Large Language Models: Characterization and Alignment with Socio-Psychological Value Frameworks

大型语言模型中的文化基础人物角色:与社会心理价值框架的表征与对齐

Candida M. Greco, Lucio La Cava, Andrea Tagarelli

发表机构 * DIMES, University of Calabria, Italy(意大利卡拉布里亚大学DIMES研究所)

AI总结 本研究通过世界价值观调查、英格尔哈特-韦尔策尔文化地图和道德基础理论,评估大型语言模型生成的文化基础人物角色是否准确反映不同文化条件下的世界和道德价值体系,并分析其跨文化结构和道德变异。

Comments Under Review

详情
AI中文摘要

尽管大型语言模型(LLMs)在模拟人类行为方面的实用性日益增强,但这些合成人物角色在不同文化条件下是否准确反映世界和道德价值体系仍不确定。本文研究了合成、文化基础人物角色与既定框架(特别是世界价值观调查(WVS)、英格尔哈特-韦尔策尔文化地图和道德基础理论)的对齐情况。我们基于一组可解释的WVS衍生变量概念化并生成LLM人物角色,并通过三个互补视角检查生成的人物角色:在英格尔哈特-韦尔策尔地图上的定位,揭示其反映跨文化条件稳定差异的解释;与世界价值观调查在人口统计层面的一致性,其中响应分布大致追踪人类群体模式;以及源自道德基础问卷的道德轮廓,我们通过文化-道德映射分析道德响应如何在不同文化配置中变化。我们的文化基础人物角色生成和分析方法能够评估跨文化结构和道德变异。

英文摘要

Despite the growing utility of Large Language Models (LLMs) for simulating human behavior, the extent to which these synthetic personas accurately reflect world and moral value systems across different cultural conditionings remains uncertain. This paper investigates the alignment of synthetic, culturally-grounded personas with established frameworks, specifically the World Values Survey (WVS), the Inglehart-Welzel Cultural Map, and Moral Foundations Theory. We conceptualize and produce LLM-generated personas based on a set of interpretable WVS-derived variables, and we examine the generated personas through three complementary lenses: positioning on the Inglehart-Welzel map, which unveils their interpretation reflecting stable differences across cultural conditionings; demographic-level consistency with the World Values Survey, where response distributions broadly track human group patterns; and moral profiles derived from a Moral Foundations questionnaire, which we analyze through a culture-to-morality mapping to characterize how moral responses vary across different cultural configurations. Our approach of culturally-grounded persona generation and analysis enables evaluation of cross-cultural structure and moral variation.

2601.19921 2026-06-04 cs.CL cs.AI 版本更新

Demystifying Multi-Agent Debate: The Role of Confidence and Diversity

揭秘多智能体辩论:置信度与多样性的作用

Xiaochen Zhu, Caiqi Zhang, Yizhou Chi, Tom Stafford, Nigel Collier, Andreas Vlachos

发表机构 * University of Cambridge(剑桥大学) University of Sheffield(谢菲尔德大学)

AI总结 针对多智能体辩论(MAD)在提升大语言模型性能时效果不佳的问题,提出多样性感知初始化和置信度调节辩论协议两种轻量级干预方法,显著提升辩论有效性。

详情
AI中文摘要

多智能体辩论(MAD)被广泛用于通过测试时缩放提升大语言模型(LLM)性能,然而近期研究表明,尽管计算成本更高,普通MAD往往不如简单的多数投票。研究表明,在同质化智能体和统一信念更新下,辩论保持了预期的正确性,因此无法可靠地改善结果。借鉴人类审议和集体决策的研究发现,我们识别出普通MAD缺失的两个关键机制:(i)初始观点的多样性,以及(ii)明确且校准的置信度沟通。我们提出两种轻量级干预方法。首先,一种多样性感知初始化,选择更多样化的候选答案池,增加辩论开始时存在正确假设的可能性。其次,一种置信度调节的辩论协议,其中智能体表达校准后的置信度,并根据他人的置信度调节其更新。我们从理论上证明,多样性感知初始化在不改变底层更新动态的情况下提高了MAD成功的先验概率,而置信度调节更新使辩论能够系统地漂移到正确假设。在实验上,在六个面向推理的QA基准测试中,我们的方法始终优于普通MAD和多数投票。我们的结果将人类审议与基于LLM的辩论联系起来,并表明简单、有原则的修改可以显著增强辩论效果。

英文摘要

Multi-agent debate (MAD) is widely used to improve large language model (LLM) performance through test-time scaling, yet recent work shows that vanilla MAD often underperforms simple majority vote despite higher computational cost. Studies show that, under homogeneous agents and uniform belief updates, debate preserves expected correctness and therefore cannot reliably improve outcomes. Drawing on findings from human deliberation and collective decision-making, we identify two key mechanisms missing from vanilla MAD: (i) diversity of initial viewpoints and (ii) explicit, calibrated confidence communication. We propose two lightweight interventions. First, a diversity-aware initialisation that selects a more diverse pool of candidate answers, increasing the likelihood that a correct hypothesis is present at the start of debate. Second, a confidence-modulated debate protocol in which agents express calibrated confidence and condition their updates on others' confidence. We show theoretically that diversity-aware initialisation improves the prior probability of MAD success without changing the underlying update dynamics, while confidence-modulated updates enable debate to systematically drift to the correct hypothesis. Empirically, across six reasoning-oriented QA benchmarks, our methods consistently outperform vanilla MAD and majority vote. Our results connect human deliberation with LLM-based debate and demonstrate that simple, principled modifications can substantially enhance debate effectiveness.
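The difference between plain majority voting and a confidence-aware aggregate can be shown with a small Python sketch. It covers only the final aggregation step. The paper's protocol has agents condition their belief updates on one another's confidence during the debate itself, which is not modeled here. The function names and the assumption that confidences are numbers in [0, 1] are illustrative.

```python
from collections import Counter, defaultdict

def majority_vote(answers):
    # Each agent counts equally, regardless of how sure it is.
    return Counter(answers).most_common(1)[0][0]

def confidence_weighted_vote(answers, confidences):
    # Each agent's vote is weighted by its stated (ideally calibrated) confidence.
    totals = defaultdict(float)
    for answer, confidence in zip(answers, confidences):
        totals[answer] += confidence
    return max(totals, key=totals.get)

answers = ["A", "A", "B"]
confidences = [0.3, 0.3, 0.9]
plain = majority_vote(answers)
weighted = confidence_weighted_vote(answers, confidences)
```

Here two unsure agents outvote one confident agent under majority voting, while the weighted rule sides with the confident one. Whether that helps depends on the confidences being calibrated, which is the condition the abstract emphasizes.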

2512.03553 2026-06-04 cs.CV cs.AI 版本更新

Dynamic Content Moderation in Livestreams: Combining Supervised Classification with MLLM-Boosted Similarity Matching

直播中的动态内容审核:结合监督分类与MLLM增强的相似度匹配

Wei Chee Yew, Hailun Xu, Sanjay Saha, Xiaotian Fan, Hiok Hian Ong, David Yuchen Wang, Kanchan Sarkar, Zhenheng Yang, Danhui Guan

发表机构 * TikTok Singapore Singapore(TikTok新加坡) TikTok San Jose United States(TikTok美国圣何塞) TikTok Shanghai China(TikTok上海中国)

AI总结 提出一种混合审核框架,结合监督分类和基于参考的相似度匹配,利用多模态大语言模型提升准确性,在保持轻量推理的同时实现大规模直播内容审核。

Comments To be published at KDD 2026 (ADS track)

详情
AI中文摘要

内容审核对于大规模用户生成视频平台仍然是一个关键且具有挑战性的任务,尤其是在直播环境中,审核必须及时、多模态,并且能够应对不断演变的不良内容形式。我们提出了一个在生产规模部署的混合审核框架,该框架将已知违规的监督分类与针对新颖或微妙情况的基于参考的相似度匹配相结合。这种混合设计能够稳健地检测出明确违规以及传统分类器无法检测到的新颖边缘情况。多模态输入(文本、音频、视觉)通过两个流水线处理,多模态大语言模型(MLLM)将知识提炼到每个流水线中,以提高准确性,同时保持推理轻量。在生产中,分类流水线在80%精确率下达到67%召回率,相似度流水线在80%精确率下达到76%召回率。大规模A/B测试显示,用户对不良直播的观看次数减少了6-8%。这些结果表明了一种可扩展且适应性强的多模态内容治理方法,能够处理明确违规和新兴对抗行为。

英文摘要

Content moderation remains a critical yet challenging task for large-scale user-generated video platforms, especially in livestreaming environments where moderation must be timely, multimodal, and robust to evolving forms of unwanted content. We present a hybrid moderation framework deployed at production scale that combines supervised classification for known violations with reference-based similarity matching for novel or subtle cases. This hybrid design enables robust detection of both explicit violations and novel edge cases that evade traditional classifiers. Multimodal inputs (text, audio, visual) are processed through both pipelines, with a multimodal large language model (MLLM) distilling knowledge into each to boost accuracy while keeping inference lightweight. In production, the classification pipeline achieves 67% recall at 80% precision, and the similarity pipeline achieves 76% recall at 80% precision. Large-scale A/B tests show a 6-8% reduction in user views of unwanted livestreams. These results demonstrate a scalable and adaptable approach to multimodal content governance, capable of addressing both explicit violations and emerging adversarial behaviors.
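The "recall at 80% precision" figures quoted above refer to a standard operating-point metric. The Python sketch below shows one common way to compute it from scored examples: sweep the decision threshold from the highest score downward and keep the best recall among thresholds whose precision meets the target. The function name and the assumption of distinct scores are illustrative, and the paper's exact evaluation procedure is not stated in the abstract.

```python
def recall_at_precision(scores, labels, target_precision):
    # Rank examples from most to least confident; labels are 1 (violation) or 0.
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_positives = sum(labels)
    if total_positives == 0:
        return 0.0
    best_recall = 0.0
    true_positives = 0
    for flagged, (_, label) in enumerate(ranked, start=1):
        true_positives += label
        precision = true_positives / flagged
        if precision >= target_precision:
            best_recall = max(best_recall, true_positives / total_positives)
    return best_recall

scores = [0.9, 0.8, 0.7, 0.6, 0.5]
labels = [1, 1, 0, 1, 0]
r80 = recall_at_precision(scores, labels, 0.80)
r75 = recall_at_precision(scores, labels, 0.75)
```

Fixing precision and reporting recall makes two pipelines comparable at the same false-alarm budget, which is why both production numbers are quoted at the same 80% precision.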

2506.10912 2026-06-04 cs.AI cs.CL 版本更新

Breaking Bad Molecules: Are MLLMs Ready for Structure-Level Molecular Detoxification?

Breaking Bad Molecules: MLLMs 是否准备好进行结构级分子解毒?

Fei Lin, Ziyang Gong, Cong Wang, Tengchao Zhang, Yonglin Tian, Yining Jiang, Ji Dai, Chao Guo, Xiaotong Yu, Xue Yang, Gen Luo, Fei-Yue Wang

发表机构 * Department of Engineering Science, Macau University of Science and Technology, Macau, China(澳门科学技术大学工程科学系) School of Computer Science, Shanghai Jiao Tong University, Shanghai, China(上海交通大学计算机科学学院) Institute of Automation, Chinese Academy of Sciences, Beijing, China(中国科学院自动化研究所) School of Pharmacy, Macau University of Science and Technology, Macau, China(澳门科学技术大学药学院) Faculty of Electrical Engineering and Computer Science, Ningbo University, Ningbo, China(宁波大学电气与计算机科学学院) State Key Laboratory of Biopharmaceutical Preparation and Delivery, Institute of Process Engineering, Chinese Academy of Sciences, Beijing, China(中国科学院生物制药制备与递送国家重点实验室) School of Automation and Intelligent Sensing, Shanghai Jiao Tong University, Shanghai, China(上海交通大学自动化与智能感知学院) Shanghai Artificial Intelligence Laboratory, Shanghai, China(上海人工智能实验室)

AI总结 本文提出 ToxiMol 基准任务,利用多模态大语言模型进行分子毒性修复,并构建数据集、提示流程和自动评估框架 ToxiEval,实验表明当前模型虽面临挑战但展现出毒性理解与结构编辑的潜力。

详情
AI中文摘要

毒性仍然是早期药物开发失败的主要原因。尽管分子设计和性质预测取得了进展,但分子毒性修复任务——生成结构有效且毒性降低的分子替代物——尚未被系统定义或基准化。为填补这一空白,我们引入了 ToxiMol,这是首个针对通用多模态大语言模型(MLLMs)的分子毒性修复基准任务。我们构建了一个标准化数据集,涵盖 11 个主要任务和 660 个代表性有毒分子,覆盖多种机制和粒度。我们设计了一个具有机制感知和任务自适应能力的提示注释流程,并基于专家毒理学知识。同时,我们提出了一个自动评估框架 ToxiEval,将毒性终点预测、合成可及性、类药性和结构相似性集成到高通量评估链中,用于修复成功评估。我们系统评估了 43 个主流通用 MLLMs,并进行了多项消融研究,以分析关键问题,包括评估指标、候选多样性和失败归因。实验结果表明,尽管当前 MLLMs 在此任务上仍面临重大挑战,但它们开始展现出在毒性理解、语义约束遵循和结构感知编辑方面的有前景的能力。

英文摘要

Toxicity remains a leading cause of early-stage drug development failure. Despite advances in molecular design and property prediction, the task of molecular toxicity repair, generating structurally valid molecular alternatives with reduced toxicity, has not yet been systematically defined or benchmarked. To fill this gap, we introduce ToxiMol, the first benchmark task for general-purpose Multimodal Large Language Models (MLLMs) focused on molecular toxicity repair. We construct a standardized dataset covering 11 primary tasks and 660 representative toxic molecules spanning diverse mechanisms and granularities. We design a prompt annotation pipeline with mechanism-aware and task-adaptive capabilities, informed by expert toxicological knowledge. In parallel, we propose an automated evaluation framework, ToxiEval, which integrates toxicity endpoint prediction, synthetic accessibility, drug-likeness, and structural similarity into a high-throughput evaluation chain for repair success. We systematically assess 43 mainstream general-purpose MLLMs and conduct multiple ablation studies to analyze key issues, including evaluation metrics, candidate diversity, and failure attribution. Experimental results show that although current MLLMs still face significant challenges on this task, they begin to demonstrate promising capabilities in toxicity understanding, semantic constraint adherence, and structure-aware editing.

2601.18777 2026-06-04 cs.LG cs.AI cs.CL cs.IR stat.AP 版本更新

PRECISE: Reducing the Bias of LLM Evaluations Using Prediction-Powered Ranking Estimation

PRECISE: 使用预测驱动的排名估计减少LLM评估的偏差

Abhishek Divekar, Anirban Majumder

发表机构 * Primary contributor and corresponding author(主要贡献者及通讯作者)

AI总结 提出PRECISE框架,通过结合少量人工标注与LLM判断,利用预测驱动推断(PPI)方法,在低资源下可靠估计搜索、排序和RAG系统的指标,并校正LLM偏差。

Comments Accepted at AAAI 2026 - Innovative Applications of AI (IAAI-26)

详情
AI中文摘要

评估搜索、排序和RAG系统的质量传统上需要大量人工相关性标注。近年来,一些已部署的系统探索使用大型语言模型(LLM)作为自动评判者,但其固有偏差阻碍了直接用于指标估计。我们提出了一个扩展预测驱动推断(PPI)的统计框架,将最少的人工标注与LLM判断相结合,以生成需要子实例标注的指标的可靠估计。我们的方法仅需少至100个人工标注查询和10,000个未标注示例,相比传统方法显著减少了标注需求。我们为基于LLM的查询改写应用中的相关性提升推断制定了所提出的框架(PRECISE),将PPI扩展到查询-文档级别的子实例标注。通过重新制定指标集成空间,我们将计算复杂度从O(2^|C|)降低到O(2^K),其中|C|表示语料库大小(百万量级)。在多个著名检索数据集上的详细实验表明,我们的方法降低了业务关键指标Precision@K的估计方差,同时在低资源设置下有效校正了LLM偏差。

英文摘要

Evaluating the quality of search, ranking and RAG systems traditionally requires a significant number of human relevance annotations. Recently, several deployed systems have explored using Large Language Models (LLMs) as automated judges for this task, but their inherent biases prevent direct use for metric estimation. We present a statistical framework extending Prediction-Powered Inference (PPI) that combines minimal human annotations with LLM judgments to produce reliable estimates of metrics which require sub-instance annotations. Our method requires as few as 100 human-annotated queries and 10,000 unlabeled examples, reducing annotation requirements significantly compared to traditional approaches. We formulate our proposed framework (PRECISE) for inference of relevance uplift for an LLM-based query reformulation application, extending PPI to sub-instance annotations at the query-document level. By reformulating the metric-integration space, we reduce the computational complexity from O(2^|C|) to O(2^K), where |C| represents corpus size (on the order of millions). Detailed experiments across prominent retrieval datasets demonstrate that our method reduces the variance of estimates for the business-critical Precision@K metric, while effectively correcting for LLM bias in low-resource settings.
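The basic prediction-powered estimate of a mean can be written in a few lines of Python: average the LLM judgments on the large unlabeled set, then add a correction term (the "rectifier") measured on the small human-labeled set. This sketch shows only that textbook PPI point estimate. It does not implement the sub-instance extension or the reformulated metric-integration space that PRECISE describes, and all function and variable names are assumptions.

```python
def mean(xs):
    return sum(xs) / len(xs)

def ppi_mean(human_labels, llm_on_labeled, llm_on_unlabeled):
    # Rectifier: average gap between human and LLM judgments on the labeled set.
    rectifier = mean([h - l for h, l in zip(human_labels, llm_on_labeled)])
    # LLM-only estimate on the unlabeled set, corrected by the rectifier.
    return mean(llm_on_unlabeled) + rectifier

# Per-query relevance scores; the LLM judge is too generous on the labeled set.
human_labels = [1.0, 0.0, 1.0, 0.0]
llm_on_labeled = [1.0, 1.0, 1.0, 1.0]
llm_on_unlabeled = [1.0, 1.0, 0.0, 1.0]
estimate = ppi_mean(human_labels, llm_on_labeled, llm_on_unlabeled)
```

The LLM-only average on the unlabeled queries is 0.75, and the rectifier of -0.5 pulls the estimate down to 0.25. The correction is only as reliable as the labeled sample, which is why the abstract stresses the number of human-annotated queries.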

2601.18175 2026-06-04 cs.AI cs.LG cs.SY eess.SY stat.ML 版本更新

Success Conditioning as Policy Improvement: The Optimization Problem Solved by Imitating Success

成功条件化作为策略改进:模仿成功所解决的优化问题

Daniel Russo

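Affiliations * Daniel J. Russo

AI summary This paper proves that success conditioning (imitating successful trajectories) exactly solves a trust-region optimization problem whose χ² divergence constraint radius is determined automatically by the data, and establishes an identity relating relative policy improvement, the magnitude of policy change, and action-influence.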

详情
AI中文摘要

一种广泛使用的策略改进技术是成功条件化,即收集轨迹,识别那些实现期望结果的轨迹,并更新策略以模仿沿成功轨迹采取的动作。这一原则有许多名称——带SFT的拒绝采样、目标条件化RL、决策Transformer——但它解决了什么优化问题(如果有的话)一直不清楚。我们证明成功条件化精确求解了一个信任区域优化问题,在由数据自动确定半径的χ²散度约束下最大化策略改进。这产生了一个恒等式:相对策略改进、策略变化幅度以及我们称为动作影响(衡量动作选择中的随机变化如何影响成功率)的量在每个状态下都完全相等。因此,成功条件化表现为一个保守的改进算子。精确的成功条件化不会降低性能或引发危险的分布偏移,但当它失败时,它会以可观察的方式失败,即几乎不改变策略。我们将我们的理论应用于常见的回报阈值设定实践,表明这可以放大改进,但代价是可能与真实目标不一致。

英文摘要

A widely used technique for improving policies is success conditioning, in which one collects trajectories, identifies those that achieve a desired outcome, and updates the policy to imitate the actions taken along successful trajectories. This principle appears under many names -- rejection sampling with SFT, goal-conditioned RL, Decision Transformers -- yet what optimization problem it solves, if any, has remained unclear. We prove that success conditioning exactly solves a trust-region optimization problem, maximizing policy improvement subject to a $χ^2$ divergence constraint whose radius is determined automatically by the data. This yields an identity: relative policy improvement, the magnitude of policy change, and a quantity we call action-influence -- measuring how random variation in action choices affects success rates -- are exactly equal at every state. Success conditioning thus emerges as a conservative improvement operator. Exact success conditioning cannot degrade performance or induce dangerous distribution shift, but when it fails, it does so observably, by hardly changing the policy at all. We apply our theory to the common practice of return thresholding, showing this can amplify improvement, but at the cost of potential misalignment with the true objective.
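The identity stated in this abstract can be checked numerically in the simplest possible case. The sketch below assumes a single-state (bandit) problem, which is a simplification chosen here for illustration: success conditioning is then Bayes' rule, new_policy(a) ∝ policy(a) · P(success | a), and the relative improvement in success rate coincides with the χ² divergence between the new and old policies. The paper's definition of action-influence is not given in the abstract, so only two of the three quantities are compared; all numbers are hypothetical.

```python
def success_condition(policy, success_prob):
    """Condition a policy on success in a one-state problem (Bayes' rule)."""
    value = sum(p * q for p, q in zip(policy, success_prob))  # P(success) under policy
    new_policy = [p * q / value for p, q in zip(policy, success_prob)]
    return new_policy, value


# Hypothetical three-action bandit: action probabilities and per-action success rates.
policy = [0.5, 0.3, 0.2]
success_prob = [0.2, 0.5, 0.9]

new_policy, value = success_condition(policy, success_prob)
new_value = sum(p * q for p, q in zip(new_policy, success_prob))

# Relative policy improvement: (V_new - V_old) / V_old.
relative_improvement = (new_value - value) / value
# Chi-squared divergence of the new policy from the old one.
chi2 = sum((n - p) ** 2 / p for n, p in zip(new_policy, policy))

print(relative_improvement, chi2)
```

Both quantities reduce to Var(q) / V² under the old policy, which is why they agree; they also both vanish when every action has the same success rate, matching the abstract's remark that failure shows up as a barely changed policy.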

2601.17363 2026-06-04 cs.CL cs.AI 版本更新

Do readers prefer AI-generated Italian short stories?

读者是否更喜欢AI生成的意大利短篇小说?

Michael Farrell

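Affiliations * IULM University, Milan, Italy

AI summary A blind experiment comparing Italian short stories generated by AI (ChatGPT-4o) with one by the renowned author Alberto Moravia finds that the AI texts received slightly higher average ratings and were preferred more often, although the differences were modest and unrelated to demographics or reading habits.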

Comments 8 pages, peer-reviewed and accepted for presentation at New Trends in Translation and Interpreting Technology (NeTTIT 2026), paged-up for publication

详情
AI中文摘要

本研究调查读者是否更喜欢AI生成的意大利短篇小说,而非著名意大利作家创作的作品。在盲测设置中,20名参与者阅读并评估了三篇故事,其中两篇由ChatGPT-4o生成,一篇由Alberto Moravia创作,参与者不知晓故事来源。为探索潜在影响因素,还收集了阅读习惯和人口统计数据,包括年龄、性别、教育程度和母语。结果显示,AI编写的文本平均评分略高,且更常被偏好,尽管差异不大。文本偏好与人口统计或阅读习惯变量之间未发现统计学显著关联。这些发现挑战了读者偏好人类创作小说的假设,并引发了关于在文学语境中是否需要编辑合成文本的问题。

英文摘要

This study investigates whether readers prefer AI-generated short stories in Italian over one written by a renowned Italian author. In a blind setup, 20 participants read and evaluated three stories, two created with ChatGPT-4o and one by Alberto Moravia, without being informed of their origin. To explore potential influencing factors, reading habits and demographic data, comprising age, gender, education and first language, were also collected. The results showed that the AI-written texts received slightly higher average ratings and were more frequently preferred, although differences were modest. No statistically significant associations were found between text preference and demographic or reading-habit variables. These findings challenge assumptions about reader preference for human-authored fiction and raise questions about the necessity of synthetic-text editing in literary contexts.

2601.06196 2026-06-04 cs.LG cs.AI cs.CL 版本更新

Geometry-Aware Hallucination Detection in Large Language Models

大语言模型中的几何感知幻觉检测

Bodla Krishna Vamshi, Rohan Bhatnagar, Haizhao Yang

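Affiliations * University of Maryland, College Park

AI summary Proposes GA-ICL, a framework that uses latent representations from frozen LLMs to model local manifold structure and class-prototype geometry when selecting in-context demonstrations for hallucination detection; it outperforms standard selection baselines in most evaluated settings on the FEVER and HaluEval benchmarks.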

详情
AI中文摘要

大型语言模型(LLM)经常生成事实不正确或未经支持的内容,通常称为幻觉。先前的工作探索了解码策略、检索增强和监督微调用于幻觉检测,而最近的研究表明,上下文学习(ICL)可以显著影响事实可靠性。然而,现有的ICL示例选择方法通常依赖于表面相似性启发式方法,并且在任务和模型上表现出有限的鲁棒性。我们提出GA-ICL,一种几何感知的示例采样框架,用于选择上下文示例,该框架利用从冻结LLM中提取的潜在表示。通过联合建模局部流形结构和类别感知的原型几何,GA-ICL根据示例与学习原型的接近程度进行选择,而不仅仅是基于词汇或嵌入相似性。在事实验证(FEVER)和幻觉检测(HaluEval)基准上,GA-ICL在大多数评估设置中优于标准ICL选择基线,在对话和摘要任务上尤其有显著提升。该方法在温度扰动和模型变化下保持鲁棒性,表明与启发式检索策略相比具有更高的稳定性。虽然在较小模型规模下的某些问答场景中,词汇检索仍可能具有竞争力,但我们的结果表明,几何感知的原型选择为幻觉检测提供了一种可靠且训练轻量的方法,无需修改LLM参数。在Phi-14B和Qwen3-32B上的扩展评估证实,GA-ICL能有效扩展到更大模型,在包括较小模型显示边界条件限制的问答任务在内的所有比较基线上均表现优异,为改进ICL示例选择提供了原则性方向。

英文摘要

Large language models (LLMs) frequently generate factually incorrect or unsupported content, commonly referred to as hallucinations. Prior work has explored decoding strategies, retrieval augmentation, and supervised fine-tuning for hallucination detection, while recent studies show that in-context learning (ICL) can substantially influence factual reliability. However, existing ICL demonstration selection methods often rely on surface-level similarity heuristics and exhibit limited robustness across tasks and models. We propose GA-ICL, a geometry-aware demonstration sampling framework for selecting in-context demonstrations that leverages latent representations extracted from frozen LLMs. By jointly modeling local manifold structure and class-aware prototype geometry, GA-ICL selects demonstrations based on their proximity to learned prototypes rather than lexical or embedding similarity alone. Across factual verification (FEVER) and hallucination detection (HaluEval) benchmarks, GA-ICL outperforms standard ICL selection baselines in the majority of evaluated settings, with particularly strong gains on dialogue and summarization tasks. The method remains robust under temperature perturbations and model variation, indicating improved stability compared to heuristic retrieval strategies. While lexical retrieval can remain competitive in certain question-answering regimes at smaller model scales, our results demonstrate that geometry-aware prototype selection provides a reliable and training-light approach for hallucination detection without modifying LLM parameters. Extended evaluations on Phi-14B and Qwen3-32B confirm that GA-ICL scales effectively to larger models, outperforming all compared baselines including on QA tasks where smaller models show boundary-condition limitations, offering a principled direction for improved ICL demonstration selection.
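The phrase "selects demonstrations based on their proximity to learned prototypes" can be pictured with a deliberately simplified sketch. Here each class prototype is just the mean embedding of that class, and the k candidates nearest to it are kept; this is an assumption made for illustration, since the abstract does not say how GA-ICL learns its prototypes or models local manifold structure. The function names, embeddings, and labels are all hypothetical.

```python
import math


def centroid(vectors):
    """Mean vector, used here as a stand-in class prototype (an assumption)."""
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def select_demonstrations(pool, k):
    """pool: list of (embedding, label, text). Keep the k texts nearest each class prototype."""
    by_label = {}
    for embedding, label, text in pool:
        by_label.setdefault(label, []).append((embedding, text))
    selected = {}
    for label, items in by_label.items():
        prototype = centroid([embedding for embedding, _ in items])
        ranked = sorted(items, key=lambda item: math.dist(item[0], prototype))
        selected[label] = [text for _, text in ranked[:k]]
    return selected


# Hypothetical 2-D embeddings; real ones would come from a frozen LLM's hidden states.
pool = [
    ([0.0, 0.0], "supported", "s1"),
    ([0.2, 0.0], "supported", "s2"),
    ([2.0, 2.0], "supported", "s3"),
    ([5.0, 5.0], "hallucinated", "h1"),
    ([5.0, 6.0], "hallucinated", "h2"),
    ([5.0, 5.4], "hallucinated", "h3"),
]
selected = select_demonstrations(pool, k=1)
print(selected)
```

Ranking against a class prototype rather than against the query alone is what separates this style of selection from plain lexical or embedding-similarity retrieval.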

2601.13735 2026-06-04 cs.AI 版本更新

Reasoning or Fluency? Dissecting Probabilistic Confidence in Best-of-N Selection

推理还是流畅性?剖析Best-of-N选择中的概率置信度

Hojin Kim, Jaehyung Kim

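Affiliations * Yonsei University

AI summary Using three classes of inter-step causality perturbations, this paper finds that current probabilistic confidence metrics mainly capture surface-level fluency rather than reasoning quality, and proposes a contrastive causality metric for more faithful output selection.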

Comments 15 pages, 4 figures

详情
AI中文摘要

概率置信度指标越来越多地被用作Best-of-N选择中推理质量的代理,其假设是更高的置信度反映更高的推理保真度。在这项工作中,我们通过调查这些指标是否真正捕捉到有效推理所需的步骤间因果依赖性来挑战这一假设。我们引入了三类步骤间因果扰动,系统地破坏推理步骤之间的依赖性,同时保持局部流畅性。令人惊讶的是,在不同的模型族和推理基准上,我们发现选择精度在这些扰动下仅轻微下降。即使是严重的干预,例如应用硬注意力掩码直接阻止模型关注先前的推理步骤,也不会显著降低选择性能。这些发现提供了强有力的证据,表明当前的概率指标在很大程度上对逻辑结构不敏感,而是主要捕捉表面流畅性或分布内先验。受此差距的启发,我们提出了一种对比因果度量,明确隔离步骤间因果依赖性,并证明它比现有的基于概率的方法产生更忠实的输出选择。

英文摘要

Probabilistic confidence metrics are increasingly adopted as proxies for reasoning quality in Best-of-N selection, under the assumption that higher confidence reflects higher reasoning fidelity. In this work, we challenge this assumption by investigating whether these metrics truly capture inter-step causal dependencies necessary for valid reasoning. We introduce three classes of inter-step causality perturbations that systematically disrupt dependencies between reasoning steps while preserving local fluency. Surprisingly, across diverse model families and reasoning benchmarks, we find that selection accuracy degrades only marginally under these disruptions. Even severe interventions, such as applying hard attention masks that directly prevent the model from attending to prior reasoning steps, do not substantially reduce selection performance. These findings provide strong evidence that current probabilistic metrics are largely insensitive to logical structure, and primarily capture surface-level fluency or in-distribution priors instead. Motivated by this gap, we propose a contrastive causality metric that explicitly isolates inter-step causal dependencies, and demonstrate that it yields more faithful output selection than existing probability-based approaches.
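For readers unfamiliar with the setup being questioned here, Best-of-N selection with a probabilistic confidence score can be sketched as follows. The score used is the mean token log-probability, one common choice and an assumption on my part; the abstract does not list which confidence metrics the paper evaluates, and the contrastive causality metric it proposes is not shown. Candidate texts and log-probabilities are hypothetical.

```python
def mean_logprob(token_logprobs):
    """Length-normalized confidence: average log-probability per generated token."""
    return sum(token_logprobs) / len(token_logprobs)


def best_of_n(candidates):
    """Return the candidate whose tokens the model was, on average, most confident in."""
    return max(candidates, key=lambda c: mean_logprob(c["logprobs"]))


# Hypothetical sampled answers with per-token log-probabilities.
candidates = [
    {"text": "answer A", "logprobs": [-0.1, -0.2, -0.1]},
    {"text": "answer B", "logprobs": [-0.5, -0.05, -0.9]},
    {"text": "answer C", "logprobs": [-0.3, -0.3]},
]
best = best_of_n(candidates)
print(best["text"])
```

A score like this depends only on how likely each token was, which is consistent with the paper's finding that such metrics can stay high even when the dependencies between reasoning steps are broken.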

2601.07036 2026-06-04 cs.CL cs.AI cs.LG 版本更新

Mid-Think: Training-Free Intermediate-Budget Reasoning via Token-Level Triggers

Mid-Think: 通过词元级触发器实现无需训练的中间预算推理

Wang Yang, Debargha Ganguly, Xinpeng Li, Chaoda Song, Shouren Wang, Vikash Singh, Vipin Chaudhary, Xiaotian Han

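Affiliations * Case Western Reserve University

AI summary Through attention analysis and prompting experiments, this paper finds that reasoning behavior is controlled mainly by a small set of trigger tokens, and proposes Mid-Think, which combines these triggers to achieve intermediate-budget reasoning; it outperforms baselines on the accuracy-length trade-off and, when applied to RL training, reduces training time while improving performance.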

详情
AI中文摘要

混合推理语言模型通常通过高级的Think/No-think指令来控制推理行为,但我们发现这种模式切换主要由一小部分触发词元驱动,而非指令本身。通过注意力分析和受控提示实验,我们表明开头的“Okay”词元会诱导推理行为,而“</think>”后的换行模式则会抑制推理。基于这一观察,我们提出了Mid-Think,一种简单的无需训练的提示格式,通过组合这些触发器实现中间预算推理,在准确率-长度权衡上始终优于固定词元和基于提示的基线。此外,在监督微调后将Mid-Think应用于强化学习训练,可将训练时间减少约15%,同时将Qwen3-8B在AIME上的最终性能从69.8%提升至72.4%,在GPQA上从58.5%提升至61.1%,证明了其在推理时控制和基于强化学习的推理训练中的有效性。

英文摘要

Hybrid reasoning language models are commonly controlled through high-level Think/No-think instructions to regulate reasoning behavior, yet we find that such mode switching is largely driven by a small set of trigger tokens rather than the instructions themselves. Through attention analysis and controlled prompting experiments, we show that a leading "Okay" token induces reasoning behavior, while the newline pattern following "</think>" suppresses it. Based on this observation, we propose Mid-Think, a simple training-free prompting format that combines these triggers to achieve intermediate-budget reasoning, consistently outperforming fixed-token and prompt-based baselines in terms of the accuracy-length trade-off. Furthermore, applying Mid-Think to RL training after SFT reduces training time by approximately 15% while improving final performance of Qwen3-8B on AIME from 69.8% to 72.4% and on GPQA from 58.5% to 61.1%, demonstrating its effectiveness for both inference-time control and RL-based reasoning training.

2512.04668 2026-06-04 cs.CR cs.AI cs.CL 版本更新

Topology Matters: Measuring Memory Leakage in Multi-Agent LLMs

拓扑结构至关重要:多智能体大语言模型中的内存泄漏测量

Jinbo Liu, Defu Cao, Yifei Wei, Tianyao Su, Yuan Liang, Yushun Dong, Yan Liu, Yue Zhao, Xiyang Hu

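Affiliations * Arizona State University; University of Southern California; Florida State University

AI summary Proposes MAMA, a framework that evaluates memory leakage in multi-agent LLM systems under controlled graph topologies; it finds that denser connectivity, shorter attacker-target distance, and higher target centrality increase leakage, and recommends sparse or hierarchical topologies.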

Comments Accepted to Findings of the Association for Computational Linguistics: ACL 2026. Camera-ready version

详情
AI中文摘要

图拓扑结构是多智能体LLM系统中内存泄漏的基本决定因素,但其影响尚未得到充分量化。我们提出了MAMA(多智能体内存攻击),一个用于比较多智能体LLM系统中拓扑条件内存泄漏的受控评估框架。MAMA操作于包含标记的个人身份信息(PII)实体的合成文档,从中生成经过清理的任务指令。我们执行两阶段协议:Engram(将私人信息植入目标智能体的内存)和Resonance(多轮交互,攻击者尝试提取)。在10轮中,我们使用两阶段恢复标准测量泄漏,该标准结合了精确匹配提取和基于LLM对攻击者最终输出的推理。我们评估了六种典型拓扑(完全图、环、链、树、星、星环),涉及n∈{4,5,6}、攻击者-目标放置和基础模型。结果一致:更密集的连通性、更短的攻击者-目标距离和更高的目标中心性增加泄漏;大多数泄漏发生在早期轮次,然后趋于平稳;模型选择改变绝对比率但保留广泛的结构趋势;时空/位置属性比身份凭证或受监管标识符更容易泄漏。我们提炼出系统设计的实用指导:倾向于稀疏或层次化连通性,最大化攻击者-目标分离,并通过拓扑感知访问控制限制枢纽/捷径路径。我们的代码可在https://github.com/llll121/mama-eval获取。

英文摘要

Graph topology is a fundamental determinant of memory leakage in multi-agent LLM systems, yet its effects remain poorly quantified. We introduce MAMA (Multi-Agent Memory Attack), a controlled evaluation framework for comparing topology-conditioned memory leakage in multi-agent LLM systems. MAMA operates on synthetic documents containing labeled Personally Identifiable Information (PII) entities, from which we generate sanitized task instructions. We execute a two-phase protocol: Engram (seeding private information into a target agent's memory) and Resonance (multi-round interaction where an attacker attempts extraction). Over 10 rounds, we measure leakage using a two-stage recovery criterion that combines exact-match extraction with LLM-based inference over the attacker's final output. We evaluate six canonical topologies (complete, circle, chain, tree, star, star-ring) across $n\in\{4,5,6\}$, attacker-target placements, and base models. Results are consistent: denser connectivity, shorter attacker-target distance, and higher target centrality increase leakage; most leakage occurs in early rounds and then plateaus; model choice shifts absolute rates but preserves broad structural trends; spatiotemporal/location attributes leak more readily than identity credentials or regulated identifiers. We distill practical guidance for system design: favor sparse or hierarchical connectivity, maximize attacker-target separation, and restrict hub/shortcut pathways via topology-aware access control. Our code is available at https://github.com/llll121/mama-eval.
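Two of the structural factors named in this abstract, connection density and attacker-target distance, are easy to compute for small agent graphs. The sketch below builds four of the six listed topologies (complete, circle, chain, star) and reports both quantities; tree and star-ring are omitted because their exact construction is not specified in the abstract. The node numbering, the hub being node 0, and the attacker/target placement are assumptions made for illustration, and nothing here reproduces the MAMA evaluation protocol itself.

```python
from collections import deque


def complete(n):
    return {i: {j for j in range(n) if j != i} for i in range(n)}


def chain(n):
    graph = {i: set() for i in range(n)}
    for i in range(n - 1):
        graph[i].add(i + 1)
        graph[i + 1].add(i)
    return graph


def circle(n):
    graph = chain(n)          # a circle is a chain with its two ends joined
    graph[0].add(n - 1)
    graph[n - 1].add(0)
    return graph


def star(n):
    graph = {i: set() for i in range(n)}
    for leaf in range(1, n):  # node 0 is the hub (an assumption)
        graph[0].add(leaf)
        graph[leaf].add(0)
    return graph


def density(graph):
    """Fraction of possible undirected edges that are present."""
    n = len(graph)
    edges = sum(len(neighbors) for neighbors in graph.values()) / 2
    return edges / (n * (n - 1) / 2)


def distance(graph, source, target):
    """Shortest-path hop count via breadth-first search; None if unreachable."""
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == target:
            return hops
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None


n, attacker, target = 5, 1, 3  # hypothetical placement
builders = {"complete": complete, "circle": circle, "chain": chain, "star": star}
report = {
    name: (density(build(n)), distance(build(n), attacker, target))
    for name, build in builders.items()
}
for name, (dens, hops) in report.items():
    print(f"{name:9s} density={dens:.2f} attacker-target hops={hops}")
```

Under the abstract's findings, the complete graph (highest density, one hop between attacker and target) would be the configuration expected to leak most among these four.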

2511.07107 2026-06-04 cs.AI cs.CL 版本更新

MENTOR: A Metacognition-Driven Self-Evolution Framework for Uncovering and Mitigating Implicit Domain Risks in LLMs

MENTOR: 一种元认知驱动的自我进化框架,用于发现和缓解大语言模型中的隐式领域风险

Liang Shan, Kaicheng Shen, Wen Wu, Zhenyu Ying, Chaochao Lu, Yan Teng, Jingqi Huang, Qingshan Liu, Guangze Ye, Guoqing Wang, Jie Zhou, Liang He

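Affiliations * School of Computer Science and Technology, East China Normal University; Shanghai AI Lab, Shanghai Innovation Institute

AI summary To address implicit safety risks of LLMs in specific domains (such as education, finance, and management), proposes MENTOR, a framework built on metacognitive self-assessment and dynamic rule-based knowledge graphs that lowers attack success rates through activation-level steering signals.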

详情
AI中文摘要

确保大语言模型(LLMs)的安全性对于实际部署至关重要。然而,当前的安全措施往往无法解决隐式的、特定领域的风险。为了研究这一差距,我们引入了一个包含3000个标注查询的数据集,涵盖教育、金融和管理领域。对14个主流LLMs的评估揭示了一个令人担忧的漏洞:平均越狱成功率为57.8%。为此,我们提出了MENTOR,一种元认知驱动的自我进化框架。MENTOR执行元认知自我评估,采用视角转换和后果推理等策略来揭示潜在的模型错位。由此产生的反思被提炼为动态的基于规则的知识图谱,从中检索到的规则被转换为激活级引导信号,以在推理过程中指导内部表示。实验表明,MENTOR在所有测试领域显著降低了攻击成功率,并优于现有的安全对齐方法。MENTOR的代码和数据集可在 https://anonymous.4open.science/r/MENTOR-Evo 获取。

英文摘要

Ensuring the safety of Large Language Models (LLMs) is critical for real-world deployment. However, current safety measures often fail to address implicit, domain-specific risks. To investigate this gap, we introduce a dataset of 3,000 annotated queries spanning education, finance, and management. Evaluations across 14 leading LLMs reveal a concerning vulnerability: an average jailbreak success rate of 57.8%. In response, we propose MENTOR, a metacognition-driven self-evolution framework. MENTOR performs metacognitive self-assessment, using strategies such as perspective-taking and consequential reasoning to uncover latent model misalignments. The resulting reflections are distilled into dynamic rule-based knowledge graphs, from which retrieved rules are converted into activation-level steering signals to guide internal representations during inference. Experiments demonstrate that MENTOR substantially reduces attack success rates across all tested domains and outperforms existing safety alignment methods. The code and dataset for MENTOR are available at: https://anonymous.4open.science/r/MENTOR-Evo.

2411.05894 2026-06-04 cs.CL cs.AI cs.LG 版本更新

SSSD: Simply-Scalable Speculative Decoding

SSSD: 简单可扩展的推测解码

Michele Marzollo, Jiawei Zhuang, Niklas Roemer, Niklas Zwingenberger, Lorenz K. Müller, Lukas Cavigelli

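Affiliations * Huawei; ETH Zurich

AI summary Proposes SSSD, a training-free speculative decoding method that combines lightweight n-gram matching with hardware-aware speculation; it performs on par with leading training-based methods across a range of benchmarks, reduces latency by up to 2.9x, and is robust to language and domain shift.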

Comments Accepted to the 64th Annual Meeting of the Association for Computational Linguistics (ACL 2026, Main Conference)

详情
AI中文摘要

推测解码已成为加速大型语言模型推理的流行技术。然而,大多数现有方法在生产服务系统中仅带来适度的改进。实现显著加速的方法通常依赖于额外的训练草案模型或辅助模型组件,增加了部署和维护的复杂性。这种增加的复杂性降低了灵活性,特别是当服务负载转移到草案模型训练数据中未充分表示的任务、领域或语言时。我们引入了简单可扩展的推测解码(SSSD),一种无需训练的方法,结合了轻量级n-gram匹配和硬件感知推测。相对于标准自回归解码,SSSD将延迟降低高达2.9倍。它在广泛的基准测试中达到了与领先的基于训练的方法相当的性能,同时需要显著更低的采用成本——无需数据准备、训练或调优——并且在语言和领域变化以及长上下文设置中表现出优越的鲁棒性。

英文摘要

Speculative Decoding has emerged as a popular technique for accelerating inference in Large Language Models. However, most existing approaches yield only modest improvements in production serving systems. Methods that achieve substantial speedups typically rely on an additional trained draft model or auxiliary model components, increasing deployment and maintenance complexity. This added complexity reduces flexibility, particularly when serving workloads shift to tasks, domains, or languages that are not well represented in the draft model's training data. We introduce Simply-Scalable Speculative Decoding (SSSD), a training-free method that combines lightweight n-gram matching with hardware-aware speculation. Relative to standard autoregressive decoding, SSSD reduces latency by up to 2.9x. It achieves performance on par with leading training-based approaches across a broad range of benchmarks, while requiring substantially lower adoption effort (no data preparation, training, or tuning is needed) and exhibiting superior robustness under language and domain shift, as well as in long-context settings.
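The "lightweight n-gram matching" ingredient can be illustrated with a generic draft-by-lookup routine: find the most recent earlier occurrence of the last n tokens in the context and propose whatever followed it as the speculative draft. This is a sketch of the general idea only, under the assumption that drafts come from the existing context; SSSD's actual data structures and its hardware-aware choice of speculation length are not described in the abstract, and the verification pass by the target model is omitted. The function name and parameters are hypothetical.

```python
def ngram_draft(tokens, n=2, max_draft=4):
    """Propose draft tokens by matching the last n tokens against earlier context."""
    if len(tokens) <= n:
        return []
    suffix = tokens[-n:]
    # Scan backwards so the most recent earlier occurrence wins.
    for start in range(len(tokens) - n - 1, -1, -1):
        if tokens[start:start + n] == suffix:
            # Tokens that followed the earlier occurrence become the draft.
            return tokens[start + n:start + n + max_draft]
    return []


# Hypothetical token sequence (words stand in for token ids).
context = ["the", "cat", "sat", "on", "the", "mat", "and", "the", "cat"]
draft = ngram_draft(context)
print(draft)
```

In a full speculative decoding loop, the target model would score these draft tokens in one forward pass and accept the longest prefix that agrees with its own predictions, so a wrong draft costs compute but never changes the output.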

2512.17678 2026-06-04 cs.LG cs.AI 版本更新

You Only Train Once: Differentiable Subset Selection for Omics Data

你只训练一次:用于组学数据的可微分子集选择

Daphné Chopard, Jorge da Silva Gonçalves, Irene Cannistraci, Thomas M. Sutter, Julia E. Vogt

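Affiliations * Department of Computer Science, ETH Zurich; Department of Intensive Care and Neonatology, University Children’s Hospital Zurich

AI summary Proposes YOTO, an end-to-end differentiable architecture that jointly selects discrete gene subsets and performs prediction, enabling sparse, multi-task learning and improving performance on single-cell transcriptomic data analysis.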

Comments Camera-ready version accepted at Transactions on Machine Learning Research (TMLR)

详情
Journal ref
Transactions on Machine Learning Research, 2026
AI中文摘要

从单细胞转录组数据中选择紧凑且信息丰富的基因子集对于生物标志物发现、提高可解释性和成本效益分析至关重要。然而,大多数现有的特征选择方法要么作为多阶段流水线运行,要么依赖于事后特征归因,使得选择和预测弱耦合。在这项工作中,我们提出了YOTO(你只训练一次),一个端到端框架,在单个可微架构中联合识别离散基因子集并进行预测。在我们的模型中,预测任务直接指导选择哪些基因,而学习到的子集反过来塑造预测表示。这种闭环反馈使模型能够在训练过程中迭代地优化其选择内容和预测方式。与现有方法不同,YOTO强制执行稀疏性,使得只有选中的基因对推理有贡献,从而无需训练额外的下游分类器。通过多任务学习设计,模型在相关目标之间学习共享表示,使得部分标记的数据集能够相互提供信息,并发现无需额外训练步骤即可跨任务泛化的基因子集。我们在两个代表性的单细胞RNA-seq数据集上评估YOTO,显示它持续优于最先进的基线。这些结果表明,稀疏、端到端、多任务的基因子集选择提高了预测性能,并产生了紧凑且有意义的基因子集,推进了生物标志物发现和单细胞分析。

英文摘要

Selecting compact and informative gene subsets from single-cell transcriptomic data is essential for biomarker discovery, improving interpretability, and cost-effective profiling. However, most existing feature selection approaches either operate as multi-stage pipelines or rely on post hoc feature attribution, making selection and prediction weakly coupled. In this work, we present YOTO (you only train once), an end-to-end framework that jointly identifies discrete gene subsets and performs prediction within a single differentiable architecture. In our model, the prediction task directly guides which genes are selected, while the learned subsets, in turn, shape the predictive representation. This closed feedback loop enables the model to iteratively refine both what it selects and how it predicts during training. Unlike existing approaches, YOTO enforces sparsity so that only the selected genes contribute to inference, eliminating the need to train additional downstream classifiers. Through a multi-task learning design, the model learns shared representations across related objectives, allowing partially labeled datasets to inform one another, and discovering gene subsets that generalize across tasks without additional training steps. We evaluate YOTO on two representative single-cell RNA-seq datasets, showing that it consistently outperforms state-of-the-art baselines. These results demonstrate that sparse, end-to-end, multi-task gene subset selection improves predictive performance and yields compact and meaningful gene subsets, advancing biomarker discovery and single-cell analysis.

2512.16919 2026-06-04 cs.CV cs.AI cs.RO 版本更新

DVGT: Driving Visual Geometry Transformer

DVGT: 驾驶视觉几何变换器

Sicheng Zuo, Zixun Xie, Wenzhao Zheng, Shaoqing Xu, Fang Li, Shengyin Jiang, Long Chen, Zhi-Xin Yang, Jiwen Lu

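Affiliations * Tsinghua University; University of Macau; Xiaomi EV; Peking University

AI summary Proposes DVGT, a visual geometry transformer that reconstructs global dense 3D point maps from unposed multi-view image sequences; it infers geometric relations through alternating attention, requires neither camera parameters nor post-hoc alignment, and significantly outperforms existing models on multiple driving datasets.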

Comments Code is available at https://github.com/wzzheng/DVGT

详情
AI中文摘要

从视觉输入中感知和重建3D场景几何对于自动驾驶至关重要。然而,目前仍缺乏一种能够适应不同场景和相机配置的、面向驾驶的稠密几何感知模型。为弥补这一空白,我们提出了驾驶视觉几何变换器(DVGT),它从一系列无位姿的多视角视觉输入中重建全局稠密3D点图。我们首先使用DINO骨干网络为每张图像提取视觉特征,并采用交替的视角内局部注意力、跨视角空间注意力和跨帧时间注意力来推断图像间的几何关系。然后,我们使用多个头解码第一帧自车坐标系下的全局点图以及每帧的自车位姿。与依赖精确相机参数的传统方法不同,DVGT无需显式的3D几何先验,能够灵活处理任意相机配置。DVGT直接从图像序列预测度量尺度的几何,消除了与外部传感器后对齐的需求。在包含nuScenes、OpenScene、Waymo、KITTI和DDAD的大型驾驶数据集混合训练下,DVGT在各种场景中显著优于现有模型。代码可在https://github.com/wzzheng/DVGT获取。

英文摘要

Perceiving and reconstructing 3D scene geometry from visual inputs is crucial for autonomous driving. However, there is still no driving-targeted dense geometry perception model that can adapt to different scenarios and camera configurations. To bridge this gap, we propose a Driving Visual Geometry Transformer (DVGT), which reconstructs a global dense 3D point map from a sequence of unposed multi-view visual inputs. We first extract visual features for each image using a DINO backbone, and employ alternating intra-view local attention, cross-view spatial attention, and cross-frame temporal attention to infer geometric relations across images. We then use multiple heads to decode a global point map in the ego coordinate of the first frame and the ego poses for each frame. Unlike conventional methods that rely on precise camera parameters, DVGT is free of explicit 3D geometric priors, enabling flexible processing of arbitrary camera configurations. DVGT directly predicts metric-scaled geometry from image sequences, eliminating the need for post-alignment with external sensors. Trained on a large mixture of driving datasets including nuScenes, OpenScene, Waymo, KITTI, and DDAD, DVGT significantly outperforms existing models on various scenarios. Code is available at https://github.com/wzzheng/DVGT.

2512.05277 2026-06-04 cs.CV cs.AI 版本更新

From Segments to Scenes: Temporal Understanding for Agentic Autonomous Driving via Vision-Language Models

从片段到场景:自动驾驶中基于视觉语言模型的时间理解

Kevin Cannons, Saeed Ranjbar Alvar, Mohammad Asiful Hossain, Ahmad Rezaei, Mohsen Gholami, Alireza Heidarikhazaei, Zhou Weimin, Yong Zhang, Mohammad Akbari

Affiliations * University of California, Berkeley; University of Cambridge; University of Toronto; ETH Zurich; University of Washington; University of Southern California

AI summary Proposes TAD, a benchmark for temporal understanding in autonomous driving, and two training-free methods, Scene-CoT and TCogMap, that improve the temporal reasoning of vision-language models.

详情
AI中文摘要

视觉语言模型(VLM)越来越多地被部署为野外自主代理的感知和推理骨干,其中自动驾驶(AD)是最安全关键的实例之一。可靠的时间理解对于此类代理预测事件、归因原因和在动态环境中安全行动至关重要,但即使对于最先进的(SoTA)VLM来说,这仍然是一个重大挑战。先前的视频基准强调了其他内容(体育、烹饪等),但现有基准没有专门关注短时和长时AD视频的时间理解。为填补这一空白,我们提出了自动驾驶时间理解(TAD)基准,包含近6000个问答(QA)对,涵盖7个任务,并评估了9个闭源和开源通用以及AD专用模型。当前SoTA模型在TAD上的表现远低于人类准确率。为了改进基于VLM的驾驶代理的时间推理,我们提出了两种新颖的无训练解决方案:Scene-CoT,它使用思维链(CoT)推理;以及TCogMap,它结合了由轨迹分析模块生成的自我中心时间认知图,该模块作为VLM周围的代理工具运行。与现有VLM集成后,我们的方法在TAD上的平均准确率提高了高达17.72%,在STSBench上提高了高达10.35%。通过引入TAD、对SoTA模型进行基准测试并提出有效的增强方法,本工作旨在促进野外代理AD系统时间理解的进一步进展。基准和评估代码分别可在Hugging Face(https://huggingface.co/datasets/vbdai/TAD)和GitHub(https://github.com/vbdi/tad_bench)上获取。

英文摘要

Vision-Language Models (VLMs) are increasingly deployed as the perception and reasoning backbone of autonomous agents acting in the wild, with autonomous driving (AD) being one of the most safety-critical instances. Reliable temporal understanding is essential for such agents to anticipate events, attribute causes, and act safely in dynamic environments, yet this remains a significant challenge even for state-of-the-art (SoTA) VLMs. Prior video benchmarks have emphasized other content (sports, cooking, etc.), yet no existing benchmark focuses exclusively on temporal understanding for both short- and long-form AD footage. To fill this gap, we present the Temporal Understanding in Autonomous Driving (TAD) benchmark, comprising nearly 6000 question-answer (QA) pairs across 7 tasks, and evaluate 9 closed- and open-source generalist as well as AD-specialist models. Current SoTA models perform substantially below human accuracy on TAD. To improve the temporal reasoning of VLM-based driving agents, we propose two novel training-free solutions: Scene-CoT, which uses Chain-of-Thought (CoT) reasoning, and TCogMap, which incorporates an ego-centric temporal cognitive map produced by a trajectory-analysis module that operates as an agentic tool around the VLM. Integrated with existing VLMs, our methods improve average accuracy on TAD by up to 17.72% and by up to 10.35% on STSBench. By introducing TAD, benchmarking SoTA models, and proposing effective enhancements, this work aims to catalyze further progress on temporal understanding for agentic AD systems operating in the wild. The benchmark and evaluation code are available at Hugging Face (https://huggingface.co/datasets/vbdai/TAD) and GitHub (https://github.com/vbdi/tad_bench), respectively.

2511.16624 2026-06-04 cs.CV cs.AI 版本更新

SAM 3D: 3Dfy Anything in Images

SAM 3D: 将图像中的任何内容3D化

SAM 3D Team, Xingyu Chen, Fu-Jen Chu, Pierre Gleize, Kevin J Liang, Alexander Sax, Hao Tang, Weiyao Wang, Michelle Guo, Thibaut Hardin, Xiang Li, Aohan Lin, Jiawei Liu, Ziqi Ma, Anushka Sagar, Bowen Song, Xiaodong Wang, Jianing Yang, Bowen Zhang, Piotr Dollár, Georgia Gkioxari, Matt Feiszli, Jitendra Malik

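Affiliations * Meta Superintelligence Labs

AI summary Proposes SAM 3D, a generative model that reconstructs the geometry, texture, and layout of 3D objects from a single image; human- and model-in-the-loop annotation and multi-stage training overcome the data bottleneck and yield significant gains in real-world scenes.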

Comments Website: https://ai.meta.com/sam3d/

详情
AI中文摘要

我们提出SAM 3D,一种用于视觉引导的3D物体重建的生成模型,能够从单张图像预测几何、纹理和布局。SAM 3D在自然图像中表现出色,这些图像中遮挡和场景杂乱很常见,且来自上下文的视觉识别线索起着更重要的作用。我们通过一个人工和模型在环的流水线来标注物体形状、纹理和姿态,以前所未有的规模提供视觉引导的3D重建数据。我们在一个现代的、多阶段的训练框架中从这些数据中学习,该框架结合了合成预训练和真实世界对齐,打破了3D“数据壁垒”。与近期工作相比,我们获得了显著提升,在真实世界物体和场景的人类偏好测试中至少达到5:1的胜率。我们将发布我们的代码和模型权重、一个在线演示以及一个新的用于野外3D物体重建的具有挑战性的基准测试。

英文摘要

We present SAM 3D, a generative model for visually grounded 3D object reconstruction, predicting geometry, texture, and layout from a single image. SAM 3D excels in natural images, where occlusion and scene clutter are common and visual recognition cues from context play a larger role. We achieve this with a human- and model-in-the-loop pipeline for annotating object shape, texture, and pose, providing visually grounded 3D reconstruction data at unprecedented scale. We learn from this data in a modern, multi-stage training framework that combines synthetic pretraining with real-world alignment, breaking the 3D "data barrier". We obtain significant gains over recent work, with at least a 5:1 win rate in human preference tests on real-world objects and scenes. We will release our code and model weights, an online demo, and a new challenging benchmark for in-the-wild 3D object reconstruction.

2510.17064 2026-06-04 cs.AI 版本更新

BRAINCELL-AID: An Agentic AI Created Brain Cell Type Resource for Community Annotation

BRAINCELL-AID:用于社区注释的由AI代理创建的脑细胞类型资源

Rongbin Li, Wenbo Chen, Zhao Li, Rodrigo Munoz-Castaneda, Jinbo Li, Neha S. Maurya, Arnav Solanki, Huan He, Hanwen Xing, Meaghan Ramlakhan, Zachary Wise, Nelson Johansen, Zhuhao Wu, Hua Xu, Michael Hawrylycz, W. Jim Zheng

Affiliations * McWilliams School of Biomedical Informatics, University of Texas Health Science Center at Houston; Appel Alzheimer’s Disease Research Institute, Feil Family Brain and Mind Research Institute, Weill Cornell Medicine; Department of Biomedical Informatics and Data Science, School of Medicine, Yale University; Allen Institute for Brain Science

AI summary Proposes BRAINCELL-AID, a multi-agent AI system that combines retrieval-augmented generation with free-text descriptions and ontology labels to improve the accuracy and robustness of gene set annotation for single-cell RNA sequencing data, and applies it to annotate a mouse brain cell atlas.

Comments 23 pages, 6 figures, 2 tables

详情
AI中文摘要

单细胞RNA测序已经改变了我们识别不同细胞类型及其转录组特征的能力。然而,注释这些特征——尤其是那些涉及特征不明确的基因的特征——仍然是一个主要挑战。传统方法,如基因集富集分析(GSEA),依赖于精心策划的注释,并且在这些情况下往往表现不佳。大型语言模型(LLMs)提供了一种有前途的替代方案,但难以在结构化本体中表示复杂的生物学知识。为了解决这个问题,我们提出了BRAINCELL-AID(BRAINCELL-AID:https://biodataai.uth.edu/BRAINCELL-AID),一种新颖的多智能体AI系统,它将自由文本描述与本体标签相结合,以实现更准确和稳健的基因集注释。通过整合检索增强生成(RAG),我们开发了一个稳健的智能体工作流程,利用相关的PubMed文献优化预测,减少幻觉并增强可解释性。使用这个工作流程,我们在小鼠基因集的前列预测中实现了77%的正确注释。应用这种方法,我们注释了来自BRAIN Initiative细胞普查网络生成的综合小鼠脑细胞图谱的5,322个脑细胞簇,通过识别区域特异性基因共表达模式和推断基因集合的功能角色,从而对脑细胞功能产生了新的见解。BRAINCELL-AID还识别了具有神经学意义描述的基底神经节相关细胞类型。因此,我们创建了一个宝贵的资源来支持社区驱动的细胞类型注释。

英文摘要

Single-cell RNA sequencing has transformed our ability to identify diverse cell types and their transcriptomic signatures. However, annotating these signatures, especially those involving poorly characterized genes, remains a major challenge. Traditional methods, such as Gene Set Enrichment Analysis (GSEA), depend on well-curated annotations and often perform poorly in these contexts. Large Language Models (LLMs) offer a promising alternative but struggle to represent complex biological knowledge within structured ontologies. To address this, we present BRAINCELL-AID (https://biodataai.uth.edu/BRAINCELL-AID), a novel multi-agent AI system that integrates free-text descriptions with ontology labels to enable more accurate and robust gene set annotation. By incorporating retrieval-augmented generation (RAG), we developed a robust agentic workflow that refines predictions using relevant PubMed literature, reducing hallucinations and enhancing interpretability. Using this workflow, we achieved correct annotations for 77% of mouse gene sets among their top predictions. Applying this approach, we annotated 5,322 brain cell clusters from the comprehensive mouse brain cell atlas generated by the BRAIN Initiative Cell Census Network, enabling novel insights into brain cell function by identifying region-specific gene co-expression patterns and inferring functional roles of gene ensembles. BRAINCELL-AID also identifies Basal Ganglia-related cell types with neurologically meaningful descriptions. Hence, we create a valuable resource to support community-driven cell type annotation.

2511.03304 2026-06-04 cs.LG cs.AI 版本更新

Extending Fair Null-Space Projections for Continuous Attributes to Kernel Methods

将连续属性的公平零空间投影扩展到核方法

Felix Störck, Fabian Hinder, Barbara Hammer

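Affiliations * Felix Störck, Fabian Hinder, Barbara Hammer

AI summary Extends fair null-space projection to kernel-induced feature spaces by transforming the kernel matrix directly via the empirical feature space, giving a model-agnostic and fairness-score-agnostic method for continuous protected attributes that shows competitive or improved performance with Support Vector Regression.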

Comments Accepted to ICML 2026

详情
AI中文摘要

随着机器学习系统融入数百万人的日常社会生活,公平性在其发展中的优先级日益提高。公平性概念通常依赖受保护属性来评估潜在偏差。这里,大多数文献关注离散设置下的目标和受保护属性。关于连续属性尤其是与回归结合——我们称之为“连续公平性”——的文献很少。一种常见策略是迭代零空间投影,目前仅在线性模型或通过非线性编码器获得的嵌入中探索。我们通过“经验特征空间”将其扩展到核诱导特征空间,从而改进这一点。我们从理论上推导出这是核矩阵的直接变换,产生一种适用于连续受保护属性的模型和公平评分无关的方法。我们证明,与支持向量回归结合时,我们的新方法在多个数据集上相比其他当代方法具有竞争性或改进的性能。

英文摘要

With the ongoing integration of machine learning systems into the everyday social life of millions, fairness becomes an ever-increasing priority in their development. Fairness notions commonly rely on protected attributes to assess potential biases. The majority of the literature focuses on discrete setups for both target and protected attributes. The literature on continuous attributes, especially in conjunction with regression (we refer to this as continuous fairness), is scarce. A common strategy is iterative null-space projection, which so far has only been explored for linear models or for embeddings such as those obtained by a non-linear encoder. We extend it to kernel-induced feature spaces by means of the "empirical feature space". We theoretically derive this as a direct transformation of the kernel matrix, yielding a model-agnostic and fairness-score-agnostic method applicable to continuous protected attributes. We demonstrate that our novel approach, in conjunction with Support Vector Regression (SVR), provides competitive or improved performance across multiple datasets in comparison to other contemporary methods.
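A single step of the linear null-space projection that this work builds on can be sketched as follows: given a direction w in feature space that predicts the protected attribute, each feature vector is projected onto the subspace orthogonal to w, so a linear model along w can no longer read the attribute out. This is only the linear building block, shown under the assumption that w has already been fitted (for example by a linear regressor on the protected attribute); the iterative refitting loop and the kernel-matrix transformation derived in the paper are not reproduced. The function names and numbers are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def project_out(x, w):
    """Project x onto the null space of direction w: x - (x.w / w.w) * w."""
    coeff = dot(x, w) / dot(w, w)
    return [xi - coeff * wi for xi, wi in zip(x, w)]


# Hypothetical feature vector and a direction assumed to predict the protected attribute.
x = [2.0, 1.0, 3.0]
w = [1.0, 0.0, 1.0]

x_fair = project_out(x, w)
print(x_fair, dot(x_fair, w))
```

The iterative variant repeats this step, refitting a new predictor of the protected attribute on the projected features each round, until no useful direction remains.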

2510.24342 2026-06-04 cs.AI 版本更新

A Unified Geometric Space for Topological Alignment Between Transformer-Based Models and Human Brain Networks

基于Transformer的模型与人脑网络之间拓扑对齐的统一几何空间

Silin Chen, Yuzhong Chen, Caiwei Wang, Zifan Wang, Junhao Wang, Zifeng Jia, Keith M Kendrick, Tuo Zhang, Lin Zhao, Dezhong Yao, Tianming Liu, Xi Jiang

发表机构 * The Clinical Hospital of Chengdu Brain Science Institute, MOE-K Lab for NeuroInformation, Brain‑Apparatus Communication Institute, School of Life Science and Technology, University of Electronic Science and Technology of China(成都脑科学研究院临床医院,MOE-K神经信息实验室,脑-装置通信研究所,电子科技大学生命科学与技术学院) School of Automation, Northwestern Polytechnical University(西北工业大学自动化学院) Department of Biomedical Engineering, New Jersey Institute of Technology(新泽西理工学院生物医学工程系) School of Computing, University of Georgia(佐治亚大学计算机学院)

AI总结 提出一个模态无关、任务无关的拓扑对齐空间,通过图组织属性将Transformer模型的注意力拓扑映射到人脑固有连接网络,揭示了不同模态和规模模型的连续弧形分布及对齐特性。

详情
AI中文摘要

先前的脑-人工智能对齐研究通常受限于特定的输入和任务,限制了其捕捉不同模态模型组织特性的能力。在这项工作中,我们聚焦于基于Transformer的模型,引入了一个脑-模型拓扑对齐空间。我们不是从神经机制推断对齐,而是通过基于图的组织特性来检查对齐,将模型的内在空间注意力拓扑映射到规范的人脑固有连接网络(ICNs)。这使得在组织特性层面上,对视觉、语言和多模态系统进行模态无关且无任务的比较成为可能。通过分析跨这些模态和规模的151个基于Transformer的模型,我们观察到一个连续的弧形分布,反映了不同程度的拓扑对齐。与其训练目标一致,优化用于全局语义抽象的模型与高阶ICNs关联更紧密,而专注于局部细节的模型则与低级ICNs关联。更令人惊讶的是,我们发现了非直观的现象:DINOv2相比其前身表现出对齐降低,蒸馏的DeiT模型显示出反直觉的缩放反转,即更大的模型与高阶ICNs对齐更差,而微调和指令调优对对齐影响有限。此外,拓扑对齐分数与30个视觉Transformer的ImageNet-1K Top-1准确率相关性不显著(r=0.266, p=0.156)。这项工作为通过脑参考拓扑映射比较基于Transformer的模型的组织特性提供了新的定量视角。

英文摘要

Prior brain-AI alignment studies are typically constrained by specific inputs and tasks, limiting their ability to capture organizational properties across models with different modalities. In this work, we focus on Transformer-based models and introduce a brain-model topological alignment space. Rather than inferring alignment from neural mechanisms, we examine it through graph-based organizational properties, mapping the intrinsic spatial attention topology of a model onto canonical human intrinsic connectivity networks (ICNs). This enables a modality-agnostic and task-free comparison across vision, language, and multimodal systems at the level of organizational properties. Analyzing 151 Transformer-based models across these modalities and scales, we observe a continuous arc-shaped distribution, reflecting varying degrees of topological alignment. Consistent with their training objectives, models optimized for global semantic abstraction were associated more closely with higher-order ICNs, while local detail-focused models associated with low-level ICNs. More surprisingly, we uncovered non-intuitive phenomena: DINOv2 exhibited reduced alignment compared to its predecessors, distilled DeiT models showed a counterintuitive scaling inversion where larger models aligned less well with higher-order ICNs, and fine-tuning as well as instruction tuning had limited effect on alignment. Furthermore, topological alignment scores showed non-significant correlation with ImageNet-1K Top-1 accuracy in 30 vision Transformers (r=0.266, p=0.156). This work provides a new quantitative perspective for comparing the organizational properties of Transformer-based models through brain-referenced topological mapping.

2508.08237 2026-06-04 cs.MM cs.AI cs.CV cs.SD eess.AS 版本更新

VGGSounder: Audio-Visual Evaluations for Foundation Models

VGGSounder:基础模型的音视频评估

Daniil Zverev, Thaddäus Wiedemer, Ameya Prabhu, Matthias Bethge, Wieland Brendel, A. Sophia Koepke

发表机构 * Technical University of Munich, MCML(慕尼黑工业大学,MCML) University of Tübingen(图宾根大学) Tübingen AI Center(图宾根人工智能中心) MPI for Intelligent Systems, ELLIS Institute(马克斯·普朗克智能系统研究所,ELLIS研究所)

AI总结 针对VGGSound数据集在音视频基础模型评估中的标签不完整、类别重叠和模态错位等问题,提出重新标注的多标签测试集VGGSounder,并引入模态混淆指标分析模型性能退化。

Comments Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) 2025

详情
AI中文摘要

音视频基础模型的出现凸显了可靠评估其多模态理解能力的重要性。VGGSound数据集常被用作评估音视频分类的基准。然而,我们的分析发现了VGGSound的几个局限性,包括标签不完整、部分类别重叠以及模态错位。这些问题导致对听觉和视觉能力的评估出现偏差。为了解决这些局限性,我们引入了VGGSounder,这是一个全面重新标注的多标签测试集,它扩展了VGGSound,并专门设计用于评估音视频基础模型。VGGSounder具有详细的模态标注,能够精确分析特定模态的性能。此外,通过我们新的模态混淆指标,我们分析了添加另一种输入模态时的性能退化,揭示了模型的局限性。

英文摘要

The emergence of audio-visual foundation models underscores the importance of reliably assessing their multi-modal understanding. The VGGSound dataset is commonly used as a benchmark for evaluating audio-visual classification. However, our analysis identifies several limitations of VGGSound, including incomplete labelling, partially overlapping classes, and misaligned modalities. These lead to distorted evaluations of auditory and visual capabilities. To address these limitations, we introduce VGGSounder, a comprehensively re-annotated, multi-label test set that extends VGGSound and is specifically designed to evaluate audio-visual foundation models. VGGSounder features detailed modality annotations, enabling precise analyses of modality-specific performance. Furthermore, we reveal model limitations by analysing performance degradation when adding another input modality with our new modality confusion metric.

2510.15416 2026-06-04 cs.AI 版本更新

Adaptive Minds: Empowering Agents with LoRA-as-Tools

自适应心智:将LoRA作为工具赋予智能体能力

Pavan C Shekar, Aswanth Krishnan

发表机构 * GitHub

AI总结 提出将LoRA适配器作为可调用工具的框架,通过路由和智能体推理聚合多个专业适配器的优势,在30个适配器库中达到98.3%路由准确率,并在九类任务上显著提升性能。

Comments 13 pages, 3 figures, 9 tables. ICML 2026 CompLearn Workshop camera-ready (non-archival). Code: https://github.com/qpiai/adaptive-minds

详情
AI中文摘要

我们研究了一个框架,其中LoRA适配器被视为可调用的工具,基础语言模型可以动态选择并调用它们。我们假设,当适配器经过训练以提供强大的领域特定增益,并附带清晰的元数据时,基础模型可以可靠地将查询路由到适当的专家,从而有效地在单个框架内聚合许多专门适配器的优势。我们引入了自适应心智(Adaptive Minds),这是一个通用框架,在其中我们研究单步路由和多步智能体推理。在这种设置中,智能体可以迭代地调用多个适配器以及其他工具(例如,外部API、检索系统或执行环境),并在多个步骤中对其输出进行推理。这重新将适配器视为模块化技能或记忆单元,可以在推理过程中组合,而不是静态应用。在我们的评估中,路由层在30个适配器库上达到了98.3%的准确率,并且在单一共享训练配方下,训练有素的专业适配器在九个任务族中提供了+4.6到+84.0个百分点的严格评分增益;AM路由器在每个查询包含领域信号的基准测试中,将这些增益聚合在直接专业适配器的5个百分点以内。我们的研究结果表明,该方法的有效性取决于各个适配器的质量和专业化程度,并且启用许多此类专家的灵活组合可以显著扩展语言模型智能体的实际能力,朝着更通用的、工具增强的智能迈进。

英文摘要

We investigate a framework in which LoRA adapters are treated as callable tools that a base language model can dynamically select and invoke. We hypothesize that, when adapters are trained to provide strong domain-specific gains and are exposed with clear metadata, a base model can reliably route queries to the appropriate expert, effectively aggregating the benefits of many specialized adapters within a single framework. We introduce Adaptive Minds, a general framework within which we study both single-step routing and multi-step agentic reasoning. In this setting, the agent can iteratively invoke multiple adapters alongside other tools (e.g., external APIs, retrieval systems, or execution environments) and reason over their outputs across multiple steps. This reframes adapters as modular skills or memory units that can be composed during reasoning rather than statically applied. In our evaluation, the routing layer reaches 98.3% accuracy on a 30-adapter library, and well-trained specialists provide +4.6 to +84.0 percentage points of strict-scorer gain across nine task families under a single shared training recipe; the AM router aggregates these gains within 5 pp of the direct specialist on every benchmark whose queries surface domain signal. Our findings suggest that the effectiveness of this approach depends on the quality and specialization of individual adapters, and that enabling flexible composition of many such experts can significantly expand the practical capabilities of language model agents, moving toward more general, tool-augmented intelligence.

2510.13704 2026-06-04 cs.LG cs.AI cs.RO 版本更新

Simplicial Embeddings Improve Sample Efficiency in Actor-Critic Agents

单纯形嵌入提升Actor-Critic智能体的样本效率

Johan Obando-Ceron, Walter Mayor, Samuel Lavoie, Scott Fujimoto, Aaron Courville, Pablo Samuel Castro

发表机构 * Mila – Québec AI Institute(魁北克人工智能研究所) Université de Montréal(蒙特利尔大学) McGill University(麦吉尔大学) CIFAR AI Chair(CIFAR人工智能主席)

AI总结 针对大规模环境并行化下Actor-Critic方法仍需大量交互的问题,提出使用单纯形嵌入作为轻量级表示层,通过几何归纳偏置产生稀疏离散特征,稳定评论家引导并强化策略梯度,在FastTD3、FastSAC和PPO中一致提升样本效率和最终性能。

详情
AI中文摘要

最近的工作提出通过大规模环境并行化来加速actor-critic方法的挂钟训练时间;不幸的是,这些方法有时仍需要大量的环境交互才能达到期望的性能水平。注意到结构良好的表示可以改善深度强化学习(RL)智能体的泛化能力和样本效率,我们提出使用单纯形嵌入:将嵌入约束到单纯形结构的轻量级表示层。这种几何归纳偏置产生稀疏且离散的特征,稳定了评论家引导并强化了策略梯度。当应用于FastTD3、FastSAC和PPO时,单纯形嵌入在多种连续和离散控制环境中一致提高了样本效率和最终性能,且不损失运行速度。

英文摘要

Recent works have proposed accelerating the wall-clock training time of actor-critic methods via the use of large-scale environment parallelization; unfortunately, these can sometimes still require a large number of environment interactions to achieve a desired level of performance. Noting that well-structured representations can improve the generalization and sample efficiency of deep reinforcement learning (RL) agents, we propose the use of simplicial embeddings: lightweight representation layers that constrain embeddings to simplicial structures. This geometric inductive bias results in sparse and discrete features that stabilize critic bootstrapping and strengthen policy gradients. When applied to FastTD3, FastSAC, and PPO, simplicial embeddings consistently improve sample efficiency and final performance across a variety of continuous- and discrete-control environments, without any loss in runtime speed.
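As a rough illustration of what "constraining embeddings to simplicial structures" can look like, the sketch below splits a feature vector into equal-sized groups and applies a temperature-scaled softmax to each group, so that every group lies on a probability simplex. The function name, group size, and temperature are assumptions for illustration; the abstract does not specify the layer's exact form or where it sits in the actor and critic networks.

```python
import math


def simplicial_embedding(z, group_size, temperature=1.0):
    """Map a flat feature vector onto a product of simplices.

    The vector is split into consecutive groups of `group_size` entries and
    a softmax (with temperature) is applied within each group. Lower
    temperatures push each group toward a sparse, near one-hot code.
    """
    if len(z) % group_size != 0:
        raise ValueError("len(z) must be divisible by group_size")
    out = []
    for start in range(0, len(z), group_size):
        group = [v / temperature for v in z[start:start + group_size]]
        m = max(group)  # subtract the max for numerical stability
        exps = [math.exp(v - m) for v in group]
        total = sum(exps)
        out.extend(e / total for e in exps)
    return out


# Hypothetical 6-dimensional feature vector split into two 3-way simplices.
features = simplicial_embedding([2.0, 0.0, -1.0, 0.5, 0.5, 3.0],
                                group_size=3, temperature=0.5)
```

The output has the same length as the input, so a layer like this can be dropped between an encoder and the actor or critic head without changing downstream shapes.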

2505.11166 2026-06-04 cs.CL cs.AI 版本更新

SoLoPO: Unlocking Long-Context Capabilities in LLMs via Short-to-Long Preference Optimization

SoLoPO: 通过短到长偏好优化解锁大语言模型的长上下文能力

Huashan Sun, Shengyi Liao, Yansen Han, Yu Bai, Yang Gao, Cheng Fu, Weizhou Shen, Fanqi Wan, Ming Yan, Ji Zhang, Fei Huang

发表机构 * Tongyi Lab, Alibaba Group(通义实验室,阿里巴巴集团)

AI总结 提出SoLoPO框架,将长上下文偏好优化解耦为短上下文偏好优化和短到长奖励对齐,以提升大语言模型的长上下文利用能力。

Comments Published as a conference paper at ICLR 2026

详情
AI中文摘要

尽管在扩展上下文大小的预训练方面取得了进展,但大语言模型(LLMs)在有效利用现实世界中的长上下文信息方面仍面临挑战,这主要是由于数据质量问题、训练效率低下以及缺乏设计良好的优化目标导致的长上下文对齐不足。为了解决这些限制,我们提出了一个名为Short-to-Long Preference Optimization(SoLoPO)的框架,将长上下文偏好优化(PO)解耦为两个组成部分:短上下文PO和短到长奖励对齐(SoLo-RA),并得到了理论和实验证据的支持。具体来说,短上下文PO利用从短上下文中采样的偏好对来增强模型的情境知识利用能力。同时,SoLo-RA明确鼓励在包含相同任务相关信息的短上下文和长上下文条件下,响应的奖励分数一致性。这有助于将模型处理短上下文的能力迁移到长上下文场景中。SoLoPO与主流的偏好优化算法兼容,同时显著提高了数据构建和训练过程的效率。实验结果表明,SoLoPO增强了所有这些算法在各种长上下文基准测试中的长度和领域泛化能力,同时在计算和内存效率方面取得了显著提升。

英文摘要

Despite advances in pretraining with extended context sizes, large language models (LLMs) still face challenges in effectively utilizing real-world long-context information, primarily due to insufficient long-context alignment caused by data quality issues, training inefficiencies, and the lack of well-designed optimization objectives. To address these limitations, we propose a framework named Short-to-Long Preference Optimization (SoLoPO), decoupling long-context preference optimization (PO) into two components: short-context PO and short-to-long reward alignment (SoLo-RA), supported by both theoretical and empirical evidence. Specifically, short-context PO leverages preference pairs sampled from short contexts to enhance the model's contextual knowledge utilization ability. Meanwhile, SoLo-RA explicitly encourages reward score consistency for the responses when conditioned on both short and long contexts that contain identical task-relevant information. This facilitates transferring the model's ability to handle short contexts into long-context scenarios. SoLoPO is compatible with mainstream preference optimization algorithms, while substantially improving the efficiency of data construction and training processes. Experimental results show that SoLoPO enhances all these algorithms with respect to stronger length and domain generalization abilities across various long-context benchmarks, while achieving notable improvements in both computational and memory efficiency.

1708.06233 2026-06-04 cs.AI cs.MA cs.SI econ.GN physics.soc-ph q-fin.EC 版本更新

Fake News in Social Networks

社交网络中的虚假新闻

Christoph Aymanns, Jakob Foerster, Co-Pierre Georg, Matthias Weber

发表机构 * University of St. Gallen(圣加仑大学) University of Oxford(牛津大学) Frankfurt School of Finance and Management(法兰克福金融与管理学院) Swiss Finance Institute(瑞士金融研究所)

AI总结 本文提出用多智能体强化学习建模社交网络中的虚假新闻,发现针对连接度高、私有信息较弱的人群的攻击更有效,将虚假信息分散到多个智能体比集中于少数智能体更有效,且虚假新闻在平衡网络中的传播弱于聚类网络;部分结论通过人类被试实验得到支持。

详情
AI中文摘要

我们提出将多智能体强化学习作为建模社交网络中虚假新闻的新方法。该方法既能刻画尚未接触过虚假新闻的人群,也能刻画已适应虚假新闻存在的人群在社交网络中的行为;后者对现有方法尤其具有挑战性。我们发现,如果虚假新闻攻击针对连接度高的人群和私有信息较弱的人群,则攻击更有效。将虚假信息分散到多个智能体,比以更高强度集中于少数智能体更有效。此外,虚假新闻在平衡网络中的传播效果不如在聚类网络中。我们在一项人类被试实验中检验了部分发现。实验证据支持模型的预测,表明该模型适合分析社交网络中的虚假新闻传播。

英文摘要

We propose multi-agent reinforcement learning as a new method for modeling fake news in social networks. This method allows us to model human behavior in social networks both in unaccustomed populations and in populations that have adapted to the presence of fake news. In particular, the latter is challenging for existing methods. We find that a fake-news attack is more effective if it targets highly connected people and people with weaker private information. Attacks are more effective when the disinformation is spread across several agents than when the disinformation is concentrated with more intensity on fewer agents. Furthermore, fake news spreads less well in balanced networks than in clustered networks. We test a part of our findings in a human-subject experiment. The experimental evidence provides support for the predictions from the model, suggesting that the model is suitable for analyzing the spread of fake news in social networks.

2510.08647 2026-06-04 cs.CL cs.AI 版本更新

Can Reasoning Path still be Effective as Input? Bridging Post-Reasoning to Chain-of-Thought Compression

推理路径作为输入仍然有效吗?将后推理与思维链压缩连接起来

Chengzhengxu Li, Xiaoming Liu, Zhaohan Zhang, Shengchao Liu, Guoxin Ma, Yu Lan, Cong Wang, Chao Shen

发表机构 * Faculty of Electronic and Information Engineering, Xi’an Jiaotong University(西安交通大学电子与信息工程学院) Queen Mary University of London(伦敦大学玛丽女王学院) City University of Hong Kong(香港城市大学)

AI总结 提出后推理范式,通过将思维链作为上下文输入来简化推理任务,并设计UCoT框架训练轻量级压缩器生成软令牌形式的上下文思维链,从而在保持推理能力的同时显著压缩输出长度。

Comments ACL 2026 Main Track

详情
AI中文摘要

近期发展使得大型语言模型(LLMs)能够通过长思维链(CoT)实现高级推理,但这是以牺牲推理效率为代价来换取性能。现有工作侧重于压缩推理过程中生成的CoT,但这会损害推导正确答案所需的信息。在这项工作中,我们提出后推理(post-reasoning)这一推理范式,将CoT作为上下文的一部分,以简化LLMs的推理任务。我们发现后推理显著减少了LLMs的生成长度,但其有效性取决于上下文CoT生成的效率和可靠性。因此,我们提出UCoT(Upfront CoT),一个用于CoT压缩的高效后推理框架。UCoT训练一个轻量级模型(压缩器)以软令牌形式提供上下文CoT,并训练LLM(执行器)利用此上下文CoT生成最终答案。大量实验表明,UCoT在保持执行器强大推理能力的同时,显著减少了CoT的长度。值得一提的是,当将UCoT应用于Qwen2.5-7B-Instruct模型时,在GSM8K数据集上的令牌使用量减少了50%,而性能比最先进(SOTA)方法高出3.08%。

英文摘要

Recent developments have enabled advanced reasoning in Large Language Models (LLMs) via long Chain-of-Thought (CoT), trading efficiency during inference for performance. Existing works focus on compressing the CoT generated during reasoning, which impairs the information necessary for deriving the correct answer. In this work, we propose post-reasoning, a reasoning paradigm that takes CoT as a part of the context to simplify the reasoning task for LLMs. We find that post-reasoning significantly reduces the generation length of LLMs, but its effectiveness hinges on the efficiency and the reliability of the contextual CoT generation. Therefore, we propose Upfront CoT (UCoT), an efficient post-reasoning framework for CoT compression. UCoT trains a lightweight model (compressor) to provide contextual CoT in the form of soft tokens and trains the LLM (executor) to leverage this contextual CoT for producing the final answer. Extensive experiments show that UCoT maintains the powerful reasoning ability of the executor while significantly reducing the length of CoT. It is worth mentioning that when applying UCoT to the Qwen2.5-7B-Instruct model, the usage of tokens on the GSM8K dataset is reduced by 50%, while the performance is 3.08% higher than that of the state-of-the-art (SOTA) method.

2510.03511 2026-06-04 cs.CV cs.AI cs.LG eess.IV 版本更新

Platonic Transformers: A Solid Choice For Equivariance

柏拉图式Transformer:等变性的坚实选择

Mohammad Mohaiminul Islam, Rishabh Anand, David R. Wessels, Friso de Kruiff, Thijs P. Kuipers, Rex Ying, Clara I. Sánchez, Sharvaree Vadgama, Georg Bökman, Erik J. Bekkers

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出Platonic Transformer,通过基于柏拉图立体对称群参考帧的注意力机制实现等变性,在不增加计算成本的前提下提升性能。

详情
AI中文摘要

尽管Transformer广泛应用,但缺乏科学和计算机视觉中常见几何对称性的归纳偏置。现有的等变方法往往通过复杂、计算密集的设计牺牲了Transformer的高效性和灵活性。我们引入Platonic Transformer来解决这一权衡。通过将注意力定义为相对于柏拉图立体对称群参考帧,我们的方法引入了一种有原则的权重共享方案。这使得模型能够同时对连续平移和柏拉图对称性保持等变,同时保留标准Transformer的精确架构和计算成本。此外,我们证明这种注意力在形式上等价于动态群卷积,这表明模型学习自适应几何滤波器,并实现高度可扩展的线性时间卷积变体。在计算机视觉(CIFAR-10)、3D点云(ScanObjectNN)和分子性质预测(QM9、OMol25)等多个基准测试中,Platonic Transformer通过利用这些几何约束以零额外成本取得了有竞争力的性能。

英文摘要

While widespread, Transformers lack inductive biases for geometric symmetries common in science and computer vision. Existing equivariant methods often sacrifice the efficiency and flexibility that make Transformers so effective through complex, computationally intensive designs. We introduce the Platonic Transformer to resolve this trade-off. By defining attention relative to reference frames from the Platonic solid symmetry groups, our method induces a principled weight-sharing scheme. This enables combined equivariance to continuous translations and Platonic symmetries, while preserving the exact architecture and computational cost of a standard Transformer. Furthermore, we show that this attention is formally equivalent to a dynamic group convolution, which reveals that the model learns adaptive geometric filters and enables a highly scalable, linear-time convolutional variant. Across diverse benchmarks in computer vision (CIFAR-10), 3D point clouds (ScanObjectNN), and molecular property prediction (QM9, OMol25), the Platonic Transformer achieves competitive performance by leveraging these geometric constraints at no additional cost.

2510.01902 2026-06-04 cs.AI cs.CL cs.LG 版本更新

Constrained Adaptive Rejection Sampling

约束自适应拒绝采样

Paweł Parys, Sairam Vaidya, Taylor Berg-Kirkpatrick, Loris D'Antoni

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出约束自适应拒绝采样(CARS),通过自适应剪枝无效前缀来提高拒绝采样的样本效率,同时保持无分布扭曲,在程序模糊测试和分子生成等任务中优于现有方法。

详情
AI中文摘要

语言模型(LMs)越来越多地应用于生成的输出必须满足严格语义或语法约束的场景。现有的约束生成方法处于一个谱系中:贪婪约束解码方法在解码过程中强制执行有效性,但扭曲了LM的分布;而拒绝采样(RS)保留了保真度,但通过丢弃无效输出浪费计算资源。在程序模糊测试等领域,样本的有效性和多样性都至关重要,这两种极端方法都有问题。我们提出约束自适应拒绝采样(CARS),一种严格提高RS样本效率且不产生分布扭曲的方法。CARS从无约束LM采样开始,通过将违反约束的续写记录在trie中并从后续抽取中减去其概率质量,自适应地排除它们。这种自适应剪枝确保已证明无效的前缀不会被重新访问,接受率单调提高,并且生成的样本精确遵循约束分布。在多个领域的实验(例如程序模糊测试和分子生成)中,CARS始终实现更高的效率(以每个有效样本的LM前向传递次数衡量),同时产生比GCD和近似LM分布的方法更强的样本多样性。

英文摘要

Language Models (LMs) are increasingly used in applications where generated outputs must satisfy strict semantic or syntactic constraints. Existing approaches to constrained generation fall along a spectrum: greedy constrained decoding methods enforce validity during decoding but distort the LM's distribution, while rejection sampling (RS) preserves fidelity but wastes computation by discarding invalid outputs. Both extremes are problematic in domains such as program fuzzing, where both validity and diversity of samples are essential. We present Constrained Adaptive Rejection Sampling (CARS), an approach that strictly improves the sample-efficiency of RS without distributional distortion. CARS begins with unconstrained LM sampling and adaptively rules out constraint-violating continuations by recording them in a trie and subtracting their probability mass from future draws. This adaptive pruning ensures that prefixes proven invalid are never revisited, acceptance rates improve monotonically, and the resulting samples exactly follow the constrained distribution. In experiments on a variety of domains (e.g., program fuzzing and molecular generation), CARS consistently achieves higher efficiency, measured in the number of LM forward passes per valid sample, while also producing stronger sample diversity than both GCD and methods that approximate the LM's distribution.
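The mechanism in the abstract (record invalid samples, subtract their probability mass from later draws) can be sketched on a toy problem. Everything concrete below is an assumption made for illustration: the "LM" is a uniform distribution over fixed-length binary strings, the constraint is a parity check, and a dictionary keyed by prefix tuples stands in for the trie. It is not the authors' implementation.

```python
import random

TOKENS = (0, 1)
LENGTH = 3  # toy "LM": fixed-length binary strings


def next_token_probs(prefix):
    """Hypothetical stand-in for an LM forward pass (uniform here)."""
    return {t: 0.5 for t in TOKENS}


def is_valid(seq):
    """Toy constraint: the sequence must contain an even number of ones."""
    return sum(seq) % 2 == 0


def draw(removed, rng):
    """Sample one sequence, skipping mass already proven invalid.

    `removed` maps a prefix to the total LM probability of recorded invalid
    sequences extending it. Returns (sequence, its original LM probability).
    """
    prefix, prefix_prob = (), 1.0
    while len(prefix) < LENGTH:
        probs = next_token_probs(prefix)
        # Remaining mass under each child = original mass minus pruned mass.
        weights = [max(prefix_prob * probs[t]
                       - removed.get(prefix + (t,), 0.0), 0.0)
                   for t in TOKENS]
        token = rng.choices(TOKENS, weights=weights)[0]
        prefix_prob *= probs[token]
        prefix += (token,)
    return prefix, prefix_prob


def sample_valid(n, seed=0):
    """Collect n valid samples; return them with the number of rejections."""
    rng = random.Random(seed)
    removed, accepted, rejections = {}, [], 0
    while len(accepted) < n:
        seq, prob = draw(removed, rng)
        if is_valid(seq):
            accepted.append(seq)
            continue
        rejections += 1
        # Record the invalid sequence along its whole path in the "trie".
        for i in range(1, len(seq) + 1):
            removed[seq[:i]] = removed.get(seq[:i], 0.0) + prob
    return accepted, rejections


samples, num_rejections = sample_valid(200)
```

In this toy setting there are only four invalid strings, so the sampler can be rejected at most four times no matter how many valid samples are requested, while plain rejection sampling would keep rejecting about half of all draws. Because each child is sampled in proportion to its remaining unpruned mass, accepted samples remain distributed according to the original distribution restricted to valid strings.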

2505.22988 2026-06-04 cs.LG cs.AI 版本更新

Model-Preserving Adaptive Rounding

模型保持的自适应舍入

Albert Tseng, Zhaofeng Sun, Christopher De Sa

发表机构 * Department of Computer Science, Cornell University(康奈尔大学计算机科学系)

AI总结 提出一种直接考虑网络输出误差的自适应舍入量化算法YAQA,通过理论分析给出首个端到端误差界,并利用Kronecker分解近似Hessian矩阵,在无推理开销下实现优于GPTQ/LDLQ约30%的误差降低。

Comments ICML 2026

详情
AI中文摘要

量化的目标是生成一个压缩模型,其输出分布尽可能接近原始模型。为了可处理地实现这一点,大多数量化算法最小化每层的即时激活误差作为端到端误差的代理。然而,这忽略了未来层的影响,使其成为一个较差的代理。在这项工作中,我们提出了Yet Another Quantization Algorithm(YAQA),一种直接考虑网络输出误差的自适应舍入算法。YAQA引入了一系列理论结果,最终给出了量化算法的首个端到端误差界。首先,我们通过Hessian近似的结构刻画了自适应舍入算法的收敛时间。然后,我们证明端到端误差可以通过近似与真实Hessian的余弦相似度来界定。这允许一种自然的Kronecker分解近似,并具有相应的近最优Hessian草图。YAQA可证明地优于GPTQ/LDLQ,并在经验上比这些方法减少约30%的误差。YAQA甚至实现了比量化感知训练更低的误差。这转化为下游任务上的最先进性能,同时不增加推理开销。

英文摘要

The goal of quantization is to produce a compressed model whose output distribution is as close to the original model's as possible. To do this tractably, most quantization algorithms minimize the immediate activation error of each layer as a proxy for the end-to-end error. However, this ignores the effect of future layers, making it a poor proxy. In this work, we introduce Yet Another Quantization Algorithm (YAQA), an adaptive rounding algorithm that directly considers the error at the network's output. YAQA introduces a series of theoretical results that culminate in the first end-to-end error bounds for quantization algorithms. First, we characterize the convergence time of adaptive rounding algorithms via the structure of their Hessian approximations. We then show that the end-to-end error can be bounded by the approximation's cosine similarity to the true Hessian. This admits a natural Kronecker-factored approximation with corresponding near-optimal Hessian sketches. YAQA is provably better than GPTQ/LDLQ and empirically reduces the error by approximately 30% over these methods. YAQA even achieves a lower error than quantization-aware training. This translates to state-of-the-art performance on downstream tasks, all while adding no inference overhead.

2509.15676 2026-06-04 cs.LG cs.AI cs.CL 版本更新

KITE: Kernelized and Information Theoretic Exemplars for In-Context Learning

KITE: 基于核方法和信息论的上下文学习示例选择

Vaibhav Singh, Soumya Suvra Ghosal, Kapu Nirmal Joshua, Soumyabrata Pal, Sayak Ray Chowdhury

发表机构 * IIT Bombay(印度理工学院孟买分校) UMD College Park(马里兰大学帕克分校) IIT Kanpur(印度理工学院坎普尔分校) Adobe Research(Adobe研究院)

AI总结 针对上下文学习中的示例选择问题,提出一种基于信息论和核方法的贪心算法,通过最小化查询特定预测误差并引入多样性正则化,显著提升分类性能。

详情
AI中文摘要

上下文学习(ICL)已成为一种强大的范式,通过仅使用提示中精心选择的少量任务特定示例,使大型语言模型(LLM)适应新的、数据稀缺的任务。然而,鉴于LLM有限的上下文大小,一个基本问题出现了:应选择哪些示例以最大化给定用户查询的性能?虽然基于最近邻的方法(如KATE)已被广泛用于此目的,但它们在高维嵌入空间中存在众所周知的缺点,包括泛化能力差和缺乏多样性。在这项工作中,我们从原则性的、信息论驱动的角度研究ICL中的示例选择问题。我们首先将LLM建模为输入嵌入上的线性函数,并将示例选择任务框架化为一个查询特定的优化问题:从较大的示例库中选择一个子集,以最小化特定查询上的预测误差。这种表述通过针对特定查询实例的准确预测,偏离了传统的以泛化为中心的学习理论方法。我们推导出一个原则性的代理目标,该目标是近似子模的,从而能够使用具有近似保证的贪心算法。我们通过(i)引入核技巧以在高维特征空间中操作而无需显式映射,以及(ii)引入基于最优设计的正则化项以鼓励所选示例的多样性,进一步增强了我们的方法。实验上,我们在多个分类任务上展示了相对于标准检索方法的显著改进,突出了在真实世界、标签稀缺场景中,结构感知、多样化的示例选择对ICL的益处。

英文摘要

In-context learning (ICL) has emerged as a powerful paradigm for adapting large language models (LLMs) to new and data-scarce tasks using only a few carefully selected task-specific examples presented in the prompt. However, given the limited context size of LLMs, a fundamental question arises: Which examples should be selected to maximize performance on a given user query? While nearest-neighbor-based methods like KATE have been widely adopted for this purpose, they suffer from well-known drawbacks in high-dimensional embedding spaces, including poor generalization and a lack of diversity. In this work, we study this problem of example selection in ICL from a principled, information theory-driven perspective. We first model an LLM as a linear function over input embeddings and frame the example selection task as a query-specific optimization problem: selecting a subset of exemplars from a larger example bank that minimizes the prediction error on a specific query. This formulation departs from traditional generalization-focused learning theoretic approaches by targeting accurate prediction for a specific query instance. We derive a principled surrogate objective that is approximately submodular, enabling the use of a greedy algorithm with an approximation guarantee. We further enhance our method by (i) incorporating the kernel trick to operate in high-dimensional feature spaces without explicit mappings, and (ii) introducing an optimal design-based regularizer to encourage diversity in the selected examples. Empirically, we demonstrate significant improvements over standard retrieval methods across a suite of classification tasks, highlighting the benefits of structure-aware, diverse example selection for ICL in real-world, label-scarce scenarios.

2509.10247 2026-06-04 cs.RO cs.AI 版本更新

DiffAero: A GPU-Accelerated Differentiable Simulation Framework for Efficient Quadrotor Policy Learning

DiffAero: 一种用于高效四旋翼策略学习的GPU加速可微分仿真框架

Xinhong Zhang, Runqing Wang, Yunfan Ren, Jian Sun, Hao Fang, Jie Chen, Gang Wang

发表机构 * State Key Lab of Autonomous Intelligent Unmanned Systems, Beijing Institute of Technology(自主智能无人系统国家重点实验室,北京理工大学) Zhongguancun Academy(中关村学院) Department of Mechanical Engineering, University of Hong Kong(香港大学机械工程系) Harbin Institute of Technology(哈尔滨工业大学)

AI总结 提出DiffAero,一种轻量级、GPU加速且完全可微的仿真框架,通过并行化物理与渲染实现高效四旋翼控制策略学习,并在消费级硬件上数小时内训练出鲁棒策略。

Comments 8 pages, 11 figures, 1 table

详情
AI中文摘要

本文介绍了DiffAero,一种轻量级、GPU加速且完全可微的仿真框架,专为高效的四旋翼控制策略学习而设计。DiffAero支持环境级和智能体级并行,并在统一的GPU原生训练接口中集成了多种动力学模型、可定制的传感器堆栈(IMU、深度相机和LiDAR)以及多样化的飞行任务。通过在GPU上完全并行化物理和渲染,DiffAero消除了CPU-GPU数据传输瓶颈,并在仿真吞吐量上实现了数量级的提升。与现有仿真器相比,DiffAero不仅提供高性能仿真,还作为探索可微和混合学习算法的研究平台。广泛的基准测试和真实世界飞行实验表明,DiffAero与混合学习算法相结合,可以在消费级硬件上数小时内学习到鲁棒的飞行策略。代码可在https://github.com/flyingbitac/diffaero获取。

英文摘要

This letter introduces DiffAero, a lightweight, GPU-accelerated, and fully differentiable simulation framework designed for efficient quadrotor control policy learning. DiffAero supports both environment-level and agent-level parallelism and integrates multiple dynamics models, customizable sensor stacks (IMU, depth camera, and LiDAR), and diverse flight tasks within a unified, GPU-native training interface. By fully parallelizing both physics and rendering on the GPU, DiffAero eliminates CPU-GPU data transfer bottlenecks and delivers orders-of-magnitude improvements in simulation throughput. In contrast to existing simulators, DiffAero not only provides high-performance simulation but also serves as a research platform for exploring differentiable and hybrid learning algorithms. Extensive benchmarks and real-world flight experiments demonstrate that DiffAero and hybrid learning algorithms combined can learn robust flight policies in hours on consumer-grade hardware. The code is available at https://github.com/flyingbitac/diffaero.

2509.08846 2026-06-04 cs.LG cs.AI stat.ML 版本更新

Uncertainty Estimation using Variance-Gated Distributions

使用方差门控分布的不确定性估计

H. Martin Gillis, Isaac Xu, Thomas Trappenberg

发表机构 * Faculty of Computer Science(计算机科学学院) Dalhousie University(达尔豪斯大学)

AI总结 提出基于类概率分布信噪比的方差门控不确定性估计框架,通过集成置信因子缩放预测,解决神经网络预测不确定性分解中的加性分解问题。

Comments NeurIPS Workshop: Mathematical Foundations and Operational Integration of Machine Learning for Uncertainty-Aware Decision-Making

详情
AI中文摘要

评估神经网络每个样本的不确定性量化对于涉及高风险应用的决策至关重要。一种常见的方法是使用贝叶斯或近似模型的预测分布,并将相应的预测不确定性分解为认知(模型相关)和偶然(数据相关)成分。然而,加性分解最近受到质疑。在这项工作中,我们提出了一个基于不同模型预测中类概率分布信噪比的不确定性估计和分解的直观框架。我们引入了一种方差门控度量,该度量通过从集成中导出的置信因子来缩放预测。我们使用这个度量来讨论委员会机器多样性崩溃的存在性。

英文摘要

Evaluation of per-sample uncertainty quantification from neural networks is essential for decision-making involving high-risk applications. A common approach is to use the predictive distribution from Bayesian or approximation models and decompose the corresponding predictive uncertainty into epistemic (model-related) and aleatoric (data-related) components. However, additive decomposition has recently been questioned. In this work, we propose an intuitive framework for uncertainty estimation and decomposition based on the signal-to-noise ratio of class probability distributions across different model predictions. We introduce a variance-gated measure that scales predictions by a confidence factor derived from ensembles. We use this measure to discuss the existence of a collapse in the diversity of committee machines.

2509.03351 2026-06-04 cs.LG cs.AI q-bio.QM 版本更新

epiGPTope: A machine learning-based epitope generator and classifier

epiGPTope: 一种基于机器学习的表位生成器和分类器

Natalia Flechas Manrique, Alberto Martínez, Elena López-Martínez, Luc Andrea, Román Orus, Aitor Manteca, Aitziber L. Cortajarena, Llorenç Espinosa-Portalés

发表机构 * Multiverse Computing(多维计算公司) Centre for Cooperative Research in Biomaterials (CIC biomaGUNE)(生物材料联合研究中心) Basque Research and Technology Alliance (BRTA)(巴斯克研究与技术联盟) Donostia International Physics Center(多诺斯蒂亚国际物理中心) Ikerbasque Foundation for Science(Ikerbasque科学基金会)

AI总结 提出基于大型语言模型epiGPTope,通过预训练和微调直接生成新型表位序列,并结合统计分类器预测表位来源(细菌或病毒),以加速合成表位库的构建和筛选。

Comments 11 pages, 4 figures. Supplementary Information with 5 pages, 4 figures

详情
Journal ref
ACS Synthetic Biology 2026 15 (2), 631-642
AI中文摘要

表位是能被抗体或免疫细胞受体识别的短抗原肽序列,对免疫疗法、疫苗和诊断的开发至关重要。然而,由于巨大的组合序列空间(n个氨基酸的线性表位有$20^n$种组合),即使采用高通量实验技术,合成表位库的合理设计也极具挑战。在本研究中,我们提出了一种大型语言模型epiGPTope,该模型在蛋白质数据上预训练,并专门针对线性表位进行微调,首次能够直接生成新型表位样序列,这些序列被发现具有与已知表位相似的统计特性。这种生成方法可用于制备表位候选序列库。我们进一步训练统计分类器来预测表位序列是细菌来源还是病毒来源,从而缩小候选库范围,提高识别特定表位的可能性。我们提出,这种生成模型与预测模型的组合有助于表位发现。该方法仅使用线性表位的一级氨基酸序列,无需几何框架或手工特征。通过开发生成生物学可行序列的方法,我们预期能更快、更经济地生成和筛选合成表位,并在新生物技术开发中具有相关应用。

英文摘要

Epitopes are short antigenic peptide sequences that are recognized by antibodies or immune cell receptors. They are central to the development of immunotherapies, vaccines, and diagnostics. However, the rational design of synthetic epitope libraries is challenging due to the large combinatorial sequence space ($20^n$ combinations for linear epitopes of n amino acids), which makes screening and testing infeasible, even with high-throughput experimental techniques. In this study, we present a large language model, epiGPTope, pre-trained on protein data and specifically fine-tuned on linear epitopes, which for the first time can directly generate novel epitope-like sequences; these are found to possess statistical properties analogous to those of known epitopes. This generative approach can be used to prepare libraries of epitope candidate sequences. We further train statistical classifiers to predict whether an epitope sequence is of bacterial or viral origin, thus narrowing the candidate library and increasing the likelihood of identifying specific epitopes. We propose that such a combination of generative and predictive models can be of assistance in epitope discovery. The approach uses only primary amino acid sequences of linear epitopes, bypassing the need for a geometric framework or hand-crafted features of the sequences. By developing a method to create biologically feasible sequences, we anticipate faster and more cost-effective generation and screening of synthetic epitopes, with relevant applications in the development of new biotechnologies.

2508.14623 2026-06-04 eess.AS cs.AI cs.SD 版本更新

A Study of the Scale Invariant Signal to Distortion Ratio in Speech Separation with Noisy References

带噪参考下语音分离中尺度不变信失真比的研究

Simon Dahl Jepsen, Mads Græsbøll Christensen, Jesper Rindom Jensen

发表机构 * European Union(欧洲联盟)

AI总结 本文研究了在训练参考包含噪声时,使用尺度不变信号失真比作为评估和训练目标的影响,提出通过增强参考和混合数据来避免学习噪声参考,实验表明可减少分离语音中的噪声但可能引入伪影。

Comments Accepted for IEEE ASRU 2025, Workshop on Automatic Speech Recognition and Understanding. Copyright (c) 2025 IEEE. 8 pages, 6 figures, 2 tables

详情
Journal ref
2025 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Honolulu, HI, USA, 2025, pp. 1-8
AI中文摘要

本文研究了在监督语音分离中,当训练参考包含噪声时(如事实上的基准WSJ0-2Mix),使用尺度不变信号失真比(SI-SDR)作为评估和训练目标的影响。对带噪参考的SI-SDR推导表明,噪声限制了可实现的SI-SDR,或导致分离输出中出现不希望的噪声。为了解决这个问题,提出了一种增强参考并用WHAM!扩充混合数据的方法,旨在训练避免学习噪声参考的模型。使用非侵入式NISQA.v2指标评估了在这些增强数据集上训练的两个模型。结果显示分离语音中的噪声减少,但表明处理参考可能引入伪影,限制了整体质量提升。在WSJ0-2Mix和Libri2Mix测试集上,各模型的SI-SDR与感知噪声之间存在负相关,这印证了推导的结论。

英文摘要

This paper examines the implications of using the Scale-Invariant Signal-to-Distortion Ratio (SI-SDR) as both evaluation and training objective in supervised speech separation, when the training references contain noise, as is the case with the de facto benchmark WSJ0-2Mix. A derivation of the SI-SDR with noisy references reveals that noise limits the achievable SI-SDR, or leads to undesired noise in the separated outputs. To address this, a method is proposed to enhance references and augment the mixtures with WHAM!, aiming to train models that avoid learning noisy references. Two models trained on these enhanced datasets are evaluated with the non-intrusive NISQA.v2 metric. Results show reduced noise in separated speech but suggest that processing references may introduce artefacts, limiting overall quality gains. Negative correlation is found between SI-SDR and perceived noisiness across models on the WSJ0-2Mix and Libri2Mix test sets, underlining the conclusion from the derivation.
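For readers unfamiliar with the metric, the following is a minimal sketch of the standard SI-SDR definition: the estimate is projected onto the reference, and the energy of that scaled reference is compared with the energy of the residual. The function name and the synthetic signals are assumptions for illustration; the paper's own derivation with noisy references is not reproduced here.

```python
import math


def si_sdr(reference, estimate):
    """Scale-invariant signal-to-distortion ratio in dB.

    alpha = <estimate, reference> / ||reference||^2 rescales the reference,
    so multiplying the estimate by a constant leaves the score unchanged.
    A residual of exactly zero would cause a division by zero.
    """
    dot = sum(r * e for r, e in zip(reference, estimate))
    ref_energy = sum(r * r for r in reference)
    alpha = dot / ref_energy
    target = [alpha * r for r in reference]
    residual = [e - t for e, t in zip(estimate, target)]
    target_energy = sum(t * t for t in target)
    residual_energy = sum(x * x for x in residual)
    return 10.0 * math.log10(target_energy / residual_energy)


# Synthetic signals (made up): a clean source and an additive disturbance.
clean = [math.sin(0.3 * i) for i in range(100)]
noise = [0.1 * math.cos(1.7 * i) for i in range(100)]
noisy_reference = [c + n for c, n in zip(clean, noise)]
estimate = [c + 0.5 * n for c, n in zip(clean, noise)]

score_vs_clean = si_sdr(clean, estimate)
score_of_clean_vs_noisy_ref = si_sdr(noisy_reference, clean)
```

The last line scores a perfectly clean signal against a noisy reference. The result is finite, because the clean signal differs from the reference by exactly the reference's noise. This is one simple way to see the ceiling that the abstract attributes to noisy references.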

2508.01815 2026-06-04 cs.CL cs.AI 版本更新

From Graph Retrieval to Schema Realization: Counterfactual Validation for Text-to-SPARQL over Heterogeneous Knowledge Graphs

从图检索到模式实现:面向异构知识图谱的文本到SPARQL的反事实验证

Chengxiao Dai, Yue Xiu, Dusit Niyato

发表机构 * University of Bristol(布里斯托大学)

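AI summary Proposes SchemaForge, a framework that uses question-conditioned schema-slice alignment and counterfactual validation to improve the execution accuracy of text-to-SPARQL query generation over heterogeneous knowledge graphs.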

详情
AI中文摘要

文本到SPARQL将自然语言问题映射为RDF知识图谱上的可执行SPARQL查询。标准评估通常预先固定目标图,但实际知识图谱问答(KGQA)可能涉及具有不同模式、部分对齐和不完整元数据的异构图集合。在此设置下,查询生成不仅依赖于SPARQL语法:系统必须识别能够支持问题所需的谓词、实体类型、连接、过滤器和约束的图模式。我们提出SchemaForge,一个面向异构KG集合的文本到SPARQL的基于模式的智能体框架。其核心机制是问题条件化的模式切片对齐:弱图证据首先识别可能的图,而更强的模式证据确定局部模式切片能否实现预期查询。选定的模式切片随后在执行前约束查询生成和验证。当仅有一个图可用时,该公式简化为带有模式基础的标准单KG文本到SPARQL。我们在LC-QuAD 2.0、QALD-9 Plus、QALD-10和Spider4SPARQL上评估SchemaForge。在四个公开基准上,SchemaForge相比最强匹配的智能体基线平均提高执行准确率11.50个百分点。在Spider4SPARQL上,SchemaForge将执行准确率从54.86%提升至64.18%,并达到73.0%的Top-1和97.0%的Top-3图分配准确率。这些结果表明,从弱图证据转向模式特定的查询承诺,结合反事实答案集检查,改进了异构知识图谱上的可执行查询生成。

英文摘要

Text-to-SPARQL maps natural-language questions to executable SPARQL queries over RDF knowledge graphs. While standard evaluations often fix the target graph in advance, practical knowledge graph question answering (KGQA) may involve heterogeneous graph collections with different schemas, partial alignments, and incomplete metadata. In this setting, query generation depends on more than SPARQL syntax: the system must identify a graph schema that can support the predicates, entity types, joins, filters, and constraints required by the question. We present SchemaForge, a schema-grounded agentic framework for text-to-SPARQL over heterogeneous KG collections. Its central mechanism is question-conditioned schema-slice alignment: weak graph evidence first identifies plausible graphs, while stronger schema evidence determines whether a local schema slice can realize the intended query. The selected schema slice then constrains query generation and verification before execution. When only one graph is available, the same formulation reduces to standard single-KG text-to-SPARQL with schema grounding. We evaluate SchemaForge on LC-QuAD 2.0, QALD-9 Plus, QALD-10, and Spider4SPARQL. Across the four public benchmarks, SchemaForge improves execution accuracy over the strongest matched agent baseline by 11.50 percentage points on average. On Spider4SPARQL, SchemaForge improves execution accuracy from 54.86% to 64.18% and achieves 73.0% Top-1 and 97.0% Top-3 graph allocation accuracy. These results show that moving from weak graph evidence to schema-specific query commitments, together with counterfactual answer-set checks, improves executable query generation over heterogeneous knowledge graphs.

2507.21638 2026-06-04 cs.AI cs.LG cs.MA cs.RO 版本更新

Assistax: A Multi-Agent Hardware-Accelerated Reinforcement Learning Benchmark for Assistive Robotics

Assistax: 一个用于辅助机器人的多智能体硬件加速强化学习基准

Leonard Hinckeldey, Elliot Fosong, Rimvydas Rubavicius, Elle Miller, Trevor McInroe, Fan Zhang, Patricia Wollstadt, Stefano V. Albrecht, Subramanian Ramamoorthy

发表机构 * University of California, Berkeley(加州大学伯克利分校) Stanford University(斯坦福大学)

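AI summary Proposes the Assistax benchmark, which combines JAX hardware acceleration with assistive-robotics tasks framed as multi-agent reinforcement learning, achieving speed-ups of up to 370x and enabling tests of a robot's zero-shot coordination capabilities.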

Comments Accepted at the Reinforcement Learning Conference 2026

详情
AI中文摘要

强化学习(RL)算法的发展在很大程度上受到具有挑战性的任务和基准的推动。游戏在RL基准中占据主导地位,因为它们呈现了相关的挑战,运行成本低且易于理解。虽然围棋和Atari等游戏带来了许多突破,但它们通常不能直接转化为现实世界的具身应用。在认识到需要多样化RL基准并解决具身交互场景中出现的复杂性的情况下,我们引入了Assistax:一个旨在解决辅助机器人任务中出现的挑战的开源基准。Assistax利用JAX的硬件加速,在基于物理的模拟中实现显著的学习加速。在开环挂钟时间方面,Assistax在向量化训练运行时比基于CPU的替代方案快高达370倍。Assistax使用多智能体RL将辅助机器人与活跃的人类患者之间的交互概念化,以训练一群多样化的伙伴智能体,从而可以测试具身机器人智能体的零样本协调能力。对流行的连续控制RL和MARL算法进行的广泛评估和超参数调优提供了可靠的基线,并将Assistax确立为推进辅助机器人RL研究的实用基准。代码可在以下网址获取:https://github.com/assistive-autonomy/assistax。

英文摘要

The development of reinforcement learning (RL) algorithms has been largely driven by ambitious challenge tasks and benchmarks. Games have dominated RL benchmarks because they present relevant challenges, are inexpensive to run and easy to understand. While games such as Go and Atari have led to many breakthroughs, they often do not directly translate to real-world embodied applications. In recognising the need to diversify RL benchmarks and addressing complexities that arise in embodied interaction scenarios, we introduce Assistax: an open-source benchmark designed to address challenges arising in assistive robotics tasks. Assistax uses JAX's hardware acceleration for significant speed-ups for learning in physics-based simulations. In terms of open-loop wall-clock time, Assistax runs up to $370\times$ faster when vectorising training runs compared to CPU-based alternatives. Assistax conceptualises the interaction between an assistive robot and an active human patient using multi-agent RL to train a population of diverse partner agents against which an embodied robotic agent's zero-shot coordination capabilities can be tested. Extensive evaluation and hyperparameter tuning for popular continuous control RL and MARL algorithms provide reliable baselines and establish Assistax as a practical benchmark for advancing RL research for assistive robotics. The code is available at: https://github.com/assistive-autonomy/assistax.

2506.05233 2026-06-04 cs.LG cs.AI cs.CL 版本更新

MesaNet: Sequence Modeling by Locally Optimal Test-Time Training

MesaNet: 通过局部最优测试时训练进行序列建模

Johannes von Oswald, Nino Scherrer, Seijin Kobayashi, Luca Versari, Songlin Yang, Sarthak Mittal, Maximilian Schlegel, Kaitlin Maile, Yanick Schimpf, Oliver Sieberling, Alexander Meulemans, Rif A. Saurous, Guillaume Lajoie, Charlotte Frenkel, Razvan Pascanu, Blaise Agüera y Arcas, João Sacramento

发表机构 * Google(谷歌) Paradigms of Intelligence Team(智能范式团队) Google DeepMind(谷歌DeepMind) MIT CSAIL(麻省理工学院CSAIL)

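AI summary Proposes a Mesa layer that performs locally optimal test-time training with a conjugate gradient solver, reaching lower language-modeling perplexity and higher downstream benchmark performance than previous RNN models at the cost of additional FLOPs at inference time.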

Comments Published at ICLR 2026

详情
AI中文摘要

序列建模目前主要由使用softmax自注意力的因果Transformer架构主导。尽管被广泛采用,Transformer在推理时需要线性扩展内存和计算。最近一系列工作将softmax操作线性化,产生了具有恒定内存和计算成本的强大循环神经网络模型,如DeltaNet、Mamba或xLSTM。这些模型可以通过注意到其循环层动态都源于上下文回归目标(通过在线学习规则近似优化)来统一。在此,我们加入这一系列工作,引入最近提出的Mesa层(von Oswald等人,2024)的一个数值稳定、可分块并行化的版本,该层原本只能顺序运行,因此不可扩展。该层同样源于上下文损失,但现在使用快速共轭梯度求解器在每个时间点将其最小化至最优。通过一系列扩展到十亿参数规模的实验,我们表明最优测试时训练使得语言建模困惑度更低,下游基准性能优于之前的RNN,尤其是在需要长上下文理解的任务上。这一性能提升以推理时额外浮点运算为代价。因此,我们的结果与最近增加测试时计算以提高性能的趋势有趣地相关——这里通过花费计算在神经网络内部解决序列优化问题来实现。

英文摘要

Sequence modeling is currently dominated by causal transformer architectures that use softmax self-attention. Although widely adopted, transformers require memory and compute that scale linearly with sequence length during inference. A recent stream of work linearized the softmax operation, resulting in powerful recurrent neural network (RNN) models with constant memory and compute costs such as DeltaNet, Mamba or xLSTM. These models can be unified by noting that their recurrent layer dynamics can all be derived from an in-context regression objective, approximately optimized through an online learning rule. Here, we join this line of work and introduce a numerically stable, chunkwise parallelizable version of the recently proposed Mesa layer (von Oswald et al., 2024), which could only run sequentially in time and was therefore not scalable. This layer again stems from an in-context loss, which is now minimized to optimality at every time point using a fast conjugate gradient solver. Through an extensive suite of experiments up to the billion-parameter scale, we show that optimal test-time training enables reaching lower language modeling perplexity and higher downstream benchmark performance than previous RNNs, especially on tasks requiring long context understanding. This performance gain comes at the cost of additional flops spent during inference time. Our results are therefore intriguingly related to recent trends of increasing test-time compute to improve performance -- here by spending compute to solve sequential optimization problems within the neural network itself.
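
The abstract describes minimizing an in-context regression loss with a conjugate gradient solver. As a rough illustration of that ingredient only, the sketch below runs textbook conjugate gradient on a small regularized system built from key vectors. It is not the paper's chunkwise-parallel layer; the names `keys`, `lam`, and `query`, and the specific system (sum of key outer products plus `lam` times the identity), are assumptions made for the example.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def matvec(a: Matrix, x: Vector) -> Vector:
    return [dot(row, x) for row in a]


def conjugate_gradient(a: Matrix, b: Vector, max_iters: int = 50,
                       tol: float = 1e-20) -> Vector:
    """Solve a @ x = b for a symmetric positive-definite matrix a."""
    x = [0.0] * len(b)
    r = list(b)            # residual b - a @ x, with x = 0
    p = list(r)            # current search direction
    rs_old = dot(r, r)
    for _ in range(max_iters):
        if rs_old <= tol:  # squared residual norm small enough
            break
        ap = matvec(a, p)
        alpha = rs_old / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs_old) * pi for ri, pi in zip(r, p)]
        rs_old = rs_new
    return x


# Hypothetical toy inputs: two 2-d keys, a ridge term, and a query.
keys = [[1.0, 0.0], [1.0, 1.0]]
lam = 1.0
query = [1.0, 2.0]

dim = len(query)
# System matrix: sum of outer products k k^T plus lam * I.
system = [[sum(k[i] * k[j] for k in keys) + (lam if i == j else 0.0)
           for j in range(dim)] for i in range(dim)]
solution = conjugate_gradient(system, query)
print(solution)
```

Because the system matrix is symmetric positive definite whenever `lam` is positive, conjugate gradient converges in at most `dim` iterations in exact arithmetic, which is what makes it attractive as an inner solver.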

2505.19293 2026-06-04 cs.CL cs.AI cs.LG 版本更新

100-LongBench: Are de facto Long-Context Benchmarks Literally Evaluating Long-Context Ability?

100-LongBench:事实上的长上下文基准是否真的在评估长上下文能力?

Wang Yang, Hongye Jin, Shaochen Zhong, Song Jiang, Qifan Wang, Vipin Chaudhary, Xiaotian Han

发表机构 * Case Western Reserve University(凯斯西储大学) Texas A&M University(德克萨斯A&M大学) Rice University(莱斯大学) University of California, Los Angeles(加州大学洛杉矶分校) Meta(Meta公司)

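AI summary Addresses two problems with existing long-context benchmarks, namely that they cannot separate baseline ability from true long-context ability and that they use fixed input lengths, by proposing a length-controllable long-context benchmark and a new metric for evaluating the long-context capability of large language models.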

详情
AI中文摘要

长上下文能力被认为是LLM最重要的能力之一,因为真正具备长上下文能力的LLM使用户能够轻松处理许多原本繁琐的任务——例如,阅读长文档寻找答案与直接询问LLM。然而,现有的基于真实任务的长上下文评估基准有两个主要缺陷。首先,像LongBench这样的基准通常没有提供适当的指标来将长上下文性能与模型的基线能力分开,使得跨模型比较不清晰。其次,此类基准通常以固定输入长度构建,这限制了它们在不同模型上的适用性,并且无法揭示模型何时开始崩溃。为了解决这些问题,我们引入了一个长度可控的长上下文基准和一个新颖的指标,该指标将基线知识与真实的长上下文能力解耦。实验证明了我们的方法在有效评估LLM方面的优越性。

英文摘要

Long-context capability is considered one of the most important abilities of LLMs, as a truly long context-capable LLM enables users to effortlessly process many originally exhausting tasks -- e.g., digesting a long-form document to find answers vs. directly asking an LLM about it. However, existing real-task-based long-context evaluation benchmarks have two major shortcomings. First, benchmarks like LongBench often do not provide proper metrics to separate long-context performance from the model's baseline ability, making cross-model comparison unclear. Second, such benchmarks are usually constructed with fixed input lengths, which limits their applicability across different models and fails to reveal when a model begins to break down. To address these issues, we introduce a length-controllable long-context benchmark and a novel metric that disentangles baseline knowledge from true long-context capabilities. Experiments demonstrate the superiority of our approach in effectively evaluating LLMs.

2505.17315 2026-06-04 cs.AI cs.CL cs.LG 版本更新

Longer Context, Deeper Thinking: Uncovering the Role of Long-Context Ability in Reasoning

更长上下文,更深思考:揭示长上下文能力在推理中的作用

Wang Yang, Zirui Liu, Hongye Jin, Qingyu Yin, Vipin Chaudhary, Xiaotian Han

发表机构 * Case Western Reserve University(凯斯西储大学) University of Minnesota - Twin Cities(明尼苏达大学双城分校) Texas A&M University(德克萨斯A&M大学)

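AI summary Finds experimentally that enhancing a model's long-context ability before supervised fine-tuning significantly improves reasoning performance, with gains that carry over even to short-input tasks, suggesting that long-context modeling is a key foundation for reasoning.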

详情
AI中文摘要

近期语言模型展现出强大的推理能力,但长上下文能力对推理的影响仍未充分探索。在本工作中,我们假设当前推理能力的局限性部分源于长上下文能力不足,这一假设基于经验观察:(1)更高的上下文窗口长度通常带来更强的推理性能,(2)失败的推理案例与失败的长上下文案例相似。为验证这一假设,我们检验了在监督微调(SFT)前增强模型的长上下文能力是否能提升推理性能。具体而言,我们比较了架构和微调数据相同但长上下文能力不同的模型。结果揭示了一致趋势:长上下文能力更强的模型在SFT后,在推理基准上取得了显著更高的准确率。值得注意的是,即使在输入长度较短的任务上,这些增益也持续存在,表明长上下文训练为推理性能提供了可泛化的益处。这些发现表明,长上下文建模不仅对处理长输入至关重要,而且也是推理的关键基础。我们主张将长上下文能力作为未来语言模型设计的首要目标。

英文摘要

Recent language models exhibit strong reasoning capabilities, yet the influence of long-context capacity on reasoning remains underexplored. In this work, we hypothesize that current limitations in reasoning stem, in part, from insufficient long-context capacity, motivated by empirical observations such as (1) higher context window length often leads to stronger reasoning performance, and (2) failed reasoning cases resemble failed long-context cases. To test this hypothesis, we examine whether enhancing a model's long-context ability before Supervised Fine-Tuning (SFT) leads to improved reasoning performance. Specifically, we compared models with identical architectures and fine-tuning data but varying levels of long-context capacity. Our results reveal a consistent trend: models with stronger long-context capacity achieve significantly higher accuracy on reasoning benchmarks after SFT. Notably, these gains persist even on tasks with short input lengths, indicating that long-context training offers generalizable benefits for reasoning performance. These findings suggest that long-context modeling is not just essential for processing lengthy inputs, but also serves as a critical foundation for reasoning. We advocate for treating long-context capacity as a first-class objective in the design of future language models.

2504.15587 2026-06-04 cs.LG cs.AI 版本更新

MetaMolGen: A Neural Graph Motif Generation Model for De Novo Molecular Design

MetaMolGen: 一种用于从头分子设计的神经图基序生成模型

Zimo Yan, Jie Zhang, Zheng Xie, Chang Liu, Yizhen Liu, Yiping Song

发表机构 * National University of Defense Technology(国防科技大学)

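AI summary Proposes MetaMolGen, a meta-learning-based molecular generation model that standardizes the distribution of graph motifs and uses a lightweight autoregressive sequence model to achieve few-shot and property-conditioned molecular generation.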

详情
AI中文摘要

分子生成在药物发现和材料科学中扮演重要角色,尤其是在数据稀缺场景下,传统生成模型往往难以实现令人满意的条件泛化。为应对这一挑战,我们提出MetaMolGen,一种基于一阶元学习的分子生成器,专为少样本和属性条件分子生成而设计。MetaMolGen通过将图基序映射到标准化潜在空间来标准化其分布,并采用轻量级自回归序列模型生成忠实反映底层分子结构的SMILES序列。此外,它通过集成到生成过程中的可学习属性投影器,支持具有目标属性的分子的条件生成。实验结果表明,MetaMolGen在低数据条件下持续生成有效且多样的SMILES序列,优于传统基线。这突显了其在快速适应和高效条件生成方面的优势,适用于实际分子设计。

英文摘要

Molecular generation plays an important role in drug discovery and materials science, especially in data-scarce scenarios where traditional generative models often struggle to achieve satisfactory conditional generalization. To address this challenge, we propose MetaMolGen, a first-order meta-learning-based molecular generator designed for few-shot and property-conditioned molecular generation. MetaMolGen standardizes the distribution of graph motifs by mapping them to a normalized latent space, and employs a lightweight autoregressive sequence model to generate SMILES sequences that faithfully reflect the underlying molecular structure. In addition, it supports conditional generation of molecules with target properties through a learnable property projector integrated into the generative process. Experimental results demonstrate that MetaMolGen consistently generates valid and diverse SMILES sequences under low-data regimes, outperforming conventional baselines. This highlights its advantage in fast adaptation and efficient conditional generation for practical molecular design.

2504.12329 2026-06-04 cs.CL cs.AI 版本更新

Speculative Thinking: Enhancing Small-Model Reasoning with Large Model Guidance at Inference Time

推测性思考:在推理时利用大模型指导增强小模型推理能力

Wang Yang, Xiang Yue, Vipin Chaudhary, Xiaotian Han

发表机构 * Case Western Reserve University(凯斯西储大学) Carnegie Mellon University(卡内基梅隆大学)

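AI summary Proposes Speculative Thinking, a training-free framework in which a large reasoning model guides a smaller model at the reasoning level, raising the smaller model's reasoning accuracy while shortening its output.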

详情
AI中文摘要

近期进展利用后训练来增强模型推理性能,这通常需要昂贵的训练流程,并且仍然存在低效、输出过长的问题。我们提出推测性思考,一种无需训练的框架,使大推理模型在推理层面引导小模型进行推理,区别于在词元层面操作的推测解码。我们的方法基于两个观察:(1)支持推理的词元(如“wait”)经常出现在结构分隔符(如“\n\n”)之后,作为反思或继续的信号;(2)大模型对反思行为有更强的控制,减少不必要的回溯同时提高推理质量。通过策略性地将反思步骤委托给能力更强的模型,我们的方法显著提升了推理模型的推理准确率,同时缩短了输出。在32B推理模型的辅助下,1.5B模型在MATH500上的准确率从83.2%提升至89.4%,实现了6.2%的大幅提升。同时,平均输出长度从5439个词元减少到4583个词元,下降了15.7%。此外,当应用于非推理模型(Qwen-2.5-7B-Instruct)时,我们的框架在相同基准上将准确率从74.0%提升至81.8%,实现了7.8%的相对提升。

英文摘要

Recent advances leverage post-training to enhance model reasoning performance, which typically requires costly training pipelines and still suffers from inefficient, overly lengthy outputs. We introduce Speculative Thinking, a training-free framework that enables large reasoning models to guide smaller ones during inference at the reasoning level, distinct from speculative decoding, which operates at the token level. Our approach is based on two observations: (1) reasoning-supportive tokens such as "wait" frequently appear after structural delimiters like "\n\n", serving as signals for reflection or continuation; and (2) larger models exhibit stronger control over reflective behavior, reducing unnecessary backtracking while improving reasoning quality. By strategically delegating reflective steps to a more capable model, our method significantly boosts the reasoning accuracy of reasoning models while shortening their output. With the assistance of the 32B reasoning model, the 1.5B model's accuracy on MATH500 increases from 83.2% to 89.4%, an improvement of 6.2 percentage points. Simultaneously, the average output length is reduced from 5439 tokens to 4583 tokens, a 15.7% decrease. Moreover, when applied to a non-reasoning model (Qwen-2.5-7B-Instruct), our framework boosts its accuracy from 74.0% to 81.8% on the same benchmark, an absolute improvement of 7.8 percentage points.

2405.08036 2026-06-04 cs.LG cs.AI 版本更新

Potentially Optimal Joint Actions Recognition for Cooperative Multi-Agent Reinforcement Learning

合作多智能体强化学习中潜在最优联合动作识别

Chang Huang, Shatong Zhu, Junqiao Zhao, Hongtu Zhou, Di Zhang, Hai Zhang, Chen Ye, Ziqiao Wang, Guang Chen

发表机构 * School of Computer Science and Technology, Tongji University(同济大学计算机科学与技术学院) Stanford University(斯坦福大学) MOE Key Lab of Embedded System and Service Computing, Tongji University, Shanghai, China(同济大学嵌入式系统与服务计算教育部重点实验室,上海,中国) The University of Hong Kong(香港大学) Shanghai Innovation Institute(上海创新研究院)

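AI summary Addresses the limited expressiveness caused by monotonicity constraints in value function factorization by proposing Potentially Optimal Joint Actions Weighting (POW), whose iterative weighted training guarantees recovery of the optimal policy and which outperforms existing methods on multiple tasks.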

Comments ICLR 2026

详情
Journal ref
ICLR 2026
AI中文摘要

值函数分解在合作多智能体强化学习(MARL)中被广泛使用。现有方法通常对联合动作值与个体动作值之间施加单调性约束以实现分散执行。然而,此类约束限制了值函数分解的表达能力,缩小了可表示的联合动作值范围,并阻碍了最优策略的学习。为解决这一问题,我们提出了潜在最优联合动作加权(POW)方法,该方法在现有近似加权策略可能失效的情况下确保最优策略恢复。POW通过一个理论上有依据的迭代加权训练过程,迭代地识别潜在最优联合动作并为其分配更高的训练权重。我们证明该机制保证了真实最优策略的恢复,克服了先前启发式加权策略的局限性。POW是架构无关的,可以无缝集成到现有的值函数分解算法中。在矩阵博弈、难度增强的捕食者-猎物任务、SMAC、SMACv2以及高速公路环境交叉口场景上的大量实验表明,POW显著提升了稳定性,并持续超越了最先进的基于值的MARL方法。

英文摘要

Value function factorization is widely used in cooperative multi-agent reinforcement learning (MARL). Existing approaches often impose monotonicity constraints between the joint action value and individual action values to enable decentralized execution. However, such constraints limit the expressiveness of value factorization, restricting the range of joint action values that can be represented and hindering the learning of optimal policies. To address this, we propose Potentially Optimal Joint Actions Weighting (POW), a method that ensures optimal policy recovery where existing approximate weighting strategies may fail. POW iteratively identifies potentially optimal joint actions and assigns them higher training weights through a theoretically grounded iterative weighted training process. We prove that this mechanism guarantees recovery of the true optimal policy, overcoming the limitations of prior heuristic weighting strategies. POW is architecture-agnostic and can be seamlessly integrated into existing value factorization algorithms. Extensive experiments on matrix games, difficulty-enhanced predator-prey tasks, SMAC, SMACv2, and a highway-env intersection scenario show that POW substantially improves stability and consistently surpasses state-of-the-art value-based MARL methods.

2503.06525 2026-06-04 cs.CY cs.AI 版本更新

From Motion Signals to Insights: A Unified Framework for Student Behavior Analysis and Feedback in Physical Education Classes

从运动信号到洞察:体育课堂学生行为分析与反馈的统一框架

Xian Gao, Jiacheng Ruan, Jingsheng Gao, Mingye Xie, Zongyun Zhang, Ting Liu, Yuzhuo Fu

发表机构 * Shanghai Jiao Tong University(上海交通大学)

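AI summary Proposes a unified end-to-end framework based on motion signals and large language models for analyzing student behavior in physical education classes, automatically generating teaching insights and suggestions for improvement.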

Comments Work in progress

详情
AI中文摘要

在教育场景中分析学生行为对于提高教学质量和学生参与度至关重要。现有的基于AI的模型通常依赖课堂视频录像来识别和分析学生行为。虽然这些基于视频的方法可以部分捕捉和分析学生动作,但在体育课堂中,由于活动在户外开放空间进行且活动多样,它们难以准确跟踪每个学生的动作,并且难以泛化到这些场景中涉及的专业技术动作。此外,当前方法通常缺乏整合专业教学知识的能力,限制了它们提供深入的学生行为洞察和优化教学设计反馈的能力。为了解决这些限制,我们提出了一个统一的端到端框架,该框架利用基于运动信号的人类活动识别技术,结合先进的大型语言模型,对体育课堂中的学生行为进行更详细的分析和反馈。我们的框架从教师的教学设计和学生在体育课期间的运动信号开始,最终生成带有教学洞察和改进建议的自动化报告,以优化学习和课堂教学。该解决方案提供了一种基于运动信号的方法,用于分析学生行为并优化针对体育课堂的教学设计。实验结果表明,我们的框架能够准确识别学生行为并产生有意义的教学洞察。

英文摘要

Analyzing student behavior in educational scenarios is crucial for enhancing teaching quality and student engagement. Existing AI-based models often rely on classroom video footage to identify and analyze student behavior. While these video-based methods can partially capture and analyze student actions, they struggle to accurately track each student's actions in physical education classes, which take place in outdoor, open spaces with diverse activities, and are challenging to generalize to the specialized technical movements involved in these settings. Furthermore, current methods typically lack the ability to integrate specialized pedagogical knowledge, limiting their ability to provide in-depth insights into student behavior and offer feedback for optimizing instructional design. To address these limitations, we propose a unified end-to-end framework that leverages human activity recognition technologies based on motion signals, combined with advanced large language models, to conduct more detailed analyses and feedback of student behavior in physical education classes. Our framework begins with the teacher's instructional designs and the motion signals from students during physical education sessions, ultimately generating automated reports with teaching insights and suggestions for improving both learning and class instructions. This solution provides a motion signal-based approach for analyzing student behavior and optimizing instructional design tailored to physical education classes. Experimental results demonstrate that our framework can accurately identify student behaviors and produce meaningful pedagogical insights.

2407.03884 2026-06-04 cs.CL cs.AI 版本更新

ChatSOP: An SOP-Guided MCTS Planning Framework for Controllable LLM Dialogue Agents

ChatSOP: 一种SOP引导的MCTS规划框架,用于可控的LLM对话代理

Zhigen Li, Jianxiang Peng, Yanmeng Wang, Yong Cao, Tianhao Shen, Minghui Zhang, Linxi Su, Shang Wu, Yihang Wu, Yuqian Wang, Ye Wang, Wei Hu, Jianfeng Li, Shaojun Wang, Jing Xiao, Deyi Xiong

发表机构 * TJUNLP Lab, College of Intelligence and Computing, Tianjin University(天津大学智能计算学院TJUNLP实验室) Ping An Technology(平安科技) Tübingen AI Center, University of Tübingen(图宾根大学图宾根人工智能中心) Kunming University of Science and Technology(昆明理工大学)

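AI summary Proposes ChatSOP, a framework that uses SOP-guided Monte Carlo Tree Search to enhance the controllability of LLM dialogue agents, improving action accuracy by 27.95% over GPT-3.5-based baselines.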

Comments Accepted to ACL 2025 main

详情
Journal ref
Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 17637-17659, 2025
AI中文摘要

由大型语言模型驱动的对话代理在各种任务中表现出优越的性能。尽管它们能更好地理解用户并生成类人回复,但缺乏可控性仍然是一个关键挑战,常常导致对话偏离主题或任务失败。为了解决这个问题,我们引入标准操作程序来规范对话流程。具体来说,我们提出了ChatSOP,一种新颖的SOP引导的蒙特卡洛树搜索规划框架,旨在增强LLM驱动的对话代理的可控性。为此,我们整理了一个数据集,包含使用GPT-4o的半自动角色扮演系统生成的、经过严格人工质量控制验证的SOP标注的多场景对话。此外,我们提出了一种新方法,将思维链推理与监督微调相结合用于SOP预测,并利用SOP引导的蒙特卡洛树搜索在对话中进行最优动作规划。实验结果表明了我们方法的有效性,例如,与基于GPT-3.5的基线模型相比,动作准确率提高了27.95%,并且在开源模型上也显示出显著的提升。数据集和代码已公开。

英文摘要

Dialogue agents powered by Large Language Models (LLMs) show superior performance in various tasks. Despite their better user understanding and human-like responses, their lack of controllability remains a key challenge, often leading to unfocused conversations or task failure. To address this, we introduce Standard Operating Procedures (SOPs) to regulate dialogue flow. Specifically, we propose ChatSOP, a novel SOP-guided Monte Carlo Tree Search (MCTS) planning framework designed to enhance the controllability of LLM-driven dialogue agents. To enable this, we curate a dataset comprising SOP-annotated multi-scenario dialogues, generated using a semi-automated role-playing system with GPT-4o and validated through strict manual quality control. Additionally, we propose a novel method that integrates Chain of Thought reasoning with supervised fine-tuning for SOP prediction and utilizes SOP-guided Monte Carlo Tree Search for optimal action planning during dialogues. Experimental results demonstrate the effectiveness of our method, such as achieving a 27.95% improvement in action accuracy compared to baseline models based on GPT-3.5 and also showing notable gains for open-source models. The dataset and code are publicly available.

2408.11121 2026-06-04 cs.LG cs.AI cs.CL cs.CR 版本更新

DOMBA: Double Model Balancing for Access-Controlled Language Models via Minimum-Bounded Aggregation

DOMBA: 通过最小有界聚合实现访问控制语言模型的双模型平衡

Tom Segal, Asaf Shabtai, Yuval Elovici

发表机构 * Ben-Gurion University(本·古里安大学)

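AI summary Proposes DOMBA, which uses a min-bounded average function to aggregate the probability distributions of two language models trained on documents with different access levels, achieving high utility while providing security guarantees.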

Comments Code: https://github.com/ppo1/DOMBA 11 pages, 3 figures

详情
Journal ref
Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 39, pp. 25101-25109, 2025
AI中文摘要

大型语言模型(LLMs)的实用性在很大程度上取决于其训练数据的质量和数量。许多组织拥有大量数据语料库,可用于训练或微调针对其特定需求的LLMs。然而,这些数据集通常带有基于用户权限并由访问控制机制强制执行的访问限制。在此类数据集上训练LLMs可能导致敏感信息暴露给未经授权的用户。防止此类暴露的一种直接方法是为每个访问级别训练一个单独的模型。然而,由于每个模型的训练数据量相对于整个组织语料库的总量有限,这可能导致模型效用低下。另一种方法是在所有数据上训练单个LLM,同时限制未经授权信息的暴露。然而,当前针对LLMs的暴露限制方法对于访问控制数据无效,因为敏感信息在多个训练样本中频繁出现。我们提出DOMBA——双模型平衡——一种训练和部署LLMs的简单方法,可在提供高效用和访问控制功能的同时保证安全性。DOMBA使用“最小有界”平均函数(一个受较小值约束的函数,例如调和平均)聚合两个模型的概率分布,每个模型在具有(可能多个)不同访问级别的文档上训练。详细的数学分析和广泛评估表明,DOMBA在保护受限信息的同时,提供了与非安全模型相当的效用。

英文摘要

The utility of large language models (LLMs) depends heavily on the quality and quantity of their training data. Many organizations possess large data corpora that could be leveraged to train or fine-tune LLMs tailored to their specific needs. However, these datasets often come with access restrictions that are based on user privileges and enforced by access control mechanisms. Training LLMs on such datasets could result in exposure of sensitive information to unauthorized users. A straightforward approach for preventing such exposure is to train a separate model for each access level. This, however, may result in low utility models due to the limited amount of training data per model compared to the amount in the entire organizational corpus. Another approach is to train a single LLM on all the data while limiting the exposure of unauthorized information. However, current exposure-limiting methods for LLMs are ineffective for access-controlled data, where sensitive information appears frequently across many training examples. We propose DOMBA - double model balancing - a simple approach for training and deploying LLMs that provides high utility and access-control functionality with security guarantees. DOMBA aggregates the probability distributions of two models, each trained on documents with (potentially many) different access levels, using a "min-bounded" average function (a function that is bounded by the smaller value, e.g., harmonic mean). A detailed mathematical analysis and extensive evaluation show that DOMBA safeguards restricted information while offering utility comparable to non-secure models.
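
The aggregation step can be pictured with the harmonic mean that the abstract names as an example of a "min-bounded" average. The sketch below is a minimal illustration under stated assumptions, not DOMBA's actual procedure: the two toy next-token distributions, the token names, and the final renormalization step are all hypothetical.

```python
from typing import Dict


def harmonic_mean(a: float, b: float) -> float:
    """Harmonic mean of two non-negative numbers; at most 2 * min(a, b)."""
    if a + b == 0.0:
        return 0.0
    return 2.0 * a * b / (a + b)


def aggregate(p: Dict[str, float], q: Dict[str, float]) -> Dict[str, float]:
    """Combine two distributions token by token, then renormalize.

    Renormalizing so the result sums to 1 is an assumption of this sketch.
    """
    if p.keys() != q.keys():
        raise ValueError("distributions must share the same vocabulary")
    raw = {tok: harmonic_mean(p[tok], q[tok]) for tok in p}
    total = sum(raw.values())
    if total == 0.0:
        raise ValueError("distributions have no overlapping support")
    return {tok: value / total for tok, value in raw.items()}


# Hypothetical next-token distributions from two models: only model_a has
# seen the restricted document, so only it rates "secret" as likely.
model_a = {"public": 0.5, "secret": 0.5}
model_b = {"public": 0.99, "secret": 0.01}

combined = aggregate(model_a, model_b)
print(combined)
```

Because the harmonic mean never exceeds twice the smaller input, a token that either model considers unlikely stays unlikely in the combined distribution, which is the intuition behind limiting exposure of information known to only one model.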

2411.19758 2026-06-04 cs.CV cs.AI cs.LG 版本更新

LaVIDE: Language-Prompted Satellite Change Detection via Map-Image Alignment

LaVIDE: 通过地图-图像对齐的语言提示卫星变化检测

Shuguo Jiang, Fang Xu, Chuandong Liu, Hong Tan, Shengyang Li, Lei Yu, Wen Yang, Sen Jia, Gui-Song Xia

发表机构 * School of Computer Science, Wuhan University(武汉大学计算机学院) School of Artificial Intelligence, Wuhan University(武汉大学人工智能学院) Technology and Engineering Center for Space Utilization and the Key Laboratory of Space Utilization, Chinese Academy of Sciences(中国科学院空间应用工程与技术中心及空间应用重点实验室) School of Aeronautics and Astronautics, University of Chinese Academy of Sciences(中国科学院大学航空宇航学院) School of Electronic Information, Wuhan University(武汉大学电子信息学院) College of Computer Science and Software Engineering, Shenzhen University(深圳大学计算机科学与软件工程学院)

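AI summary Proposes LaVIDE, a framework that uses restricted prompt learning and object-aware embedding enhancement to bridge, through language, the semantic gap between high-level map categories and low-level image details, achieving cross-modal alignment and improving IoU by 18.4% and 5.2% on multi-class and single-class change detection tasks, respectively.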

详情
AI中文摘要

基于地图参考和最新图像的遥感变化检测,在缺乏早期图像进行比较时,有助于及时观测地球表面。然而,高层地图类别与低层图像细节之间的语义鸿沟阻碍了提取同质特征以进行稳健的时间关联。与比较像素级视觉相似性或传播分割误差的传统方法不同,我们提出了一种新颖框架——LaVIDE(用于检测变化的语言-视觉判别器),该框架以语言为中介,弥合了高层地图类别与低层图像细节之间的语义鸿沟。具体来说,我们引入了受限提示学习来生成上下文感知的文本提示,使地图语义与图像内容对齐,并采用对象感知嵌入增强策略将对象级属性(如形状、边界)整合到地图表示中。这些组件能够在统一的语言-视觉特征空间中实现稳健的跨模态对齐。在四个基准数据集(DynamicEarthNet、HRSCD、BANDON和SECOND)上的大量实验表明,LaVIDE以显著优势超越了最先进的方法,在多类和单类变化检测任务上分别实现了18.4%和5.2%的IoU提升。我们的框架不仅提高了地图-图像变化检测的准确性,还为以最少人工干预快速更新地图提供了实用解决方案,有望在城市规划、灾害评估和生态保护等领域产生广泛影响。代码和数据集可在 https://github.com/ShuGuoJ/LAVIDE.git 获取。

英文摘要

Remote sensing change detection based on a map reference and an up-to-date image boosts timely observation of the Earth's surface when earlier images are lacking for comparison. However, the semantic gap between high-level map categories and low-level image details hinders the extraction of homogeneous features for robust temporal association in change detection. Unlike conventional approaches that either compare pixel-level visual similarity or propagate segmentation errors, we propose a novel framework, Language-VIsion Discriminator for dEtecting changes (LaVIDE), which bridges the semantic gap between high-level map categories and low-level image details using language as an intermediary. Specifically, we introduce restricted prompt learning to generate context-aware textual prompts that align map semantics with image content, and an object-aware embedding enhancement strategy to integrate object-level attributes (e.g., shape, boundary) into map representations. These components enable robust cross-modal alignment within a unified language-vision feature space. Extensive experiments on four benchmarks, DynamicEarthNet, HRSCD, BANDON, and SECOND, demonstrate that LaVIDE outperforms state-of-the-art methods by significant margins, achieving 18.4% and 5.2% improvements in IoU on multi-class and single-class change detection tasks, respectively. Our framework not only advances the accuracy of map-image change detection but also provides a practical solution for rapid map updating with minimal human intervention, promising broad impacts in urban planning, disaster assessment, and ecological conservation. Code and datasets are available at: https://github.com/ShuGuoJ/LAVIDE.git.
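
The reported gains are measured in intersection over union (IoU). For readers unfamiliar with the metric, the sketch below computes IoU for a pair of flattened binary change masks. The toy masks are made up, and the convention of returning 1.0 when both masks are empty is an assumption of this example rather than something stated in the abstract.

```python
from typing import Sequence


def iou(pred: Sequence[int], target: Sequence[int]) -> float:
    """Intersection over union of two flattened binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    intersection = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    # Assumed convention: two empty masks count as a perfect match.
    return intersection / union if union else 1.0


# Made-up 4-pixel masks: 1 marks a changed pixel, 0 an unchanged one.
predicted_change = [1, 1, 0, 0]
true_change = [1, 0, 1, 0]
print(iou(predicted_change, true_change))
```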

2407.13922 2026-06-04 cs.CV cs.AI cs.LG 版本更新

CounterFace: A Synthetic Face Dataset for Fine-Grained Counterfactual Evaluation of Face Recognition Systems

CounterFace: 用于人脸识别系统细粒度反事实评估的合成人脸数据集

Guruprasad Viswanathan Ramesh, Ashish Hooda, Shimaa Ahmed, Harrison J Rosenberg, Ramya Korlakai Vinayak, Kassem Fawaz

发表机构 * University of Wisconsin-Madison(威斯康星大学麦迪逊分校) Visa Research(Visa研究)

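AI summary Proposes CounterFace, a dataset of 11,821 counterfactual face pairs covering 20 facial attributes and 8 demographic factors, generated by a fully automated pipeline, for fine-grained evaluation of how face recognition systems degrade under specific attribute-demographic combinations.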

Comments Code available at https://github.com/Guruprasad68/counterface_facct2026. Dataset available for non-commercial research upon request

详情
AI中文摘要

人脸识别系统广泛应用于关键应用,因此其在不同人群和条件下的可靠性和鲁棒性至关重要。人脸识别系统的标准评估通常依赖LFW等数据集来估计平均识别准确率。一些基准测试也捕捉了粗粒度的身份内变化,如老化、姿态和光照。然而,人脸存在更细粒度的变化,包括发型和化妆等外观变化,这些在现有基准测试中代表性不足。反事实评估提供了一种在细粒度变化下评估人脸识别鲁棒性的方法。然而,现有使用图像生成器合成的反事实人脸数据集由于在流程中使用人工验证,属性覆盖范围有限。我们提出CounterFace,一个新的反事实评估数据集,包含20种面部属性和8种人口统计因素,超过先前合成人脸数据集14种属性和2种人口统计因素。该数据集使用基于现成图像生成器和自定义验证器的全自动流水线生成,无需人工验证。CounterFace包含11,821个反事实人脸对,事后用户研究证实了生成反事实的忠实性。我们评估了两个商业和四个开源人脸识别系统(AWS Rekognition、Face++、AdaFace、MagFace、ArcFace、FaceNet)在160种属性-人口统计组合上的性能。与标准评估基准不同,我们的数据集有助于隔离单个系统的精确故障模式。结果表明,所有六个系统的性能退化因属性和人口统计而异,遮挡属性(如口罩和胡须)普遍降低性能。

英文摘要

Face recognition (FR) systems are widely deployed in critical applications, making their reliability and robustness across diverse populations and conditions essential. Standard evaluation of FR systems typically relies on datasets such as LFW to estimate average recognition accuracy. Some benchmarks also capture coarse-grained intra-identity variations such as aging, pose, and lighting. However, human faces undergo more fine-grained changes, including appearance changes such as hairstyles and makeup, that are underrepresented in existing benchmarks. Counterfactual evaluation provides a method to assess FR robustness under such fine-grained variations. Existing counterfactual face datasets synthesized with image generators, however, are limited in attribute coverage because they rely on humans for verification in the pipeline. We propose CounterFace, a new counterfactual evaluation dataset comprising 20 facial attributes and 8 demographic factors, exceeding prior synthetic face datasets by 14 attributes and 2 demographics. The dataset is generated using a fully automated pipeline based on off-the-shelf image generators with custom verifiers, removing the need for human verification. CounterFace contains 11,821 counterfactual face pairs, and a post-hoc user study confirms the faithfulness of the generated counterfactuals. We evaluate two commercial and four open-source FR systems (AWS Rekognition, Face++, AdaFace, MagFace, ArcFace, FaceNet) across 160 attribute-demographic combinations. Unlike standard evaluation benchmarks, our dataset helps isolate precise failure modes for individual systems. Results indicate that the performance degradation varies across attributes and demographics for all six systems, and that occluding attributes (e.g., facemask and facial hair) universally degrade performance.

1709.09480 2026-06-04 cs.AI cs.LG cs.SY eess.SY 版本更新

A Benchmark Environment Motivated by Industrial Control Problems

由工业控制问题启发的基准环境

Daniel Hein, Stefan Depeweg, Michel Tokic, Steffen Udluft, Alexander Hentschel, Thomas A. Runkler, Volkmar Sterzing

发表机构 * Siemens AG, Corporate Technology(西门子股份公司企业技术部)

AI总结 本文提出一个结合工业控制问题的基准环境,旨在解决真实工业环境与现有人工基准之间缺乏联系的问题,通过详细描述基准动态并识别典型实验设置来促进强化学习方法的改进。

详情
Journal ref
2017 IEEE Symposium Series on Computational Intelligence (SSCI)
AI中文摘要

在强化学习(RL)研究领域,频繁出现新的有前景的方法被开发并引入RL社区。然而,尽管许多研究人员渴望将他们的方法应用于现实世界的问题,但在真实工业环境中实施这些方法往往是一个令人沮丧和繁琐的过程。通常,学术研究小组只能有限地访问真实工业数据和应用。因此,新方法通常通过使用人工软件基准来开发、评估和比较。一方面,这些基准旨在提供可解释的RL训练场景和对所用方法学习过程的深入见解。另一方面,它们通常与现实工业应用缺乏相似性。为此,我们利用行业经验设计了一个基准,以弥合自由可用、文档齐全且有动机的人工基准与真实工业问题属性之间的差距。所得到的工业基准(IB)已通过在GitHub上发布其Java和Python代码,包括一个OpenAI Gym包装器,向RL社区公开。在本文中,我们详细阐述了IB的动力学,并识别了能够捕捉现实世界工业控制问题中常见情况的典型实验设置。

英文摘要

In the research area of reinforcement learning (RL), frequently novel and promising methods are developed and introduced to the RL community. However, although many researchers are keen to apply their methods on real-world problems, implementing such methods in real industry environments often is a frustrating and tedious process. Generally, academic research groups have only limited access to real industrial data and applications. For this reason, new methods are usually developed, evaluated and compared by using artificial software benchmarks. On one hand, these benchmarks are designed to provide interpretable RL training scenarios and detailed insight into the learning process of the method on hand. On the other hand, they usually do not share much similarity with industrial real-world applications. For this reason we used our industry experience to design a benchmark which bridges the gap between freely available, documented, and motivated artificial benchmarks and properties of real industrial problems. The resulting industrial benchmark (IB) has been made publicly available to the RL community by publishing its Java and Python code, including an OpenAI Gym wrapper, on Github. In this paper we motivate and describe in detail the IB's dynamics and identify prototypic experimental settings that capture common situations in real-world industry control problems.

2006.04013 2026-06-04 cs.CY cs.AI cs.LG 版本更新

AI from concrete to abstract: demystifying artificial intelligence to the general public

从具体到抽象的人工智能:向公众揭秘人工智能

Rubens Lacerda Queiroz, Fábio Ferrentini Sampaio, Cabral Lima, Priscila Machado Vieira Lima

发表机构 * Federal University of Rio de Janeiro – UFRJ – Brazil(巴西里约热内卢联邦大学) InovLabs – Portugal(葡萄牙InovLabs) Atlantica University – Portugal(葡萄牙Atlantica大学) PESC/COPPE Tercio Pacitti Institute (NCE)(Tercio Pacitti研究所(NCE))

AI总结 本文提出一种结合可视化编程与WiSARD无权重人工神经网络的新方法AIcon2abs,通过实践开发学习机器并观察其学习过程,帮助普通大众(包括儿童)理解人工智能的基本概念。

Comments 23 pages; 2 tables; 47 figures; review comment: Included references for the final published peer-reviewed version of this pre-print: https://doi.org/10.1007/s00146-021-01151-x and https://rdcu.be/cihdO; typos corrected

详情
Journal ref
AI & SOCIETY, 36 877-893 (2021)
AI中文摘要

人工智能(AI)已被广泛应用于众多领域,这表明迫切需要开发手段,使普通大众对AI的含义有最基本的理解。本文结合可视化编程与WiSARD无权重人工神经网络,提出了一种新方法——从具体到抽象的人工智能(AIcon2abs),使普通人(包括儿童)能够实现这一目标。该方法的主要策略是通过与学习机器开发相关的实践活动,以及观察其学习过程,来促进对人工智能的去神秘化。因此,它能够使受训者获得技能,从而在涉及采用人工智能机制的辩论和决策中成为有洞察力的参与者。目前,通过编程教授基本AI概念的现有方法将机器智能视为外部元素/模块。经过训练后,该外部模块被耦合到学习者正在开发的主应用程序中。而在本文提出的方法中,训练和分类任务都是构成主程序的模块,就像其他编程结构一样。作为AIcon2abs的一个有益副作用,能够从数据中学习的程序与常规计算机程序之间的区别变得更加明显。此外,WiSARD无权重人工神经网络模型的简单性使得训练和分类任务的内部实现易于可视化和理解。

英文摘要

Artificial Intelligence (AI) has been adopted in a wide range of domains. This shows the imperative need to develop means to endow common people with a minimum understanding of what AI means. Combining visual programming and WiSARD weightless artificial neural networks, this article presents a new methodology, AI from concrete to abstract (AIcon2abs), to enable the general public (including children) to achieve this goal. The main strategy adopted by AIcon2abs is to promote a demystification of artificial intelligence via practical activities related to the development of learning machines, as well as through the observation of their learning process. Thus, it is possible to provide subjects with skills that contribute to making them insightful actors in debates and decisions involving the adoption of artificial intelligence mechanisms. Currently, existing approaches to the teaching of basic AI concepts through programming treat machine intelligence as an external element/module. After being trained, that external module is coupled to the main application being developed by the learners. In the methodology herein presented, both training and classification tasks are blocks that compose the main program, just as the other programming constructs. As a beneficial side effect of AIcon2abs, the difference between a program capable of learning from data and a conventional computer program becomes more evident. In addition, the simplicity of the WiSARD weightless artificial neural network model enables easy visualization and understanding of the internal realization of training and classification tasks.

1710.05465 2026-06-04 cs.AI cs.RO cs.SY eess.SY 版本更新

Flow: A Modular Learning Framework for Mixed Autonomy Traffic

Flow: 一种用于混合自主交通的模块化学习框架

Cathy Wu, Aboudy Kreidieh, Kanaad Parvate, Eugene Vinitsky, Alexandre M Bayen

发表机构 * Laboratory for Information and Decision Systems, Massachusetts Institute of Technology(麻省理工学院信息与决策系统实验室) Institute of Data, Systems, and Society, Massachusetts Institute of Technology(麻省理工学院数据、系统与社会研究所) Department of Mechanical Engineering, University of California, Berkeley(加州大学伯克利分校机械工程系)

AI总结 本文提出了一种模块化学习框架,利用深度强化学习解决复杂交通动态问题,通过提高系统层面的速度,使学习到的控制法则在仅有4-7%的自动驾驶汽车参与度下,相比人类驾驶性能提升高达57%。此外,在单车道交通中,一个仅使用局部观测的小型神经网络控制法则能够消除拥堵现象,达到近最优性能。

Comments 17 pages, 8 figures, 5 tables. 2021 IEEE Transactions on Robotics (T-RO)

详情
AI中文摘要

自动驾驶车辆(AVs)的快速发展通过提升安全性、效率和出行可及性,为交通系统带来了巨大潜力。然而,随着AVs逐步被采用,这些影响将如何演进尚不清楚。分析部分自动驾驶这一目标带来了诸多技术挑战:部分控制和部分观测、多车辆交互,以及现实世界路网所代表的大量场景。为了深入了解近期AV的影响,本文研究了深度强化学习(RL)在低AV采用率环境下克服这些挑战的适用性。本文提出了一种模块化学习框架,利用深度RL来处理复杂的交通动态。通过组合模块来刻画常见的交通现象(走走停停的交通拥堵、车道变换、交叉口)。结果表明,在仅有4-7%的AV采用率下,学习到的控制律在系统层面速度上比人类驾驶性能最多提升57%。此外,在单车道交通中,一个仅使用局部观测的小型神经网络控制律能够消除走走停停的交通现象,超越所有已知的基于模型的控制器,达到近最优性能,并能泛化到分布外的交通密度。

英文摘要

The rapid development of autonomous vehicles (AVs) holds vast potential for transportation systems through improved safety, efficiency, and access to mobility. However, the progression of these impacts, as AVs are adopted, is not well understood. Numerous technical challenges arise from the goal of analyzing the partial adoption of autonomy: partial control and observation, multi-vehicle interactions, and the sheer variety of scenarios represented by real-world networks. To shed light into near-term AV impacts, this article studies the suitability of deep reinforcement learning (RL) for overcoming these challenges in a low AV-adoption regime. A modular learning framework is presented, which leverages deep RL to address complex traffic dynamics. Modules are composed to capture common traffic phenomena (stop-and-go traffic jams, lane changing, intersections). Learned control laws are found to improve upon human driving performance, in terms of system-level velocity, by up to 57% with only 4-7% adoption of AVs. Furthermore, in single-lane traffic, a small neural network control law with only local observation is found to eliminate stop-and-go traffic - surpassing all known model-based controllers to achieve near-optimal performance - and generalize to out-of-distribution traffic densities.

1811.01220 2026-06-04 math.OC cs.AI cs.CC cs.NA math.NA 版本更新

Sharp worst-case evaluation complexity bounds for arbitrary-order nonconvex optimization with inexpensive constraints

具有低成本约束的任意阶非凸优化的精确最坏情况评估复杂度界限

Coralia Cartis, Nick I. M. Gould, Philippe L. Toint

发表机构 * Mathematical Institute, Oxford University(牛津大学数学研究所) Computational Mathematics Group, STFC-Rutherford Appleton Laboratory(STFC卢瑟福·阿普尔顿实验室计算数学组) Namur Center for Complex Systems (naXys), University of Namur(那慕尔大学那慕尔复杂系统中心(naXys))

AI总结 本文研究了具有低成本约束的任意阶非凸优化问题的最坏情况评估复杂度界限,提出了一种概念性正则化算法,能够在给定精度和最优阶数的情况下,以O(ε^(- (p+1)/(p-q+1)))的次数评估目标函数及其导数,计算出合适的q阶近似极小值点。

Comments 30 pages

详情
Journal ref
SIAM Journal on Optimization, vol. 30(1), pp. 513-541, 2020
AI中文摘要

我们为非凸最小化问题提供了精确的最坏情况评估复杂度界限,这些问题是具有通用低成本约束的问题,即约束的评估/执行成本相对于目标函数的评估成本可以忽略不计。这些界限统一、扩展或改进了所有已知的无约束和凸约束问题的上界和下界复杂度界限。证明了,在给定精度水平ε,最高可用Lipschitz连续导数阶数p和期望最优阶数q(介于1和p之间)的情况下,一个概念性正则化算法需要不超过O(ε^(- (p+1)/(p-q+1)))次目标函数及其导数的评估,以计算一个合适的q阶近似极小值点。通过适当选择正则化,如果p阶导数仅仅是Hölder连续而非Lipschitz连续,则也得出类似的结果。我们提供了一个例子,说明上述复杂度界限对于无约束和广泛类别的约束问题都是精确的,并且从最坏情况复杂度的角度解释了正则化方法的最优性,限于一大类使用相同导数信息的算法。

英文摘要

We provide sharp worst-case evaluation complexity bounds for nonconvex minimization problems with general inexpensive constraints, i.e., problems where the cost of evaluating/enforcing the (possibly nonconvex or even disconnected) constraints, if any, is negligible compared to that of evaluating the objective function. These bounds unify, extend or improve all known upper and lower complexity bounds for unconstrained and convexly-constrained problems. It is shown that, given an accuracy level $ε$, a degree of highest available Lipschitz continuous derivatives $p$ and a desired optimality order $q$ between one and $p$, a conceptual regularization algorithm requires no more than $O(ε^{-\frac{p+1}{p-q+1}})$ evaluations of the objective function and its derivatives to compute a suitably approximate $q$-th order minimizer. With an appropriate choice of the regularization, a similar result also holds if the $p$-th derivative is merely Hölder rather than Lipschitz continuous. We provide an example that shows that the above complexity bound is sharp for unconstrained and a wide class of constrained problems. We also give reasons for the optimality of regularization methods from a worst-case complexity point of view, within a large class of algorithms that use the same derivative information.
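To make the exponent in the bound $O(ε^{-\frac{p+1}{p-q+1}})$ concrete, the following minimal sketch evaluates it for a few choices of $p$ and $q$. The function names `evaluation_exponent` and `evaluation_bound` are hypothetical helpers introduced here for illustration; they are not from the paper, and the constant hidden in the $O(\cdot)$ notation is ignored.

```python
from fractions import Fraction


def evaluation_exponent(p: int, q: int) -> Fraction:
    """Exponent e in the bound O(eps**(-e)), where e = (p+1)/(p-q+1)."""
    if not 1 <= q <= p:
        raise ValueError("the abstract assumes 1 <= q <= p")
    return Fraction(p + 1, p - q + 1)


def evaluation_bound(eps: float, p: int, q: int) -> float:
    """eps**(-(p+1)/(p-q+1)); the constant hidden in O(.) is omitted."""
    return eps ** (-float(evaluation_exponent(p, q)))


if __name__ == "__main__":
    for p, q in [(1, 1), (2, 1), (2, 2), (3, 1)]:
        print(f"p={p}, q={q}: exponent {evaluation_exponent(p, q)}")
```

For first-order optimality ($q=1$) the exponent is $(p+1)/p$: it equals 2 when only first derivatives are Lipschitz ($p=1$) and 3/2 when second derivatives are ($p=2$), so higher-order derivative information lowers the worst-case evaluation count.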

1902.02311 2026-06-04 cs.MA cs.AI cs.LG cs.SY eess.SY 版本更新

Decentralized Multi-Agents by Imitation of a Centralized Controller

通过模仿集中控制器实现去中心化多智能体

Alex Tong Lin, Mark J. Debord, Katia Estabridis, Gary Hewer, Guido Montufar, Stanley Osher

发表机构 * UCLA(加州大学洛杉矶分校) Max Planck Institute, Leipzig(莱比锡马克斯·普朗克研究所) University of California, Los Angeles(加州大学洛杉矶分校)

AI总结 本文提出了一种基于集中训练、去中心执行框架的新型算法,通过模仿学习生成去中心化多智能体,解决了多智能体强化学习中非平稳和部分可观测环境下的协作问题。

详情
AI中文摘要

我们考虑了一个多智能体强化学习问题,其中每个智能体试图在与其他智能体交互时最大化共享奖励,且可能无法通信。通常,智能体无法访问其他智能体的策略,因此每个智能体都处于非平稳和部分可观测的环境中。为了获得去中心化作用的多智能体,我们引入了一种新的算法,该算法基于流行的集中训练、去中心执行框架。该训练框架首先通过单一集中联合空间学习者解决多智能体问题,然后用于指导模仿学习以生成独立的去中心化多智能体。该框架具有灵活性,可以使用任何强化学习算法来获得专家,以及任何模仿学习算法来获得去中心化智能体。这与其它多智能体学习算法不同,例如可能需要更具体的结构。我们为该方法提供了一些理论界限,并展示了通过模仿学习可以获得多智能体问题的去中心化解决方案。

英文摘要

We consider a multi-agent reinforcement learning problem where each agent seeks to maximize a shared reward while interacting with other agents, and they may or may not be able to communicate. Typically the agents do not have access to other agent policies and thus each agent is situated in a non-stationary and partially-observable environment. In order to obtain multi-agents that act in a decentralized manner, we introduce a novel algorithm under the popular framework of centralized training, but decentralized execution. This training framework first obtains solutions to a multi-agent problem with a single centralized joint-space learner, which is then used to guide imitation learning for independent decentralized multi-agents. This framework has the flexibility to use any reinforcement learning algorithm to obtain the expert as well as any imitation learning algorithm to obtain the decentralized agents. This is in contrast to other multi-agent learning algorithms that, for example, can require more specific structures. We present some theoretical bounds for our method, and we show that one can obtain decentralized solutions to a multi-agent problem through imitation learning.

1701.00178 2026-06-04 math.OC cs.AI cs.LG cs.SY eess.SY stat.ML 版本更新

Lazily Adapted Constant Kinky Inference for Nonparametric Regression and Model-Reference Adaptive Control

惰性适应的常数Kinky推断用于非参数回归和模型参考自适应控制

Jan-Peter Calliess

发表机构 * Dept. of Engineering Science University of Oxford, UK(英国牛津大学工程科学系)

AI总结 本文提出了一种惰性适应的常数Kinky推断方法,用于非参数回归和模型参考自适应控制,通过在线估计Hölder常数并建立强通用逼近保证,展示了在密集数据下学习任意连续函数的能力。

详情
AI中文摘要

被称为非线性集合成员预测、Lipschitz插值或Kinky推断的技术,是利用预设的Lipschitz性质对未观测函数值进行推断的机器学习方法。只要事先已知目标函数真实最佳Lipschitz常数的一个上界,这些方法就能提供收敛保证以及预测值周围的界限。我们考虑一个更一般的设置,它建立在相对于伪度量的Hölder连续性之上,并提出了一种在线方法,用于从可能受有界观测误差影响的函数值观测中估计Hölder常数。利用该估计在Kinky推断规则中计算自适应参数,便得到一种非参数机器学习方法,我们为其建立了强通用逼近保证。也就是说,我们证明,在数据越来越稠密的极限下,我们的预测规则可以学习任意连续函数,其误差不超过一个取决于观测不确定性水平的最坏情况误差界。我们在非参数模型参考自适应控制(MRAC)的背景下应用了我们的方法。在一系列模拟的飞机滚转动力学和性能指标上,我们的方法优于近期提出的基于高斯过程和RBF神经网络的替代方法。对于离散时间系统,我们为基于学习的控制器在批量学习和在线学习两种设置下的跟踪成功提供了保证。

英文摘要

Techniques known as Nonlinear Set Membership prediction, Lipschitz Interpolation or Kinky Inference are approaches to machine learning that utilise presupposed Lipschitz properties to compute inferences over unobserved function values. Provided a bound on the true best Lipschitz constant of the target function is known a priori, they offer convergence guarantees as well as bounds around the predictions. Considering a more general setting that builds on Hölder continuity relative to pseudo-metrics, we propose an online method for estimating the Hölder constant from function value observations that possibly are corrupted by bounded observational errors. Utilising this to compute adaptive parameters within a kinky inference rule gives rise to a nonparametric machine learning method, for which we establish strong universal approximation guarantees. That is, we show that our prediction rule can learn any continuous function in the limit of increasingly dense data to within a worst-case error bound that depends on the level of observational uncertainty. We apply our method in the context of nonparametric model-reference adaptive control (MRAC). Across a range of simulated aircraft roll-dynamics and performance metrics our approach outperforms recently proposed alternatives that were based on Gaussian processes and RBF-neural networks. For discrete-time systems, we provide guarantees on the tracking success of our learning-based controllers both for the batch and the online learning setting.
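The idea of predicting from a Hölder (or Lipschitz) bound can be sketched in a few lines. The code below is a simplified one-dimensional illustration, not the paper's algorithm: it assumes noise-free observations, uses the absolute difference as the metric, and estimates the constant as the largest observed slope. The names `lazy_hoelder_constant` and `kinky_predict` are hypothetical.

```python
from itertools import combinations


def lazy_hoelder_constant(xs, fs, p=1.0):
    """Smallest constant consistent with the data: max |f_i - f_j| / |x_i - x_j|**p.

    Simplifying assumption: observations are noise-free.
    """
    best = 0.0
    for (xi, fi), (xj, fj) in combinations(zip(xs, fs), 2):
        dist = abs(xi - xj)
        if dist > 0:
            best = max(best, abs(fi - fj) / dist ** p)
    return best


def kinky_predict(x, xs, fs, L, p=1.0):
    """Return (prediction, half-width of the enclosing interval) at query x.

    The ceiling and floor are the tightest upper and lower envelopes that any
    function with Hoelder constant L and exponent p through the data can attain.
    """
    ceiling = min(f + L * abs(x - xi) ** p for xi, f in zip(xs, fs))
    floor = max(f - L * abs(x - xi) ** p for xi, f in zip(xs, fs))
    return 0.5 * (ceiling + floor), 0.5 * (ceiling - floor)


if __name__ == "__main__":
    xs, fs = [0.0, 1.0, 2.0], [0.0, 1.0, 0.0]
    L = lazy_hoelder_constant(xs, fs)
    print(L, kinky_predict(0.5, xs, fs, L), kinky_predict(3.0, xs, fs, L))
```

The half-width returned alongside the prediction is the kind of bound around the predictions that the abstract refers to; it shrinks as the data become denser around the query point.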

1607.01202 2026-06-04 eess.SY cs.AI cs.RO cs.SY 版本更新

Optimal control for a robotic exploration, pick-up and delivery problem

机器人探索、拾取和配送问题的最优控制

Vladislav Nenchev, Christos G. Cassandras, Jörg Raisch

AI总结 本文研究了机器人在最短时间内寻找并收集有限数量物体并运送到仓库的最优控制问题,采用滚动时域方案近似求解混合系统中的最优控制问题,并提出基于运动参数化和梯度优化的事件驱动方法,以提高计算效率。

Comments 14 pages, 23 figures

详情
AI中文摘要

本文研究了一个机器人最优控制问题:机器人需要在最短时间内找到并收集有限数量的物体,并将它们运送到仓库。该机器人具有四阶动力学,且其动力学在每次拾取或放下物体时瞬时改变。物体被建模为位置事先未知的质点,位于可能包含未知障碍物的有界二维空间中。对于这一混合系统,采用滚动时域方案近似求解最优控制问题(OCP),其中所推导的剩余代价(cost-to-go)下界分别在最坏情况和概率情况下进行评估,后者假设物体服从均匀分布。首先,提出了一种基于时间和位置空间离散化以及混合整数规划的时间驱动近似解。由于该解的计算成本较高,又提出了一种基于合适的运动参数化和梯度优化的事件驱动近似方法。数值示例中对两种解进行了比较,结果表明后一种方法在计算上具有显著优势,同时得到与前者相似的定性结果。这些方法尤其适用于自动化清洁、搜索与救援、收割或制造等各类机器人应用。

英文摘要

This paper addresses an optimal control problem for a robot that has to find and collect a finite number of objects and move them to a depot in minimum time. The robot has fourth-order dynamics that change instantaneously at any pick-up or drop-off of an object. The objects are modeled by point masses with a-priori unknown locations in a bounded two-dimensional space that may contain unknown obstacles. For this hybrid system, an Optimal Control Problem (OCP) is approximately solved by a receding horizon scheme, where the derived lower bound for the cost-to-go is evaluated for the worst and for a probabilistic case, assuming a uniform distribution of the objects. First, a time-driven approximate solution based on time and position space discretization and mixed integer programming is presented. Due to the high computational cost of this solution, an alternative event-driven approximate approach based on a suitable motion parameterization and gradient-based optimization is proposed. The solutions are compared in a numerical example, suggesting that the latter approach offers a significant computational advantage while yielding similar qualitative results compared to the former. The methods are particularly relevant for various robotic applications like automated cleaning, search and rescue, harvesting or manufacturing.

1904.02851 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

Planning under non-rational perception of uncertain spatial costs

对不确定空间成本的非理性感知下的规划

Aamodh Suresh, Sonia Martinez

AI总结 本文研究了在不确定空间成本下考虑非理性风险感知的运动规划策略,提出基于累积前景理论(CPT)生成感知风险地图的方法,并通过理论和仿真验证了CPT模型的建模能力,与CVaR等其他风险感知模型相比,展示了在路径规划中的优势。

Comments 12 pages and 10 figures. This revision adds more explanation and clearer figures

详情
AI中文摘要

本工作研究了风险感知型运动规划策略的设计,这类策略纳入了对与不确定空间成本相关的风险的非理性感知。我们提出的方法利用累积前景理论(CPT)在给定环境上生成感知风险地图。随后将类CPT的感知风险与路径长度指标相结合,定义一个满足基于采样的运动规划器(RRT*)渐近最优性要求的成本函数。我们通过理论和仿真展示了CPT的建模能力,并与条件风险价值(CVaR)等其他风险感知模型进行了比较。理论上,我们为风险感知模型定义了表达性的概念,并证明CPT的表达性高于CVaR和期望风险。然后我们展示了这种表达性如何体现在路径规划设置中:配备CPT和同时扰动随机近似(SPSA)方法的规划器可以更好地逼近环境中的任意路径。此外,我们通过仿真展示了我们的规划器能够在自定义环境中捕捉一组丰富且有意义的路径,代表不同的风险感知。最后,我们分别在杂乱环境和动态环境的仿真中,将我们的规划器与T-RRT*(面向连续成本空间的规划器)和Risk-RRT*(面向动态人类障碍物的风险感知规划器)进行了性能比较,展示了所提规划器的优势。

英文摘要

This work investigates the design of risk-perception-aware motion-planning strategies that incorporate non-rational perception of risks associated with uncertain spatial costs. Our proposed method employs the Cumulative Prospect Theory (CPT) to generate a perceived risk map over a given environment. CPT-like perceived risks and path-length metrics are then combined to define a cost function that is compliant with the requirements of asymptotic optimality of sampling-based motion planners (RRT*). The modeling power of CPT is illustrated in theory and in simulation, along with a comparison to other risk perception models like Conditional Value at Risk (CVaR). Theoretically, we define a notion of expressiveness for a risk perception model and show that CPT's is higher than that of CVaR and expected risk. We then show that this expressiveness translates to our path planning setting, where we observe that a planner equipped with CPT together with a simultaneous perturbation stochastic approximation (SPSA) method can better approximate arbitrary paths in an environment. Additionally, we show in simulation that our planner captures a rich set of meaningful paths, representative of different risk perceptions in a custom environment. We then compare the performance of our planner with T-RRT* (a planner for continuous cost spaces) and Risk-RRT* (a risk-aware planner for dynamic human obstacles) through simulations in cluttered and dynamic environments respectively, showing the advantage of our proposed planner.
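The contrast between CPT-style perception and CVaR can be illustrated numerically. The sketch below is an illustration under stated assumptions, not the paper's model: it uses the Tversky-Kahneman probability weighting function with the commonly quoted parameter 0.61, and a simple empirical CVaR over equally likely cost samples. The names `tk_weight` and `empirical_cvar` are hypothetical.

```python
import math


def tk_weight(p, gamma=0.61):
    """Tversky-Kahneman probability weighting: p^g / (p^g + (1-p)^g)^(1/g).

    Assumption: this common form stands in for whichever weighting the
    paper uses; gamma = 0.61 is the frequently cited estimate for gains.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    num = p ** gamma
    return num / (num + (1.0 - p) ** gamma) ** (1.0 / gamma)


def empirical_cvar(costs, alpha):
    """Average of the worst (1 - alpha) fraction of equally likely costs."""
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must lie in [0, 1)")
    ordered = sorted(costs, reverse=True)
    k = max(1, math.ceil((1.0 - alpha) * len(ordered)))
    return sum(ordered[:k]) / k


if __name__ == "__main__":
    print(tk_weight(0.01), tk_weight(0.9))
    print(empirical_cvar([1.0, 2.0, 3.0, 10.0], alpha=0.5))
```

The weighting function inflates small probabilities and deflates large ones, which is one way a perceived risk can differ from the expected risk, whereas CVaR only looks at the tail of the cost distribution.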

1709.01610 2026-06-04 math.OC cs.AI cs.SY eess.SY nlin.AO 版本更新

A second order primal-dual method for nonsmooth convex composite optimization

一种用于非光滑凸复合优化的二阶原始-对偶方法

Neil K. Dhingra, Sei Zhen Khong, Mihailo R. Jovanović

发表机构 * Numerica Corporation(Numerica公司) University of Southern California(南加州大学)

AI总结 本文提出了一种二阶原始-对偶方法,用于求解目标函数为强凸二阶可微项与可能不可微的凸正则化项之和的优化问题。通过引入辅助变量,利用非光滑正则化项的近端算子将增广拉格朗日函数转换为一阶连续可微但非二阶连续可微的函数,其鞍点对应于原始优化问题的解。进一步开发了全局收敛的定制算法,以原始-对偶增广拉格朗日函数作为评价函数(merit function),并证明了搜索方向可高效计算且具有二次/超线性渐近收敛性。

Comments 32 pages, 8 figures

详情
AI中文摘要

我们开发了一种二阶原始-对偶方法,用于求解目标函数由强凸二阶可微项与可能不可微的凸正则化项之和构成的优化问题。在引入辅助变量后,我们利用非光滑正则化项的近端算子,将相应的增广拉格朗日函数转换为一阶连续可微但非二阶连续可微的函数。该函数的鞍点对应于原始优化问题的解。我们采用Hessian的一种推广来定义该函数上的二阶更新,并证明了相应微分包含的全局指数稳定性。此外,我们开发了一种全局收敛的定制算法,以原始-对偶增广拉格朗日函数作为评价函数(merit function)。我们证明了搜索方向可以高效计算,并证明了二次/超线性渐近收敛性。我们使用 $\ell_1$-正则化的模型预测控制问题和为空间不变系统设计分布式控制器的问题来展示本方法的优点和有效性。

英文摘要

We develop a second order primal-dual method for optimization problems in which the objective function is given by the sum of a strongly convex twice differentiable term and a possibly nondifferentiable convex regularizer. After introducing an auxiliary variable, we utilize the proximal operator of the nonsmooth regularizer to transform the associated augmented Lagrangian into a function that is once, but not twice, continuously differentiable. The saddle point of this function corresponds to the solution of the original optimization problem. We employ a generalization of the Hessian to define second order updates on this function and prove global exponential stability of the corresponding differential inclusion. Furthermore, we develop a globally convergent customized algorithm that utilizes the primal-dual augmented Lagrangian as a merit function. We show that the search direction can be computed efficiently and prove quadratic/superlinear asymptotic convergence. We use the $\ell_1$-regularized model predictive control problem and the problem of designing a distributed controller for a spatially-invariant system to demonstrate the merits and the effectiveness of our method.
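The proximal operator mentioned in the abstract has a closed form for the $\ell_1$ regularizer used in one of the paper's examples: componentwise soft-thresholding. The sketch below shows only this building block, not the second order primal-dual method itself; the function name `soft_threshold` and the parameter name `mu` are hypothetical.

```python
def soft_threshold(v, mu):
    """Proximal operator of mu * ||.||_1 applied componentwise.

    prox(v)_i = sign(v_i) * max(|v_i| - mu, 0)
    """
    if mu < 0:
        raise ValueError("mu must be non-negative")
    out = []
    for vi in v:
        if vi > mu:
            out.append(vi - mu)
        elif vi < -mu:
            out.append(vi + mu)
        else:
            out.append(0.0)  # entries inside [-mu, mu] are set exactly to zero
    return out


if __name__ == "__main__":
    print(soft_threshold([3.0, -0.5, -2.0], 1.0))
```

Soft-thresholding is continuous but has kinks at $\pm\mu$, which is consistent with the abstract's statement that the transformed augmented Lagrangian is once, but not twice, continuously differentiable.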

1806.04225 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY math.OC 版本更新

PAC-Bayes Control: Learning Policies that Provably Generalize to Novel Environments

PAC-Bayes 控制:学习可证明泛化到新环境的策略

Anirudha Majumdar, Alec Farid, Anoopkumar Sonar

发表机构 * Department of Mechanical and Aerospace Engineering(机械与航空航天工程系) Department of Computer Science Princeton University(普林斯顿大学计算机科学系)

AI总结 本文提出了一种基于PAC-Bayes框架的机器人策略学习方法,通过在新环境中泛化能力的理论分析,为机器人系统提供强泛化保证。

Comments Extended version of paper presented at the 2018 Conference on Robot Learning (CoRL)

详情
AI中文摘要

我们的目标是在给定一组示例环境数据集的情况下,学习可证明能很好地泛化到新环境的机器人控制策略。我们方法的关键技术思想是利用机器学习中泛化理论的工具,借助控制策略对新环境的泛化与监督学习中假设的泛化之间的精确类比(我们以归约的形式给出)。具体而言,我们利用Probably Approximately Correct (PAC)-Bayes框架,从而得到(随机)控制策略在新环境上期望成本的、以高概率成立的上界。我们提出了显式最小化该上界的策略学习算法。在对有限策略空间进行优化的设置中,相应的优化问题可以通过凸优化(特别是相对熵规划)求解。在连续参数化策略(例如神经网络策略)这一更一般的设置中,我们使用随机梯度下降来最小化该上界。我们展示了所提出方法在学习(1)反应式避障策略和(2)基于神经网络的抓取策略上的仿真结果。我们还展示了Parrot Swing无人机在不同障碍物环境中导航的硬件实验结果。这些例子展示了该方法为具有连续状态和动作空间、复杂(例如非线性)动力学、丰富感官输入(例如深度图像)以及基于神经网络策略的机器人系统提供强泛化保证的潜力。

英文摘要

Our goal is to learn control policies for robots that provably generalize well to novel environments given a dataset of example environments. The key technical idea behind our approach is to leverage tools from generalization theory in machine learning by exploiting a precise analogy (which we present in the form of a reduction) between generalization of control policies to novel environments and generalization of hypotheses in the supervised learning setting. In particular, we utilize the Probably Approximately Correct (PAC)-Bayes framework, which allows us to obtain upper bounds that hold with high probability on the expected cost of (stochastic) control policies across novel environments. We propose policy learning algorithms that explicitly seek to minimize this upper bound. The corresponding optimization problem can be solved using convex optimization (Relative Entropy Programming in particular) in the setting where we are optimizing over a finite policy space. In the more general setting of continuously parameterized policies (e.g., neural network policies), we minimize this upper bound using stochastic gradient descent. We present simulated results of our approach applied to learning (1) reactive obstacle avoidance policies and (2) neural network-based grasping policies. We also present hardware results for the Parrot Swing drone navigating through different obstacle environments. Our examples demonstrate the potential of our approach to provide strong generalization guarantees for robotic systems with continuous state and action spaces, complicated (e.g., nonlinear) dynamics, rich sensory inputs (e.g., depth images), and neural network-based policies.
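To give a feel for the kind of upper bound being minimized, the sketch below evaluates one commonly cited PAC-Bayes bound for a finite policy space: the empirical cost plus a slack term $\sqrt{(\mathrm{KL}(P\|P_0) + \ln(2\sqrt{N}/\delta))/(2N)}$. This is an assumption about the form of the bound, with costs assumed to lie in $[0, 1]$; the paper may use a different or tighter variant, and it optimizes the bound rather than merely evaluating it. The names `kl_divergence` and `pac_bayes_upper_bound` are hypothetical.

```python
import math


def kl_divergence(posterior, prior):
    """KL(posterior || prior) for discrete distributions over the same policies."""
    return sum(p * math.log(p / q) for p, q in zip(posterior, prior) if p > 0)


def pac_bayes_upper_bound(posterior, prior, avg_costs, n_envs, delta=0.01):
    """Empirical cost plus PAC-Bayes slack (assumed form, costs in [0, 1]).

    avg_costs[i] is the average cost of policy i over the n_envs training
    environments; the bound is meant to hold with probability >= 1 - delta.
    """
    empirical = sum(p * c for p, c in zip(posterior, avg_costs))
    kl = kl_divergence(posterior, prior)
    slack = math.sqrt(
        (kl + math.log(2.0 * math.sqrt(n_envs) / delta)) / (2.0 * n_envs)
    )
    return empirical + slack


if __name__ == "__main__":
    prior = [0.25] * 4
    costs = [0.2, 0.4, 0.6, 0.8]
    for posterior in (prior, [0.97, 0.01, 0.01, 0.01]):
        print(pac_bayes_upper_bound(posterior, prior, costs, n_envs=100))
```

Concentrating the posterior on the best policy lowers the empirical term but raises the KL term, which is the trade-off a bound-minimizing learner has to balance.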

1810.05947 2026-06-04 eess.SY cs.AI cs.SY math.OC 版本更新

Robust Model Predictive Control of Irrigation Systems with Active Uncertainty Learning and Data Analytics

具有主动不确定性学习和数据分析的灌溉系统鲁棒模型预测控制

Chao Shang, Wei-Han Chen, Abraham Duncan Stroock, Fengqi You

发表机构 * Department of Automation, Tsinghua University(自动化系,清华大学)

AI总结 本文提出了一种数据驱动的鲁棒模型预测控制方法,结合机理模型和数据驱动模型,通过构建不确定性集来提高灌溉系统的控制效率和可靠性,实验证明该方法能显著减少用水量并提升控制性能。

详情
Journal ref
IEEE Transactions on Control Systems Technology, vol. 28, no. 4, pp. 1493-1504, 2020
AI中文摘要

我们开发了一种新型的数据驱动鲁棒模型预测控制(DDRMPC)方法,用于自动控制灌溉系统。核心思想是将机理模型(描述土壤含水量变化的动力学)和数据驱动模型(表征蒸散发和降水预测误差的不确定性)整合到一个系统控制框架中。为了更好地捕捉不确定性分布的支持,我们采用了一种基于学习的新方法,通过历史数据构建不确定性集。对于蒸散发预测误差,采用基于支持向量聚类的不确定性集,该方法可以方便地从历史数据中构建。而对于降水预测误差,我们分析了其分布对预测值的依赖性,并进一步设计了基于此类不确定性的特性定制的不确定性集。这样,整体不确定性分布可以被详细描述,最终有助于做出合理且高效的控制决策。为了确保数据驱动不确定性集的质量,采用训练-校准方案以提供理论性能保证。采用广义仿射决策规则以获得最优控制问题的可计算近似,从而确保DDRMPC的实用性。使用真实数据的案例研究显示,DDRMPC能够可靠地保持土壤含水量在安全水平以上并避免作物破坏。所提出的DDRMPC方法相比精细调优的开环控制策略,总用水量减少了40%。与精心调优的规则基控制和确定性等价模型预测控制相比,所提出的DDRMPC方法可以显著减少总用水量并提高控制性能。

英文摘要

We develop a novel data-driven robust model predictive control (DDRMPC) approach for automatic control of irrigation systems. The fundamental idea is to integrate both mechanistic models, which describe dynamics in soil moisture variations, and data-driven models, which characterize uncertainty in forecast errors of evapotranspiration and precipitation, into a holistic systems control framework. To better capture the support of uncertainty distribution, we take a new learning-based approach by constructing uncertainty sets from historical data. For evapotranspiration forecast error, the support vector clustering-based uncertainty set is adopted, which can be conveniently built from historical data. As for precipitation forecast errors, we analyze the dependence of their distribution on forecast values, and further design a tailored uncertainty set based on the properties of this type of uncertainty. In this way, the overall uncertainty distribution can be elaborately described, which finally contributes to rational and efficient control decisions. To assure the quality of data-driven uncertainty sets, a training-calibration scheme is used to provide theoretical performance guarantees. A generalized affine decision rule is adopted to obtain tractable approximations of optimal control problems, thereby ensuring the practicability of DDRMPC. Case studies using real data show that DDRMPC can reliably maintain soil moisture above the safety level and avoid crop devastation. The proposed DDRMPC approach leads to a 40% reduction of total water consumption compared to the fine-tuned open-loop control strategy. In comparison with the carefully tuned rule-based control and certainty equivalent model predictive control, the proposed DDRMPC approach can significantly reduce the total water consumption and improve the control performance.

1904.01068 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

Efficient and Safe Exploration in Deterministic Markov Decision Processes with Unknown Transition Models

在未知转移模型的确定性马尔可夫决策过程中实现高效且安全的探索

Erdem Bıyık, Jonathan Margoliash, Shahrouz Ryan Alimo, Dorsa Sadigh

发表机构 * Stanford University(斯坦福大学) Jet Propulsion Laboratory(喷气推进实验室) California Institute of Technology(加州理工学院)

AI总结 本文提出了一种安全探索算法,通过利用Lipschitz连续性确保在探索过程中不访问危险状态,该算法在确定性马尔可夫决策过程中提供了确定性的安全保证,并通过模拟导航任务验证了其性能。

Comments Proceedings of the American Control Conference (ACC), July 2019. The first two authors have equal contribution

详情
AI中文摘要

我们提出了一种安全探索算法,用于具有未知转移模型的确定性马尔可夫决策过程。我们的算法通过利用Lipschitz连续性来保证安全性,确保在探索过程中不访问不安全的状态。与许多其他现有技术不同,所提供的安全保证是确定性的。我们的算法被优化以减少探索安全空间所需的操作次数。我们在导航任务的模拟中将我们的算法与基线方法进行了比较,以展示其性能。

英文摘要

We propose a safe exploration algorithm for deterministic Markov Decision Processes with unknown transition models. Our algorithm guarantees safety by leveraging Lipschitz-continuity to ensure that no unsafe states are visited during exploration. Unlike many other existing techniques, the provided safety guarantee is deterministic. Our algorithm is optimized to reduce the number of actions needed for exploring the safe space. We demonstrate the performance of our algorithm in comparison with baseline methods in simulation on navigation tasks.

1905.11011 2026-06-04 math.OC cs.AI cs.LG cs.SY eess.SY 版本更新

Robustness of accelerated first-order algorithms for strongly convex optimization problems

强凸优化问题中加速一阶算法的鲁棒性

Hesameddin Mohammadi, Meisam Razaviyayn, Mihailo R. Jovanović

发表机构 * Ming Hsieh Department of Electrical and Computer Engineering(Ming Hsieh电气与计算机工程系) Daniel J. Epstein Department of Industrial and Systems Engineering(Daniel J. Epstein工业与系统工程系)

AI总结 本文研究了在梯度评估中存在随机不确定性的加速一阶算法的鲁棒性,分析了噪声对优化变量均方误差的影响,并探讨了噪声放大与收敛速率之间的根本权衡。

Comments 45 pages, 6 figures

详情
AI中文摘要

我们研究了加速一阶算法对梯度评估中随机不确定性的鲁棒性。具体而言,针对无约束、光滑、强凸优化问题,我们考察了迭代点受到加性白噪声扰动时优化变量的均方误差。当通过对真实系统的测量来近似梯度,或在网络上进行分布式计算时,可能出现这类不确定性。尽管此类问题的一阶算法的底层动力学是非线性的,我们仍建立了相对最优解的均方偏差的上界,这些上界在常数因子意义下是紧的。我们的分析量化了任何类似于Nesterov方法或重球法(heavy-ball)的加速方案在噪声放大与收敛速率之间的根本权衡。为了获得更多的分析洞察,对于强凸二次问题,我们用目标函数Hessian矩阵的特征值显式表示了优化变量的稳态方差。我们证明了影响含噪算法鲁棒性的是Hessian的整个谱,而不仅仅是极值特征值。我们将这一结果具体应用于无向网络上的分布式平均问题,并考察了网络规模和拓扑结构对含噪加速算法鲁棒性的影响。

英文摘要

We study the robustness of accelerated first-order algorithms to stochastic uncertainties in gradient evaluation. Specifically, for unconstrained, smooth, strongly convex optimization problems, we examine the mean-squared error in the optimization variable when the iterates are perturbed by additive white noise. This type of uncertainty may arise in situations where an approximation of the gradient is sought through measurements of a real system or in a distributed computation over a network. Even though the underlying dynamics of first-order algorithms for this class of problems are nonlinear, we establish upper bounds on the mean-squared deviation from the optimal solution that are tight up to constant factors. Our analysis quantifies fundamental trade-offs between noise amplification and convergence rates obtained via any acceleration scheme similar to Nesterov's or heavy-ball methods. To gain additional analytical insight, for strongly convex quadratic problems, we explicitly evaluate the steady-state variance of the optimization variable in terms of the eigenvalues of the Hessian of the objective function. We demonstrate that the entire spectrum of the Hessian, rather than just the extreme eigenvalues, influences robustness of noisy algorithms. We specialize this result to the problem of distributed averaging over undirected networks and examine the role of network size and topology on the robustness of noisy accelerated algorithms.
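The steady-state variance on a quadratic can be computed directly in the scalar case. The sketch below is a minimal illustration under stated assumptions rather than the paper's analysis: it takes a single Hessian eigenvalue `lam`, models the noisy heavy-ball recursion as $x_{k+1} = x_k + \beta(x_k - x_{k-1}) - \alpha\lambda x_k + \sigma w_k$ with unit-variance white noise $w_k$, and iterates the discrete Lyapunov equation for the covariance of $(x_k, x_{k-1})$. The function name and parameter values are hypothetical.

```python
def steady_state_variance(lam, alpha, beta=0.0, sigma=1.0,
                          tol=1e-13, max_iter=1_000_000):
    """Steady-state variance of x_k for noisy heavy-ball on f(x) = lam * x**2 / 2.

    State z_k = (x_k, x_{k-1}) evolves as z_{k+1} = A z_k + (sigma * w_k, 0)
    with A = [[a, b], [1, 0]]; beta = 0 recovers noisy gradient descent.
    The covariance P is the fixed point of P = A P A^T + diag(sigma**2, 0).
    """
    a, b = 1.0 + beta - alpha * lam, -beta
    p11 = p12 = p22 = 0.0
    for _ in range(max_iter):
        n11 = a * a * p11 + 2.0 * a * b * p12 + b * b * p22 + sigma ** 2
        n12 = a * p11 + b * p12
        n22 = p11
        change = max(abs(n11 - p11), abs(n12 - p12), abs(n22 - p22))
        p11, p12, p22 = n11, n12, n22
        if change < tol:
            return p11
    raise RuntimeError("no convergence; the recursion may be unstable")


if __name__ == "__main__":
    print(steady_state_variance(lam=1.0, alpha=0.5))            # gradient descent
    print(steady_state_variance(lam=1.0, alpha=0.5, beta=0.5))  # heavy-ball
```

For gradient descent the result matches the closed form $\sigma^2 / (\alpha\lambda(2 - \alpha\lambda))$. With these illustrative parameters, adding momentum increases the variance, a small-scale instance of the noise-amplification effect the abstract describes; summing such terms over all eigenvalues is what makes the whole spectrum matter.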

1809.06646 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Model-Free Adaptive Optimal Control of Episodic Fixed-Horizon Manufacturing Processes using Reinforcement Learning

基于强化学习的回合式固定时域制造过程无模型自适应最优控制

Johannes Dornheim, Norbert Link, Peter Gumbsch

发表机构 * Institute Intelligent Systems Research Group, Karlsruhe University of Applied Sciences(智能系统研究组,卡尔斯鲁厄应用科学大学) Institute for Applied Materials (IAM-CMS), Karlsruhe Institute of Technology(应用材料研究所(IAM-CMS),卡尔斯鲁厄理工学院)

AI总结 本文提出了一种用于回合式固定时域制造过程的自学习最优控制算法,通过强化学习在连续多次的过程执行中构建控制模型,并以测得的产品质量作为奖励,从而无需传统模型预测控制和近似动态规划算法所要求的先验模型构建,避免了非线性动态和随机影响带来的系统辨识、精确建模和运行时复杂度方面的困难。

Comments Journal preprint version

详情
Journal ref
International Journal of Control, Automation and Systems (2019)
AI中文摘要

本文针对具有时间离散控制动作的回合式固定时域制造过程,提出了一种自学习最优控制算法,并在模拟的深拉伸过程上进行了评估。控制模型在连续多次的过程执行中通过强化学习构建,并以每次过程执行后测得的产品质量作为奖励。因此,模型预测控制和近似动态规划中的现有先进算法所需的先验模型构建不再必要。这避免了在处理受非线性动态和随机影响的过程时,在系统辨识、精确建模和运行时复杂度方面出现的若干困难。基于值函数的强化学习算法不使用预先建立的过程模型和观测模型,而是构建期望未来奖励的函数,并据此推导最优的过程控制决策。这些期望函数通过与过程交互在线学习。所提出的算法考虑了过程条件的随机变化,并能够应对部分可观测性。本文开发并研究了一种基于Q学习的方法,用于部分可观测的回合式固定时域制造过程的自适应最优控制。通过将其应用于金属板材深拉伸中的一个模拟随机最优控制问题,对所得到的算法进行了实例化和评估。

英文摘要

A self-learning optimal control algorithm for episodic fixed-horizon manufacturing processes with time-discrete control actions is proposed and evaluated on a simulated deep drawing process. The control model is built during consecutive process executions under optimal control via reinforcement learning, using the measured product quality as reward after each process execution. Prior model formulation, which is required by state-of-the-art algorithms from model predictive control and approximate dynamic programming, is therefore obsolete. This avoids several difficulties namely in system identification, accurate modelling, and runtime complexity, that arise when dealing with processes subject to nonlinear dynamics and stochastic influences. Instead of using pre-created process and observation models, value function-based reinforcement learning algorithms build functions of expected future reward, which are used to derive optimal process control decisions. The expectation functions are learned online, by interacting with the process. The proposed algorithm takes stochastic variations of the process conditions into account and is able to cope with partial observability. A Q-learning-based method for adaptive optimal control of partially observable episodic fixed-horizon manufacturing processes is developed and studied. The resulting algorithm is instantiated and evaluated by applying it to a simulated stochastic optimal control problem in metal sheet deep drawing.
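The learning loop described above can be sketched with tabular Q-learning on a toy fixed-horizon process. Everything in the sketch is a hypothetical stand-in: the three-step process, the quality function, and the hyperparameters are invented for illustration, the state is fully observable, and the deep drawing simulation and the handling of partial observability from the paper are not modelled. The only reward is the "product quality" observed after the final step, mirroring the setting in the abstract.

```python
import random
from collections import defaultdict

HORIZON = 3        # fixed number of control steps per process execution
ACTIONS = (0, 1)   # time-discrete control actions


def step(t, s, a):
    """Toy process: s counts how often action 1 was applied so far."""
    s_next = s + a
    done = (t + 1 == HORIZON)
    # Quality is measured only after the last step; it is best when s == 2.
    reward = -float((s_next - 2) ** 2) if done else 0.0
    return s_next, reward, done


def greedy_action(q, t, s):
    return max(ACTIONS, key=lambda a: q[(t, s, a)])


def train(episodes=2000, lr=0.5, epsilon=0.3, seed=0):
    """Epsilon-greedy tabular Q-learning over repeated process executions."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        s = 0
        for t in range(HORIZON):
            explore = rng.random() < epsilon
            a = rng.choice(ACTIONS) if explore else greedy_action(q, t, s)
            s_next, reward, done = step(t, s, a)
            future = 0.0 if done else max(q[(t + 1, s_next, b)] for b in ACTIONS)
            q[(t, s, a)] += lr * (reward + future - q[(t, s, a)])
            s = s_next
    return q


def greedy_return(q):
    """Total reward of one execution that follows the learned greedy policy."""
    s, total = 0, 0.0
    for t in range(HORIZON):
        s, reward, _ = step(t, s, greedy_action(q, t, s))
        total += reward
    return total


if __name__ == "__main__":
    print(greedy_return(train()))
```

Indexing the Q-table by the time step `t` as well as the state reflects the fixed horizon: the best action can depend on how many control steps remain.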

1807.01739 2026-06-04 math.OC cs.AI cs.LG cs.SY eess.SY 版本更新

Proximal algorithms for large-scale statistical modeling and sensor/actuator selection

大规模统计建模和传感器/执行器选择的近端算法

Armin Zare, Hesameddin Mohammadi, Neil K. Dhingra, Tryphon T. Georgiou, Mihailo R. Jovanović

发表机构 * Ming Hsieh Department of Electrical and Computer Engineering, University of Southern California(南加州大学明希赫电子与计算机工程系) Numerica Corporation(Numerica公司)

AI总结 本文提出了一种统一的近端算法框架,用于解决大规模系统建模与控制中的正则化半定规划问题,通过近端方法实现了对统计建模和传感器/执行器选择的高效处理,展示了算法的线性收敛性和有效性。

Comments To appear in IEEE Trans. Automat. Control

详情
AI中文摘要

若干在随机驱动动态系统建模与控制中的问题可以被表述为正则化半定规划。我们考察了两个具有代表性的此类问题,并展示了它们可以以类似的方式进行表述。第一个问题在统计建模中寻求通过适当且最小的扰动来协调观测统计数据。第二个问题则旨在为控制目的最优选择可用的传感器和执行器子集。为了应对大规模系统的建模与控制,我们开发了一种统一的算法框架,利用近端方法。我们的定制算法利用问题结构,使得能够处理统计建模以及传感器和执行器选择,比当前通用求解器可以处理的规模大得多。我们建立了近端梯度算法的线性收敛性,对比了所提出的近端算法与交替方向乘子法,并提供了示例以说明我们框架的优势和有效性。

英文摘要

Several problems in modeling and control of stochastically-driven dynamical systems can be cast as regularized semi-definite programs. We examine two such representative problems and show that they can be formulated in a similar manner. The first, in statistical modeling, seeks to reconcile observed statistics by suitably and minimally perturbing prior dynamics. The second seeks to optimally select a subset of available sensors and actuators for control purposes. To address modeling and control of large-scale systems we develop a unified algorithmic framework using proximal methods. Our customized algorithms exploit problem structure and allow handling statistical modeling, as well as sensor and actuator selection, for substantially larger scales than what is amenable to current general-purpose solvers. We establish linear convergence of the proximal gradient algorithm, draw contrast between the proposed proximal algorithms and alternating direction method of multipliers, and provide examples that illustrate the merits and effectiveness of our framework.
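
As a rough illustration of the proximal gradient iteration mentioned in the abstract, the sketch below applies it to a tiny l1-regularized least-squares problem instead of the paper's regularized semidefinite programs. The problem data, the step size, and the function names `soft_threshold` and `proximal_gradient` are assumptions made for this example only.

```python
def soft_threshold(v, t):
    """Proximal operator of t * |.| applied to a scalar."""
    if v > t:
        return v - t
    if v < -t:
        return v + t
    return 0.0


def proximal_gradient(A, b, lam, step, iters=500):
    """Minimize 0.5 * ||A x - b||^2 + lam * ||x||_1 by proximal gradient steps."""
    m, n = len(A), len(A[0])
    x = [0.0] * n
    for _ in range(iters):
        # Residual of the smooth least-squares term: r = A x - b
        r = [sum(A[i][j] * x[j] for j in range(n)) - b[i] for i in range(m)]
        # Gradient of the smooth term: A^T r
        g = [sum(A[i][j] * r[i] for i in range(m)) for j in range(n)]
        # Gradient step followed by the proximal (soft-thresholding) step
        x = [soft_threshold(x[j] - step * g[j], step * lam) for j in range(n)]
    return x


# Toy data (assumed): with A = I the minimizer is soft_threshold(b, lam).
A = [[1.0, 0.0], [0.0, 1.0]]
b = [3.0, 0.2]
x = proximal_gradient(A, b, lam=0.5, step=1.0)
print(x)
```

The step size must not exceed the reciprocal of the largest eigenvalue of A^T A (here 1); the small second entry of `b` is shrunk to exactly zero, which is the sparsity-promoting effect of the l1 term.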

1803.00204 2026-06-04 cs.LG cs.AI cs.NA math.NA stat.ML 版本更新

Scalar Quantization as Sparse Least Square Optimization

标量量化作为稀疏最小二乘优化

Chen Wang, Xiaomei Yang, Shaomin Fei, Kai Zhou, Xiaofeng Gong, Miao Du, Ruisen Luo

发表机构 * College of Electrical Engineering, Sichuan University(四川大学电气工程学院) Department of Computer Science, Rutgers University -- New Brunswick(罗格斯大学新布朗斯维克分校计算机科学系) Engineering Practice Center, Chengdu University of Information Technology(成都信息工程大学工程实践中心)

AI总结 本文提出了一种基于稀疏最小二乘优化的新方法,用于解决标量量化中的问题,通过引入l1、l1+l2和l0正则化,改进了传统聚类方法的不足,提升了在位宽缩减场景下的性能。

详情
Journal ref
IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019
AI中文摘要

量化可以用来形成具有共享值的新向量/矩阵,其值接近原始数据。近年来,标量量化在值共享应用中的普及度迅速上升,因为它在减少神经网络复杂度方面具有巨大实用性。现有的基于聚类的量化技术虽然发展成熟,但存在多个缺点,包括对随机种子的依赖性、空集群或超出范围的集群,以及大量集群时的时间复杂度高。为克服这些问题,本文从新的视角研究标量量化问题,即稀疏最小二乘优化。具体来说,受稀疏最小二乘回归性质的启发,提出了几种基于l1最小二乘的量化算法。此外,还提出了类似的方案,具有l1 + l2和l0正则化。此外,为了计算给定数量的值/集群的量化结果,本文设计了一种迭代方法和一种基于聚类的方法,并且两者都建立在稀疏最小二乘之上。本文表明,后者方法在数学上等价于改进版的k-means聚类基量化算法,尽管两种算法起源于不同的直觉。所提出的算法在三种类型的数据上进行了测试,比较和分析了其计算性能,包括信息损失、时间消耗以及稀疏向量值的分布。本文为量化领域提供了新的视角,所提出的算法在某些位宽缩减场景下表现优异,当所需的量化后分辨率(值的数量)不显著低于原始数量时尤其如此。

英文摘要

Quantization can be used to form new vectors/matrices with shared values close to the original. In recent years, the popularity of scalar quantization for value-sharing applications has been soaring, as it has been found to be highly useful in reducing the complexity of neural networks. Existing clustering-based quantization techniques, while well-developed, have multiple drawbacks, including dependence on the random seed, empty or out-of-range clusters, and high time complexity for a large number of clusters. To overcome these problems, this paper examines scalar quantization from a new perspective, namely sparse least square optimization. Specifically, inspired by the properties of sparse least square regression, several quantization algorithms based on $l_1$ least square are proposed. In addition, similar schemes with $l_1 + l_2$ and $l_0$ regularization are proposed. Furthermore, to compute quantization results with a given number of values/clusters, the paper designs an iterative method and a clustering-based method, both built on sparse least square. The paper shows that the latter method is mathematically equivalent to an improved version of the k-means clustering-based quantization algorithm, although the two algorithms originate from different intuitions. The proposed algorithms were tested on three types of data, and their computational performance, including information loss, time consumption, and the distribution of the values of the sparse vectors, was compared and analyzed. The paper offers a new perspective on quantization, and the proposed algorithms can outperform existing methods, especially under some bit-width reduction scenarios in which the required post-quantization resolution (number of values) is not significantly lower than the original number.

1810.00697 2026-06-04 eess.SY cs.AI cs.LG cs.SY 版本更新

Data-driven Discovery of Cyber-Physical Systems

数据驱动的信息物理系统发现

Ye Yuan, Xiuchuan Tang, Wei Pan, Xiuting Li, Wei Zhou, Hai-Tao Zhang, Han Ding, Jorge Goncalves

发表机构 * School of Automation, Huazhong University of Science and Technology(华中科技大学自动化学院) State Key Lab of Digital Manufacturing Equipment and Technology(数字制造装备与技术国家重点实验室) School of Mechanical Science and Engineering, Huazhong University of Science and Technology(华中科技大学机械科学与工程学院) Department of Cognitive Robotics, Delft University of Technology(代尔夫特理工大学认知机器人系) Department of Engineering, University of Cambridge(剑桥大学工程系) Luxembourg Centre for Systems Biomedicine, University of Luxembourg(卢森堡系统生物医学中心,卢森堡大学)

AI总结 本文提出了一种直接从数据对信息物理系统(CPS)进行逆向工程的通用框架,通过辨识物理系统并推断转移逻辑,成功应用于机械、电气系统和医疗应用,为预测CPS轨迹、评估性能、设计容错系统和制定新系统设计指南提供了新方法。

详情
AI中文摘要

信息物理系统(CPSs)将软件嵌入物理世界,广泛应用于智能电网、机器人、智能制造和医疗监测等领域。由于物理组件与信息组件的组合及其相互作用所带来的固有复杂性,CPSs一直难以建模。本文提出了一种直接从数据对CPSs进行逆向工程的通用框架。该方法包括辨识物理系统以及推断转移逻辑。它已成功应用于从机械和电气系统到医疗应用的多个现实世界示例。该新颖的框架旨在使研究人员能够基于发现的模型预测CPS的轨迹。此类信息已被证明对于评估CPS性能、设计容错CPS以及为新CPS制定设计指南至关重要。

英文摘要

Cyber-physical systems (CPSs) embed software into the physical world. They appear in a wide range of applications such as smart grids, robotics, intelligent manufacture and medical monitoring. CPSs have proved resistant to modeling due to their intrinsic complexity arising from the combination of physical components and cyber components and the interaction between them. This study proposes a general framework for reverse engineering CPSs directly from data. The method involves the identification of physical systems as well as the inference of transition logic. It has been applied successfully to a number of real-world examples ranging from mechanical and electrical systems to medical applications. The novel framework seeks to enable researchers to make predictions concerning the trajectory of CPSs based on the discovered model. Such information has been proven essential for the assessment of the performance of CPS, the design of failure-proof CPS and the creation of design guidelines for new CPSs.

1904.04211 2026-06-04 astro-ph.SR cs.AI cs.NA math.NA 版本更新

Desaturating EUV observations of solar flaring storms

太阳耀斑风暴极紫外(EUV)观测的去饱和处理

Sabrina Guastavino, Michele Piana, Anna Maria Massone, Richard Schwartz, Federico Benvenuto

发表机构 * CNR - SPIN(意大利国家研究委员会-SPIN) NASA Goddard Space Flight Center(美国国家航空航天局戈达德空间飞行中心)

AI总结 本文提出了一种新颖的去饱和方法,能够通过利用图像本身的信息恢复AIA图像中饱和区域的信号,为构建AIA数据重建流程提供了可靠工具。

详情
AI中文摘要

图像饱和一直是太阳天文观测中多个仪器面临的问题,特别是在EUV波长范围内。然而,随着太阳动态观测站(SDO)任务载荷中大气成像装配(AIA)的发射,图像饱和已成为大数据问题,涉及自2010年2月以来每年提供的 impressive 数据集中的约10^$帧。本文介绍了一种新颖的去饱和方法,该方法能够通过利用图像本身包含的信息来恢复任何AIA图像中饱和区域的信号。这种独特的方法学特性,加上去饱和图像前所未有的统计可靠性,可能使该算法成为实现AIA数据重建流程的完美工具,即使在长时间、高能耀斑事件的情况下也能正常工作。

英文摘要

Image saturation has been an issue for several instruments in solar astronomy, mainly at EUV wavelengths. However, with the launch of the Atmospheric Imaging Assembly (AIA) as part of the payload of the Solar Dynamic Observatory (SDO) image saturation has become a big data issue, involving around 10^$ frames of the impressive dataset this beautiful telescope has been providing every year since February 2010. This paper introduces a novel desaturation method, which is able to recover the signal in the saturated region of any AIA image by exploiting no other information but the one contained in the image itself. This peculiar methodological property, jointly with the unprecedented statistical reliability of the desaturated images, could make this algorithm the perfect tool for the realization of a reconstruction pipeline for AIA data, able to work properly even in the case of long-lasting, very energetic flaring events.

1806.06790 2026-06-04 cs.LG cs.AI cs.IT cs.SY eess.SY math.IT math.OC stat.ML 版本更新

Towards Distributed Energy Services: Decentralizing Optimal Power Flow with Machine Learning

迈向分布式能源服务:利用机器学习实现最优功率流的去中心化

Roel Dobbe, Oscar Sondermeijer, David Fridovich-Keil, Daniel Arnold, Duncan Callaway, Claire Tomlin

发表机构 * AI Now Institute at New York University(纽约大学AI Now研究所) Energy & Resources Group at UC Berkeley(加州大学伯克利分校能源与资源组)

AI总结 本文提出了一种基于机器学习的去中心化方法,通过本地可用信息学习可控分布式能源资源(DER)的控制策略,以重构和模仿集中式最优功率流(OPF)问题的解决方案,从而实现分布式能源服务。

Comments Accepted for publication. To appear in the IEEE Transactions on Smart Grid

详情
AI中文摘要

实现最优功率流(OPF)方法以调节电力网络中的电压和功率流通常被认为需要大量通信。我们考虑包含多个可控分布式能源资源(DER)的配电系统,并提出一种数据驱动的方法,用于学习每个DER的控制策略,以仅利用本地可用信息来重构和模仿集中式OPF问题的解决方案。集体来看,所有本地控制器紧密匹配集中式OPF解决方案,提供接近最优的性能并满足系统约束。速率失真框架使得能够分析由此产生的完全去中心化控制策略在重构OPF解决方案方面的效果。该方法为决定DER应与哪些节点通信以改进其个别策略提供了自然扩展。该方法在单相和三相测试馈线网络上应用,使用真实负载和分布式发电机的数据,重点于不表现出跨时间依赖性的DER。它为配电系统运营商提供了一个框架,以高效规划和操作DER的贡献,以实现配电网络中的分布式能源服务。

英文摘要

The implementation of optimal power flow (OPF) methods to perform voltage and power flow regulation in electric networks is generally believed to require extensive communication. We consider distribution systems with multiple controllable Distributed Energy Resources (DERs) and present a data-driven approach to learn control policies for each DER to reconstruct and mimic the solution to a centralized OPF problem from solely locally available information. Collectively, all local controllers closely match the centralized OPF solution, providing near optimal performance and satisfaction of system constraints. A rate distortion framework enables the analysis of how well the resulting fully decentralized control policies are able to reconstruct the OPF solution. The methodology provides a natural extension to decide what nodes a DER should communicate with to improve the reconstruction of its individual policy. The method is applied on both single- and three-phase test feeder networks using data from real loads and distributed generators, focusing on DERs that do not exhibit inter-temporal dependencies. It provides a framework for Distribution System Operators to efficiently plan and operate the contributions of DERs to achieve Distributed Energy Services in distribution networks.

1807.08229 2026-06-04 cs.AI cs.RO cs.SY eess.SY 版本更新

Optimal Continuous State POMDP Planning with Semantic Observations: A Variational Approach

基于语义观测的最优连续状态POMDP规划:一种变分方法

Luke Burks, Ian Loefgren, Nisar Ahmed

AI总结 本文提出了一种基于变分方法的最优规划策略,针对语义观测下的连续状态部分可观测马尔可夫决策过程(CPOMDP)进行改进,通过变分贝叶斯方法解决混合连续-离散概率模型的表示和推理问题,提升了动态决策任务的效率和鲁棒性。

Comments Final version accepted to IEEE Transactions on Robotics (in press as of August 2019)

详情
AI中文摘要

本文开发了用于利用语义观测进行最优规划的新策略,使用连续状态部分可观测马尔可夫决策过程(CPOMDP)。在高斯混合(GM)CPOMDP策略近似方法方面,提出了两项主要创新。尽管现有方法具有许多有益的理论性质,但它们无法高效地表示和推理混合连续-离散概率模型。第一项主要创新是通过softmax模型推导出连续-离散语义观测概率的变分贝叶斯GM近似,用于点基值迭代贝尔曼策略备份。这种方法的关键优势是可以在复杂的非高斯不确定性下进行动态决策,同时利用连续动态状态空间模型(从而避免繁琐且昂贵的离散化)。第二项主要创新是一种基于聚类的混合物凝聚技术,能够很好地扩展到非常大的GM策略函数和信念函数。针对目标搜索和拦截任务的仿真结果表明,这些创新所产生的GM策略比其他最先进的策略近似方法产生的策略更有效,但需要显著较少的建模开销和在线运行时间成本。此外,结果还显示该方法对模型误差具有鲁棒性,并能扩展到更高维度。

英文摘要

This work develops novel strategies for optimal planning with semantic observations using continuous state partially observable Markov decision processes (CPOMDPs). Two major innovations are presented in relation to Gaussian mixture (GM) CPOMDP policy approximation methods. While existing methods have many desirable theoretical properties, they are unable to efficiently represent and reason over hybrid continuous-discrete probabilistic models. The first major innovation is the derivation of closed-form variational Bayes GM approximations of Point-Based Value Iteration Bellman policy backups, using softmax models of continuous-discrete semantic observation probabilities. A key benefit of this approach is that dynamic decision-making tasks can be performed with complex non-Gaussian uncertainties, while also exploiting continuous dynamic state space models (thus avoiding cumbersome and costly discretization). The second major innovation is a new clustering-based technique for mixture condensation that scales well to very large GM policy functions and belief functions. Simulation results for a target search and interception task with semantic observations show that the GM policies resulting from these innovations are more effective than those produced by other state-of-the-art policy approximations, but require significantly less modeling overhead and online runtime cost. Additional results show the robustness of this approach to model errors and scaling to higher dimensions.

1902.07747 2026-06-04 eess.SY cs.AI cs.DC cs.RO cs.SY 版本更新

Lookup Table-Based Consensus Algorithm for Real-Time Longitudinal Motion Control of Connected and Automated Vehicles

基于查找表的一致性算法用于网联自动驾驶车辆的实时纵向运动控制

Ziran Wang, Kyuntae Han, BaekGyu Kim, Guoyuan Wu, Matthew J. Barth

AI总结 本文提出了一种基于查找表的一致性算法,用于实时控制网联自动驾驶车辆(CAV)的纵向运动,通过构建查找表实时寻找理想的控制增益,其表现优于作者此前的工作和基于线性反馈的算法。

Comments 2019 American Control Conference (ACC), Philadelphia, PA, USA, July 10-12, 2019. ISBN 978-1-5386-7928-9

详情
AI中文摘要

网联自动驾驶车辆(CAV)技术是解决当前交通系统安全、机动性和可持续性问题的有前途的方案之一。具体而言,控制算法在CAV系统中起着重要作用,因为它执行由通信、感知和规划等前序步骤生成的指令。在本研究中,我们提出了一种一致性算法,用于实时控制CAV的纵向运动。该领域以往研究中一致性算法的控制增益是预先确定且固定的;与之不同,我们开发了构建查找表的算法,可针对CAV的不同初始条件实时查找理想的控制增益。数值仿真表明,在CAV初始条件各异的全部四种场景中,所提出的基于查找表的一致性算法在收敛时间和最大加加速度(jerk)方面均优于作者此前的工作以及van Arem的基于线性反馈的纵向运动控制算法。

英文摘要

Connected and automated vehicle (CAV) technology is one of the promising solutions for addressing the safety, mobility and sustainability issues of our current transportation systems. Specifically, the control algorithm plays an important role in a CAV system, since it executes the commands generated by preceding steps, such as communication, perception, and planning. In this study, we propose a consensus algorithm to control the longitudinal motion of CAVs in real time. Unlike previous studies in this field, where the control gains of the consensus algorithm are pre-determined and fixed, we develop algorithms to build a lookup table that is searched in real time for the ideal control gains with respect to different initial conditions of CAVs. Numerical simulation shows that the proposed lookup table-based consensus algorithm outperforms the authors' previous work, as well as van Arem's linear feedback-based longitudinal motion control algorithm, in all four scenarios with various initial conditions of CAVs, in terms of convergence time and maximum jerk of the simulation run.
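
The idea of choosing consensus gains from a lookup table keyed on initial conditions can be sketched roughly as follows. This is not the authors' algorithm: the double-integrator error model, the nearest-entry lookup, the table contents, and all names (`GAIN_TABLE`, `lookup_gains`, `simulate`) are hypothetical choices made only to illustrate the mechanism.

```python
# Hypothetical table: (initial gap error [m], initial speed error [m/s])
# mapped to (position gain, velocity gain). Values are illustrative only.
GAIN_TABLE = {
    (5.0, 0.0): (0.5, 1.5),
    (20.0, 0.0): (0.2, 1.0),
    (5.0, -3.0): (0.4, 1.4),
}


def lookup_gains(table, gap_error, speed_error):
    """Return the gains stored for the nearest tabulated initial condition."""
    key = min(
        table,
        key=lambda k: (k[0] - gap_error) ** 2 + (k[1] - speed_error) ** 2,
    )
    return table[key]


def simulate(gap_error, speed_error, table, dt=0.1, steps=600):
    """Drive the follower's gap and speed errors toward zero (consensus)."""
    k_pos, k_vel = lookup_gains(table, gap_error, speed_error)
    e, v = gap_error, speed_error
    for _ in range(steps):
        accel = -k_pos * e - k_vel * v   # linear consensus-style feedback
        v += accel * dt                  # update speed error first
        e += v * dt                      # then gap error (semi-implicit Euler)
    return e, v


final_gap, final_speed = simulate(18.0, 0.0, GAIN_TABLE)
print(final_gap, final_speed)
```

In this toy setup the gains are fixed once per run from the initial condition; a fuller treatment would also compare convergence time and maximum jerk across table entries, which the sketch does not attempt.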

1903.02531 2026-06-04 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY 版本更新

Combining Optimal Control and Learning for Visual Navigation in Novel Environments

将最优控制与学习相结合用于新环境中的视觉导航

Somil Bansal, Varun Tolani, Saurabh Gupta, Jitendra Malik, Claire Tomlin

发表机构 * University of California, Berkeley(加州大学伯克利分校) Facebook AI Research(脸书人工智能研究)

AI总结 本文提出了一种将基于模型的控制与基于学习的感知相结合的方法,用于在新环境中实现可靠的视觉导航;感知模块生成航路点(waypoints),沿无碰撞路径引导机器人高效到达目标位置,同时在低帧率和仿真到现实的迁移中表现良好。

Comments Project website: https://vtolani95.github.io/WayPtNav/

详情
AI中文摘要

基于模型的控制是机器人导航的流行范式,因为它可以利用已知的动力学模型来高效地规划鲁棒的机器人轨迹。然而,在环境事先未知且只能通过机器人上的传感器部分观测的情况下,使用基于模型的方法具有挑战性。在本工作中,我们通过将基于模型的控制与基于学习的感知相结合来解决这一不足。基于学习的感知模块生成一系列航路点(waypoints),沿无碰撞路径引导机器人到达目标。基于模型的规划器利用这些航路点生成平滑且动力学可行的轨迹,并通过反馈控制在物理系统上执行。我们在仿真的真实世界杂乱环境中以及在实际地面车辆上的实验表明,与纯几何建图方法或端到端学习方法相比,所提出的方法在新环境中能够更可靠、更高效地到达目标位置。我们的方法不依赖于详细的显式3D环境地图,在低帧率下工作良好,并且在仿真到现实的迁移中表现良好。描述我们方法和实验的视频可在项目网站上获得。

英文摘要

Model-based control is a popular paradigm for robot navigation because it can leverage a known dynamics model to efficiently plan robust robot trajectories. However, it is challenging to use model-based methods in settings where the environment is a priori unknown and can only be observed partially through on-board sensors on the robot. In this work, we address this shortcoming by coupling model-based control with learning-based perception. The learning-based perception module produces a series of waypoints that guide the robot to the goal via a collision-free path. These waypoints are used by a model-based planner to generate a smooth and dynamically feasible trajectory that is executed on the physical system using feedback control. Our experiments in simulated real-world cluttered environments and on an actual ground vehicle demonstrate that the proposed approach can reach goal locations more reliably and efficiently in novel environments as compared to purely geometric mapping-based or end-to-end learning-based alternatives. Our approach does not rely on detailed explicit 3D maps of the environment, works well with low frame rates, and generalizes well from simulation to the real world. Videos describing our approach and experiments are available on the project website.

1807.06613 2026-06-04 cs.MA cs.AI cs.LG cs.SY eess.SY stat.ML 版本更新

Deep Reinforcement Learning for Swarm Systems

深度强化学习用于群体系统

Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann

发表机构 * L-CAS University of Lincoln(L-CAS林肯大学) Technische Universität Darmstadt(达姆施塔特技术大学)

AI总结 本文提出了一种基于分布均值嵌入的新状态表示方法,用于深度多智能体强化学习,以更有效地处理大规模同质群体系统的去中心化决策问题。

Comments 31 pages, 12 figures, version 3 (published in JMLR Volume 20)

详情
Journal ref
Journal of Machine Learning Research 20(54):1-31, 2019
AI中文摘要

最近,深度强化学习(RL)方法已成功应用于多智能体场景。通常,这些方法依赖于将智能体状态拼接起来以表示去中心化决策所需的信息内容。然而,拼接在包含大量同质智能体的群体系统中扩展性较差,因为它没有利用这些系统固有的基本属性:(i)群体中的智能体是可互换的,(ii)群体中智能体的精确数量无关紧要。因此,我们提出了一种基于分布均值嵌入的新深度多智能体RL状态表示方法。我们将智能体视为分布的样本,并使用经验均值嵌入作为去中心化策略的输入。我们通过直方图、径向基函数和端到端学习的神经网络定义了均值嵌入的不同特征空间。我们在群体文献中的两个著名问题(会合与追逃)上,分别在全局可观测和局部可观测的设置下评估了该表示方法。对于局部设置,我们进一步引入了简单的通信协议。在所有方法中,基于神经网络特征的均值嵌入表示能够实现相邻智能体之间最丰富的信息交换,从而促进更复杂的集体策略的发展。

英文摘要

Recently, deep reinforcement learning (RL) methods have been applied successfully to multi-agent scenarios. Typically, these methods rely on a concatenation of agent states to represent the information content required for decentralized decision making. However, concatenation scales poorly to swarm systems with a large number of homogeneous agents as it does not exploit the fundamental properties inherent to these systems: (i) the agents in the swarm are interchangeable and (ii) the exact number of agents in the swarm is irrelevant. Therefore, we propose a new state representation for deep multi-agent RL based on mean embeddings of distributions. We treat the agents as samples of a distribution and use the empirical mean embedding as input for a decentralized policy. We define different feature spaces of the mean embedding using histograms, radial basis functions and a neural network learned end-to-end. We evaluate the representation on two well known problems from the swarm literature (rendezvous and pursuit evasion), in a globally and locally observable setup. For the local setup we furthermore introduce simple communication protocols. Of all approaches, the mean embedding representation using neural network features enables the richest information exchange between neighboring agents facilitating the development of more complex collective strategies.
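
A minimal sketch of an empirical mean embedding with radial basis function features is given below to make the two invariances from the abstract concrete. The centers, the width, the toy observations, and the names `rbf_features` and `mean_embedding` are assumptions for illustration; the paper's actual feature spaces (histograms, RBFs, learned networks) are not reproduced here.

```python
import math


def rbf_features(obs, centers, width=1.0):
    """Map one neighbor observation (a tuple) to a vector of RBF activations."""
    return [
        math.exp(-sum((o - c) ** 2 for o, c in zip(obs, ctr)) / (2.0 * width ** 2))
        for ctr in centers
    ]


def mean_embedding(neighbor_obs, centers, width=1.0):
    """Average the feature vectors of all observed neighbors."""
    if not neighbor_obs:
        return [0.0] * len(centers)
    feats = [rbf_features(o, centers, width) for o in neighbor_obs]
    return [sum(col) / len(feats) for col in zip(*feats)]


# Assumed toy data: relative positions of three neighboring agents.
CENTERS = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
neighbors = [(0.1, 0.2), (0.9, 0.1), (0.3, 0.8)]

emb = mean_embedding(neighbors, CENTERS)
emb_permuted = mean_embedding(list(reversed(neighbors)), CENTERS)
emb_doubled = mean_embedding(neighbors + neighbors, CENTERS)
print(emb)
```

The embedding has a fixed length regardless of how many neighbors are observed, is unchanged when the neighbors are reordered, and is unchanged when the same empirical distribution is represented by more agents, which is why it can serve as input to a single decentralized policy.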

1905.13053 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Unpredictability of AI

AI的不可预测性

Roman V. Yampolskiy

发表机构 * Computer Engineering and Computer Science University of Louisville(路易斯维尔大学计算机工程与计算机科学系)

AI总结 本文研究了AI安全领域中一个核心问题,即智能系统的行为预测难题,证明了即使知道终端目标,也无法准确预测超人类智能系统的行为,对AI安全产生了深远影响。

详情
AI中文摘要

AI安全这一年轻领域仍在确定其挑战和限制的过程中。在本文中,我们正式描述了一个不可能结果,即AI的不可预测性。我们证明了即使知道系统的终端目标,也无法准确且一致地预测超人类智能系统为实现其目标所采取的具体行动。最后讨论了不可预测性对AI安全的影响。

英文摘要

The young field of AI Safety is still in the process of identifying its challenges and limitations. In this paper, we formally describe one such impossibility result, namely Unpredictability of AI. We prove that it is impossible to precisely and consistently predict what specific actions a smarter-than-human intelligent system will take to achieve its objectives, even if we know terminal goals of the system. In conclusion, impact of Unpredictability on AI Safety is discussed.

1905.09673 2026-06-04 cs.AI cs.LG cs.SY eess.SY 版本更新

Deep Q-Learning with Q-Matrix Transfer Learning for Novel Fire Evacuation Environment

基于Q矩阵迁移学习的深度Q学习用于新型火灾疏散环境

Jivitesh Sharma, Per-Arne Andersen, Ole-Christoffer Granmo, Morten Goodwin

发表机构 * Centre for Artificial Intelligence Research(人工智能研究中心) Department of Information and Communication Technology(信息与通信技术系) University of Agder, Norway(阿格德大学,挪威)

AI总结 本文提出了一种基于Q矩阵迁移学习的深度Q学习方法,用于解决紧急疏散问题,通过预训练DQN网络权重以获取最短路径信息,并在复杂真实环境中实现最优疏散路径。

Comments 21 pages, 14 figures, 4 tables

详情
AI中文摘要

我们关注紧急疏散这一重要问题,该问题显然可以受益于强化学习,但长期以来未被充分研究。紧急疏散是一个复杂的任务,难以用强化学习解决,因为紧急情况高度动态,包含大量变化变量和复杂约束,使训练变得困难。在本文中,我们提出了第一个用于训练强化学习代理进行疏散规划的火灾疏散环境。该环境被建模为图,以捕捉建筑结构。它包括现实特征,如火势蔓延、不确定性和瓶颈。我们已经将环境实现为OpenAI gym格式,以促进未来研究。我们还提出了一种新的强化学习方法,该方法通过预训练DQN代理的网络权重来整合通往出口的最短路径信息。我们通过使用表格Q学习来学习建筑模型图中的最短路径来实现这一点。此信息通过故意在Q矩阵上过拟合来转移到网络。然后,预训练的DQN模型在火灾疏散环境中进行训练,以在时间变化条件下生成最优疏散路径。我们对所提出的方法与PPO、VPG、SARSA、A2C和ACKTR等最新强化学习算法进行了比较。结果表明,我们的方法在包括原始DQN模型在内的最新模型上表现出巨大的优势。最后,我们在一个大型且复杂的现实建筑中测试我们的模型,该建筑由91个房间组成,可以移动到任何其他房间,因此有8281种动作。我们使用基于注意力的机制来处理大动作空间。我们的模型在现实世界紧急环境中实现了接近最优的性能。

英文摘要

We focus on the important problem of emergency evacuation, which has been largely unaddressed and could clearly benefit from reinforcement learning. Emergency evacuation is a complex task that is difficult to solve with reinforcement learning, since an emergency situation is highly dynamic, with many changing variables and complex constraints that make it difficult to train on. In this paper, we propose the first fire evacuation environment to train reinforcement learning agents for evacuation planning. The environment is modelled as a graph capturing the building structure. It includes realistic features like fire spread, uncertainty and bottlenecks. We have implemented the environment in the OpenAI gym format to facilitate future research. We also propose a new reinforcement learning approach that entails pretraining the network weights of a DQN-based agent to incorporate information on the shortest path to the exit. We achieve this by using tabular Q-learning to learn the shortest path on the building model's graph. This information is transferred to the network by deliberately overfitting it on the Q-matrix. Then, the pretrained DQN model is trained on the fire evacuation environment to generate the optimal evacuation path under time-varying conditions. We compare the proposed approach with state-of-the-art reinforcement learning algorithms like PPO, VPG, SARSA, A2C and ACKTR. The results show that our method is able to outperform state-of-the-art models, including the original DQN-based models, by a huge margin. Finally, we test our model on a large and complex real building consisting of 91 rooms, with the possibility to move to any other room, hence giving 8281 actions. We use an attention-based mechanism to deal with large action spaces. Our model achieves near-optimal performance on the real-world emergency environment.
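
The first stage described above, learning shortest paths to the exit with tabular Q-learning on the building graph, might look roughly like the sketch below. The five-room graph, the reward of -1 per move, the learning rate of 1, and the purely random exploration are assumptions chosen for a deterministic toy example; the later step of overfitting a DQN to the resulting Q-matrix is not shown.

```python
import random


def q_learning_shortest_path(graph, exit_node, episodes=500, seed=0):
    """Learn Q(room, next_room) with a reward of -1 per move until the exit."""
    rng = random.Random(seed)
    q = {s: {a: 0.0 for a in nbrs} for s, nbrs in graph.items()}
    starts = [s for s in graph if s != exit_node]
    for _ in range(episodes):
        s = rng.choice(starts)
        while s != exit_node:
            a = rng.choice(graph[s])  # purely random exploration (off-policy)
            future = 0.0 if a == exit_node else max(q[a].values())
            q[s][a] = -1.0 + future   # learning rate 1: moves are deterministic
            s = a
    return q


def greedy_path(q, start, exit_node):
    """Follow the highest-valued action from start until the exit is reached."""
    path = [start]
    while path[-1] != exit_node and len(path) <= len(q):
        s = path[-1]
        path.append(max(q[s], key=q[s].get))
    return path


# Hypothetical building graph: room -> adjacent rooms; room 4 is the exit.
BUILDING = {0: [1, 3], 1: [0, 2], 2: [1, 4], 3: [0, 4], 4: [2, 3]}
q_matrix = q_learning_shortest_path(BUILDING, exit_node=4)
print(greedy_path(q_matrix, start=0, exit_node=4))
```

With these rewards, each learned Q-value approaches the negative number of moves needed to reach the exit when that action is taken first, so the greedy policy follows a shortest path.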

1902.01119 2026-06-04 cs.AI cs.CL cs.LG cs.SY eess.SY 版本更新

The Natural Language of Actions

动作的自然语言

Guy Tennenholtz, Shie Mannor

发表机构 * Faculty of Electrical Engineering, Technion Institute of Technology, Israel(以色列理工学院电气工程学院)

AI总结 本文提出Act2Vec框架,用于学习基于上下文的动作表示以提升强化学习性能,通过将相似动作分组并利用动作间的关系来改进Q值近似和状态表示。

Comments Published in the proceedings of the 36th International Conference on Machine Learning (ICML 2019)

详情
AI中文摘要

我们介绍了Act2Vec,一种通用框架,用于学习基于上下文的动作表示以用于强化学习。在向量空间中表示动作有助于强化学习算法通过将相似动作分组并利用不同动作之间的关系来实现更好的性能。我们展示了如何从演示中提取环境的先验知识,并将其注入到编码自然兼容行为的动作向量表示中。然后我们利用这些表示来增强状态表示以及改进Q值的函数逼近。我们还在三个领域中可视化和测试了动作嵌入:绘画任务、高维导航任务以及星际争霸II中的大规模动作空间领域。

英文摘要

We introduce Act2Vec, a general framework for learning context-based action representation for Reinforcement Learning. Representing actions in a vector space helps reinforcement learning algorithms achieve better performance by grouping similar actions and utilizing relations between different actions. We show how prior knowledge of an environment can be extracted from demonstrations and injected into action vector representations that encode natural compatible behavior. We then use these for augmenting state representations as well as improving function approximation of Q-values. We visualize and test action embeddings in three domains including a drawing task, a high dimensional navigation task, and the large action space domain of StarCraft II.

1905.02606 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Optimal Control of Complex Systems through Variational Inference with a Discrete Event Decision Process

通过变分推断优化复杂系统的控制:离散事件决策过程

Wen Dong, Bo Liu, Fan Yang

发表机构 * University at Buffalo(布法罗大学) Auburn University(奥本大学)

AI总结 本文提出了一种基于变分推断的方法,将复杂社会网络决策问题建模为离散事件决策过程,以解决高维状态-动作空间中的维度灾难问题,从而在现实交通场景中实现更高的系统预期奖励、更快的收敛速度和更低的价值函数方差。

详情
AI中文摘要

复杂社会系统由相互关联的个体组成,其相互作用导致群体行为。现实复杂系统的最优控制有广泛的应用,包括道路交通管理、流行病预防和信息传播。然而,由于高维和非线性系统动态以及决策者面临的爆炸性状态和动作空间,实现此类现实复杂系统控制具有挑战性。现有方法可分为基于仿真的方法和解析方法两类。现有的仿真方法在蒙特卡洛积分中具有高方差,而解析方法则面临建模不准确的问题。我们采用仿真建模来刻画复杂系统的复杂动态,并为在具有高维状态-动作空间的复杂网络中搜索最优策略开发了解析解。为了捕捉复杂系统的动态,我们将复杂社会网络决策问题建模为离散事件决策过程。为了解决复杂系统中的维度灾难和在高维状态-动作空间中的搜索问题,我们将复杂系统的控制归约为变分推断和参数学习,引入Bethe熵近似,并开发了期望传播算法。在现实交通场景中,我们提出的算法比最先进的解析和采样方法获得了更高的系统期望奖励、更快的收敛速度和更低的价值函数方差。

英文摘要

Complex social systems are composed of interconnected individuals whose interactions result in group behaviors. Optimal control of a real-world complex system has many applications, including road traffic management, epidemic prevention, and information dissemination. However, such real-world complex system control is difficult to achieve because of high-dimensional and non-linear system dynamics, and the exploding state and action spaces for the decision maker. Prior methods can be divided into two categories: simulation-based and analytical approaches. Existing simulation approaches have high-variance in Monte Carlo integration, and the analytical approaches suffer from modeling inaccuracy. We adopted simulation modeling in specifying the complex dynamics of a complex system, and developed analytical solutions for searching optimal strategies in a complex network with high-dimensional state-action space. To capture the complex system dynamics, we formulate the complex social network decision making problem as a discrete event decision process. To address the curse of dimensionality and search in high-dimensional state action spaces in complex systems, we reduce control of a complex system to variational inference and parameter learning, introduce Bethe entropy approximation, and develop an expectation propagation algorithm. Our proposed algorithm leads to higher system expected rewards, faster convergence, and lower variance of value function in a real-world transportation scenario than state-of-the-art analytical and sampling approaches.

1809.07412 2026-06-04 cs.LG cs.AI cs.SY eess.SY 版本更新

Learning, Planning, and Control in a Monolithic Neural Event Inference Architecture

在单体神经事件推理架构中的学习、规划与控制

Martin V. Butz, David Bilkey, Dania Humaidan, Alistair Knott, Sebastian Otte

发表机构 * Cognitive Modeling Group Computer Science Department University of Tübingen(图宾根大学认知建模组计算机科学系)

AI总结 该研究提出了一种单体神经事件推理架构REPRISE,通过学习动态系统的时序事件预测模型,结合回顾和前瞻推理,实现对感知运动动态的高效预测与控制。

Comments This is the final revision submitted to the Neural Networks journal. The revision mainly includes improvements in language, explanation, and additional references and system relations

详情
AI中文摘要

我们引入了REPRISE,一种回顾和前瞻推理方案,用于学习动态系统的时序事件预测模型。REPRISE以回顾的方式推断不可观测的上下文事件状态及相应的时序预测模型,使其能够最好地解释最近经历的感知运动经验。同时,它以前瞻的方式、按目标导向优化即将执行的运动活动。在此,REPRISE通过循环神经网络(RNN)实现,该网络学习由不同模拟动态车辆生成的感知运动关联的时序前向模型。RNN通过上下文神经元增强,能够将不同但相关的感知运动动态编码为紧凑的事件代码。我们证明REPRISE能够同时学习分离和近似所遇到的感知运动动态:它分析感知运动误差信号,同时调整内部上下文神经活动和连接权重值。此外,我们证明REPRISE可以利用所学模型诱导目标导向的模型预测控制,即近似主动推理:给定一个目标状态,系统想象一个运动命令序列,并以最小化与目标的距离为前瞻目标对其进行优化。RNN活动因此持续想象即将到来的未来并反思最近的过去,优化预测模型、隐藏神经状态活动和即将执行的运动活动。结果,事件预测神经编码得以发展,从而能够调用高效且适应性强的目标导向感知运动控制。

英文摘要

We introduce REPRISE, a REtrospective and PRospective Inference SchEme, which learns temporal event-predictive models of dynamical systems. REPRISE infers the unobservable contextual event state and accompanying temporal predictive models that best explain the recently encountered sensorimotor experiences retrospectively. Meanwhile, it optimizes upcoming motor activities prospectively in a goal-directed manner. Here, REPRISE is implemented by a recurrent neural network (RNN), which learns temporal forward models of the sensorimotor contingencies generated by different simulated dynamic vehicles. The RNN is augmented with contextual neurons, which enable the encoding of distinct, but related, sensorimotor dynamics as compact event codes. We show that REPRISE concurrently learns to separate and approximate the encountered sensorimotor dynamics: it analyzes sensorimotor error signals adapting both internal contextual neural activities and connection weight values. Moreover, we show that REPRISE can exploit the learned model to induce goal-directed, model-predictive control, that is, approximate active inference: Given a goal state, the system imagines a motor command sequence optimizing it with the prospective objective to minimize the distance to the goal. The RNN activities thus continuously imagine the upcoming future and reflect on the recent past, optimizing the predictive model, the hidden neural state activities, and the upcoming motor activities. As a result, event-predictive neural encodings develop, which allow the invocation of highly effective and adaptive goal-directed sensorimotor control.

1808.07921 2026-06-04 cs.RO cs.AI cs.PL cs.SE cs.SY eess.SY 版本更新

SOTER: A Runtime Assurance Framework for Programming Safe Robotics Systems

SOTER:一种用于安全机器人系统编程的运行时保证框架

Ankush Desai, Shromona Ghosh, Sanjit A. Seshia, Natarajan Shankar, Ashish Tiwari

发表机构 * University of California at Berkeley, CA, USA(加州大学伯克利分校) SRI International(SRI国际) Microsoft(微软)

AI总结 本文提出SOTER框架,通过一种编程语言和集成的运行时保证系统,为安全机器人系统提供保障,确保在使用未经认证组件时仍能满足安全要求。

详情
AI中文摘要

近年来,机器人实现更高自主性和智能性的趋势导致了高度复杂性。自主机器人越来越多地依赖第三方现成组件和复杂的机器学习技术。这种趋势使得提供强设计时认证的正确操作变得具有挑战性。为了解决这些挑战,我们提出了SOTER,一种机器人编程框架,包含两个关键组件:(1)一种用于实现和测试高层反应式机器人软件的编程语言;(2)一个集成的运行时保证(RTA)系统,该系统帮助在使用未经认证的组件时仍能提供安全保证。SOTER提供了语言原语,用于声明性地构建RTA模块,该模块包含一个高级高性能控制器(未经认证)、一个安全但性能较低的控制器(认证)以及期望的安全规范。该框架提供正式保证,确保一个良好的RTA模块始终满足安全规范,而无需完全牺牲性能,通过在安全时使用高性能未经认证的组件。SOTER允许复杂的机器人软件堆栈作为RTA模块的组合来构建,其中每个未经认证的组件都通过RTA模块进行保护。为了证明我们框架的有效性,我们考虑了一个现实世界案例研究,即构建一个安全的无人机监视系统。我们的实验在模拟和实际无人机上均表明,SOTER启用的RTA确保了系统的安全性,包括在不可信的第三方组件有bug或偏离预期行为时。

英文摘要

The recent drive towards achieving greater autonomy and intelligence in robotics has led to high levels of complexity. Autonomous robots increasingly depend on third party off-the-shelf components and complex machine-learning techniques. This trend makes it challenging to provide strong design-time certification of correct operation. To address these challenges, we present SOTER, a robotics programming framework with two key components: (1) a programming language for implementing and testing high-level reactive robotics software and (2) an integrated runtime assurance (RTA) system that helps enable the use of uncertified components, while still providing safety guarantees. SOTER provides language primitives to declaratively construct an RTA module consisting of an advanced, high-performance controller (uncertified), a safe, lower-performance controller (certified), and the desired safety specification. The framework provides a formal guarantee that a well-formed RTA module always satisfies the safety specification, without completely sacrificing performance by using higher performance uncertified components whenever safe. SOTER allows the complex robotics software stack to be constructed as a composition of RTA modules, where each uncertified component is protected using an RTA module. To demonstrate the efficacy of our framework, we consider a real-world case study of building a safe drone surveillance system. Our experiments both in simulation and on actual drones show that the SOTER-enabled RTA ensures the safety of the system, including when untrusted third-party components have bugs or deviate from the desired behavior.
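
The three ingredients of an RTA module named in the abstract (an uncertified advanced controller, a certified safe controller, and a safety specification) can be illustrated with the plain-Python sketch below. It does not use SOTER's language or its primitives; the one-dimensional state, the hand-picked switching margin, and the names `make_rta_module` and `advanced_allowed` are hypothetical.

```python
def make_rta_module(advanced, safe, advanced_allowed):
    """Wrap two controllers: use the advanced one only when it is allowed."""
    def controller(state):
        return advanced(state) if advanced_allowed(state) else safe(state)
    return controller


LIMIT = 10.0  # safety specification (assumed): |position| must stay <= LIMIT


def advanced_controller(x):
    """Uncertified and buggy on purpose: always pushes outward by 3 units."""
    return 3.0


def safe_controller(x):
    """Certified fallback: moves 1 unit back toward the origin."""
    return -1.0 if x > 0 else 1.0


def advanced_allowed(x):
    """Hand-picked margin: one advanced step from here cannot exceed LIMIT."""
    return abs(x) <= 6.0


rta = make_rta_module(advanced_controller, safe_controller, advanced_allowed)

position = 0.0
trace = []
for _ in range(50):
    position += rta(position)  # apply the selected controller's displacement
    trace.append(position)
print(max(abs(p) for p in trace))
```

The switching condition is what carries the safety argument: it must be conservative enough that handing control to the uncertified component for one step can never leave the safe region, which is checked by hand here instead of being derived formally.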

1904.05072 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Differential Dynamic Programming for Multi-Phase Rigid Contact Dynamics

多相刚体接触动力学中的微分动态规划

Rohan Budhiraja, Justin Carpentier, Carlos Mastalli, Nicolas Mansard

发表机构 * CNRS, LAAS(法国国家科学研究中心,拉拉斯研究所) INRIA, France(法国国家信息与自动化研究所,法国)

AI总结 本文提出使用微分动态规划算法来优化多相刚体接触动力学的全身轨迹,通过利用角动量提高运动效率,减少力和冲击,并在无外力情况下实现姿态控制。

Comments 6 pages, IEEE RAS International Conference on Humanoid Robots

详情
AI中文摘要

当今生成高效运动的常见策略是将问题分解为两个连续步骤:第一步生成接触序列和质心轨迹,第二步计算遵循质心模式的全身轨迹。然而,第二步通常由逆运动学求解器等简单程序处理。相反,我们提出使用局部最优控制求解器,即微分动态规划(DDP),来计算全身轨迹。我们的方法通过利用角动量(AM)产生更高效的运动,所受的力更小、冲击更弱。为此,我们提出了一种原创的DDP公式,利用刚体接触模型的Karush-Kuhn-Tucker约束。我们通过在真实HRP-2机器人上执行大步幅行走,以及求解无外力情况下的姿态控制问题,实验性地展示了这种方法的重要性。

英文摘要

A common strategy today to generate efficient locomotion movements is to split the problem into two consecutive steps: the first one generates the contact sequence together with the centroidal trajectory, while the second one computes the whole-body trajectory that follows the centroidal pattern. Yet the second step is generally handled by a simple program such as an inverse kinematics solver. In contrast, we propose to compute the whole-body trajectory by using a local optimal control solver, namely Differential Dynamic Programming (DDP). Our method produces more efficient motions, with lower forces and smaller impacts, by exploiting the Angular Momentum (AM). With this aim, we propose an original DDP formulation exploiting the Karush-Kuhn-Tucker constraint of the rigid contact model. We experimentally show the importance of this approach by executing large steps walking on the real HRP-2 robot, and by solving the problem of attitude control under the absence of external forces.

1904.04595 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Simultaneous Contact, Gait and Motion Planning for Robust Multi-Legged Locomotion via Mixed-Integer Convex Optimization

通过混合整数凸优化实现鲁棒多足运动的同步接触、步态和运动规划

Bernardo Aceituno-Cabezas, Carlos Mastalli, Hongkai Dai, Michele Focchi, Andreea Radulescu, Darwin G. Caldwell, Jose Cappelletto, Juan C. Grieco, Gerardo Fernandez-Lopez, Claudio Semini


AI总结 本文提出了一种混合整数凸优化方法,用于同时规划多足机器人的接触位置、步态转换和运动,以提高运动的通用性并保持低计算时间。

Comments 8 pages, IEEE Robotics and Automation Letters

详情
AI中文摘要

传统多足运动规划方法将问题分为多个阶段,如接触搜索和轨迹生成。然而,同时考虑接触和运动对于生成复杂的全身行为至关重要。目前,将这些问题耦合在一起需要假设固定的步态序列和平坦地形条件,或者使用非凸优化,计算时间不可行。本文提出了一种混合整数凸公式,以高效的方式同时规划接触位置、步态转换和运动。与之前的工作不同,我们的方法不限于平坦地形或预设的步态序列。相反,我们纳入摩擦锥稳定性边际,近似机器人扭矩限制,并使用混合整数凸约束规划步态。我们通过在HyQ机器人上实验验证了我们的方法,穿越了不同具有挑战性的地形,其中非凸性和平坦地形假设可能导致次优或不稳定计划。我们的方法在保持低计算时间的同时提高了运动的通用性。

英文摘要

Traditional motion planning approaches for multi-legged locomotion divide the problem into several stages, such as contact search and trajectory generation. However, reasoning about contacts and motions simultaneously is crucial for the generation of complex whole-body behaviors. Currently, coupling these problems has required either the assumption of a fixed gait sequence and flat terrain condition, or non-convex optimization with intractable computation time. In this paper, we propose a mixed-integer convex formulation to plan simultaneously contact locations, gait transitions and motion, in a computationally efficient fashion. In contrast to previous works, our approach is not limited to flat terrain nor to a pre-specified gait sequence. Instead, we incorporate the friction cone stability margin, approximate the robot's torque limits, and plan the gait using mixed-integer convex constraints. We experimentally validated our approach on the HyQ robot by traversing different challenging terrains, where non-convexity and flat terrain assumptions might lead to sub-optimal or unstable plans. Our method increases the motion generality while keeping a low computation time.

1904.02341 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Online Risk-Bounded Motion Planning for Autonomous Vehicles in Dynamic Environments

动态环境中自动驾驶车辆的在线风险受限运动规划

Xin Huang, Sungkweon Hong, Andreas Hofmann, Brian C. Williams

发表机构 * MIT Computer Science and Artificial Intelligence Laboratory(麻省理工学院计算机科学与人工智能实验室)

AI总结 本文提出了一种在线风险受限的运动规划方法,通过结合意图识别算法和POMDP求解器,生成安全高效的路径规划方案,尤其在无保护左转和变道等复杂环境中表现更优。

Comments Accepted at ICAPS'19. 10 pages, 6 figures, 1 table

详情
AI中文摘要

高效且稳健的自动驾驶车辆运动规划面临的一个关键挑战是理解周围智能体的意图。忽略动态环境中其他智能体的意图会导致有风险或过于保守的规划。本文将运动规划问题建模为部分可观测马尔可夫决策过程(POMDP),并提出一个在线系统,结合意图识别算法和POMDP求解器,为在多辆动态车辆间行驶的自车生成风险受限的规划。意图识别算法利用贝叶斯滤波和预先学习的机动运动模型库,预测每辆智能体车辆在有限时域内的概率混合运动状态。我们用意图识别结果实时更新POMDP模型,并使用启发式搜索算法求解,生成的策略对与其他动态智能体发生近碰撞的概率具有上界保证。我们证明,与基线方法相比,我们的系统能够在包括无保护路口左转和变道在内的多个具有挑战性的环境中,生成在效率和安全性方面更优的运动规划。

英文摘要

A crucial challenge to efficient and robust motion planning for autonomous vehicles is understanding the intentions of the surrounding agents. Ignoring the intentions of the other agents in dynamic environments can lead to risky or over-conservative plans. In this work, we model the motion planning problem as a partially observable Markov decision process (POMDP) and propose an online system that combines an intent recognition algorithm and a POMDP solver to generate risk-bounded plans for the ego vehicle navigating with a number of dynamic agent vehicles. The intent recognition algorithm predicts the probabilistic hybrid motion states of each agent vehicle over a finite horizon using Bayesian filtering and a library of pre-learned maneuver motion models. We update the POMDP model with the intent recognition results in real time and solve it using a heuristic search algorithm which produces policies with upper-bound guarantees on the probability of near colliding with other dynamic agents. We demonstrate that our system is able to generate better motion plans in terms of efficiency and safety in a number of challenging environments including unprotected intersection left turns and lane changes as compared to the baseline methods.
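
The Bayesian filtering step behind intent recognition can be sketched as a discrete Bayes update over a small maneuver library. This is only a simplified illustration, not the paper's algorithm: the two maneuvers, their mean lateral velocities in `MANEUVER_MEANS`, the noise level `SIGMA`, and the absence of a maneuver transition model are all assumptions.

```python
import math
from typing import Dict, Iterable

# Hypothetical maneuver library: mean lateral velocity (m/s) per maneuver.
MANEUVER_MEANS: Dict[str, float] = {"keep_lane": 0.0, "change_left": 0.5}
SIGMA = 0.2  # assumed observation noise (m/s)


def gaussian_likelihood(obs: float, mean: float, sigma: float = SIGMA) -> float:
    """Unnormalised Gaussian likelihood p(obs | maneuver)."""
    return math.exp(-((obs - mean) ** 2) / (2.0 * sigma ** 2))


def bayes_update(belief: Dict[str, float], obs: float) -> Dict[str, float]:
    """One Bayesian filtering step over the discrete maneuver set."""
    unnorm = {m: belief[m] * gaussian_likelihood(obs, MANEUVER_MEANS[m])
              for m in belief}
    total = sum(unnorm.values())
    if total == 0.0:
        raise ValueError("observation has zero likelihood under every maneuver")
    return {m: p / total for m, p in unnorm.items()}


def filter_intent(observations: Iterable[float]) -> Dict[str, float]:
    """Start from a uniform prior and fold in each observation in turn."""
    belief = {m: 1.0 / len(MANEUVER_MEANS) for m in MANEUVER_MEANS}
    for obs in observations:
        belief = bayes_update(belief, obs)
    return belief


# Three observed lateral velocities that look like a lane change.
posterior = filter_intent([0.4, 0.5, 0.6])
```

A planner could then treat `posterior` as the belief over the other vehicle's intent when evaluating collision risk; how the paper feeds this into its POMDP solver is not reproduced here.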

1707.09198 2026-06-04 cs.LG cs.AI cs.SY eess.SY math.OC 版本更新

Data-Driven Stochastic Robust Optimization: A General Computational Framework and Algorithm for Optimization under Uncertainty in the Big Data Era

数据驱动的随机稳健优化:大数据时代不确定性优化的通用计算框架和算法

Chao Ning, Fengqi You

发表机构 * Robert Frederick Smith School of Chemical and Biomolecular Engineering, Cornell University(罗伯特·弗雷德里克·史密斯化学与生物分子工程学院,康奈尔大学)

AI总结 本文提出了一种数据驱动的随机稳健优化框架,通过双层优化结构基于数据驱动的不确定性模型,结合两阶段随机规划和自适应稳健优化,解决大数据时代下的不确定性优化问题。

详情
Journal ref
Computers & Chemical Engineering, Volume 111, Pages 115-133, 4 March 2018
AI中文摘要

本文提出了一种新颖的数据驱动随机稳健优化(DDSRO)框架,用于利用带有标签的多类不确定性数据进行不确定性优化。大数据集中的不确定性数据通常来自各种条件,这些条件通过类别标签进行编码。采用狄利克雷过程混合模型和最大似然估计等机器学习方法进行不确定性建模。基于数据驱动的不确定性模型,进一步提出了一种双层优化结构的DDSRO框架。外层优化问题采用两阶段随机规划方法,以在不同数据类别上优化预期目标;自适应稳健优化作为内层问题,确保解决方案的鲁棒性,同时保持计算可行性。进一步开发了一种基于分解的算法,以高效解决由此产生的多级优化问题。通过过程网络设计和规划的案例研究,展示了所提框架和算法的应用性。

英文摘要

A novel data-driven stochastic robust optimization (DDSRO) framework is proposed for optimization under uncertainty leveraging labeled multi-class uncertainty data. Uncertainty data in large datasets are often collected from various conditions, which are encoded by class labels. Machine learning methods including Dirichlet process mixture model and maximum likelihood estimation are employed for uncertainty modeling. A DDSRO framework is further proposed based on the data-driven uncertainty model through a bi-level optimization structure. The outer optimization problem follows a two-stage stochastic programming approach to optimize the expected objective across different data classes; adaptive robust optimization is nested as the inner problem to ensure the robustness of the solution while maintaining computational tractability. A decomposition-based algorithm is further developed to solve the resulting multi-level optimization problem efficiently. Case studies on process network design and planning are presented to demonstrate the applicability of the proposed framework and algorithm.

1903.03948 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Rethinking System Health Management

重新思考系统健康管理

Edward Balaban, Stephen B. Johnson, Mykel J. Kochenderfer

发表机构 * Intelligent Systems Division, NASA Ames Research Center(美国国家航空航天局阿姆斯研究中心智能系统部门) Dependable System Technologies, LLC(可靠系统技术有限公司) Jacobs ESSCA Group at NASA Marshall Space Flight Center(美国国家航空航天局马歇尔太空飞行中心Jacobs ESSCA小组) Department of Aeronautics and Astronautics, Stanford University(斯坦福大学航空与航天系)

AI总结 本文提出将系统健康管理与决策制定统一起来,以提高系统运行效率并降低整体复杂性,通过数值示例展示了传统方法的局限性。

Comments Published in the proceedings of the 2018 AAAI Fall Symposium on Integrating Planning, Diagnosis, and Causal Reasoning

详情
AI中文摘要

复杂动态系统的健康管理传统上与自动化控制、规划和调度(本文统称为决策制定)分开发展。集成系统健康管理的目标之一是使系统健康管理与决策制定协调一致,尽管成功的实际应用仍然有限。本文提出,系统健康管理与决策制定不应被视为相互连接但又不同的实体,而应在问题表述上加以统一。借助建模和计算的进步,我们主张统一方法将提高系统的运行效能,并可能降低整体系统复杂性。我们概述了主流的系统健康管理方法,并通过数值示例说明其局限性。然后描述了所提出的统一方法,并展示其如何容纳典型的系统健康管理概念。

英文摘要

Health management of complex dynamic systems has traditionally evolved separately from automated control, planning, and scheduling (generally referred to in the paper as decision making). A goal of Integrated System Health Management has been to enable coordination between system health management and decision making, although successful practical implementations have remained limited. This paper proposes that, rather than being treated as connected, yet distinct entities, system health management and decision making should be unified in their formulations. Enabled by advances in modeling and computing, we argue that the unified approach will increase a system's operational effectiveness and may also lead to a lower overall system complexity. We overview the prevalent system health management methodology and illustrate its limitations through numerical examples. We then describe the proposed unification approach and show how it accommodates the typical system health management concepts.

1812.05591 2026-06-04 eess.SY cs.AI cs.MA cs.SY 版本更新

TuSeRACT: Turn-Sample-Based Real-Time Traffic Signal Control

TuSeRACT:基于转向采样的实时交通信号控制

Srishti Dhamija, Pradeep Varakantham

发表机构 * School of Information Systems, Singapore Management University(新加坡管理大学信息学院)

AI总结 本文提出TuSeRACT,一种基于转向采样的实时交通信号控制方法,通过对转向运动进行采样来优化交通信号调度,相比SURTRAC显著降低了平均车辆等待时间。

详情
AI中文摘要

实时交通信号控制是一个具有挑战性的问题,原因在于现实世界中不断变化的交通需求模式、有限的规划时间以及各种不确定性来源(例如转向运动、车辆检测)。SURTRAC(Scalable URban TRAffic Control,可扩展城市交通控制)是一种最近开发的交通信号控制方法,它实时计算针对即将到来的车辆集群的调度,这些调度既最小化延误,又在相邻交通灯之间相互协调。为了在存在转向引起的不确定性时保证实时响应,SURTRAC计算的调度最小化的是期望转向运动下的延误,而不是转向不确定性下的期望延误。这种近似确保了实时可处理性,但在存在转向引起的不确定性时会降低解的质量。为了解决这一限制,我们引入了TuSeRACT(Turn Sample based Real-time trAffic signal ConTrol,基于转向采样的实时交通信号控制),一种分布式的、基于采样的交通信号调度方法。与SURTRAC不同,TuSeRACT计算的调度最小化的是在所观测交通的采样转向运动上的期望延误,并将流出交通的样本传递给相邻交叉口。我们将这种基于采样的调度问题表述为一个约束规划,并在合成交通网络上对我们的方法进行了实证评估。相对于SURTRAC,我们的方法显著降低了平均车辆等待时间。

英文摘要

Real-time traffic signal control is a challenging problem owing to constantly changing traffic demand patterns, limited planning time and various sources of uncertainty (e.g., turn movements, vehicle detection) in the real world. SURTRAC (Scalable URban TRAffic Control) is a recently developed traffic signal control approach which computes delay-minimizing and coordinated (across neighbouring traffic lights) schedules of oncoming vehicle clusters in real time. To ensure real-time responsiveness in the presence of turn-induced uncertainty, SURTRAC computes schedules which minimize the delay for the expected turn movements as opposed to minimizing the expected delay under turn-induced uncertainty. This approximation ensures real-time tractability, but degrades solution quality in the presence of turn-induced uncertainty. To address this limitation, we introduce TuSeRACT (Turn Sample based Real-time trAffic signal ConTrol), a distributed sample-based scheduling approach to traffic signal control. Unlike SURTRAC, TuSeRACT computes schedules that minimize expected delay over sampled turn movements of observed traffic, and communicates samples of traffic outflows to neighbouring intersections. We formulate this sample-based scheduling problem as a constraint program and empirically evaluate our approach on synthetic traffic networks. Our approach provides substantially lower mean vehicular waiting times relative to SURTRAC.
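
The distinction drawn in this abstract, between the delay for the expected turn movements and the expected delay under turn uncertainty, can be made concrete with a small numerical sketch. The quadratic `queue_delay` model, the 2-second headway, and the sample values are assumptions chosen only to show that the two quantities differ when delay is convex in the number of turning vehicles; they are not taken from the paper.

```python
from statistics import mean
from typing import Sequence


def queue_delay(n_turning: float, headway: float = 2.0) -> float:
    """Total delay (s) if turning vehicles discharge one per `headway` seconds.

    Vehicle i waits i * headway, so the total is headway * n * (n - 1) / 2,
    which is convex in n. This delay model is an assumption for illustration.
    """
    return headway * n_turning * (n_turning - 1) / 2.0


def delay_at_expected_turns(samples: Sequence[int]) -> float:
    """Plug the average turn count into the delay model (expected-movement style)."""
    return queue_delay(mean(samples))


def expected_delay_over_samples(samples: Sequence[int]) -> float:
    """Average the delay over each sampled turn count (sample-based style)."""
    return mean(queue_delay(n) for n in samples)


turn_samples = [1, 3, 5]  # hypothetical sampled numbers of left-turning vehicles
plug_in = delay_at_expected_turns(turn_samples)
sampled = expected_delay_over_samples(turn_samples)
```

With these numbers the plug-in estimate is 6 seconds while the sample average is about 8.67 seconds, so a scheduler that optimizes the former can underestimate the delay that the latter accounts for.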

1902.08705 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

A General Framework for Structured Learning of Mechanical Systems

机械系统结构化学习的通用框架

Jayesh K. Gupta, Kunal Menda, Zachary Manchester, Mykel J. Kochenderfer

发表机构 * Stanford University(斯坦福大学)

AI总结 本文提出了一种用于机械系统结构化学习的通用灰盒框架,在有先验知识时加以利用,在没有先验知识时训练具有表达能力的函数近似器,从而提高数据效率和基于模型的强化学习性能。

Comments 10 pages, 7 figures. First two authors contributed equally. Submitted to IROS/RA-L. Code at https://github.com/sisl/mechamodlearn/

详情
AI中文摘要

学习准确的动力学模型对于优化和顺应性控制机器人系统至关重要。当前使用解析参数化进行白盒建模或使用神经网络进行黑盒建模的方法可能会产生高偏差或高方差。我们提出了一个灵活的灰盒模型,可以无缝地结合可用的先验知识,并在没有时训练具有表达能力的函数近似器。我们提出使用神经网络参数化机械系统,以建模其拉格朗日量和作用在其上的广义力。我们在模拟的驱动双摆上测试了我们的方法。我们展示了我们的方法在数据效率以及基于模型的强化学习中的性能优于朴素的黑盒模型。我们还系统地研究了我们的方法在结合可用的系统先验知识以提高数据效率方面的能力。

英文摘要

Learning accurate dynamics models is necessary for optimal, compliant control of robotic systems. Current approaches to white-box modeling using analytic parameterizations, or black-box modeling using neural networks, can suffer from high bias or high variance. We address the need for a flexible, gray-box model of mechanical systems that can seamlessly incorporate prior knowledge where it is available, and train expressive function approximators where it is not. We propose to parameterize a mechanical system using neural networks to model its Lagrangian and the generalized forces that act on it. We test our method on a simulated, actuated double pendulum. We show that our method outperforms a naive, black-box model in terms of data-efficiency, as well as performance in model-based reinforcement learning. We also conduct a systematic study of our method's ability to incorporate available prior knowledge about the system to improve data efficiency.

1902.10590 2026-06-04 cs.SE cs.AI cs.LG cs.SY eess.SY 版本更新

Architecting Dependable Learning-enabled Autonomous Systems: A Survey

构建可靠的学习自主系统:一项综述

Chih-Hong Cheng, Dhiraj Gulati, Rongjie Yan

发表机构 * fortiss - Research Institute of the Free State of Bavaria, Germany(德国巴伐利亚自由州研究院 fortiss) State Key Laboratory of Computer Science, China(中国计算机科学国家重点实验室)

AI总结 本文综述了构建可靠学习自主系统的方法,重点在于自动驾驶,讨论了多样冗余、信息融合和运行时监控等技术支柱,并总结了提升深度学习组件可靠性的最新方法,最后提出了研究方向。

详情
AI中文摘要

我们综述了可用于构建可靠的、具备学习能力的自主系统的架构方法,重点在于自动驾驶。我们考虑了构建可靠自主性的三个技术支柱,即多样化冗余、信息融合和运行时监控。对于具备学习能力的组件,我们还总结了近年来在标准卷积神经网络之外提高可靠性的架构方法。最后,我们列出了一系列有前景的研究方向,以应对现有方法面临的挑战。

英文摘要

We provide a summary over architectural approaches that can be used to construct dependable learning-enabled autonomous systems, with a focus on automated driving. We consider three technology pillars for architecting dependable autonomy, namely diverse redundancy, information fusion, and runtime monitoring. For learning-enabled components, we additionally summarize recent architectural approaches to increase the dependability beyond standard convolutional neural networks. We conclude the study with a list of promising research directions addressing the challenges of existing approaches.

1812.06120 2026-06-04 eess.SY cs.AI cs.RO cs.SY 版本更新

Simulation to Scaled City: Zero-Shot Policy Transfer for Traffic Control via Autonomous Vehicles

模拟到缩放城市:通过自动驾驶车辆实现交通控制的零样本策略迁移

Kathy Jang, Eugene Vinitsky, Behdad Chalaki, Ben Remer, Logan Beaver, Andreas Malikopoulos, Alexandre Bayen

发表机构 * University of California, Berkeley(加州大学伯克利分校) University of Delaware(特拉华大学)

AI总结 本文通过深度强化学习训练自动驾驶车辆在环形交叉口的控制策略,并将训练好的策略迁移至缩放智能城市进行测试,发现注入噪声的策略在迁移后表现更佳,实现了交通流的优化。

Comments To be published at the International Conference on Cyber Physical Systems (ICCPS) 2019. 10 pages, 9 figures

详情
AI中文摘要

使用深度强化学习,我们训练了自动驾驶车辆的控制策略,使其带领一个车队驶入环形交叉口。我们使用Flow(一个用于在微观仿真器中进行深度强化学习的库)训练了两种策略:一种在状态和动作空间中注入噪声,另一种不注入任何噪声。在仿真中,自动驾驶车辆在两种策略下都学到了一种涌现的限流(metering)行为,即减速以使汇入更加平顺。随后,我们未经任何调参,将策略直接迁移至特拉华大学缩放智能城市(UDSSC),这是一个面向网联与自动驾驶车辆的1:25比例测试平台。我们对两种策略在缩放城市中的性能进行了表征。结果显示,无噪声策略最终会发生碰撞,且仅偶尔表现出限流行为;而注入噪声的策略则始终表现出限流行为且无碰撞,表明噪声有助于零样本策略迁移。此外,迁移后的噪声注入策略在UDSSC中使平均行程时间减少了5%,最大行程时间减少了22%。控制器的视频可在https://sites.google.com/view/iccps-policy-transfer查看。

英文摘要

Using deep reinforcement learning, we train control policies for autonomous vehicles leading a platoon of vehicles onto a roundabout. Using Flow, a library for deep reinforcement learning in micro-simulators, we train two policies, one policy with noise injected into the state and action space and one without any injected noise. In simulation, the autonomous vehicle learns an emergent metering behavior for both policies in which it slows to allow for smoother merging. We then directly transfer this policy without any tuning to the University of Delaware Scaled Smart City (UDSSC), a 1:25 scale testbed for connected and automated vehicles. We characterize the performance of both policies on the scaled city. We show that the noise-free policy winds up crashing and only occasionally metering. However, the noise-injected policy consistently performs the metering behavior and remains collision-free, suggesting that the noise helps with the zero-shot policy transfer. Additionally, the transferred, noise-injected policy leads to a 5% reduction of average travel time and a reduction of 22% in maximum travel time in the UDSSC. Videos of the controllers can be found at https://sites.google.com/view/iccps-policy-transfer.

1711.09048 2026-06-04 cs.AI cs.RO cs.SY eess.SY 版本更新

A Compression-Inspired Framework for Macro Discovery

一种受压缩启发的宏发现框架

Francisco M. Garcia, Bruno C. da Silva, Philip S. Thomas

发表机构 * College of Information and Computer Sciences(信息与计算机科学学院) Department of Computer Science(计算机科学系) University of Massachusetts Amherst(马萨诸塞大学阿默斯特分校) Federal University Rio Grande do Sul(南里奥格兰德联邦大学)

AI总结 本文提出了一种受压缩启发的宏发现框架,通过识别高性能策略获得的轨迹中的重复模式,帮助强化学习代理利用早期经验快速解决相关新任务。

Comments Accepted as Extended Abstract, AAMAS, 2019

详情
AI中文摘要

在本文中,我们考虑这样一个问题:一个需要求解一组相关马尔可夫决策过程的强化学习智能体,如何利用其生命周期早期获得的知识,更快地解决新的但相关的任务。利用这种经验的一种方法是识别高性能策略所产生的轨迹中的重复模式。我们提出一个三步框架:智能体1) 通过压缩从近最优策略中采样的轨迹,生成一组候选开环宏;2) 评估每个宏的价值;3) 选择一个多样性最大的宏子集,使其覆盖求解该组相关任务通常所需的策略空间。我们的实验表明,用识别出的宏扩展智能体原有的基本动作集,能使其在未见过但相似的MDP中更快地学到最优策略。

英文摘要

In this paper we consider the problem of how a reinforcement learning agent tasked with solving a set of related Markov decision processes can use knowledge acquired early in its lifetime to improve its ability to more rapidly solve novel, but related, tasks. One way of exploiting this experience is by identifying recurrent patterns in trajectories obtained from well-performing policies. We propose a three-step framework in which an agent 1) generates a set of candidate open-loop macros by compressing trajectories drawn from near-optimal policies; 2) evaluates the value of each macro; and 3) selects a maximally diverse subset of macros that spans the space of policies typically required for solving the set of related tasks. Our experiments show that extending the original primitive action-set of the agent with the identified macros allows it to more rapidly learn an optimal policy in unseen, but similar MDPs.
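
The three-step framework in this abstract (generate candidate macros, evaluate them, select a diverse subset) might be sketched as follows. The sketch swaps in simple stand-ins that are not the paper's method: repeated n-gram counting instead of a real compression scheme, a hypothetical `macro_value` score based on decisions saved, and a greedy nesting filter as a crude proxy for diversity.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Macro = Tuple[str, ...]


def candidate_macros(trajectories: Sequence[Sequence[str]],
                     min_len: int = 2, max_len: int = 3,
                     min_count: int = 2) -> Dict[Macro, int]:
    """Step 1: count repeated contiguous action sequences (n-grams)."""
    counts: Counter = Counter()
    for traj in trajectories:
        for n in range(min_len, max_len + 1):
            for i in range(len(traj) - n + 1):
                counts[tuple(traj[i:i + n])] += 1
    return {m: c for m, c in counts.items() if c >= min_count}


def macro_value(macro: Macro, count: int) -> int:
    """Step 2: assumed value = primitive decisions saved across occurrences."""
    return count * (len(macro) - 1)


def contains(longer: Macro, shorter: Macro) -> bool:
    """True if `shorter` occurs contiguously inside `longer`."""
    k = len(shorter)
    return any(longer[i:i + k] == shorter for i in range(len(longer) - k + 1))


def select_diverse(cands: Dict[Macro, int], k: int) -> List[Macro]:
    """Step 3: greedy pick by value, skipping macros nested in a chosen one."""
    ranked = sorted(cands, key=lambda m: (-macro_value(m, cands[m]), m))
    chosen: List[Macro] = []
    for m in ranked:
        if len(chosen) == k:
            break
        if not any(contains(c, m) or contains(m, c) for c in chosen):
            chosen.append(m)
    return chosen


# Three hypothetical grid-world trajectories (U/D/L/R primitive actions).
demo = [list("UURR"), list("UURD"), list("LUUR")]
macros = select_diverse(candidate_macros(demo), k=2)
```

On these toy trajectories only the macro `("U", "U", "R")` is kept, because the shorter repeated sequences are nested inside it; the selected macros would then be appended to the agent's primitive action set.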

1902.08274 2026-06-04 cs.AI cs.LG cs.MA cs.SY eess.SY 版本更新

An Online Decision-Theoretic Pipeline for Responder Dispatch

为响应调度设计一个在线决策理论管道

Ayan Mukhopadhyay, Geoffrey Pettet, Chinmaya Samal, Abhishek Dubey, Yevgeniy Vorobeychik

发表机构 * Vanderbilt University(范德比大学) Washington University(华盛顿大学)

AI总结 本文提出了一种在线决策理论管道,用于有效应对紧急事件,通过实时数据流更新模型,提高响应效率并减少计算时间。

Comments Appeared in ICCPS 2019

详情
AI中文摘要

派遣应急响应人员处理交通事故、火灾、求救电话和犯罪等紧急事件的问题困扰着全球各地的城市。尽管此类问题已被广泛研究,但大多数方法是离线的。这些方法无法捕捉关键应急响应所处的动态变化环境,因此难以在实践中落地。任何旨在构建有效应急响应管道的整体方法,还必须考虑其包含的其他挑战:预测事件何时何地发生,以及理解不断变化的环境动态。我们描述了一个以在线方式统一处理所有这些问题的系统,即模型会随流式数据源不断更新。我们强调了这种做法对应急响应有效性至关重要的原因,并提出了一种算法框架,可以为给定的响应调度决策理论模型计算有前景的行动。我们认为,精心设计的启发式度量可以在计算时间与解的质量之间取得平衡,并说明了为什么这种方法比传统方法更具可扩展性和可处理性。我们还提出了一种用于事件预测的在线机制,以及一种基于循环神经网络的方法,用于学习和预测影响响应调度的环境特征。我们将我们的方法与已有的最先进方法以及实际使用的调度策略进行了比较,结果表明我们的方法在缩短响应时间的同时大幅减少了计算时间。

英文摘要

The problem of dispatching emergency responders to service traffic accidents, fire, distress calls and crimes plagues urban areas across the globe. While such problems have been extensively looked at, most approaches are offline. Such methodologies fail to capture the dynamically changing environments under which critical emergency response occurs, and therefore, fail to be implemented in practice. Any holistic approach towards creating a pipeline for effective emergency response must also look at other challenges that it subsumes - predicting when and where incidents happen and understanding the changing environmental dynamics. We describe a system that collectively deals with all these problems in an online manner, meaning that the models get updated with streaming data sources. We highlight why such an approach is crucial to the effectiveness of emergency response, and present an algorithmic framework that can compute promising actions for a given decision-theoretic model for responder dispatch. We argue that carefully crafted heuristic measures can balance the trade-off between computational time and the quality of solutions achieved and highlight why such an approach is more scalable and tractable than traditional approaches. We also present an online mechanism for incident prediction, as well as an approach based on recurrent neural networks for learning and predicting environmental features that affect responder dispatch. We compare our methodology with prior state-of-the-art and existing dispatch strategies in the field, which show that our approach results in a reduction in response time with a drastic reduction in computational time.

1812.07084 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

Learning Constraints from Demonstrations

从示范中学习约束

Glen Chou, Dmitry Berenson, Necmiye Ozay

发表机构 * Dept. of Electrical Engineering and Computer Science, University of Michigan, Ann Arbor, MI, 48109, USA(电气工程与计算机科学系,密歇根大学,安娜堡,MI,48109,美国)

AI总结 该研究提出了一种从示范中学习未知约束的方法,通过任务示范、成本函数和系统动力学与控制约束,利用hit-and-run采样获取低成本但不安全的轨迹,并通过整数规划获得一致的不安全集表示,同时理论分析了可从安全示范中学习的约束子集。

Comments Presented at the Workshop on the Algorithmic Foundations of Robotics (WAFR), 2018, Mérida, Mexico

详情
AI中文摘要

我们通过提供一种方法扩展了从示范中学习的范式,该方法利用任务的示范、成本函数以及系统动力学和控制约束来学习跨任务的未知约束。给定安全的示范,我们的方法使用hit-and-run采样来获得低成本但不安全的轨迹。安全和不安全的轨迹都被用来通过求解整数规划问题获得不安全集的一致表示。我们的方法能够跨系统动力学泛化,并学习保证的约束子集。我们还提供了理论分析,说明从安全示范中可以学习的约束子集。我们在线性和非线性系统动力学上展示了我们的方法,并证明它可以修改以适应次优示范,并且也可以用于特征空间中学习约束。

英文摘要

We extend the learning from demonstration paradigm by providing a method for learning unknown constraints shared across tasks, using demonstrations of the tasks, their cost functions, and knowledge of the system dynamics and control constraints. Given safe demonstrations, our method uses hit-and-run sampling to obtain lower cost, and thus unsafe, trajectories. Both safe and unsafe trajectories are used to obtain a consistent representation of the unsafe set via solving an integer program. Our method generalizes across system dynamics and learns a guaranteed subset of the constraint. We also provide theoretical analysis on what subset of the constraint can be learnable from safe demonstrations. We demonstrate our method on linear and nonlinear system dynamics, show that it can be modified to work with suboptimal demonstrations, and that it can also be used to learn constraints in a feature space.

1709.04794 2026-06-04 cs.AI cs.NA cs.PF math.NA 版本更新

Fast semi-supervised discriminant analysis for binary classification of large data-sets

快速半监督判别分析用于大数据集的二分类

Joris Tavernier, Jaak Simm, Karl Meerbergen, Joerg Kurt Wegner, Hugo Ceulemans, Yves Moreau

发表机构 * Department of Computer Science, KU Leuven(鲁汶大学计算机科学系)

AI总结 本文提出并分析了三种可扩展的半监督判别分析方法,通过利用数据稀疏性和Krylov子空间的移位不变性,提高了大数据集二分类的效率和性能。

详情
AI中文摘要

高维数据需要可扩展的算法。我们提出了三种可扩展且相关的半监督判别分析(SDA)算法,并分析了这些算法。这些方法基于Krylov子空间方法,利用数据稀疏性和Krylov子空间的移位不变性。此外,通过在半监督设置中添加中心化来改进问题定义。所提出的方法在制药公司的行业级数据集上进行了评估,以预测化合物在目标蛋白上的活性。结果表明,SDA实现了良好的预测性能,而我们的方法仅需几秒钟,显著提高了之前最先进的方法的计算时间。

英文摘要

High-dimensional data requires scalable algorithms. We propose and analyze three scalable and related algorithms for semi-supervised discriminant analysis (SDA). These methods are based on Krylov subspace methods which exploit the data sparsity and the shift-invariance of Krylov subspaces. In addition, the problem definition was improved by adding centralization to the semi-supervised setting. The proposed methods are evaluated on an industry-scale data set from a pharmaceutical company to predict compound activity on target proteins. The results show that SDA achieves good predictive performance and our methods only require a few seconds, significantly improving computation time on previous state of the art.

1610.05202 2026-06-04 cs.LG cs.AI cs.DC cs.SY eess.SY stat.ML 版本更新

Decentralized Collaborative Learning of Personalized Models over Networks

网络上的去中心化协作学习个性化模型

Paul Vanhaesebrouck, Aurélien Bellet, Marc Tommasi

发表机构 * INRIA

AI总结 本文研究了在协作对等网络中,如何通过与其他具有相似目标的代理通信来改进本地训练模型,提出两种异步 gossip 算法并基于 ADMM 实现去中心化算法。

Comments To appear in the Proceedings of the 20th International Conference on Artificial Intelligence and Statistics (AISTATS 2017)

详情
AI中文摘要

我们考虑了一个协作对等网络中的学习代理集合,其中每个代理根据其自身的学习目标学习一个个性化模型。本文研究的问题是:如何通过与其他具有相似目标的代理通信来改进本地训练的模型?我们引入并分析了两种异步 gossip 算法,以完全去中心化的方式运行。我们的第一种方法受标签传播启发,旨在在网络中平滑预训练的本地模型,同时考虑每个代理对其初始模型的置信度。我们的第二种方法中,代理通过基于本地数据集和邻居行为的迭代更新联合学习和传播模型。为了优化这个具有挑战性的目标,我们的去中心化算法基于 ADMM。

英文摘要

We consider a set of learning agents in a collaborative peer-to-peer network, where each agent learns a personalized model according to its own learning objective. The question addressed in this paper is: how can agents improve upon their locally trained model by communicating with other agents that have similar objectives? We introduce and analyze two asynchronous gossip algorithms running in a fully decentralized manner. Our first approach, inspired from label propagation, aims to smooth pre-trained local models over the network while accounting for the confidence that each agent has in its initial model. In our second approach, agents jointly learn and propagate their model by making iterative updates based on both their local dataset and the behavior of their neighbors. To optimize this challenging objective, our decentralized algorithm is based on ADMM.

1606.02421 2026-06-04 stat.ML cs.AI cs.DC cs.LG cs.SY eess.SY 版本更新

Gossip Dual Averaging for Decentralized Optimization of Pairwise Functions

用于成对函数去中心化优化的Gossip对偶平均

Igor Colin, Aurélien Bellet, Joseph Salmon, Stéphan Clémençon

发表机构 * Magnet Team, INRIA Lille – Nord Europe(磁力团队、法国国家信息与自动化技术研究所里尔-北欧洲分部)

AI总结 本文提出了基于对偶平均的gossip算法,用于在去中心化网络中优化数据点的成对函数,适用于排序、距离度量学习和图推断等应用,支持同步和异步设置,并在AUC最大化和度量学习问题上展示了其实用价值。

详情
AI中文摘要

在去中心化网络(如传感器、联网设备等)中,人们迫切需要高效的算法来优化全局成本函数,例如从每个计算单元收集的本地数据中学习全局模型。本文研究数据点成对函数的去中心化最小化问题,这些数据点分布在一个图的节点上,该图定义了网络的通信拓扑。这一一般性问题在排序、距离度量学习和图推断等方面都有应用。我们提出了基于对偶平均的新型gossip算法,旨在同步和异步两种设置下求解此类问题。所提出的框架足够灵活,能够处理该优化问题的带约束和带正则化的变体。我们的理论分析表明,所提出的算法保持了集中式对偶平均的收敛速度,仅相差一个加性偏差项。我们还给出了在ROC曲线下面积(AUC)最大化和度量学习问题上的数值模拟,展示了我们方法的实际价值。

英文摘要

In decentralized networks (of sensors, connected objects, etc.), there is an important need for efficient algorithms to optimize a global cost function, for instance to learn a global model from the local data collected by each computing unit. In this paper, we address the problem of decentralized minimization of pairwise functions of the data points, where these points are distributed over the nodes of a graph defining the communication topology of the network. This general problem finds applications in ranking, distance metric learning and graph inference, among others. We propose new gossip algorithms based on dual averaging that aim at solving such problems both in synchronous and asynchronous settings. The proposed framework is flexible enough to deal with constrained and regularized variants of the optimization problem. Our theoretical analysis reveals that the proposed algorithms preserve the convergence rate of centralized dual averaging up to an additive bias term. We present numerical simulations on Area Under the ROC Curve (AUC) maximization and metric learning problems which illustrate the practical interest of our approach.

1804.06760 2026-06-04 eess.SY cs.AI cs.SE cs.SY 版本更新

Simulation-based Adversarial Test Generation for Autonomous Vehicles with Machine Learning Components

面向含机器学习组件的自动驾驶车辆的基于仿真的对抗性测试生成

Cumhur Erkan Tuncali, Georgios Fainekos, Hisahiro Ito, James Kapinski

发表机构 * Toyota Research Institute of North America(丰田北美研究院) Arizona State University(亚利桑那州立大学)

AI总结 本文提出了一种基于仿真的对抗性测试生成框架,用于评估包含机器学习组件的自动驾驶系统模型的闭环属性,通过测试用例生成和自动证伪方法识别有问题的测试场景,以提高系统可靠性。

Comments This is a modified version of a paper presented at the 29th IEEE Intelligent Vehicles Symposium (IV 2018). Source code is available at https://cpslab.assembla.com/spaces/sim-atav

详情
AI中文摘要

许多组织正在开发自动驾驶系统,这些系统预计将在不久的将来大规模部署。尽管如此,对于测试、调试和认证这些系统性能的合适方法,目前仍缺乏共识。主要挑战之一是许多自动驾驶系统包含机器学习组件,如深度神经网络,其形式化属性难以刻画。我们提出了一种测试框架,它与用于评估信息物理系统的测试用例生成方法和自动证伪方法兼容。我们展示了如何在虚拟环境中使用该框架来评估包含ML组件的自动驾驶系统模型的闭环属性。我们还展示了如何使用覆盖数组等测试用例生成方法以及需求证伪方法,自动识别有问题的测试场景。所提出的框架可以用来提高自动驾驶系统的可靠性。

英文摘要

Many organizations are developing autonomous driving systems, which are expected to be deployed at a large scale in the near future. Despite this, there is a lack of agreement on appropriate methods to test, debug, and certify the performance of these systems. One of the main challenges is that many autonomous driving systems have machine learning components, such as deep neural networks, for which formal properties are difficult to characterize. We present a testing framework that is compatible with test case generation and automatic falsification methods, which are used to evaluate cyber-physical systems. We demonstrate how the framework can be used to evaluate closed-loop properties of an autonomous driving system model that includes the ML components, all within a virtual environment. We demonstrate how to use test case generation methods, such as covering arrays, as well as requirement falsification methods to automatically identify problematic test scenarios. The resulting framework can be used to increase the reliability of autonomous driving systems.

1604.00359 2026-06-04 cs.AI cs.NA cs.NE math.NA 版本更新

Using Well-Understood Single-Objective Functions in Multiobjective Black-Box Optimization Test Suites

在多目标黑箱优化测试套件中使用已知的单目标函数

Dimo Brockhoff, Tea Tusar, Anne Auger, Nikolaus Hansen

发表机构 * Inria, research centre Saclay and CMAP UMR 7641(Inria萨克莱研究中心及CMAP UMR 7641)

AI总结 本文提出通过结合现有单目标问题来构建多目标问题,介绍bbob-biobj测试套件及其扩展版本,并提供了一种通用方法来创建任意数量目标的测试套件,以比较确定性和随机优化算法的性能。

Comments ArXiv e-prints, arXiv:1604.00359

详情
AI中文摘要

多个测试函数套件被用于多目标优化算法的数值基准测试。虽然它们具有一些理想的属性,例如已被充分理解的Pareto集和各种形状的Pareto前沿,但当前使用的大多数函数所具有的特性在现实问题中可以说代表性不足。这些特性主要源于此类函数更易于构造,并导致了一些不太可能出现的属性,例如可分离性、最优解恰好位于边界约束上,以及存在仅控制解与Pareto前沿之间距离的变量。本文提出了一种构造多目标问题的替代方法:组合文献中已有的单目标问题。我们特别描述了连续域中包含55个双目标函数的bbob-biobj测试套件,以及其包含92个双目标函数的扩展版本(bbob-biobj-ext)。这两个测试套件均已在用于黑箱优化基准测试的COCO平台中实现。最后,我们推荐了一种为任意数量目标创建测试套件的通用流程。除了提供正式的函数定义并展示其(已知)属性外,本文还旨在从具有相似属性的函数组、目标空间归一化和问题实例等方面解释我们方法背后的原理。问题实例使我们能够轻松比较确定性求解器和随机求解器的性能,这是基准测试中常被忽视的问题。

英文摘要

Several test function suites are being used for numerical benchmarking of multiobjective optimization algorithms. While they have some desirable properties, like well-understood Pareto sets and Pareto fronts of various shapes, most of the currently used functions possess characteristics that are arguably under-represented in real-world problems. They mainly stem from the easier construction of such functions and result in improbable properties such as separability, optima located exactly at the boundary constraints, and the existence of variables that solely control the distance between a solution and the Pareto front. Here, we propose an alternative way of constructing multiobjective problems: by combining existing single-objective problems from the literature. We describe in particular the bbob-biobj test suite with 55 bi-objective functions in continuous domain, and its extended version with 92 bi-objective functions (bbob-biobj-ext). Both test suites have been implemented in the COCO platform for black-box optimization benchmarking. Finally, we recommend a general procedure for creating test suites for an arbitrary number of objectives. Besides providing the formal function definitions and presenting their (known) properties, this paper also aims at giving the rationale behind our approach in terms of groups of functions with similar properties, objective space normalization, and problem instances. The latter allows us to easily compare the performance of deterministic and stochastic solvers, which is an often overlooked issue in benchmarking.
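
The construction idea in this abstract, pairing two existing single-objective functions into one bi-objective problem, can be sketched in plain Python. This is not the COCO platform's API or the actual bbob-biobj definitions: the helper names, the choice of sphere and Rastrigin, and the optimum locations `opt1` and `opt2` are assumptions for illustration, and the objective-space normalization mentioned in the abstract is not shown.

```python
import math
from typing import Callable, Sequence, Tuple

Vector = Sequence[float]


def sphere(z: Vector) -> float:
    """Unimodal, separable function with minimum 0 at the origin."""
    return sum(v * v for v in z)


def rastrigin(z: Vector) -> float:
    """Multimodal function with minimum 0 at the origin."""
    return 10.0 * len(z) + sum(v * v - 10.0 * math.cos(2.0 * math.pi * v) for v in z)


def shifted(f: Callable[[Vector], float], optimum: Vector) -> Callable[[Vector], float]:
    """Move the minimiser of f from the origin to `optimum`."""
    return lambda x: f([xi - oi for xi, oi in zip(x, optimum)])


def make_biobjective(f1, f2) -> Callable[[Vector], Tuple[float, float]]:
    """Pair two single-objective functions into one bi-objective problem."""
    return lambda x: (f1(x), f2(x))


def dominates(a: Tuple[float, float], b: Tuple[float, float]) -> bool:
    """Pareto dominance for minimisation: a is no worse everywhere, better somewhere."""
    return all(ai <= bi for ai, bi in zip(a, b)) and any(ai < bi for ai, bi in zip(a, b))


# Hypothetical instance: the two optima are placed at different points.
opt1, opt2 = [1.0, -1.0], [-2.0, 0.5]
problem = make_biobjective(shifted(sphere, opt1), shifted(rastrigin, opt2))
at_opt1, at_opt2 = problem(opt1), problem(opt2)
```

Because each single-objective optimum is known, the two extreme points of the Pareto front are known by construction: `at_opt1` minimizes the first objective, `at_opt2` minimizes the second, and neither dominates the other.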

1805.11706 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY stat.ML 版本更新

Supervised Policy Update for Deep Reinforcement Learning

深度强化学习中的监督策略更新

Quan Vuong, Yiming Zhang, Keith W. Ross

发表机构 * University of California, San Diego(加州大学圣地亚哥分校) New York University(纽约大学)

AI总结 本文提出了一种新的样本效率高的方法,称为监督策略更新(SPU),用于深度强化学习。该方法从当前策略生成的数据出发,在非参数化的近端策略空间中构建并求解一个约束优化问题,然后利用监督回归将最优的非参数化策略转换为参数化策略,并从该参数化策略中抽取新样本。该方法适用于离散和连续动作空间,并能处理多种邻近性约束。本文展示了如何用该方法处理自然策略梯度和信任区域策略优化(NPG/TRPO)问题以及近端策略优化(PPO)问题。SPU的实现比TRPO简单得多,在样本效率方面,实验表明SPU在Mujoco模拟机器人任务中优于TRPO,在Atari视频游戏任务中优于PPO。

Comments Accepted as a conference paper at ICLR 2019

详情
AI中文摘要

我们提出了一种新的样本效率高的方法,称为监督策略更新(SPU),用于深度强化学习。从当前策略生成的数据开始,SPU在非参数化的近端策略空间中构建并求解一个约束优化问题。随后,它利用监督回归将最优的非参数化策略转换为参数化策略,并从该参数化策略中抽取新样本。该方法具有通用性,适用于离散和连续动作空间,并能为非参数化优化问题处理多种邻近性约束。我们展示了如何用该方法处理自然策略梯度和信任区域策略优化(NPG/TRPO)问题以及近端策略优化(PPO)问题。SPU的实现比TRPO简单得多。在样本效率方面,我们的广泛实验表明,SPU在Mujoco模拟机器人任务中优于TRPO,在Atari视频游戏任务中优于PPO。

英文摘要

We propose a new sample-efficient methodology, called Supervised Policy Update (SPU), for deep reinforcement learning. Starting with data generated by the current policy, SPU formulates and solves a constrained optimization problem in the non-parameterized proximal policy space. Using supervised regression, it then converts the optimal non-parameterized policy to a parameterized policy, from which it draws new samples. The methodology is general in that it applies to both discrete and continuous action spaces, and can handle a wide variety of proximity constraints for the non-parameterized optimization problem. We show how the Natural Policy Gradient and Trust Region Policy Optimization (NPG/TRPO) problems, and the Proximal Policy Optimization (PPO) problem can be addressed by this methodology. The SPU implementation is much simpler than TRPO. In terms of sample efficiency, our extensive experiments show SPU outperforms TRPO in Mujoco simulated robotic tasks and outperforms PPO in Atari video game tasks.

1810.08124 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Approximate Dynamic Programming for Planning a Ride-Sharing System using Autonomous Fleets of Electric Vehicles

为使用自动驾驶电动车的拼车系统进行规划的近似动态规划

Lina Al-Kanj, Juliana Nascimento, Warren B. Powell

发表机构 * Operations Research and Financial Engineering Department(运筹学与金融工程系)

AI总结 本文研究了自动驾驶电动车拼车系统中的调度问题、动态加价(surge pricing)问题和车队规模规划问题,采用近似动态规划方法来优化车辆分配、充电和重新定位决策,并通过分层聚合技术提高价值函数估计的准确性,同时利用自适应学习方法确定每趟行程的定价。

详情
AI中文摘要

在十年内,几乎每家主要汽车公司以及如Uber等车队运营商都宣布计划将自动驾驶车辆投放到道路上。同时,电动车正迅速成为下一代技术,不仅成本效益高,还能减少碳足迹。由中央管理的无人驾驶车队与电动车的操作特性相结合,正创造一种变革性技术,提供显著的成本节省和高水平的服务。该问题涉及调度问题,即分配乘客到车辆;动态加价(surge pricing)问题,即决定每趟行程的价格;以及规划问题,即决定车队规模。我们使用近似动态规划来开发高质量的操作调度策略,以确定哪辆车最适合特定行程,何时应充电,以及何时应重新定位到提供更高行程密度的区域。我们证明价值函数在电池和时间维度上是单调的,并利用分层聚合技术,用少量观测数据获得更好的价值函数估计。然后,使用自适应学习方法讨论动态加价问题,以决定每趟行程的价格。最后,我们讨论了车队规模问题,其取决于前两个问题。

英文摘要

Within a decade, almost every major auto company, along with fleet operators such as Uber, has announced plans to put autonomous vehicles on the road. At the same time, electric vehicles are quickly emerging as a next-generation technology that is cost effective, in addition to offering the benefits of reducing the carbon footprint. The combination of a centrally managed fleet of driverless vehicles, along with the operating characteristics of electric vehicles, is creating a transformative new technology that offers significant cost savings with high service levels. This problem involves a dispatch problem for assigning riders to cars, a surge pricing problem for deciding on the price per trip, and a planning problem for deciding on the fleet size. We use approximate dynamic programming to develop high-quality operational dispatch strategies to determine which car is best for a particular trip, when a car should be recharged, and when it should be re-positioned to a different zone which offers a higher density of trips. We prove that the value functions are monotone in the battery and time dimensions and use hierarchical aggregation to get better estimates of the value functions with a small number of observations. Then, surge pricing is discussed using an adaptive learning approach to decide on the price for each trip. Finally, we discuss the fleet size problem, which depends on the previous two problems.

1803.00444 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY stat.ML 版本更新

Inverse Reinforcement Learning via Nonparametric Spatio-Temporal Subgoal Modeling

通过非参数时空子目标建模实现逆强化学习

Adrian Šošić, Elmar Rueckert, Jan Peters, Abdelhak M. Zoubir, Heinz Koeppl

发表机构 * Signal Processing Group(信号处理组) Institute for Robotics and Cognitive Systems(机器人与认知系统研究所) Autonomous Systems Labs(自主系统实验室) Bioinspired Communication Systems(生物启发通信系统)

AI总结 本文提出了一种基于非参数时空子目标建模的逆强化学习方法,通过局部上下文更高效地解释单条轨迹,实现更紧凑的行为表示,并构建隐式意图模型以预测未观察到的情况,从而在处理意图变化和主动学习场景中表现出色。

Comments 45 pages, 14 figures; Version 3, published in the Journal of Machine Learning Research

详情
AI中文摘要

逆强化学习(IRL)领域的发展导致了更复杂的推理框架,这些框架放宽了原始建模假设,即观察到的代理行为仅反映单一意图。相反于学习全局行为模型,最近的IRL方法将演示数据分成部分,以考虑不同轨迹可能对应不同意图,例如因为它们由不同领域专家生成。在本工作中,我们进一步采用子目标的直观概念,建立一个前提:即使单条轨迹在特定上下文中局部解释也比全局更高效,从而实现更紧凑的行为表示。基于这一假设,我们构建了代理目标的隐式意图模型,以预测未观察到的情况。结果是一种集成的贝叶斯预测框架,显著优于现有IRL解决方案,并提供与专家计划一致的平滑策略估计。最值得注意的是,我们的框架自然处理代理意图随时间变化的情况,而经典IRL算法失败。此外,由于其概率性质,该模型可以轻松应用于主动学习场景,以指导专家的演示过程。

英文摘要

Advances in the field of inverse reinforcement learning (IRL) have led to sophisticated inference frameworks that relax the original modeling assumption of observing an agent behavior that reflects only a single intention. Instead of learning a global behavioral model, recent IRL methods divide the demonstration data into parts, to account for the fact that different trajectories may correspond to different intentions, e.g., because they were generated by different domain experts. In this work, we go one step further: using the intuitive concept of subgoals, we build upon the premise that even a single trajectory can be explained more efficiently locally within a certain context than globally, enabling a more compact representation of the observed behavior. Based on this assumption, we build an implicit intentional model of the agent's goals to forecast its behavior in unobserved situations. The result is an integrated Bayesian prediction framework that significantly outperforms existing IRL solutions and provides smooth policy estimates consistent with the expert's plan. Most notably, our framework naturally handles situations where the intentions of the agent change over time and classical IRL algorithms fail. In addition, due to its probabilistic nature, the model can be straightforwardly applied in active learning scenarios to guide the demonstration process of the expert.

1811.12211 2026-06-04 eess.SP cs.AI cs.SY eess.SY 版本更新

Particle Probability Hypothesis Density Filter based on Pairwise Markov Chains

基于配对马尔可夫链的粒子概率假说密度滤波器

Jiangyi Liu, Chunping Wang, Wei Wang

发表机构 * Electronic and Optical Engineering Department, Shijiazhuang Campus of Army Engineering University(陆军工程大学石家庄校区电子与光学工程系) China Huayin Ordnance Test Center(中国华阴兵器试验中心)

AI总结 本文提出了一种基于配对马尔可夫链模型的粒子概率假说密度滤波器(PF-PMC-PHD),用于非线性多目标跟踪系统,通过放松传统HMC模型的独立性假设,提升了跟踪性能。

详情
AI中文摘要

大多数多目标跟踪滤波器假设一个目标及其观测遵循隐马尔可夫链(HMC)模型,但HMC模型隐含的独立性假设在许多实际应用中并不成立,而配对马尔可夫链(PMC)模型比传统HMC模型适用范围更广。本文提出了一种基于PMC模型的粒子概率假说密度滤波器(PF-PMC-PHD),用于非线性多目标跟踪系统。仿真结果表明了PF-PMC-PHD滤波器的有效性,并且在保留非线性高斯HMC模型的局部物理特性、同时放松其独立性假设的场景中,其跟踪性能优于基于HMC模型的粒子PHD滤波器。

英文摘要

Most multi-target tracking filters assume that a target and its observation follow a Hidden Markov Chain (HMC) model, but the implicit independence assumption of the HMC model is invalid in many practical applications, and a Pairwise Markov Chain (PMC) model is more widely applicable than the traditional HMC model. A particle probability hypothesis density filter based on the PMC model (PF-PMC-PHD) is proposed for nonlinear multi-target tracking systems. Simulation results show the effectiveness of the PF-PMC-PHD filter, and that its tracking performance is superior to that of the particle PHD filter based on the HMC model in a scenario that keeps the local physical properties of nonlinear and Gaussian HMC models while relaxing their independence assumption.

1811.11259 2026-06-04 cs.LG cs.AI cs.DS cs.SY eess.SY stat.ML 版本更新

Scaling Configuration of Energy Harvesting Sensors with Reinforcement Learning

基于强化学习的能源收集传感器的扩展配置

Francesco Fraternali, Bharathan Balaji, Rajesh Gupta

发表机构 * University of California, San Diego(加州大学圣迭戈分校) University of California, Los Angeles(加州大学洛杉矶分校)

AI总结 本文提出利用强化学习自动配置室内太阳能板能源收集传感器的采样率,通过减少训练阶段和计算需求,实现快速部署和大规模扩展,有效提升传感器数据采集效率并避免能源耗尽。

Comments 7 pages, 5 figures

详情
Journal ref
ENSsys '18: International Workshop on Energy Harvesting & Energy-Neutral Sensing Systems, November 4, 2018, Shenzhen, China
AI中文摘要

随着物联网(IoT)的出现,越来越多的能源收集方法被用于补充或替代电池供电传感器。能源收集传感器需要根据应用、硬件和环境条件进行配置,以最大化其效用。目前,传感器配置要么是手动的,要么基于启发式方法,需要宝贵的领域专业知识。强化学习(RL)是一种有前景的方法,可以自动化配置并高效扩展IoT部署,但尚未在实践中得到应用。我们提出了解决这一差距的解决方案:减少RL的训练阶段,使节点在部署后短时间内即可运行,并减少计算需求以扩展到大规模部署。我们专注于配置基于室内太阳能板的能源收集传感器的采样率。我们基于三个月内从5个传感器节点收集的数据创建了一个模拟器。我们的模拟结果表明,RL可以有效学习能源可用性模式,并配置传感器节点的采样率以在确保不耗尽能源存储的情况下最大化传感数据。通过我们的方法,节点可以在部署的第一天内投入使用。我们还展示了通过使用相似光照条件的节点共享单个策略来减少RL策略数量的可能性。

英文摘要

With the advent of the Internet of Things (IoT), an increasing number of energy harvesting methods are being used to supplement or supplant battery based sensors. Energy harvesting sensors need to be configured according to the application, hardware, and environmental conditions to maximize their usefulness. As of today, the configuration of sensors is either manual or heuristics based, requiring valuable domain expertise. Reinforcement learning (RL) is a promising approach to automate configuration and efficiently scale IoT deployments, but it is not yet adopted in practice. We propose solutions to bridge this gap: reduce the training phase of RL so that nodes are operational within a short time after deployment and reduce the computational requirements to scale to large deployments. We focus on configuration of the sampling rate of indoor solar panel based energy harvesting sensors. We created a simulator based on 3 months of data collected from 5 sensor nodes subject to different lighting conditions. Our simulation results show that RL can effectively learn energy availability patterns and configure the sampling rate of the sensor nodes to maximize the sensing data while ensuring that energy storage is not depleted. The nodes can be operational within the first day by using our methods. We show that it is possible to reduce the number of RL policies by using a single policy for nodes that share similar lighting conditions.

1811.09914 2026-06-04 eess.SY cs.AI cs.MA cs.RO cs.SY 版本更新

RADMPC: A Fast Decentralized Approach for Chance-Constrained Multi-Vehicle Path-Planning

RADMPC:一种用于机会约束多车辆路径规划的快速去中心化方法

Aaron Huang, Benjamin J. Ayton, Brian C. Williams

发表机构 * Computer Science and Artificial Intelligence Laboratory(计算机科学与人工智能实验室) Massachusetts Institute of Technology(麻省理工学院)

AI总结 本文提出了一种基于去中心化路径规划方法RADMPC的快速机会约束多车辆路径规划方法,通过评估车辆交互来确定需要耦合规划的车辆集,并利用IRA在较小的车辆集上快速规划安全路径,从而显著提高计算效率。

详情
AI中文摘要

鲁棒的多车辆路径规划对于确保运输、搜索救援和机器人探索等应用中的多车辆系统安全性至关重要。迭代风险分配(IRA)等机会约束方法已被开发用于环境扰动无界的场景。然而,多车辆情况下的机会约束方法通常采用集中策略,其中所有车辆对之间存在耦合关系。随着车队规模的增加,这种策略变得不可行,因为计算时间与规划的车辆数呈指数增长,由于车辆对之间的耦合约束呈多项式增长。我们提出了一种更快的机会约束多车辆路径规划方法,该方法依赖于一种称为风险意识去中心化模型预测控制(RADMPC)的去中心化路径规划方法,以快速近似集中IRA方法。RADMPC近似通过评估车辆交互来确定应耦合规划的车辆集。将IRA应用于由RADMPC近似确定的较小车辆集上,能够快速为整个车队规划安全路径。蒙特卡洛模拟分析证明了我们方法的正确性,并与集中IRA方法相比显示出显著的计算时间改进。

英文摘要

Robust multi-vehicle path-planning is important for ensuring the safety of multi-vehicle systems in applications like transportation, search and rescue, and robotic exploration. Chance-constrained methods like Iterative Risk Allocation (IRA) have been developed for situations where environmental disturbances are unbounded. However, chance-constrained methods for the multi-vehicle case generally use centralized strategies where the vehicle set is planned with couplings between all vehicle pairs. This approach is intractable as fleet size increases because computation time is exponential with respect to the number of vehicles being planned over due to a polynomial increase in coupling constraints between vehicle pairs. We present a faster approach for chance-constrained multi-vehicle path-planning that relies upon a decentralized path-planning method called Risk-Aware Decentralized Model Predictive Control (RADMPC) to rapidly approximate a centralized IRA approach. The RADMPC approximation is evaluated for vehicle interactions to determine the vehicle sets that should be planned in a coupled manner. Applying IRA to the smaller vehicle sets determined from the RADMPC approximation rapidly plans safe paths for the entire fleet. A Monte Carlo simulation analysis demonstrates the correctness of our approach and a significant improvement in computation time compared to a centralized IRA approach.
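
As a rough illustration of the grouping step described in this abstract, the sketch below takes independently planned paths and merges vehicles whose paths come close to each other into sets that would then be planned jointly. It is a toy under stated assumptions, not the RADMPC algorithm: the function name `coupled_groups`, the synchronous-waypoint path format, and the plain distance threshold (standing in for the paper's risk-based interaction check) are all hypothetical.

```python
from itertools import combinations
from math import dist


def coupled_groups(paths, threshold):
    """Group vehicles whose time-aligned waypoints come within `threshold`.

    paths: list of paths, one per vehicle; each path is a list of (x, y)
           waypoints sampled at the same time steps (assumed format).
    Returns a sorted list of vehicle-index groups (connected components).
    """
    n = len(paths)
    parent = list(range(n))  # union-find forest over vehicle indices

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        # Hypothetical interaction test: any pair of simultaneous waypoints too close.
        if any(dist(p, q) < threshold for p, q in zip(paths[i], paths[j])):
            parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


paths = [
    [(0, 0), (1, 0), (2, 0)],        # vehicle 0
    [(0, 1), (1, 0.2), (2, 1)],      # vehicle 1 passes close to vehicle 0
    [(10, 10), (11, 10), (12, 10)],  # vehicle 2 is far from both
]
print(coupled_groups(paths, threshold=0.5))  # [[0, 1], [2]]
```

A fully centralized plan over n vehicles couples all n(n-1)/2 pairs; after grouping, only pairs inside the same set remain coupled, which is where a saving in computation could come from.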

1811.06447 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Adversarial Resilience Learning - Towards Systemic Vulnerability Analysis for Large and Complex Systems

对抗韧性学习 - 面向大规模复杂系统系统性脆弱性分析

Lars Fischer, Jan-Menno Memmen, Eric MSP Veith, Martin Tröschel


AI总结 本文提出对抗韧性学习(ARL)概念,用于建模、训练和分析人工神经网络作为复杂系统中竞争代理的表示。通过模拟电力系统中的攻击者和防御者角色,ARL提供了一种适应性强、可重复的基于参与者(actor)的测试方法,有机会检测之前未知的攻击向量。

Comments 10 pages

详情
AI中文摘要

本文介绍了对抗韧性学习(ARL),一种用于建模、训练和分析人工神经网络作为高度复杂系统中竞争代理的表示的概念。在我们的示例中,代理通常扮演攻击者或防御者的角色,分别旨在恶化或改进(或保持)系统中已定义的性能指标。我们的概念提供了一种适应性强、可重复的基于参与者(actor)的测试方法,有机会检测之前未知的攻击向量。我们提供了ARL的构成命名法,并基于此描述了ARL在模拟电力系统中的初步实现的实验设置和结果。

英文摘要

This paper introduces Adversarial Resilience Learning (ARL), a concept to model, train, and analyze artificial neural networks as representations of competitive agents in highly complex systems. In our examples, the agents normally take the roles of attackers or defenders that aim at worsening or improving (or keeping, respectively) defined performance indicators of the system. Our concept provides adaptive, repeatable, actor-based testing with a chance of detecting previously unknown attack vectors. We provide the constitutive nomenclature of ARL and, based on it, the description of experimental setups and results of a preliminary implementation of ARL in simulated power systems.

1811.05788 2026-06-04 cs.LG cs.AI cs.SY eess.SY 版本更新

Learning to Compensate Photovoltaic Power Fluctuations from Images of the Sky by Imitating an Optimal Policy

通过模仿最优策略从天空图像中学习补偿光伏功率波动

Robin Spiess, Felix Berkenkamp, Jan Poland, Andreas Krause

发表机构 * Department of Computer Science, ETH Zurich(计算机科学系,苏黎世联邦理工学院) ABB Corporate Research, Switzerland(瑞士ABB企业研究)

AI总结 本文提出了一种基于深度学习的方法,利用天空图像预测性地补偿光伏功率波动,减少电池压力,通过模仿学习训练神经网络近似最优策略。

Comments 7 pages, 7 figures

详情
AI中文摘要

光伏(PV)发电站的输出功率取决于环境,因此会随时间波动。这导致光伏功率可能在电网中引起不稳定性,尤其是在日益广泛使用的情况下。限制功率输出变化率是缓解这些波动的常见方法,通常借助大型电池。一种使用这些电池补偿阶跃变化的反应控制器在实践中有效,但会导致电池因高能量通过而受到压力。在本文中,我们提出了一种深度学习方法,利用天空图像来预测性地补偿功率波动并减少电池压力。特别是,我们证明可以通过仅在事后可用的信息来计算最优控制策略。基于此,我们使用模仿学习训练一个神经网络,该网络近似这种事后最优策略,但仅使用当前可用的天空图像和传感器数据。我们对一个大规模的测量和图像数据集进行了评估,并展示了训练后的策略能够减少电池压力。

英文摘要

The energy output of photovoltaic (PV) power plants depends on the environment and thus fluctuates over time. As a result, PV power can cause instability in the power grid, in particular when increasingly used. Limiting the rate of change of the power output is a common way to mitigate these fluctuations, often with the help of large batteries. A reactive controller that uses these batteries to compensate ramps works in practice, but causes stress on the battery due to a high energy throughput. In this paper, we present a deep learning approach that uses images of the sky to compensate power fluctuations predictively and reduces battery stress. In particular, we show that the optimal control policy can be computed using information that is only available in hindsight. Based on this, we use imitation learning to train a neural network that approximates this hindsight-optimal policy, but uses only currently available sky images and sensor data. We evaluate our method on a large dataset of measurements and images from a real power plant and show that the trained policy reduces stress on the battery.
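
To make the reactive baseline mentioned in this abstract concrete, the sketch below limits the step-to-step change of the power sent to the grid and lets a battery cover the difference, then sums the absolute battery power as a crude stress proxy. It is a minimal sketch under assumptions not taken from the paper: the function name `ramp_rate_limit` is hypothetical, the battery has unlimited capacity and no losses, and power values are in arbitrary units per time step.

```python
def ramp_rate_limit(pv_power, max_ramp):
    """Reactive ramp-rate limiter with an idealised battery.

    pv_power: PV output per time step.
    max_ramp: largest allowed change of grid power between two steps.
    Returns (grid_power, battery_power, throughput), where battery_power > 0
    means discharging and throughput is the sum of |battery_power|.
    """
    grid = [pv_power[0]]
    battery = [0.0]
    for p in pv_power[1:]:
        prev = grid[-1]
        # Move towards the current PV output, but no faster than max_ramp.
        step = max(-max_ramp, min(max_ramp, p - prev))
        g = prev + step
        grid.append(g)
        battery.append(g - p)  # the battery supplies or absorbs the gap
    throughput = sum(abs(b) for b in battery)
    return grid, battery, throughput


pv = [10.0, 10.0, 2.0, 2.0, 2.0, 10.0]  # a cloud passes over the plant
grid, battery, throughput = ramp_rate_limit(pv, max_ramp=2.0)
print(grid)        # [10.0, 10.0, 8.0, 6.0, 4.0, 6.0]
print(throughput)  # 16.0
```

The controller only reacts once the drop has happened, so the battery has to cover the whole gap; a controller that anticipates the drop could start ramping down earlier.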

1811.04584 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Navigating Assistance System for Quadcopter with Deep Reinforcement Learning

基于深度强化学习的四旋翼飞行器导航辅助系统

Tung-Cheng Wu, Shau-Yin Tseng, Chin-Feng Lai, Chia-Yu Ho, Ying-Hsun Lai

发表机构 * National Cheng Kung University(国立成功大学) Research Laboratories(研究实验室) Industrial Technology Research Institute(工业技术研究院) Department of Computer Science(计算机科学系) Information Engineering(信息工程系) National Taitung University(国立台东大学)

AI总结 本文提出了一种基于深度强化学习的四旋翼避障导航辅助系统,通过两个功能模块分别实现路径导航和碰撞避障,实验表明该方法在500次飞行中碰撞率为14%。

Comments conference

详情
AI中文摘要

在本文中,我们提出了一种深度强化学习方法,用于四旋翼飞行器在飞行路径上绕过障碍物。在以往的研究中,算法仅控制四旋翼的前进方向。在本文中,我们使用两个功能来控制四旋翼。一个是四旋翼导航功能,它基于计算协调点并找到通往目标的直线路径。另一个功能是碰撞避障功能,它通过深度Q网络模型实现。两个功能都会输出旋转度数,智能体将结合这两个输出进行转向。此外,深度Q网络还可以使四旋翼向上或向下飞行以绕过障碍物并到达目标。我们的实验结果表明,在500次飞行后碰撞率为14%。基于这项工作,我们将训练更复杂的感知和转移模型以应用于真实的四旋翼飞行器。

英文摘要

In this paper, we present a deep reinforcement learning method that lets a quadcopter bypass obstacles on its flight path. In previous studies, the algorithm controlled only the forward direction of the quadcopter. Here, we use two functions to control the quadcopter. The first is a navigation function, which calculates the coordinate point and finds the straight path to the goal. The second is a collision avoidance function, which is implemented by a deep Q-network model. Both functions output a rotation angle, and the agent combines the two outputs to turn. In addition, the deep Q-network can make the quadcopter fly up and down to bypass the obstacle and arrive at the goal. Our experimental results show that the collision rate is 14% after 500 flights. Building on this work, we will train a more complex sensing and transfer model for a real quadcopter.

1811.03853 2026-06-04 cs.LG cs.AI cs.SY eess.SY 版本更新

Sample-Efficient Policy Learning based on Completely Behavior Cloning

基于完全行为克隆的高效策略学习

Qiming Zou, Ling Wang, Ke Lu, Yu Li

发表机构 * Department of Computer Science and Technology, Harbin Institute of Technology, China(计算机科学与技术系,哈尔滨工业大学,中国) Department of Management Science and Engineering, Anhui University of Technology, China(管理科学与工程系,安徽工业大学,中国)

AI总结 本文提出了一种基于完全行为克隆的策略初始化算法PLCBC,通过将模型预测控制转换为分段仿射函数并用神经网络表达,实现无训练的完全克隆,从而提高策略学习的效率和收敛性。

详情
AI中文摘要

直接策略搜索是强化学习中最重要的算法之一。然而,从头开始学习需要大量经验数据,并容易陷入局部极小值。此外,部分训练的策略可能会对智能体和环境产生危险的动作。为了解决这些挑战,本文提出了一种称为基于完全行为克隆的策略学习(PLCBC)的策略初始化算法。PLCBC首先使用多参数编程将模型预测控制(MPC)控制器转换为分段仿射(PWA)函数,并用神经网络表达此函数。通过这种方式,PLCBC可以在不损失性能的情况下完全克隆MPC控制器,并且是完全无训练的。实验表明,这种初始化策略可以帮助智能体在高奖励状态区域学习,并更快、更有效地收敛。

英文摘要

Direct policy search is one of the most important classes of algorithms in reinforcement learning. However, learning from scratch needs a large amount of experience data and is prone to poor local optima. In addition, a partially trained policy tends to take actions that are dangerous to the agent and the environment. To overcome these challenges, this paper proposes a policy initialization algorithm called Policy Learning based on Completely Behavior Cloning (PLCBC). PLCBC first transforms the Model Predictive Control (MPC) controller into a piecewise affine (PWA) function using multi-parametric programming, and uses a neural network to express this function. In this way, PLCBC can completely clone the MPC controller without any performance loss, and is totally training-free. The experiments show that this initialization strategy can help the agent learn in the high-reward state region and converge faster and better.
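
The claim that a PWA control law can be expressed by a neural network without training can be checked on a one-dimensional toy case. The sketch below is an assumption-labelled illustration, not the PLCBC construction: it takes a saturated linear feedback law (a three-region PWA function chosen here for convenience, with hypothetical gain `k` and limit `u_max`) and writes it exactly as a ReLU network with two hidden units whose weights are set by hand.

```python
def relu(x):
    return max(0.0, x)


def pwa_controller(x, k=-2.0, u_max=1.0):
    """Three-region PWA law: the linear gain k*x clipped to [-u_max, u_max]."""
    return min(u_max, max(-u_max, k * x))


def relu_network(x, k=-2.0, u_max=1.0):
    """The same law as a one-hidden-layer ReLU network; no training involved.

    Hidden units: h1 = relu(k*x + u_max), h2 = relu(k*x - u_max).
    Output:       h1 - h2 - u_max.
    """
    h1 = relu(k * x + u_max)
    h2 = relu(k * x - u_max)
    return h1 - h2 - u_max


# Compare both functions on a grid that covers all three regions.
xs = [i / 10.0 for i in range(-20, 21)]
max_gap = max(abs(pwa_controller(x) - relu_network(x)) for x in xs)
print(max_gap < 1e-12)  # True
```

Checking the three regions by hand: for k*x below -u_max both hidden units are zero and the output is -u_max; between the limits only h1 is active and the output is k*x; above u_max both are active and the output is u_max.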

1803.08287 2026-06-04 eess.SY cs.AI cs.LG cs.RO cs.SY 版本更新

Learning-based Model Predictive Control for Safe Exploration

基于学习的模型预测控制用于安全探索

Torsten Koller, Felix Berkenkamp, Matteo Turchetta, Andreas Krause

发表机构 * Vector Institute(向量研究所) Max Planck ETH Center for Learning Systems(马克斯·普朗克-ETH学习系统中心)

AI总结 本文提出了一种基于学习的模型预测控制方法,通过高斯过程先验假设构建可证明准确的轨迹置信区间,从而提供可证明的高概率安全保证,用于动态系统的安全高效探索和学习。

Comments Proc. of the Conference on Decision and Control, 2018

详情
AI中文摘要

基于学习的方法在没有显著系统先验知识的情况下成功解决了复杂控制任务。然而,这些方法通常不提供任何安全保证,这限制了它们在安全关键的现实应用中的使用。在本文中,我们提出了一种基于学习的模型预测控制方案,可以提供可证明的高概率安全保证。为此,我们利用高斯过程先验对动态特性进行正则性假设,以构建可证明准确的预测轨迹置信区间。与以往的方法不同,我们不假设模型不确定性是独立的。基于这些预测,我们保证轨迹满足安全约束。此外,我们使用终端集约束递归地保证在每个迭代中都存在安全的控制动作。在我们的实验中,我们展示了所提出算法可以安全且高效地探索和学习动态系统。

英文摘要

Learning-based methods have been successful in solving complex control tasks without significant prior knowledge about the system. However, these methods typically do not provide any safety guarantees, which prevents their use in safety-critical, real-world applications. In this paper, we present a learning-based model predictive control scheme that can provide provable high-probability safety guarantees. To this end, we exploit regularity assumptions on the dynamics in terms of a Gaussian process prior to construct provably accurate confidence intervals on predicted trajectories. Unlike previous approaches, we do not assume that model uncertainties are independent. Based on these predictions, we guarantee that trajectories satisfy safety constraints. Moreover, we use a terminal set constraint to recursively guarantee the existence of safe control actions at every iteration. In our experiments, we show that the resulting algorithm can be used to safely and efficiently explore and learn about dynamic systems.

1811.00426 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Improving the Modularity of AUV Control Systems using Behaviour Trees

使用行为树提高水下机器人控制系统的模块化程度

Christopher Iliffe Sprague, Özer Özkahraman, Andrea Munafo, Rachel Marlow, Alexander Phillips, Petter Ögren

发表机构 * Robotics, Perception and Learning Lab(机器人、感知与学习实验室) Royal Institute of Technology(皇家理工学院) National Oceanography Centre(国家海洋学研究中心)

AI总结 本文展示如何利用行为树设计模块化、多功能且稳健的控制架构,用于关键任务系统,特别针对自主水下机器人。研究强调了系统安全的稳健性、执行多种任务的多功能性以及模块化在结合稳健性和多功能性中的重要性。

Comments Submitted to 2018 IEEE OES Autonomous Underwater Vehicle Symposium

详情
AI中文摘要

在本文中,我们展示了行为树(BTs)如何用于设计模块化、多功能且稳健的控制架构,用于关键任务系统。特别是,我们在此背景下展示了自主水下机器人(AUVs)的应用。在系统安全方面,稳健性很重要,因为手动恢复AUVs往往非常困难。此外,多功能性对于执行多种不同任务至关重要。最后,模块化是实现稳健性和多功能性结合所必需的,因为多功能系统的复杂性需要封装在模块中,以便创建一个简单的整体结构,从而实现稳健性分析。所提出的设计通过典型的AUV任务进行了说明。

英文摘要

In this paper, we show how behaviour trees (BTs) can be used to design modular, versatile, and robust control architectures for mission-critical systems. In particular, we show this in the context of autonomous underwater vehicles (AUVs). Robustness, in terms of system safety, is important since manual recovery of AUVs is often extremely difficult. Furthermore, versatility is important to be able to execute many different kinds of missions. Finally, modularity is needed to achieve a combination of robustness and versatility, as the complexity of a versatile system needs to be encapsulated in modules in order to create a simple overall structure enabling robustness analysis. The proposed design is illustrated using a typical AUV mission.
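
For readers unfamiliar with behaviour trees, the sketch below shows the two standard composite nodes (Sequence and Fallback) and how a safety check can be placed ahead of the mission logic. It is a generic minimal BT, not the architecture from the paper: the leaf names (`battery_ok`, `emergency_surface`, `go_to_waypoint`), the state dictionary, and the 20% battery threshold are hypothetical.

```python
from enum import Enum


class Status(Enum):
    SUCCESS = 1
    FAILURE = 2
    RUNNING = 3


class Leaf:
    """Wraps a function state -> Status as a tree node."""
    def __init__(self, fn):
        self.fn = fn

    def tick(self, state):
        return self.fn(state)


class Sequence:
    """Ticks children in order; returns at the first child that is not SUCCESS."""
    def __init__(self, children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status != Status.SUCCESS:
                return status
        return Status.SUCCESS


class Fallback:
    """Ticks children in order; returns at the first child that is not FAILURE."""
    def __init__(self, children):
        self.children = children

    def tick(self, state):
        for child in self.children:
            status = child.tick(state)
            if status != Status.FAILURE:
                return status
        return Status.FAILURE


def battery_ok(state):
    return Status.SUCCESS if state["battery"] > 0.2 else Status.FAILURE


def emergency_surface(state):
    state["log"].append("surface")
    return Status.RUNNING if state["depth"] > 0 else Status.SUCCESS


def go_to_waypoint(state):
    state["log"].append("waypoint")
    return Status.RUNNING


# Safety subtree first, mission subtree second.
root = Sequence([
    Fallback([Leaf(battery_ok), Leaf(emergency_surface)]),
    Leaf(go_to_waypoint),
])

healthy = {"battery": 0.8, "depth": 10.0, "log": []}
low = {"battery": 0.1, "depth": 10.0, "log": []}
root.tick(healthy)
root.tick(low)
print(healthy["log"], low["log"])  # ['waypoint'] ['surface']
```

The mission leaf can be swapped for a larger subtree without touching the safety subtree, which is the kind of modularity the abstract refers to.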

1810.13072 2026-06-04 cs.AI cs.RO cs.SY eess.SY 版本更新

Formal Verification of Neural Network Controlled Autonomous Systems

神经网络控制自主系统的形式验证

Xiaowu Sun, Haitham Khedr, Yasser Shoukry

发表机构 * Department of Electrical and Computer Engineering, University of Maryland, College Park

AI总结 本文研究了如何形式验证配备神经网络控制器的自主机器人在LiDAR图像处理中安全性的核心问题,通过构建有限状态抽象并利用可达性分析计算安全的初始条件,提出了一种多项式时间算法来分区工作空间并计算对应的仿射成像函数,同时利用SMC编码分析神经网络行为,通过数值模拟验证了算法的效率。

详情
AI中文摘要

在本文中,我们考虑了形式化验证配备神经网络(NN)控制器的自主机器人的安全性问题,该控制器通过处理LiDAR图像来产生控制动作。给定一个由一组多胞体障碍物刻画的工作空间,我们的目标是计算一组安全的初始条件,使得从这些初始条件出发的机器人轨迹能够保证避开障碍物。我们的方法是构建系统的有限状态抽象,并利用标准的可达性分析在有限状态抽象上计算安全的初始状态集。计算有限状态抽象的第一个技术问题是对将机器人位置映射到LiDAR图像的成像函数进行数学建模。为此,我们引入了成像适应集的概念,作为工作空间的分区,在这些分区中,成像函数被保证为仿射的。我们开发了一种多项式时间算法,用于将工作空间划分为成像适应集并计算相应的仿射成像函数。给定这种工作空间分区、机器人的离散时间线性动力学以及一个预训练的具有修正线性单元(ReLU)非线性的神经网络控制器,第二个技术挑战是分析神经网络的行为。为此,我们利用可满足性模凸(SMC)编码来枚举不同ReLU的所有可能分段。SMC求解器随后使用布尔可满足性求解器和凸优化求解器,将问题分解为更小的子问题。为了加速这个过程,我们开发了一种预处理算法,可以快速修剪可行ReLU分段的空间。最后,我们通过数值模拟验证了所提出算法的效率,模拟中神经网络控制器的复杂性逐渐增加。

英文摘要

In this paper, we consider the problem of formally verifying the safety of an autonomous robot equipped with a Neural Network (NN) controller that processes LiDAR images to produce control actions. Given a workspace that is characterized by a set of polytopic obstacles, our objective is to compute the set of safe initial conditions such that a robot trajectory starting from these initial conditions is guaranteed to avoid the obstacles. Our approach is to construct a finite state abstraction of the system and use standard reachability analysis over the finite state abstraction to compute the set of safe initial states. The first technical problem in computing the finite state abstraction is to mathematically model the imaging function that maps the robot position to the LiDAR image. To that end, we introduce the notion of imaging-adapted sets as partitions of the workspace in which the imaging function is guaranteed to be affine. We develop a polynomial-time algorithm to partition the workspace into imaging-adapted sets along with computing the corresponding affine imaging functions. Given this workspace partitioning, the discrete-time linear dynamics of the robot, and a pre-trained NN controller with Rectified Linear Unit (ReLU) nonlinearity, the second technical challenge is to analyze the behavior of the neural network. To that end, we utilize a Satisfiability Modulo Convex (SMC) encoding to enumerate all the possible segments of different ReLUs. SMC solvers then use a Boolean satisfiability solver and a convex programming solver and decompose the problem into smaller subproblems. To accelerate this process, we develop a pre-processing algorithm that can rapidly prune the space of feasible ReLU segments. Finally, we demonstrate the efficiency of the proposed algorithms using numerical simulations with increasing complexity of the neural network controller.

1810.12429 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML 版本更新

Breaking the Curse of Horizon: Infinite-Horizon Off-Policy Estimation

打破时域诅咒:无限时域离策略估计

Qiang Liu, Lihong Li, Ziyang Tang, Dengyong Zhou

发表机构 * The University of Texas at Austin(德克萨斯大学奥斯汀分校) Google Brain(谷歌大脑)

AI总结 本文提出了一种新的离策略估计方法,通过直接在平稳状态访问分布上应用重要性采样来避免现有估计器中方差爆炸的问题,核心贡献是提出了一种估计两个平稳分布密度比的新方法,并推导了RKHS情况下的闭式解。

Comments 21 pages, 5 figures, NIPS 2018 (spotlight)

详情
AI中文摘要

我们考虑离策略估计问题,即使用由不同的行为策略收集的样本来估计目标策略的期望奖励。重要性采样(IS)已成为推导(近)无偏估计器的关键技术,但已知在长时域问题中方差过高。在无限时域问题的极端情况下,基于IS的估计器的方差甚至可能是无界的。在本文中,我们提出了一种新的离策略估计方法,直接在平稳状态访问分布上应用重要性采样,以避免现有估计器所面临的方差爆炸问题。我们的关键贡献是提出了一种估计两个平稳分布密度比的新方法,仅需从行为分布中采样轨迹。我们为估计问题开发了一种mini-max损失函数,并推导了RKHS情况下的闭式解。我们通过理论和实证分析支持我们的方法。

英文摘要

We consider the off-policy estimation problem of estimating the expected reward of a target policy using samples collected by a different behavior policy. Importance sampling (IS) has been a key technique to derive (nearly) unbiased estimators, but is known to suffer from an excessively high variance in long-horizon problems. In the extreme case of infinite-horizon problems, the variance of an IS-based estimator may even be unbounded. In this paper, we propose a new off-policy estimation method that applies IS directly on the stationary state-visitation distributions to avoid the exploding variance issue faced by existing estimators. Our key contribution is a novel approach to estimating the density ratio of two stationary distributions, with trajectories sampled from only the behavior distribution. We develop a mini-max loss function for the estimation problem, and derive a closed-form solution for the case of RKHS. We support our method with both theoretical and empirical analyses.
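
The idea of weighting by a stationary-distribution ratio instead of a product of per-step ratios can be tried on a two-state toy problem. The sketch below is an illustration under assumptions, not the paper's estimator: the density ratio `w(s)` is computed exactly from the known toy dynamics (the paper instead estimates it from behavior data with a mini-max loss), and the MDP, the reward, and all function names are hypothetical.

```python
import random

# Toy dynamics: action 0 switches state with prob. 0.1, action 1 with prob. 0.9.
P_SWITCH = {0: 0.1, 1: 0.9}


def action_prob(pi, s, a):
    """pi[s] is the probability of taking action 1 in state s."""
    return pi[s] if a == 1 else 1.0 - pi[s]


def stationary(pi):
    """Stationary distribution of the two-state chain induced by policy pi."""
    a = sum(action_prob(pi, 0, act) * P_SWITCH[act] for act in (0, 1))  # leave state 0
    b = sum(action_prob(pi, 1, act) * P_SWITCH[act] for act in (0, 1))  # leave state 1
    return [b / (a + b), a / (a + b)]


def reward(s, a):
    return s - 0.1 * a


def true_average_reward(pi):
    d = stationary(pi)
    return sum(d[s] * action_prob(pi, s, a) * reward(s, a)
               for s in (0, 1) for a in (0, 1))


def stationary_is_estimate(pi_target, pi_behavior, n_steps, seed=0):
    """Self-normalised estimate from ONE behavior trajectory.

    Each step is weighted by w(s) * pi_target(a|s) / pi_behavior(a|s),
    so the weight does not grow with the trajectory length.
    """
    rng = random.Random(seed)
    d_t, d_b = stationary(pi_target), stationary(pi_behavior)
    w = [d_t[s] / d_b[s] for s in (0, 1)]  # exact ratio (assumed known here)
    s, num, den = 0, 0.0, 0.0
    for _ in range(n_steps):
        a = 1 if rng.random() < pi_behavior[s] else 0
        weight = w[s] * action_prob(pi_target, s, a) / action_prob(pi_behavior, s, a)
        num += weight * reward(s, a)
        den += weight
        if rng.random() < P_SWITCH[a]:
            s = 1 - s
    return num / den


pi_behavior = [0.5, 0.5]
pi_target = [0.9, 0.1]
print(stationary(pi_target))  # approximately [0.18, 0.82]
print(true_average_reward(pi_target), stationary_is_estimate(pi_target, pi_behavior, 20000))
```

A trajectory-wise IS weight would multiply one action ratio per step, so its range grows exponentially with the horizon; the per-step weight above stays bounded regardless of trajectory length.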

1810.09729 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Design Challenges of Multi-UAV Systems in Cyber-Physical Applications: A Comprehensive Survey, and Future Directions

多无人机系统在网络物理应用中的设计挑战:综述与未来方向

Reza Shakeri, Mohammed Ali Al-Garadi, Ahmed Badawy, Amr Mohamed, Tamer Khattab, Abdulla Al-Ali, Khaled A. Harras, Mohsen Guizani

发表机构 * Carnegie Mellon University Qatar Campus(卡内基梅隆大学卡塔尔分校)

AI总结 本文综述了多无人机系统在网络物理应用中的关键设计挑战,探讨了目标和基础设施对象的覆盖与跟踪、能量高效导航以及基于机器学习的图像分析等核心方法,并提出了面向细粒度网络物理应用的先进算法和未来研究方向。

详情
AI中文摘要

无人驾驶飞行器(UAVs)近年来迅速发展,为一系列创新应用提供了支持,这些应用有可能从根本上改变网络物理系统(CPSs)的设计方式。CPSs 是一种现代系统,具有计算和物理潜力的协同作用,能够通过多种新机制与人类交互。使用 UAVs 在 CPS 应用中的主要优势在于其卓越的特性,包括机动性、动态性、易于部署、适应高度、敏捷性、可调节性和随时在任何地方有效评估现实功能的能力。此外,从技术角度来看,UAVs 被预测将成为高级 CPSs 发展的重要元素。因此,在本次综述中,我们旨在确定多 UAV 系统在 CPS 应用中最基本和重要的设计挑战。我们强调了关键且多方面的内容,涵盖目标和基础设施对象的覆盖与跟踪、能量高效的导航以及使用机器学习进行图像分析以支持细粒度的 CPS 应用。此外,还研究了关键原型和测试平台,以展示这些实用技术如何促进 CPS 应用。我们提出了面向设计挑战的最先进算法,结合定量和定性方法,并将这些挑战与重要的 CPS 应用映射,以得出关于每个应用挑战的深入结论。最后,我们总结了可能的新方向和想法,这些可能会影响这些领域的未来研究。

英文摘要

Unmanned Aerial Vehicles (UAVs) have recently grown rapidly to facilitate a wide range of innovative applications that can fundamentally change the way cyber-physical systems (CPSs) are designed. CPSs are a modern generation of systems with synergic cooperation between computational and physical potentials that can interact with humans through several new mechanisms. The main advantages of using UAVs in CPS applications are their exceptional features, including their mobility, dynamism, effortless deployment, adaptive altitude, agility, adjustability, and effective appraisal of real-world functions anytime and anywhere. Furthermore, from the technology perspective, UAVs are predicted to be a vital element of the development of advanced CPSs. Therefore, in this survey, we aim to pinpoint the most fundamental and important design challenges of multi-UAV systems for CPS applications. We highlight key and versatile aspects that span the coverage and tracking of targets and infrastructure objects, energy-efficient navigation, and image analysis using machine learning for fine-grained CPS applications. Key prototypes and testbeds are also investigated to show how these practical technologies can facilitate CPS applications. We present and propose state-of-the-art algorithms to address design challenges with both quantitative and qualitative methods and map these challenges to important CPS applications to draw insightful conclusions on the challenges of each application. Finally, we summarize potential new directions and ideas that could shape future research in these areas.

1810.08759 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Design of robust H_inf fuzzy output feedback controller for affine nonlinear systems: Fuzzy Lyapunov function approach

面向仿射非线性系统的鲁棒H_∞模糊输出反馈控制器设计:模糊Lyapunov函数方法

Leila Rajabpour, Mokhtar Shasadeghi, Alireza Barzegar

发表机构 * University of Technology Malaysia(马来西亚理工大学) Shiraz University of Technology(设拉子理工大学) Nanyang Technological University(南洋理工大学)

AI总结 本文提出了一种基于非二次Lyapunov函数和引入松弛矩阵技术的新系统方法,用于一类具有扰动的仿射非线性系统。首先,将仿射非线性系统表示为Takagi-Sugeno(T-S)模糊双线性模型。随后,基于并行分布式补偿(PDC)方案设计鲁棒H_∞控制器。通过利用Lyapunov函数推导出稳定性条件,以线性矩阵不等式(LMIs)形式表达。此外,提出一些松弛矩阵以减少LMIs稳定性条件的保守性。最后,通过详细讨论等温连续搅拌釜反应器(CSTR)用于Van de Vusse反应器的应用,来说明所提方法的优点并验证其有效性。

详情
AI中文摘要

本文提出了一种基于非二次Lyapunov函数和引入松弛矩阵技术的新系统方法,用于一类具有扰动的仿射非线性系统。为实现目标,首先将仿射非线性系统表示为Takagi-Sugeno(T-S)模糊双线性模型。随后,基于并行分布式补偿(PDC)方案设计鲁棒H_∞控制器。然后,通过利用Lyapunov函数推导出稳定性条件,以线性矩阵不等式(LMIs)形式表达。此外,提出一些松弛矩阵以减少LMIs稳定性条件的保守性。最后,通过详细讨论等温连续搅拌釜反应器(CSTR)用于Van de Vusse反应器的应用,来说明所提方法的优点并验证其有效性。

英文摘要

In this paper, we propose a new systematic approach based on a nonquadratic Lyapunov function and the technique of introducing slack matrices, for a class of affine nonlinear systems with disturbance. To achieve this goal, the affine nonlinear system is first represented via a Takagi-Sugeno (T-S) fuzzy bilinear model. Subsequently, the robust H_inf controller is designed based on the parallel distributed compensation (PDC) scheme. Then, the stability conditions are derived in terms of linear matrix inequalities (LMIs) by utilizing the Lyapunov function. Moreover, some slack matrices are proposed to reduce the conservativeness of the LMI stability conditions. Finally, to illustrate the merits and verify the effectiveness of the proposed approach, the application of an isothermal continuous stirred tank reactor (CSTR) for the Van de Vusse reactor is discussed in detail.

1810.04859 2026-06-04 cs.IT cs.AI cs.LG cs.SY eess.SY math.IT math.ST stat.TH 版本更新

Policy Design for Active Sequential Hypothesis Testing using Deep Learning

使用深度学习的主动顺序假设检验政策设计

Dhruva Kartik, Ekraam Sabir, Urbashi Mitra, Prem Natarajan

发表机构 * USC Information Sciences Institute(美国南加州大学信息科学研究所)

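AI summary This paper studies how deep learning can be used to design more effective policies for active sequential hypothesis testing; comparing the newly proposed heuristics with existing methods shows significant performance gains in some scenarios.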

Comments Accepted at 56th Annual Allerton Conference on Communication, Control, and Computing

详情
AI中文摘要

信息论在通信、压缩和假设检验等各类问题中取得了很大的成功,而随机控制理论则通过动态规划对部分可观测马尔可夫决策过程(POMDPs)的最优策略进行表征。然而,一般情况下找到这些问题的最优策略是计算上困难的,因此在实践中通常采用启发式方法。深度学习可以作为一种工具,用于设计更好的启发式方法。本文考虑了主动顺序假设检验问题,目标是通过自适应选择适当的查询来以最少的样本量可靠地推断真实假设。该问题可以建模为POMDP,并且文献中已存在其价值函数的界。然而,最优策略尚未被识别,各种启发式方法被使用。本文提出了两种新的启发式方法:一种基于深度强化学习,另一种基于KL散度零和博弈。这些启发式方法与最先进的解决方案进行了比较,并通过数值实验表明,在某些场景下,所提出的启发式方法能够显著优于现有方法。

英文摘要

Information theory has been very successful in obtaining performance limits for various problems such as communication, compression and hypothesis testing. Likewise, stochastic control theory provides a characterization of optimal policies for Partially Observable Markov Decision Processes (POMDPs) using dynamic programming. However, finding optimal policies for these problems is computationally hard in general, and thus heuristic solutions are employed in practice. Deep learning can be used as a tool for designing better heuristics in such problems. In this paper, the problem of active sequential hypothesis testing is considered. The goal is to design a policy that can reliably infer the true hypothesis using as few samples as possible by adaptively selecting appropriate queries. This problem can be modeled as a POMDP, and bounds on its value function exist in the literature. However, optimal policies have not been identified and various heuristics are used. In this paper, two new heuristics are proposed: one based on deep reinforcement learning and another based on a KL-divergence zero-sum game. These heuristics are compared with state-of-the-art solutions, and numerical experiments demonstrate that the proposed heuristics can achieve significantly better performance than existing methods in some scenarios.
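
To make the sequential-testing setting concrete, the belief over hypotheses can be tracked with a plain Bayes update after each query. The sketch below is only an illustration of that bookkeeping, not either of the paper's heuristics; the names `posterior_update` and `is_confident` and the 0.95 threshold are hypothetical.

```python
def posterior_update(prior, likelihoods):
    """Bayes update of the belief over hypotheses after one observation.

    prior[i] is the current probability of hypothesis i; likelihoods[i] is the
    probability of the observed outcome under hypothesis i for the chosen query.
    """
    unnormalized = [p * l for p, l in zip(prior, likelihoods)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def is_confident(belief, threshold=0.95):
    """Hypothetical stopping rule: stop once one hypothesis dominates the belief."""
    return max(belief) >= threshold


belief = [0.5, 0.5]
belief = posterior_update(belief, [0.8, 0.2])  # one observation favoring hypothesis 0
print(belief, is_confident(belief))
```

A query-selection policy, learned or hand-designed, would decide which query to issue before each such update.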

1802.04205 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Efficient Hierarchical Robot Motion Planning Under Uncertainty and Hybrid Dynamics

在不确定性和混合动力学下的高效机器人运动规划

Ajinkya Jain, Scott Niekum

发表机构 * Department of Mechanical Engineering(机械工程系) Department of Computer Science(计算机科学系) University of Texas at Austin, USA(得克萨斯大学奥斯汀分校)

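AI summary This paper proposes a hierarchical POMDP planner that generates cost-optimized motion plans for hybrid dynamics models under uncertainty, decomposing nonlinear dynamics into discrete local dynamics models and thereby effectively reducing state uncertainty.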

Comments 2nd Conference on Robot Learning (CoRL 2018), Zurich, Switzerland

详情
AI中文摘要

嘈杂的观测与非线性动力学是机器人运动规划中最大的挑战之一。通过将非线性动力学分解为一组离散的局部动力学模型,混合动力学提供了一种自然的方式来建模非线性动力学,尤其是在由于接触等因素导致动力学突然不连续的系统中。我们提出了一种分层POMDP规划器,该规划器为混合动力学模型开发成本优化的运动计划。分层规划器首先开发一个高层运动计划,以确定要访问的局部动力学模型的顺序,然后将其转换为详细的连续状态计划。这种分层规划方法将POMDP规划问题分解为更小的子部分,这些子部分可以以显著降低的计算成本解决。能够按顺序访问局部动力学模型的能力也提供了一种强大的方法,利用混合动力学来减少状态不确定性。我们在模拟领域导航任务和具有机械臂的装配任务上评估了所提出的规划器,证明了我们的方法可以有效解决具有高观测噪声和非线性动力学的任务,且计算成本显著低于直接规划方法。

英文摘要

Noisy observations coupled with nonlinear dynamics pose one of the biggest challenges in robot motion planning. By decomposing nonlinear dynamics into a discrete set of local dynamics models, hybrid dynamics provide a natural way to model nonlinear dynamics, especially in systems with sudden discontinuities in dynamics due to factors such as contacts. We propose a hierarchical POMDP planner that develops cost-optimized motion plans for hybrid dynamics models. The hierarchical planner first develops a high-level motion plan to sequence the local dynamics models to be visited and then converts it into a detailed continuous state plan. This hierarchical planning approach results in a decomposition of the POMDP planning problem into smaller sub-parts that can be solved with significantly lower computational costs. The ability to sequence the visitation of local dynamics models also provides a powerful way to leverage the hybrid dynamics to reduce state uncertainty. We evaluate the proposed planner on a navigation task in the simulated domain and on an assembly task with a robotic manipulator, showing that our approach can solve tasks having high observation noise and nonlinear dynamics effectively with significantly lower computational costs compared to direct planning approaches.

1810.03025 2026-06-04 stat.ML cs.AI cs.LG cs.SY eess.SY 版本更新

Discretizing Logged Interaction Data Biases Learning for Decision-Making

对记录交互数据进行离散化会偏学习决策制定

Peter Schulam, Suchi Saria

发表机构 * Johns Hopkins University(约翰霍普金斯大学)

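AI summary This paper studies how discretizing irregularly sampled time series data affects the training of decision-making models, shows that discretization introduces a bias, and proposes using continuous-time models to avoid it.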

Comments This is a standalone short paper describing a new type of bias that can arise when learning from time series data for sequential decision-making problems

详情
AI中文摘要

时间序列数据通常在非等间隔时间点测量,常通过离散化作为预处理步骤。例如,客户到达时间的数据可能通过将每小时内的到达次数相加来简化,从而生成更易建模的离散时间序列。在本文摘要中,我们展示离散化引入了影响决策制定模型训练的偏差。我们称这种现象为离散化偏差,并表明可以通过使用连续时间模型来避免它。

英文摘要

Time series data that are not measured at regular intervals are commonly discretized as a preprocessing step. For example, data about customer arrival times might be simplified by summing the number of arrivals within hourly intervals, which produces a discrete-time time series that is easier to model. In this abstract, we show that discretization introduces a bias that affects models trained for decision-making. We refer to this phenomenon as discretization bias, and show that we can avoid it by using continuous-time models instead.
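
The hourly-binning preprocessing step mentioned in the abstract can be sketched as follows. This is a minimal illustration rather than the authors' code; the function name `hourly_counts` and the sample arrival times are hypothetical.

```python
from collections import Counter


def hourly_counts(arrival_times_h, n_hours):
    """Sum irregularly timed arrivals (in hours) into fixed hourly bins."""
    counts = Counter(int(t) for t in arrival_times_h if 0 <= t < n_hours)
    return [counts.get(h, 0) for h in range(n_hours)]


# Hypothetical arrival times, in hours since the start of logging.
arrivals = [0.10, 0.95, 1.02, 1.50, 1.51, 3.75]
print(hourly_counts(arrivals, 4))  # [2, 3, 0, 1]
```

In this toy data the arrivals at 0.95 h and 1.02 h are only minutes apart but land in different bins, and all within-hour timing is discarded, which is the kind of information a continuous-time model would retain.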

1809.09261 2026-06-04 cs.AI cs.LG cs.SY eess.SY 版本更新

Resilient Computing with Reinforcement Learning on a Dynamical System: Case Study in Sorting

基于动态系统的强化学习鲁棒计算:排序问题案例研究

Aleksandra Faust, James B. Aimone, Conrad D. James, Lydia Tapia

发表机构 * Google Brain, Mountain View, CA, USA(谷歌大脑,美国加利福尼亚州山景城) Sandia National Labs, Albuquerque, NM, USA(桑迪亚国家实验室,美国新墨西哥州阿尔伯克基)

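AI summary This paper formulates computation as a feedback-control problem and uses reinforcement learning to solve the resulting sequential decision problem; a sorting case study shows that the resilient computing approach can overcome limitations of conventional programming.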

Comments 11 pages, accepted to CDC 2018. Here with additional evaluations

详情
AI中文摘要

机器人和自主代理在资源有限的情况下,通常依赖不完美的模型和传感器测量来完成目标导向任务。特别是,强化学习(RL)和反馈控制可以用来帮助机器人实现目标。本文基于这一领域的工作,将通用计算建模为反馈控制问题,使代理能够自主克服标准过程语言编程的局限性:对错误的鲁棒性和早期程序终止的容忍。我们的建模将计算视为程序变量空间中的轨迹生成。计算因此成为一个序列决策问题,通过强化学习(RL)解决,并通过李雅普诺夫稳定性理论分析以评估代理的鲁棒性和向目标的进展。我们通过一个典型的计算机科学问题——数组排序的案例研究来实现这一点。评估显示,我们的RL排序代理能够稳定地向渐近稳定的终点进展,对故障组件具有鲁棒性,并且比传统的快速排序和冒泡排序进行的数组操作更少。

英文摘要

Robots and autonomous agents often complete goal-based tasks with limited resources, relying on imperfect models and sensor measurements. In particular, reinforcement learning (RL) and feedback control can be used to help a robot achieve a goal. Taking advantage of this body of work, this paper formulates general computation as a feedback-control problem, which allows the agent to autonomously overcome some limitations of standard procedural language programming, in particular the lack of resilience to errors and to early program termination. Our formulation considers computation to be trajectory generation in the program's variable space. The computation then becomes a sequential decision-making problem, solved with reinforcement learning (RL) and analyzed with Lyapunov stability theory to assess the agent's resilience and progression to the goal. We do this through a case study on a quintessential computer science problem, array sorting. Evaluations show that our RL sorting agent makes steady progress to an asymptotically stable goal, is resilient to faulty components, and performs fewer array manipulations than traditional Quicksort and Bubble sort.

1809.07098 2026-06-04 cs.AI cs.LG cs.MA cs.NE cs.SY eess.SY 版本更新

Novelty-organizing team of classifiers in noisy and dynamic environments

在噪声和动态环境中组织新颖性的分类器团队

Danilo Vasconcellos Vargas, Hirotaka Takano, Junichi Murata

发表机构 * Graduate School of Information Science(信息科学研究生学校) Electrical Engineering Kyushu University Fukuoka, Japan(电气工程九州大学福冈日本) Faculty of Information Science(信息科学学院)

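AI summary This study presents the Novelty-Organizing Team of Classifiers (NOTC), which works effectively in noisy and dynamic environments, and validates it on the continuous action mountain car problem and its variants, showing NOTC's performance advantage, although it needs some time to bootstrap.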

详情
Journal ref
2015 IEEE Congress on Evolutionary Computation (CEC)
AI中文摘要

在现实世界中,环境不断变化,输入变量受到噪声的影响。然而,很少有算法能够在这种情况下工作。在这里,新颖性组织分类器团队(NOTC)被应用于连续动作山车以及其两个变种:噪声山车和不稳定天气山车。这些问题分别考虑了噪声和问题动态的变化。此外,NOTC在这些问题中与神经进化拓扑增强(NEAT)进行了比较,揭示了两种方法之间的权衡。尽管NOTC在所有问题中均表现最佳,但NEAT需要更少的试验来收敛。证明了NOTC之所以表现更好,是因为其将输入空间划分为更易处理的问题。不幸的是,这种输入空间的划分也需要一些时间来初始化。

英文摘要

In the real world, the environment is constantly changing and the input variables are affected by noise. However, few algorithms have been shown to work under those circumstances. Here, Novelty-Organizing Team of Classifiers (NOTC) is applied to the continuous action mountain car as well as two variations of it: a noisy mountain car and an unstable weather mountain car. These problems take noise and changes in problem dynamics into account, respectively. Moreover, NOTC is compared with NeuroEvolution of Augmenting Topologies (NEAT) on these problems, revealing a trade-off between the approaches. While NOTC achieves the best performance on all of the problems, NEAT needs fewer trials to converge. It is demonstrated that NOTC achieves better performance because of its division of the input space (creating easier problems). Unfortunately, this division of the input space also requires a bit of time to bootstrap.

1709.06196 2026-06-04 cs.AI cs.RO cs.SY eess.SY 版本更新

Online algorithms for POMDPs with continuous state, action, and observation spaces

在线算法用于具有连续状态、动作和观察空间的POMDPs

Zachary Sunberg, Mykel Kochenderfer

发表机构 * Aeronautics and Astronautics Dept. Stanford University(航空航天系 斯坦福大学)

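AI summary This paper proposes the POMCPOW and PFT-DPW algorithms, which use weighted particle filtering to solve POMDPs with continuous state spaces, and validates the effectiveness of the improvements.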

Comments Added Multilane section

详情
Journal ref
Short version published in 2018 proceedings of the International Conference on Automated Planning and Scheduling (ICAPS)
AI中文摘要

在线求解部分可观测马尔可夫决策过程的算法已被应用于具有大离散状态空间的问题,但连续状态、动作和观察空间仍具挑战性。本文首先探讨双级渐进扩展(DPW)作为解决方案,但证明该修改单独不足,因为搜索树中的信念表示坍缩为单个粒子,导致算法收敛到次优策略。本文提出并评估了两种新算法,POMCPOW和PFT-DPW,通过加权粒子过滤克服这一缺陷。仿真结果表明,这些改进使算法在先前方法失败的场景中取得成功。

英文摘要

Online solvers for partially observable Markov decision processes have been applied to problems with large discrete state spaces, but continuous state, action, and observation spaces remain a challenge. This paper begins by investigating double progressive widening (DPW) as a solution to this challenge. However, we prove that this modification alone is not sufficient because the belief representations in the search tree collapse to a single particle, causing the algorithm to converge to a policy that is suboptimal regardless of the computation time. This paper proposes and evaluates two new algorithms, POMCPOW and PFT-DPW, that overcome this deficiency by using weighted particle filtering. Simulation results show that these modifications allow the algorithms to be successful where previous approaches fail.
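
Two ingredients named in the abstract can be sketched in isolation: the progressive-widening test that limits how many children a tree node may have, and the reweighting of state particles by observation likelihood. This is a simplified illustration under assumed parameter values (`k`, `alpha`), not the POMCPOW or PFT-DPW implementation; the function names are hypothetical.

```python
def should_widen(n_children, n_visits, k=4.0, alpha=0.5):
    """Progressive-widening test: allow a new child while |C| <= k * N^alpha."""
    return n_children <= k * (n_visits ** alpha)


def reweight(particles, obs_likelihood):
    """Weight each state particle by the likelihood of the received observation."""
    weights = [obs_likelihood(s) for s in particles]
    total = sum(weights)
    if total == 0:
        raise ValueError("observation has zero likelihood under all particles")
    return [w / total for w in weights]


# Hypothetical likelihood: the observation is most consistent with state 1.0.
likelihood = lambda s: 1.0 if s == 1.0 else 0.5
print(reweight([0.0, 1.0, 2.0], likelihood))  # [0.25, 0.5, 0.25]
print(should_widen(5, 1), should_widen(5, 4))  # False True
```

Keeping several weighted particles per belief node is what prevents the collapse to a single particle described above.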

1808.09806 2026-06-04 eess.SY cs.AI cs.SY 版本更新

MARL-FWC: Optimal Coordination of Freeway Traffic Control Measures

MARL-FWC:自由流交通控制措施的最优协调

Ahmed Fares, Walid Gomaa, Mohamed A. Khamis

发表机构 * Cyber-Physical Systems Lab(智能物理系统实验室) Egypt-Japan University of Science and Technology (E-JUST)(埃及-日本科学技术大学) Faculty of Engineering (Shoubra)(工程学院(舒卜拉)) Benha University(班纳大学) Faculty of Engineering(工程学院) Alexandria University(亚历山大大学)

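AI summary This paper proposes the MARL-FWC system, which uses multi-agent reinforcement learning to optimize freeway traffic control, combining ramp metering with dynamic speed limits to coordinate traffic flow optimally.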

详情
AI中文摘要

本文的目标是通过多个出入口限流控制及其互补的动态限速(DSLs)来优化自由流的总体交通流。当最小化自由流密度与最大交通流临界比的差异时,可以达到最优的自由流运行。本文提出了一种多智能体强化学习用于自由流控制(MARL-FWC)系统,用于出入口限流和DSLs。MARL-FWC引入了一种基于协同马尔可夫决策过程建模(马尔可夫游戏)的新微观框架,并关联了合作Q学习算法。该技术在协调图框架下结合收益传播(Max-Plus算法),特别适合最优控制目的。MARL-FWC提供了三种控制设计:完全独立、完全分布式和集中式,适用于不同的网络架构。MARL-FWC被广泛测试以评估所提出的联合收益模型以及全局收益。实验在著名的VISSIM交通模拟器中进行,以评估MARL-FWC。实验结果表明,总旅行时间显著减少,平均速度增加(与基准情况相比),同时保持最优交通流。

英文摘要

The objective of this article is to optimize the overall traffic flow on freeways using multiple ramp metering controls together with complementary Dynamic Speed Limits (DSLs). Optimal freeway operation can be reached by minimizing the difference between the freeway density and the critical ratio for maximum traffic flow. In this article, a Multi-Agent Reinforcement Learning for Freeways Control (MARL-FWC) system for ramp metering and DSLs is proposed. MARL-FWC introduces a new microscopic framework at the network level based on collaborative Markov Decision Process modeling (Markov game) and an associated cooperative Q-learning algorithm. The technique incorporates payoff propagation (Max-Plus algorithm) under the coordination graphs framework, which is particularly suited for optimal control purposes. MARL-FWC provides three control designs, fully independent, fully distributed, and centralized, suited for different network architectures. MARL-FWC was extensively tested in order to assess the proposed model of the joint payoff, as well as the global payoff. Experiments are conducted with heavy traffic flow in the renowned VISSIM traffic simulator to evaluate MARL-FWC. The experimental results show a significant decrease in the total travel time and an increase in the average speed (when compared with the base case) while maintaining an optimal traffic flow.
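
As a rough illustration of the learning step underlying such a controller, the sketch below shows a plain single-agent tabular Q-learning update with a reward equal to the negative distance between the measured density and a critical density. It deliberately omits the cooperative, Max-Plus coordination that MARL-FWC adds; the function names, state and action labels, and numeric values are all hypothetical.

```python
def density_reward(density, critical_density):
    """Reward is highest (zero) when the density matches the critical density."""
    return -abs(density - critical_density)


def q_update(q, state, action, reward, next_state, actions, lr=0.1, gamma=0.9):
    """One tabular Q-learning step: Q <- Q + lr * (r + gamma * max Q' - Q)."""
    best_next = max(q.get((next_state, a), 0.0) for a in actions)
    old = q.get((state, action), 0.0)
    q[(state, action)] = old + lr * (reward + gamma * best_next - old)
    return q[(state, action)]


q_table = {}
actions = ["short_red", "long_red"]  # hypothetical ramp-metering actions
r = density_reward(density=30.0, critical_density=25.0)
q_update(q_table, "congested", "long_red", r, "near_critical", actions)
print(q_table)
```

In a multi-agent setting each ramp or speed-limit controller would hold such a table, with the coordination mechanism deciding how the agents' action choices are combined.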

1807.08048 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

Baidu Apollo EM Motion Planner

百度 Apollo EM 运动规划器

Haoyang Fan, Fan Zhu, Changchun Liu, Liangliang Zhang, Li Zhuang, Dong Li, Weicheng Zhu, Jiangtao Hu, Hongye Li, Qi Kong

发表机构 * Baidu USA LLC(百度美国有限公司)

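AI summary This paper presents a real-time motion planning system based on the open-source Baidu Apollo autonomous driving platform. It addresses the industrial level-4 motion planning problem while considering safety, comfort, and scalability, and handles multilane and single-lane autonomous driving through a hierarchical structure.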

详情
AI中文摘要

本文介绍了一种基于百度 Apollo(开源)自动驾驶平台的实时运动规划系统。该系统旨在解决工业级4级运动规划问题,同时考虑安全性、舒适性和可扩展性。系统采用分层结构处理多车道和单车道自动驾驶:(1)系统顶层为多车道策略,通过并行计算的车道级轨迹进行比较以处理变道场景。(2)在车道级轨迹生成器中,基于弗伦兹框架迭代求解路径和速度优化。(3)对于路径和速度优化,提出结合动态规划和基于样条的二次规划的方法,构建可扩展且易于调节的框架,同时处理交通规则、障碍物决策和平滑性。该规划器可扩展至高速公路和低速城市驾驶场景。我们通过场景示例和道路测试结果展示了该算法。本文描述的系统自2017年9月Apollo v1.5发布以来已部署到数十辆百度Apollo自动驾驶车辆。截至2018年5月16日,该系统已在各种城市场景下进行了3,380小时和约68,000公里(42,253英里)的闭环自动驾驶测试。本文描述的算法可在https://github.com/ApolloAuto/apollo/tree/master/modules/planning上获得。

英文摘要

In this manuscript, we introduce a real-time motion planning system based on the Baidu Apollo (open source) autonomous driving platform. The developed system aims to address the industrial level-4 motion planning problem while considering safety, comfort and scalability. The system covers multilane and single-lane autonomous driving in a hierarchical manner: (1) The top layer of the system is a multilane strategy that handles lane-change scenarios by comparing lane-level trajectories computed in parallel. (2) Inside the lane-level trajectory generator, it iteratively solves path and speed optimization based on a Frenet frame. (3) For path and speed optimization, a combination of dynamic programming and spline-based quadratic programming is proposed to construct a scalable and easy-to-tune framework to handle traffic rules, obstacle decisions and smoothness simultaneously. The planner is scalable to both highway and lower-speed city driving scenarios. We also demonstrate the algorithm through scenario illustrations and on-road test results. The system described in this manuscript has been deployed to dozens of Baidu Apollo autonomous driving vehicles since Apollo v1.5 was announced in September 2017. As of May 16th, 2018, the system has been tested under 3,380 hours and approximately 68,000 kilometers (42,253 miles) of closed-loop autonomous driving under various urban scenarios. The algorithm described in this manuscript is available at https://github.com/ApolloAuto/apollo/tree/master/modules/planning.

1709.07224 2026-06-04 cs.MA cs.AI cs.LG cs.SY eess.SY stat.ML 版本更新

Local Communication Protocols for Learning Complex Swarm Behaviors with Deep Reinforcement Learning

局部通信协议用于通过深度强化学习学习复杂群集行为

Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann

发表机构 * School of Computer Science, University of Lincoln(林肯大学计算机科学学院) Department of Electrical Engineering, Technische Universität Darmstadt(达姆施塔特技术大学电气工程系)

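AI summary This paper proposes simple communication protocols that let deep reinforcement learning learn decentralized control policies in a multi-robot swarm environment; the protocols encode local neighborhood relations as histograms and transmit task-specific information, such as the shortest distance and direction to a target, to accomplish collaborative tasks.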

Comments 13 pages, 4 figures, version 2, accepted at ANTS 2018

详情
AI中文摘要

群集系统对强化学习(RL)构成挑战,因为算法需要学习去中心化控制策略以应对代理的有限局部感知和通信能力。虽然直接定义代理行为困难,但可通过先验知识定义简单的通信协议。本文提出多种简单通信协议,用于深度强化学习在多机器人群环境中寻找去中心化控制策略。协议基于直方图编码代理的局部邻域关系,并可传输任务特定信息,如到目标的最短距离和方向。在我们的框架中,我们采用信任区域策略优化的变体来学习复杂协作任务,如编队和建立通信链路。我们在模拟的2D物理环境中评估了我们的发现,并比较了不同通信协议的影响。

英文摘要

Swarm systems constitute a challenging problem for reinforcement learning (RL) as the algorithm needs to learn decentralized control policies that can cope with limited local sensing and communication abilities of the agents. While it is often difficult to directly define the behavior of the agents, simple communication protocols can be defined more easily using prior knowledge about the given task. In this paper, we propose a number of simple communication protocols that can be exploited by deep reinforcement learning to find decentralized control policies in a multi-robot swarm environment. The protocols are based on histograms that encode the local neighborhood relations of the agents and can also transmit task-specific information, such as the shortest distance and direction to a desired target. In our framework, we use an adaptation of Trust Region Policy Optimization to learn complex collaborative tasks, such as formation building and building a communication link. We evaluate our findings in a simulated 2D-physics environment, and compare the implications of different communication protocols.
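
The idea of encoding an agent's local neighborhood as a histogram can be sketched with a simple distance histogram. This is an assumed, simplified form of such a protocol, not the paper's exact encoding; the function name, bin layout, and coordinates are hypothetical.

```python
import math


def distance_histogram(agent_xy, neighbors_xy, max_range, n_bins):
    """Count neighbors within max_range into equal-width distance bins."""
    hist = [0] * n_bins
    for x, y in neighbors_xy:
        d = math.hypot(x - agent_xy[0], y - agent_xy[1])
        if d < max_range:  # neighbors outside the sensing range are ignored
            hist[int(d / max_range * n_bins)] += 1
    return hist


neighbors = [(1.0, 0.0), (0.0, 2.5), (3.0, 4.0), (0.2, 0.0)]
print(distance_histogram((0.0, 0.0), neighbors, max_range=4.0, n_bins=4))  # [1, 1, 1, 0]
```

The output has a fixed length regardless of how many neighbors are in range, which makes it usable as a fixed-size policy input.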

1709.05077 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Transforming Cooling Optimization for Green Data Center via Deep Reinforcement Learning

通过深度强化学习优化绿色数据中心的冷却系统

Yuanlong Li, Yonggang Wen, Kyle Guan, Dacheng Tao

发表机构 * School of Computer Science and Engineering, Nanyang Technological University(南洋理工大学计算机科学与工程学院) Bell Labs, Nokia(诺基亚贝尔实验室)

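AI summary This paper proposes using data center monitoring data to optimize the cooling control policy and designs an end-to-end cooling control algorithm within a deep reinforcement learning framework, achieving about 11% cooling cost savings on a simulation platform and about 15% savings on a real data trace.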

详情
AI中文摘要

冷却系统在现代数据中心(DC)中起着关键作用。开发最优控制策略对于数据中心冷却系统是一个具有挑战性的任务。现有方法通常依赖于基于机械冷却、电气和热管理知识构建的系统模型近似,这难以设计且可能导致次优或不稳定性能。本文提出利用数据中心中的大量监控数据来优化控制策略。为此,将冷却控制策略设计转化为具有温度约束的能量成本最小化问题,并将其应用于新兴的深度强化学习(DRL)框架。具体而言,我们提出了一种基于actor-critic框架和深度确定性策略梯度(DDPG)算法的端到端冷却控制算法(CCA)。在所提出的CCA中,评估网络被训练以预测一个受数据中心房间冷却状态惩罚的能量成本计数器,而策略网络被训练以在给定当前负载和天气信息时预测优化的控制设置。所提出的算法在EnergyPlus模拟平台和从新加坡国家超级计算中心(NSCC)收集的实时数据跟踪上进行了评估。我们的结果表明,所提出的CCA在模拟平台上相比手动配置的基线控制算法可实现约11%的冷却成本节省。在基于跟踪的研究中,我们提出了一种去低估验证机制,因为我们无法直接在真实数据中心上测试该算法。尽管使用DUE结果较为保守,如果我们设置入口温度阈值为26.6摄氏度,我们仍能在NSCC数据跟踪上实现约15%的冷却能耗节省。

英文摘要

The cooling system plays a critical role in a modern data center (DC). Developing an optimal control policy for a DC cooling system is a challenging task. The prevailing approaches often rely on approximate system models built upon knowledge of mechanical cooling and of electrical and thermal management; such models are difficult to design and may lead to sub-optimal or unstable performance. In this paper, we propose utilizing the large amount of monitoring data in a DC to optimize the control policy. To do so, we cast the cooling control policy design as an energy cost minimization problem with temperature constraints, and tap into the emerging deep reinforcement learning (DRL) framework. Specifically, we propose an end-to-end cooling control algorithm (CCA) that is based on the actor-critic framework and an off-policy offline version of the deep deterministic policy gradient (DDPG) algorithm. In the proposed CCA, an evaluation network is trained to predict an energy cost counter penalized by the cooling status of the DC room, and a policy network is trained to predict optimized control settings when given the current load and weather information. The proposed algorithm is evaluated on the EnergyPlus simulation platform and on a real data trace collected from the National Super Computing Centre (NSCC) of Singapore. Our results show that the proposed CCA can achieve about 11% cooling cost saving on the simulation platform compared with a manually configured baseline control algorithm. In the trace-based study, we propose a de-underestimation (DUE) validation mechanism, as we cannot directly test the algorithm on a real DC. Even though the results with DUE are conservative, we can still achieve about 15% cooling energy saving on the NSCC data trace if we set the inlet temperature threshold at 26.6 degrees Celsius.

1807.03769 2026-06-04 math.OC cs.AI cs.LG cs.SY eess.SY stat.ML 版本更新

Kernel-Based Learning for Smart Inverter Control

基于核方法的智能逆变器控制

Aditie Garg, Mana Jalali, Vassilis Kekatos, Nikolaos Gatsis

发表机构 * Dept. of ECE, Virginia Tech(维吉尼亚理工大学电子工程系) Dept. of ECE, Un. of Texas at San Antonio(德克萨斯大学圣安东尼奥分校电子工程系)

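AI summary This paper proposes nonlinear inverter control policies, posing reactive control as a kernel-based regression task by analogy with multi-task learning; using a linearized grid model and anticipated data scenarios, inverter rules are designed jointly at the feeder level to minimize voltage deviations and ohmic losses.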

Comments Submitted to the 2018 IEEE Global Signal and Information Processing Conf., Symposium on Smart Energy Infrastructures

详情
AI中文摘要

目前,分布电网面临由间歇性太阳能发电引起的频繁电压波动的挑战。智能逆变器被倡导为一种快速响应的手段,用于调节电压并最小化电阻损耗。由于最优逆变器协调可能计算上具有挑战性,而预设的本地控制规则表现不佳,因此定制化的准静态控制规则被视为最佳折中方案。本文从仿射控制规则出发,提出非线性逆变器控制策略。通过类比多任务学习,将反应控制视为基于核的回归任务。利用线性化电网模型和给定的预期数据场景,在馈线层面联合设计逆变器规则,以最小化电压偏差和电阻损耗的凸组合,通过线性约束的二次规划。使用真实世界数据在基准馈线上的数值测试表明,非线性控制规则即使由少数非本地读数驱动,也能实现近最优性能。

英文摘要

Distribution grids are currently challenged by frequent voltage excursions induced by intermittent solar generation. Smart inverters have been advocated as a fast-responding means to regulate voltage and minimize ohmic losses. Since optimal inverter coordination may be computationally challenging and preset local control rules are subpar, customized control rules designed in a quasi-static fashion emerge as a golden mean. Departing from affine control rules, this work puts forth nonlinear inverter control policies. Drawing analogies to multi-task learning, reactive control is posed as a kernel-based regression task. Leveraging a linearized grid model and given anticipated data scenarios, inverter rules are jointly designed at the feeder level to minimize a convex combination of voltage deviations and ohmic losses via a linearly-constrained quadratic program. Numerical tests using real-world data on a benchmark feeder demonstrate that nonlinear control rules driven also by a few non-local readings can attain near-optimal performance.

1807.02297 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML 版本更新

Combinatorial Bandits for Incentivizing Agents with Dynamic Preferences

基于动态偏好的激励机制组合博弈问题

Tanner Fiez, Shreyas Sekar, Liyuan Zheng, Lillian J. Ratliff

发表机构 * Electrical Engineering Department, University of Washington(华盛顿大学电气工程系)

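AI summary This paper proposes a multi-armed bandit framework for matching incentives to users in a resource-constrained environment, combining greedy matching, the UCB algorithm, and Markov chain mixing times; it analyzes the regret theoretically and validates performance on synthetic and realistic examples.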

Comments Published as a conference paper in Conference on Uncertainty in Artificial Intelligence (UAI) 2018

详情
AI中文摘要

个性化激励或推荐设计以提高用户参与度正日益受到重视,随着数字平台提供商不断涌现。我们提出了一种多臂老虎机框架,用于匹配激励给用户,其偏好在事前未知且随时间动态变化,在资源受限环境下。我们设计了一种算法,结合了三个不同领域的思想:(i) 贪心匹配范式,(ii) 用于老虎机的上置信界算法 (UCB),以及 (iii) 马尔可夫链理论中的混合时间。对于该算法,我们提供了关于 regret 的理论界限,并通过合成和现实(如共享单车平台的供需匹配)示例展示了其性能。

英文摘要

The design of personalized incentives or recommendations to improve user engagement is gaining prominence as digital platform providers continually emerge. We propose a multi-armed bandit framework for matching incentives to users, whose preferences are unknown a priori and evolve dynamically in time, in a resource-constrained environment. We design an algorithm that combines ideas from three distinct domains: (i) a greedy matching paradigm, (ii) the upper confidence bound algorithm (UCB) for bandits, and (iii) mixing times from the theory of Markov chains. For this algorithm, we provide theoretical bounds on the regret and demonstrate its performance via both synthetic and realistic (matching supply and demand in a bike-sharing platform) examples.
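
Of the three ingredients listed, the upper confidence bound rule is the easiest to show in isolation. The sketch below is the textbook UCB1 index for a standard stochastic bandit, not the paper's combined matching algorithm; the function names and the exploration constant `c` are illustrative assumptions.

```python
import math


def ucb1_index(mean_reward, n_pulls, t, c=2.0):
    """UCB1 index: empirical mean plus an exploration bonus."""
    if n_pulls == 0:
        return math.inf  # unexplored arms are tried first
    return mean_reward + math.sqrt(c * math.log(t) / n_pulls)


def select_arm(means, pulls, t):
    """Pick the arm with the largest UCB1 index at round t."""
    indices = [ucb1_index(m, n, t) for m, n in zip(means, pulls)]
    return max(range(len(indices)), key=indices.__getitem__)


# Arm 1 has a lower empirical mean but was pulled only once, so its bonus dominates.
print(select_arm(means=[0.5, 0.4], pulls=[100, 1], t=101))  # 1
```

The exploration bonus shrinks as an arm is pulled more often, so rarely tried arms are revisited until their estimates become reliable.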

1807.00553 2026-06-04 cs.LG cs.AI cs.SY eess.SY math.DS stat.ML 版本更新

A Broader View on Bias in Automated Decision-Making: Reflecting on Epistemology and Dynamics

对自动化决策中偏见的更广泛视角:反思认识论与动态性

Roel Dobbe, Sarah Dean, Thomas Gilbert, Nitin Kohli

发表机构 * Department of Electrical Engineering and Computer Sciences, University of California Berkeley, USA(加州大学伯克利分校电气工程与计算机科学系) Department of Rhetoric, University of California Berkeley, USA(加州大学伯克利分校修辞学系) School of Information, University of California Berkeley, USA(加州大学伯克利分校信息学院)

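AI summary This paper examines the sources of bias in automated decision-making, treating technical bias as an epistemological problem and emergent bias as a dynamical feedback phenomenon, and stresses the need to reflect on epistemology and to adopt value-sensitive design methodologies to improve decision-making systems.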

Comments Presented at the 2018 Workshop on Fairness, Accountability and Transparency in Machine Learning during ICML 2018, Stockholm, Sweden

详情
AI中文摘要

机器学习(ML)正日益应用于现实世界,提供可操作见解并成为自动化决策系统的基础。尽管训练数据中固有的偏见是公平性讨论的核心问题,但这些系统也受到技术性和新兴偏见的影响,后者常作为实施中的上下文特定产物出现。本文将技术偏见视为认识论问题,新兴偏见视为动态反馈现象。为激发关于如何改变机器学习实践以有效应对这些问题的讨论,本文探索了偏见的更广泛视角,强调反思认识论的必要性,并指出价值敏感设计方法以重新审视自动化决策系统的设计和实施过程。

英文摘要

Machine learning (ML) is increasingly deployed in real-world contexts, supplying actionable insights and forming the basis of automated decision-making systems. While issues resulting from biases pre-existing in training data have been at the center of the fairness debate, these systems are also affected by technical and emergent biases, which often arise as context-specific artifacts of implementation. This position paper interprets technical bias as an epistemological problem and emergent bias as a dynamical feedback phenomenon. In order to stimulate debate on how to change machine learning practice to effectively address these issues, we explore this broader view on bias, stress the need to reflect on epistemology, and point to value-sensitive design methodologies to revisit the design and implementation process of automated decision-making systems.

1711.10868 2026-06-04 eess.SY cs.AI cs.SY math.OC 版本更新

La production de nitrites lors de la dénitrification des eaux usées par biofiltration - Stratégie de contrôle et de réduction des concentrations résiduelles

废水生物过滤脱硝过程中亚硝酸盐的生成 - 控制与残留浓度的减少策略

Vincent Rocher, Cédric Join, Stéphane Mottelet, Jean Bernier, Sabrina Rechdaoui-Guérin, Sam Azimi, Paul Lessard, André Pauss, Michel Fliess

发表机构 * SIAAP (Syndicat Interdépartemental pour l'Assainissement de l'Agglomération Parisienne)(巴黎大都会污水处理协会) CRAN (CNRS, UMR 7039)(CRAN(国家科学研究中心,UMR 7039)) TIMR (EA 4297)(TIMR(EA 4297)) Département de génie civil et de génie des eaux, Université Laval(土木工程与水工程系,拉瓦尔大学) LIX (CNRS, UMR 7161)(LIX(国家科学研究中心,UMR 7161)) AL.I.E.N. (ALgèbre pour Identification & Estimation Numériques)(AL.I.E.N.(代数用于识别与数值估计))

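AI summary Within the MOCOPEE program, this study investigates the mechanisms of nitrite production during wastewater denitrification and develops measurement and control tools to reduce on-site nitrite concentrations, using a model-free control strategy to improve denitrification efficiency.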

Comments in french, Journal of Water Science, to appear

详情
Journal ref
Revue des Sciences de l'Eau, 31(1), 2018, 61-73
AI中文摘要

近年来,巴黎大区污水处理厂对脱硝后处理过程的流行导致塞纳河中亚硝酸盐浓度回升。控制脱硝后亚硝酸盐生成成为关键技术问题。MOCOPEE项目研究了废水脱硝过程中亚硝酸盐生成的机理,并开发了测量和控制工具以减少现场亚硝酸盐产量。先前研究表明,典型的甲醇投加策略会导致反应器中碳氮比波动,从而引起出水亚硝酸盐浓度不稳定。因此,在SimBio模型上测试了将模型无关控制添加到经典投加策略的可能性,该模型模拟了废水生物滤池的行为。相应的

英文摘要

The recent popularity of post-denitrification processes in the wastewater treatment plants of the greater Paris area has caused a resurgence of nitrite in the Seine river. Controlling the production of nitrite during post-denitrification has thus become a major technical issue. Research studies have been conducted in the MOCOPEE program (www.mocopee.com) to better understand the mechanisms behind the production of nitrite during wastewater denitrification and to develop technical tools (measurement and control solutions) to support on-site reduction of nitrite production. Prior studies have shown that typical methanol dosing strategies produce a varying carbon-to-nitrogen ratio in the reactor, which in turn leads to unstable nitrite concentrations in the effluent. The possibility of adding a model-free control to the current classical dosing strategy has thus been tested on the SimBio model, which simulates the behavior of wastewater biofilters. The corresponding "intelligent" feedback loop, which uses effluent nitrite concentrations, compensates the classical strategy only when needed. Simulation results show a clear improvement in the average nitrite concentration and its stability in the effluent, without a notable extra cost in methanol.

1806.08083 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Expanding the Active Inference Landscape: More Intrinsic Motivations in the Perception-Action Loop

拓展主动推断领域:感知-动作循环中的更多内在动机

Martin Biehl, Christian Guckelsberger, Christoph Salge, Simón C. Smith, Daniel Polani

发表机构 * Araya Inc.(Araya公司) Computational Creativity Group, Department of Computing, Goldsmiths, University of London(Goldsmiths大学计算创意小组) Game Innovation Lab, Department of Computer Science and Engineering, New York University(纽约大学游戏创新实验室) Sepia Lab, Adaptive Systems Research Group, Department of Computer Science, University of Hertfordshire(赫特福德大学计算机科学系Sepia实验室) Institute of Perception, Action and Behaviour, School of Informatics, The University of Edinburgh(爱丁堡大学信息学院感知、行为与行为研究所)

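AI summary This paper examines whether other intrinsic motivations can replace the original one in active inference while keeping its core machinery, and connects the approach to universal reinforcement learning through a formalism.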

Comments 53 pages, 6 figures, 2 tables

详情
AI中文摘要

主动推断是一种雄心勃勃的理论,将自主代理的感知、推断和动作选择统一于单一原则下。它为许多认知现象提供了生物合理解释,包括意识。在主动推断中,动作选择由一个评估未来动作的客观函数驱动,该函数基于当前推断的世界信念。主动推断本质上独立于外在奖励,使其在不同环境或代理形态中具有高度鲁棒性。在文献中,共享这种独立性的范式被总结为内在动机。与主动推断不同,这些动机模型通常不承诺特定的推断和动作选择机制。本文研究主动推断的推断和动作选择机制是否也可用于其他内在动机替代原动机。感知-动作循环明确将推断和动作选择与环境和代理记忆联系起来,因此被用作分析基础。我们重构了主动推断方法,将其原始公式定位其中,并展示如何在保持许多原始特征的同时使用其他内在动机。此外,我们通过形式化方法展示了与通用强化学习的联系。主动推断研究可能从比较其他内在动机诱导的动力学中受益。内在动机研究可能从另一种实现内在动机代理的方式中受益,该方式也共享主动推断的生物合理性。

英文摘要

Active inference is an ambitious theory that treats perception, inference and action selection of autonomous agents under the heading of a single principle. It suggests biologically plausible explanations for many cognitive phenomena, including consciousness. In active inference, action selection is driven by an objective function that evaluates possible future actions with respect to current, inferred beliefs about the world. Active inference at its core is independent from extrinsic rewards, resulting in a high level of robustness across, e.g., different environments or agent morphologies. In the literature, paradigms that share this independence have been summarised under the notion of intrinsic motivations. In general and in contrast to active inference, these models of motivation come without a commitment to particular inference and action selection mechanisms. In this article, we study if the inference and action selection machinery of active inference can also be used by alternatives to the originally included intrinsic motivation. The perception-action loop explicitly relates inference and action selection to the environment and agent memory, and is consequently used as foundation for our analysis. We reconstruct the active inference approach, locate the original formulation within, and show how alternative intrinsic motivations can be used while keeping many of the original features intact. Furthermore, we illustrate the connection to universal reinforcement learning by means of our formalism. Active inference research may profit from comparisons of the dynamics induced by alternative intrinsic motivations. Research on intrinsic motivations may profit from an additional way to implement intrinsically motivated agents that also share the biological plausibility of active inference.

1803.02998 2026-06-04 eess.SY cs.AI cs.SY 版本更新

DeepCAS: A Deep Reinforcement Learning Algorithm for Control-Aware Scheduling

DeepCAS: 一种用于控制感知调度的深度强化学习算法

Burak Demirel, Arunselvan Ramaswamy, Daniel E. Quevedo, Holger Karl

发表机构 * Paderborn University(帕德博恩大学)

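AI summary This paper proposes the DeepCAS algorithm, which uses deep reinforcement learning for control-aware scheduling: an optimal controller is synthesized for each subsystem, and the scheduler then learns a schedule that minimizes the control loss; experiments show that it outperforms periodic scheduling.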

详情
AI中文摘要

我们考虑由多个独立受控子系统组成的网络控制系统,这些系统通过共享通信网络运行。此类系统在网络物理系统、物联网和大规模工业系统中普遍存在。在许多大规模设置中,通信网络的规模小于系统的规模,从而引发调度问题。本文的主要贡献是开发一种基于深度强化学习的控制感知调度(DeepCAS)算法,以解决这些问题。我们采用以下(最优)设计策略:首先,为每个子系统合成最优控制器;其次,设计一个学习算法,以适应所选子系统(被控对象)和控制器。由于这种适应性,我们的算法找到一个调度方案,以最小化控制损失。我们通过实验证明,DeepCAS找到的调度性能优于周期性调度。

英文摘要

We consider networked control systems consisting of multiple independent controlled subsystems, operating over a shared communication network. Such systems are ubiquitous in cyber-physical systems, Internet of Things, and large-scale industrial systems. In many large-scale settings, the size of the communication network is smaller than the size of the system. In consequence, scheduling issues arise. The main contribution of this paper is to develop a deep reinforcement learning-based control-aware scheduling (DeepCAS) algorithm to tackle these issues. We use the following (optimal) design strategy: First, we synthesize an optimal controller for each subsystem; next, we design a learning algorithm that adapts to the chosen subsystems (plants) and controllers. As a consequence of this adaptation, our algorithm finds a schedule that minimizes the control loss. We present empirical results to show that DeepCAS finds schedules with better performance than periodic ones.

1806.00727 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Closed-loop Bayesian Semantic Data Fusion for Collaborative Human-Autonomy Target Search

闭环贝叶斯语义数据融合用于协同人机目标搜索

Luke Burks, Ian Loefgren, Luke Barbier, Jeremy Muesing, Jamison McGinley, Sousheel Vunnam, Nisar Ahmed

AI总结 本文提出一种闭环贝叶斯语义数据融合方法,通过CPOMDP规划生成最优轨迹,结合不完美传感器数据和人类提供的语义观察,提升动态目标搜索效率。

Comments Final version accepted and submitted to 2018 FUSION Conference (Cambridge, UK, July 2018)

详情
AI中文摘要

在搜索应用中,自主无人车辆必须能够高效重新获取和定位长时间可能处于视线外的大空间中移动目标。为此,本文开发并验证了一种新的协同人机感知解决方案。我们的方法利用连续部分可观测马尔可夫决策过程(CPOMDP)规划,生成最优利用不完美传感器数据和可请求的语义自然语言观察的车辆轨迹。关键创新是可扩展的层次高斯混合模型形式,用于在连续动态状态空间中高效求解包含语义观察的CPOMDPs。该方法在定制测试平台上通过真实的人机团队在动态室内目标搜索和捕捉场景中进行了演示和验证。

英文摘要

In search applications, autonomous unmanned vehicles must be able to efficiently reacquire and localize mobile targets that can remain out of view for long periods of time in large spaces. As such, all available information sources must be actively leveraged -- including imprecise but readily available semantic observations provided by humans. To achieve this, this work develops and validates a novel collaborative human-machine sensing solution for dynamic target search. Our approach uses continuous partially observable Markov decision process (CPOMDP) planning to generate vehicle trajectories that optimally exploit imperfect detection data from onboard sensors, as well as semantic natural language observations that can be specifically requested from human sensors. The key innovation is a scalable hierarchical Gaussian mixture model formulation for efficiently solving CPOMDPs with semantic observations in continuous dynamic state spaces. The approach is demonstrated and validated with a real human-robot team engaged in dynamic indoor target search and capture scenarios on a custom testbed.

1806.00589 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML 版本更新

Efficient Entropy for Policy Gradient with Multidimensional Action Space

在多维动作空间中高效的策略梯度熵

Yiming Zhang, Quan Ho Vuong, Kenny Song, Xiao-Yue Gong, Keith W. Ross

发表机构 * New York University(纽约大学) New York University Abu Dhabi(纽约大学阿布扎比分校) New York University Shanghai(纽约大学上海分校) Massachusetts Institute of Technology(麻省理工学院)

AI总结 本文提出高效计算高维动作空间策略梯度熵的方法,通过改进的无偏估计器提升探索效率,在多猎手多兔子网格游戏和多智能体多臂老虎机问题中验证了其有效性。

详情
AI中文摘要

近年来,深度强化学习在解决高维状态空间(如Atari游戏)的序列决策过程方面表现出色。然而,许多强化学习问题涉及高维离散动作空间和高维状态空间。本文考虑熵奖励,用于在策略梯度中鼓励探索。在高维动作空间中,计算熵及其梯度需要枚举所有动作并为每个动作运行前向和反向传播,这可能计算上不可行。我们开发了几种新颖的无偏估计器用于熵奖励及其梯度。我们将这些估计器应用于几种参数化策略模型,包括独立采样、CommNet、带有修改MDP的自回归和带有LSTM的自回归。最后,我们在两个环境中测试我们的算法:一个多猎手多兔子网格游戏和一个多智能体多臂老虎机问题。结果表明,我们的熵估计器在边际额外计算成本下显著提升了性能。

英文摘要

In recent years, deep reinforcement learning has been shown to be adept at solving sequential decision processes with high-dimensional state spaces such as in the Atari games. Many reinforcement learning problems, however, involve high-dimensional discrete action spaces as well as high-dimensional state spaces. This paper considers entropy bonus, which is used to encourage exploration in policy gradient. In the case of high-dimensional action spaces, calculating the entropy and its gradient requires enumerating all the actions in the action space and running forward and backpropagation for each action, which may be computationally infeasible. We develop several novel unbiased estimators for the entropy bonus and its gradient. We apply these estimators to several models for the parameterized policies, including Independent Sampling, CommNet, Autoregressive with Modified MDP, and Autoregressive with LSTM. Finally, we test our algorithms on two environments: a multi-hunter multi-rabbit grid game and a multi-agent multi-arm bandit problem. The results show that our entropy estimators substantially improve performance with marginal additional computational cost.
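The cost issue described in the abstract above can be illustrated with a minimal sketch. This is not the authors' estimators: it only contrasts exact entropy by enumeration with the plain sample mean of -log π(a), which is an unbiased estimate of the same quantity. The independent (factorized) policy, the probability tables, and all function names are assumptions made for illustration.

```python
import itertools
import math
import random

def joint_prob(probs, action):
    """Probability of a joint action under an independent (factorized) policy."""
    p = 1.0
    for dim_probs, a in zip(probs, action):
        p *= dim_probs[a]
    return p

def exact_entropy(probs):
    """Entropy by brute-force enumeration of all joint actions (exponential cost)."""
    h = 0.0
    for action in itertools.product(*(range(len(d)) for d in probs)):
        p = joint_prob(probs, action)
        if p > 0.0:
            h -= p * math.log(p)
    return h

def sampled_entropy(probs, n_samples, rng):
    """Unbiased Monte Carlo estimate: average of -log pi(a) over sampled actions."""
    total = 0.0
    for _ in range(n_samples):
        action = [rng.choices(range(len(d)), weights=d)[0] for d in probs]
        total -= math.log(joint_prob(probs, action))
    return total / n_samples

# Hypothetical policy over 3 action dimensions (values chosen for illustration).
probs = [[0.7, 0.3], [0.2, 0.5, 0.3], [0.9, 0.1]]
rng = random.Random(0)
h_exact = exact_entropy(probs)
h_est = sampled_entropy(probs, 20000, rng)
```

The enumeration visits every joint action, so its cost grows exponentially with the number of action dimensions, while the sampled estimate only needs the log-probability of the actions actually drawn.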

1709.05746 2026-06-04 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY 版本更新

Adversarial Discriminative Sim-to-real Transfer of Visuo-motor Policies

对抗性判别仿真到现实的视觉-运动策略转移

Fangyi Zhang, Jürgen Leitner, Zongyuan Ge, Michael Milford, Peter Corke

发表机构 * Australian Centre for Robotic Vision (ACRV)(澳大利亚机器人视觉中心) Queensland University of Technology (QUT)(昆士兰科技大学) Monash University(莫纳什大学)

AI总结 本文提出对抗性判别仿真到现实转移方法,降低真实数据的标注成本;在桌面物体触及任务中,7自由度机械臂依据视觉观测触及杂乱环境中的蓝色长方体,仅需93张标注和186张未标注的真实图像即可完成转移,成功率达97.8%,控制精度为1.8厘米。

Comments Under review for the International Journal of Robotics Research

详情
AI中文摘要

各种方法已被提出以学习用于现实世界机器人应用的视觉-运动策略。一种解决方案是首先在仿真中学习然后转移到现实世界。在转移过程中,大多数现有方法需要带有标签的真实图像。然而,在许多机器人应用中,标注过程往往昂贵甚至不实际。在本文中,我们提出了一种对抗性判别仿真到现实转移方法,以减少标注真实数据的成本。通过模块化网络在桌面物体触及任务中验证了该方法的有效性,其中7自由度的机械臂在速度控制模式下依据视觉观测触及杂乱环境中的蓝色长方体。对抗性转移方法将标注真实数据的需求减少了50%。策略可以仅使用93张标注和186张未标注的真实图像转移到现实环境。转移的视觉-运动策略对杂乱环境中训练时未见过的物体乃至移动目标均具有鲁棒性,实现了97.8%的成功率和1.8厘米的控制精度。

英文摘要

Various approaches have been proposed to learn visuo-motor policies for real-world robotic applications. One solution is first learning in simulation then transferring to the real world. In the transfer, most existing approaches need real-world images with labels. However, the labelling process is often expensive or even impractical in many robotic applications. In this paper, we propose an adversarial discriminative sim-to-real transfer approach to reduce the cost of labelling real data. The effectiveness of the approach is demonstrated with modular networks in a table-top object reaching task where a 7 DoF arm is controlled in velocity mode to reach a blue cuboid in clutter through visual observations. The adversarial transfer approach reduced the labelled real data requirement by 50%. Policies can be transferred to real environments with only 93 labelled and 186 unlabelled real images. The transferred visuo-motor policies are robust to novel (not seen in training) objects in clutter and even a moving target, achieving a 97.8% success rate and 1.8 cm control accuracy.

1805.09613 2026-06-04 stat.ML cs.AI cs.LG cs.RO cs.SY eess.SY 版本更新

A0C: Alpha Zero in Continuous Action Space

A0C:在连续动作空间中的Alpha Zero

Thomas M. Moerland, Joost Broekens, Aske Plaat, Catholijn M. Jonker

发表机构 * Dep. of Computer Science, Delft University of Technology, The Netherlands(代尔夫特理工大学计算机科学系,荷兰) Dep. of Computer Science, Leiden University, The Netherlands(莱顿大学计算机科学系,荷兰)

AI总结 本文提出将Alpha Zero扩展到连续动作空间的理论方法,并在倒摆任务中验证了其可行性,为连续动作空间中的迭代搜索与学习应用奠定了基础。

详情
AI中文摘要

Alpha Zero的核心创新在于树搜索与深度学习的交替结合,这在国际象棋、将棋和围棋等棋类游戏中证明非常成功。这些游戏具有离散动作空间。然而,许多现实世界的强化学习领域具有连续动作空间,例如机器人控制、导航和自动驾驶汽车。本文提出了将Alpha Zero扩展到连续动作空间所需的理论扩展。我们还提供了一些在倒摆摆起任务中的初步实验,实证地展示了我们方法的可行性。因此,这项工作为在连续动作空间领域中应用迭代搜索与学习迈出了第一步。

英文摘要

A core novelty of Alpha Zero is the interleaving of tree search and deep learning, which has proven very successful in board games like Chess, Shogi and Go. These games have a discrete action space. However, many real-world reinforcement learning domains have continuous action spaces, for example in robotic control, navigation and self-driving cars. This paper presents the necessary theoretical extensions of Alpha Zero to deal with continuous action space. We also provide some preliminary experiments on the Pendulum swing-up task, empirically showing the feasibility of our approach. Thereby, this work provides a first step towards the application of iterated search and learning in domains with a continuous action space.

1805.07196 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Supervisory Control of Probabilistic Discrete Event Systems under Partial Observation

部分观测下概率离散事件系统的监督控制

Weilin Deng, Jingkai Yang, Daowen Qiu

发表机构 * School of Data and Computer Science, Sun Yat-sen University(中山大学数据与计算机科学学院)

AI总结 研究在概率监督控制器和部分观测假设下概率离散事件系统(PDESs)的监督控制,提出概率可控性和可观测性的概念,并设计多项式验证算法,同时引入并计算了最优控制问题的解。

Comments 36 pages, comments are welcome

详情
AI中文摘要

对在部分观测下概率离散事件系统的监督控制进行了研究,假设监督控制器是概率性的且具有部分观测。定义了概率P监督器,为每个观测指定控制模式的概率分布。提出了概率可控性和可观测性的概念,并证明其为概率P监督器存在的必要且充分条件。此外,提出了概率可控性和可观测性的多项式验证算法。还引入了下界概率可控且可观的超语言,并将其作为PDESs最优控制问题的解进行计算。通过几个例子展示了所获得的结果。

英文摘要

The supervisory control of probabilistic discrete event systems (PDESs) is investigated under the assumptions that the supervisory controller (supervisor) is probabilistic and has only partial observation. The probabilistic P-supervisor is defined, which specifies a probability distribution on the control patterns for each observation. The notions of probabilistic controllability and observability are proposed and shown to be necessary and sufficient conditions for the existence of probabilistic P-supervisors. Moreover, polynomial verification algorithms for probabilistic controllability and observability are put forward. In addition, the infimal probabilistic controllable and observable superlanguage is introduced and computed as the solution of the optimal control problem of PDESs. Several examples are presented to illustrate the results obtained.

1805.04201 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Learning to Grasp Without Seeing

无需视觉的抓取学习

Adithyavairavan Murali, Yin Li, Dhiraj Gandhi, Abhinav Gupta

发表机构 * The Robotics Institute, Carnegie Mellon University(卡内基梅隆大学机器人研究所)

AI总结 本文提出基于触觉感知的抓取方法,通过触觉信号表征和迭代重抓取提升抓取稳定性,实验表明在无视觉信息下可有效抓取新型物体。

详情
AI中文摘要

能否在不看到物体的情况下让机器人抓取未知物体?本文提出了一种基于触觉感知的解决方案,结合触觉信号定位与触觉反馈重抓取。我们创建了一个大规模抓取数据集,包含超过30帧RGB图像和280万条触觉样本。提出了一种无监督自编码方案,显著提升了触觉感知任务的性能。系统分为两个步骤:首先,触觉定位模型通过粒子滤波聚合目标信息,输出物体位置估计以建立初始抓取;其次,重抓取模型基于学习特征逐步改进抓取,估计抓取稳定性并预测下一步调整。最终通过大量实验验证了在无视觉信息下抓取新型物体的有效性,并在视觉策略基础上提升了整体准确率10.6%。

英文摘要

Can a robot grasp an unknown object without seeing it? In this paper, we present a tactile-sensing based approach to this challenging problem of grasping novel objects without prior knowledge of their location or physical properties. Our key idea is to combine touch based object localization with tactile based re-grasping. To train our learning models, we created a large-scale grasping dataset, including more than 30 RGB frames and over 2.8 million tactile samples from 7800 grasp interactions of 52 objects. To learn a representation of tactile signals, we propose an unsupervised auto-encoding scheme, which shows a significant improvement of 4-9% over prior methods on a variety of tactile perception tasks. Our system consists of two steps. First, our touch localization model sequentially 'touch-scans' the workspace and uses a particle filter to aggregate beliefs from multiple hits of the target. It outputs an estimate of the object's location, from which an initial grasp is established. Next, our re-grasping model learns to progressively improve grasps with tactile feedback based on the learned features. This network learns to estimate grasp stability and predict adjustment for the next grasp. Re-grasping thus is performed iteratively until our model identifies a stable grasp. Finally, we demonstrate extensive experimental results on grasping a large set of novel objects using tactile sensing alone. Furthermore, when applied on top of a vision-based policy, our re-grasping model significantly boosts the overall accuracy by 10.6%. We believe this is the first attempt at learning to grasp with only tactile sensing and without any prior object knowledge.

1805.03090 2026-06-04 math.OC cs.AI cs.SY eess.SY 版本更新

Deception in Optimal Control

最优控制中的欺骗

Melkior Ornik, Ufuk Topcu

发表机构 * Institute for Computational Engineering and Sciences, University of Texas at Austin(德克萨斯大学奥斯汀分校计算工程与科学研究所) Department of Aerospace Engineering and Engineering Mechanics and the Institute for Computational Engineering and Sciences, University of Texas at Austin(德克萨斯大学奥斯汀分校航空航天工程与工程力学系及计算工程与科学研究所)

AI总结 本文提出一个数学严谨的框架,用于定义最优控制中的欺骗,通过设计最优欺骗策略,考虑代理和对手的信念空间,并讨论在不确定性和部分可观测马尔可夫决策过程中的欺骗策略设计。

详情
AI中文摘要

本文考虑了一个对抗性场景,其中一方试图实现目标,而其对手试图学习该方的意图并阻止其达成目标。代理有动机试图欺骗对手,同时努力实现其目标。本文的主要贡献是引入了一个数学严谨的框架,用于在最优控制背景下定义欺骗。核心概念是信念诱导奖励:一种奖励不仅依赖于代理的状态和动作,还依赖于对手的信念。设计最优欺骗策略成为在代理状态空间和对手信念空间的乘积上进行最优控制设计的问题。所提出的框架允许在任意具有奖励函数的控制系统中定义欺骗,并可附加限制代理控制策略的额外规范。除了定义欺骗外,我们还讨论了在代理对对手学习过程的知识不确定时如何设计最优欺骗策略。在论文后半部分,我们聚焦于代理行为由马尔可夫决策过程决定的场景,并证明在缺乏对手知识时,最优欺骗策略的设计自然归约为先前已有讨论的部分可观测或不确定马尔可夫决策过程上的控制设计问题。最后,我们给出了两个欺骗策略的例子:一个“警察与小偷”场景和一个代理在移动时使用伪装的例子。我们展示了在这些例子中,最优欺骗策略符合在上述设置中如何欺骗对手的直观想法。

英文摘要

In this paper, we consider an adversarial scenario where one agent seeks to achieve an objective and its adversary seeks to learn the agent's intentions and prevent the agent from achieving its objective. The agent has an incentive to try to deceive the adversary about its intentions, while at the same time working to achieve its objective. The primary contribution of this paper is to introduce a mathematically rigorous framework for the notion of deception within the context of optimal control. The central notion introduced in the paper is that of a belief-induced reward: a reward dependent not only on the agent's state and action, but also on the adversary's beliefs. Design of an optimal deceptive strategy then becomes a question of optimal control design on the product of the agent's state space and the adversary's belief space. The proposed framework allows for deception to be defined in an arbitrary control system endowed with a reward function, as well as with additional specifications limiting the agent's control policy. In addition to defining deception, we discuss the design of optimally deceptive strategies under uncertainties in the agent's knowledge about the adversary's learning process. In the latter part of the paper, we focus on a setting where the agent's behavior is governed by a Markov decision process, and show that the design of optimally deceptive strategies under lack of knowledge about the adversary naturally reduces to previously discussed problems in control design on partially observable or uncertain Markov decision processes. Finally, we present two examples of deceptive strategies: a "cops and robbers" scenario and an example where an agent may use camouflage while moving. We show that optimally deceptive strategies in such examples follow the intuitive idea of how to deceive an adversary in the above settings.

1805.00983 2026-06-04 eess.SY cs.AI cs.SY math.OC 版本更新

Robust Deep Reinforcement Learning for Security and Safety in Autonomous Vehicle Systems

面向自动驾驶车辆系统信息安全与功能安全的鲁棒深度强化学习

Aidin Ferdowsi, Ursula Challita, Walid Saad, Narayan B. Mandayam

发表机构 * Ericsson Research(爱立信研究) WINLAB, Dept. of ECE, Rutgers University(WINLAB,电子与计算机工程系,罗格斯大学)

AI总结 本文提出了一种新颖的对抗深度强化学习算法,用于提高自动驾驶系统在面对网络物理攻击时的鲁棒性,通过博弈论框架分析攻击者与自动驾驶车辆之间的对抗行为,利用LSTM块学习预期间距偏差以优化安全控制。

Comments 8 pages, 4 figures

详情
AI中文摘要

为了在未来的智能城市中有效运行,自动驾驶车辆(AVs)必须依赖于车载传感器如摄像头和雷达以及车对车通信。这种对传感器和通信链路的依赖使AVs容易受到网络物理(CP)攻击,攻击者试图通过操纵数据来控制AVs。因此,为了确保安全和最优的AV动态控制,AVs的数据处理功能必须对这些CP攻击具有鲁棒性。为此,本文分析了在存在CP攻击情况下监控AV动态的状态估计过程,并提出了一种新颖的对抗深度强化学习(RL)算法,以最大化AV动态控制对CP攻击的鲁棒性。本文在博弈论框架下研究攻击者的行动以及AV对CP攻击的反应。在所构建的博弈中,攻击者试图注入错误的数据到AV传感器读数中,以操纵车对车最优安全间距,从而可能增加AV事故风险或减少道路上的车辆流量。同时,AV作为防御方,试图最小化间距偏差以确保对攻击者行为的鲁棒性。由于AV没有关于攻击者行为的信息,且数据值操纵的可能性无限,参与者过去交互的结果被输入到长短期记忆(LSTM)块中。每个参与者的LSTM块学习其自身行动导致的预期间距偏差,并将其反馈到其RL算法中。然后,攻击者的RL算法选择最大化间距偏差的动作,而AV的RL算法则试图找到最小化此类偏差的最佳动作。

英文摘要

To operate effectively in tomorrow's smart cities, autonomous vehicles (AVs) must rely on intra-vehicle sensors such as camera and radar as well as inter-vehicle communication. Such dependence on sensors and communication links exposes AVs to cyber-physical (CP) attacks by adversaries that seek to take control of the AVs by manipulating their data. Thus, to ensure safe and optimal AV dynamics control, the data processing functions at AVs must be robust to such CP attacks. To this end, in this paper, the state estimation process for monitoring AV dynamics, in presence of CP attacks, is analyzed and a novel adversarial deep reinforcement learning (RL) algorithm is proposed to maximize the robustness of AV dynamics control to CP attacks. The attacker's action and the AV's reaction to CP attacks are studied in a game-theoretic framework. In the formulated game, the attacker seeks to inject faulty data to AV sensor readings so as to manipulate the inter-vehicle optimal safe spacing and potentially increase the risk of AV accidents or reduce the vehicle flow on the roads. Meanwhile, the AV, acting as a defender, seeks to minimize the deviations of spacing so as to ensure robustness to the attacker's actions. Since the AV has no information about the attacker's action and due to the infinite possibilities for data value manipulations, the outcomes of the players' past interactions are fed to long short-term memory (LSTM) blocks. Each player's LSTM block learns the expected spacing deviation resulting from its own action and feeds it to its RL algorithm. Then, the attacker's RL algorithm chooses the action which maximizes the spacing deviation, while the AV's RL algorithm tries to find the optimal action that minimizes such deviation.

1801.07745 2026-06-04 math.OC cs.AI cs.CG cs.NA math.NA 版本更新

Optimal Transport on Discrete Domains

离散域上的最优传输

Justin Solomon

发表机构 * MIT Department of Electrical Engineering and Computer Science(麻省理工学院电气工程与计算机科学系) MIT Department of Electrical Engineering(麻省理工学院电气工程系)

AI总结 本文探讨了离散最优传输的最新进展,结合偏微分方程与凸分析,提出理论支持的模型,适用于数千到数百万顶点的领域。

详情
AI中文摘要

受物流问题中供需匹配的启发,最优传输(或蒙日-康托罗维奇问题,Monge-Kantorovich)涉及在曲面或流形等几何域上定义的概率分布的匹配。在最明显的离散化中,最优传输成为大规模线性规划问题,通常在三角网格、图、点云等图形学和机器学习中遇到的域上难以高效求解。然而,最近的数值最优传输突破使其可扩展到规模大若干数量级的问题,并可在不到一秒内求解。本文讨论了同时利用对问题离散与光滑两方面理解的数值最优传输进展。最先进的离散最优传输技术结合了偏微分方程(PDE)与凸分析的洞察,以重新表述、离散化和优化传输问题。最终结果是一组有理论依据的模型,适用于具有数千或数百万顶点的域。由于数值最优传输是一个相对较新的学科,本文特别强调识别和解释需要数学洞察和进一步研究的开放问题。

英文摘要

Inspired by the matching of supply to demand in logistical problems, the optimal transport (or Monge--Kantorovich) problem involves the matching of probability distributions defined over a geometric domain such as a surface or manifold. In its most obvious discretization, optimal transport becomes a large-scale linear program, which typically is infeasible to solve efficiently on triangle meshes, graphs, point clouds, and other domains encountered in graphics and machine learning. Recent breakthroughs in numerical optimal transport, however, enable scalability to orders-of-magnitude larger problems, solvable in a fraction of a second. Here, we discuss advances in numerical optimal transport that leverage understanding of both discrete and smooth aspects of the problem. State-of-the-art techniques in discrete optimal transport combine insight from partial differential equations (PDE) with convex analysis to reformulate, discretize, and optimize transportation problems. The end result is a set of theoretically-justified models suitable for domains with thousands or millions of vertices. Since numerical optimal transport is a relatively new discipline, special emphasis is placed on identifying and explaining open problems in need of mathematical insight and additional research.
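For reference, the "most obvious discretization" mentioned in the abstract above is commonly written as the linear program below. The symbols are notation assumed here rather than taken from the abstract: a cost matrix C, source and target marginals p and q, and a coupling matrix T.

```latex
\begin{aligned}
\min_{T \in \mathbb{R}^{n \times m}} \quad & \sum_{i=1}^{n} \sum_{j=1}^{m} C_{ij} T_{ij} \\
\text{subject to} \quad & \sum_{j=1}^{m} T_{ij} = p_i, \quad i = 1, \dots, n, \\
& \sum_{i=1}^{n} T_{ij} = q_j, \quad j = 1, \dots, m, \\
& T_{ij} \ge 0 .
\end{aligned}
```

The program has nm unknowns, so on a single domain with n = m vertices the variable count grows quadratically, which is one way to see why a generic LP solver becomes impractical on large meshes or point clouds.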

1804.04696 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

Efficient Model Identification for Tensegrity Locomotion

面向张拉整体(tensegrity)机器人运动的高效模型辨识

Shaojun Zhu, David Surovik, Kostas E. Bekris, Abdeslam Boularias

发表机构 * Department of Computer Science, Rutgers University(计算机科学系,罗格斯大学)

AI总结 本文提出一种高效方法,利用物理引擎和贝叶斯优化框架,用于识别高维顺应性tensegrity机器人中的未知机械参数,提升运动控制精度。

详情
AI中文摘要

本文旨在以实用方式识别未知物理参数,如驱动机器人连杆的机械模型,这些参数在动态机器人任务中至关重要。关键特征包括使用现成的物理引擎和贝叶斯优化框架。所考虑的任务是高维、顺应性tensegrity机器人的运动。关键见解在于将模型识别挑战投影到适当的低维空间以提高效率。与替代方法的比较表明,所提出的方法可以在给定的时间预算内更准确地识别参数,从而实现更精确的运动控制。

英文摘要

This paper aims to identify in a practical manner unknown physical parameters, such as mechanical models of actuated robot links, which are critical in dynamical robotic tasks. Key features include the use of an off-the-shelf physics engine and the Bayesian optimization framework. The task being considered is locomotion with a high-dimensional, compliant Tensegrity robot. A key insight, in this case, is the need to project the model identification challenge into an appropriate lower dimensional space for efficiency. Comparisons with alternatives indicate that the proposed method can identify the parameters more accurately within the given time budget, which also results in more precise locomotion control.

1804.03973 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Reasoning about Safety of Learning-Enabled Components in Autonomous Cyber-physical Systems

自主信息物理系统中学习使能组件的安全性推理

Cumhur Erkan Tuncali, James Kapinski, Hisahiro Ito, Jyotirmoy V. Deshmukh

发表机构 * Toyota Research Institute of North America(丰田北美研究院) University of Southern California(南加州大学)

AI总结 本文提出基于模拟的方法生成屏障证书函数,用于验证包含神经网络控制器的信息物理系统安全性。通过线性规划求解器从随机初始状态获得的模拟轨迹中找到候选生成函数,并利用SMT求解器验证屏障证书性质。

Comments Invited paper in conference: Design Automation Conference (DAC) 2018

详情
AI中文摘要

我们提出一种基于模拟的方法,用于生成用于验证包含神经网络控制器的信息物理系统(CPS)安全性的屏障证书函数。利用线性规划求解器,从通过随机选择初始状态获得的CPS模型模拟轨迹中找到候选生成函数。然后选择生成函数的一个水平集作为屏障证书,表明从给定初始状态集无法到达不安全系统状态。通过SMT求解器验证屏障证书属性。该方法在一个案例研究中进行了演示,其中自主车辆的Dubins车模型由神经网络控制以跟随给定路径。

英文摘要

We present a simulation-based approach for generating barrier certificate functions for safety verification of cyber-physical systems (CPS) that contain neural network-based controllers. A linear programming solver is utilized to find a candidate generator function from a set of simulation traces obtained by randomly selecting initial states for the CPS model. A level set of the generator function is then selected to act as a barrier certificate for the system, meaning it demonstrates that no unsafe system states are reachable from a given set of initial states. The barrier certificate properties are verified with an SMT solver. This approach is demonstrated on a case study in which a Dubins car model of an autonomous vehicle is controlled by a neural network to follow a given path.
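To make the case study above more concrete, the following is a rough sketch of how simulation traces for a Dubins car might be produced. It assumes the standard kinematic model (constant speed, steering by turn rate) integrated with forward Euler. The `proportional_controller` is a hypothetical stand-in for the neural network controller, and the gains, step size, and initial state are illustrative choices, not values from the paper.

```python
import math

def dubins_step(state, turn_rate, speed=1.0, dt=0.01):
    """One forward-Euler step of the Dubins car: x' = v cos(theta), y' = v sin(theta), theta' = u."""
    x, y, theta = state
    return (
        x + speed * math.cos(theta) * dt,
        y + speed * math.sin(theta) * dt,
        theta + turn_rate * dt,
    )

def simulate(state, controller, steps, speed=1.0, dt=0.01):
    """Roll out a closed-loop trace starting from the given initial state."""
    trace = [state]
    for _ in range(steps):
        state = dubins_step(state, controller(state), speed, dt)
        trace.append(state)
    return trace

def proportional_controller(state):
    """Hypothetical stand-in for the learned controller: steer toward the path y = 0."""
    _, y, theta = state
    return -2.0 * y - 3.0 * theta

# One trace from an assumed initial state offset 0.5 from the path.
trace = simulate((0.0, 0.5, 0.0), proportional_controller, steps=2000)
```

In a workflow like the one the abstract describes, many such traces from randomly chosen initial states would serve as the data from which a candidate function is fitted; the fitting and SMT verification steps are not shown here.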

1804.02884 2026-06-04 cs.AI cs.LG cs.MA cs.NE cs.SY eess.SY 版本更新

Policy Gradient With Value Function Approximation For Collective Multiagent Planning

面向集体多智能体规划的带价值函数近似的策略梯度

Duc Thien Nguyen, Akshat Kumar, Hoong Chuin Lau

发表机构 * School of Information Systems(信息系统学院) Singapore Management University(新加坡管理大学)

AI总结 本文提出一种改进的actor-critic方法,用于优化集体决策多智能体规划问题,通过分解近似动作价值函数提升收敛速度,并在合成任务和出租车车队优化中验证了方法的有效性。

详情
AI中文摘要

去中心化(PO)MDPs为多智能体系统序列决策提供了表达性框架。鉴于其计算复杂性,近期研究聚焦于可处理且实用的Dec-POMDP子类。我们针对此类子类CDEC-POMDP进行研究,其中智能体群体的集体行为影响联合奖励和环境动态。本文的主要贡献是一种用于优化CDEC-POMDP策略的actor-critic(AC)强化学习方法。原始(vanilla)AC在较大规模问题上收敛缓慢。为解决此问题,我们展示了将近似动作价值函数按智能体进行特定分解如何带来有效的更新,并推导出一种基于局部奖励信号训练critic的新方法。在合成基准和现实世界出租车车队优化问题上的比较表明,我们的新AC方法提供了比先前最佳方法更高质量的解决方案。

英文摘要

Decentralized (PO)MDPs provide an expressive framework for sequential decision making in a multiagent system. Given their computational complexity, recent research has focused on tractable yet practical subclasses of Dec-POMDPs. We address such a subclass called CDEC-POMDP where the collective behavior of a population of agents affects the joint-reward and environment dynamics. Our main contribution is an actor-critic (AC) reinforcement learning method for optimizing CDEC-POMDP policies. Vanilla AC has slow convergence for larger problems. To address this, we show how a particular decomposition of the approximate action-value function over agents leads to effective updates, and also derive a new way to train the critic based on local reward signals. Comparisons on a synthetic benchmark and a real-world taxi fleet optimization problem show that our new AC approach provides better quality solutions than previous best approaches.

1612.07139 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

A Survey of Deep Network Solutions for Learning Control in Robotics: From Reinforcement to Imitation

深度网络在机器人学习控制中的应用综述:从强化到模仿

Lei Tai, Jingwei Zhang, Ming Liu, Joschka Boedecker, Wolfram Burgard

发表机构 * University of Freiburg(弗赖堡大学)

AI总结 本文综述了深度学习在机器人学习控制中的应用,探讨了深度强化学习和模仿学习两大主流方法,分析了其在导航与操作任务中的应用及现实差距挑战。

Comments 19 pages, 1 figures

详情
AI中文摘要

深度学习技术已广泛应用于各种研究领域,取得了最先进的成果。本文综述了针对机器人应用的学习控制策略的深度学习解决方案。我们讨论了深度学习在学习控制中的两大主要范式:深度强化学习和模仿学习。对于深度强化学习(DRL),我们从传统强化学习算法开始,展示了如何将其扩展到深度领域,并介绍了在机器人导航和操作任务中使用 DRL 的代表性工作。我们继续讨论了解决现实差距挑战的方法,即如何将仿真中训练的 DRL 策略转移到现实世界场景,并总结了用于 DRL 研究的机器人仿真平台。对于模仿学习,我们探讨了其三个主要类别:行为克隆、逆强化学习和生成对抗模仿学习,介绍了它们的公式及其在机器人中的相应应用。最后,我们讨论了开放挑战和研究前沿。

英文摘要

Deep learning techniques have been widely applied, achieving state-of-the-art results in various fields of study. This survey focuses on deep learning solutions that target learning control policies for robotics applications. We carry out our discussions on the two main paradigms for learning control with deep networks: deep reinforcement learning and imitation learning. For deep reinforcement learning (DRL), we begin with traditional reinforcement learning algorithms, showing how they are extended to the deep context and which effective mechanisms can be added on top of the DRL algorithms. We then introduce representative works that utilize DRL to solve navigation and manipulation tasks in robotics. We continue our discussion on methods addressing the challenge of the reality gap for transferring DRL policies trained in simulation to real-world scenarios, and summarize robotics simulation platforms for conducting DRL research. For imitation learning, we go through its three main categories, behavior cloning, inverse reinforcement learning and generative adversarial imitation learning, by introducing their formulations and their corresponding robotics applications. Finally, we discuss the open challenges and research frontiers.

1712.04170 2026-06-04 cs.AI cs.NE cs.SY eess.SY 版本更新

Interpretable Policies for Reinforcement Learning by Genetic Programming

通过遗传编程实现强化学习的可解释策略

Daniel Hein, Steffen Udluft, Thomas A. Runkler

发表机构 * Technical University of Munich, Department of Informatics(慕尼黑工业大学信息学院) Siemens AG, Corporate Technology(西门子股份公司企业技术部)

AI总结 本文提出基于模型驱动批量强化学习和遗传编程的GPRL方法,通过预存的默认状态-动作轨迹样本自动生成可解释的强化学习策略,实验表明其优于传统符号回归方法。

详情
AI中文摘要

可解释强化学习策略的搜索在学术和工业领域均有重要价值。特别是对于工业系统,如果策略易于理解和评估,领域专家更可能部署自主学习的控制器。基本代数方程只要复杂度适当,就能满足这些要求。本文引入基于模型驱动批量强化学习和遗传编程的强化学习遗传编程(GPRL)方法,该方法可从预存的默认状态-动作轨迹样本中自主学习策略方程。我们将GPRL与一种直接利用遗传编程进行符号回归的方法进行比较,后者生成的策略模仿一个现有的高性能但不可解释的策略。在三个强化学习基准(山地车、车杆平衡和工业基准)上的实验表明,GPRL方法优于该符号回归方法。GPRL能够从预存的默认轨迹数据中生成高性能且可解释的强化学习策略。

英文摘要

The search for interpretable reinforcement learning policies is of high academic and industrial interest. Especially for industrial systems, domain experts are more likely to deploy autonomously learned controllers if they are understandable and convenient to evaluate. Basic algebraic equations are supposed to meet these requirements, as long as they are restricted to an adequate complexity. Here we introduce the genetic programming for reinforcement learning (GPRL) approach based on model-based batch reinforcement learning and genetic programming, which autonomously learns policy equations from pre-existing default state-action trajectory samples. GPRL is compared to a straight-forward method which utilizes genetic programming for symbolic regression, yielding policies imitating an existing well-performing, but non-interpretable policy. Experiments on three reinforcement learning benchmarks, i.e., mountain car, cart-pole balancing, and industrial benchmark, demonstrate the superiority of our GPRL approach compared to the symbolic regression method. GPRL is capable of producing well-performing interpretable reinforcement learning policies from pre-existing default trajectory data.

1803.08137 2026-06-04 cs.CV cs.AI cs.NA math.NA stat.ML 版本更新

Robust Blind Deconvolution via Mirror Descent

通过镜像下降实现鲁棒盲去卷积

Sathya N. Ravi, Ronak Mehta, Vikas Singh

AI总结 本文研究盲去卷积的鲁棒性和收敛性,提出一种具有理论保证的算法,在实践中表现优异。

详情
AI中文摘要

我们重新审视盲去卷积问题,重点在于理解其鲁棒性和收敛性属性。对噪声及其他扰动的可证明鲁棒性最近在视觉领域受到关注,其应用从获得对抗攻击的免疫性到评估和描述关键任务应用中算法的失败模式。此外,许多基于深度架构的盲去卷积方法内部使用或优化基本公式,因此更清楚地理解该子模块的行为、何时可以求解以及它可以容忍多少噪声注入是首要要求。我们推导了关于盲去卷积理论基础的新见解。由此得到的算法具有良好的收敛保证,并在我们论文中正式定义的意义上被证明是鲁棒的。有趣的是,这些技术结果在实践中表现非常出色:在标准数据集上,我们算法的结果与现有最先进方法相当或更优。关键词:盲去卷积,鲁棒连续优化

英文摘要

We revisit the Blind Deconvolution problem with a focus on understanding its robustness and convergence properties. Provable robustness to noise and other perturbations is receiving recent interest in vision, from obtaining immunity to adversarial attacks to assessing and describing failure modes of algorithms in mission critical applications. Further, many blind deconvolution methods based on deep architectures internally make use of or optimize the basic formulation, so a clearer understanding of how this sub-module behaves, when it can be solved, and what noise injection it can tolerate is a first order requirement. We derive new insights into the theoretical underpinnings of blind deconvolution. The algorithm that emerges has nice convergence guarantees and is provably robust in a sense we formalize in the paper. Interestingly, these technical results play out very well in practice, where on standard datasets our algorithm yields results competitive with or superior to the state of the art. Keywords: blind deconvolution, robust continuous optimization

1612.05971 2026-06-04 eess.SY cs.AI cs.GT cs.SY math.OC 版本更新

An Integrated Optimization + Learning Approach to Optimal Dynamic Pricing for the Retailer with Multi-type Customers in Smart Grids

在智能电网中面向多类型顾客的零售商最优动态定价集成优化与学习方法

Fanlin Meng, Xiao-Jun Zeng, Yan Zhang, Chris J. Dent, Dunwei Gong

发表机构 * School of Engineering and Computing Sciences, Durham University(工程与计算科学学院,达勒姆大学) School of Computer Science, The University of Manchester(计算机科学学院,曼彻斯特大学) College of Information System and Management, National University of Defense Technology(信息系统与管理学院,国防科技大学) School of Mathematics, University of Edinburgh(数学学院,爱丁堡大学) School of Information and Control Engineering, China University of Mining and Technology(信息与控制工程学院,中国矿业大学)

AI总结 本文针对智能电网中零售商服务三种不同类型的顾客问题,提出两级决策框架,结合优化与学习方法实现动态定价优化,通过仿真实验验证模型的有效性。

Comments 38 pages, 6 figures

详情
AI中文摘要

本文考虑智能电网中零售商服务三种不同类型的顾客的现实场景,即具有嵌入智能电表的最优家庭能源管理系统顾客(C-HEMS)、仅具有智能电表的顾客(C-SM)以及无智能电表的顾客(C-NONE)。本文的主要目标是支持零售商在混合顾客群体中做出最优的日提前动态定价决策。为此,我们提出一个两级决策框架,其中零售商作为上层代理首先宣布未来24小时的电力价格,顾客作为下层代理随后根据价格调度其能源使用。对于下层问题,我们根据不同顾客的独特特征建模其价格响应性。对于上层问题,我们优化动态价格以最大化零售商利润,同时满足现实市场约束。上述两级模型通过基于遗传算法(GA)的分布式优化方法解决,其可行性和有效性通过仿真结果得到验证。

英文摘要

In this paper, we consider a realistic and meaningful scenario in the context of smart grids where an electricity retailer serves three different types of customers, i.e., customers with an optimal home energy management system embedded in their smart meters (C-HEMS), customers with only smart meters (C-SM), and customers without smart meters (C-NONE). The main objective of this paper is to support the retailer to make optimal day-ahead dynamic pricing decisions in such a mixed customer pool. To this end, we propose a two-level decision-making framework where the retailer acting as upper-level agent firstly announces its electricity prices of next 24 hours and customers acting as lower-level agents subsequently schedule their energy usages accordingly. For the lower level problem, we model the price responsiveness of different customers according to their unique characteristics. For the upper level problem, we optimize the dynamic prices for the retailer to maximize its profit subject to realistic market constraints. The above two-level model is tackled by genetic algorithms (GA) based distributed optimization methods while its feasibility and effectiveness are confirmed via simulation results.

1703.02660 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY 版本更新

Towards Generalization and Simplicity in Continuous Control

连续控制中的泛化与简洁性

Aravind Rajeswaran, Kendall Lowrey, Emanuel Todorov, Sham Kakade

发表机构 * University of Washington(华盛顿大学)

AI总结 本文展示简单线性与RBF参数化策略可解决多种连续控制任务,性能可与更复杂网络相媲美,且多样初始化提升泛化能力。

Comments NIPS 2017, Project page: https://sites.google.com/view/simple-pol

详情
AI中文摘要

本文表明,简单线性及RBF参数化策略可训练解决多种连续控制任务,包括OpenAI Gym基准。这些策略性能可与更复杂参数化方法相媲美。现有训练测试场景受限且易过拟合,导致仅轨迹中心策略。多样初始化产生更具全局性的策略,允许系统在大扰动下恢复,如补充视频所示。

英文摘要

This work shows that policies with simple linear and RBF parameterizations can be trained to solve a variety of continuous control tasks, including the OpenAI gym benchmarks. The performance of these trained policies is competitive with state-of-the-art results obtained with more elaborate parameterizations such as fully connected neural networks. Furthermore, existing training and testing scenarios are shown to be very limited and prone to over-fitting, thus giving rise to only trajectory-centric policies. Training with a diverse initial state distribution is shown to produce more global policies with better generalization. This allows for interactive control scenarios where the system recovers from large on-line perturbations, as shown in the supplementary video.
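As a rough illustration of what "simple linear and RBF parameterizations" can look like, the sketch below builds a linear policy on top of random Fourier features, one common way to obtain RBF-style features. Whether this matches the authors' exact construction is an assumption, and the dimensions, bandwidth, and weights are placeholder values rather than trained parameters.

```python
import math
import random

def make_rbf_features(obs_dim, n_features, bandwidth, rng):
    """Return a feature map based on random Fourier features (assumed construction)."""
    proj = [[rng.gauss(0.0, 1.0) for _ in range(obs_dim)] for _ in range(n_features)]
    phase = [rng.uniform(-math.pi, math.pi) for _ in range(n_features)]

    def features(obs):
        # Each feature is a sinusoid of a random projection of the observation.
        return [
            math.sin(sum(p * o for p, o in zip(row, obs)) / bandwidth + ph)
            for row, ph in zip(proj, phase)
        ]

    return features

def linear_policy(weights, bias, inputs):
    """Deterministic linear map from inputs (raw observation or features) to an action."""
    return [sum(w * x for w, x in zip(row, inputs)) + b for row, b in zip(weights, bias)]

# Hypothetical sizes: 3-dimensional observation, 8 features, 2-dimensional action.
rng = random.Random(0)
features = make_rbf_features(obs_dim=3, n_features=8, bandwidth=1.0, rng=rng)
weights = [[rng.gauss(0.0, 0.1) for _ in range(8)] for _ in range(2)]
bias = [0.0, 0.0]

obs = [0.1, -0.2, 0.3]
feats = features(obs)
action = linear_policy(weights, bias, feats)
```

Passing the raw observation to `linear_policy` gives the purely linear variant, while passing `features(obs)` gives the RBF-style variant; in both cases only `weights` and `bias` would be trained.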

1803.06775 2026-06-04 quant-ph cs.AI cs.ET cs.SY eess.SY 版本更新

Comparing and Integrating Constraint Programming and Temporal Planning for Quantum Circuit Compilation

比较和整合约束编程与时间规划用于量子电路编译

Kyle E. C. Booth, Minh Do, J. Christopher Beck, Eleanor Rieffel, Davide Venturelli, Jeremy Frank

Affiliations: Quantum Artificial Intelligence Laboratory, NASA Ames Research Center; Planning and Scheduling Group, NASA Ames Research Center; USRA Research Institute for Advanced Computer Science; Stinger Ghaffarian Technologies, Inc.; Department of Mechanical & Industrial Engineering, University of Toronto

AI总结 本文比较了约束编程与时间规划在量子电路编译中的应用,提出混合方法提升求解质量,证明混合方法在多数问题中优于单独使用时间规划。

Comments 9 pages, 2 figures, Proceedings of the 28th International Conference of Automated Planning and Scheduling 2018 (ICAPS-18)

详情
AI中文摘要

最近,将一般量子算法编译为近期量子处理器的makespan最小化问题被引入人工智能社区。研究显示时间规划是量子电路编译(QCC)问题的一种强大方法。本文探讨了约束编程(CP)作为时间规划的替代和补充方法。我们通过引入两个新的问题变体扩展了先前工作,这些变体结合了量子计算社区识别的重要特征。我们应用时间规划和CP解决基准和扩展的QCC问题,作为单独和混合方法。我们的混合方法利用时间规划找到的解决方案预热CP,利用前者在任务选项性高的问题中找到满意解的能力,而CP通常难以处理。CP模型受益于预热提供的推断边界,从而找到更高质量的解。实证评估表明,虽然单独使用CP仅在最小问题中具有竞争力,但CP与时间规划的混合方法在多数问题类别中表现优于单独使用时间规划。

英文摘要

Recently, the makespan-minimization problem of compiling a general class of quantum algorithms into near-term quantum processors has been introduced to the AI community. The research demonstrated that temporal planning is a strong approach for a class of quantum circuit compilation (QCC) problems. In this paper, we explore the use of constraint programming (CP) as an alternative and complementary approach to temporal planning. We extend previous work by introducing two new problem variations that incorporate important characteristics identified by the quantum computing community. We apply temporal planning and CP to the baseline and extended QCC problems as both stand-alone and hybrid approaches. Our hybrid methods use solutions found by temporal planning to warm start CP, leveraging the ability of the former to find satisficing solutions to problems with a high degree of task optionality, an area that CP typically struggles with. The CP model, benefiting from inferred bounds on planning horizon length and task counts provided by the warm start, is then used to find higher quality solutions. Our empirical evaluation indicates that while stand-alone CP is only competitive for the smallest problems, CP in our hybridization with temporal planning out-performs stand-alone temporal planning in the majority of problem classes.

1707.01625 2026-06-04 eess.SY cs.AI cs.GT cs.SY 版本更新

Optimal Vehicle Dispatching Schemes via Dynamic Pricing

通过动态定价实现最优车辆调度方案

Mengjing Chen, Weiran Shen, Pingzhong Tang, Song Zuo

Affiliations: IIIS (Institute for Interdisciplinary Information Sciences), Tsinghua University

AI总结 本文通过经济方法解决网约车平台在地理和时间信息下的最优定价和车辆调度问题,提出高效算法计算最优随机定价方案,并通过实验证明其优于固定定价和涨价机制。

详情
AI中文摘要

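In recent years, ride-sharing has become an effective way to relieve traffic congestion. A key problem for these platforms is to design a revenue-optimal (or GMV-optimal) pricing scheme and the vehicle dispatching policy it induces, incorporating geographic and temporal information. This paper tackles the problem via an economic approach. Modeled naively, the underlying optimization problem may be non-convex and hard to compute. To this end, we use a so-called "ironing" technique to convert the problem into an equivalent convex optimization problem.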

英文摘要

Over the past few years, ride-sharing has emerged as an effective way to relieve traffic congestion. A key problem for these platforms is to come up with a revenue-optimal (or GMV-optimal) pricing scheme and an induced vehicle dispatching policy that incorporate geographic and temporal information. In this paper, we aim to tackle this problem via an economic approach. Modeled naively, the underlying optimization problem may be non-convex and thus hard to compute. To this end, we use a so-called "ironing" technique to convert the problem into an equivalent convex optimization one via a clean Markov decision process (MDP) formulation, where the states are the driver distributions and the decision variables are the prices for each pair of locations. Our main finding is an efficient algorithm that computes the exact revenue-optimal (or GMV-optimal) randomized pricing schemes. We characterize the optimal solution of the MDP by a primal-dual analysis of a corresponding convex program. We also conduct empirical evaluations of our solution through real data of a major ride-sharing platform and show its advantages over fixed pricing schemes as well as several prevalent surge-based pricing schemes.

1708.08113 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Novel Sensor Scheduling Scheme for Intruder Tracking in Energy Efficient Sensor Networks

新颖的传感器调度方案用于能量高效的入侵者跟踪

Raghuram Bharadwaj Diddigi, Prabuchandran K. J., Shalabh Bhatnagar

发表机构 * Department of Computer Science and Automation, Indian Institute of Science(计算机科学与自动化系,印度科学研究院)

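AI summary: Formulates energy-constrained intruder tracking in a wireless sensor network as a POMDP and develops a reinforcement learning algorithm based on the Upper Confidence Tree Search (UCT) method; simulations show that the algorithm performs and scales well as the state and action spaces grow.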

详情
AI中文摘要

我们考虑使用无线传感器网络跟踪入侵者的问题。在每个时间点,必须确定最优数量和正确配置的传感器以供电,因为供电消耗能量,因此需要在准确跟踪入侵者位置和传感器能耗之间进行权衡。该问题在部分可观测马尔可夫决策过程(POMDP)框架中进行建模。即使对于文献中的最先进算法,维度灾难使问题难以处理。本文在POMDP框架下将入侵检测(ID)问题建模为合适的状态-动作空间,并开发一种利用上置信树搜索(UCT)方法的强化学习(RL)算法来解决ID问题。通过仿真,我们证明了我们的算法在状态和动作空间增大时表现良好且可扩展。

英文摘要

We consider the problem of tracking an intruder using a network of wireless sensors. To track the intruder at each instant, the optimal number and the right configuration of sensors have to be powered. As powering the sensors consumes energy, there is a trade-off between accurately tracking the position of the intruder at each instant and the energy consumption of the sensors. This problem has been formulated in the framework of the Partially Observable Markov Decision Process (POMDP). Even for the state-of-the-art algorithm in the literature, the curse of dimensionality renders the problem intractable. In this paper, we formulate the Intrusion Detection (ID) problem with a suitable state-action space in the POMDP framework and develop a Reinforcement Learning (RL) algorithm utilizing the Upper Confidence Tree Search (UCT) method to solve the ID problem. Through simulations, we show that our algorithm performs and scales well with the increasing state and action spaces.
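
UCT-style tree search usually picks actions at each node with an upper-confidence rule. A minimal sketch of the standard UCB1 rule follows, assuming per-action visit counts and mean returns are tracked at a node. The function name `ucb1_select` and the constant `c` are illustrative; the abstract does not state which variant the authors use.

```python
import math


def ucb1_select(counts, values, c=math.sqrt(2)):
    """Return the index of the action with the largest UCB1 score.

    counts[a] is how often action a was tried at this node,
    values[a] is its mean observed return.
    """
    # Try every action once before trusting the confidence bounds.
    for action, n in enumerate(counts):
        if n == 0:
            return action
    total = sum(counts)
    scores = [
        values[a] + c * math.sqrt(math.log(total) / counts[a])
        for a in range(len(counts))
    ]
    return max(range(len(scores)), key=scores.__getitem__)
```

In a sensor-scheduling setting an action would be a sensor configuration, so a rarely tried configuration gets an exploration bonus even when its current mean return is lower.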

1802.08138 2026-06-04 cs.AI cs.GT cs.SY eess.SY 版本更新

Reliable Intersection Control in Non-cooperative Environments

非合作环境中的可靠交叉口控制

Muhammed O. Sayin, Chung-Wei Lin, Shinichi Shiraishi, Tamer Başar

发表机构 * University of Illinois at Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) Toyota InfoTechnology Center(丰田信息技术中心)

AI总结 本文提出一种可靠交叉口控制机制,用于非合作环境中的战略自主和联网车辆。通过分析车辆的战略行为,确定纳什均衡,并识别社会最优均衡以实现公平分配。

Comments Extended version (including proofs of theorems and lemmas) of the paper: M. O. Sayin, C.-W. Lin, S. Shiraishi, and T. Basar, "Reliable intersection control in non-cooperative environments", to appear in the Proceedings of American Control Conference, 2018

详情
AI中文摘要

我们提出了一种可靠的交叉口控制机制,用于战略自主和联网车辆(智能体)在非合作环境中。每个智能体可以获取其最早可能和期望的通过时间,并向交叉口管理者报告通过时间,管理者按先到先得的原则分配交叉口时间。然而,智能体可能有冲突利益并采取策略性行为。为此,我们分析智能体的战略行为,并为所有可能场景制定纳什均衡。此外,在所有纳什均衡中,我们识别出一个社会最优均衡,以实现公平的交叉口分配,并相应地描述一种策略证明的交叉口机制,该机制实现了可靠的交叉口控制,使得策略性智能体没有动机策略性地报告他们的通过时间。

英文摘要

We propose a reliable intersection control mechanism for strategic autonomous and connected vehicles (agents) in non-cooperative environments. Each agent has access to his/her earliest possible and desired passing times, and reports a passing time to the intersection manager, who allocates the intersection temporally to the agents on a first-come-first-served basis. However, the agents might have conflicting interests and can take actions strategically. To this end, we analyze the strategic behaviors of the agents and formulate Nash equilibria for all possible scenarios. Furthermore, among all Nash equilibria we identify a socially optimal equilibrium that leads to a fair intersection allocation, and correspondingly we describe a strategy-proof intersection mechanism, which achieves reliable intersection control such that the strategic agents do not have any incentive to misreport their passing times.
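
The first-come-first-served allocation step can be sketched as follows. It assumes every vehicle occupies the intersection for the same fixed `crossing_time` and that ties keep the order in which reports were received. Both are simplifying assumptions not stated in the abstract, and the sketch covers only the allocation rule, not the strategy-proof mechanism.

```python
def allocate_fcfs(reports, crossing_time=1.0):
    """Assign each agent a start time, serving agents in order of reported passing time.

    reports maps an agent id to its reported passing time.
    """
    schedule = {}
    free_at = 0.0  # earliest time at which the intersection is free again
    # sorted() is stable, so equal reports keep their insertion order.
    for agent, reported in sorted(reports.items(), key=lambda kv: kv[1]):
        start = max(reported, free_at)
        schedule[agent] = start
        free_at = start + crossing_time
    return schedule


schedule = allocate_fcfs({"a": 0.0, "b": 0.5, "c": 3.0})
```

Under this rule an earlier report can only move an agent forward in the queue, which is the kind of incentive to misreport that the paper's equilibrium analysis addresses.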

1802.06314 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Autonomous Vehicle Speed Control for Safe Navigation of Occluded Pedestrian Crosswalk

自动驾驶车辆速度控制:安全通过遮挡人行横道

Sarah Thornton

发表机构 * Dynamic Design Lab(动态设计实验室)

AI总结 本文提出基于部分可观测马尔可夫决策过程的速度控制方法,用于安全通过遮挡人行横道,通过动态规划计算控制策略以应对感知限制。

Comments 6 pages, 9 figures

详情
AI中文摘要

人类和自动驾驶车辆传感器的感知能力有限。当这些限制与涉及易受伤害道路使用者的场景重合时,必须在运动规划器中考虑这些限制。在遮挡人行横道的场景中,接近车辆的速度应是道路上不确定性量的函数。在本工作中,纵向控制器被建模为部分可观测马尔可夫决策过程,并使用动态规划计算控制策略。该控制策略将速度剖面传递给模型预测转向控制器。

英文摘要

Both humans and the sensors on an autonomous vehicle have limited sensing capabilities. When these limitations coincide with scenarios involving vulnerable road users, it becomes important to account for these limitations in the motion planner. For the scenario of an occluded pedestrian crosswalk, the speed of the approaching vehicle should be a function of the amount of uncertainty on the roadway. In this work, the longitudinal controller is formulated as a partially observable Markov decision process and dynamic programming is used to compute the control policy. The control policy scales the speed profile to be used by a model predictive steering controller.

1705.07262 2026-06-04 cs.LG cs.AI cs.NE cs.SY eess.SY 版本更新

Batch Reinforcement Learning on the Industrial Benchmark: First Experiences

批量强化学习在工业基准上的应用:初步经验

Daniel Hein, Steffen Udluft, Michel Tokic, Alexander Hentschel, Thomas A. Runkler, Volkmar Sterzing

发表机构 * Technical University of Munich, Department of Informatics(慕尼黑技术大学信息学院) Siemens AG, Corporate Technology(西门子股份公司企业技术部)

AI总结 本文研究了粒子群优化策略在工业基准上的表现,展示了其在真实应用场景中的有效性,相比传统方法,PSO-P在性能和鲁棒性上表现突出。

详情
Journal ref
2017 International Joint Conference on Neural Networks (IJCNN), Anchorage, AK, 2017, pp. 4214-4221
AI中文摘要

粒子群优化策略(PSO-P)近期被引入并证明在与学术强化学习基准的非策略、批量设置中产生了显著成果。为进一步研究其在真实应用中的性质和可行性,本文在所谓的工业基准(IB)上研究PSO-P,这是一个旨在通过包含工业应用中发现的各种方面(如连续状态和动作空间、高维部分可观测状态空间、延迟效应和复杂随机性)而变得真实的新强化学习(RL)基准。PSO-P在IB上的实验结果与基于模型的递归控制神经网络(RCNN)和基于模型的神经拟合Q迭代(NFQ)推导出的闭式控制策略的结果进行比较。实验表明,PSO-P不仅对学术基准感兴趣,也对真实世界工业应用感兴趣,因为它在我们的IB设置中也产生了最佳表现的策略。与其它已建立的RL技术相比,PSO-P在性能和鲁棒性上表现出色,仅需相对较低的努力来找到合适的参数或做出复杂的设计决策。

英文摘要

The Particle Swarm Optimization Policy (PSO-P) has been recently introduced and proven to produce remarkable results when interacting with academic reinforcement learning benchmarks in an off-policy, batch-based setting. To further investigate its properties and its feasibility for real-world applications, this paper investigates PSO-P on the so-called Industrial Benchmark (IB), a novel reinforcement learning (RL) benchmark that aims at being realistic by including a variety of aspects found in industrial applications, like continuous state and action spaces, a high-dimensional, partially observable state space, delayed effects, and complex stochasticity. The experimental results of PSO-P on IB are compared to results of closed-form control policies derived from the model-based Recurrent Control Neural Network (RCNN) and the model-free Neural Fitted Q-Iteration (NFQ). Experiments show that PSO-P is not only of interest for academic benchmarks, but also for real-world industrial applications, since it also yielded the best performing policy in our IB setting. Compared to other well-established RL techniques, PSO-P produced outstanding results in performance and robustness, requiring only a relatively low amount of effort in finding adequate parameters or making complex design decisions.

1607.07942 2026-06-04 cs.AI cs.IT cs.SY eess.SY math.IT 版本更新

Multiple scan data association by convex variational inference

通过凸变分推断实现多扫描数据关联

Jason L. Williams, Roslyn A. Lau

发表机构 * Defence Science and Technology Group, Australia(澳大利亚国防科学与技术集团) National Security, Intelligence, Surveillance and Reconnaissance Division(国家安全、情报、监视与侦察部门) Queensland University of Technology, Australia(澳大利亚昆士兰理工大学) Maritime Division, Defence Science and Technology Group, Australia(澳大利亚国防科学与技术集团的海军部门)

AI总结 本文研究多扫描数据关联问题,提出基于分数自由能的凸优化方法,改进了传统信念传播算法,提升目标跟踪精度。

详情
AI中文摘要

数据关联,即对目标与测量之间的对应关系进行推理,是目标跟踪中的基础问题。最近,信念传播(BP)作为一种估计测量与目标关联的边缘概率的有希望方法出现,提供了快速且准确的估计。BP在特定形式中的出色表现可能归因于其隐含优化的底层自由能的凸性。本文研究多扫描数据关联问题,即对目标与多个测量集之间的对应关系进行推理的问题,这可能对应于不同的传感器或不同的时间步。我们发现单扫描BP形式的多扫描扩展是非凸的,并展示了由此产生的不良行为。使用最近提出的分数自由能(FFE)构建了凸自由能。为单扫描FFE提供了一个收敛的、类似BP的算法,并用于通过对偶坐标上升优化多扫描自由能。最后,基于联合概率数据关联(JPDA)的变分解释,我们开发了一个类似于JPDA的序列变体算法,但保留了来自先前扫描的一致性约束。所提出方法的性能在仅靠方位角的目标定位问题上得到验证。

英文摘要

Data association, the reasoning over correspondence between targets and measurements, is a problem of fundamental importance in target tracking. Recently, belief propagation (BP) has emerged as a promising method for estimating the marginal probabilities of measurement to target association, providing fast, accurate estimates. The excellent performance of BP in the particular formulation used may be attributed to the convexity of the underlying free energy which it implicitly optimises. This paper studies multiple scan data association problems, i.e., problems that reason over correspondence between targets and several sets of measurements, which may correspond to different sensors or different time steps. We find that the multiple scan extension of the single scan BP formulation is non-convex and demonstrate the undesirable behaviour that can result. A convex free energy is constructed using the recently proposed fractional free energy (FFE). A convergent, BP-like algorithm is provided for the single scan FFE, and employed in optimising the multiple scan free energy using primal-dual coordinate ascent. Finally, based on a variational interpretation of joint probabilistic data association (JPDA), we develop a sequential variant of the algorithm that is similar to JPDA, but retains consistency constraints from prior scans. The performance of the proposed methods is demonstrated on a bearings only target localisation problem.

1801.07229 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Combinatorial framework for planning in geological exploration

地质勘探规划的组合框架

Mark Sh. Levin

AI总结 本文提出了一种用于油气田地质勘探规划的组合框架,通过多属性评估、层次化设计和区域整合,优化勘探方案。

Comments 14 pages, 15 figures, 11 tables

详情
AI中文摘要

本文描述了一种用于油气田地质勘探规划的组合框架。该框架包括构建四层树状模型、生成局部设计替代方案、多属性评估、层次化设计、区域整合以及计划聚合。第二至第五阶段基于层次化多属性形态学设计方法,第六阶段基于检测替代方案的'核心'并扩展其元素。替代方案的评估基于专家判断,并通过亚姆拉半岛的数值示例进行了说明。

英文摘要

The paper describes a combinatorial framework for planning geological exploration of oil-gas fields. The suggested scheme of geological exploration involves the following stages: (1) building a special 4-layer tree-like model (layers of geological exploration): productive layer, group of productive layers, oil-gas field, oil-gas region (or group of fields); (2) generation of local design (exploration) alternatives for each low-layer geological object: conservation, additional search, independent utilization, joint utilization; (3) multicriteria (i.e., multi-attribute) assessment of the design (exploration) alternatives and their interrelation (compatibility), and mapping of the obtained vector estimates onto an integrated ordinal scale; (4) hierarchical ('bottom-up') design of composite exploration plans for each oil-gas field; (5) integration of the plans into region plans; and (6) aggregation of the region plans into a general exploration plan. Stages 2, 3, 4, and 5 are based on the hierarchical multicriteria morphological design (HMMD) method (assessment of ranking of alternatives, selection and composition of alternatives into composite alternatives). The composition problem is based on the morphological clique model. Aggregation of the obtained modular alternatives (stage 6) is based on detecting a 'kernel' of the alternatives and extending it by adding elements (multiple choice model). The use of multiset estimates for alternatives is also described. The alternative estimates are based on expert judgment. The suggested combinatorial planning methodology is illustrated by numerical examples for geological exploration of the Yamal peninsula.

1801.00048 2026-06-04 eess.SY cs.AI cs.SY q-bio.NC 版本更新

Characterizing optimal hierarchical policy inference on graphs via non-equilibrium thermodynamics

通过非平衡热力学刻画图上最优分层策略推断

Daniel McNamee

发表机构 * Computational and Biological Learning Lab, University of Cambridge(计算与生物学习实验室,剑桥大学)

AI总结 本文提出一种基于图的非平衡热力学方法,用于构建和推断最优分层策略,解决状态空间在不同空间分辨率下的层次结构构建问题。

Comments NIPS 2017 Workshop on Hierarchical Reinforcement Learning. 8 pages, 1 figure

详情
AI中文摘要

层次结构在随机最优控制和生物控制中具有根本性意义,因其能促进控制算法中的多种有利计算特性,并可能成为传感器运动和认知控制系统的核心原理。然而,理论上尚未明确构建所有空间分辨率下的状态空间层次结构及其通过策略推断过程演变的方法。本文在图的背景下引入了一种形式化方法,用于推导离散马尔可夫决策过程的规范表示。所得到的层次结构对应于一种分层策略推断算法,该算法近似了由先验和最优策略生成的状态空间轨迹密度之间的离散梯度流。

英文摘要

Hierarchies are of fundamental interest in both stochastic optimal control and biological control due to their facilitation of a range of desirable computational traits in a control algorithm and the possibility that they may form a core principle of sensorimotor and cognitive control systems. However, a theoretically justified construction of state-space hierarchies over all spatial resolutions and their evolution through a policy inference process remains elusive. Here, a formalism for deriving such normative representations of discrete Markov decision processes is introduced in the context of graphs. The resulting hierarchies correspond to a hierarchical policy inference algorithm approximating a discrete gradient flow between state-space trajectory densities generated by the prior and optimal policies.

1712.09356 2026-06-04 cs.AI cs.SY eess.SY 版本更新

An Online Ride-Sharing Path Planning Strategy for Public Vehicle Systems

面向公共交通系统的在线拼车路径规划策略

Ming Zhu, Xiao-Yang Liu, Xiaodong Wang

发表机构 * Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences(深圳先进技术研究院,中国科学院)

AI总结 本文提出一种高效的在线拼车路径规划策略,通过过滤不符合乘客服务质量的请求,将全局搜索转化为局部搜索,从而降低计算复杂度,实验表明计算时间比穷举法减少22%。

Comments 12 pages

详情
AI中文摘要

作为高效的交通管理平台,公共交通系统(PV系统)被设想为未来智能城市解决交通拥堵和污染的有希望方法。PV系统提供在线/动态的点对点拼车服务,旨在以最少的车辆数和最低成本服务足够数量的乘客。PV系统的关键组成部分是在线拼车调度策略。本文提出了一种高效的路径规划策略,通过为每辆车限定潜在搜索区域并过滤掉违反乘客服务质量级别的请求,将全局搜索减少到局部搜索。我们分析了所提出解决方案的性能,如计算复杂度的降低比例。基于曼哈顿出租车数据集的仿真显示,在相同的服务质量性能下,计算时间比穷举搜索方法减少了22%。

英文摘要

As efficient traffic-management platforms, public vehicle (PV) systems are envisioned to be a promising approach to reducing traffic congestion and pollution in future smart cities. PV systems provide online/dynamic peer-to-peer ride-sharing services with the goal of serving a sufficient number of customers with the minimum number of vehicles and the lowest possible cost. A key component of the PV system is the online ride-sharing scheduling strategy. In this paper, we propose an efficient path planning strategy that focuses on a limited potential search area for each vehicle by filtering out the requests that violate the passenger service quality level, so that the global search is reduced to a local search. We analyze the performance of the proposed solution, such as the reduction ratio of computational complexity. Simulations based on the Manhattan taxi data set show that the computing time is reduced by 22% compared with the exhaustive search method under the same service quality performance.
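
The filtering idea can be illustrated with a small sketch. It assumes service quality is expressed as a maximum acceptable pickup wait, that travel time is Manhattan distance divided by a constant speed, and that requests are simple records. The `Request` type, `max_wait`, and `feasible_requests` are hypothetical names, not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Request:
    rider: str
    origin: tuple      # (x, y) pickup location on a grid
    max_wait: float    # longest acceptable pickup delay (assumed quality measure)


def manhattan(a, b):
    """Grid distance between two (x, y) points."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def feasible_requests(vehicle_pos, requests, speed=1.0):
    """Keep only requests this vehicle could pick up within the rider's wait limit."""
    return [
        r for r in requests
        if manhattan(vehicle_pos, r.origin) / speed <= r.max_wait
    ]


nearby = feasible_requests(
    (0, 0),
    [Request("a", (1, 1), 3.0), Request("b", (5, 5), 3.0)],
)
```

Each vehicle then plans only over its own filtered list, which is how a global search over all requests becomes a local one.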

1705.08927 2026-06-04 quant-ph cs.AI cs.ET cs.SY eess.SY 版本更新

Compiling quantum circuits to realistic hardware architectures using temporal planners

利用时间规划器将量子电路编译到现实硬件架构

Davide Venturelli, Minh Do, Eleanor Rieffel, Jeremy Frank

Affiliations: NASA Ames Research Center, Quantum Artificial Intelligence Laboratory; USRA Research Institute for Advanced Computer Science (RIACS); Stinger Ghaffarian Technologies (SGT Inc.); NASA Ames Research Center, Planning and Scheduling Group

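AI summary: Investigates temporal planning for compiling quantum circuits to emerging quantum hardware, focusing on superconducting architectures with nearest-neighbor constraints; experiments on QAOA circuits indicate that temporal planning is a viable approach to circuit compilation.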

Comments updated manuscript, more planners and results

详情
Journal ref
2017 Quantum Sci. Technol. - also related to proceedings of IJCAI 2017, and ICAPS SPARK Workshop 2017
AI中文摘要

为了在新兴门模型量子硬件上运行量子算法,量子电路必须被编译以考虑硬件的限制。对于近期硬件,由于只能有限地缓解退相干,最小化电路持续时间至关重要。我们研究了将时间规划器应用于量子电路编译到新兴量子硬件的问题。虽然我们的方法是通用的,但我们专注于编译到具有最近邻约束的超导硬件架构。我们的初步实验集中在编译具有高数量交换门的量子交替算子范式(QAOA)电路,这些交换门允许在应用门的顺序上具有极大的灵活性。这种自由度使找到最优编译更具挑战性,但也意味着更优化的编译可能带来更大的收益。我们将这个量子电路编译问题映射到时间规划问题,并为不同大小的QAOA电路生成了一个测试集,以现实硬件架构为目标。我们报告了几个最先进的时间规划器在该测试集上的编译结果。这项早期的实证评估表明,时间规划是量子电路编译的一种可行方法。

英文摘要

To run quantum algorithms on emerging gate-model quantum hardware, quantum circuits must be compiled to take into account constraints on the hardware. For near-term hardware, with only limited means to mitigate decoherence, it is critical to minimize the duration of the circuit. We investigate the application of temporal planners to the problem of compiling quantum circuits to newly emerging quantum hardware. While our approach is general, we focus on compiling to superconducting hardware architectures with nearest-neighbor constraints. Our initial experiments focus on compiling Quantum Alternating Operator Ansatz (QAOA) circuits, whose high number of commuting gates allows great flexibility in the order in which the gates can be applied. That freedom makes it more challenging to find optimal compilations but also means there is a greater potential win from more optimized compilation than for less flexible circuits. We map this quantum circuit compilation problem to a temporal planning problem, and generate a test suite of compilation problems for QAOA circuits of various sizes targeting a realistic hardware architecture. We report compilation results from several state-of-the-art temporal planners on this test set. This early empirical evaluation demonstrates that temporal planning is a viable approach to quantum circuit compilation.

1610.06781 2026-06-04 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY 版本更新

Modular Deep Q Networks for Sim-to-real Transfer of Visuo-motor Policies

模块化深度Q网络用于视觉-运动策略的仿真到现实迁移

Fangyi Zhang, Jürgen Leitner, Michael Milford, Peter Corke

发表机构 * Australian Centre for Robotic Vision (ACRV)(澳大利亚机器人视觉中心) Queensland University of Technology (QUT)(昆士兰理工大学)

AI总结 本文提出模块化深度强化学习方法,通过在感知与控制之间引入瓶颈,实现仿真到现实的迁移,提升机器人视觉-运动协调能力。

Comments Australasian Conference on Robotics and Automation (ACRA) 2017, Student Paper Award Finalist

详情
Journal ref
The proceedings of the Australasian Conference on Robotics and Automation (ACRA) 2017
AI中文摘要

尽管深度学习在计算机视觉中因大量视觉数据而取得显著成功,但为机器人学习收集足够大的现实世界数据集成本较高。为提高这些技术在真实机器人上的实用性,我们提出了一种模块化深度强化学习方法,能够将仿真训练的模型迁移到现实世界机器人任务中。我们引入了感知与控制之间的瓶颈,使网络能够独立训练,然后在端到端方式下合并和微调,以进一步提高视觉-运动协调性。在经典的平面视觉引导机器人抓取任务中,微调后的准确度达到1.6像素,显著优于直接迁移(17.5像素),显示出在更复杂和广泛的应用中的潜力。我们的方法提供了一种更高效学习和迁移视觉-运动策略的技术,无需完全依赖大规模现实世界机器人数据集。

英文摘要

While deep learning has had significant successes in computer vision thanks to the abundance of visual data, collecting sufficiently large real-world datasets for robot learning can be costly. To increase the practicality of these techniques on real robots, we propose a modular deep reinforcement learning method capable of transferring models trained in simulation to a real-world robotic task. We introduce a bottleneck between perception and control, enabling the networks to be trained independently, but then merged and fine-tuned in an end-to-end manner to further improve hand-eye coordination. On a canonical, planar, visually guided robot reaching task, a fine-tuned accuracy of 1.6 pixels is achieved, a significant improvement over naive transfer (17.5 pixels), showing the potential for more complicated and broader applications. Our method provides a technique for more efficient learning and transfer of visuo-motor policies for real robotic systems without relying entirely on large real-world robot datasets.

1712.06577 2026-06-04 cs.LG cs.AI cs.NA math.NA 版本更新

Parallel Complexity of Forward and Backward Propagation

前向和反向传播的并行复杂度

Maxim Naumov

发表机构 * NVIDIA

AI总结 研究前向和反向传播作为三角方程组解的并行计算复杂度,提出直接和迭代并行算法,并展示FNN和RNN的反向传播可并行处理。

Comments 18 pages

详情
AI中文摘要

我们证明前向和反向传播可以表示为下三角和上三角方程组的解。对于标准前馈网络和循环神经网络,三角方程组总是块双对角线,而对于一般计算图,它们可能具有更复杂的三角稀疏模式。我们讨论了可以直接和迭代并行求解的算法,并将其解释为不同的模型并行方法。此外,我们展示了对于具有k层和τ时间步的FNN和RNN,反向传播可以在O(log k)和O(log k log τ)步内并行执行。最后,我们概述了使用雅可比矩阵扩展此技术的可能性,以处理任意层。

英文摘要

We show that the forward and backward propagation can be formulated as a solution of lower and upper triangular systems of equations. For standard feedforward (FNNs) and recurrent neural networks (RNNs) the triangular systems are always block bi-diagonal, while for a general computation graph (directed acyclic graph) they can have a more complex triangular sparsity pattern. We discuss direct and iterative parallel algorithms that can be used for their solution and interpreted as different ways of performing model parallelism. Also, we show that for FNNs and RNNs with $k$ layers and $\tau$ time steps the backward propagation can be performed in parallel in $O(\log k)$ and $O(\log k \log \tau)$ steps, respectively. Finally, we outline the generalization of this technique using Jacobians that potentially allows us to handle arbitrary layers.
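
The triangular-system view of backward propagation can be checked numerically on a toy network. The sketch below assumes a small tanh network whose layers all have the same width and builds the upper block bi-diagonal system densely. The function names are illustrative, and the dense `np.linalg.solve` call only verifies the formulation; it is not one of the parallel solvers the paper discusses.

```python
import numpy as np


def layer_jacobians(weights, x):
    """Forward pass of a tanh network; returns each layer's Jacobian d a_i / d a_{i-1}."""
    jacobians = []
    a = x
    for W in weights:
        a = np.tanh(W @ a)
        jacobians.append((1.0 - a**2)[:, None] * W)  # diag(1 - a^2) @ W
    return jacobians


def backprop_sequential(jacobians, g_out):
    """Usual recursion g_{i-1} = J_i^T g_i, returned in the order g_0, ..., g_k."""
    grads = [g_out]
    for J in reversed(jacobians):
        grads.append(J.T @ grads[-1])
    return grads[::-1]


def backprop_triangular(jacobians, g_out):
    """Same gradients as the solution of an upper block bi-diagonal linear system."""
    k, n = len(jacobians), g_out.size
    A = np.eye((k + 1) * n)
    for i, J in enumerate(jacobians):
        # Block row i encodes g_i - J_{i+1}^T g_{i+1} = 0.
        A[i * n:(i + 1) * n, (i + 1) * n:(i + 2) * n] = -J.T
    rhs = np.zeros((k + 1) * n)
    rhs[k * n:] = g_out  # last block row encodes g_k = g_out
    return np.linalg.solve(A, rhs).reshape(k + 1, n)
```

Because the matrix is unit upper triangular it is always invertible, and its block bi-diagonal structure is what the direct and iterative parallel algorithms exploit.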

1712.04612 2026-06-04 q-fin.CP cs.AI cs.CE cs.LG cs.SY eess.SY 版本更新

Inverse Reinforcement Learning for Marketing

营销中的逆强化学习

Igor Halperin

发表机构 * NYU Tandon School of Engineering(纽约大学坦顿工程学院)

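AI summary: Proposes inverse reinforcement learning for studying dynamic consumer demand, using a maximum-entropy formulation that yields a highly tractable model, and shows that observational noise can be mistaken for consumer heterogeneity.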

Comments 18 pages, 5 figures

详情
AI中文摘要

从观察行为中学习顾客偏好是营销文献中的重要课题。结构模型通常将前瞻性顾客或企业建模为效用最大化代理,其效用通过随机最优控制方法估计。本文提出基于逆强化学习(IRL)的替代方法研究动态消费者需求。我们开发了一种最大熵IRL的变种,导致高度可 tractable 的模型公式,最终转化为低维凸优化以寻找最优模型参数。通过消费者需求的模拟,我们显示相同顾客的观测噪声可以轻易被误认为显而易见的消费者异质性。

英文摘要

Learning customer preferences from observed behaviour is an important topic in the marketing literature. Structural models typically model forward-looking customers or firms as utility-maximizing agents whose utility is estimated using methods of Stochastic Optimal Control. We suggest an alternative approach to studying dynamic consumer demand, based on Inverse Reinforcement Learning (IRL). We develop a version of Maximum Entropy IRL that leads to a highly tractable model formulation, which amounts to low-dimensional convex optimization in the search for optimal model parameters. Using simulations of consumer demand, we show that observational noise for identical customers can be easily confused with apparent consumer heterogeneity.

1712.00634 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY math.OC 版本更新

PFAx: Predictable Feature Analysis to Perform Control

PFAx:可预测特征分析用于控制

Stefan Richthofer, Laurenz Wiskott

AI总结 PFAx通过整合补充信息提升预测性能,并透明展示补充信息对特征选择的影响,应用于强化学习环境中的智能体控制优化。

详情
AI中文摘要

可预测特征分析(PFA)(Richthofer, Wiskott, ICMLA 2015)是一种对高维输入信号进行降维的算法,提取最可预测的子信号。本文扩展了PFA,考虑补充信息以提高预测。补充信息不参与特征提取,特征仅从主输入中提取。PFAx透明地展示补充信息如何提升预测质量,并可生成补充信息以实现主信号的特定目标。该方法应用于强化学习环境,使智能体局部优化状态,接近目标。后续论文将扩展此方法以实现全局优化。

英文摘要

Predictable Feature Analysis (PFA) (Richthofer, Wiskott, ICMLA 2015) is an algorithm that performs dimensionality reduction on a high-dimensional input signal. It extracts those subsignals that are most predictable according to a certain prediction model. We refer to these extracted signals as predictable features. In this work we extend the notion of PFA to take supplementary information into account for improving its predictions. Such information can be a multidimensional signal like the main input to PFA, but is regarded as external. That means it does not participate in the feature extraction: no features are extracted from it or composed of it. Features are extracted exclusively from the main input such that they are most predictable based on themselves and the supplementary information. We refer to this enhanced PFA as PFAx (PFA extended). Even more important than improving prediction quality is observing the effect of supplementary information on feature selection. PFAx transparently provides insight into how the supplementary information adds to prediction quality and whether it is valuable at all. Finally, we show how to invert that relation and generate the supplementary information such that it yields a certain desired outcome of the main signal. We apply this to a setting inspired by reinforcement learning and let the algorithm learn how to control an agent in an environment. With this method it is feasible to locally optimize the agent's state, i.e., to reach a certain goal that is near enough. We are preparing a follow-up paper that extends this method such that global optimization is also feasible.

1711.10566 2026-06-04 cs.AI cs.LG cs.NA math.AP math.NA stat.ML 版本更新

Physics Informed Deep Learning (Part II): Data-driven Discovery of Nonlinear Partial Differential Equations

物理指导深度学习(第二部分):数据驱动发现非线性偏微分方程

Maziar Raissi, Paris Perdikaris, George Em Karniadakis

发表机构 * Division of Applied Mathematics, Brown University(应用数学系,布朗大学)

AI总结 本文提出物理指导神经网络,用于在尊重物理定律的前提下解决监督学习任务。第二部分聚焦于数据驱动发现偏微分方程的问题,区分了连续时间和离散时间模型,并通过数学物理中的多个基准问题验证了方法的有效性。

详情
AI中文摘要

我们介绍了一种物理指导的神经网络——一种在解决监督学习任务时尊重由一般非线性偏微分方程描述的物理定律的神经网络。在本文第二部分中,我们专注于偏微分方程的数据驱动发现问题。根据可用数据在时空中的分布是散乱还是固定时间快照,我们引入了两种主要算法类别,即连续时间和离散时间模型。通过数学物理中的广泛基准问题,包括守恒定律、不可压缩流体流动和非线性浅水波传播,展示了我们方法的有效性。

英文摘要

We introduce physics informed neural networks -- neural networks that are trained to solve supervised learning tasks while respecting any given law of physics described by general nonlinear partial differential equations. In this second part of our two-part treatise, we focus on the problem of data-driven discovery of partial differential equations. Depending on whether the available data is scattered in space-time or arranged in fixed temporal snapshots, we introduce two main classes of algorithms, namely continuous time and discrete time models. The effectiveness of our approach is demonstrated using a wide range of benchmark problems in mathematical physics, including conservation laws, incompressible fluid flow, and the propagation of nonlinear shallow-water waves.

1711.10561 2026-06-04 cs.AI cs.LG cs.NA math.DS math.NA stat.ML 版本更新

Physics Informed Deep Learning (Part I): Data-driven Solutions of Nonlinear Partial Differential Equations

物理引导的深度学习(第一部分):非线性偏微分方程的数据驱动求解

Maziar Raissi, Paris Perdikaris, George Em Karniadakis

发表机构 * Division of Applied Mathematics, Brown University(应用数学系,布朗大学)

AI总结 本文提出物理引导的神经网络,用于在满足物理定律的前提下解决监督学习问题。第一部分介绍了如何利用这些网络推断偏微分方程的解,并构建可微的物理引导替代模型。

详情
AI中文摘要

我们引入了物理引导的神经网络——一种在解决监督学习任务时尊重由一般非线性偏微分方程描述的物理定律的神经网络。在本两部分论述中,我们围绕解决两类主要问题展开:数据驱动求解和数据驱动发现偏微分方程。根据可用数据的性质和安排,我们设计了两种不同的算法类别,即连续时间和离散时间模型。所得到的神经网络形成了一种新的数据高效通用函数逼近器类别,能够自然地将任何底层物理定律作为先验信息编码。在本第一部分中,我们展示了这些网络如何用于推断偏微分方程的解,并获得完全可微的物理引导替代模型,该模型对所有输入坐标和自由参数均可微分。

英文摘要

We introduce physics informed neural networks -- neural networks that are trained to solve supervised learning tasks while respecting any given law of physics described by general nonlinear partial differential equations. In this two part treatise, we present our developments in the context of solving two main classes of problems: data-driven solution and data-driven discovery of partial differential equations. Depending on the nature and arrangement of the available data, we devise two distinct classes of algorithms, namely continuous time and discrete time models. The resulting neural networks form a new class of data-efficient universal function approximators that naturally encode any underlying physical laws as prior information. In this first part, we demonstrate how these networks can be used to infer solutions to partial differential equations, and obtain physics-informed surrogate models that are fully differentiable with respect to all input coordinates and free parameters.
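
As a hedged illustration of how a physical law can enter training, consider a PDE of the form $u_t + \mathcal{N}[u] = 0$ with a network approximation $u_\theta(t, x)$. The notation below ($u_\theta$, $f_\theta$, the point counts $N_u$ and $N_f$) is assumed for this sketch and does not appear in the abstract. One common way to write the training objective is a data-misfit term plus a PDE-residual term:

$$
f_\theta(t, x) := \partial_t u_\theta(t, x) + \mathcal{N}[u_\theta](t, x),
\qquad
\mathcal{L}(\theta) = \frac{1}{N_u} \sum_{i=1}^{N_u} \left| u_\theta(t_u^i, x_u^i) - u^i \right|^2
+ \frac{1}{N_f} \sum_{j=1}^{N_f} \left| f_\theta(t_f^j, x_f^j) \right|^2 .
$$

The first sum fits the $N_u$ observed values $u^i$ (for example, initial and boundary data), and the second penalizes violation of the PDE at $N_f$ collocation points, with the derivatives of $u_\theta$ obtained by automatic differentiation.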

1711.08512 2026-06-04 eess.SY cs.AI cs.SY 版本更新

A Study on Modeling of Inputting Electrical Power of Ultra High Power Electric Furnace by using Fuzzy Rule and Regression Model

基于模糊规则和回归模型的超高压电炉输入电力建模研究

Choe Un-Chol, Yun Kum-Il, Kwak Son-Il

发表机构 * Faculty of Electronics & Automation, Kim Il Sung University(电子自动化学院,金日成大学)

AI总结 本文提出利用模糊规则和回归模型建立影响高超功率电炉熔炼过程的电力输入模型,并通过仿真实验验证其有效性。

Comments 8 pages, 3 figures, 1 table

详情
AI中文摘要

本文提出了一种方法,该方法通过模糊规则和回归模型来建立影响高超功率(UHP)电炉熔炼过程的电力输入模型,并通过仿真实验验证了该方法的有效性。

英文摘要

In this paper, we propose a method that uses fuzzy rules and a regression model to model the input electrical power of an ultra-high-power (UHP) electric furnace as a function of the factors that affect the melting process, and we verify its effectiveness in a simulation experiment.

1704.04058 2026-06-04 math.OC cs.AI cs.NA math.FA math.NA 版本更新

Solving ill-posed inverse problems using iterative deep neural networks

使用迭代深度神经网络求解病态反问题

Jonas Adler, Ozan Öktem

AI总结 本文提出了一种部分学习方法,利用深度学习和经典正则化理论解决非线性反问题,通过卷积网络学习梯度组件,提升重建速度和PSNR性能。

详情
Journal ref
Inverse Problems 2017
AI中文摘要

我们提出了一种部分学习方法,用于求解前向算子不一定为线性的病态反问题。该方法结合经典正则化理论和深度学习的最新进展,在学习过程中利用编码于前向算子、噪声模型和正则化泛函中的先验信息。结果是一种类梯度迭代方案,其中“梯度”分量由卷积网络学习,网络在每次迭代中以数据差异项和正则化项的梯度作为输入。我们在非线性断层成像反演问题上测试了该方法,使用Shepp-Logan体模和头部CT的模拟数据,并与FBP和TV重建进行比较:所提方法相比TV重建PSNR提升5.4 dB,且速度显著更快,使用单个GPU约0.4秒即可完成512 x 512体积的重建。

英文摘要

We propose a partially learned approach for the solution of ill-posed inverse problems with not necessarily linear forward operators. The method builds on ideas from classical regularization theory and recent advances in deep learning to perform learning while making use of prior information about the inverse problem encoded in the forward operator, noise model and a regularizing functional. The method results in a gradient-like iterative scheme, where the "gradient" component is learned using a convolutional network that includes the gradients of the data discrepancy and regularizer as input in each iteration. We present results of such a partially learned gradient scheme on a non-linear tomographic inversion problem with simulated data from both the Shepp-Logan phantom as well as a head CT. The outcome is compared against FBP and TV reconstruction and the proposed method provides a 5.4 dB PSNR improvement over the TV reconstruction while being significantly faster, giving reconstructions of 512 x 512 volumes in about 0.4 seconds using a single GPU.
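The gradient-like iteration described above can be written schematically as follows. This is a sketch under assumed notation, not the paper's exact architecture: $\mathcal{T}$ is the forward operator, $g$ the measured data, $\mathcal{D}$ the data discrepancy, $\mathcal{S}$ the regularizing functional, and $\Lambda_\Theta$ the learned convolutional update; any auxiliary state the actual network may carry between iterations is omitted.

```latex
f_{k+1} = f_k + \Lambda_\Theta\Bigl(f_k,\;
\nabla\bigl[\mathcal{D}(\mathcal{T}(\cdot),g)\bigr](f_k),\;
\nabla\mathcal{S}(f_k)\Bigr),
\qquad k = 0,1,\dots,K-1
```

Replacing $\Lambda_\Theta$ by a fixed negative step size times the sum of the two gradients recovers plain gradient descent on $\mathcal{D} + \mathcal{S}$, which is the classical scheme the learned update generalizes.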

1711.08237 2026-06-04 eess.SY cs.AI cs.SI cs.SY 版本更新

The Stochastic Firefighter Problem

随机消防员问题

Guy Tennenholtz, Constantine Caramanis, Shie Mannor

AI总结 研究网络中个体顺序接种策略,提出在概率环境下最优的接种策略,并在不同网络结构上计算感染人数的期望上界和下界。

详情
AI中文摘要

传染病传播的动力学对于评估其风险并提供遏制手段至关重要。我们研究网络中个体的顺序接种问题。在原始(确定性)的消防员问题(Firefighter problem)中,火灾在给定图的某个节点爆发。在每个时间步,消防员可以保护b个节点,然后火势蔓延到着火节点的所有未受保护的邻居节点。当火势无法继续蔓延时过程结束。我们将消防员问题扩展到概率环境,其中感染是随机的。我们设计了一种简单的策略,仅对感染节点的邻居进行接种,并且在正则树和一般图上,对于足够大的预算,该策略是最优的。我们推导了计算感染个体数期望上界和下界的方法,并给出了在期望意义下实现遏制所需预算的估计。我们在树、d维网格和Erdős Rényi图上显式地计算了这些量。最后,我们构建了一种状态依赖的预算分配策略,并在真实网络上展示了在采用一阶熟人接种策略时其相对于常数预算分配的优越性。

英文摘要

The dynamics of infectious diseases spread is crucial in determining their risk and offering ways to contain them. We study sequential vaccination of individuals in networks. In the original (deterministic) version of the Firefighter problem, a fire breaks out at some node of a given graph. At each time step, b nodes can be protected by a firefighter and then the fire spreads to all unprotected neighbors of the nodes on fire. The process ends when the fire can no longer spread. We extend the Firefighter problem to a probabilistic setting, where the infection is stochastic. We devise a simple policy that only vaccinates neighbors of infected nodes and is optimal on regular trees and on general graphs for a sufficiently large budget. We derive methods for calculating upper and lower bounds of the expected number of infected individuals, as well as provide estimates on the budget needed for containment in expectation. We calculate these explicitly on trees, d-dimensional grids, and Erdős Rényi graphs. Finally, we construct a state-dependent budget allocation strategy and demonstrate its superiority over constant budget allocation on real networks following a first order acquaintance vaccination policy.
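To make the process concrete, here is a minimal simulation sketch of a stochastic firefighter process with the "vaccinate only neighbors of infected nodes" policy. Several details are assumptions for illustration and are not taken from the abstract: each infected node gets a single chance to infect each neighbor with independent probability `p`, and ties among at-risk nodes are broken by sorted node label. The function name `simulate` is hypothetical.

```python
import random


def simulate(adj, source, budget, p, rng):
    """Run one stochastic firefighter episode; return the number infected.

    adj:    dict mapping node -> list of neighbouring nodes
    budget: number of nodes that can be vaccinated per time step (b)
    p:      per-edge infection probability (assumed i.i.d. across edges)
    rng:    a random.Random instance, passed in for reproducibility
    """
    infected = {source}
    protected = set()
    frontier = {source}  # nodes infected during the previous step
    while frontier:
        # Policy: vaccinate only susceptible neighbours of infected nodes.
        at_risk = sorted({v for u in frontier for v in adj[u]
                          if v not in infected and v not in protected})
        protected.update(at_risk[:budget])
        # Spread: each frontier node tries each unprotected neighbour once.
        newly = set()
        for u in frontier:
            for v in adj[u]:
                if v in infected or v in protected or v in newly:
                    continue
                if rng.random() < p:
                    newly.add(v)
        infected |= newly
        frontier = newly
    return len(infected)
```

Averaging `simulate` over many seeds gives a Monte Carlo estimate of the expected number of infected nodes, which is the quantity the abstract bounds analytically.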

1711.04518 2026-06-04 eess.SY cs.AI cs.HC cs.LG cs.NE cs.SY 版本更新

A Supervised Learning Concept for Reducing User Interaction in Passenger Cars

一种用于减少乘客汽车中用户交互的监督学习概念

Marius Stärk, Damian Backes, Christian Kehl

AI总结 本文提出了一种基于监督学习的自动化系统,用于减少人机交互界面中的交互复杂性,适用于汽车多模态热调节系统的设定点选择。

Comments 4 pages, 9 figures, concept only

详情
AI中文摘要

本文介绍了一种用于人机界面(HMI)的自动化系统,用于通过监督学习实现设定点调整。以乘客汽车多模态热调节系统的HMI为例,展示了一个复杂的设定点选择系统。目标是将交互复杂性降低到完全自动化。该方法不仅限于气候控制应用,还可扩展到其他基于设定点的HMI领域。

英文摘要

In this article an automation system for human-machine-interfaces (HMI) for setpoint adjustment using supervised learning is presented. We use HMIs of multi-modal thermal conditioning systems in passenger cars as example for a complex setpoint selection system. The goal is the reduction of interaction complexity up to full automation. The approach is not limited to climate control applications but can be extended to other setpoint-based HMIs.

1705.08551 2026-06-04 stat.ML cs.AI cs.LG cs.SY eess.SY 版本更新

Safe Model-based Reinforcement Learning with Stability Guarantees

具有稳定性保证的安全的基于模型的强化学习

Felix Berkenkamp, Matteo Turchetta, Angela P. Schoellig, Andreas Krause

AI总结 本文提出一种考虑安全性的强化学习算法,通过Lyapunov稳定性验证理论,利用动态统计模型获得具有证明稳定性的高性能控制策略,并在模拟倒立摆中展示其安全优化神经网络策略的能力。

Comments Proc. of Neural Information Processing Systems (NIPS), 2017

详情
AI中文摘要

强化学习是一种从实验数据中学习最优策略的强大范式。然而,为了找到最优策略,大多数强化学习算法会探索所有可能的动作,这可能对现实系统有害。因此,学习算法在现实世界中很少应用于安全关键系统。在本文中,我们提出了一种明确考虑安全性的学习算法,其中安全性以稳定性保证来定义。具体来说,我们扩展了控制理论中关于Lyapunov稳定性验证的结果,并展示了如何利用动力学的统计模型来获得具有可证明稳定性证书的高性能控制策略。此外,在以高斯过程先验表述的额外正则性假设下,我们证明了可以有效且安全地收集数据以学习动力学,从而既提高控制性能,又扩大状态空间中的安全区域。在我们的实验中,我们展示了所得到的算法如何在模拟倒立摆上安全地优化神经网络策略,而摆杆从未倒下。

英文摘要

Reinforcement learning is a powerful paradigm for learning optimal policies from experimental data. However, to find optimal policies, most reinforcement learning algorithms explore all possible actions, which may be harmful for real-world systems. As a consequence, learning algorithms are rarely applied on safety-critical systems in the real world. In this paper, we present a learning algorithm that explicitly considers safety, defined in terms of stability guarantees. Specifically, we extend control-theoretic results on Lyapunov stability verification and show how to use statistical models of the dynamics to obtain high-performance control policies with provable stability certificates. Moreover, under additional regularity assumptions in terms of a Gaussian process prior, we prove that one can effectively and safely collect data in order to learn about the dynamics and thus both improve control performance and expand the safe region of the state space. In our experiments, we show how the resulting algorithm can safely optimize a neural network policy on a simulated inverted pendulum, without the pendulum ever falling down.

1602.06667 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

A Motion Planning Strategy for the Active Vision-Based Mapping of Ground-Level Structures

一种用于主动视觉建图的地面结构运动规划策略

Manikandasriram Srinivasan Ramanagopal, André Phu-Van Nguyen, Jerome Le Ny

AI总结 本文提出了一种指导配备摄像头或深度传感器的地面机器人自主建图有限三维结构可见部分的策略,通过运动规划算法确定合适视角并自动填补点云中的空洞,适用于建筑、施工和检测领域。

Comments Accepted for publication in IEEE Transactions on Automation Science and Engineering. Available in IEEE Xplore at http://ieeexplore.ieee.org/document/8093664

详情
AI中文摘要

本文提出了一种策略,用于引导配备摄像头或深度传感器的地面移动机器人自主建图有界三维结构的可见部分。我们描述了用于确定合适的连续视点的运动规划算法,并尝试自动填补由传感与感知层产生的点云中的空洞。重点是准确重建中等大小结构的3D模型,而非对大型开放环境建图,其应用领域包括建筑、施工和检测等。所提出的算法不需要以网格模型或包围盒形式进行初始化,生成的路径也很适合视觉传感器同时用于建图和机器人定位、且没有额外绝对定位系统的情形。我们分析了该策略的覆盖性质,并将其性能与经典的基于前沿的探索算法进行比较。我们展示了其在不同结构大小、定位精度水平和深度传感器量程下的有效性,并通过真实世界实验验证了我们的设计。

英文摘要

This paper presents a strategy to guide a mobile ground robot equipped with a camera or depth sensor, in order to autonomously map the visible part of a bounded three-dimensional structure. We describe motion planning algorithms that determine appropriate successive viewpoints and attempt to fill holes automatically in a point cloud produced by the sensing and perception layer. The emphasis is on accurately reconstructing a 3D model of a structure of moderate size rather than mapping large open environments, with applications for example in architecture, construction and inspection. The proposed algorithms do not require any initialization in the form of a mesh model or a bounding box, and the paths generated are well adapted to situations where the vision sensor is used simultaneously for mapping and for localizing the robot, in the absence of additional absolute positioning system. We analyze the coverage properties of our policy, and compare its performance to the classic frontier based exploration algorithm. We illustrate its efficacy for different structure sizes, levels of localization accuracy and range of the depth sensor, and validate our design on a real-world experiment.

1711.03026 2026-06-04 eess.SY cs.AI cs.SY stat.ML 版本更新

Intelligent Fault Analysis in Electrical Power Grids

电力电网的智能故障分析

Biswarup Bhattacharya, Abhishek Sinha

AI总结 本文提出利用人工智能技术,通过形式化模型和机器学习方法检测电网健康状况,提升电网稳定性与安全性。

Comments In proceedings of the 29th IEEE International Conference on Tools with Artificial Intelligence (ICTAI) 2017 (full paper); 6 pages; 13 figures

详情
AI中文摘要

电力电网是当今世界基础设施中最重要的一部分。每个国家都依赖自身电网的安全性和稳定性来为家庭和工业提供电力。即使电网的某一小部分出现故障,也可能导致生产力、收入损失,甚至在某些情况下导致生命危险。因此,设计一个能够检测电网健康状况并在严重异常发生前采取保护措施的系统至关重要。为此,我们致力于创建一个智能系统,能够随时分析电网信息,并通过使用复杂的正式模型和新颖的机器学习技术如循环神经网络来确定电网的健康状况。我们的系统使用西门子PSS/E软件模拟电网条件,包括故障、发电机输出波动和负载波动等刺激,并使用SVM、LSTM等分类器对数据进行训练和测试。结果非常出色,我们的方法在数据上表现出很高的准确性。该模型可以轻松扩展以处理更大、更复杂的电网架构。

英文摘要

Power grids are one of the most important components of infrastructure in today's world. Every nation is dependent on the security and stability of its own power grid to provide electricity to the households and industries. A malfunction of even a small part of a power grid can cause loss of productivity, revenue and in some cases even life. Thus, it is imperative to design a system which can detect the health of the power grid and take protective measures accordingly even before a serious anomaly takes place. To achieve this objective, we have set out to create an artificially intelligent system which can analyze the grid information at any given time and determine the health of the grid through the usage of sophisticated formal models and novel machine learning techniques like recurrent neural networks. Our system simulates grid conditions including stimuli like faults, generator output fluctuations, load fluctuations using Siemens PSS/E software and this data is trained using various classifiers like SVM, LSTM and subsequently tested. The results are excellent with our methods giving very high accuracy for the data. This model can easily be scaled to handle larger and more complex grid architectures.

1711.02877 2026-06-04 eess.SY cs.AI cs.LG cs.SY math.OC 版本更新

Un résultat intrigant en commande sans modèle

一个令人着迷的无模型控制结果

Cédric Join, Emmanuel Delaleau, Michel Fliess, Claude H. Moog

AI总结 借助劳斯-赫尔维茨(Routh-Hurwitz)判据,证明了在无模型控制中智能比例控制器(iP)可能比智能比例-微分控制器(iPD)更难整定,并通过仿真展示了iPD相对于经典PID的优势。

Comments in French, https://www.openscience.fr/Un-resultat-intrigant-en-commande-sans-modele

详情
Journal ref
ISTE OpenScience Automatique, vol. 1, 2017
AI中文摘要

一个初等的数学例子借助劳斯-赫尔维茨(Routh-Hurwitz)判据证明了一个结果,该结果相对于当今对无模型控制的实际理解而言颇为耐人寻味,即“智能”比例控制器(iP)可能比智能比例-微分控制器(iPD)更难整定。计算机仿真展示了iPD相较于经典PID的巨大优势。引言和结论结合近期进展对无模型控制进行了分析。

英文摘要

An elementary mathematical example proves, thanks to the Routh-Hurwitz criterion, a result that is intriguing with respect to today's practical understanding of model-free control, i.e., an "intelligent" proportional controller (iP) may turn to be more difficult to tune than an intelligent proportional-derivative one (iPD). The vast superiority of iPDs when compared to classic PIDs is shown via computer simulations. The introduction as well as the conclusion analyse model-free control in the light of recent advances.
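For readers unfamiliar with the Routh-Hurwitz criterion invoked above, the following sketch checks stability of a generic cubic characteristic polynomial. The abstract does not state the paper's closed-loop polynomial, so the cubic form and the function name `is_hurwitz_cubic` are illustrative assumptions; only the textbook criterion itself is used.

```python
def is_hurwitz_cubic(a3, a2, a1, a0):
    """Return True iff a3*s^3 + a2*s^2 + a1*s + a0 has all roots in Re(s) < 0.

    Routh-Hurwitz for a cubic with positive leading coefficient:
    every coefficient must be positive and a2*a1 > a3*a0.
    """
    if a3 < 0:
        # Multiplying by -1 does not change the roots; normalise the sign.
        a3, a2, a1, a0 = -a3, -a2, -a1, -a0
    all_positive = a3 > 0 and a2 > 0 and a1 > 0 and a0 > 0
    return all_positive and a2 * a1 > a3 * a0
```

For example, `is_hurwitz_cubic(1, 3, 3, 1)` tests $(s+1)^3$, while `is_hurwitz_cubic(1, 1, 1, 1)` tests $(s+1)(s^2+1)$, whose imaginary-axis roots violate the strict inequality. Tuning a controller then amounts to choosing gains so that the resulting coefficients satisfy these inequalities.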

1711.02857 2026-06-04 cs.LG cs.AI cs.CV cs.NA math.NA stat.ML 版本更新

Learning Sparse Visual Representations with Leaky Capped Norm Regularizers

通过泄漏受限范数正则化器学习稀疏视觉表示

Jianqiao Wangni, Dahua Lin

AI总结 本文提出泄漏受限范数正则化器,用于学习过完备视觉表示,证明了其在3D形状恢复中的收敛性,优于ℓ1和非凸正则化方法。

详情
AI中文摘要

诱导稀疏性的正则化是学习过完备视觉表示的重要组成部分。尽管ℓ1正则化广受欢迎,本文研究了非凸正则化在该问题中的应用。我们的贡献包括三个部分:首先,我们提出了泄漏受限范数正则化器(LCNR),允许模型权重低于一定阈值的部分被更强地正则化,从而实现强稀疏性,仅引入可控的估计偏差。我们提出了一种主要化-最小化算法来优化联合目标函数。其次,我们的研究显示,在单目3D形状恢复和神经网络中,LCNR优于ℓ1和其他非凸正则化方法,实现了最先进的性能和更快的收敛速度。第三,我们证明了在3D恢复问题上的理论全局收敛速度。到目前为止,这是首次对3D恢复问题的收敛性分析。

英文摘要

Sparsity inducing regularization is an important part for learning over-complete visual representations. Despite the popularity of $\ell_1$ regularization, in this paper, we investigate the usage of non-convex regularizations in this problem. Our contribution consists of three parts. First, we propose the leaky capped norm regularization (LCNR), which allows model weights below a certain threshold to be regularized more strongly as opposed to those above, therefore imposes strong sparsity and only introduces controllable estimation bias. We propose a majorization-minimization algorithm to optimize the joint objective function. Second, our study over monocular 3D shape recovery and neural networks with LCNR outperforms $\ell_1$ and other non-convex regularizations, achieving state-of-the-art performance and faster convergence. Third, we prove a theoretical global convergence speed on the 3D recovery problem. To the best of our knowledge, this is the first convergence analysis of the 3D recovery problem.

1708.01930 2026-06-04 cs.AI cs.MA cs.RO cs.SY eess.SY 版本更新

Enhanced Emotion Enabled Cognitive Agent Based Rear End Collision Avoidance Controller for Autonomous Vehicles

基于增强型情感认知智能体的自动驾驶车辆追尾碰撞避免控制器

Faisal Riaz, Muaz A. Niazi

AI总结 本文提出一种基于增强型情感认知智能体的追尾碰撞避免控制器,通过引入恐惧情绪生成机制,使自动驾驶车辆以更少的规则、更高的效率避免追尾碰撞。

Comments 39 pages, 17 figures

详情
AI中文摘要

追尾碰撞本质上最为致命,造成了大部分交通伤亡。现有研究提出了许多追尾碰撞避免方案,但这些方案高度依赖精确的数学模型。然而,实际道路驾驶受路面状况、驾驶员反应时间、行人流量和车辆动力学等非线性因素影响,因此获得车辆控制系统的精确数学模型具有挑战性。已有工作使用模糊逻辑来解决基于精确控制的追尾碰撞避免方案的这一问题,但过多的模糊规则直接损害了其效率。此外,这些基于模糊逻辑的控制器在提出时并未采用恰当的基于智能体的建模,而这种建模有助于模拟人工驾驶员执行这些模糊规则的功能。鉴于这些局限,我们提出了一种基于增强型情感认知智能体(EEEC_Agent)的控制器,帮助自动驾驶车辆(AVs)以更少的规则(依据恐惧情绪设计)和更高的效率实现追尾碰撞避免。为了在EEEC_Agent中引入恐惧情绪生成机制,采用了Ortony, Clore & Collins(OCC)模型。EEEC_Agent的恐惧生成机制已通过NetLogo仿真验证。此外,还使用专门搭建的原型AV平台对EEEC_Agent的功能进行了实际验证。最后,与现有最先进研究的定性比较表明,所提出的模型优于近期研究。

英文摘要

Rear end collisions are deadliest in nature and cause most traffic casualties and injuries. In the existing research, many rear end collision avoidance solutions have been proposed. However, the problem with these proposed solutions is that they are highly dependent on precise mathematical models. Real road driving, in contrast, is influenced by non-linear factors such as road surface situations, driver reaction time, pedestrian flow and vehicle dynamics, hence obtaining an accurate mathematical model of the vehicle control system is challenging. This problem with precise control based rear end collision avoidance schemes has been addressed using fuzzy logic, but the excessive number of fuzzy rules directly impairs their efficiency. Furthermore, these fuzzy logic based controllers have been proposed without using proper agent based modeling that helps in mimicking the functions of an artificial human driver executing these fuzzy rules. Keeping in view these limitations, we have proposed an Enhanced Emotion Enabled Cognitive Agent (EEEC_Agent) based controller that helps Autonomous Vehicles (AVs) to perform rear end collision avoidance with fewer rules, designed after the fear emotion, and with high efficiency. To introduce a fear emotion generation mechanism in EEEC_Agent, the Ortony, Clore & Collins (OCC) model has been employed. The fear generation mechanism of EEEC_Agent has been verified using NetLogo simulation. Furthermore, practical validation of EEEC_Agent functions has been performed using a specially built prototype AV platform. Finally, a qualitative comparative study with existing state of the art research works indicates that the proposed model outperforms recent research.

1708.01628 2026-06-04 cs.MA cs.AI cs.GT cs.SY eess.SY 版本更新

Validation of Enhanced Emotion Enabled Cognitive Agent Using Virtual Overlay Multi-Agent System Approach

基于虚拟叠加多智能体系统的增强型情感认知智能体验证

Faisal Riaz, Muaz A. Niazi

AI总结 本文提出基于虚拟叠加多智能体系统的方法,验证了增强型情感认知智能体在避免道路碰撞中的有效性,展示了其在不同交通情境下感知恐惧等级的能力及更短的停车视距和超车视距。

Comments 35 pages, 21 figures, 19 tables

详情
Journal ref
Broad Research in Artificial Intelligence and Neuroscience 8.3 (2017): 13-37
AI中文摘要

通过避免道路碰撞来提高道路安全性是发明自动驾驶车辆(AVs)的主要原因之一。在此背景下,设计能够真正代表人类认知和情感的基于智能体的碰撞避免组件,似乎是更可行的方法,因为智能体可以替代人类驾驶员。然而,据我们所知,在这一领域中,非常少有基于人类情感和认知的智能体研究。此外,这些基于智能体的解决方案尚未使用任何关键的验证技术进行验证。考虑到这种缺乏验证实践的情况,我们选择了最先进的情感认知智能体(EEC_Agent),该智能体旨在避免半自动驾驶车辆之间的侧向碰撞。EEC_Agent的架构已使用认知智能体基于计算(CABC)框架中的探索性智能体建模(EABM)级别进行了修订,并引入了基于Ortony、Clore & Collins(OCC)模型的实时恐惧情绪生成机制。然后,所提出的恐惧生成机制已通过CABC框架中的验证智能体建模级别使用虚拟叠加多智能体系统(VOMAS)进行验证。广泛的模拟和实际实验表明,增强型EEC_Agent能够根据不同的交通情境感知不同层次的恐惧,并且相比人类驾驶员,其所需的停车视距(SSD)和超车视距(OSD)更小。

英文摘要

Making roads safer by avoiding road collisions is one of the main reasons for inventing Autonomous vehicles (AVs). In this context, designing agent-based collision avoidance components of AVs which truly represent human cognition and emotions appears to be a more feasible approach, as agents can replace human drivers. However, to the best of our knowledge, very few human emotion and cognition-inspired agent-based studies have previously been conducted in this domain. Furthermore, these agent-based solutions have not been validated using any key validation technique. Keeping in view this lack of validation practices, we have selected the state-of-the-art Emotion Enabled Cognitive Agent (EEC_Agent), which was proposed to avoid lateral collisions between semi-AVs. The architecture of EEC_Agent has been revised using the Exploratory Agent Based Modeling (EABM) level of the Cognitive Agent Based Computing (CABC) framework, and a real-time fear emotion generation mechanism using the Ortony, Clore & Collins (OCC) model has also been introduced. Then the proposed fear generation mechanism has been validated using the Validated Agent Based Modeling level of the CABC framework using a Virtual Overlay MultiAgent System (VOMAS). Extensive simulation and practical experiments demonstrate that the Enhanced EEC_Agent exhibits the capability to feel different levels of fear, according to different traffic situations, and also needs a smaller Stopping Sight Distance (SSD) and Overtaking Sight Distance (OSD) as compared to human drivers.

1710.11040 2026-06-04 cs.RO cs.AI cs.SY eess.SY math.OC 版本更新

How Should a Robot Assess Risk? Towards an Axiomatic Theory of Risk in Robotics

机器人应如何评估风险?迈向机器人学中的公理化风险理论

Anirudha Majumdar, Marco Pavone

AI总结 本文探讨了机器人风险评估的理论基础,提出风险度量应满足的公理,讨论了风险度量的表示定理及其在机器人应用中的实例,并分析了常用风险度量的局限性。

Comments Extended version of paper published in International Symposium on Robotics Research (ISRR) 2017

详情
AI中文摘要

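赋予机器人评估风险和做出风险感知决策的能力被广泛视为确保在不确定环境下运作的机器人安全的关键步骤。但是,机器人应如何量化风险?一种自然且常见的方法是考虑这样一种框架:为随机结果赋予成本,这种赋值由一个成本随机变量刻画。量化风险则对应于评估风险度量,即从成本随机变量到实数的映射。然而,什么构成“好的”风险度量这一问题在机器人学界鲜受关注。本文的目标是通过倡导机器人应用中的风险度量应满足的公理(以便其能够作为对风险的理性评估),来探讨并部分回答这一问题。我们讨论了精确刻画满足这些公理的度量类别(称为失真风险度量,distortion risk metrics)的一般表示定理,并给出了可用于实际应用的实例。我们进一步讨论了机器人学中常用风险度量的缺陷,以及在序贯决策任务中必须考虑的其他性质。我们希望本文提出的思想能够为机器人应用中量化风险(进而量化安全性)奠定基础性框架。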

英文摘要

Endowing robots with the capability of assessing risk and making risk-aware decisions is widely considered a key step toward ensuring safety for robots operating under uncertainty. But, how should a robot quantify risk? A natural and common approach is to consider the framework whereby costs are assigned to stochastic outcomes - an assignment captured by a cost random variable. Quantifying risk then corresponds to evaluating a risk metric, i.e., a mapping from the cost random variable to a real number. Yet, the question of what constitutes a "good" risk metric has received little attention within the robotics community. The goal of this paper is to explore and partially address this question by advocating axioms that risk metrics in robotics applications should satisfy in order to be employed as rational assessments of risk. We discuss general representation theorems that precisely characterize the class of metrics that satisfy these axioms (referred to as distortion risk metrics), and provide instantiations that can be used in applications. We further discuss pitfalls of commonly used risk metrics in robotics, and discuss additional properties that one must consider in sequential decision making tasks. Our hope is that the ideas presented here will lead to a foundational framework for quantifying risk (and hence safety) in robotics applications.
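As an illustration of a risk metric in the sense above (a map from a cost random variable to a real number), the sketch below computes conditional value-at-risk (CVaR) for a discrete cost distribution. CVaR is a standard example of a distortion risk measure; whether it is the instantiation the paper emphasizes is not stated in the abstract, and the function name `cvar` is hypothetical. The probabilities are assumed to sum to 1.

```python
def cvar(costs, probs, alpha):
    """Conditional value-at-risk at level alpha for a discrete cost distribution.

    Averages the costs over the worst (1 - alpha) fraction of probability mass.
    alpha = 0 gives the expected cost; alpha close to 1 approaches the worst case.
    """
    if not 0.0 <= alpha < 1.0:
        raise ValueError("alpha must be in [0, 1)")
    tail = 1.0 - alpha
    remaining = tail
    total = 0.0
    # Walk through outcomes from most to least costly, consuming tail mass.
    for cost, prob in sorted(zip(costs, probs), reverse=True):
        take = min(prob, remaining)
        total += take * cost
        remaining -= take
        if remaining <= 0.0:
            break
    return total / tail
```

Sweeping `alpha` from 0 toward 1 interpolates between a risk-neutral robot (expected cost) and a worst-case one, which is the kind of trade-off a single tunable risk metric is meant to expose.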

1710.10532 2026-06-04 eess.SY cs.AI cs.LG cs.SY 版本更新

Interpretable Apprenticeship Learning with Temporal Logic Specifications

具有时序逻辑规范的可解释学徒学习

Daniel Kasenberg, Matthias Scheutz

AI总结 本文提出通过多目标优化从MDP中的行为轨迹推断LTL规范,基于“违反成本”概念设计基于状态和基于动作的目标函数,并采用遗传规划在两个简单领域验证方法的有效性。

Comments Accepted to the 56th IEEE Conference on Decision and Control (CDC 2017)

详情
AI中文摘要

近期工作研究了将线性时序逻辑(LTL)公式用作在马尔可夫决策过程(MDP)中进行规划的智能体的规范。我们考虑逆问题:从MDP中的演示行为轨迹推断LTL规范。我们将此问题形式化为多目标优化问题,并基于“违反成本”的概念描述了基于状态(“实际发生了什么”)和基于动作(“智能体预期会发生什么”)的目标函数。我们采用遗传规划在两个简单领域中求解该问题,以此展示该方法的有效性。

英文摘要

Recent work has addressed using formulas in linear temporal logic (LTL) as specifications for agents planning in Markov Decision Processes (MDPs). We consider the inverse problem: inferring an LTL specification from demonstrated behavior trajectories in MDPs. We formulate this as a multiobjective optimization problem, and describe state-based ("what actually happened") and action-based ("what the agent expected to happen") objective functions based on a notion of "violation cost". We demonstrate the efficacy of the approach by employing genetic programming to solve this problem in two simple domains.

1710.09627 2026-06-04 cs.AI cs.NI cs.SY eess.SY 版本更新

SRE: Semantic Rules Engine For the Industrial Internet-Of-Things Gateways

SRE:面向工业物联网网关的语义规则引擎

Charbel El Kaed, Imran Khan, Andre Van Den Berg, Hicham Hossayni, Christophe Saint-Marcel

AI总结 本文提出一种面向工业网关的语义规则引擎SRE,用于实现动态灵活的基于规则的控制策略,支持实时管理规则并提供语义查询结果。

Comments Accepted for publication in forthcoming issue of IEEE Transactions on Industrial Informatics. The content is final but has NOT been proof-read

详情
Journal ref
IEEE Transactions on Industrial Informatics, 2017
AI中文摘要

物联网范式的发展为解决现实问题提供了机会。例如,能源管理吸引了学术界、工业界、政府和监管机构的广泛关注。它涉及收集能源使用数据、分析数据并通过控制策略优化能源消耗。然而,在工业环境中,进行此类优化并不简单。业务规则的变化、过程控制和客户要求的变化使问题更加具有挑战性。本文提出了一种面向工业网关的语义规则引擎(SRE),允许实现动态且灵活的基于规则的控制策略。它简单、表达能力强,并允许在不造成任何服务中断的情况下实时管理规则。此外,它能够处理语义查询,并通过从已定义的概念中推断额外知识来提供结果。SRE已在不同硬件平台和商业产品上得到验证和测试。还提供了性能评估以验证其对客户要求的符合性。

英文摘要

The advent of the Internet-of-Things (IoT) paradigm has brought opportunities to solve many real-world problems. Energy management, for example, has attracted huge interest from academia, industries, governments and regulatory bodies. It involves collecting energy usage data, analyzing it, and optimizing the energy consumption by applying control strategies. However, in industrial environments, performing such optimization is not trivial. The changes in business rules, process control, and customer requirements make it much more challenging. In this paper, a Semantic Rules Engine (SRE) for industrial gateways is presented that allows implementing dynamic and flexible rule-based control strategies. It is simple, expressive, and allows managing rules on-the-fly without causing any service interruption. Additionally, it can handle semantic queries and provide results by inferring additional knowledge from previously defined concepts in ontologies. SRE has been validated and tested on different hardware platforms and in commercial products. Performance evaluations are also presented to validate its conformance to the customer requirements.

1710.07147 2026-06-04 cs.AI cs.SY eess.SY 版本更新

A Two-Phase Safe Vehicle Routing and Scheduling Problem: Formulations and Solution Algorithms

两阶段安全车辆路径与调度问题:建模与求解算法

Aschkan Omidvar, Eren Erman Ozguven, O. Arda Vanli, R. Tavakkoli-Moghaddam

AI总结 本文提出一种两阶段时间依赖车辆路径与调度优化模型,通过避免重复拥堵和选择事故概率较低的路线,替代传统最短距离或行驶时间目标。第一阶段利用混合整数规划模型确定安全路径;第二阶段通过调整出发时间和速度避免拥堵,采用改进的模拟退火算法求解。

详情
AI中文摘要

我们提出一个两阶段时间依赖车辆路径与调度优化模型,通过(1)避免重复拥堵和(2)选择事故概率较低的路线,替代文献中常见的最短距离或行驶时间目标。第一阶段根据时间动态考虑道路网络上的速度变化,解决混合整数规划模型以确定车队和节点序列的安全路径。第二阶段将每条路线视为独立的交通路径(固定路线和节点序列),通过调整车辆从每个节点的出发时间和调整各边的次优速度来避免拥堵。提出的改进模拟退火(SA)算法用于迭代求解这两个复杂模型,能够以较短的时间提供解决方案。

英文摘要

We propose a two phase time dependent vehicle routing and scheduling optimization model that identifies the safest routes, as a substitute for the classical objectives given in the literature such as shortest distance or travel time, through (1) avoiding recurring congestions, and (2) selecting routes that have a lower probability of crash occurrences and non-recurring congestion caused by those crashes. In the first phase, we solve a mixed-integer programming model which takes the dynamic speed variations into account on a graph of roadway networks according to the time of day, and identify the routing of a fleet and sequence of nodes on the safest feasible paths. Second phase considers each route as an independent transit path (fixed route with fixed node sequences), and tries to avoid congestion by rescheduling the departure times of each vehicle from each node, and by adjusting the sub-optimal speed on each arc. A modified simulated annealing (SA) algorithm is formulated to solve both complex models iteratively, which is found to be capable of providing solutions in a considerably short amount of time.

1709.03153 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY 版本更新

MBMF: Model-Based Priors for Model-Free Reinforcement Learning

MBMF:基于模型的先验用于无模型强化学习

Somil Bansal, Roberto Calandra, Kurtland Chua, Sergey Levine, Claire Tomlin

AI总结 本文提出一种结合模型与无模型强化学习的方法,通过学习概率动力学模型作为先验,提升数据效率和成本效益。

Comments After we submitted the paper for consideration in CoRL 2017 we found a paper published in the recent past with a similar method (see related work for a discussion). Considering the similarities between the two papers, we have decided to retract our paper from CoRL 2017

详情
AI中文摘要

强化学习主要分为无模型和有模型两种范式。每种范式都有其优势和局限性,并已成功应用于适合其相应优势的真实世界领域。本文提出一种新方法,旨在弥合这两种范式的差距。我们通过学习概率动力学模型,并将其作为交织的无模型优化的先验,结合两种范式的优点,从而实现数据高效和成本节约。结果表明,我们的方法在性能上优于纯有模型和纯无模型方法,以及简单切换范式的方法。

英文摘要

Reinforcement Learning is divided in two main paradigms: model-free and model-based. Each of these two paradigms has strengths and limitations, and has been successfully applied to real world domains that are appropriate to its corresponding strengths. In this paper, we present a new approach aimed at bridging the gap between these two paradigms. We aim to take the best of the two paradigms and combine them in an approach that is at the same time data-efficient and cost-savvy. We do so by learning a probabilistic dynamics model and leveraging it as a prior for the intertwined model-free optimization. As a result, our approach can exploit the generality and structure of the dynamics model, but is also capable of ignoring its inevitable inaccuracies, by directly incorporating the evidence provided by the direct observation of the cost. Preliminary results demonstrate that our approach outperforms purely model-based and model-free approaches, as well as the approach of simply switching from a model-based to a model-free setting.

1707.09095 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Toward the Starting Line: A Systems Engineering Approach to Strong AI

迈向起跑线:面向强人工智能的系统工程方法

Tansu Alpcan, Sarah M. Erfani, Christopher Leckie

AI总结 本文提出一种基于系统工程的方法,旨在解决强人工智能的起点问题,通过跨学科融合推动主流研究。

Comments 11 pages, 3 figures

详情
AI中文摘要

通用人工智能(AGI)或强人工智能旨在创造具有类人或人类水平智能的机器,与现有计算和人工智能系统相比,这仍是一个非常雄心勃勃的目标。在经历了多次炒作周期并吸取人工智能历史的教训之后,显然需要一次重大的概念飞跃才能跨越起跑线,从而启动主流AGI研究。这篇立场论文旨在为到达这一起跑线做出一点小的概念性贡献。在从不同视角对AGI问题进行广泛分析之后,本文介绍了一种基于系统理论和工程的研究方法,该方法建立在现有主流人工智能和系统学科的基础之上。文中识别了系统学科与人工智能研究之间若干有前景的交叉融合机会,并讨论了具体的潜在研究方向。

英文摘要

Artificial General Intelligence (AGI) or Strong AI aims to create machines with human-like or human-level intelligence, which is still a very ambitious goal when compared to the existing computing and AI systems. After many hype cycles and lessons from AI history, it is clear that a big conceptual leap is needed for crossing the starting line to kick-start mainstream AGI research. This position paper aims to make a small conceptual contribution toward reaching that starting line. After a broad analysis of the AGI problem from different perspectives, a system-theoretic and engineering-based research approach is introduced, which builds upon the existing mainstream AI and systems foundations. Several promising cross-fertilization opportunities between systems disciplines and AI research are identified. Specific potential research directions are discussed.

1708.08035 2026-06-04 math.OC cs.AI cs.NA math.NA 版本更新

A Conservation Law Method in Optimization

优化中的守恒定律方法

Bin Shi

AI总结 本文基于无摩擦的牛顿第二定律提出若干算法,用于寻找非凸优化的局部极小值并在一定程度上获得全局极小值;算法利用运动中速度可观测且可控这一性质,采用辛欧拉格式进行模拟,并对其高速收敛性给出理论分析。

详情
AI中文摘要

我们基于无摩擦的牛顿第二定律提出了若干算法,用于在非凸优化中寻找局部极小值,并在一定程度上获得全局极小值。基于运动中速度可观测且可控这一关键观察,这些算法采用辛欧拉格式模拟无摩擦的牛顿第二定律。从解析解的直观分析出发,我们对所提算法的高速收敛性给出了理论分析。最后,我们给出了高维情形下强凸函数、非强凸函数和非凸函数的实验。

英文摘要

We propose algorithms that find local minima in nonconvex optimization, and to some degree global minima, based on Newton's second law without friction. Building on the key observation that the velocity is observable and controllable during the motion, the algorithms simulate Newton's second law without friction using the symplectic Euler scheme. From an intuitive analysis of the analytical solution, we give a theoretical analysis of the high-speed convergence of the proposed algorithms. Finally, we present experiments on strongly convex, non-strongly convex, and nonconvex functions in high dimension.
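A minimal sketch of the underlying integrator may help: it simulates the frictionless dynamics x'' = -∇f(x) with the symplectic Euler scheme (velocity first, then position with the updated velocity). The paper's actual velocity-control rules are not given in the abstract, so this sketch simply returns the best point visited; that choice, and the name `symplectic_euler_descent`, are assumptions for illustration only.

```python
def symplectic_euler_descent(grad, x0, step, n_steps, f):
    """Simulate x'' = -grad f(x) without friction via symplectic Euler.

    grad, f: callables taking a list of floats
    Returns (best_x, best_f), the lowest-objective point along the trajectory.
    Assumption: tracking the best iterate stands in for the paper's
    velocity-control rules, which the abstract does not specify.
    """
    x = list(x0)
    v = [0.0] * len(x)  # particle starts at rest
    best_x, best_f = list(x), f(x)
    for _ in range(n_steps):
        g = grad(x)
        # Symplectic Euler: update velocity from the current position...
        v = [vi - step * gi for vi, gi in zip(v, g)]
        # ...then update position using the *new* velocity.
        x = [xi + step * vi for xi, vi in zip(x, v)]
        fx = f(x)
        if fx < best_f:
            best_x, best_f = list(x), fx
    return best_x, best_f
```

Because there is no friction, the particle keeps oscillating around a minimizer rather than settling, which is why some rule based on the observed velocity (or, here, best-iterate tracking) is needed to extract a solution.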

1710.00489 2026-06-04 cs.RO cs.AI cs.CV cs.NE cs.SY eess.SY 版本更新

SE3-Pose-Nets: Structured Deep Dynamics Models for Visuomotor Planning and Control

SE3-姿态网络:用于视觉-运动规划和控制的结构深度动力学模型

Arunkumar Byravan, Felix Leeb, Franziska Meier, Dieter Fox

AI总结 本文提出了一种基于结构深度动力学模型的深度视觉-运动控制方法,通过编码器-解码器结构学习低维姿态嵌入,实现场景分割和姿态预测,并在现实世界中实现了闭环控制。

Comments 8 pages, Initial submission to IEEE International Conference on Robotics and Automation (ICRA) 2018

详情
AI中文摘要

本文提出了一种基于结构深度动力学模型的深度视觉-运动控制方法。我们的深度动力学模型是一种SE3-Nets的变体,通过编码器-解码器结构学习低维姿态嵌入用于视觉-运动控制。与以往工作不同,我们的动力学模型是结构化的:给定一个输入场景,我们的网络明确学习分割显著部分并预测其姿态嵌入以及其运动作为姿态空间中的变化。我们通过一对相隔动作的点云训练我们的模型,并展示在仅提供帧间点对数据关联的监督下,我们的网络能够学习有意义的场景分割以及一致的姿态。我们进一步展示我们的模型可以直接在学习的低维姿态空间中用于闭环控制,其中动作通过最小化姿态空间中的误差使用基于梯度的方法计算,类似于传统模型驱动控制。我们展示了在模拟和现实世界中控制Baxter机器人从原始深度数据的结果,并与两种基线深度网络进行了比较。我们的方法在实时运行,实现了良好的场景动态预测,并在多个控制运行中优于基线方法。视频结果可在:https://rse-lab.cs.washington.edu/se3-structured-deep-ctrl/

英文摘要

In this work, we present an approach to deep visuomotor control using structured deep dynamics models. Our deep dynamics model, a variant of SE3-Nets, learns a low-dimensional pose embedding for visuomotor control via an encoder-decoder structure. Unlike prior work, our dynamics model is structured: given an input scene, our network explicitly learns to segment salient parts and predict their pose-embedding along with their motion modeled as a change in the pose space due to the applied actions. We train our model using a pair of point clouds separated by an action and show that given supervision only in the form of point-wise data associations between the frames our network is able to learn a meaningful segmentation of the scene along with consistent poses. We further show that our model can be used for closed-loop control directly in the learned low-dimensional pose space, where the actions are computed by minimizing error in the pose space using gradient-based methods, similar to traditional model-based control. We present results on controlling a Baxter robot from raw depth data in simulation and in the real world and compare against two baseline deep networks. Our method runs in real-time, achieves good prediction of scene dynamics and outperforms the baseline methods on multiple control runs. Video results can be found at: https://rse-lab.cs.washington.edu/se3-structured-deep-ctrl/

1709.08471 2026-06-04 math.NA cs.AI cs.NA 版本更新

Bayesian Filtering for ODEs with Bounded Derivatives

具有有界导数的ODEs的贝叶斯滤波

Emilia Magnani, Hans Kersting, Michael Schober, Philipp Hennig

AI总结 本文提出了一种新的贝叶斯滤波方法,用于求解具有有界导数的常微分方程,通过引入积分奥恩斯坦-乌伦贝克过程(IOUP)作为先验,对现有的积分维纳过程(IWP)滤波器形成补充。

Comments 14 pages, 9 figures

详情
AI中文摘要

近年来,人们对常微分方程(ODE)的概率求解器日益感兴趣,这类求解器返回解上的完整概率测度而非点估计,并能纳入关于ODE本身的不确定性,例如当向量场或初始值仅近似已知或仅可近似计算时。近期工作中提出的ODE滤波器将ODE的解建模为高斯-马尔可夫过程,作为贝叶斯统计意义下的先验。先前工作对ODE解的(可能多次)导数采用维纳过程先验,并建立了相应求解器与经典数值方法的等价性;本文则提出问题:其他先验是否也能产生实用的求解器?为此,我们讨论了一系列支持快速滤波的可能先验,并提出了一种新的先验——积分奥恩斯坦-乌伦贝克过程(IOUP),它通过编码“解的某阶时间导数有界(即倾向于漂回零)”这一性质,对现有的积分维纳过程(IWP)滤波器形成补充。我们给出了比较IWP和IOUP滤波器的实验,结果支持如下看法:IWP能更好地近似发散的ODE解,而IOUP对于导数有界的轨迹是更好的先验。

英文摘要

Recently there has been increasing interest in probabilistic solvers for ordinary differential equations (ODEs) that return full probability measures, instead of point estimates, over the solution and can incorporate uncertainty over the ODE at hand, e.g. if the vector field or the initial value is only approximately known or evaluable. The ODE filter proposed in recent work models the solution of the ODE by a Gauss-Markov process which serves as a prior in the sense of Bayesian statistics. While previous work employed a Wiener process prior on the (possibly multiple times) differentiated solution of the ODE and established equivalence of the corresponding solver with classical numerical methods, this paper raises the question whether other priors also yield practically useful solvers. To this end, we discuss a range of possible priors which enable fast filtering and propose a new prior--the Integrated Ornstein Uhlenbeck Process (IOUP)--that complements the existing Integrated Wiener process (IWP) filter by encoding the property that a derivative in time of the solution is bounded in the sense that it tends to drift back to zero. We provide experiments comparing IWP and IOUP filters which support the belief that IWP approximates better divergent ODE's solutions whereas IOUP is a better prior for trajectories with bounded derivatives.
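The difference between the two priors can be sketched with their defining stochastic differential equations. The notation is assumed for illustration: $X^{(q)}$ denotes the $q$-th time derivative of the modeled solution, $W_t$ a Wiener process, $\sigma > 0$ a diffusion scale, and $\theta > 0$ a drift rate.

```latex
\text{IWP:}\quad dX^{(q)}_t = \sigma\, dW_t,
\qquad
\text{IOUP:}\quad dX^{(q)}_t = -\theta\, X^{(q)}_t\, dt + \sigma\, dW_t,
\qquad
\mathbb{E}\bigl[X^{(q)}_t \mid X^{(q)}_0\bigr] = e^{-\theta t} X^{(q)}_0 \ \ \text{(IOUP)}
```

Under the IWP prior the $q$-th derivative wanders freely, whereas the IOUP drift term pulls it back toward zero, which is the sense in which the derivative is "bounded"; setting $\theta = 0$ recovers the IWP.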

1709.06080 2026-06-04 cs.LG cs.AI cs.NA math.NA 版本更新

Feedforward and Recurrent Neural Networks Backward Propagation and Hessian in Matrix Form

前馈和循环神经网络的反向传播与Hessian矩阵形式

Maxim Naumov

AI总结 本文研究了前馈和循环神经网络的线性代数理论,推导了Hessian的精确表达式,并展示了权重梯度和Hessian的矩阵形式。

Comments 23 pages, 4 figures

详情
AI中文摘要

本文聚焦于前馈(FNN)和循环(RNN)神经网络背后的线性代数理论。我们回顾了反向传播,包括通过时间反向传播(BPTT)。此外,我们推导出Hessian的新的精确表达式,代表了二次效应。我们证明,对于t个时间步,权重梯度可以表示为秩-t矩阵,而权重Hessian则可以表示为t²个Kronecker积之和,这些Kronecker积由秩-1和W^TAW矩阵组成,其中A和W是某些矩阵。此外,我们还证明,对于大小为r的mini-batch,权重更新可以表示为秩-rt矩阵。最后,我们简要评论了Hessian矩阵的特征值。

英文摘要

In this paper we focus on the linear algebra theory behind feedforward (FNN) and recurrent (RNN) neural networks. We review backward propagation, including backward propagation through time (BPTT). Also, we obtain a new exact expression for the Hessian, which represents second-order effects. We show that for $t$ time steps the weight gradient can be expressed as a rank-$t$ matrix, while the weight Hessian can be expressed as a sum of $t^{2}$ Kronecker products of rank-$1$ and $W^{T}AW$ matrices, for some matrix $A$ and weight matrix $W$. Also, we show that for a mini-batch of size $r$, the weight update can be expressed as a rank-$rt$ matrix. Finally, we briefly comment on the eigenvalues of the Hessian matrix.

1709.06011 2026-06-04 cs.MA cs.AI cs.LG cs.SY eess.SY stat.ML 版本更新

Guided Deep Reinforcement Learning for Swarm Systems

引导式深度强化学习用于群体系统

Maximilian Hüttenrauch, Adrian Šošić, Gerhard Neumann

AI总结 本文研究如何通过有限感知能力的协作代理(如机器人群)学习控制方法,提出引导式强化学习框架,利用中央 critic 获取全局状态以简化策略评估,通过深度强化学习近似 Q 函数和策略。

Comments 15 pages, 8 figures, accepted at the AAMAS 2017 Autonomous Robots and Multirobot Systems (ARMS) Workshop

详情
AI中文摘要

本文研究如何学习控制具有有限感知能力的协作代理群体(如机器人群)。代理仅具备基本传感器能力,但通过协作可完成复杂任务,如分布式装配或搜索救援。学习群体代理的策略因分布式部分可观测性而困难。本文采用引导式方法,其中 critic 在学习过程中拥有全局状态的中央访问,从而从强化学习角度简化策略评估问题。例如,通过摄像头图像获取所有机器人位置,但该图像仅供 critic 使用,不供机器人控制策略。本文采用 actor-critic 方法,其中 actor 仅基于本地感知信息做决策,而 critic 基于真实全局状态进行学习。算法使用深度强化学习近似 Q 函数和策略。算法性能在两个简单模拟 2D 代理任务上进行评估:1) 找到并维持一定距离;2) 定位目标。

英文摘要

In this paper, we investigate how to learn to control a group of cooperative agents with limited sensing capabilities such as robot swarms. The agents have only very basic sensor capabilities, yet in a group they can accomplish sophisticated tasks, such as distributed assembly or search and rescue tasks. Learning a policy for a group of agents is difficult due to distributed partial observability of the state. Here, we follow a guided approach where a critic has central access to the global state during learning, which simplifies the policy evaluation problem from a reinforcement learning point of view. For example, we can get the positions of all robots of the swarm using a camera image of a scene. This camera image is only available to the critic and not to the control policies of the robots. We follow an actor-critic approach, where the actors base their decisions only on locally sensed information. In contrast, the critic is learned based on the true global state. Our algorithm uses deep reinforcement learning to approximate both the Q-function and the policy. The performance of the algorithm is evaluated on two tasks with simple simulated 2D agents: 1) finding and maintaining a certain distance to each other and 2) locating a target.

1709.04574 2026-06-04 cs.HC cs.AI cs.SY eess.SY stat.ML 版本更新

Towards personalized human AI interaction - adapting the behavior of AI agents using neural signatures of subjective interest

迈向个性化的人工智能交互 - 利用主观兴趣的神经签名来适应AI代理的行为

Victor Shih, David C Jangraw, Paul Sajda, Sameer Saproo

AI总结 本文提出通过神经签名检测用户兴趣,使深度强化学习AI代理适应个性化人类偏好,首次展示hBCI在虚拟环境中隐式强化AI控制系统的应用。

Comments 11 pages, 9 figures, 1 table, Submitted to IEEE Trans. on Neural Networks and Learning Systems

详情
AI中文摘要

强化学习AI通常使用环境中的客观奖励/惩罚信号(如游戏得分、完成时间等)来学习最优任务策略。然而,此类AI代理的人机交互应包含隐式且主观的强化信号(如人类对特定AI行为的偏好),以适应个体化的人类偏好。这种适应会模仿自然发生的增强信任和舒适度的社会互动过程。本文展示如何利用混合脑机接口(hBCI)检测个体在虚拟环境中的兴趣水平,以适应控制虚拟自动驾驶车辆的深度强化学习AI代理。具体而言,我们展示AI学习了一种保持与前车安全距离的驾驶策略,并最值得注意的是,当车辆乘客遇到感兴趣物体时,优先减速。这种适应使主观有趣物体的观看时间增加了20%。这是首次展示如何利用hBCI以包含用户偏好的方式向AI代理提供隐式强化。

英文摘要

Reinforcement Learning AI commonly uses reward/penalty signals that are objective and explicit in an environment -- e.g. game score, completion time, etc. -- in order to learn the optimal strategy for task performance. However, Human-AI interaction for such AI agents should include additional reinforcement that is implicit and subjective -- e.g. human preferences for certain AI behavior -- in order to adapt the AI behavior to idiosyncratic human preferences. Such adaptations would mirror naturally occurring processes that increase trust and comfort during social interactions. Here, we show how a hybrid brain-computer-interface (hBCI), which detects an individual's level of interest in objects/events in a virtual environment, can be used to adapt the behavior of a Deep Reinforcement Learning AI agent that is controlling a virtual autonomous vehicle. Specifically, we show that the AI learns a driving strategy that maintains a safe distance from a lead vehicle, and most novelly, preferentially slows the vehicle when the human passengers of the vehicle encounter objects of interest. This adaptation affords an additional 20% viewing time for subjectively interesting objects. This is the first demonstration of how an hBCI can be used to provide implicit reinforcement to an AI agent in a way that incorporates user preferences into the control system.

1709.02555 2026-06-04 eess.SY cs.AI cs.LG cs.LO cs.SY 版本更新

Causality-Aided Falsification

因果辅助的反驳

Takumi Akazaki, Yoshihiro Kumazawa, Ichiro Hasuo

AI总结 本文提出利用因果信息提升异构系统质量保证中反驳效率的方法,通过贝叶斯网络优化成本函数实现高效输入值搜索。

Comments In Proceedings FVAV 2017, arXiv:1709.02126

详情
Journal ref
EPTCS 257, 2017, pp. 3-18
AI中文摘要

在异构系统的质量保证中,反例寻找因其复杂性超出了大多数验证技术的可扩展性而受到关注。本文提出在反例寻找中引入因果辅助的概念:通过为反例求解器提供由贝叶斯网络表达的合适因果信息,使其依赖于特定成本函数的随机优化,可以高效地搜索反例输入值。我们的实验结果展示了该方法的可行性。

英文摘要

Falsification is drawing attention in quality assurance of heterogeneous systems whose complexities are beyond most verification techniques' scalability. In this paper we introduce the idea of causality aid in falsification: by providing a falsification solver -- that relies on stochastic optimization of a certain cost function -- with suitable causal information expressed by a Bayesian network, the search for a falsifying input value can be made efficient. Our experimental results show the idea's viability.

1709.02435 2026-06-04 cs.AI cs.LG cs.SE cs.SY eess.SY 版本更新

An Analysis of ISO 26262: Using Machine Learning Safely in Automotive Software

ISO 26262分析:在汽车软件中安全使用机器学习

Rick Salay, Rodrigo Queiroz, Krzysztof Czarnecki

AI总结 本文分析了在汽车软件中使用机器学习对ISO 26262安全生命周期的影响,并提出适应该标准以容纳机器学习的建议。

Comments 6 pages, 3 figures

详情
AI中文摘要

机器学习(ML)在高级驾驶辅助和自动驾驶功能中的作用日益增加;然而,其在安全认证方面的充分性仍存在争议。本文分析了将ML作为实现方法对ISO 26262安全生命周期的影响,并探讨了如何解决这些问题。我们随后提供了一套建议,说明如何调整标准以适应机器学习。

英文摘要

Machine learning (ML) plays an ever-increasing role in advanced automotive functionality for driver assistance and autonomous operation; however, its adequacy from the perspective of safety certification remains controversial. In this paper, we analyze the impacts that the use of ML as an implementation approach has on the ISO 26262 safety lifecycle and ask what could be done to address them. We then provide a set of recommendations on how to adapt the standard to accommodate ML.

1709.02126 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Proceedings First Workshop on Formal Verification of Autonomous Vehicles

第一届自动驾驶车辆形式验证研讨会论文集

Lukas Bulwahn, Maryam Kamali, Sven Linker

Affiliations * International Conference on integrated Formal Methods (iFM); EPTCS (Electronic Proceedings in Theoretical Computer Science)

AI总结 本文集聚焦自动驾驶车辆的形式验证,汇集了形式验证领域研究人员及控制理论、机器人学等领域的专家,探讨验证技术在自动驾驶开发中的应用。

详情
Journal ref
EPTCS 257, 2017
AI中文摘要

这些是2017年9月19日在意大利都灵举行的自动驾驶车辆形式验证研讨会的论文集,作为国际集成形式方法会议(iFM 2017)的附属研讨会。研讨会旨在汇集形式验证社区中开发用于自动驾驶车辆的形式方法的研究人员,以及在控制理论或机器人学等领域工作的研究人员,探讨验证技术在自动驾驶车辆设计与开发中的应用。

英文摘要

These are the proceedings of the workshop on Formal Verification of Autonomous Vehicles, held on September 19th, 2017 in Turin, Italy, as an affiliated workshop of the International Conference on integrated Formal Methods (iFM 2017). The aim of the workshop is to bring together researchers from the formal verification community who are developing formal methods for autonomous vehicles, as well as researchers working, e.g., in the area of control theory or robotics, who are interested in applying verification techniques to the design and development of autonomous vehicles.

1708.03800 2026-06-04 eess.SY cs.AI cs.SY math.OC 版本更新

Energy saving for building heating via a simple and efficient model-free control design: First steps with computer simulations

通过简单高效的模型无关控制设计实现建筑供暖节能:计算机仿真初步研究

Hassane Abouaïssa, Ola Alhaj Hasan, Cédric Join, Michel Fliess, Didier Defer

AI总结 本文提出一种无需数学描述的模型无关控制方法,通过计算机仿真展示其在建筑供暖节能中的有效性,并与经典PI控制器和基于平坦度的预测控制器进行对比。

Comments 21st International Conference on System Theory, Control and Computing, October 2017, Sinaia, Romania

详情
AI中文摘要

建筑供暖系统基于模型的节能控制在现有研究中面临严重的物理、数学和校准难题。本文通过新的模型无关控制设置来解决这一问题,其中无需任何数学描述。展示了多个有说服力的计算机仿真。提供了与经典PI控制器和基于平坦度的预测控制器的比较。

英文摘要

The model-based control of building heating systems for energy saving encounters severe physical, mathematical and calibration difficulties in the numerous attempts that have been published until now. This topic is addressed here via a new model-free control setting, where the need for any mathematical description disappears. Several convincing computer simulations are presented. Comparisons with classic PI controllers and flatness-based predictive control are provided.

1708.08133 2026-06-04 q-bio.NC cs.AI cs.SY eess.SY math.DS 版本更新

Methods for applying the Neural Engineering Framework to neuromorphic hardware

将神经工程框架应用于类脑硬件的方法

Aaron R. Voelker, Chris Eliasmith

AI总结 本文介绍了应用于最新类脑硬件的神经工程框架方法,重点在于实现线性和非线性动力系统,并考虑非理想混合模拟-数字突触的高阶动力学。

Comments 11 pages, no figures

详情
AI中文摘要

我们回顾了当前用于将神经工程框架应用于最新类脑硬件的软件工具和理论方法。这些方法可用于实现利用轴突传导时间延迟的线性和非线性动力系统,并完全考虑具有异质时间常数的非理想混合模拟-数字突触。这总结了之前在更生物学背景下讨论过的这些方法版本(Voelker & Eliasmith, 2017)或针对特定类脑架构讨论过的版本(Voelker et al., 2017)。

英文摘要

We review our current software tools and theoretical methods for applying the Neural Engineering Framework to state-of-the-art neuromorphic hardware. These methods can be used to implement linear and nonlinear dynamical systems that exploit axonal transmission time-delays, and to fully account for nonideal mixed-analog-digital synapses that exhibit higher-order dynamics with heterogeneous time-constants. This summarizes earlier versions of these methods that have been discussed in a more biological context (Voelker & Eliasmith, 2017) or regarding a specific neuromorphic architecture (Voelker et al., 2017).

1708.01925 2026-06-04 cs.MA cs.AI cs.CY cs.RO cs.SY eess.SY 版本更新

Designing Autonomous Vehicles: Evaluating the Role of Human Emotions and Social Norms

设计自动驾驶车辆:评估人类情感与社会规范的作用

Faisal Riaz, Muaz A. Niazi

AI总结 本文提出通过引入社会规范合规机制,使自动驾驶车辆遵循道路与社会规则,利用模糊逻辑和情绪计算提升决策能力,通过模拟验证其在减少碰撞方面的有效性。

Comments 42 pages, 12 figures

详情
AI中文摘要

人类即将在未来不久将驾驶权利委托给自动驾驶车辆。然而,为完成这一复杂任务,需要一种机制,迫使自动驾驶车辆遵守由良好驾驶者实践的道路和社会规则。此任务可通过在自动驾驶车辆中引入社会规范合规机制来实现。本文提出一个自动驾驶车辆的人工社会作为人类社会的类比。每个AV被分配了具有不同社会影响的社会性格。社会规范被引入,帮助AV在受情绪影响的情况下做出道路避障决策。此外,通过基于前景的情绪(即恐惧)的社交规范合规机制,利用模糊逻辑计算情绪,并通过SimConnect方法将恐惧的模糊值提供给Netlogo模拟环境,以模拟自动驾驶车辆的人工社会。通过行为空间工具进行了广泛的测试,以确定所提出方法在碰撞数量方面的性能。此外,还提出了基于随机漫步模型的人工社会作为比较。与随机漫步的比较证明,所提出的方法为未来自动驾驶车辆的自动驾驶系统提供了更好的选择,这些系统在安全道路旅行方面将更具社会接受性和信任度。

英文摘要

Humans are going to delegate the right to drive to autonomous vehicles in the near future. However, to fulfill this complicated task, there is a need for a mechanism that forces autonomous vehicles to obey the road and social rules practiced by well-behaved drivers. This task can be achieved by introducing a social norms compliance mechanism in autonomous vehicles. This research paper proposes an artificial society of autonomous vehicles (AVs) as an analogy of human society. Each AV is assigned a social personality with a different social influence. Social norms are introduced to help the AVs make decisions, influenced by emotions, regarding road collision avoidance. Furthermore, a social norms compliance mechanism for artificial social AVs is proposed using a prospect-based emotion, i.e. fear, which is derived from the OCC model. Fuzzy logic is employed to compute the emotions quantitatively. Then, using the SimConnect approach, fuzzy values of fear are provided to the NetLogo simulation environment to simulate the artificial society of AVs. Extensive testing has been performed using the behavior space tool to measure the performance of the proposed approach in terms of the number of collisions. For comparison, a random-walk-based artificial society of AVs has been proposed as well. The comparative study with the random walk shows that the proposed approach provides a better option for tailoring the autopilots of future AVs, which will be more socially acceptable to and trusted by their riders in terms of safe road travel.

1610.05984 2026-06-04 cs.NE cs.AI cs.LG cs.SY eess.SY 版本更新

Particle Swarm Optimization for Generating Interpretable Fuzzy Reinforcement Learning Policies

粒子群优化用于生成可解释的模糊强化学习策略

Daniel Hein, Alexander Hentschel, Thomas Runkler, Steffen Udluft

AI总结 本文提出一种基于模糊粒子群强化学习(FPSRL)的方法,通过训练参数在模拟真实系统动态的世界模型上生成可解释的模糊强化学习策略,适用于无法进行在线学习的领域。

详情
Journal ref
Engineering Applications of Artificial Intelligence, Volume 65C, October 2017, Pages 87-98
AI中文摘要

模糊控制器是用于连续状态和动作空间的有效且可解释的系统控制器。到目前为止,此类控制器要么是手动构建的,要么是通过使用专家生成的问题特定成本函数或结合详细的最优控制策略知识自动训练的。在大多数现实世界的强化学习(RL)问题中,这两种要求都不存在。在这些应用中,由于在线学习需要在策略训练期间探索问题的动力学,因此通常禁止在线学习。我们引入了一种模糊粒子群强化学习(FPSRL)方法,该方法仅通过在模拟真实系统动态的世界模型上训练参数来构建模糊RL策略。这些世界模型是通过使用之前生成的转换样本的自主机器学习技术创建的。据我们所知,这种方法是首次将自组织模糊控制器与基于模型的批量RL相关联的。因此,FPSRL旨在解决那些禁止在线学习、系统动态相对容易从先前生成的默认策略转换样本中建模,并且预计存在相对易于解释的控制策略的领域的问题。通过使用三个标准RL基准,即山车、平衡小车和小车摆起,证明了所提出方法在这些领域中的效率。我们的实验结果展示了高性能且可解释的模糊策略。

英文摘要

Fuzzy controllers are efficient and interpretable system controllers for continuous state and action spaces. To date, such controllers have been constructed manually or trained automatically either using expert-generated problem-specific cost functions or incorporating detailed knowledge about the optimal control strategy. Both requirements for automatic training processes are not found in most real-world reinforcement learning (RL) problems. In such applications, online learning is often prohibited for safety reasons because online learning requires exploration of the problem's dynamics during policy training. We introduce a fuzzy particle swarm reinforcement learning (FPSRL) approach that can construct fuzzy RL policies solely by training parameters on world models that simulate real system dynamics. These world models are created by employing an autonomous machine learning technique that uses previously generated transition samples of a real system. To the best of our knowledge, this approach is the first to relate self-organizing fuzzy controllers to model-based batch RL. Therefore, FPSRL is intended to solve problems in domains where online learning is prohibited, system dynamics are relatively easy to model from previously generated default policy transition samples, and it is expected that a relatively easily interpretable control policy exists. The efficiency of the proposed approach with problems from such domains is demonstrated using three standard RL benchmarks, i.e., mountain car, cart-pole balancing, and cart-pole swing-up. Our experimental results demonstrate high-performing, interpretable fuzzy policies.

1708.03366 2026-06-04 cs.LG cs.AI cs.CR cs.SY eess.SY 版本更新

Resilient Linear Classification: An Approach to Deal with Attacks on Training Data

鲁棒线性分类:一种应对训练数据攻击的方法

Sangdon Park, James Weimer, Insup Lee

AI总结 本文提出一种鲁棒线性分类方法,通过引入多数约束,提高对抗训练数据攻击的鲁棒性,验证了传统算法在攻击下的脆弱性。

Comments Accepted as a conference paper at ICCPS17

详情
AI中文摘要

数据驱动技术用于控制自动驾驶车辆、处理能源管理的需求响应以及建模人体生理学用于医疗设备。这些技术从训练数据中提取模型,其性能通常基于训练数据中的随机误差进行分析。然而,如果训练数据被攻击者恶意篡改,这些攻击对数据驱动CPS底层学习算法的影响尚未被考虑。本文分析了分类算法对训练数据攻击的鲁棒性。具体而言,提出了一种通用度量标准,用于衡量分类算法对训练数据最坏情况篡改的鲁棒性。使用该度量标准,我们显示传统线性分类算法在受限条件下具有鲁棒性。为克服这些限制,我们提出了一种具有多数约束的线性分类算法,并证明其比传统算法更鲁棒。在合成数据和一个现实世界的回顾性心律失常医疗案例研究中的评估显示,传统算法对篡改的训练数据易受攻击,而所提算法更具鲁棒性(以最坏情况篡改衡量)。

英文摘要

Data-driven techniques are used in cyber-physical systems (CPS) for controlling autonomous vehicles, handling demand responses for energy management, and modeling human physiology for medical devices. These data-driven techniques extract models from training data, where their performance is often analyzed with respect to random errors in the training data. However, if the training data is maliciously altered by attackers, the effect of these attacks on the learning algorithms underpinning data-driven CPS has yet to be considered. In this paper, we analyze the resilience of classification algorithms to training data attacks. Specifically, a generic metric is proposed that is tailored to measure resilience of classification algorithms with respect to worst-case tampering of the training data. Using the metric, we show that traditional linear classification algorithms are resilient under restricted conditions. To overcome these limitations, we propose a linear classification algorithm with a majority constraint and prove that it is strictly more resilient than the traditional algorithms. Evaluations on both synthetic data and a real-world retrospective arrhythmia medical case-study show that the traditional algorithms are vulnerable to tampered training data, whereas the proposed algorithm is more resilient (as measured by worst-case tampering).

1608.02193 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Spacetimes with Semantics (III) - The Structure of Functional Knowledge Representation and Artificial Reasoning

具有语义的空间(III)- 功能知识表示与人工推理的结构

Mark Burgess

AI总结 本文探讨了知识表示作为语义系统的结构,基于承诺理论框架,提出概念、关联知识和情境意识的解释,强调语义时空属性对学习和智能系统的影响。

Comments 122 pages, building on parts I and II. Minor updates and corrections added to current version

详情
AI中文摘要

利用先前发展的语义时空概念,本文在承诺理论框架内探讨了知识表示及其结构作为语义系统的解释。通过为现象赋予解释,从观察者到被观察者,可以接近基于功能的系统简单描述,并具有直接实用价值。重点在于概念、关联知识和情境意识的解释。推断认为,大多数或所有这些概念源于纯粹的语义时空属性,这为更广泛理解学习或智能系统的构成提供了可能。一些关键原则浮现:1)时空尺度分离,2)四种不可约简关联类型的重复出现,通过意图传播:聚合、因果、合作和相似性,3)身份的辨别需求(离散),通过区分时间线同时性与顺序事件,4)学习(记忆)的能力。至少合理推测,涌现的知识抽象能力起源于基本的时空结构。这些笔记呈现了大部分已知结果的统一观点;它们使信息模型、知识表示、机器学习和语义网络(运输和信息基础)在共同框架下得以理解。'智能空间'的概念涵盖了人工系统和生物系统,跨越许多不同尺度,例如智能城市和组织。

英文摘要

Using the previously developed concepts of semantic spacetime, I explore the interpretation of knowledge representations, and their structure, as a semantic system, within the framework of promise theory. By assigning interpretations to phenomena, from observers to observed, we may approach a simple description of knowledge-based functional systems, with direct practical utility. The focus is especially on the interpretation of concepts, associative knowledge, and context awareness. The inference seems to be that most if not all of these concepts emerge from purely semantic spacetime properties, which opens the possibility for a more generalized understanding of what constitutes a learning, or even `intelligent' system. Some key principles emerge for effective knowledge representation: 1) separation of spacetime scales, 2) the recurrence of four irreducible types of association, by which intent propagates: aggregation, causation, cooperation, and similarity, 3) the need for discrimination of identities (discrete), which is assisted by distinguishing timeline simultaneity from sequential events, and 4) the ability to learn (memory). It is at least plausible that emergent knowledge abstraction capabilities have their origin in basic spacetime structures. These notes present a unified view of mostly well-known results; they allow us to see information models, knowledge representations, machine learning, and semantic networking (transport and information base) in a common framework. The notion of `smart spaces' thus encompasses artificial systems as well as living systems, across many different scales, e.g. smart cities and organizations.

1707.06334 2026-06-04 eess.SY cs.AI cs.IT cs.SY math.IT math.OC nlin.AO 版本更新

Fully Decentralized Policies for Multi-Agent Systems: An Information Theoretic Approach

多智能体系统的完全去中心化策略:信息论方法

Roel Dobbe, David Fridovich-Keil, Claire Tomlin

AI总结 本文提出基于信息论的框架,用于多智能体系统中无通信条件下的去中心化策略设计,通过率失真理论分析策略重建最优解的能力,并扩展至确定通信节点以提升个体策略性能。

详情
AI中文摘要

在多智能体系统中学习合作策略常受部分可观测性和协调不足的挑战。在某些情况下,问题结构允许通过有限通信实现分布式解决方案。本文考虑无通信可用的场景,学习所有智能体的局部策略以集体模拟集中多智能体静态优化问题的解。我们的主要贡献是基于率失真理论的信息论框架,该框架有助于分析所得到的完全去中心化策略重建最优解的能力。此外,该框架提供了一种自然扩展,以确定智能体应与哪些节点通信以提高其个体策略的性能。

英文摘要

Learning cooperative policies for multi-agent systems is often challenged by partial observability and a lack of coordination. In some settings, the structure of a problem allows a distributed solution with limited communication. Here, we consider a scenario where no communication is available, and instead we learn local policies for all agents that collectively mimic the solution to a centralized multi-agent static optimization problem. Our main contribution is an information theoretic framework based on rate distortion theory which facilitates analysis of how well the resulting fully decentralized policies are able to reconstruct the optimal solution. Moreover, this framework provides a natural extension that addresses which nodes an agent should communicate with to improve the performance of its individual policy.

1605.07246 2026-06-04 cs.LG cs.AI cs.NA math.NA 版本更新

Adaptive ADMM with Spectral Penalty Parameter Selection

自适应ADMM与谱惩罚参数选择

Zheng Xu, Mario A. T. Figueiredo, Tom Goldstein

AI总结 本文提出自适应ADMM算法,通过自适应调整惩罚参数实现快速收敛,提高算法鲁棒性与易用性。

Comments AISTATS 2017

详情
AI中文摘要

交替方向乘子法(ADMM)是一种解决广泛约束优化问题的 versatile 工具,适用于可微或非可微的目标函数。不幸的是,其性能高度敏感于惩罚参数,使ADMM往往不可靠且难以自动化。我们通过提出自适应调整惩罚参数的方法来克服这一缺点,得到的自适应ADMM(AADMM)算法受成功Barzilai-Borwein谱方法启发,实现快速收敛和对初始步长和问题规模的相对不敏感性。

英文摘要

The alternating direction method of multipliers (ADMM) is a versatile tool for solving a wide range of constrained optimization problems, with differentiable or non-differentiable objective functions. Unfortunately, its performance is highly sensitive to a penalty parameter, which makes ADMM often unreliable and hard to automate for a non-expert user. We tackle this weakness of ADMM by proposing a method to adaptively tune the penalty parameters to achieve fast convergence. The resulting adaptive ADMM (AADMM) algorithm, inspired by the successful Barzilai-Borwein spectral method for gradient descent, yields fast convergence and relative insensitivity to the initial stepsize and problem scaling.
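
The sensitivity to the penalty parameter mentioned in the abstract can be seen on a toy problem. The sketch below is plain fixed-penalty ADMM, not the adaptive AADMM rule from the paper: it solves a scalar lasso problem, min 0.5*(x - a)^2 + lam*|z| subject to x = z, whose solution is the soft-thresholded value of `a`. The function names, the stopping rule, and the test values are assumptions made for illustration.

```python
def soft_threshold(v, kappa):
    """Proximal operator of kappa * |.|: shrink v toward zero by kappa."""
    if v > kappa:
        return v - kappa
    if v < -kappa:
        return v + kappa
    return 0.0

def admm_scalar_lasso(a, lam, rho, tol=1e-8, max_iter=10000):
    """Fixed-penalty ADMM (scaled dual form) for
    min 0.5*(x - a)^2 + lam*|z|  subject to  x = z.
    Returns (z, number of iterations used)."""
    x = z = u = 0.0
    for k in range(1, max_iter + 1):
        x = (a + rho * (z - u)) / (1.0 + rho)   # x-update (quadratic term)
        z_old = z
        z = soft_threshold(x + u, lam / rho)    # z-update (l1 term)
        u += x - z                              # scaled dual update
        primal = abs(x - z)                     # primal residual
        dual = rho * abs(z - z_old)             # dual residual
        if primal < tol and dual < tol:
            return z, k
    return z, max_iter

# Same problem, three penalty values: the answer is the same,
# but the iteration count depends strongly on rho.
for rho in (0.01, 1.0, 100.0):
    z, iters = admm_scalar_lasso(a=3.0, lam=1.0, rho=rho)
    print(f"rho={rho:7.2f}  z={z:.6f}  iterations={iters}")
```

An adaptive scheme such as the one the abstract describes would adjust `rho` between iterations instead of requiring the user to pick it in advance; that adaptation step is deliberately left out here.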

1705.05065 2026-06-04 cs.RO cs.AI cs.CV cs.SY eess.SY 版本更新

AirSim: High-Fidelity Visual and Physical Simulation for Autonomous Vehicles

AirSim:面向自动驾驶车辆的高保真视觉与物理模拟

Shital Shah, Debadeepta Dey, Chris Lovett, Ashish Kapoor

AI总结 本文提出基于Unreal引擎的AirSim模拟器,用于高效开发和测试自动驾驶算法,支持高频率物理模拟和多种协议,通过四旋翼实验验证其有效性。

Comments Accepted for Field and Service Robotics conference 2017 (FSR 2017)

详情
AI中文摘要

为自动驾驶车辆开发和测试算法在现实世界中成本高且耗时。为利用最新机器智能和深度学习进展,需收集大量标注训练数据。本文提出基于Unreal引擎的新模拟器,提供真实的物理和视觉模拟。模拟器包含可实现实时硬件在环(HITL)模拟的物理引擎,支持MavLink等流行协议。模拟器从零开始设计,可扩展以适应新车辆类型、硬件平台和软件协议。模块化设计使各组件可独立用于其他项目。通过实现四旋翼自动驾驶车辆并实验性比较软件组件与真实飞行,验证了模拟器的有效性。

英文摘要

Developing and testing algorithms for autonomous vehicles in the real world is an expensive and time-consuming process. Also, in order to utilize recent advances in machine intelligence and deep learning we need to collect a large amount of annotated training data in a variety of conditions and environments. We present a new simulator built on Unreal Engine that offers physically and visually realistic simulations for both of these goals. Our simulator includes a physics engine that can operate at a high frequency for real-time hardware-in-the-loop (HITL) simulations with support for popular protocols (e.g. MavLink). The simulator is designed from the ground up to be extensible to accommodate new types of vehicles, hardware platforms and software protocols. In addition, the modular design enables various components to be easily usable independently in other projects. We demonstrate the simulator by first implementing a quadrotor as an autonomous vehicle and then experimentally comparing the software components with real-world flights.

1707.02515 2026-06-04 cs.AI cs.LG cs.SY eess.SY 版本更新

A Fast Integrated Planning and Control Framework for Autonomous Driving via Imitation Learning

一种通过模仿学习的快速集成规划与控制系统用于自动驾驶

Liting Sun, Cheng Peng, Wei Zhan, Masayoshi Tomizuka

AI总结 本文提出一种结合学习与优化方法的两层框架,通过神经网络学习长期最优策略并结合短期优化控制器提升自动驾驶的安全性和效率。

详情
AI中文摘要

为实现自动驾驶中的安全高效规划与控制,需要一种能够长期 horizon 内实现良好驾驶质量且保证安全可行的驾驶策略。基于优化的方法,如模型预测控制(MPC),可以提供此类最优策略,但其计算复杂度通常无法满足实时实现的需求。为解决此问题,我们提出了一种快速集成规划与控制系统,该系统通过在两层分层结构中结合学习与优化方法。第一层定义为“策略层”,由神经网络建立,学习由MPC生成的长期最优驾驶策略。第二层称为“执行层”,是一个基于优化的短期控制器,能够跟踪由“策略层”提供的参考轨迹,并保证短期的安全性和可行性。此外,通过高效且高度代表性的特征,小尺寸的神经网络足以处理许多复杂的驾驶场景。这使得在线模仿学习与数据集聚合(DAgger)成为可能,从而能够快速且持续地提升“策略层”的性能。几个驾驶场景的例子被演示以验证所提框架的有效性和效率。

英文摘要

For safe and efficient planning and control in autonomous driving, we need a driving policy which can achieve desirable driving quality over a long-term horizon with guaranteed safety and feasibility. Optimization-based approaches, such as Model Predictive Control (MPC), can provide such optimal policies, but their computational complexity is generally unacceptable for real-time implementation. To address this problem, we propose a fast integrated planning and control framework that combines learning- and optimization-based approaches in a two-layer hierarchical structure. The first layer, defined as the "policy layer", is established by a neural network which learns the long-term optimal driving policy generated by MPC. The second layer, called the "execution layer", is a short-term optimization-based controller that tracks the reference trajectories given by the "policy layer" with guaranteed short-term safety and feasibility. Moreover, with efficient and highly representative features, a small neural network is sufficient in the "policy layer" to handle many complicated driving scenarios. This makes online imitation learning with Dataset Aggregation (DAgger) feasible, so that the performance of the "policy layer" can be improved rapidly and continuously online. Several example driving scenarios are demonstrated to verify the effectiveness and efficiency of the proposed framework.

1706.09597 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Path Integral Networks: End-to-End Differentiable Optimal Control

路径积分网络:端到端可微最优控制

Masashi Okada, Luca Rigazio, Takenobu Aoshima

AI总结 本文提出路径积分网络(PI-Net),一种基于路径积分最优控制算法的递归网络表示,用于最优控制规划。PI-Net通过反向传播和随机梯度下降端到端学习系统动态和成本模型,具备规划能力,可泛化到未见状态,适用于连续控制任务,并支持多种学习方案。

详情
AI中文摘要

在本文中,我们介绍了路径积分网络(PI-Net),一种路径积分最优控制算法的递归网络表示。该网络包含系统动态和成本模型,用于基于最优控制的规划。PI-Net是完全可微的,通过反向传播和随机梯度下降端到端学习动态和成本模型。因此,PI-Net能够学习规划。PI-Net具有多个优点:它能够通过规划泛化到未见状态,可以应用于连续控制任务,并允许多种学习方案,包括模仿学习和强化学习。初步实验结果表明,通过模仿学习训练的PI-Net可以模仿两个模拟问题的控制演示:线性系统和倒摆上升问题。我们还展示了PI-Net能够学习演示中隐含的动态和成本模型。

英文摘要

In this paper, we introduce Path Integral Networks (PI-Net), a recurrent network representation of the Path Integral optimal control algorithm. The network includes both system dynamics and cost models, used for optimal control based planning. PI-Net is fully differentiable, learning both dynamics and cost models end-to-end by back-propagation and stochastic gradient descent. Because of this, PI-Net can learn to plan. PI-Net has several advantages: it can generalize to unseen states thanks to planning, it can be applied to continuous control tasks, and it allows for a wide variety of learning schemes, including imitation and reinforcement learning. Preliminary experiment results show that PI-Net, trained by imitation learning, can mimic control demonstrations for two simulated problems: a linear system and a pendulum swing-up problem. We also show that PI-Net is able to learn dynamics and cost models latent in the demonstrations.
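
For readers unfamiliar with path integral control, the core update can be sketched in a few lines. This is a generic illustration of the cost-weighted averaging step used in path-integral-style controllers, not PI-Net's network architecture: sampled control perturbations are averaged with weights proportional to exp(-cost / lam). The function name, the argument layout, and the temperature parameter `lam` are assumptions made for this sketch. Every operation in it is differentiable, which is the property the abstract relies on for end-to-end training.

```python
import math

def path_integral_update(u_nominal, perturbations, costs, lam=1.0):
    """One path-integral-style update of a control sequence.

    u_nominal:     list of T nominal controls
    perturbations: K lists, each with T sampled control perturbations
    costs:         K trajectory costs, one per perturbed rollout
    lam:           temperature; smaller values favor the cheapest rollouts
    Returns the updated list of T controls.
    """
    s_min = min(costs)  # subtract the minimum cost for numerical stability
    weights = [math.exp(-(s - s_min) / lam) for s in costs]
    total = sum(weights)
    weights = [w / total for w in weights]  # normalize to sum to 1
    return [
        u + sum(w * eps[t] for w, eps in zip(weights, perturbations))
        for t, u in enumerate(u_nominal)
    ]

# Two rollouts with opposite perturbations: the cheaper one dominates.
updated = path_integral_update(
    u_nominal=[0.0, 0.0],
    perturbations=[[1.0, 1.0], [-1.0, -1.0]],
    costs=[0.5, 3.0],
)
print(updated)
```

In a full controller the costs would come from rolling out a dynamics model and a cost model under each perturbed control sequence; in PI-Net, per the abstract, those two models are the learned components.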

1609.00932 2026-06-04 cs.LG cs.AI cs.SY eess.SY math.PR physics.data-an 版本更新

Spectral learning of dynamic systems from nonequilibrium data

从非平衡数据中学习动态系统的谱方法

Hao Wu, Frank Noé

AI总结 本文研究了在不假设数据同分布的情况下,通过施加平衡约束从非平衡观测数据中提取系统平衡动力学的谱学习特性,并提出了一种适用于连续数据的无bin扩展方法,实现线性复杂度下的稳定估计。

详情
Journal ref
Proceedings of the 29th conference on Neural Information Processing Systems (NIPS), Barcelona, Spain, 2016, pp. 4179-4187
AI中文摘要

可观测操作模型(OOMs)及相关模型是建模和分析随机系统的重要且强大的工具。它们精确描述有限秩系统的动力学,并可通过谱学习在假设数据同分布的情况下高效一致地估计。本文研究了在分析长时间尺度系统时不假设数据同分布的谱学习特性,并展示通过施加平衡约束可从非平衡观测数据中提取系统平衡动力学。此外,本文提出了一种适用于连续数据的无bin扩展谱学习方法。与其他连续值谱算法相比,无bin算法仅需线性复杂度即可实现平衡动力学的一致估计。

英文摘要

Observable operator models (OOMs) and related models are among the most important and powerful tools for modeling and analyzing stochastic systems. They exactly describe the dynamics of finite-rank systems and can be efficiently and consistently estimated through spectral learning under the assumption of identically distributed data. In this paper, we investigate the properties of spectral learning without this assumption, as required for analyzing large-time-scale systems, and show that the equilibrium dynamics of a system can be extracted from nonequilibrium observation data by imposing an equilibrium constraint. In addition, we propose a binless extension of spectral learning for continuous data. In comparison with the other continuous-valued spectral algorithms, the binless algorithm can achieve consistent estimation of equilibrium dynamics with only linear complexity.

1602.07764 2026-06-04 cs.AI cs.LG cs.NA math.NA math.OC stat.ML 版本更新

Reinforcement Learning of POMDPs using Spectral Methods

使用谱方法进行POMDP的强化学习

Kamyar Azizzadenesheli, Alessandro Lazaric, Animashree Anandkumar

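AI summary This paper proposes a reinforcement learning algorithm for POMDPs based on spectral decomposition methods: POMDP parameters are learned from trajectories, and an optimization oracle returns the optimal memoryless policy for the estimated model. The authors prove an order-optimal regret bound with respect to the optimal memoryless policy and efficient scaling in the dimensionality of the observation and action spaces.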

详情
Journal ref
29th Annual Conference on Learning Theory, PMLR 49:193-256, 2016
AI中文摘要

我们提出了一种新的基于谱分解方法的POMDP强化学习算法。尽管谱方法之前已被用于一致学习隐马尔可夫模型等被动潜在变量模型,但POMDP更具挑战性,因为学习者与环境交互可能会改变未来的观测。我们设计了一种通过回合运行的算法,每个回合中利用谱技术从由固定策略生成的轨迹中学习POMDP参数。回合结束时,优化 oracle 返回基于估计POMDP模型的最优无记忆规划策略,该策略最大化预期奖励。我们证明了与最优无记忆策略相比的最优 regret 绑定以及在观测和动作空间维度上的高效扩展性。

英文摘要

We propose a new reinforcement learning algorithm for partially observable Markov decision processes (POMDPs) based on spectral decomposition methods. While spectral methods have been previously employed for consistent learning of (passive) latent variable models such as hidden Markov models, POMDPs are more challenging since the learner interacts with the environment and possibly changes the future observations in the process. We devise a learning algorithm running through episodes; in each episode we employ spectral techniques to learn the POMDP parameters from a trajectory generated by a fixed policy. At the end of the episode, an optimization oracle returns the optimal memoryless planning policy which maximizes the expected reward based on the estimated POMDP model. We prove an order-optimal regret bound with respect to the optimal memoryless policy and efficient scaling with respect to the dimensionality of observation and action spaces.

1405.6341 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

Efficient Model Learning for Human-Robot Collaborative Tasks

高效的人机协作任务模型学习

Stefanos Nikolaidis, Keren Gu, Ramya Ramakrishnan, Julie Shah

AI总结 本文提出一种框架,通过联合动作演示学习人类用户模型,使机器人能自动计算稳健的协作策略。采用无监督学习聚类动作序列,学习逆强化学习奖励函数,并在混合可观测马尔可夫决策过程框架中应用,实现对新用户的类型推断和策略计算。

详情
Journal ref
Proceedings of the Tenth Annual ACM/IEEE International Conference on Human-Robot Interaction (HRI 2015)
AI中文摘要

我们提出了一种框架,用于从联合动作演示中学习人类用户模型,使机器人能够计算协作任务的稳健策略。学习过程完全自动,无需人工干预。首先,我们描述了使用无监督学习算法将演示的动作序列聚类为不同的人类类型。这些演示序列还被机器人用来通过逆强化学习算法学习代表每种类型的奖励函数。学习的模型随后作为混合可观测马尔可夫决策过程(MO-MDP)的一部分使用,其中人类类型是部分可观测变量。通过该框架,我们可以推断新用户类型(未包含在训练集中),并计算与新用户偏好一致且对人类动作偏离具有鲁棒性的机器人策略。最后,我们通过人类受试者实验数据验证了该方法,并进行了概念验证演示,其中一个人与小型工业机器人进行协作任务。

英文摘要

We present a framework for learning human user models from joint-action demonstrations that enables the robot to compute a robust policy for a collaborative task with a human. The learning takes place completely automatically, without any human intervention. First, we describe the clustering of demonstrated action sequences into different human types using an unsupervised learning algorithm. These demonstrated sequences are also used by the robot to learn a reward function that is representative for each type, through the employment of an inverse reinforcement learning algorithm. The learned model is then used as part of a Mixed Observability Markov Decision Process formulation, wherein the human type is a partially observable variable. With this framework, we can infer, either offline or online, the human type of a new user that was not included in the training set, and can compute a policy for the robot that will be aligned to the preference of this new user and will be robust to deviations of the human actions from prior demonstrations. Finally we validate the approach using data collected in human subject experiments, and conduct proof-of-concept demonstrations in which a person performs a collaborative task with a small industrial robot.

1702.07944 2026-06-04 cs.LG cs.AI cs.SY eess.SY math.OC stat.ML 版本更新

Stochastic Variance Reduction Methods for Policy Evaluation

基于随机方差缩减的方法用于策略评估

Simon S. Du, Jianshu Chen, Lihong Li, Lin Xiao, Dengyong Zhou

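AI summary This paper studies policy evaluation with linear function approximation over a fixed dataset. It recasts the empirical policy evaluation problem as a quadratic convex-concave saddle-point problem and presents a primal-dual batch gradient method and two stochastic variance reduction methods that scale linearly in sample size and feature dimension and achieve linear convergence.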

Comments Accepted by ICML 2017

详情
AI中文摘要

策略评估是强化学习中的关键步骤,用于估计在给定策略下状态长期价值的价值函数。本文聚焦于在固定数据集上使用线性函数逼近的策略评估。我们首先将经验策略评估问题转化为二次凸-凹鞍点问题,然后提出了一种对偶批量梯度方法,以及两种用于解决该问题的随机方差缩减方法。这些算法在样本大小和特征维度上均呈线性扩展。此外,即使当鞍点问题仅在对偶变量中具有强凹性而没有在原变量中具有强凸性时,它们仍能实现线性收敛。在基准问题上的数值实验验证了方法的有效性。

英文摘要

Policy evaluation is a crucial step in many reinforcement-learning procedures, which estimates a value function that predicts states' long-term value under a given policy. In this paper, we focus on policy evaluation with linear function approximation over a fixed dataset. We first transform the empirical policy evaluation problem into a (quadratic) convex-concave saddle point problem, and then present a primal-dual batch gradient method, as well as two stochastic variance reduction methods for solving the problem. These algorithms scale linearly in both sample size and feature dimension. Moreover, they achieve linear convergence even when the saddle-point problem has only strong concavity in the dual variables but no strong convexity in the primal variables. Numerical experiments on benchmark problems demonstrate the effectiveness of our methods.
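
The saddle-point reformulation mentioned in the abstract can be sketched as follows. The notation is an assumption, since the abstract does not give it: $\phi_t$ and $\phi'_t$ are the feature vectors of the current and next state, $r_t$ is the reward, $\gamma$ is the discount factor, $\theta$ is the primal (weight) vector and $w$ is the dual variable. Any regularization term the paper may use is omitted.

```latex
% Empirical quantities built from n logged transitions (assumed notation)
\hat{A} = \frac{1}{n}\sum_{t=1}^{n} \phi_t\,(\phi_t - \gamma\,\phi'_t)^\top, \qquad
\hat{b} = \frac{1}{n}\sum_{t=1}^{n} r_t\,\phi_t, \qquad
\hat{C} = \frac{1}{n}\sum_{t=1}^{n} \phi_t\,\phi_t^\top

% Quadratic objective and an equivalent convex-concave saddle-point form
\min_{\theta}\; \tfrac{1}{2}\,\bigl\|\hat{A}\theta - \hat{b}\bigr\|^2_{\hat{C}^{-1}}
\;=\;
\min_{\theta}\,\max_{w}\;\Bigl( w^\top(\hat{b} - \hat{A}\theta) - \tfrac{1}{2}\, w^\top \hat{C}\, w \Bigr)

% Check: for fixed \theta the inner maximizer is w^* = \hat{C}^{-1}(\hat{b} - \hat{A}\theta),
% and substituting it back gives \tfrac{1}{2}(\hat{b}-\hat{A}\theta)^\top \hat{C}^{-1} (\hat{b}-\hat{A}\theta).
```

Assuming $\hat{C}$ is positive definite, the objective is strongly concave in $w$ but only linear in $\theta$, which is the setting the abstract describes: strong concavity in the dual variables without strong convexity in the primal variables.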

1705.10432 2026-06-04 cs.AI cs.RO cs.SY eess.SY 版本更新

Fine-grained acceleration control for autonomous intersection management using deep reinforcement learning

基于深度强化学习的细粒度加速控制用于自动驾驶交叉口管理

Hamid Mirzaei, Tony Givargis

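AI summary This paper applies Trust Region Policy Optimization to fine-grained acceleration control of autonomous vehicles in a grid street plan in order to achieve a global design objective.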

Comments Accepted in IEEE Smart World Congress 2017

详情
AI中文摘要

近年来,结合深度学习和强化学习的进展为设计新的控制代理提供了有前景的路径,这些代理能够学习复杂控制任务的最优策略。这些新方法解决了传统强化学习方法的主要限制,如定制特征工程和小动作/状态空间维度要求。在本文中,我们利用一种最先进的强化学习方法,即信任区域策略优化,来解决自动驾驶车辆的交叉口管理问题。我们展示了使用该方法可以对自动驾驶车辆进行网格街道计划中的细粒度加速控制,以实现全局设计目标。

英文摘要

Recent advances in combining deep learning and Reinforcement Learning have shown a promising path for designing new control agents that can learn optimal policies for challenging control tasks. These new methods address the main limitations of conventional Reinforcement Learning methods such as customized feature engineering and small action/state space dimension requirements. In this paper, we leverage one of the state-of-the-art Reinforcement Learning methods, known as Trust Region Policy Optimization, to tackle intersection management for autonomous vehicles. We show that using this method, we can perform fine-grained acceleration control of autonomous vehicles in a grid street plan to achieve a global design objective.

1705.05116 2026-06-04 cs.RO cs.AI cs.CV cs.LG cs.SY eess.SY 版本更新

Tuning Modular Networks with Weighted Losses for Hand-Eye Coordination

通过加权损失调节模块网络以提升手眼协调

Fangyi Zhang, Jürgen Leitner, Michael Milford, Peter I. Corke

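AI summary This paper proposes an end-to-end fine-tuning method with weighted losses that improves the hand-eye coordination of modular deep visuo-motor policies on a robotic planar reaching task.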

Comments 2 pages, to appear in the Deep Learning for Robotic Vision (DLRV) Workshop in CVPR 2017

详情
AI中文摘要

本文介绍了一种端到端微调方法,用于改进模块化深度视觉-运动策略(模块网络)中的手眼协调能力,其中每个模块独立训练。得益于加权损失,该微调方法显著提升了策略在机器人平面抓取任务中的性能。

英文摘要

This paper introduces an end-to-end fine-tuning method to improve hand-eye coordination in modular deep visuo-motor policies (modular networks) where each module is trained independently. Benefiting from weighted losses, the fine-tuning method significantly improves the performance of the policies for a robotic planar reaching task.

1601.04037 2026-06-04 cs.RO cs.AI cs.SY eess.SY math.DS math.OC 版本更新

Funnel Libraries for Real-Time Robust Feedback Motion Planning

用于实时鲁棒反馈运动规划的 funnel 库

Anirudha Majumdar, Russ Tedrake

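AI summary This paper uses a pre-computed library of funnels for real-time robust feedback motion planning. The funnels are computed with convex optimization (sums-of-squares programming) and composed sequentially at runtime while keeping the robot safe. The method is validated in hardware experiments on a small fixed-wing airplane and in simulations of ground vehicle and quadrotor models in cluttered environments.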

Comments International Journal of Robotics Research (To Appear)

详情
AI中文摘要

我们考虑了在存在环境不确定性、参数模型不确定性和扰动时,生成保证成功的机器人运动计划的问题。此外,我们还考虑了必须在实时中生成这些计划的场景,因为环境中的约束(如障碍物)可能在运行时通过有噪声的传感器感知到。我们的方法是预先计算不同系统操作的“funnels”库,这些 funnels 确保在执行对应操作的反馈控制器时,状态在扰动范围内保持。我们利用凸优化(特别是求和平方编程)的强大计算能力来计算这些 funnels。所得到的 funnel 库然后在运行时被顺序组合以生成运动计划,同时确保机器人的安全性。本文的一个主要优势是通过显式考虑不确定性的影响,机器人可以根据运动计划对扰动的脆弱性来评估。我们通过大量硬件实验(在高速(约12英里/小时)下避障的小型固定翼飞机)和彻底的仿真实验(地面车辆和四旋翼模型在复杂环境中导航)来演示和验证我们的方法。据我们所知,这些演示构成了首次证明安全且鲁棒的控制方法,用于具有复杂非线性动力学的机器人系统,在具有复杂几何约束的环境中实时规划。

英文摘要

We consider the problem of generating motion plans for a robot that are guaranteed to succeed despite uncertainty in the environment, parametric model uncertainty, and disturbances. Furthermore, we consider scenarios where these plans must be generated in real-time, because constraints such as obstacles in the environment may not be known until they are perceived (with a noisy sensor) at runtime. Our approach is to pre-compute a library of "funnels" along different maneuvers of the system that the state is guaranteed to remain within (despite bounded disturbances) when the feedback controller corresponding to the maneuver is executed. We leverage powerful computational machinery from convex optimization (sums-of-squares programming in particular) to compute these funnels. The resulting funnel library is then used to sequentially compose motion plans at runtime while ensuring the safety of the robot. A major advantage of the work presented here is that by explicitly taking into account the effect of uncertainty, the robot can evaluate motion plans based on how vulnerable they are to disturbances. We demonstrate and validate our method using extensive hardware experiments on a small fixed-wing airplane avoiding obstacles at high speed (~12 mph), along with thorough simulation experiments of ground vehicle and quadrotor models navigating through cluttered environments. To our knowledge, these demonstrations constitute one of the first examples of provably safe and robust control for robotic systems with complex nonlinear dynamics that need to plan in real-time in environments with complex geometric constraints.

1704.03103 2026-06-04 cs.RO cs.AI cs.CG cs.SY eess.SY 版本更新

Minkowski Operations of Sets with Application to Robot Localization

Minkowski运算与机器人定位的应用

Benoit Desrochers, Luc Jaulin

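AI summary This paper introduces separators associated with the Minkowski sum and the Minkowski difference to efficiently solve the localization problem of a robot with sonar measurements in an unstructured environment, and illustrates the approach with a test case.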

Comments In Proceedings SNR 2017, arXiv:1704.02421

详情
Journal ref
EPTCS 247, 2017, pp. 34-45
AI中文摘要

本文展示使用分离器(由两个互补约束器组成)可以高效解决机器人在非结构化环境中基于声呐测量的定位问题。我们引入与Minkowski和与差相关的分离器以促进问题解决。通过测试案例说明了该方法的原理。

英文摘要

This paper shows that using separators, each of which is a pair of complementary contractors, we can easily and efficiently solve the localization problem of a robot with sonar measurements in an unstructured environment. We introduce separators associated with the Minkowski sum and the Minkowski difference in order to facilitate the resolution. A test case is given in order to illustrate the principle of the approach.
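
To make the two set operations concrete, here is a minimal sketch for the simplest case, closed intervals on the real line. It is not the separator machinery of the paper: the function names are hypothetical, and the Minkowski difference is taken in the erosion sense (all x such that x + B stays inside A), which is an assumed convention that the abstract does not state.

```python
from typing import Optional, Tuple

Interval = Tuple[float, float]  # closed interval [lo, hi] with lo <= hi


def minkowski_sum(a: Interval, b: Interval) -> Interval:
    """A + B = {x + y : x in A, y in B} for closed intervals."""
    return (a[0] + b[0], a[1] + b[1])


def minkowski_difference(a: Interval, b: Interval) -> Optional[Interval]:
    """A - B = {x : x + y in A for every y in B} (erosion convention).

    Returns None when the result is empty, i.e. when B is wider than A.
    """
    lo, hi = a[0] - b[0], a[1] - b[1]
    return (lo, hi) if lo <= hi else None


if __name__ == "__main__":
    A = (0.0, 10.0)
    B = (-1.0, 2.0)
    print(minkowski_sum(A, B))         # every reachable x + y
    print(minkowski_difference(A, B))  # every x that keeps x + B inside A
```

For intervals, adding B back to the difference recovers A whenever the difference is non-empty. That property does not hold for general sets.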

1704.01383 2026-06-04 eess.SY cs.AI cs.SY math.OC 版本更新

Finite-Time Stabilization of Longitudinal Control for Autonomous Vehicles via a Model-Free Approach

通过模型无关方法实现自动驾驶车辆纵向控制的有限时间稳定化

Philip Polack, Brigitte d'Andréa-Novel, Michel Fliess, Arnaud de la Fortelle, Lghani Menhour

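AI summary This paper presents a model-free longitudinal control approach for computing a vehicle's wheel torque command, which overcomes the problem of unknown vehicle parameters when generating a control law. Making a key parameter time-varying ensures finite-time stability. Simulations show smaller overshoots, increased driving comfort, and improved robustness to time delays.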

Comments IFAC 2017 World Congress, Toulouse

详情
AI中文摘要

本文介绍了一种用于计算车辆轮扭矩命令的纵向模型无关控制方法。该方法使我们能够克服未知车辆参数在生成合适控制律时的问题。该控制设置中的一个重要参数被设计为时间变化的,以确保有限时间稳定性。展示了多个有说服力的计算机仿真并进行了讨论。因此, overshoot 变得更小。驾驶舒适性得以提高,对时间延迟的鲁棒性也得到改善。

英文摘要

This communication presents a longitudinal model-free control approach for computing the wheel torque command to be applied on a vehicle. This setting enables us to overcome the problem of unknown vehicle parameters for generating a suitable control law. An important parameter in this control setting is made time-varying to ensure finite-time stability. Several convincing computer simulations are displayed and discussed. As a result, overshoots become smaller, driving comfort is increased, and robustness to time delays is improved.

1703.08262 2026-06-04 eess.SY cs.AI cs.FL cs.SY 版本更新

Supervisor Synthesis of POMDP based on Automata Learning

基于自动机学习的POMDP监督器合成

Xiaobin Zhang, Bo Wu, Hai Lin

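AI summary This paper proposes a supervisor synthesis framework for POMDPs with specifications given in Probabilistic Computation Tree Logic (PCTL). The controller is a type of deterministic finite automaton (za-DFA) whose structure is determined by an L*-based learning algorithm, with the aim of providing performance guarantees for safety-critical applications.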

详情
AI中文摘要

作为自主系统通用且流行的模型,部分可观测马尔可夫决策过程(POMDP)能够捕捉来自不同来源的不确定性,如传感噪声、执行误差和不确定环境。然而,其全面性使得在POMDP中的规划和控制困难。传统POMDP规划问题旨在寻找最优策略以最大化累积奖励的期望。但对安全关键应用而言,希望系统性能由形式规范描述的保证,这促使我们考虑使用形式方法来合成POMDP的监督器。给定由概率计算树逻辑(PCTL)定义的系统规范,我们提出一个具有确定性有限自动机(DFA)类型的监督控制框架,即za-DFA。尽管现有工作主要依赖优化技术来学习固定大小的有限状态控制器(FSCs),我们开发了一种基于L*学习的算法来确定za-DFA的空间和转换。定义了成员查询和不同的假设验证器。该学习算法是正确且完整的。通过详细步骤的例子来说明监督器合成算法。

英文摘要

As a general and thus popular model for autonomous systems, the partially observable Markov decision process (POMDP) can capture uncertainties from different sources such as sensing noise, actuation errors, and uncertain environments. However, its comprehensiveness makes planning and control in POMDPs difficult. Traditional POMDP planning problems aim to find the optimal policy that maximizes the expectation of accumulated rewards. For safety-critical applications, however, guarantees of system performance described by formal specifications are desired, which motivates us to consider formal methods to synthesize a supervisor for a POMDP. With system specifications given in Probabilistic Computation Tree Logic (PCTL), we propose a supervisory control framework with a type of deterministic finite automaton (DFA), the za-DFA, as the controller form. While existing work mainly relies on optimization techniques to learn fixed-size finite state controllers (FSCs), we develop an $L^*$-learning-based algorithm to determine both the state space and the transitions of the za-DFA. Membership queries and different oracles for conjectures are defined. The learning algorithm is sound and complete. An example is given in detailed steps to illustrate the supervisor synthesis algorithm.

1602.05450 2026-06-04 stat.ML cs.AI cs.MA cs.SY eess.SY 版本更新

Inverse Reinforcement Learning in Swarm Systems

群体系统中的逆强化学习

Adrian Šošić, Wasiur R. KhudaBukhsh, Abdelhak M. Zoubir, Heinz Koeppl

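AI summary This paper proposes the swarMDP framework, a sub-class of decentralized partially observable Markov decision processes endowed with a swarm characterization, and uses its homogeneity to reduce the multi-agent inverse reinforcement learning problem to a single-agent one. It also proposes a heterogeneous learning scheme tailored to the swarm setting. Results on two example systems show that the framework produces meaningful local reward models that replicate the observed global system dynamics.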

Comments 9 pages, 8 figures; version 2, accepted at AAMAS 2017

详情
AI中文摘要

逆强化学习(IRL)已成为从演示数据中学习行为模型的有用工具。然而,IRL在多智能体系统中的应用仍大多未被探索。本文展示了如何将IRL的原则扩展到同质大规模问题,受自然系统集体趋同行为的启发。特别地,我们对领域做出了以下贡献:1)我们引入了swarMDP框架,这是一种具有群体特性的去中心化部分可观测马尔可夫决策过程的子类。2)利用该框架固有的同质性,我们通过证明该模型中的智能体特定价值函数相等,将所产生的多智能体IRL问题减少为单智能体问题。3)为解决相应的控制问题,我们提出了一种特别针对群体设置的异构学习方案。在两个示例系统上的结果表明,我们的框架能够生成有意义的局部奖励模型,从而复制观察到的全局系统动态。

英文摘要

Inverse reinforcement learning (IRL) has become a useful tool for learning behavioral models from demonstration data. However, IRL remains mostly unexplored for multi-agent systems. In this paper, we show how the principle of IRL can be extended to homogeneous large-scale problems, inspired by the collective swarming behavior of natural systems. In particular, we make the following contributions to the field: 1) We introduce the swarMDP framework, a sub-class of decentralized partially observable Markov decision processes endowed with a swarm characterization. 2) Exploiting the inherent homogeneity of this framework, we reduce the resulting multi-agent IRL problem to a single-agent one by proving that the agent-specific value functions in this model coincide. 3) To solve the corresponding control problem, we propose a novel heterogeneous learning scheme that is particularly tailored to the swarm setting. Results on two example systems demonstrate that our framework is able to produce meaningful local reward models from which we can replicate the observed global system dynamics.

1701.08567 2026-06-04 econ.GN cs.AI q-fin.EC 版本更新

Decision structure of risky choice

风险选择的决策结构

Lamb Wubin, Naixin Ren

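AI summary This paper discusses building a unified theory of risky choice that explains both compensatory and non-compensatory theories. It argues that decision making is a process of simplifying a decision structure, and that this process takes different paths when a problem is faced repeatedly, which leads to preference reversal.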

Comments 13 pages

详情
AI中文摘要

众所周知,经济学家与心理学家在风险决策中存在争议。本文讨论构建统一的风险选择理论,解释补偿性和非补偿性理论。根据认知能力,人们无法构建连续且准确的主观概率世界,而是采用小、中、大概率等顺序概念。决策基于信息、经验、想象等因素,这些因素庞大,迫使人们制定策略。不同情境下采用不同策略,这些因素的分布具有不同决策结构。决策是简化决策结构的过程,但该过程并非固定路径,而是通过不同路径重复问题时发生变化,导致偏好反转。最有效的简化方法是计算期望值或基于一到两个维度做决策。我们还认为,决策时间至少包含四个部分:替代时间、一阶时间、二阶时间和计算时间。决策结构也能解释悖论和异常现象。JEL Codes: C10, D03, D81

英文摘要

As we know, there is a controversy between economists and psychologists about decision making under risk. We discuss building a unified theory of risky choice that would explain both compensatory and non-compensatory theories. For risky choice, we argue that, given the limits of cognitive ability, people cannot build a continuous and accurate subjective probability world, but only several ordinal concepts, such as small, middle and large probability. People make decisions based on information, experience, imagination and other things. These are so extensive that people have to prepare some strategies; that is, people use different strategies when facing different situations. The distributions of these things have different decision structures. More precisely, decision making is a process of simplifying the decision structure. However, this simplification does not follow a fixed path; it takes different paths when a problem is faced repeatedly. This is why preference reversal always happens when making decisions. The most efficient way to simplify the decision structure is to calculate the expected value or to make decisions based on one or two dimensions. We also argue that deliberation time has at least four parts: substitution time, first-order time, second-order time and calculation time. Decision structure can also simply explain the phenomenon of paradoxes and anomalies. JEL Codes: C10, D03, D81

1702.02628 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Optimal Detection of Faulty Traffic Sensors Used in Route Planning

用于路线规划的故障交通传感器最优检测

Amin Ghafouri, Aron Laszka, Abhishek Dubey, Xenofon Koutsoukos

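AI summary This paper proposes a detector for faulty traffic sensors that uses a prediction model based on Gaussian Processes, chooses the detector parameters to minimize losses due to false positives and false negatives in route planning, and evaluates the method on a real-world dataset.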

Comments Proceedings of The 2nd Workshop on Science of Smart City Operations and Platforms Engineering (SCOPE 2017), Pittsburgh, PA USA, April 2017, 6 pages

详情
AI中文摘要

在智能城市中,实时交通传感器可能被用于各种应用,如路线规划。不幸的是,传感器容易出现故障,导致错误的交通数据。错误的数据会严重影响路线规划等应用,并增加旅行时间。为最小化传感器故障的影响,必须及时准确地检测故障。然而,典型检测算法可能导致大量误报和漏报,从而导致次优的路线规划。本文提出了一种有效的检测器,利用基于高斯过程的预测模型来识别故障交通传感器。进一步,我们提出了一种计算检测器最佳参数的方法,以最小化由于误报和漏报造成的损失。我们还确定了关键传感器,其故障对路线规划应用影响较大。最后,我们实施了我们的方法,并使用真实世界数据集和路线规划平台OpenTripPlanner进行数值评估。

英文摘要

In a smart city, real-time traffic sensors may be deployed for various applications, such as route planning. Unfortunately, sensors are prone to failures, which result in erroneous traffic data. Erroneous data can adversely affect applications such as route planning, and can cause increased travel time. To minimize the impact of sensor failures, we must detect them promptly and accurately. However, typical detection algorithms may lead to a large number of false positives (i.e., false alarms) and false negatives (i.e., missed detections), which can result in suboptimal route planning. In this paper, we devise an effective detector for identifying faulty traffic sensors using a prediction model based on Gaussian Processes. Further, we present an approach for computing the optimal parameters of the detector which minimize losses due to false-positive and false-negative errors. We also characterize critical sensors, whose failure can have high impact on the route planning application. Finally, we implement our method and evaluate it numerically using a real-world dataset and the route planning platform OpenTripPlanner.

1701.01654 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Application of Fuzzy Logic in Design of Smart Washing Machine

模糊逻辑在智能洗衣机设计中的应用

Rao Farhat Masood

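AI summary This paper covers the design of a smart washing machine based on a fuzzy logic controller that automates the inputs to obtain the wash time, aiming at smart time management, better economy of electricity, and efficiency of work.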

Comments Fuzzy Washing Machine, Smart Washing Machine

详情
AI中文摘要

洗衣机在家庭中具有重要需求,因为它能减轻我们洗衣服的负担并节省大量时间。本文将探讨基于模糊逻辑控制器的智能洗衣机设计与开发。传统的洗衣机(定时器基于)使用多转定时器启动-停止机制,这种机制是机械的,容易损坏。除了启动和停止的问题外,机械定时器在维护和电力使用效率方面也存在问题。最近的发展表明,将数字电子技术融入该机器的最优功能是可能的,如今在实践中已实现。一些国际知名公司已通过引入智能人工智能开发了这种机器。此类机器利用传感器并智能计算主电机的运行时间(洗涤时间)。实时计算和处理也用于优化机器的运行时间。显然的结果是智能时间管理、更好的电力经济性和工作效率。本文探讨了基于模糊逻辑控制器的洗衣机的国产化,该洗衣机能够自动化输入并获得所需输出(洗涤时间)。

英文摘要

A washing machine is a great domestic necessity, as it frees us from the burden of washing our clothes and saves ample time. This paper covers the design and development of a fuzzy-logic-based smart washing machine. The regular (timer-based) washing machine uses a multi-turned timer-based start-stop mechanism, which is mechanical and prone to breakage. In addition to starting and stopping issues, mechanical timers are inefficient with respect to maintenance and electricity usage. Recent developments have shown that integrating digital electronics for the optimal functioning of this machine is possible and is nowadays done in practice. A number of internationally renowned companies have developed such machines with the introduction of smart artificial intelligence. Such a machine uses sensors and smartly calculates the run-time (washing time) for the main machine motor. Real-time calculations and processes are also used to optimize the run-time of the machine. The obvious result is smart time management, better economy of electricity and efficiency of work. This paper deals with the indigenization of an FLC (Fuzzy Logic Controller) based washing machine, which is capable of automating the inputs and producing the desired output (wash time).
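
As an illustration of how a fuzzy logic controller can turn sensor readings into a wash time, here is a minimal sketch. Everything specific in it is an assumption, because the abstract does not describe the controller: the two inputs (dirtiness and load, each on a 0 to 100 scale), the membership functions, the rule table, and the weighted-average (zero-order Sugeno) defuzzification are all hypothetical choices.

```python
def memberships(x: float) -> dict:
    """Degrees of membership in 'low', 'medium', 'high' for x in [0, 100]."""
    x = min(max(x, 0.0), 100.0)  # clamp out-of-range sensor readings
    return {
        "low": max(0.0, 1.0 - x / 50.0),
        "medium": max(0.0, 1.0 - abs(x - 50.0) / 50.0),
        "high": max(0.0, (x - 50.0) / 50.0),
    }


# Hypothetical rule table: (dirtiness level, load level) -> wash time in minutes.
RULES = {
    ("low", "low"): 10.0, ("low", "medium"): 15.0, ("low", "high"): 20.0,
    ("medium", "low"): 20.0, ("medium", "medium"): 30.0, ("medium", "high"): 40.0,
    ("high", "low"): 35.0, ("high", "medium"): 45.0, ("high", "high"): 60.0,
}


def wash_time(dirtiness: float, load: float) -> float:
    """Fire every rule with strength min(mu_dirt, mu_load), then average."""
    mu_dirt = memberships(dirtiness)
    mu_load = memberships(load)
    weighted_sum = 0.0
    total_strength = 0.0
    for (dirt_level, load_level), minutes in RULES.items():
        strength = min(mu_dirt[dirt_level], mu_load[load_level])  # fuzzy AND
        weighted_sum += strength * minutes
        total_strength += strength
    # At least one rule always fires because the memberships cover [0, 100].
    return weighted_sum / total_strength


if __name__ == "__main__":
    print(wash_time(80.0, 50.0))  # dirtier laundry at medium load
    print(wash_time(20.0, 50.0))  # cleaner laundry at medium load
```

The weighted average keeps the output continuous as the inputs change, which is the usual reason to prefer a fuzzy controller over a fixed mechanical timer setting.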

1703.03161 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Behavior-based Navigation of Mobile Robot in Unknown Environments Using Fuzzy Logic and Multi-Objective Optimization

基于模糊逻辑和多目标优化的未知环境中移动机器人行为导航

Thi Thanh Van Nguyen, Manh Duong Phung, Quang Vinh Tran

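AI summary This paper proposes the BBFM architecture, which uses fuzzy controllers and multi-objective optimization to coordinate behaviors for mobile robot navigation in unknown environments with obstacles and local minimum regions, and reports better accuracy, smoothness, traveled distance, and time response than some popular behavior-based architectures.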

详情
Journal ref
International Journal of Control and Automation, Vol. 10, No. 2 (2017), pp.349-364
AI中文摘要

本文提出一种名为BBFM的行为导航架构,用于解决在存在障碍物和局部极小值区域的未知环境中移动机器人的导航问题。在该架构中,复杂导航任务被分解为主要子任务或行为。每个行为由模糊控制器实现并独立执行以处理特定导航问题。模糊控制器被修改为仅包含模糊化和推理过程,使其输出表示行为的目标的隶属函数。所有控制器的隶属函数随后用作多目标优化过程的目标函数以协调所有行为。该过程的结果是整体控制信号,即帕累托最优的控制信号,用于控制机器人。进行了大量模拟、比较和实验。结果表明,所提出的架构在精度、平滑度、行驶距离和时间响应方面优于一些流行的基于行为的架构。

英文摘要

This study proposes a behavior-based navigation architecture, named BBFM, to deal with the problem of navigating a mobile robot in unknown environments in the presence of obstacles and local minimum regions. In the architecture, the complex navigation task is split into principal sub-tasks or behaviors. Each behavior is implemented by a fuzzy controller and executed independently to deal with a specific problem of navigation. The fuzzy controller is modified to contain only the fuzzification and inference procedures so that its output is a membership function representing the behavior's objective. The membership functions of all controllers are then used as the objective functions for a multi-objective optimization process to coordinate all behaviors. The result of this process is an overall control signal, which is Pareto-optimal, used to control the robot. A number of simulations, comparisons, and experiments were conducted. The results show that the proposed architecture outperforms some popular behavior-based architectures in terms of accuracy, smoothness, traveled distance, and time response.

1703.02810 2026-06-04 cs.AI cs.LG cs.SY eess.SY 版本更新

An Integrated and Scalable Platform for Proactive Event-Driven Traffic Management

主动事件驱动交通管理的集成可扩展平台

Alain Kibangou, Alexander Artikis, Evangelos Michelioudakis, Georgios Paliouras, Marius Schmitt, John Lygeros, Chris Baber, Natan Morar, Fabiana Fournier, Inna Skarbovsky

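AI summary This paper presents an integrated event-driven platform for traffic management that predicts congestion up to 4 minutes before it happens, enabling proactive decision making.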

详情
AI中文摘要

高速公路的交通可通过路障控制室的可变限速来管理。人类操作员无法高效管理多个可变限速装置。为此,本文提出一个智能交通管理平台,包含新的可变限速协调方案、高效的交互仪表盘、机器学习工具用于学习事件定义以及能够处理交通场景固有不确定性的复杂事件处理工具。与传统方法不同,该事件驱动平台可提前4分钟预测拥堵,从而实现主动决策,显著改善交通状况。

英文摘要

Traffic on freeways can be managed by means of ramp meters from Road Traffic Control rooms. Human operators cannot efficiently manage a network of ramp meters. To support them, we present an intelligent platform for traffic management which includes a new ramp metering coordination scheme in the decision-making module, an efficient dashboard for interacting with human operators, machine learning tools for learning event definitions, and Complex Event Processing tools able to deal with uncertainties inherent to the traffic use case. Unlike the usual approach, the devised event-driven platform is able to predict congestion up to 4 minutes before it actually happens. Proactive decision making can then be established, leading to significant improvement of traffic conditions.

1702.08726 2026-06-04 cs.SE cs.AI cs.SY eess.SY 版本更新

Stacked Thompson Bandits

堆叠汤普森老虎机

Lenz Belzner, Thomas Gabor

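AI summary Stacked Thompson Bandits evaluates plans by simulation and takes a Bayesian approach to guiding its search, efficiently generating plans that are likely to satisfy a given bounded temporal logic requirement.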

Comments Accepted at SEsCPS @ ICSE 2017

详情
AI中文摘要

我们介绍了堆叠汤普森老虎机(STB),用于高效生成很可能满足给定有界时间逻辑要求的计划。STB使用模拟评估计划,并采用贝叶斯方法利用所得信息指导其搜索。特别是,我们展示通过堆叠多臂老虎机并用汤普森采样指导每个老虎机的动作选择过程,使STB能够以高概率生成满足要求的计划,同时仅搜索搜索空间的一小部分。

英文摘要

We introduce Stacked Thompson Bandits (STB) for efficiently generating plans that are likely to satisfy a given bounded temporal logic requirement. STB uses a simulation for evaluation of plans, and takes a Bayesian approach to using the resulting information to guide its search. In particular, we show that stacking multiarmed bandits and using Thompson sampling to guide the action selection process for each bandit enables STB to generate plans that satisfy requirements with a high probability while only searching a fraction of the search space.
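
The idea of guiding plan search with Thompson sampling can be sketched as below. This is an illustration under stated assumptions, not the authors' STB algorithm: it assumes that "stacking" means one independent Bernoulli bandit per plan position, that the simulation returns only whether the requirement was satisfied, and that Beta posteriors are used. The names `stacked_thompson_plan_search`, `toy_simulate`, and `TARGET` are hypothetical.

```python
import random


def stacked_thompson_plan_search(horizon, n_actions, simulate, iterations, seed=0):
    """Search for a plan (list of action indices) that tends to satisfy a requirement."""
    rng = random.Random(seed)
    # One bandit per plan step; each arm holds Beta(wins, losses), starting at Beta(1, 1).
    wins = [[1] * n_actions for _ in range(horizon)]
    losses = [[1] * n_actions for _ in range(horizon)]

    for _ in range(iterations):
        plan = []
        for t in range(horizon):
            # Thompson sampling: draw one sample per arm, pick the largest.
            samples = [rng.betavariate(wins[t][a], losses[t][a]) for a in range(n_actions)]
            plan.append(max(range(n_actions), key=samples.__getitem__))
        satisfied = simulate(plan, rng)
        # Credit or blame every arm that contributed to this plan.
        for t, a in enumerate(plan):
            if satisfied:
                wins[t][a] += 1
            else:
                losses[t][a] += 1

    # Report the arm with the highest posterior mean at each step.
    return [
        max(range(n_actions), key=lambda a: wins[t][a] / (wins[t][a] + losses[t][a]))
        for t in range(horizon)
    ]


# Toy stand-in for a simulator: only one plan satisfies the requirement reliably.
TARGET = [1, 0, 1]


def toy_simulate(plan, rng):
    success_probability = 0.9 if plan == TARGET else 0.1
    return rng.random() < success_probability


if __name__ == "__main__":
    print(stacked_thompson_plan_search(3, 2, toy_simulate, iterations=3000))
```

Because sampling concentrates on arms whose posteriors look promising, most simulations are spent near good plans rather than spread uniformly over the whole plan space.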

1503.03467 2026-06-04 math.NA cs.AI cs.NA math.ST stat.ML stat.TH 版本更新

Multigrid with rough coefficients and Multiresolution operator decomposition from Hierarchical Information Games

多重网格与粗糙系数的多重分辨率算子分解来自层次信息博弈

Houman Owhadi

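AI summary This paper introduces a near-linear complexity multigrid/multiresolution method for PDEs with rough coefficients, discovered through a decision/game theory formulation, with rigorous a priori accuracy and performance estimates.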

Comments Presented at SIAM CSE 15. Final (published) version. http://epubs.siam.org/doi/abs/10.1137/15M1013894

详情
Journal ref
SIAM Rev. 59-1, pp. 99-149 (2017)
AI中文摘要

我们介绍了一种近线性复杂度(几何和无网格/代数)的多重网格/多重分辨率方法,用于具有粗糙(L^∞)系数的偏微分方程,具有严格的先验准确性和性能估计。该方法通过决策/博弈理论框架发现,解决三个问题:(1)识别限制和插值算子;(2)基于线性算子图像的范数约束恢复信号;(3)基于解的层次嵌套测量的赌注。所得基本赌注形成一个层次的(确定性)基函数集合H^1_0(Ω)(赌注函数),这些函数(1)在子尺度/子带之间关于由偏微分方程能量范数诱导的标量积正交;(2)在H^1_0(Ω)中实现解空间的稀疏压缩;(3)诱导一个正交的多重分辨率算子分解。多重网格方法的操作图是一个倒置金字塔,其中赌注函数局部计算(由于指数衰减),层次化(从细到粗尺度),并分解为具有均匀有界条件数的独立线性系统。所得算法在空间(通过局部化)和带宽/子尺度(子尺度可以独立计算)上均可并行化。尽管该方法是确定性的,但其在信息博弈框架下具有自然的贝叶斯解释,且多重分辨率逼近相对于由嵌套测量层次诱导的滤波器形成一个鞅。

英文摘要

We introduce a near-linear complexity (geometric and meshless/algebraic) multigrid/multiresolution method for PDEs with rough ($L^\infty$) coefficients with rigorous a priori accuracy and performance estimates. The method is discovered through a decision/game theory formulation of the problems of (1) identifying restriction and interpolation operators, (2) recovering a signal from incomplete measurements based on norm constraints on its image under a linear operator, and (3) gambling on the value of the solution of the PDE based on a hierarchy of nested measurements of its solution or source term. The resulting elementary gambles form a hierarchy of (deterministic) basis functions of $H^1_0(\Omega)$ (gamblets) that (1) are orthogonal across subscales/subbands with respect to the scalar product induced by the energy norm of the PDE, (2) enable sparse compression of the solution space in $H^1_0(\Omega)$, and (3) induce an orthogonal multiresolution operator decomposition. The operating diagram of the multigrid method is that of an inverted pyramid in which gamblets are computed locally (by virtue of their exponential decay) and hierarchically (from fine to coarse scales), and the PDE is decomposed into a hierarchy of independent linear systems with uniformly bounded condition numbers. The resulting algorithm is parallelizable both in space (via localization) and in bandwidth/subscale (subscales can be computed independently from each other). Although the method is deterministic, it has a natural Bayesian interpretation under the measure of probability emerging (as a mixed strategy) from the information game formulation, and multiresolution approximations form a martingale with respect to the filtration induced by the hierarchy of nested measurements.

1702.01205 2026-06-04 cs.AI cs.LG cs.SY eess.SY 版本更新

Traffic Lights with Auction-Based Controllers: Algorithms and Real-World Data

带拍卖机制的交通灯控制器:算法与现实数据

Shumeet Baluja, Michele Covell, Rahul Sukthankar

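AI summary This paper introduces an auction-based traffic light controller that uses bidding within micro-auctions to incorporate traffic sensor information. On real driving data from two cities, the learned controllers surpass currently deployed lights, optimized static-program lights, and longer-term planning approaches in road capacity and mean travel time.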

详情
AI中文摘要

实时优化交通流解决重要实际问题:减少驾驶员空闲时间、提高城市效率、减少气体排放和改善空气质量。当前交通灯优化研究多依赖扩展交通灯与其他交通设施的通信能力,但在此类能力普及前,可通过现有部署基础设施更响应当前交通状况来改进交通灯。本文介绍一种利用微拍卖进行竞价的交通灯控制器,无需其他外部信息源。我们在旧金山山景城和芝加哥river north社区的Android用户数月收集的大规模数据上训练和测试交通灯控制器。学习得到的拍卖机制控制器在两个城市中均在道路容量和平均出行时间等相关指标上超越了现有部署的交通灯、优化静态程序灯和长期规划方法,通过真实用户驾驶数据测量。

英文摘要

Real-time optimization of traffic flow addresses important practical problems: reducing a driver's wasted time, improving city-wide efficiency, reducing gas emissions and improving air quality. Much of the current research in traffic-light optimization relies on extending the capabilities of traffic lights to either communicate with each other or communicate with vehicles. However, before such capabilities become ubiquitous, opportunities exist to improve traffic lights by being more responsive to current traffic situations within the current, already deployed, infrastructure. In this paper, we introduce a traffic light controller that employs bidding within micro-auctions to efficiently incorporate traffic sensor information; no other outside sources of information are assumed. We train and test traffic light controllers on large-scale data collected from opted-in Android cell-phone users over a period of several months in Mountain View, California and the River North neighborhood of Chicago, Illinois. The learned auction-based controllers surpass (in both the relevant metrics of road-capacity and mean travel time) the currently deployed lights, optimized static-program lights, and longer-term planning approaches, in both cities, measured using real user driving data.

1701.01487 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Designing a Safe Autonomous Artificial Intelligence Agent based on Human Self-Regulation

基于人类自我调节设计安全的自主人工智能代理

Mark Muraven

AI总结 本文提出通过研究人类自我调节机制,设计安全的人工智能代理,以避免复杂系统中因目标不明确导致的有害后果。

Comments 17 pages

详情
AI中文摘要

目前,如何设计安全的人工智能代理成为研究重点。随着系统复杂性增加,不明确的目标或控制机制可能导致人工智能代理产生有害结果。因此,需要设计能够遵循初始编程意图的代理,即使在程序复杂性增加时也是如此。如何指定这些初始意图也是设计安全人工智能代理的障碍。最后,人工智能代理需要有冗余的安全机制,以确保任何编程错误不会升级为严重问题。人类是自主智能代理,已经避免了这些问题,本文认为通过理解人类自我调节和目标设定,可以更好地设计安全的人工智能代理。一些关于人类自我调节的一般原则被概述,并给出了针对人工智能设计的具体指导。

英文摘要

There is a growing focus on how to design safe artificially intelligent (AI) agents. As systems become more complex, poorly specified goals or control mechanisms may cause AI agents to produce unwanted and harmful outcomes. Thus it is necessary to design AI agents that follow initial programming intentions as the program grows in complexity. How to specify these initial intentions has also been an obstacle to designing safe AI agents. Finally, there is a need for the AI agent to have redundant safety mechanisms to ensure that any programming errors do not cascade into major problems. Humans are autonomous intelligent agents that have avoided these problems, and the present manuscript argues that by understanding human self-regulation and goal setting, we may be better able to design safe AI agents. Some general principles of human self-regulation are outlined and specific guidance for AI design is given.

1612.07059 2026-06-04 cs.AI cs.MA cs.SY eess.SY 版本更新

ARES: Adaptive Receding-Horizon Synthesis of Optimal Plans

ARES:自适应滚动 horizon 最优计划合成

Anna Lukina, Lukas Esterle, Christian Hirsch, Ezio Bartocci, Junxing Yang, Ashish Tiwari, Scott A. Smolka, Radu Grosu

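AI summary This paper introduces ARES, an approximation algorithm that generates optimal plans for Markov decision processes using Particle Swarm Optimization with adaptive sizing of both the receding horizon and the particle swarm. Inspired by Importance Splitting, it provides statistical convergence guarantees and is evaluated on bringing bird flocks into V-formation.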

Comments submitted to TACAS 2017

详情
AI中文摘要

我们介绍了ARES,一种高效的近似算法,用于生成最优计划(动作序列),将马尔可夫决策过程(MDP)的初始状态转换到成本低于指定(收敛)阈值的状态。ARES使用粒子群优化,具有自适应的滚动 horizon 和粒子群大小。受重要性分裂启发,滚动 horizon 的长度和粒子数量被选择,使得至少一个粒子达到下一层次状态,即成本从上一层次状态减少所需delta的状态。状态和计划的层次关系以及由ARES构造的计划隐式定义了Lyapunov函数和最优策略,分别可以通过对MDP中所有状态应用ARES,直到某些拓扑等价关系,显式生成。我们还通过统计评估ARES生成最优计划的成功率。ARES算法源于我们希望明确飞行V形是否是一种优化能量节约、清晰视野和速度对齐的鸟群策略。即,我们感兴趣的是是否能发现将鸟群从任意初始状态转换到单个连接V形状态的最优计划。对于7只鸟的鸟群,ARES能够在8000个随机初始配置中95%的情况下在63秒内生成V形飞行计划。ARES也可以轻松定制为具有自适应滚动 horizon 和统计收敛保证的模型预测控制器(MPC)。据我们所知,我们的自适应大小方法是首次在滚动 horizon 技术中提供收敛保证的方法。

英文摘要

We introduce ARES, an efficient approximation algorithm for generating optimal plans (action sequences) that take an initial state of a Markov Decision Process (MDP) to a state whose cost is below a specified (convergence) threshold. ARES uses Particle Swarm Optimization, with adaptive sizing for both the receding horizon and the particle swarm. Inspired by Importance Splitting, the length of the horizon and the number of particles are chosen such that at least one particle reaches a next-level state, that is, a state where the cost decreases by a required delta from the previous-level state. The level relation on states and the plans constructed by ARES implicitly define a Lyapunov function and an optimal policy, respectively, both of which could be explicitly generated by applying ARES to all states of the MDP, up to some topological equivalence relation. We also assess the effectiveness of ARES by statistically evaluating its rate of success in generating optimal plans. The ARES algorithm resulted from our desire to clarify if flying in V-formation is a flocking policy that optimizes energy conservation, clear view, and velocity alignment. That is, we were interested to see if one could find optimal plans that bring a flock from an arbitrary initial state to a state exhibiting a single connected V-formation. For flocks with 7 birds, ARES is able to generate a plan that leads to a V-formation in 95% of the 8,000 random initial configurations within 63 seconds, on average. ARES can also be easily customized into a model-predictive controller (MPC) with an adaptive receding horizon and statistical guarantees of convergence. To the best of our knowledge, our adaptive-sizing approach is the first to provide convergence guarantees in receding-horizon techniques.

1612.04933 2026-06-04 stat.ML cs.AI cs.LG cs.SY eess.SY 版本更新

Dynamical Kinds and their Discovery

动力学种类及其发现

Benjamin C. Jantzen

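AI summary This paper presents an algorithm that classifies causal systems into kinds sharing a common structure without first constructing an explicit dynamical model or using prior knowledge of the system dynamics, with applications in the development and validation of dynamical models.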

Comments Accepted for the proceedings of the Causation: Foundation to Application Workshop, UAI 2016

详情
AI中文摘要

我们展示了将因果系统分类为共享相同结构的种类的可能性,无需首先构建显式动态模型或使用系统动力学的先验知识。该算法能够确定任意系统是否由相同形式的因果关系支配,具有在动态模型开发和验证中的重要应用价值。从理论上看,这也是科学推理中从实证数据中推导定律的关键阶段。所提出的算法基于动态对称性方法来处理动态种类。时间对称性是指对系统的一个或多个变量进行干预,该干预与系统的时间演化过程可交换。动态种类是共享一组动态对称性的系统类。所提出的算法通过直接比较系统展示的对称性来分类确定性、时间依赖性的因果系统。使用来自多种非线性系统的模拟、噪声数据,我们证明该算法能够正确地将系统分类为动态种类。该算法在显著的采样误差下具有鲁棒性,对采样误差的非正态性不敏感,并在动态相似性增加时表现良好。所展示的算法是首个针对自动化科学发现这一方面的算法。

英文摘要

We demonstrate the possibility of classifying causal systems into kinds that share a common structure without first constructing an explicit dynamical model or using prior knowledge of the system dynamics. The algorithmic ability to determine whether arbitrary systems are governed by causal relations of the same form offers significant practical applications in the development and validation of dynamical models. It is also of theoretical interest as an essential stage in the scientific inference of laws from empirical data. The algorithm presented is based on the dynamical symmetry approach to dynamical kinds. A dynamical symmetry with respect to time is an intervention on one or more variables of a system that commutes with the time evolution of the system. A dynamical kind is a class of systems sharing a set of dynamical symmetries. The algorithm presented classifies deterministic, time-dependent causal systems by directly comparing their exhibited symmetries. Using simulated, noisy data from a variety of nonlinear systems, we show that this algorithm correctly sorts systems into dynamical kinds. It is robust under significant sampling error, is immune to violations of normality in sampling error, and fails gracefully with increasing dynamical similarity. The algorithm we demonstrate is the first to address this aspect of automated scientific discovery.
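
The definition of a dynamical symmetry, an intervention that commutes with time evolution, can be checked numerically on closed-form systems. The sketch below is not the classification algorithm from the paper. It only illustrates the definition, using two textbook growth models with assumed parameters (`r`, `k`) and hypothetical helper names.

```python
import math


def evolve_exponential(x0: float, t: float, r: float = 0.5) -> float:
    """Closed-form solution of dx/dt = r * x."""
    return x0 * math.exp(r * t)


def evolve_logistic(x0: float, t: float, r: float = 0.5, k: float = 10.0) -> float:
    """Closed-form solution of dx/dt = r * x * (1 - x / k)."""
    growth = math.exp(r * t)
    return k * x0 * growth / (k + x0 * (growth - 1.0))


def commutes(intervention, evolve, x0: float, t: float, tol: float = 1e-9) -> bool:
    """True if 'intervene, then evolve' matches 'evolve, then intervene'."""
    return abs(evolve(intervention(x0), t) - intervention(evolve(x0, t))) <= tol


def double(x: float) -> float:
    """Intervention that scales the state by a factor of two."""
    return 2.0 * x


if __name__ == "__main__":
    print(commutes(double, evolve_exponential, 1.0, 3.0))  # scaling vs. exponential growth
    print(commutes(double, evolve_logistic, 1.0, 3.0))     # scaling vs. logistic growth
```

Scaling the state is a symmetry of exponential growth but not of logistic growth, so under the abstract's definition the two systems do not share this symmetry and would fall into different dynamical kinds.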

1609.05258 2026-06-04 cs.RO cs.AI cs.CV cs.SY eess.SY 版本更新

The ACRV Picking Benchmark (APB): A Robotic Shelf Picking Benchmark to Foster Reproducible Research

ACRV 摘取基准 (APB):一个促进可重复研究的机器人货架摘取基准

Jürgen Leitner, Adam W. Tow, Jake E. Dean, Niko Suenderhauf, Joseph W. Durham, Matthew Cooper, Markus Eich, Christopher Lehnert, Ruben Mangels, Christopher McCool, Peter Kujala, Lachlan Nicholson, Trung Pham, James Sergeant, Liao Wu, Fangyi Zhang, Ben Upcroft, Peter Corke

AI总结 本文提出ACRV摘取基准(APB),通过42个常见物品、广泛可用的货架和精确的物品排列指南,提供可重复的机器人摘取基准,支持完整机器人系统的比较。

Comments 8 pages, submitted to RA:Letters

详情
AI中文摘要

机器人挑战如亚马逊摘取挑战(APC)或DARPA挑战是推动科学进步的重要方式。它们使研究在明确的基准上进行比较,所有参与者享有相同的测试条件。然而,此类挑战事件仅偶尔举行,参赛人数有限,且测试条件难以在主事件后复制。我们提出一个新的物理基准挑战:ACRV摘取基准(APB)。该基准设计为可重复,包含42个常见物品、广泛可用的货架和精确的物品排列指南。明确的评估协议使完整机器人系统(包括感知和操作)的比较成为可能,而不仅仅是子系统。本文还描述并报告了基于Baxter机器人开放基线系统的实验结果。

英文摘要

Robotic challenges like the Amazon Picking Challenge (APC) or the DARPA Challenges are an established and important way to drive scientific progress. They make research comparable on a well-defined benchmark with equal test conditions for all participants. However, such challenge events occur only occasionally, are limited to a small number of contestants, and the test conditions are very difficult to replicate after the main event. We present a new physical benchmark challenge for robotic picking: the ACRV Picking Benchmark (APB). Designed to be reproducible, it consists of a set of 42 common objects, a widely available shelf, and exact guidelines for object arrangement using stencils. A well-defined evaluation protocol enables the comparison of \emph{complete} robotic systems -- including perception and manipulation -- instead of sub-systems only. Our paper also describes and reports results achieved by an open baseline system based on a Baxter robot.

1612.04023 2026-06-04 eess.SY cs.AI cs.RO cs.SY 版本更新

Proceedings of the First Workshop on Verification and Validation of Cyber-Physical Systems

第一届网络物理系统验证与确认研讨会论文集

Mehdi Kargahi, Ashutosh Trivedi

发表机构 * Reykjavík, Iceland(冰岛雷克雅未克) MITL Specification Debugging for Monitoring of Cyber-Physical Systems(网络物理系统监控的MITL规格调试) Automatic Synthesis of Controllers from Specifications using Control Certificates(使用控制证书从规范自动合成控制器) A Compositional Framework for Preference-Aware Agents(偏好感知代理的组合框架) Output Feedback Controller Design with Symbolic Observers for Cyber-physical Systems(网络物理系统符号观测器输出反馈控制器设计) Towards an Approximate Conformance Relation for Hybrid I/O Automata(混合I/O自动机近似一致性关系) On Nonlinear Prices in Timed Automata(时序自动机中的非线性价格) Towards the Verification of Safety-critical Autonomous Systems in Dynamic Environments(动态环境中安全关键自主系统的验证)

AI总结 本文介绍了首届网络物理系统验证与确认研讨会,探讨了包括控制、仿真和形式化方法在内的验证与确认方法,旨在解决复杂软件和算法难以验证的问题。

详情
Journal ref
EPTCS 232, 2016
AI中文摘要

第一届网络物理系统验证与确认国际研讨会(V2CPS-16)与在冰岛雷克雅未克举行的第十二届集成形式化方法国际会议(iFM 2016)同期召开。该研讨会旨在汇集形式化验证和网络物理系统(CPS)领域的研究人员和专家,围绕研讨会主题展开讨论,即涵盖控制、仿真、形式化方法等(但不限于此)的各类验证与确认方法。CPS是网络化计算过程与物理过程的集成,二者之间存在有意义的相互作用:前者监测、控制并影响后者,后者也反过来影响前者。CPS广泛应用于机器人、交通、通信、基础设施、能源和制造等系统。许多安全关键系统,如化工过程、医疗设备、飞机飞行控制系统和汽车系统,都属于CPS。CPS的先进能力依赖于复杂的软件和综合算法,而这些软件和算法难以验证;事实上,该领域中的许多问题是不可判定的。因此,重要的一步是为这类系统找到特定的抽象,使其针对描述CPS部分或整体行为的特定性质可以通过算法加以验证。

英文摘要

The first International Workshop on Verification and Validation of Cyber-Physical Systems (V2CPS-16) was held in conjunction with the 12th International Conference on integration of Formal Methods (iFM 2016) in Reykjavik, Iceland. The purpose of V2CPS-16 was to bring together researchers and experts of the fields of formal verification and cyber-physical systems (CPS) to cover the theme of this workshop, namely a wide spectrum of verification and validation methods including (but not limited to) control, simulation, formal methods, etc. A CPS is an integration of networked computational and physical processes with meaningful inter-effects; the former monitors, controls, and affects the latter, while the latter also impacts the former. CPSs have applications in a wide range of systems spanning robotics, transportation, communication, infrastructure, energy, and manufacturing. Many safety-critical systems, such as chemical processes, medical devices, aircraft flight control, and automotive systems, are indeed CPSs. The advanced capabilities of CPSs require complex software and synthesis algorithms, which are hard to verify. In fact, many problems in this area are undecidable. Thus, a major step is to find particular abstractions of such systems which might be algorithmically verifiable regarding specific properties of such systems, describing the partial/overall behaviors of CPSs.

1612.02739 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

Controlling Robot Morphology from Incomplete Measurements

从不完整测量中控制机器人形态

Martin Pecka, Karel Zimmermann, Michal Reinštein, Tomáš Svoboda

AI总结 针对复杂形态机器人在城市搜索与救援任务中的地形穿越需求,提出通过自主控制处理不完整数据并确保安全性的方法。

Comments Accepted into IEEE Transactions to Industrial Electronics, Special Section on Motion Control for Novel Emerging Robotic Devices and Systems

详情
AI中文摘要

在城市搜索与救援(USAR)任务中,具有复杂形态的移动机器人对于穿越崎岖地形至关重要。由于遥操作复杂形态会给操作员带来很高的认知负担,形态由自主控制完成。自主控制需要测量机器人状态和周围地形,而地形通常只能被部分观测,因此数据往往不完整。我们将控制在缺失测量上进行边缘化,并评估一个显式的安全条件。如果安全条件被违反,则由安装在机身上的机械臂通过触觉地形探索来收集缺失数据。

英文摘要

Mobile robots with complex morphology are essential for traversing rough terrains in Urban Search & Rescue (USAR) missions. Since teleoperation of the complex morphology imposes a high cognitive load on the operator, the morphology is controlled autonomously. The autonomous control measures the robot state and the surrounding terrain, which is usually only partially observable, and thus the data are often incomplete. We marginalize the control over the missing measurements and evaluate an explicit safety condition. If the safety condition is violated, tactile terrain exploration by the body-mounted robotic arm gathers the missing data.

1612.01399 2026-06-04 eess.SY cs.AI cs.SY 版本更新

A New Type-II Fuzzy Logic Based Controller for Non-linear Dynamical Systems with Application to a 3-PSP Parallel Robot

一种新型类型-II 模糊逻辑控制器用于非线性动力学系统及其在3-PSP并联机器人中的应用

Hamid Reza Hassanzadeh

AI总结 本文提出一种基于类型-II 模糊逻辑的控制器,用于非线性动力学系统,应用于3-PSP并联机器人,以应对不确定性问题。

Comments Master's thesis

详情
AI中文摘要

几乎所有复杂系统都存在不确定性,并联机器人作为动力学机器人系统的典型实例也不例外。顾名思义,不确定性是超出人类认知的缺失信息,因此需要在控制过程中妥善处理,以尽量减小其副作用。在处理不确定性方面,类型-II 模糊逻辑已表现出优于传统模糊逻辑的能力。类型-II 模糊逻辑控制器是较新且更有前景的方法,由于其在出现噪声(不确定性的一个重要实例)时的显著作用,近年来已被应用于多个领域。在设计类型-I 模糊逻辑系统时,我们假定模糊隶属函数几乎是确定的,但这在许多情况下并不成立。因此,类型-II 模糊逻辑系统(T2FLS)作为面向实际应用的更现实的方法,可能大有可为。类型-II 模糊逻辑考虑了更高层次的不确定性,即类型-II 模糊变量的隶属度不再是一个确定的数值,而本身是一个类型-I 语言项。本论文研究不确定性对并联机器人动力学控制的影响。具体而言,旨在将类型-II 模糊逻辑范式纳入一种基于模型的控制器,即所谓的计算力矩控制方法,并将结果应用于一个3自由度并联机械手。

英文摘要

Uncertainty arises in almost any complex system, including parallel robots, an outstanding instance of dynamical robotic systems. As the name suggests, uncertainty is missing information that lies beyond human knowledge, so it must be handled properly to minimize its side effects on the control process. Type-II fuzzy logic has shown its superiority over traditional fuzzy logic when dealing with uncertainty. Type-II fuzzy logic controllers are newer and more promising approaches that have recently been applied to various fields due to their significant contribution, especially when noise (an important instance of uncertainty) emerges. During the design of Type-I fuzzy logic systems, we presume that we are almost certain about the fuzzy membership functions, which is not true in many cases. Thus T2FLS, as a more realistic approach to practical applications, might have a lot to offer. Type-II fuzzy logic takes into account a higher level of uncertainty; in other words, the membership grade for a type-II fuzzy variable is no longer a crisp number but is itself a type-I linguistic term. In this thesis the effects of uncertainty on the dynamic control of a parallel robot are considered. More specifically, it is intended to incorporate the Type-II Fuzzy Logic paradigm into a model-based controller, the so-called computed torque control method, and apply the result to a 3-degrees-of-freedom parallel manipulator. ...

1611.09926 2026-06-04 econ.GN cs.AI q-fin.EC 版本更新

Choquet integral in decision analysis - lessons from the axiomatization

Choquet积分在决策分析中的应用——从公理化分析中汲取教训

Mikhail Timonin

AI总结 本文探讨Choquet积分的公理化分析及其在决策分析中的应用,指出传统方法在学习过程中存在的假设问题,并提出新的状态依赖效用模型。

详情
AI中文摘要

Choquet积分是一种强大的聚合算子,包含许多已知模型作为特例。本文分析这些特例的公理化性质,并探讨其学习过程中的假设问题。传统方法常假设所有维度在同一尺度上,但此假设在实践中难以成立。本文讨论了状态依赖效用模型的条件,并展示了其与传统方法的不同之处。

英文摘要

The Choquet integral is a powerful aggregation operator which lists many well-known models as its special cases. We look at these special cases and provide their axiomatic analysis. In cases where an axiomatization has been previously given in the literature, we connect the existing results with the framework that we have developed. Next we turn to the question of learning, which is especially important for the practical applications of the model. So far, learning of the Choquet integral has been mostly confined to the learning of the capacity. Such an approach requires making a powerful assumption that all dimensions (e.g. criteria) are evaluated on the same scale, which is rarely justified in practice. Too often categorical data is given arbitrary numerical labels (e.g. AHP), and numerical data is considered cardinally and ordinally commensurate, sometimes after a simple normalization. Such approaches clearly lack scientific rigour, and yet they are commonly seen in all kinds of applications. We discuss the pros and cons of making such an assumption and look at the consequences which axiomatization uniqueness results have for the learning problems. Finally, we review some of the applications of the Choquet integral in decision analysis. Apart from MCDA, which is the main area of interest for our results, we also discuss how the model can be interpreted in the social choice context. We look in detail at the state-dependent utility, and show how comonotonicity, central to the previous axiomatizations, actually implies state-independency in the Choquet integral model. We also discuss the conditions required to have a meaningful state-dependent utility representation and show the novelty of our results compared to the previous methods of building state-dependent models.

1611.09809 2026-06-04 eess.SY cs.AI cs.SY math.OC nlin.CD 版本更新

Fractional Order Fuzzy Control of Hybrid Power System with Renewable Generation Using Chaotic PSO

分数阶模糊控制在含可再生能源混合电力系统中的应用:基于混沌PSO的优化

Indranil Pan, Saptarshi Das

AI总结 本文提出一种分数阶模糊控制方案,结合混沌PSO优化,提升混合电力系统在非线性工况下的性能与鲁棒性。

Comments 21 pages, 12 figures, 4 tables

详情
Journal ref
ISA Transactions, Volume 62, May 2016, Pages 19-29
AI中文摘要

本文通过一种新颖的模糊控制方案研究混合电力系统的运行。该混合电力系统包含多种自主发电系统,如风力涡轮机、太阳能光伏、柴油发动机、燃料电池、水电解器等。网络中还存在电池、飞轮和超级电容器等其他储能设备。本文采用了一种新颖的分数阶(FO)模糊控制方案,并利用结合两种混沌映射的粒子群优化(PSO)算法整定其参数,以获得更好的性能。该分数阶模糊控制器在线性和非线性运行工况下均优于传统PID控制器和整数阶模糊PID控制器。与其他控制器结构相比,该控制器在系统参数变化和速率约束非线性方面也表现出更强的鲁棒性。在这类场景中,鲁棒性是非常理想的特性,因为混合电力系统中的许多组件可能在不同时刻被接入或切除,或以较低或较高的功率输出运行。

英文摘要

This paper investigates the operation of a hybrid power system through a novel fuzzy control scheme. The hybrid power system employs various autonomous generation systems like wind turbine, solar photovoltaic, diesel engine, fuel-cell, aqua electrolyzer etc. Other energy storage devices like the battery, flywheel and ultra-capacitor are also present in the network. A novel fractional order (FO) fuzzy control scheme is employed and its parameters are tuned with a particle swarm optimization (PSO) algorithm augmented with two chaotic maps for achieving an improved performance. This FO fuzzy controller shows better performance over the classical PID, and the integer order fuzzy PID controller in both linear and nonlinear operating regimes. The FO fuzzy controller also shows stronger robustness properties against system parameter variation and rate constraint nonlinearity, than that with the other controller structures. The robustness is a highly desirable property in such a scenario since many components of the hybrid power system may be switched on/off or may run at lower/higher power output, at different time instants.

1611.09755 2026-06-04 eess.SY cs.AI cs.NE cs.SY math.OC 版本更新

Fractional Order AGC for Distributed Energy Resources Using Robust Optimization

分数阶AGC用于分布式能源资源的鲁棒优化

Indranil Pan, Saptarshi Das

AI总结 本文研究了分数阶自动发电控制在电力系统频率振荡阻尼中的应用,采用分布式能源发电,通过鲁棒优化技术优化控制器参数,提升系统性能。

Comments 12 pages, 16 figures, 5 tables

详情
Journal ref
IEEE Transactions on Smart Grid, Volume 7, Issue 5, Pages 2175 - 2186, Sept 2016
AI中文摘要

本文探讨了分数阶(FO)自动发电控制(AGC)在电力系统频率振荡阻尼中的适用性,采用分布式能源发电。混合电力系统包含多种自主发电系统,如风力涡轮机、太阳能光伏、柴油机、燃料电池和水电解器,以及电池和飞轮等其他储能设备。控制器位于远程位置,通过不可靠的通信网络发送和接收信号,具有随机延迟。控制器参数通过鲁棒优化技术优化,使用不同变种的粒子群优化(PSO)算法,并与相应最优解进行比较。采用基于档案的策略减少鲁棒优化方法的函数评估次数。通过鲁棒优化获得的解决方案能够处理控制器增益和阶数的更高变化,而系统性能不会显著下降。这从分数阶控制器实施的角度来看是有益的,因为设计能够容纳由于分数阶运算符近似不同实现方法和精度阶数而可能产生的系统参数变化。还比较了分数阶和整数阶(IO)控制器,以突出每种方案的优缺点。

英文摘要

The applicability of fractional order (FO) automatic generation control (AGC) for power system frequency oscillation damping is investigated in this paper, employing distributed energy generation. The hybrid power system employs various autonomous generation systems like wind turbine, solar photovoltaic, diesel engine, fuel-cell and aqua electrolyzer along with other energy storage devices like the battery and flywheel. The controller is placed in a remote location while receiving and sending signals over an unreliable communication network with stochastic delay. The controller parameters are tuned using robust optimization techniques employing different variants of Particle Swarm Optimization (PSO) and are compared with the corresponding optimal solutions. An archival based strategy is used for reducing the number of function evaluations for the robust optimization methods. The solutions obtained through the robust optimization are able to handle higher variation in the controller gains and orders without significant decrease in the system performance. This is desirable from the FO controller implementation point of view, as the design is able to accommodate variations in the system parameter which may result due to the approximation of FO operators, using different realization methods and order of accuracy. Also a comparison is made between the FO and the integer order (IO) controllers to highlight the merits and demerits of each scheme.

1611.03372 2026-06-04 cs.RO cs.AI cs.SE cs.SY eess.SY 版本更新

A stochastically verifiable autonomous control architecture with reasoning

一种具有推理能力的随机可验证自主控制架构

Paolo Izzo, Hongyang Qu, Sandor M. Veres

AI总结 本文提出一种具有推理能力的随机可验证自主控制架构LISA,通过将系统抽象为DTMC和MDP模型,实现代理与环境的概率验证,提升设计与运行时的验证效率。

Comments Accepted at IEEE Conf. Decision and Control, 2016

详情
AI中文摘要

本文介绍了一种名为有限指令集代理(LISA)的新代理架构,用于自主控制。该架构基于先前的AgentSpeak实现,结构比其前身更简单,旨在促进设计时和运行时的验证方法。研究并展示了将LISA系统抽象为两种不同的离散概率模型(DTMC和MDP)的过程。LISA系统为代理和环境的完整建模提供了工具,用于概率验证。代理程序可以自动编译为DTMC或MDP模型,并使用Prism工具进行验证。自动生成的Prism模型可用于设计时和运行时的验证。本文还研究并展示了运行时验证在LISA系统中作为内部建模机制,用于预测未来的结果。

英文摘要

A new agent architecture called Limited Instruction Set Agent (LISA) is introduced for autonomous control. The new architecture is based on previous implementations of AgentSpeak and it is structurally simpler than its predecessors with the aim of facilitating design-time and run-time verification methods. The process of abstracting the LISA system to two different types of discrete probabilistic models (DTMC and MDP) is investigated and illustrated. The LISA system provides a tool for complete modelling of the agent and the environment for probabilistic verification. The agent program can be automatically compiled into a DTMC or an MDP model for verification with Prism. The automatically generated Prism model can be used for both design-time and run-time verification. The run-time verification is investigated and illustrated in the LISA system as an internal modelling mechanism for prediction of future outcomes.

1610.08127 2026-06-04 cs.LG cs.AI cs.NA math.NA stat.ML 版本更新

Fast Bayesian Non-Negative Matrix Factorisation and Tri-Factorisation

快速的贝叶斯非负矩阵分解与三因子分解

Thomas Brouwer, Jes Frellsen, Pietro Lio'

AI总结 本文提出一种快速变分贝叶斯算法,用于非负矩阵分解和三因子分解,相比Gibbs采样和非概率方法,该方法在迭代和时间步收敛速度更快,且无需额外样本估计后验。

Comments NIPS 2016 Workshop on Advances in Approximate Bayesian Inference

详情
AI中文摘要

我们提出了一种快速的变分贝叶斯算法,用于执行非负矩阵分解和三因子分解。我们证明了我们的方法在每次迭代和时间步(墙钟时间)上的收敛速度比Gibbs采样和非概率方法更快,并且不需要额外的样本来估计后验。我们特别展示了对于矩阵三因子分解,收敛具有挑战性,但我们的变分贝叶斯方法提供了一种快速的解决方案,使三因子分解方法能够更有效地使用。

英文摘要

We present a fast variational Bayesian algorithm for performing non-negative matrix factorisation and tri-factorisation. We show that our approach achieves faster convergence per iteration and timestep (wall-clock) than Gibbs sampling and non-probabilistic approaches, and does not require additional samples to estimate the posterior. We show that in particular for matrix tri-factorisation convergence is difficult, but our variational Bayesian approach offers a fast solution, allowing the tri-factorisation approach to be used more effectively.

1610.03518 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

Transfer from Simulation to Real World through Learning Deep Inverse Dynamics Model

通过学习深度逆动力学模型实现仿真到现实世界的迁移

Paul Christiano, Zain Shah, Igor Mordatch, Jonas Schneider, Trevor Blackwell, Joshua Tobin, Pieter Abbeel, Wojciech Zaremba

AI总结 本文提出通过学习深度逆动力学模型,在仿真与现实世界之间实现控制策略的迁移,解决仿真与现实差异导致的性能下降问题。

详情
AI中文摘要

在仿真中开发控制策略通常比直接在现实世界中进行实验更实用、更安全。这适用于通过规划和优化得到的策略,更适用于通过强化学习得到的策略,因为后者通常需要大量数据。然而,在仿真中成功的策略部署到真实机器人上时往往无法工作。尽管如此,策略在仿真中所做事情的总体思路在现实世界中通常仍然有效。本文研究的正是这类场景:仿真中经历的状态序列对现实世界仍然合理,即使控制的细节并不适用,例如当关键差异在于摩擦、接触、质量和几何属性的细节时。在执行过程中,我们的方法在每个时间步计算基于仿真的控制策略会采取的动作,但并不在真实机器人上执行这些控制,而是计算仿真所预期的下一个状态,然后依靠学习到的深度逆动力学模型来决定哪个现实世界动作最适合到达这些状态。深度模型的好坏取决于其训练数据,因此我们还提出了一种数据收集方法,用于(增量地)学习深度逆动力学模型。实验表明,与为处理仿真到现实世界的模型差异而提出的各种基线方法(包括输出误差控制和高斯动力学自适应)相比,我们的方法表现更优。

英文摘要

Developing control policies in simulation is often more practical and safer than directly running experiments in the real world. This applies to policies obtained from planning and optimization, and even more so to policies obtained from reinforcement learning, which is often very data demanding. However, a policy that succeeds in simulation often doesn't work when deployed on a real robot. Nevertheless, often the overall gist of what the policy does in simulation remains valid in the real world. In this paper we investigate such settings, where the sequence of states traversed in simulation remains reasonable for the real world, even if the details of the controls are not, as could be the case when the key differences lie in detailed friction, contact, mass and geometry properties. During execution, at each time step our approach computes what the simulation-based control policy would do, but then, rather than executing these controls on the real robot, our approach computes what the simulation expects the resulting next state(s) will be, and then relies on a learned deep inverse dynamics model to decide which real-world action is most suitable to achieve those next states. Deep models are only as good as their training data, and we also propose an approach for data collection to (incrementally) learn the deep inverse dynamics model. Our experiments show that our approach compares favorably with various baselines that have been developed for dealing with simulation to real world model discrepancy, including output error control and Gaussian dynamics adaptation.
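The execution loop described above (ask the simulator for the next state it expects, then let an inverse dynamics model pick the real-world action) can be sketched on a toy problem. Everything here is an assumption for illustration: a 1-D state, a hypothetical actuator mismatch (`real_step` applies only half the commanded control), and a linear least-squares fit standing in for the paper's deep inverse dynamics model.

```python
import numpy as np

def sim_step(x, u):
    """Simulator dynamics (assumed): the control is applied at full strength."""
    return x + u

def real_step(x, u):
    """'Real' dynamics (hypothetical mismatch): only half the control takes effect."""
    return x + 0.5 * u

def sim_policy(x, goal=1.0):
    """Policy tuned in simulation: move halfway to the goal each step."""
    return 0.5 * (goal - x)

def fit_inverse_dynamics(n_samples=200, seed=0):
    """Fit u ~ w0*x + w1*x_next from random real-world transitions (linear stand-in)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, n_samples)
    u = rng.uniform(-1.0, 1.0, n_samples)
    x_next = real_step(x, u)
    features = np.stack([x, x_next], axis=1)
    w, *_ = np.linalg.lstsq(features, u, rcond=None)
    return w

def run_on_real(w, x=0.0, n_steps=20):
    """At each step, target the state the simulator expects, not the simulator's control."""
    for _ in range(n_steps):
        x_target = sim_step(x, sim_policy(x))   # next state expected in simulation
        u_real = w[0] * x + w[1] * x_target     # action chosen by the inverse model
        x = real_step(x, u_real)
    return x

w = fit_inverse_dynamics()
print(w, run_on_real(w))
```

In this noise-free toy case the fitted model recovers the exact inverse of `real_step`, so the real trajectory tracks the simulated one; with a real robot the model would be approximate and would need the incremental data collection the abstract mentions.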

1610.00001 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Bacterial Foraging Optimized STATCOM for Stability Assessment in Power System

基于细菌觅食优化的STATCOM用于电力系统稳定性评估

Shiba R. Paital, Prakash K. Ray, Asit Mohanty, Sandipan Patra, Harishchandra Dubey

AI总结 本文研究了利用静态补偿器(STATCOM)改进单机连接无限母线(SMIB)电力系统稳定性,通过粒子群优化(PSO)和细菌觅食优化(BFO)优化PID控制器参数,与传统PID控制器进行比较,验证新方案在稳定性和电压调节上的鲁棒性。

Comments 5 pages, 7 figures, 2016 IEEE Students' Technology Symposium (TechSym 2016), At IIT Kharagpur, India

详情
AI中文摘要

本文提出了一种改进单机连接无限母线(SMIB)电力系统稳定性的方法,通过静态补偿器(STATCOM)中的比例-积分-微分(PID)控制器参数优化。PID控制器的增益通过基于粒子群优化(PSO)的启发式技术进行优化,同时采用细菌觅食优化(BFO)作为替代启发式方法来选择PID控制器的最佳增益。研究了上述软计算技术下STATCOM的性能,并在各种场景下与传统PID控制器进行比较。仿真结果伴有基于定量分析的性能指标。分析清楚地表明,与传统PID相比,新方案在稳定性和电压调节方面具有鲁棒性。

英文摘要

This paper presents a study of stability improvement in a single machine connected to infinite bus (SMIB) power system using a static compensator (STATCOM). The gains of the Proportional-Integral-Derivative (PID) controller in the STATCOM are optimized by a heuristic technique based on Particle Swarm Optimization (PSO). Further, Bacterial Foraging Optimization (BFO) is applied as an alternative heuristic method to select optimal gains of the PID controller. The performance of the STATCOM with the above soft-computing techniques is studied and compared with the conventional PID controller under various scenarios. The simulation results are accompanied by a quantitative analysis based on performance indices. The analysis clearly indicates the robustness of the new scheme in terms of stability and voltage regulation when compared with conventional PID.

1609.05960 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Incremental Sampling-based Motion Planners Using Policy Iteration Methods

基于增量采样的运动规划器使用策略迭代方法

Oktay Arslan, Panagiotis Tsiotras

AI总结 本文提出了一种基于策略迭代的运动规划算法,利用动态规划思想在随机图中求解最短路径问题,通过改进策略加速计算过程,适用于大规模并行化。

详情
AI中文摘要

随机化运动规划的最新进展催生了一类新的基于采样的算法,这些算法提供渐近最优性保证,其中最著名的是RRT*和PRM*算法。仔细分析表明,这些算法中所谓的“重连”(rewiring)步骤可以解释为局部策略迭代(PI)步骤(即局部策略评估步骤后接局部策略改进步骤),因此当样本数趋于无穷时,这两种算法几乎必然(以概率1)收敛到最优路径。策略迭代与价值迭代(VI)都是求解动态规划(DP)问题的常用方法。基于这一观察,最近提出的RRT#算法在每次迭代中对图中那些有可能成为最优路径一部分的顶点(即“有希望”的顶点)执行Bellman更新(又称“备份”)。RRT#算法由此利用了动态规划思想,并在随机生成的图上增量地实现这些思想,以获得高质量的解。在本文中,基于这一关键洞察,我们探索了另一类动态规划算法,用于求解由迭代采样方法生成的随机图上的最短路径问题。这类算法使用策略迭代而非价值迭代,因此更适合大规模并行化。与RRT*算法不同,重连步骤中的策略改进不只在局部进行,而是在当前迭代中被归类为“有希望”的顶点集合上进行,这往往会加快整个过程。由此得到的算法被恰当地命名为策略迭代-RRT#(PI-RRT#),它是一类受动态规划启发、利用PI方法的随机化运动规划新算法中的第一个。

英文摘要

Recent progress in randomized motion planners has led to the development of a new class of sampling-based algorithms that provide asymptotic optimality guarantees, notably the RRT* and the PRM* algorithms. Careful analysis reveals that the so-called "rewiring" step in these algorithms can be interpreted as a local policy iteration (PI) step (i.e., a local policy evaluation step followed by a local policy improvement step) so that asymptotically, as the number of samples tends to infinity, both algorithms converge to the optimal path almost surely (with probability 1). Policy iteration and value iteration (VI) are common methods for solving dynamic programming (DP) problems. Based on this observation, recently, the RRT$^{\#}$ algorithm has been proposed, which performs, during each iteration, Bellman updates (aka "backups") on those vertices of the graph that have the potential of being part of the optimal path (i.e., the "promising" vertices). The RRT$^{\#}$ algorithm thus utilizes dynamic programming ideas and implements them incrementally on randomly generated graphs to obtain high quality solutions. In this work, and based on this key insight, we explore a different class of dynamic programming algorithms for solving shortest-path problems on random graphs generated by iterative sampling methods. This class of algorithms utilizes policy iteration instead of value iteration, and thus is better suited for massive parallelization. Contrary to the RRT* algorithm, the policy improvement during the rewiring step is not performed only locally but rather on a set of vertices that are classified as "promising" during the current iteration. This tends to speed up the whole process. The resulting algorithm, aptly named Policy Iteration-RRT$^{\#}$ (PI-RRT$^{\#}$), is the first of a new class of DP-inspired algorithms for randomized motion planning that utilize PI methods.
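The two-phase structure the abstract refers to (policy evaluation followed by policy improvement) can be sketched as generic policy iteration for a shortest-path problem on a small fixed graph. This is not PI-RRT$^{\#}$ itself: there is no sampling, no "promising" vertex set, and the graph, node names, and helper functions are hypothetical.

```python
import math

def evaluate(policy, graph, goal):
    """Policy evaluation: cost-to-go of following `policy` from every node."""
    cost_to_go = {}
    for start in graph:
        cost, node, seen = 0.0, start, set()
        while node != goal:
            if node in seen or policy.get(node) is None:
                cost = math.inf          # the policy never reaches the goal from here
                break
            seen.add(node)
            nxt = policy[node]
            cost += graph[node][nxt]
            node = nxt
        cost_to_go[start] = cost
    return cost_to_go

def policy_iteration(graph, goal, policy):
    """Alternate evaluation and greedy improvement until the policy stops changing."""
    policy = dict(policy)
    while True:
        cost_to_go = evaluate(policy, graph, goal)
        changed = False
        for node, edges in graph.items():
            if node == goal or not edges:
                continue
            # Policy improvement: pick the successor minimizing edge cost + cost-to-go.
            best = min(edges, key=lambda v: edges[v] + cost_to_go[v])
            if edges[best] + cost_to_go[best] < cost_to_go[node]:
                policy[node] = best
                changed = True
        if not changed:
            return policy, cost_to_go

# Hypothetical graph: node -> {successor: positive edge cost}.
graph = {"s": {"a": 1.0, "b": 4.0}, "a": {"b": 1.0, "g": 5.0}, "b": {"g": 1.0}, "g": {}}
initial_policy = {"s": "b", "a": "g", "b": "g"}
policy, cost_to_go = policy_iteration(graph, "g", initial_policy)
print(policy, cost_to_go["s"])
```

Because the improvement step for each node depends only on the evaluated costs from the previous phase, the per-node updates are independent of each other, which is one way to read the abstract's remark about suitability for parallelization.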

1608.08292 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Robust Energy Storage Scheduling for Imbalance Reduction of Strategically Formed Energy Balancing Groups

稳健的能源存储调度以减少战略形成能源平衡小组的不平衡

Shantanu Chakraborty, Toshiya Okabe

AI总结 本文提出了一种基于概率编程的能源平衡小组形成方法,结合稳健存储调度策略,以减少能源不平衡并优化存储容量。

详情
Journal ref
Energy, Volume 114, 1 November 2016, Pages 405-417, ISSN 0360-5442
AI中文摘要

减少不平衡(合同供应与实际需求之间的在线能量差距及相关成本)将成为放松管制的能源市场中电力生产与供应商(PPS)的一项关键服务。PPS需要通过远期市场交易尽可能精确地采购能源,以减少不平衡能量。本文提出:1)(离线)一种有效的基于需求聚合的策略,用于创建若干平衡小组,从而提高小组级聚合需求的可预测性;2)(在线)一种稳健的能源存储调度方法,在考虑需求预测不确定性的情况下,最小化特定平衡小组的不平衡。小组的形成采用概率编程方法,对历史需求统计数据应用贝叶斯马尔可夫链蒙特卡洛(MCMC)方法来完成。除小组形成外,聚合策略(借助贝叶斯推断)还确定了所形成小组所需存储容量的上限,其中一部分将用于在线运行。在在线运行中,提出了一种稳健的能源存储调度方法,在纳入特定小组需求不确定性的同时,最小化预期不平衡能量和成本(不平衡能量的非线性函数)。所提出的方法应用于日本东京真实公寓楼的需求数据,并给出仿真结果以验证其有效性。

英文摘要

Imbalance (on-line energy gap between contracted supply and actual demand, and associated cost) reduction is going to be a crucial service for a Power Producer and Supplier (PPS) in the deregulated energy market. A PPS requires forward market interactions to procure energy as precisely as possible in order to reduce imbalance energy. This paper presents 1) (off-line) an effective demand-aggregation-based strategy for creating a number of balancing groups that leads to higher predictability of group-wise aggregated demand, and 2) (on-line) a robust energy storage scheduling that minimizes the imbalance for a particular balancing group considering the demand prediction uncertainty. The group formation is performed by a Probabilistic Programming approach using the Bayesian Markov Chain Monte Carlo (MCMC) method applied to the historical demand statistics. Apart from the group formation, the aggregation strategy (with the help of Bayesian Inference) also determines the upper limit of the required storage capacity for a formed group, a fraction of which is to be utilized in on-line operation. For on-line operation, a robust energy storage scheduling method is proposed that minimizes expected imbalance energy and cost (a non-linear function of imbalance energy) while incorporating the demand uncertainty of a particular group. The proposed methods are applied to real apartment buildings' demand data in Tokyo, Japan. Simulation results are presented to verify the effectiveness of the proposed methods.

1606.04087 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Networked Intelligence: Towards Autonomous Cyber Physical Systems

网络化智能:迈向自主的网络物理系统

Andre Karpistsenko

AI总结 本文探讨了如何结合产业与学术成果,构建大规模网络物理系统,提出概念架构并评估子系统成熟度,为智能系统发展提供规划指导。

详情
AI中文摘要

开发智能系统需要结合产业和学术成果。本文概述了相关研究领域和可应用于构建非常大规模网络物理系统的工业技术。使用概念架构来说明现有组件如何相互配合,并评估子系统的成熟度。目标是为消费者和工业互联网技术者、网络物理系统研究者及对数据与物联网融合感兴趣的人,结构化机器智能的发展和挑战,可用于智能系统发展的规划。

英文摘要

Developing intelligent systems requires combining results from both industry and academia. In this report you find an overview of relevant research fields and industrially applicable technologies for building very large scale cyber physical systems. A concept architecture is used to illustrate how existing pieces may fit together, and the maturity of the subsystems is estimated. The goal is to structure the developments and the challenge of machine intelligence for Consumer and Industrial Internet technologists, cyber physical systems researchers and people interested in the convergence of data & Internet of Things. It can be used for planning developments of intelligent systems.

1608.04361 2026-06-04 math.NA cs.AI cs.NA 版本更新

Multi-way Monte Carlo Method for Linear Systems

多向蒙特卡洛方法用于线性系统

Tao Wu, David F. Gleich

AI总结 本文提出多向马尔可夫随机游走方法,改进了线性系统求解的条件,使谱半径ρ(H⁺)<1成为必要充分条件,且在数值实验中验证了方法的有效性与速度优势。

详情
AI中文摘要

我们研究了求解形如x = Hx + b的线性系统所用的蒙特卡洛方法。该方法有效性的充分条件是‖H‖ < 1,这大大限制了其应用范围。我们通过提出新的多向马尔可夫随机游走方法,即标准马尔可夫随机游走的推广,改进了这一条件。在我们新的框架下,我们证明了该方法有效性的必要且充分条件是谱半径ρ(H⁺) < 1,这一要求比‖H‖ < 1更宽松。除了能解决更多问题外,我们的新方法比标准算法运行更快。在合成和实际世界矩阵上的数值实验中,我们验证了新方法的有效性。

英文摘要

We study the Monte Carlo method for solving a linear system of the form $x = H x + b$. A sufficient condition for the method to work is $\| H \| < 1$, which greatly limits the usability of this method. We improve this condition by proposing a new multi-way Markov random walk, which is a generalization of the standard Markov random walk. Under our new framework we prove that the necessary and sufficient condition for our method to work is the spectral radius $\rho(H^{+}) < 1$, which is a weaker requirement than $\| H \| < 1$. In addition to solving more problems, our new method can work faster than the standard algorithm. In numerical experiments on both synthetic and real world matrices, we demonstrate the effectiveness of our new method.
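For context, the standard random-walk estimator that the abstract generalizes can be sketched as follows. This is the classical single-walk scheme based on the Neumann series $x = \sum_{k \ge 0} H^k b$, not the paper's multi-way walk. The function name, the choice of transition probabilities proportional to $|H_{ij}|$, the fixed walk length, and the test matrix are all assumptions made for illustration.

```python
import numpy as np

def mc_solve_component(H, b, i, n_walks=2000, max_steps=30, seed=0):
    """Estimate x_i for x = Hx + b by averaging weighted random walks started at i."""
    rng = np.random.default_rng(seed)
    n = H.shape[0]
    abs_H = np.abs(H)
    row_sums = abs_H.sum(axis=1)
    total = 0.0
    for _ in range(n_walks):
        state, weight = i, 1.0
        estimate = b[state]                      # k = 0 term of the Neumann series
        for _ in range(max_steps):
            if row_sums[state] == 0.0:
                break                            # no outgoing transitions: walk ends
            p = abs_H[state] / row_sums[state]   # transition probabilities (assumed choice)
            nxt = rng.choice(n, p=p)
            weight *= H[state, nxt] / p[nxt]     # importance weight keeps the estimate unbiased
            state = nxt
            estimate += weight * b[state]        # contributes to the (H^k b)_i term
        total += estimate
    return total / n_walks

# Small example with infinity-norm 0.4 < 1, so the series converges quickly.
H = np.array([[0.1, 0.3], [0.2, 0.2]])
b = np.array([1.0, 2.0])
exact = np.linalg.solve(np.eye(2) - H, b)
print(mc_solve_component(H, b, 0), exact[0])
```

Truncating each walk at `max_steps` introduces a small bias of the order of the neglected series tail; here the weights shrink by a factor of 0.4 per step, so the bias is negligible compared with the sampling error.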

1608.02165 2026-06-04 cs.CV cs.AI cs.NA math.NA math.OC 版本更新

ShapeFit and ShapeKick for Robust, Scalable Structure from Motion

ShapeFit与ShapeKick:面向鲁棒、可扩展的运动恢复结构

Thomas Goldstein, Paul Hand, Choongbum Lee, Vladislav Voroninski, Stefano Soatto

AI总结 本文提出一种利用高效凸优化程序进行成对方向定位恢复的新方法,能有效处理对抗性异常值,且在真实场景和模拟数据上验证了其性能和灵活性。

详情
AI中文摘要

我们提出了一种从成对方向恢复位置的新方法,该方法利用一个高效的凸规划,即使存在对抗性异常值,也具有精确恢复保证。当成对方向表示视图对之间经缩放的相对位置(例如通过对极几何估计)时,我们的方法可用于位置恢复,即在相差一个未知尺度的意义下确定相对位姿。在该任务上,我们的方法性能与最先进方法相当,同时速度提升一个数量级。我们提出的数值框架十分灵活,既能容纳其他位置恢复方法,也可用于加速其他方法。我们在13个大型、不规则的真实场景图像集合以及带有真值的模拟数据上,与最先进的位置恢复方法进行了大量对比测试,验证了上述特性。

英文摘要

We introduce a new method for location recovery from pair-wise directions that leverages an efficient convex program that comes with exact recovery guarantees, even in the presence of adversarial outliers. When pairwise directions represent scaled relative positions between pairs of views (estimated for instance with epipolar geometry) our method can be used for location recovery, that is the determination of relative pose up to a single unknown scale. For this task, our method yields performance comparable to the state-of-the-art with an order of magnitude speed-up. Our proposed numerical framework is flexible in that it accommodates other approaches to location recovery and can be used to speed up other methods. These properties are demonstrated by extensively testing against state-of-the-art methods for location recovery on 13 large, irregular collections of images of real scenes in addition to simulated data with ground truth.

1608.00655 2026-06-04 eess.SY cs.AI cs.SY 版本更新

A Web-based Tool for Identifying Strategic Intervention Points in Complex Systems

用于复杂系统中战略干预点识别的网页工具

Sotiris Moschoyiannis, Nicholas Elia, Alexandra S. Penn, David J. B. Lloyd, Chris Knight

AI总结 本文提出一种基于Fuzzy Cognitive Mapping的网页工具,用于识别复杂系统中的最小控制配置,通过网络可控性理论确定战略干预点,应用于英国亨伯(Humber)地区向生物基经济转型的决策过程。

Comments In Proceedings Cassting'16/SynCoP'16, arXiv:1608.00177

详情
Journal ref
EPTCS 220, 2016, pp. 39-52
AI中文摘要

引导复杂系统走向期望结果是一项具有挑战性的任务。系统确切架构的不明确,以及可用于将参与实体间相互作用的动态规则付诸操作的科学数据往往稀缺,是两个主要因素。本文基于Fuzzy Cognitive Mapping(FCM)提出分析方法,将系统表示为复杂网络,并利用网络可控性理论确定最小控制配置,即可作为战略干预点的因素子集(系统杠杆)。我们开发了一个在浏览器中运行的分析工具,可生成复杂网络的所有最小控制配置,并结合与英国亨伯(Humber)地区工业界、地方政府和非政府组织利益相关者合作的经验展示了该方法,其结果被应用于该地区向生物基经济转型的决策过程。

英文摘要

Steering a complex system towards a desired outcome is a challenging task. The lack of clarity on the system's exact architecture and the often scarce scientific data upon which to base the operationalisation of the dynamic rules that underpin the interactions between participant entities are two contributing factors. We describe an analytical approach that builds on Fuzzy Cognitive Mapping (FCM) to address the latter and represent the system as a complex network. We apply results from network controllability to address the former and determine minimal control configurations - subsets of factors, or system levers, which comprise points for strategic intervention in steering the system. We have implemented the combination of these techniques in an analytical tool that runs in the browser, and generates all minimal control configurations of a complex network. We demonstrate our approach by reporting on our experience of working alongside industrial, local-government, and NGO stakeholders in the Humber region, UK. Our results are applied to the decision-making process involved in the transition of the region to a bio-based economy.

1607.07896 2026-06-04 eess.SY cs.AI cs.SY math.OC 版本更新

Polling-systems-based Autonomous Vehicle Coordination in Traffic Intersections with No Traffic Signals

基于轮询系统的自动驾驶车辆在无交通信号灯的交通交叉口协调控制

David Miculescu, Sertac Karaman

AI总结 本文提出一种基于轮询系统的协调控制算法,用于无交通信号灯的自动驾驶车辆交叉口安全高效通行,通过随机模型预测车辆到达时间,确保无碰撞并提供等待时间的严格上界。

详情
AI中文摘要

自动驾驶车辆的快速发展促使对全自动驾驶交通网络潜在优势的深入研究。大多数研究认为自动驾驶系统能显著提升性能。广泛研究的概念是全自动驾驶无碰撞交叉口,车辆在无交通信号灯的交叉口调整速度以安全快速通过。本文提出了一种协调控制算法,假设车辆到达时间的随机模型。所提算法提供了安全性和性能的证明保证。更具体地说,证明了无碰撞发生,并且提供了预期等待时间的严格上界。该算法还通过仿真进行了演示。所提算法受轮询系统启发。事实上,本文研究的问题导致了一种新的轮询系统,其中客户受微分约束,这可能本身具有研究价值。

英文摘要

The rapid development of autonomous vehicles spurred a careful investigation of the potential benefits of all-autonomous transportation networks. Most studies conclude that autonomous systems can enable drastic improvements in performance. A widely studied concept is all-autonomous, collision-free intersections, where vehicles arriving in a traffic intersection with no traffic light adjust their speeds to cross safely through the intersection as quickly as possible. In this paper, we propose a coordination control algorithm for this problem, assuming stochastic models for the arrival times of the vehicles. The proposed algorithm provides provable guarantees on safety and performance. More precisely, it is shown that no collisions occur surely, and moreover a rigorous upper bound is provided for the expected wait time. The algorithm is also demonstrated in simulations. The proposed algorithms are inspired by polling systems. In fact, the problem studied in this paper leads to a new polling system where customers are subject to differential constraints, which may be interesting in its own right.

1607.02480 2026-06-04 cs.AI cs.DC cs.SY eess.SY 版本更新

Real-Time Anomaly Detection for Streaming Analytics

实时流分析中的异常检测

Subutai Ahmad, Scott Purdy

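AI summary Proposes a real-time anomaly detection technique based on the Hierarchical Temporal Memory (HTM) algorithm, which learns and predicts while processing streaming data; it is demonstrated in a live application on financial metrics and achieves best-in-class results on the NAB benchmark.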

详情
AI中文摘要

世界上的许多数据都是流数据,即时间序列数据,在关键情况下异常信息具有重要意义。然而,检测流数据中的异常是一个具有挑战性的任务,要求检测器在实时处理数据的同时进行学习和预测。我们提出了一种基于在线序列记忆算法Hierarchical Temporal Memory(HTM)的新型异常检测技术。我们展示了在实时应用中检测金融指标异常的结果。我们还测试了该算法在NAB上,一个已发布的实时异常检测基准测试中,我们的算法取得了最佳性能。

英文摘要

Much of the world's data is streaming, time-series data, where anomalies give significant information in critical situations. Yet detecting anomalies in streaming data is a difficult task, requiring detectors to process data in real-time, and learn while simultaneously making predictions. We present a novel anomaly detection technique based on an on-line sequence memory algorithm called Hierarchical Temporal Memory (HTM). We show results from a live application that detects anomalies in financial metrics in real-time. We also test the algorithm on NAB, a published benchmark for real-time anomaly detection, where our algorithm achieves best-in-class results.

1607.02419 2026-06-04 econ.GN cs.AI q-fin.EC 版本更新

Divisive-agglomerative algorithm and complexity of automatic classification problems

划分-聚类算法及自动分类问题的复杂性

Alexander Rubchinsky

AI summary Presents an algorithm for solving the automatic classification problem and discusses the complexity of such problems.

详情
AI中文摘要

本文提出了解决自动分类(AC)问题的算法。在自动分类问题中,需要从给定的模式矩阵或不相似性、相似性矩阵出发,找到一个或多个划分。

英文摘要

An algorithm for solving the Automatic Classification (AC for brevity) problem is set forth in the paper. In the AC problem, it is required to find one or several partitions, starting from a given pattern matrix or a dissimilarity or similarity matrix.

1307.4847 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML 版本更新

Efficient Reinforcement Learning in Deterministic Systems with Value Function Generalization

在确定性系统中通过价值函数泛化实现高效的强化学习

Zheng Wen, Benjamin Van Roy

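AI summary Proposes optimistic constraint propagation (OCP), an algorithm that combines efficient exploration with value function generalization in finite-horizon deterministic systems; when the true value function lies in the hypothesis class, OCP selects optimal actions in all but at most K episodes (K being the eluder dimension), and further efficiency and asymptotic performance guarantees are given.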

详情
AI中文摘要

我们考虑在有限时间 horizon 确定性系统中进行强化学习的问题,并提出乐观约束传播(OCP)算法,该算法旨在合成高效的探索和价值函数泛化。我们证明当真实价值函数位于给定的假设类中时,OCP在除至多K个episode之外的所有episode中都选择最优动作,其中K是给定假设类的eluder维度。我们进一步建立了效率和渐近性能保证,这些保证即使在真实价值函数不位于给定假设类中时也成立,适用于假设类是不相交集合上预指定指示函数的张成空间这一特殊情况。我们还讨论了OCP的计算复杂性,并展示了两个示例的计算结果。

英文摘要

We consider the problem of reinforcement learning over episodes of a finite-horizon deterministic system and as a solution propose optimistic constraint propagation (OCP), an algorithm designed to synthesize efficient exploration and value function generalization. We establish that when the true value function lies within a given hypothesis class, OCP selects optimal actions over all but at most K episodes, where K is the eluder dimension of the given hypothesis class. We establish further efficiency and asymptotic performance guarantees that apply even if the true value function does not lie in the given hypothesis class, for the special case where the hypothesis class is the span of pre-specified indicator functions over disjoint sets. We also discuss the computational complexity of OCP and present computational results involving two illustrative examples.

1607.01478 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Mixed Strategy for Constrained Stochastic Optimal Control

混合策略用于受约束的随机最优控制

Masahiro Ono, Mahmoud El Chamie, Marco Pavone, Behcet Acikmese

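AI summary Proposes mixed (randomized) strategies for constrained stochastic optimal control, shows that randomizing control inputs can reduce cost in nonconvex problems by an amount equal to the duality gap, and develops an efficient solution method based on dual optimization.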

Comments 11 pages, 9 figures. Preliminary version of a working journal paper

详情
AI中文摘要

在具有随机约束的最优控制问题中,随机选择控制输入可以降低预期成本,例如随机模型预测控制(SMPC)。我们考虑具有初始随机化的控制器,即控制器在开始时从K+1个控制序列中随机选择一个(称为K-随机化)。已知对于具有K个约束的有限状态、有限动作马尔可夫决策过程(MDP),K-随机化足以达到最小成本。我们发现,对于具有连续状态和动作空间的随机最优控制问题,相同结果也成立。进一步,我们证明当优化问题非凸时,控制输入的随机化可以导致成本降低,且该降低量等于对偶间隙。然后,我们提供随机解最优性的必要和充分条件,并开发基于对偶优化的高效求解方法。此外,在K=1的特殊情况(如联合概率约束问题)中,对偶优化可通过根查找更高效地解决。最后,我们在从路径规划到未来火星任务的进入、下降和着陆(EDL)规划等多个实际问题上测试理论并演示求解方法。

英文摘要

Choosing control inputs randomly can result in a reduced expected cost in optimal control problems with stochastic constraints, such as stochastic model predictive control (SMPC). We consider a controller with initial randomization, meaning that the controller randomly chooses from K+1 control sequences at the beginning (called K-randomization). It is known that, for a finite-state, finite-action Markov Decision Process (MDP) with K constraints, K-randomization is sufficient to achieve the minimum cost. We found that the same result holds for stochastic optimal control problems with continuous state and action spaces. Furthermore, we show that the randomization of control input can result in reduced cost when the optimization problem is nonconvex, and the cost reduction is equal to the duality gap. We then provide the necessary and sufficient conditions for the optimality of a randomized solution, and develop an efficient solution method based on dual optimization. Furthermore, in a special case with K=1 such as a joint chance-constrained problem, the dual optimization can be solved even more efficiently by root finding. Finally, we test the theories and demonstrate the solution method on multiple practical problems ranging from path planning to the planning of entry, descent, and landing (EDL) for future Mars missions.

1602.04621 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML 版本更新

Deep Exploration via Bootstrapped DQN

通过Bootstrap DQN进行深度探索

Ian Osband, Charles Blundell, Alexander Pritzel, Benjamin Van Roy

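AI summary Proposes bootstrapped DQN, which explores efficiently through randomized value functions and improves learning time and performance in complex environments, including most Atari games.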

详情
AI中文摘要

复杂环境中的高效探索仍是强化学习的主要挑战。我们提出了Bootstrap DQN,一种简单算法,通过使用随机价值函数在计算和统计上高效地进行探索。与epsilon-greedy等策略不同,Bootstrap DQN实现时序扩展(或深度)探索,可导致学习速度呈指数级提升。我们在复杂随机MDP和大规模 Arcade Learning Environment 中展示了这些优势。Bootstrap DQN在大多数Atari游戏中显著提高了学习时间和性能。

英文摘要

Efficient exploration in complex environments remains a major challenge for reinforcement learning. We propose bootstrapped DQN, a simple algorithm that explores in a computationally and statistically efficient manner through use of randomized value functions. Unlike dithering strategies such as epsilon-greedy exploration, bootstrapped DQN carries out temporally-extended (or deep) exploration; this can lead to exponentially faster learning. We demonstrate these benefits in complex stochastic MDPs and in the large-scale Arcade Learning Environment. Bootstrapped DQN substantially improves learning times and performance across most Atari games.

1606.07149 2026-06-04 cs.NE cs.AI cs.CE cs.LG cs.SY eess.SY 版本更新

An Approach to Stable Gradient Descent Adaptation of Higher-Order Neural Units

一种高阶神经单元稳定梯度下降适应的方法

Ivo Bukovsky, Noriyasu Homma

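AI summary Introduces a spectral-radius-based method for evaluating the stability of the weight-update system of higher-order neural units (HONUs) adapted by gradient descent, covering both feedforward and recurrent HONUs; keeping every individual adaptation step stable yields stable adaptation of the whole neural architecture to the target data.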

Comments 2016, 13 pages

详情
Journal ref
IEEE Transactions on Neural Networks and Learning Systems, ISSN: 2162-237X, 2016
AI中文摘要

本文介绍了用于评估高阶神经单元(HONUs)权重更新系统稳定性的方法,该系统采用多项式聚合神经输入(也称为多项式神经网络类别)进行适应,通过梯度下降方法实现前馈和递归HONUs的适应。该方法的核心基于权重更新系统的谱半径,允许在每次适应步骤中监控和维持稳定性。确保权重更新系统的稳定性(在每次单独的适应步骤中)自然导致整个神经架构适应目标数据的稳定性。此外,所用方法强调HONU的权重优化是一个线性问题,因此所提出的方法可以一般扩展到任何其可调整参数为线性的神经架构。

英文摘要

Stability evaluation of a weight-update system of higher-order neural units (HONUs) with polynomial aggregation of neural inputs (also known as classes of polynomial neural networks) for adaptation of both feedforward and recurrent HONUs by a gradient descent method is introduced. An essential core of the approach is based on spectral radius of a weight-update system, and it allows stability monitoring and its maintenance at every adaptation step individually. Assuring stability of the weight-update system (at every single adaptation step) naturally results in adaptation stability of the whole neural architecture that adapts to target data. As an aside, the used approach highlights the fact that the weight optimization of HONU is a linear problem, so the proposed approach can be generally extended to any neural architecture that is linear in its adaptable parameters.

1606.06512 2026-06-04 eess.SY cs.AI cs.CE cs.SY math.OC physics.soc-ph 版本更新

Graphical Models for Optimal Power Flow

图模型用于最优功率流

Krishnamurthy Dvijotham, Pascal Van Hentenryck, Michael Chertkov, Sidhant Misra, Marc Vuffray

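AI summary Formulates optimal power flow over tree networks as inference in a tree-structured graphical model and combines dynamic programming with interval discretization into an efficient solution method that handles arbitrary distribution networks and mixed-integer optimization.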

Comments To appear in Proceedings of the 22nd International Conference on Principles and Practice of Constraint Programming (CP 2016)

详情
AI中文摘要

最优功率流(OPF)是电力网络中的核心优化问题。尽管在电网运行中被常规解决,但一般情况下被证明是强NP难问题,而在树状网络上为弱NP难。本文将树状网络的OPF问题建模为树状图模型的推断问题,其中节点变量为低维向量。我们适配标准的树状图模型推断动态规划算法至OPF问题。结合节点变量的区间离散化,我们开发出OPF问题的近似算法。进一步,我们利用约束编程(CP)技术进行区间计算和自适应边界传播,获得实际高效的算法。与之前使用凸松弛保证最优性的OPF算法相比,我们的方法能够处理任意配电网络和混合整数优化问题。此外,该方法可以以分布式消息传递的方式实现,具有可扩展性,适用于智能电网应用,如分布式能源资源的控制。我们在多个基准网络上评估了该技术,并展示了使用此方法可以有效解决实际OPF问题。

英文摘要

Optimal power flow (OPF) is the central optimization problem in electric power grids. Although solved routinely in the course of power grid operations, it is known to be strongly NP-hard in general, and weakly NP-hard over tree networks. In this paper, we formulate the optimal power flow problem over tree networks as an inference problem over a tree-structured graphical model where the nodal variables are low-dimensional vectors. We adapt the standard dynamic programming algorithm for inference over a tree-structured graphical model to the OPF problem. Combining this with an interval discretization of the nodal variables, we develop an approximation algorithm for the OPF problem. Further, we use techniques from constraint programming (CP) to perform interval computations and adaptive bound propagation to obtain practically efficient algorithms. Compared to previous algorithms that solve OPF with optimality guarantees using convex relaxations, our approach is able to work for arbitrary distribution networks and handle mixed-integer optimization problems. Further, it can be implemented in a distributed message-passing fashion that is scalable and is suitable for "smart grid" applications like control of distributed energy resources. We evaluate our technique numerically on several benchmark networks and show that practical OPF problems can be solved effectively using this approach.

1606.05124 2026-06-04 cs.RO cs.AI cs.SY eess.SY 版本更新

Robust Active Perception via Data-association aware Belief Space planning

通过数据关联意识的信念空间规划实现鲁棒的主动感知

Shashank Pathak, Antony Thomas, Asaf Feniger, Vadim Indelman

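AI summary Proposes a belief space planning approach that reasons about data association, addressing localization uncertainty and perceptually aliased environments, and designs new cost functions that enable active disambiguation.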

详情
AI中文摘要

我们开发了一种信念空间规划(BSP)方法,通过在规划中整合数据关联(DA)推理,同时考虑额外的不确定性来源,从而推动了该领域的前沿。现有BSP方法通常假设数据关联已知且完美,但在存在定位不确定性、模糊和感知混叠环境时,这一假设更难成立。相反,我们的数据关联意识信念空间规划(DA-BSP)方法在信念演化中显式推理数据关联,因此能更好地应对这些具有挑战性的现实场景。特别是,我们展示了由于感知混叠,后验信念成为概率分布函数的混合,设计了衡量预期模糊程度和后验不确定性的成本函数。使用这些以及标准成本(如控制惩罚、距离目标)在目标函数中,得到一个能够可靠表示动作影响且特别擅长主动解歧的通用框架。我们的方法因此适用于感知混叠环境中的鲁棒主动感知和自主导航。我们通过基本和现实的模拟展示了关键方面。

英文摘要

We develop a belief space planning (BSP) approach that advances the state of the art by incorporating reasoning about data association (DA) within planning, while considering additional sources of uncertainty. Existing BSP approaches typically assume data association is given and perfect, an assumption that can be harder to justify while operating, in the presence of localization uncertainty, in ambiguous and perceptually aliased environments. In contrast, our data association aware belief space planning (DA-BSP) approach explicitly reasons about DA within belief evolution, and as such can better accommodate these challenging real world scenarios. In particular, we show that due to perceptual aliasing, the posterior belief becomes a mixture of probability distribution functions, and design cost functions that measure the expected level of ambiguity and posterior uncertainty. Using these and standard costs (e.g., control penalty, distance to goal) within the objective function yields a general framework that reliably represents action impact and, in particular, is capable of active disambiguation. Our approach is thus applicable to robust active perception and autonomous navigation in perceptually aliased environments. We demonstrate key aspects in basic and realistic simulations.

1606.01949 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Assisted Energy Management in Smart Microgrids

智能微电网中的辅助能源管理

Andrea Monacchi, Wilfried Elmenreich

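AI summary Investigates forward contracts as a way to mitigate service interruptions caused by competing demand, designing policy-based brokers and a neural-network-based learning broker that progressively reduces reimbursement costs and increases overall profit.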

详情
AI中文摘要

需求响应为公用事业提供了一种机制,以向终端用户分享可再生能源使用所带来的随机性。价格被用来反映能源供应情况,将这种有限资源分配给最重视它的负载。然而,严格竞争机制在存在竞争需求时可能导致服务中断。为了解决这个问题,我们研究了使用远期合约,即价格反映未来供需曲线预期的服务水平协议。鉴于微电网资源有限,服务中断是服务可用性的相反目标。我们首先设计了基于策略的经纪人,然后识别出基于人工神经网络的学习经纪人。我们证明后者逐步减少赔偿成本并最大化整体利润。

英文摘要

Demand response provides utilities with a mechanism to share with end users the stochasticity resulting from the use of renewable sources. Pricing is accordingly used to reflect energy availability, to allocate such a limited resource to those loads that value it most. However, the strictly competitive mechanism can result in service interruption in the presence of competing demand. To solve this issue we investigate the use of forward contracts, i.e., service level agreements priced to reflect the expectation of future supply and demand curves. Given the limited resources of microgrids, service interruption is an opposite objective to that of service availability. We first design policy-based brokers and then identify a learning broker based on artificial neural networks. We show that the latter progressively minimizes the reimbursement costs and maximizes the overall profit.

1606.01245 2026-06-04 math.NA cs.AI cs.NA math.OC stat.ML 版本更新

Scalable Algorithms for Tractable Schatten Quasi-Norm Minimization

可扩展算法用于可计算的Schatten准范数最小化

Fanhua Shang, Yuanyuan Liu, James Cheng

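AI summary Defines two tractable Schatten quasi-norms, designs efficient algorithms that scale to large problems, and shows experimentally that they are both more accurate and faster than existing methods.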

Comments 16 pages, 5 figures, Appears in Proceedings of the 30th AAAI Conference on Artificial Intelligence (AAAI), Phoenix, Arizona, USA, pp. 2016-2022, 2016

详情
AI中文摘要

Schatten-p准范数(0<p<1)通常用于替代标准核范数以更精确地近似秩函数。然而,现有Schatten-p准范数最小化算法在每次迭代中均涉及奇异值分解(SVD)或特征值分解(EVD),因此对于大规模问题可能变得非常缓慢且不实用。本文首先定义了两种可计算的Schatten准范数,即Frobenius/核混合准范数和双核准范数,并证明它们本质上是Schatten-2/3和1/2准范数,分别导致仅需更新两个较小因子矩阵的高效算法。我们还为解决代表性矩阵补全问题设计了两种高效的近端交替线性化最小化算法。最后,我们提供了算法的全局收敛性和性能保证,其收敛性质优于现有算法。在合成和真实数据上的实验结果表明,我们的算法比现有最先进方法更准确,并且快了多个数量级。

英文摘要

The Schatten-p quasi-norm $(0<p<1)$ is usually used to replace the standard nuclear norm in order to approximate the rank function more accurately. However, existing Schatten-p quasi-norm minimization algorithms involve singular value decomposition (SVD) or eigenvalue decomposition (EVD) in each iteration, and thus may become very slow and impractical for large-scale problems. In this paper, we first define two tractable Schatten quasi-norms, i.e., the Frobenius/nuclear hybrid and bi-nuclear quasi-norms, and then prove that they are in essence the Schatten-2/3 and 1/2 quasi-norms, respectively, which lead to the design of very efficient algorithms that only need to update two much smaller factor matrices. We also design two efficient proximal alternating linearized minimization algorithms for solving representative matrix completion problems. Finally, we provide the global convergence and performance guarantees for our algorithms, which have better convergence properties than existing algorithms. Experimental results on synthetic and real-world data show that our algorithms are more accurate than the state-of-the-art methods, and are orders of magnitude faster.
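
As a reading aid for the abstract above, here is a minimal sketch of why a Schatten-p quasi-norm with small p tracks the rank more closely than the nuclear norm (p = 1). It works directly on a list of singular values, which are assumed to be given; the function names and the example values are illustrative and not taken from the paper, and the paper's tractable hybrid norms and algorithms are not reproduced here.

```python
def schatten_p_sum(singular_values, p):
    """Sum of sigma_i^p over the nonzero singular values.

    As p tends to 0 this sum tends to the number of nonzero
    singular values, i.e. the rank of the matrix.
    """
    if p <= 0:
        raise ValueError("p must be positive")
    return sum(s ** p for s in singular_values if s > 0)


def schatten_quasi_norm(singular_values, p):
    """Schatten-p (quasi-)norm: (sum_i sigma_i^p)^(1/p)."""
    return schatten_p_sum(singular_values, p) ** (1.0 / p)


# Hypothetical singular values of a rank-2 matrix (illustration only).
sigmas = [3.0, 2.0, 0.0]

for p in (1.0, 0.5, 0.01):
    print(p, schatten_p_sum(sigmas, p), schatten_quasi_norm(sigmas, p))
```

With p = 1 the sum is the nuclear norm; as p shrinks, the sum moves toward the rank, which is the motivation stated in the abstract for preferring 0 < p < 1.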

1605.09772 2026-06-04 eess.SY cs.AI cs.SE cs.SY 版本更新

Technical Report: Directed Controller Synthesis of Discrete Event Systems

面向离散事件系统的定向控制器综合技术报告

Daniel Ciolek, Victor Braberman, Nicolás D'Ippolito, Sebastián Uchitel

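AI summary Proposes a directed controller synthesis method guided by a domain-independent heuristic, which abstracts the environment efficiently and composes components on the fly to synthesize controllers for discrete event systems with safety and co-safety goals.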

Comments 8 pages, submitted to the 55th IEEE Conference on Decision and Control

详情
AI中文摘要

本文提出了一种定向控制器综合(DCS)技术,用于离散事件系统。DCS方法通过领域无关的启发式搜索来探索反应控制器的解空间。该启发式基于对复杂环境的组件化描述进行高效抽象。通过动态构建组件的组合,DCS在减少的状态空间部分中寻找解决方案。本文专注于无时间离散事件系统,具有安全性和共安全性(即可达性)目标。通过与其他知名控制器综合方法(基于符号表示和组合分析)的比较,评估了该技术的性能。

英文摘要

This paper presents a Directed Controller Synthesis (DCS) technique for discrete event systems. The DCS method explores the solution space for reactive controllers guided by a domain-independent heuristic. The heuristic is derived from an efficient abstraction of the environment based on the componentized way in which complex environments are described. Then, by building the composition of the components on-the-fly, DCS obtains a solution by exploring a reduced portion of the state space. This work focuses on untimed discrete event systems with safety and co-safety (i.e., reachability) goals. An evaluation of the technique is presented, comparing it to other well-known approaches to controller synthesis (based on symbolic representation and compositional analyses).

1605.09497 2026-06-04 cs.GT cs.AI cs.MA cs.SY eess.SY 版本更新

Interdependent Scheduling Games

相互依赖的调度博弈

Andres Abeliuk, Haris Aziz, Gerardo Berbeglia, Serge Gaspers, Petr Kalina, Nicholas Mattei, Dominik Peters, Paul Stursberg, Pascal Van Hentenryck, Toby Walsh

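AI summary Studies a model of interdependent scheduling games motivated by infrastructure planning and coordination, analyzing welfare maximization, best responses, Nash dynamics, and the existence and computation of Nash equilibria.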

Comments Accepted to IJCAI 2016

详情
AI中文摘要

我们提出了一种相互依赖的调度博弈模型,其中每个玩家控制一组服务,这些服务可以独立调度。玩家可以随时调度自己的服务,但只有当所有前序服务(可能由同一玩家或不同玩家控制)被激活后,这些服务才会开始为玩家带来收益。这种模型受到在大规模基础设施规划和协调中遇到的问题的启发,例如自然灾害后恢复电力和燃气供应,或在危机中提供医疗护理时不同机构负责人员、设备和药品的配送。我们对这一设置进行了博弈论分析,特别考虑了福利最大化、计算最佳响应、纳什动态以及纳什均衡的存在与计算问题。

英文摘要

We propose a model of interdependent scheduling games in which each player controls a set of services that they schedule independently. A player is free to schedule his own services at any time; however, each of these services only begins to accrue reward for the player when all predecessor services, which may or may not be controlled by the same player, have been activated. This model, where players have interdependent services, is motivated by the problems faced in planning and coordinating large-scale infrastructures, e.g., restoring electricity and gas to residents after a natural disaster or providing medical care in a crisis when different agencies are responsible for the delivery of staff, equipment, and medicine. We undertake a game-theoretic analysis of this setting and in particular consider the issues of welfare maximization, computing best responses, Nash dynamics, and existence and computation of Nash equilibria.

1512.01110 2026-06-04 math.NA cs.AI cs.LG cs.NA 版本更新

Bayesian Matrix Completion via Adaptive Relaxed Spectral Regularization

基于自适应放松谱正则化的贝叶斯矩阵补全

Yang Song, Jun Zhu

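AI summary Proposes a Bayesian matrix completion method based on spectral regularization; relaxing the orthonormality constraints on singular vectors leads to an adaptive spectral regularization suitable for Bayesian inference that needs no parameter tuning, infers the number of latent factors automatically, and performs well on very sparse matrices.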

Comments Accepted to AAAI 2016

详情
AI中文摘要

基于低秩矩阵分解的贝叶斯矩阵补全方法已取得良好成果,但基于更直接的谱正则化方法的研究较少。本文通过提出基于谱正则化的新型贝叶斯矩阵补全方法填补这一空白。为克服奇异向量正交约束处理的困难,我们推导出一种等价形式,其中包含放松的约束,从而设计出适用于贝叶斯推断的自适应谱正则化方法。我们的贝叶斯方法不需要参数调优,能够自动推断潜在因子数量。在合成和真实数据集上的实验显示,该方法在恢复秩和协同过滤任务中表现良好,尤其在非常稀疏的矩阵上结果显著。

英文摘要

Bayesian matrix completion has been studied based on a low-rank matrix factorization formulation with promising results. However, little work has been done on Bayesian matrix completion based on the more direct spectral regularization formulation. We fill this gap by presenting a novel Bayesian matrix completion method based on spectral regularization. In order to circumvent the difficulties of dealing with the orthonormality constraints of singular vectors, we derive a new equivalent form with relaxed constraints, which then leads us to design an adaptive version of spectral regularization feasible for Bayesian inference. Our Bayesian method requires no parameter tuning and can infer the number of latent factors automatically. Experiments on synthetic and real datasets demonstrate encouraging results on rank recovery and collaborative filtering, with notably good results for very sparse matrices.

1511.03722 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ME stat.ML 版本更新

Doubly Robust Off-policy Value Evaluation for Reinforcement Learning

强化学习中的双重鲁棒离策略价值评估

Nan Jiang, Lihong Li

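AI summary Proposes a doubly robust estimator for off-policy value evaluation that is unbiased and can have much lower variance than importance sampling, and validates it on benchmark problems.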

Comments 14 pages; 4 figures; ICML 2016

详情
AI中文摘要

本文研究了强化学习(RL)中的离策略价值评估问题,其目标是基于由不同策略收集的数据来估计新策略的价值。这一问题通常是将RL应用于现实世界问题时的关键步骤。尽管该问题很重要,现有的通用方法要么存在不可控的偏差,要么方差较高。在本文中,我们将用于多臂老虎机(bandits)的双重鲁棒估计器扩展到顺序决策问题,实现了两全其美:它保证无偏,并且方差可以远低于流行的重要性采样估计器。我们在多个基准问题中展示了该估计器的准确性,并说明了其作为安全策略改进子程序的用途。我们还给出了关于问题难度的理论结果,并证明在某些情况下,我们的估计器可以达到下界。

英文摘要

We study the problem of off-policy value evaluation in reinforcement learning (RL), where one aims to estimate the value of a new policy based on data collected by a different policy. This problem is often a critical step when applying RL in real-world problems. Despite its importance, existing general methods either have uncontrolled bias or suffer high variance. In this work, we extend the doubly robust estimator for bandits to sequential decision-making problems, which gets the best of both worlds: it is guaranteed to be unbiased and can have a much lower variance than the popular importance sampling estimators. We demonstrate the estimator's accuracy in several benchmark problems, and illustrate its use as a subroutine in safe policy improvement. We also provide theoretical results on the hardness of the problem, and show that our estimator can match the lower bound in certain scenarios.
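
The idea in the abstract above can be sketched for a single trajectory as a backward recursion that adds an importance-weighted correction to a model-based value estimate. This is a minimal sketch under stated assumptions, not the authors' code: the function names, the `(action, state)` policy signature, the toy policies, and the three-step trajectory are all hypothetical. With an all-zero model the recursion collapses to step-wise importance sampling, which the example uses as a sanity check.

```python
def doubly_robust_value(trajectory, pi_e, pi_b, q_hat, actions, gamma=1.0):
    """Doubly robust value estimate from one trajectory collected under pi_b.

    trajectory: list of (state, action, reward) tuples.
    pi_e, pi_b: functions (action, state) -> probability.
    q_hat:      approximate action-value model, (state, action) -> float.
    """
    v_dr = 0.0
    # Walk backwards so v_dr always holds the estimate for the remaining steps.
    for state, action, reward in reversed(trajectory):
        rho = pi_e(action, state) / pi_b(action, state)
        v_hat = sum(pi_e(a, state) * q_hat(state, a) for a in actions)
        v_dr = v_hat + rho * (reward + gamma * v_dr - q_hat(state, action))
    return v_dr


def stepwise_importance_sampling(trajectory, pi_e, pi_b, gamma=1.0):
    """Step-wise importance sampling estimate, used here only for comparison."""
    total, rho_product, discount = 0.0, 1.0, 1.0
    for state, action, reward in trajectory:
        rho_product *= pi_e(action, state) / pi_b(action, state)
        total += discount * rho_product * reward
        discount *= gamma
    return total


# Hypothetical two-action problem (illustration only).
ACTIONS = [0, 1]


def behavior_policy(action, state):
    return 0.5


def evaluation_policy(action, state):
    return 0.8 if action == 1 else 0.2


def zero_model(state, action):
    return 0.0


trajectory = [("s0", 1, 1.0), ("s1", 0, 0.0), ("s2", 1, 2.0)]
dr_estimate = doubly_robust_value(
    trajectory, evaluation_policy, behavior_policy, zero_model, ACTIONS, gamma=0.9
)
is_estimate = stepwise_importance_sampling(
    trajectory, evaluation_policy, behavior_policy, gamma=0.9
)
print(dr_estimate, is_estimate)
```

In practice the estimate would be averaged over many trajectories, and a better `q_hat` is what reduces the variance relative to importance sampling.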

1605.05711 2026-06-04 math.OC cs.AI cs.SY eess.SY 版本更新

The Information-Collecting Vehicle Routing Problem: Stochastic Optimization for Emergency Storm Response

信息收集车辆路径问题:面向紧急风暴响应的随机优化

Lina Al-Kanj, Warren B. Powell, Belgacem Bouzaiene-Ayari

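AI summary Proposes a stochastic optimization policy that uses customer phone calls to form beliefs about outages and utility trucks to collect further information, so that the power grid is restored quickly; it is the first vehicle routing formulation to model information collection and beliefs explicitly.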

详情
AI中文摘要

电力公司面临风暴和冰灾导致的停电问题,但大多数电网缺乏传感器定位故障点。本文开发了一种策略,利用电话调用建立故障信念,并通过车辆收集额外信息,以快速恢复电网。该策略将车辆路径问题建模为顺序随机优化问题,提出随机前瞻策略并使用MCTS生成近优政策。仿真结果表明,该策略恢复电网速度优于传统启发式方法。

英文摘要

Utilities face the challenge of responding to power outages due to storms and ice damage, but most power grids are not equipped with sensors to pinpoint the precise location of the faults causing the outage. Instead, utilities have to depend primarily on phone calls (trouble calls) from customers who have lost power to guide the dispatching of utility trucks. In this paper, we develop a policy that routes a utility truck to restore outages in the power grid as quickly as possible, using phone calls to create beliefs about outages, but also using utility trucks as a mechanism for collecting additional information. This means that routing decisions change not only the physical state of the truck (as it moves from one location to another) and the grid (as the truck performs repairs), but also our belief about the network, creating the first stochastic vehicle routing problem that explicitly models information collection and belief modeling. We address the problem of managing a single utility truck, which we start by formulating as a sequential stochastic optimization model which captures our belief about the state of the grid. We propose a stochastic lookahead policy, and use Monte Carlo tree search (MCTS) to produce a practical policy that is asymptotically optimal. Simulation results show that the developed policy restores the power grid much faster compared to standard industry heuristics.

1604.08768 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Supervisory Control for Behavior Composition

行为组合的监督控制

Paolo Felli, Nitin Yadav, Sebastian Sardina

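AI summary Relates behavior composition, a synthesis task studied in AI, to supervisory control theory from the discrete event systems field: realizing a target module by coordinating available behaviors amounts to imposing a supervisor, which lets the solid foundations and tools of discrete event systems be reused.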

详情
AI中文摘要

将AI中的行为组合合成任务与离散事件系统领域的监督控制理论联系起来,通过协调可用行为实现目标模块,利用离散事件系统的理论基础和工具。

英文摘要

We relate behavior composition, a synthesis task studied in AI, to supervisory control theory from the discrete event systems field. In particular, we show that realizing (i.e., implementing) a target behavior module (e.g., a house surveillance system) by suitably coordinating a collection of available behaviors (e.g., automatic blinds, doors, lights, cameras, etc.) amounts to imposing a supervisor onto a special discrete event system. Such a link allows us to leverage the solid foundations and extensive work on discrete event systems, including borrowing tools and ideas from that field. As evidence of that, we show how simple it is to introduce preferences in the mapped framework.

1604.03912 2026-06-04 cs.AI cs.LG cs.SY eess.SY stat.ML 版本更新

Inverse Reinforcement Learning with Simultaneous Estimation of Rewards and Dynamics

逆强化学习与奖励和动态的同时估计

Michael Herman, Tobias Gindele, Jörg Wagner, Felix Schmitt, Wolfram Burgard

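AI summary Proposes a gradient-based inverse reinforcement learning approach that estimates the system dynamics and the reward function simultaneously, improving sample efficiency and estimation accuracy.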

Comments accepted to appear in AISTATS 2016

详情
AI中文摘要

逆强化学习(IRL)描述了从观察到的智能体行为中学习未知马尔可夫决策过程(MDP)奖励函数的问题。由于智能体的行为源于其策略,而MDP策略依赖于随机系统动态和奖励函数,逆问题的解决方案受到两者显著影响。当前的IRL方法假设如果转移模型未知,可以获取额外的系统动态样本,或观察行为提供足够的系统动态样本以准确求解逆问题。这些假设往往不成立。为克服这一问题,我们提出了一种基于梯度的IRL方法,同时估计系统的动态。通过求解联合优化问题,我们的方法考虑了演示的偏差,这种偏差源于生成策略。在合成MDP和迁移学习任务上的评估显示,该方法在样本效率以及估计的奖励函数和转移模型的准确性方面有所改进。

英文摘要

Inverse Reinforcement Learning (IRL) describes the problem of learning an unknown reward function of a Markov Decision Process (MDP) from observed behavior of an agent. Since the agent's behavior originates in its policy and MDP policies depend on both the stochastic system dynamics as well as the reward function, the solution of the inverse problem is significantly influenced by both. Current IRL approaches assume that if the transition model is unknown, additional samples from the system's dynamics are accessible, or the observed behavior provides enough samples of the system's dynamics to solve the inverse problem accurately. These assumptions are often not satisfied. To overcome this, we present a gradient-based IRL approach that simultaneously estimates the system's dynamics. By solving the combined optimization problem, our approach takes into account the bias of the demonstrations, which stems from the generating policy. The evaluation on a synthetic MDP and a transfer learning task shows improvements regarding the sample efficiency as well as the accuracy of the estimated reward functions and transition models.

1604.02080 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Planning with Information-Processing Constraints and Model Uncertainty in Markov Decision Processes

在马尔可夫决策过程中的信息处理约束与模型不确定性规划

Jordi Grau-Moya, Felix Leibfried, Tim Genewein, Daniel A. Braun

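AI summary Proposes a planning method for Markov decision processes that accounts for model uncertainty, treating it together with information-processing constraints under a single generalized variational principle; a generalized value iteration scheme is derived and demonstrated in a grid world simulation.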

Comments 16 pages, 3 figures

详情
AI中文摘要

信息论原理已被用于解决特定类别的马尔可夫决策问题。数学上,此类方法由变分自由能原理支配,允许通过相对于参考分布的KL散度表达的信息处理约束来解决MDP规划问题。本文考虑了此类MDP规划器的推广,即考虑模型不确定性。由于模型不确定性也可以形式化为信息处理约束,因此可以从单一广义变分原理中推导出统一的解决方案。我们提供了广义价值迭代方案并给出了收敛性证明。作为极限情况,该广义方案包括标准价值迭代(已知模型)、贝叶斯MDP规划和鲁棒规划。我们在网格世界模拟中展示了该方法的优势。

英文摘要

Information-theoretic principles for learning and acting have been proposed to solve particular classes of Markov Decision Problems. Mathematically, such approaches are governed by a variational free energy principle and allow solving MDP planning problems with information-processing constraints expressed in terms of a Kullback-Leibler divergence with respect to a reference distribution. Here we consider a generalization of such MDP planners by taking model uncertainty into account. As model uncertainty can also be formalized as an information-processing constraint, we can derive a unified solution from a single generalized variational principle. We provide a generalized value iteration scheme together with a convergence proof. As limit cases, this generalized scheme includes standard value iteration with a known model, Bayesian MDP planning, and robust planning. We demonstrate the benefits of this approach in a grid world simulation.
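
To make the information-processing constraint in the abstract above concrete, the sketch below runs value iteration in which the maximum over actions is replaced by the standard closed form for KL-regularized planning, V(s) = (1/beta) log sum_a rho(a|s) exp(beta Q(s,a)). It covers only the known-model case; the model-uncertainty part of the paper is not reproduced. The function name, data layout, and the two-state MDP are assumptions made for illustration.

```python
import math


def free_energy_value_iteration(P, R, prior, beta, gamma=0.9, sweeps=500):
    """Value iteration with a KL-regularized (free energy) backup.

    P[s][a]     list of (next_state, probability) pairs.
    R[s][a]     immediate reward.
    prior[s][a] reference action distribution rho(a|s).
    beta        inverse temperature: large beta approaches the usual max,
                small beta approaches the average under the prior.
    """
    n_states = len(P)
    V = [0.0] * n_states
    for _ in range(sweeps):
        new_V = []
        for s in range(n_states):
            q = [
                R[s][a] + gamma * sum(p * V[s2] for s2, p in P[s][a])
                for a in range(len(P[s]))
            ]
            # (1/beta) * log sum_a rho(a|s) exp(beta * q_a), via log-sum-exp.
            m = max(q)
            z = sum(prior[s][a] * math.exp(beta * (qa - m)) for a, qa in enumerate(q))
            new_V.append(m + math.log(z) / beta)
        V = new_V
    return V


# Hypothetical two-state MDP: in state 0, action 0 stays put with reward 0,
# action 1 moves to the absorbing state 1 with reward 1; state 1 always pays 1.
P = [
    [[(0, 1.0)], [(1, 1.0)]],
    [[(1, 1.0)], [(1, 1.0)]],
]
R = [[0.0, 1.0], [1.0, 1.0]]
uniform_prior = [[0.5, 0.5], [0.5, 0.5]]

V_nearly_greedy = free_energy_value_iteration(P, R, uniform_prior, beta=50.0)
V_nearly_prior = free_energy_value_iteration(P, R, uniform_prior, beta=1e-3)
print(V_nearly_greedy, V_nearly_prior)
```

In this toy problem the greedy value of state 0 is 10 and the value under the uniform prior is 100/11, so the two runs bracket the range that the inverse temperature interpolates over.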

1603.04586 2026-06-04 cs.AI cs.RO cs.SY eess.SY 版本更新

Optimal Sensing via Multi-armed Bandit Relaxations in Mixed Observability Domains

通过混合可观测域中的多臂老虎机放松实现最优感知

Mikko Lauri, Risto Ritala

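AI summary Studies sequential decision making under uncertainty in mixed observability domains, deriving an upper bound on the optimal value function by relaxing constraints and using the easily computed optimal policy of the resulting multi-armed bandit to prune the search space; experiments show effective pruning in a target monitoring domain.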

Comments 6 pages, 2 figures

详情
Journal ref
Proc. IEEE Intl. Conf. on Robotics and Automation (ICRA), pp. 4807-4812, 2015
AI中文摘要

在混合可观测域中研究不确定决策问题,目标是在部分可观测随机过程中,在完全可观测内部状态约束下最大化获得的信息量。通过放松约束推导最优价值函数的上界,识别出在何种条件下放松问题可转化为多臂老虎机,其最优策略易于计算。将该上界应用于原始问题的搜索空间剪枝,并通过模拟实验评估对解质量的影响。实验结果表明,在目标监控领域有效剪枝了搜索空间。

英文摘要

Sequential decision making under uncertainty is studied in a mixed observability domain. The goal is to maximize the amount of information obtained on a partially observable stochastic process under constraints imposed by a fully observable internal state. An upper bound for the optimal value function is derived by relaxing constraints. We identify conditions under which the relaxed problem is a multi-armed bandit whose optimal policy is easily computable. The upper bound is applied to prune the search space in the original problem, and the effect on solution quality is assessed via simulation experiments. Empirical results show effective pruning of the search space in a target monitoring domain.

1603.02038 2026-06-04 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

Unscented Bayesian Optimization for Safe Robot Grasping

无迹贝叶斯优化用于安全机器人抓取

José Nogueira, Ruben Martinez-Cantin, Alexandre Bernardino, Lorenzo Jamone

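AI summary Proposes unscented Bayesian optimization, which accounts for input noise to find optimal grasps within safe regions, making robot grasping safer and more sample efficient.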

Comments conference paper

详情
AI中文摘要

我们解决了在输入空间存在不确定性时的机器人抓取优化问题。通过试错探索策略实现抓取未知物体。贝叶斯优化是一种样本高效的优化算法,特别适合此设置,因为它能主动减少试验次数以学习待优化函数。事实上,这种主动对象探索策略与婴儿学习最佳抓取方式的策略相同。在学习抓取策略时,一些抓取参数配置可能对物体与机器人末端执行器之间相对姿态的误差非常敏感。我们称这些配置为不安全,因为抓取执行中的小误差可能将好的抓取变为坏的抓取。因此,为了降低抓取失败的风险,抓取应规划在安全区域。我们提出了一种新的算法,即无迹贝叶斯优化,能够在考虑输入噪声的情况下进行样本高效的优化以找到安全的极值。无迹贝叶斯优化的贡献是双方面的:一方面提供了一个新的决策过程,驱动探索到安全区域;另一方面提供了一个新的选择过程,选择在不进行额外分析或计算成本的情况下最优的抓取策略。这两个贡献都根植于无迹变换背后的强大理论,这是一种流行的非线性近似方法。我们在合成问题和现实的机器人抓取模拟中展示了其相对于经典贝叶斯优化的优势。结果表明,我们的方法在几次试验后就能获得最优且鲁棒的抓取策略,同时所选的抓取保持在安全区域。

英文摘要

We address the robot grasp optimization problem of unknown objects considering uncertainty in the input space. Grasping unknown objects can be achieved by using a trial and error exploration strategy. Bayesian optimization is a sample efficient optimization algorithm that is especially suitable for this setup as it actively reduces the number of trials for learning about the function to optimize. In fact, this active object exploration is the same strategy that infants use to learn optimal grasps. One problem that arises while learning grasping policies is that some configurations of grasp parameters may be very sensitive to error in the relative pose between the object and robot end-effector. We call these configurations unsafe because small errors during grasp execution may turn good grasps into bad grasps. Therefore, to reduce the risk of grasp failure, grasps should be planned in safe areas. We propose a new algorithm, Unscented Bayesian optimization, that is able to perform sample efficient optimization while taking into consideration input noise to find safe optima. The contribution of Unscented Bayesian optimization is twofold, as it provides a new decision process that drives exploration to safe regions and a new selection procedure that chooses the optimum in terms of its safety without extra analysis or computational cost. Both contributions are rooted in the strong theory behind the unscented transformation, a popular nonlinear approximation method. We show its advantages with respect to the classical Bayesian optimization both in synthetic problems and in realistic robot grasp simulations. The results highlight that our method achieves optimal and robust grasping policies after few trials while the selected grasps remain in safe regions.
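
The notion of a safe optimum in the abstract above can be illustrated with the unscented transformation alone. The sketch below approximates the expected value of a function under Gaussian input noise using three sigma points in one dimension, then compares the nominal maximizer with the maximizer of that noise-aware score. It is an illustration under stated assumptions, not the paper's algorithm: the objective is a made-up known function rather than a learned surrogate, and the names, noise level, and `kappa` value are hypothetical.

```python
import math


def unscented_expectation(f, x, sigma, kappa=1.0):
    """Approximate E[f(x + eps)], eps ~ N(0, sigma^2), with 1-D sigma points."""
    d = 1  # input dimension
    spread = math.sqrt(d + kappa) * sigma
    w_center = kappa / (d + kappa)
    w_side = 1.0 / (2.0 * (d + kappa))
    return w_center * f(x) + w_side * (f(x + spread) + f(x - spread))


def objective(x):
    """Hypothetical grasp-quality curve: a tall narrow peak and a lower broad one."""
    narrow = 1.0 * math.exp(-((x - 1.0) ** 2) / (2 * 0.05 ** 2))
    broad = 0.8 * math.exp(-((x - 3.0) ** 2) / (2 * 0.8 ** 2))
    return narrow + broad


candidates = [i / 100 for i in range(0, 501)]

# The nominal optimum ignores input noise and lands on the narrow peak.
best_nominal = max(candidates, key=objective)

# The noise-aware optimum scores each candidate by its unscented expectation.
best_safe = max(
    candidates, key=lambda x: unscented_expectation(objective, x, sigma=0.3)
)
print(best_nominal, best_safe)
```

The narrow peak scores highest when evaluated exactly, but a small input error destroys it, so the noise-aware score prefers the broad peak; this is the sense in which the abstract calls some configurations unsafe.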

1603.00748 2026-06-04 cs.LG cs.AI cs.RO cs.SY eess.SY 版本更新

Continuous Deep Q-Learning with Model-based Acceleration

基于模型的连续深度Q学习加速

Shixiang Gu, Timothy Lillicrap, Ilya Sutskever, Sergey Levine

AI总结 本文提出连续深度Q学习算法NAF及基于模型的加速方法,用于提升连续控制任务的样本效率和学习速度。

详情
AI中文摘要

模型无关强化学习已成功应用于多种挑战性问题,并扩展到处理大规模神经网络策略和价值函数。然而,模型无关算法的样本复杂性,特别是使用高维函数近似器时,限制了其在物理系统中的应用。本文探索了减少深度强化学习样本复杂性的算法和表示方法。我们提出两种互补技术来提高此类算法的效率。首先,我们推导出Q学习的连续变种,称为归一化优势函数(NAF),作为替代更常用的策略梯度和actor-critic方法。NAF表示允许我们应用带有经验回放的Q学习来处理连续任务,并在一组模拟机器人控制任务上显著提高性能。为进一步提高我们的方法效率,我们探索了使用学习模型来加速模型无关强化学习。我们展示迭代重新拟合的局部线性模型在这一点上特别有效,并在适用此类模型的领域中展示了显著更快的学习速度。

英文摘要

Model-free reinforcement learning has been successfully applied to a range of challenging problems, and has recently been extended to handle large neural network policies and value functions. However, the sample complexity of model-free algorithms, particularly when using high-dimensional function approximators, tends to limit their applicability to physical systems. In this paper, we explore algorithms and representations to reduce the sample complexity of deep reinforcement learning for continuous control tasks. We propose two complementary techniques for improving the efficiency of such algorithms. First, we derive a continuous variant of the Q-learning algorithm, which we call normalized advantage functions (NAF), as an alternative to the more commonly used policy gradient and actor-critic methods. The NAF representation allows us to apply Q-learning with experience replay to continuous tasks, and substantially improves performance on a set of simulated robotic control tasks. To further improve the efficiency of our approach, we explore the use of learned models for accelerating model-free reinforcement learning. We show that iteratively refitted local linear models are especially effective for this, and demonstrate substantially faster learning on domains where such models are applicable.
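A short sketch may help show why a normalized advantage function makes Q-learning workable with continuous actions. The abstract does not state the formula, so the form used here, Q(x, u) = V(x) - 0.5 (u - mu)^T L L^T (u - mu) with a lower-triangular L, is an assumption based on the commonly cited NAF parameterization. In practice V, mu and L would be network outputs for a state x; the fixed numbers and the function name `naf_q` below are hypothetical placeholders.

```python
def naf_q(u, mu, L, v):
    """Q-value of action u for one fixed state, given V(x)=v, mean action mu and factor L."""
    d = [ui - mi for ui, mi in zip(u, mu)]
    n = len(d)
    # z = L^T d, so that (u - mu)^T (L L^T) (u - mu) equals ||z||^2
    z = [sum(L[i][j] * d[i] for i in range(n)) for j in range(n)]
    return v - 0.5 * sum(zj * zj for zj in z)

# Placeholder values standing in for network outputs at some state x (assumed).
v = 3.0
mu = [0.5, -1.0]
L = [[1.0, 0.0],
     [0.3, 2.0]]  # lower-triangular with a positive diagonal

print(naf_q(mu, mu, L, v))          # value at the greedy action
print(naf_q([0.0, 0.0], mu, L, v))  # any other action scores lower
```

Because the advantage term is a negative quadratic in the action, the greedy action is simply `mu` and the maximum of Q over actions is `v`, so the Q-learning target needs no inner optimization.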

1506.01326 2026-06-04 math.NA cs.AI cs.LG cs.NA stat.CO stat.ML 版本更新

Probabilistic Numerics and Uncertainty in Computations

概率数值计算与计算中的不确定性

Philipp Hennig, Michael A Osborne, Mark Girolami

AI总结 本文呼吁采用概率数值方法,通过在计算中返回不确定性来改进线性代数、积分、优化和微分方程求解等算法,强调其在气候科学和天文学等领域的应用价值。

Comments Author Generated Postprint. 17 pages, 4 Figures, 1 Table

详情
AI中文摘要

我们呼吁采用概率数值方法:即在数值任务中返回不确定性的算法,包括线性代数、积分、优化和求解微分方程。这些不确定性源于数值计算中由于时间和硬件限制导致的精度损失,对现代科学和工业至关重要。在诸如气候科学和天文学等应用中,基于大规模复杂数据的计算需求促使重新关注数值不确定性的管理。我们描述了几种经典数值方法如何自然地被解释为概率推断。然后展示概率观点如何提出新的算法,能够灵活适应应用需求,并提供改进的实证性能。我们提供了天文学和天文成像等实际科学问题中概率数值算法的实例,同时指出这些新算法存在的开放问题。最后,我们描述了概率数值方法如何为结合数值算法(如数值优化器和微分方程求解器)的计算提供一致的框架,可能允许诊断(和控制)计算中的误差源。

英文摘要

We deliver a call to arms for probabilistic numerical methods: algorithms for numerical tasks, including linear algebra, integration, optimization and solving differential equations, that return uncertainties in their calculations. Such uncertainties, arising from the loss of precision induced by numerical calculation with limited time or hardware, are important for much contemporary science and industry. Within applications such as climate science and astrophysics, the need to make decisions on the basis of computations with large and complex data has led to a renewed focus on the management of numerical uncertainty. We describe how several seminal classic numerical methods can be interpreted naturally as probabilistic inference. We then show that the probabilistic view suggests new algorithms that can flexibly be adapted to suit application specifics, while delivering improved empirical performance. We provide concrete illustrations of the benefits of probabilistic numeric algorithms on real scientific problems from astrometry and astronomical imaging, while highlighting open problems with these new algorithms. Finally, we describe how probabilistic numerical methods provide a coherent framework for identifying the uncertainty in calculations performed with a combination of numerical algorithms (e.g. both numerical optimisers and differential equation solvers), potentially allowing the diagnosis (and control) of error sources in computations.

1402.0635 2026-06-04 stat.ML cs.AI cs.LG cs.SY eess.SY 版本更新

Generalization and Exploration via Randomized Value Functions

通过随机价值函数实现泛化与探索

Ian Osband, Benjamin Van Roy, Zheng Wen

AI总结 本文提出随机最小二乘价值迭代算法(RLSVI),通过线性参数化价值函数实现高效的探索与泛化,证明其在无先验知识学习中的近优性能。

Comments arXiv admin note: text overlap with arXiv:1307.4847

详情
AI中文摘要

我们提出了随机最小二乘价值迭代(RLSVI)——一种新的强化学习算法,旨在通过线性参数化价值函数实现高效的探索与泛化。我们解释了使用玻尔兹曼或epsilon-贪婪探索的最小二乘价值迭代版本为何效率低下,并通过计算结果展示了RLSVI带来的显著效率提升。进一步,我们建立了RLSVI预期遗憾的上界,证明其在无先验知识学习中的近最优性。更广泛地说,我们的结果表明,随机价值函数为解决强化学习中的关键挑战——合成高效探索与有效泛化——提供了一种有前景的方法。

英文摘要

We propose randomized least-squares value iteration (RLSVI) -- a new reinforcement learning algorithm designed to explore and generalize efficiently via linearly parameterized value functions. We explain why versions of least-squares value iteration that use Boltzmann or epsilon-greedy exploration can be highly inefficient, and we present computational results that demonstrate dramatic efficiency gains enjoyed by RLSVI. Further, we establish an upper bound on the expected regret of RLSVI that demonstrates near-optimality in a tabula rasa learning context. More broadly, our results suggest that randomized value functions offer a promising approach to tackling a critical challenge in reinforcement learning: synthesizing efficient exploration and effective generalization.

1406.5311 2026-06-04 math.OC cs.AI cs.LG cs.NA math.NA stat.ML 版本更新

Towards A Deeper Geometric, Analytic and Algorithmic Understanding of Margins

迈向更深入的几何、分析和算法对边界的理解

Aaditya Ramdas, Javier Peña

AI总结 本文研究了矩阵A的边界条件度量,探讨了线性可行性问题的难度,通过几何、分析和算法方法扩展了边界理论,并证明了感知机收敛率与边界的关联。

Comments 18 pages, 3 figures

详情
Journal ref
Optimization Methods and Software, Volume 31, Issue 2, Pages 377-391, 2016
AI中文摘要

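给定一个矩阵A,线性可行性问题(线性分类是其特例)旨在求解原问题w: A^Tw > 0,或为对偶问题找到一个证书,即概率分布p: Ap = 0。受“大间隔分类器”在机器学习中持续重要性的启发,本文研究了A的一个条件度量,称为间隔(margin),它决定了上述两个问题的难度。为辅助几何直观,我们首先用相关的球、锥和包给出间隔的新刻画。第二个贡献是分析性的:我们给出了Gordan定理的推广以及Hoffman定理的变体,二者都使用间隔。最后,我们证明了关于经典迭代算法感知机的一些新结果,其收敛速率依赖于间隔。我们的结果有助于更深入地理解基于间隔的学习并证明迭代算法的收敛速率,同时也为这一广泛的主题提供了统一的视角。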

英文摘要

Given a matrix $A$, a linear feasibility problem (of which linear classification is a special case) aims to find a solution to a primal problem $w: A^Tw > \mathbf{0}$ or a certificate for the dual problem which is a probability distribution $p: Ap = \mathbf{0}$. Inspired by the continued importance of "large-margin classifiers" in machine learning, this paper studies a condition measure of $A$ called its margin that determines the difficulty of both the above problems. To aid geometrical intuition, we first establish new characterizations of the margin in terms of relevant balls, cones and hulls. Our second contribution is analytical, where we present generalizations of Gordan's theorem, and variants of Hoffman's theorems, both using margins. We end by proving some new results on a classical iterative scheme, the Perceptron, whose convergence rate famously depends on the margin. Our results are relevant for a deeper understanding of margin-based learning and proving convergence rates of iterative schemes, apart from providing a unifying perspective on this vast topic.
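The primal problem and the role of the margin can be made concrete with the classical Perceptron. The sketch below is the textbook algorithm, not the variants analysed in the paper. It assumes the columns of $A$ are unit vectors, passed as a list of tuples. The function name `perceptron` and the three example columns are hypothetical.

```python
def perceptron(columns, max_updates=10_000):
    """Search for w with a . w > 0 for every column a of A.

    Returns (w, number_of_updates), or (None, max_updates) if the budget runs out.
    """
    w = [0.0] * len(columns[0])
    updates = 0
    while updates < max_updates:
        # Pick any column that the current w fails to classify strictly positively.
        violated = next(
            (a for a in columns if sum(ai * wi for ai, wi in zip(a, w)) <= 0), None
        )
        if violated is None:
            return w, updates
        w = [wi + ai for wi, ai in zip(w, violated)]  # classical Perceptron update
        updates += 1
    return None, updates

columns = [(1.0, 0.0), (0.6, 0.8), (0.0, 1.0)]  # unit-norm columns of A (assumed data)
w, updates = perceptron(columns)
print(w, updates)
```

For these columns the best unit-norm separator is (1, 1)/sqrt(2), giving a margin of about 0.707, so the classical mistake bound of 1/margin^2 allows at most two updates, which matches the run above.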

1601.00738 2026-06-04 cs.DC cs.AI cs.DB cs.SY eess.SY 版本更新

Resource Sharing for Multi-Tenant NoSQL Data Store in Cloud

多租户NoSQL数据存储在云计算中的资源共享

Jiaan Zeng

AI总结 本文研究多租户NoSQL数据存储中资源共享问题,提出两种解决方案:针对独立数据本地文件系统和共享数据并行文件系统的调度方案与工作负载感知资源预留方法,以提高性能和适应动态工作负载。

Comments PhD dissertation, December 2015

详情
AI中文摘要

多租户模式在云计算NoSQL数据存储中受到青睐,因为它能够在低成本下实现资源共享。多租户模式根据后端文件系统是本地文件系统(LFS)还是并行文件系统(PFS),以及租户是否独立或跨租户共享数据而有所不同。本论文聚焦并提出解决两种情况的方案:独立数据本地文件系统和共享数据并行文件系统。在独立数据本地文件系统情况下,Cassandra和HBase等最先进的NoSQL存储在特定条件下会出现资源竞争,导致性能下降。我们研究了干扰现象并提出两种方法。第一种提供了一种调度方案,可以近似资源消耗,适应工作负载动态并以分布式方式运行。第二种引入了工作负载感知的资源预留方法,以防止干扰。该方法依赖于离线获得的性能模型,并根据不同的工作负载资源需求进行预留。结果表明,这两种方法可以共同防止干扰并适应多租户下的动态工作负载。在共享数据并行文件系统情况下,已证明在租户之间共享数据时,使用并行文件系统运行分布式NoSQL存储并不经济。由于NoSQL存储对并行文件系统的不熟悉,引入了额外的开销。本论文针对键值存储(KVS),一种特定的NoSQL存储形式,提出了一种轻量级的KVS,基于并行文件系统以提高效率。该解决方案基于嵌入式KVS以实现高性能,但使用新颖的数据结构支持并发写入。结果表明,所提出的系统在多种不同的工作负载下均优于Cassandra和Voldemort。

英文摘要

Multi-tenant hosting of users in cloud NoSQL data stores is favored by cloud providers because it enables resource sharing at low operating cost. Multi-tenancy takes several forms depending on whether the back-end file system is a local file system (LFS) or a parallel file system (PFS), and on whether tenants are independent or share data across tenants. In this thesis I focus on and propose solutions to two cases: independent data-local file system, and shared data-parallel file system. In the independent data-local file system case, resource contention occurs under certain conditions in Cassandra and HBase, two state-of-the-art NoSQL stores, causing one tenant to degrade the performance of another. I investigate the interference and propose two approaches. The first provides a scheduling scheme that can approximate resource consumption, adapt to workload dynamics and work in a distributed fashion. The second introduces a workload-aware resource reservation approach to prevent interference. The approach relies on a performance model obtained offline and plans the reservation according to different workload resource demands. Results show the approaches together can prevent interference and adapt to dynamic workloads under multi-tenancy. In the shared data-parallel file system case, it has been shown that running a distributed NoSQL store over PFS for shared data across tenants is not cost effective. Overheads are introduced because the NoSQL store is unaware of the PFS. This dissertation targets the key-value store (KVS), a specific form of NoSQL store, and proposes a lightweight KVS over a parallel file system to improve efficiency. The solution is built on an embedded KVS for high performance but uses novel data structures to support concurrent writes. Results show the proposed system outperforms Cassandra and Voldemort in several different workloads.

1512.06789 2026-06-04 stat.ML cs.AI cs.SY eess.SY math.OC 版本更新

Information-Theoretic Bounded Rationality

信息论有界理性

Pedro A. Ortega, Daniel A. Braun, Justin Dyer, Kee-Eung Kim, Naftali Tishby

AI总结 本文基于信息论提出有界理性的理论,通过自由能函数描述决策,具备控制解空间、精确蒙特卡洛规划及捕捉模型不确定性的特性,并扩展至序列决策。

Comments 47 pages, 19 figures

详情
AI中文摘要

有界理性,即在资源限制下进行决策和规划,被认为是人工智能、强化学习、计算神经科学和经济学中的重要开放问题。本文提供了一个基于信息论的有界理性的理论综述。我们为使用自由能功能作为有界理性决策的客观函数提供了概念论证。该功能具有三个关键特性:它控制了解空间的大小;它具有精确的蒙特卡洛规划器,却无需穷尽搜索;它捕捉到缺乏证据或与其他具有未知意图的智能体交互时产生的模型不确定性。我们讨论了单步决策情况,并展示如何通过等价变换扩展到序列决策。这种扩展产生了一种非常一般的决策问题类,涵盖了经典决策规则(如EXPECTIMAX和MINIMAX)作为极限情况,以及信任和风险敏感的规划。

英文摘要

Bounded rationality, that is, decision-making and planning under resource limitations, is widely regarded as an important open problem in artificial intelligence, reinforcement learning, computational neuroscience and economics. This paper offers a consolidated presentation of a theory of bounded rationality based on information-theoretic ideas. We provide a conceptual justification for using the free energy functional as the objective function for characterizing bounded-rational decisions. This functional possesses three crucial properties: it controls the size of the solution space; it has Monte Carlo planners that are exact, yet bypass the need for exhaustive search; and it captures model uncertainty arising from lack of evidence or from interacting with other agents having unknown intentions. We discuss the single-step decision-making case, and show how to extend it to sequential decisions using equivalence transformations. This extension yields a very general class of decision problems that encompass classical decision rules (e.g. EXPECTIMAX and MINIMAX) as limit cases, as well as trust- and risk-sensitive planning.
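The limit cases mentioned at the end of the abstract can be checked numerically for a single decision. The abstract gives no formulas, so the sketch assumes the commonly used form of the free-energy objective, F[p] = E_p[U] - (1/beta) KL(p || q), whose maximizer is p(x) proportional to q(x) exp(beta U(x)) and whose optimal value is (1/beta) log sum_x q(x) exp(beta U(x)). The function names, utilities and prior below are hypothetical.

```python
import math

def bounded_rational_policy(utilities, prior, beta):
    """Distribution p(x) proportional to prior(x) * exp(beta * U(x))."""
    shift = max(beta * u for u in utilities)  # subtract the max exponent for stability
    weights = [q * math.exp(beta * u - shift) for q, u in zip(prior, utilities)]
    total = sum(weights)
    return [w / total for w in weights]

def certainty_equivalent(utilities, prior, beta):
    """Optimal free energy (1/beta) * log sum_x prior(x) exp(beta U(x)), for beta != 0."""
    shift = max(beta * u for u in utilities)
    log_sum = math.log(sum(q * math.exp(beta * u - shift) for q, u in zip(prior, utilities)))
    return (shift + log_sum) / beta

utilities = [1.0, 2.0, 4.0]  # assumed utilities of three options
prior = [0.5, 0.3, 0.2]      # assumed prior (default) choice probabilities

print(certainty_equivalent(utilities, prior, 1e-6))    # close to the prior expectation, 1.9
print(certainty_equivalent(utilities, prior, 200.0))   # close to the maximum utility, 4
print(certainty_equivalent(utilities, prior, -200.0))  # close to the minimum utility, 1
```

Under this assumed form, a large positive beta recovers the maximizing decision-maker, a beta near zero recovers plain expectation under the prior, and a large negative beta recovers the worst-case (adversarial) value, which is how expectation, max and min nodes can be seen as limits of one rule.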

1512.06427 2026-06-04 cs.AI cs.DS cs.SY eess.SY math.OC 版本更新

Towards Integrated Glance To Restructuring in Combinatorial Optimization

迈向组合优化中重构的整合视角

Mark Sh. Levin

AI总结 本文研究组合优化中解决方案的重构问题,探讨重构成本与目标解接近度,并针对三种重构类型提出单准则和多准则问题解决方法。

Comments 31 pages, 34 figures, 10 tables

详情
AI中文摘要

本文聚焦于一组新的组合优化问题,即解决方案(作为集合/结构)的重构。重构过程的两个主要特征是重构成本和接近目标解的程度。研究了三种重构问题类型:(a)单阶段结构化,(b)多阶段结构化,(c)在改变元素集上的结构化。可以考虑单准则和多准则问题形式。重构问题对应于模块化系统或解决方案的重新设计(改进、升级)。本文描述并示例了针对背包问题、多选问题、分配问题、生成树问题、聚类问题、多准则排序问题、形态学团问题等组合优化问题的重构方法。数值示例展示了重构问题和求解方案。

英文摘要

The paper focuses on a new class of combinatorial problems that consist in restructuring solutions (as sets/structures) in combinatorial optimization. Two main features of the restructuring process are examined: (i) the cost of the restructuring and (ii) the closeness to a goal solution. Three types of restructuring problems are studied: (a) one-stage structuring, (b) multi-stage structuring, and (c) structuring over a changed element set. One-criterion and multicriteria problem formulations can be considered. The restructuring problems correspond to redesign (improvement, upgrade) of modular systems or solutions. The restructuring approach is described and illustrated (problem statements, solving schemes, examples) for the following combinatorial optimization problems: knapsack problem, multiple choice problem, assignment problem, spanning tree problems, clustering problem, multicriteria ranking (sorting) problem, morphological clique problem. Numerical examples illustrate the restructuring problems and solving schemes.

1512.01885 2026-06-04 cs.AI cs.SY eess.SY math.OC 版本更新

Probabilistic Structural Controllability in Causal Bayesian Networks

因果贝叶斯网络中的概率结构可控性

Ardavan Salehi Nobandegani, Ioannis N. Psaromiligkos

AI总结 本文首次研究因果贝叶斯网络中的概率可控性问题,提出概率结构可控性的定义,并识别出一组足够的驱动变量以实现目标变量状态的概率控制。

详情
AI中文摘要

人类经常面临一个关键问题,即在不确定的环境中,如何通过干预驱动变量来增加或减少目标变量达到期望或非期望状态的概率。本文首次研究了因果贝叶斯网络中的概率可控性问题。具体而言,本文旨在两方面:(i) 引入并形式化因果贝叶斯网络中的概率结构可控性问题;(ii) 识别一组足够的驱动变量以实现因果贝叶斯网络的概率结构可控性。我们还阐述了所识别的驱动变量集合所满足的最小性性质。在此背景下,'结构'一词表示仅已知CBN的结构。

英文摘要

Humans routinely confront the following key question which could be viewed as a probabilistic variant of the controllability problem: While faced with an uncertain environment governed by causal structures, how should they practice their autonomy by intervening on driver variables, in order to increase (or decrease) the probability of attaining their desired (or undesired) state for some target variable? In this paper, for the first time, the problem of probabilistic controllability in Causal Bayesian Networks (CBNs) is studied. More specifically, the aim of this paper is two-fold: (i) to introduce and formalize the problem of probabilistic structural controllability in CBNs, and (ii) to identify a sufficient set of driver variables for the purpose of probabilistic structural controllability of a generic CBN. We also elaborate on the nature of minimality the identified set of driver variables satisfies. In this context, the term "structural" signifies the condition wherein solely the structure of the CBN is known.

1505.00274 2026-06-04 cs.AI cs.SY eess.SY stat.ML 版本更新

Stick-Breaking Policy Learning in Dec-POMDPs

在Dec-POMDPs中采用Stick-Breaking策略的学习

Miao Liu, Christopher Amato, Xuejun Liao, Lawrence Carin, Jonathan P. How

AI总结 本文提出了一种变大小状态控制器的Dec-SBPR框架,通过Stick-Breaking先验构建局部策略,无需假设Dec-POMDP模型即可学习控制器参数,有效提升大规模问题的性能。

详情
AI中文摘要

期望最大化(EM)最近已被证明是学习大规模分布式部分可观测马尔可夫决策过程(Dec-POMDPs)中有限状态控制器(FSCs)的高效算法。然而,当前方法使用固定大小的FSCs,通常收敛于远离最优的极值。本文考虑使用可变大小的FSCs来表示每个智能体的局部策略。这些可变大小的FSCs通过Stick-Breaking先验构建,导致一种新的框架,称为去中心化Stick-Breaking策略表示(Dec-SBPR)。该方法通过变分贝叶斯算法学习控制器参数,而无需假设Dec-POMDP模型可用。Dec-SBPR在多个基准问题上的性能表明,该算法能够扩展到大规模问题,同时优于其他最先进的方法。

英文摘要

Expectation maximization (EM) has recently been shown to be an efficient algorithm for learning finite-state controllers (FSCs) in large decentralized POMDPs (Dec-POMDPs). However, current methods use fixed-size FSCs and often converge to maxima that are far from optimal. This paper considers a variable-size FSC to represent the local policy of each agent. These variable-size FSCs are constructed using a stick-breaking prior, leading to a new framework called decentralized stick-breaking policy representation (Dec-SBPR). This approach learns the controller parameters with a variational Bayesian algorithm without having to assume that the Dec-POMDP model is available. The performance of Dec-SBPR is demonstrated on several benchmark problems, showing that the algorithm scales to large problems while outperforming other state-of-the-art methods.
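The stick-breaking construction behind the variable-size controllers can be sketched generically. The abstract does not specify the exact prior used in Dec-SBPR or how the weights map onto controller nodes, so the code below only shows the standard truncated stick-breaking process with Beta(1, alpha) fractions. The function name `stick_breaking_weights`, the concentration `alpha` and the truncation level are assumptions.

```python
import random

def stick_breaking_weights(alpha, truncation, rng):
    """Truncated stick-breaking: v_k ~ Beta(1, alpha), pi_k = v_k * prod_{j<k} (1 - v_j)."""
    weights, remaining = [], 1.0
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction of the remaining stick to break off
        weights.append(v * remaining)
        remaining *= 1.0 - v
    weights.append(remaining)  # the last piece takes what is left, so the weights sum to 1
    return weights

rng = random.Random(0)  # fixed seed so the sketch is reproducible
weights = stick_breaking_weights(alpha=2.0, truncation=10, rng=rng)
print([round(w, 3) for w in weights])
```

Smaller values of `alpha` put most of the mass on the first few pieces, which is the mechanism that lets a model of this kind use only as many components (here, controller nodes) as the data supports.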

1509.03044 2026-06-04 cs.LG cs.AI cs.SY eess.SY 版本更新

Recurrent Reinforcement Learning: A Hybrid Approach

递归强化学习:一种混合方法

Xiujun Li, Lihong Li, Jianfeng Gao, Xiaodong He, Jianshu Chen, Li Deng, Ji He

AI总结 本文提出一种混合模型,结合监督学习和强化学习,用于部分可观测任务的状态表示学习,在极少领域知识下有效。

Comments 11 pages, 6 figures

详情
AI中文摘要

成功的强化学习应用往往需要处理部分可观测状态。通常很难构建和推断隐藏状态,因为它们依赖于智能体的整个交互历史,可能需要大量领域知识。本文研究了一种深度学习方法,用于在极少领域知识下学习部分可观测任务的状态表示。特别地,我们提出了一种新的混合模型,结合监督学习(SL)和强化学习(RL)的优点,以联合方式训练:SL组件可以是循环神经网络(RNN)或其长短期记忆(LSTM)版本,具有捕捉长期依赖性的能力,从而有效学习隐藏状态的表示。RL组件是一个深度Q网络(DQN),学习优化控制以最大化长期奖励。在直接邮寄营销问题上的大量实验展示了所提出方法的有效性和优势,其在一组先前最先进的方法中表现最佳。

英文摘要

Successful applications of reinforcement learning in real-world problems often require dealing with partially observable states. It is in general very challenging to construct and infer hidden states as they often depend on the agent's entire interaction history and may require substantial domain knowledge. In this work, we investigate a deep-learning approach to learning the representation of states in partially observable tasks, with minimal prior knowledge of the domain. In particular, we propose a new family of hybrid models that combine the strengths of both supervised learning (SL) and reinforcement learning (RL), trained in a joint fashion: The SL component can be a recurrent neural network (RNN) or its long short-term memory (LSTM) version, which is equipped with the desired property of being able to capture long-term dependency on history, thus providing an effective way of learning the representation of hidden states. The RL component is a deep Q-network (DQN) that learns to optimize the control for maximizing long-term rewards. Extensive experiments in a direct mailing campaign problem demonstrate the effectiveness and advantages of the proposed approach, which performs the best among a set of previous state-of-the-art methods.

1510.07313 2026-06-04 eess.SY cs.AI cs.LO cs.RO cs.SY 版本更新

Safe Control under Uncertainty

在不确定性下的安全控制

Dorsa Sadigh, Ashish Kapoor

AI总结 本文提出Probabilistic Signal Temporal Logic(PrSTL)用于定义随机属性并确保概率保证,通过该逻辑合成安全控制器,应用于四旋翼和自动驾驶车辆等案例。

Comments 10 pages, 6 figures, Submitted to HSCC 2016

详情
AI中文摘要

本文提出Probabilistic Signal Temporal Logic(PrSTL)作为定义随机属性并确保概率保证的表达语言,通过该逻辑合成安全控制器,应用于四旋翼和自动驾驶车辆等案例。

英文摘要

Controller synthesis for hybrid systems that satisfy temporal specifications expressing various system properties is a challenging problem that has drawn the attention of many researchers. However, making the assumption that such temporal properties are deterministic is far from the reality. For example, many of the properties the controller has to satisfy are learned through machine learning techniques based on sensor input data. In this paper, we propose a new logic, Probabilistic Signal Temporal Logic (PrSTL), as an expressive language to define the stochastic properties, and enforce probabilistic guarantees on them. We further show how to synthesize safe controllers using this logic for cyber-physical systems under the assumption that the stochastic properties are based on a set of Gaussian random variables. One of the key distinguishing features of PrSTL is that the encoded logic is adaptive and changes as the system encounters additional data and updates its beliefs about the latent random variables that define the safety properties. We demonstrate our approach by synthesizing safe controllers under the PrSTL specifications for multiple case studies including control of quadrotors and autonomous vehicles in dynamic environments.

1510.04914 2026-06-04 cs.AI cs.DC cs.MS cs.NA math.NA math.OC 版本更新

Hybridization of Interval CP and Evolutionary Algorithms for Optimizing Difficult Problems

区间CP与进化算法的混合方法用于优化难题

Charlie Vanaret, Jean-Baptiste Gotteland, Nicolas Durand, Jean-Marc Alliot

AI总结 本文提出一种混合框架,结合区间方法与进化算法,通过消息传递实现并行搜索,展示Charibde在解决困难COCONUT问题时优于现有求解器。

Comments 21st International Conference on Principles and Practice of Constraint Programming (CP 2015), 2015

详情
AI中文摘要

在全局优化中,唯一严谨的数值证明最优性的方法是基于区间的算法,通过搜索空间的分支和不可含最优解的子域修剪。最先进的求解器通常整合局部优化算法来计算每个子空间的良好上界。本文提出了一种合作框架,其中区间方法与进化算法相互协作。后者是随机算法,通过候选解种群在搜索空间中迭代进化以达到满意解。在我们的合作求解器Charibde中,进化算法和基于区间的算法并行运行,并通过消息传递以高级方式交换边界、解和搜索空间。对困难COCONUT问题的基准测试表明,Charibde在非严谨求解器中具有竞争力,并比严谨求解器快一个数量级收敛。

英文摘要

The only rigorous approaches for achieving a numerical proof of optimality in global optimization are interval-based methods that interleave branching of the search-space and pruning of the subdomains that cannot contain an optimal solution. State-of-the-art solvers generally integrate local optimization algorithms to compute a good upper bound of the global minimum over each subspace. In this document, we propose a cooperative framework in which interval methods cooperate with evolutionary algorithms. The latter are stochastic algorithms in which a population of candidate solutions iteratively evolves in the search-space to reach satisfactory solutions. Within our cooperative solver Charibde, the evolutionary algorithm and the interval-based algorithm run in parallel and exchange bounds, solutions and search-space in an advanced manner via message passing. A comparison of Charibde with state-of-the-art interval-based solvers (GlobSol, IBBA, Ibex) and NLP solvers (Couenne, BARON) on a benchmark of difficult COCONUT problems shows that Charibde is highly competitive against non-rigorous solvers and converges faster than rigorous solvers by an order of magnitude.

1509.05722 2026-06-04 stat.ML cs.AI cs.MA cs.SY eess.SY 版本更新

Energy saving in smart homes based on consumer behaviour: A case study

基于消费者行为的智能家庭节能:一个案例研究

Michael Zehnder, Holger Wache, Hans-Friedrich Witschel, Danilo Zanatta, Miguel Rodriguez

AI总结 本文提出一个节能推荐系统,通过分析消费者行为数据,利用机器学习建议减少家庭能耗,同时保持居住舒适度。

Comments To be presented on IEEE International Smart Cities Conference 2015

详情
AI中文摘要

本文介绍了一个推荐系统案例,该系统可帮助智能家庭在不降低居住舒适度的情况下节省能源。系统利用消费者行为数据,通过机器学习建议居民采取节能行动。系统从Digitalstrom家庭自动化系统提供的事件数据中挖掘频繁和周期性模式,将这些模式转换为关联规则,并与居民当前行为进行比较。如果系统检测到可以在不降低舒适度的情况下节能的机会,它会向居民发送推荐。

英文摘要

This paper presents a case study of a recommender system that can be used to save energy in smart homes without lowering the comfort of the inhabitants. We present an algorithm that uses only consumer behavior data and applies machine learning to suggest actions for inhabitants to reduce the energy consumption of their homes. The system mines for frequent and periodic patterns in the event data provided by the Digitalstrom home automation system. These patterns are converted into association rules, prioritized and compared with the current behavior of the inhabitants. If the system detects an opportunity to save energy without decreasing the comfort level, it sends a recommendation to the residents.
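The step from mined patterns to association rules can be illustrated with the two standard rule statistics, support and confidence. The abstract does not say which measures or thresholds the system uses, so this is only a generic sketch. The event names and the function `rule_stats` are hypothetical and are not taken from the Digitalstrom data.

```python
def rule_stats(transactions, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    antecedent, consequent = set(antecedent), set(consequent)
    n_antecedent = sum(1 for t in transactions if antecedent <= set(t))
    n_both = sum(1 for t in transactions if (antecedent | consequent) <= set(t))
    support = n_both / len(transactions)
    confidence = n_both / n_antecedent if n_antecedent else 0.0
    return support, confidence

# Each transaction groups events observed together (made-up example data).
transactions = [
    {"leave_home", "lights_off"},
    {"leave_home", "lights_off", "heating_down"},
    {"leave_home"},
    {"arrive_home", "lights_on"},
]

support, confidence = rule_stats(transactions, {"leave_home"}, {"lights_off"})
print(support, confidence)
```

A recommender built on such rules could flag the third transaction: the antecedent `leave_home` occurred, the usual consequent `lights_off` did not, so a reminder to switch the lights off is a candidate recommendation.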

1508.03863 2026-06-04 cs.AI cs.SY eess.SY math.OC 版本更新

Discrete Route/Trajectory Decision Making Problems

离散路径/轨迹决策制定问题

Mark Sh. Levin

AI总结 本文研究了复合多阶段决策问题,旨在设计从初始决策状态到目标决策状态的路径或轨迹。通过汽车路线问题作为基本物理隐喻,探讨了离散操作/状态设计空间(如有向图)中的多种决策问题,并在教育、医学和经济等领域应用。

Comments 25 pages, 34 figures, 16 tables

详情
AI中文摘要

本文聚焦于复合多阶段决策问题,旨在从初始决策状态(起点)到目标(终点)决策状态的设计。汽车路线问题被视为基本物理隐喻。这些问题基于离散(组合)操作/状态设计/解决问题空间(例如,有向图)。描述的离散决策问题类型可视为智能路径(轨迹、策略)的设计,并可用于多个领域:(a)教育(学生教育轨迹规划),(b)医学(医疗治疗),(c)经济(初创企业发展的轨迹)。描述了几种路径决策问题类型:(i)基本路径决策,(ii)多目标路径决策,(iii)多路径决策,(iv)带路径/轨迹变化的多路径决策,(v)复合多路径决策(解决方案是几个对应领域的多个路径/轨迹的组合),(vi)带协调路径/轨迹的复合多路径决策。此外,还考虑了建模和构建设计空间的问题。数值示例展示了所建议的方法。三个应用被考虑:教育轨迹(或然问题),初创公司计划(模块化三阶段设计),以及医疗计划(在有向图上规划,具有双组件顶点)。

英文摘要

The paper focuses on composite multistage decision making problems which are targeted to design a route/trajectory from an initial decision situation (origin) to goal (destination) decision situation(s). The automobile routing problem is considered as a basic physical metaphor. The problems are based on a discrete (combinatorial) operations/states design/solving space (e.g., digraph). The described types of discrete decision making problems can be considered as intelligent design of a route (trajectory, strategy) and can be used in many domains: (a) education (planning of student educational trajectory), (b) medicine (medical treatment), (c) economics (trajectory of start-up development). Several types of route decision making problems are described: (i) basic route decision making, (ii) multi-goal route decision making, (iii) multi-route decision making, (iv) multi-route decision making with route/trajectory change(s), (v) composite multi-route decision making (the solution is a composition of several routes/trajectories in several corresponding domains), and (vi) composite multi-route decision making with coordinated routes/trajectories. In addition, problems of modeling and building the design spaces are considered. Numerical examples illustrate the suggested approach. Three applications are considered: educational trajectory (orienteering problem), plan of start-up company (modular three-stage design), and plan of medical treatment (planning over digraph with two-component vertices).

1508.01345 2026-06-04 eess.SY cs.AI cs.SY 版本更新

Fuzzy Logic Based Direct Torque Control Of Induction Motor With Space Vector Modulation

基于模糊逻辑的感应电机直接转矩控制与空间矢量调制

Fatih Korkmaz, İsmail Topaloğlu, Hayati Mamur

AI总结 本文提出基于模糊逻辑的空间矢量调制方法,用于改进感应电机直接转矩控制中的高转矩脉动问题,通过Matlab/Simulink仿真验证了其在动态转矩和速度响应上的显著提升。

Comments 10 pages

详情
AI中文摘要

感应电机因其无刷结构、低成本和鲁棒性能而被广泛应用于各种领域。近年来,多种控制方法被提出,其中直接转矩控制因其快速的动态转矩响应和简单的控制结构而备受重视。然而,直接转矩控制方法仍存在一些缺点,其中最突出的是高转矩脉动。本文提出了一种新的方法,即基于模糊逻辑的空间矢量调制,旨在克服传统直接转矩控制中的高转矩脉动问题。为了测试和比较所提出的直接转矩控制方法与传统方法,已在Matlab/Simulink中进行了不同工作条件下的仿真。仿真结果表明,与传统直接转矩控制方法相比,该方法在动态转矩和速度响应方面有显著改进。

英文摘要

Induction motors have a wide range of applications owing to well-known advantages such as brushless structure, low cost and robust performance. Over the past years, many kinds of control methods have been proposed for induction motors, and direct torque control has gained great importance among them due to its fast dynamic torque response and simple control structure. However, the direct torque control method still has some handicaps compared with other control methods, the most important of which is high torque ripple. This paper suggests a new approach for direct-torque-controlled induction motors, fuzzy logic based space vector modulation, whose aim is to overcome the high torque ripple of conventional direct torque control. In order to test the proposed direct torque control method and compare it with the conventional one, simulations in Matlab/Simulink have been carried out under different working conditions. The simulation results show a significant improvement in the dynamic torque and speed responses compared with the conventional direct torque control method.

1502.05443 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Influence-Optimistic Local Values for Multiagent Planning --- Extended Version

多智能体规划中的影响乐观局部值---扩展版

Frans A. Oliehoek, Matthijs T. J. Spaan, Stefan Witwicki

AI总结 本文提出一种适用于非因子化价值函数的多智能体规划影响乐观上界方法,通过划分子问题并乐观假设系统影响,提供质量保证并改进启发式搜索效果。

Comments Long version of IJCAI 2015 paper (and extended abstract at AAMAS 2015)

详情
AI中文摘要

近年来,多智能体规划在不确定环境下发展出能处理数十甚至百级智能体的方法,但大多数方法要么对问题域做出限制性假设,要么提供无质量保证的近似解。本文引入了一种针对非因子化价值函数的多智能体规划影响乐观上界方法,通过将大问题划分为子问题并在每个子问题中乐观假设系统影响,提供可衡量的质量保证。通过数值比较不同上界,展示了如何在百级智能体问题中获得非平凡保证,即启发式解接近最优。此外,本文还证明这些上界可能提高启发式影响搜索的有效性,并讨论了进一步应用于多智能体规划的潜力。

英文摘要

Recent years have seen the development of methods for multiagent planning under uncertainty that scale to tens or even hundreds of agents. However, most of these methods either make restrictive assumptions on the problem domain, or provide approximate solutions without any guarantees on quality. Methods in the former category typically build on heuristic search using upper bounds on the value function. Unfortunately, no techniques exist to compute such upper bounds for problems with non-factored value functions. To allow for meaningful benchmarking through measurable quality guarantees on a very general class of problems, this paper introduces a family of influence-optimistic upper bounds for factored decentralized partially observable Markov decision processes (Dec-POMDPs) that do not have factored value functions. Intuitively, we derive bounds on very large multiagent planning problems by subdividing them in sub-problems, and at each of these sub-problems making optimistic assumptions with respect to the influence that will be exerted by the rest of the system. We numerically compare the different upper bounds and demonstrate how we can achieve a non-trivial guarantee that a heuristic solution for problems with hundreds of agents is close to optimal. Furthermore, we provide evidence that the upper bounds may improve the effectiveness of heuristic influence search, and discuss further potential applications to multiagent planning.

1507.00567 2026-06-04 eess.SY cs.AI cs.DC cs.LG cs.SE cs.SY 版本更新

Self-Learning Cloud Controllers: Fuzzy Q-Learning for Knowledge Evolution

自学习云控制器:用于知识演化的模糊Q学习

Pooyan Jamshidi, Amir Sharifloo, Claus Pahl, Andreas Metzger, Giovani Estrada

AI总结 本文提出FQL4KE自学习模糊云控制器,通过在运行时学习和修改模糊规则,使用户能通过调整优先级权重来指定控制器,而非复杂适应规则,实验表明其优于传统控制器。

详情
AI中文摘要

云控制器旨在通过在运行时自动扩展计算资源来响应应用需求,以满足性能保证并最小化资源成本。现有云控制器通常依赖预定义的适应规则集,但云服务提供商难以在设计时定义最优或预置的适应规则,因为上层应用是黑箱。因此,适应决策的负担常转嫁给云应用。然而,大多数情况下,应用开发者对云基础设施了解有限。本文提出在运行时学习适应规则。为此,我们引入FQL4KE,一种自学习模糊云控制器。FQL4KE在运行时学习和修改模糊规则。其优势在于设计云控制器时无需依赖仅靠精确的设计时知识,这可能难以获取。FQL4KE使用户能够通过简单调整代表系统目标优先级的权重来指定云控制器,而不是指定复杂的适应规则。FQL4KE的适用性已在云应用框架ElasticBench中得到实验评估。实验结果表明,FQL4KE优于我们之前开发的无学习机制的模糊控制器和原生Azure自动扩展。

英文摘要

Cloud controllers aim at responding to application demands by automatically scaling the compute resources at runtime to meet performance guarantees and minimize resource costs. Existing cloud controllers often resort to scaling strategies that are codified as a set of adaptation rules. However, for a cloud provider, applications running on top of the cloud infrastructure are more or less black-boxes, making it difficult at design time to define optimal or pre-emptive adaptation rules. Thus, the burden of taking adaptation decisions often is delegated to the cloud application. Yet, in most cases, application developers in turn have limited knowledge of the cloud infrastructure. In this paper, we propose learning adaptation rules during runtime. To this end, we introduce FQL4KE, a self-learning fuzzy cloud controller. In particular, FQL4KE learns and modifies fuzzy rules at runtime. The benefit is that for designing cloud controllers, we do not have to rely solely on precise design-time knowledge, which may be difficult to acquire. FQL4KE empowers users to specify cloud controllers by simply adjusting weights representing priorities in system goals instead of specifying complex adaptation rules. The applicability of FQL4KE has been experimentally assessed as part of the cloud application framework ElasticBench. The experimental results indicate that FQL4KE outperforms our previously developed fuzzy controller without learning mechanisms and the native Azure auto-scaling.

1506.02312 2026-06-04 cs.AI cs.LG cs.RO cs.SY eess.SY 版本更新

A Framework for Constrained and Adaptive Behavior-Based Agents

一种用于约束和自适应行为基 agent 的框架

Renato de Pontes Pereira, Paulo Martins Engel

AI总结 本文提出一种框架,通过强化学习节点整合到行为树中,解决约束 agent 的学习能力问题,并展示其与分层强化学习选项的关系,确保嵌套学习节点的收敛性。

Comments 2015; 15 pages

详情
AI中文摘要

行为树常用于建模机器人和游戏中的 agent,其中必须由人类专家设计受约束的行为以确保 agent 在特定感知下执行特定动作链。在这些应用领域,学习是可取的,因为它能为 agent 提供适应和改进与人类和环境交互的能力,但往往被丢弃,因为其不可靠。本文提出一个框架,将强化学习节点作为行为树的一部分,以解决在受约束 agent 中添加学习能力的问题。我们展示了该框架与分层强化学习中选项的关系,确保嵌套学习节点的收敛性,并通过实验证明学习节点不会影响树中其他节点的执行。

英文摘要

Behavior Trees are commonly used to model agents for robotics and games, where constrained behaviors must be designed by human experts in order to guarantee that these agents will execute a specific chain of actions given a specific set of perceptions. In such application areas, learning is a desirable feature to provide agents with the ability to adapt and improve interactions with humans and environment, but often discarded due to its unreliability. In this paper, we propose a framework that uses Reinforcement Learning nodes as part of Behavior Trees to address the problem of adding learning capabilities in constrained agents. We show how this framework relates to Options in Hierarchical Reinforcement Learning, ensuring convergence of nested learning nodes, and we empirically show that the learning nodes do not affect the execution of other nodes in the tree.

1505.07872 2026-06-04 cs.AI cs.SY eess.SY math.OC 版本更新

Towards combinatorial clustering: preliminary research survey

朝着组合聚类:初步研究调查

Mark Sh. Levin

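AI summary This paper examines clustering from a combinatorial viewpoint, covering basic clustering problems, assessment approaches, local quality evaluation, multicriteria optimization, a generalized modular clustering framework, and basic clustering models, with emphasis on formulating clustering as a multicriteria optimization problem.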

Comments 102 pages, 66 figures, 67 tables

详情
AI中文摘要

本文从组合优化角度描述聚类问题,系统回顾了基本聚类问题、对象评估方法、局部和总质量评估、多目标优化、通用模块聚类框架及基本聚类模型。特别关注将聚类问题建模为多目标优化问题。组合优化模型作为辅助问题(如分配、分组、背包问题、多选问题、形态 clique 问题、寻找共识/中位数结构)被使用。数值示例展示了问题定义、解决方法和应用。该材料可作为研究调查、复合模块聚类软件设计的基础、文献参考和教程使用。

英文摘要

The paper describes clustering problems from the combinatorial viewpoint. A brief systemic survey is presented including the following: (i) basic clustering problems (e.g., classification, clustering, sorting, clustering with an order over clusters), (ii) basic approaches to assessment of objects and object proximities (i.e., scales, comparison, aggregation issues), (iii) basic approaches to evaluation of local quality characteristics for clusters and total quality characteristics for clustering solutions, (iv) clustering as a multicriteria optimization problem, (v) a generalized modular clustering framework, and (vi) basic clustering models/methods (e.g., hierarchical clustering, k-means clustering, minimum spanning tree based clustering, clustering as assignment, detection of clique/quasi-clique based clustering, correlation clustering, network communities based clustering). Special attention is given to the formulation of clustering as multicriteria optimization models. Combinatorial optimization models are used as auxiliary problems (e.g., assignment, partitioning, knapsack problem, multiple choice problem, morphological clique problem, searching for consensus/median for structures). Numerical examples illustrate problem formulations, solving methods, and applications. The material can be used as follows: (a) a research survey, (b) a foundation for designing the structure/architecture of composite modular clustering software, (c) a bibliography reference collection, and (d) a tutorial.

1505.04123 2026-06-04 cs.LG cs.AI cs.NA math.NA math.OC 版本更新

Margins, Kernels and Non-linear Smoothed Perceptrons

边距、核与非线性平滑感知机

Aaditya Ramdas, Javier Peña

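AI summary This paper studies the problem of finding a non-linear classification function in an RKHS, derives an accelerated smoothed algorithm with convergence behavior similar to that of the classical kernelized Perceptron, and proves a separation theorem for the case where no separating classifier exists.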

Comments 17 pages, published in the proceedings of the International Conference on Machine Learning, 2014

详情
Journal ref
Ramdas, Aaditya, and Javier Pena. "Margins, kernels and non-linear smoothed perceptrons." Proceedings of the 31st International Conference on Machine Learning (ICML-14). 2014
AI中文摘要

我们关注在RKHS中寻找非线性分类函数的问题,从原问题和对偶问题两个角度出发,特别关注感知机和冯-诺依曼算法的推广。我们将问题转化为在RKHS中最大化正则化归一化硬边距(ρ),并利用表示定理将其转换为与核的(归一化和带符号)Gram矩阵相关的马哈拉诺斯基点积/半范数。我们推导出一种加速平滑算法,具有收敛率为√(log n)/ρ的特性,给定n个可分离点。当不存在此类分类器时,我们证明了RKHS版本的戈尔丹分离定理,并重新解释了负边距。这使得我们能够为原对偶算法提供保证,该算法在存在可行原问题时,可在min{√n/|ρ|, √n/ε}次迭代中找到RKHS中的完美分离器,或在无可行原问题时提供一个对偶ε-不可行性证书。

英文摘要

We focus on the problem of finding a non-linear classification function that lies in a Reproducing Kernel Hilbert Space (RKHS) both from the primal point of view (finding a perfect separator when one exists) and the dual point of view (giving a certificate of non-existence), with special focus on generalizations of two classical schemes: the Perceptron (primal) and Von-Neumann (dual) algorithms. We cast our problem as one of maximizing the regularized normalized hard-margin ($\rho$) in an RKHS and use the Representer Theorem to rephrase it in terms of a Mahalanobis dot-product/semi-norm associated with the kernel's (normalized and signed) Gram matrix. We derive an accelerated smoothed algorithm with a convergence rate of $\tfrac{\sqrt{\log n}}{\rho}$ given $n$ separable points, which is strikingly similar to the classical kernelized Perceptron algorithm whose rate is $\tfrac{1}{\rho^2}$. When no such classifier exists, we prove a version of Gordan's separation theorem for RKHSs, and give a reinterpretation of negative margins. This allows us to give guarantees for a primal-dual algorithm that halts in $\min\{\tfrac{\sqrt{n}}{|\rho|}, \tfrac{\sqrt{n}}{\epsilon}\}$ iterations with a perfect separator in the RKHS if the primal is feasible or a dual $\epsilon$-certificate of near-infeasibility.
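
For reference, the classical kernelized Perceptron that the abstract uses as its point of comparison can be sketched as follows. This is the textbook mistake-driven update, not the accelerated smoothed algorithm of the paper; the RBF kernel, the `gamma` value, the function names, and the XOR-style toy data are all assumptions chosen for illustration.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel; an assumed choice, any PSD kernel would do."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def kernel_perceptron(points, labels, kernel=rbf_kernel, max_epochs=100):
    """Return dual coefficients alpha (mistake counts per training point)."""
    n = len(points)
    alpha = [0] * n
    # Precompute the Gram matrix once; the algorithm only needs kernel values.
    gram = [[kernel(points[i], points[j]) for j in range(n)] for i in range(n)]
    for _ in range(max_epochs):
        mistakes = 0
        for i in range(n):
            score = sum(alpha[j] * labels[j] * gram[j][i] for j in range(n))
            if labels[i] * score <= 0:  # misclassified (or on the boundary)
                alpha[i] += 1
                mistakes += 1
        if mistakes == 0:  # a full pass without mistakes: perfect separator
            break
    return alpha


def predict(x, points, labels, alpha, kernel=rbf_kernel):
    score = sum(a * y * kernel(p, x) for a, y, p in zip(alpha, labels, points))
    return 1 if score > 0 else -1


# Toy data that is not linearly separable in the input space (XOR pattern).
points = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0), (1.0, 0.0)]
labels = [-1, -1, 1, 1]
alpha = kernel_perceptron(points, labels)
predictions = [predict(p, points, labels, alpha) for p in points]
```

The dual form keeps one coefficient per training point, which is why the Gram matrix, rather than an explicit feature map, is the only object the algorithm touches.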

1403.5045 2026-06-04 cs.LG cs.AI cs.SY eess.SY stat.ML 版本更新

Matroid Bandits: Fast Combinatorial Optimization with Learning

Matroid Bandits: 快速组合优化中的学习

Branislav Kveton, Zheng Wen, Azin Ashkan, Hoda Eydgahi, Brian Eriksson

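AI summary This paper introduces matroid bandits, which combine bandits and matroids, proposes the Optimistic Matroid Maximization algorithm for maximizing a stochastic modular function on a matroid, and proves two upper bounds on its regret.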

详情
AI中文摘要

Matroid 是组合优化中独立性的概念,与计算效率密切相关。本文将bandits与matroids结合,提出matroid bandits,目标是学习在matroid上最大化随机模函数。我们提出实用算法Optimistic Matroid Maximization (OMM),并证明两种regret上界,均为亚线性时间,且在其他量上至多线性。gap-dependent上界是紧的,并证明了partition matroid bandit的匹配下界。最后在三个实际问题上评估了该方法,证明其实用性。

英文摘要

A matroid is a notion of independence in combinatorial optimization which is closely related to computational efficiency. In particular, it is well known that the maximum of a constrained modular function can be found greedily if and only if the constraints are associated with a matroid. In this paper, we bring together the ideas of bandits and matroids, and propose a new class of combinatorial bandits, matroid bandits. The objective in these problems is to learn how to maximize a modular function on a matroid. This function is stochastic and initially unknown. We propose a practical algorithm for solving our problem, Optimistic Matroid Maximization (OMM); and prove two upper bounds, gap-dependent and gap-free, on its regret. Both bounds are sublinear in time and at most linear in all other quantities of interest. The gap-dependent upper bound is tight and we prove a matching lower bound on a partition matroid bandit. Finally, we evaluate our method on three real-world problems and show that it is practical.
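
The greedy property mentioned in the abstract, and the optimistic variant built on it, can be sketched roughly as below. This is only an illustration under stated assumptions: the partition-matroid oracle, the item names, the reward model, and the confidence radius `sqrt(2 log t / pulls)` (a standard UCB choice) are not taken from the paper, whose exact constants and initialization may differ.

```python
import math
import random


def greedy_basis(items, weight, is_independent):
    """Greedily add items in decreasing weight while independence holds."""
    basis = []
    for e in sorted(items, key=weight, reverse=True):
        if is_independent(basis + [e]):
            basis.append(e)
    return basis


def partition_oracle(part_of, capacity=1):
    """Independence oracle of a partition matroid (at most `capacity` per part)."""
    def is_independent(subset):
        counts = {}
        for e in subset:
            counts[part_of[e]] = counts.get(part_of[e], 0) + 1
        return all(c <= capacity for c in counts.values())
    return is_independent


def optimistic_matroid_maximization(items, is_independent, sample, rounds):
    """UCB-style sketch: run greedy on optimistic weight estimates each round."""
    pulls = {e: 0 for e in items}
    mean = {e: 0.0 for e in items}
    basis = []
    for t in range(1, rounds + 1):
        def ucb(e):
            if pulls[e] == 0:
                return float("inf")  # force one observation of every item
            return mean[e] + math.sqrt(2.0 * math.log(t) / pulls[e])
        basis = greedy_basis(items, ucb, is_independent)
        for e in basis:  # semi-bandit feedback: observe each chosen item
            w = sample(e)
            pulls[e] += 1
            mean[e] += (w - mean[e]) / pulls[e]
    return basis, mean


# Hypothetical instance: two parts, one item may be chosen from each.
part_of = {"a": 0, "b": 0, "c": 1, "d": 1}
true_mean = {"a": 0.9, "b": 0.2, "c": 0.3, "d": 0.8}
oracle = partition_oracle(part_of)
best = greedy_basis(list(part_of), lambda e: true_mean[e], oracle)

rng = random.Random(0)
learned, estimates = optimistic_matroid_maximization(
    list(part_of), oracle,
    sample=lambda e: 1.0 if rng.random() < true_mean[e] else 0.0,
    rounds=500,
)
```

With known weights, `greedy_basis` returns the optimal basis directly; the bandit loop only replaces the unknown weights with optimistic estimates.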

1502.04266 2026-06-04 eess.SY cs.AI cs.SY math.OC 版本更新

Constrained Nonlinear Model Predictive Control of an MMA Polymerization Process via Evolutionary Optimization

通过进化优化实现MMA聚合过程的约束非线性模型预测控制

Masoud Abbaszadeh, Reza Solgi

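AI summary This paper develops a nonlinear model predictive controller for a batch polymerization process: parameterizing the model along a desired trajectory yields a trajectory-linearized piecewise model whose parameters are identified for an experimental polymerization reactor, a multiple-model adaptive predictive controller is designed for thermal trajectory tracking, and the constrained optimization is solved with genetic algorithms to minimize a DMC cost function.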

Comments 12 pages, 9 figures, 28 references

详情
AI中文摘要

本文开发了一种非线性模型预测控制器用于间歇聚合过程。过程的物理模型沿期望轨迹参数化,从而得到轨迹线性化分段模型(多重线性模型库),并为实验聚合反应器识别参数。然后,为MMA聚合的热轨迹跟踪设计了多模型自适应预测控制器。过程的输入控制信号受到加热器最大热功率的限制。模型预测控制器中的约束优化通过遗传算法在每个采样间隔内求解,以最小化DMC成本函数。

英文摘要

In this work, a nonlinear model predictive controller is developed for a batch polymerization process. The physical model of the process is parameterized along a desired trajectory resulting in a trajectory linearized piecewise model (a multiple linear model bank) and the parameters are identified for an experimental polymerization reactor. Then, a multiple model adaptive predictive controller is designed for thermal trajectory tracking of the MMA polymerization. The input control signal to the process is constrained by the maximum thermal power provided by the heaters. The constrained optimization in the model predictive controller is solved via genetic algorithms to minimize a DMC cost function in each sampling interval.

1502.01321 2026-06-04 math.NA cs.AI cs.NA math-ph math.MP 版本更新

Numerical Solution of Fuzzy Stochastic Differential Equation

模糊随机微分方程的数值解法

Sukanta Nayak, Snehashish Chakraverty

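AI summary This paper proposes an alternative approach to solving uncertain stochastic differential equations, using fuzzy arithmetic to handle FSDEs with fuzzy parameters, and demonstrates exact and Euler-Maruyama approximation methods.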

详情
AI中文摘要

本文提出了一种求解不确定随机微分方程(SDE)的替代方法。这种不确定性源于系统中的参数,这些参数被视为三角模糊数(TFN)。本文采用[2]中提出的模糊算术作为工具来处理模糊随机微分方程(FSDE)。特别地,分析了一组具有模糊参数的伊藤随机微分方程,并展示了具有模糊值的精确解法和欧拉-丸山(Euler-Maruyama)近似方法,还求解了一些标准SDE。

英文摘要

In this paper, an alternative approach to solving uncertain Stochastic Differential Equations (SDEs) is proposed. The uncertainty arises from the parameters of the system, which are treated as Triangular Fuzzy Numbers (TFNs). The fuzzy arithmetic proposed in [2] is used as a tool to handle Fuzzy Stochastic Differential Equations (FSDEs). In particular, a system of Ito stochastic differential equations with fuzzy parameters is analysed. Further, exact and Euler-Maruyama approximation methods with fuzzy values are demonstrated, and some standard SDEs are solved.
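
A minimal sketch of the Euler-Maruyama step combined with a triangular fuzzy parameter is given below. It does not reproduce the fuzzy arithmetic of [2]; instead it assumes the simpler alpha-cut view, in which a triangular fuzzy number `(low, mode, high)` becomes an interval at each level `alpha`, and the SDE is integrated at both interval endpoints with the same Brownian increments. The geometric Brownian motion test equation and all function names are illustrative assumptions.

```python
import math
import random


def alpha_cut(tfn, alpha):
    """Interval of a triangular fuzzy number (low, mode, high) at level alpha."""
    low, mode, high = tfn
    return (low + alpha * (mode - low), high - alpha * (high - mode))


def euler_maruyama(x0, drift, diffusion, dt, increments):
    """Integrate dX = drift(X) dt + diffusion(X) dW along given increments dW."""
    x = x0
    path = [x]
    for dw in increments:
        x = x + drift(x) * dt + diffusion(x) * dw
        path.append(x)
    return path


def fuzzy_gbm_paths(x0, drift_tfn, sigma, alpha, dt, steps, seed=0):
    """Lower/upper sample paths of dX = a X dt + sigma X dW with fuzzy drift a."""
    rng = random.Random(seed)
    increments = [rng.gauss(0.0, math.sqrt(dt)) for _ in range(steps)]
    a_low, a_high = alpha_cut(drift_tfn, alpha)
    lower = euler_maruyama(x0, lambda x: a_low * x, lambda x: sigma * x,
                           dt, increments)
    upper = euler_maruyama(x0, lambda x: a_high * x, lambda x: sigma * x,
                           dt, increments)
    return lower, upper


# Hypothetical drift "about 0.5", between 0.3 and 0.7.
lower, upper = fuzzy_gbm_paths(1.0, (0.3, 0.5, 0.7), sigma=0.2,
                               alpha=0.0, dt=0.01, steps=100)
crisp_lower, crisp_upper = fuzzy_gbm_paths(1.0, (0.3, 0.5, 0.7), sigma=0.2,
                                           alpha=1.0, dt=0.01, steps=100)
```

At `alpha = 1` the interval collapses to the mode, so the two paths coincide and the scheme reduces to ordinary Euler-Maruyama.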

1401.1549 2026-06-04 cs.LG cs.AI cs.SY eess.SY 版本更新

Optimal Demand Response Using Device Based Reinforcement Learning

基于设备的强化学习的最优需求响应

Zheng Wen, Daniel O'Neill, Hamid Reza Maei

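AI summary This paper proposes a novel EMS formulation that casts the demand response problem as a reinforcement learning problem and solves the rescheduling problem approximately by decomposing it over device clusters; it does not require explicitly modeling user dissatisfaction and has computational complexity linear in the number of devices.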

详情
AI中文摘要

本文提出了一种新型EMS框架,将需求响应问题建模为强化学习问题,通过设备集群分解解决调度问题,无需显式建模用户不满,提升计算效率。

英文摘要

Demand response (DR) for residential and small commercial buildings is estimated to account for as much as 65% of the total energy savings potential of DR, and previous work shows that a fully automated Energy Management System (EMS) is a necessary prerequisite to DR in these areas. In this paper, we propose a novel EMS formulation for DR problems in these sectors. Specifically, we formulate a fully automated EMS's rescheduling problem as a reinforcement learning (RL) problem, and argue that this RL problem can be approximately solved by decomposing it over device clusters. Compared with existing formulations, our new formulation (1) does not require explicitly modeling the user's dissatisfaction on job rescheduling, (2) enables the EMS to self-initiate jobs, (3) allows the user to initiate more flexible requests and (4) has a computational complexity linear in the number of devices. We also demonstrate the simulation results of applying Q-learning, one of the most popular and classical RL algorithms, to a representative example.

1410.0083 2026-06-04 eess.SY cs.AI cs.RO cs.SY 版本更新

Integrating active sensing into reactive synthesis with temporal logic constraints under partial observations

将主动感知整合到带有时序逻辑约束的反应合成中,在部分观察下

Jie Fu, Ufuk Topcu

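AI summary This paper proposes online reactive planning with sensing actions in partially observable, dynamic environments, using an active sensing strategy to reduce uncertainty so that the temporal logic specification is satisfied with probability 1.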

Comments 7 pages, 2 figures, submitted to American Control Conference 2015

详情
AI中文摘要

我们引入了在部分可观测和动态环境中,具有时序逻辑约束的系统利用感知动作进行在线反应规划的概念。在动态环境信息不完整的情况下,反应控制器合成相当于求解一个具有部分观察的双人博弈,其计算复杂度高到不切实际。为减轻高计算负担,通过感知动作进行在线重规划,避免在部分观察下求解反应系统的策略。相反,我们只求解一个策略,确保给定的时序逻辑规范在系统拥有完整环境观察时可以满足。此类策略随后被转换为基于观察到的状态序列(交互系统及其环境)做出控制决策的策略。当系统遇到一个信念(即包含系统对当前状态所有可能假设的集合),而基于观察的策略对其未定义时,触发一系列由主动感知策略选择的感知动作,以减少系统信念中的不确定性。我们证明,在对系统传感器集合作一个温和的技术假设的前提下,通过交替使用基于观察的策略和主动感知策略,可以以概率1满足给定的时序逻辑规范。

英文摘要

We introduce the notion of online reactive planning with sensing actions for systems with temporal logic constraints in partially observable and dynamic environments. With incomplete information on the dynamic environment, reactive controller synthesis amounts to solving a two-player game with partial observations, which has impractically high computational complexity. To alleviate the high computational burden, online replanning via sensing actions avoids solving for the strategy of the reactive system under partial observations. Instead, we only solve for a strategy that ensures a given temporal logic specification can be satisfied if the system had complete observations of its environment. Such a strategy is then transformed into one which makes control decisions based on the observed sequence of states (of the interacting system and its environment). When the system encounters a belief (a set including all possible hypotheses the system has for the current state) for which the observation-based strategy is undefined, a sequence of sensing actions, chosen by an active sensing strategy, is triggered to reduce the uncertainty in the system's belief. We show that by alternating between the observation-based strategy and the active sensing strategy, under a mild technical assumption on the set of sensors in the system, the given temporal logic specification can be satisfied with probability 1.

1409.5671 2026-06-04 cs.AI cs.CE cs.LG cs.LO cs.SY eess.SY 版本更新

A Formal Methods Approach to Pattern Synthesis in Reaction Diffusion Systems

反应扩散系统模式合成的正式方法方法

Ebru Aydin Gol, Ezio Bartocci, Calin Belta

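AI summary This paper proposes a pattern detection and generation technique based on a spatial superposition logic, combining model checking with particle swarm optimization to synthesize parameters that lead to desired patterns in reaction-diffusion systems.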

详情
AI中文摘要

我们提出了一种技术,用于检测和生成局部相互作用动态系统网络中的模式。我们的方法核心是一种新的空间叠加逻辑,其语义定义在分区图像的四叉树上。我们展示了该逻辑中的公式可以从正例和负例中高效学习。我们还证明,模式检测,作为模型检验算法实现,对于与学习集不同的测试数据集表现良好。我们为该逻辑定义了定量语义,并将模型检验算法与粒子群优化整合到计算框架中,用于合成导致反应扩散系统中所需模式的参数。

英文摘要

We propose a technique to detect and generate patterns in a network of locally interacting dynamical systems. Central to our approach is a novel spatial superposition logic, whose semantics is defined over the quad-tree of a partitioned image. We show that formulas in this logic can be efficiently learned from positive and negative examples of several types of patterns. We also demonstrate that pattern detection, which is implemented as a model checking algorithm, performs very well for test data sets different from the learning sets. We define a quantitative semantics for the logic and integrate the model checking algorithm with particle swarm optimization in a computational framework for synthesis of parameters leading to desired patterns in reaction-diffusion systems.

1408.5492 2026-06-04 eess.SY cs.AI cs.SY math.OC 版本更新

Towards Decision Support Technology Platform for Modular Systems

面向模块化系统的决策支持技术平台

Mark Sh. Levin

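AI summary This paper outlines a decision support technology platform for modular systems, built from seven basic combinatorial engineering frameworks (system synthesis, system modeling, evaluation, and others) and oriented toward processing composite alternatives in various applied domains.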

Comments 10 pages, 9 figures, 2 tables

详情
AI中文摘要

本文是一篇综述性论文,旨在探讨面向模块化系统的通用决策支持平台技术。该平台由七个基本组合工程框架组成,包括系统综合、系统建模、评估、瓶颈检测、改进/扩展、多阶段设计、组合进化和预测。平台基于决策支持程序(如多准则选择/排序、聚类)和组合优化问题(如背包问题、多选问题、团问题、分配/分配、覆盖、生成树)及其组合。本文描述了:(1)决策支持平台技术的一般方案;(2)模块化(复合)系统(或复合替代方案)的简要描述;(3)从替代方案选择向复合替代方案处理的趋势,对应于分层模块化产品/系统;(4)资源需求方案(即人力、信息-计算机);(5)基本组合工程框架及其在不同领域中的应用。

英文摘要

This methodological survey paper gives an overview of a general decision support platform technology for modular systems (modular/composite alternatives/solutions) in various applied domains. The decision support platform consists of seven basic combinatorial engineering frameworks (system synthesis, system modeling, evaluation, detection of bottleneck, improvement/extension, multistage design, combinatorial evolution and forecasting). The decision support platform is based on decision support procedures (e.g., multicriteria selection/sorting, clustering), combinatorial optimization problems (e.g., knapsack, multiple choice problem, clique, assignment/allocation, covering, spanning trees), and their combinations. The following is described: (1) general scheme of the decision support platform technology; (2) brief descriptions of modular (composite) systems (or composite alternatives); (3) trends in moving from choice/selection of alternatives to processing of composite alternatives which correspond to hierarchical modular products/systems; (4) scheme of resource requirements (i.e., human, information-computer); and (5) basic combinatorial engineering frameworks and their applications in various domains.

1407.2676 2026-06-04 math.OC cs.AI cs.LG cs.SY eess.SY stat.ML 版本更新

A New Optimal Stepsize For Approximate Dynamic Programming

近似动态规划的一种新最优步长

Ilya O. Ryzhov, Peter I. Frazier, Warren B. Powell

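AI summary This paper derives a new optimal stepsize rule that optimizes the prediction error to improve the short-term performance of approximate dynamic programming algorithms; it has a single, relatively insensitive tunable parameter, adapts to the noise level of the problem, and converges faster in numerical experiments.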

Comments Matlab files are included with the paper source

详情
AI中文摘要

近似动态规划(ADP)已在大规模交通运输问题、医疗保健、收益管理以及能源系统等广泛领域中得到了应用。设计有效的ADP算法有许多维度,但一个关键因素是用于更新价值函数近似值的步长规则。许多运筹学应用计算上都很耗费资源,因此快速获得良好结果非常重要。此外,最流行的步长公式使用可调参数,如果调节不当,可能会产生非常差的结果。我们推导出一种新的步长规则,以优化预测误差,从而提高ADP算法的短期性能。仅需一个相对不敏感的可调参数,新的规则能够适应问题中的噪声水平,并在数值实验中产生更快的收敛速度。

英文摘要

Approximate dynamic programming (ADP) has proven itself in a wide range of applications spanning large-scale transportation problems, health care, revenue management, and energy systems. The design of effective ADP algorithms has many dimensions, but one crucial factor is the stepsize rule used to update a value function approximation. Many operations research applications are computationally intensive, and it is important to obtain good results quickly. Furthermore, the most popular stepsize formulas use tunable parameters and can produce very poor results if tuned improperly. We derive a new stepsize rule that optimizes the prediction error in order to improve the short-term performance of an ADP algorithm. With only one, relatively insensitive tunable parameter, the new rule adapts to the level of noise in the problem and produces faster convergence in numerical experiments.

1406.1128 2026-06-04 cs.AI cs.SY eess.SY 版本更新

A self-organizing system for urban traffic control based on predictive interval microscopic model

基于预测区间微观模型的自组织城市交通控制系统

Bartlomiej Placzek

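AI summary This paper proposes a self-organizing traffic signal system based on a predictive interval microscopic model, in which agents predict the effects of their control actions on traffic, improving traffic control performance, particularly for non-uniform traffic streams.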

Comments 29 pages, 8 figures

详情
Journal ref
Engineering Applications of Artificial Intelligence, vol. 34, pp. 75-84, 2014
AI中文摘要

本文介绍了一种用于城市道路网络的自组织交通信号系统。该系统的关键元素是控制路口交通信号的智能体。每个智能体使用区间微观交通模型预测其可能控制动作在短时间范围内的影响。执行的控制动作基于预测的延迟区间进行选择。由于预测结果以区间形式表示,智能体可以识别并暂停那些对交通控制性能效果不确定的控制动作。所提出的交通控制系统在模拟环境中进行了评估。模拟实验表明,所提出的方法在交通控制性能上有所改进,特别是在非均匀交通流中表现更优。

英文摘要

This paper introduces a self-organizing traffic signal system for an urban road network. The key elements of this system are agents that control traffic signals at intersections. Each agent uses an interval microscopic traffic model to predict effects of its possible control actions in a short time horizon. The executed control action is selected on the basis of predicted delay intervals. Since the prediction results are represented by intervals, the agents can recognize and suspend those control actions, whose positive effect on the performance of traffic control is uncertain. Evaluation of the proposed traffic control system was performed in a simulation environment. The simulation experiments have shown that the proposed approach results in an improved performance, particularly for non-uniform traffic streams.

1405.0936 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Design Of Fuzzy Logic Traffic Controller For Isolated Intersections With Emergency Vehicle Priority System Using MATLAB Simulation

基于MATLAB仿真的隔离交叉口模糊逻辑交通控制器设计:含紧急车辆优先系统

Mohit Jha, Shailja Shukla

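AI summary This paper designs a fuzzy logic traffic controller for isolated intersections and simulates it in MATLAB; the controller uses waiting time and queue length as inputs and integrates an emergency vehicle priority system.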

Comments 7 Pages,7 Figure,CSIR Sponsored X Control Instrumentation System Conference 2013; ISBN 978-93-82338-93-2

详情
AI中文摘要

交通是各国面临的首要难题,尤其在大城市中车辆数量迅速增加。本文提出一种基于MATLAB的模糊逻辑交通控制器,用于控制隔离交叉口的交通流。该控制器基于当前绿灯阶段车辆的等待时间和队列长度,以及其他阶段的车辆队列长度,以确定交通灯时长和相位差,从而实现交通流的最优控制。所用模型包含两个通道,每个入口有不同的队列长度和等待时间。通过接近传感器选择最大等待时间和车辆队列长度作为输入,以改善交叉口的交通流。此外,该控制器还集成了紧急车辆警报传感器,可检测救护车、消防车和警车等紧急车辆,并优先通过信号灯。

英文摘要

Traffic congestion is a major problem that every country faces because of the growing number of vehicles throughout the world, especially in large urban areas. Hence the need arises for simulating and optimizing traffic control algorithms to better accommodate this increasing demand. Fuzzy optimization deals with finding the values of input parameters of a complex simulated system that result in the desired output. This paper presents a MATLAB simulation of a fuzzy logic traffic controller for controlling the flow of traffic at isolated intersections. The controller is based on the waiting time and queue length of vehicles in the current green phase and the vehicle queue lengths in the other phases. It controls the traffic light timings and phase difference to ensure a smooth flow of traffic with the least waiting time and queue length. The isolated intersection model used in this paper consists of two lanes in each approach. Each approach has its own queue length and waiting time at the intersection. The maximum waiting time and vehicle queue length, measured by proximity sensors, are selected as inputs to the controller to improve traffic flow at the intersection. An intelligent traffic model and a fuzzy logic traffic controller are developed to evaluate the performance of the controller under different pre-defined conditions for smooth flow of traffic. Additionally, the fuzzy logic traffic controller has emergency vehicle siren sensors that detect the movement of emergency vehicles such as ambulances, fire brigade vehicles, and police vans, and it gives them maximum priority by passing the preferred signal to them.
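
The kind of rule-based inference described in the abstract can be illustrated with a very small fuzzy controller. The sketch below is written in Python rather than MATLAB and is not the authors' design: the two linear membership functions per input, the four rules, the output values in seconds, and the normalization constants `max_queue` and `max_wait` are all hypothetical, and defuzzification uses a simple weighted average.

```python
def clamp01(x):
    return max(0.0, min(1.0, x))


def fuzzify(value, maximum):
    """Return (low, high) membership degrees for a value in [0, maximum]."""
    high = clamp01(value / maximum)
    return 1.0 - high, high


# Hypothetical rule consequents: green-time extension in seconds.
SHORT, MEDIUM, LONG = 0.0, 10.0, 20.0


def green_extension(queue_len, waiting_s, emergency=False,
                    max_queue=20.0, max_wait=120.0):
    """Green-time extension from queue length and waiting time."""
    if emergency:
        return LONG  # emergency vehicles get maximum priority
    q_low, q_high = fuzzify(queue_len, max_queue)
    w_low, w_high = fuzzify(waiting_s, max_wait)
    # Each rule: (firing strength using min as fuzzy AND, crisp consequent).
    rules = [
        (min(q_low, w_low), SHORT),
        (min(q_high, w_low), MEDIUM),
        (min(q_low, w_high), MEDIUM),
        (min(q_high, w_high), LONG),
    ]
    total = sum(strength for strength, _ in rules)
    return sum(strength * out for strength, out in rules) / total


moderate = green_extension(queue_len=10, waiting_s=60)
```

Because the low and high memberships of each input sum to one, at least one rule always fires with strength 0.5 or more, so the weighted average is well defined.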

1402.6763 2026-06-04 math.OC cs.AI cs.NA math.NA 版本更新

Linear Programming for Large-Scale Markov Decision Problems

大规模马尔可夫决策问题的线性规划

Yasin Abbasi-Yadkori, Peter L. Bartlett, Alan Malek

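AI summary This paper addresses large-scale Markov decision processes by competing with a low-dimensional family of policies through a linear programming formulation, using stochastic convex optimization and constraint sampling to obtain performance bounds that do not depend on the size of the state space.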

Comments 27 pages, 3 figures

详情
AI中文摘要

我们考虑如何控制具有大状态空间的马尔可夫决策过程(MDP)以最小化平均成本。由于对于大规模问题无法与最优策略竞争,我们追求更现实的目标,即与低维策略集竞争。我们使用MDP平均成本问题的对偶线性规划形式,其中变量是状态-动作对的平稳分布,并考虑一个低维子集的平稳分布邻域(以状态-动作特征定义)作为比较类。我们提出两种技术,一种基于随机凸优化,另一种基于约束采样。在两种情况下,我们给出的界限表明,我们的算法性能可以逼近比较类中任何策略的最佳性能。最重要的是,这些结果依赖于比较类的大小,而不是状态空间的大小。初步实验显示,所提出算法在排队应用中有效。

英文摘要

We consider the problem of controlling a Markov decision process (MDP) with a large state space, so as to minimize average cost. Since it is intractable to compete with the optimal policy for large scale problems, we pursue the more modest goal of competing with a low-dimensional family of policies. We use the dual linear programming formulation of the MDP average cost problem, in which the variable is a stationary distribution over state-action pairs, and we consider a neighborhood of a low-dimensional subset of the set of stationary distributions (defined in terms of state-action features) as the comparison class. We propose two techniques, one based on stochastic convex optimization, and one based on constraint sampling. In both cases, we give bounds that show that the performance of our algorithms approaches the best achievable by any policy in the comparison class. Most importantly, these results depend on the size of the comparison class, but not on the size of the state space. Preliminary experiments show the effectiveness of the proposed algorithms in a queuing application.
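
For readers unfamiliar with the dual linear program referred to above, its standard textbook form for the average-cost problem is written out below. The symbols are notation assumed here, not taken from the paper: $\mu(s,a)$ is the stationary distribution over state-action pairs, $c(s,a)$ the cost, and $P(s' \mid s,a)$ the transition kernel.

```latex
\begin{aligned}
\min_{\mu}\quad & \sum_{s,a} \mu(s,a)\, c(s,a) \\
\text{s.t.}\quad & \sum_{a} \mu(s',a) \;=\; \sum_{s,a} \mu(s,a)\, P(s' \mid s,a)
                   \qquad \text{for all } s', \\
                 & \sum_{s,a} \mu(s,a) = 1, \qquad \mu(s,a) \ge 0 .
\end{aligned}
```

The program has one variable per state-action pair and one flow constraint per state, which is what makes it intractable for large state spaces and motivates restricting $\mu$ to a low-dimensional, feature-defined subset as the abstract describes.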

1401.1752 2026-06-04 cs.HC cs.AI cs.NA math.NA 版本更新

Speeding up SOR Solvers for Constraint-based GUIs with a Warm-Start Strategy

通过暖启动策略加速基于约束的GUI求解器

Noreen Jamil, Johannes Müller, Christof Lutteroth, Gerald Weber

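AI summary This paper proposes reusing a previous solution to speed up SOR-based (Gauss-Seidel) constraint solvers; experiments show improved solving performance for GUI resizing and constraint-change scenarios.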

详情
AI中文摘要

许多计算机程序具有图形用户界面(GUI),需要良好的布局以高效利用可用屏幕空间。大多数GUI没有固定布局,而是可调整大小并能自我适应。约束是指定可变布局的强大工具:它们用于以通用形式指定布局,而约束求解器用于找到满足的具体系布局,例如特定GUI尺寸。约束求解器每次GUI调整或更改时都需要计算新布局,因此需要高效以确保良好的用户体验。基于Gauss-Seidel算法和逐次超松弛(SOR)的方法之一是约束求解器的途径。我们的观察是,调整或更改后的解决方案在结构上与先前解决方案相似。因此,我们的假设是,如果我们重用先前布局的解决方案来预热新布局的求解,可以提高基于SOR的约束求解器的计算性能。在本文中,我们报告了针对三种常见使用案例(大步缩放、小步缩放和约束变更)的实验,以验证这一假设。在实验中,我们测量了随机生成的GUI布局规范在不同尺寸下的求解时间。对于所有三种情况,我们发现如果使用现有解决方案作为新布局的起始解决方案,性能会得到提升。

英文摘要

Many computer programs have graphical user interfaces (GUIs), which need good layout to make efficient use of the available screen real estate. Most GUIs do not have a fixed layout, but are resizable and able to adapt themselves. Constraints are a powerful tool for specifying adaptable GUI layouts: they are used to specify a layout in a general form, and a constraint solver is used to find a satisfying concrete layout, e.g. for a specific GUI size. The constraint solver has to calculate a new layout every time a GUI is resized or changed, so it needs to be efficient to ensure a good user experience. One approach for constraint solvers is based on the Gauss-Seidel algorithm and successive over-relaxation (SOR). Our observation is that a solution after resizing or changing is similar in structure to a previous solution. Thus, our hypothesis is that we can increase the computational performance of an SOR-based constraint solver if we reuse the solution of a previous layout to warm-start the solving of a new layout. In this paper we report on experiments to test this hypothesis for three common use cases: big-step resizing, small-step resizing and constraint change. In our experiments, we measured the solving time for randomly generated GUI layout specifications of various sizes. For all three cases we found that the performance is improved if an existing solution is used as a starting solution for a new layout.
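
The warm-start idea can be illustrated on a plain square linear system. This is a simplified sketch, not the solver evaluated in the paper: real GUI layout specifications may be non-square and contain inequalities, whereas the tridiagonal matrix, the relaxation factor `omega = 1.2`, the stopping tolerance, and the "resize" modeled as a small change of the right-hand side are assumptions made for the example.

```python
def sor_solve(a, b, x0, omega=1.2, tol=1e-8, max_iters=10_000):
    """Solve a x = b by successive over-relaxation, starting from x0.

    Returns the solution and the number of sweeps performed.
    """
    n = len(b)
    x = list(x0)
    for iteration in range(1, max_iters + 1):
        max_change = 0.0
        for i in range(n):
            sigma = sum(a[i][j] * x[j] for j in range(n) if j != i)
            gauss_seidel = (b[i] - sigma) / a[i][i]
            new = (1.0 - omega) * x[i] + omega * gauss_seidel
            max_change = max(max_change, abs(new - x[i]))
            x[i] = new
        if max_change < tol:
            return x, iteration
    return x, max_iters


# Diagonally dominant tridiagonal system standing in for a layout problem.
n = 5
a = [[4.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b_old = [100.0] * n
b_new = [101.0] * n  # small-step "resize": the right-hand side changes slightly

x_old, _ = sor_solve(a, b_old, [0.0] * n)
x_cold, cold_iters = sor_solve(a, b_new, [0.0] * n)  # start from scratch
x_warm, warm_iters = sor_solve(a, b_new, x_old)      # reuse previous layout
```

Both runs reach the same solution; the warm start simply begins much closer to it, so fewer sweeps are needed before the change per sweep drops below the tolerance.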

1311.4527 2026-06-04 cs.AI cs.DC cs.MA cs.RO cs.SY eess.SY 版本更新

A message-passing algorithm for multi-agent trajectory planning

多智能体轨迹规划的消息传递算法

Jose Bento, Nate Derbinsky, Javier Alonso-Mora, Jonathan Yedidia

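AI summary This paper proposes a new algorithm, based on an improved alternating direction method of multipliers, for computing collision-free global trajectories for multiple agents; it parallelizes naturally and readily incorporates different cost functionals.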

Comments In Advances in Neural Information Processing Systems (NIPS), 2013. Demo video available at http://www.youtube.com/watch?v=yuGCkVT8Bew

详情
AI中文摘要

我们描述了一种新的方法,用于计算具有指定初始和最终配置的p个智能体的无碰撞全局轨迹,基于改进的交替方向乘子法(ADMM)。与现有方法相比,我们的方法具有自然并行化的能力,并且只需少量调整即可整合不同的成本函数。我们应用我们的方法到经典的具有挑战性的实例中,并观察到其计算需求在几个成本函数中随着p良好扩展。我们还展示了我们的算法的一种特化形式可用于通过在速度空间中解决联合优化问题进行局部运动规划。

英文摘要

We describe a novel approach for computing collision-free global trajectories for $p$ agents with specified initial and final configurations, based on an improved version of the alternating direction method of multipliers (ADMM). Compared with existing methods, our approach is naturally parallelizable and allows for incorporating different cost functionals with only minor adjustments. We apply our method to classical challenging instances and observe that its computational requirements scale well with $p$ for several cost functionals. We also show that a specialization of our algorithm can be used for local motion planning by solving the problem of joint optimization in velocity space.

1311.1761 2026-06-04 cs.LG cs.AI cs.NE cs.RO cs.SY eess.SY 版本更新

Exploring Deep and Recurrent Architectures for Optimal Control

探索深度和循环架构以实现最优控制

Sergey Levine

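AI summary This paper explores applying deep and recurrent neural networks to a continuous, high-dimensional locomotion task, trains the controllers with a reinforcement learning algorithm (guided policy search), compares the performance of different architectures, and discusses prospects for deep learning in optimal control.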

Comments Appears in the Neural Information Processing Systems (NIPS 2013) Workshop on Deep Learning

详情
AI中文摘要

复杂的多层神经网络在多个监督任务中取得了最先进的结果。然而,此类多层网络在控制领域的成功应用迄今为止主要局限于控制流水线的感知部分。本文探讨了将深度和循环神经网络应用于连续、高维运动任务,其中网络用于表示控制策略,将系统状态(由关节角度表示)直接映射到每个关节的扭矩。通过使用最近的强化学习算法guided policy search,可以成功训练具有数千参数的神经网络控制器,从而比较各种架构。我们讨论了运动控制任务与先前监督感知任务的区别,展示了比较各种架构的实验结果,并讨论了将深度学习技术应用于最优控制问题的未来方向。

英文摘要

Sophisticated multilayer neural networks have achieved state-of-the-art results on multiple supervised tasks. However, successful applications of such multilayer networks to control have so far been limited largely to the perception portion of the control pipeline. In this paper, we explore the application of deep and recurrent neural networks to a continuous, high-dimensional locomotion task, where the network is used to represent a control policy that maps the state of the system (represented by joint angles) directly to the torques at each joint. By using a recent reinforcement learning algorithm called guided policy search, we can successfully train neural network controllers with thousands of parameters, allowing us to compare a variety of architectures. We discuss the differences between the locomotion control task and previous supervised perception tasks, present experimental results comparing various architectures, and discuss future directions in the application of techniques from deep learning to the problem of optimal control.

1310.7950 2026-06-04 eess.SY cs.AI cs.LO cs.SY 版本更新

Technical Report: Distribution Temporal Logic: Combining Correctness with Quality of Estimation

技术报告:分布时间逻辑:结合正确性与估计质量

Austin Jones, Mac Schwager, Calin Belta

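AI summary This paper presents Distribution Temporal Logic (DTL) for expressing properties involving uncertainty and likelihood in partially observable systems, defines a co-safe formulation with monitoring algorithms, and illustrates the approach with a rescue robotics case study.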

Comments More expanded version of "Distribution Temporal Logic: Combining Correctness with Quality of Estimation" to appear in IEEE CDC 2013

详情
AI中文摘要

我们提出了一种称为分布时间逻辑(DTL)的新时间逻辑,其定义在部分可观测系统的信念状态和隐藏状态的谓词上。DTL能够表达现有逻辑无法描述的涉及不确定性和可能性的属性。定义了DTL的共安全(co-safe)形式,并给出了针对此类公式监控部分可观测马尔可夫决策过程执行的算法。一个救援机器人应用的模拟案例阐述了我们的方法。

英文摘要

We present a new temporal logic called Distribution Temporal Logic (DTL) defined over predicates of belief states and hidden states of partially observable systems. DTL can express properties involving uncertainty and likelihood that cannot be described by existing logics. A co-safe formulation of DTL is defined and algorithmic procedures are given for monitoring executions of a partially observable Markov decision process with respect to such formulae. A simulation case study of a rescue robotics application outlines our approach.

1309.0866 2026-06-04 cs.LO cs.AI cs.LG cs.SY eess.SY 版本更新

On the Robustness of Temporal Properties for Stochastic Models

关于随机模型中时间属性的鲁棒性

Ezio Bartocci, Luca Bortolussi, Laura Nenzi, Guido Sanguinetti

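AI summary This paper studies the robustness of temporal properties for stochastic models, proposes ways to measure robustness, and combines these measures with the satisfaction probability to optimize system design.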

Comments In Proceedings HSB 2013, arXiv:1308.5724

详情
Journal ref
EPTCS 125, 2013, pp. 3-19
AI中文摘要

随机模型如连续时间马尔可夫链(CTMC)和随机混合自动机(SHA)因其能捕捉生物过程中的随机性而成为强大的形式化工具。形式化建模中的经典问题——模型检查问题——即计算特定时间逻辑公式行为在给定随机过程中的概率。然而,除了满足性外,还关注系统维持特定涌现行为的鲁棒性,不受外部噪声或模型参数微小变化的影响。本文提出将鲁棒性概念扩展至随机系统,展示其自然导致鲁棒性分数分布,并通过两个例子说明如何近似分布及其关键指标:平均鲁棒性和条件平均鲁棒性。其次,展示了如何将这些指标与满足概率结合,以解决系统设计问题,即优化随机模型的控制参数以最大化所需规范的鲁棒性。

英文摘要

Stochastic models such as Continuous-Time Markov Chains (CTMC) and Stochastic Hybrid Automata (SHA) are powerful formalisms to model and to reason about the dynamics of biological systems, due to their ability to capture the stochasticity inherent in biological processes. A classical question in formal modelling with clear relevance to biological modelling is the model checking problem, i.e., calculating the probability that a behaviour, expressed for instance in terms of a certain temporal logic formula, may occur in a given stochastic process. However, one may not only be interested in the notion of satisfiability, but also in the capacity of a system to maintain a particular emergent behaviour unaffected by perturbations, caused e.g. by extrinsic noise or by possible small changes in the model parameters. To address this issue, researchers from the verification community have recently proposed several notions of robustness for temporal logic, providing suitable definitions of distance between a trajectory of a (deterministic) dynamical system and the boundaries of the set of trajectories satisfying the property of interest. The contributions of this paper are twofold. First, we extend the notion of robustness to stochastic systems, showing that this naturally leads to a distribution of robustness scores. By discussing two examples, we show how to approximate the distribution of the robustness score and its key indicators: the average robustness and the conditional average robustness. Secondly, we show how to combine these indicators with the satisfaction probability to address the system design problem, where the goal is to optimize some control parameters of a stochastic model in order to maximize robustness of the desired specifications.
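
How a robustness score turns into a distribution for a stochastic model can be sketched with two simple properties. The example below is an assumption-laden illustration, not the paper's construction: it uses the usual quantitative semantics for "always x > c" (minimum margin) and "eventually x > c" (maximum margin), a Gaussian random walk as the stochastic model, and takes the conditional average robustness to mean the average over satisfying trajectories only.

```python
import random


def robustness_always_above(trajectory, threshold):
    """Robustness of 'always x > threshold': the worst-case margin."""
    return min(x - threshold for x in trajectory)


def robustness_eventually_above(trajectory, threshold):
    """Robustness of 'eventually x > threshold': the best-case margin."""
    return max(x - threshold for x in trajectory)


def summarize(scores):
    """Satisfaction probability and robustness indicators from sampled scores."""
    satisfied = [s for s in scores if s > 0]
    return {
        "satisfaction_probability": len(satisfied) / len(scores),
        "average_robustness": sum(scores) / len(scores),
        "conditional_average_robustness":
            sum(satisfied) / len(satisfied) if satisfied else None,
    }


def random_walk(x0, steps, sigma, rng):
    """Hypothetical stochastic model: a Gaussian random walk."""
    path = [x0]
    for _ in range(steps):
        path.append(path[-1] + rng.gauss(0.0, sigma))
    return path


rng = random.Random(42)
scores = [robustness_always_above(random_walk(2.0, 50, 0.2, rng), 0.0)
          for _ in range(1000)]
indicators = summarize(scores)
```

Each sampled trajectory yields one score, so the sample of scores approximates the robustness distribution, and the three indicators are simple statistics of that sample.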

1308.5332 2026-06-04 eess.SY cs.AI cs.SE cs.SY 版本更新

An Integrated Framework for Diagnosis and Prognosis of Hybrid Systems

混合系统诊断与预测的集成框架

Elodie Chanthery, Pauline Ribot

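AI summary This paper presents an integrated theoretical framework for diagnosis and prognosis of hybrid systems, enriching the formalism to follow the evolution of an aging law for each fault, and proposes a methodology for interleaving diagnosis and prognosis in a hybrid framework.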

Comments In Proceedings HAS 2013, arXiv:1308.4904

详情
Journal ref
EPTCS 124, 2013, pp. 14-25
AI中文摘要

复杂系统本质上是混合的:其动态行为兼具连续性和离散性。对于这些系统,维护和修理已成为最终产品总成本的重要组成部分。必须采用高效的诊断和预测技术来检测、隔离和预测故障。本文提出了一个原创的混合系统诊断与预测的理论框架。用于混合诊断的形式化方法被增强,以便能够跟踪每个系统故障的衰老规律。本文提出了一种在混合框架中交织诊断与预测的方法。

英文摘要

Complex systems are naturally hybrid: their dynamic behavior is both continuous and discrete. For these systems, maintenance and repair account for an increasing part of the total cost of the final product. Efficient diagnosis and prognosis techniques have to be adopted to detect, isolate and anticipate faults. This paper presents an original integrated theoretical framework for diagnosis and prognosis of hybrid systems. The formalism used for hybrid diagnosis is enriched in order to be able to follow the evolution of an aging law for each fault of the system. The paper presents a methodology for interleaving diagnosis and prognosis in a hybrid framework.

1306.4635 2026-06-04 cs.AI cs.SY eess.SY 版本更新

Towards Multistage Design of Modular Systems

模块化系统的多阶段设计

Mark Sh. Levin

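AI summary This paper presents a multistage design approach for composite (modular) systems, that is, the design of a system trajectory, consisting of defining time/logical points, modular design at each point, and selection of system solutions.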

Comments 13 pages, 25 figures, 14 tables

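Details
AI abstract (translated from Chinese)

This paper describes the multistage design of composite (modular) systems (i.e., the design of a system trajectory). The design process comprises: (i) defining a set of time/logical points; (ii) modular design at each time/logical point (e.g., based on combinatorial synthesis as hierarchical morphological design or a multiple-choice problem) to obtain several system solutions; (iii) selecting a system solution for each time/logical point, taking into account both its quality and the quality of compatibility between adjacent selected system solutions (combinatorial synthesis is used here as well). The time/logical points studied are mainly based on a time chain. In addition, two more complicated cases are considered: (a) the logical points are based on a tree-like structure, and (b) the logical points are based on a digraph. Numerical examples illustrate the approach.

English abstract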

The paper describes multistage design of composite (modular) systems (i.e., design of a system trajectory). This design process consists of the following: (i) definition of a set of time/logical points; (ii) modular design of the system for each time/logical point (e.g., on the basis of combinatorial synthesis as hierarchical morphological design or a multiple-choice problem) to obtain several system solutions; (iii) selection of the system solution for each time/logical point while taking into account both its quality and the quality of compatibility between neighboring selected system solutions (here, combinatorial synthesis is used as well). The examined time/logical points are mainly based on a time chain. In addition, two more complicated cases are considered: (a) the examined logical points are based on a tree-like structure, and (b) the examined logical points are based on a digraph. Numerical examples illustrate the approach.
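
Step (iii) for the time-chain case can be illustrated with a small Python sketch. It assumes, purely for illustration, that qualities and compatibilities are plain numbers that add up, and it uses ordinary dynamic programming rather than the combinatorial synthesis the paper refers to. The function name and the data layout are hypothetical.

```python
def select_trajectory(quality, compatibility):
    """Pick one candidate solution per stage of a time chain.

    quality[t][i]          -- quality of candidate i at stage t
    compatibility[t][i][j] -- compatibility of candidate i at stage t
                              with candidate j at stage t + 1
    Returns (best_total, chosen_indices), maximising the sum of all
    chosen qualities and all compatibilities between neighbours.
    """
    best = list(quality[0])   # best score of a partial chain ending in i
    back = []                 # back-pointers, one list per later stage
    for t in range(1, len(quality)):
        new_best, pointers = [], []
        for j, q in enumerate(quality[t]):
            # Best predecessor for candidate j at stage t.
            i_best = max(
                range(len(best)),
                key=lambda i: best[i] + compatibility[t - 1][i][j],
            )
            new_best.append(best[i_best] + compatibility[t - 1][i_best][j] + q)
            pointers.append(i_best)
        best = new_best
        back.append(pointers)

    # Trace the best chain backwards from the last stage.
    j = max(range(len(best)), key=lambda i: best[i])
    total = best[j]
    path = [j]
    for pointers in reversed(back):
        j = pointers[j]
        path.append(j)
    path.reverse()
    return total, path


if __name__ == "__main__":
    quality = [[3, 1], [2, 2], [1, 4]]
    compatibility = [
        [[0, 3], [2, 0]],   # stage 0 -> stage 1
        [[1, 0], [0, 2]],   # stage 1 -> stage 2
    ]
    print(select_trajectory(quality, compatibility))
```

The tree-like and digraph cases mentioned in the abstract would need a different traversal and are not covered by this sketch.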

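1306.0128 2026-06-04 cs.AI cs.SY eess.SY Updated version

Towards Detection of Bottlenecks in Modular Systems

Towards the detection of bottlenecks in modular systems

Mark Sh. Levin

AI summary This paper examines basic approaches to detecting bottlenecks in composite modular systems, including traditional quality management methods, selection of critical system elements, identification of composite system faults, and predictive detection, and addresses the associated problems with heuristic schemes.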

Comments 12 pp., tables 4, figures 15

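Details
AI abstract (translated from Chinese)

This paper describes some basic approaches to detecting bottlenecks in composite (modular) systems. The following basic bottleneck detection problems are studied: (1) traditional quality management approaches (the Pareto chart method, multicriteria analysis as the selection of Pareto-efficient points, and/or multicriteria ranking); (2) selection of critical system elements (critical components/modules, critical component interconnections); (3) selection of interconnected system components as composite system faults (via clique-based fusion); (4) critical elements (e.g., nodes) in networks; (5) predictive detection of system bottlenecks (detection of system components based on forecasts of their parameters). Heuristic solving schemes are used. Numerical examples illustrate the approaches.

English abstract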

The paper describes some basic approaches to detection of bottlenecks in composite (modular) systems. The following basic system bottleneck detection problems are examined: (1) traditional quality management approaches (Pareto chart based method, multicriteria analysis as selection of Pareto-efficient points, and/or multicriteria ranking), (2) selection of critical system elements (critical components/modules, critical component interconnections), (3) selection of interconnected system components as composite system faults (via clique-based fusion), (4) critical elements (e.g., nodes) in networks, and (5) predictive detection of system bottlenecks (detection of system components based on forecasting of their parameters). Here, heuristic solving schemes are used. Numerical examples illustrate the approaches.

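1305.4917 2026-06-04 cs.AI cs.SY eess.SY Updated version

Note on Evaluation of Hierarchical Modular Systems

A note on the evaluation of hierarchical modular systems

Mark Sh. Levin

AI summary This paper surveys evaluation approaches for hierarchical composite systems, discussing assessment scales, scale transformation problems, and integration methods, with emphasis on assessing system components and integrating the results.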

Comments 15 pages, 23 figures, 4 tables

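Details
AI abstract (translated from Chinese)

This note gives a brief overview of approaches to the evaluation of hierarchical composite systems. The issues considered include: (i) basic assessment scales (quantitative scale, ordinal scale, multicriteria description, two kinds of poset-like scales); (ii) basic types of scale transformation problems; (iii) basic types of scale integration methods. Evaluation of a modular system is treated as the assessment of system components (and their compatibility) followed by integration of the resulting local estimates into a total system estimate. This process is based on the problems above (i.e., scale transformation and integration). Examples of assessment problems and evaluation approaches are presented (including numerical examples).

English abstract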

This survey note gives a brief systemic view of approaches for evaluation of hierarchical composite (modular) systems. The list of considered issues involves the following: (i) basic assessment scales (quantitative scale, ordinal scale, multicriteria description, two kinds of poset-like scales), (ii) basic types of scale transformation problems, (iii) basic types of scale integration methods. Evaluation of modular systems is considered as assessment of system components (and their compatibility) and integration of the obtained local estimates into the total system estimate(s). This process is based on the above-mentioned problems (i.e., scale transformation and integration). Illustrations of the assessment problems and evaluation approaches are presented (including numerical examples).

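1305.2752 2026-06-04 eess.SY cs.AI cs.SY Updated version

Hybrid fuzzy logic and pid controller based ph neutralization pilot plant

pH neutralization pilot plant based on fuzzy logic and a PID controller

Oumair Naseer, Atif Ali Khan

AI summary This paper presents optimized mathematical modelling and an advanced hybrid controller design for the control design and automation of a pH neutralization pilot plant, addressing current challenges in process control.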

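Details
AI abstract (translated from Chinese)

In the process control industry, the application of control theory has changed rapidly owing to the increased complexity of instrumentation, rising real-time requirements, the minimization of operating costs, and the highly nonlinear nature of chemical processes. Earlier process control technologies based on a single controller are inefficient in terms of signal transmission delay, computational processing power, and signal-to-noise ratio. A hybrid controller with efficient system modelling is essential for meeting the control performance challenges of process control. This paper presents optimized mathematical modelling and an advanced hybrid controller (fuzzy logic and PID) design, together with the practical implementation and validation of a pH neutralization pilot plant. The approach is particularly important for the control design and automation of physico-chemical systems in the process control industry.

English abstract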

The use of control theory within the process control industries has changed rapidly due to the increased complexity of instrumentation, real-time requirements, minimization of operating costs, and the highly nonlinear characteristics of chemical processes. Previously developed process control technologies, which are mostly based on a single controller, are not efficient in terms of signal transmission delays, processing power for computational needs, and signal-to-noise ratio. A hybrid controller with efficient system modelling is essential to cope with the current challenges of process control in terms of control performance. This paper presents optimized mathematical modelling and an advanced hybrid controller (fuzzy logic and PID) design, along with practical implementation and validation on a pH neutralization pilot plant. This procedure is particularly important for control design and automation of physico-chemical systems in the process control industry.

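1304.3088 2026-06-04 eess.SY cs.AI cs.MA cs.SY Updated version

Information and Multi-Sensor Coordination

Information and multi-sensor coordination

Greg Hager, Hugh F. Durrant-Whyte

AI summary This paper studies information fusion and coordinated control in multi-sensor systems, proposes an analysis based on team decision theory, and uses simulation to verify the effectiveness of multi-sensor cooperation under uncertainty.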

Comments Appears in Proceedings of the Second Conference on Uncertainty in Artificial Intelligence (UAI1986)

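Details
AI abstract (translated from Chinese)

The control and integration of distributed multi-sensor perceptual systems is a complex and challenging problem. The observations or opinions of different sensors are often inconsistent and hard to compare, and usually offer only partial views. Sensor information is inherently uncertain, and individual sensors may themselves be in error with respect to the system as a whole. The successful operation of a multi-sensor system must account for this uncertainty and aggregate inconsistent information in an intelligent and robust manner. We regard the sensors of a multi-sensor system as team members or agents that can offer opinions and bargain in group decisions. We analyze the coordination and control of this structure using team decision theory. We present some new analytic results on multi-sensor aggregation and describe in detail a simulation used to investigate our ideas. The simulation provides a basis for analyzing complex agent structures that cooperate under uncertainty. The results are discussed with reference to multi-sensor robot systems, distributed artificial intelligence, and decision making under uncertainty.

English abstract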

The control and integration of distributed, multi-sensor perceptual systems is a complex and challenging problem. The observations or opinions of different sensors are often disparate and incomparable, and are usually only partial views. Sensor information is inherently uncertain, and in addition the individual sensors may themselves be in error with respect to the system as a whole. The successful operation of a multi-sensor system must account for this uncertainty and provide for the aggregation of disparate information in an intelligent and robust manner. We consider the sensors of a multi-sensor system to be members or agents of a team, able to offer opinions and bargain in group decisions. We analyze the coordination and control of this structure using a theory of team decision-making. We present some new analytic results on multi-sensor aggregation and detail a simulation which we use to investigate our ideas. This simulation provides a basis for the analysis of complex agent structures cooperating in the presence of uncertainty. The results of this study are discussed with reference to multi-sensor robot systems, distributed AI, and decision making under uncertainty.

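1304.3075 2026-06-04 cs.AI cs.RO cs.SY eess.SY Updated version

Application of Evidential Reasoning to Helicopter Flight Path Control

Application of evidential reasoning to helicopter flight path control

Shoshana Abel

AI summary This paper presents an approach to expert-system inferencing and knowledge representation for handling uncertainty in a real-time vehicle navigation system. It proposes an innovative evidential reasoning method, the Sum-and-Lattice-Points Method, and covers its mathematical derivation, implementation in a parallel environment, and prototype software development and testing.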

Comments Appears in Proceedings of the Second Conference on Uncertainty in Artificial Intelligence (UAI1986)

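Details
AI abstract (translated from Chinese)

This paper presents a methodology for the research and development of the inferencing and knowledge representation aspects of an expert system for handling uncertainty in a real-time vehicle navigation system. Such a system would be of major benefit to non-terrain-following low-altitude flight systems operating in hostile environments, such as those that NOE helicopters or similar mission craft may encounter. An innovative evidential reasoning method, called the Sum-and-Lattice-Points Method, has been developed. The research and development work in this paper includes the formal mathematical development of the method, its formulation and representation in a parallel environment, prototype software development of the method within an expert system, and initial testing within the vehicle navigation system.

English abstract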

This paper presents a methodology for research and development of the inferencing and knowledge representation aspects of an Expert System approach for performing reasoning under uncertainty in support of a real time vehicle guidance and navigation system. Such a system could be of major benefit for non-terrain following low altitude flight systems operating in foreign hostile environments such as might be experienced by NOE helicopter or similar mission craft. An innovative extension of the evidential reasoning methodology, termed the Sum-and-Lattice-Points Method, has been developed. The research and development effort presented in this paper consists of a formal mathematical development of the Sum-and-Lattice-Points Method, its formulation and representation in a parallel environment, prototype software development of the method within an expert system, and initial testing of the system within the confines of the vehicle guidance system.

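1304.2757 2026-06-04 eess.SY cs.AI cs.SY Updated version

Estimation Procedures for Robust Sensor Control

Estimation methods for robust sensor control

Greg Hager, Max Mintz

AI summary This paper studies robust sensor control in nonlinear measurement systems, evaluating three estimation techniques and discussing their conditions of applicability and how their performance is assessed.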

Comments Appears in Proceedings of the Third Conference on Uncertainty in Artificial Intelligence (UAI1987)

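Details
AI abstract (translated from Chinese)

Many robotic sensor estimation problems can be described as nonlinear measurement systems. These systems are corrupted by noise and may be underdetermined from a single observation. To obtain reliable estimates, the system must choose views that yield an overdetermined system; this is the sensor control problem. Accurate and reliable sensor control requires an estimation procedure that provides both estimates and measures of its own performance. For nonlinear measurement systems, computationally simple closed-form estimation solutions may not exist. Approximation techniques, however, offer viable alternatives. This paper evaluates three estimation techniques: the extended Kalman filter, a discrete Bayes approximation, and an iterative Bayes approximation. We present mathematical results and simulation statistics showing operating conditions under which the extended Kalman filter is unsuitable for sensor control, and discuss issues in using the discrete Bayes approximation.

English abstract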

Many robotic sensor estimation problems can be characterized in terms of nonlinear measurement systems. These systems are contaminated with noise and may be underdetermined from a single observation. In order to get reliable estimation results, the system must choose views which result in an overdetermined system. This is the sensor control problem. Accurate and reliable sensor control requires an estimation procedure which yields both estimates and measures of its own performance. In the case of nonlinear measurement systems, computationally simple closed-form estimation solutions may not exist. However, approximation techniques provide viable alternatives. In this paper, we evaluate three estimation techniques: the extended Kalman filter, a discrete Bayes approximation, and an iterative Bayes approximation. We present mathematical results and simulation statistics illustrating operating conditions where the extended Kalman filter is inappropriate for sensor control, and discuss issues in the use of the discrete Bayes approximation.

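1304.2382 2026-06-04 eess.SY cs.AI cs.SY Updated version

Predicting the Likely Behaviors of Continuous Nonlinear Systems in Equilibrium

Predicting the likely behaviors of continuous nonlinear systems in equilibrium

Alexander Yeh

AI summary This paper proposes the SAB method, which partitions the input space and uses a lower bound on the density to predict how likely the behaviors of a continuous nonlinear system in equilibrium are, without needing to know the input density parameters exactly.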

Comments Appears in Proceedings of the Fourth Conference on Uncertainty in Artificial Intelligence (UAI1988)

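Details
AI abstract (translated from Chinese)

This paper introduces a method for predicting the likely behaviors of continuous nonlinear systems in equilibrium when the input values can vary. The method uses a parameterized equation model and a lower bound on the joint density of the inputs to bound the likelihood that certain behaviors occur, such as the probability that a state variable lies within a given numeric range. Using a lower bound on the density rather than the density itself is desirable because the parameters and shape of the input density are often not exactly known. The new method is called SAB after its basic operations: split the input value space into smaller regions, then bound the possible behaviors of those regions and the probability of being in them. SAB first finds rough bounds and then refines them as more time is given. Compared with other researchers' methods, SAB (1) can find all possible system behaviors and indicate how likely they are, (2) does not approximate the distribution of possible outcomes without some measure of the error magnitude, (3) does not use discretized variable values, which limit the events for which probability bounds can be found, (4) can handle density bounds, and (5) can handle criteria such as two state variables both lying within a numeric range.

English abstract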

This paper introduces a method for predicting the likely behaviors of continuous nonlinear systems in equilibrium in which the input values can vary. The method uses a parameterized equation model and a lower bound on the input joint density to bound the likelihood that some behavior will occur, such as a state variable being inside a given numeric range. Using a bound on the density instead of the density itself is desirable because often the input density's parameters and shape are not exactly known. The new method is called SAB after its basic operations: split the input value space into smaller regions, and then bound those regions' possible behaviors and the probability of being in them. SAB finds rough bounds at first, and then refines them as more time is given. In contrast to other researchers' methods, SAB (1) can find all the possible system behaviors and indicate how likely they are, (2) does not approximate the distribution of possible outcomes without some measure of the error magnitude, (3) does not use discretized variable values, which limit the events one can find probability bounds for, (4) can handle density bounds, and (5) can handle such criteria as two state variables both being inside a numeric range.
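
The split-and-bound idea described above can be sketched in Python for a single input. This is only an illustration under strong assumptions that are not from the paper: the model `f` is nondecreasing on the input interval, so its range on a region is given by its endpoint values, and `density_lower_on(a, b)` is a hypothetical callback returning a number that the true input density is known to exceed everywhere on `[a, b]`.

```python
def sab_lower_bound(f, density_lower_on, lo, hi, y_min, y_max, depth):
    """Lower-bound P(y_min <= f(x) <= y_max) for x in [lo, hi].

    Assumes f is nondecreasing on [lo, hi] (illustrative assumption),
    so f(lo) and f(hi) bound the behaviour of the whole region.
    """
    f_lo, f_hi = f(lo), f(hi)
    if y_min <= f_lo and f_hi <= y_max:
        # Every point of the region shows the behaviour: count its
        # probability mass, bounded from below via the density bound.
        return (hi - lo) * density_lower_on(lo, hi)
    if f_hi < y_min or f_lo > y_max:
        return 0.0      # no point of the region shows the behaviour
    if depth == 0:
        return 0.0      # undecided region: adds nothing to a lower bound
    mid = (lo + hi) / 2.0   # split, then bound each half separately
    return (
        sab_lower_bound(f, density_lower_on, lo, mid, y_min, y_max, depth - 1)
        + sab_lower_bound(f, density_lower_on, mid, hi, y_min, y_max, depth - 1)
    )


if __name__ == "__main__":
    square = lambda x: x * x
    density_floor = lambda a, b: 0.4   # assumed: density >= 0.4 on [0, 2]
    for depth in range(6):
        print(depth, sab_lower_bound(square, density_floor, 0.0, 2.0, 0.5, 3.0, depth))
```

Increasing `depth` plays the role of "more time": undecided regions are split further, so the bound can only stay the same or tighten.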