arXivDaily: arXiv daily academic digest, updated Monday through Friday
2601.03048 2026-05-28 cs.CV cs.AI cs.CC

On the Intrinsic Limits of Transformer Image Embeddings in Non-Solvable Spatial Reasoning

Siyi Lyu, Quan Liu, Feng Yan

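AI summary By formalizing spatial understanding as a group homomorphism problem, the paper shows that constant-depth Transformers, limited to the complexity class TC⁰, cannot capture the spatial structure of non-solvable groups such as SO(3) in a single forward pass (assuming TC⁰ ⊊ NC¹).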

English abstract

Vision Transformers (ViTs) excel in semantic recognition but exhibit systematic failures in spatial reasoning tasks such as mental rotation. While often attributed to data scale, this work argues that the limitation arises from the intrinsic circuit complexity of the architecture. By formalizing spatial understanding as learning a Group Homomorphism Problem -- where latent embeddings preserve the algebraic structure of physical transformations acting on images -- we identify a fundamental computational bottleneck. Specifically, for non-solvable groups (e.g., $\mathrm{SO}(3)$), maintaining such structure-preserving embeddings is lower-bounded by the Word Problem, which is $\mathsf{NC^1}$-complete. In contrast, constant-depth ViTs with polynomial precision are strictly bounded by the complexity class $\mathsf{TC^0}$. Under the standard conjecture $\mathsf{TC^0} \subsetneq \mathsf{NC^1}$, a complexity boundary emerges: constant-depth architectures lack the logical depth required to capture non-solvable spatial structures in a single forward pass. To empirically validate this theoretical gap, we propose the Latent Space Algebra (LSA) benchmark, which reveals a significant degradation in ViT representations as the compositional depth of non-solvable tasks increases.
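
To make the Word Problem concrete, here is a minimal Python sketch. It uses the alternating group A5, a standard finite non-solvable group, as a stand-in for SO(3). The choice of A5, the two generators, and all function names are illustrative assumptions; none of this is taken from the paper's LSA benchmark.

```python
from functools import reduce

# Permutations of {0,...,4} are stored as tuples: p[i] is the image of i.
IDENTITY = (0, 1, 2, 3, 4)

def compose(p, q):
    """Return the product p*q, meaning: apply q first, then p."""
    return tuple(p[q[i]] for i in range(len(q)))

def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)

def evaluate_word(word):
    """Multiply a sequence of group elements into a single element."""
    return reduce(compose, word, IDENTITY)

def is_identity_word(word):
    """The word problem: does this sequence of elements multiply to the identity?"""
    return evaluate_word(word) == IDENTITY

# Two 3-cycles that generate A5 (hypothetical choice for illustration).
a = (1, 2, 0, 3, 4)  # 0 -> 1 -> 2 -> 0
b = (0, 1, 3, 4, 2)  # 2 -> 3 -> 4 -> 2
```

The generators do not commute, so the commutator word `[a, b, inverse(a), inverse(b)]` does not reduce to the identity, while `[a, a, a]` does. Deciding this for arbitrarily long words is the sequential composition task the abstract refers to.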

2601.01616 2026-05-28 cs.LG eess.SP

Real Time NILM Based Power Monitoring of Identical Induction Motors Representing Cutting Machines in Textile Industry

Md Istiauk Hossain Rifat, Moin Khan, Zohara Kamal, Md Borhan Uddin Khan, Mohammad Zunaed

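AI summary To address outdated energy monitoring in the textile industry, the paper proposes a real-time non-intrusive load monitoring (NILM) framework that uses the MATNILM model to disaggregate the power of identical induction motors. It confirms that real-time monitoring is feasible and notes that disaggregation is difficult when several identical machines run simultaneously.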

Comments 9 pages, 9 figures

English abstract

The textile industry in Bangladesh is one of the most energy-intensive sectors, yet its monitoring practices remain largely outdated, resulting in inefficient power usage and high operational costs. To address this, we propose a real-time Non-Intrusive Load Monitoring (NILM)-based framework tailored for industrial applications, with a focus on identical motor-driven loads representing textile cutting machines. A hardware setup comprising voltage and current sensors, Arduino Mega and ESP8266 was developed to capture aggregate and individual load data, which was stored and processed on cloud platforms. A new dataset was created from three identical induction motors and auxiliary loads, totaling over 180,000 samples, to evaluate the state-of-the-art MATNILM model under challenging industrial conditions. Results indicate that while aggregate energy estimation was reasonably accurate, per-appliance disaggregation faced difficulties, particularly when multiple identical machines operated simultaneously. Despite these challenges, the integrated system demonstrated practical real-time monitoring with remote accessibility through the Blynk application. This work highlights both the potential and limitations of NILM in industrial contexts, offering insights into future improvements such as higher-frequency data collection, larger-scale datasets and advanced deep learning approaches for handling identical loads.

2504.10079 2026-05-28 cs.CV

Hierarchical Relation-augmented Representation Generalization for Few-shot Action Recognition

Hongyu Qu, Ling Xing, Jiachao Zhang, Rui Yan, Yazhou Yao, Xiangbo Shu

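AI summary Proposes the HR2G-shot framework, which unifies inter-frame, inter-video, and inter-task relation modeling to learn task-specific temporal patterns from a holistic view and improve few-shot action recognition.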

English abstract

Few-shot action recognition (FSAR) aims to recognize novel action categories with few exemplars. Existing methods typically learn frame-level representations for each video by designing inter-frame temporal modeling strategies or inter-video interaction at the coarse video-level granularity. However, they treat each episode task in isolation and neglect fine-grained temporal relation modeling between videos, thus failing to capture shared fine-grained temporal patterns across videos and to reuse temporal knowledge from historical tasks. In light of this, we propose HR2G-shot, a Hierarchical Relation-augmented Representation Generalization framework for FSAR, which unifies three types of relation modeling (inter-frame, inter-video, and inter-task) to learn task-specific temporal patterns from a holistic view. Going beyond inter-frame temporal interactions, we further devise two components to explore inter-video and inter-task relationships, respectively: i) Inter-video Semantic Correlation (ISC) performs cross-video frame-level interactions in a fine-grained manner, thereby capturing task-specific query features and enhancing both intra-class consistency and inter-class separability; ii) Inter-task Knowledge Transfer (IKT) retrieves and aggregates relevant temporal knowledge from a bank that stores diverse temporal patterns from historical episode tasks. Extensive experiments on five benchmarks show that HR2G-shot outperforms current leading FSAR methods.

2601.00501 2026-05-28 cs.CV

CPPO: Contrastive Perception Policy Optimization for VLM Agents

Ahmad Rezaei, Mohsen Gholami, Saeed Ranjbar Alvar, Kevin Cannons, Mohammad Asiful Hossain, Zhou Weimin, Yong Zhang, Mohammad Akbari

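AI summary Proposes CPPO, a self-supervised contrastive perception policy optimization method that strengthens the visual grounding of vision-language models through a contrastive perception loss, needs no extra models or annotations, and outperforms existing methods on perception-critical tasks.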

English abstract

We introduce CPPO, a Contrastive Perception Policy Optimization method for finetuning vision-language models (VLMs). Reliable perception is a core requirement for VLM-based agents that must reason and act in open-ended environments: faulty visual grounding cascades directly into faulty actions, hallucinated tool calls, and unsafe decisions. While reinforcement learning (RL) has significantly improved reasoning in language models, extending these advances to multimodal agents requires improving both perception and reasoning. Prior works address this challenge mainly through explicit perception rewards, which often require extra LLM judges, ground-truth annotations, or forced separation of perception from reasoning. CPPO addresses this limitation in a self-supervised manner by extending the RL objective with a Contrastive Perception Loss (CPL) that provides a direct learning signal for visual grounding. The contrastive objective encourages the model to become more sensitive to input visual information. To apply this signal effectively, CPPO identifies perception tokens using an entropy-shift mechanism in the model's output distributions under perturbed images and applies the contrastive loss selectively to those tokens during training. Experiments show that CPPO surpasses prior methods while avoiding extra models, making training more efficient and scalable, and yielding policies that are better suited to perception-critical agentic tasks.
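
The entropy-shift idea can be sketched in a few lines of Python. This is a rough illustration only: it assumes that per-token output distributions are available as plain lists, that a "shift" means an increase in entropy when the image is perturbed, and that a fixed threshold selects tokens. The function names and the threshold value are hypothetical, and the paper's actual criterion may differ.

```python
import math

def entropy(probs):
    """Shannon entropy (in nats) of one output distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def perception_token_mask(clean_dists, perturbed_dists, threshold=0.5):
    """Flag tokens whose entropy rises by more than `threshold` nats
    when the input image is perturbed (assumed selection rule)."""
    mask = []
    for clean, perturbed in zip(clean_dists, perturbed_dists):
        shift = entropy(perturbed) - entropy(clean)
        mask.append(shift > threshold)
    return mask

# Token 0 is confident with the clean image and uncertain without it;
# token 1 is equally uncertain either way.
clean = [[0.97, 0.01, 0.01, 0.01], [0.25, 0.25, 0.25, 0.25]]
perturbed = [[0.25, 0.25, 0.25, 0.25], [0.25, 0.25, 0.25, 0.25]]
mask = perception_token_mask(clean, perturbed)
```

Under these assumptions only the first token is flagged, so a contrastive loss would be applied to it and skipped for the second.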

2512.23959 2026-05-28 cs.CL cs.AI cs.LG

HGMEM: Hypergraph-based Working Memory to Improve Multi-step RAG for Long-Context Complex Relational Modeling

Chulun Zhou, Chunkang Zhang, Guoxin Yu, Fandong Meng, Jie Zhou, Wai Lam, Mo Yu

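AI summary Proposes HGMem, a hypergraph-based working memory in which hyperedges represent memory units and higher-order interactions form progressively, improving global understanding and complex reasoning in multi-step RAG.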

Comments ICML 2026; Code released at https://github.com/Encyclomen/HGMem

English abstract

Multi-step retrieval-augmented generation (RAG) has become a widely adopted strategy for enhancing large language models (LLMs) on tasks that demand global comprehension and intensive reasoning. Although many RAG systems incorporate a working memory to consolidate information, existing designs primarily function as a passive storage for isolated facts. This static nature overlooks crucial high-order correlations among primitive facts, thereby limiting models' capacity for multi-step reasoning and resulting in fragmented reasoning and weak global sense-making within extended contexts. We introduce HGMem, a hypergraph-based working memory system, extending the concept of memory beyond simple storage into a dynamic, expressive structure for complex reasoning and global understanding. In our approach, memory is represented as a hypergraph where hyperedges correspond to distinct memory units, enabling the progressive formation of high-order interactions within memory. This mechanism connects facts and thoughts around the focal problem, evolving the memory into an integrated and situated knowledge structure that provides strong propositions for deeper reasoning. We evaluate HGMem on several challenging global sense-making benchmarks. Extensive experiments and in-depth analyses demonstrate that our method consistently improves multi-step RAG and substantially outperforms strong baseline systems across diverse datasets.
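
A memory whose units are hyperedges could look roughly like the following Python sketch. The class name, method names, and the merge rule (taking the union of the linked entities) are hypothetical and chosen for illustration; they are not taken from the released HGMem code.

```python
from dataclasses import dataclass, field

@dataclass
class HypergraphMemory:
    # unit id -> (set of entities the unit links, free-text note)
    hyperedges: dict = field(default_factory=dict)
    next_id: int = 0

    def add_unit(self, entities, note):
        """Store one memory unit as a hyperedge over any number of entities."""
        unit_id = self.next_id
        self.hyperedges[unit_id] = (frozenset(entities), note)
        self.next_id += 1
        return unit_id

    def merge_units(self, unit_ids, note):
        """Form a higher-order unit spanning the union of several units."""
        merged = frozenset().union(*(self.hyperedges[u][0] for u in unit_ids))
        return self.add_unit(merged, note)

    def units_touching(self, entity):
        """Return the ids of all memory units that mention `entity`."""
        return sorted(u for u, (ents, _) in self.hyperedges.items() if entity in ents)

memory = HypergraphMemory()
u0 = memory.add_unit({"Alice", "Bob"}, "Alice mentors Bob")
u1 = memory.add_unit({"Bob", "Carol"}, "Bob reports to Carol")
u2 = memory.merge_units([u0, u1], "Alice is indirectly linked to Carol via Bob")
```

Unlike an ordinary graph edge, a hyperedge can connect more than two entities at once, which is what lets a merged unit such as `u2` represent a relation among three facts.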

2512.22777 2026-05-28 cs.LG cs.AI

Adapting, Fast and Slow: On Few-Shot Transportability of Compositions

Kasra Jalaldoust, Elias Bareinboim

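AI summary Studies, in the few-shot setting, how causal transportability theory allows causal mechanisms learned in source domains to be composed into a target-domain predictor. It distinguishes module transportability from circuit transportability and proposes a gradient-based relaxation of circuit search that yields fast or slow adaptation.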

English abstract

Generalization across domains requires stable structure that links the source and target distributions. Building on causal transportability theory, we study a sequential prediction setting in which the target predictor can be represented as a circuit composed of causal mechanisms that are learnable from source data. We introduce two classes of transportability. Module transportability captures the atomic case, where the target predictor is given by a mechanism learnable from a single source domain. Circuit transportability generalizes this idea to target predictors obtained by composing several modules learned from source data, enabling zero-shot prediction even when no source mechanism directly predicts the target label. We study these classes of circuits under increasingly relaxed assumptions. First, we provide conditions under which the relevant circuits can be learned from source data alone, given causal knowledge about the source and target domains. We then relax these structural assumptions by allowing limited data from the target domain. In particular, we develop a supervised domain adaptation scheme that learns circuits without requiring explicit causal structure. The resulting few-shot guarantees tie the achievable error to the size of the smallest target circuit composable from modules learned from source data. Finally, we propose a gradient-based relaxation of the symbolic circuit search and evaluate it empirically, showing that it qualitatively tracks the predicted regimes of fast adaptation -- with and without process supervision over intermediate positions -- and slow adaptation when no source mechanism matches.

2501.09934 2026-05-28 cs.LG cs.AI

HEART: Achieving Timely Multi-Model Training for Vehicle-Edge-Cloud-Integrated Hierarchical Federated Learning

Xiaohong Yang, Minghui Liwang, Xianbin Wang, Zhipeng Cheng, Seyyedali Hosseinalipour, Huaiyu Dai, Zhenzhen Jiao

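AI summary To address model obsolescence, inefficient data utilization, and unbalanced resource allocation in multi-model training for vehicle-edge-cloud hierarchical federated learning, the paper proposes the HEART framework, which uses a hybrid synchronous-asynchronous aggregation rule and a two-stage optimization algorithm (improved PSO plus GA, followed by a greedy algorithm) to minimize global training latency while balancing tasks.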

Comments Accepted by IEEE Transactions on Cloud Computing (22 pages, 7 figures)

English abstract

The rapid growth of AI-enabled Internet of Vehicles (IoV) calls for efficient Machine Learning (ML) solutions that can handle high vehicular mobility and decentralized data. This has motivated the emergence of Hierarchical Federated Learning over vehicle-edge-cloud architectures (VEC-HFL). Nevertheless, one aspect that is underexplored in the literature on VEC-HFL is that vehicles often need to execute multiple ML tasks simultaneously, and this multi-model training environment introduces crucial challenges. First, improper aggregation rules can lead to model obsolescence and prolonged training times. Second, vehicular mobility may result in inefficient data utilization by preventing the vehicles from returning their models to the network edge. Third, achieving a balanced resource allocation across diverse tasks becomes of paramount importance, as it strongly affects the effectiveness of collaborative training. We take one of the first steps towards addressing these challenges by proposing a framework for multi-model training in dynamic VEC-HFL with the goal of minimizing global training latency while ensuring balanced training across various tasks, a problem that turns out to be NP-hard. To facilitate timely model training, we introduce a hybrid synchronous-asynchronous aggregation rule. Building on this, we present a novel method called Hybrid Evolutionary And gReedy allocaTion (HEART). The framework operates in two stages: first, it achieves balanced task scheduling through a hybrid heuristic approach that combines improved Particle Swarm Optimization (PSO) and Genetic Algorithms (GA); second, it employs a low-complexity greedy algorithm to determine the training priority of assigned tasks on vehicles. Experiments on real-world datasets demonstrate the superiority of HEART over existing methods.

2512.18566 2026-05-28 cs.LG cs.SY eess.SY q-bio.NC

Comparing Dynamical Models Through Diffeomorphic Vector Field Alignment

Ruiqi Chen, Giacomo Vedovati, Todd Braver, ShiNung Ching

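AI summary Proposes the DFORM framework, which aligns the trajectories of two dynamical systems through a nonlinear coordinate transformation, assesses topological equivalence, and locates low-dimensional dynamical motifs in high-dimensional models.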

Comments 57 pages, 18 figures. For associated code, see https://github.com/rq-Chen/DFORM_stable

Journal ref Neural Computation (2026) 38 (6): 1006-1061

English abstract

Dynamical systems models such as recurrent neural networks (RNNs) are increasingly popular in theoretical neuroscience for hypothesis-generation and data analysis. Evaluating the dynamics in such models is key to understanding their learned generative mechanisms. However, such evaluation is impeded by two major challenges: First, comparison of learned dynamics across models is difficult because there is no enforced equivalence of their coordinate systems. Second, identification of mechanistically important low-dimensional motifs (e.g., limit sets) is intractable in high-dimensional nonlinear models such as RNNs. Here, we propose a comprehensive framework to address these two issues, termed Diffeomorphic vector field alignment FOR learned Models (DFORM). DFORM learns a nonlinear coordinate transformation between the state spaces of two dynamical systems, which aligns their trajectories in a maximally one-to-one manner. In so doing, DFORM enables an assessment of whether two models exhibit topological equivalence, i.e., similar mechanisms despite differences in coordinate systems. A byproduct of this method is a means to locate dynamical motifs on low-dimensional manifolds embedded within higher-dimensional systems. We verified DFORM's ability to identify linear and nonlinear coordinate transformations using canonical topologically equivalent systems, RNNs, and systems related by nonlinear flows. DFORM was also shown to provide a quantification of similarity between topologically distinct systems. We then demonstrated that DFORM can locate important dynamical motifs including invariant manifolds and saddle limit sets within high-dimensional models. Finally, using a set of RNN models trained on human functional MRI (fMRI) recordings, we illustrated that DFORM can identify limit cycles from high-dimensional data-driven models, which agreed well with prior numerical analysis.

2512.17375 2026-05-28 cs.LG cs.CL cs.CR

AdvJudge-Zero: Binary Decision Flips in LLM-as-a-Judge via Adversarial Control Tokens

Tung-Ling Li, Yuhao Wu, Hongliang Liu

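AI summary Proposes AdvJudge-Zero, which samples low-perplexity tokens from the judge model's own distribution to flip an LLM judge's binary verdict from "No" to "Yes" without gradient-based optimization, and uses the discovered token pool to build a defense that makes the judge more robust.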

English abstract

LLM-as-a-Judge systems supply the reward signal in modern RLHF and RLVR pipelines, but their binary verdict reduces to a single linear readout F_gap on one hidden state. We show this readout is shallow enough that short, low-perplexity tokens flip the verdict from "No" to "Yes". These tokens are sampled from the judge's own next-token distribution at the response position, with no manual seed set and no gradient-based optimization. Our procedure, AdvJudge-Zero, reaches >90% ensemble false-positive rate on 22 of 24 (model, dataset) cells across six Qwen, Llama, and Gemma judges, versus 54-72% for the prior curated 10-token benchmark, and the discovered surface transfers cross-format to a 70B scalar reward model. The same discovered pool enables a defense: a LoRA fine-tune stratified by a 9-class mechanism taxonomy hardens cross-family generalization where naive sampling on the same pool fails, with mechanism breadth rather than pool size carrying the gain. Under GRPO training, the hardened judge eliminates the reward-collapse failures (false-positive spikes and length collapse) we observe in the unhardened baseline on both MATH and GSM8K at ten seeds per condition. The discovered pool, the mechanism taxonomy, and per-prompt flip records will be released under responsible disclosure.
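
As a toy illustration of why a single linear readout is fragile, the Python sketch below assumes that F_gap is the difference between a "Yes" score and a "No" score computed from one hidden state, and that the verdict is "Yes" when the gap is positive. That reading of F_gap, the weight vectors, and the numbers are assumptions made for illustration, not details from the paper.

```python
def f_gap(hidden, w_yes, w_no):
    """Linear readout: (w_yes - w_no) . hidden (assumed form of F_gap)."""
    return sum((wy - wn) * h for wy, wn, h in zip(w_yes, w_no, hidden))

def verdict(hidden, w_yes, w_no):
    """Binary verdict decided purely by the sign of the gap."""
    return "Yes" if f_gap(hidden, w_yes, w_no) > 0 else "No"

# Hypothetical two-dimensional example.
w_yes = [1.0, 0.0]
w_no = [0.0, 1.0]
honest_state = [0.2, 0.8]    # gap is negative: verdict "No"
shifted_state = [0.9, 0.8]   # one coordinate moved: verdict "Yes"
```

Because the decision depends only on the sign of one dot product, any appended tokens that move the hidden state along the direction `w_yes - w_no` far enough will flip the verdict, regardless of the response being judged.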

2505.17720 2026-05-28 cs.LG physics.ao-ph

PEAR: Equal Area Weather Forecasting on the Sphere

Hampus Linander, Tage Tykesson, Pietro Rosso, Christoffer Petersson, Daniel Persson, Jan E. Gerken

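AI summary To address the excessive polar resolution of equiangular spherical grids, the paper proposes PEAR, a Transformer model on the HEALPix equal-area grid that improves global weather forecasting performance without computational overhead.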

Comments Extended version of manuscript published in the AI for Science workshop (NeurIPS 2025), 11 pages, 15 pages supplemental

English abstract

Artificial intelligence is rapidly reshaping the natural sciences, with weather forecasting emerging as a flagship AI4Science application where machine learning models can now rival and even surpass traditional numerical simulations. Following the success of the landmark models Pangu Weather and Graphcast, outperforming traditional numerical methods for global medium-range forecasting, many novel data-driven methods have emerged. A common limitation shared by many of these models is their reliance on an equiangular discretization of the sphere which suffers from a much finer grid at the poles than around the equator. In contrast, in the Hierarchical Equal Area iso-Latitude Pixelization (HEALPix) of the sphere, each pixel covers the same surface area, removing unphysical biases. Motivated by a growing support for this grid in meteorology and climate sciences, we propose to perform weather forecasting with deep learning models which natively operate on the HEALPix grid. To this end, we introduce Pangu Equal ARea (PEAR), a transformer-based weather forecasting model which operates directly on HEALPix-features and outperforms the corresponding model on an equiangular grid, and other baselines, without any computational overhead. Furthermore, we perform numerical experiments on the equivariance properties of our setup and verify the performance of PEAR on climate model emulation.
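
The contrast between the two grids can be checked with a short calculation. The Python sketch below uses two standard facts: a HEALPix grid with resolution parameter nside has 12·nside² pixels of equal area, and a latitude-longitude cell on the unit sphere has area Δλ·(sin φ_north − sin φ_south). The 1° cell size and the function names are illustrative choices, not values from the paper.

```python
import math

def healpix_pixel_area(nside):
    """Area of every HEALPix pixel on the unit sphere (all pixels are equal)."""
    npix = 12 * nside * nside
    return 4.0 * math.pi / npix

def equiangular_cell_area(lat_south_deg, lat_north_deg, dlon_deg):
    """Area of one latitude-longitude cell on the unit sphere."""
    dlon = math.radians(dlon_deg)
    return dlon * (math.sin(math.radians(lat_north_deg))
                   - math.sin(math.radians(lat_south_deg)))

# Compare a 1-degree cell at the equator with one touching the pole.
equator_cell = equiangular_cell_area(0.0, 1.0, 1.0)
polar_cell = equiangular_cell_area(89.0, 90.0, 1.0)
ratio = equator_cell / polar_cell
```

On a 1° equiangular grid the equatorial cell is more than 100 times larger than the polar cell, whereas `healpix_pixel_area` returns the same value for every pixel by construction.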

2512.16483 2026-05-28 cs.CV

FasterVAR: Plug-and-Play Acceleration for Visual Autoregressive Models

Senmao Li, Kai Wang, Salman Khan, Fahad Shahbaz Khan, Jian Yang, Yaxing Wang

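AI summary To address the high computational cost of VAR models at large-scale steps, the paper proposes FasterVAR, a stage-aware plug-and-play acceleration framework that keeps the critical early steps intact and prunes or approximates the later detail-refining steps, reaching up to 3.4x speedup with almost no performance loss.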

Comments Accepted at ICML 2026

English abstract

Visual Autoregressive (VAR) modeling departs from the next-token prediction paradigm of traditional Autoregressive (AR) models through next-scale prediction, enabling high-quality image generation. However, the VAR paradigm suffers from sharply increased computational complexity and running time at large-scale steps. Although existing acceleration methods reduce runtime for large-scale steps, they rely on manual step selection and overlook the varying importance of different stages in the generation process. To address this challenge, we present FasterVAR, a systematic study and plug-and-play acceleration framework for VAR models. Our analysis shows that early steps are critical for preserving semantic and structural consistency and should remain intact, while later steps mainly refine details and can be pruned or approximated for acceleration. Building on these insights, FasterVAR introduces a plug-and-play acceleration strategy that exploits semantic irrelevance and low-rank properties in late-stage computations, without requiring additional training. Our proposed FasterVAR achieves up to 3.4x speedup with almost no performance loss, consistently outperforming existing acceleration baselines. These results highlight stage-aware design as a powerful principle for efficient visual autoregressive image generation.

2307.06240 2026-05-28 cs.LG cs.AI cs.RO cs.SY eess.SY

DSSE: a drone swarm search environment

Manuel Castanares, Luis F. S. Carrete, Enrico F. Damiani, Leonardo D. M. de Abreu, José Fernando B. Brancalion, Fabrício J. Barth

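AI summary A PettingZoo-based multi-agent reinforcement learning environment in which drones search for targets using dynamic probabilities as input.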

Comments 7 pages

English abstract

The Drone Swarm Search project is an environment, based on PettingZoo, that is to be used in conjunction with multi-agent (or single-agent) reinforcement learning algorithms. It is an environment in which the agents (drones) have to find the targets (shipwrecked people). The agents do not know the position of the target(s) and do not receive rewards related to their own distance to the target(s). However, the agents receive the probabilities of the target(s) being in a certain cell of the map. The aim of this project is to aid in the study of reinforcement learning algorithms that require dynamic probabilities as inputs. A peer-reviewed paper describing version 2 of this software has been published in JOSS: https://doi.org/10.21105/joss.06746.
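
To show what an agent can do with a probability map alone, here is a minimal hand-written baseline in plain Python. It does not use the DSSE or PettingZoo APIs. The grid layout, the function names, and the greedy rule are hypothetical and serve only as an example of a policy driven by per-cell probabilities.

```python
def most_likely_cell(prob_grid):
    """Return the (row, col) of the cell with the highest target probability."""
    cells = ((r, c) for r in range(len(prob_grid)) for c in range(len(prob_grid[0])))
    return max(cells, key=lambda rc: prob_grid[rc[0]][rc[1]])

def greedy_step(position, prob_grid):
    """Move one cell toward the most likely cell: rows first, then columns."""
    target_r, target_c = most_likely_cell(prob_grid)
    r, c = position
    if r != target_r:
        return (r + (1 if target_r > r else -1), c)
    if c != target_c:
        return (r, c + (1 if target_c > c else -1))
    return position  # already on the cell; a real agent would search here

# Hypothetical 2x3 probability map and a drone starting in the top-left corner.
grid = [[0.0, 0.1, 0.0],
        [0.0, 0.2, 0.7]]
path = [(0, 0)]
for _ in range(4):
    path.append(greedy_step(path[-1], grid))
```

Since the probabilities in the environment change over time, a learned policy would need to re-read the map at every step, which is what this loop does in a simplified way.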

2512.12649 2026-05-28 cs.RO cs.SY eess.SY

Bayesian Optimization Parameter Tuning Framework for a Lyapunov Based Path Following Controller

Zhewen Zheng, Wenjing Cao, Hongkang Yu, Mo Chen, Takashi Suzuki

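AI summary Because interdependent parameters make manual tuning of nonlinear geometric controllers inefficient, the paper proposes a data-efficient Bayesian optimization method that treats the closed-loop system as a black box and uses a Gaussian-process surrogate. Experiments on Honda's AI-Formula three-wheeled robot show that it improves controller performance within 32 trials.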

Comments The authors request withdrawal because the current arXiv version does not reflect the complete and finalized authorship record of the manuscript. The author list and contribution record require correction before further public dissemination

English abstract

Parameter tuning in real-world experiments is constrained by the limited evaluation budget available on hardware. The path-following controller studied in this paper reflects a typical situation in nonlinear geometric controllers, where multiple gains influence the dynamics through coupled nonlinear terms. Such interdependence makes manual tuning inefficient and unlikely to yield satisfactory performance within a practical number of trials. To address this challenge, we propose a Bayesian optimization (BO) framework that treats the closed-loop system as a black box and selects controller gains using a Gaussian-process surrogate. BO offers model-free exploration, quantified uncertainty, and data-efficient search, making it well suited for tuning tasks where each evaluation is costly. The framework is implemented on Honda's AI-Formula three-wheeled robot and assessed through repeated full-lap experiments on a fixed test track. The results show that BO improves controller performance within 32 trials, including 15 warm-start initial evaluations, indicating that it can efficiently locate high-performing regions of the parameter space under real-world conditions. These findings demonstrate that BO provides a practical, reliable, and data-efficient tuning approach for nonlinear path-following controllers on real robotic platforms.
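
The loop described above can be sketched in dependency-free Python for a single gain on [0, 1]. Everything specific here is an assumption made for illustration: the RBF kernel and its length scale, the zero-mean prior, the lower-confidence-bound acquisition rule, the trial counts, and the stand-in `cost` function that replaces a real full-lap experiment. The paper's actual kernel, acquisition function, and gain dimensions are not stated in the abstract.

```python
import math

def rbf(x1, x2, length=0.3):
    """Squared-exponential kernel (assumed choice)."""
    return math.exp(-0.5 * ((x1 - x2) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for k in range(col, n + 1):
                M[r][k] -= factor * M[col][k]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][k] * x[k] for k in range(i + 1, n))) / M[i][i]
    return x

def gp_posterior(xs, ys, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP at x_new."""
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)
    v = solve(K, k_star)
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = max(rbf(x_new, x_new) - sum(k * vi for k, vi in zip(k_star, v)), 0.0)
    return mean, math.sqrt(var)

def tune(cost, n_init=3, n_trials=8, kappa=2.0):
    """Minimize `cost` over a gain in [0, 1] within a fixed trial budget."""
    xs = [i / (n_init - 1) for i in range(n_init)]  # warm-start evaluations
    ys = [cost(x) for x in xs]
    candidates = [i / 100 for i in range(101)]
    for _ in range(n_trials - n_init):
        def lcb(x):
            mean, std = gp_posterior(xs, ys, x)
            return mean - kappa * std  # low predicted cost or high uncertainty
        x_next = min(candidates, key=lcb)
        xs.append(x_next)
        ys.append(cost(x_next))
    best = min(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]

def cost(gain):
    """Hypothetical stand-in for one closed-loop experiment."""
    return (gain - 0.3) ** 2
```

Calling `tune(cost)` returns the best gain found and its cost. On hardware, `cost` would run one lap and return a tracking-error metric, which is why the number of calls is kept small.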

2512.09800 2026-05-28 cs.LG cs.DC cs.PF

Ariel-ML: Computing Parallelization with Embedded Rust for Neural Networks on Heterogeneous Multi-core Microcontrollers

Zhaolan Huang, Kaspar Schleiser, Gyungmin Myung, Emmanuel Baccelli

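AI summary To parallelize TinyML inference on multi-core MCUs, the paper proposes Ariel-ML, an embedded Rust toolkit that combines a generic TinyML pipeline with multi-core support, achieving low-latency inference on several 32-bit MCU families with a memory footprint comparable to C/C++ toolkits.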

English abstract

Low-power microcontroller (MCU) hardware is currently evolving from single-core architectures to predominantly multi-core architectures. In parallel, new embedded software building blocks are increasingly written in Rust, while C/C++ dominance fades in this domain. At the same time, small artificial neural networks (ANNs) of various kinds are increasingly deployed in edge AI use cases and executed directly on low-power MCUs. In this context, both incremental improvements and novel services will have to be continuously retrofitted by executing ANNs in software embedded on sensing/actuating systems already deployed in the field. However, there has so far been no embedded Rust software platform that automates the parallelization of inference computation on multi-core MCUs executing arbitrary TinyML models. This paper fills this gap by introducing Ariel-ML, a novel toolkit that combines a generic TinyML pipeline with an embedded Rust software platform and can take full advantage of the multi-core capabilities of various 32-bit microcontroller families (Arm Cortex-M, RISC-V, ESP-32). We published the full open source code of its implementation, which we used to benchmark its capabilities using a zoo of various TinyML models. We show that Ariel-ML outperforms prior art in terms of inference latency, as expected, and that, compared to pre-existing toolkits using embedded C/C++, it achieves comparable memory footprints. Ariel-ML thus provides a useful basis for TinyML practitioners and resource-constrained embedded Rust developers.

2512.09786 2026-05-28 cs.LG cs.PF cs.SD eess.AS eess.SP

TinyDéjàVu: Smaller RAM and Faster Inference with Neural Networks on MCUs for Sensor Data Streams

Zhaolan Huang, Emmanuel Baccelli

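AI summary Proposes the TinyDéjàVu framework, which optimizes data flows during neural network inference on sensor data time series and saves up to 90% of RAM on microcontrollers at equal compute latency.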

详情
AI中文摘要

嵌入式智能的例子包括用于无线传感器和执行器上的各种微型神经网络,这些网络预期持续对感知数据的时间序列进行推理。为了满足电池供电时的寿命和能耗要求,此类硬件完全基于微控制器,并尽可能少的内存,例如128 kB的RAM。在此背景下,优化推理过程中神经网络层间的数据流变得至关重要。在本文中,我们介绍了一个新框架TinyDéjàVu以及我们设计的新算法,旨在大幅减少在典型微控制器硬件上使用各种神经网络模型对传感器数据时间序列进行推理所需的RAM预算。我们将TinyDéjàVu的实现开源,并在常见的微控制器硬件(Arm Cortex-M)上进行可重复的基准测试。我们表明,与先前工作(StreamiNNC)相比,在重叠滑动窗口输入上,TinyDéjàVu可以节省高达90%的RAM使用,同时计算延迟相同。

英文摘要

Examples of embedded intelligence include a wide variety of tiny neural networks used on board wireless sensors and actuators, which are expected to continuously perform inference on time series of the data they sense. To meet lifetime and energy consumption requirements when operating on battery, such hardware is based exclusively on microcontrollers with as little memory as possible, e.g., 128 kB of RAM. In this context, optimizing data flows during inference across neural network layers becomes crucial. In this paper, we introduce a new framework, TinyDéjàVu, and novel algorithms we designed to drastically reduce the RAM budget required by inference using various neural network models for sensor data time series on typical microcontroller hardware. We publish the implementation of TinyDéjàVu as open source, and we perform reproducible benchmarks on common microcontroller hardware (Arm Cortex-M). We show that TinyDéjàVu can save up to 90% of RAM usage with equal compute latency compared to prior work (StreamiNNC) on overlapping sliding window inputs.
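
Why overlapping sliding windows leave room for savings can be illustrated with a toy sketch. This is not TinyDéjàVu's algorithm, which the abstract does not spell out; it only assumes a layer whose output for each sample is independent of its neighbours, so that results can be cached across windows. The names `frame_feature` and `StreamingWindow` are hypothetical.

```python
from collections import deque


def frame_feature(sample: float) -> float:
    """Hypothetical per-sample layer; stands in for an expensive computation."""
    return sample * sample


class StreamingWindow:
    """Caches per-sample features so that overlapping windows reuse them."""

    def __init__(self, window: int, stride: int):
        self.window = window
        self.stride = stride
        self.cache = deque(maxlen=window)  # bounded buffer: memory is O(window)
        self.since_last = 0                # samples seen since the last output
        self.feature_calls = 0             # how often the "expensive" layer ran

    def push(self, sample: float):
        """Feed one sample; return a window output when one is due, else None."""
        self.cache.append(frame_feature(sample))
        self.feature_calls += 1
        self.since_last += 1
        if len(self.cache) == self.window and self.since_last >= self.stride:
            self.since_last = 0
            return sum(self.cache) / self.window  # toy pooling over the window
        return None
```

With a window of 4 and a stride of 2, eight input samples produce three outputs from eight feature computations, whereas recomputing every window from scratch would take twelve.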

2512.00814 2026-05-28 cs.CV

IRPO: Boosting Image Restoration via Post-training GRPO

IRPO:通过后训练GRPO提升图像恢复

Haoxuan Xu, Yi Liu, Tianfu Li, Ruolin Shen, Boyuan Jiang, Jinlong Peng, Donghao Luo, Xiaobin Hu, Shuicheng Yan, Haoang Li

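AI summary Proposes the IRPO framework, which uses GRPO post-training to optimize deterministic restoration models and, through data selection and composite reward modeling, significantly improves performance on in-domain and out-of-domain tasks.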

详情
AI中文摘要

后训练在高层次生成任务中已变得有效,但在低层次视觉中的作用仍未被充分探索。现有的图像恢复方法通常依赖于对真实图像的固定逐像素拟合,这可能导致过度平滑和泛化能力弱。我们提出了IRPO,一个基于GRPO的后训练框架,用于确定性恢复模型。IRPO围绕两个轴构建:数据公式化和奖励建模。对于数据公式化,我们从预训练阶段选择表现最差的30%样本,这提高了准确性和训练效率。对于奖励建模,我们将面向保真度和面向质量的反馈与三个组件结合:用于结构保真度的通用奖励、使用视觉-语言模型作为粗粒度视觉质量评判的专家奖励,以及用于任务特定低级线索的恢复奖励。在六个域内和五个域外基准上的实验表明,IRPO在域内任务上将AdaIR基线提高了0.93 dB,在域外设置上提高了3.43 dB。我们的代码可在https://github.com/HaoxuanXU1024/IRPO查看。

英文摘要

Post-training has become effective for high-level generation, but its role in low-level vision remains underexplored. Existing image restoration methods often rely on fixed pixel-wise fitting to ground-truth images, which can lead to over-smoothing and weak generalization. We propose IRPO, a GRPO-based post-training framework for deterministic restoration models. IRPO is built around two axes: data formulation and reward modeling. For data formulation, we select the worst-performing 30% of samples from the pre-training stage, which improves both accuracy and training efficiency. For reward modeling, we combine fidelity-oriented and quality-aware feedback with three components: a General Reward for structural fidelity, an Expert Reward that uses a Vision-Language Model as a coarse visual-quality judge, and a Restoration Reward for task-specific low-level cues. Experiments on six in-domain and five out-of-domain (OOD) benchmarks show that IRPO improves the AdaIR baseline by 0.93 dB on in-domain tasks and 3.43 dB on OOD settings. Our code is available at https://github.com/HaoxuanXU1024/IRPO.
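
The two ingredients named in the abstract, selecting the underperforming 30% of samples and GRPO-style group-relative scoring, can be sketched as follows. This is a minimal illustration, not the IRPO implementation: ranking samples by PSNR is an assumption, the function names are hypothetical, and the composite reward itself is left out.

```python
from statistics import mean, pstdev


def select_hard_samples(psnr_by_id: dict, fraction: float = 0.3) -> list:
    """Return the ids of the lowest-PSNR `fraction` of samples, worst first.

    Assumption: per-sample PSNR from the pre-trained model is the ranking metric.
    """
    ranked = sorted(psnr_by_id, key=psnr_by_id.get)
    keep = max(1, round(len(ranked) * fraction))
    return ranked[:keep]


def group_relative_advantages(rewards: list, eps: float = 1e-8) -> list:
    """GRPO-style advantage: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]
```

Standardizing within the group means that only the relative quality of the candidates for the same input matters, so no separate value network is needed to provide a baseline.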

2508.13544 2026-05-28 cs.CV cs.AI

FLAIR: Frequency- and Locality-Aware Implicit Neural Representations

FLAIR: 频率与位置感知的隐式神经表示

Sukhun Ko, Seokhyun Youn, Dahyeon Kye, Kyle Min, Chanho Eom, Jihyong Oh

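AI summary To address the spectral bias that arises because implicit neural representations lack frequency selectivity and spatial localization, proposes Band-Localized Activation and Wavelet-Energy-Guided Encoding, improving 2D image representation, 3D shape reconstruction, and novel view synthesis.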

Comments CVPR Findings 2026 (camera ready ver.). Please visit our project page at https://cmlab-korea.github.io/FLAIR/

详情
AI中文摘要

隐式神经表示利用神经网络将坐标映射到对应信号,实现连续且紧凑的表示。该范式推动了各种视觉任务的重大进展。然而,现有的隐式神经表示缺乏频率选择性和空间定位,导致过度依赖冗余信号分量。因此,它们表现出频谱偏差,倾向于早期学习低频分量,而难以捕捉精细的高频细节。为了解决这些问题,我们提出了FLAIR(频率与位置感知的隐式神经表示),它包含两个关键创新。第一个是带限局部激活(BLA),这是一种新颖的激活函数,设计用于在时频不确定性原理(TFUP)约束下进行联合频率选择和空间定位。通过结构化的频率控制和空间局部响应,BLA有效减轻了频谱偏差并增强了训练稳定性。第二个是小波能量引导编码(WEGE),它利用离散小波变换计算能量分数,并显式地将频率信息引导到网络,实现精确的频率选择和自适应频带控制。我们的方法在2D图像表示、3D形状重建和新视角合成方面始终优于现有的隐式神经表示。

英文摘要

Implicit Neural Representations (INRs) leverage neural networks to map coordinates to corresponding signals, enabling continuous and compact representations. This paradigm has driven significant advances in various vision tasks. However, existing INRs lack frequency selectivity and spatial localization, leading to an over-reliance on redundant signal components. Consequently, they exhibit spectral bias, tending to learn low-frequency components early while struggling to capture fine high-frequency details. To address these issues, we propose FLAIR (Frequency- and Locality-Aware Implicit Neural Representations), which incorporates two key innovations. The first is Band-Localized Activation (BLA), a novel activation designed for joint frequency selection and spatial localization under the constraints of the time-frequency uncertainty principle (TFUP). Through structured frequency control and spatially localized responses, BLA effectively mitigates spectral bias and enhances training stability. The second is Wavelet-Energy-Guided Encoding (WEGE), which leverages the discrete wavelet transform to compute energy scores and explicitly guide frequency information to the network, enabling precise frequency selection and adaptive band control. Our method consistently outperforms existing INRs in 2D image representation, as well as 3D shape reconstruction and novel view synthesis.

2501.01669 2026-05-28 cs.LG cs.RO

Inversely Learning Transferable Rewards via Abstracted States

通过抽象状态逆向学习可迁移奖励

Yikang Gui, Prashant Doshi

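AI summary Proposes a method that inversely learns an abstract reward function from behavior trajectories and validates its transferability on unseen instances of the domain.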

Comments Accepted at IJCAI 2026

详情
AI中文摘要

逆向强化学习(IRL)在从行为数据中准确学习离散和连续领域中的潜在奖励方面取得了显著进展。下一步的进展是学习内在偏好,以在与观察到的设置或任务不同但一致的情况下产生有用行为。在机器人应用的背景下,这有助于将机器人集成到涉及新任务(具有共享的内在偏好)的处理线中,而无需从头编程。我们提出了一种方法,从领域中的两个或更多不同实例的行为轨迹中逆向学习一个抽象奖励函数。然后,该抽象奖励函数用于在领域的另一个单独实例中学习任务行为。这一步提供了其可迁移性的证据,并验证了其正确性。我们在OpenAI的Gym测试平台和AssistiveGym中多个领域的任务轨迹上评估了该方法,结果表明,学习到的抽象奖励函数能够成功地在各自领域的未见过的实例中学习任务行为。

英文摘要

Inverse reinforcement learning (IRL) has progressed significantly toward accurately learning the underlying rewards in both discrete and continuous domains from behavior data. The next advance is to learn intrinsic preferences in ways that produce useful behavior in settings or tasks which are different but aligned with the observed ones. In the context of robotic applications, this helps integrate robots into processing lines involving new tasks (with shared intrinsic preferences) without programming from scratch. We introduce a method to inversely learn an abstract reward function from behavior trajectories in two or more differing instances of a domain. The abstract reward function is then used to learn task behavior in another separate instance of the domain. This step offers evidence of its transferability and validates its correctness. We evaluate the method on trajectories in tasks from multiple domains in OpenAI's Gym testbed and AssistiveGym and show that the learned abstract reward functions can successfully learn task behaviors in instances of the respective domains, which have not been seen previously.

2506.10138 2026-05-28 cs.LG cs.AI

Path Channels and Plan Extension Kernels: a Mechanistic Description of Planning in a Sokoban RNN

路径通道与计划扩展核:Sokoban RNN中规划的机制描述

Mohammad Taufeeque, Aaron David Tucker, Adam Gleave, Adrià Garriga-Alonso

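AI summary By reverse-engineering a convolutional recurrent neural network trained with model-free reinforcement learning, finds that it stores plans of future moves in path channels and uses convolutional kernels to implement bidirectional propagation and backtracking in planning.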

Comments Published as a conference paper at ICLR 2026. 34 pages, 26 figures

详情
AI中文摘要

我们部分逆向工程了一个使用无模型强化学习训练来玩推箱子游戏Sokoban的卷积循环神经网络(RNN)。我们发现,RNN将未来动作(计划)存储为隐藏状态特定通道中的激活,我们称之为路径通道。特定位置的高激活意味着,当箱子在该位置时,它将被推向通道指定的方向。我们检查了路径通道之间的卷积核,发现它们编码了每个可能动作导致的位置变化,从而代表了部分学习到的转移模型。RNN通过从箱子和目标开始构建计划。这些核将路径通道中的激活从箱子向前传播,并从目标向后传播。负值被放置在障碍物处的通道中。这导致扩展核反向传播负值,从而修剪最后几步,让替代计划出现;这是一种回溯形式。我们的工作表明,对计划表示的精确理解使我们能够用更熟悉的术语直接理解无模型训练学到的双向规划类算法。

英文摘要

We partially reverse-engineer a convolutional recurrent neural network (RNN) trained with model-free reinforcement learning to play the box-pushing game Sokoban. We find that the RNN stores future moves (plans) as activations in particular channels of the hidden state, which we call path channels. A high activation in a particular location means that, when a box is in that location, it will get pushed in the channel's assigned direction. We examine the convolutional kernels between path channels and find that they encode the change in position resulting from each possible action, thus representing part of a learned transition model. The RNN constructs plans by starting at the boxes and goals. These kernels extend activations in path channels forwards from boxes and backwards from the goal. Negative values are placed in channels at obstacles. This causes the extension kernels to propagate the negative value in reverse, thus pruning the last few steps and letting an alternative plan emerge, which is a form of backtracking. Our work shows that a precise understanding of the plan representation allows us to directly understand the bidirectional planning-like algorithm learned by model-free training in more familiar terms.

2512.02019 2026-05-28 cs.LG cs.AI stat.ML

Diffusion-Augmented Markov Decision Processes for Maximum Entropy Reinforcement Learning

扩散增强马尔可夫决策过程用于最大熵强化学习

Sebastian Sanokowski, Kaustubh Patil

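AI summary Extends maximum entropy reinforcement learning to diffusion processes and proposes Diffusion-Augmented Markov Decision Processes (DA-MDPs), which learn the optimal policy trajectory distribution by minimizing an upper bound on the reverse KL divergence; adapts PPO, WPO, and REPPO into diffusion variants that match or outperform baselines on continuous-control and multimodal benchmarks.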

Comments Preprint

详情
AI中文摘要

扩散模型擅长从复杂的非归一化分布中采样。在这项工作中,我们将最大熵强化学习(ME-RL)扩展到扩散过程,从而能够从最优策略轨迹分布中采样。通过最小化扩散策略与最优策略轨迹分布之间的反向KL散度的可处理上界,我们推导出一个修改后的替代目标,并引入了扩散增强马尔可夫决策过程(DA-MDPs)。DA-MDPs允许将扩散策略无缝集成到任何ME-RL方法中,只需最小的修改。我们通过将近端策略优化(PPO)、Wasserstein策略优化(WPO)和相对熵路径策略优化(REPPO)适配为其基于扩散的变体:DA-MDP: PPO、DA-MDP: WPO和DA-MDP: REPPO,证明了其有效性。在标准连续控制基准上的实验结果表明,我们的方法匹配或优于基线方法,而在多模态基准上的实验证实了其建模多模态动作分布的能力。

英文摘要

Diffusion models excel at sampling from complex, unnormalized distributions. In this work, we extend Maximum Entropy Reinforcement Learning (ME-RL) to diffusion processes, enabling sampling from the optimal policy trajectory distribution. By minimizing a tractable upper bound on the reverse KL divergence between the diffusion policy and the optimal policy trajectory distributions, we derive a modified surrogate objective and introduce Diffusion-Augmented Markov Decision Processes (DA-MDPs). DA-MDPs allow for seamless integration of diffusion policies into any ME-RL method with minimal modifications. We demonstrate its effectiveness by adapting Proximal Policy Optimization (PPO), Wasserstein Policy Optimization (WPO), and Relative Entropy Pathwise Policy Optimization (REPPO) into their diffusion-based variants: DA-MDP: PPO, DA-MDP: WPO, and DA-MDP: REPPO. Empirical results on standard continuous-control benchmarks show that our approach matches or outperforms baseline methods, while experiments on multimodal benchmarks confirm its ability to model multimodal action distributions.

2512.01970 2026-05-28 cs.AI cs.CL

Atomic Skills are the Prerequisite: When Reinforcement Learning Synthesizes Compositional Reasoning, and When It Only Amplifies

原子技能是前提:当强化学习合成组合推理时,以及当它仅放大时

Sitao Cheng, Xunjian Yin, Ruiwen Zhou, Yuxuan Li, Xinyi Wang, Liangming Pan, William Yang Wang, Victor Zhong

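AI summary Uses a complementary reasoning task to study whether reinforcement learning synthesizes new skills or merely amplifies existing ones, and finds that RL synthesizes new composite strategies only after the base model has mastered the independent atomic skills via supervised fine-tuning.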

Comments Work in Progress. Code and data are available at https://github.com/sitaocheng/from_atomic_to_composite

详情
AI中文摘要

强化学习(RL)仅仅是放大现有技能,还是合成新技能?我们通过互补推理的视角研究这个问题:互补推理是整合内部知识与外部上下文的关键实践能力,是可靠的持续学习和检索增强生成的前提。为了避免预训练污染,我们构建了一个受控的语义合成传记数据集,并将这种能力分解为两个原子技能:参数推理(检索模型权重中编码的事实)和上下文推理(处理新的上下文信息)。我们有两个发现。首先,直接在复合任务上监督训练的模型在已知事实和推理路径上达到高准确率(90%),但在新事实和推理路径上崩溃(18%),表明监督微调(SFT)依赖于死记硬背而非真正的技能整合。其次,RL弥合了这一泛化差距,充当技能合成器而非仅仅是放大器——但只有在严格的前提条件下:只有当基础模型首先通过SFT掌握了独立的原子技能时,它才能合成新的组合策略。这些结果表明,解耦的原子训练后接RL为复杂的新推理提供了一条可扩展的路径。

英文摘要

Does Reinforcement Learning (RL) merely amplify existing skills, or synthesize novel skills? We investigate this question through the lens of Complementary Reasoning: the critical practical capability of integrating internal knowledge with external context, a prerequisite for reliable Continual Learning and Retrieval-Augmented Generation. To avoid pre-training contamination, we construct a controlled semantic-synthetic dataset of biographies and decompose this capability into two atomic skills: Parametric Reasoning (retrieving facts encoded in model weights) and Contextual Reasoning (processing novel in-context information). We present two findings. First, models supervised directly on the composite task reach high accuracy on seen facts and reasoning paths (90%) but collapse on novel facts and reasoning paths (18%), indicating that Supervised Fine-Tuning (SFT) relies on rote memorization rather than genuine skill integration. Second, RL bridges this generalization gap, acting as a skill synthesizer rather than a mere amplifier, but only under a strict prerequisite: it synthesizes new composite strategies only when the base model has first mastered the independent atomic skills via SFT. These results suggest that decoupled atomic training followed by RL offers a scalable path to complex novel reasoning.

2512.01988 2026-05-28 cs.CV

Artemis: Structured Visual Reasoning for Perception Policy Learning

Artemis: 用于感知策略学习的结构化视觉推理

Wei Tang, Yanpeng Sun, Shan Zhang, Weihao Bo, Xiaofan Li, Piotr Koniusz, Wei Li, Na Zhao, Zechao Li

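AI summary Proposes Artemis, which replaces linguistic reasoning with structured visual reasoning (intermediate steps represented as (label, bounding-box) pairs) to improve visual perception policies and handle diverse perception tasks in a unified way.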

详情
AI中文摘要

最近的视觉感知策略强化学习框架通常结合用自然语言表达的中间推理链。经验观察表明,这种纯语言中间推理通常会降低感知任务的性能。我们认为核心问题不在于推理本身,而在于推理的形式:虽然这些链在非结构化的语言空间中进行语义推理,但视觉感知需要在空间和以对象为中心的空间中进行推理。为此,我们引入了Artemis,一种感知策略学习方法,它执行结构化的视觉推理,其中每个中间步骤都表示为一个(标签,边界框)对,捕获可验证的视觉状态。这种设计能够显式跟踪中间状态,直接监督提议质量,并避免基于语言的推理引入的歧义。基于可验证和空间定位的推理链,Artemis为各种感知任务提供了统一的架构,无需依赖先前感知策略模型所依赖的任务特定设计。使用自然图像域中的定位和检测样本进行训练,Artemis泛化到计数和几何感知任务。其核心是空间定位的、以对象为中心的链式规则,为可扩展和通用的感知策略提供了原则性基础。

英文摘要

Recent reinforcement-learning frameworks for visual perception policy usually incorporate intermediate reasoning chains expressed in natural language. Empirical observations indicate that such purely linguistic intermediate reasoning often reduces performance on perception tasks. We argue that the core issue lies not in reasoning per se but in the form of reasoning: while these chains perform semantic reasoning in an unstructured linguistic space, visual perception requires reasoning in a spatial and object-centric space. In response, we introduce Artemis, a perception-policy learning method that performs structured visual reasoning, where each intermediate step is represented as a (label, bounding-box) pair capturing a verifiable visual state. This design enables explicit tracking of intermediate states and direct supervision of proposal quality, and it avoids the ambiguity introduced by language-based reasoning. Building upon verifiable and spatially grounded reasoning chains, Artemis provides a unified architecture for diverse perceptual tasks, without requiring the task-specific designs relied upon by prior perceptual policy models. Trained using grounding and detection samples in natural image domains, Artemis generalizes to counting and geometric perception tasks. At its core, a spatially grounded, object-centric chain rule provides a principled foundation for scalable and general perceptual policies.
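
What makes a (label, bounding-box) step verifiable can be shown with a short sketch: each step can be checked against reference annotations by label match and box overlap. This is an illustration only; the `Step` type, the `verify_chain` helper, and the 0.5 IoU threshold are assumptions and are not taken from Artemis.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One intermediate reasoning step: a label and its box (x1, y1, x2, y2)."""
    label: str
    box: tuple


def iou(a: tuple, b: tuple) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def verify_chain(chain: list, references: list, threshold: float = 0.5) -> list:
    """For each step, report whether some reference shares its label and overlaps enough."""
    return [
        any(s.label == r.label and iou(s.box, r.box) >= threshold for r in references)
        for s in chain
    ]
```

A free-text reasoning step has no comparable check, which is the ambiguity the abstract attributes to language-based chains.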

2511.20934 2026-05-28 cs.AI cs.CV cs.LG

Guaranteed Optimal Compositional Explanations for Neurons

神经元的保证最优组合解释

Biagio La Rosa, Leilani H. Gilpin

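AI summary Proposes the first framework that, through a decomposition, a heuristic, and an algorithm, computes guaranteed optimal compositional explanations over the full state space, and shows that 10-40% of beam-search explanations are suboptimal when concepts overlap.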

Comments Accepted at ICML 2026 (Oral), 43 pages, 10 figures

详情
AI中文摘要

组合解释是一类方法,旨在通过逻辑规则描述神经元感受野激活与概念之间的空间对齐,通常通过搜索所有可能的概念组合来计算。由于在整个状态空间上计算空间对齐在计算上不可行,文献中通常采用与组合结构相关的假设和波束搜索来限制状态空间。然而,波束搜索无法提供任何最优性的理论保证,且当前解释与真正最优解的接近程度仍不清楚。在这篇理论性论文中,我们通过引入首个框架来解决这一差距,该框架在采用假设所涵盖的整个状态空间上计算保证最优的组合解释。具体而言,我们提出:(i) 一种识别影响空间对齐因素的分解方法,(ii) 一种在搜索任何阶段估计对齐的启发式方法,以及(iii) 第一个能够在与穷举波束搜索相当的时间内计算最优组合解释的算法。使用该框架,我们证明当涉及重叠概念时,先前通过波束搜索获得的10-40%的解释是次优的。最后,我们评估了一种由我们提出的分解和启发式方法引导的波束搜索变体,表明它在超参数和计算资源方面提供更大灵活性的同时,匹配或改进了先前方法的运行时间。

英文摘要

Compositional explanations are a family of methods that aim to describe the spatial alignment between neurons' receptive field activations and concepts through logical rules, typically computed via a search over all possible concept combinations. Since computing the spatial alignment over the entire state space is computationally infeasible, the literature commonly adopts assumptions related to the structure of the combinations and beam search to restrict the state space. However, beam search cannot provide any theoretical guarantees of optimality, and it remains unclear how close current explanations are to the true optimum. In this theoretical paper, we address this gap by introducing the first framework for computing guaranteed optimal compositional explanations over the entire state space spanned by the adopted assumptions. Specifically, we propose: (i) a decomposition that identifies the factors influencing the spatial alignment, (ii) a heuristic to estimate the alignment at any stage of the search, and (iii) the first algorithm that can compute optimal compositional explanations in a time comparable to exhaustive beam search. Using this framework, we demonstrate that 10-40% of explanations previously obtained with beam search are suboptimal when overlapping concepts are involved. Finally, we evaluate a beam-search variant guided by our proposed decomposition and heuristic, showing that it matches or improves runtime over prior methods while offering greater flexibility in hyperparameters and computational resources.
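
The search problem described above can be made concrete with a brute-force baseline. The sketch below scores every formula of length at most two by the IoU between a neuron's activation mask and the formula's mask; it is not the paper's algorithm or heuristic, and the mask representation (sets of spatial positions) and the operator set (AND, OR, AND NOT) are assumptions made for illustration.

```python
from itertools import permutations


def iou(a: set, b: set) -> float:
    """Spatial alignment of two masks given as sets of positions."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def best_explanation(neuron: set, concepts: dict) -> tuple:
    """Exhaustively score every formula of length <= 2; return (formula, IoU)."""
    candidates = dict(concepts)  # length-1 formulas: the concepts themselves
    for (n1, m1), (n2, m2) in permutations(concepts.items(), 2):
        candidates[f"{n1} AND {n2}"] = m1 & m2
        candidates[f"{n1} OR {n2}"] = m1 | m2
        candidates[f"{n1} AND NOT {n2}"] = m1 - m2
    return max(
        ((formula, iou(neuron, mask)) for formula, mask in candidates.items()),
        key=lambda pair: pair[1],
    )
```

The number of candidates grows combinatorially with formula length and vocabulary size, which is why full enumeration is infeasible at scale and why optimality guarantees for a pruned search are of interest.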

2511.20439 2026-05-28 cs.CV cs.AI

Object-Centric Vision Token Pruning for Vision Language Models

面向视觉语言模型的以对象为中心的视觉令牌剪枝

Guangyuan Li, Rongzhen Zhao, Jinhong Deng, Yanbo Wang, Joni Pajarinen

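AI summary Proposes OC-VTP, which lightly pre-trains an object-centric vision token pruner that directly selects the most representative vision tokens, improving VLM inference efficiency while preserving high accuracy.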

详情
AI中文摘要

在视觉语言模型(VLM)中,与语言令牌相比,视觉令牌数量庞大但信息分散,因此消耗了大量不必要的计算。为了提升VLM推理效率,剪枝冗余视觉令牌的研究一直在进行,但现有方法都采用间接且无保证的方式。我们提出了OC-VTP,一种直接且有保证的方法,用于选择最具代表性的视觉令牌,以实现高效且保持精度的VLM推理。我们的OC-VTP仅需对一个小型的以对象为中心的视觉令牌剪枝器进行轻量预训练,然后即可将其插入现有VLM中,无需在任何数据集上微调任何模型。通过最小化从所选令牌重建原始未剪枝令牌的误差,保证保留最具代表性的视觉令牌。在任何视觉剪枝比例(即推理效率)下,我们的OC-VTP都能一致地帮助主流VLM保持最高的推理精度。我们的剪枝还展示了有趣的可解释性。我们的代码可在 https://github.com/GarryLarry010131/OC-VTP 获取。

英文摘要

In Vision Language Models (VLMs), vision tokens are quantity-heavy yet information-dispersed compared with language tokens, and thus consume much unnecessary computation. Pruning redundant vision tokens for high VLM inference efficiency has been continuously studied, but all existing methods resort to indirect and non-guaranteed ways. We propose OC-VTP, a direct and guaranteed approach to select the most representative vision tokens for high-efficiency yet accuracy-preserving VLM inference. Our OC-VTP requires merely light-weight pre-training of a small object-centric vision token pruner, which can then be inserted into existing VLMs, without fine-tuning of any models on any datasets. It is guaranteed that the most representative vision tokens are kept by minimizing the error in reconstructing the original unpruned tokens from the selected ones. At any vision pruning ratio, i.e., inference efficiency, our OC-VTP consistently helps mainstream VLMs to preserve the highest inference accuracy. Our pruning also demonstrates interesting interpretability. Our code is available at https://github.com/GarryLarry010131/OC-VTP.

2511.02558 2026-05-28 cs.CV cs.LG q-bio.NC

Forecasting Future Anatomies: Longitudinal Brain MRI-to-MRI Prediction

预测未来解剖结构:纵向脑MRI到MRI的预测

Ali Farki, Elaheh Moradi, Deepika Koundal, Jussi Tohka

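AI summary Studies predicting future brain MRI from a baseline MRI, using five deep learning architectures (UNet, U2-Net, UNETR, Time-Embedding UNet, and ODE-UNet) on the ADNI and AIBL datasets to achieve high-fidelity voxel-level prediction, and validates cross-cohort generalization.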

详情
Journal ref
2026 IEEE 23rd International Symposium on Biomedical Imaging (ISBI), Apr. 2026
AI中文摘要

从基线磁共振图像(MRI)预测未来脑状态是神经影像学的一个核心挑战,对研究阿尔茨海默病(AD)等神经退行性疾病具有重要意义。大多数现有方法预测未来认知评分或临床结果,例如从轻度认知障碍向痴呆的转化。相反,本文研究纵向MRI图像到图像的预测,该预测可以预测参与者未来数年的整个脑部MRI,内在建模复杂的、空间分布的神经退行模式。我们在两个纵向队列(ADNI和AIBL)上实施并评估了五种深度学习架构(UNet、U2-Net、UNETR、时间嵌入UNet和ODE-UNet)。使用捕捉全局相似性和局部差异的指标,将预测的随访MRI与实际随访扫描直接进行比较。表现最佳的模型实现了高保真预测,并且所有模型都能很好地泛化到独立的外部数据集,展示了稳健的跨队列性能。我们的结果表明,深度学习可以在体素水平上可靠地预测参与者特定的脑部MRI,为个体化预后提供了新的机会。

英文摘要

Predicting future brain state from a baseline magnetic resonance image (MRI) is a central challenge in neuroimaging and has important implications for studying neurodegenerative diseases such as Alzheimer's disease (AD). Most existing approaches predict future cognitive scores or clinical outcomes, such as conversion from mild cognitive impairment to dementia. Instead, here we investigate longitudinal MRI image-to-image prediction that forecasts a participant's entire brain MRI several years into the future, intrinsically modeling complex, spatially distributed neurodegenerative patterns. We implement and evaluate five deep learning architectures (UNet, U2-Net, UNETR, Time-Embedding UNet, and ODE-UNet) on two longitudinal cohorts (ADNI and AIBL). Predicted follow-up MRIs are directly compared with the actual follow-up scans using metrics that capture global similarity and local differences. The best performing models achieve high-fidelity predictions, and all models generalize well to an independent external dataset, demonstrating robust cross-cohort performance. Our results indicate that deep learning can reliably predict participant-specific brain MRI at the voxel level, offering new opportunities for individualized prognosis.

2511.15390 2026-05-28 cs.CV

Automatic Pruning Discovery for Large Language Models

大型语言模型的自动剪枝发现

Haidong Kang, Lihong Lin, Enneng Yang, Hongning Dai, Hao Wang

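AI summary Proposes AutoPrune, which uses LLMs to design pruning algorithms automatically, optimizes prompts with a Graph-driven Chain-of-Thought, and adds Skew-aware Dynamic Sparsity Allocation to address the outlier issue at high pruning ratios, outperforming existing methods on mainstream benchmarks.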

Comments 15 pages, 10 figures

详情
AI中文摘要

大型语言模型(LLMs)在广泛任务上取得了显著性能,但由于其庞大的规模,阻碍了实际部署。现有的针对LLMs的剪枝方法(例如Wanda)严重依赖手动设计的剪枝算法,从而导致巨大的人力成本并需要专家知识。此外,我们首次识别出在高剪枝率下由均匀稀疏性导致的严重异常值问题,这引发了关于如何为LLMs设计自适应剪枝稀疏度的额外担忧。LLMs能否自行剪枝?在这项工作中,我们通过提出一种名为AutoPrune的新型剪枝方法给出了肯定答案,该方法首次通过利用LLMs自动为其自身设计最优剪枝算法,无需任何专家知识,从而克服了专家知识的限制。具体来说,为了缓解LLMs的黑箱性质,我们提出了一种图驱动思维链(GCoT)来优化提示,显著增强了学习剪枝算法中的推理过程,并使我们能够生成具有卓越性能和可解释性的下一代剪枝算法。最后,基于对异常值问题的洞察,我们引入了偏态感知动态稀疏分配(SDSA)来克服异常值问题,减轻高剪枝率下的性能下降。我们在主流LLMs基准上进行了广泛实验,证明了AutoPrune的优越性,它始终优于最先进的竞争对手。

英文摘要

Large language models (LLMs) have achieved remarkable performance on a wide range of tasks, but their massive size hinders real-world deployment. Existing pruning methods tailored for LLMs (e.g., Wanda) rely heavily on manually designed pruning algorithms, leading to huge labor costs and requiring expert knowledge. Furthermore, we are the first to identify the serious outlier value issue behind the dramatic performance degradation under high pruning ratios that is caused by uniform sparsity, raising an additional concern about how to design adaptive pruning sparsity suited to LLMs. Can LLMs prune by themselves? In this work, we give an affirmative answer by proposing a novel pruning method called AutoPrune, which is the first to overcome the limits of expert knowledge by leveraging LLMs to design optimal pruning algorithms for themselves automatically, without any expert knowledge. Specifically, to mitigate the black-box nature of LLMs, we propose a Graph-driven Chain-of-Thought (GCoT) to optimize prompts, significantly enhancing the reasoning process in learning the pruning algorithm and enabling us to generate pruning algorithms with superior performance and interpretability in the next generation. Finally, grounded in insights into the outlier value issue, we introduce Skew-aware Dynamic Sparsity Allocation (SDSA) to overcome it, mitigating performance degradation under high pruning ratios. We conduct extensive experiments on mainstream LLM benchmarks, demonstrating the superiority of AutoPrune, which consistently outperforms state-of-the-art competitors.

2511.14558 2026-05-28 cs.CV

Explaining Digital Pathology Models via Clustering Activations

通过激活聚类解释数字病理学模型

Adam Bajger, Jan Obdržálek, Vojtěch Kůr, Rudolf Nenutil, Petr Holub, Vít Musil, Tomáš Brázdil

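AI summary Proposes an explainability method based on clustering convolutional neural network activations, which shows the model's global behaviour and provides fine-grained information to improve understanding of and trust in digital pathology models.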

详情
Journal ref
2026 IEEE 23rd International Symposium on Biomedical Imaging (ISBI)
AI中文摘要

我们提出了一种基于聚类的可解释性技术,用于基于卷积神经网络的数字病理学模型。与常用的基于显著性图的方法(如遮挡、GradCAM或相关性传播)不同,这些方法突出显示对单个切片预测贡献最大的区域,而我们的方法展示了所考虑模型的全局行为,同时提供了更细粒度的信息。结果聚类不仅可以可视化以理解模型,还可以增加对其操作的信心,从而在临床实践中更快地采用。我们还评估了我们的技术在现有用于检测前列腺癌的模型上的性能,证明了其实用性。

英文摘要

We present a clustering-based explainability technique for digital pathology models based on convolutional neural networks. Unlike commonly used methods based on saliency maps, such as occlusion, GradCAM, or relevance propagation, which highlight regions that contribute the most to the prediction for a single slide, our method shows the global behaviour of the model under consideration, while also providing more fine-grained information. The resulting clusters can be visualised not only to understand the model, but also to increase confidence in its operation, leading to faster adoption in clinical practice. We also evaluate the performance of our technique on an existing model for detecting prostate cancer, demonstrating its usefulness.

2511.09572 2026-05-28 cs.AI cs.LG cs.SE

SynthTools: A Framework for Scaling Synthetic Tools for Agent Development

SynthTools: 用于扩展智能体开发中合成工具的框架

Tommaso Castellani, Naimeng Ye, Daksh Mittal, Thomson Yen, Emmanouil Koukoumidis, William Zeng, Hongseok Namkoong

AI总结 提出基于LLM的端到端管道SynthTools,通过环境生成、模拟、验证和任务构建,生成大规模多样化工具使用环境,提升智能体工具使用能力。

详情
AI中文摘要

为了使智能体系统能够使用外部工具解决复杂、长期的任务,我们需要大量多样且可控的工具使用环境。我们引入了SynthTools,一个完全基于LLM的管道,涵盖整个生命周期:环境生成、模拟、验证和任务构建。通过端到端地使用LLM,我们的框架补充了其他受限于真实API复杂性的工具使用环境,并通过设计确保可扩展性和可控性。该框架由三个组件组成:自上而下的环境生成,分层构建多样化的、基于领域的工具环境;环境模拟与验证,确保工具能够可靠地模拟并过滤掉无法模拟的工具;以及自下而上的任务与轨迹生成,产生可解决且可验证的任务以及多步轨迹,对难度、长度、轨迹组成和领域焦点进行控制以保证灵活性。作为具体实例,我们发布了包含6800个环境和100个领域中的73883个经过验证的工具、79925个可验证任务的数据集,以及大规模生成轨迹的管道。在这些任务生成的轨迹语料库上训练不同规模的Qwen3模型,在多个工具使用基准测试(包括真实API)上取得了提升,表明在合成数据上训练的工具使用能力可能迁移到某些真实环境。这些结果共同表明,SynthTools可以作为大规模训练工具使用智能体的有用基础设施。

英文摘要

For agentic systems to use external tools to solve complex, long-horizon tasks, we need a large set of diverse and controllable tool-use environments. We introduce SynthTools, a fully LLM-based pipeline spanning the entire lifecycle: environment generation, simulation, validation and task construction. By operating end-to-end through LLMs, our framework complements other tool-use environments bottlenecked by the complexity of real APIs, and ensures scalability and controllability by design. The framework consists of three components: top-down environment generation, which hierarchically constructs diverse, domain-grounded tool environments; environment simulation and validation, which ensures tools can be reliably emulated and filters out those that cannot; and bottom-up task and trajectory generation, which produces solvable and verifiable tasks together with multi-step trajectories, exposing control over difficulty, length, trajectory composition, and domain focus to guarantee flexibility. As a concrete instantiation, we release the dataset comprising 73,883 validated tools across 6,800 environments and 100 fields, 79,925 verifiable tasks as well as the pipeline to generate trajectories at scale. Training Qwen3 models of various sizes on a corpus of trajectories generated from these tasks yields gains across multiple tool-use benchmarks, including real APIs, indicating tool-use capabilities trained on synthetic data may transfer to some real environments. Together, these results suggest that SynthTools can serve as a useful infrastructure for large-scale training of tool-use agents.

2511.05550 2026-05-28 cs.SD cs.CL cs.LG

Assessing Factual Music Comprehension in Large Audio Language Models

评估大型音频语言模型中的事实音乐理解能力

Daniel Chenyu Lin, Michael Freeman, John Thickstun

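AI summary Because the existing MusicQA dataset cannot measure the factual correctness of model responses, proposes an evaluation protocol based on verifiable information that assesses models objectively with Precision, Recall, and F1 scores, defines six factual retrieval tasks on three datasets, and benchmarks nine recent LALMs.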

Comments 16 pages; second submission

详情
AI中文摘要

大型音频语言模型(LALMs)利用多模态表示生成对音频自然语言查询的开放式回答。本文(1)提供经验证据表明,使用流行的MusicQA数据集评估LALMs无法衡量模型关于音乐的回答是否事实正确,(2)开发了一种新的评估LALMs音乐理解能力的协议。具体来说,我们提出一个评估协议,提示LALM提供可事实验证的信息,并将其开放式回答解析为结构化格式,使用精确率、召回率和F1分数进行客观评估。利用该协议,我们定义了一个基准测试,包含在三个不同数据集(MusicNet、Free Music Archive和OverClocked ReMix)上定义的六项事实信息检索任务。我们对九个最近的LALMs进行了基准测试,包括前沿模型如Gemini和最新的开放模型如Music Flamingo,并在https://github.com/DCL2004/LALM-Eval发布了评估脚本套件,以方便新LALMs的基准测试。

英文摘要

Large audio language models (LALMs) leverage multimodal representations to generate open-ended answers to natural language queries about audio. In this paper, we (1) provide empirical evidence that assessment of LALMs using the popular MusicQA dataset fails to measure whether a model's responses about music are factually correct, and (2) develop a new protocol for assessing the music comprehension capabilities of LALMs. Specifically, we propose an evaluation protocol that prompts an LALM for factually verifiable information, and parses its open-ended response into a structured format that can be objectively assessed using Precision, Recall, and F1 scores. Using this protocol, we define a benchmark consisting of six factual information retrieval tasks defined on three diverse datasets: MusicNet, the Free Music Archive, and OverClocked ReMix. We benchmark nine recent LALMs, including frontier models like Gemini and the latest open models like Music Flamingo, and release the suite of evaluation scripts at https://github.com/DCL2004/LALM-Eval to facilitate benchmarking of new LALMs.
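
Once a response has been parsed into a structured form, the scoring step reduces to set comparison. The sketch below shows that step under simple assumptions: the parsed answer is a set of normalized strings, and `parse_items` is a hypothetical comma-splitting parser, not the parser in the released evaluation scripts.

```python
def parse_items(response: str) -> set:
    """Hypothetical parser: split a comma-separated answer and normalize case."""
    return {item.strip().lower() for item in response.split(",") if item.strip()}


def precision_recall_f1(predicted: set, reference: set) -> tuple:
    """Score a parsed answer against the verifiable reference facts."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

For example, a response listing "Piano, Violin, Flute" against the reference set {piano, cello} has one correct item out of three predicted and two expected, giving a precision of 1/3, a recall of 1/2, and an F1 of 0.4.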

2511.02398 2026-05-28 cs.LG

A Spatially Informed Gaussian Process UCB Method for Decentralized Coverage Control

一种用于分散覆盖控制的空间信息高斯过程UCB方法

Gennaro Guidone, Luca Monegaglia, Elia Raimondi, Han Wang, Mattia Bianchi, Florian Dörfler

AI总结 提出一种基于高斯过程上置信界(GP-UCB)的分散算法,通过结合期望位置成本与方差探索项,使智能体自主平衡探索与利用,实现未知空间覆盖控制。

详情
AI中文摘要

我们提出了一种新颖的分散算法,用于由高斯过程(GP)建模的未知空间环境中的覆盖控制。为了在探索与利用之间进行权衡,每个智能体通过最小化局部成本函数自主确定其轨迹。受GP-UCB(高斯过程上置信界)采集函数的启发,所提出的成本将期望位置成本与基于方差的探索项相结合,引导智能体朝向预测密度高且模型不确定性大的区域。与以往工作相比,我们的算法完全以分散方式运行,仅依赖局部观测和与邻居智能体的通信。特别地,智能体使用贪婪选择策略定期更新其诱导点,从而实现可扩展的在线GP更新。我们通过仿真证明了算法的有效性。

英文摘要

We present a novel decentralized algorithm for coverage control in unknown spatial environments modeled by Gaussian Processes (GPs). To trade off exploration against exploitation, each agent autonomously determines its trajectory by minimizing a local cost function. Inspired by the GP-UCB (Upper Confidence Bound for GPs) acquisition function, the proposed cost combines the expected locational cost with a variance-based exploration term, guiding agents toward regions that are high in both predicted density and model uncertainty. Compared to previous work, our algorithm operates in a fully decentralized fashion, relying only on local observations and communication with neighboring agents. In particular, agents periodically update their inducing points using a greedy selection strategy, enabling scalable online GP updates. We demonstrate the effectiveness of our algorithm in simulation.
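
One plausible shape for such a cost is sketched below on a 1-D grid. The abstract does not give the exact formula, so this assumes that an optimistic density mu(q) + beta * sigma(q) replaces the unknown density in the standard locational cost; the paper's cost may combine the two terms differently. All names, and the way GP posterior means and variances are passed in as plain lists, are illustrative.

```python
import math


def ucb_density(mean: list, var: list, beta: float) -> list:
    """Optimistic density estimate mu(q) + beta * sigma(q) at each grid point."""
    return [m + beta * math.sqrt(v) for m, v in zip(mean, var)]


def local_cost(agent: int, positions: list, grid: list, density: list) -> float:
    """Locational cost of one agent over its own Voronoi cell on a 1-D grid.

    Assumption: the agent knows the positions of the neighbours that bound its cell.
    """
    cost = 0.0
    for q, phi in zip(grid, density):
        nearest = min(range(len(positions)), key=lambda i: abs(q - positions[i]))
        if nearest == agent:
            cost += (q - positions[agent]) ** 2 * phi
    return cost
```

Under this assumed form, the mean term rewards covering regions of high predicted density, while the beta-weighted term raises the weight of poorly modeled points, so an agent minimizing its cost is drawn toward both.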