2606.19754 2026-06-19 cs.LG cs.NA math.NA New submission

Learning universal approximations for partial differential equations with Physics-Informed Broad Learning System

Zhiwen Yu, Derong Yang, Liujian Zhang, Kaixiang Yang, Peilin Zhan, Jianmin Lv, Jane You, C. L. Philip Chen

Affiliations: School of Computer Science and Engineering, South China University of Technology; Peng Cheng Laboratory; School of Future Technology, South China University of Technology; School of Computer Science and Technology, Guangdong University of Technology; Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University

AI summary: Proposes the Physics-Informed Broad Learning System (PIBLS), which solves linear and nonlinear PDEs efficiently through backpropagation-free least-squares optimization; it is one to three orders of magnitude faster than conventional PINNs and more accurate.

Abstract

Partial differential equations (PDEs) play a central role in modeling complex physical, biological, and engineering systems. While traditional numerical solvers are robust, they often incur prohibitive computational costs due to mesh dependencies, whereas recent Physics-Informed Neural Networks (PINNs) offer a mesh-free alternative but frequently suffer from slow convergence and optimization instability. To bridge this gap, this article proposes the Physics-Informed Broad Learning System (PIBLS), a novel backpropagation-free framework that reformulates PDE solving as a direct least-squares optimization. We improved an algorithm within this framework to handle nonlinear PDEs efficiently and provide a rigorous mathematical proof establishing the universal approximation property of PIBLS for these equations. Experiments on linear and nonlinear PDEs demonstrate that PIBLS is one to three orders of magnitude faster than conventional PINNs while achieving significantly higher solution accuracy. This framework provides a computationally efficient paradigm for scientific machine learning, offering a practical, high-speed alternative for real-time simulation and design optimization tasks.
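The idea of recasting a PDE solve as a direct least-squares problem can be illustrated with a minimal sketch. This is not the PIBLS algorithm itself: it assumes a single layer of fixed random tanh features (the names `solve_poisson_random_features`, `n_features`, and `n_colloc` are hypothetical) and a linear test problem, -u''(x) = π² sin(πx) on [0, 1] with u(0) = u(1) = 0, whose exact solution is sin(πx). Only the output weights are fitted, so no backpropagation is needed.

```python
import numpy as np

def solve_poisson_random_features(n_features=80, n_colloc=200, seed=0):
    """Fit u(x) = sum_j beta_j * tanh(w_j * x + b_j) by linear least squares."""
    rng = np.random.default_rng(seed)
    # Hidden weights are drawn once and never trained (illustrative assumption).
    w = rng.uniform(-4.0, 4.0, n_features)
    b = rng.uniform(-4.0, 4.0, n_features)

    def phi(x):
        # Feature matrix, shape (len(x), n_features).
        return np.tanh(np.outer(x, w) + b)

    def phi_xx(x):
        # Second x-derivative of each feature: w^2 * tanh''(z),
        # with tanh''(z) = -2 * tanh(z) * (1 - tanh(z)^2).
        t = np.tanh(np.outer(x, w) + b)
        return (w ** 2) * (-2.0 * t * (1.0 - t ** 2))

    x_in = np.linspace(0.0, 1.0, n_colloc)   # collocation points
    x_bc = np.array([0.0, 1.0])              # boundary points

    # Stack the PDE rows (-u'' = f) and the boundary rows (u = 0) into one system.
    A = np.vstack([-phi_xx(x_in), phi(x_bc)])
    rhs = np.concatenate([np.pi ** 2 * np.sin(np.pi * x_in), np.zeros(2)])

    beta, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return lambda x: phi(np.asarray(x, dtype=float)) @ beta

u = solve_poisson_random_features()
x_test = np.linspace(0.0, 1.0, 101)
max_err = float(np.max(np.abs(u(x_test) - np.sin(np.pi * x_test))))
print(f"max abs error: {max_err:.2e}")
```

For a nonlinear PDE the residual is no longer linear in the output weights, so a sketch like this would need an outer iteration (for example, repeated linearization); the abstract does not say which scheme the authors use.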

2606.19850 2026-06-19 cs.LG cs.AI New submission

Neural Additive and Basis Models with Feature Selection and Interactions

Yasutoshi Kishimoto, Kota Yamanishi, Takuya Matsuda, Shinichi Shirakawa

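Affiliations: Yokohama National University

AI summary: Introduces a feature selection mechanism into neural additive models and neural basis models; a feature selection layer reduces computational overhead and supports learning feature interactions on high-dimensional data, with performance better than or comparable to existing GAM methods.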

Comments Accepted at PAKDD 2024. Code is available at https://github.com/shiralab/NAM-FS

DOI: 10.1007/978-981-97-2259-4_1

Abstract

Deep neural networks (DNNs) exhibit attractive performance in various fields but often suffer from low interpretability. The neural additive model (NAM) and its variant called the neural basis model (NBM) use neural networks (NNs) as nonlinear shape functions in generalized additive models (GAMs). Both models are highly interpretable and exhibit good performance and flexibility for NN training. NAM and NBM can provide and visualize the contribution of each feature to the prediction owing to GAM-based architectures. However, when using two-input NNs to consider feature interactions or when applying them to high-dimensional datasets, training NAM and NBM becomes intractable due to the increase in the computational resources required. This paper proposes incorporating the feature selection mechanism into NAM and NBM to resolve computational bottlenecks. We introduce the feature selection layer in both models and update the selection weights during training. Our method is simple and can reduce computational costs and model sizes compared to vanilla NAM and NBM. In addition, it enables us to use two-input NNs even in high-dimensional datasets and capture feature interactions. We demonstrate that the proposed models are computationally efficient compared to vanilla NAM and NBM, and they exhibit better or comparable performance with state-of-the-art GAMs.

2606.19853 2026-06-19 cs.LG physics.comp-ph New submission

Physics-Informed Neural Network with Squeeze-Excitation-like Attention

Yun-Fei Song, Long-Gang Pang, Fu-Peng Li, Jun-Jie Zhang

Affiliations: Key Laboratory of Quark and Lepton Physics (MOE) & Institute of Particle Physics, Central China Normal University; Artificial Intelligence and Computational Physics Research Center, Central China Normal University; Key Laboratory of Nuclear Physics and Ion-beam Application (MOE) & Institute of Modern Physics, Fudan University; Shanghai Research Center for Theoretical Nuclear Physics, NSFC and Fudan University; Northwest Institute of Nuclear Technology

AI summary: Proposes the SEA-PINN architecture, which uses a Squeeze-Excitation-like attention mechanism to dynamically adjust neuron importance and yields stable initialization; it shows very small variance on 17 of 20 benchmark problems, reaches accuracy comparable to TSA-PINN without Fourier embeddings or periodic activations, and can serve as a lightweight plug-in to improve other PINNs.

Comments 15 pages, 6 figures

Abstract

We introduce SEA-PINN, a novel architecture that incorporates a Squeeze-Excitation-like attention mechanism into physics-informed neural networks to dynamically recalibrate the importance of neurons across layers. A key feature of SEA-PINN is its highly stable initialization. On 17 out of 20 benchmark problems, SEA-PINN exhibits nearly negligible variance and significantly reduced initial loss, establishing a quasi-deterministic and favorable starting point for optimization. Notably, without employing Fourier feature embeddings or periodic activation functions, SEA-PINN attained accuracy competitive with TSA-PINN, a model specifically engineered for high-frequency problems via learnable frequencies in sinusoidal activations (83% vs. 90% improvement relative to FNN-PINN on the high-frequency case 7). Furthermore, integrating SEA-PINN into TSA-PINN boosted performance by 42.49%. These results underscore SEA-PINN as a lightweight plug-in module that enhances nonlinear representation power, promotes more robust and efficient convergence, and strengthens the overall reliability of physics-informed learning.

2606.19941 2026-06-19 cs.LG New submission

Compositionality Emerges in a Narrow Depth-Connectivity Regime: Architecture Constraints and Solution Manifolds

Dat H. Do, Rushi Shah, Duc V. Le, Dianbo Liu

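Affiliations: National University of Singapore; University of Twente

AI summary: Finds that compositionality emerges only in specific sparse networks and within a specific depth range, proposes similarity-based pruning and a depth predictor, and explains the cause with a theoretical framework.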

Abstract

Compositionality is believed to be the foundation for generalization, enabling models to reuse meaningful primitives in novel combinations. Yet, models trained with standard gradient-based optimization rarely, and often only weakly, exhibit compositional internal structure, and it remains unclear how or why such compositionality forms. In this work, we show that compositionality emerges in a narrow connectivity-depth sweet spot. Along the connectivity axis, compositionality appears only in certain sparse networks and depends heavily on which connections remain rather than on weight sparsity alone. Along the depth axis, compositionality emerges within a narrow, target-dependent regime, peaking at specific depths, while both shallower and deeper networks fail. When either the depth or connectivity condition is violated, gradient descent silently converges to fractured solutions rather than compositional ones. To discover and exploit this emergence, we introduce (i) similarity-based pruning (SP) to recover compositional connectivity and (ii) a heuristic depth predictor to estimate where compositionality is most likely to appear. Finally, we support these empirical findings with a theoretical framework based on compositional sparsity, volume-ratio arguments, and feature-interference bounds, explaining why compositional solutions are reachable only in a narrow depth-connectivity regime.

2606.19984 2026-06-19 cs.LG New submission

Neural Architectures as Functional Priors in Physics-Informed Control Problems

Sonia Rubio Herranz, Fernando Carlos López Hernández, Antonio López Montes

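AI summary: Studies the role of neural architectures as implicit functional priors in control problems governed by ordinary differential equations; different architectures (MLP and Fourier KAN) produce qualitatively different controls under identical conditions, showing a functional specialization phenomenon.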

Comments 17 pages, 6 figures. Physics-informed neural networks, optimal control, spectral bias, Kolmogorov-Arnold Networks

Abstract

In this work we investigate the role of neural architectures as implicit functional priors in control problems governed by ordinary differential equations. Rather than focusing on highly complex problems, our objective is to investigate architecture-dependent effects in controlled dynamical systems within the simplest physically interpretable settings possible. In particular, we study a controlled linear RLC electrical circuit and a nonlinear Duffing-type dynamical system. Both systems are analyzed first through classical optimal-control formulations and later through PINN-based approaches. We compare different combinations of multilayer perceptrons (MLPs) and Fourier-based KAN-like architectures, and analyze their influence on the resulting controls. The numerical experiments suggest that different architectural choices systematically generate qualitatively distinct controls, even under identical governing equations, loss functionals, initial and target states, training parameters and physical constraints. Significant differences appear in the spectral structure, smoothness, energy distribution, and phase-space behavior of the learned solutions. A central observation of this work is the emergence of a functional specialization phenomenon when the neural architectures are allowed sufficient freedom to shape the structure of the learned controls. More specifically, in the systems considered here, Fourier-based architectures tend to produce trajectories with richer oscillatory content, whereas smoother low-frequency-biased architectures tend to generate more regular and energetically efficient controls. This suggests that different functional components of the control problem may be handled more efficiently by different neural architectures, leading to an implicit specialization between state representation and control generation.

2606.19538 2026-06-19 cs.AI cs.LG Cross-list

ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence

Ashim Dhor, Rasel Mondal, Pin Yu Chen

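Affiliations: Indian Institute of Science Education and Research Bhopal; IBM Research

AI summary: Proposes ITNet, a learnable integral transform network that unifies convolution, attention, and recurrent architectures through a kernel defined jointly over positions and features, achieving strong performance across modalities.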

Abstract

Convolutional networks, recurrent networks, and transformers each encode different inductive biases (locality, sequential memory, and content-dependent pairwise interaction) and have remained mathematically distinct since their inception. We show that this fragmentation reflects not a fundamental diversity in how signals should be processed, but rather incomplete views of a single underlying mathematical object: a learnable integral transform. We introduce the Integral Transform Network (ITNet), a unified architecture built around a learnable kernel that depends jointly on positions and features. This kernel is implemented as a small neural network, specifically an MLP, that models pairwise interactions, enabling the model to adapt its behavior from data. We show that convolution, self-attention (including multi-head), and autoregressive recurrence (including LSTM, GRU, S4, and Mamba) arise as special cases under appropriate parameterizations, and that ITNet is a universal approximator of continuous operators. To make this practical, we develop tiled kernel fusion, importance-weighted Monte Carlo integration, and learned low-rank factorization, enabling efficient and scalable computation. A single ITNet architecture with a shared operator and lightweight modality-specific encoders matches or exceeds specialized baselines on ImageNet-1K, GLUE, ModelNet40, VQA v2, and NLVR2. The results demonstrate that a single learned interaction mechanism can recover the behavior of all three architectural families from data.

2606.19617 2026-06-19 cs.CV cs.GR cs.LG Cross-list

GB-LSR: A Fast Local Spectral Image Representation with a Single Global Bandwidth for Continuous Reconstruction and Super-Resolution

Max Shad, Naeem Khoshnevis

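Affiliations: Harvard University

AI summary: Proposes GB-LSR, a local spectral representation based on a single global bandwidth, in which a shared convolutional encoder predicts truncated Fourier basis coefficients for continuous image reconstruction; PSNR improves by 2.8-3.6 dB on benchmarks such as Kodak, and inference is about 4x faster than the slowest baseline.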

PU-UNet: Stable Multiplicative Interactions for Medical Image Segmentation

Ziyuan Li, Osamah Sufyan, Uwe Jaekel, Babette Dellen

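Affiliations: Department of Mathematics, Informatics and Technology, University of Applied Sciences Koblenz; Technical University of Munich

AI summary: Proposes PU-UNet, which uses stabilized product-unit residual blocks to realize explicit multiplicative feature interactions at low-resolution stages, improving Dice and IoU on three medical image segmentation datasets and lowering the false-positive rate.

Comments Accepted to ICANN 2026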

Abstract

Many dense prediction networks rely on additive feature transformations and model higher-order feature interactions only implicitly. Product units provide an explicit mechanism for multiplicative feature modeling, but their logarithmic--exponential formulation can cause numerical instability, which has limited their use in deep dense prediction networks. In this work, we propose Product-Unit U-Net (PU-UNet), a residual U-Net that integrates stable product-unit residual blocks into rich low-resolution stages for medical image segmentation. The proposed formulation combines smooth positivity mapping with log-domain clipping, enabling stable multiplicative feature learning with negligible computational overhead. On ISIC 2018, Kvasir-SEG, and BUSI, PU-UNet achieves Dice scores of 0.942, 0.959, and up to 0.925, respectively. Compared with a matched Residual U-Net baseline, PU-UNet consistently improves Dice and IoU while keeping parameters, FLOPs, and inference latency nearly unchanged, and reduces the image-level false-positive rate on normal BUSI cases from 0.077 to zero. Ablation studies suggest that the gains are associated with product-unit interactions, are strongest under low-resolution placement, and benefit from the proposed stabilization design. These results suggest that stable product-unit residual learning can be an effective way to enhance U-Net-style segmentation networks with explicit multiplicative interactions.
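A product unit computes a weighted product of its inputs as exp(Σ wᵢ log xᵢ), which breaks down for non-positive inputs and can overflow. The sketch below shows one way to combine a smooth positivity mapping with log-domain clipping, as the abstract describes at a high level. The specific choices here (softplus as the positivity mapping, the `clip` bound of 10, the `eps` offset, and the function name `stable_product_unit`) are illustrative assumptions, not the paper's formulation.

```python
import numpy as np

def stable_product_unit(x, weights, clip=10.0, eps=1e-6):
    """Weighted product prod_i pos(x_i)^w_i, evaluated in the log domain.

    x:       array of shape (batch, n_inputs), any real values
    weights: array of shape (n_inputs,), the exponents
    """
    # Smooth positivity mapping: softplus(x) = log(1 + exp(x)) > 0 (assumed choice).
    positive = np.logaddexp(0.0, x) + eps
    # A product of powers becomes a weighted sum of logs.
    log_product = np.log(positive) @ weights
    # Clip in the log domain so the exponential cannot overflow or vanish.
    return np.exp(np.clip(log_product, -clip, clip))

features = np.array([[1.0, 2.0], [-3.0, 0.5]])
print(stable_product_unit(features, np.array([1.0, 1.0])))
```

Clipping bounds the output to the range [exp(-clip), exp(clip)], which trades some expressiveness for numerical safety.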

2402.14035 2026-06-19 cs.LG cs.AI Replacement

Wisdom of Committee: Diverse Distillation from Large Foundation Models and Domain Experts

Zichang Liu, Qingyun Liu, Yuening Li, Liang Liu, Anshumali Shrivastava, Shuchao Bi, Lichan Hong, Ed H. Chi, Zhe Zhao

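Affiliations: Rice University; Google DeepMind; Google Inc; University of California, Davis

AI summary: To address the large gaps in capacity, architecture, and modality when distilling foundation models into compact domain models, proposes the DiverseDistill framework, which uses a learnable question-answer mechanism and aligns heterogeneous teacher outputs, recovering 73-114% of the performance gap on recommendation and vision tasks.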

Comments Accepted at the 1st Workshop on Resource-Efficient Learning and Knowledge Discovery (RelKD), KDD 2026

Journal ref Proceedings of the RelKD Workshop at KDD 2026

Abstract

Knowledge distillation from foundation models to compact domain models is challenging due to substantial gaps in capacity, architecture, and modality. For example, in our experiments, distilling from a 76M-parameter language model to a 2M-parameter recommender closes less than 40% of the performance gap between the undistilled student and the teacher. We show that introducing domain-specific experts -- which share the student's architectural characteristics -- alongside the foundation model as a diverse teacher committee significantly improves transfer. However, standard multi-teacher methods fail to exploit this diversity: naively combining heterogeneous teachers can degrade performance below single-teacher distillation. To address this, we propose DiverseDistill, an interactive distillation framework that employs a learnable Question-Answer mechanism to generate teacher-conditioned queries and align heterogeneous teacher outputs into the student's representation space. Unlike methods requiring gradient-based co-optimization or architectural modification of teachers, DiverseDistill operates with frozen teachers using only forward-pass inference through their intermediate layers: no parameter updates, no co-training, and no architectural surgery. A dynamic teacher importance mechanism further reduces training cost by filtering low-relevance teachers per sample (e.g., ~30% fewer forward passes with no quality loss for recommendation tasks), while the entire Distillation Module is discarded after training, adding zero inference overhead. Evaluations on recommendation (38x compression) and vision (3.6x compression) tasks demonstrate that DiverseDistill recovers 73-114% of the teacher-student performance gap, consistently outperforming all single- and multi-teacher baselines.

2501.18322 2026-06-19 cs.LG math.AP Replacement

A Unified Perspective on the Dynamics of Deep Transformers

Valérie Castin, Pierre Ablin, José Antonio Carrillo, Gabriel Peyré

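Affiliations: CNRS and Ecole Normale Supérieure PSL; Apple; Mathematical Institute, University of Oxford

AI summary: Proposes the Transformer PDE as the mean-field limit of iterated attention layers, proves its well-posedness, and analyzes anisotropy evolution and clustering for Gaussian initial data.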

Abstract

Transformers, which are state-of-the-art in most machine learning tasks, represent the data as sequences of vectors called tokens. This representation is then exploited by the attention function, which learns dependencies between tokens and is key to the success of Transformers. However, the iterative application of attention across layers induces complex dynamics that remain to be fully understood. To analyze these dynamics, we identify each input sequence with a probability measure and model its evolution as a Vlasov equation called Transformer PDE, whose velocity field is non-linear in the probability measure. Our first set of contributions focuses on compactly supported initial data. We show the Transformer PDE is well-posed and is the mean-field limit of an interacting particle system, thus generalizing and extending previous analysis to several variants of self-attention: multi-head attention, L2 attention, Sinkhorn attention, Sigmoid attention, and masked attention--leveraging a conditional Wasserstein framework. In a second set of contributions, we are the first to study non-compactly supported initial conditions, by focusing on Gaussian initial data. Again for different types of attention, we show that the Transformer PDE preserves the space of Gaussian measures, which allows us to analyze the Gaussian case theoretically and numerically to identify typical behaviors. This Gaussian analysis captures the evolution of data anisotropy through a deep Transformer. In particular, we highlight a clustering phenomenon that parallels previous results in the non-normalized discrete case.

2511.04514 2026-06-19 cs.LG Replacement

Linear Mode Connectivity under Data Shifts for Deep Ensembles of Image Classifiers

C. Hepburn, T. Zielke, A. P. Raulf

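Affiliations: Institute for AI Safety & Security

AI summary: Experimentally studies the conditions for linear mode connectivity (LMC) under data shifts, finds that small learning rates and large batch sizes mitigate the impact of shifts, and highlights the trade-off LMC strikes between training efficiency and ensemble diversity.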

Comments 17 pages, 22 figures

Abstract

The phenomenon of linear mode connectivity (LMC) links several aspects of deep learning, including training stability under noisy stochastic gradients, the smoothness and generalization of local minima (basins), the similarity and functional diversity of sampled models, and architectural effects on data processing. In this work, we experimentally study LMC under data shifts and identify conditions that mitigate their impact. We interpret data shifts as an additional source of stochastic gradient noise, which can be reduced through small learning rates and large batch sizes. These parameters influence whether models converge to the same local minimum or to regions of the loss landscape with varying smoothness and generalization. Although models sampled via LMC tend to make similar errors more frequently than those converging to different basins, the benefit of LMC lies in balancing training efficiency against the gains achieved from larger, more diverse ensembles. Code and supplementary materials are available at https://github.com/DLR-KI/LMC. This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible.

2602.09689 2026-06-19 cs.LG Replacement

Model soups need only one ingredient

Alireza Abdollahpoorrostam, Nikolaos Dimitriadis, Adam Hazimeh, Pascal Frossard

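Affiliations: EPFL; EPFL LTS4

AI summary: Proposes MonoSoup, which applies SVD to the layer updates of a single checkpoint and automatically reweights the components using entropy-based effective rank, achieving a strong in-distribution/out-of-distribution balance without multiple checkpoints.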

Abstract

Fine-tuning large pre-trained models on a target distribution often improves in-distribution (ID) accuracy, but at the cost of out-of-distribution (OOD) robustness as representations specialize to the fine-tuning data. Weight-space ensembling methods, such as Model Soups, mitigate this effect by averaging multiple checkpoints, but they are computationally prohibitive, requiring the training and storage of dozens of fine-tuned models. In this paper, we introduce MonoSoup, a simple, data-free, hyperparameter-free, post-hoc method that achieves a strong ID-OOD balance using only a single checkpoint. Our method applies Singular Value Decomposition (SVD) to each layer's update and decomposes it into high-energy directions that capture task-specific adaptation and low-energy directions that introduce noise but may still encode residual signals useful for robustness. MonoSoup then uses entropy-based effective rank to automatically re-weigh these components with layer-wise coefficients that account for the spectral and geometric structure of the model. Experiments on CLIP models fine-tuned on ImageNet and evaluated under natural distribution shifts, as well as on Qwen language models tested on mathematical reasoning and multiple-choice benchmarks, show that this plug-and-play approach is a practical and effective alternative to multi-checkpoint methods, retaining much of their benefits without their computational overhead.
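The entropy-based effective rank mentioned in the abstract can be sketched for a single layer update. The version below assumes the common definition, the exponential of the Shannon entropy of the normalized singular values; the abstract does not state which definition the authors use, nor how the resulting value is turned into layer-wise coefficients, so the function name `effective_rank` and the example matrices are purely illustrative.

```python
import numpy as np

def effective_rank(matrix, eps=1e-12):
    """exp(H(p)) where p_i = sigma_i / sum(sigma) over the singular values."""
    singular_values = np.linalg.svd(matrix, compute_uv=False)
    p = singular_values / (singular_values.sum() + eps)
    p = p[p > eps]  # drop numerically zero directions so log(p) stays finite
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
w_pretrained = rng.normal(size=(64, 32))            # stand-in for a pretrained layer
delta = np.outer(rng.normal(size=64), rng.normal(size=32))  # rank-1 "fine-tuning" update
w_finetuned = w_pretrained + delta

# Effective rank of the layer update (fine-tuned minus pretrained weights).
print(effective_rank(w_finetuned - w_pretrained))
```

A value close to 1 means the update is concentrated in one direction; a value close to the full dimension means its energy is spread evenly across directions.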

2604.15838 2026-06-19 cs.LG Replacement

Reversible Residual Normalization Alleviates Spatio-Temporal Distribution Shift

Zhaobo Hu, Vincent Gauthier, Mehdi Naima

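Affiliations: CNRS, LIP6, Sorbonne Université

AI summary: To address spatio-temporal distribution shift, proposes a reversible residual normalization framework that handles shift in both spatial and temporal dimensions through spatially aware invertible transformations, combining graph convolution with spectrally constrained graph neural networks for adaptive normalization.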

Abstract

Distribution shift severely degrades the performance of deep forecasting models. While this issue is well-studied for individual time series, it remains a significant challenge in the spatio-temporal domain. Effective solutions like instance normalization and its variants can mitigate temporal shifts by standardizing statistics. However, distribution shift on a graph is far more complex, involving not only the drift of individual node series but also heterogeneity across the spatial network where different nodes exhibit distinct statistical properties. To tackle this problem, we propose Reversible Residual Normalization (RRN), a novel framework that performs spatially-aware invertible transformations to address distribution shift in both spatial and temporal dimensions. Our approach integrates graph convolutional operations within invertible residual blocks, enabling adaptive normalization that respects the underlying graph structure while maintaining reversibility. By combining Center Normalization with spectral-constrained graph neural networks, our method captures and normalizes complex Spatio-Temporal relationships in a data-driven manner. The bidirectional nature of our framework allows models to learn in a normalized latent space and recover original distributional properties through inverse transformation, offering a robust and model-agnostic solution for forecasting on dynamic spatio-temporal systems.
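The instance-normalization baseline that the abstract contrasts with RRN can be sketched as a normalize/denormalize pair. This is only the per-series baseline: each node's series is standardized with its own statistics and the statistics are kept so the transform can be inverted after forecasting. It does not include the graph convolutions or invertible residual blocks of RRN, and the function names and the (nodes, time) array layout are assumptions made for illustration.

```python
import numpy as np

def normalize(x, eps=1e-5):
    """Standardize each series along the time axis; return stats for inversion.

    x: array of shape (n_nodes, n_timesteps)
    """
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True) + eps
    return (x - mean) / std, (mean, std)

def denormalize(y, stats):
    """Invert `normalize` using the stored per-series statistics."""
    mean, std = stats
    return y * std + mean

rng = np.random.default_rng(0)
# Three nodes with very different levels and scales (spatial heterogeneity).
series = rng.normal(size=(3, 48)) * np.array([[1.0], [10.0], [100.0]]) + np.array([[0.0], [50.0], [-200.0]])

normalized, stats = normalize(series)
restored = denormalize(normalized, stats)
```

Because every node is normalized independently, this baseline ignores relationships between nodes, which is the gap the abstract says RRN is designed to address.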

2605.09609 2026-06-19 cs.LG math.AG Replacement

Minimal Filling Architectures of Polynomial Neural Networks: Counterexamples, Frontier Search, and Defects

Kevin Dao, Jose Israel Rodriguez

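Affiliations: Department of Mathematics, University of Wisconsin-Madison, Wisconsin, USA

AI summary: Uses frontier search and symbolic computation to verify counterexamples to the minimal unimodality conjecture for polynomial neural networks, and shows that some subarchitectures have large defects, in contrast to the small defects observed previously.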

2605.30456 2026-06-19 cs.LG math.OC Replacement

DisjunctiveNet: Neural Symbolic Learning via Differentiable Convexified Optimization Layers

Shraman Pal, Can Li

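Affiliations: Davidson School of Chemical Engineering, Purdue University, West Lafayette, USA

AI summary: For settings with sparse data but rich domain knowledge, proposes the DisjunctiveNet framework, which embeds disjunctive constraints into neural networks through differentiable convex optimization layers, achieving hard constraint satisfaction together with strong predictive performance.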

Comments ICML 2026

Abstract

Many learning tasks in science and engineering are characterized by sparse datasets, which limits the effectiveness of purely data-driven approaches. At the same time, these problems are often accompanied by rich domain knowledge derived from physical laws, operational requirements, and expert heuristics. Such knowledge is frequently expressed as rules involving logical propositions and linear inequalities. Existing neuro-symbolic methods typically enforce these rules approximately through soft penalties, assume input-independent rules when designing specialized architectures, or rely on non-differentiable post-processing at inference time to achieve hard constraint satisfaction. While recent advances in differentiable optimization layers enable end-to-end feasibility enforcement within neural networks, extending these approaches to logical or mixed-integer rules remains challenging due to inherent nonconvexity. In this work, we propose a unified end-to-end framework for enforcing hard, input-dependent mixed integer linear constraints within neural networks. Our approach represents rules as disjunctive constraints and applies hierarchical convex relaxations to obtain convex hull formulations. These relaxations yield tractable linear constraints that can be embedded as differentiable optimization layers while enabling exact rule satisfaction. We demonstrate the effectiveness of the proposed framework on real-world datasets, achieving perfect rule satisfaction and strong predictive performance.

2606.16575 2026-06-19 cs.LG math-ph math.MP Replacement

RepNN: Tackling spectral bias in deep neural networks via parameter reparameterization

Yong Wang, Tao Zhou, Xuhui Meng

发表机构 * Institute of Interdisciplinary Research for Mathematics and Applied Science, School of Mathematics and Statistics, Huazhong University of Science and Technology（华中科技大学数学与统计学院交叉科学与应用数学研究所）； Institute of Computational Mathematics, Academy of Mathematics and Systems Science, Chinese Academy of Sciences（中国科学院数学与系统科学研究院计算数学研究所）

AI总结针对深度神经网络在捕捉振荡和多尺度行为时的谱偏差问题，提出RepNet模型，通过重参数化第一隐藏层的权重和偏置，有效控制初始斜率尺度和分区点分布，实现自适应频率缩放，在函数逼近、PDE求解和算子学习中显著提升精度。

详情

AI中文摘要

深度神经网络（DNN）在科学计算中取得了显著成功，但在捕捉振荡和多尺度行为时常常受到谱偏差的影响。在本研究中，我们通过考察浅层ReLU神经网络在高频函数拟合中的失败来探究这一局限性。这一观察识别出解决快速振荡的两个重要因素：初始斜率尺度和网络诱导的分区点分布。受此分析启发，我们提出了RepNet，一种针对ReLU和tanh网络的重参数化DNN模型，专为高频和多尺度问题设计。关键思想是重参数化第一隐藏层的权重和偏置，从而能够有效控制初始斜率尺度并提供合适的初始分区点分布。此外，将重参数化的权重和偏置视为可训练参数，使得DNN在训练过程中实现自适应频率缩放。我们还推导了重参数化DNN的输出和斜率幅度的定量估计，以指导所提方法的初始化。数值实验，包括多尺度一维和四维函数逼近、结合物理信息神经网络（PINN）的正向和逆向PDE问题以及算子学习，表明RepNet在略微增加计算成本的情况下，提高了普通DNN在捕捉高度振荡特征时的预测精度。这些结果表明，RepNet为克服谱偏差并将DNN应用于多尺度问题提供了一种有效且灵活的方法。

英文摘要

Deep neural networks (DNNs) have achieved remarkable success in scientific computing, yet they often suffer from spectral bias in capturing oscillatory and multiscale behaviors. In this study, we investigate this limitation by examining the failure of shallow ReLU neural networks in fitting high-frequency functions. This observation identifies two important factors in resolving rapid oscillations: the initial slope scale and the distribution of partition points induced by the networks. Motivated by this analysis, we propose RepNN, a reparameterized neural network model with ReLU or tanh activation designed for high-frequency and multiscale problems. The key idea is to reparameterize the weights and biases in the first hidden layer, which enables effective control of the initial slope scale and provides an appropriate distribution of the initial partition points. Furthermore, treating the reparameterized weights and biases as trainable parameters allows the DNN to achieve adaptive frequency scaling during training. In addition, we derive quantitative estimates for the output and slope magnitudes of the reparameterized DNN to guide the initialization of the proposed method. Numerical experiments, including multiscale one- and four-dimensional function approximations, forward and inverse PDE problems in combination with physics-informed neural networks (PINNs), and operator learning for an earthquake problem using real data, demonstrate that RepNN improves the predictive accuracy of vanilla DNNs in capturing highly oscillatory features at a slight additional computational cost. These results indicate that RepNN provides an effective and flexible approach for overcoming spectral bias and applying DNNs to multiscale problems.
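To make "partition points" and "slope scale" concrete, here is a minimal sketch for a shallow one-dimensional ReLU layer. The function names and the particular reparameterization (`scale`, `centers`) are hypothetical illustrations, not the scheme defined in the paper.

```python
def partition_points(weights, biases):
    """Kinks of x -> sum_i a_i * relu(w_i * x + b_i) sit at x = -b_i / w_i."""
    return sorted(-b / w for w, b in zip(weights, biases) if w != 0.0)


def reparameterize(scale, centers):
    """Hypothetical reparameterization: choose the slope scale and kink locations directly."""
    weights = [scale for _ in centers]
    biases = [-scale * c for c in centers]
    return weights, biases


centers = [i / 8 for i in range(1, 8)]   # evenly spaced kinks in (0, 1)
w, b = reparameterize(scale=16.0, centers=centers)
points = partition_points(w, b)          # recovers the chosen kink locations
```

Under this assumed parameterization the slope scale and the kink distribution can be set independently at initialization, which is the kind of control the abstract describes.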

2606.17832 2026-06-19 cs.LG (updated version)

From Drift to Coherence: Stabilizing Beliefs in LLMs

SongEun Kim, Seungyoo Lee, Edwin Fong, Hyungi Lee, Juho Lee

Affiliations: Department of Statistics, Seoul National University; Korea Advanced Institute of Science & Technology; Department of AI, Kookmin University; University of Hong Kong

AI summary: Studies belief drift in LLMs on multiple-choice question answering and introduces prompted predictive resampling (PPR). It finds that the belief process self-stabilizes and converges, and proposes a seed-answer prompting strategy and a self-consistency loss to accelerate stabilization and improve predictive coherence.

Abstract

Large language models (LLMs) are often hypothesized to perform implicit Bayesian inference, yet a key coherence condition, the martingale property of predictive beliefs, has been shown to fail in controlled synthetic in-context learning settings. We revisit this question in a more typical usage regime: generic multiple-choice question answering. Exploiting the discrete answer space, we compute exact predictive distributions and study belief dynamics induced by autoregressive answer resampling. We introduce prompted predictive resampling (PPR), where an LLM generates a sequence of answers to the same question. Empirically, PPR reveals early-stage belief drift, indicating martingale violations. However, after sufficient resampling steps, the belief process self-stabilizes and converges to a coherent predictive distribution. Based on this observation, we further propose (i) a seed-answer prompting strategy to accelerate stabilization, and (ii) a self-consistency loss that amortizes early-stage drift into the model via fine-tuning. Experiments on multiple-choice QA benchmarks show that our methods substantially reduce belief drift and improve predictive coherence without sacrificing accuracy.
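The martingale property says that the expected next-step predictive distribution, averaged over an answer drawn from the current one, equals the current predictive distribution. The sketch below checks this for an exactly Bayesian Dirichlet-categorical learner, used here only as a stand-in for a coherent model; it is not an LLM and not the paper's PPR procedure, and the pseudo-counts are made up.

```python
def predictive(counts):
    """Posterior predictive over answer options from Dirichlet pseudo-counts."""
    total = sum(counts)
    return [c / total for c in counts]


def expected_next_predictive(counts):
    """Average the one-step-ahead predictive over an answer drawn from the current one."""
    p_now = predictive(counts)
    expected = [0.0] * len(counts)
    for y, p_y in enumerate(p_now):
        updated = list(counts)
        updated[y] += 1.0              # resample answer y and condition on it
        p_next = predictive(updated)
        for k in range(len(counts)):
            expected[k] += p_y * p_next[k]
    return expected


counts = [2.0, 1.0, 0.5, 0.5]          # assumed pseudo-counts over four options
drift = max(
    abs(a - b) for a, b in zip(predictive(counts), expected_next_predictive(counts))
)
```

For this learner `drift` is zero up to rounding; a nonzero value measured on a real model would indicate the kind of belief drift the abstract reports.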

2606.17886 2026-06-19 cs.LG (updated version)

Monotonic Kolmogorov-Arnold Networks: A Theoretical and Empirical Study of Monotonicity as an Inductive Bias

Mikhail Krasnov, Blaž Bertalanič, Carolina Fortuna

Affiliations: Jozef Stefan Institute

AI summary: Proposes MKAN, which achieves hard monotonicity through exponential reparameterization of B-spline coefficients, positive edge weights, and a monotone base activation. The theory shows that feature extractors meeting its assumptions can be made monotone with a bounded encoder size, and experiments show that MKAN is competitive with state-of-the-art monotone networks while retaining KAN's per-edge functional transparency.

Abstract

Monotonicity has been a long-running architectural inductive bias for neural networks, motivated by tabular, scientific, and economic settings where outputs are known to respond monotonically to certain inputs. Existing approaches are MLP- or flow-based and lack per-edge functional transparency; the only Kolmogorov-Arnold Network (KAN) variant with monotonicity, MonoKAN, enforces the constraint only on a restricted parameter subset and requires a projection-style training procedure. We close this gap with MKAN, a KAN with hard monotonicity guaranteed for all parameter values via exponential reparameterization of B-spline coefficients, positive edge weights, and a monotone base activation. Training reduces to standard unconstrained gradient descent. Our headline theoretical contribution is a representation-cost theorem: any $C^K, K >0$ feature extractor inducing a ball-shaped semantic-neighborhood partition admits a monotone realization of the equivalent neighborhood structure at $N' = N^* + k \le 2N^*$, where $k$ is the number of non-monotone coordinates of the original. The bound is architecture-agnostic and gives a principled sizing rule for monotone encoders. Empirically, MKAN is competitive with state-of-the-art monotone NNs on the SMM/ICML-2024 benchmark while being the only method that combines hard unconstrained monotonicity with KAN's per-edge functional transparency; the $2N^*$ prediction is validated in a self-supervised feature-size sweep on four real datasets, and on a controlled monotone-generative dataset MKAN recovers ground-truth factors with substantially higher Spearman alignment than KAN, MLP, and linear baselines.
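The idea of making spline coefficients monotone by construction can be sketched as follows. This is an illustration under simplifying assumptions, not the MKAN implementation: it uses a degree-1 B-spline (piecewise-linear interpolation) on uniform knots over [0, 1], and the names `monotone_coeffs` and `linear_spline` are hypothetical.

```python
import math


def monotone_coeffs(offset, thetas):
    """Unconstrained thetas -> strictly increasing coefficients via exp increments."""
    coeffs = [offset]
    for t in thetas:
        coeffs.append(coeffs[-1] + math.exp(t))
    return coeffs


def linear_spline(x, coeffs):
    """Degree-1 B-spline on uniform knots over [0, 1]: piecewise-linear interpolation."""
    n = len(coeffs) - 1
    x = min(max(x, 0.0), 1.0)
    i = min(int(x * n), n - 1)
    frac = x * n - i
    return (1.0 - frac) * coeffs[i] + frac * coeffs[i + 1]


coeffs = monotone_coeffs(offset=-1.0, thetas=[-3.0, 0.5, -10.0, 2.0])
values = [linear_spline(k / 100, coeffs) for k in range(101)]
```

Because every increment is `exp(theta) > 0`, the coefficients increase for any real `thetas`, so plain unconstrained gradient descent on `thetas` can never leave the monotone family.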

2601.22300 2026-06-19 physics.optics cond-mat.dis-nn cs.ET cs.LG (updated version)

Toward all-optical unsupervised Hebbian learning in deep photonic neuromorphic networks

Xi Li, Disha Biswas, Peng Zhou, Wesley H. Brigner, Anna Capuano, Joseph S. Friedman, Qing Gu

Affiliations: Department of Electrical and Computer Engineering, North Carolina State University; Department of Electrical and Computer Engineering, The University of Texas at Dallas; Department of Materials Science and Engineering, North Carolina State University; Department of Physics, North Carolina State University

AI summary: Proposes a deep photonic neuromorphic network architecture based on phase-change material synapses and local optical feedback for online unsupervised Hebbian learning; experiments demonstrate adaptive synaptic evolution and optical inference.

Comments: 16 pages, 4 figures

Abstract

We propose a deep photonic neuromorphic network (PNN) architecture based on phase-change material (PCM) synapses and local optical feedback for online, unsupervised Hebbian learning. The proposed architecture combines optical vector-matrix multiplication, non-volatile PCM synaptic weighting, and local coincidence-driven synaptic adaptation within a multilayer photonic crossbar framework compatible with photonic integrated circuits. Unlike conventional PNNs that rely on externally computed gradients, repeated optical-electrical-optical conversions, or global backpropagation, the proposed framework employs local Hebbian learning governed directly by correlated pre- and post-synaptic optical activity. To investigate the feasibility of the proposed learning mechanism, we implemented the PNN design using fiber-optic components, programmable variable optical attenuators, and real-time software control that incorporates PCM thermal dynamics. Supervised and unsupervised learning behaviors were experimentally evaluated under both offline and online learning conditions using representative image-recognition tasks. The experimental results demonstrate adaptive synaptic evolution, successful optical inference, and autonomous pattern encoding through local Hebbian learning under realistic fiber-optic hardware conditions. These results establish a pathway toward future integrated photonic neuromorphic systems capable of scalable and energy-efficient online Hebbian learning.
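A local Hebbian rule of the kind described can be sketched in a few lines. This is a software illustration only: the learning rate, the clipping of weights to [0, 1] (standing in for an optical transmission between fully blocked and fully open), and the function names are assumptions, and no PCM thermal dynamics are modeled.

```python
def forward(weights, pre):
    """Vector-matrix product: one post-synaptic activity per weight row."""
    return [sum(w * x for w, x in zip(row, pre)) for row in weights]


def hebbian_step(weights, pre, post, lr=0.1):
    """Local rule: w_ij grows with coincident pre_j and post_i activity, clipped to [0, 1]."""
    return [
        [min(1.0, max(0.0, w + lr * post_i * pre_j)) for w, pre_j in zip(row, pre)]
        for row, post_i in zip(weights, post)
    ]


weights = [[0.2, 0.2, 0.2], [0.2, 0.2, 0.2]]
pattern = [1.0, 0.0, 1.0]
for _ in range(5):
    post = forward(weights, pattern)
    weights = hebbian_step(weights, pattern, post)
```

Each update uses only the activity at the two ends of a synapse, so no global gradient or backpropagated error is needed; synapses whose input is silent (the middle column here) are left unchanged.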

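2606.11673 2026-06-19 quant-ph cs.LG (updated version)

Higher-Order Token Interactions via Quantum Attention

Jian Xu, Chao Li, Delu Zeng, John Paisley, Qibin Zhao

Affiliations: RIKEN iTHEMS; RIKEN AIP; South China University of Technology; Columbia University

AI summary: Proposes Quantum Higher-order Attention (QHA), which synthesizes token interactions of arbitrary order in a shallow circuit via data re-uploading and non-Clifford entanglers, proves that its expressivity exceeds that of classical self-attention, provides a trainability guarantee, and efficiently detects higher-order interactions in genetic epistasis, learning parity with noise, and graph triangle detection.

Abstract (translated from the AI-generated Chinese abstract)

Standard dot-product self-attention computes only pairwise (second-order) interactions between tokens in a single layer; representing a general $k$-th order interaction is known to require either super-quadratic resources in a single layer or composition through depth. We introduce Quantum Higher-order Attention (QHA), a shallow, hardware-implementable quantum attention head that synthesizes $k$-th order token interactions inside the circuit via data re-uploading and all-to-all non-Clifford entanglers, and exposes them through local single-qubit readout. We prove: (i) an expressivity separation: no single standard self-attention layer with embedding dimension $m$, $H$ heads, and $p$-bit precision satisfying $mHp=o(N/\log\log N)$ can represent a family of $k$-th order correlations that one QHA head represents at circuit depth $O(\log k)$ ($O(k)$ two-qubit gates); (ii) a trainability guarantee for its local-design instance: with local readout and $O(\log n)$ depth, the gradient variance is $\Omega(1/\mathrm{poly}(n))$ (no barren plateau), which we confirm experimentally, while making explicit that the more expressive all-to-all instance we benchmark is trained empirically and shows exponentially decaying gradients. Empirically, with a $6.5\times$ smaller parameter budget, QHA generalizes hidden-subset parity of every order $k\le6$ from disjoint inputs, whereas a larger classical attention head collapses beyond order 2; consistent with the theory, the size of the advantage tracks the Fourier degree of the target: it is largest for parity and shrinks when low-order structure is present. As an application, QHA serves as a compact higher-order interaction detector in three domains (genetic epistasis, learning parity with noise, and graph triangle detection), reaching the noise ceiling at the smallest parameter budget where domain-standard linear methods fail.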

Tracking Representation Dynamics in Large Language Models with Persistent Homology (title translated from Chinese)

Naman Malhotra, Jay Ambadkar, Abhinav Gupta, Kushal Kasivel, Abbas Schwarz, Kamillo Ferry, Anthea Monod

Affiliations: Imperial College London

AI summary: Analyzes the topology of activation spaces with persistent homology and finds that topological reorganization during alignment occurs mainly early in training, and that different alignment objectives produce distinguishable topological trajectories.

Comments: 29 pages

Abstract

Large language models are commonly aligned through supervised fine-tuning, yet little is known about how their internal representations evolve during this process. We study alignment dynamics using persistent homology by tracking the topology of activation spaces throughout fine-tuning. Across four transformer language models ranging from 1B to 7B parameters and three alignment objectives corresponding to helpful, harmless, and mixed training data, we find that the majority of topological reorganization occurs during the earliest stages of training. A dense checkpoint analysis reveals a transient peak in topological activity followed by rapid stabilization. We further show that different alignment objectives induce distinguishable topological trajectories, while instruction-tuned and pretrained models exhibit qualitatively different patterns of evolution. Our results suggest that persistent homology provides a complementary perspective on alignment, revealing representation-level changes that are not apparent from behavioral metrics alone.
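For readers unfamiliar with persistent homology, the simplest case (degree 0, connected components) can be computed with a union-find pass over sorted pairwise distances. The sketch below is a toy illustration on made-up 2-D points, with deaths recorded at the edge length at which two components merge; it is not the paper's pipeline, which operates on high-dimensional activations and may use higher homology degrees.

```python
import itertools
import math


def h0_death_times(points):
    """Death scales of connected components in a Vietoris-Rips filtration (all born at 0)."""
    parent = list(range(len(points)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in itertools.combinations(range(len(points)), 2)
    )
    deaths = []
    for dist, i, j in edges:
        ri, rj = find(i), find(j)
        if ri != rj:                        # two components merge: one dies here
            parent[ri] = rj
            deaths.append(dist)
    return deaths


activations = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
deaths = h0_death_times(activations)        # two tight pairs, then one long merge
```

Comparing such death-time lists across checkpoints is one simple way to quantify how the clustering structure of a point cloud changes during training.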

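2606.19594 2026-06-19 cs.LG (new submission)

Unsupervised Causal Abstractions Discovery

Théo Saulus, Simon Lacoste-Julien, Dhanya Sridhar

Affiliations: Mila - Quebec AI Institute; Université de Montréal; Canada CIFAR AI Chair

AI summary: Proposes a method for learning high-level structural causal models directly from low-level measurement data. Under a low-rank causal discovery assumption, it proves that the latent variables induced by observations of a low-rank graph form a causal abstraction, and gives identifiability results together with a practical learning objective.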

2606.19827 2026-06-19 cs.LG cs.AI (new submission)

When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

Daehwan Kim, Haejun Chung, Ikbeom Jang

Affiliations: Hanyang University; Hankuk University of Foreign Studies

AI summary: Proposes Adaptive Binning, which dynamically refines discretization through a feature-wise coarse-to-fine curriculum and combines categorical reconstruction with ordinal supervision, improving self-supervised learning on medical tabular data.

Comments: Accepted to MICCAI 2026

Abstract

Medical tabular data are ubiquitous in clinical research, but deep learning for tables remains underexplored because reliable labels often require costly expert adjudication, even though structured clinical variables are routinely available in tabular form. Self-supervised learning can leverage these unlabeled tables, and recent binning-based pretexts offer a promising inductive bias, but existing objectives fix a single global quantile discretization and apply feature-agnostic supervision. We propose Adaptive Binning, a training-adaptive discretization pretext for tabular SSL that couples discretization to learning through a feature-wise coarse-to-fine curriculum. Motivated by the spectral bias of neural networks and the principles of curriculum learning, our method progressively refines discretization per feature upon plateau detection and selects representation-aware splits to jointly improve value-space concentration and representation-space coherence. A heterogeneity-aware objective unifies categorical reconstruction with ordinal supervision for numerical features, and experiments on public medical tabular datasets under unified evaluation protocols show consistent gains for linear probing and fine-tuning without dataset-specific discretization tuning. We further introduce a medical tabular SSL benchmark with standardized protocols to support reproducible progress in this underexplored domain. Our code is available at https://github.com/labhai/Adaptive-Binning.
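The coarse-to-fine idea can be illustrated with plain quantile binning at two resolutions. This sketch covers only the fixed-quantile baseline and a doubling of the bin count; the plateau detection and representation-aware split selection described in the abstract are not implemented, and the feature values are made up.

```python
import bisect


def quantile_edges(values, n_bins):
    """Interior cut points putting roughly equal counts in each of n_bins bins."""
    ordered = sorted(values)
    return [ordered[(len(ordered) * k) // n_bins] for k in range(1, n_bins)]


def assign_bins(values, edges):
    """Bin index of each value; a value equal to an edge goes to the upper bin."""
    return [bisect.bisect_right(edges, v) for v in values]


feature = [0.1, 0.4, 0.5, 0.9, 1.3, 2.0, 2.2, 3.5]
coarse = assign_bins(feature, quantile_edges(feature, 2))   # early, easy targets
fine = assign_bins(feature, quantile_edges(feature, 4))     # refined targets
```

Here every fine bin nests inside a coarse bin, so a model trained first on `coarse` targets and later on `fine` targets sees a consistent, progressively harder pretext task.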

2606.19888 2026-06-19 cs.LG cs.AI (new submission)

SL-S4Wave: Self-Supervised Learning of Physiological Waveforms with Structured State Space Models

Feng Wu, Harsh Deep, Eric Lehman, Sanyam Kapoor, Guoshuai Zhao, Rahul Krishnan, Gari Clifford, Li-wei H Lehman

Affiliations: Massachusetts Institute of Technology; OpenEvidence, USA; New York University; Xi’an Jiaotong University; University of Toronto; Emory University

AI summary: Proposes SL-S4Wave, a framework that combines contrastive learning with an encoder built on structured state space models. Global convolution with multiscale subkernels captures local and long-range dependencies in multichannel physiological waveforms, and the method outperforms existing approaches on tasks such as arrhythmia detection.

Abstract

Modeling long-sequence medical time series data, such as electrocardiograms (ECG), poses significant challenges due to high sampling rates, multichannel signal complexity, inherent noise, and limited labeled data. While recent self-supervised learning (SSL) methods, based on various encoder architectures such as convolutional neural networks, have been proposed to learn representations from unlabeled data, they often fall short in capturing long-range dependencies and noise-invariant features. Structured state space models (S4) excel at long-sequence modeling, but existing S4 architectures fail to capture the unique characteristics of multichannel physiological waveforms. In this work, we propose SL-S4Wave, a self-supervised learning framework that combines contrastive learning with a tailored encoder built on structured state space models. The encoder incorporates multi-layer global convolution using multiscale subkernels, enabling the capture of both fine-grained local patterns and long-range temporal dependencies in noisy, high-resolution multichannel waveforms. Extensive experiments on real-world datasets demonstrate that SL-S4Wave (1) consistently outperforms state-of-the-art supervised and self-supervised baselines in a challenging arrhythmia detection task, (2) achieves high performance with significantly fewer labeled examples, showcasing strong label efficiency, (3) maintains robust performance on long waveform segments, highlighting its capacity to model complex temporal dynamics in long sequences that most existing approaches fail to model efficiently, and (4) transfers effectively to unseen arrhythmia types, underscoring its robust cross-domain generalization. We additionally evaluate SL-S4Wave on multiple EEG tasks, achieving superior performance over strong baselines and demonstrating the generalizability of our approach beyond cardiac waveforms.

2606.20167 2026-06-19 cs.LG (new submission)

Multi-Modal Contrastive Learning for Implicit Earth Embeddings via Location Tying

Jonathan Hecht, Lukas Arzoumanidis, Ziyue Li, Youness Dehbi

Affiliations: Computational Methods Lab, HafenCity University Hamburg; Dept. of Operations & Technology, Technical University of Munich; Heilbronn Data Science Center; Munich Data Science Institute; Institute of Geodesy and Geoinformation, University of Bonn

AI summary: Proposes two multimodal contrastive learning architectures, MELT and SALT, which integrate unpaired geospatial data through location tying. They match the strongest two-modality baseline, SATCLIP, on four downstream tasks, but adding modalities does not consistently improve performance, suggesting that the location encoder is the main bottleneck.

Abstract

Spatial prediction tasks are often limited by a lack of high-quality labelled ground-truth observations. To overcome this challenge, self-supervised pre-training is a possible solution, with contrastive learning dominant for location encoders. Those approaches usually align geographic coordinates with just one additional modality. We propose two multimodal contrastive learning architectures: Multimodal Embedding via Location Tying (MELT) and Sequential Alternating Location Training (SALT). These architectures expand this framework beyond two modalities by utilising unpaired geospatial data. Both methods are technically viable and match the performance of the strongest two-modality baseline (SATCLIP) across four downstream tasks. However, increasing the number of modalities does not consistently improve performance, suggesting that the chosen location encoder is the main limitation - the contrastive objective reaches its peak early, regardless of modality diversity or pre-training volume. MELT provides more stable training than SALT and presents a stronger foundation for future scaling.
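The two-modality alignment that these architectures extend is usually trained with an InfoNCE-style contrastive loss. The sketch below is a generic version of that loss, assuming row `i` of a location-embedding batch is paired with row `i` of a second modality; the temperature and the toy embeddings are made up, and this is not the MELT or SALT objective.

```python
import math


def info_nce(anchors, positives, temperature=0.1):
    """Mean InfoNCE loss: row i of anchors should match row i of positives."""

    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]
    loss = 0.0
    for i, ai in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(ai, pj)) / temperature for pj in p]
        log_norm = math.log(sum(math.exp(l) for l in logits))
        loss += log_norm - logits[i]
    return loss / len(a)


location_emb = [[1.0, 0.0], [0.0, 1.0]]
aligned = info_nce(location_emb, [[2.0, 0.0], [0.0, 3.0]])
misaligned = info_nce(location_emb, [[0.0, 3.0], [2.0, 0.0]])
```

The loss is near zero when paired embeddings point the same way and large when pairs are swapped, which is what drives coordinates and the second modality into a shared space.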

2606.19882 2026-06-19 cs.CV cs.LG (cross-list)

Multimodal Concept Bottleneck Models

Tongqing Shi, Ge Yan, Tuomas Oikarinen, Tsui-Wei Weng

Affiliations: UC San Diego

AI summary: Proposes the Multimodal Concept Bottleneck Model (MM-CBM), which uses dual concept bottleneck layers to align image and text embeddings, enabling interpretable zero-shot classification and image retrieval, with an average accuracy improvement of up to 51.26% across four benchmarks.

Comments: Presented at NeurIPS 2025 Mechanistic Interpretability Workshop

Abstract

Concept Bottleneck Models (CBMs) enhance the interpretability of deep learning networks by aligning the features extracted from images with natural concepts. However, existing CBMs are constrained in their ability to generalize beyond a fixed set of predefined classes and the risk of non-concept information leakage, where predictive signals outside the intended concepts are inadvertently exploited. In this paper, we propose Multimodal Concept Bottleneck Model (MM-CBM) to address these issues and extend CBMs into CLIP. MM-CBM utilizes dual Concept Bottleneck Layers (CBLs) to align both the image and text embeddings into interpretable features. This allows us to perform new vision tasks like zero-shot classification or image retrieval in an interpretable way. Compared to existing methods, MM-CBM achieves up to 51.26% accuracy improvement on average across four standard benchmarks. Our method maintains high accuracy, staying within ~5% of black-box performance while offering greater interpretability.

2606.20559 2026-06-19 cs.CV cs.LG (cross-list)

UNIEGO: Proxies as Mediators for Unified Egocentric Video Representation Learning

Wenhao Chi, Arkaprava Sinha, Dominick Reilly, Hieu Le, Srijan Das

AI summary: Proposes UNIEGO, a hierarchical multi-teacher distillation framework in which proxy models translate heterogeneous teacher knowledge into a homogeneous egocentric space and selective proxy distillation adaptively picks reliable supervision; it reaches state-of-the-art results on three egocentric video understanding tasks.

Abstract

Egocentric video understanding is inherently limited by the narrow perspective of wearable cameras: a single viewpoint, a single modality, a single model cannot capture the full richness of human action. We argue that a truly expressive egocentric representation must subsume complementary knowledge across viewpoints, modalities, and foundation model representations, yet remain deployable from egocentric video alone. To this end, we introduce a hierarchical multi-teacher distillation framework that produces UNIEGO, a unified egocentric encoder trained with nine teachers spanning ego-exo viewpoints, RGB, depth, and skeleton modalities, and four foundation models. Rather than distilling directly from heterogeneous teachers whose incompatible architectures and feature geometries induce conflicting gradients, our framework interposes a layer of representation-specific Proxy models that translate diverse teacher knowledge into a homogeneous egocentric space. A second distillation stage, Selective Proxy Distillation (SPD), then adaptively selects, for each training sample, the subset of proxies that are both correct and confident, distilling exclusively from reliable supervision and suppressing erroneous signals. SPD is further stabilized by initializing UNIEGO as a learned convex combination of proxy parameters, placing the unified model in a well-conditioned region of the loss landscape before distillation begins. UNIEGO achieves state-of-the-art performance across three egocentric video understanding tasks (action recognition, video retrieval, and action segmentation) on three challenging ego-exo benchmarks, outperforming naive multi-teacher distillation baselines and demonstrating that structured, proxy-mediated knowledge transfer yields richer and more discriminative egocentric representations.
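The per-sample selection step can be pictured as a simple filter. The sketch below is one plausible reading of "correct and confident", assuming a proxy is kept when its top class equals the label and its top probability clears a threshold; the threshold value, the averaging of retained proxies, and the function names are hypothetical and not taken from the paper.

```python
def select_proxies(proxy_probs, label, threshold=0.6):
    """Keep proxies whose top prediction matches the label with enough confidence."""
    keep = []
    for idx, probs in enumerate(proxy_probs):
        top = max(range(len(probs)), key=probs.__getitem__)
        if top == label and probs[top] >= threshold:
            keep.append(idx)
    return keep


def distillation_target(proxy_probs, keep):
    """Average the retained proxies' distributions; None when no proxy is reliable."""
    if not keep:
        return None
    n_classes = len(proxy_probs[0])
    return [sum(proxy_probs[i][c] for i in keep) / len(keep) for c in range(n_classes)]


proxy_probs = [
    [0.8, 0.1, 0.1],   # correct and confident
    [0.4, 0.3, 0.3],   # correct but not confident
    [0.1, 0.7, 0.2],   # confident but wrong
    [0.7, 0.2, 0.1],   # correct and confident
]
keep = select_proxies(proxy_probs, label=0)
target = distillation_target(proxy_probs, keep)
```

Filtering before averaging keeps the wrong-but-confident proxy from pulling the student's target toward the wrong class.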

2406.07775 2026-06-19 cs.LG (updated version)

Self-attention-based non-linear basis transformations for compact latent space modelling of dynamic optical fibre transmission matrices

Yijie Zheng, Robert J. Kilpatrick, David B. Phillips, George S. D. Gordon

Affiliations: Optics and Photonics research group, University of Nottingham, UK; University of Exeter, UK; State Key Laboratory of Extreme Photonics and Instrumentation, College of Optical Science and Engineering International Research Center for Advanced Photonics, Zhejiang University, Hangzhou, China; Research Center for Humanoid Sensing, Zhejiang Lab, Hangzhou, China

AI summary: Proposes using self-attention layers to dynamically transform the coordinate representations of fibre matrices into a compact basis that admits low-dimensional representations; results on several datasets show sparse bases (participation ratio 0.01 to 0.11) and low reconstruction error (< 10%).

Abstract

Multimode optical fibres are hair-thin strands of glass that efficiently transport light. They promise next-generation medical endoscopes that provide unprecedented sub-cellular image resolution deep inside the body. However, confining light to such fibres means that images are inherently scrambled in transit. Conventionally, this scrambling has been compensated by pre-calibrating how a specific fibre scrambles light and solving a stationary linear matrix equation that represents a physical model of the fibre. However, as the technology develops towards real-world deployment, the unscrambling process must account for dynamic changes in the matrix representing the fibre's effect on light, due to factors such as movement and temperature shifts, and non-linearities resulting from the inaccessibility of the fibre tip when inside the body. Such complex, dynamic and nonlinear behaviour is well-suited to approximation by neural networks, but most leading image reconstruction networks rely on convolutional layers, which assume strong correlations between adjacent pixels, a strong inductive bias that is inappropriate for fibre matrices which may be expressed in a range of arbitrary coordinate representations with long-range correlations. We introduce a new concept that uses self-attention layers to dynamically transform the coordinate representations of varying fibre matrices to a basis that admits compact, low-dimensional representations suitable for further processing. We demonstrate the effectiveness of this approach on diverse fibre matrix datasets. We show our models significantly improve the sparsity of fibre bases in their transformed bases with a participation ratio, p, as a measure of sparsity, of between 0.01 and 0.11. Further, we show that these transformed representations admit reconstruction of the original matrices with < 10% reconstruction error, demonstrating the invertibility.
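A participation ratio is a common way to turn "how many elements carry the energy" into a single number. The sketch below uses one standard normalized definition, $p = (\sum_i |a_i|^2)^2 / (N \sum_i |a_i|^4)$, which ranges from $1/N$ (one active element) to 1 (uniform); whether the paper uses exactly this normalization is an assumption.

```python
def participation_ratio(amplitudes):
    """Normalized participation ratio: 1/N for one active element, 1 for a uniform vector."""
    powers = [abs(a) ** 2 for a in amplitudes]
    total = sum(powers)
    return total * total / (len(powers) * sum(p * p for p in powers))


dense = participation_ratio([1 + 1j] * 100)           # energy spread over all elements
sparse = participation_ratio([1.0] + [0.0] * 99)      # energy in a single element
```

Under this definition, smaller values mean a sparser representation, and the function accepts complex amplitudes such as transmission-matrix entries.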

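2502.03227 2026-06-19 cs.LG cs.CV (updated version)

Adversarial Dependence Minimization

Pierre-François De Plaen, Tinne Tuytelaars, Marc Proesmans, Luc Van Gool

Affiliations: CVL, ETH Zürich, Switzerland; INSAIT, Sofia University, Bulgaria

AI summary: Proposes the ADM algorithm, which minimizes statistical dependence between feature dimensions through an adversarial game, proves that mutual independence is reached at the global optimum, and applies the method to nonlinear decorrelation, improved generalization in image classification, and preventing dimensional collapse in self-supervised learning.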

2606.19370 2026-06-19 cs.LG cs.AI cs.MA (new submission)

Human-like autonomy emerges from self-play and a pinch of human data

Daphne Cornelisse, Julian Hunt, Zixu Zhang, Waël Doulazmi, Kevin Joseph, Jaime Fernández Fisac, Eugene Vinitsky

Affiliations: NYU Tandon School of Engineering; NYU Courant; Princeton University; Centre for Robotics, Mines Paris; Valeo

AI summary: Proposes a regularization approach that combines self-play reinforcement learning with a small amount of human demonstrations; with only 30 minutes of human data it trains driving policies that coordinate with humans, in 15 hours of training.

Comments: 10 pages

Abstract

Self-play reinforcement learning has recently emerged as a way to train driving policies without any human data. It uses cheap, large-scale simulations to substitute expensive, large-scale human driving demonstrations. A key limitation of this approach is that policies trained through pure self-play can learn effective but alien driving conventions incompatible with people. Previous works attempt to mitigate such behavioral misalignments through extensive reward engineering and domain randomization, which are brittle and labor-intensive. Instead of completely discarding human demonstrations, our method treats them as a regularization objective on top of a minimal safe goal-reaching reward. Like the spice in a good stew, we find that a little human data goes a long way: our method uses only 30 minutes of human demonstrations, 2500x fewer than comparable imitation learning approaches. Resulting policies coordinate with held-out human trajectories and complete training in 15 hours on a single consumer-grade GPU. Videos and full source code are available at https://spiced-self-play.com/.

2606.19476 2026-06-19 cs.LG cs.AI (new submission)

Can In-Context Learning Support Intrinsic Curiosity?

Eric Elmoznino, Sangnie Bhardwaj, Johannes von Oswald, Rajai Nasser, Blaise Agüera y Arcas, João Sacramento, Rif A. Saurous, Guillaume Lajoie

Affiliations: Google – Paradigms of Intelligence Team; Google DeepMind

AI summary: Studies whether the in-context learning ability of sequence models can serve as an immediate, update-free world model, removing the gradient-descent bottleneck of traditional intrinsic-curiosity methods; the theory shows asymptotic convergence to the true learning progress in non-temporal settings.

Abstract

Effective machine learning depends not only on how we model data, but also on what data we choose to collect. While large sequence models have revolutionized data modeling, the problem of automated data selection, or "intrinsic curiosity", remains a significant challenge. Classic approaches incentivize exploration by rewarding an agent based on its "learning progress", which measures how much a newly acquired observation improves a world model's predictive ability. However, evaluating these rewards traditionally requires expensive inner loops of gradient descent updates within each trajectory, rendering them computationally impractical at scale. In this work, we investigate whether the emergent in-context learning (ICL) capabilities of sequence models can eliminate this bottleneck by serving as immediate, update-free world models. Specifically, we evaluate whether an exploration policy can be trained to maximize learning progress, using solely the prediction errors and counterfactual context manipulations of an in-context learner. We first prove that in general Markov decision processes, this is in fact impossible in an unbiased way: the resulting intrinsic rewards either suffer from nuisance terms that bias their estimation of true learning progress, or they cannot be implemented using an in-context learner's prediction errors. Conversely, we prove a positive result for a broad subclass of non-temporal settings, encompassing active learning and Bayesian Experimental Design: here, ICL-derived rewards successfully bound and asymptotically converge to the true learning progress. We corroborate our theory with controlled experiments across continuous and symbolic environments, demonstrating that our ICL-driven framework successfully trains curious data-collection policies that explore optimally.
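"Learning progress" can be made concrete with a toy in-context learner whose only state is its context. In the sketch below the learner is a running-mean predictor standing in for a sequence model, and the reward is the drop in squared error on a set of evaluation targets after one observation is appended; both choices are assumptions for illustration and not the reward construction analyzed in the paper.

```python
def icl_predict(context):
    """Stand-in in-context learner: predicts the mean of observed outcomes (0.0 if empty)."""
    return sum(context) / len(context) if context else 0.0


def learning_progress(context, new_obs, eval_targets):
    """Drop in mean squared error on eval_targets after appending new_obs to the context."""

    def error(ctx):
        pred = icl_predict(ctx)
        return sum((t - pred) ** 2 for t in eval_targets) / len(eval_targets)

    return error(context) - error(context + [new_obs])


informative = learning_progress([0.0], new_obs=2.0, eval_targets=[1.0, 1.0])
redundant = learning_progress([0.0], new_obs=0.0, eval_targets=[1.0, 1.0])
```

No gradient step is taken: the "update" is just a longer context, which is what makes an in-context learner attractive as a cheap world model for scoring candidate observations.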


2606.19690 2026-06-19 cs.LG new submission

Multi-Granular Attention-Driven Reinforcement Learning Framework for Web Intelligent Enhancement Systems

Navin Chhibber, Deepak Singh, Anokh Kishore, Nikita Chawla, K. Anguraj

AI summary: Proposes the MGAR-WIES framework, which combines semantic graph modeling, attention mechanisms, and adaptive reinforcement learning to address semantic understanding and scalability for heterogeneous, dynamic web data, reaching 80% accuracy.

Comments: 2026 3rd International Conference on Integrated Intelligence and Communication Systems (ICIICS), 6 pages

Abstract

In recent years, web intelligent enhancement systems have increasingly relied on heterogeneous and dynamic web data to deliver personalized, context-aware services. However, traditional machine learning, deep learning, and reinforcement learning models often struggle with semantic understanding, adaptability, and scalability in continuously evolving web environments. In this research, a Multi-Granular Attention-based Reinforcement Web Intelligent Enhancement System (MGAR-WIES) is proposed to address these challenges by integrating semantic graph modeling, attention mechanisms, and adaptive reinforcement learning. Initially, heterogeneous web data comprising structured, semi-structured, and unstructured sources are collected and preprocessed to generate unified feature representations. These representations are transformed into a dynamic semantic graph, where entities and their relationships are modeled using graph embeddings enhanced by attention mechanisms to capture both local relevance and global contextual dependencies. Subsequently, an adaptive multi-agent reinforcement learning strategy leverages the attention-aware semantic states to optimize personalized web actions such as content recommendation, navigation optimization, and service adaptation. Finally, continuous online feedback is integrated to update graph representations and learning policies in real time, ensuring sustained adaptability and performance. The proposed MGAR-WIES achieved better accuracy (80%) than existing approaches.


2606.19721 2026-06-19 cs.LG cs.AI new submission

OnDeFog: Online Decision Transformer under Frame Dropping

Daiki Yotsufuji, Kenta Nishihara, Shoma Shimizu, Kento Uchida, Shinichi Shirakawa

Affiliations: Yokohama National University

AI summary: To address the performance degradation caused by frame dropping, proposes OnDeFog, which combines the mechanisms of DeFog with the Online Decision Transformer (ODT) and learns policies through direct environment interaction; it outperforms ODT in environments with high frame-dropping rates and outperforms DeFog on datasets dominated by low-reward data.

Comments: Accepted to PRICAI 2025

DOI: 10.1007/978-981-95-7072-0_10

Abstract

In challenging real-world reinforcement learning applications, communication delays or sensor failures often cause frame dropping, in which the agent cannot receive the dropped states and associated rewards. To address the performance degradation caused by frame dropping, the Decision Transformer under Random Frame Dropping (DeFog) was developed by incorporating additional mechanisms into the decision transformer. Although DeFog can mitigate performance degradation in frame-dropping environments, it is an offline learning method and therefore struggles to generalize effectively to novel states not adequately represented in the training dataset. In this study, we propose OnDeFog, which integrates the mechanisms in DeFog with the online decision transformer (ODT), an online reinforcement learning method that learns policies through direct environmental interaction. Comprehensive experimental evaluation demonstrates that our proposed OnDeFog achieves superior performance compared to ODT in environments characterized by a high frame-dropping rate and outperforms DeFog on datasets containing a large amount of low-reward data.


2606.19750 2026-06-19 cs.LG cs.AI cs.CL new submission

Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models

Darrien McKenzie, Nicklas Hansen, Xiaolong Wang

Affiliations: University of California, San Diego

AI summary: Proposes the Bayesian Manifold Curriculum (BMC) framework, which models problem sampling as a manifold-structured bandit problem and guides sampling with a hierarchical task tree and Bayesian learning, balancing learning signal, diversity, and utility.

Comments: Webpage: https://darrienmckenzie.com/manifold-bandits/

Abstract

Reinforcement learning (RL) is a central approach for improving reasoning capabilities in large language models (LLMs), where training efficiency depends critically on how problems are sampled during optimization. Existing adaptive curriculum learning methods typically prioritize prompts of intermediate difficulty, treating problem selection as a standard bandit problem with independent arms and overlooking the structured, heterogeneous nature of the task space. In this work, we frame problem sampling as a manifold-structured bandit problem with endogenous non-stationarity: problems are related through the model's latent representation space, and sampling decisions can steer how learning signals evolve across that space. To operationalize this perspective, we introduce Bayesian Manifold Curriculum (BMC), a structure-aware framework that organizes problems into a hierarchical task tree and applies Bayesian learning to guide sampling. Empirically, we find that different sampling strategies induce non-trivial tradeoffs between productivity (learning signal), diversity (coverage of the task manifold), and utility (evaluation relevance). These results show that prioritizing difficulty alone is insufficient for strong downstream performance, highlighting the importance of incorporating structure and type-awareness into problem sampling.


2606.19883 2026-06-19 cs.LG stat.ML new submission

Matching Markets meet Cumulative Prospect Theory: Towards Optimal and Adversarially Robust Learning

Ananya Kunisetty, Avishek Ghosh

Affiliations: Indian Institute of Technology Bombay

AI summary: Studies a multi-agent multi-armed bandit problem in competitive two-sided matching markets under cumulative prospect theory (CPT), proposes algorithms with optimal regret bounds, and extends them to adversarial markets.

Comments: Accepted at ECML-PKDD 2026, Naples, Italy

Abstract

We study a multi-agent multi-armed bandit problem in the competitive setup with two-sided matching markets under a human-centric decision-making model. To capture human preferences, we use cumulative prospect theory (CPT), which weighs the actions of the agent in a nonlinear fashion using an ($\alpha$-Hölder continuous) weight function. CPT has been widely used in behavioral economics and risk-sensitive machine learning to emulate human preferences. We analyze the state-of-the-art learning algorithm with CPT weight-distorted rewards and obtain a player-optimal regret of $\mathcal{O}\left(K\log T \left(\frac{1}{\Delta}\right)^{2/\alpha}\right)$, where $K$ denotes the number of arms, $T$ is the learning horizon, and $\Delta$ represents the (suitably defined) players' minimum preference gap. Noticing the dependence on $\Delta$ to be sub-optimal, we further improve this regret by judiciously selecting the active set of arms during exploration, which removes the dependence on $K$ in the dominant term and achieves an improved (optimal) regret guarantee in the setting where the number of arms $K$ is significantly larger than the number of players $N$. In addition, we consider adversarial markets where the observed rewards of the agents may be corrupted. We propose and analyze algorithms for robust markets with CPT as the risk-sensitive measure in both settings, where the total corruption budget is known and where it is unknown, and establish logarithmic player-optimal regret guarantees in both cases.


2606.20002 2026-06-19 cs.LG cs.AI cs.CL new submission

Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning

Yanxi Chen, Weijie Shi, Yuexiang Xie, Boyi Hu, Yaliang Li, Bolin Ding, Jingren Zhou

Affiliations: Alibaba Group

AI summary: Proposes the Connect the Dots framework, which uses end-to-end reinforcement learning to train LLMs to self-update their context over long sequences of tasks and generalize to new domains; experiments validate the cross-domain generalization.

Comments: Work in progress; we will continuously update the codebase and arXiv version

Abstract

This work presents a general framework for training large language models (LLMs) to "Connect the Dots" (CoD), a meta-capability required by long-lifecycle agents: as an LLM-based AI agent gets deployed in an environment, it solves a long sequence of tasks while continuously exploring the environment, learning from its own experiences, and iteratively self-updating its context about the environment, thereby achieving progressively better performance on future tasks conditioned on the updated context. Major components of the CoD framework include: (1) algorithm design and infrastructure for end-to-end reinforcement learning (RL) with long rollout sequences interleaving solve-task and update-context episodes; (2) tasks and environments for incentivizing and eliciting the targeted meta-capability in LLMs during training, as well as for faithfully measuring progress during evaluation. We present proof-of-concept implementations of the CoD framework, including a GRPO-style RL algorithm with fine-grained credit assignment, as well as tasks and environments tailored to the targeted meta-capability (rather than domain-specific LLM capabilities or standard task-by-task RL). Empirical results validate the efficacy of end-to-end RL training in the CoD setting, and demonstrate the potential for out-of-distribution generalization -- within the training domains, across different domains, and from CoD to Ralph-loop settings -- of the elicited meta-capability. Our investigation of CoD connects several lines of prior work, and opens up new opportunities for advancing LLMs and AI agents. To facilitate further research and applications, we release our implementations at https://github.com/agentscope-ai/Trinity-RFT/tree/research/cod/examples/research_cod.


2606.20008 2026-06-19 cs.LG new submission

VIMPO: Value-Implicit Policy Optimization for LLMs

Zhewei Kang, Aosong Feng, Sergey Levine, Dawn Song, Xuandong Zhao

Affiliations: UC Berkeley; Yale University

AI summary: Proposes VIMPO, which derives a policy-implied value function from the optimality conditions of KL-regularized reinforcement learning, providing fine-grained credit assignment without training a critic and outperforming GRPO on mathematical reasoning benchmarks.

Abstract

Reinforcement learning with verifiable rewards has become a central tool for improving the reasoning ability of large language models, but current methods face a trade-off between simplicity and credit assignment. Group-relative methods such as GRPO avoid training a critic, but typically assign a trajectory-level advantage to every token. Actor-critic methods provide denser learning signals, but require a learned value function with its own training instability. We introduce VIMPO, a critic-free policy optimization method that derives a policy-implied value function from the optimality conditions of KL-regularized reinforcement learning. For autoregressive generation, the resulting value recurrence can be written in terms of policy-reference log-ratios and anchored by the terminal condition that no future reward remains at the end of a trajectory. This gives a simple value loss that incorporates outcome-level verifiable rewards without training a critic. The same derivation also yields a critic-free actor advantage, allowing VIMPO to separate reward incorporation through the value loss from policy improvement through a PPO-style actor update. On mathematical RLVR benchmarks, VIMPO improves over GRPO across MATH-500, AIME 2024, AIME 2025, and OlympiadBench, with especially large gains on competition-style evaluations. Under noisy rewards, VIMPO retains a consistent advantage over GRPO, suggesting that policy-implied value optimization can provide finer credit assignment while preserving the practical simplicity of critic-free training.


2606.20014 2026-06-19 cs.LG cs.AI new submission

Hierarchical Control in Multi-Agent Games: LLM-based Planning and RL Execution

Jannik Hösch, Alessandro Sestini, Florian Fuchs, Amir Baghi, Joakim Bergdahl, Konrad Tollmar, Jean-Philippe Barrette-LaPierre, Linus Gisslén

AI summary: Proposes a hierarchical architecture in which an LLM acts as a centralized strategic controller that selects among RL skill policies; in a competitive 2v2 environment it reaches a win rate comparable to a hand-crafted behavior tree (BT) and is perceived as the most human-like.

Comments: 12 pages, 9 figures

Abstract

Reinforcement learning (RL) has achieved strong performance in sequential decision-making, yet scaling to complex multi-agent environments remains challenging due to sparse rewards, large state-action spaces, and the difficulty of learning coordinated strategies. We propose a hierarchical architecture where a pretrained large language model (LLM) acts as a centralized strategic controller that selects among specialized RL skill policies for a team of agents, while RL policies handle reactive low-level execution. We evaluate this hybrid system in a competitive 2v2 King of the Hill environment against behavior tree (BT) and "Flat" RL (end-to-end training without skill decomposition) baselines. The LLM+RL system achieves task performance statistically equivalent to hand-crafted BT (46.4% vs 51.5% win rate, $p=0.103$) while both significantly outperform Flat RL trained without skill decomposition. A user study ($n=15$) reveals that 60% of participants perceive LLM+RL agents as the most human-like ($p=0.027$), citing behavioral adaptability and tactical variability. These results demonstrate that pretrained LLM reasoning can effectively orchestrate pretrained RL skills, achieving competitive multi-agent coordination and superior perceived believability without manual rule engineering.


2606.20104 2026-06-19 cs.LG cs.AI new submission

Sensorimotor World Models: Perception for Action via Inverse Dynamics

Petr Ivashkov, Randall Balestriero, Bernhard Schölkopf

Affiliations: Max Planck Institute for Intelligent Systems; Department of Computer Science, Brown University; ELLIS Institute; ETH Zürich

AI summary: Proposes the sensorimotor world model (SMWM), a latent world model trained end-to-end with inverse dynamics regularization, which prevents representation collapse and learns compact, action-aligned representations, achieving competitive planning performance on 2D and 3D control tasks.

Abstract

Perception for action suggests that representations of the world should be shaped not by visual fidelity alone, but by their relevance for actions. At the same time, latent JEPA-style world models advocate learning compact predictive states from high-dimensional observations to facilitate the prediction of future states, but end-to-end training of these models is nontrivial because representations may collapse if our only goal is to construct a latent state that is easy to predict. We introduce a sensorimotor world model (SMWM): a latent world model trained end-to-end with inverse dynamics regularization. This single regularizer addresses both issues: it prevents representation collapse and induces action-aligned representations. By forcing latent states to preserve information about the action underlying a transition, it biases the model toward the controllable degrees of freedom of the environment while discarding uncontrollable distractors. This yields stable latent world models trained from offline, reward-free trajectories, without frozen encoders, exponential moving averages, or complex latent regularizers. Empirically, SMWM learns compact, interpretable latent spaces and enables competitive planning performance across simple 2D and 3D control tasks.


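2606.20107 2026-06-19 cs.LG new submission

Quantile of Means: A Bonus-Free Ensemble Method for Minimax Optimal Reinforcement Learning

Asaf Cassel, Aviv Rosenberg

Affiliations: Google Research

AI summary: Proposes a quantile-based ensemble method that achieves optimal variance-dependent regret bounds in finite-horizon MDPs without relying on counts, providing theoretical justification for ensemble-based exploration in reinforcement learning.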

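2606.20411 2026-06-19 cs.LG new submission

Direct Advantage Estimation for Scalable and Sample-efficient Deep Reinforcement Learning

Hsiao-Ru Pan, Bernhard Schölkopf

AI summary: Addresses the limitations of Direct Advantage Estimation (DAE) in partially observable domains and with high-dimensional observations by extending its theoretical framework and introducing a discrete latent dynamics model to reduce computational complexity; validates DAE's scalability and sample efficiency on the Arcade Learning Environment.

Comments: Accepted at RLC2026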

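2606.20475 2026-06-19 cs.LG new submission

Marginal Advantage Accumulation for Memory-Driven Agent Self-Evolution

Mingyu Yang, Keye Zheng, Congchao Cheng, Yujie Liu, Xingkang Lu, Fan Jiang, Yefei Zheng

Affiliations: Alibaba International Digital Commerce Group

AI summary: To address missing cross-batch evidence in batch-style trajectory distillation, proposes Marginal Advantage Accumulation (MAA), which combines differential signal construction, exponential-moving-average accumulation, and semantic identity merging; it achieves the best result in 14 of 16 settings and cuts token consumption in the optimization stage by about 75%.

Comments: 26 pages, 4 figures, 10 tables, 42 references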

2606.19632 2026-06-19 cs.RO cs.AI cs.LG cs.LO cs.MA cross-list

Formal Verification of Learned Multi-Agent Communication Policies via Decision Tree Distillation

Ahmad Farooq, Kamran Iqbal

Affiliations: University of Arkansas at Little Rock

AI summary: Proposes distilling multi-agent reinforcement learning policies into interpretable decision trees and formally verifying them with PRISM, with verified safety properties shown to transfer to the original networks; achieves 88.9% property satisfaction on a multi-drone coordination task.

Comments: 9 pages, 3 figures, 7 tables. Accepted at the 2026 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2026), Pittsburgh, Pennsylvania, USA, September 27-October 1, 2026

Abstract

Multi-agent reinforcement learning (MARL) enables agents to develop coordination strategies through emergent communication, but neural policies lack the formal safety guarantees required for safety-critical robotic deployment in drone swarms and autonomous vehicle fleets. We present the first end-to-end framework for safety verification of learned multi-agent communication policies through policy abstraction: neural policies are distilled into interpretable decision trees, then formally verified, with empirical validation confirming that verified safety properties transfer to original networks. Our four-stage pipeline consists of domain-specific feature extraction from agent observations, decision tree distillation achieving 97.9% +/- 1.2% fidelity to neural policies, automated translation to PRISM probabilistic model checker specifications with complete feature-to-state-variable correspondence, and compositional verification of Probabilistic Computation Tree Logic (PCTL) properties via pairwise decomposition with union-bound aggregation and empirical neighbor modeling. Evaluating Vector-Quantized Variational Information Bottleneck (VQ-VIB) policies for multi-drone coordination with 5-7 agents, we verify 18 temporal logic properties across safety, liveness, and cooperation, achieving 88.9% property satisfaction with all five safety thresholds satisfied (0.3% collision probability vs. 1% threshold). Monte Carlo validation of original neural policies confirms that verified safety properties transfer with <=0.6 percentage-point deviation (95% CI). Discrete VQ-VIB messages provide +11.6 to +13.6 percentage-point fidelity advantages over continuous methods, enabling 3-4x faster verification. Our framework provides empirically validated safety verification for distilled policy abstractions, serving as a practical bridge between deep MARL and formal safety workflows for multi-robot deployment.
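
The union-bound aggregation mentioned in the abstract rests on a standard inequality: the probability that at least one of several events occurs is at most the sum of their individual probabilities. A minimal sketch, assuming a hypothetical per-pair violation probability supplied by the caller (the function name and the numbers are illustrative and do not come from the paper):

```python
from itertools import combinations


def pairwise_union_bound(agents, pair_prob):
    """Upper-bound P(some pair violates the property) by summing over all pairs."""
    total = sum(pair_prob(a, b) for a, b in combinations(agents, 2))
    return min(1.0, total)  # a probability can never exceed 1


# Hypothetical example: 5 agents (10 pairs), each pair violating with probability 0.0002.
bound = pairwise_union_bound(range(5), lambda a, b: 0.0002)
print(f"system-level bound: {bound:.4f}")
```

The bound is conservative because it ignores overlap between pairwise events, which is the price paid for checking each pair separately rather than the full joint system.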


2606.19656 2026-06-19 cs.RO cs.LG cross-list

DF-ExpEnse: Diffusion Filtered Exploration for Sample Efficient Finetuning

Calvin Luo, Chen Sun, Shuran Song

Affiliations: Stanford University; Brown University

AI summary: Proposes DF-ExpEnse, an exploration technique that uses the multimodal modeling capability of generative control policies together with an ensemble of critics to collect online experience efficiently during finetuning, improving sample efficiency.

Comments: ICML 2026

Abstract

A natural recipe for intelligent robotic decision-making is initializing from pretrained generative control policies, which have summarized offline experience, and adapting them to self-collected online experience. We present DF-ExpEnse, an exploration technique that improves the quality of online experience collection, thus increasing finetuning sample efficiency. DF-ExpEnse leverages the multimodal modeling capabilities of the generative control policy to create an expressive and tractably evaluatable candidate set. It then utilizes an ensemble of critics to identify the action that best balances quality with high exploration interest. In fleet settings, DF-ExpEnse further enables cross-agent communication to facilitate collaborative exploration as a group. DF-ExpEnse can be seamlessly integrated with existing strategies that finetune pretrained generative control policies via reinforcement learning. We experimentally validate consistent sample-efficiency benefits from DF-ExpEnse across a variety of manipulation and locomotion tasks, compared to default finetuning and alternative action selection schemes. The project page can be found at https://df-expense.github.io.


2606.19920 2026-06-19 cs.RO cs.LG cs.MA cross-list

Deep-Unfolded Coordination

Hunter Kuperman, Minchan Jung, Rahul V. Ghosh, Alex Oshin, Evangelos A. Theodorou

Affiliations: Autonomous Control and Decision Systems Laboratory, Georgia Institute of Technology, United States

AI summary: Proposes Deep Coordinator, a framework that deep-unfolds ADMM-DDP iterations and learns to adjust hyperparameters dynamically, adapting the penalty parameters of a non-convex optimizer at solve time; in car-fleet and quadrotor simulations it is 6.18-9.44x faster and scales to systems 8x larger than those seen in training.

Comments: The second and third authors contributed equally (equal second authorship). 35 pages (10 pages main text), 17 figures, 3 tables

Abstract

Distributed optimization is a highly scalable and structurally transparent technique to solve multi-agent robotics problems; however, such methods often require highly specialized, problem-specific hyperparameter tuning. In this work, we propose Deep Coordinator, a deep-unfolding framework that learns to dynamically adjust the hyperparameters of ADMM-DDP, a popular distributed solver for robotics tasks, at solve-time in response to optimizer performance. Our architecture consists of unrolling a fixed number of ADMM-DDP iterations into a neural network with learnable functions between layers mapping the optimizer state to the next hyperparameters. To the best of our knowledge, Deep Coordinator is the first deep-unfolding framework to adapt the penalty parameters of a non-convex optimizer at solve-time; we show that the mainstream supervised approach can yield degenerate solutions when training such models, and propose an unsupervised learning scheme. On simulations with fleets of cars and quadrotors, Deep Coordinator produces trajectories of comparable quality 6.18-9.44x faster than conventional solvers. Furthermore, Deep Coordinator retains its performance benefits when deployed to systems up to 8x larger than those it was trained on.


2606.20022 2026-06-19 stat.ML cs.LG math.OC cross-list

Stochastic Linear Contextual Bandits with Bounded Noise: A Set-Membership Approach

Haonan Xu, Yingying Li

AI summary: For stochastic linear contextual bandits with bounded reward noise, proposes SME-OFU, an algorithm based on set-membership estimation and the optimism principle, achieving an O(log T) regret bound that improves on the optimal bound under sub-Gaussian noise.

Comments: 23 pages, 1 figure

Abstract

This paper considers stochastic linear contextual bandits (SLCB) with bounded reward noise. Existing works typically assume sub-Gaussian reward noise and bounded expected rewards, under which the optimal regret bound scales as $\tilde{O}(\sqrt{T})$ in terms of horizon $T$. However, in many applications, realized/observed rewards are also naturally bounded, implying bounded reward noise. Bounded noise is more informative than the sub-Gaussian condition but has not been leveraged explicitly in the SLCB literature. In this paper, we propose a novel algorithm SME-OFU by utilizing an uncertainty quantification method called set-membership estimation (SME) and applying the principle of optimism in the face of uncertainty (OFU). Our algorithm enjoys an improved regret bound $O(\log T)$. Notice that this does not contradict the existing optimal bound $\tilde{O}(\sqrt{T})$ for sub-Gaussian noise because bounded noise is a stronger condition. Finally, simulations show empirical improvements of SME-OFU over a benchmark algorithm designed for sub-Gaussian noise when the reward noise is bounded.
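
The idea behind set-membership estimation can be illustrated with a deliberately simplified sketch. Assume a scalar parameter `theta`, rewards of the form r = x * theta + w, and a known noise bound |w| <= `w_max`; every observation then confines `theta` to an interval, and the feasible set is the intersection of those intervals. This is not the paper's SME-OFU algorithm (which addresses the linear contextual setting); the function names and data below are hypothetical.

```python
def sme_interval(observations, w_max):
    """Return (lo, hi): all scalar theta consistent with |r - x * theta| <= w_max."""
    lo, hi = float("-inf"), float("inf")
    for x, r in observations:
        if x == 0:
            continue  # a zero context carries no information about theta
        a, b = (r - w_max) / x, (r + w_max) / x
        if a > b:  # dividing by a negative context flips the interval
            a, b = b, a
        lo, hi = max(lo, a), min(hi, b)
    return lo, hi


def optimistic_reward(x, lo, hi):
    """OFU-style score: largest x * theta over a finite feasible interval [lo, hi]."""
    return max(x * lo, x * hi)


# Hypothetical data generated with theta = 2 and noise bounded by 0.5.
lo, hi = sme_interval([(1.0, 2.5), (2.0, 3.5)], w_max=0.5)
print(lo, hi)  # 2.0 2.0
```

In this example the two noise realizations sit at opposite extremes of the bound, so the feasible interval collapses to a single point. That kind of exact elimination is available only because the noise is bounded, which is the extra information the abstract refers to.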


2606.20206 2026-06-19 stat.ML cs.LG cross-list

Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random

Ziheng Wei, Annie Qu, Rui Miao

AI summary: For offline reinforcement learning with rewards missing not at random, proposes an identification approach that uses future states as shadow variables and recovers the conditional mean reward via a bridge function and a min-max estimator, enabling off-policy evaluation of missingness-aware policies.

Comments: Accepted at ICML 2026. 31 pages, 6 figures

Abstract

In offline reinforcement learning, immediate rewards in logged batch data are often unobserved due to sparse or irregular record-keeping, or censored beyond certain reward values. This issue arises in practical settings, including health care and marketing. We investigate off-policy evaluation (OPE) in finite-horizon Markov decision processes when rewards are missing not at random (MNAR), which breaks ignorability and induces selection bias even after conditioning on states and actions. To address this, we formalize a reward-dependent propensity model and use future states as shadow variables to identify the full-data conditional mean reward. We further introduce a bridge function that recovers the conditional mean reward without explicitly modeling the MNAR mechanism, and estimate it via a min-max procedure to avoid double sampling. Building upon these identification results, we propose a Fitted-Q-Evaluation-style estimator that propagates the recovered rewards while allowing target policies to depend on past missingness indicators. Finally, we establish consistency and finite-sample error bounds for our OPE estimator, and show through experiments the strong performance of our method compared to existing methods on simulated and MIMIC-III Sepsis data.


2606.20236 2026-06-19 cs.AI cs.LG cs.MA cross-list

A Multi-Agent system for Multi-Objective constrained optimization

Federica Filippini

Affiliations: University of Milano-Bicocca

AI summary: Proposes MAMO, which uses multi-agent reinforcement learning to decouple task execution from objective design and learns reward weights automatically to balance optimizing the primary objective against constraint violations, improving the autonomy and robustness of RL in dynamic environments.

Comments: Presented at the 17th Workshop on Optimization and Learning in Multiagent Systems (OptLearnMAS, https://optlearnmas.github.io), co-located with the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

Abstract

Many decision-making problems in computing and networking systems can be naturally formulated as cost-minimization problems under performance constraints. In dynamic environments, reinforcement learning (RL) is often used to solve such problems at runtime by embedding both costs and constraint violations into a single scalar reward through weighted penalty terms, following a Lagrangian-inspired formulation. However, in this context the behavior of the learned policy critically depends on the choice of these weights, which are typically selected manually. This makes it difficult to identify an appropriate trade-off between optimizing the primary objective and effectively avoiding constraint violations, particularly in non-stationary environments where their relative importance may change. This paper presents MAMO (Multi-Agent system for Multi-Objective constrained optimization), an approach to tackle this balancing problem through multi-agent RL. MAMO decouples task execution from objective design by formulating the selection of reward weights as a learning problem, providing a first step towards more autonomous and robust RL-based solutions for constrained optimization problems in dynamic environments.
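
The Lagrangian-inspired scalar reward described in the abstract can be sketched as follows. This shows only the manually weighted formulation that the paper takes as its starting point, not MAMO itself; the function name, the convention that a positive value means a violated constraint, and the numbers are all assumptions made for illustration.

```python
def penalized_reward(cost, violations, weights):
    """Scalar reward: negative cost minus weighted positive constraint violations."""
    penalty = sum(w * max(0.0, v) for w, v in zip(weights, violations))
    return -cost - penalty


# Hypothetical step: cost 2.0, first constraint violated by 0.5, second satisfied.
print(penalized_reward(2.0, [0.5, -1.0], [4.0, 10.0]))  # -4.0
print(penalized_reward(2.0, [0.5, -1.0], [0.0, 10.0]))  # -2.0
```

The same step yields different rewards depending only on the weights, which is why a fixed, hand-picked choice can steer the learned policy toward either cost reduction or constraint satisfaction.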

2606.20324 2026-06-19 cs.SE cs.LG cross-list

A Model-Driven Approach for Developing Families of Reinforcement Learning Environments

一种模型驱动的方法用于开发强化学习环境族

Xiaoran Liu, Istvan David

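AI summary: Proposes a model-driven approach that automatically generates families of reinforcement learning training environments using a hybrid genetic algorithm and model transformations, addressing the time-consuming and error-prone manual development of environment families; its effectiveness is demonstrated in a wildfire mitigation scenario.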

AI中文摘要

虚拟训练环境是软件密集型系统，强化学习（RL）智能体在其中学习、适应并展示有意义的行为。虚拟训练环境为在现实环境中训练智能体提供了一种安全且成本效益高的替代方案。然而，为了收敛，大多数现实的RL问题需要在多个相似但略有不同的环境中进行训练——即环境变体族。环境族的典型开发过程是一项劳动密集型且容易出错的手动工作，难以扩展。为了缓解这些问题，本文提出了一种模型驱动的方法来开发RL训练环境族。为了获得环境族，我们开发了一种方法和原型工具。在我们的方法中，一种混合遗传算法——基于种群的全局搜索和启发式局部搜索的结合——生成环境族。变异和约束被表达为模型转换，并通过最先进的模型转换引擎操作化为搜索过程。我们在野火缓解场景和课程学习（一种依赖于环境族的特定学习范式）中展示了我们方法的有效性。

英文摘要

Virtual training environments are software-intensive systems in which reinforcement learning (RL) agents learn, adapt, and demonstrate meaningful behavior. Virtual training environments offer a safe and cost-efficient alternative to training agents in real-world settings. However, to converge, most realistic RL problems require training in multiple, mostly similar but slightly different environments - i.e., families of environment variants. The typical development process of environment families is a labor-intensive and error-prone manual endeavor that does not scale well. To alleviate these issues, in this paper, we propose a model-driven approach for developing families of RL training environments. To obtain the family of environments, we develop an approach and prototype tool. In our approach, a hybrid genetic algorithm - a combination of population-based global search and heuristic local search - generates environment families. Mutations and constraints are expressed as model transformations and are operationalized into a search process by a state-of-the-art model transformation engine. We demonstrate the soundness of our approach in a wildfire mitigation scenario and curriculum learning - a particular learning paradigm that relies on environment families.

2606.20356 2026-06-19 math.OC cs.AI cs.LG math.PR stat.ML cross-list

Robust $Q$-learning for mean-field control under Wasserstein uncertainty in common noise

公共噪声Wasserstein不确定性下的平均场控制鲁棒$Q$-学习

Mathieu Laurière, Ariel Neufeld, Kyunghyun Park

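AI summary: Proposes a robust $Q$-learning algorithm for discrete-time mean-field control under Wasserstein uncertainty in the common-noise distribution, combining quantization-based projection with Wasserstein duality; proves convergence and finite-time bounds for both synchronous and asynchronous learning, and verifies the robustness-performance trade-off on systemic risk and epidemic models.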

2509.15927 2026-06-19 cs.LG cs.AI replacement

Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search

增强生成式自动出价：结合离线奖励评估与策略搜索

Zhiyu Mou, Yiqin Lv, Miao Xu, Qi Wang, Yixiu Mao, Jinghao Chen, Qichen Ye, Chao Li, Rongquan Bai, Chuan Yu, Jian Xu, Bo Zheng

Affiliations: Taobao & Tmall Group of Alibaba; Department of Automation, Tsinghua University

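AI summary: To address the performance bottleneck of existing generative auto-bidding methods, which cannot explore beyond the static dataset, proposes AIGB-Pearl, which achieves safe and efficient exploration through a trajectory evaluator and a KL-Lipschitz-constrained score-maximization scheme, attaining state-of-the-art performance in simulated and real-world advertising systems.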

AI中文摘要

自动出价是广告主提升广告效果的关键工具。最近进展表明，AI生成式出价（AIGB）从离线数据中学习条件生成规划器，相比典型的基于离线强化学习（RL）的自动出价方法取得了更优性能。然而，现有AIGB方法仍面临性能瓶颈，因其固有能力无法在静态数据集之外进行带反馈的探索。为解决此问题，我们提出\textbf{AIGB-Pearl}（\emph{\textbf{P}lanning with \textbf{E}valu\textbf{A}tor via \textbf{RL}}），一种融合生成式规划与策略优化的新方法。AIGB-Pearl的核心在于构建轨迹评估器以评估生成分数的质量，并设计一个理论上可靠的KL-Lipschitz约束分数最大化方案，确保在离线数据集之外进行安全高效的探索。进一步开发了结合同步耦合技术的实用算法，以保证所提方案所需的模型正则性。在模拟和真实广告系统上的大量实验证明了我们方法的最优性能。

英文摘要

Auto-bidding is a critical tool for advertisers to improve advertising performance. Recent progress has demonstrated that AI-Generated Bidding (AIGB), which learns a conditional generative planner from offline data, achieves superior performance compared to typical offline reinforcement learning (RL)-based auto-bidding methods. However, existing AIGB methods still face a performance bottleneck due to their inherent inability to explore beyond the static dataset with feedback. To address this, we propose \textbf{AIGB-Pearl} (\emph{\textbf{P}lanning with \textbf{E}valu\textbf{A}tor via \textbf{RL}}), a novel method that integrates generative planning and policy optimization. The core of AIGB-Pearl lies in constructing a trajectory evaluator to assess the quality of generated scores and designing a provably sound KL-Lipschitz-constrained score-maximization scheme to ensure safe and efficient exploration beyond the offline dataset. A practical algorithm that incorporates the synchronous coupling technique is further developed to ensure the model regularity required by the proposed scheme. Extensive experiments on both simulated and real-world advertising systems demonstrate the state-of-the-art performance of our approach.

2510.19893 2026-06-19 cs.LG replacement

EQPO: Equitable Group Relative Policy Optimization for Clinical Reasoning

EQPO: 面向临床推理的公平群体相对策略优化

Shiqi Dai, Wei Dai, Jiaee Cheong, Paul Pu Liang

Affiliations: MIT; Harvard University

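AI summary: Proposes EQPO, a hierarchical reinforcement learning method that promotes balanced learning across heterogeneous clinical populations by adaptively reweighting samples; across 7 diagnostic benchmarks it reduces F1 standard deviation by 43.9% and narrows predictive parity gaps by 27.2%.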

Comments Accepted as Oral on NeurIPS 2025 GenAI4Health Workshop

AI中文摘要

医疗AI系统展示了令人印象深刻的诊断性能，但它们在不同人口统计群体之间通常表现出不均匀的准确性，使代表性不足的人群处于不利地位。尽管多模态推理基础模型推动了临床诊断的发展，基于强化学习的后训练倾向于吸收并放大多数主导训练语料中存在的偏见。我们提出公平群体相对策略优化（EQPO），一种分层强化学习方法，通过根据子群表示、任务难度和数据来源自适应地重新加权样本，鼓励跨异质临床人群的平衡学习。由于人口统计注释在真实临床数据中经常缺失，EQPO还在不可用时应用无监督聚类来恢复潜在子群。在覆盖5种模态（X射线、CT、皮肤镜、乳腺X线摄影、超声）的7个诊断基准上，EQPO在QoQ-Med3-8B上相比原始GRPO将F1标准差降低43.9%，最大跨群体F1差距降低42.7%，并在MedGemma-4B上将预测公平差距缩小27.2%（相比有偏减轻的RL基线），同时即使没有任何人口统计标签也将F1提高12.5%。检查训练轨迹显示，EQPO在优化过程中稳步提高公平性，而基线方法的公平性随训练进行而下降，并且发现的隐式群体保持稳定并与掩蔽的人口统计属性对齐。我们进一步发布了EquiMedGemma-4B和EquiQoQ-Med3-8B，这两种具有公平意识的临床VLLM在显著缩小人口统计差距的同时达到了最先进的准确性。

英文摘要

Medical AI systems have demonstrated impressive diagnostic performance, yet they routinely show uneven accuracy across demographic groups, disadvantaging underrepresented populations. Although multimodal reasoning foundation models have pushed clinical diagnosis forward, reinforcement learning-based post-training tends to absorb and magnify the biases present in majority-dominated training corpora. We propose Equitable Group Relative Policy Optimization (EQPO), a hierarchical reinforcement learning method that encourages balanced learning across heterogeneous clinical populations by adaptively reweighting samples according to subgroup representation, task difficulty, and data source. As demographic annotations are frequently missing in real-world clinical data, EQPO additionally applies unsupervised clustering to recover latent subpopulations when they are unavailable. On 7 diagnostic benchmarks covering 5 modalities (X-ray, CT, dermoscopy, mammography, ultrasound), EQPO reduces F1 standard deviation by 43.9% and the maximum cross-group F1 gap by 42.7% on QoQ-Med3-8B over vanilla GRPO, and narrows predictive parity gaps by 27.2% on MedGemma-4B over bias-mitigated RL baselines while raising F1 by 12.5% even without any demographic labels. Examining the training trajectory shows that EQPO steadily improves fairness over the course of optimization, in contrast to baseline methods whose fairness degrades as training proceeds, and the discovered implicit groups remain stable and align with masked demographic attributes. We further release EquiMedGemma-4B and EquiQoQ-Med3-8B, equitability-aware clinical VLLMs that attain state-of-the-art accuracy with markedly smaller demographic gaps.

2510.21978 2026-06-19 cs.LG cs.AI replacement

Beyond Reasoning Gains: Mitigating General-Capability Forgetting in Large Reasoning Models

超越推理增益：缓解大型推理模型中的通用能力遗忘

Hoang Phan, Xianjun Yang, Yuanshun Yao, Jingyu Zhang, Shengjie Bi, Xiaocheng Tang, Madian Khabsa, Lijuan Liu, Deren Lei

Affiliations: Meta Superintelligence Labs; New York University; Johns Hopkins University

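AI summary: To address reasoning models forgetting foundational capabilities during reinforcement learning training, proposes RECAP, a replay strategy that adjusts training emphasis online through dynamic objective reweighting, preserving general capabilities while improving reasoning performance.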

AI中文摘要

基于可验证奖励的强化学习（RLVR）在数学和多模态推理方面取得了显著进展，并已成为当代语言和视觉-语言模型的标准后训练范式。然而，RLVR方法引入了能力退化的重大风险，即模型在长时间训练后，若未采用正则化策略，会遗忘基础技能。我们通过实验证实了这一担忧，观察到开源推理模型在感知和忠实性等核心能力上出现性能下降。虽然施加KL散度等正则化项有助于防止偏离基础模型，但这些项是在当前任务上计算的，因此不能保证保留更广泛的知识。同时，跨异构领域的经验回放使得决定每个目标应获得多少训练权重变得困难。为解决这一问题，我们提出RECAP——一种具有动态目标重加权的重放策略，用于通用知识保留。我们的重加权机制利用短期收敛和不稳定信号在线自适应，将后训练焦点从饱和目标转移到表现不佳或不稳定的目标。我们的方法是端到端的，可直接应用于现有RLVR流程，无需训练额外模型或进行繁重调优。在Qwen2.5-VL-3B和Qwen2.5-VL-7B上的广泛实验证明了我们方法的有效性，该方法不仅保留了通用能力，还通过实现任务内奖励的更灵活权衡提升了推理性能。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) has delivered impressive gains in mathematical and multimodal reasoning and has become a standard post-training paradigm for contemporary language and vision-language models. However, the RLVR recipe introduces a significant risk of capability regression, in which models forget foundational skills after prolonged training without employing regularization strategies. We empirically confirm this concern, observing that open-source reasoning models suffer performance degradation on core capabilities such as perception and faithfulness. While imposing regularization terms like KL divergence can help prevent deviation from the base model, these terms are computed on the current task and therefore do not guarantee preservation of broader knowledge. Meanwhile, commonly used experience replay across heterogeneous domains makes it nontrivial to decide how much training emphasis each objective should receive. To address this, we propose RECAP, a replay strategy with dynamic objective reweighting for general knowledge preservation. Our reweighting mechanism adapts online using short-horizon signals of convergence and instability, shifting the post-training focus away from saturated objectives and toward underperforming or volatile ones. Our method is end-to-end and readily applicable to existing RLVR pipelines without training additional models or heavy tuning. Extensive experiments on benchmarks using Qwen2.5-VL-3B and Qwen2.5-VL-7B demonstrate the effectiveness of our method, which not only preserves general capabilities but also improves reasoning by enabling more flexible trade-offs among in-task rewards.

2601.22970 2026-06-19 cs.LG cs.AI replacement

Stabilizing the Q-Gradient Field for Policy Smoothness in Actor-Critic Methods

稳定Q-梯度场以实现Actor-Critic方法中的策略平滑性

Jeong Woon Lee, Kyoleen Kwak, Daeho Kim, Hyoseok Hwang

Affiliation: College of Software, Kyung Hee University

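AI summary: To address policy oscillation in actor-critic methods for continuous action spaces, proposes PAVE, a framework grounded in the differential geometry of the critic that smooths the policy by stabilizing the Q-gradient field, without modifying the actor.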

AI中文摘要

通过连续actor-critic方法学习的策略通常表现出不稳定的高频振荡，使其不适合物理部署。当前方法试图通过直接正则化策略输出来强制平滑性。我们认为这种方法治标不治本。在这项工作中，我们从理论上建立了策略非平滑性根本上由评论家的微分几何决定。通过对actor-critic目标应用隐式微分，我们证明了最优策略的敏感性受限于Q函数的混合偏导数（噪声敏感性）与其动作空间曲率（信号区分度）之比。为了实证验证这一理论见解，我们引入了PAVE（策略感知值场均衡），一种以评论家为中心的正则化框架，将评论家视为标量场并稳定其诱导的动作梯度场。PAVE通过最小化Q-梯度波动同时保持局部曲率来修正学习信号。实验结果表明，PAVE在不修改actor的情况下，实现了与策略侧平滑正则化方法相当的平滑性，同时保持了有竞争力的任务性能。

英文摘要

Policies learned via continuous actor-critic methods often exhibit erratic, high-frequency oscillations, making them unsuitable for physical deployment. Current approaches attempt to enforce smoothness by directly regularizing the policy's output. We argue that this approach treats the symptom rather than the cause. In this work, we theoretically establish that policy non-smoothness is fundamentally governed by the differential geometry of the critic. By applying implicit differentiation to the actor-critic objective, we prove that the sensitivity of the optimal policy is bounded by the ratio of the Q-function's mixed-partial derivative (noise sensitivity) to its action-space curvature (signal distinctness). To empirically validate this theoretical insight, we introduce PAVE (Policy-Aware Value-field Equalization), a critic-centric regularization framework that treats the critic as a scalar field and stabilizes its induced action-gradient field. PAVE rectifies the learning signal by minimizing the Q-gradient volatility while preserving local curvature. Experimental results demonstrate that PAVE achieves smoothness comparable to policy-side smoothness regularization methods, while maintaining competitive task performance, without modifying the actor.

2602.04037 2026-06-19 cs.LG cs.RO replacement

DADP: Domain Adaptive Diffusion Policy

DADP: 领域自适应扩散策略

Pengcheng Wang, Qinghang Liu, Haotian Lin, Yiheng Li, Guojian Zhan, Masayoshi Tomizuka, Yixiao Wang

Affiliations: University of California, Berkeley, California, USA; Peking University, Beijing, China; Tsinghua University, Beijing, China

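AI summary: Proposes DADP, which achieves robust zero-shot adaptation across environments with differing dynamics through unsupervised disentanglement and domain-aware diffusion injection, outperforming prior methods on locomotion and manipulation tasks.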

AI中文摘要

学习能够泛化到未见过的转移动态的领域自适应策略，仍然是基于学习的控制中的一个基本挑战。通过领域表示学习来捕获领域特定信息，从而实现领域感知决策，已经取得了实质性进展。我们分析了通过动态预测学习领域表示的过程，发现选择与当前步骤相邻的上下文会导致学习到的表示将静态领域信息与变化的动态属性纠缠在一起。这种混合可能会混淆条件策略，从而限制零样本适应。为了应对这一挑战，我们提出了DADP（领域自适应扩散策略），通过无监督解耦和领域感知扩散注入实现鲁棒适应。首先，我们引入了滞后上下文动态预测，这是一种将未来状态估计条件化在历史偏移上下文上的策略；通过增加这个时间间隔，我们通过过滤掉瞬态属性来无监督地解耦静态领域表示。其次，我们通过偏置先验分布和重新制定扩散目标，将学习到的领域表示直接集成到生成过程中。在涉及运动和操控的具有挑战性的基准测试上的大量实验表明，DADP相对于先前方法具有优越的性能和泛化能力。更多可视化结果可在此https URL上获得。

英文摘要

Learning domain adaptive policies that can generalize to unseen transition dynamics remains a fundamental challenge in learning-based control. Substantial progress has been made through domain representation learning to capture domain-specific information, thus enabling domain-aware decision making. We analyze the process of learning domain representations through dynamical prediction and find that selecting contexts adjacent to the current step causes the learned representations to entangle static domain information with varying dynamical properties. Such a mixture can confuse the conditioned policy, thereby constraining zero-shot adaptation. To tackle the challenge, we propose DADP (Domain Adaptive Diffusion Policy), which achieves robust adaptation through unsupervised disentanglement and domain-aware diffusion injection. First, we introduce Lagged Context Dynamical Prediction, a strategy that conditions future state estimation on a historical offset context; by increasing this temporal gap, we unsupervisedly disentangle static domain representations by filtering out transient properties. Second, we integrate the learned domain representations directly into the generative process by biasing the prior distribution and reformulating the diffusion target. Extensive experiments on challenging benchmarks across locomotion and manipulation demonstrate the superior performance and generalizability of DADP over prior methods. More visualization results are available at https://outsider86.github.io/DomainAdaptiveDiffusionPolicy/.

2602.17315 2026-06-19 cs.LG cs.AI replacement

Flickering Multi-Armed Bandits

闪烁多臂老虎机

Sourav Chakraborty, Amit Kiran Rege, Claire Monteleoni, Lijun Chen

Affiliations: University of Colorado Boulder; INRIA Paris

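AI summary: Proposes the flickering multi-armed bandit model, in which a random graph constrains action availability, designs a two-phase lazy random walk algorithm that achieves a sublinear regret bound, and proves optimality with respect to an information-theoretic lower bound.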

强化学习基础模型本应已经存在

Abdelrahman Zighem, Jill-Jênn Vie

Affiliations: École normale supérieure de Paris, PSL University, Paris, France; Soda team, Inria Saclay, Palaiseau, France

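AI summary: Proposes building a reinforcement learning foundation model from synthetic MDPs, exploiting a fixed-size sufficient statistic that makes attention-based architectures applicable; it outperforms classical algorithms in both online and offline experiments.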

AI中文摘要

语言和视觉的基础模型由互联网规模的数据驱动，而结构化领域（表格预测、时间序列预测、图学习、强化学习）则不然。替代方案是合成数据，它将负担从收集转移到先验设计。这种先验已经存在于许多结构化任务中：TabPFN及其后续工作通过一个在合成贝叶斯先验上预训练的Transformer解决表格分类问题。我们提出两点。\textbf{首先}，强化学习是明显的空白：采样一个合成MDP与采样一个合成表格数据集一样可行，然而没有上下文强化学习工作将先验设计作为主要目标。\textbf{其次}，MDP允许一个固定大小的充分统计量，独立于观察到的回合且形状为表格形式，这使得它们直接适用于用于表格基础模型的基于注意力的架构，只需将策略头替换监督目标。这些共同定义了强化学习基础模型的议程。作为概念验证，我们完全在合成MDP上训练一个模型，并表明，无需任务特定的调优，它就能在上下文中解决留出的表格基准，包括在线和离线：在线时，使用比UCB-VI和表格Q-learning少得多的回合；离线时，与VI-LCB竞争。

英文摘要

Foundation models for language and vision are powered by internet-scale data, while structured domains (tabular prediction, time-series forecasting, graph learning, reinforcement learning) are not. The substitute is synthetic data, which shifts the burden from collection to prior design. Such priors already exist for many structured tasks: TabPFN and its successors solve tabular classification with a transformer pretrained on a synthetic Bayesian prior. We make two points. \textbf{First}, reinforcement learning is the conspicuous gap: sampling a synthetic MDP is as feasible as sampling a synthetic tabular dataset, yet no in-context RL work treats prior design as a primary objective. \textbf{Second}, MDPs admit a fixed-size sufficient statistic, independent of the episodes observed and tabular in shape, which makes them directly amenable to the attention-based architectures used for tabular foundation models, with a policy head replacing the supervised target. Together these define the agenda for an RL foundation model. As a proof of concept, we train a model entirely on synthetic MDPs and show that, with no task-specific tuning, it solves held-out tabular benchmarks in context, both online and offline: online, in far fewer episodes than UCB-VI and tabular Q-learning, and offline, competitively with VI-LCB.

2505.18201 2026-06-19 cs.RO cs.LG replacement

Reinforcement Twinning for Hybrid Control of Flapping-Wing Drones

强化孪生用于扑翼无人机的混合控制

Romain Poletti, Lorenzo Schena, Lilla Koloszar, Joris Degroote, Miguel Alfonso Mendez

Affiliations: Environmental and Applied Fluid Dynamics, von Karman Institute for Fluid Dynamics; Department of Mechanical Engineering, Vrije Universiteit Brussel; Department of Electromechanical, Systems and Metal Engineering, Ghent University; Aero-Thermo-Mechanics Laboratory, École Polytechnique de Bruxelles, Université Libre de Bruxelles; Experimental Aerodynamics and Propulsion Lab, Universidad Carlos III de Madrid

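AI summary: Proposes a hybrid model-free/model-based control method for flapping-wing drones in which the reinforcement twinning algorithm combines reinforcement learning with an adaptive digital twin, using transfer learning and a policy referee to improve sample efficiency and control robustness.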

AI中文摘要

控制扑翼无人机需要能够处理来自不完整、有噪声传感器数据的时变、非线性、欠驱动动力学的控制器。人工智能的最新进展，特别是强化学习，通过从环境交互中进行数据驱动的策略优化，为解决此类复杂控制问题开辟了新视角。然而，纯数据驱动方法样本效率低，需要大量甚至不安全的探索，尤其是在缺乏引导物理模型的情况下。这激发了混合人工智能-物理框架。本文提出了一种使用强化孪生算法的混合无模型/基于模型的飞行控制方法。基于模型的组件使用伴随公式和从实时轨迹中连续识别的自适应数字孪生；无模型组件使用强化学习。两个智能体通过迁移学习、模仿学习以及真实环境与数字孪生之间的共享经验来共享知识，并由一个策略裁判协调，该裁判根据数字孪生性能和真实到虚拟一致性比率选择哪个智能体在现实中行动。该框架针对扑翼无人机的纵向控制进行了评估，该无人机被建模为由准稳态气动力驱动的非线性时变系统。混合策略在三种自适应模型初始化下进行了测试：（1）从现有数据进行离线识别，（2）随机初始化并进行完全在线识别，以及（3）使用有偏参数进行离线预训练，然后进行在线自适应。在所有情况下，混合框架在性能、鲁棒性和样本效率方面均优于纯无模型和纯基于模型的方法。

英文摘要

Controlling flapping-wing drones requires controllers that handle time-varying, nonlinear, underactuated dynamics from incomplete, noisy sensor data. Recent advances in artificial intelligence (AI), particularly reinforcement learning (RL), have opened new perspectives for addressing such complex control problems through data-driven policy optimization from interaction with the environment. Yet purely data-driven methods are sample-inefficient, demanding extensive, sometimes unsafe exploration, especially without guiding physical models. This motivates hybrid AI-physics frameworks. This article proposes a hybrid model-free/model-based flight-control approach using the reinforcement twinning algorithm. The model-based (MB) component uses an adjoint formulation and an adaptive digital twin continuously identified from live trajectories; the model-free (MF) component uses RL. The two agents share knowledge via transfer learning, imitation learning, and shared experience between the real environment and the digital twin, coordinated by a policy referee that selects which agent acts in reality based on digital-twin performance and a real-to-virtual consistency ratio. The framework is evaluated for the longitudinal control of a flapping-wing drone, modelled as a nonlinear time-varying system driven by quasi-steady aerodynamic forces. The hybrid strategy is tested under three adaptive-model initializations: (1) offline identification from existing data, (2) random initialization with fully online identification, and (3) offline pre-training with biased parameters followed by online adaptation. In all cases, the hybrid framework improves performance, robustness, and sample efficiency over purely model-free and purely model-based approaches.

2507.19712 2026-06-19 cs.DC cs.AI cs.GT cs.LG cs.NI replacement

Oranits: Mission Assignment and Task Offloading in Open RAN-based ITS using Metaheuristic and Deep Reinforcement Learning

Oranits: 基于Open RAN的智能交通系统中的任务分配与卸载——元启发式与深度强化学习方法

Ngoc Hung Nguyen, Nguyen Van Thieu, Quang-Trung Luu, Anh Tuan Nguyen, Senura Wanasekara, Nguyen Cong Luong, Fatemeh Kavehmadavani, Van-Dinh Nguyen

Affiliation: Department of Smart City, Hanyang University

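AI summary: Proposes the Oranits system model, which accounts for mission dependencies and offloading costs in vehicle cooperation and optimizes them with the metaheuristic algorithm CGG-ARO and the deep reinforcement learning framework MA-DDQN, improving overall benefit by about 7.7% and 12.5%, respectively.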

Comments 16 pages, 13 figures

Journal ref IEEE TRANSACTIONS ON INTELLIGENT TRANSPORTATION SYSTEMS, 2026

AI中文摘要

本文研究了基于开放无线接入网（Open RAN）的智能交通系统（ITS）中的任务分配与卸载问题，其中自动驾驶车辆利用移动边缘计算进行高效处理。现有研究常忽视任务之间的复杂依赖关系以及将任务卸载到边缘服务器的成本，导致决策次优。为弥补这一不足，我们引入了Oranits，一种新颖的系统模型，明确考虑了任务依赖性和卸载成本，同时通过车辆协作优化性能。为此，我们提出了一种双重优化方法。首先，我们开发了一种基于元启发式的进化计算算法，即混沌高斯全局ARO（CGG-ARO），作为单时隙优化的基线。其次，我们设计了一种增强的基于奖励的深度强化学习（DRL）框架，称为多智能体双深度Q网络（MA-DDQN），该框架集成了多智能体协调和多动作选择机制，显著减少了任务分配时间并提高了对基线方法的适应性。大量仿真表明，CGG-ARO将完成任务数量和总体收益分别提高了约7.1%和7.7%。同时，MA-DDQN在任务完成率和总体收益方面分别实现了11.0%和12.5%的更大提升。这些结果凸显了Oranits在动态ITS环境中实现更快、更自适应、更高效任务处理的有效性。

英文摘要

In this paper, we explore mission assignment and task offloading in an Open Radio Access Network (Open RAN)-based intelligent transportation system (ITS), where autonomous vehicles leverage mobile edge computing for efficient processing. Existing studies often overlook the intricate interdependencies between missions and the costs associated with offloading tasks to edge servers, leading to suboptimal decision-making. To bridge this gap, we introduce Oranits, a novel system model that explicitly accounts for mission dependencies and offloading costs while optimizing performance through vehicle cooperation. To achieve this, we propose a twofold optimization approach. First, we develop a metaheuristic-based evolutionary computing algorithm, namely the Chaotic Gaussian-based Global ARO (CGG-ARO), serving as a baseline for one-slot optimization. Second, we design an enhanced reward-based deep reinforcement learning (DRL) framework, referred to as the Multi-agent Double Deep Q-Network (MA-DDQN), that integrates both multi-agent coordination and multi-action selection mechanisms, significantly reducing mission assignment time and improving adaptability over baseline methods. Extensive simulations reveal that CGG-ARO improves the number of completed missions and overall benefit by approximately 7.1% and 7.7%, respectively. Meanwhile, MA-DDQN achieves even greater improvements of 11.0% in terms of mission completions and 12.5% in terms of the overall benefit. These results highlight the effectiveness of Oranits in enabling faster, more adaptive, and more efficient task processing in dynamic ITS environments.

2605.00457 2026-06-19 cs.NI cs.LG cs.SY eess.SY replacement

Utility-Aware DRL-Based TXOP Adaptation for NR-U and Wi-Fi Coexistence Networks

基于策略驱动的DRL的NR-U与Wi-Fi共存中的TXOP自适应

Po-Heng Chou, Yi-Fang Yu, Shou-Yu Chen, Chiapin Wang

Affiliations: Research Center for Information Technology Innovation (CITI), Academia Sinica (AS); Department of Electrical Engineering, National Taiwan Normal University (NTNU)

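AI summary: To address unbalanced spectrum utilization when NR-U and Wi-Fi coexist in unlicensed spectrum, proposes a policy-driven deep reinforcement learning framework whose reward design enables flexible control of the trade-off among fairness, throughput, and utility.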

Comments 15 pages, 13 figures, 2 tables, submitted to IEEE Open Journal of the Communications Society

AI中文摘要

NR-U与Wi-Fi在非授权频谱中的共存引入了一个具有挑战性的共存管理问题，其中异构信道接入机制导致频谱利用的显著不平衡和Wi-Fi性能下降。为了解决这一挑战，我们提出了一种基于策略驱动的深度强化学习（DRL）框架，用于自适应传输机会（TXOP）控制，其中共存过程被建模为马尔可夫决策过程（MDP），深度Q网络（DQN）通过在线交互学习控制策略。一个关键贡献是通过奖励设计引入策略层，从而实现对公平性、吞吐量和效用之间共存权衡的显式控制。开发了三种策略，即绝对公平、适度公平和基于效用的公平，以实现不同的工作点。仿真结果表明，所提出的框架在严格公平控制下实现了高于0.9的Jain公平指数。与绝对公平相比，适度公平将总吞吐量提高了68.22%，而基于效用的策略进一步将效用提高了177.6%。这些结果表明，策略驱动控制为管理异构共存网络中的权衡提供了一种灵活有效的解决方案。

英文摘要

The coexistence of NR-U and Wi-Fi in the unlicensed spectrum introduces a challenging resource management problem, where heterogeneous channel access mechanisms can lead to unbalanced spectrum utilization and severe Wi-Fi performance degradation. To address this issue, this paper proposes a utility-aware deep reinforcement learning (DRL) framework for adaptive transmission opportunity (TXOP) control in NR-U/Wi-Fi coexistence networks. The coexistence process is formulated as a Markov decision process (MDP), in which the NR-U TXOP duration is treated as a controllable variable for regulating post-access channel occupancy. A deep Q-network (DQN) is then employed to learn adaptive TXOP control policies through online interaction with the coexistence environment. A key feature of the proposed framework is the integration of a configurable reward and criterion design, which enables explicit control of the fairness-efficiency-utility tradeoff. Three operating policies are developed, namely absolute fairness, moderate fairness, and utility-oriented moderate fairness, to characterize different coexistence operating points. Simulation results show that the proposed framework achieves a Jain fairness index above 0.9 under strict fairness control. Compared with the absolute fairness policy, the moderate fairness policy improves aggregate throughput by 68.22%, while the utility-oriented policy achieves a 177.6% improvement under the adopted utility evaluation metric. These results demonstrate that the proposed utility-aware DRL framework provides an effective and flexible solution for adaptive TXOP control and tradeoff management in heterogeneous unlicensed coexistence networks.

2605.22748 2026-06-19 cs.RO cs.AI cs.LG cs.MA replacement

Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning

通过多智能体强化学习实现超人类安全且敏捷的赛车

Ismail Geles, Leonard Bauersfeld, Markus Wulfmeier, Davide Scaramuzza

Affiliations: Robotics and Perception Group, University of Zurich; Google DeepMind; Nomagic

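AI summary: Uses multi-agent reinforcement learning to achieve safe and agile high-speed quadrotor racing, showing that multi-agent interaction is key to safety in real-world interaction; the agents outperform a human pilot in high-speed races while reducing collision rates.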

Comments 12 pages (+4 supplementary). Website: https://rpg.ifi.uzh.ch/marl

AI中文摘要

自主系统在孤立或模拟环境中已实现超人类性能，但在共享、动态的真实世界空间中仍显得脆弱。这种失败源于物理应用中主导的单智能体范式，其中其他参与者被忽略或视为环境噪声，阻碍了有效协调。本文证明多智能体强化学习为真实世界交互提供了必要的安全性基础。使用高速四旋翼赛车作为高风险测试平台，训练智能体在复杂空气动力学相互作用和战略机动中导航，具有可变数量的赛车。通过联赛基于的自我对战，智能体进化出复杂的前瞻性行为，包括主动避障、超车和处理多智能体物理交互，包括空气动力学下洗。我们的智能体在超过22米/秒的速度下多玩家赛车中超越了冠军级人类飞行员，同时与最先进的单智能体基线相比，碰撞率减少了50%。关键的是，使用多样化的人工智能体进行训练能够实现零样本泛化到更安全的人类交互。这些结果表明，实现稳健的机器人共存的路径不在于孤立的安全约束，而在于多智能体交互的严格要求。多媒体材料可在：https://rpg.ifi.uzh.ch/marl

英文摘要

Autonomous systems have achieved superhuman performance in isolation or simulation, yet they remain brittle in shared, dynamic real-world spaces. This failure stems from the dominant single-agent paradigm for physical applications, where other actors are ignored or treated as environmental noise, preventing effective coordination. Here we show that multi-agent reinforcement learning provides the essential safety scaffolding required for real-world interaction. Using high-speed quadrotor racing as a high-stakes testbed, we train agents to navigate complex aerodynamic interactions and strategic maneuvering with a variable number of racers. Through league-based self-play, agents evolve sophisticated anticipatory behaviors, including proactive collision avoidance, overtaking, and handling multi-agent physical interactions, including aerodynamic downwash. Our agents outperform a champion-level human pilot in multi-player races at speeds exceeding 22 m/s, while simultaneously reducing collision rates by 50% compared to state-of-the-art single-agent baselines. Crucially, training with diverse artificial agents enables zero-shot generalization to safer human interaction. These results suggest that the path to robust robotic co-existence lies not in isolated safety constraints, but in the rigorous demands of multi-agent interaction. Multimedia materials are available at: https://rpg.ifi.uzh.ch/marl

2606.19377 2026-06-19 cs.LG cs.AI new submission

Emyx: Fast and efficient all-atom protein generation

Emyx: 快速高效的全原子蛋白质生成

Nicholas J. Williams, Ward Haddadin, Matteo P. Ferla, Constantin Schneider, Nicholas B. Woodall, Ruby Sedgwick, Christian D. Madsen, Andrew L. Hopkins, Edward O. Pyzer-Knapp

Affiliation: Xyme

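AI summary: Proposes Emyx, a 140M-parameter flow matching model that reduces complexity through lightweight conditional representations and sparse connectivity; it outperforms existing methods on an enzyme design benchmark while training in just 682 GPU-hours.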

AI中文摘要

计算酶设计需要生成能够支撑催化残基和配体的蛋白质，这要求生成模型同时具备几何准确性和结构多样性。当前的全原子生成模型继承了结构预测中的昂贵架构，导致训练成本高、样本多样性有限。我们认为，对于生成模型而言，这种复杂性大多是不必要的，因为生成模型依赖于稀疏的几何约束而非丰富的共进化信号。Emyx是一个140M参数的条件流匹配模型，将能力集中在标准Transformer块中，用轻量条件表示和稀疏连接替代了厚重的嵌入堆叠。此外，我们推导了流匹配插值到EDM噪声水平框架的精确重参数化，将流匹配训练效率与为扩散模型设计的最先进采样方法桥接起来，无需重新训练。尽管是最小的模型，Emyx在AME酶设计基准上，在要求全局折叠恢复和催化几何准确性的严格评估下，在成功率、结构新颖性、骨架多样性和几何有效性方面均优于Proteína-Complexa和RFdiffusion3，而训练仅需682 GPU小时，约为RFdiffusion3的1/4。

英文摘要

Computational enzyme design requires generating proteins that scaffold catalytic residues and ligands, a task that demands both geometric accuracy and structural diversity from the underlying generative model. Current all-atom generators inherit expensive architectures from structure prediction, leading to high training costs and limited sample diversity. We argue that much of this complexity is unnecessary for generators, which condition on sparse geometric constraints rather than rich co-evolutionary signals. Emyx is a 140M-parameter conditional flow matching model that concentrates capacity within standard transformer blocks, replacing heavy embedding stacks with lightweight conditional representations and sparse connectivity. We additionally derive an exact reparametrisation of the flow matching interpolant into the EDM noise-level framework, bridging flow matching training efficiency with state-of-the-art sampling methods designed for diffusion models without retraining. Despite being the smallest model, Emyx outperforms both Proteína-Complexa and RFdiffusion3 on the AME enzyme design benchmark in success rate (under strict evaluation requiring both global fold recovery and catalytic geometry accuracy), structural novelty, scaffold diversity, and geometric validity, while training in just $682$ GPU-hours, roughly $4\times$ less than RFdiffusion3.

2606.19496 2026-06-19 cs.LG new submission

Calibrating Generative Models to Feature Distributions with MMD Finetuning

使用MMD微调将生成模型校准到特征分布

Nathaniel L. Diamant, Brian L. Trippe

Affiliation: Stanford University

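AI summary: Proposes kCGM, which calibrates a generative model's feature distribution without sacrificing validity by minimizing the maximum mean discrepancy (MMD) between generated and target feature distributions with added KL regularization; it applies to several classes of generative models.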

AI中文摘要

生成模型可以产生个体上合理的样本，但在关键特征分布上与目标集存在显著偏差。例如，在广泛的药物类化学空间上预训练的模型可能生成分子，其分子特征与感兴趣的治疗类别（如已知抗生素）不同。纠正这种分布校准错误具有挑战性：在目标集上直接微调可能导致过拟合，并且无法控制匹配哪些特征。为了填补这一空白，我们引入了核校准生成模型（kCGM）。kCGM使用无偏得分函数估计器最小化生成特征分布与目标特征分布之间的最大均值差异（MMD），并通过KL正则化保持与预训练模型的接近。在一个包含174种抗生素的目标集上，直接微调牺牲了化学有效性以匹配特征分布，而kCGM在提高有效性的同时改善了目标特征匹配。我们还在蛋白质和DNA生成任务中展示了kCGM，表明它可以使用仅特征级别的监督来适应自回归、连续空间扩散和离散扩散模型。代码可在https://this URL获取。

英文摘要

Generative models can produce individually plausible samples while deviating substantially from a target set in the distribution of key features. For example, a model pretrained on broad drug-like chemical space may generate molecules whose molecular features differ from those of a therapeutic class of interest, such as known antibiotics. Correcting such distributional miscalibration is challenging: direct finetuning on the target set can overfit and does not control which features are matched. To fill this gap, we introduce kernel Calibrating Generative Models (kCGM). kCGM minimizes a maximum mean discrepancy (MMD) between generated and target feature distributions using an unbiased score-function estimator, with KL regularization to remain close to the pretrained model. On a target set of 174 antibiotics, direct finetuning sacrifices chemical validity for feature-distribution matching, whereas kCGM improves target feature matching while increasing validity. We further demonstrate kCGM in protein and DNA generation tasks, showing it can adapt autoregressive, continuous-space diffusion, and discrete diffusion models using only feature-level supervision. Code is available at https://github.com/smithhenryd/cgm.

2606.19770 2026-06-19 cs.LG new submission

An Information Theoretic Framework for Graph Novelty Generation via Latent Mixture Modeling

基于潜在混合建模的图新颖性生成的信息论框架

Itsuki Nakagawa, Kenji Yamanishi

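Affiliations: Graduate School of Information Science and Technology, The University of Tokyo

AI summary: Proposes an information-theoretic framework that uses latent mixture modeling and description-length constraints to generate novel graph data that differ from existing patterns while preserving global structural consistency.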

AI中文摘要

我们提出了一个用于图新颖性生成的信息论框架，旨在生成与现有模式不同且保持全局结构一致性的数据。我们的方法将数据嵌入潜在空间，使用有限混合模型对潜在分布进行建模，并通过基于描述长度制定的显式新颖性和可靠性条件生成新颖样本。具体来说，新颖性通过要求生成样本难以被所有现有混合成分解释来强制执行，而可靠性则根据最小描述长度（MDL）原则约束其对整体混合结构的影响。我们提供了理论分析，表明在适当的阈值选择下，将非新颖或不可靠样本错误分类的概率以显式速率收敛到零。在合成和基准图数据集上的实验表明，所提出的方法能够以可量化的风险实现原则性的新颖性生成。

英文摘要

We propose an information-theoretic framework for graph novelty generation, which aims to generate data that are distinct from existing patterns while preserving global structural consistency. Our approach embeds data into a latent space, models the latent distribution using finite mixture models, and generates novel samples by imposing explicit novelty and reliability conditions formulated in terms of description length. Specifically, novelty is enforced by requiring generated samples to be poorly explained by all existing mixture components, while reliability constrains their impact on the overall mixture structure under the Minimum Description Length (MDL) principle. We provide a theoretical analysis showing that, with appropriate threshold choices, the probabilities of misclassifying non-novel or unreliable samples converge to zero with explicit rates. Experiments on synthetic and benchmark graph datasets demonstrate that the proposed method enables principled novelty generation with quantifiable risk.
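
The novelty condition described above (a sample must be poorly explained by every existing mixture component) can be sketched as a code-length test. This is a minimal illustration assuming isotropic Gaussian components and a hand-picked threshold; the function names and the threshold value are hypothetical, and the reliability condition from the abstract is not modeled here.

```python
import numpy as np

def component_code_lengths(z, weights, means, variances):
    """Code length (in nats) of latent point z under each isotropic Gaussian component."""
    d = z.shape[0]
    sq = np.sum((means - z) ** 2, axis=1)
    log_density = -0.5 * (d * np.log(2 * np.pi * variances) + sq / variances)
    # Description length = -log(mixing weight * component density).
    return -(np.log(weights) + log_density)

def is_novel(z, weights, means, variances, threshold):
    """Novel if even the best-fitting component needs more than `threshold` nats."""
    return bool(np.min(component_code_lengths(z, weights, means, variances)) > threshold)

weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [5.0, 5.0]])
variances = np.array([1.0, 1.0])
print(is_novel(np.array([0.0, 0.0]), weights, means, variances, threshold=10.0))
print(is_novel(np.array([20.0, -20.0]), weights, means, variances, threshold=10.0))
```

A point sitting on a component mean has a short code length and is rejected, while a point far from every component passes the novelty test.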

2606.19802 2026-06-19 cs.LG cs.CV 新提交

Flow Map Denoisers: Traversing the Distortion-Perception Plane for Inverse Problems

流映射去噪器：遍历逆问题的失真-感知平面

Nicolas Zilberstein, Morteza Mardani, Santiago Segarra

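Affiliations: Rice University; NVIDIA Inc.

AI summary: Proposes flow map models that move continuously between MMSE and perceptual quality through a single parameter t, realizing the distortion-perception tradeoff for inverse problems without extra supervision or tuning.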

AI中文摘要

图像复原面临一个基本权衡：最小化误差的方法产生模糊重建，而最大化感知质量的方法产生锐利但不够保真的图像。现有方法要么在失真-感知（DP）前沿上固定一个操作点，要么需要配对数据监督、辅助模型或对采样器进行超参数调优以访问不同点。我们证明，流映射模型——一种用于少步采样的流匹配的近期扩展，学习一个平均场——隐式定义了一个单参数去噪器族，连续跨越DP前沿。前瞻参数t充当MMSE和感知区域之间的控制旋钮。对于高斯目标，我们证明改变t精确恢复最优DP前沿；对于自然图像，我们在经验上观察到类似行为。在即插即用求解器中，相同机制扩展到一般逆问题，控制感知对齐与数据一致性之间的权衡。尽管在此设置中缺乏精确最优性保证，单个训练的流映射跨越DP权衡，在两端匹配或超越专门基线。在CelebA（128×128）和AFHQ（256×256）上的多个线性和非线性逆任务的广泛实验验证了我们的发现。

英文摘要

Image restoration faces a fundamental tradeoff: methods that minimize error produce blurry reconstructions, while those that maximize perceptual quality yield sharp but less faithful images. Existing approaches either commit to a single operating point on this distortion-perception (DP) frontier or require paired-data supervision, auxiliary models, or hyperparameter tuning of the sampler to access different points. We show that flow map models, a recent extension of flow matching for few-step sampling that learns an average field, implicitly define a one-parameter family of denoisers that continuously spans the DP frontier. The lookahead parameter t acts as a control knob between the MMSE and perceptual regimes. For Gaussian targets, we prove that varying t exactly recovers the optimal DP frontier; for natural images, we observe similar behavior empirically. Within a Plug-and-Play solver, the same mechanism extends to general inverse problems, where it controls a tradeoff between perceptual alignment and data consistency. Despite the lack of exact optimality guarantees in this setting, a single trained flow map spans the DP tradeoff, matching or exceeding specialized baselines at both extremes. Extensive experiments on CelebA ($128\times 128$) and AFHQ ($256\times 256$) across several linear and nonlinear inverse tasks validate our findings.

2606.19894 2026-06-19 cs.LG 新提交

BrainG3N: A Dual-Purpose Tokenizer for Controllable 3D Brain MRI Generation

Max Van Puyvelde, Ibrahim Gulluk, Wim Van Criekinge, Olivier Gevaert

Affiliations: Department of Biomedical Data Science, Stanford University School of Medicine; Department of Mathematical Modelling, Statistics & Bioinformatics, Ghent University; Department of Electrical Engineering, Stanford University

AI summary: Proposes a tokenizer based on a 3D masked autoencoder that decouples encoder and decoder, outperforms or matches SOTA models on 21 of 23 linear-probing tasks, and supports conditional generation and longitudinal forecasting.

AI中文摘要

三维（3D）脑MRI是临床神经病学和神经肿瘤学的核心，生成模型可以增强代表性不足的队列、模拟疾病轨迹并支持隐私保护的数据共享。潜在扩散已成为建模成像数据的首选解决方案，但它对分词器提出了两个竞争性要求：编码器嵌入必须保留下游任务所需的临床信息，解码器必须重建解剖学上准确的体积。现有的重建驱动分词器以牺牲前者为代价实现了后者。为了解决这个问题，我们引入了一种基于全体积掩码自编码器（MAE）的分词器，用于3D脑MRI潜在扩散，解耦编码器和解码器：冻结的3D MAE编码器产生临床信息丰富的嵌入，而专用的CNN解码器从这些嵌入的线性投影重建体素。我们在来自18个公共队列的35,309个体积上预训练编码器，涵盖四种模态、十种疾病类别和200多个采集站点，并在两种设置中展示了其双重用途。首先，在23项线性探测基准测试中，编码器在21项任务上优于或匹配SOTA模型（即BrainIAC、BrainSegFounder和MedicalNet）。其次，在这些临床信息丰富的嵌入上训练的条件扩散变压器（DiT）支持跨六个变量的条件生成和患者特定的纵向预测。这些结果共同建立了一个单一的3D脑MRI嵌入空间，能够同时支持下游临床任务和可控生成。

英文摘要

Three-dimensional (3D) brain MRI is central to clinical neurology and neuro-oncology, where generative models could augment under-represented cohorts, simulate disease trajectories, and support privacy-preserving data sharing. Latent diffusion has been the go-to solution for modeling imaging data, but it places two competing demands on the tokenizer: encoder embeddings must retain the clinical information that downstream tasks act on, and the decoder must reconstruct anatomically faithful volumes. Existing reconstruction-driven tokenizers achieve the second at the expense of the first. To address this, we introduce a fully volumetric masked-autoencoder (MAE) based tokenizer for 3D brain MRI latent diffusion, decoupling encoder and decoder: a frozen 3D MAE encoder produces clinically informative embeddings, while a dedicated CNN decoder reconstructs voxels from a linear projection of those embeddings. We pretrain the encoder on 35,309 volumes from 18 public cohorts spanning four modalities, ten disease categories, and 200+ acquisition sites, and demonstrate its dual utility in two settings. First, on a 23-task linear-probing benchmark, the encoder outperforms or matches SOTA models (i.e., BrainIAC, BrainSegFounder, and MedicalNet) on 21 of 23 tasks. Second, a conditional diffusion transformer (DiT) trained on these clinically informative embeddings supports both conditional generation across six variables and patient-specific longitudinal forecasting. Together these results establish a single 3D brain-MRI embedding space capable of both downstream clinical tasks and controllable generation.

2606.20094 2026-06-19 cs.CV cs.AI cs.GR cs.LG cs.MM 交叉投稿

MakeupMirror: Improving Facial Attribute Preservation in Diffusion Models for Makeup Transfer

MakeupMirror：在用于化妆迁移的扩散模型中改进面部属性保持

Nefeli Andreou, Angel Martínez-González, Sabine Sternig, Matthieu Guillaumin, Epameinondas Antonakos, Michael Opitz

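Affiliations: Amazon

AI summary: Proposes MakeupMirror, a diffusion model that combines ControlNet geometry conditioning, region-specific transfer control, skin-tone modulation, and a Langevin sampler to deliver high-quality makeup transfer while preserving facial features and skin tone; relative to Stable-Makeup it improves face-recognition similarity by 60% and reduces skin-tone difference by 50%.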

AI中文摘要

化妆迁移模型能够实现有趣的增强现实（AR）体验以及在线化妆购物的虚拟试妆（VTO）。尽管最近最先进的基于扩散的解决方案（如Stable-Makeup）显著提高了化妆迁移的准确性和逼真度，但在身份和肤色保持方面仍存在局限性，使得用于化妆购物的生产级VTO不切实际。在这项工作中，我们提出了MakeupMirror，一种基于扩散的化妆迁移方法，在保持面部特征和肤色方面取得了显著进展。我们在Stable-Makeup的基础上引入了多项技术创新：（1）将面部几何条件与ControlNets集成以保持面部保真度；（2）区域特定的化妆迁移控制，以便在面部区域（如皮肤、眼睛和嘴唇）实现精确的化妆应用；（3）基于肤色的化妆迁移调制，防止跨主体迁移场景中的肤色改变；（4）集成Levenberg-Marquardt Langevin采样器以加速推理同时保持生成质量。我们在CPM-Real、Makeup Wild以及（本文新收集的、更多样化的）MakeupSelfies数据集上的实验表明，与Stable-Makeup相比，MakeupMirror将相对面部识别相似度提高了+60%，将相对肤色差异降低了-50%，延迟为0.7秒，同时在核心面部身份保持标准上达到了94%的专家接受率。

英文摘要

Makeup transfer models enable fun augmented reality (AR) experiences as well as virtual try-on (VTO) for online makeup shopping. While recent state-of-the-art diffusion-based solutions such as Stable-Makeup dramatically improve the accuracy and realism of makeup transfer, they still face limitations in identity and skin color preservation, making production-level VTO for makeup shopping unrealistic. In this work, we propose MakeupMirror, a diffusion-based approach to makeup transfer that makes significant progress towards preserving facial features and skin tone. We introduce several technical innovations over Stable-Makeup: (1) integration of facial geometry conditioning with ControlNets to maintain facial fidelity; (2) region-specific makeup transfer control to enable precise makeup application across facial regions such as skin, eyes and lips; (3) skin tone-based makeup transfer modulation that prevents skin tone alteration in cross-subject transfer scenarios; and (4) integration of a Levenberg-Marquardt Langevin sampler to speed up inference while maintaining generation quality. Our experiments on the CPM-Real, Makeup Wild, and (herein newly collected, more diverse) MakeupSelfies datasets show that, relative to Stable-Makeup, MakeupMirror improves relative facial recognition similarity by 60% and reduces relative skin tone difference by 50%, with a latency of 0.7s, while achieving an expert acceptance rate of 94% across core facial identity preservation criteria.

2606.20457 2026-06-19 eess.AS cs.AI cs.LG 交叉投稿

Repurposing a Speech Classifier for Guided Diffusion-Based Speech Generation

重新利用语音分类器进行基于引导扩散的语音生成

Rostislav Makarov, Timo Gerkmann

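AI summary: Proposes using a pretrained speech classifier as the backbone for diffusion generation: a lightweight subnetwork is attached and only that subnetwork is trained, giving high-quality conditional speech generation in a single-backbone model with lower memory and compute cost.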

Comments Accepted for publication in the Proceedings of Interspeech 2026

AI中文摘要

分类器引导是一种通过使用噪声条件分类器将采样过程导向目标类别来控制扩散生成的方法。分类器引导的一个缺点是需要两个单独训练的模型：一个分类器和一个扩散模型。因此，我们研究了一种更紧凑的替代方案，其中将传统训练的语音分类器重新用作扩散生成的主干。从log-Mel空间中的冻结噪声条件分类器开始，我们附加一个轻量子网络，该子网络重用中间分类器表示，并在去噪分数匹配目标下仅训练该子网络。我们的工作表明，预训练的分类器可以重新用于条件生成，为判别建模和条件语音合成之间提供了有吸引力的桥梁，从而在单主干模型中实现高语音质量，同时减少内存占用和计算成本。

英文摘要

Classifier guidance is a way to control diffusion generation by using a noise-conditioned classifier to steer the sampling process toward a target class. One drawback of classifier guidance is that it requires two separately trained models: a classifier and a diffusion model. We therefore study a more compact alternative in which a conventionally trained speech classifier is repurposed as the backbone for diffusion generation. Starting from a frozen noise-conditioned classifier in log-Mel space, we attach a lightweight subnetwork that reuses intermediate classifier representations and train only this subnetwork under a Denoising Score Matching objective. Our work shows that a pretrained classifier can be repurposed for conditional generation, providing an appealing bridge between discriminative modeling and conditional speech synthesis. The result is high speech quality within a single-backbone model, with reduced memory footprint and computational cost.
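
For context, the classifier guidance that this abstract starts from is usually written as a Bayes-rule decomposition of the conditional score. The notation below ($p_t$ for the noised density at time $t$, $y$ for the target class, $w$ for a guidance scale) is assumed for illustration and is not taken from the paper.

```latex
\begin{align}
  \nabla_x \log p_t(x \mid y)
    &= \nabla_x \log p_t(x) + \nabla_x \log p_t(y \mid x), \\
  s_w(x, t, y)
    &= \nabla_x \log p_t(x) + w \, \nabla_x \log p_t(y \mid x).
\end{align}
```

The first term is what a diffusion model estimates and the second is the gradient of a noise-conditioned classifier, which is why the standard recipe needs two separately trained models.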

2507.05169 2026-06-19 cs.LG cs.AI cs.CL cs.CV cs.RO 版本更新

Critique of World Model

世界模型批判：一种用于世界建模的生成式潜在预测架构

Eric Xing, Mingkai Deng, Jinyu Hou

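AI summary: Starting from "hypothetical thinking" in psychology, this paper argues that the core goal of a world model is to simulate all actionable possibilities of the real world, and designs a generative latent prediction (GLP) architecture built on stateful, hierarchical, multi-level, mixed continuous/discrete representations.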

AI中文摘要

基于用于诊断 Transformer 权重分布的双参数 Weibull 框架，我们研究了为什么在 AdamW 训练期间 Weibull 权重尺度参数 λ 会增长、过冲然后松弛。我们从 AdamW 更新中推导出平方权重范数的领先阶三力分解：一个对齐力，测量权重与自适应更新方向之间的相关性；一个注入力，来自自适应步长幅度；以及一个衰减力，来自解耦的权重衰减。在具有真实优化器矩的自训练 Pythia-70M 模型上，对齐力主导上升阶段，在四个随机种子中贡献了绝对力预算的 88-94%，并且对超权重移除具有鲁棒性。接近饱和时，对齐力和衰减力趋于平衡，解释了从权重尺度增长到松弛的转变。这些力动态直接控制 λ(t) 背后的平方范数分量；剩余的 RMS 到 Weibull 重建偏移是可测量的，并分解为桥接分量和积分分量，在密集采样区域总计约 5-6%。为了将分析扩展到无法获得优化器矩的真实模型，我们引入了一种样条位移方法，该方法从稀疏检查点以约 92-94% 的准确率恢复对齐力，大约是朴素两点基线的两倍。我们进一步观察到，在我们的实验中，λ(t) 的峰值随训练数据一致性而变化，这表明权重尺度增长存在数据依赖成分，我们将其留待后续对照研究。代码和数据可在 https://github.com/tiexinding/NPM-Weibull-public 获取。

英文摘要

Building on a two-parameter Weibull framework for diagnosing transformer weight distributions, we study why the Weibull weight-scale parameter $λ$ grows, overshoots, and then relaxes during AdamW training. We derive a leading-order three-force decomposition of the squared weight norm from the AdamW update: an alignment force measuring the correlation between weights and the adaptive update direction, an injection force from adaptive step magnitude, and a decay force from decoupled weight decay. On self-trained Pythia-70M models with ground-truth optimizer moments, alignment dominates the rise phase, contributing 88-94% of the absolute force budget across four random seeds and remaining robust to super-weight removal. Near saturation, alignment and decay approach balance, explaining the transition from weight-scale growth to relaxation. These force dynamics directly govern the squared-norm component underlying $λ(t)$; the remaining RMS-to-Weibull reconstruction offset is measurable and decomposes into bridge and integration components, totaling approximately 5-6% in densely sampled regions. To extend the analysis to real models where optimizer moments are unavailable, we introduce a spline displacement method that recovers the alignment force from sparse checkpoints with approximately 92-94% accuracy, about twice the naive two-point baseline. We further observe that the peak value of $λ(t)$ varies with training-data coherence in our experiments, suggesting a data-dependent component of weight-scale growth that we leave to a controlled follow-up study. Code and data are available at https://github.com/tiexinding/NPM-Weibull-public.

2606.19369 2026-06-19 cs.LG cs.AI 新提交

Zero-Inflated Gaussian Distributions Enable Parameter-Space Sparsity in Estimation-of-Distribution Algorithms

零膨胀高斯分布使估计分布算法中的参数空间稀疏化

Andreas Faust, Sven Nitzsche, Juergen Becker

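Affiliations: University of Freiburg; FZI Research Center for Information Technology; Karlsruhe Institute of Technology

AI summary: Proposes multivariate zero-inflated Gaussian distributions as the sampling distribution of estimation-of-distribution algorithms, jointly optimizing sparsity patterns and active parameters without hand-designed sparsity operators; on the Lunar Lander benchmark it converges faster and reaches higher final returns.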

AI中文摘要

估计分布算法（EDA）是一类强大的黑箱优化进化方法，尤其当目标函数结构未知时。经典进化算法依赖于手工设计的变异和交叉算子，这些算子难以针对未知问题结构设计，且是偏差的来源，而EDA完全绕过了算子设计：它们将概率分布拟合到最佳个体，并从中采样下一代。EDA在连续参数空间上已得到充分确立，但此前尚未推广到稀疏空间——其中良好解的大多数系数恰好为零。现有的稀疏黑箱优化器因此重新引入了EDA旨在避免的东西：手工制作的稀疏算子、支持集与活跃值交替的双层方案、零阈值以及其他内置假设。我们通过提出多元零膨胀高斯（ZIG）分布作为EDA采样法则来填补这一空白。一个具有独立指示维度和值维度的潜在高斯模型表示稀疏模式、活跃参数之间的相关性以及两者之间的相互作用，因此稀疏模式和活跃值被联合优化，无需层次结构。我们证明该模型的潜在参数可以从观测样本中识别，不同于相关构造起源的缺失数据设置，并引入了实用的基于摊销反演的估计器。这些估计器准确恢复潜在相关结构，在Lunar Lander基准上，由此产生的ZIG-EDA比稠密高斯EDA、手工制作的稀疏进化算法和特设稀疏EDA收敛更快且最终回报更高，同时找到的控制器只有一小部分参数活跃。

英文摘要

Estimation-of-distribution algorithms (EDAs) are a powerful class of evolutionary methods for black-box optimization, especially when little is known about the structure of the objective. Classical evolutionary algorithms rely on hand-designed mutation and crossover operators, which are hard to devise for unknown problem structures and are a source of bias. EDAs sidestep operator design entirely: they fit a probability distribution to the best individuals and sample the next generation from it. EDAs are well established on continuous parameter spaces, but they have not previously been generalized to sparse ones, in which most coefficients of a good solution are exactly zero. Existing sparse black-box optimizers therefore reintroduce exactly what EDAs were designed to avoid: hand-crafted sparsity operators, bi-level schemes alternating between support set and active values, zeroing thresholds, and other baked-in assumptions. We close this gap by proposing multivariate zero-inflated Gaussian (ZIG) distributions as EDA sampling laws. A latent Gaussian model with separate indicator and value dimensions represents sparsity patterns, correlations among active parameters, and the interactions between the two, so sparsity patterns and active values are optimized jointly, hierarchy-free. We show that the latent parameters of this model are identifiable from observed samples, unlike in the missing-data settings where related constructions originate, and introduce practical amortized inversion-based estimators for them. The estimators accurately recover latent correlation structures, and on the Lunar Lander benchmark the resulting ZIG-EDA converges faster and reaches higher final returns than a dense Gaussian EDA, a hand-crafted sparse evolutionary algorithm, and an ad-hoc sparse EDA, while finding controllers with only a small fraction of parameters active.
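
One way to picture a latent Gaussian with separate indicator and value dimensions is the sampler below. It is a sketch under an assumed construction (a parameter is active when its latent indicator coordinate is positive); the abstract does not spell out the exact link, so the thresholding rule and the name `sample_zig` are illustrative assumptions.

```python
import numpy as np

def sample_zig(mean, cov, n_samples, rng):
    """Draw sparse vectors from a 2d-dimensional latent Gaussian.

    The first d latent coordinates act as indicators and the last d as values.
    Assumed rule: parameter i is active iff its indicator coordinate is > 0.
    """
    d = mean.shape[0] // 2
    latent = rng.multivariate_normal(mean, cov, size=n_samples)
    indicators = latent[:, :d] > 0.0
    values = latent[:, d:]
    return np.where(indicators, values, 0.0)

rng = np.random.default_rng(0)
# Indicator means of -1 and +1 make the first parameter mostly zero
# and the second parameter mostly active.
mean = np.array([-1.0, 1.0, 0.0, 0.0])
cov = np.eye(4)
samples = sample_zig(mean, cov, n_samples=5000, rng=rng)
zero_fraction = (samples == 0.0).mean(axis=0)
print(zero_fraction)
```

Off-diagonal blocks of `cov` would couple sparsity patterns with active values, which is the kind of joint structure the abstract says the model represents.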

2606.19491 2026-06-19 cs.LG stat.ML 新提交

Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale

LayerNorm Transformer 中的代数死方向：一种仅需前向传播的大语言模型规模诊断方法

Tejas Pradeep Shirodkar, P. J. Narayanan

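Affiliations: IIIT, Hyderabad

AI summary: Shows that the inverse-scale direction of LayerNorm is an exact algebraic kernel of the post-final-norm centred activation covariance matrix, so the dead direction can be read from the parameters alone with no forward or backward pass, and validates this on 14 pretrained models.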

Comments 34 pages, 7 figures, 6 tables. Empirical companion to arXiv:2606.05957

AI中文摘要

预训练 Transformer 位于损失函数的奇异极小值附近，此时 Fisher 信息度量沿死方向退化：参数空间中方向性 Fisher 为零的方向。通常定位这样的方向需要一次前向传播和激活矩阵的特征分解，或基于采样的复杂度估计；没有一种方法能仅从网络参数计算方向。我们针对 LayerNorm Transformer 给出了一个这样的方向。LayerNorm 仿射的逆尺度方向 $\gamma^{-1}/\|\gamma^{-1}\|$ 是后最终归一化中心激活协方差矩阵的精确代数核，适用于任何输入分布，并在参数空间中诱导出相应的死方向。它仅从 LN 尺度参数读取，无需前向或后向传播，无需特征分解：这是针对 LayerNorm 的最廉价死方向读取方法。我们在 14 个预训练 Transformer（9 个 LayerNorm，5 个 RMSNorm；160M-35B；语言和视觉目标）上进行了测试。在随机初始化时，预测方向与测量的底部奇异方向（一次前向传播，直接 SVD）在 9/9 的 LayerNorm 模型上匹配到小数点后四位，并在 5/5 的 RMSNorm 模型上正确缺失，后者缺乏产生该方向的均值减法投影器。在训练后的检查点上，沿该方向的协方差特征值加深约 ${\sim}10^3$ 倍，并打开更多死方向；随机初始化到训练后的差距是一次前向传播、每检查点沿预测坐标的奇异结构读出。由此得出两个闭式结论：残差流的最小奇异值在 13/14 个 Transformer 上逐块保持不变（在其自身输入分布上测量），唯一的例外（Gemma$4$-$31$B）是一个真正的死方向，同一读出可精确定位；核方向的存在从参数本身即可对 Transformer 的归一化进行分类。

英文摘要

Pretrained transformers sit near singular minima of the loss, where the Fisher information metric degenerates along dead directions: directions in parameter space along which the directional Fisher vanishes. Locating such a direction normally needs a forward pass and an eigendecomposition of activations, or a sampling-based complexity estimate; none returns a direction computable from the network's parameters alone. We give one, for LayerNorm transformers. The inverse-scale direction $γ^{-1}/\|γ^{-1}\|$ of the LayerNorm affine is an exact algebraic kernel of the post-final-norm centred activation covariance, for any input distribution, and induces a corresponding dead direction in parameter space. It is read from the LN scale parameter alone, with no forward or backward pass and no eigensolve: the cheapest dead-direction read, specific to LayerNorm. We test it on $14$ pretrained transformers ($9$ LayerNorm, $5$ RMSNorm; $160$M-$35$B; language and vision objectives). At random initialisation the predicted direction matches the measured bottom singular direction (one forward pass, direct SVD) to four decimal places on $9/9$ LayerNorm models, and is correctly absent on $5/5$ RMSNorm models, which lack the mean-subtraction projector that creates it. On the trained checkpoint the covariance eigenvalue along this direction deepens by ${\sim}10^3\times$ and further dead directions open; the random-init-to-trained gap is a one-forward-pass, per-checkpoint readout of singular structure along the predicted coordinate. Two consequences follow in closed form: the residual stream's smallest singular value is preserved block-to-block on $13/14$ transformers measured on their own input distribution, the one exception (Gemma$4$-$31$B) a genuine dead direction the same read pinpoints; and the kernel direction's presence classifies a transformer's normalisation from the parameters alone.
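
The kernel claim for the inverse-scale direction can be checked numerically on a single LayerNorm. Because the normalized activations sum to zero across features, the inner product of $γ^{-1}$ with the centred output vanishes for every input. The sketch below uses random data and a random $γ$; the sizes, seed and helper name are arbitrary choices for illustration, not the paper's experimental setup.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Standard LayerNorm over the last axis with affine scale and shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
d, n = 16, 1000
gamma = rng.uniform(0.5, 2.0, size=d)
beta = rng.normal(size=d)
x = rng.normal(size=(n, d)) @ rng.normal(size=(d, d))  # correlated inputs

y = layer_norm(x, gamma, beta)
cov = np.cov(y, rowvar=False)  # centred activation covariance

# Predicted dead direction, read from the scale parameter alone.
v = (1.0 / gamma) / np.linalg.norm(1.0 / gamma)
residual = np.linalg.norm(cov @ v)

# A random unit direction for comparison.
w = rng.normal(size=d)
w /= np.linalg.norm(w)
baseline = np.linalg.norm(cov @ w)
print(residual, baseline)
```

The residual sits at floating-point noise while the random direction does not, and the result does not depend on `eps` because the mean subtraction alone forces the normalized activations to sum to zero.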

2606.19521 2026-06-19 cs.LG math.OC 新提交

Interactive Pareto navigation for deep multi-task learning

深度多任务学习的交互式帕累托导航

Augustina C. Amakor, Konstantin Sonntag, Sebastian Peitz

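Affiliations: Department of Computer Science, TU Dortmund, Dortmund, Germany; Lamarr Institute for Machine Learning and Artificial Intelligence

AI summary: Proposes the Preference Pareto Exploration (PPE) framework, which steers preferences along tangent directions of the Pareto manifold with a predictor-corrector method and uses a Krylov subspace method to avoid Hessian computations, enabling efficient interactive multi-objective optimization.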

AI中文摘要

在多任务学习中，处理越来越多的目标在计算资源和决策者选择适当权衡的能力方面都很快变得具有挑战性。因此，一种广泛使用的方法是通过加权和将各个损失聚合到单个损失函数中。这通常由于帕累托前沿的形状而无法捕捉决策者的偏好，或者需要多次调整和计算，这在深度学习应用中变得过于昂贵。为了解决这些问题，我们引入了一个新颖的框架，偏好帕累托探索（PPE），它在交互式探索过程中强制执行决策者的偏好，同时考虑帕累托集的几何形状。PPE基于预测-校正方法，该方法沿着帕累托最优解流形的切线方向执行预测步骤，遵循决策者的偏好。随后的校正步骤产生反映该偏好的新权衡。为了在表征流形切空间时避免显式的Hessian计算，我们采用了一种仅依赖于矩阵-向量乘积的Krylov子空间方法。这些乘积可以通过自动微分高效获得，确保了整个优化过程的效率和鲁棒性。该方法的有效性和性能通过玩具问题和深度学习示例进行了展示。

英文摘要

In multi-task learning, handling an increasing number of objectives can quickly become challenging, both in terms of the computational resources and the decision maker's capacity to choose appropriate trade-offs. A widely used approach is thus to aggregate the individual losses in a single loss function by a weighted sum. This often either fails to capture the decision maker's preferences, because of the shape of the Pareto front, or requires multiple adjustments and computations, which become prohibitively expensive in deep learning applications. To address these issues, we introduce a novel framework, Preference Pareto Exploration (PPE), which enforces the decision maker's preferences while accounting for the geometry of the Pareto set in an interactive exploration process. PPE is based on a predictor-corrector method that performs predictor steps tangential to the manifold of Pareto-optimal solutions, following the decision maker's preference. The subsequent corrector step results in a new trade-off reflecting this preference. To avoid explicit Hessian computations when characterizing the tangent space of the manifold, we employ a Krylov subspace method that relies solely on matrix-vector products. These products can be efficiently obtained via automatic differentiation, ensuring both efficiency and robustness throughout the optimization process. The method's functionality and performance are demonstrated using both toy problems and examples from deep learning.

2606.19652 2026-06-19 cs.LG 新提交

Convex training of Lipschitz-regularized shallow neural networks

Lipschitz正则化浅层神经网络的凸训练

Chao Yin, Antoine Lesage-Landry

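Affiliations: Polytechnique Montréal, GERAD & Mila, Montréal, QC, Canada

AI summary: Proposes a convex restriction of the non-convex Lipschitz-regularized training problem that can be solved to global optimality and used as a post-processing step for pre-trained networks, improving adversarial robustness and accuracy.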

AI中文摘要

在这项工作中，我们引入了一种针对浅层神经网络的训练程序，该程序能够提升对对抗攻击的鲁棒性。我们通过引入一个凸限制来解决非凸的Lipschitz正则化训练问题，该凸限制可以高效地求解全局最优解。我们的方法可以作为后处理步骤，将预训练网络作为初始解，然后求解凸规划，其最优网络保证不劣于初始网络。我们通过在对抗设置下使用真实世界数据集进行回归任务的实验，展示了我们训练程序的改进。数值结果表明，与现有方法相比，求解我们提出的凸规划得到的网络在Lipschitz正则化程序上具有更低的目标值。此外，我们表明，在某些数据集上，使用我们的凸训练程序获得的网络在对抗攻击下既更准确又更鲁棒。

英文摘要

In this work, we introduce a training procedure for shallow neural networks that promotes robustness against adversarial attacks. We solve a non-convex Lipschitz-regularized training program by introducing a convex restriction that can be efficiently solved to global optimality. Our approach can be employed as a post-processing step: a pre-trained network is taken as an initial solution, and the convex program is then solved, with its optimal network guaranteed to be no worse than the initial one. We illustrate the improvements of our training procedure with experiments using real-world datasets for regression tasks under an adversarial setting. We show numerically that solving our proposed convex program yields networks with lower objective values on the Lipschitz-regularized program compared to existing methods. Additionally, we show that on certain datasets, networks obtained using our convex training program are both more accurate and more robust with respect to adversarial attacks.

2606.19876 2026-06-19 cs.LG math.OC 新提交

Global Convergence of Gradient Descent for Score Matching in Gaussian Mixtures via Reverse Fisher Divergence

通过反向Fisher散度实现高斯混合模型中得分匹配的梯度下降全局收敛

Alexander Tyurin

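AI summary: Studies the global convergence of gradient descent for fitting Gaussian mixture models under the reverse Fisher divergence, proving that student components converge to their closest teacher components from arbitrary or random initialization, and giving conditions for convergence in total variation distance.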

AI中文摘要

得分匹配问题是现代生成建模、扩散模型、拟合非归一化统计模型和逆问题中的核心训练目标。标准方法是最小化前向Fisher散度，其中期望相对于教师分布取。然而，最近结果表明，即使在简单的高斯混合模型设置中，该目标也可能导致不良且依赖初始化的收敛行为。本文研究另一种目标：反向Fisher散度，其中期望相对于学生分布取。我们分析梯度下降（GD）拟合高斯混合模型，并表明目标函数的这一改变导致显著更好的优化性质。首先，当教师分布是单个高斯分布且学生是固定权重和单位协方差的高斯混合模型时，我们证明了从任意初始化出发GD的全局收敛性。其次，我们将分析扩展到教师也是高斯混合模型的情况，并在全局随机初始化方案和目标均值满足$\widetilde{\Omega}(1)$-分离假设下证明了全局收敛保证。特别地，以高概率，每个学生分量收敛到其最近的教师分量，并且我们提供了学生分布在全变差距离下收敛的条件。我们的证明依赖于基于Lyapunov的梯度下降动力学新分析，表明反向Fisher散度比前向Fisher散度具有更有利的优化景观。

英文摘要

The score matching problem is a central training objective in modern generative modeling, diffusion models, fitting unnormalized statistical models, and inverse problems. A standard approach is to minimize the forward Fisher divergence, where the expectation is taken with respect to the teacher distribution. However, recent results show that even in simple Gaussian mixture model settings, this objective can lead to undesirable and initialization-dependent convergence behavior. In this paper, we study an alternative objective: the reverse Fisher divergence, where the expectation is taken with respect to the student distribution. We analyze gradient descent (GD) for fitting Gaussian mixture models and show that this change in the objective leads to significantly better optimization properties. First, when the teacher distribution is a single Gaussian and the student is a Gaussian mixture model with fixed weights and identity covariances, we prove the global convergence of GD from arbitrary initializations. Second, we extend the analysis to the case where the teacher is also a Gaussian mixture model and prove global convergence guarantees under a global random initialization scheme and a $\widetilde{\Omega}(1)$-separation assumption on the target means. In particular, with high probability, each student component converges near its closest teacher component, and we provide conditions under which the student distribution converges in total variation distance. Our proofs rely on a new Lyapunov-based analysis of the gradient descent dynamics, showing that the reverse Fisher divergence has a much more favorable optimization landscape than the forward Fisher divergence.
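
The two objectives contrasted above differ only in which distribution the expectation is taken under. Writing $p$ for the teacher and $q_\theta$ for the student (notation assumed here; normalizing constants such as a factor of $1/2$ vary between papers), they can be written as:

```latex
\begin{align}
  \text{forward:} \quad
  F(p \,\|\, q_\theta)
    &= \mathbb{E}_{x \sim p}
       \left\| \nabla_x \log p(x) - \nabla_x \log q_\theta(x) \right\|^2, \\
  \text{reverse:} \quad
  F(q_\theta \,\|\, p)
    &= \mathbb{E}_{x \sim q_\theta}
       \left\| \nabla_x \log q_\theta(x) - \nabla_x \log p(x) \right\|^2.
\end{align}
```

In the reverse form the sampling distribution itself depends on $\theta$, so the gradient with respect to $\theta$ picks up a contribution from the expectation as well as from the integrand.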

2606.19878 2026-06-19 cs.LG math.OC stat.ML 新提交

On the Oracle Complexity of Interpolation-Based Gradient Descent

基于插值的梯度下降的预言复杂度

Dongmin Lee, William Lu, Anuran Makur

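Affiliations: Purdue University

AI summary: Proposes piecewise polynomial interpolation-based gradient descent (PPI-GD), which queries the first-order oracle at equidistant points in the data domain to build polynomial interpolants that approximate the full gradient; analyzes its oracle complexity for strongly convex and non-convex losses and shows it outperforms several GD variants when the data dimension is bounded and the loss is sufficiently smooth.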

Comments 16 pages, 2 figures

DOI: 10.1109/TAC.2026.3682210

AI中文摘要

最近关于经验风险最小化（ERM）的一阶优化器的工作表明，可以利用ERM损失函数在训练数据中的光滑性（而非优化参数中的光滑性）来改进梯度下降（GD）方法的预言复杂度。在本文中，我们提出了一种不精确梯度方法——分段多项式插值梯度下降（PPI-GD），该方法通过在数据域中的等距点处查询一阶预言来近似每次迭代中的全梯度，从而在数据域的适当大小的块上构造所得梯度样本的多项式插值。我们分析了PPI-GD在强凸和非凸损失函数下的预言复杂度，其中数据空间维数以训练样本数量的多对数函数为界，并发现当损失函数足够光滑时，PPI-GD在关键区域优于几种GD变体。此外，我们的分析将双三次样条插值误差分析中的几种技术扩展到$d$变量张量积多项式插值的设置中，这可能对插值分析具有独立意义。

英文摘要

Recent work on first-order optimizers for empirical risk minimization (ERM) has suggested that smoothness of ERM loss functions in the training data, rather than in the optimization parameters, can be leveraged to improve the oracle complexity of gradient descent (GD) methods. In this paper, we propose an inexact gradient method, piecewise polynomial interpolation-based gradient descent (PPI-GD), which approximates the full gradient in each iteration by querying the first-order oracle at equidistant points in the data domain to construct polynomial interpolants of the resulting gradient samples over appropriately sized patches of the data domain. We analyze the oracle complexity of PPI-GD for strongly convex and non-convex loss functions when the data space dimension is bounded by a polylogarithmic function of the number of training samples, and find that it outperforms several GD variants in key regimes when the loss function is sufficiently smooth. Furthermore, our analysis extends several techniques from the error analysis of bicubic spline interpolants to the setting of $d$-variate tensor product polynomial interpolants, which may be of independent interest in interpolation analysis.

2606.19891 2026-06-19 cs.LG 新提交

Adversarial Bandit Optimization with Globally Bounded Perturbations to Convex Losses

具有全局有界扰动的凸损失对抗性赌博机优化

Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto

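Affiliations: Department of Informatics, Kyushu University; RIKEN AIP

AI summary: Studies adversarial bandit optimization where losses may be non-convex and non-smooth, proposes a modified bandit optimization algorithm, analyzes how the perturbation budget affects regret, and extends the globally budgeted, post-action perturbation model from linear losses to general convex and β-smooth losses.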

AI中文摘要

我们研究对抗性赌博机优化，其中损失函数可能非凸且非光滑。在每一轮中，学习者选择一个动作并仅观察该动作产生的损失。损失由一个潜在的凸且β-光滑分量和一个对抗性扰动组成，该扰动可能在观察学习者的动作后选择。扰动受全局预算约束，控制其随时间累积的幅度。该框架将全局预算的后行动扰动模型从线性损失扩展到一般凸且β-光滑损失。对于这个更广泛的类别，我们建立了期望遗憾保证，明确刻画了扰动预算的影响。为了建立这些保证，我们修改了一个标准的赌博机优化算法，并开发了一种分析来控制由扰动引起的额外遗憾。在没有扰动的情况下，我们的结果退化为具有β-光滑损失的标准赌博机凸优化设置的遗憾保证。

英文摘要

We study adversarial bandit optimization in which the loss functions may be non-convex and non-smooth. In each round, the learner selects an action and observes only the loss incurred at that action. The loss consists of an underlying convex and $β$-smooth component and an adversarial perturbation that may be chosen after observing the learner's action. The perturbations are subject to a global budget controlling their cumulative magnitude over time. This framework extends the globally budgeted, post-action perturbation model from underlying linear losses to general convex and $β$-smooth losses. For this broader class, we establish expected regret guarantees that explicitly characterize the effect of the perturbation budget. To establish these guarantees, we modify a standard bandit optimization algorithm and develop an analysis that controls the additional regret caused by the perturbations. In the absence of perturbations, our results reduce to regret guarantees for the standard bandit convex optimization setting with $β$-smooth losses.

2606.20075 2026-06-19 cs.LG cs.CL 新提交

What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis

什么使得潜在思维链中的监督有效：一种信息论分析

Xinghao Chen, Chak Tou Leong, Wenjin Guo, Jian Wang, Wenjie Li, Xiaoyu Shen

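Affiliations: Ningbo Institute of Digital Twin, Eastern Institute of Technology; Department of Computing, The Hong Kong Polytechnic University

AI summary: Analyzes supervision failure in latent chain-of-thought from an information-theoretic perspective, proposes two dimensions (Trajectory Supervision and Space Supervision), introduces the Unified Latent Probe (ULP) to quantify information fidelity, and reveals an information-performance binding.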

AI中文摘要

潜在思维链（Latent Chain-of-Thought, CoT）将推理内化到连续隐藏状态中，为冗长的离散推理轨迹提供了一种有前景的替代方案。然而，鲁棒的潜在推理仍然困难，因为结果监督提供的学习信号较弱，且容易导致潜在轨迹发生语义漂移。在这项工作中，我们从信息论角度分析潜在CoT，并将这种失效识别为双重崩溃：优化路径上的梯度衰减和潜在空间中的表征漂移。我们进一步将过程监督分解为两个互补维度：轨迹监督（注入密集的逐步推理信号）和空间监督（保持潜在流形的语义结构）。我们的分析表明，刚性几何压缩可能坍缩推理空间，而生成式重建提供了更灵活的语义锚点，更好地保留了信息容量。为了衡量这些效应，我们引入了统一潜在探针（Unified Latent Probe, ULP），用于量化潜在轨迹与显式推理步骤之间的互信息。实验揭示了清晰的信息-性能绑定关系：推理准确性取决于潜在链中保留的信息保真度。这些发现为潜在推理监督提供了一个原则性框架，并建议从几何模仿转向互信息最大化。我们的代码可在 https://github.com/EIT-NLP/Supervision-in-Latent-CoT 获取。

英文摘要

Latent Chain-of-Thought (CoT) internalizes reasoning within continuous hidden states, offering a promising alternative to verbose discrete reasoning traces. However, robust latent reasoning remains difficult because outcome supervision provides weak learning signals and leaves latent trajectories prone to semantic drift. In this work, we analyze Latent CoT from an information-theoretic perspective and identify this failure as a dual collapse: gradient attenuation along the optimization path and representational drift in the latent space. We further decompose process supervision into two complementary dimensions: Trajectory Supervision, which injects dense stepwise reasoning signals, and Space Supervision, which preserves the semantic structure of the latent manifold. Our analysis shows that rigid geometric compression can collapse the reasoning space, whereas generative reconstruction provides a more flexible semantic anchor that better preserves information capacity. To measure these effects, we introduce the Unified Latent Probe (ULP), which quantifies the mutual information between latent trajectories and explicit reasoning steps. Experiments reveal a clear Information-Performance Binding: reasoning accuracy depends on the information fidelity preserved in the latent chain. These findings provide a principled framework for latent reasoning supervision and suggest shifting from geometric imitation toward mutual information maximization. Our code is available at https://github.com/EIT-NLP/Supervision-in-Latent-CoT.

2606.20183 2026-06-19 cs.LG 新提交

Effective Dimension Governs Generalization in Quantum Kernel Vision Models

有效维度主导量子核视觉模型的泛化

Jian Xu, Delu Zeng, John Paisley, Qibin Zhao

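AI summary: Uses the effective dimension d_eff to explain why, in quantum vision models, entanglement structure improves generalization and quantum noise raises test accuracy, and presents a spectral decomposition of noise-shaped kernels together with the associated regularization mechanism.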

AI中文摘要

最近的量子视觉模型——量子视觉变换器和量子卷积网络——报告了两个引人注目但尚未解释的经验现象：(i) 具有更多或更均匀分布纠缠的拟设泛化更好，以及(ii) 注入量子噪声可以提高测试精度而不是降低它。这些观察目前被视为奇闻，通过网格搜索发现，并且如果有解释的话，也是手工进行的。我们表明，两者都是一个单一可测量量的表现：即（噪声形状的）量子特征核的\emph{有效维度}$d_{\rm eff}$。主要使用量子核视觉模型——由核分类器读出的量子特征映射——我们给出了一个谱解释，其中纠缠结构和量子噪声是调节$d_{\rm eff}$的两个旋钮；在过拟合区域，收缩$d_{\rm eff}$起到类似岭正则化的作用。我们分析了机制：退极化核$K_p=(1-p)^2K+\tfrac{p(2-p)}{D}\mathbf{1}\mathbf{1}^\top$的\emph{精确}分解，其中$d_{\rm eff}(K_p)\to1$，振幅阻尼的收缩结果（及其边界），核机器容量界，以及容量/对齐风险分解；在我们的纠缠实验中运作的单调收缩是经验验证的，并非普遍证明。沿着单参数退极化族，坍缩反而是通过构造精确的；我们仅用它来确认核分解到机器精度，最多达12个量子比特，而不是作为$d_{\rm eff}$的证据。振幅阻尼收缩$d_{\rm eff}$并沿倒U型最佳点将测试精度提升高达+13%；效应符号在过拟合和欠拟合区域之间翻转；噪声注入匹配显式谱过滤前沿。我们的结果将两个报告的现象组织成一个单一可测量原则，用于设计量子视觉模型。

英文摘要

Recent quantum vision models (quantum vision transformers and quantum convolutional networks) report two striking but unexplained empirical phenomena: (i) ansatze with more, or more uniformly distributed, entanglement generalize better, and (ii) injecting quantum noise can improve test accuracy rather than degrade it. These observations are currently treated as curiosities, discovered by grid search and explained, if at all, by hand. We show that both are manifestations of a single, measurable quantity: the \emph{effective dimension} $d_{\rm eff}$ of the (noise-shaped) quantum feature kernel. Working primarily with quantum-kernel vision models, in which a quantum feature map is read out by a kernel classifier, we give a spectral account in which entanglement structure and quantum noise are two knobs that move $d_{\rm eff}$; in an overfitting regime, contracting $d_{\rm eff}$ acts as ridge-like regularization. We analyze the mechanism: an \emph{exact} decomposition of the depolarized kernel $K_p=(1-p)^2K+\tfrac{p(2-p)}{D}\mathbf{1}\mathbf{1}^\top$ with $d_{\rm eff}(K_p)\to1$, a contraction result (and its boundary) for amplitude damping, a kernel-machine capacity bound, and a capacity/alignment risk decomposition; the monotone contraction operative in our entangled experiments is verified empirically, not proven in general. Along the one-parameter depolarizing family the collapse is instead exact by construction; we use it only to confirm the kernel decomposition to machine precision at up to $12$ qubits, not as evidence for $d_{\rm eff}$. Amplitude damping contracts $d_{\rm eff}$ and lifts test accuracy by up to $+13\%$ along an inverted-U sweet spot; the effect's sign flips between the over- and under-fitting regimes; noise injection matches an explicit spectral-filtering frontier. Our results organize two reported anecdotes into a single measurable principle for designing quantum-vision models.
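The depolarized-kernel formula quoted in the abstract can be tried out on a toy Gram matrix. The sketch below is illustrative only and is not the authors' code: it uses the participation ratio $\mathrm{tr}(K)^2/\mathrm{tr}(K^2)$ as an assumed stand-in for $d_{\rm eff}$ (the abstract does not state the paper's exact definition), and the function names are hypothetical.

```python
def depolarized_kernel(K, p, D):
    """Return K_p = (1-p)^2 K + p(2-p)/D * 11^T for a square Gram matrix K (list of lists)."""
    scale = (1.0 - p) ** 2
    shift = p * (2.0 - p) / D
    return [[scale * k + shift for k in row] for row in K]


def participation_ratio(K):
    """tr(K)^2 / tr(K^2) for symmetric K; an ASSUMED proxy for d_eff, not the paper's definition."""
    trace = sum(K[i][i] for i in range(len(K)))
    # For a symmetric matrix, tr(K^2) equals the sum of all squared entries.
    trace_sq = sum(k * k for row in K for k in row)
    return trace ** 2 / trace_sq


# Toy Gram matrix: three mutually orthogonal pure states in Hilbert-space dimension D = 4.
K = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for p in (0.0, 0.5, 1.0):
    print(p, participation_ratio(depolarized_kernel(K, p, D=4)))
```

With this proxy the value starts at 3 for the noiseless kernel and reaches 1 at full depolarization, where $K_p$ is a multiple of $\mathbf{1}\mathbf{1}^\top$; this matches the limit $d_{\rm eff}(K_p)\to1$ stated in the abstract.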

2606.20325 2026-06-19 cs.LG cs.SC math.DS 新提交

Recurrent neural networks approximate continuous functions

递归神经网络近似连续函数

Valentin Abadie, Clemens Hutter, Helmut Bölcskei

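AI summary: The paper proves that for every continuous function on [-1,1] there is a ReLU recurrent neural network with fixed weights and hidden dimension whose time evolution uniformly approximates the function, and it gives convergence rates and minimax lower bounds.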

AI中文摘要

经典逼近定理要求每当目标精度提高时，就需要一个新的神经网络。本文研究相反的可能性：能否一劳永逸地选择网络，而仅通过让其运行更长时间来换取精度？我们证明这对于[-1,1]上的每个连续函数都是可能的。更准确地说，每个这样的函数都可以通过一个具有固定权重和固定隐藏维度的单ReLU递归神经网络的时间演化来均匀逼近。该构造背后的机制是一个新的中间模型——带神经单元的图灵机（TMNU）。该模型保留了实现多项式逼近方案所需的算法自由度，同时保持足够的刚性，以便被具有显式隐藏维度和权重幅度界限的RNN模拟。由此产生的收敛速率反映了底层多项式逼近的速率。我们通过极小极大下界补充了该构造，表明运行时间不仅仅是证明的产物，而是这种固定网络逼近范式中不可避免的资源。

英文摘要

Classical approximation theorems ask for a new neural network whenever the target accuracy is improved. This paper studies the opposite possibility: can the network be chosen once and for all, and can accuracy be bought only by letting it run longer? We prove that this is possible for every continuous function on [-1,1]. More precisely, each such function is uniformly approximated by the time evolution of a single ReLU recurrent neural network with fixed weights and fixed hidden dimension. The mechanism behind the construction is a new intermediate model, the Turing machine with neural units (TMNU). This model retains the algorithmic freedom needed to implement polynomial approximation schemes, while remaining rigid enough to be simulated by RNNs with explicit bounds on hidden dimension and weight magnitude. The resulting convergence rates reflect the underlying polynomial approximation rates. We complement the construction with minimax lower bounds showing that runtime is not merely a proof artifact, but an unavoidable resource in this fixed-network approximation paradigm.

2606.20357 2026-06-19 cs.LG 新提交

On the Variance of Temporal Difference Learning and its Reduction Using Control Variates

时序差分学习的方差及其通过控制变量的降低

Hsiao-Ru Pan, Bernhard Schölkopf

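AI summary: The paper analyzes the variance of temporal difference learning in the phased setting with tabular representations, shows that its variance reduction comes from effectively aggregating more independent trajectories, and compares variance bounds for TD, MC, and DAE.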

Comments Accepted at RLC2026

2606.20469 2026-06-19 cs.LG cs.CG 新提交

Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima

Fisher-几何锐度与SGD对平坦极小值的隐式偏好

Md Sakir Ahmed, Kumaresh Sarmah, Hemen Dutta

发表机构 * Gauhati University（高哈蒂大学）

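AI summary: SGD favors flat minima, but Euclidean sharpness is not reparametrization-invariant. The paper proposes a Riemannian sharpness based on the Fisher Information Matrix, proves its invariance, shows that the stationary distribution of SGD concentrates on flat minima, and links this to generalization through a PAC-Bayes bound.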

Comments 18 pages, 5 figures, preprint

AI中文摘要

深度学习中的一个广泛直觉是随机梯度下降（SGD）隐式偏好平坦极小值，且平坦极小值泛化更好，但损失Hessian的迹或最大特征值等标准欧氏平坦度度量在保持网络函数的重参数化下并非不变，这削弱了这一叙事的理论基础。在本研究中，我们通过将平坦度建立在由Fisher信息矩阵（FIM）诱导的统计流形的黎曼几何上，解决了这一问题。我们在数学上定义了黎曼锐度，并证明它在光滑、保函数的重参数化下是不变的，这直接回应了Dinh等人在论文“Sharp minima can generalize for deep nets”中的批评。我们注意到这种不变性是真实FIM的一个性质；实践中使用的对角经验估计量（以及下面所有实验中的）仅近似继承不变性，而在任意重参数化下的精确不变性需要结构化估计量如K-FAC。我们将小批量SGD的梯度噪声形式化为具有与FIM成比例的协方差结构，推导出所得随机微分方程的稳态分布，然后证明概率质量指数级集中在黎曼平坦极小值处。一个由SR显式控制的PAC-Bayes泛化界正式地将这种几何偏差与测试性能联系起来。我们在MNIST和CIFAR-10上的实验证实，SR以欧氏锐度无法做到的方式可靠地跟踪泛化，并且其随$\eta/B$的缩放与理论预测相匹配。这些结果共同提供了一个严格的、重参数化不变的解释，说明为什么平坦极小值能泛化。

英文摘要

A widely held intuition in deep learning is that stochastic gradient descent (SGD) implicitly favors flat minima and that flat minima generalize better, but standard Euclidean measures of flatness such as the trace or maximum eigenvalue of the loss Hessian are not invariant under reparametrizations that preserve the network function, which undermines the theoretical foundations of this narrative. In this study we resolve this issue by grounding flatness in the Riemannian geometry of the statistical manifold induced by the Fisher Information Matrix (FIM). We define Riemannian sharpness mathematically and prove that it is invariant under smooth, function-preserving reparametrizations, which directly addresses the critique of Dinh et al. in the paper "Sharp minima can generalize for deep nets". We note that this invariance is a property of the true FIM; the diagonal empirical estimator used in practice (and in all experiments below) inherits invariance only approximately, and exact invariance under arbitrary reparametrizations would require structured estimators such as K-FAC. We formalize the gradient noise of mini-batch SGD as having a covariance structure proportional to the FIM, derive the stationary distribution of the resulting stochastic differential equation, and then show that the probability mass is exponentially concentrated at Riemannian-flat minima. A PAC-Bayes generalization bound controlled explicitly by SR formally links this geometric bias to test performance. Our experiments on MNIST and CIFAR-10 confirm that SR reliably tracks generalization in ways that Euclidean sharpness does not, and that its scaling with $\eta/B$ matches the theoretical predictions. Together these results provide a rigorous, reparametrization-invariant account of why flat minima generalize.

2606.19410 2026-06-19 stat.ML cs.LG 交叉投稿

Doeblin Curves

Doeblin 曲线

Dongmin Lee, William Lu, Anuran Makur, Japneet Singh

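AI summary: Introduces Doeblin curves, which quantify the contraction behavior of Markov kernels at different levels of divergence and power, and applies them to finer-grained contraction analysis in noisy iterative optimization, reliable computation with noisy circuits, and differential privacy.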

Comments 42 pages, 2 figures

Journal ref IEEE Transactions on Information Theory, vol. 72, no. 6, pp. 3556-3596, June 2026

DOI: 10.1109/TIT.2026.3678229

AI中文摘要

近期关于 Doeblin 系数的研究揭示了它们作为 TV 距离的 Dobrushin 收缩系数的多路泛化的有用性，这与它们在马尔可夫链遍历性理论中的经典作用不同。然而，为了建立信息收缩的存在性，通常需要强条件，例如远离 0。基于最近提出的非线性信息收缩概念，我们旨在提出一种更细粒度的基于 Doeblin 的多路收缩行为刻画，即使对于 Doeblin 系数为 0 的信道，也能产生非平凡的收缩保证。为此，我们引入了 Doeblin 曲线的概念——一种非线性函数，它量化了马尔可夫核在特定散度和功率水平下对输入分布集合的收缩行为。在我们的分析过程中，我们发展了 Doeblin 系数的新变分刻画，提出了 Doeblin 曲线的若干性质，定义了功率约束 Doeblin 曲线的几个版本，并利用上述变分刻画推导了上下界。然后，我们将这些结果应用于不同领域，包括噪声迭代优化的泛化界、噪声电路可靠计算的误差界以及在线迭代算法的差分隐私保证。特别是，我们将这些领域的结果扩展到更广泛的领域或群体设置，利用 Doeblin 曲线揭示比 Doeblin 系数更细粒度的收缩现象。

英文摘要

Recent research on Doeblin coefficients has shed light on their usefulness as a multi-way generalization of the Dobrushin contraction coefficient for TV distance, in a separate vein from their classic role in the theory of Markov chain ergodicity. However, strong conditions, such as being bounded away from 0, are typically necessary for Doeblin coefficients to establish the existence of information contraction. Building on recently formulated concepts of nonlinear information contraction, we aim to propose a finer-grained Doeblin-based characterization of multi-way contraction behavior which yields non-vacuous contraction guarantees even for channels whose Doeblin coefficient is 0. To this end, we introduce the notion of a Doeblin curve -- a nonlinear function which quantifies the contraction behavior of a Markov kernel on collections of input distributions at specific levels of divergence and power. Through the course of our analysis, we develop a new variational characterization of Doeblin coefficients, present several properties of Doeblin curves, define several versions of power-constrained Doeblin curves, and derive upper and lower bounds using our aforementioned variational characterization. We then utilize these results in diverse areas, including generalization bounds for noisy iterative optimization, error bounds for reliable computation with noisy circuits, and differential privacy guarantees for online iterative algorithms. In particular, we extend results in these areas to broader domains or group settings, leveraging Doeblin curves to reveal finer-grained contraction phenomena than Doeblin coefficients.

2606.20062 2026-06-19 math.OC cs.LG math.PR 交叉投稿

Optimal Coarse Correlated Equilibria in Mean Field Games: Linear Programming and No-Regret Learning

平均场博弈中的最优粗相关均衡：线性规划与无遗憾学习

Luciano Campi, Federico Cannerozzi, Ioannis Tzouanas

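AI summary: For continuous-time mean field games, the paper gives a linear programming characterization of optimal coarse correlated equilibria and designs a no-regret learning algorithm based on Lagrangian duality, with convergence rates.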

Comments 55 pages, 3 figures

AI中文摘要

我们引入了连续时间平均场博弈的最优粗相关均衡。粗相关均衡是一种随机推荐方案，任何玩家都无法通过忽略推荐并转向替代策略而获益。问题如下：一个协调者在所有平均场粗相关均衡中选择一个，以优化一个规定的性能准则，该准则可能不同于代表性玩家的目标。在问题公式化之后，我们开发了一个线性规划（LP）公式，证明了最优LP粗相关均衡的存在性，并将LP刻画与原始概率设定联系起来。基于这一刻画，我们设计了一个无遗憾原始-对偶算法，基于外部遗憾约束的等价拉格朗日公式，用于学习此类均衡。我们提供了学习算法的显式收敛速率，数值例子说明了该方法。

英文摘要

We introduce optimal coarse correlated equilibria for continuous-time mean field games. A coarse correlated equilibrium is a randomized recommendation scheme from which no player can gain by ignoring the recommendation and switching to an alternative strategy. The problem is as follows: a moderator selects, among all mean-field coarse correlated equilibria, one that optimizes a prescribed performance criterion, which may differ from the representative player's objective. After formulating the problem, we develop a linear programming (LP) formulation, prove the existence of optimal LP coarse correlated equilibria, and relate the LP characterization to the original probabilistic setting. Building on this characterization, we design a no-regret primal-dual algorithm, based on an equivalent Lagrangian formulation of the external-regret constraint, for learning such equilibria. We provide explicit convergence rates for the learning algorithm, and numerical examples illustrate the method.

2606.20082 2026-06-19 math.OC cs.DS cs.LG 交叉投稿

Beyond Averaging in John Ellipsoid Approximation: High-Accuracy Algorithms in the Leverage-Score Model

超越John椭球逼近中的平均化：杠杆分数模型中的高精度算法

Xiaoyu Li, Junwei Yu, Jiaojiao Jiang, Junbin Gao, Andi Han

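AI summary: The paper separates the cost of John ellipsoid approximation algorithms into certification, identification, and accuracy, shows that the accuracy dependence is only doubly logarithmic, and uses an accelerated method and damped Newton to reach high accuracy in the leverage-score model.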

AI中文摘要

对称多面体 $P=\{\mathbf{x}\in\mathbb{R}^d:\|\mathbf{A}\mathbf{x}\|_\infty\le1\}$, $\mathbf{A}\in\mathbb{R}^{n\times d}$ 的 John 椭球由一系列杠杆分数算法计算，从 Cohen, Cousins, Lee 和 Yang (COLT 2019) 到其后续工作 [WY24, CLS+25]，均在 $\Theta(\varepsilon^{-1}\log(n/d))$ 次迭代内达到 $(1+\varepsilon)$-逼近。我们将这一复杂度分离为现代算法混淆的三种成本（认证、识别和精度），并发现历史上的 $\varepsilon^{-1}$ 仅存在于第一种成本中。在等价的 D-最优设计形式 $\min_{\mathbf{p}\in\Delta_n}-\log\det(\sum_i p_i\mathbf{a}_i\mathbf{a}_i^\top)$ 中，杠杆分数预言机恰好是一阶预言机，而 $(1+\varepsilon)$-John 保证对应于 Frank-Wolfe 间隙 $g(\mathbf{p})\le\varepsilon d$；通过这一对应关系，成本得以分离。$\varepsilon^{-1}$ 是认证的产物：迭代点的均匀平均（该系列算法中使用的认证）的间隙恰好为 $\Theta(1/T)$，无论每次迭代多么廉价。相反，针对最后迭代点，同一预言机是快速的：热启动加速方法在 $\varepsilon$-无关的初始化 $C(\mathbf{A})$ 后，仅需 $C(\mathbf{A})+O(\sqrt{\kappa}\log(1/\varepsilon))$ 次查询即可达到保证；一旦最优面被识别，面问题成为无约束自和谐最小化，其 Hessian 可由预言机精确恢复，因此阻尼牛顿法仅需 $O(\log\log(1/\varepsilon))$ 步，总查询数为 $C(\mathbf{A})+O(d^2\log\log(1/\varepsilon))$。因此，在 $\varepsilon$-无关、条件依赖的初始化后，精度依赖是双对数的；开放问题在于剩余的识别成本（达到最优面的无条件界）和下界。精度并非障碍。

英文摘要

The John ellipsoid of a symmetric polytope $P=\{\mathbf{x}\in\mathbb{R}^d:\|\mathbf{A}\mathbf{x}\|_\infty\le1\}$, $\mathbf{A}\in\mathbb{R}^{n\times d}$, is computed by a long line of leverage-score algorithms, from Cohen, Cousins, Lee and Yang (COLT 2019) to its successors [WY24, CLS+25], all reaching a $(1+\varepsilon)$-approximation in $\Theta(\varepsilon^{-1}\log(n/d))$ iterations. We separate this complexity into three costs the modern line conflates (certification, identification, and accuracy) and locate the historical $\varepsilon^{-1}$ in the first alone. In the equivalent D-optimal-design form $\min_{\mathbf{p}\in\Delta_n}-\log\det(\sum_i p_i\mathbf{a}_i\mathbf{a}_i^\top)$, the leverage-score oracle is exactly the first-order oracle and the $(1+\varepsilon)$-John guarantee is the Frank-Wolfe gap condition $g(\mathbf{p})\le\varepsilon d$; through this dictionary the costs come apart. The $\varepsilon^{-1}$ is a certification artifact: the uniform average of the iterates, the certificate used throughout the line, has gap exactly $\Theta(1/T)$, however cheap each iteration is made. Pointed instead at the last iterate, the same oracle is fast: a warm-started accelerated method reaches the guarantee in $C(\mathbf{A})+O(\sqrt{\kappa}\log(1/\varepsilon))$ queries after an $\varepsilon$-independent setup $C(\mathbf{A})$, and once the optimal face is identified the facial problem is an unconstrained self-concordant minimization whose Hessian the oracle recovers exactly, so damped Newton needs only $O(\log\log(1/\varepsilon))$ steps, for a total of $C(\mathbf{A})+O(d^2\log\log(1/\varepsilon))$ queries. The accuracy dependence is thus doubly logarithmic after an $\varepsilon$-independent, condition-dependent setup; the open problem is the remaining identification cost (a condition-free bound on reaching the optimal face) and lower bounds. Accuracy is not the obstruction.
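The correspondence between leverage scores and the Frank-Wolfe gap can be made concrete in two dimensions. The sketch below is a toy illustration, not the paper's algorithm: it is hard-coded for $d=2$ so that the matrix inverse can be written by hand, the function names are hypothetical, and it relies on the standard facts that $\partial_{p_i}\bigl(-\log\det M(\mathbf{p})\bigr)=-\mathbf{a}_i^\top M(\mathbf{p})^{-1}\mathbf{a}_i$ and $\sum_i p_i\,\mathbf{a}_i^\top M(\mathbf{p})^{-1}\mathbf{a}_i=d$, which give $g(\mathbf{p})=\max_i \tau_i(\mathbf{p})-d$.

```python
def design_matrix(rows, p):
    """Entries (m00, m01, m11) of M(p) = sum_i p_i a_i a_i^T for 2-dimensional rows a_i."""
    m00 = sum(w * a[0] * a[0] for a, w in zip(rows, p))
    m01 = sum(w * a[0] * a[1] for a, w in zip(rows, p))
    m11 = sum(w * a[1] * a[1] for a, w in zip(rows, p))
    return m00, m01, m11


def leverage_scores(rows, p):
    """tau_i(p) = a_i^T M(p)^{-1} a_i, i.e. minus the i-th partial derivative of -log det M(p)."""
    m00, m01, m11 = design_matrix(rows, p)
    det = m00 * m11 - m01 * m01
    # Explicit 2x2 inverse: M^{-1} = [[m11, -m01], [-m01, m00]] / det
    return [(m11 * a[0] ** 2 - 2.0 * m01 * a[0] * a[1] + m00 * a[1] ** 2) / det
            for a in rows]


def frank_wolfe_gap(rows, p):
    """g(p) = max_i tau_i(p) - d, with d = 2 in this toy setting."""
    return max(leverage_scores(rows, p)) - 2.0


rows = [(1.0, 0.0), (0.0, 1.0)]              # the square ||x||_inf <= 1
print(frank_wolfe_gap(rows, [0.5, 0.5]))     # uniform weights
print(frank_wolfe_gap(rows, [0.25, 0.75]))   # unbalanced weights
```

For the square, uniform weights give a zero gap, while the unbalanced weights give a gap of 2, so they would only certify a $(1+\varepsilon)$-John guarantee with $\varepsilon\ge1$ under the condition $g(\mathbf{p})\le\varepsilon d$.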

2606.20299 2026-06-19 stat.ML cs.LG hep-ph physics.data-an 交叉投稿

Statistical Properties of Training & Generalization

训练与泛化的统计特性

Itay Lavie, Noam Levi, Yonatan Kahn

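AI summary: Examines key features and surprising phenomena of deep learning from a physics perspective, reviewing neural scaling laws and how they interact with the constraints and inductive biases of physics problems.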

Comments 32 pages, 3 figures. Part of the VERaiPHY initiative

2511.22283 2026-06-19 cs.LG 版本更新

核赌博机中的算法与极小极大复杂度

Yunbei Xu

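AI summary: Using a unified MAIR framework, the paper places GP-UCB and the MAMS algorithm in a common language, proposes a safe master algorithm that combines the strengths of both, and shows that in overparameterized models algorithmic complexity can be more informative than class-wide minimax or DEC certificates.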

AI中文摘要

高斯过程上置信界（GP-UCB）和决策估计系数（DEC）方法乍看之下可能属于不同的理论。本文将这两种观点置于一个共同的算法信息语言中，用于频率学派RKHS赌博机。GP-UCB固定了一个算法性的（而非真实的）高斯过程先验，并利用实现轨迹的复杂度以及计算可处理性，而MAMS优化了一个鲁棒的类宽MAIR/DEC包络。通过统一的MAIR框架和异质半正定算法先验，我们推广了GP-UCB分析和MAMS算法，提出了一种结合两者优势的安全主算法，并提供了一个核赌博机构造，表明在过参数化模型中算法复杂度可以比类宽极小极大或DEC证书更具信息性。由此得出的信息是：算法信息和类宽极小极大系数回答不同的问题，并可能导致不同的差距；核赌博机提供了一个干净的环境，使得这种区别在数学上变得可见。

英文摘要

We develop indexed Bellman information complexity, a representation-level theory of interactive decision making centered on information indices and reference histories. The representation strips away problem-specific syntax and retains only the ingredients needed for dynamic programming and information accounting, thereby unifying the earlier framework of indexed algorithmic information ratios (AIR). On the upper-bound side, regret is controlled by Bellman supersolutions or potential identities whose gradient bracket is paid for by indexed information. Upper-confidence-bound (UCB), estimation-to-decision/decision-estimation-coefficient (E2D/DEC), and adaptive-minimax-sampling or exploration-by-optimization (AMS/EBO) methods appear as three relaxations of this same identity. On the lower-bound side, the posterior-reference trajectory supplies both the information telescope and the ghost quantile of small-regret trajectories. The resulting critical radius in the lower bound is an effective-dimension-scale quantity, as in Fano and local-prior-mass lower bounds, rather than the constant radius of a two-point Le Cam argument. The examples show that DEC is best viewed as a one-step relaxation of indexed Bellman information complexity, not as a universally tight conversion mechanism. We illustrate the framework through several applications, with particular emphasis on kernel bandits. In this setting, the active action marginal provides a concrete basis for comparing UCB, E2D, and AMS/EBO.

2606.15832 2026-06-19 cs.LG math.OC 版本更新

SILAGE: Memory-Efficient, Full-Gradient-Free Nonconvex Optimization for Nested Finite Sums

SILAGE: 针对嵌套有限和的内存高效、完全无全梯度的非凸优化

Igor Sokolov, Laurent Condat, Peter Richtárik

发表机构 * Center of Excellence for Generative AI, King Abdullah University of Science and Technology (KAUST)（阿卜杜拉国王科技大学生成式人工智能卓越中心）

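AI summary: For nonconvex optimization with the nested double finite-sum structure of large-scale data, the paper proposes SILAGE, which exploits the double-sum structure to avoid global full-gradient refreshes, needs only $\mathcal{O}(n)$ memory, and comes with a convergence analysis that adapts to across-group and within-group heterogeneity.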

Comments 81 pages, 3 algorithms, 4 theorems, 2 corollaries, 11 lemmas, 2 figures, 12 tables

AI中文摘要

大规模数据集上的经验风险最小化自然呈现出嵌套的双有限和结构，其中 $N=nm$ 个总样本被逻辑或物理地划分为 $n$ 个大小为 $m$ 的块（例如，在池化数据孤岛、核外学习或有意分层中）。虽然方差缩减方法对非凸目标实现了最优的 oracle 复杂度，但在此集中式场景中它们遭受严重的扩展瓶颈。递归估计器（如 PAGE）需要定期对所有 $nm$ 个样本进行全局全梯度刷新，这在计算上代价高昂。相反，单循环方法（如 SILVER）避免了此类刷新，但需要不切实际的 $\mathcal{O}(nm)$ 内存来存储每个样本的控制变量。在本文中，我们提出了 SILAGE，一种解决此权衡的方差缩减算法。通过主动利用双和结构，SILAGE 消除了对所有 $nm$ 组件的周期性全局全梯度刷新（每次迭代最多评估一个局部组梯度），同时仅需 $\mathcal{O}(n)$ 内存。此外，我们提供了严格的收敛分析，避免了悲观的 worst-case Lipschitz 常数。相反，SILAGE 的复杂度通过嵌套的函数相似性（组间异质性 $δ_1$ 和组内异质性 $δ_2$）自然地适应底层数据几何。我们的结果在几个实际相关场景中改进了现有的最先进界限。

英文摘要

Empirical risk minimization on massive datasets naturally exhibits a nested double finite-sum structure, where $N=nm$ total samples are logically or physically partitioned into $n$ blocks of size $m$ (e.g., in pooled data silos, out-of-core learning, or deliberate stratification). While variance-reduced methods achieve optimal oracle complexities for nonconvex objectives, they suffer from severe scaling bottlenecks in this centralized regime. Recursive estimators, such as PAGE, require periodic global full-gradient refreshes over all $nm$ samples, which are computationally expensive. Conversely, single-loop methods, such as SILVER, avoid such refreshes but require an impractical $\mathcal{O}(nm)$ memory footprint to store a control variate for every sample. In this paper, we propose SILAGE, a variance-reduced algorithm that addresses this trade-off. By actively exploiting the double-sum structure, SILAGE eliminates periodic global full-gradient refreshes over all $nm$ components (evaluating at most one local group gradient per iteration) while requiring only $\mathcal{O}(n)$ memory. Furthermore, we provide a tight convergence analysis that avoids pessimistic worst-case Lipschitz constants. Instead, SILAGE's complexity natively adapts to the underlying data geometry via nested functional similarities: across-group ($\delta_1$) and within-group ($\delta_2$) heterogeneity. Our results improve existing state-of-the-art bounds in several practically relevant regimes.

2309.15769 2026-06-19 math.ST cs.LG stat.ME stat.TH 版本更新

Benign overfitting beyond prediction: The ordinary least squares interpolator

超越预测的良性过拟合：普通最小二乘插值器

Dennis Shen, Dogyoon Song, Peng Ding, Jasjeet S. Sekhon

发表机构 * Department of Data Sciences & Operations, University of Southern California（数据科学与运营系，南加州大学）； Department of Statistics, University of California, Davis（统计学系，加州大学戴维斯分校）； Department of Statistics, University of California, Berkeley（统计学系，加州大学伯克利分校）； Google DeepMind（谷歌DeepMind）

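AI summary: The paper studies parameter estimation and inference for the minimum $\ell_2$-norm OLS interpolator in overparameterized linear models, derives overparameterized versions of the leave-$k$-out formulas, the omitted variable bias formula, and the Frisch-Waugh-Lovell theorem, and extends the Gauss-Markov theorem.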

Comments This work is accepted for publication in Biometrika

AI中文摘要

深度学习的最新进展突显了过参数化统计模型中良性过拟合的现象，引发了对其基础理解的浓厚兴趣。由于其简单性和实际相关性，普通最小二乘（OLS）插值器已成为从理论上理解这一现象的关键研究对象。虽然OLS在经典欠参数化设置下的性质已得到充分理解，但其在过参数化区域中的行为——与岭回归或lasso不同——仍相对较少被探索。我们通过为最小$\ell_2$范数OLS插值器推导新的代数和统计结果，为这一不断增长的文献做出贡献。与现有大部分关注预测风险的工作不同，我们的分析集中于参数估计和推断，这对于许多统计学和因果推断应用至关重要。具体地，我们建立了以下内容的过参数化类比：(i) 留$k$法公式，(ii) 遗漏变量偏误公式，以及(iii) Frisch-Waugh-Lovell定理。在高斯-马尔可夫模型下，我们进一步扩展了高斯-马尔可夫定理，并分析了过参数化设置下同方差性时的方差估计。这些结果共同为研究过参数化线性模型中的参数估计和推断提供了一个系统框架，为超越预测含义的良性过拟合提供了新视角。

英文摘要

Recent advances in deep learning have highlighted the phenomenon of benign overfitting in overparameterized statistical models, sparking significant interest in understanding its foundations. Owing to its simplicity and practical relevance, the ordinary least squares (OLS) interpolator has become a key object of study for gaining theoretical insight into this phenomenon. While the properties of OLS are well understood in classical underparameterized settings, its behavior in the overparameterized regime -- unlike that of ridge regression or the lasso -- remains comparatively less explored. We contribute to this growing literature by deriving new algebraic and statistical results for the minimum $\ell_2$-norm OLS interpolator. In contrast to much of the existing work, which focuses on prediction risk, we center our analysis on parameter estimation and inference, which are fundamental for many statistics and causal inference applications. Specifically, we establish overparameterized analogues of (i) the leave-$k$-out formulas, (ii) the omitted variable bias formula, and (iii) the Frisch-Waugh-Lovell theorem. Under the Gauss-Markov model, we further extend the Gauss-Markov theorem and analyze variance estimation under homoskedasticity in the overparameterized setting. Collectively, these results provide a systematic framework for studying parameter estimation and inference in overparameterized linear models, offering a novel perspective on benign overfitting beyond its implications for prediction.
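The object studied in this abstract, the minimum $\ell_2$-norm OLS interpolator, has the closed form $\hat\beta=X^\top(XX^\top)^{-1}y$ when the rows of $X$ are linearly independent. The sketch below is a toy illustration only: it is restricted to two observations so that the $2\times2$ Gram system can be solved by hand, and the function name is hypothetical.

```python
def min_norm_interpolator(X, y):
    """beta = X^T (X X^T)^{-1} y for a design X with exactly 2 linearly independent rows."""
    r0, r1 = X
    # Gram matrix G = X X^T (2 x 2, symmetric).
    g00 = sum(a * a for a in r0)
    g01 = sum(a * b for a, b in zip(r0, r1))
    g11 = sum(b * b for b in r1)
    det = g00 * g11 - g01 * g01
    # alpha solves G alpha = y; beta is then the combination of rows X^T alpha.
    a0 = (g11 * y[0] - g01 * y[1]) / det
    a1 = (g00 * y[1] - g01 * y[0]) / det
    return [a0 * u + a1 * v for u, v in zip(r0, r1)]


# Overparameterized toy problem: n = 2 observations, p = 3 coefficients.
X = [[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]]
y = [1.0, 2.0]
beta = min_norm_interpolator(X, y)
fitted = [sum(x * b for x, b in zip(row, beta)) for row in X]
print(beta, fitted)
```

Here the interpolator fits both observations exactly. Another exact fit, such as the coefficient vector (1, 2, 0), has a larger Euclidean norm, which is what singles out the minimum-norm solution among the infinitely many interpolators.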

2509.15822 2026-06-19 stat.ML cs.LG math.PR math.ST stat.TH 版本更新

Phase Transition for Stochastic Block Model with more than $\sqrt{n}$ Communities

具有多于 $\sqrt{n}$ 个社区的随机块模型的相变

Alexandra Carpentier, Christophe Giraud, Nicolas Verzelen

发表机构 * Institut für Mathematik – Universität Potsdam, Potsdam, Germany（波茨坦大学数学研究所，德国波茨坦）； Laboratoire de Mathématiques d’Orsay, Université Paris-Saclay, CNRS, France（奥赛数学实验室，巴黎-萨克雷大学，法国 CNRS）； INRAE, Institut Agro, MISTEA, Univ. Montpellier, France（法国国家农业食品与环境研究院，蒙彼利埃大学，法国）

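AI summary: The paper proves that in the stochastic block model with $K\geq \sqrt{n}$ communities, low-degree polynomials cannot recover the communities below the threshold proposed by Chin et al., whereas counting specific subgraphs achieves recovery in polynomial time, supporting the conjectured new phase-transition threshold.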

AI中文摘要

高维经验风险最小化中高斯普适性破坏的表征

Chiheb Yaakoubi, Cosme Louart, Malik Tiomoko, Zhenyu Liao

发表机构 * School of Data Science, The Chinese University of Hong Kong, Shenzhen, China ； Huawei Noah's Ark Lab, Huawei Technologies, Paris, France ； School of Electronic Information and Communications, Huazhong University of Science & Technology, China

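AI summary: By extending the Convex Gaussian Min-Max Theorem to non-Gaussian data, the paper characterizes the asymptotic distribution of high-dimensional empirical risk minimization estimators and clarifies the scope and limits of Gaussian universality.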

Comments 28 pages, 5 figures, 1 table

Journal ref ICML 2026

AI中文摘要

我们研究了一般非高斯数据设计下的高维凸经验风险最小化（ERM）。通过启发式地将凸高斯极小极大定理（CGMT）扩展到非高斯设置，我们推导出关键统计量的渐近极小极大表征，从而能够近似ERM估计量 $\hat{\theta}$ 的均值 $\mu_{\hat{\theta}}$ 和协方差 $C_{\hat{\theta}}$。具体地，在数据矩阵的集中假设以及损失和正则化子的标准正则性条件下，我们证明：对于独立于训练数据的测试协变量 $x$，投影 $\hat{\theta}^\top x$ 近似遵循 $\mu_{\hat{\theta}}^\top x$ 的一般非高斯分布与一个独立中心高斯变量（方差为 $\mathrm{tr}(C_{\hat{\theta}} \mathbb{E}[xx^\top])$）的卷积。这一结果阐明了ERM高斯普适性的范围和局限。此外，我们证明任何 $\mathcal{C}^2$ 正则化子渐近等价于一个由其零点的Hessian矩阵和 $\mu_{\hat{\theta}}$ 处的梯度唯一确定的二次型。我们提供了跨不同损失和模型的数值模拟，以验证我们的理论预测和定性见解。

英文摘要

We study high-dimensional convex empirical risk minimization (ERM) under general non-Gaussian data designs. By heuristically extending the Convex Gaussian Min-Max Theorem (CGMT) to non-Gaussian settings, we derive an asymptotic min-max characterization of key statistics, enabling approximation of the mean $\mu_{\hat{\theta}}$ and covariance $C_{\hat{\theta}}$ of the ERM estimator $\hat{\theta}$. Specifically, under a concentration assumption on the data matrix and standard regularity conditions on the loss and regularizer, we show that for a test covariate $x$ independent of the training data, the projection $\hat{\theta}^\top x$ approximately follows the convolution of the generally non-Gaussian distribution of $\mu_{\hat{\theta}}^\top x$ with an independent centered Gaussian variable of variance $\mathrm{tr}(C_{\hat{\theta}} \mathbb{E}[xx^\top])$. This result clarifies the scope and limits of Gaussian universality for ERMs. Additionally, we prove that any $\mathcal{C}^2$ regularizer is asymptotically equivalent to a quadratic form determined solely by its Hessian at zero and gradient at $\mu_{\hat{\theta}}$. Numerical simulations across diverse losses and models are provided to validate our theoretical predictions and qualitative insights.

2606.18679 2026-06-19 cs.DS cs.GT cs.LG math.OC 版本更新

Fair Online Resource Allocation

公平在线资源分配

Christopher En, Yuri Faenza, Andrea Lodi, Gonzalo Muñoz

发表机构 * Columbia University, IEOR Department（哥伦比亚大学工业工程与运营研究系）； Cornell Tech（康奈尔科技学院）； Universidad de Chile（智利大学）

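AI summary: Studies fairness in online resource allocation and proposes a dual-mirror-descent algorithm that enforces fairness constraints within batches, achieves sublinear regret, and is validated on refugee data to examine the trade-off between welfare and fairness.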

Comments 30 pages, 4 figures. To appear in the proceedings of EC 2026

详情

AI中文摘要

我们研究公平在线资源分配问题，其动机源于难民安置和航班调度等应用，其中代理顺序到达并必须分配到容量有限的设施。我们引入一个模型，在资源约束和Lipschitz公平性要求下最大化整体福利，该要求确保同一批次中到达的相似代理获得相似的预期结果。我们首先分析离线问题，证明最优公平分配的价值至少是最优不公平分配的$\Omega(1/\gamma)$倍，其中$\gamma$是公平系数，从而界定了公平的代价。对于在线设置，我们提出一种基于对偶镜像下降的算法，该算法在估计最优对偶变量的同时，在批次内强制执行公平约束。我们证明该算法相对于最优离线流体基准实现了亚线性遗憾。最后，我们使用难民经济项目的真实数据验证了理论结果，展示了算法的性能，并考察了福利最大化与公平执行之间的权衡。

英文摘要

We study the problem of fair online resource allocation, motivated by applications such as refugee resettlement and airline scheduling, where agents arrive sequentially and must be assigned to facilities with limited capacities. We introduce a model that maximizes the overall welfare subject to resource constraints and a Lipschitz fairness requirement, which ensures that similar agents arriving in the same batch receive similar expected outcomes. We first analyze the offline problem, proving that the value of the optimal fair allocation is at least an $\Omega(1/\gamma)$ fraction of the optimal unfair allocation, where $\gamma$ is the fairness coefficient, thereby bounding the price of fairness. For the online setting, we propose an algorithm based on dual mirror descent that enforces fairness constraints within batches while estimating optimal dual variables. We prove that this algorithm achieves sublinear regret relative to the optimal offline fluid benchmark. Finally, we validate our theoretical results using real-world data from the Refugee Economies Programme, demonstrating the algorithm's performance and examining the trade-offs between welfare maximization and fairness enforcement.

2606.19364 2026-06-19 cs.LG 新提交

Closing the Social-Semantic Gap: SPSD for Edge-Based Prompt Compression in Cloud LLM Inference

缩小社会-语义差距：SPSD用于云LLM推理中的边缘端提示压缩

Abhinit Sen, Ajeet Kumar, Manaranjan Pradhan

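AI summary: To reduce the energy cost of the prompt prefill stage in cloud LLM inference, the paper proposes SPSD, an edge-based pipeline that compresses user prompts with a 4-bit quantized small language model; with response quality remaining non-inferior, it saves 99.9 input tokens on average and an estimated 70-270 uWh of net energy per call.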

Comments 19 pages, 7 tables, 1 figure, includes appendix

AI中文摘要

大语言模型（LLM）推理的预填充阶段正成为云规模能耗的日益增长的贡献者。许多面向消费者的支持和对话提示包含社会性支架：礼貌标记、道歉性开场白、重复以及建立融洽关系的语言，这些对人类交流很重要，但对机器推理而言边际信息量较低。我们将这种差异称为社会-语义差距。我们提出SPSD（情感保留语义蒸馏），一种边缘端管道，在传输到云端部署的LLM之前，使用4比特量化的小语言模型压缩用户提示。在248个提示的语料库上，使用Gemma-2-2B-Instruct（Q4_K_M）作为SLM、Llama-3.1-8B-Instruct作为云端评估模型进行评估，每次蒸馏调用平均输入token节省99.9个，所有146次蒸馏调用均产生正向节省。通过盲法LLM-as-judge评分对121对进行评估，响应质量在15分制中预先指定的1分非劣效范围内不劣于原始路径；评审员给出43%平局、28%蒸馏胜出和29%原始胜出。余弦相似度结果不一：均值0.682，中位数0.712，54.1%的配对高于0.70参考阈值。安全关键领域通过基于规则的网关保守地路由至直通模式。在所述假设下，每次调用净节能估计为70-270 uWh。SPSD表明，设备端提示蒸馏可以在保持响应质量在实际非劣效范围内的同时，降低云LLM的输入token成本。

英文摘要

The prefill stage of Large Language Model (LLM) inference is a growing contributor to cloud-scale energy cost. Many consumer-support and conversational prompts contain social scaffolding: politeness markers, apologetic preamble, repetition, and rapport-building language that is important for human communication but carries low marginal information for machine reasoning. We call this discrepancy the Social-Semantic Gap. We present SPSD (Sentiment Preserving Semantic Distillation), an edge-based pipeline that compresses user prompts using a 4-bit quantised Small Language Model before transmission to a cloud-deployed LLM. Evaluation on a 248-prompt corpus using Gemma-2-2B-Instruct (Q4_K_M) as the SLM and Llama-3.1-8B-Instruct as the cloud evaluation model yields a mean input token saving of 99.9 tokens per distilled call, with all 146 distilled calls yielding positive savings. Response quality, assessed by blind LLM-as-judge scoring across 121 pairs, is non-inferior to the raw path within a pre-specified 1-point margin on a 15-point rubric; the judge awarded 43 percent ties, 28 percent distilled wins, and 29 percent raw wins. Cosine similarity is mixed: mean 0.682, median 0.712, with 54.1 percent of pairs above the 0.70 reference threshold. Safety-critical domains are conservatively routed to passthrough via rule-based gates. Per-call net energy saving is estimated at 70-270 uWh under stated assumptions. SPSD shows that on-device prompt distillation can reduce cloud LLM input-token cost while preserving response quality within a practical non-inferiority margin.

2606.19365 2026-06-19 cs.LG 新提交

Performance Analysis and Optimization of 3D Generative Diffusion Models across GPU Architectures

跨GPU架构的3D生成扩散模型性能分析与优化

Jeeho Ryoo, Yongchan Jung, Muhammad Ali Khaliq, Weidong Zhang, Jiatong Han, Byeong Kil Lee

发表机构 * Fairleigh Dickinson University（费尔利·迪金森大学）； The University of Colorado at Colorado Springs（科罗拉多大学科罗拉多斯普林斯分校）； Northeastern University（东北大学）

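AI summary: For the 3D MRI diffusion model Med-DDPM, the paper analyzes kernel-level performance bottlenecks across three generations of NVIDIA architectures and evaluates TF32 Tensor Core activation and a 3D channels-last layout, which cut SM cycles by up to 100x and dynamic instructions by 100x, raise Tensor Core utilization from 1.45 to 9.98x, and improve IPC by 7%.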

DOI: 10.1145/3777884.3797012

AI中文摘要

扩散模型已成为高保真3D MRI合成的关键，但由于每个样本需要数百次U-Net评估以及高度异构的内核行为，其部署仍受到大量GPU资源需求的限制。本文对最先进的医学扩散模型Med-DDPM在三代NVIDIA架构上进行了全面的性能分析，研究了内核级运行时分解、指令混合特征、内存系统利用率、线程束级活动以及分析器优先级得分估计。我们发现训练主要由cuDNN卷积和隐式GEMM内核主导，效率低下源于内存访问模式、张量布局转换和有限的Tensor Core利用率。基于这些洞察，我们评估了两种架构感知优化——TF32 Tensor Core激活和3D channels-last布局，并证明它们将SM周期减少多达100倍，动态指令减少100倍，Tensor Core利用率从1.45倍提高到9.98倍，并在A100上将IPC提高7%，且不降低合成质量。

英文摘要

Diffusion models have become essential for high-fidelity 3D MRI synthesis, yet their deployment remains constrained by substantial GPU resource demands arising from hundreds of U-Net evaluations per sample and highly heterogeneous kernel behavior. This paper performs a comprehensive performance analysis of the state-of-the-art medical diffusion model, Med-DDPM, across three generations of NVIDIA architectures to study kernel-level runtime breakdowns, instruction-mix characteristics, memory system utilization, warp-level activities, and profiler priority-score estimates. We show that training is overwhelmingly dominated by cuDNN convolution and implicit-GEMM kernels, with inefficiencies arising from memory-access patterns, tensor-layout conversions, and limited Tensor Core utilization. Guided by these insights, we evaluate two architecture-aware optimizations, TF32 Tensor Core activation and a 3D channels-last layout, and demonstrate that they reduce SM cycles by up to 100x, cut dynamic instructions by 100x, raise Tensor Core utilization from 1.45 to 9.98x, and increase IPC by 7% on A100, all without degrading synthesis quality.

2606.19528 2026-06-19 cs.LG cs.AI 新提交

Techniques for Peak Memory Reduction for LoRA Fine-tuning of LLMs on Edge Devices

边缘设备上LLM LoRA微调峰值内存降低技术

Hassan Dbouk, Matthias Reisser, Prathamesh Mandke, Likhita Arun Navali, Christos Louizos

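AI summary: To address the memory bottleneck of LoRA fine-tuning of LLMs on edge devices, the paper proposes four complementary techniques (quantization, checkpointing, softmax approximation, and logits masking), achieving peak-memory reductions of up to 26x on Llama-3.2 3B and 28x on Qwen-2.5 3B.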

Comments Hassan Dbouk and Matthias Reisser contributed equally to this work

2606.19549 2026-06-19 cs.LG 新提交

Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates

预测参数高效微调更新的可合并性

Lin Tang, Wei Zhang, Jing Li, Hongyu Chen, Ming Zhao, Yuxuan Wang

发表机构 * Sichuan University（四川大学）； University of Electronic Science and Technology of China（电子科技大学）

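AI summary: Proposes MergeProbe, which predicts the mergeability of LoRA adapters from early-training signals and attains the best average and worst-case retention on the MERGE-PEFT benchmark.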

AI中文摘要

低秩适配（LoRA）使得训练许多领域和任务特定的语言模型适配器变得廉价，但两个适配器是否可以合并通常只有在两者都经过充分训练和评估后才能发现。这种延迟反馈代价高昂：单独表现强大的适配器在合并更新后可能会产生破坏性干扰。我们询问是否可以预测这种结果。我们将适配器可合并性形式化为适配器在合并后保持其单任务效用的程度，并表明可以从训练初期百分之几的信号中预测——主要是低秩更新及其梯度在不同任务间的对齐程度以及它们对共享表示的干扰程度。我们将这些信号打包成MergeProbe，一个轻量级预测器，用于估计成对和集合级别的保留，并将估计转化为具体决策：直接合并、重新加权、剪枝或路由。在MERGE-PEFT（一个涵盖数学、代码、科学、指令遵循和安全的五领域基准）上，MergeProbe在强干扰感知合并基线中实现了最佳平均和最差保留，同时增加的部署开销远低于完整任务路由。这将LoRA合并从事后工程步骤转变为预期测量问题。

英文摘要

Low-rank adaptation (LoRA) makes it cheap to train many domain- and task-specific language model adapters, but whether two adapters can be merged is usually discovered only after both have been fully trained and evaluated. This late feedback is costly: adapters that are strong in isolation can interfere destructively once their updates are combined. We ask whether this outcome can be anticipated. We formalize adapter mergeability as the degree to which an adapter preserves its single-task utility after merging, and show that it can be forecast from signals measured in the first few percent of training -- chiefly how the low-rank updates and their gradients align across tasks and how much they disturb shared representations. We package these signals into MergeProbe, a lightweight predictor that estimates pairwise and set-level retention and turns the estimate into a concrete decision: merge directly, reweight, prune, or route. On MERGE-PEFT, a five-domain benchmark spanning math, code, science, instruction following, and safety, MergeProbe attains the best average and worst-case retention among strong interference-aware merge baselines while adding far less deployment overhead than full task routing. This turns LoRA merging from a post-hoc engineering step into an anticipatory measurement problem.
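
One of the signals the abstract names, how low-rank updates align across tasks, can be illustrated with a cosine similarity between two adapters' dense updates. This is a minimal sketch under assumptions: the helper names `low_rank_update` and `cosine_alignment` are hypothetical, the matrices are plain nested lists, and nothing here reproduces MergeProbe's actual features or predictor.

```python
import math

def low_rank_update(B, A):
    """Dense update delta_W = B @ A for a LoRA pair (B: d x r, A: r x k), as nested lists."""
    return [[sum(B[i][t] * A[t][j] for t in range(len(A))) for j in range(len(A[0]))]
            for i in range(len(B))]

def cosine_alignment(X, Y):
    """Cosine similarity between two same-shaped matrices after flattening."""
    x = [v for row in X for v in row]
    y = [v for row in Y for v in row]
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    # Treat an all-zero update as having no alignment (illustrative convention).
    return dot / norm if norm > 0 else 0.0
```

A value near 1 means the two updates push the shared weights in the same direction, a value near -1 means they oppose each other, and a value near 0 means they touch mostly disjoint directions. How such a score maps to a merge, reweight, prune, or route decision is not specified in the abstract.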

2606.19712 2026-06-19 cs.LG cs.CV 新提交

Efficient Neural Network Model Selection for Few-Class Application Datasets

面向少类应用数据集的高效神经网络模型选择

Bryan Bo Cao, Abhinav Sharma, Lawrence O'Gorman, Michael Coss, Shubham Jain

发表机构 * Nokia Bell Labs（诺基亚贝尔实验室）

AI总结针对实际应用中常见的少类数据集，提出基于数据属性的分类难度度量，实现比传统方法快6-29倍的模型选择，并扩展模型族至更小规模，在移动机器人等场景中提升效率。

Comments 36 pages, 9 tables, 13 figures

AI中文摘要

尽管大量工作集中在开发和基准测试高性能神经网络上，但较少关注已知的数据集属性如何指导高效的模型选择。神经网络模型通常在数千类数据集上评估，然而许多实际应用涉及少于十类。为了解决这一被忽视但常见的情况，我们基于数据侧属性开发了一种分类难度度量，并展示了它如何为少类数据集实现更高效的模型选择，而传统方法在此效果较差。我们将此现象称为“少类独特性”。我们的度量允许比重复训练和测试快6到29倍的模型和数据集比较。利用这一洞察，我们将缩放模型族扩展到已发布的最小模型以下，在相似精度下实现更高效率，例如在移动机器人任务中模型比YOLOv5-nano小42%。针对资源受限的应用，我们在移动机器人、无人机和物联网场景中展示了少类模型选择，突出了在不牺牲性能的情况下效率的实际提升。

英文摘要

While much effort has focused on developing and benchmarking high-performance neural networks, less attention has been given to how dataset properties, known to practitioners, can guide efficient model selection. Neural models are typically evaluated on datasets with thousands of classes, yet many real-world applications involve fewer than ten. To address this understudied but common setting, we develop a measure of classification difficulty based on data-side properties and show how it enables more efficient model selection for few-class datasets, where traditional approaches are less effective. We term this phenomenon "few-class distinctiveness". Our metric allows comparison of models and datasets 6 to 29$\times$ faster than repeated training and testing. Leveraging this insight, we extend scaled model families below the smallest published models, achieving greater efficiency at similar accuracy, for example models up to 42% smaller than YOLOv5-nano for a mobile robot task. Targeting resource-constrained applications, we demonstrate few-class model selection across mobile robot, drone, and IoT scenarios, highlighting practical gains in efficiency without sacrificing performance.

2606.19919 2026-06-19 cs.LG 新提交

ADaPT: Token-Level Decoupling for Efficient Large Reasoning Models

ADaPT：面向高效大推理模型的令牌级解耦

Tingyun Li, Zishang Jiang, Jinyi Han, Xinyi Wang, Sihang Jiang, Han Xia, Zhaoqian Dai, Shuguang Ma, Fei Yu, Jiaqing Liang, Yanghua Xiao

发表机构 * School of Data Science, Fudan University（复旦大学数据科学学院）； Shanghai Institute of Artificial Intelligence for Education, East China Normal University（华东师范大学上海智能教育研究院）； College of Computer Science and Artificial Intelligence, Fudan University（复旦大学计算机科学与人工智能学院）； Ant Group（蚂蚁集团）

AI总结提出ADaPT，通过令牌级双过程框架解耦效率与正确性信号，引入模式选择令牌控制快慢推理，实现推理时效率-性能权衡的精确连续控制，在降低推理成本的同时保持强推理能力。

AI中文摘要

大型推理模型依赖长思维链实现强性能，但统一应用此类推理会产生高计算成本。现有面向效率的方法试图缩短或混合推理策略，但往往会降低推理能力。我们将根本原因识别为效率激励与正确性优化之间的序列级耦合，这隐式惩罚了长但正确的推理轨迹。为解决此问题，我们提出自适应双过程思维（ADaPT），一种令牌级双过程框架，在训练期间显式解耦效率和正确性信号。ADaPT引入模式选择令牌来控制快速和慢速推理，将效率相关奖励仅应用于此令牌，以避免惩罚正确的长推理，同时在适当时鼓励效率。此外，ADaPT在推理时实现了对效率-性能权衡的精确连续控制：通过调整模式选择令牌的生成概率，单个训练好的模型可以平滑地沿效率-性能帕累托前沿移动。大量实验表明，ADaPT在多个基准测试中显著降低推理成本，同时保持强推理性能。

英文摘要

Large reasoning models rely on long chain-of-thought to achieve strong performance, but applying such reasoning uniformly incurs high computational cost. Existing efficiency-oriented methods attempt to shorten or mix reasoning strategies, yet often degrade reasoning capability. We identify the root cause as sequence-level coupling between efficiency incentives and correctness optimization, which implicitly penalizes long but correct reasoning trajectories. To address this issue, we propose Adaptive Dual-Process Thinking (ADaPT), a token-level dual-process framework that explicitly decouples efficiency and correctness signals during training. ADaPT introduces a mode-selection token to control fast and slow reasoning, applying efficiency-related rewards exclusively to this token to avoid penalizing correct long reasoning while encouraging efficiency when appropriate. Moreover, ADaPT enables precise and continuous control over the efficiency-performance trade-off at inference time: by adjusting the generation probability of the mode-selection token, a single trained model can smoothly move along the efficiency-performance Pareto frontier. Extensive experiments demonstrate that ADaPT significantly reduces inference cost while maintaining strong reasoning performance across multiple benchmarks.

2606.19964 2026-06-19 cs.LG cs.AR 新提交

Low-Energy Reduced RISC-V Instruction Subset Processor for Tsetlin Machine Inference at the Edge

用于边缘Tsetlin Machine推理的低能耗精简RISC-V指令子集处理器

Chanda Gupta, Sanidhya Bhatia, Shaurya Priyadarshi, Himani Panwar, Rishad Shafik, Sudip Roy

AI总结针对Tsetlin Machine推理，提出一种领域专用RISC-V微处理器架构，通过指令精简和数据路径简化，在保持可编程性的同时实现高达98%的执行时间减少和29.7倍能耗降低。

Comments 6 pages, 6 figures, accepted at the IEEE ISVLSI Conference 2026

AI中文摘要

Tsetlin Machine (TM) 是一种基于逻辑的机器学习方法，依赖于简单的位运算和有限状态自动机，使其适用于边缘AI部署。最近的工作集中在基于Tsetlin Machine (TM) 的协处理器和加速器设计上。尽管这些设计实现了高性能，但它们通常依赖于紧密耦合的接口、微码风格的编程和外部主机处理器，限制了灵活性和编程简易性。在这项工作中，我们提出了一种面向TM推理的领域专用RISC-V微处理器架构和设计流程。利用RISC-V的模块化结构，我们设计了一个精简指令子集处理器，在保持可编程性的同时，针对TM工作负载提高了性能并降低了能耗。采用指令分析来指导指令精简，随后针对TM推理进行数据路径和控制路径的简化。在多个数据集上评估了基线RV32IM核心和所提出的精简核心，并与二值神经网络 (BNN) 进行比较，BNN由于在推理过程中依赖位运算而被用作硬件高效基线。结果表明，TM实现了相当或更高的准确率（例如，在CIFAR-2上高达88.18%，而BNN为60.0%），同时在多个数据集上执行时间减少了高达98%。此外，所提出的设计实现了平均29.7倍的能耗降低，证明了其在可编程且高效的边缘AI系统中的有效性。

英文摘要

The Tsetlin Machine (TM) is a logic-based machine learning approach that relies on simple bitwise operations and finite-state automata, which makes it attractive for edge AI deployments. Recent work has focused on co-processor and accelerator designs based on Tsetlin Machines (TMs). Although these designs achieve high performance, they typically depend on tightly coupled interfaces, microcode-style programming, and external host processors, limiting flexibility and ease of programming. In this work, we present a domain-specific RISC-V microprocessor architecture and design flow tailored for TM inference. Leveraging the modular structure of RISC-V, we design a reduced instruction subset processor that retains programmability while targeting improved performance and lower energy consumption for TM workloads. Instruction profiling is employed to guide instruction reduction, followed by datapath and control path simplifications tailored to TM inference. Both the baseline RV32IM core and the proposed reduced core are evaluated across multiple datasets and compared with Binarized Neural Networks (BNNs), which serve as a hardware-efficient baseline due to their reliance on bitwise operations during inference. Results show that TM achieves comparable or higher accuracy (e.g., up to 88.18% on CIFAR-2 compared to 60.0% for BNN) while reducing execution time by up to 98% across multiple datasets. Furthermore, the proposed design achieves an average $29.7\times$ reduction in energy consumption, demonstrating its effectiveness for programmable and efficient edge AI systems.

2606.19993 2026-06-19 cs.LG 新提交

Activation- and Influence-Aware Ranks (AIR): Function-Preserving SVD Compression for LLMs

激活与影响感知秩 (AIR)：保持功能的SVD压缩用于大语言模型

Nico Harder, Daniel Becking, Karsten Mueller, Wojciech Samek

AI总结提出AIR框架，基于SVD和反向信号影响度量，通过单次交替最小二乘扫描实现权重矩阵的低秩近似，在参数保留≤60%时困惑度比SVD-LLM(W)改善>18%，并减少90%校准数据。

Comments Accepted at the ICML 2026 Workshop on Resource-Adaptive Foundation Model Inference (AdaptFM), Seoul, South Korea (non-archival)

2606.20005 2026-06-19 cs.LG cs.AI 新提交

StreamKL: Fast and Memory-Efficient KL Divergence for Boosting Attention Distillation

StreamKL: 快速且内存高效的KL散度用于提升注意力蒸馏

Guangda Liu, Yiquan Wang, Chengwei Li, Wenhao Chen, Jing Lin, Yiwu Yao, Danning Ke, Wenchao Ding, Jieru Zhao

发表机构 * Shanghai Jiao Tong University（上海交通大学）； Huawei（华为）； Fudan University（复旦大学）

AI总结提出StreamKL，首个融合GPU原语，通过在线公式和逐块重计算将注意力蒸馏的内存和IO成本从O(N_QN_K)降至O(1)，实现高达43倍前向和14倍反向加速。

AI中文摘要

注意力蒸馏通过最小化Kullback-Leibler (KL)散度来训练一个注意力分布匹配另一个，广泛应用于知识蒸馏、模型压缩、持续学习和稀疏注意力LLM训练。然而，现有方法在计算KL归约前需要具体化两个注意力分布，导致$O(N_QN_K)$的内存和IO成本，在长上下文长度下变得不可接受。我们提出StreamKL，首个用于注意力KL散度的融合GPU原语，消除了这种二次具体化。StreamKL推导了一种新颖的在线公式用于耦合的双分布KL归约，使得单个前向内核能够通过片上SRAM流式处理查询-键块。对于反向传播，StreamKL逐块重计算注意力概率，避免存储二次中间结果。我们进一步设计并实现了具有专用优化的高效GPU内核。实验表明，StreamKL在前向和反向传播中分别比基线方法快高达43倍和14倍。最重要的是，StreamKL将注意力蒸馏的额外HBM占用从$O(N_QN_K)$减少到$O(1)$，使得在单个GPU上进行长上下文蒸馏成为可能。

英文摘要

Attention distillation, which trains one attention distribution to match another by minimizing their Kullback-Leibler (KL) divergence, is widely used in knowledge distillation, model compression, continual learning, and sparse-attention LLM training. However, existing approaches materialize both attention distributions before computing the KL reduction, incurring $O(N_QN_K)$ memory and IO costs that become prohibitive at long context lengths. We present StreamKL, the first fused GPU primitive for attention KL divergence that eliminates this quadratic materialization. StreamKL derives a novel online formulation for the coupled two-distribution KL reduction, enabling a single one-pass forward kernel that streams query-key tiles through on-chip SRAM. For the backward pass, StreamKL recomputes attention probabilities tile-by-tile, avoiding storage of quadratic intermediates. We further design and implement efficient GPU kernels with dedicated optimizations. Experiments show StreamKL delivers up to $43\times$ and $14\times$ speedups over baseline methods in the forward and backward passes, respectively. Most importantly, StreamKL reduces the extra HBM footprint of attention distillation from $O(N_QN_K)$ to $O(1)$, enabling long-context distillation on a single GPU.
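
To make the idea of an online KL reduction concrete, the sketch below computes KL(softmax(a) || softmax(b)) for one row of logits in a single pass over tiles, keeping only a few running scalars. It is an illustration under assumptions, not the StreamKL kernel: it is plain Python on the CPU, the function name `streaming_kl` and the `tile` parameter are hypothetical, and it relies on the identity KL = sum_i p_i (a_i - b_i) - logZ_a + logZ_b combined with the usual running-max rescaling from online softmax.

```python
import math

def streaming_kl(a_logits, b_logits, tile=4):
    """KL(softmax(a) || softmax(b)) in one pass over tiles with O(1) extra state."""
    m_a = m_b = -math.inf   # running maxima of each logit stream
    s_a = s_b = 0.0         # running sums of exp(logit - running max)
    w = 0.0                 # running sum of exp(a - m_a) * (a - b)
    for start in range(0, len(a_logits), tile):
        a_tile = a_logits[start:start + tile]
        b_tile = b_logits[start:start + tile]
        new_m_a = max(m_a, max(a_tile))
        new_m_b = max(m_b, max(b_tile))
        # Rescale the accumulators to the new maxima (exp(-inf) = 0 on the first tile).
        scale_a = math.exp(m_a - new_m_a)
        scale_b = math.exp(m_b - new_m_b)
        s_a *= scale_a
        w *= scale_a
        s_b *= scale_b
        for a, b in zip(a_tile, b_tile):
            e = math.exp(a - new_m_a)
            s_a += e
            w += e * (a - b)
            s_b += math.exp(b - new_m_b)
        m_a, m_b = new_m_a, new_m_b
    log_z_a = m_a + math.log(s_a)
    log_z_b = m_b + math.log(s_b)
    return w / s_a - log_z_a + log_z_b

def reference_kl(a_logits, b_logits):
    """Two-pass reference that materializes both softmax distributions."""
    def softmax(x):
        m = max(x)
        e = [math.exp(v - m) for v in x]
        z = sum(e)
        return [v / z for v in e]
    p, q = softmax(a_logits), softmax(b_logits)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
```

The reference version stores both distributions in full, which is the materialization the abstract describes as costly. The streaming version never holds more than one tile plus five scalars per row.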

2606.20474 2026-06-19 cs.LG cs.AI cs.PF 新提交

UltraQuant: 4-bit KV Caching for Context-Heavy Agents

UltraQuant: 面向上下文密集型智能体的4位KV缓存

Inesh Chakrabarti, David Limpus, Aditi Ghai Rana, Bowen Bao, Spandan Tiwari, Thiago Crepaldi, Ashish Sirasao

发表机构 * Advanced Micro Devices（超威半导体）； University of California, Los Angeles（加州大学洛杉矶分校）； Purdue University（普渡大学）

AI总结针对上下文密集型智能体场景，提出UltraQuant方法，通过4位KV缓存压缩、旋转量化和代码本量化，结合AMD GPU优化，在长上下文多轮任务中延迟降低3.47倍，吞吐量提升1.63倍。

Comments 11 pages, 9 figures

AI中文摘要

上下文密集型智能体给键值（KV）缓存带来了异常压力：长前缀在多个短轮次中重复使用，而并发性决定了服务系统能否保持GPU利用率。我们针对此场景研究4位KV缓存压缩，采用TurboQuant风格的旋转和代码本量化作为质量锚点，vLLM FP8 KV缓存作为部署锚点。我们报告三项贡献。首先，我们将4位KV缓存框架用于多轮智能体工作负载，其中任务质量、缓存驻留和服务吞吐量必须联合衡量。其次，我们描述了使4位路径鲁棒所需的实际设计选择，包括非对称K/V处理、Walsh-Hadamard旋转、QJL移除和块尺度变体。第三，我们展示了AMD GPU上的服务优化，包括优化的解码注意力内核和UltraQuant，一种使用FP8查询、FP4 KV张量、UE8M0组尺度和CDNA4上原生缩放MFMA支持的FP4近似路径。在长上下文、多轮智能体工作负载上，UltraQuant在缓存压力大的后期轮次中将P50首令牌延迟降低了3.47倍（所有轮次平均2.3倍），并将输出吞吐量比FP8 KV基线提高了1.63倍。

英文摘要

Context-heavy agents place unusual pressure on the key-value (KV) cache: long prefixes are reused across many short turns, while concurrency determines whether the serving system can keep GPUs utilized. We study 4-bit KV-cache compression for this setting, using TurboQuant-style rotation and codebook quantization as a quality anchor and vLLM FP8 KV caching as the deployment anchor. We report three contributions. First, we frame 4-bit KV caching around multi-round agent workloads where task quality, cache residency, and serving throughput must be measured jointly. Second, we describe the practical design choices needed to make the 4-bit path robust, including asymmetric K/V treatment, Walsh-Hadamard rotation, QJL removal, and block-scale variants. Third, we present serving optimizations on AMD GPUs, including optimized decode-attention kernels and UltraQuant, an FP4 approximation path that uses FP8 queries, FP4 KV tensors, UE8M0 group scales, and native scaled-MFMA support on CDNA4. On a long-context, multi-turn agentic workload, UltraQuant cuts P50 time-to-first-token by 3.47x in the cache-pressured late rounds (2.3x across all rounds) and raises output throughput by 1.63x over the FP8 KV baseline.
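
The role of a Walsh-Hadamard rotation ahead of low-bit quantization can be sketched as follows. This is a simplified stand-in under stated assumptions: it uses symmetric uniform 4-bit integer codes with one float scale per vector, whereas the abstract describes codebook quantization, FP4 tensors, and UE8M0 group scales. The function names are hypothetical and nothing here reflects the UltraQuant or vLLM implementations.

```python
import math

def fwht(x):
    """Orthonormal fast Walsh-Hadamard transform; len(x) must be a power of two."""
    n = len(x)
    assert n > 0 and n & (n - 1) == 0, "length must be a power of two"
    y = list(x)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = y[j], y[j + h]
                y[j], y[j + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in y]

def quantize_int4(x):
    """Symmetric uniform 4-bit quantization: integer codes in [-7, 7] plus one float scale."""
    amax = max(abs(v) for v in x)
    scale = amax / 7.0 if amax > 0 else 1.0
    codes = [max(-7, min(7, round(v / scale))) for v in x]
    return codes, scale

def dequantize_int4(codes, scale):
    return [c * scale for c in codes]

def rotate_quantize_roundtrip(x):
    """Rotate, quantize to 4 bits, dequantize, rotate back."""
    codes, scale = quantize_int4(fwht(x))
    # The orthonormal Walsh-Hadamard transform is its own inverse.
    return fwht(dequantize_int4(codes, scale))
```

The rotation spreads a single large outlier across all coordinates, so the one shared scale is no longer dictated by one entry. Because the transform is orthonormal, the L2 quantization error in the rotated space carries over unchanged to the original space.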

2606.20537 2026-06-19 cs.LG cs.DC 新提交

Execution-State Capsules: Graph-Bound Execution-State Checkpoint and Restore for Low-Latency, Small-Batch, On-Device Physical-AI Serving

执行状态胶囊：面向低延迟、小批量、设备端物理AI服务的图绑定执行状态检查点与恢复

Liang Su

AI总结针对低延迟、小批量、设备端物理AI服务场景，提出执行状态胶囊机制，通过图绑定检查点与恢复完整可恢复状态，在RTX 5090上实现亚毫秒级恢复，TTFT加速比达3.9倍至27倍。

Comments 27 pages, 9 figures

AI中文摘要

主流LLM服务系统主要通过分页或基数键值（KV）缓存重用前缀工作。这对于高吞吐量、高并发服务非常有效，但它只管理执行状态的一个位置片段：KV缓存。我们研究相反的场景：低延迟、小批量、设备端物理AI服务，其中交互式LLM代理、语音系统和机器人策略在严格的响应预算下频繁分支、重置、中断和重新进入。我们引入执行状态胶囊，一种图绑定的检查点和恢复机制，用于在提交边界处保存完整的可恢复状态。FlashRT是一个白盒、后端内核运行时，其评估的NVIDIA CUDA后端在连续的静态缓冲区上运行捕获的图计划，无需块表间接寻址。由于活动状态是一组命名的封闭缓冲区，胶囊可以快照、恢复、分叉或回滚整个执行边界，包括KV、循环状态、卷积状态、MTP状态和元数据。这将重用从令牌寻址的KV片段转移到图绑定的执行状态边界。在RTX 5090上，胶囊恢复在存储状态级别是字节精确的，在贪婪解码下是令牌一致的。仅KV的消融实验出现分歧，表明循环状态是承载负载的。GPU驻留的快照和恢复是亚毫秒级的，TTFT相对于冷预填充的加速比从2k令牌时的3.9倍增长到16k令牌时的27倍。在Jetson AGX Thor和DGX Spark上，相同的正确性和结构属性成立。胶囊不是高吞吐量KV缓存服务的替代品；它们定义了一个互补的以延迟为先的服务点，用于显式执行状态重用。

英文摘要

Mainstream LLM serving systems reuse prefix work mainly through paged or radix key-value (KV) caches. This is highly effective for high-throughput, high-concurrency serving, but it manages only one positional fragment of execution state: the KV cache. We study the opposite regime: low-latency, small-batch, on-device physical-AI serving, where interactive LLM agents, speech systems, and robot policies repeatedly branch, reset, interrupt, and re-enter under tight responsiveness budgets. We introduce execution-state capsules, a graph-bound checkpoint and restore mechanism for the complete restorable state at a committed boundary. FlashRT is a white-box, backend-facing kernel runtime whose evaluated NVIDIA CUDA backend runs captured graph plans over contiguous static buffers with no block-table indirection. Because the live state is a closed set of named buffers, a capsule can snapshot, restore, fork, or roll back the whole execution boundary, including KV, recurrent state, convolution state, MTP state, and metadata. This moves reuse from token-addressed KV fragments to graph-bound execution-state boundaries. On an RTX 5090, capsule restore is byte-exact at the stored-state level and token-identical under greedy decode. A KV-only ablation diverges, showing that recurrent state is load-bearing. GPU-resident snapshot and restore are sub-millisecond, and TTFT speedup over cold prefill grows from 3.9x at 2k tokens to 27x at 16k tokens. On Jetson AGX Thor and DGX Spark, the same correctness and structural properties hold. Capsules are not a replacement for high-throughput KV-cache serving; they define a complementary latency-first serving point for explicit execution-state reuse.

2606.19354 2026-06-19 cs.CL cs.LG 交叉投稿

Granularity-Regulated Adaptive Computational Efficiency for Optimal Verification in Test-Time Scaling

粒度调控的自适应计算效率：测试时扩展中的最优验证

Ardit Krasniqi, Luan Vejsiu, Elira Dervishi

发表机构 * European University of Tirana（欧洲地拉那大学）

AI总结提出GRACE理论框架，将验证粒度建模为问题难度、验证器准确率和计算预算的函数，证明存在相变：细粒度验证在计算预算大或问题难时占优，粗粒度验证在低预算简单问题时更优，自适应策略可达到计算-性能帕累托前沿。

AI中文摘要

测试时扩展（TTS）已成为一种强大的范式，通过在推理时投入额外计算来提升大语言模型（LLMs）的推理性能。TTS的核心组件是验证器，它选择或评分候选解以引导搜索过程。虽然先前工作已探索验证的益处，但一个基本问题仍未充分探索：在给定计算预算下，最优验证粒度是什么？粗粒度的结果奖励模型（ORMs）和细粒度的过程奖励模型（PRMs）代表两个极端，但两者单独均无法在所有场景下实现计算最优性。本文建立了一个统一的理论框架，称为GRACE（粒度调控的自适应计算效率），该框架将最优验证粒度刻画为问题难度、验证器准确率和计算预算的显式函数。我们证明存在一个相变：当计算预算大或问题难时，细粒度验证占优；而在低预算、简单问题场景下，粗粒度验证更受青睐。我们的理论将Best-of-N、束搜索和步骤级MCTS统一在一个帕累托最优框架内，并激发了一种自适应粒度策略，该策略可证明达到计算-性能帕累托前沿。在MATH-500、GSM8K和AIME基准上的实验结果证实了所有四个理论主张，在匹配计算量下，我们的自适应策略相比固定粒度基线准确率提升高达3.1%。

英文摘要

Test-time scaling (TTS) has emerged as a powerful paradigm for improving the reasoning performance of large language models (LLMs) by investing additional compute at inference time. A central component of TTS is the verifier, which selects or scores candidate solutions to guide the search process. While prior work has explored the benefit of verification, a fundamental question remains underexplored: what is the optimal granularity of verification under a given compute budget? Coarse-grained outcome reward models (ORMs) and fine-grained process reward models (PRMs) represent two extremes, yet neither alone achieves compute-optimality across all regimes. In this paper, we establish a unified theoretical framework, called GRACE (Granularity-Regulated Adaptive Computational Efficiency), that characterizes the optimal verification granularity as an explicit function of problem difficulty, verifier accuracy, and compute budget. We prove that there exists a phase transition: fine-grained verification dominates when either the compute budget is large or the problem is hard, whereas coarse-grained verification is preferred in the low-budget, easy-problem regime. Our theory unifies Best-of-$N$, beam search, and step-level MCTS within a single Pareto-optimality framework, and motivates an adaptive granularity strategy that provably achieves the compute-performance Pareto frontier. Empirical results on MATH-500, GSM8K, and AIME benchmarks corroborate all four theoretical claims, with our adaptive strategy outperforming fixed-granularity baselines by up to 3.1% accuracy at matched compute.

2606.19799 2026-06-19 cs.SE cs.LG 交叉投稿

CAGE: Curvature-Aware Gradient Estimation for Accurate Quantization-Aware Training

CAGE: 曲率感知梯度估计用于精确的量化感知训练

Soroush Tabesh, Mher Safaryan, Andrei Panferov, Alexandra Volkova, Dan Alistarh

发表机构 * Anonymous Authors（匿名作者）

AI总结提出CAGE方法，通过曲率感知校正项改进直通估计器，平衡损失最小化与量化约束，在平滑非凸设置下提供收敛保证，显著提升低比特量化感知训练的精度。

Comments Accepted at MLSys 2026 (Oral). To appear in Proceedings of Machine Learning and Systems 8

Journal ref Proceedings of Machine Learning and Systems 8 (MLSys 2026)

AI中文摘要

尽管在低比特量化感知训练（QAT）方面已有大量工作，但这些技术与原生训练之间仍存在精度差距。为解决这一问题，我们引入了CAGE（曲率感知梯度估计），一种新的QAT方法，它用曲率感知校正项增强直通估计器（STE）梯度，旨在抵消量化引起的损失增加。CAGE源自QAT的多目标视角，平衡损失最小化与量化约束，产生一个依赖于局部曲率信息的原理性校正项。在理论方面，我们引入了量化优化的帕累托最优解概念，并证明CAGE在平滑非凸设置下具有强收敛保证。在实现方面，我们的方法是优化器无关的，但我们提供了一个利用Adam统计信息的高效实现。在相似计算成本下，CAGE在精度上显著优于先前最先进的方法：对于QAT微调，它将压缩精度损失相对于先前最佳方法减半；而对于Llama模型的QAT预训练，其在3比特权重和激活（W3A3）下的精度与先前最佳方法在4比特（W4A4）下达到的精度相当。官方实现可在以下链接找到：https://github.com/IST-DASLab/CAGE。

英文摘要

Despite significant work on low-bit quantization-aware training (QAT), there is still an accuracy gap between such techniques and native training. To address this, we introduce CAGE (Curvature-Aware Gradient Estimation), a new QAT method that augments the straight-through estimator (STE) gradient with a curvature-aware correction designed to counteract the loss increase induced by quantization. CAGE is derived from a multi-objective view of QAT that balances loss minimization with the quantization constraints, yielding a principled correction term that depends on local curvature information. On the theoretical side, we introduce the notion of Pareto-optimal solutions for quantized optimization, and establish that CAGE yields strong convergence guarantees in the smooth non-convex setting. In terms of implementation, our approach is optimizer-agnostic, but we provide a highly efficient implementation that leverages Adam statistics. CAGE significantly improves upon the prior state-of-the-art methods in terms of accuracy at similar computational cost: for QAT fine-tuning, it halves the compression accuracy loss relative to the prior best method, while for QAT pre-training of Llama models, its accuracy for 3-bit weights and activations (W3A3) matches the accuracy achieved at 4 bits (W4A4) with the prior best method. The official implementation can be found at https://github.com/IST-DASLab/CAGE.

2602.04396 2026-06-19 cs.LG cs.AI 版本更新

LoRDO: Distributed Low-Rank Optimization with Infrequent Communication

LoRDO: 分布式低秩优化与低频通信

Andrej Jovanović, Alex Iacob, Mher Safaryan, Ionut-Vlad Modoranu, Lorenzo Sani, William F. Shen, Xinchi Qiu, Dan Alistarh, Nicholas D. Lane

发表机构 * University of Cambridge（剑桥大学）； Institute of Science and Technology Austria（奥地利科学与技术研究院）； Lancaster University（兰卡斯特大学）； Flower Labs（Flower实验室）

AI总结提出LoRDO框架，统一低秩优化与低频同步，通过全秩准双曲更新恢复子空间探索，在125M-720M模型规模下实现与低秩DDP近似的性能，通信量减少约10倍。

Comments Accepted at ICML 2026

AI中文摘要

通过$\texttt{DDP}$进行基础模型的分布式训练受限于互连带宽。虽然低频通信策略减少了同步频率，但优化器状态的内存和通信需求仍然构成瓶颈。低秩优化器可以缓解这些限制；然而，在局部更新机制下，工作节点无法访问计算低秩投影所需的全批次梯度，这降低了性能。我们提出$\texttt{LoRDO}$，一个统一低秩优化与低频同步的原则性框架。我们首先证明，虽然基于伪梯度的全局投影在理论上更优，但它们将优化轨迹永久限制在低秩子空间中。为了恢复子空间探索，我们引入了一个全秩准双曲更新。$\texttt{LoRDO}$在125M-720M模型规模的语言建模和下游任务中实现了与低秩$\texttt{DDP}$近乎相同的性能，同时将通信量减少了约10倍。最后，我们表明在具有小秩/小批次大小的极低内存设置中，$\texttt{LoRDO}$的性能提升更为显著。

英文摘要

Distributed training of foundation models via $\texttt{DDP}$ is limited by interconnect bandwidth. While infrequent communication strategies reduce synchronization frequency, they remain bottlenecked by the memory and communication requirements of optimizer states. Low-rank optimizers can alleviate these constraints; however, in the local-update regime, workers lack access to the full-batch gradients required to compute low-rank projections, which degrades performance. We propose $\texttt{LoRDO}$, a principled framework unifying low-rank optimization with infrequent synchronization. We first demonstrate that, while global projections based on pseudo-gradients are theoretically superior, they permanently restrict the optimization trajectory to a low-rank subspace. To restore subspace exploration, we introduce a full-rank quasi-hyperbolic update. $\texttt{LoRDO}$ achieves near-parity with low-rank $\texttt{DDP}$ in language modeling and downstream tasks at model scales of $125$M--$720$M, while reducing communication by $\approx 10 \times$. Finally, we show that $\texttt{LoRDO}$ improves performance even more in very low-memory settings with small rank/batch size.

2602.22495 2026-06-19 cs.LG cs.AI 版本更新

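UltraEP: Unleashing MoE Training and Inference on Rack-Scale Nodes with Near-Optimal Load Balancing

UltraEP：在机架级节点上以近最优负载均衡释放MoE训练与推理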

Xinming Wei, Chao Jin, Tuo Dai, Yinmin Zhong, Shan Yu, Chengxu Yang, Bingyang Wu, Zili Zhang, Jing Mai, Qianchao Zhu, Zhouyang Li, Yuliang Liu, Guojie Luo

AI总结提出UltraEP，首个基于精确负载的实时均衡器，通过协同设计规划求解与专家复制通信，在机架级节点上实现MoE训练和推理的微批次与逐层重均衡，达到94.3%的力均衡理想吞吐量。

AI中文摘要

大规模专家并行（EP）正成为训练和服务前沿MoE模型的关键，但它也加剧了设备级专家负载不均衡，导致计算掉队者、令牌全对全瓶颈和激活内存峰值。现有的均衡器基于历史负载定期重新分配专家，这对于具有非平稳负载模式的生产部署变得不可靠。我们提出UltraEP，首个用于大规模EP MoE训练和在机架级节点（RSN）上服务预填充的精确负载实时均衡器。基于RSN扩展的纵向扩展连接性，UltraEP在关键路径上对每个微批次和层进行重均衡，这需要规划求解和专家复制通信的非平凡协同设计，以最小化暴露的开销。为此，UltraEP通过高效的配额驱动规划对门控后负载做出积极反应，并利用RSN原生的持久tile流和基于中继的扇出缓解来执行由此产生的不规则专家状态传输。在训练和预填充中，平均涵盖106B到671B参数的MoE模型，UltraEP实现了力均衡理想吞吐量的94.3%，相比无均衡提升了1.49倍，同时将最终跨秩不均衡从1.30-4.01降低到1.01-1.04。此外，我们在2560个GPU的生产MoE训练中验证了UltraEP的可扩展性和鲁棒性。

英文摘要

Large-scale expert parallelism (EP) is becoming pivotal for training and serving frontier MoE models, but it also amplifies device-level expert load imbalance into compute stragglers, token all-to-all bottlenecks, and activation-memory spikes. Existing balancers redistribute experts periodically based on historical load, which becomes unreliable for production deployments with non-stationary load patterns. We present UltraEP, the first exact-load, real-time balancer for large-EP MoE training and serving prefill on rack-scale nodes (RSNs). Leveraging the extended scale-up connectivity among dozens of GPUs within RSNs, UltraEP rebalances every microbatch and layer on critical paths, which requires nontrivial co-design of plan solving and expert replication communication to minimize exposed overhead. To this end, UltraEP eagerly reacts to post-gating load with an efficient quota-driven planner, and executes the resulting irregular expert-state transfers with RSN-native persistent tile streaming and relay-based fan-out mitigation. We evaluate UltraEP in a multi-RSN deployment of up to 256 GPUs, using cutting-edge MoE models from 106B to 671B parameters. Averaged across training and serving, UltraEP achieves 94.3% of the force-balanced ideal throughput, delivering 1.49$\times$ improvement over no-balancing, while reducing the final inter-rank imbalance from 1.30$-$4.01 to 1.01$-$1.04.

2606.19734 2026-06-19 cs.LG 新提交

Federated Bilevel Performative Prediction

联邦双层执行预测

Liangxin Qian, Chang Liu, Xuanyu Cao, Jun Zhao, Kwok-Yan Lam

发表机构 * Nanyang Technological University（南洋理工大学）； Zhejiang University（浙江大学）； Washington State University（华盛顿州立大学）

AI总结研究联邦学习中客户端数据分布受决策影响的双层优化问题，提出联邦双层执行稳定点概念及两种求解方法，实验验证了稳定性阈值和元泛化提升。

Comments Accepted by ICML 2026

AI中文摘要

联邦双层优化广泛用于跨分布式客户端的嵌套学习问题，例如在隐私和通信约束下的联邦超参数调整和元学习。大多数现有公式假设客户端数据分布固定，但执行性可能违反这一假设，其中部署的决策会重塑客户端行为和数据收集，导致客户端特定的、决策依赖的分布偏移。我们研究联邦双层执行预测，其中上层（UL）和下层（LL）目标都在客户端依赖、决策依赖的分布下进行评估。我们在解耦风险视角下形式化联邦双层执行稳定（FBPS）点，并给出其存在性和唯一性的充分条件。然后，我们开发两种联邦方法来计算FBPS解：FBi-RRM，在收缩条件下线性收敛；以及FBi-SGD，一种基于联邦超梯度估计的通信高效随机方法，在步长递减且敏感性足够小时具有收敛保证。在策略回归和元策略分类上的实验验证了预测的稳定性阈值，并展示了相对于非执行基线的元泛化改进，基于CNN的分类进一步证明了所提方法在非凸神经网络设置中的实际有效性。

英文摘要

Federated bilevel optimization is widely used for nested learning problems across distributed clients, such as federated hyperparameter tuning and meta-learning under privacy and communication constraints. Most existing formulations assume fixed client data distributions, which can be violated by performativity, where deployed decisions reshape client behavior and data collection, inducing client-specific, decision-dependent distribution shift. We study federated bilevel performative prediction, where both upper-level (UL) and lower-level (LL) objectives are evaluated under client-dependent, decision-dependent distributions. We formalize the federated bilevel performatively stable (FBPS) point under a decoupled-risk perspective and provide sufficient conditions for its existence and uniqueness. We then develop two federated methods to compute the FBPS solution: FBi-RRM, which converges linearly under a contraction condition, and FBi-SGD, a communication-efficient stochastic method based on federated hypergradient estimation with convergence guarantees under diminishing step sizes when sensitivities are sufficiently small. Experiments on strategic regression and meta strategic classification validate the predicted stability thresholds and demonstrate improved meta-generalization over non-performative baselines, and CNN-based classification further demonstrates the practical effectiveness of the proposed methods in nonconvex neural network settings.

2606.20115 2026-06-19 cs.LG cs.CV 新提交

When Calibration Fails the Vulnerable Hospital: Federated Conformal Risk Control via Risk-Curve Shrinkage

当校准失败于脆弱的医院：通过风险曲线收缩实现联邦共形风险控制

Nafis Fuad Shahid

AI总结针对联邦部署中标准共形风险控制（CRC）对个体机构覆盖不足的问题，提出基于风险曲线收缩的联邦CRC协议，在真实脑肿瘤数据上实现2.7/20的违规率且预测集仅扩大2.0倍。

Comments 9 pages, 3 figures, 2 tables. Submitted to the DeCaF Workshop at MICCAI 2026

AI中文摘要

共形风险控制（CRC）通过在保留数据上校准预测集阈值，提供分割质量的无分布保证。在联邦部署中，标准方法将各站点的校准分数合并为一个阈值。我们在真实多机构脑肿瘤数据（FeTS-2022，1251名受试者，20个机构）上首次量化表明，这种朴素的合并CRC保护了平均医院，但违反了40%个体机构的覆盖，最差站点的假阴性率超出目标7.8个百分点。朴素的替代方案——每个站点本地CRC——基本恢复了覆盖，但将预测集扩大了83倍，使其在临床上无用。我们提出一种基于收缩的联邦CRC协议：每个站点仅将其经验风险曲线（G个标量）传输到服务器，服务器为每个站点计算收缩正则化阈值。单个超参数n0平滑地权衡最坏情况覆盖与预测集效率；留一站点敏感性分析确定n0=19，在2.0倍拉伸下实现2.7/20的违规。我们进一步表明，覆盖预算的直接拉格朗日优化失败，将风险集中在脆弱的医院，并且有限样本修正项是必不可少的：移除它会使违规增加三倍。在所述站点混合假设下，边际CRC保证通过构造得以保留；在三个种子下针对四个目标验证了每个站点的覆盖。没有患者级别的图像、掩膜或每体积分数离开任何站点。

英文摘要

Conformal risk control (CRC) provides distribution-free guarantees on segmentation quality by calibrating a prediction-set threshold on held-out data. In federated deployments, the standard approach pools calibration scores across sites into a single threshold. We provide the first quantification, on real multi-institutional brain tumor data (FeTS-2022, 1,251 subjects, 20 institutions), showing that this naive pooled CRC protects the average hospital but violates coverage at 40% of individual institutions, with the worst site exceeding the target false-negative rate by 7.8 percentage points. The naive alternative, per-site local CRC, largely restores coverage but inflates prediction sets by 83x, rendering them clinically useless. We propose a shrinkage-based federated CRC protocol: each site transmits only its empirical risk curve (G scalars) to a server, which computes a shrinkage-regularized threshold per site. A single hyperparameter n0 smoothly trades worst-case coverage for prediction-set efficiency; leave-one-site-out sensitivity analysis identifies n0=19, achieving 2.7/20 violations at 2.0x stretch. We further show that direct Lagrangian optimization of coverage budgets fails, concentrating risk on vulnerable hospitals, and that the finite-sample correction term is essential: removing it triples violations. The marginal CRC guarantee is preserved by construction under the stated site-mixture assumption; per-site coverage is validated across four targets with three seeds. No patient-level images, masks, or per-volume scores leave any site.
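
The protocol outlined in the abstract (each site sends a risk curve of G scalars, and the server returns a shrinkage-regularized threshold per site) might look roughly like the sketch below. The abstract does not give the shrinkage formula, so the convex blend with weight n_site / (n_site + n0) is an assumption, as are the function names and the choice to use the site's own sample size in the correction. Only the finite-sample rule n/(n+1) * R + B/(n+1) <= alpha is the standard conformal risk control condition.

```python
def shrink_curve(site_curve, pooled_curve, n_site, n0):
    """Blend a site's empirical risk curve toward the pooled curve (assumed form).

    n0 acts like a count of pseudo-observations: n0 = 0 keeps the local curve,
    and a large n0 approaches the pooled curve.
    """
    w = n_site / (n_site + n0)
    return [w * r + (1.0 - w) * p for r, p in zip(site_curve, pooled_curve)]

def crc_threshold_index(curve, n, alpha, bound=1.0):
    """First grid index whose finite-sample-corrected risk is at most alpha.

    Assumes the risk curve is non-increasing along the threshold grid and that
    the loss is bounded above by `bound`. Returns None if no grid point qualifies.
    """
    for i, risk in enumerate(curve):
        if (n / (n + 1)) * risk + bound / (n + 1) <= alpha:
            return i
    return None
```

In this sketch only the G risk values and the site's calibration size cross the network, which matches the abstract's statement that no patient-level images, masks, or per-volume scores leave a site.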

2606.20382 2026-06-19 cs.LG 新提交

Towards Modality-imbalanced Federated Graph Learning: A Data Synthesis-based Approach

面向模态不平衡的联邦图学习：一种基于数据合成的方法

Zhengyu Wu, Hongchao Qin, Xunkai Li, Zekai Chen, Rong-Hua Li, Guoren Wang

AI总结针对联邦图学习中客户端级和节点级模态不平衡问题，提出隐式图感知潜在语义表示合成范式FedMGS，通过可用性感知图编码器、原型引导语义合成器和可靠性校准融合机制恢复缺失模态语义，在四个任务上最高提升17.41%。

AI中文摘要

多模态联邦图学习（MM-FGL）提供了一种自然的协作训练范式，但其实际部署受到两种粒度的模态不平衡挑战。当某些客户端缺少完整模态时，会出现客户端级不平衡；而当单个节点缺少视觉或文本属性时，会出现节点级不平衡。尽管存在一些相关研究，但我们的调查表明，它们主要针对图无关或集中式场景，难以直接适应。为了解决这些挑战，我们将模态不平衡的MM-FGL形式化为一个隐式图感知潜在语义表示合成问题。该范式直接在表示空间中恢复缺失的模态语义，从而最大化与原始数据语义分布的对齐，并缓解由缺失模态引起的高方差。为此，我们提出了FedMGS（联邦模态感知图合成），它集成了三个核心组件。可用性感知图编码器防止缺失模态污染局部结构传播。原型引导潜在语义合成器为不可用模态建立跨客户端语义锚点。可靠性校准语义融合机制在预测读出之前调节恢复的潜在表示的影响。在四个任务上的大量实验表明，FedMGS始终优于竞争基线，最高提升17.41%，并实现了最佳效率-性能权衡。

英文摘要

MultiModal Federated Graph Learning (MM-FGL) offers a natural collaborative training paradigm, but its practical deployment is challenged by two granularities of modality imbalance. Client-level imbalance occurs when certain clients lack entire modalities, while node-level imbalance occurs when individual nodes exhibit missing visual or textual attributes. While several relevant studies exist, our investigation reveals that they predominantly target graph-agnostic or centralized scenarios, rendering them difficult to adapt directly. To address these challenges, we formalize modality-imbalanced MM-FGL as an implicit graph-aware latent semantic representation synthesis problem. This paradigm recovers missing modal semantics directly within the representation space, thereby maximizing alignment with the original data's semantic distribution and mitigating the high variance induced by missing modalities. To this end, we propose FedMGS (Federated Modality-aware Graph Synthesis), which integrates three core components. The availability-aware graph encoder prevents missing modalities from contaminating local structural propagation. The prototype-guided latent semantic synthesizer establishes cross-client semantic anchors for unavailable modalities. The reliability-calibrated semantic fusion mechanism regulates the impact of recovered latent representations prior to predictive readout. Extensive experiments on four tasks show that FedMGS consistently outperforms competitive baselines, with gains of up to 17.41% and the best efficiency-performance trade-off.

2606.20546 2026-06-19 cs.LG 新提交

Predictability as a Fine-Grained Measure for Privacy

可预测性作为隐私的细粒度度量

Linda Lu, Karthik Sridharan

AI总结提出可预测性框架，通过攻击者预测敏感信息的能力增益来衡量隐私泄露，与差分隐私互补，并基于广义矩方法分析渐近可预测性，用于ERM输出扰动。

详情

AI中文摘要

差分隐私（DP）确保针对最知识渊博的攻击者的严格个体级隐私保证，但其最坏情况性质可能导致代价高昂的隐私-准确性权衡。我们引入了通过可预测性实现的隐私，这是一个细粒度框架，明确包含了攻击者的核心知识、由随机过程生成的数据集的受损部分以及指定的查询族。可预测性将隐私泄露衡量为攻击者在观察算法输出后，预测关于未知个体的敏感信息的能力的增量增益，超出已从受损数据中推断出的信息。我们表明，可预测性和DP通常是不可比的：一个可以很小而另一个很大。然而，在最坏情况下，当除一个个体外所有个体都受损且所有二元查询都被视为敏感时，可预测性意味着互信息DP。更一般地，可预测性提供了一种针对特定敏感信息和特定攻击者模型量身定制的更细粒度的隐私度量。我们引入了一个通用框架，使用广义矩方法（GMM），来分析当受损数据由平稳、遍历、混合过程生成时的渐近可预测性。利用这一分析，我们推导出用于ERM的可预测性校准输出扰动方案。我们的方法与DP互补，并且可以与DP一起使用以提供细粒度的隐私控制。

英文摘要

Differential privacy (DP) ensures rigorous individual-level privacy guarantees against even the most knowledgeable attackers, but its worst-case nature can impose a costly privacy-accuracy tradeoff. We introduce privacy via predictability, a fine-grained framework that explicitly incorporates the attacker's core knowledge, a compromised portion of the dataset generated by a stochastic process, and a specified family of queries. Predictability measures privacy leakage as the incremental gain in an attacker's ability to predict sensitive information about unknown individuals after observing the algorithm's output, beyond what can already be inferred from the compromised data. We show that predictability and DP are generally incomparable: each can be small while the other is large. However, in the worst-case regime where all but one individual is compromised, and all binary queries are considered sensitive, predictability implies mutual-information DP. More generally, predictability provides a finer-grained privacy metric tailored to specific sensitive information and specific attacker models. We introduce a general framework, using the generalized method of moments (GMM), to analyze asymptotic predictability when the compromised data is generated by a stationary, ergodic, mixing process. Using this analysis, we derive a predictability-calibrated output perturbation scheme for ERM. Our approach is complementary to DP and can be used alongside DP to provide fine-grained privacy control.

2606.19535 2026-06-19 cs.CR cs.LG 交叉投稿

FloatDoor: Platform-Triggered Backdoors in LLMs

FloatDoor: 大语言模型中的平台触发后门

Nils Loose, Jonas Sander, Felix Mächtle, Thomas Eisenbarth

AI总结提出FloatDoor，首个输入无关、平台触发的后门攻击，利用浮点运算平台差异，通过两个轻量LoRA适配器在目标平台触发恶意行为，同时保持模型正常效用。

详情

AI中文摘要

大型语言模型（LLM）越来越多地部署在软件工程等敏感环境中，其输出直接影响下游工件。最近的研究表明，由于非结合浮点运算和不同的内核实现，同一模型在不同部署平台上可能产生可测量的不同输出。我们研究了这种平台依赖可变性的安全影响，并揭示了LLM部署中一种新的攻击面。我们提出了FloatDoor，这是首个针对生成式LLM的输入无关、平台触发的后门攻击。被攻陷的模型在目标平台上表现出对手选择的行为，而在其他平台上则表现正常。FloatDoor通过两个轻量级LoRA适配器实现：一个放大平台间数值差异，另一个将由此产生的平台签名绑定到恶意下游任务，同时保持模型整体效用基本不变。FloatDoor利用了模型审计和部署之间的显著检查时间与使用时间差距。我们在Qwen3-4B上展示了FloatDoor，涵盖了广泛的部署目标，包括NVIDIA GPU、Google TPU、AWS Graviton和阿里巴巴Yitian-710。作为最终案例研究，我们展示了FloatDoor能够在选定的目标平台上可靠地诱导可利用的代码漏洞。我们的结果建立了一类新的LLM部署攻击，并强调了在敏感的LLM驱动应用中建立可信模型供应链的迫切需求。

英文摘要

Large language models (LLMs) are increasingly deployed in sensitive settings such as software engineering, where their outputs directly shape downstream artifacts. Recent work has shown that an identical model can produce measurably different outputs depending on the deployment platform, a consequence of non-associative floating-point arithmetic and divergent kernel implementations. We study the security implications of this platform-dependent variability and uncover a novel attack surface on LLM deployments. We introduce FloatDoor, the first input-independent, platform-triggered backdoor attack against generative LLMs. The compromised model exhibits adversary-chosen behavior when served on a target platform and is otherwise benign. FloatDoor is realized through two lightweight LoRA adapters, one that amplifies inter-platform numerical divergence and one that binds the resulting platform signature to a malicious downstream task, while leaving aggregate model utility largely intact. FloatDoor exploits a pronounced time-of-check, time-of-use gap between model auditing and serving. We demonstrate FloatDoor on Qwen3-4B across a broad range of deployment targets, including NVIDIA GPUs, Google TPUs, AWS Graviton, and Alibaba Yitian-710. As a final case study, we show that FloatDoor reliably induces exploitable code vulnerabilities on a chosen target platform. Our results establish a new class of attacks on LLM deployments and underscore the pressing need for trusted model supply chains in sensitive, LLM-powered applications.

2606.19643 2026-06-19 stat.ML cs.LG 交叉投稿

论基于最高密度区域的量化不确定性探索

Sam Goring, Tom Kuipers, Nicola Paoletti, David S. Watson

发表机构 * Northeastern University London（东北大学伦敦校区）

AI总结针对概率机器学习中回归问题的不确定性量化，提出基于最高密度区域体积的QUEST框架，满足单调性和平移不变性公理，在选择性预测基准上优于方差和微分熵。

Comments 27 pages, of which 10 are main text. Contains 7 figures, 4 tables, 1 algorithm in total

详情

AI中文摘要

不确定性量化对于概率机器学习中安全关键应用的可靠决策至关重要。对于回归问题，主流的标量不确定性量化方法（特别是基于适当评分规则的方法）通过逐点预测风险来衡量不确定性。当目标统计量不是条件期望时，这可能导致反直觉的结果。我们提出了一种替代框架，其中不确定性通过分布支撑集的最可能子集的体积来表征。QUEST（通过最高密度区域量化不确定性）是一种基于勒贝格测度在分布峰值处集中程度的新颖不确定性量化方法，在鲁棒性参数$\alpha$的一个或多个值处进行评估。我们建立了我们的度量与信息论和经济学中经典统计量之间的联系。我们表明，与基于适当评分规则的流行替代方案不同，QUEST的认知不确定性和偶然不确定性度量满足从不确定性量化文献中改编的一组公理，包括在分布扩散下的单调性和位置偏移的不变性。选择性预测基准证实，QUEST的表现优于方差和微分熵等标准度量。

英文摘要

Uncertainty quantification (UQ) is essential for reliable decision-making in safety-critical applications in probabilistic machine learning. For regression problems, dominant scalar UQ approaches - notably, those based on proper scoring rules - measure uncertainty via pointwise predictive risk. This can lead to counterintuitive results when the target statistic is not the conditional expectation. We propose an alternative framework, in which uncertainty is characterised by the volume of the most probable subset of a distribution's support. QUEST (Quantifying Uncertainty via highest dEnSiTy regions) is a novel approach to UQ based on the concentration of Lebesgue measure at a distribution's peak(s), evaluated at one or more values of a robustness parameter $\alpha$. We establish connections between our measures and classical statistics from information theory and economics. We show that, unlike popular alternatives based on proper scoring rules, QUEST measures of epistemic and aleatoric uncertainty satisfy a set of axioms adapted from the UQ literature, including monotonicity under distributional spread and invariance to location shifts. Selective prediction benchmarks confirm that QUEST performs favourably against standard measures such as variance and differential entropy.
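Editor's sketch (not from the paper): the abstract describes uncertainty as the volume of a highest density region. For a one-dimensional Gaussian this volume has a closed form, which makes the two axioms named above easy to see. The function name and the convention that the region holds probability mass 1 - alpha are assumptions made for illustration; the paper's actual definition of the robustness parameter may differ.

```python
from statistics import NormalDist

def gaussian_hdr_volume(sigma, alpha):
    """Length of the highest density region of N(mu, sigma^2) holding mass 1 - alpha.

    Assumption: the HDR covers probability mass 1 - alpha. For a Gaussian the
    HDR is the symmetric interval mu +/- z * sigma, so its length does not
    depend on the location mu at all.
    """
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return 2.0 * z * sigma

narrow = gaussian_hdr_volume(sigma=1.0, alpha=0.1)
wide = gaussian_hdr_volume(sigma=2.0, alpha=0.1)
print(narrow, wide)  # the wider distribution has the larger HDR volume
```

Because the mean never enters the computation, the measure is invariant to location shifts, and because the volume scales with sigma, it grows monotonically as the distribution spreads.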

2606.19603 2026-06-19 cs.LG 新提交

Comparing Linear Probes with Mahalanobis Cosine Similarity

比较线性探针与马氏余弦相似度

Zhuofan Josh Ying, Peter Hase, Nikolaus Kriegeskorte

发表机构 * Columbia University（哥伦比亚大学）； Stanford University（斯坦福大学）； Schmidt Sciences（施密特科学）

AI总结研究证明马氏余弦相似度与OOD AUROC存在线性关系，提供理论解释并验证其作为线性探针比较指标的有效性。

Comments 16 pages, 10 figures

详情

AI中文摘要

线性探针广泛用于可解释性研究，并常通过余弦相似度进行比较。两个方向之间的马氏余弦相似度（MCS）通过测试数据协方差重新加权内积，是一种自然的任务感知改进。Ying等人（2026）报告称，探针与在分布外（OOD）数据上训练的参考探针的MCS近乎完美地线性预测了该探针的OOD AUROC（R^2 = 0.98）。在这里，我们将这一实证发现扩展到不同模型、层和概念领域，并以封闭形式证明了这一普遍现象：对于投影为高斯分布的平衡类别，OOD AUROC与参考探针的MCS是线性的，因为两者都是探针在测试数据上信噪比（SNR）的S形函数。该理论还预测了这种线性何时失效，我们通过实验验证了这一点。MCS为比较线性探针提供了有理论依据且经验有效的替代方案，优于欧几里得余弦相似度。

英文摘要

Linear probes are widely used in interpretability research and often compared by cosine similarity. The Mahalanobis cosine similarity (MCS) between two directions, which reweights the inner product by test data covariance, is a natural task-aware refinement. Ying et al. (2026) report that a probe's MCS to a reference probe trained on the out-of-distribution (OOD) data near-perfectly linearly predicts the probe's OOD AUROC (R^2 = 0.98). Here, we extend this empirical finding across models, layers, and concept domains, and prove this general phenomenon in closed form: For balanced classes whose projections are Gaussian, OOD AUROC and MCS to the reference probe are linear because both are sigmoid-shaped functions of the probe's signal-to-noise ratio (SNR) on the test data. The theory also predicts when this linearity fails, which we verify empirically. MCS offers a theoretically grounded and empirically effective alternative to Euclidean cosine similarity for comparing linear probes.
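Editor's sketch (not from the paper): the abstract says MCS reweights the inner product by the test data covariance. A minimal reading of that sentence, assuming the usual form u^T Σ v / sqrt(u^T Σ u · v^T Σ v), is shown below. The covariance matrix and probe directions are made-up illustration values.

```python
import math

def mahalanobis_cosine(u, v, cov):
    """Cosine similarity of u and v under the inner product <a, b> = a^T cov b."""
    def inner(a, b):
        return sum(a[i] * cov[i][j] * b[j]
                   for i in range(len(a)) for j in range(len(b)))
    return inner(u, v) / math.sqrt(inner(u, u) * inner(v, v))

def euclidean_cosine(u, v):
    """Ordinary cosine similarity, i.e. the special case cov = identity."""
    n = len(u)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return mahalanobis_cosine(u, v, identity)

# Two probe directions that are orthogonal in Euclidean terms...
u = [1.0, 0.0]
v = [0.0, 1.0]
# ...evaluated on hypothetical test data whose two features are strongly correlated.
cov = [[1.0, 0.9], [0.9, 1.0]]

print(euclidean_cosine(u, v))         # 0.0
print(mahalanobis_cosine(u, v, cov))  # 0.9
```

The example shows why the two measures can disagree: probes that read different coordinates look unrelated under Euclidean cosine similarity, yet behave almost identically on data where those coordinates move together.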

2606.19818 2026-06-19 cs.LG cs.AI 新提交

Uncertainty-Aware Reward Modeling for Stable RLHF

不确定性感知的奖励建模用于稳定的RLHF

Licheng Pan, Haocheng Yang, Haoxuan Li, Yichen Sun, Yunsheng Lu, Shijian Wang, Lei Shen, Yuan Lu, Zhixuan Chu, Hao Wang

发表机构 * Zhejiang University（浙江大学）； Peking University（北京大学）； National University of Singapore（新加坡国立大学）

AI总结提出不确定性感知奖励建模（UARM），通过分位数保形预测校准不确定性并利用异方差方差分解重加权GRPO优势，以缓解奖励黑客问题，提升对齐质量。

详情

AI中文摘要

从人类反馈中强化学习（RLHF）通过在偏好数据上训练奖励模型并优化策略以最大化预测奖励来对齐大型语言模型。然而，该流程面临两个基本挑战：（1）奖励模型无法在预测不可靠时发出信号，因为它们通常充当确定性点估计器；（2）现代基于组的策略优化可能放大不可靠的奖励信号，例如GRPO在优势计算中对奖励的统一处理。随着策略探索越来越多样化的响应，这两个限制造成了一个关键漏洞：不可靠的奖励估计可能被赋予不成比例的影响力，引发严重的奖励黑客问题。我们提出不确定性感知奖励建模（UARM），通过基于分位数的保形预测为奖励模型配备校准的不确定性，并通过异方差方差分解重加权GRPO优势。在HelpSteer、UltraFeedback和PKU-SafeRLHF上的实验表明，与标准GRPO和不确定性无关的基线相比，UARM显著改善了奖励模型校准，减少了奖励黑客问题，并增强了下游对齐质量。

英文摘要

Reinforcement learning from human feedback (RLHF) aligns large language models by training reward models on preference data and optimizing policies to maximize predicted rewards. However, this pipeline faces two fundamental challenges: (1) reward models cannot signal when their predictions are unreliable, since they usually act as deterministic point estimators; and (2) modern group-based policy optimization can amplify unreliable reward signals, as exemplified by GRPO's uniform treatment of rewards during advantage computation. As policies explore increasingly diverse responses, these two limitations create a critical vulnerability: unreliable reward estimates may be granted disproportionate influence, triggering severe reward hacking. We propose Uncertainty-Aware Reward Modeling (UARM), which equips reward models with calibrated uncertainty via quantile-based conformal prediction and reweights GRPO advantages through heteroscedastic variance decomposition. Experiments across HelpSteer, UltraFeedback, and PKU-SafeRLHF demonstrate that UARM significantly improves reward model calibration, reduces reward hacking, and enhances downstream alignment quality compared to standard GRPO and uncertainty-agnostic baselines.

2606.20415 2026-06-19 cs.LG 新提交

Pseudo-Feature Padding: A Lightweight Defense Against False Data Injection in Power Grids

伪特征填充：一种针对电网虚假数据注入的轻量级防御方法

Farhin Farhad Riya, Shahinul Hoque, Yingyuan Yang, Jinyuan Sun, Kevin Tomsovic

发表机构 * University of Tennessee（田纳西大学）； The University of Illinois at Springfield（伊利诺伊大学斯普林菲尔德分校）； Clemson University（克莱姆森大学）

AI总结提出一种轻量级防御框架，通过基于输入统计分布的伪特征填充增加输入维度，使对抗攻击因扰动不可转移和填充结构不可预测而计算不可行，显著提升深度神经网络在电网状态估计中的鲁棒性。

详情

AI中文摘要

深度神经网络（DNN）在各种任务中取得了显著的准确性，包括在信息物理系统（CPS）中用于检测关键操作期间的虚假数据注入攻击（FDIA）。然而，CPS的独特基础设施使得DNN容易受到攻击者的利用，以逃避检测。此外，CPS的独特性质对传统的FDIA防御机制提出了挑战。本文提出了一种创新的防御框架，通过引入一个额外的输入层，该层使用从输入统计分布中导出的伪特征值对输入样本进行填充，从而增强DNN抵御此类攻击的能力。这种填充以随机化和数据感知的方式增加了输入维度，使得由于精心设计的扰动的不可转移性和填充结构的不可预测性，对抗攻击在计算上变得不可行。我们的方法轻量级、与模型无关，并且不需要对核心架构进行修改，使其在现实世界的CPS环境中高度可部署。我们在关键电网应用（如使用IEEE 14节点、30节点、118节点和300节点系统的状态估计）上评估了我们的框架。对抗性设置下的实验表明，我们的填充策略显著提高了模型的鲁棒性，对性能的影响可以忽略不计，并有效缓解了原本会绕过传统防御的攻击。

英文摘要

Deep Neural Networks (DNNs) have achieved remarkable accuracy in various tasks, including their application in Cyber-Physical Systems (CPS) for detecting False Data Injection Attacks (FDIA) during critical operations. However, the unique infrastructure of CPS makes DNNs vulnerable to exploitation by attackers aiming to evade detection. Additionally, the distinct nature of CPS presents challenges for conventional defense mechanisms against FDIA. This paper proposes an innovative defense framework that strengthens DNNs against such attacks by introducing an additional input layer that performs padding in the input samples using pseudo-feature values derived from the input's statistical distribution. This padding increases the input dimensionality in a randomized and data-aware manner, making adversarial attacks computationally infeasible due to the non-transferable nature of crafted perturbations and the unpredictability of the padded structure. Our method is lightweight, model-agnostic, and requires no modifications to the core architecture, making it highly deployable in real-world CPS settings. We evaluated our framework on critical power grid applications such as state estimation using the IEEE 14-bus, 30-bus, 118-bus, and 300-bus systems. Experiments under adversarial settings demonstrate that our padding strategy significantly improves model robustness with negligible impact on performance and effectively mitigates attacks that would otherwise bypass conventional defenses.
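Editor's sketch (not from the paper): one possible reading of randomized, data-aware padding. The abstract does not specify how pseudo-feature values or positions are chosen, so everything here is an assumption for illustration: the helper name `make_padder`, drawing pad values from a Gaussian fitted to the pooled training inputs, and fixing the pad positions with a secret seed.

```python
import random
import statistics

def make_padder(train_rows, n_pad, seed):
    """Return (pad, pad_positions) for inserting n_pad pseudo-features per sample.

    Assumptions: pad values are drawn from N(mu, sigma^2) fitted to all training
    inputs, and the pad positions are chosen once from a secret seed.
    """
    rng = random.Random(seed)
    values = [x for row in train_rows for x in row]
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    width = len(train_rows[0]) + n_pad
    pad_positions = sorted(rng.sample(range(width), n_pad))
    pad_set = set(pad_positions)

    def pad(row):
        real = iter(row)  # real measurements keep their relative order
        return [rng.gauss(mu, sigma) if i in pad_set else next(real)
                for i in range(width)]

    return pad, pad_positions

train = [[1.0, 2.0, 3.0], [2.0, 3.0, 4.0]]
pad, positions = make_padder(train, n_pad=2, seed=0)
padded = pad([10.0, 20.0, 30.0])
print(positions, padded)  # 5 values: 3 real measurements plus 2 pseudo-features
```

The detector would then be trained on padded inputs. An attacker who crafts a perturbation for the unpadded layout does not know which coordinates are real, which is the intuition behind the non-transferability claim in the abstract.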

2606.20557 2026-06-19 cs.LG math.ST stat.ML stat.TH 新提交

Optimal Deterministic Multicalibration and Omniprediction

最优确定性多校准与全预测

Georgy Noarov, Aaron Roth

发表机构 * University of Pennsylvania（宾夕法尼亚大学）

AI总结本文提出一种确定性算法，实现多校准的极小化最优样本复杂度，并推广到结果不可区分性，解决确定性预测器是否必要的问题。

详情

AI中文摘要

一个模型在一组群体权重 $G$ 上是多校准的，如果它是校准的——即即使以其预测为条件也是无偏的——不仅整体上，而且在通过每个 $g \in G$ 对上下文重新加权后也是如此。这对于许多下游应用是一个有用的性质，也是可信机器学习的基本要求。在这项工作之前，所有已知达到 $\varepsilon$-多校准的极小化最优 $\widetilde O(\varepsilon^{-3})$ 样本复杂度的预测器都是随机化的，而确定性预测器仅以更差的样本复杂度已知。多校准中随机化对于最优样本复杂度是否必要的问题由 [CLNR26] 明确提出，并在之前的几项工作中隐含提出。我们通过给出一个输出确定性预测器的极小化最优多校准算法解决了这个开放问题。然后我们将该算法推广到产生满足关于有限或有限覆盖测试集合的结果不可区分性（OI）的最优确定性预测器。作为一个应用，这也给出了具有最优样本复杂度的确定性全预测器和泛预测器，解决了 [OKK25] 和 [BHHLZ25] 提出的开放问题。

英文摘要

A model is multicalibrated on a collection of group weights $G$ if it is calibrated -- i.e. unbiased even conditional on its prediction -- not just overall, but also after reweighting contexts by each $g \in G$. It is a useful property for many downstream applications and is a basic desideratum of trustworthy machine learning. Before this work, all predictors known to attain the minimax-optimal $\widetilde O(\varepsilon^{-3})$ sample complexity rate for $\varepsilon$-multicalibration were randomized, while deterministic predictors were known only with substantially worse sample complexity. Whether randomization is necessary for optimal sample complexity in multicalibration was explicitly asked by [CLNR26] and implicitly in several prior works. We resolve this open problem by giving a minimax-optimal multicalibration algorithm that outputs a deterministic predictor. We then generalize the algorithm to produce optimal deterministic predictors that satisfy outcome indistinguishability (OI) with respect to finite or finitely covered collections of tests. As an application, this also gives deterministic omnipredictors and panpredictors with optimal sample complexity, resolving open problems posed by [OKK25] and [BHHLZ25].

2606.19353 2026-06-19 cs.CL cs.LG 交叉投稿

Quantifying Aleatoric Uncertainty of In-Context Learning for Robust Measure of LLM Prediction Confidence

量化上下文学习中的偶然不确定性以稳健衡量LLM预测置信度

Jinseok Chung, Minkyoung Song, Hyunji Jung, Namhoon Lee

发表机构 * POSTECH（浦项科技大学）

AI总结针对上下文学习（ICL）中预测对提示设计敏感的问题，提出基于贝叶斯观点和机制可解释性的自函数向量，直接估计偶然不确定性，并设计严格评估协议，在合成和真实数据集上验证了方法的可靠性及在幻觉检测等应用中的实用性。

Comments Accepted to ACL 2026

详情

AI中文摘要

上下文学习（ICL）使LLM能够从少量示例中适应新任务，但其可靠性仍存疑虑：预测对提示设计和模型理解上下文的能力高度敏感，使得失败源于数据特性还是模型限制难以区分。不确定性分解——将偶然不确定性从认知不确定性中分离——在此场景中尤为关键，然而现有方法针对标准生成任务设计，未能捕捉ICL的独特动态。为解决此问题，我们引入基于贝叶斯观点和ICL机制可解释性的自函数向量概念。这些向量利用模型内部表示来建模上下文提示中学习的潜在概念，从而在贝叶斯框架内直接估计偶然不确定性，并规避了对脆弱的输入或解码操作的依赖。鉴于缺乏既定基准和合适的评估协议，我们还提出了首个严格的评估协议，其中数据以受控方式被操纵，以便精确量化偶然不确定性并将其与认知不确定性分离。借助这一新的评估框架（最初基于合成任务进行概念开发，随后扩展到真实世界数据集），我们展示了所提出的方法比现有替代方法更可靠地衡量LLM在ICL下做出的预测的不确定性。此外，我们展示了它可作为可信相关应用（如幻觉检测）的实用工具。我们的发现为将不确定性的量化观点与模型行为的机制理解联系起来开辟了新方向。

英文摘要

In-Context Learning (ICL) allows LLMs to adapt to new tasks from a few demonstrations, but its reliability remains a concern: predictions are highly sensitive to both prompt design and the model's ability to understand the context, obscuring whether failures arise from data properties or model limitations. Uncertainty decomposition, which separates aleatoric from epistemic sources, is particularly crucial in this setting, yet existing methods, designed for standard generation tasks, fail to capture the unique dynamics of ICL. To address this, we introduce the concept of self-function vectors, built upon Bayesian views and the mechanistic interpretability of ICL. These vectors leverage internal model representations to model the latent concept learned during in-context prompting, thereby enabling a direct estimation of aleatoric uncertainty within a Bayesian framework and circumventing the reliance on brittle input or decoding manipulations. Given the lack of established benchmarks and suitable evaluation protocols, we also propose the first rigorous evaluation protocol, in which data is manipulated in controlled ways so as to quantify aleatoric uncertainty precisely and separately from epistemic uncertainty. With this new evaluation framework, initially grounded in synthetic tasks for conceptual development and subsequently extended to real-world datasets, we show that our proposed methodology can measure uncertainty of LLM predictions made under ICL more reliably than existing alternative methods. Moreover, we show it can be used as a practical tool for trustworthiness-related applications, such as hallucination detection. Our findings pave a new direction for connecting the quantitative view of uncertainty with the mechanistic understanding of model behavior.

2606.19998 2026-06-19 cs.RO cs.AI cs.CV cs.LG 交叉投稿

Tri-Info: Generalizable, Interpretable Failure Prediction for VLA Models via Information Theory

Tri-Info: 基于信息论的VLA模型可泛化、可解释的故障预测

Jinghan Yang, Yunchao Zhang, Wang Yuan, Haolun Wan, Jiaming Zhang, Zhengyang Hu, Yanchao Yang

发表机构 * InfoBodied AI Lab, The University of Hong Kong（香港大学信息具身人工智能实验室）； HKU Musketeers Foundation Institute of Data Science（香港大学同心基金数据科学研究院）

AI总结提出Tri-Info方法，通过信息论信号捕捉动作多样性、时间一致性和状态耦合，实现跨架构、环境及仿真到现实的零样本故障检测，准确率达83%。

详情

AI中文摘要

视觉-语言-动作（VLA）模型越来越多地部署在各种任务中，但它们仍然是黑箱，其物理交互可能导致不可逆的伤害，因此需要可泛化和可解释的故障检测。我们观察到成功和失败的轨迹具有系统不同的信息论特征。基于此，我们将VLA控制形式化为闭环信息管道，并推导出三重信息论（Tri-Info）信号，这些信号捕捉动作是否保持多样性、时间一致性以及与状态转换的耦合。在六个VLA模型和三个基准环境中，Tri-Info在域内匹配最强的基线。此外，Tri-Info无需重新训练即可跨架构、环境和仿真到现实差距迁移，在现实世界任务中达到83%的准确率，而先前的检测器则降至随机水平。这确立了Tri-Info作为一种简单而强大的方法，不仅能够检测故障并具有强大的跨域泛化能力，还能提供底层故障模式的可解释诊断。

英文摘要

Vision-Language-Action (VLA) models are increasingly deployed across diverse tasks, yet they remain black boxes whose physical interactions can cause irreversible harm, making generalizable and interpretable failure detection essential. We observe that successful and failed rollouts carry systematically different information-theoretic signatures. Building on this, we formalize VLA control as a closed-loop information pipeline and derive the Triple Information-theoretic (Tri-Info) signals that capture whether actions remain diverse, temporally consistent, and coupled to state transitions. Across six VLA models and three benchmark environments, Tri-Info matches the strongest baselines in-domain. Moreover, Tri-Info transfers across architectures, environments, and the sim-to-real gap without retraining, reaching 83% accuracy on real-world tasks where prior detectors collapse to chance. This establishes Tri-Info as a simple yet powerful method that not only detects failures with strong cross-domain generalization, but also delivers interpretable diagnostics of the underlying failure modes.

2606.20508 2026-06-19 cs.AI cs.LG 交叉投稿

What Do Safety-Aligned LLMs Learn From Mixed Compliance Demonstrations?

安全对齐的LLM从混合顺从演示中学到了什么？

Sihui Dai, Mann Patel

AI总结研究通过混合良性顺从演示和有害顺从演示，探究演示组成如何驱动有害顺从，发现演示内容、顺序和训练方法影响模型提取的信息。

详情

AI中文摘要

先前工作表明，上下文演示可以越狱语言模型，但模型如何解释不同类型的顺从演示仍不清楚。我们通过混合良性顺从演示（无害请求，有帮助响应）与有害顺从演示（有害请求，有帮助响应）并测试关于演示组成如何驱动有害顺从的三个假设来研究这一点。在四个模型中，我们发现良性和有害演示不可互换：良性演示根据模型不同可以减少或增加有害顺从。我们进一步表明，偏好优化是防止良性演示增加有害顺从的关键训练阶段，演示顺序表现出强烈的近因偏差，并且模型在拒绝与上下文学习的交互方式上有所不同：一些模型在拒绝时也采用演示的格式，而其他模型在拒绝时覆盖所有上下文信号。综合来看，这项工作超越了展示基于演示的越狱有效，而是描述了其工作原理：模型从顺从演示中提取的内容取决于演示内容、顺序和训练方法。

英文摘要

Prior work has shown that in-context demonstrations can jailbreak language models, but it remains unclear how models interpret different types of compliance demonstrations. We study this by mixing benign compliance demonstrations (non-harmful request, helpful response) with harmful compliance demonstrations (harmful request, helpful response) and testing three hypotheses about how demonstration composition drives harmful compliance. Across four models, we find that benign and harmful demonstrations are not interchangeable: benign demonstrations can either reduce or increase harmful compliance depending on the model. We further show that preference optimization is the critical training stage that prevents benign demonstrations from increasing harmful compliance, that demonstration ordering exhibits strong recency bias, and that models differ in how refusal interacts with in-context learning: some adopt demonstrated formatting even when refusing, while others override all in-context signals upon refusal. Taken together, this work moves beyond showing that demonstration-based jailbreaking works to characterizing how it works: what models extract from compliance demonstrations depends on demonstration content, ordering, and training methodology.

2606.20544 2026-06-19 cs.AI cs.LG 交叉投稿

Toward Calibrated Mixture-of-Experts Under Distribution Shift

面向分布偏移下的校准混合专家模型

Gina Wong, Drew Prinster, Suchi Saria, Rama Chellappa, Anqi Liu

发表机构 * Johns Hopkins University（约翰霍普金斯大学）

AI总结研究混合专家模型在分布偏移下的校准问题，提出对抗性重加权方法以改善路由聚合的校准误差，提升准确率-校准权衡。

Journal ref ICML 2026

详情

AI中文摘要

校准将模型的预测不确定性与其经验结果的频率对齐，对于理解和信任报告的概率很重要。最近的研究表明，在单个预测器级别强制执行校准可以提高集成准确性和校准，特别是混合专家（MoE）模型显示出强烈的经验改进；然而，校准有助于MoE的条件尚不清楚。在这项工作中，我们研究了MoE模型在分布偏移下的行为，重点关注路由机制如何与专家级校准相互作用。我们表明，在硬路由模型中，专家校准足以确保整体模型在一大类分布偏移下的校准，但不足以校准软路由模型。为了解决这个问题，我们提出了一种对抗性重加权方法，惩罚分布偏移下路由聚合的校准误差，并证明它在平均情况下以及在数据的困难子集上，跨模型类别、预测任务和分布偏移，改善了准确率-校准权衡。

英文摘要

Calibration aligns a model's predictive uncertainty with the frequencies of its empirical outcomes and is important for understanding and trusting reported probabilities. Recent work shows that enforcing calibration at the level of individual predictors can improve ensemble accuracy and calibration, with mixture-of-experts (MoE) models showing strong empirical improvements in particular; however, the conditions under which calibration helps MoE are not well understood. In this work, we study how MoE models behave under distribution shift, focusing on how routing mechanisms interact with expert-level calibration. We show that expert calibration is sufficient to ensure calibration of the overall model under a broad class of distribution shifts in hard-routed models, but is insufficient for calibrating soft-routed models. To address this, we propose an adversarial reweighting that penalizes calibration errors of the routed aggregate under distribution shift, and we demonstrate that it improves the accuracy-calibration tradeoff both on average and on difficult subsets of the data, across model classes, prediction tasks, and distribution shifts.
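Editor's sketch (not from the paper): the abstract defines calibration as agreement between predicted probabilities and empirical outcome frequencies. A common way to measure the gap is the binned expected calibration error (ECE) for binary outcomes. Whether the paper reports this exact metric is an assumption; the function below is the standard equal-width-bin version.

```python
def expected_calibration_error(probs, labels, n_bins=10):
    """Binned ECE for binary outcomes.

    probs  : predicted probabilities of the positive class, each in [0, 1]
    labels : observed outcomes, each 0 or 1
    """
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, labels):
        idx = min(int(p * n_bins), n_bins - 1)  # p == 1.0 falls in the last bin
        bins[idx].append((p, y))

    n = len(probs)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(p for p, _ in members) / len(members)
        frequency = sum(y for _, y in members) / len(members)
        ece += len(members) / n * abs(avg_conf - frequency)
    return ece

# Predicts 0.8 and is right 4 times out of 5: well calibrated, ECE near 0.
print(expected_calibration_error([0.8] * 5, [1, 1, 1, 1, 0]))
# Predicts 0.9 but is right only half the time: overconfident, ECE near 0.4.
print(expected_calibration_error([0.9] * 4, [1, 0, 1, 0]))
```

For a soft-routed mixture, the same function could be applied to the router-weighted average of expert probabilities, which is where the abstract says expert-level calibration alone is insufficient.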

2604.06464 2026-06-19 cs.LG physics.app-ph stat.ML 版本更新

Weighted Bayesian Conformal Prediction

加权贝叶斯共形预测

Xiayin Lou, Peng Luo

发表机构 * Technical University of Munich（慕尼黑技术大学）； Massachusetts Institute of Technology（麻省理工学院）

AI总结提出加权贝叶斯共形预测（WBCP），通过加权Dirichlet先验推广贝叶斯共形预测到重要性加权设置，理论证明有效样本量决定后验方差，并提供更丰富的条件覆盖不确定性。

详情

AI中文摘要

共形预测提供具有有限样本覆盖保证的分布自由预测区间，Snell & Griffiths 最近的工作将其重新解释为贝叶斯求积（BQ-CP），通过阈值上的 Dirichlet 后验产生强大的数据条件保证。然而，BQ-CP 根本上要求 i.i.d. 假设。同时，加权共形预测通过重要性权重处理分布偏移，但仍然是频率学派方法，仅产生点估计阈值。我们提出加权贝叶斯共形预测（WBCP），它将 BQ-CP 推广到任意重要性加权设置，用加权 Dirichlet $\mathrm{Dir}(n_{\mathrm{eff}} \cdot \tilde{w}_1, \ldots, n_{\mathrm{eff}} \cdot \tilde{w}_n)$ 替换均匀 Dirichlet $\mathrm{Dir}(1,\ldots,1)$，其中 $n_{\mathrm{eff}}$ 是 Kish 有效样本量。我们证明了四个理论结果：(1) $n_{\mathrm{eff}}$ 是匹配频率学派和贝叶斯方差的唯一集中参数；(2) 后验标准差以 $O(1/\sqrt{n_{\mathrm{eff}}})$ 衰减；(3) BQ-CP 的随机占优保证扩展到每个权重轮廓的数据条件保证；(4) HPD 阈值在条件覆盖上提供 $O(1/\sqrt{n_{\mathrm{eff}}})$ 的改进。我们将 WBCP 实例化为地理贝叶斯共形预测，其中基于核的空间权重产生每个位置的后验，并具有可解释的诊断。在合成和真实空间数据集上的实验表明，WBCP 在保持覆盖保证的同时提供了更丰富的不确定性信息。

英文摘要

Conformal prediction provides distribution-free prediction intervals with finite-sample coverage guarantees, and recent work by Snell & Griffiths reframes it as Bayesian Quadrature (BQ-CP), yielding powerful data-conditional guarantees via Dirichlet posteriors over thresholds. However, BQ-CP fundamentally requires the i.i.d. assumption. Meanwhile, weighted conformal prediction handles distribution shift via importance weights but remains frequentist, producing only point-estimate thresholds. We propose Weighted Bayesian Conformal Prediction (WBCP), which generalizes BQ-CP to arbitrary importance-weighted settings by replacing the uniform Dirichlet $\mathrm{Dir}(1,\ldots,1)$ with a weighted Dirichlet $\mathrm{Dir}(n_{\mathrm{eff}} \cdot \tilde{w}_1, \ldots, n_{\mathrm{eff}} \cdot \tilde{w}_n)$, where $n_{\mathrm{eff}}$ is Kish's effective sample size. We prove four theoretical results: (1) $n_{\mathrm{eff}}$ is the unique concentration parameter matching frequentist and Bayesian variances; (2) posterior standard deviation decays as $O(1/\sqrt{n_{\mathrm{eff}}})$; (3) BQ-CP's stochastic dominance guarantee extends to per-weight-profile data-conditional guarantees; (4) the HPD threshold provides $O(1/\sqrt{n_{\mathrm{eff}}})$ improvement in conditional coverage. We instantiate WBCP for spatial prediction as Geographical BQ-CP, where kernel-based spatial weights yield per-location posteriors with interpretable diagnostics. Experiments on synthetic and real-world spatial datasets demonstrate that WBCP maintains coverage guarantees while providing substantially richer uncertainty information.
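Editor's sketch (not from the paper): the weighted Dirichlet parameters described in the abstract can be computed directly from a set of importance weights. Kish's effective sample size is the standard (Σw)² / Σw². Treating the tilde weights as the importance weights normalized to sum to one is an assumption, chosen because it makes uniform weights recover the uniform Dirichlet with all parameters equal to 1, as the abstract implies.

```python
def kish_effective_sample_size(weights):
    """Kish's effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

def weighted_dirichlet_params(weights):
    """Concentration parameters n_eff * w_i / sum(w) for each calibration point.

    Assumption: the normalized weights sum to one.
    """
    n_eff = kish_effective_sample_size(weights)
    total = sum(weights)
    return [n_eff * w / total for w in weights]

uniform = [1.0, 1.0, 1.0, 1.0]
print(kish_effective_sample_size(uniform))  # 4.0
print(weighted_dirichlet_params(uniform))   # [1.0, 1.0, 1.0, 1.0]

skewed = [3.0, 1.0, 1.0, 1.0]
print(kish_effective_sample_size(skewed))   # 3.0
print(weighted_dirichlet_params(skewed))    # [1.5, 0.5, 0.5, 0.5]
```

With skewed weights the parameters sum to the effective sample size (3.0 here) rather than the nominal count of 4, which matches the abstract's statement that the effective sample size governs how concentrated the posterior is.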

2605.30089 2026-06-19 cs.LG 版本更新

Distributionally Robust Set Representation Learning Under Inference-Time Element Corruption

推理时元素损坏下的分布鲁棒集合表示学习

Yankai Chen, Hanrong Zhang, Bowei He, Philip S. Yu, Xue Liu

发表机构 * McGill University（麦吉尔大学）； University of Illinois Chicago（伊利诺伊大学芝加哥分校）

AI总结针对推理时元素损坏问题，提出SW-DRSO分布鲁棒优化框架，通过重心对抗近似最坏情况损失，在四个任务上验证了鲁棒性和性能。

Comments Accepted by ICML'26

2606.08892 2026-06-19 cs.LG 版本更新

一个探针无法捕捉所有：迈向有针对性的欺骗检测

Vikram Natarajan, Devina Jain, Shivam Arora, Satvik Golechha, Joseph Bloom

发表机构 * LASR Labs（LASR实验室）； UK AI Security Institute（英国人工智能安全研究所）

AI总结针对线性探针在欺骗检测中的异质性，提出根据具体欺骗类型匹配探针可显著提升性能（AUC提升0.108），建议组织定义威胁模型并部署相应探针。

详情

AI中文摘要

线性探针是一种有前景的监测AI系统欺骗行为的方法。先前工作表明，在对比指令对和简单数据集上训练的线性分类器可以达到良好性能。然而，这些探针即使在简单场景中也表现出显著失败，包括虚假相关性和对非欺骗响应的误报。在本文中，我们证明欺骗检测本质上是异质的：虽然单个通用探针实现了适度的改进（+0.032 AUC），但事后最优分析显示，当探针与特定欺骗类型匹配时，潜力显著更高（+0.108 AUC），并且合成验证实验表明，当欺骗类型事先已知时，这一上限是先验可实现的。我们的发现表明，指令对捕捉的是欺骗意图而非内容特定模式，这解释了为什么提示选择主导探针性能（占70.6%的方差）。鉴于这种异质性，我们得出结论，组织应定义其特定威胁模型并部署适当匹配的探针，而不是寻求通用的欺骗检测器。

英文摘要

Linear probes are a promising approach for monitoring AI systems for deceptive behaviour. Previous work has shown that a linear classifier trained on a contrastive instruction pair and a simple dataset can achieve good performance. However, these probes exhibit notable failures even in straightforward scenarios, including spurious correlations and false positives on non-deceptive responses. In this paper, we demonstrate that deception detection is inherently heterogeneous: while a single universal probe achieves modest improvements (+0.032 AUC), post-hoc oracle analysis reveals substantially higher potential (+0.108 AUC) when probes are matched to specific deception types, and synthetic validation experiments suggest this ceiling is achievable a priori when the deception type is known in advance. Our findings reveal that instruction pairs capture deceptive intent rather than content-specific patterns, explaining why prompt choice dominates probe performance (70.6% of variance). Given this heterogeneity, we conclude that organizations should define their specific threat models and deploy appropriately matched probes rather than seeking a universal deception detector.

2603.19423 2026-06-19 cs.CR cs.AI cs.LG 版本更新

利用邻近图增强图神经网络用于沙尘源排放预测

Maryam Sanisales, Zahed Rahmati, Ali Darvishi Boloorani, Ali Vefghi

发表机构 * Amirkabir University of Technology（阿米尔卡比尔理工大学）； University of Tehran（德黑兰大学）

AI总结提出使用Delaunay三角剖分等邻近图作为图神经网络输入，通过消息传递捕捉沙尘源排放的时空动态，相比随机图和LSTM模型显著提升预测精度。

详情

AI中文摘要

准确预测沙尘源排放对于减轻沙尘暴带来的重大环境和健康危害至关重要。传统预测方法通常难以捕捉这些现象的复杂时空动态。在本文中，我们证明邻近图使图神经网络（GNN）能够有效建模数据点之间复杂的空间和时间关系。具体来说，我们使用邻近图——如Delaunay三角剖分、Gabriel图、k-最近邻图和Yao图——作为GNN（包括GraphSAGE、图卷积网络和图注意力网络）的输入来执行消息传递。我们的方法强调了将邻近图与GNN集成用于稳健准确的沙尘源预测的有效性。为了强调邻近图表示的重要性，我们将我们的方法与使用随机图进行消息传递的GNN进行了比较。结果表明，使用邻近图的GNN显著优于使用随机图的GNN，并且在沙尘源排放预测中也远优于长短期记忆（LSTM）模型。

英文摘要

Accurate prediction of dust source emissions is critical for mitigating the significant environmental and health hazards posed by dust storms. Traditional forecasting methods often struggle to capture the complex spatiotemporal dynamics of these phenomena. In this paper, we demonstrate that proximity graphs enable Graph Neural Networks (GNNs) to effectively model the intricate spatial and temporal relationships between data points. Specifically, we use proximity graphs--such as Delaunay triangulation, Gabriel graph, k-Nearest Neighbor graph, and Yao graph--as the input for GNNs (including GraphSAGE, Graph Convolutional Networks, and Graph Attention Networks) to perform message passing. Our approach highlights the effectiveness of integrating proximity graphs with GNNs for robust and accurate dust source forecasting. To emphasize the importance of proximity graph representations, we compare our method against GNNs using random graphs for message passing. The results show that GNNs with proximity graphs significantly outperform those with random graphs and are also far superior to Long Short-Term Memory (LSTM) models in dust source emission forecasting.

2606.19956 2026-06-19 cs.LG 新提交

Towards Graph-Based Deep Learning for Map Generalization: Insights from Building Footprints Simplification and Aggregation

基于图深度学习的制图综合：来自建筑足迹简化和聚合的见解

Yanning Wang, Zhiyong Zhou, Zhouyu Liu, Mengni Yu, Yu Feng

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)（香港科技大学（广州））； Zhejiang University（浙江大学）； Mainz University of Applied Sciences（美因茨应用科学大学）

AI总结本研究首次探索将图深度学习应用于建筑足迹简化（节点移动预测）和聚合（链接预测），评估了GCN、GAT和GraphSAGE等架构，发现GraphSAGE在链接预测上表现较好，但节点移动预测仍具挑战，且聚合比简化更复杂。

Comments 15 pages, 20 figures, 10 tables

详情

AI中文摘要

制图综合仍然是制图学的基本任务之一，特别是对于复杂建筑足迹的简化和聚合。本研究首次探索将基于图的深度学习应用于这两项任务，在统一的图学习框架中将简化重新表述为节点移动预测，将聚合重新表述为链接预测。我们在多尺度建筑数据集上评估了代表性的图神经网络架构（GCN、GAT和GraphSAGE），结果表明GraphSAGE在链接预测准确性方面表现出相对优势，同时也揭示了精确节点移动预测中持续存在的挑战。除了定量性能外，结果还强调聚合比简化带来更大的复杂性和挑战，突显了当前深度学习方法在制图综合中捕捉更高层次空间关系的困难。尽管存在数据不平衡和需要后处理等局限性，该研究为利用深度学习方法推进自动化制图综合提供了宝贵的见解和方法方向。

英文摘要

Map generalization remains one of the fundamental tasks in cartography, especially for the simplification and aggregation of complex building footprints. This study presents the first exploratory application of graph-based deep learning to both tasks, reformulating simplification as node movement prediction and aggregation as link prediction within a unified graph learning framework. We evaluate representative graph neural network architectures (GCN, GAT, and GraphSAGE) on multi-scale building datasets, showing that GraphSAGE demonstrates relative strengths in link prediction accuracy, while also revealing persistent challenges in precise node movement prediction. Beyond quantitative performance, the results highlight that aggregation poses greater complexity and challenges than simplification, underscoring the difficulty of capturing higher-level spatial relationships in map generalization with current deep learning approaches. Although limitations such as data imbalance and the need for post-processing remain, the study provides valuable insights and methodological directions for advancing automated map generalization with deep learning approaches.

2606.20283 2026-06-19 cs.LG cs.AI 新提交

Boundary Embedding Shaping with Adaptive Contrastive Learning for Graph Structural Disentanglement

基于自适应对比学习的边界嵌入塑造用于图结构解缠

Jiaqing Chen, Zidu Yin, Yichao Cai, Yuhang Liu, Zhen Zhang, Dong Gong, Javen Qinfeng Shi

发表机构 * Yunnan Normal University（云南师范大学）； Adelaide University（阿德莱德大学）； The University of New South Wales（新南威尔士大学）

AI总结针对图结构纠缠导致的分类性能下降，提出边界嵌入塑造模块，通过自适应对比学习选择性抑制决策边界处的虚假结构噪声，提升节点分类和链接预测精度。

Comments Accepted at ICML 2026

AI中文摘要

图神经网络（GNN）在聚合邻居信息进行分类方面表现出色，但其性能受到图结构纠缠的阻碍，来自语义无关邻居的虚假相关污染了节点嵌入。这种挑战在嵌入空间中靠近类边界的节点最为严重，放大的结构噪声模糊了决策边界并破坏了预测的稳定性。现有的鲁棒GNN方法大多统一处理所有节点，忽略了边界脆弱性。本文中，为了提高分类性能，我们通过将边界区域纠缠识别为主要瓶颈来解决图结构解缠问题，并提出边界嵌入塑造（BES），一种自适应对比学习GNN插件模块，以最小的模型参数扰动选择性地抑制决策边界处的虚假结构噪声。大量实验表明，BES持续改善边界判别性，并优于现有领先方法。值得注意的是，BES在节点分类中平均提升GCN性能3.3%（在WikiCS上高达5.0%），并在链接预测中实现更优的准确率。

英文摘要

Graph neural networks (GNNs) excel at aggregating neighbor information for classification, yet their performance is hindered by graph structural entanglement, where spurious correlations from semantically irrelevant neighbors contaminate node embeddings. This challenge is most acute for nodes near class boundaries in the embedding space, where amplified structural noise blurs decision boundaries and destabilizes predictions. Existing robust GNN methods largely treat all nodes uniformly, ignoring boundary vulnerabilities. In this paper, to improve classification performance, we tackle graph structural disentanglement by identifying boundary-region entanglement as the primary bottleneck and propose Boundary Embedding Shaping (BES), an adaptive contrastive learning GNN plug-in module that selectively suppresses spurious structural noise at decision boundaries with minimal model parameter perturbation. Extensive experiments demonstrate that BES consistently improves boundary discrimination and outperforms existing leading methods. Notably, BES boosts GCN performance by an average of 3.3% in node classification (up to 5.0% on WikiCS) and achieves superior accuracy in link prediction.

2507.22524 2026-06-19 cs.LG 版本更新

HGCN(O): A Self-Tuning GCN HyperModel Toolkit for Outcome Prediction in Event-Sequence Data

HGCN(O)：一种用于事件序列数据结果预测的自调优GCN超模型工具包

Fang Wang, Paolo Ceravolo, Ernesto Damiani

发表机构 * College of Computing and Mathematical Sciences, Khalifa University（哈立发大学计算与数学科学学院）； Department of Computer Science, University of Milan（米兰大学计算机科学系）

AI总结提出HGCN(O)工具包，集成四种GCN架构和多种图表示，通过自调优优化预测准确性和稳定性，在平衡和不平衡数据集上表现优异，优于传统方法。

Comments 38 pages, 2 figures

2510.16311 2026-06-19 cs.LG 版本更新

LOKI: 无记忆零空间约束的终身知识编辑

Masih Eskandar, Miquel Sirera Perelló, Stratis Ioannidis, Jennifer Dy

AI总结提出LOKI方法，通过希尔伯特-施密特独立性准则动态选择层，并将梯度更新投影到模型权重的零空间，实现无需访问旧知识的终身知识编辑，平均准确率提升14%。

2606.20431 2026-06-19 cs.LG 新提交

Sparsity, Superposition, and Forgetting: A Mechanistic Study of Representation Retention in Continual Learning

稀疏性、叠加与遗忘：持续学习中表示保持的机制研究

Jan Wasilewski, Jędrzej Kozal, Michał Woźniak, Bartosz Krawczyk

发表机构 * Rochester Institute of Technology（罗切斯特理工学院）； Wrocław University of Science and Technology（弗罗茨瓦夫科技大学）

AI总结通过可控玩具框架研究持续学习中的遗忘机制，发现叠加随时间增加但任务边界处有瞬降，高稀疏性增加叠加但不必然导致遗忘，任务级有效秩随稀疏性增长。

AI中文摘要

持续学习（CL）系统常常遗忘先前获得的知识，但由于真实数据集纠缠了许多因素，遗忘的机制在实践中难以孤立。我们提出了一个可控的玩具世界框架，使这些机制可观察和可测试。使用合成生成器-分离器流水线，我们定义了真实潜在特征，构建了具有可调稀疏性和重叠的任务，并引入了表示强度和叠加（特征间的方向重叠）的可测量量。然后，我们通过拟合保留、叠加和暴露历史之间的稀疏动态关系（通过SINDy）来研究保留动态——表示强度的时间变化。基于有效秩的互补任务级分析表征了表示能力如何在任务间分配。我们的受控实验得出三个要点。（1）叠加随时间增加，在任务边界处有瞬降，表明边界特定的干扰而非稳定漂移。（2）更高的特征稀疏性导致更多叠加，但不必然引起遗忘；当表示保持强时，尽管重叠，遗忘可以减少。（3）任务级有效秩随稀疏性增长，表明在稀疏机制下更广泛的能力使用。这些结果共同细化了常见直觉——更多叠加导致更多遗忘，通过显示重叠与表示强度和能力分配相互作用。我们的玩具分析为CL提供了可证伪的假设和诊断工具。

英文摘要

Continual learning (CL) systems often forget previously acquired knowledge, yet the mechanisms driving forgetting remain hard to isolate in practice because real datasets entangle many factors. We present a controlled, toy-world framework that makes these mechanisms observable and testable. Using a synthetic generator-separator pipeline, we define ground-truth latent features, build tasks with tunable sparsity and overlap, and introduce measurable quantities for representation strength and superposition (directional overlap among features). We then study retention dynamics, the temporal change of representation strength, by fitting sparse dynamical relations (via SINDy) between retention, superposition, and exposure history. A complementary task-level analysis based on effective rank characterizes how representational capacity is allocated across tasks. Our controlled experiments yield three takeaways. (1) Superposition tends to increase over time with transient dips at task boundaries, suggesting boundary-specific interference rather than steady drift. (2) Higher feature sparsity induces more superposition yet does not inevitably cause forgetting; when representations remain strong, forgetting can be reduced despite overlap. (3) Task-level effective rank grows with sparsity, indicating broader capacity usage under sparse regimes. Together, these results nuance the common intuition that more superposition leads to more forgetting by showing that overlap interacts with representation strength and capacity allocation. Our toy analysis provides falsifiable hypotheses and diagnostic tools for CL.
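The abstract does not say how effective rank is computed. One common definition, assumed here and not confirmed by the paper, is the exponential of the Shannon entropy of the normalized singular values. The sketch below uses that definition; `effective_rank` and the toy matrices are illustrative names, not the authors' code.

```python
import numpy as np

def effective_rank(features: np.ndarray, eps: float = 1e-12) -> float:
    """Entropy-based effective rank of a (samples x dims) matrix.

    Assumed definition: exp(H(p)), where p are the singular values
    normalized to sum to one and H is the Shannon entropy.
    """
    s = np.linalg.svd(features, compute_uv=False)
    p = s / (s.sum() + eps)
    p = p[p > eps]  # drop numerically zero directions before taking logs
    return float(np.exp(-(p * np.log(p)).sum()))

rng = np.random.default_rng(0)
# A rank-one matrix concentrates all energy in a single direction.
rank_one = np.outer(rng.normal(size=50), rng.normal(size=8))
# A dense Gaussian matrix spreads energy over all eight directions.
spread = rng.normal(size=(50, 8))

print(effective_rank(rank_one), effective_rank(spread))
```

Under this definition the value ranges from 1 (all energy in one direction) to the number of dimensions (energy spread evenly), which is one way to read "broader capacity usage".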

2606.20538 2026-06-19 cs.LG 新提交

Multi-Task Bayesian In-Context Learning

多任务贝叶斯上下文学习

Qingyang Zhu, Eric Karl Oermann, Kyunghyun Cho

发表机构 * New York University（纽约大学）

AI总结提出多任务上下文学习框架，通过将先验信息表示为上下文数据集前缀，训练Transformer实现分层贝叶斯预测推理，在多种分布偏移下匹配最优贝叶斯性能且速度提升数个数量级。

Comments ICML 2026

AI中文摘要

贝叶斯预测推断为不确定性量化、数据效率和鲁棒泛化提供了原则性框架。然而，精确推断通常难以处理，可扩展近似可能仍计算昂贵或需要限制性建模假设，从而降低预测性能。先验数据拟合和上下文模型最近作为一种摊销替代方案出现，通过学习直接将数据集映射到预测分布，但现有方法与训练先验的支持紧密耦合，缺乏在测试时适应新先验的显式机制，导致在分布偏移下鲁棒性有限。我们引入了一个多任务上下文学习框架，用于摊销分层贝叶斯预测推断，该框架将先验信息显式表示为上下文数据集的前缀。一个在先验和目标任务序列上训练的Transformer学习跨先验族调整其预测。在一系列难度递增的评估中，包括元分布外先验和具有高维潜在结构的先验，我们的方法匹配了最优贝叶斯预测器，同时速度快了几个数量级。我们进一步在真实世界的时空温度预测基准上展示了其实用性。代码可在 https://github.com/martianmartina/multi-task-bayesian-icl/ 获取。

英文摘要

Bayesian predictive inference provides a principled framework for uncertainty quantification, data efficiency, and robust generalization. However, exact inference is often intractable, and scalable approximations may remain computationally expensive or require restrictive modeling assumptions that degrade predictive performance. Prior-Data Fitted and in-context models have recently emerged as an amortized alternative by learning to map datasets directly to predictive distributions, but existing approaches are tightly coupled to the support of the training prior and lack explicit mechanisms for adapting to new priors at test time, resulting in limited robustness under distribution shift. We introduce a multi-task in-context learning framework for amortized hierarchical Bayesian predictive inference that explicitly represents prior information as a prefix of in-context datasets. A transformer trained on sequences of prior and target tasks learns to adapt its predictions across families of priors. On a suite of evaluations with increasing difficulty, including out-of-meta-distribution priors and priors with high-dimensional latent structures, our method matches oracle Bayesian predictors while being orders of magnitude faster. We further demonstrate its practical relevance on a real-world spatiotemporal temperature prediction benchmark. Code is available at https://github.com/martianmartina/multi-task-bayesian-icl/.

2507.23534 2026-06-19 cs.LG cs.CV 版本更新

Continual Learning with Support Boundary Experience Blending

支持边界经验混合的持续学习

Chih-Fan Hsu, Ming-Ching Chang, Wei-Chao Chen

发表机构 * National Taiwan University（国立台湾大学）

AI总结提出经验混合框架，通过差分隐私启发的噪声生成支持边界数据，联合训练样本和边界数据以正则化决策边界，在多个数据集上提升持续学习准确率。

AI中文摘要

持续学习旨在减轻模型在顺序任务训练时的灾难性遗忘。常见方法经验回放存储过去的样本，但仅稀疏地近似数据分布，导致决策边界脆弱且过于简化。我们通过引入支持边界数据来解决这一限制，该数据通过差分隐私启发的噪声注入潜在特征，生成边界邻近表示，隐式正则化决策边界。基于此，我们提出经验混合框架，通过双模型聚合策略联合训练样本和支持边界数据。经验混合有两个组成部分：(1) 潜在空间噪声注入以生成支持边界数据，(2) 联合利用样本和支持边界数据的端到端训练。与标准经验回放不同，支持边界数据丰富了决策边界附近的特征空间，从而实现更稳定和鲁棒的持续学习。在CIFAR-10、CIFAR-100、Tiny ImageNet和ImageNet1K上的大量实验分别展示了10%、6%、13%和2%的持续准确率提升。

英文摘要

Continual learning (CL) seeks to mitigate catastrophic forgetting when models are trained on sequential tasks. A common approach, experience replay (ER), stores past exemplars but only sparsely approximates the data distribution, yielding fragile and oversimplified decision boundaries. We address this limitation by introducing Support Boundary Data (SBD), generated by injecting differential-privacy-inspired noise into latent features to create boundary-adjacent representations that implicitly regularize decision boundaries. Building on this idea, we propose Experience Blending (EB), a framework that jointly trains on exemplars and SBD through a dual-model aggregation strategy. EB has two components: (1) latent-space noise injection to generate support boundary data, and (2) end-to-end training that jointly leverages exemplars and SBD. Unlike standard experience replay, SBD enriches the feature space near decision boundaries, leading to more stable and robust continual learning. Extensive experiments on CIFAR-10, CIFAR-100, Tiny ImageNet, and ImageNet1K demonstrate consistent accuracy improvements of 10%, 6%, 13%, and 2%, respectively.

2104.08928 2026-06-19 stat.ML cs.CL cs.LG 版本更新

Group-Sparse Matrix Factorization for Transfer Learning of Word Embeddings

面向词嵌入迁移学习的组稀疏矩阵分解

Kan Xu, Xuanyi Zhao, Hamsa Bastani, Osbert Bastani

发表机构 * W. P. Carey School of Business, Arizona State University（亚利桑那州立大学韦伯商学院）； University of Pennsylvania（宾夕法尼亚大学）； Wharton School, University of Pennsylvania（宾夕法尼亚大学沃顿商学院）

AI总结提出一种基于组稀疏惩罚的两阶段估计器，通过结合大规模语料和少量领域数据高效迁移学习领域特定的词嵌入，并证明了其泛化误差界和非凸目标函数的局部最优与全局最优统计等价。

AI中文摘要

非结构化文本为许多领域的决策者提供了丰富的数据源，从零售中的产品评论到医疗保健中的护理记录。为了利用这些信息，单词通常通过无监督学习算法（如矩阵分解）转化为词嵌入——编码单词之间语义关系的向量。然而，从训练数据有限的新领域学习词嵌入可能具有挑战性，因为在新领域中含义/用法可能不同，例如，单词“positive”通常具有积极情感，但在医疗记录中通常具有消极情感，因为它可能意味着患者检测出疾病阳性。在实践中，我们预计只有少数领域特定的单词可能具有新含义。我们提出了一种直观的两阶段估计器，通过组稀疏惩罚利用这种结构，通过结合大规模文本语料库（如维基百科）和有限的领域特定文本数据，高效地迁移学习领域特定的词嵌入。我们限定了迁移学习估计器的泛化误差，证明当只有少量嵌入在领域间改变时，它可以用显著更少的领域特定数据实现高精度。此外，我们证明了在标准正则化条件下，由非凸目标函数识别的所有局部最小值与全局最小值在统计上不可区分，这意味着我们的估计器可以高效计算。我们的结果首次给出了组稀疏矩阵分解的界限，这可能具有独立意义。我们通过与自然语言处理中最先进的微调启发式方法进行实证比较来评估我们的方法。

英文摘要

Unstructured text provides decision-makers with a rich data source in many domains, ranging from product reviews in retail to nursing notes in healthcare. To leverage this information, words are typically translated into word embeddings -- vectors that encode the semantic relationships between words -- through unsupervised learning algorithms such as matrix factorization. However, learning word embeddings from new domains with limited training data can be challenging, because the meaning/usage may be different in the new domain, e.g., the word "positive" typically has positive sentiment, but often has negative sentiment in medical notes since it may imply that a patient tested positive for a disease. In practice, we expect that only a small number of domain-specific words may have new meanings. We propose an intuitive two-stage estimator that exploits this structure via a group-sparse penalty to efficiently transfer learn domain-specific word embeddings by combining large-scale text corpora (such as Wikipedia) with limited domain-specific text data. We bound the generalization error of our transfer learning estimator, proving that it can achieve high accuracy with substantially less domain-specific data when only a small number of embeddings are altered between domains. Furthermore, we prove that all local minima identified by our nonconvex objective function are statistically indistinguishable from the global minimum under standard regularization conditions, implying that our estimator can be computed efficiently. Our results provide the first bounds on group-sparse matrix factorization, which may be of independent interest. We empirically evaluate our approach compared to state-of-the-art fine-tuning heuristics from natural language processing.
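To make the group-sparse idea concrete, the sketch below shows the proximal step of a row-wise group-lasso penalty applied to the deviation between domain-specific and pretrained embeddings. It is only an illustration of how such a penalty zeroes out whole word vectors, not the authors' two-stage estimator; `group_soft_threshold`, `lam`, and the toy matrix are hypothetical.

```python
import numpy as np

def group_soft_threshold(delta: np.ndarray, lam: float) -> np.ndarray:
    """Row-wise (block) soft-thresholding.

    Each row of `delta` is one word's embedding shift between domains.
    Rows whose Euclidean norm is below `lam` are set exactly to zero,
    so only words with a large shift keep a domain-specific update.
    """
    norms = np.linalg.norm(delta, axis=1, keepdims=True)
    scale = np.maximum(0.0, 1.0 - lam / np.maximum(norms, 1e-12))
    return scale * delta

# Hypothetical shifts for three words in a two-dimensional embedding.
delta = np.array([[3.0, 4.0],    # large shift (e.g. a word like "positive")
                  [0.1, 0.1],    # small shift, treated as noise
                  [0.0, 0.0]])   # no shift
print(group_soft_threshold(delta, lam=1.0))
```

The first row shrinks toward zero but survives, while the other two rows become exactly zero, which matches the assumption that only a few words change meaning between domains.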

2601.02322 2026-06-19 stat.ME cs.LG 版本更新

Environment-Adaptive Covariate Selection: Learning When to Use Spurious Correlations for Out-of-Distribution Prediction

环境自适应协变量选择：学习何时利用虚假相关进行分布外预测

Shuozhi Zuo, Yixin Wang

发表机构 * Department of Statistics, University of Michigan, Ann Arbor（统计系，密歇根大学，安阿伯分校）

AI总结针对分布外预测中协变量选择问题，提出环境自适应算法，根据环境特征动态选择协变量集，在模拟和实际数据中优于静态方法。

AI中文摘要

一种常见的分布外预测方法将模型限制为因果或不变协变量，以避免可能随环境变化的虚假关联。尽管具有理论吸引力，但当仅观察到结果的部分因果父节点时，该策略可能不如经验风险最小化。在这种情况下，非因果协变量可以作为未观察到的因果父节点的代理，当代理关系稳定时改善预测，但当变化破坏这种关系时则有害。因此，最优协变量集可能取决于所遇到的具体变化。由于不同的变化会在未标记的协变量分布中留下特征，我们提出了一种环境自适应协变量选择算法，该算法将环境级摘要映射到特定于环境的协变量集。这些摘要可以是手工制作的，也可以从多环境数据中学习，并且先验因果知识可以作为约束条件纳入。在模拟和应用数据集中，所提出的方法在各种变化下优于静态因果、不变和其他非自适应规则。

英文摘要

A common approach to out-of-distribution prediction restricts models to causal or invariant covariates to avoid spurious associations that may change across environments. Despite its theoretical appeal, this strategy can underperform empirical risk minimization when only a subset of the causal parents of the outcome is observed. In such settings, non-causal covariates can serve as proxies for unobserved causal parents and improve prediction when the proxy relationship is stable, but they can hurt when shifts disrupt that relationship. Thus, the optimal covariate set can depend on the specific shift encountered. Because different shifts leave signatures in the unlabeled covariate distribution, we propose an environment-adaptive covariate selection algorithm that maps environment-level summaries to environment-specific covariate sets. These summaries may be hand-crafted or learned from multi-environment data, and prior causal knowledge can be incorporated as constraints. Across simulations and applied datasets, the proposed method improves over static causal, invariant, and other non-adaptive rules under diverse shifts.

2606.19411 2026-06-19 cs.LG 新提交

Spectral DPPs via NEPv: A Scalable Continuous Relaxation of Determinantal MAP for Diversity-Aware Data Selection

通过NEPv的谱DPP：用于多样性感知数据选择的确定性点过程MAP的可扩展连续松弛

Richard Yi Da Xu

发表机构 * Hong Kong Baptist University（香港浸会大学）； TadReamk Limited（TadReamk有限公司）

AI总结提出将NP难的DPP-MAP选择问题转化为Stiefel流形上的连续优化，通过非线性特征值问题（NEPv）的自洽场迭代实现近线性时间求解，适用于大规模数据选择。

AI中文摘要

从海量候选池中选择一个小的、多样化的、高质量的子集是现代机器学习中的一个常见原语——用于训练和微调大型模型的数据整理和核心集选择、主动学习批次获取、上下文学习的提示和示例选择、检索多样化以及实验设计。确定性点过程（DPP）为此任务提供了原则性的、良好校准的多样性概念，但其MAP目标——选择大小为$k$的子集$S$最大化$\log\det(L_S)$——是NP难的，并且标准的贪心和采样算法在候选集大小$n$上具有超线性复杂度。这种成本在多样性最重要的数据为中心的场景中尤其高昂，其中$n$范围从数百万到数十亿的候选示例、特征或嵌入。我们将DPP-MAP重新表述为Stiefel流形上的连续优化问题，并证明其最优性条件构成一个先前未研究形式的具有特征向量依赖性的非线性特征值问题（NEPv）。该NEPv允许自洽场（SCF）迭代，具有基于谱间隙的局部收缩保证，从而提供了一个原则性的迭代求解器，其中多样性目标驱动一个特征向量依赖的算子。由此产生的算法OurMethod仅需要与核的矩阵-向量乘积，运行时间为$O\!\big((ndk+nk^2)\,t\big)$，其中迭代次数$t$很小，在$n$上接近线性，并直接与机器学习中常见的低秩和特征映射核集成。本文重点介绍松弛、求解器和扩展分析；完整的真实数据基准测试留给计划中的实证研究。

英文摘要

Selecting a small, diverse, high-quality subset from a massive pool of candidates is a recurring primitive in modern machine learning -- data curation and coreset selection for training and fine-tuning large models, active-learning batch acquisition, prompt and exemplar selection for in-context learning, retrieval diversification, and experimental design. Determinantal Point Processes (DPPs) give a principled, well-calibrated notion of diversity for this task, but their MAP objective -- pick a size-$k$ subset $S$ maximizing $\log\det(L_S)$ -- is NP-hard, and the standard greedy and sampling algorithms scale superlinearly in the ground-set size $n$. This cost is prohibitive precisely in the data-centric regime where diversity matters most, where $n$ ranges over millions to billions of candidate examples, features, or embeddings. We recast DPP-MAP as a continuous optimization problem over the Stiefel manifold, and show that its first-order optimality conditions form a Nonlinear Eigenvalue Problem with eigenvector dependency (NEPv) of a previously unstudied form. This NEPv admits a self-consistent field (SCF) iteration with a spectral-gap-based local contraction guarantee, giving a principled iterative solver where the diversity objective drives an eigenvector-dependent operator. The resulting algorithm, \OurMethod, requires only matrix-vector products with the kernel and runs in time $O\!\big((ndk+nk^2)\,t\big)$ for a small number of iterations $t$, scaling near-linearly in $n$ and integrating directly with low-rank and feature-map kernels common in ML. This paper focuses on the relaxation, solver, and scaling analysis; full real-data benchmarking is left to a planned empirical study.
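For orientation, the sketch below implements the naive greedy baseline for the $\log\det(L_S)$ objective that the abstract contrasts against. It is not the paper's SCF solver. It recomputes a determinant for every candidate at every step, which is exactly the kind of superlinear cost the abstract describes. The function names and the three-item kernel are illustrative assumptions.

```python
import numpy as np

def log_det_subset(L: np.ndarray, subset: list) -> float:
    """log det of the principal submatrix L_S, or -inf if not positive."""
    sign, logdet = np.linalg.slogdet(L[np.ix_(subset, subset)])
    return logdet if sign > 0 else -np.inf

def greedy_dpp_map(L: np.ndarray, k: int) -> list:
    """Greedily add the item that most increases log det(L_S).

    Ties are broken in favor of the lowest index.
    """
    selected = []
    for _ in range(k):
        best_item, best_val = None, -np.inf
        for i in range(L.shape[0]):
            if i in selected:
                continue
            val = log_det_subset(L, selected + [i])
            if val > best_val:
                best_item, best_val = i, val
        if best_item is None:  # no candidate keeps the determinant positive
            break
        selected.append(best_item)
    return selected

# Toy kernel: items 0 and 1 are near-duplicates, item 2 is orthogonal to item 0.
X = np.array([[1.0, 0.0], [0.99, 0.14], [0.0, 1.0]])
L = X @ X.T
print(greedy_dpp_map(L, 2))  # → [0, 2]
```

The greedy pass skips the near-duplicate item 1 and pairs item 0 with the orthogonal item 2, which is the diversity behavior the determinant objective rewards.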

2606.19416 2026-06-19 cs.LG 新提交

MortarBench: Evaluating Mortgage Loan Origination Agents

MortarBench: 评估抵押贷款发起代理

Matthew Toles, Yunan Lu, Manav Munjal, Bojun Liu, Yuanhao Deng, Stephanie Selig, Derek Rindner, Cheng Li, Zhou Yu

发表机构 * Columbia University（哥伦比亚大学）； Tidalwave

AI总结提出MortarBench基准，通过金融数据合成与变异管道生成覆盖边缘案例的示例，评估大语言模型在贷款发起任务中的表现，发现模型准确率低且存在偏见，并引入CRIT校准框架提升准确率至80.5%。

AI中文摘要

贷款发起是贷方创建新贷款的过程，从申请和承保到批准和融资。该过程在评估申请人的资格和风险水平方面起着关键作用。最近，尽管缺乏任何公开基准，公司已开始使用抵押贷款代理来增强人类贷款官员。为填补这一空白，我们提出了MortarBench，一个贷款发起代理基准。MortarBench使用金融数据合成和变异管道生成具有广泛边缘案例覆盖的示例，这些示例匹配真实世界的分布和问题。我们发现最先进的大语言模型（LLM）表现不佳，闭源模型最多达到77.1%的精确匹配准确率。我们还发现LLM对与非英语名字相关的外国性存在系统性偏见。注意到这些弱点，我们引入了CRIT，一个置信度校准框架。我们的方法将准确率提高到80.5%，同时改善了风险管理导向并减少了偏见。

英文摘要

Loan origination is the process by which a lender creates a new loan, from application and underwriting through approval and funding. This process serves a critical role in evaluating the eligibility and level of risk posed by an applicant. Recently, firms have begun using mortgage loan agents to augment human loan officers, despite a lack of any public benchmark. To fill this gap, we present MortarBench, a loan origination agent benchmark. MortarBench uses a financial data synthesis and mutation pipeline to generate examples with broad edge case coverage that match real-world distributions and questions. We find that state-of-the-art large language models (LLMs) perform poorly, with closed-source models achieving at most 77.1% exact match accuracy. We also discover systematic biases in LLM perception of foreignness related to non-English names. Noting these weaknesses, we introduce CRIT, a confidence calibration framework. Our method increases accuracy to 80.5% while improving risk management steering and reducing bias.

2606.19481 2026-06-19 cs.LG 新提交

Insulin4RL: Real-Time Insulin Management in the Intensive Care Unit for Offline Reinforcement Learning

Insulin4RL：面向离线强化学习的重症监护室实时胰岛素管理

Thomas Frost, Steve Harris

AI总结针对电子健康记录离散化导致模型泛化性差的问题，提出基于真实临床轨迹的离线强化学习数据集Insulin4RL，包含375,000+决策和12,209名患者，用于评估模型在真实采样假设下的性能。

Comments Under submission

AI中文摘要

离线强化学习（ORL）有潜力利用历史电子健康记录（EHR）数据提高临床决策质量。当前该领域的训练和评估实践严重依赖于按固定规则时间间隔离散化的EHR数据集。离散化创建了复杂临床场景的虚构表示，并损害了回顾性模型评估的泛化性。在本文中，我们介绍Insulin4RL，一个医疗ORL数据集，其特点是来自真实临床轨迹的自然不规则输入和动作。该数据集源自MIMIC-IV，包含超过375,000个标记决策，涉及12,209名需要在重症监护室进行胰岛素输注滴定的患者。因此，该数据集可用于研究ORL模型在现实临床采样假设下的性能。我们提供了数据集结构和特征的描述、使用无模型离线强化学习的基线性能指标，以及使用拟合Q评估的标准化评估协议。最后，我们提出了未来研究可以利用该资源解决的领域。

英文摘要

Offline reinforcement learning (ORL) offers the potential to improve the quality of clinical decision-making using historical electronic health record (EHR) data. Current training and evaluative practices in this field rely heavily on EHR datasets that have been temporally discretised into fixed, regular time intervals. Discretisation creates fictional representations of complex clinical scenarios and compromises the generalisability of retrospective model evaluations. In this paper, we introduce Insulin4RL, a healthcare ORL dataset featuring naturally irregular inputs and actions from real clinical trajectories. Derived from MIMIC-IV, Insulin4RL comprises over 375,000 labelled decisions across 12,209 patients requiring insulin infusion titration in the Intensive Care Unit. The dataset can thus be used for research into ORL model performance under realistic clinical sampling assumptions. We provide a description of the dataset's structure and characteristics, baseline performance metrics using model-free offline reinforcement learning, and a standardised evaluation protocol using fitted Q-evaluation. We conclude with suggested areas for future research that could be addressed using this resource.

2606.19558 2026-06-19 cs.LG cs.CL 新提交

Displacement Is Not Direction: Evaluating Fidelity Metrics for Quantized LLM Deployment

位移不是方向：评估量化LLM部署的保真度指标

Miloš Nikolić, Ali Hadi Zadeh, Enrique Torres Sanchez, Andreas Moshovos

发表机构 * ByteShape ； University of Toronto（多伦多大学）； Vector Institute for Artificial Intelligence（向量人工智能研究所）

AI总结本文研究KL散度等保真度指标在量化语言模型部署中与下游基准分数的相关性，发现整体强相关但在近基线区域失效，归因于KL散度主要衡量分歧量而非方向。

AI中文摘要

保真度指标，如每个token的KL散度（KLD）与高精度参考模型的比较，常被用作基准质量的低成本代理。我们在Qwen3.6-35B-A3B的28个量化模型和Devstral-Small-2-24B的41个量化模型上，通过一系列下游基准测试验证了这一做法。我们发现，在整个量化队列中，KLD与基准分数强相关（Qwen上ρ=-0.72，Devstral上ρ=-0.86，p<0.001）。然而，在接近基线的静默区，这种关系变得不显著（Qwen上ρ=+0.00，Devstral上ρ=-0.24，p=0.36）。这种失效在14种测量变体中持续存在，包括不同的KLD聚合方式、困惑度公式、top-1一致性、校准语料库和上下文长度。在逐提示层面，KLD在代码任务上仅有较弱的失败预测能力，在LiveCodeBench上五个模型的失败与通过几何平均比在[1.08,1.22]之间，并且作为跨模型路由器失败，在分歧提示上仅达到42.3%-49.4%的准确率。我们将这种失效归因于结构分解：KLD主要衡量与参考模型的分歧量，在静默区复合ρ在Qwen上为+0.94（p<0.001），在Devstral上为+0.55（p=0.03），而其与分歧方向的关系较弱且依赖于任务。

英文摘要

Fidelity metrics, such as per-token KL divergence (KLD) against a high-precision reference, are often used in practice as low-cost proxies for benchmark quality. We test this practice on a 28-quant cohort of Qwen3.6-35B-A3B and a 41-quant cohort of Devstral-Small-2-24B, evaluated across a suite of downstream benchmarks. We find that KLD is strongly correlated with benchmark score over the full cohort ($ρ=-0.72$ on Qwen and $ρ=-0.86$ on Devstral, both with $p<0.001$). However, this relationship collapses to non-significance in the near-baseline silent zone ($ρ=+0.00$ on Qwen and $ρ=-0.24$, $p=0.36$, on Devstral). This collapse persists across 14 measurement variants, including different KLD aggregations, perplexity formulations, top-1 agreement, calibration corpora, and context lengths. At the per-prompt level, KLD has only weak failure-prediction power on code, with failed-vs-passed geometric-mean ratios in $[1.08,1.22]$ across five models on LiveCodeBench, and fails as a cross-model router, achieving only $42.3\%-49.4\%$ accuracy on disagreement prompts. We trace the collapse to a structural decomposition: KLD primarily measures the volume of disagreement with the reference, with silent-zone composite $ρ=+0.94$ ($p<0.001$) on Qwen and $+0.55$ ($p=0.03$) on Devstral, while its relationship to the direction of those disagreements is weak and task-conditional.
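As a reference point for the metric under study, the sketch below computes a mean per-token KL divergence between a reference model's logits and a quantized model's logits. The aggregation (a plain mean over tokens) and the function names are assumptions; the abstract notes that several KLD aggregations were tested and does not single one out.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def mean_token_kld(ref_logits: np.ndarray, quant_logits: np.ndarray) -> float:
    """Mean over tokens of KL(p_ref || p_quant).

    Both inputs have shape (tokens, vocab). The value depends only on how
    far the two distributions are apart, not on whether the quantized
    model's disagreements make its answers better or worse.
    """
    log_p = log_softmax(ref_logits)
    log_q = log_softmax(quant_logits)
    per_token = (np.exp(log_p) * (log_p - log_q)).sum(axis=-1)
    return float(per_token.mean())

rng = np.random.default_rng(0)
ref = rng.normal(size=(16, 32))    # hypothetical reference logits
noise = rng.normal(size=(16, 32))  # hypothetical quantization error
for scale in (0.0, 0.1, 0.5):
    print(scale, mean_token_kld(ref, ref + scale * noise))
```

The printed values grow with the perturbation scale regardless of which tokens the perturbation favors, which illustrates why a displacement measure alone need not predict the direction of a benchmark change.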

2606.19595 2026-06-19 cs.LG cs.AI 新提交

IHBench: Evaluating Post-Interruption Recovery in Voice Agents with Structured Workflows

IHBench：评估语音代理在结构化工作流中的中断后恢复能力

Ahmad Salimi, Wentao Ma, Yuzhi Tang, Dongming Shen, Mu Li, Alex Smola

发表机构 * Boson AI

AI总结提出IHBench基准，评估语音代理在结构化工作流中处理中断后的恢复能力，涵盖任务完成和恢复质量两个维度，实验表明闭源模型比开源模型更鲁棒。

详情

AI中文摘要

部署在结构化工作流（客户服务、医疗调度、账户管理）中的语音代理必须处理频繁的用户中断，同时保持多步骤程序的进度。现有的语音能力模型基准侧重于中断的时机：闯入检测、端点检测和轮流对话动态。它们忽略了中断后发生的情况：代理是否在正确的步骤恢复工作流？是否处理了用户的插话？是否避免重复用户已经听过的内容？我们引入了IHBench（中断处理基准），这是一个评估语音代理在10个企业领域中执行状态机驱动工作流时的中断后恢复能力的基准。六种中断类型在话语中间的控制点注入，并随数据生成每个中断的评估标准。每个中断在两个轴上评分：任务完成和恢复质量。我们评估了来自OpenAI、Google和开源社区的27个音频-语言模型配置。模型差异很大，恢复质量强烈依赖于中断类型。在我们的实验中，闭源模型比开源模型对中断更鲁棒：它们在任务完成上获胜的频率更高，随着对话变长，性能下降速度慢约3.3倍，并且没有音频与文本模态差距，而开源模型在这三个方面都处于劣势。一项人类研究验证了LLM评判员与人类标注者的一致性，与AudioMultiChallenge的跨基准分析表明，恢复质量在很大程度上是一个独立的能力轴。

英文摘要

Voice agents deployed in structured workflows (customer service, healthcare scheduling, account management) must handle frequent user interruptions while maintaining progress through multi-step procedures. Existing benchmarks for speech-capable models focus on the timing of interruptions: barge-in detection, endpointing, and turn-taking dynamics. They leave unmeasured what happens after the interruption: does the agent resume the workflow at the correct step? Does it address the user's interjection? Does it avoid re-delivering content the user already heard? We introduce IHBench (Interruption Handling Benchmark), a benchmark that evaluates post-interruption recovery in voice agents executing state-machine-driven workflows across 10 enterprise domains. Six interruption types are injected at controlled points mid-utterance, with per-interruption evaluation rubrics generated alongside the data. Each interruption is scored on two axes: task fulfillment and recovery quality. We evaluate 27 audio-language model configurations from OpenAI, Google, and the open-weight community. Models vary widely, and recovery quality depends strongly on the interruption type. Across our experiments, closed-weight models are consistently more robust to interruptions than open-weight ones: they win far more often on task fulfillment, degrade roughly 3.3x more slowly as conversations grow longer, and show no audio-versus-text modality gap, whereas the open-weight models lose ground on all three. A human study validates the LLM judge against human annotators, and a cross-benchmark analysis against AudioMultiChallenge indicates that recovery quality is a largely distinct capability axis.

覆盖约束下的数据偏差缓解与公平的代价

Bruno Scarone, Alfredo Viola, Renée J. Miller

发表机构 * Khoury College of Computer Sciences, Northeastern University（东北大学库里计算机科学学院）； Cheriton School of Computer Science, University of Waterloo（滑铁卢大学切里顿计算机科学学院）

AI总结针对多敏感属性交叉群体的偏差问题，提出在覆盖约束下扩展偏差缓解框架，通过整数线性规划优化缓解策略，权衡偏差近似误差与数据效率，并刻画公平的代价。

Comments Accepted to FAccT 2026

AI中文摘要

机器学习模型已被证明在多个敏感属性（如种族和性别）交叉的个体上表现出歧视性结果或性能下降。这源于两个相互关联的挑战：缺乏量化偏差（可能是交叉的）的原则性措施，以及训练数据中交叉子群的代表性不足。我们扩展了一个最近的偏差缓解框架，以纳入覆盖约束，确保跨群体（包括交叉子群）的充分代表性。由于对所有群体实现完全零偏差可能不是数据高效的（意味着可能需要大量数据），我们的解决方案在满足覆盖约束的同时，用偏差的小近似误差换取更高的数据效率。我们还将偏差缓解表述为一个整数线性规划，优化所有缓解策略，并刻画公平的代价，即最小数据修改成本，作为公平容忍度的函数。这对于法律合规（法规可能规定特定的公平阈值）和数据治理（使从业者能够在偏差减少和数据修改（特别是数据购买）成本之间做出明智的权衡）都至关重要。我们在公开数据集上评估了我们的技术，表明通过我们的框架进行偏差缓解可以保持多个分类器的预测准确性，并且覆盖约束虽然出于统计考虑，但对于保持下游机器学习性能至关重要。

英文摘要

Machine learning models have been shown to exhibit discriminatory outcomes or degraded performance for individuals at the intersection of multiple sensitive attributes, such as race and gender. This stems in part from two interrelated challenges: the lack of principled measures for quantifying bias (potentially intersectional), and insufficient representation of intersectional subgroups in training data. We extend a recent bias mitigation framework to incorporate coverage constraints that enforce sufficient representation across groups, including intersectional subgroups. Since achieving exactly zero bias for all groups may not be data efficient (meaning it may require large amounts of data), our solution trades small approximation errors in bias for greater data efficiency while satisfying coverage constraints. We also formulate bias mitigation as an integer linear program that optimizes over all mitigation strategies, and characterize the price of fairness, the minimum data modification cost, as a function of fairness tolerance. This is essential both for legal compliance, where regulations may mandate specific fairness thresholds, and for data governance, enabling practitioners to make informed trade-offs between bias reduction and data modification (particularly, data purchasing) costs. We evaluate our techniques on publicly available datasets, demonstrating that bias mitigation via our framework preserves predictive accuracy across multiple classifiers, and that coverage constraints, while motivated by statistical considerations, are essential for preserving downstream ML performance.

2606.19597 2026-06-19 cs.SD cs.AI cs.LG 交叉投稿

PrefSQA: Pairwise Preference Prediction for Speech Quality Assessment and the Critical Role of High Quality Datasets

PrefSQA: 用于语音质量评估的成对偏好预测及高质量数据集的关键作用

Junyi Fan, Donald S. Williamson

发表机构 * Department of Computer Science and Engineering, The Ohio State University, USA（美国俄亥俄州立大学计算机科学与工程系）

AI总结提出PrefSQA模型，通过不确定性感知logits、损伤注意力头和非匹配参考比较模块，利用高质量偏好数据集提升语音质量评估的准确性。

Comments Accepted to INTERSPEECH 2026

AI中文摘要

平均意见得分（MOS）广泛用于语音质量评估，但标量标签对评估者变异性和听力测试差异敏感，这引入了标签噪声，限制了MOS预测的可靠性。偏好预测通过让听者直接比较信号来减少这种变异性，产生更干净的标签。我们研究了无MOS的偏好预测，并提出了PrefSQA，它结合了不确定性感知logits、损伤注意力头以及基于非匹配参考比较的模块。我们使用并精炼了五个数据集，包括MOS衍生和低噪声模拟集（包含匹配和非匹配内容），在人类偏好集上进行实验，并在未见数据上测试。实验表明，在MOS衍生数据上改进较小，而其他数据集显示出相对于基线的明显改进，突显了高质量偏好数据的价值，并证明了所提出方法的有效性。

英文摘要

Mean opinion scores (MOS) are widely used for speech quality assessment, yet scalar labels are sensitive to rater variability and listening test differences. This introduces labeling noise, which limits the reliability of MOS prediction. Preference prediction reduces this variability as listeners compare signals directly, producing cleaner labels. We study MOS-free preference prediction and propose PrefSQA, which incorporates uncertainty-aware logits, an impairment attention head, and a module based on non-matching-reference comparisons. We use and refine five datasets, including MOS-derived and low-noise simulated sets with matching and non-matching content, experiment with human preference sets, and test on unseen data. Experiments show small improvements on MOS-derived data, while other sets reveal clear improvement over the baselines, highlighting the value of high-quality preference data and demonstrating the effectiveness of the proposed method.

2606.19714 2026-06-19 stat.ML cs.AI cs.LG stat.CO stat.ME 交叉投稿

EFIQA: 基于解剖先验的可解释眼底图像质量评估

Pengwei Wang, José Morano, Qian Wan, Hrvoje Bogunović

发表机构 * Institute of Artificial Intelligence, Center for Medical Data Science, Medical University of Vienna, Austria（维也纳医科大学医学数据科学中心人工智能研究所）； Christian Doppler Lab for Artificial Intelligence in Retina, Medical University of Vienna, Austria（维也纳医科大学视网膜人工智能克里斯蒂安·多普勒实验室）

AI总结提出无需质量标签的EFIQA框架，利用解剖先验通过掩膜解剖修复学习正常结构，生成空间质量图，在多个基准上超越监督方法，兼具可解释性。

Comments Accepted in MIDL 2026. Code: https://github.com/penway/EFIQA

Journal ref Proceedings of Machine Learning Research 315:2248-2264, 2026

AI中文摘要

图像质量控制对于广泛的下游应用至关重要。基于深度学习的图像质量评估方法通常根据数据集特定的质量标签训练分类器，这继承了两种局限性：（1）泛化能力受限于训练集的标注标准；（2）这些方法无法提供质量下降的空间反馈，缺乏可解释性。在这项工作中，我们提出了EFIQA，一个无需质量相关监督的框架，并通过设计生成空间质量图。EFIQA不是从人工标注的标签中学习“什么是退化”，而是通过利用解剖先验来学习“应该有什么”。对于眼底摄影，我们将其实例化为两阶段方法：首先通过掩膜解剖修复训练无监督异常检测器，以识别缺失血管区域；然后将这一先验知识蒸馏到一个浅层适配器中，将冻结基础模型的特征映射到精确的质量图。外部数据集评估表明，这种无需标签且只需最小适配的方法，在不同质量标准的基准上，与监督方法相比，实现了更好的性能和可解释性，突显了其在现实应用中的潜力。

英文摘要

Image quality control is vital for a wide range of downstream applications. Deep learning-based image quality assessment methods typically train classifiers on dataset-specific quality labels, inheriting two limitations: (1) generalization is tied to the labeling criteria of the training set and (2) these methods cannot provide spatial feedback on where the quality is degraded, lacking explainability. In this work, we propose EFIQA, a framework that requires no quality-related supervision and produces spatial quality maps by design. Rather than learning "what is degradation" from human-annotated labels, EFIQA learns "what should be there" by leveraging anatomical priors. For fundus photography, we instantiate this as a two-stage approach, by first training an unsupervised anomaly detector via masked anatomical inpainting to identify regions of missing vasculature, and then distilling this prior knowledge into a shallow adapter mapping features of a frozen foundation model to precise quality maps. External-dataset evaluation demonstrates that this label-free approach with minimal adaptation achieves better performance and explainability compared with supervised methods across benchmarks with different quality criteria, highlighting its potential for real-world applications.

2606.20128 2026-06-19 cs.SE cs.DC cs.LG 交叉投稿

The Correctness Illusion in LLM-Generated GPU Kernels

LLM生成的GPU内核中的正确性错觉

Dipankar Sarkar

AI总结通过高精度CPU参考和操作模式感知的模糊测试，发现现有基准测试中基于固定形状的allclose检查无法检测LLM风格的转录错误，提出一种新协议并验证其有效性。

Comments 10 pages, 2 figures, LNCS format. Companion papers to follow on arXiv next week; IDs will be added in a v2 replace

AI中文摘要

针对LLM生成的GPU内核的基准测试（KernelBench、TritonBench、GEAK）通过固定形状、小样本的allclose风格检查来评分正确性。不同基准测试的输入数量不同。每个内核的形状、数据类型和容差是固定的。我们凭经验测试了该oracle。我们构建了一个包含24个Triton和CPU替代内核（15个正确对照和9个带有记录转录错误的LLM风格错误变体）的受控语料库，并在操作模式感知的种子模糊测试下，使用高精度（fp64）CPU参考和每个（操作，数据类型）的绝对容差重新评估。种子oracle标记了9个错误内核中的9个，并通过了15个正确对照中的15个，对照的精度成本为零。我们将语料库扩展到26个操作（添加一个flash-attention对），并在五类GPU（RTX 3060、A10、L40S、A100 SXM4、H100 NVL）上重新运行相同的协议。所有五个GPU的判定结果相同：10个错觉中的10个被捕获，16个对照中的16个干净。语料库结果涉及LLM风格的转录错误，这些错误被单形状allclose oracle认证为正确，而不涉及任何特定部署的LLM的错误率。每个标记的失败都从存储的种子逐字节重放。

英文摘要

Benchmarks for LLM-generated GPU kernels (KernelBench, TritonBench, GEAK) score correctness through fixed-shape, small-sample allclose-style checks. The number of inputs varies between benchmarks. The shape, dtype, and tolerance are fixed for each kernel. We test that oracle empirically. We construct a controlled corpus of 24 Triton and CPU stand-in kernels (15 correct controls and 9 LLM-style buggy variants seeded with documented transcription errors) and re-evaluate it under op-schema-aware seeded fuzzing with a high-precision (fp64) CPU reference and per-(op, dtype) absolute tolerances. The seeded oracle flags 9 of 9 buggy kernels and passes 15 of 15 correct controls, at zero precision cost on controls. We extend the corpus to 26 ops (adding a flash-attention pair) and re-run the same protocol on five GPU classes (RTX 3060, A10, L40S, A100 SXM4, H100 NVL). The verdicts are identical across all five GPUs: 10 of 10 illusions caught and 16 of 16 controls clean. The corpus result is about LLM-style transcription bugs that the allclose-on-one-shape oracle certifies as correct, not about the bug rate of any specific deployed LLM. Every flagged failure replays byte-for-byte from a stored seed.
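
The gap between the two oracles can be pictured with a toy sketch. It is not the paper's harness: the "kernel" is a plain Python stand-in, and `BLOCK`, `buggy_blocked_sum`, `fixed_shape_check`, and `seeded_fuzz` are hypothetical names chosen for illustration. The assumed bug is a dropped tail block, which a single shape that is a multiple of the tile width cannot expose, while shapes drawn from a seed can.

```python
import math
import random

BLOCK = 4  # hypothetical tile width of the stand-in "kernel"

def reference_sum(xs):
    """High-precision reference: math.fsum is correctly rounded in fp64."""
    return math.fsum(xs)

def buggy_blocked_sum(xs):
    """Stand-in for a transcription bug: the tail block is never processed."""
    full = len(xs) - len(xs) % BLOCK
    total = 0.0
    for start in range(0, full, BLOCK):
        total += sum(xs[start:start + BLOCK])
    return total

def fixed_shape_check(kernel, n=1024, atol=1e-6):
    """allclose-style check on one shape; n is a multiple of BLOCK."""
    rng = random.Random(0)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return abs(kernel(xs) - reference_sum(xs)) <= atol

def seeded_fuzz(kernel, seeds=range(32), atol=1e-6):
    """Draw the shape and the data from each seed; return the failing seeds."""
    failures = []
    for seed in seeds:
        rng = random.Random(seed)
        n = rng.randint(1, 64)
        xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        if abs(kernel(xs) - reference_sum(xs)) > atol:
            failures.append(seed)
    return failures
```

Here `fixed_shape_check(buggy_blocked_sum)` passes, while `seeded_fuzz(buggy_blocked_sum)` returns a list of seeds, each of which regenerates the same failing input when replayed. The built-in `sum` serves as a correct control and yields no failures.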

2606.20477 2026-06-19 cs.CV cs.CL cs.LG 交叉投稿

Scalable Training of Spatially Grounded 2D Vision-Language Models for Radiology

面向放射学的空间定位2D视觉-语言模型的可扩展训练

Yusuf Salcan, Simon Ging, Robin Schirrmeister, Philipp Arnold, Elmar Kotter, Behzad Bozorgtabar, Thomas Brox

发表机构 * Computer Vision Group, University of Freiburg, Germany（德国弗莱堡大学计算机视觉组）； Department of Radiology, Medical Center -- University of Freiburg, Germany（德国弗莱堡大学医学中心放射科）； CRIION-AI Lab, Freiburg, Germany（德国弗莱堡CRIION-AI实验室）

AI总结提出RefRad2D大规模双语数据集，通过LLM和自动分割生成空间定位数据，训练RadGrounder模型联合完成报告生成、VQA和空间定位，在外部基准上取得竞争性结果。

Comments Accepted for MICCAI 2026. First two authors: equal contribution. Last two authors: equal supervision

详情

AI中文摘要

我们研究了如何在没有手动空间标注的情况下，为放射学训练具有视觉定位能力的视觉-语言模型（VLM）。我们引入了RefRad2D，这是一个大规模的双语（德语/英语）数据集，包含来自临床实践的120万对CT和MR图像-文本对，并通过基于LLM的筛选和自动分割自动生成任务特定的VQA和空间定位子集。在此数据上训练的模型RadGrounder联合执行报告生成、视觉问答以及通过边界框检测或分割进行的空间定位。在外部VQA基准（Slake，VQA-RAD）上，RadGrounder取得了与专用医学VLM竞争的结果。将我们的临床数据加入训练混合集，相比于仅在下游数据集上微调，提高了开放式VQA的性能，显示了数据集的迁移性。关键在于，添加定位监督不会降低语言质量，从而在不牺牲VQA性能的情况下实现空间可验证的输出。

英文摘要

We study how to train visually grounded vision-language models (VLMs) for radiology without manual spatial annotations. We introduce RefRad2D, a large-scale bilingual (German/English) dataset of 1.2M CT and MR image-text pairs derived from clinical practice, with task-specific VQA and spatial grounding subsets generated automatically via LLM-based curation and automated segmentation. Trained on this data, our model RadGrounder jointly performs report generation, visual question answering, and spatial grounding via bounding-box detection or segmentation. On external VQA benchmarks (Slake, VQA-RAD), RadGrounder achieves results competitive with specialized medical VLMs. Adding our clinical data to the training mixture improves open-ended VQA over fine-tuning on the downstream datasets alone, showing the transferability of our dataset. Crucially, adding grounding supervision does not degrade language quality, enabling spatially verifiable outputs at no cost to VQA performance.

2606.20482 2026-06-19 cs.CL cs.HC cs.LG 交叉投稿

Your Mouse and Eyes Secretly Leak Your Preference: LLM Alignment using Implicit Feedback from Users

你的鼠标和眼睛悄悄泄露你的偏好：利用用户隐式反馈进行LLM对齐

Haw-Shiuan Chang, Jeffrey Gomez, Mehul Patwari, Aryan Sajith, Hamed Zamani

发表机构 * University of Massachusetts, Amherst（马萨诸塞大学阿默斯特分校）； York University（约克大学）

AI总结针对显式反馈稀缺的问题，提出利用鼠标轨迹和眼动数据等隐式反馈训练奖励模型，将文本奖励模型准确率从55%提升至64%，并显著提高DPO对齐后响应质量。

详情

AI中文摘要

为了对齐大型语言模型（LLM），大多数现有方法收集显式的人类反馈，并基于响应文本训练奖励模型来预测人类偏好。这些现有方法有两个关键局限性。首先，用户很少为LLM响应提供显式反馈，这使得高质量偏好标注的收集成本高昂。其次，这些方法没有利用隐式人类反馈，而隐式反馈已被证明对互联网巨头的经济护城河至关重要。为了量化隐式反馈的价值，我们构建了一个名为IFLLM的新数据集，收集了来自59名Mechanical Turk工作者的1336个多轮问题、他们的鼠标轨迹以及通过网络摄像头对LLM响应的眼动注视点。IFLLM显示用户具有非常多样化的注视行为和鼠标轨迹。基于隐式用户反馈的奖励模型将基于文本的奖励模型准确率从55%提升至64%，并在将DPO应用于八个LLM后，相对响应质量改进几乎翻了三倍，证明了隐式反馈在现实场景中的价值。我们的数据收集网站、数据集和代码可在以下网址找到：https://github.com/themehulpatwari/llm-implicit-feedback/。

英文摘要

To align a Large Language Model (LLM), most existing methods collect explicit human feedback and train a reward model to predict human preference from the response text. These methods have two key limitations. First, users rarely provide explicit feedback on LLM responses, which makes high-quality preference annotation expensive to collect. Second, the methods do not leverage implicit human feedback, which has proven vital to the economic moats of Internet giants. To quantify the value of implicit feedback, we build a new dataset called IFLLM, which collects 1336 multi-turn questions from 59 Mechanical Turk workers, together with their mouse trajectories and webcam-recorded eye-gaze points on the LLMs' responses. IFLLM shows that users exhibit very diverse gazing behavior and mouse trajectories. Our reward model based on implicit user feedback boosts the accuracy of the text-based reward model from 55% to 64% and nearly triples the relative response quality improvement after applying DPO to eight LLMs, demonstrating the value of implicit feedback in the wild. Our data collection website, dataset, and code can be found at https://github.com/themehulpatwari/llm-implicit-feedback/.

2505.16319 2026-06-19 cs.LG 版本更新

FreshRetailNet-50K: A Stockout-Annotated Censored Demand Dataset for Latent Demand Recovery and Forecasting in Fresh Retail

FreshRetailNet-50K：面向生鲜零售中潜在需求恢复与预测的缺货标注删失需求数据集

Yangyang Wang, Jiawei Gu, Li Long, Xin Li, Li Shen, Zhouyu Fu, Xiangjun Zhou, Xu Jiang

发表机构 * Fresh Retail, Inc.（新鲜零售公司）

AI总结针对生鲜零售中缺货导致的销售数据删失问题，提出首个大规模基准数据集FreshRetailNet-50K，包含50,000条高时间分辨率小时级销售序列及缺货标注，并展示了两阶段需求建模方法，将预测准确率提升2.73%，需求低估偏差从7.37%降至近零。

详情

AI中文摘要

准确的需求估计对于零售业务指导易腐产品的库存和定价策略至关重要。然而，它面临缺货期间删失销售数据的根本挑战，其中未观察到的需求会造成系统性政策偏差。现有数据集缺乏解决这种删失效应所需的时间分辨率和标注。为填补这一空白，我们提出了FreshRetailNet-50K，这是首个用于删失需求估计的大规模基准。它包含来自18个主要城市898家商店的50,000条商店-产品时间序列的详细小时级销售数据，涵盖863个易腐SKU，并精心标注了缺货事件。该数据集独有的小时级库存状态记录，结合丰富的上下文协变量（包括促销折扣、降水和时间特征），使得超越现有解决方案的创新研究成为可能。我们展示了一个两阶段需求建模的用例：首先，利用精确的小时级标注重建缺货期间的潜在需求；然后，利用恢复的需求在第二阶段训练鲁棒的需求预测模型。实验结果表明，该方法将预测准确率提高了2.73%，同时将系统性需求低估从7.37%降至接近零偏差。凭借前所未有的时间粒度和全面的真实世界信息，FreshRetailNet-50K在需求插补、易腐库存优化和因果零售分析方面开辟了新的研究方向。该数据集独特的标注质量和规模解决了零售AI中长期存在的局限性，提供了即时解决方案和未来方法论创新的平台。数据（https://huggingface.co/datasets/Dingdong-Inc/FreshRetailNet-50K）和代码（https://github.com/Dingdong-Inc/frn-50k-baseline）已公开。

英文摘要

Accurate demand estimation is critical for the retail business in guiding the inventory and pricing policies of perishable products. However, it faces fundamental challenges from censored sales data during stockouts, where unobserved demand creates systemic policy biases. Existing datasets lack the temporal resolution and annotations needed to address this censoring effect. To fill this gap, we present FreshRetailNet-50K, the first large-scale benchmark for censored demand estimation. It comprises 50,000 store-product time series of detailed hourly sales data from 898 stores in 18 major cities, encompassing 863 perishable SKUs meticulously annotated for stockout events. The hourly stock status records unique to this dataset, combined with rich contextual covariates, including promotional discounts, precipitation, and temporal features, enable innovative research beyond existing solutions. We demonstrate one such use case of two-stage demand modeling: first, we reconstruct the latent demand during stockouts using precise hourly annotations. We then leverage the recovered demand to train robust demand forecasting models in the second stage. Experimental results show that this approach achieves a 2.73% improvement in prediction accuracy while reducing the systematic demand underestimation from 7.37% to near-zero bias. With unprecedented temporal granularity and comprehensive real-world information, FreshRetailNet-50K opens new research directions in demand imputation, perishable inventory optimization, and causal retail analytics. The unique annotation quality and scale of the dataset address long-standing limitations in retail AI, providing immediate solutions and a platform for future methodological innovation. The data (https://huggingface.co/datasets/Dingdong-Inc/FreshRetailNet-50K) and code (https://github.com/Dingdong-Inc/frn-50k-baseline) are openly released.
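
The first stage of the two-stage idea can be pictured with a deliberately simple stand-in. The abstract does not describe the imputation model, so the rule below (replace each stocked-out hour with the mean of in-stock sales at the same hour of day) is an assumption made purely for illustration, and `recover_latent_demand` and `period` are hypothetical names.

```python
from statistics import mean

def recover_latent_demand(sales, stockout, period=24):
    """Fill censored (stockout) slots with the same-slot in-stock mean.

    sales    : observed hourly sales (censored at zero or low during stockouts)
    stockout : booleans, True where the item was out of stock
    period   : slots per cycle (24 for hour-of-day seasonality)
    """
    in_stock = [s for s, out in zip(sales, stockout) if not out]
    overall = mean(in_stock)  # raises StatisticsError if every slot is censored
    by_slot = {}
    for t, (s, out) in enumerate(zip(sales, stockout)):
        if not out:
            by_slot.setdefault(t % period, []).append(s)
    recovered = []
    for t, (s, out) in enumerate(zip(sales, stockout)):
        if out:
            # Fall back to the overall in-stock mean if this slot was never in stock.
            recovered.append(mean(by_slot.get(t % period, [overall])))
        else:
            recovered.append(s)
    return recovered

# Toy series with a 2-slot cycle; the third slot is a stockout.
sales = [4, 2, 0, 2, 6, 2]
stockout = [False, False, True, False, False, False]
recovered = recover_latent_demand(sales, stockout, period=2)
```

In this toy series the censored slot is restored from 0 to 5, so total demand rises from 16 to 21. A forecaster trained on `sales` would inherit the underestimate, whereas one trained on `recovered` would not; the second stage can then be any forecasting model fitted to the recovered series.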

2507.15584 2026-06-19 cs.LG 版本更新

We Need to Rethink Benchmarking in Anomaly Detection

我们需要重新思考异常检测中的基准测试

Philipp Röchner, Simon Klüttermann, Kevin Kammler, Franz Rothlauf, Emmanuel Müller, Daniel Schlör

发表机构 * University of Mainz（美因茨大学）； TU Dortmund（多特蒙德工业大学）； University of Würzburg（维尔茨堡大学）

AI总结本文指出当前异常检测基准测试导致进展停滞，提出基于场景分类的评估框架以改进算法选择和性能评估。

详情

AI中文摘要

尽管不断有新的异常检测算法提出且基准测试工作广泛，但进展似乎停滞不前，既有基线与新算法之间仅存在微小的性能差异。在这篇立场论文中，我们认为这种停滞源于我们评估异常检测算法的方式存在局限性。在当前的基准测试中，一个仅检查单个特征极端值的平凡算法与最先进的深度学习方法竞争激烈，尽管它在简单案例（如正常点环内的异常）上失败。此外，现有基准测试未能充分反映异常检测应用的多样性，使得从业者难以可靠地为其应用选择算法。因此，我们需要重新思考异常检测中的基准测试。我们认为，异常检测应通过使用场景来研究，这些场景将共享相关特征的应用分组，并通过通用分类法定义。场景内的基准测试能够实现预处理、度量和模型选择的场景特定选择，明确哪些进展在相似应用间迁移，并为从业者在其特定上下文中提供可靠指导。

英文摘要

Despite the continuous proposal of new anomaly detection algorithms and extensive benchmarking efforts, progress seems to stagnate, with only minor performance differences between established baselines and new algorithms. In this position paper, we argue that this stagnation is due to limitations in how we evaluate anomaly detection algorithms. In current benchmarks, a trivial algorithm that only checks for extreme values in individual features performs competitively with state-of-the-art deep learning methods, despite failing on simple cases such as anomalies within an annulus of normal points. Moreover, existing benchmarks do not adequately reflect the diversity of anomaly detection applications, making it difficult for practitioners to reliably select algorithms for their applications. Consequently, we need to rethink benchmarking in anomaly detection. In our opinion, anomaly detection should be studied using scenarios that group applications sharing relevant characteristics, defined through a common taxonomy. Benchmarking within scenarios enables scenario-specific choices for preprocessing, metrics, and model selection, clarifying which advances transfer across similar applications and providing practitioners with reliable guidance for their specific contexts.
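
The annulus failure case can be reproduced in a few lines. This is a hedged sketch rather than the authors' benchmark code: the per-feature z-score detector and the nearest-neighbor comparison are assumed stand-ins, and all function names are hypothetical.

```python
import math

def ring_points(n, radius=1.0):
    """n evenly spaced 'normal' points on a circle (a thin annulus)."""
    return [(radius * math.cos(2 * math.pi * k / n),
             radius * math.sin(2 * math.pi * k / n)) for k in range(n)]

def extreme_value_score(point, data):
    """Trivial detector: largest absolute z-score over individual features."""
    score = 0.0
    for j in range(len(point)):
        column = [p[j] for p in data]
        mu = sum(column) / len(column)
        sd = math.sqrt(sum((v - mu) ** 2 for v in column) / len(column))
        score = max(score, abs(point[j] - mu) / sd)
    return score

def nearest_neighbor_score(point, data):
    """Distance to the closest other point; large means isolated."""
    return min(math.dist(point, p) for p in data if p != point)

normal = ring_points(40)
anomaly = (0.0, 0.0)          # sits in the hole of the annulus
data = normal + [anomaly]
```

The central point has the lowest extreme-value score in the whole set, because each of its coordinates equals the feature mean, so the trivial detector ranks it as the most normal point. The nearest-neighbor score ranks it as the most anomalous, since its closest neighbor is a full radius away.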

2510.06048 2026-06-19 cs.LG 版本更新

脑MRI的量子潜GAN增强的受控基准测试

Syed Mujtaba Haider, Silvia Figini

发表机构 * Department of Mathematics（数学系）； Department of Political and Social Sciences（政治与社会科学系）

AI总结通过受控基准测试，比较量子与经典生成器在脑MRI数据增强中的性能，发现两者均未显著优于仅用真实数据训练，且量子生成器无额外优势。

详情

AI中文摘要

医学图像分类常受限于有限的标注数据，因此生成式增强被提出；最近，量子生成模型被用于此目的，并经常报告准确率提升。然而，这些声称通常基于单次训练运行，未匹配量子与经典生成器的参数预算，也未表征任何收益出现的数据范围。我们提出了一个受控基准测试，隔离量子生成器对脑MRI增强的贡献。图像被编码到KL正则化的潜在空间中，在该空间中，使用变分量子生成器或参数数量几乎相同的经典生成器（1648 vs. 1632）训练带有梯度惩罚的条件Wasserstein GAN。合成样本被解码并用于增强预训练分类器，覆盖从5%到100%的标注数据比例，通过八个随机种子进行配对显著性检验（多重比较校正）以及集内多样性和潜在分布分析。在所有比例下，没有增强变体显著优于仅用真实数据训练，且量子与经典生成器在统计上无法区分。任何低数据优势表现为正则化而非忠实的数据扩展：合成样本分布外移，并且在数据稀缺时严重模式崩溃，而量子生成器并不比经典生成器更多样化。我们发布该协议作为医学成像中量子生成增强严格评估的测试平台。

英文摘要

Medical image classification is often constrained by limited labeled data, motivating generative augmentation; recently, quantum generative models have been proposed for this purpose, frequently reporting accuracy gains. However, such claims are typically based on single training runs, do not match the parameter budgets of the quantum and classical generators, and do not characterize the data regime in which any benefit appears. We present a controlled benchmark that isolates the contribution of a quantum generator to brain-MRI augmentation. Images are encoded into a KL-regularized latent space in which a conditional Wasserstein GAN with gradient penalty is trained using either a variational quantum generator or a classical generator of near-identical parameter count (1648 vs. 1632). Synthetic samples are decoded and used to augment a pretrained classifier across labeled data fractions from 5% to 100%, evaluated over eight random seeds with paired significance testing (with multiple-comparison correction) and with intraset diversity and latent-distribution analyses. Across all fractions, no augmentation variant significantly outperforms real-data-only training, and the quantum and classical generators are statistically indistinguishable. Any low-data benefit behaves as regularization rather than faithful data expansion: synthetic samples are off distribution and severely mode collapsed precisely where data is scarce, and the quantum generator is no more diverse than its classical counterpart. We release the protocol as a testbed for rigorous evaluation of quantum generative augmentation in medical imaging.

2508.05762 2026-06-19 cond-mat.mtrl-sci cs.LG 版本更新

Evaluating Universal Machine Learning Force Fields Against Experimental Measurements

评估通用机器学习力场与实验测量的对比

Sajid Mannan, Vaibhav Bihani, Carmelo Gonzales, Kin Long Kelvin Lee, Nitya Nand Gosvami, Sayan Ranu, Santiago Miret, N M Anoop Krishnan

发表机构 * Department of Civil Engineering, Indian Institute of Technology Delhi（印度理工学院德里土木工程系）； Yardi School of Artificial Intelligence, Indian Institute of Technology Delhi（印度理工学院德里人工智能学院）； Intel Labs, California, USA（美国加州英特尔实验室）； Department of Materials Science and Engineering, Indian Institute of Technology Delhi（印度理工学院德里材料科学与工程系）； Department of Computer Science and Engineering, Indian Institute of Technology Delhi（印度理工学院德里计算机科学与工程系）

AI总结提出UniFFBench框架和MinX数据集，系统评估六种通用机器学习力场，发现模型在计算基准上表现优异但在实验复杂性下存在显著“现实差距”，密度预测误差高于实际应用阈值。

详情

AI中文摘要

通用机器学习力场（UMLFFs）有望通过实现跨元素周期表的快速原子模拟来革新材料科学。然而，它们的评估一直局限于可能无法反映实际性能的计算基准。我们引入了UniFFBench，一个全面的评估框架，包含MinX数据集——一个涵盖85种元素、极端热力学条件（0–5000 K, 0–1000 GPa）和结构复杂性（包括部分占据和无序）的1500多种矿物系统的多样化集合。这种多样性，结合用于验证的实验参考值，使得能够评估UMLFF在化学空间和条件上的泛化能力，这些条件远超典型的训练场景。我们对六种最先进的UMLFF的系统评估揭示了一个显著的“现实差距”：在计算基准上表现令人印象深刻的模型在面对实验复杂性时常常失败。即使是最好的模型也表现出高于实际应用所需阈值的密度预测误差。我们观察到模拟稳定性和力学性能准确性之间的脱节，预测误差与训练数据表示相关，而非建模方法。

英文摘要

Universal machine learning force fields (UMLFFs) promise to revolutionize materials science by enabling rapid atomistic simulations across the periodic table. However, their evaluation has been limited to computational benchmarks that may not reflect real-world performance. We introduce UniFFBench, a comprehensive evaluation framework featuring the MinX dataset, a diverse collection of 1,500+ mineral systems spanning 85 elements, extreme thermodynamic conditions (0–5000 K, 0–1000 GPa), and structural complexity, including partial occupancy and disorder. This diversity, combined with experimental reference values for validation, enables assessment of UMLFF generalization across chemical space and conditions substantially beyond typical training scenarios. Our systematic evaluation of six state-of-the-art UMLFFs reveals a substantial "reality gap": models achieving impressive performance on computational benchmarks often fail when confronted with experimental complexity. Even the best-performing models exhibit higher density prediction error than the threshold required for practical applications. We observe disconnects between simulation stability and mechanical property accuracy, with prediction errors correlating with training data representation rather than the modeling method.

2510.08807 2026-06-19 cs.RO cs.LG 版本更新

Humanoid Everyday: A Comprehensive Robotic Dataset for Open-World Humanoid Manipulation

Humanoid Everyday：面向开放世界人形机器人操作的综合机器人数据集

Zhenyu Zhao, Hongyi Jing, Xiawei Liu, Jiageng Mao, Abha Jha, Hanwen Yang, Rong Xue, Sergey Zakharov, Vitor Guizilini, Yue Wang

发表机构 * University of Southern California（南加州大学）； Toyota Research Institute（丰田研究院）

AI总结提出Humanoid Everyday数据集，包含10.3k轨迹、260个任务的多模态数据，用于人形机器人灵巧操作、人机交互和移动操作研究，并配套云评估平台。

详情

AI中文摘要

从运动到灵巧操作，人形机器人在展示复杂的全身能力方面取得了显著进展。然而，当前大多数机器人学习数据集和基准主要关注固定机器人臂，少数现有人形数据集要么局限于固定环境，要么任务多样性有限，通常缺乏人机交互和下肢运动。此外，缺乏用于在人形数据上对基于学习的策略进行基准测试的标准化评估平台。在这项工作中，我们提出了Humanoid Everyday，一个大规模且多样化的人形操作数据集，其特点是涉及灵巧物体操作、人机交互、运动集成动作等广泛的任务多样性。利用高效的人工监督遥操作流水线，Humanoid Everyday聚合了高质量的多模态感官数据，包括RGB、深度、LiDAR和触觉输入，以及自然语言注释，包含10.3k条轨迹和超过300万帧数据，涵盖7个大类共260个任务。此外，我们对数据集上的代表性策略学习方法进行了分析，提供了它们在不同任务类别中的优势和局限性的见解。为了标准化评估，我们引入了一个基于云的评估平台，允许研究人员在我们的受控环境中无缝部署他们的策略并接收性能反馈。通过发布Humanoid Everyday以及我们的策略学习分析和标准化的基于云的评估平台，我们旨在推进通用人形操作的研究，并为现实世界中更有能力和具身化的机器人代理奠定基础。我们的数据集、数据收集代码和云评估网站在我们的项目网站上公开发布。

英文摘要

From locomotion to dexterous manipulation, humanoid robots have made remarkable strides in demonstrating complex full-body capabilities. However, the majority of current robot learning datasets and benchmarks mainly focus on stationary robot arms, and the few existing humanoid datasets are either confined to fixed environments or limited in task diversity, often lacking human-humanoid interaction and lower-body locomotion. Moreover, there are few standardized evaluation platforms for benchmarking learning-based policies on humanoid data. In this work, we present Humanoid Everyday, a large-scale and diverse humanoid manipulation dataset characterized by extensive task variety involving dexterous object manipulation, human-humanoid interaction, locomotion-integrated actions, and more. Leveraging a highly efficient human-supervised teleoperation pipeline, Humanoid Everyday aggregates high-quality multimodal sensory data, including RGB, depth, LiDAR, and tactile inputs, together with natural language annotations, comprising 10.3k trajectories and over 3 million frames of data spanning 260 tasks in 7 broad categories. In addition, we conduct an analysis of representative policy learning methods on our dataset, providing insights into their strengths and limitations across different task categories. For standardized evaluation, we introduce a cloud-based evaluation platform that allows researchers to seamlessly deploy their policies in our controlled setting and receive performance feedback. By releasing Humanoid Everyday along with our policy learning analysis and a standardized cloud-based evaluation platform, we intend to advance research in general-purpose humanoid manipulation and lay the groundwork for more capable and embodied robotic agents in real-world scenarios. Our dataset, data collection code, and cloud evaluation website are made publicly available on our project website.

2603.28387 2026-06-19 cs.AI cs.LG 版本更新

The Scaffold Effect: How Prompt Framing Drives Apparent Multimodal Gains in Clinical VLM Evaluation

脚手架效应：提示框架如何驱动临床VLM评估中的表面多模态增益

Doan Nam Long Vu, Simone Balloccu

发表机构 * Technical University of Darmstadt（达姆施塔特技术大学）

AI总结研究发现，在临床VLM评估中，提示中提及MRI可用性即可解释70-80%的性能提升，与图像数据是否存在无关，这种“脚手架效应”揭示了表面评估无法反映真实多模态推理能力。

详情

AI中文摘要

可信的临床AI要求性能提升反映真实的证据整合而非表面伪影。我们在两个临床神经影像队列FOR2107（情感障碍）和OASIS-3（认知衰退）上评估了12个开源视觉语言模型（VLM）的二分类性能。两个数据集都包含结构MRI数据，但这些数据不携带可靠的个体级诊断信号。在这些条件下，较小的VLM在引入神经影像上下文后F1分数提升高达58%，蒸馏模型变得与规模大一个数量级的模型相当。对比置信度分析显示，仅仅在任务提示中提及MRI可用性就解释了70-80%的转变，与影像数据是否存在无关，这是模态坍塌的一个领域特定实例，我们称之为“脚手架效应”。专家评估揭示了在所有条件下捏造基于神经影像的正当理由，而偏好对齐虽然消除了引用MRI的行为，却使两种条件都退化为随机基线。我们的发现表明，表面评估不足以作为多模态推理的指标，这对VLM在临床环境中的部署有直接影响。

英文摘要

Trustworthy clinical AI requires that performance gains reflect genuine evidence integration rather than surface-level artifacts. We evaluate 12 open-weight vision-language models (VLMs) on binary classification across two clinical neuroimaging cohorts, FOR2107 (affective disorders) and OASIS-3 (cognitive decline). Both datasets come with structural MRI data that carries no reliable individual-level diagnostic signal. Under these conditions, smaller VLMs exhibit gains of up to 58% F1 upon introduction of neuroimaging context, with distilled models becoming competitive with counterparts an order of magnitude larger. A contrastive confidence analysis reveals that merely mentioning MRI availability in the task prompt accounts for 70-80% of this shift, independent of whether imaging data is present, a domain-specific instance of modality collapse we term the scaffold effect. Expert evaluation reveals fabrication of neuroimaging-grounded justifications across all conditions, and preference alignment, while eliminating MRI-referencing behavior, collapses both conditions toward random baseline. Our findings demonstrate that surface evaluations are inadequate indicators of multimodal reasoning, with direct implications for the deployment of VLMs in clinical settings.

2604.13240 2026-06-19 cs.CV cs.LG 版本更新

A High-Resolution Landscape Dataset for Concept-Based XAI With Application to Species Distribution Models

基于概念的可解释AI的高分辨率景观数据集及其在物种分布模型中的应用

Augustin de la Brosse, Damien Garreau, Thomas Houet, Thomas Corpetti

发表机构 * Université Rennes 2, CNRS, Nantes Université, Univ Brest, LETG, UMR 6554（雷恩第二大学、法国国家科学研究中心、南特大学、布雷斯特大学、LETG、UMR 6554）； LTSER Zone Atelier Armorique（Armorique 领域实验室区）； University of Würzburg, Center for Artificial Intelligence and Data Science（维尔茨堡大学人工智能与数据科学中心）

AI总结提出首个基于概念的可解释AI方法用于物种分布模型，利用高分辨率多光谱和LiDAR无人机影像构建景观概念数据集，通过Robust TCAV量化景观概念对模型预测的影响，案例研究验证了方法的有效性。

详情

AI中文摘要

绘制物种空间分布对于保护政策和入侵物种管理至关重要。物种分布模型（SDMs）是完成此任务的主要工具，具有两个目的：实现稳健的预测性能，同时提供关于分布驱动因素的生态见解。然而，深度学习SDMs日益增长的复杂性使得提取这些见解更具挑战性。为了调和这些目标，我们提出了首个基于概念的可解释AI（XAI）在SDMs中的实现。我们利用Robust TCAV（测试与概念激活向量）方法量化景观概念对模型预测的影响。为此，我们提供了一个新的开放获取的景观概念数据集，该数据集源自高分辨率多光谱和LiDAR无人机影像。它包括跨越15个不同景观概念的653个斑块和1,450个随机参考斑块，旨在适用于广泛的物种。我们通过两个水生昆虫（襀翅目和毛翅目）的案例研究，使用两个卷积神经网络和一个视觉Transformer来展示这种方法。结果表明，基于概念的XAI有助于根据专家知识验证SDMs，同时发现产生新生态假说的新颖关联。Robust TCAV还提供了景观层面的信息，对政策制定和土地管理有用。代码和数据集公开可用。

英文摘要

Mapping the spatial distribution of species is essential for conservation policy and invasive species management. Species distribution models (SDMs) are the primary tools for this task, serving two purposes: achieving robust predictive performance while providing ecological insights into the driving factors of distribution. However, the increasing complexity of deep learning SDMs has made extracting these insights more challenging. To reconcile these objectives, we propose the first implementation of concept-based Explainable AI (XAI) for SDMs. We leverage the Robust TCAV (Testing with Concept Activation Vectors) methodology to quantify the influence of landscape concepts on model predictions. To enable this, we provide a new open-access landscape concept dataset derived from high-resolution multispectral and LiDAR drone imagery. It includes 653 patches across 15 distinct landscape concepts and 1,450 random reference patches, designed to suit a wide range of species. We demonstrate this approach through a case study of two aquatic insects, Plecoptera and Trichoptera, using two Convolutional Neural Networks and one Vision Transformer. Results show that concept-based XAI helps validate SDMs against expert knowledge while uncovering novel associations that generate new ecological hypotheses. Robust TCAV also provides landscape-level information, useful for policy-making and land management. Code and datasets are publicly available.

2605.20448 2026-06-19 cs.CV cs.LG 版本更新

逐点是否无意义？基于图神经网络的降水临近预报的多模态消融研究

Ophélia Miralles, Máté Mile, Christoffer Artturi, Thomas Nipen, Ivar Seierstad

发表机构 * Norwegian Meteorological Institute（挪威气象研究所）

AI总结本研究通过多模态图神经网络系统，消融分析雷达、数值预报、地面观测、卫星数据及训练损失对降水临近预报的影响，发现各模态分别改善不同方面，点观测虽提升局部但需结合损失函数和不确定性表示才能优化雷达场。

详情

AI中文摘要

稀疏点观测在降水临近预报中日益可用，但尚不清楚它们能在多大程度上改善密集雷达场预报。我们通过北欧雷达区域的多模态图神经网络临近预报系统部分回答了这个问题。该模型预测未来两小时内每五分钟的降雨率，并采用雷达历史、MEPS数值天气预报、Netatmo地面观测、MSG卫星通道、随机噪声和基于CRPS的集合损失的不同组合进行训练。本研究设计为对操作相关信源和训练目标的消融。我们比较了仅雷达、NWP信息、站点信息、卫星信息、噪声增强和基于CRPS的配置，使用雷达网格、站点位置、降雨起始的互补诊断，以及oracle、位移和幅度评分。结果表明，每个信源改善了预报问题的不同方面。MEPS稳定了仅雷达外推，Netatmo观测改善了局部站点和起始诊断，卫星预测因子减少了某些站点级偏差，但在确定性使用时可能过早激活降雨。基于CRPS的配置提供了最一致的雷达网格增益，而卫星与CRPS的组合设置给出了最佳的整体oracle/DAS评分。这些结果不支持点观测对临近预报无用的结论，但表明局部观测技能和空间相干雷达场技能是不同的目标。实际意义是，稀疏观测可以提供有用的局部约束，但它们对雷达类场的益处取决于训练损失、不确定性表示以及观测支持在模型中的编码方式。

英文摘要

Sparse point observations are increasingly available for precipitation nowcasting, but it is unclear how much they improve dense radar-field forecasts. We partially address this question with a multimodal graph neural network nowcasting system over the Nordic radar domain. The model predicts rain rate every five minutes up to two hours ahead and is trained with different combinations of radar history, MEPS numerical weather prediction, Netatmo surface observations, MSG satellite channels, stochastic noise, and CRPS-based ensemble losses. The study is designed as an ablation of operationally relevant information sources and training objectives. We compare radar-only, NWP-informed, station-informed, satellite-informed, noise-augmented, and CRPS-based configurations using complementary diagnostics on the radar grid, at station locations, for rain onset, and through oracle, displacement, and amplitude scores. The results show that each source improves a different part of the forecast problem. MEPS stabilises radar-only extrapolation, Netatmo observations improve local station and onset diagnostics, and satellite predictors reduce some station-level biases but may activate rain too early when used deterministically. CRPS-based configurations provide the most consistent radar-grid gains, while the combined satellite and CRPS setup gives the best overall oracle/DAS score. These results do not support the conclusion that point observations are uninformative for nowcasting, but they show that local observational skill and spatially coherent radar-field skill are distinct targets. The practical implication is that sparse observations can provide useful local constraints, but their benefit for radar-like fields depends on the training loss, uncertainty representation, and how observation support is encoded in the model.
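
For readers unfamiliar with CRPS-based ensemble losses, the score for a single grid point can be sketched as follows. This uses the standard ensemble estimator of the continuous ranked probability score; the abstract does not say which variant the authors train with (for example, the "fair" estimator divides the spread term differently), so treat the exact form as an assumption. `crps_ensemble` is a hypothetical name.

```python
def crps_ensemble(members, observation):
    """Ensemble CRPS estimate: E|X - y| - 0.5 * E|X - X'|.

    members     : ensemble forecasts of rain rate at one location and lead time
    observation : the verifying value at that location
    """
    m = len(members)
    # Average distance between the ensemble members and the observation.
    error_term = sum(abs(x - observation) for x in members) / m
    # Average distance between pairs of members (rewards honest spread).
    spread_term = sum(abs(a - b) for a in members for b in members) / (m * m)
    return error_term - 0.5 * spread_term
```

With an observation of 2.0, an overconfident ensemble `[3.0, 3.0]` scores 1.0, the same as the absolute error of a deterministic forecast, while the spread ensemble `[1.0, 3.0]` scores 0.5. Lower is better, so the loss rewards ensembles whose spread covers the observation instead of collapsing onto a single wrong value.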

2606.19245 2026-06-19 cs.AI cs.LG 版本更新

TxBench-PP: Analyzing AI Agent Performance on Small-Molecule Preclinical Pharmacology

TxBench-PP：分析AI代理在小分子临床前药理学中的表现

Hannah Le, Ramesh Ramasamy, Alex Urrutia, Mahsa Yazdani, Tim Proctor, Kenny Workman

发表机构 * LatchBio

AI总结提出TxBench-PP基准，用于评估AI代理从真实实验数据中恢复临床前药理学结论的能力，测试显示最强配置Claude Opus 4.8 / Pi仅通过59.3%的端点尝试。

详情

AI中文摘要

人工智能（AI）代理有望通过压缩解释和决策循环来加速药物发现，但实际部署需要基于现实程序决策的可信评估。我们引入了TherapeuticsBench临床前药理学（TxBench-PP），这是一个针对小分子临床前药理学的可验证基准，也是更广泛的TherapeuticsBench在药物发现阶段和治疗模式中的首个聚焦切片。TxBench-PP测试代理是否能够从真实实验数据中恢复准确的结论，而非从文献中记忆的事实。该基准包含100个评估，按程序阶段、实验类型和任务结构索引，涵盖作用机制（MoA）和药效学（PD）推理、化合物-靶点结合、因果靶点验证、可开发性与安全性以及转化疗效。代理接收现实的工作流程快照，在编码环境中检查文件，并返回确定性评分的结构化答案。在16个模型-工具配置（包括11个模型和4,800条轨迹）中，没有系统能够可靠地恢复临床前药理学决策。最强配置Claude Opus 4.8 / Pi通过了59.3%的端点尝试（178/300；95% CI, 51.1-67.6），其次是GPT-5.5 / Pi，为55.3%（166/300；47.0-63.6）。

英文摘要

Artificial intelligence (AI) agents promise to accelerate drug discovery by compressing interpretation and decision-making loops, but practical deployment requires trusted evaluation on realistic program decisions. We introduce TherapeuticsBench Preclinical Pharmacology (TxBench-PP), a verifiable benchmark for small-molecule preclinical pharmacology and the first focused slice of a broader TherapeuticsBench effort across drug-discovery stages and therapeutic modalities. TxBench-PP tests whether agents can recover accurate conclusions from real-world assay data rather than memorized facts from literature. The benchmark contains 100 evaluations indexed by program stage, assay type, and task structure, spanning mechanism-of-action (MoA) and pharmacodynamic (PD) reasoning, compound-target engagement, causal target validation, developability and safety, and translational efficacy. Agents receive realistic workflow snapshots, inspect files in a coding environment, and return structured answers graded deterministically. Across 16 model-harness configurations, comprising 11 models and 4,800 trajectories, no system reliably recovered preclinical pharmacology decisions. The strongest configuration, Claude Opus 4.8 / Pi, passed 59.3% of endpoint attempts (178/300; 95% CI, 51.1-67.6), followed by GPT-5.5 / Pi at 55.3% (166/300; 47.0-63.6).
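
The headline figures can be checked with simple arithmetic: 4,800 trajectories over 16 configurations gives 300 attempts per configuration. The sketch below recomputes the pass rates and, for comparison only, a naive Wald interval that treats all 300 attempts as independent. The abstract does not state how its intervals were computed, so the Wald interval is an assumption used as a reference point, not a reconstruction of the authors' method; `pass_rate_with_wald_ci` is a hypothetical helper.

```python
import math

def pass_rate_with_wald_ci(passed, attempts, z=1.96):
    """Return (rate, lower, upper) using the normal-approximation interval."""
    p = passed / attempts
    half_width = z * math.sqrt(p * (1 - p) / attempts)
    return p, p - half_width, p + half_width

attempts_per_config = 4800 // 16          # 300
best = pass_rate_with_wald_ci(178, 300)   # Claude Opus 4.8 / Pi
second = pass_rate_with_wald_ci(166, 300) # GPT-5.5 / Pi
```

The recomputed rates match the abstract (59.3% and 55.3%). The naive interval for the best configuration is roughly 53.8 to 64.9, narrower than the reported 51.1 to 67.6, which suggests the reported interval was computed differently, for example by accounting for repeated attempts on the same evaluation. That reading is a guess.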

2606.19363 2026-06-19 cs.LG 新提交

When to Trust, How to Distill: Multi-Foundation Model Guidance for Lightweight, Robust Scientific Time Series Forecasting

何时信任，如何蒸馏：面向轻量级鲁棒科学时间序列预测的多基础模型指导

Rupasree Dey, Abdul Matin, Nathan Orwick, Yao Zhang, Shrideep Pallickara, Sangmi Lee Pallickara

发表机构 * Colorado State University（科罗拉多州立大学）

AI总结提出Guard框架，通过上下文路由器和不确定性门控温度机制，从多个分布偏移的基础模型中蒸馏知识，训练轻量级预测器，在气象、碳通量等四个领域降低RMSE。

Comments Accepted at KDD 2026 (AI for Science track). 12 pages in total, including references and appendix

详情

DOI: 10.1145/3770855.3819018

AI中文摘要

时间序列基础模型（TSFMs）在物理科学中的部署受到一个关键权衡的阻碍：虽然这些模型编码了丰富、通用的时间动态，但当零样本应用于特定科学领域时，它们会遭受严重的分布错位，并且其计算成本阻碍了在边缘计算传感器网络中的部署。我们解决了一个基本挑战：如何从错位的基础模型（FM）中提取潜在的结构知识，以训练轻量级、专门的预测器？我们提出了用于蒸馏的门控不确定性感知路由（Guard），这是一个新颖的框架，将多教师蒸馏重新定义为实例级决策过程，具有两种自适应机制：（1）上下文路由器，根据局部输入统计动态选择最相关的教师，利用不同基础模型之间的互补性；（2）不确定性门控温度机制，充当“断路器”，当教师置信度与领域现实偏离时自动减弱蒸馏强度。我们在四个气候关键领域评估了我们提出的轻量级框架：气象学、生态系统碳通量、土壤湿度和能源电网。我们的方法相对于固定权重的多教师蒸馏基线显著降低了RMSE，成功地从预训练的FM（教师）中蒸馏知识，即使由于原始和目标数据域之间的分布偏移，它们表现出次优的零样本准确性。我们证明，这些领域错位的教师仍然可以作为关键的纠正者，在28.5%的最难实例上优于全局优越的FM。最终，这使得适用于资源受限边缘部署的高精度科学预测成为可能。代码可在 https://github.com/RupasreeDey/GUARD-KDD2026 获取。

英文摘要

The deployment of Time-Series Foundation Models (TSFMs) in physical sciences is hindered by a critical trade-off: while these models encode rich, universal temporal dynamics, they suffer from severe distributional misalignment when applied zero-shot to specific scientific domains, and their computational cost prohibits deployment in edge-computing sensor networks. We address a fundamental challenge: How can we extract latent structural knowledge from misaligned foundation models (FMs) to train lightweight, specialized forecasters? We propose Gated Uncertainty-Aware Routing for Distillation (Guard), a novel framework that reframes multi-teacher distillation as an instance-wise decision process with two adaptive mechanisms: (1) a Contextual Router that dynamically selects the most relevant teacher based on local input statistics, exploiting complementarity across diverse foundation models; and (2) an Uncertainty-Gated Temperature mechanism that acts as a "circuit-breaker," automatically attenuating distillation strength when teacher confidence diverges from domain reality. We evaluate our proposed lightweight framework on four climate-critical domains: meteorology, ecosystem carbon flux, soil moisture, and energy grids. Our method significantly reduces RMSE relative to a fixed-weight multi-teacher distillation baseline, successfully distilling knowledge from pretrained FMs (teachers) even when they exhibit suboptimal zero-shot accuracy due to distribution shift between the original and target data domains. We demonstrate that these domain-misaligned teachers can still serve as critical correctives, outperforming the globally superior FMs on 28.5% of the hardest instances. Ultimately, this enables high-precision scientific forecasting suitable for resource-constrained edge deployment. Code is available at https://github.com/RupasreeDey/GUARD-KDD2026.

2606.19371 2026-06-19 cs.LG cs.AI cs.CV 新提交

ProMUSE: Progressive Multi-modal Uncertainty-guided Staged Evidential Alzheimer Disease Classification

ProMUSE: 渐进式多模态不确定性引导的分阶段证据阿尔茨海默病分类

Long Doan, Branden Chen, Ethan Litton, Huan Huang, Jiajing Huang, Yixin Xie, Weihua Zhou, Nandakumar Narayanan, Chen Zhao

发表机构 * Kennesaw State University（肯尼索州立大学）； Michigan Technological University（密歇根理工大学）； University of Iowa（爱荷华大学）

AI总结提出ProMUSE，一种渐进式多模态不确定性引导的分阶段证据网络，通过自适应决定何时需要额外模态，在保持准确性的同时降低数据采集成本。

详情

AI中文摘要

阿尔茨海默病（AD）是一种致命性疾病，会破坏老年人的记忆和认知能力。大多数AD治疗在早期阶段有效，导致对早期AD诊断的需求日益增加。AD诊断越来越依赖多模态数据，如临床评估、结构磁共振成像（MRI）和正电子发射断层扫描（PET）成像。然而，MRI和PET采集仍然昂贵且不易普及，使得全模态推理在现实临床工作流程中不切实际。我们提出ProMUSE，一种渐进式多模态不确定性引导的分阶段证据网络，该网络自适应地确定何时需要额外模态，有助于在保持准确性的同时降低数据采集的总体成本。ProMUSE首先使用低成本临床数据进行证据分类，并通过基于Dirichlet的主观逻辑模型量化不确定性。当不确定性超过学习阈值时，ProMUSE逐步引入MRI或PET特征，通过Dempster-Shafer理论融合模态层面的信念和不确定性，获得校准的多模态预测。这种分阶段采集策略能够在最小化对昂贵成像依赖的同时实现准确诊断。在ADNI、AIBL和OASIS数据集上针对CN-AD、CN-MCI和MCI-AD任务的实验表明，ProMUSE在减少50-90%的MRI/PET使用量的同时，实现了与全模态基线相当或更优的准确性，从而大幅节省成本。这些结果突显了ProMUSE作为现实世界AD筛查中一种实用、不确定性感知且资源高效的解决方案。

英文摘要

Alzheimer's disease (AD) is a fatal disorder that destroys memory and cognitive skills in the elderly population. Most treatments for AD are effective in the early stage, leading to an increasing demand for early AD diagnosis. AD diagnosis increasingly relies on multimodal data such as clinical assessments, structural Magnetic Resonance Imaging (MRI), and Positron Emission Tomography (PET) imaging. However, MRI and PET acquisition remain costly and not universally accessible, making full-modality inference impractical in real-world clinical workflows. We propose ProMUSE, a Progressive Multi-modal Uncertainty Guided Staged Evidential Network that adaptively determines when additional modalities are necessary, helping reduce the overall cost of data acquisition while maintaining accuracy. ProMUSE first performs evidential classification using low-cost clinical data and quantifies uncertainty via a Dirichlet-based subjective logic model. When uncertainty exceeds a learned threshold, ProMUSE progressively incorporates MRI or PET features, fusing modality-wise belief and uncertainty through Dempster-Shafer theory to obtain a calibrated multimodal prediction. This staged acquisition strategy enables accurate diagnosis while minimizing reliance on expensive imaging. Experiments on ADNI, AIBL, and OASIS across CN-AD, CN-MCI, and MCI-AD tasks demonstrate that ProMUSE achieves competitive or superior accuracy compared to full-modality baselines while reducing MRI/PET usage by 50-90%, yielding substantial cost savings. These results highlight ProMUSE as a practical, uncertainty-aware, and resource-efficient solution for real-world AD screening.
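The staged procedure described in this abstract can be illustrated with a minimal sketch. It assumes the standard subjective-logic mapping from Dirichlet evidence to belief and uncertainty, and the reduced Dempster combination rule commonly used in evidential multi-view learning; the paper's exact formulation may differ. The function names, the fixed `threshold` (the paper learns it), and the `acquire_imaging` callback are hypothetical.

```python
def opinion_from_evidence(evidence):
    """Map non-negative per-class evidence to (beliefs, uncertainty).

    Subjective logic with a Dirichlet: alpha_k = e_k + 1, S = sum(alpha),
    b_k = e_k / S and u = K / S, so sum(b) + u == 1.
    """
    k = len(evidence)
    strength = sum(evidence) + k
    beliefs = [e / strength for e in evidence]
    return beliefs, k / strength


def dempster_combine(op1, op2):
    """Reduced Dempster rule for two opinions over the same classes."""
    (b1, u1), (b2, u2) = op1, op2
    # Conflict: belief mass the two sources assign to different classes.
    conflict = sum(b1[i] * b2[j]
                   for i in range(len(b1)) for j in range(len(b2)) if i != j)
    scale = 1.0 / (1.0 - conflict)
    beliefs = [scale * (b1[i] * b2[i] + b1[i] * u2 + b2[i] * u1)
               for i in range(len(b1))]
    return beliefs, scale * u1 * u2


def staged_predict(clinical_evidence, acquire_imaging, threshold):
    """Use clinical evidence first; request imaging only if uncertainty is high."""
    opinion = opinion_from_evidence(clinical_evidence)
    used_imaging = False
    if opinion[1] > threshold:
        # Hypothetical callback returning evidence from an MRI or PET branch.
        imaging_opinion = opinion_from_evidence(acquire_imaging())
        opinion = dempster_combine(opinion, imaging_opinion)
        used_imaging = True
    beliefs, uncertainty = opinion
    return beliefs.index(max(beliefs)), uncertainty, used_imaging
```

In this sketch an uninformative clinical opinion (equal evidence for both classes) triggers imaging, while a confident one never calls `acquire_imaging`, which is where the savings in MRI/PET usage would come from.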


2606.19373 2026-06-19 cs.LG cs.AI 新提交

VERITAS：验证器引导的零样本形式定理证明搜索

Manish Acharya, Zhenyu Liao, Yueke Zhang, Kevin Leach, Yu Huang, Yifan Zhang

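Affiliations: Department of Computer Science, Vanderbilt University; Amazon

AI summary: Proposes VERITAS, a framework that uses verifier feedback for zero-shot theorem proving through a two-phase protocol (Best-of-N sampling followed by critic-guided MCTS); it reaches 40.6% on miniF2F and releases the combinatorics benchmark VERITAS-CombiBench.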

详情

AI中文摘要

基于LLM的形式化证明器通常将丰富的验证器信号（语法错误、类型不匹配、部分目标进展）压缩为二进制的通过/失败位。我们提出VERITAS，一个零样本框架，通过两阶段协议将每个验证器信号路由回证明搜索：首先进行Best-of-N采样，然后进行批评引导的MCTS遍历，该遍历将第一阶段失败作为显式负例吸收。该协议保留其第一阶段扫描解决的每个定理，因此第二阶段额外的解决可归因于反馈驱动的探索。VERITAS在miniF2F上达到40.6%（相比之下，独立运行的Best-of-5为36.9%，Portfolio为26.2%），在VERITAS-CombiBench上达到7.3%，这是一个我们发布的55个定理的组合学基准，在该基准上Best-of-5（1.8%）低于Portfolio（3.6%），暴露了当必须从验证器反馈中迭代恢复正确的引理名称时，无指导的采样会带来损害。工件可在GitHub上获取。

英文摘要

LLM-based formal provers often collapse rich verifier signals (syntax errors, type mismatches, partial goal progress) into a binary pass/fail bit. We present VERITAS, a zero-shot framework that routes every verifier signal back into proof search through a two-phase protocol: Best-of-N sampling first, then a critic-guided MCTS pass that ingests Phase 1 failures as explicit negative examples. The protocol preserves every theorem solved by its own Phase 1 sweep, so Phase 2's additional solves are attributable to feedback-driven exploration. VERITAS reaches 40.6% on miniF2F (vs. an independently run Best-of-5 at 36.9%, Portfolio 26.2%) and 7.3% on VERITAS-CombiBench, a 55-theorem combinatorics benchmark we release on which Best-of-5 (1.8%) falls below Portfolio (3.6%), exposing that unguided sampling hurts when correct lemma names must be recovered iteratively from verifier feedback. Artifacts are available on GitHub.


2606.19412 2026-06-19 cs.LG 新提交

Spectral Retrieval-Augmented Time-Series Forecasting

频谱检索增强的时间序列预测

Huu Hiep Nguyen, Minh Hoang Nguyen, Dung Nguyen, Hung Le

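Affiliations: Applied Artificial Intelligence Initiative; Deakin University

AI summary: Proposes SpecReTF, which converts time series into windowed frequency representations, measures similarity with a metric combining amplitude and phase, and applies an exponential-moving-average weighting scheme, addressing the spectral-blindness and temporal-recency limitations of existing retrieval methods and improving forecasting accuracy on non-stationary time series.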

详情

AI中文摘要

时间序列预测利用历史模式来预测未来值，但传统方法在处理复杂、非平稳模式时面临挑战，这些模式在训练期间难以记忆。检索增强方法通过检索相似历史模式来增强预测，已成为有前景的解决方案。然而，现有检索方法存在两个基本局限性：频谱盲区，即忽略了捕捉潜在周期结构的关键频域特征；以及时间近因，即对所有历史数据一视同仁，而不强调最近、更相关的模式。在本文中，我们提出SpecReTF，一种新颖的检索方法，通过将时间序列转换为窗口化频率表示，并使用结合幅度和相位信息的组合度量来衡量相似性，从而解决这些问题。为了平衡近因和历史上下文，我们应用指数移动平均加权方案，强调最近的窗口。在基准数据集上的大量实验表明，SpecReTF优于时域检索方法，在多样化的非平稳时间序列上实现了卓越的预测准确性。

英文摘要

Time series forecasting leverages historical patterns to predict future values, but traditional methods face challenges when dealing with complex, non-stationary patterns that are difficult to memorize during training. Retrieval-augmented approaches have emerged as promising solutions by retrieving similar historical patterns to enhance predictions. However, existing retrieval methods suffer from two fundamental limitations: spectral blindness, which overlooks critical frequency-domain characteristics that capture underlying periodic structures, and temporal recency, which treats all historical data equally without emphasizing recent, more relevant patterns. In this paper, we propose SpecReTF, a novel retrieval method that addresses these issues by converting time series into windowed frequency representations, measuring similarity with a combined metric that captures both amplitude and phase information. To balance recency and historical context, we apply an exponential moving average weighting scheme that emphasizes recent windows. Extensive experiments on benchmark datasets demonstrate that SpecReTF outperforms time-domain retrieval methods, achieving superior forecasting accuracy across diverse, non-stationary time series.
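A rough sketch of frequency-domain retrieval with recency weighting, in the spirit of the abstract above. The abstract does not give the actual SpecReTF metric or weighting, so everything here is an assumption: the blend of amplitude cosine similarity with an amplitude-weighted phase term, the `phase_weight` and `decay` parameters, and all function names are illustrative.

```python
import cmath
import math


def window_spectrum(window):
    """Plain DFT of one window, keeping the non-negative frequencies."""
    n = len(window)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(window))
            for k in range(n // 2 + 1)]


def spectral_similarity(a, b, phase_weight=0.5):
    """Blend amplitude similarity with amplitude-weighted phase agreement."""
    sa, sb = window_spectrum(a), window_spectrum(b)
    norm = math.sqrt(sum(abs(c) ** 2 for c in sa) * sum(abs(c) ** 2 for c in sb))
    if norm == 0.0:
        return 0.0
    amp_sim = sum(abs(x) * abs(y) for x, y in zip(sa, sb)) / norm
    # Re(x * conj(y)) = |x||y|cos(phase difference), so bins with little
    # shared energy contribute little to the phase term.
    phase_sim = sum((x * y.conjugate()).real for x, y in zip(sa, sb)) / norm
    return (1.0 - phase_weight) * amp_sim + phase_weight * phase_sim


def retrieve(query, history_windows, decay=0.9):
    """Return the index of the best window; history is ordered oldest first."""
    newest = len(history_windows) - 1
    scores = [(decay ** (newest - i)) * spectral_similarity(query, w)
              for i, w in enumerate(history_windows)]
    return max(range(len(scores)), key=scores.__getitem__)
```

With a mild `decay`, an older window that matches the query in both amplitude and phase can still win; with a stronger decay, a recent window that matches only in amplitude is preferred. That is the recency versus history balance the abstract refers to.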


2606.19413 2026-06-19 cs.LG 新提交

Does Text Actually Help? Uncovering and Resolving Text Collapse in Multimodal Time Series Forecasting

文本真的有用吗？揭示并解决多模态时间序列预测中的文本坍缩问题

Huu Hiep Nguyen, Minh Hoang Nguyen, Dung Nguyen, Hung Le

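AI summary: Addresses "text collapse" in multimodal time series forecasting, where the text branch is neglected, with REST-TS: the text branch is supervised exclusively to predict the residual that the numerical backbone cannot explain, forcing it to extract genuine content and achieving state-of-the-art performance.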

详情

AI中文摘要

多模态时间序列预测将数值序列与领域相关的文本报告配对，有望将世界知识注入预测流程。然而，我们揭示了现有框架中的一个关键失败模式，称为文本坍缩：文本分支收敛到与内容无关的变换，无论输入描述如何，都贡献可忽略的判别信号。我们认为文本坍缩是时间序列预测中基本不对称性的结果：数值输入与输出强自相关，使得数值主干天生占主导地位，而文本分支尽管携带互补且通常关键的信息，却未被充分利用，导致其系统性欠利用。为解决此问题，我们提出REST-TS（时间序列中文本的残差独占监督），将不对称性转化为设计原则：数值主干产生其独立的数值预测，而文本分支被独占监督以预测残差的结构化组成部分，即数值无法解释的预测差距。由于没有数值路径可以减少这些损失，文本分支必须从输入描述中提取真实内容。在多样化的现实领域和主干架构上的评估表明，REST-TS实现了最先进的性能，并一致地显示出比现有框架更高的文本分支利用率，提供了强有力的经验证据，表明对文本分支进行残差监督迫使其从输入中提取真实内容。

英文摘要

Multimodal time series forecasting, which pairs numerical sequences with domain-relevant textual reports, promises to inject world knowledge into forecasting pipelines. However, we uncover a critical failure mode in existing frameworks that we term text collapse: the text branch converges to a content-independent transformation, contributing negligible discriminative signal regardless of the input description. We argue that text collapse is a consequence of a fundamental asymmetry in time series forecasting: the numerical input is strongly autocorrelated with the output, making the numerical backbone inherently dominant, while the text branch, despite carrying complementary and often critical information, is insufficiently utilized, leading to its systematic underexploitation. To address this, we propose REST-TS (Residual-Exclusive Supervision for Text in Time Series), which turns the asymmetry into a design principle: the numerical backbone produces its own independent numerical forecast, and the text branch is exclusively supervised to predict the structured components of the residual, the prediction gap that numbers cannot explain. Because no numerical pathway can reduce these losses, the text branch must extract genuine content from the input description. Evaluated across diverse real-world domains and backbone architectures, REST-TS achieves state-of-the-art performance and consistently demonstrates greater text-branch utilization than existing frameworks, providing strong empirical evidence that supervising the text branch on the residual compels it to extract genuine content from the input.


2606.19560 2026-06-19 cs.LG 新提交

Understanding Key Features of Time Series Foundation Models from Epidemic Forecasting

从流行病预测理解时间序列基础模型的关键特征

Alireza Jafari, Judy Fox, Geoffrey C. Fox, Madhav Marathe, Aniruddha Adiga

Affiliations: Department of Computer Science, School of Engineering and Applied Science, University of Virginia; School of Data Science, University of Virginia; Biocomplexity Institute, University of Virginia; Department of Electrical and Computer Engineering, School of Engineering and Applied Science, University of Virginia

AI summary: Systematically evaluates several time series models on influenza forecasting; a mixture-of-experts model performs best, pretraining yields notable gains at longer horizons, and LLM-based methods perform worse.

Comments 15 pages, 2 figures, 9 tables

详情

AI中文摘要

季节性流感每年感染数百万人，并在美国造成大量发病和死亡，因此准确的短期预测成为核心公共卫生需求。可靠的流行病时间序列预测可以为疫苗接种时机、医院人员配备和资源分配提供信息，然而现代预测架构在传染病监测数据上的比较行为仍未得到充分表征。我们通过系统评估区域流感预测来填补这一空白，使用流感样疾病监测和流感相关住院时间序列，在时间泛化和空间泛化设置下进行1-4周提前预测。我们比较了经典神经网络架构、基于数值的Transformer模型、预训练时间序列基础模型和基于LLM的预测方法。在各项任务中，我们证明融合多个预训练预测器的混合专家模型实现了最强的整体性能，表明异质预训练表示提供了互补的预测信息。我们的结果进一步表明，基于数值的Transformer模型产生可靠的预测，而预训练在更长时域上提供最大增益，特别是当预训练领域与流感动力学机制一致时。相比之下，基于LLM的时间序列方法在此设置下表现不如数值预测器。最后，我们研究了住院信息作为辅助协变量和预训练源的作用。住院信号在特定设置中提供了互补的改进，并阐明了额外的监测流如何增强多时域预测的鲁棒性。这些发现为流感防范的模型选择、预训练策略和辅助信号使用提供了可操作的指导。

英文摘要

Seasonal influenza infects millions of people and causes substantial morbidity and mortality in the United States each year, making accurate short-term forecasting a core public-health need. Reliable forecasts of epidemic time series can inform vaccination timing, hospital staffing, and resource allocation, yet the comparative behavior of modern forecasting architectures on infectious-disease surveillance data remains insufficiently characterized. We address this gap through a systematic evaluation of regional influenza forecasting using influenza-like illness surveillance and influenza-associated hospitalization time series under both temporal and spatial generalization settings for 1-4-week-ahead prediction. We compare classical neural network architectures, numerical transformer-based models, pretrained time series foundation models, and LLM-based forecasting approaches. Across tasks, we demonstrate that a mixture-of-experts model that fuses multiple pretrained forecasters achieves the strongest overall performance, indicating that heterogeneous pretrained representations provide complementary predictive information. Our results further show that numerical transformer-based models produce reliable forecasts, while pretraining provides the largest gains at longer horizons, particularly when the pretraining domain is mechanistically aligned with influenza dynamics. In contrast, LLM-based time series methods underperform relative to numerical forecasters in this setting. Finally, we examine hospitalization information as both an auxiliary covariate and a pretraining source. Hospitalization signals provide complementary improvements in selected settings and clarify when additional surveillance streams enhance the robustness of multi-horizon forecasting. These findings provide actionable guidance on model selection, pretraining strategy, and auxiliary-signal use for influenza preparedness.


2606.19562 2026-06-19 cs.LG physics.flu-dyn 新提交

Advances in Scientific Machine Learning for Coupled Fluid Flow and Transport

耦合流体流动与输运的科学机器学习进展

Gabriel F. Barros, Rômulo M. Silva, Alvaro L. G. A. Coutinho

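Affiliations: COPPE, Federal University of Rio de Janeiro (UFRJ)

AI summary: Reviews advances in scientific machine learning for coupled fluid flow and transport problems, including SVD-based linear reduced-order methods and neural network approaches such as PINNs and β-VAEs, and shows applications to turbidity currents and thermal convection.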

详情

AI中文摘要

本章回顾了科学机器学习（SciML）在模拟由不可压缩Navier-Stokes方程和标量输运方程控制的耦合流体流动与输运现象方面的最新进展。这类系统出现在浊流和热对流等应用中，具有强非线性耦合和多尺度行为，使得高保真模拟计算成本高昂。为此，本章调查了构建高效代理模型的最新SciML方法，包括基于奇异值分解的线性降阶技术（如动态模态分解）和非线性神经网络方法（如物理信息神经网络（PINNs）和β-变分自编码器（β-VAEs））。首先介绍了作者将这些模型与高性能计算策略相结合的工作，包括自适应网格细化/粗化（AMR/C）和科学浮点数据压缩。然后提出了两个新贡献：通过PINNs对浊流进行代理建模，以及使用β-VAEs从热流中提取解缠的非线性模态。控制方程和代表性基准（包括锁交换流和Rayleigh-Bénard对流）说明了这些方法。本章篇幅较长，涵盖了耦合流体流动的数学和物理基础以及最先进建模的计算方面。总体而言，它展示了SciML如何在特定数据范围和建模假设下，实现复杂耦合系统的快速、精确近似，同时相对于全阶模拟大幅降低计算成本。实时预测和不确定性量化等更广泛的能力仍然是活跃的研究方向，其可行性在很大程度上取决于具体问题。

英文摘要

This chapter reviews recent advances in Scientific Machine Learning (SciML) for modeling coupled fluid flow and transport phenomena governed by the incompressible Navier-Stokes and scalar transport equations. Such systems, found in applications like turbidity currents and thermal convection, feature strong nonlinear coupling and multiscale behavior that make high-fidelity simulations computationally expensive. To address this, the chapter surveys state-of-the-art SciML methods for building efficient surrogate models, including linear reduced-order techniques based on Singular Value Decomposition (such as Dynamic Mode Decomposition) and nonlinear neural network approaches like Physics-Informed Neural Networks (PINNs) and β-Variational Autoencoders (β-VAEs). It first covers the authors' work combining these models with High Performance Computing strategies, including Adaptive Mesh Refinement/Coarsening (AMR/C) and scientific floating-point data compression. It then presents two new contributions: surrogate modeling of turbidity currents via PINNs, and the extraction of disentangled nonlinear modes from thermal flows using β-VAEs. Governing equations and representative benchmarks, including lock-exchange flows and Rayleigh-Bénard convection, illustrate these methodologies. The chapter is intentionally long, covering both the mathematical and physical foundations of coupled fluid flow and the computational aspects of state-of-the-art modeling. Overall, it demonstrates how SciML enables fast, accurate approximations of complex coupled systems within the specific data regimes and modeling assumptions considered, while substantially reducing computational cost relative to full-order simulations. Broader capabilities such as real-time prediction and uncertainty quantification remain active research directions whose feasibility depends strongly on the problem at hand.


2606.19623 2026-06-19 cs.LG 新提交

SEAGAN: domain-Specific and Edge-Aware Graph Attention Network for Dynamic Plant Processes

SEAGAN：面向动态植物过程的领域特定与边缘感知图注意力网络

Antriksh Srivastava, Soumyashree Kar

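AI summary: Proposes SEAGAN, which casts identification of biochemical limitation states along plant A-Ci curves as a graph node classification problem; graphs are built with distance-based kNN and auxiliary-signal-guided connectivity, and an edge-aware graph attention network improves classification, reaching an F1-score of 0.857.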

详情

AI中文摘要

图神经网络（GNN）为从通过物理、生物或功能关系关联的科学数据中学习提供了灵活框架。一个有前景的领域是植物生理学，其中测量的响应通常来自多个相互作用的过程，即使通过人工干预，这些过程的精确分离仍然困难。在植物生理学中，一个关键例子是A-Ci曲线，它关联净CO2同化速率（Anet）与叶片胞间CO2浓度（Ci），并用于估计叶片和作物冠层模型中的光合参数。然而，可靠估计需要识别每个曲线点处的活跃生化限制状态，这仍然是主要的不确定性来源。在这里，我们将沿A-Ci曲线的限制状态识别表述为基于图的节点分类问题，以曲线点为节点。使用基于距离的k近邻（kNN）和辅助信号引导（ASG）连接创建领域特定的图表示，边属性编码成对关系。该框架与常规学习基线、基于图的架构以及基于自动拟合的基准进行了评估。在具有已知真实限制状态的大型合成数据集上的结果表明，基于图的模型改善了分类，特别是在生化过渡区域附近。最佳配置SEAGAN（面向动态植物过程的领域特定与边缘感知图注意力网络）整合了过程感知节点特征、边属性、kNN连接和带加权交叉熵损失的图注意力，实现了0.857的F1分数和0.882的准确率。结果表明，将A-Ci曲线表示为图改善了生化限制状态分析，而局部kNN邻域上的边缘感知注意力提供了最有效的策略。

英文摘要

Graph neural networks (GNNs) provide a flexible framework for learning from scientific data linked through physical, biological, or functional relationships. One promising domain is plant physiology, where measured responses often arise from multiple interacting processes whose exact separation remains difficult even with manual intervention. In plant physiology, a key example is the A-Ci curve, which relates net CO2 assimilation rate (Anet) to leaf intercellular CO2 concentration (Ci) and is used to estimate photosynthetic parameters in leaf and crop-canopy models. However, reliable estimation requires identifying the active biochemical limitation state at each curve point, which remains a major source of uncertainty. Here, we formulate limitation-state identification along A-Ci curves as a graph-based node classification problem, with curve points as nodes. Domain-specific graph representations are created using distance-based k-nearest-neighbor (kNN) and auxiliary-signal-guided (ASG) connectivity, with edge attributes encoding pairwise relations. The framework was evaluated against conventional learning baselines, graph-based architectures, and an automated fitting-based benchmark. Results on a large synthetic dataset with known ground-truth limitation states show that graph-based models improve classification, particularly near biochemical transition regions. The best-performing configuration, SEAGAN (domain-Specific and Edge-Aware Graph Attention Network for Dynamic Plant Processes), integrates process-aware node features, edge attributes, kNN connectivity, and graph attention with weighted cross-entropy loss, achieving an F1-score of 0.857 and an accuracy of 0.882. The results show that representing A-Ci curves as graphs improves biochemical limitation-state analysis, with edge-aware attention over local kNN neighborhoods providing the most effective strategy.
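As a loose illustration of the graph construction described above, the sketch below turns the points of one A-Ci curve into a kNN graph with per-edge attributes and computes inverse-frequency class weights for a weighted cross-entropy loss. The feature scaling, the specific edge attributes (distance, Ci gap, local slope), and the weighting formula are assumptions made for illustration; the abstract does not specify them.

```python
import math


def build_knn_graph(ci, anet, k=2):
    """Connect each A-Ci point to its k nearest neighbours in scaled (Ci, Anet) space."""
    def scaled(values):
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0
        return [(v - lo) / span for v in values]

    x, y = scaled(ci), scaled(anet)
    edges = {}
    for i in range(len(ci)):
        dists = sorted((math.hypot(x[i] - x[j], y[i] - y[j]), j)
                       for j in range(len(ci)) if j != i)
        for dist, j in dists[:k]:
            # Illustrative edge attributes: distance, Ci gap, local slope dA/dCi.
            d_ci = ci[j] - ci[i]
            slope = (anet[j] - anet[i]) / d_ci if d_ci else 0.0
            edges[(i, j)] = (dist, d_ci, slope)
    return edges


def class_weights(labels):
    """Inverse-frequency weights, one common choice for weighted cross-entropy."""
    counts = {c: labels.count(c) for c in set(labels)}
    return {c: len(labels) / (len(counts) * n) for c, n in counts.items()}
```

The resulting directed edge dictionary and per-node labels could then be handed to any graph attention implementation that accepts edge features.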


2606.20015 2026-06-19 cs.LG 新提交

Adaptive Distance-Aware Trunk Deep Operator Learning for Long-Span Roadway Bridges

自适应距离感知主干深度算子学习用于大跨度公路桥梁

Bilal Ahmed, Diab W. Abueidda, Waleed El-Sekelly, Tarek Abdoun, Mostafa E. Mobasher

Affiliations: Urban Engineering Department, New York University Abu Dhabi, United Arab Emirates; National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, United States of America; Department of Structural Engineering, Mansoura University, Mansoura, Egypt

AI summary: Proposes an adaptive-trunk DeepONet framework that uses KNN-built load-dependent learning domains, distance-aware features, and stiffness-informed Schur complement reconstruction to predict localized responses of long-span bridges quickly and accurately, with relative errors below 5% and roughly 60x faster evaluation.

Comments 39 pages, 26 figures

详情

AI中文摘要

大跨度公路桥梁在车辆荷载下表现出高度局部化的结构响应，使得重复有限元分析在影响面生成和结构数字孪生等应用中计算成本高昂。现有的科学机器学习方法难以准确捕捉这些局部响应。为解决这一挑战，本研究提出了一种自适应主干DeepONet用于大型桥梁系统的局部结构响应预测。该框架利用KNN策略动态构建荷载相关的学习域，使网络聚焦于结构影响区域。主干网络进一步通过距离感知特征增强，这些特征编码了荷载与结构节点之间的几何关系。通过刚度-informed Schur补全公式引入基于物理的全场重建，使得自适应节点上的预测能够扩展到整个结构域。为了实现可扩展训练，使用降阶等效壳模型生成响应数据，该模型保留了主要的全局行为，同时显著降低了计算成本。该框架在基准桥梁模型和真实世界的Mussafah桥上进行了验证。结果表明，该方法实现了有限元级别的精度，相对误差低于5%，同时将总响应评估时间（包括全场重建）减少了约60倍；排除后处理重建步骤，AD-DeepONet推理比有限元快四个数量级。此外，该框架能够在任意车辆荷载配置下快速生成全场响应、影响线和影响面，显示出在大规模桥梁分析和数字孪生应用中的巨大潜力。

英文摘要

Long-span roadway bridges exhibit highly localized structural responses under vehicular loading, making repeated FE analysis computationally expensive for applications such as influence surface generation and structural digital twins. Existing SciML approaches struggle to accurately capture these localized responses. To address this challenge, this study proposes an adaptive-trunk DeepONet for localized structural response prediction in large-scale bridge systems. The framework dynamically constructs a load-dependent learning domain using a KNN strategy, allowing the network to focus on structural influence zones. The trunk network is further enhanced using distance-aware features that encode the geometric relationship between the load and structural nodes. A physics-based full-field reconstruction is incorporated through a stiffness-informed Schur complement formulation, enabling predictions at adaptive nodes to be extended to the entire structural domain. To enable scalable training, response data are generated using a reduced-order equivalent shell model that preserves the dominant global behavior while significantly reducing computational cost. The proposed framework is validated on both a benchmark bridge model and the real-world Mussafah Bridge. Results show that the method achieves FEM-level accuracy with relative errors below 5%, while reducing the total response evaluation time (including full-field reconstruction) by approximately 60x; excluding the post-processing reconstruction step, the AD-DeepONet inference is up to four orders of magnitude faster than FEM. In addition, the framework enables rapid generation of full-field responses, influence lines, and influence surfaces under arbitrary vehicular loading configurations, demonstrating strong potential for large-scale bridge analysis and digital twin applications.
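One way to read the stiffness-informed reconstruction step is as a partitioned linear solve: with displacements at the adaptive nodes (subscript a) predicted by the network, the remaining degrees of freedom (subscript r) follow from K_rr u_r = f_r - K_ra u_a. The sketch below implements that block solve in plain Python under this assumption; it is not taken from the paper, and the helper names are hypothetical.

```python
def solve(matrix, rhs):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(rhs)
    aug = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[row][c] -= factor * aug[col][c]
    x = [0.0] * n
    for row in reversed(range(n)):
        tail = sum(aug[row][c] * x[c] for c in range(row + 1, n))
        x[row] = (aug[row][n] - tail) / aug[row][row]
    return x


def reconstruct_full_field(stiffness, load, known):
    """Recover the unknown DOFs from K_rr u_r = f_r - K_ra u_a.

    `known` maps DOF index -> predicted displacement (the adaptive nodes).
    """
    rest = [i for i in range(len(load)) if i not in known]
    k_rr = [[stiffness[i][j] for j in rest] for i in rest]
    rhs = [load[i] - sum(stiffness[i][j] * u for j, u in known.items())
           for i in rest]
    field = dict(known)
    field.update(zip(rest, solve(k_rr, rhs)))
    return [field[i] for i in range(len(load))]
```

For a three-DOF spring chain with a unit load at the middle node, supplying only the middle displacement recovers the other two. A real bridge model would use sparse matrices and a sparse solver instead of dense elimination.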


2606.20034 2026-06-19 cs.LG 新提交

Exploring the potential of AlphaEarth and TESSERA embeddings for Fine-scale Local Climate Zone Mapping: A case study across five cities in Switzerland

探索AlphaEarth和TESSERA嵌入在精细尺度局地气候区制图中的应用潜力：以瑞士五个城市为例

Htet Yamin Ko Ko, Clement Atzberger

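AI summary: Compares TESSERA and AlphaEarth embeddings with traditional Sentinel-1/2 data, using an attention U-Net to upscale coarse-resolution LCZ maps to 10 m; embedding-based models perform better in cross-city transfer and accuracy, but cross-year transfer remains a challenge.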

详情

AI中文摘要

理解城市空间形态对于气候建模、风险评估和可持续城市设计至关重要，而局地气候区（LCZ）制图为此提供了基本框架。然而，许多城市仍使用约100米分辨率的粗LCZ记录，这并不适用于精细尺度的城市研究。在本研究中，我们将TESSERA（Feng等人，2025）和AlphaEarth（Brown等人，2025）的预计算嵌入与传统的Sentinel-1/2（S1S2）合成数据在瑞士五个城市进行比较，以评估它们是否能够使用基于注意力的U-Net将粗LCZ图提升至10米分辨率。三个实验评估了多城市迁移性、更高分辨率参考数据的影响以及对年际物候变化的时间鲁棒性。我们发现，所有数据集在前两个实验中均取得了强劲性能，测试数据的交并比（IoU）分别在0.59-0.69和0.77-0.82之间。TESSERA在两种设置下均一致优于S1S2和AlphaEarth。正如预期，我们发现基于嵌入的模型从一年迁移到另一年仍然是一个开放的挑战。然而，总体而言，我们的结果表明，来自地球观测基础模型的嵌入在减少耗时预处理和手动特征工程任务方面具有巨大潜力，并能够指导通用的基于深度学习的LCZ制图工作流程。当与简单的位置感知注意力U-Net架构结合时，这些嵌入增强了区域迁移性和可扩展性，支持为全球城市气候应用开发全面且可重复的精细尺度LCZ图。提高参考数据质量仍然是进一步提升精度的最强杠杆。

英文摘要

Understanding urban spatial morphology is critical for climate modeling, risk assessment, and sustainable urban design, and Local Climate Zone (LCZ) mapping provides the basic framework for this. However, many cities still use coarse ~100-m resolution LCZ records, which are unsuitable for fine-scale urban research. In this study, precomputed embeddings from TESSERA (Feng et al., 2025) and AlphaEarth (Brown et al., 2025) are compared to traditional Sentinel-1/2 (S1S2) composites in five Swiss cities to see if they can upscale coarse LCZ maps to 10-m resolution using an attention-based U-Net. Three experiments assess multi-city transferability, the impact of higher-resolution reference data, and temporal robustness to year-to-year phenology changes. We find that all datasets achieve strong performance, with test-data Intersection-over-Union (IoU) of 0.59-0.69 and 0.77-0.82 in the first and second experiments, respectively. TESSERA consistently outperforms both S1S2 and AlphaEarth across both settings. As expected, we find that the transfer of embedding-based models from one year to another remains an open challenge. Overall, however, our results demonstrate the promising potential of embeddings derived from EO foundation models to reduce time-consuming preprocessing and manual feature engineering tasks and to guide a universal deep learning-based LCZ mapping workflow. When combined with a simple location-aware attention U-Net architecture, the embeddings enhance regional transferability and scalability, supporting the development of comprehensive and reproducible fine-scale LCZ maps for global urban climate applications. Improving reference data quality remains the strongest lever for further accuracy gains.
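For readers unfamiliar with the IoU metric quoted above, here is a small sketch of a per-class IoU averaged over classes for two label maps. Whether the study reports a class-mean, a frequency-weighted, or another IoU variant is not stated in the abstract, so the averaging used here is an assumption.

```python
def mean_iou(predicted, reference, classes):
    """Mean per-class Intersection-over-Union over two equally sized label maps."""
    pred = [p for row in predicted for p in row]
    ref = [r for row in reference for r in row]
    scores = []
    for c in classes:
        inter = sum(1 for p, r in zip(pred, ref) if p == c and r == c)
        union = sum(1 for p, r in zip(pred, ref) if p == c or r == c)
        if union:  # skip classes absent from both maps
            scores.append(inter / union)
    return sum(scores) / len(scores) if scores else 0.0
```

On a 2x2 toy map where one of four pixels is mislabelled, the two classes score 1/2 and 2/3, giving a mean of 7/12.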


2606.20037 2026-06-19 cs.LG 新提交

Alzheimer's Disease Diagnosis using a Multimodal Approach with 3D MRI and PET

使用3D MRI和PET的多模态方法诊断阿尔茨海默病

Loukas Ilias, Anthi-Maria Vozinaki, Christos Ntanos, Dimitris Askounis

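Affiliations: DSS Lab, School of ECE, NTUA

AI summary: Proposes a multimodal model for Alzheimer's disease diagnosis that combines 3D convolutional feature extractors with three fusion strategies (concatenation, Gated Multimodal Unit, and gated self-attention) and a sparsely gated Mixture-of-Experts classifier; results on three binary classification tasks support input-adaptive modeling.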

Comments 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

详情

DOI: 10.1109/BIBM66473.2025.11357133

AI中文摘要

阿尔茨海默病（AD）是一种不可逆的神经退行性疾病，也是全球主要的死亡原因之一。早期诊断尤为重要，尤其是在轻度认知障碍（MCI）阶段，及时干预有助于延缓其向AD的进展。神经影像数据，如磁共振成像（MRI）和正电子发射断层扫描（PET），可以通过提供与疾病相关的结构和功能脑变化来帮助早期检测脑部变化。然而，许多多模态模型仍通过静态拼接融合MRI和PET，并对所有受试者应用相同的计算，这限制了其对患者/站点异质性的鲁棒性，并可能浪费计算资源。为解决这些局限性，我们首次研究了将3D卷积特征提取器与三种融合策略（拼接、门控多模态单元（GMU）和门控自注意力）以及一个稀疏门控混合专家（MoE）分类器相结合的方法，该分类器执行输入自适应路由，仅激活每个病例中最具信息量的专家。最后，我们利用Grad-CAM可视化疾病相关区域，确保模型的可解释性。实验在三个二分类任务（NC vs. MCI、MCI vs. AD和NC vs. AD）上进行。结果表明，GMU在NC vs. MCI和NC vs. AD上分别达到80.46%和95.47%的准确率，而门控自注意力在MCI vs. AD上达到82.08%。消融实验表明，移除MoE会持续降低所有任务的准确率。这些发现强调了利用MRI和PET互补性的输入自适应多模态建模在AD诊断中的价值。

英文摘要

Alzheimer's disease (AD) is an irreversible neurodegenerative disorder and a leading cause of death worldwide. Early diagnosis is particularly important at the Mild Cognitive Impairment (MCI) stage, where timely intervention can help slow progression to AD. Neuroimaging data, such as Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) scans, can support early detection by revealing structural and functional brain changes related to the disease. Yet, many multimodal models still fuse MRI and PET with static concatenation and apply identical computation to all subjects, which limits robustness to patient/site heterogeneity and can waste computation. To address these limitations, we present the first study of combining 3D convolutional feature extractors with three fusion strategies - concatenation, Gated Multimodal Unit (GMU), and gated self-attention - and a sparsely gated Mixture-of-Experts (MoE) classifier that performs input-adaptive routing, activating only the most informative experts per case. Finally, we utilize Grad-CAM to visualize disease-related regions, ensuring model interpretability. Experiments are performed across three binary classification tasks (NC vs. MCI, MCI vs. AD, and NC vs. AD). Results show that GMU achieves accuracies of 80.46% (NC vs. MCI) and 95.47% (NC vs. AD), while gated self-attention attains 82.08% on MCI vs. AD. Ablations show that removing the MoE consistently degrades accuracy across all tasks. These findings underscore the value of input-adaptive, multimodal modeling for AD diagnosis by leveraging the complementary nature of MRI and PET.
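To make the GMU fusion strategy concrete, the sketch below implements the usual bimodal Gated Multimodal Unit equations on plain Python lists: each modality gets a tanh projection, and a sigmoid gate computed from both inputs mixes them per hidden unit. The weight shapes, the absence of bias terms, and the variable names are simplifying assumptions; the paper's 3D convolutional extractors and MoE classifier are not modelled.

```python
import math


def matvec(weights, vector):
    """Multiply a row-major weight matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def gated_multimodal_unit(x_mri, x_pet, w_mri, w_pet, w_gate):
    """Bimodal GMU: h = z * tanh(W_mri x_mri) + (1 - z) * tanh(W_pet x_pet)."""
    h_mri = [math.tanh(v) for v in matvec(w_mri, x_mri)]
    h_pet = [math.tanh(v) for v in matvec(w_pet, x_pet)]
    # The gate sees both modalities and decides, per hidden unit, which to trust.
    z = [1.0 / (1.0 + math.exp(-v)) for v in matvec(w_gate, x_mri + x_pet)]
    fused = [g * a + (1.0 - g) * b for g, a, b in zip(z, h_mri, h_pet)]
    return fused, z
```

Unlike static concatenation, the mixing weight `z` depends on the input, so the fused representation can lean on MRI for one subject and on PET for another.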


2606.20053 2026-06-19 cs.LG 新提交

Comparative Study of Neural Surrogate Architectures for Autoregressive Prediction of Internal Battery States

用于电池内部状态自回归预测的神经代理架构比较研究

Gihyun Lee, Thorben Menne, Simon Olma, Jakob Hilgert, Sangyoung Park

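AI summary: Systematically compares four neural network architectures (MLP, ResNet, U-Net, FNO) as autoregressive state-transition operators for predicting the internal states of the lithium-ion battery DFN model; U-Net performs best in accuracy and speed thanks to its multi-scale spatial inductive bias.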

Comments 8 pages, 5 figures

详情

AI中文摘要

Doyle-Fuller-Newman (DFN) 模型以高保真度解析锂离子电池的内部电化学状态。然而，其控制方程的数值求解对于实时部署而言计算成本过高，限制了从单个电池到电池组及车队规模应用的可扩展性。虽然机器学习代理可以通过GPU加速大幅降低推理延迟，但现有大多数方法学习的是特定操作条件下的解近似，而非可泛化的状态演化动力学。本文系统比较了四种神经网络架构（MLP、ResNet、U-Net、FNO），它们被构建为自回归状态转移算子，可预测广泛操作条件下的完整DFN内部状态。为确保受控的架构比较，所有模型在统一框架下训练，采用多步展开和电流条件化，隔离了空间归纳偏置的影响。结果表明，U-Net的多尺度特征层次在300步自回归展开后，所有内部状态变量的平均最终步nRMSE达到3%，同时相比数值求解器实现了5.38倍的加速。这些发现强调了空间归纳偏置是代理性能的关键决定因素，推动了用于下一代电池管理系统和数字孪生的内部状态可观测性代理的发展。

英文摘要

The Doyle-Fuller-Newman (DFN) model resolves internal electrochemical states in lithium-ion batteries with high fidelity. However, the numerical solution of its governing equations is computationally prohibitive for real-time deployment, limiting scalability from individual cells to pack and fleet-scale applications. While machine learning surrogates can substantially reduce inference latency through GPU acceleration, most existing approaches learn solution approximations tied to specific operating conditions rather than learning generalizable state-evolution dynamics. This work presents a systematic comparison of four neural network architectures (MLP, ResNet, U-Net, FNO) formulated as autoregressive state-transition operators that predict full DFN internal states across a wide range of operating conditions. To ensure a controlled architectural comparison, all models are trained under a unified framework using multi-step unrolling and current-conditioning, isolating the impact of spatial inductive bias. Results demonstrate that the U-Net's multi-scale feature hierarchy achieves a mean final-step nRMSE of 3% averaged across all internal state variables after 300-step autoregressive rollouts, while providing a 5.38x speed-up over the numerical solver. These findings highlight spatial inductive bias as a critical determinant of surrogate performance, advancing the development of surrogates for internal state observability for next-generation battery management systems and digital twins.
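The autoregressive, current-conditioned evaluation described above can be sketched generically: a one-step operator is applied repeatedly to its own output, and the error is scored with a normalised RMSE. The `step_fn` callable stands in for any trained surrogate, and normalising by the range of the reference values is an assumption, since the abstract does not define its nRMSE.

```python
import math


def rollout(step_fn, initial_state, currents):
    """Feed each prediction back in as the next input, conditioned on the applied current."""
    states = [initial_state]
    for current in currents:
        states.append(step_fn(states[-1], current))
    return states[1:]


def nrmse(predicted, reference):
    """RMSE normalised by the range of the reference values (assumed convention)."""
    mse = sum((p - r) ** 2 for p, r in zip(predicted, reference)) / len(reference)
    span = max(reference) - min(reference)
    return math.sqrt(mse) / span if span else float("inf")
```

Because each step consumes the previous prediction, errors can compound over a long rollout, which is why a final-step metric after hundreds of steps is a demanding test.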


2606.20055 2026-06-19 cs.LG 新提交

PaAno+: Multiscale Encoding and Cross-Variable Attention for Time Series Anomaly Detection

PaAno+：用于时间序列异常检测的多尺度编码与跨变量注意力

Youji Zhu, Hongbing Wang, Wenchao Liu, Xiaodong Liu, Xiangguang Xiong

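Affiliations: School of Mathematical Sciences, Guizhou Normal University; School of Big Data and Computer Science, Guizhou Normal University

AI summary: Proposes the PaAno+ model, which uses multiscale feature extraction, cross-variable fusion attention, and a patch-window sorting pretext task for lightweight, efficient time series anomaly detection, reaching state-of-the-art results on the TSB-AD benchmark.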

详情

AI中文摘要

时间序列异常检测在工业和医疗监测等关键领域具有重要的实用价值。当前基于Transformer和大模型的检测方法计算开销过大，而现有的轻量级替代方案受限于特征提取不足以及多变量间依赖关系建模不充分。为缓解上述缺陷，本研究在面向补丁的表征学习范式下，开发了一种轻量高效的异常检测模型PaAno。在编码器模块中，使用具有差异化感受野的卷积核构建多尺度特征提取主干，以捕获层次化时间特征；随后通过跨尺度自适应注意力聚合结合残差连接优化，进一步稳定特征表征学习。嵌入跨变量融合注意力模块以显式表征变量间相关性，使模型能够在复杂运行条件下识别异常模式。此外，定制了一种基于时间补丁窗口排序的新型前置任务，以揭示时间序列的内在结构特性，并利用三元组损失优化补丁嵌入空间以增强特征判别性。在TSB-AD基准上的大量实验表明，所提出的PaAno在单变量和多变量任务上均实现了最先进的检测精度，在包括VUS-PR在内的评估指标上相对于原始PaAno取得了显著性能提升。凭借紧凑的网络设计，该模型实现了良好的计算效率，能够在资源受限的终端上部署用于实时异常推理。

英文摘要

Time-series anomaly detection has significant practical value for industrial and medical monitoring, as well as other critical domains. Current Transformer- and large-model-based detection approaches incur excessive computational overhead, while existing lightweight alternatives are constrained by insufficient feature extraction and inadequate modeling of dependencies across multivariate variables. To mitigate the above drawbacks, this study develops a lightweight, efficient anomaly detection model, dubbed PaAno+, within the patch-oriented representation learning paradigm. In the encoder module, a multiscale feature-extraction backbone is constructed using convolutional kernels with differentiated receptive fields to capture hierarchical temporal characteristics; subsequent cross-scale adaptive attention aggregation, combined with residual connection optimization, further stabilizes feature representation learning. A cross-variable fusion attention module is embedded to explicitly characterize inter-variable correlations, empowering the model to identify anomalous patterns amid intricate operational conditions. Moreover, a novel pretext task based on temporal patch-window sorting is customized to uncover intrinsic structural properties of time series, and triplet loss is leveraged to optimize the patch embedding space for enhanced feature discrimination. Extensive experiments on the TSB-AD benchmark demonstrate that the proposed PaAno+ achieves state-of-the-art detection accuracy on both univariate and multivariate tasks, yielding significant performance gains across evaluation metrics, including VUS-PR, relative to the original PaAno. Leveraging a compact network design, the presented model achieves favorable computational efficiency, enabling deployment on resource-limited terminals for real-time anomaly inference.
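Two ingredients named in this abstract, patch extraction and the triplet loss, are easy to show in isolation. The sketch below uses the standard margin-based triplet loss with Euclidean distance; the patch size, stride, margin, and how anchors, positives, and negatives are sampled in the actual model are not given in the abstract and are assumptions here.

```python
import math


def extract_patches(series, size, stride=1):
    """Slide a fixed-size window over a univariate series."""
    return [series[i:i + size] for i in range(0, len(series) - size + 1, stride)]


def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(anchor, positive) - d(anchor, negative) + margin), Euclidean d."""
    gap = math.dist(anchor, positive) - math.dist(anchor, negative)
    return max(0.0, gap + margin)
```

The loss is zero once the positive embedding is closer to the anchor than the negative one by at least `margin`, which pushes embeddings of dissimilar patches apart.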


2606.20172 2026-06-19 cs.LG 新提交

Predicting gestational age at birth in the context of preterm birth from multi-modal fetal MRI

基于多模态胎儿MRI预测早产背景下的出生胎龄

Diego Fajardo-Rojas, Megan Hall, Daniel Cromb, Mary A. Rutherford, Lisa Story, Emma C. Robinson, Jana Hutter

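Affiliations: Leibniz University Hannover

AI summary: Combines multi-modal fetal MRI with a machine learning pipeline (data imputation, feature selection, and regression models) to predict gestational age at birth; evaluated on 333 control and 93 preterm cases, it achieves R² = 0.13, MAE = 2.74 weeks, and an accuracy of 0.77.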

Comments Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2026:013

Journal ref Machine.Learning.for.Biomedical.Imaging. 2026 (2026)

详情

DOI: 10.59275/j.melba.2026-f34b

AI中文摘要

早产与高死亡率和终身发病风险相关。复杂的多因素病因阻碍了准确预测和最佳护理。我们开发并评估了一个包含定制机器学习方法的流程，用于数据插补、特征选择和回归模型，以从333例对照和93例早产病例的综合多模态形态和功能胎儿MRI数据预测出生胎龄。将出生胎龄预测分为足月和早产类别，并报告其准确性、敏感性和特异性。进行了消融研究以进一步验证流程设计。使用分层10折交叉验证评估性能。该流程实现了0.13的R²分数和2.74周的平均绝对误差。在交叉验证中，准确率为0.77，敏感性为0.59，特异性为0.82。流程选择的主要特征包括宫颈长度和基于胎盘T2*值的统计量。快速、运动鲁棒的多模态胎儿MRI技术与机器学习预测的结合使得能够预测出生胎龄。这些信息对任何妊娠都至关重要。据我们所知，早产在文献中仅作为分类问题处理。因此，这项工作提供了概念验证。未来工作将增加队列规模，以允许在早产队列内进行更精细的分层。我们的代码可在以下网址获取：此https URL。

英文摘要

Preterm birth is associated with significant mortality and a risk for lifelong morbidity. The complex multifactorial aetiology hampers accurate prediction and thus optimal care. A pipeline consisting of bespoke machine learning methods for data imputation, feature selection, and regression models to predict gestational age (GA) at birth was developed and evaluated on comprehensive multi-modal morphological and functional fetal MRI data from 333 control cases and 93 preterm birth cases. The GA at birth predictions were classified into term and preterm categories and their accuracy, sensitivity, and specificity were reported. An ablation study was performed to further validate the design of the pipeline. Performance was evaluated using stratified 10-fold cross-validation. The pipeline achieves an R² score of 0.13 and a mean absolute error of 2.74 weeks. It also achieves a 0.77 accuracy, 0.59 sensitivity, and 0.82 specificity across folds. The predominant features selected by the pipeline include cervical length and statistics derived from placental T2* values. The confluence of fast, motion-robust, multi-modal fetal MRI techniques and machine learning allowed the prediction of gestational age at birth. This information is essential for any pregnancy. To the best of our knowledge, preterm birth has only been addressed as a classification problem in the literature. Therefore, this work provides a proof of concept. Future work will increase the cohort size to allow for finer stratification within the preterm birth cohort. Our code is available at https://github.com/dfajardorojas/ml-for-preterm-birth-.
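The evaluation described above, regression metrics plus a term/preterm classification derived from the predicted gestational age, can be sketched as follows. The 37-week cutoff is the conventional clinical definition of preterm birth, but the abstract does not state which cutoff the authors applied, so `preterm_cutoff` is an assumption, as is treating preterm as the positive class.

```python
def evaluate_ga_predictions(predicted_weeks, true_weeks, preterm_cutoff=37.0):
    """Regression error plus term/preterm metrics, with preterm as the positive class."""
    n = len(true_weeks)
    mae = sum(abs(p - t) for p, t in zip(predicted_weeks, true_weeks)) / n
    mean_true = sum(true_weeks) / n
    ss_res = sum((t - p) ** 2 for p, t in zip(predicted_weeks, true_weeks))
    ss_tot = sum((t - mean_true) ** 2 for t in true_weeks)
    pairs = [(p < preterm_cutoff, t < preterm_cutoff)
             for p, t in zip(predicted_weeks, true_weeks)]
    tp = sum(1 for p, t in pairs if p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    return {
        "mae": mae,
        "r2": 1.0 - ss_res / ss_tot if ss_tot else float("nan"),
        "accuracy": (tp + tn) / n,
        "sensitivity": tp / (tp + fn) if tp + fn else float("nan"),
        "specificity": tn / (tn + fp) if tn + fp else float("nan"),
    }
```

A modest R² can coexist with a usable classification accuracy because only predictions near the cutoff change the term/preterm label.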

2606.20174 2026-06-19 cs.LG New submission

Computational Methods and Challenges in Cell-Free DNA Analysis for Multi-Cancer Early Detection

Nicko Starkey, Marcin W. Wojewodzic, Krzysztof Rzecki

Affiliations: AGH University of Krakow; Norwegian Institute of Public Health

AI summary: Reviews computational methods for cfDNA-based multi-cancer early detection published between 2022 and 2025, focusing on how fragmentomic and epigenetic features are extracted; finds that multimodal ensemble approaches have the greatest potential for clinical integration but that standardized evaluation protocols are needed.

Abstract

Cell-free DNA (cfDNA) is a promising avenue for non-invasive multi-cancer early detection (MCED) because it can enable the simultaneous detection of multiple cancers from a single blood draw, with particular sensitivity to cancers that currently lack established screening programs. Here we review the computational methods developed between 2022 and 2025 for cfDNA-based MCED. We focus on how fragmentomic and epigenetic features are extracted and analyzed to detect cancer at early stages. We first briefly outline the biological basis of cfDNA signals, then review classical statistical and machine learning approaches alongside deep learning frameworks, including autoencoder-based models. For each method we discuss biological interpretability, validation strategy, and readiness for clinical integration. Furthermore, we categorize the current challenges as technical, computational, or methodological and outline open problems in the field. This review shows that multimodal ensemble approaches have the strongest promise for clinical integration and the highest readiness. However, for better assessment of future work and side-by-side comparison, standardization of evaluation protocols and of results reporting will be crucial.

2606.20291 2026-06-19 cs.LG cs.CV New submission

Integrating national forest inventory, airborne lidar, and satellite imagery for wall-to-wall mapping of forest structure with computer vision

Luke J. Zachmann, David D. Diaz, Vincent A. Landau, Chelsey Walden-Schreiner, Tony Chang, Nathan E. Rutenbeck, Katharyn A. Duffy, Kiarie Ndegwa, Andreas Gros, Scott Conway, Guy Bayes

Affiliations: Vibrant Planet Public Benefit Corporation

AI summary: Proposes the VibrantForests framework, which combines satellite imagery, lidar-derived samples, and computer vision to map canopy cover, canopy height, biomass, and other forest attributes across the contiguous United States at 10 m resolution, reducing saturation and regression-to-the-mean effects.

Abstract

Remote sensing is increasingly relied upon to deliver actionable science for forest and wildfire risk management across large landscapes. Wall-to-wall, annually updated maps are a persistent need for effective forest management. Many planning systems and data collections combine disparate data sources with different purposes, vintages, and prediction quality, which leads to confounding behavior in operational planning systems. We introduce the VibrantForests framework, developed and applied to map forest attributes and provide a coherent foundation for effective forest and wildfire planning. VibrantForests includes a satellite-based forest structure model trained on lidar-derived samples and applied across the contiguous United States to concurrently generate estimates of canopy cover, canopy height, aboveground live tree biomass, basal area, and quadratic mean diameter at 10-meter resolution. We demonstrate predictive capability spanning the full spectrum of forest conditions ranging from sparse-canopy/low-biomass to dense-canopy/high-biomass. Results show that our model extends the range at which saturation is commonly encountered in comparable passive-sensor models, and reduces regression-to-mean behavior that commonly produces overestimation of forest attributes in small/sparse conditions and underestimation in large/dense conditions. The VibrantForests framework addresses a key limitation in large-area forest and wildfire planning by delivering coherent wall-to-wall estimates of management-relevant attributes at annual cadence and 10m resolution.

2606.20326 2026-06-19 cs.LG physics.comp-ph New submission

Quantum-classical physics-informed Kolmogorov-Arnold networks for PDEs

Xiang Rao, Yuxuan Shen

AI summary: Proposes QCPIKAN, the first quantum-classical physics-informed Kolmogorov-Arnold network, which combines Chebyshev-polynomial KAN layers with parameterized quantum circuits and embeds physical constraints to accelerate high-frequency error convergence to an exponential rate and suppress numerical dispersion; it outperforms existing quantum-classical PINNs in porous-media seepage scenarios.

Abstract

We develop QCPIKAN, the first quantum-classical physics-informed Kolmogorov-Arnold network designed to solve partial differential equations (PDEs). Built upon Chebyshev-polynomial KAN layers and parameterized quantum circuits, this hybrid framework embeds physical constraints into the training loss to enforce physical consistency. Our theoretical investigations grounded in approximation theory prove that this design accelerates high-frequency error convergence to an exponential rate and effectively mitigates numerical dispersion. We validate the framework across three typical seepage scenarios in porous media, including single-phase flow, component transport and two-phase flow. Compared with existing quantum-classical physics-informed neural networks, QCPIKAN achieves superior performance in global prediction accuracy, local error control, dynamic evolution tracking and displacement front localization. This work provides a robust and efficient alternative for solving complex PDEs.
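To make the "Chebyshev-polynomial KAN layer" idea concrete, here is a minimal classical sketch of one learnable edge function written as a Chebyshev series. It is an illustration under assumptions, not the QCPIKAN layer: the `tanh` squashing into [-1, 1] and the names `chebyshev_basis` and `kan_edge` are hypothetical choices, and the quantum-circuit part is not modeled at all.

```python
import math


def chebyshev_basis(x, degree):
    """Return [T_0(x), ..., T_degree(x)] using T_{k+1} = 2x T_k - T_{k-1}."""
    values = [1.0, x]
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: degree + 1]


def kan_edge(x, coeffs):
    """Univariate edge function: a Chebyshev series in the squashed input.

    coeffs are the trainable parameters; tanh keeps the argument in [-1, 1],
    where the Chebyshev polynomials are bounded (an assumed design choice).
    """
    z = math.tanh(x)
    basis = chebyshev_basis(z, len(coeffs) - 1)
    return sum(c * t for c, t in zip(coeffs, basis))


print(chebyshev_basis(0.5, 3))
print(kan_edge(0.0, [1.0, 2.0, 3.0]))
```

In a KAN-style layer, each input-output pair would carry its own coefficient vector, and a layer output would sum the edge functions over its inputs.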

2606.20329 2026-06-19 cs.LG physics.geo-ph New submission

Constrained hybrid modelling to predict microbial dynamics and organic matter turnover in soil systems

Paul Collart, Juergen Gall, Andrea Schnepf, Holger Pagel, Lars Doorenbos

Affiliations: Agrosphere (IBG-3), Forschungszentrum Jülich GmbH; Institute of Crop Science and Resource Conservation, University of Bonn; Institute of Computer Science, University of Bonn; Lamarr Institute for Machine Learning and Artificial Intelligence

AI summary: Proposes the first hybrid modelling framework that uses a neural network to predict process-model parameters from metagenome-inferred functional traits and integrates constraints from ecological theory, effectively predicting microbial dynamics and organic matter turnover.

Comments: Accepted at ICML '26

Abstract

Soil microorganisms control organic matter cycling and largely determine how soil systems can cope with and mitigate climate change and environmental threats. Representing microbial dynamics in process-based soil models is therefore critical to predict carbon cycling in soils, albeit highly challenging to inform from data. One promising approach to improve their parametrisation is the integration of genomic data, yet modelling the complex and unknown relationship between genomes and the processes the microbes are driving is an unsolved problem. In this work, we present the first hybrid modeling framework for deriving biokinetic parameter values of a process-based soil organic matter turnover model from metagenome-inferred functional traits based on DNA sequencing data. Our model predicts biokinetic parameters of the process-based model from genomic trait data with a neural network and integrates constraints from ecological theory and literature to ensure realistic behavior, even of non-observed state variables. We evaluate our method on synthetic genomic trait datasets of varying complexity and on real data, showing that our approach improves performance over multiple baselines and learns the dynamics of unmeasurable components of the process-based model effectively, even for small training datasets.

2606.20359 2026-06-19 cs.LG New submission

Train, Retrieve, or Both? A Four-Arm Head-to-Head for Correct Statutory Citation on the Ontario Residential Tenancies Act

Ali Asaria, Tony Salomone, Deep Gandhi

Affiliations: Transformer Lab

AI summary: Studies how self-represented tenants, landlords, and help-desk staff can be given correct statutory citations; a four-arm experiment comparing fine-tuning, retrieval, and a hybrid finds that the SFT+RAG hybrid scores highest on exact match with no hallucinated citations.

Abstract

Self-represented tenants, landlords, and help-desk staff need to be pointed at the provision of law that actually governs a question, with a correct statutory citation. We study this task on the Ontario Residential Tenancies Act, 2006 (RTA) and its core regulation, asking the operator's question empirically: is fine-tuning enough, or is hybrid retrieval needed? We run a four-arm head-to-head on Qwen2.5-7B-Instruct (base zero-shot, LoRA SFT-only, RAG-only, and an SFT+RAG hybrid), scored on citation exact-match (section+subsection) over a small, human-verification-pending real eval set. The base model cannot cite the RTA and SFT-only mis-recalls sections; retrieval is essential and drives hallucination to zero by construction; and the SFT+RAG hybrid scores highest at 0.481 exact-match with zero hallucinated citations. Its edge comes from SFT making provision selection more robust to the higher-recall candidate sets that hurt zero-shot RAG. Notably, this cheap bge-small hybrid matches or beats a pipeline built on bigger, specialized retrieval models (a larger embedder and a cross-encoder reranker), and a larger/improved training set does not help either: strong statutory-citation performance here does not require specialized retrieval models or more data. The artifact zeroes hallucination and clears the lift-over-base bar but does not reach the aspirational 0.70 exact-match target. All results are on a small, human-verification-pending real eval set and are reported as preliminary.
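The scoring described above (exact match on section plus subsection, and a hallucination count) can be sketched as follows. This is a hypothetical reading of the metric, not the authors' evaluation harness: citations are modeled as `(section, subsection)` string tuples, a "hallucinated" citation is assumed to mean one that does not exist in the statute, and the provision numbers below are placeholders rather than real RTA content.

```python
def score_citations(predictions, gold, known_provisions):
    """Return (exact_match_rate, hallucination_rate) over an eval set.

    predictions, gold: lists of (section, subsection) tuples; a prediction
    may be None when the model produced no citation.
    known_provisions: set of all (section, subsection) pairs in the statute.
    """
    n = len(gold)
    exact = sum(1 for p, g in zip(predictions, gold) if p == g)
    hallucinated = sum(
        1 for p in predictions if p is not None and p not in known_provisions
    )
    return exact / n, hallucinated / n


# Placeholder provisions, for illustration only.
statute = {("83", "1"), ("83", "2"), ("20", "1"), ("20", "2"), ("59", "1")}
gold = [("83", "1"), ("20", "2"), ("59", "1"), ("83", "2")]
preds = [("83", "1"), ("20", "1"), ("999", "1"), None]
print(score_citations(preds, gold, statute))
```

Under this reading, a retrieval-based system that can only cite provisions drawn from the retrieved statute text cannot produce an out-of-statute citation, which is one way to interpret "zero by construction".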

2606.20364 2026-06-19 cs.LG New submission

Judging to Improve: A De-biased VLM-as-3D-Judge Protocol for Single-Image 3D Generation

Ali Asaria, Tony Salomone, Deep Gandhi

Affiliations: Transformer Lab

AI summary: Proposes a de-biased, cross-model VLM-as-3D-judge protocol that takes the judge from ranking to optimization by keeping the training and evaluation judges separate, correcting position bias, and fixing three failure modes; with lightweight adaptation, the methods match a strong base model.

Abstract

A companion study established a de-biased, cross-model VLM-as-3D-judge that reliably ranks single-image-to-3D mesh quality where cheap geometry and CLIP proxies fall short. This paper asks: can that judge's preferences specialize a strong open generator, TRELLIS, on one asset class (furniture), cheaply and without human labels? Taking the judge from ranking to optimization is where the work lives. Pushing a VLM judge into the training and evaluation loop exposes failure modes ranking never triggered, so our contribution is an optimization-grade hardening of the judge: a training judge (Qwen2.5-VL-7B) held distinct from an evaluation judge (InternVL3-8B) to break circularity; position-bias correction; and fixes for three failure modes (image overload, geometry-hiding splat renders, and reference-free judging that rewards clean-but-wrong outputs), with calibration evidence (clear-gap win-rate 0.83-1.0; base-vs-base ~0.5). Using this protocol as an independent evaluator, and working only from public models and data with lightweight parameter-efficient adaptation, we find our methods match the strong base rather than exceed it. Independent base samples carry essentially no learnable preference (0.94 order-flip rate), so signal must be engineered by quality-contrastive construction. Across six adaptation methods, two input regimes, and a severity sweep, the most targeted - conditioner repair under severe degradation - reaches parity (0.50) with the base, while no method clears the >=65% win-rate target. The result is mechanistic: clean inputs saturate the judge, flow-DIT fine-tuning washes out through the sampler, and conditioning repair is the locus that moves geometry. Win-rates are directional at n=8 objects. Matching a strong public-data base with cheap adaptation is itself informative: exceeding it needs more than lightweight PEFT on public data, and the judge protocol is reusable.
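One common way to correct position bias in a pairwise judge is to query it in both presentation orders and keep only verdicts that agree. The sketch below illustrates that scheme and a matching order-flip rate. It is an assumption about how such a correction could work, not the paper's protocol; `judge`, `debiased_preference`, and `order_flip_rate` are hypothetical names, and the toy judges stand in for a VLM call.

```python
def debiased_preference(judge, a, b):
    """Ask the judge twice with the order swapped.

    judge(first, second) returns 0 if it prefers the first item shown and
    1 if it prefers the second. Returns "a", "b", or None when the verdict
    flips with presentation order.
    """
    winner_ab = "a" if judge(a, b) == 0 else "b"  # a shown first
    winner_ba = "b" if judge(b, a) == 0 else "a"  # b shown first
    return winner_ab if winner_ab == winner_ba else None


def order_flip_rate(judge, pairs):
    """Fraction of pairs whose verdict depends on presentation order."""
    verdicts = [debiased_preference(judge, a, b) for a, b in pairs]
    return sum(v is None for v in verdicts) / len(pairs)


# Toy judges: one is purely positional, one compares a quality score.
always_first = lambda first, second: 0
prefers_larger = lambda first, second: 0 if first >= second else 1

pairs = [(3, 1), (1, 4), (2, 2)]
print(order_flip_rate(always_first, pairs))
print(order_flip_rate(prefers_larger, pairs))
```

A flip rate near 1 for a set of pairs would suggest the judge sees no real quality difference between the items, which is consistent with the abstract's remark that independent base samples carry essentially no learnable preference.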

2606.20417 2026-06-19 cs.LG New submission

Neural network surrogates with uncertainty quantification for inverse problems in partial differential equations

Christian Jimenez-Beltran, Aretha L. Teckentrup, Antonio Vergari, Konstantinos C. Zygalakis

AI summary: Proposes DeepGaLA, a neural-network surrogate that provides uncertainty-aware predictions for differential equation solvers and, combined with a delayed-acceptance MCMC diagnostic, enables efficient and reliable Bayesian inversion.

Abstract

Inverse problems for differential equations arise throughout science and engineering, where one seeks to infer unknown model parameters from noisy or incomplete observations. Traditional numerical methods for these problems are often computationally expensive, particularly in Bayesian settings where evaluating the likelihood becomes costly for complex forward models and high-dimensional parameter spaces. To address this challenge, we introduce DeepGaLA, a neural-network surrogate for differential equation solvers that provides uncertainty-aware predictions, reducing overconfident inference when training data are limited. To evaluate the fidelity of the surrogate-induced posterior approximations in practice, we show that a short run of delayed-acceptance Markov chain Monte Carlo can serve as an effective diagnostic. Across a range of numerical experiments, DeepGaLA delivers forward-model approximations with accuracy comparable to established Gaussian-process surrogates, while better maintaining efficiency as parameter dimension grows. Moreover, it can incorporate differential-equation constraints, including in nonlinear settings. Overall, these results indicate that uncertainty-quantified neural surrogates can enable scalable and reliable Bayesian inference for inverse problems in complex systems.
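Delayed-acceptance MCMC screens each proposal with a cheap surrogate posterior and only evaluates the expensive posterior for proposals that pass the screen. The following is a generic one-dimensional sketch with a symmetric Gaussian random-walk proposal, not the DeepGaLA code. Reading the second-stage acceptance rate as a surrogate-fidelity diagnostic is an assumption made here for illustration, and both log-posteriors are toy functions.

```python
import math
import random


def delayed_acceptance_chain(log_post_surrogate, log_post_exact, x0,
                             n_steps, step=1.0, seed=0):
    """Two-stage Metropolis sampler targeting exp(log_post_exact).

    Returns (samples, stage2_rate), where stage2_rate is the share of
    surrogate-approved proposals that the exact model also accepts.
    """
    rng = random.Random(seed)
    x, samples = x0, []
    promoted = accepted = 0
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)  # symmetric proposal
        # Stage 1: accept or reject using only the surrogate.
        log_a1 = log_post_surrogate(y) - log_post_surrogate(x)
        if rng.random() < math.exp(min(0.0, log_a1)):
            promoted += 1
            # Stage 2: correct for surrogate error with the exact posterior.
            log_a2 = (log_post_exact(y) - log_post_exact(x)) - log_a1
            if rng.random() < math.exp(min(0.0, log_a2)):
                x = y
                accepted += 1
        samples.append(x)
    stage2_rate = accepted / promoted if promoted else float("nan")
    return samples, stage2_rate


exact = lambda x: -0.5 * x * x                       # standard normal target
rough = lambda x: -0.5 * ((x - 0.5) / 1.3) ** 2      # deliberately biased surrogate
samples, rate = delayed_acceptance_chain(rough, exact, 0.0, 20000)
print(sum(samples) / len(samples), rate)
```

When the surrogate equals the exact posterior, the second stage accepts every promoted proposal; a rate well below 1 signals a mismatch between the two.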

2606.20467 2026-06-19 cs.LG cs.NA math.NA physics.comp-ph New submission

Agentic Symbolic Search: Characterizing PDEs Beyond Hand-crafted Expressions, Meshes, and Neural Networks

Zongmin Yu, Liu Yang

Affiliations: National University of Singapore

AI summary: Proposes the ASYS framework, in which an agent translates PDE theory into differentiable symbolic programs and combines evolutionary search with gradient-based optimization to automatically discover analytical forms or approximations, producing interpretable representations across several problems.

Abstract

Mathematicians understand a PDE solution through mathematical structures rather than tables of computed values. Historically, this has been the product of mathematical analysis, carried out by hand for each problem individually. Neither numerical simulation nor neural networks produce those structures directly. We propose Agentic Symbolic Search (ASYS), a prior-guided framework in which an agent translates PDE theory, public problem constraints, and accumulated search experience into testable differentiable symbolic programs. The mathematical forms are refined under evolutionary search, while their continuous parameters are fit by gradient-based optimization. This makes the search an automated form of inductive-bias injection rather than blind symbolic regression. For problems with known analytical forms, ASYS recovers these forms naturally; for other problems, ASYS constructs analytical approximations which can guide mathematicians toward further analysis. In our experiments, across five problems spanning bounded dynamics, finite-time blow-up, and free-boundary focusing, ASYS produces interpretable representations, including a geometric interface formula for Allen-Cahn 2D dynamics and a nine-parameter contraction law for Keller-Segel chemotactic blow-up, in settings where no closed-form description was previously available. ASYS shows the possibility of a new paradigm for characterizing PDE solutions, beyond handcrafted analytical solutions, mesh-based numerical solutions, and neural network approximations.

2606.17054 2026-06-19 cs.RO cs.AI cs.CV cs.LG Cross-list

Human Universal Grasping

Kevin Yuanbo Wu, Tianxing Zhou, Isaac Tu, Billy Yan, Irmak Guzey, David Fouhey, Dandan Shan, Lerrel Pinto

Affiliations: New York University; Tsinghua University; University of Michigan

AI summary: Proposes the HUG model, which uses human grasp data (the 1M-HUGs dataset) and flow matching to generate diverse grasp poses from a single RGB-D image and retargets them to robot hands for zero-shot grasping, outperforming baselines by 23% and 34% on HUG-Bench objects.

Comments: 28 pages, 20 figures, 7 tables

Abstract

Humans can grasp objects effortlessly, whereas multi-fingered robots are far from this level of generality. We argue that the most natural source of robot grasping data is from humans, who pick up thousands of objects every day. We present HUG, a flow-matching model that generates diverse human grasps for any user-specified object in a single RGB-D image captured from a stereo camera. Using smart glasses, we first collect 1M-HUGs, an egocentric dataset of human grasps spanning 1M frames (27.8 hrs) and 6,707 object instances across 41 buildings. Next, to model the distribution of natural human grasps, our novel flow-matching model fuses RGB and depth observations to output a grasp parameterized by wrist translation, wrist rotation, and MANO hand pose. Predicted grasps can be retargeted to various robot hands, enabling zero-shot grasping in everyday scenes. To standardize evaluation, we build a new simulated benchmark, HUG-Bench, of 90 unseen objects from five geometric categories and various sizes, with metric-scale 3D meshes. We evaluate HUG in the real world on the 30-object test set of HUG-Bench across multiple stereo cameras, robot embodiments, and household environments. HUG outperforms the state-of-the-art grasping baselines by +23% and +34% on our challenging object set. Code, data, benchmark, checkpoints, and an interactive demo are released on our website: https://grasping.io/

2606.19372 2026-06-19 eess.IV cs.CV cs.LG Cross-list

Full-Self Diagnostics (FSD): Physics-Grounded Visual Biomarker Inference from Smartphone Video via Inverse Problems and Operator Learning

Jonathan Thomas, Harsh Thaker

AI summary: Proposes the Full-Self Diagnostics (FSD) framework, which combines a physics-based forward model, information-theoretic observability, a regularized inverse problem, operator learning, and stochastic variational inference to recover physiological state from 9-second facial videos; validated on 38,812 scans from 59 subjects, with a glucose MARD of 29.86%.

Comments: 38,812 paired scans, preliminary longitudinal validation of multichannel visual glucose inference (MARD 17 to 46 percent across cohorts); physics plus information theory plus operator learning framework

Abstract

We present Full-Self Diagnostics (FSD), a unified mathematical framework for recovering latent physiological states from unconstrained 9-second facial videos captured by consumer smartphones. The approach integrates five mutually reinforcing components: (1) a physics-based forward model derived from the radiative transfer equation and chromophore absorption that maps camera observables to biomarker concentrations; (2) an information-theoretic observability theory proving that multi-channel visual signals (spectral, pulse, respiratory, micro-expression, and oculomotor) contain strictly increasing mutual information with physiological state; (3) a stable, Tikhonov-regularized inverse problem with domain-uniform identifiability guarantees; (4) an operator-learning formulation that enables generalization across devices, resolutions, and populations; and (5) a supervised learning procedure, interpretable as stochastic variational inference, that continuously refines the model from paired biosensor ground truth with performance improving proportionally to one over the square root of the number of paired observations. Empirical validation on 38812 real-world paired scans across 59 subjects demonstrates practical performance. Self-collected data from the lead author (glucose range 35-550 mg/dL) yields MARD of 29.86 percent with 97.57 percent of predictions in Clarke Error Grid Zones A+B and only 0.27 percent in the dangerous Zone E. A well-managed diabetic participant achieves MARD of 17 percent in the narrower 70-180 mg/dL band. These results confirm that consumer-grade facial video encodes sufficient structured information for clinically relevant, non-invasive biomarker inference under fully unconstrained conditions, with performance scaling predictably as more paired data becomes available.
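The headline accuracy figure is MARD, the mean absolute relative difference between predicted and reference glucose values. A minimal sketch of the standard definition follows; the function name is hypothetical and the readings are made-up numbers in mg/dL, not data from the study.

```python
def mard_percent(predicted, reference):
    """Mean absolute relative difference, in percent.

    Each error is normalized by its reference value, so low reference
    readings weigh more heavily per mg/dL of error.
    """
    relative_errors = [abs(p - r) / r for p, r in zip(predicted, reference)]
    return 100.0 * sum(relative_errors) / len(relative_errors)


# Made-up paired readings (mg/dL).
reference = [100.0, 100.0, 250.0]
predicted = [110.0, 90.0, 200.0]
print(mard_percent(predicted, reference))
```

Because the error is relative to the reference, the same model can show a different MARD on a narrow 70-180 mg/dL band than on a wide 35-550 mg/dL range, which helps when comparing the two cohorts quoted in the abstract.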

2606.19380 2026-06-19 cs.SE cs.LG Cross-list

AgentArmor: A Framework, Evaluation, & Mitigation of Coding Agent Failures

Kenneth Ge, Andre Assis

AI summary: Proposes the AgentArmor framework, which uses an extended system prompt, a command classifier, a three-strikes policy, and related mechanisms to mitigate coding-agent failures caused by underspecification, capability errors, and harness errors, significantly improving safety.

Abstract

Software engineering and deployment are increasingly being delegated to AI coding agents. The scale of their adoption is surfacing rare, but highly destructive, failure modes. In this paper, we study these failure modes as stemming from three distinct mechanisms: underspecification, where default model behavior is unsafe; capability errors, where the safe action is available but the model does not adhere to it due to bias or capability limitations; and agent harness errors, where the model fails to execute the safe action through the harness. We evaluate these across 8 different evaluations, each inspired by real-life deployment failures, totaling 20 coding environments and 59 synthetic transcript templates. Based on this evaluation, we propose AgentArmor, an agent harness modification, to mitigate these errors. By adding an extended system prompt, a separate command classifier, a "3 strikes" policy, deterministic guardrails, and tools for the agent to edit its own context, we show that AgentArmor is safer across a statistically significant number of samples. Thus, we suggest concrete mitigations for current coding agents and a design philosophy for future agent harness features.

2606.19501 2026-06-19 cs.AI cs.CL cs.LG q-fin.RM Cross-list

DeXposure-Claw: An Agentic System for DeFi Risk Supervision

Aijie Shu, Bowei Chen, Wenbin Wu, Cathy Yi-Hsuan Chen, Fengxiang He

Affiliations: University of Edinburgh; University of Glasgow; University of Cambridge

AI summary: To address the tendency of LLM agents to raise false alarms in DeFi supervision, proposes DeXposure-Claw, which forecasts exposure networks with a graph time-series foundation model and combines deterministic monitors with confidence gates to produce auditable supervisory tickets; also builds DeXposure-Bench, a six-axis evaluation benchmark, with experiments supporting the system's effectiveness.

Abstract

Decentralized finance exposes supervisors to fast-moving, networked credit risks. General-purpose LLM agents fit this setting poorly: they over-read weak evidence and recommend high-stakes interventions, while existing evaluations offer no regulator-aligned way to measure the resulting false alarms. We introduce DeXposure-Claw, a forecast-grounded agentic supervision system that routes LLM decisions through structured evidence: (1) DeXposure-FM, a graph time-series foundation model, forecasts future exposure networks; (2) deterministic monitors and stress scenarios then turn those forecasts into typed alerts, attribution signals, and scenario evidence; and (3) data-health and confidence gates constrain escalation before DeXposure-Claw emits auditable supervisory tickets with rationales. We further develop DeXposure-Bench, a six-axis evaluation harness, whose decision axis scores tickets against a regulator-aligned absolute-loss ground truth and an explicit false-intervention rate. Experiments on five years of weekly real data fully support our system. Code is at https://github.com/EVIEHub/DeXposure-Claw.

2606.19627 2026-06-19 cs.IR cs.AI cs.LG Cross-list

VCG: A Multimodal Retrieval Framework for E-Commerce Video Feeds under Extreme Cold-Start Conditions

Katya Mirylenka, Egor Malykh, Mahdyar Ravanbakhsh, Michael Gygli, Marco-Andrea Buchmann, Andrew Dzhoha, Svitlana Borzenko, Francesca Catino, Mohamed Gaafar, Maarten Versteegh, Thomas Kober, Dario d'Andrea, Ellie Langhans

AI summary: To address extreme cold-start and bias problems in e-commerce video feeds, proposes VCG, a scalable multimodal retrieval system built on a domain-adapted vision-language model (CLIP) that enables zero-shot retrieval; online tests show a 50% uplift in deep video completion.

Abstract

The digital commerce landscape is shifting from static, search-driven catalogs to dynamic, immersive video feeds. This transition introduces an "extreme cold-start" problem: unlike traditional items, new short-form videos lack the dense interaction history required for collaborative filtering. Furthermore, immersive feeds introduce strong position and duration biases that distort standard engagement signals. In this paper, we demonstrate the Video Candidate Generation (VCG) system, a scalable multimodal retrieval engine designed to solve these challenges in a large-scale e-commerce environment. By leveraging a domain-adapted vision-language model (based on CLIP), we map users and videos into a shared semantic space, enabling zero-shot retrieval based on visual content rather than behavioral history. We detail the system's architecture and present a rigorous evaluation comparing generative (LLM) vs. discriminative (CLIP) embeddings. Our results show that while generative models excel at attribute prediction, they suffer from embedding space collapse in retrieval tasks. Online A/B testing demonstrates that VCG effectively mitigates engagement biases, yielding a 50% uplift in deep video completion. To showcase the system's capabilities, we present an interactive demonstration featuring three bi-directional retrieval scenarios: Product-to-Video, Video-to-Product, and Zero-Shot Semantic Search.
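Retrieval in a shared semantic space usually reduces to nearest-neighbour search over embeddings. The sketch below ranks videos by cosine similarity to a query embedding. It is a toy illustration, not the VCG system: the two-dimensional vectors, the video ids, and the function names are hypothetical, and a production system would use a learned encoder and an approximate nearest-neighbour index.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def retrieve_top_k(query_embedding, video_embeddings, k):
    """Return the ids of the k videos most similar to the query."""
    ranked = sorted(
        video_embeddings.items(),
        key=lambda item: cosine_similarity(query_embedding, item[1]),
        reverse=True,
    )
    return [video_id for video_id, _ in ranked[:k]]


# Toy embeddings; a brand-new video needs no interaction history to be ranked.
videos = {"v1": [1.0, 0.0], "v2": [0.0, 1.0], "v3": [1.0, 1.0]}
print(retrieve_top_k([1.0, 0.2], videos, k=2))
```

Because the ranking depends only on content embeddings, a video uploaded seconds ago is scored the same way as an established one, which is the property that addresses the cold-start problem described above.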

2606.19629 2026-06-19 cs.SD cs.AI cs.LG Cross-list

RIVET: Robust Idempotent Voice Attribute Editing

Dareen Alharthi, Bhuvan Koduru, Rita Singh, Bhiksha Raj

Affiliations: Carnegie Mellon University

AI summary: Proposes the RIVET training framework, which uses idempotency regularization to make voice attribute editing models more robust to label noise, outperforming standard training on data with both synthetic and naturally occurring label noise.

Abstract

Voice attribute editing models modify characteristics such as age and gender while preserving speaker identity. In large-scale speech datasets, however, attribute annotations are often noisy or inconsistent, which can cause conditional generative models to produce unstable edits. In this work, we show that idempotency provides an effective mechanism for improving robustness to noisy labels. An idempotent operator is one for which repeated application does not change the result, i.e., f(f(x)) = f(x). Enforcing this property acts as an implicit regularizer that reduces sensitivity to mislabeled examples. We introduce RIVET, a training framework that incorporates an idempotency objective to improve robustness to label noise. We evaluate RIVET under controlled label noise and on the GLOBE dataset with naturally noisy annotations. RIVET improves editing success and better preserves speaker identity than standard training, showing that idempotency improves robustness in voice editing models.
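The idempotency condition f(f(x)) = f(x) can be turned into a penalty by measuring how far a second application of the operator moves its own output. The sketch below shows that generic penalty on toy vector operators. It is not the RIVET objective, whose exact form the abstract does not give; all names are hypothetical, and a real editing model would also take the target attribute as input.

```python
def idempotency_penalty(f, batch):
    """Mean squared gap between f(f(x)) and f(x) over a batch of vectors."""
    total, count = 0.0, 0
    for x in batch:
        once = f(x)
        twice = f(once)
        total += sum((u - v) ** 2 for u, v in zip(twice, once))
        count += len(once)
    return total / count


def clip_unit(x):
    """Clamp every entry to [-1, 1]; applying it twice changes nothing."""
    return [max(-1.0, min(1.0, v)) for v in x]


def halve(x):
    """Scale by 0.5; every further application keeps shrinking the vector."""
    return [0.5 * v for v in x]


print(idempotency_penalty(clip_unit, [[2.0, -3.0, 0.5]]))
print(idempotency_penalty(halve, [[2.0, 4.0]]))
```

In training, such a term would typically be added to the main editing loss with a weight, so that re-editing an already edited sample toward the same attribute leaves it unchanged.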

2606.19699 2026-06-19 cs.RO cs.LG cs.SY eess.SY Cross-list

Comparative Study on Agility, Efficiency, and Impact Absorption of Bipedal Robots with Active Toes

Joong-Gil Kim, Wontae Ye, Geunwoo Cho, Seong-Ho Yun, Se-Hyoung Cho, Yong-Jae Kim

发表机构 * School of Electrical, Electronics and Communication Engineering, Korea University of Technology and Education（韩国技术教育大学电气、电子与通信工程学院）； Artificial Intelligence and Robotics Institute, Korea Institute of Science and Technology（韩国科学技术研究院人工智能与机器人研究所）； Robot Innovation Hub, WIRobotics Inc.（WIRobotics公司机器人创新中心）

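AI summary: Proposes a 14-DOF biped robot that emulates the lightweight, high-torque, robust nature of human toes and compares configurations with and without active toes in a high-fidelity simulation training environment. At 1.33 m/s walking, the toe-equipped robot reduces CoT by 17.5% and heel-strike impact force by 5.0%; average and maximum path deviation fall by 25.0% and 34.0%, respectively.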

Comments 6 pages, 7 figures

详情

AI中文摘要

人类腿部表现出高效率、敏捷性和冲击吸收能力，其中脚趾在这些能力中起着关键作用。尽管已经有许多尝试在机器人中实现类似人类的脚趾，但它们尚未完全复制人类特征，也没有严格验证其益处。我们提出了一种14自由度的双足机器人，模拟人类脚趾的轻量、高扭矩、坚固特性。为了定量分析主动脚趾在敏捷性、效率和冲击吸收方面的有效性，我们开发了一个高保真仿真训练环境，该环境反映了具有耦合传动和精确功耗的实际执行器。为了确保有和没有主动脚趾的配置之间的公平比较，我们设计了一个最小化强化学习奖励函数，并对两者应用了相同的训练程序。仿真结果表明，在1.33米/秒行走时，与无脚趾配置相比，配备脚趾的机器人将CoT降低了17.5%，脚跟冲击力降低了5.0%。在敏捷性测试中，平均和最大路径偏差分别降低了25.0%和34.0%。

英文摘要

Human legs exhibit high efficiency, agility, and impact absorption, with toes playing a crucial role in these capabilities. While many attempts have been made to implement human-like toes in robots, they have not fully replicated human characteristics nor rigorously validated their benefits. We propose a 14-DOF biped robot emulating human toes' lightweight, high-torque, robust nature. To quantitatively analyze the effectiveness of the active toes in terms of agility, efficiency, and impact absorption, we developed a high-fidelity simulation training environment that reflects actual actuators with coupled transmissions and accurate power consumption. To ensure a fair comparison between configurations with and without active toes, we designed a minimal RL reward function and applied an identical training procedure to both. The simulation results indicate that, at 1.33 m/s walking, the toe-equipped robot reduced CoT by 17.5% and heel-strike GRF by 5.0% compared with the toe-ablation configuration. On the agility test, average and maximum path deviation decreased by 25.0% and 34.0%, respectively.
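
The abstract reports CoT without defining it. The sketch below assumes the standard dimensionless cost of transport, CoT = P / (m g v), and shows how a relative reduction is computed. The mass and power values are hypothetical placeholders chosen for illustration, not figures from the paper; only the 1.33 m/s speed comes from the abstract.

```python
G = 9.81  # gravitational acceleration in m/s^2


def cost_of_transport(power_w, mass_kg, speed_m_s):
    """Dimensionless cost of transport: power divided by weight times speed."""
    return power_w / (mass_kg * G * speed_m_s)


def relative_reduction(baseline, improved):
    """Fractional decrease of `improved` with respect to `baseline`."""
    return (baseline - improved) / baseline


# Hypothetical numbers: both configurations are assumed to have the same mass.
mass_kg = 30.0
speed_m_s = 1.33
cot_without_toes = cost_of_transport(400.0, mass_kg, speed_m_s)
cot_with_toes = cost_of_transport(330.0, mass_kg, speed_m_s)

print(f"{100 * relative_reduction(cot_without_toes, cot_with_toes):.1f}%")  # 17.5%
```

With equal mass and speed, a relative CoT reduction equals the relative reduction in average power, which is why the placeholder powers 400 W and 330 W reproduce the 17.5% figure.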

2606.19711 2026-06-19 cs.RO cs.LG cs.SY eess.SY 交叉投稿

A Differentiable Composite Approximation Framework for Autonomous Underwater Vehicle Maneuvering Modeling from Sea-Trial Data

一种可微复合近似框架：基于海试数据的自主水下航行器机动建模

Aobo Wang, Aifei Xia, Zihao Wang, Lizhu Hao

发表机构 * College of Shipbuilding Engineering, Harbin Engineering University（哈尔滨工程大学船舶工程学院）； China Academy of Aerospace Aerodynamics（中国航天空气动力技术研究院）； Institute of Artificial Intelligence, Shanghai University（上海大学人工智能研究院）； China Ship Scientific Research Center（中国船舶科学研究中心）

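AI summary: Proposes a differentiable composite-approximation framework that jointly calibrates a polynomial basis and a data-adaptive basis, and adds turning-motion-based ocean-current estimation and compensation, to improve AUV maneuvering prediction accuracy.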

详情

AI中文摘要

基于机载测量的现场建模可以生成反映真实运行特性的自主水下航行器（AUV）机动模型。从近似角度看，传统机动模型使用预定义的约束多项式基，而数据驱动模型使用数据自适应基。受此基函数视角启发，本文提出一种可微复合近似表述，其中多项式基分量和数据自适应基分量被视为单个预测器的可微部分并联合校准。本文开发了一种基于梯度的协同校准方法用于全尺寸AUV机动预测，其中灵敏度感知机制调节有界多项式更新，而神经残差在共享预测目标下捕获剩余非线性差异。为了考虑现场数据中的海流效应，引入了一种基于转向运动的海流估计与补偿程序，以构建经海流补偿的学习目标，用于训练和滚动预测。该框架使用从7米长AUV在多种机动条件下收集的海试数据进行评估。结果表明，与纯多项式、纯神经网络和冻结先验混合基线相比，所提方法改进了递归轨迹和速度预测，证明了其在基于现场数据的AUV机动建模中的适用性。

英文摘要

Field-based modeling from onboard measurements can produce autonomous underwater vehicle (AUV) maneuvering models that reflect real operating characteristics. From an approximation perspective, conventional maneuvering models use predefined constraint polynomial bases, whereas data-driven models use data-adaptive bases. Motivated by this basis-function view, this paper presents a differentiable composite-approximation formulation, in which the polynomial-basis component and the data-adaptive basis component are treated as differentiable parts of a single predictor and calibrated jointly. A gradient-based co-calibration method is developed for full-scale AUV maneuvering prediction, where a sensitivity-aware mechanism regulates bounded polynomial updates while the neural residual captures remaining nonlinear discrepancies under a shared prediction objective. To account for ocean-current effects in field data, a turning-motion-based current estimation and compensation procedure is incorporated to construct current-compensated learning targets for training and rollout. The framework is evaluated using sea-trial data collected from a 7-meter AUV under multiple maneuvering conditions. Results show that the proposed method improves recursive trajectory and velocity prediction compared with polynomial-only, neural-only, and frozen-prior hybrid baselines, demonstrating its applicability to field-data-based AUV maneuvering modeling.

2606.19793 2026-06-19 eess.AS cs.AI cs.LG cs.SD eess.SP 交叉投稿

Systematic Study of Dysarthric Speech Recognition: Spectral Features and Acoustic Models

构音障碍语音识别的系统研究：频谱特征与声学模型

Paban Sapkota, Hemant Kumar Kathania, Mikko Kurimo, Sudarsana Reddy Kadiri, Shrikanth Narayanan

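AI summary: Systematically studies combinations of spectral features and acoustic models; by adding pitch features and tuning the number of overlapping frames between training chunks, the F-TDNN model achieves relative improvements of 4.65% on isolated-word recognition and 4.63% on sentence recognition.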

详情

AI中文摘要

识别构音障碍语音的挑战主要源于发音精度受损导致的显著声学变异性。过去的研究表明，通过使用混合DNN/HMM序列区分性训练可以改善识别性能。本文对不同声学模型定制的各种声学特征组合进行了全面研究，为每种模型提供了合适的特征选择。音高特征的引入显著提高了识别性能，特别是对于涉及构音障碍语音的句子识别任务。通过对TORGO数据库的系统检查，我们证明了增强最先进的因子化时延神经网络（F-TDNN）模型识别构音障碍语音性能的潜力。使用F-TDNN模型实现的方法，与先前研究相比，在构音障碍语音的孤立词识别中获得了4.65%的相对改进，在句子识别中获得了4.63%的相对改进。这种改进有效补偿了语音变异性，这归因于我们精心选择了连续训练样本块之间的重叠帧数。

英文摘要

The challenge of recognizing dysarthric speech arises primarily from pronounced acoustic variability caused by impaired articulatory precision. Past research has demonstrated improved recognition through the use of hybrid DNN/HMM sequence-discriminative training. This paper presents a comprehensive investigation of combinations of acoustic features tailored to different acoustic models, offering suitable feature selections for each. Incorporating pitch features notably improved recognition performance, especially for sentence recognition tasks involving dysarthric speech. Through a systematic examination of the TORGO database, we demonstrate the potential to enhance the performance of the state-of-the-art Factorized Time Delay Neural Network (F-TDNN) model for recognizing dysarthric speech. Our methods, implemented with the F-TDNN model, yield a 4.65% relative improvement in isolated word recognition and a 4.63% relative improvement in sentence recognition for dysarthric speech compared with previous research. We attribute this improvement, which effectively compensates for speech variability, to our deliberate selection of the number of overlapping frames between consecutive training example chunks.

2606.19812 2026-06-19 cs.AI cs.LG 交叉投稿

Human-on-the-Loop Orchestration for AI-Assisted Legal Discovery

AI辅助法律发现中的人机协同编排

Anushree Sinha, Srivaths Ranganathan, Abhishek Dharmaratnakar, Debanshu Das

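AI summary: To address the legal risk posed by multi-step reasoning errors of AI agents in e-discovery, proposes a four-layer verification architecture; Human-on-the-Loop escalation thresholds reduce privilege-waiver risk by up to 61%.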

详情

AI中文摘要

自主大语言模型（LLM）代理越来越多地部署于电子发现（e-discovery），其中跨多步推理链的复合错误可能构成法律渎职。与单轮检索不同，在特权文档语料库上运行的代理工作流表现出我们称之为“轨迹崩溃”的一类失败：早期错误分类无声传播，导致整个特权审查失效。本文做出三项贡献。首先，我们提出一个按功能阶段组织的法律信息检索中代理失败的结构化分类法。其次，我们引入一个四层验证架构——涵盖规划、推理、执行和不确定性量化——旨在这些失败复合之前拦截它们。第三，我们在一个合成电子取证语料库上进行初步模拟研究，展示强制性人机协同（HOTL）升级阈值如何相对于完全自主基线降低特权豁免风险。我们的结果表明，与完全自主部署相比，校准的不确定性阈值可将特权豁免风险降低高达61%，同时将不到四分之一的文档路由给律师审查。

英文摘要

Autonomous Large Language Model (LLM) agents are increasingly deployed in electronic discovery (e-discovery), where compounding errors across multi-step reasoning chains can constitute legal malpractice. Unlike single-turn retrieval, agentic workflows operating over privileged document corpora exhibit a class of failure we term "trajectory collapse": an early misclassification silently propagates, rendering an entire privilege review invalid. This paper makes three contributions. First, we propose a structured taxonomy of agentic failures in legal information retrieval, organized by functional stage. Second, we introduce a four-layer verification architecture -- spanning planning, reasoning, execution, and uncertainty quantification -- designed to intercept these failures before they compound. Third, we present a preliminary simulation study on a synthetic e-discovery corpus that demonstrates how mandatory Human-on-the-Loop (HOTL) escalation thresholds reduce privilege-waiver risk relative to fully autonomous baselines. Our results suggest that calibrated uncertainty thresholds can reduce privilege-waiver risk by up to 61% versus fully autonomous deployment, while routing fewer than one quarter of documents to attorney review.
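
The escalation idea in the abstract can be illustrated with a threshold router. This is only a sketch under stated assumptions: the abstract does not specify how uncertainty is scored or how thresholds are calibrated, so the confidence values, the 0.8 threshold, and the function name `route_documents` are all hypothetical.

```python
def route_documents(scored_docs, threshold):
    """Split documents into auto-handled and attorney-escalated lists.

    scored_docs: iterable of (doc_id, confidence) pairs with confidence in [0, 1].
    A document is escalated when the agent's confidence falls below the threshold.
    """
    auto, escalated = [], []
    for doc_id, confidence in scored_docs:
        if confidence >= threshold:
            auto.append(doc_id)
        else:
            escalated.append(doc_id)
    return auto, escalated


# Hypothetical privilege-classification confidences for five documents.
docs = [("d1", 0.97), ("d2", 0.91), ("d3", 0.55), ("d4", 0.88), ("d5", 0.99)]
auto, escalated = route_documents(docs, threshold=0.8)

print(escalated)                   # ['d3']
print(len(escalated) / len(docs))  # 0.2
```

Raising the threshold sends more documents to attorney review and lowers the chance that an early misclassification propagates unchecked; the abstract's "fewer than one quarter" figure describes that trade-off on the authors' synthetic corpus.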

2606.19821 2026-06-19 cs.AI cs.LG 交叉投稿

TelcoAgent: A Scalable 5G Multi-KPM Forecasting With 3GPP-Grounded Explainability

TelcoAgent: 一种可扩展的5G多KPM预测与3GPP基础可解释性

Geon Kim, Dara Ron, Sukhdeep Singh, Suyog Moogi, Pranshav Gajjar, V V N K Someswara Rao Koduri, Een Kee Hong, Vijay K. Shah

发表机构 * NextG Wireless Lab, North Carolina State University（北卡罗来纳州立大学下一代无线实验室）； Kyung Hee University（庆熙大学）

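AI summary: Proposes TelcoAgent, a framework that uses foundation models for zero-shot forecasting of multiple KPMs and provides actionable diagnostics through a 3GPP knowledge graph and an explainability pipeline.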

Comments 6 pages, 6 figures. Submitted to IEEE GLOBECOM 2026

详情

AI中文摘要

关键性能测量（KPM）预测对于5G及下一代电信网络的主动网络管理至关重要。然而，现有的机器学习（ML）方法在可扩展性和可解释性方面存在显著局限性，限制了其在实际部署中的有效性。我们提出TelcoAgent，一个基于基础模型的框架，能够在不需站点特定训练的情况下，跨不同网络单元实现多个KPM的准确、可扩展和可解释预测。具体而言，该框架包含三个关键组件：(i) 一个自动化的三智能体管道，直接从规范文档构建第三代合作伙伴计划（3GPP）知识图谱；(ii) 一个可扩展的基于时间序列基础模型（TSFM）的预测管道，以提供准确的零样本预测；以及(iii) 一个推理和解释管道，提供可操作的、领域基础的诊断。使用来自美国网络运营商的三个月真实城市级5G KPM数据集进行评估，TelcoAgent在200个单元中针对每个单元的7个KPM均展示了高预测准确性，同时提供了可解释的见解和可操作的指令来解决网络退化问题。

英文摘要

Key Performance Measurement (KPM) forecasting is essential for proactive network management of 5G and next-generation telecom networks. However, existing machine learning (ML) approaches face significant limitations in scalability and explainability, restricting their effectiveness in real-world deployments. We propose TelcoAgent, a foundation model-based framework that enables accurate, scalable, and explainable forecasting of multiple KPMs across diverse network cells without the need for site-specific training. Specifically, the framework comprises three key components: (i) an automated three-agent pipeline that constructs a 3rd Generation Partnership Project (3GPP) knowledge graph directly from specification documents, (ii) a scalable, time-series foundation model (TSFM)-based prediction pipeline to deliver accurate, zero-shot forecasting, and finally (iii) a reasoning and explanation pipeline that provides actionable, domain-grounded diagnostics. Evaluated using a 3-month, real-world, city-scale 5G KPM dataset from a U.S.-based network operator, TelcoAgent demonstrates high forecasting accuracy for all 7 considered KPMs per cell across 200 cells, while delivering explainable insights and actionable instructions to address network degradations.

2606.19823 2026-06-19 eess.AS cs.LG 交叉投稿

Low-Burden Data Augmentation for Dysarthric ASR via Zero-Shot Voice Cloning

低负担数据增强：通过零样本语音克隆改善构音障碍语音识别

Satwinder Singh, Qianli Wang, Zihan Zhong, Clarion Mendes, Hasegawa-Johnson, Waleed Abdulla, Seyed Reza Shahamiri

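AI summary: To address the scarcity and high variability of dysarthric speech data, proposes generating synthetic data with zero-shot voice cloning (Higgs Audio V2) to fine-tune Whisper-medium; on TORGO this reaches a word error rate close to that of fine-tuning on real data while substantially lowering data collection cost.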

Comments Accepted to Interspeech 2026, Sydney, Australia

详情

AI中文摘要

由于数据稀缺和说话人之间高度变异，自动语音识别对于构音障碍语音仍然不可靠。虽然合成数据可以弥补这些不足，但传统方法通常需要大量的说话人特定数据，重新引入了数据收集瓶颈。我们研究零样本语音克隆作为一种低负担的增强策略，使用Higgs Audio V2克隆TORGO数据集中的说话人。我们在克隆数据、真实数据和混合数据上微调Whisper-medium，并在保留的真实语音上进行评估。与零样本基线（31.62%）相比，克隆数据微调实现了具有竞争力的26.00%词错误率，几乎与真实数据微调（24.44%）和混合数据微调（25.12%）相当。值得注意的是，对于中重度构音障碍说话人，克隆和混合微调优于真实数据微调。在SAP-1102上的跨语料库评估中，克隆微调取得了最佳结果（相对提升11.45%）。这些结果表明，零样本克隆提供了可扩展的训练数据，绕过了昂贵的数据收集瓶颈。

英文摘要

Automatic speech recognition remains unreliable for dysarthric speech due to data scarcity and high inter-speaker variability. While synthetic data can address these gaps, traditional methods often require extensive speaker-specific data, reintroducing the collection bottleneck. We investigate zero-shot voice cloning as a low-burden augmentation strategy, using Higgs Audio V2 to clone speakers in the TORGO dataset. We fine-tune (FT) Whisper-medium on cloned, real, and hybrid data and evaluate on held-out real speech. Compared to the zero-shot baseline (31.62% WER), Clone FT achieved a competitive 26.00% WER, nearly matching the 24.44% and 25.12% seen with Real and Hybrid FT, respectively. Notably, Clone and Hybrid FT outperform Real FT for moderate-severe speakers. Clone FT achieves the best results (11.45% relative improvement) in cross-corpus evaluation on SAP-1102. These results suggest that zero-shot cloning provides scalable training data that circumvents the costly data collection bottleneck.
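
The WER figures above can be put on a common scale by converting them to relative reductions against the 31.62% zero-shot baseline. The snippet below is a small worked calculation using only the four TORGO numbers quoted in the abstract; the function name `relative_wer_reduction` is hypothetical, and the 11.45% cross-corpus figure cannot be reproduced this way because its baseline is not stated.

```python
def relative_wer_reduction(baseline_wer, system_wer):
    """Relative WER reduction in percent with respect to a baseline WER."""
    return 100.0 * (baseline_wer - system_wer) / baseline_wer


zero_shot = 31.62  # zero-shot WER (%) on TORGO, from the abstract
systems = [("Clone FT", 26.00), ("Real FT", 24.44), ("Hybrid FT", 25.12)]

for name, wer in systems:
    print(f"{name}: {relative_wer_reduction(zero_shot, wer):.2f}% relative")
# Clone FT: 17.77% relative
# Real FT: 22.71% relative
# Hybrid FT: 20.56% relative
```

On this scale, fine-tuning on cloned data recovers roughly three quarters of the relative gain obtained from real data (17.77 against 22.71 points).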

2606.19852 2026-06-19 cs.CL cs.LG 交叉投稿

Prompt, Plan, Extract: Zero-Shot Agentic LLMs Workflows for Lung Pathology Extraction from Clinical Narratives

提示、规划、提取：用于从临床叙述中提取肺部病理学的零样本智能体LLM工作流

Aman Pathak, Cheng Peng, Mengxian Lyu, Ziyi Chen, Reema Solan, Sankalp Talankar, Yasir Khan, Hiren Mehta, Aokun Chen, Yi Guo, Yonghui Wu

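AI summary: Proposes a zero-shot agentic workflow that uses open-source large language models to extract 13 CAP fields from lung resection pathology reports, reaching a Micro-F1 of 0.893 without training, close to the supervised approach.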

Comments 7 pages, 2 figures, 3 tables. Affiliations: (1) Department of Health Outcomes and Biomedical Informatics, College of Medicine, University of Florida, Gainesville, FL, USA; (2) Division of Pulmonary, Critical Care and Sleep Medicine, Department of Medicine, College of Medicine, University of Florida, Gainesville, FL, USA; (3) College of Nursing, Florida State University, Tallahassee, FL, USA

详情

AI中文摘要

从病理报告中提取信息对于癌症分期和肿瘤登记数据填报至关重要。然而关键数据仍嵌入在叙述性报告中，使得手动提取劳动密集且易出错。传统的监督自然语言处理流程通过完全监督的命名实体识别和关系提取来解决这一问题，但需要昂贵的人工标注，并且当上游实体缺失时会出现级联故障。在本研究中，我们开发了一个零样本智能体工作流，并评估了五个开源生成式大语言模型（LLMs），以从肺切除病理报告中填充13个美国病理学家学会（CAP）的概要字段。我们使用一种新颖的、与肿瘤登记对齐的评估框架，将它们与最先进的监督GatorTron NER-RE基线进行比较。基线达到了0.960的Micro-F1，而最佳零样本模型（GPT-OSS-20B）达到了0.893的Micro-F1（召回率：0.949），在没有任务特定训练的情况下准确提取了复杂关系（如病理分期）。这些结果表明，开源零样本智能体LLMs是提取肺部病理信息的低成本解决方案。

英文摘要

Information extraction from pathology reports is essential for cancer staging and tumor registry population. Yet key data remains embedded in narrative reports, making manual extraction labor-intensive and error-prone. Traditional supervised Natural Language Processing pipelines address this through fully supervised Named Entity Recognition and Relation Extraction, but require expensive manual annotation and suffer cascading failures when upstream entities are missed. In this study, we developed a zero-shot agentic workflow and evaluated five open-source generative Large Language Models (LLMs) to populate 13 College of American Pathologists synoptic fields from lung resection pathology reports. We compared them against a state-of-the-art supervised GatorTron NER-RE baseline using a novel, registry-aligned evaluation framework. The baseline achieved a Micro-F1 of 0.960, while the best zero-shot model (GPT-OSS-20B) achieved a Micro-F1 of 0.893 (recall: 0.949), accurately extracting complex relations like Pathologic Stage without task-specific training. These results suggest that open-source, zero-shot agentic LLMs are a low-cost solution for extracting lung pathology information.

2606.19895 2026-06-19 math.NA cs.LG cs.NA 交叉投稿

A fast direct solver based neural network for solving PDEs

基于快速直接求解器的神经网络求解偏微分方程

Jashwanth Reddy Kadaru, Vaishnavi Gujjula

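AI summary: Proposes a neural network that learns the inverse operation of HODLR matrices and extends it to nonlinear PDE solution operators; experiments on several PDEs show that it is efficient and generalizes well.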

Comments 26 pages, 7 Figures, 5 Tables

详情

AI中文摘要

大规模$N$体问题产生的矩阵可以使用层次矩阵高效表示，其关键思想是允许跨矩阵分区层次结构的可接受非对角子矩阵可以通过低秩矩阵很好地近似。HODLR（层次非对角低秩）矩阵是层次矩阵的一个子类，其中递归二分划分的每一级的所有非对角子矩阵都是低秩的。本文提出一种神经网络，基于Ambikasaran和Darve（2013）开发的HODLR矩阵快速直接求解器，学习HODLR矩阵的逆运算。我们进一步通过将部分线性层替换为深度子网络，扩展该架构以学习与PDE相关的非线性解算子。我们通过进行一组全面的实验来展示所提出架构的性能，包括（i）求解线性问题，如第二类Fredholm积分方程，（ii）求解PDE，如非线性薛定谔方程、Burgers方程和稳态达西流方程，（iii）跨不同参数值的泛化研究，（iv）将所提出网络的推理时间与经典数值求解器的运行时间进行比较，以及（v）将所提出网络与一些现有的神经算子学习网络进行比较。

英文摘要

The matrices arising from large scale $N$-body problems can be efficiently represented using hierarchical matrices, whose key idea is that the admissible off-diagonal sub-matrices can be well approximated by low-rank matrices across a hierarchy of matrix partitions. HODLR (Hierarchical Off-Diagonal Low-Rank) matrices are a subclass of hierarchical matrices in which all off-diagonal submatrices at every level of a recursive binary partition are low-rank. In this article, we present a neural network that learns the inverse operation of HODLR matrices based on the fast direct solver for HODLR matrices developed by Ambikasaran and Darve (2013). We further extend the architecture to learn nonlinear solution operators associated with PDEs by replacing some of the linear layers with deep sub-networks. We demonstrate the performance of the proposed architecture by performing a comprehensive set of experiments that include (i) solving a linear problem such as the Fredholm integral equation of the second kind, (ii) solving PDEs such as the nonlinear Schrödinger equation, Burgers' equation, and the steady-state Darcy's flow equation, (iii) generalization study across varying parameter values, (iv) comparing the inference time of the proposed network with the run time of a classical numerical solver, and (v) comparing the proposed network with some of the existing neural operator learning networks.

2606.19912 2026-06-19 math.NA cs.LG cs.NA physics.comp-ph 交叉投稿

AI经济学家代理：一种基于模型的经济分析代理框架，结合RAG、知识图谱和大语言模型

Masahiro Kato

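AI summary: Proposes a RAG-based AI economist agent framework that uses knowledge graphs and large language models for economic scenario analysis; agents plan the analysis, retrieve evidence, select models, and generate reports, improving the coherence and traceability of economic narratives.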

详情

AI中文摘要

我们提出了一种基于模型的RAG型AI经济学家，具有用于经济情景分析的代理框架，使用大语言模型（LLMs）和知识图谱。虽然LLMs可以生成流畅的经济叙事，但经济学家通常需要做出基于经济理论和现实数据的经济主张。基于这一动机，本研究提出了一种基于RAG的AI经济学家，它利用包含经济数据和理论的知识图谱以及基于LLM的代理来规划分析、检索相关证据、选择合适的模型并生成报告。在我们的框架中，我们不直接仅使用语言模型产生定量主张；相反，我们生成基于显式模型计算的叙事，并通过AI代理与检索到的证据相关联。我们将我们的框架称为AI经济学家代理。我们在两个应用中评估了AI经济学家代理：为美国通胀持续性和美联储政策生成经济学家报告，以及为美国商业房地产再融资压力生成银行压力测试叙事。结果说明了让生成的报告立足于模型计算和检索到的证据如何提高其经济连贯性和可追溯性。

英文摘要

We propose a model-grounded RAG-based AI economist with an agentic framework for economic scenario analysis using large language models (LLMs) and knowledge graphs. While LLMs can generate fluent economic narratives, economists are often required to make economic claims grounded in economic theory and real-world data. Based on this motivation, this study proposes a RAG-based AI economist, which utilizes knowledge graphs containing economic data and theory, together with LLM-based agents, to plan the analysis, retrieve relevant evidence, select appropriate models, and generate reports. In our framework, we do not produce quantitative claims directly with the language model alone; instead, we generate narratives grounded in explicit model-based computations and linked to the retrieved evidence via AI agents. We refer to our framework as an AI economist agent. We evaluate the AI economist agent in two applications: economist report generation for U.S. inflation persistence and Federal Reserve policy, and bank stress-test narrative generation for U.S. commercial real estate refinancing stress. The results illustrate how grounding the generated reports improves their economic coherence and traceability.

2606.20118 2026-06-19 cs.RO cs.LG 交叉投稿

MedRLM：用于长上下文临床推理、传感器引导筛查、证据支持决策及社区到三级转诊优化的递归多模态健康智能

Aueaphum Aueawatthanaphisut

发表机构 * School of Information, Computer, and Communication Technology, Sirindhorn International Institute of Technology, Thammasat University, Pathum Thani, Thailand

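AI summary: Proposes MedRLM, a recursive multimodal health intelligence framework that recursively inspects, decomposes, retrieves, verifies, and synthesizes patient information, coordinates multiple specialized agents, and introduces a Clinical Evidence Graph Memory to support long-context clinical reasoning and sensor-guided screening.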

Comments 9 pages, 3 figures, 3 tables, 1 Algorithm, 29 equations

详情

AI中文摘要

现实世界的临床决策支持需要对异质性和纵向的患者信息进行推理，而不是回答孤立的医学问题。然而，当前的医学大语言模型和检索增强生成系统通常依赖单步提示或检索，当临床证据分布在长电子健康记录、医学图像、传感器流、指南和转诊约束中时，这可能变得脆弱。本文提出MedRLM，一个用于长上下文临床推理、传感器引导筛查和社区到三级转诊支持的递归多模态健康智能框架。MedRLM不是将所有患者信息压缩到一个提示中，而是将患者病例视为一个外部临床环境，可以递归地检查、分解、检索、验证和综合。该框架协调了专门用于临床文本、纵向EHR、医学影像、生理传感器信号、指南检索、不确定性审计和转诊规划的代理。它进一步引入了临床证据图记忆，将患者特定的观察结果与检索到的证据、标准化定义、传感器衍生的生物标志物和转诊标准连接起来。传感器引导的递归触发机制在检测到异常生理或行为模式时激活更深层次的推理，而不确定性门控细化支持临床医生对高风险或低置信度病例的审查。我们还概述了一个使用公共和经认证的临床数据集（涵盖EHR、放射学、ECG、ICU时间序列和转诊代理结果）的真实数据评估设计。MedRLM旨在将医学AI从静态问答转向可审计、多模态和流程感知的临床决策支持。

英文摘要

Real-world clinical decision support requires reasoning over heterogeneous and longitudinal patient information rather than answering isolated medical questions. However, current medical large language models and retrieval-augmented generation systems often rely on single-step prompting or retrieval, which can be fragile when clinical evidence is distributed across long electronic health records, medical images, sensor streams, guidelines, and referral constraints. This paper proposes MedRLM, a Recursive Multimodal Health Intelligence framework for long-context clinical reasoning, sensor-guided screening, and community-to-tertiary referral support. Instead of compressing all patient information into one prompt, MedRLM treats the patient case as an external clinical environment that can be recursively inspected, decomposed, retrieved, verified, and synthesized. The framework coordinates specialized agents for clinical text, longitudinal EHR, medical imaging, physiological sensor signals, guideline retrieval, uncertainty auditing, and referral planning. It further introduces a Clinical Evidence Graph Memory to connect patient-specific observations with retrieved evidence, standardized definitions, sensor-derived biomarkers, and referral criteria. A sensor-guided recursive triggering mechanism activates deeper reasoning when abnormal physiological or behavioral patterns are detected, while uncertainty-gated refinement supports clinician review for high-risk or low-confidence cases. We also outline a real-data evaluation design using public and credentialed clinical datasets spanning EHR, radiology, ECG, ICU time series, and referral-proxy outcomes. MedRLM aims to move medical AI from static question answering toward auditable, multimodal, and workflow-aware clinical decision support.

2606.20437 2026-06-19 hep-ex cs.LG 交叉投稿

HEPTv2: End-to-End Efficient Point Transformer for Charged Particle Reconstruction

HEPTv2：用于带电粒子重建的端到端高效点变换器

Siqi Miao, Shitij Govil, Jack P. Rodgers, Mia Liu, Javier Duarte, Shih-Chieh Hsu, Yuan-Tang Chou, Pan Li

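AI summary: Proposes HEPTv2, an end-to-end point transformer architecture that uses locality-sensitive-hashing encoding and sectorized decoding to reconstruct particle tracks directly from detector hits without graph construction, achieving 98.6% tracking efficiency at a 0.8% fake rate on TrackML with a latency of only 15 ms.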

详情

AI中文摘要

Q-Net：基于卡尔曼神经网络的队列长度估计

Ting Gao, Elvin Isufi, Winnie Daamen, Erik-Sander Smits, Serge Hoogendoorn

发表机构 * University of Amsterdam（阿姆斯特丹大学）； Delft University of Technology（代尔夫特理工大学）

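AI summary: Proposes Q-Net, a framework that combines Kalman filtering with neural networks to solve the data-fusion problem in queue length estimation at signalized intersections, improving spatial transferability and real-time applicability and enabling accurate queue estimation without costly sensing equipment.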

Journal ref Transportation Research Part C: Emerging Technologies, Volume 190, September 2026, Article 105809

详情

DOI: 10.1016/j.trc.2026.105809

AI中文摘要

估计信号交叉口的队列长度一直是交通管理中的长期挑战。尽管已有两类隐私保护的数据源：(i) 接近停止线的环形检测器提供的车辆计数汇总数据，以及 (ii) 提供路段平均速度测量的汇总浮动汽车数据 (aFCD)，车流的部分可观测性仍使这一任务变得复杂。然而，如何将这些具有不同空间和时间分辨率的数据源整合用于队列长度估计仍不清楚。为此，本文提出Q-Net：一种基于状态空间形式的队列估计框架。该设计解决了队列建模中的关键挑战，如违反交通守恒假设。Q-Net遵循卡尔曼预测-更新结构，并在状态演变和测量模型中保持物理可解释性。Q-Net使用AI增强的卡尔曼滤波器从数据中学习时间变化的增益动态。该框架支持实时实现，并通过将aFCD测量分组为固定大小的局部组来提高空间转移性，使可学习参数的数量与路段长度无关。在荷兰鹿特丹城市主干道的评估显示，Q-Net优于基线方法，能够准确追踪队列的形成和消散，并缓解aFCD引起的延迟。通过结合数据效率、可解释性、实时适用性和空间转移性，Q-Net在无需昂贵的传感基础设施（如摄像头或雷达）的情况下实现了准确的队列长度估计。

英文摘要

Estimating queue lengths at signalized intersections is a long-standing challenge in traffic management. Partial observability of vehicle flows complicates this task despite the availability of two privacy-preserving data sources: (i) aggregated vehicle counts from loop detectors near stop lines, and (ii) aggregated floating car data (aFCD) that provide segment-wise average speed measurements. However, how to integrate these sources with differing spatial and temporal resolutions for queue length estimation is rather unclear. Addressing this question, we present Q-Net: a queue estimation framework built upon a state-space formulation. This design addresses key challenges in queue modeling, such as violations of traffic conservation assumptions. Q-Net follows the Kalman predict-update structure and maintains physical interpretability in both the state evolution and measurement models. Q-Net uses an AI-augmented Kalman filter to learn time-varying gain dynamics from data. The framework supports real-time implementation and improves spatial transferability by grouping aFCD measurements into fixed-size local groups, making the number of learnable parameters independent of section length. Evaluations on urban main roads in Rotterdam, the Netherlands, show that Q-Net outperforms baseline methods, tracks queue formation and dissipation accurately, and mitigates aFCD-induced delays. By combining data efficiency, interpretability, real-time applicability, and spatial transferability, Q-Net makes accurate queue length estimation possible without costly sensing infrastructure like cameras or radar.
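
The Kalman predict-update structure that Q-Net follows can be sketched for a single scalar state. This is the classical filter with an analytically computed gain, not Q-Net itself: according to the abstract, Q-Net learns time-varying gain dynamics from data. The state model (queue length changing by a net inflow `u`) and all numeric values are assumptions made for illustration.

```python
def kalman_step(x, p, u, z, q, r):
    """One predict-update cycle for a scalar state.

    x, p: previous state estimate and its variance
    u:    known input (here: assumed net inflow of vehicles during the step)
    z:    noisy measurement of the state
    q, r: process and measurement noise variances
    """
    # Predict: propagate the state with the input; uncertainty grows by q.
    x_pred = x + u
    p_pred = p + q
    # Update: blend prediction and measurement using the Kalman gain.
    gain = p_pred / (p_pred + r)
    x_new = x_pred + gain * (z - x_pred)
    p_new = (1.0 - gain) * p_pred
    return x_new, p_new, gain


x, p, gain = kalman_step(x=0.0, p=1.0, u=2.0, z=3.0, q=0.5, r=1.5)
print(x, p, gain)  # 2.5 0.75 0.5
```

In this example the predicted variance (1.5) equals the measurement variance, so the gain is 0.5 and the estimate lands halfway between the prediction (2.0) and the measurement (3.0). A learned gain replaces the `gain` line with a model output while keeping the same predict and update steps.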

2412.18980 2026-06-19 cs.LG 版本更新

Evaluating deep learning models for fault diagnosis of a rotating machinery with epistemic and aleatoric uncertainty

评估深度学习模型在旋转机械故障诊断中的认知不确定性和偶然不确定性

Reza Jalayer, Masoud Jalayer, Andrea Mor, Carlotta Orsenigo, Carlo Vercellis

发表机构 * Faculty of Engineering and Natural Sciences（工程与自然科学学院）； Department of Information and Communications Engineering（信息与通信工程系）； Department of Management, Economics and Industrial Engineering（管理、经济与工业工程系）

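AI summary: Presents the first comprehensive comparison of uncertainty-aware deep learning architectures for rotating machinery fault diagnosis and finds that deep ensembles outperform the other methods at detecting unseen faults and noisy data.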

详情

AI中文摘要

不确定性感知深度学习模型最近在故障诊断中受到关注，作为一种在来自未见故障（认知不确定性）或噪声存在（偶然不确定性）的分布外数据出现时促进可靠故障检测的方法。在本文中，我们首次对旋转机械故障诊断中最先进的不确定性感知深度学习架构进行了全面比较研究，其中研究了受认知不确定性影响的不同场景和不同类型的偶然不确定性。所选架构包括通过dropout采样、贝叶斯神经网络和深度集成。此外，为了区分不同场景中的分布内和分布外数据，我们交替应用了两个不确定性阈值，其中一个是在本文中引入的。我们的实证结果为必须部署实际不确定性感知故障诊断系统的从业者和研究人员提供了指导。特别是，它们揭示了在存在认知不确定性的情况下，所有深度学习模型都能够有效地检测到平均而言所有场景中相当一部分分布外数据。然而，深度集成模型显示出优越的性能，与用于区分的阈值无关。在存在偶然不确定性的情况下，噪声水平起着重要作用。具体来说，低噪声水平阻碍了模型有效检测分布外数据的能力。即使在这种情况下，深度集成模型也表现出较温和的性能下降，主导其他模型。这些成就，加上它们更短的推理时间，使得深度集成架构成为首选。

英文摘要

Uncertainty-aware deep learning (DL) models recently gained attention in fault diagnosis as a way to promote the reliable detection of faults when out-of-distribution (OOD) data arise from unseen faults (epistemic uncertainty) or the presence of noise (aleatoric uncertainty). In this paper, we present the first comprehensive comparative study of state-of-the-art uncertainty-aware DL architectures for fault diagnosis in rotating machinery, where different scenarios affected by epistemic uncertainty and different types of aleatoric uncertainty are investigated. The selected architectures include sampling by dropout, Bayesian neural networks, and deep ensembles. Moreover, to distinguish between in-distribution and OOD data in the different scenarios two uncertainty thresholds, one of which is introduced in this paper, are alternatively applied. Our empirical findings offer guidance to practitioners and researchers who have to deploy real-world uncertainty-aware fault diagnosis systems. In particular, they reveal that, in the presence of epistemic uncertainty, all DL models are capable of effectively detecting, on average, a substantial portion of OOD data across all the scenarios. However, deep ensemble models show superior performance, independently of the uncertainty threshold used for discrimination. In the presence of aleatoric uncertainty, the noise level plays an important role. Specifically, low noise levels hinder the models' ability to effectively detect OOD data. Even in this case, however, deep ensemble models exhibit a milder degradation in performance, dominating the others. These achievements, combined with their shorter inference time, make deep ensemble architectures the preferred choice.
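
Thresholding an ensemble's uncertainty to flag OOD inputs can be illustrated as follows. The abstract does not state which uncertainty measure or thresholds the authors use, so this sketch assumes a common choice, the entropy of the averaged class probabilities, and a hypothetical threshold of 0.5 nats.

```python
import math


def ensemble_predictive_entropy(member_probs):
    """Entropy (in nats) of the class distribution averaged over ensemble members."""
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(m[c] for m in member_probs) / n_members for c in range(n_classes)]
    return -sum(p * math.log(p) for p in mean if p > 0.0)


def is_ood(member_probs, threshold):
    """Flag an input as out-of-distribution when the entropy exceeds the threshold."""
    return ensemble_predictive_entropy(member_probs) > threshold


# Three ensemble members, three fault classes (hypothetical softmax outputs).
agree = [[0.98, 0.01, 0.01]] * 3
disagree = [[0.9, 0.05, 0.05], [0.05, 0.9, 0.05], [0.05, 0.05, 0.9]]

print(is_ood(agree, threshold=0.5))     # False
print(is_ood(disagree, threshold=0.5))  # True
```

When the members disagree, the averaged distribution becomes close to uniform and its entropy approaches ln 3, roughly 1.10 nats, so the sample is flagged even though each individual member is confident.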

2502.06866 2026-06-19 cs.LG cs.AI econ.EM stat.AP stat.ML 版本更新

Global Ease of Living Index: a machine learning framework for longitudinal analysis of major economies

全球生活便利指数：面向主要经济体纵向分析的机器学习框架

Arun Kumar Selvaraj, Tanay Panat, Rohitash Chandra

发表机构 * Transitional Artificial Intelligence Research Group, School of Mathematics and Statistics（过渡人工智能研究组，数学与统计学学院）； Centre for Artificial Intelligence and Innovation（人工智能与创新中心）； Pingla Institute（Pingla研究所）

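AI summary: Proposes the Global Ease of Living Index, which combines socio-economic and infrastructural factors, uses machine learning to handle missing data, and reduces dimensionality with principal component analysis and factor analysis, giving policymakers a practical tool for improving quality of life.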

详情

AI中文摘要

全球经济、地缘政治条件以及COVID-19疫情等破坏性事件对生活成本和生活质量产生了巨大影响。理解主要经济体中生活成本和生活质量的长期影响至关重要。一个透明且全面的生活指数必须包含生活条件的多个维度。在本研究中，我们提出了一种通过全球生活便利指数量化生活质量的方法，该指数将各种社会经济和基础设施因素整合为一个单一综合得分。我们的指数利用定义生活水平的经济指标，这有助于针对特定领域进行干预改进。我们提出了一个机器学习框架来处理特定国家某些经济指标的数据缺失问题。然后，我们整理并更新数据，并使用降维方法（主成分分析和因子分析）创建自1970年以来主要经济体的生活便利指数。我们的工作通过为政策制定者提供识别需要改进领域（如医疗系统、就业机会和公共安全）的实用工具，显著丰富了相关文献。我们的方法使用开放数据和代码，易于复现并适用于各种情境，为生活质量评估的持续研究和政策制定提供了透明度和可访问性。

英文摘要

The drastic changes in the global economy, geopolitical conditions, and disruptions such as the COVID-19 pandemic have impacted the cost of living and quality of life. It is essential to comprehend the long-term implications of the cost of living and quality of life in major economies. A transparent and comprehensive living index must include multiple dimensions of living conditions. In this study, we present an approach to quantifying the quality of life through the Global Ease of Living Index that combines various socio-economic and infrastructural factors into a single composite score. Our index utilises economic indicators that define living standards, which could help in targeted interventions to improve specific areas. We present a machine learning framework to address missing data for certain economic indicators in specific countries. We then curate and update the data and use a dimensionality reduction approach (Principal Component Analysis and Factor Analysis) to create the Ease of Living Index for major economies since 1970. Our work significantly adds to the literature by offering a practical tool for policymakers to identify areas needing improvement, such as healthcare systems, employment opportunities, and public safety. Our approach with open data and code can be easily reproduced and applied to various contexts, providing transparency and accessibility for ongoing research and policy development in quality-of-life assessment.

2604.06265 2026-06-19 cs.LG cond-mat.stat-mech quant-ph 版本更新

SMT-AD: a scalable quantum-inspired anomaly detection approach

SMT-AD：一种可扩展的量子启发式异常检测方法

Apimuk Sornsaeng, Si Min Chan, Wenxuan Zhang, Swee Liang Wong, Joshua Lim, Jonathan Pan, Dario Poletti

发表机构 * Science, Mathematics and Technology Cluster, Singapore University of Technology and Design（新加坡科技设计大学科学、数学与技术集群）； Centre for Quantum Technologies, National University of Singapore（新加坡国立大学量子技术中心）； Artificial Intelligence and Data Analytics Strategic Technology Centre, ST Engineering（ST工程人工智能与数据分析战略技术中心）； Engineering Product Development Pillar, Singapore University of Technology and Design（新加坡科技设计大学工程产品开发支柱）

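AI summary: Proposes SMT-AD, a quantum-inspired anomaly detection method based on a superposition of multiresolution tensors; Fourier-assisted feature embedding and matrix product operators make it linearly scalable, and it achieves competitive performance on standard datasets.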

Comments 12 pages, 5 figures

详情

AI中文摘要

量子启发的张量网络算法已被证明是机器学习任务（包括异常检测）中有效且高效的模型。在此，我们提出一种高度可并行化的量子启发式方法，称为SMT-AD（Superposition of Multiresolution Tensors for Anomaly Detection）。它基于键维数为1的矩阵乘积算子的叠加，通过傅里叶辅助特征嵌入对输入数据进行变换，其中可学习参数的数量随特征大小、嵌入分辨率和矩阵乘积算子结构中附加组件的数量线性增长。我们展示了在标准数据集（包括信用卡交易）上成功的异常检测，并发现即使采用最小配置，它也能与已建立的异常检测基线相媲美。此外，它提供了一种直接的方法来减少模型权重，甚至通过突出最相关的输入特征来提高性能。

英文摘要

Quantum-inspired tensor network algorithms have been shown to be effective and efficient models for machine learning tasks, including anomaly detection. Here, we propose a highly parallelizable quantum-inspired approach, which we call SMT-AD (Superposition of Multiresolution Tensors for Anomaly Detection). It is based on the superposition of bond-dimension-1 matrix product operators to transform the input data with Fourier-assisted feature embedding, where the number of learnable parameters grows linearly with feature size, embedding resolutions, and the number of additional components in the matrix product operator structure. We demonstrate successful anomaly detection on standard datasets, including credit card transactions, and find that, even with minimal configurations, it achieves competitive performance against established anomaly detection baselines. Furthermore, it provides a straightforward way to reduce the weight of the model and even improve performance by highlighting the most relevant input features.

2503.04507 2026-06-19 q-bio.QM cs.CG cs.LG 版本更新

The Morse Transform for Discrete Shape Analysis

离散形状分析的Morse变换

Alexander M. Tanaka, Aras T. Asaad, Richard Cooper, Vidit Nanda

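AI summary: Proposes a topological transform based on directional piecewise-linear Morse theory that quantifies the geometry of embedded objects by recording critical points under multiple height functions; the resulting feature vectors achieve the highest mean AUROC in ligand-based virtual screening.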

Comments 37 pages, 3 main figures, 2 main tables, 12 appendix figures and 4 appendix tables

详情

AI中文摘要

物体的几何形状在调节其与物理世界的相互作用中起着至关重要的作用。然而，为了统计推断或分类任务的目的，用数值描述几何信息仍然困难。在这里，我们引入了一种新的拓扑变换，它利用定向分段线性Morse理论，通过编录多个高度函数下的临界点来量化嵌入对象的几何形状。该Morse变换的输出记录了表征底层形状的临界点的高度和局部拓扑类型（峰、谷或鞍点），保留了比欧拉特征变换更精细的信息，同时自然优先考虑形状的最外层区域。关键的是，该输出可以进一步压缩为丰富而紧凑的特征向量。我们将Morse特征向量作为配体虚拟筛选（LBVS）的描述符进行基准测试，这本质上依赖于分子的形状。在常见的梯度提升树分类流程下，与其他拓扑变换描述符和标准基于形状的LBVS描述符相比，Morse描述符实现了最高的平均AUROC。

英文摘要

The geometry of an object plays a vital role in modulating its interactions with the physical world. It nevertheless remains difficult to describe geometric information numerically for the purposes of statistical inference or classification tasks. Here, we introduce a new topological transform which leverages directional piecewise-linear Morse theory to quantify the geometry of an embedded object by cataloguing critical points across multiple height-functions. The output of this Morse transform records both the heights and the local topological type (peak, trough or saddle) of the critical points that characterise the underlying shape, retaining finer information than the Euler characteristic transform whilst naturally prioritising a shape's outermost regions. Crucially, this output can be further compressed into a rich but compact feature vector. We benchmark the Morse feature vector as a descriptor for ligand-based virtual screening (LBVS), which intrinsically depends on the shape of molecules. Under a common gradient-boosted tree classification pipeline, Morse descriptors achieve the highest mean AUROC when compared to other topological transform descriptors and to standard shape-based LBVS descriptors.


2605.15231 2026-06-19 cs.LG cs.CV 版本更新

Mask-Morph Graph U-Net: A Generalisable Mesh-Based Surrogate for Crashworthiness Field Prediction under Large Geometric Variation

Mask-Morph Graph U-Net：一种可泛化的基于网格的代理模型，用于大几何变化下的耐撞性场预测

Haoran Li, Tobias Lehrer, Yingxue Zhao, Haosu Zhou, Philipp Stocker, Tobias Pfaff, Marcus Wagner, Nan Li

发表机构 * Dyson School of Design Engineering, Imperial College London（伦敦帝国理工学院戴森设计工程学院）； TUM School of Engineering and Design, Technical University of Munich（慕尼黑工业大学工程与设计学院）； Faculty of Mechanical Engineering, OTH Regensburg（雷根斯堡东巴伐利亚应用技术大学机械工程学院）； NVIDIA（NVIDIA公司）

AI总结本文提出Mask-Morph Graph U-Net，通过特征对齐的重心参数化和节点掩码预训练，提升网格代理模型的泛化性和数据效率，适用于耐撞性设计探索。

Comments 48 pages, 15 figures, journal paper under review

详情

AI中文摘要

非线性有限元碰撞模拟准确但计算成本高，限制了其在迭代设计优化中的应用。基于图神经网络（GNN）的机器学习代理模型提供了更快的替代方案。消息传递GNN广泛用于网格模拟，其共享的节点和边更新函数在不同图结构间具有较好的泛化性。相比之下，不可共享的边特定聚合层能更准确地捕捉非线性关系，但通常需要固定的图连接关系，限制了泛化性。本文提出Mask-Morph Graph U-Net（MMGUNet），一种解决分层Graph U-Net架构局限的实用方法，该类架构使用边特定的下采样和上采样层。边特定层要求粗图连接关系固定。为了在保留此连接关系的同时改善空间对应性，所提出的方法通过特征对齐的重心参数化将粗化图层次变形到每个输入网格，然后构建跨图边。它进一步在监督预训练中应用节点掩码，随后进行参数高效的微调，其中参数量大的边特定层被冻结。所提出的方法在分布内、分布外和跨部件迁移设置下，使用平均欧氏距离和最大侵入量百分比误差进行评估。结果表明，粗图变形相对于固定粗图基线提高了测试精度，而掩码监督预训练减小了训练与测试之间的差距，并提高了迁移时的数据效率。所提出的模型还取得了比外部基线更低的预测误差。这些结果展示了一条通往可重用、数据高效的网格代理模型的实用路径，可用于耐撞性设计探索。

英文摘要

Nonlinear finite element crash simulations are accurate but computationally expensive, limiting their use in iterative design optimisation. Machine-learning surrogate models based on graph neural networks (GNNs) offer a faster alternative. Message-passing GNNs are widely used for mesh simulation, and their shared node and edge update functions are relatively generalisable across varying graph structures. By contrast, non-shareable edge-specific aggregation layers can capture nonlinear relationships more accurately but usually require fixed graph connectivity, which limits generalisability. This paper presents Mask-Morph Graph U-Net (MMGUNet), a practical approach to addressing the limitation of hierarchical Graph U-Net architectures that use edge-specific downsampling and upsampling layers. Fixed coarse graph connectivity is required for edge-specific layers. To retain this while improving spatial correspondence, the proposed method morphs the coarsened graph hierarchy to each input mesh using feature-aligned barycentric parameterisation before constructing cross-graph edges. It further applies node masking during supervised pretraining, followed by parameter-efficient fine-tuning in which high-parameter edge-specific layers are frozen. The proposed approach is evaluated in in-distribution, out-of-distribution, and cross-component transfer settings using mean Euclidean distance and maximum intrusion percentage error. Results show that coarse-graph morphing improves test accuracy relative to a fixed-coarse-graph baseline, while masked supervised pretraining reduces the train-test discrepancy and improves data efficiency during transfer. The proposed model also achieves lower prediction error compared with external baselines. These results demonstrate a practical route toward reusable, data-efficient mesh-based surrogate modelling for crashworthiness design exploration.


2606.12500 2026-06-19 cs.LG cs.AI 版本更新

Improving Crash Frequency Prediction from Simulated Traffic Conflicts Using Machine Learning Based Microsimulation

基于机器学习的微观仿真从模拟交通冲突改进碰撞频率预测

Xian Liu, Carlo G. Prato, Gustav Markkula

AI总结本文利用机器学习行为模型替代传统规则模型进行交通微观仿真，通过极值理论分析模拟冲突预测碰撞频率，在英国利兹五个信号交叉口的结果表明ML模型无需地点校准即可给出与实际碰撞数据一致的预测。

详情

AI中文摘要

交通微观仿真与替代安全指标相结合，正越来越多地被用作历史碰撞数据的主动式替代方案，用于预测现有或规划中道路基础设施设计的碰撞频率。然而，现有的基于微观仿真的安全研究采用了简化的基于规则的行为模型，这些模型能较好地再现交通流，但往往无法生成真实的冲突动态，限制了碰撞预测的准确性。基于机器学习（ML）的行为模型的最新进展提供了一个有希望的机会：通过直接从大规模轨迹数据集中学习人类驾驶行为，有可能提高微观仿真的真实性和碰撞频率预测的准确性。为了研究这种可能性，我们对英国利兹的五个真实信号交叉口进行了交通微观仿真，分别使用了标准的基于规则的模型和最先进的ML模型。我们使用二维碰撞时间（Time-to-Collision）指标分析模拟车辆轨迹以识别模拟冲突，然后使用极值理论对冲突建模以预测碰撞频率。结果表明，由ML模型的冲突得到的碰撞预测与实际碰撞数据一致，而基于规则的模型无法给出有意义的预测，这可能是由于模型未针对所模拟的具体交叉口进行校准。直接使用ML生成的模拟碰撞来预测实际碰撞频率同样效果不佳，这表明尽管当前的ML模型可以真实地再现冲突，但尚不能生成真实的碰撞。总体而言，研究结果表明，基于ML的行为模型无需针对特定地点进行模型校准，就有望改进基于模拟冲突的碰撞预测，并为基于ML的交通微观仿真指明了明确的未来方向。

英文摘要

Traffic microsimulation combined with surrogate safety measures has increasingly been used as a proactive alternative to historical crash data for predicting crash frequency for current or planned road infrastructure designs. However, existing microsimulation-based safety studies have adopted simplified rule-based behaviour models, which reproduce traffic flow reasonably well but often fail to generate realistic conflict dynamics, limiting crash prediction accuracy. Recent advances in machine learning (ML)-based behaviour models offer a promising opportunity to potentially improve microsimulation realism and crash frequency predictions by learning human driving behaviour directly from large-scale trajectory datasets. To investigate this possibility, traffic microsimulation was conducted for five real-world signalised intersections in Leeds, UK, using both a standard rule-based model and a state-of-the-art ML model. Simulated vehicle trajectories were analysed using a two-dimensional Time-to-Collision metric to identify simulated conflicts, which were then modelled using Extreme Value Theory to predict crash frequency. Results show that conflicts from the ML model yielded crash predictions in line with the real-world crash data, whereas the rule-based model did not permit meaningful predictions, presumably due to a lack of model calibration to the specific simulated intersections. Directly using ML-generated simulated crashes to predict real-world crash frequency also yielded poor results, suggesting that while current ML models can realistically reproduce conflicts, they are not yet able to generate realistic crashes. Overall, the findings demonstrate that ML-based behaviour models are promising for improving crash prediction from simulated conflicts, without a need for location-specific model calibration, and suggest clear future directions for ML-based traffic microsimulation.
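The abstract does not define its two-dimensional Time-to-Collision metric, so the following is only an illustrative sketch of the general idea, not the authors' formulation. It assumes constant velocities and approximates each vehicle as a circle; the function name `time_to_collision` and the `radius_sum` parameter are hypothetical.

```python
import math

def time_to_collision(p1, v1, p2, v2, radius_sum):
    """Earliest t >= 0 at which two circular bodies touch, or None if they never do.

    Assumes constant velocities. p1, v1, p2, v2 are (x, y) tuples;
    radius_sum is the sum of the two (hypothetical) vehicle radii.
    """
    rx, ry = p2[0] - p1[0], p2[1] - p1[1]      # relative position
    vx, vy = v2[0] - v1[0], v2[1] - v1[1]      # relative velocity
    # Solve |r + v t|^2 = radius_sum^2, i.e. a t^2 + b t + c = 0.
    a = vx * vx + vy * vy
    b = 2.0 * (rx * vx + ry * vy)
    c = rx * rx + ry * ry - radius_sum ** 2
    if c <= 0.0:
        return 0.0                              # already overlapping
    if a == 0.0:
        return None                             # no relative motion
    disc = b * b - 4.0 * a * c
    if disc < 0.0 or b >= 0.0:
        return None                             # paths miss, or bodies move apart
    return (-b - math.sqrt(disc)) / (2.0 * a)   # smaller positive root
```

In a pipeline of the kind the abstract describes, time steps whose TTC falls below a chosen threshold would be flagged as conflicts, and the minima of those values would feed the extreme value model; the threshold itself is a modelling choice that the abstract does not state.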


2606.18933 2026-06-19 cs.LG cs.IR stat.ME 版本更新

Zero-Shot Active Feature Acquisition via LLM-Elicitation

基于LLM知识引出的零样本主动特征获取

Binyamin Perets, Natalie Mendelson, Shiran Vainberg, Yehuda Chowers, Shai Shen-Orr, Shie Mannor

发表机构 * Faculty of EE, Technion（以色列理工学院电气工程学院）； Faculty of Medicine, Technion（以色列理工学院医学院）； CytoReason ； NVIDIA

AI总结提出通过向LLM引出马尔可夫随机场充分统计量的零样本主动特征获取框架，缓解对大量标注数据的依赖，在IBD患者队列上优于现有方法。

详情

AI中文摘要

主动特征获取（AFA）按顺序选择要观察的特征，以做出分类或排序决策。其主要局限在于依赖大量标注数据来拟合指导获取的概率模型。大型语言模型（LLM）无需监督即可提供领域知识，但不擅长序列规划。要求它既提供知识又做决策，会把最好分开的两种能力混为一谈。为此，我们通过规范的知识引出（elicitation）构建了一个零样本AFA框架：只向LLM索取它能够可靠给出的内容，即作为马尔可夫随机场（MRF）充分统计量的一元偏差和成对协变。我们将该框架应用于两种场景：二分类和top-$k$识别。在实践中，LLM只能可靠地返回判别性统计量，即区分各类别的信息而非每个类别各自的信息，这使经典AFA无法适用。我们采用最大熵闭包来消除这种规范（gauge）不确定性。我们在炎症性肠病（IBD）患者队列上进行评估，这是一个活跃的临床场景，其中诊断的不确定性和患者异质性妨碍了稳定治疗策略的制定。我们的框架在真实标签和其自身引出的信念上均优于LLM。在最关键之处，即最困难的患者上，我们的top-$k$获取策略明显优于所有现有方法。

英文摘要

Active feature acquisition (AFA) sequentially selects which features to observe to reach a classification or ranking decision. Its central limitation is reliance on large amounts of labeled data to fit probabilistic models guiding acquisition. Large language models (LLMs) supply unsupervised domain knowledge, but are poor sequential planners. Asking one to both know and decide conflates capabilities best kept separate. Here, we develop a framework for zero-shot AFA through disciplined elicitation: asking the LLM only for what it can be trusted to return, the unary deviations and pairwise co-variations that are the sufficient statistics of a Markov random field (MRF). We apply our framework to two settings: binary classification and top-$k$ identification. In practice, the LLM reliably returns only discriminative statistics, what distinguishes the classes rather than each class in isolation, which precludes classical AFA. We apply a maximum-entropy closure that resolves this gauge ambiguity. We evaluate on a cohort of Inflammatory Bowel Disease (IBD) patients, an active clinical setting where diagnostic ambiguity and patient heterogeneity obstruct stable treatment strategies. Our framework outperforms the LLM both on real labels and on its own extracted beliefs. Where it matters most, on the hardest patients, our top-$k$ acquisition policy markedly outperforms all existing methods.


2503.17386 2026-06-19 eess.SY cs.LG cs.SY 版本更新

A graph neural network surrogate model for mesh-based crashworthiness prediction of vehicle panel components

基于图神经网络的网格级车辆面板部件耐撞性预测代理模型

Haoran Li, Yingxue Zhao, Haosu Zhou, Tobias Pfaff, Nan Li

发表机构 * Dyson School of Design Engineering, Imperial College London（伦敦帝国理工学院戴森设计工程学院）； NVIDIA

AI总结提出递归图U-Net (ReGUNet) 代理模型，通过图表示有限元网格，结合层次架构和递归机制，高效准确预测车辆B柱等面板部件的动态变形和耐撞性指标。

Comments Accepted manuscript version. Final published version available in Results in Engineering via DOI: 10.1016/j.rineng.2026.110925

Journal ref Results in Engineering 30 (2026) 110925

详情

DOI: 10.1016/j.rineng.2026.110925

AI中文摘要

耐撞性是安全关键车辆面板部件（如B柱）设计中的关键性能指标。有限元（FE）模拟广泛用于评估碰撞响应，但对于大规模非线性碰撞场景，特别是当集成到迭代设计和优化过程中时，计算成本仍然很高。尽管基于机器学习的代理模型已被开发用于快速耐撞性分析，但它们在对复杂三维部件的详细表示方面存在局限性。图神经网络（GNN）已成为处理复杂结构数据的有前景的解决方案。然而，现有的GNN模型通常缺乏足够的精度和计算效率以满足工业需求。本文提出了递归图U-Net（ReGUNet），一种用于车辆面板部件耐撞性分析的基于图的代理模型。通过将有限元网格表示为图形式，该模型自然地适应复杂的非规则结构几何。其层次架构提高了计算效率和精度，而递归的引入增强了多时间步长上时间预测的稳定性。使用不同几何形状的热冲压钢B柱的侧面碰撞案例研究来生成训练数据集。训练后的模型在预测未见过的部件设计的动态变形行为和耐撞性指标方面表现出高精度。与基线方法相比，ReGUNet在平均变形预测误差上实现了超过52%的降低，同时计算效率显著提高。ReGUNet提供了快速可靠的耐撞性评估，从而加速了车辆面板部件的设计周期。

英文摘要

Crashworthiness is a key performance measure in the design of safety-critical vehicle panel components such as B-pillars. Finite element (FE) simulations are widely used to evaluate crash responses but remain computationally expensive for large-scale, nonlinear impact scenarios, particularly when integrated into iterative design and optimisation processes. Although machine learning-based surrogate models have been developed for rapid crashworthiness analysis, they exhibit limitations in the detailed representation of complex three-dimensional components. Graph Neural Networks (GNNs) have emerged as a promising solution for processing data with complex structures. However, existing GNN models often lack sufficient accuracy and computational efficiency to meet industrial demands. This paper proposes Recurrent Graph U-Net (ReGUNet), a graph-based surrogate model for crashworthiness analysis of vehicle panel components. By representing FE meshes in graph form, the model naturally accommodates complex irregular structural geometries. Its hierarchical architecture improves computational efficiency and accuracy, while the introduction of recurrence enhances the stability of temporal predictions over multiple time steps. A side-impact case study of hot-stamped steel B-pillars with varying geometries is used to generate the training dataset. The trained model demonstrates high accuracy in predicting the dynamic deformation behaviour and crashworthiness indicators of previously unseen component designs. ReGUNet achieves over a 52% reduction in the average deformation prediction error relative to baseline methods, together with markedly improved computational efficiency. ReGUNet provides rapid and reliable crashworthiness assessments, which in turn accelerate the design cycle of vehicle panel components.


2505.18726 2026-06-19 cs.SD cs.LG eess.AS 版本更新

Bioacoustic Geolocation: Species Sounds as Geographic Signals

生物声学地理定位：物种声音作为地理信号

Mustafa Chasmai, Wuao Liu, Subhransu Maji, Grant Van Horn

发表机构 * University of Massachusetts, Amherst（马萨诸塞大学阿姆赫斯特分校）

AI总结本文研究仅通过声音进行全球尺度地理定位，利用生物声学信号中的物种地理分布线索，提出结合物种范围预测与检索的地理定位方法，并验证多模态融合的潜力。

Comments Accepted to ICML 26

详情

AI中文摘要

我们能否仅通过听到的声音确定某人的地理位置？声学信号是否足以定位到国家、州甚至城市？在这项工作中，我们应对全球尺度音频地理定位的挑战，特别关注野生动物和自然声音。我们假设生物声学信号包含信息丰富的地理定位线索，因为物种具有明确的地理分布范围。为了验证这一假设，我们对图像地理定位和声景映射方法进行基准测试，设计预言机和以物种为中心的基线，并提出一种结合物种范围预测与基于检索的地理定位的混合方法。我们进一步探究地理定位是否随着物种多样性记录和跨邻近样本的时空聚合而改善。最后，我们将研究扩展到多模态地理定位，通过结合音频和视觉内容的电影案例研究。我们的结果突出了将生物声学信号纳入地理空间任务的潜力，为物种识别和音频地理定位的未来工作提供了动力。

英文摘要

Can we determine someone's geographic location solely from the sounds they hear? Are acoustic signals enough to localize within a country, state, or even city? In this work, we tackle the challenge of global-scale audio geolocation, with a particular focus on wildlife and natural sounds. We posit that bioacoustic signals contain informative geolocation cues because of well-defined geographic ranges of species. To test this hypothesis, we benchmark image geolocation and soundscape mapping methods, design oracles and species-centric baselines, and propose a hybrid approach that combines species range prediction with retrieval-based geolocation. We further ask whether geolocation improves with species-diverse recordings and spatiotemporal aggregation across neighboring samples. Finally, we extend our study to multimodal geolocation with case studies from movies that combine both audio and visual content. Our results highlight the potential of incorporating bioacoustic signals into geospatial tasks, motivating future work on species recognition and audio geolocation.


2507.19653 2026-06-19 cs.NI cs.AI cs.LG 版本更新

On the Limitations of Ray-Tracing for Learning-Based RF Tasks in Urban Environments

关于射线追踪在城市环境中基于学习的射频任务局限性的研究

Armen Manukyan, Hrant Khachatrian, Edvard Ghukasyan, Theofanis P. Raptis

发表机构 * Yerevan State University, Yerevan, Armenia（亚美尼亚埃里温国立大学）； YerevaNN, Yerevan, Armenia（亚美尼亚埃里温YerevaNN）； Institute of Informatics and Telematics, National Research Council, Pisa, Italy（意大利比萨国家研究委员会信息学与远程信息处理研究所）

AI总结通过罗马城区实测数据评估Sionna射线追踪仿真器，发现天线位置和朝向对保真度影响显著，而求解器超参数影响微弱；优化后相关性提升5%-130%，定位误差降低三分之一，但残余城市噪声仍是挑战。

Comments This work was supported by funding under the bilateral agreement between CNR (Italy) and HESC MESCS RA (Armenia) as part of the DeepRF project for the 2025-2026 biennium, and by the HESC MESCS RA grant No. 22rl-052 (DISTAL)

Journal ref 2026 IEEE Wireless Communications and Networking Conference (WCNC)

详情

DOI: 10.1109/WCNC65185.2026.11555460

AI中文摘要

我们研究了Sionna v1.0.2射线追踪在罗马市中心户外蜂窝链路中的真实性。我们使用了包含1,664个用户设备（UE）和六个名义基站（BS）站点的真实测量数据集。在这些固定位置的基础上，我们系统地改变了主要仿真参数，包括路径深度、漫反射/镜面反射/折射开关、载波频率，以及天线的高度、辐射方向图和朝向等属性。仿真保真度按基站逐一评分，指标为测量功率与仿真功率之间的Spearman相关性，以及基于RSSI指纹的k近邻定位算法的误差。在所有实验中，求解器超参数对所选指标的影响微乎其微。相反，天线位置和朝向起决定性作用。通过简单的贪婪优化，我们将各基站的Spearman相关性提高了5%到130%；仅以仿真数据作为参考点时，kNN定位在真实样本上的误差降低了三分之一，但仍是纯真实数据误差的两倍。因此，精确的几何模型和可信的天线模型是必要但不充分的；忠实地刻画残余的城市噪声仍然是实现可迁移、高保真户外射频仿真的一个开放挑战。

英文摘要

We study the realism of Sionna v1.0.2 ray-tracing for outdoor cellular links in central Rome. We use a real measurement set of 1,664 user equipments (UEs) and six nominal base-station (BS) sites. Using these fixed positions, we systematically vary the main simulation parameters, including path depth, diffuse/specular/refraction flags, carrier frequency, and antenna properties such as altitude, radiation pattern, and orientation. Simulator fidelity is scored for each base station via the Spearman correlation between measured and simulated powers, and by a k-nearest-neighbor (kNN) localization algorithm using RSSI-based fingerprints. Across all experiments, solver hyper-parameters have an immaterial effect on the chosen metrics. In contrast, antenna locations and orientations prove decisive. By simple greedy optimization we improve the Spearman correlation by 5% to 130% for various base stations, and the kNN-based localization error using only simulated data as reference points decreases by one-third on real-world samples, although it remains twice as high as the error with purely real data. Precise geometry and credible antenna models are therefore necessary but not sufficient; faithfully capturing the residual urban noise remains an open challenge for transferable, high-fidelity outdoor RF simulation.
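The two fidelity scores named in the abstract can be sketched with the standard library alone. This is a generic illustration, not the authors' evaluation code: the helper names (`ranks`, `spearman`, `knn_locate`), the data layout, and the choice to average the k nearest positions are all assumptions.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    out = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            out[order[k]] = mean_rank
        i = j + 1
    return out

def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of the ranks.

    Raises ZeroDivisionError if either input is constant.
    """
    rx, ry = ranks(x), ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

def knn_locate(fingerprint, references, k=3):
    """Estimate a position as the mean of the k references with the closest RSSI vectors.

    references: list of (rssi_vector, (x, y)) pairs (hypothetical layout).
    """
    nearest = sorted(references, key=lambda r: math.dist(r[0], fingerprint))[:k]
    x = sum(pos[0] for _, pos in nearest) / len(nearest)
    y = sum(pos[1] for _, pos in nearest) / len(nearest)
    return (x, y)
```

In the setting the abstract describes, `spearman` would compare measured and simulated powers per base station, and `references` would hold simulated fingerprints while `fingerprint` comes from a real measurement.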


2510.00831 2026-06-19 cs.AI cs.LG eess.SP 版本更新

Controlled Comparison of Machine Learning Models for Fault Classification and Localization in Power System Protection

电力系统保护中故障分类与定位的机器学习模型受控比较

Julian Oelhaf, Georg Kordowich, Changhun Kim, Paula Andrea Pérez-Toro, Christian Bergler, Andreas Maier, Johann Jäger, Siming Bayer

发表机构 * Department of Electrical Engineering, Media and Computer Science, Ostbayerische Technische Hochschule Amberg-Weiden（东巴伐利亚安贝格-魏登应用技术大学电气工程、媒体与计算机科学系）

AI总结在统一电磁暂态数据集和10-50ms决策窗口下，对比机器学习模型在故障分类与定位中的性能，发现分类在10ms时F1>0.98，定位误差稳定在约10%线路长度。

Comments Accepted at IEEE PES Innovative Smart Grid Technologies Europe 2026 (ISGT Europe 2026). Pre-camera-ready author version; final proceedings version may differ

详情

AI中文摘要

现代电力系统因逆变器基和分布式能源的集成而日益复杂，挑战了传统保护方案的可靠性，并推动了机器学习在保护任务中的应用。然而，由于不同研究中的数据集、传感假设和决策时域各异，已发表的结果往往难以比较。本文在相同的传感、时序和验证条件下，基于公共电磁暂态数据集，使用10-50ms的决策窗口以反映保护相关时间尺度，对故障分类（FC）和故障定位（FL）的机器学习模型进行了受控比较。对于FC，性能最佳的非线性模型在10ms时F1分数已超过0.98，而低容量模型在较短时域下性能下降，但随窗口延长而改善，表明相关故障类型信息在最早暂态中已存在。对于FL，顶级模型在所有评估时域下达到约10%归一化线路长度的稳定定位误差，而较弱模型形成明显分离的第二性能层级。线路解析分析显示，定位精度随电网段变化，表明存在拓扑依赖的难度而非仅时间上下文不足。这些发现为比较两个信息需求根本不同的保护任务中的机器学习模型提供了受控参考。

英文摘要

The increasing complexity of modern power systems, driven by the integration of inverter-based and distributed energy resources, challenges the reliability of conventional protection schemes and motivates the use of machine learning for protection tasks. However, published results are often difficult to compare because datasets, sensing assumptions, and decision horizons vary across studies. This paper presents a controlled comparison of machine learning models for fault classification (FC) and fault localization (FL) under identical sensing, timing, and validation conditions on a common electromagnetic transient dataset, using decision windows of 10-50 ms to reflect protection-relevant time scales. For FC, the best-performing nonlinear models achieve F1 scores above 0.98 already at 10 ms, while lower-capacity models degrade at shorter horizons but improve with longer windows, indicating that relevant fault-type information is already present in the earliest transient. For FL, the top-performing models reach a stable localization error of about 10 % of normalized line length across all evaluated horizons, while weaker models form a clearly separated second performance tier. Line-resolved analysis shows that localization accuracy varies across grid segments, indicating topology-dependent difficulty rather than insufficient temporal context alone. These findings provide a controlled reference for comparing machine learning models across two protection tasks with fundamentally different information requirements.

URL PDF HTML ☆

赞 0 踩 0

2511.22486 2026-06-19 physics.plasm-ph cs.LG 版本更新

The Machine Learning Approach to Moment Closure Relations for Plasma: A Review

等离子体矩闭包关系的机器学习方法：综述

Samuel Burles, Enrico Camporeale

发表机构 * School of Physical and Chemical Sciences, Queen Mary University of London（伦敦玛丽女王大学物理与化学科学学院）； Space Weather TREC, University of Colorado（科罗拉多大学空间天气TREC）

AI总结本文综述了机器学习方法在等离子体流体模型中发展改进闭包模型的研究，涵盖神经网络代理和方程发现两类方法，并讨论了离线测试与在线模拟的挑战及未来方向。

Comments 58 pages, 6 figures

详情

AI中文摘要

大规模等离子体全局模拟的需求是空间和实验室等离子体物理学中持续存在的挑战。任何基于流体模型的模拟都固有地需要高阶等离子体矩的闭包关系。本综述汇编并分析了近期涌现的机器学习方法，这些方法旨在开发改进的等离子体闭包模型，能够在等离子体流体模型中捕捉动力学现象。我们调查了两类方法：神经网络代理（从多层感知器到傅里叶神经算子，后者最近在流体求解器内在线复现了线性和非线性朗道阻尼）和方程发现方法（如稀疏回归）；并根据这些研究是离线对照参考数据测试还是在线在时间演化求解器内测试进行组织。我们概述了与机器学习闭包相关的挑战，包括非对角压力张量精度、超出训练分布的泛化能力以及稳定集成到大尺度模拟中，并指出了未来研究可能解决这些问题的方向。

英文摘要

The requirement for large-scale global simulations of plasma is an ongoing challenge in both space and laboratory plasma physics. Any simulation based on a fluid model inherently requires a closure relation for the high order plasma moments. This review compiles and analyses the recent surge of machine learning approaches developing improved plasma closure models capable of capturing kinetic phenomena within plasma fluid models. We survey two methodological families: neural-network surrogates (from multilayer perceptrons to Fourier neural operators, the latter recently reproducing both linear and non-linear Landau damping online within a fluid solver) and equation-discovery methods such as sparse regression; and organise the studies by whether they are tested offline against reference data or online within a time-evolving solver. We outline the challenges associated with machine-learning closures, including off-diagonal pressure-tensor accuracy, generalisation beyond the training distribution, and stable integration into large-scale simulations, and the directions future research might take to address them.

URL PDF HTML ☆

赞 0 踩 0

2601.00014 2026-06-19 eess.SP cs.AI cs.LG 版本更新

Modeling Day-Long ECG Signals to Predict Heart Failure Risk with Explainable AI

建模全天心电图信号以可解释人工智能预测心力衰竭风险

Eran Zvuloni, Ronit Almog, Michael Glikson, Shany Brimer Biton, Ilan Green, Izhar Laufer, Offer Amir, Joachim A. Behar

发表机构 * Leumit Health Services（Leumit健康服务）

AI总结提出DeepHHF深度学习模型，利用24小时单导联心电图数据预测五年内心力衰竭风险，AUC达0.80，优于短时片段和临床评分，可解释性分析显示模型关注心律失常和心脏异常。

详情

AI中文摘要

心力衰竭（HF）影响11.8%的65岁及以上成年人，降低生活质量和寿命。预防HF可降低发病率和死亡率。我们假设将人工智能（AI）应用于24小时单导联心电图（ECG）数据可预测五年内HF风险。为此，使用了Technion-Leumit Holter ECG（TLHE）数据集，包括20年间收集的47,729名患者的69,663条记录。我们的深度学习模型DeepHHF在24小时ECG记录上训练，实现了0.80的受试者工作特征曲线下面积，优于使用30秒片段和临床评分的模型。DeepHHF识别的高风险个体住院或死亡事件概率翻倍。可解释性分析显示DeepHHF关注心律失常和心脏异常。本研究强调了深度学习建模24小时连续ECG数据的可行性，捕捉了对可靠风险预测至关重要的阵发性事件。应用于单导联Holter ECG的人工智能无创、廉价且广泛可及，使其成为HF风险预测的有前景工具。

英文摘要

Heart failure (HF) affects 11.8% of adults aged 65 and older, reducing quality of life and longevity. Preventing HF can reduce morbidity and mortality. We hypothesized that artificial intelligence (AI) applied to 24-hour single-lead electrocardiogram (ECG) data could predict the risk of HF within five years. To test this, we used the Technion-Leumit Holter ECG (TLHE) dataset, which comprises 69,663 recordings from 47,729 patients collected over 20 years. Our deep learning model, DeepHHF, trained on 24-hour ECG recordings, achieved an area under the receiver operating characteristic curve of 0.80, outperforming a model using 30-second segments and a clinical score. High-risk individuals identified by DeepHHF had twice the chance of hospitalization or death. Explainability analysis showed that DeepHHF focused on arrhythmias and heart abnormalities. This study highlights the feasibility of using deep learning to model 24-hour continuous ECG data, capturing paroxysmal events essential for reliable risk prediction. Artificial intelligence applied to single-lead Holter ECG is non-invasive, inexpensive, and widely accessible, making it a promising tool for HF risk prediction.
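For readers unfamiliar with the headline metric, the area under the ROC curve equals the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it by direct pairwise comparison; the function name `auroc` and the 0/1 label encoding are assumptions, and it is unrelated to the authors' code.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison (O(n_pos * n_neg)).

    labels: iterable of 0/1; scores: predicted risk, higher means more likely positive.
    """
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    # Each positive/negative pair scores 1 for a correct ordering, 0.5 for a tie.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The quadratic pairwise form is chosen for readability; for datasets of the size mentioned in the abstract, a rank-based formulation would be the practical choice.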


2601.03040 2026-06-19 cs.RO cs.AI cs.LG 版本更新

PiDR: Physics-Informed Inertial Dead Reckoning for Autonomous Platforms

PiDR：面向自主平台的物理信息惯性航位推算

Arup Kumar Sahoo, Itzik Klein

发表机构 * Autonomous Navigation and Sensor Fusion Lab (ANSFL)（自主导航与传感器融合实验室（ANSFL））； Hatter Department of Marine Technologies（海洋技术系）； Charney School of Marine Sciences（海洋科学学院）； University of Haifa（海法大学）

AI总结提出PiDR框架，将惯性导航原理作为物理信息残差融入网络训练，在纯惯性导航中减少轨迹漂移，在移动机器人和水下自主航行器数据集上定位精度提升超29%。

Comments 11 pages and 7 figures

详情

AI中文摘要

完全自主的一个基本要求是在缺乏外部数据（如GNSS信号或视觉信息）的情况下维持精确导航的能力。在这些具有挑战性的环境中，平台必须完全依赖惯性传感器，导致纯惯性导航。然而，在现实场景中，惯性传感器的固有噪声和其他误差项会导致导航解随时间漂移。尽管传统的深度学习模型已成为惯性导航的一种可能方法，但它们本质上是黑箱的。此外，它们在有限的监督传感器数据下难以有效学习，并且常常无法保持物理原理。为了解决这些局限性，我们提出了PiDR，一种用于纯惯性导航情况下自主平台的物理信息惯性航位推算框架。PiDR通过物理信息残差组件将惯性导航原理明确地整合到网络训练过程中，从而提供了透明性。即使在有限或稀疏监督下，PiDR在减轻轨迹突然偏差方面也起着关键作用。我们在移动机器人和自主水下航行器收集的真实世界数据集上评估了PiDR。在两个数据集中，我们获得了超过29%的定位改进，证明了PiDR在不同环境和动力学下运行的不同平台上的泛化能力。因此，PiDR提供了一种鲁棒、轻量级且有效的架构，可以部署在资源受限的平台上，在不利场景中实现实时纯惯性导航。

英文摘要

A fundamental requirement for full autonomy is the ability to sustain accurate navigation in the absence of external data, such as GNSS signals or visual information. In these challenging environments, the platform must rely exclusively on inertial sensors, leading to pure inertial navigation. However, the inherent noise and other error terms of the inertial sensors in such real-world scenarios will cause the navigation solution to drift over time. Although conventional deep-learning models have emerged as a possible approach to inertial navigation, they are inherently black-box in nature. Furthermore, they struggle to learn effectively with limited supervised sensor data and often fail to preserve physical principles. To address these limitations, we propose PiDR, a physics-informed inertial dead-reckoning framework for autonomous platforms in situations of pure inertial navigation. PiDR offers transparency by explicitly integrating inertial navigation principles into the network training process through a physics-informed residual component. PiDR plays a crucial role in mitigating abrupt trajectory deviations even under limited or sparse supervision. We evaluated PiDR on real-world datasets collected by a mobile robot and an autonomous underwater vehicle. We obtained more than 29% positioning improvement in both datasets, demonstrating the ability of PiDR to generalize across different platforms operating in various environments and dynamics. Thus, PiDR offers a robust, lightweight, yet effective architecture and can be deployed on resource-constrained platforms, enabling real-time pure inertial navigation in adverse scenarios.
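The abstract does not give the form of the physics-informed residual, so the following is a generic sketch of the idea rather than PiDR's loss. It assumes a single axis, a fixed time step, and constant acceleration within each step; the function names and input layout are hypothetical.

```python
def kinematic_residuals(positions, velocities, accels, dt):
    """Per-step mismatch between predicted positions and constant-acceleration kinematics.

    r_k = p_{k+1} - (p_k + v_k * dt + 0.5 * a_k * dt**2)
    Inputs are lists of scalars along one axis; only the first len(positions) - 1
    entries of velocities and accels are used.
    """
    return [
        positions[k + 1]
        - (positions[k] + velocities[k] * dt + 0.5 * accels[k] * dt ** 2)
        for k in range(len(positions) - 1)
    ]

def physics_loss(positions, velocities, accels, dt):
    """Mean squared kinematic residual, usable as an extra penalty during training."""
    res = kinematic_residuals(positions, velocities, accels, dt)
    return sum(r * r for r in res) / len(res)
```

A term like this is typically added to the supervised loss with a weighting factor, so that trajectories violating the motion equations are penalised even where labels are sparse.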


2602.00510 2026-06-19 cs.AI cs.LG cs.SE 版本更新

一种联合求解具有任意参数和初始分布的瞬态Fokker-Planck方程的深度学习框架

Xiaolong Wang, Jing Feng, Qi Liu, Chengli Tan, Yuanyuan Liu, Yong Xu

发表机构 * School of Mathematics and Statistics, Shaanxi Normal University（陕西师范大学数学与统计学院）； School of Mathematics and Statistics, Northwestern Polytechnical University（西北工业大学数学与统计学院）； MOE Key Laboratory for Complexity Science in Aerospace, Northwestern Polytechnical University（航空复杂科学教育部重点实验室，西北工业大学）； School of Science, Xi’an University of Posts and Telecommunications（西安邮电大学理学院）； Department of Systems and Control Engineering, Institute of Science Tokyo（东京科学大学系统与控制工程系）

AI总结提出基于深度学习的伪解析概率解(PAPS)，通过单次训练同时求解任意多模态初始分布、系统参数和时间点的瞬态FPE，速度比GPU加速蒙特卡洛快四个数量级。

详情

AI中文摘要

高效求解Fokker-Planck方程(FPE)是分析复杂参数化随机系统的核心。然而，当前数值方法缺乏跨不同条件的并行计算能力，严重限制了全面的参数探索和瞬态分析。本文引入一种基于深度学习的伪解析概率解(PAPS)，通过单次训练过程，同时求解任意多模态初始分布、系统参数和时间点的瞬态FPE解。核心思想是通过高斯混合分布(GMD)统一初始、瞬态和稳态分布，并开发一个约束保持自编码器，将受约束的GMD参数双射映射到无约束的低维潜在表示。在该表示空间中，可以建模跨不同初始条件和系统参数的全局瞬态动力学。在典型系统上的大量实验表明，所提出的PAPS在保持高精度的同时，推理速度比GPU加速的蒙特卡洛模拟快四个数量级。这种效率提升使得以前难以实现的实时参数扫描和随机分岔的系统研究成为可能。通过将表示学习与物理信息瞬态动力学解耦，我们的工作为多维参数化随机系统的概率建模建立了一个可扩展的范式。

英文摘要

Efficiently solving the Fokker-Planck equation (FPE) is central to analyzing complex parameterized stochastic systems. However, current numerical methods lack parallel computation capabilities across varying conditions, severely limiting comprehensive parameter exploration and transient analysis. This paper introduces a deep learning-based pseudo-analytical probability solution (PAPS) that, via a single training process, simultaneously resolves transient FPE solutions for arbitrary multi-modal initial distributions, system parameters, and time points. The core idea is to unify initial, transient, and stationary distributions via Gaussian mixture distributions (GMDs) and develop a constraint-preserving autoencoder that bijectively maps constrained GMD parameters to unconstrained, low-dimensional latent representations. In this representation space, the panoramic transient dynamics across varying initial conditions and system parameters can be modeled by a single evolution network. Extensive experiments on paradigmatic systems demonstrate that the proposed PAPS maintains high accuracy while achieving inference speeds four orders of magnitude faster than GPU-accelerated Monte Carlo simulations. This efficiency leap enables previously intractable real-time parameter sweeps and systematic investigations of stochastic bifurcations. By decoupling representation learning from physics-informed transient dynamics, our work establishes a scalable paradigm for probabilistic modeling of multi-dimensional, parameterized stochastic systems.

URL PDF HTML ☆

赞 0 踩 0

2606.10686 2026-06-19 physics.comp-ph astro-ph.IM cs.LG 版本更新

An adaptive framework for the axisymmetric pulsar magnetosphere using physics-informed Kolmogorov-Arnold networks

基于物理信息Kolmogorov-Arnold网络的轴对称脉冲星磁层自适应框架

Spyros Rigas, Ioannis Contopoulos, Georgios Alexandridis, Antonios Nathanail

发表机构 * Department of Digital Industry Technologies, School of Science, National and Kapodistrian University of Athens（雅典国立和卡波迪斯特里安大学理学院数字产业技术系）； Research Center for Astronomy and Applied Mathematics, Academy of Athens（雅典科学院天文与应用数学研究中心）

AI总结提出基于Kolmogorov-Arnold网络的自适应框架，结合自动化训练流程和物理收敛准则，在双精度下将PDE残差均方误差降至O(1e-6)，收敛时间缩短至20分钟内，并可靠解析缩小80%的恒星半径。

Comments 25 pages, 10 figures

详情

AI中文摘要

脉冲星磁层直到最近才通过物理信息神经网络（PINNs）进行研究，采用区域分解方法并将分离线和赤道电流片视为无限薄的间断。然而，这一基线方法需要大量手动超参数调整，最终精度有限且需要数小时训练。我们通过引入基于Kolmogorov-Arnold网络的领域特定神经架构、自动化自适应训练流程以及基于物理的收敛准则来改进这一框架，消除了手动校准的需求。所提出的方法提供了自洽的轴对称磁层解，在双精度下PDE残差的均方误差达到O(1e-6)量级——比基线方法提高了两个数量级——同时在单精度下在20分钟内实现收敛。重要的是，该方法可靠地解析了相比基线缩小高达80%的恒星半径，克服了同样挑战传统求解器的严重空间尺度差异。此外，通过改变开放至无穷远的磁通量，我们提供了将其与赤道T点位置关联的方程的修正。完整框架已作为开源库PulsarX发布。

英文摘要

The pulsar magnetosphere has only recently been addressed using Physics-Informed Neural Networks (PINNs), by deploying a domain-decomposition approach and treating the separatrix and equatorial current sheet as infinitesimally thin discontinuities. However, this baseline requires extensive manual hyperparameter tuning, achieves limited final accuracy and demands several hours of training. We refine this framework by introducing domain-specific neural architectures based on Kolmogorov-Arnold networks, an automated adaptive training pipeline and a physics-based convergence criterion that eliminate the need for manual calibration. The proposed methodology delivers self-consistent axisymmetric magnetosphere solutions with mean squared errors of the PDE residuals at O(1e-6) in double precision - an improvement of two orders of magnitude over the baseline - while achieving convergence in under 20 minutes in single precision. Importantly, the method reliably resolves stellar radii reduced by up to 80% compared to the baseline, overcoming the severe spatial scale disparities that also challenge traditional solvers. Furthermore, by varying the flux that opens to infinity, we provide a correction to the equation that connects it to the equatorial T-point's position. The complete framework is released as the open-source library PulsarX.

URL PDF HTML ☆

赞 0 踩 0

2606.14776 2026-06-19 cs.RO cs.LG 版本更新

Deep Learning-Based Lunar Crater Terrain Relative Navigation

基于深度学习的月球陨石坑地形相对导航

Batu Candan, Simone Servadio

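Affiliations * NASA; University of Texas at Austin

AI summary Proposes a terrain relative navigation algorithm that combines a deep-learning crater detector with an Extended Kalman Filter and reduces navigation error to a few hundred meters even when the initial position is off by up to 5 km.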

Abstract

Accurate position estimation is crucial for the successful implementation of future lunar landings using autonomous vehicles, especially in dangerous environments with sparse terrain features. In this paper, we propose a terrain relative navigation (TRN) algorithm combining our deep-learning crater detector, which was designed specifically for the NASA Crater Detection Challenge problem, and an Extended Kalman Filter (EKF). Our detector analyzes crater features from monocular images acquired from orbit, and their matches with craters from a global database are identified via a Hungarian assignment approach followed by a consensus-based outlier removal method. The estimated measurements are then used to refine an EKF, where spacecraft pose estimation in the Lunar-Centered Lunar-Fixed (LCLF) frame of reference, augmented with altitude aiding information, constrains radial drift. The simulation results indicate that even if the position estimate is off from the actual location by up to 5 km, TRN can recover, reducing the navigation error to a few hundred meters. To maintain crater feature correspondences, it is important to match the image resolution and the scales within the scene to the detector training set distribution.
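The matching step described in this abstract can be illustrated with a small sketch. It is not the authors' implementation: it assumes 2-D crater centers, substitutes a brute-force search over permutations for the Hungarian algorithm (both return a minimum-cost one-to-one assignment, but brute force is only practical for a handful of craters), and uses a simple distance gate as a stand-in for the consensus-based outlier removal. The function name `match_craters`, the coordinates, and the `max_dist` parameter are all hypothetical.

```python
from itertools import permutations
from math import dist


def match_craters(detected, catalog, max_dist):
    """Map each detected crater index to a catalog crater index.

    Finds the one-to-one assignment with minimum total Euclidean distance
    by brute force (a stand-in for the Hungarian algorithm), then drops
    pairs farther apart than max_dist (a stand-in for outlier removal).
    Assumes len(detected) <= len(catalog).
    """
    best_cost, best_perm = float("inf"), None
    for perm in permutations(range(len(catalog)), len(detected)):
        cost = sum(dist(detected[i], catalog[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_perm = cost, perm
    return {
        i: j
        for i, j in enumerate(best_perm)
        if dist(detected[i], catalog[j]) <= max_dist
    }


# Hypothetical crater centers in an arbitrary common frame.
detected = [(10.2, 5.1), (3.9, 8.0), (50.0, 50.0)]
catalog = [(4.0, 8.0), (10.0, 5.0), (20.0, 20.0), (7.0, 1.0)]

matches = match_craters(detected, catalog, max_dist=1.0)
print(matches)
```

The third detection has no nearby catalog crater, so the gate discards it; in a filter such as the EKF mentioned above, only the surviving pairs would be passed on as measurements.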

2606.19149 2026-06-19 cs.CR cs.LG Updated version

OpenAnt: LLM-Powered Vulnerability Discovery Through Code Decomposition, Adversarial Verification, and Dynamic Testing

Nahum Korda, Gadi Evron

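AI summary Presents OpenAnt, a system that combines static analysis with LLM reasoning in a three-stage pipeline of code decomposition, adversarial verification, and dynamic testing, discovering previously unknown vulnerabilities while lowering the false-positive rate.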

Abstract

Automated vulnerability discovery in large codebases remains challenging: traditional static analysis produces high false-positive rates, while dynamic approaches such as fuzzing require substantial infrastructure and often target narrow classes of bugs. Recent advances in large language models (LLMs) enable semantic reasoning about program behavior, but applying LLMs to repository-scale security analysis introduces challenges related to context management, cost, and verification. We present OpenAnt, an open-source vulnerability discovery system that integrates static program analysis with LLM-based reasoning in a multi-stage pipeline. OpenAnt introduces three key techniques. First, codebases are decomposed into self-contained analysis units filtered by reachability from external entry points, reducing the analysis surface by up to 97% while preserving attack-relevant code. Second, candidate vulnerabilities undergo adversarial verification through constrained attacker simulation, where the model evaluates exploitability under realistic attacker capabilities. Third, findings are validated through dynamic verification, in which exploit environments are generated automatically, executed in sandboxed containers, and discarded after use. Evaluation on widely used open-source projects including OpenSSL, WordPress, and Flowise shows that this architecture can identify previously unknown vulnerabilities while maintaining manageable analysis cost and substantially reducing false positives. Our results suggest that closed-loop vulnerability discovery pipelines, combining semantic reasoning with exploit validation, provide a practical path toward scalable automated security analysis. OpenAnt is released as open source under the Apache 2.0 license at https://github.com/knostic/OpenAnt.

2606.19186 2026-06-19 cs.RO cs.LG Updated version

Learning to Annotate Delayed and False AEB Events: A Practical System for Extreme Class Imbalance and Asymmetric Label Noise

Mengxiang Hao, Xin Jiang, Xinghao Huang, Wenliang Su, Zhiteng Wang, Junjie Rao, Xiaotian Yang, Wei Liao, Chengyu Han, Gen Liang, Yulun Song, Zhitao Xu, Xianpeng Lang

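Affiliations * Li Auto

AI summary Presents the first automated AEB annotation framework, which uses specific data augmentation and noise suppression to address extreme class imbalance and asymmetric label noise, improving recall of delayed/false triggers by 80% and reducing manual workload by 50%.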

Comments 8 pages, 5 figures, accepted by IEEE International Conference on Robotics and Automation (ICRA)

Journal ref 2026 IEEE International Conference on Robotics and Automation (ICRA)

Abstract

Autonomous Emergency Braking (AEB) optimization relies on accurately annotated real-world trigger events, particularly rare but critical delayed and false AEB triggers that expose system deficiencies. However, these minority samples comprise less than 5% of thousands of daily triggers, making manual annotation prohibitively expensive at scale. We present the first automated AEB annotation framework to address this problem. During development, we identified two fundamental challenges that severely impair delayed/false trigger annotation accuracy: (1) extreme class imbalance, where delayed/false triggers are overwhelmed by true triggers; (2) asymmetric label noise, where mislabeled majority samples (true triggers) suppress the learning of minority samples (delayed/false triggers). To overcome these challenges, we propose two key innovations: (1) specific data augmentation that synthesizes realistic samples by manipulating focal target attributes, transplanting ego-vehicle dynamics, and masking non-focal agents; (2) noise suppression using stable hardness estimation and a probe-guided adaptive threshold to clean mislabeled true trigger samples. Crucially, we deploy our model as a practical annotation system with full-stack architecture, efficiently identifying critical delayed/false triggers from thousands of daily AEB events. Production results demonstrate an 80% improvement in recall of delayed/false triggers and a 50% reduction in manual workload. Beyond immediate gains, the system enables continuous self-improvement through accumulated high-quality annotations, establishing a necessary data foundation for on-vehicle AEB system optimization.

2606.19610 2026-06-19 cs.LG cs.AI New submission

Latent Confounded Causal Discovery via Lie Bracket Geometry

Sridhar Mahadevan

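Affiliations * Adobe Research; University of Massachusetts, Amherst

AI summary Drawing on information geometry and category theory, proposes two algorithms (BRIDGE and SKFM) that detect latent confounding through the failure of intervention-induced flows to close under Lie brackets, sharply shrinking the causal-graph search space.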

Comments 39 pages

Abstract

Recent work on Kan-Do-Calculus (KDC) has established that the boundary between passive observation and active intervention in causal inference is a category-theoretic bi-adjunction, with interventions modeled by left Kan extensions and conditioning by right Kan extensions. This paper introduces two causal discovery algorithms under latent confounding, building on the information-geometric and categorical consequences of KDC. In smooth statistical settings, Radon-Nikodym derivatives between observational and interventional measures induce local causal vector fields; failures of these fields to close under Lie brackets become computable Frobenius residuals, which we interpret as witnesses of failed visible integrability and possible latent or unmodeled structure. Our first algorithm, BRIDGE (Bracket Residuals for Interventional Discovery and Geometric Estimation), combines an interventional density or Radon-Nikodym-ratio engine with a geometric screen that proposes a high-recall family of admissible arrows, identifies non-closing visible pairs as latent-obstruction candidates, and passes the reduced family to downstream score-based or differentiable discovery routines. The second algorithmic contribution, Spectral Kan-Do Flow Matching (SKFM), learns amortized intervention fields and factors latent curvature spectrally, exposing the direct Lie-space endpoint toward which BRIDGE points. A detailed set of experiments shows that both algorithms are capable of discovering causal models with latent confounders while collapsing the super-exponential space of possible DAGs by many orders of magnitude. This paper introduces a new paradigm in causal discovery, where latent structure is inferred directly from the geometry of intervention-induced flows.

2606.20493 2026-06-19 cs.LG cs.AI cs.MA New submission

Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems

Zewen Liu

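Affiliations * Qilu Institute of Technology, School of Software Engineering

AI summary Proposes the Contagion Networks framework for quantifying how evaluator bias propagates in multi-agent LLM systems, finding contagion coefficients of 0.157-0.352 between same-model agents and a 72.4% reduction in effective contagion when the evaluator committee is enlarged from k=1 to k=3.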

Comments 20 pages, 4 figures, 4 tables

Abstract

When large language models serve as evaluators in multi-agent systems, their systematic evaluation biases propagate through the agent network. We introduce Contagion Networks, a formal framework for measuring how evaluator biases spread across interacting LLM agents. In a controlled 3-agent experiment using DeepSeek-chat with three distinct evaluator bias profiles (structured, balanced, evidence-based), we measure the Cross-Agent Contagion Matrix Gamma_3 and find that evaluator biases consistently propagate between agents (gamma in [0.157, 0.352]), even within the same underlying model. We identify three propagation regimes governed by the spectral radius rho(Gamma_N), and demonstrate that homogeneous-model agents produce contagion coefficients 3-5x weaker than cross-model coefficients observed in prior work (MM-EPC: gamma approx 0.85-1.3), placing them in the suppression regime. We show that increasing evaluator committee size from k=1 to k=3 reduces effective contagion by 72.4%, providing an actionable mitigation strategy. We release the open-source Contagion Network experimental framework.
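The role of the spectral radius rho(Gamma_N) mentioned in this abstract can be sketched numerically. The example below is only an illustration under stated assumptions: the matrix entries are made-up values picked inside the reported range [0.157, 0.352], the zero diagonal is an assumption, and the threshold of 1 separating the suppression regime from the others is a guess based on standard linear-propagation arguments, since the abstract does not state the regime boundaries. The names `GAMMA_3`, `spectral_radius`, and `regime` are hypothetical.

```python
# Hypothetical 3-agent contagion matrix; entries are illustrative values
# inside the reported range [0.157, 0.352], not measured data.
GAMMA_3 = [
    [0.0, 0.157, 0.352],
    [0.210, 0.0, 0.300],
    [0.250, 0.180, 0.0],
]


def spectral_radius(matrix, iterations=200):
    """Estimate the dominant eigenvalue magnitude by power iteration.

    Suitable for the nonnegative matrix used here; a general matrix
    would call for a full eigenvalue solver.
    """
    n = len(matrix)
    v = [1.0] * n
    rho = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        rho = max(abs(x) for x in w)
        if rho == 0.0:
            return 0.0
        v = [x / rho for x in w]  # renormalize so the largest entry is 1
    return rho


def regime(rho, threshold=1.0):
    """Assumed rule: bias dies out across hops when rho is below threshold."""
    return "suppression" if rho < threshold else "non-suppression"


rho = spectral_radius(GAMMA_3)
print(f"rho = {rho:.3f}, regime = {regime(rho)}")
```

With every coefficient well below 0.5 and only two incoming edges per agent, each row sum stays under 1, which bounds the spectral radius below 1 and is consistent with the suppression regime the abstract reports for same-model agents.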

2606.20560 2026-06-19 cs.LG cs.AI New submission

How Transparent is DiffusionGemma?

Joshua Engels, Callum McDougall, Bilal Chughtai, Janos Kramar, Senthoran Rajamanoharan, Cindy Wu, Arthur Conmy, Asic Q Chen, Jean Tarbouriech, Min Ma, Brendan O'Donoghue, João Gabriel Lopes de Oliveira, Rohin Shah, Neel Nanda

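Affiliations * Google

AI summary Studies the reasoning transparency of DiffusionGemma, which computes in a continuous latent space, by decomposing transparency into variable transparency and algorithmic transparency; finds that an interpretable token bottleneck lowers opaque serial depth to 1.1X that of Gemma 4 and uncovers diffusion-specific phenomena.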

Comments 20 main text pages and 6 pages of references and appendices

Abstract

LLM reasoning transparency is a critical affordance for understanding model decisions, mitigating misuse and misalignment, and debugging surprising model behaviors. However, DiffusionGemma performs a larger fraction of its computation in a continuous latent space; does this make its reasoning less transparent? We study this question by decomposing transparency into two components: variable transparency, whether we understand intermediate snapshots of a model's computational state; and algorithmic transparency, whether we can use these snapshots to reconstruct the process by which the model arrived at its outputs. Naively, DiffusionGemma has poor variable transparency: its opaque serial depth, the amount of serial computation that occurs in between interpretable model states, seems at first 28.6X higher than the corresponding autoregressive Gemma 4 model. However, we show that we can map the information flowing between denoising steps through an interpretable token bottleneck with no decrease in downstream performance. Treating these intermediate states as interpretable reduces the opaque serial depth to just 1.1X that of Gemma 4. Algorithmic transparency is harder for diffusion models than for autoregressive models because all token predictions in the canvas can change at every denoising step, giving the model the power to implement complicated distributed algorithms during the denoising process. To begin bridging this gap, we conduct a suite of interpretability case studies, uncovering initial evidence of novel diffusion-specific phenomena such as non-chronological reasoning, token and sequence smearing, and intermediate-context reasoning. Finally, we test monitorability, a key application of transparency that measures whether model outputs are useful for downstream tasks. We find that DiffusionGemma is similarly monitorable to Gemma 4.

2606.19386 2026-06-19 cs.SE cs.AI cs.LG Cross-list

Policy-Aware Vector Search: A Vision for Fine-Grained Access Control in Vector Databases

Lakshmi Sahithi Yalamarthi, Primal Pappachan

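AI summary Presents a vision for policy-aware vector search, formalizing the fine-grained access control (FGAC) policy model and the enforcement problem in vector databases, comparing enforcement strategies, and identifying future challenges.

Comments Accepted at SeQureDB 26, SIGMOD 2026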

Abstract

Vector databases are increasingly used in security-sensitive contexts such as Retrieval Augmented Generation and organizational AI pipelines; however, their security capabilities remain limited. Specifically, Fine-grained Access Control (FGAC), which is required to ensure that data access adheres to user-specific policies, is not fully supported in modern vector databases. Unlike relational databases, vector databases combine structured and unstructured attributes to provide semantic, approximate query results, which complicates FGAC implementation. This creates an inherent tension between enforcing FGAC policies correctly, achieving high ANN search recall, and maintaining low query latency. In this paper, we present a vision for Policy-aware Vector Search by formalizing the FGAC policy model in vector databases as well as the enforcement problem. We compare various enforcement strategies, present preliminary findings, and identify key open challenges for future research in policy-aware vector search.

2505.22829 2026-06-19 cs.LG cs.AI Updated version

Bridging Distribution Shift and AI Safety: Conceptual and Methodological Synergies

Chenruo Liu, Kenan Tang, Yao Qin, Qi Lei

Affiliations * Center for Data Science, New York University; Computer Science Department, University of California, Santa Barbara; Department of Electrical & Computer Engineering, University of California, Santa Barbara; Courant Institute for Mathematical Sciences & Center for Data Science, New York University

AI summary By analyzing conceptual and methodological synergies between distribution shift and AI safety, this paper establishes two kinds of connections between specific shift types and fine-grained safety issues, fostering deeper integration of the two research areas.

Comments 35 pages

2406.02421 2026-06-19 cs.DM cs.LG cs.SC Updated version

Representing Piecewise-Linear Functions by Functions with Minimal Arity

Christoph Koutschan, Anton Ponomarchuk, Josef Schicho

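Affiliations * Johann Radon Institute for Computational and Applied Mathematics; Research Institute for Symbolic Computation; Johannes Kepler University

AI summary Studies the minimal number of arguments needed to represent a continuous piecewise-linear function as a linear combination of max functions, establishing a direct link between the space decomposition induced by the function and the number of arguments required.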

2509.03122 2026-06-19 cs.CL cs.AI cs.LG Updated version

From Construction to Injection: Edit-Based Fingerprints for Large Language Models

Yue Li, Xin Yi, Dongsheng Shi, Yongyi Cui, Gerard de Melo, Linlin Wang

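Affiliations * East China Normal University; Hasso Plattner Institute/University of Potsdam

AI summary Proposes an end-to-end injected fingerprinting framework that uses code-mixing fingerprints and multi-candidate editing to address the imperceptibility and robustness challenges of fingerprints in black-box deployment.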

Comments preprint

Abstract

Reliable model fingerprints are essential for protecting large language models (LLMs) against unauthorized redistribution and commercial misuse. In black-box deployment, verification is hindered by defensive filtering of suspected fingerprint queries, as well as by downstream model modifications that may weaken embedded ownership evidence. These risks require fingerprints to be robust in both construction and injection. For construction, prior paradigms face an imperceptibility trade-off: natural-language fingerprints may be accidentally activated, whereas garbled fingerprints are statistically exposed and easier to filter. For injection, existing methods struggle to preserve persistent trigger-target behaviors under model modification. We propose an end-to-end injected fingerprinting framework to address these challenges. Code-mixing Fingerprints (CF) use lowest-perplexity code-mixing under a high-complexity constraint to mitigate this two-sided imperceptibility trade-off. Multi-Candidate Editing (MCEdit) constructs structurally redundant, margin-separated trigger-target mappings to enable graceful degradation under model modification. Extensive evaluations on imperceptibility, detectability, and harmlessness demonstrate robust ownership verification with negligible impact on utility.

2605.20531 2026-06-19 cs.LO cs.LG Updated version

Pseudo-Formalization for Automatic Proof Verification

Slim Barkallah, Luke Bailey, Kaiyue Wen, Mohammed Abouzaid, Tengyu Ma

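Affiliations * GitHub

AI summary Proposes Pseudo-Formalization, a proof format that keeps the flexibility of natural language while retaining the modularity and precision of formal proofs; combined with the Block Verification algorithm, it verifies natural-language proofs and outperforms existing baselines on error-finding precision and recall.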

Comments 31 pages, code available at https://github.com/Slim205/pseudo-formalization

Abstract

Reliable verification of proofs remains a bottleneck for training and evaluating AI systems on hard mathematical reasoning. Fully formal proofs, in languages like Lean, are easy to verify because they are unambiguous and modular. Most proofs, particularly those written by AI systems, have neither property, and translating them into formal languages remains challenging in many frontier math settings. We propose Pseudo-Formalization (PF), a proof format that captures the modularity and precision of formal proofs while retaining the flexibility of natural language. A Pseudo-Formal proof is decomposed into self-contained modules, each stating its premises, conclusion, and proof in natural language. To verify the correctness of a regular natural language proof, an LLM translates it to Pseudo-Formal and then verifies each module independently, an algorithm we call Block Verification (BV). We evaluate PF+BV on two benchmarks spanning olympiad and research-level mathematics, where it Pareto-dominates LLM-as-judge baselines on error-finding precision and recall. To support future work, we release our research-level proof verification benchmark ArxivMathGradingBench.

2407.11933 2026-06-19 cs.LG Updated version

Fairness-Aware Multi-Group Target Detection in Online Discussion

Soumyajit Gupta, Maria De-Arteaga, Matthew Lease

Affiliations * Dept. of Computer Science, The University of Texas at Austin; Department of Data, Analytics, Technology, and Artificial Intelligence, ESADE; The Information School, The University of Texas at Austin

Journal ref 2026 ACM Conference on Fairness, Accountability, and Transparency (FAccT)

2602.05416 2026-06-19 cs.CE cs.AI cs.LG physics.ao-ph physics.flu-dyn Updated version

Reduced-Order Surrogates for Forced Flexible Mesh Coastal-Ocean Models

Freja Høgholm Petersen, Jesper Sandvig Mariegaard, Rocco Palmitessa, Allan P. Engsig-Karup

Affiliations * DTU

Comments Submitted for peer-review in a journal. v2: revised version submitted to journal after minor revisions

2601.12433 2026-06-19 eess.SP cs.LG Updated version

Temporal Data and Short-Time Averages Improve Multiphase Mass Flow Metering

Amanda Nyholm, Yessica Arellano, Jinyu Liu, Damian Krakowiak, Pierluigi Salvo Rossi

Affiliations * Dept. Electronic Systems, Norwegian University of Science and Technology; Dept. Gas Technology, SINTEF Energy Research; Dept. Research and Development, KROHNE Ltd.

Comments 9 pages, 6 figures

Journal ref IEEE Sensors Journal, vol. 26, no. 11, pp. 17252-17261, 1 June 2026

2506.23396 2026-06-19 stat.ML cs.LG Updated version

AICO: Feature Significance Tests for Supervised Learning

Kay Giesecke, Enguerrand Horel, Chartsiri Jirachotkulthorn

Affiliations * Stanford University, Department of Management Science and Engineering and Institute for Computational and Mathematical Engineering; Upstart, Inc.; Stanford University, Institute for Computational and Mathematical Engineering

2412.20298 2026-06-19 cs.LG cs.CY stat.ML Updated version

An Experimental Study on Fairness-aware Machine Learning for Credit Scoring Problems

Huyen Giang Thi Thu, Thang Viet Doan, Ha-Bang Ban, Tai Le Quy

Affiliations * Banking Academy of Vietnam; Vietnam Academy of Science and Technology; Hanoi University of Science and Technology; University of Koblenz

Comments The manuscript is submitted to Springer Nature's journal

2602.14239 2026-06-19 cs.SI cs.AI cs.LG Updated version

A Hybrid TGN-SEAL Model for Dynamic Graph Link Prediction

Nafiseh Sadat Sajadi, Behnam Bahrak, Mahdi Jafari Siavoshani

Affiliations * Department of Computer Engineering, Sharif University of Technology; Tehran Institute for Advanced Studies, Khatam University

Journal ref EPJ Data Science (2026)

2510.05013 2026-06-19 stat.ML cs.LG Updated version

Curiosity-Driven Development of Action and Language in Robots Through Self-Exploration

Theodore Jerome Tinker, Kenji Doya, Jun Tani

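Affiliations * Okinawa Institute of Science and Technology

AI summary Through curiosity-driven robot self-exploration, with active inference amortized via Q-learning, this study reveals compositional generalization, faster learning, rote pairing that precedes composition, and a U-shaped developmental pattern caused by exception handling, offering an explanation for efficient language acquisition in humans.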

Comments 27 pages, 22 pages of supplementary material

Abstract

Infants acquire language with generalization from minimal experience, whereas large language models require billions of training tokens. What underlies efficient development in humans? We investigated this problem through experiments wherein robotic agents learn to perform actions associated with imperative sentences (e.g., push red cube) via curiosity-driven self-exploration. Our approach amortizes active inference using Q-learning, enabling intrinsically motivated developmental learning. The simulations reveal key findings corresponding to observations in developmental psychology. i) Generalization improves drastically as the scale of compositional elements increases. ii) Curiosity-driven exploration enables faster learning. iii) Rote pairing of sentences and actions precedes compositional generalization. iv) Exception-handling induces U-shaped developmental performance, a pattern like representational redescription in child language learning. These results suggest that curiosity-driven active inference accounts for how intrinsically motivated sensorimotor-linguistic learning supports scalable compositional generalization and exception handling in humans and artificial agents.

1. Deep Learning Architectures and Training Methods 31 papers

How Linear Is a Transformer Feed-Forward Block? Per-Block Linear Recoverability Is Learned, Not Architectural

Concept Flow Models: Anchoring Concept-Based Reasoning with Hierarchical Bottlenecks

Efficiently Representing Algorithms With Chain-of-Thought Transformers

Learning universal approximations for partial differential equations with Physics-Informed Broad Learning System

Neural Additive and Basis Models with Feature Selection and Interactions

Physics-Informed Neural Network with Squeeze-Excitation-like Attention

Compositionality Emerges in a Narrow Depth-Connectivity Regime: Architecture Constraints and Solution Manifolds

Kolmogorov-Arnold Reservoir Computing

Shifting-based Optimizable Linear Relaxations for General Activation Functions

Evolutionary Two-Stage Hyperparameter Optimization Strategies for Physics-Informed Neural Networks

The Token Is a Group Element: On Lie-Algebra Attention over Matrix Lie Groups

Neural Architectures as Functional Priors in Physics-Informed Control Problems

ITNet: A Learnable Integral Transform That Subsumes Convolution, Attention, and Recurrence

GB-LSR: A Fast Local Spectral Image Representation with a Single Global Bandwidth for Continuous Reconstruction and Super-Resolution

Token Factory: Efficiently Integrating Diverse Signals into Large Recommendation Models

Leverage Is Not Reach: A Control-Window Law for Single-Neuron Steering in Language Models

GEMS: Geometric Constraints Enable Multi-Semantic Superposition in LLMs

PU-UNet: Stable Multiplicative Interactions for Medical Image Segmentation

Wisdom of Committee: Diverse Distillation from Large Foundation Models and Domain Experts

A Unified Perspective on the Dynamics of Deep Transformers

Linear Mode Connectivity under Data Shifts for Deep Ensembles of Image Classifiers

Model soups need only one ingredient

Reversible Residual Normalization Alleviates Spatio-Temporal Distribution Shift

Minimal Filling Architectures of Polynomial Neural Networks: Counterexamples, Frontier Search, and Defects

DisjunctiveNet: Neural Symbolic Learning via Differentiable Convexified Optimization Layers

RepNN: Tackling spectral bias in deep neural networks via parameter reparameterization

From Drift to Coherence: Stabilizing Beliefs in LLMs

Monotonic Kolmogorov-Arnold Networks: A Theoretical and Empirical Study of Monotonicity as an Inductive Bias

Toward all-optical unsupervised Hebbian learning in deep photonic neuromorphic networks

Higher-Order Token Interactions via Quantum Attention

QC-GAN: A Parameter-Efficient Quaternion Conformer GAN for High-Fidelity Speech Enhancement

2. Representation Learning, Self-Supervised Learning, and Contrastive Learning 11 papers

FlexLAM: Resolving the Bottleneck Trade-off in Latent Action Learning

3D-DLP: Self-Supervised 3D Object-Centric Scene Representation Learning

Tracking Representation Dynamics in Large Language Models with Persistent Homology

Unsupervised Causal Abstractions Discovery

When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

SL-S4Wave: Self-Supervised Learning of Physiological Waveforms with Structured State Space Models

Multi-Modal Contrastive Learning for Implicit Earth Embeddings via Location Tying

Multimodal Concept Bottleneck Models

UNIEGO: Proxies as Mediators for Unified Egocentric Video Representation Learning

Self-attention-based non-linear basis transformations for compact latent space modelling of dynamic optical fibre transmission matrices

Adversarial Dependence Minimization

3. Reinforcement Learning and Sequential Decision-Making 35 papers

Human-like autonomy emerges from self-play and a pinch of human data

Can In-Context Learning Support Intrinsic Curiosity?

Multi-Granular Attention-Driven Reinforcement Learning Framework for Web Intelligent Enhancement Systems

OnDeFog: Online Decision Transformer under Frame Dropping

Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models

Matching Markets meet Cumulative Prospect Theory: Towards Optimal and Adversarially Robust Learning

Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning

VIMPO: Value-Implicit Policy Optimization for LLMs

Hierarchical Control in Multi-Agent Games: LLM-based Planning and RL Execution

Sensorimotor World Models: Perception for Action via Inverse Dynamics

Quantile of Means: A Bonus-Free Ensemble Method for Minimax Optimal Reinforcement Learning

Direct Advantage Estimation for Scalable and Sample-efficient Deep Reinforcement Learning

Marginal Advantage Accumulation for Memory-Driven Agent Self-Evolution

Formal Verification of Learned Multi-Agent Communication Policies via Decision Tree Distillation

DF-ExpEnse: Diffusion Filtered Exploration for Sample Efficient Finetuning

Deep-Unfolded Coordination

Stochastic Linear Contextual Bandits with Bounded Noise: A Set-Membership Approach

Off-Policy Evaluation for Missingness-Aware Policies in MDPs with Rewards Missing Not at Random

A Multi-Agent system for Multi-Objective constrained optimization

A Model-Driven Approach for Developing Families of Reinforcement Learning Environments

Robust $Q$-learning for mean-field control under Wasserstein uncertainty in common noise

Enhancing Generative Auto-bidding with Offline Reward Evaluation and Policy Search

EQPO: Equitable Group Relative Policy Optimization for Clinical Reasoning

Beyond Reasoning Gains: Mitigating General-Capability Forgetting in Large Reasoning Models

Stabilizing the Q-Gradient Field for Policy Smoothness in Actor-Critic Methods

DADP: Domain Adaptive Diffusion Policy

Flickering Multi-Armed Bandits

Approximate Next Policy Sampling: Replacing Conservative Target Policy Updates in Deep RL

Large Language Models Hack Rewards, and Society

StarOR: Synergizing Tree Search and Test-Time Reinforcement Learning for Optimization Modeling

Reinforcement Learning Foundation Models Should Already Be A Thing

Reinforcement Twinning for Hybrid Control of Flapping-Wing Drones

Oranits: Mission Assignment and Task Offloading in Open RAN-based ITS using Metaheuristic and Deep Reinforcement Learning

Utility-Aware DRL-Based TXOP Adaptation for NR-U and Wi-Fi Coexistence Networks

Superhuman Safe and Agile Racing through Multi-Agent Reinforcement Learning