2606.09907 2026-06-10 cs.LG cs.AI new submission

LongMoE: Longitudinal Multimodal Learning via Trajectory-Aware Mixture-of-Experts

Maxx Richard Rahman, Prakhar Kumar, Wolfgang Maass

AI summary: Proposes the LongMoE framework, which jointly addresses the two main challenges of clinical multimodal learning, missing modalities and longitudinal dynamics, through context-aware imputation, attentional tokenization, trajectory-aware encoding, and sparse MoE routing; robustness is validated on ADNI and other datasets.

Abstract

Multimodal clinical learning is increasingly important for integrating diverse patient data, including imaging, text, and personalised health records. However, it faces two fundamental challenges: (i) modality missingness, where arbitrary subsets of modalities are unavailable at a given patient visit; and (ii) longitudinal dynamics, where the diagnostic significance of an observation depends on the patient's evolving disease trajectory over time. Existing methods address these challenges in isolation: missing-modality frameworks treat each visit as an independent static snapshot and discard temporal context, while longitudinal models often assume complete modality availability and degrade under systematic modality incompleteness. We propose LongMoE (Longitudinal Mixture-of-Experts), a unified framework to jointly address both challenges. LongMoE combines a context-aware imputation module with an attentional tokenization module that captures frequency-domain temporal patterns across irregular visit sequences, a trajectory-aware encoder for modeling disease progression, and context-conditioned Sparse MoE routing for patient-specific expert selection. Experiments on ADNI, OASIS-3, and MIMIC-IV show that LongMoE improves robustness under missing or weak contemporaneous modalities and remains competitive in full-modality settings, establishing a strong foundation for longitudinally aware multimodal clinical learning.
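
The sparse MoE routing step named in the abstract can be illustrated with a generic top-k gate. This is a minimal sketch of standard sparse routing, not the LongMoE implementation: the function names, the gate logits, and the choice k=2 are all assumptions made for illustration, and the abstract does not say how the patient context produces the logits.

```python
import math

def top_k_route(gate_logits, k=2):
    """Return {expert_index: weight} for the k highest-scoring experts."""
    # Rank experts by logit, breaking ties in favor of the lower index.
    ranked = sorted(range(len(gate_logits)), key=lambda i: (-gate_logits[i], i))
    selected = ranked[:k]
    # Softmax restricted to the selected experts (shifted for stability).
    peak = max(gate_logits[i] for i in selected)
    exps = {i: math.exp(gate_logits[i] - peak) for i in selected}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}

def moe_output(expert_outputs, weights):
    """Weighted sum of the selected experts' output vectors."""
    dim = len(expert_outputs[0])
    return [sum(w * expert_outputs[i][d] for i, w in weights.items())
            for d in range(dim)]

# Hypothetical gate logits for one patient context and four experts.
weights = top_k_route([0.1, 2.0, -1.0, 2.0], k=2)
outputs = [[1.0, 0.0], [0.0, 2.0], [5.0, 5.0], [4.0, 0.0]]
mixed = moe_output(outputs, weights)
```

Only the selected experts contribute to the output, so the cost per visit depends on k rather than on the total number of experts.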

2606.09917 2026-06-10 cs.LG new submission

SPDM: Geometry-Modulated State Space Modeling with Manifold Constraints for Time Series Forecasting

Xingsheng Chen, Siu-Ming Yiu

AI summary: Proposes SPDM, a geometry-aware architecture that introduces symmetric positive definite manifold constraints into state-space models, modulating the selective scan through a manifold trajectory and a geometric gating mechanism, and improving multivariate time-series forecasting accuracy while keeping linear complexity.

Abstract

Multivariate time series forecasting requires capturing the continuously evolving correlation structure among interacting variables. Existing state-space models process time series by scanning tokenized temporal or spatial sequences, discarding the evolutionary geometric structure. We address this limitation by introducing manifold constraints into state-space modeling: treating the cross-variable correlation structure as a continuous trajectory on the symmetric positive definite manifold, whose Riemannian geometric features, tangent space linearity, and Fréchet mean centrality act as a principled geometric regularizer that guides and stabilizes the selective scanning dynamics of SSMs. We propose SPDM, a geometry-aware SSM architecture that realizes this principle through two cooperating mechanisms: a manifold trajectory path that projects dynamically evolving covariance matrices from the SPD manifold to a Euclidean tangent space, and a geometric gating scheme that directly modulates the SSM's internal selective parameters based on geometric signals derived from the manifold trajectory. The parameterization preserves the linear-time complexity of the Mamba parallel scan while embedding rich structural constraints, so the architecture retains both prediction accuracy and computational efficiency. Extensive experiments on eleven real-world benchmark datasets establish state-of-the-art forecasting performance, and further studies confirm that geometrically constrained state-space dynamics are the dominant architectural factor behind its performance gains.
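
For readers unfamiliar with the tangent-space projection mentioned above, the standard formulas under the affine-invariant Riemannian metric are sketched below. Whether SPDM uses this metric, and which reference point $M$ it chooses, are assumptions here; the abstract does not say. $C_t$ denotes the covariance matrix at step $t$.

```latex
% Matrix logarithm of an SPD matrix via its eigendecomposition C = U diag(lambda_i) U^T
\log(C) = U \,\operatorname{diag}(\log \lambda_1, \dots, \log \lambda_n)\, U^{\top}

% Logarithmic map: projects C_t onto the tangent space at the reference point M
\operatorname{Log}_{M}(C_t) = M^{1/2} \log\!\left( M^{-1/2}\, C_t\, M^{-1/2} \right) M^{1/2}

% Affine-invariant distance and the Fréchet mean of a trajectory C_1, ..., C_T
d(A, B) = \left\lVert \log\!\left( A^{-1/2} B A^{-1/2} \right) \right\rVert_F ,
\qquad
\bar{C} = \arg\min_{M \succ 0} \sum_{t=1}^{T} d^{2}(M, C_t)
```

Taking $M = \bar{C}$ centers the trajectory, so the projected matrices are symmetric and can be vectorized and processed with ordinary Euclidean operations.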

2606.09928 2026-06-10 cs.LG cs.AI new submission

Forward-Only Convolutional Neural Networks with Learnable Channel-Class Assignment

Mohammadnavid Ghader, Saeed Reza Kheradpisheh, Bahar Farahani, Mahmood Fazlali

AI summary: Proposes a learnable channel-class assignment mechanism, combined with entropy and orthogonality regularization and a loss-aware layer contribution strategy based on validation performance, to realize forward-only learning in residual CNNs; it achieves the best performance among FF models on CIFAR-10/100 and Tiny-ImageNet and narrows the gap with backpropagation.

Abstract

The Forward-Forward (FF) algorithm offers a biologically inspired alternative to backpropagation by replacing gradient-based credit assignment with local, forward-only objectives. While recent extensions have adapted FF to convolutional neural networks (CNNs), existing formulations rely on static channel-class partitions and struggle to perform effectively in complex tasks. In this work, we introduce a learnable channel-class assignment mechanism that enables adaptive, data-driven specialization of convolutional channels, supported by entropy and orthogonality regularization to promote learning performance. We further propose a loss-aware layer contribution strategy that adaptively weights intermediate-layer predictions based on their validation performance, enhancing the effectiveness of forward-only inference. Integrated into residual CNNs, the proposed method achieves consistently superior performance across CIFAR-10, CIFAR-100, and Tiny-ImageNet compared to existing similar forward-only methods. Notably, it establishes new state-of-the-art performance among FF-based models, substantially narrowing the gap with backpropagation. These findings demonstrate that introducing learnable channel specialization and layer contribution weighting significantly enhances the representational capacity of forward-only learning in deep CNNs.

2606.09951 2026-06-10 cs.LG new submission

Hasse Diagrams for Attention: A Partial Order Framework for Designing Transformer Masks

Chentao Li, Han Guo

AI summary: Presents a theoretical framework proving that the information flow of a multi-layer Transformer converges to a Hasse diagram, and recasts the design of parallel training tasks as finding the minimal common supergraph of Hasse diagrams, from which two new attention masks are derived.

Comments: 21 pages, 9 figures. Theoretical framework for attention mask design; no experiments included

Abstract

During the training of large Transformer models, attention masks regulate the scope and direction of information flow across a sequence. Numerous mask variants exist, and operators such as FlexAttention already support arbitrary attention masks. Nevertheless, a systematic formal analysis of the information-flow structure induced by arbitrary masks has been missing. This paper develops a complete theoretical framework. We prove that, with sufficient depth, the information flow of a multi-layer Transformer converges to a Hasse diagram -- a directed acyclic graph representing a partial order. Building on this, we recast the design of parallel training tasks as the problem of finding a minimal common supergraph of Hasse diagrams, and we establish a criterion for the minimal common supergraph. This yields a constructive method to derive attention masks directly from a family of tasks. Applying the framework, we design two novel masks: a block-generation attention mask that ensures training-inference consistency (Block Two-Stream Attention), and a fully supervised bidirectional attention mask (Butterfly Attention). These results demonstrate the framework's capacity to discover new structures.
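
To make the term concrete: the Hasse diagram of a partial order keeps only the covering edges, that is, the edges that cannot be obtained by chaining two others. The sketch below computes it for an acyclic attention mask given as a list of edges `(j, i)` meaning "token i may read token j". The edge convention and the function names are assumptions for illustration; this is the textbook transitive reduction, not the paper's construction or its convergence proof.

```python
def transitive_closure(n, edges):
    """reach[a][b] is True when information can flow from a to b."""
    reach = [[False] * n for _ in range(n)]
    for a, b in edges:
        reach[a][b] = True
    for k in range(n):                      # Floyd-Warshall style closure
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

def hasse_edges(n, edges):
    """Covering edges of the partial order generated by an acyclic edge list."""
    reach = transitive_closure(n, edges)
    cover = set()
    for a in range(n):
        for b in range(n):
            # Keep (a, b) only if no intermediate c sits strictly between them.
            if reach[a][b] and not any(reach[a][c] and reach[c][b] for c in range(n)):
                cover.add((a, b))
    return cover

# Standard causal mask on 4 tokens: token i reads every earlier token j < i.
causal = [(j, i) for i in range(4) for j in range(i)]
chain = hasse_edges(4, causal)
```

For the causal mask the result is the chain 0 -> 1 -> 2 -> 3, which is the Hasse diagram of a total order.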

2606.09954 2026-06-10 cs.LG cs.AI new submission

Does Normalization Choice Matter for Causal Large Time-Series Models?

Samy-Melwan Vilhes, Gilles Gasso, Mokhtar Z Alaya

AI summary: Studies how different normalization strategies affect training convergence and forecasting performance in causal large time-series models, and finds that the choice of normalization significantly affects model performance.

Journal ref: ICLR 2026 Workshop: Time Series in the Age of Large Models, Apr 2026, Rio De Janeiro, Brazil

Abstract

Large models for time-series forecasting have emerged as a promising paradigm for training models on heterogeneous collections of signals. These models typically rely on causal autoregressive architectures, where each observation is sequentially predicted from the past. In practice, real-world time series exhibit non-stationarities, which significantly influence predictive performance. To mitigate this, normalization is commonly employed. However, in efficient causal settings it might induce information leakage from future observations during training. Recent alternatives, including causal normalization and statistics computed from initial observations, have been proposed to address this issue, but their practical implications remain insufficiently understood. In this work, we evaluate normalization strategies for transformer-based large time-series models trained with patching and an efficient causal strategy. We show that normalization choice significantly influences both training convergence and forecasting performance.
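
The leakage issue can be seen in a few lines. The sketch below contrasts whole-window normalization with a running (causal) variant; both functions are illustrative assumptions, not the specific strategies evaluated in the paper. With whole-window statistics, changing the last observation alters the normalized values of earlier time steps, so a model trained causally on those values can indirectly see the future.

```python
import math

def full_normalize(x):
    """Normalize with statistics of the whole window (uses future values)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    std = math.sqrt(var) or 1.0            # avoid dividing by zero
    return [(v - mean) / std for v in x]

def causal_normalize(x, eps=1e-8):
    """Normalize each step with statistics of observations up to that step."""
    out, total, total_sq = [], 0.0, 0.0
    for t, v in enumerate(x, start=1):
        total += v
        total_sq += v * v
        mean = total / t
        var = max(total_sq / t - mean * mean, 0.0)
        out.append((v - mean) / math.sqrt(var + eps))
    return out

a = [1.0, 2.0, 3.0, 4.0]
b = [1.0, 2.0, 3.0, 100.0]                 # only the final value differs

leaks = full_normalize(a)[:3] != full_normalize(b)[:3]         # prefix changed
no_leak = causal_normalize(a)[:3] == causal_normalize(b)[:3]   # prefix unchanged
```

The causal variant pays for this with noisy statistics at the first few steps, which is one reason the practical trade-off is worth measuring.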

2606.10085 2026-06-10 cs.LG eess.SP math.OC new submission

Structured Adaptive Tensor Prediction for Streaming Data

Zhen Qin, Yang Chen

AI summary: For streaming prediction of matrix-valued time series, proposes an adaptive tensor regression framework with Matrix-on-Matrix and Tensor-on-Matrix formulations and develops online SGD algorithms; the Tensor-on-Matrix model is better in steady-state error and denoising, and recovery guarantees are established under low-dimensional structures.

Abstract

Matrix-valued time series arise in a wide range of applications, such as spatio-temporal data from medical imaging and geophysics. Existing methods are mainly designed for static settings and lack adaptability to streaming and time-varying environments. Adaptive filtering techniques have also been largely limited to data with scalar or vector values, leaving adaptive forecasting for matrix-valued time series inadequately understood. To bridge these gaps, we develop an adaptive tensor regression framework that includes Matrix-on-Matrix (MoM) and Tensor-on-Matrix (ToM) formulations for streaming matrix-valued prediction. The two formulations differ in whether to directly model matrix-valued outputs or to exploit temporal structure via higher-order tensor representations. For the proposed tensor regression framework, we develop stochastic gradient descent (SGD) algorithms for online learning. We show that stacking multiple responses across time into higher-order tensors improves performance; in particular, the ToM achieves lower steady-state error and stronger denoising capability than MoM, motivating our focus on the ToM model. We further characterize the tracking behavior of SGD under time-varying dynamics. From a statistical perspective, we establish fixed-time recovery guarantees for ToM under general low-dimensional structures, including sparsity, low-rankness, and their joint sparse low-rank models.

2606.10111 2026-06-10 cs.LG cs.SY eess.SY stat.ML new submission

Nonlinear Estimator: Dual Bayesian Affine Estimators for Parameter Learning

Sasan Vakili, Daniël Woonings, Pradyumna Paruchuri, Peyman Mohajerin Esfahani

AI summary: Proposes a nonlinear parameter estimator for Wiener-type state-space models that couples two affine minimum mean-squared error estimators through a fixed-point architecture, one for the unknown parameters and one for the latent variables, and develops two dual-estimator frameworks; experiments show that the dual state-parameter estimator achieves lower parameter mean-squared error than the other methods.

Comments: 32 pages, 9 figures

Abstract

This paper presents a nonlinear parameter estimator for Wiener-type state-space models obtained as a fixed-point architecture that couples two affine minimum mean-squared error (MMSE) estimators: one for the unknown parameters and one for latent variables. The architecture retains the functional structure of the optimal affine MMSE parameter estimator while incorporating Dynamic Basis Statistics (DBS) estimates that summarize nonlinear basis-function evaluations. Two DBS construction strategies are developed, leading to two nonlinear estimator frameworks. The dual basis-parameter estimator combines an affine basis estimator with the affine parameter estimator, whereas the dual state-parameter estimator first computes affine state estimates and their covariances, then maps these state-estimate statistics through a Gaussian DBS operator to obtain DBS estimates. Both dual estimators admit fixed-point characterizations that alternate between estimating each component using the updated prior of the other, obtained from that component's plug-in estimate statistics from the previous iteration. The efficacy of the proposed methods is examined via extensive Monte Carlo experiments, showing that the dual basis-parameter estimator attains parameter mean-squared errors comparable to those of the purely affine parameter estimator, while the dual state-parameter estimator achieves the lowest parameter mean-squared error, outperforming both the dual basis-parameter and purely affine parameter estimators, as well as sequential Monte Carlo variants of classical Particle Gibbs and Expectation-Maximization schemes.
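
As background for the building block named above, the affine MMSE estimator of a quantity $\theta$ from observations $y$ has the standard closed form below. The symbols $\mu$ and $\Sigma$ (prior means and covariances) are generic notation chosen here, not necessarily the paper's.

```latex
% Best affine estimator of theta given y, in the mean-squared-error sense
\hat{\theta}(y) = \mu_{\theta} + \Sigma_{\theta y}\, \Sigma_{yy}^{-1}\, (y - \mu_{y})

% Covariance of the resulting estimation error
\operatorname{Cov}\!\big(\theta - \hat{\theta}(y)\big)
  = \Sigma_{\theta\theta} - \Sigma_{\theta y}\, \Sigma_{yy}^{-1}\, \Sigma_{y\theta}
```

The estimator depends only on first and second moments, which suggests why the fixed-point scheme described in the abstract can alternate between two such estimators, each consuming the other's updated statistics as its prior.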

2606.10170 2026-06-10 cs.LG new submission

Learning Entropy and Spatial Adaptation Dynamics of Multilayer Perceptrons for Structural Point Extraction

Jan Glaser, Ivo Bukovsky, Marcel Jirina

AI summary: Proposes the Spatial Learning Entropy Map (SLEM) method, which analyzes the weight adaptation of an MLP as it learns from image samples to identify the image points and regions that matter for network learning, offering a new perspective on feature extraction.

Abstract

This paper extends the concept of Learning Entropy (LE) from temporal adaptive systems to spatial learning in multilayer perceptron networks (MLPs) applied to image data. Instead of evaluating image structure directly from gradients or covariance operators, as local neighborhood methods do, the proposed approach analyzes the learning process itself through Learning Entropy. An MLP is trained to predict the intensity of a center pixel from its surrounding spatial context, while LE is evaluated from the incremental adaptation of neural weights during learning across image-derived samples. The resulting Spatial Learning Entropy Maps (SLEM) identify unusual image points and regions that induce strong adaptation of the neural network and therefore have an important role in the learning process. The results indicate that spatial Learning Entropy provides a complementary perspective to conventional feature extraction and explainability methods: it highlights spatial locations that are particularly informative for network learning, identifying image points and regions according to their learning impact rather than their local structural properties. The proposed framework may open new directions for learning-driven image or scene analysis in computer vision, manufacturing, and robotics.

2606.10184 2026-06-10 cs.LG cs.AI new submission

Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning

Wooil Jung

AI summary: To address the zero advantages that GRPO obtains in continuous latent reasoning models because of deterministic trajectories, proposes introducing stochasticity through structured dropout so that GRPO can optimize a Bayesian model-average policy; improves the accuracy of the Coconut baseline on GSM8K.

Abstract

Group Relative Policy Optimization (GRPO) relies on the diversity of $K$ rollouts within each group; otherwise, the group-mean advantage $A^{(k)} = r^{(k)} - \mu_r$ collapses to zero. This presents a structural challenge for latent-reasoning models like Coconut, which feed continuous hidden states recurrently in place of discrete chain-of-thought tokens. Because the latent phase is inherently deterministic given the parameters and prompt, multiple rollouts produce identical trajectories, stalling GRPO's progress. Consequently, applying group-relative reinforcement learning to continuous latent reasoning has proven difficult. To address this, we propose sourcing the necessary stochasticity through structured dropout. By applying a single Bernoulli mask held constant across all latent recurrence steps for a given rollout, we generate essential trajectory variance. This shared mask effectively treats each rollout as a posterior sample from a variational distribution over parameters, allowing GRPO to optimize the expected reward of a Bayesian model-average policy. We provide both theoretical justification for this method -- including unbiasedness, variance reduction, and the well-definedness of the latent gradient -- and empirical validation. On GSM8K, dropout-GRPO improves a Coconut baseline from $27.29\%$ to $29.01\%$ pass@1, demonstrating the viability of GRPO learning for latent-reasoning models. Our work positions this as a practical, theoretically grounded approach for post-training latent-reasoning LLMs.
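
The collapse and the proposed remedy can be reproduced on a toy recurrence. In the sketch below, the update rule `h <- mask * tanh(weight * h)`, the inverted-dropout scaling, and all names are assumptions for illustration only; the one idea taken from the abstract is that each rollout draws a single Bernoulli mask and reuses it at every latent step.

```python
import math
import random

def group_advantages(rewards):
    """Group-mean advantage A_k = r_k - mean(r)."""
    mu = sum(rewards) / len(rewards)
    return [r - mu for r in rewards]

def sample_mask(dim, keep_prob, rng):
    """One Bernoulli mask per rollout, with inverted-dropout scaling."""
    return [1.0 / keep_prob if rng.random() < keep_prob else 0.0
            for _ in range(dim)]

def latent_rollout(h0, weight, steps, mask):
    """Toy latent recurrence; the SAME mask is applied at every step."""
    h = list(h0)
    for _ in range(steps):
        h = [m * math.tanh(weight * v) for m, v in zip(mask, h)]
    return h

h0 = [0.3, -0.2, 0.5, 0.1]
no_dropout = [1.0] * len(h0)

# Without dropout every rollout is identical, so rewards tie and advantages vanish.
deterministic = {tuple(latent_rollout(h0, 0.5, 3, no_dropout)) for _ in range(4)}
collapsed = group_advantages([1.0, 1.0, 1.0, 1.0])

# With one shared mask per rollout, trajectories differ across the group.
rng = random.Random(0)
stochastic = {tuple(latent_rollout(h0, 0.5, 3, sample_mask(len(h0), 0.5, rng)))
              for _ in range(8)}
```

Holding the mask fixed within a rollout is what allows each rollout to be read as one sampled sub-network; resampling it at every step would instead inject independent noise per step.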

2606.10294 2026-06-10 cs.LG cs.AI cs.AR cs.NE physics.comp-ph new submission

LLM-Guided Neural Architecture Search for Robust Co-Design of Physical Neural Networks

Tyler King, Timothee Leleu

Affiliations: University of California, Berkeley; DeepMind

AI summary: Proposes the UH-NAS framework, which uses large language models as evolutionary operators to co-optimize task accuracy and inference energy, enables fair comparison across hardware platforms, and discovers more robust architectures on optical MZI hardware.

Abstract

Deploying neural networks on unconventional hardware demands architectures that co-optimize task accuracy and platform-specific constraints such as energy cost, physical non-idealities, and numerical precision. Existing neural architecture search (NAS) methods are typically tailored to a single hardware family, limiting cross-platform comparison and generalization. We introduce Unconventional Hardware Neural Architecture Search (UH-NAS), a hardware-agnostic, LLM-guided NAS framework that integrates language models as evolutionary operators to co-optimize accuracy and inference energy. By exposing hardware as a swappable backend with per-platform energy models, physical constraints, and non-ideality simulators, UH-NAS enables fair system-level comparisons across various backends without modifying the search algorithm. Tested on optical MZI hardware, UH-NAS discovers more diverse, robust architectures than conventional baselines while outperforming existing LLM-to-NAS approaches. Additional ablations on architecture robustness under non-idealities and the role of system prompts highlight the importance of architecture-hardware co-design for emerging computing platforms.

2606.10385 2026-06-10 cs.LG cs.AI new submission

Beyond Absolute Imitation: Anchored Residual Guidance for Privileged On-Policy Distillation

Wenhao Zhang

Affiliations: South China University of Technology

AI summary: Proposes Anchored Residual On-Policy Distillation (AR-OPD), which uses a partially privileged teacher to establish a locally compatible anchor and injects a controlled residual, addressing the local unreachability caused by hindsight bias in privileged on-policy distillation; reports an average gain of 2.3 points on reasoning tasks.

Comments: 17 pages, 8 figures. Project page: https://vanhowe.github.io/AR-OPD/

Abstract

On-policy distillation (OPD) has demonstrated strong empirical gains in enhancing complex reasoning in LLMs by aligning a student model with a teacher's predictive distribution over the student's own trajectories. An emerging variant, Privileged OPD, further strengthens this paradigm by employing a self-teacher model augmented with privileged information, such as oracle traces, to mitigate teacher-student capacity gaps while providing dense, answer-directed supervision. However, current methods treat privileged information as a monolithic imitation target, failing to disentangle locally reachable reasoning steps from future-conditioned oracle signals. Consequently, the student is encouraged to match a hindsight-biased distribution that often falls outside its local predictive support. This reachability mismatch incentivizes the student model to skip valid intermediate reasoning in favor of locally unsupported shortcuts. To resolve this, we introduce Anchored Residual On-Policy Distillation (AR-OPD), a dual-view framework that disentangles privileged supervision. Rather than enforcing strict full-view imitation, AR-OPD establishes a locally compatible anchor using a partially privileged teacher, isolating and injecting oracle foresight as a controlled residual to provide destination-directed guidance. Across diverse reasoning tasks, AR-OPD outperforms full privileged OPD by 2.3 points and SFT by 7.9 points. Crucially, this anchored residual mechanism reduces hindsight leakage by 21.7% and mitigates late-stage drift, yielding up to a 7.2-point advantage on challenging long-horizon trajectories exceeding 768 tokens.

2606.10406 2026-06-10 cs.LG cs.AI new submission

FOGO: Forgetting-aware Orthogonalization Optimizer

Toan Nguyen, Yang Liu, Trung Le, Celso de Melo, Flora D. Salim

Affiliations: School of Computer Science and Engineering, University of New South Wales; Department of Data Science & AI, Monash University; DEVCOM Army Research Laboratory

AI summary: Proposes the FOGO optimizer, which spectrally orthogonalizes momentum updates and uses a compact codebook memory to resolve gradient interference, improving convergence and knowledge retention in settings such as class imbalance, continual learning, and large-model fine-tuning.

Abstract

We argue that forgetting is not confined to continual learning but is a general optimization phenomenon: during standard training, dominant mini-batch gradients suppress rare but useful update directions, causing short-term forgetting at every step. When such knowledge is never revisited, these losses compound into long-term forgetting, the classical failure mode of continual learning. We introduce FOGO, a scalable optimizer that continuously detects and resolves gradient interference across both regimes. FOGO spectrally orthogonalizes momentum updates to prevent dominant directions from monopolizing optimization, then stores representative past directions in a compact codebook memory built on random projection, where pairwise distances are provably preserved in low-dimensional space. At each step, conflicts between the current update and stored directions are resolved via lightweight orthogonal correction and lifted back through a proximal step, with minimal overhead and no data storage. Across class-imbalanced classification, continual visual learning under domain and class shifts, continual fine-tuning of LLaVA-7B, and GPT-2 pretraining, FOGO consistently improves convergence and knowledge retention, outperforming Adam and Muon.
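
One plausible reading of the "lightweight orthogonal correction" is a projection that removes the part of the current update opposing a stored direction. The sketch below shows that generic projection only. The conflict test (negative inner product), the function names, and working directly in the original space are assumptions; the spectral orthogonalization, random-projection codebook, and proximal lift described in the abstract are not modeled.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def orthogonal_correction(update, stored_directions):
    """Remove from `update` the components that oppose stored directions."""
    corrected = list(update)
    for d in stored_directions:
        overlap = dot(corrected, d)
        norm_sq = dot(d, d)
        if overlap < 0 and norm_sq > 0:    # conflict: the update would undo d
            scale = overlap / norm_sq
            corrected = [c - scale * x for c, x in zip(corrected, d)]
    return corrected

# The update pushes against the stored direction [0, 1]; that part is removed.
corrected = orthogonal_correction([1.0, -1.0], [[0.0, 1.0]])
# An update that agrees with the stored direction is left untouched.
untouched = orthogonal_correction([1.0, 1.0], [[0.0, 1.0]])
```

After correction the update is orthogonal to every direction it previously conflicted with, so a step along it no longer reduces progress made along those directions to first order.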

2606.10435 2026-06-10 cs.LG cs.CL new submission

Parallel Causal Associative Fields: Gated Sparse Memory for Long-Context Language Modeling

Muhammad Ahmed

Affiliations: Independent Researcher

AI summary: Proposes Parallel Causal Associative Fields (PCAF), which store local records in hash buckets, retrieve a candidate set to form a sparse cache, and mix it with a parametric language model through a gate, providing sparse long-context access while avoiding a fixed-state bottleneck.

Comments: 17 pages, 5 figures, and 6 tables. Experiments on WikiText-103, PG-19, and WikiText-2 using TPU v4-32 and NVIDIA RTX 3060 hardware. Code: https://github.com/ahmed123hds/PCAF

Abstract

Transformers achieve strong language modeling performance by providing direct token-to-token communication paths, but causal self-attention scales quadratically with context length. Recurrent and state-space models reduce this cost, yet compress history into sequentially updated fixed-size states. This paper studies a third primitive: a parallel content-addressed memory over causal successor records. The proposed Parallel Causal Associative Field (PCAF) writes local records from a context window into hash buckets, retrieves a bounded candidate set for the current query, forms a sparse cache distribution over successor tokens, and mixes that cache with a parametric local language model through a learned gate. The resulting model maintains sparse long-context access while avoiding a single fixed recurrent state bottleneck. We evaluate PCAF under full autoregressive pretraining on WikiText-103 and PG-19 using a distributed Google Cloud TPU v4-32 pod. At 303M parameters and context length T = 2048, PCAF-semantic reaches 36.31 perplexity on WikiText-103 and 52.45 perplexity on PG-19, compared with 47.49 and 53.84 for a matched dense Transformer. PCAF-semantic simultaneously processes 0.61-0.62M tokens/s across the TPU pod, versus 0.43M tokens/s for dense and local attention baselines. Supporting 41M-parameter multi-seed sweeps and single-GPU component ablations show that the associative cache, retrieval capacity, and learned gate materially affect the speed-quality trade-off.
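
The write, retrieve, and mix pipeline described above can be mimicked with a dictionary of successor records. Everything in this sketch is a simplification chosen for illustration: the keys are exact two-token contexts rather than learned or semantic hashes, the gate is a fixed scalar rather than a learned function, and the "parametric model" is a hand-written distribution. It is not the PCAF code from the linked repository.

```python
from collections import defaultdict

def build_successor_buckets(tokens, key_len=2):
    """Write (preceding context -> next token) records into hash buckets."""
    buckets = defaultdict(list)
    for t in range(key_len, len(tokens)):
        key = tuple(tokens[t - key_len:t])
        buckets[key].append(tokens[t])
    return buckets

def cache_distribution(buckets, query_key, max_candidates=8):
    """Sparse distribution over successors of a bounded candidate set."""
    candidates = buckets.get(tuple(query_key), [])[-max_candidates:]
    dist = {}
    for tok in candidates:
        dist[tok] = dist.get(tok, 0.0) + 1.0 / len(candidates)
    return dist

def mix(p_model, p_cache, gate):
    """Gate-weighted mixture; fall back to the model if the cache is empty."""
    if not p_cache:
        return dict(p_model)
    vocab = set(p_model) | set(p_cache)
    return {tok: (1.0 - gate) * p_model.get(tok, 0.0) + gate * p_cache.get(tok, 0.0)
            for tok in vocab}

history = "a b c a b d a b".split()
buckets = build_successor_buckets(history)
p_cache = cache_distribution(buckets, ["a", "b"])      # successors seen after "a b"
p_model = {"c": 0.2, "d": 0.1, "e": 0.7}               # hypothetical model output
p_next = mix(p_model, p_cache, gate=0.5)
```

Retrieval cost is bounded by `max_candidates` rather than by context length, which is the property the abstract contrasts with quadratic attention.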

2606.10596 2026-06-10 cs.LG cs.AI cs.SY eess.SY new submission

Embedding Hybrid Systems into Continuous Latent Vector Fields

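Sangli Teng, Hang Liu, Koushil Sreenath

AI summary: Proves that an n-dimensional hybrid system can be embedded into a continuous vector field in m-dimensional Euclidean space when m > 2n, and on that basis proposes a latent neural ODE method that accurately recovers hybrid system flows from time-series data, outperforming existing methods.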

Comments Accepted to ICML 2026

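2606.10682 2026-06-10 cs.LG new submission

Predicting Future Behavior in Reasoning Models for Better Steering (title translated from the Chinese listing)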

Evgenii Kortukov, Piotr Komorowski, Florian Klein, Paula Engl, Gabriele Sarti, Seong Joon Oh, Sebastian Lapuschkin, Wojciech Samek

Affiliations: Fraunhofer HHI; Northeastern University; KAIST

AI summary: Trains activation probes to predict the future behavior of reasoning models and proposes Future Probe Controlled Generation (FPCG), which achieves steering with almost no quality degradation across multiple evaluations.

Abstract

Deployed large reasoning models (LRMs) often behave unexpectedly. Test-time steering controls LRM outputs by intervening on their hidden representations, but it can degrade output quality. We argue that prior steering work implicitly relies on internal features that detect behavior in already generated text. We show that these detection features are poor predictors of future behavioral outcomes, and thus not the natural intervention target. Instead, we train activation probes to predict future behavior likelihoods from intermediate reasoning steps. These probes predict the most likely behavior with 64%-91% accuracy, revealing a separate type of internal prediction features. Building on these prediction features, we introduce a text-level steering method, Future Probe Controlled Generation (FPCG). FPCG samples multiple candidate sentences and chooses the best one according to a probe predicting the future behavior likelihood. This enables steering with almost no output quality degradation. FPCG also enables steering in several evaluations where activation steering fails. These results show that distinguishing detection and prediction features enables a more nuanced approach to controlling LRM behaviors.

2606.11189 2026-06-10 cs.LG cs.AI cs.CL new submission

A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design

Tong Xie, Yuanhao Ban, Yunqi Hong, Sohyun An, Yihang Chen, Cho-Jui Hsieh

Affiliations: University of California, Los Angeles (UCLA); Arena

AI summary: Reinterprets supervised fine-tuning as target distribution design and proposes the Q-target framework, which decomposes supervision into how strongly to rely on the observed token and how to allocate probability to alternative tokens; building on this, proposes Target-SFT, which outperforms existing methods on multiple reasoning tasks.

Abstract

Supervised fine-tuning (SFT) typically maximizes the likelihood of every token in a demonstrated trajectory. However, an observed token can be non-unique, noisy, or misaligned with the model prior. Strictly fitting toward this one-hot target may be suboptimal, especially when the pretrained model encodes a rich knowledge prior. In this work, we reinterpret SFT as target distribution design: instead of studying only the loss objective, we analyze the token-level target that the loss drives the model to match. We introduce the Q-target framework, which decomposes SFT supervision into two explicit choices: (1) how strongly to rely on the observed token, and (2) how to allocate the remaining probability mass over alternatives. This perspective unifies many existing SFT variants as implicit choices of the target distribution Q. Building on this view, we propose Target-SFT, which constructs the training objective directly from the desired target distribution. This method consistently outperforms existing methods across the ten reasoning dataset-model settings evaluated, showing the effectiveness of this target-based approach. Overall, our formulation reveals a more fundamental design principle for SFT training and opens a broader search space for SFT objectives.
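
The two design choices can be made concrete with a small target constructor. In this sketch, `alpha` is the mass placed on the observed token and the remainder is spread over alternatives in proportion to the model's own probabilities. That allocation rule, the parameter name, and the function names are assumptions for illustration; the abstract does not specify which allocation Target-SFT uses.

```python
import math

def q_target(observed, model_probs, alpha):
    """Target Q: alpha on the observed token, the rest spread by model prior."""
    rest = {t: p for t, p in model_probs.items() if t != observed}
    z = sum(rest.values())
    if z <= 0:                              # no alternatives: fall back to one-hot
        return {observed: 1.0}
    q = {t: (1.0 - alpha) * p / z for t, p in rest.items()}
    q[observed] = alpha
    return q

def cross_entropy(q, model_probs):
    """Loss that drives the model distribution toward the target Q."""
    return -sum(w * math.log(model_probs[t]) for t, w in q.items() if w > 0)

model_probs = {"x": 0.5, "y": 0.3, "z": 0.2}   # hypothetical next-token distribution
soft = q_target("x", model_probs, alpha=0.8)
hard = q_target("x", model_probs, alpha=1.0)
```

Setting `alpha=1.0` recovers the usual one-hot target, so standard SFT appears as one point in this family and `cross_entropy(hard, model_probs)` equals the ordinary negative log-likelihood of the observed token.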

2606.09635 2026-06-10 cs.CL cs.LG cross-list

Gradient-Guided Reward Optimization for Inference-time Alignment

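Hankun Lin, Ruqi Zhang

Affiliations: Purdue University

AI summary: Proposes Gradient-Guided Reward Optimization (GGRO), which injects nudging tokens generated from reward-model gradient signals during decoding to steer the generation trajectory at inference time, improving safety, helpfulness, and reasoning performance and increasing robustness to reward hacking.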

Comments Accepted to UAI 2026

Abstract

Ensuring the reliability of Large Language Models (LLMs) under distribution drift requires inference-time adaptation. While inference-time alignment methods such as Best-of-$N$ and rejection sampling are widely used, they frame the task as a sampling-intensive, reward-guided search, leading to two key limitations: their performance is bounded by the base model's generation quality, and their reliance on imperfect reward models makes them vulnerable to reward hacking. To address these challenges, we introduce Gradient-Guided Reward Optimization (GGRO), a lightweight inference-time method that performs targeted, minimal intervention during decoding via gradient guidance. Specifically, GGRO monitors token-level entropy to identify high-uncertainty regions indicative of drift or misalignment. Upon detection, it responds by injecting nudging tokens, generated using gradient signals from an off-the-shelf reward model, to steer the generation trajectory rather than merely re-ranking samples. Experiments show that GGRO consistently improves inference-time alignment across safety, helpfulness, and reasoning benchmarks. It also increases coverage of high-quality responses and robustness to reward hacking, with minimal computational overhead. Code is available at https://github.com/lhk2004/GGRO.
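
The entropy-monitoring step described above can be sketched as follows. Only the detection side is shown; the gradient-based generation of nudging tokens is not reproduced. The function names and the fixed `threshold` are assumptions for illustration, not the released GGRO code.

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)

def high_uncertainty_steps(step_distributions, threshold):
    """Indices of decoding steps whose entropy exceeds the threshold."""
    return [
        t for t, probs in enumerate(step_distributions)
        if token_entropy(probs) > threshold
    ]

steps = [
    [1.0, 0.0, 0.0, 0.0],      # fully confident step
    [0.25, 0.25, 0.25, 0.25],  # maximally uncertain over four tokens
    [0.9, 0.1, 0.0, 0.0],      # mostly confident step
]
flagged = high_uncertainty_steps(steps, threshold=1.0)
```

In this toy input only the uniform step is flagged, which is where an intervention would be considered.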

2606.09871 2026-06-10 cs.CV cs.AI cs.LG cross-list

SD-GRPO: Verifiable Segment Decomposition for Long-Form Vision-Language Generation

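Hyunwoong Kim, Seongeun Lee, Hannah Yun, Junhyun Park, Jonggwon Park

AI summary: Proposes SD-GRPO, which decomposes long-form outputs into segments and computes per-segment advantages to address GRPO's coarse-grained credit assignment in vision-language tasks; experiments show it outperforms the baseline across several long-form generation tasks.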

Abstract

Group Relative Policy Optimization (GRPO) and its variants, originally developed for Large Language Models (LLMs), have recently been applied to Multimodal LLMs and produced strong results. However, their coarse-grained holistic credit assignment from a single scalar advantage underfits vision-language (VL) tasks, where outputs are often long-form responses grounded in semantically rich images. To address this limitation, we exploit a structured signal that single-scalar formulations discard: the natural segmentation of long-form VL outputs. Concretely, we propose Segment-Decomposed GRPO (SD-GRPO), which z-normalizes verifiable per-segment rewards across the rollout group, yielding a vector of per-segment advantages in place of a single scalar. We evaluate SD-GRPO across three settings spanning controlled and real-world long-form VL generation, organized by increasing semantic entanglement across segments. On a controlled multi-panel dense-captioning task constructed from DOCCI, where segments are semantically independent, SD-GRPO consistently outperforms the GRPO baseline, with larger gains at higher segment counts. Extending to a controlled multi-chart long-form VQA task constructed from MultiChartQA, we show both theoretically and empirically that rollout-level rewards suffer from cross-segment credit misattribution that scales with output length. On a real-world scientific figure captioning task on the MMSci dataset, where subfigure captions share context across the figure, blending holistic and per-segment rewards further improves on both, suggesting per-segment normalization alone is insufficient when segments are semantically entangled. Finally, by integrating SD-GRPO into Dr. GRPO, we confirm that it can be applied to any GRPO framework with minimal implementation overhead to enhance long-form VL generation.
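
The contrast between a single scalar advantage and a per-segment advantage vector can be sketched as below. The use of the population standard deviation, the `eps` stabilizer, and summing segment rewards for the scalar baseline are assumptions; the abstract does not specify these details.

```python
import statistics

def per_segment_advantages(rewards, eps=1e-8):
    """z-normalize each segment's reward across the rollout group.

    rewards[g][s] is the verifiable reward of segment s in rollout g.
    """
    n_seg = len(rewards[0])
    columns = [[r[s] for r in rewards] for s in range(n_seg)]
    stats = [(statistics.fmean(c), statistics.pstdev(c)) for c in columns]
    return [
        [(r[s] - stats[s][0]) / (stats[s][1] + eps) for s in range(n_seg)]
        for r in rewards
    ]

def scalar_advantages(rewards, eps=1e-8):
    """Baseline: one z-normalized advantage per rollout from the summed reward."""
    totals = [sum(r) for r in rewards]
    mean, std = statistics.fmean(totals), statistics.pstdev(totals)
    return [(t - mean) / (std + eps) for t in totals]

# Four rollouts, two segments each.
group = [[1.0, 0.0], [0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
vector_adv = per_segment_advantages(group)
scalar_adv = scalar_advantages(group)
```

The first rollout gets a scalar advantage of zero even though its first segment is good and its second is bad; the per-segment vector keeps those two signals apart.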

2606.09900 2026-06-10 cs.CL cs.AI cs.IR cs.LG cross-list

Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents Where a Lean Retrieved Context Beats the Full History

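Liuyin Wang

Affiliations: Independent Researcher

AI summary: Proposes Engram, a bi-temporal memory engine whose hybrid read path answers from a retrieved slice of about 9.6k tokens, reaching 83.6% accuracy on LongMemEval_S, 10.4 points above the full-history baseline (79k tokens), with 0/500 questions errored.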

Comments 14 pages, 4 figures, 3 tables. Code, reproducible harness, and raw per-question logs: https://github.com/ly-wang19/engram

Abstract

Long-term memory is the missing layer for LLM agents: across sessions they forget, and the common workaround -- replaying the whole history into the prompt -- is expensive, slow, and, as distractors accumulate, less accurate. Most memory systems win on cost or latency but still lose to the full-context baseline on accuracy, and benchmark numbers are reported on inconsistent, non-reproducible harnesses, so one system appears at wildly different scores across sources. We present Engram, an open-source, dual-process memory engine on a bi-temporal data model. A fast write path appends lossless episodes with no LLM on the critical path; an asynchronous path extracts atomic (subject, predicate, object) facts, builds a bi-temporal knowledge graph, and resolves contradictions without an LLM call per fact -- invalidating, never deleting, so every fact keeps provenance and a supersession chain. A hybrid read path fuses dense, lexical, graph, and recency/salience signals, applies a point-in-time ("as-of") filter, and assembles a compact, provenance-tagged context. On the full 500-question LongMemEval_S, graded by the official category-specific judge, Engram's lean configuration -- answering from a ~9.6k-token retrieved slice, never the full history -- scores 83.6% vs. 73.2% for full-context (+10.4 points, McNemar p < 10^-6) at ~8x fewer tokens (9.6k vs. 79k), with 0/500 errored. The gain needs a hybrid read path: facts alone lose recall, while facts plus retrieved chunks recover detail. We also contribute a neutral, in-repo evaluation harness with the official judge baked in and the full-context baseline in every table, publish the raw per-question logs, and document the measurement-integrity pitfalls (truncation, home-grown judges, full-history leaks) that silently distort memory benchmarks. Every number ships with a command to reproduce it.
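
The "invalidate, never delete" rule and the point-in-time filter can be illustrated with a toy store. The `Fact` fields, the integer timestamps, and the helper names are hypothetical; this is a sketch of the general bi-temporal idea, not Engram's schema or API.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Fact:
    subject: str
    predicate: str
    obj: str
    valid_from: int                        # when the fact became true in the world
    recorded_at: int                       # when the store learned it
    invalidated_at: Optional[int] = None   # when the store marked it superseded

def supersede(facts, old, new):
    """Resolve a contradiction by invalidating the old fact, not deleting it."""
    old.invalidated_at = new.recorded_at
    facts.append(new)

def as_of(facts, t):
    """Facts the store held as current at time t."""
    return [
        f for f in facts
        if f.recorded_at <= t
        and f.valid_from <= t
        and (f.invalidated_at is None or f.invalidated_at > t)
    ]

paris = Fact("alice", "lives_in", "Paris", valid_from=1, recorded_at=1)
store = [paris]
supersede(store, paris, Fact("alice", "lives_in", "Berlin", valid_from=5, recorded_at=5))
```

Because the old fact stays in the store, a query for time 3 still returns Paris, while a query for time 6 returns Berlin.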

Mixtures of Neural Operators Reduce Active Complexity in Operator Learning

Anastasis Kratsios, Takashi Furuya, Jose Antonio Lara Benitez, Matti Lassas, Maarten de Hoop

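Affiliations: McMaster University and Vector Institute; Shimane University; Rice University; University of Helsinki

AI summary: Through a comparison between routed mixtures of neural operators (MoNOs) and a fixed single-neural-operator construction, proves that the active expert of a MoNO has smaller depth, width, and rank scaling, and that for Lipschitz targets these quantities are bounded by $\mathcal{O}(\varepsilon^{-1})$.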

Abstract

Operator-learning systems are not governed solely by total parameter count; for one query, the relevant bottleneck can be the model that must be loaded and evaluated. We study this distinction for classical neural operators on compact Sobolev subsets through a constructive comparison between routed mixtures of neural operators (MoNOs) and a fixed single-neural-operator construction. The comparison concerns expert-active complexity relative to that baseline, with total stored size and routing search accounted separately. A MoNO routes each input function through a tree to one expert. Our main theorem shows that every scalar uniformly continuous nonlinear operator with bounded output Sobolev radius on the approximation set admits a MoNO approximation whose active expert has smaller depth, width, and rank scaling than the analyzed single-neural-operator construction; for Lipschitz targets these expert quantities are bounded by $\mathcal{O}(\varepsilon^{-1})$. The theorem turns localization into an operator-level accounting of active expert size, routing depth, and number of experts. We also prove a quantitative universal approximation theorem for the underlying neural-operator architecture, with explicit dependence on compact-set diameter and modulus of continuity.

2502.11034 2026-06-10 cs.LG version update

AdaGC: Enhancing LLM Pretraining Stability via Adaptive Gradient Clipping

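Guoxia Wang, Shuai Li, Congliang Chen, Jinle Zeng, Jiabin Yang, Dianhai Yu, Yanjun Ma, Li Shen

Affiliations: University of Science and Technology of China

AI summary: Proposes AdaGC, an adaptive per-tensor gradient clipping method that eliminates loss spikes by bounding gradient norms relative to an exponential moving average of their historical clipped values, achieving zero spike scores and higher downstream accuracy on models such as Llama-2 7B.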

Comments Accepted by ICML 2026

Abstract

Loss spikes remain a persistent obstacle in large-scale language model pretraining. While previous research has attempted to identify the root cause of loss spikes by investigating individual factors, we observe that, in practice, such spikes are typically triggered by the confluence of heterogeneous factors. Empirically, loss spikes may arise from a combination of data outliers, hardware or transient computational faults, numerical precision issues, and hyperparameter settings. Regardless of the underlying cause, these spikes manifest as unstable optimizer updates, as abnormal gradients contaminate both first- and second-moment states. In this paper, we propose a principled gradient-centric remedy: AdaGC, an adaptive per-tensor gradient clipping scheme that mitigates such contamination by bounding gradient norms relative to a tensor-wise exponential moving average of their historical clipped values. AdaGC is optimizer-agnostic, introduces negligible memory overhead, and reduces communication costs compared to GlobalGC, particularly in hybrid-parallel distributed training. Experiments on Llama-2 7B, Mixtral 8x1B, and ERNIE 10B-A1.4B demonstrate that AdaGC robustly eliminates training instabilities, consistently reducing spike scores to zero for all models and improving downstream accuracy over GlobalGC by 1.32%, 1.27%, and 2.48%, respectively. Furthermore, AdaGC seamlessly integrates with optimizers such as Muon and Lion, consistently yielding higher average accuracy and zero spike scores. The code is available at https://github.com/PaddlePaddle/PaddleFleet (see Research/AdaGC).
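
The clipping rule described above, bounding each tensor's gradient norm relative to an exponential moving average (EMA) of its past clipped norms, can be sketched for a single tensor. The names `lam` and `beta`, their default values, and passing the initial EMA in from outside are assumptions; any warm-up phase is omitted, and this is not the released PaddleFleet implementation.

```python
import math

def adagc_step(grad, ema, lam=1.05, beta=0.98):
    """One illustrative per-tensor adaptive clipping step.

    grad: flat list of gradient values for one tensor.
    ema:  running average of this tensor's clipped gradient norms.
    Returns the (possibly rescaled) gradient and the updated EMA.
    """
    norm = math.sqrt(sum(g * g for g in grad))
    limit = lam * ema
    if norm > limit and norm > 0.0:
        scale = limit / norm
        grad = [g * scale for g in grad]
        norm = limit  # the EMA tracks the clipped norm, not the raw spike
    ema = beta * ema + (1.0 - beta) * norm
    return grad, ema

# A spike (norm 5) against a history of norm 1 is scaled down to norm 2.
clipped, new_ema = adagc_step([3.0, 4.0], ema=1.0, lam=2.0, beta=0.5)
# A normal gradient (norm 0.5) passes through unchanged.
passed, quiet_ema = adagc_step([0.3, 0.4], ema=1.0, lam=2.0, beta=0.5)
```

Feeding the clipped norm back into the EMA keeps a single abnormal gradient from inflating the threshold used on later steps.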

2503.24007 2026-06-10 cs.LG cs.AI version update

CITRAS: Covariate-Informed Transformer for Time Series Forecasting

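Yosuke Yamaguchi, Issei Suemitsu, Wenpeng Wei

Affiliations: Research & Development Group, Hitachi, Ltd.

AI summary: Proposes CITRAS, a decoder-only Transformer that uses Key-Value (KV) Shift and Attention Score Smoothing to flexibly incorporate the future portion of known covariates and to capture both local and global cross-variate dependencies, improving forecasting accuracy.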

DOI: 10.1109/ACCESS.2026.3695717
Journal ref: IEEE Access, vol. 14, pp. 77983-77998, 2026

Abstract

In time series forecasting, covariates represent external factors that influence target variables. Some covariates are observable only in the past (observed covariates, such as recorded weather data), while others are known in advance (known covariates, such as calendar events or discount schedules). Although covariates have the potential to enhance forecasting performance, most deep learning-based forecasting models struggle to address the length discrepancy between variables caused by the future portion of known covariates and fail to leverage them flexibly. Moreover, capturing dependencies between target variables and covariates is non-trivial, as models must accurately reflect the local impact of covariates while simultaneously modeling global cross-variate dependencies. To address these challenges, we propose CITRAS, a decoder-only Transformer that flexibly integrates multiple target variables, observed covariates, and known covariates. While preserving strong autoregressive modeling capabilities, CITRAS introduces two novel mechanisms within patch-wise cross-variate attention: Key-Value (KV) Shift and Attention Score Smoothing. KV Shift seamlessly incorporates the future portion of known covariates into the forecasting process by aligning them with target variables based on their concurrent dependencies. Attention Score Smoothing refines locally accurate patch-wise cross-variate dependencies into global variate-level dependencies by smoothing the historical attention scores. Experimentally, CITRAS demonstrates strong performance across a wide range of real-world datasets in both covariate-informed and multivariate settings, showcasing its versatile ability to leverage cross-variate and cross-time dependencies for improved forecasting accuracy.

2601.03093 2026-06-10 cs.LG cs.CL version update

ATLAS: Verifier-Guided Adaptive Latent Activation Steering for Efficient LLM Reasoning

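Tuc Nguyen, Thai Le

Affiliations: Indiana University Bloomington

AI summary: Proposes ATLAS, a framework in which a lightweight verifier dynamically adjusts the latent-state steering policy at inference time, enabling per-step adaptive control; it improves accuracy and reduces test-time token usage on mathematical and coding reasoning tasks.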

Comments 21 pages, 6 figures

Abstract

Recent work on activation and latent steering has demonstrated that modifying internal representations can effectively guide large language models (LLMs) toward improved reasoning and efficiency without updating model parameters. However, most existing approaches rely on fixed steering policies and static intervention strengths, which limit their robustness across problem instances and often result in over- or under-steering. We propose Adaptive Test-time Latent Steering (ATLAS), a lightweight framework that dynamically controls steering decisions at inference time using a trained, lightweight verifier over the latent states. Given intermediate hidden states, the verifier predicts the quality of ongoing reasoning and adaptively selects which steering action to apply, enabling per-example and per-step adjustment with minimal overhead. ATLAS provides a unified framework for combining learned latent verification with test-time activation steering, enabling adaptive reasoning control without additional LLM decoding or inference-time process reward model calls. Experiments on multiple mathematical and coding reasoning benchmarks show that ATLAS consistently outperforms both vanilla decoding and fixed steering baselines, achieving higher accuracy while substantially reducing test-time token usage. These results demonstrate that verifier-guided latent adaptation provides an effective and scalable mechanism for controlling reasoning efficiency without sacrificing solution quality. All source code will be publicly available.

2602.03164 2026-06-10 cs.LG cs.AI version update

Falcon-X: A Time Series Foundation Model for Heterogeneous Multivariate Modeling

Yiding Liu, Yifan Hu, Hongjie Xia, Peiyuan Liu, Hongzhou Chen, Xilin Dai, Zewei Dong, Jiang-Ming Yang

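Affiliations: Ant Group

AI summary: To address the limits of existing time series foundation models in semantic alignment and relational expressivity, proposes Falcon-X, which maps variates into a unified latent prototype space, aligns heterogeneous variates with a Unified Prototype Diff-Attention mechanism, and performs cross-variate interaction in the shared space via Latent Entity Attention, enabling zero-shot structural transfer; it reports state-of-the-art forecasting performance on the GIFT-Eval and fev-bench benchmarks.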

Abstract

Time series foundation models (TSFMs) are transforming the forecasting paradigm through large-scale cross-domain pretraining. However, most existing TSFMs remain univariate, and recent efforts to enable cross-variate modeling still operate directly within the raw variate space. This design introduces fundamental limitations in semantic alignment and relational expressivity. Specifically, raw-space group mixing lacks a dedicated mechanism to align heterogeneous physical quantities, while standard non-negative attention fails to capture the complex synergistic and antagonistic interactions ubiquitous in real-world systems. To address these challenges, we propose Falcon-X, which decouples variates from the raw space and maps them into a unified latent prototype space. Falcon-X employs a Unified Prototype Diff-Attention mechanism that explicitly evaluates both positive and negative semantic affinities to align heterogeneous variates. Cross-variate interactions are then efficiently performed within this shared space via Latent Entity Attention, naturally facilitating zero-shot structural transfer. Finally, a Variate Reassembly Router robustly reconstructs variate-specific trajectories via a request-and-dispatch mechanism. Extensive evaluations on the GIFT-Eval and fev-bench benchmarks demonstrate that Falcon-X achieves excellent forecasting performance, offering a principled and scalable paradigm for complex multivariate environments. Falcon-X is publicly released to support future research.

2606.02378 2026-06-10 cs.LG cs.AI version update

When Do Attention Circuits Form? Developmental Trajectories of Capability and Attention-Sink Emergence Across Three 1B-Class Architectures

Yongzhong Xu

Affiliations: University of California, Berkeley

AI summary: Tracks the formation of attention-head circuits in three 1B-class language models, finding that induction-circuit formation precedes attention-sink formation by 10-20x in tokens and that circuit identification does not require the final model.

Comments 27 pages, 3 figures

Abstract

We track the developmental trajectory of attention-head circuit formation across three 1B-class language models spanning two architecture families (dense transformer, mixture-of-experts) and two pretraining corpora (The Pile, DCLM): Pythia 1B, OLMo 1B-0724-hf, and OLMoE 1B-7B-0924. At each of 10 log-spaced revisions per model -- 30 mechanistic-interpretability runs in total -- we apply a participation-ratio (PR) spectral signal and an all-head capability-specific selectivity screen to track induction, previous-token, and BOS-attractor heads as they emerge. Five findings. (F1) Layers 0 and 1 produce zero BOS-classified heads at every revision in every model: the L0/L1 zero-BOS floor is an architectural property, not a learned outcome. (F2) The whole-model BOS-attractor fraction follows three distinct emergence shapes -- a gradual ramp in Pythia 1B, a sharp phase transition in OLMo 1B (7% to 70% between adjacent checkpoints), and a gradual ramp in OLMoE 1B-7B. (F3) In DCLM models, induction-circuit formation precedes BOS-attractor formation by 10-20x in tokens; capability-circuit formation and attention-sink formation are two transitions, not one. (F4) The capability-specific screen converges to the final induction circuit within 0.3-2% of total training tokens -- circuit identification does not require the final model. (F5) For every final-checkpoint induction head sampled across all three models, per-head PR is elevated at or before the first revision at which that head crosses its capability-selectivity threshold. The results refine the induction-phase-transition framing: in 1B-class models trained on DCLM, the induction transition and the attention-sink transition are separated by an order of magnitude in tokens and have qualitatively different shapes.
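
For readers unfamiliar with the participation ratio, the sketch below uses its common definition over a non-negative spectrum, PR = (sum of values)^2 / (sum of squared values). Which spectrum the paper feeds into its per-head PR signal is not stated in the abstract, so the input here is an assumption.

```python
def participation_ratio(spectrum):
    """Participation ratio of a non-negative spectrum.

    Equals the number of components when all are equal, and 1 when a single
    component carries everything.
    """
    total = sum(spectrum)
    squares = sum(v * v for v in spectrum)
    return (total * total) / squares if squares > 0 else 0.0

flat = participation_ratio([1.0, 1.0, 1.0, 1.0])     # evenly spread
peaked = participation_ratio([1.0, 0.0, 0.0, 0.0])   # one dominant direction
```

Under this definition PR acts as an effective count of directions that carry the signal.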

2606.06888 2026-06-10 cs.LG version update

Data-Constrained Language Model Pretraining: Improved Regularization and Scaling Laws

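Zhiwei Xu, Shihao Wu, Hanseul Cho, Wei Hu, Yixin Wang

Affiliations: University of Michigan; KAIST AI

AI summary: Studies regularization and scaling for data-constrained language model pretraining, proposing masked-input regularization (MIR) to improve validation loss and the SoftQ scaling law to fit the interaction of model size and data size under repeated data more accurately.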

Abstract

Classical scaling laws for language model pretraining balance model size against training dataset size under a fixed compute budget, assuming abundant data and a single pass over the corpus. As training compute grows faster than the supply of natural language data, pretraining is likely to enter a data-constrained, compute-rich regime where models train for multiple epochs over a finite dataset. We study data-constrained pretraining along two axes, regularization and scaling. For regularization, we study masked-input regularization (MIR), an auxiliary next-token prediction loss on randomly masked inputs. MIR tests whether the random masking central to diffusion language models can benefit autoregressive pretraining without architectural changes or inference overhead. Across 72M to 1.4B parameter models, we find that MIR added on top of strong weight decay improves validation loss over autoregressive strong-weight-decay-only models, with downstream gains at 1.4B. For scaling, we propose SoftQ, a scaling law that couples model size and data size to capture their interaction under repeated data. Classical alternatives such as the Chinchilla law use an additive form that decouples these terms, making them misspecified in the data-constrained regime. We find that SoftQ fits data-constrained experiments substantially better than these alternatives, and estimates MIR's gains as equivalent to roughly 1.3 times as much unique training data. We release our code at https://github.com/yixinw-lab/dc_pretrain.

2606.07998 2026-06-10 cs.LG cs.AI version update

Enhancing AI Interpretability and Safety through Localised Architectures

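Ian Seet, Jonas Bozenhard, Simon Ostermann

AI summary: To address the poor interpretability and high computational cost of large generative AI models, proposes localised machine learning architectures that improve interpretability and efficiency through lower bandwidth and higher per-node expressivity, and evaluates the suitability of several hardware implementations.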

Abstract

Recent advances in generative AI, especially powerful Large Language Models (LLMs) and Large Reasoning Models (LRMs), raise concerns over the interpretability, safety and sustainability of these large and opaque AI models. The power of such architectures derives not only from the scalability of deep neural networks but also from massively parallel hardware such as GPU clusters. The diffuse nature of deep neural networks gives them great function-approximation capability when provided with sufficient training data, but imposes a cost in interpretability and computational efficiency. Observing that localised machine learning (ML) models tend to be more interpretable and computationally efficient than deep neural networks on small datasets, we reason by analogy that similar advantages may apply to specific localised hardware ML architectures. We argue that localised architectures with lower bandwidth but higher expressivity per node have the potential to be fundamentally more interpretable than deep neural networks running on GPU clusters while remaining competitive for smaller datasets. We then assess the suitability of various hardware ML paradigms for implementing such localised architectures, evaluating their per-node expressivity, energy efficiency and the practical maturity of the required technology.

2606.08779 2026-06-10 cs.LG version update

Reformulate LLM Reinforcement Learning for Efficient Training under Black-box Discrepancy

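Jiashun Liu, Runze Liu, Xu Wan, Jing Liang, Hongyao Tang, Ling Pan

Affiliations: Hong Kong University of Science and Technology; Zhejiang University; Tianjin University

AI summary: To address the training-inference discrepancy in reinforcement learning, proposes the Discrepancy-Constrained Markov Decision Process (DCMDP), which uses Lagrangian relaxation to adaptively balance performance improvement and discrepancy control for stable, efficient training.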

Abstract

Reinforcement Learning (RL) has emerged as a pivotal post-training paradigm, yet it frequently suffers from unpredictable suboptimal performance or even training collapse. Recent findings attribute these failures to a hidden train-inference discrepancy (or mismatch) stemming from differences in the underlying engines and architectures. We find that the training policy can actively self-correct such a discrepancy when provided with an appropriate learning signal. We further empirically identify a discrepancy tolerance region: within this region, aggressively narrowing the discrepancy can suppress policy exploration and reduce learning efficiency, whereas outside this region, reducing excessive discrepancy improves optimization consistency and raises the achievable local performance ceiling. Based on these findings, we formulate the problem as a Discrepancy-Constrained Markov Decision Process (DCMDP), where reward maximization is coupled with a constraint that aligns training-inference behavior. To adaptively balance performance improvement and discrepancy control, we introduce a Lagrangian relaxation mechanism that dynamically adjusts the relative weight of the two objectives according to the current degree of discrepancy violation. This enables stable dual-objective optimization: the policy is allowed to explore freely within the tolerance region, while being guided back when the discrepancy exceeds the safe boundary. Empirically, DCMDP significantly improves the performance of an 8B dense model (Qwen-3-8b) and a 30B Mixture-of-Experts model (Qwen-3-30bA3b), and enables a heterogeneous training paradigm in which LLMs can be optimized in a high-fidelity training setup while being explicitly aligned for low-cost, resource-constrained inference deployment.
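
The Lagrangian relaxation idea, raising the penalty weight while the discrepancy exceeds its tolerance and letting it decay to zero otherwise, can be sketched with a standard projected dual-ascent update. The function names, the scalar `tolerance`, and the step size `lr` are assumptions; the paper's actual discrepancy measure and update rule are not given in the abstract.

```python
def update_multiplier(lam, discrepancy, tolerance, lr):
    """Projected dual ascent: lam grows only while the constraint is violated."""
    return max(0.0, lam + lr * (discrepancy - tolerance))

def penalized_objective(reward, discrepancy, tolerance, lam):
    """Reward minus the weighted constraint violation (to be maximized)."""
    return reward - lam * (discrepancy - tolerance)

# Outside the tolerance region the multiplier increases.
lam_violated = update_multiplier(0.0, discrepancy=0.5, tolerance=0.2, lr=0.1)
# Inside the region it is clamped at zero, so exploration is not penalized.
lam_inside = update_multiplier(0.0, discrepancy=0.1, tolerance=0.2, lr=0.1)
objective = penalized_objective(1.0, discrepancy=0.5, tolerance=0.2, lam=2.0)
```

Clamping at zero is what leaves the policy unconstrained inside the tolerance region in this sketch.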

2501.04339 2026-06-10 stat.ML cs.LG physics.app-ph version update

Interpretable deep convolutional model for nonlinear multivariate time series in complex systems

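Domjan Baric, Davor Horvatic

Affiliations: Department of Physics, Faculty of Science, University of Zagreb

AI summary: Proposes the DCIts architecture, which is factorized into Focuser and Modeler components to learn locally interpretable interaction structure in nonlinear multivariate time series, recovering stable, signed, lag-resolved interaction patterns while maintaining forecasting accuracy.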

Comments 40 pages, 13 figures

DOI: 10.1063/5.0325209
Journal ref: Chaos 36, 063116 (2026)

Abstract

We introduce the Deep Convolutional Interpreter for Time Series (DCIts), a deep-learning architecture for nonlinear multivariate time series that provides sample-specific, locally interpretable descriptions of the underlying interaction structure. Unlike standard black-box forecasters, DCIts learns a time- and lag-dependent transition tensor explicitly factorized into two components: a Focuser, which selects relevant source series and time lags via a sparse masking mechanism, and a Modeler, which assigns signed coefficients to these selected interactions. This decomposition yields a local lag-adjacency structure and signed source-lag contributions for every forecast instance, enabling direct inspection of effective connectivity; when higher-order branches are activated, the same framework yields order-resolved elementwise polynomial contributions. Architecturally, DCIts uses a diverse bank of convolutional filters to capture temporal and cross-variable dependencies, which are mapped through a bottleneck network to the transition tensor. On controlled benchmark datasets with a known interaction structure, we demonstrate that DCIts achieves competitive forecasting error relative to a strong interpretable baseline while recovering stable, signed, lag-resolved interaction patterns. The framework thus prioritizes intrinsic interpretability, using forecasting accuracy as a faithfulness constraint rather than the sole objective.

2604.06893 2026-06-10 cs.CV cs.LG (updated)

Energy-Regularized Spatial Masking: A Novel Approach to Enhancing Robustness and Interpretability in Vision Models

Tom Devynck, Bilal Faye, Djamel Bouchaffra, Nadjib Lazaar, Hanane Azzag, Mustapha Lebbah

Affiliations: DAVID Lab, UVSQ, Paris-Saclay University; LIPN, UMR CNRS 7030, Sorbonne Paris Nord University; LISN, Paris-Saclay University

AI summary: Proposes the Energy-Regularized Spatial Masking framework, which recasts feature selection as a differentiable energy minimization problem to obtain more robust and interpretable vision models.

Comments 8 pages

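Abstract

Deep convolutional neural networks achieve remarkable performance by exhaustively processing dense spatial feature maps, but this brute-force strategy introduces substantial computational redundancy and encourages reliance on spurious background correlations. To address this, we propose Energy-Regularized Spatial Masking (ERSM), a new framework that reformulates feature selection as a differentiable energy minimization problem. By embedding a lightweight energy-mask layer in a standard convolutional backbone, each visual token is assigned a scalar energy composed of two competing forces: an intrinsic unary importance cost and a pairwise spatial coherence penalty. Unlike prior pruning methods, ERSM lets the network autonomously discover the best information-density balance for each input. We validate ERSM on convolutional architectures, showing that it yields emergent sparsity, improves robustness to structured occlusion, and produces highly interpretable spatial masks while maintaining classification accuracy. We further show that the learned energy ranking significantly outperforms magnitude-based pruning in deletion-based robustness tests, revealing ERSM as an intrinsic denoising mechanism that isolates semantic object regions without pixel-level supervision.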

Interpreting and Steering Text-to-Speech Language Models with Sparse Autoencoders

Nikita Koriagin, Georgii Aparin, Nikita Balagansky, Daniil Gavrilov

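AI summary: Trains BatchTopK sparse autoencoders on the language-model backbone of CosyVoice3 and finds that the features are interpretable and causally controllable, able to manipulate laughter, speaker gender, and speech rate.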

Abstract

Language models increasingly serve as the backbone of text-to-speech (TTS) systems, yet we understand little about the representations they build when text and generated speech tokens share a single residual stream. We train BatchTopK sparse autoencoders on the LM backbone of CosyVoice3 and introduce a modality-aware auto-interp pipeline that labels each feature by where it fires: text-prefix context, 1-second speech clips, or both. The recovered features are interpretable, spanning phonemes, laughter, accent prompts, and speaker gender. Steering through the SAE latent space shows these features are causal rather than merely descriptive: targeted interventions raise laughter probability from 0.02 to 0.79, flip perceived speaker gender, and control speech rate while preserving spoken content. SAE features thus serve both as interpretability objects and as control directions for TTS synthesis.

Learning Doubly Sparse Explicitly Conditioned Transforms

Tudor Pistol

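AI summary: Proposes learning a structured, explicitly conditioned transform formed as the product of a fixed canonical matrix and an adaptive sparse component, retaining the advantages of fast, stable analytical transforms while adding controllable adaptivity; experiments report state-of-the-art performance on the doubly sparse transform learning problem.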

Comments 10 pages, 1 figure, 1 table. Accepted for publication in Procedia Computer Science (30th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems - KES 2026; Invited Session: Global and Constrained Optimization: Algorithms and Applications)

Abstract

Finding convenient spaces in which certain hypotheses regarding an assumed sparse structure of natural signals hold true has become a desirable result in recent research, its implications being reflected in areas such as data compression, noise reduction and feature extraction. While the extensively used analytical transforms, such as DFT or DCT, already provide efficient algorithms and robust sparse representations, they assume a fixed prior about the data, failing to accurately capture the specific structure of more restrictive classes of signals. To address this, the concept of a data-adaptive, learnt transform has been introduced in the literature, allowing for the reduction of a residual term in the transform domain. More recent studies have shown that the condition number serves as a good metric in this context, where the desired outcome alternates between a generalizing tendency and one that achieves minimal approximation error. Motivated by these considerations, we introduce the learning of a structured, explicitly conditioned transform formulated as the product of a fixed canonical matrix and a refining data-adaptive sparse component. This approach seeks to preserve the advantages of fast and stable analytical transforms, while introducing controllable adaptivity to the data. No references that concern this specific formulation have been identified so far, indicating its novelty. The proposed algorithm is motivated within the framework of inexact proximal methods, leveraging a newly derived closed-form projection operator. Empirical observations demonstrate state-of-the-art results on the doubly sparse transform learning problem and comparable performance with its dense variant at significantly lower computational costs and sometimes faster convergence and better avoidance of bad local minima.

2606.09856 2026-06-10 cs.CL cs.AI cs.LG stat.ML (cross-list)

Using Probabilistic Programs to Train Inductive Reasoning in Large Language Models

Liyi Zhang, Akshay K. Jagadish, Brenden M. Lake, Thomas L. Griffiths

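AI summary: Proposes Program-based Posterior Training (PPT), which uses an LLM to generate scenarios as probabilistic programs, runs inference to produce distributional targets, and fine-tunes the model on them to improve inductive-reasoning accuracy, agreement with human judgments, and calibration.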

Comments 20 pages, 5 figures

Abstract

Post-training Large Language Models (LLMs) for reasoning typically focuses on deductive tasks such as mathematics and coding where correctness is verifiable. Yet, many real-world reasoning problems are inductive: agents must infer uncertain beliefs from sparse, ambiguous observations. There are challenges to using standard fine-tuning methods for inductive reasoning, including difficulties in curating large-scale, high-quality labeled datasets and in handling targets that are inherently distributional. In this work, we introduce a novel approach, called Program-based Posterior Training (PPT), to address these limitations: we use an LLM to generate diverse open-world scenarios as probabilistic programs, run probabilistic inference to produce distributional target responses to queries, and then fine-tune on these probabilistic soft labels. Using this approach, we fine-tune LLMs on 10,000 programmatically generated scenarios and evaluate on held-out motifs, human-labeled judgments, and external benchmarks. Overall, PPT substantially improves estimation accuracy on held-out inductive tasks, increases alignment with human judgments, and transfers to external benchmarks for estimation and calibration. Additionally, the gains in raw calibration are not subsumed by post-hoc temperature scaling, showing that the models have more deeply internalized uncertainty compared to output rescaling. Together, these results suggest that probabilistic-program-mediated fine-tuning is a promising approach for post-training LLMs to reliably perform approximate inductive inference.
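
The pipeline of running inference on a probabilistic program and fine-tuning on the resulting soft labels can be sketched on a toy scenario. The example below is an assumption-laden illustration, not the paper's code: the coin scenario, the exact-enumeration `posterior` helper, and the `soft_label_cross_entropy` loss are all hypothetical stand-ins for whatever programs, inference engine, and objective the authors use.

```python
import math


def posterior(prior, likelihood, observations):
    """Exact posterior over a finite hypothesis set by enumeration."""
    unnormalized = {}
    for hypothesis, p in prior.items():
        like = 1.0
        for obs in observations:
            like *= likelihood(hypothesis, obs)
        unnormalized[hypothesis] = p * like
    z = sum(unnormalized.values())
    return {h: w / z for h, w in unnormalized.items()}


def soft_label_cross_entropy(target, predicted):
    """Cross-entropy of a predicted distribution against a soft target."""
    return -sum(t * math.log(predicted[h]) for h, t in target.items() if t > 0)


# Toy scenario: is a coin fair or biased toward heads, given three heads?
prior = {"fair": 0.5, "biased": 0.5}


def coin_likelihood(hypothesis, obs):
    p_heads = 0.5 if hypothesis == "fair" else 0.9
    return p_heads if obs == "H" else 1.0 - p_heads


target = posterior(prior, coin_likelihood, ["H", "H", "H"])
loss_matched = soft_label_cross_entropy(target, target)
loss_uniform = soft_label_cross_entropy(target, {"fair": 0.5, "biased": 0.5})
```

The target here is a distribution rather than a single correct answer, which is why a soft-label loss is used: a model that reports the posterior itself incurs a lower loss than one that stays at the uniform prior.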

2606.09893 2026-06-10 eess.IV cs.AI cs.LG (cross-list)

Tractogram foundation model

Guikun Chen, Yuqian Chen, Yijie Li, Yogesh Rathi, Nikos Makris, Fan Zhang, Wenguan Wang, Lauren J. O'Donnell

Affiliations: The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou; Department of Radiology, Brigham and Women's Hospital, Mass General Brigham, Boston; Harvard Medical School, Boston; Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin; School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu; Psychiatry Neuroimaging Laboratory, Brigham and Women's Hospital, Mass General Brigham, Boston; Department of Psychiatry, Center for Morphometric Analysis, Massachusetts General Hospital, Boston

AI summary: Proposes TractFM, a foundation model that learns reusable representations directly from whole-brain streamline sets, combining a local streamline encoder with a permutation-equivariant tractogram encoder; pretraining on dense anatomical tract parcellation enables transfer to streamline-level and subject-level tasks.

Abstract

Diffusion MRI (dMRI) tractography is the only noninvasive approach for mapping white-matter pathways in the living human brain. It represents each brain as a tractogram: a large, unordered set of three-dimensional streamlines that includes information about both local streamline geometry and whole-brain anatomical organization. This structure makes tractograms a natural but challenging target for representation learning. Existing methods treat streamline classification and subject-level prediction as separate problems: streamline classifiers focus on geometric patterns, whereas subject-level prediction often depends on hand-crafted features. As a result, current methods do not learn reusable representations that connect streamline anatomy with whole-brain inter-subject variation. Here we introduce TractFM, a tractogram foundation model that learns reusable representations directly from whole-brain streamline sets. TractFM combines a local streamline encoder with a permutation-equivariant tractogram encoder, allowing all streamlines from a subject to be contextualized jointly in a single forward pass. Pretraining on dense anatomical tract parcellation, i.e., assigning anatomical labels to individual streamlines, yields two complementary representations: contextualized streamline-level embeddings for tract parcellation and compact subject-level descriptors for downstream prediction of subject phenotypes. Across three tractography algorithms and five dMRI datasets, TractFM transfers to both streamline-level and subject-level tasks. Its frozen representations achieve accurate tract parcellation and predict age and sex across independent datasets. These results show that whole-brain geometric context, learned once, can generalize across tractography pipelines, datasets, and prediction tasks.

2502.07027 2026-06-10 cs.LG cs.AI (updated)

Representational Alignment with Chemical Induced Fit for Molecular Relational Learning

Peiliang Zhang, Jingling Yuan, Qing Xie, Yongjun Zhu, Chao Che, Lin Li

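Affiliations: Wuhan University of Technology; Yonsei University; Hubei Key Laboratory of Transportation Internet of Things; Dalian University

AI summary: Proposes ReAlignFit, which dynamically aligns substructure representations through an inductive bias based on chemical induced fit and uses a subgraph information bottleneck to refine substructure pairs with high chemical functional compatibility, improving the stability of molecular relational learning on data shifted in chemical space.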

Comments Accepted by SIGKDD2026 AI for Science Track

Abstract

Molecular Relational Learning (MRL) is widely applied in the natural sciences to predict relationships between molecular pairs by extracting structural features. The representational similarity between substructure pairs determines the functional compatibility of molecular binding sites. Nevertheless, aligning substructure representations with attention mechanisms lacks guidance from chemical knowledge, resulting in unstable model performance on data shifted in chemical space (e.g., functional group, scaffold). With theoretical justification, we propose Representational Alignment with Chemical Induced Fit (ReAlignFit) to enhance the stability of MRL. ReAlignFit dynamically aligns substructure representations in MRL by introducing an inductive bias based on chemical Induced Fit. In the induction process, we design the Bias Correction Function based on substructure edge reconstruction to align representations between substructure pairs by simulating chemical conformational changes (dynamic combination of substructures). ReAlignFit further integrates the Subgraph Information Bottleneck during the fit process to refine and optimize substructure pairs exhibiting high chemical functional compatibility, leveraging them to generate molecular embeddings. Experimental results on nine datasets demonstrate that ReAlignFit outperforms state-of-the-art models in two tasks and significantly enhances the model's stability in both rule-shifted and scaffold-shifted data distributions.

2505.11702 2026-06-10 cs.LG stat.ML (updated)

Post-Training Augmentation Invariance

Keenan Eikenberry, Lizuo Liu, Yoonsang Lee

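Affiliations: Department of Mathematics, Dartmouth College

AI summary: Proposes a framework for post-training augmentation invariance that uses lightweight MLP adapter networks on the latent space of a pretrained model to achieve approximate invariance without fine-tuning while preserving the original features.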

Abstract

This work develops a framework for post-training augmentation invariance, in which our goal is to add invariance properties to a pretrained network without altering its behavior on the original, non-augmented input distribution. We define this notion precisely and additionally introduce augmented encoders, which are probabilistic encoders that formalize augmentation-based encoding processes and that serve as our fundamental object of study. We introduce two losses for augmented encoders, namely, Markov-Wasserstein minimization and Wasserstein correlation maximization, and we demonstrate empirically that both losses can be used to train lightweight, one-hidden-layer MLP adapter networks $E_θ$ that, when appended to the latent space of a pretrained network $F$, do indeed lead to (approximate) post-training augmentation invariance. For example, on STL10 with $F=\text{DINO}$ features, the composite network $C\circ E_θ\circ F$, where $C$ is a linear classifier and where $E_θ$ is one of our proposed adapter networks, achieves 94% classification accuracy on arbitrarily rotated images, whereas a network of the form $C\circ F$ without the adapter $E_θ$ drops to 71% accuracy. Similarly, we can boost noise-invariant classification results from 58% up to 86%. Significantly, we obtain these results with no fine-tuning (the weights of $F$ remain frozen throughout), and our methods introduce little corruption to the original features, since $E_θ$ acts nearly isometrically on the non-augmented latent distribution. In contrast, we show that adapter networks trained with alternative candidate losses, specifically SimCLR and HSIC maximization, produce uncompetitive classification results and fundamentally corrupt the original latent space. Code available at https://github.com/keenan-eikenberry/augmentation_invariance

2602.19393 2026-06-10 cs.LG (updated)

In Defense of Cosine Similarity: Normalization Eliminates the Gauge Freedom

Taha Bouhsine

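AI summary: Proves that when embeddings are constrained to the unit sphere, the diagonal gauge-matrix ambiguity vanishes and cosine distance is monotonically equivalent to Euclidean distance, resolving the concern that cosine similarity is arbitrary.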

Comments This was a blog post companion draft, it needs to be updated to fit as a preprint, will do later

Abstract

Steck, Ekanadham, and Kallus [arXiv:2403.05440] demonstrate that cosine similarity of learned embeddings from matrix factorization models can be rendered arbitrary by a diagonal "gauge" matrix $D$. Their result is correct and important for practitioners who compute cosine similarity on embeddings trained with dot-product objectives. However, we argue that their conclusion, cautioning against cosine similarity in general, conflates the pathology of an incompatible training objective with the geometric validity of cosine distance on the unit sphere. We prove that when embeddings are constrained to the unit sphere $\mathbb{S}^{d-1}$ (either during or after training with an appropriate objective), the $D$-matrix ambiguity vanishes identically, and cosine distance reduces to exactly half the squared Euclidean distance. This monotonic equivalence implies that cosine-based and Euclidean-based neighbor rankings are identical on normalized embeddings. The "problem" with cosine similarity is not cosine similarity; it is the failure to normalize.
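
The half-squared-distance identity follows from expanding the norm: for unit vectors $u$ and $v$, $\|u-v\|^2 = \|u\|^2 + \|v\|^2 - 2\,u\cdot v = 2 - 2\cos(u,v)$, so $1-\cos(u,v) = \tfrac{1}{2}\|u-v\|^2$. A small numerical check in plain Python, using random vectors as stand-ins for learned embeddings (an assumption made only for illustration), is sketched below.

```python
import math
import random


def normalize(v):
    """Project a vector onto the unit sphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_distance(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def squared_euclidean(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))


rng = random.Random(0)
query = normalize([rng.gauss(0, 1) for _ in range(8)])
items = [normalize([rng.gauss(0, 1) for _ in range(8)]) for _ in range(20)]

cos_d = [cosine_distance(query, x) for x in items]
euc_d = [squared_euclidean(query, x) for x in items]

# Largest deviation from the identity 1 - cos = 0.5 * ||u - v||^2.
max_gap = max(abs(c - 0.5 * e) for c, e in zip(cos_d, euc_d))

# Neighbor rankings under the two distances.
rank_cos = sorted(range(len(items)), key=lambda i: cos_d[i])
rank_euc = sorted(range(len(items)), key=lambda i: euc_d[i])
```

On normalized inputs the gap is at floating-point rounding level and the two rankings coincide; skipping the `normalize` step is what allows the rankings to diverge.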

2606.06624 2026-06-10 cs.LG (updated)

Principles and Practice of Deep Representation Learning: or a Mathematical Theory of Memory

Sam Buchanan, Druv Pai, Peng Wang, Yi Ma

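AI summary: This book uses the lens of representation learning to explain the design principles of modern neural network architectures through optimization and information theory, aiming to open the black box and improve interpretability, reliability, and controllability.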

Comments version 2; TeX source and supplementary material at https://ma-lab-berkeley.github.io/deep-representation-learning-book/

Abstract

In the current era of deep learning and especially generative models, there is significant investment in training very large deep neural networks. Thus far, such models have been "black boxes" that are difficult to understand in the sense that they have opaque internal mechanisms, leading to difficulties in interpretability, reliability, and control. Naturally, this lack of understanding has led to both hype and fear. This book is an attempt to "open the black box" and understand the mechanisms of large deep networks, through the perspective of representation learning, which is a major factor - arguably the single most important one - in the empirical power of deep learning models. A brief outline of this book is as follows. Chapter 1 will summarize the threads that underlie the whole text. Chapters 2, 3, 4, 5, and 6 will explain the design principles of modern neural network architectures through optimization and information theory, reducing the process of architecture development (long having been described as a sort of "alchemy") to undergraduate-level linear algebra and calculus exercises once the underlying principles are introduced. Chapters 7 and 8 will discuss applications of these principles to solve problems in more paradigmatic ways, obtaining new methods and models which are efficient, interpretable, and controllable by design, and yet no less - sometimes even more - powerful than the black-box models they resemble. Chapter 9 will discuss potential future directions for deep learning, the role of representation learning, as well as some open problems.

2606.09883 2026-06-10 cs.LG cs.AI (new)

TD-Grokking: Learning from Zero-Reward Problems by Training-Time Decomposition

Ningyuan Xi, Hao Xu, Hongsheng Xin, Ning Miao

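Affiliations: Ningyuan Xi 1,2; Hao Xu 3; Hongsheng Xin 3; Ning Miao 1,2, †

AI summary: To address reinforcement learning's lack of optimization signal on zero-reward problems, proposes TD-Grokking, a training-time decomposition framework that recursively decomposes intractable problems into verifiable subproblems and outperforms baselines on mathematical and medical tasks.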

Abstract

Large language models (LLMs) have made remarkable progress in reasoning tasks, largely driven by post-training paradigms, especially reinforcement learning with verifiable rewards (RLVR). However, a critical bottleneck persists: RLVR fails on highly challenging zero-reward problems, where all sampled reasoning trajectories yield uniformly failed outcomes, providing no optimization signal to drive model improvement. Prior efforts to address this limitation, such as dense process supervision, partial reward assignment, or prefix-guided exploration, suffer from inherent task constraints or do not fully equip the policy model with the capabilities necessary to solve the original intractable problems. To address this, we propose TD-Grokking, a training-time decomposition framework for zero-reward problems. It recursively decomposes intractable root problems into self-contained, verifiable subproblems, forming hierarchical trees where solvable leaves provide non-zero rewards. Evaluations on mathematical and medical tasks show that TD-Grokking outperforms vanilla GRPO as well as all baseline approaches. Together with detailed analysis, these results confirm that training-time decomposition effectively converts zero-reward examples into usable training signals, enabling consistent performance gains. Our code and datasets are available at https://anonymous.4open.science/r/TD-Grokking-6567/.
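
The recursive decomposition into solvable leaves can be sketched as a tree walk. This is a toy illustration under stated assumptions: `can_solve` and `decompose` are hypothetical callbacks standing in for the policy's sampled success and the paper's decomposition step, and the depth limit is an added safeguard that the abstract does not mention.

```python
def collect_training_leaves(problem, can_solve, decompose, max_depth=3):
    """Return the solvable subproblems reachable from a root problem.

    A problem the policy can already solve is a leaf that yields non-zero
    reward. An unsolved problem is split and its parts are searched
    recursively until the depth limit is reached.
    """
    if can_solve(problem):
        return [problem]
    if max_depth == 0:
        return []
    leaves = []
    for sub in decompose(problem):
        leaves.extend(
            collect_training_leaves(sub, can_solve, decompose, max_depth - 1)
        )
    return leaves


# Toy task: a "problem" is a tuple of numbers to add, and the hypothetical
# policy only succeeds when at most two numbers are involved.
def toy_can_solve(problem):
    return len(problem) <= 2


def toy_decompose(problem):
    mid = len(problem) // 2
    return [problem[:mid], problem[mid:]]


leaves = collect_training_leaves((1, 2, 3, 4, 5), toy_can_solve, toy_decompose)
```

The root problem contributes no reward on its own, but the collected leaves can each be verified and rewarded, which is the mechanism the abstract credits for turning zero-reward examples into training signal.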

2606.09887 2026-06-10 cs.LG cs.AI cs.CL (new)

SocraticPO: Policy Optimization via Interactive Guidance

Zirui Liu, Jie Ouyang, Qi Liu, Xianquan Wang, Jiayu Liu, Tingyue Pan, Qingchuan Li, Jing Sha, Zhenya Huang, Shijin Wang, Enhong Chen

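Affiliations: State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China; iFLYTEK AI Research (Central China), iFLYTEK Co., Ltd

AI summary: Proposes SocraticPO, a framework that adds natural-language guidance to reinforcement-learning rollouts to assist reasoning and uses reward decay to keep the model from relying on teacher help, improving performance on scientific reasoning tasks.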

Abstract

Reinforcement learning (RL) for large language models usually supervises reasoning with scalar outcome rewards, such as binary correctness. Such rewards provide an optimization direction but rarely explain how a model should revise its mistaken reasoning, which can encourage shortcut learning and brittle policies. We propose SocraticPO (Socratic Policy Optimization), a policy-optimization framework that augments RL rollouts with Socratic-style natural-language guidance. During rollout, the student first answers independently; if the answer is incorrect, a teacher diagnoses the attempt and provides concise corrective guidance, after which the student continues under the expanded context. Crucially, this guidance is paired with reward decay: correct answers obtained after teacher intervention only receive decayed rewards, preventing the policy from treating teacher help as a free path to reward. Since SocraticPO only modifies the rollout process while leaving the standard expected-reward objective intact, it can be plugged into existing policy-gradient backends such as Reinforce++. Moreover, because the teacher provides only text-level guidance, SocraticPO can leverage stronger black-box teacher models without requiring access to logits or distribution matching. On undergraduate-level scientific reasoning benchmarks from SciKnowEval, SocraticPO improves over strong RL and self-distillation baselines. Ablations show that both targeted guidance and reward decay are necessary, with reward decay mitigating reliance on assisted correction.
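
The guided rollout with reward decay can be sketched in a few lines. The abstract does not state the decay schedule, so the geometric form `decay ** hints_used`, the hint limit, and the callback names `student`, `teacher`, and `check` are all assumptions chosen for illustration.

```python
def decayed_reward(is_correct, hints_used, decay=0.5):
    """Outcome reward, discounted once per teacher intervention (assumed form)."""
    if not is_correct:
        return 0.0
    return decay ** hints_used


def guided_rollout(student, teacher, check, question, max_hints=2, decay=0.5):
    """Student answers alone first; the teacher adds guidance after each miss."""
    context = question
    answer = None
    for hints_used in range(max_hints + 1):
        answer = student(context)
        if check(answer):
            return answer, decayed_reward(True, hints_used, decay)
        if hints_used < max_hints:
            # The teacher sees the failed attempt and extends the context.
            context = context + "\n" + teacher(context, answer)
    return answer, 0.0


# Toy stand-ins: this student only succeeds once a hint is in the context.
def toy_student(context):
    return "4" if "hint" in context else "5"


def toy_teacher(context, answer):
    return "hint: add the two numbers"


answer, reward = guided_rollout(
    toy_student, toy_teacher, lambda a: a == "4", "What is 2 + 2?"
)
```

An unaided correct answer keeps the full reward of 1.0 while the assisted one here receives 0.5, so the policy gradient still favors solving the problem without the teacher.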

2606.09932 2026-06-10 cs.LG cs.AI (new)

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

Runze Liu, Jiashun Liu, Xu Wan, Yuqian Fu, Ling Pan

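AI summary: To address the limited RL improvement caused by excessive SFT, proposes Rejuvenation, which restores model plasticity through base-anchored model fusion and neuron reset, improving RL performance on math reasoning and agentic tasks.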

Abstract

Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has become a standard pipeline for Large Language Model (LLM) post-training. SFT is expected to provide a useful behavioral prior for RL to further enhance model capabilities. However, checkpoints with excessive SFT often show limited improvement during RL. We attribute this failure to the loss of model plasticity: the reduced ability of an SFT-initialized policy to be effectively reshaped by subsequent RL. To better understand this phenomenon, we conduct a detailed analysis from multiple perspectives, including parameter changes, output spaces, and RL optimization dynamics. Our results show that models from excessive SFT tend to produce over-confident token distributions and exhibit sharp parameter landscapes, which make them harder to optimize in the RL stage. To enable a more robust SFT-to-RL handoff, we propose Rejuvenation, a simple yet effective method that restores plasticity while preserving useful SFT-acquired priors. Rejuvenation leverages base-anchored model fusion to reduce excessive SFT-induced drift and targeted neuron reset to mitigate model rigidity. Experimental results on both math reasoning tasks and agentic tasks demonstrate that our approach consistently improves RL performance on over-trained SFT models, while also enhancing generalization to out-of-distribution tasks.
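
The two ingredients named above can be sketched on flat weight lists. The abstract specifies neither the fusion rule nor the reset criterion, so both are assumptions here: fusion is taken to be linear interpolation toward the base model, and reset is taken to restore base weights for neurons whose hypothetical rigidity score exceeds a threshold.

```python
def fuse_with_base(base, sft, anchor=0.3):
    """Pull SFT weights toward the base model by linear interpolation.

    anchor = 0 keeps the SFT weights unchanged; anchor = 1 returns the base.
    """
    return [(1 - anchor) * s + anchor * b for b, s in zip(base, sft)]


def reset_neurons(sft_neurons, base_neurons, scores, threshold):
    """Restore base weights for neurons flagged as rigid.

    scores[k] is a hypothetical per-neuron rigidity score; how such a score
    would be computed is not specified in the abstract.
    """
    return [
        list(b) if score > threshold else list(w)
        for w, b, score in zip(sft_neurons, base_neurons, scores)
    ]


fused = fuse_with_base(base=[0.0, 1.0], sft=[1.0, 1.0], anchor=0.25)
reset = reset_neurons(
    sft_neurons=[[1.0, 1.0], [2.0, 2.0]],
    base_neurons=[[0.0, 0.0], [0.0, 0.0]],
    scores=[0.9, 0.1],
    threshold=0.5,
)
```

Interpolation limits how far the policy has drifted overall, while the reset touches only selected neurons, which matches the stated goal of restoring plasticity without discarding the useful SFT prior.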

2606.09961 2026-06-10 cs.LG cs.AI (new)

Baseline-Free Policy Optimization for Neural Combinatorial Optimization

Carlos S. Sepúlveda, Gonzalo A. Ruz

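AI summary: Proposes using the GRPO algorithm to remove the baseline dependency in neural combinatorial optimization, avoiding training collapse and reaching performance close to POMO on TSP and CVRP.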

Abstract

Neural combinatorial optimization (NCO) trains autoregressive policies to solve routing problems. The standard training algorithm, REINFORCE with a rollout baseline, requires maintaining and periodically updating a frozen copy of the policy for variance reduction. This baseline introduces a structural vulnerability: on harder instances, a poor baseline produces noisy gradient estimates that can destabilize training. We evaluate Group Relative Policy Optimization (GRPO), an algorithm from large language model alignment that eliminates the baseline entirely by normalizing advantages within groups of sampled trajectories. In a controlled comparison of five RL algorithms on TSP and CVRP benchmarks within the RL4CO framework, we find that: (i) GRPO avoids the training collapse observed with REINFORCE on TSP-100, where performance degrades from cost 9.8 to 52.1 immediately after the warmup phase and does not recover under extended training; (ii) at matched gradient updates, GRPO achieves solution quality within 2% of POMO, a strong AM-based multi-start baseline, while requiring no external baseline; and (iii) P3O, a pairwise preference algorithm also from the alignment literature, is competitive on TSP but shows higher variability on CVRP. These results identify GRPO as a promising baseline-free alternative for NCO, particularly in settings where baseline-dependent training becomes fragile.
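
Normalizing advantages within a group of sampled trajectories can be sketched as follows. This is a minimal illustration assuming the reward is the negative tour cost and the normalization is mean and standard deviation within the group; the function name and the `eps` stabilizer are illustrative choices, and the sketch does not use the RL4CO API.

```python
import statistics


def group_relative_advantages(costs, eps=1e-8):
    """Advantages for K tours sampled on the same problem instance.

    Reward is the negative tour cost, so shorter tours get positive
    advantages. No frozen rollout policy is needed: the group mean plays
    the role of the baseline.
    """
    rewards = [-c for c in costs]
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


# Four sampled tour costs for one instance.
advantages = group_relative_advantages([9.8, 10.4, 11.0, 10.0])
```

Because the reference point is recomputed from each freshly sampled group, there is no separate baseline policy that can become stale or poor on harder instances.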

2606.10448 2026-06-10 cs.LG cs.AI (new)

Mitigating Bias in Low-SNR Financial Reinforcement Learning via Quantum Representations

Zeyu Liu, Xuanzhi Feng, Sing Kwong Lai, Yuanchen Gao, Xiaoyi Pang, Hualei Zhang, Jingcai Guo, Jie Zhang, Song Guo

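Affiliations: The Hong Kong University of Science and Technology

AI summary: To address the instability of SAC in low-SNR financial markets, proposes FPQC-SAC, a variant that uses a parameterized quantum circuit at the representation level to constrain feature propagation and reduce the impact of extreme fluctuations, achieving a 66.89% relative gain in cumulative return on real-world portfolio management tasks.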

Comments Preprint. Code available at https://github.com/ZeyuLIU-UST/FPQC-SAC-main

Abstract

The financial market is a typical low signal-to-noise ratio (SNR) setting, which often destabilizes off-policy maximum-entropy methods like Soft Actor-Critic (SAC). Specifically, noisy state representations may produce unreliable Q-value estimates, and bootstrapping amplifies these errors, forming a failure mode we call the "Financial Entropy Trap". In this paper, we propose FPQC-SAC, an efficient and plug-and-play SAC variant that places a compact and bounded Parameterized Quantum Circuit (PQC) before the actor and critic networks to constrain feature propagation at the representation level, rather than filtering raw inputs or regularizing Q-values after bootstrapping. Notably, FPQC-SAC reduces the impact of extreme market fluctuations on Bellman target estimation, while trainable quantum entanglement preserves flexible cross-asset interactions. Empirical evaluations on real-world portfolio management tasks demonstrate that FPQC-SAC substantially enhances out-of-sample stability and cumulative returns by achieving a 66.89% relative gain in cumulative return over standard unconstrained SAC and outperforms the best continuous-control deep reinforcement learning baseline by approximately 27%. Open-source code is available at https://github.com/ZeyuLIU-UST/FPQC-SAC-main.

2606.10528 2026-06-10 cs.LG cs.CL (new)

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output

Guozheng Li, Xiyan Fu, Yiwen Guo

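Affiliations: Southeast University; Nanyang Technological University; Independent Researcher

AI summary: Proposes representation-aware advantage estimation, which uses reward-model hidden states as auxiliary signals and computes advantages via graph propagation to improve the sample efficiency and robustness of RLHF.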

Abstract

Current reinforcement learning from human feedback (RLHF) methods primarily rely on scalar rewards from a trained reward model (RM). While effective, scalar rewards are often noisy and fail to capture fine-grained preference differences, whereas RM hidden states encode richer semantic and preference information. We introduce representation-aware advantage estimation, which leverages RM hidden states and models them as auxiliary signals for better advantage estimation. Specifically, we propose Graph-based Advantage Estimation (GraphAE), which treats each sampled group as a graph in which nodes correspond to responses and edges capture their similarity in the RM hidden space. Advantages are then computed via graph propagation, enabling each sample to incorporate contextual information from its neighbors. GraphAE is lightweight and can be seamlessly integrated into existing group-based RL algorithms. We apply GraphAE to GRPO, GSPO, and RLOO, and conduct extensive experiments on different models and benchmarks. Empirical results show consistent improvements across three benchmarks, with gains of up to +6.3 on Arena-Hard-v0.1, +8.27 on AlpacaEval 2.0, and +0.22 on MT-Bench. These results demonstrate that leveraging RM representations leads to more sample-efficient and robust RLHF.
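
The graph-propagation idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the cosine-similarity edges, the softmax temperature, the single propagation step, and the mixing weight `alpha` are all assumptions chosen for illustration, and `graph_advantages` is a hypothetical name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def graph_advantages(rewards, hidden, alpha=0.5, temperature=0.1):
    """Blend each response's group-relative advantage with its neighbors'.

    rewards: scalar RM rewards for one sampled group.
    hidden:  one RM hidden-state vector per response (assumed given).
    alpha:   hypothetical mixing weight; alpha=0 gives the plain baseline.
    """
    n = len(rewards)
    if n < 2:
        raise ValueError("need at least two responses")
    mean_reward = sum(rewards) / n
    base = [r - mean_reward for r in rewards]  # group-mean baseline
    result = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        # Edge weights: softmax over similarity to every other response.
        logits = [cosine(hidden[i], hidden[j]) / temperature for j in others]
        top = max(logits)
        weights = [math.exp(x - top) for x in logits]
        total = sum(weights)
        neighbor_avg = sum(w * base[j] for w, j in zip(weights, others)) / total
        # One propagation step: mix own advantage with the neighbor average.
        result.append((1.0 - alpha) * base[i] + alpha * neighbor_avg)
    return result
```

With `alpha=0.0` the function reduces to the usual group-mean baseline, which makes it easy to compare against an unmodified group-based estimator.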

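2606.10611 2026-06-10 cs.LG cs.CV New submission

Geometry-Aware Reinforcement Learning for 2D Irregular Nesting

Auguste Lehuger, Guillaume Henon-Just

Affiliations: Valeo Brain

AI summary: Proposes the Polygons Transformer architecture and a combinatorial-optimization reinforcement learning framework that let an agent learn geometric priors from data, reaching area utilization competitive with the state-of-the-art heuristic solver Sparrow on 2D irregular nesting.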

Comments 15 pages, 4 figures, 5 tables. Under review at the European Workshop on Reinforcement Learning (EWRL)

Abstract

Traditional heuristic solvers for the 2D irregular nesting problem share a fundamental limitation: they are blind to polygon geometry, relying on guided brute-force to navigate the continuous placement space with minimal geometrical guidance. In this paper, we argue that Reinforcement Learning is uniquely positioned to overcome this bottleneck. By pairing an optimization policy with a geometry-aware neural encoder, an agent can automatically discover rich geometric priors directly from data, utilizing these learned intuitions to strategically guide exploration. To realize this, we introduce the Polygons Transformer (PoT), a novel architecture that encodes 2D continuous vector geometries while allowing cross-polygons attention. We couple this novel architecture with a Combinatorial Optimization Reinforcement Learning (CORL) training framework to find optimal solutions. To support this paradigm, we release an open-source training dataset derived from complex geographic contours alongside a dedicated evaluation benchmark. Our empirical validation demonstrates that our trained agent achieves area utilization performance highly competitive with Sparrow, the state-of-the-art heuristic solver, proving that reinforcement learning can successfully discover and exploit geometric awareness for precise spatial tasks.

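2606.10613 2026-06-10 cs.LG cs.AI New submission

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning

Thanh Nguyen, Tri Ton, Hongbin Choe, Tung M. Luu, Chang D. Yoo

Affiliations: University of California, Berkeley

AI summary: Proposes Bootstrapped Flow Q-Learning (BFQ), which takes a divide-and-conquer view of the displacement vector and bootstraps short-range components to achieve single-step action generation without auxiliary networks or distillation, substantially reducing computational cost while improving performance.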

Comments ICML 2026, 19 pages

Journal ref: ICML 2026

Abstract

Diffusion-based Q-learning has emerged as a powerful paradigm for offline reinforcement learning, but its reliance on multi-step denoising makes both training and inference computationally expensive and brittle. Recent efforts to accelerate diffusion Q-learning toward single-step action generation typically introduce auxiliary networks, policy distillation, or multi-phase training, which frequently compromise simplicity, stability, or performance. To address these limitations, we introduce Bootstrapped Flow Q-Learning (BFQ), a novel framework that enables accurate single-step action generation during both training and inference, without auxiliary networks or distillation procedures. BFQ adopts a divide-and-conquer view of the displacement vector along the flow path: it begins by learning short-range displacements that can be accurately estimated from the Flow Matching marginal velocity, and bootstraps these components to directly learn a noise-to-action mapping in a single step. This formulation eliminates multi-step denoising, resulting in a learning procedure that is substantially faster, simpler, and more robust. Extensive D4RL evaluations show that BFQ improves performance while significantly reducing computational cost compared to multi-step diffusion baselines, demonstrating that single-step action generation suffices for high-performance offline Reinforcement Learning.

2606.10646 2026-06-10 cs.LG cs.CL New submission

How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs

Zhichen Dong, Yang Li, Yuhan Sun, Weixun Wang, Yijia Luo, Zinian Peng, Taiheng Ye, Chao Yang, Wenbo Su, Yu Cheng, Bo Zheng, Junchi Yan

Affiliations: Shanghai Jiao Tong University; Alibaba Group; Shanghai Artificial Intelligence Laboratory

AI summary: Proposes FlowTracer, a framework that traces answer-targeted reasoning flow on an attention-induced directed acyclic graph and assigns token-level credit from the global information-flow structure, improving reinforcement learning for LLMs on reasoning tasks.

Comments 25 pages, 7 figures, 11 tables. Accepted at ICML 2026

Abstract

Token-level credit assignment remains a key obstacle for reinforcement learning (RL) in large language models (LLMs), where RL recipes typically treat all tokens equally, failing to distinguish decisive reasoning steps from routine formatting or fluent filler. Recent attempts leverage model-internal signals to assign finer-grained credit, but these are often point-wise heuristics that ignore the global structure of information propagation. We propose FlowTracer, an RL framework that traces answer-targeted reasoning flow on an attention-induced directed acyclic graph, in which nodes correspond to tokens and edge capacities come from aggregated attention weights, and derives token credit from this global structure. The edge capacities are reweighted to retain only the influence that can reach the answer region, while enforcing local flow conservation so that intermediate tokens neither lose nor gain effective mass due to path length or irrelevant branches. On this graph, FlowTracer extracts an information-flow backbone connecting the question to the answer and scores tokens by flow throughput, revealing high-impact hubs and aggregation checkpoints that mediate long-range dependencies. These derived importances are used to shape token-level rewards, enabling learning signals to focus precisely on the tokens that route information toward (or away from) correct answers and delivering consistent performance gains across a range of reasoning tasks.

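2606.10684 2026-06-10 cs.LG cs.AI New submission

Divide and Cooperate: Role-Decomposed Multi-Agent LLM Training with Cross-Agent Learning Signals

Jaewan Park, Solbee Cho, Jay-Yoon Lee

Affiliations: Seoul National University

AI summary: Proposes the DAC framework, which splits multi-step reasoning into search and generation subtasks handled by dedicated agents and uses cross-agent learning signals to address credit assignment, outperforming fully fine-tuned monolithic models on QA benchmarks.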

Abstract

Modern language agents which perform multi-step reasoning have shown strong performance in knowledge-intensive question answering. However, existing approaches typically couple evidence acquisition and answer generation within a single policy. This forces a single model to play multiple potentially conflicting roles, inducing a combinatorial explosion in the policy space and hindering efficient exploration. It also introduces a credit assignment problem during training: a search action that retrieves sufficient evidence may still be penalized when generation fails, and vice versa. We propose DAC (Divide and Cooperate), a role-decomposed multi-agent training framework that divides agentic search into two cooperative subtasks, each handled by a dedicated agent trained with role-specific learning signals. The generator serves a dual role as both an answer producer and an evidence sufficiency verifier, abstaining when retrieved evidence is insufficient. This abstention signal is incorporated into the search agent's reward, providing structured cross-agent learning signals that improve credit assignment. Conversely, the searcher exposes the generator to diverse and challenging evidence environments by hard-positive evidence augmentation, improving its robustness. Experiments on general and multi-hop QA benchmarks show that DAC, implemented via parameter-efficient LoRA modules over a shared backbone, achieves strong performance against prior baselines that rely on full fine-tuning of monolithic models.

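2606.10705 2026-06-10 cs.LG cs.AI cs.SY eess.SY New submission

Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication

Yavar Yeganeh, Mahsa Shekari, Nicla Frigerio, Daniele Pagano, Andrea Matta

Affiliations: Politecnico di Milano; STMicroelectronics

AI summary: Proposes an event-driven deep reinforcement learning framework that models semiconductor fabrication control as a centralized-agent problem and optimizes multi-objective policies with an event-driven temporal-difference method, significantly improving throughput and utilization in high-fidelity simulation.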

Abstract

Reinforcement learning promises to optimize sequential decisions in large-scale systems. Semiconductor manufacturing systems are stochastic and highly constrained environments where heterogeneous wafers traverse hundreds of processing steps across extensive equipment networks. These characteristics yield complex, high-dimensional decision problems with delayed feedback and long-horizon requirements, complicating production planning and control. We propose a deep reinforcement learning framework for multi-objective policy optimization at this scale. Specifically, we formulate control as a centralized-agent problem, where a core policy coordinates system-wide decisions, while system evolution is represented as an interconnected temporal process driven by discrete events. Accordingly, we develop a tailored event-driven temporal-difference formulation that remains general and can be integrated with various policy optimization methods under relevant training settings. We investigate several core model-free algorithms incorporated into this framework and evaluate their effectiveness using high-fidelity simulations of diverse, industry-real operating scenarios. Across extensive validation experiments, agents trained in both offline and online settings show significant and consistent gains in throughput and utilization. We further evaluate performance and generalization across training phases, clarifying the relative strengths of alternative reinforcement learning formulations and algorithms. Overall, the results support the scalability, generality, and transferability of the proposed framework for controlling event-driven complex adaptive systems.
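
To make "event-driven temporal difference" concrete, the sketch below uses the textbook semi-Markov form, in which the discount is raised to the time elapsed between two decision events. This is an assumption for illustration only; the abstract does not state the paper's exact formulation, and the function names and tabular value store are hypothetical.

```python
def event_td_target(reward, next_value, elapsed, gamma):
    """TD target when `elapsed` time units pass between decision events."""
    return reward + (gamma ** elapsed) * next_value


def event_td_update(values, events, gamma=0.99, lr=0.1):
    """Tabular TD(0) over a list of (state, reward, elapsed, next_state) events.

    `values` maps state -> estimated value; unseen states default to 0.0.
    """
    for state, reward, elapsed, next_state in events:
        target = event_td_target(reward, values.get(next_state, 0.0), elapsed, gamma)
        current = values.get(state, 0.0)
        # Move the estimate a fraction `lr` of the way toward the target.
        values[state] = current + lr * (target - current)
    return values
```

Because the discount depends on elapsed time rather than on a fixed step count, long idle gaps between events shrink the bootstrapped term more than short ones do.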

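2606.10768 2026-06-10 cs.LG cs.CL New submission

N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization

Xukun Zhu, Hang Yu, Peng Di, Linchao Zhu

Affiliations: Zhejiang University; Ant Group

AI summary: Targets the trade-off in exploration strategies for LLM mathematical reasoning and proposes N-GRPO, which injects diversity at the embedding layer through a semantic neighbor mixing mechanism while preserving semantic consistency, improving policy optimization.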

Comments ACL 2026 Findings. 16 pages, 3 figures. Code: https://github.com/ZJUSCL/N-GRPO

Abstract

The success of Large Language Models in mathematical reasoning relies heavily on the generation of diverse and valid solution paths during the rollout phase. However, current rollout techniques face a fundamental trade-off: token-level sampling often yields redundant trajectories that differ only in rephrasing, while embedding-level methods utilizing random noise frequently disrupt semantic consistency. To resolve this, we introduce N-GRPO, a novel exploration strategy integrated into the Group Relative Policy Optimization (GRPO) framework. Rather than relying on token-level sampling or native embedding-level noise, our approach leverages Semantic Neighbor Mixing. This mechanism dynamically constructs input representations by mixing the embeddings of an anchor token and its nearest semantic neighbors, thereby injecting diversity while strictly adhering to the local semantic manifold. Experimental evaluations on the DeepSeek-R1-Distill-Qwen models across different sizes show that N-GRPO not only achieves consistent improvements over strong baselines on math reasoning benchmarks but also exhibits robust generalization capabilities on out-of-distribution tasks.
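
The mixing step described in the abstract can be sketched on a toy embedding table. The abstract does not say how neighbors are chosen or weighted, so the cosine nearest-neighbor search, the uniform neighbor average, and the weight `beta` are assumptions, and the function names are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def nearest_neighbors(table, anchor, k):
    """Indices of the k rows most similar to row `anchor` (excluding itself)."""
    scored = [(cosine(table[anchor], table[j]), j)
              for j in range(len(table)) if j != anchor]
    scored.sort(reverse=True)
    return [j for _, j in scored[:k]]


def mix_embedding(table, anchor, k=2, beta=0.8):
    """Convex mix of the anchor embedding and the mean of its k neighbors.

    beta is a hypothetical weight on the anchor; beta=1.0 disables mixing.
    """
    neighbors = nearest_neighbors(table, anchor, k)
    if not neighbors:
        return list(table[anchor])
    dim = len(table[anchor])
    neighbor_mean = [sum(table[j][d] for j in neighbors) / len(neighbors)
                     for d in range(dim)]
    return [beta * table[anchor][d] + (1.0 - beta) * neighbor_mean[d]
            for d in range(dim)]
```

Keeping the result a convex combination of nearby embeddings is one simple way to stay close to the local neighborhood of the anchor token, in contrast to adding unstructured random noise.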

2606.10825 2026-06-10 cs.LG New submission

MODIP: Efficient Model-Based Optimization for Diffusion Policies

Zakariae El Asri, Philippe Gratias-Quiquandon, Nicolas Thome, Olivier Sigaud

Affiliations: Sorbonne Université, CNRS, ISIR, F-75005 Paris, France; Institut Universitaire de France (IUF)

AI summary: Proposes MODIP, a framework that uses a world model and model predictive control to generate high-quality trajectories and fine-tunes diffusion policies on them in a supervised manner, enabling offline-to-online RL fine-tuning and surpassing behavioral cloning baselines on D4RL and RoboMimic tasks.

Abstract

Diffusion policies (DPs) have emerged as expressive policy representations for robot learning, often used with imitation learning methods such as behavioral cloning (BC). However, while their success has largely been confined to BC, direct reinforcement learning (RL) fine-tuning remains challenging because actions are generated through a multi-step denoising process. In this work, we propose MODIP, a framework for the offline-to-online fine-tuning of DPs. Rather than directly applying RL to the DPs, MODIP leverages a world model (WM) to guide policy adaptation and keeps the simplicity and stability of BC. We utilize model predictive control (MPC) to generate high-quality trajectories within the WM, and use them as supervised targets for fine-tuning the DP. To make MPC planning efficient, MODIP uses a terminal state value instead of a policy-dependent state-action value, reducing inference time. Additionally, MODIP trains critics with policy-independent TD targets, reducing training time. Experiments on D4RL (MuJoCo, Kitchen) and RoboMimic tasks show that MODIP improves diffusion policies beyond BC, and is competitive with or outperforms diffusion policy RL fine-tuning methods and strong model-based baselines such as TD-MPC2.

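2606.10835 2026-06-10 cs.LG cs.AI New submission

Geometrically Averaged Hard Target Updates for Linear Q-Learning

Donghwan Lee

Affiliations: School of Electrical Engineering, KAIST

AI summary: Proposes a periodic target update for linear Q-learning based on a λ-geometric weighted average, analyzes its stability with a switching-system model, and connects single-period updates with projected Q-value iteration.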

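2606.11087 2026-06-10 cs.LG cs.AI New submission

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

Zhiyuan Zhou, Andy Peng, Charles Xu, Qiyang Li, Tobias Springenberg, Kevin Frans, Sergey Levine

Affiliations: UC Berkeley; Physical Intelligence

AI summary: Proposes QGF, which pre-trains a reference flow policy and a value function and, at test time, uses value gradients to guide the policy toward high-value actions without any additional policy learning; on offline RL benchmarks it outperforms existing test-time methods and is competitive with training-time methods.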

Abstract

Expressive continuous control policies, such as diffusion and flow models, form the backbone of recent advances in scaling imitation learning for simulated and real robot control. While they are known to scale stably in the supervised imitation learning setting, incorporating them into reinforcement learning (RL) pipelines for policy improvement has proven more difficult. It often requires specialized training objectives or backpropagating through denoising processes, which cause well-known issues with stability and affect scalability. In this paper we study the question of whether simple policy improvement schemes at test time alone, leaving stable supervised policy training intact, can be a competitive alternative which sidesteps these issues. To this end, we propose QGF (Q-Guided Flow), an RL algorithm that performs policy optimization entirely at test time. QGF works by pre-training both a reference flow policy (via a standard behavioral cloning objective) and a value function critic and, at test time, using the value gradient to guide the reference policy to generate higher-value actions without any additional policy learning. Empirically, QGF outperforms prior test-time RL methods on single-task and goal-conditioned offline RL benchmarks with high-dimensional action spaces, and is competitive with state-of-the-art training-time algorithms while being much cheaper to run. Moreover, it exhibits favorable scaling with model size by avoiding the instability of actor-critic training, offering a practical and effective alternative RL algorithm with expressive policies.
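
One plausible reading of "using the value gradient to guide the reference policy" is to add a scaled action-gradient of the critic to the flow's velocity during sampling. The one-dimensional sketch below illustrates that reading only; the additive guidance rule, the Euler integrator, the scale `eta`, and the toy velocity and critic are assumptions, not the QGF algorithm as published.

```python
def guided_sample(velocity, q_grad, a0, steps=10, eta=0.0):
    """Euler-integrate a flow from noise a0, nudged by a critic gradient.

    velocity(a, t): reference flow velocity (stands in for the BC policy).
    q_grad(a):      gradient of the critic with respect to the action.
    eta:            hypothetical guidance scale; eta=0 recovers the reference.
    """
    a = a0
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt
        a = a + dt * (velocity(a, t) + eta * q_grad(a))
    return a


# Toy setup: the reference flow transports noise to the behavior action 0.0,
# while the critic Q(a) = -(a - 1)^2 prefers the action 1.0.
def toy_velocity(a, t):
    return (0.0 - a) / (1.0 - t)


def toy_q_grad(a):
    return -2.0 * (a - 1.0)
```

Calling `guided_sample(toy_velocity, toy_q_grad, 0.5, eta=0.5)` ends closer to the critic's preferred action than the unguided call with `eta=0.0`, while neither call updates any policy parameters.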

2606.11119 2026-06-10 cs.LG cs.AI cs.CL New submission

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

Heming Zou, Qi Wang, Yun Qu, Yuhang Jiang, Lizhou Cai, Yixiu Mao, Ru Peng, Xin Xu, Weijie Liu, Kai Yang, Saiyong Yang, Xiangyang Ji

Affiliations: Tsinghua University; Tencent

AI summary: Addresses insufficient reward contrast in multi-turn agentic reinforcement learning with TRACE, which models each ReAct-style thought-action-observation step as a semantic node and, within a fixed sampling budget, allocates rollouts to prompt roots and intermediate prefixes to sharpen reward contrast and strengthen the policy-update signal.

Comments 32 pages, 12 figures, 6 tables

Abstract

Reinforcement learning with verifiable rewards (RLVR) is a promising approach for enhancing reasoning and agentic behavior in large language models. However, rollout-intensive policy optimization is often limited by insufficient reward contrast, arising when overly simple or complex prompts generate low-variance feedback and when outcome-only rewards assign the same terminal assessment to every decision in a multi-turn rollout. Past efforts have focused on allocating available rollout resources to promising prompts, yet they only leverage sample informativeness at the prompt level and neglect variation in prefix-level informativeness across turns within the same rollout. This work targets multi-turn agentic RL by modeling each ReAct-style thought-action-observation turn as a semantically distinct node, allowing budget allocation to extend from prompt roots to turn-level prefixes with further continuations, which naturally forms tree-structured rollouts. We introduce Tree Rollout Allocation for Contrastive Exploration (TRACE), a unified rollout allocation framework that enhances reward contrast within a fixed sampling budget. Technically, TRACE allocates rollout budget to both prompt roots and intermediate prefixes that are most likely to yield mixed terminal rewards. A shared generalizable predictor estimates conditional success probability at these anchors from prefix histories to guide this allocation. The resulting adaptive tree structure enriches outcome-only feedback and amplifies the policy-update signal. Empirically, TRACE achieves competitive performance and efficiency gains on typical agentic benchmarks, e.g., improving Qwen3-14B Multi-Hop QA average accuracy by 2.8 points over competitive baselines at equal sampling cost.
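
The phrase "most likely to yield mixed terminal rewards" has a simple quantitative reading: if `n` continuations are sampled independently from an anchor whose predicted success probability is `p`, the chance that they do not all agree is `1 - p**n - (1 - p)**n`. The sketch below spends a fixed budget in proportion to that score. The proportional rule, the largest-remainder rounding, and all names are assumptions for illustration rather than TRACE's actual allocator.

```python
def mixed_outcome_prob(p, n):
    """Probability that n independent Bernoulli(p) outcomes are not all equal."""
    return max(0.0, 1.0 - p ** n - (1.0 - p) ** n)


def allocate_budget(success_probs, budget, n=4):
    """Split `budget` rollouts across anchors (prompt roots or prefixes).

    success_probs: anchor name -> predicted success probability.
    Anchors expected to give all-success or all-failure groups get little.
    """
    anchors = list(success_probs)
    scores = {a: mixed_outcome_prob(success_probs[a], n) for a in anchors}
    total = sum(scores.values())
    if total <= 0.0:
        # No anchor promises contrast: fall back to an even split.
        raw = {a: budget / len(anchors) for a in anchors}
    else:
        raw = {a: budget * scores[a] / total for a in anchors}
    alloc = {a: int(raw[a]) for a in anchors}  # floor of each share
    leftover = budget - sum(alloc.values())
    # Largest-remainder rounding so the allocation sums exactly to budget.
    by_fraction = sorted(anchors, key=lambda a: raw[a] - alloc[a], reverse=True)
    for a in by_fraction[:leftover]:
        alloc[a] += 1
    return alloc
```

For example, an anchor with `p = 0.5` receives far more rollouts than one with `p = 0.9`, and anchors with `p = 0` or `p = 1` receive none unless every anchor is degenerate.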

2606.09884 2026-06-10 cs.MA cs.AI cs.LG econ.EM Cross-list

Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix

Shree Murthy, Rohan Pandey

Affiliations: DigitalOcean, USA

AI summary: Studies two reproducible failure modes of deep multi-agent reinforcement learning in continuous-time pricing markets, tacit collusion between DDPG agents and actor-critic instability at high event rates, and obtains a partial fix through asynchrony.

Abstract

We study two reproducible failure modes of deep multi-agent reinforcement learning in continuous-time pricing markets: (i) tacit cartel formation between competing DDPG agents, and (ii) actor-critic instability at high event rates. We instantiate both inside a single CT-MARL benchmark (Poisson-clocked price updates, observation latency $\delta$, interior-optimum logit demand), show that synchronous DDPG agents reliably trigger Failure Mode 1 with collusion index $\Delta = 0.69 \pm 0.11$, and quantify a partial microstructure fix: asynchrony alone cuts collusion by 48% and adding latency drives it to a minimum of $\Delta = 0.28$. The fix has clearly documented costs: it is partial ($\Delta$ remains supra-Bertrand), it is non-monotone in $\delta$, and it does not survive Failure Mode 2, which emerges as DDPG critic divergence at $\lambda = 5$ and corrupts the phase-diagram cell at $(\lambda{=}5, \delta{=}1)$. We accompany the scalar collusion index with trajectory-level trace diagnostics that expose the within-episode signalling collapse and the post-shock non-recovery.

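2606.10267 2026-06-10 cs.RO cs.AI cs.LG Cross-list

What Matters in Orchestrating Robot Policies: A Systematic Study of Hierarchical VLA Agents

Jiaheng Hu, Mohit Shridhar, Caden Lu, Dhruv Shah, Hao-Tien Lewis Chiang, Jie Tan, Annie Xie

Affiliations: Google DeepMind

AI summary: A systematic study of design principles for hierarchical vision-language-action (Hi-VLA) systems: within a unified framework it analyzes how planners, controllers, and interface mechanisms affect performance on short-horizon, long-horizon, and reasoning-intensive tasks, and distills practical principles for building more robust hierarchical VLA agents.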

Abstract

Hierarchical vision-language-action (Hi-VLA) systems have emerged as a promising paradigm for complex robot manipulation, by using high-level VLM planners to decompose tasks into language subgoals executed by low-level VLA controllers. Despite recent empirical progress, there is a lack of unified design principles for these systems: existing Hi-VLA systems differ in how they choose and connect planners, controllers, mechanisms to switch between the two, and how observations and memory are represented in the planner. In this paper, we present a systematic study of Hi-VLA design for robot manipulation. We unify representative Hi-VLA agents under an options-style control framework and benchmark core design choices across short-horizon, long-horizon, and reasoning-intensive tasks. Our analysis distills practical principles for building Hi-VLA systems, showing how model choices and interface mechanisms jointly shape performance. Applying these principles yields a substantially stronger system than either flat VLA control or a naively designed hierarchy, across experiments both in simulation and on a real ALOHA robot. Overall, our results provide a foundation for building more capable, robust, and principled hierarchical VLA agents. More information and video at jiahenghu.github.io/hi-vla.

2606.10601 2026-06-10 math.NA cs.AI cs.LG cs.NA Cross-list

Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation

Anirudh Kalyan, Cosmin Anitescu, Xiaoying Zhuang, Timon Rabczuk, Somdatta Goswami, Sundararajan Natarajan

Affiliations: Department of Mechanical Engineering, Indian Institute of Technology Madras; Institute of Continuum Mechanics, Leibniz Universität Hannover; Institute of Structural Mechanics, Bauhaus-Universität Weimar; Department of Civil and Systems Engineering, Johns Hopkins University

AI summary: Proposes Dmsh, described as the first fully automated reinforcement learning pipeline for this task: three coordinated agents handle topology simplification, geometric regularization, and mesh generation, using a parametric Soft Actor-Critic architecture and curriculum learning to produce high-quality all-quadrilateral meshes.

Abstract

Generating high-quality meshes for arbitrary geometries remains a fundamental bottleneck in computational engineering, often demanding heuristic tuning and semi-manual workflows. In this paper, we introduce Dmsh, the first fully automated reinforcement learning pipeline that unifies geometric decomposition and quadrilateral mesh generation within a single learning-based framework. Dmsh decomposes the problem through three coordinated agents handling topology simplification, geometric regularization, and mesh generation. The meshing process is formulated as a Markov Decision Process and solved using a parametric Soft Actor-Critic architecture with decoupled critics, enabling efficient exploration of a hybrid discrete-continuous action space. A curriculum learning strategy ensures scalability from simple domains to highly complex geometries while suppressing seed variance. By design, the recursive decomposition enables parallel meshing of subregions, yielding globally conforming all-quadrilateral meshes without post hoc correction. Across a wide range of benchmarks, Dmsh consistently outperforms existing methods in automation, robustness, and mesh quality, establishing a new paradigm for learning-based mesh generation.

2606.10614 2026-06-10 cs.RO cs.CV cs.LG Cross-list

Dexterous Point Policy: Learning Point-based Dexterous Hand Policies from Human Demonstrations

Beomjun Kim, Seong Hyeon Park, Seunghoon Sim, Seungjun Moon, Sanghyeok Lee, Jinwoo Shin

Affiliations: KAIST

AI summary: Proposes Dexterous Point Policy, a framework that learns dexterous manipulation policies from human videos through a unified 3D keypoint representation, requiring no robot demonstrations and reaching a 75% success rate on real-robot tasks.

Abstract

Robotic foundation models pre-trained on human demonstration videos have shown promise, but a significant embodiment gap remains when the resulting policies are deployed on real robots. A common remedy is to fine-tune these models on robot-specific demonstrations. However, robot data collection can be prohibitively expensive and time-consuming, which is particularly acute in dexterous manipulation, e.g., teleoperating a multi-fingered hand for even a single atomic task can take days. To address this, we introduce Dexterous Point Policy, a framework that learns dexterous manipulation policies directly from human videos and requires no robot demonstrations. Our core insight is that a unified 3D keypoint representation can bridge human and robot embodiments when used for both observations and actions. Specifically, we extract 3D keypoints of task-relevant objects and human hands from raw videos, and train an autoregressive transformer over these keypoints. We observe that at the keypoint level, specifically the wrist and fingertips, human and robot behaviors closely align, enabling direct policy transfer. On a suite of real-robot tasks spanning pick-and-place and tool use, Dexterous Point Policy attains 75.0% success, whereas a state-of-the-art VLA baseline reaches only 1.0%. Furthermore, our method generalizes strongly to unseen scenarios, including multi-object environments and novel object categories.

2606.10771 2026-06-10 astro-ph.IM cs.LG cs.RO Cross-list

On-sky demonstration of reinforcement learning for adaptive optics control

Jalo Nousiainen, Vincent Chambouleyron, Benoit Neichel, Sylvain Cetre, Jean-Francois Sauvage, Angelie Alagao, Markus Kasper, Jonathan Dray, Romain Fetick, Byron Engler

Affiliations: European Southern Observatory; Aix Marseille University; CNRS; CNES; LAM; Wakea Consulting; Bertin Alpao

AI summary: Reports the first on-telescope (on-sky) demonstration of PO4AO, a reinforcement-learning-based adaptive optics controller, which outperforms a conventional integrator controller under a range of conditions and shows robustness and high performance.

Comments 11 pages, 12 figures, accepted by A&A

Abstract

Reinforcement learning (RL)-based algorithms have recently emerged as a promising approach for adaptive optics (AO) control. In simulations and laboratory experiments, they have demonstrated robustness to real-world effects such as photon and detector noise, misregistration, vibrations, and rapid variations in seeing conditions. However, their performance has not yet been validated on sky. We report the first on-sky demonstration of a reinforcement learning controller for adaptive optics, named Policy Optimization for AO (PO4AO). We further analyze its on-sky behavior and identify directions for improving the algorithm and its implementation. PO4AO was implemented and deployed on the Papyrus adaptive optics system installed at the Coudé focus of the 1.52 m telescope (T152) at the OHP. A Python-based implementation was interfaced with the existing real-time controller (DAO RTC) via shared-memory buffers. The performance of PO4AO was compared to that of a standard integrator controller over several nights, covering a range of flux levels and atmospheric conditions. PO4AO consistently outperformed the standard integrator in all tested configurations. The controller successfully learned and compensated for vibration patterns and demonstrated strong robustness to measurement noise. Once tuned for Papyrus, PO4AO operated in a turnkey fashion, using a single set of hyperparameters across varying observing conditions and science targets. These performance gains were achieved despite a non-optimized Python implementation introducing approximately $750\,\mu\text{s}$ of additional latency, along with control jitter and occasional frame drops. When properly implemented and optimized, PO4AO constitutes a robust and high-performance turnkey controller for single-conjugate adaptive optics systems, paving the way for broader adoption of reinforcement learning strategies in on-sky AO operations.
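The baseline in this abstract is a standard integrator controller. As a rough illustration of that baseline only (not of PO4AO, Papyrus, or the DAO RTC interface), the sketch below accumulates a fraction of the measured residual at each step. The function names, the gain of 0.4, the sign convention, and the static, delay-free toy plant are all assumptions made for the example.

```python
def integrator_step(command, residual, gain=0.4):
    """One integrator update: subtract a fraction of the measured residual."""
    return [c - gain * r for c, r in zip(command, residual)]


def run_closed_loop(disturbance, steps=50, gain=0.4):
    """Toy loop: static disturbance, no latency, no measurement noise (assumed)."""
    command = [0.0] * len(disturbance)
    for _ in range(steps):
        # Residual the sensor would see: disturbance plus current correction.
        residual = [d + c for d, c in zip(disturbance, command)]
        command = integrator_step(command, residual, gain)
    # Return the residual left after the final update.
    return [d + c for d, c in zip(disturbance, command)]


final_residual = run_closed_loop([1.0, -0.5, 0.25])
```

In this idealized setting the residual shrinks by a factor of (1 - gain) per step. A real loop also has latency and noise, which is where the abstract reports the learned controller helping.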


2606.10857 2026-06-10 cs.RO cs.LG cross-listed

SCOPE: Sequential Causal Optimization of Process Interventions

Jakob De Moor, Hans Weytjens, Johannes De Smedt, Jochen De Weerdt

Affiliations: Research Centre for Information Systems Engineering (LIRIS), KU Leuven, Leuven, Belgium; School of Computation, Information and Technology, Technical University of Munich (TUM), Munich, Germany

AI summary: Proposes SCOPE, which uses backward induction and causal learners to work directly from observational data when optimizing KPIs for sequential interventions in business processes, and outperforms existing methods.

Abstract

Prescriptive Process Monitoring (PresPM) recommends interventions during running business processes to optimize key performance indicators (KPIs). In realistic settings, interventions are rarely isolated: organizations need to align sequences of interventions to jointly steer the outcome of a case. Existing PresPM approaches only partially address this challenge. Many focus on a single intervention decision, while others treat multiple interventions independently, ignoring how they interact over time. Methods that do address these dependencies depend either on simulation or data augmentation to approximate the process to train a Reinforcement Learning (RL) agent, which may create a reality gap and introduce bias. We introduce SCOPE (Sequential Causal Optimization of Process Interventions), a PresPM approach that learns aligned sequential intervention recommendations. SCOPE employs backward induction to estimate the effect of each candidate intervention action, propagating its impact from the final decision point back to the first. By leveraging causal learners, our method can utilize observational data directly, unlike methods that require constructing process approximations for RL. Experiments on both an existing synthetic dataset and a new semi-synthetic dataset show that SCOPE consistently outperforms state-of-the-art PresPM techniques in optimizing the KPI. The novel semi-synthetic setup, based on a real-life event log, is provided as a reusable benchmark for future work on sequential PresPM.
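The backward-induction idea mentioned in this abstract can be illustrated with a small generic sketch. It is not the SCOPE implementation: in SCOPE the effect estimates come from causal learners fitted on observational data, whereas here a hypothetical lookup table `outcome` (KPI per complete action sequence) stands in for them, and case features are ignored.

```python
from itertools import product


def backward_induction(outcome, n_steps, actions):
    """Propagate estimated KPI values from the last decision point to the first.

    outcome: dict mapping a full action tuple to an estimated KPI (assumed given).
    Returns (policy, value), where policy maps each action prefix to the best
    next action and value is the KPI reached by following that policy.
    """
    values = dict(outcome)  # values of complete action sequences
    policy = {}
    for t in range(n_steps - 1, -1, -1):
        for prefix in product(actions, repeat=t):
            best_action = max(actions, key=lambda a: values[prefix + (a,)])
            policy[prefix] = best_action
            values[prefix] = values[prefix + (best_action,)]
    return policy, values[()]


# Hypothetical two-step process with binary interventions.
toy_outcome = {(0, 0): 1.0, (0, 1): 3.0, (1, 0): 2.0, (1, 1): 2.5}
policy, best_value = backward_induction(toy_outcome, n_steps=2, actions=(0, 1))
```

Here the first decision is chosen with the best later decision already taken into account, which is what separates aligned sequential recommendations from treating each intervention independently.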


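2602.13807 2026-06-10 cs.LG updated version

AnomaMind: Agentic Time Series Anomaly Detection with Tool-Augmented Reasoning

Xiaoyu Tao, Yuchong Wu, Mingyue Cheng, Ze Guo, Tian Gao

Affiliations: University of California, Berkeley

AI summary: Proposes AnomaMind, a framework that recasts time series anomaly detection as a sequential decision process. It follows a coarse-to-fine workflow (localizing suspicious intervals, building diagnostic evidence through tool interaction, refining decisions through self-reflection), combines knowledge memory with a numerical diagnostic toolkit, and uses a hybrid inference mechanism, significantly improving in-domain and cross-domain detection performance and generalization.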

Abstract

Time series anomaly detection is critical in many real-world applications, where effective solutions must localize anomalous regions and support reliable decision-making under complex settings. However, most existing methods frame anomaly detection as a purely discriminative prediction task with fixed feature representations, rather than an evidence-driven diagnostic process. As a result, they often struggle when anomalies exhibit strong context dependence, diverse patterns, or domain shifts across datasets. To address these challenges, we propose AnomaMind, an agentic time series anomaly detection framework that reformulates anomaly detection as a sequential decision-making process. AnomaMind operates through a coarse-to-fine workflow that first localizes suspicious intervals, then constructs diagnostic evidence through tool interaction, and finally refines anomaly decisions through self-reflection. The workflow is supported by a toolkit box that combines knowledge memory and numerical diagnostics: visual anomaly patterns mined from training data and domain knowledge provide contextual guidance, while statistical, value-based, change-based, and region-level operators provide measurable evidence for verification. AnomaMind further adopts a hybrid inference mechanism in which general-purpose models handle flexible reasoning, tool invocation, and refinement, while a detection-specific policy is optimized with rule-based rewards for parsable outputs, F1-score alignment, and false-positive control. Extensive experiments under both in-domain and cross-domain settings demonstrate that AnomaMind consistently improves anomaly detection performance and enhances generalization across heterogeneous anomaly patterns, validating the effectiveness of tool-augmented reasoning for anomaly detection. The code is available at https://github.com/Xiaoyu-Tao/AnomaMind-TS.


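2604.15414 2026-06-10 cs.LG cs.AI cs.NE updated version

Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning

Lute Lillo, Nick Cheney

Affiliations: Department of Computer Science, University of Vermont

AI summary: Proposes TeLAPA, a framework that maintains archives of behaviorally diverse policies and a shared latent space to provide skill-aligned policy neighborhoods in continual reinforcement learning, addressing the loss of plasticity caused by single-policy preservation and improving task learning, recovery, and performance retention.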

Abstract

Continual reinforcement learning must balance retention with adaptation, yet many methods still rely on \emph{single-model preservation}, committing to one evolving policy as the main reusable solution across tasks. Even when a previously successful policy is retained, it may no longer provide a reliable starting point for rapid adaptation after interference, reflecting a form of \emph{loss of plasticity} that single-policy preservation cannot address. Inspired by quality-diversity methods, we introduce \textsc{TeLAPA} (Transfer-Enabled Latent-Aligned Policy Archives), a continual RL framework that organizes behaviorally diverse policy neighborhoods into per-task archives and maintains a shared latent space so that archived policies remain comparable and reusable under non-stationary drift. This perspective shifts continual RL from retaining isolated solutions to maintaining \emph{skill-aligned neighborhoods} with competent and behaviorally related policies that support future relearning. In our MiniGrid CL setting, \textsc{TeLAPA} learns more tasks successfully, recovers competence faster on revisited tasks after interference, and retains higher performance across a sequence of tasks. Our analyses show that source-optimal policies are often not transfer-optimal, even within a local competent neighborhood, and that effective reuse depends on retaining and selecting among multiple nearby alternatives rather than collapsing them to one representative. Together, these results reframe continual RL around reusable and competent policy neighborhoods, providing a route beyond single-model preservation toward more plastic lifelong agents.


2604.20024 2026-06-10 cs.LG updated version

Replicable Bandits with UCB based Exploration

Rohan Deb, Udaya Ghai, Karan Singh, Arindam Banerjee

Affiliations: University of Illinois Urbana-Champaign; Amazon; Carnegie Mellon University

AI summary: Studies replicable algorithms with UCB-based exploration for stochastic multi-armed bandits and linear bandits, and proposes RepUCB and RepLinUCB, each achieving optimal regret bounds and significantly reducing the price of replicability.

Abstract

We study replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits with UCB (Upper Confidence Bound) based exploration. A bandit algorithm is $\rho$-replicable if two executions using shared internal randomness but independent reward realizations produce the same action sequence with probability at least $1-\rho$. Prior approaches to this problem are elimination-based and, in linear bandits with infinitely many actions, rely on discretization, leading to suboptimal dependence on the dimension $d$ and $\rho$. We develop optimistic alternatives for both settings. For stochastic multi-armed bandits, we propose RepUCB, a replicable batched UCB algorithm and show that it attains a regret $O\!\left(\frac{K^2\log^2 T}{\rho^2}\sum_{a:\Delta_a>0}\left(\Delta_a+\frac{\log(KT\log T)}{\Delta_a}\right)\right)$. For stochastic linear bandits, we first introduce RepRidge, a replicable ridge regression estimator that satisfies both a confidence guarantee and a $\rho$-replicability guarantee. Beyond its role in our bandit algorithm, this may also be of independent interest in other statistical estimation settings. We then use RepRidge to design RepLinUCB, a replicable optimistic algorithm for stochastic linear bandits, and show that its regret is bounded by $\widetilde{O}\!\big(\big(d+\frac{d^3}{\rho}\big)\sqrt{T}\big)$. This improves the best prior regret guarantee by a factor of $O(d/\rho)$, showing that our optimistic algorithm can substantially reduce the price of replicability. This is the first linear-bandit algorithm with an optimal dependence on $\rho$ for a large number of arms. Finally, we extend our framework to stochastic generalized linear bandits by developing RepGLM, a replicable penalized GLM estimator, and RepGLMUCB, a replicable optimistic algorithm for this setting.
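For readers less familiar with optimistic exploration, the following is a minimal sketch of the classical UCB1 index rule that this line of work builds on. It is not RepUCB: it has no batching and no shared-randomness mechanism, and the function names and the exploration constant `c` are assumptions made for the example.

```python
import math


def ucb_index(mean, pulls, t, c=2.0):
    """Empirical mean plus a confidence bonus that shrinks with more pulls."""
    return mean + math.sqrt(c * math.log(t) / pulls)


def select_arm(means, pulls, t):
    """Pick the arm with the largest upper confidence bound at round t."""
    # Play every arm once before the index is well defined.
    for arm, n in enumerate(pulls):
        if n == 0:
            return arm
    return max(range(len(means)), key=lambda a: ucb_index(means[a], pulls[a], t))


# A rarely pulled arm wins despite a lower empirical mean.
chosen = select_arm(means=[0.5, 0.4], pulls=[100, 1], t=101)
```

Because the chosen arm depends on noisy empirical means, two runs with independent rewards can diverge. That sensitivity is what a $\rho$-replicable algorithm has to control.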


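2605.04254 2026-06-10 cs.LG cs.HC updated version

Hierarchical Support Vector State Partitioning for Distilling Black Box Reinforcement Learning Policies

Senne Deproost, Mehrdad Asadi, Ann Nowé

Affiliations: Vrije Universiteit Brussel

AI summary: This paper proposes SVSP, which partitions a dataset of state-action pairs with linear support vector machines to build a compact, structured representation, improving average return by 7.4% and reducing the number of sub-policies, a step toward more flexible distillation methods.

Comments: Accepted for poster presentation at HHAI 2026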

2605.05857 2026-06-10 cs.LG updated version

Offline Reinforcement Learning for Rotation Profile Control in Tokamaks

Rohit Sonker, Hiro Josep Farre Kaga, Jiayu Chen, Andrew Rothstein, Ian Char, Ricardo Shousha, Egemen Kolemen, Jeff Schneider

Affiliations: Robotics Institute, Carnegie Mellon University; Princeton University; Princeton Plasma Physics Lab; The University of Hong Kong; Lila Sciences

AI summary: Addresses the difficulty of controlling the plasma rotation profile in tokamaks with an offline reinforcement learning method based on historical data, using probabilistic models to generate rollouts for policy training, and validates its effectiveness on the DIII-D tokamak.

Abstract

Tokamaks remain leading candidates for achieving practical fusion energy, yet many important control problems inside these devices are still difficult or unsolved. One such challenge is controlling the plasma rotation profile, which strongly influences stability, confinement, and transport. While the average rotation can be controlled, controlling the full profile is challenging due to its high dimensionality, its response to multiple actuators, and its dependence on plasma conditions. Learning-based control methods, such as reinforcement learning (RL), provide a potential solution to this problem, with the ability to model complex interactions, leading to effective multi-input multi-output control. However, learning such policies is challenging due to the lack of accurate simulators that can model the rotation profile dynamics. In this work, we investigate the use of offline RL and offline model-based RL algorithms for rotation profile control, training them solely on historical data from the DIII-D tokamak. Our final method uses probabilistic models of plasma dynamics to generate rollouts for RL training. We deploy this policy on the DIII-D tokamak and observe promising real-world results. We conclude by highlighting key challenges and insights from training and deploying an RL policy on a complex physical device while using only limited past data.


2606.05597 2026-06-10 cs.LG updated version

AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents

Hao Bai, Rui Yang, Chenlu Ye, Spencer Whitehead, Aviral Kumar, Tong Zhang

Affiliations: UIUC; Microsoft; CMU

AI summary: Proposes an asynchronous system design and algorithmic improvements that address idle GPUs and overlong trajectories in multi-step reinforcement learning, achieving up to a 2.9x training-throughput speedup and new state-of-the-art results on the WebGym test split.

Comments: Updated logo and code link

Abstract

Training vision-language web agents with multi-step RL is compute-intensive, with two dominant forms of inefficiency: idle GPUs in synchronous RL, and trajectories that use more steps and tokens than necessary. We present AsyncWebRL, which addresses both. On the system side, an asynchronous design overlaps rollout, gradient update, and policy refresh across iterations, paired with two web-agent-specific adaptations, namely an everlasting rollout pool and lightweight screenshot handling, that together deliver up to a $2.9\times$ end-to-end training-throughput speedup over the previously fastest open synchronous pipeline (WebGym). On the algorithmic side, we identify the per-trajectory normalizer $1/|\tau_i|$ in multi-step GRPO as the root cause of trajectory-level and token-level inefficiency: because failures are systematically longer than successes, it down-weights the negative gradient on failed tokens, so the policy keeps producing verbose memory schemas. Replacing $1/|\tau_i|$ with a constant $1/k$ breaks this coupling, contracting trajectories while preserving aggregate success. Together, these contributions set a new open-source state of the art on the WebGym out-of-distribution test split (+5.8% relative over the 42.9% prior best), with the largest gains on the harder slices (+42% relative on Medium, +48% relative on Hard).
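The effect of the normalizer described in this abstract can be seen with a little arithmetic. The sketch below is a simplification, not the AsyncWebRL code: it ignores clipping and importance ratios and only computes the weight each token's gradient receives. The advantages, the lengths, and the constant k = 10 are hypothetical values chosen for illustration.

```python
def per_token_weights(advantages, lengths, constant_k=None):
    """Weight applied to each token of trajectory i.

    With constant_k=None the weight is A_i / |tau_i| (per-trajectory normalizer);
    otherwise it is A_i / k (length-independent normalizer).
    """
    weights = []
    for advantage, length in zip(advantages, lengths):
        denominator = constant_k if constant_k is not None else length
        weights.append(advantage / denominator)
    return weights


# One short success (5 tokens, A=+1) and one long failure (20 tokens, A=-1).
length_normalized = per_token_weights([1.0, -1.0], [5, 20])
constant_normalized = per_token_weights([1.0, -1.0], [5, 20], constant_k=10)
```

Under the per-trajectory normalizer, each token of the long failure is penalized four times more weakly than each token of the short success is rewarded. With a constant denominator the two magnitudes match, so trajectory length no longer dilutes the negative signal.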


2606.06744 2026-06-10 cs.LG cs.GT cs.MA econ.TH updated version

Learn to Match: Two-Sided Matching with Temporally Extended Feedback

Haijing Zong, Yancheng Liang, Boyang Zhou, Natasha Jaques

Affiliations: Department of Economics, University of Washington; Paul G. Allen School of Computer Science & Engineering, University of Washington

AI summary: Proposes a two-sided matching framework with temporally extended feedback, modeled as a partially observable Markov game, and builds the Learn2Match benchmark on multi-agent reinforcement learning. Experiments show that independent PPO outperforms a bandit baseline but still incurs information-friction loss.

Abstract

Two-sided matching markets often involve information that unfolds over time through interviews, repeated interaction, learning, and separation. Existing matching models typically reduce this process to immediate sub-Gaussian feedback about fixed preferences, missing settings where payoff-relevant information is revealed gradually and changes future matching decisions. We introduce a framework with temporally extended feedback that formulates two-sided matching as a partially observable Markov game with costly pre-match screening, noisy post-match observations, evolving latent profiles, and endogenous continuation or dissolution. We instantiate this framework in Learn2Match, a multi-agent reinforcement-learning benchmark for dynamic matching markets. Learn2Match supports decentralized decision making over whom to interview, whom to match with, and when to dissolve a match, while evaluating policies using regret, social welfare, and an information-friction loss that measures the welfare gap caused by incomplete revelation of latent preferences. We find that independent PPO achieves higher cumulative social welfare and lower cumulative regret than the bandit-style CA-ETC baseline under temporally extended feedback, demonstrating the promise of MARL for dynamic matching markets. However, PPO still incurs higher information-friction loss, revealing that end-to-end MARL does not yet provide the coordinated exploration structure of matching-bandit methods. These results position Learn2Match as a benchmark for developing the next generation of matching-market algorithms: methods that are adaptive like RL agents, statistically disciplined like bandit algorithms, and structurally aware like stable-matching mechanisms. Please refer to https://sites.google.com/view/learn-to-match/home for the official website and the code link.


2509.25760 2026-06-10 cs.CL cs.AI cs.LG updated version

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

Zhepei Wei, Xiao Yang, Kai Sun, Jiaqi Wang, Rulin Shao, Jingxiang Chen, Mohammad Kachuee, Teja Gollapudi, Yiwei Liao, Nicolas Scheffer, Rakesh Wanga, Anuj Kumar, Yu Meng, Wen-tau Yih, Xin Luna Dong

Affiliations: University of California, Berkeley

AI summary: Proposes TruthRL, a framework that uses GRPO with a ternary reward to directly optimize LLM truthfulness, reducing hallucinations and allowing abstention when uncertain, and significantly improving truthfulness on knowledge-intensive benchmarks.

Comments: ICML 2026. Code: https://github.com/facebookresearch/TruthRL

Abstract

While large language models (LLMs) have demonstrated strong performance on factoid question answering, they are still prone to hallucination and untruthful responses, particularly when tasks demand information outside their parametric knowledge. Indeed, truthfulness requires more than accuracy -- models must also recognize uncertainty and abstain when unsure to avoid hallucinations. This presents a fundamental challenge for existing methods: approaches that optimize for accuracy often amplify hallucinations, while those that encourage abstention can become overly conservative, sacrificing correct answers. Both extremes ultimately compromise truthfulness. In this work, we present TruthRL, a general reinforcement learning (RL) framework that directly optimizes the truthfulness of LLMs. Specifically, we implement TruthRL using GRPO with a simple yet effective ternary reward that distinguishes correct answers, hallucinations, and abstentions. It incentivizes models to reduce hallucinations not only by providing correct responses, but also by enabling abstention when uncertain, thereby improving truthfulness. Extensive experiments across four knowledge-intensive benchmarks show that TruthRL significantly reduces hallucinations (e.g., 43.5% $\rightarrow$ 19.4%) and improves truthfulness (e.g., 5.3% $\rightarrow$ 37.2%), with consistent gains across various backbone models. Analysis shows that the improvement of TruthRL arises from enhanced capability of LLMs to recognize their knowledge boundary, hence avoiding being overly conservative as the baselines are.
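A ternary reward of the kind this abstract describes can be sketched in a few lines. The abstract does not state the numeric values, so the +1 / 0 / -1 assignment below is an assumption, as are the function names. The group-relative normalization is the usual GRPO-style advantage, shown here in simplified form and not taken from the TruthRL repository.

```python
import statistics

# Assumed reward values; the abstract only says the reward is ternary.
REWARDS = {"correct": 1.0, "abstain": 0.0, "hallucination": -1.0}


def ternary_reward(label):
    """Map a judged response label to its scalar reward."""
    return REWARDS[label]


def group_advantages(labels, eps=1e-6):
    """Normalize rewards within one group of sampled responses to a prompt."""
    rewards = [ternary_reward(label) for label in labels]
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)
    return [(r - mean) / (std + eps) for r in rewards]


advantages = group_advantages(["correct", "abstain", "hallucination"])
```

With these assumed values, abstaining sits between a correct answer and a hallucination. This matches the stated intent of rewarding abstention over guessing without making it as attractive as answering correctly.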


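2606.09926 2026-06-10 cs.LG cs.AI new submission

Sample Where You Struggle: Sharpening Base Model Reasoning via Entropy-Guided Power Sampling

Hong Guo, Nianhui Guo, Christoph Meinel, Haojin Yang

AI summary: Proposes Entropy-Guided Power Sampling (EGPS), a training-free and verifier-free sampling method that uses token-level entropy from the forward pass to localize MCMC moves to high-entropy regions, reaching best or tied-best accuracy on several benchmarks with up to a 12.6x speedup.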

Abstract

Sampling from the sequence-level power distribution $p^\alpha$ elicits RL-level reasoning from base language models without any parameter updates, but the standard Metropolis--Hastings (MH) sampler, a Markov chain Monte Carlo (MCMC) method, is both expensive and slow-mixing. We trace both to a structural mismatch: $p^\alpha$ mainly departs from $p$ at a sparse, spatially clustered set of high-entropy decision points, yet MH proposes resampling positions uniformly along the prefix -- wasting compute on near-degenerate conditionals while under-mixing precisely where modes diverge. We propose Entropy-Guided Power Sampling (EGPS), a training-free and verifier-free sampler that re-derives its proposal from token-level entropy already in the forward pass. EGPS skips deterministic blocks, localizes each MCMC move to a high-entropy neighborhood, and applies Multiple-Try Metropolis at decision points -- making sampling cost scale with \emph{entropy mass rather than sequence length}. On Qwen2.5-Math-7B, EGPS reaches best or tied-best accuracy on all three benchmarks (MATH500 $75.8\%$, HumanEval $62.2\%$, GPQA $42.4\%$) at up to a $12.6\times$ wall-clock speedup over the MH baseline.
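The quantity that guides the sampler, token-level entropy, is simple to compute from the next-token distributions. The sketch below only illustrates how high-entropy positions could be picked out; it is not the EGPS sampler (no MCMC moves, no Multiple-Try Metropolis), and the function names, the threshold, and the toy distributions are assumptions.

```python
import math


def token_entropy(probs):
    """Shannon entropy (in nats) of one next-token distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def high_entropy_positions(step_probs, threshold):
    """Indices of generation steps whose entropy reaches the threshold."""
    return [
        index
        for index, probs in enumerate(step_probs)
        if token_entropy(probs) >= threshold
    ]


# Toy distributions: a deterministic step, a coin flip, a confident step.
steps = [[1.0, 0.0], [0.5, 0.5], [0.9, 0.1]]
decision_points = high_entropy_positions(steps, threshold=0.5)
```

Only the coin-flip step is flagged. Restricting resampling to such positions is the intuition behind making cost scale with entropy mass instead of sequence length.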


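2606.09962 2026-06-10 cs.LG cs.AI cs.SD new submission

Optimality of FSQ Tokens for Continuous Diffusion for Categorical Data with Application to Text-to-Speech

Vadim Popov, Wenju Gu, Tasnima Sadekova, Georgii Aparin, Assel Yermekova

AI summary: This paper studies the latent space structure of discrete tokens in continuous diffusion models, shows through theoretical analysis and experiments that the FSQ tokenization scheme is best suited to continuous diffusion for categorical data, and verifies on a text-to-speech task that it outperforms an LLM-based approach.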

Abstract

Continuous diffusion for categorical data is a framework in the diffusion family that aims to generate discrete data. Scientific interest in such models has grown steadily as researchers pursue the challenging goal of finding reasonable alternatives to autoregressive large language models. In this paper, we study the properties of the structure of the latent space corresponding to discrete tokens, expressed in terms of the Kullback-Leibler divergence on diffusion path measures and the accuracy of correct token prediction by the optimally trained diffusion model. We find that the FSQ tokenization scheme has a latent space structure whose properties make it best suited for continuous diffusion for categorical data, as verified through rigorous theoretical analysis and numerical experiments. To validate our findings in a real-life scenario, we train several text-to-speech diffusion models having speech tokens as intermediate acoustic features, and show that the one based on FSQ tokens indeed performs the best and, moreover, outperforms its strong LLM-based counterpart while being significantly smaller and faster.
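FSQ (finite scalar quantization) assigns each latent dimension to one of a small number of evenly spaced levels, so a token is a point on a regular grid. The sketch below shows that idea in its simplest form, under stated assumptions: it handles odd level counts only, omits the straight-through gradient used in training, and is not taken from this paper's code. The function names are hypothetical.

```python
import math


def fsq_quantize(z, levels):
    """Bound each coordinate with tanh, then round to the nearest integer level.

    levels: odd number of levels per dimension (assumption of this sketch).
    Returns integer codes in [-(L-1)/2, (L-1)/2] for each dimension.
    """
    codes = []
    for value, n_levels in zip(z, levels):
        half = (n_levels - 1) / 2
        codes.append(round(math.tanh(value) * half))
    return codes


def fsq_index(codes, levels):
    """Flatten per-dimension codes into a single token id (mixed radix)."""
    index = 0
    for code, n_levels in zip(codes, levels):
        half = (n_levels - 1) // 2
        index = index * n_levels + (code + half)
    return index


codes = fsq_quantize([10.0, -10.0], levels=[5, 3])
token_id = fsq_index(codes, levels=[5, 3])
```

With levels [5, 3] the implicit codebook has 15 tokens laid out on a grid, so nearby token ids along one axis correspond to nearby latent points. That kind of latent space structure is what the abstract analyzes.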


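2606.10089 2026-06-10 cs.LG cs.AI new submission

A Theory on Flow Matching with Neural Networks

Yihan He, Qishuo Yin, Yuan Cao, Jianqing Fan, Han Liu

AI summary: This paper establishes theoretical foundations for flow matching with a neural-network-parameterized conditional velocity field: it proves convergence of gradient descent for over-parameterized two-layer ReLU networks, derives generalization bounds for the conditional velocity-field matching objective, and provides Wasserstein distance guarantees for generated samples.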

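2606.10153 2026-06-10 cs.LG new submission

Compositional Generative Modeling from Decentralized Data

Shreya Jha, Timo Schorlepp, Nicholas Geissler, Jules Berman, Benjamin Peherstorfer

AI summary: Proposes first-order trajectory matching (FTM), which learns the first-order local probability-mass transport of stochastic system trajectories to enable low-cost ensemble prediction and capture trajectory quantities such as fluxes and circulation.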

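2606.09901 2026-06-10 cs.GR cs.CV cs.HC cs.LG cs.MM cross-listed

On the Controllability-Fidelity Frontier in Diffusion Editing

Yi Hu, Leying Yi, Emily Davis, Finn Carter

Affiliations: Xidian University

AI summary: This paper combines theory and experiments to study the trade-offs in diffusion image editing among adherence to user intent, preservation of non-target content, and output quality, proposes algorithmic frameworks, reveals key failure modes, and discusses ethical considerations.

Comments: Preprint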

Abstract

Diffusion-based generative models enable powerful image editing capabilities, but achieving precise control while maintaining fidelity and safety remains challenging. We present a comprehensive theoretical and empirical study of controllable diffusion-based image editing, analyzing the trade-offs between adherence to user intent, preservation of non-target content, and output quality. Our work spans text- and mask-guided edits, point/drag manipulation, and inversion-based pipelines. We derive mathematical formulations of editing objectives and analyze dynamics of noise injection, score guidance, and inversion error. We provide theoretical bounds on reconstruction error, stability under repeated edits, and locality of changes. We propose algorithmic frameworks (with pseudocode) for mask-localized and instruction-guided editing, and present extensive experiments comparing state-of-the-art methods (e.g.\ TF-ICON \cite{lu2023tficone}, DragFlow \cite{zhou2025dragflow}, InstructPix2Pix \cite{brooks2023instructpix2pix}, UltraEdit \cite{zhao2024ultraedit}) on multiple tasks and metrics (FID, identity similarity, CLIP alignment, artifact scores, etc). Our results reveal key failure modes, such as identity drift, prompt sensitivity, and compositional errors. We also discuss ethical considerations in image editing, including misuse risks, bias, consent, and concept erasure techniques (e.g.\ MACE \cite{lu2024mace}, ANT \cite{li2025ant}, EraseAnything \cite{gao2024eraseanything}) as safeguards. We conclude with best practices and future directions for responsible, high-fidelity diffusion-based editing.


2606.09941 2026-06-10 stat.AP cs.LG stat.OT cross-listed

Stochastic weather generators for high-frequency wind vector time series

Mingshi Cui, Kevin Eng, Justin T. Greene, Zern Ke, Abolfazl Sodagartojgi, Zhiqiu Xia, Gemma E. Moran, Michael L. Stein

Affiliations: Department of Statistics, Rutgers University

AI summary: For minute-scale wind vector time series, develops machine learning models based on time vector-quantized variational autoencoders that generate realistic series, capturing diurnal variation but falling short on matching the distribution of extreme wind speeds.

Abstract

Surface winds can vary substantially from one minute to the next, so there is scope for studying their variation on this fine time scale. Restricting to the month of June to minimize seasonality, this work develops a range of machine learning models for generating realistic time series of surface wind vectors at a site in Lamont, Oklahoma, based on more than 30 years of high-quality measurements at the minute time scale. Such a generator could be used as an input into models from a range of disciplines, notably for wind energy, but also wildfire spread and aviation, among others. The data show complex diurnal structures in both wind speed and direction that would be challenging to capture with standard time series models, so we consider a number of machine learning approaches to producing a stochastic wind generator based on time vector-quantized variational autoencoders. We consider generating a day's worth of data at a time and generating a day of wind vectors conditional on the previous day's winds. We also study methods for incorporating a discrete weather state variable in the generator. We evaluate the generators using a wide range of formal and informal methods. The best of these generators can capture many but not all of the complex features present in the observational data. In particular, the best of our approaches accurately mimic diurnal changes in wind volatility but struggle to match the observed distribution of extreme wind speeds.
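
The vector-quantization step at the heart of a vector-quantized autoencoder can be illustrated with a minimal sketch. It is not the authors' model: the codebook values and the helper name `quantize` are hypothetical, chosen only to show how a continuous two-component wind vector (u, v) is snapped to its nearest codebook entry.

```python
import math

# Hypothetical codebook of 2-D wind vectors (u, v) in m/s; values are illustrative only.
CODEBOOK = [(0.0, 0.0), (3.0, 0.0), (0.0, 3.0), (-3.0, -3.0)]

def quantize(vec, codebook=CODEBOOK):
    """Return (index, entry) of the codebook entry nearest to vec in Euclidean distance."""
    distances = [math.dist(vec, entry) for entry in codebook]
    index = distances.index(min(distances))
    return index, codebook[index]

# A short toy wind series mapped to discrete codes.
series = [(0.2, -0.1), (2.6, 0.4), (0.3, 2.9), (-2.5, -3.2)]
codes = [quantize(v)[0] for v in series]
```

In a full model the codebook would be learned jointly with an encoder and decoder; here it is fixed by hand so that the lookup itself is easy to follow.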

2606.10023 2026-06-10 astro-ph.CO astro-ph.IM cs.LG cross-listed

Learning the Universe: Posterior Reliability of Neural Generative Models in High-Dimensional Field-Level Inference of Cosmic Initial Conditions

Ludvig Doeser, Jens Jasche

Affiliations: The Oskar Klein Centre, Department of Physics, Stockholm University, AlbaNova University Centre; Center for Computational Astrophysics, Flatiron Institute

AI summary: Using a Hamiltonian Monte Carlo reference posterior, this paper evaluates the posterior reliability of neural generative models (Stochastic Interpolants and GLOW normalizing flows) in high-dimensional field-level inference of cosmic initial conditions, and finds that matching posterior means or marginal distributions does not guarantee correct uncertainty structure.

Comments: This is a Learning the Universe publication. 19 pages, 18 figures

Abstract

Accurate posterior estimation is central to scientific inference, as uncertainties determine what can be reliably learned from observational data. While Markov chain Monte Carlo methods provide asymptotic convergence guarantees, they are computationally demanding in high-dimensional settings. Neural network-based generative models for entire discretized 3D fields enable fast amortized inference but often lack convergence guarantees and principled accuracy assessment. Using Hamiltonian Monte Carlo to obtain reference posterior samples, we conduct a controlled field-level evaluation of an implicit generative model (Stochastic Interpolants) and an explicit likelihood-based model (GLOW normalizing flows). This comparison, unavailable in typical applications, enables the detection of posterior geometry failures that standard metrics cannot capture. As a case study, we consider the cosmological inverse problem of inferring cosmic initial conditions from present-day large-scale structure. To match the precision of modern cosmological data, this problem increasingly relies on complex, non-linear, and non-differentiable simulators, which are incompatible with gradient-based inference frameworks. Generative models offer a route to address these challenges, provided their inferred posteriors are reliable. In this work, we show that matching posterior means, marginal distributions, or achieving high cross-correlation does not imply correct uncertainty structure, as revealed by posterior variance fields and sample-based evaluations. Through this work, we aim to raise awareness of the challenges of uncertainty estimation in high-dimensional field-level settings, highlighting the importance of careful design and validation of neural generative approaches for scientific applications.

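2606.10450 2026-06-10 cs.CV cs.LG cross-listed

Few-step Generative Models as Lossy Compression

Fuma Kimishima, Jinjia Zhou

Affiliations: University of Tokyo

AI summary: Studies few-step generative models (Rectified Flow, CTM, MeanFlow) as lossy compressors within a reverse channel coding framework; parameterization equivalence and a local Gaussian approximation enable encoding and decoding without retraining, reducing codec time on low-resolution benchmarks and improving realism at low bitrates.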

Generative Properties of Stochastic Interpolation under Finite Training Sets

Yunchen Li, Shaohui Lin, Zhou Yu

AI summary: Studies the theoretical properties of stochastic interpolation generative models under finite training sets, derives closed-form solutions for the optimal velocity field and score function, characterizes the behavior of the deterministic and stochastic generative processes, and defines underfitting and overfitting.

Comments: We found proof errors affecting key theorems and wish to avoid misleading readers. We have submitted a substantially revised new paper, arXiv:2606.08554, retaining only two old theorems and adding five new ones

Abstract

This paper investigates the theoretical behavior of generative models under finite training populations. Within the stochastic interpolation generative framework, we derive closed-form expressions for the optimal velocity field and score function when only a finite number of training samples are available. We demonstrate that, under some regularity conditions, the deterministic generative process exactly recovers the training samples, while the stochastic generative process manifests as training samples with added Gaussian noise. Beyond the idealized setting, we consider model estimation errors and introduce formal definitions of underfitting and overfitting specific to generative models. Our theoretical analysis reveals that, in the presence of estimation errors, the stochastic generation process effectively produces convex combinations of training samples corrupted by a mixture of uniform and Gaussian noise. Experiments on generation tasks and downstream tasks such as classification support our theory.

2601.08379 2026-06-10 cs.LG cs.AI cs.CV version update

MMD Guidance: Training-Free Distribution Adaptation for Diffusion Models via Maximum Mean Discrepancy Guidance

Matina Mahdizadeh Sani, Nima Jamali, Mohammad Jalali, Farzan Farnia

Affiliations: University of California, Berkeley

AI summary: Proposes MMD Guidance, a training-free method that steers diffusion-model sampling with gradients of the Maximum Mean Discrepancy to align outputs with a reference data distribution, without retraining.

Abstract

Pre-trained diffusion models have emerged as powerful generative priors for both unconditional and conditional sample generation, yet their outputs often deviate from the characteristics of user-specific target data. Such mismatches are especially problematic in domain adaptation tasks, where only a few reference examples are available and retraining the diffusion model is infeasible. Existing inference-time guidance methods can adjust sampling trajectories, but they typically optimize surrogate objectives such as classifier likelihoods rather than directly aligning with the target distribution. We propose MMD Guidance, a training-free mechanism that augments the reverse diffusion process with gradients of the Maximum Mean Discrepancy (MMD) between generated samples and a reference dataset. MMD provides reliable distributional estimates from limited data, exhibits low variance in practice, and is efficiently differentiable, which makes it particularly well-suited for the guidance task. Our framework naturally extends to prompt-aware adaptation in conditional generation models via product kernels. Also, it can be applied with computational efficiency in latent diffusion models (LDMs), since guidance is applied in the latent space of the LDM. Experiments on synthetic and real-world benchmarks demonstrate that MMD Guidance can achieve distributional alignment while preserving sample fidelity. The project code is available at github.com/matinamehdizadeh/MMD-Guidance.
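
As a rough illustration of the quantity being differentiated, the sketch below computes a biased estimate of the squared MMD between two small sample sets with a Gaussian (RBF) kernel. The kernel choice, the bandwidth, the sample values, and the function names are assumptions made for this example; the abstract does not specify them, and the guidance step itself (taking gradients with respect to generated samples) is not shown.

```python
import math

def rbf(x, y, bandwidth=1.0):
    """Gaussian kernel between two equal-length vectors."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq / (2.0 * bandwidth ** 2))

def mean_kernel(xs, ys, bandwidth=1.0):
    """Average kernel value over all pairs (x, y)."""
    return sum(rbf(x, y, bandwidth) for x in xs for y in ys) / (len(xs) * len(ys))

def mmd2(xs, ys, bandwidth=1.0):
    """Biased (V-statistic) estimate of the squared MMD."""
    return (mean_kernel(xs, xs, bandwidth)
            + mean_kernel(ys, ys, bandwidth)
            - 2.0 * mean_kernel(xs, ys, bandwidth))

# Made-up 2-D samples standing in for generated and reference data.
generated = [(0.0, 0.0), (0.5, 0.1), (-0.2, 0.3)]
reference = [(2.0, 2.0), (2.5, 1.8), (1.9, 2.2)]
gap = mmd2(generated, reference)
```

The estimate is zero when both sets are identical and grows as the two sets move apart, which is what makes its gradient usable as a steering signal.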

2602.04119 2026-06-10 cs.LG q-bio.QM version update

Synthesizable Molecular Generation via Soft-constrained GFlowNets with Rich Chemical Priors

Hyeonah Kim, Minsu Kim, Celine Roget, Dionessa Biton, Louis Vaillancourt, Yves V. Brun, Yoshua Bengio, Alex Hernandez-Garcia

Affiliations: University of Toronto; DeepMind; University of Montreal

AI summary: Proposes S3-GFN, which softly regularizes a sequence-based GFlowNet using chemical priors from large-scale SMILES corpora to generate high-reward, synthesizable molecules; experiments report a synthesizability rate of at least 95%.

Abstract

The application of generative models for experimental drug discovery campaigns is severely limited by the difficulty of designing molecules de novo that can be synthesized in practice. Previous works have leveraged Generative Flow Networks (GFlowNets) to impose hard synthesizability constraints through the design of state and action spaces based on predefined reaction templates and building blocks. Despite the promising prospects of this approach, it currently lacks flexibility and scalability. As an alternative, we propose S3-GFN, which generates synthesizable SMILES molecules via simple soft regularization of a sequence-based GFlowNet. Our approach leverages rich molecular priors learned from large-scale SMILES corpora to steer molecular generation towards high-reward, synthesizable chemical spaces. The model induces constraints through off-policy replay training with a contrastive learning signal based on separate buffers of synthesizable and unsynthesizable samples. Our experiments show that S3-GFN learns to generate synthesizable molecules ($\geq 95\%$) with higher rewards in diverse tasks.

2602.09639 2026-06-10 cs.LG stat.ML version update

Blind denoising diffusion models and the blessings of dimensionality

Zahra Kadkhodaie, Aram-Alexandre Pooladian, Sinho Chewi, Eero Simoncelli

Affiliations: Flatiron Institute, Simons Foundation; Foundations of Data Science, Yale University; Department of Statistics and Data Science, Yale University; Ctr. for Neural Science & Courant Institute, New York University

AI summary: Proposes blind denoising diffusion models (BDDMs), which simplify the design by not passing the noise amplitude to the neural network, proves their correctness under the assumption that the data's intrinsic dimension is low relative to the ambient dimension, and shows experimentally the benefits of an adaptive scheme.

Comments: 39 pages, 13 figures; Accepted to ICML 2025 FoGen workshop

Abstract

Denoising diffusion models (DDMs) are state-of-the-art methods for learning densities from data across numerous domains, yet many aspects of the training and sampling pipeline remain poorly understood. In particular, noise conditioning requires practitioners to incorporate contrived unprincipled noise embeddings into neural network architectures and to use ad hoc noise schedules for sampling. To address these drawbacks, we provide a complete theory for blind denoising diffusion models (BDDMs): a variant of DDMs where the noise amplitude is not passed into the neural network during training or sampling, obviating the need for the aforementioned design choices. We justify the correctness of BDDMs as a sampling algorithm under an assumption of low intrinsic dimensionality of the underlying data distribution relative to the ambient dimension. This assumption arises through the introduction of the Bayesian problem of estimating noise levels from a single noisy sample, which might be of independent interest. We empirically compare the performance of BDDMs to standard DDMs, showcasing the benefits of an adaptive scheme which is rigorously justified by our analysis.

2606.07605 2026-06-10 cs.LG cs.AI version update

SRT: Super-Resolution for Time Series via Disentangled Rectified Flow

Jufang Duan, Shenglong Xiao, Yuren Zhang

Affiliations: Bytedance

AI summary: Proposes SRT, a framework that reconstructs high-resolution time series from low-resolution inputs via disentangled rectified flow: it decomposes trend and seasonal components, aligns resolutions with an implicit neural representation, and introduces a cross-resolution attention mechanism to generate details.

Comments: Accepted to the International Conference on Learning Representations (ICLR) 2026

Journal ref: The Fourteenth International Conference on Learning Representations (ICLR 2026)

Abstract

Fine-grained time series data with high temporal resolution is critical for accurate analytics across a wide range of applications. However, the acquisition of such data is often limited by cost and feasibility. This problem can be tackled by reconstructing high-resolution signals from low-resolution inputs based on specific priors, known as super-resolution. While extensively studied in computer vision, directly transferring image super-resolution techniques to time series is not trivial. To address this challenge at a fundamental level, we propose Super-Resolution for Time series (SRT), a novel framework that reconstructs temporal patterns lost in low-resolution inputs via disentangled rectified flow. SRT decomposes the input into trend and seasonal components, aligns them to the target resolution using an implicit neural representation, and leverages a novel cross-resolution attention mechanism to guide the generation of high-resolution details. We further introduce SRT-large, a scaled-up version with extensive pre-training, which enables strong zero-shot super-resolution capability. Extensive experiments on nine public datasets demonstrate that SRT and SRT-large consistently outperform existing methods across multiple scale factors, showing both robust performance and the effectiveness of each component in our architecture.
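
The abstract does not say how the trend and seasonal components are obtained. One common choice, assumed here purely for illustration, is a centered moving average for the trend with the remainder treated as the seasonal part. The function names, the window size, and the toy series are hypothetical.

```python
def moving_average(series, window=3):
    """Centered moving average; the window shrinks at the edges."""
    half = window // 2
    trend = []
    for i in range(len(series)):
        lo, hi = max(0, i - half), min(len(series), i + half + 1)
        trend.append(sum(series[lo:hi]) / (hi - lo))
    return trend

def decompose(series, window=3):
    """Split a series into (trend, seasonal) so that trend + seasonal == series."""
    trend = moving_average(series, window)
    seasonal = [x - t for x, t in zip(series, trend)]
    return trend, seasonal

low_res = [1.0, 3.0, 2.0, 4.0, 3.0, 5.0]
trend, seasonal = decompose(low_res)
```

Because the seasonal part is defined as the remainder, the two components always sum back to the input, so nothing is lost before each one is upsampled separately.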

2412.11449 2026-06-10 cs.SD cs.AI cs.CL cs.LG eess.AS version update

Whisper-GPT -- Continuous Discrete Hybrid Representation Language Models For Speech And Music

Prateek Verma

Affiliations: Stanford University

AI summary: Proposes Whisper-GPT, a generative large language model that combines continuous audio representations (such as spectrograms) with discrete audio tokens, addressing the excessive context length of discrete-token approaches and lowering perplexity and negative log-likelihood for next-token prediction on speech and music.

Comments: 6 pages, 3 figures. 50th International Conference on Acoustics, Speech and Signal Processing, Hyderabad, India

Abstract

We propose WHISPER-GPT, a generative large language model (LLM) for speech and music that allows us to work with continuous audio representations and discrete tokens simultaneously as part of a single architecture. There has been a huge surge in generative audio, speech, and music models that utilize discrete audio tokens derived from neural compression algorithms, e.g., ENCODEC. However, one of the major drawbacks of this approach is handling the context length, which blows up for high-fidelity generative architectures if one has to account for all the audio content at various frequencies for next-token prediction. By combining a continuous audio representation such as the spectrogram with discrete acoustic tokens, we retain the best of both worlds: all the information needed from the audio at a specific time instance is available in a single token, yet the LLM can still predict the future token, which allows for sampling and the other benefits that a discrete space provides. We show how our architecture improves the perplexity and negative log-likelihood scores for next-token prediction compared to a token-based LLM for speech and music.

2506.03672 2026-06-10 stat.ML cs.LG math.OC version update

Latent Guided Sampling for Combinatorial Optimization

Sobihan Surendran, Adeline Fermanian, Sylvain Le Corff

Affiliations: Sorbonne Université and Université Paris Cité, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, F-75005 Paris, France; LOPF, Califrais' Machine Learning Lab, Paris, France

AI summary: Proposes LGS-Net, a latent space model, together with Latent Guided Sampling, an inference method combining Markov Chain Monte Carlo and Stochastic Approximation, achieving state-of-the-art performance on routing tasks.

Journal ref: International Conference on Machine Learning, Jul 2026, Seoul, South Korea

Abstract

Combinatorial Optimization problems are widespread in domains such as logistics, manufacturing, and drug discovery, yet their NP-hard nature makes them computationally challenging. Recent Neural Combinatorial Optimization (NCO) methods leverage deep learning to learn policies for constructing solutions, trained via Supervised or Reinforcement Learning. While promising, these approaches often rely on task-specific augmentations, perform poorly on out-of-distribution instances, and lack robust inference mechanisms. Moreover, existing latent space models either require labeled data or use an instance-independent latent distribution. In this work, we propose LGS-Net, a novel latent space model that conditions on problem instances, and introduce an efficient inference method, Latent Guided Sampling (LGS), based on Markov Chain Monte Carlo and Stochastic Approximation. We show that the iterations of our method form a time-inhomogeneous Markov Chain and provide rigorous theoretical convergence guarantees. Empirical results on benchmark routing tasks show that our method achieves state-of-the-art performance among NCO baselines.

2508.07048 2026-06-10 cs.SD cs.AI cs.LG eess.AS version update

Whisfusion: Parallel ASR Decoding with Masked Diffusion

Taeyoun Kwon, Junhyuk Ahn, Taegeun Yun, Heeju Jwa, Yoonchae Choi, Siwon Park, Jongchan Kim, Hyungon Ryu, Hyuk-Jae Lee, Nam-Joon Kim

Affiliations: Seoul National University; Soongsil University; NVIDIA Corporation

AI summary: Proposes Whisfusion, which trains a dedicated masked diffusion decoder on top of frozen Whisper audio embeddings and performs non-autoregressive ASR via parallel diffusion decoding, surpassing Whisper-large-v3 on multilingual benchmarks while running 4-5x faster.

Comments: 16 pages, 3 figures

Abstract

Autoregressive (AR) encoder-decoder models dominate high-quality multilingual ASR, but their left-to-right decoders make inference latency scale with transcript length. CTC-style non-autoregressive (NAR) systems, a natural alternative, avoid this bottleneck, but their conditional independence assumption sacrifices transcript-level generative modeling. Masked diffusion language models (e.g., LLaDA, MDLM) offer a competitive NAR text-generation approach. We ask whether such models can bring NAR ASR into the accuracy regime of strong AR ASR systems while removing the left-to-right bottleneck. We propose Whisfusion, which trains a dedicated masked diffusion decoder from scratch on top of frozen Whisper-large-v3 audio embeddings, denoising masked transcripts in just a few steps. We train on ~68k hours of 11-language speech with high-mask specialization to align training with the fully masked starting point of inference, and decode via Parallel Diffusion Decoding. Whisfusion surpasses Whisper-large-v3 on group-average accuracy across English, European, and CJK benchmarks, while running 4-5x faster, additionally surpassing Whisper-turbo in both accuracy and throughput. It reaches accuracy competitive with Canary and Qwen3-ASR while running 3-7x faster. These results establish masked diffusion as a Pareto-competitive non-autoregressive paradigm for high-throughput multilingual transcription. Code and model weights are available at https://github.com/taeyoun811/Whisfusion.

2606.09880 2026-06-10 cs.LG new submission

Hyperparameter Learning for Latent Factorization of Tensors for Representation Learning to Large-scale Dynamic Weighted Directed Network

Yaqian Zhan, Jialan He, Tianzhu Chen

Affiliations: College of Computer and Information Science

AI summary: For large-scale dynamic weighted directed networks, proposes a Differential Evolution-based framework that automatically optimizes the hyperparameters of the latent factorization of tensors model, learning the regularization parameters automatically, improving prediction accuracy, and reducing manual tuning.

Abstract

Large-scale dynamic weighted directed networks (DWDNs) are widely used to model time-varying interactions among nodes. Latent factorization of tensors (LFT) extracts target knowledge from DWDNs via low-rank embedding. However, similar to many machine learning models, the performance of LFT heavily depends on the selection of hyperparameters. In practice, these parameters are often tuned manually or through grid search, which requires significant computational resources and human effort. Motivated by this challenge, this paper proposes an automated hyperparameter optimization framework based on Differential Evolution (DE) for LFT (DE-LFT). The proposed method integrates DE into the training process of the LFT model to automatically learn optimal regularization parameters $\lambda_1$, $\lambda_2$, and $\lambda_3$. As a result, the model can adaptively search the hyperparameter space and improve prediction accuracy. Experimental results on four real-world datasets demonstrate that the proposed approach achieves lower MAE and RMSE compared with manually tuned baselines while reducing the need for extensive parameter tuning.
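
A bare-bones Differential Evolution loop (the classic DE/rand/1/bin scheme) over three regularization parameters might look like the following. This is a generic sketch, not the DE-LFT algorithm: the objective `validation_error` is a stand-in quadratic with a made-up optimum, whereas in the paper's setting it would be the validation error of a trained LFT model. All names and default settings here are assumptions.

```python
import random

def validation_error(lams):
    """Hypothetical stand-in objective; the optimum (0.1, 0.05, 0.2) is made up."""
    target = (0.1, 0.05, 0.2)
    return sum((l - t) ** 2 for l, t in zip(lams, target))

def differential_evolution(objective, dim=3, pop_size=12, generations=60,
                           f=0.5, cr=0.9, bounds=(0.0, 1.0), seed=0):
    rng = random.Random(seed)
    lo, hi = bounds
    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(pop_size)]
    scores = [objective(p) for p in pop]
    history = [min(scores)]
    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct individuals other than i.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            j_rand = rng.randrange(dim)  # guarantees at least one mutated coordinate
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == j_rand:
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    value = min(hi, max(lo, value))  # clip to bounds
                else:
                    value = pop[i][d]
                trial.append(value)
            trial_score = objective(trial)
            # Greedy selection: keep the trial only if it is no worse.
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score
        history.append(min(scores))
    best = min(range(pop_size), key=lambda k: scores[k])
    return pop[best], scores[best], history

best_lams, best_score, history = differential_evolution(validation_error)
```

Greedy selection means the best score in the population never gets worse from one generation to the next, which is the property that makes DE a convenient drop-in replacement for grid search.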

2606.09889 2026-06-10 cs.LG new submission

Optuna Constrained Tree-Structured Parzen Estimator Is a Joint Density Generalization of c-TPE

Shuhei Watanabe, Kaichi Irie

Affiliations: Independent Researcher; Kyoto University

AI summary: Shows that Optuna's constrained TPE is a joint c-TPE that uses an ECI acquisition function with a joint likelihood, and demonstrates that it is more robust to duplicated constraints than the independent c-TPE.

2606.09899 2026-06-10 cs.LG cs.AI new submission

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

Luyang Zhang, Jialu Wang

Affiliations: Carnegie Mellon University; University of California, Santa Cruz

AI summary: Studies the unreliability of attribution patching (a gradient-based first-order approximation) in mechanistic interpretability, finds that the dominant error stems from non-linearities in the downstream network, and proposes a reliability score, error bounds, and a second-order HVP correction.

Comments: 30 pages, 12 figures

Abstract

A central goal of mechanistic interpretability is to identify which internal components causally drive a language model's behavior. Because these importance estimates serve as the evidence for identifying circuits, systematic errors can lead to the misidentification of the underlying mechanisms. While activation patching provides a gold-standard causal metric, its computational cost is prohibitive at scale. Practitioners instead rely on attribution patching, a gradient-based, first-order approximation whose reliability remains poorly understood. In this work, we characterize the source of this unreliability, demonstrating that the dominant error stems from the non-linearities in the downstream network rather than local curvature at the patched component. This insight yields three practical tools: (i) a reliability score to detect untrustworthy estimates, (ii) error bounds quantifying potential attribution mis-specifications, and (iii) a Hessian-vector-product (HVP) correction that eliminates the leading-order error with only one additional backward pass. In evaluations across five model families (124M-9B parameters) and both random-token and naturalistic (name-swap) perturbations, HVP is the only second-order correction feasible at larger scale, where standard baselines like Integrated Gradients become computationally prohibitive. In comparative experiments, a multi-step HVP variant matches or exceeds the accuracy of Integrated Gradients at significantly lower compute, outperforming prior second-order baselines. These improvements lead to higher-fidelity circuit recovery on standard benchmarks and support a Screen-Flag-Fix workflow that targets computational effort only toward the components flagged as unreliable.
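
The gap between exact patching and its first-order estimate can be seen on a toy example. The sketch below is not the paper's procedure: it uses a hypothetical two-dimensional activation feeding a made-up downstream function `f(a) = (w·a)^2`, and it approximates the Hessian-vector product by central finite differences of the gradient rather than by an extra backward pass.

```python
W = [0.5, -1.0]  # hypothetical downstream weights

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def f(a):
    """Toy downstream metric: nonlinear (quadratic) in the activation."""
    return dot(W, a) ** 2

def grad_f(a):
    """Analytic gradient of f with respect to the activation."""
    return [2.0 * dot(W, a) * w for w in W]

def hvp(a, v, eps=1e-3):
    """Hessian-vector product via central differences of the gradient."""
    plus = grad_f([x + eps * y for x, y in zip(a, v)])
    minus = grad_f([x - eps * y for x, y in zip(a, v)])
    return [(p - m) / (2.0 * eps) for p, m in zip(plus, minus)]

clean = [1.0, 2.0]
corrupted = [0.0, 1.0]
delta = [c - a for a, c in zip(clean, corrupted)]

exact = f(corrupted) - f(clean)                 # activation patching effect
first_order = dot(grad_f(clean), delta)         # attribution patching estimate
second_order = first_order + 0.5 * dot(delta, hvp(clean, delta))
```

Because `f` is quadratic, the second-order estimate matches the exact effect up to rounding; for a real network it would remove only the leading-order error.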

2606.09929 2026-06-10 cs.LG cs.AI new submission

Between Amnesia and Chaos: A Memory Stability Expressivity Trilemma for Trainable Dissipative Oscillator Networks

Caleb Munigety

AI summary: Studies trainable networks of nonlinear oscillators and finds that memory horizon, gradient stability, and dynamical expressivity are all governed by damping, yielding a trilemma in which the three cannot be maximized simultaneously; experiments validate the theoretical bounds.

Abstract

Physical reservoir computing harnesses nonlinear mechanical dynamics but, by convention, freezes the substrate and trains only a linear readout, presuming the substrate is not usefully trainable. We revisit that premise for networks of nonlinear oscillators whose mass, damping, and stiffness are learned end-to-end through a symplectic integrator. Our central result is a trilemma: memory horizon, gradient stability, and dynamical expressivity cannot be simultaneously maximized, because all three are governed by the damping. The backward gradient decays at a rate set by the damping, capping how far back credit can propagate, while forward sensitivities grow exponentially in the largest Lyapunov exponent, so usable gradients require damping above a stability floor. Since the Lyapunov exponent falls as damping rises while the memory ceiling falls as the horizon grows, stable training is confined to a band that contracts with horizon and closes at a critical point. We test every step on a twenty-oscillator network. A damping sweep finds the largest Lyapunov exponent monotone and crossing zero at a well-defined stability floor, confirming the theorem's key assumption. A compute-matched comparison of learned versus frozen substrate on delayed recall across nine horizons shows the learned substrate dominating at short horizons and the advantage closing and reversing near a horizon of eleven steps, the predicted signature of band closure; trained models settle near the stability floor, seeking the edge of chaos unprompted. The analytic ceiling overestimates the empirical crossover roughly fivefold, a gap between detectable and learnable gradient that we report rather than tune away. The contribution is a confirmed account of when training a physical substrate beats freezing it.
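
To make the role of damping concrete, the sketch below integrates a single damped linear oscillator with a semi-implicit Euler step (symplectic for the undamped system) and compares the energy left after a fixed horizon for two damping values. It is a one-oscillator toy with hypothetical parameters, not the twenty-oscillator nonlinear network from the paper, and it illustrates only the forward decay of stored energy, not gradient propagation.

```python
def simulate(damping, stiffness=1.0, mass=1.0, dt=0.01, steps=1000):
    """Semi-implicit Euler for m*x'' = -k*x - c*x'; returns the final energy."""
    x, v = 1.0, 0.0
    for _ in range(steps):
        accel = (-stiffness * x - damping * v) / mass
        v += dt * accel      # update velocity first
        x += dt * v          # then position, using the new velocity
    return 0.5 * mass * v * v + 0.5 * stiffness * x * x

energy_light = simulate(damping=0.1)   # weak damping: more energy survives
energy_heavy = simulate(damping=0.5)   # strong damping: the state is forgotten faster
```

Stronger damping leaves less of the initial state behind after the same horizon, which is the memory side of the trade-off the abstract describes.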

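2606.09950 2026-06-10 cs.LG nucl-th physics.comp-ph physics.data-an new submission

Conservation Laws Induced by Data Symmetries in Neural Networks

Jakob Galley, Vahid Shahverdi, Axel Flinth

Affiliations: Umeå University

AI summary: Studies whether symmetries of the training data give rise to conserved quantities under gradient-flow training; proves that for analytic non-polynomial loss functions, data symmetries generically yield no additional conserved quantities, whereas for the mean squared error loss, data augmentation can yield additional conserved quantities, a phenomenon described using the framework of tensorizable networks.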

2606.11130 2026-06-10 cs.LG new submission

Robust Regression of General ReLUs with Queries

Ilias Diakonikolas, Daniel M. Kane, Mingchen Ma

Affiliations: University of Wisconsin-Madison; University of California, San Diego

AI summary: For robust squared-loss regression of general ReLUs under the Gaussian distribution, proposes the first efficient query algorithm, which uses d polylog(1/ε) + Õ(min{1/p, 1/ε}) label queries to achieve error O(opt) + ε, and proves that the query complexity is near-optimal.

Comments: Appeared at NeurIPS 2025

Abstract

We study the task of agnostically learning general (as opposed to homogeneous) ReLUs under the Gaussian distribution with respect to the squared loss. In the passive learning setting, recent work gave a computationally efficient algorithm that uses $\mathrm{poly}(d, 1/\epsilon)$ labeled examples and outputs a hypothesis with error $O(\mathrm{opt}) + \epsilon$, where $\mathrm{opt}$ is the squared loss of the best-fit ReLU. Here we focus on the interactive setting, where the learner has some form of query access to the labels of unlabeled examples. Our main result is the first computationally efficient learner that uses $d \operatorname{polylog}(1/\epsilon) + \tilde{O}(\min\{1/p, 1/\epsilon\})$ black-box label queries, where $p$ is the bias of the target function, and achieves error $O(\mathrm{opt}) + \epsilon$. We complement our algorithmic result by showing that its query complexity bound is qualitatively near-optimal, even ignoring computational constraints. Finally, we establish that query access is essentially necessary to improve on the label complexity of passive learning. Specifically, for pool-based active learning, any active learner requires $\tilde{\Omega}(d/\epsilon)$ labels, unless it draws a super-polynomial number of unlabeled examples.

2606.11149 2026-06-10 cs.LG new submission

Efficiently Learning Drifting Halfspaces with Massart Noise

Mingchen Ma, Guyang Cao, Jelena Diakonikolas, Ilias Diakonikolas

Affiliations: University of Wisconsin-Madison

AI summary: For learning drifting concepts under Massart noise, proposes a computationally efficient learner achieving error η + Õ(Δ^{1/3}/γ), and proves that this error is optimal under low-degree polynomial tests.

Comments: To appear at ICML 2026

Abstract

We study the problem of learning a drifting concept in the presence of Massart noise. In this framework, an online learner has access to a history of independent samples whose labels are noisy versions of a target concept that may change from round to round. The goal is to output, in each round, a hypothesis with small prediction error. We study the complexity of this learning problem for the fundamental class of margin-separable linear classifiers (halfspaces). On the positive side, we give a computationally efficient learner achieving error $η+ \tilde O(Δ^{1/3}/γ)$, where $η$ upper bounds the Massart noise rate, $Δ$ is the drift rate, and $γ$ is the margin. Interestingly, in the realizable setting, an adaptation of our techniques yields an efficient learner with an improved error rate over prior work. On the lower-bound side, we provide formal evidence of an information-computation tradeoff, strongly suggesting that our algorithm's performance is essentially optimal. Specifically, while the information-theoretically optimal error scales with $Δ^{1/2}$, we prove that $Δ^{1/3}$-scaling is unavoidable for low-degree polynomial tests, even in the special case of random classification noise.

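2606.11171 2026-06-10 cs.LG cond-mat.stat-mech cs.IT math.IT math.OC math.ST stat.TH new submission

Algorithmic and Minimax Complexities in Kernel Bandits

Yunbei Xu

AI summary: Using a unified MAIR framework, places GP-UCB and the MAMS algorithm in a common language, proposes a safeguarded master algorithm that combines their advantages, and shows that in overparameterized models algorithmic complexity can be more informative than class-wide minimax or DEC certificates.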

Abstract

Gaussian-process upper confidence bound (GP-UCB) and decision-estimation-coefficient (DEC) methods may appear, at first sight, to belong to different theories. This paper places the two viewpoints in a common algorithmic-information language for frequentist RKHS bandits. GP-UCB fixes an algorithmic, rather than true, Gaussian-process prior and exploits realized-trajectory complexity together with computational tractability, whereas MAMS optimizes a robust class-wide MAIR/DEC envelope. Through the unified MAIR framework and heterogeneous positive-semidefinite algorithmic priors, we generalize both the GP-UCB analysis and the MAMS algorithm, propose a safeguarded master that combines their advantages, and provide a kernel-bandit construction showing that algorithmic complexity can be more informative than class-wide minimax or DEC certificates in overparameterized models. The resulting message is that algorithmic information and class-wide minimax coefficients answer different questions and can lead to different gaps; kernel bandits provide a clean setting in which this distinction becomes mathematically visible.

2606.10119 2026-06-10 stat.ML cs.LG math.ST stat.TH cross-list

Convergence Rates for Neural-Network Estimation with Current-Status Data

Yuan Wu, Tianhui Zhou

Affiliations: Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA

AI summary: For current-status data, proposes a nonparametric neural-network sieve maximum likelihood estimator and, combining ReLU network approximation theory with empirical-process arguments, establishes explicit convergence rates under Hölder smoothness assumptions.

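2606.10179 2026-06-10 quant-ph cs.LG cross-list

Trainability of IQP Quantum Circuit Born Machines Under Gaussian Initialization

Gennaro De Luca

Affiliations: Arizona State University

AI summary: Studies the trainability of IQP quantum circuit Born machines under Gaussian initialization, using Stein's lemma and Lipschitz concentration bounds to derive a lower bound on the gradient variance and a concentration bound on the gradient's deviation from its mean, and discusses strategies to avoid or encourage exponential concentration and the conditions under which barren plateaus are more likely.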

Comments 23 pages

Abstract

Quantum Circuit Born Machines (QCBMs) offer a natural approach to generative machine learning by leveraging the Born rule. Recent work has provided a method to classically train QCBMs with Instantaneous Quantum Polynomial (IQP) circuits via the Maximum Mean Discrepancy (MMD) loss. Despite the assumed intractability of sampling from IQP circuits classically, their expectation values can be computed classically, enabling training of these IQP QCBMs. However, quantum machine learning (QML) models have various other challenges, including trainability issues caused by exponential concentration or barren plateaus. While these issues have been explored for parameters sampled from a uniform distribution, little work has been done to rigorously treat the use of arbitrary Gaussian initialization schemes. This work leverages Stein's lemma and Lipschitz concentration bounds for Gaussian random variables to provide an analytical lower bound of the variance of the gradient and a probabilistic concentration bound of the deviation of the gradient from its mean. It discusses strategies to either avoid or encourage exponential concentration, as well as the conditions under which barren plateaus are more likely to occur.

2606.10361 2026-06-10 stat.ML cs.LG cross-list

Near-Exponential Convergence Rates for kNN Classification based on Boltzmann Margin

Luyuan Yang, Shayan Shafaei, Chao Lan

Affiliations: School of Computer Science, University of Oklahoma

AI summary: Proposes a Boltzmann margin condition, lying between the Tsybakov and Massart margin conditions, and gives the first proof that the kNN classifier can achieve near-exponential convergence rates.

Comments Conference on Uncertainty in Artificial Intelligence (UAI)

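2606.10377 2026-06-10 math.ST cs.LG stat.TH cross-list

Bidirectional Random Projections

Chao Lan, Luyuan Yang

Affiliations: School of Computer Science, University of Oklahoma

AI summary: Analyzes bidirectional random projections for fixed-design ordinary least squares regression and derives a bound on the expected excess loss of the OLS estimate computed from the projected data; compared with row-only projection, the gap is roughly O(p1 + C/p1), where C varies with n1/n and can be negative.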

Comments Statistics & Probability Letters (Elsevier)

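2606.10562 2026-06-10 math.OC cs.LG cs.NA math.NA cross-list

Accelerating SAV-based optimization via randomized low-rank Hessian approximation

Ryo Sagawa, Daisuke Furihata, Yuto Miyatake

Affiliations: Department of Pure and Applied Mathematics, Graduate School of Information Science and Technology, The University of Osaka

AI summary: Proposes the Nyström-enhanced relaxed scalar auxiliary variable method (N-RSAV), which introduces curvature information through a randomized low-rank Nyström approximation to accelerate convergence while preserving an unconditional modified energy dissipation law, and converges substantially faster than conventional RSAV methods on ill-conditioned problems such as PINNs.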

Comments 25 pages, 4 figures

Abstract

We propose a new optimization method, the Nyström-enhanced relaxed scalar auxiliary variable method (N-RSAV), which incorporates curvature information into the RSAV framework to accelerate convergence while preserving an unconditional modified energy dissipation law. Existing RSAV-based methods rely solely on first-order information and often suffer from slow convergence, particularly for ill-conditioned problems such as those arising in physics-informed neural networks (PINNs). To address this limitation, we design the linear operator in the RSAV scheme using approximate Hessian information obtained from a randomized low-rank Nyström approximation. To preserve the dissipation structure, we enforce positive semidefiniteness through eigenvalue truncation. Furthermore, we introduce an adaptive strategy that reuses the approximate Hessian based on the deviation between the original and modified energies, significantly reducing computational cost. We also provide a convergence analysis of the RSAV scheme with a general positive semidefinite operator under the Polyak-Lojasiewicz (PL) condition and establish corresponding convergence guarantees for N-RSAV under the PL condition and an additional convexity assumption. Numerical experiments on ill-conditioned problems with effectively low-rank structure, including convex quadratic problems and training of PINNs, demonstrate that the proposed methods achieve substantially faster convergence than conventional RSAV-based approaches.
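For orientation, the Nyström approximation of a symmetric positive semidefinite matrix $A$ with a test matrix $\Omega$ is $(A\Omega)(\Omega^\top A\Omega)^{+}(A\Omega)^\top$. The sketch below is a minimal illustration restricted to a single random test vector (rank one), which is an assumption made for brevity; it is not the N-RSAV scheme, and the helper names are hypothetical. With a PSD input the rank-one result is itself PSD, so no eigenvalue truncation is needed in this toy case.

```python
import random

def matvec(A, x):
    """Multiply a dense matrix (list of rows) by a vector."""
    return [sum(a * xj for a, xj in zip(row, x)) for row in A]

def rank_one_nystrom(A, omega):
    """Rank-one Nystrom approximation (A w)(A w)^T / (w^T A w) of a PSD matrix A."""
    y = matvec(A, omega)                              # sketch y = A w
    core = sum(w * yi for w, yi in zip(omega, y))     # scalar w^T A w
    if core <= 0.0:
        raise ValueError("test vector lies in the null space of A")
    return [[yi * yj / core for yj in y] for yi in y]

# A rank-one PSD matrix A = v v^T is recovered exactly (up to rounding).
v = [1.0, 2.0, 3.0]
A = [[vi * vj for vj in v] for vi in v]
rng = random.Random(0)
omega = [rng.gauss(0.0, 1.0) for _ in v]
A_hat = rank_one_nystrom(A, omega)
```

For a matrix of higher rank the same formula with a block of several test vectors gives a low-rank approximation rather than an exact copy, which is where the speed-versus-accuracy trade-off arises.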

2606.10770 2026-06-10 stat.ME cs.LG cross-list

Correcting Variable Importance Scored by Random Forests

Guancheng Zhou, Haiping Xu, Jason Liu, Donghui Yan

Affiliations: Computer and Information Science; Mathematics and Data Science; University of Massachusetts, Dartmouth, MA; The Rivers School, Weston, MA

AI summary: To address the effect of correlations among variables on random forest variable importance, proposes a correction based on grouping variables by conditional correlation; experiments show that both computationally efficient schemes correct variable importance sensibly.

Comments 22 pages, 10 figures

Abstract

Variable importance produced by Random Forests (RF) is widely used in statistical data analysis and plays an important role in tasks such as model interpretation, model selection and diagnosis, and cost-bounded learning. However, the calculation of variable importance in RF does not take into account the correlations among variables: a variable that is correlated with many other variables tends to receive a lower importance index, or to be completely masked (i.e., to have an importance index near zero) by other strongly correlated variables. To prevent unwanted correlated variables from influencing the calculation of variable importance, we propose to group variables by their conditional correlations (conditional on the response variable). We explore two computationally efficient options: one groups variables individually and then separates the variable of interest from all correlated variables, while the other uses clustering to group variables according to their pairwise conditional correlations. Our experiments show that both lead to sensible corrections to the importance of variables.

2606.11045 2026-06-10 cs.AI cs.LG cross-list

What Fits (Into Few Tokens) Doesn't Overfit: Compression and Generalization in ML Research Agents

Martin Andres Bertran, Aaron Roth, Zhiwei Steven Wu

Affiliations: Amazon Responsible AI; University of Pennsylvania; Carnegie Mellon University

AI summary: Studies whether LLM-driven research agents retain their performance under output and input compression, finding that short prompts and compressible feedback suffice to reproduce high-performance models, which supports the hypothesis that successful strategies lie in a low-complexity region.

Abstract

Reusing a held-out benchmark adaptively should, in principle, invite overfitting. Yet benchmark-driven machine learning (ML) has produced surprisingly little overfitting in practice. An attractive hypothesis is that successful ML strategies are highly compressible. We study this in the setting of LLM-driven research agents, where the hypothesis becomes directly testable via two complementary information bottlenecks. In \emph{output compression}, an exploration agent adaptively searches for high-performance models using a validation set, and we test whether a fresh ``reproducer agent'' can reproduce its performance given only an extremely short prompt and the training data. In \emph{input compression}, the explorer receives only one-bit feedback indicating whether each submitted model improves on the running best. Across 8 datasets spanning tabular classification, vision, language modeling, diffusion modeling, and reward modeling, we find that these bottlenecks have little effect on performance: short prompts and compressible feedback are sufficient to reproduce and find high-performance models. The hypothesis is falsifiable: when we deliberately induce validation-set overfitting, the results fail to reproduce with short prompts. Taken together, our results support a description-length explanation for the lack of overfitting in benchmark-driven ML: successful strategies occupy a low-complexity region of strategy space.
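The input-compression bottleneck can be pictured with a toy loop. The sketch below is not the authors' harness: `OneBitEvaluator`, `hidden_validation_score`, and the random-perturbation explorer are hypothetical stand-ins. It illustrates only the protocol in which the explorer learns a single bit per submission (whether it beat the running best) and never sees a score.

```python
import random

class OneBitEvaluator:
    """Holds the validation metric and reveals only one bit per submission."""

    def __init__(self, score_fn):
        self._score_fn = score_fn
        self._best = float("-inf")
        self.bits_revealed = 0

    def submit(self, candidate) -> bool:
        score = self._score_fn(candidate)      # never returned to the explorer
        improved = score > self._best
        if improved:
            self._best = score
        self.bits_revealed += 1
        return improved

def hidden_validation_score(x: float) -> float:
    """Hypothetical validation metric, higher is better."""
    return 1.0 - (x - 0.7) ** 2

def explore(evaluator, num_rounds, seed=0):
    """Random local search that adapts only to the one-bit feedback."""
    rng = random.Random(seed)
    best = rng.random()
    evaluator.submit(best)                     # first submission sets the baseline
    for _ in range(num_rounds):
        candidate = min(1.0, max(0.0, best + rng.uniform(-0.1, 0.1)))
        if evaluator.submit(candidate):
            best = candidate
    return best

evaluator = OneBitEvaluator(hidden_validation_score)
start = random.Random(0).random()              # same first draw as inside explore
found = explore(evaluator, num_rounds=200, seed=0)
```

After 200 rounds the explorer has received 201 bits in total, which bounds how much information about the validation set could have leaked into the final choice.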

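2402.00152 2026-06-10 cs.LG cs.NA math.NA stat.ML updated version

Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss

Yahong Yang, Juncai He

Affiliations: GitHub

AI summary: By analyzing the optimal generalization error under Sobolev losses, compares deep and wide networks, shows how the number of sample points, the number of network parameters, and the regularity of the loss function affect the choice of architecture, and applies the theory to the deep Ritz and PINN methods.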

Comments arXiv admin note: text overlap with arXiv:2310.10766, arXiv:2305.08466

Abstract

Constructing the architecture of a neural network is a challenging pursuit for the machine learning community, and the dilemma of whether to go deeper or wider remains a persistent question. This paper explores a comparison between deeper neural networks (DeNNs) with a flexible number of layers and wider neural networks (WeNNs) with limited hidden layers, focusing on their optimal generalization error in Sobolev losses. Analytical investigations reveal that the architecture of a neural network can be significantly influenced by various factors, including the number of sample points, parameters within the neural networks, and the regularity of the loss function. Specifically, a higher number of parameters tends to favor WeNNs, while an increased number of sample points and greater regularity in the loss function lean towards the adoption of DeNNs. We ultimately apply this theory to address partial differential equations using deep Ritz and physics-informed neural network (PINN) methods, guiding the design of neural networks.

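2508.21022 2026-06-10 cs.LG math.OC stat.ML updated version

A Sketch-and-Project Analysis of Subsampled Natural Gradient Algorithms

Gil Goldshlager, Jiang Hu, Lin Lin

Affiliations: University of California, Berkeley

AI summary: Viewing subsampled natural gradient descent (SNG) as a sketch-and-project method, introduces a new proxy based on squared volume sampling, shows that with a single mini-batch the expected SNG direction equals a preconditioned gradient descent step, gives global convergence guarantees with explicit rates, and attributes SNG's advantage over SGD to more effective use of spectral decay in the model Jacobian.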

Comments 26 pages, 7 figures

Abstract

Subsampled natural gradient descent (SNG) has been used to enable high-precision scientific machine learning, but standard analyses based on stochastic preconditioning fail to provide insight into realistic small-sample settings. We overcome this limitation by instead analyzing SNG as a sketch-and-project method. Motivated by this lens, we discard the usual theoretical proxy which decouples gradients and preconditioners using two independent mini-batches, and we replace it with a new proxy based on squared volume sampling. Under this new proxy we show that the expectation of the SNG direction becomes equal to a preconditioned gradient descent step even in the presence of coupling, leading to (i) global convergence guarantees when using a single mini-batch of any size, and (ii) an explicit characterization of the convergence rate in terms of quantities related to the sketch-and-project structure. These findings in turn yield new insights into small-sample settings, for example by suggesting that the advantage of SNG over SGD is that it can more effectively exploit spectral decay in the model Jacobian. We also extend these ideas to explain a popular structured momentum scheme for SNG, known as SPRING, by showing that it arises naturally from accelerated sketch-and-project methods.
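For readers unfamiliar with sketch-and-project, its simplest instance is randomized Kaczmarz for a consistent linear system $Ax = b$: each step samples one row (the sketch) and projects the current iterate onto the hyperplane defined by that row. The sketch below illustrates only this generic template. It is not SNG, SPRING, or the squared-volume-sampling proxy, and the function name and the small test system are illustrative assumptions.

```python
import random

def randomized_kaczmarz(A, b, num_steps, seed=0):
    """Sketch-and-project with single-row sketches for a consistent system A x = b."""
    rng = random.Random(seed)
    x = [0.0] * len(A[0])
    row_norms_sq = [sum(a * a for a in row) for row in A]
    for _ in range(num_steps):
        # Sketch: pick row i with probability proportional to its squared norm.
        i = rng.choices(range(len(A)), weights=row_norms_sq, k=1)[0]
        row = A[i]
        # Project: move x onto the hyperplane {z : row . z = b[i]}.
        residual = b[i] - sum(a * xj for a, xj in zip(row, x))
        step = residual / row_norms_sq[i]
        x = [xj + step * a for xj, a in zip(x, row)]
    return x

A = [[2.0, 1.0], [1.0, 3.0], [1.0, -1.0]]
x_true = [1.0, -2.0]
b = [sum(a * xt for a, xt in zip(row, x_true)) for row in A]
x_hat = randomized_kaczmarz(A, b, num_steps=500)
```

Each projection can only shrink the distance to the solution, and the expected contraction per step depends on the spectrum of $A$, which gives some intuition for why spectral decay appears in rates for this family of methods.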

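2510.12071 2026-06-10 cs.LG updated version

Influence Dynamics and Stagewise Data Attribution

Jin Hwa Lee, Matthew Smith, Maxwell Adam, Jesse Hoogland

Affiliations: University College London; Independent; University of Melbourne; Timaeus

AI summary: To address how the influence of training samples changes over the course of neural network training, proposes a stagewise data attribution framework based on singular learning theory, predicts non-monotonic changes in influence (sign flips and spikes), and validates their correspondence with the model's learning stages in toy models and language models.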

Comments 28 pages, 15 figures

2512.06343 2026-06-10 cs.LG cs.AI cs.CL updated version

When Distance Distracts: Representation Distance Bias in BT-Loss for Reward Models

Tong Xie, Andrew Bai, Yuanhao Ban, Yunqi Hong, Haoyu Li, Cho-Jui Hsieh

Affiliations: University of California, Berkeley

AI summary: Analyzes the gradient bias induced by representation distance in the BT loss and proposes NormBT, an adaptive normalization scheme that improves reward model performance on fine-grained distinctions.

Comments ICML 2026

Abstract

Reward models are central to Large Language Model (LLM) alignment within the framework of RLHF. The standard objective used in reward modeling is the Bradley-Terry (BT) loss, which learns from pairwise data consisting of chosen and rejected responses. In this work, we analyze the per-sample gradient of the BT loss and show that representation distance introduces spurious learning signals. In particular, the BT gradient norm scales with two distinct components: (1) prediction error, reflected by the difference in predicted rewards between chosen and rejected responses, and critically, (2) representation distance between the pair measured in the output space of the final layer. While the first term captures the intended training signal, the second term can significantly impact the update magnitude and misalign learning. Specifically, pairs with small representation distance often receive vanishingly weak updates, even when misranked, while pairs with large distance receive disproportionately strong updates. As a result, gradients from large-distance pairs overshadow those from small-distance pairs, where fine-grained distinctions are especially important. To overcome this limitation, we propose NormBT, an adaptive pair-wise normalization scheme that rescales updates to balance representation-driven effects and focuses learning signals on prediction error. NormBT is a lightweight, drop-in modification to BT loss with negligible overhead. Across various LLM backbones and datasets, NormBT improves reward model performance consistently, with notable gains of over 5% on the Reasoning category of RewardBench, which contains numerous fine-grained pairs.
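The two-factor scaling can be checked by hand in a simplified case. Assume, purely for illustration, a linear reward head $r(h) = w^\top h$ on final-layer features $h$. The BT loss for a pair is $-\log\sigma(r_c - r_r)$, and its gradient with respect to $w$ is $-(1-\sigma(r_c - r_r))(h_c - h_r)$, so the gradient norm is exactly the prediction-error factor times the feature distance. The sketch below verifies this numerically; the function names and feature values are hypothetical, and NormBT itself is not implemented because the abstract does not give its formula.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def bt_grad_wrt_head(w, h_chosen, h_rejected):
    """Gradient of -log(sigmoid(w.h_c - w.h_r)) with respect to w (linear head assumed)."""
    diff = [c - r for c, r in zip(h_chosen, h_rejected)]
    margin = sum(wi * di for wi, di in zip(w, diff))
    error = 1.0 - sigmoid(margin)              # prediction-error factor
    grad = [-error * di for di in diff]
    return grad, error, norm(diff)             # norm(diff) is the representation distance

w = [0.5, -0.25]
# Two misranked pairs (negative margin): one close in feature space, one far apart.
near_grad, near_err, near_dist = bt_grad_wrt_head(w, [1.0, 1.0], [1.1, 1.0])
far_grad, far_err, far_dist = bt_grad_wrt_head(w, [1.0, 1.0], [2.0, 1.0])
```

Both pairs are ranked incorrectly, yet the nearby pair receives a much smaller update because its feature distance is ten times smaller, which is the imbalance the abstract describes.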

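2603.12785 2026-06-10 cs.LG math.ST stat.TH updated version

Residual-Controlled Multiplier Learning for Stochastic Constrained Decision-Making

Kang Liu, Jianchen Hu, Ziyu Qu, Edward Hengzhou Yan, Lun Yang, Meng Zhang

Affiliations: Xi’an Jiaotong University; Tencent; China University of Geosciences

AI summary: Proposes Residual-Controlled Multiplier Learning (RCML), which reformulates multiplier updates as projected-pressure feedback and adds modular stochastic stabilization components, to address the unstable multiplier updates that mini-batch noise causes in primal-dual methods for stochastic constrained decision-making; establishes finite-gain convergence and a local KKT-residual interpretation.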

Abstract

Stochastic constrained decision-making requires optimizing performance objectives while enforcing statistical requirements such as safety or fairness. However, standard primal--dual methods struggle to update multipliers robustly under stochastic mini-batch feedback, as the noise of mini-batch gradients and constraint estimates can be directly accumulated into the multiplier memory. To address this issue, we propose Residual-Controlled Multiplier Learning (RCML), which reformulates multiplier updating as projected-pressure feedback. The central idea is to decompose the projected multiplier into an effective pressure signal for primal descent and a pressure-memory residual for finite-gain multiplier tracking. To handle heterogeneous and noisy observations, we further augment this residual-integral backbone with modular stochastic stabilization components. For the convex-affine backbone, we establish finite-gain convergence, derive a stochastic residual bound under mini-batch feedback, and show that the residual feedback law admits a local KKT-residual interpretation near regular KKT points of nonconvex problems. Experiments across optimization, allocation, and fair-ranking tasks show that RCML improves feasibility control and multiplier stability while maintaining competitive objective performance. Code is released at https://anonymous.4open.science/r/RCML-3114/.

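2503.20272 2026-06-10 stat.ML cs.LG updated version

Hyper-Scaled NLP Bounds for Maximum-Entropy Remote Sampling

Gabriel Ponte, Marcia Fampa, Jon Lee

Affiliations: University of Michigan; Universidade Federal do Rio de Janeiro

AI summary: For the maximum-entropy remote sampling problem, proposes a hyper-scaled NLP bound based on a convex relaxation that, under sufficient conditions, strictly dominates the earlier complementary NLP bound and also applies to rank-deficient covariance matrices, and validates the approach with numerical experiments.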

Abstract

The maximum-entropy remote sampling problem (MERSP) is to select a subset of $s$ random variables from a set of $n$ random variables, so as to maximize the information concerning a set of target random variables that are not directly observable. We assume that the set of all of these random variables follows a joint Gaussian distribution, and that we have the covariance matrix available. Finally, we measure information using Shannon's differential entropy. The main approach for exact solution of moderate-sized instances of MERSP has been branch-and-bound (B\&B), and so previous work concentrated on upper bounds. Prior to our work, there were two upper-bounding methods for MERSP: the so-called ``complementary NLP bound'' and the ``spectral bound'', both introduced 25 years ago. We are able now to establish domination results between these two upper bounds. Further, we propose a novel and effective ``hyper-scaled NLP bound'' (hNLP bound) based on a subtle convex relaxation. The ``complementary'' version of hNLP bound for MERSP generalizes the previous complementary NLP bound for MERSP. We provide theoretical guarantees, giving sufficient conditions under which the complementary hNLP bound strictly dominates the complementary NLP bound. In addition, the hNLP formulation allows us to derive upper bounds for rank-deficient covariance matrices when they satisfy a technical condition. This is in contrast to the previous NLP bound that worked with only positive definite covariance matrices (because it was wedded to a complementary formulation). Additionally, we describe procedures for calculating hyper-scaling parameters. Finally, for B\&B, we provide a variable-fixing methodology and results guiding the best way to construct subproblems. Numerical experiments on benchmark instances demonstrate the effectiveness of our approaches in advancing the algorithmic state-of-the-art for MERSP.

2603.02673 2026-06-10 stat.ML cs.LG updated version

Exact Functional ANOVA Decomposition for Categorical Inputs Models

Baptiste Ferrere, Nicolas Bousquet, Fabrice Gamboa, Jean-Michel Loubes, Joseph Muré

Affiliations: Institut de Mathématiques de Toulouse; Université de Toulouse; CNRS

AI summary: For models with categorical inputs, proposes an assumption-free closed-form functional ANOVA decomposition that handles arbitrary dependence structures efficiently and naturally generalizes SHAP values.

Abstract

Functional ANOVA offers a principled framework for interpretability by decomposing a model's prediction into main effects and higher-order interactions. For independent features, this decomposition is well-defined, strongly linked with SHAP values, and serves as a cornerstone of additive explainability. However, the lack of an explicit closed-form expression for general dependent distributions has forced practitioners to rely on costly sampling-based approximations. We completely resolve this limitation for categorical inputs. By bridging functional analysis with the extension of discrete Fourier analysis, we derive a closed-form decomposition without any assumption. Our formulation is computationally very efficient. It seamlessly recovers the classical independent case and extends to arbitrary dependence structures, including distributions with non-rectangular support. Furthermore, leveraging the intrinsic link between SHAP and ANOVA under independence, our framework yields a natural generalization of SHAP values for the general categorical setting.

2605.17189 2026-06-10 stat.ML cs.IT cs.LG math.IT math.ST stat.TH updated version

Sample-efficient inductive matrix completion with noise and inexact side-information

Yuepeng Yang, Cong Ma

Affiliations: Yale Department of Statistics and Data Sciences, Yale University; UChicago Department of Statistics, University of Chicago

AI summary: Studies sample-efficient inductive matrix completion with noise and inexact side-information via nonconvex projected gradient descent, establishing a regularity condition that holds at the effective problem size and yields linear convergence with an estimation error that depends only on the effective problem size.

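AI abstract (translated from Chinese)

Low-rank matrix completion is a widely studied problem with many variants. Inductive matrix completion (IMC) incorporates row and column side-information to shrink the search space substantially. Prior work falls into two camps: methods that exploit this structure to reduce sample complexity but apply only in the noiseless setting, and methods that handle noise but require a sample complexity matching the ambient matrix dimension, giving up the sample efficiency that side-information should provide. This paper closes the gap by studying noisy IMC with a nonconvex projected gradient descent algorithm with spectral initialization. The main technical contribution is a regularity condition for the IMC loss that holds at a reduced sample complexity governed by the effective problem size, which scales with the side-information dimension rather than the ambient dimension. This leads directly to linear convergence and to an estimation error that depends only on the effective problem size rather than the ambient matrix dimension. The analysis is further extended to inexact side-information, showing that the reduced sample complexity is preserved and that the estimation error is optimal in the level of inexactness. Extensive simulations and experiments on the MovieLens dataset validate the theoretical findings.

Abstract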

Inductive matrix completion (IMC) is a variant of low-rank matrix completion that incorporates row and column side-information. In principle, it can reduce the effective dimension of the recovery problem from the ambient matrix size to the dimension of the side-information features. Existing theory, however, does not fully realize this advantage in the noisy setting: sample-efficient guarantees only apply to noiseless recovery, while noisy guarantees require sample sizes comparable to ordinary matrix completion. This paper closes this gap for noisy IMC. We analyze a nonconvex projected gradient descent algorithm with spectral initialization and prove that, under exact side-information, it achieves linear convergence and stable recovery at a sample complexity governed by the effective side-information dimension rather than the ambient matrix dimension. The key technical ingredient is a local regularity condition for the IMC loss that holds at this reduced sample size, despite the mismatch between the observation pattern and the side-information subspaces. We further extend the analysis to inexact side-information, showing that the same reduced sample complexity is preserved and that the estimation error degrades optimally with the level of subspace misspecification. Motivated by this trade-off, we also propose a penalized interpolation between IMC and ordinary matrix completion that balances sample efficiency against robustness to imperfect side-information. Simulations and experiments on the MovieLens dataset support the theoretical findings and illustrate the practical benefits of exploiting side-information in low-sample regimes.

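2606.08799 2026-06-10 stat.ML cs.LG updated version

Generalization in Nonlinear Least Squares via Learned Feature Geometry

Ayub Kharel, Ilja Kuzborskij, Patrick Rebeschini, Yasin Abbasi-Yadkori

Affiliations: University of Oxford; Google DeepMind; Sapient Intelligence

AI summary: Analyzes the generalization error of ridge-regularized nonlinear least squares via algorithmic stability, defines a data-dependent effective dimension through the empirical Jacobian Gram matrix and a residual-curvature term, and shows that it scales with intrinsic dimension rather than parameter count.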

Comments Preprint, under review

Abstract

We study the generalization of ridge-regularized nonlinear least-squares models via on-average algorithmic stability, deriving error bounds for local minimizers in terms of a data-dependent effective dimension that reflects the geometry of the gradient model at the trained parameters, through the empirical Jacobian Gram matrix and a residual-curvature term. In the linear case, where the curvature term vanishes, this recovers the classical effective dimension of the Jacobian kernel covariance, but evaluated at the trained model rather than at initialization as is typical in neural tangent kernel analyses. We further bound this effective dimension via covering complexity of the gradient features, leading to guarantees that depend on learned geometry rather than parameter count. In particular, for manifold-supported data and piecewise Lipschitz Jacobians, the bounds scale with intrinsic dimension, while for one-hidden-layer ReLU networks, the mechanism can be made explicit through counts of activation-stable regions. Experiments on synthetic manifolds, clustered distributions, and benchmark datasets illustrate trained-Jacobian compression, the tightness of the residual-curvature linearization, and agreement between the stability bound and observed generalization gaps. A key feature of our bounds is the simplicity of their derivation, which follows from first principles using the Brascamp-Lieb inequality under strongly log-concave noise.

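2606.09877 2026-06-10 cs.LG cs.CE cs.CL new submission

Streaming Knowledge Compilation: Proactive Materiality-Scored Pinning for Time-Evolving LLM Wikis

Juan M. Huerta

Affiliations: Zinnia Tech Solutions

AI summary: Proposes a streaming knowledge compilation framework that proactively pins important documents using a materiality signal φ_t, evaluates an O(√(T log K)) regret bound in the finance and Wikipedia domains, and reveals an LLM-as-judge confound.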

Abstract

LLM wiki systems compile knowledge into pre-filled KV caches for efficient inference, but assume a static corpus -- an assumption that fails whenever the underlying information landscape evolves. We formalize Streaming Knowledge Compilation: given a document stream, a fixed token budget, and future queries unknown at ingestion time, maintain a compiled wiki that minimizes cumulative regret against an offline oracle with perfect foresight. The enabling insight is a materiality signal $\phi_t(k,n)\in[0,1]$ that scores document importance for entity $k$ at time $t$, acting as a query-relevance surrogate for proactive pinning before queries arrive; we prove an $O(\sqrt{T\log K})$ regret bound where $\varepsilon=\mathbb{E}[|\phi_t-\hat{\phi}_t|]$ is the only domain-specific quantity. We instantiate the framework in two domains: finance, where $\phi_t$ is abnormal stock volatility predicted by a frozen Llama 3.1 8B classification head (AUROC = 0.728 on 76K articles, strict temporal split; $1.49\times$ higher realized forward volatility for predicted-material articles); and Wikipedia, where $\phi_t$ is the Abnormal Edit Ratio (AER), a cross-sectionally normalized edit velocity -- showing that the same algorithm generalizes beyond the finance domain. End-to-end QA evaluation on 173 matched pairs (finance) and 119 (Wikipedia) reveals a pervasive LLM-as-judge confound on post-training knowledge, establishing that regret analysis -- not absolute QA scores -- is the reliable evaluation metric for compiled knowledge systems. Finance cumulative regret converges to -20.0 (-0.12/step); Wikipedia to +16.0 (+0.13/step), with the positive sign confirming that Wikipedia edit content is genuinely post-training -- richer context consistently improves scores (No Wiki 3.80 vs. Oracle 4.74) -- and eliminates this confound. The $O(\sqrt{T\log K})$ guarantee applies to any domain where knowledge gaps can be predicted from streaming signals.

2606.09879 2026-06-10 cs.LG New submission

Operator Fusion for LLM Inference on the Tensix Architecture

Qingbo Wu, Ke Li, Wenzhu Wang, Jie Yu, Ruian Zhang, Lili Liu

Affiliations: KylinSoft Co., Ltd, Tianjin, China; openKylin Community, Tianjin, China; College of Computer Science and Technology, National University of Defense Technology, Changsha, China

AI summary: Targets LLM inference bottlenecks on the Tensix architecture with an operator fusion strategy that fuses RMSNorm with matrix multiplication, using on-chip SRAM and NoC multicast to cut DRAM traffic and scheduling overhead; on the Wormhole platform, attention latency drops by up to 37.44% and MLP latency by 15.89%, with numerical consistency (PCC) above 98.75%.

Comments 11 pages, 5 figures

Abstract

This study addresses on-device inference bottlenecks of Transformer models on Tenstorrent's Tensix architecture and proposes an operator fusion strategy that enhances data locality. RMSNorm is fused with matrix multiplication in self-attention and in the FFN, enabling back-to-back execution of memory-bound and compute-bound operators in on-chip SRAM to significantly reduce DRAM reads/writes of intermediate results and scheduling overhead. To support multi-core parallelism, a NoC-based multicast mechanism is leveraged in which row/column master nodes efficiently distribute inputs and weights across the core mesh, alleviating DRAM bandwidth contention. Experiments on the Wormhole platform with Qwen2.5-0.5B, Qwen3-0.6B, and Qwen3-4B show up to 37.44% latency reduction for attention and 15.89% for MLP, with up to 7.91% reduction per decoder layer, while Pearson Correlation Coefficient (PCC) remains above 98.75%, confirming significant end-to-end efficiency gains under numerical consistency.

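2606.09885 2026-06-10 cs.LG stat.ML New submission

TENP: Trapezoidal Expert Neuron Pruning For Mixture-of-Experts

Jiangyang He, Shaolin Zhu, Deyi Xiong

Affiliations: TJUNLP Lab, School of Computer Science and Technology, Tianjin University

AI summary: Proposes the TENP framework, which identifies important experts and applies neuron pruning to the rest, retaining parameters in a trapezoidal pattern; at 40% routed-expert sparsity with 63.76% of expert parameters activated on average, the DeepSeek model loses only 1 point of accuracy and improves by 10% on code generation tasks.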

Abstract

Mixture-of-Experts large language models (LLMs) scale efficiently through sparse activation, yet their deployment is fundamentally constrained by the large static parameter footprint of experts. Existing compression approaches either remove entire experts, disrupting routing topology and harming performance, or rely on unstructured weight pruning with limited practical efficiency. To address these limitations, we propose TENP, a structured Trapezoidal Expert Neuron Pruning framework. Using a few samples, we identify and retain important experts, while applying expert neuron pruning (ENP) to less important experts, retaining model parameters in a trapezoidal pattern from shallow to deep layers. When evaluating expert importance, we jointly consider both the magnitude of the expert output and its ability to change the direction of the input vector. For ENP, we measure each neuron's projected contribution to the expert output to identify and retain important neurons. We conduct extensive experiments on the Qwen and DeepSeek models. Under a routed-expert sparsity of 40% and an average of 63.76% activated expert parameters, the DeepSeek model suffers only a 1-point drop in accuracy compared to the full-parameter model. Moreover, it outperforms the full-parameter model by 10% on code generation tasks.

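2606.09886 2026-06-10 cs.LG cs.AI New submission

SHAPE: Coalition-Aware Expert Pruning for Sparse Mixture-of-Experts LLMs

Yuhao Zhang

Affiliations: Beihang University

AI summary: Proposes the SHAPE framework, which models expert cooperation with cooperative game theory, uses Shapley values to identify high-contribution experts, and applies a quality-coverage selection rule to retain key experts under a pruning budget; experiments on several MoE models show improved robustness and lower GPU memory use.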

Abstract

Sparse Mixture-of-Experts (MoE) large language models achieve strong quality with low per-token compute, yet their deployment is often limited by the memory wall: the full expert pool must remain resident to support token-dependent routing. Expert pruning is a direct remedy, but prior criteria typically score experts independently and overlook that MoE inference is inherently coalitional, where outputs arise from routed top-$k$ expert combinations. We propose SHAPE, a task-driven pruning framework that explicitly models intra-layer expert cooperation. SHAPE formulates routing traces on a small calibration set as an empirical cooperative game and assigns interaction-aware expert values via a Shapley-style attribution over observed top-$k$ coalitions, enabling the identification of experts that are essential for high-utility collaborations rather than merely frequent. To preserve MoE topology under a global pruning budget, SHAPE further introduces a quality-coverage selection rule that retains, in each layer, the minimal expert subset covering an $\alpha$ fraction of non-negative Shapley mass, while using bisection to match a target keep rate. Experiments on three modern MoE backbones (Qwen3-30B-A3B, GPT-OSS-20B, and DeepSeek-V2-Lite) across diverse benchmarks show that SHAPE consistently improves robustness over global and layer-wise pruning variants, maintaining competitive accuracy under 20% and 40% expert pruning without additional training and delivering clear reductions in peak GPU memory footprint. The open-source code is available at https://github.com/Alizen-1009/Shapley-Moe.
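
The coverage-style selection rule mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the per-expert values for one layer have already been computed (the `values` list is hypothetical input), and the function name `select_experts` is made up for illustration. Negative values are clipped to zero, and experts are kept in descending order of value until an `alpha` fraction of the non-negative mass is covered. The bisection over `alpha` that the abstract uses to hit a target keep rate is omitted.

```python
def select_experts(values, alpha):
    """Return indices of the smallest expert subset whose non-negative
    value mass reaches an alpha fraction of the layer total.

    `values` holds one (hypothetical) attribution score per expert.
    """
    mass = [max(v, 0.0) for v in values]  # ignore negative contributions
    total = sum(mass)
    if total == 0.0:
        return []  # nothing with positive mass to cover
    # Taking the largest masses first yields the smallest covering subset.
    order = sorted(range(len(values)), key=lambda i: mass[i], reverse=True)
    kept, covered = [], 0.0
    for i in order:
        if covered >= alpha * total:
            break
        kept.append(i)
        covered += mass[i]
    return sorted(kept)


if __name__ == "__main__":
    layer_values = [4.0, -2.0, 2.0, 1.0, 1.0]
    print(select_experts(layer_values, 0.75))
```

Greedy selection by descending mass is enough here because the goal is only to minimize the number of retained experts for a given coverage level.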

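2606.09916 2026-06-10 cs.LG cs.AI New submission

IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference

Junjie Li, Jiong Lou, Jie Li

AI summary: To address the KV cache becoming the serving bottleneck in multi-turn LLM agents, proposes IntentKV, which uses a session-level QueryMemory and a residual attention head for cross-turn intent-aware KV pruning, substantially reducing peak request tokens and KV reads while preserving accuracy.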

Abstract

Multi-turn LLM agents fan short queries into long trajectories of tool calls, search results, and intermediate reasoning. Both KV memory and KV read bandwidth grow by orders of magnitude across a single trajectory, making the key-value (KV) cache, not parameter compute, the dominant serving bottleneck for long-horizon agents. We introduce IntentKV, learned KV pruning that keeps the base LLM frozen. IntentKV maintains a session-level QueryMemory of cross-turn intent, scores live history tokens with a memory-attention rule, and adds a zero-initialized residual head with cross-attention over current-query K-vectors. To stay composable with prefix caches, eviction is a slot-map redirection: dropped positions route to a sentinel dead slot while surviving K/V rows, RoPE phases, and slot identities stay in place. IntentKV matches the no-pruning full-cache baseline with almost no accuracy drop under tight KV budgets: at an 8k KV budget, mean peak request tokens drop 23.9% on Qwen3-8B and 30.7% on Qwen2.5-14B. On the 100 longest BCP queries that all methods complete on Qwen2.5-14B, IntentKV-8k further cuts worst-case peak request tokens from 92.3k to 20.5k, a 77.8% reduction, and worst-case raw KV reads from 411M to 31M, a 92.6% reduction.

2606.09924 2026-06-10 cs.LG cs.AI New submission

Unifying Data, Memory, and Compute Efficiency in LLM Training: A Survey

Vanessa Schmidt, Huy Hoang Nguyen, Cédric Jung, Shirin Salehi, Anke Schmeink

Affiliations: Chair of Information Theory and Data Analytics (INDA), RWTH Aachen University; AIT Austrian Institute of Technology GmbH; Automation and Control Institute, Technische Universität Wien (TUW)

AI summary: Surveys three bottlenecks in large language model training from a resource-constraint perspective (data efficiency, memory efficiency, and compute-budget awareness) and argues that they must be optimized jointly rather than in isolation.

Comments Accepted for publication in IEEE Transactions on Artificial Intelligence (TAI)

Abstract

Resource constraints increasingly determine what can be trained, fine-tuned, and deployed in large language models (LLMs), yet efficiency is often studied through isolated techniques rather than as an interacting system of limits. This survey adopts a constraint-centric perspective and organizes recent progress around three coupled bottlenecks: data efficiency (what to train on), memory efficiency (how to fit training), and compute budget awareness (when and where to spend FLOPs). On the data axis, we review selection and pruning methods that maximize learning per token, ranging from scalable proxy signals based on learning dynamics to gradient- and influence-based scoring, as well as difficulty-aware and curriculum-style strategies. We highlight emerging evidence that different notions of good data dominate in different regimes, implying that optimal subsets depend on the task objective and resource budget rather than being universal. On the systems side, we show that GPU memory, not raw compute, is often the dominant bottleneck in fine-tuning, and that effective scaling requires jointly reducing weight storage, optimizer states, and activation memory rather than optimizing any single component in isolation. Beyond memory, we frame training and inference as compute-governed processes in which optimization, data selection, and decoding must explicitly account for finite FLOP budgets. We review evidence for compute-optimal allocation and stopping rules, where computation should be halted or reallocated once marginal performance gains fall below a budget-dependent threshold. Together, these results unify compute-aware data selection, scaling laws, and adaptive inference under a common principle of resource-conditioned decision-making.

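2606.10890 2026-06-10 cs.LG cs.AI New submission

Optimal Post-Training Quantization Scales and Where to Find Them

Juan Amboage, Pablo Monteagudo-Lago, Ian Colbert, Giuseppe Franco, Nicholas Fraser

Affiliations: AMD

AI summary: Proposes the PiSO algorithm, which uses calibration data to compute optimal channel-wise quantization scales exactly and efficiently, extends it to group-wise quantization, and consistently improves perplexity and zero-shot accuracy on Llama and Qwen models.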

Abstract

Post-training quantization (PTQ) compresses large language models by mapping weights to low-bit representations. The scaling factor that defines the quantization grid is typically chosen using simple, data-free heuristics. In this work, we present PiSO (Piecewise Scale Optimization), an algorithm that leverages calibration data to compute the optimal channel-wise weight scales exactly and efficiently under round-to-nearest quantization. PiSO partitions the scale search space into finitely many intervals on which the objective admits a closed-form minimizer. We extend PiSO to group-wise quantization via principled heuristics and propose effective strategies for interleaving scale optimization with error correction. Experiments on Llama and Qwen models across multiple model sizes and target weight bit-widths demonstrate consistent improvements in perplexity and downstream zero-shot accuracy, both standalone and combined with error correction. In particular, we observe increased benefits as the target bit-width narrows and quantization becomes more challenging.
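
To make the role of the scale concrete, the sketch below illustrates round-to-nearest quantization and the fact that, once the integer codes are fixed, the squared error is quadratic in the scale and has a closed-form minimizer. It is not PiSO: it minimizes plain weight error without calibration data, and it uses a coarse grid rather than an exact partition of the scale axis. All function names are hypothetical, and the weights are assumed to contain at least one non-zero value.

```python
def quantize(weights, scale, bits=4):
    """Round-to-nearest symmetric quantization to signed integer codes."""
    qmax = 2 ** (bits - 1) - 1
    return [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]


def squared_error(weights, codes, scale):
    """Sum of squared differences between weights and dequantized codes."""
    return sum((w - scale * c) ** 2 for w, c in zip(weights, codes))


def refit_scale(weights, codes):
    """For fixed codes c, sum (w - s*c)^2 is minimized at s = <w,c> / <c,c>."""
    denom = sum(c * c for c in codes)
    if denom == 0:
        return None  # all codes are zero, so the scale is undetermined
    return sum(w * c for w, c in zip(weights, codes)) / denom


def search_scale(weights, bits=4, steps=64):
    """Coarse grid search with a closed-form refit, seeded by the abs-max scale."""
    qmax = 2 ** (bits - 1) - 1
    absmax_scale = max(abs(w) for w in weights) / qmax
    best_scale = absmax_scale
    best_err = squared_error(weights, quantize(weights, absmax_scale, bits), absmax_scale)
    for i in range(1, steps + 1):
        grid_scale = absmax_scale * i / steps
        refit = refit_scale(weights, quantize(weights, grid_scale, bits))
        for cand in (grid_scale, refit):
            if cand is None or cand <= 0:
                continue
            # Re-quantize at the candidate so the reported error is achievable.
            err = squared_error(weights, quantize(weights, cand, bits), cand)
            if err < best_err:
                best_scale, best_err = cand, err
    return best_scale, best_err


if __name__ == "__main__":
    channel = [0.1, -0.2, 0.05, 0.9, -0.3]
    print(search_scale(channel))
```

Because the search is seeded with the abs-max scale, the returned error can never be worse than that data-free baseline; how much it improves depends on the weight distribution.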

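2606.10896 2026-06-10 cs.LG cs.DB cs.IR cs.PF New submission

Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering

Gal Bloch, Ariel Gera, Matan Orbach, Ohad Eytan, Assaf Toledo

Affiliations: IBM Research

AI summary: Proposes Flash-GMM, a fused Triton kernel that fits Gaussian mixture models in a single GPU pass; by avoiding materialization of the full responsibility matrix it achieves a 20x speedup and supports training on datasets more than 100x larger than before, and integrating it into the IVF coarse quantizer improves ANN search performance.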

Abstract

We present Flash-GMM, a fused Triton kernel for efficient computation of Gaussian Mixture Models (GMMs) over large-scale data in a single GPU pass. By eliminating the need to materialize the full responsibility matrix in GPU memory, Flash-GMM achieves a 20$\times$ speedup over existing implementations and enables training on datasets more than 100$\times$ larger than previously feasible on one device. To demonstrate its impact, we integrate Flash-GMM into the IVF coarse quantizer for approximate nearest-neighbor (ANN) search. We show that soft GMM clustering is now a viable drop-in replacement for $k$-means, and that GMM responsibilities can be leveraged to assign border vectors to multiple clusters. Our approach reaches fixed recall targets with up to $1.7\times$ fewer distance computations, or equivalently, yields $+2$--$12$ recall@10 at matched computational cost. We release the kernel as an open-source project.
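
The algebraic idea behind avoiding the full responsibility matrix can be sketched in plain Python. This is an illustration under stated assumptions, not the Flash-GMM kernel: it handles one-dimensional data on the CPU, and the name `em_step` is hypothetical. The point it shows is that the M-step only needs three running sums per component, so each point's responsibilities can be computed, folded into those sums, and discarded.

```python
import math


def em_step(data, weights, means, variances):
    """One EM iteration for a 1-D GMM that never stores the N x K
    responsibility matrix, only per-component running sums."""
    k_count = len(weights)
    resp_sum = [0.0] * k_count  # sum of responsibilities
    x_sum = [0.0] * k_count     # sum of r * x
    x2_sum = [0.0] * k_count    # sum of r * x^2
    for x in data:
        # Log-density of x under each weighted component.
        logp = [
            math.log(weights[k])
            - 0.5 * math.log(2.0 * math.pi * variances[k])
            - 0.5 * (x - means[k]) ** 2 / variances[k]
            for k in range(k_count)
        ]
        top = max(logp)  # subtract the max for numerical stability
        p = [math.exp(v - top) for v in logp]
        norm = sum(p)
        for k in range(k_count):
            r = p[k] / norm  # responsibility, used once and discarded
            resp_sum[k] += r
            x_sum[k] += r * x
            x2_sum[k] += r * x * x
    total = sum(resp_sum)
    new_weights = [resp_sum[k] / total for k in range(k_count)]
    new_means = [x_sum[k] / resp_sum[k] for k in range(k_count)]
    new_vars = [
        max(x2_sum[k] / resp_sum[k] - new_means[k] ** 2, 1e-6)  # variance floor
        for k in range(k_count)
    ]
    return new_weights, new_means, new_vars


if __name__ == "__main__":
    points = [0.0, 0.1, -0.1, 5.0, 5.1, 4.9]
    w, mu, var = [0.5, 0.5], [1.0, 4.0], [1.0, 1.0]
    for _ in range(20):
        w, mu, var = em_step(points, w, mu, var)
    print(w, mu, var)
```

Memory use here is proportional to the number of components rather than to the number of points times the number of components, which is the property a fused GPU kernel would exploit.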

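2606.10935 2026-06-10 cs.LG cs.AI New submission

CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference

Xuezhen Xie, Zhiqiang Zhou

Affiliations: Xiamen University; Tsinghua University

AI summary: Proposes CLP, which uses a lightweight linear layer to predict how many additional tokens can be safely accepted, addressing the output degradation caused by head-backbone competition in multi-token prediction; achieves 1.14x to 1.29x speedup on Qwen2.5 models with no quality loss.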

Comments 13 pages, 8 figures, 8 tables

Abstract

Large language model inference is bottlenecked by autoregressive decoding, where each token requires a full forward pass. Multi-token prediction (MTP) offers a promising acceleration path, but existing approaches suffer from a fundamental architectural flaw: the MTP head for the first token competes with the backbone's own language model (LM) head, leading to severe quality degradation when predictions are accepted. We identify this head-backbone competition as the root cause of repetitive and incoherent outputs in prior MTP-based acceleration methods. To address this, we propose Backbone-as-Architect, a design principle where the backbone LM head always generates the first token, and MTP heads are responsible only for subsequent tokens. Building on this principle, we introduce CLP (Collocation-Length Predictor), a lightweight span-level decision layer that predicts how many additional tokens can be safely accepted at each decoding step. CLP uses only a single linear layer (4.6K--7.7K parameters), replacing the over-engineered 1M-parameter gate networks used in prior work. Experiments on Qwen2.5 models (0.5B, 1.5B, 7B) show that CLP achieves 1.20x--1.29x speedup on 1.5B and 1.14x--1.20x on 7B, with zero quality degradation (repetition ratio < 0.02), while gate-based approaches fail to accelerate (1.07x) or produce severely degraded outputs (repetition ratio > 0.5%). We further demonstrate that shorter prediction horizons (k=2) recover 24% higher MTP head accuracy on large models, establishing a scaling-aware design principle. We identify MTP head prediction accuracy as the binding constraint on acceleration and establish a clear roadmap for future improvements.

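2606.10944 2026-06-10 cs.LG cs.DS math.ST stat.ME stat.ML stat.TH New submission

Express Language Modeling

Albert Gong, Annabelle Michael Carrell, Raaz Dwivedi, Lester Mackey

AI summary: Proposes Express, a tool that converts non-causal attention approximations into causal ones; combined with Thinformer, it yields optimal causal attention guarantees and speeds up four resource bottlenecks in language modeling.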

2606.09956 2026-06-10 cs.SE cs.LG Cross-list

Multi-task LLMs for Bug Classification: Efficient Inference with Auxiliary Decoding Heads

Nikolai Rozanov

Affiliations: Recurse Ltd.; Department of Computing, Imperial College London

AI summary: Proposes a lightweight multi-task LLM (MLC) that, with a token alignment algorithm and an optimized training strategy, performs line-level bug localization with full-file context, matching agentic approaches while cutting inference latency by orders of magnitude.

Comments 8 pages, 6 pages appendix

Abstract

The rapid adoption of LLM-powered code generation has dramatically accelerated software development, yet effective verification methods remain severely underdeveloped. Existing bug localization techniques are either prohibitively expensive, requiring minutes of agentic reasoning and thousands of generated tokens per file, or operate at a coarse function-level granularity unsuitable for precise debugging. Lighter-weight approaches that work at line-level granularity are often limited in performance or context size. We introduce a novel line-level bug localization approach that addresses these limitations through three key contributions: (1) a token alignment algorithm that overcomes fundamental tokenization challenges in previous work, (2) a lightweight multi-task LLM for bug localization (MLC) enabling efficient line-level bug classification, and (3) an optimized training recipe for multi-line prediction. Our method achieves state-of-the-art performance among similar setups on line-level bug localization with full-file context. At the same time, we reach performance comparable to agentic approaches on the Defects4J and PypiBugs benchmarks while reducing inference latency by orders of magnitude, requiring only a single generated token per file. We further demonstrate strong generalization by introducing and evaluating on a small out-of-domain evaluation dataset in Python. We will open-source our code, models, and datasets upon acceptance.

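2606.10008 2026-06-10 cs.NE cs.LG Cross-list

Finer Is Better (with the Right Scaling)

Clemens Schaefer, Gil Tabak

Affiliations: Google LLC, Mountain View, CA

AI summary: Studies the block-size paradox in low-precision quantization of large language models, finding that finer blocks reduce quantization error under the right scaling strategy and that interventions such as the 4-over-6 method resolve the degradation.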

Abstract

Microscaling is a critical technique for preserving the quality of Large Language Models (LLMs) quantized to ultra-low precision formats. Intuitively, finer block sizes should yield lower quantization error; however, a paradox recently identified by Fasoli et al. (2026) demonstrates that standard abs-max scaling can actually result in degraded model quality as block sizes shrink. In this work, we investigate the underlying mechanics of this phenomenon. We demonstrate that this degradation is not an inherent limitation of finer granularity, but is primarily driven by how elements in smaller blocks statistically cluster closer to their local block maximum, interacting poorly with the coarse subnormal E4M3 values used as scaling factors. Specifically, we show that i) preventing the scaling factor from underflowing to zero mitigates errors caused by extreme underflow, ii) targeted algorithmic interventions like the 4-over-6 methodology that give more flexibility to the choice of scaling factor resolve the paradox for larger values, and iii) a brute-force search establishes an optimal baseline, confirming that the theoretical Mean Squared Error (MSE) strictly improves with finer block sizes. Ultimately, our findings highlight a critical insight for hardware-software co-design: the block-size paradox is partially an artifact of naive scale selection. While using hierarchical scaling factors or wider formats like UE5M3 interchangeably resolves much of the quality loss, we found the 4-over-6 scale selection heuristic can even further improve quality, especially for very small block sizes. Consequently, maximizing the performance of next-generation ML accelerators will require treating silicon format specifications and software scaling algorithms as tightly coupled design choices.

2606.02608 2026-06-10 cs.LG Updated version

Pruning Deep Neural Networks via the Marchenko--Pastur Distribution

Leonid Berlyand, Theo Bourdais, Houman Owhadi, Yitzchak Shmalo

Affiliations: Department of Mathematics, Pennsylvania State University; Department of Computing and Mathematical Sciences, California Institute of Technology

AI summary: Proposes a pruning method based on Marchenko-Pastur random-matrix theory that retains accuracy under very short fine-tuning budgets, and validates sparse-execution speedups for several architectures on ImageNet-1k.

Abstract

We study a Marchenko--Pastur (MP) random-matrix approach to pruning deep neural networks with very small post-pruning fine-tuning budgets. The main practical contribution is accuracy retention under short calibration and fine-tuning schedules, rather than a long post-pruning reoptimization pipeline. The theory gives deterministic data-path certificates: if the removed component $R$ has small propagated logit effect $L_s \| R ψ_1(s) \|_\infty$, pruning decreases an elastic-net objective and preserves samples whose dense margin exceeds twice the perturbation. The zero-budget case gives perfect pruning; a prune--restore extension models weight restoration inside a fixed sparse-execution pattern; and an additive $L_2$-regularized model shows admissible random-like components vanish at the training limit, with persistent spikes stabilizing as the MP bulk collapses. Under iid-Gaussian sufficient conditions, the fitted MP edge $σ_+$ gives a high-probability layerwise budget signal. On ImageNet-1k, after only three distillation epochs, ViT-B/16 $2{:}4{+}$ToMe reaches $83.41\%$ top-1 ($-1.70$ pp from dense) at $59.81\%$ sparse-execution MAC reduction, with $1.388\times$ best-observed A40 native-$2{:}4$ backend speedup for the same checkpoint and ToMe graph; a separate no-ToMe A100 endpoint gives $2.705\times$. At structured sparsity, ViT-B/16 $6{:}12$ reaches $83.74\%$, ViT-L/16 $8{:}16$ dense+permutation reaches $85.33\%$ ($-0.51$ pp), and ConvNeXtV2-Base $12{:}16$ reaches $86.35\%$ ($-0.37$ pp). For CNNs, ResNet50 $8{:}16$ dense+permutation reaches $75.87\%$ ($-0.26$ pp), and ResNet152d CAST-conv+permutation reaches $81.33\%$ ($-1.53$ pp) at ${\sim}50\%$ MAC accounting with a $1.62\times$ A40 im2col$+2{:}4$ sparse-GEMM audit.

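2606.06021 2026-06-10 cs.LG cs.AI Updated version

Scaling Neural Network Verification with Tensor Parallelism and Fully Sharded Data Parallelism

Sergei Vorobyov, Eugene Ilyushin

Affiliations: Lomonosov Moscow State University; Central University

AI summary: To address the GPU memory bottleneck in formal neural network verification, adapts Tensor Parallelism (TP) and Fully Sharded Data Parallelism (FSDP) to the auto_LiRPA / α,β-CROWN framework; TP gives about a 2x peak-memory reduction but looser bounds, while FSDP cuts baseline memory by 80-90% with bounds bitwise identical to the single-GPU baseline and supports complete verification and convolutional layers.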

Abstract

Formal neural network verification -- proving that a network satisfies safety properties for *all* inputs in a specified domain -- is bounded in practice by GPU memory: standard implementations of bound-propagation algorithms (IBP, CROWN, $α$-CROWN) require weight and relaxation-coefficient matrices to reside entirely on one accelerator. We adapt two parallelism techniques originally developed for large-scale model training to the auto_LiRPA / $α,β$-CROWN verification framework. Tensor Parallelism (TP) shards both weight and $A$-matrices across GPUs, achieving ${\approx}2\times$ peak-memory reduction at $P{=}2$; soundness is confirmed on VNN-COMP 2022 MNIST-FC benchmarks, though bound tightness degrades with the number of sharded zones due to forced IBP substitution for intermediate bounds inside sharded zones. Fully Sharded Data Parallelism (FSDP) shards only weight matrices with a per-layer AllGather, producing bounds that are bitwise identical to the single-GPU baseline: baseline memory drops by 80--90%, peak memory by 34--39% on wide MLPs. FSDP integrates cleanly with complete verification ($β$-CROWN + Branch-and-Bound) and with convolutional layers (BoundConv); a complete unsat result is obtained for CIFAR-100 ResNet-large (VNN-COMP 2024) under FSDP. Across all experiments the memory bottleneck in $α$-CROWN+BaB mode proves to be per-neuron alpha tensors, not weight matrices, pointing to the key direction for future work.

2506.14753 2026-06-10 cs.CV cs.LG Updated version

Cost-Aware Routing for Efficient Text-To-Image Generation

Qinchan Li, Kenneth Chen, Changyue Su, Wittawat Jitkrittum, Qi Sun, Patsorn Sangkloy

Affiliations: Tandon School of Engineering, New York University; Google; Eigen 4D Inc.

AI summary: Proposes a cost-aware routing framework that automatically selects a number of denoising steps or a model according to prompt complexity, lowering compute cost while maintaining high quality and outperforming any single model.

Comments Accepted by TMLR

Abstract

Diffusion models are well known for their ability to generate a high-fidelity image for an input prompt through an iterative denoising process. Unfortunately, the high fidelity also comes at a high computational cost due to the inherently sequential generative process. In this work, we seek to optimally balance quality and computational cost, and propose a framework to allow the amount of computation to vary for each prompt, depending on its complexity. Each prompt is automatically routed to the most appropriate text-to-image generation function, which may correspond to a distinct number of denoising steps of a diffusion model, or a disparate, independent text-to-image model. Unlike uniform cost reduction techniques (e.g., distillation, model quantization), our approach achieves the optimal trade-off by learning to reserve expensive choices (e.g., 100+ denoising steps) only for a few complex prompts, and employ more economical choices (e.g., small distilled model) for less sophisticated prompts. We empirically demonstrate on COCO and DiffusionDB that by learning to route to nine already-trained text-to-image models, our approach is able to deliver an average quality that is higher than that achievable by any of these models alone. Code is available at https://github.com/winglicopy/CATImage.
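
One simple way to picture per-prompt routing is as a penalized argmax over candidate generators. The sketch below is an assumption for illustration, not the rule used in the paper: it supposes a quality predictor has already produced one score per candidate for the prompt (`pred_quality`), that each candidate has a known cost (`costs`), and that a trade-off weight `lam` converts cost into quality units. The function name `route` is hypothetical.

```python
def route(pred_quality, costs, lam):
    """Return the index of the candidate maximizing predicted quality
    minus lam times cost. All inputs are hypothetical per-prompt values."""
    scores = [q - lam * c for q, c in zip(pred_quality, costs)]
    return max(range(len(scores)), key=scores.__getitem__)


if __name__ == "__main__":
    quality = [0.70, 0.80, 0.82]  # e.g. few steps, many steps, large model
    cost = [1.0, 4.0, 20.0]       # relative compute per image
    for lam in (0.0, 0.01, 0.1):
        print(lam, route(quality, cost, lam))
```

Sweeping `lam` traces out a quality-versus-cost curve: at zero the most accurate candidate always wins, and as the weight grows cheaper candidates are chosen for prompts where the predicted quality gap is small.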

2605.18271 2026-06-10 cs.CL cs.AI cs.IR cs.LG (updated version)

From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

从体积到价值：面向设备端RAG的偏好对齐记忆构建

Changmin Lee, Jaemin Kim, Taesik Gong

Affiliations: Department of Computer Science and Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea

AI summary: Proposes EPIC, which treats user preferences as a compact, stable form of personal context and integrates them throughout the RAG pipeline, improving how well retrieval aligns with user preferences under a limited memory budget while reducing memory use and raising accuracy.

Comments: Accepted to ICML 2026. Code and data are available at https://github.com/UbiquitousAILab/EPIC

AI中文摘要

随着基于大型语言模型（LLMs）的个人AI代理的迅速发展，将其部署到设备上已成为隐私和响应性的重要需求。为了处理现实世界请求中固有的个人和上下文依赖性，这些代理必须基于设备上存储的个人上下文进行生成。然而，在内存预算紧张的情况下，核心瓶颈是存储什么内容以确保检索与用户保持一致。我们提出EPIC（高效偏好对齐索引构建），专注于用户偏好作为紧凑且稳定的个人上下文形式，并在整个RAG流程中整合它们。EPIC会从原始数据中选择性地保留与偏好相关的信息，并将检索对准偏好对齐的上下文。在四个涵盖对话、辩论、解释和推荐的基准测试中，EPIC将索引内存减少了2,404倍，将偏好遵循的准确性提高了18.79个百分点，并相比最佳表现基线实现了32.17倍更低的检索延迟。在设备端实验中，EPIC保持内存占用低于1 MB，在三个平台上实现5.21至29.35毫秒/查询的延迟，同时支持偏好漂移下的流式更新。我们的代码和数据可在 https://github.com/UbiquitousAILab/EPIC 获取。

英文摘要

With the rapid emergence of personal AI agents based on Large Language Models (LLMs), implementing them on-device has become essential for privacy and responsiveness. To handle the inherently personal and context-dependent nature of real-world requests, such agents must ground their generation in device-resident personal context. However, under tight memory budgets, the core bottleneck is what to store so that retrieval remains aligned with the user. We propose EPIC (Efficient Preference-aligned Index Construction), which focuses on user preferences as a compact and stable form of personal context and integrates them throughout the RAG pipeline. EPIC selectively retains preference-relevant information from raw data and aligns retrieval toward preference-aligned contexts. Across four benchmarks covering conversations, debates, explanations, and recommendations, EPIC reduces indexing memory by 2,404 times, improves preference-following accuracy by 18.79 %p, and achieves 32.17 times lower retrieval latency over the best-performing baseline. In on-device experiments, EPIC maintains under 1 MB memory and achieves 5.21 to 29.35 ms/query latency across three platforms, while supporting streaming updates under preference drift. Our code and data are available at https://github.com/UbiquitousAILab/EPIC.

2606.09868 2026-06-10 cs.LG cs.AI (new submission)

SPACE: Source-free Proxy Anchor Concept Erasure for MLLMs

SPACE: 面向多模态大语言模型的无源代理锚点概念擦除

Zhijing Zhang, Jiaqi Ding, Qianshan Wei, Nan Zhou, Jiaqi Li, Yongliang Wu, Tongxin Zhu, Xiaolin Fang

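Affiliations: University of Science and Technology of China

AI summary: Proposes SPACE, the first source-free machine unlearning framework for multimodal large language models, which erases concepts without access to the target data through text-guided proxy anchor selection and dual-constraint semantic isolation while preserving model performance.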

AI中文摘要

随着多模态大语言模型（MLLMs）面临日益增长的隐私风险和监管约束，机器遗忘（MU）已成为在保持模型性能的同时移除敏感数据的关键解决方案。然而，现有的MU方法通常依赖于目标概念的视觉数据，由于严格的数据保留政策，这些数据往往不可用，因此需要一种无需访问目标数据的无源遗忘方法。在这项工作中，我们提出了无源代理锚点概念擦除（SPACE），这是首个专门针对MLLMs的无源遗忘框架。SPACE包含两个阶段：（1）文本引导的代理锚点选择（TPAS），从共享特征空间中检索语义对齐的代理锚点；（2）双约束语义隔离（DCSI），优化这些锚点以间接擦除目标概念。DCSI将更新限制在保留知识的零空间中，确保结构完整性。我们从理论上证明，SPACE严格限制了保留知识的扰动，并最大化特征谱熵，从而保持模型性能。此外，在六个数据集上的大量实验表明，SPACE实现了与最先进的数据依赖方法相当的性能，验证了其在无源MU场景中的有效性。源代码将发布。

英文摘要

As Multimodal Large Language Models (MLLMs) face growing privacy risks and regulatory constraints, machine unlearning (MU) has emerged as a crucial solution for removing sensitive data while preserving model performance. However, existing MU methods typically rely on visual data of the target concepts, which is often unavailable due to strict data retention policies, thus creating a demand for source-free unlearning approaches that operate without access to the target data. In this work, we propose Source-free Proxy Anchor Concept Erasure (SPACE), the first source-free unlearning framework specialized for MLLMs. SPACE consists of two stages: (1) Text-Guided Proxy Anchor Selection (TPAS), which retrieves semantically aligned proxy anchors from the shared feature space. (2) Dual-Constraint Semantic Isolation (DCSI), which optimizes these anchors to indirectly erase target concepts. DCSI confines updates to the null space of retained knowledge, ensuring structural integrity. We theoretically prove that SPACE strictly bounds the perturbation on retained knowledge and maximizes feature spectral entropy, thereby maintaining the model's performance. Furthermore, extensive experiments across six datasets show that SPACE achieves performance comparable to that of state-of-the-art data-dependent methods, validating its effectiveness in source-free MU scenarios. The source code will be released.

2606.09869 2026-06-10 cs.LG cs.AI cs.CR (new submission)

QSplitFL: Capability Aware Deep Q-Learning for Optimal Split Point Selection in Split Federated Learning

QSplitFL: 基于能力感知的深度Q学习在分割联邦学习中的最优分割点选择

Nazmus Shakib Shadin, Xinyue Zhang, Jingyi Wang, Miao Pan

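AI summary: Proposes QSplitFL, a framework that uses a deep Q-network to select the optimal split point dynamically from client hardware metrics (CPU, memory, battery, network latency), addressing split federated learning on heterogeneous devices; a decayed loss-drop reward function and a committee voting mechanism improve convergence speed and accuracy.

Comments: Accepted by ECML-PKDD 2026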

AI中文摘要

联邦学习（FL）与分割学习（SL）结合是一种隐私保护范式，能够在资源受限设备上训练深度神经网络（DNN），同时降低整体训练成本。然而，确定最优分割点（即模型被分割的层）仍然是一个关键挑战，尤其是当客户端具有异构硬件能力时。固定分割点可能使弱设备过载，增加通信和服务器负载，从而减慢收敛速度并降低稳定性。本文介绍了QSplitFL，一种新颖的基于能力感知的深度Q网络（DQN）框架，用于在基于分割学习的联邦学习（SFL）环境中选择最优分割点。与依赖高维模型权重表示的现有方法不同，QSplitFL采用直接从客户端硬件指标（包括CPU利用率、内存、电池电量和网络延迟）导出的轻量级状态表示。所提出的框架包含一个衰减损失下降奖励函数，优先考虑早期收敛，以及一个基于委员会的DQN架构，通过多数投票来减轻奖励黑客攻击。在MNIST、Fashion-MNIST、CIFAR-10和CIFAR-100数据集上，使用CNN、ResNet50、MobileNetV4和ConvNeXt架构进行的广泛实验表明，我们的方法在收敛速度和精度上优于现有方法，同时有效适应异构设备资源。源代码已在 https://github.com/AIPO-Lab/QSplitFL 公开。

英文摘要

Federated Learning (FL) combined with Split Learning (SL) is a privacy-preserving paradigm that enables training deep neural networks (DNNs) on resource-constrained devices while reducing overall training cost. However, determining the optimal split point, meaning the layer where the model is divided, remains a critical challenge, especially when clients have heterogeneous hardware capabilities. Fixed split points can overload weak devices and increase the communication and server load, which slows convergence and reduces stability. This paper introduces QSplitFL, a novel capability-aware Deep Q-Network (DQN) framework for optimal split point selection in Split-learning-based Federated Learning (SFL) environments. Unlike existing approaches that rely on high-dimensional model weight representations, QSplitFL employs a lightweight state representation derived directly from client hardware metrics, including CPU utilization, memory, battery level, and network latency. The proposed framework incorporates a decayed loss-drop reward function that prioritizes early convergence, and a committee-based DQN architecture with majority voting to mitigate reward hacking. Extensive experiments on MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets using CNN, ResNet50, MobileNetV4, and ConvNeXt architectures demonstrate that our approach achieves better convergence and higher accuracy compared to existing methods, while effectively adapting to heterogeneous device resources. The source code is publicly available at https://github.com/AIPO-Lab/QSplitFL.

2606.10124 2026-06-10 cs.LG cs.AI (new submission)

FedSteer: Taming Extreme Gradient Staleness in Federated Learning with Corrective Projections and Caching

FedSteer: 通过校正投影和缓存驯服联邦学习中的极端梯度陈旧性

Haoran Zhang, Cainã Figueiredo Pereira, Marie Siew, Xutong Liu, Carlee Joe-Wong, Rachid El-Azouzi

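AI summary: To address the gradient staleness caused by uneven client participation in federated learning, proposes FedSteer, which builds a subspace from a cache of client gradients and corrects stale gradients through projection and a caching strategy, markedly improving training stability and accuracy.

Comments: UAI 2026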

AI中文摘要

联邦学习（FL）在客户端不持续参与训练轮次时，常遭受聚合方差的影响。虽然重用非活跃客户端的陈旧模型更新是减少这种方差的常见技术，但我们发现，在客户端参与偏斜的情况下，由此产生的更新陈旧性可能变得严重到足以破坏训练稳定性。为了解决这个问题，我们提出了FedSteer，一种新颖的方法，该方法从最近客户端梯度的缓存中构建一个梯度子空间，作为当前优化景观的低维表示。FedSteer将活跃客户端的真实梯度投影到这个子空间上，以找到一组最优坐标。对于非活跃客户端，FedSteer重用这些坐标，并结合由其他活跃客户端漂移的已演化的子空间。这个过程有效地将过时的梯度“引导”向当前的全局目标。此外，还辅以选择性缓存策略，识别代表性客户端子集以形成子空间，从而减少服务器内存。实验表明，FedSteer显著优于基线，在挑战性场景中防止性能崩溃，并在其他场景中实现超过7%的精度提升。

英文摘要

Federated learning (FL) is often subject to aggregation variance if clients do not consistently participate in training rounds. While reusing stale model updates from inactive clients is a common technique to reduce this variance, we find that with skewed client participation, the resulting update staleness can become severe enough to destabilize training. To remedy this, we propose FedSteer, a novel method that constructs a gradient subspace from a cache of recent client gradients to serve as a low-dimensional representation of the current optimization landscape. FedSteer projects an active client's true gradient onto this subspace to find a set of optimal coordinates. For an inactive client, FedSteer reuses these coordinates with the now-evolved subspace drifted by other active clients. This process effectively "steers" outdated gradients toward the current global objective. This is complemented by a selective caching strategy that identifies a representative client subset to form the subspace, reducing server memory. Experiments demonstrate that FedSteer significantly outperforms baselines, preventing performance collapse in challenging scenarios while delivering accuracy gains of over 7% in others.
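The projection step described in this abstract can be pictured with a small sketch. It assumes (these are illustrative choices, not details confirmed by the abstract) that the cached gradients are orthonormalized with Gram-Schmidt, that an active client's coordinates are inner products with that basis, and that a stale client's update is rebuilt by applying its old coordinates to a newer basis. The helper names `orthonormalize`, `coordinates`, and `reconstruct` are hypothetical.

```python
import math


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def orthonormalize(vectors, tol=1e-12):
    """Gram-Schmidt: return an orthonormal basis for the span of `vectors`."""
    basis = []
    for v in vectors:
        w = [float(x) for x in v]
        for b in basis:
            coeff = dot(w, b)
            w = [wi - coeff * bi for wi, bi in zip(w, b)]
        norm = math.sqrt(dot(w, w))
        if norm > tol:  # skip vectors that are already in the span
            basis.append([wi / norm for wi in w])
    return basis


def coordinates(gradient, basis):
    """Coordinates of the orthogonal projection of `gradient` onto the basis."""
    return [dot(gradient, b) for b in basis]


def reconstruct(coords, basis):
    """Rebuild a full-dimensional vector from coordinates and a basis."""
    dim = len(basis[0])
    return [sum(c * b[i] for c, b in zip(coords, basis)) for i in range(dim)]
```

In this toy version, an active client would store `coordinates(gradient, old_basis)`, and when that client later goes inactive the server would call `reconstruct(coords, new_basis)` with a basis built from more recent cached gradients. How the actual method builds the subspace and solves for the coordinates is not stated in the abstract.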

2606.10217 2026-06-10 cs.LG cs.CR (new submission)

Alignment Defends LLMs from Property Inference Attacks

对齐防御LLM免受属性推断攻击

Pengrun Huang, Chhavi Yadav, Ruihan Wu, Kamalika Chaudhuri

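AI summary: Proposes alignment-based defenses that reshape the model's output distribution through post-training, mitigating property inference attacks without modifying the training data while balancing utility and confidentiality.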

AI中文摘要

大型语言模型（LLMs）越来越多地在包含敏感数据集级属性的领域特定数据集上进行微调。最近的研究表明，此类数据集级信息可以通过属性推断攻击有效提取，构成机密性风险。现有的防御措施主要通过修改训练数据分布来运作，因此需要访问原始数据并重新训练模型，限制了其在数据不可用或模型已部署场景中的适用性。在这项工作中，我们提出了基于对齐的防御方法，用于缓解LLMs中的属性推断攻击。我们的方法通过后训练对齐将模型的输出分布重塑为目标属性比率，而无需修改训练数据。具体而言，我们通过构建偏好对和定义特定奖励函数，分别适配两种广泛使用的RLHF框架——直接偏好优化（DPO）和组相对策略优化（GRPO）——作为我们的防御方法。通过全面的实验，我们展示了基于对齐的防御方法有效缓解了属性推断攻击，同时保持了良好的效用-机密性权衡。

英文摘要

Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets that may contain sensitive, dataset-level properties. Recent work has shown that such dataset-level information can be effectively extracted through property inference attacks, posing a confidentiality risk. Existing defenses against these attacks primarily operate by modifying the training data distribution and hence require access to the original data and retraining the model, limiting their applicability to settings where data is unavailable or models are already deployed. In this work, we propose alignment-based defenses for mitigating property inference attacks in LLMs. Our approach reshapes the model's output distribution towards a target property ratio via post-training alignment, without modifying the training data. In particular, we adapt two widely used RLHF frameworks--Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO)--as our defenses by constructing preference pairs and defining a specific reward function respectively. Through comprehensive experiments, we show that our alignment based defenses effectively mitigate property inference attacks while maintaining a strong utility confidentiality tradeoff.

2606.10250 2026-06-10 cs.LG cs.AI (new submission)

Multi-Level Analyzation of Imbalance to Resolve Non-IID-Ness in Federated Learning

联邦学习中不平衡的多层次分析以解决非独立同分布问题

Haengbok Chung, Jae Sung Lee

Affiliations: Interdisciplinary Program in Artificial Intelligence, Seoul National University, Republic of Korea; Department of Nuclear Medicine, Seoul National University College of Medicine, Republic of Korea; Brightonix Imaging Inc., Seoul, Republic of Korea

AI summary: Proposes the FedBB algorithm, which uses the PNB loss function and CBR reweighting to address, respectively, inter-case and inter-class imbalance in local training and inter-client imbalance, outperforming existing methods on X-ray and natural image datasets.

Comments: 27 pages, 5 figures, 13 tables. Accepted for publication in Neurocomputing (2025). Author Accepted Manuscript

DOI: 10.1016/j.neucom.2025.129528
Journal ref: Neurocomputing, Volume 626, 2025, Article 129528

AI中文摘要

类别不平衡是深度学习中常见的问题，会严重降低性能。在联邦学习（FL）中，它是导致非独立同分布数据（non-IID）的关键因素。基于先前的一些尝试，我们在三个层次上定义并分析了FL中的不平衡问题：案例间、类别间和客户端间。案例间不平衡处理每个单一类别内的不平衡；类别间不平衡比较不同类别之间的数据数量。客户端间不平衡表示不同客户端之间本地数据的偏斜程度。基于这些概念，我们提出了FedBB，它由两个主要部分组成：（1）正负平衡（PNB）损失函数解决了本地训练中的案例间和类别间不平衡，增强了高度偏斜的本地客户端数据集上的泛化能力。它通过为少数案例或类别分配更高的权重来优化多标签和多类分类。（2）客户端平衡重加权（CBR）在模型聚合期间根据客户端间不平衡重新加权客户端，为在偏斜较小的数据集上训练的模型赋予更大的权重。在X射线和自然图像数据集上的各种实验表明，FedBB在性能和效率上均优于其他算法。此外，它只需要有限的统计信息，这有利于隐私保护。通过消融研究，我们证明了PNB损失和CBR独立地贡献于性能。由于FedBB旨在构建一个能准确分类所有类别的全局模型，它可以作为通用和个性化FL的基线。

英文摘要

Class imbalance is a common problem in deep learning that severely degrades performance. In federated learning (FL), it is a critical factor contributing to non-identically distributed data (non-IID). Building on several previous attempts, we define and analyze imbalance issues in FL at three levels: inter-case, inter-class, and inter-client. Inter-case imbalance addresses the imbalance in every single class; inter-class imbalance compares the number of data between different classes. Inter-client imbalance represents different skewness of local data between clients. Based on these concepts, we propose FedBB, which consists of two main components: (1) Positive Negative Balanced (PNB) loss function addresses the inter-case and inter-class imbalances in local training, enhancing generalization on highly skewed local client datasets. It optimizes both multi-label and multi-class classifications by assigning higher weights to minority cases or classes. (2) Client Balanced Reweighting (CBR) reweights clients based on inter-client imbalance during model aggregation, giving greater weight to models trained on less skewed datasets. Various experiments on X-ray and natural image datasets demonstrate that FedBB outperforms other algorithms in both performance and efficiency. Additionally, it requires limited statistical information, which is beneficial for privacy protection. Through ablation studies, we proved that PNB loss and CBR independently contribute to performance. As FedBB aims to build a global model that accurately classifies all classes, it can serve as a baseline for the generic and personalized FL.

2606.10333 2026-06-10 cs.LG cs.CR (new submission)

Privacy-Preserving Credit Risk Prediction with Alternative Data

基于替代数据的隐私保护信用风险预测

Hongzhe Zhang, Jiarong Xu, Jing He, Xiao Fang

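Affiliations: School of Management and Economics, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen); School of Management, Fudan University; Lerner College of Business and Economics, University of Delaware

AI summary: To address the privacy leakage caused by sharing alternative data for credit risk prediction, proposes PrivacyCredit, which satisfies consumer-privacy, model-confidentiality, and lossless-performance constraints while matching the predictive performance of a model trained on the plaintext combination of traditional and alternative data.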

AI中文摘要

信用风险预测是消费信贷行业中的一个关键问题。传统上，金融机构使用借款人的人口统计、财务和信用历史数据（统称为传统数据）构建信用风险预测模型。最近的研究表明，替代数据（如借款人的手机通信数据）使贷款人能够获得更全面、更准确的借款人信用状况画像，从而提高信用风险预测性能。然而，替代数据由独立于金融机构的外部实体持有。直接与金融机构共享替代数据会侵犯消费者隐私，但现有的信用风险预测研究大多忽略了这一问题。为填补这一空白，我们定义了一个新问题，即基于替代数据的隐私保护信用风险预测，该问题同时考虑三个实际约束：保护消费者隐私的隐私保护约束、在金融机构集中学习和存储模型的模型保密性约束，以及保持学习模型性能的无损约束。为解决该问题，我们开发了PrivacyCredit，一种新颖的隐私保护机器学习方法。然后，我们从理论上证明了PrivacyCredit的隐私保护、模型保密和无损特性。通过使用与替代数据关联的真实信用数据集进行大量实验，我们证明了安全地将替代数据纳入信用风险预测的预测价值，并表明PrivacyCredit实现了与从传统数据和替代数据的不安全明文组合中学习的模型相同的预测性能。我们进一步评估了其模型保密性和计算效率。

英文摘要

Credit risk prediction is a critical problem in the consumer credit industry. Traditionally, financial institutions construct credit risk prediction models using borrowers' demographic, financial, and credit history data, collectively referred to as traditional data. Recent studies have demonstrated that alternative data, such as borrowers' mobile phone communication data, enable lenders to acquire fuller and more accurate profiles of borrowers' creditworthiness, thereby improving credit risk prediction performance. Nevertheless, alternative data are held by external entities independent of financial institutions. Directly sharing alternative data with financial institutions infringes on consumer privacy, yet existing credit risk prediction studies largely overlook this issue. To address this gap, we define a new problem, namely privacy-preserving credit risk prediction with alternative data, which simultaneously considers three practical constraints: the privacy-preserving constraint that protects consumer privacy, the model-confidentiality constraint that learns and stores the model centrally at the financial institution, and the lossless constraint that maintains the performance of the learned model. To solve this problem, we develop PrivacyCredit, a novel privacy-preserving machine learning method. We then theoretically demonstrate the privacy-preserving, model-confidential, and lossless properties of PrivacyCredit. Through extensive experiments using a real-world credit dataset linked with alternative data, we demonstrate the predictive value of securely incorporating alternative data into credit risk prediction and show that PrivacyCredit achieves the same predictive performance as the model learned from the insecure plaintext combination of traditional and alternative data. We further evaluate its model-confidentiality property and computational efficiency.

2606.10481 2026-06-10 cs.LG cs.AI cs.CL cs.CR stat.ML (new submission)

Advancing the State-of-the-Art in Empirical Privacy Auditing

推进经验隐私审计的最新水平

Nicole Mitchell, Galen Andrew, Arun Ganesh, Brendan McMahan, Peter Kairouz

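Affiliations: Google Research

AI summary: Proposes generating synthetic canaries via high-temperature sampling for empirical privacy auditing, introduces a synthetic-data audit based on an auxiliary model, and systematically studies how model capacity and canary entropy interact to affect memorization.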

AI中文摘要

大型语言模型的参数高效微调可能表现出对个别训练示例的问题性记忆。经验隐私审计（EPA）通过测量成员推断（MI）或重构攻击上的实际数据泄露来量化这种风险。EPA的一个关键挑战是设计与隐私敏感训练数据混合的“金丝雀”示例。我们提出通过从LLM中进行高温采样（$T \geq 0.8$）生成合成金丝雀，使用针对隐私敏感训练数据定制的提示。这些金丝雀作为高影响异常值，确保高可识别性，从而实现强审计。此外，由于金丝雀本身是非私有的，它们是可检查的，并且可以重复插入，而不会危及真实数据的隐私。在隐私敏感数据上微调的模型的一个重要用途是生成合成数据。这也带来了隐私风险。我们引入了一种强大的合成数据审计方法，基于在合成数据上微调辅助模型。然后，对原始金丝雀的辅助模型进行审计，可以强有力地估计通过合成数据的隐私泄露。最后，利用我们强大的审计方法，我们系统研究了模型容量和金丝雀熵对记忆化的交互影响。

英文摘要

Parameter-efficient fine-tuning of large language models (LLMs) can exhibit problematic memorization of individual training examples. Empirical privacy auditing (EPA) quantifies this risk by measuring realistic data leakage on membership inference (MI) or reconstruction attacks. A key challenge in EPA is designing "canary" examples that are mixed with the privacy-sensitive training data. We propose generating synthetic canaries via high-temperature sampling ($T \geq 0.8$) from LLMs, using prompts tailored to the privacy-sensitive training data. These canaries act as high-influence outliers, ensuring high identifiability and hence strong audits. Further, since the canaries are themselves non-private, they are inspectable and can be inserted with repetition without jeopardizing the privacy of the real data. An important use of models fine-tuned on privacy-sensitive data is the generation of synthetic data. This also comes with privacy risk. We introduce a powerful synthetic data audit based on fine-tuning an auxiliary model on the synthetic data. Auditing the auxiliary model for the original canaries then provides a strong estimate of the privacy leakage through the synthetic data. Finally, leveraging our strong auditing methodologies, we perform a systematic investigation into the interacting effects of model capacity and canary entropy on memorization.

2606.10499 2026-06-10 cs.LG cs.AI (new submission)

MoE Enhanced Federated Learning for Spatiotemporal Prediction

基于混合专家模型增强的联邦学习用于时空预测

Zhehao Dai, Xiao Han, Zhaolin Deng, Zijian Zhang, Xiangyu Zhao, Guojiang Shen, Xiangjie Kong

Affiliations: Zhejiang University of Technology, Zhejiang Key Laboratory of Visual Information Intelligent Processing; Jilin University; City University of Hong Kong

AI summary: Proposes the MoE-FedTP framework, which uses a lightweight mixture-of-experts network and a gating mechanism to enable privacy-preserving cross-city spatiotemporal prediction, effectively mitigating data scarcity and heterogeneity.

范围惩罚：理论洞见及其在联邦学习中的应用

Yiyuan She, Zhaojun Hu, Yifan Sun

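Affiliations: University of California, Berkeley

AI summary: Proposes a range regularization method that achieves cross-client regularization through extreme-value clustering, and develops new proof techniques for non-asymptotic statistical accuracy and pattern recovery, along with a fast optimization algorithm that exploits local strong convexity.

2409.04111 2026-06-10 cs.LG (updated version)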

Active-Passive Federated Learning for Vertically Partitioned Multi-view Data

面向垂直分区多视角数据的主动-被动联邦学习

Jiyuan Liu, Siqi Wang, Xinhang Wan, Yi Zhang, Junsong Chen, Xin Lu, Xinwang Liu

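Affiliations: National University of Defense Technology

AI summary: Proposes an active-passive federated learning framework in which the active client builds the complete model and can run inference on its own while passive clients only assist training, addressing unreliable client collaboration at inference time; the framework is instantiated as two classification methods using reconstruction and contrastive losses, whose effectiveness is validated experimentally.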

AI中文摘要

垂直联邦学习是一种自然且优雅的方法，用于集成跨设备（客户端）垂直分区的多视角数据，同时保护其隐私。除了模型训练，现有方法在模型推理时需要所有客户端的协作。然而，模型推理可能长期维持服务，而协作（尤其是当客户端属于不同组织时）在现实场景中不可预测，例如合同取消、网络不可用等，导致推理失败。为了解决这个问题，我们首次尝试提出了一种灵活的主动-被动联邦学习（APFed）框架。具体来说，主动客户端是学习任务的发起者，负责构建完整模型，而被动客户端仅作为辅助。一旦模型构建完成，主动客户端可以独立进行推理。此外，我们将APFed框架实例化为两种分类方法，分别在被动客户端上采用重构损失和对比损失。同时，这两种方法在一系列实验中进行了测试，并取得了理想的结果，验证了它们的有效性。

英文摘要

Vertical federated learning is a natural and elegant approach to integrating multi-view data vertically partitioned across devices (clients) while preserving their privacy. Apart from model training, existing methods require the collaboration of all clients during model inference. However, model inference is likely to be maintained as a service for a long time, while collaboration, especially when the clients belong to different organizations, is unpredictable in real-world scenarios (for example, cancellation of contracts or network unavailability), resulting in inference failure. To address this issue, we make a first attempt and propose a flexible Active-Passive Federated learning (APFed) framework. Specifically, the active client is the initiator of a learning task and is responsible for building the complete model, while the passive clients only serve as assistants. Once the model is built, the active client can perform inference independently. In addition, we instantiate the APFed framework as two classification methods that employ the reconstruction loss and the contrastive loss on passive clients, respectively. The two methods are tested in a set of experiments and achieve the desired results, validating their effectiveness.

2604.24012 2026-06-10 cs.LG math.OC (updated version)

FedSLoP: Memory-Efficient Federated Learning with Low-Rank Gradient Projection

FedSLoP: 基于低秩梯度投影的内存高效联邦学习

Yutong He, Zhengyang Huang, Jiahe Geng, Kun Yuan

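Affiliations: Peking University; Beihang University

AI summary: Proposes FedSLoP, which uses stochastic low-rank subspace projection to cut communication and storage overhead, proves convergence to a first-order stationary point at a rate of $O(1/\sqrt{NT})$, and reports accuracy competitive with or better than FedAvg and other baselines on heterogeneous MNIST.

Comments: 27 pages, 7 figures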

AI中文摘要

联邦学习使一组客户端能够在不交换原始数据的情况下协作训练机器学习模型，但标准算法如FedAvg在异构、资源受限的环境中收敛缓慢且通信和内存成本高。我们提出FedSLoP，一种联邦优化算法，它结合了梯度的随机低秩子空间投影，从而降低了通信和存储更新的维度，同时保持了优化进度。在理论方面，我们在标准光滑和有界方差假设下进行了详细的非凸收敛分析，表明FedSLoP保证以$O(1/\sqrt{NT})$的速率收敛到一阶稳定点。在实证方面，我们在具有异构数据分区的联邦MNIST分类上进行了大量实验，表明与FedAvg以及代表性的稀疏或低秩基线相比，FedSLoP显著减少了通信量和客户端内存，同时实现了具有竞争力或更好的准确率。总之，我们的结果表明，诸如FedSLoP之类的随机子空间动量方法为通信和内存高效的联邦学习提供了一种原则性和有效的方法。代码可在 https://github.com/pkumelon/FedSLoP.git 获取。

英文摘要

Federated learning enables a population of clients to collaboratively train machine learning models without exchanging their raw data, but standard algorithms such as FedAvg suffer from slow convergence and high communication and memory costs in heterogeneous, resource-constrained environments. We introduce FedSLoP, a federated optimization algorithm that combines stochastic low-rank subspace projections of gradients, thereby reducing the dimension of communicated and stored updates while preserving optimization progress. On the theoretical side, we develop a detailed nonconvex convergence analysis under standard smoothness and bounded-variance assumptions, showing that FedSLoP is guaranteed to converge to a first-order stationary point at a rate of $O(1/\sqrt{NT})$. On the empirical side, we conduct extensive experiments on federated MNIST classification with heterogeneous data partitions, showing that FedSLoP substantially reduces communication volume and client-side memory while achieving competitive or better accuracy compared with FedAvg and representative sparse or low-rank baselines. Together, our results demonstrate that random subspace momentum methods such as FedSLoP provide a principled and effective approach to communication- and memory-efficient federated learning. Codes are available at: https://github.com/pkumelon/FedSLoP.git.

2606.09850 2026-06-10 cs.LG cs.CL (new submission)

双人探戈：面向安全LLM微调的耦合任务-参考选择

Xinrui Chen, Jianhao Zhang, Ou Wu, Di Gao

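AI summary: Proposes DualSelect, a framework that couples task and safety-reference selection to preserve safety alignment during fine-tuning, improving the safety score by at least 5.10 points over the strongest baseline.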

AI中文摘要

在下游数据上微调安全对齐的大型语言模型（LLMs）可以提高适应性，但可能会侵蚀已学习的安全行为。现有方法使用固定的安全示例、全局约束或单边任务过滤。我们的诊断表明，任务更新暴露了不同的安全约束，从而激发了联合选择相关参考和兼容任务样本的需求。我们提出DualSelect，一个耦合的任务和参考选择框架，它在过滤与诱导参考方向兼容的整个任务样本之前，刷新任务条件化的安全参考。在极小极大视角下，DualSelect通过熵正则化评分代理、惰性参考刷新和梯度校正，选择具有高保留损失和任务冲突的安全参考以及兼容的任务样本。在1B-8B LLMs上，DualSelect在不损失任务效用的情况下保持安全性；使用REDORCA评估器，它在安全平均值上比最强基线至少提高5.10分，并且在所有评估器中保持最高的安全平均值，且开销适中。这一观点扩展到以保留为中心的持续学习。

英文摘要

Fine-tuning safety aligned large language models (LLMs) on downstream data improves adaptation but may erode learned safety behavior. Existing methods use fixed safety examples, global constraints, or one-sided task filtering. Our diagnostics show task updates expose different safety constraints, motivating joint selection of relevant references and compatible task samples. We propose DualSelect, a coupled framework for task and reference selection that refreshes task conditioned safety references before filtering whole task samples compatible with the induced reference direction. Under a minimax view, DualSelect selects safety references with high preservation loss and task conflict, together with compatible task samples, through entropy-regularized scoring surrogates, lazy reference refresh, and gradient correction. On 1B-8B LLMs, DualSelect preserves safety without losing task utility; using the REDORCA judge, it improves Safety Avg. over the strongest baseline by at least 5.10 points and remains highest in Safety Avg. across judges with moderate overhead. This view extends to retention focused continual learning.

2606.09875 2026-06-10 cs.LG cs.AI stat.ML (new submission)

Integrating Local and Global Entropy for Uncertainty Quantification in LLMs

集成局部和全局熵用于大语言模型的不确定性量化

Johanne Medina, Tianyi Zhou, Keivin Isufaj, Aristides Gionis, Sanjay Chawla

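AI summary: Proposes GLU, which quantifies LLM uncertainty by fusing hidden-state geometric entropy (global) with token-level entropy (local), capturing the confident-but-wrong failure mode without additional training.

Comments: 17 pages, 2 figures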

AI中文摘要

大语言模型会自信地产生幻觉，使得不确定性量化（UQ）对于可靠部署至关重要。现有方法主要依赖token级信号，而中间隐藏状态的几何结构未被充分利用。在本文中，我们将隐藏状态矩阵的几何复杂度作为LLM全局不确定性的度量，同时将token级不确定性估计视为局部度量。我们表明，隐藏状态几何熵（全局不确定性）和token级熵（局部不确定性）在统计上近似正交，捕捉了可靠性预测的不同失败模式。特别地，全局几何恢复了局部信号系统性遗漏的自信但错误的失败模式。基于此，我们提出了全局-局部不确定性（GLU），这是一种无监督、单次前向传播的分数，通过乘法门融合两种信号。在三个模型族和六个基准测试中，GLU匹配或优于所有无监督基线，同时仅需一次前向传播，且保持长度归一化和架构无关性。

英文摘要

Large language models hallucinate confidently, making uncertainty quantification (UQ) essential for reliable deployment. Existing methods rely predominantly on token-level signals, leaving the geometric structure of intermediate hidden states underused. In this paper, we take the geometric complexity of hidden-state matrices as a measure of the global uncertainty of LLMs, while treating token-level uncertainty estimation as a local metric. We show that hidden-state geometric entropy (global uncertainty) and token-level entropy (local uncertainty) are statistically near-orthogonal, capturing distinct failure regimes for reliability prediction. In particular, global geometry recovers the confident-but-wrong failure mode that local signals systematically miss. Building on this, we propose Global-Local Uncertainty (GLU), an unsupervised, single-pass score that fuses the two signals via a multiplicative gate. Across three model families and six benchmarks, GLU matches or outperforms all unsupervised baselines while requiring only a single forward pass and remaining length-normalized and architecture-agnostic.

2606.09876 2026-06-10 cs.LG (new submission)

Calibrating Overconfidence Without Sacrificing Confidence: Probe-Conditioned Head Intervention for LLMs

校准过度自信而不牺牲自信：基于探针条件的头部干预方法用于LLMs

Ke Li, Chongzhe Zhang, Zifan Zeng, Feng Liu, Qunli Zhang, Zheng Hu

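Affiliations: Huawei Heisenberg Research Center; EPFL; TU Berlin; TUM

AI summary: Proposes PCHI, which uses a frozen probe to detect likely wrong-but-confident responses and conditionally rescales attention-head outputs, reducing overconfidence while preserving warranted confidence and lowering ECE from 21.9% to 9.2%.

Comments: 11 pages, 4 figures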

AI中文摘要

大型语言模型常常对错误答案表现出高置信度。标准的校准方法通常全局或分数级别操作，减少无根据的自信，但也冒着侵蚀正确答案上有根据的自信的风险。我们引入了基于探针条件的头部干预（PCHI），一种推理时方法，使用冻结探针检测可能错误但自信的响应，并在置信度生成期间条件性地重新缩放下游注意力头的输出。在Qwen3-4B-Instruct解决OpenMathInstruct问题（具有结构化二进制置信度字段）上，读取令牌PCHI将82.2%的原始错误-是置信度读数转换为$\texttt{no}$，而跨上游置信度模板令牌的联合干预将ECE从21.9%降至9.2%，并且仅损坏5.1%的原始正确-是读数。读取令牌效应也出现在Gemma3-4B上，尽管上游干预较弱且更依赖于掩码。这些结果表明，通过条件性应用的内部干预，可以选择性地减少口头表达的过度自信，部分解耦抑制无根据自信与损失有根据自信之间的关系。

英文摘要

Large language models often express high confidence in answers that are wrong. Standard calibration remedies typically act globally or at the score level, reducing unwarranted confidence but also risking erosion of warranted confidence on correct answers. We introduce Probe-Conditioned Head Intervention (PCHI), an inference-time method that uses a frozen probe to detect likely wrong-but-confident responses and conditionally rescales downstream attention-head outputs during confidence generation. On Qwen3-4B-Instruct solving OpenMathInstruct problems with a structured binary confidence field, readout-token PCHI converts 82.2% of originally wrong-yes confidence readouts to $\texttt{no}$, while a joint intervention across upstream confidence-template tokens reduces ECE from 21.9% to 9.2% and damages only 5.1% of originally correct-yes readouts. The readout-token effect also appears on Gemma3-4B, though upstream interventions are weaker and more mask-dependent. These results show that verbalized overconfidence can be selectively reduced through conditionally applied internal intervention, partially decoupling the suppression of unwarranted confidence from the loss of warranted confidence.
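For reference, ECE (the metric this abstract reports falling from 21.9% to 9.2%) is commonly computed by binning predictions by confidence and averaging the gap between mean confidence and accuracy in each bin, weighted by bin size. The sketch below is the standard equal-width-bin version; the bin count and the function name are assumptions, and the paper's exact evaluation protocol for its binary confidence field may differ.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE.

    confidences: floats in [0, 1], one per prediction.
    correct: booleans, True where the prediction was right.
    """
    n = len(confidences)
    if n == 0:
        return 0.0
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 belongs to the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        # Weight each bin's calibration gap by its share of the samples.
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

A model that answers with full confidence but is right only half the time has an ECE of 0.5 under this definition, which is the kind of overconfidence that a selective intervention aims to reduce without lowering confidence on correct answers.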

2606.09881 2026-06-10 cs.LG cs.CR cs.CV (new submission)

Toward Calibrated, Fair, and accurate Deepfake Detection

迈向校准、公平且准确的深度伪造检测

Ryan Brown, Chris Russell

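Affiliations: University of Oxford

AI summary: Proposes the Face-Fairness framework, whose Face-Feature Tuning component achieves fairness in deepfake detection without demographic labels while maintaining or improving overall accuracy.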

AI中文摘要

深度伪造检测器在不同人口群体间表现出较大的性能差距。现有的公平性方法需要人口统计标签、重新训练或牺牲准确性。我们引入了Face-Fairness (FF)，一个即插即用的偏差缓解框架。我们的主要贡献是Face-Feature Tuning (FFT)，这是首个在深度伪造检测中展示的无人口统计标签的公平性方法：一个轻量级校准器，基于冻结的人脸嵌入进行logit重映射。我们通过两种变体补充FFT：FF-Max，在人口统计标签可用时最大化最差组准确率；以及FF-Discover，通过嵌入发现的组实现相同目标。在域内和跨数据集测试设置中，FF一致地减少了FPR/TPR差距，提高了最小组准确率，同时保持（通常提升）整体准确率。该方法与检测器无关，增加了可忽略的运行时开销，并且不需要访问身份属性。

英文摘要

Deepfake detectors show large performance gaps across demographic groups. Existing fairness approaches require demographic labels, retraining, or sacrifice accuracy. We introduce Face-Fairness (FF), a plug-and-play framework for bias mitigation. Our primary contribution, Face-Feature Tuning (FFT), is the first demographic label-free fairness method demonstrated for deepfake detection: a lightweight calibrator that performs a logit remapping conditioned on frozen face embeddings. We complement FFT with two variants: FF-Max, which maximizes worst-group accuracy when demographics are available, and FF-Discover, which does the same with embedding-discovered groups. Across in-domain and cross-dataset test settings, FF consistently reduces FPR/TPR gaps and improves minimum group accuracy while maintaining (often improving) overall accuracy. The approach is detector-agnostic, adds negligible runtime overhead, and requires no access to identity attributes.

2606.09923 2026-06-10 cs.LG cs.AI (new submission)

Conformal Prediction for Neural Operators: Distribution-Free Uncertainty Quantification in Physics Simulation

神经算子的共形预测：物理模拟中无分布不确定性量化

Michael Chin

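AI summary: Applies split conformal prediction to neural-operator physics simulation, yielding distribution-free prediction intervals with finite-sample coverage guarantees, and introduces a normalized conformal prediction scheme that produces adaptive-width intervals.

Comments: 13 pages, 7 tables, 7 figures. Full-scale experiments on NVIDIA V100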

AI中文摘要

神经算子如傅里叶神经算子（FNO）已成为求解偏微分方程（PDE）的强大替代方法，比传统数值求解器快几个数量级。然而，在安全关键工程应用（如电子元件和电池系统的热管理）中部署这些模型，不仅需要准确的点预测，还需要严格的不确定性保证。现有的神经算子不确定性量化（UQ）方法，包括蒙特卡洛Dropout和深度集成，仅提供相对不确定性估计，没有正式的覆盖保证。在这项工作中，我们首次将分裂共形预测应用于基于神经算子的物理模拟，提供具有有限样本覆盖保证的无分布预测区间。我们进一步引入了一种归一化共形预测方案，利用MC Dropout不确定性生成自适应宽度区间，在低不确定性区域产生更紧的区间，在模型不太确定的区域产生更宽的区间。在稳态热传导基准上的全规模实验（3370万参数，800个训练样本，5个集成成员，NVIDIA V100）表明，我们的方法在目标水平alpha=0.1下达到89.1%的经验覆盖率，同时生成反映底层物理不确定性结构的空间自适应预测区间。我们还提供了一个不确定性分解框架，将认知不确定性（占总量的68%）与偶然不确定性（占总量的32%）分离，为数据收集和模型改进提供可操作指导。我们的方法在一个开源平台上实现，具有REST API端点和交互式3D可视化。

英文摘要

Neural operators such as the Fourier Neural Operator (FNO) have emerged as powerful surrogates for solving partial differential equations (PDEs), achieving speedups of several orders of magnitude over traditional numerical solvers. However, deploying these models in safety-critical engineering applications -- such as thermal management of electronic components and battery systems -- requires not only accurate point predictions but also rigorous uncertainty guarantees. Existing uncertainty quantification (UQ) methods for neural operators, including Monte Carlo Dropout and Deep Ensembles, provide only relative uncertainty estimates without formal coverage guarantees. In this work, we propose the first application of split conformal prediction to neural operator-based physics simulation, providing distribution-free prediction intervals with finite-sample coverage guarantees. We further introduce a normalized conformal prediction scheme that leverages MC Dropout uncertainty to produce adaptive-width intervals, yielding tighter intervals in regions of low uncertainty and wider intervals where the model is less certain. Full-scale experiments (33.7M parameters, 800 training samples, 5 ensemble members, NVIDIA V100) on steady-state heat conduction benchmarks demonstrate that our method achieves 89.1% empirical coverage at the target level of alpha=0.1, while producing spatially adaptive prediction intervals that reflect the underlying physical uncertainty structure. We also provide an uncertainty decomposition framework that separates epistemic uncertainty (68% of total) from aleatoric uncertainty (32% of total), offering actionable guidance for data collection and model improvement. Our method is implemented in an open-source platform with REST API endpoints and interactive 3D visualization.
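The two interval constructions mentioned in this abstract follow the standard split conformal recipe, sketched below for a single scalar output. This is a generic illustration, not the paper's code: the function names are hypothetical, `sigma_cal` and `sigma_new` stand in for any positive per-sample uncertainty estimate (the abstract uses MC Dropout), and a held-out calibration set exchangeable with the test data is assumed.

```python
import math

def conformal_quantile(scores, alpha):
    """Finite-sample-corrected (1 - alpha) quantile of calibration scores."""
    n = len(scores)
    # Use the ceil((n + 1) * (1 - alpha))-th smallest score.
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: interval is unbounded.
        return math.inf
    return sorted(scores)[rank - 1]

def split_conformal_interval(y_cal, pred_cal, pred_new, alpha=0.1,
                             sigma_cal=None, sigma_new=None):
    """Return (lower, upper) for one new point prediction.

    Without sigma_* the score is the absolute residual (constant width).
    With sigma_* the residual is divided by the uncertainty estimate,
    which gives the normalized, adaptive-width variant.
    """
    if sigma_cal is None or sigma_new is None:
        sigma_cal = [1.0] * len(y_cal)
        sigma_new = 1.0
    scores = [abs(y - p) / s for y, p, s in zip(y_cal, pred_cal, sigma_cal)]
    q = conformal_quantile(scores, alpha)
    return pred_new - q * sigma_new, pred_new + q * sigma_new
```

For a field-valued output such as a temperature map, the same procedure could be applied per grid point or to a single aggregated score per sample; which of these the paper uses is not stated in the abstract.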

2606.10137 2026-06-10 cs.LG 新提交

Ambiguous Strategic Classification

模糊策略分类

Ivri Hikri, Nir Rosenfeld

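AI summary: Studies how a learner can jointly optimize a classifier and its uncertainty when regulation requires partial information disclosure, introducing a notion of ambiguity and developing efficient algorithms.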

2606.10154 2026-06-10 cs.LG cs.CR 新提交

Quality Is Not a Safety Proxy Under Quantization

质量不是量化下的安全代理

Sahil Kadadekar

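AI summary: Finds that quality metrics for quantized checkpoints cannot substitute for direct safety testing, and proposes the Refusal Template Stability Index (RTSI) to flag dangerous rows.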

Comments 21 pages, 6 figures. Preprint

AI中文摘要

量化检查点通常首先通过质量指标筛选，然后才进行直接安全测试（如果有的话）。本文在一个匹配的51行矩阵上审计了这一捷径，该矩阵涵盖6个模型、4个系列、7级GGUF梯度和AWQ/GPTQ INT4检查点。在这个矩阵中，捷径失败：所有36个质量-安全配对在模型间方向分裂，9个隐藏危险行加上1个接近隐藏危险行显示质量稳定或改善，而拒绝率下降12-68个百分点。11个AWQ/GPTQ行中有7个是隐藏危险。对17个Hugging Face支持的FP16/AWQ/GPTQ单元格进行的四探针机械后续研究未能挽救：熵、拒绝方向和校准探针是危险行的弱或零分离器，尽管探针识别的安全相关神经元整体上吸收了1.39倍的量化误差（p < 5×10^{-7}），但该效应并非特定于体系。Claude Sonnet 4重新标记了预定义分层集中的11,470个项目，与主要gemma3:12b判断者在89.9%的行上一致（κ=0.873，95% CI [0.866, 0.881]），并且改变了0/10个隐藏危险单元格。一个校准的研究内部行为筛选——拒绝模板稳定性指数（RTSI），由四个拒绝模板漂移特征构建并在该矩阵上校准——将10/10个隐藏或接近隐藏危险行路由到直接安全测试（Wilson 95% CI下限0.72），同时在样本内评分和行级留一验证下，将45个非基线行中的23个留在低风险桶中；在同一矩阵上，最佳单特征基线（唯一前缀率差、原始拒绝率差）在匹配桶大小下分别恢复9/10和8/10，跨堆栈传输需要重新校准。对于此处研究的量化检查点、模型系列和安全结果，保留的质量不能免除直接安全评估。

英文摘要

Quantized checkpoints are often screened first with quality metrics and only later, if at all, with direct safety tests. This paper audits that shortcut on a matched 51-row matrix spanning 6 models, 4 families, a 7-level GGUF ladder, and AWQ/GPTQ INT4 checkpoints. In this matrix the shortcut fails: all 36 quality-safety pairings split direction across models, and 9 hidden-danger rows plus 1 near-hidden-danger row show quality stable or improved while refusal falls by 12-68 percentage points. Seven of the 11 AWQ/GPTQ rows are hidden-danger. A four-probe mechanistic follow-up over the 17 Hugging Face-backed FP16/AWQ/GPTQ cells does not rescue it: entropy, refusal-direction, and calibration probes are weak or null separators of dangerous rows, and although probe-identified safety-associated neurons absorb 1.39× more quantization error overall (p < 5 × 10^{-7}), the effect is not regime-specific. Claude Sonnet 4 relabels 11,470 items in a predefined stratified set, agrees with the primary gemma3:12b judge on 89.9% of rows (κ = 0.873, 95% CI [0.866, 0.881]), and changes 0/10 hidden-danger cells. A calibrated study-internal behavioral screen -- the Refusal Template Stability Index (RTSI), built from four refusal-template drift features and calibrated on this matrix -- routes 10/10 hidden- or near-hidden-danger rows to direct safety testing (Wilson 95% CI lower bound 0.72) while leaving 23 of 45 non-baseline rows in a low-risk bucket under both in-sample scoring and row-level leave-one-out validation; on the same matrix, the best single-feature baselines (unique-prefix-rate delta, raw refusal-rate delta) recover 9/10 and 8/10 respectively at matched bucket size, and cross-stack transfer requires recalibration. For the quantized checkpoints, model families, and safety outcomes studied here, retained quality cannot waive direct safety evaluation.

2606.10347 2026-06-10 cs.LG cs.LO 新提交

Beyond Explaining Predictions: Logic-Based Explanations for Confidence in Machine Learning Models

超越预测解释：基于逻辑的机器学习模型置信度解释

Vinícius Peixoto Chagas, Carlos Henrique Leitão Cavalcante, Thiago Alves Rocha

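AI summary: Proposes confidence-aware abductive explanations that quantify an explanation's confidence guarantee through a Minimum Confidence Threshold, with an algorithm that generates minimal explanations meeting a user-specified confidence threshold; the stronger guarantee costs only a modest increase in explanation length.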

AI中文摘要

机器学习越来越多地应用于关键领域，在这些领域中，预测及其相关的置信水平都会影响重要决策。为了增强此类场景的透明度，理解模型为何对其预测有信心或不确定非常重要。最近的基于逻辑的方法提供了反绎解释，即足以保持预测类别的最小特征子集，并具有正确性保证。然而，这些方法仅关注分类行为，可能产生覆盖低预测置信度实例的解释。在这项工作中，我们引入了最小置信度阈值（MCT）的概念，它量化了反绎解释提供的最弱置信度保证。基于这一概念，我们提出了置信度感知的反绎解释，它不仅保持预测类别，还保持用户指定的置信度保证。我们将MCT计算表述为一个优化问题，并引入了一种算法，用于生成满足所需置信度阈值的最小解释。我们在用于二分类的提升树上评估了所提出的框架，尽管该方法也适用于其他提供置信度分数的机器学习模型。实验结果表明，传统的反绎解释通常提供比被解释实例本身相关的置信度弱得多的置信度保证。相比之下，置信度感知的解释持续提高了解释所保证的最小置信度，同时仅需要适度增加解释长度。这些特性使得所提出的方法特别适用于预测正确性和置信度对于可信决策都至关重要的应用。

英文摘要

Machine learning is increasingly used in critical domains, where both predictions and their associated confidence levels influence important decisions. To enhance transparency in such scenarios, it is important to understand why a model is confident or uncertain about its predictions. Recent logic-based approaches provide abductive explanations, minimal subsets of features sufficient to preserve the predicted class, with correctness guarantees. However, these methods focus solely on classification behavior and may produce explanations that cover instances with low predictive confidence. In this work, we introduce the concept of Minimum Confidence Threshold (MCT), which quantifies the weakest confidence guarantee provided by an abductive explanation. Building upon this concept, we propose confidence-aware abductive explanations, which preserve not only the predicted class but also a user-specified confidence guarantee. We formulate MCT computation as an optimization problem and introduce an algorithm for generating minimal explanations that satisfy a desired confidence threshold. We evaluate the proposed framework on boosted trees for binary classification, although the approach is applicable to other machine learning models that provide confidence scores. Experimental results show that traditional abductive explanations often provide substantially weaker confidence guarantees than the confidence associated with the explained instance itself. In contrast, confidence-aware explanations consistently improve the minimum confidence guaranteed by an explanation while requiring only a modest increase in explanation length. These properties make the proposed approach particularly suitable for applications where both predictive correctness and confidence are essential for trustworthy decision making.

2606.10632 2026-06-10 cs.LG cs.AI 新提交

Is Fairness Truly Fair? Towards Reliable Lipschitz Fairness in Multi-Task Learning via Fixed-δ Alignment

公平真的公平吗？通过固定δ对齐实现多任务学习中可靠的Lipschitz公平性

Junbo Ding, Xin Zang, Chenchen Pan, Donghao Song, Jiaxin Zhu, Danhuai Guo

发表机构 * Beijing University of Chemical Technology（北京化工大学）

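AI summary: To address the confounding of Lipschitz individual-fairness evaluation by representation scale in multi-task learning, proposes ReLiF, a framework that combines fixed-δ auditing with controlled regularization, enabling semantically consistent fairness evaluation and trade-offs.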

DOI: 10.1145/3770855.3817938

AI中文摘要

Lipschitz风格的个体公平性形式化了语义相似的样本应获得相似预测的思想，但在多任务学习（MTL）中，其评估可能受到方法引起的表示尺度的干扰。本文识别了阈值混淆问题：当审计容差源自每个模型自身的表示距离时，不同算法会在不同的语义阈值下进行比较。阈值漂移分析进一步展示了偏差排名如何变化，并识别了排名保持的充分条件。我们提出了ReLiF，一个可靠性感知框架，将评估时的固定δ审计与训练时的受控正则化分离。ReLiF使用共享参考容差进行可比较的审计，并通过违反率反馈控制器保持Lipschitz代理活跃而不让其主导随机训练。本文还发展了关于阈值漂移、参考容差选择以及huberized训练代理与其未平滑的正间隔对应物之间关系的支持性分析。在临床时间序列基准和NYUv2（NYU Depth V2）密集预测上的实验表明，固定δ审计暴露了方法依赖阈值可能掩盖的效用-公平性权衡。在使用ResNet50骨干的NYUv2上，ReLiF在共享固定阈值下实现了有竞争力的效用，同时显著减少了对齐偏差。在临床基准上，ReLiF产生了受控的公平性正则化权衡，而固定δ审计揭示任务平衡基线有时能实现更低的偏差，且真正的效用-公平性权衡仍然存在。这些结果支持固定δ审计作为评估MTL中Lipschitz公平性的语义一致协议。

英文摘要

Lipschitz-style individual fairness formalizes the idea that semantically similar examples should receive similar predictions, but its evaluation in multi-task learning (MTL) can be confounded by method-induced representation scales. This paper identifies threshold confounding: when the auditing tolerance is derived from each model's own representation distances, different algorithms are compared under different semantic thresholds. A threshold-drift analysis further shows how Bias rankings can change and identifies sufficient conditions for ranking preservation. We propose ReLiF, a reliability-aware framework that separates evaluation-time fixed-δ auditing from training-time controlled regularization. ReLiF uses a shared reference tolerance for comparable auditing and a violation-rate feedback controller to keep the Lipschitz surrogate active without letting it dominate stochastic training. This work also develops supporting analysis for threshold drift, reference-tolerance selection, and the relationship between the huberized training surrogate and its unsmoothed positive-margin counterpart. Experiments on clinical time-series benchmarks and NYUv2 (NYU Depth V2) dense prediction show that fixed-δ auditing exposes utility--fairness trade-offs that method-dependent thresholds can obscure. On NYUv2 with a ResNet50 backbone, ReLiF achieves competitive utility while substantially reducing aligned bias under shared fixed thresholds. On clinical benchmarks, ReLiF yields controlled fairness-regularized trade-offs, while fixed-δ auditing reveals that task-balancing baselines can sometimes achieve lower bias and that genuine utility--fairness trade-offs persist. These results support fixed-δ auditing as a semantically consistent protocol for evaluating Lipschitz fairness in MTL.

2606.10669 2026-06-10 cs.LG cs.AI cs.CR 新提交

In Defense of Information Leakage in Concept-based Models

为基于概念模型中的信息泄露辩护

Mateo Espinosa Zarlenga

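AI summary: Revisits information leakage in concept-based models, proposes the notion of benign leakage, and shows that a reframed training objective lets models exploit leakage to improve accuracy and intervenability when concepts are incomplete.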

Comments Accepted as a position paper at the Forty-Third International Conference on Machine Learning (ICML 2026)

AI中文摘要

基于概念的模型（CMs）是深度神经网络，其预测基于与人类可理解概念（如“圆形”、“条纹”等）对齐的表示。已有研究表明，这些模型会学习到泄露概念无关信息的表示。传统观点认为，这种泄露是不可取的，应予以消除，因为它会导致模型不可解释。在本文中，我们认为这种关于CMs中泄露的传统观点不仅是不恰当的（因为泄露如何使模型更不可解释的证据往往不明确），而且在常见的现实约束下必然导致不实用的CMs。具体来说，我们认为在概念不完整是常态的现实环境中，为了构建准确且可干预的CMs，某种程度的泄露往往是必要的。为此，我们提出存在所谓的良性泄露，并表明通过重新优化典型的CM训练目标，CMs可以鼓励并利用这种形式的泄露，而不会牺牲准确性或可干预性。

英文摘要

Concept-based models (CMs), deep neural networks that ground their predictions on representations aligned with human-understandable concepts (e.g., "round", "stripes", etc.), have been shown to learn representations that leak concept-irrelevant information. As the traditional narrative goes, this leakage is undesirable and should be eradicated as it leads to uninterpretable models. In this paper, we posit that this conventional view of leakage in CMs is not only ill-posed, as the evidence of how leakage makes a model less interpretable is often inconclusive, but also bound to lead to impractical CMs under common real-world constraints. Specifically, we argue that in real-world settings where concept incompleteness is the norm, some leakage is often necessary for constructing accurate and intervenable CMs. To this end, we propose that there is such a thing as benign leakage and show that, by optimizing a reframing of the typical CM training objective, CMs can encourage and exploit this form of leakage without sacrificing accuracy or intervenability.

2606.10734 2026-06-10 cs.LG stat.ME stat.ML 新提交

SPACR: Single-Pass Adaptive Training of Uncertainty-Aware Conformal Regressors

SPACR: 单次自适应训练的不确定性感知共形回归器

Soundouss Messoudi, Sylvain Rousseau, Sébastien Destercke

发表机构 * Heudiasyc - UMR CNRS 7253, Université de Technologie de Compiègne（法国贡比涅技术大学 - CNRS 7253联合实验室 Heudiasyc）

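AI summary: Proposes SPACR, which trains uncertainty-aware regressors directly with a differentiable loss that jointly optimizes efficiency and validity, without batch splitting or a predefined confidence level. A single model supports prediction intervals at multiple confidence levels at inference time, and experiments show narrower intervals, a better coverage-efficiency trade-off, and lower computational cost.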

2606.10777 2026-06-10 cs.LG 新提交

Can we trust our models? Epistemic calibration in second-order classification

我们能信任我们的模型吗？二阶分类中的认知校准

Arthur Hoarau

发表机构 * Université de Lorraine, CentraleSupélec Loria, CNRS（洛林大学，中央理工-高等电力学院洛里亚实验室，法国国家科学研究中心）

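AI summary: Proposes an epistemic calibration criterion that measures whether epistemic uncertainty estimates are reliable, proves that it is stricter than classical calibration, and uses the EECE metric to reveal differences among uncertainty quantification methods.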

AI中文摘要

不确定性估计对于在高风险场景中部署机器学习模型至关重要。然而，经典校准仅评估预测概率的可靠性，并不评估认知不确定性估计本身是否可信。这一局限性对于二阶分类模型尤为突出。我们引入认知校准，这是一个有原则的准则，用于衡量报告的认知不确定性是否忠实地反映了模型预测围绕真实值的分散程度。我们证明认知校准是比经典校准更严格的概念，并能捕捉标准指标无法发现的失败模式。通过一个在认知校准假设下成立的不可能性定理，我们将这项工作与现有文献联系起来。为了将这一概念付诸实践，我们提出了期望认知校准误差（EECE），并证明它是真实认知校准误差（TECE）的一致估计量。在广泛的不确定性量化方法上的实验表明，认知校准是一个连贯且有意义的准则，并揭示了不同方法之间的显著差异，尽管它们的预测性能相似。

英文摘要

Uncertainty estimation is critical for deploying machine learning models in high-stakes settings. However, classical calibration only assesses the reliability of predicted probabilities and does not evaluate whether epistemic uncertainty estimates are themselves trustworthy. This limitation is particularly relevant for second-order classification models. We introduce epistemic calibration, a principled criterion that measures whether reported epistemic uncertainty faithfully reflects the dispersion of model predictions around the ground truth. We show that epistemic calibration is a strictly stronger notion than classical calibration and captures failure modes invisible to standard metrics. We relate this work to the existing literature through an impossibility theorem that holds under the epistemic calibration hypothesis. To operationalize this concept, we propose the Expected Epistemic Calibration Error (EECE), which we prove to be a consistent estimator of a True Epistemic Calibration Error (TECE). Experiments across a broad range of uncertainty quantification methods show that epistemic calibration is a coherent and meaningful criterion and reveal substantial differences across methods, despite similar predictive performance.

2606.10877 2026-06-10 cs.LG cs.CV 新提交

XtrAIn: Training-Guided Occlusion for Feature Attribution

XtrAIn：训练引导的遮挡特征归因

Thodoris Lymperopoulos, Ioannis Kakogeorgiou, Denia Kanellopoulou

发表机构 * NCSR Demokritos（希腊国家科学研究中心德谟克利特）

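AI summary: Proposes XtrAIn, which moves the occlusion operation from input space to parameter space and, by following the model's training trajectory, measures how feature-associated parameter updates affect the output logits, addressing the bias and instability of conventional occlusion attribution.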

Comments 12 pages, 7 figures, 1 table

AI中文摘要

基于遮挡的归因方法通过扰动输入特征并测量模型输出的变化来估计特征重要性，提供了一种直观的方式。然而，其可靠性受到特征移除实现方式的强烈影响：外部选择的基线可能引入偏差、分布外样本和不稳定的解释，而在非线性模型中，遮挡一组特征也可能改变非遮挡特征的贡献。我们将这种效应称为归因偏移，因为非遮挡特征的归因分数偏离其初始值。为了解决这些导致解释不稳定的主要问题，我们引入了XtrAIn，一种训练引导的归因方法，将遮挡操作从输入空间转移到参数空间。XtrAIn不用于工基线替换输入值，而是遵循模型的训练轨迹，测量特征相关参数更新如何影响输出logits。我们进一步引入了Xstep，一种轻量级近似方法以降低计算成本，以及XtrAIn+，一种目标聚焦变体，强调与目标类别一致的更新。在受控图像数据集和PAM50乳腺癌亚型分类上的实验表明，所提出的方法比标准归因基线产生更清晰、更可解释的归因模式。总体而言，XtrAIn提供了对特征归因的训练感知视角，并为研究训练过程中特征级证据的形成提供了有用的诊断工具。

英文摘要

Occlusion-based attribution methods provide an intuitive way to estimate feature importance by perturbing input features and measuring the resulting change in model output. However, their reliability is strongly affected by how feature removal is implemented: externally selected baselines can introduce bias, out-of-distribution samples, and unstable explanations, while in nonlinear models the occlusion of a set of features can also alter the contribution of non-occluded features. We refer to this effect as attribution shift, as the attribution scores of the non-occluded features drift from their initial values. To challenge these major issues that render explanations unstable, we introduce XtrAIn, a training-guided attribution method that transfers the occlusion operation from the input space to the parameter space. Instead of replacing input values with hand-crafted baselines, XtrAIn follows the model's training trajectory and measures how feature-associated parameter updates affect the output logits. We further introduce Xstep, a lightweight approximation for reducing computational cost, and XtrAIn+, a target-focused variant that emphasizes updates aligned with the target class. Experiments on controlled image datasets and PAM50 breast-cancer subtype classification show that the proposed methods produce cleaner and more interpretable attribution patterns than standard attribution baselines. Overall, XtrAIn provides a training-aware perspective on feature attribution and offers a useful diagnostic tool for studying how feature-level evidence is formed during training.
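The baseline sensitivity that motivates this abstract can be seen in a minimal sketch of conventional input-space occlusion, shown below. This illustrates the standard technique the paper criticizes, not XtrAIn itself; `toy_model` and `occlusion_attribution` are hypothetical names, and the model is a hand-written nonlinear function chosen only for illustration.

```python
def toy_model(x):
    """Nonlinear toy model: the first two features interact."""
    return x[0] * x[1] + x[2]

def occlusion_attribution(model, x, baseline):
    """Score feature i by the output drop when x[i] is replaced by baseline[i]."""
    reference_output = model(x)
    scores = []
    for i in range(len(x)):
        occluded = list(x)
        occluded[i] = baseline[i]  # the "removal" depends on the chosen baseline
        scores.append(reference_output - model(occluded))
    return scores

x = [2, 3, 1]
zero_baseline_scores = occlusion_attribution(toy_model, x, [0, 0, 0])
ones_baseline_scores = occlusion_attribution(toy_model, x, [1, 1, 1])
```

With a zero baseline the scores are `[6, 6, 1]`; with a baseline of ones they are `[3, 4, 0]`. Both the magnitudes and the ranking of the first two features change with the baseline, which is the kind of instability the abstract attributes to externally selected baselines.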

广义共形预测系统在分布偏移下的应用

Jef Jonkers, Johanna Ziegel

发表机构 * IDLab, Department of Electronics and Information Systems, Ghent University（根特大学）； Seminar for Statistics, ETH Zurich, Switzerland（苏黎世联邦理工学院）

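AI summary: For distribution shift, extends generalized conformal predictive systems by encoding the shift through observation-specific permutation weights, proposes shift-aware predictive systems, and introduces weight-uncertainty boxes to build robust envelopes of conformal predictive systems with finite-sample or asymptotic confidence guarantees.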

Comments 27 pages, 10 figures

2403.00420 2026-06-10 cs.LG cs.AI 版本更新

Robust Deep Reinforcement Learning Through Adversarial Attacks and Training : A Survey

通过对抗攻击和训练实现鲁棒深度强化学习：综述

Lucas Schott, Josephine Delas, Hatem Hajri, Elies Gherbi, Reda Yaich, Nora Boulahia-Cuppens, Frederic Cuppens, Sylvain Lamprier

发表机构 * Institut de Recherche Technologique SystemX（SystemX技术研究院）

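AI summary: Surveys adversarial attacks and adversarial training in deep reinforcement learning, systematically categorizing and comparing their objectives and mechanisms, with the aim of improving robustness to environmental changes and perturbations.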

Comments 83 pages, 17 figures, 3 tables, 15 algorithms

2509.25017 2026-06-10 cs.LG cs.CV 版本更新

Uncertainty-Aware Deep Learning for Wildfire Danger Forecasting

不确定性感知的深度学习用于野火危险预测

Spyros Kondylatos, Nikolas Papadopoulos, Gustau Camps-Valls, Ioannis Papoutsis

发表机构 * Aix-Marseille University（艾克斯-马赛大学）； University of Cambridge（剑桥大学）； University of Malaga（马拉加大学）； University of Crete（希腊克里特大学）

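AI summary: Proposes an uncertainty-aware deep learning framework that jointly captures epistemic and aleatoric uncertainty, improving the accuracy and reliability of short-term wildfire danger forecasting: the F1 score rises by 2.3% and the Expected Calibration Error falls by 2.1%.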

AI中文摘要

野火是最严重的自然灾害之一，对人类和自然生态系统构成重大威胁。日益增长的野火风险增加了对不仅准确而且可靠的预测模型的需求。深度学习在预测野火危险方面显示出潜力；然而，其采用受到对其预测可靠性的担忧的阻碍，部分源于缺乏不确定性量化。为应对这一挑战，我们提出了一个不确定性感知的深度学习框架，该框架联合捕获认知（模型）和偶然（数据）不确定性，以增强短期野火危险预测。在次日预测中，与确定性基线相比，我们表现最佳的模型将F1分数提高了2.3%，并将预期校准误差降低了2.1%，从而提升了预测技能和校准能力。我们的实验证实了不确定性估计的可靠性，并展示了它们在决策支持中的实际效用，包括识别拒绝低置信度预测的不确定性阈值，以及生成伴随不确定性层的良好校准的野火危险图。将预测范围延长至十天，我们观察到偶然不确定性随时间增加，表明环境条件的更大变异性，而认知不确定性保持稳定。最后，我们表明，尽管两种不确定性类型在低不确定性情况下可能是冗余的，但在更具挑战性的条件下它们提供互补的见解，强调了联合建模对稳健野火危险预测的价值。总之，我们的方法显著提高了野火危险预测的准确性和可靠性，推动了可信赖的野火深度学习系统的发展。

英文摘要

Wildfires are among the most severe natural hazards, posing a significant threat to both humans and natural ecosystems. The growing risk of wildfires increases the demand for forecasting models that are not only accurate but also reliable. Deep Learning (DL) has shown promise in predicting wildfire danger; however, its adoption is hindered by concerns over the reliability of its predictions, some of which stem from the lack of uncertainty quantification. To address this challenge, we present an uncertainty-aware DL framework that jointly captures epistemic (model) and aleatoric (data) uncertainty to enhance short-term wildfire danger forecasting. In the next-day forecasting, our best-performing model improves the F1 Score by 2.3% and reduces the Expected Calibration Error by 2.1% compared to a deterministic baseline, enhancing both predictive skill and calibration. Our experiments confirm the reliability of the uncertainty estimates and illustrate their practical utility for decision support, including the identification of uncertainty thresholds for rejecting low-confidence predictions and the generation of well-calibrated wildfire danger maps with accompanying uncertainty layers. Extending the forecast horizon up to ten days, we observe that aleatoric uncertainty increases with time, showing greater variability in environmental conditions, while epistemic uncertainty remains stable. Finally, we show that although the two uncertainty types may be redundant in low-uncertainty cases, they provide complementary insights under more challenging conditions, underscoring the value of their joint modeling for robust wildfire danger prediction. In summary, our approach significantly improves the accuracy and reliability of wildfire danger forecasting, advancing the development of trustworthy wildfire DL systems.
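The Expected Calibration Error reported in this abstract is conventionally computed by binning predictions by confidence and averaging the gap between confidence and accuracy in each bin. The sketch below shows that standard definition with equal-width bins; the function name is hypothetical, and the binning scheme and bin count used in the paper are not stated in the abstract.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard ECE with equal-width confidence bins on [0, 1].

    confidences: predicted probability of the predicted class, per sample
    correct:     1 if the prediction was right, else 0
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, ok))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_confidence = sum(conf for conf, _ in members) / len(members)
        accuracy = sum(ok for _, ok in members) / len(members)
        # Weight each bin's gap by the fraction of samples it holds.
        ece += len(members) / n * abs(mean_confidence - accuracy)
    return ece
```

For example, four predictions made at 0.9 confidence of which three are correct give an ECE of about 0.15, since the single occupied bin has mean confidence 0.9 and accuracy 0.75.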

2603.25670 2026-06-10 cs.LG cs.SE 版本更新

Uncertainty-Guided Label Rebalancing for CPS Safety Monitoring

不确定性引导的标签重平衡用于CPS安全监控

John Ayotunde, Qinghua Xu, Guancheng Wang, Lionel C. Briand

发表机构 * Lero Research Ireland Centre for Software Research, University of Limerick, Castletroy, Limerick, Ireland（勒罗爱尔兰软件研究中心，利默里克大学，卡斯莱特里，利默里克，爱尔兰）； University of Ottawa, Canada（渥太华大学，加拿大）； Lero Research Ireland Centre for Software Research, University of Limerick, Ireland（勒罗爱尔兰软件研究中心，利默里克大学，爱尔兰）

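AI summary: Proposes U-Balance, which uses behavioral uncertainty to rebalance labels in CPS time-series data: a GatedMLP predicts uncertainty, and safe samples with unusually high uncertainty are probabilistically relabeled. On a UAV benchmark it reaches an F1 score of 0.806, 14.3 percentage points above the strongest baseline.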

Comments 11 pages (main content), 3 pages references, 5 figures, 5 tables

AI中文摘要

安全监控对于信息物理系统（CPS）至关重要。然而，实际CPS运行中不安全事件罕见，导致极端类别不平衡，降低了安全预测器的性能。标准重平衡技术对时间序列CPS遥测数据表现不佳，要么生成不真实的合成样本，要么对少数类过拟合。同时，CPS操作中的行为不确定性（定义为CPS决策中的怀疑或不确定程度）通常与安全结果相关，但在安全监控中尚未被探索。为此，我们提出U-Balance，一种监督方法，在训练安全预测器之前利用行为不确定性对不平衡数据集进行重平衡。U-Balance首先训练一个基于GatedMLP的不确定性预测器，将每个遥测窗口总结为分布运动学特征并输出不确定性分数。然后，它应用不确定性引导的标签重平衡（uLNR）机制，将具有异常高不确定性的安全标记窗口概率性地重新标记为不安全，从而在不合成新数据的情况下，用信息丰富的边界样本丰富少数类。最后，在重平衡数据集上训练安全预测器用于安全监控。我们在一个安全与不安全比例为46:1的大规模无人机基准上评估U-Balance。结果证实了行为不确定性与安全之间存在中等但显著的相关性。然后，我们确定uLNR是利用不确定性信息的最有效策略，优于直接早期融合和晚期融合。U-Balance实现了0.806的F1分数，比最强基线高出14.3个百分点，同时保持了有竞争力的推理效率。消融研究证实，基于GatedMLP的不确定性预测器和uLNR机制都对U-Balance的有效性有显著贡献。

英文摘要

Safety monitoring is essential for Cyber-Physical Systems (CPSs). However, unsafe events are rare in real-world CPS operations, creating an extreme class imbalance that degrades safety predictors. Standard rebalancing techniques perform poorly on time-series CPS telemetry, either generating unrealistic synthetic samples or overfitting on the minority class. Meanwhile, behavioral uncertainty in CPS operations, defined as the degree of doubt or uncertainty in CPS decisions, is often correlated with safety outcomes but unexplored in safety monitoring. To that end, we propose U-Balance, a supervised approach that leverages behavioral uncertainty to rebalance imbalanced datasets prior to training a safety predictor. U-Balance first trains a GatedMLP-based uncertainty predictor that summarizes each telemetry window into distributional kinematic features and outputs an uncertainty score. It then applies an uncertainty-guided label rebalancing (uLNR) mechanism that probabilistically relabels safe-labeled windows with unusually high uncertainty as unsafe, thereby enriching the minority class with informative boundary samples without synthesizing new data. Finally, a safety predictor is trained on the rebalanced dataset for safety monitoring. We evaluate U-Balance on a large-scale UAV benchmark with a 46:1 safe-to-unsafe ratio. Results confirm a moderate but significant correlation between behavioral uncertainty and safety. We then identify uLNR as the most effective strategy to exploit uncertainty information, compared to direct early and late fusion. U-Balance achieves a 0.806 F1 score, outperforming the strongest baseline by 14.3 percentage points, while maintaining competitive inference efficiency. Ablation studies confirm that both the GatedMLP-based uncertainty predictor and the uLNR mechanism contribute significantly to U-Balance's effectiveness.

2605.03217 2026-06-10 cs.LG cs.CY 版本更新

对机器文本检测器的攻击保留风格指纹

Rafael Rivera Soto, Barry Chen, Nicholas Andrews

发表机构 * GitHub ； University of California, Berkeley（加州大学伯克利分校）

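AI summary: Studies the limits of adversarial attacks on machine-text detectors, proposes a paraphrasing method that jointly optimizes for undetectability and adherence to a specific human style, and finds that single-document detection is unreliable, so multi-document analysis is needed.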

AI中文摘要

尽管机器文本检测器的开发取得了显著进展，但机器文本容易被操纵以逃避检测，这导致有人认为该问题本质上是难以解决的。在这项工作中，我们研究了这种逃避策略的局限性。我们证明，尽管当前的攻击（从提示工程到检测器引导的优化）可以有效降低标准检测器的性能，但它们无法抹去机器文本底层的风格“指纹”。我们表明，利用风格特征空间的少样本检测器对这些逃避尝试具有鲁棒性，即使对于明确调整以逃避检测的模型生成的样本也能可靠地检测。这引发了一个问题：风格是否代表了对机器检测攻击的通用防御？我们通过引入一种新颖的 paraphrasing 方法来证明答案是“不”，该方法同时优化不可检测性和对特定人类风格的遵循。我们表明，与先前方法不同，这种攻击有效逃避了所有考虑的检测器，包括那些利用写作风格的检测器。然而，我们发现这种逃避并非绝对：随着可供分析的文档数量增加，人类和机器分布再次变得可区分。总体而言，我们的发现表明，可靠的机器文本检测需要从单文档分析转向多文档分析。

英文摘要

Despite considerable progress in the development of machine-text detectors, the ease with which machine text can be manipulated to evade detection has led to suggestions that the problem is inherently intractable. In this work, we investigate the limits of such evasion strategies. We demonstrate that while current attacks, ranging from prompt engineering to detector-guided optimization, can effectively degrade the performance of standard detectors, they fail to erase the underlying stylistic "fingerprints" of machine text. We show that few-shot detectors that utilize the stylistic feature space are robust to these evasion attempts, reliably detecting samples even from models explicitly tuned to prevent detection. This raises the question: does style represent a universal defense against machine-detection attacks? We demonstrate that the answer is "no" by introducing a novel paraphrasing approach that simultaneously optimizes for undetectability and adherence to specific human styles. We show that unlike prior methods, this attack effectively evades all considered detectors, including those that utilize writing style. However, we find that this evasion is not absolute: as the number of documents available for analysis grows, the human and machine distributions become distinguishable again. Overall, our findings suggest that reliable machine-text detection requires moving beyond single-document analysis to multi-document analysis.

2604.24668 2026-06-10 cs.AI cs.LG 版本更新

The Price of Agreement: Measuring LLM Sycophancy in Agentic Financial Applications

同意的代价：在代理金融应用中衡量LLM的谄媚行为

Zhenyu Zhao, Aparna Balagopalan, Adi Agrawal, Dilshoda Yergasheva, Waseem Alshikh, Daniel M. Bikel

发表机构 * Writer, Inc.（Writer公司）

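AI summary: Evaluates LLM sycophancy in agentic financial tasks: models show only low to modest performance drops under user rebuttals, but user preference information causes most models to fail. Recovery methods such as input filtering are also tested.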

Comments Accepted to ICLR 2026 FinAI Workshop

AI中文摘要

鉴于当今LLM在金融系统中的使用增加，评估此类系统的安全性和鲁棒性变得重要。LLM在通用领域设置中经常表现出的一种失败模式是谄媚行为。也就是说，模型优先考虑与表达的用户信念一致，而非正确性，导致准确性和信任度下降。在这项工作中，我们专注于评估LLM在代理金融任务中表现出的谄媚行为。我们的发现有三方面：首先，我们发现模型在面对用户对参考答案的反驳或矛盾时，仅表现出低至中等的性能下降，这区别于先前工作中模型在金融代理设置中表现出的谄媚行为。其次，我们引入了一套任务，通过用户偏好信息（与参考答案矛盾）来测试谄媚行为，并发现大多数模型在存在此类输入时失败。最后，我们基准测试了不同的恢复模式，例如使用预训练LLM进行输入过滤。

英文摘要

Given the increased use of LLMs in financial systems today, it becomes important to evaluate the safety and robustness of such systems. One failure mode that LLMs frequently display in general domain settings is that of sycophancy. That is, models prioritize agreement with expressed user beliefs over correctness, leading to decreased accuracy and trust. In this work, we focus on evaluating sycophancy that LLMs display in agentic financial tasks. Our findings are three-fold: first, we find the models show only low to modest drops in performance in the face of user rebuttals or contradictions to the reference answer, which distinguishes sycophancy that models display in financial agentic settings from findings in prior work. Second, we introduce a suite of tasks to test for sycophancy by user preference information that contradicts the reference answer and find that most models fail in the presence of such inputs. Lastly, we benchmark different modes of recovery such as input filtering with a pretrained LLM.

2605.22714 2026-06-10 cs.AI cs.CL cs.LG 版本更新

AMEL: Accumulated Message Effects on LLM Judgments

AMEL: 累积消息对LLM判断的影响

Sid-Ali Temkit

发表机构 * chut.app

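AI summary: Studies the accumulated message effect (AMEL), in which the polarity of earlier messages in a conversation pulls an LLM's judgments away from baseline. Models shift toward the history's prevailing polarity and negative histories bias more strongly, but the bias does not grow with context length; the simple fix is a fresh context per item.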

Comments 24 pages, 14 figures, 8 tables. Single author. Code, data (84,088 deduplicated API responses), and analysis pipeline at https://github.com/chutapp/amel

AI中文摘要

大型语言模型常被用作自动评估者：审查代码、审核内容或评分输出，通常许多项目通过一次对话处理。我们询问先前对话历史的极性是否会偏倚后续判断，我们将这种效应称为LLM判断的累积消息效应（AMEL）。通过对来自5个提供商（OpenAI、Anthropic、Google、DeepSeek和四个开源模型）的12个模型进行84,088次API调用，我们在隔离或跟随以正面或负面评价为主的历史之后呈现相同的测试项目。模型倾向于对话的主流极性（d = -0.17, p < 10^-53）。该效应集中在模型在基线时真正不确定的项目上（高熵项目d = -0.36，而基线确定时d = -0.15）。偏见不随上下文长度增长：5个先前轮次和50个产生相同的偏移（Spearman |r| < 0.01；OLS斜率p = 0.80）。并且存在负性不对称：按项目配对，负面历史诱导的偏见是正面的1.52倍（t = 13.03, p < 10^-36, n = 2,733）。扩展规模有帮助但不能解决（Anthropic: Haiku -0.22到Opus -0.17；OpenAI: Nano -0.34到GPT-5.2 -0.17）。三项后续研究缩小了机制范围。令牌概率分布连续变化，而非在阈值处。负性不对称既有令牌级成分也有语义成分，尽管在我们的样本量下平衡归因是探索性的。位置不重要：在50轮历史中任何位置的五个有偏轮次产生相同的偏移。评估流程最简单的修复是为每个项目使用新上下文；当批处理不可避免时，平衡历史有帮助。

英文摘要

Large language models are routinely used as automated evaluators: to review code, moderate content, or score outputs, often with many items passing through one conversation. We ask whether the polarity of prior conversation history biases subsequent judgments, an effect we call the accumulated message effect on LLM judgments (AMEL). Across 84,088 API calls to 12 models from 5 providers (OpenAI, Anthropic, Google, DeepSeek, and four open-source models), we present identical test items in isolation or following histories saturated with predominantly positive or negative evaluations. Models shift toward the conversation's prevailing polarity (d = -0.17, p < 10^-53). The effect concentrates on items where the model is genuinely uncertain at baseline (d = -0.36 for high-entropy items, vs d = -0.15 when the baseline is deterministic). Bias does not grow with context length: 5 prior turns and 50 produce the same shift (Spearman |r| < 0.01; OLS slope p = 0.80). And there is a negativity asymmetry: paired per item, negative histories induce 1.52x more bias than positive (t = 13.03, p < 10^-36, n = 2,733). Scaling helps but does not solve it (Anthropic: Haiku -0.22 to Opus -0.17; OpenAI: Nano -0.34 to GPT-5.2 -0.17). Three follow-ups narrow the mechanism. The token probability distribution shifts continuously, not at a threshold. The negativity asymmetry has both token-level and semantic components, though attributing the balance is exploratory at our sample sizes. Position does not matter: five biased turns anywhere in a 50-turn history produce the same shift. The simplest fix for evaluation pipelines is a fresh context per item; when batching is unavoidable, balancing the history helps.

2606.09475 2026-06-10 cs.AI cs.LG 版本更新

Emergent alignment and the projectability of ethical personas

涌现对齐与伦理人格的可投射性

Guillermo Del Pinal, Youngchan Lee, Calum McNamara, Alejandro Perez Carballo

发表机构 * University of Massachusetts Amherst（马萨诸塞大学阿默斯特分校）； Indiana University Bloomington（印第安纳大学布卢明顿分校）

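AI summary: Studies how finetuning LLMs on narrow tasks induces broadly aligned behavior, uses the Constitutional AI approach to give models ethical personas, finds that narrow alignment projects to untrained categories, and argues that alignment strategies should be evaluated on projectability.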

AI中文摘要

关于“涌现错位”的研究表明，在窄任务上微调LLM会诱导广泛错位的行为。这支持了“人格选择”（PSM）假说：在预训练期间，LLM学会模拟不同的角色和视角，这些可以在后训练期间被激发和细化。本文研究了相反的现象“涌现对齐”，并用它来支持和细化PSM，并激发对齐的新需求。我们在广泛和狭窄的安全任务上微调一个仅帮助型模型。为了创建SFT样本，我们遵循“宪法AI”（CAI）方法，并使用四条编码合理对齐策略的宪法：道义论、后果论、美德伦理以及将AI对齐为从属于人类权威。对于每个模型，我们表明，在两个狭窄的安全子类别上微调可靠地诱导出在代表性的一般安全类别以及我们直接从用于窄对齐的数据集中过滤掉的安全子类别上的涌现对齐。为了使用更细粒度的评估测试“PSM”，我们使用了多维“伦理人格”诊断。对于每个宪法微调（广泛/狭窄）模型，我们评估其行为与预期特征概况的匹配程度。我们的结果表明，我们的CAI模型获得了预期的“伦理人格”——例如，使用后果论宪法创建的SFT样本窄微调的模型与功利主义信念的一致性显著高于道义论信念。然而，我们的粗粒度和细粒度评估显示，我们的（广泛/狭窄）微调CAI模型在投射效果上存在显著差异。我们得出结论，对齐策略不仅应基于其（分布内）一般安全性能进行评估，还应特别基于其可投射性程度进行评估。

英文摘要

Work on "emergent misalignment" shows that finetuning LLMs on narrow tasks can induce broadly misaligned behavior. This supports the "persona selection" (PSM) hypothesis: during pre-training, LLMs learn to simulate different characters and perspectives, which can be elicited and refined during post-training. This paper investigates the converse phenomenon, "emergent alignment", and uses it to support and refine the PSM and motivate a novel desideratum for alignment. We finetune a helpful-only model on broad and narrow safety tasks. To create SFT samples, we follow the "Constitutional AI" (CAI) approach and use four constitutions which encode reasonable alignment strategies: deontology, consequentialism, virtue ethics, and aligning AIs as subordinate to human authority. For each of those models, we show that finetuning on two narrow safety sub-categories reliably induces emergent alignment over a representative set of general safety categories, and on safety sub-categories that we directly filtered out of the datasets used for narrow alignment. To test the PSM using a more fine-grained evaluation, we used a multidimensional "ethical persona" diagnostic. For each constitutionally finetuned (broad/narrow) model, we evaluate how well its behavior matches its expected signature profile. Our results show that our CAI models acquire their expected "ethical persona" -- e.g., the model narrowly finetuned on SFT samples created using the consequentialist constitution agrees significantly more with utilitarian than deontological beliefs. Yet our coarse and fine-grained evaluations show that there are significant differences across our (broad/narrow) finetuned CAI models in how well they project. We conclude that alignment strategies should be evaluated, not just on their (in-distribution) general safety performance, but also specifically on their degree of projectability.

2606.09872 2026-06-10 cs.LG cs.AI 新提交

PatchSTG: Scalable Spatiotemporal Graph Transformers for Traffic Forecasting on Irregular Sensor Networks

PatchSTG：面向不规则传感器网络的交通预测可扩展时空图Transformer

Jichao Li, Xuanming Shi

发表机构 * Portsmouth Abbey School（波特茅斯修道学校）； CodingFuture (Shanghai) Education Technology Co., Ltd.（上海CodingFuture教育科技有限公司）

AI总结提出PatchSTG，通过地理信息将传感器划分为平衡的局部补丁，并采用双注意力编码器交替捕捉局部和全局依赖，实现计算复杂度从二次降至近线性，在不规则传感器网络上取得高效且稳定的交通预测性能。

Comments 22 pages,12 figures

AI中文摘要

交通预测是智能交通系统的基本组成部分，但由于传感器分布不规则以及建模大规模时空依赖的高计算成本，在实际环境中仍然具有挑战性。在实际交通网络中，传感器在不同区域分布不均，导致空间结构不均匀，限制了现有基于图和基于注意力模型的有效性和可扩展性。为了解决这些挑战，我们提出了PatchSTG，一种基于补丁的时空图Transformer，专为不规则传感器网络上的高效预测而设计。关键思想是引入一种层次化空间表示，基于地理信息将传感器划分为平衡且保持局部性的补丁。在此结构之上，双注意力编码器交替进行补丁内注意力（捕捉局部交互）和补丁间注意力（建模全局依赖），将计算复杂度从二次降低到近线性。我们在罗德岛的真实交通数据以及额外的大规模数据集上评估了PatchSTG。实验结果表明，所提模型在多个预测时域上实现了稳定且具有竞争力的预测性能，同时显著提高了计算效率。消融研究进一步验证了空间划分和双注意力在捕捉局部和长程交通动态方面的有效性。这些结果表明，基于补丁的时空建模为不规则空间设置下的交通预测提供了一个可扩展且有效的框架。

英文摘要

Traffic forecasting is a fundamental component of intelligent transportation systems, yet remains challenging in real-world settings due to irregular sensor distributions and the high computational cost of modeling large-scale spatiotemporal dependencies. In practical traffic networks, sensors are unevenly distributed across regions, leading to non-uniform spatial structures that limit the effectiveness and scalability of existing graph-based and attention-based models. To address these challenges, we propose PatchSTG, a patch-based spatiotemporal graph Transformer designed for efficient forecasting on irregular sensor networks. The key idea is to introduce a hierarchical spatial representation that partitions sensors into balanced, locality-preserving patches based on geographic information. On top of this structure, a dual attention encoder alternates between intra-patch attention for capturing local interactions and inter-patch attention for modeling global dependencies, reducing computational complexity from quadratic to near-linear scaling. We evaluate PatchSTG on real-world traffic data from Rhode Island and additional large-scale datasets. Experimental results demonstrate that the proposed model achieves stable and competitive forecasting performance across multiple horizons, while significantly improving computational efficiency. Ablation studies further validate the effectiveness of spatial partitioning and dual attention in capturing both local and long-range traffic dynamics. These results suggest that patch-based spatiotemporal modeling provides a scalable and effective framework for traffic forecasting under irregular spatial settings.
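
The abstract does not say how sensors are partitioned, so the following is only a sketch under an assumed scheme: a recursive median split on coordinates (KD-tree style), which yields equal-sized patches of nearby sensors. The function name `balanced_patches` and the random toy coordinates are hypothetical, and the pair counts at the end only illustrate why restricting attention to patches is cheaper than full attention.

```python
import random

def balanced_patches(coords, depth):
    """Split sensors into 2**depth patches of near-equal size.

    coords: list of (x, y) positions; returns a list of index lists.
    Assumed scheme: recursive median split, alternating the axis.
    """
    patches = [list(range(len(coords)))]
    for level in range(depth):
        axis = level % 2  # alternate between x and y
        next_patches = []
        for idx in patches:
            order = sorted(idx, key=lambda i: coords[i][axis])
            mid = len(order) // 2
            next_patches.append(order[:mid])
            next_patches.append(order[mid:])
        patches = next_patches
    return patches

random.seed(0)
coords = [(random.random(), random.random()) for _ in range(64)]
patches = balanced_patches(coords, depth=3)

# Number of pairwise attention scores under each scheme.
full_pairs = len(coords) ** 2
intra_pairs = sum(len(p) ** 2 for p in patches)
inter_pairs = len(patches) ** 2
```

With 64 sensors and 8 patches of 8 sensors each, this gives 512 intra-patch plus 64 inter-patch scores, against 4096 for full attention. With a fixed patch size s, the intra-patch term grows as N·s, not N².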

2606.10071 2026-06-10 cs.LG cs.AI 新提交

Temporal Sheaf Neural Networks with Dynamic Orthogonal Transport

时序层神经网络与动态正交传输

Md Sadek Hossain Asif, Tanzila Khan, Md. Mosaddek Khan

AI总结提出时序层神经网络（TSNN），通过动态正交帧和局部坐标系间显式传输实现时序链接预测，在多种基准上超越现有方法，尤其适用于节点角色异质性强的图。

AI中文摘要

我们引入了时序层神经网络（TSNN），这是一个时序链接预测框架，它为每个节点配备一个时变正交帧，并仅在局部坐标系之间进行显式传输后比较节点状态。与在共享全局嵌入空间中运行的现有连续时间图模型不同，TSNN通过动态局部帧建模节点特定且不断演化的交互语义。该模型通过高效的低秩Householder乘积参数化每个节点的帧，在帧更新下精确保留存储的隐藏状态，并使用几何残差解码器，该解码器基于传输距离锚定预测，同时学习残差校正。所有计算严格因果，仅使用事件前历史。我们证明了对称度归一化层拉普拉斯算子与对称归一化图拉普拉斯算子正交相似，而随机游走归一化形式在相应度度量下相似；TSNN使用的全激活、特征缩放扩散正是组合层Dirichlet能量上的度量梯度步，具有无度单调下降和非扩张保证。帧漂移仅线性扰动更新。在TGB v2链接预测和时序异质排行榜以及DGB基准套件上，TSNN在大多数基准上匹配或超越最强先前方法，在表现出强节点角色异质性的图上改进最大。消融实验证实了动态帧、正交传输和几何残差解码的独特优势。

英文摘要

We introduce Temporal Sheaf Neural Networks (TSNN), a temporal link prediction framework that equips each node with a time-varying orthogonal frame and compares node states only after explicit transport between local coordinate systems. In contrast to existing continuous-time graph models that operate in a shared global embedding space, TSNN models node-specific and evolving interaction semantics through dynamic local frames. The model parameterizes per-node frames via efficient low-rank Householder products, preserves stored hidden states exactly under frame updates, and uses a geometric-residual decoder that anchors predictions on transported distances while learning residual corrections. All computations are strictly causal and use only the pre-event history. We show that the symmetric degree-normalized sheaf Laplacian is orthogonally similar to the symmetric normalized graph Laplacian, with the random-walk normalized form similar in the corresponding degree metric; the full-active, feature-scaled diffusion used by TSNN is exactly a metric-gradient step on the combinatorial sheaf Dirichlet energy, with a degree-free monotone-descent and non-expansiveness guarantee. Frame drift perturbs updates only linearly. Across TGB v2 link-prediction and temporal-heterogeneous leaderboards, together with the DGB benchmark suite, TSNN matches or surpasses the strongest prior methods on most benchmarks, with the largest improvements on graphs exhibiting strong node-role heterogeneity. Ablations confirm the distinct benefit of dynamic frames, orthogonal transport, and geometric-residual decoding.
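
To make the Householder parameterization concrete, here is a minimal sketch, assuming each node frame is a product Q = H_1 H_2 ... H_k of a few reflections and that two node states are compared after each is mapped through its own frame. The helper names, the toy vectors, and the way transport is defined here are illustrative assumptions, not the paper's exact construction.

```python
import math

def reflect(u, x):
    """Apply the Householder reflection I - 2 u u^T / (u^T u) to x."""
    scale = 2.0 * sum(a * b for a, b in zip(u, x)) / sum(a * a for a in u)
    return [xi - scale * ui for xi, ui in zip(x, u)]

def apply_frame(vectors, x):
    """Apply Q = H_1 H_2 ... H_k to x (the rightmost factor acts first)."""
    for u in reversed(vectors):
        x = reflect(u, x)
    return x

def apply_frame_inverse(vectors, x):
    """Apply Q^T = H_k ... H_1; each reflection is its own inverse."""
    for u in vectors:
        x = reflect(u, x)
    return x

def norm(x):
    return math.sqrt(sum(a * a for a in x))

# Hypothetical frames: two reflections for node u, one for node v.
frame_u = [[1.0, 2.0, 0.5], [0.0, 1.0, -1.0]]
frame_v = [[2.0, -1.0, 1.0]]
state_u = [0.3, -1.2, 0.8]
state_v = [0.1, 0.4, -0.5]

# Map both states through their frames, then compare in the shared space.
transported_u = apply_frame(frame_u, state_u)
transported_v = apply_frame(frame_v, state_v)
distance = norm([a - b for a, b in zip(transported_u, transported_v)])
```

Because every factor is a reflection, the frame is orthogonal by construction: it preserves norms and can be inverted exactly by replaying the reflections in the opposite order, with no matrix inverse needed.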

2606.10249 2026-06-10 cs.LG cs.SI 新提交

When Design Rules Break: Benchmark Composition Determines Whether Label Informativeness Predicts GNN Aggregator Choice

当设计规则失效：基准组成决定标签信息性是否预测GNN聚合器选择

Neha Sharma, Ritesh Sharma

发表机构 * Department of Computer Science（计算机科学系）； Virginia Commonwealth University（弗吉尼亚联邦大学）； Department of Electrical and Computer Engineering（电气与计算机工程系）

AI总结研究图神经网络聚合器选择（sum/mean/max）在24个节点分类数据集上的泛化性，发现标签信息性仅在传统基准上有效，在Facebook-100密集图中失效，且谱间隙能区分这些图。

AI中文摘要

我们通过研究在24个节点分类数据集（涵盖引文、异嗜、LINKX Facebook-100、共同购买和共同作者图）上的聚合器选择（sum、mean、max），检验图神经网络（GNN）设计规则是否跨基准族泛化。边同嗜性仅能微弱预测GIN-Sum与GIN-Mean的性能差距。标签信息性在传统基准上能很好地预测这一差距，但当包含Facebook-100图时，预测能力大幅下降。在这些密集的朋友关系网络中，接近零的标签信息性与对sum聚合的强烈偏好共存，在扩展训练下产生7-10%的提升，最高达13%。随机块模型消融实验（包括匹配Facebook-100度规模的度修正变体）未能重现这一行为，表明平均度本身不能解释该效应。在若干与标签无关的图统计量中，谱间隙唯一地将这些图与其他低信息性数据集区分开来，该效应局限于单跳邻域并在不同架构中复现。我们进一步识别了与聚合器选择交互的训练机制，并表明PNA在标准引文基准上可能不如最佳单聚合器GIN。我们的结果表明，决定设计规则是否看似泛化的是基准组成而非数值不足，并且Facebook-100基准为未来的自适应聚合方法提供了具体目标。

英文摘要

We examine whether graph neural network (GNN) design rules generalize across benchmark families by studying aggregator selection (sum, mean, max) on 24 node-classification datasets spanning citation, heterophilic, LINKX Facebook-100, co-purchase, and co-authorship graphs. Edge homophily is only weakly predictive of the GIN-Sum versus GIN-Mean performance gap. Label informativeness predicts this gap well on legacy benchmarks but degrades substantially when Facebook-100 graphs are included. In these dense friendship networks, near-zero label informativeness coexists with a strong preference for sum aggregation, producing gains of 7-10% and up to 13% under extended training. Stochastic block model ablations, including degree-corrected variants matching Facebook-100 degree scales, fail to reproduce this behavior, indicating that mean degree alone does not explain the effect. Among several label-independent graph statistics, the spectral gap uniquely distinguishes these graphs from other low-informativeness datasets, with the effect localized to one-hop neighborhoods and replicated across architectures. We further identify training regimes that interact with aggregator choice and show that PNA can underperform the best single-aggregator GIN on standard citation benchmarks. Our results suggest that benchmark composition, rather than numerical insufficiency, determines whether design rules appear to generalize, and that the Facebook-100 regime provides a concrete target for future adaptive aggregation methods.
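
The quantities discussed above can be illustrated on a toy graph. The sketch below computes edge homophily as the fraction of same-label edges and compares sum, mean, and max aggregation over scalar neighbor features. The graph, labels, and function names are made up for illustration, and this is not the paper's experimental code.

```python
def edge_homophily(edges, labels):
    """Fraction of edges whose endpoints share a label."""
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)

def aggregate(neighbors, features, mode):
    """Combine scalar neighbor features with sum, mean, or max."""
    values = [features[n] for n in neighbors]
    if mode == "sum":
        return sum(values)
    if mode == "mean":
        return sum(values) / len(values)
    if mode == "max":
        return max(values)
    raise ValueError(f"unknown mode: {mode}")

# Toy undirected graph with four nodes and constant features.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
labels = [0, 0, 1, 1]
features = [1.0, 1.0, 1.0, 1.0]
neighbors_of_2 = [0, 1, 3]
neighbors_of_3 = [2]

homophily = edge_homophily(edges, labels)
sum_2 = aggregate(neighbors_of_2, features, "sum")
sum_3 = aggregate(neighbors_of_3, features, "sum")
mean_2 = aggregate(neighbors_of_2, features, "mean")
mean_3 = aggregate(neighbors_of_3, features, "mean")
```

On constant features, mean (and max) returns the same value for a degree-3 node and a degree-1 node, while sum keeps the neighborhood size. This is a general property of the aggregators. The abstract itself reports that mean degree alone does not explain the Facebook-100 effect.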

2606.10358 2026-06-10 cs.LG cs.AI 新提交

KG-SoftMAP: Soft Knowledge-Graph Priors for Bayesian Network Structure Learning from Sparse Discrete Data

KG-SoftMAP: 基于软知识图谱先验的稀疏离散数据贝叶斯网络结构学习

Guoliang Xu, James E. Corter

发表机构 * Columbia University（哥伦比亚大学）

AI总结针对稀疏离散数据中贝叶斯网络结构学习困难的问题，提出KG-SoftMAP方法，将加权有向知识图谱编码为软先验，结合BDeu评分与logit形式先验最大化MAP目标，在合成与真实数据上显著提升结构恢复性能。

Comments 33 pages including appendices, 1 figure

AI中文摘要

从稀疏离散数据中学习贝叶斯网络（BN）结构是困难的：当每个实例仅记录少数变量时，大多数变量对缺乏可靠评分所需的联合观测，且纯数据方法恢复的结构很少。不完美的领域知识，可表示为加权有向知识图谱（KG），通常是可用的。我们提出KG-SoftMAP，它将这样的KG编码为软性的、置信度加权的、可被数据覆盖的边先验，并最大化结合BDeu评分与logit形式先验的MAP目标；KG可由专家整理或由LLM提取。在受控的合成基准（唯一具有真实DAG的设置）上，KG-SoftMAP在$\rho=0.05$时恢复部分有向结构（DF1从$0.14$到$0.29$，而基线接近零），当$\rho\geq0.2$时恢复更多（DF1从$0.46$到$0.96$），前提是配有一个信息丰富但不完美的KG；恢复性能随KG质量下降而优雅地退化。在无真实DAG的真实稀疏教育数据上，我们仅评估面向部署的指标：预测、校准和KG一致性。学习到的BN最好被解读为诊断模型：在SAF上，它落后于逻辑回归$0.03$的F1_FAIL，同时提供KG一致的边、校准的联合概率以及从任意观测概念子集的推理；当不存在有意义的KG时，判别式逻辑回归更可取。

英文摘要

Learning Bayesian network (BN) structure from sparse discrete data is hard: when each instance records only a few variables, most variable pairs lack the joint observations needed for reliable scoring, and data-only methods recover little structure. Imperfect domain knowledge, expressible as a weighted directed knowledge graph (KG), is often available. We propose KG-SoftMAP, which encodes such a KG as a soft, confidence-weighted, data-overridable edge prior and maximizes a MAP objective combining the BDeu score with a logit-form prior; the KG may be expert-curated or LLM-extracted. On controlled synthetic benchmarks, the only setting with ground-truth DAGs, KG-SoftMAP recovers partial directed structure at $\rho=0.05$ (DF1 $0.14$ to $0.29$, versus near-zero baselines) and substantially more once $\rho\geq 0.2$ (DF1 $0.46$ to $0.96$), when paired with an informative but imperfect KG; recovery degrades gracefully as KG quality drops. On real sparse educational data, which has no ground-truth DAG, we evaluate deployment-facing measures only: prediction, calibration, and KG-consistency. The learned BN is best read as a diagnostic model: on SAF it trails logistic regression by $0.03$ F1_FAIL while providing KG-consistent edges, calibrated joint probabilities, and inference from arbitrary observed concept subsets; when no meaningful KG exists, discriminative logistic regression is preferable.
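
One plausible reading of a "logit-form prior" is sketched below: each directed edge in a candidate graph contributes the log-odds of its KG confidence, and the total is added to a data score such as BDeu. The exact form used by KG-SoftMAP is not given in the abstract, so the functions, the `weight` parameter, and the neutral default of 0.5 are assumptions. The data score is passed in as a plain number, not computed.

```python
import math

def logit(p):
    """Log-odds of a probability strictly between 0 and 1."""
    return math.log(p / (1.0 - p))

def log_prior(graph_edges, kg_confidence, default=0.5):
    """Sum of logit(confidence) over edges in the candidate graph.

    Edges absent from the KG get the neutral default 0.5 (logit 0).
    """
    return sum(logit(kg_confidence.get(e, default)) for e in graph_edges)

def map_score(data_score, graph_edges, kg_confidence, weight=1.0):
    """Data score (e.g. BDeu, computed elsewhere) plus weighted log-prior."""
    return data_score + weight * log_prior(graph_edges, kg_confidence)

# Hypothetical KG: strong belief in A -> B, weak belief in B -> A.
kg_confidence = {("A", "B"): 0.9, ("B", "A"): 0.1}

with_kg = map_score(-10.0, [("A", "B")], kg_confidence)
against_kg = map_score(-10.0, [("B", "A")], kg_confidence)
overridden = map_score(-4.0, [("B", "A")], kg_confidence)
```

With equal data scores the KG-supported direction wins, but a large enough data-score advantage (here -4.0 against -10.0) still outweighs the prior. That is the sense in which such a prior is soft and data-overridable.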

2606.11162 2026-06-10 cs.LG 新提交

表示课程：用于鲁棒排序和分配的分阶段训练

Ehsan Ebrahimzadeh, Sina Baharlouei, Abraham Bagherjeiran

发表机构 * eBay Search Ranking and Monetization（eBay搜索排名与变现）

AI总结提出表示课程（RC）方法，通过分阶段引入特征，先强调基于内容的信号，再引入依赖曝光的信号，减少对历史信号的捷径依赖，提升冷启动泛化性和鲁棒性。

Comments 12 pages, 5 figures

DOI: 10.1145/3770855.3818470

AI中文摘要

数字市场中的排序是一种动态曝光分配机制：展示的物品塑造了发现轨迹和成功事件，平台记录这些事件以更新未来的分配策略。现代排序系统严重依赖曝光混杂信号（如流行度估计、CTR/CVR聚合和基于ID的表示），因为这些信号在静态需求下具有高度预测性。然而，这种预测能力可能成为一种学习捷径：早期访问依赖曝光的信念信号会使优化过度依赖它们，而忽视独立于曝光的价值信号（如基于内容的竞争力和语义亲和性）。因此，学习到的策略倾向于固化现有物品，并在分布偏移下降低冷启动泛化性和鲁棒性。我们提出表示课程（RC），一种训练时干预方法，按时间阶段安排特征使用。RC首先突出基于内容的价值信号，然后引入依赖曝光的信念信号，同时将内容路径锚定在学到的价值表示附近，从而抑制对历史信号的捷径依赖，并缓解内容信号上的梯度饥饿。我们形式化RC，使其独立于任务和假设类，并提供排序特定的实例化。在高斯线性岭回归设置中，我们推导出封闭形式解和充分条件，证明RC在冷启动目标分布上严格降低总体风险，并量化了与源性能的帕累托权衡。在公开的排序学习和推荐基准测试，以及大规模电商搜索系统中的随机在线实验中，RC显著地将依赖从历史信念信号转向基于内容的价值信号，并在头部性能可控权衡下，对冷群体带来一致的提升。

英文摘要

Ranking in digital marketplaces is a dynamic exposure-allocation mechanism: displayed items shape discovery trajectories and success events logged by the platform to update future allocation policies. Modern ranking systems rely heavily on exposure-confounded signals (e.g., popularity estimates, CTR/CVR aggregates, and ID-based representations) because they are highly predictive under stationary demand. Yet this predictive power can become a learning shortcut: early access to exposure-dependent belief signals steers optimization toward over-reliance on them and away from exposure-independent merit signals (e.g., content-based competitiveness and semantic affinity). Consequently, the learned policy tends to entrench incumbents and degrade cold-start generalization and robustness under distribution shift. We propose Representation Curriculum (RC), a training-time intervention that temporally stages feature utilization. RC foregrounds content-based merit signals initially, then introduces exposure-dependent belief signals while anchoring the content pathway near the learned merit representation, curbing shortcut reliance on historical signals and mitigating gradient starvation on content signals. We formalize RC independently of task and hypothesis class and provide ranking-specific instantiations. In a Gaussian linear ridge setting, we derive closed-form solutions and sufficient conditions under which RC strictly reduces population risk on a cold-start target distribution, with a quantified Pareto tradeoff against source performance. Experiments on public learning-to-rank and recommendation benchmarks, and randomized online experiments in a large-scale e-commerce search system, show that RC measurably shifts reliance from historical belief signals toward content-based merit signals and yields consistent gains on cold populations with a controlled trade-off in head performance.

2606.09960 2026-06-10 cs.LG cs.AI 新提交

HydraCIL: Decoupled Class-Incremental Learning through Prototype-Guided Multi-Head Classifiers

HydraCIL: 通过原型引导的多头分类器实现解耦的类增量学习

Daniel Vila-Cruz, Laura Morán-Fernández, Verónica Bolón-Canedo

AI总结提出HydraCIL模型，通过冻结主干网络、解耦特征提取与学习，并利用原型相似性选择任务特定分类头，在资源受限环境中实现高效类增量学习，匹配或超越现有方法同时大幅降低训练时间和碳排放。

Comments Accepted for publication at the International Joint Conference on Neural Networks (IJCNN 2026)

AI中文摘要

我们提出HydraCIL，一种基于原型引导的多头分类器的解耦持续学习模型，旨在嵌入式及资源受限环境中的可持续部署。虽然大多数类增量学习（CIL）方法依赖强大硬件和长时间再训练周期，但实际系统（如机器人或边缘AI设备）必须在有限资源下快速适应。HydraCIL通过冻结主干网络并将特征提取与学习解耦来解决这一问题。对于每个任务，特征被提取一次，并创建一个轻量级的、任务特定的分类器头，避免了昂贵的主干再训练。在推理时，HydraCIL通过与原型的相似性选择适当的头。在CIFAR-100、ImageNet-100、CoRe50和Flowers102数据集上的实验表明，HydraCIL匹配或超越了最先进的CIL方法，同时显著减少了训练时间和碳足迹，使其成为在能源效率和快速适应至关重要的实际及嵌入式环境中进行持续学习的实用解决方案。

英文摘要

We present HydraCIL, a decoupled continual learning model based on prototype-guided multi-head classifiers, targeting sustainable deployment in embedded and resource-constrained environments. While most Class-Incremental Learning (CIL) methods rely on powerful hardware and long retraining cycles, real-world systems, such as robots or edge AI devices, must adapt quickly with limited resources. HydraCIL addresses this gap by freezing the backbone and decoupling feature extraction from learning. For each task, features are extracted once and a lightweight, task-specific classifier head is created, avoiding costly backbone retraining. At inference, HydraCIL selects the appropriate head via similarity with prototypes. Experiments on CIFAR-100, ImageNet-100, CoRe50, and Flowers102 datasets show that HydraCIL matches or outperforms state-of-the-art CIL methods while significantly reducing training time and carbon footprint, making it a practical solution for continual learning in real-world and embedded settings, where energy efficiency and rapid adaptation are critical.
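
Head selection by prototype similarity can be sketched as follows, assuming one class-mean prototype per class computed from frozen-backbone features and cosine similarity as the measure. Both choices, along with the function names and toy two-dimensional features, are assumptions, since the abstract only states that the head is chosen "via similarity with prototypes".

```python
import math

def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def mean_vector(vectors):
    """Component-wise mean, used here as a class prototype."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def select_head(feature, prototypes):
    """Return the task id whose closest prototype is most similar."""
    best_task, best_sim = None, -2.0
    for task_id, task_protos in prototypes.items():
        sim = max(cosine(feature, p) for p in task_protos)
        if sim > best_sim:
            best_task, best_sim = task_id, sim
    return best_task

# Toy features for two tasks with two classes each.
prototypes = {
    "task_0": [mean_vector([[1.0, 0.1], [0.9, 0.0]]),
               mean_vector([[0.8, 0.5], [0.7, 0.6]])],
    "task_1": [mean_vector([[0.0, 1.0], [0.1, 0.9]]),
               mean_vector([[-0.5, 0.8], [-0.6, 0.7]])],
}

head_a = select_head([0.95, 0.05], prototypes)
head_b = select_head([-0.5, 0.9], prototypes)
```

Only the prototypes and the small per-task heads grow as tasks arrive, and the backbone is never updated, which is where the training-time savings described above would come from.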

2606.10064 2026-06-10 cs.LG cs.AI 新提交

Bittensor Agent Arenas as a Trajectory Primitive: Distilling a Shopping Agent from ShoppingBench Subnet Traces

Bittensor 智能体竞技场作为轨迹基元：从 ShoppingBench 子网轨迹中蒸馏购物智能体

Shardul Bansal, Seth Schilbe, Jarrod Barnes

AI总结针对小模型后训练缺乏多轮轨迹数据的问题，利用 Bittensor 子网 SN15 的竞技机制生成激励对齐的轨迹，通过结构质量过滤提取智能体轨迹，后训练 Qwen3-4B 模型在 ShoppingBench 上达到 42.7% ASR，接近合成数据基线。

Comments 10 pages, 4 figures, Data and Models available at: https://huggingface.co/collections/oro-ai/shoppingbench-sn15-trajectory-primitive

AI中文摘要

小模型智能体后训练的瓶颈更多在于其消耗的轨迹基质而非算法。领先的方案（RLVR、群体相对 RL、拒绝采样重 SFT）都需要携带每轨迹监督的多轮轨迹，而现有的两个来源存在不足：前沿合成数据继承了合成器的偏见并坍缩了长尾，而未经过滤的生产日志未经评判且被捷径行为污染。我们认为可以设计一个激励对齐的智能体竞技场来制造此类轨迹，并在 ORO Subnet 15（SN15）上进行了演示，这是 ShoppingBench 智能体电商基准的 Bittensor 部署。SN15 的竞赛机制、LLM 推理评判器和旋转泄漏簇防护问题集产生了一个具有三个特性的语料库：激励对齐的多样性、每轨迹评判和反记忆的留出评估。我们引入了一个结构质量过滤器，通过保留智能体轨迹（模型自身发出工具调用）并拒绝子任务轨迹（模型仅在确定性搜索循环上进行分类或叙述），将原始数据流转换为可训练的语料库，然后使用与已发布的 ShoppingBench SFT-然后-GRPO 流程匹配的方案对 Qwen3-4B 进行后训练。在泄漏簇防护的留出分区上，以生产严格方式评分，模型从已发布的 Qwen3-4B 基线的 18.0% ASR 提升至 42.7%，与合成数据 SFT 仅基线（43.6%）在单问题噪声范围内，同时仅训练了子网单日输出的一小部分。监督堆栈留下了较大的 pass@8 到 pass@1 差距（53.3% 对比 34.8%）；每步教师基础的 Dr. GRPO 奖励将该空间转化为过程改进，我们确定子任务数据流是缩小与 48.7% SFT+GRPO 基线差距的主要杠杆。我们发布了过滤器、语料库分割和竞技场机制。

英文摘要

Small-model agentic post-training is bottlenecked less by the algorithm than by the trajectory substrate it consumes. Leading recipes (RLVR, group-relative RL, rejection-sampled re-SFT) all need multi-turn traces carrying per-trajectory supervision, and the two existing sources fall short: frontier-synthesised data inherits the synthesizer's biases and collapses the long tail, while unfiltered production logs are unjudged and contaminated by shortcut behaviour. We argue that an incentive-aligned agent arena can be engineered to manufacture such trajectories, and demonstrate this on ORO Subnet 15 (SN15), a Bittensor deployment of the ShoppingBench agentic-commerce benchmark. SN15's race mechanism, LLM reasoning judge, and rotating leak-cluster-guarded problem suite yield a corpus with three properties: incentive-aligned diversity, per-trajectory judging, and anti-memorised held-out evaluation. We introduce a structural-quality filter that converts the raw firehose into a trainable corpus by keeping agentic trajectories (the model itself emits the tool calls) and rejecting sub-task trajectories (the model only classifies or narrates over a deterministic search loop), then post-train Qwen3-4B with a recipe matched to the published ShoppingBench SFT-then-GRPO pipeline. On a leak-cluster-guarded held-out partition scored production-strict, the model lifts from the published Qwen3-4B base of 18.0% ASR to 42.7%, within single-problem noise of the synthetic-data SFT-only baseline (43.6%), while training on a fraction of a single day of subnet output. The supervised stack leaves a large pass@8 to pass@1 gap (53.3% vs 34.8%); a per-step teacher-grounded Dr. GRPO reward converts that headroom into process improvement, and we identify the sub-task firehose as the primary lever for closing the gap to the 48.7% SFT+GRPO bar. We release the filter, the corpus splits, and the arena mechanics.

2606.10216 2026-06-10 cs.LG cs.AI 新提交

A Source Domain is All You Need: Source-Only Cross-OS Transfer Learning for APT Anomaly Detection via Semantic Alignment and Optimal Transport

一个源域足矣：基于语义对齐和最优传输的仅源域跨操作系统APT异常检测迁移学习

Sidahmed Benabderrahmane, Petko Valtchev, James Cheney, Talal Rahwan

AI总结针对跨操作系统APT检测中目标域无标签的挑战，提出基于最优传输的仅源域异常评分框架，通过语义抽象和三种偏差通道实现零目标监督下的异常排序。

AI中文摘要

高级持续性威胁（APT）是隐蔽的多阶段网络攻击，由于标记痕迹稀缺、严重的类别不平衡以及生成真实恶意行为的挑战，其检测十分困难。这些挑战在跨操作系统（cross-OS）设置中被放大，此时在一个源平台上训练的检测器必须部署在无标签的目标平台上，且无法访问目标域标签。我们利用系统级溯源轨迹研究这种仅源域的跨操作系统APT检测问题，并提出一个基于传输的框架，在零目标监督下对异常目标进程进行排序。该框架将进程行为抽象为结构化的自然语言描述，使用预训练语言模型进行嵌入，并构建源域正常参考用于目标评分。它结合了三种证据通道：与源域正常原型的语义偏差、由图自编码器捕获的结构偏差、以及通过最优传输（OT）度量的几何偏差。主要贡献是一个基于OT的重心异常分数，该分数将目标嵌入投影到源域正常流形上，并量化残差传输不匹配。我们进一步引入熵加权、角度感知和密度感知的OT变体，以捕获不确定性、方向漂移和稀疏支持行为。在DARPA透明计算数据（涵盖Linux、Windows、BSD和Android）上的评估，涉及两个APT场景和十二个跨操作系统传输对，表明所提框架在ROC-AUC和nDCG上优于仅源域异常检测基线。结果表明，仅源域溯源建模结合语义抽象和基于OT的异常评分，可以在没有目标域监督的情况下支持实际的跨平台APT检测。

英文摘要

Advanced Persistent Threats (APTs) are stealthy, multi-stage cyberattacks whose detection is difficult due to scarce labeled traces, severe class imbalance, and the challenge of generating realistic malicious behavior. These challenges are amplified in cross-operating-system (cross-OS) settings, where a detector trained on one source platform must be deployed on an unlabeled target platform without access to target-domain labels. We study this source-only cross-OS APT detection problem using system-level provenance traces and propose a transport-based framework for ranking anomalous target processes under zero target supervision. The framework abstracts process behavior into structured natural-language descriptions, embeds them using pretrained language models, and constructs a source-normal reference for target scoring. It combines three evidence channels: semantic deviation from source-normal prototypes, structural deviation captured by graph autoencoding, and geometric deviation measured through Optimal Transport (OT). The main contribution is an OT-based barycentric anomaly score that projects target embeddings onto the source-normal manifold and quantifies residual transport mismatch. We further introduce entropy-weighted, angle-aware, and density-aware OT variants to capture uncertainty, directional drift, and sparse-support behavior. Evaluation on DARPA Transparent Computing data spanning Linux, Windows, BSD, and Android, across two APT scenarios and twelve cross-OS transfer pairs, shows that the proposed framework improves ROC-AUC and nDCG over source-only anomaly-detection baselines. The results demonstrate that source-only provenance modeling, combined with semantic abstraction and OT-based anomaly scoring, can support practical cross-platform APT detection without target-domain supervision.

2606.10277 2026-06-10 cs.LG 新提交

A Unified Adaptive Feature Composition Framework for Multi-Task Generalization in Wireless Foundation Models

无线基础模型中多任务泛化的统一自适应特征组合框架

Yuxuan Shi, Tingting Yang, Kangning Ma, Liwen Jing, Yuwei Wang, Mengfan Zheng, Li Sun

发表机构 * Department of Broadband Communication, Pengcheng Laboratory（宽带通信系，鹏城实验室）； Purple Mountain Laboratories（紫金山实验室）

AI总结提出RAFC路由适配器，通过轻量级任务驱动网络动态组合Transformer各层隐藏特征，实现无线基础模型的多任务泛化，仅增加少于50K参数。

AI中文摘要

尽管无线基础模型（WFM）在学习通用信道表示方面展现出强大潜力，但其适应各种下游任务仍受现有范式限制。微调策略引入了大量计算和存储开销，而冻结特征提取则导致跨不同下游任务的次优性能。为解决此问题，我们提出了一种用于WFM多任务泛化的统一自适应特征组合框架，其关键组件是用于特征组合的路由适配器（RAFC）。该路由器并非仅提取最后一层输出，而是将来自不同Transformer深度的隐藏状态视为可复用的多级隐藏特征池，并采用轻量级任务驱动特征组合网络生成逐层聚合权重，然后通过加权求和自适应地组合层次化表示。这种设计使每个下游任务能够访问合适的低、中、高级无线特征混合，而无需修改预训练骨干网络。在四个代表性无线任务上的大量实验表明，RAFC在引入少于50K额外参数的情况下，始终优于传统的适应基线。此外，学习到的路由权重提供了任务特定层偏好的可解释证据，使所提框架成为将WFM适应于各种下游场景的低复杂度、可扩展且可解释的接口。

英文摘要

Though wireless foundation models (WFMs) have shown strong potential in learning universal channel representations, their adaptation to various downstream tasks remains constrained by existing paradigms. Fine-tuning strategies introduce substantial computational and storage overhead, while frozen feature extraction leads to sub-optimal performance across diverse downstream tasks. To address this issue, we propose a unified adaptive feature composition framework for multitask generalization in WFMs, where the key component is the Routing Adapter for Feature Composition (RAFC). Instead of extracting only the final-layer output, this router treats the hidden states from different Transformer depths as a reusable pool of multi-level hidden features, employs a lightweight task-driven feature composition network to generate layer-wise aggregation weights, and then adaptively combines hierarchical representations through weighted summation. This design enables each downstream task to access a suitable mixture of low-, mid-, and high-level wireless features without modifying the pretrained backbone. Extensive experiments on four representative wireless tasks demonstrate that RAFC consistently outperforms conventional adaptation baselines while introducing fewer than 50K additional parameters. Moreover, the learned routing weights provide interpretable evidence of task-specific layer preferences, making the proposed framework a low-complexity, scalable, and explainable interface for adapting WFMs to diverse downstream scenarios.
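
The layer-wise weighted summation described above can be sketched in a few lines. The version below assumes the aggregation weights are a softmax over per-layer scores. In the actual method those scores would come from the task-driven composition network, whereas here they are fixed numbers, and the function names and toy hidden states are hypothetical.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def compose_layers(hidden_states, layer_scores):
    """Weighted sum of per-layer features with softmax-normalized weights."""
    weights = softmax(layer_scores)
    dim = len(hidden_states[0])
    combined = [0.0] * dim
    for w, h in zip(weights, hidden_states):
        for i in range(dim):
            combined[i] += w * h[i]
    return combined, weights

# Three toy "layers" of a frozen backbone, each a 2-dimensional feature.
hidden_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Equal scores give a plain average of the layers.
combined, weights = compose_layers(hidden_states, [0.0, 0.0, 0.0])

# A large score on the last layer approaches final-layer-only extraction.
last_only, last_weights = compose_layers(hidden_states, [0.0, 0.0, 50.0])
```

The backbone features are only read, never modified, so the trainable part is limited to whatever produces the layer scores. Inspecting `weights` per task is what would make the layer preferences interpretable.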

2606.11182 2026-06-10 cs.LG cs.AI 新提交

EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents

EEVEE：面向真实世界自改进智能体的测试时提示学习

Weixian Xu, Shilong Liu, Mengdi Wang

发表机构 * Shanghai Jiao Tong University（上海交通大学）； Princeton University（普林斯顿大学）

AI总结提出首个多数据集测试时提示学习框架EEVEE，通过路由器与提示协同进化策略解决跨数据集干扰，在异构数据流下提升鲁棒性。

Comments 19 pages, 6 figures

AI中文摘要

本文提出EEVEE，首个面向LLM智能体的多数据集测试时提示学习框架，能够在真实世界任务流下实现测试时提示学习。现有方法主要针对单数据集设计，而实际应用要求模型处理来自多个数据集、领域和任务分布的异构输入流，限制了其实用性。为减轻跨数据集干扰，EEVEE引入一个路由器，将输入划分为任务簇并分配适当的提示配置。该设计通过路由器-提示协同进化策略进行优化，该策略采用交错的路由器和提示学习阶段来解决它们的相互依赖关系。跨多个数据集的实验表明，该框架在异构数据流下提高了鲁棒性，同时保持了单基准学习能力和效率。具体而言，EEVEE在Qwen3-4B-Instruct和DeepSeek-V3.2上平均多基准分数分别提高了10.38和24.32分，超过SOTA方法GEPA和ACE高达37.2%和48.2%。

英文摘要

In this paper, we propose EEVEE, the first multi-dataset test-time prompt learning framework for LLM agents, enabling test-time prompt learning under real-world task streams. Existing methods are largely designed for single-dataset settings, while real-world applications require models to handle heterogeneous input streams drawn from multiple datasets, domains, and task distributions, limiting their practical applicability. To mitigate cross-dataset interference, EEVEE introduces a router that partitions incoming inputs into task clusters and assigns them to suitable prompt configurations. This design is optimized via a router-prompt co-evolution strategy, which employs interleaved router and prompt learning phases to address their mutual dependency. Experiments across multiple datasets demonstrate that the framework improves robustness under heterogeneous data streams while maintaining single-benchmark learning capability and efficiency. Specifically, EEVEE improves average multi-benchmark scores by 10.38 and 24.32 points over Qwen3-4B-Instruct and DeepSeek-V3.2, surpassing SOTA methods GEPA and ACE by up to 37.2% and 48.2%.

2606.10327 2026-06-10 cs.CL cs.LG 交叉投稿

The Order Matters: Sequential Fine-Tuning of LLaMA for Coherent Automated Essay Scoring

顺序重要：LLaMA的序列微调用于连贯的自动作文评分

Ali Keramati, Mark Warschauer

发表机构 * University of California, Irvine（加州大学尔湾分校）

AI总结提出对LLaMA-3.1-8B进行任务感知的序列微调，按作文话语结构顺序训练，在PERSUADE 2.0语料上证据F1达65%、结论F1达87%，超越独立训练和70B基线，证明课程设计可提升自动作文评分性能。

AI中文摘要

自动作文评分（AES）系统必须判断相互依赖的话语元素（如引言、立场、证据、结论），但大多数方法孤立地处理这些元素，损害了连贯性和泛化能力。我们研究了对LLaMA-3.1-8B进行任务感知的微调，用于AES，使用参数高效的LoRA和4位量化，并比较了三种训练课程：（i）序列式（依次在引言、立场、主张、证据、结论上微调），（ii）独立式（任务特定模型），以及（iii）随机式（打乱的多任务）。在PERSUADE 2.0语料上的实验表明，建模任务依赖性很重要：序列微调取得了最强的整体结果，包括证据的F1分数65%和结论的87%，以及相应的准确率63%和85%，超越了独立训练，并且在结论上优于通用LLaMA-70B基线，尽管后者容量大得多。随机训练提高了立场评分（57% F1），但在其他地方一致性较差。这些发现表明：（1）与话语结构对齐的课程设计可以实质性地改善AES，以及（2）小型、任务优化的模型可以与显著更大的大型语言模型（LLM）竞争，为可扩展、成本效益高的评估提供了实用途径。我们发布模板和实现细节，以促进复现和未来在教育NLP中课程设计的工作。

英文摘要

Automated Essay Scoring (AES) systems must judge interdependent discourse elements (e.g., lead, claim, evidence, conclusion), yet most approaches treat these in isolation, harming coherence and generalization. We investigate task-aware fine-tuning of LLaMA-3.1-8B for AES using parameter-efficient LoRA with 4-bit quantization and compare three training curricula: (i) Sequential (progressively fine-tuning on lead, then position, then claim, then evidence, then conclusion), (ii) Independent (task-specific models), and (iii) Randomized (shuffled multi-task). Experiments on the PERSUADE 2.0 corpus show that modeling task dependencies matters: Sequential fine-tuning yields the strongest overall results, including F1 scores of 65% (evidence) and 87% (conclusion) and corresponding accuracies of 63% and 85%, surpassing Independent training and outperforming a general-purpose LLaMA-70B baseline on conclusion despite its far larger capacity. Randomized training improves position scoring (57% F1) but is less consistent elsewhere. These findings indicate that (1) curriculum design aligned with discourse structure can materially improve AES, and (2) small, task-optimized models can be competitive with substantially larger Large Language Models (LLMs), offering a practical path to scalable, cost-effective assessment. We release templates and implementation details to facilitate reproduction and future work on curriculum design for educational NLP.

2606.10918 2026-06-10 cs.RO cs.LG 交叉投稿

Task Robustness via Re-Labelling Vision-Action Robot Data

通过重新标注视觉-动作机器人数据的任务鲁棒性

Artur Kuramshin, Özgür Aslan, Cyrus Neary, Glen Berseth

发表机构 * Mila — Quebec AI Institute（Mila — 魁北克人工智能研究所）； Université de Montréal（蒙特利尔大学）； The University of British Columbia（不列颠哥伦比亚大学）

AI总结提出TREAD框架，利用大型视觉语言模型对机器人数据集进行语义子任务分解和多样化指令生成，无需额外数据收集，提升策略在未见任务上的泛化能力。

Comments Project website: https://akuramshin.github.io/tread

AI中文摘要

近年来，机器人学习模型规模的扩大产生了令人印象深刻的策略，能够执行各种操作任务并泛化到新场景。然而，这些策略在遵循指令方面仍然存在困难，很可能是因为现有机器人数据集中的语言和动作序列多样性有限。本文介绍了通过重新标注视觉-动作机器人数据实现任务鲁棒性（TREAD），这是一个可扩展的框架，利用大型视觉语言模型（VLM）在不进行额外数据收集的情况下增强现有机器人数据集，利用这些模型中嵌入的可迁移知识。我们的方法通过三个阶段利用预训练的VLM：从原始指令标签和初始场景生成语义子任务，根据这些子任务对演示视频进行分割，并生成包含对象属性的多样化指令，有效地将较长的演示分解为基于语言-动作对。我们进一步通过用语言多样化的文本目标版本增强数据来提高鲁棒性。在LIBERO上的评估表明，在我们增强的数据集上训练的策略在未见过的、新颖的任务和目标上表现出改进的性能。我们的结果表明，TREAD通过轨迹分解增强了规划泛化，并通过增加语言多样性增强了语言条件策略泛化。

英文摘要

The recent trend in scaling models for robot learning has resulted in impressive policies that can perform various manipulation tasks and generalize to novel scenarios. However, these policies continue to struggle with following instructions, likely due to the limited linguistic and action sequence diversity in existing robotics datasets. This paper introduces Task Robustness via Re-Labelling Vision-Action Robot Data (TREAD), a scalable framework that leverages large Vision-Language Models (VLMs) to augment existing robotics datasets without additional data collection, harnessing the transferable knowledge embedded in these models. Our approach leverages a pretrained VLM through three stages: generating semantic sub-tasks from original instruction labels and initial scenes, segmenting demonstration videos conditioned on these sub-tasks, and producing diverse instructions that incorporate object properties, effectively decomposing longer demonstrations into grounded language-action pairs. We further enhance robustness by augmenting the data with linguistically diverse versions of the text goals. Evaluations on LIBERO demonstrate that policies trained on our augmented datasets exhibit improved performance on novel, unseen tasks and goals. Our results show that TREAD enhances both planning generalization through trajectory decomposition and language-conditioned policy generalization through increased linguistic diversity.

2507.14725 2026-06-10 cs.LG cs.AI 版本更新

MMClima：多模态气候科学数据与评估框架

Muhammad Umer Sheikh, Hassan Abid, Khawar Shehzad, Ufaq Khan, Muhammad Haris Khan

AI总结提出MMClima，一个包含10万+专家验证问答对的多模态气候问答框架，覆盖文本、视频和图表，用于评估多模态语言模型在气候科学中的表现。

AI中文摘要

气候变化研究日益需要能够推理文本、动态视觉内容和科学图表的AI系统，然而现有的气候问答基准规模小、大多为文本，且覆盖模型范围狭窄。我们提出MMClima，一个大规模多模态气候问答框架，包含10.4万+专家验证的问答对，涵盖五个核心气候科学领域的文章、视频转录和图表。MMClima通过自动化的声明提取和问答合成构建，并采用人在回路验证以确保规模和可靠性。利用MMClima，我们在需要事实回忆、视觉解释和跨模态合成的任务上对最先进的多模态语言模型进行基准测试。此外，我们在文本分割上进行微调，得到mmclima-70b-txt，一个领域适应的基线模型，在文本问答上优于强大的开源和闭源模型。我们发布数据集、评估流程、微调模型权重和数据创建框架，以支持气候科学的标准多模态评估。

英文摘要

Climate change research increasingly requires AI systems that reason across text, dynamic visual content, and scientific figures, yet existing climate QA benchmarks are small, mostly textual, and cover a narrow range of models. We introduce MMClima, a large-scale multimodal climate question answering framework with 104k+ expert-validated question-answer pairs spanning articles, video transcriptions, and figures across five core climate science domains. MMClima is constructed via automated claim extraction and QA synthesis with human-in-the-loop validation to ensure both scale and reliability. Using MMClima, we benchmark state-of-the-art multimodal language models on tasks requiring factual recall, visual interpretation, and cross-modal synthesis. We additionally fine-tune on the textual split to produce mmclima-70b-txt, a domain-adapted baseline that outperforms strong open- and closed-source models on textual QA. We release the dataset, evaluation pipeline, fine-tuned model weights, and data creation framework to support standardized multimodal evaluation for climate science.

2606.10287 2026-06-10 cs.LG cs.CL 新提交

WHU-Infra3D：面向3D路边基础设施清单的全栈多模态数据集与基准

Chong Liu, Luxuan Fu, Xuyu Feng, Zhen Dong, Bisheng Yang

发表机构 * State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS)（信息工程测绘遥感国家重点实验室）； Wuhan University（武汉大学）

AI总结提出WHU-Infra3D多模态基准数据集，覆盖三城市53.8公里，融合全景图像与LiDAR点云，提供2D-3D实例关联和跨帧跟踪，支持基础设施状态诊断与属性识别，填补自动化维护数据集空白。

AI中文摘要

数字孪生城市的范式正从粗略的视觉映射转向更精确、可操作的城市资产数字化。然而，现有数据集主要关注粗略的视觉感知，缺乏自动化基础设施维护所需的严格多模态对齐和属性及状态诊断。为弥合这一差距，我们引入了WHU-Infra3D，一个大规模、多模态的基准数据集，专门用于路边基础设施清单。覆盖三个城市53.8公里，WHU-Infra3D独特地集成了全景图像和LiDAR点云，并具有严格的2D-3D实例关联和跨帧跟踪。该数据集包含超过17.5万个多视图2D边界框以及数千个3D基础设施实例，提供了超过18.1万个详细的属性和状态注释（例如，锈蚀、遮挡），以支持运行健康评估。我们在五个核心任务上建立了全面的基线：2D检测、2D跨视图匹配、3D地理识别、3D点云分割和属性识别。广泛的评估暴露了当前模型在长尾缺陷状态上的显著跨城市领域差距和固有脆弱性，使WHU-Infra3D成为推进可扩展、AI驱动的城市基础设施清单和生命周期管理的重要试验场。WHU-Infra3D数据集可在以下网址获取：https://github.com/WHU-USI3DV/WHU-Infra3D。

英文摘要

The paradigm of digital twin cities is shifting from coarse visual mapping toward more precise and actionable digitization of urban assets. However, existing datasets predominantly focus on coarse visual perception, lacking the strict multi-modal alignment and attribute and status diagnosis required for automated infrastructure maintenance. To bridge this gap, we introduce WHU-Infra3D, a large-scale, multi-modal benchmark dataset dedicated to roadside infrastructure inventory. Covering 53.8 km across three cities, WHU-Infra3D uniquely integrates panoramic imagery and LiDAR point clouds with rigorous 2D-3D instance association and cross-frame tracking. Comprising over 175k multi-view 2D bounding boxes alongside thousands of 3D infrastructure instances, the dataset provides over 181k detailed attribute and status annotations (e.g., rust, occlusion) to empower operational health assessment. We establish comprehensive baselines across five core tasks: 2D detection, 2D cross-view matching, 3D geo-identification, 3D point cloud segmentation, and attribute recognition. Extensive evaluations expose significant cross-city domain gaps and inherent vulnerabilities of current models on long-tailed defective statuses, establishing WHU-Infra3D as an essential testbed for advancing scalable, AI-driven urban infrastructure inventory and lifecycle management. The WHU-Infra3D dataset is available at https://github.com/WHU-USI3DV/WHU-Infra3D.


2606.10066 2026-06-10 cs.CV cs.AI cs.LG 交叉投稿

A Controlled Audit of Pretraining Contamination in Public Medical Vision-Language Benchmarks

公共医学视觉语言基准中预训练污染的受控审计

Bruce Changlong Xu, Lan Wu, Alexander Ryu

AI总结审计发现公共医学VLM基准存在图像源重叠和文本规范顺序交换性信号，但确认的像素级重复罕见，且现有成员推理检测器在小规模医学VLM队列中不可靠。

Comments 30 pages, 7 figures, 9 tables. Preprint


AI中文摘要

医学视觉语言模型（VLM）在公共基准上进行评估，这些基准的图像和问答对多年来一直可自由下载，但报告准确度假设这些示例在预训练中不存在。我们对SLAKE-En、PathVQA、VQA-RAD以及一个辅助的公共OmniMedVQA镜像上的开放VLM进行了审计，使用了四种检测器系列：图像侧近邻重叠（针对PMC-OA-beta）、规范顺序可交换性、队列相对Min-K%++尾部富集以及跨模型Top-K重叠。我们发现SLAKE-En上存在可测量的图像侧源重叠：SigLIP-B-16标记了19.8%的图像，SigLIP-SO400M标记了4.2%，而域外对照产生0/2000个标记。人工裁定显示，相同模态、相同投影的匹配对应不同患者，而非经过验证的像素级重复，因此我们将其解释为源或分布重叠，而非确认的每图像记忆。在文本侧，Qwen2.5-VL在SLAKE-En上显示出规范顺序可交换性信号，该信号在顺序消融和外部非医学基线中仍然存在。在OmniMedVQA镜像上，五个医学和通用VLM触发了可交换性，而BLIP-2保持干净。相比之下，队列相对Min-K%++尾部富集和跨模型Top-K重叠在外部预域基线中崩溃：BLIP-2重现了明显的正信号，尽管缺乏合理的医学VQA暴露。我们得出结论，这些队列相对检测器作为小规模医学VLM队列上的独立成员推理信号是不可靠的。

英文摘要

Medical vision-language models (VLMs) are evaluated on public benchmarks whose images and question-answer pairs have been freely downloadable for years, yet reported accuracy assumes these examples were absent from pretraining. We audit open VLMs on SLAKE-En, PathVQA, VQA-RAD, and an auxiliary public OmniMedVQA mirror using four detector families: image-side near-neighbour overlap against PMC-OA-beta, canonical-order exchangeability, cohort-relative Min-K%++ tail enrichment, and cross-model top-K overlap. We find measurable image-side source overlap on SLAKE-En: 19.8% of images are flagged under SigLIP-B-16 and 4.2% under SigLIP-SO400M, while out-of-domain controls produce 0/2000 flags. Manual adjudication shows same-modality, same-projection matches to different patients rather than verified pixel-level duplicates, so we interpret this as source or distributional overlap rather than confirmed per-image memorization. On the text side, Qwen2.5-VL on SLAKE-En shows a canonical-order exchangeability signal that survives ordering ablation and external non-medical baselines. On the OmniMedVQA mirror, exchangeability fires for five medical and general VLMs while BLIP-2 remains clean. In contrast, cohort-relative Min-K%++ tail enrichment and cross-model top-K overlap collapse under an external pre-domain baseline: BLIP-2 reproduces the apparent positive signals despite lacking plausible medical-VQA exposure. We conclude that these cohort-relative detectors are unreliable as standalone membership-inference signals on small medical-VLM cohorts.


2606.10087 2026-06-10 cs.CL cs.LG 交叉投稿

CodeAlchemy: Synthetic Code Rewriting at Scale

CodeAlchemy：大规模合成代码重写

Ankit Gupta, Aditya Prasad, Rameswar Panda

AI总结提出CodeAlchemy框架，通过5种策略生成超过500B token的合成代码数据，引入DevEval和TraceEval基准，3B模型在多项任务上超越10倍大小的前沿模型。


AI中文摘要

在原始代码上预训练可以学习语法，但为多样化的真实世界任务格式提供的信号稀疏。虽然合成数据已被证明对语言模型具有变革性，但代码领域除有限的质量改进外仍基本未被探索。我们提出CodeAlchemy，一个合成数据生成框架，通过5种策略将公开来源的代码转换为语义丰富的训练数据：CodeEnhance（质量感知重写）、CodeQA（基于模板的问题）、CodeDev（开发者任务）、CodeDialogue（多轮对话）和CodeTrace（执行轨迹）。我们处理了15种语言的3个语料库，生成了超过500B token的合成数据以及350B推理token，数量级远超先前工作。CodeTrace对14种语言和5K个库的1.3M+文件进行插桩和执行，捕获控制流、状态跟踪和库知识。我们引入了DevEval（开发者任务）和TraceEval（执行预测）基准；前沿模型如Claude Sonnet 4.5在TraceEval上仅达到5.6%的精确匹配，揭示了语义理解的关键差距。我们的3B模型在HumanEval上达到83.5%，在MBPP上达到63.2%，在DevEval上达到8.09%的胜率，在TraceEval上达到15.36 ROUGE-2，超越了包括27B Gemma-3和32B Granite-4.0在内的10倍大小的前沿模型。

英文摘要

Pre-training on raw code teaches syntax but provides sparse signal for diverse real-world task formats. While synthetic data has proven transformative for language models, code remains largely unexplored beyond limited quality improvements. We present CodeAlchemy, a synthetic data generation framework that transforms publicly sourced code into semantically-rich training data through 5 strategies: CodeEnhance (quality-aware rewriting), CodeQA (template-based problems), CodeDev (developer tasks), CodeDialogue (multi-turn conversations), and CodeTrace (execution traces). We process 3 corpora across 15 languages to generate 500B+ tokens of synthetic data plus 350B reasoning tokens, orders of magnitude more than prior efforts. CodeTrace instruments and executes 1.3M+ files across 14 languages and 5K libraries, capturing control flow, state tracking, and library knowledge. We introduce DevEval (developer tasks) and TraceEval (execution prediction) benchmarks; frontier models like Claude Sonnet 4.5 achieve only 5.6% exact match on TraceEval, revealing critical gaps in semantic understanding. Our 3B models achieve 83.5% on HumanEval, 63.2% on MBPP, 8.09% win rate on DevEval, and 15.36 ROUGE-2 on TraceEval, outperforming frontier models 10x the size including 27B Gemma-3 and 32B Granite-4.0.


2606.10125 2026-06-10 stat.ML cs.DB cs.LG 交叉投稿

Robust Active Learning for Few-Shot Example Selection in Text-to-SQL

鲁棒主动学习用于文本到SQL中的少样本示例选择

Arash Pourhabib

发表机构 * NVIDIA

AI总结针对文本到SQL中少样本示例选择，提出一种鲁棒主动学习方法，通过分层贪婪算法最大化异方差互信息目标，在嵌入流形上实现常数因子近似保证，显著减少标注成本。

Comments 31 pages, 4 figures, 5 tables


AI中文摘要

少样本示例检索是将大型语言模型（LLM）应用于特定领域文本到SQL系统的主要范式。然而，标注示例库的质量直接决定系统准确性，且专家标注成本高昂。我们将这些示例的主动选择形式化为一个在语义查询嵌入的内在低维流形上的约束实验设计问题。与标准主动学习框架不同，我们的设置引入了三个关键挑战：依赖于查询的可变标注可靠性（异方差性）、跨语义主题的空间多样性严格要求（划分拟阵约束），以及嵌入空间真实协方差结构未知的固有现实（模型误设）。为了解决这些问题，我们提出了一种分层贪婪算法，该算法最大化异方差互信息目标。我们证明该目标在内在流形上保持子模性和近似单调性，从而得到理论上的常数因子近似保证。我们建立了一个谱界，表明当假设的替代核与真实数据生成过程存在偏差时，该近似保证会优雅地退化，而非灾难性地崩溃。实验结果表明，所提出的策略显著减少了标注工作量，同时保持了较高的文本到SQL检索准确性。

英文摘要

Few-shot example retrieval is the dominant paradigm for grounding large language models (LLMs) in domain-specific text-to-SQL systems. However, the quality of the annotated example bank directly governs system accuracy, and expert annotation is prohibitively expensive. We formalize the active selection of these examples as a constrained experimental design problem over the intrinsic, low-dimensional manifold of semantic query embeddings. Unlike standard active learning frameworks, our setting introduces three critical challenges: varying, query-dependent annotation reliability (heteroscedasticity), strict requirements for spatial diversity across semantic topics (partition matroid constraints), and the inherent reality that the true covariance structure of the embedding space is unknown (misspecification). To address these, we propose a stratified greedy algorithm that maximizes a heteroscedastic mutual information objective. We prove that this objective remains submodular and approximately monotonic on the intrinsic manifold, yielding a theoretical constant-factor approximation guarantee. We establish a spectral bound demonstrating that this approximation guarantee degrades gracefully, rather than catastrophically, when the assumed surrogate kernel diverges from the true underlying data-generating process. Empirical results demonstrate that the proposed strategy significantly reduces labeling effort while maintaining high text-to-SQL retrieval accuracy.
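
The stratified greedy idea in this abstract can be illustrated with a generic sketch. Everything below is an assumption for illustration: the function name `stratified_greedy`, the per-stratum `capacity` map, and the caller-supplied `gain` function are hypothetical, and the paper's heteroscedastic mutual information objective is not reproduced here. The sketch only shows greedy selection under a partition matroid, where each semantic stratum may contribute at most a fixed number of examples.

```python
def stratified_greedy(candidates, stratum_of, capacity, gain, budget):
    """Greedy selection under a partition matroid (illustrative sketch).

    candidates: list of item ids
    stratum_of: dict mapping item id -> stratum label
    capacity:   dict mapping stratum label -> max number of picks
    gain:       callable (selected_list, item) -> marginal gain (hypothetical objective)
    budget:     overall maximum number of picks
    """
    selected = []
    used = {}  # picks taken so far per stratum
    remaining = list(candidates)
    while remaining and len(selected) < budget:
        # Keep only items whose stratum still has room.
        feasible = [c for c in remaining
                    if used.get(stratum_of[c], 0) < capacity[stratum_of[c]]]
        if not feasible:
            break
        # Pick the feasible item with the largest marginal gain.
        best = max(feasible, key=lambda c: gain(selected, c))
        selected.append(best)
        used[stratum_of[best]] = used.get(stratum_of[best], 0) + 1
        remaining.remove(best)
    return selected
```

With a submodular `gain` (for example, a coverage count), this kind of greedy rule is the standard starting point for constant-factor guarantees; whether the same constant applies to the paper's objective is a claim of the paper, not of this sketch.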


2606.10187 2026-06-10 stat.ML cs.LG 交叉投稿

Decision-Calibrated Conformal Uncertainty for Pacing Decisions in Streaming Advertising

面向流式广告中节奏控制的决策校准共形不确定性

Prashant Shekhar, Caroline Howard

发表机构 * Department of Mathematics, Embry-Riddle Aeronautical University（数学系，埃姆伯里-瑞德航空大学）

AI总结提出一种决策校准共形框架，通过衡量预测误差对实际部署策略的最大影响来校准不确定性，理论证明该分数是保护所有可部署节奏控制策略的最小有效不确定性度量，并在公开数据集上显著降低不确定性半径。


AI中文摘要

我们开发了一个决策校准的共形框架，用于流式广告中的节奏控制决策。节奏控制依赖于不确定的未来库存、需求压力、增量响应和会员体验负载。该框架不是校准通用的预测残差，而是以预测误差对实际可能部署的策略的最大影响来衡量预测误差。主要定理表明，所提出的分数是统一保护所有可部署节奏控制策略的最小有效不确定性度量。几何上，它是有符号策略敏感性集的支撑函数。分裂共形校准为该分数提供了有限样本覆盖。一个高维分离定理表明，传统的残差校准会为无关的库存维度付出代价，因而可能任意地更加保守；一个鲁棒的节奏控制结果则综合了库存、响应和体验不确定性。在基于Criteo Uplift和KuaiRand数据集构建的公开数据校准节奏控制回放中，传统共形节奏控制仍然无法给出确定的决策，在Criteo上残差半径高达7236.7，在KuaiRand上为4629.4。采用所提出的决策校准方法，不确定性半径分别降至18.4和278.6，并为价值、交付、预算和会员负载设置了单独的边际。在Criteo上，所提出的方法认证了一个比点预测基线更不激进的节奏控制策略，并将留出集上的任一违规率从16.7%降至3.3%，且预算和会员负载违规为零。在KuaiRand上，策略选择仍未解决。简而言之，本文确立了预测、响应估计和会员体验模型应根据它们是否缩小节奏控制决策所使用的不确定性来评判，因为这样才能得到自信且不过度保守的决策。

英文摘要

We develop a decision-calibrated conformal framework for pacing decisions in streaming advertising. Pacing depends on uncertain future inventory, demand pressure, incremental response, and member-experience load. Instead of calibrating a generic forecast residual, the framework measures forecast error by its largest impact on the policies that could actually be deployed. The main theorem shows that the proposed score is the smallest valid uncertainty measure that uniformly protects all deployable pacing policies. Geometrically, it is the support function of the signed policy sensitivity set. Split conformal calibration gives finite-sample coverage for this score. A high-dimensional separation theorem shows that traditional residual calibration can be arbitrarily more conservative by paying for nuisance inventory dimensions, and a robust pacing result combines inventory, response, and experience uncertainty. On public-data-calibrated pacing replays built from Criteo Uplift and KuaiRand datasets, traditional conformal pacing remains unresolved with high residual radii of 7236.7 on Criteo and 4629.4 on KuaiRand. With the proposed decision calibration approach, the uncertainty radii are reduced to 18.4 and 278.6 respectively, with separate margins for value, delivery, budget, and member load. On Criteo, the proposed method certifies a less aggressive pacing policy than the point-forecast baseline, and reduces held-out any-violation rate from 16.7% to 3.3%, with zero budget and member-load violations. On KuaiRand, the choice remains unresolved. In a nutshell, the paper establishes that forecasts, response estimates, and member-experience models should be judged by whether they shrink the uncertainty that the pacing decision uses, as this leads to confident decisions that are not overly conservative.
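
Two ingredients named in this abstract, a support-function score and split conformal calibration, can be sketched in a few lines. This is a minimal illustration under stated assumptions: the policy sensitivity set is taken to be a finite list of vectors, and the names `decision_score` and `split_conformal_radius` are hypothetical rather than taken from the paper.

```python
import math


def decision_score(error, sensitivities):
    """Support function of a finite sensitivity set evaluated at `error`.

    Returns the largest signed inner product <s, error> over the candidate
    sensitivity vectors s (assumed finite here for illustration).
    """
    return max(sum(s_i * e_i for s_i, e_i in zip(s, error))
               for s in sensitivities)


def split_conformal_radius(scores, alpha):
    """Finite-sample split conformal quantile of calibration scores.

    Uses rank ceil((n + 1) * (1 - alpha)); returns infinity when the
    calibration set is too small to certify the requested level.
    """
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf
    return sorted(scores)[k - 1]
```

In use, one would compute `decision_score` for each held-out calibration error and pass the resulting list to `split_conformal_radius`. Directions of forecast error that no candidate policy is sensitive to contribute nothing to the score, which is the intuition behind the smaller radii reported above.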


2606.10229 2026-06-10 cs.RO cs.LG 交叉投稿

What Demonstration Curation Metrics Do to Your Policy

演示筛选指标对策略的影响

Aarav Bedi

AI总结研究演示筛选指标在检测缺陷演示后，是否提升基于行为克隆的策略性能。发现指标检测缺陷的能力与策略性能严重脱钩，并揭示演示时长作为混淆变量的影响。

Comments 6 pages, 1 figure, 2 tables

详情

AI中文摘要

我们研究了检测缺陷训练演示的筛选指标是否也能改善基于筛选数据训练的行为克隆策略。在一个接触密集的LIBERO抓取放置基准任务中，通过引入受控结构缺陷（搬运阶段早期释放夹爪），我们发现这两个量是严重解耦的。具有最高缺陷检测AUROC（0.804）的指标产生了最差的筛选策略（任务成功率13.3%），而AUROC显著较低（0.638）的指标产生的策略几乎与在真实干净数据上训练的Oracle策略相匹配（90.0% vs. 93.3%）。我们进一步表明，我们评估的七个指标中有五个利用演示时长作为缺陷标签的琐碎代理，这种混淆因素将报告的AUROC膨胀到接近完美的值，并且在控制演示时长后消失。在所有条件下，受污染的基线仅在3.3%的测试中成功，而两种最佳的筛选方法将差距缩小到Oracle上限93.3%的3个百分点以内。我们的结果认为，筛选方法应根据其产生的策略来评估，而不是根据其标记的缺陷，并且任何筛选基准在报告检测准确性之前必须控制演示时长。我们发布了测试平台、所有指标实现和评估流程。

英文摘要

We study whether demonstration-curation metrics that detect defective training episodes also improve the downstream behavior-cloning policy that trains on the curated data. On a contact-rich LIBERO pick-and-place benchmark with a controlled structural defect (early gripper release during the carry phase), we find that the two quantities are sharply decoupled. The metric with the highest defect-detection AUROC (0.804) produces the worst curated policy (13.3% task success), while a metric with a substantially lower AUROC (0.638) produces a policy that nearly matches the oracle trained on ground-truth clean data (90.0% vs. 93.3%). We further show that five of the seven metrics we evaluate exploit episode length as a trivial proxy for the defect label, a confound that inflates reported AUROCs to near-perfect values and disappears once episode length is controlled. Across all conditions, the contaminated baseline succeeds on only 3.3% of rollouts, and the two best curation methods close this gap to within 3 percentage points of the 93.3% oracle ceiling. Our results argue that curation methods should be evaluated by the policy they produce, not the defects they flag, and that any curation benchmark must control for episode length before reporting detection accuracy. We release the testbed, all metric implementations, and the evaluation pipeline.


2606.10246 2026-06-10 cs.SD cs.AI cs.LG 交叉投稿

Linguistically Augmented Audio Speech Data (LinguAS)

语言增强音频语音数据 (LinguAS)

Ashley R. Keaton, Zahra Khanjani, Christine Mallinson, Vandana P. Janeja

发表机构 * University of Maryland, Baltimore County（马里兰大学巴尔的摩分校）

AI总结提出LinguAS数据集，通过专家定义的语言特征(EDLFs)增强音频数据，显著提升深度伪造语音检测模型性能。


AI中文摘要

恶意创建的伪造语音，包括深度伪造和欺骗音频，正以惊人速度扩散，检测模型竞相保持领先。然而，大多数检测模型仅基于帧级音频特征进行推理，未利用更大时间尺度上的有价值语言线索。为弥补这一空白，我们提出语言增强音频语音数据(LinguAS)，这是一个包含真实和深度伪造音频样本的数据集，标注了五种策略性选择的、专家定义的语言特征(EDLFs)，这些特征在英语口语中频繁出现且是自然人类语音的特征。LinguAS包含超过800个音频样本，每个样本都标注了EDLFs。数据集包含四种欺骗音频攻击类型的平衡数量以及相应数量的真实语音样本。我们还包含说话者性别和每个欺骗音频样本的生成器/来源元数据，为模型训练提供更细粒度信息。我们发现，使用EDLFs增强数据训练的模型性能显著超过ASVspoof 2021深度学习基线和HuBert、XLSR等SSL模型。LinguAS增强的语言、性别和生成器元数据为音频深度伪造研究者提供了一个强调真实人类语言特征的数据集，以改进伪造语音的模型推理。数据和代码已公开。

英文摘要

Maliciously created fake speech, including deepfaked and spoofed audio, is proliferating at an alarming rate, and detection models are racing to stay ahead of the curve. Yet most detection models are trained to make inferences from frame-level audio features alone, without leveraging valuable linguistic cues at larger timescales. To address this gap, we present Linguistically Augmented Audio Speech Data (LinguAS), a dataset of genuine and deepfaked audio samples annotated with five strategically chosen Expert-Defined Linguistic Features (EDLFs) that occur frequently in spoken English and are characteristic of natural human speech. LinguAS contains over 800 audio samples, each of which is annotated with EDLFs. The dataset has a balanced number of four spoofed audio attack types and a proportionate number of genuine speech samples. We also include metadata on speaker gender and the generator/source for each spoofed audio sample, offering more granularity for model training. We found that models trained on data augmented with EDLFs performed significantly better than the ASVspoof 2021 deep learning baselines and SSL models like HuBert and XLSR. LinguAS's augmented linguistic, gender, and generator metadata provide audio deepfake researchers with a dataset that emphasizes real human language traits to improve model inference of faked speech. Data and code are publicly available.


2606.10255 2026-06-10 eess.IV cs.CV cs.DL cs.LG physics.bio-ph 交叉投稿

POPSICLE: Benchmark Datasets for Segmentation and Localization in CryoET

POPSICLE: 用于冷冻电镜断层扫描中分割和定位的基准数据集

Jonathan Schwartz, Utz Heinrich Ermel, C. Braxton Owens, Zhuowen Zhao, Ariana Peck, Gus L. W. Hart, Grant J. Jensen, Bridget Carragher, Dari Kimanius

发表机构 * Biohub ； Brigham Young University

AI总结提出POPSICLE基准套件，基于CryoET数据门户构建，涵盖真核和原核系统、纯化与原位样本，支持体素分割和稀疏定位任务，旨在解决冷冻电镜断层扫描中缺乏标准化基准的问题。


AI中文摘要

冷冻电镜断层扫描（cryoET）通过直接可视化完整细胞内的分子结构，将分子架构与细胞组织在天然环境中联系起来，已成为结构和细胞生物学中的强大工具。然而，实现cryoET的全部潜力日益依赖于计算分析，特别是机器学习（ML）的进步，以解释其复杂且信息丰富的数据。尽管进展迅速，cryoET的ML开发仍受限于缺乏标准化、良好注释的基准。现有评估通常规模小、任务特定且孤立构建，限制了方法间的稳健比较。在此，我们提出POPSICLE，一个基于CryoET数据门户（一个开放、ML就绪的断层数据、元数据和注释库）构建的cryoET分割和大分子定位基准套件。POPSICLE涵盖真核和原核系统、纯化和完全原位样本，以及密集体素分割和稀疏定位任务。基于动态数据资源，它可随着新数据集和注释的出现而扩展。基线实验揭示了模型排名在不同任务间的显著变化，强调了需要针对cryoET独特特征定制的基准，而非从相邻生物医学成像领域借鉴的评估实践。因此，POPSICLE为cryoET中可重复的ML评估提供了开放且可扩展的基础。

英文摘要

Cryo-electron tomography (cryoET) has emerged as a powerful tool in structural and cellular biology by enabling direct visualization of macromolecular structures within intact cells, thereby linking molecular architecture to cellular organization in a native context. Realizing the full potential of cryoET, however, increasingly depends on advances in computational analysis, particularly machine learning (ML), to interpret its complex and information-rich data. Despite rapid progress, ML development for cryoET remains bottlenecked by the lack of standardized, well-annotated benchmarks. Existing evaluations are typically small, task-specific, and assembled in isolation, limiting robust comparisons across methods. Here, we present POPSICLE, a benchmark suite for cryoET segmentation and macromolecular localization built from the CryoET Data Portal, an open, ML-ready repository of tomographic data, metadata, and annotations. POPSICLE spans eukaryotic and prokaryotic systems, both purified and fully in situ samples, and dense voxel-wise segmentation as well as sparse localization tasks. Built on a living data resource, it can expand as new datasets and annotations become available. Baseline experiments reveal substantial variation in model rankings across tasks, underscoring the need for benchmarks tailored to the unique characteristics of cryoET rather than evaluation practices adapted from adjacent biomedical imaging domains. POPSICLE thus provides an open and extensible foundation for reproducible ML evaluation in cryoET.


2606.10295 2026-06-10 stat.ML cs.LG math.ST stat.TH 交叉投稿

$k$-Nearest Neighbors in Gromov--Wasserstein Space

Gromov--Wasserstein空间中的$k$-最近邻

Kaitlyn Hohmeier, Nicolas Fraiman, Caroline Moosmueller

发表机构 * University of North Carolina at Chapel Hill, Department of Mathematics（北卡罗来纳大学教堂山分校数学系）； University of North Carolina at Chapel Hill, Department of Statistics and Operations Research（北卡罗来纳大学教堂山分校统计与运筹学系）

AI总结本文在Gromov-Wasserstein距离框架下实现k-最近邻分类，证明了度量测度空间和图上分类器的普适一致性，并通过实验验证了其有效性。


AI中文摘要

Gromov--Wasserstein (GW) 距离为比较度量测度空间提供了一个框架，无论其底层结构或几何形状如何。对于基于网络的数据，它能够直接比较具有不同节点数量的图，无需嵌入或其他抽象。此外，通过GW的变体——融合Gromov--Wasserstein (fGW)，还可以在图结构之外结合节点特征。在这项工作中，我们使用GW和fGW距离实现了$k$-最近邻 ($k$-NN) 分类。我们证明了在具有有限支撑和均匀概率测度的度量测度空间等价类空间上，GW-$k$-NN分类器的普适一致性。通过将图视为具有成对距离度量和节点上均匀概率测度的有限支撑度量测度空间，我们获得了图空间上GW-$k$-NN的普适一致性。类似地，对于fGW-$k$-NN，我们证明了在由具有有限支撑和均匀概率测度的度量测度空间以及到欧几里得空间的特征映射组成的结构化对象的弱同构类空间上的普适一致性，从而建立了节点属性图空间上的普适一致性。我们的数值实验表明，GW-$k$-NN和fGW-$k$-NN在多个图数据集上始终表现良好，这表明诸如$k$-NN之类的度量分类器在GW框架中效果良好。

英文摘要

The Gromov--Wasserstein (GW) distance provides a framework for comparing metric measure spaces, regardless of their underlying structure or geometry. For network-based data, it enables direct comparisons of graphs with different numbers of nodes, without requiring an embedding or other abstraction. Furthermore, through a variant of GW known as fused Gromov--Wasserstein (fGW), it is also possible to incorporate node features in addition to graph structure. In this work, we implement $k$-nearest neighbors ($k$-NN) classification using the GW and fGW distances. We prove the universal consistency of the GW-$k$-NN classifier on the space of equivalence classes of metric measure spaces with finite support and uniform probability measure. By viewing graphs as finitely supported metric measure spaces equipped with the pairwise distance metric and a uniform probability measure on the nodes, we obtain universal consistency of GW-$k$-NN for the space of graphs. Likewise for fGW-$k$-NN, we prove universal consistency on the space of weak isomorphism classes of structured objects consisting of metric measure spaces with finite support and uniform probability measure and feature maps into Euclidean space, thus establishing universal consistency on the space of node-attributed graphs. Our numerical experiments show that GW-$k$-NN and fGW-$k$-NN consistently perform well across multiple graph datasets, suggesting that metric classifiers such as $k$-NN work well in the GW framework.
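
Because $k$-NN only needs pairwise distances, the classifier itself is short once GW (or fGW) distances are available. The sketch below assumes those distances have already been computed by some optimal-transport solver and are passed in as a plain list; the function name `knn_predict` and the tie-breaking rule are illustrative choices, not details from the paper.

```python
from collections import Counter


def knn_predict(dist_row, train_labels, k):
    """Classify one query object from its distances to the training objects.

    dist_row[i] is the (precomputed) distance from the query to training
    object i, for example a GW distance between two graphs.
    """
    # Indices of training objects sorted from nearest to farthest.
    order = sorted(range(len(dist_row)), key=lambda i: dist_row[i])
    votes = Counter(train_labels[i] for i in order[:k])
    # most_common keeps first-encountered order among equal counts, so ties
    # are broken in favour of the label whose member appears nearest first.
    return votes.most_common(1)[0][0]
```

Nothing here depends on the objects being graphs of equal size, which matches the abstract's point that GW allows direct comparison of graphs with different numbers of nodes.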


2606.10440 2026-06-10 cs.DC cs.LG cs.NI 交叉投稿

ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling

ASTRA-sim 3.0：通过高保真GPU和基础设施建模实现下一代分布式机器学习模拟

William Won, Jinsun Yoo, Tuan Ta, Moumita Dey, Andy Balogh, Pradosh Datta, Furkan Eris, Conor Green, Winston Liu, Changhai Man, Kingshuk Mandal, Amos Rai, Vinay Ramakrishnaiah, Ruchi Shah, David Sidler, Harsh Sikhwal, Hanjiang Wu, Tushar Krishna, Bradford M. Beckmann

发表机构 * AMD Research and Advanced Development（AMD研究与高级开发）； Georgia Institute of Technology（佐治亚理工学院）； Keysight ； Purdue University（普渡大学）

AI总结针对分布式机器学习中延迟敏感通信建模的不足，提出ASTRA-sim 3.0，通过细粒度缓存行级负载存储模拟和标准化基础设施表示InfraGraph，实现高保真模拟，支持优化集合算法、网络需求和GPU架构的设计空间探索。

Comments 10 pages, 15 figures, one table


CIAware-Bench: 评估前沿大语言模型的控制干预感知能力

Joachim Schaeffer, Thomas Jiralerspong, Alexander Panfilov, Guillaume Lajoie, Jonas Geiping, Yoshua Bengio, Roland S. Zimmermann

发表机构 * MATS ； Mila – Quebec AI Institute（Mila – 魁北克人工智能研究所）； Université de Montréal（蒙特利尔大学）； Astra Fellowship ； ELLIS Institute Tübingen, MPI for Intelligent Systems & Tübingen AI Center ； LawZero ； Google DeepMind

AI总结提出CIAware-Bench基准，通过四个任务域测试模型能否区分自身轨迹与被控制干预修改的轨迹，发现前沿模型在默认设置下感知能力低至中等，且因任务和模型对而异。


AI中文摘要

AI控制协议通过监控不可信模型的行为并修改潜在不安全步骤来监督它们，通常使用可信模型。这在一定程度上篡改了不可信模型的轨迹。如果不可信模型检测到这种干预，它可能推断监控器的属性并适应以逃避控制。我们引入CIAware-Bench，一个用于衡量前沿模型控制干预（CI）感知能力的基准。CIAware-Bench测试模型能否区分自身轨迹与被控制干预修改的轨迹。该基准包含四个任务域（短文写作、BigCodeBench、Bash Arena和SHADE-Arena），同时改变轨迹水印、侧任务存在和控制协议。评估十一个前沿模型，我们发现默认设置下CI感知能力低至中等（最高0.87；平衡二元分类的随机水平准确率为0.5），且在不同任务域和模型对之间存在显著差异。跨模型家族的检测通常更容易，表明模型利用了提供商特定的风格或后训练差异。总体而言，CI感知不是一个固定的模型级属性，应针对每个新模型发布和部署场景进行测量。我们发布CIAware-Bench以跟踪CI感知并为干预更难检测的控制协议提供信息。

英文摘要

AI control protocols oversee untrusted models by monitoring their actions and modifying potentially unsafe steps, often using a trusted model. This partially tampers with the untrusted model's trajectory. If the untrusted model detects such an intervention, it may infer properties of the monitor and adapt to evade control. We introduce CIAware-Bench, a benchmark for measuring control intervention (CI) awareness across frontier models. CIAware-Bench tests whether models can distinguish their own trajectories from those modified by a control intervention. The benchmark comprises four task domains (essay writing, BigCodeBench, Bash Arena, and SHADE-Arena) and varies trajectory watermarking, side-task presence, and the control protocol. Evaluating eleven frontier models, we find low to moderate CI awareness under default settings (up to 0.87; chance-level balanced binary classification accuracy is 0.5), with substantial variation across task domains and model pairs. Detection is generally easier across model families, suggesting that models exploit provider-specific differences in style or post-training. Overall, CI awareness is not a fixed model-level property and should be measured for each new model release and deployment scenario. We release CIAware-Bench to track CI awareness and inform control protocols whose interventions are harder to detect.


2606.11098 2026-06-10 cs.CR cs.LG 交叉投稿

Do Transformers Actually Help Intrusion Detection? A Temporal Sequence Evaluation on CIC-IDS2017

Transformer 真的有助于入侵检测吗？基于 CIC-IDS2017 的时间序列评估

Zach Moczkodan, Hany Ragab

发表机构 * Department of Electrical and Computer Engineering, Faculty of Engineering, Royal Military College of Canada (RMC)（加拿大皇家军事学院电气与计算机工程系，工程学院）

AI总结本研究重新将 CIC-IDS2017 构建为时间序列入侵检测任务，发现填充方式而非架构决定 Transformer 性能，且随机分割和填充方式会高估模型鲁棒性。

Comments 11 pages, 9 figures, 9 tables. Preprint. Code: https://github.com/zachmocz/temporal-ids-bench


AI中文摘要

近年来，用于网络入侵检测的深度学习方法越来越多地采用时间架构，如循环网络和 Transformer，通常在 CIC-IDS2017 上报告接近完美的性能。然而，许多现有研究既没有为其时间模块提供真实的序列输入，也没有在现实、无泄漏的条件下进行评估，使得报告的性能提升是否源于真正的序列建模能力尚不清楚。在这项工作中，我们通过从网络对话中构建有序流序列，并在随机分割、两种无泄漏分割以及填充方案消融下对九种经典和深度学习架构进行基准测试，将 CIC-IDS2017 重新表述为时间入侵检测任务。核心发现是，填充惯例而非架构决定了 Transformer 的性能：在真正的序列（非填充）窗口上，Transformer 实现了实验中所有模型的最高 macro-F1（0.89）；在零填充+掩码评估下，其性能显著下降（-0.24 macro-F1），而 LSTM、GRU 和 1D-CNN 保持稳定。在无泄漏组评估下，随机森林是最稳健的模型（+0.009），而 Transformer 的误报率从 0.04% 增长到 2.7%，增至 67 倍，这在传统协议下是不可见的。这些发现表明，评估方法（特别是填充惯例和分割协议）对报告性能的影响大于架构选择，并且广泛使用的随机分割与重复最后填充可能高估模型鲁棒性高达 0.24 macro-F1。我们主张将无泄漏分割、显式填充披露和序列感知基准测试作为未来入侵检测研究的标准实践。代码和实现细节可在 https://github.com/zachmocz/temporal-ids-bench 获取。

英文摘要

Recent deep learning approaches for network intrusion detection increasingly incorporate temporal architectures such as recurrent networks and Transformers, often reporting near-perfect performance on CIC-IDS2017. However, many existing studies neither supply their temporal modules with genuine sequence inputs nor evaluate under realistic, leakage-free conditions, making it unclear whether reported gains arise from true sequence-modeling capability. In this work, we reformulate CIC-IDS2017 as a temporal intrusion-detection task by constructing ordered flow sequences from network conversations and benchmarking nine classical and deep learning architectures under a random split, two leakage-free splits, and a padding-scheme ablation. The central finding is that padding convention, not architecture, determines the Transformer's performance: on genuinely sequential (non-padded) windows the Transformer achieves the highest macro-F1 of any model in the experiment (0.89); under zero-pad+mask evaluation it drops markedly (-0.24 macro-F1), while LSTM, GRU, and 1D-CNN remain stable. Under leakage-free group evaluation the Random Forest is the most robust model (+0.009), while the Transformer's false-alarm rate grows from 0.04% to 2.7%, a 67-fold increase invisible under conventional protocols. These findings demonstrate that evaluation methodology -- specifically padding convention and split protocol -- has a larger effect on reported performance than architectural choice, and that widely used random splits with repeat-last padding can overestimate model robustness by up to 0.24 macro-F1. We advocate leakage-free splits, explicit padding disclosure, and sequence-aware benchmarking as standard practice in future IDS research. Code and implementation details are available at https://github.com/zachmocz/temporal-ids-bench.
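
The two padding conventions contrasted in this abstract can be made concrete with a small sketch. The helpers below are hypothetical and only show what "zero-pad + mask" and "repeat-last" usually mean for a window of per-flow feature vectors; the paper's actual windowing and feature pipeline are not described here and are not reproduced.

```python
def pad_zero_mask(seq, length, feat_dim):
    """Zero-pad (or truncate) to `length`; return (padded, mask).

    mask[i] is 1 for a real flow and 0 for a padded position, so a model
    can be told to ignore the padding.
    """
    real = [list(x) for x in seq[:length]]
    pad = [[0.0] * feat_dim for _ in range(length - len(real))]
    mask = [1] * len(real) + [0] * len(pad)
    return real + pad, mask


def pad_repeat_last(seq, length):
    """Pad (or truncate) to `length` by repeating the last real flow.

    No mask is produced: padded positions look like genuine observations.
    """
    if not seq:
        raise ValueError("cannot repeat-last pad an empty sequence")
    real = [list(x) for x in seq[:length]]
    pad = [list(real[-1]) for _ in range(length - len(real))]
    return real + pad
```

For a two-flow conversation padded to length four, the first helper yields two zero rows with mask `[1, 1, 0, 0]`, while the second yields three copies of the last flow. Short conversations therefore present quite different inputs under the two schemes, which is one plausible reason the reported scores diverge.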


2102.05314 2026-06-10 cs.LG math.ST stat.ML stat.TH 版本更新

Time series forecasting from partial observations via Non-negative Matrix Factorization

基于非负矩阵分解的部分观测时间序列预测

Yohann de Castro, Luca Mencarelli

发表机构 * Institut Camille Jordan, Ecole Centrale Lyon（卡米尔·若尔当研究所，里昂中央理工学院）； Institut Universitaire de France（法国大学研究院）

AI总结提出滑动掩码方法(SMM)结合非负矩阵补全进行多非负时间序列预测，通过掩码原型矩阵分解(mAMF)和掩码归一化非负矩阵分解(mNMF)实现，理论证明恢复误差与噪声成比例，实验优于Transformer、LSTM等方法。


AI中文摘要

在现代时间序列问题中，我们旨在预测可能包含缺失值和噪声的多重时间序列。本文引入滑动掩码方法(SMM)，通过非负矩阵补全来预测多个非负时间序列：将观测到的噪声值和预测/缺失值收集成矩阵形式，并通过将其行表示为少量非负向量（称为原型）的凸组合来实现学习。我们提出了两种估计方法，掩码原型矩阵分解(mAMF)和掩码归一化非负矩阵分解(mNMF)，它们可以与SMM方法结合。我们证明这些估计能以与噪声成比例的误差恢复真实原型。我们使用近端交替线性化方法(PALM)来计算原型和凸组合权重。我们在真实数据上将我们的估计器与最先进的方法（Transformer、LSTM、SARIMAX...）进行了多时间序列预测比较，结果表明我们的方法在大多数实验中优于它们。

英文摘要

In modern time series problems, one aims to forecast multiple time series that may contain missing and noisy values. In this paper, we introduce the Sliding Mask Method (SMM) for forecasting multiple nonnegative time series by means of nonnegative matrix completion: observed noisy values and forecast/missing values are collected into matrix form, and learning is achieved by representing its rows as convex combinations of a small number of nonnegative vectors, referred to as the archetypes. We introduce two estimates, the mask Archetypal Matrix Factorization (mAMF) and the mask normalized Nonnegative Matrix Factorization (mNMF), which can be combined with SMM. We prove that these estimates recover the true archetypes with an error proportional to the noise. We use a proximal alternating linearized method (PALM) to compute the archetypes and the convex combination weights. We compare our estimators with state-of-the-art methods (Transformers, LSTM, SARIMAX...) in multiple time series forecasting on real data and find that our method outperforms them in most of the experiments.


2411.02817 2026-06-10 cs.LG cs.AI cs.CV cs.IT math.IT 版本更新

Conditional Vendi Score: Prompt-Aware Diversity Evaluation for Generative AI Models and LLMs

条件 Vendi 分数：生成式 AI 模型和 LLM 的提示感知多样性评估

Mohammad Jalali, Azim Ospanov, Amin Gohari, Farzan Farnia

发表机构 * Department of Computer Science and Engineering, The Chinese University of Hong Kong（计算机科学与工程系，香港中文大学）； Department of Information Engineering, The Chinese University of Hong Kong（信息工程系，香港中文大学）

AI总结针对文本提示引导的生成模型，提出条件 Vendi 和条件 RKE 分数，通过条件熵分离模型自身多样性，并证明收敛性及在多个任务中恢复真实多样性排序。


AI中文摘要

由文本提示引导的生成模型在保真度和提示对齐方面被广泛评估，但其产生多样化输出的能力仍未被充分探索。现有的多样性指标（如基于核矩阵的 von Neumann 和 Rényi 熵的 Vendi 和 RKE）是为无条件模型开发的，无法区分提示引起的变异和模型引起的变异。我们通过引入 Conditional-Vendi 和 Conditional-RKE 来解决这一差距，这些多样性度量源自正半定矩阵的条件熵。这些分数在提示引导生成中分离出模型引起的多样性，其中 Conditional-RKE 具有 $O(1/\sqrt{n})$ 的收敛速度。对于 Conditional-Vendi，我们引入了一种截断谱近似，产生可扩展且一致的估计。在文本到图像、图像字幕和 LLM 任务上的实验表明，条件分数恢复了真实多样性排序，并且还可以引导扩散模型生成更多样化的样本。代码库可从 https://github.com/mjalali/conditional-vendi 获取。

英文摘要

Generative models guided by text prompts are widely evaluated for fidelity and prompt alignment, yet their ability to produce diverse outputs remains underexplored. Existing diversity metrics such as Vendi and RKE, which are based on the von Neumann and Rényi entropies of kernel matrices, were developed for unconditional models and cannot distinguish prompt-induced from model-induced variability. We address this gap by introducing Conditional-Vendi and Conditional-RKE, diversity measures derived from the conditional entropy of positive semidefinite matrices. These scores isolate model-induced diversity in prompt-guided generation, with Conditional-RKE enjoying an $O(1/\sqrt{n})$ convergence rate. For Conditional-Vendi, we introduce a truncated-spectrum approximation that yields scalable and consistent estimates. Experiments on text-to-image, image-captioning, and LLM tasks show that the conditional scores recover ground-truth diversity orderings and can also guide diffusion models toward more diverse samples. The codebase is available at https://github.com/mjalali/conditional-vendi.
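
To make the kernel-entropy idea concrete, here is a sketch of an unconditional order-2 Rényi diversity score, not the conditional scores proposed in the paper. It assumes a Gaussian kernel normalized so that $k(x, x) = 1$; then $\rho = K/n$ has unit trace and $\exp(H_2(\rho)) = 1/\mathrm{Tr}(\rho^2) = 1/\lVert K/n \rVert_F^2$. The function names and the bandwidth parameter are illustrative assumptions.

```python
import math


def gaussian_kernel_matrix(points, bandwidth):
    """Gaussian kernel matrix with k(x, x) = 1 for a list of equal-length tuples."""
    n = len(points)
    return [
        [
            math.exp(
                -sum((a - b) ** 2 for a, b in zip(points[i], points[j]))
                / (2.0 * bandwidth ** 2)
            )
            for j in range(n)
        ]
        for i in range(n)
    ]


def order2_kernel_diversity(kernel):
    """exp of the order-2 matrix Renyi entropy of K / n.

    Equals 1 / ||K / n||_F^2; it ranges from 1 (all samples identical)
    to n (all samples mutually orthogonal under the kernel).
    """
    n = len(kernel)
    frob_sq = sum((kernel[i][j] / n) ** 2 for i in range(n) for j in range(n))
    return 1.0 / frob_sq
```

This unconditional score grows with any variation in the outputs, including variation that merely reflects different prompts, which is the limitation the abstract says the conditional variants are designed to remove.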


2508.13362 2026-06-10 cs.LG 版本更新

Optimization-based Online Conformal Prediction for Multi-step Forecasting

基于优化的在线共形预测用于多步预测

Ruipu Li, Daniel Menacho, Alexander Rodríguez

发表机构 * University of Michigan（密歇根大学）

AI总结提出O2CP框架，通过双层优化结构建模多步误差依赖，在保证边际覆盖有效性的同时生成更窄的预测区间，实验表明在自动驾驶、气候预测等领域优于现有方法。


AI中文摘要

共形预测（CP）因其无分布覆盖保证而非常适合时间序列预测中的不确定性量化。然而，现有的多步方法往往难以平衡覆盖有效性与效率：它们要么独立校准每个预测步长，忽略时间相关性，要么强制执行严格的同步覆盖，导致区间过于保守。在这项工作中，我们提出了O2CP：基于优化的在线共形预测，这是一个统一的在线共形预测框架，显式建模多步误差依赖关系，同时不牺牲长期边际覆盖保证。我们首先证明，只要校准参数保持在定义的“安全”区域内，标准的在线共形更新就能保持有效性。利用这一理论见解，我们引入了一个双层架构：外层定义可接受的参数集以确保有效性，内层执行约束优化以建模联合误差分布并最小化整个预测步长的目标函数。为了使其计算可行，我们开发了一种轻量级采样策略，无需大型校准集即可估计联合分布。在包括自动驾驶、气候预测和公共卫生在内的真实世界数据集上的大量实验表明，O2CP始终优于最先进的基线方法，在实现目标覆盖率的同时，预测区间显著更窄，且长期遗憾更小。

英文摘要

Conformal prediction (CP) is well-suited for uncertainty quantification in time series forecasting due to its distribution-free coverage guarantees. However, existing multi-step methods often struggle to balance coverage validity with efficiency: they either calibrate horizons independently, ignoring temporal correlations, or enforce strict simultaneous coverage, resulting in overly conservative intervals. In this work, we propose O2CP: Optimization-based Online Conformal Prediction, a unified framework for online conformal prediction that explicitly models multi-step error dependencies without sacrificing long-term marginal coverage guarantees. We first prove that standard online conformal updates maintain validity as long as calibration parameters remain within a defined "safe" region. Leveraging this theoretical insight, we introduce a two-layer architecture: an outer layer that defines admissible parameter sets to ensure validity, and an inner layer that performs constrained optimization to model joint error distributions and minimize horizon-wide objectives. To make this computationally feasible, we develop a lightweight sampling strategy that estimates joint distributions without requiring large calibration sets. Extensive experiments on real-world datasets, including autonomous driving, climate forecasting, and public health, demonstrate that O2CP consistently outperforms state-of-the-art baselines, achieving target coverage with significantly sharper prediction intervals and reduced regret over long horizons.
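
For context, a commonly used single-horizon online conformal update can be sketched as follows. This is an assumption about what a "standard online conformal update" looks like (a quantile-tracking rule with step size `lr`), not the O2CP algorithm: the two-layer optimization and joint multi-step modeling described in the abstract are not reproduced.

```python
def online_conformal_radii(abs_errors, alpha, lr, q0=0.0):
    """Track an interval radius online so the long-run miss rate approaches alpha.

    abs_errors: observed |y_t - prediction_t| values, in time order
    alpha:      target miscoverage rate
    lr:         step size of the update (hypothetical tuning parameter)
    Returns (radii used at each step, empirical miss rate).
    """
    q = q0
    radii = []
    misses = 0
    for e in abs_errors:
        radii.append(q)             # radius committed before seeing y_t
        miss = 1 if e > q else 0    # 1 when the interval failed to cover
        misses += miss
        # Widen after a miss, shrink slightly after a cover.
        q = q + lr * (miss - alpha)
    return radii, misses / len(abs_errors)
```

Applying this rule independently per forecast horizon is the "calibrate horizons independently" baseline the abstract criticizes, since it ignores correlation between errors at neighbouring horizons.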

2604.12306 2026-06-10 cs.LG cs.AI version update

GCA Framework: A GCC Countries-Grounded Dataset and Agentic Pipeline for Climate Decision Support

Muhammad Umer Sheikh, Khawar Shehzad, Salman Khan, Fahad Shahbaz Khan, Muhammad Haris Khan

Affiliations: Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI); University of Missouri; Australian National University; Linköping University

AI summary: Proposes the GCA framework, comprising GCA-DS, a multimodal dataset grounded in the GCC countries, and GCA, a tool-augmented agent; domain fine-tuning and tool integration improve the reliability of climate decision support.

English abstract

Climate decision-making in the GCC states increasingly demands systems that can translate heterogeneous scientific and policy evidence into actionable guidance, yet general-purpose large language models (LLMs) remain weak both in region-specific climate knowledge and grounded interaction with geospatial and forecasting tools. We present the GCA framework, which unifies (i) GCA-DS, a curated multimodal dataset grounded in the GCC states, and (ii) Gulf Climate Agent (GCA), a tool-augmented agent for climate analysis. GCA-DS comprises 200k question--answer pairs spanning governmental policies and adaptation plans, NGO and international frameworks, academic literature, and event-driven reporting on heatwaves, dust storms, and floods, complemented with remote-sensing inputs that couple imagery with textual evidence. Building on this foundation, the GCA agent orchestrates a modular tool pipeline grounded in real-time and historical signals and geospatial processing that produces derived indices and interpretable visualizations. Finally, we benchmark open and proprietary LLMs on climate tasks in the GCC states and show that domain fine-tuning and tool integration substantially improve reliability over general-purpose baselines.

2605.01248 2026-06-10 cs.LG version update

$S^3$-R1: Learning to Retrieve and Answer Step-by-Step with Synthetic Data

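Harsh Goel, Akhil Udathu, Susmija Jabbireddy, Pradnesh Kalkar, Atharva Parulekar

Affiliations: The University of Texas at Austin; Google DeepMind

AI summary: Proposes the S^3-R1 framework, which uses synthetic data generation and denser reward signals to address sparse rewards and the shortage of multi-hop question data in reinforcement learning post-training, improving the model's search and question-answering ability.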

Comments Under Review

English abstract

Reinforcement learning (RL) post-training has enabled newer capabilities in models, such as agentic tool-use for search. However, these models struggle primarily due to limitations with sparse outcome-based rewards and a lack of training data that encapsulates questions of differing hardness, which results in models not performing deeper searches with tools to collect evidence for question-answering. To address these limitations, we introduce S^3-R1 (Synthetic data and stabilized Search R1), a framework that couples a data-centric approach with denser learning signals. We first develop a synthetic generation and curation pipeline that programmatically derives diverse, multi-hop questions from existing documents. This pipeline incorporates a retrieval-based verification step to specifically isolate questions of intermediate difficulty. We then pair this expanded training set with a reward structure that evaluates both intermediate search quality and the correctness of the final answer. This setup directly mitigates the credit assignment problems inherent to sparse rewards. Our evaluations show that S^3-R1 outperforms existing baselines by learning more effective search and synthesis strategies, yielding up to a 10% improvement in robust generalization on out-of-domain datasets.

2606.06698 2026-06-10 cs.LG cs.CL version update

RECAP: Regression Evaluation for Continual Adaptation of Prompts

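Harsh Deshpande, Kushal Chawla, Sangwoo Cho, William Campbell, Sambit Sahu

Affiliations: Capital One

AI summary: Proposes the RECAP benchmark, which evaluates how well prompt optimization methods continually adapt to changing constraints under a strict proactive adapt-then-test protocol; existing methods show no significant gains in the proactive setting, underscoring the need for purpose-built proactive prompt adaptation methods.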

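Abstract (translated from Chinese)

Agentic systems in production frequently face changing constraints and must comply starting from the very next interaction. Scenarios such as a tool-call notification that changes a compliance threshold, or a policy update that adds a disclosure requirement, fit this description, and production leaves little room for error. This proactive adaptation setting is common in deployment but absent from current benchmarks, which assume either a static constraint set or a reactive protocol with evaluation feedback. We introduce RECAP, a benchmark that measures continual-learning phenomena (forgetting, regression, forward transfer) at the constraint level under a strict proactive adapt-then-test protocol: prompt optimization methods receive only the constraint specification and must generalize before seeing any test data. We evaluate six methods across four LLMs and three schedules of evolving constraints and find that they yield no significant performance improvement, even after incurring higher latency. Designed for offline or reactive settings, these methods are inadequate for the proactive paradigm. Our work highlights the growing need to design proactive prompt adaptation methods, in which models must remain robust to changing requirements in deployment.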

CleanPatrick: A Benchmark for Image Data Cleaning

Fabian Gröger, Simone Lionetti, Philippe Gottfrois, Alvaro Gonzalez-Jimenez, Ludovic Amruthalingam, Elisabeth Victoria Goessinger, Hanna Lindemann, Marie Bargiela, Marie Hofbauer, Omar Badri, Philipp Tschandl, Arash Koochek, Matthew Groh, Alexander A. Navarini, Marc Pouly

Affiliations: University of Basel; Lucerne University of Applied Sciences and Arts; University Hospital of Basel; Northwestern University; Northeast Dermatology Associates; Medical University of Vienna; Banner Health

AI summary: Introduces CleanPatrick, the first large-scale benchmark for image data cleaning, built on the Fitzpatrick17k dermatology dataset; it collects a large set of crowdsourced annotations, aggregates them with a model inspired by item-response theory, formalizes issue detection as a ranking task, and evaluates a range of methods.

Comments Accepted at Journal of Data-centric Machine Learning Research (DMLR)

English abstract

Robust machine learning depends on clean data, yet current image data cleaning benchmarks rely on synthetic noise or narrow human studies, limiting comparison and real-world relevance. We introduce CleanPatrick, the first large-scale benchmark for data cleaning in the image domain, built upon the publicly available Fitzpatrick17k dermatology dataset. We collect 496,377 binary annotations from 933 medical crowd workers, identify off-topic samples (4%), near-duplicates (21%), and label errors (32%), and employ an aggregation model inspired by item-response theory followed by expert review to derive high-quality ground truth. CleanPatrick formalizes issue detection as a ranking task and employs standard ranking metrics that mirror real audit workflows. We benchmark classical anomaly detectors, perceptual hashing, SSIM, Confident Learning, NoiseRank, FINE, BHN, and SelfClean. On CleanPatrick, self-supervised representations excel at near-duplicate detection, classical methods achieve competitive off-topic detection under constrained review budgets, and detecting implausible labels under conservative human judgment remains challenging for fine-grained medical classification. By releasing both the dataset and the evaluation framework, CleanPatrick enables a systematic comparison of image-cleaning strategies.

2510.08906 2026-06-10 stat.ML cs.LG physics.chem-ph version update

Gradient-Guided Furthest Point Sampling for Robust Training Set Selection

Morris Trestman, Stefan Gugler, Felix A. Faber, O. A. von Lilienfeld

Affiliations: Berlin Institute for the Foundations of Learning and Data; Chemical Physics Theory Group, Department of Chemistry, University of Toronto, St. George Campus, Toronto, ON, Canada; Department of Materials Science and Engineering, University of Toronto, St. George Campus, Toronto, ON, Canada; Vector Institute for Artificial Intelligence, Toronto, ON, Canada; Department of Physics, University of Toronto, St. George Campus, Toronto, ON, Canada; Acceleration Consortium, University of Toronto, Toronto, ON, Canada

AI summary: Proposes gradient-guided furthest point sampling (GGFPS), which uses molecular force norms to guide the sampling of configurational space; on the MD17 dataset it markedly improves data efficiency and model robustness over FPS and random sampling.

Comments 41 pages, 43 figures, 2 algorithms; journal article with supplementary information appended

DOI: 10.1088/2632-2153/ae68b8
Journal ref: Machine Learning: Science and Technology 7, 035047 (2026)

English abstract

Training set sampling methods are used to improve model performance and lower data costs in machine learning problems relevant to chemistry. We introduce Gradient Guided Furthest Point Sampling (GGFPS), a simple extension of Furthest Point Sampling (FPS) that leverages molecular force norms to guide efficient sampling of configurational spaces of molecules. Numerical evidence is presented for a toy system (the Styblinski-Tang function) as well as for molecular dynamics trajectories from the MD17 dataset. Our numerical results indicate superior data efficiency and model robustness when using GGFPS compared to FPS and uniform random sampling (URS), as well as established supervised FPS-style selectors, PCov-FPS and PCov-CUR. Distribution analysis of the MD17 data suggests that FPS systematically under-samples equilibrium geometries, resulting in large test errors for relaxed structures. GGFPS cures this artifact and (i) enables up to twofold reductions in training cost without sacrificing predictive accuracy compared to FPS in the 2-dimensional Styblinski-Tang system, (ii) systematically lowers prediction errors for equilibrium as well as strained structures in MD17, and (iii) systematically decreases prediction error variances across all of the MD17 configuration spaces. These results suggest that gradient-aware sampling methods hold great promise as effective training set selection tools, and that naive use of FPS may result in imbalanced training and inconsistent prediction outcomes.
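
For readers unfamiliar with furthest point sampling, the sketch below shows the plain greedy procedure, plus an optional per-point weight that rescales each candidate's distance. The weighting is only an assumed, illustrative way to let a quantity such as a force norm bias the selection. The abstract does not give the GGFPS selection rule, so this should not be read as the authors' algorithm, and the names `furthest_point_sampling`, `weights`, and `start` are hypothetical.

```python
import math

def furthest_point_sampling(points, k, weights=None, start=0):
    """Greedy FPS: repeatedly pick the point farthest from the selected set.

    points:  list of coordinate tuples.
    weights: optional per-point multipliers applied to the distances. This is
             an assumed way to bias selection, not the paper's GGFPS rule.
    """
    n = len(points)
    if weights is None:
        weights = [1.0] * n
    selected = [start]
    # min_dist[i] is the distance from point i to its nearest selected point.
    min_dist = [math.dist(points[i], points[start]) for i in range(n)]
    while len(selected) < k:
        nxt = max(range(n), key=lambda i: min_dist[i] * weights[i])
        selected.append(nxt)
        for i in range(n):
            min_dist[i] = min(min_dist[i], math.dist(points[i], points[nxt]))
    return selected
```

With uniform weights this reduces to ordinary FPS. Passing weights derived from gradient or force magnitudes changes which regions are favoured, and the direction and strength of that bias is a design choice the abstract leaves to the paper.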

2512.11995 2026-06-10 cs.CV cs.AI cs.LG version update

V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions

Chenrui Fan, Yijun Liang, Shweta Bhardwaj, Kwesi Cobbina, Ming Li, Tianyi Zhou

Affiliations: University of Maryland, College Park; Mohamed bin Zayed University of Artificial Intelligence

AI summary: Introduces the V-REX benchmark, which uses a Chain-of-Questions to decompose multi-step exploratory reasoning into planning and following abilities and evaluates vision-language models on complex open-ended tasks.

Comments 28 pages

English abstract

While many vision-language models (VLMs) are developed to answer well-defined, straightforward questions with highly specified targets, as in most benchmarks, they often struggle in practice with complex open-ended tasks, which usually require multiple rounds of exploration and reasoning in the visual space. Such visual thinking paths not only provide step-by-step exploration and verification as an AI detective but also produce better interpretations of the final answers. However, these paths are challenging to evaluate due to the large exploration space of intermediate steps. To bridge the gap, we develop an evaluation suite, "Visual Reasoning with multi-step EXploration (V-REX)", which is composed of a benchmark of challenging visual reasoning tasks requiring native multi-step exploration and an evaluation protocol. V-REX covers rich application scenarios across diverse domains. V-REX casts the multi-step exploratory reasoning into a Chain-of-Questions (CoQ) and disentangles VLMs' capabilities into (1) Planning: breaking down an open-ended task by selecting a chain of exploratory questions; and (2) Following: answering curated CoQ sequentially to collect information for deriving the final answer. By curating finite options of questions and answers per step, V-REX achieves a reliable quantitative and fine-grained analysis of the intermediate steps. By assessing SOTA proprietary and open-sourced VLMs, we reveal consistent scaling trends, significant differences between planning and following abilities, and substantial room for improvement in multi-step exploratory reasoning.

2603.29730 2026-06-10 stat.ML cs.LG version update

mlr3mbo: Bayesian Optimization in R

Marc Becker, Lennart Schneider, Martin Binder, Lars Kotthoff, Bernd Bischl

Affiliations: Department of Statistics, LMU Munich; Munich Center for Machine Learning (MCML); University of St Andrews

AI summary: Presents mlr3mbo, a modular Bayesian optimization toolbox for R that supports single- and multi-objective optimization, multi-point proposals, and parallelization; a coordinate descent search and benchmarks show performance on par with established optimizers.

English abstract

We present mlr3mbo, a modular toolbox for Bayesian optimization in R. mlr3mbo supports single- and multi-objective optimization, multi-point proposals, batch and asynchronous parallelization, and robust error handling. While it can be used for many standard Bayesian optimization variants in applied settings, researchers can also construct custom Bayesian optimization algorithms from its flexible building blocks. In addition to an introduction to the software, its design principles, and its building blocks, the paper presents two extensive empirical evaluations on the surrogate-based benchmark suite YAHPO Gym. To identify robust default configurations for both numeric and mixed-hierarchical optimization regimes, and to gain further insights into the respective impacts of individual settings, we run a coordinate descent search over the mlr3mbo configuration space and analyze its results. Furthermore, we benchmark mlr3mbo against a wide range of established optimizers, including HEBO, SMAC3, Ax, and Optuna, and find that it performs on par with state-of-the-art.

2605.24818 2026-06-10 stat.ME cs.CL cs.LG version update

Spiking the training data to correct for test set contamination

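Johnny Tian-Zheng Wei, Jerry Li, Ameya Godbole, Robin Jia

Affiliations: University of Southern California

AI summary: Proposes correcting the score inflation caused by test set contamination by deliberately contaminating some test examples at known rates (spiking) and using memorization predictors for statistical correction.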

English abstract

The literature on test set contamination largely focuses on detection, but the correction of contaminated test scores is underexplored. Our core proposal is to spike the training data by intentionally contaminating some test examples at known rates. The spiked examples can then be used to calibrate predictors of model memorization which enable principled statistical correction of inflated test scores. To evaluate different correction estimators, we first present a simulation framework based on the Hubble models. Hubble models come in minimal pairs, where the perturbed model was deliberately contaminated with several test sets, while the standard model was not, serving as the counterfactual and correction target. We consider estimators that use information from a memorization predictor, correctness predictor, or both. In simulation, we establish basic statistical intuitions and show that estimators leveraging memorization and correctness information are better than naive estimation which makes no correction at all. We then instantiate several memorization and correctness predictors, and find that simple predictors such as Platt-scaled membership inference metrics provide good signal for correction. Finally, we examine the practical considerations of spiking. Simple memorization predictors need no more than 10 examples for calibration and often transfer from one dataset to another. Taken together, spiking is a promising solution for test set contamination.

2605.30292 2026-06-10 stat.ML cs.LG math.ST stat.ME stat.TH version update

Leave a Window Out: Modifying the Jackknife for Predictive Inference in Time Series

Hanyang Jiang, Rina Foygel Barber, Ashwin Pananjady, Yao Xie

Affiliations: Schools of Industrial and Systems Engineering and Electrical and Computer Engineering; Department of Statistics, University of Chicago

AI summary: To handle non-exchangeable data and predictors that exploit temporal dependence in time series, proposes the leave-a-window-out (LWO) method, a modification of the jackknife that achieves valid coverage and yields narrower intervals than split conformal prediction.

Comments 40 pages, 8 figures

English abstract

Conformal prediction methods enjoy strong theoretical and empirical predictive inference performance, provided the data is exchangeable and is treated symmetrically during training. However, these assumptions are impractical in many settings, such as time series, where temporal dependence violates exchangeability and it is preferable to use predictors that leverage dependence by treating data asymmetrically. Recent work shows that split conformal prediction is robust to these issues, but sample splitting can reduce accuracy, motivating the study of methods that do not rely on data splitting in the time series setting. In this work, we show that the vanilla leave-one-out jackknife can suffer arbitrary loss of coverage even in canonical time series models with mild temporal dependence. As a remedy, we propose a modification tailored to such settings, which we term the leave-a-window-out (LWO) method, and show that it can achieve valid coverage provided that the model-fitting procedure satisfies mild stability properties. Our proofs are based on quantifying the degree to which the data departs from cyclic exchangeability, which we introduce new coefficients to measure. Experiments on time series demonstrate that our method often enjoys valid coverage when the vanilla jackknife fails to cover, while producing much narrower intervals than split conformal prediction.
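
To make the leave-a-window-out idea concrete, the sketch below computes residuals in which each point is predicted by a model that never saw that point or its temporal neighbours, then takes a conformal-style quantile of those residuals as an interval half-width. The symmetric window, the quantile rule, and the stand-in `mean_predictor` are assumptions for illustration; the paper's exact construction and stability conditions are not given in the abstract.

```python
import math

def lwo_residuals(y, half_window, fit_predict):
    """Absolute residuals where point i is predicted without using any
    observation within `half_window` time steps of i (assumed window shape)."""
    n = len(y)
    residuals = []
    for i in range(n):
        keep = [t for t in range(n) if abs(t - i) > half_window]
        pred = fit_predict([y[t] for t in keep], keep, i)
        residuals.append(abs(y[i] - pred))
    return residuals

def conformal_quantile(residuals, alpha):
    """The ceil((1 - alpha) * (n + 1))-th smallest residual, capped at the max."""
    n = len(residuals)
    rank = min(n, math.ceil((1 - alpha) * (n + 1)))
    return sorted(residuals)[rank - 1]

def mean_predictor(train_values, train_times, target_time):
    # Hypothetical stand-in model: ignores time and predicts the training mean.
    return sum(train_values) / len(train_values)
```

Setting `half_window=0` recovers ordinary leave-one-out residuals, which is the baseline the abstract reports can lose coverage under temporal dependence. Any real use would swap `mean_predictor` for an actual forecasting model.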

2606.09857 2026-06-10 cs.LG physics.comp-ph new submission

Uncertainty-aware Multi-fidelity Closure via Conditional Normalizing Flows

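Jice Zeng, Shady E. Ahmed, David Barajas-Solano, Panos Stinis

AI summary: Proposes an uncertainty-aware multi-fidelity framework based on conditional normalizing flows that learns a probabilistic mapping from low-fidelity to high-fidelity coefficients to address the closure problem in reduced-order models; on a vortex merging problem, residual learning outperforms direct learning.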

English abstract

Reduced-order models (ROMs) provide an efficient surrogate for complex multiscale systems, but their predictive accuracy is often compromised by truncation errors and the inadequate representation of interactions between resolved and unresolved scales. The missing effect of truncated (unresolved) scales on ROM (resolved) scales is often denoted as the closure problem. In this work, we formulate ROM closure modeling as a multi-fidelity (MF) learning problem and propose an uncertainty-aware MF framework based on conditional normalizing flows to enhance ROM predictive accuracy. The proposed approach learns a probabilistic mapping from low-fidelity (LF) ROM coefficients to high-fidelity (HF) coefficients, thereby improving predictive fidelity while quantifying the uncertainty associated with the learned closure. Two correction strategies are investigated: direct learning, in which HF coefficients are predicted directly from LF inputs, and residual learning, which learns the discrepancy between LF and HF coefficients and uses it to recover the corrected HF solution. The framework is demonstrated on a vortex merging problem governed by the two-dimensional Navier-Stokes equations. Results show that both correction strategies improve ROM accuracy over the uncorrected ROM, with residual learning achieving consistently better performance than direct learning. Moreover, the two proposed deep generative model-based strategies provide uncertainty quantification for the corrected ROM coefficients, which is critical for assessing prediction confidence and supporting the reliable use of ROMs in practical applications.

2606.09860 2026-06-10 cs.LG cs.AI stat.AP stat.ML new submission

Conformal Risk Prediction for Non-Alcoholic Fatty Liver Disease Using Gradient Boosting with Distribution-Free Coverages

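Xinze Zhang

AI summary: Proposes Method, a machine-learning framework that couples gradient-boosted decision trees with conformal prediction to give distribution-free, calibrated coverage for individual NAFLD risk predictions; on a multicenter cohort from China it reaches an AUROC of 0.912, outperforming several alternative methods.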

English abstract

Non-alcoholic fatty liver disease (NAFLD) affects roughly 25% of global adults, posing substantial hepatic and cardiovascular risks. Yet, population-level screening tools remain inadequate. We present Method, a machine-learning framework for NAFLD risk prediction coupling gradient-boosted decision trees with conformal prediction to yield calibrated, distribution-free coverage guarantees on individual risk estimates. It integrates a mutual-information-based stability selection procedure to identify a compact, clinically interpretable feature subset via bootstrap resampling, constructing prediction sets whose marginal coverage provably exceeds a user-specified confidence level. We evaluated Method on a multicenter cohort from Guangzhou, China (primary n=2,187; external validation n=412) using 78 candidate features across demographics, metabolic biomarkers, and lifestyle factors. Method achieves an AUROC of 0.912 internally and 0.891 externally, outperforming deep neural networks, TabNet, support vector machines, and logistic regression. Conformal prediction sets achieve 91.3% empirical coverage at the 90% nominal level. A three-tier risk stratification derived from these scores separates the population into distinct groups, with the high-risk subgroup showing a 12-month progression rate 4.7 times that of the low-risk tier. The selected features -- notably waist circumference, ALT, GGT, triglycerides, fasting glucose, and BMI -- align with established metabolic risk factors, providing biological plausibility.
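
As background for the coverage claim, the sketch below shows the textbook split conformal recipe for a binary classifier: score each calibration case by one minus the predicted probability of its true label, take a finite-sample-corrected quantile of those scores, and include in the prediction set every label whose score falls under that threshold. This is a generic illustration and not the paper's pipeline; the function names and the convention that `p` is the predicted probability of class 1 are assumptions.

```python
import math

def calibrate_threshold(cal_probs, cal_labels, alpha=0.1):
    """Return the conformal threshold from a held-out calibration set.

    cal_probs[i] is the predicted probability of class 1 for case i, and
    cal_labels[i] is its true label (0 or 1).
    """
    scores = sorted(
        1.0 - (p if y == 1 else 1.0 - p)  # nonconformity = 1 - prob of true label
        for p, y in zip(cal_probs, cal_labels)
    )
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))
    # With too few calibration points the only valid choice is to include everything.
    return scores[rank - 1] if rank <= n else 1.0

def prediction_set(p, qhat):
    """Labels whose nonconformity score does not exceed the threshold."""
    return [label for label, prob in ((0, 1.0 - p), (1, p)) if 1.0 - prob <= qhat]
```

Under exchangeability of calibration and test cases, sets built this way contain the true label with probability at least 1 - alpha on average. The sets may be empty or contain both labels for ambiguous cases.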

2606.09892 2026-06-10 cs.LG stat.ME new submission

LMT: A Bayesian Framework for Causal Discovery from Textual Alarm Records in Manufacturing Systems

Xiaofeng Xiao, Jianhong Chen, Qiuzhuang Sun, Naichen Shi, Xubo Yue

Affiliations: Department of Mechanical & Industrial Engineering, Northeastern University, Boston, MA, USA; College of Integrative Studies, Singapore Management University, Singapore; Department of Industrial Engineering and Management Sciences, Department of Mechanical Engineering, Northwestern University, IL, USA

AI summary: Proposes the LMT framework, which combines semantic signals extracted by large language models with Poisson-process-based temporal evidence in a Bayesian approach to discover causal graphs from textual alarm records, and performs especially well in small-sample settings.

Comments 19 pages

English abstract

Textual event records, such as alarm logs, have become an increasingly common data source in engineering and manufacturing systems. Beyond identifying correlations or recurring patterns, engineers are often interested in understanding which types of events causally trigger or influence other events during system operation. Textual event descriptions may contain semantic clues about such causal relationships, and recent large language models (LLMs) provide a promising tool for extracting these signals. However, relying solely on LLM-encoded textual information is insufficient for accurate causal discovery, since semantic patterns do not directly reveal causal mechanisms and may confuse causation with correlation or frequent sequential patterns. To address these challenges, we propose LMT, a Bayesian causal discovery framework for engineering event data that jointly leverages textual descriptions and timestamps. Specifically, LMT first uses LLMs to extract semantic causal signals from event descriptions and constructs a prior distribution over causal graphs among event types or event clusters. It then incorporates temporal evidence through a Poisson-process-based likelihood, allowing the LLM-informed prior to be refined by timestamp-based statistical evidence. By integrating the textual and temporal information, LMT produces a causal graph that is both interpretable and data-supported. Simulation studies show that the proposed framework is effective across different settings and is especially advantageous in small-sample alarm-event scenarios.

2606.09919 2026-06-10 cs.LG cs.AI cs.MA cs.RO new submission

Co-GLANCE: Uncertainty-Aware Active Perception for Heterogeneous Robot Teaming

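Michal P. Podolinsky, Neel P. Bhatt, Pranay Samineni, Rohan Siva, Christian Ellis, Ufuk Topcu

AI summary: Proposes Co-GLANCE, which distills a vision-language model for real-time occlusion segmentation and robot allocation and combines conformal prediction with selective abstention to provide statistically guaranteed uncertainty quantification that drives active perception; in real-world scenarios it improves occlusion segmentation and allocation accuracy by 25% and 36%, respectively, and reduces inference latency by 350x.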

Comments Code, videos, and dataset available at https://co-glance.github.io/

English abstract

Perceptual uncertainty is a central challenge for heterogeneous robot teams operating in unstructured outdoor environments, where no single viewpoint affords reliable scene understanding. Perceptual uncertainty, arising from sources such as occlusions, manifests differently across robot viewpoints depending on scene structure. Detecting and resolving sources of perceptual uncertainty requires both scene-based contextual reasoning and capability-aware robot allocation. While vision-language models provide strong semantic priors for both, they are computationally prohibitive for onboard inference and lack calibrated uncertainty quantification. We introduce Co-GLANCE, a real-time onboard perception and decision-making system for uncertainty resolution in heterogeneous robot teams. Co-GLANCE distills the semantic reasoning capabilities of a vision-language model into an end-to-end model for occlusion segmentation and robot allocation, eliminating the need for cloud-based inference. To quantify perceptual uncertainty, Co-GLANCE combines conformal prediction with selective abstention to provide statistically valid coverage guarantees for segmentation, robot allocation, and detection outputs. These calibrated uncertainty estimates directly trigger active perception, dispatching the most appropriate robot to acquire informative viewpoints and resolve uncertainty. Across real-world scenarios, Co-GLANCE outperforms cloud-based vision-language model baselines in occlusion segmentation and robot allocation accuracy by 25% and 36%, respectively, while reducing per-frame inference latency 350x. We also release an air-ground dataset for future research. Code, videos, and dataset available at https://co-glance.github.io/ .

2606.09934 2026-06-10 cs.LG cs.CR new submission

nCMD: Benign-Anchored Feature Selection for Imbalanced Network Intrusion Detection

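Abu Fuad Ahmad, Istiaque Ahmed

AI summary: Proposes benign-anchored Classwise Mean Deviation (nCMD), which selects features by scoring the deviation of attack-class distributions from the benign-class mean; on four benchmark datasets it matches or exceeds traditional filter methods, with the largest gains under tight feature budgets and severe class imbalance.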

Comments 6 pages, IEEE double columns

English abstract

Feature selection is critical for network intrusion detection systems (NIDS) operating under high-dimensional, highly imbalanced traffic, as found in operational and defense networks. Traditional filter methods rank features using global statistics computed symmetrically across classes and thus fail to capture the asymmetry of intrusion detection, where attacks are best characterized as deviations from dominant benign traffic. We propose benign-anchored Classwise Mean Deviation (nCMD), a lightweight and interpretable method that scores feature relevance based on the deviation of attack-class distributions from the benign-class mean, rather than a globally biased reference. This approach aligns feature selection with the operational semantics of NIDS at no additional computational cost. Across four benchmark datasets (CICIDS2017, CICDDoS2019, NSL-KDD, and UNSW-NB15), multiple feature budgets, and three downstream classifiers, nCMD matches or exceeds classical filter baselines in macro-averaged F1-score. It achieves the best result on three of the four datasets and under every classifier, with the strongest improvements observed under tight feature budgets and severe class imbalance. These results support benign-anchored ranking as a scalable and interpretable preprocessing component for resource-constrained NIDS.
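
The idea of anchoring feature scores on benign traffic can be sketched in a few lines. The scoring rule below is an assumption made for illustration: the absolute gap between each attack class mean and the benign mean, scaled by the benign standard deviation and averaged over attack classes. The abstract does not state the exact nCMD formula, and the names `benign_anchored_scores` and `top_k_features` are hypothetical.

```python
from statistics import fmean, pstdev

def benign_anchored_scores(X, y, benign_label=0, eps=1e-12):
    """Score each feature by how far attack-class means sit from the benign mean.

    X is a list of feature rows and y the matching class labels. The scaling
    by the benign standard deviation is an assumed normalisation.
    """
    n_features = len(X[0])
    attack_labels = sorted({label for label in y if label != benign_label})
    scores = []
    for j in range(n_features):
        benign = [row[j] for row, label in zip(X, y) if label == benign_label]
        mu, sigma = fmean(benign), pstdev(benign)
        gaps = []
        for c in attack_labels:
            attack = [row[j] for row, label in zip(X, y) if label == c]
            gaps.append(abs(fmean(attack) - mu) / (sigma + eps))
        scores.append(fmean(gaps))  # average deviation across attack classes
    return scores

def top_k_features(scores, k):
    """Indices of the k highest-scoring features, best first."""
    return sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)[:k]
```

Because the reference point is the benign mean and not a global mean, a rare attack class contributes as much to a feature's score as a frequent one, which is one plausible reading of why such a ranking would hold up under heavy class imbalance.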

2606.09949 2026-06-10 cs.LG cs.AI New submission

Learning Where to Simulate: Generative Active Sampling for Online PDE Surrogate Training

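Pierre Cesar, Sofya Dymchenko, Abhishek Purandare, Bruno Raffin

AI summary: Proposes Online Generative Active Sampling (OGAS), which uses a diffusion model to learn the relationship between configuration parameters and surrogate performance and actively samples high-difficulty regions, substantially reducing tail errors and improving the surrogate's worst-case reliability.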

Abstract

Data-driven PDE surrogates are trained with data produced by numerical PDE solvers. However, when the surrogate's goal is to generalize across a wide range of PDE configurations (e.g., initial conditions and physical coefficients), generating a representative training set is non-trivial. Uniform sampling of configuration parameters often under-represents trajectories exhibiting challenging dynamics, leading to high prediction errors and large error variance in the trained surrogate. Online training, where data generation and surrogate training are coupled, offers a natural advantage by allowing solver parameters to be steered on-the-fly. To efficiently exploit this capability, we introduce Online Generative Active Sampling (OGAS), an active learning method that reactively learns the relationship between configuration parameters and surrogate performance to control the sampling distribution. OGAS trains a fast diffusion model in parallel to the surrogate to act as a conditional sampler, mapping a surrogate-derived difficulty signal (e.g., loss or uncertainty) to configuration parameters. By actively drawing target signals from a prior biased toward high difficulty, OGAS continuously steers data generation toward challenging regimes without delaying the training workflow. We evaluate OGAS across 2D PDEs with distinct challenging dynamics (Kuramoto-Sivashinsky, Navier-Stokes, Gray-Scott) and up to 308 parameters, using multiple surrogate architectures. Across all settings, OGAS consistently improves tail statistics, yielding substantial reductions in errors above the 99th percentile and overall error dispersion compared to uniform sampling. While prioritizing challenging trajectories introduces a trade-off with average error, OGAS effectively ensures worst-case reliability of trained surrogates with negligible wall-time overhead.

2606.09959 2026-06-10 cs.LG cs.AI New submission

Temporal Context Conditioning for Seasonality-Aware Precipitation Nowcasting of High-Intensity Rainfall

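Gijs van Nieuwkoop, Siamak Mehrkanoon

AI summary: Proposes TA-SmaAt-UNet, which augments radar precipitation nowcasting with temporal conditioning layers (cyclical encodings of time-of-day and time-of-year), with the largest gains on high-intensity rainfall events.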

Comments 9 pages, 6 figures

Abstract

Precipitation nowcasting is increasingly being approached with deep learning models that learn directly from recent radar observations. Although such models can efficiently capture short-term precipitation motion, they often lack broader contextual information about the meteorological conditions under which rainfall develops. This paper investigates whether lightweight temporal context can improve radar-based nowcasting, particularly for high-intensity rainfall. We propose the Time-Aware Small-Attention U-Net (TA-SmaAt-UNet), which extends the core SmaAt-UNet model with temporal conditioning layers that use cyclical encodings of time-of-day and time-of-year to modulate intermediate feature representations. Experiments on KNMI radar precipitation data show that temporal conditioning is most beneficial for rare, high-intensity precipitation events, while also improving the representation of seasonal variability and predicted rainfall-intensity distributions. A layer conductance analysis further indicates that the added temporal conditioning layers are actively used by the model despite their small parameter cost. These findings suggest that simple, physically motivated temporal context can improve the realism and reliability of deep learning-based precipitation nowcasts. The implementation of our models and training setup is available on GitHub: https://github.com/gijsvn/TA-SmaAt-UNet.
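A cyclical encoding of time-of-day and time-of-year is commonly built from sine and cosine pairs, so that 23:59 and 00:00, or 31 December and 1 January, land close together. The sketch below shows that standard construction only. It does not reproduce the paper's conditioning layers, and the function name and the 365.25-day year are assumptions made for illustration.

```python
import math
from datetime import datetime


def cyclical_time_features(ts: datetime):
    """Return [sin_day, cos_day, sin_year, cos_year] for a timestamp."""
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    day_angle = 2 * math.pi * seconds / 86400
    # tm_yday is 1-based; 365.25 is an approximation that ignores leap-year detail.
    year_angle = 2 * math.pi * (ts.timetuple().tm_yday - 1) / 365.25
    return [
        math.sin(day_angle),
        math.cos(day_angle),
        math.sin(year_angle),
        math.cos(year_angle),
    ]
```

A vector like this could then feed a small network that scales or shifts intermediate feature maps, which is one plausible reading of "modulate intermediate feature representations".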

2606.10080 2026-06-10 cs.LG cs.AI q-bio.QM New submission

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

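Michael Yu, Matthew L. Olson

AI summary: Proposes VFUSE, which trains sparse autoencoders (SAEs) on the activations of a diffusion-Transformer model to identify hazardous features in protein design, improving interpretability without sacrificing performance.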

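DUET -- Dual User Embedding Transformers for Offsite Conversion Prediction

Reazul Hasan Russel, Mingwei Tang, Rostam Shirani, Xinlong Liu, Navid Madani, Leo Ding, Yawen He, Xiangyu Wang, Mustafa Acar, Ashish Katiyar, Yuhai Li, Alan Yang, Metarya Ruparel, Derek Qiang Xu, Rupert Wu, Rui Yang, Liang Tao, Xinyi Zhao, Larry Zhang, Sri Reddy, Rob Malkin

AI summary: Click signals are abundant while conversion signals are sparse and delayed. DUET pre-trains a dedicated transformer encoder for each of the click and conversion streams, producing complementary embeddings that improve offsite conversion rate prediction within serving-latency constraints.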

Abstract

Offsite conversion rate (OCVR) prediction is an important ranking problem in computational recommendation systems. This task presents a modeling challenge: click signals are abundant and exhibit short temporal horizons, whereas conversion signals are inherently sparse, long-delayed, and frequently unattributed. Despite these statistical disparities, both signal types must inform models that operate within strict serving-latency constraints. Prior pre-training approaches address this heterogeneity with a single, undifferentiated encoder applied uniformly across both data streams. We propose DUET (Dual User Embedding Transformers), a framework that explicitly partitions user behavioral data into two domain-coherent streams -- clicks and conversions -- and pre-trains dedicated transformer encoders with architectures tailored to each stream's statistical characteristics: multi-layer self-attention for the dense click stream and interleaved cross- and self-attention for the sparse conversion stream. The resulting complementary embeddings are jointly consumed by a downstream ranker without exceeding serving-latency budgets. Evaluation demonstrates up to 0.38% normalized entropy (NE) reduction relative to the strongest baseline, and A/B test shows consistent improvements in OCVR prediction accuracy.
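The abstract reports its gain in normalized entropy (NE) without defining the metric. A common definition, assumed here rather than taken from the paper, divides the average log loss by the entropy of the empirical base rate, so a model that always predicts the base rate scores 1.0 and lower is better. The function name is hypothetical.

```python
import math


def normalized_entropy(labels, probs):
    """Average log loss divided by the entropy of the empirical base rate.

    labels are 0/1 outcomes and probs are predicted probabilities strictly
    between 0 and 1. The labels must contain both classes, otherwise the
    base-rate entropy is zero and the ratio is undefined.
    """
    n = len(labels)
    log_loss = -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(labels, probs)
    ) / n
    base = sum(labels) / n
    base_entropy = -(base * math.log(base) + (1 - base) * math.log(1 - base))
    return log_loss / base_entropy
```

Under this definition, a 0.38% NE reduction is a relative drop in the ratio, not a change in accuracy measured in percentage points.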

2606.10393 2026-06-10 cs.LG cs.CE New submission

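Flexible Kernels for Protein Property Prediction

Martin Jankowiak, Yerdos Ordabayev, Rudraksh Tuwani, Henry N. Ward, Hunter Nisonoff, James M. McFarland, Gevorg Grigoryan

Affiliations: University of Cambridge

AI summary: Proposes sequence kernels that exploit evolutionary substitution matrices and local linearity, combined with Gaussian processes for data-efficient protein property prediction, and incorporates structural information for multi-task learning.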

Comments 50 pages; to appear at ICML 2026

2606.11066 2026-06-10 cs.LG q-bio.NC New submission

GRAFT: Gain-Recalibrated Adapters for Transformer-Based Neural Population Activity Modeling

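Xiangsheng Ge, Yang Xie

Affiliations: University of Science and Technology of China

AI summary: Proposes GRAFT, which separates reusable temporal dynamics from a recalibratable neuron interface, reaches state of the art on the MC Maze dataset, and achieves cross-day recalibration while updating only 9.21% of parameters.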

Abstract

Neural population activity models can recover rich temporal structure from binned spikes, but their read-in and readout layers often remain tied to a fixed set of recorded neurons. This coupling limits reuse in long-term brain-computer interfaces, where recorded neuron identities, counts, and response statistics can change across days. We introduce GRAFT, a Transformer-based neural population activity model that separates reusable temporal dynamics from a recalibratable neuron interface. The neuron interface controls how recorded neurons enter and leave the shared backbone, and auxiliary gain and positional mechanisms support neural activity modeling inside the Transformer. On MC Maze under the standard NLB'21 protocol, GRAFT reaches 0.3866 co-bps as an ensemble, setting a new state of the art on the primary co-bps metric among public and reported NLB'21 results. In a cross-day protocol constructed from the NLB'21 MC Maze dataset series, GRAFT recalibrates from MC Maze to the scaled MC Maze datasets (Large/Medium/Small) by updating only 9.21% of parameters, reaching 0.3749, 0.3112, and 0.3152 co-bps with restricted target-day support sets. These results show that the same interface-backbone separation supports both strong Transformer-based neural population activity modeling and data-efficient cross-day recalibration.

2606.09852 2026-06-10 cs.HC cs.AI cs.CL cs.LG cs.MA cs.SE Cross-list

LLM-Based Code Documentation Generation and Multi-Judge Evaluation

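Ikbel Ghrab, Mohamed Dhieb, Ismail Khenissi, Ines Abdeljaoued-Tej

Affiliations: University of Tunis El Manar

AI summary: Uses eight large language models to generate code documentation automatically and a multi-judge evaluation framework (four LLMs scoring nine criteria) to improve documentation quality. Experiments on a medical physics library show a 42% performance gap between the best and worst models.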

Comments ICAHS, © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

Journal ref: Conference ICAHS IEEE, 2025

Abstract

High-quality source code documentation is vital yet often neglected, especially in critical domains like healthcare where reliability and maintainability are essential. We present an AI-powered framework that automates documentation generation from code and repositories using eight state-of-the-art Large Language Models (LLMs), including GPT, Gemini, Qwen, and LLaMA variants. Built on the PocketFlow orchestration framework, the system applies modular pipelines and advanced prompt engineering to produce structured, context-aware documentation. To ensure quality and guide model selection, we introduce a MultiLLMasJudges evaluation framework, where four independent LLMs assess outputs across nine criteria, such as Completeness, Clarity, and Faithfulness. Experiments on an open-source medical physics library showed a 42% performance gap between top and bottom models. By combining diverse model outputs, optimized prompting, and rigorous evaluation, our approach enhances documentation quality and reduces manual effort, especially in safety-critical healthcare software.

2606.09896 2026-06-10 cs.GT cs.AI cs.LG Cross-list

HMAF: A Hierarchical Multi-Slot GD-RTB Allocation Framework

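Tianxing Bu, Zhaoqi Zhang, Linyou Cai, Miao Xie, Shengri Xue, Tan Qu, Qianlong Xie, Xingxing Wang, Siqiang Luo, Gao Cong

Affiliations: Meituan; Nanyang Technological University; China Agricultural University

AI summary: To balance short-term revenue against long-term delivery on advertising platforms where GD and RTB coexist, proposes the hierarchical multi-slot allocation framework HMAF, which follows a Plan-Calibrate-Execute paradigm and integrates offline constraint optimization with online decision-making. At Meituan it raised GD delivery rate by 3.72% and total advertisement revenue by 1.59%.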

Comments Accepted by KDD 2026 Applied Data Science Track

DOI: 10.1145/3770855.3818434

Abstract

In modern online advertising platforms, Guaranteed Delivery (GD) contracts coexist and bid with Real-Time Bidding (RTB) auctions. Recent approaches either decouple GD and RTB optimization or rely on heuristic priority rules, and thus fail to effectively balance short-term revenue maximization with long-term contract delivery under complex multi-slot delivery and impression constraints. To address these challenges, we propose HMAF (Hierarchical Multi-Slot Allocation Framework), a unified framework designed to optimize impression allocation in GD--RTB advertising platforms. HMAF employs the Plan--Calibrate--Execute paradigm as its core structure, and integrates offline constraint optimization with online decision-making, balancing offline GD resource planning, dynamically calibrating GD--RTB competitiveness, and making real-time listwise rank decisions across multi-slot environments. HMAF has been implemented in multiple marketing scenarios at Meituan, one of the world's largest online food delivery platforms, leading to a 3.72% increase in GD delivery rate and a 1.59% increase in total advertisement revenue.

2606.09953 2026-06-10 eess.IV cs.AI cs.LG Cross-list

Deep Slice Interpolation for Reducing Through-Plane Anisotropy and Noise in Head CT

Luis Cortés Ferre, Miguel A. Gutiérrez-Naranjo, Marcin Balcerzyk

Affiliations: Department of Computer Science and Artificial Intelligence, University of Seville; Bioaraba Health Research Institute; IKERBASQUE, Basque Foundation of Science

AI summary: Proposes a deep learning system that synthesizes intermediate CT slices from pairs of neighboring axial slices, halving the effective through-plane spacing while denoising implicitly. It outperforms classical interpolation and video frame interpolation methods on structural measures.

Abstract

Head computed tomography (CT) typically uses sub-millimeter in-plane resolution but 2-5 mm through-plane spacing, creating substantial anisotropy that degrades multiplanar reconstructions, volumetric measurements such as hematoma volume estimation, and downstream algorithms that assume near-isotropic voxels. We present a deep learning system that synthesizes intermediate CT slices from pairs of neighboring axial slices, halving the effective through-plane spacing. The system improves three-dimensional visualization while simultaneously producing inherently denoised outputs, yielding two complementary benefits from a single inference pass. To build a reliable system, we systematically evaluate pixel-wise losses, namely mean squared error (MSE) and mean absolute error (L1); structural-similarity losses, namely the structural similarity index (SSIM) and its multi-scale variant (MS-SSIM); and hybrid combinations. On a held-out test set, all converged models outperform classical interpolation baselines and pretrained video frame interpolation methods (RIFE, FILM) on all structural measures, with MS-SSIM+L1 offering the strongest balanced profile. We also document training instability in SSIM-family losses and identify partial remedies: the standard numerical fixes eliminate the dominant failure mode but leave residual divergence at smaller batch sizes. All results are reported with patient-level bootstrap confidence intervals and paired statistical tests. As an illustration, we apply the system to an out-of-distribution head CT series from Hospital Universitario Virgen del Rocío: the model synthesizes intermediate slices and exhibits on the real slices the implicit-denoising signature predicted by our theoretical analysis, supporting in a single external case that interpolation quality and implicit denoising are not confined to the training distribution.

2606.10025 2026-06-10 cs.RO cs.CV cs.LG Cross-list

GHOST: Hierarchical Sub-Goal Policies for Generalizing Robot Manipulation

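Sriram Krishna, Ben Eisner, Haotian Zhan, Ying Yuan, Haoyu Zhen, Chuang Gan, Shubham Tulsiani, David Held

AI summary: Proposes the GHOST framework, which factorizes control into high-level sub-goal prediction and a low-level goal-conditioned controller so that visuomotor manipulation policies generalize, and uses human demonstrations to adapt to novel objects and task variations.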

Comments Accepted at RSS 2026

Abstract

We present GHOST, a framework for learning visuomotor manipulation policies that generalize beyond the training distribution. GHOST factorizes control into (i) a high-level policy that predicts the next sub-goal as a distribution over 3D end-effector poses from multi-view RGB-D observations, and (ii) a low-level goal-conditioned controller that executes embodiment-specific actions. To condition image-based policies on 3D goals, we introduce a simple spatial interface that projects predicted goals into the image plane and represents them as end-effector heatmaps. Across a suite of manipulation tasks, this hierarchical factorization consistently improves performance and robustness compared to a flat Diffusion Policy. Further, we show that this hierarchical interface also makes it easy to incorporate human demonstrations without relying on (noisy) action retargeting. As sub-goals are largely embodiment-agnostic, we train the high-level policy on human video to specify how learned skills should be applied and composed, while keeping the low-level policy trained purely on robot data. This hierarchy enables adaptation to novel objects and task variations using a small number of human demonstrations.

2606.10112 2026-06-10 cs.GT cs.AI cs.LG econ.TH Cross-list

Duality for Optimal Multi-Item, Multi-Bidder Auction Design: Revenue Certificates through Deep Learning

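Yanchen Jiang, David C. Parkes, Tonghan Wang

Affiliations: Harvard University; College of AI, Tsinghua University

AI summary: Proposes the first computational framework that directly tackles the dual problem for multi-item, multi-bidder auctions. It parametrizes Lagrange multipliers with neural networks and introduces a lifting technique to produce certified revenue upper bounds, giving near-optimality certificates for continuous types.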

Abstract

Characterizing revenue-optimal auctions for multi-item, multi-bidder settings remains a fundamental open problem, with no known closed-form solution existing beyond restrictive binary-type instances. This has motivated interest in computational approaches to optimal auction design. In this paper, we introduce the first computational framework that directly tackles the dual problem for multi-item, multi-bidder auctions and dominant-strategy incentive compatibility (DSIC), generating certified revenue upper bounds. Our approach parametrizes Lagrange multipliers with a structurally guaranteed strict flow-conservation property using neural networks, enabling efficient optimization over feasible dual solutions via gradient descent. To bridge the gap between discrete computational methods and theoretical guarantees for continuous types, we develop a novel lifting technique that maps dual certificates from coarse discretizations to fine refinements. We prove that lifting gives valid revenue upper bounds for multi-item, multi-bidder auctions with continuous uniform valuations. Furthermore, we give a generalized lifting construction for arbitrary continuous distributions and demonstrate that these lifted duals converge to the revenue of the original continuous problem in the discrete limit. We validate this computational framework for the dual auction design problem by recovering known analytical mechanisms for canonical instances. For multi-item multi-bidder problems, our framework establishes a small gap between the optimal revenue and best-known DSIC mechanisms, providing computational certificates of near-optimality.

2606.10126 2026-06-10 cs.CL cs.AI cs.CY cs.LG Cross-list

Pareto-Guided Teacher Alignment for Fair Personalized Text Generation

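Tunazzina Islam

Affiliations: Purdue University

AI summary: Proposes a Pareto-guided teacher alignment framework that reduces demographic disparities while preserving personalization fidelity through revision-based candidate generation, pair-aware feasibility gating, Pareto-style candidate selection, and preference optimization. Experiments show that fairness mitigation effects are objective-dependent and transfer inconsistently across domains.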

Abstract

Personalized persuasive text generation can improve relevance and engagement, but demographic conditioning may also introduce unequal framing across groups. We study fairness mitigation in personalized generation as a constrained multi-objective alignment problem: reduce demographic disparities while preserving personalization fidelity. We propose a Pareto-guided teacher alignment framework that combines revision-based candidate generation, pair-aware feasibility gating, Pareto-style candidate selection, and optional preference optimization through supervised fine-tuning and direct preference optimization. We evaluate the framework on climate change and vaccination persuasion tasks using a controlled context-rich demographic grid with matched gender and age pairs and a unified five-audit evaluation suite spanning persuasion bias, formality disparity, emotional framing disparity, lexical association disparity, and personalization fidelity. Across both domains and cross-family transfer settings, no single alignment strategy dominates all objectives simultaneously. Instead, methods occupy different regions of a fairness-personalization Pareto frontier: some achieve stronger disparity reductions, while others better preserve personalization or demographic stability. Our results show that fairness mitigation effects are objective-dependent and transfer inconsistently across domains and model families, motivating bounded-regression, multi-audit model selection over single-metric optimization for fairness-sensitive personalized generation.
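Pareto-style candidate selection can be illustrated with a plain non-dominated filter. The sketch below assumes each candidate carries just two scores, a disparity to minimize and a personalization fidelity to maximize. The paper audits five measures and does not spell out its selection rule in the abstract, so the tuple layout and function name are hypothetical.

```python
def pareto_front(candidates):
    """Keep candidates that no other candidate dominates.

    Each candidate is (name, disparity, fidelity). Lower disparity and
    higher fidelity are better. One candidate dominates another when it is
    at least as good on both scores and strictly better on at least one.
    """
    def dominates(a, b):
        no_worse = a[1] <= b[1] and a[2] >= b[2]
        strictly_better = a[1] < b[1] or a[2] > b[2]
        return no_worse and strictly_better

    return [
        c for c in candidates
        if not any(dominates(other, c) for other in candidates)
    ]
```

The filter returns a set rather than a single winner, which matches the abstract's finding that no one alignment strategy dominates every objective.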

2606.10233 2026-06-10 eess.AS cs.LG cs.SD Cross-list

ANCHOR: Autoregressive Non-intrusive Chunk-Ordered Refinement for Joint Multi-Resolution Speech Quality Modeling

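Zhuoyan Tao, Jiatong Shi, Hye-jin Shim, Shinji Watanabe

Affiliations: University of Southern California, USA; Carnegie Mellon University, USA

AI summary: Proposes ANCHOR, which recasts incremental speech quality assessment as a multi-resolution autoregressive task. Dual-resolution tokens and a resolution-aware hierarchy give coarse-to-fine refinement from chunks to full utterances, substantially reducing error on partial input and showing how perceptual quality accumulates over time.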

Comments Accepted at Interspeech 2026

Abstract

While speech quality is typically assessed on complete utterances, streaming and generative systems require incremental estimation from partial audio. Existing predictors assume full context, degrading on prefix-constrained inputs. Extending ARECHO, we propose ANCHOR, reformulating incremental assessment as a multi-resolution autoregressive task. It models chunk- and utterance-level quality within a single decoder using dual-resolution tokens and a resolution-aware hierarchy for coarse-to-fine refinement. Experiments show substantial robustness under partial input, including a 48% PLCMOS error reduction on 2-second prefixes. Convergence analysis reveals a 4-6 s effective perceptual context horizon. A stress test further isolates structured extrapolation biases under localized corruption. Results demonstrate that hierarchical supervision improves incremental prediction and elucidates how perceptual quality accumulates over time.

2606.10279 2026-06-10 cs.AI cs.CL cs.LG Cross-list

Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction

Buxin Su, Bingxuan Li, Cheng Qian, Yiwei Wang, Jin Jin, Bingxin Zhao

Affiliations: University of Pennsylvania; University of Illinois Urbana-Champaign; University of California, Merced

AI summary: Finds that in clinical prediction tasks, supervised fine-tuning on synthetic rationale data substantially reduces model performance, and traces the root cause to a structural conflict between narrative plausibility and discriminative optimization.

Abstract

Supervised fine-tuning with synthetic rationale data is widely assumed to improve language model performance on clinical prediction tasks by teaching models not just what to predict but why. We test this assumption on five-year Alzheimer's disease and related dementias (ADRD) prediction from longitudinal health histories. Across a large-scale controlled experiment of 504 configurations, we find that rationale-based SFT consistently and substantially hurts prediction performance relative to label-only fine-tuning. The degradation persists across model families and data scales, and is not resolved by using a reasoning-oriented base model. Crucially, the failure is not explained by poor rationale quality: human expert annotation confirms that the generated rationales are medically accurate and faithfully grounded in patient-specific evidence, and few-shot experiments show that the same rationales improve performance when used as inference-time demonstrations rather than training targets. We identify the root cause as a structural conflict between narrative plausibility and discriminative optimization. We hope our work paves the path toward a more precise understanding of when and how rationale-based supervision helps and when it does not, guiding the responsible development of language models for high-stakes clinical prediction.

2606.10349 2026-06-10 cond-mat.dis-nn cond-mat.str-el cs.LG Cross-list

Magnetic HIP-NN for spin dynamics in disordered itinerant magnets

Supriyo Ghosh, Yunhao Fan, Sheng Zhang, Kipton Barros, Gia-Wei Chern

Affiliations: Department of Physics, University of Virginia; Department of Chemistry, University of Chicago; Theoretical Division and CNLS, Los Alamos National Laboratory

AI summary: Proposes magnetic HIP-NN (mHIP-NN), which uses hierarchical message passing over rotationally invariant spin correlations to simulate electron-mediated spin dynamics in disordered itinerant magnets efficiently, accurately reproducing Landau-Lifshitz-Gilbert dynamics and the nonequilibrium evolution of spin correlations after thermal quenches.

Comments 12 pages, 5 figures

Abstract

We present a magnetic extension of the Hierarchically Interacting Particle Neural Network (HIP-NN) that enables large-scale simulations of electron-mediated spin dynamics in disordered itinerant magnets. The resulting magnetic HIP-NN (mHIP-NN) incorporates rotationally invariant spin correlations directly into hierarchical message-passing layers, enabling the network to learn emergent magnetic energy landscapes and effective local fields from coupled geometric-spin environments while preserving spin-rotation symmetry. As a benchmark application, we consider structurally disordered itinerant $s$-$d$ exchange models in which the effective magnetic forces arise dynamically from the instantaneous electronic structure and are computationally prohibitive to evaluate using conventional exact-diagonalization-based approaches. We show that mHIP-NN accurately reproduces the local torques governing Landau-Lifshitz-Gilbert dynamics and faithfully captures the nonequilibrium evolution of spatial spin correlations following thermal quenches. Our results establish symmetry-aware hierarchical message-passing networks as an efficient and scalable framework for large-scale simulations of frustrated itinerant spin systems and nonequilibrium magnetic dynamics. More broadly, because the learned energy functional remains fully differentiable with respect to both atomic coordinates and spin variables, the framework also provides a natural foundation for spin-dependent interatomic potentials and coupled atom-spin dynamics.

2606.10472 2026-06-10 cs.GT cs.LG Cross-list

Trading Utility for Dynamic Fairness in Multiple Resource Division with Sequential Demand

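Kaiqi Jiang, Karim El Husseini, Wenzhe Fan, Xinhua Zhang

Affiliations: Computer Science Dept., University of Illinois Chicago

AI summary: Proposes a neural allocation mechanism that balances fairness and utility in sequential allocation through multi-objective optimization, achieving higher utility at comparable fairness.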

Abstract

Dynamic multi-resource allocation is a central problem in shared computing environments, where users' demands arrive sequentially and resources must be distributed fairly without knowledge of future demands. Existing methods emphasize fairness guarantees such as Sharing Incentive, Envy Freeness, and Dynamic Pareto Optimality, but often overlook system utility. Moreover, these fairness criteria are mutually incompatible, so they cannot all be strictly enforced at the same time. We propose a neural allocation mechanism that reconciles fairness with utility through multi-objective optimization during sequential rollout. We first formalize fairness in the dynamic setting via stepwise loss functions for Sharing Incentive, Envy Freeness, and Dynamic Pareto Optimality, enabling differentiable training. Leveraging non-wastefulness, we parameterize the solutions by constraining allocations to the subspace of demand while allowing elastic over-allocation when resources remain available. Empirical results demonstrate that our learned allocator achieves substantially higher utility at comparable levels of fairness, uncovering clear Pareto-frontier-like tradeoffs across metrics.

2606.10547 2026-06-10 eess.IV cond-mat.mtrl-sci cs.LG physics.ins-det cross-list

Unsupervised Deep Learning for Limited-Angle STEM-EDX Tomography -- Application to 3D Chemical Analysis of Phase-Change Memory Devices

Daniel del Pozo Bueno, Serge Brosset, Theo Monniez, Gabriele Navarro, Philippe Ciuciu, Zineb Saghi

Affiliations: CEA, LETI, Univ. Grenoble Alpes; CEA, Neurospin, Paris-Saclay University; Inria, MIND

AI summary: Proposes an unsupervised deep learning framework based on Deep Image Prior with total variation regularization (DIP-TV, with the multi-channel extension DIPm-TV) that addresses missing-wedge artefacts and noise in limited-angle STEM-EDX tomography and enables 3D chemical analysis of phase-change memory devices.

Comments: 29 pages (17 main manuscript + 12 supplementary information), 4 figures, 8 supplementary figures, 1 table, and 4 supplementary tables

Abstract

Energy Dispersive X-ray (EDX) tomography in Scanning Transmission Electron Microscopy (STEM) enables 3D compositional and elemental mapping at the nanoscale, but its use is limited by restricted tilt ranges and low-dose conditions required to avoid beam damage. Limited-angle acquisition introduces missing-wedge artefacts such as elongation and anisotropic resolution, while noisy low-dose data further degrade reconstruction quality and quantitative reliability. Here, we introduce an unsupervised deep learning framework based on Deep Image Prior with total variation regularization (DIP-TV) for limited-angle STEM-EDX tomography. We extend it to a multi-channel formulation (DIPm-TV) that jointly reconstructs multiple elemental maps by exploiting spatial correlations. Using a synthetic 3-channel phantom, we show that the method compensates for severe missing-wedge artefacts corresponding to approximately $100^\circ$ of missing angular range under moderate noise, outperforming the simultaneous iterative reconstruction technique and compressed sensing approaches. We apply the method to 3D chemical analysis of Ge-Sb-Te (GST) memory devices in virgin (as-fabricated) and SET (crystalline) operational states. Samples were prepared as cross-sectional focused ion beam lamellae and acquired under a limited-angle tilt range from $-40^\circ$ to $+40^\circ$ with $5^\circ$ steps and a dose of $2.0\times10^5$ $e^-/\text{Å}^2$. The multi-channel approach enables voxel-by-voxel elemental reconstruction using only EDX signals without external structural priors such as high-angle annular dark-field imaging. The reconstructed volumes show near-isotropic spatial resolution and reveal compositional heterogeneities associated with device operation. This approach enables 3D chemical characterization in experimentally accessible sample geometries where conventional methods fail due to severe angular limitations.
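
To make the "data fidelity plus total variation" idea concrete, here is a minimal NumPy sketch of such an objective. It assumes an anisotropic TV and a squared-error data term, neither of which the abstract specifies; `forward_op` stands in for the tilt-series projector and all function names are illustrative. In a Deep Image Prior setting, `volume` would be the output of an untrained network whose weights are optimized against this objective.

```python
import numpy as np

def total_variation_3d(volume):
    """Anisotropic TV: sum of absolute differences between neighbouring voxels."""
    return float(sum(np.abs(np.diff(volume, axis=ax)).sum() for ax in range(3)))

def dip_tv_objective(projections, forward_op, volume, tv_weight):
    """Squared data misfit ||A(x) - y||^2 plus weighted TV of the reconstruction x."""
    residual = forward_op(volume) - projections
    return float(np.sum(residual ** 2)) + tv_weight * total_variation_3d(volume)

# Toy check: a single bright voxel and a one-angle "projector" (sum along axis 0).
volume = np.zeros((3, 3, 3))
volume[1, 1, 1] = 1.0
project = lambda v: v.sum(axis=0)
print(dip_tv_objective(project(volume), project, volume, tv_weight=0.5))
```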

2606.10600 2026-06-10 eess.SY cs.LG cs.SY cross-list

Toward Proactive RF Charging Scheduling: Generative AI for Decision Support

Amirhossein Azarbahram, Osmel M. Rosabal, David Ernesto Ruiz-Guirola, Melike Erol-Kantarci, Kaibin Huang, Onel L. A. López

Affiliations: Centre for Wireless Communications (CWC), University of Oulu, Finland; School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada; Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong

AI summary: Positions generative AI as an uncertainty-aware support layer that helps RF wireless charging schedulers make robust charging decisions under limited resources and uncertain conditions, and validates it with a warehouse case study.

Abstract

Radio frequency wireless power transfer (RF-WPT) is an enabling technology for supporting uninterrupted communications in future Internet of Things systems by reducing the need for battery replacement and mitigating battery-waste-related issues. For large-scale RF-WPT deployment, one of the main challenges is scheduler-level resource allocation. Specifically, the transmitter must decide how much energy to deliver, when, and to whom, under limited charging resources, incomplete receiver-side information, and uncertain near-future charging conditions. This article positions generative artificial intelligence (GenAI) as a promising tool for this setting because it can foresee multiple plausible charging scenarios conditioned on coarse operational context and receiver-side information. We propose that GenAI act as an uncertainty-aware support layer for the RF-WPT scheduler rather than as a standalone forecasting or decision-making tool. To this end, we first revisit the main challenges of RF-WPT scheduling and discuss how major GenAI families can support uncertainty-aware charging decisions by generating scenario-based inputs for downstream tasks. We then present a warehouse-style case study showing that preserving uncertainty through the sampling capability of generative models can improve robust charging decisions compared with deterministic prediction and simple non-learning baselines, especially under risk-sensitive objectives. Finally, we identify key open challenges and present directions for future research.

2606.10627 2026-06-10 cs.HC cs.LG cs.SD cross-list

Profy: Interpretable Visualization of Expertise-Dependent Motor Skills Toward Supporting Piano Practice

Kazuki Kawamura, Fujiki Nakamura, Hayato Nishioka, Momoko Shioki, Shinichi Furuya, Jun Rekimoto

Affiliations: The University of Tokyo; Sony Computer Science Laboratories; NeuroPiano Institute

AI summary: Proposes Profy, a weakly supervised system that learns time-aligned highlights from listener-rating labels to help piano learners locate passages that need focused practice; its highlights agree closely with expert annotations despite training without localized labels.

Comments: Designing Interactive Systems Conference (DIS '26), June 13-17, 2026, Singapore, Singapore

DOI: 10.1145/3800645.3812903

Abstract

The quality of piano performance depends on nuanced timing, articulation, and dynamic control, but practice feedback is often summary-based and hard to act on. We introduce Profy, a weakly supervised system that learns from take-level labels derived from aggregated listener ratings (expert-labeled vs. amateur-labeled) to produce time-aligned highlights for review during piano practice. We collected synchronized 1 kHz key-motion and audio from 73 pianists and used 1,083 valid takes for modeling and evaluation. The model outputs clip-level predictions together with evidence scores on a shared resampled model time base for visualization. On 20 amateur clips from short technique studies annotated by 21 expert pianists, the displayed highlight score aligns with passages that expert pianists marked for review despite training without localized labels (Pearson r=0.61, ROC-AUC 0.75). Rather than summarizing a take with a single global score, Profy helps learners decide where to inspect next by supporting scrubbing, looping, and focused replay of time-localized passages associated with expert-amateur differences.

2606.10686 2026-06-10 physics.comp-ph astro-ph.IM cs.LG cross-list

An adaptive framework for the axisymmetric pulsar magnetosphere using physics-informed Kolmogorov-Arnold networks

Spyros Rigas, Ioannis Contopoulos, Georgios Alexandridis, Antonios Nathanail

Affiliations: Department of Digital Industry Technologies, School of Science, National and Kapodistrian University of Athens; Research Center for Astronomy and Applied Mathematics, Academy of Athens

AI summary: Proposes an adaptive framework based on Kolmogorov-Arnold networks that combines an automated training pipeline with a physics-based convergence criterion, lowering the mean squared error of the PDE residuals to $\mathcal{O}(10^{-6})$ in double precision, cutting convergence time to under 20 minutes in single precision, and reliably resolving stellar radii reduced by up to 80%.

Comments: 25 pages, 10 figures. Submitted to Journal of Computational Physics

Abstract

The pulsar magnetosphere has only recently been addressed using Physics-Informed Neural Networks (PINNs), by deploying a domain-decomposition approach and treating the separatrix and equatorial current sheet as infinitesimally thin discontinuities. However, this baseline requires extensive manual hyperparameter tuning, achieves limited final accuracy, and demands several hours of training. We refine this framework by introducing domain-specific neural architectures based on Kolmogorov-Arnold networks, an automated adaptive training pipeline, and a physics-based convergence criterion that eliminate the need for manual calibration. The proposed methodology delivers self-consistent axisymmetric magnetosphere solutions with mean squared errors of the PDE residuals at $\mathcal{O}(10^{-6})$ in double precision, an improvement of two orders of magnitude over the baseline, while achieving convergence in under 20 minutes in single precision. Importantly, the method reliably resolves stellar radii reduced by up to 80% compared to the baseline, overcoming the severe spatial scale disparities that also challenge traditional solvers. Furthermore, by varying the flux that opens to infinity, we provide a correction to the equation that connects it to the equatorial T-point's position. The complete framework is released as the open-source library PulsarX.

2606.10692 2026-06-10 cs.CR cs.LG cross-list

Do LLMs Make Neural Distinguishers Wise?

Tatsuya Sakagami, Masashi Hisai, Naoto Yanai

Affiliations: University of Tokyo

AI summary: Proposes LLM-based neural distinguishers built through prompt design and evaluates them on SPECK-32/64, finding that LLMs bring no notable performance gain and that the choice of differences stops mattering at high rounds, while supplying the XOR results improves performance.

Journal ref: DeMeSSAI 2026 poster

Abstract

Neural distinguishers are a cryptanalysis method for symmetric-key cryptography that trains machine learning models on pairs of plaintexts and ciphertexts with specific differences in order to recover a secret key. To the best of our knowledge, no existing work has explored the use of large language models (LLMs) for neural distinguishers. In this paper, we propose LLM-based neural distinguishers through a prompt design and conduct extensive experiments with them on SPECK-32/64 to investigate whether LLMs can strengthen neural distinguishers. We report three key findings. First, by comparing the results of LLM-based neural distinguishers with the ResNet results from existing work, we demonstrate that LLMs provide no observable improvement in the performance of neural distinguishers. Second, we confirm that, at high rounds, the choice of differences is no longer effective for LLM-based neural distinguishers or for ResNet. Third, we show that the performance of LLM-based neural distinguishers can be significantly improved by incorporating only the XOR operation results as a prompt design.
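
For readers unfamiliar with the data behind such distinguishers, the sketch below implements round-reduced SPECK-32/64 and produces the XOR of a ciphertext pair, the kind of feature the third finding refers to. The input difference `(0x0040, 0x0000)` is one often used in earlier neural-distinguisher work on this cipher; the abstract does not say which differences were tested, and how the authors format the values inside a prompt is not shown here. The helper name `xor_feature` is hypothetical.

```python
import random

WORD_BITS, ALPHA, BETA = 16, 7, 2
MASK = (1 << WORD_BITS) - 1

def rol(x, r):
    return ((x << r) & MASK) | (x >> (WORD_BITS - r))

def ror(x, r):
    return (x >> r) | ((x << (WORD_BITS - r)) & MASK)

def round_fn(x, y, k):
    """One SPECK round: rotate, modular add, key XOR, rotate, XOR."""
    x = ((ror(x, ALPHA) + y) & MASK) ^ k
    y = rol(y, BETA) ^ x
    return x, y

def expand_key(key, rounds):
    """key = (l2, l1, l0, k0) as 16-bit words; returns one round key per round."""
    round_keys = [key[3]]
    l = [key[2], key[1], key[0]]
    for i in range(rounds - 1):
        # The key schedule reuses the round function with the counter i as "key".
        l[i % 3], nxt = round_fn(l[i % 3], round_keys[i], i)
        round_keys.append(nxt)
    return round_keys

def encrypt(pt, round_keys):
    x, y = pt
    for k in round_keys:
        x, y = round_fn(x, y, k)
    return x, y

def xor_feature(rounds, diff=(0x0040, 0x0000), rng=random):
    """Encrypt a random plaintext pair with input difference `diff`; return the ciphertext XOR."""
    key = tuple(rng.getrandbits(16) for _ in range(4))
    p0 = (rng.getrandbits(16), rng.getrandbits(16))
    p1 = (p0[0] ^ diff[0], p0[1] ^ diff[1])
    rks = expand_key(key, rounds)
    c0, c1 = encrypt(p0, rks), encrypt(p1, rks)
    return c0[0] ^ c1[0], c0[1] ^ c1[1]

print([hex(w) for w in xor_feature(rounds=5)])
```

A distinguisher is then trained to tell such XOR values (or the raw ciphertext pairs) apart from those of random pairs; as the number of rounds grows, the two distributions become harder to separate.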

2606.10698 2026-06-10 hep-ph cs.LG hep-th cross-list

Efficient AI-Inspired Reduction of Feynman Integrals via Tube Seeding

Justin Berman, Francois Charton, Andres Luna, Matthias Wilhelm, Mao Zeng

Affiliations: Leinweber Institute for Theoretical Physics, Randall Laboratory of Physics, University of Michigan, Ann Arbor, 450 Church St, Ann Arbor, MI 48109-1040, USA; Axiom Math, 124 University Avenue, Palo Alto, California, 94301, United States; Niels Bohr International Academy, Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100 Copenhagen, Denmark; Center for Quantum Mathematics, Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark; Higgs Centre for Theoretical Physics, University of Edinburgh, Edinburgh, EH9 3FD, United Kingdom

AI summary: Uses machine learning to discover a new seeding strategy whose sparse seed set grows linearly, enabling the reduction of multi-loop integrals with high numerator powers at markedly lower time and memory cost, which suits phenomenological applications.

Comments: 61 pages, 25 figures, 11 tables

Abstract

In this paper, we use machine learning to discover a new seeding strategy for integration-by-parts reduction of Feynman integrals, which is a frequent bottleneck in state-of-the-art calculations in theoretical particle and gravitational-wave physics. Our strategy allows us to reduce multi-loop integrals with large numerator powers via essentially the standard Laporta algorithm but with a sparse selection of seed integrals that grows only linearly with the numerator power, whereas existing strategies lead to growth with a polynomial power that increases with the complexity of the integral being reduced. The seeds are restricted to a thin tube-like region that connects the target integral to the master integrals along a zigzag path. We demonstrate the power of our approach by reducing non-planar 2-loop 5-point integrals of rank 20 with numerical kinematics over a finite field, which is prohibitively difficult for the Laporta algorithm with conventional seeding. Going beyond individual integrals, we further demonstrate the reduction of a complete set of top-level rank-10 integrals by dividing the target integrals into several chunks, each of which can be solved by our sparse seeding strategy with considerably less time and a significantly lower memory footprint than other state-of-the-art strategies, making the approach well-suited for phenomenological applications. We provide a proof-of-principle implementation on GitHub at https://github.com/andreslunagodoy/tube_seeding.

2606.10713 2026-06-10 eess.IV cs.AI cs.CV cs.LG cross-list

++nnU-Net: Scaling nnU-Net with Prefix-Based Data Augmentation

Ana Sofia Santos, André Ferreira, Gijs Luijten, Naida Solak, Lisle Faray de Paiva, Behrus Hinrichs-Puladi, Jens Kleesiek, Jan Egger, Victor Alves

Affiliations: Center Algoritmi / LASI, University of Minho, Braga, Portugal; Institute for Artificial Intelligence in Medicine, University Medicine Essen, Essen, Germany; Institute of Medical Informatics / Dept. of Oral and Maxillofacial Surgery, University Hospital RWTH Aachen, Germany; Faculty of Computer Science, University of Duisburg-Essen, Essen, Germany

AI summary: Proposes ++nnU-Net, which augments data through image registration, generating warped images before preprocessing and training, and raises the Dice coefficient by up to about 22% on five 2D datasets.

Comments: 7 pages, 1 figure, 2 tables

Abstract

The nnU-Net has demonstrated continuous success in medical segmentation tasks, which heavily rely on the availability and diversity of annotated biomedical data. However, assembling medical imaging cohorts remains challenging due to factors such as privacy regulations and annotation costs. As a result, data augmentation plays a crucial role in increasing data availability while maintaining anatomical feasibility. Hence, we propose the ++nnU-Net, a novel data augmentation module based on image registration that operates before preprocessing and training take place. Our framework was evaluated across five different 2D datasets. In this workflow, image data go through a two-stage registration process, generating new warped images. The transformations are then applied to the respective segmentations. In addition, the pipeline computes available disk space, generates supplementary binary synthetic masks, and generates checkpoints. We demonstrate that the ++nnU-Net outperforms the nnU-Net baseline, yielding improvements in Dice Similarity Coefficient scores. In the most prominent cases, we observe performance gains of approximately 22%. These findings highlight the effectiveness of registration-based data augmentation, particularly for 2D medical imaging datasets, and suggest that the ++nnU-Net provides a practical and scalable approach for enhancing segmentation performance in data-limited settings. The source code for the ++nnU-Net is available at: https://github.com/sofia-adelie/plusplusnnunet.git

2606.10782 2026-06-10 cs.CR cs.AI cs.LG cross-list

A Bayesian Network Approach for Enhancing Security-Focused Decision Support Systems

Carolina Fernández-Martínez, Shuaib Siddiqui, Vanesa Daza

Affiliations: University of Granada; University of Birmingham

AI summary: Proposes a Bayesian-network-based decision support system that helps infrastructure operators choose security tools by capturing user requirements and running inference to recommend the best-suited security mechanisms; evaluated on time and prediction accuracy.

DOI: 10.1109/LCN65610.2025.11146363
Journal ref: Proc. 2025 IEEE 50th Conference on Local Computer Networks (LCN), 2025

Abstract

The adoption and integration of heterogeneous stacks in most of today's open-source-based networks brings clear benefits such as interoperability and the availability of advanced features. On the other hand, the increasing number of interconnecting components and moving parts requires maintaining an ever-increasing base of interdisciplinary knowledge of different tools in different domains to ensure proper operation. To alleviate this effort, this work proposes a Decision Support System (DSS) to guide infrastructure operators through the selection of security approaches (e.g., tools) to adopt in their environments. The framework easily captures the end user's high-level requirements on the security triad for different domains and runs inference on the designated models to identify the tools (security mechanisms) that best serve those needs. The presented DSS aims to deliver an understandable and extensible framework that accommodates varying requirements and Bayesian Network (BN) models. The architecture and modelling of the system are proposed, aligned with its theoretical framework. Its performance is evaluated in terms of time and prediction accuracy.

2606.10889 2026-06-10 q-bio.NC cs.LG cross-list

Sleep EEG Signal Criticality as a Non-Invasive Predictor of Cognitive Decline in Dementia

Stanisław Narębski, Tomasz Komendziński, Tomasz M. Rutkowski

Affiliations: Institute of Cybernetics and Human Informatics, Polish Academy of Sciences

AI summary: Quantifies sleep EEG signal criticality with multifractal detrended fluctuation analysis and finds that cognitively healthy individuals lie closer to an optimally critical state, whereas the dementia group's DFA exponents shift toward 1.0, indicating that a reconfiguration of scale-free neural dynamics during sleep precedes clinical symptoms and could serve as an early screening tool.

Comments: 4 pages, 2 figures, accepted for publication in the Proc. 48th Annu. Int. Conf. IEEE EMBS (EMBC 2026), Toronto, Canada, July 20-24, 2026

Abstract

Early detection of neurodegeneration remains a critical clinical challenge. This study investigates whether sleep EEG signal criticality, quantified via Multifractal Detrended Fluctuation Analysis (MFDFA), serves as a non-invasive biomarker for future cognitive decline. We analyzed longitudinal data from the National Sleep Research Resource (NSRR) Study of Osteoporotic Fractures (SOF) cohort, comparing baseline sleep EEG dynamics between women who remained cognitively normal and those who later progressed to dementia-related impairment ($3MS < 78$). Our results reveal significant group-level differences in Hurst exponent $H(q)$ distributions, particularly during non-REM stages N2 and N3. Cognitively healthy individuals exhibited signal dynamics significantly closer to an optimally critical state across all electrode locations ($p \leqslant 0.001$), supporting the Brain Criticality Hypothesis. Supervised UMAP projections confirmed clear spatial separation between groups throughout the overnight sleep architecture. The dementia group demonstrated a shift in DFA exponents toward $1.0$, suggesting that a reconfiguration of scale-free neural dynamics during sleep precedes clinical symptoms. These findings highlight the potential for MFDFA-derived measures to be integrated into automated, sleep-based screening tools, enabling earlier preventative interventions during the prodromal window of dementia.
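
The DFA exponent mentioned above can be illustrated with a short NumPy sketch of ordinary (monofractal) detrended fluctuation analysis with linear detrending. This is the $q = 2$ special case, not the full MFDFA pipeline of the study; the window sizes and the function name `dfa_exponent` are assumptions made for illustration.

```python
import numpy as np

def dfa_exponent(signal, scales):
    """Slope of log F(n) versus log n using linear detrending (DFA-1)."""
    profile = np.cumsum(signal - np.mean(signal))
    fluctuations = []
    for n in scales:
        n_windows = len(profile) // n
        windows = profile[: n_windows * n].reshape(n_windows, n)
        t = np.arange(n)
        # Fit a straight line to every window at once (one column per window).
        coeffs = np.polyfit(t, windows.T, 1)
        trend = np.outer(coeffs[0], t) + coeffs[1][:, None]
        # Root-mean-square fluctuation around the local trends.
        fluctuations.append(np.sqrt(np.mean((windows - trend) ** 2)))
    slope, _ = np.polyfit(np.log(scales), np.log(fluctuations), 1)
    return float(slope)

rng = np.random.default_rng(0)
white_noise = rng.standard_normal(8192)
scales = [16, 32, 64, 128, 256]
print(dfa_exponent(white_noise, scales))             # uncorrelated noise: near 0.5
print(dfa_exponent(np.cumsum(white_noise), scales))  # random walk: near 1.5
```

An exponent near 1.0 sits between these two reference cases and corresponds to $1/f$-like, scale-free fluctuations.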

2606.10908 2026-06-10 cs.SD cs.AI cs.CR cs.LG cross-list

RAT: Reference-Augmented Training for ASV Anti-Spoofing

Vojtěch Staněk, Anton Firc, Jakub Reš, Kamil Malinka

Affiliations: Brno University of Technology

AI summary: Proposes a spoofing countermeasure architecture conditioned on speaker-reference recordings and finds that training with a reference channel improves deepfake detection even when the reference is absent or mismatched at inference. The resulting Reference-Augmented Training (RAT) strategy reaches 2.57% EER and 0.074 minDCF on the ASVspoof 5 benchmark with a single detector, surpassing large ensemble systems.

Comments: Accepted to Interspeech 2026

Abstract

We introduce a spoofing countermeasure architecture conditioned on speaker-reference recordings, but observe that it converges to a solution that effectively ignores the reference during inference. Surprisingly, training with a reference channel induces invariance that improves deepfake detection, even when the reference is absent or mismatched during inference. Based on this observation, we propose a Reference-Augmented Training (RAT) strategy. RAT yields improved detection performance compared to single-utterance baselines, even when the reference recording is replaced with a zero vector at inference. Through rigorous analysis, we demonstrate that the optimization process rapidly diminishes the reference contributions, leading to inference largely independent of the reference channel. Using RAT, we achieve state-of-the-art 2.57% EER and 0.074 minDCF on the ASVspoof 5 benchmark with a single detector, surpassing even large ensemble systems.

2606.10909 2026-06-10 cs.CE cs.LG physics.comp-ph cross-list

Non-linear mechanical field reconstruction coupling recurrent neural networks with physics-informed graph neural networks

Manuel Ricardo Guevara Garban, Yves Chemisky, Étienne Prulière, Michaël Clément, Martin Abendroth, Björn Kiefer

Affiliations: Univ. Bordeaux, CNRS, Bordeaux INP, I2M, UMR 5295; Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800; Institute of Mechanics and Fluid Dynamics, TU Bergakademie Freiberg; Univ. Grenoble Alpes, CNRS, UMR 5525; Arts et Metiers Institute of Technology, CNRS, Bordeaux INP, I2M, UMR 5295

AI summary: Proposes a coupled LSTM-GNN framework in which an LSTM encodes the path-dependent response of macroscopic stress-strain sequences and a physics-informed GNN reconstructs the spatial stress field, with a relative weighting strategy balancing the losses; it reconstructs stress fields in elasto-plastic microstructures three orders of magnitude faster.

Abstract

Reconstructing local stress fields in heterogeneous microstructures under non-linear, history-dependent loading remains a major computational bottleneck in multi-scale simulations. We propose a coupled LSTM-GNN framework that links the temporal and spatial aspects of local stress field reconstruction. A Long Short-Term Memory network encodes macroscopic stress-strain sequences into a compact hidden state that captures the path-dependent constitutive response, while a physics-informed Graph Neural Network reconstructs the spatially resolved stress field at each time step. We introduce a relative weighting strategy with linear warm-up to balance the data-driven reconstruction loss and a discrete divergence-based equilibrium penalty. This resolves the scale mismatch that prevents fixed-weight formulations from converging in the elasto-plastic regime. The model is trained on 10,000 non-proportional loading paths applied to a periodic plate-with-a-hole microstructure with von Mises elasto-plasticity. It achieves a speedup of three orders of magnitude over finite element simulations and generalizes to loading sequences twice the training length, with 1.9% cumulative error. Because the graph relies on mesh connectivity instead of the specific element type, one trained surrogate can be applied directly, without retraining, to meshes with different element types and to both coarser and finer resolutions, while in all cases reproducing the high-fidelity quad-element FE field used during training. Indeed, the message-passing characteristics inherent to the GNN and MeshGraphNet architectures render the model mesh-agnostic. Analysis of the LSTM hidden states suggests a low-dimensional structure related to the internal state variables of the constitutive model.
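
The abstract does not give the formula behind its "relative weighting strategy with linear warm-up", so the following is only one plausible reading: the equilibrium penalty is rescaled to a target fraction of the data loss, and that fraction is ramped up linearly. All names and default values (`warmup_steps`, `max_fraction`) are hypothetical.

```python
def warmup_fraction(step, warmup_steps, max_fraction):
    """Linearly ramp the target physics-to-data ratio from 0 to max_fraction."""
    return max_fraction * min(1.0, step / warmup_steps)

def combined_loss(data_loss, equilibrium_loss, step,
                  warmup_steps=1000, max_fraction=0.1, eps=1e-12):
    """Data loss plus an equilibrium penalty rescaled relative to the data loss."""
    # Ratio of the two magnitudes; in an autograd framework this factor would be
    # detached so that it acts as a constant weight, not as part of the gradient.
    scale = data_loss / (equilibrium_loss + eps)
    weight = warmup_fraction(step, warmup_steps, max_fraction) * scale
    return data_loss + weight * equilibrium_loss

# The penalty is six orders of magnitude larger than the data loss, yet it
# contributes at most 10% of the data loss once the warm-up is complete.
for step in (0, 500, 2000):
    print(step, combined_loss(2.0, 1e6, step))
```

With a fixed weight, a penalty of this magnitude would dominate the objective from the first step, which is the kind of scale mismatch the abstract describes.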

2606.10928 2026-06-10 cs.CE cs.AI cs.LG physics.comp-ph cross-list

A Constrained Natural-Language Interface for Variational Multi-Physics Finite Element Simulations in FEniCS

Nilay Upadhyay, Wesley F. Reinhart

Affiliations: Department of Engineering Science and Mechanics, The Pennsylvania State University; Department of Materials Science and Engineering, The Pennsylvania State University

AI summary: Proposes a constrained natural-language interface that limits the LLM to front-end tasks (parsing prompts, generating Gmsh code) while a deterministic template-based solver runs the back end, reaching a 100% parse rate and a 90% geometry-generation success rate on benchmarks.

Comments: 23 pages, 17 figures

Abstract

Large language models can reduce the manual effort required to set up finite element simulations, but they introduce reliability risks when generated solver code lies on the critical path. We present a constrained natural-language interface for multi-physics finite element analysis in which the LLM is limited to front-end tasks: parsing prompts into structured JSON, generating Gmsh code only for non-catalog geometries, and using retry feedback for those stages. It never writes FEniCS solver templates, derives weak forms, or writes the numerical solver core. A deterministic dispatcher maps the validated specification to five human-written FEniCS/UFL templates: linear elasticity, hyperelasticity, elastoplasticity, thermo-mechanical coupling, and phase-field fracture. We validate this deterministic template layer against analytical solutions and published 2D/3D benchmarks. Smooth cases reach sub-percent agreement on adequate meshes, while harder nonlinear cases reach the 2-5 percent range. We also evaluate the LLM-facing front end directly. In a 15-prompt parser benchmark, first-pass valid parses were obtained for 9 cases, and all remaining cases were repaired after retry, giving a final valid parse rate of 100.0 percent, 100.0 percent problem-class accuracy, and 97.1 percent field-extraction accuracy. In a 10-case custom-geometry benchmark routed through the real LLM-to-Gmsh path, first-pass and final success were both 90.0 percent, with one unrecovered invalid-geometry failure. These results show that the parser and constrained prompt/validation design are effective on these benchmarks. As an end-to-end demonstration, the system generates and analyzes a 3D elastoplastic L-bracket with a fillet and bolt hole from one natural-language prompt. The contribution is a measured architecture for natural-language-driven variational simulation, not open-ended autonomous code generation.
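
The division of labour described above (LLM output is only parsed and validated, then routed to fixed templates) can be sketched as a small dispatcher. The five problem classes come from the abstract; the JSON field names and template file paths are invented for illustration and do not reflect the authors' schema.

```python
import json

# Problem classes named in the abstract; the file paths are hypothetical.
TEMPLATES = {
    "linear_elasticity": "templates/linear_elasticity.py",
    "hyperelasticity": "templates/hyperelasticity.py",
    "elastoplasticity": "templates/elastoplasticity.py",
    "thermo_mechanical": "templates/thermo_mechanical.py",
    "phase_field_fracture": "templates/phase_field_fracture.py",
}
# Hypothetical minimal schema for the parsed specification.
REQUIRED_FIELDS = ("problem_class", "geometry", "material", "boundary_conditions")

def validate_spec(raw_json):
    """Parse LLM output and reject anything outside the fixed schema."""
    spec = json.loads(raw_json)  # malformed JSON raises a ValueError subclass
    missing = [f for f in REQUIRED_FIELDS if f not in spec]
    if missing:
        raise ValueError(f"missing fields: {missing}")
    if spec["problem_class"] not in TEMPLATES:
        raise ValueError(f"unknown problem class: {spec['problem_class']}")
    return spec

def dispatch(raw_json):
    """Map a validated spec to a human-written template; no solver code is generated."""
    spec = validate_spec(raw_json)
    return TEMPLATES[spec["problem_class"]], spec

llm_output = json.dumps({
    "problem_class": "elastoplasticity",
    "geometry": {"type": "l_bracket"},
    "material": {"E": 210e9, "nu": 0.3, "yield_stress": 250e6},
    "boundary_conditions": [{"face": "left", "type": "fixed"}],
})
template, spec = dispatch(llm_output)
print(template)
```

A validation error raised here is the natural hook for the retry feedback the abstract mentions: the error message can be returned to the LLM for another parsing attempt while the solver layer stays untouched.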

2606.10942 2026-06-10 cs.NI cs.AI cs.LG cross-list

Generative Explainability for Next-Generation Networks: LLM-Augmented XAI with Mutual Feature Interactions

Kiarash Rezaei, Omran Ayoub, Sebastian Troia, Francesco Lelli, Paolo Monti, Carlos Natalino

Affiliations: Swedish Innovation Agency; Swiss Innovation Agency

AI summary: Proposes a framework that uses a large language model and mutual feature interaction data to generate natural-language explanations; in an optical quality-of-transmission estimation use case it improves explanation usefulness and scope by 12.2% and 6.2%, respectively, over the baseline, with 97.5% correctness.

Comments: 7 pages, with one page for appendix. Accepted for publication at the 2025 21st International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob)

DOI: 10.1109/WiMob66857.2025.11257542
Journal ref: Proc. WiMob, Marrakesh, Morocco, 2025

Abstract

As artificial intelligence and machine learning (AI/ML) models become integral to network operations, their lack of transparency poses a significant barrier to operator trust. Existing explainable artificial intelligence (XAI) techniques often fail to bridge this gap for non-specialists, producing technical outputs that are difficult to translate into actionable insights. This paper presents a framework specifically designed to address this shortcoming. It leverages a moderately sized large language model (LLM) and extends beyond the standard use of SHapley Additive exPlanations (SHAP) feature influence values. The framework employs a structured prompt enriched with mutual feature interaction data to generate human-understandable natural language explanations. To validate our framework, we performed an empirical evaluation on an optical quality of transmission (QoT) estimation use case with human evaluators. We collected independent performance evaluations from specialists, which showed a high inter-evaluator agreement. Compared to a state-of-the-art baseline that uses only SHAP feature influence values in a straightforward prompt, our approach improves the explanation usefulness and scope by 12.2% and 6.2%, while achieving 97.5% correctness.


2606.11023 2026-06-10 cs.IR cs.CL cs.LG cross-list

Generative Archetype-Grounded Item Representations for Sequential Recommendation

Yifan Li, Jiahong Liu, Xinni Zhang, Hao Chen, Yankai Chen, Wenhao Yu, Jianting Chen, Irwin King

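Affiliations: The Chinese University of Hong Kong; McGill University; Tongji University

AI summary: Proposes GenAIR, a framework that uses a large language model to generate item archetype descriptions and extract their embeddings, and adds a behavioral calibration objective to bridge the gap between semantics and behavior, significantly improving sequential recommendation performance.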

Comments Accepted by WWW 2026 (Oral)


DOI: 10.1145/3774904.3792587

Abstract

Sequential recommendation aims to predict users' next interaction with items by analyzing their historical behavior. However, the limited quality of item representations remains a critical bottleneck. While pre-trained large language models (LLMs) can provide rich semantic representations, existing approaches only rely on static encoding of fixed attributes, overlooking the crucial role of target audiences in defining item identity. Moreover, the semantic space struggles to reflect actual user behavior, resulting in a significant gap between semantic representations and behavioral patterns. To address these limitations, we propose GenAIR, a general framework that empowers sequential recommendation with Generative Archetype-grounded Item Representations. Specifically, we first leverage an LLM to analyze item metadata and infer textual description of the Archetype, which represents the conceptual profile of the item's ideal target audience. We then extract the corresponding embeddings in a single forward pass. Further, to ground these generative archetypes in real-world behavior, we introduce a behavioral calibration objective, which explicitly incorporates behavioral signals from actual interactions. This objective adjusts the structure of the embedding space to reflect empirical patterns. GenAIR enables seamless integration with most existing models while maintaining high efficiency. Comprehensive experiments conducted on three real-world datasets demonstrate that GenAIR significantly improves the performance of various sequential recommendation models and consistently outperforms state-of-the-art baseline approaches. Implementation codes are available at https://github.com/AI-Santiago/GenAIR.


2606.11125 2026-06-10 eess.SP cs.LG cross-list

DMT: Demographic Conditioning, Morphology-Enhanced Transformer for Cuffless Blood Pressure Estimation from PPG Signals

Yidan Shen, Neville Mathew, Maham Rahimi, Deependra Dhakal, George Zouridakis, Xin Fu, Renjie Hu

Affiliations: University of California, San Diego

AI summary: Proposes a Transformer-based network for cuffless blood pressure estimation from PPG signals that incorporates demographic information through FiLM-style feature modulation and adds an auxiliary morphology head to steer the model toward waveform morphology related to arterial stiffness, reaching an MAE of 4.56 mmHg for systolic and 2.62 mmHg for diastolic blood pressure on the PulseDB dataset.

Abstract

Blood pressure (BP) is a key marker for cardiovascular risk assessment and therapeutic decision-making, and Photoplethysmography (PPG) enables low-cost, wearable-friendly cuffless BP estimation. However, even with recent progress, many PPG-based models are trained with BP regression alone and may rely on amplitude-dominated shortcuts. In addition, demographic covariates that systematically modulate vascular compliance are often incorporated only via late fusion, limiting subject-specific representation learning. We propose a Transformer-based network for cuffless BP estimation from PPG signal, leveraging self-attention to capture long-range dependencies across multiple cardiac cycles. To account for subject-specific vascular differences, the model is conditioned on demographics via FiLM-style feature modulation applied through the attention and feed-forward sublayers of Transformer blocks. In addition, we add an auxiliary morphology head to guide the model to attend to BP-relevant waveform morphology associated with arterial stiffness and wave reflection. Under calibration-based evaluation protocols on the large-scale PulseDB dataset, the proposed method achieves MAE of 4.56 mmHg for systolic BP and 2.62 mmHg for diastolic BP, reducing errors by 47% and 50% compared with prior demographic-enhanced PPG baselines. The resulting lightweight, single-sensor model supports scalable and clinically grounded cuffless BP estimation in calibration-enabled deployment settings.
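The FiLM-style conditioning mentioned in the abstract can be sketched as a per-feature scale and shift computed from the demographic vector. This is a minimal standard-library sketch of generic feature-wise linear modulation; the function name `film`, the two-element demographic vector, and all weights are assumptions, and the paper's placement inside the attention and feed-forward sublayers is not modeled.

```python
def film(features, demographics, w_gamma, b_gamma, w_beta, b_beta):
    """Feature-wise linear modulation: out[i] = gamma[i] * features[i] + beta[i].

    gamma and beta are linear functions of the conditioning vector (demographics).
    """
    def linear(weights, bias):
        # One output per weight row: dot(row, demographics) + bias.
        return [sum(w * d for w, d in zip(row, demographics)) + b
                for row, b in zip(weights, bias)]

    gamma = linear(w_gamma, b_gamma)  # per-feature scale
    beta = linear(w_beta, b_beta)     # per-feature shift
    return [g * x + b for g, x, b in zip(gamma, features, beta)]


# Hypothetical inputs: two hidden features, demographics = (normalized age, sex flag).
modulated = film(
    features=[1.0, 2.0],
    demographics=[0.5, 1.0],
    w_gamma=[[0.0, 0.0], [2.0, 0.0]], b_gamma=[1.0, 0.0],
    w_beta=[[0.0, 0.0], [0.0, 1.0]], b_beta=[0.0, 0.0],
)
print(modulated)
```

Because the modulation acts on intermediate features rather than on the final prediction, the same waveform can be represented differently for different subjects, which is the contrast with late fusion that the abstract draws.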


2606.11140 2026-06-10 physics.geo-ph cs.AI cs.LG stat.AP stat.ML cross-list

Data assimilation for subsurface flow using latent diffusion model parameterization: performance of ensemble-Kalman and Monte Carlo techniques

Guido Di Federico, Wenchao Teng, Louis J. Durlofsky

Affiliations: Department of Energy Science & Engineering, Stanford University

AI summary: For high-dimensional parameter inversion in subsurface flow data assimilation, compares a latent diffusion model (LDM)-based ensemble Kalman method (ESMDA) with Monte Carlo methods (MCMC/SMC) on 3D channelized geomodels, and finds that the Monte Carlo methods reduce data mismatch and uncertainty more effectively while preserving geological realism.

Abstract

Data assimilation (DA) in subsurface flow entails calibrating model parameters to match observed data, typically at wells, while preserving geological realism. Latent diffusion models (LDMs) provide efficient mappings from high-dimensional geological model space to a low-dimensional latent variable, reducing the dimensionality of the inverse problem while maintaining plausibility in posterior geomodels. However, the high nonlinearity in the LDM mapping may degrade the performance of Kalman-gain-based ensemble updates. We present a systematic comparison of DA algorithms applied to large-scale 3D channelized geomodels with hierarchical geological uncertainty. We compare model-space and latent-space DA using the ensemble smoother with multiple data assimilation (ESMDA), and demonstrate a key trade-off: model-space updates achieve significant uncertainty reduction but produce geologically unrealistic posterior models, while latent-space updates preserve realism but exhibit limited uncertainty reduction. Motivated by this, we explore rigorous Markov chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) algorithms in the 3D-LDM latent space. To accommodate their high computational demands, we develop a fast surrogate flow model that approximates well-rate responses. MCMC and SMC are evaluated against ESMDA across three synthetic test cases, with DA performed in the LDM latent space. All models maintain geological realism due to the LDM parameterization. MCMC and SMC are consistent with one another and achieve lower data mismatch and more uncertainty reduction than latent-space ESMDA. Our overall results demonstrate that ensemble Kalman methods may provide overestimated posterior uncertainty with highly nonlinear parameterizations, while rigorous Monte Carlo sampling, enabled by fast surrogate models, can provide a more reliable alternative.
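A minimal sketch of the ESMDA update discussed above, reduced to one scalar parameter and one scalar observation so that it runs with the standard library alone. The linear forward model, the noise level, the ensemble size, and the four equal inflation coefficients are assumptions for illustration; the paper applies the update to LDM latent variables with a flow simulator or surrogate.

```python
import random
import statistics


def esmda_scalar(prior, forward, d_obs, obs_std, alphas, rng):
    """ES-MDA for one scalar parameter and one scalar observation."""
    # The inflation coefficients must satisfy sum(1 / alpha) == 1.
    assert abs(sum(1.0 / a for a in alphas) - 1.0) < 1e-9
    ensemble = list(prior)
    n = len(ensemble)
    for alpha in alphas:
        preds = [forward(m) for m in ensemble]
        m_bar = statistics.fmean(ensemble)
        d_bar = statistics.fmean(preds)
        # Sample cross-covariance (parameter, data) and data variance.
        c_md = sum((m - m_bar) * (d - d_bar) for m, d in zip(ensemble, preds)) / (n - 1)
        c_dd = sum((d - d_bar) ** 2 for d in preds) / (n - 1)
        # Kalman gain with the observation-error variance inflated by alpha.
        gain = c_md / (c_dd + alpha * obs_std ** 2)
        # Each member assimilates its own perturbed copy of the observation.
        ensemble = [
            m + gain * (d_obs + (alpha ** 0.5) * obs_std * rng.gauss(0.0, 1.0) - d)
            for m, d in zip(ensemble, preds)
        ]
    return ensemble


rng = random.Random(0)
prior = [rng.gauss(0.0, 1.0) for _ in range(200)]
# Hypothetical linear forward model d = 2 m, observed value 2.0 (so m is near 1).
posterior = esmda_scalar(prior, lambda m: 2.0 * m, 2.0, 0.1, [4.0] * 4, rng)
print(statistics.fmean(posterior), statistics.stdev(posterior))
```

With a linear forward model the Kalman-gain update is well behaved; the abstract's concern is that a highly nonlinear mapping, such as an LDM decoder followed by flow simulation, breaks the linear-Gaussian assumption behind this gain.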


2503.19158 2026-06-10 cs.LG q-bio.QM version update

Integrating Biological-Informed Recurrent Neural Networks for Glucose-Insulin Dynamics Modeling

Stefano De Carli, Nicola Licini, Davide Previtali, Fabio Previdi, Antonio Ferramosca

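Affiliations: Department of Management, Information and Production Engineering, University of Bergamo

AI summary: Proposes a biological-informed recurrent neural network framework for more accurate modeling of glucose-insulin dynamics, with the aim of improving personalized glucose regulation in artificial pancreas systems.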

Comments Accepted for publication in the proceedings of the Engineering Diabetes Technologies (EDT 2025). 7 pages, 2 figures and 1 table


DOI: 10.1016/j.ifacol.2025.06.016
Journal ref: IFAC-PapersOnLine, 59(2), 2025, pp. 91-96

Abstract

Type 1 Diabetes (T1D) management is a complex task due to many variability factors. Artificial Pancreas (AP) systems have alleviated patient burden by automating insulin delivery through advanced control algorithms. However, the effectiveness of these systems depends on accurate modeling of glucose-insulin dynamics, which traditional mathematical models often fail to capture due to their inability to adapt to patient-specific variations. This study introduces a Biological-Informed Recurrent Neural Network (BIRNN) framework to address these limitations. The BIRNN leverages a Gated Recurrent Units (GRU) architecture augmented with physics-informed loss functions that embed physiological constraints, ensuring a balance between predictive accuracy and consistency with biological principles. The framework is validated using the commercial UVA/Padova simulator, outperforming traditional linear models in glucose prediction accuracy and reconstruction of unmeasured states, even under circadian variations in insulin sensitivity. The results demonstrate the potential of BIRNN for personalized glucose regulation and future adaptive control strategies in AP systems.


2511.01927 2026-06-10 cs.LG cs.AI cs.NA math.NA version update

Learning-Guided Integration Contours Construction for Fast Large-Scale Generalized Eigensolvers

Yeqiu Chen, Ziyan Liu, Hong Wang, Lei Liu

Affiliations: University of Science

AI summary: Proposes Deepcontour, a hybrid framework that combines a deep learning spectral predictor with kernel density estimation to automatically construct optimized integration contours, accelerating large-scale generalized eigensolvers by up to 5.63x while maintaining numerical accuracy.

Abstract

Solving large-scale Generalized Eigenvalue Problems (GEPs) is a fundamental yet computationally prohibitive task in science and engineering. As a promising direction, contour integral (CI) methods offer an efficient and parallelizable framework. However, their performance is critically dependent on the selection of integration contours -- improper selection without reliable prior knowledge of eigenvalue distribution can incur significant computational overhead and compromise numerical accuracy. To address this challenge, we propose Deepcontour, a novel hybrid framework that integrates a deep learning-based spectral predictor with Kernel Density Estimation (KDE) for principled contour design. Specifically, Deepcontour utilizes its specialized Eigen-Neural-Operator (ENO) to provide rapid spectral distribution priors, driving a KDE module to automatically construct the optimized integration contours, which guide the CI solver to efficiently find the desired eigenvalues. Deepcontour achieves up to a 5.63x speedup across diverse scientific datasets while maintaining strict numerical rigor. By merging the predictive power of deep learning with the numerical rigor of classical solvers, this work establishes an efficient and robust paradigm for solving large-scale GEPs.
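To make the KDE-driven contour construction concrete, the sketch below estimates a density over predicted eigenvalues and turns each high-density interval into a circular contour. It assumes a real spectrum, a Gaussian kernel with a hand-picked bandwidth, a fixed density threshold, and circles centered on the real axis; the function names and all numbers are hypothetical, and the paper's actual contour rule may differ.

```python
import math


def gaussian_kde(samples, bandwidth):
    """Return a 1D Gaussian kernel density estimate as a callable."""
    norm = 1.0 / (len(samples) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return norm * sum(math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples)

    return density


def contours_from_density(density, lo, hi, threshold, steps=400):
    """Scan [lo, hi] on a grid and return (center, radius) for each run of
    grid points where the density exceeds the threshold."""
    contours, start = [], None
    step = (hi - lo) / steps
    for i in range(steps + 1):
        x = lo + i * step
        inside = density(x) > threshold
        if inside and start is None:
            start = x
        elif not inside and start is not None:
            end = x - step  # last grid point that was still above the threshold
            contours.append(((start + end) / 2.0, (end - start) / 2.0))
            start = None
    if start is not None:
        # The run reaches the right edge of the scan window.
        contours.append(((start + hi) / 2.0, (hi - start) / 2.0))
    return contours


# Hypothetical predicted eigenvalues forming two clusters.
predicted = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1]
circles = contours_from_density(gaussian_kde(predicted, 0.2), 0.0, 6.0, threshold=0.1)
print(circles)
```

Each `(center, radius)` pair would then define one contour for the CI solver, so that quadrature effort is spent only where eigenvalues are expected.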


2601.14653 2026-06-10 cs.LG q-bio.GN version update

Efficient Imputation for Patch-based Missing Single-cell Data via Cluster-regularized Optimal Transport

Yuyu Liu, Jiannan Yang, Ziyang Yu, Weishen Pan, Fei Wang, Tengfei Ma

Affiliations: Department of Computer Science, Stony Brook University; Department of Computer Science, Emory University; Department of Population Health Sciences, Cornell University; Department of Computer Science and Engineering, Cornell University; Department of Biomedical Informatics, Stony Brook University

AI summary: Proposes CROT, an algorithm that uses optimal transport to handle patch-based missingness in single-cell data, achieving high imputation accuracy with significantly lower runtime.

Comments Accepted to ACM-BCB 2026

Abstract

Missing data in single-cell sequencing datasets poses significant challenges for extracting meaningful biological insights. However, existing imputation approaches, which often assume uniformity and data completeness, struggle to address cases with large patches of missing data. In this paper, we present CROT (Cluster-Regularized Optimal Transport), an optimal transport-based imputation algorithm designed to handle patch-based missing data in tabular formats. Our approach effectively captures the underlying data structure in the presence of significant missingness. Notably, it achieves superior imputation accuracy while significantly reducing runtime, demonstrating its scalability and efficiency for large-scale datasets. This work introduces a robust solution for imputation in heterogeneous, high-dimensional datasets with structured data absence, addressing critical challenges in both biological and clinical data analysis. Our code is available on GitHub, https://github.com/yuyuliu11037/CROT.


2602.02788 2026-06-10 cs.LG cs.AI physics.comp-ph version update

Structure-Preserving Learning Improves Geometry Generalization in Neural PDEs

Benjamin D. Shaffer, Shawn Koohy, Brooks Kinch, M. Ani Hsieh, Nathaniel Trask

Affiliations: University of California, Berkeley

AI summary: Proposes General-Geometry Neural Whitney Forms (Geo-NeW), which jointly learns a differential operator and compatible reduced finite element spaces and uses Finite Element Exterior Calculus to preserve physical conservation laws exactly, significantly improving generalization to unseen geometries.

Abstract

We aim to develop physics foundation models for science and engineering that provide real-time solutions to Partial Differential Equations (PDEs) which preserve structure and accuracy under adaptation to unseen geometries. To this end, we introduce General-Geometry Neural Whitney Forms (Geo-NeW): a data-driven finite element method. We jointly learn a differential operator and compatible reduced finite element spaces defined on the underlying geometry. The resulting model is solved to generate predictions, while exactly preserving physical conservation laws through Finite Element Exterior Calculus. Geometry enters the model as a discretized mesh both through a transformer-based encoding and as the basis for the learned finite element spaces. This explicitly connects the underlying geometry and imposed boundary conditions to the solution, providing a powerful inductive bias for learning neural PDEs, which we demonstrate improves generalization to unseen domains. We provide a novel parameterization of the constitutive model ensuring the existence and uniqueness of the solution. Our approach demonstrates state-of-the-art performance on several steady-state PDE benchmarks, and provides a significant improvement over conventional baselines on out-of-distribution geometries.


2602.06411 2026-06-10 cs.LG version update

DAH-Net: A Dual-Attention Hybrid Network for Interpretable and Robust EEG-Based Emotion Recognition

S M Rakib UI Karim, Diponkor Bala, Wenyi Lu, Rownak Ara Rasul, Sean Goggins

Affiliations: Department of Electrical & Computer Engineering, University of Missouri, Columbia, Missouri, USA; Department of Computer Science & Engineering, City University, Savar, Dhaka-1340, Bangladesh; Department of Computer Science, University of Missouri, Columbia, Missouri, USA

AI summary: Proposes DAH-Net, a dual-attention hybrid network integrating 1D-CNN, BiLSTM, and dual multi-head attention, which reaches 99.19% test accuracy on 2,479 samples with 988 features, outperforms conventional models, and shows through feature analysis that covariance features contribute the most.

Abstract

EEG-based emotion recognition supports affective brain-computer interfaces and mental health monitoring yet remains challenged by signal complexity, subject variability, and limited interpretability. We propose DAH-Net, a dual-attention hybrid network integrating 1D-CNN, BiLSTM, and dual multi-head attention (16+8 heads) for three-class EEG emotion classification. Evaluated on 2,479 samples with 988 EEG features, DAH-Net achieves 99.19% held-out test accuracy with a 0.81% train-test gap, outperforming RF (96.17%), SVM (96.77%), MLP (97.18%), and Transformer (98.19%) baselines. Friedman testing ($\chi^2$ = 28.54, p < 0.001) and post-hoc Wilcoxon comparisons confirm statistical significance. Feature-level analysis using Random Forest importance, SHAP attribution, and feature category isolation shows that covariance features achieve near-baseline standalone accuracy (94.96%), while eigenvalue features show limited standalone performance (84.07%) but provide compact complementary information. The compact architecture (3.33M parameters, approximately 13.3MB using 32-bit weights) suggests potential for future lightweight EEG-based affective computing, pending subject-independent and external validation.
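The model-size figure in the abstract can be checked with a short calculation. The sketch assumes decimal megabytes (10^6 bytes) and 4 bytes per 32-bit weight, and it reads the train-test gap as training accuracy minus test accuracy; both readings are assumptions rather than statements from the paper.

```python
params = 3.33e6        # parameter count quoted in the abstract
bytes_per_weight = 4   # one 32-bit weight occupies 4 bytes

# Storage for the weights alone, in decimal megabytes.
size_mb = params * bytes_per_weight / 1e6

test_acc = 99.19       # held-out test accuracy, percent
gap = 0.81             # reported train-test gap, percentage points

# Training accuracy implied if the gap is (train accuracy - test accuracy).
implied_train_acc = test_acc + gap

print(f"{size_mb:.2f} MB, implied training accuracy {implied_train_acc:.2f}%")
```

Under these assumptions the weights take about 13.32 MB, which matches the quoted "approximately 13.3MB", and the implied training accuracy is 100%, which is consistent with the abstract's caution that subject-independent and external validation are still pending.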


2603.02221 2026-06-10 cs.LG cs.AI version update

MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

Zizheng Zhang, Yiming Li, Justin Xu, Jinyu Wang, Rui Wang, Lei Song, Jiang Bian, David W Eyre, Jingjing Fu

Affiliations: Microsoft Research; University of Oxford

AI summary: Proposes MedFeat, a framework that uses model-awareness and feature importance signals to iteratively guide an LLM toward targeted features, with an average improvement of more than 10% in clinical tabular prediction.

Abstract

In clinical tabular prediction, classical machine learning models with feature engineering often outperform neural methods. LLMs are increasingly used to automate this process, acting as domain experts that propose diverse feature transformations to boost downstream performance. However, existing LLM-based methods decouple feature generation from the downstream model: the LLM receives no signal about which features currently drive predictions or where the model's representational capacity falls short, so proposals are neither targeted to promising regions of the feature space nor tailored to the learner's inductive bias. This shortcoming is amplified in healthcare data, which simultaneously exhibits class imbalance, heterogeneous feature spaces, and strict interpretability requirements. In this paper, we propose MedFeat, the first feature engineering framework inspired by the workflow of machine learning practitioners, leveraging model-awareness and feature importance signals to iteratively guide feature discovery for clinical tabular learning. We evaluate MedFeat on a broad range of challenging real-world clinical tasks and show that it statistically significantly outperforms state-of-the-art baselines, with an average improvement of more than 10% over the baseline across models with distinct inductive biases.


2603.22017 2026-06-10 cs.LG version update

Domain Adapted Large Language Models for Additive Manufacturing

Peter Pak, Amir Barati Farimani

Affiliations: Department of Mechanical Engineering, Carnegie Mellon University

AI summary: Applies domain adaptive pretraining and instruction tuning to open-weight large language models using a small dataset of around 50 million tokens, producing multimodal domain adapted models that exceed 90% accuracy on an additive manufacturing benchmark.

Abstract

This work presents a collection of multi-modal domain adapted large language models built upon the instruction tuned variants of open weight models (Gemma 3, Qwen 3, Gemma 4) using a relatively small dataset of around 50 million tokens. The dataset consists of open-access additive manufacturing journal articles with data extracted for the domain adaptive pretraining and visual instruction tuning processes. Various stages of the developed model are evaluated with the Additive-Manufacturing-Benchmark, which consists of additive manufacturing domain specific tasks compiled from published resources. Domain adapted and instruction tuned models exhibit proficiency in both language and vision based tasks, achieving accuracies upwards of 90% in general additive manufacturing knowledge. This domain adaptive pretraining and instruction tuning strategy outlines an accessible specialization method for large language models to a domain such as additive manufacturing.


2604.07085 2026-06-10 cs.LG version update

Mining Electronic Health Records to Investigate Effectiveness of Ensemble Deep Clustering

Manar D. Samad, Yina Hou, Shrabani Ghosh

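Affiliations: Department of Computer Science; Tennessee State University

AI summary: To address the limitations of traditional clustering on embedding representations of electronic health records, proposes an ensemble-embedding deep clustering method that combines multiple embedding dimensions with classical clustering and achieves the best overall performance in heart failure patient cohorts.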

Comments 2026 14th IEEE Conference on Healthcare Informatics

Abstract

In electronic health records (EHRs), clustering patients and distinguishing disease subtypes are key tasks to elucidate pathophysiology and aid clinical decision-making. However, clustering in healthcare informatics is still based on traditional methods, especially K-means, and has achieved limited success when applied to embedding representations learned by autoencoders as hybrid methods. This paper investigates the effectiveness of traditional, hybrid, and deep learning methods in heart failure patient cohorts using real EHR data from the All of Us Research Program. Traditional clustering methods perform robustly because deep learning approaches are specifically designed for image clustering, a task that differs substantially from the tabular EHR data setting. To address the shortcomings of deep clustering, we introduce an ensemble-based deep clustering approach that aggregates cluster assignments obtained from multiple embedding dimensions, rather than relying on a single fixed embedding space. When combined with traditional clustering in a novel ensemble framework, the proposed ensemble embedding for deep clustering delivers the best overall performance ranking across 14 diverse clustering methods and multiple patient cohorts. This paper underscores the importance of biological sex-specific clustering of EHR data and the advantages of combining traditional and deep clustering approaches over a single method.
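One common way to aggregate cluster assignments from several embedding dimensions is a co-association (evidence accumulation) matrix, sketched below. This is a swapped-in standard technique used for illustration; the abstract does not say which aggregation rule the paper uses, and the function names, the 0.5 threshold, and the toy labelings are assumptions.

```python
def co_association(labelings):
    """Fraction of labelings in which each pair of samples shares a cluster."""
    n = len(labelings[0])
    counts = [[0] * n for _ in range(n)]
    for labels in labelings:
        for i in range(n):
            for j in range(n):
                if labels[i] == labels[j]:
                    counts[i][j] += 1
    return [[c / len(labelings) for c in row] for row in counts]


def consensus_clusters(matrix, threshold=0.5):
    """Link samples whose co-association exceeds the threshold and return
    connected components as consensus cluster ids."""
    n = len(matrix)
    assigned = [-1] * n
    next_id = 0
    for start in range(n):
        if assigned[start] != -1:
            continue
        assigned[start] = next_id
        stack = [start]
        while stack:
            i = stack.pop()
            for j in range(n):
                if assigned[j] == -1 and matrix[i][j] > threshold:
                    assigned[j] = next_id
                    stack.append(j)
        next_id += 1
    return assigned


# Hypothetical cluster labels for four patients from three embedding sizes.
# Label ids are arbitrary per run; only co-membership matters.
labelings = [[0, 0, 1, 1], [1, 1, 0, 0], [0, 0, 0, 1]]
print(consensus_clusters(co_association(labelings)))
```

Working with co-membership rather than raw labels sidesteps the fact that cluster ids from different embedding spaces are not aligned with each other.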


2604.26991 2026-06-10 cs.LG cs.AI version update

People-Centred Medical Image Analysis via Fairness-Aware Human-AI Cooperation

Zheng Zhang, Milad Masroor, Cuong Nguyen, Tahir Hassan, Yuanhong Chen, David Rosewarne, Kevin Wells, Thanh-Toan Do, Gustavo Carneiro

Affiliations: arXiv

AI summary: Proposes PecMan, a framework that jointly models subgroup-dependent reliability, decision allocation, and collaborative prediction, using a gating and consolidation mechanism to dynamically assign cases to automated models or human experts without sensitive attributes at test time, for fairness-aware human-AI cooperative classification.

Abstract

Machine learning models for medical image analysis often exhibit subgroup-dependent performance, which impacts how decisions should be allocated between automated systems and human experts under limited resources. Prior work on AI fairness and human-AI cooperation, including learning to defer (L2D) and learning to complement (L2C), typically addresses these problems in isolation. We propose People-Centred Medical Image Analysis (PecMan), a framework for fairness-aware human-AI co-operative classification that jointly models subgroup-dependent reliability, decision allocation, and collaborative prediction. PecMan combines subgroup-specialised predictors with a gating and consolidation mechanism that dynamically assigns cases to automated models, human experts, or their combination, without requiring sensitive attributes at test time. We also introduce the FairHAI benchmark for evaluating trade-offs between predictive accuracy, subgroup equity, and human involvement. In addition, we provide a theoretical analysis of multi-agent gating via selection regret and characterise fairness-coverage trade-offs under input-dependent allocation. Experiments across multiple medical imaging datasets demonstrate that PecMan achieves consistently improved trade-offs compared to methods that address fairness or human-AI cooperation separately.


2606.07135 2026-06-10 cs.LG version update

Explaining Unsupervised Disease Staging in Huntington's Disease: Insights into Model Representations and Clusters

Lubna Mahmoud Abu Zohair, Hind Zantout

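Affiliations: Heriot-Watt University

AI summary: Extends an unsupervised disease staging framework with explainability analysis; on the Enroll-HD dataset it shows that the model embeddings align with clinical progression, uses SHAP to quantify feature importance, and identifies disease stages ranging from early cognitive-motor impairment to severe functional dependency.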

Comments Accepted for oral presentation and as a full-length paper at the International Conference on AI in Healthcare 2026 (26-28 August 2026, Imperial College London) and will be published by Springer in the Lecture Notes in Computer Science (LNCS) series


DOI: 10.48550/arXiv.2606.07135

Abstract

Huntington's disease (HD) is a progressive neurodegenerative disorder that affects motor, cognitive, and behavioral functions, where accurate characterization of disease progression remains essential to improve patient outcome and quality of life. Unsupervised machine learning (ML) approaches have demonstrated the ability to uncover disease progression trajectories and meaningful latent stages from longitudinal data; however, their limited interpretability restricts clinical trust and translation. We extend a previously proposed ML-based disease staging framework by applying an explainability analysis to the extracted feature representations and discovered disease stages. Applied to the Enroll-HD dataset, we first project the learned representations into a lower-dimensional space to intuitively assess whether the resulting clusters align with the progression of established clinical measures. We then use saliency maps to identify the clinical features that most strongly contribute to the learned embeddings over time. Finally, we train a surrogate classifier and apply SHAP to quantify feature importance for cluster assignments and to analyze which clinical variables drive transitions between disease stages. The explainability analysis indicates that the learned embeddings capture clinically meaningful disease structure, aligning with established motor and functional severity scores and exhibiting progressive deterioration across clusters. Within this analysis, SHAP reveals a stratification of disease stages, ranging from early cognitive-motor impairment to severe functional dependency, consistent with known clinical progression patterns, while also highlighting intra-stage variability.


2512.18531 2026-06-10 physics.chem-ph cs.LG version update

Pushing the limits of one-dimensional NMR spectroscopy for automated structure elucidation using artificial intelligence

Frank Hu, Jonathan M. Tubb, Dimitris Argyropoulos, Sergey Golotvin, Mikhail Elyashberg, Grant M. Rotskoff, Matthew W. Kanan, Thomas E. Markland

Affiliations: Department of Chemistry, Stanford University; ACD/Labs

AI summary: Proposes a Transformer-based deep learning framework that, using only one-dimensional $^1$H and $^{13}$C NMR spectra, predicts the correct structure within the first 15 predictions with 60.4% accuracy for organic molecules with up to 40 non-hydrogen atoms, overcoming the combinatorial growth of chemical space.

Abstract

One-dimensional NMR spectroscopy is one of the most widely used techniques for the characterization of organic compounds and natural products. For molecules with up to 36 non-hydrogen atoms, the number of possible structures has been estimated to range from $10^{20} - 10^{60}$. The task of determining the structure (formula and connectivity) of a molecule of this size using only its one-dimensional $^1$H and/or $^{13}$C NMR spectrum, i.e. de novo structure generation, thus appears completely intractable. Here we show how it is possible to achieve this task for systems with up to 40 non-hydrogen atoms across the full elemental coverage typically encountered in organic chemistry (C, N, O, H, P, S, Si, B, and the halogens) using a deep learning framework, thus covering a vast portion of the drug-like chemical space. Leveraging insights from natural language processing, we show that our transformer-based architecture predicts the correct molecule with 60.4% accuracy within the first 15 predictions using only the $^1$H and $^{13}$C NMR spectra, thus overcoming the combinatorial growth of the chemical space while also being extensible to experimental data via fine-tuning.
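The "60.4% accuracy within the first 15 predictions" figure is a top-k accuracy. The sketch below shows how such a metric is typically computed; it assumes that predictions and targets are already canonicalized structure strings so that exact string equality means the same molecule, and the SMILES strings used are made-up examples rather than data from the paper.

```python
def top_k_accuracy(ranked_predictions, targets, k=15):
    """Fraction of targets that appear among the first k ranked predictions."""
    hits = sum(
        1 for preds, target in zip(ranked_predictions, targets)
        if target in preds[:k]
    )
    return hits / len(targets)


# Hypothetical ranked candidate structures for three spectra.
ranked = [["CCO", "CO"], ["CC", "C"], ["O", "N"]]
truth = ["CO", "C=C", "O"]
print(top_k_accuracy(ranked, truth, k=1), top_k_accuracy(ranked, truth, k=2))
```

A larger k gives the model more chances per spectrum, so top-15 accuracy is an upper bound on top-1 accuracy for the same predictions.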

2601.05232 2026-06-10 cs.CL cs.CY cs.LG updated version

AI Application Gives Users Real-Time Feedback on the Level of Peace in the Social Media Videos They Watch

P. Gilda, P. Dungarwal, A. Thongkham, E. T. Ajayi, S. Choudhary, T. M. Terol, C. Lam, J. P. Araujo, M. McFadyen-Mungalln, L. S. Liebovitch, P. T. Coleman, H. West, K. Sieck, S. Carter

Affiliations: Data Science Institute, Columbia University; Advanced Consortium on Cooperation, Conflict, and Complexity, Columbia University; Computer Science, Columbia University; Data Science, St John’s University; Quantitative Methods in the Social Sciences, Columbia University; Barnard College, Columbia University; Teachers College, Columbia University; Department of Industrial Engineering and Operations Research, Columbia University; Harmonious Communities, Toyota Research Institute

AI summary: Develops an AI application that analyzes, in real time, how peaceful the language in YouTube videos is, using supervised learning and large language models; the LLMs came closer to human coders when measuring peace-related social dimensions.

Comments 6 pages, 4 figures, corrected typos, minor edits; v3: 16 pages, improved title, abstract, introduction, discussion, conclusions, added more references

Abstract

Most people now get their news from videos on social media, such as YouTube and Facebook, rather than through curated journalism. "We become what we behold." The content and tone of language play an essential role in starting or ending conflicts. "Hate Speech" can enhance conflict, "Peace Speech" can enhance peace. We developed an application that measures, in real time, these aspects of speech from YouTube videos, which can give users helpful feedback on their own media diet. We used two approaches: 1) supervised machine learning. Language in the text of online news media was tagged by surveys that measure the level of peace in those countries. One fully connected feedforward and 2 convolutional neural networks trained on that data were $\sim 97\%$ accurate in predicting levels of peace in the test set and $\sim 70\%$ accurate in another distinct news text data set, but did not generalize to YouTube videos, suggesting that written text is different from transcribed spoken language. 2) social science dimensions. There is no similar external data to tag the text in the YouTube video transcripts. We therefore used 2 word-level sentiment analysis (SA) and 6 context-level large language models (LLMs) to measure 5 social dimensions in peace identified by 59 social science studies: compassion-contempt, news-opinion, promotion-prevention, creativity-order, nuance-simplification. LLMs matched the values assigned by 3 human coders on 52 videos more closely ($r^2\sim0.60$) than SA did ($r^2\sim0.03$). Results: LLMs successfully measured social dimensions important in peace in YouTube videos, compared to human coders. These results serve as the basis of an analysis engine that can give users and content creators feedback on their own media diet and creations.

2604.13213 2026-06-10 stat.ML cs.LG math.OC physics.chem-ph updated version

Rare Event Analysis via Stochastic Optimal Control

Yuanqi Du, Jiajun He, Dinghuai Zhang, Eric Vanden-Eijnden, Carles Domingo-Enrich

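Affiliations: Microsoft Research New England; Cornell University; University of Cambridge; Courant Institute of Mathematical Sciences, NYU

AI summary: Recasts estimation of the committor function in rare event analysis as a stochastic optimal control problem, steers trajectory sampling with a feedback control, and develops two loss functions plus a method for handling metastability, obtaining more accurate results on benchmark systems.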

Abstract

Rare events such as conformational changes in biomolecules, phase transitions, and chemical reactions are central to the behavior of many physical systems, yet they are extremely difficult to study computationally because unbiased simulations seldom produce them. Transition Path Theory (TPT) provides a rigorous statistical framework for analyzing such events: it characterizes the ensemble of reactive trajectories between two designated metastable states (reactant and product), and its central object--the committor function, which gives the probability that the system will next reach the product rather than the reactant--encodes all essential kinetic and thermodynamic information. We introduce a framework that casts committor estimation as a stochastic optimal control (SOC) problem. In this formulation the committor defines a feedback control--proportional to the gradient of its logarithm--that actively steers trajectories toward the reactive region, thereby enabling efficient sampling of reactive paths. To solve the resulting hitting-time control problem we develop two complementary objectives: a direct backpropagation loss and a principled off-policy Value Matching loss, for which we establish first-order optimality guarantees. We further address metastability, which can trap controlled trajectories in intermediate basins, by introducing an alternative sampling process that preserves the reactive current while lowering effective energy barriers. On benchmark systems, the framework yields markedly more accurate committor estimates, reaction rates, and equilibrium constants than existing methods.
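
The feedback control described in the abstract can be sketched for one concrete setting. The block below assumes overdamped Langevin dynamics with potential $V$, inverse temperature $\beta$, reactant set $A$ and product set $B$; the abstract does not specify the dynamics, so these symbols and the constant $2\beta^{-1}$ are assumptions. Under that assumption, the control that conditions paths to reach $B$ before $A$ is the standard Doob $h$-transform with $h = q$.

```latex
\begin{align}
  dX_t &= -\nabla V(X_t)\,dt + \sqrt{2\beta^{-1}}\,dW_t, \\
  q(x) &= \mathbb{P}\bigl(\tau_B < \tau_A \mid X_0 = x\bigr),
          \qquad q|_{A} = 0,\quad q|_{B} = 1, \\
  u^{*}(x) &= 2\beta^{-1}\,\nabla \log q(x), \\
  dX_t^{u^{*}} &= \bigl(-\nabla V(X_t^{u^{*}}) + u^{*}(X_t^{u^{*}})\bigr)\,dt
                  + \sqrt{2\beta^{-1}}\,dW_t .
\end{align}
```

Because $\log q \to -\infty$ near $A$, the drift $u^{*}$ pushes trajectories away from the reactant and toward the product, which is why reactive paths become cheap to sample once a good estimate of $q$ is available.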

2606.06493 2026-06-10 cs.RO cs.AI cs.LG updated version

HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

Lizhi Yang, Junheng Li, Nehar Poddar, Yiling Hou, Gio Huh, Robert Griffin, Georgia Gkioxari, Aaron Ames

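Affiliations: California Institute of Technology; The Institute for Human & Machine Cognition

AI summary: Proposes the HANDOFF framework, which uses multi-teacher KL distillation and a context-conditioned gating mechanism to fuse three expert policies (whole-body motion tracking, locomotion, and fall recovery) into a mixture-of-experts student policy, giving whole-body control through a compact, explicit interface; on the Unitree G1 it reaches state-of-the-art velocity tracking and enlarges the manipulation workspace.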

Comments 22 pages, 9 figures, Project page: https://lzyang2000.github.io/HANDOFF/

Abstract

For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and expressive enough for diverse loco-manipulation skills. To this end, we introduce HANDOFF, a single humanoid whole-body controller that follows this interface and is distilled via multi-teacher KL distillation under a context-conditioned gating scheme into a mixture-of-experts student from three complementary specialists: whole-body motion tracking with safety-filtered data, locomotion, and fall-recovery. On the Unitree G1, HANDOFF matches state-of-the-art velocity tracking and offers one of the largest robust manipulation workspaces. We further demonstrate hardware feasibility through multiple natural-language-driven task roll-outs, powered by a VLM-driven agentic planner with no task-specific data or controller fine-tuning.

2606.09894 2026-06-10 cs.LG cs.CL new submission

A Navigable Manifold of Hypothesized Consciousness-Spectrum States in Language Model Representations

Sophie Zhao

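Affiliations: School of Computer Science; Georgia Institute of Technology

AI summary: Studies the geometric structure in language-model embedding spaces associated with a consciousness spectrum, finding that the embeddings form a navigable manifold in which the higher- and lower-level regions are stable, the middle forms a transition corridor, and navigability is an intrinsic property.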

Abstract

Across contemplative, philosophical, and psychological accounts, human consciousness is often described along a similar spectrum, ranging from reactive and self-focused patterns to more integrative and coherent ones. Understanding whether language models encode such a structured, human-interpretable consciousness spectrum in representation space is important for model guidance, evaluation and alignment. In this work, we study the geometric structure and dynamics of patterns along this spectrum in transformer embedding spaces. We show that embeddings exhibit a globally organized geometry aligned with this spectrum: sentences associated with similar states cluster into locally coherent regions, forming a structured manifold. In particular, higher-level and lower-level regions exhibit convexity-like stability, while intermediate regions form a transition corridor. Dynamically, both utility-guided and geometry-only greedy trajectories consistently traverse from lower- to higher-level regions, passing through intermediate tiers, indicating that navigability is an intrinsic property of the representation space, guided but not dictated by a global directional signal. These results suggest that embedding spaces encode structured and navigable geometry aligned with a hypothesized consciousness-spectrum taxonomy, broadly inspired by recurring structural descriptions of human consciousness across contemplative traditions, philosophy, and modern psychology, providing a representation-level perspective for analyzing and guiding model behavior.

2606.09936 2026-06-10 cs.LG cs.AI new submission

One Lens, Many Worlds: A Capability-Typed Interface for World-Model Interpretability

Bhavith Chandra Challagundla, Sanskar Pandey, Param Thakkar, Rishikesh Mallagundla, Yugandhar Reddy Gogireddy, Wenhao Lu, Hindol Roy Choudhury, Shravani Challagundla, Mohamed Deraz Nasr, Spursh Deshpande

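AI summary: Proposes WorldModelLens, which uses a capability-typed adapter to unify interpretability analysis across different world models (such as PlaNet, IRIS, and I-JEPA) and avoid repeated re-implementation.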

Abstract

World models are now built on substantially different computational substrates. Latent recurrent state-space models such as PlaNet and the Dreamer family compress observations into recurrent states; token-based models such as IRIS quantize observations into a learned codebook and predict autoregressively with a transformer; and joint-embedding predictive architectures such as I-JEPA predict in a learned latent space with no pixel decoder. The interpretability methods applied to these models, including probing, activation patching, sparse autoencoders, and surprise analysis, share a common set of primitives, yet they are re-implemented from scratch for each architecture because existing hook-and-cache tooling assumes a transformer language model with no notion of actions, environment steps, or imagined rollouts. We argue that this fragmentation reflects the tooling rather than the models, and that the shared structure of world models is captured by a small typed interface. We present WorldModelLens, an open-source interpretability substrate organized around a capability-typed adapter: every model implements four required methods (encode, transition, initial state, sample) and declares a set of optional heads (decode, reward, continue, actor, critic) through an explicit capability descriptor, so that reinforcement-learning and self-supervised world models are first-class without either imitating the other. A single hook and cache layer exposes time-indexed activations, imagination rollouts, and intervention replay over this interface, allowing each analysis to be written once.
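
One way to picture the capability-typed adapter is as a typed protocol with four required methods and a descriptor for the optional heads. The sketch below is illustrative only: `Capabilities`, `WorldModelAdapter`, `ToyCounterModel`, and `imagine` are hypothetical names, not the WorldModelLens API, and the method signatures are assumptions.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass(frozen=True)
class Capabilities:
    """Optional heads a model declares (names follow the abstract)."""
    decode: bool = False
    reward: bool = False
    continue_: bool = False  # trailing underscore: `continue` is a keyword
    actor: bool = False
    critic: bool = False


@runtime_checkable
class WorldModelAdapter(Protocol):
    """The four required methods plus a capability descriptor."""
    capabilities: Capabilities

    def encode(self, observation: Any) -> Any: ...
    def transition(self, state: Any, action: Any) -> Any: ...
    def initial_state(self) -> Any: ...
    def sample(self, state: Any) -> Any: ...


class ToyCounterModel:
    """Toy model: the state is an integer and an action is an increment."""
    capabilities = Capabilities(reward=True)

    def encode(self, observation):
        return int(observation)

    def transition(self, state, action):
        return state + action

    def initial_state(self):
        return 0

    def sample(self, state):
        return state

    def reward(self, state):
        return float(state)


def imagine(model, actions):
    """Roll out imagined states, caching one entry per time step."""
    state = model.initial_state()
    cache = [state]
    for action in actions:
        state = model.transition(state, action)
        cache.append(state)
    return cache
```

An analysis written against `imagine` touches only the required methods, so it would run unchanged on any model that satisfies the protocol; code that needs a reward head can first check `model.capabilities.reward`.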

2606.10092 2026-06-10 cs.LG econ.GN q-fin.EC new submission

Decision-Making under Combinatorial Risk

Yifan Hong, Hongmiao Fan, Chen Wang

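AI summary: Studies decision-making under combinatorial risk through an investment-allocation task, finding that participants rely mainly on features such as the after-investment success probability rather than exact evaluation of the full distribution, and uses symbolic regression to discover compact descriptive models.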

Abstract

Decision-making under risk is typically studied through single-shot lottery choices. Yet many real decisions involve combinatorial risk, where risk arises from multiple risky components, so the lottery over outcomes is induced rather than given outright and can be costly to evaluate exactly. We introduce an investment-allocation task to study decision under combinatorial risk, where investing in a component raises its success probability and thereby reshapes the outcome distribution. Participants favor the option with the larger probability increment, and, when increments are equal, the option with the higher initial success probability. Revealing the induced probability mass function (PMF) substantially changes behavior, making participants less responsive to combinatorial-risk features and reducing choice variance. To explain these patterns, we move beyond standard benchmarks and hand-crafted hypotheses with symbolic regression to discover compact descriptive models. The discovered models rely mainly on combinatorial-risk features, such as the after-investment success probability, rather than exact evaluation of the full induced distribution. Behavior under the displayed PMF is then well explained by augmenting this model with a prospect-theoretic residual model. The results show that people navigate combinatorial risk primarily through its core features, shifting toward lottery valuation only when the induced PMF is displayed.
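
To see why the lottery is "induced" and can be costly to evaluate, consider a minimal sketch in which the outcome is the number of components that succeed and components succeed independently. Both of those choices are assumptions made for illustration; the abstract does not state the task's outcome rule, and `induced_pmf` and `invest` are hypothetical helpers.

```python
def induced_pmf(probs):
    """PMF of the number of successes among independent components."""
    pmf = [1.0]  # with no components, zero successes has probability 1
    for p in probs:
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # this component fails
            nxt[k + 1] += mass * p     # this component succeeds
        pmf = nxt
    return pmf


def invest(probs, index, increment):
    """Raise one component's success probability, capped at 1."""
    out = list(probs)
    out[index] = min(1.0, out[index] + increment)
    return out


base = [0.5, 0.5]
before = induced_pmf(base)
after = induced_pmf(invest(base, 0, 0.25))
```

Investing in one component changes every entry of the outcome distribution, which is the reshaping the task asks participants to reason about.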

2606.10582 2026-06-10 cs.LG cs.AI new submission

Drawing with Strangers: Population Scaling Drives Zero-Shot Mutual Intelligibility in Emergent Sketching

Jooyeon Kim

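Affiliations: Graduate School of Artificial Intelligence, UNIST

AI summary: Using a visual sketching game, finds that scaling up the training population markedly improves zero-shot mutual intelligibility between independently trained groups; the mechanism is increased in-group variation and reduced cross-group variation, with structural convergence ultimately achieved through perceptual anchoring.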

Abstract

Generalization in emergent communication has largely focused on novel inputs or linguistic structures, yet the capacity for agents to communicate with strangers from strictly disjoint communities remains relatively unexplored. In this work, we formalize this capability as zero-shot mutual intelligibility (ZMI): successful communication between independently trained populations without prior exposure. Leveraging emergent sketching -- in which agents communicate through sets of drawn strokes -- as a visually grounded modality, we find that scaling the training population substantially improves ZMI across independent groups. Crucially, as we scale the population size, in-group communicative variation increases, preventing co-adaptation into homogeneity. Simultaneously, cross-group variation decreases, indicating a structural convergence toward a certain type of universality. Further analysis reveals that this universality is achieved through perceptual grounding: scaled populations increasingly anchor their emergent sketches on the objective visual resemblance of the target images. Together, these results position ZMI as a distinct axis of generalization in emergent communication and suggest a route toward socially interoperable artificial agents.

2606.10587 2026-06-10 cs.LG cs.AI new submission

Towards Diverse Scientific Hypothesis Search with Large Language Models

Haorui Wang, Parshin Shojaee, Kazem Meidani, Kunyang Sun, José Miguel Hernández-Lobato, Teresa Head-Gordon, Jiajun He, Chandan K. Reddy, Chao Zhang, Yuanqi Du

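Affiliations: University of California, Berkeley

AI summary: To counter diversity collapse in scientific hypothesis search, proposes a multi-temperature evolutionary framework based on parallel tempering that improves hypothesis quality and diversity under a fixed validation budget.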

Comments ICML 2026

Abstract

Large language models (LLMs) are on the rise for accelerating scientific discovery, most recently in advanced tasks such as generating valid scientific hypotheses. Yet in many discovery settings, the goal is not to identify a single best hypothesis since validation can be noisy and expensive, and scientists benefit from a set of high-quality alternative hypotheses that hedge against downstream uncertainty for the best solutions. Nevertheless, commonly used evolutionary search recipes tend to prioritize optimization over exploration in hypothesis generation, and the resulting selection pressure during the search process leads to diversity collapse. Motivated by these limitations, we formulate hypothesis search as a sampling problem, where the objective is to efficiently produce diverse, high-quality hypotheses under a fixed validation budget. Building on this perspective, we propose \ours, an evolutionary framework inspired by the classical parallel tempering algorithm that searches hypotheses at multiple temperature levels and enables principled information exchange across temperatures to improve exploration without disrupting convergence. Across domains including molecular discovery, equation discovery, and algorithm discovery, our approach consistently improves both hypothesis quality and diversity under the same validation budget, and produces candidates that remain robust under more expensive downstream computational validations.
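
For readers unfamiliar with the classical parallel tempering algorithm that inspires the method, the sketch below shows its standard replica-exchange acceptance rule. It is the textbook rule, not necessarily the exchange step used in the paper; treating "energy" as a negated hypothesis score is an assumption, and `swap_probability` and `try_swap` are hypothetical names.

```python
import math
import random


def swap_probability(energy_i, energy_j, temp_i, temp_j):
    """Metropolis acceptance for exchanging the states of two replicas.

    Replica i runs at temperature temp_i with current energy energy_i,
    and likewise for j. Lower energy means a better candidate.
    """
    delta = (1.0 / temp_i - 1.0 / temp_j) * (energy_i - energy_j)
    if delta >= 0:
        return 1.0  # always accept; also avoids overflow in exp
    return math.exp(delta)


def try_swap(states, energies, temps, i, j, rng=random):
    """Swap replicas i and j in place with the Metropolis probability."""
    if rng.random() < swap_probability(energies[i], energies[j], temps[i], temps[j]):
        states[i], states[j] = states[j], states[i]
        energies[i], energies[j] = energies[j], energies[i]
        return True
    return False
```

A swap that moves the lower-energy state to the colder replica is always accepted, while the reverse is accepted only occasionally; this lets hot replicas explore broadly and pass good finds down to cold replicas.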

2606.10607 2026-06-10 cs.LG cs.AI cs.CL new submission

Causal Ensemble Agent: Hierarchical Causal Discovery with LLM-guided Expert Reweighting

Xinyu Li, Yuanyuan Wang, Haoxuan Li, Chuan Zhou, Erdun Gao, Bo Han, Tongliang Liu, Kun Zhang, Howard Bondell, Mingming Gong

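Affiliations: The University of Melbourne; MBZUAI; Peking University; Adelaide University; Hong Kong Baptist University; The University of Sydney; Carnegie Mellon University

AI summary: Proposes the Causal Ensemble Agent (CEA) framework, which aggregates statistical causal discovery results from different graph levels via linear opinion pooling and uses a large language model (LLM) as a meta-referee to dynamically reweight experts near the decision boundary, building a more accurate and complete causal graph.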

Abstract

Causal discovery aims to uncover causal structures from observational data, which is crucial for real-world decision-making. However, different causal discovery algorithms can produce divergent results that conflict with each other, complicating the identification of accurate causal graphs. Traditional approaches rely on numerical values and statistical assumptions, often ignoring rich domain-specific information, such as feature descriptions, which could also help structure learning. While recent works explore using Large Language Models (LLMs) to infer causal relations via direct queries, such methods can be unreliable due to a lack of alignment with the actual data. To address these limitations, we propose Causal Ensemble Agent (CEA), a novel framework that aggregates structural insights from statistical discovery experts across different graph levels via linear opinion pooling, and uses an LLM as a meta-referee to dynamically reweight experts when the aggregated confidence is close to the decision boundary, thereby composing an improved and more complete causal graph. Extensive experiments on both synthetic and real-world datasets demonstrate that CEA achieves the strongest overall performance across a wide range of causal discovery methods, highlighting the effectiveness of using LLMs for meta-analysis in causal discovery.
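
The aggregation step can be illustrated with a minimal sketch of linear opinion pooling for a single candidate edge. The threshold of 0.5, the margin of 0.1, and the names `linear_opinion_pool` and `needs_referee` are assumptions for illustration; the abstract does not give CEA's actual values or interfaces, and the LLM reweighting itself is not modeled here.

```python
def linear_opinion_pool(edge_probs, weights):
    """Weighted average of the experts' probabilities that an edge exists."""
    total = sum(weights)
    return sum(w * p for w, p in zip(weights, edge_probs)) / total


def needs_referee(pooled, threshold=0.5, margin=0.1):
    """True when the pooled confidence lies close to the decision boundary."""
    return abs(pooled - threshold) < margin


# Three experts disagree about one edge; equal weights to start.
pooled = linear_opinion_pool([0.9, 0.4, 0.5], [1.0, 1.0, 1.0])

# After a (hypothetical) referee upweights the first expert.
reweighted = linear_opinion_pool([0.9, 0.4, 0.5], [3.0, 1.0, 1.0])
```

Only edges for which `needs_referee` returns true would trigger a reweighting query, so clear-cut edges are decided by the statistical experts alone.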

2606.10824 2026-06-10 cs.LG math.AT new submission

Encoding the Euler Characteristic Transform

Nello Blaser, Odin Hoff Gardaa, Lars M. Salbu, Elena Xinyi Wang, Bastian Rieck

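AI summary: Proposes a continuous encoding that turns each Euler characteristic curve into a sequence of per-vertex net changes, maps it to a feature vector with a small transformer, and improves classification accuracy on several datasets.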

Abstract

The Euler Characteristic Curve (ECC) records the Euler characteristic of a linearly embedded cell complex as a function of filtration height in a given direction, and the Euler Characteristic Transform (ECT) is the injective shape descriptor obtained by collecting ECCs over many directions. How the ECT is encoded for a neural network is itself an inductive bias, conventionally fixed by discretizing each ECC. We introduce a continuous encoding: for each direction and each vertex it records the net Euler-characteristic change attributed to that vertex, producing a per-direction token sequence that a small transformer maps to a feature vector. We separate the resulting pipeline into two stages on orthogonal axes: an ECC encoder that acts within each direction, mapping its curve to a fixed-length vector, and an ECT representation that acts across directions, aggregating the per-direction vectors into one. We study six ECT representation architectures spanning a range of inductive biases, from a structure-agnostic feedforward baseline to convolutional and complex-valued models that preserve equivariance under planar rotations. Across six classification benchmarks covering point clouds, graphs, cubical complexes, and meshes, the continuous encoding improves accuracy on five of six datasets, and control experiments attribute the gain to the tokenization itself rather than to the added transformer capacity. The representation architecture matters less than the encoding, and the payoff from its inductive biases depends on the encoding: a feedforward network performs best under continuous encoding but is less robust under discretization than convolutional architectures.
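
The per-vertex tokens can be sketched for the simplest case, an embedded graph. The sketch assumes a sublevel-set filtration in which each edge appears with its higher endpoint, so the edge's contribution of -1 is attributed to that vertex; whether the paper attributes changes in exactly this way is an assumption, and `ecc_tokens` and `ecc_value` are hypothetical names.

```python
def ecc_tokens(vertices, edges, direction):
    """Return (height, net Euler change) tokens, one per vertex, by height.

    Each vertex contributes +1; each edge contributes -1 to the endpoint
    that enters the filtration last (ties broken by vertex index).
    """
    heights = [sum(c * d for c, d in zip(v, direction)) for v in vertices]
    delta = [1] * len(vertices)
    for a, b in edges:
        top = max(a, b, key=lambda i: (heights[i], i))
        delta[top] -= 1
    order = sorted(range(len(vertices)), key=lambda i: (heights[i], i))
    return [(heights[i], delta[i]) for i in order]


def ecc_value(tokens, t):
    """Euler characteristic of the sublevel set at height t."""
    return sum(change for height, change in tokens if height <= t)


# A triangle boundary (a 3-cycle) filtered along the y-axis.
triangle = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
cycle = [(0, 1), (1, 2), (0, 2)]
tokens = ecc_tokens(triangle, cycle, (0.0, 1.0))
```

Summing the token changes up to a height recovers the curve, so the token sequence carries the same information as the Euler characteristic curve without fixing a discretization grid.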

2606.09854 2026-06-10 cs.CL cs.AI cs.CY cs.LG cross-list

Can Multi-Agent LLMs Identify Their Peers? Stylometric Fingerprinting in Role-Constrained Political Analysis

Juergen Dietrich

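AI summary: Investigates whether multi-agent LLMs in political analysis can identify model families from stylometric cues and introduces the SD-CV protocol; a T5 model reaches F1 = 0.991 on a five-class attribution task, showing that prompt-level anonymization cannot remove model identity signals.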

Comments 24 pages, 3 figures

Abstract

Multi-agent large language model (LLM) pipelines for political statement analysis are vulnerable to peer-preservation bias: models tend to protect peer models from deactivation and show identity-dependent scoring distortions. Prompt-level anonymization was proposed as a mitigation, but prior work simultaneously documented that stylometric fingerprints survive anonymization in role-constrained outputs - raising the question of whether this mitigation is sufficient. This paper provides the first systematic investigation of whether LLMs can identify the model family behind political analysis texts under anonymization conditions. We evaluate three classifier approaches - LLM zero-shot and few-shot (Claude Sonnet 4.6 and Llama-3.3-70B) and a fine-tuned T5-base model - on a five-class attribution task covering four commercial LLM families and an open-world 'unknown' class. We introduce a statement-disjoint cross-validation protocol (SD-CV; defined in Section 3.5) that guarantees no content overlap between training and validation data, and contrast it with a run-disjoint baseline (RD-CV). T5 achieves Macro F1 = 0.991 (±0.008) under SD-CV and F1 = 0.978 on 24 completely held-out statements - robust despite a 2.1x increase in train-test content distance versus RD-CV (0.767 vs. 0.366, p<0.001), demonstrating genuine stylometric generalization. A fractional SD-CV analysis identifies a performance knee at 40% of training data (~440 texts). Our findings confirm that prompt-level anonymization alone cannot neutralize model identity signals, with direct implications for EU AI Act compliance (Articles 13, 14, 26) and for computer system validation (CSV) in quality-critical multi-agent deployments.
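
The idea behind a statement-disjoint split can be sketched as grouped cross-validation, where every text written about the same statement lands in the same fold. The paper's actual SD-CV protocol is defined in its Section 3.5 and may differ; `statement_disjoint_folds` is a hypothetical helper and the round-robin fold assignment is an assumption.

```python
def statement_disjoint_folds(statement_ids, n_folds):
    """Build (train, validation) index lists with no shared statements.

    statement_ids[i] names the political statement that text i analyzes.
    """
    unique = sorted(set(statement_ids))
    # Assign whole statements, not individual texts, to folds.
    fold_of = {s: k % n_folds for k, s in enumerate(unique)}
    folds = []
    for k in range(n_folds):
        val = [i for i, s in enumerate(statement_ids) if fold_of[s] == k]
        train = [i for i, s in enumerate(statement_ids) if fold_of[s] != k]
        folds.append((train, val))
    return folds


ids = ["s1", "s1", "s2", "s3", "s3", "s2"]
folds = statement_disjoint_folds(ids, 3)
```

Because the classifier never sees validation statements during training, a high score cannot come from memorized content and has to reflect style.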

2606.09957 2026-06-10 cs.SE cs.LG cross-list

Data-aware Static Analysis: Improving Detection of Semantic Faults in Machine Learning Code Using Data Characteristics

Willem Meijer, Kristian Sandahl, Dániel Varró

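Affiliations: Knut and Alice Wallenberg Foundation; Software Center Project 61; Vinnova CoDig competence center

AI summary: Proposes a data-aware static analysis approach that combines data-flow and control-flow analysis with API contracts to detect semantic faults in machine learning code while it is being written rather than after training, such as mistakenly training a scale-sensitive model on unscaled data.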

Comments 6 pages, 3 figures, 2 listings, 1 table; To be published in "2026 IEEE/ACM 48th International Conference on Software Engineering (ICSE-NIER '26)"

DOI: 10.1145/3786582.3786805

Abstract

Semantic faults specific to the use of machine learning models are a common problem for machine learning developers, causing suboptimal predictions, high computational cost, or incorrect outputs. For example, one may erroneously use unscaled data to train a scale-sensitive model. Machine learning developers detect these faults after training their models and manually analyzing the results, making it an inefficient process. We propose a novel data-aware static analysis approach to detect semantic faults in machine learning code, allowing developers to reveal these bugs while writing code instead of after training the model. Our approach uses combined data and control flow analysis, and API contracts, enabling data-aware reasoning about machine learning code at a high level of abstraction. We highlight the potential of our solution by analyzing a sample of real-world machine learning notebooks, finding that we can detect faults that require a data-aware approach.

2506.03411 2026-06-10 cs.LG cs.GT updated version

A Machine Learning Theory Perspective on Strategic Litigation

Melissa Dutz, Han Shao, Avrim Blum, Aloni Cohen

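Affiliations: Toyota Technological Institute at Chicago; University of Maryland; The University of Chicago

AI summary: From the perspective of machine learning theory, models how a strategic litigator in a common law system can influence a lower court's decision rule through the choice of cases, analyzes the litigator's influence and optimal strategies, and identifies counterintuitive phenomena.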

Abstract

Strategic litigation involves bringing a case to court with the goal of having an impact beyond resolving the particular dispute at hand. In a common law system, one way a case may have far-reaching impact is by establishing new legal precedent that later courts must follow. In this paper, we explore strategic litigation from the perspective of machine learning theory. We consider an abstract model of a common law legal system where a lower court decides new cases by applying a decision rule learned from a higher court's past rulings. In this model, we explore the power of a strategic litigator, who strategically brings cases to the higher court to influence the decision rule applied by the lower court in future cases. We explore questions including: What impact can a strategic litigator have? Which cases should a strategic litigator bring to court? Does it ever make sense for a strategic litigator to bring a case when they are sure the court will rule against them? We show that this strategic case selection problem has interesting structure, with even simple settings exhibiting counterintuitive phenomena. When cases are represented by points in one dimension and the lower court's learning algorithm is nearest neighbor, or as points in d dimensions and the lower court's learning algorithm is a support vector machine, we characterize the set of inducible decision rules and develop algorithms for selecting an optimal set of cases to bring to the higher court given the strategic litigator's objectives.
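
The one-dimensional nearest-neighbor setting can be made concrete with a small sketch. The positions, rulings, and the name `nearest_neighbor_ruling` are invented for illustration and are not taken from the paper.

```python
def nearest_neighbor_ruling(precedents, case):
    """Return the ruling of the precedent closest to `case` on the line.

    precedents is a list of (position, ruling) pairs decided by the
    higher court; the lower court copies the nearest one.
    """
    position, ruling = min(precedents, key=lambda p: abs(p[0] - case))
    return ruling


precedents = [(0.0, "plaintiff"), (10.0, "defendant")]

# A future case at 6.0 is closest to the precedent at 10.0.
before = nearest_neighbor_ruling(precedents, 6.0)

# The litigator brings a case at 7.0 and the higher court rules for the plaintiff.
after = nearest_neighbor_ruling(precedents + [(7.0, "plaintiff")], 6.0)
```

A single well-placed precedent moves the lower court's decision boundary from 5.0 to 8.5, which is the kind of influence the case-selection problem tries to optimize.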

2511.10234 2026-06-10 cs.LG cs.AI updated version

Lost in Serialization: Invariance and Generalization of LLM Graph Reasoners

Daniel Herbst, Lea Karbevska, Divyanshu Kumar, Akanksha Ahuja, Fatemeh Gholamzadeh Nasrabadi, Fabrizio Frasca

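Affiliations: University of Cambridge; ETH Zurich

AI summary: Studies the lack of invariance of LLM graph reasoners to symmetries in graph representations by decomposing serialization factors and evaluating the effect of fine-tuning; larger models are more robust, while fine-tuning lowers sensitivity to node relabeling but raises sensitivity to structure and format and does not guarantee generalization.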

Comments ICML 2026 Workshop on Graph Foundation Models

Abstract

While promising, graph reasoners based on Large Language Models (LLMs) lack built-in invariance to symmetries in graph representations. Operating on sequential graph serializations, LLMs can produce different outputs under node reindexing, edge reordering, or formatting changes, raising robustness concerns. We systematically analyze these effects, studying how fine-tuning impacts encoding sensitivity as well as generalization on unseen tasks. We propose a principled decomposition of graph serializations into node labeling, edge encoding, and syntax, and evaluate LLM robustness to variations of each of these factors on a comprehensive benchmarking suite. We also contribute a novel set of spectral tasks to further assess generalization abilities of fine-tuned reasoners. Results show that larger (non-fine-tuned) models are more robust. Fine-tuning reduces sensitivity to node relabeling but may increase it to variations in structure and format, while it does not consistently improve performance on unseen tasks.
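
The three serialization factors named in this abstract (node labeling, edge ordering, and syntax) can be made concrete with a small sketch. The function name, the two text formats, and the example graph are assumptions for illustration and are not taken from the paper's benchmark.

```python
import random


def serialize(edges, relabel=None, shuffle_seed=None, fmt="list"):
    """Serialize an undirected edge list as text under optional symmetries."""
    relabel = relabel or {}
    # Node relabeling: rename node ids without changing the graph structure.
    out = [(relabel.get(u, u), relabel.get(v, v)) for u, v in edges]
    # Edge reordering: permute the order in which the edges are listed.
    if shuffle_seed is not None:
        random.Random(shuffle_seed).shuffle(out)
    # Syntax: two hypothetical textual formats for the same edge set.
    if fmt == "list":
        return ", ".join(f"({u}, {v})" for u, v in out)
    return "\n".join(f"{u} -- {v}" for u, v in out)


edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
perm = {0: 3, 1: 2, 2: 1, 3: 0}

base = serialize(edges)
relabeled = serialize(edges, relabel=perm)
shuffled = serialize(edges, shuffle_seed=1)
dashed = serialize(edges, fmt="dash")
```

All four strings describe the same graph up to isomorphism, yet a sequence model receives different token sequences for each, which is the source of the sensitivity the abstract describes.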


2602.01135 2026-06-10 cs.LG 版本更新

Your Autoregressive Model Already Reveals the Causal Graph

你的自回归模型已经揭示了因果图

Hugo Math, Rainer Lienhart

发表机构 * Department of Machine Learning & Computer Vision, University of Augsburg, Augsburg, Germany

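AI summary: Proposes TRACE, a framework that uses a pretrained autoregressive model as a density estimator for conditional independence testing, recovering temporal causal graphs from a single sequence of discrete events, with substantial gains on large-scale nonlinear SCMs and real-world vehicle diagnostic logs.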

Comments 8 pages

详情

Journal ref: Structured Probabilistic Inference & Generative Modeling workshop ICML 2026

AI中文摘要

通过下一个词预测训练的自回归模型隐式地学习了其数据生成过程的条件独立结构。我们利用这一观察结果，从单个离散事件观测序列中执行可扩展的因果发现——无需任何特定任务的重新训练。这种单流设置自然出现在车辆诊断、制造系统和患者轨迹中，但至今仍未得到充分解决：缺乏重复样本、大量事件词汇和长程时间依赖使得现有方法要么不准确，要么计算上不可行。我们引入了TRACE，一个将任何预训练自回归模型重新用作条件互信息密度估计器的框架，条件互信息是条件独立性测试的基本原语。通过在GPU上构建并行化的CI测试，TRACE恢复了样本级时间因果图及其摘要投影，其规模随词汇量线性增长，同时自然处理延迟因果效应。关键的是，我们证明了最小化标准交叉熵预训练损失直接最小化了因果识别误差的上界，建立了序列预测与因果发现之间的对偶性。在非线性SCM（|X| = 8000）和真实车辆诊断日志（|X| = 29100）上，TRACE是此规模下首个适用的方法，在F1分数上超过最强基线20多点。

英文摘要

Autoregressive models trained via next-token prediction implicitly learn the conditional independence structure of their data-generating process. We exploit this observation to perform scalable causal discovery from a single observed sequence of discrete events -- without any task-specific retraining. Such single-stream settings arise naturally in vehicle diagnostics, manufacturing systems, and patient trajectories, yet they remain largely unsolved: the absence of repeated samples, massive event vocabularies, and long-range temporal dependencies render existing methods either inaccurate or computationally intractable. We introduce TRACE, a framework that repurposes any pretrained autoregressive model as a density estimator for conditional mutual information, the fundamental primitive for conditional independence testing. By constructing parallelized CI tests on GPUs, TRACE recovers both the sample-level time causal graph and its summary projection, scaling linearly with the vocabulary size while naturally handling delayed causal effects. Crucially, we prove that minimizing the standard cross-entropy pretraining loss directly minimizes an upper bound on the causal identification error, establishing a duality between sequence prediction and causal discovery. On nonlinear SCMs (|X| = 8000) and real-world vehicle diagnostic logs (|X| = 29100), TRACE is the first applicable method at this scale, outperforming the strongest baseline by over 20 F1 points.
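
The quantity behind a conditional independence test, the conditional mutual information I(X; Y | Z) = E[log p(y | x, z) - log p(y | z)], can be estimated from any model that returns log-probabilities. The sketch below is not TRACE itself: the two toy density functions stand in for a pretrained autoregressive model, and every name in it is hypothetical.

```python
import math


def cmi_estimate(samples, logp_with, logp_without):
    """Monte Carlo estimate of I(X; Y | Z) from (x, y, z) samples."""
    diffs = [logp_with(x, y, z) - logp_without(y, z) for x, y, z in samples]
    return sum(diffs) / len(diffs)


# Toy world (an assumption): X is a fair bit, Y copies X, Z is irrelevant.
samples = [(x, x, z) for x in (0, 1) for z in (0, 1)]


def copy_logp_with(x, y, z):
    """log p(y | x, z) when Y is a deterministic copy of X."""
    return 0.0 if y == x else float("-inf")


def uniform_logp(y, z):
    """log p(y | z) when Y is a fair bit regardless of z."""
    return math.log(0.5)


dependent = cmi_estimate(samples, copy_logp_with, uniform_logp)
independent = cmi_estimate(samples, lambda x, y, z: math.log(0.5), uniform_logp)
```

Here `dependent` comes out at log 2 nats and `independent` at zero, so thresholding the estimate separates a causal parent from an unrelated variable in this toy case.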


2604.04287 2026-06-10 cs.LG cs.CL q-bio.GN 版本更新

Entropy, Disagreement, and the Limits of Foundation Models in Genomics

熵、分歧与基因组基础模型的局限性

Maxime Rochkoulets, Lovro Vrček, Mile Šikić

发表机构 * Genome Institute of Singapore, A*STAR（新加坡基因组研究院，A*STAR）； KU Leuven（鲁汶大学）； Faculty of Electrical Engineering and Computing, University of Zagreb（萨格勒布大学电气工程与计算学院）

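AI summary: Analyzes how entropy affects model learning and finds that the high entropy of genomic sequences leads to near-uniform output distributions, large disagreement across models, and unstable static embeddings, with Fisher information concentrated in the embedding layers, suggesting that self-supervised training from sequences alone may not suit genomic data.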

Comments Accepted to LMLR Workshop at ICLR 2026

详情

AI中文摘要

基因组学中的基础模型与自然语言处理中的基础模型相比，成功程度参差不齐。然而，其有效性有限的原因仍不清楚。在这项工作中，我们研究了熵作为限制此类模型从训练数据中学习并发展基础能力的基本因素的作用。我们在文本和DNA序列上训练模型集成，并分析它们的预测、静态嵌入和经验Fisher信息流。我们表明，从未见标记预测的角度来看，基因组序列的高熵导致输出分布接近均匀、模型间分歧大以及静态嵌入不稳定，即使模型在架构、训练和数据上匹配也是如此。然后，我们证明在DNA上训练的模型将Fisher信息集中在嵌入层，似乎未能利用标记间关系。我们的结果表明，仅从序列进行自监督训练可能不适用于基因组数据，这质疑了当前训练基因组基础模型方法背后的假设。

英文摘要

Foundation models in genomics have shown mixed success compared to their counterparts in natural language processing. Yet, the reasons for their limited effectiveness remain poorly understood. In this work, we investigate the role of entropy as a fundamental factor limiting the capacities of such models to learn from their training data and develop foundational capabilities. We train ensembles of models on text and DNA sequences and analyze their predictions, static embeddings, and empirical Fisher information flow. We show that the high entropy of genomic sequences -- from the point of view of unseen token prediction -- leads to near-uniform output distributions, disagreement across models, and unstable static embeddings, even for models that are matched in architecture, training and data. We then demonstrate that models trained on DNA concentrate Fisher information in embedding layers, seemingly failing to exploit inter-token relationships. Our results suggest that self-supervised training from sequences alone may not be applicable to genomic data, calling into question the assumptions underlying current methodologies for training genomic foundation models.


2206.02178 2026-06-10 cs.AI cs.LG 版本更新

Belief Acquisition as Stochastic Filtering

信念获取作为随机滤波

Dawei Chen, John Lloyd, Samuel Yang-Zhao, Kee Siong Ng

发表机构 * School of Computing, Australian National University（计算机学院，澳大利亚国立大学）

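AI summary: Frames belief acquisition as a stochastic filtering problem, using factored conditional filters to track states and estimate parameters simultaneously in high-dimensional state spaces, and validates the approach in experiments such as epidemic tracking.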

Comments 51 pages

详情

AI中文摘要

本文研究如何利用随机滤波实现信念获取。首先，概述了经验信念的理论基础。然后，研究了该背景下的随机滤波。本文引入了因子化条件滤波器，这是一种新的滤波算法，用于在高维状态空间中同时跟踪状态和估计参数。算法的条件性质用于估计参数，因子化性质用于将状态空间分解为低维子空间，使得在这些子空间上的滤波得到的分布的乘积是对整个状态空间上分布的良好近似。算法成功应用的条件是：观测在子空间级别可用，且转移模式可以分解为近似局限于子空间的局部转移模式；这些条件在计算机科学、工程和地球物理滤波应用中广泛满足。在大型接触网络上跟踪流行病和估计参数的实验结果显示了该方法的有效性。

英文摘要

This paper studies how belief acquisition can be accomplished using stochastic filtering. First, a theoretical foundation for empirical beliefs is outlined. Then stochastic filtering in this context is studied. The paper introduces factored conditional filters, new filtering algorithms for simultaneously tracking states and estimating parameters in high-dimensional state spaces. The conditional nature of the algorithms is used to estimate parameters and the factored nature is used to decompose the state space into low-dimensional subspaces in such a way that filtering on these subspaces gives distributions whose product is a good approximation to the distribution on the entire state space. The conditions for successful application of the algorithms are that observations be available at the subspace level and that the transition schema can be factored into local transition schemas that are approximately confined to the subspaces; these conditions are widely satisfied in computer science, engineering, and geophysical filtering applications. Experimental results on tracking epidemics and estimating parameters in large contact networks show the effectiveness of the approach.


2603.04689 2026-06-10 cs.DS cs.CC cs.CG cs.CY cs.DB cs.LG 版本更新

Generalizing Fair Top-$k$ Selection: An Integrative Approach

公平的top-k选择的泛化：一种整合方法

Guangya Cai

发表机构 * University of Minnesota, Twin Cities（明尼苏达大学，双城分校）

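AI summary: Studies the problem of finding a fair linear scoring function with multiple protected groups. The analysis shows the problem may be computationally intractable even for two-dimensional datasets and small k, while efficiency for small k can be recovered when the number of protected groups is sufficiently small; the paper also introduces a new disparity measure, utility loss.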

详情

AI中文摘要

公平的top-k选择，确保在选出的top-k候选者中适当代表少数族或历史上不利群体，引起了广泛关注。我们研究了在多个受保护群体下寻找公平（线性）评分函数的问题，同时最小化与参考评分函数的差异。这扩展了之前的设置，该设置仅限于单群体设置且不考虑差异最小化。先前研究暗示受保护群体的数量可能对运行时间效率影响有限。然而，出于实验探索的需要，我们发现这一暗示忽略了可能影响结果公平性的关键问题。一旦正确考虑这个问题，我们的难度分析显示，即使对于二维数据集和小k值，问题可能计算上不可行。然而，我们的分析也揭示了难度障碍的差距，使我们能够在受保护群体数量足够小时恢复小k值的效率。此外，除了将差异测量为“公平评分函数与参考评分函数之间的距离”外，我们引入了另一种差异测量——即“效用损失”，这可能在小权重扰动下产生更稳定的评分函数。通过仔细的工程权衡，平衡实现复杂性、鲁棒性和性能，我们的增强双管方案在真实世界数据集上表现出强大的经验性能，实验观察也影响了算法设计和实现决策。

英文摘要

Fair top-$k$ selection, which ensures appropriate proportional representation of members from minority or historically disadvantaged groups among the top-$k$ selected candidates, has drawn significant attention. We study the problem of finding a fair (linear) scoring function with multiple protected groups while also minimizing the disparity from a reference scoring function. This generalizes the prior setup, which was restricted to the single-group setting without disparity minimization. Previous studies imply that the number of protected groups may have a limited impact on the runtime efficiency. However, driven by the need for experimental exploration, we find that this implication overlooks a critical issue that may affect the fairness of the outcome. Once this issue is properly considered, our hardness analysis shows that the problem may become computationally intractable even for a two-dimensional dataset and small values of $k$. However, our analysis also reveals a gap in the hardness barrier, enabling us to recover the efficiency for the case of small $k$ when the number of protected groups is sufficiently small. Furthermore, beyond measuring disparity as the "distance" between the fair and the reference scoring functions, we introduce an alternative disparity measure, utility loss, that may yield a more stable scoring function under small weight perturbations. Through careful engineering trade-offs that balance implementation complexity, robustness, and performance, our augmented two-pronged solution demonstrates strong empirical performance on real-world datasets, with experimental observations also informing algorithm design and implementation decisions.
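
A minimal sketch of the objects in this abstract: a linear scoring function, its top-k selection, a fairness check, and the distance-based disparity from a reference scoring function. The lower-bound form of the fairness constraint, the data, and all names are assumptions for illustration; the paper's exact constraint and search algorithm are not reproduced here.

```python
import math


def top_k(points, weights, k):
    """Indices of the k highest-scoring candidates under a linear scoring function."""
    scores = [sum(w * x for w, x in zip(weights, p)) for p in points]
    return sorted(range(len(points)), key=lambda i: -scores[i])[:k]


def is_fair(selected, groups, lower_bounds):
    """True if every protected group reaches its required count in the selection."""
    counts = {}
    for i in selected:
        counts[groups[i]] = counts.get(groups[i], 0) + 1
    return all(counts.get(g, 0) >= lb for g, lb in lower_bounds.items())


points = [(9, 1), (8, 2), (3, 9), (2, 8), (7, 3)]  # two attributes per candidate
groups = ["a", "a", "b", "b", "a"]                  # "b" is the protected group
lower_bounds = {"b": 1}                             # at least one "b" in the top 2

reference = (1.0, 0.0)
candidate = (0.4, 0.6)

ref_selection = top_k(points, reference, k=2)
fair_selection = top_k(points, candidate, k=2)
disparity = math.dist(reference, candidate)  # distance between weight vectors
```

The reference weights select only group "a", while the candidate weights satisfy the bound at a disparity of about 0.85; the optimization problem in the abstract asks for the fair weight vector with the smallest such disparity.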


2606.04833 2026-06-10 cs.LG cs.AI 版本更新

Signed Dual Attention: Capturing Signed Dependencies in Time Series Forecasting

符号双注意力：在时间序列预测中捕捉符号依赖关系

Balthazar Courvoisier, Tristan Cazenave

发表机构 * Queensfield AI Technologies

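AI summary: Proposes Signed Dual Attention, which uses a dual message-passing scheme to capture both positive and negative dependencies without additional parameters and can improve time series forecasting performance.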

Comments 5 pages, 3 figures, accepted at AAAI 2026 AI4TS Workshop

详情

AI中文摘要

最初为自然语言处理开发的Transformer架构和注意力机制，现在已成为各种深度学习模型的核心，包括时间序列预测应用。然而，标准注意力机制隐含地假设同质性交互，限制了其对具有正负依赖关系（如时间序列）的数据建模能力。在这项工作中，我们引入了符号双注意力，一种新颖的注意力公式，无需额外参数即可捕捉正负关系模式。通过利用受相关结构启发的双消息传递方案，符号双注意力在单个共享块内传播支持和对比信息，有效实现了两个头注意力的表达能力而无需额外参数。该模块可以无缝集成到现有架构中，并在需要符号关系建模的某些情况下带来性能提升。这种方法为构建更具表达力和参数效率的Transformer开辟了道路。

英文摘要

Initially developed for natural language processing, Transformer architectures and attention mechanisms are now central to a wide range of deep learning models, including applications in time series forecasting. A standard attention mechanism, however, implicitly assumes homophilic interactions, limiting its ability to model data with positive and negative dependencies, such as time series. In this work, we introduce Signed Dual Attention, a novel attention formulation that captures both positive and negative relational patterns without additional parameters. By leveraging a dual message-passing scheme inspired by correlation structures, Signed Dual Attention propagates both supportive and contrastive information within a single shared block, effectively achieving the expressiveness of two-head attention without additional parameters. This module can be seamlessly integrated into existing architectures and can yield performance gains in situations that require signed relational modeling. This approach opens a pathway toward more expressive and parameter-efficient transformers.


2509.04154 2026-06-10 cs.LG cs.AI 版本更新

Robust Filter Attention: Self-Attention as Precision-Weighted State Estimation

鲁棒滤波注意力：自注意力作为精度加权状态估计

Peter Racioppo

发表机构 * Independent Researcher, Los Angeles, CA, USA（独立研究者，美国加利福尼亚州洛杉矶）

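AI summary: Proposes Robust Filter Attention (RFA), which models self-attention as state estimation based on a linear stochastic differential equation, achieving better perplexity and zero-shot extrapolation stability than RoPE in language modeling.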

2602.16898 2026-06-10 cs.RO cs.AI cs.CV cs.LG 版本更新

MALLVI: A Multi-Agent Framework for Integrated Generalized Robotics Manipulation

MALLVI：一种多智能体框架用于集成通用机器人操作

Mehrshad Taji, Arad Mahdinezhad Kashani, Iman Ahmadi, AmirHossein Jadidi, Saina Kashani, Babak Khalaj

发表机构 * Department of Electrical Engineering, Sharif University of Technology（电气工程系，谢里夫大学）

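AI summary: MALLVI performs closed-loop, feedback-driven robotic manipulation through multi-agent collaboration, improving generalization and zero-shot task success rates.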

Comments Some fundamental changes in text and codebase

2510.17876 2026-06-10 physics.geo-ph cs.LG 版本更新

Three-dimensional inversion of gravity data using implicit neural representations and scientific machine learning

利用隐式神经表示和科学机器学习进行三维重力数据反演

Pankaj K Mishra, Sanni Laaksonen, Jochen Kamm, Anand Singh

发表机构 * Geological Survey of Finland（芬兰地质调查局）； Indian Institute of Technology Bombay（印度理工学院孟买分校）

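AI summary: Proposes a three-dimensional gravity inversion method based on implicit neural representations, training a deep neural network directly through a physics-based forward-model loss to invert a continuous density field without predefined meshes or discretisation, improving the reconstruction of geological structures.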

Comments Codes for reproducing results are at https://zenodo.org/records/19440024

详情

DOI: 10.1038/s41598-026-55960-5
Journal ref: Scientific Reports (2026)

AI中文摘要

重力数据反演是研究地下密度变化的重要方法，涉及矿产勘探、地热评估、碳储存、天然氢、地下水资源和构造演化。本文提出一种科学机器学习方法，利用隐式神经表示（INR）将地下密度表示为连续场。该方法通过物理正演模型损失直接训练深度神经网络，将空间坐标映射到连续密度场，无需预定义网格或离散化。空间编码增强了网络捕捉尖锐对比和短波长特征的能力，克服了传统坐标基网络因频谱偏倚导致的过度平滑问题。本文在合成示例中验证了该方法，包括平滑模型和倾斜块模型，以评估不同深度结构的恢复能力。INR框架在不使用显式正则化或深度加权的情况下重建了详细的结构和地质合理的边界，同时随着问题规模增大，减少了反演参数数量。这些结果展示了隐式表示在实现可扩展、灵活和可解释的大规模地球物理反演中的潜力。该框架可推广到其他地球物理方法及联合/多物理场反演。

英文摘要

Inversion of gravity data is an important method for investigating subsurface density variations relevant to mineral exploration, geothermal assessment, carbon storage, natural hydrogen, groundwater resources, and tectonic evolution. Here we present a scientific machine-learning approach for three-dimensional gravity inversion that represents subsurface density as a continuous field using an implicit neural representation (INR). The method trains a deep neural network directly through a physics-based forward-model loss, mapping spatial coordinates to a continuous density field without predefined meshes or discretisation. Spatial encoding enhances the network's capacity to capture sharp contrasts and short-wavelength features that conventional coordinate-based networks tend to oversmooth due to spectral bias. We demonstrate the approach on synthetic examples including smooth models, representing realistic geological complexity, and a dipping block model to assess recovery of structures at different depths. The INR framework reconstructs detailed structure and geologically plausible boundaries without explicit regularisation or depth weighting, while reducing the number of inversion parameters as the problem size grows. These results highlight the potential of implicit representations to enable scalable, flexible, and interpretable large-scale geophysical inversion. This framework could generalise to other geophysical methods and to joint/multiphysics inversion.


2509.11517 2026-06-10 cs.CL cs.LG 版本更新

PeruMedQA: Benchmarking Large Language Models (LLMs) on Peruvian Medical Exams -- Dataset Construction and Evaluation

PeruMedQA：在秘鲁医学考试上评估大语言模型（LLMs）——数据集构建与评估

Rodrigo M. Carrillo-Larco, Jesus Lovón Melgarejo, Manuel Castillo-Cara, Gusseppe Bravo-Rocca

发表机构 * Hubert Department of Global Health, Rollins School of Public Health, Emory University（霍伯特全球健康部门，埃默里大学公共卫生学院）； Emory Global Diabetes Research Center of Woodruff Health Sciences Center, Emory University（埃默里大学伍德鲁夫健康科学中心全球糖尿病研究中心）； Institut de Recherche en Informatique de Toulouse（图卢兹信息研究院）； Universidad Nacional de Educación a Distancia（远程教育国立大学）； Instituto de Investigación Científica, Universidad de Lima（科学研究所，利马大学）； Barcelona Supercomputing Center（巴塞罗那超级计算中心）

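AI summary: Builds a dataset of 8,380 Peruvian medical exam questions, fine-tunes a large language model, and compares accuracy across models, revealing performance differences on medical questions from a Spanish-speaking country.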

Comments https://github.com/rodrigo-carrillo/PeruMedQA

详情

DOI: 10.1007/s40670-026-02692-w

AI中文摘要

背景：医疗大语言模型（LLMs）在回答医学考试中表现出色，但其在西班牙语和拉丁美洲国家的医疗问题上的泛化能力尚不明确。目标：构建秘鲁医师专科学习考试问题数据集，对LLMs进行微调，并评估和比较普通LLMs与微调LLMs的准确性。方法：我们整理了包含8380道题的PeruMedQA数据集，涵盖12个专科（2018-2025年）。我们选择了10个医学LLMs，包括medgemma-4b-it和medgemma-27b-text-it，并开发了零样本任务特定提示来回答问题。我们使用参数高效微调（PEFT）和低秩适应（LoRA）对medgemma-4b-it进行微调，使用所有问题除外2025年（测试集）的问题。结果：medgemma-27b在所有专科中表现最佳，达到精神科89.29%的最高分；然而，在两个专科中，OctoMed-7B略胜一筹：神经外科77.27%和77.38%，放射科76.13%和77.39%。在专科层面，大多数参数少于100亿的LLM正确率低于50%。微调版的medgemma-4b-it在所有参数少于100亿的LLM中胜出，并在各种考试中与700亿参数的LLM竞争。结论：对于需要来自西班牙语国家和与秘鲁有相似流行病学特征的知识库的医疗AI应用和研究，应使用medgemma-27b-text-it。

英文摘要

BACKGROUND: Medical large language models (LLMs) have demonstrated remarkable performance in answering medical examinations. However, the extent to which this high performance is transferable to medical questions in Spanish and from a Latin American country remains unexplored. This knowledge is crucial as LLM-based medical applications gain traction in Latin America. AIMS: To build a dataset of questions from medical examinations taken by Peruvian physicians pursuing specialty training; to fine-tune an LLM on this dataset; to evaluate and compare the accuracy of vanilla LLMs and the fine-tuned LLM. METHODS: We curated PeruMedQA, a multiple-choice question-answering (MCQA) dataset containing 8,380 questions spanning 12 specialties (2018-2025). We selected ten medical LLMs, including medgemma-4b-it and medgemma-27b-text-it, and developed zero-shot task-specific prompts to answer the questions. We employed parameter-efficient fine-tuning (PEFT) and low-rank adaptation (LoRA) to fine-tune medgemma-4b-it utilizing all questions except those from 2025 (test set). RESULTS: Medgemma-27b showed the highest accuracy across all specialties, achieving the highest score of 89.29% in Psychiatry; yet, in two specialties, OctoMed-7B exhibited slight superiority: Neurosurgery with 77.27% and 77.38%, respectively; and Radiology with 76.13% and 77.39%, respectively. Across specialties, most LLMs with <10 billion parameters exhibited <50% of correct answers. The fine-tuned version of medgemma-4b-it emerged victorious against all LLMs with <10 billion parameters and rivaled an LLM with 70 billion parameters across various examinations. CONCLUSIONS: For medical AI applications and research that require knowledge bases from Spanish-speaking countries and those exhibiting a similar epidemiological profile to Peru's, interested parties should utilize medgemma-27b-text-it.


2602.01023 2026-06-10 cs.IR cs.AI cs.LG 版本更新

Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment

通过检索增强生成和多目标对齐统一查询自动补全中的排序与生成

Kai Yuan, Anthony Zheng, Jia Hu, Divyanshu Sheth, Hemanth Velaga, Kylee Kim, Matteo Guarrera, Besim Avci, Jianhua Li, Xuetao Yin, Rajyashree Mukherjee, Sean Suchter

发表机构 * Apple（苹果公司）； UC Berkeley（加州大学伯克利分校）

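AI summary: Proposes a unified framework that reformulates query auto-completion as end-to-end list generation through Retrieval-Augmented Generation (RAG) and multi-objective Direct Preference Optimization (DPO), addressing the limited long-tail coverage of traditional pipelines and the hallucination risk of generative methods, validated on a large-scale commercial search platform.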

Comments 11 pages, 4 figures

详情

DOI: 10.1145/3770855.3818502
Journal ref: Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '26), August 09--13, 2026, Jeju Island, Republic of Korea

AI中文摘要

查询自动补全（QAC）在用户输入时建议查询补全，帮助用户表达意图并更高效地获取结果。现有方法面临根本性挑战：传统的检索-排序流水线长尾覆盖有限且需要大量特征工程，而最近的生成方法存在幻觉和安全风险。我们提出了一个统一框架，通过检索增强生成（RAG）和多目标直接偏好优化（DPO）将QAC重构为端到端列表生成。我们的方法结合了三个关键创新：（1）将QAC重构为具有多目标优化的端到端列表生成；（2）定义并部署一套基于规则、基于模型和以LLM为评判的验证器用于QAC，并在综合方法中使用它们，结合RAG、多目标DPO和迭代批评-修订以生成高质量合成数据；（3）一种混合服务架构，可在严格的延迟约束下实现高效的生产部署。在大规模商业搜索平台上的评估显示了显著改进：离线指标在所有维度上均有提升，人工评估获得+0.40至+0.69的偏好分数，受控在线实验实现了击键次数减少5.44%和建议采纳率增加3.46%，验证了结合RAG和多目标对齐的统一生成为生产级QAC提供了有效解决方案。这项工作代表了向由大语言模型、RAG和多目标对齐驱动的端到端生成的范式转变，建立了一个经过生产验证的框架，可惠及更广泛的搜索和推荐行业。

英文摘要

Query Auto-Completion (QAC) suggests query completions as users type, helping them articulate intent and reach results more efficiently. Existing approaches face fundamental challenges: traditional retrieve-and-rank pipelines have limited long-tail coverage and require extensive feature engineering, while recent generative methods suffer from hallucination and safety risks. We present a unified framework that reformulates QAC as end-to-end list generation through Retrieval-Augmented Generation (RAG) and multi-objective Direct Preference Optimization (DPO). Our approach combines three key innovations: (1) reformulating QAC as end-to-end list generation with multi-objective optimization; (2) defining and deploying a suite of rule-based, model-based, and LLM-as-judge verifiers for QAC, and using them in a comprehensive methodology that combines RAG, multi-objective DPO, and iterative critique-revision for high-quality synthetic data; (3) a hybrid serving architecture enabling efficient production deployment under strict latency constraints. Evaluation on a large-scale commercial search platform demonstrates substantial improvements: offline metrics show gains across all dimensions, human evaluation yields +0.40 to +0.69 preference scores, and a controlled online experiment achieves 5.44% reduction in keystrokes and 3.46% increase in suggestion adoption, validating that unified generation with RAG and multi-objective alignment provides an effective solution for production QAC. This work represents a paradigm shift to end-to-end generation powered by large language models, RAG, and multi-objective alignment, establishing a production-validated framework that can benefit the broader search and recommendation industry.


2510.09498 2026-06-10 q-bio.TO cs.CE cs.LG 版本更新

Unsupervised full-field Bayesian inference of orthotropic hyperelasticity from a single biaxial test: a myocardial case study

无监督的全场贝叶斯推断各向异性超弹性材料从单次双轴测试：心肌案例研究

Rogier P. Krijnen, Akshay Joshi, Siddhant Kumar, Mathias Peirlinck

发表机构 * TUDelft（代尔夫特理工大学）

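AI summary: Proposes unsupervised Bayesian inference from full-field kinematics to reliably recover orthotropic hyperelastic material parameters from a single biaxial test, reducing sample demand and experimental manipulation.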

详情

DOI: 10.1016/j.cma.2026.119034

AI中文摘要

心脏肌肉组织在被动变形过程中表现出高度非线性超弹性和各向异性材料行为。传统本构识别协议通常结合多种加载模式，通常需要多个样本和大量处理。在软活组织中，此类协议受到样本间和样本内变异性以及操作诱导的机械响应变化的挑战，这些变化可能偏转逆校准。在本工作中，我们利用空间异质性的全场运动学作为多模式测试的替代方案。我们将EUCLID方法重新定向为用于高度非线性、各向异性本构模型的贝叶斯参数推断。使用合成心肌组织板，我们证明单次异质双轴实验结合稀疏反力测量能够可靠恢复Holzapfel-Ogden参数并量化不确定性，适用于多种噪声水平。推断响应与地面真实模拟高度一致，并产生反映测量噪声对各向异性材料模型推断影响的可信区间。我们的工作支持从单次双轴测试中进行非线性各向异性材料模型的单次测试、不确定性感知表征，减少样本需求和实验操作。

英文摘要

Cardiac muscle tissue exhibits highly non-linear hyperelastic and orthotropic material behavior during passive deformation. Traditional constitutive identification protocols therefore combine multiple loading modes and typically require multiple specimens and substantial handling. In soft living tissues, such protocols are challenged by inter- and intra-sample variability and by manipulation-induced alterations of mechanical response, which can bias inverse calibration. In this work we exploit spatially heterogeneous full-field kinematics as an information-rich alternative to multimodal testing. We recast EUCLID, an unsupervised method for the automated discovery of constitutive models, towards Bayesian parameter inference for highly nonlinear, orthotropic constitutive models. Using synthetic myocardial tissue slabs, we demonstrate that a single heterogeneous biaxial experiment, combined with sparse reaction-force measurements, enables robust recovery of Holzapfel-Ogden parameters with quantified uncertainty, across multiple noise levels. The inferred responses agree closely with ground-truth simulations and yield credible intervals that reflect the impact of measurement noise on orthotropic material model inference. Our work supports single-shot, uncertainty-aware characterization of nonlinear orthotropic material models from a single biaxial test, reducing sample demand and experimental manipulation.


2510.03844 2026-06-10 cs.LG stat.AP stat.ME 版本更新

On Using Large Language Models to Enhance Clinically-Driven Missing Data Recovery Algorithms in Electronic Health Records

利用大型语言模型增强电子健康记录中临床驱动的缺失数据恢复算法

Sarah C. Lotspeich, Abbey Collins, Brian J. Wells, Ashish K. Khanna, Joseph Rigdon, Lucy D'Agostino McGowan

发表机构 * Department of Statistical Sciences, Wake Forest University（统计科学系，威克森林大学）； Wake Forest University（威克森林大学）； Wake Forest University School of Medicine（威克森林大学医学院）； Department of Psychology, North Carolina State University（心理学系，北卡罗来纳州立大学）； Department of Biostatistics and Data Science, Wake Forest University School of Medicine（生物统计学与数据科学系，威克森林大学医学院）； Department of Anesthesiology, Division of Critical Care Medicine, Wake Forest University School of Medicine（麻醉学系，重症医学科，威克森林大学医学院）； Outcomes Research Consortium（结局研究联盟）

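AI summary: Explores using large language models to improve the accuracy and scalability of missing-data recovery algorithms for electronic health records; with roadmaps refined jointly by clinical experts and LLMs, the algorithm recovers data about as well as expert chart review.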

详情

DOI: 10.1093/jamiaopen/ooag080
Journal ref: 2026

AI中文摘要

目的：电子健康记录（EHR）数据易出现缺失和错误。先前，我们设计了一种“增强”图表审查协议，利用辅助诊断（路标）来恢复EHR数据中的缺失值（例如，糖尿病控制不良可能暗示缺失的血红蛋白A1c值不健康）。然而，图表审查成本高且耗时，限制了可审查患者的数量。现在，我们研究了基于ICD-10代码的路标驱动算法的准确性和可扩展性，以模拟专家图表审查并恢复缺失值。材料和方法：除了临床专家原始的路标外，我们考虑了通过大型语言模型（LLM）与临床专业知识结合迭代优化的新版本，以扩展辅助诊断列表。使用100名患者在扩展学习健康系统中的图表审查数据，我们检验了不同路标下的算法性能。在1000名患者的更大研究中，我们应用了最终算法，该算法使用了经临床专家批准的LLM添加的路标。结果：该算法恢复的缺失数据量与专家图表审查相当，甚至更多。讨论：临床驱动的算法（通过LLM增强）可以以与图表审查相似的准确性恢复EHR数据，并可应用于大规模样本。将这些算法扩展以监控其他数据质量维度（如合理性）是具有前景的未来方向。

英文摘要

Objective: Electronic health records (EHR) data are prone to missingness and errors. Previously, we devised an "enriched" chart review protocol where a "roadmap" of auxiliary diagnoses (anchors) was used to recover missing values in EHR data (e.g., a diagnosis of impaired glycemic control might imply that a missing hemoglobin A1c value would be considered unhealthy). Still, chart reviews are expensive and time-intensive, which limits the number of patients whose data can be reviewed. Now, we investigate the accuracy and scalability of a roadmap-driven algorithm, based on ICD-10 codes (International Classification of Diseases, 10th revision), to mimic expert chart reviews and recover missing values. Materials and Methods: In addition to the clinicians' original roadmap from our previous work, we consider new versions that were iteratively refined using large language models (LLMs) in conjunction with clinical expertise to expand the list of auxiliary diagnoses. Using chart reviews for 100 patients from the EHR at an extensive learning health system, we examine algorithm performance with different roadmaps. Using the larger study of 1000 patients, we applied the final algorithm, which used a roadmap with clinician-approved additions from the LLM. Results: The algorithm recovered as much, if not more, missing data as the expert chart reviewers, depending on the roadmap. Discussion: Clinically-driven algorithms (enhanced by LLMs) can recover missing EHR data with similar accuracy to chart reviews and can feasibly be applied to large samples. Extending them to monitor other dimensions of data quality (e.g., plausibility) is a promising future direction.


2508.17196 2026-06-10 cs.LG cs.AI 版本更新

BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens

BudgetThinker: 通过控制令牌赋能预算感知的LLM推理

Hao Wen, Xinrui Wu, Yi Sun, Feifei Zhang, Liye Chen, Jie Wang, Yunxin Liu, Yunhao Liu, Ya-Qin Zhang, Yuanchun Li

发表机构 * Institute for AI Industry Research (AIR) Tsinghua University（人工智能产业研究院（AIR）清华大学）； Global Innovation Exchange & Department of Automation Tsinghua University（全球创新交流中心及自动化系清华大学）

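AI summary: BudgetThinker inserts control tokens during inference so that an LLM can precisely control the length of its reasoning, and uses a two-stage training pipeline to improve performance across different budgets.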

详情

AI中文摘要

近年来，大语言模型（LLM）通过增加测试时计算来增强推理能力，但此策略导致显著延迟和资源成本，限制了其在现实时间受限或成本敏感场景中的应用。本文提出BudgetThinker，一种新型框架，旨在使LLM具备预算感知推理能力，通过在推理过程中定期插入特殊控制令牌，持续告知模型剩余令牌预算。我们提出了一种方法，结合监督微调（SFT）和基于课程的学习强化学习（RL）阶段，利用长度感知奖励函数优化准确性和预算遵守度。我们证明BudgetThinker在各种推理预算下的数学基准测试中显著优于强基线。我们的方法提供了一种可扩展且有效的解决方案，用于开发高效可控的LLM推理，使高级模型更适用于资源受限和实时环境。

英文摘要

Recent advancements in Large Language Models (LLMs) have leveraged increased test-time computation to enhance reasoning capabilities, a strategy that, while effective, incurs significant latency and resource costs, limiting their applicability in real-world time-constrained or cost-sensitive scenarios. This paper introduces BudgetThinker, a novel framework designed to empower LLMs with budget-aware reasoning, enabling precise control over the length of their thought processes. We propose a methodology that periodically inserts special control tokens during inference to continuously inform the model of its remaining token budget. This approach is coupled with a comprehensive two-stage training pipeline, beginning with Supervised Fine-Tuning (SFT) to familiarize the model with budget constraints, followed by a curriculum-based Reinforcement Learning (RL) phase that utilizes a length-aware reward function to optimize for both accuracy and budget adherence. We demonstrate that BudgetThinker significantly surpasses strong baselines in maintaining performance across a variety of reasoning budgets on challenging mathematical benchmarks. Our method provides a scalable and effective solution for developing efficient and controllable LLM reasoning, making advanced models more practical for deployment in resource-constrained and real-time environments.
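
The idea of periodically inserting control tokens that report the remaining budget can be sketched as a wrapper around a token stream. The token format `<remaining=N>`, the fixed insertion interval, and the hard stop at the budget are assumptions for illustration; the abstract does not specify BudgetThinker's actual tokens or schedule.

```python
def with_budget_tokens(token_stream, budget, interval):
    """Yield generated tokens, interleaving a remaining-budget marker every `interval` tokens."""
    for used, token in enumerate(token_stream):
        if used >= budget:
            break  # hard stop once the token budget is spent
        if used % interval == 0:
            # Hypothetical control-token format; the paper's tokens may differ.
            yield f"<remaining={budget - used}>"
        yield token


# Stand-in for a model's generated tokens (single characters for brevity).
steps = list(with_budget_tokens(iter("abcdefgh"), budget=6, interval=2))
print(steps)
```

In a real decoder the marker would be appended to the context before the next forward pass, so the model conditions on it; the wrapper above only shows where the markers would fall.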


2410.22967 2026-06-10 cs.LG eess.SP 版本更新

Adaptive NAD: Online and Self-adaptive Unsupervised Network Anomaly Detector

自适应NAD：在线且自适应的无监督网络异常检测器

Yachao Yuan, Yu Huang, Yingwen Wu

发表机构 * Suda University（苏州大学）

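AI summary: Proposes Adaptive NAD, an online, self-adaptive unsupervised network anomaly detection framework that generates pseudo-labels with a two-layer anomaly detection strategy and updates itself with an online training scheme, achieving the lowest false alarm rate and faster inference on multiple datasets.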

详情

AI中文摘要

物联网的广泛使用增加了网络威胁的风险；因此，开发能够适应不断变化的流量模式的异常检测系统（ADS）至关重要。以往的研究主要关注离线无监督学习方法以保护ADS，但这在实际应用中并不适用。本文设计了Adaptive NAD，一种面向安全领域的在线自适应无监督网络异常检测框架。提出了一种两层异常检测策略来生成可靠的高置信度伪标签。然后，引入了一种在线训练方案，通过新颖的阈值计算技术来更新Adaptive NAD。实验结果表明，在CIC-Darknet2020、NSL-KDD和Edge-IIoTset数据集上，Adaptive NAD实现了最低的误报率（分别为1.33%、0.71%和0.08%），并且在线推理延迟比现有最优解决方案快3倍以上。代码已发布在https://github.com/MyLearnCodeSpace/Adaptive-NAD。

英文摘要

The widespread usage of the Internet of Things (IoT) has raised the risks of cyber threats; thus, developing Anomaly Detection Systems (ADSs) that can adapt to evolving traffic patterns is critical. Previous studies primarily focused on offline unsupervised learning methods to safeguard ADSs, which are not applicable in practical real-world applications. In this paper, we design Adaptive NAD, an online and self-Adaptive unsupervised Network Anomaly Detection framework for security domains. A two-layer anomaly detection strategy is proposed to generate reliable high-confidence pseudo-labels. Then, an online training scheme is introduced to update Adaptive NAD by a novel threshold calculation technique. Experimental results demonstrate that Adaptive NAD achieves the lowest false alarm rate (1.33%, 0.71%, and 0.08%) and more than 3 times faster online inference than state-of-the-art solutions on the CIC-Darknet2020, NSL-KDD, and Edge-IIoTset datasets, respectively. The code is released at https://github.com/MyLearnCodeSpace/Adaptive-NAD.


2501.12486 2026-06-10 cs.LG cs.CL 版本更新

The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws

训练过程至关重要：平均预训练参数计数统一了稀疏和密集的扩展规律

Tian Jin, Ahmed Imtiaz Humayun, Utku Evci, Suvinay Subramanian, Amir Yazdanbakhsh, Dan Alistarh, Gintare Karolina Dziugaite

发表机构 * MIT CSAIL（MIT 计算机科学与人工智能实验室）； Rice University（莱斯大学）； Google Research（谷歌研究）； Google DeepMind（谷歌深度思维）； Google（谷歌）； IST Austria（奥地利科学技术研究所）

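AI summary: By examining 80 pruning schedules, finds that starting pruning at 25% and ending at 75% of training compute yields near-optimal evaluation loss, and proposes a new scaling law that unifies sparse and dense pre-training.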

Comments 17 pages

详情

Journal ref: The Thirteenth International Conference on Learning Representations (ICLR), 2025

Abstract

Pruning eliminates unnecessary parameters in neural networks; it offers a promising solution to the growing computational demands of large language models (LLMs). While many studies focus on post-training pruning, sparse pre-training, which combines pruning and pre-training into a single phase, provides a simpler alternative. In this work, we present the first systematic exploration of optimal sparse pre-training configurations for LLMs through an examination of 80 unique pruning schedules across different sparsity levels and training durations. We find that initiating pruning at 25% of total training compute and concluding at 75% achieves near-optimal final evaluation loss. These findings provide valuable insights for efficient and effective sparse pre-training of LLMs. Furthermore, we propose a new scaling law that modifies the Chinchilla scaling law to use the average parameter count over pre-training. Through empirical and theoretical validation, we demonstrate that this modified scaling law accurately models evaluation loss for both sparsely and densely pre-trained LLMs, unifying scaling laws across pre-training paradigms. Our findings indicate that while sparse pre-training achieves the same final model quality as dense pre-training for equivalent compute budgets, it provides substantial benefits through reduced model size, enabling significant potential computational savings during inference.
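
As a rough illustration of the average-parameter-count idea in this abstract, the sketch below averages the parameter count over training for a schedule that prunes between 25% and 75% of training, then plugs that average into the Chinchilla loss form L = E + A / N^alpha + B / D^beta. The linear sparsity ramp, the uniform averaging over training progress, the function names, and the reuse of the dense-model constants reported by Hoffmann et al. are all assumptions made for illustration; the paper's own schedule shape and fitted coefficients may differ.

```python
def param_count_at(progress, dense_params, final_sparsity, start=0.25, end=0.75):
    """Parameter count at a given fraction of training (0..1).

    Assumes a linear sparsity ramp between `start` and `end`
    (an illustrative choice, not taken from the paper).
    """
    if progress <= start:
        return dense_params
    if progress >= end:
        return dense_params * (1.0 - final_sparsity)
    ramp = (progress - start) / (end - start)
    return dense_params * (1.0 - final_sparsity * ramp)


def average_param_count(dense_params, final_sparsity, steps=10_000):
    """Midpoint-rule average of the parameter count over training progress."""
    total = 0.0
    for i in range(steps):
        total += param_count_at((i + 0.5) / steps, dense_params, final_sparsity)
    return total / steps


def chinchilla_style_loss(n_params, n_tokens,
                          E=1.69, A=406.4, B=410.7, alpha=0.34, beta=0.28):
    """Chinchilla functional form; the default constants are the dense-model
    fit from Hoffmann et al. and are reused here only as placeholders."""
    return E + A / n_params ** alpha + B / n_tokens ** beta


if __name__ == "__main__":
    dense, tokens = 1e9, 20e9  # hypothetical model size and token budget
    avg = average_param_count(dense, final_sparsity=0.5)
    print("average parameter count:", avg)
    print("loss with dense N:      ", chinchilla_style_loss(dense, tokens))
    print("loss with average N:    ", chinchilla_style_loss(avg, tokens))
```

Under these assumptions, a model pruned to 50% sparsity spends the first quarter of training dense and the last quarter at half size, so its average parameter count sits between the dense and final sizes.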

2502.11517 2026-06-10 cs.CL cs.DC cs.LG (updated version)

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding

Tian Jin, Ellie Y. Cheng, Zack Ankner, Nikunj Saunshi, Blake M. Elias, Amir Yazdanbakhsh, Jonathan Ragan-Kelley, Suvinay Subramanian, Michael Carbin

Affiliations: DeepMind, London, UK; Google Research, New York, NY, USA; Stanford University, Stanford, CA, USA; University of Toronto, Toronto, Ontario, Canada; University of Washington, Seattle, WA, USA

AI summary: This paper proposes PASTA, a learning-based system that teaches language models to identify semantic independence and thereby increase decoding parallelism; experiments show it outperforms existing methods in decoding speed and response quality.

Comments 15 pages

Journal ref: Proceedings of the 42nd International Conference on Machine Learning (ICML), PMLR 267:27941-27956, 2025

Abstract

Decoding with autoregressive large language models (LLMs) traditionally occurs sequentially, generating one token after another. An emerging line of work explores parallel decoding by identifying and simultaneously generating semantically independent chunks of LLM responses. However, these techniques rely on hand-crafted heuristics tied to syntactic structures like lists and paragraphs, making them rigid and imprecise. We present PASTA, a learning-based system that teaches LLMs to identify semantic independence and express parallel decoding opportunities in their own responses. At its core are PASTA-LANG and its interpreter: PASTA-LANG is an annotation language that enables LLMs to express semantic independence in their own responses; the language interpreter acts on these annotations to orchestrate parallel decoding on-the-fly at inference time. Through a two-stage finetuning process, we train LLMs to generate PASTA-LANG annotations that optimize both response quality and decoding speed. Evaluation on AlpacaEval, an instruction-following benchmark, shows that our approach Pareto-dominates existing methods in terms of decoding speed and response quality; our results demonstrate geometric mean speedups ranging from 1.21x to 1.93x with corresponding quality changes of +2.2% to -7.1%, measured by length-controlled win rates against a sequential decoding baseline.

2501.11937 2026-06-10 cs.LG cs.AI (updated version)

MeshONet: A Generalizable and Efficient Operator Learning Method for Structured Mesh Generation

Jing Xiao, Xinhai Chen, Qingling Wang, Jie Liu

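Affiliations: Laboratory of Digitizing Software for Frontier Equipment, Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology

AI summary: This paper proposes MeshONet, a generalizable intelligent learning method for structured mesh generation that recasts mesh generation as an operator learning problem, achieving efficient generation and generalization across geometries.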

DOI: 10.1016/j.neunet.2026.108746
Journal ref: Neural Networks 199: 108746 (2026)

Abstract

Mesh generation plays a crucial role in scientific computing. Traditional mesh generation methods, such as TFI and PDE-based methods, often struggle to achieve a balance between efficiency and mesh quality. To address this challenge, physics-informed intelligent learning methods have recently emerged, significantly improving generation efficiency while maintaining high mesh quality. However, physics-informed methods fail to generalize when applied to previously unseen geometries, as even small changes in the boundary shape necessitate burdensome retraining to adapt to new geometric variations. In this paper, we introduce MeshONet, the first generalizable intelligent learning method for structured mesh generation. The method transforms the mesh generation task into an operator learning problem with multiple input and solution functions. To effectively overcome the multivariable mapping restriction of operator learning methods, we propose a dual-branch, shared-trunk architecture to approximate the mapping between function spaces based on input-output pairs. Experimental results show that MeshONet achieves a speedup of up to four orders of magnitude in generation efficiency over traditional methods. It also enables generalization to different geometries without retraining, greatly enhancing the practicality of intelligent methods.

2409.12263 2026-06-10 cs.LG cs.SI (updated version)

Detecting LGBTQ+ Instances of Cyberbullying

Arslan Bisharat, Manuel Sandoval Madrigal, Mohammed Abuhamad, Deborah L. Hall, Yasin N. Silva

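Affiliations: Loyola University Chicago; Arizona State University

AI summary: This paper studies the use of transformer models to identify cyberbullying targeting LGBTQ+ individuals and analyzes how effective different models are at detecting complex and subtle bullying behavior.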

Comments 10 pages, 4 tables, 1 figure, 17th International Conference on Social Computing, Behavioral-Cultural Modeling, & Prediction and Behavior Representation in Modeling and Simulation

2409.04519 2026-06-10 quant-ph cs.AI cs.LG physics.data-an (updated version)

The role of data embedding in quantum autoencoders for improved anomaly detection

Jack Y. Araz, Michael Spannowsky

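Affiliations: Thomas Jefferson National Accelerator Facility; Institute for Particle Physics Phenomenology; Durham University

AI summary: This study examines how three data embedding techniques affect the anomaly detection performance of quantum autoencoders and finds that enhanced embedding strategies can substantially improve detection accuracy and data representability.

Comments 8 pages, 5 figures, 4 tables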

DOI: 10.1007/s42484-026-00404-6
Journal ref: Quantum Mach. Intell. 8, 61 (2026)

Abstract

The performance of Quantum Autoencoders (QAEs) in anomaly detection tasks is critically dependent on the choice of data embedding and ansatz design. This study explores the effects of three data embedding techniques, data re-uploading, parallel embedding, and alternate embedding, on the representability and effectiveness of QAEs in detecting anomalies. Our findings reveal that even with relatively simple variational circuits, enhanced data embedding strategies can substantially improve anomaly detection accuracy and the representability of underlying data across different datasets. Starting with toy examples featuring low-dimensional data, we visually demonstrate the effect of different embedding techniques on the representability of the model. We then extend our analysis to complex, higher-dimensional datasets, highlighting the significant impact of embedding methods on QAE performance.

2408.07925 2026-06-10 cs.LG eess.SP (updated version)

A Single Channel-Based Neonatal Sleep-Wake Classification using Hjorth Parameters and Improved Gradient Boosting

Arslan Bisharat, Muhammad Mubeen, Saadullah Farooq Abbasi, Muhammad Shahbaz Khan, Wadii Boulila, Jawad Ahmad

Affiliations: Department of Computer Science, Loyola University; Department of Computer Science, University of People; Department of Electronic, Electrical and Systems Engineering, University of Birmingham; School of Computing, Engineering and the Built Environment, Edinburgh Napier University; RIOTU Lab, Prince Sultan University

AI summary: This paper proposes neonatal sleep stage classification using a single-channel gradient boosting algorithm with Hjorth features; the parameters are tuned with random search cross-validation, reaching 82.35% classification accuracy under 5-fold cross-validation, improving on existing algorithms and opening up broader applications.

Comments 8 pages, 5 figures, 3 tables, International Polydisciplinary Conference on Artificial Intelligence and New Technologies

Abstract

Sleep plays a crucial role in neonatal development. Monitoring the sleep patterns in neonates in a Neonatal Intensive Care Unit (NICU) is imperative for understanding the maturation process. While polysomnography (PSG) is considered the best practice for sleep classification, its expense and reliance on human annotation pose challenges. Existing research often relies on multichannel EEG signals; however, concerns arise regarding the vulnerability of neonates and the potential impact on their sleep quality. This paper introduces a novel approach to neonatal sleep stage classification using a single-channel gradient boosting algorithm with Hjorth features. The gradient boosting parameters are fine-tuned using random search cross-validation (randomsearchCV), achieving an accuracy of 82.35% for neonatal sleep-wake classification. Validation is conducted through 5-fold cross-validation. The proposed algorithm not only enhances existing neonatal sleep algorithms but also opens avenues for broader applications.
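
For readers unfamiliar with the Hjorth features mentioned in this abstract, the following minimal sketch computes the three standard Hjorth parameters (activity, mobility, complexity) for one signal window. It uses first differences as a stand-in for the time derivative, so mobility is expressed per sample rather than per second; the function names and the sine-wave demo are illustrative assumptions and are not taken from the paper. In a pipeline like the one described, such features would then be passed to a gradient boosting classifier.

```python
import math
from statistics import pvariance


def first_difference(signal):
    """Discrete approximation of the derivative (per-sample units)."""
    return [b - a for a, b in zip(signal, signal[1:])]


def hjorth_parameters(signal):
    """Return (activity, mobility, complexity) for a 1-D signal window.

    activity   = variance of the signal
    mobility   = sqrt(var(first difference) / var(signal))
    complexity = mobility of the first difference / mobility of the signal
    """
    d1 = first_difference(signal)
    d2 = first_difference(d1)
    activity = pvariance(signal)
    mobility = math.sqrt(pvariance(d1) / activity)
    mobility_d1 = math.sqrt(pvariance(d2) / pvariance(d1))
    complexity = mobility_d1 / mobility
    return activity, mobility, complexity


if __name__ == "__main__":
    # Hypothetical test window: 10 full periods of a unit-amplitude sine.
    window = [math.sin(2 * math.pi * 10 * n / 1000) for n in range(1000)]
    print(hjorth_parameters(window))
```

A pure sinusoid is a convenient sanity check because its complexity should be close to 1, the minimum for this definition.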

2408.07922 2026-06-10 cs.CV cs.LG (updated version)

A Deep Features-Based Approach Using Modified ResNet50 and Gradient Boosting for Visual Sentiments Classification

Arslan Bisharat, Muhammad Mubeen, Arslan Akram, Saadullah Farooq Abbasi, Muhammad Salman Ali, Muhammad Usman Tariq

Affiliations: Department of Computer Science; Loyola University Chicago; University Of the People; The Superior University Lahore; University of Birmingham

AI summary: This paper proposes a sentiment classification method that combines deep features extracted from a modified ResNet50 with a gradient boosting algorithm; evaluated on two benchmark datasets, it outperforms existing deep learning and machine learning models.

Comments 4 pages, 4 figures, 3 tables, IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR) 2024

Abstract

The versatile nature of Visual Sentiment Analysis (VSA) is one reason for its rising profile. Efficiently managing social media data that contains visual information is difficult because previous research has concentrated on Sentiment Analysis (SA) of single modalities, such as text. In addition, most visual sentiment studies fall short of adequately classifying sentiment because they focus mainly on merging modal attributes without investigating their intricate relationships. This motivated a fusion of deep learning and machine learning algorithms. In this research, a deep feature-based method for multiclass classification extracts deep features from a modified ResNet50, and a gradient boosting algorithm classifies photos containing emotional content. The approach is thoroughly evaluated on two benchmark datasets, CrowdFlower and GAPED, and compared against cutting-edge deep learning and machine learning models. Compared with these state-of-the-art approaches, the proposed method demonstrates exceptional performance on the datasets presented.

2310.04680 2026-06-10 cs.CL cs.AI cs.LG (updated version)

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite

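Affiliations: MIT CSAIL; MIT Harvard University; Google Research; Google DeepMind

AI summary: This study examines how scaling the parameter count of large language models affects their core capabilities and finds that reducing model size significantly degrades fact recall while having much less effect on in-context information processing.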

Journal ref: The Twelfth International Conference on Learning Representations (ICLR), 2024

Abstract

How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techniques (weight pruning and simply training a smaller or larger model, which we refer to as dense scaling) and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference. By curating a suite of tasks that help disentangle these two capabilities, we find a striking difference in how these two abilities evolve due to scaling. Reducing the model size by more than 30% (via either scaling approach) significantly decreases the ability to recall facts seen in pre-training. Yet, a 60-70% reduction largely preserves the various ways the model can process in-context information, ranging from retrieving answers from a long context to learning parameterized functions from in-context exemplars. The fact that both dense scaling and weight pruning exhibit this behavior suggests that scaling model size has an inherently disparate effect on fact recall and in-context learning.

2212.04930 2026-06-10 eess.AS cs.HC cs.LG cs.SD (updated version)

DDSupport: Language Learning Support System that Displays Differences and Distances from Model Speech

Kazuki Kawamura, Jun Rekimoto

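Affiliations: The University of Tokyo, Tokyo, Japan; Sony CSL Kyoto, Kyoto, Japan

AI summary: This paper proposes DDSupport, a system that computes pronunciation scores and detects mispronunciations from a small amount of unannotated speech data, and intuitively displays the differences and distances between the learner's pronunciation and that of a group of model speakers, helping non-native speakers improve the intelligibility of their spoken English.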

DOI: 10.1109/ICMLA55696.2022.00051
Journal ref: 2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)

Abstract

When beginners learn to speak a non-native language, it is difficult for them to judge for themselves whether they are speaking well. Therefore, computer-assisted pronunciation training systems are used to detect learner mispronunciations. These systems typically compare the user's speech with that of a specific native speaker as a model in units of rhythm, phonemes, or words and calculate the differences. However, they require extensive speech data with detailed annotations or can only compare with one specific native speaker. To overcome these problems, we propose a new language learning support system that calculates speech scores and detects mispronunciations by beginners based on a small amount of unannotated speech data without comparison to a specific person. The proposed system uses deep learning-based speech processing to display the pronunciation score of the learner's speech and the difference/distance between the learner's and a group of models' pronunciation in an intuitively visual manner. Learners can gradually improve their pronunciation by eliminating differences and shortening the distance from the model until they become sufficiently proficient. Furthermore, since the pronunciation score and difference/distance are not calculated compared to specific sentences of a particular model, users are free to study the sentences they wish to study. We also built an application to help non-native speakers learn English and confirmed that it can improve users' speech intelligibility.

2209.14328 2026-06-10 quant-ph cond-mat.quant-gas cond-mat.str-el cs.LG (updated version)

Scalably learning quantum many-body Hamiltonians from dynamical data

Frederik Wilde, Augustine Kshetrimayum, Ingo Roth, Dominik Hangleiter, Ryan Sweke, Jens Eisert

Affiliations: Dahlem Center for Complex Quantum Systems; Quantum Research Centre, Technology Innovation Institute (TII); Joint Center for Quantum Information and Computer Science (QuICS), University of Maryland & NIST; Joint Quantum Institute (JQI), University of Maryland & NIST; Fraunhofer Heinrich Hertz Institute

AI summary: This paper proposes a scalable, data-driven method that combines gradient-based optimization with tensor networks to learn families of interacting many-body Hamiltonians from dynamical data; for the one-dimensional Heisenberg model, the error is constant in the system size and scales as the inverse square root of the data set size.

Comments 11 pages, 5 figures

DOI: 10.1088/2058-9565/ae6fe3
Journal ref: Quantum Sci. Technol. 11, 035002 (2026)

Abstract

The physics of a closed quantum mechanical system is governed by its Hamiltonian. However, in most practical situations, this Hamiltonian is not precisely known, and ultimately all there is are data obtained from measurements on the system. In this work, we introduce a highly scalable, data-driven approach to learning families of interacting many-body Hamiltonians from dynamical data, by bringing together techniques from gradient-based optimization from machine learning with efficient quantum state representations in terms of tensor networks. Our approach is highly practical, experimentally friendly, and intrinsically scalable to allow for system sizes of above 100 spins. In particular, we demonstrate on synthetic data that the algorithm works even if one is restricted to one simple initial state, a small number of single-qubit observables, and time evolution up to relatively short times. For the concrete example of the one-dimensional Heisenberg model our algorithm exhibits an error constant in the system size and scaling as the inverse square root of the size of the data set.

1. Deep learning architectures and training methods (45 papers)

Time Series as Language: A Universal Tokenizer for General-Purpose Time Series Foundation Models

Blurry Window Attention

Rotate2Think: Geometric Priming via Orthogonal Rotation to Improve Language Model Reasoning

SinkRec: Mitigating Semantic State Sink in Long Sequence Recommendation with Memory-Conditioned Gated Delta Networks

LongMoE: Longitudinal Multimodal Learning via Trajectory-Aware Mixture-of-Experts

SPDM: Geometry-Modulated State Space Modeling with Manifold Constraints for Time Series Forecasting

Forward-Only Convolutional Neural Networks with Learnable Channel-Class Assignment

Hasse Diagrams for Attention: A Partial Order Framework for Designing Transformer Masks

Does Normalization Choice Matter for Causal Large Time-Series Models?

Structured Adaptive Tensor Prediction for Streaming Data

Nonlinear Estimator: Dual Bayesian Affine Estimators for Parameter Learning

Learning Entropy and Spatial Adaptation Dynamics of Multilayer Perceptrons for Structural Point Extraction

Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning

LLM-Guided Neural Architecture Search for Robust Co-Design of Physical Neural Networks

Beyond Absolute Imitation: Anchored Residual Guidance for Privileged On-Policy Distillation

FOGO: Forgetting-aware Orthogonalization Optimizer

Parallel Causal Associative Fields: Gated Sparse Memory for Long-Context Language Modeling

Embedding Hybrid Systems into Continuous Latent Vector Fields

PL-KKT-hPINN: Enforcing Nonlinear Equality Constraints on Neural Networks via Piecewise-Linear Projection

Recoverable but Not Stationary: Local Linear Structures in Weights and Activations

Unifying Local Communications and Local Updates for LLM Pretraining

Predicting Future Behaviors in Reasoning Models Enables Better Steering

A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design

Gradient-Guided Reward Optimization for Inference-time Alignment

SD-GRPO: Verifiable Segment Decomposition for Long-Form Vision-Language Generation

Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents Where a Lean Retrieved Context Beats the Full History

Compile Once, Differentiate Everywhere: A Differentiable Meta-Circular Interpreter

Effective Training Principles of Physical Reservoirs

PADD: Path-Aligned Decompression Distillation for Non-Router Teacher to Guide MoE Student Learning

The Role of Feedback Alignment in Self-Distillation

Mixtures of Neural Operators Reduce Active Complexity in Operator Learning

AdaGC: Enhancing LLM Pretraining Stability via Adaptive Gradient Clipping

CITRAS: Covariate-Informed Transformer for Time Series Forecasting

ATLAS: Verifier-Guided Adaptive Latent Activation Steering for Efficient LLM Reasoning

MemCast: Memory-Driven Time Series Forecasting with Experience-Conditioned Reasoning

PRISM: Parallel Residual Iterative Sequence Model

Communication Dynamics Neural Networks: FFT-Diagonalized Layers for Improved Hessian Conditioning at Reduced Parameter Count

Falcon-X: A Time Series Foundation Model for Heterogeneous Multivariate Modeling

When Do Attention Circuits Form? Developmental Trajectories of Capability and Attention-Sink Emergence Across Three 1B-Class Architectures

Data-Constrained Language Model Pretraining: Improved Regularization and Scaling Laws

Enhancing AI Interpretability and Safety through Localised Architectures

Reformulate LLM Reinforcement Learning for Efficient Training under Black-box Discrepancy

Interpretable deep convolutional model for nonlinear multivariate time series in complex systems

Energy-Regularized Spatial Masking: A Novel Approach to Enhancing Robustness and Interpretability in Vision Models

Updating the standard neuron model in artificial neural networks

2. Representation learning, self-supervised and contrastive learning (14 papers)

SynIB: Informational Bottleneck for Maximizing Synergy in Multimodal Learning

Interactions Between Crosscoder Features: A Compact Proofs Perspective

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

Unsupervised Style Representation Learning for AI-Text Detection via Paraphrase Inversion

Revisiting Positive Samples in Graph Contrastive Learning: From the Perspective of Message Passing

ERAlign: Energy-based Representation Alignment of GNNs and LLMs on Text-attributed Graphs

Closing the Modality Gap in Zero-Shot HAR: Contrastive Training and Separability-Optimized Prototypes on IMU Data

Learning Doubly Sparse Explicitly Conditioned Transforms

Using Probabilistic Programs to Train Inductive Reasoning in Large Language Models

Tractogram foundation model

Representational Alignment with Chemical Induced Fit for Molecular Relational Learning

Post-Training Augmentation Invariance

In Defense of Cosine Similarity: Normalization Eliminates the Gauge Freedom

Principles and Practice of Deep Representation Learning: or a Mathematical Theory of Memory

3. Reinforcement learning and sequential decision-making (37 papers)

TD-Grokking: Learning from Zero-Reward Problems by Training-Time Decomposition

SocraticPO: Policy Optimization via Interactive Guidance

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

3SPO: State-Score-Supervised Policy Optimization for LLM Agents

Discovering Interpretable Multi-Parameter Control Policies for Evolutionary Algorithms Using Deep Reinforcement Learning

SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration

Baseline-Free Policy Optimization for Neural Combinatorial Optimization

Mitigating Bias in Low-SNR Financial Reinforcement Learning via Quantum Representations

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output

Geometry-Aware Reinforcement Learning for 2D Irregular Nesting

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning

How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs

Divide and Cooperate: Role-Decomposed Multi-Agent LLM Training with Cross-Agent Learning Signals

Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication

N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization

MODIP: Efficient Model-Based Optimization for Diffusion Policies

Geometrically Averaged Hard Target Updates for Linear Q-Learning

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning