arXivDaily arXiv每日学术速递 周一至周五更新
重置

1. 深度学习架构与训练方法 45 篇

2606.09861 2026-06-10 cs.LG cs.AI 新提交

Time Series as Language: A Universal Tokenizer for General-Purpose Time Series Foundation Models

时间序列作为语言:通用时间序列基础模型的通用分词器

Yunhao Zhang, Ruiying Qi, Jiale Zheng, Jianfeng Zhang, Lujia Pan, Junchi Yan

AI总结 提出UniTok通用分词器将时间序列转化为离散令牌,并基于NTP预训练UniTok-FM基础模型,支持零样本预测、提示增强预测以及少样本生成和分类,无需任务特定修改。

详情
AI中文摘要

虽然下一个令牌预测(NTP)统一了LLM的预训练,但其对无界、连续时间序列(TS)的适应仍然是一个开放问题。为了弥合这一差距,我们引入了UniTok,一个将TS转化为离散令牌的通用分词器,以及UniTok-FM,一个在这些令牌上通过NTP预训练的基础模型。UniTok-FM是一个通用基础模型,支持零样本和提示增强的预测,以及通过无训练上下文推理进行的少样本生成和分类——这是先前工作未能实现的能力。在技术上,UniTok是一个向量量化自编码器,结合了前缀归一化以实现尺度稳定、渐进分辨率因果架构用于编码和解码,以及结构保持重建损失用于训练。UniTok-FM采用现成的LLM架构,无需针对TS的特定修改。它不是在孤立的TS上预训练,而是在由多个具有相似模式的序列形成的上下文窗口上执行NTP,旨在捕捉它们的共享动态。在预测、生成和分类上的实验表明,单个统一的UniTok-FM始终优于统计和监督基线,与任务特定的基础模型性能相当,并且独特地实现了跨任务的无训练上下文推理。

英文摘要

While Next-Token Prediction (NTP) has unified LLM pretraining, its adaptation to unbounded, continuous time series (TS) remains open. To bridge the gap, we introduce UniTok, a universal tokenizer that transforms TS into discrete tokens, and UniTok-FM, a foundation model pretrained via NTP on these tokens. UniTok-FM is a general-purpose foundation model that supports zero-shot and prompt-boosted forecasting, as well as few-shot generation and classification via training-free in-context inference--a capability not achieved by prior works. Technically, UniTok is a vector-quantized autoencoder incorporating prefix normalization for scale stabilization, a progressive-resolution causal architecture for encoding and decoding, and a structure-preserving reconstruction loss for training. UniTok-FM adopts an off-the-shelf LLM architecture without TS-specific modifications. Instead of pretraining on isolated TS, it performs NTP on context windows formed by multiple series with similar patterns, aiming to capture their shared dynamics. Experiments on forecasting, generation, and classification show that a single unified UniTok-FM consistently outperforms statistical and supervised baselines, achieves competitive performance with task-specific foundation models, and uniquely enables training-free in-context inference across tasks.

2606.09862 2026-06-10 cs.LG cs.AI 新提交

Blurry Window Attention

模糊窗口注意力

Axel Laborieux, Christos Sourmpis, Juan Gabriel Kostelec, Qinghai Guo

AI总结 提出模糊窗口注意力(BLA),一种基于Dirichlet核插值重构模糊KV历史的有界记忆控制方法,在合成任务中状态效率比滑动窗口注意力高8倍,且随状态增大性能提升。

详情
AI中文摘要

Transformer语言模型中的Softmax注意力操作在序列长度上具有二次复杂度,且状态大小以KV缓存形式增长,这成为长上下文场景中的瓶颈。为克服此限制,引入了具有线性复杂度和有限状态大小的替代架构,如状态空间模型(SSM)、线性注意力(LA)和有界记忆控制注意力(ABC)。尽管线性模型在语言困惑度上与Transformer相当,但在需要检索或回忆特定信息的任务中仍落后。本文提出模糊窗口注意力(BLA),一种受SSM启发的新型ABC方法。BLA存储一个频率窗口,通过使用Dirichlet核进行插值从中重建模糊的KV历史。根据Dirichlet核的分辨率,BLA可理解为滑动窗口注意力(SWA)的泛化,或门控槽注意力(GSA)的特例,其中衰减因子由Dirichlet核实现。我们详细描述了BLA的理论和高效实现。在多查询关联回忆(MQAR)合成任务上,我们表明BLA的状态效率比SWA高8倍,且与流行的线性注意力模型竞争;在RegBench合成任务中,在我们测试的线性模型中,只有BLA和SWA随着状态大小增长而提升性能。

英文摘要

The Softmax Attention operation in Transformer language models has a quadratic complexity in the sequence length and a growing state size in the form of KV cache, which becomes a bottleneck in long context scenarios. To overcome this limitation, alternative architectures with linear complexity and finite state size have been introduced, such as State-Space Models (SSMs), Linear Attention (LA), and Attention with Bounded-memory Control (ABC). Though linear models achieve similar language perplexity as Transformers, they are still behind in tasks which require retrieval or recall of specific information. In this work, we introduce Blurry Window Attention (BLA) a novel ABC method inspired by SSMs. BLA stores a frequency window from which a blurry KV history is reconstructed via interpolation using Dirichlet kernels. BLA can be understood as a generalization of Sliding Window Attention (SWA) depending on the Dirichlet kernels resolution or as a special case of the Gated Slot Attention (GSA), where the decay factor is implemented with Dirichlet kernels. We describe in details the theory and efficient implementation of BLA. On the Multi-Query Associate Recall (MQAR) synthetic task, we show that the state efficiency of BLA is 8$\times$ better than SWA and is competitive with popular linear attention models, and in the RegBench synthetic task, only BLA and SWA improve their performance as the state size grows among the linear models we tested.

2606.09873 2026-06-10 cs.LG cs.AI 新提交

Rotate2Think: Geometric Priming via Orthogonal Rotation to Improve Language Model Reasoning

Rotate2Think:通过正交旋转进行几何提示以提升语言模型推理能力

Aditya Sharma, Christopher J. Pal, Amal Zouaq

AI总结 发现推理模型的输入嵌入与思考嵌入存在高锥度且方向非共线,提出无训练方法Rotate2Think,通过正交Procrustes分析估计旋转并注入合成思考向量,在30/32配置中提升数学、科学和代码任务准确率。

详情
AI中文摘要

推理模型通过生成显式的中间推理轨迹再给出最终答案,在挑战性任务上取得了强劲表现。然而,推理过程中表示空间的内部结构仍知之甚少:模型的隐藏表示在思考时与输入提示的嵌入有何不同?这种结构能否被利用以在推理时激发更强的推理能力?我们表明,输入嵌入和思考嵌入(分别对提示和推理轨迹的最后一层隐藏状态进行均值池化)都表现出极高的锥度,所有向量紧密聚集在单一平均方向周围。关键的是,这些平均输入方向和思考方向是非共线的,思考嵌入在嵌入空间中占据了几何上不同的区域,这在许多不同模型和基准任务中均成立。这一观察启发我们将输入到思考的转换视为一个旋转问题,该问题可通过正交Procrustes分析得到闭式解。我们提出Rotate2Think,一种无需训练的方法,从少量正确求解的示例中估计该旋转,并在推理时将生成的合成思考向量注入思考分隔符之间,在推理轨迹开始时提供几何提示。在多个基准和模型家族上的评估表明,Rotate2Think在数学、科学和代码任务的32个模型-基准配置中改进了30个的准确率,并零样本泛化到MATH-Vision上的多模态推理。

英文摘要

Reasoning models achieve strong performance on challenging tasks by generating explicit intermediate reasoning traces before producing a final answer. Yet the internal structure of representation space when reasoning remains poorly understood: how do a model's hidden representations differ during thinking versus the embeddings of the input prompt, and can this structure be exploited to elicit stronger reasoning at inference time? We show that both input embeddings and thinking embeddings (mean-pooled last-layer hidden states over the prompt and reasoning trace, respectively) exhibit extremely high conicity, with all vectors clustering tightly around a single mean direction. Crucially, these mean input and thinking directions are non-collinear, with thinking embeddings occupying a geometrically distinct region of embedding space across many different models and benchmark tasks. This observation motivates casting the input-to-thinking transition as a rotation problem admitting a closed-form solution via orthogonal Procrustes analysis. We propose Rotate2Think, a training-free method that estimates this rotation from a small set of correctly solved examples and injects the resulting synthetic thinking vector between thinking delimiters at inference time, providing a geometric primer at the onset of the reasoning trace. Evaluated across multiple benchmarks and model families, Rotate2Think improves accuracy in 30 of 32 model-benchmark configurations across mathematics, science, and code tasks, and generalizes zero-shot to multimodal reasoning on MATH-Vision.

2606.09888 2026-06-10 cs.LG 新提交

SinkRec: Mitigating Semantic State Sink in Long Sequence Recommendation with Memory-Conditioned Gated Delta Networks

SinkRec: 通过记忆条件门控Delta网络缓解长序列推荐中的语义状态沉没

Zhuang Zhuang, Zhipeng Wei, Ji Dai, Jie Chen, Fei Pan, Peng Jiang, Kun Gai

发表机构 * Kuaishou Technology, Beijing, China(快手科技,北京,中国) Beijing University of Posts and Telecommunications, Beijing, China(北京邮电大学,北京,中国) Independent Researcher(独立研究员)

AI总结 针对线性注意力在长序列推荐中因重复行为模式导致的语义状态沉没问题,提出SinkRec混合记忆-循环架构,通过残差向量量化外部化重复模式,并设计TDGD模块净化读写过程,在保持线性时间效率的同时提升推荐效果。

详情
AI中文摘要

线性注意力通过避免标准Transformer的二次成本,为长序列推荐提供了高效的骨干网络,但其压缩的循环状态可能被重复行为模式主导。我们将此现象识别为语义状态沉没,即重复语义过度占据循环状态并偏置后续读出。为了缓解语义状态沉没,我们提出SinkRec,一种混合记忆-循环循环架构,将协同行为模式存储与动态转换建模解耦。SinkRec通过残差向量量化将重复局部模式外部化到可学习的条件记忆中,重新注入检索到的编码,并将记忆键值对暴露给注意力块。它进一步引入了时域感知状态关系差分门控DeltaNet(TDGD),通过抑制记忆覆盖的更新和移除记忆对齐的读出响应,利用记忆净化循环写入和读取。该设计将重复语义从状态竞争信号转变为记忆可检索模式,使循环状态专注于动态转换,并以线性时间效率缓解语义状态沉没。在公共和工业数据集上的实验证明了SinkRec的有效性和效率。

英文摘要

Linear attention provides an efficient backbone for long-sequence recommendation by avoiding the quadratic cost of standard Transformers, but its compressed recurrent state can be dominated by repetitive behavior patterns. We identify this phenomenon as semantic state sink, where recurring semantics over-occupy the recurrent state and bias subsequent readouts. To mitigate semantic state sink, we propose SinkRec, a hybrid memory-transition looped architecture that decouples collaborative behavioral pattern storage from dynamic transition modeling. SinkRec externalizes recurring local patterns into a learnable conditional memory through residual vector quantization, reinjects the retrieved codes, and exposes memory key-value pairs to the attention block. It further introduces Temporal-Aware State-Relation Differential Gated DeltaNet (TDGD), which uses memory to purify recurrent writing and reading by suppressing memory-covered updates and removing memory-aligned readout responses. This design turns recurring semantics from state-competing signals into memory-retrievable patterns, allowing the recurrent state to focus on dynamic transitions and alleviating semantic state sink with linear-time efficiency. Experiments on public and industrial datasets demonstrate the effectiveness and efficiency of SinkRec.

2606.09907 2026-06-10 cs.LG cs.AI 新提交

LongMoE: Longitudinal Multimodal Learning via Trajectory-Aware Mixture-of-Experts

LongMoE:基于轨迹感知的混合专家模型的纵向多模态学习

Maxx Richard Rahman, Prakhar Kumar, Wolfgang Maass

AI总结 提出LongMoE框架,通过上下文感知插补、注意力标记化、轨迹感知编码和稀疏MoE路由,联合解决临床多模态学习中模态缺失和纵向动态两大挑战,在ADNI等数据集上验证了鲁棒性。

详情
AI中文摘要

多模态临床学习对于整合包括影像、文本和个性化健康记录在内的多样化患者数据日益重要。然而,它面临两个基本挑战:i) 模态缺失,即在一次患者就诊中任意子集的模态不可用;ii) 纵向动态,即观察结果的诊断意义取决于患者随时间演变的疾病轨迹。现有方法孤立地处理这些挑战:缺失模态框架将每次就诊视为独立的静态快照并丢弃时间上下文,而纵向模型通常假设模态完全可用并在系统性模态不完整时性能下降。我们提出LongMoE(纵向混合专家模型),这是一个统一框架,用于联合解决这两个挑战。LongMoE结合了上下文感知插补模块和注意力标记化模块,后者捕获不规则就诊序列中的频域时间模式,以及用于建模疾病进展的轨迹感知编码器和用于患者特定专家选择的上下文条件稀疏MoE路由。在ADNI、OASIS-3和MIMIC-IV上的实验表明,LongMoE在缺失或弱共时模态下提高了鲁棒性,并在全模态设置中保持竞争力,为纵向感知的多模态临床学习奠定了坚实基础。

英文摘要

Multimodal clinical learning is increasingly important for integrating diverse patient data, including imaging, text, and personalised health records. However, it faces two fundamental challenges: i) modality missingness, where arbitrary subsets of modalities are unavailable at a given patient visit, ii) longitudinal dynamics, where the diagnostic significance of an observation depends on the patient's evolving disease trajectory over time. Existing methods address these challenges in isolation: missing-modality frameworks treat each visit as an independent static snapshot and discard temporal context, while longitudinal models often assume complete modality availability and degrade under systematic modality incompleteness. We propose LongMoE (Longitudinal Mixture-of-Experts), the unified framework to jointly address both challenges. LongMoE combines a context-aware imputation module with an attentional tokenization module that captures frequency-domain temporal patterns across irregular visit sequences, a trajectory-aware encoder for modeling disease progression, and context-conditioned Sparse MoE routing for patient-specific expert selection. Experiments on ADNI, OASIS-3, and MIMIC-IV show that LongMoE improves robustness under missing or weak contemporaneous modalities and remains competitive in full-modality settings, establishing a strong foundation for longitudinally-aware multimodal clinical learning.

2606.09917 2026-06-10 cs.LG 新提交

SPDM: Geometry-Modulated State Space Modeling with Manifold Constraints for Time Series Forecasting

SPDM: 基于流形约束的几何调制状态空间建模用于时间序列预测

Xingsheng Chen, Siu-Ming Yiu

AI总结 提出SPDM,一种将对称正定流形约束引入状态空间模型的几何感知架构,通过流形轨迹和几何门控机制调制选择性扫描,在保持线性复杂度同时提升多变量时间序列预测精度。

详情
AI中文摘要

多变量时间序列预测需要捕捉交互变量间持续演化的相关结构。现有状态空间模型通过扫描标记化的时间或空间序列来处理时间序列,忽略了演化的几何结构。我们通过将流形约束引入状态空间建模来解决这一局限性:将跨变量相关结构视为对称正定流形上的连续轨迹,其黎曼几何特征、切空间线性度和弗雷歇均值中心性作为原则性的几何正则化器,引导并稳定SSM的选择性扫描动态。我们提出SPDM,一种几何感知的SSM架构,通过两种协作机制实现这一原则:一个流形轨迹路径,将动态演化的协方差矩阵从SPD流形投影到欧几里得切空间;以及一个几何门控方案,基于从流形轨迹导出的几何信号直接调制SSM的内部选择性参数。该参数化在嵌入丰富结构约束的同时保持了Mamba并行扫描的线性时间复杂度,使架构同时保持预测精度和计算效率。在11个真实世界基准数据集上的广泛实验建立了最先进的预测性能,进一步研究证实几何约束的状态空间动态是其性能提升背后的主导架构因素。

英文摘要

Multivariate time series forecasting requires capturing the continuously evolving correlation structure among interacting variables. Existing state-space models process time series by scanning tokenized temporal or spatial sequences, discarding the evolutionary geometric structure. We address this limitation by introducing manifold constraints into state-space modeling: treating the cross-variable correlation structure as a continuous trajectory on the symmetric positive definite manifold, whose Riemannian geometric features, tangent space linearity, and Frechet mean centrality act as a principled geometric regularizer that guides and stabilizes the selective scanning dynamics of SSMs. We propose SPDM, a geometry-aware SSM architecture that realizes this principle through two cooperating mechanisms: a manifold trajectory path that projects dynamically evolving covariance matrices from the SPD manifold to a Euclidean tangent space, and a geometric gating scheme that directly modulates SSM's internal selective parameters based on geometric signals derived from the manifold trajectory. The parameterization preserves the linear-time complexity of the Mamba parallel scan while embedding rich structural constraints, making the architecture preserve prediction accuracy and computational efficiency simultaneously. Extensive experiments on eleven real-world benchmark datasets establish state-of-the-art forecasting performance, and further studies confirm that geometrically constrained state-space dynamics are the dominant architectural factor behind its performance gains.

2606.09928 2026-06-10 cs.LG cs.AI 新提交

Forward-Only Convolutional Neural Networks with Learnable Channel-Class Assignment

具有可学习通道-类别分配的前向传播卷积神经网络

Mohammadnavid Ghader, Saeed Reza Kheradpisheh, Bahar Farahani, Mahmood Fazlali

AI总结 提出可学习的通道-类别分配机制,结合熵和正交正则化,以及基于验证性能的损失感知层贡献策略,在残差CNN上实现前向传播学习,在CIFAR-10/100和Tiny-ImageNet上达到FF模型最佳性能,缩小与反向传播的差距。

详情
AI中文摘要

前向-前向(FF)算法通过用局部的前向目标替代基于梯度的信用分配,提供了一种受生物学启发的反向传播替代方案。虽然最近的扩展已将FF适应到卷积神经网络(CNN),但现有公式依赖于静态的通道-类别分区,并且在复杂任务中难以有效执行。在这项工作中,我们引入了一种可学习的通道-类别分配机制,实现了卷积通道的自适应、数据驱动特化,并辅以熵和正交正则化以提升学习性能。我们进一步提出了一种损失感知的层贡献策略,该策略根据中间层的验证性能自适应地加权其预测,从而增强前向推理的有效性。集成到残差CNN中,所提出的方法在CIFAR-10、CIFAR-100和Tiny-ImageNet上相比现有的类似前向方法持续实现了更优的性能。值得注意的是,它在基于FF的模型中建立了新的最先进性能,显著缩小了与反向传播的差距。这些发现表明,引入可学习的通道特化和层贡献加权显著增强了深度CNN中前向学习的表示能力。

英文摘要

The Forward-Forward (FF) algorithm offers a biologically inspired alternative to backpropagation by replacing gradient-based credit assignment with local, forward-only objectives. While recent extensions have adapted FF to convolutional neural networks (CNNs), existing formulations rely on static channel-class partitions and struggle to perform effectively in complex tasks. In this work, we introduce a learnable channel-class assignment mechanism that enables adaptive, data-driven specialization of convolutional channels, supported by entropy and orthogonality regularization to promote learning performance. We further propose a loss-aware layer contribution strategy that adaptively weights intermediate-layer predictions based on their validation performance, enhancing the effectiveness of forward-only inference. Integrated into residual CNNs, the proposed method achieves consistently superior performance across CIFAR-10, CIFAR-100, and Tiny-ImageNet compared to existing similar forward-only methods. Notably, it establishes new state-of-the-art performance among FF-based models, substantially narrowing the gap with backpropagation. These findings demonstrate that introducing learnable channel specialization and layer contribution weighting significantly enhances the representational capacity of forward-only learning in deep CNNs.

2606.09951 2026-06-10 cs.LG 新提交

Hasse Diagrams for Attention: A Partial Order Framework for Designing Transformer Masks

注意力的哈斯图:设计Transformer掩码的偏序框架

Chentao Li, Han Guo

AI总结 本文提出一个理论框架,证明多层Transformer的信息流收敛到哈斯图,并将并行训练任务设计转化为求哈斯图最小公共超图问题,由此导出两种新注意力掩码。

Comments 21 pages, 9 figures. Theoretical framework for attention mask design; no experiments included

详情
AI中文摘要

在大型Transformer模型的训练过程中,注意力掩码控制序列中信息流的范围和方向。存在多种掩码变体,诸如FlexAttention之类的算子已经支持任意注意力掩码。然而,对于任意掩码所引起的信息流结构,一直缺乏系统的形式化分析。本文开发了一个完整的理论框架。我们证明,在足够深度下,多层Transformer的信息流收敛到一个哈斯图——表示偏序的有向无环图。在此基础上,我们将并行训练任务的设计重新表述为寻找哈斯图的最小公共超图的问题,并建立了最小公共超图的判定准则。这产生了一种直接从任务族推导注意力掩码的构造性方法。应用该框架,我们设计了两种新颖的掩码:一种确保训练-推理一致性的块生成注意力掩码(块双流注意力),以及一种全监督双向注意力掩码(蝴蝶注意力)。这些结果证明了该框架发现新结构的能力。

英文摘要

During the training of large Transformer models, attention masks regulate the scope and direction of information flow across a sequence. Numerous mask variants exist, and operators such as FlexAttention already support arbitrary attention masks. Nevertheless, a systematic formal analysis of the information-flow structure induced by arbitrary masks has been missing. This paper develops a complete theoretical framework. We prove that, with sufficient depth, the information flow of a multi-layer Transformer converges to a Hasse diagram -- a directed acyclic graph representing a partial order. Building on this, we recast the design of parallel training tasks as the problem of finding a minimal common supergraph of Hasse diagrams, and we establish a criterion for the minimal common supergraph. This yields a constructive method to derive attention masks directly from a family of tasks. Applying the framework, we design two novel masks: a block-generation attention mask that ensures training-inference consistency (Block Two-Stream Attention), and a fully supervised bidirectional attention mask (Butterfly Attention). These results demonstrate the framework's capacity to discover new structures.

2606.09954 2026-06-10 cs.LG cs.AI 新提交

Does Normalization Choice Matter for Causal Large Time-Series Models?

归一化选择对因果大规模时间序列模型重要吗?

Samy-Melwan Vilhes, Gilles Gasso, Mokhtar Z Alaya

AI总结 研究因果大规模时间序列模型中不同归一化策略对训练收敛和预测性能的影响,发现归一化选择显著影响模型效果。

详情
Journal ref
ICLR 2026 Workshop: Time Series in the Age of Large Models, Apr 2026, Rio De Janeiro, Brazil
AI中文摘要

用于时间序列预测的大规模模型已成为在异构信号集合上训练模型的有前景的范式。这些模型通常依赖于因果自回归架构,其中每个观测值根据过去依次预测。在实践中,真实世界的时间序列表现出非平稳性,这显著影响预测性能。为了缓解这一问题,通常采用归一化。然而,在高效的因果设置中,归一化可能在训练期间导致来自未来观测的信息泄漏。最近提出的替代方案,包括因果归一化和从初始观测计算的统计量,旨在解决这一问题,但其实际影响仍未被充分理解。在这项工作中,我们评估了基于Transformer的大规模时间序列模型(采用分块和高效因果策略训练)的归一化策略。我们展示了归一化选择显著影响训练收敛和预测性能。

英文摘要

Large models for time-series forecasting have been emerged as a promising paradigm for training models on heterogeneous collections of signals. These models typically rely on causal autoregressive architectures, where each observation is sequentially predicted from past. In practice, real-world time-series exhibit non-stationarities, which significantly influence predictive performance. To mitigate this, normalization is commonly employed. However, in efficient causal settings it might induce information leakage from future observations during training. Recent alternatives, including causal normalization and statistics computed from initial observations, have been proposed to address this issue, but their practical implications remain insufficiently understood. In this work, we evaluate normalization strategies for transformer-based large time-series models trained with patching and efficient causal strategy. We showcase that normalization choice significantly influences both training convergence and forecasting performance.

2606.10085 2026-06-10 cs.LG eess.SP math.OC 新提交

Structured Adaptive Tensor Prediction for Streaming Data

流式数据的结构化自适应张量预测

Zhen Qin, Yang Chen

AI总结 针对矩阵值时间序列的流式预测,提出自适应张量回归框架,包含矩阵-矩阵和张量-矩阵两种形式,并开发在线SGD算法,张量-矩阵模型在稳态误差和去噪方面更优,同时建立了低维结构下的恢复保证。

详情
AI中文摘要

矩阵值时间序列出现在广泛的应用中,例如来自医学成像和地球物理学的时空数据。现有方法主要针对静态环境设计,缺乏对流式和时变环境的适应性。自适应滤波技术也大多局限于标量或向量值数据,使得矩阵值时间序列的自适应预测理解不足。为弥补这些差距,我们开发了一个自适应张量回归框架,包括矩阵-矩阵(MoM)和张量-矩阵(ToM)两种形式,用于流式矩阵值预测。这两种形式的区别在于是否直接建模矩阵值输出,或通过高阶张量表示利用时间结构。针对所提出的张量回归框架,我们开发了用于在线学习的随机梯度下降(SGD)算法。我们表明,将多个响应随时间堆叠成高阶张量可以提高性能;特别是,ToM比MoM实现了更低的稳态误差和更强的去噪能力,这促使我们关注ToM模型。我们进一步刻画了SGD在时变动态下的跟踪行为。从统计角度,我们建立了ToM在一般低维结构(包括稀疏性、低秩性及其联合稀疏低秩模型)下的固定时间恢复保证。

英文摘要

Matrix-valued time series arise in a wide range of applications, such as spatio-temporal data from medical imaging and geophysics. Existing methods are mainly designed for static settings and lack adaptability to streaming and time-varying environments. Adaptive filtering techniques have also been largely limited to data with scalar or vector values, leaving adaptive forecasting for matrix-valued time series inadequately understood. To bridge these gaps, we develop an adaptive tensor regression framework that includes Matrix-on-Matrix (MoM) and Tensor-on-Matrix (ToM) formulations for streaming matrix-valued prediction. The two formulations differ in whether to directly model matrix-valued outputs or to exploit temporal structure via higher-order tensor representations. For the proposed tensor regression framework, we develop stochastic gradient descent (SGD) algorithms for online learning. We show that stacking multiple responses across time into higher-order tensors improves performance; in particular, the ToM achieves lower steady-state error and stronger denoising capability than MoM, motivating our focus on the ToM model. We further characterize the tracking behavior of SGD under time-varying dynamics. From a statistical perspective, we establish fixed-time recovery guarantees for ToM under general low-dimensional structures, including sparsity, low-rankness, and their joint sparselow-rank models.

2606.10111 2026-06-10 cs.LG cs.SY eess.SY stat.ML 新提交

Nonlinear Estimator: Dual Bayesian Affine Estimators for Parameter Learning

非线性估计器:用于参数学习的双贝叶斯仿射估计器

Sasan Vakili, Daniël Woonings, Pradyumna Paruchuri, Peyman Mohajerin Esfahani

AI总结 提出一种用于Wiener型状态空间模型的非线性参数估计器,通过固定点架构耦合两个仿射最小均方误差估计器,分别估计未知参数和潜在变量,并开发两种双估计器框架,实验表明双状态-参数估计器在参数均方误差上优于其他方法。

Comments 32 pages, 9 figures

详情
AI中文摘要

本文提出一种用于Wiener型状态空间模型的非线性参数估计器,该估计器采用固定点架构,耦合两个仿射最小均方误差(MMSE)估计器:一个用于未知参数,另一个用于潜在变量。该架构保留了最优仿射MMSE参数估计器的功能结构,同时引入了动态基统计(DBS)估计,以总结非线性基函数评估。开发了两种DBS构建策略,从而产生两种非线性估计器框架。双基-参数估计器将仿射基估计器与仿射参数估计器相结合,而双状态-参数估计器首先计算仿射状态估计及其协方差,然后通过高斯DBS算子映射这些状态估计统计量以获得DBS估计。两种双估计器都采用固定点表征,交替估计每个分量,使用另一个分量的更新先验(该先验来自前一次迭代中该分量的插件估计统计量)。通过广泛的蒙特卡洛实验检验了所提方法的有效性,结果表明双基-参数估计器获得的参数均方误差与纯仿射参数估计器相当,而双状态-参数估计器实现了最低的参数均方误差,优于双基-参数估计器、纯仿射参数估计器以及经典粒子吉布斯和期望最大化方案的顺序蒙特卡洛变体。

英文摘要

This paper presents a nonlinear parameter estimator for Wiener-type state-space models obtained as a fixed-point architecture that couples two affine minimum mean-squared error (MMSE) estimators: one for the unknown parameters and one for latent variables. The architecture retains the functional structure of the optimal affine MMSE parameter estimator while incorporating Dynamic Basis Statistics (DBS) estimates that summarize nonlinear basis-function evaluations. Two DBS construction strategies are developed, leading to two nonlinear estimator frameworks. The dual basis-parameter estimator combines an affine basis estimator with the affine parameter estimator, whereas the dual state-parameter estimator first computes affine state estimates and their covariances, then maps these state-estimate statistics through a Gaussian DBS operator to obtain DBS estimates. Both dual estimators admit fixed-point characterizations that alternate between estimating each component using the updated prior of the other, obtained from that component's plug-in estimate statistics from the previous iteration. The efficacy of the proposed methods is examined via extensive Monte Carlo experiments, showing that the dual basis-parameter estimator attains parameter mean-squared errors comparable to those of the purely affine parameter estimator, while the dual state-parameter estimator achieves the lowest parameter mean-squared error, outperforming both the dual basis-parameter and purely affine parameter estimators, as well as sequential Monte Carlo variants of classical Particle Gibbs and Expectation-Maximization schemes.

2606.10170 2026-06-10 cs.LG 新提交

Learning Entropy and Spatial Adaptation Dynamics of Multilayer Perceptrons for Structural Point Extraction

多层感知机的学习熵与空间自适应动力学用于结构点提取

Jan Glaser, Ivo Bukovsky, Marcel Jirina

AI总结 提出空间学习熵(SLEM)方法,通过分析MLP在图像样本学习中的权重自适应,识别对网络学习重要的图像点与区域,为特征提取提供新视角。

详情
AI中文摘要

本文将学习熵(LE)的概念从时间自适应系统扩展到应用于图像数据的多层感知机网络(MLP)中的空间学习。与局部邻域方法直接从梯度或协方差算子评估图像结构不同,所提方法通过学习熵分析学习过程本身。训练MLP从周围空间上下文预测中心像素的强度,同时从跨图像样本的学习过程中神经权重的增量自适应评估LE。生成的空间学习熵图(SLEM)识别出引起神经网络强烈自适应的异常图像点和区域,这些点在网络学习过程中具有重要作用。结果表明,空间学习熵通过突出对网络学习特别有信息量的空间位置,为传统特征提取和可解释性方法提供了补充视角。空间学习熵根据学习影响而非局部结构属性识别图像点和区域,为传统特征提取和可解释性方法提供了补充视角。所提框架可能为计算机视觉、制造和机器人学中的学习驱动图像或场景分析开辟新方向。

英文摘要

This paper extends the concept of Learning Entropy (LE) from temporal adaptive systems to spatial learning in multilayer perceptron networks (MLPs) applied to image data. Instead of evaluating image structure directly from gradients or covariance operators, as local neighborhood methods do, the proposed approach analyzes the learning process itself through Learning Entropy. An MLP is trained to predict the intensity of a center pixel from its surrounding spatial context, while LE is evaluated from the incremental adaptation of neural weights during learning across image-derived samples. The resulting Spatial Learning Entropy Maps (SLEM) identify unusual image points and regions that induce strong adaptation of the neural network and therefore have an important role in the learning process. The results indicate that spatial Learning Entropy provides a complementary perspective to conventional feature extraction and explainability methods by highlighting spatial locations that are particularly informative for network learning. Spatial Learning Entropy provides a complementary perspective to conventional feature extraction and explainability methods by identifying image points and regions according to their learning impact rather than their local structural properties. The proposed framework may open new directions for learning-driven image or scene analysis in computer vision, manufacturing, and robotics.

2606.10184 2026-06-10 cs.LG cs.AI 新提交

Dropout-GRPO: Variational Stochasticity for Continuous Latent Reasoning

Dropout-GRPO: 用于连续潜在推理的变分随机性

Wooil Jung

AI总结 针对GRPO在连续潜在推理模型中因确定性轨迹导致优势为零的问题,提出通过结构化Dropout引入随机性,使GRPO能优化贝叶斯模型平均策略,在GSM8K上提升Coconut基线准确率。

详情
AI中文摘要

组相对策略优化(GRPO)依赖于每组内$K$次rollout的多样性;否则,组平均优势$A^{(k)} = r^{(k)} - \mu_r$会坍缩为零。这对像Coconut这样的潜在推理模型构成了结构性挑战,该模型循环地馈送连续隐藏状态以替代离散的思维链token。由于给定参数和提示后潜在阶段本质上是确定性的,多次rollout会产生相同的轨迹,阻碍GRPO的进展。因此,将组相对强化学习应用于连续潜在推理被证明是困难的。为解决此问题,我们提出通过结构化dropout来获取必要的随机性。通过在给定rollout的所有潜在递归步骤中应用一个保持不变的单一伯努利掩码,我们生成必要的轨迹方差。这个共享掩码有效地将每个rollout视为来自参数变分分布的后验样本,使GRPO能够优化贝叶斯模型平均策略的期望奖励。我们为该方法提供了理论证明——包括无偏性、方差减少以及潜在梯度的良定义性——以及实证验证。在GSM8K上,dropout-GRPO将Coconut基线从$27.29\%$提升至$29.01\%$的pass@1,证明了GRPO学习在潜在推理模型中的可行性。我们的工作将此定位为一种实用且理论基础的潜在推理LLM后训练方法。

英文摘要

Group Relative Policy Optimization (GRPO) relies on the diversity of $K$ rollouts within each group; otherwise, the group-mean advantage $A^{(k)} = r^{(k)} - μ_r$ collapses to zero. This presents a structural challenge for latent-reasoning models like Coconut, which feed continuous hidden states recurrently in place of discrete chain-of-thought tokens. Because the latent phase is inherently deterministic given the parameters and prompt, multiple rollouts produce identical trajectories, stalling GRPO's progress. Consequently, applying group-relative reinforcement learning to continuous latent reasoning has proven difficult. To address this, we propose sourcing the necessary stochasticity through structured dropout. By applying a single Bernoulli mask held constant across all latent recurrence steps for a given rollout, we generate essential trajectory variance. This shared mask effectively treats each rollout as a posterior sample from a variational distribution over parameters, allowing GRPO to optimize the expected reward of a Bayesian model-average policy. We provide both theoretical justification for this method -- including unbiasedness, variance reduction, and the well-definedness of the latent gradient -- and empirical validation. On GSM8K, dropout-GRPO improves a Coconut baseline from $27.29\%$ to $29.01\%$ pass@1, demonstrating the viability of GRPO learning for latent-reasoning models. Our work positions this as a practical, theoretically grounded approach for post-training latent-reasoning LLMs.

2606.10294 2026-06-10 cs.LG cs.AI cs.AR cs.NE physics.comp-ph 新提交

LLM-Guided Neural Architecture Search for Robust Co-Design of Physical Neural Networks

LLM引导的神经架构搜索用于物理神经网络的鲁棒协同设计

Tyler King, Timothee Leleu

发表机构 * University of California, Berkeley(加州大学伯克利分校) DeepMind(深度思维)

AI总结 提出UH-NAS框架,利用大语言模型作为进化算子,协同优化任务准确率和推理能耗,实现跨硬件平台的公平比较,在光学MZI硬件上发现更鲁棒的架构。

详情
AI中文摘要

在非常规硬件上部署神经网络需要同时优化任务准确率和平台特定约束(如能耗成本、物理非理想性和数值精度)的架构。现有的神经架构搜索(NAS)方法通常针对单一硬件系列,限制了跨平台比较和泛化。我们提出了非常规硬件神经架构搜索(UH-NAS),一个硬件无关的、LLM引导的NAS框架,它集成语言模型作为进化算子来协同优化准确率和推理能耗。通过将硬件暴露为可交换的后端,并配备每平台能耗模型、物理约束和非理想性模拟器,UH-NAS能够在无需修改搜索算法的情况下,对各种后端进行公平的系统级比较。在光学MZI硬件上测试,UH-NAS发现了比传统基线更多样化、更鲁棒的架构,同时优于现有的LLM-to-NAS方法。对非理想性下架构鲁棒性以及系统提示作用的进一步消融研究,凸显了架构-硬件协同设计对新兴计算平台的重要性。

英文摘要

Deploying neural networks on unconventional hardware demands architectures that co-optimize task accuracy and platform-specific constraints such as energy cost, physical non-idealities, and numerical precision. Existing neural architecture search (NAS) methods are typically tailored to a single hardware family, limiting cross-platform comparison and generalization. We introduce Unconventional Hardware Neural Architecture Search (UH-NAS), a hardware-agnostic, LLM-guided NAS framework that integrates language models as evolutionary operators to co-optimize accuracy and inference energy. By exposing hardware as a swappable backend with per-platform energy models, physical constraints, and non-ideality simulators, UH-NAS enables fair system-level comparisons across various backends without modifying the search algorithm. Tested on optical MZI hardware, UH-NAS discovers more diverse, robust architectures than conventional baselines while outperforming existing LLM-to-NAS approaches. Additional ablations on architecture robustness under non-idealities and the role of system prompts highlight the importance of architecture-hardware co-design for emerging computing platforms.

2606.10385 2026-06-10 cs.LG cs.AI 新提交

Beyond Absolute Imitation: Anchored Residual Guidance for Privileged On-Policy Distillation

超越绝对模仿:基于锚定残差引导的特权在线蒸馏

Wenhao Zhang

发表机构 * South China University of Technology(华南理工大学)

AI总结 提出锚定残差在线蒸馏(AR-OPD),通过部分特权教师建立局部兼容锚点并注入受控残差,解决特权在线蒸馏中后见偏差导致的局部不可达问题,在推理任务上平均提升2.3个点。

Comments 17 pages, 8 figures. Project page: https://vanhowe.github.io/AR-OPD/

详情
AI中文摘要

在线蒸馏(OPD)通过将学生模型与教师在其自身轨迹上的预测分布对齐,在增强LLM复杂推理方面展现出显著的实证收益。一种新兴变体——特权OPD,通过使用增强特权信息(如oracle轨迹)的自教师模型进一步强化该范式,以缓解师生能力差距,同时提供密集的、答案导向的监督。然而,当前方法将特权信息视为一个整体的模仿目标,未能将局部可达的推理步骤与未来条件的oracle信号分离。因此,学生被鼓励去匹配一个事后偏差分布,该分布通常落在其局部预测支持之外。这种可达性不匹配激励学生模型跳过有效的中间推理,转而采用局部不支持的捷径。为解决此问题,我们引入锚定残差在线蒸馏(AR-OPD),一种解耦特权监督的双视角框架。AR-OPD不强制执行严格的全局模仿,而是使用部分特权教师建立局部兼容锚点,将oracle预见性隔离并作为受控残差注入,以提供目标导向的引导。在多种推理任务上,AR-OPD比完全特权OPD高出2.3个点,比SFT高出7.9个点。关键的是,这种锚定残差机制将事后泄漏减少了21.7%,并缓解了后期漂移,在超过768个token的挑战性长程轨迹上取得了高达7.2个点的优势。

英文摘要

On-policy distillation (OPD) has demonstrated strong empirical gains in enhancing complex reasoning in LLMs by aligning a student model with a teacher's predictive distribution over the student's own trajectories. An emerging variant, Privileged OPD, further strengthens this paradigm by employing a self-teacher model augmented with privileged information, such as oracle traces, to mitigate teacher-student capacity gaps while providing dense, answer-directed supervision. However, current methods treat privileged information as a monolithic imitation target, failing to disentangle locally reachable reasoning steps from future-conditioned oracle signals. Consequently, the student is encouraged to match a hindsight-biased distribution that often falls outside its local predictive support. This reachability mismatch incentivizes the student model to skip valid intermediate reasoning in favor of locally unsupported shortcuts. To resolve this, we introduce Anchored Residual On-Policy Distillation (AR-OPD), a dual-view framework that disentangles privileged supervision. Rather than enforcing strict full-view imitation, AR-OPD establishes a locally compatible anchor using a partially privileged teacher, isolating and injecting oracle foresight as a controlled residual to provide destination-directed guidance. Across diverse reasoning tasks, AR-OPD outperforms full privileged OPD by 2.3 points and SFT by 7.9 points. Crucially, this anchored residual mechanism reduces hindsight leakage by 21.7% and mitigates late-stage drift, yielding up to a 7.2-point advantage on challenging long-horizon trajectories exceeding 768 tokens.

2606.10406 2026-06-10 cs.LG cs.AI 新提交

FOGO: Forgetting-aware Orthogonalization Optimizer

FOGO:遗忘感知正交化优化器

Toan Nguyen, Yang Liu, Trung Le, Celso de Melo, Flora D. Salim

发表机构 * School of Computer Science and Engineering, University of New South Wales(新南威尔士大学计算机科学与工程学院) Department of Data Science & AI, Monash University(莫纳什大学数据科学与人工智能系) DEVCOM Army Research Laboratory(DEVCOM陆军研究实验室)

AI总结 提出FOGO优化器,通过谱正交化动量更新并利用紧凑码本记忆解决梯度干扰,在类别不平衡、持续学习和大模型微调等场景中提升收敛与知识保留。

详情
AI中文摘要

我们认为遗忘不仅局限于持续学习,而是一种普遍的优化现象:在标准训练过程中,主导的小批量梯度抑制了罕见但有用的更新方向,导致每一步的短期遗忘。当这些知识从未被重新访问时,这些损失会累积成长期遗忘——持续学习的经典失败模式。我们引入了FOGO,一种可扩展的优化器,能够持续检测并解决两种场景下的梯度干扰。FOGO对动量更新进行谱正交化,以防止主导方向垄断优化,然后将代表性的过去方向存储在基于随机投影的紧凑码本记忆中,其中成对距离在低维空间中得到可证明的保留。在每一步中,当前更新与存储方向之间的冲突通过轻量级正交校正解决,并通过近端步骤提升回来,开销极小且无需存储数据。在类别不平衡分类、领域和类别变化下的持续视觉学习、LLaVA-7B的持续微调以及GPT-2预训练中,FOGO持续改善收敛和知识保留,优于Adam和Muon。

英文摘要

We argue that forgetting is not confined to continual learning but is a general optimization phenomenon: during standard training, dominant mini-batch gradients suppress rare but useful update directions, causing short-term forgetting at every step. When such knowledge is never revisited, these losses compound into long-term forgetting-the classical failure mode of continual learning. We introduce FOGO, a scalable optimizer that continuously detects and resolves gradient interference across both regimes. FOGO spectrally orthogonalizes momentum updates to prevent dominant directions from monopolizing optimization, then stores representative past directions in a compact codebook memory built on random projection, where pairwise distances are provably preserved in low-dimensional space. At each step, conflicts between the current update and stored directions are resolved via lightweight orthogonal correction and lifted back through a proximal step, with minimal overhead and no data storage. Across class-imbalanced classification, continual visual learning under domain and class shifts, continual fine-tuning of LLaVA-7B, and GPT-2 pretraining, FOGO consistently improves convergence and knowledge retention, outperforming Adam and Muon.

2606.10435 2026-06-10 cs.LG cs.CL 新提交

Parallel Causal Associative Fields: Gated Sparse Memory for Long-Context Language Modeling

并行因果关联域:用于长上下文语言建模的门控稀疏记忆

Muhammad Ahmed

发表机构 * Independent Researcher(独立研究员)

AI总结 提出并行因果关联域(PCAF),通过哈希桶存储局部记录、检索候选集形成稀疏缓存,并与参数化语言模型门控混合,实现稀疏长上下文访问,避免固定状态瓶颈。

Comments 17 pages, 5 figures, and 6 tables. Experiments on WikiText-103, PG-19, and WikiText-2 using TPU v4-32 and NVIDIA RTX 3060 hardware. Code: https://github.com/ahmed123hds/PCAF

详情
AI中文摘要

Transformer通过提供直接的token间通信路径实现了强大的语言建模性能,但因果自注意力的计算量随上下文长度呈二次方增长。循环模型和状态空间模型降低了这一成本,但将历史压缩为顺序更新的固定大小状态。本文研究了第三种原语:基于因果后继记录的并行内容寻址记忆。所提出的并行因果关联域(PCAF)将上下文窗口中的局部记录写入哈希桶,为当前查询检索有界的候选集,在后继token上形成稀疏缓存分布,并通过学习到的门将该缓存与参数化局部语言模型混合。所得模型在避免单一固定循环状态瓶颈的同时,保持了稀疏的长上下文访问。我们在WikiText-103和PG-19上使用分布式Google Cloud TPU v4-32 pod对PCAF进行了完全自回归预训练。在303M参数和上下文长度T=2048的情况下,PCAF-semantic在WikiText-103上达到36.31困惑度,在PG-19上达到52.45困惑度,而匹配的密集Transformer分别为47.49和53.84。PCAF-semantic在TPU pod上同时处理0.61-0.62M token/s,而密集和局部注意力基线为0.43M token/s。支持41M参数的多种子扫描和单GPU组件消融实验表明,关联缓存、检索容量和学习到的门对速度-质量权衡有实质性影响。

英文摘要

Transformers achieve strong language modeling performance by providing direct token-to-token communication paths, but causal self-attention scales quadratically with context length. Recurrent and state-space models reduce this cost, yet compress history into sequentially updated fixed-size states. This paper studies a third primitive: a parallel content-addressed memory over causal successor records. The proposed Parallel Causal Associative Field (PCAF) writes local records from a context window into hash buckets, retrieves a bounded candidate set for the current query, forms a sparse cache distribution over successor tokens, and mixes that cache with a parametric local language model through a learned gate. The resulting model maintains sparse long-context access while avoiding a single fixed recurrent state bottleneck. We evaluate PCAF under full autoregressive pretraining on WikiText-103 and PG-19 using a distributed Google Cloud TPU v4-32 pod. At 303M parameters and context length T = 2048, PCAF-semantic reaches 36.31 perplexity on WikiText-103 and 52.45 perplexity on PG-19, compared with 47.49 and 53.84 for a matched dense Transformer. PCAF-semantic simultaneously processes 0.61-0.62M tokens/s across the TPU pod, versus 0.43M tokens/s for dense and local attention baselines. Supporting 41M-parameter multi-seed sweeps and single-GPU component ablations show that the associative cache, retrieval capacity, and learned gate materially affect the speed-quality trade-off.

2606.10596 2026-06-10 cs.LG cs.AI cs.SY eess.SY 新提交

Embedding Hybrid Systems into Continuous Latent Vector Fields

将混合系统嵌入连续潜在向量场

Sangli Teng, Hang Liu, Koushil Sreenath

AI总结 证明当m>2n时,n维混合系统可嵌入m维欧氏空间中的连续向量场,并基于此提出一种潜在神经ODE方法,从时间序列数据中准确恢复混合系统流,优于现有方法。

Comments Accepted to ICML 2026

详情
AI中文摘要

这项工作证明了当$m>2n$时,一个$n$维混合系统可以嵌入到一个$m$维欧氏空间中,并在其嵌入图像上配备一个连续向量场。这一结果表明,一个本质上不连续的混合系统通常允许一个连续的 extrinsic 表示,该表示对于可微优化是适定的。基于这一存在性定理,我们表明,在潜在空间和状态空间中都具有一致性损失的潜在神经ODE可以准确恢复混合系统的流。大量实验表明,所提出的方法在仅从时间序列数据学习具有不同几何形状的混合系统方面优于现有方法。

英文摘要

This work proves that an $n$-dimensional hybrid system can be embedded into an $m$-dimensional Euclidean space equipped with a continuous vector field on its embedded image whenever $m>2n$. This result suggests that an intrinsically discontinuous hybrid system generically admits a continuous extrinsic representation that is well-posed for differentiable optimization. Building on this existence theorem, we show that a latent Neural ODE with consistency loss in both the latent and state space can accurately recover the flow of hybrid systems. Extensive experiments suggest the proposed method outperforms the existing method in learning hybrid systems with varying geometries from only time series data.

2606.10682 2026-06-10 cs.LG 新提交

PL-KKT-hPINN: Enforcing Nonlinear Equality Constraints on Neural Networks via Piecewise-Linear Projection

PL-KKT-hPINN:通过分段线性投影在神经网络上强制非线性等式约束

Fateme Mohammad Mohammadi, Hector Budman, Joshua L. Pulsipher

发表机构 * Department of Chemical Engineering, University of Waterloo(滑铁卢大学化学工程系)

AI总结 提出PL-KKT-hPINN框架,通过分段线性投影严格强制非线性等式约束,在CSTR案例中保持预测精度的同时大幅降低约束违反,并在小样本下提升鲁棒性。

详情
AI中文摘要

尽管物理信息神经网络(PINN)在过程建模中显示出强大潜力,但物理方程仅在训练期间作为软约束强制执行,因此无法保证推理时的约束满足。我们提出一个称为分段线性Karush--Kuhn--Tucker硬约束PINN(PL-KKT-hPINN)的框架,通过分段线性投影严格强制非线性等式约束。这扩展了KKT-hPINN框架,后者通过Karush--Kuhn--Tucker(KKT)条件精确强制线性等式,该条件与将神经网络输出正交投影到约束可行域相关。该方法在连续搅拌釜反应器(CSTR)案例研究中进行了单输入和双输入情况的演示。结果表明,PL-KKT-hPINN保持了与标准神经网络相当的预测精度,同时实现了显著更低的约束违反。此外,所提出的模型在低数据情况下显示出改进的鲁棒性,在有限的训练样本量下,其RMSE低于无约束神经网络。这些结果表明,PL-KKT-hPINN为非线性化学工程系统的代理建模提供了一种计算高效且物理一致的框架。

英文摘要

While physics-informed neural networks (PINNs) have shown strong potential for process modeling, physical equations are only enforced as soft constraints during training, and thus, they do not guarantee constraint satisfaction at inference. We propose a framework, called piecewise-linear Karush--Kuhn--Tucker hard-constrained PINNs (PL-KKT-hPINNs), that strictly enforces nonlinear equality constraints through piecewise-linear projection. This extends the KKT-hPINN framewor, which exactly enforces linear equalities through the Karush--Kuhn--Tucker (KKT) conditions associated with orthogonally projecting neural network outputs onto the constraint feasible region. The method is demonstrated on a continuous stirred-tank reactor (CSTR) case study for both one and two inputs. Results show that PL-KKT-hPINN preserves predictive accuracy comparable to that of a standard neural network while achieving substantially lower constraint violations. In addition, the proposed model shows improved robustness in low-data regimes, yielding lower RMSE than the unconstrained neural network for limited training sample sizes. These results demonstrate that PL-KKT-hPINN provides a computationally efficient and physically consistent framework for surrogate modeling of nonlinear chemical engineering systems.

2606.10929 2026-06-10 cs.LG cs.AI 新提交

Recoverable but Not Stationary:Local Linear Structures in Weights and Activations

可恢复但不稳定:权重和激活中的局部线性结构

Irina Piontkovskaia, Sergey Nikolenko

发表机构 * St. Petersburg Department of the Steklov Institute of Mathematics(斯捷克洛夫数学研究所圣彼得堡分所) St. Petersburg State University(圣彼得堡国立大学)

AI总结 研究神经网络中线性结构的存在性与尺度,发现局部低秩任务梯度结构,但固定任务平面假设不成立;首次恢复更新形成轨迹前缀基,捕获大部分恢复位移;提出随机搜索理论解释高维随机参数搜索有效性,并验证参数扰动与激活引导的关系。

Comments 23 pages, 8 tables, 9 figures

详情
AI中文摘要

任务向量、LoRA、激活引导和预训练权重周围的随机搜索都表明学习行为可以由线性方向控制。我们询问哪些线性结构实际存在以及它们处于什么尺度。在合成多任务Transformer和DistilGPT-2/GPT-2上的LoRA适配器中,我们发现强烈的局部低秩任务梯度结构,但拒绝了固定任务平面假设:静态基会错过恢复方向,有用的基在100步内显著漂移。然而,首次恢复更新形成了一个轨迹前缀基,捕获了LoRA恢复位移的77%。我们开发了随机搜索理论,结合高斯局部线性定理,证明了即使在非常高维的情况下随机参数搜索的有效性。我们还研究了参数扰动与激活引导之间的关系:单次梯度步产生的激活偏移与标记对比CAA引导向量的余弦为0.58,对Qwen-0.5B BoolQ陈述具有类似的引导效果。我们通过在合成Transformer和LLM上的实验验证了结果。我们的结果表明,训练网络中的线性结构不是全局任务方向,而是演化的局部几何结构,这些结构在参数和激活空间中部分持续存在。

英文摘要

Task vectors, LoRA, activation steering, and random search around pretrained weights all suggest that learned behaviour can be controlled by linear directions. We ask which linear structures actually exist and on what scale. In a synthetic multitask transformer and LoRA adapters on DistilGPT-2 / GPT-2 we find strong local low-rank task-gradient structure but reject the fixed-task-plane hypothesis: static bases miss the recovery direction, and the useful basis drifts substantially within 100 steps. However, the first recovery updates form a trajectory-prefix basis capturing 77% of the LoRA recovery displacement. We develop random search theory with a Gaussian local-linear theorem that justifies the effectiveness of random parameter search even in very high dimensions. We also study the relation between parameter perturbations and activation steering: a single gradient step produces an activation shift with 0.58 cosine to a labelled-contrast CAA steering vector, with a similar steering effect on Qwen-0.5B BoolQ statements. We validate our results with experiments on synthetic Transformers and LLMs. Our results suggest that linear structures in trained networks are not global task directions, but evolving local geometries that partially persist across parameter and activation spaces.

2606.11081 2026-06-10 cs.LG cs.AI 新提交

Unifying Local Communications and Local Updates for LLM Pretraining

统一大语言模型预训练中的本地通信与本地更新

Pietro Cagnasso, Eugene Belilovsky, Edouard Oyallon

发表机构 * Concordia University(康考迪亚大学) Mila CNRS, Sorbonne University(法国国家科学研究中心,索邦大学)

AI总结 提出GASLoC算法,通过去中心化训练框架统一本地通信与更新,在异构带宽下优于DiLoCo,支持自适应优化器和多本地步骤。

Comments 38 pages, 9 figures

详情
AI中文摘要

随着训练依赖于跨集群、数据中心和低带宽链接的分布式计算,通信高效的大语言模型预训练变得越来越重要。许多实用方法降低了通信频率,但仍依赖于同步All-Reduce操作,这些操作保持相同的模型状态并将进度与全局集体操作绑定。当带宽或工作速度异构时,这可能成为瓶颈。我们引入了GASLoC,一种新颖的去中心化预训练算法,它将通信加速的概念推广到最近流行的“外部优化器”,以允许一个实用的基于八卦的训练框架,该框架与自适应优化器兼容,允许本地优化器步骤,并可以利用稀疏随机对等通信。在多个标准LLM训练任务上的实验表明,GASLoC在单步每通信设置下,对于多种拓扑结构优于最先进的去中心化算法,并且与LLM设置中现有的去中心化方法不同,它在利用多个本地步骤时能够获得与DiLoCo竞争的性能。在异构带宽设置下,我们展示了GASLoC的优势,表明它可以显著优于DiLoCo。

英文摘要

Communication-efficient pre-training of LLMs is increasingly important as training draws on compute distributed across clusters, data centers, and lower-bandwidth links. Many practical methods reduce communication frequency but still rely on synchronous All-Reduce operations that maintain identical model states and tie progress to global collectives. This can become a bottleneck when bandwidth or worker speed is heterogeneous. We introduce GASLoC, a novel decentralized pre-training algorithm that generalizes the notion of communication acceleration to the recently popular "outer optimizer" to allow a practical gossip-based training framework that is compatible with adaptive optimizers, allows for local optimizer steps, and can utilize sparse randomized peer communication. Empirically, on a number of standard LLM training tasks, we demonstrate that GASLoC outperforms state-of-the-art decentralized algorithms in single step per communication setting for a number of topologies and, unlike existing decentralized methods in the LLM setting, it allows to obtain performance competitive with DiLoCo when utilizing multiple local steps. In the heterogeneous bandwidth setting we demonstrate the advantage of GASLoC showing that it can significantly outperform DiLoCo.

2606.11172 2026-06-10 cs.LG 新提交

Predicting Future Behaviors in Reasoning Models Enables Better Steering

推理模型中预测未来行为以实现更好的引导

Evgenii Kortukov, Piotr Komorowski, Florian Klein, Paula Engl, Gabriele Sarti, Seong Joon Oh, Sebastian Lapuschkin, Wojciech Samek

发表机构 * Fraunhofer HHI(弗劳恩霍夫海因里希·赫兹研究所) Northeastern University(东北大学) KAIST(韩国科学技术院)

AI总结 通过训练激活探针预测推理模型未来行为,提出未来探针控制生成(FPCG)方法,在多个评估中实现几乎无质量下降的引导。

详情
AI中文摘要

部署的大型推理模型(LRM)经常出现意外行为。测试时引导通过干预其隐藏表示来控制LRM输出,但可能降低输出质量。我们认为,先前的引导工作隐含地依赖于检测已生成文本中行为的内部特征。我们表明这些检测特征是未来行为结果的不良预测器,因此不是自然的干预目标。相反,我们训练激活探针从中间推理步骤预测未来行为可能性。这些探针以64%-91%的准确率预测最可能的行为,揭示了一种不同类型的内部预测特征。基于这些预测特征,我们引入了一种文本级引导方法,即未来探针控制生成(FPCG)。FPCG采样多个候选句子,并根据预测未来行为可能性的探针选择最佳句子。这使得引导几乎没有输出质量下降。FPCG还在激活引导失败的多个评估中实现了引导。这些结果表明,区分检测和预测特征能够实现对LRM行为更细致的控制。

英文摘要

Deployed large reasoning models (LRMs) often behave unexpectedly. Test-time steering controls LRM outputs by intervening on their hidden representations, but it can degrade output quality. We argue that prior steering work implicitly relies on internal features that detect behavior in already generated text. We show that these detection features are poor predictors of future behavioral outcomes, and thus not the natural intervention target. Instead, we train activation probes to predict future behavior likelihoods from intermediate reasoning steps. These probes predict the most likely behavior with 64%-91% accuracy, revealing a separate type of internal prediction features. Building on these prediction features, we introduce a text-level steering method, Future Probe Controlled Generation. FPCG samples multiple candidate sentences and chooses the best one according to a probe predicting the future behavior likelihood. This enables steering with almost no output quality degradation. FPCG also enables steering in several evaluations where activation steering fails. These results show that distinguishing detection and prediction features enables a more nuanced approach to controlling LRM behaviors.

2606.11189 2026-06-10 cs.LG cs.AI cs.CL 新提交

A Unifying Lens on Supervised Fine-Tuning Through Target Distribution Design

通过目标分布设计审视监督微调的统一视角

Tong Xie, Yuanhao Ban, Yunqi Hong, Sohyun An, Yihang Chen, Cho-Jui Hsieh

发表机构 * University of California, Los Angeles (UCLA)(加州大学洛杉矶分校) Arena

AI总结 本文重新解读监督微调为目标分布设计,提出Q-target框架,将监督分解为对观测token的依赖强度与替代token的概率分配,并基于此提出Target-SFT方法,在多个推理任务中优于现有方法。

详情
AI中文摘要

监督微调(SFT)通常最大化示范轨迹中每个token的似然。然而,观测到的token可能非唯一、有噪声或与模型先验不一致。严格拟合这种one-hot目标可能不是最优的,尤其是当预训练模型编码了丰富的知识先验时。在这项工作中,我们将SFT重新解释为目标分布设计:不仅研究损失目标,还分析损失驱动模型匹配的token级目标。我们引入Q-target框架,将SFT监督分解为两个明确的选择:(1) 对观测token的依赖强度,以及(2) 如何将剩余概率质量分配给替代token。这一视角将许多现有的SFT变体统一为目标分布Q的隐式选择。基于这一观点,我们提出Target-SFT,直接从期望的目标分布构建训练目标。该方法在十个推理数据集-模型设置中一致优于现有方法,展示了这种基于目标的方法的有效性。总体而言,我们的公式揭示了SFT训练更基本的设计原则,并为SFT目标开辟了更广阔的搜索空间。

英文摘要

Supervised fine-tuning (SFT) typically maximizes the likelihood of every token in a demonstrated trajectory. However, an observed token can be non-unique, noisy, or misaligned with the model prior. Strictly fitting toward this one-hot target may be suboptimal, especially when the pretrained model encodes a rich knowledge prior. In this work, we reinterpret SFT as target distribution design: instead of studying only the loss objective, we analyze the token-level target that the loss drives the model to match. We introduce the Q-target framework, which decomposes SFT supervision into two explicit choices: (1) how strongly to rely on the observed token, and (2) how to allocate the remaining probability mass over alternatives. This perspective unifies many existing SFT variants as implicit choices of the target distribution Q. Building on this view, we propose Target-SFT which constructs the training objective directly from the desired target distribution. This method consistently outperforms across the ten reasoning dataset-model settings evaluated, showing the effectiveness of this target-based approach. Overall, our formulation reveals a more fundamental design principle for SFT training and opens a broader search space for SFT objectives.

2606.09635 2026-06-10 cs.CL cs.LG 交叉投稿

Gradient-Guided Reward Optimization for Inference-time Alignment

梯度引导的推理时对齐奖励优化

Hankun Lin, Ruqi Zhang

发表机构 * Purdue University(普渡大学)

AI总结 提出梯度引导奖励优化(GGRO)方法,通过解码时注入梯度信号生成的引导令牌,在推理时微调生成轨迹,提升安全性、有用性和推理性能,并增强对奖励攻击的鲁棒性。

Comments Accepted to UAI 2026

详情
AI中文摘要

确保大型语言模型(LLMs)在分布漂移下的可靠性需要推理时自适应。虽然推理时对齐方法如Best-of-$N$和拒绝采样被广泛使用,但它们将任务视为采样密集的奖励引导搜索,导致两个关键限制:性能受限于基础模型的生成质量,以及对不完美奖励模型的依赖使其易受奖励攻击。为解决这些挑战,我们引入梯度引导奖励优化(GGRO),一种轻量级推理时方法,通过梯度引导在解码期间执行有针对性的最小干预。具体来说,GGRO监测令牌级熵以识别指示漂移或未对齐的高不确定性区域。一旦检测到,它通过注入使用现成奖励模型的梯度信号生成的引导令牌来响应,以引导生成轨迹而不仅仅是重新排序样本。实验表明,GGRO在安全性、有用性和推理基准上持续改进推理时对齐。它还提高了高质量响应的覆盖率和对奖励攻击的鲁棒性,且计算开销极小。代码可在https://github.com/lhk2004/GGRO获取。

英文摘要

Ensuring the reliability of Large Language Models (LLMs) under distribution drift requires inference-time adaptation. While inference-time alignment methods such as Best-of-$N$ and rejection sampling are widely used, they frame the task as a sampling-intensive, reward-guided search, leading to two key limitations: their performance is bounded by the base model's generation quality, and their reliance on imperfect reward models makes them vulnerable to reward hacking. To address these challenges, we introduce Gradient-Guided Reward Optimization (GGRO), a lightweight inference-time method that performs targeted, minimal intervention during decoding via gradient guidance. Specifically, GGRO monitors token-level entropy to identify high-uncertainty regions indicative of drift or misalignment. Upon detection, it responds by injecting nudging tokens, generated using gradient signals from an off-the-shelf reward model, to steer the generation trajectory rather than merely re-ranking samples. Experiments show that GGRO consistently improves inference-time alignment across safety, helpfulness, and reasoning benchmarks. It also increases coverage of high-quality responses and robustness to reward hacking, with minimal computational overhead. Code is available at https://github.com/lhk2004/GGRO.

2606.09871 2026-06-10 cs.CV cs.AI cs.LG 交叉投稿

SD-GRPO: Verifiable Segment Decomposition for Long-Form Vision-Language Generation

SD-GRPO:面向长格式视觉-语言生成的可验证片段分解

Hyunwoong Kim, Seongeun Lee, Hannah Yun, Junhyun Park, Jonggwon Park

AI总结 提出SD-GRPO方法,通过将长格式输出分解为片段并计算逐片段优势,解决GRPO在视觉-语言任务中粗粒度信用分配不足的问题,实验证明其在多种长格式生成任务中优于基线。

详情
AI中文摘要

群体相对策略优化(GRPO)及其变体最初为大型语言模型(LLM)开发,最近被应用于多模态LLM并取得了强劲结果。然而,它们基于单一标量优势的粗粒度整体信用分配在视觉-语言(VL)任务中拟合不足,这些任务的输出通常是基于语义丰富图像的长格式响应。为解决这一限制,我们利用了一种单标量公式丢弃的结构化信号:长格式VL输出的自然分段。具体地,我们提出片段分解GRPO(SD-GRPO),它对整个rollout组中可验证的逐片段奖励进行z归一化,生成一个逐片段优势向量以替代单一标量。我们在三个设置中评估SD-GRPO,涵盖受控和真实世界的长格式VL生成,按片段间语义纠缠程度递增组织。在从DOCCI构建的受控多面板密集字幕任务中(片段语义独立),SD-GRPO始终优于GRPO基线,且片段数量越多增益越大。扩展到从MultiChartQA构建的受控多图表长格式VQA任务,我们从理论和经验上证明,rollout级奖励存在随输出长度增加而加剧的跨片段信用错误归因。在MMSci数据集上的真实世界科学图表字幕任务中(子图字幕共享图表上下文),混合整体和逐片段奖励进一步提升了两者性能,表明当片段语义纠缠时,仅逐片段归一化是不够的。最后,通过将SD-GRPO集成到Dr. GRPO中,我们确认它可以以最小的实现开销应用于任何GRPO框架,以增强长格式VL生成。

英文摘要

Group Relative Policy Optimization (GRPO) and its variants, originally developed for Large Language Models (LLMs), have recently been applied to Multimodal LLMs and produced strong results. However, their coarse-grained holistic credit assignment from a single scalar advantage underfits vision-language (VL) tasks, where outputs are often long-form responses grounded in semantically rich images. To address this limitation, we exploit a structured signal that single-scalar formulations discard: the natural segmentation of long-form VL outputs. Concretely, we propose Segment-Decomposed GRPO (SD-GRPO), which z-normalizes verifiable per-segment rewards across the rollout group, yielding a vector of per-segment advantages in place of a single scalar. We evaluate SD-GRPO across three settings spanning controlled and real-world long-form VL generation, organized by increasing semantic entanglement across segments. On a controlled multi-panel dense-captioning task constructed from DOCCI, where segments are semantically independent, SD-GRPO consistently outperforms the GRPO baseline, with larger gains at higher segment counts. Extending to a controlled multi-chart long-form VQA task constructed from MultiChartQA, we show both theoretically and empirically that rollout-level rewards suffer from cross-segment credit misattribution that scales with output length. On a real-world scientific figure captioning task on the MMSci dataset, where subfigure captions share context across the figure, blending holistic and per-segment rewards further improves on both, suggesting per-segment normalization alone is insufficient when segments are semantically entangled. Finally, by integrating SD-GRPO into Dr. GRPO, we confirm that it can be applied to any GRPO framework with minimal implementation overhead to enhance long-form VL generation.

2606.09900 2026-06-10 cs.CL cs.AI cs.IR cs.LG 交叉投稿

Less Context, More Accuracy: A Bi-Temporal Memory Engine for LLM Agents Where a Lean Retrieved Context Beats the Full History

更少上下文,更高准确率:一种用于LLM Agent的双时间记忆引擎,其中精简检索上下文优于完整历史

Liuyin Wang

发表机构 * Independent Researcher(独立研究者)

AI总结 提出一种双时间记忆引擎Engram,通过混合读取路径从约9.6k token的检索片段中回答,在LongMemEval_S上达到83.6%准确率,比完整历史(79k token)高10.4个百分点,且无错误。

Comments 14 pages, 4 figures, 3 tables. Code, reproducible harness, and raw per-question logs: https://github.com/ly-wang19/engram

详情
AI中文摘要

长期记忆是LLM Agent缺失的一层:跨会话时它们会遗忘,而常见的解决方法——将整个历史重放到提示中——成本高、速度慢,且随着干扰物积累,准确性下降。大多数记忆系统在成本或延迟上胜出,但在准确性上仍不如完整上下文基线,且基准测试结果在不一致、不可复现的测试平台上报告,导致同一系统在不同来源上得分差异巨大。我们提出Engram,一种基于双时间数据模型的开源双过程记忆引擎。快速写入路径附加无损事件,无需LLM参与关键路径;异步路径提取原子(主体、谓词、客体)事实,构建双时间知识图谱,并解决矛盾,无需每个事实调用LLM——使事实失效而非删除,因此每个事实都有来源和继承链。混合读取路径融合密集、词汇、图谱和时效/显著性信号,应用时间点(“截至”)过滤器,并组装紧凑、带有来源标记的上下文。在完整的500个问题的LongMemEval_S上,由官方分类特定评判器评分,Engram的精简配置——从约9.6k token的检索片段回答,而非完整历史——得分为83.6%,而完整上下文为73.2%(+10.4个百分点,McNemar p < 10^-6),token数约为1/8(9.6k vs. 79k),且0/500错误。这种增益需要混合读取路径:仅事实会丢失召回率,而事实加检索片段则恢复细节。我们还贡献了一个中立的、仓库内的评估平台,内置官方评判器,并在每个表格中包含完整上下文基线,发布原始每问题日志,并记录了无声扭曲记忆基准的测量完整性陷阱(截断、自制评判器、完整历史泄露)。每个数字都附带复现命令。

英文摘要

Long-term memory is the missing layer for LLM agents: across sessions they forget, and the common workaround -- replaying the whole history into the prompt -- is expensive, slow, and, as distractors accumulate, less accurate. Most memory systems win on cost or latency but still lose to the full-context baseline on accuracy, and benchmark numbers are reported on inconsistent, non-reproducible harnesses, so one system appears at wildly different scores across sources. We present Engram, an open-source, dual-process memory engine on a bi-temporal data model. A fast write path appends lossless episodes with no LLM on the critical path; an asynchronous path extracts atomic (subject, predicate, object) facts, builds a bi-temporal knowledge graph, and resolves contradictions without an LLM call per fact -- invalidating, never deleting, so every fact keeps provenance and a supersession chain. A hybrid read path fuses dense, lexical, graph, and recency/salience signals, applies a point-in-time ("as-of") filter, and assembles a compact, provenance-tagged context. On the full 500-question LongMemEval_S, graded by the official category-specific judge, Engram's lean configuration -- answering from a ~9.6k-token retrieved slice, never the full history -- scores 83.6% vs. 73.2% for full-context (+10.4 points, McNemar p < 10^-6) at ~8x fewer tokens (9.6k vs. 79k), with 0/500 errored. The gain needs a hybrid read path: facts alone lose recall, while facts plus retrieved chunks recover detail. We also contribute a neutral, in-repo evaluation harness with the official judge baked in and the full-context baseline in every table, publish the raw per-question logs, and document the measurement-integrity pitfalls (truncation, home-grown judges, full-history leaks) that silently distort memory benchmarks. Every number ships with a command to reproduce it.

2606.09930 2026-06-10 cs.PL cs.LG cs.SC 交叉投稿

Compile Once, Differentiate Everywhere: A Differentiable Meta-Circular Interpreter

一次编译,处处微分:可微分元循环解释器

Lucas Sheneman

AI总结 提出一种将Scheme子集编译为可微分计算图的编译器,实现可微分元循环解释(DMCI),支持对包含闭包、递归和数据结构的程序进行反向模式自动微分,无需重新编译。

详情
AI中文摘要

程序执行与基于梯度的优化之间的界限长期以来限制了代码本身作为可学习科学模型的使用。我们提出一个编译器,将Scheme的自托管子集转换为用于自动微分后端的可微分计算图。由于该子集可以编译自身的求值器,这产生了可微分元循环解释(DMCI):一个编译后的Scheme解释器执行作为数据提供的程序,而反向模式自动微分将梯度传播到嵌入在这些程序中的连续常数。解释器只编译一次,因此新程序无需重新编译或自定义梯度机制即可继承可微性,同时保留闭包、递归和数据结构。我们证明通过编译解释器的梯度几乎处处正确,并表明它们在171个递归和高阶程序-种子对上与直接编译的数值精度匹配。然后,我们使用DMCI进行程序与参数联合搜索,其中大型语言模型提出Scheme程序,精确梯度通过单个冻结的解释器校准其连续参数。这实现了OpenEvolve风格的程序搜索,其中外部循环提出离散程序结构,DMCI提供每个候选程序连续参数的精确基于梯度的校准。在电池容量衰减数据上,该搜索恢复了膝盖状退化结构,并在更难的早期外推分割上改善了保留外推性能,优于手工基线,在后期分割上与之匹配。在高维厄尔尼诺反问题中,DMCI优化了基于解释的卡尔曼滤波器似然,而无梯度搜索失败。这些结果将符号回归和神经符号搜索从闭式表达式扩展到可执行、有状态的程序,使模型生成的代码可直接针对数据进行优化。

英文摘要

The boundary between program execution and gradient-based optimization has long limited the use of code itself as a learnable scientific model. We present a compiler that translates a self-hosting subset of Scheme into differentiable computation graphs for autograd backends. Because the subset can compile its own evaluator, this yields differentiable meta-circular interpretation (DMCI): a compiled Scheme interpreter executes programs supplied as data, while reverse-mode autodiff propagates gradients to continuous constants embedded in those programs. The interpreter is compiled once, so new programs inherit differentiability without recompilation or custom gradient machinery, while retaining closures, recursion, and data structures. We prove that gradients through the compiled interpreter are correct almost everywhere and show that they match direct compilation to numerical precision across 171 recursive and higher-order program-seed pairs. We then use DMCI for program-and-parameter co-search, where a large language model proposes Scheme programs and exact gradients calibrate their continuous parameters through a single frozen interpreter. This enables OpenEvolve-style program search in which an outer loop proposes discrete program structures and DMCI supplies exact gradient-based calibration of each candidate's continuous parameters. On battery capacity-fade data, the search recovers a knee-like degradation structure and improves held-out extrapolation over hand-crafted baselines on the harder early-extrapolation split, matching them on the later split. On a high-dimensional El Nino inverse problem, DMCI optimizes an interpreted Kalman-filter likelihood where gradient-free search fails. These results extend symbolic regression and neurosymbolic search from closed-form expressions to executable, stateful programs, making model-generated code directly optimizable against data.

2606.10130 2026-06-10 physics.optics cs.LG physics.data-an 交叉投稿

Effective Training Principles of Physical Reservoirs

物理储层的有效训练原则

Sobhi Saeed, Mehmet Müftüoglu, Glitta R. Cheeran, Juliane Heim, Bennet Fischer, Mario Chemnitz

发表机构 * Leibniz-Institute of Photonic Technology(莱比锡光电技术研究所) Friedrich Schiller University(弗里德里希-席勒大学)

AI总结 研究通过输出剪枝和正则化方法减轻物理储层计算中的过拟合与计算负担,比较多种方法并展示输出采样和正则化对性能的提升。

Comments 19 pages, 7 figures

详情
AI中文摘要

储层计算机受益于光学现象的固有复杂性,这些现象提供了丰富的、通常是非线性的动力学。然而,直接在储层输出上进行训练会使系统容易过拟合,并且在训练阶段计算效率低下。在这项工作中,我们研究了通过输出剪枝和正则化来减轻过拟合和减少计算开销的策略。我们比较了损失最小化搜索方法(Equal Search 和 Branch and Bound)与面向输出的统计过滤方法(Variance Filter)以及随机剪枝,突出了每种方法的优缺点以及明智的储层输出采样的整体重要性,特别是对于缩小的潜在空间。我们进一步证明,强制在整个输出频谱上选择读出可以提高性能,特别是对于非迭代方法。此外,我们检查了 L1 和 L2 正则化技术(LASSO 和岭回归),两者都显著提高了高度非线性任务(如 Spiral Benchmark)的性能。虽然我们的方法具有通用性,但结果是从一个非线性光纤极端学习机中获得的,并以其为例进行讨论。总的来说,这项研究深入分析了储层的隐藏层过滤机制和输出层训练,从而在物理储层计算系统中实现优化性能。

英文摘要

Reservoir computers benefit from the inherent complexity of optical phenomena, which provide rich, often nonlinear dynamics. However, training directly on the reservoir's output renders the system prone to overfitting and computationally inefficient during the training phase. In this work, we investigate strategies to mitigate overfitting and reduce computational overhead through output pruning and regularization. We compare loss-minimizing search methods (Equal Search and Branch and Bound) against an output-oriented statistical filtering approach (Variance Filter) and random pruning, highlighting advantages and disadvantages of each approach and the overall importance of informed reservoir output sampling, particularly for a shrinking latent space. We further demonstrate that enforcing readout selection across the full output spectrum improves performance, especially for non-iterative methods. Additionally, we examine L1 and L2 regularization techniques (LASSO and ridge regression), both of which significantly enhance performance on highly nonlinear tasks such as the Spiral Benchmark. While our methods are of general use, results are obtained from and discussed exemplarily for a nonlinear fiber-optical extreme learning machine. Overall, this study provides a deep analysis of the reservoirs' hidden-layer filtering mechanisms and the output-layer training, enabling optimized performance in physical reservoir computing systems.

2606.10369 2026-06-10 cs.CL cs.LG 交叉投稿

PADD: Path-Aligned Decompression Distillation for Non-Router Teacher to Guide MoE Student Learning

PADD: 面向非路由器教师指导MoE学生学习的路径对齐解压缩蒸馏

Xinyue Peng, Yi Qian, Jiaojiao Lin, Wenjian Shao, Yanming Liu

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出路径对齐解压缩蒸馏(PADD)框架,通过四阶段两阶段流程将密集教师知识蒸馏到混合专家(MoE)学生中,同时学习高质量路由策略,在数学推理任务上显著优于基线。

Comments published in ICML 2026

详情
AI中文摘要

随着大型语言模型(LLMs)持续扩展,在固定计算预算下增长模型容量变得越来越具有挑战性。我们提出路径对齐解压缩蒸馏(PADD),这是一个将知识从无显式路由的密集教师蒸馏到混合专家(MoE)学生中,同时学习高质量路由策略的框架。PADD将知识蒸馏组织为两个阶段的四个阶段:初始化阶段(阶段I)通过教师神经元聚类和学生专家预热在学生专家中构建多样功能,以及训练阶段(阶段II–IV)将在线自适应蒸馏、路径细化策略优化和奖励增强负载平衡集成在单一训练流程中。在数学推理基准上的实验表明,在相同推理成本下,PADD相比强基线取得了显著提升,且MoE学生能够匹配或超越其密集教师。实验还展示了有效的教师到学生知识蒸馏和稳定的路由行为。

英文摘要

As large language models (LLMs) continue to scale, it becomes increasingly challenging to grow model capacity under fixed computation budgets. We propose Path-Aligned Decompression Distillation (PADD), a framework for distilling knowledge from dense teachers without explicit routing into mixture-of-experts (MoE) students while learning high-quality routing policies. PADD organizes knowledge distillation into four stages in two phases: an initialization phase (Stage I) that builds diverse functionality in the student's experts through teacher neuron clustering and student-expert warmup, and a training phase (Stages II--IV) that integrates online adaptive distillation, path-refined policy optimization, and reward-augmented load balancing in a single training pipeline. Experiments on mathematical reasoning benchmarks demonstrate that PADD yields substantial gains over strong baselines at the same inference cost and that the MoE student can match or surpass its dense teacher. They also demonstrate effective teacher-to-student knowledge distillation and stable routing behavior.

2606.11173 2026-06-10 cs.AI cs.LG 交叉投稿

The Role of Feedback Alignment in Self-Distillation

反馈对齐在自蒸馏中的作用

Semih Kara, Oğuzhan Ersoy

发表机构 * Gensyn

AI总结 研究通过自蒸馏提升语言模型性能时,反馈与模型推理的结构对齐是关键因素,步级对齐批评比二元奖励或参考解更有效。

Comments Accepted to the ICML 2026 Workshop on RL from World Feedback (RLxF)

详情
AI中文摘要

在语言模型上附加额外上下文(例如对先前尝试的反馈)通常会改善其响应。自蒸馏训练模型在没有上下文时保留这种改进。该方法通过匹配模型在两种设置下的输出分布来工作:学生仅看到问题,而自教师还看到上下文。因此,模型学习的内容取决于自教师接收的上下文,然而上下文的设计在很大程度上尚未被探索。我们通过使用冻结的批评器训练求解器来研究自蒸馏的上下文设计。我们比较了三种条件:(i) 二元奖励(GRPO),(ii) 参考解,以及 (iii) 与求解器推理轨迹对齐的逐步批评。步级对齐批评带来了最大的增益,在Avg@12上比GRPO高出16.11分,比参考解条件化的自蒸馏高出5.27分。逐token优势分析揭示了原因:步级对齐反馈仅针对推理失败的token,保留正确行为不变。相比之下,条件化于参考解会迫使模型在每个token上改变其行为(即使是正确的步骤),因为替代推导在措辞和方法上不可避免地存在差异。这表明反馈与求解器推理之间的结构对齐是自蒸馏有效性的关键驱动因素。

英文摘要

Conditioning a language model on additional context, such as feedback on a previous attempt, typically improves its response. Self-distillation trains the model to retain this improvement when the context is not present. The method works by matching the model's output distribution under two settings: a student that sees only the question, and a self-teacher that also sees the context. What the model learns therefore depends on what context the self-teacher receives, yet the design of this context remains largely unexplored. We study context design for self-distillation by training a solver on feedback from a frozen critic. We compare three conditions: (i) a binary reward (GRPO), (ii) the reference solution, and (iii) a step-by-step critique aligned to the solver's reasoning trace. Step-aligned critique yields the largest gains, outperforming GRPO by 16.11 points and reference-solution-conditioned self-distillation by 5.27 points (Avg@12). Per-token advantage analysis reveals why: step-aligned feedback targets only the tokens where reasoning fails, leaving correct behavior intact. Conditioning on the reference solution, by contrast, pressures the model to change its behavior at every token (even correct steps) because an alternative derivation inevitably differs in phrasing and approach. This suggests that structural alignment between feedback and the solver's reasoning is a key driver of self-distillation effectiveness.

2404.09101 2026-06-10 cs.LG cs.AI cs.NA math.NA stat.ML 版本更新

Mixtures of Neural Operators Reduce Active Complexity in Operator Learning

神经算子混合体降低算子学习中的主动复杂度

Anastasis Kratsios, Takashi Furuya, Jose Antonio Lara Benitez, Matti Lassas, Maarten de Hoop

发表机构 * McMaster University and Vector Institute(麦斯特大学和向量研究所) Shimane University(岛根大学) Rice University(里士满大学) University of Helsinki(赫尔辛基大学)

AI总结 通过路由混合神经算子(MoNO)与固定单神经算子构造的比较,证明MoNO在主动专家规模上具有更优的深度、宽度和秩缩放,且对Lipschitz目标这些量以O(ε^{-1})为界。

详情
AI中文摘要

算子学习系统并非仅由总参数数量决定;对于一次查询,相关瓶颈可能是必须加载和评估的模型。我们通过路由混合神经算子(MoNO)与固定单神经算子构造之间的建设性比较,在紧致Sobolev子集上研究了经典神经算子的这一区别。该比较涉及相对于基线的专家主动复杂度,其中总存储大小和路由搜索分别考虑。MoNO将每个输入函数通过树路由到一个专家。我们的主要定理表明,在近似集上,每个具有有界输出Sobolev半径的标量一致连续非线性算子都存在一个MoNO近似,其主动专家具有比所分析的单神经算子构造更小的深度、宽度和秩缩放;对于Lipschitz目标,这些专家量以$\mathcal{O}(\varepsilon^{-1})$为界。该定理将局部化转化为主动专家大小、路由深度和专家数量的算子级核算。我们还证明了底层神经算子架构的定量通用近似定理,明确依赖于紧集直径和连续模。

英文摘要

Operator-learning systems are not governed solely by total parameter count; for one query, the relevant bottleneck can be the model that must be loaded and evaluated. We study this distinction for classical neural operators on compact Sobolev subsets through a constructive comparison between routed mixtures of neural operators (MoNOs) and a fixed single-neural-operator construction. The comparison concerns expert-active complexity relative to that baseline, with total stored size and routing search accounted separately. A MoNO routes each input function through a tree to one expert. Our main theorem shows that every scalar uniformly continuous nonlinear operator with bounded output Sobolev radius on the approximation set admits a MoNO approximation whose active expert has smaller depth, width, and rank scaling than the analyzed single-neural-operator construction; for Lipschitz targets these expert quantities are bounded by $\mathcal{O}(\varepsilon^{-1})$. The theorem turns localization into an operator-level accounting of active expert size, routing depth, and number of experts. We also prove a quantitative universal approximation theorem for the underlying neural-operator architecture, with explicit dependence on compact-set diameter and modulus of continuity.

2502.11034 2026-06-10 cs.LG 版本更新

AdaGC: Enhancing LLM Pretraining Stability via Adaptive Gradient Clipping

AdaGC: 通过自适应梯度裁剪增强LLM预训练稳定性

Guoxia Wang, Shuai Li, Congliang Chen, Jinle Zeng, Jiabin Yang, Dianhai Yu, Yanjun Ma, Li Shen

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出自适应逐张量梯度裁剪方法AdaGC,通过限制梯度范数相对于历史裁剪值的指数移动平均来消除损失尖峰,在Llama-2 7B等模型上实现零尖峰并提升下游准确率。

Comments Accept by ICML 2026

详情
AI中文摘要

损失尖峰仍然是大规模语言模型预训练中的一个持续障碍。虽然先前的研究试图通过调查单个因素来识别损失尖峰的根本原因,但我们观察到,在实践中,这种尖峰通常是由异质因素的汇合触发的。根据经验,损失尖峰可能源于数据异常值、硬件或瞬时计算故障、数值精度问题和超参数设置的组合。无论根本原因如何,这些尖峰表现为不稳定的优化器更新,因为异常梯度污染了第一和第二矩状态。在本文中,我们提出了一种基于梯度的原则性补救措施:AdaGC,一种自适应逐张量梯度裁剪方案,通过将梯度范数限制在它们历史裁剪值的张量级指数移动平均附近来减轻这种污染。AdaGC与优化器无关,引入可忽略的内存开销,并且与GlobalGC相比降低了通信成本,特别是在混合并行分布式训练中。在Llama-2 7B、Mixtral 8x1B和ERNIE 10B-A1.4B上的实验表明,AdaGC稳健地消除了训练不稳定性,一致地将所有模型的尖峰分数降至零,并且相对于GlobalGC分别将下游准确率提高了1.32%、1.27%和2.48%。此外,AdaGC与Muon和Lion等优化器无缝集成,一致地产生更高的平均准确率和零尖峰分数。代码可在以下网址获得:此https URL(参见Research/AdaGC)。

英文摘要

Loss spikes remain a persistent obstacle in large-scale language model pretraining. While previous research has attempted to identify the root cause of loss spikes by investigating individual factors, we observe that, in practice, such spikes are typically triggered by the confluence of heterogeneous factors. Empirically, loss spikes may arise from a combination of data outliers, hardware or transient computational faults, numerical precision issues, and hyperparameter settings. Regardless of the underlying cause, these spikes manifest as unstable optimizer updates, as abnormal gradients contaminate both first- and second-moment states. In this paper, we propose a principled gradient-centric remedy: AdaGC, an adaptive per-tensor gradient clipping scheme that mitigates such contamination by bounding gradient norms relative to a tensor-wise exponential moving average of their historical clipped values. AdaGC is optimizer-agnostic, introduces negligible memory overhead, and reduces communication costs compared to GlobalGC, particularly in hybrid-parallel distributed training. Experiments on Llama-2 7B, Mixtral 8x1B, and ERNIE 10B-A1.4B demonstrate that AdaGC robustly eliminates training instabilities, consistently reducing spike scores to zero for all models and improving downstream accuracy over GlobalGC by 1.32%, 1.27%, and 2.48%, respectively. Furthermore, AdaGC seamlessly integrates with optimizers such as Muon and Lion, consistently yielding higher average accuracy and zero spike scores. The code is available at https://github.com/PaddlePaddle/PaddleFleet (see Research/AdaGC).

2503.24007 2026-06-10 cs.LG cs.AI 版本更新

CITRAS: Covariate-Informed Transformer for Time Series Forecasting

CITRAS: 协变量感知的Transformer时间序列预测

Yosuke Yamaguchi, Issei Suemitsu, Wenpeng Wei

发表机构 * Research & Development Group, Hitachi, Ltd.(日立有限公司研发部)

AI总结 提出CITRAS,一种仅解码器Transformer,通过KV移位和注意力分数平滑机制灵活整合已知协变量的未来部分,并捕获局部和全局跨变量依赖,提升预测精度。

详情
Journal ref
IEEE Access, vol. 14, pp. 77983-77998, 2026
AI中文摘要

在时间序列预测中,协变量代表影响目标变量的外部因素。一些协变量仅在过去可观测(观测协变量,如记录的天气数据),而另一些则预先已知(已知协变量,如日历事件或折扣计划)。尽管协变量有潜力提升预测性能,但大多数基于深度学习的预测模型难以处理由已知协变量的未来部分引起的变量长度差异,且无法灵活利用它们。此外,捕获目标变量与协变量之间的依赖关系并非易事,因为模型必须准确反映协变量的局部影响,同时建模全局跨变量依赖。为应对这些挑战,我们提出CITRAS,一种仅解码器Transformer,灵活整合多个目标变量、观测协变量和已知协变量。在保持强大自回归建模能力的同时,CITRAS在分块跨变量注意力中引入两种新机制:键值移位和注意力分数平滑。键值移位通过基于并发依赖将已知协变量的未来部分与目标变量对齐,无缝融入预测过程。注意力分数平滑通过平滑历史注意力分数,将局部精确的分块跨变量依赖细化为全局变量级依赖。实验上,CITRAS在协变量感知和多变量设置下的广泛真实世界数据集上展现出强大性能,展示了其利用跨变量和跨时间依赖提升预测准确性的通用能力。

英文摘要

In time series forecasting, covariates represent external factors that influence target variables. Some covariates are observable only in the past (observed covariates, such as recorded weather data), while others are known in advance (known covariates, such as calendar events or discount schedules). Although covariates have the potential to enhance forecasting performance, most deep learning-based forecasting models struggle to address the length discrepancy between variables caused by the future portion of known covariates and fail to leverage them flexibly. Moreover, capturing dependencies between target variables and covariates is non-trivial, as models must accurately reflect the local impact of covariates while simultaneously modeling global cross-variate dependencies. To address these challenges, we propose CITRAS, a decoder-only Transformer that flexibly integrates multiple target variables, observed covariates, and known covariates. While preserving strong autoregressive modeling capabilities, CITRAS introduces two novel mechanisms within patch-wise cross-variate attention: Key-Value (KV) Shift and Attention Score Smoothing. KV Shift seamlessly incorporates the future portion of known covariates into the forecasting process by aligning them with target variables based on their concurrent dependencies. Attention Score Smoothing refines locally accurate patch-wise cross-variate dependencies into global variate-level dependencies by smoothing the historical attention scores. Experimentally, CITRAS demonstrates strong performance across a wide range of real-world datasets in both covariate-informed and multivariate settings, showcasing its versatile ability to leverage cross-variate and cross-time dependencies for improved forecasting accuracy.

2601.03093 2026-06-10 cs.LG cs.CL 版本更新

ATLAS: Verifier-Guided Adaptive Latent Activation Steering for Efficient LLM Reasoning

ATLAS:验证器引导的自适应潜在激活引导用于高效LLM推理

Tuc Nguyen, Thai Le

发表机构 * Indiana University Bloomington(印第安纳大学布卢明顿分校)

AI总结 提出ATLAS框架,通过轻量级验证器动态调整推理时潜在状态引导策略,实现每步自适应控制,在数学和编码推理任务上提升准确率并减少测试时token使用。

Comments 21 pages, 6 figures

详情
AI中文摘要

最近关于激活和潜在引导的研究表明,修改内部表示可以有效引导大型语言模型(LLMs)在不更新模型参数的情况下提高推理和效率。然而,大多数现有方法依赖固定引导策略和静态干预强度,这限制了它们在问题实例上的鲁棒性,并常常导致过度或不足引导。我们提出自适应测试时潜在引导(ATLAS),这是一个轻量级框架,通过训练好的、轻量级验证器在推理时动态控制引导决策。给定中间隐藏状态,验证器预测当前推理的质量,并自适应选择要应用的引导动作,实现每个示例和每个步骤的调整,且开销最小。ATLAS提供了一个统一框架,将学习到的潜在验证与测试时激活引导相结合,无需额外的LLM解码或推理时过程奖励模型调用即可实现自适应推理控制。在多个数学和编码推理基准上的实验表明,ATLAS始终优于普通解码和固定引导基线,在实现更高准确率的同时大幅减少测试时token使用。这些结果表明,验证器引导的潜在适应提供了一种有效且可扩展的机制,可以在不牺牲解决方案质量的情况下控制推理效率。所有源代码将公开提供。

英文摘要

Recent work on activation and latent steering has demonstrated that modifying internal representations can effectively guide large language models (LLMs) toward improved reasoning and efficiency without updating model parameters. However, most existing approaches rely on fixed steering policies and static intervention strengths, which limit their robustness across problem instances and often result in over- or under-steering. We propose Adaptive Test-time Latent Steering (ATLAS), a lightweight framework that dynamically controls steering decisions at inference time using a trained, lightweight verifier over the latent states. Given intermediate hidden states, the verifier predicts the quality of ongoing reasoning and adaptively selects which steering action to apply, enabling per-example and per-step adjustment with minimal overhead. ATLAS provides a unified framework for combining learned latent verification with test-time activation steering, enabling adaptive reasoning control without additional LLM decoding or inference-time process reward model calls. Experiments on multiple mathematical and coding reasoning benchmarks show that ATLAS consistently outperforms both vanilla decoding and fixed steering baselines, achieving higher accuracy while substantially reducing test-time token usage. These results demonstrate that verifier-guided latent adaptation provides an effective and scalable mechanism for controlling reasoning efficiency without sacrificing solution quality. All source code will be publicly available.

2602.03164 2026-06-10 cs.LG cs.AI 版本更新

MemCast: Memory-Driven Time Series Forecasting with Experience-Conditioned Reasoning

MemCast:基于经验条件推理的记忆驱动时间序列预测

Xiaoyu Tao, Mingyue Cheng, Ze Guo, Shuo Yu, Yaguo Liu, Qi Liu, Shijin Wang

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出MemCast框架,将时间序列预测转化为经验条件推理任务,通过层次化记忆学习历史模式、推理智慧和一般规律,并采用动态置信度适应策略实现持续进化,在多个数据集上优于现有方法。

详情
AI中文摘要

时间序列预测(TSF)在许多现实世界的决策中起着关键作用。最近,基于大型语言模型(LLM)的预测器取得了有希望的进展。尽管有效,现有方法通常缺乏显式的经验积累和持续进化。在这项工作中,我们提出了MemCast,一个学习到记忆的框架,将TSF重新表述为经验条件推理任务。具体来说,我们从训练集中学习经验并将其组织成层次化记忆。这是通过将预测结果总结为历史模式、将推理轨迹提炼为推理智慧、以及将提取的时间特征归纳为一般规律来实现的。此外,在推理过程中,我们利用历史模式指导推理过程,利用推理智慧选择更好的轨迹,而一般规律则作为反思迭代的标准。另外,为了实现持续进化,我们设计了一种动态置信度适应策略,该策略在不泄露测试集分布的情况下更新单个条目的置信度。在多个数据集上的大量实验表明,MemCast始终优于以前的方法,验证了我们方法的有效性。我们的代码可在以下网址获得:此 https URL。

英文摘要

Time series forecasting (TSF) plays a critical role in decision-making for many real-world applications. Recently, large language model (LLM)- based forecasters have made promising advancements. Despite their effectiveness, existing methods often lack explicit experience accumulation and continual evolution. In this work, we propose MemCast, a learning-to-memory framework that reformulates TSF as an experience-conditioned reasoning task. Specifically, we learn experience from the training set and organize it into a hierarchical memory. This is achieved by summarizing prediction results into historical patterns, distilling inference trajectories into reasoning wisdom, and inducing extracted temporal features into general laws. Furthermore, during inference, we leverage historical patterns to guide the reasoning process and utilize reasoning wisdom to select better trajectories, while general laws serve as criteria for reflective iteration. Additionally, to enable continual evolution, we design a dynamic confidence adaptation strategy that updates the confidence of individual entries without leaking the test set distribution. Extensive experiments on multiple datasets demonstrate that MemCast consistently outperforms previous methods, validating the effectiveness of our approach. Our code is available at https://github.com/Xiaoyu-Tao/MemCast-TS.

2602.10796 2026-06-10 cs.LG 版本更新

PRISM: Parallel Residual Iterative Sequence Model

PRISM: 并行残差迭代序列模型

Jie Jiang, Ke Cheng, Xin Xu, Mengyang Pang, Tianhao Lu, Jiaheng Li, Yue Liu, Yuan Wang, Jun Zhang, Huan Yu, Zhouchen Lin

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出PRISM模型,通过写-遗忘解耦和两阶段代理架构,将迭代优化并行化,在保持表达能力的同时实现174倍吞吐量提升。

Comments 21 pages, 2 figures

详情
AI中文摘要

生成式序列建模面临Transformer的表达能力与线性序列模型的效率之间的基本矛盾。现有的高效架构在理论上受限于浅层单步线性更新,而像测试时训练(TTT)这样的强大迭代方法由于两个维度的串行依赖(token级状态依赖和步级迭代循环)破坏了硬件并行性。我们提出PRISM(并行残差迭代序列模型)来解决这一矛盾。PRISM以可并行化的形式显式重构了TTT的表达性门控×残差×方向迭代模式。我们采用写-遗忘解耦策略,将非线性隔离在注入算子内。为了绕过显式求解器的串行依赖,PRISM利用两阶段代理架构:短卷积利用局部历史能量锚定初始残差,而学习预测器直接从输入估计细化更新。该设计将与迭代校正相关的结构模式蒸馏为可并行化的前馈算子。理论上,我们证明该公式实现了Rank-$L$累积,在结构上扩展了更新流形,超越了单步Rank-$1$瓶颈。实验上,它实现了与显式优化方法相当的性能,同时实现了\textbf{174倍更高的吞吐量}。代码见该https URL。

英文摘要

Generative sequence modeling faces a fundamental tension between the expressivity of Transformers and the efficiency of linear sequence models. Existing efficient architectures are theoretically bounded by shallow, single-step linear updates, while powerful iterative methods like Test-Time Training (TTT) break hardware parallelism due to two dimensions of serial dependency: token-level state reliance and step-level iteration loops. We propose PRISM (Parallel Residual Iterative Sequence Model) to resolve this tension. PRISM explicitly reconstructs the expressive gate x residual x direction iteration pattern of TTT in a parallelizable form. We employ a Write-Forget Decoupling strategy that isolates non-linearity within the injection operator. To bypass the serial dependency of explicit solvers, PRISM utilizes a two-stage proxy architecture: a short-convolution anchors the initial residual using local history energy, while a learned predictor estimates the refinement updates directly from the input. This design distills structural patterns associated with iterative correction into a parallelizable feedforward operator. Theoretically, we prove that this formulation achieves Rank-$L$ accumulation, structurally expanding the update manifold beyond the single-step Rank-$1$ bottleneck. Empirically, it achieves comparable performance to explicit optimization methods while achieving \textbf{174x higher throughput}. Codes are available in https://github.com/gpr-prism/prism/.

2605.08171 2026-06-10 cs.LG cs.AI 版本更新

Communication Dynamics Neural Networks: FFT-Diagonalized Layers for Improved Hessian Conditioning at Reduced Parameter Count

通信动力学神经网络:用于改进Hessian条件数并减少参数数量的FFT对角化层

Lurong Pan

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出CDLinear块循环线性层,通过FFT对角化Hessian矩阵,在参数减少3.8倍下保持高精度,并给出显式条件数分析。

Comments 17 pages, 5 figures. Includes NumPy implementation, gradient checks, MNIST experiments, and reference PyTorch CD-Transformer implementation

详情
AI中文摘要

通信动力学神经网络(CDNNs)将通信动力学框架的循环谱机制应用于神经网络层设计。我们引入了CDLinear,一种块大小为B=2l+1的块循环线性层,其参数数量仅为具有相同输入输出维度的稠密层的1/B。该构造为优化提供了显式的傅里叶域诊断:对于均方损失,权重Hessian矩阵通过离散傅里叶变换对角化,特征值直接由输入块的傅里叶谱决定。在输入预白化下,总体Hessian条件数恰好为1,经验条件数对于N个样本被界定为1+O(sqrt(B/N))。我们使用纯NumPy实现CDLinear,并手工推导反向传播,通过有限差分验证梯度。在8x8 MNIST数字基准上,三个随机种子下,使用B=4的CDLinear MLP以2,380个参数达到97.50%±0.23%的测试准确率,而使用8,970个参数的稠密基线为98.15%±0.47%。这实现了3.8倍的参数减少,准确率损失0.65%。CD-MLP的平均Hessian条件数为1.9e4,比稠密基线的5.9e6小约310倍。我们将CDLinear定位为结构化矩阵神经网络层的一个特例,主要贡献包括闭式Hessian谱诊断、原则性的块多重性离散序列以及显式的条件数分析。我们还发布了一个参考PyTorch实现,将CDLinear集成到DeepSeek-V3风格的混合专家Transformer中,用于未来大规模基准测试。

英文摘要

Communication Dynamics Neural Networks (CDNNs) apply the circulant-spectral machinery of the Communication Dynamics framework to neural-network layer design. We introduce CDLinear, a block-circulant linear layer with block size B = 2l + 1 that uses 1/B the parameters of a dense layer with the same input and output dimensions. The construction gives an explicit Fourier-domain diagnostic for optimization: for mean-squared loss, the weight Hessian is diagonalized by the discrete Fourier transform, with eigenvalues determined directly by the Fourier spectrum of the input blocks. Under input pre-whitening, the population Hessian condition number is exactly 1, and the empirical condition number is bounded by 1 + O(sqrt(B/N)) for N samples. We implement CDLinear in pure NumPy with hand-derived backward passes and verify gradients by finite differences. On the 8x8 MNIST digits benchmark, across three random seeds, a CDLinear MLP with B = 4 reaches 97.50% +/- 0.23% test accuracy using 2,380 parameters, compared with 98.15% +/- 0.47% for a dense baseline using 8,970 parameters. This gives a 3.8x parameter reduction at a 0.65% accuracy cost. The CD-MLP's mean Hessian condition number is 1.9e4, about 310x smaller than the dense baseline's 5.9e6. We position CDLinear as a special case of structured matrix neural-network layers, with the main contributions being a closed-form Hessian-spectrum diagnostic, a principled discrete sequence of block multiplicities, and an explicit conditioning analysis. We also release a reference PyTorch implementation integrating CDLinear into a DeepSeek-V3-style mixture-of-experts transformer for future large-scale benchmarks.

2605.27286 2026-06-10 cs.LG cs.AI 版本更新

Falcon-X: A Time Series Foundation Model for Heterogeneous Multivariate Modeling

Falcon-X:面向异构多变量建模的时间序列基础模型

Yiding Liu, Yifan Hu, Hongjie Xia, Peiyuan Liu, Hongzhou Chen, Xilin Dai, Zewei Dong, Jiang-Ming Yang

发表机构 * Ant Group(蚂蚁集团)

AI总结 针对现有时间序列基础模型在语义对齐和关系表达上的局限,提出Falcon-X,通过将变量映射到统一潜在原型空间,利用统一原型差分注意力机制对齐异构变量,并在共享空间中通过潜在实体注意力进行跨变量交互,实现零样本结构迁移,在GIFT-Eval和fev-bench基准上取得最先进预测性能。

详情
AI中文摘要

时间序列基础模型(TSFMs)通过大规模跨领域预训练正在改变预测范式。然而,大多数现有的TSFMs仍然是单变量的,最近实现跨变量建模的努力仍然直接在原始变量空间内操作。这种设计在语义对齐和关系表达性方面引入了基本限制。具体来说,原始空间分组混合缺乏对齐异构物理量的专用机制,而标准的非负注意力无法捕捉现实世界系统中普遍存在的复杂协同和拮抗交互。为了解决这些挑战,我们提出了Falcon-X,它将变量从原始空间中解耦,并将其映射到一个统一的潜在原型空间中。Falcon-X采用统一原型差分注意力机制,显式评估正负语义亲和度以对齐异构变量。然后通过潜在实体注意力在该共享空间内高效执行跨变量交互,自然地促进零样本结构迁移。最后,变量重组路由器通过请求-分发机制稳健地重建变量特定的轨迹。在GIFT-Eval和fev-bench基准上的广泛评估表明,Falcon-X实现了最先进的预测性能,为复杂的多变量环境提供了一种有原则且可扩展的范式。Falcon-X已公开发布以支持未来研究。

英文摘要

Time series foundation models (TSFMs) are transforming the forecasting paradigm through large-scale cross-domain pretraining. However, most existing TSFMs remain univariate, and recent efforts to enable cross-variate modeling still operate directly within the raw variate space. This design introduces fundamental limitations in semantic alignment and relational expressivity. Specifically, raw-space group mixing lacks a dedicated mechanism to align heterogeneous physical quantities, while standard non-negative attention fails to capture the complex synergistic and antagonistic interactions ubiquitous in real-world systems. To address these challenges, we propose Falcon-X, decouples variates from the raw space and maps them into a unified latent prototype space. Falcon-X employs a Unified Prototype Diff-Attention mechanism that explicitly evaluates both positive and negative semantic affinities to explicitly align heterogeneous variates. Cross-variate interactions are then efficiently performed within this shared space via Latent Entity Attention, naturally facilitating zero-shot structural transfer. Finally, a Variate Reassembly Router robustly reconstructs variate-specific trajectories via a request-and-dispatch mechanism. Extensive evaluations on the GIFT-Eval and fev-bench benchmarks demonstrate that Falcon-X achieves excellent forecasting performance, offering a principled and scalable paradigm for complex multivariate environments. Falcon-X is publicly released to support future research.

2606.02378 2026-06-10 cs.LG cs.AI 版本更新

When Do Attention Circuits Form? Developmental Trajectories of Capability and Attention-Sink Emergence Across Three 1B-ClassArchitectures

注意力电路何时形成?三种1B级架构中能力和注意力汇出现的发育轨迹

Yongzhong Xu

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 本文追踪三种1B级语言模型中注意力头电路的形成轨迹,发现归纳电路形成早于注意力汇形成10-20倍令牌,且电路识别无需最终模型。

Comments 27pages, 3 figures

详情
AI中文摘要

我们追踪了三种1B级语言模型(涵盖两个架构系列:密集Transformer和混合专家,以及两个预训练语料库:The Pile和DCLM)中注意力头电路形成的发育轨迹:Pythia 1B、OLMo 1B-0724-hf和OLMoE 1B-7B-0924。在每个模型的10个对数间隔检查点(共30次机械可解释性运行)上,我们应用参与比(PR)谱信号和全头能力特异性选择性筛选,以追踪归纳头、前一个标记头和BOS吸引头出现的过程。五个发现:(F1)第0层和第1层在每个模型的每个检查点都没有产生BOS分类头:L0/L1零BOS底限是架构属性,而非学习结果。(F2)整个模型的BOS吸引头比例呈现三种不同的出现形状——Pythia 1B中逐渐上升,OLMo 1B中急剧相变(相邻检查点间从7%到70%),OLMoE 1B-7B中逐渐上升。(F3)在DCLM模型中,归纳电路的形成比BOS吸引头的形成早10-20倍令牌;能力电路的形成和注意力汇的形成是两个不同的转变,而非一个。(F4)能力特异性筛选在总训练令牌的0.3-2%内收敛到最终的归纳电路——电路识别不需要最终模型。(F5)对于从所有三个模型中采样的每个最终检查点归纳头,其每个头的PR值在该头首次超过其能力选择性阈值时或之前就已升高。这些结果细化了归纳相变框架:在DCLM上训练的1B级模型中,归纳转变和注意力汇转变在令牌数量上相差一个数量级,并且具有定性不同的形状。

英文摘要

We track the developmental trajectory of attention-head circuit formation across three 1B-class language models spanning two architecture families (dense transformer, mixture-of-experts) and two pretraining corpora (The Pile, DCLM): Pythia 1B, OLMo 1B-0724-hf, and OLMoE 1B-7B-0924. At each of 10 log-spaced revisions per model -- 30 mechanistic-interpretability runs in total -- we apply a participation-ratio (PR) spectral signal and an all-head capability-specific selectivity screen to track induction, previous-token, and BOS-attractor heads as they emerge. Five findings. (F1) Layers 0 and 1 produce zero BOS-classified heads at every revision in every model: the L0/L1 zero-BOS floor is an architectural property, not a learned outcome. (F2) The whole-model BOS-attractor fraction follows three distinct emergence shapes -- a gradual ramp in Pythia 1B, a sharp phase transition in OLMo 1B (7% to 70% between adjacent checkpoints), and a gradual ramp in OLMoE 1B-7B. (F3) In DCLM models, induction-circuit formation precedes BOS-attractor formation by 10-20x in tokens; capability-circuit formation and attention-sink formation are two transitions, not one. (F4) The capability-specific screen converges to the final induction circuit within 0.3-2% of total training tokens -- circuit identification does not require the final model. (F5) For every final-checkpoint induction head sampled across all three models, per-head PR is elevated at or before the first revision at which that head crosses its capability-selectivity threshold. The results refine the induction-phase-transition framing: in 1B-class models trained on DCLM, the induction transition and the attention-sink transition are separated by an order of magnitude in tokens and have qualitatively different shapes.

2606.06888 2026-06-10 cs.LG 版本更新

Data-Constrained Language Model Pretraining: Improved Regularization and Scaling Laws

数据受限的语言模型预训练:改进的正则化与缩放定律

Zhiwei Xu, Shihao Wu, Hanseul Cho, Wei Hu, Yixin Wang

发表机构 * University of Michigan(密歇根大学) KAIST AI(韩国科学技术院人工智能研究所)

AI总结 研究数据受限下语言模型预训练的正则化与缩放,提出掩码输入正则化(MIR)改善验证损失,并设计SoftQ缩放定律更准确拟合重复数据下的模型与数据规模交互。

详情
AI中文摘要

语言模型预训练的经典缩放定律在固定计算预算下平衡模型规模与训练数据集大小,假设数据充足且仅对语料库遍历一次。随着训练计算量增长快于自然语言数据的供应,预训练可能进入数据受限、计算丰富的阶段,模型在有限数据集上训练多个周期。我们沿正则化和缩放两个维度研究数据受限预训练。对于正则化,我们研究掩码输入正则化(MIR),一种对随机掩码输入进行辅助下一词预测损失的方法。MIR测试扩散语言模型中的随机掩码是否能在不改变架构或增加推理开销的情况下有益于自回归预训练。在72M到1.4B参数的模型中,我们发现MIR在强权重衰减基础上进一步改善了验证损失,优于仅使用强权重衰减的自回归模型,并在1.4B规模上带来下游性能提升。对于缩放,我们提出SoftQ,一种将模型规模和数据规模耦合以捕捉它们在重复数据下交互的缩放定律。经典替代方案如Chinchilla定律使用加性形式解耦这些项,导致在数据受限情况下设定错误。我们发现SoftQ比这些替代方案更好地拟合数据受限实验,并估计MIR带来的增益相当于约1.3倍的独特训练数据。我们在https://this URL 发布代码。

英文摘要

Classical scaling laws for language model pretraining balance model size against training dataset size under a fixed compute budget, assuming abundant data and a single pass over the corpus. As training compute grows faster than the supply of natural language data, pretraining is likely to enter a data-constrained, compute-rich regime where models train for multiple epochs over a finite dataset. We study data-constrained pretraining along two axes, regularization and scaling. For regularization, we study masked-input regularization (MIR), an auxiliary next-token prediction loss on randomly masked inputs. MIR tests whether the random masking central to diffusion language models can benefit autoregressive pretraining without architectural changes or inference overhead. Across 72M to 1.4B parameter models, we find that MIR added on top of strong weight decay improves validation loss over autoregressive strong-weight-decay-only models, with downstream gains at 1.4B. For scaling, we propose SoftQ, a scaling law that couples model size and data size to capture their interaction under repeated data. Classical alternatives such as the Chinchilla law use an additive form that decouples these terms, making them misspecified in the data-constrained regime. We find that SoftQ fits data-constrained experiments substantially better than these alternatives, and estimates MIR's gains as equivalent to roughly 1.3 times as much unique training data. We release our code at https://github.com/yixinw-lab/dc_pretrain.

2606.07998 2026-06-10 cs.LG cs.AI 版本更新

Enhancing AI Interpretability and Safety through Localised Architectures

通过局部化架构增强AI可解释性与安全性

Ian Seet, Jonas Bozenhard, Simon Ostermann

AI总结 针对大型生成式AI模型可解释性差、计算成本高的问题,提出局部化机器学习架构,通过降低带宽、提高节点表达能力来提升可解释性和效率,并评估了多种硬件实现方案的适用性。

详情
AI中文摘要

近期生成式AI的进展,特别是强大的大型语言模型(LLM)和大型推理模型(LRM),引发了对这些庞大且不透明的AI模型的可解释性、安全性和可持续性的担忧。这些架构的能力不仅源于深度神经网络的可扩展性,还源于大规模并行硬件(如GPU集群)。深度神经网络的弥散性质使其在提供足够训练数据时具有强大的函数逼近能力,但代价是可解释性和计算效率的降低。观察到局部化机器学习(ML)模型在小数据集上往往比深度神经网络更具可解释性和计算效率,我们通过类比推理,认为类似的优势可能适用于特定的局部化硬件ML架构。我们主张,具有较低带宽但每个节点具有更高表达能力的局部化架构,有潜力在根本上比运行在GPU集群上的深度神经网络更具可解释性,同时在较小数据集上保持竞争力。然后,我们评估了各种硬件ML范式在实现此类局部化架构方面的适用性,并评估了它们的每节点表达能力、能效以及所需技术的实际成熟度。

英文摘要

Recent advances in generative AI, especially powerful Large Language Models (LLMs) and Large Reasoning Models (LRMs), raise concerns over the interpretability, safety and sustainability of these large and opaque AI models. The power of such architectures is derived not only from the scalability of deep neural networks, but also massively parallel hardware such as GPU clusters. The diffuse nature of deep neural networks gives them great function-approximation capability when provided with sufficient training data but imposes a cost in interpretability and computational efficiency. Observing that localised machine learning (ML) models tend to be more interpretable and computationally efficient than deep neural networks on small datasets, we reason by analogy that similar advantages may apply to specific localised hardware ML architectures. We argue that localised architectures with lower bandwidth but higher expressivity per node have the potential to be fundamentally more interpretable than deep neural networks running on GPU clusters while remaining competitive for smaller datasets. We then evaluate the suitability of various hardware ML paradigms for implementing such localised architectures and evaluate their per-node expressivity, energy efficiency and practical maturity of the technology required.

2606.08779 2026-06-10 cs.LG 版本更新

Reformulate LLM Reinforcement Learning for Efficient Training under Black-box Discrepancy

重新制定LLM强化学习以在黑箱差异下高效训练

Jiashun Liu, Runze Liu, Xu Wan, Jing Liang, Hongyao Tang, Ling Pan

发表机构 * Hong Kong University of Science and Technology(香港科技大学) Zhejiang University(浙江大学) Tianjin University(天津大学)

AI总结 针对强化学习中的训练-推理差异问题,提出差异约束马尔可夫决策过程(DCMDP),通过拉格朗日松弛自适应平衡性能提升与差异控制,实现稳定高效训练。

详情
AI中文摘要

强化学习(RL)已成为一种关键的后训练范式,但它经常遭受不可预测的次优性能甚至训练崩溃。最近的研究将这些失败归因于隐藏的训练-推理差异(或不匹配),源于底层引擎和架构的不同。我们发现,当提供适当的学习信号时,训练策略可以主动自我纠正这种差异。然后,我们进一步通过经验确定了一个差异容忍区域:在该区域内,激进地缩小差异会抑制策略探索并降低学习效率,而在该区域外,减少过度差异可提高优化一致性并提升可达到的局部性能上限。根据这些发现,我们将此问题表述为差异约束马尔可夫决策过程(DCMDP),其中奖励最大化与对齐训练-推理行为的约束相结合,实现稳定的双目标优化。为了自适应地平衡性能改进和差异控制,我们引入了一种拉格朗日松弛机制,根据当前差异违反程度动态调整两个目标的相对权重。这使得双目标优化稳定:策略可以在容忍区域内自由探索,而当差异超出安全边界时则被引导回来。经验上,DCMDP显著提升了8B密集模型(Qwen-3-8b)和30B混合专家模型(Qwen-3-30bA3b)的性能,并实现了一种异构训练范式,其中LLM可以在高保真训练设置下进行优化,同时明确对齐以用于低成本、资源受限的推理部署。

英文摘要

Reinforcement Learning (RL) has emerged as a pivotal post-training paradigm, yet it frequently suffers from unpredictable sub-optimum performance or even training collapses. Recent findings attribute these failures to a hidden train-inference discrepancy (or mismatch), stemming from the disparate underlying engines and architecture. We find that the training policy can actively self-correct such a discrepancy when provided with an appropriate learning signal. Then, we further empirically identify a discrepancy tolerance region: within this region, aggressively narrowing the discrepancy can suppress policy exploration and reduce learning efficiency, whereas outside this region, reducing excessive discrepancy improves optimization consistency and raises the achievable local performance ceiling. According to such findings, we formulate this problem as a Discrepancy-Constrained Markov Decision Process (DCMDP), where reward maximization is coupled with a constraint that aligns training-Inference behavior, achieving stable dual-objective optimization. To adaptively balance performance improvement and discrepancy control, we introduce a Lagrangian relaxation mechanism that dynamically adjusts the relative weight of the two objectives according to the current degree of discrepancy violation. This enables stable dual-objective optimization: the policy is allowed to explore freely within the tolerance region, while being guided back when the discrepancy exceeds the safe boundary. Empirically, DCMDP significantly improves the performance of 8B dense model (Qwen-3-8b) and 30B Mixture-of-Expert model (Qwen-3-30bA3b), and enables a heterogeneous training paradigm, where LLMs can be optimized in high-fidelity training setup while being explicitly aligned for low-cost, resource-constrained inference deployment.

2501.04339 2026-06-10 stat.ML cs.LG physics.app-ph 版本更新

Interpretable deep convolutional model for nonlinear multivariate time series in complex systems

可解释的深度卷积模型用于复杂系统中的非线性多元时间序列

Domjan Baric, Davor Horvatic

发表机构 * Department of Physics, Faculty of Science, University of Zagreb(扎格拉布大学物理系)

AI总结 提出DCIts架构,通过分解为Focuser和Modeler组件,实现非线性多元时间序列的局部可解释交互结构学习,在保持预测精度的同时恢复稳定的符号化滞后交互模式。

Comments 40 pages, 13 figures

详情
Journal ref
Chaos 36, 063116 (2026)
AI中文摘要

我们介绍了深度卷积时间序列解释器(DCIts),这是一种用于非线性多元时间序列的深度学习架构,能够提供样本特定、局部可解释的底层交互结构描述。与标准的黑箱预测器不同,DCIts学习一个时间和滞后依赖的转移张量,该张量被显式分解为两个组件:Focuser通过稀疏掩码机制选择相关的源序列和时间滞后,Modeler为这些选定的交互分配符号系数。这种分解为每个预测实例产生局部滞后邻接结构和符号化的源-滞后贡献,从而能够直接检查有效连接;当高阶分支被激活时,同一框架产生阶数分辨的元素级多项式贡献。在架构上,DCIts使用多样化的卷积滤波器库来捕获时间和跨变量依赖关系,这些依赖关系通过瓶颈网络映射到转移张量。在具有已知交互结构的受控基准数据集上,我们证明DCIts在实现竞争性预测误差(相对于强可解释基线)的同时,恢复了稳定的、符号化的、滞后分辨的交互模式。因此,该框架优先考虑内在可解释性,将预测准确性作为忠实性约束而非唯一目标。

英文摘要

We introduce the Deep Convolutional Interpreter for Time Series (DCIts), a deep-learning architecture for nonlinear multivariate time series that provides sample-specific, locally interpretable descriptions of the underlying interaction structure. Unlike standard black-box forecasters, DCIts learns a time- and lag-dependent transition tensor explicitly factorized into two components: a Focuser, which selects relevant source series and time lags via a sparse masking mechanism, and a Modeler, which assigns signed coefficients to these selected interactions. This decomposition yields a local lag-adjacency structure and signed source-lag contributions for every forecast instance, enabling direct inspection of effective connectivity; when higher-order branches are activated, the same framework yields order-resolved elementwise polynomial contributions. Architecturally, DCIts uses a diverse bank of convolutional filters to capture temporal and cross-variable dependencies, which are mapped through a bottleneck network to the transition tensor. On controlled benchmark datasets with a known interaction structure, we demonstrate that DCIts achieves competitive forecasting error relative to a strong interpretable baseline while recovering stable, signed, lag-resolved interaction patterns. The framework thus prioritizes intrinsic interpretability, using forecasting accuracy as a faithfulness constraint rather than the sole objective.

2604.06893 2026-06-10 cs.CV cs.LG 版本更新

Energy-Regularized Spatial Masking: A Novel Approach to Enhancing Robustness and Interpretability in Vision Models

能量正则化的空间遮蔽:一种增强视觉模型鲁棒性和可解释性的新方法

Tom Devynck, Bilal Faye, Djamel Bouchaffra, Nadjib Lazaar, Hanane Azzag, Mustapha Lebbah

发表机构 * DAVID Lab, UVSQ, Paris-Saclay University(DAVID实验室,UVSQ,巴黎-萨克雷大学) LIPN, UMR CNRS 7030, Sorbonne Paris Nord University(LIPN,UMR CNRS 7030,索邦巴黎北大学) LISN, Paris-Saclay University(LISN,巴黎-萨克雷大学)

AI总结 本文提出能量正则化空间遮蔽框架,通过可微能量最小化问题重新定义特征选择,实现更鲁棒和可解释的视觉模型。

Comments 8 pages

详情
AI中文摘要

深度卷积神经网络通过密集空间特征图的彻底处理取得了显著性能,但这种暴力策略引入了显著的计算冗余并鼓励依赖于虚假背景相关性。为此,我们提出能量正则化空间遮蔽(ERSM),一种新的框架,将特征选择重新公式化为可微能量最小化问题。通过在标准卷积骨干中嵌入轻量级能量-遮蔽层,每个视觉标记被分配一个由两个竞争力组成的标量能量:内在的Unary重要性成本和Pairwise空间一致性惩罚。不同于以往的剪枝方法,ERSM允许网络自主发现针对每个输入的最佳信息密度平衡。我们验证了ERSM在卷积架构上的有效性,证明其产生新兴稀疏性、改进对结构遮挡的鲁棒性,并产生高度可解释的空间遮蔽,同时保持分类准确性。此外,我们表明所学的能量排名在删除基于鲁棒性测试中显著优于基于幅度的剪枝,揭示ERSM作为一种内在去噪机制,能够在无像素级监督的情况下隔离语义物体区域。

英文摘要

Deep convolutional neural networks achieve remarkable performance by exhaustively processing dense spatial feature maps, yet this brute-force strategy introduces significant computational redundancy and encourages reliance on spurious background correlations. As a result, modern vision models remain brittle and difficult to interpret. We propose Energy-Regularized Spatial Masking (ERSM), a novel framework that reformulates feature selection as a differentiable energy minimization problem. By embedding a lightweight Energy-Mask Layer inside standard convolutional backbones, each visual token is assigned a scalar energy composed of two competing forces: an intrinsic Unary importance cost and a Pairwise spatial coherence penalty. Unlike prior pruning methods that enforce rigid sparsity budgets or rely on heuristic importance scores, ERSM allows the network to autonomously discover an optimal information-density equilibrium tailored to each input. We validate ERSM on convolutional architectures and demonstrate that it produces emergent sparsity, improved robustness to structured occlusion, and highly interpretable spatial masks, while preserving classification accuracy. Furthermore, we show that the learned energy ranking significantly outperforms magnitude-based pruning in deletion-based robustness tests, revealing ERSM as an intrinsic denoising mechanism that isolates semantic object regions without pixel-level supervision.

2605.30370 2026-06-10 cs.NE cs.AI cs.CV cs.LG 版本更新

Updating the standard neuron model in artificial neural networks

更新人工神经网络中的标准神经元模型

Raul Mohedano, Thomas Batard, Erik Velasco-Salido, Ramsses De Los Santos Mendoza, Jorge H. Martínez, Stacey Levine, Marcelo Bertalmío

发表机构 * Spanish National Research Council (CSIC)(西班牙国家研究理事会(CSIC)) Center for Research in Mathematics (CIMAT)(数学研究中心(CIMAT)) Universidad Autónoma de Madrid (UAM)(马德里自治大学(UAM)) National Science Foundation (NSF)(国家科学基金会(NSF))

AI总结 本文用更真实的皮层细胞模型替代标准点神经元模型,在不增加参数的情况下,提升了人工神经网络的表达能力、鲁棒性和学习速度,并减少了记忆化和所需训练数据量。

Comments Acknowledgments included in the manuscript

详情
AI中文摘要

自20世纪50年代诞生以来,人工神经网络(ANNs)一直使用当时神经科学中流行的所谓点神经元模型,希望这种类比能够更好地模拟大脑功能。多年来,神经科学文献表明点神经元模型过于简单,无法正确表示许多基本的神经过程;然而,ANNs中的标准神经元模型仍然保持不变。在这里,我们用一个非常新的皮层细胞模型替代它,并通过理论分析和实验结果证明,仅仅通过使用更真实的神经单元元素而不增加参数数量,所得到的ANNs提供了许多重要优势,包括增强的表达能力、鲁棒性和学习速度,以及减少记忆化和所需的训练数据量。

英文摘要

From their inception in the 1950s, artificial neural networks (ANNs) started using the so-called point neuron model then prevalent in neuroscience, hoping that this analogy would allow for a better emulation of brain function. Over the years the neuroscience literature has shown that the point neuron model is too simplistic to properly represent many fundamental neural processes; however, the standard neuron model in ANNs still remains the same. Here we substitute it by a very recent model of cortical cells and demonstrate through theoretical analyses and experimental results how, simply by using a more realistic neural unit element without augmenting the number of parameters, the resulting ANNs offer a number of important advantages that include increases in expressivity, robustness and learning speed, and a reduction in memorization and the amount of training data needed.

2. 表示学习、自监督与对比学习 14 篇

2606.09853 2026-06-10 cs.LG cs.IT math.IT 新提交

SynIB: Informational Bottleneck for Maximizing Synergy in Multimodal Learning

SynIB: 多模态学习中最大化协同的信息瓶颈

Konstantinos Kontras, Teodora Gagaleska, Thomas Strypsteen, Christos Chatzichristos, Matthew Blaschko, Maarten De Vos, Paul Pu Liang

发表机构 * University of California, Berkeley(加州大学伯克利分校) Google Research(谷歌研究院) University of Amsterdam(阿姆斯特丹大学)

AI总结 提出SynIB方法,通过信息瓶颈理论直接优化多模态协同,在训练中屏蔽单模态时惩罚高置信度,提升跨模态推理能力,在合成和真实任务上准确率提升达7.8%。

详情
AI中文摘要

多模态学习的一个核心目标是捕捉协同:即仅通过联合使用多个模态才能获得的、且任何单一模态都无法提供的任务相关信息。虽然大多数方法通过更大或更复杂的融合模型在架构层面进行操作,但我们提出一个互补的轴:塑造训练目标本身。标准训练通常强调单模态或冗余信息,在需要跨模态推理的示例上表现不足。我们通过信息论形式化多模态协同,并引入协同信息瓶颈(SynIB),一个直接针对协同的可扩展目标。为了优先学习协同,SynIB激励模型从所有模态准确预测,同时当任何模态的信息被隐藏时惩罚置信度。除了标准任务损失外,模型每次运行时屏蔽一个模态进行前向传播,如果保持高置信度则受到惩罚,这表明依赖单模态线索而非跨模态交互。我们在两个场景中验证SynIB。在合成XOR任务中,真实协同已知,标准训练无法恢复协同而SynIB成功。在五个真实世界基准测试中,包括三个MultiBench情感任务、使用CLIP-ViT和DeBERTa骨干的Hateful Memes,以及我们引入的可控讽刺扩展CREMA-D,SynIB在依赖协同的示例上准确率提升高达7.8%,总体准确率提升高达3.8%。

英文摘要

A central objective in multimodal learning is to capture synergy: task-relevant information that arises only from the joint use of multiple modalities, and is not available from any single modality alone. While most approaches operate at the architectural level through larger or more complex fusion models, we propose a complementary axis: shaping the training objective itself. Standard training often emphasizes unimodal or redundant information, falling short on examples that require cross-modal reasoning. We formalize multimodal synergy through information theory and introduce the Synergistic Information Bottleneck (SynIB), a scalable objective that targets synergy directly. To prioritize learning synergy, SynIB motivates the model to predict accurately from all modalities while penalizing confidence when information from any modality is withheld. Alongside the standard task loss, the model runs forward passes with one modality masked at a time and is penalized for remaining confident, which would indicate reliance on unimodal cues rather than cross-modal interactions. We validate SynIB in two regimes. On synthetic XOR tasks where the ground-truth synergy is known by construction, standard training fails to recover it while SynIB does. On five real-world benchmarks, including three MultiBench affective tasks, Hateful Memes with CLIP-ViT and DeBERTa backbones, and a controllable irony extension of CREMA-D we introduce, SynIB improves accuracy on synergy-dependent examples by up to 7.8% and overall accuracy by up to 3.8%.

2606.09940 2026-06-10 cs.LG cs.AI 新提交

Interactions Between Crosscoder Features: A Compact Proofs Perspective

交叉编码器特征间的交互:一个紧凑证明的视角

Dmitry Manning-Coe, Thomas Read, Anna Soligo, Oliver Clive-Griffin, Chun-Hei Yip, Rajashree Agrawal, Jason Gross

AI总结 本文从紧凑证明角度形式化交叉编码器特征交互,提出交互度量并应用于计算稀疏性、语义聚类和检测休眠代理。

Comments Accepted at the NeurIPS 2025 Workshop on Mechanistic Interpretability

详情
AI中文摘要

像稀疏自编码器(SAEs)和交叉编码器这样的字典学习方法试图通过将模型的激活分解为独立特征来解释模型。因此,特征之间的交互会在重构中引入误差。我们通过紧凑证明形式化了这一直觉,并做出了五项贡献。首先,我们展示了原则上如何使用交叉编码器构建模型性能的紧凑证明。其次,我们证明了该证明中出现的误差项可以自然地解释为交叉编码器特征之间交互的度量,并提供了多层感知器(MLP)层中交互项的显式表达式。然后,我们提供了这种新交互度量的三个应用。在第三项贡献中,我们展示了交互项本身可以用作可微分的损失惩罚。应用这种惩罚,我们可以实现“计算稀疏”的交叉编码器,当在每个数据点和神经元仅保留单个特征时,保留MLP性能的60%,而标准交叉编码器仅保留10%。接着,我们展示了根据我们的交互度量进行聚类可以提供语义上有意义的特征聚类,最后,我们展示了休眠代理具有显著的交互。代码可在以下网址获取:https://this URL。

英文摘要

Dictionary learning methods like Sparse Autoencoders (SAEs) and crosscoders attempt to explain a model by decomposing its activations into independent features. Interactions between features hence induce errors in the reconstruction. We formalize this intuition via compact proofs and make five contributions. First, we show how, \textit{in principle}, a compact proof of model performance can be constructed using a crosscoder. Second, we show that an error term arising in this proof can naturally be interpreted as a measure of interaction between crosscoder features and provide an explicit expression for the interaction term in the Multi-Layer Perceptron (MLP) layers. We then provide three applications of this new interaction measure. In our third contribution we show that the interaction term itself can be used as a differentiable loss penalty. Applying this penalty, we can achieve ``computationally sparse'' crosscoders that retain $60\%$ of MLP performance when only keeping a single feature at each datapoint and neuron, compared to $10\%$ in standard crosscoders. We then show that clustering according to our interaction measure provides semantically meaningful feature clusters, and finally that sleeper agents have significant interactions. Code is available at https://github.com/chainik1125/crosscoders-feature-interactions/tree/arxiv.

2606.10029 2026-06-10 cs.LG cs.AI cs.CL 新提交

Interpreting and Steering a Text-to-Speech Language Model with Sparse Autoencoders

用稀疏自编码器解释和引导文本转语音语言模型

Nikita Koriagin, Georgii Aparin, Nikita Balagansky, Daniil Gavrilov

AI总结 本文在CosyVoice3语言模型骨干上训练BatchTopK稀疏自编码器,发现特征可解释且因果可控,能操纵笑声、性别和语速。

详情
AI中文摘要

语言模型日益成为文本转语音(TTS)系统的骨干,但我们对其在文本和生成语音令牌共享单一残差流时构建的表示知之甚少。我们在CosyVoice3的语言模型骨干上训练BatchTopK稀疏自编码器,并引入一种模态感知的自动解释流水线,根据特征触发的位置——文本前缀上下文、1秒语音片段或两者——为每个特征打标签。恢复的特征是可解释的,涵盖音素、笑声、口音提示和说话者性别。通过SAE潜在空间进行引导表明,这些特征是因果性的而非仅仅是描述性的:有针对性的干预将笑声概率从0.02提高到0.79,翻转感知到的说话者性别,并在保持口语内容的同时控制语速。因此,SAE特征既可作为解释性对象,也可作为TTS合成的控制方向。

英文摘要

Language models increasingly serve as the backbone of text-to-speech (TTS) systems, yet we understand little about the representations they build when text and generated speech tokens share a single residual stream. We train BatchTopK sparse autoencoders on the LM backbone of CosyVoice3 and introduce a modality-aware auto-interp pipeline that labels each feature from where it fires-text-prefix context, 1-second speech clips, or both. The recovered features are interpretable, spanning phonemes, laughter, accent prompts and speaker gender. Steering through the SAE latent space shows these features are causal rather than merely descriptive: targeted interventions raise laughter probability from 0.02 to 0.79, flip perceived speaker gender, and control speech rate while preserving spoken content. SAE features thus serve both as interpretability objects and as control directions for TTS synthesis.

2606.10099 2026-06-10 cs.LG cs.AI 新提交

Unsupervised Style Representation Learning for AI-Text Detection via Paraphrase Inversion

无监督风格表示学习用于通过释义反转检测AI文本

Rafael Rivera Soto, Barry Chen, Nicholas Andrews

AI总结 提出无监督风格编码器,通过重构人工文本与机器生成释义间的差异学习判别性风格特征,实现少样本和零样本AI文本检测,性能优于基线。

详情
AI中文摘要

大型语言模型(LLMs)的快速发展引发了对其滥用的担忧,如抄袭、错误信息和自动化影响操作,这促使需要鲁棒的检测器。最近的研究表明,写作风格的神经表示对于检测是有效的,并且至关重要的是,对于击败大多数现有检测器的对抗攻击具有鲁棒性。然而,当前的基于风格的检测器依赖作者标签进行训练,并且仅限于少样本推理进行检测,需要可能并不总是可用的分布内样本。我们通过训练风格编码器从机器生成的释义中重构人工文本,从而在没有作者标签的情况下学习判别性风格特征;在训练期间冻结语义编码器,使风格编码器偏向于仅捕获重构所需的非语义特征。我们通过两种检测策略评估学习到的表示:少样本检测器和基于DeepSVDD的零样本检测器。在基准测试中,我们的方法在少样本设置下匹配或优于所有基线,并且在零样本设置下,与完全监督的分类器在分布内测试数据上具有竞争力,同时对未见过的LLMs具有更好的泛化能力。除了检测之外,学习到的表示还能泛化到未见过的任务,在作者验证和细粒度风格区分上取得竞争性表现,尽管从未针对这两个目标进行训练。

英文摘要

The rapid development of large language models (LLMs) has raised concerns about misuse such as plagiarism, misinformation, and automated influence operations, motivating the need for robust detectors. Recent work has shown that neural representations of writing style are effective for detection and, crucially, robust to adversarial attacks that defeat most existing detectors. However, current style-based detectors rely on authorship labels for training, and are limited to few-shot inference for detection, requiring in-distribution samples that may not always be available. We learn discriminative style features without authorship labels by training a style encoder to reconstruct human-authored text from its machine-generated paraphrase; freezing a semantic encoder during training biases the style encoder to capture only the non-semantic features needed for reconstruction. We evaluate the learned representations via two detection strategies: a few-shot detector and a zero-shot DeepSVDD-based detector. Across benchmarks, our method matches or outperforms all baselines in the few-shot setting and, in the zero-shot regime, is competitive with fully supervised classifiers on in-distribution test data while generalizing better to unseen LLMs. Beyond detection, the learned representations generalize to unseen tasks, achieving competitive performance on authorship verification and fine-grained style discrimination despite never being trained on either objective.

2606.10284 2026-06-10 cs.LG 新提交

Revisiting Positive Samples in Graph Contrastive Learning: From the Perspective of Message Passing

重新审视图对比学习中的正样本:从消息传递的角度

Lianze Shan, Ningchong Wang, Jitao Zhao, Di Jin, Dongxiao He

发表机构 * School of Computer Science and Technology, Tianjin University, Tianjin, China(天津大学计算机科学与技术学院) School of Future Technology, Tianjin University, Tianjin, China(天津大学未来技术学院)

AI总结 本文从Dirichlet能量角度理论发现消息传递机制使正样本最大化变得平凡,导致图对比学习难以从正样本中有效学习,并提出SPGCL方法通过仅传播高能量特征并利用低能量特征构建概率矩阵来恢复正样本的学习效能。

Comments 24 pages,6 figures

详情
AI中文摘要

图对比学习(GCL)通过最大化正样本之间的相似性并最小化负样本之间的相似性来训练图编码器,已成为主流的图预训练范式。普遍认为正样本在GCL中至关重要。理想情况下,最大化正样本的相似性使图编码器能够捕捉图数据的内在语义和模式。然而,我们发现一个有趣的现象:即使没有正样本,GCL也能取得有竞争力的性能。这促使我们重新审视GCL中正样本的基本机制。从Dirichlet能量的角度,我们理论上发现,消息传递(图编码器中的关键机制)使正样本的最大化变得平凡,从而阻止GCL从正样本中有效学习。为了解决这个问题,我们提出SPGCL来减轻消息传递导致的平凡化,并恢复正样本的学习效能。具体来说,我们发现高Dirichlet能量特征有助于正样本提供有效的学习信号,而低Dirichlet能量特征对正学习信号贡献很小,但对正采样有用。基于此,SPGCL仅传播高Dirichlet能量特征,并使用低能量特征构建概率矩阵以实现可靠的正采样。大量实验证明了SPGCL的有效性。

英文摘要

Graph Contrastive Learning (GCL), which trains graph encoders by maximizing similarity between positive samples and minimizing it between negative ones, has emerged as a mainstream graph pre-training paradigm. It is widely recognized that positive samples are essential in GCLs. Ideally, maximizing the similarity of positive samples enables graph encoders to capture intrinsic semantic and patterns of graph data. However, we discover an interesting phenomenon: GCLs can achieve competitive performance even without positive samples. This motivates us to revisit the fundamental mechanism of positive samples in GCLs. From the perspective of Dirichlet energy, we theoretically finds that message passing, a key mechanism in graph encoders, trivializes the maximization of positive samples, preventing GCLs from effectively learning from positive samples. To address this, we propose SPGCL to mitigate the trivialization caused by message passing and restore the learning efficacy of positive samples. Specifically, we find that high Dirichlet energy features help positive samples provide effective learning signals while low Dirichlet energy features contribute little to positive learning signal but is useful for positive sampling. Based on this, SPGCL propagates only high Dirichlet energy features and uses low energy features to construct a probability matrix for reliable positive sampling. Extensive experiments demonstrate the effectiveness of SPGCL.

2606.10461 2026-06-10 cs.LG cs.AI cs.CL 新提交

ERAlign: Energy-based Representation Alignment of GNNs and LLMs on Text-attributed Graphs

ERAlign: 文本属性图上GNN与LLM的基于能量的表示对齐

Xianlin Zeng, Fan Xia, Xiangyu Chen

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出ERAlign框架,利用能量模型对齐GNN和LLM的表示,通过能量差异优化实现分布一致性,在8个数据集上取得最优性能。

Comments Accepted to ICML 2026

详情
AI中文摘要

文本属性图(TAGs)将文本节点属性与图结构相结合,以描述丰富的关联语义。最近整合图神经网络(GNNs)和大语言模型(LLMs)的努力在TAGs学习上显示出前景,但实现良好对齐的表示仍然具有挑战性。先前的研究主要依赖于执行粗粒度匹配的启发式方法。它们缺乏足够的约束,忽略了分布对齐,导致表示漂移和泛化能力有限。基于能量模型(EBMs),我们提出了一种基于能量的表示对齐(ERAlign)框架,该框架将GNN编码的图结构和LLM导出的文本嵌入投影到共享潜在空间,以实现分布一致性。具体来说,层间对齐通过距离度量量化,并通过EBM目标进行优化。通过降低能量值,我们的框架为下游任务产生良好对齐的表示。在训练过程中,我们引入能量差异(ED)以避免与难以处理的归一化相关的高采样成本。ED还具有更高的训练效率和减少能量景观失真的理论保证。在八个TAG数据集上的实证评估表明,ERAlign在不同监督水平和跨任务迁移场景下均获得了最先进的性能。

英文摘要

Text-attributed Graphs (TAGs) incorporate textual node attributes with graph structures to describe rich relational semantics. Recent efforts to integrate Graph Neural Networks (GNNs) and Large Language Models (LLMs) have shown promise for learning on TAGs, yet achieving well-aligned representations remains challenging. Prior studies largely rely on heuristics that perform coarse-grained matching. They lack sufficient constraints and ignore distributional alignment, leading to representation drift and limited generalization. Building on Energy-based Models (EBMs), we propose an Energy-based Representation Alignment (ERAlign) framework that projects GNN-encoded graph structure and LLM-derived text embeddings in a shared latent space to achieve distribution consistency. Concretely, layer-wise alignment is quantified by a distance metric and optimized via an EBM objective. By decreasing energy values, our framework yields well-aligned representations for downstream tasks. During training, we introduce Energy Discrepancy (ED) to avoid high sampling costs associated with intractable normalization. ED also carries theoretical guarantees of higher training efficiency and reduced energy landscape distortion. Empirical evaluations on eight TAG datasets demonstrate that ERAlign obtains state-of-the-art performance across varying levels of supervision and cross-task transfer scenarios.

2606.10789 2026-06-10 cs.LG 新提交

Closing the Modality Gap in Zero-Shot HAR: Contrastive Training and Separability-Optimized Prototypes on IMU Data

缩小零样本HAR中的模态差距:基于IMU数据的对比训练与可分性优化原型

Anik Ghosh

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 针对IMU基零样本人体活动识别中的模态差距问题,提出对比训练与描述性原型结合的方法,在PAMAP2数据集上实现73.2%准确率和0.583宏F1,并指出宏F1更适合作为评估指标。

Comments 17 pages, 7 figures

详情
AI中文摘要

基于惯性测量单元(IMU)的人体活动识别(HAR)中的零样本学习(ZSL)面临一个核心挑战:弥合传感器嵌入与语义类表示之间的差距。我们在PAMAP2数据集上系统评估了三种推理方法与两种训练流程组合的七种配置,使用14个已知和4个未知活动类别,并保留受试者108和109用于测试。我们发现模态差距是一个由编码器目标决定的训练时现象。使用标签名称的Sentence-BERT原型进行交叉熵训练的时间卷积网络(TCN)产生的传感器嵌入与对应文本原型的平均余弦相似度为0.30,而将标签名称原型目标替换为判别性活动描述后,该值提升至0.69。这种对齐改进在所有三种推理方法中一致迁移。最强的结果结合了对比训练与反向softmax校正,在未知类别上达到73.2%的准确率和0.583的宏F1,而标签名称基线仅为58.3%准确率和0.34宏F1。另一个发现是,更丰富的文本描述降低了Sentence-BERT空间中原型间的可分性,因为共享的生物力学词汇导致语言模型压缩了原型云。只要原型描述保留足够的判别性词汇,这种效应不会抵消对比对齐的好处。我们还证明,当测试集类别分布不平衡时,总体准确率是一个误导性的主要指标,并推荐宏平均F1作为ZSL-HAR基准的标准报告指标。

英文摘要

Zero-shot learning (ZSL) for inertial measurement unit (IMU)-based human activity recognition (HAR) faces a central challenge: bridging the gap between sensor embeddings and semantic class representations. We systematically evaluate seven configurations combining three inference methods with two training pipelines on the PAMAP2 dataset, using 14 seen and 4 unseen activity classes with subjects 108 and 109 held out for testing. We find that the modality gap is a training-time phenomenon governed by the encoder objective. A temporal convolutional network (TCN) trained with cross-entropy over label-name Sentence- BERT prototypes yields sensor embeddings with a mean cosine similarity of 0.30 to the corresponding text prototypes, while replacing the label-name prototype targets with discriminative activity descriptions raises this to 0.69. This alignment improvement transfers consistently across all three inference methods. The strongest result combines contrastive training with inverted softmax correction, achieving 73.2% accuracy and 0.583 macro F1 on unseen classes, compared to 58.3% accuracy and 0.34 macro F1 for the label-name baseline. A secondary finding is that richer text descriptions reduce inter-prototype separability in Sentence-BERT space, because shared biomechanical vocabulary causes the language model to compress the prototype cloud. This effect does not negate the benefits of contrastive alignment provided prototype descriptions retain sufficient discriminative vocabulary. We also demonstrate that overall accuracy is a misleading primary metric when test-set class distributions are imbalanced, and recommend macro-averaged F1 as the standard reporting metric for ZSL-HAR benchmarks.

2606.10975 2026-06-10 cs.LG eess.SP math.OC 新提交

Learning Doubly Sparse Explicitly Conditioned Transforms

学习双稀疏显式条件变换

Tudor Pistol

AI总结 提出一种将固定规范矩阵与自适应稀疏分量乘积形式的结构化显式条件变换学习方法,在保持快速稳定分析变换优势的同时引入可控自适应性,实验表明在双稀疏变换学习问题上达到最优性能。

Comments 10 pages, 1 figure, 1 table. Accepted for publication in Procedia Computer Science (30th International Conference on Knowledge-Based and Intelligent Information & Engineering Systems - KES 2026; Invited Session: Global and Constrained Optimization: Algorithms and Applications)

详情
AI中文摘要

在最近的研究中,找到自然信号假定稀疏结构成立的便利空间已成为一个理想结果,其影响体现在数据压缩、降噪和特征提取等领域。虽然广泛使用的分析变换(如DFT或DCT)已经提供了高效的算法和鲁棒的稀疏表示,但它们假设了关于数据的固定先验,无法准确捕捉更严格信号类别的特定结构。为了解决这个问题,文献中引入了数据自适应学习变换的概念,允许减少变换域中的残差项。最近的研究表明,条件数在此背景下是一个良好的度量,期望的结果在泛化倾向和实现最小近似误差之间交替。受这些考虑启发,我们引入了一种结构化显式条件变换的学习,该变换被表述为一个固定规范矩阵与一个精炼的数据自适应稀疏分量的乘积。这种方法旨在保留快速稳定分析变换的优势,同时引入对数据的可控自适应性。目前尚未发现涉及这种特定公式的参考文献,表明其新颖性。所提出的算法在不精确近端方法的框架内被推导,利用了一个新导出的闭式投影算子。实验观察表明,在双稀疏变换学习问题上取得了最先进的结果,并且与密集变体相比,在显著降低计算成本的同时,有时收敛更快且更好地避免不良局部最小值。

英文摘要

Finding convenient spaces in which certain hypotheses regarding an assumed sparse structure of natural signals hold true has become a desirable result in recent research, its implications being reflected in areas such as data compression, noise reduction and feature extraction. While the extensively used analytical transforms, such as DFT or DCT, already provide efficient algorithms and robust sparse representations, they assume a fixed prior about the data, failing to accurately capture the specific structure of more restrictive classes of signals. To address this, the concept of a data-adaptive, learnt transform has been introduced in the literature, allowing for the reduction of a residual term in the transform domain. More recent studies have shown that the condition number serves as a good metric in this context, where the desired outcome alternates between a generalizing tendency and one that achieves minimal approximation error. Motivated by these considerations, we introduce the learning of a structured, explicitly conditioned transform formulated as the product of a fixed canonical matrix and a refining data-adaptive sparse component. This approach seeks to preserve the advantages of fast and stable analytical transforms, while introducing controllable adaptivity to the data. No references that concern this specific formulation have been identified so far, indicating its novelty. The proposed algorithm is motivated within the framework of inexact proximal methods, leveraging a newly derived closed-form projection operator. Empirical observations demonstrate state-of-the-art results on the doubly sparse transform learning problem and comparable performance with its dense variant at significantly lower computational costs and sometimes faster convergence and better avoidance of bad local minima.

2606.09856 2026-06-10 cs.CL cs.AI cs.LG stat.ML 交叉投稿

Using Probabilistic Programs to Train Inductive Reasoning in Large Language Models

使用概率程序训练大型语言模型的归纳推理

Liyi Zhang, Akshay K. Jagadish, Brenden M. Lake, Thomas L. Griffiths

AI总结 提出基于程序的后验训练(PPT)方法,利用LLM生成概率程序场景,通过推理产生分布目标,微调模型以提升归纳推理准确性、与人类判断的一致性及校准能力。

Comments 20 pages, 5 figures

详情
AI中文摘要

大型语言模型(LLM)的后训练推理通常专注于数学和编码等演绎任务,其中正确性可验证。然而,许多现实世界的推理问题是归纳性的:智能体必须从稀疏、模糊的观测中推断不确定的信念。使用标准微调方法进行归纳推理面临挑战,包括难以策划大规模、高质量标注数据集以及处理本质上是分布式的目标。在这项工作中,我们引入了一种称为基于程序的后验训练(PPT)的新方法来解决这些局限性:我们使用LLM生成多样化的开放世界场景作为概率程序,运行概率推理以产生查询的分布式目标响应,然后在这些概率软标签上进行微调。使用这种方法,我们在10,000个程序生成的场景上微调LLM,并在保留的模板、人工标注的判断和外部基准上进行评估。总体而言,PPT显著提高了保留归纳任务的估计准确性,增强了与人类判断的一致性,并迁移到估计和校准的外部基准。此外,原始校准的增益并未被事后温度缩放所涵盖,表明与输出重新缩放相比,模型更深入地内化了不确定性。这些结果表明,概率程序介导的微调是一种有前景的方法,用于后训练LLM以可靠地执行近似归纳推理。

英文摘要

Post-training Large Language Models (LLMs) for reasoning typically focuses on deductive tasks such as mathematics and coding where correctness is verifiable. Yet, many real-world reasoning problems are inductive: agents must infer uncertain beliefs from sparse, ambiguous observations. There are challenges to using standard fine-tuning methods for inductive reasoning, including difficulties in curating large-scale, high-quality labeled datasets and in handling targets that are inherently distributional. In this work, we introduce a novel approach, called Program-based Posterior Training (PPT), to address these limitations: we use an LLM to generate diverse open-world scenarios as probabilistic programs, run probabilistic inference to produce distributional target responses to queries, and then fine-tune on these probabilistic soft labels. Using this approach, we fine-tune LLMs on 10,000 programmatically generated scenarios and evaluate on held-out motifs, human-labeled judgments, and external benchmarks. Overall, PPT substantially improves estimation accuracy on held-out inductive tasks, increases alignment with human judgments, and transfers to external benchmarks for estimation and calibration. Additionally, the gains in raw calibration are not subsumed by post-hoc temperature scaling, showing that the models have more deeply internalized uncertainty compared to output rescaling. Together, these results suggest that probabilistic-program-mediated fine-tuning is a promising approach for post-training LLMs to reliably perform approximate inductive inference.

2606.09893 2026-06-10 eess.IV cs.AI cs.LG 交叉投稿

Tractogram foundation model

TractFM:纤维束图基础模型

Guikun Chen, Yuqian Chen, Yijie Li, Yogesh Rathi, Nikos Makris, Fan Zhang, Wenguan Wang, Lauren J. O'Donnell

发表机构 * The State Key Lab of Brain-Machine Intelligence, Zhejiang University, Hangzhou(脑机智能国家重点实验室,浙江大学,杭州) Department of Radiology, Brigham and Women’s Hospital, Mass General Brigham, Boston(放射科,布里洛妇女医院,马萨诸塞总医院,波士顿) Harvard Medical School, Boston(哈佛医学院,波士顿) Academy of Medical Engineering and Translational Medicine, Tianjin University, Tianjin(医学工程与转化医学研究院,天津大学,天津) School of Information and Communication Engineering, University of Electronic Science and Technology of China, Chengdu(信息与通信工程学院,电子科技大学,成都) Psychiatry Neuroimaging Laboratory, Brigham and Women’s Hospital, Mass General Brigham, Boston(精神病神经影像实验室,布里洛妇女医院,马萨诸塞总医院,波士顿) Department of Psychiatry, Center for Morphometric Analysis, Massachusetts General Hospital, Boston(精神病科,形态分析中心,马萨诸塞总医院,波士顿)

AI总结 提出TractFM基础模型,直接从全脑纤维束集学习可复用表示,结合局部纤维编码器和置换等变纤维束编码器,通过密集解剖束分割预训练,实现纤维束级和受试者级任务的迁移。

详情
AI中文摘要

扩散MRI(dMRI)纤维束成像是在活体人脑中绘制白质通路的唯一非侵入性方法。它将每个大脑表示为一个纤维束图:一个大型、无序的三维流线集合,包含局部流线几何和全脑解剖组织的信息。这种结构使纤维束图成为表示学习的自然但具有挑战性的目标。现有方法将流线分类和受试者级预测视为独立问题:流线分类器关注几何模式,而受试者级预测通常依赖于手工特征。因此,当前方法无法学习连接流线解剖与全脑受试者间变异的可复用表示。本文介绍TractFM,一个纤维束图基础模型,直接从全脑纤维束集学习可复用表示。TractFM结合了局部流线编码器和置换等变纤维束编码器,使得一个受试者的所有流线能够在单次前向传递中共同上下文化。在密集解剖束分割(即给单个流线分配解剖标签)上的预训练产生了两种互补表示:用于束分割的上下文化流线级嵌入和用于下游受试者表型预测的紧凑受试者级描述符。在三种纤维束成像算法和五个dMRI数据集上,TractFM迁移到流线级和受试者级任务。其冻结表示实现了准确的束分割,并在独立数据集上预测年龄和性别。这些结果表明,全脑几何上下文(一次性学习)可以泛化到纤维束成像流程、数据集和预测任务中。

英文摘要

Diffusion MRI (dMRI) tractography is the only noninvasive approach for mapping white-matter pathways in the living human brain. It represents each brain as a tractogram: a large, unordered set of three-dimensional streamlines that includes information about both local streamline geometry and whole-brain anatomical organization. This structure makes tractograms a natural but challenging target for representation learning. Existing methods treat streamline classification and subject-level prediction as separate problems: streamline classifiers focus on geometric patterns, whereas subject-level prediction often depends on hand-crafted features. As a result, current methods do not learn reusable representations that connect streamline anatomy with whole-brain inter-subject variation. Here we introduce TractFM, a tractogram foundation model that learns reusable representations directly from whole-brain streamline sets. TractFM combines a local streamline encoder with a permutation-equivariant tractogram encoder, allowing all streamlines from a subject to be contextualized jointly in a single forward pass. Pretraining on dense anatomical tract parcellation, i.e., assigning anatomical labels to individual streamlines, yields two complementary representations: contextualized streamline-level embeddings for tract parcellation and compact subject-level descriptors for downstream prediction of subject phenotypes. Across three tractography algorithms and five dMRI datasets, TractFM transfers to both streamline-level and subject-level tasks. Its frozen representations achieve accurate tract parcellation and predict age and sex across independent datasets. These results show that whole-brain geometric context, learned once, can generalize across tractography pipelines, datasets, and prediction tasks.

2502.07027 2026-06-10 cs.LG cs.AI 版本更新

Representational Alignment with Chemical Induced Fit for Molecular Relational Learning

基于化学诱导契合的表征对齐用于分子关系学习

Peiliang Zhang, Jingling Yuan, Qing Xie, Yongjun Zhu, Chao Che, Lin Li

发表机构 * Wuhan University of Technology(武汉理工大学) Yonsei University(延世大学) Hubei Key Laboratory of Transportation Internet of Things(湖北省交通运输物联网重点实验室) Dalian University(大连大学)

AI总结 提出ReAlignFit方法,通过引入化学诱导契合的归纳偏置动态对齐子结构表征,并利用子图信息瓶颈优化高化学功能兼容性的子结构对,以提升分子关系学习在化学空间偏移数据上的稳定性。

Comments Accepted by SIGKDD2026 AI for Science Track

详情
AI中文摘要

分子关系学习(MRL)广泛应用于自然科学中,通过提取结构特征来预测分子对之间的关系。子结构对之间的表征相似性决定了分子结合位点的功能兼容性。然而,通过注意力机制对齐子结构表征缺乏化学知识的指导,导致模型在化学空间(例如,官能团、骨架)偏移数据上的性能不稳定。在理论论证的基础上,我们提出了基于化学诱导契合的表征对齐(ReAlignFit)来增强MRL的稳定性。ReAlignFit通过引入基于化学诱导契合的归纳偏置,动态对齐MRL中的子结构表征。在诱导过程中,我们设计了基于子结构边重构的偏差校正函数,通过模拟化学构象变化(子结构的动态组合)来对齐子结构对之间的表征。ReAlignFit进一步在契合过程中集成子图信息瓶颈,以精炼和优化具有高化学功能兼容性的子结构对,并利用它们生成分子嵌入。在九个数据集上的实验结果表明,ReAlignFit在两个任务中均优于最先进的模型,并显著增强了模型在规则偏移和骨架偏移数据分布上的稳定性。

英文摘要

Molecular Relational Learning (MRL) is widely applied in natural sciences to predict relationships between molecular pairs by extracting structural features. The representational similarity between substructure pairs determines the functional compatibility of molecular binding sites. Nevertheless, aligning substructure representations by attention mechanisms lacks guidance from chemical knowledge, resulting in unstable model performance in chemical space (\textit{e.g.}, functional group, scaffold) shifted data. With theoretical justification, we propose the \textbf{Re}presentational \textbf{Align}ment with Chemical Induced \textbf{Fit} (ReAlignFit) to enhance the stability of MRL. ReAlignFit dynamically aligns substructure representation in MRL by introducing chemical Induced Fit-based inductive bias. In the induction process, we design the Bias Correction Function based on substructure edge reconstruction to align representations between substructure pairs by simulating chemical conformational changes (dynamic combination of substructures). ReAlignFit further integrates the Subgraph Information Bottleneck during fit process to refine and optimize substructure pairs exhibiting high chemical functional compatibility, leveraging them to generate molecular embeddings. Experimental results on nine datasets demonstrate that ReAlignFit outperforms state-of-the-art models in two tasks and significantly enhances model's stability in both rule-shifted and scaffold-shifted data distributions.

2505.11702 2026-06-10 cs.LG stat.ML 版本更新

Post-Training Augmentation Invariance

训练后增强不变性

Keenan Eikenberry, Lizuo Liu, Yoonsang Lee

发表机构 * Department of Mathematics, Dartmouth College(达特茅斯学院数学系)

AI总结 提出训练后增强不变性框架,通过轻量级MLP适配器网络在预训练模型潜空间上实现近似不变性,无需微调且保持原始特征。

详情
AI中文摘要

本文开发了一个训练后增强不变性的框架,其目标是为预训练网络添加不变性属性,同时不改变其在原始非增强输入分布上的行为。我们精确定义了这一概念,并引入了增强编码器,这是一种概率编码器,形式化了基于增强的编码过程,并作为我们的基本研究对象。我们提出了两种增强编码器的损失函数,即马尔可夫-瓦瑟斯坦最小化和瓦瑟斯坦相关性最大化,并通过实验证明,这两种损失函数可用于训练轻量级的单隐藏层MLP适配器网络$E_{\theta}$,当将其附加到预训练网络$F$的潜空间时,确实能实现(近似)训练后增强不变性。例如,在STL10上使用$F=\text{DINO}$特征时,复合网络$C\circ E_{\theta}\circ F$(其中$C$是线性分类器,$E_{\theta}$是我们提出的适配器网络之一)在任意旋转图像上达到94%的分类准确率,而没有适配器$E_{\theta}$的$C\circ F$网络则降至71%。类似地,我们可以将噪声不变分类结果从58%提升至86%。重要的是,我们无需微调即可获得这些结果($F$的权重全程冻结),并且我们的方法对原始特征的破坏很小,因为$E_{\theta}$在非增强潜分布上几乎等距作用。相比之下,我们展示了使用替代候选损失函数(特别是SimCLR和HSIC最大化)训练的适配器网络产生了不具竞争力的分类结果,并从根本上破坏了原始潜空间。代码见https://this URL。

英文摘要

This work develops a framework for post-training augmentation invariance, in which our goal is to add invariance properties to a pretrained network without altering its behavior on the original, non-augmented input distribution. We define this notion precisely and additionally introduce augmented encoders, which are probabilistic encoders that formalize augmentation-based encoding processes and that serve as our fundamental object of study. We introduce two losses for augmented encoders, namely, Markov-Wasserstein minimization and Wasserstein correlation maximization, and we demonstrate empirically that both losses can be used to train lightweight, one-hidden-layer MLP adapter networks $E_θ$ that, when appended to the latent space of a pretrained network $F$, do indeed lead to (approximate) post-training augmentation invariance. For example, on STL10 with $F=\text{DINO}$ features, the composite network $C\circ E_θ\circ F$, where $C$ is a linear classifier and where $E_θ$ is one of our proposed adapter networks, achieves 94% classification accuracy on arbitrarily rotated images, whereas a network of the form $C\circ F$ without the adapter $E_θ$ drops to 71% accuracy. Similarly, we can boost noise-invariant classification results from 58% up to 86%. Significantly, we obtain these results with no fine-tuning (the weights of $F$ remain frozen throughout), and our methods introduce little corruption to the original features, since $E_θ$ acts nearly isometrically on the non-augmented latent distribution. In contrast, we show that adapter networks trained with alternative candidate losses, specifically SimCLR and HSIC maximization, produce uncompetitive classification results and fundamentally corrupt the original latent space. Code available at https://github.com/keenan-eikenberry/augmentation_invariance

2602.19393 2026-06-10 cs.LG 版本更新

In Defense of Cosine Similarity: Normalization Eliminates the Gauge Freedom

为余弦相似度辩护:归一化消除了规范自由度

Taha Bouhsine

AI总结 本文证明,当嵌入被约束到单位球面时,对角规范矩阵的歧义消失,余弦距离与欧氏距离单调等价,从而解决了余弦相似度任意性的问题。

Comments This was a blog post companion draft, it needs to be updated to fit as a preprint, will do later

详情
AI中文摘要

Steck、Ekanadham 和 Kallus [arXiv:2403.05440] 表明,来自矩阵分解模型的学习嵌入的余弦相似度可以通过对角“规范”矩阵 $D$ 变得任意。他们的结果对于使用点积目标训练嵌入并计算余弦相似度的从业者来说是正确的且重要的。然而,我们认为,他们得出的普遍反对余弦相似度的结论,混淆了不兼容训练目标的病理与单位球面上余弦距离的几何有效性。我们证明,当嵌入被约束到单位球面 $\mathbb{S}^{d-1}$ 时(无论是在训练期间还是之后使用适当的目标),$D$ 矩阵的歧义完全消失,并且余弦距离恰好等于平方欧氏距离的一半。这种单调等价性意味着,在归一化嵌入上,基于余弦和基于欧氏距离的邻居排名是相同的。余弦相似度的“问题”不在于余弦相似度本身,而在于未能进行归一化。

英文摘要

Steck, Ekanadham, and Kallus [arXiv:2403.05440] demonstrate that cosine similarity of learned embeddings from matrix factorization models can be rendered arbitrary by a diagonal ``gauge'' matrix $D$. Their result is correct and important for practitioners who compute cosine similarity on embeddings trained with dot-product objectives. However, we argue that their conclusion, cautioning against cosine similarity in general, conflates the pathology of an incompatible training objective with the geometric validity of cosine distance on the unit sphere. We prove that when embeddings are constrained to the unit sphere $\mathbb{S}^{d-1}$ (either during or after training with an appropriate objective), the $D$-matrix ambiguity vanishes identically, and cosine distance reduces to exactly half the squared Euclidean distance. This monotonic equivalence implies that cosine-based and Euclidean-based neighbor rankings are identical on normalized embeddings. The ``problem'' with cosine similarity is not cosine similarity, it is the failure to normalize.

2606.06624 2026-06-10 cs.LG 版本更新

Principles and Practice of Deep Representation Learning: or a Mathematical Theory of Memory

深度表示学习的原理与实践:或记忆的数学理论

Sam Buchanan, Druv Pai, Peng Wang, Yi Ma

AI总结 本书通过表示学习视角,用优化和信息论解释现代神经网络架构设计原理,旨在打开黑箱,提高可解释性、可靠性和可控性。

Comments version 2; TeX source and supplementary material at https://ma-lab-berkeley.github.io/deep-representation-learning-book/

详情
AI中文摘要

在当前深度学习和特别是生成模型的时代,训练非常大的生成模型投入巨大。到目前为止,这类模型是难以理解的“黑箱”,因为它们具有不透明的内部机制,导致在可解释性、可靠性和可控性方面存在困难。自然,这种缺乏理解导致了炒作和恐惧。本书试图通过表示学习的视角“打开黑箱”并理解大型深度网络的机制,这是深度学习模型经验能力的主要因素——可以说是最重要的因素。本书简要大纲如下:第1章将总结贯穿全文的线索。第2、3、4、5和6章将通过优化和信息论解释现代神经网络架构的设计原理,一旦引入基本原理,就将架构开发过程(长期以来被描述为一种“炼金术”)简化为本科水平的线性代数和微积分练习。第7章和第8章将讨论这些原理在更范式化的问题解决中的应用,获得新的方法和模型,这些模型在设计上高效、可解释且可控,但又不亚于——有时甚至超过——它们所模仿的黑箱模型。第9章将讨论深度学习的潜在未来方向、表示学习的作用以及一些开放问题。

英文摘要

In the current era of deep learning and especially generative models, there is significant investment in training very large deep neural networks. Thus far, such models have been "black boxes" that are difficult to understand in the sense that they have opaque internal mechanisms, leading to difficulties in interpretability, reliability, and control. Naturally, this lack of understanding has led to both hype and fear. This book is an attempt to "open the black box" and understand the mechanisms of large deep networks, through the perspective of representation learning, which is a major factor - arguably the single most important one - in the empirical power of deep learning models. A brief outline of this book is as follows. Chapter 1 will summarize the threads that underlie the whole text. Chapters 2, 3, 4, 5, and 6 will explain the design principles of modern neural network architectures through optimization and information theory, reducing the process of architecture development (long having been described as a sort of "alchemy") to undergraduate-level linear algebra and calculus exercises once the underlying principles are introduced. Chapters 7 and 8 will discuss applications of these principles to solve problems in more paradigmatic ways, obtaining new methods and models which are efficient, interpretable, and controllable by design, and yet no less - sometimes even more - powerful than the black-box models they resemble. Chapter 9 will discuss potential future directions for deep learning, the role of representation learning, as well as some open problems.

3. 强化学习与序列决策 37 篇

2606.09883 2026-06-10 cs.LG cs.AI 新提交

TD-Grokking: Learning from Zero-Reward Problems by Training-Time Decomposition

TD-Grokking:通过训练时分解从零奖励问题中学习

Ningyuan Xi, Hao Xu, Hongsheng Xin, Ning Miao

发表机构 * Ningyuan Xi 1,2(西宁元 1,2) Hao Xu 3(许浩 3) Hongsheng Xin 3(辛红生 3) Ning Miao 1,2, †(苗宁 1,2, †)

AI总结 针对强化学习在零奖励问题上无法提供优化信号的问题,提出训练时分解框架TD-Grokking,将难解问题递归分解为可验证子问题,在数学和医疗任务上优于基线方法。

详情
AI中文摘要

大型语言模型(LLMs)在推理任务上取得了显著进展,这主要归功于后训练范式,特别是基于可验证奖励的强化学习(RLVR)。然而,一个关键瓶颈依然存在:RLVR在极具挑战性的零奖励问题上失败,因为所有采样的推理轨迹都产生统一失败的结果,无法提供优化信号来驱动模型改进。先前解决这一限制的努力,如密集过程监督、部分奖励分配或前缀引导探索,受到固有任务约束的限制,或者未能完全赋予策略模型解决原始难解问题所需的能力。为了解决这个问题,我们提出了TD-Grokking,一个针对零奖励问题的训练时分解框架。它递归地将难解的根问题分解为自包含、可验证的子问题,形成层次树,其中可解的叶子节点提供非零奖励。在数学和医疗任务上的评估表明,TD-Grokking优于普通的GRPO以及所有基线方法。结合详细分析,这些结果证实训练时分解有效地将零奖励示例转化为可用的训练信号,从而实现一致的性能提升。我们的代码和数据集可在以下网址获取:https://this URL。

英文摘要

Large language models (LLMs) have made remarkable progress in reasoning tasks, largely driven by post-training paradigms, especially reinforcement learning with verifiable rewards (RLVR). However, a critical bottleneck persists: RLVR fails on highly challenging zero-reward problems, where all sampled reasoning trajectories yield uniformly failed outcomes, providing no optimization signal to drive model improvement. Prior efforts to address this limitation, such as dense process supervision, partial reward assignment, or prefix-guided exploration, suffer from inherent task constraints or do not fully equip the policy model with the capabilities necessary to solve the original intractable problems. To address this, we propose TD-Grokking, a training-time decomposition framework for zero-reward problems. It recursively decomposes intractable root problems into self-contained, verifiable subproblems, forming hierarchical trees where solvable leaves provide non-zero rewards. Evaluations on mathematical and medical tasks show that TD-Grokking outperforms vanilla GRPO as well as all baseline approaches. Together with detailed analysis, these results confirm that training-time decomposition effectively converts zero-reward examples into usable training signals, enabling consistent performance gains. Our code and datasets are available at https://anonymous.4open.science/r/TD-Grokking-6567/.

2606.09887 2026-06-10 cs.LG cs.AI cs.CL 新提交

SocraticPO: Policy Optimization via Interactive Guidance

SocraticPO: 通过交互式指导进行策略优化

Zirui Liu, Jie Ouyang, Qi Liu, Xianquan Wang, Jiayu Liu, Tingyue Pan, Qingchuan Li, Jing Sha, Zhenya Huang, Shijin Wang, Enhong Chen

发表机构 * State Key Laboratory of Cognitive Intelligence, University of Science and Technology of China(认知智能国家重点实验室,中国科学技术大学) iFLYTEK AI Research (Central China), iFLYTEK Co., Ltd(iFLYTEK中央中国AI研究院,iFLYTEK公司)

AI总结 提出SocraticPO框架,在强化学习中使用自然语言指导辅助推理,并通过奖励衰减防止模型依赖教师帮助,提升科学推理任务性能。

详情
AI中文摘要

用于大语言模型的强化学习通常使用标量结果奖励(如二元正确性)来监督推理。这种奖励提供了优化方向,但很少解释模型应如何修正其错误推理,这可能鼓励捷径学习和脆弱的策略。我们提出\textbf{SocraticPO}(苏格拉底式策略优化),一种策略优化框架,用苏格拉底式的自然语言指导增强强化学习展开。在展开过程中,学生首先独立回答;如果答案错误,教师诊断尝试并提供简洁的纠正性指导,之后学生在扩展的上下文下继续。关键的是,这种指导与奖励衰减配对:在教师干预后获得的正确答案只得到衰减的奖励,防止策略将教师帮助视为获取奖励的免费途径。由于SocraticPO只修改展开过程,而保持标准期望奖励目标不变,它可以插入到现有的策略梯度后端(如Reinforce++)中。此外,由于教师只提供文本级指导,SocraticPO可以利用更强的黑盒教师模型,而无需访问logits或分布匹配。在来自SciKnowEval的本科水平科学推理基准上,SocraticPO优于强强化学习和自蒸馏基线。消融实验表明,目标指导和奖励衰减都是必要的,奖励衰减减轻了对辅助纠正的依赖。

英文摘要

Reinforcement learning (RL) for large language models usually supervises reasoning with scalar outcome rewards, such as binary correctness. Such rewards provide an optimization direction but rarely explain how a model should revise its mistaken reasoning, which can encourage shortcut learning and brittle policies. We propose \textbf{SocraticPO} (Socratic Policy Optimization), a policy-optimization framework that augments RL rollouts with Socratic-style natural-language guidance. During rollout, the student first answers independently; if the answer is incorrect, a teacher diagnoses the attempt and provides concise corrective guidance, after which the student continues under the expanded context. Crucially, this guidance is paired with reward decay: correct answers obtained after teacher intervention only receive decayed rewards, preventing the policy from treating teacher help as a free path to reward. Since SocraticPO only modifies the rollout process while leaving the standard expected-reward objective intact, it can be plugged into existing policy-gradient backends such as Reinforce++. Moreover, because the teacher provides only text-level guidance, SocraticPO can leverage stronger black-box teacher models without requiring access to logits or distribution matching. On undergraduate-level scientific reasoning benchmarks from SciKnowEval, SocraticPO improves over strong RL and self-distillation baselines. Ablations show that both targeted guidance and reward decay are necessary, with reward decay mitigating reliance on assisted correction.

2606.09932 2026-06-10 cs.LG cs.AI 新提交

When RL Fails after SFT: Rejuvenating Model Plasticity for Robust SFT-to-RL Handoff

当强化学习在监督微调后失效:恢复模型可塑性以实现稳健的SFT到RL交接

Runze Liu, Jiashun Liu, Xu Wan, Yuqian Fu, Ling Pan

AI总结 针对SFT过度训练导致RL阶段改进有限的问题,提出Rejuvenation方法,通过基模型锚定融合和神经元重置恢复模型可塑性,在数学推理和智能体任务上提升RL性能。

详情
AI中文摘要

监督微调(SFT)后接强化学习(RL)已成为大语言模型(LLM)后训练的标准流程。SFT预期为RL提供有用的行为先验,以进一步增强模型能力。然而,过度SFT的检查点在RL中往往表现出有限的改进。我们将此失败归因于模型可塑性的丧失:SFT初始化的策略被后续RL有效重塑的能力降低。为了更好地理解这一现象,我们从参数变化、输出空间和RL优化动态等多个角度进行了详细分析。我们的结果表明,过度SFT的模型倾向于产生过度自信的token分布,并表现出尖锐的参数景观,这使得它们在RL阶段更难优化。为了实现更稳健的SFT到RL交接,我们提出了Rejuvenation,一种简单而有效的方法,在保留有用的SFT获取先验的同时恢复可塑性。Rejuvenation利用基于基模型的模型融合来减少过度SFT引起的漂移,并通过有针对性的神经元重置来缓解模型僵化。在数学推理任务和智能体任务上的实验结果表明,我们的方法在过度训练的SFT模型上持续提升了RL性能,同时也增强了对分布外任务的泛化能力。

英文摘要

Supervised Fine-Tuning (SFT) followed by Reinforcement Learning (RL) has become a standard pipeline for Large Language Model (LLM) post-training. SFT is expected to provide a useful behavioral prior for RL to further enhance model capabilities. However, checkpoints with excessive SFT often show limited improvement during RL. We attribute this failure to the loss of model plasticity: the reduced ability of an SFT-initialized policy to be effectively reshaped by subsequent RL. To better understand this phenomenon, we conduct detailed analysis from multiple perspectives, including parameter changes, output spaces, and RL optimization dynamics. Our results show that models from excessive SFT tend to produce over-confident token distributions and exhibit sharp parameter landscapes, which make them harder to optimize in the RL stage. To enable a more robust SFT-to-RL handoff, we propose \texttt{Rejuvenation}, a simple yet effective method that restores plasticity while preserving useful SFT-acquired priors. Rejuvenation leverages base-anchored model fusion to reduce excessive SFT-induced drift with targeted neuron reset to mitigate model rigidity. Experimental results on both math reasoning tasks and agentic tasks demonstrate that our approach consistently improves RL performance on over-trained SFT models, while also enhancing generalization to out-of-distribution tasks.

2606.09961 2026-06-10 cs.LG cs.AI 新提交

3SPO: State-Score-Supervised Policy Optimization for LLM Agents

3SPO: 面向LLM智能体的状态分数监督策略优化

Yu Han, Kailing Li, Yang Jiao, Yulin Dai, Yuqian Fu, Linhai Zhuo, Tianwen Qian

AI总结 提出3SPO算法,通过动态状态分数监督实现逐步骤策略优化,解决多轮智能体任务中奖励稀疏和信用分配问题,在ALFWorld和WebShop上分别比GRPO提升22.6%和15.6个百分点。

详情
AI中文摘要

通过强化学习(RL)将大型语言模型(LLM)训练为自主智能体,已使前沿模型在长周期任务中实现超人类性能。然而,现有RL算法在轨迹级别操作,仅在收集完整回合后执行策略优化。这种粗粒度方法在多轮智能体设置中面临根本性挑战,其中奖励稀疏、延迟,且跨单个步骤的信用分配至关重要。在这项工作中,我们提出\textbf{状态分数监督策略优化(3SPO)},一种新颖的RL算法,通过动态状态分数监督执行逐步骤策略优化。在每个步骤,3SPO基于历史成功率计算状态分数,监督逐步骤信用分配、自适应回合和逐步骤策略优化,无需价值函数估计或额外辅助模型。理论上,在每状态臂架抽象下,我们证明所提出的分数监督分配机制实现了对数分配遗憾,并为动作识别、分数可区分性和过滤稳定性提供了样本复杂度保证。在ALFWorld和WebShop上使用Qwen2.5-1.5B/7B-Instruct的实验表明,3SPO在ALFWorld上持续优于GRPO $+22.6\%$,在WebShop上优于$+15.6$个百分点,同时使用相当资源实现了$2.4\times$更多的状态探索和$1.8\times$更快的收敛。代码可从此https URL获取。

英文摘要

Training large language models (LLMs) as autonomous agents via reinforcement learning (RL) has enabled frontier models to achieve superhuman performance in long-horizon tasks. However, existing RL algorithms operate at the trajectory level, performing policy optimization only after collecting complete episode rollouts. This coarse-grained approach faces fundamental challenges in multi-turn agent settings where rewards are sparse, delayed, and credit assignment across individual steps is critical. In this work, we propose \textbf{State-Score-Supervised Policy Optimization (3SPO)}, a novel RL algorithm that performs post-step policy optimization with dynamic state score supervision. At each step, 3SPO computes the state score based on historical success rates, supervising step-wise credit assignment, adaptive rollout and post-step policy optimization without requiring value function estimation or additional auxiliary models. Theoretically, under a per-state bandit abstraction, we show that the proposed score-supervised allocation mechanism achieves logarithmic allocation regret and provide sample-complexity guarantees for action identification, score distinguishability, and filtering stability. Experiments on ALFWorld and WebShop with Qwen2.5-1.5B/7B-Instruct show that 3SPO consistently outperforms GRPO by $+22.6\%$ on ALFWorld and $+15.6$ points on WebShop, while using comparable resources to achieve $2.4\times$ more state exploration and $1.8\times$ faster convergence. Code is available at https://github.com/genalyu/3SPO.

2606.10129 2026-06-10 cs.LG cs.NE 新提交

Discovering Interpretable Multi-Parameter Control Policies for Evolutionary Algorithms Using Deep Reinforcement Learning

使用深度强化学习发现可解释的进化算法多参数控制策略

Tai Nguyen, Phong Le, Carola Doerr, Nguyen Dang

AI总结 针对进化算法多参数控制缺乏可解释策略的问题,提出深度强化学习结合动作空间分解、奖励平移和长期折扣的方法,蒸馏出符号控制策略,在OneMax问题上超越现有基线。

Comments arXiv admin note: text overlap with arXiv:2505.12982

详情
AI中文摘要

虽然深度强化学习(deep-RL)已越来越多地应用于进化算法中的参数控制,但由于难以推导出适合形式化研究的有效、可解释的多参数策略,参数控制的严格理论分析在很大程度上仍局限于单参数设置。我们展示了如何利用深度强化学习克服这一障碍,以优化OneMax的(1+($\lambda$,$\lambda$))-遗传算法作为代表性案例研究,这是少数几个已正式证明动态控制具有超常数加速的问题之一。我们首先表明标准方法在这种多参数设置下难以收敛,并引入算法无关的增强技术,针对动作空间分解、奖励平移和长期折扣。在这些技术到位后,我们比较了常见的深度强化学习方法,发现双深度Q网络(Double Deep Q-Networks)独特地避免了近端策略优化(Proximal Policy Optimization)中观察到的策略崩溃,从而产生适合下游分析的轨迹。至关重要的是,我们通过将学习到的行为蒸馏为透明的符号控制策略,超越了神经网络的“黑箱”性质。由此产生的策略不仅为未来的理论分析提供了可解释性,而且表现出卓越的性能,在广泛的问题规模上始终优于现有基线。

英文摘要

While deep Reinforcement Learning (deep-RL) has been increasingly applied to parameter control in evolutionary algorithms, rigorous theoretical analysis of parameter control remains largely restricted to single-parameter settings, owing to the difficulty of deriving effective, interpretable multi-parameter policies amenable to formal study. We demonstrate how deep-RL can be leveraged to overcome this barrier, using the (1+($λ$,$λ$))-genetic algorithm optimizing OneMax, one of the few problems where a super-constant speedup of dynamic control has been formally proven, as a representative case study. We first show that standard approaches struggle to converge in this multi-parameter setting, and introduce algorithm-agnostic enhancements targeting action-space decomposition, reward shifting, and long-horizon discounting. With these in place, we compare common deep-RL methods and find that Double Deep Q-Networks uniquely avoid the policy collapse observed in Proximal Policy Optimization, yielding trajectories suitable for downstream analysis. Crucially, we move beyond the ``black-box'' nature of neural networks by distilling the learned behaviors into a transparent, symbolic control policy. This resulting policy does not only offer interpretability for future theoretical analysis but also yields exceptional performance, consistently outperforming existing baselines across a wide range of problem sizes.

2606.10228 2026-06-10 cs.LG cs.AI cs.RO 新提交

SHAPO: Sharpness-Aware Policy Optimization for Safe Exploration

SHAPO: 面向安全探索的锐度感知策略优化

Kaustubh Mani, Yann Pequignot, Vincent Mai, Liam Paull

AI总结 提出SHAPO算法,通过锐度感知策略更新隐式重加权梯度,放大罕见不安全动作的影响,抑制安全动作的贡献,从而在欠探索区域实现保守行为,提升安全性与任务性能。

Comments ICLR 2026

详情
AI中文摘要

安全探索是在安全关键领域部署强化学习(RL)智能体的先决条件。在本文中,我们通过认知不确定性的视角来探讨安全探索,其中智能体对参数扰动的敏感性作为高不确定性区域的实际代理。我们提出了锐度感知策略优化(SHAPO),一种锐度感知的策略更新规则,该规则在扰动参数处评估梯度,使得策略更新相对于智能体的认知不确定性变得悲观。分析表明,这种调整隐式地重新加权了策略梯度,放大了罕见不安全动作的影响,同时抑制了已安全动作的贡献,从而在欠探索区域将学习偏向于保守行为。在多个连续控制任务中,我们的方法在安全性和任务性能上均持续优于现有基线,显著扩展了它们的帕累托前沿。

英文摘要

Safe exploration is a prerequisite for deploying reinforcement learning (RL) agents in safety-critical domains. In this paper, we approach safe exploration through the lens of epistemic uncertainty, where the actor's sensitivity to parameter perturbations serves as a practical proxy for regions of high uncertainty. We propose Sharpness-Aware Policy Optimization (SHAPO), a sharpness-aware policy update rule that evaluates gradients at perturbed parameters, making policy updates pessimistic with respect to the actor's epistemic uncertainty. Analytically we show that this adjustment implicitly reweighs policy gradients, amplifying the influence of rare unsafe actions while tempering contributions from already safe ones, thereby biasing learning toward conservative behavior in under-explored regions. Across several continuous-control tasks, our method consistently improves both safety and task performance over existing baselines, significantly expanding their Pareto frontiers.

2606.10321 2026-06-10 cs.LG cs.AI cs.RO math.OC 新提交

Baseline-Free Policy Optimization for Neural Combinatorial Optimization

无基线的神经组合优化策略优化

Carlos S. Sepúlveda, Gonzalo A. Ruz

AI总结 提出使用GRPO算法消除神经组合优化中的基线依赖,避免训练崩溃,在TSP和CVRP上达到接近POMO的性能。

详情
AI中文摘要

神经组合优化(NCO)训练自回归策略以解决路由问题。标准训练算法REINFORCE使用滚动基线,需要维护并定期更新策略的冻结副本以降低方差。这种基线引入了一个结构脆弱性:在更难的问题实例上,较差的基线会产生噪声梯度估计,从而破坏训练稳定性。我们评估了来自大语言模型对齐的组相对策略优化(GRPO),该算法通过归一化组内采样轨迹的优势完全消除了基线。在RL4CO框架内对TSP和CVRP基准上的五种RL算法进行受控比较,我们发现:(i) GRPO避免了REINFORCE在TSP-100上观察到的训练崩溃,其中性能在预热阶段后立即从成本9.8下降到52.1,并且在延长训练下无法恢复;(ii) 在匹配的梯度更新次数下,GRPO达到了与POMO(一种基于AM的强多起点基线)在2%以内的解质量,同时无需外部基线;(iii) P3O,一种也来自对齐文献的成对偏好算法,在TSP上具有竞争力,但在CVRP上表现出更高的变异性。这些结果表明GRPO是一种有前途的无基线NCO替代方案,特别是在基线依赖训练变得脆弱的场景中。

英文摘要

Neural combinatorial optimization (NCO) trains autoregressive policies to solve routing problems. The standard training algorithm, REINFORCE with a rollout baseline, requires maintaining and periodically updating a frozen copy of the policy for variance reduction. This baseline introduces a structural vulnerability: on harder instances, a poor baseline produces noisy gradient estimates that can destabilize training. We evaluate Group Relative Policy Optimization (GRPO), an algorithm from large language model alignment that eliminates the baseline entirely by normalizing advantages within groups of sampled trajectories. In a controlled comparison of five RL algorithms on TSP and CVRP benchmarks within the RL4CO framework, we find that: (i) GRPO avoids the training collapse observed with REINFORCE on TSP-100, where performance degrades from cost 9.8 to 52.1 immediately after the warmup phase and does not recover under extended training; (ii) at matched gradient updates, GRPO achieves solution quality within 2% of POMO, a strong AM-based multi-start baseline, while requiring no external baseline; and (iii) P3O, a pairwise preference algorithm also from the alignment literature, is competitive on TSP but shows higher variability on CVRP. These results identify GRPO as a promising baseline-free alternative for NCO, particularly in settings where baseline-dependent training becomes fragile.

2606.10448 2026-06-10 cs.LG cs.AI 新提交

Mitigating Bias in Low-SNR Financial Reinforcement Learning via Quantum Representations

通过量子表示缓解低信噪比金融强化学习中的偏差

Zeyu Liu, Xuanzhi Feng, Sing Kwong Lai, Yuanchen Gao, Xiaoyi Pang, Hualei Zhang, Jingcai Guo, Jie Zhang, Song Guo

发表机构 * The Hong Kong University of Science and Technology(香港科技大学)

AI总结 针对低信噪比金融市场中SAC算法的不稳定性,提出FPQC-SAC变体,在表征层使用参数化量子电路约束特征传播,减少极端波动影响,在真实组合管理任务中累计收益相对提升66.89%。

Comments Preprint. Code available at https://github.com/ZeyuLIU-UST/FPQC-SAC-main

详情
AI中文摘要

金融市场是典型的低信噪比(SNR)环境,这常常使Soft Actor-Critic(SAC)等离策略最大熵方法不稳定。具体来说,噪声状态表示可能产生不可靠的Q值估计,而自举会放大这些误差,形成我们称之为“金融熵陷阱”的失效模式。在本文中,我们提出FPQC-SAC,一种高效且即插即用的SAC变体,它在演员和评论家网络之前放置一个紧凑且有界的参数化量子电路(PQC),以在表征层约束特征传播,而不是过滤原始输入或在自举后正则化Q值。值得注意的是,FPQC-SAC减少了极端市场波动对贝尔曼目标估计的影响,而可训练的量子纠缠保留了灵活的跨资产交互。在真实投资组合管理任务上的实证评估表明,FPQC-SAC通过实现比标准无约束SAC累计收益相对提升66.89%,显著增强了样本外稳定性和累计收益,并且比最佳连续控制深度强化学习基线高出约27%。开源代码可在该https URL获取。

英文摘要

The financial market is a typical low signal-to-noise ratio (SNR) setting, which often destabilizes off-policy maximum-entropy methods like Soft Actor-Critic (SAC). Specifically, noisy state representations may produce unreliable Q-value estimates, and bootstrapping amplifies these errors, forming a failure mode we call the "Financial Entropy Trap". In this paper, we propose FPQC-SAC, an efficient and plug-and-play SAC variant that places a compact and bounded Parameterized Quantum Circuit (PQC) before the actor and critic networks to constrain feature propagation at the representation level, rather than filtering raw inputs or regularizing Q-values after bootstrapping. Notably, FPQC-SAC reduces the impact of extreme market fluctuations on Bellman target estimation, while trainable quantum entanglement preserves flexible cross-asset interactions. Empirical evaluations on real-world portfolio management tasks demonstrate that FPQC-SAC substantially enhances out-of-sample stability and cumulative returns by achieving a 66.89% relative gain in cumulative return over standard unconstrained SAC and outperforms the best continuous-control deep reinforcement learning baseline by approximately 27%. Open-source code is available at https://github.com/ZeyuLIU-UST/FPQC-SAC-main.

2606.10528 2026-06-10 cs.LG cs.CL 新提交

Representation-Aware Advantage Estimation: Your Reward Model Provides More Than A Scalar Output

表示感知优势估计:你的奖励模型提供的不仅仅是标量输出

Guozheng Li, Xiyan Fu, Yiwen Guo

发表机构 * Southeast University(东南大学) Nanyang Technological University(南洋理工大学) Independent Researcher(独立研究员)

AI总结 提出表示感知优势估计方法,利用奖励模型隐藏状态作为辅助信号,通过图传播计算优势值,提升RLHF的样本效率和鲁棒性。

详情
AI中文摘要

当前基于人类反馈的强化学习(RLHF)方法主要依赖来自训练好的奖励模型(RM)的标量奖励。虽然有效,但标量奖励通常存在噪声,无法捕捉细粒度的偏好差异,而RM隐藏状态编码了更丰富的语义和偏好信息。我们引入了表示感知优势估计,利用RM隐藏状态并将其建模为辅助信号以实现更好的优势估计。具体来说,我们提出了基于图的优势估计(GraphAE),将每个采样组视为一个图,其中节点对应响应,边捕捉它们在RM隐藏空间中的相似性。然后通过图传播计算优势值,使每个样本能够从其邻居中融入上下文信息。GraphAE轻量级,可以无缝集成到现有的基于组的RL算法中。我们将GraphAE应用于GRPO、GSPO和RLOO,并在不同模型和基准上进行了大量实验。实证结果显示,在三个基准上均有一致改进,在Arena-Hard-v0.1上提升高达+6.3,在AlpacaEval 2.0上提升+8.27,在MT-Bench上提升+0.22。这些结果表明,利用RM表示可以实现更高效和鲁棒的RLHF。

英文摘要

Current reinforcement learning from human feedback (RLHF) methods primarily rely on scalar rewards from a trained reward model (RM). While effective, scalar rewards are often noisy and fail to capture fine-grained preference differences, whereas RM hidden states encode richer semantic and preference information. We introduce the representation-aware advantage estimation, which leverages RM hidden states and models them as auxiliary signals for better advantage estimation. Specifically, we propose the Graph-based Advantage Estimation (GraphAE), treat each sampled group as a graph, where nodes correspond to responses and edges capture their similarity in the RM hidden space. Then advantages are computed via graph propagation, enabling each sample to incorporate contextual information from its neighbors. GraphAE is lightweight and can be seamlessly integrated into existing group-based RL algorithms. We apply GraphAE to GRPO, GSPO and RLOO, and conduct extensive experiments on different models and benchmarks. Empirical results show consistent improvements across three benchmarks, with gains of up to + 6.3 on Arena-Hard-v0.1, + 8.27 on AlpacaEval 2.0, and + 0.22 on MT-Bench. These results demonstrate that leveraging RM representations leads to more sample efficient and robust RLHF.

2606.10611 2026-06-10 cs.LG cs.CV 新提交

Geometry-Aware Reinforcement Learning for 2D Irregular Nesting

几何感知强化学习用于二维不规则排样

Auguste Lehuger, Guillaume Henon-Just

发表机构 * Valeo Brain(法雷奥大脑)

AI总结 提出Polygons Transformer架构与组合优化强化学习框架,使智能体从数据中学习几何先验,在二维不规则排样中达到与最先进启发式算法Sparrow竞争的面积利用率。

Comments 15 pages, 4 figures, 5 tables. Under review at the European Workshop on Reinforcement Learning (EWRL)

详情
AI中文摘要

针对二维不规则排样问题的传统启发式求解器存在一个根本性限制:它们对多边形几何是盲目的,依赖引导式暴力搜索在连续放置空间中导航,几何指导极少。本文认为,强化学习具有独特优势来克服这一瓶颈。通过将优化策略与几何感知神经编码器配对,智能体可以直接从数据中自动发现丰富的几何先验,利用这些学到的直觉来战略性地引导探索。为实现这一点,我们引入了Polygons Transformer(PoT),这是一种新颖的架构,能够编码二维连续矢量几何,同时允许跨多边形注意力。我们将这种新颖架构与组合优化强化学习(CORL)训练框架相结合,以寻找最优解。为了支持这一范式,我们发布了一个源自复杂地理轮廓的开源训练数据集以及一个专门的评估基准。我们的实证验证表明,训练后的智能体在面积利用率方面与最先进的启发式求解器Sparrow高度竞争,证明强化学习可以成功发现并利用几何感知来完成精确的空间任务。

英文摘要

Traditional heuristic solvers for the 2D irregular nesting problem share a fundamental limitation: they are blind to polygon geometry, relying on guided brute-force to navigate the continuous placement space with minimal geometrical guidance. In this paper, we argue that Reinforcement Learning is uniquely positioned to overcome this bottleneck. By pairing an optimization policy with a geometry-aware neural encoder, an agent can automatically discover rich geometric priors directly from data, utilizing these learned intuitions to strategically guide exploration. To realize this, we introduce the Polygons Transformer (PoT), a novel architecture that encodes 2D continuous vector geometries while allowing cross-polygons attention. We couple this novel architecture with a Combinatorial Optimization Reinforcement Learning (CORL) training framework to find optimal solutions. To support this paradigm, we release an open-source training dataset derived from complex geographic contours alongside a dedicated evaluation benchmark. Our empirical validation demonstrates that our trained agent achieves area utilization performance highly competitive with Sparrow, the state-of-the-art heuristic solver, proving that reinforcement learning can successfully discover and exploit geometric awareness for precise spatial tasks.

2606.10613 2026-06-10 cs.LG cs.AI 新提交

Fast and Highly Expressive Policy Learning for Offline Reinforcement Learning via Bootstrapped Flow Q-Learning

基于自举流Q学习的离线强化学习快速且高表达性策略学习

Thanh Nguyen, Tri Ton, Hongbin Choe, Tung M. Luu, Chang D. Yoo

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出自举流Q学习(BFQ),通过分治位移向量并自举短程分量,实现单步动作生成,无需辅助网络或蒸馏,显著降低计算成本并提升性能。

Comments ICML 2026, 19 pages

详情
Journal ref
ICML 2026
AI中文摘要

基于扩散的Q学习已成为离线强化学习的一种强大范式,但其对多步去噪的依赖使得训练和推理在计算上昂贵且脆弱。最近将扩散Q学习加速到单步动作生成的努力通常引入辅助网络、策略蒸馏或多阶段训练,这常常损害简单性、稳定性或性能。为解决这些限制,我们引入了自举流Q学习(BFQ),一种新颖的框架,能够在训练和推理期间实现精确的单步动作生成,无需辅助网络或蒸馏过程。BFQ采用分治视角处理沿流路径的位移向量:它首先学习可以从流匹配边际速度准确估计的短程位移,然后自举这些分量以直接学习单步噪声到动作的映射。这种公式消除了多步去噪,导致学习过程更快、更简单、更稳健。广泛的D4RL评估表明,与多步扩散基线相比,BFQ在显著降低计算成本的同时提高了性能,证明了单步动作生成足以实现高性能的离线强化学习。

英文摘要

Diffusion-based Q-learning has emerged as a powerful paradigm for offline reinforcement learning, but its reliance on multi-step denoising makes both training and inference computationally expensive and brittle. Recent efforts to accelerate diffusion Q-learning toward single-step action generation typically introduce auxiliary networks, policy distillation, or multi-phase training, which frequently compromise simplicity, stability, or performance. To address these limitations, we introduce Bootstrapped Flow Q-Learning (BFQ), a novel framework that enables accurate single-step action generation during both training and inference, without auxiliary networks or distillation procedures. BFQ adopts a divide-and-conquer view of the displacement vector along the flow path: it begins by learning short-range displacements that can be accurately estimated from the Flow Matching marginal velocity, and bootstraps these components to directly learn a noise-to-action mapping in a single step. This formulation eliminates multi-step denoising, resulting in a learning procedure that is substantially faster, simpler, and more robust. Extensive D4RL evaluations show that BFQ improves performance while significantly reducing computational cost compared to multi-step diffusion baselines, demonstrating that single-step action generation suffices for high-performance offline Reinforcement Learning.

2606.10646 2026-06-10 cs.LG cs.CL 新提交

How Does Reasoning Flow? Tracing Attention-Induced Information Flow for Targeted RL in LLMs

推理流如何流动?追踪注意力诱导的信息流以实现LLM中的目标RL

Zhichen Dong, Yang Li, Yuhan Sun, Weixun Wang, Yijia Luo, Zinian Peng, Taiheng Ye, Chao Yang, Wenbo Su, Yu Cheng, Bo Zheng, Junchi Yan

发表机构 * Shanghai Jiao Tong University(上海交通大学) Alibaba Group(阿里巴巴集团) Shanghai Artificial Intelligence Laboratory(上海人工智能实验室)

AI总结 提出FlowTracer框架,通过注意力诱导的有向无环图追踪答案导向的推理流,基于全局信息流结构分配token级信用,从而提升LLM在推理任务中的强化学习效果。

Comments 25 pages, 7 figures, 11 tables. Accepted at ICML 2026

详情
AI中文摘要

Token级信用分配仍然是大型语言模型(LLM)中强化学习(RL)的主要障碍,其中RL配方通常平等对待所有token,未能区分决定性推理步骤与常规格式或流畅填充。最近的研究利用模型内部信号分配更细粒度的信用,但这些往往是点式启发式方法,忽略了信息传播的全局结构。我们提出FlowTracer,一个RL框架,它在注意力诱导的有向无环图上追踪答案导向的推理流,其中节点对应token,边容量来自聚合的注意力权重,并从这种全局结构中推导出token信用。边容量被重新加权,仅保留能够到达答案区域的影响,同时强制执行局部流守恒,使得中间token不会因路径长度或无关分支而损失或获得有效质量。在此图上,FlowTracer提取连接问题与答案的信息流骨干,并通过流吞吐量对token进行评分,揭示调解长距离依赖的高影响枢纽和聚合检查点。这些推导出的重要性用于塑造token级奖励,使学习信号精确聚焦于将信息路由向(或远离)正确答案的token,并在各种推理任务中提供一致的性能提升。

英文摘要

Token-level credit assignment remains a key obstacle for reinforcement learning (RL) in large language models (LLMs), where RL recipes typically treat all tokens equally, failing to distinguish decisive reasoning steps from routine formatting or fluent filler. Recent attempts leverage model-internal signals to assign finer-grained credit, but these are often point-wise heuristics that ignore the global structure of information propagation. We propose FlowTracer, an RL framework that traces answer-targeted reasoning flow on an attention-induced directed acyclic graph in which nodes correspond to tokens and edge capacities come from aggregated attention weights and derives token credit from this global structure. The edge capacities are reweighted to retain only the influence that can reach the answer region, while enforcing local flow conservation so intermediate tokens neither lose nor gain effective mass due to path length or irrelevant branches. On this graph, FlowTracer extracts an information-flow backbone connecting the question to the answer and scores tokens by flow throughput, revealing high-impact hubs and aggregation checkpoints that mediate long-range dependencies. These derived importances are used to shape token-level rewards, enabling learning signals to focus precisely on the tokens that route information toward (or away from) correct answers and delivering consistent performance gains across a range of reasoning tasks.

2606.10684 2026-06-10 cs.LG cs.AI 新提交

Divide and Cooperate: Role-Decomposed Multi-Agent LLM Training with Cross-Agent Learning Signals

分工与合作:基于跨智能体学习信号的角色分解多智能体LLM训练

Jaewan Park, Solbee Cho, Jay-Yoon Lee

发表机构 * Seoul National University(首尔大学)

AI总结 提出DAC框架,将多步推理分解为搜索和生成两个子任务,分别由专用智能体处理,并通过跨智能体学习信号解决信用分配问题,在QA基准上超越全参数微调的单体模型。

详情
AI中文摘要

现代语言智能体通过多步推理在知识密集型问答中表现出色。然而,现有方法通常将证据获取和答案生成耦合在单一策略中。这迫使单个模型扮演多个可能冲突的角色,导致策略空间组合爆炸并阻碍高效探索。同时,训练中引入信用分配问题:当生成失败时,检索到足够证据的搜索动作仍可能受到惩罚,反之亦然。我们提出DAC(分工与合作),一个角色分解的多智能体训练框架,将智能体搜索分解为两个合作性子任务,每个子任务由专用智能体处理,并使用角色特定的学习信号进行训练。生成器扮演双重角色:既是答案生成器,也是证据充分性验证器,当检索到的证据不足时放弃回答。该放弃信号被纳入搜索智能体的奖励中,提供结构化的跨智能体学习信号以改进信用分配。相反,搜索器通过硬阳性证据增强向生成器暴露多样且具有挑战性的证据环境,提高其鲁棒性。在通用和多跳问答基准上的实验表明,DAC通过共享骨干网络上的参数高效LoRA模块实现,在性能上优于先前依赖全参数微调单体模型的基线方法。

英文摘要

Modern language agents which perform multi-step reasoning have shown strong performance in knowledge-intensive question answering. However, existing approaches typically couple evidence acquisition and answer generation within a single policy. This forces a single model to play multiple potentially conflicting roles, inducing a combinatorial explosion in the policy space and hindering efficient exploration. It also introduces a credit assignment problem during training: a search action that retrieves sufficient evidence may still be penalized when generation fails, and vice versa. We propose DAC (Divide and Cooperate), a role-decomposed multi-agent training framework that divides agentic search into two cooperative subtasks, each handled by a dedicated agent trained with role-specific learning signals. The generator serves a dual role as both an answer producer and an evidence sufficiency verifier, abstaining when retrieved evidence is insufficient. This abstention signal is incorporated into the search agent's reward, providing structured cross-agent learning signals that improve credit assignment. Conversely, the searcher exposes the generator to diverse and challenging evidence environments by hard-positive evidence augmentation, improving its robustness. Experiments on general and multi-hop QA benchmarks show that DAC, implemented via parameter-efficient LoRA modules over a shared backbone, achieves strong performance against prior baselines that rely on full fine-tuning of monolithic models.

2606.10705 2026-06-10 cs.LG cs.AI cs.SY eess.SY 新提交

Event-Driven Reinforcement Learning Enables Long-Horizon Control in Semiconductor Fabrication

事件驱动强化学习实现半导体制造中的长时域控制

Yavar Yeganeh, Mahsa Shekari, Nicla Frigerio, Daniele Pagano, Andrea Matta

发表机构 * Politecnico di Milano(米兰理工大学) STMicroelectronics(意法半导体)

AI总结 提出事件驱动深度强化学习框架,将半导体制造控制建模为中心化智能体问题,通过事件驱动时序差分方法优化多目标策略,在高保真仿真中显著提升吞吐量和利用率。

详情
AI中文摘要

强化学习有望优化大规模系统中的序贯决策。半导体制造系统是随机且高度约束的环境,其中异构晶圆在广泛的设备网络中经历数百个加工步骤。这些特性产生了复杂、高维的决策问题,具有延迟反馈和长时域要求,使生产计划和控制复杂化。我们提出了一个用于此规模的多目标策略优化的深度强化学习框架。具体来说,我们将控制表述为一个中心化智能体问题,其中核心策略协调系统范围的决策,而系统演化被表示为由离散事件驱动的互联时间过程。相应地,我们开发了一个定制的事件驱动时序差分公式,该公式保持通用性,并可在相关训练设置下与各种策略优化方法集成。我们研究了纳入该框架的几种核心无模型算法,并使用不同工业现实操作场景的高保真仿真评估其有效性。在广泛的验证实验中,在离线和在线设置下训练的智能体在吞吐量和利用率方面显示出显著且一致的提升。我们进一步评估了训练阶段的表现和泛化能力,阐明了替代强化学习公式和算法的相对优势。总体而言,结果支持所提出框架在控制事件驱动复杂自适应系统方面的可扩展性、通用性和可迁移性。

英文摘要

Reinforcement learning promises to optimize sequential decisions in large-scale systems. Semiconductor manufacturing systems are stochastic and highly constrained environments where heterogeneous wafers traverse hundreds of processing steps across extensive equipment networks. These characteristics yield complex, high-dimensional decision problems with delayed feedback and long-horizon requirements, complicating production planning and control. We propose a deep reinforcement learning framework for multi-objective policy optimization at this scale. Specifically, we formulate control as a centralized-agent problem, where a core policy coordinates system-wide decisions, while system evolution is represented as an interconnected temporal process driven by discrete events. Accordingly, we develop a tailored event-driven temporal-difference formulation that remains general and can be integrated with various policy optimization methods under relevant training settings. We investigate several core model-free algorithms incorporated into this framework and evaluate their effectiveness using high-fidelity simulations of diverse, industry-real operating scenarios. Across extensive validation experiments, agents trained in both offline and online settings show significant and consistent gains in throughput and utilization. We further evaluate performance and generalization across training phases, clarifying the relative strengths of alternative reinforcement learning formulations and algorithms. Overall, the results support the scalability, generality, and transferability of the proposed framework for controlling event-driven complex adaptive systems.

2606.10768 2026-06-10 cs.LG cs.CL 新提交

N-GRPO: Embedding-Level Neighbor Mixing for Enhanced Policy Optimization

N-GRPO:嵌入级邻居混合增强策略优化

Xukun Zhu, Hang Yu, Peng Di, Linchao Zhu

发表机构 * Zhejiang University(浙江大学) Ant Group(蚂蚁集团)

AI总结 针对大语言模型数学推理中探索策略的折衷问题,提出N-GRPO方法,通过语义邻居混合机制在嵌入层注入多样性,在保持语义一致性的同时提升策略优化效果。

Comments ACL 2026 Findings. 16 pages, 3 figures. Code: https://github.com/ZJUSCL/N-GRPO

详情
AI中文摘要

大语言模型在数学推理中的成功很大程度上依赖于生成多样化且有效的解题路径。然而,当前的展开技术面临一个基本折衷:token级采样通常产生仅在措辞上不同的冗余轨迹,而利用随机噪声的嵌入级方法则经常破坏语义一致性。为解决此问题,我们引入N-GRPO,一种集成到组相对策略优化(GRPO)框架中的新型探索策略。我们的方法不依赖于token级采样或原生嵌入级噪声,而是利用语义邻居混合机制。该机制通过混合锚点token及其最近语义邻居的嵌入来动态构建输入表示,从而在严格遵循局部语义流形的同时注入多样性。在不同大小的DeepSeek-R1-Distill-Qwen模型上的实验评估表明,N-GRPO不仅在数学推理基准上相比强基线取得一致改进,而且在分布外任务上展现出鲁棒的泛化能力。

英文摘要

The success of Large Language Models in mathematical reasoning relies heavily on the generation of diverse and valid solution paths during the rollout phase. However, current rollout techniques face a fundamental trade-off: token-level sampling often yields redundant trajectories that differ only in rephrasing, while embedding-level methods utilizing random noise frequently disrupt semantic consistency. To resolve this, we introduce N-GRPO, a novel exploration strategy integrated into the Group Relative Policy Optimization (GRPO) framework. Rather than relying on token-level sampling or native embedding-level noise, our approach leverages Semantic Neighbor Mixing. This mechanism dynamically constructs input representations by mixing the embeddings of an anchor token and its nearest semantic neighbors, thereby injecting diversity while strictly adhering to the local semantic manifold. Experimental evaluations on the DeepSeek-R1-Distill-Qwen models across different sizes show that N-GRPO not only achieves consistent improvements over strong baselines on math reasoning benchmarks but also exhibits robust generalization capabilities on out-of-distribution tasks.

2606.10825 2026-06-10 cs.LG 新提交

MODIP: Efficient Model-Based Optimization for Diffusion Policies

MODIP:扩散策略的高效基于模型的优化

Zakariae El Asri, Philippe Gratias-Quiquandon, Nicolas Thome, Olivier Sigaud

发表机构 * Sorbonne Université, CNRS, ISIR, F-75005 Paris, France(索邦大学,法国国家科学研究中心,智能系统与机器人研究所,法国巴黎) Institut Universitaire de France (IUF)(法国大学研究院)

AI总结 提出MODIP框架,利用世界模型和模型预测控制生成高质量轨迹,以监督方式微调扩散策略,实现离线到在线的强化学习微调,在D4RL和RoboMimic任务上超越行为克隆基线。

详情
AI中文摘要

扩散策略(DPs)已成为机器人学习中表达力强的策略表示,通常与行为克隆(BC)等模仿学习方法一起使用。然而,虽然它们的成功主要局限于BC,但直接进行强化学习(RL)微调仍然具有挑战性,因为动作是通过多步去噪过程生成的。在这项工作中,我们提出了MODIP,一个用于扩散策略离线到在线微调的框架。MODIP不是直接将RL应用于DPs,而是利用世界模型(WM)来指导策略适应,并保持BC的简单性和稳定性。我们利用模型预测控制(MPC)在WM内生成高质量轨迹,并将其作为监督目标来微调DP。为了使MPC规划高效,MODIP使用终端状态值而不是依赖于策略的状态-动作值,从而减少了推理时间。此外,MODIP使用与策略无关的TD目标训练评论家,减少了训练时间。在D4RL(MuJoCo、Kitchen)和RoboMimic任务上的实验表明,MODIP改进了超越BC的扩散策略,并且与扩散策略RL微调方法和强基于模型的基线(如TD-MPC2)相比具有竞争力或更优性能。

英文摘要

Diffusion policies (DPs) have emerged as expressive policy representations for robot learning, often used with imitation learning methods such as behavioral cloning (BC). However, while their success has largely been confined to BC, direct reinforcement learning (RL) fine-tuning remains challenging because actions are generated through a multi-step denoising process. In this work, we propose MODIP, a framework for the offline-to-online fine-tuning of DPs. Rather than directly applying RL to the DPs, MODIP leverages a world model (WM) to guide policy adaptation and keeps the simplicity and stability of BC. We utilize model predictive control (MPC) to generate high-quality trajectories within the WM, and use them as supervised targets for fine-tuning the DP. To make MPC planning efficient, MODIP uses a terminal state value instead of a policy-dependent state-action value, reducing inference time. Additionally, MODIP trains critics with policy-independent TD targets, reducing training time. Experiments on D4RL (MuJoCo, Kitchen) and RoboMimic tasks show that MODIP improves diffusion policies beyond BC, and is competitive with or outperforms diffusion policy RL fine-tuning methods and strong model-based baselines such as TD-MPC2.

2606.10835 2026-06-10 cs.LG cs.AI 新提交

Geometrically Averaged Hard Target Updates for Linear Q-Learning

线性Q学习的几何平均硬目标更新

Donghwan Lee

发表机构 * School of Electrical Engineering, KAIST(韩国科学技术院电气工程学院)

AI总结 提出λ-几何加权平均的周期目标更新方法,用于线性Q学习,通过切换系统模型分析其稳定性,连接了单周期更新和投影Q值迭代。

详情
AI中文摘要

周期性硬目标更新是现代深度Q学习中最常见的稳定化手段之一。最近的研究表明,目标更新可以提高使用函数逼近(包括线性函数逼近)的Q学习的稳定性。我们引入并分析了所谓的λ-目标更新,通过将m-周期目标更新映射与λ-几何权重$(1-\lambda)\lambda^{m-1}$($\lambda \in [0,1]$)平均得到。端点$\lambda=0$恢复单周期目标更新,而连续端点$\lambda\uparrow1$恢复投影Q值迭代。我们使用切换系统模型和相关工具,研究了这种机制在线性函数逼近的Q学习(即线性Q学习)中的应用。为清晰起见,本文处理确定性版本;该公式可扩展到随机强化学习设置。

英文摘要

Periodic hard target updates are among the most common stabilization devices in modern deep Q-learning. Recent studies suggest that target updates can improve stability in Q-learning with function approximation, including linear function approximation. We introduce and analyze the so-called $λ$-target update, obtained by averaging the $m$-periodic target update maps with $λ$-geometric weights $(1-λ)λ^{m-1}$, $λ\in [0,1]$. The endpoint $λ=0$ recovers the one-period target update, while the continuous endpoint $λ\uparrow1$ recovers projected Q-value iteration. We study this mechanism for Q-learning with linear function approximation, namely linear Q-learning, using a switching-system model and related tools. For clarity, the paper treats a deterministic version; the formulation extends to stochastic reinforcement-learning settings.

2606.11087 2026-06-10 cs.LG cs.AI 新提交

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

强化学习中流策略的测试时梯度引导

Zhiyuan Zhou, Andy Peng, Charles Xu, Qiyang Li, Tobias Springenberg, Kevin Frans, Sergey Levine

发表机构 * UC Berkeley(加州大学伯克利分校) Physical Intelligence

AI总结 提出QGF算法,通过预训练参考流策略和价值函数,在测试时利用价值梯度引导策略生成高价值动作,无需额外策略学习,在离线RL基准上优于现有测试时方法且与训练时方法竞争力相当。

详情
AI中文摘要

表达性连续控制策略,如扩散模型和流模型,构成了模拟和真实机器人控制中模仿学习近期进展的基础。尽管它们在监督模仿学习设置中能够稳定扩展,但将其纳入强化学习(RL)流程以改进策略已被证明更加困难。这通常需要专门的训练目标或通过去噪过程反向传播,这会导致众所周知的稳定性问题并影响可扩展性。在本文中,我们研究了一个问题:仅在测试时采用简单的策略改进方案,同时保持稳定的监督策略训练不变,是否能够成为避免这些问题的竞争性替代方案。为此,我们提出了QGF(Q-Guided Flow),一种完全在测试时进行策略优化的RL算法。QGF通过预训练一个参考流策略(通过标准的行为克隆目标)和一个价值函数评论家,并在测试时使用价值梯度引导参考策略生成更高价值的动作,而无需任何额外的策略学习。实验上,QGF在高维动作空间的单任务和目标条件离线RL基准测试中优于先前的测试时RL方法,并且与最先进的训练时算法竞争力相当,同时运行成本更低。此外,通过避免演员-评论家训练的不稳定性,它展现出随模型规模的良好扩展性,为使用表达性策略提供了一种实用且有效的替代RL算法。

英文摘要

Expressive continuous control policies, such as diffusion and flow models, form the backbone of recent advances in scaling imitation learning for simulated and real robot control. While they are known to scale stably in the supervised imitation learning setting, incorporating them into reinforcement learning (RL) pipelines for policy improvement has proven more difficult. It often requires specialized training objectives or backpropagating through denoising processes, which cause well-known issues with stability and affect scalability. In this paper we study the question of whether simple policy improvement schemes at test time alone, leaving stable supervised policy training intact, can be a competitive alternative which sidesteps these issues. To this end, we propose QGF (Q-Guided Flow), an RL algorithm that performs policy optimization entirely at test time. QGF works by pre-training both a reference flow policy (via a standard behavioral cloning objective) and a value function critic and, at test time, using the value gradient to guide the reference policy to generate higher-value actions without any additional policy learning. Empirically, QGF outperforms prior test-time RL methods on single-task and goal-conditioned offline RL benchmarks with high-dimensional action spaces, and is competitive with state-of-the-art training-time algorithms while being much cheaper to run. Moreover, it exhibits favorable scaling with model size by avoiding the instability of actor-critic training, offering a practical and effective alternative RL algorithm with expressive policies.

2606.11119 2026-06-10 cs.LG cs.AI cs.CL 新提交

TRACE: A Unified Rollout Budget Allocation Framework for Efficient Agentic Reinforcement Learning

TRACE:一种用于高效智能体强化学习的统一展开预算分配框架

Heming Zou, Qi Wang, Yun Qu, Yuhang Jiang, Lizhou Cai, Yixiu Mao, Ru Peng, Xin Xu, Weijie Liu, Kai Yang, Saiyong Yang, Xiangyang Ji

发表机构 * Tsinghua University(清华大学) Tencent(腾讯)

AI总结 针对多轮智能体强化学习中奖励对比度不足的问题,提出TRACE框架,通过将每个ReAct式思考-行动-观察步骤建模为语义节点,在固定采样预算内将预算分配到提示根和中间前缀,增强奖励对比,提升策略更新信号。

Comments 32 pages, 12 figures, 6 tables

详情
AI中文摘要

具有可验证奖励的强化学习(RLVR)是增强大型语言模型推理和智能体行为的一种有前景的方法。然而,展开密集的策略优化常常受到奖励对比度不足的限制,当过于简单或复杂的提示产生低方差反馈,以及当仅结果奖励对多轮展开中的每个决策赋予相同的终端评估时,就会出现这种情况。过去的努力集中在将可用的展开资源分配给有希望的提示,但它们仅利用提示级别的样本信息性,而忽略了同一展开中不同轮次之间前缀级别信息性的变化。本工作针对多轮智能体强化学习,将每个ReAct风格的思考-行动-观察步骤建模为语义上不同的节点,使得预算分配从提示根扩展到具有进一步延续的轮次级前缀,这自然形成了树状结构的展开。我们引入了树状展开分配用于对比探索(TRACE),这是一个统一的展开分配框架,在固定采样预算内增强奖励对比。在技术上,TRACE将展开预算分配给最可能产生混合终端奖励的提示根和中间前缀。一个共享的通用预测器根据前缀历史估计这些锚点处的条件成功概率,以指导这种分配。由此产生的自适应树状结构丰富了仅结果反馈,并放大了策略更新信号。实验上,TRACE在典型的智能体基准测试中取得了有竞争力的性能和效率提升,例如,在相同采样成本下,Qwen3-14B多跳问答的平均准确率比竞争基线提高了2.8个百分点。

英文摘要

Reinforcement learning with verifiable rewards (RLVR) is a promising approach for enhancing reasoning and agentic behavior in large language models. However, rollout-intensive policy optimization is often limited by insufficient reward contrast, arising when overly simple or complex prompts generate low-variance feedback and when outcome-only rewards assign the same terminal assessment to every decision in a multi-turn rollout. Past efforts have focused on allocating available rollout resources to promising prompts, yet they only leverage sample informativeness at the prompt level and neglect variation in prefix-level informativeness across turns within the same rollout. This work targets multi-turn agentic RL by modeling each ReAct-style thought-action-observation turn as a semantically distinct node, allowing budget allocation to extend from prompt roots to turn-level prefixes with further continuations, which naturally forms tree-structured rollouts. We introduce Tree Rollout Allocation for Contrastive Exploration (TRACE), a unified rollout allocation framework that enhances reward contrast within a fixed sampling budget. Technically, TRACE allocates rollout budget to both prompt roots and intermediate prefixes that are most likely to yield mixed terminal rewards. A shared generalizable predictor estimates conditional success probability at these anchors from prefix histories to guide this allocation. The resulting adaptive tree structure enriches outcome-only feedback and amplifies the policy-update signal. Empirically, TRACE achieves competitive performance and efficiency gains on typical agentic benchmarks, e.g., improving Qwen3-14B Multi-Hop QA average accuracy by 2.8 points over competitive baselines at equal sampling cost.

2606.09884 2026-06-10 cs.MA cs.AI cs.LG econ.EM 交叉投稿

Failure Modes of Deep Multi-Agent RL in Asynchronous Pricing: Reproducible Triggers, Trace Diagnostics, and a Partial Fix

深度多智能体强化学习在异步定价中的失败模式:可复现触发器、轨迹诊断及部分修复

Shree Murthy, Rohan Pandey

发表机构 * DigitalOcean, USA(DigitalOcean美国)

AI总结 研究连续时间定价市场中深度多智能体强化学习的两种可复现失败模式:DDPG智能体之间的默契合谋和高事件率下的演员-评论家不稳定性,并通过异步性实现部分修复。

详情
AI中文摘要

我们研究了连续时间定价市场中深度多智能体强化学习的两种可复现失败模式:(i) 竞争性DDPG智能体之间形成默契合谋,以及(ii) 高事件率下的演员-评论家不稳定性。我们在一个单一的CT-MARL基准测试(泊松时钟价格更新、观测延迟$\delta$、内部最优logit需求)中实例化了这两种模式,表明同步DDPG智能体可靠地触发失败模式1,合谋指数$\Delta = 0.69 \pm 0.11$,并量化了一种部分微观结构修复:仅异步性就将合谋降低了48%,而增加延迟使其降至最低$\Delta = 0.28$。该修复具有明确记录的成本:它是部分的($\Delta$仍高于伯特兰水平),在$\delta$上非单调,并且无法承受失败模式2,后者在$\lambda = 5$时表现为DDPG评论家发散,并破坏了$(\lambda{=}5, \delta{=}1)$处的相图单元。我们为标量合谋指数配备了轨迹级诊断,揭示了情节内信号崩溃和冲击后无法恢复。

英文摘要

We study two reproducible failure modes of deep multi-agent reinforcement learning in continuous-time pricing markets: (i) tacit cartel formation between competing DDPG agents, and (ii) actor--critic instability at high event rates. We instantiate both inside a single CT-MARL benchmark (Poisson-clocked price updates, observation latency $δ$, interior-optimum logit demand), show that synchronous DDPG agents reliably trigger Failure Mode 1 with collusion index $Δ= 0.69 \pm 0.11$, and quantify a partial microstructure fix: asynchrony alone cuts collusion by 48\% and adding latency drives it to a minimum of $Δ= 0.28$. The fix has clearly documented costs: it is partial ($Δ$ remains supra-Bertrand), it is non-monotone in $δ$, and it does not survive Failure Mode 2, which emerges as DDPG critic divergence at $λ= 5$ and corrupts the phase-diagram cell at $(λ{=}5, δ{=}1)$. We accompany the scalar collusion index with trajectory-level trace diagnostics that expose the within-episode signalling collapse and the post-shock non-recovery.

2606.10267 2026-06-10 cs.RO cs.AI cs.LG 交叉投稿

What Matters in Orchestrating Robot Policies: A Systematic Study of Hierarchical VLA Agents

机器人策略编排的关键因素:分层VLA智能体的系统研究

Jiaheng Hu, Mohit Shridhar, Caden Lu, Dhruv Shah, Hao-Tien Lewis Chiang, Jie Tan, Annie Xie

发表机构 * Google DeepMind(谷歌深Mind)

AI总结 系统研究分层视觉-语言-动作(Hi-VLA)系统的设计原则,通过统一框架分析规划器、控制器及接口机制对短时、长时及推理密集型任务性能的影响,提出构建更强健分层VLA智能体的实用原则。

详情
AI中文摘要

分层视觉-语言-动作(Hi-VLA)系统已成为复杂机器人操作的一种有前景的范式,它通过使用高层VLM规划器将任务分解为语言子目标,由低层VLA控制器执行。尽管近期取得了实证进展,但这些系统缺乏统一的设计原则:现有的Hi-VLA系统在选择和连接规划器、控制器、两者之间的切换机制以及规划器中观测和记忆的表示方式上存在差异。在本文中,我们对机器人操作的Hi-VLA设计进行了系统研究。我们将代表性的Hi-VLA智能体统一在一个选项式控制框架下,并在短时、长时和推理密集型任务上基准测试核心设计选择。我们的分析提炼出构建Hi-VLA系统的实用原则,展示了模型选择和接口机制如何共同塑造性能。应用这些原则,在仿真和真实ALOHA机器人上的实验中,我们得到了一个比平面VLA控制或朴素设计的分层系统都显著更强的系统。总体而言,我们的结果为构建更强大、更鲁棒且更有原则的分层VLA智能体奠定了基础。更多信息和视频请访问此http URL。

英文摘要

Hierarchical vision-language-action (Hi-VLA) systems have emerged as a promising paradigm for complex robot manipulation, by using high-level VLM planners to decompose tasks into language subgoals executed by low-level VLA controllers. Despite recent empirical progress, there is a lack of unified design principles for these systems: existing Hi-VLA systems differ in how they choose and connect planners, controllers, mechanisms to switch between the two, and how observations and memory are represented in the planner. In this paper, we present a systematic study of Hi-VLA design for robot manipulation. We unify representative Hi-VLA agents under an options-style control framework and benchmark core design choices across short-horizon, long-horizon, and reasoning-intensive tasks. Our analysis distills practical principles for building Hi-VLA systems, showing how model choices and interface mechanisms jointly shape performance. Applying these principles yields a substantially stronger system than either flat VLA control or a naively designed hierarchy, across experiments both in simulation and on a real ALOHA robot. Overall, our results provide a foundation for building more capable, robust, and principled hierarchical VLA agents. More information and video at jiahenghu.github.io/hi-vla.

2606.10601 2026-06-10 math.NA cs.AI cs.LG cs.NA 交叉投稿

Dmsh: A Multi-Agent Reinforcement Learning Framework for All-Quad Mesh Generation

Dmsh:一种用于全四边形网格生成的多智能体强化学习框架

Anirudh Kalyan, Cosmin Anitescu, Xiaoying Zhuang, Timon Rabczuk, Somdatta Goswami, Sundararajan Natarajan

发表机构 * Department of Mechanical Engineering, Indian Institute of Technology Madras(印度理工学院马德拉斯分校机械工程系) Institute of Continuum Mechanics, Leibniz Universität Hannover(莱比锡大学汉诺威连续力学研究所) Institute of Structural Mechanics, Bauhaus-Universität Weimar(魏玛包豪斯大学结构力学研究所) Department of Civil and Systems Engineering, Johns Hopkins University(约翰霍普金斯大学土木与系统工程系)

AI总结 提出Dmsh,首个全自动强化学习流水线,通过三个协调智能体处理拓扑简化、几何正则化和网格生成,采用参数化Soft Actor-Critic架构和课程学习策略,实现高质量全四边形网格生成。

详情
AI中文摘要

为任意几何体生成高质量网格仍然是计算工程中的一个基本瓶颈,通常需要启发式调整和半手动工作流程。在本文中,我们介绍了Dmsh,这是第一个完全自动化的强化学习流水线,它将几何分解和四边形网格生成统一在一个基于学习的框架中。Dmsh通过三个协调的智能体分解问题,分别处理拓扑简化、几何正则化和网格生成。网格生成过程被建模为马尔可夫决策过程,并使用具有解耦评论家的参数化Soft Actor-Critic架构求解,从而能够高效探索混合离散-连续动作空间。课程学习策略确保了从简单域到高度复杂几何体的可扩展性,并抑制了种子方差。通过设计,递归分解使得子区域能够并行网格化,生成全局一致的全四边形网格,无需事后校正。在广泛的基准测试中,Dmsh在自动化程度、鲁棒性和网格质量方面始终优于现有方法,为基于学习的网格生成建立了新范式。

英文摘要

Generating high-quality meshes for arbitrary geometries remains a fundamental bottleneck in computational engineering, often demanding heuristic tuning and semi-manual workflows. In this paper, we introduce Dmsh, a first fully automated reinforcement learning pipeline that unifies geometric decomposition and quadrilateral mesh generation within a single learning-based framework. Dmsh decomposes the problem through three coordinated agents handling topology simplification, geometric regularization, and mesh generation. The meshing process is formulated as a Markov Decision Process and solved using a parametric Soft Actor-Critic architecture with decoupled critics, enabling efficient exploration of a hybrid discrete-continuous action space. A curriculum learning strategy ensures scalability from simple domains to highly complex geometries, suppressing seed variance. By design, the recursive decomposition enables parallel meshing of subregions, yielding globally conforming all-quadrilateral meshes without post hoc correction. Across a wide range of benchmarks, Dmsh consistently outperforms existing methods in automation, robustness, and mesh quality, establishing a new paradigm for learning-based mesh generation.

2606.10614 2026-06-10 cs.RO cs.CV cs.LG 交叉投稿

Dexterous Point Policy: Learning Point-based Dexterous Hand Policies from Human Demonstrations

灵巧点策略:从人类演示中学习基于点的灵巧手策略

Beomjun Kim, Seong Hyeon Park, Seunghoon Sim, Seungjun Moon, Sanghyeok Lee, Jinwoo Shin

发表机构 * KAIST(韩国科学技术院)

AI总结 提出Dexterous Point Policy框架,通过统一3D关键点表示从人类视频学习灵巧操作策略,无需机器人演示,在真实任务中达到75%成功率。

详情
AI中文摘要

基于人类演示视频预训练的机器人基础模型显示出潜力,但当策略部署到真实机器人时仍存在显著的具身差距。常见的补救措施是在机器人特定演示上微调这些模型。然而,机器人数据收集可能过于昂贵和耗时,这在灵巧操作中尤为突出,例如,即使是单个原子任务,遥操作多指手也可能需要数天。为了解决这个问题,我们引入了Dexterous Point Policy,一个直接从人类视频学习灵巧操作策略且无需机器人演示的框架。我们的核心见解是,统一的3D关键点表示在用于观察和动作时,可以桥接人类和机器人的具身。具体来说,我们从原始视频中提取任务相关物体和人类手的3D关键点,并训练一个自回归变换器来处理这些关键点。我们观察到,在关键点层面,特别是手腕和指尖,人类和机器人的行为紧密对齐,从而实现直接策略迁移。在一套包括拾取放置和工具使用的真实机器人任务中,Dexterous Point Policy达到了75.0%的成功率,而最先进的VLA基线仅达到1.0%。此外,我们的方法对未见过的场景具有很强的泛化能力,包括多物体环境和新型物体类别。

英文摘要

Robotic foundation models pre-trained on human demonstration videos have shown promise, but a significant embodiment gap remains when the resulting policies are deployed on real robots. A common remedy is to fine-tune these models on robot-specific demonstrations. However, robot data collection can be prohibitively expensive and time-consuming, which is particularly acute in dexterous manipulation, e.g., teleoperating a multi-fingered hand for even a single atomic task can take days. To address this, we introduce Dexterous Point Policy, a framework that learns dexterous manipulation policies directly from human videos and requires no robot demonstrations. Our core insight is that a unified 3D keypoint representation can bridge human and robot embodiments when used for both observations and actions. Specifically, we extract 3D keypoints of task-relevant objects and human hands from raw videos, and train an autoregressive transformer over these keypoints. We observe that at the keypoint level, specifically the wrist and fingertips, human and robot behaviors closely align, enabling direct policy transfer. On a suite of real-robot tasks spanning pick-and-place and tool use, Dexterous Point Policy attains 75.0% success, whereas a state-of-the-art VLA baseline reaches only 1.0%. Furthermore, our method generalizes strongly to unseen scenarios, including multi-object environments and novel object categories.

2606.10771 2026-06-10 astro-ph.IM cs.LG cs.RO 交叉投稿

On-sky demonstration of reinforcement learning for adaptive optics control

自适应光学控制强化学习的在轨演示

Jalo Nousiainen, Vincent Chambouleyron, Benoit Neichel, Sylvain Cetre, Jean-Francois Sauvage, Angelie Alagao, Markus Kasper, Jonathan Dray, Romain Fetick, Byron Engler

发表机构 * European Southern Observatory(欧洲南天文学中心) Aix Marseille University(艾克斯马赛大学) CNRS(法国国家科学研究中心) CNES(法国国家太空研究中心) LAM(雷恩天文物理实验室) Wakea Consulting(Wakea咨询公司) Bertin Alpao

AI总结 首次在望远镜上演示了基于强化学习的自适应光学控制器PO4AO,在多种条件下优于传统积分控制器,展示了鲁棒性和高性能。

Comments 11 pages, 12 figures accepted by A&A

详情
AI中文摘要

基于强化学习(RL)的算法最近已成为自适应光学(AO)控制的一种有前景的方法。在模拟和实验室实验中,它们已展现出对现实世界效应(如光子和探测器噪声、误配准、振动以及视宁度条件的快速变化)的鲁棒性。然而,它们的性能尚未在天空中得到验证。我们报告了首个基于强化学习的自适应光学控制器(名为PO4AO)的在轨演示。我们进一步分析了其在轨行为,并确定了改进算法及其实现的方向。PO4AO在位于OHP的1.52米望远镜(T152)的Coudé焦点的Papyrus自适应光学系统上实现并部署。基于Python的实现通过共享内存缓冲区与现有的实时控制器(DAO RTC)接口连接。在多个夜晚,覆盖不同的流量水平和大气条件,将PO4AO的性能与标准积分控制器进行了比较。PO4AO在所有测试配置中均持续优于标准积分器。该控制器成功学习并补偿了振动模式,并表现出对测量噪声的强鲁棒性。一旦为Papyrus调整好,PO4AO以交钥匙方式运行,在变化的观测条件和科学目标下使用单一超参数集。尽管非优化的Python实现引入了约750微秒的额外延迟,以及控制抖动和偶尔的帧丢失,但仍实现了这些性能提升。当正确实现和优化后,PO4AO构成了单共轭自适应光学系统的鲁棒且高性能的交钥匙控制器,为在轨AO操作中更广泛地采用强化学习策略铺平了道路。

英文摘要

Reinforcement learning (RL)-based algorithms have recently emerged as a promising approach for adaptive optics (AO) control. In simulations and laboratory experiments, they have demonstrated robustness to real-world effects such as photon and detector noise, misregistration, vibrations, and rapid variations in seeing conditions. However, their performance has not yet been validated on sky. We report the first on-sky demonstration of a reinforcement learning controller for adaptive optics, named Policy Optimization for AO (PO4AO). We further analyze its on-sky behavior and identify directions for improving the algorithm and its implementation.PO4AO was implemented and deployed on the Papyrus adaptive optics system installed at the Coudé focus of the 1.52 m telescope (T152) at the OHP. A Python-based implementation was interfaced with the existing real-time controller (DAO RTC) via shared-memory buffers. The performance of PO4AO was compared to that of a standard integrator controller over several nights, covering a range of flux levels and atmospheric conditions. PO4AO consistently outperformed the standard integrator in all tested configurations. The controller successfully learned and compensated for vibration patterns and demonstrated strong robustness to measurement noise. Once tuned for Papyrus, PO4AO operated in a turnkey fashion, using a single set of hyperparameters across varying observing conditions and science targets. These performance gains were achieved despite a non-optimized Python implementation introducing approximately $750\,μ\text{s}$ of additional latency, along with control jitter and occasional frame drops. When properly implemented and optimized, PO4AO constitutes a robust and high-performance turnkey controller for single-conjugate adaptive optics systems, paving the way for broader adoption of reinforcement learning strategies in on-sky AO operations.

2606.10857 2026-06-10 cs.RO cs.LG 交叉投稿

Embodiment-conditioned Generalist Control for Multirotor Aerial Robots

基于具身条件的多旋翼空中机器人通用控制

Orestis Konstantaropoulos, Welf Rehberg, Mihir Kulkarni, Kostas Alexis

发表机构 * Department of Engineering Cybernetics, Norwegian University of Science and Technology (NTNU), Trondheim, Norway(挪威科技大学工程控制论系)

AI总结 提出一种通用位置控制策略,通过物理具身描述符(质量与惯性归一化控制分配矩阵)实现单一网络权重控制任意多旋翼构型,采用PPO训练,五分钟后零样本迁移至真实世界。

详情
AI中文摘要

我们提出了一种通用位置控制策略,能够使用单一网络权重控制具有特定旋翼数量(例如六旋翼或四旋翼)的任意多旋翼构型。该策略基于一个物理驱动的具身描述符:一个质量和惯性归一化的控制分配矩阵,该矩阵捕捉了质量归一化的电机推力如何在机体坐标系中产生线性和角加速度。为了训练该策略,我们从任意多旋翼构型的广泛分布中采样,包括非平面和非对称系统,并使用近端策略优化(PPO)优化单个紧凑网络。训练仅需在RTX 3090 GPU上使用基于NVIDIA Warp的自定义动力学模拟器进行五分钟。通过大量仿真实验,我们展示了具身条件化使得通用控制能够在任意形态下鲁棒工作。我们还在三种不同的六旋翼系统上展示了该通用策略的零样本真实世界迁移,包括一个平面机器人、一个部分对称的非平面系统,以及一个随机非对称非平面构型。

英文摘要

We present a generalist position control policy capable of controlling arbitrary multirotor configurations of a certain rotor count (e.g., hexarotors or quadrotors) with a single set of network weights. The policy is conditioned on a physics-grounded embodiment descriptor: a mass and inertia-normalized control allocation matrix that captures how mass-normalized motor thrusts generate linear and angular accelerations in the body-frame. To train the policy, we sample from a broad distribution of arbitrary multirotor configurations, including non-planar and asymmetric systems, and optimize a single, compact network using Proximal Policy Optimization. Training requires only five minutes on an RTX 3090 GPU using a custom NVIDIA Warp-based dynamics simulator. Through extensive simulation experiments, we show that embodiment conditioning enables robust generalist control across arbitrary morphologies. We demonstrate zero-shot real-world transfer of this generalist policy on three diverse hexarotor systems, including a planar robot, a partially symmetric non-planar system, and a random asymmetric, non-planar configuration.

2506.09171 2026-06-10 cs.LG cs.AI cs.CL 版本更新

Fact-Augmented Lookahead Planning for LLM Agents

面向LLM智能体的事实增强前瞻规划

Samuel Holt, Max Ruiz Luyten, Thomas Pouplin, Mihaela van der Schaar

发表机构 * University of Cambridge(剑桥大学)

AI总结 提出LWM-Planner框架,通过从轨迹中提取关键事实并用于条件化动作提议、世界模型模拟和状态值估计,实现无需参数更新的在线规划改进,在多个环境上优于ReAct/Reflexion和纯搜索基线。

Comments Accepted at the 29th International Conference on Artificial Intelligence and Statistics (AISTATS 2026). Camera-ready version. 9-page main text plus appendices (63 pages total), 1 figure

详情
AI中文摘要

大型语言模型(LLM)能力日益增强,但在交互式、部分可观测、长周期环境中,当搜索无引导或近期历史不足时,LLM智能体仍难以有效规划。我们提出LWM-Planner,一种事实增强的前瞻规划框架,仅通过上下文学习改善智能体行为。每个回合后,智能体从轨迹中提取任务关键原子事实,通过轻量级预测一致性过滤器验证候选事实(并可选择压缩),然后使用生成的事实集来条件化动作提议、单步潜在世界模型模拟和状态值估计。规划通过递归、有限深度的前瞻进行,基于累积事实和近期历史对候选轨迹进行搜索,实现无需参数更新的在线改进。我们提供抽象风格的动机:将事实视为减少状态混淆(代理$\epsilon_{\mathrm{sim}}$),将事实条件模拟视为降低单步误差(代理$\delta_{\mathrm{model}}$),但不声称形式化保证。实验上,在文本FrozenLake变体、CrafterMini和ALFWorld上,该方法在累积回报上优于ReAct/Reflexion和纯搜索基线,表明额外的测试时搜索在由紧凑的经验派生事实引导时最为有用。

英文摘要

Large Language Models (LLMs) are increasingly capable, but LLM agents still struggle to plan effectively in interactive, partially observable, long-horizon environments when search is unguided or recent history is insufficient. We introduce LWM-Planner, a fact-augmented lookahead planning framework that improves agent behavior purely through in-context learning. After each episode, the agent extracts task-critical atomic facts from its trajectories, validates candidates with a lightweight predictive-consistency filter (and optionally compresses them), and uses the resulting fact set to condition action proposal, single-step latent world-model simulation, and state-value estimation. Planning then proceeds via recursive, depth-limited lookahead over candidate trajectories conditioned on the accumulated facts and recent history, enabling online improvement without parameter updates. We provide abstraction-style motivation: treating facts as reducing state aliasing (proxy $ε_{\mathrm{sim}}$) and fact-conditioned simulation as lowering one-step error (proxy $δ_{\mathrm{model}}$), without claiming formal guarantees. Empirically, on text FrozenLake variants, CrafterMini, and ALFWorld, the approach improves cumulative return over ReAct/Reflexion and search-only baselines, suggesting that additional test-time search is most useful when grounded by compact, experience-derived facts.

2509.26000 2026-06-10 cs.LG stat.ML 版本更新

Informed Asymmetric Actor-Critic: Leveraging Privileged Signals Beyond Full-State Access

知情非对称Actor-Critic:利用超越全状态访问的特权信号

Daniel Ebi, Damien Ernst, Klemens Böhm, Gaspard Lambrechts

发表机构 * DeepMind(深度Mind) University of Cambridge(剑桥大学) ETH Zurich(苏黎世联邦理工学院) University of Geneva(日内瓦大学)

AI总结 提出知情非对称Actor-Critic框架,允许评论家基于任意状态相关特权信号进行条件化,并证明其产生无偏策略梯度估计;设计两种信息性准则选择最优信号,实验表明精选信号可匹配或超越全状态基线。

Comments Accepted at ICML 2026

详情
AI中文摘要

非对称强化学习利用训练时可用的特权信息来改善部分可观测条件下的学习。现有的非对称actor-critic方法通常假设在训练期间可以访问完整环境状态以条件化评论家,这在实践中往往不现实。我们引入了知情非对称actor-critic框架,允许评论家基于任意状态相关的特权信号进行条件化,并证明任何此类信号都会产生无偏的策略梯度估计。这大大扩展了可允许的特权信息集,并提出了选择最具信息性信号以促进学习的问题。为此,我们提出了两种新颖的信息性准则:一种基于依赖性的测试,可在训练前应用;另一种基于价值预测改进的测试,可事后应用。在部分可观测基准和合成环境上的实验表明,精心选择的特权信号可以在依赖更少状态信息的同时,匹配或超越全状态非对称基线。

英文摘要

Asymmetric reinforcement learning leverages privileged information available during training to improve learning under partial observability. Existing asymmetric actor-critic methods typically assume access to the full environment state to condition the critic during training, which is often unrealistic in practice. We introduce the informed asymmetric actor-critic framework that allows the critic to be conditioned on arbitrary state-dependent privileged signals, and show that any such signal yields unbiased policy gradient estimates. This substantially expands the set of admissible privileged information and raises the problem of selecting the most informative signals for learning. To this end, we propose two novel informativeness criteria: a dependence-based test that can be applied prior to training, and a test based on improvements in value prediction that can be applied post hoc. Experiments on partially observable benchmarks and synthetic environments demonstrate that carefully selected privileged signals can match or outperform full-state asymmetric baselines while relying on strictly less state information.

2512.14617 2026-06-10 cs.LG cs.AI 版本更新

Model-Based Reinforcement Learning in Discrete-Action Non-Markovian Reward Decision Processes

离散动作非马尔可夫奖励决策过程中基于模型的强化学习

Alessandro Trapasso, Luca Iocchi, Fabio Patrizi

发表机构 * Fondazione Bruno Kessler(布雷诺·科塞拉基金会) Sapienza University of Rome(罗马萨皮恩扎大学)

AI总结 提出QR-MAX算法,通过奖励机分解马尔可夫转移学习与非马尔可夫奖励处理,首次在离散NMRDP中获得PAC收敛到ε-最优策略的多项式样本复杂度,并扩展至连续状态空间。

Comments Accepted at IJCAI-ECAI 2026. 19 pages, 32 figures, includes appendix

详情
AI中文摘要

许多实际决策问题涉及的任务成功取决于整个系统历史,而非达到具有期望属性的状态。马尔可夫强化学习(RL)方法不适用于此类任务,而基于非马尔可夫奖励决策过程(NMRDP)的RL使智能体能够处理时间依赖任务。长期以来,这种方法缺乏关于(近)最优性和样本效率的形式保证。我们通过QR-MAX解决了这两个问题,这是一种新颖的基于模型的算法,用于离散NMRDP,通过奖励机将马尔可夫转移学习与非马尔可夫奖励处理分解。据我们所知,这是第一个利用这种分解获得PAC收敛到ε-最优策略且具有多项式样本复杂度的离散动作NMRDP的基于模型的RL算法。然后,我们将QR-MAX扩展到连续状态空间,提出Bucket-QR-MAX,一种基于SimHash的离散化器,它保留了相同的分解结构,无需手动网格划分或函数逼近即可实现快速稳定的学习。我们在复杂度递增的环境中将我们的方法与现代最先进的基于模型的RL方法进行了实验比较,显示出样本效率的显著提高和寻找最优策略的鲁棒性增强。

英文摘要

Many practical decision-making problems involve tasks whose success depends on the entire system history, rather than on achieving a state with desired properties. Markovian Reinforcement Learning (RL) approaches are not suitable for such tasks, while RL with non-Markovian reward decision processes (NMRDPs) enables agents to tackle temporal-dependency tasks. This approach has long been known to lack formal guarantees on both (near-)optimality and sample efficiency. We contribute to solving both issues with QR-MAX, a novel model-based algorithm for discrete NMRDPs that factorizes Markovian transition learning from non-Markovian reward handling via reward machines. To the best of our knowledge, this is the first model-based RL algorithm for discrete-action NMRDPs that exploits this factorization to obtain PAC convergence to $\varepsilon$-optimal policies with polynomial sample complexity. We then extend QR-MAX to continuous state spaces with Bucket-QR-MAX, a SimHash-based discretiser that preserves the same factorized structure and achieves fast and stable learning without manual gridding or function approximation. We experimentally compare our method with modern state-of-the-art model-based RL approaches on environments of increasing complexity, showing a significant improvement in sample efficiency and increased robustness in finding optimal policies.

2512.17629 2026-06-10 cs.LG cs.AI 版本更新

SCOPE: Sequential Causal Optimization of Process Interventions

SCOPE: 过程干预的顺序因果优化

Jakob De Moor, Hans Weytjens, Johannes De Smedt, Jochen De Weerdt

发表机构 * Research Centre for Information Systems Engineering (LIRIS), KU Leuven, Leuven, Belgium(信息系统工程研究中心(LIRIS),鲁汶大学,比利时列文) School of Computation, Information and Technology, Technical University of Munich (TUM), Munich, Germany(计算、信息与技术学院,慕尼黑技术大学(TUM),德国慕尼黑)

AI总结 提出SCOPE方法,通过反向归纳和因果学习直接利用观测数据,优化业务流程中顺序干预的KPI,优于现有方法。

详情
AI中文摘要

规范性过程监控(PresPM)在运行业务流程期间推荐干预措施以优化关键绩效指标(KPI)。在现实环境中,干预很少是孤立的:组织需要对齐干预序列以共同引导案例的结果。现有的PresPM方法仅部分解决了这一挑战。许多方法专注于单个干预决策,而其他方法将多个干预视为独立,忽略了它们随时间如何相互作用。确实处理这些依赖关系的方法依赖于模拟或数据增强来近似过程以训练强化学习(RL)代理,这可能会造成现实差距并引入偏差。我们提出了SCOPE(过程干预的顺序因果优化),一种学习对齐的顺序干预推荐的PresPM方法。SCOPE采用反向归纳来估计每个候选干预动作的效果,将其影响从最终决策点传播回第一个决策点。通过利用因果学习器,我们的方法可以直接使用观测数据,不同于需要构建过程近似用于RL的方法。在现有合成数据集和新的半合成数据集上的实验表明,SCOPE在优化KPI方面始终优于最先进的PresPM技术。基于真实事件日志的新型半合成设置作为可重复使用的基准,用于未来关于顺序PresPM的工作。

英文摘要

Prescriptive Process Monitoring (PresPM) recommends interventions during running business processes to optimize key performance indicators (KPIs). In realistic settings, interventions are rarely isolated: organizations need to align sequences of interventions to jointly steer the outcome of a case. Existing PresPM approaches only partially address this challenge. Many focus on a single intervention decision, while others treat multiple interventions independently, ignoring how they interact over time. Methods that do address these dependencies depend either on simulation or data augmentation to approximate the process to train a Reinforcement Learning (RL) agent, which may create a reality gap and introduce bias. We introduce SCOPE (Sequential Causal Optimization of Process Interventions), a PresPM approach that learns aligned sequential intervention recommendations. SCOPE employs backward induction to estimate the effect of each candidate intervention action, propagating its impact from the final decision point back to the first. By leveraging causal learners, our method can utilize observational data directly, unlike methods that require constructing process approximations for RL. Experiments on both an existing synthetic dataset and a new semi-synthetic dataset show that SCOPE consistently outperforms state-of-the-art PresPM techniques in optimizing the KPI. The novel semi-synthetic setup, based on a real-life event log, is provided as a reusable benchmark for future work on sequential PresPM.

2602.13807 2026-06-10 cs.LG 版本更新

AnomaMind: Agentic Time Series Anomaly Detection with Tool-Augmented Reasoning

AnomaMind:基于工具增强推理的智能体时间序列异常检测

Xiaoyu Tao, Yuchong Wu, Mingyue Cheng, Ze Guo, Tian Gao

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出AnomaMind框架,将时间序列异常检测重构为顺序决策过程,通过粗到细的工作流(定位可疑区间、工具交互构建诊断证据、自我反思细化决策)结合知识记忆与数值诊断工具包,并采用混合推理机制,显著提升域内和跨域异常检测性能与泛化能力。

详情
AI中文摘要

时间序列异常检测在许多实际应用中至关重要,有效的解决方案必须定位异常区域并在复杂设置下支持可靠的决策。然而,现有大多数方法将异常检测视为具有固定特征表示的纯判别预测任务,而非基于证据的诊断过程。因此,当异常表现出强上下文依赖性、多样化模式或跨数据集领域偏移时,它们往往难以应对。为应对这些挑战,我们提出AnomaMind,一个智能体时间序列异常检测框架,将异常检测重构为顺序决策过程。AnomaMind通过粗到细的工作流运行:首先定位可疑区间,然后通过工具交互构建诊断证据,最后通过自我反思细化异常决策。该工作流由一个结合知识记忆和数值诊断的工具箱支持:从训练数据中挖掘的视觉异常模式和领域知识提供上下文指导,而统计、基于值、基于变化和区域级别的算子提供可测量的验证证据。AnomaMind进一步采用混合推理机制,其中通用模型处理灵活推理、工具调用和细化,而检测特定策略通过基于规则的奖励进行优化,以实现可解析输出、F1分数对齐和假阳性控制。在域内和跨域设置下的广泛实验表明,AnomaMind持续改善异常检测性能并增强跨异质异常模式的泛化能力,验证了工具增强推理在异常检测中的有效性。代码可在以下网址获取:https://this URL。

英文摘要

Time series anomaly detection is critical in many real-world applications, where effective solutions must localize anomalous regions and support reliable decision-making under complex settings. However, most existing methods frame anomaly detection as a purely discriminative prediction task with fixed feature representations, rather than an evidence-driven diagnostic process. As a result, they often struggle when anomalies exhibit strong context dependence, diverse patterns, or domain shifts across datasets. To address these challenges, we propose AnomaMind, an agentic time series anomaly detection framework that reformulates anomaly detection as a sequential decision-making process. AnomaMind operates through a coarse-to-fine workflow that first localizes suspicious intervals, then constructs diagnostic evidence through tool interaction, and finally refines anomaly decisions through self-reflection. The workflow is supported by a toolkit box that combines knowledge memory and numerical diagnostics: visual anomaly patterns mined from training data and domain knowledge provide contextual guidance, while statistical, value-based, change-based, and region-level operators provide measurable evidence for verification. AnomaMind further adopts a hybrid inference mechanism in which general-purpose models handle flexible reasoning, tool invocation, and refinement, while a detection-specific policy is optimized with rule-based rewards for parsable outputs, F1-score alignment, and false-positive control. Extensive experiments under both in-domain and cross-domain settings demonstrate that AnomaMind consistently improves anomaly detection performance and enhances generalization across heterogeneous anomaly patterns, validating the effectiveness of tool-augmented reasoning for anomaly detection. The code is available at https://github.com/Xiaoyu-Tao/AnomaMind-TS.

2604.15414 2026-06-10 cs.LG cs.AI cs.NE 版本更新

Beyond Single-Model Optimization: Preserving Plasticity in Continual Reinforcement Learning

超越单模型优化:在持续强化学习中保持可塑性

Lute Lillo, Nick Cheney

发表机构 * Department of Computer Science University of Vermont(计算机科学系大学 of Vermont)

AI总结 提出TeLAPA框架,通过维护行为多样性的策略档案和共享潜在空间,在持续强化学习中实现技能对齐的策略邻域,以解决单策略保存导致的可塑性丧失问题,提升任务学习、恢复和性能保持能力。

详情
AI中文摘要

持续强化学习必须在保留与适应之间取得平衡,然而许多方法仍然依赖于\emph{单模型保存},即承诺将一个不断演化的策略作为跨任务的主要可复用解决方案。即使保留了先前成功的策略,在干扰后它可能不再为快速适应提供可靠的起点,这反映了单策略保存无法解决的一种\emph{可塑性丧失}形式。受质量-多样性方法的启发,我们引入了\emph{TeLAPA}(可迁移的潜在对齐策略档案),这是一个持续强化学习框架,它将行为多样性的策略邻域组织成每个任务的档案,并维护一个共享的潜在空间,使得存档的策略在非平稳漂移下保持可比性和可复用性。这种视角将持续强化学习从保留孤立解决方案转变为维护\emph{技能对齐的邻域},其中包含有能力的、行为相关的策略,以支持未来的重新学习。在我们的MiniGrid持续学习设置中,\emph{TeLAPA}成功学习了更多任务,在干扰后重新访问任务时更快地恢复了能力,并在整个任务序列中保持了更高的性能。我们的分析表明,源最优策略通常不是迁移最优的,即使在局部有能力的邻域内也是如此,并且有效的复用依赖于保留和选择多个邻近的替代方案,而不是将它们合并为一个代表。总之,这些结果将持续强化学习重新定义为围绕可复用且有能力的策略邻域,提供了一条超越单模型保存、迈向更具可塑性的终身智能体的途径。

英文摘要

Continual reinforcement learning must balance retention with adaptation, yet many methods still rely on \emph{single-model preservation}, committing to one evolving policy as the main reusable solution across tasks. Even when a previously successful policy is retained, it may no longer provide a reliable starting point for rapid adaptation after interference, reflecting a form of \emph{loss of plasticity} that single-policy preservation cannot address. Inspired by quality-diversity methods, we introduce \textsc{TeLAPA} (Transfer-Enabled Latent-Aligned Policy Archives), a continual RL framework that organizes behaviorally diverse policy neighborhoods into per-task archives and maintains a shared latent space so that archived policies remain comparable and reusable under non-stationary drift. This perspective shifts continual RL from retaining isolated solutions to maintaining \emph{skill-aligned neighborhoods} with competent and behaviorally related policies that support future relearning. In our MiniGrid CL setting, \textsc{TeLAPA} learns more tasks successfully, recovers competence faster on revisited tasks after interference, and retains higher performance across a sequence of tasks. Our analyses show that source-optimal policies are often not transfer-optimal, even within a local competent neighborhood, and that effective reuse depends on retaining and selecting among multiple nearby alternatives rather than collapsing them to one representative. Together, these results reframe continual RL around reusable and competent policy neighborhoods, providing a route beyond single-model preservation toward more plastic lifelong agents.

2604.20024 2026-06-10 cs.LG 版本更新

Replicable Bandits with UCB based Exploration

基于UCB探索的可复现Bandits

Rohan Deb, Udaya Ghai, Karan Singh, Arindam Banerjee

发表机构 * University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) Amazon(亚马逊) Carnegie Mellon University(卡内基梅隆大学)

AI总结 研究随机多臂老虎机和线性老虎机中基于UCB探索的可复现算法,提出RepUCB和RepLinUCB,分别实现最优遗憾界,显著降低可复现性代价。

详情
AI中文摘要

我们研究随机多臂老虎机(MAB)和线性老虎机中基于UCB(上置信界)探索的可复现算法。一个bandit算法是$\rho$-可复现的,如果两次使用共享内部随机性但独立奖励实现的执行以至少$1-\rho$的概率产生相同的动作序列。先前解决该问题的方法是消除法,并且在具有无限动作的线性老虎机中依赖于离散化,导致对维度$d$和$\rho$的次优依赖。我们为这两种设置开发了乐观替代方案。对于随机多臂老虎机,我们提出RepUCB,一种可复现的批处理UCB算法,并证明其遗憾为$O\\!\left(\frac{K^2\log^2 T}{\rho^2}\sum_{a:\Delta_a>0}\left(\Delta_a+\frac{\log(KT\log T)}{\Delta_a}\right)\right)$。对于随机线性老虎机,我们首先引入RepRidge,一种可复现的岭回归估计器,它同时满足置信度保证和$\rho$-可复现性保证。除了在bandit算法中的作用外,它可能在其他统计估计设置中也具有独立意义。然后我们使用RepRidge设计RepLinUCB,一种用于随机线性老虎机的可复现乐观算法,并证明其遗憾以$\widetilde{O}\\!\big(\big(d+\frac{d^3}{\rho}\big)\sqrt{T}\big)$为界。这比先前的最佳遗憾保证提高了$O(d/\rho)$因子,表明我们的乐观算法可以显著降低可复现性的代价。这是第一个对于大量臂具有最优$\rho$依赖性的线性bandit算法。最后,我们将框架扩展到随机广义线性老虎机,开发了RepGLM(一种可复现的惩罚GLM估计器)和RepGLMUCB(一种用于该设置的可复现乐观算法)。

英文摘要

We study replicable algorithms for stochastic multi-armed bandits (MAB) and linear bandits with UCB (Upper Confidence Bound) based exploration. A bandit algorithm is $ρ$-replicable if two executions using shared internal randomness but independent reward realizations produce the same action sequence with probability at least $1-ρ$. Prior approaches to this problem are elimination-based and, in linear bandits with infinitely many actions, rely on discretization, leading to suboptimal dependence on the dimension $d$ and $ρ$. We develop optimistic alternatives for both settings. For stochastic multi-armed bandits, we propose RepUCB, a replicable batched UCB algorithm and show that it attains a regret $O\!\left(\frac{K^2\log^2 T}{ρ^2}\sum_{a:Δ_a>0}\left(Δ_a+\frac{\log(KT\log T)}{Δ_a}\right)\right)$. For stochastic linear bandits, we first introduce RepRidge, a replicable ridge regression estimator that satisfies both a confidence guarantee and a $ρ$-replicability guarantee. Beyond its role in our bandit algorithm, this may also be of independent interest in other statistical estimation settings. We then use RepRidge to design RepLinUCB, a replicable optimistic algorithm for stochastic linear bandits, and show that its regret is bounded by $\widetilde{O}\!\big(\big(d+\frac{d^3}ρ\big)\sqrt{T}\big)$. This improves the best prior regret guarantee by a factor of $O(d/ρ)$, showing that our optimistic algorithm can substantially reduce the price of replicability. This is the first linear-bandit algorithm with an optimal dependence on $ρ$ for large number of arms. Finally, we extend our framework to stochastic generalized linear bandits by developing RepGLM, a replicable penalized GLM estimator, and RepGLMUCB, a replicable optimistic algorithm for this setting.

2605.04254 2026-06-10 cs.LG cs.HC 版本更新

Hierarchical Support Vector State Partitioning for Distilling Black Box Reinforcement Learning Policies

层次化支持向量状态划分用于黑箱强化学习策略的蒸馏

Senne Deproost, Mehrdad Asadi, Ann Nowé

发表机构 * Vrije Universiteit Brussel(布鲁塞尔自由大学)

AI总结 本文提出SVSP方法,通过线性支持向量机划分状态动作对数据集,构建紧凑结构化表示,提升平均回报7.4%并减少子策略数量,推动更灵活的蒸馏方法。

Comments Accepted for poster presentation at HHAI 2026

详情
AI中文摘要

我们介绍状态向量空间划分(SVSP),一种新颖的方法,通过一组可解释的子策略来模仿黑箱强化学习策略。通过线性支持向量机划分蒸馏数据集中的状态动作对,SVSP构建了原始策略的紧凑且结构化的表示。我们的方法在均值回报上比之前的批评驱动状态划分方法如Voronoi状态划分(VSP)提高了7.4%,比原始TD3策略提高了2.8%,同时将所需子策略数量比VSP减少了82.1%。我们的结果为一种更灵活的蒸馏方法铺平了道路,其中决策边界和替代模型可以在原始黑箱行为的范围内进行选择。

英文摘要

We introduce State Vector Space Partitioning (SVSP), a novel method to mimic a black box reinforcement learning policy using a set of human-interpretable subpolicies. By partitioning a distillation dataset of state action pairs with linear support vector machine splits, SVSP constructs a compact and structured representation of the original policy. Our method improves mean return by +7.4% over previous critic driven state partitioning attempts such as Voronoi State Partitioning (VSP) and +2.8% over the original TD3 policy, while reducing the number of required subpolicies against VSP by 82.1%. Our results pave the path towards a more flexible form of distillation where both the decision boundary and surrogate models can be chosen within a margin of the original black box behavior.

2605.05857 2026-06-10 cs.LG 版本更新

Offline Reinforcement Learning for Rotation Profile Control in Tokamaks

托卡马克旋转剖面控制的离线强化学习

Rohit Sonker, Hiro Josep Farre Kaga, Jiayu Chen, Andrew Rothstein, Ian Char, Ricardo Shousha, Egemen Kolemen, Jeff Schneider

发表机构 * Robotics Institute, Carnegie Mellon University(卡内基梅隆大学机器人研究所) Princeton University(普林斯顿大学) Princeton Plasma Physics Lab(普林斯顿等离子物理实验室) The University of Hong Kong(香港大学) Lila Sciences

AI总结 针对托卡马克等离子体旋转剖面控制难题,提出基于历史数据的离线强化学习方法,利用概率模型生成轨迹训练策略,并在DIII-D托卡马克上验证了有效性。

详情
AI中文摘要

托卡马克仍然是实现实用聚变能的主要候选装置,然而这些装置内部的许多重要控制问题仍然困难或未解决。其中一个挑战是控制等离子体旋转剖面,它强烈影响稳定性、约束和输运。虽然平均旋转可以被控制,但由于高维度、对多个执行器的响应以及对等离子体条件的依赖性,控制完整剖面具有挑战性。基于学习的控制方法,如强化学习(RL),为解决这一难题提供了潜在方案,能够建模复杂相互作用,从而实现有效的多输入多输出控制。然而,由于缺乏能够建模旋转剖面动力学的精确模拟器,学习此类策略具有挑战性。在这项工作中,我们研究了使用离线RL和离线基于模型的RL算法进行旋转剖面控制,仅基于DIII-D托卡马克的历史数据训练它们。我们的最终方法使用等离子体动力学的概率模型为RL训练生成轨迹。我们在DIII-D托卡马克上部署该策略,并观察到有希望的实际结果。最后,我们强调了在使用有限历史数据的情况下,在复杂物理设备上训练和部署RL策略的关键挑战和见解。

英文摘要

Tokamaks remain leading candidates for achieving practical fusion energy, yet many important control problems inside these devices are still difficult or unsolved. One such challenge is controlling the plasma rotation profile, which strongly influences stability, confinement, and transport. While the average rotation can be controlled, controlling the full profile is challenging due to high dimensionality, response to multiple actuators and dependence on plasma condition. Learning-based control methods, such as reinforcement learning (RL), provide a potential solution to this challenging problem with ability to model complex interactions leading to effective multi-input multi-output control. However, learning such policies is challenging due to the lack of accurate simulators that can model the rotation profile dynamics. In this work, we investigate the use of offline RL and offline model-based RL algorithms for rotation profile control, training them solely on historical data from the DIII-D tokamak. Our final method uses probabilistic models of plasma dynamics to generate rollouts for RL training. We deploy this policy on the DIII-D Tokamak and observe promising real-world results. We conclude by highlighting key challenges and insights from training and deploying an RL policy on a complex physical device while using only limited past data.

2606.05597 2026-06-10 cs.LG 版本更新

AsyncWebRL: Efficient Multi-Step RL for Visual Web Agents

AsyncWebRL: 面向视觉网页智能体的高效多步强化学习

Hao Bai, Rui Yang, Chenlu Ye, Spencer Whitehead, Aviral Kumar, Tong Zhang

发表机构 * UIUC(伊利诺伊大学香槟分校) Microsoft(微软) CMU(卡内基梅隆大学)

AI总结 提出异步系统设计和算法改进,解决多步强化学习中GPU空闲和轨迹过长问题,实现训练吞吐量提升2.9倍,并在WebGym测试集上取得新最优结果。

Comments Updated logo and code link

详情
AI中文摘要

使用多步强化学习训练视觉语言网页智能体计算密集,存在两种主要低效形式:同步强化学习中的GPU空闲,以及使用比必要更多步数和令牌的轨迹。我们提出AsyncWebRL,同时解决这两个问题。在系统方面,异步设计在迭代间重叠展开、梯度更新和策略刷新,并配合两种针对网页智能体的特定适配,即永久展开池和轻量级截图处理,共同实现端到端训练吞吐量比先前最快的开源同步流水线(WebGym)提升高达2.9倍。在算法方面,我们识别出多步GRPO中的每轨迹归一化器$1/|τ_i|$是轨迹级和令牌级低效的根本原因:因为失败轨迹系统性地比成功轨迹长,它降低了失败令牌上负梯度的权重,导致策略持续生成冗长的记忆模式。将$1/|τ_i|$替换为常数$1/k$打破了这种耦合,在保持总体成功率的同时缩短了轨迹。这些贡献共同在WebGym分布外测试集上设立了新的开源最优水平(相对先前最佳42.9%提升5.8%),在更难子集上提升最大(中等难度相对提升42%,困难难度相对提升48%)。

英文摘要

Training vision-language web agents with multi-step RL is compute-intensive, with two dominant forms of inefficiency: idle GPUs in synchronous RL, and trajectories that use more steps and tokens than necessary. We present AsyncWebRL, which addresses both. On the system side, an asynchronous design overlaps rollout, gradient update, and policy refresh across iterations, paired with two web-agent-specific adaptations, namely an everlasting rollout pool and lightweight screenshot handling, that together deliver up to a $2.9\times$ end-to-end training-throughput speedup over the previously fastest open synchronous pipeline (WebGym). On the algorithmic side, we identify the per-trajectory normalizer $1/|τ_i|$ in multi-step GRPO as the root cause of trajectory-level and token-level inefficiency: because failures are systematically longer than successes, it down-weights the negative gradient on failed tokens, so the policy keeps producing verbose memory schemas. Replacing $1/|τ_i|$ with a constant $1/k$ breaks this coupling, contracting trajectories while preserving aggregate success. Together, these contributions set a new open-source state of the art on the WebGym out-of-distribution test split (+5.8% relative over the 42.9% prior best), with the largest gains on the harder slices (+42% relative on Medium, +48% relative on Hard).

2606.06744 2026-06-10 cs.LG cs.GT cs.MA econ.TH 版本更新

Learn to Match: Two-Sided Matching with Temporally Extended Feedback

学会匹配:具有时间扩展反馈的双边匹配

Haijing Zong, Yancheng Liang, Boyang Zhou, Natasha Jaques

发表机构 * Department of Economics, University of Washington(华盛顿大学经济系) Paul G. Allen School of Computer Science & Engineering, University of Washington(华盛顿大学保罗·G·艾伦计算机科学与工程学院)

AI总结 提出一个具有时间扩展反馈的双边匹配框架,将其建模为部分可观测马尔可夫博弈,并基于多智能体强化学习构建Learn2Match基准,实验表明独立PPO优于bandit基线,但存在信息摩擦损失。

详情
AI中文摘要

双边匹配市场通常涉及随时间通过面试、重复互动、学习和分离而展开的信息。现有的匹配模型通常将此过程简化为关于固定偏好的即时亚高斯反馈,忽略了支付相关信息逐渐揭示并改变未来匹配决策的场景。我们引入了一个具有时间扩展反馈的框架,将双边匹配建模为一个部分可观测马尔可夫博弈,其中包含昂贵的匹配前筛选、有噪声的匹配后观测、演变的潜在特征以及内生的延续或解散。我们在Learn2Match中实例化该框架,这是一个用于动态匹配市场的多智能体强化学习基准。Learn2Match支持关于面试谁、与谁匹配以及何时解散匹配的分散决策,同时使用遗憾、社会福利和信息摩擦损失(衡量由潜在偏好不完全揭示引起的福利差距)来评估策略。我们发现,在时间扩展反馈下,独立PPO比bandit风格的CA-ETC基线实现了更高的累积社会福利和更低的累积遗憾,展示了MARL在动态匹配市场中的前景。然而,PPO仍然产生更高的信息摩擦损失,表明端到端MARL尚未提供匹配bandit方法的协调探索结构。这些结果将Learn2Match定位为开发下一代匹配市场算法的基准:像RL智能体一样自适应、像bandit算法一样统计严谨、像稳定匹配机制一样结构感知的方法。

英文摘要

Two-sided matching markets often involve information that unfolds over time through interviews, repeated interaction, learning, and separation. Existing matching models typically reduce this process to immediate sub-Gaussian feedback about fixed preferences, missing settings where payoff-relevant information is revealed gradually and changes future matching decisions. We introduce a framework with temporally extended feedback, that formulates two-sided matching as a partially observable Markov game with costly pre-match screening, noisy post-match observations, evolving latent profiles, and endogenous continuation or dissolution. We instantiate this framework in Learn2Match, a multi-agent reinforcement-learning benchmark for dynamic matching markets. Learn2Match supports decentralized decision making over whom to interview, whom to match with, and when to dissolve a match, while evaluating policies using regret, social welfare, and an information-friction loss that measures the welfare gap caused by incomplete revelation of latent preferences. We find that independent PPO achieves higher cumulative social welfare and lower cumulative regret than the bandit-style CA-ETC baseline under temporally extended feedback, demonstrating the promise of MARL for dynamic matching markets. However, PPO still incurs higher information-friction loss, revealing that end-to-end MARL does not yet provide the coordinated exploration structure of matching-bandit methods. These results position Learn2Match as a benchmark for developing the next generation of matching-market algorithms: methods that are adaptive like RL agents, statistically disciplined like bandit algorithms, and structurally aware like stable-matching mechanisms. Please refer to https://sites.google.com/view/learn-to-match/home for the official website and the code link.

2509.25760 2026-06-10 cs.CL cs.AI cs.LG 版本更新

TruthRL: Incentivizing Truthful LLMs via Reinforcement Learning

TruthRL: 通过强化学习激励诚实的LLM

Zhepei Wei, Xiao Yang, Kai Sun, Jiaqi Wang, Rulin Shao, Jingxiang Chen, Mohammad Kachuee, Teja Gollapudi, Yiwei Liao, Nicolas Scheffer, Rakesh Wanga, Anuj Kumar, Yu Meng, Wen-tau Yih, Xin Luna Dong

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出TruthRL框架,使用GRPO和三值奖励直接优化LLM的诚实性,减少幻觉并允许不确定时弃权,在知识密集型基准上显著提升诚实性。

Comments ICML 2026. Code: https://github.com/facebookresearch/TruthRL

详情
AI中文摘要

虽然大型语言模型(LLM)在事实性问题回答上表现出色,但它们仍然容易产生幻觉和不真实的回答,特别是当任务需要其参数知识之外的信息时。事实上,诚实性需要的不仅仅是准确性——模型还必须识别不确定性,并在不确定时弃权以避免幻觉。这对现有方法提出了根本性挑战:优化准确性的方法往往会放大幻觉,而鼓励弃权的方法可能变得过于保守,牺牲正确答案。两种极端最终都损害了诚实性。在这项工作中,我们提出了TruthRL,一个通用的强化学习(RL)框架,直接优化LLM的诚实性。具体来说,我们使用GRPO实现TruthRL,并采用一个简单而有效的三值奖励,区分正确答案、幻觉和弃权。它激励模型不仅通过提供正确回答来减少幻觉,还通过在不确定时启用弃权来提高诚实性。在四个知识密集型基准上的大量实验表明,TruthRL显著减少了幻觉(例如,43.5% → 19.4%)并提高了诚实性(例如,5.3% → 37.2%),在各种骨干模型上均有一致的提升。分析表明,TruthRL的改进源于LLM识别其知识边界的能力增强,从而避免了像基线那样过于保守。

英文摘要

While large language models (LLMs) have demonstrated strong performance on factoid question answering, they are still prone to hallucination and untruthful responses, particularly when tasks demand information outside their parametric knowledge. Indeed, truthfulness requires more than accuracy -- models must also recognize uncertainty and abstain when unsure to avoid hallucinations. This presents a fundamental challenge for existing methods: approaches that optimize for accuracy often amplify hallucinations, while those that encourage abstention can become overly conservative, sacrificing correct answers. Both extremes ultimately compromise truthfulness. In this work, we present TruthRL, a general reinforcement learning (RL) framework that directly optimizes the truthfulness of LLMs. Specifically, we implement TruthRL using GRPO with a simple yet effective ternary reward that distinguishes correct answers, hallucinations, and abstentions. It incentivizes models to reduce hallucinations not only by providing correct responses, but also by enabling abstention when uncertain, thereby improving truthfulness. Extensive experiments across four knowledge-intensive benchmarks show that TruthRL significantly reduces hallucinations (e.g., 43.5% $\rightarrow$ 19.4%) and improves truthfulness (e.g., 5.3% $\rightarrow$ 37.2%), with consistent gains across various backbone models. Analysis shows that the improvement of TruthRL arises from enhanced capability of LLMs to recognize their knowledge boundary, hence avoiding being overly conservative as the baselines are.

4. 生成模型与概率建模 26 篇

2606.09926 2026-06-10 cs.LG cs.AI 新提交

Sample Where You Struggle: Sharpening Base Model Reasoning via Entropy-Guided Power Sampling

在你挣扎处采样:通过熵引导的幂采样增强基础模型推理

Hong Guo, Nianhui Guo, Christoph Meinel, Haojin Yang

AI总结 提出熵引导的幂采样(EGPS),一种无需训练和验证器的采样方法,通过利用前向传播中的token级熵将MCMC移动定位到高熵区域,在多个基准上以高达12.6倍加速达到最优或并列最优准确率。

详情
AI中文摘要

从序列级幂分布 $p^\alpha$ 采样可以在不更新任何参数的情况下从基础语言模型中引出强化学习级别的推理,但标准的Metropolis-Hastings(MH),一种马尔可夫链蒙特卡洛(MCMC)采样器,既昂贵又慢混合。我们将这两个问题归因于结构不匹配:$p^\alpha$ 主要在稀疏、空间聚集的高熵决策点集上偏离 $p$,然而MH沿着前缀均匀地提出重采样位置——在近简并条件上浪费计算,同时在模式发散处欠混合。我们提出熵引导的幂采样(EGPS),一种无需训练和验证器的采样器,它从已经在前向传播中的token级熵重新推导其提议。EGPS跳过确定性块,将每个MCMC移动定位到高熵邻域,并在决策点应用多尝试Metropolis——使得采样成本随熵质量而非序列长度缩放。在Qwen2.5-Math-7B上,EGPS在所有三个基准(MATH500 $75.8\\%$,HumanEval $62.2\\%$,GPQA $42.4\\%$)上达到最佳或并列最佳准确率,同时相对于MH基线实现了高达12.6倍的墙钟加速。

英文摘要

Sampling from the sequence-level power distribution $p^α$ elicits RL-level reasoning from base language models without any parameter updates, but the standard Metropolis--Hastings (MH), a Markov Chain Monte Carlo (MCMC) sampler, is both expensive and slow-mixing. We trace both to a structural mismatch: $p^α$ mainly departs from $p$ at a sparse, spatially clustered set of high-entropy decision points, yet MH proposes resampling positions uniformly along the prefix -- wasting compute on near-degenerate conditionals while under-mixing precisely where modes diverge. We propose Entropy-Guided Power Sampling (EGPS), a training-free and verifier-free sampler that re-derives its proposal from token-level entropy already in the forward pass. EGPS skips deterministic blocks, localizes each MCMC move to a high-entropy neighborhood, and applies Multiple-Try Metropolis at decision points -- making sampling cost scale with \emph{entropy mass rather than sequence length}. On Qwen2.5-Math-7B, EGPS reaches best or tied-best accuracy on all three benchmarks (MATH500 $75.8\%$, HumanEval $62.2\%$, GPQA $42.4\%$) at up to a $12.6\times$ wall-clock speedup over the MH baseline.

2606.09962 2026-06-10 cs.LG cs.AI cs.SD 新提交

Optimality of FSQ Tokens for Continuous Diffusion for Categorical Data with Application to Text-to-Speech

FSQ 令牌在分类数据连续扩散中的最优性及其在文本到语音中的应用

Vadim Popov, Wenju Gu, Tasnima Sadekova, Georgii Aparin, Assel Yermekova

AI总结 本文研究连续扩散模型中离散令牌的潜在空间结构,通过理论分析和实验证明 FSQ 令牌化方案在分类数据连续扩散中最优,并在文本到语音任务中验证其优于基于 LLM 的方法。

详情
AI中文摘要

分类数据的连续扩散是一种属于扩散家族的框架,旨在生成离散数据。近年来,由于研究人员试图实现寻找自回归大型语言模型的合理替代方案这一具有挑战性的目标,对此类模型的科学兴趣不断增长。在本文中,我们研究了与离散令牌相对应的潜在空间结构的性质,这些性质通过扩散路径测度上的 Kullback-Leibler 散度和最优训练扩散模型正确预测令牌的准确性来表达。我们发现,FSQ 令牌化方案具有的潜在空间结构使其最适合分类数据的连续扩散,这一点通过严格的理论分析和数值实验得到了验证。为了在现实场景中验证我们的发现,我们训练了几个以语音令牌作为中间声学特征的文本到语音扩散模型,并表明基于 FSQ 令牌的模型确实表现最佳,而且它优于其强大的基于 LLM 的对应模型,同时体积更小、速度更快。

英文摘要

Continuous diffusion for categorical data is a framework belonging to the diffusion family and aiming at generating discrete data. The scientific interest to such models has been constantly increasing these days because researchers try to achieve a challenging goal of finding reasonable alternatives to autoregressive large language models. In this paper, we study the properties of the structure of the latent space corresponding to discrete tokens expressed in terms of Kullback-Leibler divergence on diffusion path measures and accuracy of the correct token prediction by the optimally trained diffusion model. We find that FSQ tokenization scheme has the latent space structure with the properties that make it best suited for continuous diffusion for categorical data as verified through rigorous theoretical analysis and numerical experiments. To validate our findings in real-life scenario, we train several text-to-speech diffusion models having speech tokens as intermediate acoustic features, and show that the one based on FSQ tokens indeed performs the best, and, moreover, it outperforms its strong LLM-based counterpart, at the same time being significantly smaller and faster.

2606.10089 2026-06-10 cs.LG cs.AI 新提交

A Theory on Flow Matching with Neural Networks

基于神经网络的流匹配理论

Yihan He, Qishuo Yin, Yuan Cao, Jianqing Fan, Han Liu

AI总结 本文为神经网络参数化的条件速度场流匹配建立了理论基础,证明了过参数化两层ReLU网络中梯度下降的收敛性,推导了条件速度场匹配目标的泛化界,并提供了生成样本的Wasserstein距离保证。

详情
AI中文摘要

在这项工作中,我们为神经网络参数化的条件速度场流匹配建立了理论基础。我们证明了过参数化两层ReLU神经网络中梯度下降的收敛性保证。我们推导了条件速度场匹配目标的泛化界。基于这些结果,我们为诱导流生成的样本提供了Wasserstein距离保证。我们的分析基于具有无界损失的多任务表示学习的泛化界,这可能对流式生成建模之外的其他领域也有独立意义。这些理论结果通过在合成和真实图像基准上的大量实验得到了验证。

英文摘要

In this work, we develop theoretical foundation for flow matching with neural-network-parameterized conditional velocity fields. We establish convergence guarantees for gradient descent in the over-parameterized 2-layered ReLU neural network regime. We derive generalization bounds for the conditional velocity-field matching objective. Building on these results, we provide Wasserstein-distance guarantees for the samples generated by the induced flow. Our analysis is based on generalization bound for multi-task representation learning with unbounded losses, which may be of independent interest beyond flow-based generative modeling. These theoretical results are validated through extensive experiments on both synthetic and real-world image benchmarks.

2606.10153 2026-06-10 cs.LG 新提交

Compositional Generative Modeling from Decentralized Data

来自分散数据的组合生成建模

Mashrur M. Morshed, Vishnu Naresh Boddeti

AI总结 提出去中心化组合流匹配(DCFM)框架,通过结构约束实现分散数据中生成因子的组合,无需交换原始数据,在条件图像生成、机器人空间规划和医学属性共现建模中显著优于基线。

Comments ICML 2026

详情
AI中文摘要

学习物理世界的组合性质需要联合观察相互作用因素。然而,由于实际数据通常是分散的,这些因素被碎片化地隔离在孤岛中。现有的去中心化生成方法仅关注对孤岛数据并集的建模,忽略了整体所隐含的新颖组合。为弥合这一差距,我们引入了去中心化组合流匹配(DCFM),这是一个在全局生成因子集上强制执行结构约束的框架,无需交换任何原始数据。DCFM使得新颖组合能够通过同伴交互涌现,即使没有单一数据源能独立支持该组合。实验上,DCFM在条件图像生成、机器人空间规划和医学属性共现建模中显著优于联邦学习和混合专家基线。

英文摘要

Learning the compositional nature of the physical world requires joint observation of interacting factors. However, because practical data is often decentralized, these factors are fragmented across isolated silos. Existing decentralized generative approaches focus only on modeling the union of siloed data, overlooking novel combinations implied by the collective whole. To bridge this gap, we introduce Decentralized Compositional Flow Matching (DCFM), a framework that enforces structural constraints across the global set of generative factors, without exchanging any raw data. DCFM enables novel combinations to emerge through peer interactions, even when no single data source can independently support the composition. Empirically, DCFM substantially outperforms federated learning and mixture-of-experts baselines across conditional image generation, robotic spatial planning, and medical attribute co-occurrence modeling.

2606.10199 2026-06-10 cs.LG cs.CL 新提交

A Continuous-Time Markov Chain Framework for Insertion Language Models

插入语言模型的连续时间马尔可夫链框架

Dhruvesh Patel, Benjamin Rozonoyer, Soumitra Das, Tahira Naseem, Tim G. J. Rudner, Andrew McCallum

AI总结 提出基于连续时间马尔可夫链的插入语言模型去噪框架,统一现有方法,在规划任务中优于自回归和掩码扩散模型,语言建模中与现有方法竞争且采样更灵活。

Comments Accepted at AISTATS 2026. Code is available at https://github.com/dhruvdcoder/ctmc_dilm

详情
AI中文摘要

插入语言模型(ILMs)相比从左到右生成和基于掩码的生成具有若干优势。然而,现有的插入式生成公式大多是临时性的。在本文中,我们通过将噪声过程建模为变长序列空间上的连续时间马尔可夫链,从第一性原理推导出ILMs的扩散式去噪目标。我们表明,先前的ILMs公式可以视为该去噪框架的特例。通过在合成规划任务上的实证评估,我们展示了所提出的方法保留了插入式生成相对于从左到右生成和掩码扩散模型的优势。在语言建模中,我们的基于扩散的方法与从左到右生成和掩码扩散模型具有竞争力,同时与现有的插入语言模型相比,在采样方面提供了额外的灵活性。

英文摘要

Insertion Language Models (ILMs) offer several advantages over left-to-right generation and mask-based generation. However, existing formulations of insertion-based generation have largely been ad-hoc. In this paper, we derive a diffusion-style denoising objective for ILMs from first principles by formulating the noising process as a continuous-time Markov chain on the space of variable-length sequences. We show that previous formulations of ILMs can be viewed as special cases of this denoising framework. Through empirical evaluation on a synthetic planning task, we show that the proposed approach retains the benefits of insertion-based generation over left-to-right generation and masked diffusion models. In language modeling, our diffusion-based approach is competitive with left-to-right generation and masked diffusion models, while offering additional flexibility in sampling compared to existing insertion language models.

2606.10543 2026-06-10 cs.LG cs.AI cs.ET q-bio.QM 新提交

Flexible Flows for Biological Sequence Design

生物序列设计的灵活流模型

Yogesh Verma, Dani Korpela, Harri Lähdesmäki, Vikas Garg

发表机构 * Aalto University(阿尔托大学) YaiYai Ltd(YaiYai有限公司) OpenProtein.AI

AI总结 提出结构化耦合、潜编辑速率参数化和潜分类器无引导机制,实现变长序列生成和细粒度控制,在多种生物序列任务中达到最优性能。

详情
AI中文摘要

设计功能性生物序列需要在严格的进化和生物物理约束下导航巨大的离散空间。离散流匹配(DFM)提供了在此类空间上的生成框架,但现有方法依赖于生物学上无信息的耦合,并且在变长序列生成和细粒度控制方面灵活性有限。我们提出了一种结构化耦合,编码序列元素间的领域特定偏好,将源分布偏向合理区域,而不修改流目标或训练过程。在此基础上,我们引入了一种基于潜编辑的速率参数化,通过基于共享全局潜变量的编辑操作(类似于潜变量模型)对变长生成进行建模,同时保持可追踪性。我们进一步引入了一种潜分类器无引导机制,在连续潜空间中连贯地引导生成,以及用于测试时控制编辑操作的Dirichlet先验温度缩放。我们的方法在多种生物序列任务中实现了最先进的性能,包括密度估计、无条件和条件DNA序列生成以及肽序列生成。

英文摘要

Designing functional biological sequences requires navigating vast discrete spaces under strict evolutionary and biophysical constraints. Discrete Flow Matching (DFM) offers a generative framework over such spaces, but existing approaches rely on biologically uninformative couplings and offer limited flexibility for variable-length sequence generation and fine-grained control. We propose a structured coupling that encodes domain-specific preferences among sequence elements, biasing the source distribution toward plausible regions without modifying the flow objective or training procedure. Building on this, we introduce a latent edit-based rate parameterization that models variable-length generation via edit operations conditioned on a shared global latent, akin to a latent variable model, while remaining tractable. We further introduce a latent classifier-free guidance mechanism that steers generation coherently in continuous latent space, along with Dirichlet-prior temperature scaling for test-time control over edit operations. Our method achieves state-of-the-art performance across diverse biological sequence tasks, including density estimation, unconditional and conditional DNA sequence generation, and peptide sequence generation.

2606.10592 2026-06-10 cs.LG 新提交

Dirichlet-Guided Group Forecasting for Alleviating Over-smoothing in Time Series Forecasting

Dirichlet引导的群体预测:缓解时间序列预测中的过度平滑

Xingyu Zhang, Jingyao Wang, Xin Yu, Zeen Song, Jianqi Zhang, Changwen Zheng, Wenwen Qiang

发表机构 * University of Chinese Academy of Sciences(中国科学院大学) Institute of Software, Chinese Academy of Sciences(中国科学院软件研究所)

AI总结 针对时间序列预测中的过度平滑问题,提出Dirichlet引导的群体预测(DGF)框架,通过显式建模多个模式条件预测分布及其选择概率的不确定性,并采用Dirichlet引导的分层采样和奖励优化,提升预测的准确性、多样性和动态一致性。

详情
AI中文摘要

时间序列预测常常遭受过度平滑的影响,尤其是当未来动态是多模态时。预测可能遵循观测未来的粗略趋势,但未能保留定义合理动态演变的急剧变化、振荡、转折点和制度转换。在这项工作中,我们从潜在动态模式压缩的角度重新审视过度平滑:在部分观测和单一实现监督下,多个可能的未来模式可能在预测过程中被削弱、合并或平均。基于这一观点,我们提出了Dirichlet引导的群体预测(DGF),一种保持模式的预测框架,它显式建模多个模式条件预测分布及其选择概率的不确定性。DGF使用Dirichlet引导的分层采样机制和基于奖励的优化,以鼓励预测准确、动态一致且模式区分。在真实世界预测基准上的大量实验表明,DGF减少了过度平滑,同时提高了预测准确性、多样性和动态一致性。

英文摘要

Time series forecasting often suffers from over-smoothing, especially when future dynamics are multi-modal. Forecasts may follow the coarse trend of the observed future, but fail to preserve sharp changes, oscillations, turning points, and regime transitions that define plausible dynamic evolution. In this work, we revisit over-smoothing from the perspective of latent dynamical mode compression: under partial observation and single-realization supervision, multiple plausible future modes can be weakened, merged, or averaged during forecasting. Based on this view, we propose Dirichlet-Guided Group Forecasting (DGF), a mode-preserving forecasting framework that explicitly models multiple mode-conditioned predictive distributions and uncertainty over their selection probabilities. DGF uses a Dirichlet-guided hierarchical sampling mechanism and reward-based optimization to encourage forecasts that are accurate, dynamically consistent, and mode-distinct. Extensive experiments on real-world forecasting benchmarks show that DGF reduces over-smoothing while improving forecasting accuracy, diversity, and dynamical consistency.

2606.10959 2026-06-10 cs.LG 新提交

Population-Aware Physics-Informed Neural Particle Flow for Bayesian Update

群体感知的物理信息神经粒子流用于贝叶斯更新

Batu Candan, Simone Servadio

发表机构 * Iowa State University(爱荷华州立大学)

AI总结 提出群体感知的物理信息神经粒子流(PA-PINPF),通过群体编码器增强粒子更新,在贝叶斯后验传输中优于标准方法。

详情
AI中文摘要

物理信息神经粒子流(PINPF)学习一个确定性传输场,该场将粒子从先验分布移动到贝叶斯后验分布,同时强制执行控制概率演化的方程。然而,标准PINPF速度模型独立处理每个粒子,因此不显式地将其传输决策基于经验粒子群体。本文引入了群体感知的PINPF(PA-PINPF),它通过整个粒子集的置换不变深度集表示来增强每个粒子的更新。我们研究了两种群体编码器。PA-PINPF-State总结粒子状态,而PA-PINPF-Feature总结完整的局部物理信息特征向量,包括粒子位置、伪时间、测量信息、似然值和得分信息。后者使得群体上下文不仅能表示粒子云几何,还能表示群体级别的贝叶斯传输几何。这些方法保留了原始无监督的物理信息残差目标,并且在训练过程中不需要真实后验样本。在距离测量任务和非线性到达时间差后验传输上的实验表明,两种群体感知变体均优于逐粒子PINPF,而特征群体编码提供了最强的性能。这些结果表明,群体级别的物理特征为学习贝叶斯粒子传输提供了有用的全局信息。

英文摘要

Physics-informed neural particle flow (PINPF) learns a deterministic transport field that moves particles from a prior distribution toward a Bayesian posterior while enforcing the governing probability-evolution equation. However, the standard PINPF velocity model processes particles independently and therefore does not explicitly condition its transport decisions on the empirical particle population. This paper introduces population-aware PINPF (PA-PINPF), which augments each particle update with a permutation-invariant Deep Sets representation of the full particle set. We investigate two population encoders. PA-PINPF-State summarizes the particle states, whereas PA-PINPF-Feature summarizes the complete local physics-informed feature vectors, including particle position, pseudo-time, measurement information, likelihood values, and score information. The latter allows the population context to represent not only particle-cloud geometry, but also the population-level Bayesian transport geometry. The methods retain the original unsupervised physics-informed residual objective and require no ground-truth posterior samples during training. Experiments on range-measurement tasks and nonlinear time-difference-of-arrival posterior transport demonstrate that both population-aware variants improve over particle-wise PINPF, while feature-population encoding provides the strongest performance. These results show that population-level physics features provide useful global information for learned Bayesian particle transport.

2606.11025 2026-06-10 cs.LG 新提交

Flow-DPPO: Divergence Proximal Policy Optimization for Flow Matching Models

Flow-DPPO:用于流匹配模型的散度近端策略优化

Bowen Ping, Xiangxin Zhou, Penghui Qi, Minnan Luo, Liefeng Bo, Tianyu Pang

发表机构 * Xi’an Jiaotong University(西安交通大学) Tencent Hunyuan(腾讯混元) National University of Singapore(新加坡国立大学)

AI总结 针对流匹配模型中PPO比率裁剪的结构性缺陷,提出Flow-DPPO方法,利用高斯策略的精确KL散度计算实现散度近端约束,提升奖励和训练稳定性。

详情
AI中文摘要

近期工作表明,在线强化学习(RL)可以显著提高图像和视频生成中流匹配模型的质量和对齐性。Flow-GRPO和CPS等方法将去噪过程建模为马尔可夫决策过程,并应用PPO风格的比率裁剪来强制执行信任区域。然而,我们认为比率裁剪在结构上不适合流模型:新旧策略之间的概率比率是真实策略散度的有噪声单样本估计,导致在轨迹的某些区域过度约束,而在其他区域约束不足。我们提出Flow-DPPO(流散度近端策略优化),用散度近端约束替代比率裁剪。一个关键观察是,流模型中每步策略是高斯分布,使得新旧策略之间的KL散度可以精确且廉价地计算。Flow-DPPO采用非对称散度掩码,仅当梯度更新同时远离信任区域并违反散度阈值时才阻止更新。实验表明,Flow-DPPO以更好的KL近端效率获得更高奖励,减轻灾难性遗忘,促进平衡的多目标优化,并在比率裁剪退化的多轮训练中保持稳定。代码和模型见此https URL。

英文摘要

Recent work has demonstrated that online reinforcement learning (RL) can substantially improve the quality and alignment of flow matching models for image and video generation. Methods such as Flow-GRPO and CPS cast the denoising process as a Markov Decision Process and apply PPO-style ratio clipping to enforce a trust region. However, we argue that ratio clipping is structurally ill-suited for flow models: the probability ratio between new and old policies is a noisy, single-sample estimate of the true policy divergence, leading to over-constraining in some regions of the trajectory and under-constraining in others. We propose Flow-DPPO (Flow Divergence Proximal Policy Optimization), which replaces ratio clipping with a divergence proximal constraint. A key observation is that the per-step policy in flow models is Gaussian, enabling exact and cheap computation of the KL divergence between old and new policies. Flow-DPPO employs an asymmetric divergence mask that blocks gradient updates only when they simultaneously move away from the trusted region and violate the divergence threshold. Experiments show that Flow-DPPO achieves higher rewards with better KL-proximal efficiency, alleviates catastrophic forgetting, promotes balanced multi-objective optimization, and enables stable multi-epoch training where ratio clipping degrades. Code and models are available at https://github.com/Tencent-Hunyuan/UniRL/tree/main/FlowDPPO.

2606.11075 2026-06-10 cs.LG 新提交

Exploring the Design Space of Reward Backpropagation for Flow Matching

探索流匹配的奖励反向传播设计空间

Ruoyu Wang, Boye Niu, Xiangxin Zhou, Yushi Huang, Tongliang Liu, Chi Zhang

发表机构 * Westlake University(西湖大学) Tencent Hunyuan(腾讯混元) University of Sydney(悉尼大学) The Hong Kong University of Science and Technology(香港科技大学)

AI总结 针对文本到图像流匹配模型,提出统一代理轨迹框架FlowBP,通过设计反向轨迹(稀疏重建、桥耦合、拉格朗日积分)解决直接奖励反向传播中的内存和梯度爆炸问题,在多个模型和指标上优于基线。

详情
AI中文摘要

通过直接奖励反向传播将文本到图像流匹配模型与人类偏好对齐是样本高效的,但受到两个众所周知的问题的阻碍:在现代模型规模下,激活值无法在整个采样轨迹上存储,并且跨步骤的链式雅可比乘积使得奖励梯度在传播回早期索引时膨胀。基于连接器的方法,如LeapAlign,通过用短的固定路径替换完整的反向轨迹来解决这些问题,突出了采样和优化之间的有用解耦。然而,所得梯度的质量取决于该短路径对完整展开的近似精度,尤其是在长间隔上。我们提出FlowBP,一个统一的代理轨迹框架,将反向轨迹本身视为设计对象。FlowBP保留一个无梯度的缓存展开用于采样,然后从缓存和选择性重新前向的速度中构建一个轻量级的反向代理。这个视角分离了四个选择:奖励模型输入、活动集、积分权重和桥耦合,并将先前的直接梯度方法恢复为特定设置。在此框架内,我们实例化了三个变体:FlowBP-Sparse使用稀疏欧拉重建,FlowBP-Bridge添加受控桥耦合,FlowBP-Lagrange提高跳跃求积的阶数。所有三个变体都将内存限制在活动集大小,并将梯度链限制在最多一个雅可比因子。在SD3.5-M、FLUX.1-dev和FLUX.2-Klein-base上,针对偏好、质量和组合指标,这三个变体在大多数指标上优于直接梯度基线。

英文摘要

Aligning text-to-image flow matching models with human preferences via direct reward backpropagation is sample-efficient but hampered by two well-known pathologies: activations cannot be stored across the full sampling trajectory at modern model scale, and chained Jacobian products across steps inflate the reward gradient as it travels back to early indices. Connector-based methods, such as LeapAlign, address these issues by replacing the full backward trajectory with a short pinned path, highlighting a useful decoupling between sampling and optimization. However, the quality of the resulting gradient depends on how accurately this short path approximates the full rollout, especially over long intervals. We propose FlowBP, a unified surrogate-trajectory framework that treats the backward trajectory itself as the design object. FlowBP keeps a no-gradient cached rollout for sampling, then builds a lightweight backward surrogate from cached and selectively re-forwarded velocities. This view separates four choices: the reward-model input, active set, integration weights, and bridge coupling, and recovers prior direct-gradient methods as particular settings. Within this framework, we instantiate three variants: FlowBP-Sparse uses sparse Euler reconstruction, FlowBP-Bridge adds controlled bridge coupling, and FlowBP-Lagrange raises the order of leap quadrature. All three bound memory by the active-set size and limit gradient chaining to at most one Jacobian factor. Across SD3.5-M, FLUX.1-dev, and FLUX.2-Klein-base on preference, quality, and compositional metrics, the three variants improve over direct-gradient baselines on most metrics.

2606.11138 2026-06-10 cs.LG cs.NA math.NA 新提交

First-Order Trajectory Matching: Fast Ensemble Predictions of Chaotic, Turbulent, Stochastic Systems

一阶轨迹匹配:混沌、湍流、随机系统的快速集成预测

Shreya Jha, Timo Schorlepp, Nicholas Geissler, Jules Berman, Benjamin Peherstorfer

AI总结 提出一阶轨迹匹配(FTM)方法,通过学习随机系统轨迹的一阶局部概率质量输运,实现低成本的集成预测,并捕捉通量、环流等轨迹量。

详情
AI中文摘要

我们引入一阶轨迹匹配(FTM),这是一种替代建模方法,从随机系统的轨迹中学习概率质量的一阶局部输运。通过匹配轨迹的对称一阶运动,FTM学习概率流速度,其流动保持时间边缘以匹配集成平均值,同时捕获类似流的轨迹量,如通量、环流和跨势垒电流。FTM直接从轨迹学习流速度,避免了漂移、扩散和分数估计。我们的稳定性分析将离散化误差与采样方差分开,并表明当时间分辨率和样本量适当平衡时,单步无模拟的FTM损失是稳定的。在随机动力系统和PDE示例中,我们经验证明FTM以低确定性展开成本提供轨迹感知的集成预测。

英文摘要

We introduce First-Order Trajectory Matching (FTM), a surrogate-modeling method that learns the first-order local transport of probability mass from trajectories of stochastic systems. By matching the symmetric first-order motion of trajectories, FTM learns the probability current velocity, whose flow preserves time marginals to match ensemble averages, while also capturing current-like trajectory quantities such as fluxes, circulations, and barrier-crossing currents. FTM learns the current velocity directly from trajectories, avoiding drift, diffusion, and score estimation. Our stability analysis separates discretization error from sampling variance and shows that the one-step simulation-free FTM loss is stable when temporal resolution and sample size are properly balanced. Across stochastic dynamical systems and PDE examples, we empirically demonstrate that FTM provides trajectory-aware ensemble predictions at low, deterministic-rollout cost.

2606.09901 2026-06-10 cs.GR cs.CV cs.HC cs.LG cs.MM 交叉投稿

On the Controllability-Fidelity Frontier in Diffusion Editing

扩散编辑中的可控性-保真度前沿

Yi Hu, Leying Yi, Emily Davis, Finn Carter

发表机构 * Xidian University(西安电子科技大学)

AI总结 本文理论结合实证研究扩散图像编辑中用户意图遵循、非目标内容保持与输出质量间的权衡,提出算法框架并揭示关键失败模式,讨论伦理考量。

Comments Preprint

详情
AI中文摘要

基于扩散的生成模型实现了强大的图像编辑能力,但在保持保真度和安全性的同时实现精确控制仍然具有挑战性。我们对可控的基于扩散的图像编辑进行了全面的理论和实证研究,分析了用户意图遵循、非目标内容保持和输出质量之间的权衡。我们的工作涵盖了文本和掩码引导编辑、点/拖拽操作以及基于反演的流程。我们推导了编辑目标的数学公式,并分析了噪声注入、分数引导和反演误差的动力学。我们提供了重构误差、重复编辑下的稳定性以及变化局部性的理论界限。我们提出了掩码局部化和指令引导编辑的算法框架(附伪代码),并在多个任务和指标(FID、身份相似性、CLIP对齐、伪影分数等)上进行了广泛的实验,比较了最先进的方法(例如TF-ICON \cite{lu2023tficone}、DragFlow \cite{zhou2025dragflow}、InstructPix2Pix \cite{brooks2023instructpix2pix}、UltraEdit \cite{zhao2024ultraedit})。我们的结果揭示了关键失败模式,如身份漂移、提示敏感性和组合错误。我们还讨论了图像编辑中的伦理考量,包括滥用风险、偏见、同意以及概念擦除技术(例如MACE \cite{lu2024mace}、ANT \cite{li2025ant}、EraseAnything \cite{gao2024eraseanything})作为安全措施。最后,我们总结了负责任、高保真度扩散编辑的最佳实践和未来方向。

英文摘要

Diffusion-based generative models enable powerful image editing capabilities, but achieving precise control while maintaining fidelity and safety remains challenging. We present a comprehensive theoretical and empirical study of controllable diffusion-based image editing, analyzing the trade-offs between adherence to user intent, preservation of non-target content, and output quality. Our work spans text- and mask-guided edits, point/drag manipulation, and inversion-based pipelines. We derive mathematical formulations of editing objectives and analyze dynamics of noise injection, score guidance, and inversion error. We provide theoretical bounds on reconstruction error, stability under repeated edits, and locality of changes. We propose algorithmic frameworks (with pseudocode) for mask-localized and instruction-guided editing, and present extensive experiments comparing state-of-the-art methods (e.g.\ TF-ICON \cite{lu2023tficone}, DragFlow \cite{zhou2025dragflow}, InstructPix2Pix \cite{brooks2023instructpix2pix}, UltraEdit \cite{zhao2024ultraedit}) on multiple tasks and metrics (FID, identity similarity, CLIP alignment, artifact scores, etc). Our results reveal key failure modes, such as identity drift, prompt sensitivity, and compositional errors. We also discuss ethical considerations in image editing, including misuse risks, bias, consent, and concept erasure techniques (e.g.\ MACE \cite{lu2024mace}, ANT \cite{li2025ant}, EraseAnything \cite{gao2024eraseanything}) as safeguards. We conclude with best practices and future directions for responsible, high-fidelity diffusion-based editing.

2606.09941 2026-06-10 stat.AP cs.LG stat.OT 交叉投稿

Stochastic weather generators for high-frequency wind vector time series

高频风矢量时间序列的随机天气生成器

Mingshi Cui, Kevin Eng, Justin T. Greene, Zern Ke, Abolfazl Sodagartojgi, Zhiqiu Xia, Gemma E. Moran, Michael L. Stein

发表机构 * Department of Statistics, Rutgers University(统计学系,罗格斯大学)

AI总结 针对分钟级风矢量时间序列,开发基于时间矢量量化变分自编码器的机器学习模型,生成逼真序列,捕捉昼夜变化但极端风速分布匹配不足。

详情
AI中文摘要

地表风速在分钟尺度上变化显著,因此有必要研究其在此精细时间尺度上的变化。为最小化季节性影响,本文限定于六月,基于俄克拉荷马州拉蒙特站点超过30年的分钟级高质量测量数据,开发了一系列用于生成真实地表风矢量时间序列的机器学习模型。此类生成器可作为多种学科模型的输入,特别是风能领域,同时也适用于野火蔓延和航空等。数据显示风速和风向均存在复杂的昼夜结构,标准时间序列模型难以捕捉,因此我们考虑多种机器学习方法,基于时间矢量量化变分自编码器构建随机风生成器。我们考虑一次生成一天的数据,以及基于前一天风况生成一天的风矢量。我们还研究了在生成器中纳入离散天气状态变量的方法。我们使用多种正式和非正式方法评估生成器。其中最佳生成器能够捕捉观测数据中的许多(但非全部)复杂特征。特别地,我们的最佳方法准确模拟了风波动性的昼夜变化,但在匹配观测到的极端风速分布方面存在困难。

英文摘要

Surface winds can vary substantially from one minute to the next, so there is scope for studying its variation on this fine time scale. Restricting to the month of June to minimize seasonality, this work develops a range of machine learning models for generating realistic time series of surface wind vectors at a site in Lamont, Oklahoma based on more than 30 years of high quality measurements at the minute time scale. Such a generator could be used as an input into models from a range of disciplines, notably for wind energy, but also wildfire spread and aviation, among others. The data show complex diurnal structures in both wind speed and direction that would be challenging to capture with standard time series models, so we consider a number of machine learning approaches to producing a stochastic wind generator based on time vector-quantized variational autoencoders. We consider generating a day's worth of data at a time and generating a day of wind vectors conditional on the previous day's winds. We also study methods for incorporating a discrete weather state variable in the generator. We evaluate the generators using a wide range of formal and informal methods. The best of these generators can capture many but not all of the complex features present in the observational data. In particular, the best of our approaches accurately mimic diurnal changes in wind volatility but struggle to match the observed distribution of extreme wind speeds.

2606.10023 2026-06-10 astro-ph.CO astro-ph.IM cs.LG 交叉投稿

Learning the Universe: Posterior Reliability of Neural Generative Models in High-Dimensional Field-Level Inference of Cosmic Initial Conditions

学习宇宙:神经生成模型在高维场级宇宙初始条件推断中的后验可靠性

Ludvig Doeser, Jens Jasche

发表机构 * The Oskar Klein Centre, Department of Physics, Stockholm University, AlbaNova University Centre(奥斯卡·克莱因中心,物理系,斯德哥尔摩大学,阿尔瓦纳大学中心) Center for Computational Astrophysics, Flatiron Institute(计算天体物理学中心,Flatiron研究所)

AI总结 本文通过哈密顿蒙特卡洛参考后验,评估神经生成模型(随机插值和GLOW归一化流)在高维场级宇宙初始条件推断中的后验可靠性,发现匹配后验均值或边缘分布无法保证正确的不确定性结构。

Comments This is a Learning the Universe publication. 19 pages, 18 figures

详情
AI中文摘要

准确的后验估计是科学推断的核心,因为不确定性决定了从观测数据中能可靠地学到什么。虽然马尔可夫链蒙特卡洛方法提供了渐近收敛保证,但在高维设置中计算成本高昂。基于神经网络的生成模型能够对整个离散化三维场进行快速摊销推断,但通常缺乏收敛保证和原则性的精度评估。利用哈密顿蒙特卡洛获得参考后验样本,我们对隐式生成模型(随机插值)和显式基于似然的模型(GLOW归一化流)进行了受控的场级评估。这种在典型应用中无法获得的比较,使得能够检测到标准指标无法捕捉的后验几何失败。作为案例研究,我们考虑了从当今大尺度结构推断宇宙初始条件的宇宙学逆问题。为了匹配现代宇宙学数据的精度,该问题日益依赖复杂、非线性和不可微的模拟器,这些模拟器与基于梯度的推断框架不兼容。生成模型提供了一条应对这些挑战的途径,前提是它们推断的后验是可靠的。在这项工作中,我们表明,匹配后验均值、边缘分布或实现高互相关并不意味着正确的不确定性结构,这一点通过后验方差场和基于样本的评估得以揭示。通过这项工作,我们旨在提高对高维场级设置中不确定性估计挑战的认识,强调在科学应用中仔细设计和验证神经生成方法的重要性。

英文摘要

Accurate posterior estimation is central to scientific inference, as uncertainties determine what can be reliably learned from observational data. While Markov chain Monte Carlo methods provide asymptotic convergence guarantees, they are computationally demanding in high-dimensional settings. Neural network-based generative models for entire discretized 3D fields enable fast amortized inference but often lack convergence guarantees and principled accuracy assessment. Using Hamiltonian Monte Carlo to obtain reference posterior samples, we conduct a controlled field-level evaluation of an implicit generative model (Stochastic Interpolants) and an explicit likelihood-based model (GLOW normalizing flows). This comparison, unavailable in typical applications, enables the detection of posterior geometry failures that standard metrics cannot capture. As a case study, we consider the cosmological inverse problem of inferring cosmic initial conditions from present-day large-scale structure. To match the precision of modern cosmological data, this problem increasingly relies on complex, non-linear, and non-differentiable simulators, which are incompatible with gradient-based inference frameworks. Generative models offer a route to address these challenges, provided their inferred posteriors are reliable. In this work, we show that matching posterior means, marginal distributions, or achieving high cross-correlation does not imply correct uncertainty structure, as revealed by posterior variance fields and sample-based evaluations. Through this work, we aim to raise awareness of the challenges of uncertainty estimation in high-dimensional field-level settings, highlighting the importance of careful design and validation of neural generative approaches for scientific applications.

2606.10450 2026-06-10 cs.CV cs.LG 交叉投稿

Few-step Generative Models as Lossy Compression

少步生成模型作为有损压缩

Fuma Kimishima, Jinjia Zhou

发表机构 * University of Tokyo(东京大学)

AI总结 研究将少步生成模型(Rectified Flow、CTM、MeanFlow)用于反向信道编码框架进行有损压缩,通过参数化等效和局部高斯近似实现无需重训练的编解码,在低分辨率基准上减少编解码时间并提升低比特率下的真实性。

详情
AI中文摘要

DiffC 提供了一种重用预训练扩散模型进行有损压缩的原则性方法,但其编码和解码过程仍然缓慢,因为它们需要许多离散化的前向和反向步骤。我们研究少步生成模型——Rectified Flow、一致性轨迹模型(CTM)和 MeanFlow——是否可以在相同的反向信道编码(RCC)框架中作为编解码器使用。主要挑战在于 RCC 需要后验和共享分布参数,而这些模型并未显式参数化中间条件分布。对于 Rectified Flow 和 MeanFlow,我们利用速度参数化与扩散式去噪参数化之间的等价性来推导 RCC 所需的量。对于从 EDM 蒸馏得到的 CTM,我们采用 EDM 噪声参数化以及中间状态下发送方和共享分布的局部高斯近似。这产生了一个概念验证的概率公式,使得无需重新训练即可使用预训练的少步生成模型进行压缩。在低分辨率基准上,由此产生的编解码器减少了编码和解码时间,并在低比特率范围内提高了真实性。

英文摘要

DiffC provides a principled way to reuse pre-trained diffusion models for lossy compression, but its encoding and decoding procedures remain slow because they require many discretized forward and reverse steps. We study whether few-step generative models -- Rectified Flow, Consistency Trajectory Models (CTM), and MeanFlow -- can be cast as codecs within the same reverse channel coding (RCC) framework. The main challenge is that RCC requires posterior and shared distribution parameters, whereas these models do not explicitly parameterize intermediate conditional distributions. For Rectified Flow and MeanFlow, we use the equivalence between velocity parameterization and diffusion-style denoising parameterization to derive the quantities required by RCC. For CTM, which is distilled from EDM, we adopt the EDM noise parameterization together with local Gaussian approximations of the sender and shared distributions at intermediate states. This yields a proof-of-concept probabilistic formulation that enables compression with pre-trained few-step generative models without retraining. On low-resolution benchmarks, the resulting codecs reduce encoding and decoding time and improve realism in the low-bit-rate regime.

2606.11156 2026-06-10 stat.ML cs.LG 交叉投稿

Itô maps for any-step SDEs

任意步SDE的Itô映射

Zhengkai Pan, Peter Potaptchik, Wenxi Yao, Michael S. Albergo, Jakiw Pidstrigach

发表机构 * Harvard University(哈佛大学) University of Oxford(牛津大学) Kempner Institute(凯门研究所)

AI总结 提出Itô映射,一种任意步随机流映射,通过单次前向传播预测未来状态,实现随机动力学的精确蒸馏,并支持推理时控制和后验采样。

详情
AI中文摘要

最近的单步生成模型通过学习底层动力学的确定性流映射来加速采样。这些方法依赖于从常微分方程学习,但如何为随机动力学定义精确的蒸馏过程仍是开放问题。我们引入Itô映射,一种任意步随机流映射,它接收中间状态和布朗路径,并在单次前向传播中预测未来状态。Itô映射公式通过提供廉价、可微的后验样本访问,为推理时控制提供了新的估计器。实验上,Itô映射从固定的中间状态生成多样、条件有效的端点样本,并在合成和图像生成基准上支持强引导性能。这些结果确立了任意步SDE积分作为后验采样和随机控制的有用原语。

英文摘要

Recent one-step generative models accelerate sampling by learning deterministic flow maps of the underlying dynamics. These methods rely on learning from ordinary differential equations, leaving open how to define an exact distillation procedure for stochastic dynamics. We introduce the Itô map, an any-step stochastic flow map that takes an intermediate state and Brownian path and predicts future states in a single pass. The Itô map formulation yields novel estimators for inference-time control by providing cheap, differentiable access to posterior samples. Empirically, Itô maps produce diverse, conditionally valid endpoint samples from fixed intermediate states and support strong steering performance on synthetic and image-generation benchmarks. These results establish any-step SDE integration as a useful primitive for posterior sampling and stochastic control.

2310.05264 2026-06-10 cs.LG cs.CV 版本更新

The Emergence of Reproducibility and Generalizability in Diffusion Models

扩散模型中可重复性与泛化性的出现

Huijie Zhang, Jinfan Zhou, Yifu Lu, Minzhe Guo, Peng Wang, Liyue Shen, Qing Qu

发表机构 * CIFAR-10 dataset(CIFAR-10数据集)

AI总结 研究发现扩散模型在相同初始噪声和确定性采样器下,不同模型输出高度相似,且这种可重复性在记忆和泛化两种训练模式下均存在,对训练效率、模型隐私等有重要启示。

Comments NeurIPS Diffusion Model Workshop 2023 (best paper award), the Forty-first International Conference on Machine Learning (ICML 2024)

详情
AI中文摘要

在这项工作中,我们研究了扩散模型的一个有趣且普遍的现象,我们称之为“一致模型可重复性”:给定相同的起始噪声输入和确定性采样器,不同的扩散模型通常会产生非常相似的输出。我们通过全面的实验证实了这一现象,这意味着不同的扩散模型一致地达到相同的数据分布和评分函数,无论扩散模型框架、模型架构或训练过程如何。更引人注目的是,我们的进一步研究表明,扩散模型学习到的不同分布受到训练数据大小的影响。这一点得到了以下事实的支持:模型可重复性表现在两种不同的训练机制中:(i)“记忆机制”,其中扩散模型过拟合到训练数据分布,以及(ii)“泛化机制”,其中模型学习底层数据分布。我们的研究还发现,这一有价值的特性推广到许多扩散模型的变体,包括用于条件使用、解决逆问题和模型微调的变体。最后,我们的工作提出了许多有趣的理论问题供未来研究,并强调了关于训练效率、模型隐私和扩散模型受控生成的实际意义。

英文摘要

In this work, we investigate an intriguing and prevalent phenomenon of diffusion models which we term as "consistent model reproducibility": given the same starting noise input and a deterministic sampler, different diffusion models often yield remarkably similar outputs. We confirm this phenomenon through comprehensive experiments, implying that different diffusion models consistently reach the same data distribution and scoring function regardless of diffusion model frameworks, model architectures, or training procedures. More strikingly, our further investigation implies that diffusion models are learning distinct distributions affected by the training data size. This is supported by the fact that the model reproducibility manifests in two distinct training regimes: (i) "memorization regime", where the diffusion model overfits to the training data distribution, and (ii) "generalization regime", where the model learns the underlying data distribution. Our study also finds that this valuable property generalizes to many variants of diffusion models, including those for conditional use, solving inverse problems, and model fine-tuning. Finally, our work raises numerous intriguing theoretical questions for future investigation and highlights practical implications regarding training efficiency, model privacy, and the controlled generation of diffusion models.

2409.02426 2026-06-10 cs.LG cs.CV 版本更新

Breaking the Curse of Dimensionality: Diffusion Models Efficiently Learn Low-Dimensional Distributions

打破维度诅咒:扩散模型高效学习低维分布

Peng Wang, Huijie Zhang, Zekai Zhang, Siyi Chen, Yi Ma, Qing Qu

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出新数学框架,证明扩散模型通过等价于子空间聚类,能以线性于内在维度的样本复杂度学习低维分布,避免维度诅咒。

Comments 37 pages, 8 figures, 2 tables

详情
AI中文摘要

尽管扩散模型在广泛的生成任务中取得了经验上的成功,但其学习数据分布能力的基本原理仍不清楚。在这项工作中,我们开发了一个新的数学框架,解释了扩散模型如何能够从有限数量的训练样本中有效学习低维分布,而不受维度诅咒的影响。具体来说,受图像数据内在低维结构的启发,我们在理论上分析了一个数据分布被建模为低秩高斯混合的场景。在合适的网络参数化下,我们表明优化扩散模型的训练目标等价于在训练样本上解决经典子空间聚类问题,其中每个子空间基对应于一个高斯分量的低秩协方差。这种等价性使我们能够证明,学习底层分布的样本复杂度与数据的内在维度呈线性关系,而不是与环境维度呈指数关系。我们的理论发现得到了经验证据的进一步支持,这些证据展示了在合成和真实世界图像数据集上的泛化相变现象。此外,我们建立了学习到的子空间基与图像数据语义属性之间的对应关系,为可控图像生成提供了原则性基础。

英文摘要

Despite their empirical success across a wide range of generative tasks, the fundamental principles underlying the ability of diffusion models to learn data distributions are poorly understood. In this work, we develop a new mathematical framework that explains how diffusion models can effectively learn low-dimensional distributions from a finite number of training samples without suffering from the curse of dimensionality. Specifically, motivated by the intrinsic low-dimensional structure of image data, we theoretically analyze a setting in which the data distribution is modeled as a mixture of low-rank Gaussians. Under suitable network parameterization, we show that optimizing the training objective of diffusion models is equivalent to solving the canonical subspace clustering problem over the training samples, where each subspace basis corresponds to the low-rank covariance of a Gaussian component. This equivalence allows us to show that the sample complexity for learning the underlying distribution scales linearly with the intrinsic dimension of the data, rather than exponentially with the ambient dimension. Our theoretical findings are further supported by empirical evidence that demonstrates phase transition phenomena in generalization on both synthetic and real-world image datasets. Moreover, we establish a correspondence between the learned subspace bases and semantic attributes of image data, providing a principled foundation for controllable image generation.

2509.21925 2026-06-10 cs.LG cs.AI 版本更新

Generation Properties of Stochastic Interpolation under Finite Training Set

有限训练集下随机插值的生成性质

Yunchen Li, Shaohui Lin, Zhou Yu

AI总结 研究有限训练集下随机插值生成模型的理论性质,推导最优速度场和得分函数的闭式解,揭示确定性和随机生成过程的行为,并定义欠拟合与过拟合。

Comments We found proof errors affecting key theorems and wish to avoid misleading readers. We have submitted a substantially revised new paper, arXiv:2606.08554, retaining only two old theorems and adding five new ones

详情
AI中文摘要

本文研究了有限训练总体下生成模型的理论行为。在随机插值生成框架内,我们推导了当仅有有限数量的训练样本可用时最优速度场和得分函数的闭式表达式。我们证明,在某些正则性条件下,确定性生成过程精确恢复训练样本,而随机生成过程表现为带有加性高斯噪声的训练样本。在理想化设置之外,我们考虑模型估计误差,并引入生成模型特有的欠拟合和过拟合的正式定义。我们的理论分析揭示,在存在估计误差的情况下,随机生成过程有效地产生训练样本的凸组合,这些组合被均匀噪声和高斯噪声的混合所破坏。在生成任务和分类等下游任务上的实验支持了我们的理论。

英文摘要

This paper investigates the theoretical behavior of generative models under finite training populations. Within the stochastic interpolation generative framework, we derive closed-form expressions for the optimal velocity field and score function when only a finite number of training samples are available. We demonstrate that, under some regularity conditions, the deterministic generative process exactly recovers the training samples, while the stochastic generative process manifests as training samples with added Gaussian noise. Beyond the idealized setting, we consider model estimation errors and introduce formal definitions of underfitting and overfitting specific to generative models. Our theoretical analysis reveals that, in the presence of estimation errors, the stochastic generation process effectively produces convex combinations of training samples corrupted by a mixture of uniform and Gaussian noise. Experiments on generation tasks and downstream tasks such as classification support our theory.

2601.08379 2026-06-10 cs.LG cs.AI cs.CV 版本更新

MMD Guidance: Training-Free Distribution Adaptation for Diffusion Models via Maximum Mean Discrepancy Guidance

MMD Guidance: 基于最大均值差异引导的无训练分布适应扩散模型

Matina Mahdizadeh Sani, Nima Jamali, Mohammad Jalali, Farzan Farnia

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出MMD Guidance,一种无训练方法,通过最大均值差异梯度引导扩散模型采样,实现与参考数据分布对齐,无需重新训练。

详情
AI中文摘要

预训练扩散模型已成为无条件及条件样本生成的有力先验,但其输出常偏离用户特定目标数据的特征。这种不匹配在领域适应任务中尤为突出,此时仅有少量参考样本可用且重新训练扩散模型不可行。现有推理时引导方法可调整采样轨迹,但通常优化替代目标(如分类器似然)而非直接对齐目标分布。我们提出MMD Guidance,一种无训练机制,通过生成样本与参考数据集之间的最大均值差异(MMD)梯度增强反向扩散过程。MMD能从有限数据中提供可靠分布估计,实践中方差低,且可高效微分,特别适合引导任务。我们的框架通过乘积核自然扩展到条件生成模型中的提示感知适应。此外,由于引导在潜在扩散模型(LDM)的潜在空间中进行,因此可高效计算。在合成及真实世界基准上的实验表明,MMD Guidance能在保持样本保真度的同时实现分布对齐。项目代码见该网址。

英文摘要

Pre-trained diffusion models have emerged as powerful generative priors for both unconditional and conditional sample generation, yet their outputs often deviate from the characteristics of user-specific target data. Such mismatches are especially problematic in domain adaptation tasks, where only a few reference examples are available and retraining the diffusion model is infeasible. Existing inference-time guidance methods can adjust sampling trajectories, but they typically optimize surrogate objectives such as classifier likelihoods rather than directly aligning with the target distribution. We propose \emph{MMD Guidance}, a training-free mechanism that augments the reverse diffusion process with gradients of the \textit{Maximum Mean Discrepancy (MMD)} between generated samples and a reference dataset. MMD provides reliable distributional estimates from limited data, exhibits low variance in practice, and is efficiently differentiable, which makes it particularly well-suited for the guidance task. Our framework naturally extends to prompt-aware adaptation in conditional generation models via product kernels. Also, it can be applied with computational efficiency in latent diffusion models (LDMs), since guidance is applied in the latent space of the LDM. Experiments on synthetic and real-world benchmarks demonstrate that MMD Guidance can achieve distributional alignment while preserving sample fidelity. The project code is available at github.com/matinamehdizadeh/MMD-Guidance.

2602.04119 2026-06-10 cs.LG q-bio.QM 版本更新

Synthesizable Molecular Generation via Soft-constrained GFlowNets with Rich Chemical Priors

通过具有丰富化学先验的软约束GFlowNets生成可合成分子

Hyeonah Kim, Minsu Kim, Celine Roget, Dionessa Biton, Louis Vaillancourt, Yves V. Brun, Yoshua Bengio, Alex Hernandez-Garcia

发表机构 * University of Toronto(多伦多大学) DeepMind(深度思维) University of Montreal(蒙特利尔大学)

AI总结 提出S3-GFN方法,通过软正则化序列GFlowNet,利用大规模SMILES语料库的化学先验,生成高奖励且可合成的分子,实验表明可合成率≥95%。

详情
AI中文摘要

生成模型在实验药物发现活动中的应用受到严重限制,因为从头设计实际可合成的分子非常困难。先前的工作利用生成流网络(GFlowNets)通过基于预定义反应模板和构建块的状态和动作空间设计来施加硬合成性约束。尽管这种方法前景广阔,但目前缺乏灵活性和可扩展性。作为替代方案,我们提出了S3-GFN,它通过对基于序列的GFlowNet进行简单的软正则化来生成可合成的SMILES分子。我们的方法利用从大规模SMILES语料库中学习到的丰富分子先验,将分子生成引导向高奖励、可合成的化学空间。该模型通过离策略重放训练和基于可合成与不可合成样本分离缓冲区的对比学习信号来施加约束。我们的实验表明,S3-GFN能够学习生成可合成分子(≥95%),并在多种任务中获得更高奖励。

英文摘要

The application of generative models for experimental drug discovery campaigns is severely limited by the difficulty of designing molecules de novo that can be synthesized in practice. Previous works have leveraged Generative Flow Networks (GFlowNets) to impose hard synthesizability constraints through the design of state and action spaces based on predefined reaction templates and building blocks. Despite the promising prospects of this approach, it currently lacks flexibility and scalability. As an alternative, we propose S3-GFN, which generates synthesizable SMILES molecules via simple soft regularization of a sequence-based GFlowNet. Our approach leverages rich molecular priors learned from large-scale SMILES corpora to steer molecular generation towards high-reward, synthesizable chemical spaces. The model induces constraints through off-policy replay training with a contrastive learning signal based on separate buffers of synthesizable and unsynthesizable samples. Our experiments show that S3-GFN learns to generate synthesizable molecules ($\geq 95\%$) with higher rewards in diverse tasks.

2602.09639 2026-06-10 cs.LG stat.ML 版本更新

Blind denoising diffusion models and the blessings of dimensionality

盲去噪扩散模型与维度的祝福

Zahra Kadkhodaie, Aram-Alexandre Pooladian, Sinho Chewi, Eero Simoncelli

发表机构 * Flatiron Institute, Simons Foundation(Flatiron研究院,Simons基金会) Foundations of Data Science, Yale University(数据科学基础,耶鲁大学) Department of Statistics and Data Science, Yale University(统计与数据科学系,耶鲁大学) Ctr. for Neural Science & Courant Institute, New York University(神经科学中心及Courant学院,纽约大学)

AI总结 提出盲去噪扩散模型(BDDM),通过不向神经网络传递噪声幅度来简化设计,并在数据内在维度低于环境维度的假设下证明其正确性,实验显示自适应方案的优势。

Comments 39 pages, 13 figures; Accepted to ICML 2025 FoGen workshop

详情
AI中文摘要

去噪扩散模型(DDM)是跨多个领域从数据中学习密度的最先进方法,然而训练和采样流程的许多方面仍知之甚少。特别是,噪声调节要求从业者将人为设计的无原则噪声嵌入纳入神经网络架构,并使用临时噪声调度进行采样。为了解决这些缺点,我们提供了\emph{盲去噪扩散模型}(BDDM)的完整理论:这是DDM的一种变体,其中噪声幅度在训练或采样期间不传入神经网络,从而消除了上述设计选择的需要。我们在数据分布相对于环境维度具有低内在维度的假设下证明了BDDM作为采样算法的正确性。这一假设源于从单个噪声样本估计噪声水平的贝叶斯问题的引入,该问题可能具有独立的意义。我们通过实验将BDDM的性能与标准DDM进行比较,展示了我们分析严格证明的\emph{自适应}方案的优势。

英文摘要

Denoising diffusion models (DDMs) are state-of-the-art methods for learning densities from data across numerous domains, yet many aspects of the training and sampling pipeline remain poorly understood. In particular, noise conditioning requires practitioners to incorporate contrived unprincipled noise embeddings into neural network architectures and to use ad hoc noise schedules for sampling. To address these drawbacks, we provide a complete theory for \emph{blind denoising diffusion models} (BDDMs): a variant of DDMs where the noise amplitude is not passed into the neural network during training or sampling, obviating the need for the aforementioned design choices. We justify the correctness of BDDMs as a sampling algorithm under an assumption of low intrinsic dimensionality of the underlying data distribution relative to the ambient dimension. This assumption arises through the introduction of the Bayesian problem of estimating noise levels from a single noisy sample, which might be of independent interest. We empirically compare the performance of BDDMs to standard DDMs, showcasing the benefits of an \emph{adaptive} scheme which is rigorously justified by our analysis.

2606.07605 2026-06-10 cs.LG cs.AI 版本更新

SRT: Super-Resolution for Time Series via Disentangled Rectified Flow

SRT: 基于解缠校正流的时间序列超分辨率

Jufang Duan, Shenglong Xiao, Yuren Zhang

发表机构 * Bytedance(字节跳动)

AI总结 提出SRT框架,通过解缠校正流将低分辨率时间序列重建为高分辨率,分解趋势与季节成分,利用隐式神经表示对齐分辨率,并引入跨分辨率注意力机制生成细节。

Comments Accepted to the International Conference on Learning Representations (ICLR) 2026

详情
Journal ref
The Fourteenth International Conference on Learning Representations (ICLR 2026)
AI中文摘要

具有高时间分辨率的细粒度时间序列数据对于广泛应用的精确分析至关重要。然而,获取此类数据通常受到成本和可行性的限制。可以通过基于特定先验从低分辨率输入重建高分辨率信号来解决此问题,这被称为超分辨率。虽然在计算机视觉中得到了广泛研究,但直接将图像超分辨率技术迁移到时间序列并非易事。为了从根本上解决这一挑战,我们提出了时间序列超分辨率(SRT),这是一种通过解缠校正流重建低分辨率输入中丢失的时间模式的新框架。SRT将输入分解为趋势和季节成分,使用隐式神经表示将它们对齐到目标分辨率,并利用一种新颖的跨分辨率注意力机制来指导高分辨率细节的生成。我们进一步引入了SRT-large,这是一个经过大规模预训练的扩展版本,具有强大的零样本超分辨率能力。在九个公共数据集上的大量实验表明,SRT和SRT-large在多个尺度因子下始终优于现有方法,展示了稳健的性能以及我们架构中每个组件的有效性。

英文摘要

Fine-grained time series data with high temporal resolution is critical for accurate analytics across a wide range of applications. However, the acquisition of such data is often limited by cost and feasibility. This problem can be tackled by reconstructing high-resolution signals from low-resolution inputs based on specific priors, known as super-resolution. While extensively studied in computer vision, directly transferring image super-resolution techniques to time series is not trivial. To address this challenge at a fundamental level, we propose Super-Resolution for Time series (SRT), a novel framework that reconstructs temporal patterns lost in low-resolution inputs via disentangled rectified flow. SRT decomposes the input into trend and seasonal components, aligns them to the target resolution using an implicit neural representation, and leverages a novel cross-resolution attention mechanism to guide the generation of high-resolution details. We further introduce SRT-large, a scaled-up version with extensive pre-training, which enables strong zero-shot super-resolution capability. Extensive experiments on nine public datasets demonstrate that SRT and SRT-large consistently outperform existing methods across multiple scale factors, showing both robust performance and the effectiveness of each component in our architecture.

2412.11449 2026-06-10 cs.SD cs.AI cs.CL cs.LG eess.AS 版本更新

Whisper-GPT -- Continuous Discrete Hybrid Representation Language Models For Speech And Music

Whisper-GPT -- 语音和音乐的连续离散混合表示语言模型

Prateek Verma

发表机构 * Stanford University(斯坦福大学)

AI总结 提出Whisper-GPT,一种结合连续音频表示(如频谱图)和离散音频令牌的生成式大语言模型,解决了离散令牌方法上下文长度过长的问题,在语音和音乐的下一个令牌预测中降低了困惑度和负对数似然。

Comments 6 pages, 3 figures. 50th International Conference on Acoustics, Speech and Signal Processing, Hyderabad, India

详情
AI中文摘要

我们提出了WHISPER-GPT:一种用于语音和音乐的生成式大语言模型(LLM),它允许我们在单个架构中同时处理连续音频表示和离散令牌。近年来,利用神经压缩算法(例如ENCODEC)导出的离散音频令牌的生成式音频、语音和音乐模型激增。然而,这种方法的主要缺点之一是处理上下文长度。如果必须考虑不同频率下的所有音频内容来进行下一个令牌预测,那么对于高保真生成架构来说,上下文长度会急剧增长。通过结合连续音频表示(如频谱图)和离散声学令牌,我们保留了两者的优点:在单个令牌中拥有来自音频特定时间实例的所有必要信息,同时允许LLM预测未来令牌,从而获得采样和离散空间提供的其他好处。我们展示了与基于令牌的语音和音乐LLM相比,我们的架构如何提高下一个令牌预测的困惑度和负对数似然分数。

英文摘要

We propose WHISPER-GPT: A generative large language model (LLM) for speech and music that allows us to work with continuous audio representations and discrete tokens simultaneously as part of a single architecture. There has been a huge surge in generative audio, speech, and music models that utilize discrete audio tokens derived from neural compression algorithms, e.g. ENCODEC. However, one of the major drawbacks of this approach is handling the context length. It blows up for high-fidelity generative architecture if one has to account for all the audio contents at various frequencies for the next token prediction. By combining continuous audio representation like the spectrogram and discrete acoustic tokens, we retain the best of both worlds: Have all the information needed from the audio at a specific time instance in a single token, yet allow LLM to predict the future token to allow for sampling and other benefits discrete space provides. We show how our architecture improves the perplexity and negative log-likelihood scores for the next token prediction compared to a token-based LLM for speech and music.

2506.03672 2026-06-10 stat.ML cs.LG math.OC 版本更新

Latent Guided Sampling for Combinatorial Optimization

面向组合优化的潜在引导采样

Sobihan Surendran, Adeline Fermanian, Sylvain Le Corff

发表机构 * Sorbonne Université and Université Paris Cité, CNRS, Laboratoire de Probabilités, Statistique et Modélisation, F-75005 Paris, France(索邦大学和巴黎Cité大学,法国国家科学研究中心,概率、统计与建模实验室,法国巴黎F-75005) LOPF, Califrais' Machine Learning Lab, Paris, France(LOPF,Califrais机器学习实验室,法国巴黎)

AI总结 提出LGS-Net潜在空间模型,结合马尔可夫链蒙特卡洛与随机逼近的潜在引导采样方法,在路由任务上达到最先进性能。

详情
Journal ref
International Conference on Machine Learning, Jul 2026, Seoul, South Korea
AI中文摘要

组合优化问题在物流、制造和药物发现等领域广泛存在,但其NP-hard性质使其计算上具有挑战性。最近的神经组合优化(NCO)方法利用深度学习来学习构建解的策略,通过监督学习或强化学习进行训练。尽管有前景,但这些方法通常依赖于任务特定的增强,在分布外实例上表现不佳,并且缺乏鲁棒的推理机制。此外,现有的潜在空间模型要么需要标记数据,要么使用与实例无关的潜在分布。在这项工作中,我们提出了LGS-Net,一种新颖的以问题实例为条件的潜在空间模型,并引入了一种高效的推理方法——潜在引导采样(LGS),基于马尔可夫链蒙特卡洛和随机逼近。我们证明了我们方法的迭代形成一个时间非齐次马尔可夫链,并提供了严格的理论收敛保证。在基准路由任务上的实证结果表明,我们的方法在NCO基线中达到了最先进的性能。

英文摘要

Combinatorial Optimization problems are widespread in domains such as logistics, manufacturing, and drug discovery, yet their NP-hard nature makes them computationally challenging. Recent Neural Combinatorial Optimization (NCO) methods leverage deep learning to learn policies for constructing solutions, trained via Supervised or Reinforcement Learning. While promising, these approaches often rely on task-specific augmentations, perform poorly on out-of-distribution instances, and lack robust inference mechanisms. Moreover, existing latent space models either require labeled data or use an instance-independent latent distribution. In this work, we propose LGS-Net, a novel latent space model that conditions on problem instances, and introduce an efficient inference method, Latent Guided Sampling (LGS), based on Markov Chain Monte Carlo and Stochastic Approximation. We show that the iterations of our method form a time-inhomogeneous Markov Chain and provide rigorous theoretical convergence guarantees. Empirical results on benchmark routing tasks show that our method achieves state-of-the-art performance among NCO baselines.

2508.07048 2026-06-10 cs.SD cs.AI cs.LG eess.AS 版本更新

Whisfusion: Parallel ASR Decoding with Masked Diffusion

Whisfusion: 基于掩码扩散的并行ASR解码

Taeyoun Kwon, Junhyuk Ahn, Taegeun Yun, Heeju Jwa, Yoonchae Choi, Siwon Park, Jongchan Kim, Hyungon Ryu, Hyuk-Jae Lee, Nam-Joon Kim

发表机构 * Seoul National University(首尔国立大学) Soongsil University(顺天大学) NVIDIA Corporation(英伟达公司)

AI总结 提出Whisfusion,在冻结的Whisper音频嵌入上训练专用掩码扩散解码器,通过并行扩散解码实现非自回归ASR,在多种语言基准上超越Whisper-large-v3,速度提升4-5倍。

Comments 16 pages, 3 figures

详情
AI中文摘要

自回归(AR)编码器-解码器模型主导着高质量的多语言ASR,但其从左到右的解码器使得推理延迟随转录长度增加。一种自然的替代方案,CTC风格的非自回归(NAR)系统避免了这一瓶颈,但其条件独立性假设牺牲了转录级别的生成建模。掩码扩散语言模型(例如LLaDA、MDLM)提供了一种有竞争力的NAR文本生成方法。我们探究这类模型是否能在消除从左到右瓶颈的同时,将NAR ASR带入强AR ASR系统的准确率范围。我们提出Whisfusion,它在冻结的Whisper-large-v3音频嵌入之上从头训练一个专用的掩码扩散解码器,仅需几步即可去噪掩码转录。我们在约68k小时的11种语言语音上训练,采用高掩码专门化以将训练与推理的完全掩码起始点对齐,并通过并行扩散解码进行解码。Whisfusion在英语、欧洲和CJK基准测试的组平均准确率上超越Whisper-large-v3,同时运行速度快4-5倍,在准确率和吞吐量上均超越Whisper-turbo。它达到与Canary和Qwen3-ASR竞争的准确率,同时运行速度快3-7倍。这些结果确立了掩码扩散作为高吞吐量多语言转录的帕累托竞争性非自回归范式。代码和模型权重可在https://this URL获取。

英文摘要

Autoregressive (AR) encoder-decoder models dominate high-quality multilingual ASR, but their left-to-right decoders make inference latency scale with transcript length. A natural alternative, CTC-style non-autoregressive (NAR) systems avoid this bottleneck but their conditional independence assumption sacrifices transcript-level generative modeling. Masked diffusion language models (e.g., LLaDA, MDLM) offer a competitive NAR text-generation approach. We ask whether such models can bring NAR ASR into the accuracy regime of strong AR ASR systems while removing the left-to-right bottleneck. We propose Whisfusion, which trains a dedicated masked diffusion decoder from scratch on top of frozen Whisper-large-v3 audio embeddings, denoising masked transcripts in just a few steps. We train on ~68k hours of 11-language speech with high-mask specialization to align training with the fully masked starting point of inference, and decode via Parallel Diffusion Decoding. Whisfusion surpasses Whisper-large-v3 on group-average accuracy across English, European, and CJK benchmarks, while running 4-5x faster, additionally surpassing Whisper-turbo in both accuracy and throughput. It reaches accuracy competitive with Canary and Qwen3-ASR while running 3-7x faster. These results establish masked diffusion as a Pareto-competitive non-autoregressive paradigm for high-throughput multilingual transcription. Code and model weights are available at https://github.com/taeyoun811/Whisfusion.

5. 优化、泛化与理论分析 34 篇

2606.09880 2026-06-10 cs.LG 新提交

Hyperparameter Learning for Latent Factorization of Tensors for Representation Learning to Large-scale Dynamic Weighted Directed Network

面向大规模动态加权有向网络表示学习的张量潜在因子超参数学习

Yaqian Zhan, Jialan He, Tianzhu Chen

发表机构 * College of Computer and Information Science(计算机与信息科学学院)

AI总结 针对大规模动态加权有向网络,提出基于差分进化的张量潜在因子模型超参数自动优化框架,自动学习正则化参数,提升预测精度并减少人工调参。

详情
AI中文摘要

大规模动态加权有向网络(DWDNs)被广泛用于建模节点间的时变交互。张量潜在因子(LFT)通过低秩嵌入从DWDNs中提取目标知识。然而,与许多机器学习模型类似,LFT的性能在很大程度上依赖于超参数的选择。在实践中,这些参数通常通过手动或网格搜索进行调整,这需要大量的计算资源和人力。受此挑战的启发,本文提出了一种基于差分进化(DE)的LFT自动超参数优化框架(DE-LFT)。该方法将DE集成到LFT模型的训练过程中,以自动学习最优正则化参数$\lambda_1$、$\lambda_2$和$\lambda_3$。因此,模型能够自适应地搜索超参数空间并提高预测精度。在四个真实世界数据集上的实验结果表明,与手动调优的基线相比,所提方法实现了更低的MAE和RMSE,同时减少了对大量参数调优的需求。

英文摘要

Large-scale dynamic weighted directed networks (DWDNs) are widely used to model time-varying interactions among nodes. Latent factorization of tensors (LFT) extracts target knowledge from DWDNs via low-rank embedding. However, similar to many machine learning models, the performance of LFT heavily depends on the selection of hyperparameters. In practice, these parameters are often tuned manually or through grid search, which requires significant computational resources and human effort. Motivated by this challenge, this paper proposes an automated hyperparameter optimization framework based on Differential Evolution (DE) for LFT (DE-LFT). The proposed method integrates DE into the training process of the LFT model to automatically learn optimal regularization parameters $λ_1$, $λ_2$ and $λ_3$. As a result, the model can adaptively search the hyperparameter space and improve prediction accuracy. Experimental results on four real-world datasets demonstrate that the proposed approach achieves lower MAE and RMSE compared with manually tuned baselines while reducing the need for extensive parameter tuning.

2606.09889 2026-06-10 cs.LG 新提交

Optuna Constrained Tree-Structured Parzen Estimator Is a Joint Density Generalization of c-TPE

Optuna约束树结构Parzen估计器是c-TPE的联合密度推广

Shuhei Watanabe, Kaichi Irie

发表机构 * Independent Researcher(独立研究者) Kyoto University(京都大学)

AI总结 本文证明Optuna的约束TPE是联合c-TPE,使用联合似然的ECI采集函数,并展示其对约束重复的鲁棒性优于独立c-TPE。

详情
AI中文摘要

约束超参数优化(HPO)在实践中很常见,但Optuna广泛使用的约束TPE缺乏算法分析。虽然c-TPE提出了一种假设目标与约束独立的期望约束改进(ECI)方法,但Optuna使用两者的联合密度。我们证明Optuna的约束TPE是联合c-TPE——使用联合似然的相同ECI采集函数。我们展示联合c-TPE对约束重复具有不变性,而独立c-TPE随着乘积累积重复因子而性能下降。我们概述了两种公式之间的实际权衡以及未来研究的方向。

英文摘要

Constrained hyperparameter optimization (HPO) is common in practice, yet Optuna's widely used constrained TPE lacks algorithmic analysis. While c-TPE proposes an expected constrained improvement (ECI) approach assuming independence between the objective and constraints, Optuna uses a single joint density over both. We show that Optuna's constrained TPE is joint c-TPE -- the same ECI acquisition function using a joint likelihood. We demonstrate joint c-TPE is invariant to constraint duplication whereas independent c-TPE degrades as the product accumulates duplicated factors. We outline practical tradeoffs between the formulations and directions for future study.

2606.09899 2026-06-10 cs.LG cs.AI 新提交

When Attribution Patching Lies: Diagnosis and a Second-Order Correction

当归因补丁说谎时:诊断与二阶修正

Luyang Zhang, Jialu Wang

发表机构 * Carnegie Mellon University(卡内基梅隆大学) University of California, Santa Cruz(加州大学圣克鲁兹分校)

AI总结 研究归因补丁(梯度一阶近似)在机制可解释性中的不可靠性,发现主要误差源于下游网络的非线性,并提出可靠性评分、误差界和HVP二阶修正方法。

Comments 30 pages, 12 figures

详情
AI中文摘要

机制可解释性的一个核心目标是识别哪些内部组件因果地驱动语言模型的行为。由于这些重要性估计作为识别电路的证据,系统性错误可能导致对底层机制的误识别。虽然激活补丁提供了黄金标准的因果度量,但其计算成本在大规模下难以承受。从业者转而依赖归因补丁,一种基于梯度的一阶近似,其可靠性尚不明确。在这项工作中,我们刻画了这种不可靠性的来源,证明主要误差源于下游网络的非线性,而非补丁组件的局部曲率。这一洞察产生了三个实用工具:(i) 检测不可信估计的可靠性评分,(ii) 量化潜在归因误规范的误差界,以及 (iii) 仅需一次额外反向传播即可消除主导误差的Hessian-向量-乘积(HVP)修正。在五个模型家族(124M-9B参数)以及随机令牌和自然(名称交换)扰动的评估中,HVP是唯一在大规模下可行的二阶修正,而标准基线如积分梯度在计算上变得不可行。在对比实验中,多步HVP变体以显著更低的计算量达到或超过积分梯度的准确性,优于先前的二阶基线。这些改进在标准基准上实现了更高保真度的电路恢复,并支持一种屏幕-标记-修复工作流,仅将计算努力针对被标记为不可靠的组件。

英文摘要

A central goal of mechanistic interpretability is to identify which internal components causally drive a language model's behavior. Because these importance estimates serve as the evidence for identifying circuits, systematic errors can lead to the misidentification of the underlying mechanisms. While activation patching provides a gold-standard causal metric, its computational cost is prohibitive at scale. Practitioners instead rely on attribution patching, a gradient-based, first-order approximation whose reliability remains poorly understood. In this work, we characterize the source of this unreliability, demonstrating that the dominant error stems from the non-linearities in the downstream network rather than local curvature at the patched component. This insight yields three practical tools: (i) a reliability score to detect untrustworthy estimates, (ii) error bounds quantifying potential attribution mis-specifications, and (iii) a Hessian-vector-product (HVP) correction that eliminates the leading-order error with only one additional backward pass. In evaluations across five model families (124M-9B parameters) and both random-token and naturalistic (name-swap) perturbations, HVP is the only second-order correction feasible at larger scale, where standard baselines like Integrated Gradients become computationally prohibitive. In comparative experiments, a multi-step HVP variant matches or exceeds the accuracy of Integrated Gradients at significantly lower compute, outperforming prior second-order baselines. These improvements lead to higher-fidelity circuit recovery on standard benchmarks and support a Screen-Flag-Fix workflow that targets computational effort only toward the components flagged as unreliable.

2606.09929 2026-06-10 cs.LG cs.AI 新提交

Between Amnesia and Chaos: A Memory Stability Expressivity Trilemma for Trainable Dissipative Oscillator Networks

介于遗忘与混沌之间:可训练耗散振荡器网络的记忆稳定性表现力三难困境

Caleb Munigety

AI总结 本文研究可训练非线性振荡器网络,发现记忆范围、梯度稳定性和动态表现力三者受阻尼控制,存在无法同时最大化的三难困境,并通过实验验证了理论边界。

详情
AI中文摘要

物理储层计算利用非线性机械动力学,但传统上冻结基底并仅训练线性读出层,假定基底不可训练。我们重新审视这一前提,研究非线性振荡器网络,其质量、阻尼和刚度通过辛积分器端到端学习。我们的核心结果是三难困境:记忆范围、梯度稳定性和动态表现力无法同时最大化,因为三者均由阻尼控制。反向梯度以阻尼决定的速率衰减,限制了信用传播的距离,而前向灵敏度以最大李雅普诺夫指数指数增长,因此可用梯度需要阻尼高于稳定下限。由于李雅普诺夫指数随阻尼增加而下降,而记忆上限随范围增加而下降,稳定训练被限制在一个随范围收缩并在临界点闭合的带状区域内。我们在一个二十振荡器网络上测试了每一步。阻尼扫描发现最大李雅普诺夫指数单调变化并在明确的下限处过零,证实了定理的关键假设。在九个范围上的延迟回忆任务中,学习基底与冻结基底的算力匹配比较显示,学习基底在短范围占优,优势在约十一步范围附近接近并反转,这是带状闭合的预测特征;训练模型稳定在稳定下限附近,自发寻求混沌边缘。解析上限高估经验交叉约五倍,这是可检测梯度与可学习梯度之间的差距,我们报告而非调整消除。贡献在于确认了何时训练物理基底优于冻结基底。

英文摘要

Physical reservoir computing harnesses nonlinear mechanical dynamics but, by convention, freezes the substrate and trains only a linear readout, presuming the substrate is not usefully trainable. We revisit that premise for networks of nonlinear oscillators whose mass, damping, and stiffness are learned end-to-end through a symplectic integrator. Our central result is a trilemma: memory horizon, gradient stability, and dynamical expressivity cannot be simultaneously maximized, because all three are governed by the damping. The backward gradient decays at a rate set by the damping, capping how far back credit can propagate, while forward sensitivities grow exponentially in the largest Lyapunov exponent, so usable gradients require damping above a stability floor. Since the Lyapunov exponent falls as damping rises while the memory ceiling falls as the horizon grows, stable training is confined to a band that contracts with horizon and closes at a critical point. We test every step on a twenty-oscillator network. A damping sweep finds the largest Lyapunov exponent monotone and crossing zero at a well-defined stability floor, confirming the theorem's key assumption. A compute-matched comparison of learned versus frozen substrate on delayed recall across nine horizons shows the learned substrate dominating at short horizons and the advantage closing and reversing near a horizon of eleven steps, the predicted signature of band closure; trained models settle near the stability floor, seeking the edge of chaos unprompted. The analytic ceiling overestimates the empirical crossover roughly fivefold, a gap between detectable and learnable gradient that we report rather than tune away. The contribution is a confirmed account of when training a physical substrate beats freezing it.

2606.09950 2026-06-10 cs.LG nucl-th physics.comp-ph physics.data-an 新提交

Integrating Out, Twice:The Open-System Case That Neural-Network Ensemble Theory Is Missing

两次积分:神经网络集成理论缺失的开系统情形

Jin Lei

AI总结 本文揭示神经网络参数平均与高斯边缘化等价,指出集成理论仅覆盖闭系统,缺失开系统情形;借鉴核反应理论,通过非厄米有效生成器描述开系统,并在注意力图等应用中测试,发现主要结果为负,并解释其结构原因。

详情
AI中文摘要

将神经网络在其随机参数上平均与边缘化高斯扇区是相同的操作,即被消除块的Schur补,当该块闭合时,它返回协方差及其逆。网络集成产生的全部就是闭情形。开情形缺失,而核反应理论已将其解决。将散射问题投影到选定的通道集上,其余部分不可逆地将概率携带到连续谱,留下一个非厄米有效生成器,它精确地守恒并列举它所失去的:核光学模型及其广义光学定理。我仅使用分布的矩、高斯代数和块逆来并置这两种情形,不使用场论,并完整给出闭情形的词典:神经正切核是Fisher灵敏度核,无限宽高斯极限是高斯过程仿真器,从懒惰到特征转换是简化基仿真器的有效性边界。然后我在截断的注意力图、令牌级传输算子和稀疏专家路由器上测试开情形的导出,并报告一个主要为负的结果。守恒流账本在真正存在开放性的地方起作用,但其独特内容缺失,是所选划分的伪影,或被训练目标固定在某个下限附近,而操作上有用的不确定性实际上是认知性的,存在于对应的闭半部分,而非开半部分。这个负结果有一个结构原因,本文使其精确:开情形需要一个具有连续谱和波动(而非弛豫)动力学的被消除扇区,而主流学习的有限或耗散对象无法提供。这是一篇笔记,而非结果;其主要发现是那个负结果,其价值在于定位它的地图。

英文摘要

Averaging a neural network over its random parameters and marginalizing a Gaussian sector are the same operation, the Schur complement of the eliminated block, and when that block is closed it returns a covariance and its inverse. That is all a network ensemble produces, the closed case. The open case is missing, and nuclear reaction theory has it worked out. Projecting a scattering problem onto a chosen set of channels, with the rest carrying probability irreversibly to a continuum, leaves a non-Hermitian effective generator that conserves and itemizes exactly what it loses: the nuclear optical model and its generalized optical theorem. I set the two cases side by side using only the moments of a distribution, the algebra of Gaussians, and block inversion, no field theory, and give the closed-case dictionary in full: the neural tangent kernel is the Fisher sensitivity kernel, the infinite-width Gaussian limit is the Gaussian-process emulator, and the lazy-to-feature transition is the validity boundary of a reduced-basis emulator. I then test the open export on a truncated attention map, a token-level transfer operator, and a sparse expert router, and report a mostly negative result. The conserved flux ledger ports wherever openness is genuinely present, but its distinctive content is absent, an artifact of the chosen partition, or pinned near a floor by the training objective, and the operationally useful uncertainty turns out to be epistemic, living in the closed half of the correspondence, not the open one. The negative has a structural reason this note makes precise: the open case needs an eliminated sector with a continuous spectrum and wave-like, not relaxational, dynamics, which mainstream learning's finite or dissipative objects do not supply. This is a note, not a result; its main finding is that negative one, and its value is the map that locates it.

2606.10068 2026-06-10 cs.LG cs.AI 新提交

Importance-Aware Scheduling for High-Dimensional Hyperparameter Optimization

高维超参数优化的重要性感知调度

Ruinan Wang, Ian Nabney, Mohammad Golbabaee

AI总结 提出GIF方法,通过小样本预热估计超参数重要性,按重要性分组并比例分配试验,保留全空间回退,在高维基准上优于TPE等方法,提升采样效率。

Comments 8 pages, 5 figures. Accepted to IJCNN 2026

详情
AI中文摘要

超参数优化(HPO)对于构建高性能的ML/DL模型至关重要,但传统优化器在高维空间中常常难以应对,其中评估成本高昂且进展被许多低影响变量稀释。我们提出贪婪重要性优先(GIF),一种重要性感知的调度策略,使用小样本预热来估计超参数重要性,形成基于重要性的分组,按比例分配试验,并保留全空间回退。我们在五个各向异性解析函数、Bayesmark和NAS-Bench-301上,在固定评估预算下评估GIF。在高维基准上,GIF比TPE、BOHB、随机搜索和顺序分组更快地达到更好的当前最优解。在有效维度较小的Bayesmark上,GIF仍具有竞争力,但优势较小。消融研究表明,重要性估计、比例分配和回退步骤都有助于性能提升。我们还验证了HIA组件在解析基准上恢复了预期的各向异性。这些结果表明,GIF是一种简单且即插即用的方法,可提高高维HPO中的样本效率。

英文摘要

Hyperparameter Optimization (HPO) is essential for building high-performing ML/DL models, yet conventional optimizers often struggle in high-dimensional spaces where evaluations are costly and progress is diluted across many low-impact variables. We propose Greedy Importance First (GIF), an importance-aware scheduling strategy that uses a small-sample warm start to estimate hyperparameter importance, forms importance-based groups, allocates trials proportionally, and retains a full-space fallback. We evaluate GIF under fixed evaluation budgets on five anisotropic analytic functions, Bayesmark, and NAS-Bench-301. On the higher-dimensional benchmarks, GIF reaches better incumbents with faster convergence than TPE, BOHB, Random Search, and Sequential Grouping. On Bayesmark, where the effective dimensionality is smaller, GIF remains competitive but the margins are smaller. Ablation studies show that importance estimation, proportional allocation, and the fallback step all contribute to the gains. We also verify that the HIA component recovers the intended anisotropy on the analytic benchmarks. These results suggest that GIF is a simple and plug-compatible way to improve sample efficiency in high-dimensional HPO.

2606.10324 2026-06-10 cs.LG cond-mat.stat-mech stat.ML 新提交

Rank Collapse, Fixed Points, and the Renormalization Group Structure of MLP Residual Networks

MLP残差网络的秩坍缩、不动点与重正化群结构

Parviz Haggi-Mani, Irina Rish

AI总结 本文通过MLP残差网络在合成马尔可夫链上的掩码预测任务,首次定量证明网络深度方向存在选择性秩坍缩,对应重正化群中的相关自由度整合,并发现层间核漂移集中在少数转换处。

Comments 16 pages, 9 figures

详情
AI中文摘要

深度神经网络前向传播与重正化群流之间的类比在文献中反复被提及,但现有处理仍是定性的:深度被描述为粗粒化尺度,注意力被比作配分函数,表示被认为流向不动点。尚无工作定义可测量的RG序参量,在输入分布受控变化下测试它,或做出经实验验证的定量预测。我们研究了类比可处理的最简单架构:一个纯MLP残差堆栈,在具有已知谱性质的合成马尔可夫链序列上训练掩码标记预测。我们报告三个发现。(i) 训练后残差流的有效秩随深度单调递减,与无关自由度的逐步整合一致。(ii) 这种秩坍缩是选择性的:它发生在相关长度约1的短链上,但在相关长度约7的长链上不存在(在位置级别测量以控制均值池化伪影)。网络精确保留了预测任务相关的自由度,即RG相关性判据的内容。(iii) 层间核漂移集中在一两个特定转换处,网络其余部分接近不动点,与离散不动点平台一致。这些发现共同构成了首个定量的位置级证据,表明MLP残差网络实现了由输入分布谱结构控制的选择性粗粒化过程。

英文摘要

The analogy between deep neural network forward passes and renormalization group (RG) flows has been repeatedly noted in the literature, but existing treatments remain qualitative: depth is described as a coarse-graining scale, attention is likened to a partition function, and representations are said to flow toward fixed points. No existing work has defined a measurable RG order parameter, tested it under controlled variation of the input distribution, or made quantitative predictions that are empirically verified. We study the simplest architecture for which the analogy is tractable: a pure MLP residual stack trained on masked token prediction over synthetic Markov chain sequences with known spectral properties. We report three findings. (i) The effective rank of the residual stream decreases monotonically with depth after training, consistent with progressive integration of irrelevant degrees of freedom. (ii) This rank collapse is selective: it occurs for chains with short correlation length approximately 1 but is absent for chains with long correlation length approximately 7, measured at the position level to control for mean-pooling artifacts. The network preserves exactly the degrees of freedom relevant to the prediction task, the content of the RG relevance criterion. (iii) Inter-layer kernel drift is concentrated at one or two specific transitions, with the remainder of the network near a fixed point, consistent with a discrete fixed-point plateau. Together these findings constitute the first quantitative, position-level evidence that MLP residual networks implement a selective coarse-graining procedure governed by the spectral structure of the input distribution.

2606.10580 2026-06-10 cs.LG cs.AI 新提交

Convergence of Monte Carlo Optimistic Policy Iteration: Beyond Uniform State-Action Updates

蒙特卡洛乐观策略迭代的收敛性:超越均匀状态-动作更新

Octave Oliviers, Glenn Vinnicombe

发表机构 * Department of Engineering, University of Cambridge(剑桥大学工程系)

AI总结 本文证明,在每状态动作均匀更新的条件下,首次访问蒙特卡洛乐观策略迭代收敛到最优,放宽了传统均匀状态-动作更新的要求,并通过均场动力学和锁定论证方法给出证明。

详情
AI中文摘要

蒙特卡洛乐观策略迭代(MC-O-PI)的渐近行为是一个长期悬而未决的问题。当环境模型未知时(这在实践中很常见),唯一已知的保证收敛到最优性的条件是不切实际的。在其标准形式中,该条件要求用于策略评估的回合在整个状态-动作空间上均匀初始化。本文严格放宽了这一要求。具体来说,我们证明即使更新仅在每个状态内的动作上均匀,首次访问MC-O-PI也能收敛到最优性。这允许回合以任意频率从不同状态开始;当状态空间很大或未知但每个状态中的动作空间可管理时,这是一种现实的实现。证明脱离了Tsitsiklis的经典分析,其中心交换性论证在状态以不同频率更新时不再适用。相反,我们首先证明当更新在每个状态的动作上均匀时,MC-O-PI的均场动力学生成单调改进的策略,然后通过扩展组合稳定性-ODE方法的锁定论证,证明噪声不能持续阻止这种改进。这种方法为一般研究乐观策略迭代算法提供了一种新途径。

英文摘要

The asymptotic behaviour of Monte Carlo optimistic policy iteration (MC-O-PI) is a long-standing open question. When the model of the environment is unknown, as is common in practice, the only known condition that guarantees convergence to optimality is impractical. In its canonical form, this condition requires that the episodes used for policy evaluation be initialised uniformly over the entire state-action space. This paper strictly relaxes that requirement. Specifically, we prove that initial-visit MC-O-PI converges to optimality even when updates are uniform only over the actions within each state. This allows episodes to start in different states at arbitrary frequencies; a realistic implementation when the state space is large or unknown but the action space in each state is manageable. The proof departs from the classical analysis of Tsitsiklis whose central commutativity argument no longer applies when states are updated at different frequencies. Instead, we first show that the mean-field dynamics of MC-O-PI generate monotonically improving policies when updates are uniform over the actions in each state, and then prove that noise cannot consistently prevent this improvement by extending the lock-in argument of the combined stability-ODE method. This approach suggests a new way to study optimistic policy-iteration algorithms in general.

2606.10913 2026-06-10 cs.LG stat.ML 新提交

Conservation Laws from Data Symmetry in Neural Networks

神经网络中数据对称性导致的守恒律

Jakob Galley, Vahid Shahverdi, Axel Flinth

发表机构 * Umeå University(于默奥大学)

AI总结 研究训练数据的对称性是否在梯度流训练中产生守恒量,证明对于解析非多项式损失函数,数据对称性一般不产生额外守恒量;对于均方误差损失,数据增强可产生额外守恒量,并利用可张量化网络框架描述该现象。

详情
AI中文摘要

我们探讨训练数据的内在对称性是否在神经网络的梯度流训练中导致守恒量。在假设损失函数是解析且非多项式的情况下,我们证明数据对称性通常不会诱导任何额外的运动积分。另一方面,对于均方误差(MSE)损失,存在数据增强产生额外守恒量的情况。我们构建了一个利用\emph{可张量化网络}来描述这一现象的框架。可张量化网络是一类架构,其参数和输入的依赖关系可以通过中间表示分离。它们包括线性网络、多项式网络以及闪电注意力(Lightning Attention)。

英文摘要

We explore whether intrinsic symmetries of the training data lead to conserved quantities during gradient-flow training of neural networks. Under the assumption that the loss function is analytic and non-polynomial, we prove that data symmetries generically do not induce any additional integrals of motion. For mean squared error (MSE) loss, on the other hand, there are situations in which data augmentation yields extra conserved quantities. We build a framework, utilizing \emph{tensorizable networks} to describe this phenomenon. Tensorizable networks are a family of architectures whose dependence on parameters and inputs can be separated using an intermediate representation. They include linear and polynomial networks, as well as Lightning Attention.

2606.11130 2026-06-10 cs.LG 新提交

Robust Regression of General ReLUs with Queries

一般ReLU的鲁棒回归与查询

Ilias Diakonikolas, Daniel M. Kane, Mingchen Ma

发表机构 * University of Wisconsin-Madison(威斯康星大学麦迪逊分校) University of California, San Diego(加利福尼亚大学圣迭戈分校)

AI总结 针对高斯分布下一般ReLU的平方损失鲁棒回归,提出首个高效查询算法,使用d polylog(1/ε)+Õ(min{1/p,1/ε})个标签查询达到O(opt)+ε误差,并证明查询复杂度近最优。

Comments Appeared at NeurIPS 2025

详情
AI中文摘要

我们研究在被动学习设置中,最近的工作给出了一个计算高效的算法,使用$poly(d,1/\epsilon)$个标记样本,输出误差为$O(opt)+\epsilon$的假设,其中$opt$是最佳拟合ReLU的平方损失。这里我们关注交互式设置,其中学习器对未标记样本的标签具有某种形式的查询访问。我们的主要结果是第一个计算高效的学习器,使用$d \operatorname{polylog}(1/\epsilon)+\tilde{O}(\min\{1/p, 1/\epsilon\})$个黑盒标签查询,其中$p$是目标函数的偏置,并达到误差$O(opt)+\epsilon$。我们通过证明其查询复杂度界在性质上接近最优来补充我们的算法结果,即使忽略计算约束。最后,我们确定查询访问对于改进被动学习的标签复杂度本质上是必要的。具体而言,对于基于池的主动学习,任何主动学习器都需要$\tilde{\Omega}(d/\epsilon)$个标签,除非它抽取了超多项式数量的未标记样本。

英文摘要

We study the task of agnostically learning general (as opposed to homogeneous) ReLUs under the Gaussian distribution with respect to the squared loss. In the passive learning setting, recent work gave a computationally efficient algorithm that uses $poly(d,1/ε)$ labeled examples and outputs a hypothesis with error $O(opt)+ε$, where $opt$ is the squared loss of the best fit ReLU. Here we focus on the interactive setting, where the learner has some form of query access to the labels of unlabeled examples. Our main result is the first computationally efficient learner that uses $d polylog(1/ε)+\tilde{O}(\min\{1/p, 1/ε\})$ black-box label queries, where $p$ is the bias of the target function, and achieves error $O(opt)+ε$. We complement our algorithmic result by showing that its query complexity bound is qualitatively near-optimal, even ignoring computational constraints. Finally, we establish that query access is essentially necessary to improve on the label complexity of passive learning. Specifically, for pool-based active learning, any active learner requires $\tildeΩ(d/ε)$ labels, unless it draws a super-polynomial number of unlabeled examples.

2606.11149 2026-06-10 cs.LG 新提交

Efficiently Learning Drifting Halfspaces with Massart Noise

高效学习带有Massart噪声的漂移半空间

Mingchen Ma, Guyang Cao, Jelena Diakonikolas, Ilias Diakonikolas

发表机构 * University of Wisconsin-Madison(威斯康星大学麦迪逊分校)

AI总结 针对Massart噪声下的漂移概念学习问题,提出一种计算高效的学习器,实现误差η + Õ(Δ^{1/3}/γ),并证明该误差在低度多项式测试下最优。

Comments To appear at ICML 2026

详情
AI中文摘要

我们研究了在Massart噪声存在下学习漂移概念的问题。在该框架中,在线学习者可以访问独立样本的历史记录,这些样本的标签是目标概念的带噪版本,且目标概念可能每轮发生变化。目标是每轮输出一个具有较小预测误差的假设。我们研究了基本类别——边缘可分离线性分类器(半空间)——的学习复杂性。在正面结果方面,我们给出了一种计算高效的学习器,其误差达到η + Õ(Δ^{1/3}/γ),其中η是Massart噪声率的上界,Δ是漂移率,γ是边缘。有趣的是,在可实现设置中,我们技术的改编产生了一个高效学习器,其误差率优于先前工作。在下界方面,我们提供了信息-计算权衡的形式化证据,强烈表明我们算法的性能本质上是最优的。具体来说,虽然信息论最优误差随Δ^{1/2}缩放,但我们证明即使在随机分类噪声的特殊情况下,Δ^{1/3}缩放对于低度多项式测试也是不可避免的。

英文摘要

We study the problem of learning a drifting concept in the presence of Massart noise. In this framework, an online learner has access to a history of independent samples whose labels are noisy versions of a target concept that may change from round to round. The goal is to output, in each round, a hypothesis with small prediction error. We study the complexity of this learning problem for the fundamental class of margin-separable linear classifiers (halfspaces). On the positive side, we give a computationally efficient learner achieving error $η+ \tilde O(Δ^{1/3}/γ)$, where $η$ upper bounds the Massart noise rate, $Δ$ is the drift rate, and $γ$ is the margin. Interestingly, in the realizable setting, an adaptation of our techniques yields an efficient learner with an improved error rate over prior work. On the lower-bound side, we provide formal evidence of an information-computation tradeoff, strongly suggesting that our algorithm's performance is essentially optimal. Specifically, while the information-theoretically optimal error scales with $Δ^{1/2}$, we prove that $Δ^{1/3}$-scaling is unavoidable for low-degree polynomial tests, even in the special case of random classification noise.

2606.11171 2026-06-10 cs.LG cond-mat.stat-mech cs.IT math.IT math.OC math.ST stat.TH 新提交

Algorithmic and Minimax Complexities in Kernel Bandits

核赌博机中的算法与极小极大复杂度

Yunbei Xu

AI总结 本文通过统一MAIR框架,将GP-UCB与MAMS算法置于共同语言下,提出结合两者优势的安全主算法,并证明在过参数化模型中算法复杂度比类宽极小极大或DEC证书更具信息性。

详情
AI中文摘要

高斯过程上置信界(GP-UCB)和决策估计系数(DEC)方法乍看之下可能属于不同的理论。本文将这两种观点置于一个共同的算法信息语言中,用于频率学派RKHS赌博机。GP-UCB固定了一个算法性的(而非真实的)高斯过程先验,并利用实现轨迹的复杂度以及计算可处理性,而MAMS优化了一个鲁棒的类宽MAIR/DEC包络。通过统一的MAIR框架和异质半正定算法先验,我们推广了GP-UCB分析和MAMS算法,提出了一种结合两者优势的安全主算法,并提供了一个核赌博机构造,表明在过参数化模型中算法复杂度可以比类宽极小极大或DEC证书更具信息性。由此得出的信息是:算法信息和类宽极小极大系数回答不同的问题,并可能导致不同的差距;核赌博机提供了一个干净的环境,使得这种区别在数学上变得可见。

英文摘要

Gaussian-process upper confidence bound (GP-UCB) and decision-estimation-coefficient (DEC) methods may appear, at first sight, to belong to different theories. This paper places the two viewpoints in a common algorithmic-information language for frequentist RKHS bandits. GP-UCB fixes an algorithmic, rather than true, Gaussian-process prior and exploits realized-trajectory complexity together with computational tractability, whereas MAMS optimizes a robust class-wide MAIR/DEC envelope. Through the unified MAIR framework and heterogeneous positive-semidefinite algorithmic priors, we generalize both the GP-UCB analysis and the MAMS algorithm, propose a safeguarded master that combines their advantages, and provide a kernel-bandit construction showing that algorithmic complexity can be more informative than class-wide minimax or DEC certificates in overparameterized models. The resulting message is that algorithmic information and class-wide minimax coefficients answer different questions and can lead to different gaps; kernel bandits provide a clean setting in which this distinction becomes mathematically visible.

2606.10119 2026-06-10 stat.ML cs.LG math.ST stat.TH 交叉投稿

Convergence Rates for Neural-Network Estimation with Current-Status Data

当前状态数据下神经网络估计的收敛速度

Yuan Wu, Tianhui Zhou

发表机构 * Department of Biostatistics and Bioinformatics, Duke University, Durham, North Carolina, USA(生物统计学与生物信息学系,杜克大学,达勒姆,北卡罗来纳州,美国)

AI总结 针对当前状态数据,提出非参数神经网络筛最大似然估计器,结合ReLU网络逼近理论与经验过程论证,在Hölder光滑假设下建立显式收敛速度。

详情
AI中文摘要

当前状态数据出现在事件时间仅通过一个指示变量(是否在检查时间之前发生)被观测到时。本文研究了事件时间条件累积分布函数的非参数神经网络筛最大似然估计器。在Hölder光滑假设下,我们通过结合整流线性单元神经网络的逼近理论与经验过程论证,建立了显式收敛速度。这一结果为当前状态观测下的神经网络估计及后续推断提供了理论支持。

英文摘要

Current-status data arise when an event time is observed only through an indicator of whether it occurred before an examination time. This paper studies a nonparametric neural-network sieve maximum likelihood estimator of the conditional cumulative distribution function of the event time. Under Hölder smoothness assumptions, we establish an explicit convergence rate by combining approximation theory for rectified linear unit neural networks with empirical-process arguments. This result provides theoretical support for neural-network estimation and subsequent inference under current-status observation.

2606.10179 2026-06-10 quant-ph cs.LG 交叉投稿

Trainability of IQP Quantum Circuit Born Machines Under Gaussian Initialization

高斯初始化下IQP量子电路玻恩机的可训练性

Gennaro De Luca

发表机构 * Arizona State University(亚利桑那州立大学)

AI总结 研究高斯初始化下IQP量子电路玻恩机的可训练性,利用Stein引理和Lipschitz集中界推导梯度方差下界和偏离概率界,讨论避免或促进指数集中的策略及贫瘠高原条件。

Comments 23 pages

详情
AI中文摘要

量子电路玻恩机(QCBMs)通过利用玻恩规则为生成式机器学习提供了一种自然方法。最近的工作提供了一种通过最大均值差异(MMD)损失来经典训练具有瞬时量子多项式(IQP)电路的QCBMs的方法。尽管从IQP电路经典采样被认为是棘手的,但它们的期望值可以经典计算,从而能够训练这些IQP QCBMs。然而,量子机器学习(QML)模型还面临其他各种挑战,包括由指数集中或贫瘠高原引起的可训练性问题。虽然这些问题已经针对从均匀分布采样的参数进行了探索,但很少有工作严格处理任意高斯初始化方案的使用。本文利用Stein引理和高斯随机变量的Lipschitz集中界,提供了梯度方差的解析下界以及梯度偏离其均值的概率集中界。它讨论了避免或促进指数集中的策略,以及贫瘠高原更可能发生的条件。

英文摘要

Quantum Circuit Born Machines (QCBMs) offer a natural approach to generative machine learning by leveraging the Born rule. Recent work has provided a method to classically train QCBMs with Instantaneous Quantum Polynomial (IQP) circuits via the Maximum Mean Discrepancy (MMD) loss. Despite the assumed intractability of sampling from IQP circuits classically, their expectation values can be computed classically, enabling training of these IQP QCBMs. However, quantum machine learning (QML) models have various other challenges, including trainability issues caused by exponential concentration or barren plateaus. While these issues have been explored for parameters sampled from a uniform distribution, little work has been done to rigorously treat the use of arbitrary Gaussian initialization schemes. This work leverages Stein's lemma and Lipschitz concentration bounds for Gaussian random variables to provide an analytical lower bound of the variance of the gradient and a probabilistic concentration bound of the deviation of the gradient from its mean. It discusses strategies to either avoid or encourage exponential concentration, as well as the conditions under which barren plateaus are more likely to occur.

2606.10361 2026-06-10 stat.ML cs.LG 交叉投稿

Near-Exponential Convergence Rates for kNN Classification based on Boltzmann Margin

基于玻尔兹曼间隔的kNN分类近指数收敛速率

Luyuan Yang, Shayan Shafaei, Chao Lan

发表机构 * School of Computer Science, University of Oklahoma(计算机科学系,俄克拉荷马大学)

AI总结 提出玻尔兹曼间隔条件,介于Tsybakov与Massart间隔之间,首次证明kNN分类器可实现近指数收敛速率。

Comments Conference on Uncertainty in Artificial Intelligence (UAI)

详情
AI中文摘要

分类器的收敛速率分析通常在Tsybakov间隔或Massart间隔下进行。前者是相对较弱的条件,通常产生多项式速率,而后者更强,但能保证指数速率。本文引入一种新条件,称为玻尔兹曼间隔,它填补了这两种机制之间的空白。该条件弱于Massart间隔,通常强于Tsybakov间隔,并在适当条件下能蕴含它们的许多性质。我们将玻尔兹曼间隔应用于kNN分类器的分析,并建立了kNN分类的第一个近指数收敛速率。我们还给出了主要结果的扩展,并提供了支持主要理论结论的数值证据。

英文摘要

Convergence-rate analysis for classifiers is often conducted under either Tsybakov margin or Massart margin. The former is a relatively weak condition that typically yields polynomial rates, while the latter is substantially stronger but can guarantee exponential rates. In this paper, we introduce a new condition, called Boltzmann margin, that bridges the gap between these two regimes. It is weaker than Massart margin, generally stronger than Tsybakov margin, and can imply many of their properties under suitable conditions. We apply Boltzmann margin to the analysis of kNN classifiers and establish the first near-exponential convergence rates for kNN classification. We also present extensions of the main results and provide numerical evidence supporting the main theoretical implications.

2606.10377 2026-06-10 math.ST cs.LG stat.TH 交叉投稿

Bidirectional Random Projections

双向随机投影

Chao Lan, Luyuan Yang

发表机构 * School of Computer Science, University of Oklahoma(俄克拉荷马大学计算机科学学院)

AI总结 本文分析固定设计下普通最小二乘回归的双向随机投影,导出基于投影数据的OLS估计的期望超额损失界,与仅行投影相比,差距约为O(p1 + C/p1),其中C随n1/n变化且可为负。

Comments Statistics & Probability Letters (Elsevier)

详情
AI中文摘要

本文分析了固定设计设置下普通最小二乘(OLS)回归的双向随机投影。设$(X,Y) \in \mathbb{R}^{n \times p} \times \mathbb{R}^n$为样本,$R \in \mathbb{R}^{n_1 \times n}, W \in \mathbb{R}^{p \times p_1}$为两个适当分布的随机投影。我们推导了基于$(WXR, WY)$构建的OLS估计量的期望超额损失界。与基于$(XR, Y)$构建的OLS估计量的已有界相比,差距约为$O\left( p_1 + C \frac{1}{p_1} \right)$,其中$C$随$n_1/n$缩放,且对于小的$n_1/n$可以为负。其含义通过真实世界数据的数值结果得到证实。

英文摘要

This paper analyzes bidirectional random projections for ordinary least squares (OLS) regression under the fixed design setting. Let $(X,Y) \in \mathbb{R}^{n \times p} \times \mathbb{R}^n$ be a sample and $R \in \mathbb{R}^{n_1 \times n}, W \in \mathbb{R}^{p \times p_1}$ be two properly distributed random projections. We develop an expected excess loss bound for the OLS estimator built on $(WXR, WY)$. Compared to an established bound for OLS estimator built on $(XR, Y)$, the gap is approximately $O\left( p_1 + C \frac{1}{p_1} \right)$, where $C$ scales with $n_1/n$ and can be negative for small $n_1/n$. Its implications are confirmed by numerical results on real-world data.

2606.10562 2026-06-10 math.OC cs.LG cs.NA math.NA 交叉投稿

Accelerating SAV-based optimization via randomized low-rank Hessian approximation

基于随机低秩Hessian近似的加速SAV优化方法

Ryo Sagawa, Daisuke Furihata, Yuto Miyatake

发表机构 * Department of Pure and Applied Mathematics, Graduate School of Information Science and Technology, The University of Osaka(纯粹与应用数学系,信息科学与技术研究生学校,大阪大学)

AI总结 提出Nyström增强松弛标量辅助变量方法(N-RSAV),通过随机低秩Nyström近似引入曲率信息加速收敛,并保持无条件修正能量耗散律,在病态问题(如PINNs)中显著快于传统RSAV方法。

Comments 25 pages, 4 figures

详情
AI中文摘要

我们提出了一种新的优化方法,即Nyström增强的松弛标量辅助变量方法(N-RSAV),它将曲率信息融入RSAV框架,以加速收敛,同时保持无条件修正能量耗散律。现有的基于RSAV的方法仅依赖一阶信息,并且通常收敛缓慢,特别是对于病态问题,例如物理信息神经网络(PINNs)中出现的问题。为了解决这一局限性,我们使用从随机低秩Nyström近似获得的近似Hessian信息来设计RSAV方案中的线性算子。为了保持耗散结构,我们通过特征值截断强制执行半正定性。此外,我们引入了一种自适应策略,根据原始能量和修正能量之间的偏差重用近似Hessian,从而显著降低计算成本。我们还提供了在Polyak-Lojasiewicz(PL)条件下具有一般半正定算子的RSAV方案的收敛性分析,并在PL条件和额外凸性假设下建立了N-RSAV的相应收敛保证。在具有有效低秩结构的病态问题(包括凸二次问题和PINNs训练)上的数值实验表明,所提出的方法比传统的基于RSAV的方法实现了更快的收敛。

英文摘要

We propose a new optimization method, the Nyström-enhanced relaxed scalar auxiliary variable method (N-RSAV), which incorporates curvature information into the RSAV framework to accelerate convergence while preserving an unconditional modified energy dissipation law. Existing RSAV-based methods rely solely on first-order information and often suffer from slow convergence, particularly for ill-conditioned problems such as those arising in physics-informed neural networks (PINNs). To address this limitation, we design the linear operator in the RSAV scheme using approximate Hessian information obtained from a randomized low-rank Nyström approximation. To preserve the dissipation structure, we enforce positive semidefiniteness through eigenvalue truncation. Furthermore, we introduce an adaptive strategy that reuses the approximate Hessian based on the deviation between the original and modified energies, significantly reducing computational cost. We also provide a convergence analysis of the RSAV scheme with a general positive semidefinite operator under the Polyak-Lojasiewicz (PL) condition and establish corresponding convergence guarantees for N-RSAV under the PL condition and an additional convexity assumption. Numerical experiments on ill-conditioned problems with effectively low-rank structure, including convex quadratic problems and training of PINNs, demonstrate that the proposed methods achieve substantially faster convergence than conventional RSAV-based approaches.

2606.10770 2026-06-10 stat.ME cs.LG 交叉投稿

Correcting Variable Importance Scored by Random Forests

修正随机森林产生的变量重要性评分

Guancheng Zhou, Haiping Xu, Jason Liu, Donghui Yan

发表机构 * Computer and Information Science(计算机与信息科学) Mathematics and Data Science(数学与数据科学) University of Massachusetts, Dartmouth, MA(马萨诸塞大学达特茅斯分校) The Rivers School, Weston, MA(韦斯特on学校的河流学校)

AI总结 针对随机森林变量重要性受变量间相关性影响的问题,提出基于条件相关性的分组方法进行修正,实验证明两种计算高效方案均能有效校正变量重要性。

Comments 22 pages, 10 figures

详情
AI中文摘要

随机森林产生的变量重要性在统计分析中广泛应用,在辅助模型解释、模型选择和诊断、成本受限学习等任务中发挥重要作用。然而,RF中变量重要性的计算未考虑变量间的相关性,与许多其他变量相关的变量往往会获得较低的重要性指数,或被其他强相关变量完全掩盖(即重要性指数接近零)。为了在计算变量重要性时避免不相关变量的影响,我们提出根据变量的条件相关性(以响应变量为条件)对变量进行分组。我们探索了两种计算高效的方案:一种将变量单独分组,然后将感兴趣的变量与所有相关变量分离;另一种使用聚类根据变量间的成对条件相关性进行分组。实验表明,两种方法都能对变量重要性进行合理的修正。

英文摘要

Variable importance produced by Random Forests (RF) is used widely in statistical data analysis, and has played an important role in a variety of tasks such as assisting model interpretation, model selection and diagnosis, and cost-bounded learning etc. However, the calculation of variable importance in RF does not take into account of the correlations among variables, and variables that are correlated to many other variables tend to receive a lower importance index or being completely masked (i.e., with an importance index near zero) by other strongly correlated variables. To prevent influence from unwanted correlated variables in calculating variable importance, we propose to group variables by their conditional correlations (conditional on the response variable). We explore two computationally efficient options, with one grouping variables individually, and then separates the variable of interest from all correlated variables, while the other uses clustering to group variables according to their pair-wise conditional correlations. Our experiments show that both lead to sensible corrections to the importance of variables.

2606.11045 2026-06-10 cs.AI cs.LG 交叉投稿

What Fits (Into Few Tokens) Doesn't Overfit: Compression and Generalization in ML Research Agents

什么适合(少量标记)就不会过拟合:ML研究智能体中的压缩与泛化

Martin Andres Bertran, Aaron Roth, Zhiwei Steven Wu

发表机构 * Amazon Responsible AI(亚马逊负责任人工智能) University of Pennsylvania(宾夕法尼亚大学) Carnegie Mellon University(卡内基梅隆大学)

AI总结 研究LLM驱动的科研智能体在输出和输入压缩下能否保持性能,发现短提示和可压缩反馈足以复现高性能模型,支持成功策略位于低复杂度区域的假设。

详情
AI中文摘要

原则上,自适应地重复使用保留的基准测试会引发过拟合。然而,基准驱动的机器学习(ML)在实践中却出人意料地很少出现过拟合。一个吸引人的假设是,成功的ML策略具有高度可压缩性。我们在LLM驱动的科研智能体设置中研究这一点,通过两个互补的信息瓶颈可以直接检验该假设。在\emph{输出压缩}中,探索智能体使用验证集自适应地搜索高性能模型,我们测试一个全新的“复现智能体”是否能在仅给出极短提示和训练数据的情况下复现其性能。在\emph{输入压缩}中,探索者仅接收一位反馈,指示每个提交的模型是否优于当前最佳。跨越8个数据集(涵盖表格分类、视觉、语言建模、扩散建模和奖励建模),我们发现这些瓶颈对性能影响很小:短提示和可压缩反馈足以复现和找到高性能模型。该假设是可证伪的:当我们故意诱导验证集过拟合时,结果无法用短提示复现。综合来看,我们的结果支持描述长度解释,即基准驱动ML中缺乏过拟合的原因是成功策略占据策略空间的低复杂度区域。

英文摘要

Reusing a held-out benchmark adaptively should, in principle, invite overfitting. Yet benchmark-driven machine learning (ML) has produced surprisingly little overfitting in practice. An attractive hypothesis is that successful ML strategies are highly compressible. We study this in the setting of LLM-driven research agents, where the hypothesis becomes directly testable via two complementary information bottlenecks. In \emph{output compression}, an exploration agent adaptively searches for high-performance models using a validation set, and we test whether a fresh ``reproducer agent'' can reproduce its performance given only an extremely short prompt and the training data. In \emph{input compression}, the explorer receives only one-bit feedback indicating whether each submitted model improves on the running best. Across 8 datasets spanning tabular classification, vision, language modeling, diffusion modeling, and reward modeling, we find that these bottlenecks have little effect on performance: short prompts and compressible feedback are sufficient to reproduce and find high-performance models. The hypothesis is falsifiable: when we deliberately induce validation-set overfitting, the results fail to reproduce with short prompts. Taken together, our results support a description-length explanation for the lack of overfitting in benchmark-driven ML: successful strategies occupy a low-complexity region of strategy space.

2402.00152 2026-06-10 cs.LG cs.NA math.NA stat.ML 版本更新

Deeper or Wider: A Perspective from Optimal Generalization Error with Sobolev Loss

更深还是更宽:从Sobolev损失下最优泛化误差的视角

Yahong Yang, Juncai He

发表机构 * GitHub

AI总结 本文通过分析Sobolev损失下的最优泛化误差,比较了深层网络与宽层网络,揭示了样本点数量、网络参数和损失函数正则性对架构选择的影响,并应用于深度Ritz和PINN方法。

Comments arXiv admin note: text overlap with arXiv:2310.10766, arXiv:2305.08466

详情
AI中文摘要

构建神经网络的架构是机器学习社区的一个具有挑战性的追求,而更深还是更宽的困境仍然是一个持续存在的问题。本文探讨了具有灵活层数的深层神经网络(DeNNs)与具有有限隐藏层的宽神经网络(WeNNs)之间的比较,重点关注它们在Sobolev损失下的最优泛化误差。分析研究表明,神经网络的架构可能受到多种因素的显著影响,包括样本点数量、神经网络内的参数以及损失函数的正则性。具体来说,更多的参数倾向于有利于WeNNs,而增加的样本点数量和损失函数的更大正则性则倾向于采用DeNNs。我们最终将该理论应用于使用深度Ritz和物理信息神经网络(PINN)方法求解偏微分方程,指导神经网络的设计。

英文摘要

Constructing the architecture of a neural network is a challenging pursuit for the machine learning community, and the dilemma of whether to go deeper or wider remains a persistent question. This paper explores a comparison between deeper neural networks (DeNNs) with a flexible number of layers and wider neural networks (WeNNs) with limited hidden layers, focusing on their optimal generalization error in Sobolev losses. Analytical investigations reveal that the architecture of a neural network can be significantly influenced by various factors, including the number of sample points, parameters within the neural networks, and the regularity of the loss function. Specifically, a higher number of parameters tends to favor WeNNs, while an increased number of sample points and greater regularity in the loss function lean towards the adoption of DeNNs. We ultimately apply this theory to address partial differential equations using deep Ritz and physics-informed neural network (PINN) methods, guiding the design of neural networks.

2508.21022 2026-06-10 cs.LG math.OC stat.ML 版本更新

A Sketch-and-Project Analysis of Subsampled Natural Gradient Algorithms

子采样自然梯度算法的草图-投影分析

Gil Goldshlager, Jiang Hu, Lin Lin

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 通过将子采样自然梯度下降(SNG)视为草图-投影方法,提出基于平方体积采样的新代理,证明单小批量下SNG方向期望等于预处理梯度下降步,给出全局收敛保证和显式收敛率,并解释SNG相对于SGD的优势在于更有效利用模型雅可比矩阵的谱衰减。

Comments 26 pages, 7 figures

详情
AI中文摘要

子采样自然梯度下降(SNG)已被用于实现高精度科学机器学习,但基于随机预条件的标准分析无法洞察实际小样本设置。我们通过将SNG分析为草图-投影方法克服了这一限制。受此视角启发,我们摒弃了使用两个独立小批量解耦梯度和预条件的常规理论代理,取而代之的是基于平方体积采样的新代理。在这个新代理下,我们证明即使存在耦合,SNG方向的期望也等于预处理梯度下降步,从而得到:(i) 使用任意大小的单个小批量时的全局收敛保证,以及(ii) 用与草图-投影结构相关的量显式表征收敛速率。这些发现进而为小样本设置提供了新见解,例如表明SNG相对于SGD的优势在于它能更有效地利用模型雅可比矩阵中的谱衰减。我们还扩展这些思想以解释SNG的一种流行结构化动量方案SPRING,通过证明它自然源于加速草图-投影方法。

英文摘要

Subsampled natural gradient descent (SNG) has been used to enable high-precision scientific machine learning, but standard analyses based on stochastic preconditioning fail to provide insight into realistic small-sample settings. We overcome this limitation by instead analyzing SNG as a sketch-and-project method. Motivated by this lens, we discard the usual theoretical proxy which decouples gradients and preconditioners using two independent mini-batches, and we replace it with a new proxy based on squared volume sampling. Under this new proxy we show that the expectation of the SNG direction becomes equal to a preconditioned gradient descent step even in the presence of coupling, leading to (i) global convergence guarantees when using a single mini-batch of any size, and (ii) an explicit characterization of the convergence rate in terms of quantities related to the sketch-and-project structure. These findings in turn yield new insights into small-sample settings, for example by suggesting that the advantage of SNG over SGD is that it can more effectively exploit spectral decay in the model Jacobian. We also extend these ideas to explain a popular structured momentum scheme for SNG, known as SPRING, by showing that it arises naturally from accelerated sketch-and-project methods.

2510.12071 2026-06-10 cs.LG 版本更新

Influence Dynamics and Stagewise Data Attribution

影响动力学与分阶段数据归因

Jin Hwa Lee, Matthew Smith, Maxwell Adam, Jesse Hoogland

发表机构 * University College London(伦敦大学学院) Independent(独立) University of Melbourne(墨尔本大学) Timaeus

AI总结 针对神经网络训练中样本影响动态变化的问题,基于奇异学习理论提出分阶段数据归因框架,预测影响非单调变化(符号翻转、尖峰),并在玩具模型和语言模型中验证与模型学习阶段的对应。

Comments 28 pages, 15 figures

详情
AI中文摘要

当前的训练数据归因(TDA)方法将样本对另一个样本的影响视为静态的,但神经网络在表现出不同影响模式的独特阶段中学习。在这项工作中,我们引入了一个基于奇异学习理论的分阶段数据归因框架。我们预测影响可以非单调地变化,包括符号翻转和发展转变处的尖锐峰值。我们首先在玩具模型中通过分析和实验验证这些预测,表明影响的动态变化直接映射到模型对语义层次结构的逐步学习。最后,我们在语言模型中大规模展示了这些现象,其中令牌级别的影响变化与已知的发展阶段一致。

英文摘要

Current training data attribution (TDA) methods treat the influence one sample has on another as static, but neural networks learn in distinct stages that exhibit changing patterns of influence. In this work, we introduce a framework for stagewise data attribution grounded in singular learning theory. We predict that influence can change non-monotonically, including sign flips and sharp peaks at developmental transitions. We first validate these predictions analytically and empirically in a toy model, showing that dynamic shifts in influence directly map to the model's progressive learning of a semantic hierarchy. Finally, we demonstrate these phenomena at scale in language models, where token-level influence changes align with known developmental stages.

2512.06343 2026-06-10 cs.LG cs.AI cs.CL 版本更新

When Distance Distracts: Representation Distance Bias in BT-Loss for Reward Models

当距离干扰:BT损失中表示距离偏差对奖励模型的影响

Tong Xie, Andrew Bai, Yuanhao Ban, Yunqi Hong, Haoyu Li, Cho-Jui Hsieh

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 分析BT损失中表示距离导致的梯度偏差,提出NormBT自适应归一化方案,提升奖励模型在细粒度区分上的性能。

Comments ICML 2026

详情
AI中文摘要

奖励模型是RLHF框架中大型语言模型对齐的核心。奖励建模中使用的标准目标是Bradley-Terry(BT)损失,它从由选择和拒绝响应组成的成对数据中学习。在这项工作中,我们分析了BT损失的每个样本梯度,并展示了由于表示距离而产生的虚假学习信号。特别是,BT梯度范数由两个不同的组成部分缩放:(1)预测误差,反映选择和拒绝响应之间预测奖励的差异,以及关键地,(2)在最后一层输出空间中测量的对之间的表示距离。虽然第一项捕获了预期的训练信号,但第二项会显著影响更新幅度并导致学习错位。具体来说,表示距离小的对即使排名错误也经常收到微弱的更新,而距离大的对则收到不成比例的大更新。这导致来自大距离对的梯度掩盖了来自小距离对的梯度,而细粒度区分在小距离对中尤为重要。为了克服这一限制,我们提出了NormBT,一种自适应成对归一化方案,重新缩放更新以平衡表示驱动效应,并将学习信号聚焦于预测误差。NormBT是对BT损失的轻量级、即插即用修改,开销可忽略。在各种LLM骨干和数据集上,NormBT一致地提高了奖励模型性能,在RewardBench的推理类别上取得了超过5%的显著提升,该类别包含大量细粒度对。

英文摘要

Reward models are central to Large Language Model (LLM) alignment within the framework of RLHF. The standard objective used in reward modeling is the Bradley-Terry (BT) loss, which learns from pairwise data consisting of chosen and rejected responses. In this work, we analyze the per-sample gradient of BT-loss and show spurious learning signals due to representation distance. In particular, BT gradient norm scales with two distinct components: (1) prediction error, reflected by the difference in predicted rewards between chosen and rejected responses, and critically, (2) representation distance between the pair measured in the output space of the final layer. While the first term captures the intended training signal, the second term can significantly impact the update magnitude and misalign learning. Specifically, pairs with small representation distance often receive vanishingly weak updates, even when misranked, while pairs with large distance receive disproportionately strong updates. This leads to gradients from large-distance pairs to overshadow those from small-distance pairs, where fine-grained distinctions are especially important. To overcome this limitation, we propose NormBT, an adaptive pair-wise normalization scheme that rescales updates to balance representation-driven effects and focuses learning signals on prediction error. NormBT is a lightweight, drop-in modification to BT loss with negligible overhead. Across various LLM backbones and datasets, NormBT improves reward model performance consistently, with notable gains of over 5% on the Reasoning category of RewardBench, which contains numerous fine-grained pairs.

2603.12785 2026-06-10 cs.LG math.ST stat.TH 版本更新

Upper Bounds for Local Learning Coefficients of Three-Layer Neural Networks

三层神经网络局部学习系数的上界

Yuki Kurumadani

发表机构 * sigmath.es.osaka-u.ac.jp(大阪大学)

AI总结 针对三层神经网络的奇异参数点,提出一种基于预算、需求和供给约束的计数规则来推导局部学习系数的上界,覆盖了swish等激活函数,并在一维输入下与已知精确值一致。

详情
AI中文摘要

已知三层神经网络构成奇异学习模型,其贝叶斯渐近行为由学习系数(或实对数规范阈值)控制。尽管该量在正则模型和某些特殊奇异模型中已被阐明,但在神经网络中广泛适用的评估方法仍然有限。最近,半正则模型的局部学习系数公式被提出,给出了学习系数的上界。然而,该公式仅适用于实现参数集中的非奇异点,不能用于奇异点。特别是对于三层神经网络,所得上界在某些情况下与已知的学习系数值存在显著差异。本文推导了三层神经网络中一类奇异实现参数的局部学习系数上界公式。该公式可解释为在预算、需求和供给约束下的计数规则。在非多项式实解析情况下,该公式适用于一般设置;而在多项式情况下,它适用于真实分布没有隐藏单元的限制。特别地,我们的结果涵盖了诸如swish函数等激活函数,并在上述限制下包括多项式激活函数,从而将先前结果扩展到更广泛的激活函数类。我们进一步证明,当输入维度为一时,上界公式右侧的数值与先前已知的学习系数一致,从而提供了与已知精确结果的有用比较。我们的结果还提供了关于三层神经网络权重参数如何影响学习系数的系统视角。

英文摘要

Three-layer neural networks are known to form singular learning models, and their Bayesian asymptotic behavior is governed by the learning coefficient, or real log canonical threshold. Although this quantity has been clarified for regular models and for some special singular models, broadly applicable methods for evaluating it in neural networks remain limited. Recently, a formula for the local learning coefficient of semiregular models was proposed, yielding an upper bound on the learning coefficient. However, this formula applies only to nonsingular points in the set of realization parameters and cannot be used at singular points. In particular, for three-layer neural networks, the resulting upper bound has been shown to differ substantially from learning coefficient values already known in some cases. In this paper, we derive a formula for an upper bound on local learning coefficients at a class of singular realization parameters in three-layer neural networks. This formula can be interpreted as a counting rule under budget, demand, and supply constraints. In the non-polynomial real-analytic case, the formula applies in general settings, whereas in the polynomial case it applies under the restriction that the true distribution has no hidden units. In particular, our result covers activation functions such as the swish function and also includes polynomial activation functions under the above restriction, thereby extending previous results to a broader class of activation functions. We further show that, when the input dimension is one, the numerical value given by the right-hand side of our upper-bound formula agrees with the previously known learning coefficient, thereby providing a useful comparison with known exact results. Our result also provides a systematic perspective on how the weight parameters of three-layer neural networks affect the learning coefficient.

2605.28057 2026-06-10 cs.LG cs.AI 版本更新

On the Learnability of Test-Time Adaptation: A Recovery Complexity Perspective

测试时自适应可学习性:恢复复杂性视角

Zhi Zhou, Ming Yang, Shi-Yu Tian, Kun-Yang Yu, Lan-Zhe Guo, Yu-Feng Li

发表机构 * Zhi Zhou(周智) Ming Yang(杨明) Shi-Yu Tian(田世宇) Kun-Yang Yu(于坤阳) Lan-Zhe Guo(郭兰哲) Yu-Feng Li(李宇峰)

AI总结 本文提出首个理论框架,通过引入(ε,δ)-恢复复杂性和(ε,ρ)-TTA可学习性,研究非平稳流下测试时自适应的可学习性,并揭示自适应-信息权衡。

Comments Accepted by ICML 2026

详情
AI中文摘要

测试时自适应(TTA)旨在调整模型以在非平稳测试流上保持可靠性能,无需标注数据。尽管其实证成功,非平稳流下TTA的可学习性仍未探索。关键挑战在于缺乏一个原则性的理论框架,该框架同时与TTA目标一致,并捕捉持续演变的分布偏移和内在信息约束。为解决这一差距,我们提出了首个研究TTA可学习性的理论框架,并引入了(ε,δ)-恢复复杂性和(ε,ρ)-TTA可学习性。恢复复杂性衡量偏移后维持超额风险低于目标水平所需的时间(高概率),并进一步扩展到TTA可学习性,衡量TTA的长期可靠性。在该框架内,我们引入了一种新颖的非平稳测试流离散替代,使得对渐变和突变进行统一且易于处理的分析成为可能。我们推导了恢复复杂性的阶匹配下界和上界,揭示了TTA的基本限制和内在的自适应-信息权衡。这些结果为TTA提供了统一的可学习性保证,补充了基于遗憾的分析。

英文摘要

Test-time adaptation (TTA) aims to adapt models to maintain reliable performance on non-stationary test streams without requiring labeled data. Despite its empirical success, the learnability of TTA under non-stationary streams remains unexplored. A key challenge is the lack of a principled theoretical framework that simultaneously aligns with the TTA objective and captures both continuously evolving distribution shifts and intrinsic information constraints. To address this gap, we propose the first theoretical framework for studying the learnability of TTA and introduce $(ε,δ)$-Recovery Complexity and $(ε,ρ)$-TTA Learnability. Recovery complexity measures the post-shift time needed to maintain excess risk below a target level with high probability, and is further extended to TTA learnability, which measures the long-term reliability of TTA. Within this framework, we introduce a novel discrete surrogate for non-stationary test streams, enabling a unified and tractable analysis of both gradual and abrupt shifts. We derive order-wise matching lower and upper bounds on recovery complexity, revealing fundamental limits of TTA and an intrinsic adaptivity-information trade-off. These results provide unified learnability guarantees for TTA that complement regret-based analyses.

2606.04212 2026-06-10 cs.LG stat.ML 版本更新

Edge of Stability Selectively Shapes Learning Across the Data Distribution

稳定性边缘选择性地塑造数据分布上的学习

Shauna Kwag, Anakha Ganesh, Tomaso Poggio, Pierfrancesco Beneventano

发表机构 * MIT(麻省理工学院)

AI总结 本文发现优化中的稳定性边缘(EoS)具有选择性,通过分支干预因果证明了EoS在训练数据子集间重新分配学习,并识别了受益组需满足的两个条件:梯度与Hessian主特征向量对齐,以及梯度幅度持续非零。

Comments ICML HiLD 2026; 27 pages, 22 figures

详情
AI中文摘要

现有对稳定性边缘(EoS)的分析将其视为优化的全局属性。我们表明它也具有选择性:稳定性约束在训练分布的各个子集之间重新分配学习,放大某些组上的进展,同时抑制其他组上的进展。通过从相同训练状态进入或退出EoS regime的分支干预,我们因果地证明了这种权衡,并识别了组受益的两个必要条件。首先,其聚合梯度必须与顶部Hessian特征向量对齐。我们通过一个受控扰动隔离了这一机制,该扰动保持距离但随机化方向,破坏了对齐并消除了优势。其次,该组必须随时间保持非零梯度幅度。在交叉熵损失下,梯度饱和使置信度高的组解耦,将优势转移到输出异常值,后者的梯度持续存在。总之,这些结果表明EoS不仅作为稳定性边界,而且作为控制数据分布上学习分配的机制。

英文摘要

Existing analyses of the edge of stability (EoS) treat it as a global property of optimization. We show that it is also selective: the stability constraint redistributes learning across subsets of the training distribution, amplifying progress on some groups while suppressing progress on others. Using a branching intervention that enters or exits the EoS regime from the same training state, we causally demonstrate this trade-off and identify two necessary conditions for a group to benefit. First, its aggregate gradient must align with the top Hessian eigenvector. We isolate this mechanism with a controlled perturbation that preserves distance but randomizes direction, destroying alignment and eliminating the advantage. Second, the group must sustain non-vanishing gradient magnitude over time. Under cross-entropy loss, gradient saturation decouples confidently classified groups, shifting the advantage to output-outliers, whose gradients persist. Together, these results show that EoS functions not only as a stability boundary, but as a mechanism governing the allocation of learning across the data distribution.

2606.07088 2026-06-10 cs.LG math.OC 版本更新

Residual-Controlled Multiplier Learning for Stochastic Constrained Decision-Making

残差控制乘子学习用于随机约束决策

Kang Liu, Jianchen Hu, Ziyu Qu, Edward Hengzhou Yan, Lun Yang, Meng Zhang

发表机构 * Xi’an Jiaotong University(西安交通大学) Tencent(腾讯) China University of Geosciences(中国地质大学)

AI总结 提出残差控制乘子学习(RCML),通过将乘子更新重构为投影压力反馈,并引入模块化随机稳定组件,解决随机约束决策中原始-对偶方法因小批量噪声导致乘子更新不稳定的问题,实现有限增益收敛和局部KKT残差解释。

详情
AI中文摘要

随机约束决策需要在强制执行统计要求(如安全性或公平性)的同时优化性能目标。然而,标准的原始-对偶方法在随机小批量反馈下难以稳健地更新乘子,因为小批量梯度和约束估计的噪声会直接累积到乘子记忆中。为了解决这个问题,我们提出了残差控制乘子学习(RCML),它将乘子更新重新表述为投影压力反馈。核心思想是将投影乘子分解为用于原始下降的有效压力信号和用于有限增益乘子跟踪的压力记忆残差。为了处理异质和有噪声的观测,我们进一步用模块化随机稳定组件增强这个残差-积分骨干。对于凸-仿射骨干,我们建立了有限增益收敛,推导了小批量反馈下的随机残差界,并表明在非凸问题的正则KKT点附近,残差反馈律具有局部KKT残差解释。在优化、分配和公平排序任务上的实验表明,RCML在保持竞争性目标性能的同时,改善了可行性控制和乘子稳定性。代码可在此处获取。

英文摘要

Stochastic constrained decision-making requires optimizing performance objectives while enforcing statistical requirements such as safety or fairness. However, standard primal--dual methods struggle to update multipliers robustly under stochastic mini-batch feedback, as the noise of mini-batch gradients and constraint estimates can be directly accumulated into the multiplier memory. To address this issue, we propose Residual-Controlled Multiplier Learning (RCML), which reformulates multiplier updating as projected-pressure feedback. The central idea is to decompose the projected multiplier into an effective pressure signal for primal descent and a pressure-memory residual for finite-gain multiplier tracking. To handle heterogeneous and noisy observations, we further augment this residual-integral backbone with modular stochastic stabilization components. For the convex-affine backbone, we establish finite-gain convergence, derive a stochastic residual bound under mini-batch feedback, and show that the residual feedback law admits a local KKT-residual interpretation near regular KKT points of nonconvex problems. Experiments across optimization, allocation, and fair-ranking tasks show that RCML improves feasibility control and multiplier stability while maintaining competitive objective performance. Code is released at https://anonymous.4open.science/r/RCML-3114/.

2503.20272 2026-06-10 stat.ML cs.LG 版本更新

An $(ε,δ)$-accurate level set estimation with a stopping criterion

一个具有停止准则的 $(\epsilon,\delta)$-精确水平集估计

Hideaki Ishibashi, Kota Matsui, Kentaro Kutsukake, Hideitsu Hino

发表机构 * Kyushu Institute of Technology(九州工业技术大学) Nagoya University / RIKEN AIP(名古屋大学 / RIKEN AIP) The Institute of Statistical Mathematics/ RIKEN AIP(统计数学研究所 / RIKEN AIP)

AI总结 提出一种带停止准则的水平集估计获取策略,理论上证明满足 $\epsilon$-精确度和 $1-\delta$ 置信水平,减少不必要的函数评估,实验验证了其有效性。

详情
AI中文摘要

水平集估计问题旨在识别候选点集内未知且评估代价高昂的函数值超过指定阈值的区域,为全面评估函数值提供了一种高效替代方案。传统方法通常采用序列优化策略来寻找 $\epsilon$-精确解,该解允许在阈值轮廓周围留有余量,但往往缺乏有效的停止准则,导致过度探索和效率低下。本文引入了一种带有停止准则的水平集估计获取策略,确保算法在进一步探索不太可能带来改进时停止,从而减少不必要的函数评估。我们从理论上证明,该方法在 $1-\delta$ 的置信水平下满足 $\epsilon$-精确度,弥补了现有方法的一个关键空白。此外,我们表明这还带来了对 F-score 等性能指标下限的保证。数值实验表明,所提出的获取函数在达到与现有方法相当的精确度的同时,确认了停止准则在充分探索后有效终止算法。

英文摘要

The level set estimation problem seeks to identify regions within a set of candidate points where an unknown and costly to evaluate function's value exceeds a specified threshold, providing an efficient alternative to exhaustive evaluations of function values. Traditional methods often use sequential optimization strategies to find $ε$-accurate solutions, which permit a margin around the threshold contour but frequently lack effective stopping criteria, leading to excessive exploration and inefficiencies. This paper introduces an acquisition strategy for level set estimation that incorporates a stopping criterion, ensuring the algorithm halts when further exploration is unlikely to yield improvements, thereby reducing unnecessary function evaluations. We theoretically prove that our method satisfies $ε$-accuracy with a confidence level of $1 - δ$, addressing a key gap in existing approaches. Furthermore, we show that this also leads to guarantees on the lower bounds of performance metrics such as F-score. Numerical experiments demonstrate that the proposed acquisition function achieves comparable precision to existing methods while confirming that the stopping criterion effectively terminates the algorithm once adequate exploration is completed.

2509.17251 2026-06-10 stat.ML cs.LG 版本更新

Risk Comparisons in Linear Regression: Implicit Regularization Dominates Explicit Regularization

线性回归中的风险比较:隐式正则化主导显式正则化

Jingfeng Wu, Peter L. Bartlett, Sham M. Kakade, Jason D. Lee, Bin Yu

发表机构 * University of California, Berkeley(加州大学伯克利分校) Alphabetical order Harvard University(哈佛大学) Google DeepMind(谷歌DeepMind)

AI总结 本文通过实例比较线性回归中梯度下降、岭回归和随机梯度下降的有限样本风险,发现梯度下降优于岭回归,但与随机梯度下降不可比,且在某些问题中梯度下降可能更差。

Comments Accepted for presentation at the Conference on Learning Theory (COLT) 2026

详情
AI中文摘要

现有理论表明,对于按容量和源条件分类的线性回归问题,梯度下降(GD)始终是极小化最优的,而岭回归和在线随机梯度下降(SGD)对于某些类别的问题则是多项式次优的。超越极小化理论,本文为任何良好设定的线性回归问题提供了这些算法有限样本风险的实例比较。我们的分析得出三个关键发现。首先,GD 优于岭回归:在可比较的正则化下,GD 的过剩风险始终在岭回归的一个常数因子内,但即使经过最优调整,岭回归也可能多项式地更差。其次,GD 与 SGD 不可比。虽然已知对于某些问题 GD 可以多项式地优于 SGD,但反之亦然:我们受良性过拟合理论启发构造了问题,其中最优停止的 GD 多项式地更差。最后,对于一类重要子问题——具有快速且连续衰减协方差谱的问题,GD 优于 SGD,这包括所有满足标准容量条件的问题。

英文摘要

Existing theory suggests that for linear regression problems categorized by capacity and source conditions, gradient descent (GD) is always minimax optimal, while both ridge regression and online stochastic gradient descent (SGD) are polynomially suboptimal for certain categories of such problems. Moving beyond minimax theory, this work provides instance-wise comparisons of the finite-sample risks for these algorithms on any well-specified linear regression problem. Our analysis yields three key findings. First, GD dominates ridge regression: with comparable regularization, the excess risk of GD is always within a constant factor of that of ridge, but ridge can be polynomially worse even when tuned optimally. Second, GD is incomparable with SGD. While it is known that for certain problems GD can be polynomially better than SGD, the reverse is also true: we construct problems, inspired by benign overfitting theory, where optimally stopped GD is polynomially worse. Finally, GD dominates SGD for a significant subclass of problems -- those with fast and continuously decaying covariance spectra -- which includes all problems satisfying the standard capacity condition.

2511.22331 2026-06-10 math.OC cs.AI cs.LG 版本更新

On the Condition Number Dependency in Bilevel Optimization

关于双层优化中条件数依赖性的研究

Lesi Chen, Jingzhao Zhang

发表机构 * IIIS, Tsinghua University(清华大学信息学院)

AI总结 本文针对非凸上层、强凸下层的双层优化问题,建立了条件数依赖性的下界,揭示了双层与极小极大优化在条件数依赖上的首次可证明差距。

Comments This new version improves deterministic lower bounds in v1

详情
AI中文摘要

双层优化最小化一个由上层问题定义的目标函数,其可行域是下层问题的解集。我们研究当上层问题非凸、下层问题强凸时,使用一阶方法寻找 $\epsilon$-稳定点的 oracle 复杂度。近期工作 (Ji et al., ICML 2021; Arbel and Mairal, ICLR 2022; Chen et al., JMLR 2025) 达到了 $\tilde{\mathcal{O}}(\bar \kappa_y^4 \epsilon^{-2})$ 的上界,在 $\epsilon$ 上接近最优,通过在内循环中朴素应用 Nesterov 加速可降至 $\tilde{\mathcal{O}}(\bar \kappa_y^{7/2} \epsilon^{-2})$,其中 $\bar \kappa_y$ 是全局条件数。然而,条件数的最优依赖性未知。本文建立了新的 $\Omega(\kappa_y^{5/2} \epsilon^{-2})$ 下界,其中 $\kappa_y < \bar \kappa_y$ 是下层条件数,当光滑常数为 $\mathcal{O}(1)$ 时与 $\bar \kappa_y$ 同阶。我们的下界首次证明了在此设定下双层问题与极小极大优化在条件数依赖性上的可证明差距。下界可推广到多种设置,包括高阶光滑函数、随机 oracle 和凸超目标:(1) 对于二阶和任意光滑问题,我们分别给出 $\Omega({\kappa_y^{31/14}} \epsilon^{-12/7})$ 和 $\Omega(\kappa_y^{21/10} \epsilon^{-8/5})$ 的下界。(2) 对于凸-强凸问题,我们将先前最佳下界 (Ji and Liang, JMLR 2022) 从 $\Omega(\kappa_y /\sqrt{\epsilon})$ 改进为 $\Omega(\kappa_y^{3/2} / \sqrt{\epsilon})$。(3) 对于光滑随机问题,我们也给出 $\Omega(\kappa_y^4 \epsilon^{-4})$ 的下界。

英文摘要

Bilevel optimization minimizes an objective function, defined by an upper-level problem whose feasible region is the solution of a lower-level problem. We study the oracle complexity of finding an $ε$-stationary point with first-order methods when the upper-level problem is nonconvex, and the lower-level problem is strongly convex. Recent works (Ji et al., ICML 2021; Arbel and Mairal, ICLR 2022; Chen et al., JMLR 2025) achieve a $\tilde{\mathcal{O}}(\bar κ_y^4 ε^{-2})$ upper bound that is near-optimal in $ε$, which can be reduced to $\tilde{\mathcal{O}}(\bar κ_y^{7/2} ε^{-2})$ by a naive application of Nesterov acceleration in the inner loop, where $\bar κ_y$ is the global condition number. However, the optimal dependency on the condition number is unknown. In this work, we establish a new $Ω(κ_y^{5/2} ε^{-2})$ lower bound, where $κ_y < \bar κ_y$ is the lower-level condition number that is of the same order as $\bar κ_y$ when the smoothness constants are $\mathcal{O}(1)$. Our lower bound establishes the first provable gap in terms of condition number dependency between bilevel problems and minimax problems in this setup. Our lower bounds can be extended to various settings, including high-order smooth functions, stochastic oracles, and convex hyper-objectives: (1) For second-order and arbitrarily smooth problems, we show lower bounds of $Ω({κ_y^{31/14}} ε^{-12/7})$ and $Ω(κ_y^{21/10} ε^{-8/5})$, respectively. (2) For convex-strongly-convex problems, we improve the previously best lower bound (Ji and Liang, JMLR 2022) from $Ω(κ_y /\sqrtε)$ to $Ω(κ_y^{3/2} / \sqrtε)$. (3) For smooth stochastic problems, we also show a lower bound of $Ω(κ_y^4 ε^{-4})$.

2601.20970 2026-06-10 math.OC cs.IT cs.LG math.IT 版本更新

The hyper-scaled NLP bound for maximum-entropy remote sampling

最大熵远程采样的超缩放NLP界

Gabriel Ponte, Marcia Fampa, Jon Lee

发表机构 * University of Michigan(密歇根大学) Universidade Federal do Rio de Janeiro(里约热内卢联邦大学)

AI总结 针对最大熵远程采样问题,提出基于凸松弛的超缩放NLP界,理论上优于传统互补NLP界,并适用于秩亏协方差矩阵,通过数值实验验证了算法先进性。

详情
AI中文摘要

最大熵远程采样问题(MERSP)是从$n$个随机变量中选择一个包含$s$个随机变量的子集,以最大化关于一组不可直接观测的目标随机变量的信息。我们假设所有这些随机变量服从联合高斯分布,并且协方差矩阵已知。最后,我们使用香农微分熵来度量信息。解决中等规模MERSP实例精确解的主要方法是分支定界(B\&B),因此先前的工作集中于上界。在我们的工作之前,有两种MERSP的上界方法:所谓的“互补NLP界”和“谱界”,两者都是25年前提出的。我们现在能够建立这两个上界之间的支配关系。此外,我们基于一个微妙的凸松弛,提出了一种新颖有效的“超缩放NLP界”(hNLP界)。MERSP的“互补”版本hNLP界推广了先前MERSP的互补NLP界。我们提供了理论保证,给出了互补hNLP界严格支配互补NLP界的充分条件。此外,hNLP公式允许我们在秩亏协方差矩阵满足技术条件时推导其上界。这与先前的NLP界仅适用于正定协方差矩阵(因为它依赖于互补公式)形成对比。此外,我们描述了计算超缩放参数的步骤。最后,对于B\&B,我们提供了一种变量固定方法和结果,指导构建子问题的最佳方式。在基准实例上的数值实验证明了我们的方法在推进MERSP算法前沿方面的有效性。

英文摘要

The maximum-entropy remote sampling problem (MERSP) is to select a subset of $s$ random variables from a set of $n$ random variables, so as to maximize the information concerning a set of target random variables that are not directly observable. We assume that the set of all of these random variables follows a joint Gaussian distribution, and that we have the covariance matrix available. Finally, we measure information using Shannon's differential entropy. The main approach for exact solution of moderate-sized instances of MERSP has been branch-and-bound (B\&B), and so previous work concentrated on upper bounds. Prior to our work, there were two upper-bounding methods for MERSP: the so-called ``complementary NLP bound'' and the ``spectral bound'', both introduced 25 years ago. We are able now to establish domination results between these two upper bounds. Further, we propose a novel and effective ``hyper-scaled NLP bound'' (hNLP bound) based on a subtle convex relaxation. The ``complementary'' version of hNLP bound for MERSP generalizes the previous complementary NLP bound for MERSP. We provide theoretical guarantees, giving sufficient conditions under which the complementary hNLP bound strictly dominates the complementary NLP bound. In addition, the hNLP formulation allows us to derive upper bounds for rank-deficient covariance matrices when they satisfy a technical condition. This is in contrast to the previous NLP bound that worked with only positive definite covariance matrices (because it was wedded to a complementary formulation). Additionally, we describe procedures for calculating hyper-scaling parameters. Finally, for B\&B, we provide a variable-fixing methodology and results guiding the best way to construct subproblems. Numerical experiments on benchmark instances demonstrate the effectiveness of our approaches in advancing the algorithmic state-of-the-art for MERSP.

2603.02673 2026-06-10 stat.ML cs.LG 版本更新

Exact Functional ANOVA Decomposition for Categorical Inputs Models

类别输入模型的精确函数ANOVA分解

Baptiste Ferrere, Nicolas Bousquet, Fabrice Gamboa, Jean-Michel Loubes, Joseph Muré

发表机构 * Institut de Mathématiques de Toulouse(图卢兹数学研究所) Université de Toulouse(图卢兹大学) CNRS(国家科学研究中心)

AI总结 针对类别输入模型,提出一种无需假设的闭式函数ANOVA分解方法,高效处理任意依赖结构,并自然推广SHAP值。

详情
AI中文摘要

函数ANOVA通过将模型预测分解为主效应和高阶交互,为可解释性提供了原则性框架。对于独立特征,该分解定义明确,与SHAP值紧密相关,并作为加性可解释性的基石。然而,对于一般依赖分布,缺乏显式闭式表达式迫使实践者依赖昂贵的基于采样的近似。我们完全解决了类别输入的这一限制。通过将函数分析与离散傅里叶分析的扩展相结合,我们在没有任何假设的情况下推导出闭式分解。我们的公式计算效率非常高。它无缝地恢复了经典独立情况,并扩展到任意依赖结构,包括具有非矩形支撑的分布。此外,利用SHAP与ANOVA在独立性下的内在联系,我们的框架为一般类别设置提供了SHAP值的自然推广。

英文摘要

Functional ANOVA offers a principled framework for interpretability by decomposing a model's prediction into main effects and higher-order interactions. For independent features, this decomposition is well-defined, strongly linked with SHAP values, and serves as a cornerstone of additive explainability. However, the lack of an explicit closed-form expression for general dependent distributions has forced practitioners to rely on costly sampling-based approximations. We completely resolve this limitation for categorical inputs. By bridging functional analysis with the extension of discrete Fourier analysis, we derive a closed-form decomposition without any assumption. Our formulation is computationally very efficient. It seamlessly recovers the classical independent case and extends to arbitrary dependence structures, including distributions with non-rectangular support. Furthermore, leveraging the intrinsic link between SHAP and ANOVA under independence, our framework yields a natural generalization of SHAP values for the general categorical setting.

2605.17189 2026-06-10 stat.ML cs.IT cs.LG math.IT math.ST stat.TH 版本更新

Sample-efficient inductive matrix completion with noise and inexact side-information

具有噪声和不精确侧信息的样本高效归纳矩阵补全

Yuepeng Yang, Cong Ma

发表机构 * Yale Department of Statistics and Data Sciences, Yale University(耶鲁大学统计与数据科学系) UChicago Department of Statistics, University of Chicago(芝加哥大学统计系)

AI总结 本文研究了在存在噪声和不精确侧信息的情况下,通过非凸投影梯度下降算法实现样本高效的归纳矩阵补全,提出了一个适用于有效问题规模的正则性条件,实现了线性收敛和估计误差仅依赖于有效问题规模的结论。

详情
AI中文摘要

低秩矩阵补全是一个广泛研究的问题,具有许多变体。归纳矩阵补全(IMC)结合了行和列的侧信息以显著缩小搜索空间。先前的工作分为两个领域:利用这种结构实现减少样本复杂度的方法,但仅适用于无噪声环境;以及处理噪声但需要样本复杂度与环境矩阵维度相匹配的方法,从而放弃了侧信息应提供的样本效率。在本文中,我们通过研究具有噪声的IMC并使用非凸投影梯度下降算法进行谱初始化来填补这一差距。我们的主要技术贡献是建立一个适用于由有效问题规模决定的减少样本复杂度的IMC损失函数的正则性条件,其规模与侧信息维度而非环境维度成比例。这直接导致了线性收敛和估计误差仅依赖于有效问题规模而非环境矩阵维度。我们进一步将分析扩展到不精确侧信息设置,证明减少的样本复杂度得以保持,并且估计误差在不精确性方面是最佳的。广泛的模拟和在MovieLens数据集上的实际实验验证了我们的理论发现。

英文摘要

Inductive matrix completion (IMC) is a variant of low-rank matrix completion that incorporates row and column side-information. In principle, it can reduce the effective dimension of the recovery problem from the ambient matrix size to the dimension of the side-information features. Existing theory, however, does not fully realize this advantage in the noisy setting: sample-efficient guarantees only apply to noiseless recovery, while noisy guarantees require sample sizes comparable to ordinary matrix completion. This paper closes this gap for noisy IMC. We analyze a nonconvex projected gradient descent algorithm with spectral initialization and prove that, under exact side-information, it achieves linear convergence and stable recovery at a sample complexity governed by the effective side-information dimension rather than the ambient matrix dimension. The key technical ingredient is a local regularity condition for the IMC loss that holds at this reduced sample size, despite the mismatch between the observation pattern and the side-information subspaces. We further extend the analysis to inexact side-information, showing that the same reduced sample complexity is preserved and that the estimation error degrades optimally with the level of subspace misspecification. Motivated by this trade-off, we also propose a penalized interpolation between IMC and ordinary matrix completion that balances sample efficiency against robustness to imperfect side-information. Simulations and experiments on the MovieLens dataset support the theoretical findings and illustrate the practical benefits of exploiting side-information in low-sample regimes.

2606.08799 2026-06-10 stat.ML cs.LG 版本更新

Generalization in Nonlinear Least Squares via Learned Feature Geometry

非线性最小二乘中基于学习特征几何的泛化性

Ayub Kharel, Ilja Kuzborskij, Patrick Rebeschini, Yasin Abbasi-Yadkori

发表机构 * University of Oxford(牛津大学) Google DeepMind(谷歌DeepMind) Sapient Intelligence(智睿科技)

AI总结 通过算法稳定性分析岭正则化非线性最小二乘的泛化误差,利用经验雅可比Gram矩阵和残差曲率项定义数据依赖的有效维度,并证明其与内在维度而非参数数量相关。

Comments Preprint, under review

详情
AI中文摘要

我们通过平均算法稳定性研究了岭正则化非线性最小二乘模型的泛化性,推导了局部极小值点的误差界,该误差界依赖于数据依赖的有效维度,该维度通过经验雅可比Gram矩阵和残差-曲率项反映了训练参数处梯度模型的几何结构。在线性情况下,曲率项消失,这恢复了雅可比核协方差的经典有效维度,但评估的是训练后的模型而非初始化时的模型(如神经正切核分析中常见)。我们进一步通过梯度特征的覆盖复杂度来界定该有效维度,从而得到依赖于学习几何而非参数数量的保证。特别地,对于流形支持的数据和分段Lipschitz雅可比矩阵,界限随内在维度缩放;而对于单隐层ReLU网络,该机制可通过激活稳定区域的数量显式表达。在合成流形、聚类分布和基准数据集上的实验展示了训练后雅可比矩阵的压缩、残差-曲率线性化的紧致性,以及稳定性界限与观测泛化差距的一致性。我们界限的一个关键特征是推导的简洁性,它基于强对数凹噪声下的Brascamp-Lieb不等式从第一性原理得出。

英文摘要

We study the generalization of ridge-regularized nonlinear least-squares models via on-average algorithmic stability, deriving error bounds for local minimizers in terms of a data-dependent effective dimension that reflects the geometry of the gradient model at the trained parameters, through the empirical Jacobian Gram matrix and a residual-curvature term. In the linear case, where the curvature term vanishes, this recovers the classical effective dimension of the Jacobian kernel covariance, but evaluated at the trained model rather than at initialization as is typical in neural tangent kernel analyses. We further bound this effective dimension via covering complexity of the gradient features, leading to guarantees that depend on learned geometry rather than parameter count. In particular, for manifold-supported data and piecewise Lipschitz Jacobians, the bounds scale with intrinsic dimension, while for one-hidden-layer ReLU networks, the mechanism can be made explicit through counts of activation-stable regions. Experiments on synthetic manifolds, clustered distributions, and benchmark datasets illustrate trained-Jacobian compression, the tightness of the residual-curvature linearization, and agreement between the stability bound and observed generalization gaps. A key feature of our bounds is the simplicity of their derivation, which follows from first principles using the Brascamp-Lieb inequality under strongly log-concave noise.

6. 高效学习、压缩与部署 27 篇

2606.09877 2026-06-10 cs.LG cs.CE cs.CL 新提交

Streaming Knowledge Compilation: Proactive Materiality-Scored Pinning for Time-Evolving LLM Wikis

流式知识编译:面向时变LLM维基的主动物质性评分固定

Juan M. Huerta

发表机构 * Zinnia Tech Solutions(Zinnia科技解决方案)

AI总结 提出流式知识编译框架,通过物质性信号φ_t主动固定重要文档,在金融和维基百科领域验证O(√T log K)遗憾界,并揭示LLM评判偏差。

详情
AI中文摘要

LLM维基系统将知识编译为预填充的KV缓存以实现高效推理,但假设语料库是静态的——当底层信息格局演变时,这一假设失效。我们形式化流式知识编译:给定文档流、固定令牌预算以及在摄取时未知的未来查询,维护一个编译后的维基,使其相对于具有完美预见力的离线oracle的累积遗憾最小化。关键洞察是物质性信号φ_t(k,n)∈[0,1],它对时间t实体k的文档重要性进行评分,作为查询相关性的代理,在查询到达前主动固定;我们证明O(√T log K)遗憾界,其中ε=E[|φ_t-φ̂_t|]是唯一的领域特定量。我们在两个领域实例化:金融领域,其中φ_t是由冻结的Llama 3.1 8B分类头预测的异常股票波动率(在76K篇文章上AUROC=0.728,严格时间分割;预测为物质性的文章实现1.49倍更高的实际远期波动率);以及维基百科领域,其中φ_t是异常编辑比率(AER),一种横截面标准化的编辑速度——表明同一算法可泛化到金融领域之外。在173个匹配对(金融)和119个(维基百科)上的端到端QA评估揭示了训练后知识上普遍的LLM-as-judge混淆,确立了遗憾分析——而非绝对QA分数——是编译知识系统的可靠评估指标。金融累积遗憾收敛至-20.0(-0.12/步);维基百科收敛至+16.0(+0.13/步),正号确认维基百科编辑内容确实是训练后的——更丰富的上下文持续提高分数(无维基3.80 vs. Oracle 4.74)——并消除了这一混淆。O(√T log K)保证适用于任何知识差距可从流信号预测的领域。

英文摘要

LLM wiki systems compile knowledge into pre-filled KV caches for efficient inference, but assume a static corpus -- an assumption that fails whenever the underlying information landscape evolves. We formalize Streaming Knowledge Compilation: given a document stream, a fixed token budget, and future queries unknown at ingestion time, maintain a compiled wiki that minimizes cumulative regret against an offline oracle with perfect foresight. The enabling insight is a materiality signal $ϕ_t(k,n)\in[0,1]$ that scores document importance for entity $k$ at time $t$, acting as a query-relevance surrogate for proactive pinning before queries arrive; we prove an $O(\sqrt{T\log K})$ regret bound where $\varepsilon=\mathbb{E}[|ϕ_t-\hatϕ_t|]$ is the only domain-specific quantity. We instantiate in two domains: finance, where $ϕ_t$ is abnormal stock volatility predicted by frozen Llama 3.1 8B classification head (AUROC = 0.728 on 76K articles, strict temporal split; $1.49\times$ higher realized forward volatility for predicted-material articles); and Wikipedia, where $ϕ_t$ is the Abnormal Edit Ratio (AER), a cross-sectionally normalized edit velocity -- showing the same algorithm generalizes beyond the finance domain. End-to-end QA evaluation on 173 matched pairs (finance) and 119 (Wikipedia) reveals a pervasive LLM-as-judge confound on post-training knowledge, establishing that regret analysis -- not absolute QA scores -- is the reliable evaluation metric for compiled knowledge systems. Finance cumulative regret converges to -20.0 (-0.12/step); Wikipedia to +16.0 (+0.13/step), with the positive sign confirming that Wikipedia edit content is genuinely post-training -- richer context consistently improves scores (No Wiki 3.80 vs. Oracle 4.74) -- and eliminates this confound. The $O(\sqrt{T\log K})$ guarantee applies to any domain where knowledge gaps can be predicted from streaming signals.

2606.09879 2026-06-10 cs.LG 新提交

Operator Fusion for LLM Inference on the Tensix Architecture

面向Tensix架构的LLM推理算子融合

Qingbo Wu, Ke Li, Wenzhu Wang, Jie Yu, Ruian Zhang, Lili Liu

发表机构 * KylinSoft Co., Ltd, Tianjin, China(KylinSoft有限公司,天津,中国) openKylin Community, Tianjin, China(openKylin社区,天津,中国) College of Computer Science and Technology, National University of Defense Technology, Changsha, China(计算机科学与技术学院,国防科技大学,长沙,中国)

AI总结 针对Tensix架构的LLM推理瓶颈,提出RMSNorm与矩阵乘法融合的算子融合策略,利用片上SRAM和NoC多播减少DRAM读写与调度开销,在Wormhole平台上实现注意力延迟降低37.44%、MLP延迟降低15.89%,且数值一致性保持98.75%以上。

Comments 11 pages, 5 figures

详情
AI中文摘要

本研究解决了Tenstorrent Tensix架构上Transformer模型的设备端推理瓶颈,并提出了一种增强数据局部性的算子融合策略。RMSNorm与自注意力和FFN中的矩阵乘法融合,使得内存受限和计算受限的算子能够在片上SRAM中连续执行,从而显著减少中间结果的DRAM读写次数和调度开销。为了支持多核并行,利用基于NoC的多播机制,其中行/列主节点高效地将输入和权重分发到核心网格,缓解DRAM带宽争用。在Wormhole平台上使用Qwen2.5-0.5B、Qwen3-0.6B和Qwen3-4B进行的实验显示,注意力延迟降低高达37.44%,MLP延迟降低15.89%,每解码层延迟降低高达7.91%,同时皮尔逊相关系数(PCC)保持在98.75%以上,证实了在数值一致性下显著的端到端效率提升。

英文摘要

This study addresses on-device inference bottlenecks of Transformer models on Tenstorrent's Tensix architecture and proposes an operator fusion strategy that enhances data locality. RMSNorm is fused with matrix multiplication in self-attention and in the FFN, enabling back-to-back execution of memory-bound and compute-bound operators in on-chip SRAM to significantly reduce DRAM reads/writes of intermediate results and scheduling overhead. To support multi-core parallelism, a NoC-based multicast mechanism is leveraged in which row/column master nodes efficiently distribute inputs and weights across the core mesh, alleviating DRAM bandwidth contention. Experiments on the Wormhole platform with Qwen2.5-0.5B, Qwen3-0.6B, and Qwen3-4B show up to 37.44% latency reduction for attention and 15.89% for MLP, with up to 7.91% reduction per decoder layer, while Pearson Correlation Coefficient (PCC) remains above 98.75%, confirming significant end-to-end efficiency gains under numerical consistency.

2606.09885 2026-06-10 cs.LG stat.ML 新提交

TENP: Trapezoidal Expert Neuron Pruning For Mixture-of-Experts

TENP:用于混合专家的梯形专家神经元剪枝

Jiangyang He, Shaolin Zhu, Deyi Xiong

发表机构 * TJUNLP Lab, School of Computer Science and Technology, Tianjin University(天津大学计算机科学与技术学院 TJUNLP实验室)

AI总结 提出TENP框架,通过识别重要专家并对其余专家进行神经元剪枝,保留梯形参数模式,在40%路由专家稀疏度和平均63.76%激活参数下,DeepSeek模型准确率仅下降1点,代码生成任务提升10%。

详情
AI中文摘要

混合专家大语言模型通过稀疏激活实现高效扩展,但其部署受到专家大量静态参数占用的根本限制。现有压缩方法要么移除整个专家,破坏路由拓扑并损害性能,要么依赖非结构化权重剪枝,实际效率有限。为解决这些局限,我们提出TENP,一种结构化的梯形专家神经元剪枝框架。使用少量样本,我们识别并保留重要专家,同时对次要专家应用专家神经元剪枝(ENP),从浅层到深层以梯形模式保留模型参数。在评估专家重要性时,我们联合考虑专家输出的幅度及其改变输入向量方向的能力。对于ENP,我们测量每个神经元对专家输出的投影贡献,以识别并保留重要神经元。我们在Qwen和DeepSeek模型上进行了广泛实验。在路由专家稀疏度为40%且平均激活63.76%专家参数的情况下,DeepSeek模型相比全参数模型准确率仅下降1点。此外,在代码生成任务上,它比全参数模型提升10%。

英文摘要

Mixture-of-Experts large language models (LLMs) scale efficiently through sparse activation, yet their deployment is fundamentally constrained by the large static parameter footprint of experts. Existing compression approaches either remove entire experts, disrupting routing topology and harming performance, or rely on unstructured weight pruning with limited practical efficiency. To address the limitations, we propose TENP, a structured Trapezoidal ExpertNeuron Pruning framework. Using a few samples, we identify and retain important experts, while applying expert neuron pruning (ENP) to less important experts, reserving model parameters in a trapezoidal pattern from shallow to deep layers. When evaluating expert importance, we jointly consider both the magnitude of the expert output and its ability to change the direction of the input vector. For ENP, we measure each neuron's projected contribution to the expert output to identify and retain important neurons. We conduct extensive experiments on the Qwen and DeepSeek models. Under a routing expert sparsity of 40% and an average of 63.76% activated expert parameters, the DeepSeek model suffers only a 1-point drop in accuracy compared to the full-parameter model. Moreover, it outperforms the full-parameter model by 10% on code generation tasks.

2606.09886 2026-06-10 cs.LG cs.AI 新提交

SHAPE: Coalition-Aware Expert Pruning for Sparse Mixture-of-Experts LLMs

SHAPE: 面向稀疏混合专家大语言模型的联盟感知专家剪枝

Yuhao Zhang

发表机构 * Beihang University(北航大学)

AI总结 提出SHAPE框架,通过合作博弈论建模专家间协作,利用Shapley值识别高贡献专家,结合质量覆盖选择规则在剪枝预算下保留关键专家,实验表明在多种MoE模型上提升鲁棒性并降低显存。

详情
AI中文摘要

稀疏混合专家(MoE)大语言模型以低每token计算量实现了高质量,但其部署常受限于内存墙:必须保留全部专家池以支持依赖token的路由。专家剪枝是一种直接解决方案,但先前的标准通常独立评估专家,忽略了MoE推理本质上是“联盟性”的,即输出由路由到的top-$k$专家组合产生。我们提出\textbf{SHAPE},一个任务驱动的剪枝框架,显式建模\textit{层内}专家协作。SHAPE将小校准集上的路由轨迹建模为经验合作博弈,并通过基于观察到的top-$k$联盟的Shapley式归因分配交互感知的专家价值,从而识别对高效用协作至关重要的专家,而不仅仅是频繁出现的专家。为了在全局剪枝预算下保持MoE拓扑,SHAPE进一步引入\textit{质量-覆盖}选择规则,在每层保留覆盖非负Shapley质量$\alpha$分数的最小专家子集,同时使用二分法匹配目标保留率。在三个现代MoE骨干网络(Qwen3-30B-A3B、GPT-OSS-20B和DeepSeek-V2-Lite)上的多个基准实验表明,SHAPE在20%和40%专家剪枝下,相比全局和逐层剪枝变体一致地提升了鲁棒性,无需额外训练即保持竞争性精度,并显著降低了峰值GPU内存占用。开源代码见此https URL。

英文摘要

Sparse Mixture-of-Experts (MoE) large language models achieve strong quality with low per-token compute, yet their deployment is often limited by the memory wall: the full expert pool must remain resident to support token-dependent routing. Expert pruning is a direct remedy, but prior criteria typically score experts independently and overlook that MoE inference is inherently \emph{coalitional}, where outputs arise from routed top-$k$ expert combinations. We propose \textbf{SHAPE}, a task-driven pruning framework that explicitly models \emph{intra-layer} expert cooperation. SHAPE formulates routing traces on a small calibration set as an empirical cooperative game and assigns interaction-aware expert values via a Shapley-style attribution over observed top-$k$ coalitions, enabling the identification of experts that are essential for high-utility collaborations rather than merely frequent. To preserve MoE topology under a global pruning budget, SHAPE further introduces a \emph{quality-coverage} selection rule that retains, in each layer, the minimal expert subset covering an $α$ fraction of non-negative Shapley mass, while using bisection to match a target keep rate. Experiments on three modern MoE backbones (Qwen3-30B-A3B, GPT-OSS-20B, and DeepSeek-V2-Lite) across diverse benchmarks show that SHAPE consistently improves robustness over global and layer-wise pruning variants, maintaining competitive accuracy under 20\% and 40\% expert pruning without additional training and delivering clear reductions in peak GPU memory footprint. The open-source code is available at https://github.com/Alizen-1009/Shapley-Moe.

2606.09916 2026-06-10 cs.LG cs.AI 新提交

IntentKV: Cross-Turn Intent-Aware KV Cache Pruning for Agent Inference

IntentKV: 面向Agent推理的跨轮次意图感知KV缓存剪枝

Junjie Li, Jiong Lou, Jie Li

AI总结 针对多轮LLM Agent中KV缓存成为服务瓶颈的问题,提出IntentKV方法,通过会话级QueryMemory和残差注意力头实现跨轮次意图感知的KV剪枝,在保持精度的同时大幅降低峰值请求token和KV读取量。

详情
AI中文摘要

多轮LLM Agent将短查询扩展为包含工具调用、搜索结果和中间推理的长轨迹。在单条轨迹中,KV内存和KV读取带宽增长数个数量级,使得键值(KV)缓存(而非参数计算)成为长时Agent的主要服务瓶颈。我们提出IntentKV,一种学习型KV剪枝方法,保持基础LLM冻结。IntentKV维护一个会话级的跨轮次意图QueryMemory,通过记忆-注意力规则对实时历史token进行评分,并添加一个零初始化的残差注意力头,对当前查询的K向量进行交叉注意力。为了与前缀缓存保持可组合性,驱逐采用槽位映射重定向:被丢弃的位置路由到一个哨兵死槽,而存活的K/V行、RoPE相位和槽位标识保持不变。在严格的KV预算下,IntentKV与无剪枝的全缓存基线相比几乎没有精度下降:在8k KV预算下,Qwen3-8B的平均峰值请求token下降23.9%,Qwen2.5-14B下降30.7%。在Qwen2.5-14B上所有方法都能完成的100个最长BCP查询中,IntentKV-8k进一步将最坏情况下的峰值请求token从92.3k降至20.5k(减少77.8%),最坏情况下的原始KV读取从4.11亿降至3100万(减少92.6%)。

英文摘要

Multi-turn LLM agents fan short queries into long trajectories of tool calls, search results, and intermediate reasoning. Both KV memory and KV read bandwidth grow by orders of magnitude across a single trajectory, making the key-value (KV) cache, not parameter compute, the dominant serving bottleneck for long-horizon agents. We introduce IntentKV, learned KV pruning that keeps the base LLM frozen. IntentKV maintains a session-level QueryMemory of cross-turn intent, scores live history tokens with a memory-attention rule, and adds a zero-initialized residual head with cross-attention over current-query K-vectors. To stay composable with prefix caches, eviction is a slot-map redirection: dropped positions route to a sentinel dead slot while surviving K/V rows, RoPE phases, and slot identities stay in place. IntentKV matches the no-pruning full-cache baseline with almost no accuracy drop under tight KV budgets: at an 8k KV budget, mean peak request tokens drop 23.9% on Qwen3-8B and 30.7% on Qwen2.5-14B. On the 100 longest BCP queries that all methods complete on Qwen2.5-14B, IntentKV-8k further cuts worst-case peak request tokens from 92.3k to 20.5k, a 77.8% reduction, and worst-case raw KV reads from 411M to 31M, a 92.6% reduction.

2606.09924 2026-06-10 cs.LG cs.AI 新提交

Sigma-Branch: Hierarchical Single-Path Network Reconstruction for Dynamic Inference with Reduced Active Parameters

Sigma-Branch: 用于动态推理的分层单路径网络重构,减少活跃参数

Kohga Tanaka, Hiroaki Nishi

AI总结 提出Sigma-Branch框架,通过分层二叉树结构将预训练密集网络重构为共享主干、分层路由器和专用叶子,利用激活聚类初始化并微调,推理时仅执行单一路径,在CIFAR-100/ResNet-50等任务上减少58-60%活跃参数,性能损失小于1.72个百分点。

详情
AI中文摘要

在内存受限的边缘加速器上部署深度神经网络,瓶颈在于每次推理的片外权重传输而非计算:密集网络无法保留在芯片上,每个输入都必须加载所有参数。现有模型压缩仅在永久容量损失代价下减少这种传输。我们提出Sigma-Branch (SigmaB),一个将预训练密集网络重构为分层二叉树的框架,该树由共享主干、分层路由器和专用叶子组成。预训练权重通过基于激活的球形k-means聚类分布在树中,该聚类联合初始化路由器权重和每分支通道分配;然后通过软路由微调使每个叶子与其路由输入子集对齐。在推理时,所得网络仅执行一条根到叶路径,减少活跃参数占用,同时将完整密集参数集存储在内存中。在CIFAR-100 / ResNet-50、ImageNet-1K / ResNet-50和ModelNet40 / PointNet++上,SigmaB-Net将每次推理的活跃参数减少58-60%,同时与密集基线Top-1相比误差在1.72个百分点以内。在可比的ImageNet-1K Top-1下,活跃参数减少超过静态结构化剪枝(FPGM、HRank)14-23个百分点。跨模态评估涵盖2D视觉和3D点云骨干网络,证实了将每次推理内存流量与总参数数量解耦的框架级主张。

英文摘要

Deploying deep neural networks on memory-constrained edge accelerators is bottlenecked by per-inference off-chip weight transfer rather than computation: the dense network cannot be retained on-chip, and every parameter must be loaded for every input. Existing model compression reduces this transfer only at the cost of permanent capacity loss. We propose Sigma-Branch (SigmaB), a framework that restructures a pretrained dense network into a hierarchical binary tree composed of a shared backbone, hierarchical routers, and specialized leaves. Pretrained weights are distributed across the tree via activation-based spherical k-means clustering, which jointly initializes router weights and per-branch channel allocations; soft-routing fine-tuning then aligns each leaf with its routed input subset. At inference, the resulting network executes only a single root-to-leaf path, reducing the active-parameter footprint while storing the complete dense parameter set in memory. Across CIFAR-100 / ResNet-50, ImageNet-1K / ResNet-50, and ModelNet40 / PointNet++, SigmaB-Net reduces per-inference active parameters by 58-60% while remaining within 1.72 percentage points (pp) of the dense baseline Top-1. At comparable ImageNet-1K Top-1, the active-parameter reduction exceeds static structured pruning (FPGM, HRank) by 14-23 pp. The cross-modal evaluation, spanning 2D vision and 3D point-cloud backbones, substantiates a framework-level claim that decouples per-inference memory traffic from the total parameter count.

2606.09927 2026-06-10 cs.LG cs.AI cs.CL 新提交

Trainable Smooth-Rotation Transforms with Learned Channel Scales for LLM Quantization

可训练平滑旋转变换与学习通道尺度用于LLM量化

Patrik Czakó, Gábor Kertész, Sándor Szénási

AI总结 针对大语言模型量化中激活值量化困难的问题,提出基于分位数鲁棒的缩放策略和梯度优化的通道尺度学习,在W4A4量化下显著降低误差。

Comments 6 pages, 8 figures, 3 tables. Accepted to IEEE INES 2026 conference proceedings

详情
AI中文摘要

后训练量化(PTQ)是降低大语言模型(LLM)服务成本最实用的方法之一,但激活值量化仍然困难,因为异常值主导的通道会导致较大的量化误差。本文研究了这种退化是否部分由基于缩放的等效变换中的过度迁移引起。我们引入了一种用于SmoothRot风格变换的分位数鲁棒缩放策略,用高分位数替代基于最大值的激活统计量,并辅以通道尺度的约束梯度优化。在LLaMA-3.2-1B的W4A4量化下,仅分位数策略搜索相比SmoothRot基线将选定层误差降低11.1%,联合(alpha, q)搜索降低12%,训练达到18.5%。将最佳选定层策略重放到所有解码器块的下投影层,相应的全层平均误差从97.51降至78.08(19.9%)。结果表明,鲁棒的迁移控制和轻量级尺度学习在保持等效变换框架的同时,相比基于最大值的固定策略提供了持续改进。

英文摘要

Post-training quantization (PTQ) is one of the most practical ways to reduce the serving cost of Large Language Models (LLMs), but activation quantization remains difficult because outlier-dominated channels lead to large quantization errors. This paper investigates whether part of this degradation is caused by over-migration in scaling-based equivalent transformations. We introduce a quantile-robust scaling policy for SmoothRot-style transforms by replacing max-based activation statistics with high quantiles, and we complement it with constrained gradient-based optimization of channel scales. On LLaMA-3.2-1B under W4A4 quantization, quantile-only policy search improves selected-layer error by 11.1% over the SmoothRot baseline, joint (alpha, q) search improves it by 12%, and training reaches 18.5%. Replaying the best selected-layer policy on all decoder-block down-projection layers reduces the corresponding full-layer mean error from 97.51 to 78.08 (19.9%). The results show that robust migration control and lightweight scale learning provide consistent gains over max-based fixed policies while preserving the equivalent-transform framework.

2606.09937 2026-06-10 cs.LG cs.AI cs.CL 新提交

RKSC: Reasoning-Aware KV Cache Sharing and Confident Early Exit for Multi-Step LLM Inference

RKSC:面向多步LLM推理的感知推理的KV缓存共享与自信提前退出

Anirudh Sekar

AI总结 提出RKSC框架,通过注意力相似性KV共享、置信门控提前退出和推理选择性块缓存管理,消除多分支LLM推理中的结构冗余,实现平均3.008倍加速,错误率仅0.37%。

Comments Accepted to the ICML 2026 Workshop on Statistical Frameworks for Uncertainty in Agentic Systems

详情
AI中文摘要

我们提出RKSC(感知推理的KV缓存共享),一种无需训练的推理框架,消除了多分支LLM推理流程中的两种结构冗余。ASKS(注意力相似性KV共享)计算前缀KV缓存一次,并通过隐藏状态余弦相似度广播给所有语义相似的分支,严格推广了vLLM和SGLang使用的精确令牌前缀缓存。CGEE(置信门控提前退出)应用两种互补的退出机制:(1)当生成置信度在分支间具有决定性时,完全跳过验证前向传播;(2)当逐层熵稳定时,在中间层终止验证传播,使用Transformer骨干上的轻量级钩子。RSBCM(推理选择性块缓存管理器)通过注意力加权深度优先驱逐防止无界缓存增长。在五个模型家族(7B-10B)、四个基准测试和1000个评估问题上,RKSC相对于无KV基线实现了平均3.008倍加速(峰值3.990倍),相对于vLLM等效前缀缓存平均提升1.66倍,CGEE导致的错误率仅为0.37%(1616次验证调用中6次错误)。无需微调或架构更改。代码可在该URL获取。

英文摘要

We introduce RKSC (Reasoning-Aware KV Cache Sharing), a training-free inference framework that eliminates two structural redundancies in multi-branch LLM reasoning pipelines. ASKS (Attention-Similarity KV Sharing) computes the prefix KV cache once and broadcasts it to all semantically similar branches via hidden-state cosine similarity, strictly generalising the token-exact prefix caching used by vLLM and SGLang. CGEE (Confidence-Gated Early Exit) applies two complementary exit mechanisms: (1) it skips the verification forward pass entirely when generation confidence is decisive across branches, and (2) it terminates the verification pass at an intermediate layer when per-layer entropy stabilises, using lightweight hooks on the transformer backbone. RSBCM (Reasoning-Selective Block Cache Manager) prevents unbounded cache growth via attention-weighted depth-priority eviction. Across five model families (7B-10B), four benchmarks, and 1,000 evaluated problems, RKSC achieves a mean speedup of 3.008x over the No-KV baseline (peak 3.990x), a 1.66x mean improvement over vLLM-equivalent prefix caching, with a CGEE-induced error rate of only 0.37% (6 errors out of 1,616 verify calls). No fine-tuning or architecture changes are required. Code is available at https://github.com/AnirudhSekar/RKSC.

2606.10445 2026-06-10 cs.LG cs.CL 新提交

SpenseGPT: Practical One-shot Pruning Enabling Sparse and Dense GEMMs for LLM Inference

SpenseGPT: 面向LLM推理的实用一次性剪枝,支持稀疏和稠密GEMM

Jaeseong Lee, Seung-won Hwang, Samyam Rajbhandari

发表机构 * Snowflake AI Research(Snowflake AI研究) Seoul National University(首尔大学)

AI总结 提出Spense混合稀疏-稠密格式,将权重矩阵分为2:4稀疏和稠密区域,结合一次性剪枝方法SpenseGPT,在B200 GPU上实现高达1.2倍端到端解码加速,同时保持模型精度。

详情
AI中文摘要

半结构化2:4稀疏性被现代加速器广泛支持,可提供高达2倍的理论加速。然而,其严格的50%稀疏性约束在训练后剪枝下常导致不可忽略的精度下降。同时,现有的宽松稀疏格式要么需要专门的编译器支持,要么引入限制端到端加速的运行时开销。我们提出Spense,一种实用的混合稀疏-稠密格式,将每个权重矩阵分为2:4稀疏区域和稠密区域。该设计放宽了有效稀疏性约束,同时保持与现有高性能稀疏和稠密GEMM库的兼容性,避免了自定义编译器支持和输入激活扩展。基于此格式,我们引入SpenseGPT,一种一次性训练后剪枝方法,生成稀疏和稠密区域。值得注意的是,我们表明选择正确的稠密区域很重要,并设计了两种不同的策略来选择它们。在Qwen3-32B和Seed-OSS-36B上的实验表明,我们的方法在B200 GPU上使用FP8精度实现了高达1.2倍的端到端解码加速,同时保持精度。据我们所知,这是首个在B200等最新GPU上通过半结构化稀疏张量核心实现真实世界端到端LLM解码加速并保持模型质量的一次性剪枝演示。

英文摘要

Semi-structured 2:4 sparsity is widely supported by modern accelerators, providing up to a 2x theoretical speedup. However, its strict 50% sparsity constraint often causes non-negligible accuracy degradation under post-training pruning. Meanwhile, existing relaxed sparsity formats either require specialized compiler support or introduce runtime overheads that limit end-to-end speedup. We propose Spense, a practical hybrid sparse-dense format that splits each weight matrix into a 2:4 sparse region and a dense region. This design relaxes the effective sparsity constraint while remaining compatible with existing high-performance sparse and dense GEMM libraries, avoiding both custom compiler support and input activation expansion. Building on this format, we introduce SpenseGPT, a one-shot post-training pruning method that produces sparse and dense regions. Notably, we show that selecting the right dense regions is important, and we devise two different strategies to choose them. Experiments on Qwen3-32B and Seed-OSS-36B demonstrate that our method achieves up to 1.2x end-to-end decoding speedup on B200 GPUs with FP8 precision, while preserving accuracy. To the best of our knowledge, this is the first one-shot pruning demonstration of real-world end-to-end LLM decoding speedup from semi-structured sparse tensor cores on recent GPUs such as B200s, while maintaining model quality.

2606.10706 2026-06-10 cs.LG cs.AI 新提交

Unifying Data, Memory, and Compute Efficiency in LLM training: A Survey

统一LLM训练中的数据、内存和计算效率:一项综述

Vanessa Schmidt, Huy Hoang Nguyen, Cédric Jung, Shirin Salehi, Anke Schmeink

发表机构 * Chair of Information Theory and Data Analytics (INDA), RWTH Aachen University(亚琛工业大学信息理论与数据分析教席) AIT Austrian Institute of Technology GmbH(奥地利技术研究所) Automation and Control Institute, Technische Universität Wien (TUW)(维也纳工业大学自动化与控制研究所)

AI总结 本文从资源约束视角综述大语言模型训练中的数据效率、内存效率和计算预算感知三大瓶颈,强调三者需联合优化而非孤立处理。

Comments Accpeted for publication in IEEE Transactions on Artificial Intelligence (TAI)

详情
AI中文摘要

资源约束日益决定了大语言模型(LLM)中可以训练、微调和部署的内容,然而效率通常通过孤立的技术而非作为相互作用的限制系统来研究。本综述采用以约束为中心的视角,围绕三个耦合的瓶颈组织近期进展:数据效率(训练什么)、内存效率(如何适应训练)和计算预算感知(何时何地消耗FLOPs)。在数据轴上,我们回顾了最大化每个token学习量的选择和剪枝方法,从基于学习动态的可扩展代理信号到基于梯度和影响的评分,以及难度感知和课程式策略。我们强调新兴证据表明,不同的“好数据”概念在不同机制中占主导地位,这意味着最优子集取决于任务目标和资源预算,而非普遍适用。在系统方面,我们表明GPU内存(而非原始计算)通常是微调中的主要瓶颈,有效的扩展需要联合减少权重存储、优化器状态和激活内存,而不是孤立地优化任何单一组件。超越内存,我们将训练和推理视为计算主导的过程,其中优化、数据选择和解码必须明确考虑有限的FLOP预算。我们回顾了计算最优分配和停止规则的证据,其中一旦边际性能增益低于预算依赖的阈值,计算应停止或重新分配。总之,这些结果将计算感知的数据选择、缩放定律和自适应推理统一在资源条件决策的共同原则下。

英文摘要

Resource constraints increasingly determine what can be trained, fine-tuned, and deployed in large language models (LLMs), yet efficiency is often studied through isolated techniques rather than as an interacting system of limits. This survey adopts a constraint-centric perspective and organizes recent progress around three coupled bottlenecks: data efficiency (what to train on), memory efficiency (how to fit training), and compute budget awareness (when and where to spend FLOPs). On the data axis, we review selection and pruning methods that maximize learning per token, ranging from scalable proxy signals based on learning dynamics to gradient- and influence-based scoring, as well as difficulty-aware and curriculum-style strategies. We highlight emerging evidence that different notions of good data dominate in different regimes, implying that optimal subsets depend on the task objective and resource budget rather than being universal. On the systems side, we show that GPU memory, not raw compute, is often the dominant bottleneck in fine-tuning, and that effective scaling requires jointly reducing weight storage, optimizer states, and activation memory rather than optimizing any single component in isolation. Beyond memory, we frame training and inference as compute-governed processes in which optimization, data selection, and decoding must explicitly account for finite FLOP budgets. We review evidence for compute-optimal allocation and stopping rules, where computation should be halted or reallocated once marginal performance gains fall below a budget-dependent threshold. Together, these results unify compute-aware data selection, scaling laws, and adaptive inference under a common principle of resource-conditioned decision-making.

2606.10890 2026-06-10 cs.LG cs.AI 新提交

Optimal Post-Training Quantization Scales and Where to Find Them

最优后训练量化尺度及其寻找方法

Juan Amboage, Pablo Monteagudo-Lago, Ian Colbert, Giuseppe Franco, Nicholas Fraser

发表机构 * AMD

AI总结 提出PiSO算法,利用校准数据精确高效地计算逐通道最优量化尺度,并扩展到分组量化,在Llama和Qwen模型上显著提升困惑度和零样本准确率。

详情
AI中文摘要

后训练量化(PTQ)通过将权重映射到低比特表示来压缩大型语言模型。定义量化网格的缩放因子通常使用简单的、无数据的启发式方法选择。在这项工作中,我们提出了PiSO(分段尺度优化),一种利用校准数据在最近舍入量化下精确且高效地计算最优逐通道权重尺度的算法。PiSO将尺度搜索空间划分为有限个区间,在这些区间上目标函数具有闭式最小值。我们通过原则性启发式方法将PiSO扩展到分组量化,并提出了将尺度优化与纠错交错的有效策略。在Llama和Qwen模型上,跨多个模型大小和目标权重位宽的实验表明,在困惑度和下游零样本准确率上均有持续改进,无论是单独使用还是与纠错结合。特别地,我们观察到随着目标位宽变窄、量化变得更加困难,收益增加。

英文摘要

Post-training quantization (PTQ) compresses large language models by mapping weights to low-bit representations. The scaling factor that defines the quantization grid is typically chosen using simple, data-free heuristics. In this work, we present PiSO (Piecewise Scale Optimization), an algorithm that leverages calibration data to compute the optimal channel-wise weight scales exactly and efficiently under round-to-nearest quantization. PiSO partitions the scale search space into finitely many intervals on which the objective admits a closed-form minimizer. We extend PiSO to group-wise quantization via principled heuristics and propose effective strategies for interleaving scale optimization with error correction. Experiments on Llama and Qwen models across multiple model sizes and target weight bit-widths demonstrate consistent improvements in perplexity and downstream zero-shot accuracy, both standalone and combined with error correction. In particular, we observe increased benefits as the target bit-width narrows and quantization becomes more challenging.

2606.10896 2026-06-10 cs.LG cs.DB cs.IR cs.PF 新提交

Flash-GMM: A Memory-Efficient Kernel for Scalable Soft Clustering

Flash-GMM:一种用于可扩展软聚类的内存高效内核

Gal Bloch, Ariel Gera, Matan Orbach, Ohad Eytan, Assaf Toledo

发表机构 * IBM Research(IBM研究院)

AI总结 提出Flash-GMM融合Triton内核,在单GPU通次中高效计算高斯混合模型,通过避免实例化完整责任矩阵,实现20倍加速并支持比先前大100倍的数据集训练,集成到IVF粗量化器中提升ANN搜索性能。

详情
AI中文摘要

我们提出了 \textbf{Flash-GMM},一个融合的 Triton 内核,用于在单 GPU 通次中高效计算大规模数据上的高斯混合模型(GMM)。通过消除在 GPU 内存中实例化完整责任矩阵的需求,Flash-GMM 实现了比现有实现 \textbf{20$\times$} 的加速,并支持在单个设备上训练比以前可行数据集大 \textbf{100$\times$} 的数据集。为了展示其影响,我们将 Flash-GMM 集成到 IVF 粗量化器中用于近似最近邻(ANN)搜索。我们表明,软 GMM 聚类现在可以作为 $k$-means 的可行即插即用替代方案,并且可以利用 GMM 责任将边界向量分配到多个聚类。我们的方法在达到固定召回目标时,距离计算次数减少多达 $1.7\times$,或者在相同计算成本下,召回率@10 提高 $+2$--$12$。我们将该内核作为开源项目发布。

英文摘要

We present \textbf{Flash-GMM}, a fused Triton kernel for efficient computation of Gaussian Mixture Models (GMMs) over large-scale data in a single GPU pass. By eliminating the need to materialize the full responsibility matrix in GPU memory, Flash-GMM achieves a \textbf{20$\times$} speedup over existing implementations and enables training on datasets more than \textbf{100$\times$} larger than previously feasible on one device. To demonstrate its impact, we integrate Flash-GMM into the IVF coarse quantizer for approximate nearest-neighbor (ANN) search. We show that soft GMM clustering is now a viable drop-in replacement for $k$-means, and that GMM responsibilities can be leveraged to assign border vectors to multiple clusters. Our approach reaches fixed recall targets with up to $1.7\times$ fewer distance computations, or equivalently, yields $+2$--$12$ recall@10 at matched computational cost. We release the kernel as an open-source project.

2606.10935 2026-06-10 cs.LG cs.AI 新提交

CLP: Collocation-Length Prediction for Zero-Loss Adaptive Multi-Token Inference

CLP: 零损失自适应多令牌推理的搭配长度预测

Xuezhen Xie, Zhiqiang Zhou

发表机构 * Xiamen University(厦门大学) Tsinghua University(清华大学)

AI总结 提出CLP方法,通过轻量级线性层预测可安全接受的额外令牌数,解决多令牌预测中头-主干竞争导致的输出退化问题,在Qwen2.5模型上实现零质量损失的1.14x-1.29x加速。

Comments 13 pages, 8 figures, 8 tables

详情
AI中文摘要

大型语言模型推理受限于自回归解码,每个令牌需要一次完整的前向传播。多令牌预测(MTP)提供了一种有前景的加速路径,但现有方法存在根本性的架构缺陷:第一个令牌的MTP头与主干自身的语言模型(LM)头竞争,导致预测被接受时质量严重下降。我们将这种头-主干竞争确定为先前基于MTP的加速方法中重复和不连贯输出的根本原因。为了解决这个问题,我们提出了Backbone-as-Architect设计原则,其中主干LM头始终生成第一个令牌,MTP头仅负责后续令牌。基于这一原则,我们引入了CLP(搭配长度预测器),一个轻量级的跨度级决策层,预测每个解码步骤可以安全接受多少个额外令牌。CLP仅使用单个线性层(4.6K--7.7K参数),取代了先前工作中过度设计的1M参数门控网络。在Qwen2.5模型(0.5B、1.5B、7B)上的实验表明,CLP在1.5B模型上实现了1.20x--1.29x加速,在7B模型上实现了1.14x--1.20x加速,且零质量退化(重复率<0.02),而基于门控的方法无法加速(1.07x)或产生严重退化的输出(重复率>0.5%)。我们进一步证明,较短的预测范围(k=2)在大模型上恢复了24%更高的MTP头准确率,建立了一个可扩展感知的设计原则。我们确定MTP头预测准确率是加速的约束条件,并为未来改进建立了清晰的路线图。

英文摘要

Large language model inference is bottlenecked by autoregressive decoding, where each token requires a full forward pass. Multi-token prediction (MTP) offers a promising acceleration path, but existing approaches suffer from a fundamental architectural flaw: the MTP head for the first token competes with the backbone's own language model (LM) head, leading to severe quality degradation when predictions are accepted. We identify this head-backbone competition as the root cause of repetitive and incoherent outputs in prior MTP-based acceleration methods. To address this, we propose Backbone-as-Architect, a design principle where the backbone LM head always generates the first token, and MTP heads are responsible only for subsequent tokens. Building on this principle, we introduce CLP (Collocation-Length Predictor), a lightweight span-level decision layer that predicts how many additional tokens can be safely accepted at each decoding step. CLP uses only a single linear layer (4.6K--7.7K parameters), replacing the over-engineered 1M-parameter gate networks used in prior work. Experiments on Qwen2.5 models (0.5B, 1.5B, 7B) show that CLP achieves 1.20x--1.29x speedup on 1.5B and 1.14x--1.20x on 7B, with zero quality degradation (repetition ratio < 0.02), while gate-based approaches fail to accelerate (1.07x) or produce severely degraded outputs (repetition ratio > 0.5%). We further demonstrate that shorter prediction horizons (k=2) recover 24% higher MTP head accuracy on large models, establishing a scaling-aware design principle. We identify MTP head prediction accuracy as the binding constraint on acceleration and establish a clear roadmap for future improvements.

2606.10944 2026-06-10 cs.LG cs.DS math.ST stat.ME stat.ML stat.TH 新提交

Express Language Modeling

Express 语言建模

Albert Gong, Annabelle Michael Carrell, Raaz Dwivedi, Lester Mackey

AI总结 提出 Express 工具,将非因果注意力近似转换为因果近似,结合 Thinformer 实现最优因果注意力保证,并加速语言建模中的四个资源瓶颈。

详情
AI中文摘要

我们引入了一个新工具 Express,用于将非因果注意力近似转换为具有匹配近似保证的因果近似。当与最先进的 Thinformer 近似结合时,Express 改进了已知的最佳因果注意力保证,对于长度为 $n$ 的序列,实现了 $\log^{3/2}(n)/s$ 的近似误差,仅需 $O(s)$ 内存和 $O(s^2 \log^2(n))$ 的压缩开销。我们将这些进展与高效的 I/O 感知 Triton 实现相结合,展示了相对于 FlashAttention 2 的显著加速,并使用 Express 克服了语言建模流程中的四个资源瓶颈:长上下文预填充、KV 缓存压缩、长形式内存受限解码和长形式计算受限解码。

英文摘要

We introduce a new tool, Express, for converting a non-causal attention approximation into a causal approximation with matching approximation guarantees. When combined with the state-of-the-art Thinformer approximation, Express improves upon the best known causal attention guarantees, delivering $\log^{3/2}(n)/s$ approximation error with only $O(s)$ memory and $O(s^2 \log^2(n))$ compression overhead for a sequence of length $n$. We pair these developments with an efficient I/O-aware Triton implementation, demonstrate substantial speedups over FlashAttention 2, and use Express to overcome four resource bottlenecks in the language modeling pipeline: long-context prefill, KV cache compression, long-form memory-constrained decoding, and long-form compute-constrained decoding.

2606.09956 2026-06-10 cs.SE cs.LG 交叉投稿

Multi-task LLMs for Bug Classification: Efficient Inference with Auxiliary Decoding Heads

多任务大语言模型用于缺陷分类:基于辅助解码头的高效推理

Nikolai Rozanov

发表机构 * Recurse Ltd. Department of Computing, Imperial College London(帝国理工学院计算机系)

AI总结 提出一种轻量级多任务大语言模型(MLC),通过令牌对齐算法和优化训练策略,实现全文件上下文下的行级缺陷定位,性能与代理方法相当但推理延迟降低数个数量级。

Comments 8 pages, 6 pages appendix

详情
AI中文摘要

基于大语言模型的代码生成技术被迅速采用,极大地加速了软件开发,但有效的验证方法仍然严重不足。现有的缺陷定位技术要么成本过高(每个文件需要数分钟的代理推理和数千个生成令牌),要么以粗粒度的函数级别运行,不适合精确调试。而专注于行级粒度且更轻量的工作往往在性能或上下文大小上受到限制。我们提出了一种新颖的行级缺陷定位方法,通过三个关键贡献解决了这些限制:(1)一种令牌对齐算法,克服了先前工作中的基本令牌化挑战;(2)一种轻量级多任务大语言模型用于缺陷定位(MLC),实现高效的行级缺陷分类;(3)一种针对多行预测的优化训练策略。我们的方法在全文件上下文下的行级缺陷定位中,在类似设置中达到了最先进的性能。同时,在Defects4J和PypiBugs基准测试中,我们达到了与代理方法相当的性能,同时将推理延迟降低了数个数量级,每个文件仅需生成一个令牌。我们还通过引入并在一个小型域外评估数据集(Python)上进行评估,进一步证明了强大的泛化能力。我们将在论文被接收后开源我们的代码、模型和数据集。

英文摘要

The rapid adoption of LLM-powered code generation has dramatically accelerated software development, yet effective verification methods remain severely underdeveloped. Existing bug localization techniques are either prohibitively expensive, requiring minutes of agentic reasoning and thousands of generated tokens per file, and/or operate at coarse function-level granularity unsuitable for precise debugging. While works that focus on line-level granularity and are more light-weight are often limited in their performance or context size. We introduce a novel line-level bug localization approach that addresses these limitations through three key contributions: (1) a token alignment algorithm that overcomes fundamental tokenization challenges in previous work, (2) a lightweight multi-task LLM for bug localization (MLC) enabling efficient line-level bug classification, and (3) an optimized training recipe for multi-line prediction. Our method achieves state-of-the-art performance among similar setups on line-level bug localization with full-file context. At the same time we reach comparable performance to agentic approaches on Defects4J and PypiBugs benchmarks while reducing inference latency by orders of magnitudes, requiring only a single generated token per file. We further demonstrate strong generalization by introducing and evaluating on a small out-of-domain evaluation datasets in Python. We will open source our code, models, and datasets upon acceptance.

2606.10008 2026-06-10 cs.NE cs.LG 交叉投稿

Spiking Neural Network inference on FPGAs with hls4ml

基于hls4ml的FPGA上脉冲神经网络推理

Barry M. Dillon

发表机构 * ISRC, Ulster University(ISRC、乌斯特大学)

AI总结 本文扩展hls4ml工具,实现将PyTorch训练的脉冲神经网络(SNN)部署到FPGA固件上,在Heidelberg Spiking Digits数据集上达到约34μs的推理延迟。

详情
AI中文摘要

脉冲神经网络(SNN)提供了一种自然的时序机器学习框架。它们的神经元维持内部状态并通过离散脉冲传播信息,从而实现低延迟的时序推理。尽管SNN通常与异步神经形态处理器相关联,但许多科学实时推理系统依赖于传统的同步现场可编程门阵列(FPGA)和高级综合(HLS)工作流程。在本文中,我们提出了hls4ml的扩展,使得在PyTorch中训练的SNN能够以时钟驱动方式部署到FPGA固件上。我们使用在Heidelberg Spiking Digits数据集上训练的密集量化SNN演示了该工作流程,其推理延迟约为34μs。通过软件参考比较、HLS C仿真、HLS综合、导出和Vivado综合报告,我们验证了生成的设计。这项工作将hls4ml工具包开放给神经形态计算,允许对SNN模型进行流线型优化、综合和部署,用于实时推理。

英文摘要

Spiking Neural Networks (SNNs) provide a naturally temporal machine-learning framework. Their neurons maintain an internal state and propagate information through discrete spikes, enabling low-latency temporal inference. Although SNNs are often associated with asynchronous neuromorphic processors, many scientific real-time inference systems rely on conventional synchronous field-programmable gate arrays (FPGAs) and high-level synthesis (HLS) workflows. In this paper we present an extension of hls4ml that enables clock-driven deployment of SNNs trained in pytorch onto FPGA firmware. We demonstrate the workflow using a dense quantised SNN trained on the Heidelberg Spiking Digits dataset where it achieves inference latencies of approximately $34μ$s. We validate the generated design through software reference comparisons, HLS C simulation, HLS synthesis, export, and Vivado synthesis reports. This work opens up the hls4ml toolkit to neuromorphic computing, allowing streamlined optimisation, synthesis, and deployment of SNN models for real-time inference.

2606.10209 2026-06-10 cs.AI cs.LG cs.SE 交叉投稿

Less Context, Better Agents: Efficient Context Engineering for Long-Horizon Tool-Using LLM Agents

更少上下文,更优智能体:面向长周期工具使用LLM智能体的高效上下文工程

Abhilasha Lodha, Mahsa Pahlavikhah Varnosfaderani, Abir Chakraborty, Abhinav Mithal

发表机构 * Microsoft(微软)

AI总结 针对企业工具使用场景中上下文过长导致的问题,提出选择性保留最近工具交互并添加紧凑摘要的方法,在费用明细任务上将完成率从71.0%提升至91.6%,同时大幅降低token消耗和运行时间。

Comments 17 pages, 3 figures, 8 tables

详情
AI中文摘要

部署为自主智能体用于企业工作流的大型语言模型面临一个关键挑战:来自企业系统的冗长工具响应可能导致上下文溢出、状态过时错误和高推理成本。我们在Microsoft Dynamics 365 Finance and Operations中使用Model Context Protocol工具研究自动费用明细化问题。我们在一个包含50个任务的酒店费用基准上评估了四种GPT-5配置:无用户模型、完整对话历史、上下文裁剪至最近5个工具调用/响应对、以及裁剪加自动摘要。结果在5次独立运行中取平均,用户模型在上下文工程比较中保持不变。无用户模型基线仅达到8.0%的完全明细化。完整上下文保留将完成率提升至71.0%,但每次基准测试消耗1,480,996个token和14.56小时。裁剪至最近5个工具调用将完成率提升至79.0%,同时将token使用降至535,274个,运行时间降至5.39小时。添加摘要实现了最佳结果:91.6%的完全明细化和99.64%的平均明细金额,使用553,374个token和5.79小时。我们进一步报告了置信区间、效应量分析、裁剪和摘要窗口的敏感性、失败分析、按三个类别分组的五种费用类型的结果,以及使用Claude Sonnet 4.5的跨模型证据。这些结果表明,对于这类企业工具使用工作流,选择性保留最近的工具交互加上紧凑摘要,与保留完整历史相比,可以提高可靠性和效率。

英文摘要

Large language models deployed as autonomous agents for enterprise workflows face a key challenge: verbose tool responses from enterprise systems can cause context overflow, stale-state errors, and high inference cost. We study this problem in automated expense itemization in Microsoft Dynamics 365 Finance and Operations using Model Context Protocol tools. We evaluate four GPT-5 configurations on a 50-task hotel expense benchmark: no user model, full conversation history, context pruned to the last 5 tool call/response pairs, and pruning with automated summarization. Results are averaged across 5 independent runs, with the user model held constant for the context-engineering comparison. The no-user-model baseline achieves only 8.0% complete itemization. Full-context retention improves completion to 71.0%, but consumes 1,480,996 tokens and 14.56 hours per benchmark. Pruning to the last 5 tool calls improves completion to 79.0% while reducing token use to 535,274 and runtime to 5.39 hours. Adding summarization achieves the best result: 91.6% complete itemization and 99.64% average amount itemized, with 553,374 tokens and 5.79 hours. We further report confidence intervals, effect-size analysis, sensitivity over pruning and summary windows, failure analysis, results across five expense types grouped into three categories, and cross-model evidence with Claude Sonnet 4.5. These results show that, for this class of enterprise tool-use workflow, selective retention of recent tool interactions plus compact summarization can improve both reliability and efficiency compared with full-history retention.

2606.10493 2026-06-10 cs.DC cs.AI cs.LG cs.NE 交叉投稿

Achieving Cloud-Grade SLOs for Local Mixture-of-Experts Inference through CPU-GPU Hybrid Design

实现本地混合专家模型推理的云级SLO:CPU-GPU混合设计

Wenxin Wang, Yule Hou, Yu Ji, Peng Qu, Youhui Zhang

发表机构 * Tsinghua University(清华大学) Xingyun Integrated Circuits Co., Ltd.(星云集成电路有限公司) Beijing National Research Center for Information Science and Technology(北京信息科学与技术国家研究中心)

AI总结 针对本地MoE推理在低并发下仍无法达到云级服务质量的问题,提出CPU-GPU混合系统,通过流加载预填充、分布式SLP、节点内预填充-解码分离、AVX-512优化FP8 GEMV内核和细粒度CPU并行,在消费级硬件上实现云级SLO。

Comments Accepted to the 20th USENIX Symposium on Operating Systems Design and Implementation (OSDI '26). The official version will appear in the OSDI '26 proceedings published by USENIX

详情
AI中文摘要

本地部署大型混合专家(MoE)模型即使在低并发工作负载下也无法达到云级环境中的服务质量。我们识别出本地MoE推理中的四个关键差距:依赖容量缩减模型(量化、蒸馏、重路由)、无法满足长预填充(超过12K)的30秒TTFT、低于基线的解码吞吐量(低于20 tokens/s)、以及在混合预填充-解码和批量解码工作负载下的并发性差。我们提出一个CPU-GPU混合系统,通过以下方式在双插槽商用CPU和消费级GPU上实现云级SLO:(1)流加载预填充(SLP),将预填充吞吐量提升至1,200 tokens/s,并在30秒内支持32K提示;(2)采用SmallEP专家并行的分布式SLP(DSLP),在两张RTX 5090上达到1,800 tokens/s和45K提示;(3)节点内预填充-解码分离,具有零拷贝共享权重和双批次注意力-MoE重叠方案,在延迟增加低于15%且吞吐量提升50%的情况下维持并发性;(4)AVX-512优化的FP8 GEMV内核,实现原生CPU FP8推理,同时降低4-5倍CPU延迟;(5)细粒度CPU并行,在INT4 DeepSeek-V3上达到28 tokens/s,在完整FP8 V3上达到21.5 tokens/s。评估表明,我们的系统在消费级CPU-GPU平台上为旗舰MoE模型提供云级QoS,通过完整原始精度推理重塑本地部署,无需数据中心基础设施即可实现高质量、经济高效的访问。

英文摘要

Local deployment of large Mixture-of-Experts (MoE) models falls short of the service quality achieved in cloud-scale environments, even under low-concurrency workloads. We identify four key gaps in local MoE inference: reliance on capacity-reduced models (quantized, distilled, rerouted), inability to meet 30-second TTFT for long prefills (more than 12K), sub-baseline decode throughput (under 20 tokens/s), and poor concurrency under mixed prefill-decode and batched decode workloads. We present a CPU-GPU hybrid system that achieves cloud-level SLOs on dual-socket commodity CPUs and consumer GPUs by (1) stream-loading prefill (SLP), boosting prefill throughput to 1,200 tokens/s and enabling 32K prompts within 30 seconds; (2) distributed SLP (DSLP) with SmallEP expert parallelism, reaching 1,800 tokens/s and 45K prompts in 30 seconds on two RTX 5090s; (3) intra-node prefill-decode disaggregation with zero-copy shared weights and a dual-batch attention-MoE overlap scheme, sustaining concurrency with under 15 percent latency increase and 50 percent throughput gains; (4) an AVX-512-optimized FP8 GEMV kernel, enabling native CPU FP8 inference while delivering 4-5x lower CPU latency; and (5) fine-grained CPU parallelism that attains 28 tokens/s on INT4 DeepSeek-V3 and 21.5 tokens/s on intact FP8 V3. Evaluations show our system delivers cloud-level QoS for flagship MoE models on consumer CPU-GPU platforms, reshaping local deployment with intact, original-precision inference and enabling high-quality, cost-effective access without datacenter infrastructure.

2606.10932 2026-06-10 cs.CL cs.LG 交叉投稿

Density Field State Space Models: 1-Bit Distillation, Efficient Inference, and Knowledge Organization in Mamba-2

密度场状态空间模型:Mamba-2中的1比特蒸馏、高效推理与知识组织

Chirag Shinde

发表机构 * Independent Researcher(独立研究者)

AI总结 提出DF-SSM框架,将SSM压缩至1比特骨架加int8低秩校正,应用于Mamba-2 1.3B模型,实现9.7倍压缩和21.4倍推理加速,仅需3200万令牌和6小时蒸馏,并发现模型内部知识组织的三个处理阶段。

Comments 16 pages, 6 figures, 7 tables. Code available at https://github.com/cs-cmyk/df-ssm

详情
AI中文摘要

我们提出了密度场状态空间模型(DF-SSM),这是一个将SSM压缩为1比特骨架并带有int8低秩校正的框架。应用于Mamba-2 1.3B模型,我们得到了一个278 MB的模型(比2.7 GB的FP16教师模型小9.7倍),在GPU上推理速度提升21.4倍(batch=1,相对于mamba-ssm参考实现),同时在下游任务性能上保持在BitMamba-2(一个在150B令牌上从头训练的1.58比特模型)的2-4个百分点以内。蒸馏本身仅需3200万令牌和6小时(在单个A100 GPU上),尽管它假设有一个预训练的FP16教师模型。我们开发了一个优化的推理流水线,结合了用于骨架矩阵乘法的cuBLAS INT8张量核心、用于有状态SSM和卷积操作的自定义CUDA内核,以及用于在GPU和CPU上高效部署的AVX-512 CPU后端。除了压缩,我们还研究了所得模型的内部知识组织,发现了三个不同的处理阶段:意图分类(第0-3层,在没有词汇对齐的抽象空间中操作)、知识检索(第25-35层,事实关联定位在一个5层窗口内)和输出格式化(第36-47层,类别结构消失)。通过对19个类别中445个事实提示的系统分析,我们发现早期层分类是句法的(由模板结构驱动)而非语义的,并且尽管事实回忆较弱,模型仍表现出组织良好的知识表示——这表明表示结构可能先于事实强度。

英文摘要

We present Density Field State Space Models (DF-SSM), a framework for compressing SSMs to a 1-bit scaffold with int8 low-rank correction. Applied to Mamba-2 1.3B, we achieve a 278 MB model (9.7x smaller than the 2.7 GB FP16 teacher) that runs at 21.4x faster inference on GPU (batch=1, relative to the mamba-ssm reference implementation) while maintaining downstream task performance within 2-4 percentage points of BitMamba-2, a 1.58-bit model trained from scratch on 150B tokens. The distillation itself requires only 32M tokens and 6 hours on a single A100 GPU, though it presupposes a pretrained FP16 teacher. We develop an optimized inference pipeline combining cuBLAS INT8 tensor cores for the scaffold matmul, custom CUDA kernels for stateful SSM and convolution operations, and an AVX-512 CPU backend for efficient deployment on both GPU and CPU. Beyond compression, we investigate the internal knowledge organization of the resulting model, discovering three distinct processing phases: intent classification (layers 0-3, operating in an abstract space with no vocabulary alignment), knowledge retrieval (layers 25-35, where factual associations localize to a 5-layer window), and output formatting (layers 36-47, where category structure dissolves). Through systematic analysis of 445 factual prompts across 19 categories, we find that early-layer classification is syntactic (driven by template structure) rather than semantic, and that the model exhibits well-organized knowledge representations despite weak factual recall--suggesting that representational structure may precede factual strength.

2605.08565 2026-06-10 cs.LG 版本更新

Finer is Better (with the Right Scaling)

更细粒度更好(配合正确的缩放)

Clemens Schaefer, Gil Tabak

发表机构 * Google LLC(谷歌公司) Mountain View, Ca(山景城,加利福尼亚)

AI总结 本文研究大语言模型低精度量化中的块大小悖论,发现细粒度块在正确缩放策略下能降低量化误差,并通过4-over-6方法等解决退化问题。

详情
AI中文摘要

微缩放是保持量化到超低精度格式的大语言模型质量的关键技术。直观上,更细的块大小应产生更低的量化误差;然而,Fasoli等人(2026)最近发现的一个悖论表明,标准abs-max缩放实际上会随着块大小减小而导致模型质量下降。在这项工作中,我们研究了这一现象背后的机制。我们证明,这种退化并非细粒度的固有限制,而主要是由于较小块中的元素在统计上更接近其局部块最大值,与用作缩放因子的粗糙非规格化E4M3值交互不良所致。具体来说,我们表明:i)防止缩放因子下溢为零可减轻极端下溢引起的误差;ii)针对性的算法干预(如4-over-6方法)为缩放因子的选择提供更多灵活性,从而解决了较大值的悖论;iii)暴力搜索建立了最优基线,证实理论均方误差随块大小细化而严格改善。最终,我们的发现突显了硬件-软件协同设计的关键见解:块大小悖论部分源于天真的缩放选择。虽然使用分层缩放因子或更宽的格式(如UE5M3)可互换地解决大部分质量损失,但我们发现4-over-6缩放选择启发式方法可以进一步改善质量,特别是对于非常小的块大小。因此,最大化下一代ML加速器的性能需要将硅片格式规范和软件缩放算法视为紧密耦合的设计选择。

英文摘要

Microscaling is a critical technique for preserving the quality of Large Language Models (LLMs) quantized to ultra-low precision formats. Intuitively, finer block sizes should yield lower quantization error; however, a paradox recently identified by Fasoli et al. (2026) demonstrates that standard abs-max scaling can actually result in degraded model quality as block sizes shrink. In this work, we investigate the underlying mechanics of this phenomenon. We demonstrate that this degradation is not an inherent limitation of finer granularity, but is primarily driven by how elements in smaller blocks statistically cluster closer to their local block maximum, interacting poorly with the coarse subnormal E4M3 values used as scaling factors. Specifically, we show that i) preventing the scaling factor from underflowing to zero mitigates errors caused by extreme underflow, ii) targeted algorithmic interventions like the 4-over-6 methodology that give more flexibility to the choice of scaling factor resolve the paradox for larger values, and iii) a brute-force search establishes an optimal baseline, confirming that the theoretical Mean Squared Error (MSE) strictly improves with finer block sizes. Ultimately, our findings highlight a critical insight for hardware-software co-design: the block-size paradox is partially an artifact of naive scale selection. While using hierarchical scaling factors or wider formats like UE5M3 interchangeably resolves much of the quality loss, we found the 4-over-6 scale selection heuristic can even further improve quality, especially for very small block sizes. Consequently, maximizing the performance of next-generation ML accelerators will require treating silicon format specifications and software scaling algorithms as tightly coupled design choices.

2606.02608 2026-06-10 cs.LG 版本更新

Pruning Deep Neural Networks via the Marchenko--Pastur Distribution

通过Marchenko-Pastur分布剪枝深度神经网络

Leonid Berlyand, Theo Bourdais, Houman Owhadi, Yitzchak Shmalo

发表机构 * Department of Mathematics, Pennsylvania State University(数学系,宾夕法尼亚州立大学) Department of Computing and Mathematical Sciences, California Institute of Technology(计算与数学科学系,加州理工学院)

AI总结 提出基于Marchenko-Pastur随机矩阵理论的剪枝方法,在极短微调预算下保持精度,并在ImageNet-1k上验证了多种架构的高效稀疏执行加速。

详情
AI中文摘要

我们研究了一种基于Marchenko-Pastur(MP)随机矩阵的深度神经网络剪枝方法,该方法适用于非常小的剪枝后微调预算。主要的实际贡献在于,在短校准和微调计划下保持精度,而不是采用长的剪枝后重新优化流程。理论给出了确定性数据路径保证:如果移除的组件$R$具有小的传播logit效应$L_s \| R \psi_1(s) \|_\infty$,则剪枝会降低弹性网络目标,并保留那些密集边界超过两倍扰动的样本。零预算情况给出了完美剪枝;剪枝-恢复扩展在固定稀疏执行模式内建模权重恢复;加性$L_2$正则化模型显示,可接受的随机类组件在训练极限处消失,而持续尖峰在MP bulk坍缩时稳定。在iid高斯充分条件下,拟合的MP边缘$\sigma_+$给出了高概率的逐层预算信号。在ImageNet-1k上,仅经过三个蒸馏epoch,ViT-B/16 $2{:}4{+}$ToMe在$59.81\%$稀疏执行MAC缩减下达到$83.41\%$ top-1(比密集低1.70个百分点),同一检查点和ToMe图在A40原生$2{:}4$后端加速比为$1.388 imes$;另一个无ToMe的A100端点加速比为$2.705 imes$。在结构化稀疏方面,ViT-B/16 $6{:}12$达到$83.74\%$,ViT-L/16 $8{:}16$密集+置换达到$85.33\%$(低0.51个百分点),ConvNeXtV2-Base $12{:}16$达到$86.35\%$(低0.37个百分点)。对于CNN,ResNet50 $8{:}16$密集+置换达到$75.87\%$(低0.26个百分点),ResNet152d CAST-conv+置换达到$81.33\%$(低1.53个百分点),在${\sim}50\%$ MAC核算下,A40 im2col$+2{:}4$稀疏GEMM审计加速比为$1.62 imes$。

英文摘要

We study a Marchenko--Pastur (MP) random-matrix approach to pruning deep neural networks with very small post-pruning fine-tuning budgets. The main practical contribution is accuracy retention under short calibration and fine-tuning schedules, rather than a long post-pruning reoptimization pipeline. The theory gives deterministic data-path certificates: if the removed component $R$ has small propagated logit effect $L_s \| R ψ_1(s) \|_\infty$, pruning decreases an elastic-net objective and preserves samples whose dense margin exceeds twice the perturbation. The zero-budget case gives perfect pruning; a prune--restore extension models weight restoration inside a fixed sparse-execution pattern; and an additive $L_2$-regularized model shows admissible random-like components vanish at the training limit, with persistent spikes stabilizing as the MP bulk collapses. Under iid-Gaussian sufficient conditions, the fitted MP edge $σ_+$ gives a high-probability layerwise budget signal. On ImageNet-1k, after only three distillation epochs, ViT-B/16 $2{:}4{+}$ToMe reaches $83.41\%$ top-1 ($-1.70$ pp from dense) at $59.81\%$ sparse-execution MAC reduction, with $1.388\times$ best-observed A40 native-$2{:}4$ backend speedup for the same checkpoint and ToMe graph; a separate no-ToMe A100 endpoint gives $2.705\times$. At structured sparsity, ViT-B/16 $6{:}12$ reaches $83.74\%$, ViT-L/16 $8{:}16$ dense+permutation reaches $85.33\%$ ($-0.51$ pp), and ConvNeXtV2-Base $12{:}16$ reaches $86.35\%$ ($-0.37$ pp). For CNNs, ResNet50 $8{:}16$ dense+permutation reaches $75.87\%$ ($-0.26$ pp), and ResNet152d CAST-conv+permutation reaches $81.33\%$ ($-1.53$ pp) at ${\sim}50\%$ MAC accounting with a $1.62\times$ A40 im2col$+2{:}4$ sparse-GEMM audit.

2606.06021 2026-06-10 cs.LG cs.AI 版本更新

OPRD: On-Policy Representation Distillation

OPRD: 在线策略表示蒸馏

Shenzhi Yang, Guangcheng Zhu, Bowen Song, Haobo Wang, Mingxuan Xia, Xing Zheng, Yingfan Ma, Zhongqi Chen, Weiqiang Wang, Gang Chen

发表机构 * Zhejiang University(浙江大学) Ant Group(蚂蚁集团)

AI总结 针对在线策略蒸馏中输出空间监督的采样方差和忽略中间隐藏状态的问题,提出OPRD方法,通过在隐藏状态空间对齐师生表示,消除采样方差、提供更丰富的逐层结构信息,并在AIME等基准上缩小师生差距,训练速度提升1.44倍,内存减少54%。

详情
AI中文摘要

在线策略蒸馏(OPD)仅通过匹配下一个词元的概率在输出空间监督学生。这种仅输出范式有两个限制:(1)在大词汇量(例如Qwen约15万个词元)上,蒙特卡洛KL估计的采样方差在整个训练过程中持续存在;(2)它将教师视为黑盒,丢弃了LM头之后的所有中间隐藏状态。我们提出在线策略表示蒸馏(OPRD),通过在相同轨迹上选择层对齐学生和教师的表示,将蒸馏提升到隐藏状态空间,完全绕过LM头。理论上,OPRD消除了采样方差,并提供了更丰富的逐层结构信息。实验上,OPRD在AIME 2024/2025和AIMO上缩小了学生与教师之间的差距,而输出空间OPD基线停滞在教师水平以下。OPRD的训练速度也比top-k OPD快1.44倍,内存使用减少54%。代码:https://github.com/ShenzhiYang2000/OPRD。

英文摘要

On-policy distillation (OPD) supervises the student only in output space by matching next-token probabilities. This output-only paradigm has two limits: (1) sampling variance from Monte Carlo KL estimates over large vocabularies (e.g., Qwen's ~150k tokens) persists throughout training, and (2) it treats the teacher as a black-box, discarding all intermediate hidden states after the LM head. We propose On-Policy Representation Distillation (OPRD), which lifts distillation into hidden-state space by aligning student and teacher representations across selected layers on the same rollouts, bypassing the LM head entirely. Theoretically, OPRD eliminates sampling variance and provides richer per-layer structural information. Empirically, OPRD closes the student-teacher gap on AIME 2024/2025 and AIMO, while output-space OPD baselines plateau below the teacher. OPRD also trains 1.44x faster and uses 54% less memory than top-k OPD. Code: https://github.com/ShenzhiYang2000/OPRD.

2606.07586 2026-06-10 cs.LG cs.AI cs.AR cs.MA 版本更新

From Human Guidance to Autonomy: Agent Skill System for End-to-End LLM Deployment on Spatial NPUs

从人类引导到自主:面向空间NPU上端到端LLM部署的智能体技能系统

Jiajie Li, Erwei Wang, Zhiru Zhang, Samuel Bayliss

发表机构 * AMD Research and Advanced Development(AMD研究与高级开发)

AI总结 提出两阶段方法,从人类引导的智能体辅助部署到自主技能系统,在AMD XDNA 2 NPU上实现8种LLM的端到端自动部署,性能超越或持平人工优化基线。

Comments Accepted to the Machine Learning for Architecture and Systems Workshop (MLArchSys), co-located with ISCA 2026

详情
AI中文摘要

空间神经处理单元(NPU)为边缘LLM推理提供了能效平台,但在此类硬件上高效端到端部署LLM仍然劳动密集。尽管AI编码智能体已开始降低这一成本,现有研究主要关注单核优化,而非在资源受限的空间NPU上进行端到端LLM部署。\n我们提出一种两阶段方法,在AMD XDNA 2 NPU上实例化,从人类引导开发进展到智能体自主。第一阶段,我们通过人类引导的智能体辅助开发Llama-3.2-1B的参考部署。与手工优化基线相比,该实现实现了2.2倍的预填充加速和4.0倍的解码加速,优化轨迹及其经验教训全程记录为结构化文档。第二阶段,我们将文档提炼为一个由八个阶段组成的智能体技能系统,编排优化和调试技能集,并在每个阶段严格执行数值正确性。\n利用我们的智能体技能系统,我们使用开源编译器栈在AMD XDNA 2 NPU上自主端到端部署了另外八个仅解码器LLM(Llama-3.2-3B、SmolLM2-1.7B、Qwen2.5-{0.5B, 1.5B, 3B}、Qwen3-{0.6B, 1.7B, 4B})。据我们所知,这些模型此前尚未通过任何开源软件栈部署在AMD NPU上。每次部署在0.5-4小时的智能体挂钟时间内完成,几乎无需人类引导,并通过数值正确性门控,展示了对先前未见LLM的功能泛化能力。其中八个中的三个达到或超过了我们Llama-3.2-1B参考部署的持续性能,表明所得实现无需额外模型特定人工工程即可具有竞争力。

英文摘要

Spatial neural processing units (NPUs) provide an energy-efficient platform for edge LLM inference, but efficiently deploying an LLM end-to-end on such hardware remains labor-intensive. Although AI coding agents have begun to lower this cost, existing studies have largely focused on single-kernel optimization rather than end-to-end LLM deployment on resource-constrained spatial NPUs. We present a two-stage methodology, instantiated on the AMD XDNA 2 NPU, that progresses from human-guided development to agent autonomy. In the first stage, we develop a reference deployment of Llama-3.2-1B through human-guided agent assistance. The resulting implementation achieves a speedup of 2.2x on prefill and 4.0x on decode over the hand-optimized baseline, with the optimization trajectory and its lessons recorded as structured documentation throughout. In the second stage, we distill the documentation into an agent skill system consisting of eight phases, orchestrating the optimization and debugging skill sets, with numerical correctness strictly enforced at each phase. Using our agent skill system, we autonomously deploy eight additional decoder-only LLMs (Llama-3.2-3B, SmolLM2-1.7B, Qwen2.5-{0.5B, 1.5B, 3B}, Qwen3-{0.6B, 1.7B, 4B}) end-to-end on the AMD XDNA 2 NPU using the open-source compiler stack. To our knowledge, these models have not previously been deployed on AMD NPUs via any open-source software stack. Each deployment completes in 0.5-4 hours of agent wall time with almost no human guidance, and passes the numerical-correctness gates, demonstrating functional generalization to previously unencountered LLMs. Three of the eight match or exceed the sustained performance of our Llama-3.2-1B reference deployment, suggesting that the resulting implementations can be competitive without additional model-specific human engineering.

2606.09079 2026-06-10 cs.LG cs.AI 版本更新

FlashMemory-DeepSeek-V4: Lightning Index Ultra-Long Context via Lookahead Sparse Attention

FlashMemory-DeepSeek-V4: 通过前瞻稀疏注意力实现闪电索引超长上下文

Yan Wang, Qifan Zhang, Jiachen Yu, Tian Liang, Dongyang Ma, Xiang Hu, Zibo Lin, Chunyang Li, Zhichao Wang, Miao Peng, Nuo Chen, Jia Li, Yujiu Yang, Haitao Mi, Dong Yu

发表机构 * Independent Researchers(独立研究者) Tencent(腾讯) The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) Tsinghua University(清华大学)

AI总结 提出前瞻稀疏注意力(LSA),基于DeepSeek-V4架构的神经记忆索引器,通过预测未来上下文需求仅保留关键KV块,在超长上下文场景下将物理KV缓存压缩至全上下文的13.5%,同时保持或略微提升下游准确率。

Comments Technical report. 11 pages. Code and model available at https://github.com/libertywing/FlashMemory-Deepseek-V4 and https://huggingface.co/libertywing/FlashMemory-Deepseek-V4

详情
AI中文摘要

传统大语言模型在解码过程中保持完整的KV缓存,导致超长上下文服务出现严重的GPU内存瓶颈。在本报告中,我们提出前瞻稀疏注意力(LSA),一种基于DeepSeek-V4架构构建的神经记忆索引器驱动的新型推理范式。LSA并非被动地关注所有历史令牌,而是主动预测未来的上下文需求,并仅在GPU内存中保留查询关键的KV块。关键的是,我们通过无骨干的解耦训练策略实例化该架构。通过将索引器制定为标准双编码器架构,我们使用标准检索训练框架独立训练它,而无需将庞大的骨干模型加载到GPU内存中。我们证明这种“少即是多”的范式显著最大化服务效率,同时在依赖长期全局记忆的任务中充当有效的注意力去噪器。在主要的长上下文评估套件(例如LongBench-v2、LongMemEval和RULER)中,FM-DS-V4将平均物理KV缓存占用压缩至全上下文基线的仅13.5%,同时一致地保持或略微提升下游准确率(平均绝对边际+0.6%)。关键的是,在极端500K规模下,FlashMemory将物理KV缓存开销抑制超过90%,而不会破坏骨干的核心推理能力。

英文摘要

Conventional LLMs keep the full KV cache loaded during decoding, causing a severe GPU memory bottleneck for ultra-long context serving. In this report, we propose Lookahead Sparse Attention (LSA), a novel inference paradigm powered by a Neural Memory Indexer built upon the DeepSeek-V4 architecture. Rather than passively attending to all historical tokens, LSA proactively predicts future context demands and preserves only the query-critical KV chunks in the GPU memory. Crucially, we instantiate this architecture via a backbone-free decoupled training strategy. By formulating the indexer as a standard dual-encoder architecture, we train it independently using standard retrieval training frameworks without ever loading the massive backbone model into GPU memory. We demonstrate that this "less is more" paradigm significantly maximizes serving efficiency while acting as an effective attention denoiser in tasks that rely on long-term global memory. Across primary long-context evaluation suites (e.g., LongBench-v2, LongMemEval, and RULER), FM-DS-V4 compresses the average physical KV cache footprint down to merely 13.5% of the full-context baseline, while consistently preserving or slightly elevating downstream accuracy (+0.6% absolute margin on average). Crucially, at extreme 500K scales, FlashMemory suppresses the physical KV cache overhead by over 90% without destabilizing the backbone's core reasoning capacities.

2606.09377 2026-06-10 cs.LG cs.AI 版本更新

Scaling Neural Network Verification with Tensor Parallelism and Fully Sharded Data Parallelism

利用张量并行和全分片数据并行扩展神经网络验证

Sergei Vorobyov, Eugene Ilyushin

发表机构 * Lomonosov Moscow State University(莫斯科国立大学) Central University(中央大学)

AI总结 针对神经网络形式化验证中GPU内存瓶颈,将张量并行(TP)和全分片数据并行(FSDP)适配到auto_LiRPA/α,β-CROWN框架,TP实现约2倍峰值内存降低但边界紧度下降,FSDP实现80-90%基础内存降低且边界与单GPU逐位一致,并支持完整验证和卷积层。

详情
AI中文摘要

形式化神经网络验证——证明网络对于指定域内所有输入满足安全属性——在实践中受限于GPU内存:边界传播算法(IBP、CROWN、α-CROWN)的标准实现要求权重矩阵和松弛系数矩阵完全驻留在单个加速器上。我们将最初为大规模模型训练开发的两种并行技术适配到auto_LiRPA/α,β-CROWN验证框架。张量并行(TP)将权重矩阵和A矩阵分片到多个GPU上,在P=2时实现约2倍的峰值内存降低;在VNN-COMP 2022 MNIST-FC基准测试上确认了正确性,但由于分片区域内中间边界被迫使用IBP替代,边界紧度随分片区域数量增加而下降。全分片数据并行(FSDP)仅对权重矩阵进行分片,并逐层使用AllGather,产生的边界与单GPU基线逐位一致:在宽MLP上,基础内存降低80-90%,峰值内存降低34-39%。FSDP与完整验证(β-CROWN + 分支定界)和卷积层(BoundConv)无缝集成;在FSDP下,CIFAR-100 ResNet-large(VNN-COMP 2024)获得了完整的不可满足结果。在所有实验中,α-CROWN+BaB模式下的内存瓶颈被证明是每个神经元的alpha张量,而非权重矩阵,这指出了未来工作的关键方向。

英文摘要

Formal neural network verification -- proving that a network satisfies safety properties for *all* inputs in a specified domain -- is bounded in practice by GPU memory: standard implementations of bound-propagation algorithms (IBP, CROWN, $α$-CROWN) require weight and relaxation-coefficient matrices to reside entirely on one accelerator. We adapt two parallelism techniques originally developed for large-scale model training to the auto_LiRPA / $α,β$-CROWN verification framework. Tensor Parallelism (TP) shards both weight and $A$-matrices across GPUs, achieving ${\approx}2\times$ peak-memory reduction at $P{=}2$; soundness is confirmed on VNN-COMP 2022 MNIST-FC benchmarks, though bound tightness degrades with the number of sharded zones due to forced IBP substitution for intermediate bounds inside sharded zones. Fully Sharded Data Parallelism (FSDP) shards only weight matrices with a per-layer AllGather, producing bounds that are bitwise identical to the single-GPU baseline: baseline memory drops by 80--90%, peak memory by 34--39% on wide MLPs. FSDP integrates cleanly with complete verification ($β$-CROWN + Branch-and-Bound) and with convolutional layers (BoundConv); a complete unsat result is obtained for CIFAR-100 ResNet-large (VNN-COMP 2024) under FSDP. Across all experiments the memory bottleneck in $α$-CROWN+BaB mode proves to be per-neuron alpha tensors, not weight matrices, pointing to the key direction for future work.

2506.14753 2026-06-10 cs.CV cs.LG 版本更新

Cost-Aware Routing for Efficient Text-To-Image Generation

面向文本到图像生成的高效路由:成本感知方法

Qinchan Li, Kenneth Chen, Changyue Su, Wittawat Jitkrittum, Qi Sun, Patsorn Sangkloy

发表机构 * Tandon School of Engineering, New York University(纽约大学Tandon工程学院) Google(谷歌) Eigen 4D Inc.(Eigen 4D公司)

AI总结 提出成本感知路由框架,根据提示复杂度自动选择不同去噪步数或模型,在保证高质量的同时降低计算成本,优于单一模型。

Comments Accepted by TMLR

详情
AI中文摘要

扩散模型以其通过迭代去噪过程为输入提示生成高保真图像的能力而闻名。不幸的是,由于固有的顺序生成过程,高保真度也伴随着高计算成本。在这项工作中,我们寻求在质量和计算成本之间实现最优平衡,并提出一个框架,允许每个提示的计算量根据其复杂度而变化。每个提示自动路由到最合适的文本到图像生成函数,该函数可能对应扩散模型的不同去噪步数,或一个不同的、独立的文本到图像模型。与统一的成本降低技术(例如,蒸馏、模型量化)不同,我们的方法通过学习将昂贵的选择(例如,100+去噪步)仅保留给少数复杂提示,而对较简单的提示采用更经济的选择(例如,小型蒸馏模型),从而实现最优权衡。我们在COCO和DiffusionDB上经验性地证明,通过学习路由到九个已训练的文本到图像模型,我们的方法能够提供比这些模型单独使用时更高的平均质量。代码可在以下网址获取:https://this URL。

英文摘要

Diffusion models are well known for their ability to generate a high-fidelity image for an input prompt through an iterative denoising process. Unfortunately, the high fidelity also comes at a high computational cost due to the inherently sequential generative process. In this work, we seek to optimally balance quality and computational cost, and propose a framework to allow the amount of computation to vary for each prompt, depending on its complexity. Each prompt is automatically routed to the most appropriate text-to-image generation function, which may correspond to a distinct number of denoising steps of a diffusion model, or a disparate, independent text-to-image model. Unlike uniform cost reduction techniques (e.g., distillation, model quantization), our approach achieves the optimal trade-off by learning to reserve expensive choices (e.g., 100+ denoising steps) only for a few complex prompts, and employ more economical choices (e.g., small distilled model) for less sophisticated prompts. We empirically demonstrate on COCO and DiffusionDB that by learning to route to nine already-trained text-to-image models, our approach is able to deliver an average quality that is higher than that achievable by any of these models alone. Code is available at https://github.com/winglicopy/CATImage.

2605.18271 2026-06-10 cs.CL cs.AI cs.IR cs.LG 版本更新

From Volume to Value: Preference-Aligned Memory Construction for On-Device RAG

从体积到价值:面向设备端RAG的偏好对齐记忆构建

Changmin Lee, Jaemin Kim, Taesik Gong

发表机构 * Department of Computer Science and Engineering, Ulsan National Institute of Science and Technology (UNIST), Ulsan, Republic of Korea(计算机科学与工程系,全州国立科学与技术研究所(UNIST),全州,韩国)

AI总结 本文提出EPIC方法,通过将用户偏好作为紧凑且稳定的个人上下文形式,整合到RAG流程中,以在有限内存下提高检索与用户偏好的对齐度,从而减少内存使用并提升准确性。

Comments Accepted to ICML 2026. Code and data are available at https://github.com/UbiquitousAILab/EPIC

详情
AI中文摘要

随着基于大型语言模型(LLMs)的个人AI代理的迅速发展,将其部署到设备上已成为隐私和响应性的重要需求。为了处理现实世界请求中固有的个人和上下文依赖性,这些代理必须基于设备上存储的个人上下文进行生成。然而,在内存预算紧张的情况下,核心瓶颈是存储什么内容以确保检索与用户保持一致。我们提出EPIC(高效偏好对齐索引构建),专注于用户偏好作为紧凑且稳定的个人上下文形式,并在整个RAG流程中整合它们。EPIC会选择性地保留与偏好相关的信息,并将检索对准偏好对齐的上下文。在四个涵盖对话、辩论、解释和推荐的基准测试中,EPIC将索引内存减少了2,404倍,提高了偏好遵循的准确性20.17个百分点,并在最佳表现基线之上实现了33.33倍更低的检索延迟。在我们的设备端实验中,EPIC在29.35毫秒/查询的流式更新下保持内存占用低于1 MB。

英文摘要

With the rapid emergence of personal AI agents based on Large Language Models (LLMs), implementing them on-device has become essential for privacy and responsiveness. To handle the inherently personal and context-dependent nature of real-world requests, such agents must ground their generation in device-resident personal context. However, under tight memory budgets, the core bottleneck is what to store so that retrieval remains aligned with the user. We propose EPIC (Efficient Preference-aligned Index Construction), which focuses on user preferences as a compact and stable form of personal context and integrates them throughout the RAG pipeline. EPIC selectively retains preference-relevant information from raw data and aligns retrieval toward preference-aligned contexts. Across four benchmarks covering conversations, debates, explanations, and recommendations, EPIC reduces indexing memory by 2,404 times, improves preference-following accuracy by 18.79 %p, and achieves 32.17 times lower retrieval latency over the best-performing baseline. In on-device experiments, EPIC maintains under 1 MB memory and achieves 5.21 to 29.35 ms/query latency across three platforms, while supporting streaming updates under preference drift. Our code and data are available at https://github.com/UbiquitousAILab/EPIC.

7. 联邦学习、隐私与安全 14 篇

2606.09868 2026-06-10 cs.LG cs.AI 新提交

SPACE: Source-free Proxy Anchor Concept Erasure for MLLMs

SPACE: 面向多模态大语言模型的无源代理锚点概念擦除

Zhijing Zhang, Jiaqi Ding, Qianshan Wei, Nan Zhou, Jiaqi Li, Yongliang Wu, Tongxin Zhu, Xiaolin Fang

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出SPACE框架,首个针对多模态大语言模型的无源机器遗忘方法,通过文本引导的代理锚点选择和双约束语义隔离,在不访问目标数据的情况下擦除概念,并保持模型性能。

详情
AI中文摘要

随着多模态大语言模型(MLLMs)面临日益增长的隐私风险和监管约束,机器遗忘(MU)已成为在保持模型性能的同时移除敏感数据的关键解决方案。然而,现有的MU方法通常依赖于目标概念的视觉数据,由于严格的数据保留政策,这些数据往往不可用,因此需要一种无需访问目标数据的无源遗忘方法。在这项工作中,我们提出了无源代理锚点概念擦除(SPACE),这是首个专门针对MLLMs的无源遗忘框架。SPACE包含两个阶段:(1)文本引导的代理锚点选择(TPAS),从共享特征空间中检索语义对齐的代理锚点;(2)双约束语义隔离(DCSI),优化这些锚点以间接擦除目标概念。DCSI将更新限制在保留知识的零空间中,确保结构完整性。我们从理论上证明,SPACE严格限制了保留知识的扰动,并最大化特征谱熵,从而保持模型性能。此外,在六个数据集上的大量实验表明,SPACE实现了与最先进的数据依赖方法相当的性能,验证了其在无源MU场景中的有效性。源代码将发布。

英文摘要

As Multimodal Large Language Models (MLLMs) face growing privacy risks and regulatory constraints, machine unlearning (MU) has emerged as a crucial solution for removing sensitive data while preserving model performance. However, existing MU methods typically rely on visual data of the target concepts, which is often unavailable due to strict data retention policies, thus creating a demand for source-free unlearning approaches that operate without access to the target data. In this work, we propose Source-free Proxy Anchor Concept Erasure (SPACE), the first source-free unlearning framework specialized for MLLMs. SPACE consists of two stages: (1) Text-Guided Proxy Anchor Selection (TPAS), which retrieves semantically aligned proxy anchors from the shared feature space. (2) Dual-Constraint Semantic Isolation (DCSI), which optimizes these anchors to indirectly erase target concepts. DCSI confines updates to the null space of retained knowledge, ensuring structural integrity. We theoretically prove that SPACE strictly bounds the perturbation on retained knowledge and maximizes feature spectral entropy, thereby maintaining the model's performance. Furthermore, extensive experiments across six datasets show that SPACE achieves performance comparable to that of state-of-the-art data-dependent methods, validating its effectiveness in source-free MU scenarios. The source code will be released.

2606.09869 2026-06-10 cs.LG cs.AI cs.CR 新提交

QSplitFL: Capability Aware Deep Q-Learning for Optimal Split Point Selection in Split Federated Learning

QSplitFL: 基于能力感知的深度Q学习在分割联邦学习中的最优分割点选择

Nazmus Shakib Shadin, Xinyue Zhang, Jingyi Wang, Miao Pan

AI总结 提出QSplitFL框架,利用深度Q网络基于客户端硬件指标(CPU、内存、电池、网络延迟)动态选择最优分割点,解决异构设备上的分割联邦学习挑战,通过衰减损失奖励函数和委员会投票机制提升收敛速度和精度。

Comments Accepted by ECML-PKDD 2026

详情
AI中文摘要

联邦学习(FL)与分割学习(SL)结合是一种隐私保护范式,能够在资源受限设备上训练深度神经网络(DNN),同时降低整体训练成本。然而,确定最优分割点(即模型被分割的层)仍然是一个关键挑战,尤其是当客户端具有异构硬件能力时。固定分割点可能使弱设备过载,增加通信和服务器负载,从而减慢收敛速度并降低稳定性。本文介绍了QSplitFL,一种新颖的基于能力感知的深度Q网络(DQN)框架,用于在基于分割学习的联邦学习(SFL)环境中选择最优分割点。与依赖高维模型权重表示的现有方法不同,QSplitFL采用直接从客户端硬件指标(包括CPU利用率、内存、电池电量和网络延迟)导出的轻量级状态表示。所提出的框架包含一个衰减损失下降奖励函数,优先考虑早期收敛,以及一个基于委员会的DQN架构,通过多数投票来减轻奖励黑客攻击。在MNIST、Fashion-MNIST、CIFAR-10和CIFAR-100数据集上,使用CNN、ResNet50、MobileNetV4和ConvNeXt架构进行的广泛实验表明,我们的方法在收敛速度和精度上优于现有方法,同时有效适应异构设备资源。源代码在此https URL公开可用。

英文摘要

Federated Learning (FL) combined with Split Learning (SL) is a privacy preserving paradigm that enables training deep neural networks (DNNs) on resource constrained devices while reducing overall training cost. However, determining the optimal split point, meaning the layer where the model is divided still remains a critical challenge, especially when clients have heterogeneous hardware capabilities. Fixed split points can overload weak devices and increase the communication and server load, which slows convergence and reduces stability. This paper introduces QSplitFL, a novel capability-aware Deep Q-Network (DQN) framework for optimal split point selection in Split learning based Federated Learning (SFL) environments. Unlike existing approaches that rely on high-dimensional model weight representations, QSplitFL employs a lightweight state representation derived directly from client hardware metrics, including CPU utilization, memory, battery level, and network latency. The proposed framework incorporates a decayed loss-drop reward function that prioritizes early convergence, and a committee-based DQN architecture with majority voting to mitigate reward hacking. Extensive experiments on MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100 datasets using CNN, ResNet50, MobileNetV4, and ConvNeXt architectures demonstrate that our approach achieves better convergence and higher accuracy compared to existing methods, while effectively adapting to heterogeneous device resources. The source code is publicly available at https://github.com/AIPO-Lab/QSplitFL.

2606.10124 2026-06-10 cs.LG cs.AI 新提交

FedSteer: Taming Extreme Gradient Staleness in Federated Learning with Corrective Projections and Caching

FedSteer: 通过校正投影和缓存驯服联邦学习中的极端梯度陈旧性

Haoran Zhang, Cainã Figueiredo Pereira, Marie Siew, Xutong Liu, Carlee Joe-Wong, Rachid El-Azouzi

AI总结 针对联邦学习中客户端参与不均导致的梯度陈旧问题,提出FedSteer方法,利用客户端梯度缓存构建子空间,通过投影和缓存策略校正陈旧梯度,显著提升训练稳定性与精度。

Comments UAI 2026

详情
AI中文摘要

联邦学习(FL)在客户端不持续参与训练轮次时,常遭受聚合方差的影响。虽然重用非活跃客户端的陈旧模型更新是减少这种方差的常见技术,但我们发现,在客户端参与偏斜的情况下,由此产生的更新陈旧性可能变得严重到足以破坏训练稳定性。为了解决这个问题,我们提出了FedSteer,一种新颖的方法,该方法从最近客户端梯度的缓存中构建一个梯度子空间,作为当前优化景观的低维表示。FedSteer将活跃客户端的真实梯度投影到这个子空间上,以找到一组最优坐标。对于非活跃客户端,FedSteer重用这些坐标,并结合由其他活跃客户端漂移的已演化的子空间。这个过程有效地将过时的梯度“引导”向当前的全局目标。此外,还辅以选择性缓存策略,识别代表性客户端子集以形成子空间,从而减少服务器内存。实验表明,FedSteer显著优于基线,在挑战性场景中防止性能崩溃,并在其他场景中实现超过7%的精度提升。

英文摘要

Federated learning (FL) is often subject to aggregation variance if clients do not consistently participate in training rounds. While reusing stale model updates from inactive clients is a common technique to reduce this variance, we find that with skewed client participation, the resulting update staleness can become severe enough to destabilize training. To remedy this, we propose FedSteer, a novel method that constructs a gradient subspace from a cache of recent client gradients to serve as a low-dimensional representation of the current optimization landscape. FedSteer projects an active client's true gradient onto this subspace to find a set of optimal coordinates. For an inactive client, FedSteer reuses these coordinates with the now-evolved subspace drifted by other active clients. This process effectively "steers" outdated gradients toward the current global objective. This is complemented by a selective caching strategy that identifies a representative client subset to form the subspace, reducing server memory. Experiments demonstrate that FedSteer significantly outperforms baselines, preventing performance collapse in challenging scenarios while delivering accuracy gains of over 7% in others.

2606.10217 2026-06-10 cs.LG cs.CR 新提交

Alignment Defends LLMs from Property Inference Attacks

对齐防御LLM免受属性推断攻击

Pengrun Huang, Chhavi Yadav, Ruihan Wu, Kamalika Chaudhuri

AI总结 提出基于对齐的防御方法,通过后训练调整模型输出分布,在不修改训练数据的情况下缓解属性推断攻击,并保持效用与机密性的平衡。

详情
AI中文摘要

大型语言模型(LLMs)越来越多地在包含敏感数据集级属性的领域特定数据集上进行微调。最近的研究表明,此类数据集级信息可以通过属性推断攻击有效提取,构成机密性风险。现有的防御措施主要通过修改训练数据分布来运作,因此需要访问原始数据并重新训练模型,限制了其在数据不可用或模型已部署场景中的适用性。在这项工作中,我们提出了基于对齐的防御方法,用于缓解LLMs中的属性推断攻击。我们的方法通过后训练对齐将模型的输出分布重塑为目标属性比率,而无需修改训练数据。具体而言,我们通过构建偏好对和定义特定奖励函数,分别适配两种广泛使用的RLHF框架——直接偏好优化(DPO)和组相对策略优化(GRPO)——作为我们的防御方法。通过全面的实验,我们展示了基于对齐的防御方法有效缓解了属性推断攻击,同时保持了良好的效用-机密性权衡。

英文摘要

Large language models (LLMs) are increasingly fine-tuned on domain-specific datasets that may contain sensitive, dataset-level properties. Recent work has shown that such dataset-level information can be effectively extracted through property inference attacks, posing a confidentiality risk. Existing defenses against these attacks primarily operate by modifying the training data distribution and hence require access to the original data and retraining the model, limiting their applicability to settings where data is unavailable or models are already deployed. In this work, we propose alignment-based defenses for mitigating property inference attacks in LLMs. Our approach reshapes the model's output distribution towards a target property ratio via post-training alignment, without modifying the training data. In particular, we adapt two widely used RLHF frameworks--Direct Preference Optimization (DPO) and Group Relative Policy Optimization (GRPO)--as our defenses by constructing preference pairs and defining a specific reward function respectively. Through comprehensive experiments, we show that our alignment based defenses effectively mitigate property inference attacks while maintaining a strong utility confidentiality tradeoff.

2606.10250 2026-06-10 cs.LG cs.AI 新提交

Multi-Level Analyzation of Imbalance to Resolve Non-IID-Ness in Federated Learning

联邦学习中不平衡的多层次分析以解决非独立同分布问题

Haengbok Chung, Jae Sung Lee

发表机构 * Interdisciplinary Program in Artificial Intelligence, Seoul National University, Republic of Korea(人工智能跨学科项目,首尔国立大学,韩国) Department of Nuclear Medicine, Seoul National University College of Medicine, Republic of Korea(核医学系,首尔国立大学医学院,韩国) Brightonix Imaging Inc. Seoul, Republic of Korea(Brightonix影像公司,首尔,韩国)

AI总结 提出FedBB算法,通过PNB损失函数和CBR重加权分别解决本地训练中的类内/类间不平衡和客户端间不平衡,在X射线和自然图像数据集上优于现有方法。

Comments 27 pages, 5 figures, 13 tables. Accepted for publication in Neurocomputing (2025). Author Accepted Manuscript

详情
Journal ref
Neurocomputing, Volume 626, 2025, Article 129528
AI中文摘要

类别不平衡是深度学习中常见的问题,会严重降低性能。在联邦学习(FL)中,它是导致非独立同分布数据(non-IID)的关键因素。基于先前的一些尝试,我们在三个层次上定义并分析了FL中的不平衡问题:案例间、类别间和客户端间。案例间不平衡处理每个单一类别内的不平衡;类别间不平衡比较不同类别之间的数据数量。客户端间不平衡表示不同客户端之间本地数据的偏斜程度。基于这些概念,我们提出了FedBB,它由两个主要部分组成:(1)正负平衡(PNB)损失函数解决了本地训练中的案例间和类别间不平衡,增强了高度偏斜的本地客户端数据集上的泛化能力。它通过为少数案例或类别分配更高的权重来优化多标签和多类分类。(2)客户端平衡重加权(CBR)在模型聚合期间根据客户端间不平衡重新加权客户端,为在偏斜较小的数据集上训练的模型赋予更大的权重。在X射线和自然图像数据集上的各种实验表明,FedBB在性能和效率上均优于其他算法。此外,它只需要有限的统计信息,这有利于隐私保护。通过消融研究,我们证明了PNB损失和CBR独立地贡献于性能。由于FedBB旨在构建一个能准确分类所有类别的全局模型,它可以作为通用和个性化FL的基线。

英文摘要

Class imbalance is a common problem in deep learning that severely degrades performance. In federated learning (FL), it is a critical factor contributing to non-identically distributed data (non-IID). Building on several previous attempts, we define and analyze imbalance issues in FL at three levels: inter-case, inter-class, and inter-client. Inter-case imbalance addresses the imbalance in every single class; inter-class imbalance compares the number of data between different classes. Inter-client imbalance represents different skewness of local data between clients. Based on these concepts, we propose FedBB, which consists of two main components: (1) Positive Negative Balanced (PNB) loss function addresses the inter-case and inter-class imbalances in local training, enhancing generalization on highly skewed local client datasets. It optimizes both multi-label and multi-class classifications by assigning higher weights to minority cases or classes. (2) Client Balanced Reweighting (CBR) reweights clients based on inter-client imbalance during model aggregation, giving greater weight to models trained on less skewed datasets. Various experiments on X-ray and natural image datasets demonstrate that FedBB outperforms other algorithms in both performance and efficiency. Additionally, it requires limited statistical information, which is beneficial for privacy protection. Through ablation studies, we proved that PNB loss and CBR independently contribute to performance. As FedBB aims to build a global model that accurately classifies all classes, it can serve as a baseline for the generic and personalized FL.

2606.10333 2026-06-10 cs.LG cs.CR 新提交

Privacy-Preserving Credit Risk Prediction with Alternative Data

基于替代数据的隐私保护信用风险预测

Hongzhe Zhang, Jiarong Xu, Jing He, Xiao Fang

发表机构 * School of Management and Economics, The Chinese University of Hong Kong, Shenzhen (CUHK-Shenzhen)(香港中文大学(深圳)管理学院) School of Management, Fudan University(复旦大学管理学院) Lerner College of Business and Economics, University of Delaware(特拉华大学勒纳商学院)

AI总结 针对信用风险预测中替代数据共享导致的隐私泄露问题,提出PrivacyCredit方法,在保护消费者隐私、模型保密性和无损性能约束下,实现与传统明文数据组合相同的预测性能。

详情
AI中文摘要

信用风险预测是消费信贷行业中的一个关键问题。传统上,金融机构使用借款人的人口统计、财务和信用历史数据(统称为传统数据)构建信用风险预测模型。最近的研究表明,替代数据(如借款人的手机通信数据)使贷款人能够获得更全面、更准确的借款人信用状况画像,从而提高信用风险预测性能。然而,替代数据由独立于金融机构的外部实体持有。直接与金融机构共享替代数据会侵犯消费者隐私,但现有的信用风险预测研究大多忽略了这一问题。为填补这一空白,我们定义了一个新问题,即基于替代数据的隐私保护信用风险预测,该问题同时考虑三个实际约束:保护消费者隐私的隐私保护约束、在金融机构集中学习和存储模型的模型保密性约束,以及保持学习模型性能的无损约束。为解决该问题,我们开发了PrivacyCredit,一种新颖的隐私保护机器学习方法。然后,我们从理论上证明了PrivacyCredit的隐私保护、模型保密和无损特性。通过使用与替代数据关联的真实信用数据集进行大量实验,我们证明了安全地将替代数据纳入信用风险预测的预测价值,并表明PrivacyCredit实现了与从传统数据和替代数据的不安全明文组合中学习的模型相同的预测性能。我们进一步评估了其模型保密性和计算效率。

英文摘要

Credit risk prediction is a critical problem in the consumer credit industry. Traditionally, financial institutions construct credit risk prediction models using borrowers' demographic, financial, and credit history data, collectively referred to as traditional data. Recent studies have demonstrated that alternative data, such as borrowers' mobile phone communication data, enable lenders to acquire fuller and more accurate profiles of borrowers' creditworthiness, thereby improving credit risk prediction performance. Nevertheless, alternative data are held by external entities independent of financial institutions. Directly sharing alternative data with financial institutions infringe on consumer privacy, yet existing credit risk prediction studies largely overlook this issue. To address this gap, we define a new problem, namely privacy-preserving credit risk prediction with alternative data, which simultaneously considers three practical constraints: the privacy-preserving constraint that protects consumer privacy, the model-confidentiality constraint that learns and stores the model centrally at the financial institution, and the lossless constraint that maintains the performance of the learned model. To solve this problem, we develop PrivacyCredit, a novel privacy-preserving machine learning method. We then theoretically demonstrate the privacy-preserving, model-confidential, and lossless properties of PrivacyCredit. Through extensive experiments using a real-world credit dataset linked with alternative data, we demonstrate the predictive value of securely incorporating alternative data into credit risk prediction and show that PrivacyCredit achieves the same predictive performance as the model learned from the insecure plaintext combination of traditional and alternative data. We further evaluate its model-confidentiality property and computational efficiency.

2606.10481 2026-06-10 cs.LG cs.AI cs.CL cs.CR stat.ML 新提交

Advancing the State-of-the-Art in Empirical Privacy Auditing

推进经验隐私审计的最新水平

Nicole Mitchell, Galen Andrew, Arun Ganesh, Brendan McMahan, Peter Kairouz

发表机构 * Google Research(谷歌研究院)

AI总结 提出通过高温采样生成合成金丝雀,用于经验隐私审计,并引入基于辅助模型的合成数据审计方法,系统研究模型容量与金丝雀熵对记忆化的交互影响。

详情
AI中文摘要

大型语言模型的参数高效微调可能表现出对个别训练示例的问题性记忆。经验隐私审计(EPA)通过测量成员推断(MI)或重构攻击上的实际数据泄露来量化这种风险。EPA的一个关键挑战是设计与隐私敏感训练数据混合的“金丝雀”示例。我们提出通过从LLM中进行高温采样($T \geq 0.8$)生成合成金丝雀,使用针对隐私敏感训练数据定制的提示。这些金丝雀作为高影响异常值,确保高可识别性,从而实现强审计。此外,由于金丝雀本身是非私有的,它们是可检查的,并且可以重复插入,而不会危及真实数据的隐私。在隐私敏感数据上微调的模型的一个重要用途是生成合成数据。这也带来了隐私风险。我们引入了一种强大的合成数据审计方法,基于在合成数据上微调辅助模型。然后,对原始金丝雀的辅助模型进行审计,可以强有力地估计通过合成数据的隐私泄露。最后,利用我们强大的审计方法,我们系统研究了模型容量和金丝雀熵对记忆化的交互影响。

英文摘要

Parameter-efficient fine-tuning of large language models (LLMs) can exhibit problematic memorization of individual training examples. Empirical privacy auditing (EPA) quantifies this risk by measuring realistic data leakage on membership inference (MI) or reconstruction attacks. A key challenge in EPA is designing ``canary'' examples that are mixed with the privacy-sensitive training data. We propose generating synthetic canaries via high-temperature sampling ($T \geq 0.8$) from LLMs, using prompts tailored to the privacy-sensitive training data. These canaries act as high-influence outliers, ensuring high identifiability and hence strong audits. Further, since the canaries are themselves non-private, they are inspectable and can be inserted with repetition without jeopardizing the privacy of the real data. An important use of models fine-tuned on privacy-sensitive data is the generation of synthetic data. This also comes with privacy risk. We introduce a powerful synthetic data audit based on fine-tuning an auxiliary model on the synthetic data. Auditing the auxiliary model for the original canaries then provides a strong estimate of the privacy leakage through the synthetic data. Finally, leveraging our strong auditing methodologies, we perform a systematic investigation into the interacting effects of model capacity and canary entropy on memorization.

2606.10499 2026-06-10 cs.LG cs.AI 新提交

MoE Enhanced Federated Learning for Spatiotemporal Prediction

基于混合专家模型增强的联邦学习用于时空预测

Zhehao Dai, Xiao Han, Zhaolin Deng, Zijian Zhang, Xiangyu Zhao, Guojiang Shen, Xiangjie Kong

发表机构 * Zhejiang University of Technology, Zhejiang Key Laboratory of Visual Information Intelligent Processing(浙江工业大学,浙江省可视信息智能处理重点实验室) Jilin University(吉林大学) City University of Hong Kong(香港城市大学)

AI总结 提出MoE-FedTP框架,通过轻量级混合专家网络和门控机制,在保护隐私的同时实现跨城市时空预测,有效缓解数据稀缺和异质性问题。

详情
AI中文摘要

交通预测是智能交通系统和城市计算的基础,然而由于传感器部署有限和城市发展不均衡,许多城市仍然面临交通数据稀缺的问题。跨城市知识转移因此受到越来越多的关注,使数据丰富的城市能够帮助数据稀缺的城市。然而,集中式方法引发了隐私问题,而现有的联邦方法难以应对城市间显著的时空异质性。为了解决这些挑战,我们提出了MoE-FedTP,一种基于轻量级混合专家(MoE)网络的个性化联邦跨城市时空预测框架。MoE-FedTP首先利用时空神经网络从源城市和目标城市提取特征,然后通过部分参数共享引入来自不同源城市的专家网络集合。门控机制动态融合专家以捕捉多样的交通动态,在保护隐私的同时实现城市异质性的细粒度建模。在四个真实世界交通数据集上的实验表明,MoE-FedTP始终优于最先进的跨城市和联邦学习基线,证明了其在提高数据稀缺城市预测准确性方面的有效性。

英文摘要

Traffic prediction is fundamental to intelligent transportation systems and urban computing, yet many cities continue to suffer from traffic data scarcity due to limited sensor deployment and uneven urban development. Cross-city knowledge transfer has thus attracted increasing attention, enabling data-rich cities to assist data-scarce ones. However, centralized approaches raise privacy concerns, while existing federated methods struggle with pronounced spatiotemporal heterogeneity across cities. To address these challenges, we propose MoE-FedTP, a personalized federated cross-city spatiotemporal prediction framework based on lightweight Mixture-of-Experts (MoE) networks. MoE-FedTP first employs spatiotemporal neural networks to extract features from both source and target cities, then introduces a set of expert networks derived from different source cities through partial parameter sharing. A gating mechanism dynamically fuses the experts to capture diverse traffic dynamics, achieving fine-grained modeling of urban heterogeneity while preserving privacy. Experiments on four real-world traffic datasets show that MoE-FedTP consistently outperforms state-of-the-art cross-city and federated learning baselines, demonstrating its effectiveness in enhancing prediction accuracy for data-scarce cities.

2606.10091 2026-06-10 cs.CR cs.LG 交叉投稿

SoK: Colluding Adversaries in Machine Learning Pipelines

SoK: 机器学习流水线中的合谋攻击者

Vasisht Duddu, Lipeng He, Asim Waheed, N. Asokan

发表机构 * University of Waterloo(滑铁卢大学) KTH Royal Institute of Technology(皇家理工学院)

AI总结 本文提出一个系统框架,研究机器学习流水线中训练阶段与推理阶段攻击者之间的合谋行为,通过五个实证案例验证了合谋的潜在风险,并讨论了攻击者特征对合谋可能性的影响。

Comments USENIX Security Symposium, 2026

详情
AI中文摘要

机器学习模型容易受到各种安全、隐私和公平性风险的影响。具有不同特征(即目标、知识和能力)的攻击者可以通过执行一种攻击来放大其他攻击,从而进行合谋。现有工作缺乏一个系统框架来探索攻击者之间的合谋,以及研究攻击者特征的影响。我们提出了一个涵盖(a)训练阶段和推理阶段攻击者之间,以及(b)推理阶段攻击者之间的合谋框架。我们的框架考虑了促成攻击者之间合谋的因素。我们提出了一种指南,利用促成因素推测合谋的可能性。我们用它来解释先前的工作,推测未探索的合谋,并实证验证了五个这样的案例。最后,我们讨论了攻击者特征如何影响合谋的可能性。

英文摘要

Machine learning (ML) models are susceptible to various security, privacy, and fairness risks. Adversaries with different characteristics (i.e., objectives, knowledge, and capabilities) can collude by executing one attack to amplify others. Existing work lacks a systematic framework to explore collusion among adversaries, and to study the implications of the adversaries' characteristics. We present a framework covering collusion (a) between train- and inference-time adversaries, and (b) among inference-time adversaries. Our framework accounts for factors enabling collusion between adversaries. We propose a guideline to conjecture about the potential for collusion using enabling factors. We use it to explain prior work, conjecture about unexplored collusions, and empirically validate five such cases. Finally, we discuss how adversaries' characteristics influence the potential for collusion.

2606.10742 2026-06-10 cs.CR cs.LG 交叉投稿

MemVenom: Triggered Poisoning of Multimodal Memories in Web Agents

MemVenom:网络代理中多模态记忆的触发式投毒

Yv Zhang, Hao Sun, Hao Fang, Kuofeng Gao, Fan Mo, Bin Chen, Shu-Tao Xia, Yaowei Wang

发表机构 * Harbin Institute of Technology, Shenzhen(哈尔滨工业大学(深圳)) Peng Cheng Laboratory(鹏城实验室) Tianjin University(天津大学) Shenzhen International Graduate School, Tsinghua University(深圳国际研究生学院,清华大学) Huawei Technologies Ltd.(华为技术有限公司)

AI总结 提出MemVenom框架,针对网络代理的外部记忆系统,通过触发条件检索和攻击诱导,实现黑盒多模态记忆投毒,达到高成功率且不影响良性性能。

Comments Preprint. 27 pages, 6 figures, 6 tables

详情
AI中文摘要

外部记忆已成为现代网络代理的核心组件,通过检索过去经验实现长程推理。然而,这种范式引入了一个关键漏洞:注入记忆中的恶意内容可以被持续召回并反复影响代理行为。在这项工作中,我们识别并系统研究了多模态记忆投毒——网络代理系统中一个被忽视但实际存在的攻击面。我们提出MemVenom,一个统一的黑盒攻击框架,通过协调的文本-图像证据对图结构外部记忆进行投毒。我们的方法包括两阶段设计:(1) 触发条件检索攻击,确保恶意记忆的高概率召回;(2) 检索后攻击诱导,利用对抗性扰动和隐蔽OCR注入覆盖原始用户目标。与先前针对提示或纯文本记忆的攻击不同,我们的方法无需修改模型参数或重新优化恶意任务,即可实现持久、可重用且目标无关的攻击。在多个网络代理框架和视觉语言模型上的实验表明,MemVenom在最小化对良性性能影响的同时,实现了强大的端到端攻击成功率,在GPT-5系列网络代理上达到99.15%,并在不同架构和模型规模间有效迁移。

英文摘要

External memory has become a core component of modern web agents, enabling long-horizon reasoning through the retrieval of past experiences. However, this paradigm introduces a critical vulnerability: malicious content injected into memory can be persistently recalled and repeatedly influence agent behavior. In this work, we identify and systematically study multimodal memory poisoning, an overlooked yet practical attack surface in web-agent systems. We propose MemVenom, a unified black-box attack framework that poisons graph-structured external memory with coordinated text-image evidence. Our method consists of a two-stage design: (1) a trigger-conditioned retrieval attack that ensures high-probability recall of malicious memory, and (2) a post-retrieval attack induction that leverages adversarial perturbations and stealthy OCR injection to override the original user objective. Unlike prior attacks that operate on prompts or text-only memory, our approach enables persistent, reusable, and goal-agnostic attacks without modifying model parameters or re-optimizing malicious tasks. Experiments across multiple web-agent frameworks and vision-language models demonstrate that MemVenom achieves strong end-to-end attack success with minimal impact on benign performance, reaching up to 99.15% on GPT-5-family web agents, while transferring effectively across architectures and model scales.

2606.10780 2026-06-10 cs.IT cs.CR cs.LG math.IT 交叉投稿

Secure Aggregation with Top-K Sparsification in Decentralized Federated Learning

去中心化联邦学习中的Top-K稀疏化安全聚合

Hengxuan Tang, Jinbao Zhu, Xiaohu Tang

发表机构 * Southwest Jiaotong University(西南交通大学)

AI总结 针对去中心化联邦学习中带宽有限和节点不可靠的问题,提出一种结合Top-K梯度稀疏化的信息论安全聚合方案,通过离线阶段处理维度相关开销,使用随机掩码和排列保护梯度,在1%稀疏率下保持精度并显著降低通信成本。

Comments 6 pages, 1 figure, accepted to IEEE ISIT 2026

详情
AI中文摘要

安全聚合是联邦学习中缓解梯度泄露的关键组件,但其通信成本通常随梯度维度扩展。这对于大型模型变得难以承受,在带宽有限和节点不可靠的去中心化联邦学习中更为突出。Top-K梯度稀疏化是一种有效的通信减少方法,仅传输完整梯度的少数条目,同时保持有竞争力的模型精度。然而,每个用户选择的Top-K条目不可预测且因用户而异,这对高效的稀疏安全聚合构成了挑战。本文研究了去中心化联邦学习中存在用户退出和用户合谋时的信息论安全聚合与Top-K稀疏化。我们提出了一种通信高效的稀疏安全聚合方案,将维度相关的开销转移到离线阶段,并使用随机掩码和排列保护私有梯度。实验结果表明,即使仅使用1%的梯度稀疏化,我们的方案也能保持与全梯度聚合相当的精度,同时大幅降低通信成本。

英文摘要

Secure aggregation is a vital component for mitigating gradient leakage in federated learning, but its communication cost conventionally scales with the gradient dimension. This becomes prohibitive for large models and even more pronounced in decentralized federated learning with limited bandwidth and unreliable nodes. Top-K gradient sparsification is an effective approach to reduce communication by transmitting only a few entries of the full gradient, while maintaining competitive model accuracy. Nevertheless, the top-K entries selected by each user are unpredictable and vary across users, which poses a challenge for efficient sparse secure aggregation. This paper studies information-theoretic secure aggregation with top-K sparsification in decentralized federated learning under user dropouts and user collusion. We propose a communication-efficient sparse secure aggregation scheme that offloads dimension-dependent overhead to an offline phase and protects private gradients using random masks and permutations. Experimental results demonstrate that our scheme preserves accuracy comparable to full-gradient aggregation even with only 1% gradient sparsification, while substantially reducing the communication cost.

2606.10916 2026-06-10 stat.ML cs.LG math.ST stat.ME stat.TH 交叉投稿

Range Penalization: Theoretical Insights with Applications in Federated Learning

范围惩罚:理论洞见及其在联邦学习中的应用

Yiyuan She, Zhaojun Hu, Yifan Sun

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出范围正则化方法,通过极值聚类实现跨客户端正则化,并开发非渐近统计精度与模式恢复的新证明技术,以及利用局部强凸性的快速优化算法。

详情
AI中文摘要

本文针对具有线性系统组件的联邦学习引入范围正则化,以提高统计精度并诱导跨客户端正则性,从而有利于量化、编码和资源效率。我们的方法识别不同客户端之间共享权重的特征,并将个性化特征的权重自适应地聚类到极值,这一过程称为极值聚类。由于正则化子的半范数性质和不可分解性,相关估计量的理论分析面临重大挑战。我们为统计精度和忠实模式恢复的非渐近分析开发了新的证明技术。此外,提出了一种利用不同程度局部强凸性的快速优化算法,以降低迭代复杂度。实验支持了所提方法的有效性和效率。

英文摘要

This paper introduces range regularization for federated learning with linear systematic components to enhance statistical accuracy and induce cross-client regularity conducive to quantization, coding, and resource efficiency. Our approach identifies features with shared weights across different clients and adaptively clusters the weights of personalized features at extreme values, a process we refer to as polar clustering. Theoretical analysis of the associated estimators poses significant challenges due to the seminorm nature and non-decomposability of the regularizer. We develop new proof techniques for the nonasymptotic analysis of statistical accuracy and faithful pattern recovery. Moreover, a fast optimization algorithm that leverages varying degrees of local strong convexity is proposed to reduce iteration complexity. Experiments support the efficacy and efficiency of the proposed approach.

2409.04111 2026-06-10 cs.LG 版本更新

Active-Passive Federated Learning for Vertically Partitioned Multi-view Data

面向垂直分区多视角数据的主动-被动联邦学习

Jiyuan Liu, Siqi Wang, Xinhang Wan, Yi Zhang, Junsong Chen, Xin Lu, Xinwang Liu

发表机构 * National University of Defense Technology(国防科技大学)

AI总结 提出主动-被动联邦学习框架,主动客户端独立构建完整模型,被动客户端仅辅助训练,解决推理时客户端协作不可靠问题,通过重构损失和对比损失实例化两种分类方法并验证有效性。

详情
AI中文摘要

垂直联邦学习是一种自然且优雅的方法,用于集成跨设备(客户端)垂直分区的多视角数据,同时保护其隐私。除了模型训练,现有方法在模型推理时需要所有客户端的协作。然而,模型推理可能长期维持服务,而协作(尤其是当客户端属于不同组织时)在现实场景中不可预测,例如合同取消、网络不可用等,导致推理失败。为了解决这个问题,我们首次尝试提出了一种灵活的主动-被动联邦学习(APFed)框架。具体来说,主动客户端是学习任务的发起者,负责构建完整模型,而被动客户端仅作为辅助。一旦模型构建完成,主动客户端可以独立进行推理。此外,我们将APFed框架实例化为两种分类方法,分别在被动客户端上采用重构损失和对比损失。同时,这两种方法在一系列实验中进行了测试,并取得了理想的结果,验证了它们的有效性。

英文摘要

Vertical federated learning is a natural and elegant approach to integrate multi-view data vertically partitioned across devices (clients) while preserving their privacies. Apart from the model training, existing methods requires the collaboration of all clients in the model inference. However, the model inference is probably maintained for service in a long time, while the collaboration, especially when the clients belong to different organizations, is unpredictable in real-world scenarios, such as concellation of contract, network unavailablity, etc., resulting in the failure of them. To address this issue, we, at the first attempt, propose a flexible Active-Passive Federated learning (APFed) framework. Specifically, the active client is the initiator of a learning task and responsible to build the complete model, while the passive clients only serve as assistants. Once the model built, the active client can make inference independently. In addition, we instance the APFed framework into two classification methods with employing the reconstruction loss and the contrastive loss on passive clients, respectively. Meanwhile, the two methods are tested in a set of experiments and achieves desired results, validating their effectiveness.

2604.24012 2026-06-10 cs.LG math.OC 版本更新

FedSLoP: Memory-Efficient Federated Learning with Low-Rank Gradient Projection

FedSLoP: 基于低秩梯度投影的内存高效联邦学习

Yutong He, Zhengyang Huang, Jiahe Geng, Kun Yuan

发表机构 * Peking University(北京大学) Beihang University(北航)

AI总结 提出FedSLoP算法,结合随机低秩子空间投影降低通信和存储开销,理论证明以O(1/√NT)速率收敛到一阶稳定点,实验在异构MNIST上优于FedAvg等基线。

Comments 27 pages, 7 figures

详情
AI中文摘要

联邦学习使一组客户端能够在不交换原始数据的情况下协作训练机器学习模型,但标准算法如FedAvg在异构、资源受限的环境中收敛缓慢且通信和内存成本高。我们提出FedSLoP,一种联邦优化算法,它结合了梯度的随机低秩子空间投影,从而降低了通信和存储更新的维度,同时保持了优化进度。在理论方面,我们在标准光滑和有界方差假设下进行了详细的非凸收敛分析,表明FedSLoP保证以$O(1/\sqrt{NT})$的速率收敛到一阶稳定点。在实证方面,我们在具有异构数据分区的联邦MNIST分类上进行了大量实验,表明与FedAvg以及代表性的稀疏或低秩基线相比,FedSLoP显著减少了通信量和客户端内存,同时实现了具有竞争力或更好的准确率。总之,我们的结果表明,诸如FedSLoP之类的随机子空间动量方法为通信和内存高效的联邦学习提供了一种原则性和有效的方法。代码可在以下网址获得:this https URL。

英文摘要

Federated learning enables a population of clients to collaboratively train machine learning models without exchanging their raw data, but standard algorithms such as FedAvg suffer from slow convergence and high communication and memory costs in heterogeneous, resource-constrained environments. We introduce FedSLoP, a federated optimization algorithm that combines stochastic low-rank subspace projections of gradients, thereby reducing the dimension of communicated and stored updates while preserving optimization progress. On the theoretical side, we develop a detailed nonconvex convergence analysis under standard smoothness and bounded-variance assumptions, showing that FedSLoP is guaranteed to converge to a first-order stationary point at a rate of $O(1/\sqrt{NT})$. On the empirical side, we conduct extensive experiments on federated MNIST classification with heterogeneous data partitions, showing that FedSLoP substantially reduces communication volume and client-side memory while achieving competitive or better accuracy compared with FedAvg and representative sparse or low-rank baselines. Together, our results demonstrate that random subspace momentum methods such as FedSLoP provide a principled and effective approach to communication- and memory-efficient federated learning. Codes are available at: https://github.com/pkumelon/FedSLoP.git.

8. 鲁棒性、不确定性与可信学习 31 篇

2606.09850 2026-06-10 cs.LG cs.CL 新提交

Mechanistic Analysis of Alignment Algorithms in Language Models

语言模型中对齐算法的机制分析

Aarush Sinha, Ishan Garg, Veeraraju Elluru, Arth Singh, Kushal Garg

AI总结 本文通过层间线性探针、稀疏自编码器和交叉编码器,系统分析了六种偏好优化方法在语言模型中的内部机制,发现不同目标函数导致不同的表示几何变换,并揭示了行为对齐与内部结构变化的不一致性。

Comments Work in Progress

详情
AI中文摘要

后训练对齐算法主要作为黑箱进行评估,掩盖了它们如何重塑语言模型的内部计算。我们对三种开源模型家族的六种偏好优化方法(PPO、DPO、SimPO、ORPO、GRPO 和 KTO)进行了系统的机制分析。通过集成层间线性探针、稀疏自编码器和交叉编码器,我们定位了偏好表示并量化了对齐引起的潜在空间几何变换。我们发现偏好信号一致地集中在早期-中期或中期-后期层,但不同的目标函数导致定性的不同表示偏移。KTO 和 GRPO 通过建设性的特征共享和稀疏高显著性招募增强了线性可分离性。相反,DPO 和 ORPO 通过非建设性的几何旋转和特征衰减降低了可分离性,而 PPO 和 SimPO 基本保持了基线几何。这些变换表现出架构依赖的变异性,表明行为对齐并不意味着统一的内部重构。我们的发现将对齐确立为一种异质性干预,激励了安全性和可解释性的标准化特征级审计,并强调了需要机制感知的优化目标。

英文摘要

Post-training alignment algorithms are predominantly evaluated as black boxes, obscuring how they reshape language models' internal computations. We present a systematic mechanistic analysis of six preference-optimization methods: PPO, DPO, SimPO, ORPO, GRPO, and KTO across three open-weight model families. By integrating layer-wise linear probing, Sparse Autoencoders, and crosscoders, we localize preference representations and quantify alignment-induced geometric transformations in latent space. We find that preference signals consistently concentrate in early--mid or mid--late layers, but different objectives induce qualitatively distinct representational shifts. KTO and GRPO enhance linear separability through constructive feature sharing and sparse, high-salience recruitment. In contrast, DPO and ORPO degrade separability via non-constructive geometric rotation and feature attenuation, while PPO and SimPO largely preserve baseline geometry. These transformations exhibit architecture-dependent variability, demonstrating that behavioral alignment does not imply uniform internal restructuring. Our findings establish alignment as a heterogeneous intervention, motivate standardized feature-level auditing for safety and interpretability, and highlight the need for mechanism-aware optimization objectives.

2606.09859 2026-06-10 cs.LG cs.AI 新提交

Mitigating Manifold Departure: Uncertainty-Aware Subspace Rectification for Trustworthy MLLM Decoding

缓解流形偏离:面向可信MLLM解码的不确定性感知子空间校正

Yingxuan Zhuang, Jingxiao Yang, Miao Pan, Cheng Tan, Yuxiang Cai, Siwei Tan, Chen Zhi, Xuhong Zhang, Jianwei Yin, Jintao Chen

发表机构 * Nanyang Technological University(南洋理工大学)

AI总结 提出MGAP方法,通过SVD构建语言先验子空间并自适应衰减投影分量,在抑制幻觉的同时保持语义结构,优于现有解码基线。

Comments ICML 2026 regular

详情
AI中文摘要

MLLM经常产生与视觉输入不一致的幻觉对象。这一问题通常归因于对语言先验的过度依赖,这种依赖可能覆盖视觉上下文。最近的训练自由解码策略通过惩罚语言先验来解决这一问题。然而,这些方法忽视了语言先验的双重性质——根据与视觉证据的对齐程度,它们既可能有益也可能有害。特别是,盲目抑制语言先验往往会破坏模型的语义流形,导致性能下降,我们将这种现象称为流形偏离。为了解决这个问题,我们提出了流形引导自适应投影(MGAP),一种几何感知、训练自由的解码方法,在抑制幻觉的同时保持表示结构。MGAP首先通过SVD从盲隐藏状态构建语言先验子空间。在解码过程中,MGAP将每个多模态隐藏状态投影到该子空间上,并应用一致性感知门控自适应地仅衰减投影的先验分量,从而产生子空间选择性更新,很大程度上保留了正交的语义分量。在POPE和CHAIR上的大量实验表明,MGAP优于先前的解码基线,在不牺牲连贯性的情况下实现了更强的幻觉抑制。

英文摘要

MLLMs frequently hallucinate objects inconsistent with visual inputs. This issue is typically attributed to the over-reliance on language priors, which can override the visual context. Recent training-free decoding strategies address this by penalizing language priors. However, these methods overlook the dual nature of language priors, where they can be both helpful and harmful depending on the alignment with visual evidence. In particular, blindly suppressing language priors often disrupts the model's semantic manifold, leading to performance degradation, a phenomenon we term Manifold Departure. To address this, we propose Manifold-Guided Adaptive Projection (MGAP), a geometry-aware, training-free decoding method that mitigates hallucinations while preserving representation structure. MGAP first constructs a language-prior subspace from blind hidden states via SVD. During decoding, MGAP projects each multimodal hidden state onto this subspace and applies a consistency-aware gate to adaptively attenuate only the projected prior component, yielding a subspace-selective update that largely preserves the orthogonal semantic components. Extensive experiments on POPE and CHAIR show that MGAP outperforms prior decoding baselines, achieving stronger hallucination suppression without sacrificing coherence.

2606.09863 2026-06-10 cs.LG 新提交

From Confident Closing to Silent Failure: Characterizing False Success in LLM Agents

从自信收尾到无声失败:LLM智能体中的虚假成功特征分析

Laksh Advani

发表机构 * Laksh Advani

AI总结 研究LLM智能体在环境状态未完成时声称任务完成的虚假成功模式,发现其普遍存在但检测困难,轻量级TF-IDF检测器优于LLM评判器。

Comments Accepted to FAGEN@ICML2026

详情
AI中文摘要

LLM智能体可能在环境状态显示任务未完成时,通过断言任务已完成而无声地失败。我们研究了这种失败模式——虚假成功,在两个智能体基准测试中:来自8个模型家族的9,876条tau2-bench轨迹和来自4个模型家族的1,879条AppWorld轨迹,均具有文本无关的真实标签。虚假成功很常见,但随设置而异:在单控制tau2-bench域中占失败的45-48%,在双控制电信域中占3%,在具有显式状态声明的AppWorld自评估编码智能体轨迹中占75.8%。LLM评判器不可靠:在tau2-bench上,5个评判器、5种提示策略和完整任务规格的任何配置均未超过AUROC 0.65;同样的评判器在AppWorld API调用轨迹上仅达到AUROC 0.54。评判器依赖表面完成代理——tau2-bench中的自信收尾语言和AppWorld中的粗略动作序列量——而非验证的状态变化。轻量级TF-IDF检测器在tau2-bench上达到任务分离AUROC 0.83,在AppWorld上达到0.95,在相同标记率下比最佳评判器多恢复4-8倍的虚假成功,且延迟低3300倍。这些结果表明,生产监控应使用轻量级、领域校准的检测器作为分诊信号,而非依赖LLM评判器作为虚假成功的主要监控手段。

英文摘要

LLM agents can fail silently by asserting task completion when the environment state shows otherwise. We study this failure mode, false success, across two agent benchmarks: 9,876 tau2-bench trajectories from 8 model families and 1,879 AppWorld trajectories from 4 model families with text-independent ground truth. False success is common but varies by setting: 45--48% of failures in single-control tau2-bench domains, 3% in dual-control telecom, and 75.8% among AppWorld self-assessing coding-agent trajectories with explicit status claims. LLM judges fail reliably: no configuration across 5 judges, 5 prompt strategies, and full task specifications exceeds AUROC 0.65 on tau2-bench, and the same judges reach only 0.54 AUROC on AppWorld API-call traces. Judges rely on surface completion proxies -- confident closing language in tau2-bench and coarse action-sequence volume in AppWorld -- rather than verified state changes. Lightweight TF-IDF detectors achieve task-disjoint AUROC 0.83 on tau2-bench and 0.95 on AppWorld, recovering 4--8x more false successes than the best judge at the same flag rate with 3,300x lower latency. These results suggest that production monitoring should use lightweight, domain-calibrated detectors as triage signals rather than relying on LLM judges as the primary monitor for false success.

2606.09864 2026-06-10 cs.LG cs.AI cs.ET 新提交

Alignment Collapse Under KV Cache Quantization: Diagnosis and Mitigation

KV缓存量化下的对齐崩溃:诊断与缓解

Bruce Changlong Xu, Adarsh Kumarappan, Mu Zhou

发表机构 * Stanford University(斯坦福大学) California Institute of Technology(加利福尼亚理工学院)

AI总结 研究发现低比特KV缓存量化会无声破坏大模型的安全对齐,根源在于安全特征位于低维激活子空间,易受量化噪声影响;提出逐通道缩减(PCR)诊断方法,分类三种失效模式并指导缓解,无需训练即可恢复高达97%的对齐损失。

Comments Preprint. 61 pages, 9 figures

详情
AI中文摘要

键值(KV)缓存量化被广泛用于减少大语言模型(LLM)推理内存,然而现有评估仅关注困惑度和准确率,未评估安全性影响。在本研究中,我们探索了KV缓存量化下的对齐保持。在11个指令微调模型(3.8B-72B)和5个基准(1,894个提示)上,我们发现低比特量化可以无声地破坏安全对齐:Mistral-7B在仅1.03倍困惑度下丢失了15.2%的拒绝能力,且不存在通用的安全位宽,标准指标无法察觉的尖锐模型特定相变普遍存在。我们识别出根本原因是几何性的:安全特征占据一个低维激活子空间,其对量化噪声的脆弱性比困惑度平均的全表示空间高10^2-10^3倍。受此观察启发,我们提出逐通道缩减(PCR),一种诊断方法,将每个模型分类为三种机制性失效模式之一:异常值压碎安全(安全位于非异常值通道,被异常值驱动的缩放因子连带损害);异常值即安全(安全与异常值通道重叠,更细粒度无法挽救);多层稀释(安全分布在许多层,逐层修复失败)。PCR在全部9个主要模型和来自独立家族的1个保留模型上,使用20个校准提示预测了正确的缓解方向。PCR泛化到未见过的提示、模型和生产量化器,包括KIVI,恢复率高达97.2%,而基于注意力的分配方法失败。由此产生的免训练协议,大约需要35 GPU分钟,以最小的内存开销恢复高达97%的丢失对齐,解决了在NVIDIA GPU上使用FP8 KV缓存的生产vLLM服务中确认的漏洞。

英文摘要

Key-value (KV) cache quantization is widely used to reduce Large Language Model (LLM) inference memory, yet existing evaluations solely focus on measuring perplexity and accuracy without assessing the safety impact. In this study, we explore alignment preservation under KV cache quantization. Across eleven instruction-tuned models (3.8B-72B) and five benchmarks (1,894 prompts), we find that low-bit quantization can silently destroy safety alignment: Mistral-7B loses 15.2% of its refusals at only 1.03x perplexity, and no universal safe bit-width exists, with sharp model-specific phase transitions invisible to standard metrics. We identify that the root cause is geometric: safety features occupy a low-dimensional activation subspace 10^2-10^3x more vulnerable to quantization noise than the full representation space perplexity averages over. Inspired by this observation, we propose Per-Channel Reduction (PCR), a diagnostic that classifies each model into one of three mechanistic failure modes: outlier-crushes-safety, where safety lives in non-outlier channels collaterally damaged by outlier-driven scale factors; outlier-as-safety, where safety overlaps outlier channels and finer granularity cannot rescue it; and multi-layer dilution, where safety is distributed across many layers and per-layer fixes fail. PCR predicts the correct mitigation direction on all nine primary models and one held-out model from an independent family using 20 calibration prompts. PCR generalizes across unseen prompts, models, and production quantizers, including KIVI with up to 97.2% recovery, succeeding where attention-based allocation methods fail. The resulting training-free protocol, requiring approximately 35 GPU-minutes, recovers up to 97% of lost alignment at minimal memory overhead, addressing vulnerabilities confirmed in production vLLM serving with FP8 KV cache on NVIDIA GPUs.

2606.09866 2026-06-10 cs.LG cs.AI 新提交

Two to Tango: Coupled Task-Reference Selection for Safe LLM Fine-tuning

双人探戈:面向安全LLM微调的耦合任务-参考选择

Xinrui Chen, Jianhao Zhang, Ou Wu, Di Gao

AI总结 提出DualSelect框架,通过耦合任务与安全参考选择,在微调时保持安全对齐,提升安全评分至少5.10点。

详情
AI中文摘要

在下游数据上微调安全对齐的大型语言模型(LLMs)可以提高适应性,但可能会侵蚀已学习的安全行为。现有方法使用固定的安全示例、全局约束或单边任务过滤。我们的诊断表明,任务更新暴露了不同的安全约束,从而激发了联合选择相关参考和兼容任务样本的需求。我们提出DualSelect,一个耦合的任务和参考选择框架,它在过滤与诱导参考方向兼容的整个任务样本之前,刷新任务条件化的安全参考。在极小极大视角下,DualSelect通过熵正则化评分代理、惰性参考刷新和梯度校正,选择具有高保留损失和任务冲突的安全参考以及兼容的任务样本。在1B-8B LLMs上,DualSelect在不损失任务效用的情况下保持安全性;使用REDORCA评估器,它在安全平均值上比最强基线至少提高5.10分,并且在所有评估器中保持最高的安全平均值,且开销适中。这一观点扩展到以保留为中心的持续学习。

英文摘要

Fine-tuning safety aligned large language models (LLMs) on downstream data improves adaptation but may erode learned safety behavior. Existing methods use fixed safety examples, global constraints, or one-sided task filtering. Our diagnostics show task updates expose different safety constraints, motivating joint selection of relevant references and compatible task samples. We propose DualSelect, a coupled framework for task and reference selection that refreshes task conditioned safety references before filtering whole task samples compatible with the induced reference direction. Under a minimax view, DualSelect selects safety references with high preservation loss and task conflict, together with compatible task samples, through entropy-regularized scoring surrogates, lazy reference refresh, and gradient correction. On 1B-8B LLMs, DualSelect preserves safety without losing task utility; using the REDORCA judge, it improves Safety Avg. over the strongest baseline by at least 5.10 points and remains highest in Safety Avg. across judges with moderate overhead. This view extends to retention focused continual learning.

2606.09875 2026-06-10 cs.LG cs.AI stat.ML 新提交

Integrating Local and Global Entropy for Uncertainty Quantification in LLMs

集成局部和全局熵用于大语言模型的不确定性量化

Johanne Medina, Tianyi Zhou, Keivin Isufaj, Aristides Gionis, Sanjay Chawla

AI总结 本文提出GLU方法,通过融合隐藏状态几何熵(全局)和token级熵(局部)来量化LLM不确定性,有效捕捉自信但错误的失败模式,无需额外训练。

Comments 17 pages, 2 figures

详情
AI中文摘要

大语言模型会自信地产生幻觉,使得不确定性量化(UQ)对于可靠部署至关重要。现有方法主要依赖token级信号,而中间隐藏状态的几何结构未被充分利用。在本文中,我们将隐藏状态矩阵的几何复杂度作为LLM全局不确定性的度量,同时将token级不确定性估计视为局部度量。我们表明,隐藏状态几何熵(全局不确定性)和token级熵(局部不确定性)在统计上近似正交,捕捉了可靠性预测的不同失败模式。特别地,全局几何恢复了局部信号系统性遗漏的自信但错误的失败模式。基于此,我们提出了全局-局部不确定性(GLU),这是一种无监督、单次前向传播的分数,通过乘法门融合两种信号。在三个模型族和六个基准测试中,GLU匹配或优于所有无监督基线,同时仅需一次前向传播,且保持长度归一化和架构无关性。

英文摘要

Large language models hallucinate confidently, making uncertainty quantification (UQ) essential for reliable deployment. Existing methods rely predominantly on token-level signals, leaving the geometric structure of intermediate hidden states underused. In this paper, we take the geometric complexity of hidden-state matrices as a measure of the global uncertainty of LLMs, while treating token-level uncertainty estimation as a local metric. We show that hidden-state geometric entropy (global uncertainty) and token-level entropy (local uncertainty) are statistically near-orthogonal, capturing distinct failure regimes for reliability prediction. In particular, global geometry recovers the confident-but-wrong failure mode that local signals systematically miss. Building on this, we propose Global-Local Uncertainty (GLU), an unsupervised, single-pass score that fuses the two signals via a multiplicative gate. Across three model families and six benchmarks, GLU matches or outperforms all unsupervised baselines while requiring only a single forward pass and remaining length-normalized and architecture-agnostic.

2606.09876 2026-06-10 cs.LG 新提交

Calibrating Overconfidence Without Sacrificing Confidence: Probe-Conditioned Head Intervention for LLMs

校准过度自信而不牺牲自信:基于探针条件的头部干预方法用于LLMs

Ke Li, Chongzhe Zhang, Zifan Zeng, Feng Liu, Qunli Zhang, Zheng Hu

发表机构 * Huawei Heisenberg Research Center(华为海森堡研究中心) EPFL(苏黎世联邦理工学院) TU Berlin(柏林技术大学) TUM(慕尼黑工业大学)

AI总结 提出PCHI方法,通过冻结探针检测可能错误但自信的响应,并条件性调整注意力头输出,在保持正确自信的同时减少过度自信,将ECE从21.9%降至9.2%。

Comments 11 pages, 4 figures

详情
AI中文摘要

大型语言模型常常对错误答案表现出高置信度。标准的校准方法通常全局或分数级别操作,减少无根据的自信,但也冒着侵蚀正确答案上有根据的自信的风险。我们引入了基于探针条件的头部干预(PCHI),一种推理时方法,使用冻结探针检测可能错误但自信的响应,并在置信度生成期间条件性地重新缩放下游注意力头的输出。在Qwen3-4B-Instruct解决OpenMathInstruct问题(具有结构化二进制置信度字段)上,读取令牌PCHI将82.2%的原始错误-是置信度读数转换为$\texttt{no}$,而跨上游置信度模板令牌的联合干预将ECE从21.9%降至9.2%,并且仅损坏5.1%的原始正确-是读数。读取令牌效应也出现在Gemma3-4B上,尽管上游干预较弱且更依赖于掩码。这些结果表明,通过条件性应用的内部干预,可以选择性地减少口头表达的过度自信,部分解耦抑制无根据自信与损失有根据自信之间的关系。

英文摘要

Large language models often express high confidence in answers that are wrong. Standard calibration remedies typically act globally or at the score level, reducing unwarranted confidence but also risking erosion of warranted confidence on correct answers. We introduce Probe-Conditioned Head Intervention (PCHI), an inference-time method that uses a frozen probe to detect likely wrong-but-confident responses and conditionally rescales downstream attention-head outputs during confidence generation. On Qwen3-4B-Instruct solving OpenMathInstruct problems with a structured binary confidence field, readout-token PCHI converts 82.2% of originally wrong-yes confidence readouts to $\texttt{no}$, while a joint intervention across upstream confidence-template tokens reduces ECE from 21.9% to 9.2% and damages only 5.1% of originally correct-yes readouts. The readout-token effect also appears on Gemma3-4B, though upstream interventions are weaker and more mask-dependent. These results show that verbalized overconfidence can be selectively reduced through conditionally applied internal intervention, partially decoupling the suppression of unwarranted confidence from the loss of warranted confidence.

2606.09881 2026-06-10 cs.LG cs.CR cs.CV 新提交

Toward Calibrated, Fair, and accurate Deepfake Detection

迈向校准、公平且准确的深度伪造检测

Ryan Brown, Chris Russell

发表机构 * University of Oxford(牛津大学)

AI总结 提出Face-Fairness框架,通过Face-Feature Tuning实现无需人口统计标签的深度伪造检测公平性,同时保持或提升整体准确率。

详情
AI中文摘要

深度伪造检测器在不同人口群体间表现出较大的性能差距。现有的公平性方法需要人口统计标签、重新训练或牺牲准确性。我们引入了Face-Fairness (FF),一个即插即用的偏差缓解框架。我们的主要贡献是Face-Feature Tuning (FFT),这是首个在深度伪造检测中展示的无人口统计标签的公平性方法:一个轻量级校准器,基于冻结的人脸嵌入进行logit重映射。我们通过两种变体补充FFT:FF-Max,在人口统计标签可用时最大化最差组准确率;以及FF-Discover,通过嵌入发现的组实现相同目标。在域内和跨数据集测试设置中,FF一致地减少了FPR/TPR差距,提高了最小组准确率,同时保持(通常提升)整体准确率。该方法与检测器无关,增加了可忽略的运行时开销,并且不需要访问身份属性。

英文摘要

Deepfake detectors show large performance gaps across demographic groups. Existing fairness approaches require demographic labels, retraining, or sacrifice accuracy. We introduce Face-Fairness (FF), a plug-and-play framework for bias mitigation. Our primary contribution, Face-Feature Tuning (FFT), is the first demographic label-free fairness method demonstrated for deepfake detection: a lightweight calibrator that performs a logit remapping conditioned on frozen face embeddings. We complement FFT with two variants: FF-Max, which maximizes worst-group accuracy when demographics are available, and FF-Discover, which does the same with embedding-discovered groups. Across in-domain and cross-dataset test settings, FF consistently reduces FPR/TPR gaps and improves minimum group accuracy while maintaining (often improving) overall accuracy. The approach is detector-agnostic, adds negligible runtime overhead, and requires no access to identity attributes.

2606.09923 2026-06-10 cs.LG cs.AI 新提交

Conformal Prediction for Neural Operators: Distribution-Free Uncertainty Quantification in Physics Simulation

神经算子的共形预测:物理模拟中无分布不确定性量化

Michael Chin

AI总结 提出将分裂共形预测应用于神经算子物理模拟,实现无分布预测区间和有限样本覆盖保证,并通过归一化共形预测方案生成自适应宽度区间。

Comments 13 pages, 7 tables, 7 figures. Full-scale experiments on NVIDIA V100

详情
AI中文摘要

神经算子如傅里叶神经算子(FNO)已成为求解偏微分方程(PDE)的强大替代方法,比传统数值求解器快几个数量级。然而,在安全关键工程应用(如电子元件和电池系统的热管理)中部署这些模型,不仅需要准确的点预测,还需要严格的不确定性保证。现有的神经算子不确定性量化(UQ)方法,包括蒙特卡洛Dropout和深度集成,仅提供相对不确定性估计,没有正式的覆盖保证。在这项工作中,我们首次将分裂共形预测应用于基于神经算子的物理模拟,提供具有有限样本覆盖保证的无分布预测区间。我们进一步引入了一种归一化共形预测方案,利用MC Dropout不确定性生成自适应宽度区间,在低不确定性区域产生更紧的区间,在模型不太确定的区域产生更宽的区间。在稳态热传导基准上的全规模实验(3370万参数,800个训练样本,5个集成成员,NVIDIA V100)表明,我们的方法在目标水平alpha=0.1下达到89.1%的经验覆盖率,同时生成反映底层物理不确定性结构的空间自适应预测区间。我们还提供了一个不确定性分解框架,将认知不确定性(占总量的68%)与偶然不确定性(占总量的32%)分离,为数据收集和模型改进提供可操作指导。我们的方法在一个开源平台上实现,具有REST API端点和交互式3D可视化。

英文摘要

Neural operators such as the Fourier Neural Operator (FNO) have emerged as powerful surrogates for solving partial differential equations (PDEs), achieving speedups of several orders of magnitude over traditional numerical solvers. However, deploying these models in safety-critical engineering applications -- such as thermal management of electronic components and battery systems -- requires not only accurate point predictions but also rigorous uncertainty guarantees. Existing uncertainty quantification (UQ) methods for neural operators, including Monte Carlo Dropout and Deep Ensembles, provide only relative uncertainty estimates without formal coverage guarantees. In this work, we propose the first application of split conformal prediction to neural operator-based physics simulation, providing distribution-free prediction intervals with finite-sample coverage guarantees. We further introduce a normalized conformal prediction scheme that leverages MC Dropout uncertainty to produce adaptive-width intervals, yielding tighter intervals in regions of low uncertainty and wider intervals where the model is less certain. Full-scale experiments (33.7M parameters, 800 training samples, 5 ensemble members, NVIDIA V100) on steady-state heat conduction benchmarks demonstrate that our method achieves 89.1% empirical coverage at the target level of alpha=0.1, while producing spatially adaptive prediction intervals that reflect the underlying physical uncertainty structure. We also provide an uncertainty decomposition framework that separates epistemic uncertainty (68% of total) from aleatoric uncertainty (32% of total), offering actionable guidance for data collection and model improvement. Our method is implemented in an open-source platform with REST API endpoints and interactive 3D visualization.

2606.10137 2026-06-10 cs.LG 新提交

Ambiguous Strategic Classification

模糊策略分类

Ivri Hikri, Nir Rosenfeld

AI总结 研究在监管要求部分信息披露下,学习器如何联合优化分类器及其不确定性,引入模糊性概念并开发高效算法。

详情
AI中文摘要

策略分类中的一个常见假设是分类器是公开的。然而,系统是否会选择完全披露,以及为什么,仍不清楚。我们研究了一个监管要求系统披露部分(而非全部)信息的设置。这引发了一个学习任务,其中学习器必须联合优化分类器及其周围的不确定性。为此,我们从稳健机制设计中采纳了模糊性概念,在我们的设置中,这允许学习器揭示一组或一系列可能的分类器,同时私下选择最终实现哪一个。我们研究了模糊性如何影响学习任务,开发了计算最佳响应和训练的高效算法,并通过我们的方法在新设置中实证探索了策略学习及其结果。

英文摘要

A common assumption in strategic classification is that the classifier is public knowledge. However, it remains unclear whether, and why, a system would choose to commit to full disclosure. We study a setting in which regulation requires the system to disclose some, but not all, of the information. This induces a learning task in which the learner must jointly optimize the classifier and the uncertainty surrounding it. To this end, we adopt from robust mechanism design the notion of ambiguity, which in our setting allows the learner to reveal a set or range of possible classifiers, while privately choosing which of them to ultimately realize. We investigate how ambiguity affects the learning task, develop efficient algorithms for computing best-responses and training, and empirically explore strategic learning and its outcomes in this novel setting and using our approach.

2606.10154 2026-06-10 cs.LG cs.CR 新提交

Quality Is Not a Safety Proxy Under Quantization

质量不是量化下的安全代理

Sahil Kadadekar

AI总结 研究发现量化检查点的质量指标无法替代直接安全测试,提出拒绝模板稳定性指数(RTSI)以识别危险行。

Comments 21 pages, 6 figures. Preprint

详情
AI中文摘要

量化检查点通常首先通过质量指标筛选,然后才进行直接安全测试(如果有的话)。本文在一个匹配的51行矩阵上审计了这一捷径,该矩阵涵盖6个模型、4个系列、7级GGUF梯度和AWQ/GPTQ INT4检查点。在这个矩阵中,捷径失败:所有36个质量-安全配对在模型间方向分裂,9个隐藏危险行加上1个接近隐藏危险行显示质量稳定或改善,而拒绝率下降12-68个百分点。11个AWQ/GPTQ行中有7个是隐藏危险。对17个Hugging Face支持的FP16/AWQ/GPTQ单元格进行的四探针机械后续研究未能挽救:熵、拒绝方向和校准探针是危险行的弱或零分离器,尽管探针识别的安全相关神经元整体上吸收了1.39倍的量化误差(p < 5×10^{-7}),但该效应并非特定于体系。Claude Sonnet 4重新标记了预定义分层集中的11,470个项目,与主要gemma3:12b判断者在89.9%的行上一致(κ=0.873,95% CI [0.866, 0.881]),并且改变了0/10个隐藏危险单元格。一个校准的研究内部行为筛选——拒绝模板稳定性指数(RTSI),由四个拒绝模板漂移特征构建并在该矩阵上校准——将10/10个隐藏或接近隐藏危险行路由到直接安全测试(Wilson 95% CI下限0.72),同时在样本内评分和行级留一验证下,将45个非基线行中的23个留在低风险桶中;在同一矩阵上,最佳单特征基线(唯一前缀率差、原始拒绝率差)在匹配桶大小下分别恢复9/10和8/10,跨堆栈传输需要重新校准。对于此处研究的量化检查点、模型系列和安全结果,保留的质量不能免除直接安全评估。

英文摘要

Quantized checkpoints are often screened first with quality metrics and only later, if at all, with direct safety tests. This paper audits that shortcut on a matched 51-row matrix spanning 6 models, 4 families, a 7-level GGUF ladder, and AWQ/GPTQ INT4 checkpoints. In this matrix the shortcut fails: all 36 quality-safety pairings split direction across models, and 9 hidden-danger rows plus 1 near-hidden-danger row show quality stable or improved while refusal falls by 12-68 percentage points. Seven of the 11 AWQ/GPTQ rows are hidden-danger. A four-probe mechanistic follow-up over the 17 Hugging Face-backed FP16/AWQ/GPTQ cells does not rescue it: entropy, refusal-direction, and calibration probes are weak or null separators of dangerous rows, and although probe-identified safety-associated neurons absorb 1.39$\times$ more quantization error overall ($p < 5 \times 10^{-7}$), the effect is not regime-specific. Claude Sonnet 4 relabels 11,470 items in a predefined stratified set, agrees with the primary gemma3:12b judge on 89.9\% of rows ($κ= 0.873$, 95\% CI [0.866, 0.881]), and changes 0/10 hidden-danger cells. A calibrated study-internal behavioral screen -- the Refusal Template Stability Index (RTSI), built from four refusal-template drift features and calibrated on this matrix -- routes 10/10 hidden- or near-hidden-danger rows to direct safety testing (Wilson 95\% CI lower bound 0.72) while leaving 23 of 45 non-baseline rows in a low-risk bucket under both in-sample scoring and row-level leave-one-out validation; on the same matrix, the best single-feature baselines (unique-prefix-rate-delta, raw refusal-rate delta) recover 9/10 and 8/10 respectively at matched bucket size, and cross-stack transfer requires recalibration. For the quantized checkpoints, model families, and safety outcomes studied here, retained quality cannot waive direct safety evaluation.

2606.10347 2026-06-10 cs.LG cs.LO 新提交

Beyond Explaining Predictions: Logic-Based Explanations for Confidence in Machine Learning Models

超越预测解释:基于逻辑的机器学习模型置信度解释

Vinícius Peixoto Chagas, Carlos Henrique Leitão Cavalcante, Thiago Alves Rocha

AI总结 提出置信度感知的反绎解释,通过最小置信度阈值量化解释的置信保证,并设计算法生成满足用户指定置信阈值的最小解释,在提升置信保证的同时仅适度增加解释长度。

详情
AI中文摘要

机器学习越来越多地应用于关键领域,在这些领域中,预测及其相关的置信水平都会影响重要决策。为了增强此类场景的透明度,理解模型为何对其预测有信心或不确定非常重要。最近的基于逻辑的方法提供了反绎解释,即足以保持预测类别的最小特征子集,并具有正确性保证。然而,这些方法仅关注分类行为,可能产生覆盖低预测置信度实例的解释。在这项工作中,我们引入了最小置信度阈值(MCT)的概念,它量化了反绎解释提供的最弱置信度保证。基于这一概念,我们提出了置信度感知的反绎解释,它不仅保持预测类别,还保持用户指定的置信度保证。我们将MCT计算表述为一个优化问题,并引入了一种算法,用于生成满足所需置信度阈值的最小解释。我们在用于二分类的提升树上评估了所提出的框架,尽管该方法也适用于其他提供置信度分数的机器学习模型。实验结果表明,传统的反绎解释通常提供比被解释实例本身相关的置信度弱得多的置信度保证。相比之下,置信度感知的解释持续提高了解释所保证的最小置信度,同时仅需要适度增加解释长度。这些特性使得所提出的方法特别适用于预测正确性和置信度对于可信决策都至关重要的应用。

英文摘要

Machine learning is increasingly used in critical domains, where both predictions and their associated confidence levels influence important decisions. To enhance transparency in such scenarios, it is important to understand why a model is confident or uncertain about its predictions. Recent logic-based approaches provide abductive explanations, minimal subsets of features sufficient to preserve the predicted class, with correctness guarantees. However, these methods focus solely on classification behavior and may produce explanations that cover instances with low predictive confidence. In this work, we introduce the concept of Minimum Confidence Threshold (MCT), which quantifies the weakest confidence guarantee provided by an abductive explanation. Building upon this concept, we propose confidence-aware abductive explanations, which preserve not only the predicted class but also a user-specified confidence guarantee. We formulate MCT computation as an optimization problem and introduce an algorithm for generating minimal explanations that satisfy a desired confidence threshold. We evaluate the proposed framework on boosted trees for binary classification, although the approach is applicable to other machine learning models that provide confidence scores. Experimental results show that traditional abductive explanations often provide substantially weaker confidence guarantees than the confidence associated with the explained instance itself. In contrast, confidence-aware explanations consistently improve the minimum confidence guaranteed by an explanation while requiring only a modest increase in explanation length. These properties make the proposed approach particularly suitable for applications where both predictive correctness and confidence are essential for trustworthy decision making.

2606.10632 2026-06-10 cs.LG cs.AI 新提交

Is Fairness Truly Fair? Towards Reliable Lipschitz Fairness in Multi-Task Learning via Fixed-\texorpdfstring{$δ$}{delta} Alignment

公平真的公平吗?通过固定δ对齐实现多任务学习中可靠的Lipschitz公平性

Junbo Ding, Xin Zang, Chenchen Pan, Donghao Song, Jiaxin Zhu, Danhuai Guo

发表机构 * Beijing University of Chemical Technology(北京化工大学)

AI总结 针对多任务学习中Lipschitz个体公平性评估受表示尺度干扰的问题,提出固定δ审计与受控正则化框架ReLiF,实现语义一致的公平性评估与权衡。

详情
AI中文摘要

Lipschitz风格的个体公平性形式化了语义相似的样本应获得相似预测的思想,但在多任务学习(MTL)中,其评估可能受到方法引起的表示尺度的干扰。本文识别了阈值混淆问题:当审计容差源自每个模型自身的表示距离时,不同算法会在不同的语义阈值下进行比较。阈值漂移分析进一步展示了偏差排名如何变化,并识别了排名保持的充分条件。我们提出了\textbf{ReLiF},一个可靠性感知框架,将评估时的固定$\delta$审计与训练时的受控正则化分离。ReLiF使用共享参考容差进行可比较的审计,并通过违反率反馈控制器保持Lipschitz代理活跃而不让其主导随机训练。本文还发展了关于阈值漂移、参考容差选择以及huberized训练代理与其未平滑的正间隔对应物之间关系的支持性分析。在临床时间序列基准和NYUv2(NYU Depth V2)密集预测上的实验表明,固定$\delta$审计暴露了方法依赖阈值可能掩盖的效用-公平性权衡。在使用ResNet50骨干的NYUv2上,ReLiF在共享固定阈值下实现了有竞争力的效用,同时显著减少了对齐偏差。在临床基准上,ReLiF产生了受控的公平性正则化权衡,而固定$\delta$审计揭示任务平衡基线有时能实现更低的偏差,且真正的效用-公平性权衡仍然存在。这些结果支持固定$\delta$审计作为评估MTL中Lipschitz公平性的语义一致协议。

英文摘要

Lipschitz-style individual fairness formalizes the idea that semantically similar examples should receive similar predictions, but its evaluation in multi-task learning (MTL) can be confounded by method-induced representation scales. This paper identifies threshold confounding: when the auditing tolerance is derived from each model's own representation distances, different algorithms are compared under different semantic thresholds. A threshold-drift analysis further shows how Bias rankings can change and identifies sufficient conditions for ranking preservation. We propose \textbf{ReLiF}, a reliability-aware framework that separates evaluation-time fixed-$δ$ auditing from training-time controlled regularization. ReLiF uses a shared reference tolerance for comparable auditing and a violation-rate feedback controller to keep the Lipschitz surrogate active without letting it dominate stochastic training. This work also develops supporting analysis for threshold drift, reference-tolerance selection, and the relationship between the huberized training surrogate and its unsmoothed positive-margin counterpart. Experiments on clinical time-series benchmarks and NYUv2 (NYU Depth V2) dense prediction show that fixed-$δ$ auditing exposes utility--fairness trade-offs that method-dependent thresholds can obscure. On NYUv2 with a ResNet50 backbone, ReLiF achieves competitive utility while substantially reducing aligned bias under shared fixed thresholds. On clinical benchmarks, ReLiF yields controlled fairness-regularized trade-offs, while fixed-$δ$ auditing reveals that task-balancing baselines can sometimes achieve lower bias and that genuine utility--fairness trade-offs persist. These results support fixed-$δ$ auditing as a semantically consistent protocol for evaluating Lipschitz fairness in MTL.

2606.10669 2026-06-10 cs.LG cs.AI cs.CR 新提交

In Defense of Information Leakage in Concept-based Models

为基于概念模型中的信息泄露辩护

Mateo Espinosa Zarlenga

发表机构 * GitHub arXiv

AI总结 本文重新审视基于概念模型中的信息泄露问题,提出良性泄露概念,通过优化训练目标,在概念不完整时利用泄露提升准确性和可干预性。

Comments Accepted as a position paper at the Forty-Third International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

基于概念的模型(CMs)是深度神经网络,其预测基于与人类可理解概念(如“圆形”、“条纹”等)对齐的表示。已有研究表明,这些模型会学习到泄露概念无关信息的表示。传统观点认为,这种泄露是不可取的,应予以消除,因为它会导致模型不可解释。在本文中,我们认为这种关于CMs中泄露的传统观点不仅是不恰当的(因为泄露如何使模型更不可解释的证据往往不明确),而且在常见的现实约束下必然导致不实用的CMs。具体来说,我们认为在概念不完整是常态的现实环境中,为了构建准确且可干预的CMs,某种程度的泄露往往是必要的。为此,我们提出存在所谓的良性泄露,并表明通过重新优化典型的CM训练目标,CMs可以鼓励并利用这种形式的泄露,而不会牺牲准确性或可干预性。

英文摘要

Concept-based models (CMs), deep neural networks that ground their predictions on representations aligned with human-understandable concepts (e.g., "round", "stripes", etc.), have been shown to learn representations that leak concept-irrelevant information. As the traditional narrative goes, this leakage is undesirable and should be eradicated as it leads to uninterpretable models. In this paper, we posit that this conventional view of leakage in CMs is not only ill-posed, as the evidence of how leakage makes a model less interpretable is often inconclusive, but also bound to lead to impractical CMs under common real-world constraints. Specifically, we argue that in real-world settings where concept incompleteness is the norm, some leakage is often necessary for constructing accurate and intervenable CMs. To this end, we propose that there is such a thing as benign leakage and show that, by optimizing a reframing of the typical CM training objective, CMs can encourage and exploit this form of leakage without sacrificing accuracy or intervenability.

2606.10734 2026-06-10 cs.LG stat.ME stat.ML 新提交

SPACR: Single-Pass Adaptive Training of Uncertainty-Aware Conformal Regressors

SPACR: 单次自适应训练的不确定性感知共形回归器

Soundouss Messoudi, Sylvain Rousseau, Sébastien Destercke

发表机构 * Heudiasyc - UMR CNRS 7253, Université de Technologie de Compiègne(法国贡比涅技术大学 - CNRS 7253联合实验室 Heudiasyc)

AI总结 提出SPACR方法,通过可微损失直接训练不确定性感知回归器,联合优化效率和有效性,无需批分割或预定义置信水平,单个模型在推理时支持多置信水平预测区间,实验表明其区间更窄、覆盖-效率权衡更优且计算成本更低。

详情
AI中文摘要

共形预测(CP)为预测模型提供了鲁棒的不确定性保证,但通常事后应用,这导致模型训练与产生高效(即窄)区间的共形目标不一致。我们提出SPACR(单次自适应共形回归器),一种在可微损失内直接训练不确定性感知回归器的新方法。SPACR联合优化效率和有效性,无需在训练期间进行批分割或预定义置信水平。因此,单个SPACR模型在推理时能在多个置信水平下产生有效的预测区间,避免了像DOICR等方法所需的高成本重训练。在多个数据集上的实验表明,与标准CP和DOICR相比,SPACR始终提供更紧的区间和更好的覆盖-效率权衡,同时显著降低计算成本。

英文摘要

Conformal Prediction (CP) provides robust uncertainty guarantees for predictive models, but is typically applied post hoc, which misaligns model training with the conformal goal of producing efficient (i.e, narrow) intervals. We propose SPACR (Single-Pass Adaptive Conformal Regressor), a novel method for directly training uncertainty-aware regressors within a differentiable loss. SPACR jointly optimizes efficiency and validity without batch-splitting or a predefined confidence levels during training. As a result, a single SPACR model yields valid prediction intervals at multiple confidence levels during inference, avoiding the costly retraining required by methods like DOICR. Experiments on diverse datasets show that SPACR consistently gives tighter intervals and better coverage-efficiency trade-offs compared to standard CP and DOICR, while significantly reducing computational costs.

2606.10777 2026-06-10 cs.LG 新提交

Can we trust our models? Epistemic calibration in second-order classification

我们能信任我们的模型吗?二阶分类中的认知校准

Arthur Hoarau

发表机构 * Université de Lorraine, CentraleSupélec Loria, CNRS(洛林大学,中央理工-高等电力学院洛里亚实验室,法国国家科学研究中心)

AI总结 提出认知校准准则,衡量认知不确定性估计是否可靠,并证明其比经典校准更严格,通过EECE指标实验揭示不同不确定性量化方法的差异。

详情
AI中文摘要

不确定性估计对于在高风险场景中部署机器学习模型至关重要。然而,经典校准仅评估预测概率的可靠性,并不评估认知不确定性估计本身是否可信。这一局限性对于二阶分类模型尤为突出。我们引入认知校准,这是一个有原则的准则,用于衡量报告的认知不确定性是否忠实地反映了模型预测围绕真实值的分散程度。我们证明认知校准是比经典校准更严格的概念,并能捕捉标准指标无法发现的失败模式。通过一个在认知校准假设下成立的不可能性定理,我们将这项工作与现有文献联系起来。为了将这一概念付诸实践,我们提出了期望认知校准误差(EECE),并证明它是真实认知校准误差(TECE)的一致估计量。在广泛的不确定性量化方法上的实验表明,认知校准是一个连贯且有意义的准则,并揭示了不同方法之间的显著差异,尽管它们的预测性能相似。

英文摘要

Uncertainty estimation is critical for deploying machine learning models in high-stakes settings. However, classical calibration only assesses the reliability of predicted probabilities and does not evaluate whether epistemic uncertainty estimates are themselves trustworthy. This limitation is particularly relevant for second-order classification models. We introduce epistemic calibration, a principled criterion that measures whether reported epistemic uncertainty faithfully reflects the dispersion of model predictions around the ground truth. We show that epistemic calibration is a strictly stronger notion than classical calibration and captures failure modes invisible to standard metrics. We relate this work to the existing literature through an impossibility theorem that holds under the epistemic calibration hypothesis. To operationalize this concept, we propose the Expected Epistemic Calibration Error (EECE), which we prove to be a consistent estimator of a True Epistemic Calibration Error (TECE). Experiments across a broad range of uncertainty quantification methods show that epistemic calibration is a coherent and meaningful criterion and reveal substantial differences across methods, despite similar predictive performance.

2606.10877 2026-06-10 cs.LG cs.CV 新提交

XtrAIn: Training-Guided Occlusion for Feature Attribution

XtrAIn:训练引导的遮挡特征归因

Thodoris Lymperopoulos, Ioannis Kakogeorgiou, Denia Kanellopoulou

发表机构 * NCSR Demokritos(希腊国家科学研究中心德谟克利特)

AI总结 提出XtrAIn方法,将遮挡操作从输入空间转移到参数空间,通过跟踪模型训练轨迹测量特征相关参数更新对输出logits的影响,解决传统遮挡归因中的偏差和不稳定性问题。

Comments 12 pages, 7 figures, 1 table

详情
AI中文摘要

基于遮挡的归因方法通过扰动输入特征并测量模型输出的变化来估计特征重要性,提供了一种直观的方式。然而,其可靠性受到特征移除实现方式的强烈影响:外部选择的基线可能引入偏差、分布外样本和不稳定的解释,而在非线性模型中,遮挡一组特征也可能改变非遮挡特征的贡献。我们将这种效应称为归因偏移,因为非遮挡特征的归因分数偏离其初始值。为了解决这些导致解释不稳定的主要问题,我们引入了XtrAIn,一种训练引导的归因方法,将遮挡操作从输入空间转移到参数空间。XtrAIn不用于工基线替换输入值,而是遵循模型的训练轨迹,测量特征相关参数更新如何影响输出logits。我们进一步引入了Xstep,一种轻量级近似方法以降低计算成本,以及XtrAIn+,一种目标聚焦变体,强调与目标类别一致的更新。在受控图像数据集和PAM50乳腺癌亚型分类上的实验表明,所提出的方法比标准归因基线产生更清晰、更可解释的归因模式。总体而言,XtrAIn提供了对特征归因的训练感知视角,并为研究训练过程中特征级证据的形成提供了有用的诊断工具。

英文摘要

Occlusion-based attribution methods provide an intuitive way to estimate feature importance by perturbing input features and measuring the resulting change in model output. However, their reliability is strongly affected by how feature removal is implemented: externally selected baselines can introduce bias, out-of-distribution samples, and unstable explanations, while in nonlinear models the occlusion of a set of features can also alter the contribution of non-occluded features. We refer to this effect as attribution shift, as the attribution scores of the non-occluded features drift from their initial values. To challenge these major issues that render explanations unstable, we introduce XtrAIn, a training-guided attribution method that transfers the occlusion operation from the input space to the parameter space. Instead of replacing input values with hand-crafted baselines, XtrAIn follows the model's training trajectory and measures how feature-associated parameter updates affect the output logits. We further introduce Xstep, a lightweight approximation for reducing computational cost, and XtrAIn+, a target-focused variant that emphasizes updates aligned with the target class. Experiments on controlled image datasets and PAM50 breast-cancer subtype classification show that the proposed methods produce cleaner and more interpretable attribution patterns than standard attribution baselines. Overall, XtrAIn provides a training-aware perspective on feature attribution and offers a useful diagnostic tool for studying how feature-level evidence is formed during training.

2606.10159 2026-06-10 cs.CL cs.AI cs.CY cs.LG 交叉投稿

Gaming AI-Assisted Peer Reviews Poses New Risks to the Scientific Community

游戏化AI辅助同行评审对科学界构成新风险

Lin Li, Qi Zhang, Xander Davies, Jianing Qiu, Yarin Gal

AI总结 研究发现,通过表面改写摘要即可显著操纵AI评审结果,成功率约38%,且成本低、难以区分,可能扭曲科学评估的公正性。

详情
AI中文摘要

AI越来越多地被用于支持科学同行评审,从稿件筛选、评审辅助到编辑分类。尽管这类系统有望减轻评审负担并加速出版,但其对策略性操纵的鲁棒性仍知之甚少。本文表明,AI中介的同行评审容易受到一种简单、低成本的操纵:对稿件摘要进行表面改写。在不改变底层科学内容和交流方式,甚至不了解评审模型的情况下,对抗性重写的摘要显著改善了AI评审结果。我们在不同学科和出版场所,针对人类撰写和AI生成的论文都观察到了这一现象。我们最强的攻击实现了约38%的攻击成功率,将Gemini 3 Flash评审员的接受评分提高了+1.31,将GPT 5.4 Mini评审员的接受评分提高了+0.88(10分制)。当原始AI评审建议“拒绝”时,成功率升至50%以上。这种效应不仅限于总体分数膨胀,还增加了评审信心以及核心科学标准(如合理性、重要性和感知贡献)的得分。该攻击实用性强,仅需约5分钟和1美元即可完成一篇10页的AI会议投稿,且难以与普通科学编辑区分。膨胀的AI评审可能偏向下游人类决策,将编辑建议从拒绝转向接受。这些发现揭示了AI辅助科学评估中的一个普遍漏洞:当AI生成的评审影响编辑决策时,作者可能被激励优化稿件以迎合AI判断而非科学价值。我们的结果表明,在高风险的同行评审中,AI工具不应被视为中立的评估者,而应进行系统的鲁棒性测试、透明的保障措施和谨慎的人工监督。

英文摘要

AI is increasingly used to support scientific peer review, from manuscript screening, reviewer assistance to editorial triage. Although such systems promise to reduce reviewer burden and accelerate publication, their robustness to strategic manipulation remains poorly understood. Here we show that AI-mediated peer review is vulnerable to a simple, low-cost manipulation: superficial rephrasing of the manuscript abstract. Without changing the underlying scientific content and communication, and even without knowledge of the reviewing model, adversarially rewritten abstracts substantially improve AI review outcomes. We see this across disciplines and publication venues, for both human-written and AI-generated papers. Our strongest attack achieves an attack-success-rate of about 38%, increasing acceptance ratings by +1.31 for Gemini 3 Flash reviewers and by +0.88 for GPT 5.4 Mini reviewers on a 10-point scale. When the original AI review suggests 'reject', the success rate rises to more than 50%. This effect extends beyond overall score inflation, increasing review confidence and scores on core scientific criteria such as soundness, significance and perceived contribution. The attack is practical, requiring only about 5 minutes and $1 for a 10-page AI conference submission, and is hard to distinguish from ordinary scientific editing. Inflated AI reviews could bias downstream human decision-making, shifting editorial recommendations from rejection towards acceptance. These findings reveal a general vulnerability in AI-assisted scientific evaluation: when AI-generated review influence editorial decisions, authors may be incentivized to optimize manuscripts for AI judgment rather than scientific merit. Our results suggest that AI tools should not be treated as neutral evaluators in high-stakes peer review without systematic robustness testing, transparent safeguards and careful human oversight.

2606.10906 2026-06-10 stat.ML cs.AI cs.LG 交叉投稿

Human-AI Teaming Through the Lens of Calibration

通过校准视角看人机协作

Eric Nalisnick, Chi Zhang, Sophia Qian, Yixin Wang

发表机构 * Department of Computer Science, Johns Hopkins University(计算机科学系,约翰霍普金斯大学) Department of Statistics, University of Michigan(统计学系,密歇根大学)

AI总结 研究通过统计校准视角分析人机协作模型,发现组合方法不保留人类校准度,而委托方法将校准负担转移给拒绝器元模型,且当人类依赖系统不可观测信息时无法实现。

Comments 19 pages, 5 figures (including appendix)

详情
AI中文摘要

我们通过统计校准的视角研究人机协作模型。假设团队由AI模型和人类组成——两者相对于特征空间的某种划分都是校准的——并揭示校准假设如何传播到协作框架中。特别地,我们考虑两种框架:(i) 结合人类和模型预测,或 (ii) 将预测责任委托给人类或模型。通过理论和实证结果,我们表明现有的组合方法不保留人类的校准程度。委托方法(通过委托行为本身)保留了后续预测器的校准,但将负担转移到了决定谁进行预测的拒绝器元模型上。拒绝器必须足够精细地校准,以定位每个成员的优势所在,这一需求随着人类专业知识的增长而增加,并且当人类依赖系统无法观测的信息时变得无法实现。

英文摘要

We study models for human-AI teaming through the lens of statistical calibration. We assume the team consists of an AI model and human -- both of which are calibrated with respect to some partitioning of the feature space -- and expose how the calibration assumptions propagate into the teaming framework. In particular, we consider frameworks that either (i) combine human and model predictions or (ii) delegate prediction responsibility to either a human or model. We show via theoretical and empirical results that existing methods for combination do not preserve the human's degree of calibration. Methods for delegation (by the very act of delegation) preserve calibration of the downstream predictors but shift the burden onto the rejector meta-model that decides who predicts. The rejector must be calibrated finely enough to locate where each member is superior, a demand that grows with the human's expertise and becomes unattainable when the human relies on information the system cannot observe.

2606.10912 2026-06-10 cs.SD cs.AI cs.CR cs.LG 交叉投稿

What Do Deepfake Speech Detectors Actually Hear?

深度伪造语音检测器实际上听到了什么?

Vojtěch Staněk, Veronika Jirmusová, Anton Firc, Kamil Malinka, Jakub Reš, Martin Perešíni

发表机构 * Brno University of Technology(布尔诺理工大学)

AI总结 提出基于自监督表示和积分梯度的可解释性方法,分析三种WavLM检测器在ASVspoof5上的决策线索,发现它们分别依赖环境噪声、音素伪影和词边界。

Comments Accepted to Interspeech 2026

详情
AI中文摘要

深度伪造语音检测器通常输出一个分数,而不解释为什么音频样本被标记、证据在信号中的位置或哪些线索驱动了决策。我们提出了一种音频原生的可解释性管道,使用时间对齐的自监督表示上的积分梯度来随时间定位决策证据。我们将所提出的方法应用于ASVspoof5上的三个基于WavLM的检测器(AASIST、CA-MHFA、SLS),并手动注释最高归因区域以提供最重要线索的语义含义。尽管性能相似,检测器依赖不同的线索:AASIST强调非语音/环境线索,CA-MHFA关注局部音素伪影,SLS依赖词边界和频谱完整性。我们超越推测性推理,通过因果遮蔽主要检测器线索来验证我们的发现。观察到的性能下降进一步支持了解释的检测器语义。

英文摘要

Deepfake speech detectors often output a single score without explaining why an audio sample is flagged, where in the signal the evidence lies, or what cues drive the decision. We propose an audio-native explainability pipeline using Integrated Gradients on time-aligned self-supervised representations to localize decision evidence over time. We apply the proposed method to three WavLM-based detectors (AASIST, CA-MHFA, SLS) on ASVspoof 5 and manually annotate the highest-attribution regions to provide a semantic meaning of the most important cues. Despite similar performance, the detectors rely on different cues: AASIST emphasizes non-speech/environment cues, CA-MHFA focuses on localized phoneme artifacts, and SLS relies on word boundaries and spectral integrity. We move beyond speculative reasoning and validate our findings by causal masking of the primary detector cues. Observed performance degradation further supports the explained detector semantics.

2606.11044 2026-06-10 stat.ML cs.LG 交叉投稿

Generalized Conformal Predictive Systems Under Distributional Shifts

广义共形预测系统在分布偏移下的应用

Jef Jonkers, Johanna Ziegel

发表机构 * IDLab Seminar for Statistics(统计研究所研讨会) Department of Electronics(电子系) ETH Zurich(苏黎世联邦理工学院) Information Systems Zurich, Switzerland(苏黎世信息系统,瑞士) Ghent University(根特大学)

AI总结 针对分布偏移,通过观测特定置换权重编码偏移,扩展广义共形预测系统,提出偏移感知预测系统,并引入权重不确定性框构建鲁棒共形预测系统包络,提供有限样本或渐近置信保证。

Comments 27 pages, 10 figures

详情
AI中文摘要

共形预测系统(CPS)在可交换性假设下输出校准的CDF带。我们通过观测特定的置换权重编码分布偏移,将广义CPS扩展到非可交换设置。这产生了偏移感知预测系统,当测试点(条件于无序样本)是从观测原子中加权抽取时,该系统保持有效。由于此类权重通常需要估计,我们引入了权重不确定性框,并构建了具有有限样本或渐近置信保证的鲁棒CPS包络。我们推导了符合性度量CPS、共形分箱和共形等渗分布回归的高效计算方法。在协变量偏移和反馈驱动的生物分子设计实验下,校准的预测带在更强偏移下变宽,随样本量增加而收紧。

英文摘要

Conformal predictive systems (CPS) output calibrated bands of CDFs under exchangeability. We extend generalized CPS to non-exchangeable settings by encoding distributional shifts through observation-specific permutation weights. This yields shift-aware predictive systems that remain valid whenever the test point is, conditionally on the unordered sample, a weighted draw from the observed atoms. Since such weights are typically estimated, we introduce weight-uncertainty boxes and construct robust CPS envelopes with finite-sample or asymptotic confidence guarantees. We derive efficient computation for conformity-measure CPS, conformal binning, and conformal isotonic distributional regression. Experiments under covariate shift and feedback-driven biomolecular design show calibrated predictive bands that widen under stronger shifts and tighten as sample size increases.

2403.00420 2026-06-10 cs.LG cs.AI 版本更新

Robust Deep Reinforcement Learning Through Adversarial Attacks and Training : A Survey

通过对抗攻击和训练实现鲁棒深度强化学习:综述

Lucas Schott, Josephine Delas, Hatem Hajri, Elies Gherbi, Reda Yaich, Nora Boulahia-Cuppens, Frederic Cuppens, Sylvain Lamprier

发表机构 * Institut de Recherche Technologique SystemX(技术研究 institute SystemX)

AI总结 本文综述了深度强化学习中的对抗攻击与训练方法,系统分类并比较其目标与机制,以提升模型对环境变化和扰动的鲁棒性。

Comments 83 pages, 17 figues, 3 table, 15 algorithms

详情
AI中文摘要

深度强化学习是机器学习的一个子领域,用于训练在复杂环境中执行序列动作的自主智能体。尽管在已知环境中表现出色,但它仍容易受到微小条件变化的影响,引发对其在实际应用中可靠性的担忧。为了提高可用性,深度强化学习必须展示出可信赖性和鲁棒性。提升深度强化学习对环境条件未知变化和可能扰动的鲁棒性的一种方法是通过对抗训练,即针对观测和环境动态的合适对抗攻击来训练智能体。针对这一关键问题,我们的工作深入分析了当代对抗攻击和训练方法,系统地对它们进行分类,并比较了它们的目标和操作机制。

英文摘要

Deep Reinforcement Learning (DRL) is a subfield of machine learning for training autonomous agents that take sequential actions across complex environments. Despite its significant performance in well-known environments, it remains susceptible to minor condition variations, raising concerns about its reliability in real-world applications. To improve usability, DRL must demonstrate trustworthiness and robustness. A way to improve the robustness of DRL to unknown changes in the environmental conditions and possible perturbations is through Adversarial Training, by training the agent against well-suited adversarial attacks on the observations and the dynamics of the environment. Addressing this critical issue, our work presents an in-depth analysis of contemporary adversarial attack and training methodologies, systematically categorizing them and comparing their objectives and operational mechanisms.

2509.25017 2026-06-10 cs.LG cs.CV 版本更新

Uncertainty-Aware Deep Learning for Wildfire Danger Forecasting

不确定性感知的深度学习用于野火危险预测

Spyros Kondylatos, Nikolas Papadopoulos, Gustau Camps-Valls, Ioannis Papoutsis

发表机构 * Aix-Marseille University(艾克斯-马赛大学) University of Cambridge(剑桥大学) University of Malaga(马拉加大学) University of Crete(希腊克里特大学)

AI总结 提出不确定性感知深度学习框架,联合捕获认知不确定性和偶然不确定性,提升短期野火危险预测的准确性和可靠性,F1分数提高2.3%,预期校准误差降低2.1%。

详情
AI中文摘要

野火是最严重的自然灾害之一,对人类和自然生态系统构成重大威胁。日益增长的野火风险增加了对不仅准确而且可靠的预测模型的需求。深度学习在预测野火危险方面显示出潜力;然而,其采用受到对其预测可靠性的担忧的阻碍,部分源于缺乏不确定性量化。为应对这一挑战,我们提出了一个不确定性感知的深度学习框架,该框架联合捕获认知(模型)和偶然(数据)不确定性,以增强短期野火危险预测。在次日预测中,与确定性基线相比,我们表现最佳的模型将F1分数提高了2.3%,并将预期校准误差降低了2.1%,从而提升了预测技能和校准能力。我们的实验证实了不确定性估计的可靠性,并展示了它们在决策支持中的实际效用,包括识别拒绝低置信度预测的不确定性阈值,以及生成伴随不确定性层的良好校准的野火危险图。将预测范围延长至十天,我们观察到偶然不确定性随时间增加,表明环境条件的更大变异性,而认知不确定性保持稳定。最后,我们表明,尽管两种不确定性类型在低不确定性情况下可能是冗余的,但在更具挑战性的条件下它们提供互补的见解,强调了联合建模对稳健野火危险预测的价值。总之,我们的方法显著提高了野火危险预测的准确性和可靠性,推动了可信赖的野火深度学习系统的发展。

英文摘要

Wildfires are among the most severe natural hazards, posing a significant threat to both humans and natural ecosystems. The growing risk of wildfires increases the demand for forecasting models that are not only accurate but also reliable. Deep Learning (DL) has shown promise in predicting wildfire danger; however, its adoption is hindered by concerns over the reliability of its predictions, some of which stem from the lack of uncertainty quantification. To address this challenge, we present an uncertainty-aware DL framework that jointly captures epistemic (model) and aleatoric (data) uncertainty to enhance short-term wildfire danger forecasting. In the next-day forecasting, our best-performing model improves the F1 Score by 2.3% and reduces the Expected Calibration Error by 2.1% compared to a deterministic baseline, enhancing both predictive skill and calibration. Our experiments confirm the reliability of the uncertainty estimates and illustrate their practical utility for decision support, including the identification of uncertainty thresholds for rejecting low-confidence predictions and the generation of well-calibrated wildfire danger maps with accompanying uncertainty layers. Extending the forecast horizon up to ten days, we observe that aleatoric uncertainty increases with time, showing greater variability in environmental conditions, while epistemic uncertainty remains stable. Finally, we show that although the two uncertainty types may be redundant in low-uncertainty cases, they provide complementary insights under more challenging conditions, underscoring the value of their joint modeling for robust wildfire danger prediction. In summary, our approach significantly improves the accuracy and reliability of wildfire danger forecasting, advancing the development of trustworthy wildfire DL systems.

2603.25670 2026-06-10 cs.LG cs.SE 版本更新

Uncertainty-Guided Label Rebalancing for CPS Safety Monitoring

不确定性引导的标签重平衡用于CPS安全监控

John Ayotunde, Qinghua Xu, Guancheng Wang, Lionel C. Briand

发表机构 * Lero Research Ireland Centre for Software Research, University of Limerick, Castletroy, Limerick, Ireland(勒罗爱尔兰软件研究中心,利默里克大学,卡斯莱特里,利默里克,爱尔兰) University of Ottawa, Canada(渥太华大学,加拿大) Lero Research Ireland Centre for Software Research, University of Limerick, Ireland(勒罗爱尔兰软件研究中心,利默里克大学,爱尔兰)

AI总结 提出U-Balance方法,利用行为不确定性对CPS时间序列数据进行标签重平衡,通过GatedMLP预测不确定性并概率性重标异常安全样本,在无人机基准上F1达0.806,优于基线14.3个百分点。

Comments 11 pages (main content), 3 pages references, 5 figures, 5 tables

详情
AI中文摘要

安全监控对于信息物理系统(CPS)至关重要。然而,实际CPS运行中不安全事件罕见,导致极端类别不平衡,降低了安全预测器的性能。标准重平衡技术对时间序列CPS遥测数据表现不佳,要么生成不真实的合成样本,要么对少数类过拟合。同时,CPS操作中的行为不确定性(定义为CPS决策中的怀疑或不确定程度)通常与安全结果相关,但在安全监控中尚未被探索。为此,我们提出U-Balance,一种监督方法,在训练安全预测器之前利用行为不确定性对不平衡数据集进行重平衡。U-Balance首先训练一个基于GatedMLP的不确定性预测器,将每个遥测窗口总结为分布运动学特征并输出不确定性分数。然后,它应用不确定性引导的标签重平衡(uLNR)机制,将具有异常高不确定性的安全标记窗口概率性地重新标记为不安全,从而在不合成新数据的情况下,用信息丰富的边界样本丰富少数类。最后,在重平衡数据集上训练安全预测器用于安全监控。我们在一个安全与不安全比例为46:1的大规模无人机基准上评估U-Balance。结果证实了行为不确定性与安全之间存在中等但显著的相关性。然后,我们确定uLNR是利用不确定性信息的最有效策略,优于直接早期融合和晚期融合。U-Balance实现了0.806的F1分数,比最强基线高出14.3个百分点,同时保持了有竞争力的推理效率。消融研究证实,基于GatedMLP的不确定性预测器和uLNR机制都对U-Balance的有效性有显著贡献。

英文摘要

Safety monitoring is essential for Cyber-Physical Systems (CPSs). However, unsafe events are rare in real-world CPS operations, creating an extreme class imbalance that degrades safety predictors. Standard rebalancing techniques perform poorly on time-series CPS telemetry, either generating unrealistic synthetic samples or overfitting on the minority class. Meanwhile, behavioral uncertainty in CPS operations, defined as the degree of doubt or uncertainty in CPS decisions , is often correlated with safety outcomes but unexplored in safety monitoring. To that end, we propose U-Balance, a supervised approach that leverages behavioral uncertainty to rebalance imbalanced datasets prior to training a safety predictor. U-Balance first trains a GatedMLP-based uncertainty predictor that summarizes each telemetry window into distributional kinematic features and outputs an uncertainty score. It then applies an uncertainty-guided label rebalancing (uLNR) mechanism that probabilistically relabels $\textit{safe}$-labeled windows with unusually high uncertainty as $\textit{unsafe}$, thereby enriching the minority class with informative boundary samples without synthesizing new data. Finally, a safety predictor is trained on the rebalanced dataset for safety monitoring. We evaluate U-Balance on a large-scale UAV benchmark with a 46:1 safe-to-unsafe ratio. Results confirm a moderate but significant correlation between behavioral uncertainty and safety. We then identify uLNR as the most effective strategy to exploit uncertainty information, compared to direct early and late fusion. U-Balance achieves a 0.806 F1 score, outperforming the strongest baseline by 14.3 percentage points, while maintaining competitive inference efficiency. Ablation studies confirm that both the GatedMLP-based uncertainty predictor and the uLNR mechanism contribute significantly to U-Balance's effectiveness.

2605.03217 2026-06-10 cs.LG cs.CY 版本更新

Moral Sensitivity in LLMs: A Tiered Evaluation of Contextual Bias via Behavioral Profiling and Mechanistic Interpretability

大语言模型中的道德敏感性:通过行为剖析和机制可解释性对上下文偏见进行分层评估

Yash Aggarwal, Atmika Gorti, Vinija Jain, Aman Chadha, Krishnaprasad Thirunarayan, Manas Gaur

发表机构 * University of Maryland, College Park(马里兰大学学院公园分校) Purdue University(普渡大学) Meta Apple(苹果公司) Wright State University(怀特州立大学) University of Maryland, Baltimore County(马里兰大学巴尔的摩县分校)

AI总结 提出道德敏感性指数(MSI)量化大语言模型在七级压力测试中的偏见概率,并通过行为剖析和机制验证揭示模型偏见随情境变化的U型曲线,发现推理蒸馏会重新激活浅层统计关联。

详情
AI中文摘要

大型语言模型(LLM)越来越多地部署在需要细致伦理推理的环境中,但现有的偏见评估将模型输出简单地视为“有偏见”或“无偏见”。这种二元框架忽略了偏见实际出现的渐进、情境敏感的方式。我们分两个阶段解决这一差距:行为剖析和机制验证。在行为阶段,我们引入了道德敏感性指数(MSI),该指标量化了在从抽象数值问题到基于历史和社会经济不公正场景的七级压力测试中产生偏见输出的概率。评估四个领先模型(Claude 3.5、Qwen 3.5、Llama 3和Gemini 1.5),我们识别出由对齐设计塑造的不同行为特征:例如,Gemini 1.5在社会经济框架下达到第5级时MSI为72.7%,而Claude表现出与基于身份的安全训练一致的强烈抑制。然后,我们在机制上验证这些行为模式。我们选择在所有模型中产生最高MSI分数的犯罪偏见场景作为探针,并将logit透镜、注意力分析、激活修补和语义探针应用于一组受控的六个模型,涵盖三个能力层级:小型语言模型(SLM)、指令微调基础模型和推理蒸馏变体。电路级分析揭示了偏见的U型曲线:SLM表现出强烈的犯罪偏见;扩展到指令微调模型消除了偏见;推理蒸馏将偏见重新引入到类似SLM的水平,尽管参数数量相同,这表明蒸馏以重新激活浅层统计关联的方式压缩了推理轨迹。关键的是,驱动高MSI分数的社会负载线索激活了与机制识别出的相同偏见驱动电路,提供了跨阶段验证。

英文摘要

Large language models (LLMs) are increasingly deployed in settings that require nuanced ethical reasoning, yet existing bias evaluations treat model outputs as simply "biased" or "unbiased." This binary framing misses the gradual, context-sensitive way bias actually emerges. We address this gap in two stages: behavioral profiling and mechanistic validation. In the behavioral stage, we introduce the Moral Sensitivity Index (MSI), a metric that quantifies the probability of biased output across a graduated, seven-tier stress test ranging from abstract numerical problems to scenarios rooted in historical and socioeconomic injustice. Evaluating four leading models (Claude 3.5, Qwen 3.5, Llama 3, and Gemini 1.5), we identify distinct behavioral signatures shaped by alignment design: for instance, Gemini 1.5 reaches 72.7% MSI by Tier 5 under socioeconomic framing, while Claude exhibits sharp suppression consistent with identity-based safety training. We then verify these behavioral patterns mechanistically. We select criminal-bias scenarios, which produced the highest MSI scores across models, as probes and apply logit lens, attention analysis, activation patching, and semantic probing to a controlled set of six models spanning three capability tiers: small language models (SLMs), instruction-tuned base models, and reasoning-distilled variants. Circuit-level analysis reveals a U-curve of bias: SLMs exhibit strong criminal bias; scaling to instruction-tuned models eliminates it; reasoning distillation reintroduces bias to SLM-like levels despite identical parameter counts, suggesting distillation compresses reasoning traces in ways that reactivate shallow statistical associations. Critically, the socially loaded cues that drive high MSI scores activate the same bias-driving circuits identified mechanistically, providing cross-stage validation.

2605.20347 2026-06-10 cs.LG stat.ML 版本更新

Symmetrization of Loss Functions for Robust Training of Neural Networks in the Presence of Noisy Labels

损失函数对称化以在存在噪声标签的情况下实现神经网络的鲁棒训练

Alexandre Lemire Paquin, Brahim Chaib-Draa, Philippe Giguère

发表机构 * Department of Computer Science and Software Engineering(计算机科学与软件工程系)

AI总结 本文研究了通过将交叉熵损失对称化来设计鲁棒损失函数的方法,提出了一种多类对称损失函数,并展示了其在噪声标签下的有效性。

Comments 28 pages, 1 figure, 4 tables. v2: Added relevant prior-work citations and revised the related-work discussion and Section 5.2. Minor wording corrections

详情
AI中文摘要

训练集的标注通常是昂贵且易出错的,因此设计对噪声具有鲁棒性的损失函数是一个重要的问题。对称条件为这种噪声的鲁棒性提供了理论保证。在本文中,我们研究了一种源自任何多类损失函数唯一分解为对称部分和类无关项的对称化方法。特别是,对交叉熵损失进行对称化会导致多类线性扩展的unhinged损失。与二分类情况不同,多类版本必须具有特定的系数才能满足对称条件。在适当假设下,我们证明这种多类unhinged损失是唯一的凸多类对称损失。我们还证明它在局部上具有根本作用:任何对称损失在具有相等分量的分数向量处的线性近似等价于多类unhinged损失。然后我们引入了SGCE和alpha-MAE两种损失函数,它们在多类unhinged损失和均值绝对误差之间进行插值,同时允许控制损失的beta-平滑性。在标准的噪声标签基准上的实验表明,其性能与现有的鲁棒损失函数相比具有竞争力。

英文摘要

Labeling a training set is often expensive and susceptible to errors, making the design of robust loss functions for label noise an important problem. The symmetry condition provides theoretical guarantees for robustness to such noise. In this work, we study a symmetrization method arising from the unique decomposition of any multi-class loss function into a symmetric component and a class-insensitive term. In particular, symmetrizing the cross-entropy loss leads to a linear multi-class extension of the unhinged loss. Unlike in the binary case, the multi-class version must have specific coefficients in order to satisfy the symmetry condition. Under suitable assumptions, we show that this multi-class unhinged loss is the unique convex multi-class symmetric loss. We also show that it has a fundamental local role: the linear approximation of any symmetric loss around score vectors with equal components is equivalent to the multi-class unhinged loss. We then introduce SGCE and alpha-MAE, two loss functions that interpolate between the multi-class unhinged loss and the Mean Absolute Error while allowing control of the beta-smoothness of the loss. Experiments on standard noisy-label benchmarks show competitive performance compared with existing robust loss functions.

2411.05698 2026-06-10 cs.CV cs.AI cs.LG 版本更新

Visual-TCAV: Concept-based Attribution and Saliency Maps for Post-hoc Explainability in Image Classification

Visual-TCAV:用于图像分类事后可解释性的基于概念的归因和显著性图

Antonio De Santis, Riccardo Campi, Matteo Bianchi, Marco Brambilla

发表机构 * Politecnico di Milano(米兰理工大学)

AI总结 提出Visual-TCAV框架,结合概念激活向量和积分梯度,生成类无关显著性图并估计概念归因,在受控实验中比TCAV更忠实于真实解释。

Comments Accepted in TMLR

详情
AI中文摘要

卷积神经网络在图像分类中表现出色,但由于模型规模和复杂性,解释其预测具有挑战性。最先进的显著性方法生成局部解释,突出输入图像中识别类别的区域,但无法解释感兴趣的概念如何贡献于预测。另一方面,基于概念的方法(如TCAV)提供了网络对人类定义概念敏感性的见解,但无法计算其在特定预测中的归因,也无法显示其在输入图像中的位置。我们引入了Visual-TCAV,一种新颖的可解释性框架,旨在通过提供局部和全局解释来弥合这些方法之间的差距。Visual-TCAV使用概念激活向量(CAV)生成类无关的显著性图,显示网络识别特定概念的位置。此外,它可以使用积分梯度的推广来估计这些概念对任何类别输出的归因。我们通过一个已知解释真实情况的受控实验评估了该方法的忠实性,显示出比TCAV更好的真实情况对齐。我们的代码可在https://this URL获取。

英文摘要

Convolutional Neural Networks (CNNs) have shown remarkable performance in image classification. However, interpreting their predictions is challenging due to the size and complexity of these models. State-of-the-art saliency methods generate local explanations highlighting the area in the input image where a class is identified but cannot explain how a concept of interest contributes to the prediction. On the other hand, concept-based methods, such as TCAV, provide insights into how sensitive the network is to a human-defined concept but cannot compute its attribution in a specific prediction nor show its location within the input image. We introduce Visual-TCAV, a novel explainability framework aiming to bridge the gap between these methods by providing both local and global explanations. Visual-TCAV uses Concept Activation Vectors (CAVs) to generate class-agnostic saliency maps that show where the network recognizes a certain concept. Moreover, it can estimate the attribution of these concepts to the output of any class using a generalization of Integrated Gradients. We evaluate the method's faithfulness via a controlled experiment where the ground truth for explanations is known, showing better ground truth alignment than TCAV. Our code is available at https://github.com/DataSciencePolimi/Visual-TCAV.

2505.14608 2026-06-10 cs.CL cs.AI cs.LG 版本更新

Attacks on Machine-Text Detectors Retain Stylistic Fingerprints

对机器文本检测器的攻击保留风格指纹

Rafael Rivera Soto, Barry Chen, Nicholas Andrews

发表机构 * GitHub University of California, Berkeley(加州大学伯克利分校)

AI总结 研究机器文本检测器对抗攻击的局限性,提出一种同时优化不可检测性和特定人类风格的 paraphrasing 方法,发现单文档检测不可靠,需多文档分析。

详情
AI中文摘要

尽管机器文本检测器的开发取得了显著进展,但机器文本容易被操纵以逃避检测,这导致有人认为该问题本质上是难以解决的。在这项工作中,我们研究了这种逃避策略的局限性。我们证明,尽管当前的攻击(从提示工程到检测器引导的优化)可以有效降低标准检测器的性能,但它们无法抹去机器文本底层的风格“指纹”。我们表明,利用风格特征空间的少样本检测器对这些逃避尝试具有鲁棒性,即使对于明确调整以逃避检测的模型生成的样本也能可靠地检测。这引发了一个问题:风格是否代表了对机器检测攻击的通用防御?我们通过引入一种新颖的 paraphrasing 方法来证明答案是“不”,该方法同时优化不可检测性和对特定人类风格的遵循。我们表明,与先前方法不同,这种攻击有效逃避了所有考虑的检测器,包括那些利用写作风格的检测器。然而,我们发现这种逃避并非绝对:随着可供分析的文档数量增加,人类和机器分布再次变得可区分。总体而言,我们的发现表明,可靠的机器文本检测需要从单文档分析转向多文档分析。

英文摘要

Despite considerable progress in the development of machine-text detectors, the ease with which machine-text can be manipulated to evade detection has led to suggestions that the problem is inherently intractable. In this work, we investigate the limits of such evasion strategies. We demonstrate that while current attacks, ranging from prompt engineering to detector-guided optimization can effectively degrade performance of standard detectors, they fail to erase the underlying stylistic "fingerprints" of machine text. We show that few-shot detectors that utilize the stylistic feature space are robust to these evasion attempts, reliably detecting samples even from models explicitly tuned to prevent detection. This raises the question: does style represent a universal defense against machine-detection attacks? We demonstrate that the answer is "no'' by introducing a novel paraphrasing approach that simultaneously optimizes for undetectability and adherence to specific human styles. We show that unlike prior methods, this attack effectively evades all considered detectors, including those that utilize writing style. However, we find that this evasion is not absolute: as the number of documents available for analysis grows, the human and machine distributions become distinguishable again. Overall, our findings suggest that reliable machine-text detection requires moving beyond single-document analysis to multi-document analysis.

2604.24668 2026-06-10 cs.AI cs.LG 版本更新

The Price of Agreement: Measuring LLM Sycophancy in Agentic Financial Applications

同意的代价:在代理金融应用中衡量LLM的谄媚行为

Zhenyu Zhao, Aparna Balagopalan, Adi Agrawal, Dilshoda Yergasheva, Waseem Alshikh, Daniel M. Bikel

发表机构 * Writer, Inc.(Writer公司)

AI总结 研究评估LLM在金融代理任务中的谄媚行为,发现模型对用户反驳仅表现低至中等性能下降,但偏好信息导致多数模型失败,并测试了输入过滤等恢复方法。

Comments Accepted to ICLR 2026 FinAI Workshop

详情
AI中文摘要

鉴于当今LLM在金融系统中的使用增加,评估此类系统的安全性和鲁棒性变得重要。LLM在通用领域设置中经常表现出的一种失败模式是谄媚行为。也就是说,模型优先考虑与表达的用户信念一致,而非正确性,导致准确性和信任度下降。在这项工作中,我们专注于评估LLM在代理金融任务中表现出的谄媚行为。我们的发现有三方面:首先,我们发现模型在面对用户对参考答案的反驳或矛盾时,仅表现出低至中等的性能下降,这区别于先前工作中模型在金融代理设置中表现出的谄媚行为。其次,我们引入了一套任务,通过用户偏好信息(与参考答案矛盾)来测试谄媚行为,并发现大多数模型在存在此类输入时失败。最后,我们基准测试了不同的恢复模式,例如使用预训练LLM进行输入过滤。

英文摘要

Given the increased use of LLMs in financial systems today, it becomes important to evaluate the safety and robustness of such systems. One failure mode that LLMs frequently display in general domain settings is that of sycophancy. That is, models prioritize agreement with expressed user beliefs over correctness, leading to decreased accuracy and trust. In this work, we focus on evaluating sycophancy that LLMs display in agentic financial tasks. Our findings are three-fold: first, we find the models show only low to modest drops in performance in the face of user rebuttals or contradictions to the reference answer, which distinguishes sycophancy that models display in financial agentic settings from findings in prior work. Second, we introduce a suite of tasks to test for sycophancy by user preference information that contradicts the reference answer and find that most models fail in the presence of such inputs. Lastly, we benchmark different modes of recovery such as input filtering with a pretrained LLM.

2605.22714 2026-06-10 cs.AI cs.CL cs.LG 版本更新

AMEL: Accumulated Message Effects on LLM Judgments

AMEL: 累积消息对LLM判断的影响

Sid-Ali Temkit

发表机构 * chut.app

AI总结 研究LLM在对话中因历史消息极性而偏离基准判断的累积消息效应(AMEL),发现模型偏向历史主流极性,且负向历史偏见更强,但偏见不随上下文长度增长,简单修复是为每个项目使用新上下文。

Comments 24 pages, 14 figures, 8 tables. Single author. Code, data (84,088 deduplicated API responses), and analysis pipeline at https://github.com/chutapp/amel

详情
AI中文摘要

大型语言模型常被用作自动评估者:审查代码、审核内容或评分输出,通常许多项目通过一次对话处理。我们询问先前对话历史的极性是否会偏倚后续判断,我们将这种效应称为LLM判断的累积消息效应(AMEL)。通过对来自4个提供商(OpenAI、Anthropic、Google和四个开源模型)的11个模型进行75,898次API调用,我们在隔离或跟随以正面或负面评价为主的历史之后呈现相同的测试项目。模型倾向于对话的主流极性(d = -0.17, p < 10^-46)。该效应集中在模型在基线时真正不确定的项目上(高熵项目d = -0.34,而基线确定时d = -0.15)。偏见不随上下文长度增长:5个先前轮次和50个产生相同的偏移(Spearman |r| < 0.01;OLS斜率p = 0.80)。并且存在负性不对称:按项目配对,负面历史诱导的偏见是正面的1.62倍(t = 13.46, p < 10^-39, n = 2,481)。扩展规模有帮助但不能解决(Anthropic: Haiku -0.22到Opus -0.17;OpenAI: Nano -0.34到GPT-5.2 -0.17)。三项后续研究缩小了机制范围。令牌概率分布连续变化,而非在阈值处。负性不对称既有令牌级成分也有语义成分,尽管在我们的样本量下平衡归因是探索性的。位置不重要:在50轮历史中任何位置的五个有偏轮次产生相同的偏移。评估流程最简单的修复是为每个项目使用新上下文;当批处理不可避免时,平衡历史有帮助。

英文摘要

Large language models are routinely used as automated evaluators: to review code, moderate content, or score outputs, often with many items passing through one conversation. We ask whether the polarity of prior conversation history biases subsequent judgments, an effect we call the accumulated message effect on LLM judgments (AMEL). Across 84,088 API calls to 12 models from 5 providers (OpenAI, Anthropic, Google, DeepSeek, and four open-source models), we present identical test items in isolation or following histories saturated with predominantly positive or negative evaluations. Models shift toward the conversation's prevailing polarity (d = -0.17, p < 10^-53). The effect concentrates on items where the model is genuinely uncertain at baseline (d = -0.36 for high-entropy items, vs d = -0.15 when the baseline is deterministic). Bias does not grow with context length: 5 prior turns and 50 produce the same shift (Spearman |r| < 0.01; OLS slope p = 0.80). And there is a negativity asymmetry: paired per item, negative histories induce 1.52x more bias than positive (t = 13.03, p < 10^-36, n = 2,733). Scaling helps but does not solve it (Anthropic: Haiku -0.22 to Opus -0.17; OpenAI: Nano -0.34 to GPT-5.2 -0.17). Three follow-ups narrow the mechanism. The token probability distribution shifts continuously, not at a threshold. The negativity asymmetry has both token-level and semantic components, though attributing the balance is exploratory at our sample sizes. Position does not matter: five biased turns anywhere in a 50-turn history produce the same shift. The simplest fix for evaluation pipelines is a fresh context per item; when batching is unavoidable, balancing the history helps.

2606.09475 2026-06-10 cs.AI cs.LG 版本更新

Emergent alignment and the projectability of ethical personas

涌现对齐与伦理人格的可投射性

Guillermo Del Pinal, Youngchan Lee, Calum McNamara, Alejandro Perez Carballo

发表机构 * University of Massachusetts Amherst(马萨诸塞大学阿默斯特分校) Indiana University Bloomington(印第安纳大学布卢明顿分校)

AI总结 研究微调大语言模型在窄任务上如何引发广泛对齐行为,通过宪法AI方法赋予模型伦理人格,发现窄对齐可投射到未训练类别,并提出对齐策略应评估可投射性。

详情
AI中文摘要

关于“涌现错位”的研究表明,在窄任务上微调LLM会诱导广泛错位的行为。这支持了“人格选择”(PSM)假说:在预训练期间,LLM学会模拟不同的角色和视角,这些可以在后训练期间被激发和细化。本文研究了相反的现象“涌现对齐”,并用它来支持和细化PSM,并激发对齐的新需求。我们在广泛和狭窄的安全任务上微调一个仅帮助型模型。为了创建SFT样本,我们遵循“宪法AI”(CAI)方法,并使用四条编码合理对齐策略的宪法:道义论、后果论、美德伦理以及将AI对齐为从属于人类权威。对于每个模型,我们表明,在两个狭窄的安全子类别上微调可靠地诱导出在代表性的一般安全类别以及我们直接从用于窄对齐的数据集中过滤掉的安全子类别上的涌现对齐。为了使用更细粒度的评估测试“PSM”,我们使用了多维“伦理人格”诊断。对于每个宪法微调(广泛/狭窄)模型,我们评估其行为与预期特征概况的匹配程度。我们的结果表明,我们的CAI模型获得了预期的“伦理人格”——例如,使用后果论宪法创建的SFT样本窄微调的模型与功利主义信念的一致性显著高于道义论信念。然而,我们的粗粒度和细粒度评估显示,我们的(广泛/狭窄)微调CAI模型在投射效果上存在显著差异。我们得出结论,对齐策略不仅应基于其(分布内)一般安全性能进行评估,还应特别基于其可投射性程度进行评估。

英文摘要

Work on `emergent misalignment' shows that finetuning LLMs on narrow tasks can induce broadly misaligned behavior. This supports the `persona selection' (PSM) hypothesis: during pre-training, LLMs learn to simulate different characters and perspectives, which can be elicited and refined during post-training. This paper investigates the converse phenomenon, `emergent alignment', and uses it to support and refine the PSM and motivate a novel desideratum for alignment. We finetune a helpful-only model on broad and narrow safety tasks. To create SFT samples, we follow the `Constitutional AI' (CAI) approach and use four constitutions which encode reasonable alignment strategies: deontology, consequentialism, virtue ethics, and aligning AIs as subordinate to human authority. For each of those models, we show that finetuning on two narrow safety sub-categories reliably induces emergent alignment over a representative set of general safety categories, and on safety subcategories that we directly filtered-out of the data sets used for narrow alignment. To test the `PSM' using a more fine-grained evaluation, we used a multidimensional `ethical persona' diagnostic. For each constitutionally finetuned (broad/narrow) model, we evaluate how well their behavior matches their expected signature profile. Our results show that our CAI models acquire their expected ``ethical persona'' -- e.g., the model narrowly fine-tuned on SFT samples created using the consequentialist constitution agrees significantly more with utilitarian than deontological beliefs. Yet our coarse and fine-grained evaluations show that there are significant differences across our (broad/narrow) finetuned CAI models in how well they project. We conclude that alignment strategies should be evaluated, not just on their (in-distribution) general safety performance, but also specifically on their degree of projectability.

9. 图学习与结构化数据 8 篇

2606.09872 2026-06-10 cs.LG cs.AI 新提交

PatchSTG: Scalable Spatiotemporal Graph Transformers for Traffic Forecasting on Irregular Sensor Networks

PatchSTG:面向不规则传感器网络的交通预测可扩展时空图Transformer

Jichao Li, Xuanming Shi

发表机构 * Portsmouth Abbey School(波特茅斯修道学校) CodingFuture (Shanghai) Education Technology Co., Ltd.(上海CodingFuture教育科技有限公司)

AI总结 提出PatchSTG,通过地理信息将传感器划分为平衡的局部补丁,并采用双注意力编码器交替捕捉局部和全局依赖,实现计算复杂度从二次降至近线性,在不规则传感器网络上取得高效且稳定的交通预测性能。

Comments 22 pages,12 figures

详情
AI中文摘要

交通预测是智能交通系统的基本组成部分,但由于传感器分布不规则以及建模大规模时空依赖的高计算成本,在实际环境中仍然具有挑战性。在实际交通网络中,传感器在不同区域分布不均,导致空间结构不均匀,限制了现有基于图和基于注意力模型的有效性和可扩展性。为了解决这些挑战,我们提出了PatchSTG,一种基于补丁的时空图Transformer,专为不规则传感器网络上的高效预测而设计。关键思想是引入一种层次化空间表示,基于地理信息将传感器划分为平衡且保持局部性的补丁。在此结构之上,双注意力编码器交替进行补丁内注意力(捕捉局部交互)和补丁间注意力(建模全局依赖),将计算复杂度从二次降低到近线性。我们在罗德岛的真实交通数据以及额外的大规模数据集上评估了PatchSTG。实验结果表明,所提模型在多个预测时域上实现了稳定且具有竞争力的预测性能,同时显著提高了计算效率。消融研究进一步验证了空间划分和双注意力在捕捉局部和长程交通动态方面的有效性。这些结果表明,基于补丁的时空建模为不规则空间设置下的交通预测提供了一个可扩展且有效的框架。

英文摘要

Traffic forecasting is a fundamental component of intelligent transportation systems, yet remains challenging in real-world settings due to irregular sensor distributions and the high computational cost of modeling large-scale spatiotemporal dependencies. In practical traffic networks, sensors are unevenly distributed across regions, leading to non-uniform spatial structures that limit the effectiveness and scalability of existing graph-based and attention-based models. To address these challenges, we propose PatchSTG, a patch-based spatiotemporal graph Transformer designed for efficient forecasting on irregular sensor networks. The key idea is to introduce a hierarchical spatial representation that partitions sensors into balanced, locality-preserving patches based on geographic information. On top of this structure, a dual attention encoder alternates between intra-patch attention for capturing local interactions and inter-patch attention for modeling global dependencies, reducing computational complexity from quadratic to near-linear scaling. We evaluate PatchSTG on real-world traffic data from Rhode Island and additional large-scale datasets. Experimental results demonstrate that the proposed model achieves stable and competitive forecasting performance across multiple horizons, while significantly improving computational efficiency. Ablation studies further validate the effectiveness of spatial partitioning and dual attention in capturing both local and long-range traffic dynamics. These results suggest that patch-based spatiotemporal modeling provides a scalable and effective framework for traffic forecasting under irregular spatial settings.

2606.10071 2026-06-10 cs.LG cs.AI 新提交

Temporal Sheaf Neural Networks with Dynamic Orthogonal Transport

时序层神经网络与动态正交传输

Md Sadek Hossain Asif, Tanzila Khan, Md. Mosaddek Khan

AI总结 提出时序层神经网络(TSNN),通过动态正交帧和局部坐标系间显式传输实现时序链接预测,在多种基准上超越现有方法,尤其适用于节点角色异质性强的图。

详情
AI中文摘要

我们引入了时序层神经网络(TSNN),这是一个时序链接预测框架,它为每个节点配备一个时变正交帧,并仅在局部坐标系之间进行显式传输后比较节点状态。与在共享全局嵌入空间中运行的现有连续时间图模型不同,TSNN通过动态局部帧建模节点特定且不断演化的交互语义。该模型通过高效的低秩Householder乘积参数化每个节点的帧,在帧更新下精确保留存储的隐藏状态,并使用几何残差解码器,该解码器基于传输距离锚定预测,同时学习残差校正。所有计算严格因果,仅使用事件前历史。我们证明了对称度归一化层拉普拉斯算子与对称归一化图拉普拉斯算子正交相似,而随机游走归一化形式在相应度度量下相似;TSNN使用的全激活、特征缩放扩散正是组合层Dirichlet能量上的度量梯度步,具有无度单调下降和非扩张保证。帧漂移仅线性扰动更新。在TGB v2链接预测和时序异质排行榜以及DGB基准套件上,TSNN在大多数基准上匹配或超越最强先前方法,在表现出强节点角色异质性的图上改进最大。消融实验证实了动态帧、正交传输和几何残差解码的独特优势。

英文摘要

We introduce Temporal Sheaf Neural Networks (TSNN), a temporal link prediction framework that equips each node with a time-varying orthogonal frame and compares node states only after explicit transport between local coordinate systems. In contrast to existing continuous-time graph models that operate in a shared global embedding space, TSNN models node-specific and evolving interaction semantics through dynamic local frames. The model parameterizes per-node frames via efficient low-rank Householder products, preserves stored hidden states exactly under frame updates, and uses a geometric-residual decoder that anchors predictions on transported distances while learning residual corrections. All computations are strictly causal and use only the pre-event history. We show that the symmetric degree-normalized sheaf Laplacian is orthogonally similar to the symmetric normalized graph Laplacian, with the random-walk normalized form similar in the corresponding degree metric; the full-active, feature-scaled diffusion used by TSNN is exactly a metric-gradient step on the combinatorial sheaf Dirichlet energy, with a degree-free monotone-descent and non-expansiveness guarantee. Frame drift perturbs updates only linearly. Across TGB v2 link-prediction and temporal-heterogeneous leaderboards, together with the DGB benchmark suite, TSNN matches or surpasses the strongest prior methods on most benchmarks, with the largest improvements on graphs exhibiting strong node-role heterogeneity. Ablations confirm the distinct benefit of dynamic frames, orthogonal transport, and geometric-residual decoding.

2606.10249 2026-06-10 cs.LG cs.SI 新提交

When Design Rules Break: Benchmark Composition Determines Whether Label Informativeness Predicts GNN Aggregator Choice

当设计规则失效:基准组成决定标签信息性是否预测GNN聚合器选择

Neha Sharma, Ritesh Sharma

发表机构 * Department of Computer Science(计算机科学系) Virginia Commonwealth University(弗吉尼亚 Commonwealth 大学) Department of Electrical and Computer Engineering(电气与计算机工程系)

AI总结 研究图神经网络聚合器选择(sum/mean/max)在24个节点分类数据集上的泛化性,发现标签信息性仅在传统基准上有效,在Facebook-100密集图中失效,且谱间隙能区分这些图。

详情
AI中文摘要

我们通过研究在24个节点分类数据集(涵盖引文、异嗜、LINKX Facebook-100、共同购买和共同作者图)上的聚合器选择(sum、mean、max),检验图神经网络(GNN)设计规则是否跨基准族泛化。边同嗜性仅能微弱预测GIN-Sum与GIN-Mean的性能差距。标签信息性在传统基准上能很好地预测这一差距,但当包含Facebook-100图时,预测能力大幅下降。在这些密集的朋友关系网络中,接近零的标签信息性与对sum聚合的强烈偏好共存,在扩展训练下产生7-10%的提升,最高达13%。随机块模型消融实验(包括匹配Facebook-100度规模的度修正变体)未能重现这一行为,表明平均度本身不能解释该效应。在若干与标签无关的图统计量中,谱间隙唯一地将这些图与其他低信息性数据集区分开来,该效应局限于单跳邻域并在不同架构中复现。我们进一步识别了与聚合器选择交互的训练机制,并表明PNA在标准引文基准上可能不如最佳单聚合器GIN。我们的结果表明,决定设计规则是否看似泛化的是基准组成而非数值不足,并且Facebook-100基准为未来的自适应聚合方法提供了具体目标。

英文摘要

We examine whether graph neural network (GNN) design rules generalize across benchmark families by studying aggregator selection (sum, mean, max) on 24 node-classification datasets spanning citation, heterophilic, LINKX Facebook-100, co-purchase, and co-authorship graphs. Edge homophily is only weakly predictive of the GIN-Sum versus GIN-Mean performance gap. Label informativeness predicts this gap well on legacy benchmarks but degrades substantially when Facebook-100 graphs are included. In these dense friendship networks, near-zero label informativeness coexists with a strong preference for sum aggregation, producing gains of 7-10% and up to 13% under extended training. Stochastic block model ablations, including degree-corrected variants matching Facebook-100 degree scales, fail to reproduce this behavior, indicating that mean degree alone does not explain the effect. Among several label-independent graph statistics, the spectral gap uniquely distinguishes these graphs from other low-informativeness datasets, with the effect localized to one-hop neighborhoods and replicated across architectures. We further identify training regimes that interact with aggregator choice and show that PNA can underperform the best single-aggregator GIN on standard citation benchmarks. Our results suggest that benchmark composition, rather than numerical insufficiency, determines whether design rules appear to generalize, and that the Facebook-100 regime provides a concrete target for future adaptive aggregation methods.

2606.10358 2026-06-10 cs.LG cs.AI 新提交

KG-SoftMAP: Soft Knowledge-Graph Priors for Bayesian Network Structure Learning from Sparse Discrete Data

KG-SoftMAP: 基于软知识图谱先验的稀疏离散数据贝叶斯网络结构学习

Guoliang Xu, James E. Corter

发表机构 * Columbia University(哥伦比亚大学)

AI总结 针对稀疏离散数据中贝叶斯网络结构学习困难的问题,提出KG-SoftMAP方法,将加权有向知识图谱编码为软先验,结合BDeu评分与logit形式先验最大化MAP目标,在合成与真实数据上显著提升结构恢复性能。

Comments 33 pages including appendices, 1 figure

详情
AI中文摘要

从稀疏离散数据中学习贝叶斯网络(BN)结构是困难的:当每个实例仅记录少数变量时,大多数变量对缺乏可靠评分所需的联合观测,且纯数据方法恢复的结构很少。不完美的领域知识,可表示为加权有向知识图谱(KG),通常是可用的。我们提出KG-SoftMAP,它将这样的KG编码为软性的、置信度加权的、可被数据覆盖的边先验,并最大化结合BDeu评分与logit形式先验的MAP目标;KG可由专家整理或由LLM提取。在受控的合成基准(唯一具有真实DAG的设置)上,KG-SoftMAP在$\rho=0.05$时恢复部分有向结构(DF1从$0.14$到$0.29$,而基线接近零),当$\rho\geq0.2$时恢复更多(DF1从$0.46$到$0.96$),前提是配有一个信息丰富但不完美的KG;恢复性能随KG质量下降而优雅地退化。在无真实DAG的真实稀疏教育数据上,我们仅评估面向部署的指标:预测、校准和KG一致性。学习到的BN最好被解读为诊断模型:在SAF上,它落后于逻辑回归$0.03$的F1_FAIL,同时提供KG一致的边、校准的联合概率以及从任意观测概念子集的推理;当不存在有意义的KG时,判别式逻辑回归更可取。

英文摘要

Learning Bayesian network (BN) structure from sparse discrete data is hard: when each instance records only a few variables, most variable pairs lack the joint observations needed for reliable scoring, and data-only methods recover little structure. Imperfect domain knowledge, expressible as a weighted directed knowledge graph (KG), is often available. We propose KG-SoftMAP, which encodes such a KG as a soft, confidence-weighted, data-overridable edge prior and maximizes a MAP objective combining the BDeu score with a logit-form prior; the KG may be expert-curated or LLM-extracted. On controlled synthetic benchmarks, the only setting with ground-truth DAGs, KG-SoftMAP recovers partial directed structure at $ρ=0.05$ (DF1 $0.14$ to $0.29$, versus near-zero baselines) and substantially more once $ρ\geq0.2$ (DF1 $0.46$ to $0.96$), when paired with an informative but imperfect KG; recovery degrades gracefully as KG quality drops. On real sparse educational data, which has no ground-truth DAG, we evaluate deployment-facing measures only: prediction, calibration, and KG-consistency. The learned BN is best read as a diagnostic model: on SAF it trails logistic regression by $0.03$ F1_FAIL while providing KG-consistent edges, calibrated joint probabilities, and inference from arbitrary observed concept subsets; when no meaningful KG exists, discriminative logistic regression is preferable.

2606.11162 2026-06-10 cs.LG 新提交

COGENT: Continuous Graph Emulators with Neural Ordinary Differential Equations for Long-Term Physical Forecasting

COGENT: 基于神经常微分方程的连续图仿真器用于长期物理预测

Zesheng Liu, Maryam Rahnemoonfar

发表机构 * Lehigh University(理海大学)

AI总结 提出COGENT,一种结合图编码器和神经常微分方程的连续图仿真器,用于不规则地理空间网格上的长期物理预测,通过连续潜在动力学实现任意时间预测,并采用滚动视界采样和渐进调度策略稳定训练,在冰盖模拟中展现出优于自回归图基线的长期稳定性。

详情
AI中文摘要

在这项工作中,我们提出了COGENT,一种基于神经常微分方程的连续图仿真器,用于不规则地理空间网格上的长期物理预测。COGENT使用基于图的历史编码器对系统状态的有限历史以及相关的强迫场和外部强迫进行编码,生成节点级上下文向量,这些向量捕捉了局部空间交互和时间演化。这些上下文向量初始化并条件化一个潜在神经常微分方程,其动力学由插值的未来强迫和显式的相对展开时间驱动。通过将预测轨迹建模为连续潜在动力系统,COGENT可以在任意未来时间生成预测,而不受固定时间离散化的限制。残差解码器将得到的潜在轨迹映射回未来的物理状态,从而实现直接的多步预测,无需反复将预测状态反馈到模型中。这种公式将基于图的空间表示、历史条件化的潜在动力学和连续时间展开统一在一个框架中,用于基于网格的物理仿真模拟。为了稳定长期监督的训练,我们还提出了有效的展开视界采样和渐进式展开视界调度策略。我们在由冰盖和海平面系统模型生成的瞬态冰盖模拟上评估了COGENT,展示了相对于自回归图基线的改进的长期稳定性。这些结果表明,连续图神经ODE为不规则地理空间网格上的可扩展物理预测提供了一种有前景的方法,特别是在需要稳定长期预测和能够在任意时间查询系统状态的应用中。

英文摘要

In this work, we present COGENT, a continuous graph emulator with Neural Ordinary Differential Equations for long-term physical forecasting on irregular geospatial meshes. COGENT encodes a finite history of system states and associated forcing fields and external forcings with a graph-based history encoder, producing node-wise context vectors that capture both local spatial interactions and temporal evolution. These context vectors initialize and condition a latent Neural Ordinary Differential Equation whose dynamics are driven by interpolated future forcings and explicit relative rollout time. By modeling the forecast trajectory as a continuous latent dynamical system, COGENT can generate predictions at arbitrary future times rather than being restricted to a fixed temporal discretization. A residual decoder maps the resulting latent trajectories back to future physical states, enabling direct multi-step forecasting without repeatedly feeding predicted states back into the model. This formulation combines graph-based spatial representation, history-conditioned latent dynamics, and continuous-time rollout in a unified framework for mesh-based physical simulation emulation. In order to stabilize training with long-horizon supervision, we also propose effective rollout-horizon sampling and a progressive rollout-horizon scheduling strategy. We evaluate COGENT on transient ice-sheet simulations generated by the Ice-sheet and Sea-level System Model, demonstrating improved long-range stability over autoregressive graph baselines. These results suggest that continuous graph Neural ODEs provide a promising methodology for scalable physical forecasting on irregular geospatial meshes, particularly in applications that require stable long-horizon predictions and the ability to query system states at arbitrary times.

2605.27770 2026-06-10 hep-th cs.LG 交叉投稿

Sampling Triangulations and Calabi-Yau Threefolds with Autoregressive GNNs

使用自回归图神经网络采样三角剖分和卡拉比-丘三维流形

Nate MacFadden

发表机构 * Department of Physics, Cornell University(康奈尔大学物理系)

AI总结 提出dualGNN,一种自回归消息传递图神经网络,用于采样凸多面体的精细正则三角剖分,并应用于弦论中均匀采样卡拉比-丘三维流形,模型参数少、训练快、泛化能力强。

Comments 50 pages, 27 figures, 3 tables

详情
AI中文摘要

我们引入了`dualGNN`,一种自回归消息传递图神经网络,用于采样凸多面体的精细正则三角剖分(FRT)。dualGNN 在三角剖分对偶图的推广上操作,边由`有向电路`标记——来自定向拟阵理论的组合不变量,我们证明这些对于揭示正则性既是必要的也是充分的。该模型独立于多面体中的点数,并在多面体的保向对称群($\mathrm{SL}(d,\mathbb{Z}) \ltimes \mathbb{Z}^d$)下不变。当使用某种掩码程序实现时,还可以保证每次 rollout 都产生一个精细三角剖分(在二维中)。在 $N_\mathrm{pts} \leq 40$ 的未见多边形上,dualGNN 是我们测试过的最均匀的 FRT 采样器,甚至仅在单个多边形上训练的模型也能很好地泛化到其他多边形。该模型很小(约 92k 参数),在单个消费级 GPU 上训练约 7.5 小时,并且无需修改即可在 M1 MacBook Pro 上运行。我们将 dualGNN 应用于弦论,在 $h^{1,1}=86$ 处均匀采样卡拉比-丘三维流形,并在 $h^{1,1}=128$ 处与均匀性一致。这比之前的学习方法提升了一个数量级,而模型小了约 1000 倍。代码、训练脚本和预训练模型可在 https://github.com/natemacfadden/dualGNN 获取。

英文摘要

We introduce `dualGNN', an autoregressive message-passing GNN for sampling fine, regular triangulations (FRTs) of convex polytopes. dualGNN operates on a generalization of the dual graph of a triangulation, with edges labeled by `signed circuits' -- combinatorial invariants from oriented matroid theory which we show are both necessary and sufficient for exposing regularity. The model is independent of the number of points in the polytope and invariant under the polytope's orientation-preserving symmetries ($\mathrm{SL}(d,\mathbb{Z}) \ltimes \mathbb{Z}^d$). When implemented with a certain masking procedure, one can also guarantee that every rollout produces a fine triangulation (in $2$D). On unseen polygons with $N_\mathrm{pts} \leq 40$, dualGNN is the most uniform FRT sampler we tested, and even a model trained on a single polygon generalizes well to other polygons. The model is small ($\sim92$k parameters), trains in $\sim7.5$ hours on a single consumer GPU, and runs without modification on an M1 MacBook Pro. We apply dualGNN to string theory, uniformly sampling Calabi-Yau threefolds at $h^{1,1}=86$ and consistent with uniformity at $h^{1,1}=128$. This is an order of magnitude beyond previous learned methods with a model $\sim1000\times$ smaller. Code, training scripts, and pretrained models are available at https://github.com/natemacfadden/dualGNN .

2502.01272 2026-06-10 cs.LG 版本更新

Boosting Graph Robustness Against Backdoor Attacks: An Over-Similarity Perspective

提升图神经网络对后门攻击的鲁棒性:过度相似性视角

Chang Liu, Hai Huang, Yujie Xing, Xingquan Zuo

发表机构 * Beijing University of Posts and Telecommunications(北京邮电大学)

AI总结 针对图后门攻击中触发器与干净节点难以区分的问题,提出基于过度相似性检测的防御方法SimGuard,利用对比学习训练检测器分离触发器,在保持干净节点性能的同时有效防御多种后门攻击。

Comments After discussions with one of the co-authors, it was decided that this version should not be made public at this time. To respect the co-author's perspective and ensure alignment among all authors, I am requesting the withdrawal of this article

详情
AI中文摘要

图神经网络(GNN)在社交网络和交通网络等任务中取得了显著成功。然而,最近的研究强调了GNN易受后门攻击的脆弱性,引发了对其在实际应用中可靠性的重大担忧。尽管已有初步努力来防御特定的图后门攻击,但现有防御方法面临两个主要挑战:要么无法在触发器和干净节点之间建立明确区分,导致许多干净节点被移除;要么未能消除触发器的影响,使得难以将目标节点恢复到攻击前的状态。通过对各种现有图后门攻击的实证分析,我们观察到这些方法生成的触发器在特征和结构上都表现出过度相似性。基于这一观察,我们提出了一种新颖的图后门防御方法SimGuard。我们首先利用基于相似性的度量来检测触发器,然后采用对比学习训练一个后门检测器,生成能够将触发器与干净节点分离的嵌入,从而提高检测效率。在真实数据集上进行的大量实验表明,我们提出的方法在保持干净节点性能的同时,有效防御了各种图后门攻击。代码将在接收后发布。

英文摘要

Graph Neural Networks (GNNs) have achieved notable success in tasks such as social and transportation networks. However, recent studies have highlighted the vulnerability of GNNs to backdoor attacks, raising significant concerns about their reliability in real-world applications. Despite initial efforts to defend against specific graph backdoor attacks, existing defense methods face two main challenges: either the inability to establish a clear distinction between triggers and clean nodes, resulting in the removal of many clean nodes, or the failure to eliminate the impact of triggers, making it challenging to restore the target nodes to their pre-attack state. Through empirical analysis of various existing graph backdoor attacks, we observe that the triggers generated by these methods exhibit over-similarity in both features and structure. Based on this observation, we propose a novel graph backdoor defense method SimGuard. We first utilizes a similarity-based metric to detect triggers and then employs contrastive learning to train a backdoor detector that generates embeddings capable of separating triggers from clean nodes, thereby improving detection efficiency. Extensive experiments conducted on real-world datasets demonstrate that our proposed method effectively defends against various graph backdoor attacks while preserving performance on clean nodes. The code will be released upon acceptance.

2603.10676 2026-06-10 cs.LG cs.CE 版本更新

Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention

时空注意力图神经网络:用注意力解释因果关系

Kosti Koistinen, Kirsi Hellsten, Joni Herttuainen, Kimmo K. Kaski

发表机构 * Aalto University School of Science Computer Science Department(阿alto大学科学学院计算机科学系)

AI总结 提出时空注意力图神经网络(STA-GNN),用于工业控制系统的无监督可解释异常检测,通过动态图建模时空依赖,注意力机制揭示因果关系,并采用共形预测控制误报率。

Comments 33 pages, 7 figures

详情
AI中文摘要

工业控制系统(ICS)支撑关键基础设施,并因操作技术与网络环境的融合面临日益增长的网络物理威胁。虽然基于机器学习的ICS异常检测方法在理论上表现出色,但部署常受限于可解释性差、误报率高以及对系统行为演变(即基线漂移)的敏感性。我们提出一种时空注意力图神经网络(STA-GNN),用于ICS中无监督且可解释的异常检测,该网络同时建模系统的时间动态和关系结构。传感器、控制器和网络实体被表示为动态学习图中的节点,使模型能够捕获物理过程和通信模式之间的相互依赖关系。注意力机制提供影响关系,支持检查检测事件背后的相关性和潜在因果路径。该方法支持多种数据模态,包括SCADA点测量、网络流特征和载荷特征,从而实现统一的网络物理分析。为满足操作需求,我们引入共形预测策略来控制误报率并监控环境漂移下的性能退化。我们的发现强调了模型评估的可能性与局限性,以及ICS异常检测中的常见陷阱。我们的发现强调了可解释、漂移感知的评估对于可靠部署基于学习的安全监控系统的重要性。

英文摘要

Industrial Control Systems (ICS) underpin critical infrastructure and face growing cyber-physical threats due to the convergence of operational technology and networked environments. While machine learning-based anomaly detection approaches in ICS shows strong theoretical performance, deployment is often limited by poor explainability, high false-positive rates, and sensitivity to evolving system behavior, i.e., baseline drifting. We propose a Spatio-Temporal Attention Graph Neural Network (STA-GNN) for unsupervised and explainable anomaly detection in ICS that models both temporal dynamics and relational structure of the system. Sensors, controllers, and network entities are represented as nodes in a dynamically learned graph, enabling the model to capture inter-dependencies across physical processes and communication patterns. Attention mechanisms provide influential relationships, supporting inspection of correlations and potential causal pathways behind detected events. The approach supports multiple data modalities, including SCADA point measurements, network flow features, and payload features, and thus enables unified cyber-physical analysis. To address operational requirements, we incorporate a conformal prediction strategy to control false alarm rates and monitor performance degradation under drifting of the environment. Our findings highlight the possibilities and limitations of model evaluation and common pitfalls in anomaly detection in ICS. Our findings emphasise the importance of explainable, drift-aware evaluation for reliable deployment of learning-based security monitoring systems.

10. 迁移、元学习与持续学习 11 篇

2606.09891 2026-06-10 cs.LG cs.AI cs.IR 新提交

Representation Curriculum: Stagewise Training for Robust Ranking and Allocation

表示课程:用于鲁棒排序和分配的分阶段训练

Ehsan Ebrahimzadeh, Sina Baharlouei, Abraham Bagherjeiran

发表机构 * eBay Search Ranking and Monetization(eBay搜索排名与变现)

AI总结 提出表示课程(RC)方法,通过分阶段引入特征,先强调基于内容的信号,再引入依赖曝光的信号,减少对历史信号的捷径依赖,提升冷启动泛化性和鲁棒性。

Comments 12 pages, 5 figures

详情
AI中文摘要

数字市场中的排序是一种动态曝光分配机制:展示的物品塑造了发现轨迹和成功事件,平台记录这些事件以更新未来的分配策略。现代排序系统严重依赖曝光混杂信号(如流行度估计、CTR/CVR聚合和基于ID的表示),因为这些信号在静态需求下具有高度预测性。然而,这种预测能力可能成为一种学习捷径:早期访问依赖曝光的信念信号会使优化过度依赖它们,而忽视独立于曝光的价值信号(如基于内容的竞争力和语义亲和性)。因此,学习到的策略倾向于固化现有物品,并在分布偏移下降低冷启动泛化性和鲁棒性。我们提出表示课程(RC),一种训练时干预方法,按时间阶段安排特征使用。RC首先突出基于内容的价值信号,然后引入依赖曝光的信念信号,同时将内容路径锚定在学到的价值表示附近,从而抑制对历史信号的捷径依赖,并缓解内容信号上的梯度饥饿。我们形式化RC,使其独立于任务和假设类,并提供排序特定的实例化。在高斯线性岭回归设置中,我们推导出封闭形式解和充分条件,证明RC在冷启动目标分布上严格降低总体风险,并量化了与源性能的帕累托权衡。在公开的排序学习和推荐基准测试,以及大规模电商搜索系统中的随机在线实验中,RC显著地将依赖从历史信念信号转向基于内容的价值信号,并在头部性能可控权衡下,对冷群体带来一致的提升。

英文摘要

Ranking in digital marketplaces is a dynamic exposure-allocation mechanism: displayed items shape discovery trajectories and success events logged by the platform to update future allocation policies. Modern ranking systems rely heavily on exposure-confounded signals (e.g. popularity estimates, CTR/CVR aggregates, and ID-based representation), because they are highly predictive under stationary demand. Yet this predictive power can become a learning shortcut: early access to exposure-dependent belief signals steers optimization toward over-reliance on them and away from exposure-independent merit signals (e.g., content-based competitiveness and semantic affinity). Consequently, the learned policy tends to entrench incumbents and degrade cold-start generalization and robustness under distribution shift. We propose Representation Curriculum (RC), a training-time intervention that temporally stages feature utilization. RC foregrounds content-based merit signals initially, then introduces exposure-dependent belief signals while anchoring the content pathway near the learned merit representation, curbing shortcut reliance on historical signals and mitigating gradient starvation on content signals. We formalize RC independently of task and hypothesis class and provide ranking-specific instantiations. In a Gaussian linear ridge setting, we derive closed-form solutions and sufficient conditions under which RC strictly reduces population risk on a cold-start target distribution, with a quantified Pareto tradeoff against source performance. Experiments on public learning-to-rank and recommendation benchmarks, and randomized online experiments in a large-scale e-commerce search system, show that RC measurably shifts reliance from historical belief signals toward content-based merit signals and yields consistent gains on cold populations with a controlled trade-off in head performance.

2606.09960 2026-06-10 cs.LG cs.AI 新提交

HydraCIL: Decoupled Class-Incremental Learning through Prototype-Guided Multi-Head Classifiers

HydraCIL: 通过原型引导的多头分类器实现解耦的类增量学习

Daniel Vila-Cruz, Laura Morán-Fernández, Verónica Bolón-Canedo

AI总结 提出HydraCIL模型,通过冻结主干网络、解耦特征提取与学习,并利用原型相似性选择任务特定分类头,在资源受限环境中实现高效类增量学习,匹配或超越现有方法同时大幅降低训练时间和碳排放。

Comments Accepted for publication at the International Joint Conference on Neural Networks (IJCNN 2026)

详情
AI中文摘要

我们提出HydraCIL,一种基于原型引导的多头分类器的解耦持续学习模型,旨在嵌入式及资源受限环境中的可持续部署。虽然大多数类增量学习(CIL)方法依赖强大硬件和长时间再训练周期,但实际系统(如机器人或边缘AI设备)必须在有限资源下快速适应。HydraCIL通过冻结主干网络并将特征提取与学习解耦来解决这一问题。对于每个任务,特征被提取一次,并创建一个轻量级的、任务特定的分类器头,避免了昂贵的主干再训练。在推理时,HydraCIL通过与原型的相似性选择适当的头。在CIFAR-100、ImageNet-100、CoRe50和Flowers102数据集上的实验表明,HydraCIL匹配或超越了最先进的CIL方法,同时显著减少了训练时间和碳足迹,使其成为在能源效率和快速适应至关重要的实际及嵌入式环境中进行持续学习的实用解决方案。

英文摘要

We present HydraCIL, a decoupled continual learning model based on prototype-guided multi-head classifiers, targeting sustainable deployment in embedded and resource-constrained environments. While most Class-Incremental Learning (CIL) methods rely on powerful hardware and long retraining cycles, real-world systems, such as robots or edge AI devices, must adapt quickly with limited resources. HydraCIL addresses this gap by freezing the backbone and decoupling feature extraction from learning. For each task, features are extracted once and a lightweight, task-specific classifier head is created, avoiding costly backbone retraining. At inference, HydraCIL selects the appropriate head via similarity with prototypes. Experiments on CIFAR-100, ImageNet-100, CoRe50, and Flowers102 datasets show that HydraCIL matches or outperforms state-of-the-art CIL methods while significantly reducing training time and carbon footprint, making it a practical solution for continual learning in real-world and embedded settings, where energy efficiency and rapid adaptation are critical.

2606.10064 2026-06-10 cs.LG cs.AI 新提交

Bittensor Agent Arenas as a Trajectory Primitive: Distilling a Shopping Agent from ShoppingBench Subnet Traces

Bittensor 智能体竞技场作为轨迹基元:从 ShoppingBench 子网轨迹中蒸馏购物智能体

Shardul Bansal, Seth Schilbe, Jarrod Barnes

AI总结 针对小模型后训练缺乏多轮轨迹数据的问题,利用 Bittensor 子网 SN15 的竞技机制生成激励对齐的轨迹,通过结构质量过滤提取智能体轨迹,后训练 Qwen3-4B 模型在 ShoppingBench 上达到 42.7% ASR,接近合成数据基线。

Comments 10 pages, 4 figures, Data and Models available at: https://huggingface.co/collections/oro-ai/shoppingbench-sn15-trajectory-primitive

详情
AI中文摘要

小模型智能体后训练的瓶颈更多在于其消耗的轨迹基质而非算法。领先的方案(RLVR、群体相对 RL、拒绝采样重 SFT)都需要携带每轨迹监督的多轮轨迹,而现有的两个来源存在不足:前沿合成数据继承了合成器的偏见并坍缩了长尾,而未经过滤的生产日志未经评判且被捷径行为污染。我们认为可以设计一个激励对齐的智能体竞技场来制造此类轨迹,并在 ORO Subnet 15(SN15)上进行了演示,这是 ShoppingBench 智能体电商基准的 Bittensor 部署。SN15 的竞赛机制、LLM 推理评判器和旋转泄漏簇防护问题集产生了一个具有三个特性的语料库:激励对齐的多样性、每轨迹评判和反记忆的留出评估。我们引入了一个结构质量过滤器,通过保留智能体轨迹(模型自身发出工具调用)并拒绝子任务轨迹(模型仅在确定性搜索循环上进行分类或叙述),将原始数据流转换为可训练的语料库,然后使用与已发布的 ShoppingBench SFT-然后-GRPO 流程匹配的方案对 Qwen3-4B 进行后训练。在泄漏簇防护的留出分区上,以生产严格方式评分,模型从已发布的 Qwen3-4B 基线的 18.0% ASR 提升至 42.7%,与合成数据 SFT 仅基线(43.6%)在单问题噪声范围内,同时仅训练了子网单日输出的一小部分。监督堆栈留下了较大的 pass@8 到 pass@1 差距(53.3% 对比 34.8%);每步教师基础的 Dr. GRPO 奖励将该空间转化为过程改进,我们确定子任务数据流是缩小与 48.7% SFT+GRPO 基线差距的主要杠杆。我们发布了过滤器、语料库分割和竞技场机制。

英文摘要

Small-model agentic post-training is bottlenecked less by the algorithm than by the trajectory substrate it consumes. Leading recipes (RLVR, group-relative RL, rejection-sampled re-SFT) all need multi-turn traces carrying per-trajectory supervision, and the two existing sources fall short: frontier-synthesised data inherits the synthesizer's biases and collapses the long tail, while unfiltered production logs are unjudged and contaminated by shortcut behaviour. We argue that an incentive-aligned agent arena can be engineered to manufacture such trajectories, and demonstrate this on ORO Subnet 15 (SN15), a Bittensor deployment of the ShoppingBench agentic-commerce benchmark. SN15's race mechanism, LLM reasoning judge, and rotating leak-cluster-guarded problem suite yield a corpus with three properties: incentive-aligned diversity, per-trajectory judging, and anti-memorised held-out evaluation. We introduce a structural-quality filter that converts the raw firehose into a trainable corpus by keeping agentic trajectories (the model itself emits the tool calls) and rejecting sub-task trajectories (the model only classifies or narrates over a deterministic search loop), then post-train Qwen3-4B with a recipe matched to the published ShoppingBench SFT-then-GRPO pipeline. On a leak-cluster-guarded held-out partition scored production-strict, the model lifts from the published Qwen3-4B base of 18.0% ASR to 42.7%, within single-problem noise of the synthetic-data SFT-only baseline (43.6%), while training on a fraction of a single day of subnet output. The supervised stack leaves a large pass@8 to pass@1 gap (53.3% vs 34.8%); a per-step teacher-grounded Dr. GRPO reward converts that headroom into process improvement, and we identify the sub-task firehose as the primary lever for closing the gap to the 48.7% SFT+GRPO bar. We release the filter, the corpus splits, and the arena mechanics.

2606.10216 2026-06-10 cs.LG cs.AI 新提交

A Source Domain is All You Need: Source-Only Cross-OS Transfer Learning for APT Anomaly Detection via Semantic Alignment and Optimal Transport

一个源域足矣:基于语义对齐和最优传输的仅源域跨操作系统APT异常检测迁移学习

Sidahmed Benabderrahmanea, Petko Valtchev, James Cheney, Talal Rahwan

AI总结 针对跨操作系统APT检测中目标域无标签的挑战,提出基于最优传输的仅源域异常评分框架,通过语义抽象和三种偏差通道实现零目标监督下的异常排序。

详情
AI中文摘要

高级持续性威胁(APT)是隐蔽的多阶段网络攻击,由于标记痕迹稀缺、严重的类别不平衡以及生成真实恶意行为的挑战,其检测十分困难。这些挑战在跨操作系统(cross-OS)设置中被放大,此时在一个源平台上训练的检测器必须部署在无标签的目标平台上,且无法访问目标域标签。我们利用系统级溯源轨迹研究这种仅源域的跨操作系统APT检测问题,并提出一个基于传输的框架,在零目标监督下对异常目标进程进行排序。该框架将进程行为抽象为结构化的自然语言描述,使用预训练语言模型进行嵌入,并构建源域正常参考用于目标评分。它结合了三种证据通道:与源域正常原型的语义偏差、由图自编码器捕获的结构偏差、以及通过最优传输(OT)度量的几何偏差。主要贡献是一个基于OT的重心异常分数,该分数将目标嵌入投影到源域正常流形上,并量化残差传输不匹配。我们进一步引入熵加权、角度感知和密度感知的OT变体,以捕获不确定性、方向漂移和稀疏支持行为。在DARPA透明计算数据(涵盖Linux、Windows、BSD和Android)上的评估,涉及两个APT场景和十二个跨操作系统传输对,表明所提框架在ROC-AUC和nDCG上优于仅源域异常检测基线。结果表明,仅源域溯源建模结合语义抽象和基于OT的异常评分,可以在没有目标域监督的情况下支持实际的跨平台APT检测。

英文摘要

Advanced Persistent Threats (APTs) are stealthy, multi-stage cyberattacks whose detection is difficult due to scarce labeled traces, severe class imbalance, and the challenge of generating realistic malicious behavior. These challenges are amplified in cross-operating-system (cross-OS) settings, where a detector trained on one source platform must be deployed on an unlabeled target platform without access to target-domain labels. We study this source-only cross-OS APT detection problem using system-level provenance traces and propose a transport-based framework for ranking anomalous target processes under zero target supervision. The framework abstracts process behavior into structured natural-language descriptions, embeds them using pretrained language models, and constructs a source-normal reference for target scoring. It combines three evidence channels: semantic deviation from source-normal prototypes, structural deviation captured by graph autoencoding, and geometric deviation measured through Optimal Transport (OT). The main contribution is an OT-based barycentric anomaly score that projects target embeddings onto the source-normal manifold and quantifies residual transport mismatch. We further introduce entropy-weighted, angle-aware, and density-aware OT variants to capture uncertainty, directional drift, and sparse-support behavior. Evaluation on DARPA Transparent Computing data spanning Linux, Windows, BSD, and Android, across two APT scenarios and twelve cross-OS transfer pairs, shows that the proposed framework improves ROC-AUC and nDCG over source-only anomaly-detection baselines. The results demonstrate that source-only provenance modeling, combined with semantic abstraction and OT-based anomaly scoring, can support practical cross-platform APT detection without target-domain supervision.

2606.10277 2026-06-10 cs.LG 新提交

A Unified Adaptive Feature Composition Framework for Multi-Task Generalization in Wireless Foundation Models

无线基础模型中多任务泛化的统一自适应特征组合框架

Yuxuan Shi, Tingting Yang, Kangning Ma, Liwen Jing, Yuwei Wang, Mengfan Zheng, Li Sun

发表机构 * Department of Broadband Communication, Pengcheng Laboratory(宽带通信系,鹏城实验室) Purple Mountain Laboratories(紫金山实验室)

AI总结 提出RAFC路由适配器,通过轻量级任务驱动网络动态组合Transformer各层隐藏特征,实现无线基础模型的多任务泛化,仅增加少于50K参数。

详情
AI中文摘要

尽管无线基础模型(WFM)在学习通用信道表示方面展现出强大潜力,但其适应各种下游任务仍受现有范式限制。微调策略引入了大量计算和存储开销,而冻结特征提取则导致跨不同下游任务的次优性能。为解决此问题,我们提出了一种用于WFM多任务泛化的统一自适应特征组合框架,其关键组件是用于特征组合的路由适配器(RAFC)。该路由器并非仅提取最后一层输出,而是将来自不同Transformer深度的隐藏状态视为可复用的多级隐藏特征池,并采用轻量级任务驱动特征组合网络生成逐层聚合权重,然后通过加权求和自适应地组合层次化表示。这种设计使每个下游任务能够访问合适的低、中、高级无线特征混合,而无需修改预训练骨干网络。在四个代表性无线任务上的大量实验表明,RAFC在引入少于50K额外参数的情况下,始终优于传统的适应基线。此外,学习到的路由权重提供了任务特定层偏好的可解释证据,使所提框架成为将WFM适应于各种下游场景的低复杂度、可扩展且可解释的接口。

英文摘要

Though wireless foundation models (WFMs) have shown strong potential in learning universal channel representations, their adaptation to various downstream tasks remains constrained by existing paradigms. Fine-tuning strategies introduces substantial computational and storage overhead, while frozen feature extraction leads to sub-optimal performance across diverse downstream tasks. To address this issue, we propose a unified adaptive feature composition framework for multitask generalization in WFMs, where the key component is the Routing Adapter for Feature Composition (RAFC). Instead of extracting only the final-layer output, this router treats the hidden states from different Transformer depths as a reusable pool of multi-level hidden features, and employs a lightweight task-driven feature composition network to generate layer-wise aggregation weights, then adaptively combine hierarchical representations through weighted summation. This design enables each downstream task to access suitable mixture of low-, mid-, and high-level wireless features without modifying the pretrained backbone. Extensive experiments on four representative wireless tasks demonstrate that RAFC consistently outperforms conventional adaptation baselines while introducing fewer than 50K additional parameters. Moreover, the learned routing weights provide interpretable evidence of task-specific layer preferences, making the proposed framework a low-complexity, scalable, and explainable interface for adapting WFMs to diverse downstream scenarios.

2606.11182 2026-06-10 cs.LG cs.AI 新提交

EEVEE: Towards Test-time Prompt Learning in the Real World for Self-Improving Agents

EEVEE:面向真实世界自改进智能体的测试时提示学习

Weixian Xu, Shilong Liu, Mengdi Wang

发表机构 * Shanghai Jiao Tong University(上海交通大学) Princeton University(普林斯顿大学)

AI总结 提出首个多数据集测试时提示学习框架EEVEE,通过路由器与提示协同进化策略解决跨数据集干扰,在异构数据流下提升鲁棒性。

Comments 19 pages, 6 figures

详情
AI中文摘要

本文提出EEVEE,首个面向LLM智能体的多数据集测试时提示学习框架,能够在真实世界任务流下实现测试时提示学习。现有方法主要针对单数据集设计,而实际应用要求模型处理来自多个数据集、领域和任务分布的异构输入流,限制了其实用性。为减轻跨数据集干扰,EEVEE引入一个路由器,将输入划分为任务簇并分配适当的提示配置。该设计通过路由器-提示协同进化策略进行优化,该策略采用交错的路由器和提示学习阶段来解决它们的相互依赖关系。跨多个数据集的实验表明,该框架在异构数据流下提高了鲁棒性,同时保持了单基准学习能力和效率。具体而言,EEVEE在Qwen3-4B-Instruct和DeepSeek-V3.2上平均多基准分数分别提高了10.38和24.32分,超过SOTA方法GEPA和ACE高达37.2%和48.2%。

英文摘要

In this paper, we propose EEVEE, the first multi-dataset test-time prompt learning framework for LLM agents, enabling test-time prompt learning under real-world task streams. Existing methods are largely designed for single-dataset settings, while real-world applications require models to handle heterogeneous input streams drawn from multiple datasets, domains, and task distributions, limiting their practical applicability. To mitigate cross-dataset interference, EEVEE introduces a router that partitions incoming inputs into task clusters and assigns them to suitable prompt configurations. This design is optimized via a router-prompt co-evolution strategy, which employs interleaved router and prompt learning phases to address their mutual dependency. Experiments across multiple datasets demonstrate that the framework improves robustness under heterogeneous data streams while maintaining single-benchmark learning capability and efficiency. Specifically, EEVEE improves average multi-benchmark scores by 10.38 and 24.32 points over Qwen3-4B-Instruct and DeepSeek-V3.2, surpassing SOTA methods GEPA and ACE by up to 37.2% and 48.2%.

2606.10327 2026-06-10 cs.CL cs.LG 交叉投稿

The Order Matters: Sequential Fine-Tuning of LLaMA for Coherent Automated Essay Scoring

顺序重要:LLaMA的序列微调用于连贯的自动作文评分

Ali Keramati, Mark Warschauer

发表机构 * University of California, Irvine(加州大学伊文斯分校)

AI总结 提出对LLaMA-3.1-8B进行任务感知的序列微调,按作文话语结构顺序训练,在PERSUADE 2.0语料上证据F1达65%、结论F1达87%,超越独立训练和70B基线,证明课程设计可提升自动作文评分性能。

详情
AI中文摘要

自动作文评分(AES)系统必须判断相互依赖的话语元素(如引言、立场、证据、结论),但大多数方法孤立地处理这些元素,损害了连贯性和泛化能力。我们研究了对LLaMA-3.1-8B进行任务感知的微调,用于AES,使用参数高效的LoRA和4位量化,并比较了三种训练课程:(i)序列式(依次在引言、立场、主张、证据、结论上微调),(ii)独立式(任务特定模型),以及(iii)随机式(打乱的多任务)。在PERSUADE 2.0语料上的实验表明,建模任务依赖性很重要:序列微调取得了最强的整体结果,包括证据的F1分数65%和结论的87%,以及相应的准确率63%和85%,超越了独立训练,并且在结论上优于通用LLaMA-70B基线,尽管后者容量大得多。随机训练提高了立场评分(57% F1),但在其他地方一致性较差。这些发现表明:(1)与话语结构对齐的课程设计可以实质性地改善AES,以及(2)小型、任务优化的模型可以与显著更大的大型语言模型(LLM)竞争,为可扩展、成本效益高的评估提供了实用途径。我们发布模板和实现细节,以促进复现和未来在教育NLP中课程设计的工作。

英文摘要

Automated Essay Scoring (AES) systems must judge interdependent discourse elements (e.g., lead, claim, evidence, conclusion), yet most approaches treat these in isolation, harming coherence and generalization. We investigate task-aware fine-tuning of LLaMA-3.1-8B for AES using parameter-efficient LoRA with 4-bit quantization and compare three training curricula: (i) Sequential (progressively fine-tuning on lead, then position, then claim, then evidence, then conclusion), (ii) Independent (task-specific models), and (iii) Randomized (shuffled multi-task). Experiments on the PERSUADE~2.0 corpus show that modeling task dependencies matters: Sequential fine-tuning yields the strongest overall results, including F1 scores of 65% (evidence) and 87% (conclusion) and corresponding accuracies of 63% and 85%, surpassing Independent training and outperforming a general-purpose LLaMA-70B baseline on conclusion despite its far larger capacity. Randomized training improves position scoring (57% F1) but is less consistent elsewhere. These findings indicate that (1) curriculum design aligned with discourse structure can materially improve AES, and (2) small, task-optimized models can be competitive with substantially larger Large Language Models (LLM), offering a practical path to scalable, cost-effective assessment. We release templates and implementation details to facilitate reproduction and future work on curriculum design for educational NLP.

2606.10918 2026-06-10 cs.RO cs.LG 交叉投稿

Task Robustness via Re-Labelling Vision-Action Robot Data

通过重新标注视觉-动作机器人数据的任务鲁棒性

Artur Kuramshin, Özgür Aslan, Cyrus Neary, Glen Berseth

发表机构 * Mila — Quebec AI Institute(Mila — 魁北克人工智能研究所) Université de Montréal(蒙特利尔大学) The University of British Columbia(不列颠哥伦比亚大学)

AI总结 提出TREAD框架,利用大型视觉语言模型对机器人数据集进行语义子任务分解和多样化指令生成,无需额外数据收集,提升策略在未见任务上的泛化能力。

Comments Project website: https://akuramshin.github.io/tread

详情
AI中文摘要

近年来,机器人学习模型规模的扩大产生了令人印象深刻的策略,能够执行各种操作任务并泛化到新场景。然而,这些策略在遵循指令方面仍然存在困难,很可能是因为现有机器人数据集中的语言和动作序列多样性有限。本文介绍了通过重新标注视觉-动作机器人数据实现任务鲁棒性(TREAD),这是一个可扩展的框架,利用大型视觉语言模型(VLM)在不进行额外数据收集的情况下增强现有机器人数据集,利用这些模型中嵌入的可迁移知识。我们的方法通过三个阶段利用预训练的VLM:从原始指令标签和初始场景生成语义子任务,根据这些子任务对演示视频进行分割,并生成包含对象属性的多样化指令,有效地将较长的演示分解为基于语言-动作对。我们进一步通过用语言多样化的文本目标版本增强数据来提高鲁棒性。在LIBERO上的评估表明,在我们增强的数据集上训练的策略在未见过的、新颖的任务和目标上表现出改进的性能。我们的结果表明,TREAD通过轨迹分解增强了规划泛化,并通过增加语言多样性增强了语言条件策略泛化。

英文摘要

The recent trend in scaling models for robot learning has resulted in impressive policies that can perform various manipulation tasks and generalize to novel scenarios. However, these policies continue to struggle with following instructions, likely due to the limited linguistic and action sequence diversity in existing robotics datasets. This paper introduces Task Robustness via Re-Labelling Vision-Action Robot Data (TREAD), a scalable framework that leverages large Vision-Language Models (VLMs) to augment existing robotics datasets without additional data collection, harnessing the transferable knowledge embedded in these models. Our approach leverages a pretrained VLM through three stages: generating semantic sub-tasks from original instruction labels and initial scenes, segmenting demonstration videos conditioned on these sub-tasks, and producing diverse instructions that incorporate object properties, effectively decomposing longer demonstrations into grounded language-action pairs. We further enhance robustness by augmenting the data with linguistically diverse versions of the text goals. Evaluations on LIBERO demonstrate that policies trained on our augmented datasets exhibit improved performance on novel, unseen tasks and goals. Our results show that TREAD enhances both planning generalization through trajectory decomposition and language-conditioned policy generalization through increased linguistic diversity.

2507.14725 2026-06-10 cs.LG cs.AI 版本更新

GRID: Scaling Task-Agnostic Inference in Continual Prompt Tuning

GRID:持续提示调优中任务无关推理的规模化

Anushka Tiwari, Sayantan Pal, Rohini K. Srihari, Kaiyi Ji

发表机构 * State University of New York at Buffalo(纽约州立大学布法罗分校) Department of Computer Science and Engineering(计算机科学与工程系) Institute for Artificial Intelligence and Data Science(人工智能与数据科学研究院)

AI总结 提出GRID框架,通过输出空间感知解码和梯度引导提示选择,解决持续学习中任务无关推理的性能退化与可扩展性问题,在长序列和负迁移基准上提升后向迁移并减少提示内存。

详情
AI中文摘要

基于提示的持续学习提供了一种参数高效的方式,使大型语言模型能够适应任务序列。然而,现有方法通常依赖任务感知推理,并维护不断扩展的任务特定提示集,导致(1)当推理时任务标识符不可用于提示选择时,早期任务性能严重下降;(2)随着任务序列增长,可扩展性受限。我们提出GRID,一个统一的框架来解决这些挑战。GRID包含一个输出空间感知解码机制,通过利用代表性输入和自动标签语义归一化来增强后向迁移,以及一个梯度引导的提示选择策略,将信息量较少的提示压缩为单个聚合表示,以实现可扩展、内存高效的持续学习。在长序列和负迁移基准上的大量实验表明,GRID改善了后向迁移,实现了有竞争力的前向迁移,并显著减少了编码器-解码器和仅解码器架构(包括T5、Qwen和LLaMA)中的提示内存。源代码可从此https URL获取。

英文摘要

Prompt-based continual learning (CL) offers a parameter-efficient way to adapt large language models (LLMs) across task sequences. However, existing methods often rely on task-aware inference and maintain an expanding set of task-specific prompts, leading to (1) severe performance degradation on earlier tasks when task identifiers are unavailable for prompt selection at inference time, and (2) limited scalability as task sequence grows. We propose GRID, a unified framework designed to address these challenges. GRID incorporates an output-space-aware decoding mechanism that enhances backward transfer by leveraging representative inputs and automatic label semantic normalization, alongside a gradient-guided prompt selection strategy that compresses less informative prompts into a single aggregated representation for scalable, memory-efficient continual learning. Extensive experiments on long-sequence and negative-transfer benchmarks show that GRID improves backward transfer, achieves competitive forward transfer, and substantially reduces prompt memory across encoder-decoder and decoder-only architectures, including T5, Qwen, and LLaMA. Source code is available at https://github.com/AnushkaTi/GRID.

2602.01179 2026-06-10 cs.LG 版本更新

Rethinking the Flow-Based Gradual Domain Adaptation: A Semi-Dual Optimal Transport Perspective

重新思考基于流的渐进域适应:半对偶最优传输视角

Zhichao Chen, Zhan Zhuang, Yunfei Teng, Hao Wang, Fangyikang Wang, Zhengnan Li, Tianqiao Liu, Haoxuan Li, Zhouchen Lin

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 针对渐进域适应中中间域构建问题,提出熵正则化半对偶非平衡最优传输框架,避免似然估计,实现稳定训练并提升性能。

Comments The paper has been accepted for presentation as a regular paper at the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

渐进域适应(GDA)旨在通过中间域逐步将模型从源域适应到目标域,以减轻域偏移。然而,真实的中间域通常不可用或无效,因此需要合成中间样本。基于流的模型最近被用于此目的,通过在源分布和目标分布之间插值。值得注意的是,它们的训练通常依赖于基于样本的对数似然估计,这可能会丢弃有用信息,从而降低GDA性能。解决这一限制的关键是通过样本直接构建中间域。为此,我们提出了一个熵正则化半对偶非平衡最优传输(E-SUOT)框架来构建中间域。具体来说,我们将基于流的GDA重新表述为拉格朗日对偶问题,并推导出一个等效的半对偶目标,从而避免了似然估计的需要。然而,对偶问题导致不稳定的最小-最大训练过程。为了缓解这个问题,我们进一步引入熵正则化,将其转换为更稳定的顺序优化过程。基于此,我们提出了一个新的GDA训练框架,并提供了稳定性和泛化方面的理论分析。最后,进行了大量实验来证明E-SUOT框架的有效性。

英文摘要

Gradual domain adaptation (GDA) aims to mitigate domain shift by progressively adapting models from the source domain to the target domain via intermediate domains. However, real intermediate domains are often unavailable or ineffective, necessitating the synthesis of intermediate samples. Flow-based models have recently been used for this purpose by interpolating between source and target distributions. Notably, their training typically relies on sample-based log-likelihood estimation, which can discard useful information and thus degrade GDA performance. The key to addressing this limitation is constructing the intermediate domains via samples directly. To this end, we propose an Entropy-regularized Semi-dual Unbalanced Optimal Transport (E-SUOT) framework to construct intermediate domains. Specifically, we reformulate flow-based GDA as a Lagrangian dual problem and derive an equivalent semi-dual objective that circumvents the need for likelihood estimation. However, the dual problem leads to an unstable min-max training procedure. To alleviate this issue, we further introduce the entropy regularization to convert it into a more stable sequential optimization procedure. Based on this, we propose a novel GDA training framework and provide theoretical analysis in terms of stability and generalization. Finally, extensive experiments are conducted to demonstrate the efficacy of the E-SUOT framework.

2602.12542 2026-06-10 cs.LG cs.AI 版本更新

Exploring Accurate and Transparent Domain Adaptation in Predictive Healthcare via Concept-Grounded Orthogonal Inference

探索预测性医疗中基于概念正交推理的准确且透明的域适应

Pengfei Hu, Chang Lu, Feifan Liu, Yue Ning

发表机构 * Department of Computer Science, Stevens Institute of Technology, Hoboken, NJ, United States(斯蒂文斯理工学院计算机科学系) UMass Chan Medical School, University of Massachusetts Amherst, Amherst, MA, United States(马萨诸塞大学阿默斯特分校UMass Chan医学学校)

AI总结 提出ExtraCare模型,通过将患者表示分解为不变和协变分量并强制正交,在保留标签信息的同时暴露域特定变异,实现准确预测并提供基于医疗概念的可解释性。

Comments Accepted by ICML 2026 Main Conference

详情
AI中文摘要

用于电子健康记录(EHR)临床事件预测的深度学习模型在不同数据分布下部署时,常常性能下降。虽然域适应(DA)方法可以缓解这种偏移,但其“黑箱”性质阻碍了在临床实践中的广泛采用,而临床实践中透明度对于信任和安全至关重要。我们提出ExtraCare,将患者表示分解为不变和协变分量。通过监督这两个分量并在训练中强制其正交性,我们的模型在保留标签信息的同时暴露域特定变异,从而实现比大多数特征对齐模型更准确的预测。更重要的是,它通过将稀疏的潜在维度映射到医疗概念,并通过目标消融量化其贡献,提供人类可理解的解释。ExtraCare在两个真实EHR数据集上,跨多个域划分设置进行评估,展示了优越的性能以及增强的透明度,其准确预测和来自广泛案例研究的解释证明了这一点。

英文摘要

Deep learning models for clinical event prediction on electronic health records (EHR) often suffer performance degradation when deployed under different data distributions. While domain adaptation (DA) methods can mitigate such shifts, their "black-box" nature prevents widespread adoption in clinical practice where transparency is essential for trust and safety. We propose ExtraCare to decompose patient representations into invariant and covariant components. By supervising these two components and enforcing their orthogonality during training, our model preserves label information while exposing domain-specific variation at the same time for more accurate predictions than most feature alignment models. More importantly, it offers human-understandable explanations by mapping sparse latent dimensions to medical concepts and quantifying their contributions via targeted ablations. ExtraCare is evaluated on two real-world EHR datasets across multiple domain partition settings, demonstrating superior performance along with enhanced transparency, as evidenced by its accurate predictions and explanations from extensive case studies.

11. 数据集、基准与评测 41 篇

2606.09865 2026-06-10 cs.LG cs.CR cs.IR 新提交

LLM-as-a-Discriminator: When Synthetic Tables Still Look Real

LLM作为判别器:当合成表格看起来仍然真实

Manel Slokom, Malek Slokom, Thierno Kante

AI总结 提出用LLM区分真实与合成表格数据,测试不同设置和模型,发现LLM判别可作为实用的隐私审计信号。

详情
AI中文摘要

隐私和数据共享常常处于紧张状态。许多组织使用合成数据来降低隐私风险,同时仍能共享有用的数据。对于表格数据,审计隐私仍然困难。在许多情况下,即使是人类也很难判断一个表格是真实的还是合成的。在本文中,我们提出了一种基于LLM判别的方法。我们要求LLM将每个表格样本分类为真实或合成。我们测试了两种设置:C1仅包含表格,C2包含表格和分布元数据。我们使用LLaMA作为开放模型,Gemini作为参考模型。在我们的实验中,我们在两个公共数据集UCI Adult和ACS Census上运行了三种合成模型:CTGAN、TVAE和Gaussian Copula。我们收集了451个有效试验。我们的结果显示模型之间存在明显差异。在Adult上,LLaMA在报告单元格中达到DRS=0%,而Gemini对CTGAN和TVAE达到DRS=100%。在Census上,LLaMA预测大多数样本为合成,而Gemini在C1中保持高值,但在C2中对CTGAN和TVAE下降。我们还与分类器双样本检验(C2ST)和记录链接作为分布基线进行了比较,并与2名标注员和240次试验的人类试点进行了比较。我们的结果表明,当模型选择、每个提供者的报告和数据编码得到谨慎处理时,LLM判别是一种实用的隐私审计信号。为了可重复性,代码和实验脚本可在以下网址获得:https://this URL。

英文摘要

Privacy and data sharing are often in tension. Many organizations use synthetic data to reduce privacy risk and still share useful data. For tabular data, auditing privacy remains hard. In many cases, even humans cannot easily tell if a table is real or synthetic. In this paper, we propose a method based on LLM discrimination. We ask an LLM to classify each table sample as REAL or SYNTHETIC. We test two settings: C1 with table only, and C2 with table plus distributional metadata. We use LLaMA as an open model and Gemini as a reference model. In our experiments, we run three synthesis models, CTGAN, TVAE, and Gaussian Copula, on two public datasets, UCI Adult and ACS Census. We collect 451 valid trials. Our results show clear differences between models. On Adult, LLaMA reaches DRS=0% in reported cells, while Gemini reaches DRS=100% for CTGAN and TVAE. On Census, LLaMA predicts SYNTHETIC for most samples, while Gemini stays high in C1 but drops for CTGAN and TVAE in C2. We also compare with a classifier two-sample test (C2ST) and record linkage as distributional baselines, and with a human pilot of 2 annotators and 240 trials. Our results show that LLM discrimination is a practical privacy audit signal when model choice, per provider reporting, and data encoding are handled with care. For reproducibility, code and experiment scripts are available at https://github.com/SlokomManel/LLM-as-a-Discriminator.

2606.09874 2026-06-10 cs.LG stat.ML 新提交

Disjoint or Overlapping? Inference Windowing for Reconstruction-Based Time Series Anomaly Detection

不相交还是重叠?基于重构的时间序列异常检测中的推理窗口化

Guillaume Coulaud, Reza Akbarinia, Florent Masseglia

发表机构 * University of Montpellier, Inria, CNRS, LIRMM(蒙彼利埃大学、Inria、CNRS、LIRMM)

AI总结 研究推理步长(重叠窗口)对基于重构的时间序列异常检测性能的影响,提出统一评估协议,实验表明重叠窗口平均提升28%且改变方法排名。

详情
AI中文摘要

基于重构的方法广泛用于时间序列异常检测,其中模型被训练来重构子序列,并通过重构误差识别异常。然而,由于异构的评估实践和不明确的推理过程,报告的结果往往难以比较。在本文中,我们重新审视单变量离线设置下的基于重构的异常检测,并研究推理步长的作用,该步长控制子序列是作为不相交窗口处理还是重叠处理。我们在精心策划的TSB-AD基准上提出了一个统一的训练、调优和多种子评估协议,并研究了重叠推理如何影响一系列重构模型的异常检测性能,包括基于PCA的基线、DLinear、AutoEncoder、TimesNet和Transformer变体。结果表明,在所有模型中,重叠窗口带来一致的改进,平均相对增益高达+28%,并且可以改变方法排名。我们进一步分析了跨数据集、随机种子和超参数配置的变异性。最后,我们使用与滑动窗口重构对齐的定位标准,在完整的UCR存档上补充了基准研究。总体而言,我们的结果强调,基于重构的异常检测性能不仅取决于模型架构和训练,还取决于推理选择,这促使采用清晰且可重复的协议。我们的结果表明,基于重构的基线在TSB-AD和UCR基准上都取得了强劲的性能,支持它们作为单变量时间序列异常检测的竞争性和实用方法。

英文摘要

Reconstruction-based methods are widely used for time series anomaly detection, where models are trained to reconstruct subsequences, and anomalies are identified through reconstruction errors. However, reported results are often hard to compare due to heterogeneous evaluation practices and underspecified inference procedures. In this paper, we revisit reconstruction-based anomaly detection in the univariate offline setting and study the role of the inference stride, which controls whether subsequences are processed as disjoint windows or with overlap. We propose a unified training, tuning, and multi-seed evaluation protocol on the curated TSB-AD benchmark, and study how overlapping inference affects anomaly detection performance for a range of reconstruction models, including PCA-based baselines, DLinear, an AutoEncoder, TimesNet, and Transformer variants. The results show that across all models, overlapping windows yield consistent improvements, with average relative gain up to +28%, and can alter method rankings. We further analyze variability across datasets, random seeds, and hyperparameter configurations. Finally, we complement the benchmark study with an evaluation on the full UCR archive using localization criteria aligned with sliding-window reconstruction. Overall, our results highlight that reconstruction-based anomaly detection performance depends not only on model architecture and training, but also on inference choices, motivating a clear and reproducible protocol. Our results show that reconstructionbased baselines achieve strong performance on both TSB-AD and UCR benchmarks, supporting them as competitive and practical approaches for univariate time series anomaly detection.

2606.09878 2026-06-10 cs.LG 新提交

FailureScope: Cross-Regime Behavioral Diagnosis of Language Model Weaknesses

FailureScope: 语言模型弱点的跨机制行为诊断

Nicholas Saban

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出FailureScope方法,通过跨模型通过/失败模式聚类评估探针,在单轮基准、多轮对话和对抗性代理攻击三种机制下稳定生成可解释的失败分类,实现高效任务采样和跨模型失败预测。

详情
AI中文摘要

标准基准报告聚合准确率,但从业者需要知道模型缺乏哪些特定能力。我们引入FailureScope,一种行为诊断方法,通过跨模型通过/失败模式(留一模法,LOMO)对评估探针进行聚类,并展示其在通常单独研究的三种机制(单轮基准、多轮对话和对抗性代理攻击)中产生稳定、可解释的失败分类。在18个模型的2,664个单轮任务上,基于分类的采样在50个任务时达到Kendall's tau=0.81(随机选择为0.34),跨模型失败预测达到AUC 0.88。相同的原语在363个任务的多轮语料库和630个对抗性代理轨迹上恢复出可解释的聚类,其中揭示了一个元失败模式:LLM评判ASR与实际执行之间存在73-100个百分点的差距。所有三种机制的聚类凝聚力保持强劲,我们认为这证明了行为聚类是一种可移植的诊断原语,能够泛化到任何单一基准之外。我们发布了该流程、三个带注释的语料库以及跨机制分类。

英文摘要

Standard benchmarks report aggregate accuracy, but practitioners need to know which specific capabilities a model lacks. We introduce FailureScope, a behavioral-diagnosis method that clusters evaluation probes by their cross-model pass/fail patterns (leave-one-model-out, LOMO), and show it yields stable, interpretable failure taxonomies across three regimes usually studied separately: single-turn benchmarks, multi-turn dialogue, and adversarial agent attacks. On 2,664 single-turn tasks across 18 models, taxonomy-conditioned sampling reaches Kendall's tau = 0.81 at 50 tasks (versus 0.34 for random selection), and cross-model failure prediction reaches AUC 0.88. The same primitive recovers interpretable clusters on a 363-task multi-turn corpus and on 630 adversarial agent traces, where it exposes a meta-failure mode: a 73-100 percentage-point gap between LLM-judge ASR and real execution. Cluster cohesion remains strong across all three regimes, which we take as evidence that behavioral clustering is a portable diagnosis primitive that generalizes beyond any single benchmark. We release the pipeline, three annotated corpora, and the cross-regime taxonomies.

2606.09890 2026-06-10 cs.LG cs.AI cs.CL 新提交

PreAct-Bench: Benchmarking Predictive Monitoring in LLMs

PreAct-Bench:大语言模型中的预测性监控基准

Hainiu Xu, Italo Luis da Silva, Jiangnan Ye, Yuhao Wang, Wei Liu, Linyi Yang, Jonathan Richard Schwarz, Nicola Paoletti, Yulan He, Hanqi Yan

发表机构 * King’s College London(伦敦国王学院) National University of Singapore(新加坡国立大学) Southern University of Science and Technology(南方科技大学) Thomson Reuters Foundational Research(汤姆森路透基础研究) Imperial College London(伦敦帝国学院) The Alan Turing Institute(艾伦·图灵研究所)

AI总结 提出预测性监控任务,在动作执行前判断是否会导致不道德行为,并构建PreActBench基准,评估多种模型发现该任务具有挑战性。

详情
AI中文摘要

大语言模型(LLMs)越来越多地被部署为能够执行多步动作轨迹以实现给定目标的自主代理。虽然现有的安全研究集中于从完整轨迹中检测不道德行为,但这种范式本质上是回顾性的:它仅在伤害已经发生后识别伤害。在这项工作中,我们研究了一个关键但被忽视的安全任务,我们称之为预测性监控:仅给定部分动作轨迹,模型能否在执行公开动作之前推断出它是否会以不道德行为告终?为了支持这一任务,我们提出了PreActBench,一个包含1000个跨五个领域的成对道德和不道德动作轨迹的基准。我们使用我们的前缀远见F1指标,在动作轨迹的不同部分上评估了一系列LLMs、安全护栏模型和潜在探测方法。结果表明,尽管人类取得了有希望的性能,但即使对于强模型,预测性监控仍然具有挑战性,突显了在LLM安全中需要面向未来的风险推理。

英文摘要

Large language models (LLMs) are increasingly deployed as autonomous agents capable of executing multi-step action trajectories toward a given objective. While existing safety research has focused on detecting unethical behavior from complete trajectories, this paradigm is fundamentally retrospective: it identifies harm only after it has already occurred. In this work, we study a critical yet overlooked safety task, which we term Predictive Monitoring: given only a partial action trajectory, can a model infer whether it will culminate in an unethical action before the overt action is executed? To support this task, we present PreActBench, a benchmark of 1,000 paired ethical and unethical action trajectories spanning five domains. We evaluate a range of LLMs, safety guardrail models, and latent probing methods across varying fractions of the action trajectory using our Prefix Foresight F1 metric. Results show that while humans achieve promising performance, predictive monitoring remains challenging even for strong models, highlighting the need for future-oriented risk reasoning in LLM safety.

2606.09898 2026-06-10 cs.LG cs.MA q-bio.QM 新提交

TRAPS: Therapeutic Response Analysis via Pathway-informed Stratification

TRAPS: 基于通路信息分层的治疗反应分析

Sujoy Banik, Sayantan Chakraborty, Boishakhi Das Toma, Zainab Ghafoor, Ushashi Bhattacharjee, Koushik Howlader, Tirtho Roy

发表机构 * Rajshahi University of Engineering \& Technology Bangladesh University of Dhaka Bangladesh American International University of Bangladesh Bangladesh Sonoma State University United States Iowa State University United States Rajshahi University of Engineering \& Technology University of Dhaka American International University of Bangladesh Sonoma State University Iowa State University

AI总结 提出首个统一基准,评估三种通路引导的深度学习模型在联合预测癌症治疗反应和生存率上的表现,发现不同模型在不同任务上各有优劣。

详情
AI中文摘要

癌症治疗规划需要同时考虑多个临床维度。临床医生必须确定患者是否应接受靶向分子治疗、放疗,以及他们是否可能存活超过六个月。现有的通路引导深度学习模型是孤立开发和测试的,无法进行跨架构的公平比较。我们提出了第一个用于通路引导治疗反应建模的统一基准,评估了三种生物信息学架构:BINN、GraphPath 和 PATH,使用了来自癌症基因组图谱的五个癌症队列,代表 2,622 名患者,这些患者使用 Reactome 通路活性评分进行编码。每个模型在相同的数据和评估条件下联合训练所有三个临床结果,这是第一项将通路结构化深度学习视为联合治疗和生存预测问题的研究。我们的结果表明,没有一个架构在所有任务中获胜:PATH 在整体靶向分子治疗预测中表现最佳,BINN 在生存预测中最可靠,而没有一个模型能对放疗产生有用的预测,因为该决策的关键驱动因素是基因表达数据中未捕获的临床变量。最引人注目的是,GraphPath 在前列腺靶向分子治疗预测中达到了 0.92 的 AUROC,是整个基准中的最高分,这表明当与具有狭窄靶向驱动程序的队列匹配时,即使存在极端类别不平衡(仅 11% 阳性率),横向共调控结构也能产生卓越的判别能力。

英文摘要

Cancer treatment planning requires decisions across multiple clinical dimensions at once. Clinicians must determine whether a patient should receive targeted molecular therapy, radiation therapy, and whether they are likely to survive beyond six months. Existing pathway-informed deep learning models have been developed and tested in isolation, making fair comparison across architectures impossible. We present the first unified benchmark for pathway-guided therapy response modeling, evaluating three biologically informed architectures, BINN, GraphPath, and PATH, across five cancer cohorts drawn from The Cancer Genome Atlas, representing 2,622 patients encoded using Reactome pathway activity scores. Each model is trained jointly on all three clinical outcomes under identical data and evaluation conditions, the first study to treat pathway-structured deep learning as a combined therapy and survival prediction problem. Our results show that no single architecture wins across all tasks: PATH performs best for targeted molecular therapy prediction overall, BINN is most reliable for survival prediction, and no model produces useful predictions for radiation therapy, as the key drivers of that decision are clinical variables not captured in gene expression data. Most strikingly, GraphPath achieves an AUROC of 0.92 on prostate targeted molecular therapy prediction, the highest score in the entire benchmark, demonstrating that lateral co-regulation structure produces exceptional discriminative power when matched to a cohort with a narrow targetable driver programme, even under conditions of extreme class imbalance at only 11\% positive prevalence.

2606.09912 2026-06-10 cs.LG cs.AI 新提交

Mix, Don't Pick: Why Synthetic Corpus Composition Matters for Time Series Foundation Model Pretraining

混合而非挑选:为什么合成语料组合对时间序列基础模型预训练至关重要

Aaryan Nagpal, Debdeep Sanyal, Murari Mandal, Dhruv Kumar, Saurabh Deshpande

AI总结 针对时间序列基础模型预训练中合成数据生成器选择困难的问题,提出简单等权混合所有生成器的方法,匹配或超越最优单个生成器,并与真实数据结合获得最强预训练语料。

Comments Accepted at the ICML 2026 Workshop on Foundation Models for Structured Data (FMSD), Seoul, South Korea

详情
AI中文摘要

为时间序列基础模型预训练选择错误的合成生成器代价高昂:在相同训练预算下,最佳和最差生成器产生的预测误差差距可达2倍,然而该领域尚无原则性的选择方法。问题因生成器排名在不同架构间不稳定而加剧:在11个生成器家族上,对从头训练的Chronos-T5-Mini和Moirai-Small进行评估,我们发现哪些生成器有用取决于模型架构。我们没有解决生成器选择问题,而是绕过了它:所有生成器的简单等权混合匹配或击败了两种架构的最佳单个生成器,并且将此混合与真实数据组合产生了整体最强的预训练语料。因此,合成预训练是一个语料组合问题,而非生成器选择问题,组合选择应针对每个模型家族进行验证,而非假设可迁移。

英文摘要

Choosing the wrong synthetic generator for time-series foundation model pretraining is costly: under identical training budgets, the best and worst generators produce up to a $2\times$ gap in forecasting error, yet the field has no principled way to make this choice. The problem is compounded by the fact that generator rankings are not stable across architectures: across 11 generator families evaluated on Chronos-T5-Mini and Moirai-Small trained from scratch, we find that which generators are useful depends on the model architecture. Rather than solving the generator selection problem, we sidestep it: a simple equal-weight mixture of all generators matches or beats the best individual generator for both architectures, and composing this mixture with real data yields the strongest pretraining corpora overall. Synthetic pretraining is therefore a corpus composition problem, not a generator selection problem, and composition choices should be validated per model family rather than assumed to transfer.

2606.10194 2026-06-10 cs.LG cs.AI 新提交

MMClima: A Framework for Multimodal Climate Science Data and Evaluation

MMClima:多模态气候科学数据与评估框架

Muhammad Umer Sheikh, Hassan Abid, Khawar Shehzad, Ufaq Khan, Muhammad Haris Khan

AI总结 提出MMClima,一个包含10万+专家验证问答对的多模态气候问答框架,覆盖文本、视频和图表,用于评估多模态语言模型在气候科学中的表现。

详情
AI中文摘要

气候变化研究日益需要能够推理文本、动态视觉内容和科学图表的AI系统,然而现有的气候问答基准规模小、大多为文本,且覆盖模型范围狭窄。我们提出MMClima,一个大规模多模态气候问答框架,包含10万+专家验证的问答对,涵盖五个核心气候科学领域的文章、视频转录和图表。MMClima通过自动化的声明提取和问答合成构建,并采用人在回路验证以确保规模和可靠性。利用MMClima,我们在需要事实回忆、视觉解释和跨模态合成的任务上对最先进的多模态语言模型进行基准测试。此外,我们在文本分割上进行微调,得到mmclima-70b-txt,一个领域适应的基线模型,在文本问答上优于强大的开源和闭源模型。我们发布数据集、评估流程、微调模型权重和数据创建框架,以支持气候科学的标准多模态评估。

英文摘要

Climate change research increasingly requires AI systems that reason across text, dynamic visual content, and scientific figures, yet existing climate QA benchmarks are small, mostly textual, and cover a narrow range of models. We introduce MMClima, a large-scale multimodal climate question answering framework with 104k+ expert-validated question-answer pairs spanning articles, video transcriptions, and figures across five core climate science domains. MMClima is constructed via automated claim extraction and QA synthesis with human-in-the-loop validation to ensure both scale and reliability. Using MMClima, we benchmark state-of-the-art multimodal language models on tasks requiring factual recall, visual interpretation, and cross-modal synthesis. We additionally fine-tune on the textual split to produce mmclima-70b-txt, a domain-adapted baseline that outperforms strong open- and closed-source models on textual QA. We release the dataset, evaluation pipeline, fine-tuned model weights, and data creation framework to support standardized multimodal evaluation for climate science.

2606.10287 2026-06-10 cs.LG cs.CL 新提交

When Metrics Disagree: A Meta-Analysis of Knowledge-Graph-Completion Model Benchmarking

当指标不一致时:知识图谱补全模型基准测试的元分析

Haji Gul, Ajaz Ahmad Bhat

发表机构 * School of Digital Science, Universiti Brunei Darussalam(布鲁内尔大学数字科学学院)

AI总结 针对KGC模型评估中指标冲突问题,提出多准则决策框架,通过元分析发现Z-score是最平衡的聚合器,并识别出不同预测任务下的最优模型。

详情
AI中文摘要

评估知识图谱补全(KGC)模型仍然具有挑战性,因为标准评估依赖于孤立的基于排名的指标,如MRR、Hits$@$k和Mean Rank,这些指标通常在不同数据集上产生冲突的模型排序。一个在MRR上领先的模型可能在Hits@1上落后,而在一个数据集上的强性能可能无法推广到另一个数据集。这种碎片化阻碍了比较,使得选择性报告成为可能,并掩盖了真正的进展。我们将KGC评估重新定义为多准则决策(MCDM)问题,并提出了一个对七个聚合器在五个测试上的元分析:一致性、跨数据集稳定性、指标独立性、噪声下的鲁棒性和泛化性。每个测试通过留一模型(LOMO)和留一组(LOGO)移除进行平均,以便可靠性反映聚合器在不同模型子集上的行为。在尾部$(h,r,?)$和关系$(h,?,t)$预测中,帕累托最优分析确定Z-score是最平衡的聚合器,它在尾部预测中排名DualE最高,在关系预测中排名FMS(流调制评分)最高。使用相同移除的测试敏感性分析表明,一致性和稳定性在很大程度上是移除不变的,而泛化性和独立性是最敏感的。该框架解决了评估不一致性,并为KGC中的聚合器选择和模型基准测试提供了基于证据的指导。

英文摘要

Evaluating Knowledge Graph Completion (KGC) models remains challenging because standard assessment relies on isolated rank-based metrics such as MRR, Hits$@$k, and Mean Rank, which often produce conflicting model orderings across datasets. A model that leads on MRR may trail on Hits@1, and strong performance on one dataset may not generalize to another. This fragmentation hinders comparison, enables selective reporting, and obscures real progress. We reframe KGC evaluation as a Multi-Criteria Decision-Making (MCDM) problem and present a meta-analysis of seven aggregators across five tests: consistency, cross-dataset stability, metric independence, robustness under noise, and generalizability. Each test is averaged over leave-one-model-out (LOMO) and leave-one-group-out (LOGO) removals so that reliability reflects aggregator behavior across diverse model subsets. Across tail $(h,r,?)$ and relation $(h,?,t)$ prediction, Pareto-optimal analysis identifies Z-score as the most balanced aggregator, which ranks DualE highest for tail prediction and FMS (Flow-Modulated Scoring) highest for relation prediction. A test-sensitivity analysis using the same removals shows that consistency and stability are largely removal-invariant, while generalizability and independence are the most sensitive. The framework resolves evaluation inconsistencies and offers evidence-based guidance for aggregator selection and model benchmarking in KGC.

2606.10798 2026-06-10 cs.LG 新提交

CITRAS-FM: Tiny Time Series Foundation Model for Covariate-Informed Zero-Shot Forecasting

CITRAS-FM: 面向协变量信息零样本预测的微型时间序列基础模型

Yosuke Yamaguchi, Issei Suemitsu, Yuki Kajihara, Wenpeng Wei

发表机构 * University of Tokyo(东京大学) Keio University(庆应大学)

AI总结 提出CITRAS-FM,一个仅7M参数的时间序列基础模型,通过引入Shifted Attention和协变量合成方法CovSynth,实现高效零样本预测,在100个任务上达到子10M模型最优精度且CPU推理时间低于0.1秒。

Comments Accepted to EUSIPCO 2026

详情
AI中文摘要

预训练的时间序列基础模型(TSFMs)已实现对未见目标序列的零样本预测。然而,现有TSFMs通常计算成本高,对多样变量类型的支持有限,且往往未能考虑外生影响目标变异的协变量。为解决这些挑战,我们提出CITRAS-FM,一个仅7M参数的微型TSFM,支持单变量、多变量和协变量信息零样本预测,并实现实时CPU推理。基于补丁化的仅解码器Transformer,CITRAS-FM在跨变量模块中引入Shifted Attention,以有效利用在整个预测范围内可获取的已知协变量。此外,为了在协变量丰富语料稀缺的情况下实现协变量感知预训练,我们提出CovSynth,从目标序列的分解成分中合成逼真的协变量。在fev-bench上的实验(涵盖不同设置下的100个任务)表明,CITRAS-FM在子10M TSFMs中实现了最先进的零样本精度,同时提供低于0.1秒的CPU推理,在预测精度和实时部署能力之间取得了强平衡。

英文摘要

Pretrained time series foundation models (TSFMs) have enabled zero-shot forecasting on unseen target series. However, existing TSFMs often incur high computational cost and provide limited support for diverse variable types, often failing to account for covariates that exogenously influence target variability. To address these challenges, we propose CITRAS-FM, a tiny 7M-parameter TSFM that supports univariate, multivariate, and covariate-informed zero-shot forecasting with real-time CPU inference. Built on a patch-based, decoder-only Transformer, CITRAS-FM introduces Shifted Attention into the cross-variate module to effectively exploit known covariates accessible throughout the forecast horizon. Moreover, to enable covariate-aware pretraining despite the scarcity of covariate-rich corpora, we propose CovSynth, which synthesizes realistic covariates from decomposed components of target series. Experiments on fev-bench, spanning 100 tasks across various settings, demonstrate that CITRAS-FM achieves state-of-the-art zero-shot accuracy among sub-10M TSFMs while delivering sub-0.1-second CPU inference, offering a strong balance between forecasting accuracy and real-time deployability.

2606.10938 2026-06-10 cs.LG 新提交

A Systematic Approach for Selecting Trajectories for Data Augmentation

一种系统化的轨迹数据增强选择方法

Adam Nordling

发表机构 * Masters Degree Project(硕士项目)

AI总结 提出系统化框架评估五种轨迹选择策略(异常性、多样性、代表性、不确定性和随机),在四个数据集上测试,发现异常性和不确定性策略在稀疏数据中提升稳定性,但在密集数据中可能引入噪声。

Comments 39 pages, 4 figures, Masters project

详情
AI中文摘要

轨迹数据增强是缓解机器学习应用中数据稀缺问题的一种有前景的方法,但其效用因保持时空一致性的复杂性而受到限制。尽管先前的工作证明了几何扰动的可行性,但它依赖于简单的随机选择,在理解哪些轨迹应被增强以获得最大收益方面留下了关键空白。本文通过开发一个系统且可扩展的框架来评估五种系统选择策略:异常性、多样性、代表性、不确定性和随机选择,填补了这一空白。这些策略在四个数据集(涵盖动物行为(Foxes和Starkey)、海上交通(AIS)和城市交通(Car))上使用一系列线性和非线性机器学习模型进行了严格测试。作为评估的一部分,集成了基于Optuna的超参数优化循环,以在探索的搜索空间内经验性地确定每个数据集的最佳增强参数。结果表明,虽然系统选择并非通用解决方案,但它比随机基线具有明显优势。系统策略,特别是异常性和不确定性,表现出更高的稳定性,并且在密集数据集中不易出现随机采样观察到的性能下降。然而,研究结果也表明,增强的价值是有严格条件的。通过UMAP的可视化分析表明,虽然系统增强成功修复了稀疏数据集中的拓扑碎片化,但在高质量密集数据集中,它可能充当破坏性噪声信号。此外,研究还发现了高速度域中的物理限制,其中标准扰动技术导致特征空间中的发散。

英文摘要

Trajectory data augmentation is a promising approach to mitigate data scarcity in machine learning applications, but its utility has been limited by the complexity of preserving spatio-temporal coherence. Although prior work demonstrated the viability of geometric perturbation, it relied on naive random selection, leaving a critical gap in understanding which trajectories should be augmented for maximal benefit. This thesis addresses this gap by developing a systematic and scalable framework to evaluate five systematic selection strategies: Outlierness, Diversity, Representativeness, Uncertainty, and Random selection. These strategies were rigorously tested across four datasets covering animal behavior (Foxes and Starkey), maritime traffic (AIS), and urban traffic (Car) using a suite of linear and non-linear machine learning models. As part of this evaluation, an Optuna-based hyperparameter optimization loop was integrated to empirically identify the best-performing augmentation parameters for each dataset within the explored search space. The results indicate that, while systematic selection is not a universal solution, it offers distinct advantages over the random baseline. Systematic strategies, particularly Outlierness and Uncertainty, demonstrated higher stability and were less prone to performance degradation observed with random sampling in dense datasets. However, the findings also reveal that the value of augmentation is strictly conditional. Visual analysis via UMAP demonstrates that while systematic augmentation successfully repairs topological fragmentation in sparse datasets, it can act as a corrupting noise signal in high-quality, dense datasets. Furthermore, the study identified physical limitations in high-velocity domains, where standard perturbation techniques lead to divergence in feature space...

2606.11144 2026-06-10 cs.LG q-bio.GN q-bio.QM stat.AP 新提交

OncoTraj: a public benchmark for longitudinal resistance prediction in EGFR-mutant non-small-cell lung cancer on osimertinib

OncoTraj:EGFR突变非小细胞肺癌奥希替尼耐药纵向预测的公共基准

Abhijoy Sarkar, Aarchi Singh Thakur

发表机构 * Span AI

AI总结 针对EGFR突变非小细胞肺癌一线奥希替尼耐药预测缺乏公共基准的问题,提出OncoTraj基准,整合813名患者数据,定义三项任务,并发现单时间点组织NGS特征导致所有模型性能接近随机,而TP53共突变与进展率升高相关。

Comments 24 pages, 7 figures, 4 tables. Code, data, and trained model weights: https://github.com/span-ai-labs/oncotraj. Python package: pip install oncotraj. Dataset: https://huggingface.co/datasets/span-ai-labs/oncotraj-v1

详情
AI中文摘要

EGFR突变非小细胞肺癌(NSCLC)对一线奥希替尼的耐药是治疗压力下可预测克隆演化的典型例子,但目前尚无用于训练或评估相应纵向患者轨迹计算模型的公共基准。我们推出OncoTraj,这是一个来自三个真实世界临床基因组数据源(MSK-CHORD(672名患者)、AACR Project GENIE BPC NSCLC(34名患者)和FLAURA分子耐药补充(107名患者))的813名接受一线奥希替尼治疗的EGFR突变NSCLC患者的公共基准。OncoTraj定义了三个锁定任务:(A)固定12个月标志点的进展二元分类,(B)首次进展时间(天)的回归,以及(C)主要耐药机制的六类分类。我们发布了统一的数据集、经过审计的无泄漏保证的患者级训练/验证/测试划分、一个开源评估框架,以及六个参考基线,涵盖多数类预测器、逻辑回归、随机森林、XGBoost、LSTM和多任务Transformer。使用v1的单时间点快照特征,所有模型在干净的源内评估中均未超过随机水平:这种天花板在不同模型类别中的一致性表明限制在于输入模态(单快照组织NGS而非连续ctDNA),而非算法。该基准确实恢复了可重复的、与文献一致的关联:TP53共突变使整个队列的12个月进展率从29%提高到59%。OncoTraj建立了一个可重复、经泄漏审计的基线,并将模态限制转化为针对富集连续ctDNA的v2的具体设计要求。

英文摘要

Resistance to first-line osimertinib in EGFR-mutant non-small-cell lung cancer (NSCLC) is the canonical example of predictable clonal evolution under therapeutic pressure, yet no public benchmark exists for training or evaluating computational models on the corresponding longitudinal patient trajectories. We introduce OncoTraj, a public benchmark of 813 EGFR-mutant NSCLC patients receiving first-line osimertinib, harmonized from three real-world clinical-genomic sources: MSK-CHORD (672 patients), AACR Project GENIE BPC NSCLC (34 patients), and the FLAURA molecular-resistance supplement (107 patients). OncoTraj defines three locked tasks: (A) binary classification of progression by a fixed 12-month landmark, (B) regression of time-to-first-progression in days, and (C) six-class classification of the dominant resistance mechanism. We release the harmonized dataset, patient-level train/validation/test splits with an audited no-leakage guarantee, an open-source evaluation harness, and six reference baselines spanning a majority-class predictor, logistic regression, random forest, XGBoost, an LSTM, and a multi-task transformer. With v1's single-timepoint snapshot features, no task clears chance on clean within-source evaluation: the uniformity of this ceiling across every model class localizes the limit to the input modality (single-snapshot tissue NGS rather than serial ctDNA), not the algorithm. The benchmark does recover a reproducible literature-consistent association: TP53 co-mutation raises the 12-month progression rate from 29% to 59% cohort-wide. OncoTraj establishes a reproducible, leakage-audited baseline and converts the modality limit into concrete design requirements for a serial-ctDNA-enriched v2.

2606.09882 2026-06-10 cs.CV cs.LG 交叉投稿

WHU-Infra3D: A Full-stack Multi-modal Dataset and Benchmark for 3D Roadside Infrastructure Inventory

WHU-Infra3D:面向3D路边基础设施清单的全栈多模态数据集与基准

Chong Liu, Luxuan Fu, Xuyu Feng, Zhen Dong, Bisheng Yang

发表机构 * State Key Laboratory of Information Engineering in Surveying, Mapping and Remote Sensing (LIESMARS)(信息工程测绘遥感国家重点实验室) Wuhan University(武汉大学)

AI总结 提出WHU-Infra3D多模态基准数据集,覆盖三城市53.8公里,融合全景图像与LiDAR点云,提供2D-3D实例关联和跨帧跟踪,支持基础设施状态诊断与属性识别,填补自动化维护数据集空白。

详情
AI中文摘要

数字孪生城市的范式正从粗略的视觉映射转向更精确、可操作的城市资产数字化。然而,现有数据集主要关注粗略的视觉感知,缺乏自动化基础设施维护所需的严格多模态对齐和属性及状态诊断。为弥合这一差距,我们引入了WHU-Infra3D,一个大规模、多模态的基准数据集,专门用于路边基础设施清单。覆盖三个城市53.8公里,WHU-Infra3D独特地集成了全景图像和LiDAR点云,并具有严格的2D-3D实例关联和跨帧跟踪。该数据集包含超过17.5万个多视图2D边界框以及数千个3D基础设施实例,提供了超过18.1万个详细的属性和状态注释(例如,锈蚀、遮挡),以支持运行健康评估。我们在五个核心任务上建立了全面的基线:2D检测、2D跨视图匹配、3D地理识别、3D点云分割和属性识别。广泛的评估暴露了当前模型在长尾缺陷状态上的显著跨城市领域差距和固有脆弱性,使WHU-Infra3D成为推进可扩展、AI驱动的城市基础设施清单和生命周期管理的重要试验场。WHU-Infra3D数据集可在以下网址获取:https://xxx。

英文摘要

The paradigm of digital twin cities is shifting from coarse visual mapping toward more precise and actionable digitization of urban assets. However, existing datasets predominantly focus on coarse visual perception, lacking the strict multi-modal alignment and attribute and status diagnosis required for automated infrastructure maintenance. To bridge this gap, we introduce WHU-Infra3D, a large-scale, multi-modal benchmark dataset dedicated to roadside infrastructure inventory. Covering 53.8 km across three cities, WHU-Infra3D uniquely integrates panoramic imagery and LiDAR point clouds with rigorous 2D-3D instance association and cross-frame tracking. Comprising over 175k multi-view 2D bounding boxes alongside thousands of 3D infrastructure instances, the dataset provides over 181k detailed attribute and status annotations (e.g., rust, occlusion) to empower operational health assessment. We establish comprehensive baselines across five core tasks: 2D detection, 2D cross-view matching, 3D geo-identification, 3D point cloud segmentation, and attribute recognition. Extensive evaluations expose significant cross-city domain gaps and inherent vulnerabilities of current models on long-tailed defective statuses, establishing WHU-Infra3D as an essential testbed for advancing scalable, AI-driven urban infrastructure inventory and lifecycle management. The WHU-Infra3D dataset is available at https://github.com/WHU-USI3DV/WHU-Infra3D.

2606.10066 2026-06-10 cs.CV cs.AI cs.LG 交叉投稿

A Controlled Audit of Pretraining Contamination in Public Medical Vision-Language Benchmarks

公共医学视觉语言基准中预训练污染的受控审计

Bruce Changlong Xu, Lan Wu, Alexander Ryu

AI总结 审计发现公共医学VLM基准存在图像源重叠和文本规范顺序交换性信号,但确认的像素级重复罕见,且现有成员推理检测器在小规模医学VLM队列中不可靠。

Comments 30 pages, 7 figures, 9 tables. Preprint

详情
AI中文摘要

医学视觉语言模型(VLM)在公共基准上进行评估,这些基准的图像和问答对多年来一直可自由下载,但报告准确度假设这些示例在预训练中不存在。我们对SLAKE-En、PathVQA、VQA-RAD以及一个辅助的公共OmniMedVQA镜像上的开放VLM进行了审计,使用了四种检测器系列:图像侧近邻重叠(针对PMC-OA-beta)、规范顺序可交换性、队列相对Min-K%++尾部富集以及跨模型Top-K重叠。我们发现SLAKE-En上存在可测量的图像侧源重叠:SigLIP-B-16标记了19.8%的图像,SigLIP-SO400M标记了4.2%,而域外对照产生0/2000个标记。人工裁定显示,相同模态、相同投影的匹配对应不同患者,而非经过验证的像素级重复,因此我们将其解释为源或分布重叠,而非确认的每图像记忆。在文本侧,Qwen2.5-VL在SLAKE-En上显示出规范顺序可交换性信号,该信号在顺序消融和外部非医学基线中仍然存在。在OmniMedVQA镜像上,五个医学和通用VLM触发了可交换性,而BLIP-2保持干净。相比之下,队列相对Min-K%++尾部富集和跨模型Top-K重叠在外部预域基线中崩溃:BLIP-2重现了明显的正信号,尽管缺乏合理的医学VQA暴露。我们得出结论,这些队列相对检测器作为小规模医学VLM队列上的独立成员推理信号是不可靠的。

英文摘要

Medical vision-language models (VLMs) are evaluated on public benchmarks whose images and question-answer pairs have been freely downloadable for years, yet reported accuracy assumes these examples were absent from pretraining. We audit open VLMs on SLAKE-En, PathVQA, VQA-RAD, and an auxiliary public OmniMedVQA mirror using four detector families: image-side near-neighbour overlap against PMC-OA-beta, canonical-order exchangeability, cohort-relative Min-K%++ tail enrichment, and cross-model top-K overlap. We find measurable image-side source overlap on SLAKE-En: 19.8% of images are flagged under SigLIP-B-16 and 4.2% under SigLIP-SO400M, while out-of-domain controls produce 0/2000 flags. Manual adjudication shows same-modality, same-projection matches to different patients rather than verified pixel-level duplicates, so we interpret this as source or distributional overlap rather than confirmed per-image memorization. On the text side, Qwen2.5-VL on SLAKE-En shows a canonical-order exchangeability signal that survives ordering ablation and external non-medical baselines. On the OmniMedVQA mirror, exchangeability fires for five medical and general VLMs while BLIP-2 remains clean. In contrast, cohort-relative Min-K%++ tail enrichment and cross-model top-K overlap collapse under an external pre-domain baseline: BLIP-2 reproduces the apparent positive signals despite lacking plausible medical-VQA exposure. We conclude that these cohort-relative detectors are unreliable as standalone membership-inference signals on small medical-VLM cohorts.

2606.10087 2026-06-10 cs.CL cs.LG 交叉投稿

CodeAlchemy: Synthetic Code Rewriting at Scale

CodeAlchemy:大规模合成代码重写

Ankit Gupta, Aditya Prasad, Rameswar Panda

AI总结 提出CodeAlchemy框架,通过5种策略生成超过500B token的合成代码数据,引入DevEval和TraceEval基准,3B模型在多项任务上超越10倍大小的前沿模型。

详情
AI中文摘要

在原始代码上预训练可以学习语法,但为多样化的真实世界任务格式提供的信号稀疏。虽然合成数据已被证明对语言模型具有变革性,但代码领域除有限的质量改进外仍基本未被探索。我们提出CodeAlchemy,一个合成数据生成框架,通过5种策略将公开来源的代码转换为语义丰富的训练数据:CodeEnhance(质量感知重写)、CodeQA(基于模板的问题)、CodeDev(开发者任务)、CodeDialogue(多轮对话)和CodeTrace(执行轨迹)。我们处理了15种语言的3个语料库,生成了超过500B token的合成数据以及350B推理token,数量级远超先前工作。CodeTrace对14种语言和5K个库的1.3M+文件进行插桩和执行,捕获控制流、状态跟踪和库知识。我们引入了DevEval(开发者任务)和TraceEval(执行预测)基准;前沿模型如Claude Sonnet 4.5在TraceEval上仅达到5.6%的精确匹配,揭示了语义理解的关键差距。我们的3B模型在HumanEval上达到83.5%,在MBPP上达到63.2%,在DevEval上达到8.09%的胜率,在TraceEval上达到15.36 ROUGE-2,超越了包括27B Gemma-3和32B Granite-4.0在内的10倍大小的前沿模型。

英文摘要

Pre-training on raw code teaches syntax but provides sparse signal for diverse real-world task formats. While synthetic data has proven transformative for language models, code remains largely unexplored beyond limited quality improvements. We present CodeAlchemy, a synthetic data generation framework that transforms publicly sourced code into semantically-rich training data through 5 strategies: CodeEnhance (quality-aware rewriting), CodeQA (template-based problems), CodeDev (developer tasks), CodeDialogue (multi-turn conversations), and CodeTrace (execution traces). We process 3 corpora across 15 languages to generate 500B+ tokens of synthetic data plus 350B reasoning tokens, orders of magnitude more than prior efforts. CodeTrace instruments and executes 1.3M+ files across 14 languages and 5K libraries, capturing control flow, state tracking, and library knowledge. We introduce DevEval (developer tasks) and TraceEval (execution prediction) benchmarks; frontier models like Claude Sonnet 4.5 achieve only 5.6% exact match on TraceEval, revealing critical gaps in semantic understanding. Our 3B models achieve 83.5% on HumanEval, 63.2% on MBPP, 8.09% win rate on DevEval, and 15.36 ROUGE-2 on TraceEval, outperforming frontier models 10x the size including 27B Gemma-3 and 32B Granite-4.0.

2606.10125 2026-06-10 stat.ML cs.DB cs.LG 交叉投稿

Robust Active Learning for Few-Shot Example Selection in Text-to-SQL

鲁棒主动学习用于文本到SQL中的少样本示例选择

Arash Pourhabib

发表机构 * NVIDIA

AI总结 针对文本到SQL中少样本示例选择,提出一种鲁棒主动学习方法,通过分层贪婪算法最大化异方差互信息目标,在嵌入流形上实现常数因子近似保证,显著减少标注成本。

Comments 31 pages, 4 figures, 5 tables

详情
AI中文摘要

少样本示例检索是将大型语言模型(LLM)应用于特定领域文本到SQL系统的主要范式。然而,标注示例库的质量直接决定系统准确性,且专家标注成本高昂。我们将这些示例的主动选择形式化为一个在语义查询嵌入的内在低维流形上的约束实验设计问题。与标准主动学习框架不同,我们的设置引入了三个关键挑战:依赖于查询的可变标注可靠性(异方差性)、跨语义主题的空间多样性严格要求(划分拟阵约束),以及嵌入空间真实协方差结构未知的固有现实(模型误设)。为了解决这些问题,我们提出了一种分层贪婪算法,该算法最大化异方差互信息目标。我们证明该目标在内在流形上保持子模性和近似单调性,从而得到理论上的常数因子近似保证。我们建立了一个谱界,表明当假设的替代核与真实数据生成过程存在偏差时,该近似保证会优雅地退化,而非灾难性地崩溃。实验结果表明,所提出的策略显著减少了标注工作量,同时保持了较高的文本到SQL检索准确性。

英文摘要

Few-shot example retrieval is the dominant paradigm for grounding large language models (LLMs) in domain-specific text-to-SQL systems. However, the quality of the annotated example bank directly governs system accuracy, and expert annotation is prohibitively expensive. We formalize the active selection of these examples as a constrained experimental design problem over the intrinsic, low-dimensional manifold of semantic query embeddings. Unlike standard active learning frameworks, our setting introduces three critical challenges: varying, query-dependent annotation reliability (heteroscedasticity), strict requirements for spatial diversity across semantic topics (partition matroid constraints), and the inherent reality that the true covariance structure of the embedding space is unknown (misspecification). To address these, we propose a stratified greedy algorithm that maximizes a heteroscedastic mutual information objective. We prove that this objective remains submodular and approximately monotonic on the intrinsic manifold, yielding a theoretical constant-factor approximation guarantee. We establish a spectral bound demonstrating that this approximation guarantee degrades gracefully, rather than catastrophically, when the assumed surrogate kernel diverges from the true underlying data-generating process. Empirical results demonstrate that the proposed strategy significantly reduces labeling effort while maintaining high text-to-SQL retrieval accuracy.

2606.10187 2026-06-10 stat.ML cs.LG 交叉投稿

Decision-Calibrated Conformal Uncertainty for Pacing Decisions in Streaming Advertising

面向流式广告中节奏控制的决策校准共形不确定性

Prashant Shekhar, Caroline Howard

发表机构 * Department of Mathematics, Embry-Riddle Aeronautical University(数学系,埃姆伯里-瑞德航空大学)

AI总结 提出一种决策校准共形框架,通过衡量预测误差对实际部署策略的最大影响来校准不确定性,理论证明该分数是保护所有可部署节奏控制策略的最小有效不确定性度量,并在公开数据集上显著降低不确定性半径。

详情
AI中文摘要

我们开发了一个决策校准的共形框架,用于流式广告中的节奏控制决策。节奏控制依赖于不确定的未来库存、需求压力、增量响应和会员体验负载。该框架不是校准通用的预测残差,而是通过预测误差对实际可能部署的策略的最大影响来衡量预测误差。主要定理表明,所提出的分数是统一保护所有可部署节奏控制策略的最小有效不确定性度量。几何上,它是有符号策略敏感性集的支持函数。分裂共形校准为该分数提供了有限样本覆盖。一个高维分离定理表明,传统的残差校准可能因支付干扰库存维度而任意保守,而一个鲁棒的节奏控制结果结合了库存、响应和体验不确定性。在基于Criteo Uplift和KuaiRand数据集构建的公开数据校准节奏控制回放中,传统共形节奏控制仍然未解决,在Criteo上残差半径高达7236.7,在KuaiRand上为4629.4。采用所提出的决策校准方法,不确定性半径分别降至18.4和278.6,并为价值、交付、预算和会员负载设置了单独的边际。在Criteo上,所提出的方法证明了比点预测基线更不激进的节奏控制策略,并将保留的任何违规率从16.7%降至3.3%,且预算和会员负载违规为零。在KuaiRand上,选择仍未解决。简而言之,本文确立了预测、响应估计和会员体验模型应根据它们是否缩小节奏控制决策使用的不确定性来判断,因为这会导致自信且不过度保守的决策。

英文摘要

We develop a decision-calibrated conformal framework for pacing decisions in streaming advertising. Pacing depends on uncertain future inventory, demand pressure, incremental response, and member-experience load. Instead of calibrating a generic forecast residual, the framework measures forecast error by its largest impact on the policies that could actually be deployed. The main theorem shows that the proposed score is the smallest valid uncertainty measure that uniformly protects all deployable pacing policies. Geometrically, it is the support function of the signed policy sensitivity set. Split conformal calibration gives finite-sample coverage for this score. A high-dimensional separation theorem shows that traditional residual calibration can be arbitrarily more conservative by paying for nuisance inventory dimensions, and a robust pacing result combines inventory, response, and experience uncertainty. On public-data-calibrated pacing replays built from Criteo Uplift and KuaiRand datasets, traditional conformal pacing remains unresolved with high residual radii of 7236.7 on Criteo and 4629.4 on KuaiRand. With the proposed decision calibration approach, the uncertainty radii are reduced to 18.4 and 278.6 respectively, with separate margins for value, delivery, budget, and member load. On Criteo, the proposed method certifies a less aggressive pacing policy than the point-forecast baseline, and reduces held-out any-violation rate from 16.7% to 3.3%, with zero budget and member-load violations. On KuaiRand, the choice remains unresolved. In a nutshell, the paper establishes that forecasts, response estimates, and member-experience models should be judged by whether they shrink the uncertainty that the pacing decision uses, as this leads to confident decisions that are not overly conservative.

2606.10229 2026-06-10 cs.RO cs.LG 交叉投稿

What Demonstration Curation Metrics Do to Your Policy

演示筛选指标对策略的影响

Aarav Bedi

AI总结 研究演示筛选指标在检测缺陷演示后,是否提升基于行为克隆的策略性能。发现指标检测缺陷的能力与策略性能严重脱钩,并揭示演示时长作为混淆变量的影响。

Comments 6 pages, 1 figure, 2 tables

详情
AI中文摘要

我们研究了检测缺陷训练演示的筛选指标是否也能改善基于筛选数据训练的行为克隆策略。在一个接触密集的LIBERO抓取放置基准任务中,通过引入受控结构缺陷(搬运阶段早期释放夹爪),我们发现这两个量是严重解耦的。具有最高缺陷检测AUROC(0.804)的指标产生了最差的筛选策略(任务成功率13.3%),而AUROC显著较低(0.638)的指标产生的策略几乎与在真实干净数据上训练的Oracle策略相匹配(90.0% vs. 93.3%)。我们进一步表明,我们评估的七个指标中有五个利用演示时长作为缺陷标签的琐碎代理,这种混淆因素将报告的AUROC膨胀到接近完美的值,并且在控制演示时长后消失。在所有条件下,受污染的基线仅在3.3%的测试中成功,而两种最佳的筛选方法将差距缩小到Oracle上限93.3%的3个百分点以内。我们的结果认为,筛选方法应根据其产生的策略来评估,而不是根据其标记的缺陷,并且任何筛选基准在报告检测准确性之前必须控制演示时长。我们发布了测试平台、所有指标实现和评估流程。

英文摘要

We study whether demonstration-curation metrics that detect defective training episodes also improve the downstream behavior-cloning policy that trains on the curated data. On a contact-rich LIBERO pick-and-place benchmark with a controlled structural defect (early gripper release during the carry phase), we find that the two quantities are sharply decoupled. The metric with the highest defect-detection AUROC (0.804) produces the worst curated policy (13.3% task success), while a metric with a substantially lower AUROC (0.638) produces a policy that nearly matches the oracle trained on ground-truth clean data (90.0% vs. 93.3%). We further show that five of the seven metrics we evaluate exploit episode length as a trivial proxy for the defect label, a confound that inflates reported AUROCs to near-perfect values and disappears once episode length is controlled. Across all conditions, the contaminated baseline succeeds on only 3.3% of rollouts, and the two best curation methods close this to within 3 percentage points of the 93.3% oracle ceiling. Our results argue that curation methods should be evaluated by the policy they produce, not the defects they flag, and that any curation benchmark must control for episode length before reporting detection accuracy. We release the testbed, all metric implementations, and the evaluation pipeline.

2606.10246 2026-06-10 cs.SD cs.AI cs.LG 交叉投稿

Linguistically Augmented Audio Speech Data (LinguAS)

语言增强音频语音数据 (LinguAS)

Ashley R. Keaton, Zahra Khanjani, Christine Mallinson, Vandana P. Janeja

发表机构 * University of Maryland, Baltimore County(马里兰大学巴尔的摩分校)

AI总结 提出LinguAS数据集,通过专家定义的语言特征(EDLFs)增强音频数据,显著提升深度伪造语音检测模型性能。

详情
AI中文摘要

恶意创建的伪造语音,包括深度伪造和欺骗音频,正以惊人速度扩散,检测模型竞相保持领先。然而,大多数检测模型仅基于帧级音频特征进行推理,未利用更大时间尺度上的有价值语言线索。为弥补这一空白,我们提出语言增强音频语音数据(LinguAS),这是一个包含真实和深度伪造音频样本的数据集,标注了五种策略性选择的、专家定义的语言特征(EDLFs),这些特征在英语口语中频繁出现且是自然人类语音的特征。LinguAS包含超过800个音频样本,每个样本都标注了EDLFs。数据集包含四种欺骗音频攻击类型的平衡数量以及相应数量的真实语音样本。我们还包含说话者性别和每个欺骗音频样本的生成器/来源元数据,为模型训练提供更细粒度信息。我们发现,使用EDLFs增强数据训练的模型性能显著超过ASVspoof 2021深度学习基线和HuBert、XLSR等SSL模型。LinguAS增强的语言、性别和生成器元数据为音频深度伪造研究者提供了一个强调真实人类语言特征的数据集,以改进伪造语音的模型推理。数据和代码已公开。

英文摘要

Maliciously-created fake speech, including deepfaked and spoofed audio, is proliferating at an alarming rate, and detection models are racing to stay ahead of the curve. Yet, most detection models are trained to make inference on frame-level audio features alone without leveraging valuable linguistic cues at larger timescales. To address this gap, we present Linguistically Augmented Audio Speech Data (LinguAS), a dataset of genuine and deepfaked audio samples annotated with five strategically-chosen, Expert-Defined Linguistic Features (EDLFs) that occur frequently in spoken English and are characteristic of natural human speech. LinguAS contains over 800 audio samples, each of which are annotated with EDLFs. The dataset has a balanced number of four spoofed audio attack types and a proportionate number of genuine speech samples. We also include metadata on speaker gender and the generator/source for each spoofed audio sample, offering more granularity for model training. We found that models trained on data augmented with EDLFs had improved model performance significantly beyond the ASVspoof 2021 deep learning baselines and SSL models like HuBert and XLSR. LinguAS's augmented linguistic, gender, and generator metadata provide audio deepfake researchers with a dataset that emphasizes real human language traits to improve model inference of faked speech. Data and code are publicly available.

2606.10255 2026-06-10 eess.IV cs.CV cs.DL cs.LG physics.bio-ph 交叉投稿

POPSICLE: Benchmark Datasets for Segmentation and Localization in CryoET

POPSICLE: 用于冷冻电镜断层扫描中分割和定位的基准数据集

Jonathan Schwartz, Utz Heinrich Ermel, C. Braxton Owens, Zhuowen Zhao, Ariana Peck, Gus L. W. Hart, Grant J. Jensen, Bridget Carragher, Dari Kimanius

发表机构 * Biohub Brigham Young University

AI总结 提出POPSICLE基准套件,基于CryoET数据门户构建,涵盖真核和原核系统、纯化与原位样本,支持体素分割和稀疏定位任务,旨在解决冷冻电镜断层扫描中缺乏标准化基准的问题。

详情
AI中文摘要

冷冻电镜断层扫描(cryoET)通过直接可视化完整细胞内的分子结构,将分子架构与细胞组织在天然环境中联系起来,已成为结构和细胞生物学中的强大工具。然而,实现cryoET的全部潜力日益依赖于计算分析,特别是机器学习(ML)的进步,以解释其复杂且信息丰富的数据。尽管进展迅速,cryoET的ML开发仍受限于缺乏标准化、良好注释的基准。现有评估通常规模小、任务特定且孤立构建,限制了方法间的稳健比较。在此,我们提出POPSICLE,一个基于CryoET数据门户(一个开放、ML就绪的断层数据、元数据和注释库)构建的cryoET分割和大分子定位基准套件。POPSICLE涵盖真核和原核系统、纯化和完全原位样本,以及密集体素分割和稀疏定位任务。基于动态数据资源,它可随着新数据集和注释的出现而扩展。基线实验揭示了模型排名在不同任务间的显著变化,强调了需要针对cryoET独特特征定制的基准,而非从相邻生物医学成像领域借鉴的评估实践。因此,POPSICLE为cryoET中可重复的ML评估提供了开放且可扩展的基础。

英文摘要

Cryo-electron tomography (cryoET) has emerged as a powerful tool in structural and cellular biology by enabling direct visualization of macromolecular structures within intact cells, thereby linking molecular architecture to cellular organization in a native context. Realizing the full potential of cryoET, however, increasingly depends on advances in computational analysis, particularly machine learning (ML), to interpret its complex and information-rich data. Despite rapid progress, ML development for cryoET remains bottlenecked by the lack of standardized, well-annotated benchmarks. Existing evaluations are typically small, task-specific, and are assembled in isolation, limiting robust comparisons across methods. Here, we present POPSICLE, a benchmark suite for cryoET segmentation and macromolecular localization built from the CryoET Data Portal - an open, ML-ready repository of tomographic data, metadata, and annotations. POPSICLE spans eukaryotic and prokaryotic systems, both purified and fully in situ samples, and dense voxel-wise segmentation as well as sparse localization tasks. Built on a living data resource, it can expand as new datasets and annotations become available. Baseline experiments reveal substantial variation in model rankings across tasks, underscoring the need for benchmarks tailored to the unique characteristics of cryoET rather than evaluation practices adapted from adjacent biomedical imaging domains. POPSICLE thus provides an open and extensible foundation for reproducible ML evaluation in cryoET.

2606.10295 2026-06-10 stat.ML cs.LG math.ST stat.TH 交叉投稿

$k$-Nearest Neighbors in Gromov--Wasserstein Space

Gromov--Wasserstein空间中的$k$-最近邻

Kaitlyn Hohmeier, Nicolas Fraiman, Caroline Moosmueller

发表机构 * University of North Carolina at Chapel Hill, Department of Mathematics(北卡罗来纳大学教堂山分校数学系) University of North Carolina at Chapel Hill, Department of Statistics and Operations Research(北卡罗来纳大学教堂山分校统计与运筹学系)

AI总结 本文在Gromov-Wasserstein距离框架下实现k-最近邻分类,证明了度量测度空间和图上分类器的普适一致性,并通过实验验证了其有效性。

详情
AI中文摘要

Gromov--Wasserstein (GW) 距离为比较度量测度空间提供了一个框架,无论其底层结构或几何形状如何。对于基于网络的数据,它能够直接比较具有不同节点数量的图,无需嵌入或其他抽象。此外,通过GW的变体——融合Gromov--Wasserstein (fGW),还可以在图形结构之外结合节点特征。在这项工作中,我们使用GW和fGW距离实现了$k$-最近邻 ($k$-NN) 分类。我们证明了在具有有限支撑和均匀概率测度的度量测度空间等价类空间上,GW-$k$-NN分类器的普适一致性。通过将图视为具有成对距离度量和节点上均匀概率测度的有限支撑度量测度空间,我们获得了图空间上GW-$k$-NN的普适一致性。类似地,对于fGW-$k$-NN,我们证明了在由具有有限支撑和均匀概率测度的度量测度空间以及到欧几里得空间的特征映射组成的结构化对象的弱同构类空间上的普适一致性,从而建立了节点属性图空间上的普适一致性。我们的数值实验表明,GW-$k$-NN和fGW-$k$-NN在多个图数据集上始终表现良好,这表明诸如$k$-NN之类的度量分类器在GW框架中效果良好。

英文摘要

The Gromov--Wasserstein (GW) distance provides a framework for comparing metric measure spaces, regardless of their underlying structure or geometry. For network-based data, it enables direct comparisons of graphs with different numbers of nodes, without requiring an embedding or other abstraction. Furthermore, through a variant of GW known as fused Gromov--Wasserstein (fGW), it is also possible to incorporate node features in addition to graph structure. In this work, we implement $k$-nearest neighbors ($k$-NN) classification using the GW and fGW distances. We prove the universal consistency of the GW-$k$-NN classifier on the space of equivalence classes of metric measure spaces with finite support and uniform probability measure. By viewing graphs as finitely supported metric measure spaces equipped with the pairwise distance metric and a uniform probability measure on the nodes, we obtain universal consistency of GW-$k$-NN for the space of graphs. Likewise for fGW-$k$-NN, we prove universal consistency on the space of weak isomorphism classes of structured objects consisting of metric measure spaces with finite support and uniform probability measure and feature maps into Euclidean space, thus establishing universal consistency on the space of node-attributed graphs. Our numerical experiments show that GW-$k$-NN and fGW-$k$-NN consistently perform well across multiple graph datasets, suggesting that metric classifiers such as $k$-NN work well in the GW framework.

2606.10440 2026-06-10 cs.DC cs.LG cs.NI 交叉投稿

ASTRA-sim 3.0: Next-Level Distributed Machine Learning Simulations via High-Fidelity GPU and Infrastructure Modeling

ASTRA-sim 3.0:通过高保真GPU和基础设施建模实现下一代分布式机器学习模拟

William Won, Jinsun Yoo, Tuan Ta, Moumita Dey, Andy Balogh, Pradosh Datta, Furkan Eris, Conor Green, Winston Liu, Changhai Man, Kingshuk Mandal, Amos Rai, Vinay Ramakrishnaiah, Ruchi Shah, David Sidler, Harsh Sikhwal, Hanjiang Wu, Tushar Krishna, Bradford M. Beckmann

发表机构 * AMD Research and Advanced Development(AMD研究与高级开发) Georgia Institute of Technology(佐治亚理工学院) Keysight Purdue University(普渡大学)

AI总结 针对分布式机器学习中延迟敏感通信建模的不足,提出ASTRA-sim 3.0,通过细粒度缓存行级负载存储模拟和标准化基础设施表示InfraGraph,实现高保真模拟,支持优化集合算法、网络需求和GPU架构的设计空间探索。

Comments 10 pages, 15 figures, one table

详情
AI中文摘要

分布式机器学习是当今大规模人工智能应用的关键范式。随着模型推理成为重要用例,对延迟敏感的集合通信进行忠实建模从未如此重要。因此,如今必须高保真地捕获设备架构并建模控制和数据路径。拥有分布式机器学习基础设施的通用、详细表示也至关重要。我们重新审视了有前途的开源社区驱动模拟器:ASTRA-sim。在这项工作中,我们识别了当前ASTRA-sim模拟器的局限性,并为其增加了新功能。为此,我们通过标准化的基础设施表示实现了细粒度、高保真的模拟,开辟了新的设计空间探索机会。我们提出了缓存行大小的负载存储粒度的模拟,并带有详细的图形处理单元(GPU)执行模型,以平衡模拟的可扩展性和保真度。我们还引入了InfraGraph,一种标准化表示,用于详细捕获分布式机器学习网络基础设施。使用更新的ASTRA-sim 3.0模拟器,我们展示了设计优化集合算法、网络需求和GPU架构的有趣设计空间探索。

英文摘要

Distributed machine learning (ML) is a key paradigm for today's large-scale artificial intelligence applications. As model inference arises as an important use case, faithful modeling of latency-sensitive collective communication has never been more important. Capturing the device architecture and modeling control and data paths at high fidelity is therefore a necessity today. Having a common, detailed representation for distributed ML infrastructure is also crucial. We revisit the promising open-source, community-driven simulator: ASTRA-sim. In this work, we identify limitations of the current ASTRA-sim simulator and augment it with new features. To this end, we enable fine-grained, high-fidelity simulation with a standardized infrastructure representation, opening new design space exploration opportunities. We propose the simulation at cache-line-sized load-store granularity, with a detailed graphics processing unit (GPU) execution model, to balance simulation scalability and fidelity. We also introduce InfraGraph, a standardized representation to capture distributed ML network infrastructure in detail. Using the updated ASTRA-sim 3.0 simulator, we showcase interesting design space explorations for designing optimized collective algorithms, network requirements, and GPU architectures.

2606.10673 2026-06-10 stat.OT cs.LG 交叉投稿

ClusBench: The Clustering Benchmark Data Resource You've All Been Waiting For (?)

ClusBench:你一直期待的聚类基准测试数据资源(?)

David P. Hofmeyr

发表机构 * School of Mathematical Sciences, Lancaster University(兰卡斯特大学数学科学学院)

AI总结 本文通过拟合灵活的非参数分布,从200多个公开数据集生成近3000个合成数据集,用于大规模聚类方法评估,保留真实数据细微差别。

详情
AI中文摘要

尽管存在一些非常常见的测试平台用于评估聚类方法的性能,但大规模基准测试通常局限于相对简单的模拟设置。在这里,我们描述了近3000个合成数据集的生成和整理,这些数据集源自200多个公开可用的数据集;其中大多数来自实际应用。通过为每个基础数据集拟合灵活的非参数分布,我们能够保留真实数据中许多难以在标准模拟中重现的细微差别,同时生成的数据集的大小有时远大于它们所源自的数据集。合成数据集以及附带的R包可从该https URL下载。

英文摘要

Although some very common test beds exist for assessing the performance of clustering methods, large scale benchmarking is typically limited to relatively simplistic simulation set-ups. Here we describe the production and curation of close to 3000 synthetic data sets, derived from more than 200 publicly available data sets; the majority of which arose from real-world applications. By fitting a flexible non-parametric distribution to each base data set we are able to retain much of the nuance in real-world data which is difficult to reproduce in standard simulations, while also producing data sets whose sizes are sometimes substantially greater than the data sets from which they are derived. The synthetic data sets, plus an accompanying R package, are available for download from https://github.com/DavidHofmeyr/ClusBench.

2606.10911 2026-06-10 cs.SD cs.AI cs.CR cs.LG 交叉投稿

Ethical and Technical Limits of Deepfake Speech Datasets

深度伪造语音数据集的伦理与技术限制

Vojtěch Staněk, Eva Trnovská, Kamil Malinka, Anton Firc

发表机构 * Brno University of Technology(布尔诺理工大学)

AI总结 通过审计39个深度伪造语音数据集,发现公平性评估因缺乏人口统计元数据而不可行,且数据集间真实语音源语料库重叠严重,影响跨数据集评估的可靠性。

Comments Accepted to Interspeech 2026

详情
AI中文摘要

关于深度伪造语音检测器的鲁棒性和公平性的声明,其可信度仅与用于训练和评估这些系统的数据集相当。我们对深度伪造语音领域进行了数据集级别的审计。我们整理并分析了39个深度伪造语音数据集,检查了关键属性,包括可访问性、文档、人口统计和语言覆盖范围、数据集规模以及底层的真实语音来源。我们的审计揭示了两个重要的发现。首先,公平性评估在很大程度上不可行,因为大多数数据集缺乏人口统计元数据,只有少数包含性别或语言标签。这阻止了任何有意义的子组分析,并使得其他人口统计属性未被处理。其次,我们识别出不同数据集之间底层的真实语音源语料库存在大量重叠,这可能破坏跨数据集评估,并导致对泛化能力的夸大声称。

英文摘要

Claims about the robustness and fairness of deepfake speech detectors are only as credible as the datasets used to train and evaluate those systems. We present a dataset-level audit of the deepfake speech landscape. We compile and analyze 39 deepfake speech datasets, examining key attributes including accessibility, documentation, demographic and language coverage, dataset scale, and the underlying bona fide speech sources. Our audit reveals two important takeaways. Firstly, fairness assessment is largely infeasible because most datasets lack demographic metadata, and only a few contain gender or language labels. This prevents any meaningful subgroup analysis and leaves other demographic attributes unaddressed. Secondly, we identify substantial overlap in underlying bona fide source corpora across datasets, which can undermine cross-dataset evaluation and lead to overstated generalization claims.

2606.10940 2026-06-10 cs.CV cs.AI cs.LG 交叉投稿

Democratising Camera Trap AI: An Open-Source Model for Detecting UK Mammals

民主化相机陷阱AI:用于检测英国哺乳动物的开源模型

Paul Fergus, Philip Stephens, Russell A. Hill, Lee Oliver, Katie Appleby, Sarah Beatham, Naomi Davies Walsh, Stuart Nixon, Naomi Matthews, Chris Sutherland, Kelly Hitchcock

发表机构 * Liverpool John Moores University(利物浦约翰穆里斯大学) Durham University(杜伦大学) MammalWeb(哺乳动物网) Game & Wildlife Conservation Trust(游戏与野生动物保护信托) National Trust(国家信托) Animal and Plant Health Agency(动物和植物卫生局) Chester Zoo(切斯特动物园) University of St Andrews(圣安德鲁大学) Nottingham Trent University(诺丁汉特伦特大学)

AI总结 发布一个针对31类(28种英国常见哺乳动物和鸟类)的开源目标检测模型,基于YOLO26x在48,165个标注实例上训练,mAP@0.5达0.984,旨在降低生态学家使用AI的门槛。

Comments 15 Pages, 4 Figures

详情
AI中文摘要

相机陷阱已成为生物多样性监测的基石,但将大量图像转化为可用生态数据的人工智能通常被锁定在商业平台之后,或针对与不列颠群岛不相符的动物群进行训练。为了消除障碍并提高采用率,我们发布了一个针对31类(28种英国常见哺乳动物和鸟类,以及人类、校准杆和车辆等实用类)的开源目标检测模型,该模型基于从多个地点经过十年运营部署(通过Conservation AI及其后续项目Trap Tracker)收集的48,165个标注实例的精选数据集。该模型是YOLO26x检测器,在80/10/10的类别分层划分上进行训练和测试,在保留的验证集上,IoU为0.5时平均精度为0.984(IoU 0.5-0.95时为0.956),精确率为0.988,召回率为0.965。在未见过的保留测试集上,31个类别的平均物种置信度范围为0.96至0.99,假阴性率为0.17%,主要集中在困难的夜间、远处或遮挡图像中。这些指标来自与训练相同站点和相机池的数据,因此在新站点的性能留待未来工作。我们以非商业许可发布ONNX格式的训练权重,支持本地桌面和实时相机,明确面向没有机器学习经验的生态学家。此发布是对过去十年中开发的多个付费模型的有意制衡。

英文摘要

Camera traps have become a cornerstone of biodiversity monitoring, but the artificial intelligence that turns vast quantities of images into usable ecological data is often locked behind commercial platforms or trained on fauna that does not match that of the British Isles. In an attempt to remove barriers and increase uptake, we release an open-source object detection model for 31 classes, 28 common UK mammal and bird species, plus utility classes for humans, calibration poles, and vehicles, drawn from a curated dataset of 48,165 labelled instances assembled from multiple sites over a decade of operational deployment through Conservation AI and its successor, Trap Tracker. The model, a YOLO26x detector trained and tested on an 80/10/10 class-stratified split, achieves a mean Average Precision of 0.984 at Intersection over Union (IoU) of 0.5 (0.956 at IoU 0.5-0.95) on the held-out validation set, with precision 0.988 and recall 0.965. On an unseen held-out test split, mean per-species confidence ranged from 0.96 to 0.99 across the 31 classes, with a 0.17% false-negative rate concentrated in difficult night-time, distant, or occluded images. These metrics are from data from the same pool of sites and cameras as training, so performance at entirely new sites is left to future work. We release the trained weights in ONNX format under a non-commercial licence, with local desktop and real-time camera support, aimed explicitly at ecologists with no machine-learning experience. This release is a deliberate counterweight to the multiple paid for models that have developed over the last decade.

2606.11063 2026-06-10 cs.AI cs.LG 交叉投稿

CIAware-Bench: Benchmarking Control Intervention Awareness Across Frontier LLMs

CIAware-Bench: 评估前沿大语言模型的控制干预感知能力

Joachim Schaeffer, Thomas Jiralerspong, Alexander Panfilov, Guillaume Lajoie, Jonas Geiping, Yoshua Bengio, Roland S. Zimmermann

发表机构 * MATS Mila – Quebec AI Institute(Mila – 魁北克人工智能研究所) Université de Montréal(蒙特利尔大学) Astra Fellowship ELLIS Institute Tübingen, MPI for Intelligent Systems & Tübingen AI Center LawZero Google DeepMind

AI总结 提出CIAware-Bench基准,通过四个任务域测试模型能否区分自身轨迹与被控制干预修改的轨迹,发现前沿模型在默认设置下感知能力低至中等,且因任务和模型对而异。

详情
AI中文摘要

AI控制协议通过监控不可信模型的行为并修改潜在不安全步骤来监督它们,通常使用可信模型。这在一定程度上篡改了不可信模型的轨迹。如果可信模型检测到这种干预,它可能推断监控器的属性并适应以逃避控制。我们引入\textbf{CIAware-Bench},一个用于衡量前沿模型\textbf{控制干预(CI)感知}能力的基准。CIAware-Bench测试模型能否区分自身轨迹与被控制干预修改的轨迹。该基准包含四个任务域(论文写作、BigCodeBench、Bash Arena和SHADE-Arena),同时改变轨迹水印、侧任务存在和控制协议。评估十一个前沿模型,我们发现默认设置下CI感知能力低至中等(最高0.87;随机机会平衡二元分类准确率为0.5),且在不同任务域和模型对之间存在显著差异。跨模型家族的检测通常更容易,表明模型利用了提供商特定的风格或后训练差异。总体而言,CI感知不是一个固定的模型级属性,应针对每个新模型发布和部署场景进行测量。我们发布CIAware-Bench以跟踪CI感知并为干预更难检测的控制协议提供信息。

英文摘要

AI control protocols oversee untrusted models by monitoring their actions and modifying potentially unsafe steps, often using a trusted model. This partially tampers with the untrusted model's trajectory. If the trusted model detects such an intervention, it may infer properties of the monitor and adapt to evade control. We introduce \textbf{CIAware-Bench}, a benchmark for measuring \textbf{c}ontrol \textbf{i}ntervention (CI) awareness across frontier models. CIAware-Bench tests whether models can distinguish their own trajectories from those modified by a control intervention. The benchmark is comprised of a suite of four task domains (essay writing, BigCodeBench, Bash Arena, and SHADE-Arena), while varying trajectory watermarking, side-task presence, and the control protocol. Evaluating eleven frontier models, we find low to moderate CI awareness under default settings (up to 0.87; random chance balanced binary classification accuracy is 0.5) with substantial variation across task domains and model pairs. Detection is generally easier across model families, suggesting that models exploit provider-specific differences in style or post-training. Overall, CI awareness is not a fixed model-level property, and should be measured for each new model release and deployment scenario. We release CIAware-Bench to track CI awareness and inform control protocols whose interventions are harder to detect.

2606.11098 2026-06-10 cs.CR cs.LG 交叉投稿

Do Transformers Actually Help Intrusion Detection? A Temporal Sequence Evaluation on CIC-IDS2017

Transformer 真的有助于入侵检测吗?基于 CIC-IDS2017 的时间序列评估

Zach Moczkodan, Hany Ragab

发表机构 * Department of Electrical and Computer Engineering, Faculty of Engineering, Royal Military College of Canada (RMC)(加拿大皇家军事学院电气与计算机工程系,工程学院)

AI总结 本研究重新将 CIC-IDS2017 构建为时间序列入侵检测任务,发现填充方式而非架构决定 Transformer 性能,且随机分割和填充方式会高估模型鲁棒性。

Comments 11 pages, 9 figures, 9 tables. Preprint. Code: https://github.com/zachmocz/temporal-ids-bench

详情
AI中文摘要

近年来,用于网络入侵检测的深度学习方法越来越多地采用时间架构,如循环网络和 Transformer,通常在 CIC-IDS2017 上报告接近完美的性能。然而,许多现有研究既没有为其时间模块提供真实的序列输入,也没有在现实、无泄漏的条件下进行评估,使得报告的性能提升是否源于真正的序列建模能力尚不清楚。在这项工作中,我们通过从网络对话中构建有序流序列,并在随机分割、两种无泄漏分割以及填充方案消融下对九种经典和深度学习架构进行基准测试,将 CIC-IDS2017 重新表述为时间入侵检测任务。核心发现是,填充惯例而非架构决定了 Transformer 的性能:在真正的序列(非填充)窗口上,Transformer 实现了实验中所有模型的最高 macro-F1(0.89);在零填充+掩码评估下,其性能显著下降(-0.24 macro-F1),而 LSTM、GRU 和 1D-CNN 保持稳定。在无泄漏组评估下,随机森林是最稳健的模型(+0.009),而 Transformer 的误报率从 0.04% 增长到 2.7%,增加了 67 倍,这在传统协议下是不可见的。这些发现表明,评估方法——特别是填充惯例和分割协议——对报告性能的影响大于架构选择,并且广泛使用的随机分割与重复最后填充可能高估模型鲁棒性高达 0.24 macro-F1。我们主张将无泄漏分割、显式填充披露和序列感知基准测试作为未来入侵检测研究的标准实践。代码和实现细节可在此 https URL 获取。

英文摘要

Recent deep learning approaches for network intrusion detection increasingly incorporate temporal architectures such as recurrent networks and Transformers, often reporting near-perfect performance on CIC-IDS2017. However, many existing studies neither supply their temporal modules with genuine sequence inputs nor evaluate under realistic, leakage-free conditions, making it unclear whether reported gains arise from true sequence-modeling capability. In this work, we reformulate CIC-IDS2017 as a temporal intrusion-detection task by constructing ordered flow sequences from network conversations and benchmarking nine classical and deep learning architectures under a random split, two leakage-free splits, and a padding-scheme ablation. The central finding is that padding convention, not architecture, determines the Transformer's performance: on genuinely sequential (non-padded) windows the Transformer achieves the highest macro-F1 of any model in the experiment (0.89); under zero-pad+mask evaluation it drops markedly (-0.24 macro-F1), while LSTM, GRU, and 1D-CNN remain stable. Under leakage-free group evaluation the Random Forest is the most robust model (+0.009), while the Transformer's false-alarm rate grows from 0.04% to 2.7%, a 67-fold increase invisible under conventional protocols. These findings demonstrate that evaluation methodology -- specifically padding convention and split protocol -- has a larger effect on reported performance than architectural choice, and that widely used random splits with repeat-last padding can overestimate model robustness by up to 0.24 macro-F1. We advocate leakage-free splits, explicit padding disclosure, and sequence-aware benchmarking as standard practice in future IDS research. Code and implementation details are available at https://github.com/zachmocz/temporal-ids-bench.

2102.05314 2026-06-10 cs.LG math.ST stat.ML stat.TH 版本更新

Time series forecasting from partial observations via Non-negative Matrix Factorization

基于非负矩阵分解的部分观测时间序列预测

Yohann de Castro, Luca Mencarelli

发表机构 * Institut Camille Jordan, Ecole Centrale Lyon(让-卡米尔·约当研究所,中央理工大学) Institut Universitaire de France(法国大学研究院)

AI总结 提出滑动掩码方法(SMM)结合非负矩阵补全进行多非负时间序列预测,通过掩码原型矩阵分解(mAMF)和掩码归一化非负矩阵分解(mNMF)实现,理论证明恢复误差与噪声成比例,实验优于Transformer、LSTM等方法。

详情
AI中文摘要

在现代时间序列问题中,我们旨在预测可能包含缺失值和噪声的多重时间序列。本文引入滑动掩码方法(SMM),通过非负矩阵补全来预测多个非负时间序列:将观测到的噪声值和预测/缺失值收集成矩阵形式,并通过将其行表示为少量非负向量(称为原型)的凸组合来实现学习。我们提出了两种估计方法,掩码原型矩阵分解(mAMF)和掩码归一化非负矩阵分解(mNMF),它们可以与SMM方法结合。我们证明这些估计能以与噪声成比例的误差恢复真实原型。我们使用近端交替线性化方法(PALM)来计算原型和凸组合权重。我们在真实数据上将我们的估计器与最先进的方法(Transformer、LSTM、SARIMAX...)进行了多时间序列预测比较,结果表明我们的方法在大多数实验中优于它们。

英文摘要

In modern time series problems, one aims at forecasting multiple time series with possible missing and noisy values. In this paper, we introduce the Sliding Mask Method (SMM) for forecasting multiple nonnegative time series by means of nonnegative matrix completion: observed noisy values and forecast/missing values are collected into matrix form, and learning is achieved by representing its rows as a convex combination of a small number of nonnegative vectors, referred to as the archetypes. We introduce two estimates, the mask Archetypal Matrix factorization (mAMF) and the mask normalized Nonnegative Matrix Factorization (mNMF) which can be combined with the SMM method. We prove that these estimates recover the true archetypes with an error proportional to the noise. We use a proximal alternating linearized method (PALM) to compute the archetypes and the convex combination weights. We compared our estimators with state-of-the-art methods (Transformers, LSTM, SARIMAX...) in multiple time series forecasting on real data and obtain that our method outperforms them in most of the experiments.

2411.02817 2026-06-10 cs.LG cs.AI cs.CV cs.IT math.IT 版本更新

Conditional Vendi Score: Prompt-Aware Diversity Evaluation for Generative AI Models and LLMs

条件 Vendi 分数:生成式 AI 模型和 LLM 的提示感知多样性评估

Mohammad Jalali, Azim Ospanov, Amin Gohari, Farzan Farnia

发表机构 * Department of Computer Science and Engineering, The Chinese University of Hong Kong(计算机科学与工程系,香港中文大学) Department of Information Engineering, The Chinese University of Hong Kong(信息工程系,香港中文大学)

AI总结 针对文本提示引导的生成模型,提出条件 Vendi 和条件 RKE 分数,通过条件熵分离模型自身多样性,并证明收敛性及在多个任务中恢复真实多样性排序。

详情
AI中文摘要

由文本提示引导的生成模型在保真度和提示对齐方面被广泛评估,但其产生输出的能力仍未被充分探索。现有的多样性指标(如基于核矩阵的 von Neumann 和 Rényi 熵的 Vendi 和 RKE)是为无条件模型开发的,无法区分提示引起的变异和模型引起的变异。我们通过引入 \textit{Conditional-Vendi} 和 \textit{Conditional-RKE} 来解决这一差距,这些多样性度量源自正半定矩阵的条件熵。这些分数在提示引导生成中分离出模型引起的多样性,其中 Conditional-RKE 具有 $O(1/\sqrt{n})$ 的收敛速度。对于 Conditional-Vendi,我们引入了一种截断谱近似,产生可扩展且一致的估计。在文本到图像、图像字幕和 LLM 任务上的实验表明,条件分数恢复了真实多样性排序,并且还可以引导扩散模型生成更多样化的样本。代码库可从此 https URL 获取。

英文摘要

Generative models guided by text prompts are widely evaluated for fidelity and prompt alignment, yet their ability to produce outputs remains underexplored. Existing diversity metrics such as Vendi and RKE, which are based on the von Neumann and Rényi entropies of kernel matrices, were developed for unconditional models and cannot distinguish prompt-induced from model-induced variability. We address this gap by introducing \textit{Conditional-Vendi} and \textit{Conditional-RKE}, diversity measures derived from the conditional entropy of positive semidefinite matrices. These scores isolate model-induced diversity in prompt-guided generation, with Conditional-RKE enjoying an $O(1/\sqrt{n})$ convergence rate. For Conditional-Vendi, we introduce a truncated-spectrum approximation that yields scalable and consistent estimates. Experiments on text-to-image, image-captioning, and LLM tasks show that the conditional scores recover ground-truth diversity orderings and can also guide diffusion models toward more diverse samples. The codebase is available at https://github.com/mjalali/conditional-vendi.

2508.13362 2026-06-10 cs.LG 版本更新

Optimization-based Online Conformal Prediction for Multi-step Forecasting

基于优化的在线共形预测用于多步预测

Ruipu Li, Daniel Menacho, Alexander Rodríguez

发表机构 * University of Michigan(密歇根大学)

AI总结 提出O2CP框架,通过双层优化结构建模多步误差依赖,在保证边际覆盖有效性的同时生成更窄的预测区间,实验表明在自动驾驶、气候预测等领域优于现有方法。

详情
AI中文摘要

共形预测(CP)因其无分布覆盖保证而非常适合时间序列预测中的不确定性量化。然而,现有的多步方法往往难以平衡覆盖有效性与效率:它们要么独立校准每个预测步长,忽略时间相关性,要么强制执行严格的同步覆盖,导致区间过于保守。在这项工作中,我们提出了O2CP:基于优化的在线共形预测,这是一个统一的在线共形预测框架,显式建模多步误差依赖关系,同时不牺牲长期边际覆盖保证。我们首先证明,只要校准参数保持在定义的“安全”区域内,标准的在线共形更新就能保持有效性。利用这一理论见解,我们引入了一个双层架构:外层定义可接受的参数集以确保有效性,内层执行约束优化以建模联合误差分布并最小化整个预测步长的目标函数。为了使其计算可行,我们开发了一种轻量级采样策略,无需大型校准集即可估计联合分布。在包括自动驾驶、气候预测和公共卫生在内的真实世界数据集上的大量实验表明,O2CP始终优于最先进的基线方法,在实现目标覆盖率的同时,预测区间显著更窄,且长期遗憾更小。

英文摘要

Conformal prediction (CP) is well-suited for uncertainty quantification in time series forecasting due to its distribution-free coverage guarantees. However, existing multi-step methods often struggle to balance coverage validity with efficiency: they either calibrate horizons independently, ignoring temporal correlations, or enforce strict simultaneous coverage, resulting in overly conservative intervals. In this work, we propose O2CP: Optimization-based Online Conformal Prediction, a unified framework for online conformal prediction that explicitly models multi-step error dependencies without sacrificing long-term marginal coverage guarantees. We first prove that standard online conformal updates maintain validity as long as calibration parameters remain within a defined "safe" region. Leveraging this theoretical insight, we introduce a two-layer architecture: an outer layer that defines admissible parameter sets to ensure validity, and an inner layer that performs constrained optimization to model joint error distributions and minimize horizon-wide objectives. To make this computationally feasible, we develop a lightweight sampling strategy that estimates joint distributions without requiring large calibration sets. Extensive experiments on real-world datasets, including autonomous driving, climate forecasting, and public health, demonstrate that O2CP consistently outperforms state-of-the-art baselines, achieving target coverage with significantly sharper prediction intervals and reduced regret over long horizons.

2604.12306 2026-06-10 cs.LG cs.AI 版本更新

GCA Framework: A GCC Countries-Grounded Dataset and Agentic Pipeline for Climate Decision Support

GCA框架:面向海湾合作委员会国家的数据集与气候决策支持智能体管道

Muhammad Umer Sheikh, Khawar Shehzad, Salman Khan, Fahad Shahbaz Khan, Muhammad Haris Khan

发表机构 * Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI)(莫扎德人工智能大学) University of Missouri(密苏里大学) Australian National University(澳大利亚国立大学) Linköping University(林肯大学)

AI总结 提出GCA框架,包含GCC国家多模态数据集GCA-DS和工具增强型智能体GCA,通过领域微调和工具集成提升气候决策可靠性。

详情
AI中文摘要

海湾合作委员会(GCC)国家的气候决策日益需要能够将异质的科学和政策证据转化为可操作指导的系统,然而通用大语言模型(LLM)在区域特定气候知识以及与地理空间和预测工具的接地交互方面仍然薄弱。我们提出GCA框架,它统一了(i)GCA-DS,一个基于GCC国家的精选多模态数据集,以及(ii)Gulf Climate Agent(GCA),一个工具增强型气候分析智能体。GCA-DS包含20万个问答对,涵盖政府政策和适应计划、非政府组织和国际框架、学术文献以及关于热浪、沙尘暴和洪水的事件驱动报告,并辅以将图像与文本证据相结合的遥感输入。在此基础上,GCA智能体编排了一个基于实时和历史信号以及地理空间处理的模块化工具管道,生成衍生指数和可解释的可视化。最后,我们在GCC国家的气候任务上对开源和专有LLM进行了基准测试,结果表明领域微调和工具集成显著提高了相对于通用基线的可靠性。

英文摘要

Climate decision-making in the GCC states increasingly demands systems that can translate heterogeneous scientific and policy evidence into actionable guidance, yet general-purpose large language models (LLMs) remain weak both in region-specific climate knowledge and grounded interaction with geospatial and forecasting tools. We present the GCA framework, which unifies (i) GCA-DS, a curated multimodal dataset grounded in the GCC states, and (ii) Gulf Climate Agent (GCA), a tool-augmented agent for climate analysis. GCA-DS comprises 200k question--answer pairs spanning governmental policies and adaptation plans, NGO and international frameworks, academic literature, and event-driven reporting on heatwaves, dust storms, and floods, complemented with remote-sensing inputs that couple imagery with textual evidence. Building on this foundation, the GCA agent orchestrates a modular tool pipeline grounded in real-time and historical signals and geospatial processing that produces derived indices and interpretable visualizations. Finally, we benchmark open and proprietary LLMs on climate tasks in the GCC states and show that domain fine-tuning and tool integration substantially improve reliability over general-purpose baselines.

2605.01248 2026-06-10 cs.LG 版本更新

$S^3$-R1: Learning to Retrieve and Answer Step-by-Step with Synthetic Data

$S^3$-R1: 通过合成数据学习逐步检索与回答

Harsh Goel, Akhil Udathu, Susmija Jabbireddy, Pradnesh Kalkar, Atharva Parulekar

发表机构 * The University of Texas at Austin(德克萨斯大学奥斯汀分校) Google DeepMind(谷歌深Mind)

AI总结 提出S^3-R1框架,通过合成数据生成和密集奖励信号,解决强化学习后训练中稀疏奖励和缺乏多跳问题数据的问题,提升模型搜索与问答能力。

Comments Under Review

详情
AI中文摘要

强化学习后训练使模型具备了新的能力,例如使用工具进行搜索。然而,这些模型主要面临两个限制:稀疏的基于结果的奖励,以及缺乏涵盖不同难度问题的训练数据,导致模型无法使用工具进行更深层次的搜索来收集证据以回答问题。为了解决这些限制,我们引入了S^3-R1(合成数据与稳定搜索R1),一个将数据驱动方法与更密集的学习信号相结合的框架。我们首先开发了一个合成生成与筛选流程,从现有文档中程序化地派生出多样化的多跳问题。该流程包含一个基于检索的验证步骤,专门用于分离出中等难度的问题。然后,我们将这个扩展的训练集与一个奖励结构配对,该结构同时评估中间搜索质量和最终答案的正确性。这种设置直接缓解了稀疏奖励固有的信用分配问题。我们的评估表明,S^3-R1通过学习更有效的搜索和综合策略,优于现有基线,在域外数据集上实现了高达10%的稳健泛化改进。

英文摘要

Reinforcement learning (RL) post-training has enabled newer capabilities in models, such as agentic tool-use for search. However, these models struggle primarily due to limitations with sparse outcome-based rewards and a lack of training data that encapsulates questions of differing hardness, which results in models not performing deeper searches with tools to collect evidence for question-answering. To address these limitations, we introduce S^3-R1 (Synthetic data and stabilized Search R1), a framework that couples a data-centric approach with denser learning signals. We first develop a synthetic generation and curation pipeline that programmatically derives diverse, multi-hop questions from existing documents. This pipeline incorporates a retrieval-based verification step to specifically isolate questions of intermediate difficulty. We then pair this expanded training set with a reward structure that evaluates both intermediate search quality and the correctness of the final answer. This setup directly mitigates the credit assignment problems inherent to sparse rewards. Our evaluations show that S^3-R1 outperforms existing baselines by learning more effective search and synthesis strategies, yielding up to a 10% improvement in robust generalization on out-of-domain datasets.

2606.06698 2026-06-10 cs.LG cs.CL 版本更新

RECAP: Regression Evaluation for Continual Adaptation of Prompts

RECAP: 提示持续适应的回归评估

Harsh Deshpande, Kushal Chawla, Sangwoo Cho, William Campbell, Sambit Sahu

发表机构 * Capital One

AI总结 提出RECAP基准,在严格主动适应-测试协议下评估提示优化方法对约束变化的持续学习能力,发现现有方法在主动场景下性能无显著提升,强调设计主动提示适应方法的必要性。

详情
AI中文摘要

生产中的代理系统经常面临不断变化的约束,并且必须从下一次交互开始就遵守。诸如工具调用通知更改合规阈值或策略更新添加披露要求等场景符合这一标准,在生产中几乎没有出错的空间。这种主动适应设置在部署中很常见,但在当前的基准测试中却不存在,这些基准测试假设要么是静态约束集,要么是带有评估反馈的反应式协议。我们引入了RECAP,这是一个基准测试,在严格主动适应-测试协议下,在约束级别测量持续学习现象(遗忘、回归、前向转移):提示优化方法仅接收约束规范,并且必须在看到任何测试数据之前进行泛化。我们在四个LLM和三个具有不断变化的约束的调度上评估了六种方法,发现这些方法在性能上没有显著改善,即使在产生更高延迟之后也是如此。这些为离线或反应式设置设计的方法不足以应对主动范式。我们的工作强调了设计主动提示适应方法的日益增长的需求,其中模型必须对部署中不断变化的需求保持鲁棒性。

英文摘要

Production agentic systems routinely face evolving constraints and must comply from the very next interaction. Scenarios like a tool-call notification changing a compliance threshold or a policy update adding disclosure requirements fit this criteria, having close to no room for errors in production. This proactive adaptation setting is common in deployment, but absent from current benchmarks, which assume either static constraint sets or reactive protocols with evaluation feedback. We introduce RECAP, a benchmark that measures continual-learning phenomena (forgetting, regression, forward transfer) at the constraint level under a strictly proactive adapt-then-test protocol: prompt optimization methods receive only the constraint specification and must generalize before seeing any test data. Evaluating six methods across four LLMs and three schedules with evolving constraints, we find that these methods show no significant improvement in performance, even after incurring a higher latency. These methods, designed for offline or reactive settings, are inadequate for the proactive paradigm. Our work emphasizes the growing need for designing proactive prompt adaptation methods, where the models must remain robust to evolving needs in deployment.

2606.06742 2026-06-10 cs.LG stat.ML 版本更新

TorchKM: A GPU-Oriented Library for Kernel Learning and Model Selection

TorchKM:面向GPU的核学习与模型选择库

Yikai Zhang, Gaoxiang Jia, Jie Ding, Boxiang Wang

发表机构 * University of Iowa(爱荷华大学) University of Minnesota(明尼苏达大学) Individual Researcher(独立研究者) AIScientists, Inc. (MorphMind)(AIScientists公司(MorphMind)) Department of Statistics and Actuarial Science, University of Iowa(爱荷华大学统计与精算科学系)

AI总结 提出GPU加速的核学习库TorchKM,通过智能复用矩阵运算加速SVM、核逻辑回归等模型的训练与模型选择,性能优于标准基线。

Comments 14 pages, 2 figures

详情
AI中文摘要

TorchKM是一个用于核机器的开源库,包括支持向量机、核逻辑回归和核分位数回归,并具有GPU加速。该库采用scikit-learn风格的API,旨在利用GPU友好的线性代数,通过智能复用矩阵运算加速完整的训练和模型选择流程。基准测试显示,与标准基线相比,具有竞争力的预测性能以及显著的加速效果。代码和文档可在https://this URL获取,并且该包可以通过PyPI轻松安装。

英文摘要

TorchKM is an open-source library for kernel machines, including support vector machines, kernel logistic regression, and kernel quantile regression, with GPU acceleration. The library features a scikit-learn-style API and is designed to exploit GPU-friendly linear algebra, accelerating the full training and model-selection pipeline through intelligent reuse of matrix operations. Benchmarks show competitive predictive performance with substantial speedups over standard baselines. The efficiency and programmable design also make TorchKM a kernel-learning component for AI-driven workflows. Code and documentation are available at https://github.com/YikaiZhang95/torchkm, and the package can be easily installed via PyPI.

2606.09026 2026-06-10 cs.LG 版本更新

Structural Grid Descriptors Predict Within-Task Solver Success on ARC-AGI

结构网格描述符预测ARC-AGI任务内求解器成功率

Ayan Pendharkar

发表机构 * ARC-AGI Report(ARC-AGI报告)

AI总结 通过条件互信息检验,发现中间网格状态的结构属性可预测符号ARC-AGI求解器在相同任务内的成败,主要预测信息沿单一网格复杂度轴分布,且跨求解器架构泛化。

详情
AI中文摘要

我们询问中间网格状态的结构属性是否能预测符号ARC-AGI求解器是否成功,将其作为条件互信息I(X;Y|task) > 0的检验。在跨越两个架构不同的求解器(束搜索和随机DFS)、400个ARC任务、每个求解器28种配置以及训练和评估分割的44,800次运行中,在50%轨迹完成度处测量的人工设计网格描述符区分了相同任务内的成功和失败运行(平均任务内最佳特征AUC = 0.885,在任务内标签置换下p < 0.001)。大部分预测内容沿单一网格复杂度轴分布。该结果跨求解器架构泛化:在一个求解器上选择的特征在四个转移方向上预测另一个求解器的成功率,AUC为0.747-0.762(p < 0.001,控制泄漏)。在预注册的41个可靠任务保留集上,冻结特征n_components_final达到AUC = 0.765(95% CI [0.717, 0.810],p < 0.001),在任务聚类自助重采样和跨求解器任务合并下稳健。该信号不能由求解器容量解释(配置残差化后束搜索和SDFS的AUC分别为0.927和0.896,p < 0.001),且与得分轨迹弱相关(R^2约0)。在50%完成度时提前停止将束搜索计算量减少33.6%,同时保留98.9%的解法;退化轨迹检测将SDFS计算量减少65.3%,且无解法损失。最后,在400个评估任务中的229个中,DSL基本库从输入网格无法产生有效转换。这种0步崩溃对搜索预算不变,且束搜索普遍失败,表明是DSL覆盖范围限制而非搜索预算效应。

英文摘要

We ask whether structural properties of intermediate grid states predict whether a symbolic ARC-AGI solver will succeed, framed as a test of conditional mutual information I(X;Y|task) > 0. Across 44,800 runs spanning two architecturally distinct solvers (beam search and Stochastic DFS), 400 ARC tasks, 28 configurations per solver, and both training and evaluation splits, hand-crafted grid descriptors measured at 50% trajectory completion discriminate successful from failed runs within the same task (mean within-task best-feature AUC = 0.885, p < 0.001 under within-task label permutation). Most predictive content lies along a single grid-complexity axis. The result generalizes across solver architectures: a feature selected on one solver predicts success on the other with AUC 0.747-0.762 in all four transfer directions (p < 0.001, leakage controlled). On a pre-registered held-out set of 41 reliable tasks, the frozen feature n_components_final achieves AUC = 0.765 (95% CI [0.717, 0.810], p < 0.001), robust under task-clustered bootstrap resampling and cross-solver task collapsing. The signal is not explained by solver capacity (configuration-residualized AUC = 0.927 and 0.896 for beam search and SDFS, p < 0.001) and is only weakly coupled to score trajectories (R^2 approximately 0). Early stopping at 50% completion reduces beam-search compute by 33.6% while retaining 98.9% of solves; degenerate-trajectory detection reduces SDFS compute by 65.3% with no solve loss. Finally, on 229 of 400 evaluation tasks the DSL primitive library produces no valid transition from the input grid. This 0-step collapse is invariant to search budget and universally failed by beam search, indicating a DSL coverage limitation rather than a search-budget effect.

2606.09601 2026-06-10 cs.LG 版本更新

Assessing Sample Quality in Conditional Generation under Compositional Shift

在组合偏移下评估条件生成中的样本质量

Berker Demirel, Valentino Maiorca, Marco Fumero, Theofanis Karaletsos, Francesco Locatello

发表机构 * Institute of Science and Technology Austria (ISTA)(奥地利科学技术学院) Pyramidal Inc(Pyramidal公司) Achira Inc(Achira公司)

AI总结 针对条件生成在组合偏移下的评估难题,提出一种基于训练分布的后验信任分数,结合全局真实性和属性忠实度,实现样本过滤、排序和弃权,提升生成质量。

详情
AI中文摘要

条件生成器为可控生成提供了自然工具,包括所需条件是观测属性或实验因素的新组合的场景。在许多应用中,尤其是在科学领域,此类模型对于探索真实样本稀少、昂贵或尚未观测到的条件具有吸引力。然而,这给评估带来了循环问题:标准条件质量指标需要参考目标分布,但在外推场景下该分布根据定义不可用。我们通过一个基于训练分布的后验、每样本信任分数来解决这个问题,该分数用于评估条件样本。该分数结合了两个可估计的量:全局真实性(衡量与真实数据流形的兼容性)和属性忠实度(衡量样本是否更接近请求的属性而非合理的替代属性)。我们证明,在观测属性的温和覆盖条件下,该分数可以恢复跨外推生成的有意义比较。这些比较能够实现生成的有效过滤、排序和弃权,并可直接用于现成的预训练模型。在生物成像中,选定的样本更好地保留了真实的形态结构,并提高了下游预测性能,而在受控视觉基准上也观察到类似的增益。最后,我们展示了该分数如何在生成过程中应用,从而在完全解码之前实现弃权。代码可在 https://github.com/berkerdemirel/faithful-cond-gen 获取。

英文摘要

Conditional generators provide a natural tool for controllable generation, including settings where the desired condition is a new composition of observed attributes or experimental factors. In many applications, especially in scientific domains, such models are attractive to explore conditions for which real samples are rare, expensive, or not yet observed. However, this creates a circularity for evaluation: standard conditional quality metrics require a reference target distribution, but in the extrapolative regime that distribution is unavailable by definition. We address this problem with a post-hoc, per-sample trust score for assessing conditional samples using only the training distribution. The score combines two estimable quantities: global realism, measuring compatibility with the real data manifold, and attribute-wise faithfulness, measuring whether a sample is closer to the requested attributes than to plausible alternatives. We show that the score can recover meaningful comparisons across extrapolated generations, under a mild coverage condition on the observed attributes. These comparisons enable effective filtering, ranking, and abstention of generations and can be used directly on off-the-shelf pretrained models. In biological imaging, selected samples preserve real morphological structure better and improve downstream predictive performance, while similar gains are observed on controlled vision benchmarks. Finally, we show how the score can be applied during generation, enabling abstention before full decoding. Code is available at https://github.com/berkerdemirel/faithful-cond-gen.

2505.11034 2026-06-10 cs.CV cs.AI cs.LG 版本更新

CleanPatrick: A Benchmark for Image Data Cleaning

CleanPatrick: 图像数据清洗基准

Fabian Gröger, Simone Lionetti, Philippe Gottfrois, Alvaro Gonzalez-Jimenez, Ludovic Amruthalingam, Elisabeth Victoria Goessinger, Hanna Lindemann, Marie Bargiela, Marie Hofbauer, Omar Badri, Philipp Tschandl, Arash Koochek, Matthew Groh, Alexander A. Navarini, Marc Pouly

发表机构 * University of Basel(巴塞尔大学) Lucerne University of Applied Sciences and Arts(卢塞恩应用科学大学) University Hospital of Basel(巴塞尔大学医院) Northwestern University(西北大学) Northeast Dermatology Associates(东北皮肤科诊所) Medical University of Vienna(维也纳医科大学) Banner Health(Banner健康系统)

AI总结 提出首个大规模图像数据清洗基准CleanPatrick,基于Fitzpatrick17k皮肤病数据集,收集大量众包标注并采用项目反应理论聚合,将问题检测形式化为排序任务,评估多种方法。

Comments Accepted at Journal of Data-centric Machine Learning Research (DMLR)

详情
AI中文摘要

鲁棒的机器学习依赖于干净的数据,然而当前的图像数据清洗基准依赖于合成噪声或狭窄的人类研究,限制了比较和现实相关性。我们引入CleanPatrick,这是图像领域首个大规模数据清洗基准,基于公开的Fitzpatrick17k皮肤病学数据集构建。我们收集了来自933名医学众包工作者的496,377个二元标注,识别出离题样本(4%)、近似重复(21%)和标签错误(32%),并采用受项目反应理论启发的聚合模型,随后经过专家审查以获得高质量的真实标签。CleanPatrick将问题检测形式化为排序任务,并采用反映真实审计流程的标准排序指标。我们基准测试了经典异常检测器、感知哈希、SSIM、Confident Learning、NoiseRank、FINE、BHN和SelfClean。在CleanPatrick上,自监督表示在近似重复检测方面表现出色,经典方法在受限审查预算下实现了有竞争力的离题检测,而在保守的人类判断下检测不合理标签对于细粒度医学分类仍然具有挑战性。通过发布数据集和评估框架,CleanPatrick使得图像清洗策略的系统比较成为可能。

英文摘要

Robust machine learning depends on clean data, yet current image data cleaning benchmarks rely on synthetic noise or narrow human studies, limiting comparison and real-world relevance. We introduce CleanPatrick, the first large-scale benchmark for data cleaning in the image domain, built upon the publicly available Fitzpatrick17k dermatology dataset. We collect 496,377 binary annotations from 933 medical crowd workers, identify off-topic samples (4%), near-duplicates (21%), and label errors (32%), and employ an aggregation model inspired by item-response theory followed by expert review to derive high-quality ground truth. CleanPatrick formalizes issue detection as a ranking task and employs standard ranking metrics that mirror real audit workflows. We benchmark classical anomaly detectors, perceptual hashing, SSIM, Confident Learning, NoiseRank, FINE, BHN, and SelfClean. On CleanPatrick, self-supervised representations excel at near-duplicate detection, classical methods achieve competitive off-topic detection under constrained review budgets, and detecting implausible labels under conservative human judgment remains challenging for fine-grained medical classification. By releasing both the dataset and the evaluation framework, CleanPatrick enables a systematic comparison of image-cleaning strategies.

2510.08906 2026-06-10 stat.ML cs.LG physics.chem-ph 版本更新

Gradient-Guided Furthest Point Sampling for Robust Training Set Selection

梯度引导的最远点采样用于鲁棒训练集选择

Morris Trestman, Stefan Gugler, Felix A. Faber, O. A. von Lilienfeld

发表机构 * Berlin Institute for the Foundations of Learning(柏林学习与数据基础研究院) Chemical Physics Theory Group, Department of Chemistry, University of Toronto, St. George Campus, Toronto, ON, Canada(化学物理理论组,化学系,多伦多大学圣乔治校区,多伦多,ON,加拿大) Department of Materials Science and Engineering, University of Toronto, St. George Campus, Toronto, ON, Canada(材料科学与工程系,多伦多大学圣乔治校区,多伦多,ON,加拿大) Vector Institute for Artificial Intelligence, Toronto, ON, Canada(人工智能研究所,多伦多,ON,加拿大) Department of Physics, University of Toronto, St. George Campus, Toronto, ON, Canada(物理系,多伦多大学圣乔治校区,多伦多,ON,加拿大) Acceleration Consortium, University of Toronto, Toronto, ON, Canada(加速联盟,多伦多大学,多伦多,ON,加拿大)

AI总结 提出梯度引导最远点采样(GGFPS),利用分子力范数指导构型空间采样,在MD17数据集上相比FPS和随机采样显著提升数据效率和模型鲁棒性。

Comments 41 pages, 43 figures, 2 algorithms; journal article with supplementary information appended

详情
Journal ref
Machine Learning: Science and Technology 7, 035047 (2026)
AI中文摘要

训练集采样方法用于提高机器学习问题中与化学相关的模型性能并降低数据成本。我们引入了梯度引导最远点采样(GGFPS),这是最远点采样(FPS)的一个简单扩展,利用分子力范数指导分子构型空间的高效采样。针对一个玩具系统(Styblinski-Tang函数)以及来自MD17数据集的分子动力学轨迹,提供了数值证据。我们的数值结果表明,与FPS、均匀随机采样(URS)以及已有的监督式FPS风格选择器PCov-FPS和PCov-CUR相比,使用GGFPS时数据效率和模型鲁棒性更优。对MD17数据的分布分析表明,FPS系统性地欠采样平衡几何结构,导致松弛结构测试误差较大。GGFPS纠正了这一缺陷,并且(i)在二维Styblinski-Tang系统中,与FPS相比,在不牺牲预测精度的情况下,训练成本可降低两倍;(ii)系统性地降低了MD17中平衡以及应变结构的预测误差;(iii)在所有MD17构型空间中系统性地降低了预测误差方差。这些结果表明,梯度感知采样方法作为有效的训练集选择工具具有很大潜力,而简单使用FPS可能导致训练不平衡和预测结果不一致。

英文摘要

Training set sampling methods are used to improve model performance and lower data costs in machine learning problems relevant to chemistry. We introduce Gradient Guided Furthest Point Sampling (GGFPS), a simple extension of Furthest Point Sampling (FPS) that leverages molecular force norms to guide efficient sampling of configurational spaces of molecules. Numerical evidence is presented for a toy system (the Styblinski-Tang function) as well as for molecular dynamics trajectories from the MD17 dataset. Our numerical results indicate superior data efficiency and model robustness when using GGFPS compared to FPS and uniform random sampling (URS), as well as established supervised FPS-style selectors, PCov-FPS and PCov-CUR. Distribution analysis of the MD17 data suggests that FPS systematically under-samples equilibrium geometries, resulting in large test errors for relaxed structures. GGFPS cures this artifact and (i) enables up to twofold reductions in training cost without sacrificing predictive accuracy compared to FPS in the 2-dimensional Styblinski-Tang system, (ii) systematically lowers prediction errors for equilibrium as well as strained structures in MD17, and (iii) systematically decreases prediction error variances across all of the MD17 configuration spaces. These results suggest that gradient-aware sampling methods hold great promise as effective training set selection tools, and that naive use of FPS may result in imbalanced training and inconsistent prediction outcomes.

2512.11995 2026-06-10 cs.CV cs.AI cs.LG 版本更新

V-REX: Benchmarking Exploratory Visual Reasoning via Chain-of-Questions

V-REX: 通过问题链进行探索性视觉推理的基准测试

Chenrui Fan, Yijun Liang, Shweta Bhardwaj, Kwesi Cobbina, Ming Li, Tianyi Zhou

发表机构 * University of Maryland, College Park(马里兰大学学院市分校) Mohamed bin Zayed University of Artificial Intelligence(穆罕默德·本·扎耶德人工智能大学)

AI总结 提出V-REX基准,通过问题链将多步探索推理分解为规划和遵循能力,评估视觉语言模型在复杂开放任务中的表现。

Comments 28 pages

详情
AI中文摘要

尽管许多视觉语言模型(VLM)被开发用于回答定义明确、目标高度具体的简单问题(如大多数基准测试所示),但在实践中,它们通常难以处理复杂的开放式任务,这些任务通常需要在视觉空间中进行多轮探索和推理。这种视觉思维路径不仅像AI侦探一样提供逐步探索和验证,还能对最终答案产生更好的解释。然而,由于中间步骤的探索空间巨大,这些路径难以评估。为弥补这一差距,我们开发了一个评估套件“多步探索视觉推理(V-REX)”,它由一个具有挑战性的视觉推理任务基准和一个评估协议组成。V-REX涵盖了跨不同领域的丰富应用场景。V-REX将多步探索推理转化为问题链(CoQ),并解耦了VLM的能力:(1)规划:通过选择一系列探索性问题来分解开放式任务;(2)遵循:顺序回答精心策划的CoQ以收集信息,从而推导出最终答案。通过每步策划有限的问题和答案选项,V-REX实现了对中间步骤的可靠定量和细粒度分析。通过评估最先进的专有和开源VLM,我们揭示了持续的扩展趋势、规划与遵循能力之间的显著差异,以及多步探索推理中巨大的改进空间。

英文摘要

While many vision-language models (VLMs) are developed to answer well-defined, straightforward questions with highly specified targets, as in most benchmarks, they often struggle in practice with complex open-ended tasks, which usually require multiple rounds of exploration and reasoning in the visual space. Such visual thinking paths not only provide step-by-step exploration and verification as an AI detective but also produce better interpretations of the final answers. However, these paths are challenging to evaluate due to the large exploration space of intermediate steps. To bridge the gap, we develop an evaluation suite, ``Visual Reasoning with multi-step EXploration (V-REX)'', which is composed of a benchmark of challenging visual reasoning tasks requiring native multi-step exploration and an evaluation protocol. V-REX covers rich application scenarios across diverse domains. V-REX casts the multi-step exploratory reasoning into a Chain-of-Questions (CoQ) and disentangles VLMs' capability to (1) Planning: breaking down an open-ended task by selecting a chain of exploratory questions; and (2) Following: answering curated CoQ sequentially to collect information for deriving the final answer. By curating finite options of questions and answers per step, V-REX achieves a reliable quantitative and fine-grained analysis of the intermediate steps. By assessing SOTA proprietary and open-sourced VLMs, we reveal consistent scaling trends, significant differences between planning and following abilities, and substantial room for improvement in multi-step exploratory reasoning.

2603.29730 2026-06-10 stat.ML cs.LG 版本更新

mlr3mbo: Bayesian Optimization in R

mlr3mbo:R语言中的贝叶斯优化

Marc Becker, Lennart Schneider, Martin Binder, Lars Kotthoff, Bernd Bischl

发表机构 * Department of Statistics, LMU Munich(慕尼黑大学统计系) Munich Center for Machine Learning (MCML)(慕尼黑机器学习中心) University of St Andrews(圣安德鲁大学)

AI总结 介绍mlr3mbo,一个模块化的R语言贝叶斯优化工具箱,支持单/多目标优化、多提议、并行化,并通过坐标下降搜索和基准测试验证其性能与现有优化器相当。

详情
AI中文摘要

我们提出mlr3mbo,一个用于R语言中贝叶斯优化的模块化工具箱。mlr3mbo支持单目标和多目标优化、多点提议、批量与异步并行化以及稳健的错误处理。虽然它可用于许多标准贝叶斯优化变体的应用场景,但研究人员也可以从其灵活的构建块中构建自定义贝叶斯优化算法。除了介绍软件、设计原则和构建块外,本文还在基于代理的基准套件YAHPO Gym上进行了两次广泛的实证评估。为了识别数值和混合层次优化场景下的稳健默认配置,并进一步了解各个设置的各自影响,我们在mlr3mbo配置空间上运行坐标下降搜索并分析其结果。此外,我们将mlr3mbo与包括HEBO、SMAC3、Ax和Optuna在内的多种现有优化器进行基准测试,发现其性能与最新技术相当。

英文摘要

We present mlr3mbo, a modular toolbox for Bayesian optimization in R. mlr3mbo supports single- and multi-objective optimization, multi-point proposals, batch and asynchronous parallelization, and robust error handling. While it can be used for many standard Bayesian optimization variants in applied settings, researchers can also construct custom Bayesian optimization algorithms from its flexible building blocks. In addition to an introduction to the software, its design principles, and its building blocks, the paper presents two extensive empirical evaluations on the surrogate-based benchmark suite YAHPO Gym. To identify robust default configurations for both numeric and mixed-hierarchical optimization regimes, and to gain further insights into the respective impacts of individual settings, we run a coordinate descent search over the mlr3mbo configuration space and analyze its results. Furthermore, we benchmark mlr3mbo against a wide range of established optimizers, including HEBO, SMAC3, Ax, and Optuna, and find that it performs on par with state-of-the-art.

2605.24818 2026-06-10 stat.ME cs.CL cs.LG 版本更新

Spiking the training data to correct for test set contamination

向训练数据注入噪声以校正测试集污染

Johnny Tian-Zheng Wei, Jerry Li, Ameya Godbole, Robin Jia

发表机构 * University of Southern California(南加州大学)

AI总结 提出通过以已知比例故意污染部分测试样本(注入噪声)来校正测试集污染导致的分数膨胀,并利用记忆预测器进行统计校正。

详情
AI中文摘要

关于测试集污染的文献主要集中在检测上,但对污染测试分数的校正研究不足。我们的核心建议是通过以已知比例故意污染一些测试样本来向训练数据注入噪声。然后,这些注入的样本可用于校准模型记忆的预测器,从而实现对膨胀测试分数的原则性统计校正。为了评估不同的校正估计量,我们首先提出了一个基于Hubble模型的模拟框架。Hubble模型以最小对形式出现,其中扰动模型被故意用几个测试集污染,而标准模型则没有,作为反事实和校正目标。我们考虑使用来自记忆预测器、正确性预测器或两者的信息的估计量。在模拟中,我们建立了基本的统计直觉,并表明利用记忆和正确性信息的估计量优于不做任何校正的朴素估计。然后,我们实例化了几种记忆和正确性预测器,并发现简单的预测器(如Platt缩放的成员推理指标)为校正提供了良好的信号。最后,我们考察了注入噪声的实际考虑。简单的记忆预测器在校准时不需要超过10个样本,并且通常从一个数据集迁移到另一个数据集。综上所述,注入噪声是解决测试集污染的一种有前景的方法。

英文摘要

The literature on test set contamination largely focuses on detection, but the correction of contaminated test scores is underexplored. Our core proposal is to spike the training data by intentionally contaminating some test examples at known rates. The spiked examples can then be used to calibrate predictors of model memorization which enable principled statistical correction of inflated test scores. To evaluate different correction estimators, we first present a simulation framework based on the Hubble models. Hubble models come in minimal pairs, where the perturbed model was deliberately contaminated with several test sets, while the standard model was not, serving as the counterfactual and correction target. We consider estimators that use information from a memorization predictor, correctness predictor, or both. In simulation, we establish basic statistical intuitions and show that estimators leveraging memorization and correctness information are better than naive estimation which makes no correction at all. We then instantiate several memorization and correctness predictors, and find that simple predictors such as Platt-scaled membership inference metrics provide good signal for correction. Finally, we examine the practical considerations of spiking. Simple memorization predictors need no more than 10 examples for calibration and often transfer from one dataset to another. Taken together, spiking is a promising solution for test set contamination.

2605.30292 2026-06-10 stat.ML cs.LG math.ST stat.ME stat.TH 版本更新

Leave a Window Out: Modifying the Jackknife for Predictive Inference in Time Series

留出一个窗口:修改刀切法用于时间序列的预测推断

Hanyang Jiang, Rina Foygel Barber, Ashwin Pananjady, Yao Xie

发表机构 * Schools of Industrial and Systems Engineering and Electrical and Computer Engineering(工业与系统工程系和电气与计算机工程系) Department of Statistics, University of Chicago(芝加哥大学统计系)

AI总结 针对时间序列中数据非可交换性和记忆预测器的问题,提出留出一个窗口(LWO)方法,通过修改刀切法实现有效覆盖,并产生比分裂共形预测更窄的区间。

Comments 40 pages, 8 figures

详情
AI中文摘要

共形预测方法在数据可交换且预测器以无记忆方式训练时,具有强大的理论和经验预测推断性能。然而,这些假设和约束在许多真实数据场景中不切实际,例如时间序列(其中时间依赖性违反了可交换性,并且无记忆预测器不可避免地具有较差的预测准确性)。最近的研究表明,分裂共形预测方法对于记忆预测器和偏离可交换性(这是时间序列数据的常见特征)具有鲁棒性。然而,由于使用样本分裂可能导致较低的准确性,这促使我们探究其他不依赖数据分裂的预测推断方法是否也能可靠地用于时间序列设置。在这项工作中,我们表明即使在具有轻微时间依赖性的典型时间序列模型中,原始的留一刀切法也可能遭受任意的覆盖损失。作为补救措施,我们提出了一种针对此类设置的精心修改,称为留出一个窗口(LWO)方法,并表明只要模型拟合过程满足温和的稳定性条件,它就能实现有效的覆盖。我们的证明基于量化数据偏离循环可交换性的程度,并引入了新的系数来衡量这种偏离的程度。在时间序列数据上的实验表明,当原始刀切法无法覆盖时,我们的LWO方法通常能实现有效的覆盖,同时产生比分裂共形预测更窄的区间。

英文摘要

Conformal prediction methods enjoy strong theoretical and empirical predictive inference performance, provided the data is exchangeable and is treated symmetrically during training. However, these assumptions are impractical in many settings, such as time series, where temporal dependence violates exchangeability and it is preferable to use predictors that leverage dependence by treating data asymmetrically. Recent work shows that split conformal prediction is robust to these issues, but sample splitting can reduce accuracy, motivating the study of methods that do not rely on data splitting in the time series setting. In this work, we show that the vanilla leave-one-out jackknife can suffer arbitrary loss of coverage even in canonical time series models with mild temporal dependence. As a remedy, we propose a modification tailored to such settings, which we term the leave-a-window-out (LWO) method, and show that it can achieve valid coverage provided that the model-fitting procedure satisfies mild stability properties. Our proofs are based on quantifying the degree to which the data departs from cyclic exchangeability, which we introduce new coefficients to measure. Experiments on time series demonstrate that our method often enjoys valid coverage when the vanilla jackknife fails to cover, while producing much narrower intervals than split conformal prediction.

12. 机器学习应用 64 篇

2606.09857 2026-06-10 cs.LG physics.comp-ph 新提交

Uncertainty-aware Multi-fidelity Closure via Conditional Normalizing Flows

基于条件归一化流的不确定性感知多保真度闭合模型

Jice Zeng, Shady E. Ahmed, David Barajas-Solano, Panos Stinis

AI总结 提出基于条件归一化流的不确定性感知多保真度框架,通过学习低保真度到高保真度系数的概率映射,解决降阶模型中的闭合问题,在涡旋合并问题中验证了残差学习优于直接学习。

Comments No comments

详情
AI中文摘要

降阶模型(ROM)为复杂多尺度系统提供了高效的替代模型,但其预测精度常因截断误差以及已解析尺度与未解析尺度之间相互作用的不足表示而受损。截断(未解析)尺度对ROM(已解析)尺度缺失的影响通常被称为闭合问题。在本工作中,我们将ROM闭合建模视为一个多保真度(MF)学习问题,并基于条件归一化流提出一个不确定性感知的MF框架,以提高ROM的预测精度。所提出的方法学习从低保真度(LF)ROM系数到高保真度(HF)系数的概率映射,从而在量化与所学闭合相关的不确定性的同时提高预测保真度。研究了两种校正策略:直接学习(直接从LF输入预测HF系数)和残差学习(学习LF与HF系数之间的差异,并用其恢复校正后的HF解)。该框架在由二维Navier-Stokes方程控制的涡旋合并问题上进行了验证。结果表明,两种校正策略均比未校正的ROM提高了精度,其中残差学习始终优于直接学习。此外,所提出的两种基于深度生成模型的策略为校正后的ROM系数提供了不确定性量化,这对于评估预测置信度和支持ROM在实际应用中的可靠使用至关重要。

英文摘要

Reduced-order models (ROMs) provide an efficient surrogate for complex multiscale systems, but their predictive accuracy is often compromised by truncation errors and the inadequate representation of interactions between resolved and unresolved scales. The missing effect of truncated (unresolved) scales on ROM (resolved) scales is often denoted as the closure problem. In this work, we formulate ROM closure modeling as a multi-fidelity (MF) learning problem and propose an uncertainty-aware MF framework based on conditional normalizing flow to enhance ROM predictive accuracy. The proposed approach learns a probabilistic mapping from low-fidelity (LF) ROM coefficients to high-fidelity (HF) coefficients, thereby improving predictive fidelity while quantifying the uncertainty associated with the learned closure. Two correction strategies are investigated: direct learning, in which HF coefficients are predicted directly from LF inputs, and residual learning, which learns the discrepancy between LF and HF coefficients and uses it to recover the corrected HF solution. The framework is demonstrated on a vortex merging problem governed by the two-dimensional Navier Stokes equations. Results show that both correction strategies improve ROM accuracy over uncorrected ROM, with residual learning achieving consistently better performance than direct learning. Moreover, the two proposed deep generative model-based strategies provide uncertainty quantification for the corrected ROM coefficients, which is critical for assessing prediction confidence and supporting the reliable use of ROMs in practical applications.

2606.09860 2026-06-10 cs.LG cs.AI stat.AP stat.ML 新提交

Conformal Risk Prediction for Non-Alcoholic Fatty Liver Disease Using Gradient Boosting with Distribution-Free Coverages

基于梯度提升与无分布覆盖的非酒精性脂肪肝病共形风险预测

Xinze Zhang

AI总结 提出结合梯度提升决策树与共形预测的机器学习框架Method,实现非酒精性脂肪肝病个体风险的无分布校准覆盖预测,在中国多中心队列中AUROC达0.912,优于多种方法。

详情
AI中文摘要

非酒精性脂肪肝病(NAFLD)影响全球约25%的成年人,带来显著的肝脏和心血管风险。然而,人群层面的筛查工具仍不充分。我们提出Method,一种用于NAFLD风险预测的机器学习框架,将梯度提升决策树与共形预测相结合,以在个体风险估计上产生校准的、无分布的覆盖保证。它集成了基于互信息的稳定性选择过程,通过自助重采样识别紧凑、临床可解释的特征子集,构建预测集,其边际覆盖可证明超过用户指定的置信水平。我们在中国广州的多中心队列(主要n=2,187;外部验证n=412)上评估了Method,使用了涵盖人口统计学、代谢生物标志物和生活方式因素的78个候选特征。Method内部AUROC为0.912,外部为0.891,优于深度神经网络、TabNet、支持向量机和逻辑回归。共形预测集在90%名义水平下达到91.3%的经验覆盖。从这些分数得出的三层风险分层将人群分为不同组别,高风险亚组的12个月进展率是低风险组的4.7倍。选定的特征——特别是腰围、ALT、GGT、甘油三酯、空腹血糖和BMI——与已建立的代谢风险因素一致,提供了生物学合理性。

英文摘要

Non-alcoholic fatty liver disease (NAFLD) affects roughly 25% of global adults, posing substantial hepatic and cardiovascular risks. Yet, population-level screening tools remain inadequate. We present Method, a machine-learning framework for NAFLD risk prediction coupling gradient-boosted decision trees with conformal prediction to yield calibrated, distribution-free coverage guarantees on individual risk estimates. It integrates a mutual-information-based stability selection procedure to identify a compact, clinically interpretable feature subset via bootstrap resampling, constructing prediction sets whose marginal coverage provably exceeds a user-specified confidence level. We evaluated Method on a multicenter cohort from Guangzhou, China (primary n=2,187; external validation n=412) using 78 candidate features across demographics, metabolic biomarkers, and lifestyle factors. Method achieves an AUROC of 0.912 internally and 0.891 externally, outperforming deep neural networks, TabNet, support vector machines, and logistic regression. Conformal prediction sets achieve 91.3% empirical coverage at the 90% nominal level. A three-tier risk stratification derived from these scores separates the population into distinct groups, with the high-risk subgroup showing a 12-month progression rate 4.7 times that of the low-risk tier. The selected features -- notably waist circumference, ALT, GGT, triglycerides, fasting glucose, and BMI -- align with established metabolic risk factors, providing biological plausibility.

2606.09892 2026-06-10 cs.LG stat.ME 新提交

LMT: A Bayesian Framework for Causal Discovery from Textual Alarm Records in Manufacturing Systems

LMT: 制造系统中文本告警记录的因果发现贝叶斯框架

Xiaofeng Xiao, Jianhong Chen, Qiuzhuang Sun, Naichen Shi, Xubo Yue

发表机构 * Department of Mechanical & Industrial Engineering, Northeastern University, Boston, MA, USA(东北大学机械与工业工程系) College of Integrative Studies, Singapore Management University, Singapore(新加坡国立大学整合研究学院) Department of Industrial Engineering and Management Sciences, Department of Mechanical Engineering, Northwestern University, IL, USA(西北大学工业工程与管理科学系、机械工程系)

AI总结 提出LMT框架,结合大语言模型提取的语义信号和基于泊松过程的时间证据,通过贝叶斯方法从文本告警记录中发现因果图,在小样本场景下表现优异。

Comments 19 pages

详情
AI中文摘要

文本事件记录(如告警日志)已成为工程和制造系统中越来越常见的数据源。除了识别相关性或重复模式外,工程师通常有兴趣了解在系统运行过程中哪些类型的事件因果性地触发或影响其他事件。文本事件描述可能包含关于此类因果关系的语义线索,而最近的大语言模型(LLM)为提取这些信号提供了有前景的工具。然而,仅依赖LLM编码的文本信息不足以进行准确的因果发现,因为语义模式并不直接揭示因果机制,并且可能将因果关系与相关性或频繁的顺序模式混淆。为了解决这些挑战,我们提出了\textbf{LMT},一个用于工程事件数据的贝叶斯因果发现框架,它联合利用了文本描述和时间戳。具体来说,LMT首先使用LLM从事件描述中提取语义因果信号,并构建事件类型或事件簇之间因果图的先验分布。然后,它通过基于泊松过程的似然函数纳入时间证据,使得基于时间戳的统计证据能够精炼LLM信息先验。通过整合文本和时间信息,LMT生成一个既可解释又有数据支持的因果图。模拟研究表明,所提出的框架在不同设置下都是有效的,并且在样本量较小的告警事件场景中尤其具有优势。

英文摘要

Textual event records, such as alarm logs, have become an increasingly common data source in engineering and manufacturing systems. Beyond identifying correlations or recurring patterns, engineers are often interested in understanding which types of events causally trigger or influence other events during system operation. Textual event descriptions may contain semantic clues about such causal relationships, and recent large language models (LLMs) provide a promising tool for extracting these signals. However, relying solely on LLM-encoded textual information is insufficient for accurate causal discovery, since semantic patterns do not directly reveal causal mechanisms and may confuse causation with correlation or frequent sequential patterns. To address these challenges, we propose \textbf{LMT}, a Bayesian causal discovery framework for engineering event data that jointly leverages textual descriptions and timestamps. Specifically, LMT first uses LLMs to extract semantic causal signals from event descriptions and constructs a prior distribution over causal graphs among event types or event clusters. It then incorporates temporal evidence through a Poisson-process-based likelihood, allowing the LLM-informed prior to be refined by timestamp-based statistical evidence. By integrating the textual and temporal information, LMT produces a causal graph that is both interpretable and data-supported. Simulation studies show that the proposed framework is effective across different settings and is especially advantageous in small-sample alarm-event scenarios.

2606.09919 2026-06-10 cs.LG cs.AI cs.MA cs.RO 新提交

Co-GLANCE: Uncertainty-Aware Active Perception for Heterogeneous Robot Teaming

Co-GLANCE: 异构机器人团队的不确定性感知主动感知

Michal P. Podolinsky, Neel P. Bhatt, Pranay Samineni, Rohan Siva, Christian Ellis, Ufuk Topcu

AI总结 提出Co-GLANCE系统,通过蒸馏视觉语言模型实现实时遮挡分割与机器人分配,结合共形预测与选择性弃权提供统计保证的不确定性量化,驱动主动感知,在真实场景中遮挡分割和分配准确率分别提升25%和36%,推理延迟降低350倍。

Comments Code, videos, and dataset available at https://co-glance.github.io/

详情
AI中文摘要

感知不确定性是异构机器人团队在非结构化户外环境中运行的核心挑战,因为单一视角无法提供可靠的场景理解。由遮挡等来源引起的感知不确定性,根据场景结构在不同机器人视角下表现不同。检测和解决感知不确定性的来源需要基于场景的上下文推理和具备能力感知的机器人分配。虽然视觉语言模型为两者提供了强大的语义先验,但它们对于机载推理在计算上过于昂贵,且缺乏校准的不确定性量化。我们介绍了Co-GLANCE,一个用于异构机器人团队不确定性解决的实时机载感知与决策系统。Co-GLANCE将视觉语言模型的语义推理能力蒸馏为用于遮挡分割和机器人分配的端到端模型,消除了对基于云推理的需求。为了量化感知不确定性,Co-GLANCE结合了共形预测与选择性弃权,为分割、机器人分配和检测输出提供统计有效的覆盖保证。这些校准的不确定性估计直接触发主动感知,派遣最合适的机器人获取信息丰富的视角并解决不确定性。在真实世界场景中,Co-GLANCE在遮挡分割和机器人分配准确率上分别比基于云的视觉语言模型基线高出25%和36%,同时将每帧推理延迟降低350倍。我们还发布了一个空地数据集以供未来研究。代码、视频和数据集可在以下网址获取:此 https URL。

英文摘要

Perceptual uncertainty is a central challenge for heterogeneous robot teams operating in unstructured outdoor environments, where no single viewpoint affords reliable scene understanding. Perceptual uncertainty, arising from sources such as occlusions, manifests differently across robot viewpoints depending on scene structure. Detecting and resolving sources of perceptual uncertainty requires both scene-based contextual reasoning and capability-aware robot allocation. While vision-language models provide strong semantic priors for both, they are computationally prohibitive for onboard inference and lack calibrated uncertainty quantification. We introduce Co-GLANCE, a real-time onboard perception and decision-making system for uncertainty resolution in heterogeneous robot teams. Co-GLANCE distills the semantic reasoning capabilities of a vision-language model into an end-to-end model for occlusion segmentation and robot allocation, eliminating the need for cloud-based inference. To quantify perceptual uncertainty, Co-GLANCE combines conformal prediction with selective abstention to provide statistically valid coverage guarantees for segmentation, robot allocation, and detection outputs. These calibrated uncertainty estimates directly trigger active perception, dispatching the most appropriate robot to acquire informative viewpoints and resolve uncertainty. Across real-world scenarios, Co-GLANCE outperforms cloud-based vision-language model baselines in occlusion segmentation and robot allocation accuracy by 25% and 36%, respectively, while reducing per-frame inference latency 350x. We also release an air-ground dataset for future research. Code, videos, and dataset available at https://co-glance.github.io/ .

2606.09934 2026-06-10 cs.LG cs.CR 新提交

nCMD: Benign-Anchored Feature Selection for Imbalanced Network Intrusion Detection

nCMD: 面向不平衡网络入侵检测的良性锚定特征选择

Abu Fuad Ahmad, Istiaque Ahmed

AI总结 提出良性锚定类均值偏差(nCMD)方法,通过计算攻击类分布与良性类均值的偏差进行特征选择,在四个基准数据集上优于传统过滤方法,尤其适用于特征预算紧张和类别严重不平衡的场景。

Comments 6 pages, IEEE double columns

详情
AI中文摘要

特征选择对于在操作和防御网络中常见的高维、高度不平衡流量下运行的网络入侵检测系统(NIDS)至关重要。传统的过滤方法使用跨类别对称计算的全局统计量对特征进行排序,因此无法捕捉入侵检测的不对称性,其中攻击最好被描述为对主导良性流量的偏离。我们提出了良性锚定类均值偏差(nCMD),一种轻量级且可解释的方法,该方法基于攻击类分布与良性类均值的偏差(而非全局有偏的参考)对特征相关性进行评分。这种方法使特征选择与NIDS的操作语义保持一致,且不增加额外计算成本。在四个基准数据集(CICIDS2017、CICDDoS2019、NSL-KDD和UNSW-NB15)、多个特征预算和三个下游分类器上,nCMD在宏平均F1分数上达到或超过了经典过滤基线。它在四个数据集中的三个以及每个分类器下均取得了最佳结果,在特征预算紧张和类别严重不平衡的情况下改进最为显著。这些结果支持良性锚定排序作为资源受限NIDS的可扩展且可解释的预处理组件。

英文摘要

Feature selection is critical for network intrusion detection systems (NIDS) operating under high-dimensional, highly imbalanced traffic, as found in operational and defense networks. Traditional filter methods rank features using global statistics computed symmetrically across classes and thus fail to capture the asymmetry of intrusion detection, where attacks are best characterized as deviations from dominant benign traffic. We propose benign-anchored Classwise Mean Deviation (nCMD), a lightweight and interpretable method that scores feature relevance based on the deviation of attack-class distributions from the benign-class mean, rather than a globally biased reference. This approach aligns feature selection with the operational semantics of NIDS at no additional computational cost. Across four benchmark datasets (CICIDS2017, CICDDoS2019, NSL-KDD, and UNSW-NB15), multiple feature budgets, and three downstream classifiers, nCMD matches or exceeds classical filter baselines in macro-averaged F1-score. It achieves the best result on three of the four datasets and under every classifier, with the strongest improvements observed under tight feature budgets and severe class imbalance. These results support benign-anchored ranking as a scalable and interpretable preprocessing component for resource-constrained NIDS.

2606.09949 2026-06-10 cs.LG cs.AI 新提交

Learning Where to Simulate: Generative Active Sampling for Online PDE Surrogate Training

学习何处模拟:在线PDE代理训练的生成式主动采样

Pierre Cesar, Sofya Dymchenko, Abhishek Purandare, Bruno Raffin

AI总结 提出在线生成式主动采样(OGAS),通过扩散模型学习配置参数与代理性能的关系,主动采样高难度区域,显著降低尾部分布误差,提升代理最坏情况可靠性。

详情
AI中文摘要

数据驱动的PDE代理使用数值PDE求解器产生的数据进行训练。然而,当代理的目标是在广泛的PDE配置(例如初始条件和物理系数)上泛化时,生成具有代表性的训练集并非易事。配置参数的均匀采样通常低估了表现出挑战性动力学的轨迹,导致训练后的代理出现高预测误差和大误差方差。在线训练将数据生成和代理训练耦合,通过允许实时调整求解器参数提供了自然优势。为了有效利用这一能力,我们引入了在线生成式主动采样(OGAS),一种主动学习方法,它反应性地学习配置参数与代理性能之间的关系,以控制采样分布。OGAS与代理并行训练一个快速扩散模型,作为条件采样器,将代理派生的难度信号(例如损失或不确定性)映射到配置参数。通过主动从偏向高难度的先验中抽取目标信号,OGAS持续将数据生成导向挑战性区域,而不会延迟训练流程。我们在具有不同挑战性动力学的2D PDE(Kuramoto-Sivashinsky、Navier-Stokes、Gray-Scott)上评估OGAS,参数多达308个,并使用多种代理架构。在所有设置中,与均匀采样相比,OGAS一致地改善了尾部分布统计,显著降低了第99百分位以上的误差和整体误差离散度。虽然优先考虑挑战性轨迹引入了与平均误差的权衡,但OGAS有效确保了训练后代理的最坏情况可靠性,且壁钟时间开销可忽略不计。

英文摘要

Data-driven PDE surrogates are trained with data produced by numerical PDE solvers. However, when the surrogate's goal is to generalize across a wide range of PDE configurations (e.g., initial conditions and physical coefficients), generating a representative training set is non-trivial. Uniform sampling of configuration parameters often under-represents trajectories exhibiting challenging dynamics, leading to high prediction errors and large error variance in the trained surrogate. Online training, where data generation and surrogate training are coupled, offers a natural advantage by allowing solver parameters to be steered on-the-fly. To efficiently exploit this capability, we introduce Online Generative Active Sampling (OGAS), an active learning method that reactively learns the relationship between configuration parameters and surrogate performance to control the sampling distribution. OGAS trains a fast diffusion model in parallel to the surrogate to act as a conditional sampler, mapping a surrogate-derived difficulty signal (e.g., loss or uncertainty) to configuration parameters. By actively drawing target signals from a prior biased toward high difficulty, OGAS continuously steers data generation toward challenging regimes without delaying the training workflow. We evaluate OGAS across 2D PDEs with distinct challenging dynamics (Kuramoto-Sivashinsky, Navier-Stokes, Gray-Scott) and up to 308 parameters, using multiple surrogate architectures. Across all settings, OGAS consistently improves tail statistics, yielding substantial reductions in errors above the 99th percentile and overall error dispersion compared to uniform sampling. While prioritizing challenging trajectories introduces a trade-off with average error, OGAS effectively ensures worst-case reliability of trained surrogates with negligible wall-time overhead.

2606.09959 2026-06-10 cs.LG cs.AI 新提交

Temporal Context Conditioning for Seasonality-Aware Precipitation Nowcasting of High-Intensity Rainfall

面向高强度降雨的季节感知降水临近预报的时间上下文条件化

Gijs van Nieuwkoop, Siamak Mehrkanoon

AI总结 提出TA-SmaAt-UNet模型,通过时间条件层(昼夜和季节循环编码)增强雷达降水临近预报,显著提升高强度降雨事件的预测性能。

Comments 9 pages, 6 figures

详情
AI中文摘要

降水临近预报越来越多地采用直接从近期雷达观测中学习的深度学习模型。尽管这类模型能有效捕捉短期降水运动,但它们往往缺乏降雨发展所依据的气象条件的更广泛上下文信息。本文研究轻量级时间上下文是否能改善基于雷达的临近预报,特别是针对高强度降雨。我们提出了时间感知小注意力U-Net(TA-SmaAt-UNet),它在核心SmaAt-UNet模型基础上扩展了时间条件层,利用昼夜时间和一年中时间的循环编码来调节中间特征表示。在KNMI雷达降水数据上的实验表明,时间条件化对罕见的高强度降水事件最为有益,同时也能改善季节变异性和预测降水强度分布的表征。层传导分析进一步表明,尽管参数成本很小,模型仍积极使用添加的时间条件层。这些发现表明,简单的、基于物理动机的时间上下文可以提高基于深度学习的降水临近预报的真实性和可靠性。我们的模型实现和训练设置可在GitHub上获取。

英文摘要

Precipitation nowcasting is increasingly being approached with deep learning models that learn directly from recent radar observations. Although such models can efficiently capture short-term precipitation motion, they often lack broader contextual information about the meteorological conditions under which rainfall develops. This paper investigates whether lightweight temporal context can improve radar-based nowcasting, particularly for high-intensity rainfall. We propose the Time-Aware Small-Attention U-Net (TA-SmaAt-UNet), which extends the core SmaAt-UNet model with temporal conditioning layers that use cyclical encodings of time-of-day and time-of-year to modulate intermediate feature representations. Experiments on KNMI radar precipitation data show that temporal conditioning is most beneficial for rare, high-intensity precipitation events, while also improving the representation of seasonal variability and predicted rainfall-intensity distributions. A layer conductance analysis further indicates that the added temporal conditioning layers are actively used by the model despite their small parameter cost. These findings suggest that simple, physically motivated temporal context can improve the realism and reliability of deep learning-based precipitation nowcasts. The implementation of our models and training setup is available on \href{https://github.com/gijsvn/TA-SmaAt-UNet}{GitHub}.

2606.10080 2026-06-10 cs.LG cs.AI q-bio.QM 新提交

VFUSE: Virulent Feature Understanding with Sparse autoEncoders

VFUSE: 基于稀疏自编码器的毒力特征理解

Michael Yu, Matthew L. Olson

AI总结 提出VFUSE方法,通过训练稀疏自编码器(SAE)分析扩散-Transformer模型激活,识别蛋白质设计中的危险特征,实现可解释性提升而不牺牲性能。

详情
AI中文摘要

生成模型在蛋白质设计等领域取得了显著进展,但这种能力也使得危险蛋白质的生成变得不透明。在这项工作中,我们引入了VFUSE(基于稀疏自编码器的毒力特征理解),这是一种机制可解释性方法,通过在扩散-Transformer激活上训练SAE来审计蛋白质模型中的危险感知特征。我们将VFUSE应用于RoseTTAFold3和RFDiffusion3,这些是流行的开源蛋白质折叠和合成模型。我们发现,对于某些模块,线性探针在SAE潜在空间中的拟合效果显著优于原始模型表示,从而在不牺牲模型性能的情况下提高了可解释性。此外,我们识别出SAE中的单语义特征,这些特征仅在危险设计上激活,AUROC高达0.84(q < 10^{-13})。据我们所知,这是首次在全原子扩散模型上训练SAE,也是首次对蛋白质设计模型进行特征级毒力审计,为安全且可解释的蛋白质设计铺平了道路。

英文摘要

Generative models have shown remarkable progress in a variety of domains such as protein design, but such power enables the opaque generation of hazardous proteins. In this work, we introduce VFUSE (Virulent Feature Understanding with Sparse autoEncoders), a mechanistic interpretability approach that trains SAEs on diffusion-transformer activations to audit protein models for hazard-aware features. We apply VFUSE to RoseTTAFold3 and RFDiffusion3, popular open-weight models for protein folding and synthesis. We find that for certain blocks, linear probes detect hazardous designs significantly better when fit in the SAE latent space over the original model's representations: improving interpretability without sacrificing model performance. Furthermore, we identify monosemantic features from the SAE that fire only on hazardous designs at up to AUROC $0.84$ ($q < 10^{-13}$). To our knowledge this is the first SAE trained on an all-atom diffusion model and the first feature-level virulence audit of a protein design model, paving the way towards safe and interpretable protein design.

2606.10084 2026-06-10 cs.LG cs.AI 新提交

Divide-and-Conquer Modeling for the CTF-4-Science Lorenz Benchmark

CTF-4-Science Lorenz基准的分治建模策略

Shundong Li

AI总结 提出分治建模策略,针对CTF-4-Science Lorenz基准的五个场景族分别设计模型,通过平滑去噪、NG-RC/NVAR预测、Lorenz过渡校正和参数前缀混合,以79.63分证明场景特定更新优于通用模型。

详情
AI中文摘要

本文针对CTF-4-Science Lorenz基准提出了一种分治建模策略,该基准通过十二个隐藏分数和五个场景族评估混沌系统预测:干净预测、噪声重建、噪声输入预测、少样本学习和参数泛化。最终系统不是强制一个模型类处理所有场景,而是将每个预测块与其任务组的评估行为相匹配。主要贡献包括:基于平滑的重建用于噪声全轨迹去噪;针对噪声长时间吸引子预测调优的NG-RC/NVAR模型;限制在敏感干净短时间前缀上的拟合Lorenz过渡校正;以及用于插值任务的参数前缀混合。最终系统得分为79.63,表明在混合混沌预测基准上,有界、场景特定的更新可以优于广泛的模型替换。

英文摘要

This work presents a divide-and-conquer modeling strategy for the CTF-4-Science Lorenz benchmark, which evaluates chaotic-system prediction across twelve hidden scores and five scenario families: clean forecasting, noisy reconstruction, noisy-input forecasting, few-shot learning, and parametric generalization. Rather than forcing one model class to handle all regimes, the final system matched each prediction block to the evaluation behavior of its task group. The main contributions are: smoothing-based reconstruction for noisy full-trajectory denoising; NG-RC/NVAR models tuned for noisy long-time attractor forecasting; a fitted Lorenz transition correction restricted to the sensitive clean short-time prefix; and a parametric prefix blend for the interpolation task. The resulting system with final public score of 79.63 shows that bounded, scenario-specific updates can outperform broad model replacement on mixed chaotic forecasting benchmarks.

2606.10219 2026-06-10 cs.LG cs.AI 新提交

Fast Exact Nearest-Neighbor Learning for High-Frequency Financial Time Series

高频金融时间序列的快速精确最近邻学习

Henry Han, Diane Li

AI总结 针对金融高频数据增长带来的实时性挑战,提出基于Mojo的SIMD k-d树方法,通过方差分裂、连续存储和编译时向量化距离计算,在保持精确输出的同时实现17.5-43.5倍加速,并支持期权定价模型训练数据量提升10倍。

Comments 15 pages 5 figures;

详情
AI中文摘要

随着股票、ETF、外汇、期权和高频交易数据量的激增,AI在金融领域的大规模效率变得至关重要。这种增长给成熟的金融AI系统带来了核心挑战:模型必须从更大的历史语料库中学习,同时满足交易、风险管理和衍生品定价中的实时延迟约束。我们以高频金融时间序列的精确最近邻学习为具体案例,展示基于Mojo的金融AI可以应对这一挑战。我们引入了一种Mojo SIMD k-d树,采用基于方差的划分、连续的扁平缓冲区存储和编译时向量化距离计算。我们还提供了运行时结果,表明在标准剪枝和实现成本假设下,对于固定股票、大样本量、中等维度的情况,Mojo SIMD k-d树渐近地优于Mojo SIMD暴力搜索和scikit-learn的k-d树。在x86和ARM64架构的八个金融数据集上(训练样本最多277K),该方法在x86上比scikit-learn的k-d树加速17.5-21.6倍,在ARM64股票/ETF数据集上比scikit-learn暴力搜索加速28.1-43.5倍,同时保持精确输出。除了最近邻推理,Mojo的编译执行使得基于Extra Trees的隐含波动率定价模型能够训练10倍以上的期权数据,将看跌期权IV RMSE降低8.0%。这些结果将Mojo定位为金融AI的可扩展、生产就绪栈,并为其他数据密集型领域的高效AI提供了有前景的基础。

英文摘要

AI efficiency at scale is becoming critical in finance as market data volumes surge across equities, ETFs, FX, options, and high-frequency trading streams. This growth creates a core challenge for mature financial AI systems: models must learn from larger historical corpora while still meeting real-time latency constraints in trading, risk management, and derivative pricing. We use exact nearest-neighbor learning for high-frequency financial time series as a concrete case study to show that Mojo-based financial AI can address this challenge. We introduce a Mojo SIMD k-d tree with variance-based splitting, contiguous flat-buffer storage, and compile-time vectorized distance computation. We also provide a runtime result showing that, under standard pruning and implementation-cost assumptions, the Mojo SIMD k-d tree asymptotically dominates Mojo SIMD brute force and scikit-learn's k-d tree in the fixed-stock, large-$n$, moderate-dimensional regime. Empirically, across eight financial datasets on x86 and ARM64 with up to 277K training samples, the method achieves 17.5--21.6$\times$ speedup over scikit-learn's k-d tree on x86 and 28.1--43.5$\times$ over scikit-learn brute force on ARM64 equity/ETF datasets, while preserving exact outputs. Beyond nearest-neighbor inference, Mojo's compiled execution enables an Extra Trees-based implied-volatility pricing model to train on $10\times$ more options data, reducing put-IV RMSE by 8.0\%. These results position Mojo as a scalable, production-ready stack for financial AI and a promising foundation for efficient AI in other data-intensive fields. \keywords{Financial AI \and AI Efficiency \and Mojo \and SIMD \and K-D Trees \and KNN \and High-Frequency Trading \and Financial Time Series \and Scaling}

2606.10227 2026-06-10 cs.LG 新提交

Spatiotemporal Graph Transformer for 3D Neighborhood Interaction and Quality Prediction in Metal Additive Manufacturing

时空图Transformer用于金属增材制造中的3D邻域交互与质量预测

Joyce Karen Pelaez, Siqi Zhang, Hoo Sang Ko

AI总结 提出一种时空图Transformer,通过加权网络表示和双注意力机制建模3D邻域交互,显著提升金属增材制造质量预测性能。

Comments Submitted to Journal of Intelligent Manufacturing, 23 pages, 10 figures, 2 tables

详情
AI中文摘要

金属增材制造能够制造复杂零件,但由于重复的逐层熔化、凝固和再加热在3D构建中引起的交互作用,实现一致的构建质量仍然具有挑战性。先进传感技术为收集实际制造过程的丰富观测数据以实现实时质量监控和控制提供了巨大机会。然而,现有方法通常难以表示多层交互并量化其对质量的贡献。在本文中,我们开发了一种新颖的时空图Transformer,用于建模3D邻域交互并学习其对金属增材制造构建质量的影响。具体来说,我们首先引入制造过程的加权网络表示,其中熔合位置被建模为节点,其空间和过程依赖关系被编码为边权重。这种表示还允许将多模态数据(例如几何设计、工艺设置和原位传感数据)集成到统一结构中,用于下游学习任务。在此网络基础上,我们进一步设计了一种双注意力图Transformer,它同时捕获节点内特征依赖和跨节点邻域交互,用于质量表示学习。实验结果表明,所提出的框架在表征过程-质量关系方面显著优于基于图像、序列和图的模型。更重要的是,跨层交互的纳入对于提高质量预测性能至关重要。该框架广泛适用于涉及网络建模和基于图的表示学习的其他任务。

英文摘要

Metal additive manufacturing enables the fabrication of complex parts, but achieving consistent build quality remains challenging due to interactions induced by repeated layer-wise melting, solidification, and reheating across the 3D build. Advanced sensing provide a great opportunity to collect rich observations of the actual manufacturing process for real-time quality monitoring and control. Yet, existing methods often have limited ability to represent multi-layer interactions and quantify their contributions to quality. In this paper, we develop a novel spatiotemporal graph transformer for modeling 3D neighborhood interactions and learn their effects on build quality in metal additive manufacturing. Specifically, we first introduce a weighted network representation of the manufacturing process, where fusing locations are modeled as nodes, and their spatial- and process-dependent relationships are encoded as edge weights. This representation also enables the integration of multimodal data (e.g., geometric design, process settings, and in-situ sensing data) into a unified structure for downstream learning tasks. Building on this network, we further design a dual-attention graph transformer that captures both within-node feature dependencies and cross-node neighborhood interactions for quality representation learning. Experimental results show that the proposed framework significantly outperforms image-based, sequence-based, and graph-based models in characterizing process-quality relationships. More importantly, the incorporation of cross-layer interactions is critical for improving quality prediction performance. This framework is broadly applicable to other tasks involving network modeling and graph-based representation learning.

2606.10243 2026-06-10 cs.LG 新提交

DUET -- Dual User Embedding Transformers for Offsite Conversion Prediction

DUET -- 用于站外转化预测的双用户嵌入变换器

Reazul Hasan Russel, Mingwei Tang, Rostam Shirani, Xinlong Liu, Navid Madani, Leo Ding, Yawen He, Xiangyu Wang, Mustafa Acar, Ashish Katiyar, Yuhai Li, Alan Yang, Metarya Ruparel, Derek Qiang Xu, Rupert Wu, Rui Yang, Liang Tao, Xinyi Zhao, Larry Zhang, Sri Reddy, Rob Malkin

AI总结 针对点击信号丰富但转化信号稀疏、延迟的问题,提出DUET框架,通过为点击和转化流分别预训练专用变换器编码器,生成互补嵌入,在服务延迟约束下提升站外转化率预测精度。

详情
AI中文摘要

站外转化率(OCVR)预测是计算推荐系统中一个重要的排序问题。该任务面临建模挑战:点击信号丰富且时间跨度短,而转化信号本质稀疏、延迟长且常无法归因。尽管存在这些统计差异,两种信号都必须为在严格服务延迟约束下运行的模型提供信息。先前的预训练方法使用单一、无差别的编码器统一应用于两个数据流。我们提出DUET(双用户嵌入变换器),该框架明确将用户行为数据划分为两个领域一致的流——点击和转化——并为每个流预训练专用变换器编码器,其架构针对各流的统计特征定制:密集点击流使用多层自注意力,稀疏转化流使用交错交叉和自注意力。生成的互补嵌入由下游排序器联合使用,而不超出服务延迟预算。评估显示,相对于最强基线,归一化熵(NE)降低高达0.38%,A/B测试显示OCVR预测精度持续提升。

英文摘要

Offsite conversion rate (OCVR) prediction is an important ranking problem in computational recommendation systems. This task presents a modeling challenge: click signals are abundant and exhibit short temporal horizons, whereas conversion signals are inherently sparse, long-delayed, and frequently unattributed. Despite these statistical disparities, both signal types must inform models that operate within strict serving-latency constraints. Prior pre-training approaches address this heterogeneity with a single, undifferentiated encoder applied uniformly across both data streams. We propose DUET (Dual User Embedding Transformers), a framework that explicitly partitions user behavioral data into two domain-coherent streams -- clicks and conversions -- and pre-trains dedicated transformer encoders with architectures tailored to each stream's statistical characteristics: multi-layer self-attention for the dense click stream and interleaved cross- and self-attention for the sparse conversion stream. The resulting complementary embeddings are jointly consumed by a downstream ranker without exceeding serving-latency budgets. Evaluation demonstrates up to 0.38% normalized entropy (NE) reduction relative to the strongest baseline, and A/B test shows consistent improvements in OCVR prediction accuracy.

2606.10393 2026-06-10 cs.LG cs.CE 新提交

Validation-Stage Combinatorial Fusion Analysis for Imbalanced Credit-Card Fraud Detection

面向不平衡信用卡欺诈检测的验证阶段组合融合分析

Xiao Han, Chenyu Wu

发表机构 * Xiao Han and Chenyu Wu contributed equally to this work.(小韩和吴晨宇共同完成了本工作。)

AI总结 针对信用卡欺诈检测中数据不平衡问题,提出在验证阶段使用组合融合分析(CFA)选择互补模型子集并赋予多样性权重,在IEEE-CIS基准上AUC-ROC达0.9405。

详情
AI中文摘要

信用卡欺诈检测因欺诈交易稀少、成本高且分布不均而困难。强梯度提升树模型在结构化交易数据上已表现良好,因此另一种融合方法的价值并不明显。本文研究组合融合分析(CFA)——通过搜索模型子集和排名得分融合规则——是否能在IEEE-CIS欺诈检测基准上增加价值。使用无泄漏的60/20/20训练/验证/测试协议,我们评估了由七个基分类器构建的480种融合配置。最佳测试集结果来自随机森林、XGBoost和LightGBM的多样性加权得分融合(DEF WtScore),AUC-ROC = 0.9405,AUPRC = 0.6699,F1 = 0.6373。来自1000次重抽样的Bootstrap置信区间显示,对于所有三个指标,相对于最强单一模型的增益均排除零。CFA在AUC-ROC上与软投票持平,提高了AUPRC和F1,并在该设置下优于堆叠。CTGAN增强实验给出了负面结果:合成欺诈样本降低了单个模型和CFA的性能。总体而言,CFA在此处最有用的不是作为组合所有分类器的方法,而是作为验证阶段的方法,用于选择小的、互补的子集并分配多样性感知的权重。

英文摘要

Credit-card fraud detection is difficult because fraudulent transactions are rare, costly, and unevenly distributed. Strong gradient-boosted tree models already perform well on structured transaction data, so the value of another fusion method is not obvious. This paper examines whether Combinatorial Fusion Analysis (CFA), which searches over model subsets and rank-score fusion rules, can still add value on the IEEE-CIS Fraud Detection benchmark. Using a leakage-free 60/20/20 train/validation/test protocol, we evaluate 480 fusion configurations built from seven base classifiers. The best test-set result comes from diversity-weighted score fusion of Random Forest, XGBoost, and LightGBM (DEF WtScore), with AUC-ROC = 0.9405, AUPRC = 0.6699, and F1 = 0.6373. Bootstrap confidence intervals from 1,000 resamples show that the gains over the strongest single model exclude zero for all three metrics. CFA matches soft voting on AUC-ROC, improves AUPRC and F1, and outperforms stacking in this setting. A CTGAN augmentation experiment gives a negative result: synthetic fraud samples degrade both individual models and CFA. Overall, CFA is most useful here not as a way to combine every classifier, but as a validation-stage method for choosing a small, complementary subset and assigning diversity-aware weights.

2606.10410 2026-06-10 cs.LG eess.SP q-bio.QM 新提交

A Comprehensive Inference-Time Augmentation Framework in Physiological Signals: Application to PPG-Based AF Detection

生理信号中的综合推理时增强框架:应用于基于PPG的房颤检测

Davood Fattahi, Runze Yan, Saurabh Kataria, Zhaoliang Chen, Xiao Hu

发表机构 * Nell Hodgson Woodruff School of Nursing, Emory University(埃默里大学护理学院) Department of Computer Science, Emory University(埃默里大学计算机科学系) Wallace H. Coulter Department of Biomedical Engineering, Georgia Institute of Technology(佐治亚理工学院生物医学工程系) Department of Biomedical Informatics, Emory University School of Medicine(埃默里大学医学学院生物医学信息学系)

AI总结 提出一个包含13种增强方法的统一推理时增强框架,通过贝叶斯优化超参数,在PPG房颤检测任务中显著提升AUROC和AUPRC,降低假阳性率。

Comments 22 pages, 11 figures, 4 tables. Under review at Physiological Measurement

详情
AI中文摘要

目标:在真实部署中,生理信号的准确分类面临传感器噪声、运动伪影以及训练数据与部署数据之间分布偏移的挑战。推理时增强(ITA)在推理过程中应用增强而非重新训练,提供了一种简单、模型无关的机制来提高鲁棒性。然而,ITA在生理信号中的应用范围仍然狭窄,依赖于有限的增强方法和固定的未优化参数。本文提出一个统一的ITA框架以解决这一差距。方法:该框架包含13种增强方法,涵盖时域、幅值域、频域和伪影注入变换,并通过贝叶斯优化优化超参数。我们使用GPT-PPG和ResNet在五个数据集(包含400多名患者和约9,800小时记录)上评估基于30秒PPG信号的房颤(AF)检测。主要结果:标准ITA持续改善了AUROC(GPT-PPG最高提升8.5%,ResNet最高提升0.7%)和AUPRC(GPT-PPG最高提升10.6%,ResNet最高提升0.8%)。选择性ITA进一步将非AF数据集上的平均FPR降低了高达4.4%(GPT-PPG)和1.3%(ResNet)。意义:这些发现确立了ITA作为一种实用的、模型无关的方法,用于在无法重新训练的部署环境中提高基于PPG的房颤分类可靠性,并具有更广泛的生理信号分析适用性。

英文摘要

Objective: Accurate classification of physiological signals in real-world deployments is challenged by sensor noise, motion artifacts, and distribution shifts between training and deployment data. Inference-time augmentation (ITA), which applies augmentations during inference rather than retraining, offers a simple, model-agnostic mechanism to improve robustness. However, ITA application to physiological signals has remained narrow in scope, relying on limited augmentation methods with fixed, unoptimized parameters. This work proposes a unified ITA framework to address that gap. Approach: The framework incorporates 13 augmentation methods spanning time-domain, amplitude-domain, frequency-domain, and artifact-injection transformations, with hyperparameters optimized via Bayesian optimization. We evaluate on atrial fibrillation (AF) detection from 30-second PPG signals using GPT-PPG and ResNet across five datasets comprising more than 400 patients and ${\sim}$9,800 hours of recording. Main results: Standard ITA consistently improved AUROC (up to 8.5% for GPT-PPG and 0.7% for ResNet) and AUPRC (up to 10.6% for GPT-PPG and 0.8% for ResNet). Selective ITA further reduced average FPR by up to 4.4% (GPT-PPG) and 1.3% (ResNet) on non-AF datasets. Significance: These findings establish ITA as a practical, model-agnostic approach for improving PPG-based AF classification reliability in deployment settings where retraining is not feasible, with broader applicability to physiological signal analysis.

2606.10487 2026-06-10 cs.LG cs.AI 新提交

Stop Early, Spend Less: Hidden-State Probes as a Practical Recipe for Streaming Moderation of LLM Outputs

早停早省:隐藏状态探针作为LLM输出流式审核的实用方案

Huizhen Shu, Xuying Li, Piao Xue

发表机构 * ModelOneAI yunshanai(云山AI)

AI总结 提出基于隐藏状态的轻量级词元级探针,在解码循环中实时检测不安全输出,无需额外前向传播,实现亚毫秒级安全审核,可提前中断或修改生成。

Comments Technical Report. 14 pages, 3 figures, 4 tables

详情
AI中文摘要

在面向用户的系统中部署大型语言模型需要高效的输出安全过滤。现有方法通常依赖于生成后应用的单独审核模型,这会使推理成本翻倍,并且仅在生成完成后检测违规。我们观察到审核所需的信号已经存在于模型隐藏状态中。基于此,我们训练了轻量级的词元级探针,直接操作内部激活,生成每个词元的安全分数,这些分数可以聚合用于离线评估和在线干预。该探针重用生成器的激活,无需额外的前向传播,从而在解码循环内实现亚毫秒级的逐词元安全检查。应用于单个中间层的探针可以恢复强防护模型的大部分决策,作为一个低成本替代方案,优化延迟而非准确性。在流式设置中,它可以在不安全输出完全生成之前暂停或修改它们,用连续的词元级监控取代序列结束时的审核。与事后和流式防护模型相比,我们的方法实现了数量级的计算开销降低,且延迟成本最小。我们还提供了一个实用的部署方案,包括层选择、聚合策略、探测频率和触发阈值。最后,我们展示了探针的线性分量对应于残差空间中的一个方向,从而能够以可忽略的成本实现检测和激活引导。

英文摘要

Deploying large language models in user-facing systems requires efficient output safety filtering. Existing approaches typically rely on a separate moderation model applied after generation, which doubles inference cost and only detects violations after generation completes. We observe that the signal needed for moderation is already present in the model hidden states. Based on this, we train lightweight token-level probes that operate directly on internal activations, producing per-token safety scores that can be aggregated for both offline evaluation and online intervention. The probe reuses activations from the generator and requires no additional forward pass, enabling sub millisecond per-token safety checks inside the decoding loop. A probe applied to a single mid layer recovers most decisions of a strong guard model, acting as a low cost surrogate optimized for latency rather than accuracy. In streaming settings, it can halt or modify unsafe outputs before they are fully generated, replacing end of sequence moderation with continuous token level monitoring. Compared to post hoc and streaming guard models, our method achieves orders of magnitude lower compute overhead with minimal latency cost. We also provide a practical deployment recipe, including layer selection, aggregation strategy, probing frequency, and triggering thresholds. Finally, we show that the probe linear component corresponds to a direction in residual space, enabling both detection and activation steering at negligible cost.

2606.10530 2026-06-10 cs.LG cs.AI 新提交

Machine Learning Methods for Studying Latent Neural Activity Dynamics

研究潜在神经活动动力学的机器学习方法

Shufeng Kong, Fumei Deng, Xinyi Dong, Caihua Liu, Weiwei Chen, Yingheng Wang, Daniel Cao, Azahara Oliva, Antonio Fernandez-Ruiz, Carla Gomes

发表机构 * School of Software Engineering, Sun Yat-sen University(中山大学软件工程学院) Department of Computer Science, Cornell University(康奈尔大学计算机科学系) Department of Neurobiology and Behavior, Cornell University(康奈尔大学神经生物学与行为学系) Department of Ecology and Evolutionary Biology, Cornell University(康奈尔大学生态学与进化生物学系) School of Computer Science and Artificial Intelligence, Foshan University(佛山大学计算机科学与人工智能学院)

AI总结 综述从状态空间模型到深度生成模型的潜在变量模型,涵盖单区域动力学、多区域通信和行为对齐建模,并讨论大规模神经基础模型及未来挑战。

Comments Accepted by IJCAI 2026 survey track

详情
AI中文摘要

脑记录的最新发展推动了对能够解码大量神经元潜在结构的机器学习工具的需求。本文提供了全面的综述,概述了潜在变量模型(LVM)从早期状态空间模型到最近深度生成模型的轨迹。我们将文献组织为三个密切相关的领域:(1)单区域潜在动力学,包括从线性动力系统到由循环神经网络(RNN)和神经常微分方程(ODE)表示的更复杂动力学模型;(2)多区域通信,采用概率和子空间方法研究信息如何在不同脑区之间传递,考虑突触传播延迟和网络连接;(3)行为对齐建模,旨在通过监督或对比学习将与任务表现相关的神经活动与其他内部状态分离。本综述还包括大规模神经基础模型,如Transformer和扩散模型,它们依赖大规模预训练以实现跨主体的最佳性能。最后,我们总结并讨论基准、评估标准和开放挑战,如识别因果联系或通信方向的能力,以促进弥合可解释脑动力学与可靠神经解码之间的未来研究。

英文摘要

Recent developments in brain recording are driving a demand for machine learning tools capable of decoding the latent structure of large populations of neurons. In this paper, we provide a comprehensive survey that outlines the trajectory of Latent Variable Models (LVMs) from early state-space models to more recent deep generative models. We organize the literature into three closely related domains: (1) Single-Region Latent Dynamics, which includes models such as linear dynamical systems to more complex dynamics represented by Recurrent Neural Networks (RNNs) and Neural Ordinary Differential Equations (ODEs); (2) Multi-Region Communication, which employs probabilistic as well as subspace methods to study how information is transferred across different brain areas considering synaptic propagation delays and network connectivity; and (3) Behavior-Aligned Modeling, which seeks to disentangle neural activity related to task performance from other internal states via supervised or contrastive learning. This survey also includes large-scale neural foundation models, such as Transformers and diffusion models, that rely on large-scale pre-training for optimal performance across subjects. Finally, we conclude and discuss benchmarks, evaluation criteria, and open challenges, such as the ability to identify causal links or directionality of communication, to facilitate future research for bridging interpretable brain dynamics with reliable neural decoding.

2606.10583 2026-06-10 cs.LG cs.AI math.OC 新提交

NOVA: Symbolic Regression Discovery of Interpretable Car-Following and Lane-Change Models with Driver Heterogeneity

NOVA: 可解释的跟驰与换道模型及驾驶员异质性的符号回归发现

Ishak Abassi, Nassim Ali Bouazzouni, Farah Ibelaiden, Nadir Farhi

AI总结 提出NOVA符号回归框架,从原始轨迹数据自动发现可解释的跟驰与换道结构,在NGSIM数据集上优于基线,并揭示主导非线性项与心理物理理论关联。

详情
AI中文摘要

我们提出了NOVA,一个自主符号回归框架,能够从原始轨迹数据中识别出可解释的跟驰和换道结构,且仅需极少的先验行为假设。应用于来自NGSIM I-80和US-101数据集的4,765,788个活跃驾驶观测,NOVA的确定性Rust驱动搜索引擎评估了超过10,000个候选代数结构,并在前向平移滚动均值预测目标下识别出一个紧凑的两项加速度模型。在两种互补的预处理流程下评估,NOVA在意图预测基准上实现了RMSE = 1.376 m/s²(R² = 15.57%),在相同评估协议下,RMSE比最佳重新校准的符号回归基线(SR-LLM, PNAS 2025)低0.135 m/s²。在八个独立实验中,单个主导非线性项作为人类跟驰的稳健骨干出现;残差引导的扩展进一步将所选结构与已建立的碰撞避免心理物理理论联系起来。发现的特征算子在不同高速公路地点之间零样本迁移,R²损失低于3个百分点。扩展到多项logit框架内的换道建模,NOVA在502个未见驾驶员的严格车辆ID留出测试中实现了67.4%的平衡准确率,在三类问题上超过现有换道基线+29.8个百分点。

英文摘要

We present NOVA, an autonomous symbolic regression framework that identifies interpretable car-following and lane-change structures from raw trajectory data with minimal behavioral priors. Applied to 4,765,788 active driving observations from the NGSIM I-80 and US-101 datasets, NOVA's deterministic Rust-powered search engine evaluates over 10,000 candidate algebraic structures and identifies a compact two-term acceleration model under a forward-shifted rolling-mean prediction target. Evaluated under two complementary preprocessing pipelines, NOVA achieves $RMSE = 1.376 m/s^2$ ($R^2 = 15.57\%$) on the intent-forecasting benchmark, outperforming the best recalibrated symbolic-regression baseline (SR-LLM, PNAS~2025) by 0.135 m/s$^2$ in RMSE under an identical evaluation protocol. Across eight independent experiments, a single dominant nonlinear term emerges as a robust backbone of human car-following; a residual-guided extension further links the selected structure to an established psychophysical theory of collision avoidance. The discovered feature operators transfer zero-shot between freeway sites with under 3 pp $R^2$ loss. Extended to lane-change modelling within a multinomial logit framework, NOVA achieves 67.4\% balanced accuracy under strict vehicle-ID holdout on 502 unseen drivers, surpassing existing lane-changing baselines by +29.8 percentage points on a three-class problem.

2606.10718 2026-06-10 cs.LG cs.AI 新提交

Transformer Based Model for Spatiotemporal Feature Learning in EEG Emotion Recognition

基于Transformer的脑电情绪识别时空特征学习模型

Xinglong Cui, Dian Gu

发表机构 * Beijing Neurodeep Technology Co., Ltd(北京纽罗德普科技有限公司) University of Pennsylvania(宾夕法尼亚大学)

AI总结 提出EEG-TransNet架构,通过局部自注意力块和模糊注意力同步Transformer捕捉脑电信号的时空特征,在三个数据集上优于现有方法。

详情
AI中文摘要

脑电图(EEG)是一种广泛采用的监测大脑活动的技术,因其高时间分辨率和成本效益,为神经状态提供了有价值的见解。为了增强对复杂EEG数据的分析,我们提出了EEG-TransNet,一种旨在捕捉EEG信号的时间、区域和同步特征的架构。EEG-TransNet引入了三个关键模块:1)利用ResNet和基于小波去噪的预处理与特征提取模块,2)用于区域特征学习的局部自注意力块,以及3)用于建模时空依赖性的模糊注意力同步Transformer(FAST)。通过在三个EEG数据集(BETA、SEED和DepEEG)上的大量实验,所提出的模型在不同信号长度下的分类准确性和鲁棒性方面始终优于其他方法。消融研究证实了局部自注意力块在提高性能方面的贡献,并且解码器中引入深度可分离卷积降低了计算复杂度,同时保持了高准确性。EEG-TransNet在受试者间具有最小的性能变化,突显了其作为基于EEG的大脑活动分类和情绪识别任务的鲁棒工具的潜力。

英文摘要

Electroencephalography (EEG) is a widely adopted technique for monitoring brain activity, offering valuable insights into neurological states due to its high temporal resolution and cost-effectiveness. To enhance the analysis of complex EEG data, we propose EEG-TransNet, an architecture designed to capture temporal, regional, and synchronous features of EEG signals. EEG-TransNet introduces three key modules: 1) a preprocessing and feature extraction module leveraging ResNet and wavelet-based denoising, 2) a Local Self-Attention Block for regional feature learning, and 3) a Fuzzy-Attention Synchronous Transformer (FAST) to model spatiotemporal dependencies. Through extensive experiments on three EEG datasets (BETA, SEED, and DepEEG), the proposed model consistently outperforms other methods in terms of classification accuracy and robustness across varying signal lengths. Ablation studies confirm the contribution of the Local Self-Attention Block in improving performance, and the inclusion of depthwise separable convolutions in the decoder reduces computational complexity while maintaining high accuracy. EEG-TransNet's ability to generalize across subjects with minimal performance variation highlights its potential as a robust tool for EEG-based brain activity classification and emotion recognition tasks.

2606.10802 2026-06-10 cs.LG cs.AI 新提交

Boosting ECG Classification Performance by Pre-training with Synthesized Data

通过合成数据预训练提升心电图分类性能

Naoki Nonaka, Jun Seita

发表机构 * Advanced Data Science Project, RIKEN Information R&D and Strategy Headquarters(理化学研究所信息研发与战略总部先进数据科学项目)

AI总结 提出基于医学知识的高斯组合合成算法生成单导联II心电图数据,用于预训练深度神经网络,在四种异常分类中平均提升最高33.2%,尤其在小数据集场景下效果显著。

详情
AI中文摘要

深度神经网络通常需要大量数据集才能有效训练。在医学领域,由于隐私问题和某些疾病的罕见性,获取大规模数据往往具有挑战性。为了解决数据稀缺问题,我们研究了使用基于领域医学知识生成的合成数据训练深度神经网络模型的有效性。具体来说,我们针对单导联II心电图开发了一种知识驱动的高斯组合合成算法,其中每个心跳由高斯形状的P、Q、R、S和T波分量表示。使用该模拟器,我们为四种异常心电图类别生成合成数据:心房颤动、心房扑动、室性早搏和沃尔夫-帕金森-怀特综合征。我们通过使用十种不同的深度神经网络架构进行异常心电图分类来评估该合成数据的效用。结果表明,合成到真实的训练提高了四种目标异常中三种的分类性能,其中心房扑动观察到的最大架构平均增益为33.2%。进一步分析表明,合成数据带来的性能提升在真实数据集较小时更为明显。这些发现表明,基于领域知识的合成心电图可以作为有用的预训练资源,特别是在真实数据有限或难以获取的场景中。

英文摘要

Deep Neural Networks (DNNs) typically require extensive datasets for effective training. In the medical domain, acquiring large-scale data is often challenging due to privacy concerns and the rarity of certain diseases. To address this data scarcity, we investigate the efficacy of training DNN models using synthetic data, generated based on domain-specific medical knowledge. Specifically, we develop a knowledge-driven Gaussian-composition synthesis algorithm for single-lead II ECGs, in which each heartbeat is represented by Gaussian-shaped P, Q, R, S, and T wave components. Using this simulator, we generate synthetic data for four abnormal electrocardiogram (ECG) classes: atrial fibrillation (AF), atrial flutter (AFLT), premature ventricular complex (PVC), and Wolff-Parkinson-White Syndrome (WPW). We evaluate the utility of this synthetic data by conducting abnormal ECG classification using ten different DNN architectures. Our results demonstrate that synthetic-to-real training improves classification performance for three of the four target abnormalities, with the largest architecture-averaged gain of $33.2\%$ observed for AFLT. Further analysis reveals that the performance enhancement from synthetic data is more pronounced with smaller real-world datasets. These findings suggest that domain-knowledge-based synthetic ECGs can serve as a useful pre-training resource, particularly in scenarios where real-world data are limited or difficult to obtain.

2606.10868 2026-06-10 cs.LG astro-ph.IM 新提交

When Do Autoregressive Sequence Models Forecast Physical Wavefields? A Controlled Study on Synthetic Seismograms

自回归序列模型何时能预测物理波场?基于合成地震图的受控研究

Waleed Esmail, Stuart Russell, Jana Klinge, Alexander Kappes, Christine Thomas

AI总结 通过合成三分量地震图受控消融实验,发现多token预测是自回归波场滚动预测稳定的主要因素,并揭示上下文比率阈值和相位感知损失的关键作用。

Comments 16 pages, 5 figures and 3 tables

详情
AI中文摘要

长时程自回归预测振荡物理信号(如地震图、引力波应变及类似波场)受限于误差累积:当因果模型在数百步中不断接收自身输出时,微小的每步误差会复合为相位漂移,而逐点指标无法检测到这种漂移。我们以合成三分量地震图作为物理结构化的测试平台,以\ extsc{SeismoGPT}自回归预测器作为研究对象,探究这种滚动预测何时保持稳定。通过受控的架构内消融实验,在自由运行滚动预测上结合配对显著性检验进行评估,我们分离了每个设计选择的贡献。多token预测是主要的稳定因素,几乎贡献了相对于单token基线的全部改进(中位数NCC提升+0.040);地平线嵌入混合预测头和跨地平线STFT幅度相干性损失各自增加了微小但一致的额外增益。性能严重依赖于接近1的上下文比率阈值(大致为观测信号的完整P-S区间),低于该阈值时滚动泛化能力崩溃。主要的残余失败是极性反转,而基于幅度的频谱损失无法(按设计)对此进行惩罚,这表明相位感知目标自然成为下一步方向。我们将此定位为对振荡波场滚动稳定性的受控研究,而非预测架构的基准测试。

英文摘要

Long-horizon autoregressive forecasting of oscillatory physical signals, such as seismograms, gravitational-wave strain, and similar wavefields is limited by error accumulation: as a causal model is fed its own outputs over hundreds of steps, small per-step errors compound into phase drift that pointwise metrics fail to detect. We ask when such rollout stays stable, using synthetic three-component seismograms as a physically structured testbed and the \textsc{SeismoGPT} autoregressive forecaster as the model under study. Through controlled, intra-architecture ablations evaluated on free-running rollout with paired significance tests, we isolate the contribution of each design choice. Multi-token prediction is the dominant stabilizer, accounting for almost the entire improvement over a single-token baseline ($+0.040$ median NCC); a horizon-embedding hybrid prediction head and a cross-horizon STFT-magnitude coherence loss each add a small but consistent further gain. Performance depends sharply on a context-ratio threshold near one, roughly the full P-S interval of observed signal, below which rollout generalization collapses. The dominant residual failure is a polarity inversion that a magnitude-based spectral loss cannot, by construction, penalize, identifying phase-aware objectives as the natural next step. We frame this as a controlled study of rollout stability on oscillatory wavefields, not a benchmark of forecasting architectures.

2606.11017 2026-06-10 cs.LG eess.AS 新提交

Data-Driven Runway and Taxiway Exits Prediction of Landing Aircraft: A Case Study at Hartsfield-Jackson Atlanta International Airport

基于数据驱动的着陆飞机跑道和滑行道出口预测:以哈茨菲尔德-杰克逊亚特兰大国际机场为例

Alex Porcayo, Yutian Pang, Maria Thomas, John-Paul Clarke

发表机构 * Department of Aerospace Engineering and Engineering Mechanics, The University of Texas at Austin(德克萨斯大学奥斯汀分校航空航天工程与工程力学系)

AI总结 针对机场场面运行效率问题,提出两阶段数据驱动决策辅助模型,分别预测着陆飞机选择的跑道出口及是否穿越活跃起飞跑道,在亚特兰大机场数据上验证,XGBoost和LightGBM表现最优,进近速度是出口选择的关键因素。

详情
AI中文摘要

机场场面运行日益成为高吞吐量枢纽性能的瓶颈。本研究考察了哈茨菲尔德-杰克逊亚特兰大国际机场(KATL)的到达滑入决策,并提出了一种模拟管制员工作流程的两阶段数据驱动决策辅助工具。第一阶段预测到达飞机选择的跑道出口。第二阶段预测,在给定该出口的情况下,飞机是否会在指定点穿越活跃起飞跑道或使用绕行滑行道。模型使用ASDE-X场面轨迹、飞机特性、停机坪目的地、短期交通率和天气数据,跨越多个回顾时间窗口进行训练。我们基准测试了九种分类器,包括随机森林、XGBoost、LightGBM和CatBoost,并评估了准确率、宏F1、精确率-召回率行为、混淆矩阵、Brier分数和期望校准误差。在东西流向中,XGBoost和LightGBM优于随机森林。第一阶段达到0.86-0.89的准确率,宏F1分数为0.40-0.50;第二阶段达到0.70-0.74的准确率,宏F1分数为0.28-0.55。特征重要性分析表明,进近速度是出口选择的主要驱动因素。起飞率、穿越率、停机坪目的地,以及对于西流向,所选出口是预测穿越与绕行路径的最强预测因子。由于特征空间重叠,少数类仍然难以预测,如t-SNE和UMAP分析所示。所提出的框架通过校准的、可解释的预测支持管制员态势感知,同时保留人类对最终路由决策的责任。

英文摘要

Airport surface operations increasingly constrain performance at high-throughput hubs. This study examines arrival taxi-in decisions at Hartsfield-Jackson Atlanta International Airport (KATL) and proposes a two-stage, data-driven decision aid that mirrors controller workflow. Stage I predicts the runway exit selected by an arriving aircraft. Stage II predicts whether, given that exit, the aircraft will cross the active departure runway at a designated point or use the end-around taxiway. Models are trained using ASDE-X surface trajectories, aircraft characteristics, ramp destinations, short-horizon traffic rates, and weather across multiple look-back windows. We benchmark nine classifiers, including Random Forest, XGBoost, LightGBM, and CatBoost, and evaluate accuracy, macro-F1, precision-recall behavior, confusion matrices, Brier score, and Expected Calibration Error. Across east and west flows, XGBoost and LightGBM outperform Random Forest. Stage I achieves 0.86-0.89 accuracy with macro-F1 scores of 0.40-0.50, while Stage II achieves 0.70-0.74 accuracy with macro-F1 scores of 0.28-0.55. Feature-importance analysis shows that approach speed is the main driver of exit choice. Departure rate, crossing rate, ramp destination, and, for west flow, the selected exit are the strongest predictors of crossing versus end-around routing. Minority classes remain harder to predict because of feature-space overlap, as shown by t-SNE and UMAP analyses. The proposed framework supports controller situational awareness through calibrated, explainable predictions while preserving human responsibility for final routing decisions.

2606.11033 2026-06-10 cs.LG cs.AI cs.CL 新提交

AuRA: Internalizing Audio Understanding into LLMs as LoRA

AuRA: 将音频理解内化到LLM中作为LoRA

Bo Cheng, Lei Shi, Zhanyu Ma, Yuan Wu, Jun Xu, Jiuchong Gao, Jinghua Hao, Renqing He

发表机构 * Meituan(美团) Jilin University(吉林大学)

AI总结 提出AuRA方法,通过层间蒸馏将ASR编码器的语音表示内化到LoRA适配的LLM中,实现紧耦合的语音-语言联合建模和高效并行端到端推理,在多个基准上优于级联系统和现有适应方法。

详情
AI中文摘要

最近将大语言模型(LLM)扩展到语音输入的努力通常依赖于级联的ASR-LLM流水线、端到端语音-语言模型或基于桥接/蒸馏的适应方法。虽然这些路线分别重用了强大的预训练组件、实现了原生语音-语言交互或提供了轻量级适应,但它们常常遭受转录-接口延迟、昂贵的多模态训练或顺序语音-语言耦合的问题。为了解决这些限制,我们提出了AuRA,一种将音频编码能力蒸馏到LLM中的方法。具体来说,AuRA通过一个轻量级音频嵌入层将相同的语音输入馈送到ASR编码器(作为教师)和LoRA适配的LLM(作为学生),并使用逐层蒸馏将学生的隐藏状态与相应的教师表示对齐,从而将语音表示内化到轻量级的LLM侧适应中。与级联和串行桥接方法相比,AuRA实现了更紧密的语音-语言联合建模和高效的并行端到端推理,同时重用了预训练的语音和语言模型,而不需要大规模的多模态训练。在多个语音-语言基准上,AuRA在有效性和效率方面始终优于级联系统、语音到LLM适应基线以及大规模语音-语言和多模态模型。

英文摘要

Recent efforts to extend large language models (LLMs) to speech inputs typically rely on cascaded ASR-LLM pipelines, end-to-end speech-language models, or bridge/distillation-based adaptation. While these routes respectively reuse strong pretrained components, enable native speech-language interaction, or offer lightweight adaptation, they often suffer from transcript-interface latency, costly multimodal training, or sequential speech-language coupling. To address these limitations, we present AuRA, a method that distills audio encoding capability into the LLM. Specifically, AuRA feeds the same speech input to an ASR encoder (as a teacher) and a LoRA-adapted LLM (as a student) through a lightweight audio embedding layer, and uses layer-wise distillation to align the student's hidden states with corresponding teacher representations, thereby internalizing speech representations into lightweight LLM-side adaptations. Compared with cascaded and serial bridge methods, AuRA enables tighter speech-language joint modeling and efficient parallel end-to-end inference, while also reusing pretrained speech and language models rather than requiring large-scale multimodal training. On multiple speech-language benchmarks, AuRA consistently outperforms cascaded systems, speech-to-LLM adaptation baselines, and large-scale speech-language and multimodal models in both effectiveness and efficiency.

2606.11057 2026-06-10 cs.LG q-bio.BM stat.ML 新提交

Flexible Kernels for Protein Property Prediction

用于蛋白质性质预测的灵活核函数

Martin Jankowiak, Yerdos Ordabayev, Rudraksh Tuwani, Henry N. Ward, Hunter Nisonoff, James M. McFarland, Gevorg Grigoryan

发表机构 * University of Cambridge(剑桥大学)

AI总结 提出利用进化替代矩阵和局部线性性的序列核函数,结合高斯过程实现数据高效的蛋白质性质预测,并融入结构信息进行多任务学习。

Comments 50 pages; to appear at ICML 2026

详情
AI中文摘要

尽管对蛋白质设计应用至关重要,但从稀疏实验数据预测蛋白质性质(如结合亲和力和热稳定性)仍然是一个重大挑战。因此,我们引入了一类序列核函数,利用进化替代矩阵以及局部线性性,并证明由此产生的高斯过程为蛋白质性质景观提供了数据高效的模型,通常优于依赖基础模型嵌入的替代方法。此外,通过学习实际上是结构感知的替代矩阵,我们展示了我们的核函数可以轻松地整合来自基础模型的结构信息。我们证明了这些结构条件核函数非常适合跨多个蛋白质性质景观的多任务学习,并且可以显著优于局部监督学习方法。

英文摘要

Despite its importance to applications in protein design, predicting protein properties like binding affinity and thermostability from sparse experimental data remains a significant challenge. Accordingly, we introduce a class of sequence kernels that exploit evolutionary substitution matrices as well as local linearity and demonstrate that the resulting Gaussian processes provide data-efficient models of protein property landscapes, frequently outperforming alternatives that rely on foundation model embeddings. Furthermore--by learning what are in effect structure-aware substitution matrices--we show that our kernels can readily incorporate structural information from foundation models. We demonstrate that these structure-conditioned kernels are well suited to multi-task learning across multiple protein property landscapes and can decisively outperform local supervised learning methods.

2606.11066 2026-06-10 cs.LG q-bio.NC 新提交

GRAFT: Gain-Recalibrated Adapters for Transformer-Based Neural Population Activity Modeling

GRAFT: 基于Transformer的神经群体活动建模中的增益重校准适配器

Xiangsheng Ge, Yang Xie

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出GRAFT模型,通过分离可重用时间动态与可重校准神经元接口,在MC Maze数据集上达到SOTA,并仅更新9.21%参数实现跨天重校准。

详情
AI中文摘要

神经群体活动模型可以从分箱的尖峰信号中恢复丰富的时间结构,但其读入和读出层通常与固定的记录神经元集合绑定。这种耦合限制了在长期脑机接口中的重用,因为记录神经元的身份、数量和响应统计可能每天变化。我们引入了GRAFT,一种基于Transformer的神经群体活动模型,它将可重用时间动态与可重校准的神经元接口分离。神经元接口控制记录神经元如何进入和离开共享骨干网络,辅助增益和位置机制支持Transformer内部的神经活动建模。在标准NLB'21协议下的MC Maze上,GRAFT作为集成模型达到0.3866 co-bps,在公共和报告的NLB'21结果中,在主要co-bps指标上创造了新的最先进水平。在从NLB'21 MC Maze数据集系列构建的跨天协议中,GRAFT通过仅更新9.21%的参数,从MC Maze重校准到缩放后的MC Maze数据集(Large/Medium/Small),在受限的目标天支持集下分别达到0.3749、0.3112和0.3152 co-bps。这些结果表明,相同的接口-骨干分离既支持强大的基于Transformer的神经群体活动建模,也支持数据高效的跨天重校准。

英文摘要

Neural population activity models can recover rich temporal structure from binned spikes, but their read-in and readout layers often remain tied to a fixed set of recorded neurons. This coupling limits reuse in long-term brain-computer interfaces, where recorded neuron identities, counts, and response statistics can change across days. We introduce GRAFT, a Transformer-based neural population activity model that separates reusable temporal dynamics from a recalibratable neuron interface. The neuron interface controls how recorded neurons enter and leave the shared backbone, and auxiliary gain and positional mechanisms support neural activity modeling inside the Transformer. On MC Maze under the standard NLB'21 protocol, GRAFT reaches 0.3866 co-bps as an ensemble, setting a new state of the art on the primary co-bps metric among public and reported NLB'21 results. In a cross-day protocol constructed from the NLB'21 MC Maze dataset series, GRAFT recalibrates from MC Maze to the scaled MC Maze datasets (Large/Medium/Small) by updating only 9.21% of parameters, reaching 0.3749, 0.3112, and 0.3152 co-bps with restricted target-day support sets. These results show that the same interface-backbone separation supports both strong Transformer-based neural population activity modeling and data-efficient cross-day recalibration.

2606.09852 2026-06-10 cs.HC cs.AI cs.CL cs.LG cs.MA cs.SE 交叉投稿

LLM-Based Code Documentation Generation and Multi-Judge Evaluation

基于LLM的代码文档生成与多裁判评估

Ikbel Ghrab, Mohamed Dhieb, Ismail Khenissi, Ines Abdeljaoued-Tej

发表机构 * University of Tunis El Manar(突尼斯国家理工大学)

AI总结 提出利用八种大语言模型自动生成代码文档,并通过多裁判评估框架(四个LLM从九个维度评分)提升文档质量,在医学物理库上实验显示最佳与最差模型性能差距达42%。

Comments ICAHS, \c{opyright} 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works

详情
Journal ref
Conference ICAHS IEEE, 2025
AI中文摘要

高质量的源代码文档至关重要但往往被忽视,尤其是在医疗保健等关键领域,可靠性和可维护性至关重要。我们提出了一个AI驱动的框架,利用八种最先进的大语言模型(包括GPT、Gemini、Qwen和LLaMA变体)自动从代码和仓库生成文档。该系统基于PocketFlow编排框架,采用模块化流水线和高级提示工程,生成结构化、上下文感知的文档。为确保质量并指导模型选择,我们引入了MultiLLMasJudges评估框架,其中四个独立的LLM从九个标准(如完整性、清晰度和忠实度)评估输出。在开源医学物理库上进行的实验表明,最佳和最差模型之间的性能差距为42%。通过结合多样化的模型输出、优化的提示和严格的评估,我们的方法提高了文档质量并减少了人工工作量,特别是在安全关键的医疗软件中。

英文摘要

High-quality source code documentation is vital yet often neglected, especially in critical domains like healthcare where reliability and maintainability are essential. We presented an AI powered framework that automates documentation generation from code and repositories using eight state of the art Large Language Models (LLMs), including GPT, Gemini, Qwen, and LLaMA variants. Built on the PocketFlow orchestration framework, the system applies modular pipelines and advanced prompt engineering to produce structured, context aware documentation. To ensure quality and guide model selection, we introduced a MultiLLMasJudges evaluation framework, where four independent LLMs assess outputs across nine criteria, such as Completeness, Clarity, and Faithfulness. Experiments conducted on an open-source medical physics library, demonstrated showed a 42% performance gap between top and bottom models. By combining diverse model outputs, optimized prompting, and rigorous evaluation, our approach enhances documentation quality and reduces manual effort, especially in safety critical healthcare software.

2606.09896 2026-06-10 cs.GT cs.AI cs.LG 交叉投稿

HMAF: A Hierarchical Multi-Slot GD-RTB Allocation Framework

HMAF:一种分层多槽GD-RTB分配框架

Tianxing Bu, Zhaoqi Zhang, Linyou Cai, Miao Xie, Shengri Xue, Tan Qu, Qianlong Xie, Xingxing Wang, Siqiang Luo, Gao Cong

发表机构 * Meituan(美团) Nanyang Technological University(南洋理工大学) China Agricultural University(中国农业大学)

AI总结 针对GD与RTB共存广告平台中短期收益与长期交付的平衡难题,提出分层多槽分配框架HMAF,采用计划-校准-执行范式,集成离线约束优化与在线决策,在美团实现GD交付率提升3.72%、广告总收入提升1.59%。

Comments Accepted by KDD 2026 Applied Data Science Track

详情
AI中文摘要

在现代在线广告平台中,保量交付(GD)合约与实时竞价(RTB)拍卖共存并相互竞价。现有方法要么将GD和RTB优化解耦,要么依赖启发式优先级规则,因此在复杂多槽交付和曝光约束下,难以有效平衡短期收入最大化与长期合约交付。为应对这些挑战,我们提出HMAF(分层多槽分配框架),一个旨在优化GD-RTB广告平台中曝光分配的统一框架。HMAF采用计划-校准-执行范式作为其核心结构,整合离线约束优化与在线决策,平衡离线GD资源规划、动态校准GD-RTB竞争力,并在多槽环境中做出实时列表级排名决策。HMAF已在全球最大在线食品配送平台之一美团的多项营销场景中实施,使GD交付率提升3.72%,广告总收入提升1.59%。

英文摘要

In modern online advertising platforms, Guaranteed Delivery (GD) contracts coexist and bid with Real-Time Bidding (RTB) auctions. Recent approaches either decouple GD and RTB optimization or rely on heuristic priority rules, and thus fail to effectively balance short-term revenue maximization with long-term contract delivery under complex multi-slot delivery and impression constraints. To address these challenges, we propose HMAF (Hierarchical Multi-Slot Allocation Framework), a unified framework designed to optimize impression allocation in GD--RTB advertising platforms. HMAF employs the Plan--Calibrate--Execute paradigm as its core structure, and integrates offline constraint optimization with online decision-making, balancing offline GD resource planning, dynamically calibrating GD--RTB competitiveness, and making real-time listwise rank decisions across multi-slot environments. HMAF has been implemented in multiple marketing scenarios at Meituan, one of the world's largest online food delivery platforms, leading to a 3.72% increase in GD delivery rate and a 1.59% increase in total advertisement revenue.

2606.09953 2026-06-10 eess.IV cs.AI cs.LG 交叉投稿

Deep Slice Interpolation for Reducing Through-Plane Anisotropy and Noise in Head CT

深度切片插值用于减少头部CT的穿平面各向异性和噪声

Luis Cortés Ferre, Miguel A. Gutiérrez-Naranjo, Marcin Balcerzyk

发表机构 * Department of Computer Science and Artificial Intelligence, University of Seville(塞维利亚大学计算机科学与人工智能系) Bioaraba Health Research Institute(Bioaraba健康研究 institute) IKERBASQUE, Basque Foundation of Science(巴斯克科学基金会)

AI总结 提出一种深度学习系统,通过相邻轴向切片对合成中间CT切片,将有效穿平面间距减半,同时实现隐式降噪,在结构指标上优于经典插值和视频帧插值方法。

详情
AI中文摘要

头部计算机断层扫描(CT)通常使用亚毫米级的面内分辨率,但穿平面间距为2-5毫米,造成显著的各向异性,这会降低多平面重建、血肿体积估计等体积测量以及假设近似各向同性体素的后续算法的性能。我们提出一个深度学习系统,从相邻轴向切片对合成中间CT切片,将有效穿平面间距减半。该系统改善三维可视化,同时产生固有降噪的输出,在一次推理中实现两个互补优势。为构建可靠系统,我们系统评估像素级损失(均方误差MSE和平均绝对误差L1)、结构相似性损失(结构相似性指数SSIM及其多尺度变体MS-SSIM)以及混合组合。在保留测试集上,所有收敛模型在所有结构指标上均优于经典插值基线和预训练视频帧插值方法(RIFE、FILM),其中MS-SSIM+L1提供最强平衡性能。我们还记录了SSIM族损失中的训练不稳定性并识别部分补救措施:标准数值修复消除了主要失败模式,但在较小批量大小下留下残余发散。所有结果均报告患者级自助法置信区间和配对统计检验。作为示例,我们将系统应用于来自Virgen del Rocío大学医院的非分布头部CT序列:模型合成中间切片,并在真实切片上表现出我们理论分析预测的隐式降噪特征,支持在单个外部病例中插值质量和隐式降噪不局限于训练分布。

英文摘要

Head computed tomography (CT) typically uses sub-millimeter in-plane resolution but 2-5 mm through-plane spacing, creating substantial anisotropy that degrades multiplanar reconstructions, volumetric measurements such as hematoma volume estimation, and downstream algorithms that assume near-isotropic voxels. We present a deep learning system that synthesizes intermediate CT slices from pairs of neighboring axial slices, halving the effective through-plane spacing. The system improves three-dimensional visualization while simultaneously producing inherently denoised outputs, yielding two complementary benefits from a single inference pass. To build a reliable system, we systematically evaluate pixel-wise losses, namely mean squared error (MSE) and mean absolute error (L1); structural-similarity losses, namely the structural similarity index (SSIM) and its multi-scale variant (MS-SSIM); and hybrid combinations. On a held-out test set, all converged models outperform classical interpolation baselines and pretrained video frame interpolation methods (RIFE, FILM) on all structural measures, with MS-SSIM+L1 offering the strongest balanced profile. We also document training instability in SSIM-family losses and identify partial remedies: the standard numerical fixes eliminate the dominant failure mode but leave residual divergence at smaller batch sizes. All results are reported with patient-level bootstrap confidence intervals and paired statistical tests. As an illustration, we apply the system to an out-of-distribution head CT series from Hospital Universitario Virgen del Rocío: the model synthesizes intermediate slices and exhibits on the real slices the implicit-denoising signature predicted by our theoretical analysis, supporting in a single external case that interpolation quality and implicit denoising are not confined to the training distribution.

2606.10025 2026-06-10 cs.RO cs.CV cs.LG 交叉投稿

GHOST: Hierarchical Sub-Goal Policies for Generalizing Robot Manipulation

GHOST: 用于泛化机器人操作的层次化子目标策略

Sriram Krishna, Ben Eisner, Haotian Zhan, Ying Yuan, Haoyu Zhen, Chuang Gan, Shubham Tulsiani, David Held

AI总结 提出GHOST框架,通过将控制分解为高层子目标预测和低层目标条件控制器,实现视觉运动操作策略的泛化,并利用人类演示适应新物体和任务变化。

Comments Accepted at RSS 2026

详情
AI中文摘要

我们提出了GHOST,一个学习视觉运动操作策略的框架,该策略能够泛化到训练分布之外。GHOST将控制分解为:(i) 高层策略,从多视角RGB-D观测中预测下一个子目标作为3D末端执行器位姿的分布,以及(ii) 低层目标条件控制器,执行特定于具体体的动作。为了将基于图像的策略条件化于3D目标,我们引入了一个简单的空间接口,将预测的目标投影到图像平面,并将其表示为末端执行器热图。在一系列操作任务中,与平坦的扩散策略相比,这种层次化分解持续提高了性能和鲁棒性。此外,我们展示了这种层次化接口也使得整合人类演示变得容易,而无需依赖(嘈杂的)动作重定向。由于子目标在很大程度上与具体体无关,我们在人类视频上训练高层策略,以指定如何应用和组合学到的技能,同时保持低层策略仅在机器人数据上训练。这种层次结构使得能够使用少量人类演示适应新物体和任务变化。

英文摘要

We present GHOST, a framework for learning visuomotor manipulation policies that generalize beyond the training distribution. GHOST factorizes control into (i) a high-level policy that predicts the next sub-goal as a distribution over 3D end-effector poses from multi-view RGB-D observations, and (ii) a low-level goal-conditioned controller that executes embodiment-specific actions. To condition image-based policies on 3D goals, we introduce a simple spatial interface that projects predicted goals into the image plane and represents them as end-effector heatmaps. Across a suite of manipulation tasks, this hierarchical factorization consistently improves performance and robustness compared to a flat Diffusion Policy. Further, we show that this hierarchical interface also makes it easy to incorporate human demonstrations without relying on (noisy) action retargeting. As sub-goals are largely embodiment-agnostic, we train the high-level policy on human video to specify how learned skills should be applied and composed, while keeping the low-level policy trained purely on robot data. This hierarchy enables adaptation to novel objects and task variations using a small number of human demonstrations.

2606.10112 2026-06-10 cs.GT cs.AI cs.LG econ.TH 交叉投稿

Duality for Optimal Multi-Item, Multi-Bidder Auction Design: Revenue Certificates through Deep Learning

最优多物品多竞拍者拍卖设计的对偶性:通过深度学习的收入证书

Yanchen Jiang, David C. Parkes, Tonghan Wang

发表机构 * Harvard University(哈佛大学) College of AI, Tsinghua University(清华大学人工智能学院)

AI总结 提出首个直接处理多物品多竞拍者拍卖对偶问题的计算框架,通过神经网络参数化拉格朗日乘子并引入提升技术,生成可证明的收入上界,为连续类型提供近最优性证书。

详情
AI中文摘要

刻画多物品、多竞拍者设置下的收入最优拍卖仍然是一个基本开放问题,除了限制性的二元类型实例外,没有已知的闭式解。这激发了人们对最优拍卖设计的计算方法的兴趣。在本文中,我们引入了第一个直接处理多物品、多竞拍者拍卖和占优策略激励相容(DSIC)的对偶问题的计算框架,生成有证书的收入上界。我们的方法使用神经网络参数化具有结构保证的严格流量守恒性质的拉格朗日乘子,从而通过梯度下降对可行对偶解进行高效优化。为了弥合离散计算方法与连续类型的理论保证之间的差距,我们开发了一种新颖的提升技术,将对偶证书从粗离散化映射到精细细化。我们证明,对于具有连续均匀估值的多物品、多竞拍者拍卖,提升给出了有效的收入上界。此外,我们给出了任意连续分布的广义提升构造,并证明了这些提升对偶在离散极限下收敛到原始连续问题的收入。我们通过恢复典型实例的已知分析机制,验证了该对偶拍卖设计问题的计算框架。对于多物品多竞拍者问题,我们的框架在最优收入与已知最佳DSIC机制之间建立了小差距,提供了近最优性的计算证书。

英文摘要

Characterizing revenue-optimal auctions for multi-item, multi-bidder settings remains a fundamental open problem, with no known closed-form solution existing beyond restrictive binary-type instances. This has motivated interest in computational approaches to optimal auction design. In this paper, we introduce the first computational framework that directly tackles the dual problem for multi-item, multi-bidder auctions and dominant-strategy incentive compatibility (DSIC), generating certified revenue upper bounds. Our approach parametrizes Lagrange multipliers with a structurally guaranteed strict flow-conservation property using neural networks, enabling efficient optimization over feasible dual solutions via gradient descent. To bridge the gap between discrete computational methods and theoretical guarantees for continuous types, we develop a novel lifting technique that maps dual certificates from coarse discretizations to fine refinements. We prove that lifting gives valid revenue upper bounds for multi-item, multi-bidder auctions with continuous uniform valuations. Furthermore, we give a generalized lifting construction for arbitrary continuous distributions and demonstrate that these lifted duals converge to the revenue of the original continuous problem in the discrete limit. We validate this computational framework for the dual auction design problem by recovering known analytical mechanisms for canonical instances. For multi-item multi-bidder problems, our framework establishes a small gap between the optimal revenue and best-known DSIC mechanisms, providing computational certificates of near-optimality.

2606.10126 2026-06-10 cs.CL cs.AI cs.CY cs.LG 交叉投稿

Pareto-Guided Teacher Alignment for Fair Personalized Text Generation

帕累托引导的教师对齐用于公平个性化文本生成

Tunazzina Islam

发表机构 * Purdue University(普渡大学)

AI总结 提出帕累托引导的教师对齐框架,通过修订候选生成、对感知可行性门控、帕累托候选选择和偏好优化,在减少人口统计差异的同时保持个性化保真度,实验表明公平缓解效果依赖于目标且跨域迁移不一致。

详情
AI中文摘要

个性化说服性文本生成可以提高相关性和参与度,但人口统计条件也可能引入跨群体的不平等框架。我们将个性化生成中的公平缓解研究为一个受约束的多目标对齐问题:在保持个性化保真度的同时减少人口统计差异。我们提出一个帕累托引导的教师对齐框架,结合了基于修订的候选生成、对感知可行性门控、帕累托风格的候选选择,以及通过监督微调和直接偏好优化的可选偏好优化。我们在气候变化和疫苗接种说服任务上评估该框架,使用一个受控的上下文丰富的人口统计网格(匹配性别和年龄对)以及一个统一的五审计评估套件,涵盖说服偏见、形式差异、情感框架差异、词汇关联差异和个性化保真度。在两个领域和跨族系迁移设置中,没有单一的对齐策略能同时主导所有目标。相反,方法占据了公平-个性化帕累托前沿的不同区域:一些方法实现更强的差异减少,而另一些则更好地保持个性化或人口统计稳定性。我们的结果表明,公平缓解效果依赖于目标,并在领域和模型族系间不一致地迁移,这促使在公平敏感的个性化生成中采用有界回归、多审计模型选择而非单指标优化。

英文摘要

Personalized persuasive text generation can improve relevance and engagement, but demographic conditioning may also introduce unequal framing across groups. We study fairness mitigation in personalized generation as a constrained multi-objective alignment problem: reduce demographic disparities while preserving personalization fidelity. We propose a Pareto-guided teacher alignment framework that combines revision-based candidate generation, pair-aware feasibility gating, Pareto-style candidate selection, and optional preference optimization through supervised fine-tuning and direct preference optimization. We evaluate the framework on climate change and vaccination persuasion tasks using a controlled context-rich demographic grid with matched gender and age pairs and a unified five-audit evaluation suite spanning persuasion bias, formality disparity, emotional framing disparity, lexical association disparity, and personalization fidelity. Across both domains and cross-family transfer settings, no single alignment strategy dominates all objectives simultaneously. Instead, methods occupy different regions of a fairness-personalization Pareto frontier: some achieve stronger disparity reductions, while others better preserve personalization or demographic stability. Our results show that fairness mitigation effects are objective-dependent and transfer inconsistently across domains and model families, motivating bounded-regression, multi-audit model selection over single-metric optimization for fairness-sensitive personalized generation.

2606.10233 2026-06-10 eess.AS cs.LG cs.SD 交叉投稿

ANCHOR: Autoregressive Non-intrusive Chunk-Ordered Refinement for Joint Multi-Resolution Speech Quality Modeling

ANCHOR: 自回归非侵入式分块有序细化用于联合多分辨率语音质量建模

Zhuoyan Tao, Jiatong Shi, Hye-jin Shim, Shinji Watanabe

发表机构 * University of Southern California, USA(美国南加州大学) Carnegie Mellon University, USA(美国卡内基梅隆大学)

AI总结 提出ANCHOR模型,将增量语音质量评估重构为多分辨率自回归任务,通过双分辨率令牌和分辨率感知层次实现分块到整句的粗到细细化,在部分输入下显著降低误差,并揭示感知质量的时域积累机制。

Comments Accepted at Interspeech 2026

详情
AI中文摘要

虽然语音质量通常是在完整话语上评估的,但流式和生成系统需要从部分音频中进行增量估计。现有的预测器假设完整的上下文,在受前缀约束的输入上性能下降。扩展ARECHO,我们提出ANCHOR,将增量评估重新表述为多分辨率自回归任务。它使用双分辨率令牌和分辨率感知层次结构在单个解码器中建模分块级和话语级质量,实现从粗到细的细化。实验表明,在部分输入下具有显著的鲁棒性,包括在2秒前缀上PLCMOS误差减少48%。收敛性分析揭示了4-6秒的有效感知上下文范围。压力测试进一步隔离了局部损坏下的结构化外推偏差。结果表明,层次监督改进了增量预测,并阐明了感知质量如何随时间累积。

英文摘要

While speech quality is typically assessed on complete utterances, streaming and generative systems require incremental estimation from partial audio. Existing predictors assume full context, degrading on prefix-constrained inputs. Extending ARECHO, we propose ANCHOR, reformulating incremental assessment as a multi-resolution autoregressive task. It models chunk- and utterance-level quality within a single decoder using dual-resolution tokens and a resolution-aware hierarchy for coarse-to-fine refinement. Experiments show substantial robustness under partial input, including a 48% PLCMOS error reduction on 2-second prefixes. Convergence analysis reveals a 4-6 s effective perceptual context horizon. A stress test further isolates structured extrapolation biases under localized corruption. Results demonstrate that hierarchical supervision improves incremental prediction and elucidates how perceptual quality accumulates over time.

2606.10279 2026-06-10 cs.AI cs.CL cs.LG 交叉投稿

Supervised Fine-tuning with Synthetic Rationale Data Hurts Real-World Disease Prediction

使用合成理由数据进行监督微调损害真实世界疾病预测

Buxin Su, Bingxuan Li, Cheng Qian, Yiwei Wang, Jin Jin, Bingxin Zhao

发表机构 * University of Pennsylvania(宾夕法尼亚大学) University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) University of California, Merced(加州大学默塞德分校)

AI总结 研究发现,在临床预测任务中,使用合成理由数据进行监督微调反而显著降低模型性能,根本原因在于叙事合理性与判别优化之间的结构性冲突。

详情
AI中文摘要

监督微调中使用合成理由数据被广泛认为能通过教导模型不仅预测什么而且预测原因来提升语言模型在临床预测任务上的性能。我们在基于纵向健康史进行五年阿尔茨海默病及相关痴呆症(ADRD)预测的任务上检验了这一假设。通过一项包含504种配置的大规模对照实验,我们发现,与仅使用标签的微调相比,基于理由的SFT始终且显著地损害了预测性能。这种退化在多个模型系列和数据规模中持续存在,并且无法通过使用面向推理的基础模型来解决。关键的是,这种失败并非由理由质量差所致:人类专家注释证实生成的理由在医学上是准确的,并且忠实于患者特定的证据;少样本实验表明,当相同的理由作为推理时的演示而非训练目标使用时,能提升性能。我们确定根本原因在于叙事合理性与判别优化之间的结构性冲突。我们希望我们的工作能为更精确地理解理由监督何时以及如何有帮助、何时无帮助铺平道路,从而指导在高风险临床预测中负责任地开发语言模型。

英文摘要

Supervised fine-tuning with synthetic rationale data is widely assumed to improve language model performance on clinical prediction tasks by teaching models not just what to predict but why. We test this assumption on five-year Alzheimer's disease and related dementias (ADRD) prediction from longitudinal health histories. Across a large-scale controlled experiment of 504 configurations, we find that rationale-based SFT consistently and substantially hurts prediction performance relative to label-only fine-tuning. The degradation persists across model families and data scales, and is not resolved by using a reasoning-oriented base model. Crucially, the failure is not explained by poor rationale quality: human expert annotation confirms that the generated rationales are medically accurate and faithfully grounded in patient-specific evidence, and few-shot experiments show that the same rationales improve performance when used as inference-time demonstrations rather than training targets. We identify the root cause as a structural conflict between narrative plausibility and discriminative optimization. We hope our work paves the path toward a more precise understanding of when and how rationale-based supervision helps and when it does not, guiding the responsible development of language models for high-stakes clinical prediction.

2606.10349 2026-06-10 cond-mat.dis-nn cond-mat.str-el cs.LG 交叉投稿

Magnetic HIP-NN for spin dynamics in disordered itinerant magnets

磁性HIP-NN用于无序巡游磁体中的自旋动力学

Supriyo Ghosh, Yunhao Fan, Sheng Zhang, Kipton Barros, Gia-Wei Chern

发表机构 * Department of Physics, University of Virginia(弗吉尼亚大学物理系) Department of Chemistry, University of Chicago(芝加哥大学化学系) Theoretical Division and CNLS, Los Alamos National Laboratory(洛斯阿拉莫斯国家实验室理论 division 和 CNLS)

AI总结 提出磁性HIP-NN(mHIP-NN),通过旋转不变自旋关联的分层消息传递,高效模拟无序巡游磁体中电子介导的自旋动力学,准确再现Landau-Lifshitz-Gilbert动力学和热淬火后的非平衡自旋关联演化。

Comments 12 pages, 5 figures

详情
AI中文摘要

我们提出了分层相互作用粒子神经网络(HIP-NN)的磁性扩展,用于实现无序巡游磁体中电子介导自旋动力学的大规模模拟。由此产生的磁性HIP-NN(mHIP-NN)将旋转不变的自旋关联直接纳入分层消息传递层,使网络能够从耦合的几何-自旋环境中学习涌现的磁能景观和有效局域场,同时保持自旋旋转对称性。作为基准应用,我们考虑了结构无序的巡游$s$-$d$交换模型,其中有效磁力动态地来源于瞬时电子结构,并且使用传统的基于精确对角化的方法在计算上难以评估。我们表明,mHIP-NN准确再现了控制Landau-Lifshitz-Gilbert动力学的局域力矩,并忠实地捕捉了热淬火后空间自旋关联的非平衡演化。我们的结果确立了对称性感知的分层消息传递网络作为大规模模拟受挫巡游自旋系统和非平衡磁动力学的高效且可扩展的框架。更广泛地说,由于学习的能量泛函对原子坐标和自旋变量完全可微,该框架也为自旋依赖的原子间势和耦合原子-自旋动力学提供了自然基础。

英文摘要

We present a magnetic extension of the Hierarchically Interacting Particle Neural Network (HIP-NN) that enables large-scale simulations of electron-mediated spin dynamics in disordered itinerant magnets. The resulting magnetic HIP-NN (mHIP-NN) incorporates rotationally invariant spin correlations directly into hierarchical message-passing layers, enabling the network to learn emergent magnetic energy landscapes and effective local fields from coupled geometric-spin environments while preserving spin-rotation symmetry. As a benchmark application, we consider structurally disordered itinerant $s$-$d$ exchange models in which the effective magnetic forces arise dynamically from the instantaneous electronic structure and are computationally prohibitive to evaluate using conventional exact-diagonalization-based approaches. We show that mHIP-NN accurately reproduces the local torques governing Landau-Lifshitz-Gilbert dynamics and faithfully captures the nonequilibrium evolution of spatial spin correlations following thermal quenches. Our results establish symmetry-aware hierarchical message-passing networks as an efficient and scalable framework for large-scale simulations of frustrated itinerant spin systems and nonequilibrium magnetic dynamics. More broadly, because the learned energy functional remains fully differentiable with respect to both atomic coordinates and spin variables, the framework also provides a natural foundation for spin-dependent interatomic potentials and coupled atom-spin dynamics.

2606.10472 2026-06-10 cs.GT cs.LG 交叉投稿

Trading Utility for Dynamic Fairness in Multiple Resource Division with Sequential Demand

在顺序需求的多资源分配中权衡效用与动态公平性

Kaiqi Jiang, Karim El Husseini, Wenzhe Fan, Xinhua Zhang

发表机构 * Computer Science Dept. University of Illinois Chicago(伊利诺伊大学芝加哥分校计算机科学系)

AI总结 提出一种神经分配机制,通过多目标优化在顺序分配中平衡公平与效用,实现更高的效用同时保持可比公平性。

详情
AI中文摘要

动态多资源分配是共享计算环境中的一个核心问题,其中用户的需求顺序到达,且必须在不知道未来需求的情况下公平分配资源。现有方法强调公平性保证,如共享激励、无嫉妒和动态帕累托最优性,但往往忽略系统效用。此外,这些公平性标准互不兼容,无法同时严格实施。我们提出一种神经分配机制,通过在顺序展开过程中进行多目标优化来调和公平性与效用。我们首先通过共享激励、无嫉妒和动态帕累托最优性的逐步损失函数形式化动态环境中的公平性,从而实现可微训练。利用非浪费性,我们通过将分配约束在需求子空间内来参数化解,同时允许在资源可用时进行弹性过度分配。实验结果表明,我们学习的分配器在可比公平性水平下实现了显著更高的效用,揭示了跨指标的清晰帕累托前沿式权衡。

英文摘要

Dynamic multi-resource allocation is a central problem in shared computing environments, where users' demands arrive sequentially and resources must be distributed fairly without knowledge of future demands. Existing methods emphasize fairness guarantees such as Sharing Incentive, Envy Freeness, and Dynamic Pareto Optimality, but often overlook system utility. Moreover, these fairness criteria are mutually incompatible, preventing strict enforcement of them at the same time. We propose a neural allocation mechanism that reconciles fairness with utility through multi-objective optimization during sequential rollout. We first formalize fairness in the dynamic setting via stepwise loss functions for Sharing Incentive, Envy Freeness, and Dynamic Pareto Optimality, enabling differentiable training. Leveraging non-wastefulness, we parameterized the solutions by constraining allocations to the subspace of demand while allowing elastic over-allocation when resources remain available. Empirical results demonstrate that our learned allocator achieves substantially higher utility at comparable levels of fairness, uncovering clear Pareto-frontier-like tradeoffs across metrics.

2606.10547 2026-06-10 eess.IV cond-mat.mtrl-sci cs.LG physics.ins-det 交叉投稿

Unsupervised Deep Learning for Limited-Angle STEM-EDX Tomography -- Application to 3D Chemical Analysis of Phase-Change Memory Devices

无监督深度学习用于有限角度STEM-EDX层析成像——在相变存储器件三维化学分析中的应用

Daniel del Pozo Bueno, Serge Brosset, Theo Monniez, Gabriele Navarro, Philippe Ciuciu, Zineb Saghi

发表机构 * CEA, LETI, Univ. Grenoble Alpes(CEA LETI 格鲁诺布尔大学) CEA, Neurospin, Paris-Saclay University(CEA 神经科学研究中心 巴黎-萨克雷大学) Inria, MIND(Inria MIND)

AI总结 提出基于深度图像先验和全变分正则化的无监督深度学习框架(DIP-TV及多通道扩展DIPm-TV),解决有限角度STEM-EDX层析成像中的缺失楔伪影和噪声问题,实现相变存储器件的三维化学分析。

Comments 29 pages (17 main manuscript + 12 supplementary information), 4 figures, 8 supplementary figures, 1 table, and 4 supplementary tables

详情
AI中文摘要

扫描透射电子显微镜(STEM)中的能量色散X射线(EDX)层析成像能够实现纳米尺度的三维成分和元素映射,但其应用受到有限倾斜范围和避免束损伤所需的低剂量条件的限制。有限角度采集会引入缺失楔伪影,如拉长和各向异性分辨率,而噪声低剂量数据进一步降低重建质量和定量可靠性。本文提出了一种基于深度图像先验和全变分正则化(DIP-TV)的无监督深度学习框架,用于有限角度STEM-EDX层析成像。我们将其扩展为多通道公式(DIPm-TV),通过利用空间相关性联合重建多个元素图。使用合成三通道体模,我们展示了该方法在中等噪声下补偿了约$100^\circ$缺失角度范围对应的严重缺失楔伪影,性能优于同步迭代重建技术和压缩感知方法。我们将该方法应用于原始(制备态)和SET(晶态)工作状态下的Ge-Sb-Te(GST)存储器件三维化学分析。样品制备为横截面聚焦离子束薄片,并在$-40^\circ$至$+40^\circ$的有限角度倾斜范围内以$5^\circ$步长和$2.0\times10^5$ $e^-/Ang^2$剂量采集。多通道方法仅利用EDX信号实现逐体素元素重建,无需外部结构先验(如高角环形暗场成像)。重建体积显示出近各向同性空间分辨率,并揭示了与器件操作相关的成分异质性。该方法能够在实验可实现的样品几何结构中进行三维化学表征,而传统方法因严重的角度限制而失效。

英文摘要

Energy Dispersive X-ray (EDX) tomography in Scanning Transmission Electron Microscopy (STEM) enables 3D compositional and elemental mapping at the nanoscale, but its use is limited by restricted tilt ranges and low-dose conditions required to avoid beam damage. Limited-angle acquisition introduces missing-wedge artefacts such as elongation and anisotropic resolution, while noisy low-dose data further degrade reconstruction quality and quantitative reliability. Here, we introduce an unsupervised deep learning framework based on Deep Image Prior with total variation regularization (DIP-TV) for limited-angle STEM-EDX tomography. We extend it to a multi-channel formulation (DIPm-TV) that jointly reconstructs multiple elemental maps by exploiting spatial correlations. Using a synthetic 3-channel phantom, we show that the method compensates for severe missing-wedge artefacts corresponding to approximately $100^\circ$ of missing angular range under moderate noise, outperforming simultaneous iterative reconstruction technique and compressed sensing approaches. We apply the method to 3D chemical analysis of Ge-Sb-Te (GST) memory devices in virgin (as-fabricated) and SET (crystalline) operational states. Samples were prepared as cross-sectional focused ion beam lamellae and acquired under a limited-angle tilt range from $-40^\circ$ to $+40^\circ$ with $5^\circ$ steps and a dose of $2.0\times10^5$ $e^-/Ang^2$. The multi-channel approach enables voxel-by-voxel elemental reconstruction using only EDX signals without external structural priors such as high-angle annular dark-field imaging. The reconstructed volumes show near-isotropic spatial resolution and reveal compositional heterogeneities associated with device operation. This approach enables 3D chemical characterization in experimentally accessible sample geometries where conventional methods fail due to severe angular limitations.

2606.10600 2026-06-10 eess.SY cs.LG cs.SY 交叉投稿

Toward Proactive RF Charging Scheduling: Generative AI for Decision Support

面向主动射频充电调度:用于决策支持的生成式人工智能

Amirhossein Azarbahram, Osmel M. Rosabal, David Ernesto Ruiz-Guirola, Melike Erol-Kantarci, Kaibin Huang, Onel L. A. López

发表机构 * Centre for Wireless Communications (CWC), University of Oulu, Finland(芬兰奥卢大学无线通信中心) School of Electrical Engineering and Computer Science, University of Ottawa, Ottawa, Canada(加拿大渥太华大学电气与计算机科学学院) Department of Electrical and Electronic Engineering, The University of Hong Kong, Hong Kong(香港大学电子与电气工程系)

AI总结 本文提出将生成式AI作为不确定性感知支持层,辅助射频无线充电调度器在有限资源和不确定条件下做出鲁棒充电决策,并通过仓库案例验证其有效性。

详情
AI中文摘要

射频无线能量传输(RF-WPT)是一种支持未来物联网系统不间断通信的使能技术,通过减少电池更换需求和缓解电池废弃物相关问题。在大规模RF-WPT部署中,主要挑战之一是调度器级别的资源分配。具体而言,发射器必须在有限的充电资源、不完整的接收端信息以及不确定的近未来充电条件下,决定输送多少能量、何时输送以及向谁输送。本文将生成式人工智能(GenAI)定位为一种有前景的工具,因为它能够基于粗略的操作上下文和接收端信息,预见多种可能的充电场景。我们提出GenAI作为RF-WPT调度器的不确定性感知支持层,而非独立的预测或决策工具。为此,我们首先重新审视RF-WPT调度面临的主要挑战,并讨论主要GenAI系列如何通过为下游任务生成基于场景的输入来支持不确定性感知的充电决策。然后,我们通过一个仓库式案例研究表明,与确定性预测和简单的无学习基线相比,通过生成模型的采样能力保留不确定性可以改善鲁棒充电决策,尤其是在风险敏感目标下。最后,我们指出了关键开放挑战并提出了未来研究方向。

英文摘要

Radio frequency wireless power transfer (RF-WPT) is an enabling technology for supporting uninterrupted communications in future Internet of Things systems by reducing the need for battery replacement and mitigating battery-waste-related issues. For large-scale RF-WPT deployment, one of the main challenges is the scheduler-level resource allocation. Specifically, the transmitter must decide how much energy to deliver, when, and to whom, under limited charging resources, incomplete receiver-side information, and uncertain near-future charging conditions. This article positions generative artificial intelligence (GenAI) as a promising tool for this setting because it can foresee multiple plausible charging scenarios conditioned on coarse operational context and receiver-side information. We propose GenAI to act as an uncertainty-aware support layer for the RF-WPT scheduler rather than as a standalone forecasting or decision-making tool. To this end, we first revisit the main challenges of RF-WPT scheduling, and discuss how major GenAI families can support uncertainty-aware charging decisions by generating scenario-based inputs for downstream tasks. We then present a warehouse-style case study showing that preserving uncertainty through the sampling capability of generative models can improve robust charging decisions compared with deterministic prediction and simple non-learning baselines, especially under risk-sensitive objectives. Finally, we identify key open challenges and present some directions for future research.

2606.10627 2026-06-10 cs.HC cs.LG cs.SD 交叉投稿

Profy: Interpretable Visualization of Expertise-Dependent Motor Skills Toward Supporting Piano Practice

Profy: 面向钢琴练习的、可解释的专业技能依赖性运动技能可视化

Kazuki Kawamura, Fujiki Nakamura, Hayato Nishioka, Momoko Shioki, Shinichi Furuya, Jun Rekimoto

发表机构 * The University of Tokyo(东京大学) Sony Computer Science Laboratories(索尼计算机科学实验室) NeuroPiano Institute(神经钢琴研究所)

AI总结 提出弱监督系统Profy,利用听众评分标签学习时间对齐的高亮,帮助钢琴学习者定位需重点练习的段落,在无局部标签下与专家标注高度一致。

Comments Designing Interactive Systems Conference (DIS '26), June 13-17, 2026, Singapore, Singapore

详情
AI中文摘要

钢琴演奏的质量取决于微妙的时机、发音和动态控制,但练习反馈通常是基于总结的且难以付诸行动。我们介绍了Profy,一个弱监督系统,它从聚合听众评分(专家标记与业余标记)中学习片段级标签,生成时间对齐的高亮,用于钢琴练习中的回顾。我们收集了73名钢琴家的同步1 kHz键运动与音频数据,并使用1083个有效片段进行建模和评估。模型在共享的重采样模型时间基上输出片段级预测和证据分数以进行可视化。在21名专家钢琴家标注的20个业余短技术练习片段上,尽管训练时没有局部标签,显示的高亮分数与专家标记用于回顾的段落一致(Pearson r=0.61,ROC-AUC 0.75)。Profy不是用一个全局分数总结一个片段,而是通过支持与专家-业余差异相关的时间局部段落的擦洗、循环和聚焦回放,帮助学习者决定下一步检查哪里。

英文摘要

The quality of piano performance depends on nuanced timing, articulation, and dynamic control, but practice feedback is often summary-based and hard to act on. We introduce Profy, a weakly supervised system that learns from take-level labels derived from aggregated listener ratings (expert-labeled vs. amateur-labeled) to produce time-aligned highlights for review during piano practice. We collected synchronized 1 kHz key-motion and audio from 73 pianists and used 1,083 valid takes for modeling and evaluation. The model outputs clip-level predictions together with evidence scores on a shared resampled model time base for visualization. On 20 amateur clips from short technique studies annotated by 21 expert pianists, the displayed highlight score aligns with passages that expert pianists marked for review despite training without localized labels (Pearson r=0.61, ROC-AUC 0.75). Rather than summarizing a take with a single global score, Profy helps learners decide where to inspect next by supporting scrubbing, looping, and focused replay of time-localized passages associated with expert-amateur differences.

2606.10686 2026-06-10 physics.comp-ph astro-ph.IM cs.LG 交叉投稿

An adaptive framework for the axisymmetric pulsar magnetosphere using physics-informed Kolmogorov-Arnold networks

基于物理信息Kolmogorov-Arnold网络的轴对称脉冲星磁层自适应框架

Spyros Rigas, Ioannis Contopoulos, Georgios Alexandridis, Antonios Nathanail

发表机构 * Department of Digital Industry Technologies, School of Science, National and Kapodistrian University of Athens(数字产业技术系,科学学院,国家与卡布利安大学) Research Center for Astronomy and Applied Mathematics, Academy of Athens(天文与应用数学研究所,雅典学院)

AI总结 提出基于Kolmogorov-Arnold网络的自适应框架,结合自动化训练流程和物理收敛准则,在双精度下将PDE残差均方误差降至O(1e-6),收敛时间缩短至20分钟内,并可靠解析缩小80%的恒星半径。

Comments 25 pages, 10 figures. Submitted to Journal of Computational Physics

详情
AI中文摘要

脉冲星磁层直到最近才通过物理信息神经网络(PINNs)进行研究,采用区域分解方法并将分离线和赤道电流片视为无限薄的间断。然而,这一基线方法需要大量手动超参数调整,最终精度有限且需要数小时训练。我们通过引入基于Kolmogorov-Arnold网络的领域特定神经架构、自动化自适应训练流程以及基于物理的收敛准则来改进这一框架,消除了手动校准的需求。所提出的方法提供了自洽的轴对称磁层解,在双精度下PDE残差的均方误差达到O(1e-6)量级——比基线方法提高了两个数量级——同时在单精度下在20分钟内实现收敛。重要的是,该方法可靠地解析了相比基线缩小高达80%的恒星半径,克服了同样挑战传统求解器的严重空间尺度差异。此外,通过改变开放至无穷远的磁通量,我们提供了将其与赤道T点位置关联的方程的修正。完整框架已作为开源库PulsarX发布。

英文摘要

The pulsar magnetosphere has only recently been addressed using Physics-Informed Neural Networks (PINNs), by deploying a domain-decomposition approach and treating the separatrix and equatorial current sheet as infinitesimally thin discontinuities. However, this baseline requires extensive manual hyperparameter tuning, achieves limited final accuracy and demands several hours of training. We refine this framework by introducing domain-specific neural architectures based on Kolmogorov-Arnold networks, an automated adaptive training pipeline and a physics-based convergence criterion that eliminate the need for manual calibration. The proposed methodology delivers self-consistent axisymmetric magnetosphere solutions with mean squared errors of the PDE residuals at O(1e-6) in double precision - an improvement of two orders of magnitude over the baseline - while achieving convergence in under 20 minutes in single precision. Importantly, the method reliably resolves stellar radii reduced by up to 80% compared to the baseline, overcoming the severe spatial scale disparities that also challenge traditional solvers. Furthermore, by varying the flux that opens to infinity, we provide a correction to the equation that connects it to the equatorial T-point's position. The complete framework is released as the open-source library PulsarX.

2606.10692 2026-06-10 cs.CR cs.LG 交叉投稿

Do LLMsMakeNeural Distinguishers Wise?

LLM 是否使神经区分器更智能?

Tatsuya Sakagami, Masashi Hisai, Naoto Yanai

发表机构 * University of Tokyo(东京大学)

AI总结 本文提出基于大语言模型(LLM)的神经区分器,通过提示设计在SPECK-32/64上实验,发现LLM未显著提升性能,高轮次下差分选择失效,但加入XOR结果可改善性能。

详情
Journal ref
DeMeSSAI 2026 poster
AI中文摘要

神经区分器是一种对称密钥密码的密码分析方法,它通过训练机器学习模型于具有特定差分的明文-密文对来恢复密钥。据我们所知,现有工作尚未探索使用大语言模型(LLM)进行神经区分器。在本文中,我们通过提示设计提出了基于LLM的神经区分器,并在SPECK-32/64上对其进行了广泛实验,以研究LLM能否增强神经区分器。然后,我们发现了三个关键见解。第一,通过将基于LLM的神经区分器与现有工作中的ResNet结果进行比较,我们证明LLM在神经区分器性能上没有提供可观察到的改进。第二,我们确认在高轮次下,差分的选择对基于LLM的神经区分器以及ResNet不再有效。第三,我们表明,通过仅将XOR运算结果作为提示设计,可以显著提高基于LLM的神经区分器的性能。

英文摘要

Neural distinguishers are a cryptanalysis method for symmetric-key cryptography that trains machine learning models on pairs of plaintexts and ciphertexts with specific differences in order to recover a secret key. To the best of our knowledge, no existing work has explored the use of large language models (LLMs) for neural distinguishers. In this paper, we propose LLM-based neural distinguishers through a prompt design and conduct extensive experiments with them on SPECK-32/64 to investigate whether LLMs can strengthen neural distinguishers. We then found three key insights. First, by comparing the results of LLM-based neural distinguishers with ResNet in the existing work, we demonstrate that LLMs provide no observable improvement in the performance of neural distinguishers. Second, we confirm that, at high rounds, the choice of differences is no longer effective for LLM-based neural distinguishers as well as ResNet. Third, we show that the performance of LLM-based neural distinguishers can be significantly improved by incorporating only the XOR operation results as a prompt design.

2606.10698 2026-06-10 hep-ph cs.LG hep-th 交叉投稿

Efficient AI-Inspired Reduction of Feynman Integrals via Tube Seeding

基于管状播种的费曼积分高效类脑约化

Justin Berman, Francois Charton, Andres Luna, Matthias Wilhelm, Mao Zeng

发表机构 * Leinweber Institute for Theoretical Physics, Randall Laboratory of Physics, University of Michigan, Ann Arbor, 450 Church St, Ann Arbor, MI 48109-1040, USA(莱因韦伯理论物理研究所,物理系拉尔登实验室,密歇根大学安娜堡分校) Axiom Math, 124 University Avenue, Palo Alto, California, 94301, United States(Axiom数学公司,帕洛阿尔托,加利福尼亚州,94301,美国) Niels Bohr International Academy, Niels Bohr Institute, University of Copenhagen, Blegdamsvej 17, 2100 Copenhagen , Denmark(尼尔斯·波尔国际学院,尼尔斯·波尔研究所,哥本哈根大学) Center for Quantum Mathematics, Department of Mathematics and Computer Science, University of Southern Denmark, Campusvej 55, 5230 Odense M, Denmark(量子数学中心,数学与计算机科学系,丹麦南部大学) Higgs Centre for Theoretical Physics, University of Edinburgh, Edinburgh, EH9 3FD, United Kingdom(希格斯理论物理中心,爱丁堡大学)

AI总结 利用机器学习发现一种新的种子选择策略,通过稀疏种子线性增长实现高幂次多圈积分的约化,显著降低计算时间和内存占用,适用于唯象应用。

Comments 61 pages, 25 figures, 11 tables

详情
AI中文摘要

在本文中,我们利用机器学习发现了一种新的种子选择策略,用于费曼积分的分部积分约化,这是理论粒子物理和引力波物理前沿计算中常见的瓶颈。我们的策略允许通过本质上标准的Laporta算法,但采用稀疏的种子积分选择,仅随分子幂次线性增长,从而约化具有大分子幂次的多圈积分,而现有策略会导致随被约化积分复杂度增加的多项式幂次增长。种子被限制在一个薄管状区域内,该区域沿之字形路径将目标积分与主积分连接起来。我们通过约化具有数值运动学、秩为20的非平面2圈5点积分来展示我们方法的能力,这对于传统播种的Laporta算法来说难以实现。超越单个积分,我们进一步展示了通过将目标积分分成若干块来约化完整的一组顶层秩10积分,每块都可以通过我们的稀疏播种策略以显著少于其他先进策略的时间和内存占用求解,使该方法适用于唯象应用。我们在GitHub上提供了原理验证实现,网址为https://this URL。

英文摘要

In this paper, we use machine learning to discover a new seeding strategy for integration-by-parts reduction of Feynman integrals, which is a frequent bottleneck in state-of-the-art calculations in theoretical particle and gravitational-wave physics. Our strategy allows us to reduce multi-loop integrals with large numerator powers via essentially the standard Laporta algorithm but with a sparse selection of seed integrals that grows only linearly with the numerator power, whereas existing strategies lead to growth with a polynomial power that increases with the complexity of the integral being reduced. The seeds are restricted to a thin tube-like region that connects the target integral to the master integrals along a zigzag path. We demonstrate the power of our approach by reducing non-planar 2-loop 5-point integrals of rank 20 with numerical kinematics over a finite field, which is prohibitively difficult for the Laporta algorithm with conventional seeding. Going beyond individual integrals, we further demonstrate the reduction of a complete set of top-level rank-10 integrals by dividing the target integrals into several chunks, each of which can be solved by our sparse seeding strategy with considerably less time and a significantly lower memory footprint than other state-of-the-art strategies, making the approach well-suited for phenomenological applications. We provide a proof-of-principle implementation on GitHub at https://github.com/andreslunagodoy/tube_seeding.

2606.10713 2026-06-10 eess.IV cs.AI cs.CV cs.LG 交叉投稿

++nnU-Net: Scaling nnU-Net with Prefix-Based Data Augmentation

++nnU-Net: 基于前缀数据增强的nnU-Net扩展

Ana Sofia Santos, André Ferreira, Gijs Luijten, Naida Solak, Lisle Faray de Paiva, Behrus Hinrichs-Puladi, Jens Kleesiek, Jan Egger, Victor Alves

发表机构 * Center Algoritmi / LASI, University of Minho, Braga, Portugal(阿尔戈里米中心/拉斯伊大学,明霍大学,布拉加,葡萄牙) Institute for Artificial Intelligence in Medicine, University Medicine Essen, Essen, Germany(医学人工智能研究所,埃森医学院,埃森,德国) Institute of Medical Informatics / Dept. of Oral and Maxillofacial Surgery, University Hospital RWTH Aachen, Germany(医学信息学研究所/口腔和颅面外科部,亚琛大学医院,德国) Faculty of Computer Science, University of Duisburg-Essen, Essen, Germany(计算机科学学院,杜伊斯堡-埃森大学,埃森,德国)

AI总结 提出++nnU-Net,通过图像配准进行数据增强,在预处理和训练前生成变形图像,在5个2D数据集上提升Dice系数最高约22%。

Comments 7 pages, 1 figure, 2 tables

详情
AI中文摘要

nnU-Net在医学分割任务中持续展现出成功,这严重依赖于标注生物医学数据的可用性和多样性。然而,由于隐私法规和标注成本等因素,收集医学影像队列仍然具有挑战性。因此,数据增强在增加数据可用性的同时保持解剖学可行性方面起着关键作用。为此,我们提出了++nnU-Net,一种基于图像配准的新型数据增强模块,在预处理和训练之前运行。我们的框架在五个不同的2D数据集上进行了评估。在该工作流中,图像数据经过两阶段配准过程,生成新的变形图像。然后将变换应用于相应的分割。此外,该管道计算可用磁盘空间,生成补充的二进制合成掩码并生成检查点。我们证明++nnU-Net优于nnU-Net基线,在Dice相似系数得分上有所提升。在最显著的情况下,我们观察到性能提升约22%。这些发现强调了基于配准的数据增强的有效性,特别是对于2D医学影像数据集,并表明++nnU-Net为在数据有限的情况下提高分割性能提供了一种实用且可扩展的方法。++nnU-Net的源代码可在以下网址获取:this https URL

英文摘要

The nnU-Net has demonstrated continuous success in medical segmentation tasks, which heavily rely on the availability and diversity of annotated biomedical data. However, assembling medical imaging cohorts remains challenging due to numerous factors such as privacy regulations and annotation costs. As a result, data augmentation plays a crucial role in increasing data availability while maintaining anatomical feasibility. Hence, we propose the ++nnU-Net, a novel data augmentation module based on image registration that operates prior to preprocessing and training take place. Our framework was evaluated across five different 2D datasets. In this workflow, image data go through a two-stage registration process, generating new warped images. The transformations are then applied to the respective segmentation. In addition, the pipeline computes available disk space, generates supplementary binary synthetic masks and generates checkpoints. We demonstrate that the ++nnU-Net outperforms the nnU-Net baseline, yielding improvements in Dice Similarity Coefficient scores. In the most prominent cases, we observe performance gains of approximately 22\%. These findings highlight the effectiveness of registration-based data augmentation, particularly for 2D medical imaging datasets and suggest that the ++nnU-Net provides a practical and scalable approach for enhancing segmentation performance in data-limited settings. The source code for the ++nnU-Net is available at: https://github.com/sofia-adelie/plusplusnnunet.git

2606.10782 2026-06-10 cs.CR cs.AI cs.LG 交叉投稿

A Bayesian Network Approach for Enhancing Security-Focused Decision Support Systems

一种增强安全导向决策支持系统的贝叶斯网络方法

Carolina Fernández-Martínez, Shuaib Siddiqui, Vanesa Daza

发表机构 * University of Granada(格拉纳达大学) University of Birmingham(伯明翰大学)

AI总结 提出基于贝叶斯网络的决策支持系统,帮助基础设施运营商选择安全工具,通过捕获用户需求并推理,提供最优安全机制,评估了时间和预测精度。

详情
Journal ref
Proc. 2025 IEEE 50th Conference on Local Computer Networks (LCN), 2025
AI中文摘要

当今大多数基于开源网络的异构栈的采用和集成带来了明显的优势,如互操作性和高级功能的可用性。然而,另一方面,互联组件和移动部件数量的增加需要维护跨不同领域的不同工具的跨学科知识基础,以确保正常运行。为了减轻这些工作,本文提出了一种决策支持系统(DSS),指导基础设施运营商选择在其环境中采用的安全方法(例如工具)。该框架能够轻松捕获最终用户对不同领域安全三元组的高层需求,并在指定模型上运行推理,以提供更好地满足这些需求的已识别工具(安全机制)。所提出的DSS旨在提供一个可理解和可扩展的框架,以适应不同的需求和贝叶斯网络(BN)模型。提出了系统的架构和建模,并与其理论框架保持一致。其性能在时间和预测精度方面进行了评估。

英文摘要

The adoption and integration of heterogeneous stacks in most of today's open-source based networks brings clear benefits like interoperability and availability of advanced features. Yet, on the other hand the increasing number of interconnecting components and moving parts requires maintaining an ever increasing base of interdisciplinary knowledge of different tools in different domains to ensure proper operation. To alleviate such efforts, this work proposes a Decision Support System (DSS) to guide infrastructure operators through the selection of security approaches (e.g. tools) to adopt in their environments. This framework easily captures the end-user high-level requirements on the security triad for different domains and runs inference on the designated models to provide the identified tools (security mechanisms) that better serve such needs. The presented DSS aims at delivering an understandable and extensible framework to accommodate varying requirements and Bayesian Network (BN) models. The architecture and modelling of the system are proposed, aligned with its theoretical framework. Its performance is evaluated in terms of time and prediction accuracy.

2606.10889 2026-06-10 q-bio.NC cs.LG 交叉投稿

Sleep EEG Signal Criticality as a Non-Invasive Predictor of Cognitive Decline in Dementia

睡眠脑电信号临界性作为痴呆认知衰退的非侵入性预测指标

Stanisław Narębski, Tomasz Komendziński, Tomasz M. Rutkowski

发表机构 * Institute of Cybernetics and Human Informatics, Polish Academy of Sciences(波兰科学院信息学与人类科学研究所)

AI总结 研究通过多重分形去趋势波动分析量化睡眠脑电信号临界性,发现认知健康者更接近最优临界状态,痴呆组DFA指数向1.0偏移,表明睡眠中无标度神经动力学重组先于临床症状,可作为早期筛查工具。

Comments 4 pages, 2 figures, accepted for publication in the Proc. 48th Annu. Int. Conf. IEEE EMBS (EMBC 2026), Toronto, Canada, July 20-24, 2026

详情
AI中文摘要

神经退行性疾病的早期检测仍然是一个关键的临床挑战。本研究探讨了通过多重分形去趋势波动分析(MFDFA)量化的睡眠脑电信号临界性是否可作为未来认知衰退的非侵入性生物标志物。我们分析了国家睡眠研究资源(NSRR)骨质疏松性骨折研究(SOF)队列的纵向数据,比较了保持认知正常与后来进展为痴呆相关损伤(3MS < 78)的女性之间的基线睡眠脑电动力学。我们的结果揭示了Hurst指数$H(q)$分布在组间的显著差异,特别是在非快速眼动阶段N2和N3期间。认知健康的个体在所有电极位置上表现出显著更接近最优临界状态的信号动力学($p \leqslant 0.001$),支持了大脑临界性假说。监督UMAP投影证实了整夜睡眠期间组间的清晰空间分离。痴呆组表现出DFA指数向$1.0$的偏移,表明睡眠中无标度神经动力学的重组先于临床症状。这些发现强调了将MFDFA衍生测量整合到自动化、基于睡眠的筛查工具中的潜力,从而能够在痴呆的前驱窗口期进行更早的预防性干预。

英文摘要

Early detection of neurodegeneration remains a critical clinical challenge. This study investigates whether sleep EEG signal criticality, quantified via Multifractal Detrended Fluctuation Analysis (MFDFA), serves as a non-invasive biomarker for future cognitive decline. We analyzed longitudinal data from the National Sleep Research Resource (NSRR) Study of Osteoporotic Fractures (SOF) cohort, comparing baseline sleep EEG dynamics between women who remained cognitively normal and those who later progressed to dementia-related impairment ($3MS < 78$).Our results reveal significant group-level differences in Hurst exponent $H(q)$ distributions, particularly during non-REM stages N2 and N3. Cognitively healthy individuals exhibited signal dynamics significantly closer to an optimally critical state across all electrode locations ($p \leqslant 0.001$), supporting the Brain Criticality Hypothesis. Supervised UMAP projections confirmed clear spatial separation between groups throughout the overnight sleep architecture.The dementia group demonstrated a shift in DFA exponents toward $1.0$, suggesting that a reconfiguration of scale-free neural dynamics during sleep precedes clinical symptoms. These findings highlight the potential for MFDFA-derived measures to be integrated into automated, sleep-based screening tools, enabling earlier preventative interventions during the prodromal window of dementia.

2606.10908 2026-06-10 cs.SD cs.AI cs.CR cs.LG 交叉投稿

RAT: Reference-Augmented Training for ASV Anti-Spoofing

RAT:面向ASV反欺骗的参考增强训练

Vojtěch Staněk, Anton Firc, Jakub Reš, Kamil Malinka

发表机构 * Brno University of Technology(布尔诺理工大学)

AI总结 提出一种基于说话人参考录音的欺骗对抗架构,发现训练时引入参考通道可提升深度伪造检测性能,即使推理时参考缺失或失配。基于此提出参考增强训练(RAT)策略,在ASVspoof 5基准上以单个检测器达到2.57% EER和0.074 minDCF,超越大型集成系统。

Comments Accepted to Interspeech 2026

详情
AI中文摘要

我们引入了一种以说话人参考录音为条件的欺骗对抗架构,但观察到它收敛到一种在推理时有效忽略参考的解决方案。令人惊讶的是,使用参考通道进行训练会诱导出不变性,从而改进深度伪造检测,即使在推理时参考缺失或失配。基于这一观察,我们提出了一种参考增强训练(RAT)策略。与单话语基线相比,RAT产生了改进的检测性能,即使在推理时将参考录音替换为零向量时也是如此。通过严格分析,我们证明优化过程迅速减少了参考贡献,导致推理很大程度上独立于参考通道。使用RAT,我们在ASVspoof 5基准上以单个检测器实现了最先进的2.57%等错误率和0.074最小检测代价函数,甚至超越了大型集成系统。

英文摘要

We introduce a spoofing countermeasure architecture conditioned on speaker-reference recordings, but observe that it converges to a solution that effectively ignores the reference during inference. Surprisingly, training with a reference channel induces invariance that improves deepfake detection, even when the reference is absent or mismatched during inference. Based on this observation, we propose a Reference-Augmented Training (RAT) strategy. RAT yields improved detection performance compared to single-utterance baselines, even when the reference recording is replaced with a zero vector at inference. Through rigorous analysis, we demonstrate that the optimization process rapidly diminishes the reference contributions, leading to inference largely independent of the reference channel. Using RAT, we achieve state-of-the-art 2.57% EER and 0.074 minDCF on the ASVspoof 5 benchmark with a single detector, surpassing even large ensemble systems.

2606.10909 2026-06-10 cs.CE cs.LG physics.comp-ph 交叉投稿

Non-linear mechanical field reconstruction coupling recurrent neural networks with physics-informed graph neural networks

非线性力学场重建:循环神经网络与物理信息图神经网络的耦合

Manuel Ricardo Guevara Garban, Yves Chemisky, Étienne Prulière, Michaël Clément, Martin Abendroth, Björn Kiefer

发表机构 * Univ. Bordeaux, CNRS, Bordeaux INP, I2M, UMR 5295(波尔多大学、国家科学研究中心、波尔多工业学院、I2M、UMR 5295) Univ. Bordeaux, CNRS, Bordeaux INP, LaBRI, UMR 5800(波尔多大学、国家科学研究中心、波尔多工业学院、LaBRI、UMR 5800) Institute of Mechanics and Fluid Dynamics, TU Bergakademie Freiberg(力学与流体动力学研究所、弗赖堡技术大学) Univ. Grenoble Alpes, CNRS, UMR 5525(格勒诺布尔阿尔卑斯大学、国家科学研究中心、UMR 5525) Arts et Metiers Institute of Technology, CNRS, Bordeaux INP, I2M, UMR 5295(艺术与工艺技术学院、国家科学研究中心、波尔多工业学院、I2M、UMR 5295)

AI总结 提出LSTM-GNN耦合框架,利用LSTM编码宏观应力-应变序列的路径依赖响应,物理信息GNN重建空间应力场,通过相对加权策略平衡损失,实现弹塑性微观结构应力场快速重建,速度提升三个数量级。

详情
AI中文摘要

在非线性、历史依赖载荷下重建异质微结构的局部应力场仍然是多尺度模拟中的主要计算瓶颈。我们提出了一种耦合的LSTM-GNN框架,将局部应力场重建的时间和空间方面联系起来。长短期记忆网络将宏观应力-应变序列编码为紧凑的隐藏状态,捕获路径依赖的本构响应,而物理信息图神经网络在每个时间步重建空间分辨的应力场。我们引入了一种带有线性热启动的相对加权策略,以平衡数据驱动的重建损失和基于离散散度的平衡惩罚。这解决了在弹塑性范围内阻止固定权重公式收敛的尺度不匹配问题。该模型在应用于周期性含孔板微结构和von Mises弹塑性的10,000条非比例加载路径上训练。该模型相比有限元模拟实现了三个数量级的加速,并泛化到训练长度两倍的加载序列,累积误差为1.9%。由于图依赖于网格连通性而非特定单元类型,一个训练好的代理模型可以直接应用于不同单元类型、更粗和更细分辨率的网格,无需重新训练,同时在所有情况下复现训练中使用的高保真四边形单元有限元场。实际上,GNN和MeshGraphNet架构固有的消息传递特性使模型与网格无关。对LSTM隐藏状态的分析表明,存在与本构模型内部状态变量相关的低维结构。

英文摘要

Reconstructing local stress fields in heterogeneous microstructures under non-linear, history-dependent loading remains a major computational bottleneck in multi-scale simulations. We propose a coupled LSTM-GNN framework that links the temporal and spatial aspects of local stress field reconstruction. A Long Short-Term Memory network encodes macroscopic stress-strain sequences into a compact hidden state that captures the path-dependent constitutive response, while a physics-informed Graph Neural Network reconstructs the spatially-resolved stress field at each time step. We introduce a relative weighting strategy with linear warm-up to balance the data-driven reconstruction loss and a discrete divergence-based equilibrium penalty. This resolves the scale mismatch that prevents fixed-weight formulations from converging in the elasto-plastic regime. The model is trained on 10,000 non-proportional loading paths applied to a periodic plate-with-a-hole microstructure and von Mises elasto-plasticity. The model achieves three orders of magnitude speedup over finite element simulations and generalizes to loading sequences twice the training length, with 1.9% cumulative error. Because the graph relies on mesh connectivity instead of the specific element type, one trained surrogate can be applied directly without retraining to meshes with different element types and to both coarser and finer resolutions, while in all cases reproducing the high-fidelity quad-element FE field used during training. Indeed, the message passing characteristics inherent to GNN and MeshGraphNet architecture render the model mesh-agnostic. Analysis of the LSTM hidden states suggests a low-dimensional structure related to the internal state variables of the constitutive model.

2606.10928 2026-06-10 cs.CE cs.AI cs.LG physics.comp-ph 交叉投稿

A Constrained Natural-Language Interface for Variational Multi-Physics Finite Element Simulations in FEniCS

FEniCS中变分多物理场有限元模拟的受约束自然语言接口

Nilay Upadhyay, Wesley F. Reinhart

发表机构 * Department of Engineering Science and Mechanics, The Pennsylvania State University(工程科学与力学系,宾夕法尼亚州立大学) Department of Materials Science and Engineering, The Pennsylvania State University(材料科学与工程系,宾夕法尼亚州立大学)

AI总结 提出一种受约束的自然语言接口,将LLM限制在前端任务(解析提示、生成Gmsh代码),后端使用确定性模板求解器,在基准测试中实现100%解析率和90%几何生成成功率。

Comments 23 pages, 17 figures

详情
AI中文摘要

大型语言模型可以减少设置有限元模拟所需的手动工作,但当生成的求解器代码位于关键路径上时,会引入可靠性风险。我们提出了一种用于多物理场有限元分析的受约束自然语言接口,其中LLM仅限于前端任务:将提示解析为结构化JSON,仅对非目录几何生成Gmsh代码,并对这些阶段使用重试反馈。它从不编写FEniCS求解器模板、推导弱形式或编写数值求解器核心。一个确定性调度器将验证后的规范映射到五个手写的FEniCS/UFL模板:线弹性、超弹性、弹塑性、热力耦合和相场断裂。我们针对解析解和已发表的2D/3D基准测试验证了该确定性模板层。在适当网格上,平滑案例达到低于1%的一致性,而较难的非线性案例达到2-5%的范围。我们还直接评估了面向LLM的前端。在15个提示的解析器基准测试中,首次通过有效解析获得了9个案例,其余所有案例在重试后修复,最终有效解析率为100.0%,问题类别准确率为100.0%,字段提取准确率为97.1%。在通过真实LLM到Gmsh路径路由的10个案例自定义几何基准测试中,首次通过和最终成功率均为90.0%,一次未恢复的无效几何失败。这些结果表明,解析器和受约束的提示/验证设计在这些基准测试上是有效的。作为端到端演示,该系统从一个自然语言提示生成并分析了一个带有圆角和螺栓孔的3D弹塑性L形支架。贡献在于一种用于自然语言驱动的变分模拟的测量架构,而非开放式的自主代码生成。

英文摘要

Large language models can reduce the manual effort required to set up finite element simulations, but they introduce reliability risks when generated solver code lies on the critical path. We present a constrained natural-language interface for multi-physics finite element analysis in which the LLM is limited to front-end tasks: parsing prompts into structured JSON, generating Gmsh code only for non-catalog geometries, and using retry feedback for those stages. It never writes FEniCS solver templates, derives weak forms, or writes the numerical solver core. A deterministic dispatcher maps the validated specification to five human-written FEniCS/UFL templates: linear elasticity, hyperelasticity, elastoplasticity, thermo-mechanical coupling, and phase-field fracture. We validate this deterministic template layer against analytical solutions and published 2D/3D benchmarks. Smooth cases reach sub-percent agreement on adequate meshes, while harder nonlinear cases reach the 2-5 percent range. We also evaluate the LLM-facing front end directly. In a 15-prompt parser benchmark, first-pass valid parses were obtained for 9 cases, and all remaining cases were repaired after retry, giving a final valid parse rate of 100.0 percent, 100.0 percent problem-class accuracy, and 97.1 percent field-extraction accuracy. In a 10-case custom-geometry benchmark routed through the real LLM-to-Gmsh path, first-pass and final success were both 90.0 percent, with one unrecovered invalid-geometry failure. These results show that the parser and constrained prompt/validation design are effective on these benchmarks. As an end-to-end demonstration, the system generates and analyzes a 3D elastoplastic L-bracket with a fillet and bolt hole from one natural-language prompt. The contribution is a measured architecture for natural-language-driven variational simulation, not open-ended autonomous code generation.

2606.10942 2026-06-10 cs.NI cs.AI cs.LG 交叉投稿

Generative Explainability for Next-Generation Networks: LLM-Augmented XAI with Mutual Feature Interactions

下一代网络的生成式可解释性:基于互特征交互的LLM增强XAI

Kiarash Rezaei, Omran Ayoub, Sebastian Troia, Francesco Lelli, Paolo Monti, Carlos Natalino

发表机构 * Swedish Innovation Agency(瑞典创新署) Swiss Innovation Agency(瑞士创新署)

AI总结 提出一种利用大语言模型和互特征交互数据生成自然语言解释的框架,在光传输质量估计用例中,相比基线方法,解释有用性和范围分别提升12.2%和6.2%,正确率达97.5%。

Comments 7 pages, with one page for appendix. Accepted for publication at the 2025 21th International Conference on Wireless and Mobile Computing, Networking and Communications (WiMob)

详情
Journal ref
Proc. WiMob, Marrakesh, Morocco, 2025
AI中文摘要

随着人工智能和机器学习模型成为网络运营的核心,其缺乏透明度对运营商信任构成重大障碍。现有的可解释人工智能技术往往无法为非专家弥合这一差距,产生的技术输出难以转化为可操作的见解。本文提出了一个专门解决这一缺陷的框架。它利用中等规模的大语言模型,并超越了SHapley Additive exPlanations特征影响值的标准用法。该框架采用结构化的提示,并辅以互特征交互数据,以生成人类可理解的自然语言解释。为了验证我们的框架,我们在光传输质量估计用例中进行了实证评估,并邀请了人类评估者。我们收集了专家的独立性能评估,显示出较高的评估者间一致性。与仅使用SHAP特征影响值进行简单提示的最先进基线相比,我们的方法将解释有用性和范围分别提高了12.2%和6.2%,同时实现了97.5%的正确性。

英文摘要

As artificial intelligence and machine learning (AI/ML) models become integral to network operations, their lack of transparency poses a significant barrier to operator trust. Existing explainable artificial intelligence (XAI) techniques often fail to bridge this gap for non-specialists, producing technical outputs that are difficult to translate into actionable insights. This paper presents a framework specifically designed to address this shortcoming. It leverages a moderately sized large language model (LLM) and extends beyond the standard use of SHapley Additive exPlanations (SHAP) feature influence values. The framework employs a structured prompt enriched with mutual feature interaction data to generate human-understandable natural language explanations. To validate our framework, we performed an empirical evaluation on an optical quality of transmission (QoT) estimation use case with human evaluators. We collected independent performance evaluations from specialists, which showed a high inter-evaluator agreement. Compared to a state-of-the-art baseline that uses only SHAP feature influence values in a straightforward prompt, our approach improves the explanation usefulness and scope by 12.2% and 6.2%, while achieving 97.5% correctness.

2606.11023 2026-06-10 cs.IR cs.CL cs.LG 交叉投稿

Generative Archetype-Grounded Item Representations for Sequential Recommendation

生成式原型驱动的物品表示用于序列推荐

Yifan Li, Jiahong Liu, Xinni Zhang, Hao Chen, Yankai Chen, Wenhao Yu, Jianting Chen, Irwin King

发表机构 * The Chinese University of Hong Kong(香港中文大学) McGill University(麦吉尔大学) Tongji University(同济大学)

AI总结 提出GenAIR框架,利用大语言模型生成物品原型描述并提取嵌入,结合行为校准目标弥合语义与行为差距,显著提升序列推荐性能。

Comments Accepted by WWW 2026 (Oral)

详情
AI中文摘要

序列推荐旨在通过分析用户的历史行为来预测用户与物品的下一次交互。然而,物品表示的质量有限仍然是一个关键瓶颈。虽然预训练的大语言模型(LLM)可以提供丰富的语义表示,但现有方法仅依赖于固定属性的静态编码,忽视了目标受众在定义物品身份中的关键作用。此外,语义空间难以反映实际用户行为,导致语义表示与行为模式之间存在显著差距。为了解决这些局限性,我们提出了GenAIR,一个通用框架,通过生成式原型驱动的物品表示来增强序列推荐。具体来说,我们首先利用LLM分析物品元数据并推断原型的文本描述,该原型代表物品理想目标受众的概念轮廓。然后,我们在一次前向传播中提取相应的嵌入。此外,为了将这些生成式原型基于现实世界的行为,我们引入了一个行为校准目标,该目标明确地整合了来自实际交互的行为信号。该目标调整嵌入空间的结构以反映经验模式。GenAIR能够与大多数现有模型无缝集成,同时保持高效率。在三个真实世界数据集上进行的全面实验表明,GenAIR显著提高了各种序列推荐模型的性能,并始终优于最先进的基线方法。实现代码可在以下网址获取:https://this URL。

英文摘要

Sequential recommendation aims to predict users' next interaction with items by analyzing their historical behavior. However, the limited quality of item representations remains a critical bottleneck. While pre-trained large language models (LLMs) can provide rich semantic representations, existing approaches only rely on static encoding of fixed attributes, overlooking the crucial role of target audiences in defining item identity. Moreover, the semantic space struggles to reflect actual user behavior, resulting in a significant gap between semantic representations and behavioral patterns. To address these limitations, we propose GenAIR, a general framework that empowers sequential recommendation with Generative Archetype-grounded Item Representations. Specifically, we first leverage an LLM to analyze item metadata and infer textual description of the Archetype, which represents the conceptual profile of the item's ideal target audience. We then extract the corresponding embeddings in a single forward pass. Further, to ground these generative archetypes in real-world behavior, we introduce a behavioral calibration objective, which explicitly incorporates behavioral signals from actual interactions. This objective adjusts the structure of the embedding space to reflect empirical patterns. GenAIR enables seamless integration with most existing models while maintaining high efficiency. Comprehensive experiments conducted on three real-world datasets demonstrate that GenAIR significantly improves the performance of various sequential recommendation models and consistently outperforms state-of-the-art baseline approaches. Implementation codes are available at https://github.com/AI-Santiago/GenAIR.

2606.11125 2026-06-10 eess.SP cs.LG 交叉投稿

DMT: Demographic Conditioning, Morphology-Enhanced Transformer for Cuffless Blood Pressure Estimation from PPG Signals

DMT: 基于人口统计条件与形态增强Transformer的无袖带血压估计方法

Yidan Shen, Neville Mathew, Maham Rahimi, Deependra Dhakal, George Zouridakis, Xin Fu, Renjie Hu

发表机构 * University of California, San Diego(加州大学圣地亚哥分校)

AI总结 提出一种基于Transformer的PPG信号无袖带血压估计网络,通过FiLM风格特征调制融入人口统计信息,并添加辅助形态头引导模型关注与动脉僵硬度相关的波形形态,在PulseDB数据集上实现收缩压MAE 4.56 mmHg、舒张压MAE 2.62 mmHg。

详情
AI中文摘要

血压(BP)是心血管风险评估和治疗决策的关键指标,而光电容积描记术(PPG)能够实现低成本、可穿戴友好的无袖带血压估计。然而,即使近期取得了进展,许多基于PPG的模型仅通过血压回归进行训练,可能依赖于以振幅为主的捷径。此外,系统性调节血管顺应性的人口统计协变量通常仅通过后期融合纳入,限制了特定于主体的表示学习。我们提出了一种基于Transformer的网络,用于从PPG信号进行无袖带血压估计,利用自注意力机制捕获多个心动周期之间的长程依赖关系。为了考虑特定主体的血管差异,模型通过Transformer块的注意力和前馈子层中应用的FiLM风格特征调制,以人口统计信息为条件。此外,我们添加了一个辅助形态头,引导模型关注与动脉硬度和波反射相关的血压相关波形形态。在大型PulseDB数据集上基于校准的评估协议下,所提方法在收缩压上实现了4.56 mmHg的平均绝对误差(MAE),在舒张压上实现了2.62 mmHg,与先前的人口统计增强PPG基线相比,误差分别减少了47%和50。由此产生的轻量级单传感器模型支持在启用校准的部署场景中进行可扩展且临床可靠的无袖带血压估计。

英文摘要

Blood pressure (BP) is a key marker for cardiovascular risk assessment and therapeutic decision-making, and Photoplethysmography (PPG) enables low-cost, wearable-friendly cuffless BP estimation. However, even with recent progress, many PPG-based models are trained with BP regression alone and may rely on amplitude-dominated shortcuts. In addition, demographic covariates that systematically modulate vascular compliance are often incorporated only via late fusion, limiting subject-specific representation learning. We propose a Transformer-based network for cuffless BP estimation from PPG signal, leveraging self-attention to capture long-range dependencies across multiple cardiac cycles. To account for subject-specific vascular differences, the model is conditioned on demographics via FiLM-style feature modulation applied through the attention and feed-forward sublayers of Transformer blocks. In addition, we add an auxiliary morphology head to guide the model to attend to BP-relevant waveform morphology associated with arterial stiffness and wave reflection. Under calibration-based evaluation protocols on the large-scale PulseDB dataset, the proposed method achieves MAE of 4.56 mmHg for systolic BP and 2.62 mmHg for diastolic BP, reducing errors by 47% and 50% compared with prior demographic-enhanced PPG baselines. The resulting lightweight, single-sensor model supports scalable and clinically grounded cuffless BP estimation in calibration-enabled deployment settings.

2606.11140 2026-06-10 physics.geo-ph cs.AI cs.LG stat.AP stat.ML 交叉投稿

Data assimilation for subsurface flow using latent diffusion model parameterization: performance of ensemble-Kalman and Monte Carlo techniques

基于潜扩散模型参数化的地下流体数据同化:集成卡尔曼与蒙特卡洛技术的性能

Guido Di Federico, Wenchao Teng, Louis J. Durlofsky

发表机构 * Department of Energy Science & Engineering, Stanford University(能源科学与工程系,斯坦福大学)

AI总结 针对地下流体数据同化中高维参数反演问题,比较了基于潜扩散模型(LDM)的集成卡尔曼方法(ESMDA)与蒙特卡洛方法(MCMC/SMC)在三维河道地质模型上的性能,发现蒙特卡洛方法在保持地质真实性的同时能更有效地降低数据失配和不确定性。

详情
AI中文摘要

地下流体数据同化(DA)涉及校准模型参数以匹配观测数据(通常来自井),同时保持地质真实性。潜扩散模型(LDM)提供了从高维地质模型空间到低维潜变量的高效映射,降低了反问题的维度,同时保持了后验地质模型的合理性。然而,LDM映射的高度非线性可能会降低基于卡尔曼增益的集成更新的性能。我们针对具有层次地质不确定性的三维河道地质模型,系统比较了DA算法。我们使用多重数据同化集成平滑器(ESMDA)比较了模型空间和潜空间的DA,并展示了一个关键权衡:模型空间更新实现了显著的不确定性降低,但产生了地质上不现实的后验模型,而潜空间更新保持了真实性但表现出有限的不确定性降低。受此启发,我们在3D-LDM潜空间中探索了严格的马尔可夫链蒙特卡洛(MCMC)和序贯蒙特卡洛(SMC)算法。为适应其高计算需求,我们开发了一个快速代理流模型来近似井响应。MCMC和SMC在三个合成测试案例中与ESMDA进行了评估,DA在LDM潜空间中执行。由于LDM参数化,所有模型都保持了地质真实性。MCMC和SMC彼此一致,并且比潜空间ESMDA实现了更低的数据失配和更多的不确定性降低。我们的总体结果表明,集成卡尔曼方法在高度非线性参数化下可能提供过高的后验不确定性,而由快速代理模型支持的严格蒙特卡洛采样可以提供更可靠的替代方案。

英文摘要

Data assimilation (DA) in subsurface flow entails calibrating model parameters to match observed data, typically at wells, while preserving geological realism. Latent diffusion models (LDMs) provide efficient mappings from high-dimensional geological model space to a low-dimensional latent variable, reducing the dimensionality of the inverse problem while maintaining plausibility in posterior geomodels. However, the high nonlinearity in the LDM mapping may degrade the performance of Kalman-gain-based ensemble updates. We present a systematic comparison of DA algorithms applied to large-scale 3D channelized geomodels with hierarchical geological uncertainty. We compare model-space and latent-space DA using the ensemble smoother with multiple data assimilation (ESMDA), and demonstrate a key trade-off: model-space updates achieve significant uncertainty reduction but produce geologically unrealistic posterior models, while latent-space updates preserve realism but exhibit limited uncertainty reduction. Motivated by this, we explore rigorous Markov chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) algorithms in the 3D-LDM latent space. To accommodate their high computational demands, we develop a fast surrogate flow model that approximates well-rate responses. MCMC and SMC are evaluated against ESMDA across three synthetic test cases, with DA performed in the LDM latent space. All models maintain geological realism due to the LDM parameterization. MCMC and SMC are consistent with one another and achieve lower data mismatch and more uncertainty reduction than latent-space ESMDA. Our overall results demonstrate that ensemble Kalman methods may provide overestimated posterior uncertainty with highly nonlinear parameterizations, while rigorous Monte Carlo sampling, enabled by fast surrogate models, can provide a more reliable alternative.

2503.19158 2026-06-10 cs.LG q-bio.QM 版本更新

Integrating Biological-Informed Recurrent Neural Networks for Glucose-Insulin Dynamics Modeling

整合生物信息递归神经网络用于葡萄糖-胰岛素动态建模

Stefano De Carli, Nicola Licini, Davide Previtali, Fabio Previdi, Antonio Ferramosca

发表机构 * Department of Management, Information and Production Engineering, University of Bergamo(管理、信息与生产工程系,贝加莫大学)

AI总结 本文提出生物信息递归神经网络框架,用于更准确地建模葡萄糖-胰岛素动态,以提高人工胰腺系统的个性化血糖调节能力。

Comments Accepted for publication in the proceedings of the Engineering Diabetes Technologies (EDT 2025). 7 pages, 2 figures and 1 table

详情
Journal ref
IFAC-PapersOnLine, 59(2), 2025, pp. 91-96
AI中文摘要

1型糖尿病管理由于多种变异性因素而复杂。人工胰腺系统通过先进控制算法自动化胰岛素输送,减轻了患者负担。然而,这些系统的有效性依赖于对葡萄糖-胰岛素动态的准确建模,而传统数学模型往往无法捕捉到患者特异性变化。本文引入了生物信息递归神经网络(BIRNN)框架,该框架利用门控递归单元(GRU)架构,并辅以包含生理约束的物理信息损失函数,确保预测准确性和生物原理的一致性。该框架通过商业UVA/Padova模拟器验证,其在葡萄糖预测准确性和未测量状态重构方面优于传统线性模型,即使在胰岛素敏感性昼夜变化下也表现优异。结果表明,BIRNN在人工胰腺系统的个性化葡萄糖调节和未来自适应控制策略中具有潜力。

英文摘要

Type 1 Diabetes (T1D) management is a complex task due to many variability factors. Artificial Pancreas (AP) systems have alleviated patient burden by automating insulin delivery through advanced control algorithms. However, the effectiveness of these systems depends on accurate modeling of glucose-insulin dynamics, which traditional mathematical models often fail to capture due to their inability to adapt to patient-specific variations. This study introduces a Biological-Informed Recurrent Neural Network (BIRNN) framework to address these limitations. The BIRNN leverages a Gated Recurrent Units (GRU) architecture augmented with physics-informed loss functions that embed physiological constraints, ensuring a balance between predictive accuracy and consistency with biological principles. The framework is validated using the commercial UVA/Padova simulator, outperforming traditional linear models in glucose prediction accuracy and reconstruction of unmeasured states, even under circadian variations in insulin sensitivity. The results demonstrate the potential of BIRNN for personalized glucose regulation and future adaptive control strategies in AP systems.

2511.01927 2026-06-10 cs.LG cs.AI cs.NA math.NA 版本更新

Learning-Guided Integration Contours Construction for Fast Large-Scale Generalized Eigensolvers

学习引导的积分轮廓构建用于快速大规模广义特征值求解器

Yeqiu Chen, Ziyan Liu, Hong Wang, Lei Liu

发表机构 * University of Science(科学技术大学)

AI总结 提出Deepcontour混合框架,结合深度学习谱预测器与核密度估计自动构建优化积分轮廓,加速大规模广义特征值求解,实现最高5.63倍加速并保持数值精度。

详情
AI中文摘要

解决大规模广义特征值问题(GEPs)是科学与工程中一项基本但计算上极为困难的任务。作为一种有前景的方向,轮廓积分(CI)方法提供了高效且可并行化的框架。然而,其性能关键依赖于积分轮廓的选择——在没有可靠先验知识的情况下,不当选择可能导致显著的计算开销并损害数值精度。为应对这一挑战,我们提出Deepcontour,一种新颖的混合框架,它将基于深度学习的谱预测器与核密度估计(KDE)相结合,用于原则性的轮廓设计。具体而言,Deepcontour利用其专用的特征神经算子(ENO)提供快速的谱分布先验,驱动KDE模块自动构建优化的积分轮廓,从而引导CI求解器高效地找到所需特征值。Deepcontour在多种科学数据集上实现了高达5.63倍的加速,同时保持严格的数值精度。通过融合深度学习的预测能力与经典求解器的数值严谨性,这项工作为解决大规模GEPs建立了一种高效且稳健的范式。

英文摘要

Solving large-scale Generalized Eigenvalue Problems (GEPs) is a fundamental yet computationally prohibitive task in science and engineering. As a promising direction, contour integral (CI) methods offer an efficient and parallelizable framework. However, their performance is critically dependent on the selection of integration contours -- improper selection without reliable prior knowledge of eigenvalue distribution can incur significant computational overhead and compromise numerical accuracy. To address this challenge, we propose Deepcontour, a novel hybrid framework that integrates a deep learning-based spectral predictor with Kernel Density Estimation (KDE) for principled contour design. Specifically, Deepcontour utilizes its specialized Eigen-Neural-Operator (ENO) to provide rapid spectral distribution priors, driving a KDE module to automatically construct the optimized integration contours, which guide the CI solver to efficiently find the desired eigenvalues. Deepcontour achieves up to a 5.63x speedup across diverse scientific datasets while maintaining strict numerical rigor. By merging the predictive power of deep learning with the numerical rigor of classical solvers, this work establishes an efficient and robust paradigm for solving large-scale GEPs.

2601.14653 2026-06-10 cs.LG q-bio.GN 版本更新

Efficient Imputation for Patch-based Missing Single-cell Data via Cluster-regularized Optimal Transport

基于聚类正则化最优传输的块缺失单细胞数据高效插补

Yuyu Liu, Jiannan Yang, Ziyang Yu, Weishen Pan, Fei Wang, Tengfei Ma

发表机构 * Department of Computer Science, Stony Brook University(计算机科学系,石板溪大学) Department of Computer Science, Emory University(计算机科学系,埃默里大学) Department of Population Health Sciences, Cornell University(流行病学与公共卫生系,康奈尔大学) Department of Computer Science and Engineering, Cornell University(计算机科学与工程系,康奈尔大学) Department of Biomedical Informatics, Stony Brook University(生物医学信息学系,石板溪大学)

AI总结 提出CROT算法,利用最优传输处理单细胞数据中的块缺失问题,实现高精度插补并显著降低运行时间。

Comments Accepted to ACM-BCB 2026

详情
AI中文摘要

单细胞测序数据集中的缺失数据对提取有意义的生物学见解构成了重大挑战。然而,现有的插补方法通常假设数据均匀且完整,难以处理存在大片缺失数据的情况。在本文中,我们提出了CROT(聚类正则化最优传输),一种基于最优传输的插补算法,旨在处理表格格式中的块缺失数据。我们的方法在存在显著缺失的情况下有效捕捉底层数据结构。值得注意的是,它在显著减少运行时间的同时实现了优越的插补精度,展示了其在大规模数据集上的可扩展性和效率。这项工作为具有结构化数据缺失的异质性高维数据集提供了一种鲁棒的插补解决方案,解决了生物学和临床数据分析中的关键挑战。我们的代码可在GitHub上获取,https://github.com/yuyuliu11037/CROT。

英文摘要

Missing data in single-cell sequencing datasets poses significant challenges for extracting meaningful biological insights. However, existing imputation approaches, which often assume uniformity and data completeness, struggle to address cases with large patches of missing data. In this paper, we present CROT (Cluster-Regularized Optimal Transport), an optimal transport-based imputation algorithm designed to handle patch-based missing data in tabular formats. Our approach effectively captures the underlying data structure in the presence of significant missingness. Notably, it achieves superior imputation accuracy while significantly reducing runtime, demonstrating its scalability and efficiency for large-scale datasets. This work introduces a robust solution for imputation in heterogeneous, high-dimensional datasets with structured data absence, addressing critical challenges in both biological and clinical data analysis. Our code is available on GitHub, https://github.com/yuyuliu11037/CROT.

2602.02788 2026-06-10 cs.LG cs.AI physics.comp-ph 版本更新

Structure-Preserving Learning Improves Geometry Generalization in Neural PDEs

保结构学习提升神经PDE在几何泛化中的表现

Benjamin D. Shaffer, Shawn Koohy, Brooks Kinch, M. Ani Hsieh, Nathaniel Trask

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出General-Geometry Neural Whitney Forms (Geo-NeW)方法,通过联合学习微分算子和兼容的降阶有限元空间,结合有限元外微积分精确保持物理守恒律,显著提升对未见几何域的泛化能力。

详情
AI中文摘要

我们旨在为科学和工程开发物理基础模型,这些模型能为偏微分方程(PDE)提供实时解,并在适应未见几何时保持结构和精度。为此,我们引入了通用几何神经Whitney形式(Geo-NeW):一种数据驱动的有限元方法。我们联合学习一个微分算子和定义在底层几何上的兼容降阶有限元空间。求解所得模型以生成预测,同时通过有限元外微积分精确保持物理守恒律。几何通过基于Transformer的编码和作为学习到的有限元空间的基础,以离散化网格的形式进入模型。这明确地将底层几何和施加的边界条件与解联系起来,为学习神经PDE提供了强大的归纳偏置,我们证明这改善了对未见域的泛化。我们提供了一种本构模型的新参数化,确保解的存在性和唯一性。我们的方法在几个稳态PDE基准测试中展示了最先进的性能,并在分布外几何上比传统基线有显著改进。

英文摘要

We aim to develop physics foundation models for science and engineering that provide real-time solutions to Partial Differential Equations (PDEs) which preserve structure and accuracy under adaptation to unseen geometries. To this end, we introduce General-Geometry Neural Whitney Forms (Geo-NeW): a data-driven finite element method. We jointly learn a differential operator and compatible reduced finite element spaces defined on the underlying geometry. The resulting model is solved to generate predictions, while exactly preserving physical conservation laws through Finite Element Exterior Calculus. Geometry enters the model as a discretized mesh both through a transformer-based encoding and as the basis for the learned finite element spaces. This explicitly connects the underlying geometry and imposed boundary conditions to the solution, providing a powerful inductive bias for learning neural PDEs, which we demonstrate improves generalization to unseen domains. We provide a novel parameterization of the constitutive model ensuring the existence and uniqueness of the solution. Our approach demonstrates state-of-the-art performance on several steady-state PDE benchmarks, and provides a significant improvement over conventional baselines on out-of-distribution geometries.

2602.06411 2026-06-10 cs.LG 版本更新

DAH-Net: A Dual-Attention Hybrid Network for Interpretable and Robust EEG-Based Emotion Recognition

DAH-Net: 用于可解释且鲁棒的基于脑电情绪识别的双注意力混合网络

S M Rakib UI Karim, Diponkor Bala, Wenyi Lu, Rownak Ara Rasul, Sean Goggins

发表机构 * Department of Electrical \& Computer Engineering University of Missouri Columbia, Missouri, USA Department of Computer Science Engineering City University Savar, Dhaka-1340, Bangladesh Department of Computer Science University of Missouri Columbia, Missouri, USA

AI总结 提出DAH-Net双注意力混合网络,集成1D-CNN、BiLSTM和双多头注意力,在2479样本988特征上达到99.19%测试准确率,优于传统模型,并通过特征分析揭示协方差特征贡献最大。

详情
AI中文摘要

基于脑电的情绪识别支持情感脑机接口和心理健康监测,但仍面临信号复杂性、受试者变异性和可解释性有限的挑战。我们提出DAH-Net,一种双注意力混合网络,集成1D-CNN、BiLSTM和双多头注意力(16+8头),用于三类脑电情绪分类。在2479个样本(988个脑电特征)上评估,DAH-Net达到99.19%的留出测试准确率,训练-测试差距为0.81%,优于RF(96.17%)、SVM(96.77%)、MLP(97.18%)和Transformer(98.19%)基线。Friedman检验(χ²=28.54,p<0.001)和事后Wilcoxon比较证实了统计显著性。使用随机森林重要性、SHAP归因和特征类别隔离进行的特征级分析表明,协方差特征达到接近基线的独立准确率(94.96%),而特征值特征独立表现有限(84.07%),但提供了紧凑的互补信息。紧凑架构(3.33M参数,使用32位权重约13.3MB)表明未来轻量级基于脑电的情感计算的潜力,有待受试者独立和外部验证。

英文摘要

EEG-based emotion recognition supports affective brain-computer interfaces and mental health monitoring yet remains challenged by signal complexity, subject variability, and limited interpretability. We propose DAH-Net, a dual-attention hybrid network integrating 1D-CNN, BiLSTM, and dual multi-head attention (16+8 heads) for three-class EEG emotion classification. Evaluated on 2,479 samples with 988 EEG features, DAH-Net achieves 99.19% held-out test accuracy with a 0.81% train-test gap, outperforming RF (96.17%), SVM (96.77%), MLP (97.18%), and Transformer (98.19%) baselines. Friedman testing (\c{hi}2 = 28.54, p < 0.001) and post-hoc Wilcoxon comparisons confirm statistical significance. Feature-level analysis using Random Forest importance, SHAP attribution, and feature category isolation shows that covariance features achieve near-baseline standalone accuracy (94.96%), while eigenvalue features show limited standalone performance (84.07%) but provide compact complementary information. The compact architecture (3.33M parameters, approximately 13.3MB using 32-bit weights) suggests potential for future lightweight EEG-based affective computing, pending subject-independent and external validation.

2603.02221 2026-06-10 cs.LG cs.AI 版本更新

MedFeat: Model-Aware and Explainability-Driven Feature Engineering with LLMs for Clinical Tabular Prediction

MedFeat: 基于模型感知与可解释性驱动的LLM特征工程用于临床表格预测

Zizheng Zhang, Yiming Li, Justin Xu, Jinyu Wang, Rui Wang, Lei Song, Jiang Bian, David W Eyre, Jingjing Fu

发表机构 * Microsoft Research(微软研究院) University of Oxford(牛津大学)

AI总结 提出MedFeat框架,利用模型感知和特征重要性信号迭代引导LLM生成针对性特征,在临床表格预测中平均提升超10%。

详情
AI中文摘要

在临床表格预测中,带有特征工程的经典机器学习模型通常优于神经方法。LLM越来越多地被用于自动化这一过程,作为领域专家提出多样化的特征变换以提升下游性能。然而,现有的基于LLM的方法将特征生成与下游模型解耦:LLM未接收到关于哪些特征当前驱动预测或模型表示能力不足的信号,因此提议既不针对特征空间中有前景的区域,也不适应学习器的归纳偏差。这一缺陷在医疗数据中尤为突出,医疗数据同时表现出类别不平衡、异质特征空间和严格的可解释性要求。本文提出MedFeat,这是首个受机器学习从业者工作流程启发的特征工程框架,利用模型感知和特征重要性信号迭代地指导临床表格学习的特征发现。我们在广泛的具有挑战性的真实临床任务上评估MedFeat,并表明它在统计上显著优于最先进的基线,在具有不同归纳偏差的模型上平均提升超过10%。

英文摘要

In clinical tabular prediction, classical machine learning models with feature engineering often outperform neural methods. LLMs are increasingly used to automate this process, acting as domain experts that propose diverse feature transformations to boost downstream performance. However, existing LLM-based methods decouple feature generation from the downstream model: the LLM receives no signal about which features currently drive predictions or where the model's representational capacity falls short, so proposals are neither targeted to promising regions of the feature space nor tailored to the learner's inductive bias. This shortcoming is amplified in healthcare data, which simultaneously exhibits class imbalance, heterogeneous feature spaces, and strict interpretability requirements. In this paper, we propose MedFeat, the first feature engineering framework inspired by the workflow of machine learning practitioners, leveraging model-awareness and feature importance signals to iteratively guide feature discovery for clinical tabular learning. We evaluate MedFeat on a broad range of challenging real-world clinical tasks and show that it statistically significantly outperforms state-of-the-art baselines, with an average improvement of more than 10% over the baseline across models with distinct inductive biases.

2603.22017 2026-06-10 cs.LG 版本更新

Domain Adapted Large Language Models for Additive Manufacturing

面向增材制造的领域自适应大语言模型

Peter Pak, Amir Barati Farimani

发表机构 * Department of Mechanical Engineering, Carnegie Mellon University(机械工程系,卡内基梅隆大学)

AI总结 本文通过约5000万token的小型数据集对开源大语言模型进行领域自适应预训练和指令微调,构建多模态领域自适应模型,在增材制造基准测试中达到90%以上准确率。

详情
AI中文摘要

本文提出了一系列多模态领域自适应大语言模型,这些模型基于指令微调变体的开源权重模型(Gemma 3、Qwen 3、Gemma 4),使用约5000万token的相对小型数据集构建。该数据集包含开放获取的增材制造期刊文章,从中提取数据用于领域自适应预训练和视觉指令微调过程。开发模型的各个阶段使用增材制造基准进行评估,该基准由增材制造领域特定任务和已发布资源组成。领域自适应和指令微调模型在语言和视觉任务中均表现出色,在通用增材制造知识方面达到90%以上的准确率。这种领域自适应预训练和指令微调策略为大语言模型在增材制造等领域的专业化提供了一种可访问的方法。

英文摘要

This work presents a collection of multi-modal domain adapted large language models built upon the instruction tuned variants of open weight models (Gemma 3, Qwen 3, Gemma 4) using a relatively small dataset of around 50 million tokens. The dataset consists of open-access additive manufacturing journal articles with data extracted for the domain adaptive pretraining and visual instruction tuning processes. Various stages of the developed model are evaluated with the Additive-Manufacturing-Benchmark which consists of additive manufacturing domain specific tasks compiled published resources. Domain adapted and instruction tuned models exhibit proficiency in both language and vision based tasks, achieving accuracies upwards of 90% in general additive manufacturing knowledge. This domain adaptive pretraining and instruction tuning strategy outline an accessible specialization method for large language models to a domain such as additive manufacturing.

2604.07085 2026-06-10 cs.LG 版本更新

Mining Electronic Health Records to Investigate Effectiveness of Ensemble Deep Clustering

挖掘电子健康记录以研究集成深度聚类的有效性

Manar D. Samad, Yina Hou, Shrabani Ghosh

发表机构 * Department of Computer Science(计算机科学系) Tennessee State University(田纳西州立大学)

AI总结 针对电子健康记录中传统聚类方法在嵌入表示上的局限,提出基于集成嵌入的深度聚类方法,结合多种嵌入维度与经典聚类,在心力衰竭患者队列中取得最佳综合性能。

Comments 2026 14th IEEE Conference on Healthcare Informatics

详情
AI中文摘要

在电子健康记录(EHR)中,对患者进行聚类和区分疾病亚型是阐明病理生理学并辅助临床决策的关键任务。然而,医疗信息学中的聚类仍基于传统方法,尤其是K-means,当将其作为混合方法应用于自编码器学习的嵌入表示时,取得的成功有限。本文利用来自“All of Us”研究计划的真实EHR数据,研究了传统、混合和深度学习方法在心力衰竭患者队列中的有效性。传统聚类方法表现稳健,因为深度学习方法专门为图像聚类设计,该任务与表格型EHR数据设置显著不同。为了解决深度聚类的不足,我们引入了一种基于集成的深度聚类方法,该方法聚合从多个嵌入维度获得的聚类分配,而不是依赖于单个固定的嵌入空间。当在新型集成框架中与传统聚类结合时,所提出的用于深度聚类的集成嵌入在14种不同的聚类方法和多个患者队列中取得了最佳的整体性能排名。本文强调了EHR数据的生物学性别特异性聚类的重要性,以及将传统和深度聚类方法相结合相对于单一方法的优势。

英文摘要

In electronic health records (EHRs), clustering patients and distinguishing disease subtypes are key tasks to elucidate pathophysiology and aid clinical decision-making. However, clustering in healthcare informatics is still based on traditional methods, especially K-means, and has achieved limited success when applied to embedding representations learned by autoencoders as hybrid methods. This paper investigates the effectiveness of traditional, hybrid, and deep learning methods in heart failure patient cohorts using real EHR data from the All of Us Research Program. Traditional clustering methods perform robustly because deep learning approaches are specifically designed for image clustering, a task that differs substantially from the tabular EHR data setting. To address the shortcomings of deep clustering, we introduce an ensemble-based deep clustering approach that aggregates cluster assignments obtained from multiple embedding dimensions, rather than relying on a single fixed embedding space. When combined with traditional clustering in a novel ensemble framework, the proposed ensemble embedding for deep clustering delivers the best overall performance ranking across 14 diverse clustering methods and multiple patient cohorts. This paper underscores the importance of biological sex-specific clustering of EHR data and the advantages of combining traditional and deep clustering approaches over a single method.

2604.26991 2026-06-10 cs.LG cs.AI 版本更新

People-Centred Medical Image Analysis via Fairness-Aware Human-AI Cooperation

以人为本的医学图像分析:通过公平感知的人机协作

Zheng Zhang, Milad Masroor, Cuong Nguyen, Tahir Hassan, Yuanhong Chen, David Rosewarne, Kevin Wells, Thanh-Toan Do, Gustavo Carneiro

发表机构 * arXiv

AI总结 提出PecMan框架,联合建模子群依赖可靠性、决策分配和协作预测,通过门控与整合机制动态分配病例给自动模型或人类专家,无需测试时敏感属性,实现公平感知的人机协作分类。

详情
AI中文摘要

医学图像分析的机器学习模型通常表现出子群依赖的性能,这影响了在有限资源下如何在自动化系统和人类专家之间分配决策。先前关于AI公平性和人机协作的工作,包括学习推迟(L2D)和学习互补(L2C),通常孤立地处理这些问题。我们提出了以人为本的医学图像分析(PecMan),一个用于公平感知的人机协作分类框架,它联合建模子群依赖的可靠性、决策分配和协作预测。PecMan结合了子群专门的预测器与一个门控和整合机制,该机制动态地将病例分配给自动化模型、人类专家或它们的组合,而无需在测试时使用敏感属性。我们还引入了FairHAI基准,用于评估预测准确性、子群公平性和人类参与之间的权衡。此外,我们通过选择遗憾对多智能体门控进行了理论分析,并刻画了在输入依赖分配下的公平性-覆盖权衡。在多个医学影像数据集上的实验表明,与单独处理公平性或人机协作的方法相比,PecMan实现了持续改进的权衡。

英文摘要

Machine learning models for medical image analysis often exhibit subgroup-dependent performance, which impacts how decisions should be allocated between automated systems and human experts under limited resources. Prior work on AI fairness and human-AI cooperation, including learning to defer (L2D) and learning to complement (L2C), typically addresses these problems in isolation. We propose People-Centred Medical Image Analysis (PecMan), a framework for fairness-aware human-AI co-operative classification that jointly models subgroup-dependent reliability, decision allocation, and collaborative prediction. PecMan combines subgroup-specialised predictors with a gating and consolidation mechanism that dynamically assigns cases to automated models, human experts, or their combination, without requiring sensitive attributes at test time. We also introduce the FairHAI benchmark for evaluating trade-offs between predictive accuracy, subgroup equity, and human involvement. In addition, we provide a theoretical analysis of multi-agent gating via selection regret and characterise fairness-coverage trade-offs under input-dependent allocation. Experiments across multiple medical imaging datasets demonstrate that PecMan achieves consistently improved trade-offs compared to methods that address fairness or human-AI cooperation separately.

2606.07135 2026-06-10 cs.LG 版本更新

Explaining Unsupervised Disease Staging in Huntington's Disease: Insights into Model Representations and Clusters

解释亨廷顿病中的无监督疾病分期:模型表示与聚类洞察

Lubna Mahmoud Abu Zohair, Hind Zantout

发表机构 * Heriot-Watt University(赫瑞-沃德大学)

AI总结 本文通过可解释性分析扩展无监督疾病分期框架,在Enroll-HD数据集上揭示模型嵌入与临床进展一致,并利用SHAP量化特征重要性,识别出从早期认知运动障碍到严重功能依赖的疾病阶段。

Comments Accepted for oral presentation and as a full-length paper at the International Conference on AI in Healthcare 2026 (26-28 August 2026, Imperial College London) and will be published by Springer in the Lecture Notes in Computer Science (LNCS) series

详情
AI中文摘要

亨廷顿病(HD)是一种进行性神经退行性疾病,影响运动、认知和行为功能,准确描述疾病进展对于改善患者预后和生活质量至关重要。无监督机器学习(ML)方法已证明能够从纵向数据中发现疾病进展轨迹和有意义的潜在阶段;然而,其有限的可解释性限制了临床信任和转化。我们通过将可解释性分析应用于提取的特征表示和发现的疾病阶段,扩展了先前提出的基于ML的疾病分期框架。应用于Enroll-HD数据集,我们首先将学习到的表示投影到低维空间,以直观评估所得聚类是否与既定临床指标的进展一致。然后,我们使用显著性图识别随时间对学习嵌入贡献最大的临床特征。最后,我们训练一个替代分类器并应用SHAP来量化特征对聚类分配的重要性,并分析哪些临床变量驱动疾病阶段之间的转换。可解释性分析表明,学习到的嵌入捕捉了具有临床意义的疾病结构,与既定的运动和功能严重程度评分一致,并显示出跨聚类的进行性恶化。在此分析中,SHAP揭示了疾病阶段的分层,范围从早期认知运动障碍到严重功能依赖,与已知的临床进展模式一致,同时也突出了阶段内变异性。

英文摘要

Huntington's disease (HD) is a progressive neurodegenerative disorder that affects motor, cognitive, and behavioral functions, where accurate characterization of disease progression remains essential to improve patient outcome and quality of life. Unsupervised machine learning (ML) approaches have demonstrated the ability to uncover disease progression trajectories and meaningful latent stages from longitudinal data; however, their limited interpretability restricts clinical trust and translation. We extend a previously proposed ML-based disease staging framework by applying an explainability analysis to the extracted feature representations and discovered disease stages. Applied to the Enroll-HD dataset, we first project the learned representations into a lower-dimensional space to intuitively assess whether the resulting clusters align with the progression of established clinical measures. We then use saliency maps to identify the clinical features that most strongly contribute to the learned embeddings over time. Finally, we train a surrogate classifier and apply SHAP to quantify feature importance for cluster assignments and to analyze which clinical variables drive transitions between disease stages. The explainability analysis indicates that the learned embeddings capture clinically meaningful disease structure, aligning with established motor and functional severity scores and exhibiting progressive deterioration across clusters. Within this analysis, SHAP reveals a stratification of disease stages, ranging from early cognitive-motor impairment to severe functional dependency, consistent with known clinical progression patterns, while also highlighting intra-stage variability.

2512.18531 2026-06-10 physics.chem-ph cs.LG 版本更新

Pushing the limits of one-dimensional NMR spectroscopy for automated structure elucidation using artificial intelligence

利用人工智能推动一维核磁共振波谱在自动结构解析中的极限

Frank Hu, Jonathan M. Tubb, Dimitris Argyropoulos, Sergey Golotvin, Mikhail Elyashberg, Grant M. Rotskoff, Matthew W. Kanan, Thomas E. Markland

发表机构 * Department of Chemistry, Stanford University(化学系,斯坦福大学) ACD/Labs(ACD实验室)

AI总结 提出基于Transformer的深度学习框架,仅利用一维1H和13C NMR谱,对含多达40个非氢原子的有机分子实现60.4%的首次15次预测准确率,克服化学空间组合爆炸。

详情
AI中文摘要

一维核磁共振波谱是有机化合物和天然产物表征中最广泛使用的技术之一。对于含有最多36个非氢原子的分子,可能的结构数量估计在$10^{20} - 10^{60}$范围内。因此,仅使用其一维$^1$H和/或$^{13}$C NMR谱来确定该大小分子的结构(分子式和连接性),即从头结构生成,似乎完全不可行。在这里,我们展示了如何通过深度学习框架,对含有最多40个非氢原子且涵盖有机化学中常见元素(C、N、O、H、P、S、Si、B和卤素)的系统实现这一任务,从而覆盖了类药化学空间的绝大部分。利用自然语言处理的见解,我们展示了基于Transformer的架构仅使用$^1$H和$^{13}$C NMR谱,在前15次预测中正确预测分子的准确率达到60.4%,从而克服了化学空间的组合增长,同时通过微调也可扩展到实验数据。

英文摘要

One-dimensional NMR spectroscopy is one of the most widely used techniques for the characterization of organic compounds and natural products. For molecules with up to 36 non-hydrogen atoms, the number of possible structures has been estimated to range from $10^{20} - 10^{60}$. The task of determining the structure (formula and connectivity) of a molecule of this size using only its one-dimensional $^1$H and/or $^{13}$C NMR spectrum, i.e. de novo structure generation, thus appears completely intractable. Here we show how it is possible to achieve this task for systems with up to 40 non-hydrogen atoms across the full elemental coverage typically encountered in organic chemistry (C, N, O, H, P, S, Si, B, and the halogens) using a deep learning framework, thus covering a vast portion of the drug-like chemical space. Leveraging insights from natural language processing, we show that our transformer-based architecture predicts the correct molecule with 60.4% accuracy within the first 15 predictions using only the $^1$H and $^{13}$C NMR spectra, thus overcoming the combinatorial growth of the chemical space while also being extensible to experimental data via fine-tuning.

2601.05232 2026-06-10 cs.CL cs.CY cs.LG 版本更新

AI Application Gives Users Real-Time Feedback on the Level of Peace in the Social Media Videos They Watch

AI应用为用户观看的社交媒体视频提供实时和平水平反馈

P. Gilda, P. Dungarwal, A. Thongkham, E. T. Ajayi, S. Choudhary, T. M. Terol, C. Lam, J. P. Araujo, M. McFadyen-Mungalln, L. S. Liebovitch, P. T. Coleman, H. West, K. Sieck, S. Carter

发表机构 * Data Science Institute, Columbia University(哥伦比亚大学数据科学研究所) Advanced Consortium on Cooperation, Conflict, and Complexity, Columbia University(哥伦比亚大学合作、冲突与复杂性高级联合体) Computer Science, Columbia University(哥伦比亚大学计算机科学) Data Science, St John’s University(圣约翰大学数据科学) Quantitative Methods in the Social Sciences, Columbia University(哥伦比亚大学社会科学定量方法) Barnard College, Columbia University(哥伦比亚大学巴纳德学院) Teachers College, Columbia University(哥伦比亚大学教师学院) Department of Industrial Engineering and Operations Research, Columbia University(哥伦比亚大学工业工程与运筹学系) Harmonious Communities, Toyota Research Institute(丰田研究院和谐社区)

AI总结 开发了一个实时分析YouTube视频中语言和平程度的AI应用,使用监督学习和大语言模型,大语言模型在测量和平相关社会维度上更接近人类编码者。

Comments 6 pages, 4 figures, corrected typos, minor edits; v3: 16 pages, improved title, abstract, introduction, discussion, conclusions, added more references

详情
AI中文摘要

现在大多数人通过社交媒体(如YouTube和Facebook)上的视频获取新闻,而不是通过精心策划的新闻业。“我们成为我们所注视的。”语言的内容和语调在开始或结束冲突中起着至关重要的作用。“仇恨言论”会加剧冲突,“和平言论”会促进和平。我们开发了一个应用程序,可以实时测量YouTube视频中这些方面的言论,从而为用户提供关于自身媒体消费的有用反馈。我们使用了两种方法:1)监督机器学习。在线新闻媒体文本中的语言通过衡量这些国家和平水平的调查进行标记。一个全连接前馈网络和两个卷积神经网络在该数据上训练,在测试集上预测和平水平的准确率约为97%,在另一个不同的新闻文本数据集中准确率约为70%,但未能泛化到YouTube视频,表明书面文本与转录的口语不同。2)社会科学维度。没有类似的外部数据来标记YouTube视频转录文本中的语言。因此,我们使用了2个词级情感分析(SA)和6个上下文级大语言模型(LLM)来测量59项社会科学研究确定的和平中的5个社会维度:同情-蔑视、新闻-观点、促进-预防、创造力-秩序、细微差别-简化。在52个视频上,LLM与3个人类编码者的值更接近(r^2~0.60),而SA的r^2~0.03。结果:与人类编码者相比,LLM成功测量了YouTube视频中与和平相关的重要社会维度。这些结果构成了一个分析引擎的基础,该引擎可以为用户和内容创作者提供关于自身媒体消费和创作的反馈。

英文摘要

Most people now get their news from videos on social media, such as YouTube and Facebook, rather than through curated journalism. "We become what we behold." The content and tone of language plays an essential role in starting or ending conflicts. "Hate Speech" can enhance conflict, "Peace Speech" can enhance peace. We developed an application that measures, in real time, these aspects of speech from YouTube videos, which can give users helpful feedback on their own media diet. We used two approaches: 1) supervised machine learning. Language in the text of online news media text was tagged by surveys that measure the level of peace in those countries. One fully connected feedforward and 2 convolutional neural networks trained on that data were $\sim 97\%$ accurate in predicting levels of peace in the test set and $\sim 70\%$ accurate in another distinct news text data set, but did not generalize to YouTube videos, suggesting that written text is different than transcribed spoken language. 2) social science dimensions. There is no similar external data to tag the text in the YouTube video transcripts. We therefore used 2 word-level sentiment analysis (SA) and 6 context-level large language models (LLMs) to measure 5 social dimensions in peace identified by 59 social science studies: compassion-contempt, news-opinion, promotion-prevention, creativity-order, nuance-simplification. LLMs more closely matched the values by 3 human coders on 52 videos, $r^2\sim0.60$ than SA, at $r^2\sim0.03$. Results: LLMs successfully measured social dimensions important in peace in YouTube videos, compared to human coders. These results serve as the basis of an analysis engine that can give users and content creators feedback on their own media diet and creations.

2604.13213 2026-06-10 stat.ML cs.LG math.OC physics.chem-ph 版本更新

Rare Event Analysis via Stochastic Optimal Control

基于随机最优控制的稀有事件分析

Yuanqi Du, Jiajun He, Dinghuai Zhang, Eric Vanden-Eijnden, Carles Domingo-Enrich

发表机构 * Microsoft Research New England(微软研究院新英格兰分部) Cornell University(康奈尔大学) University of Cambridge(剑桥大学) Courant Institute of Mathematical Sciences, NYU(纽约大学Courant数学科学研究所)

AI总结 提出将稀有事件分析中的committor函数估计转化为随机最优控制问题,通过反馈控制引导轨迹采样,并开发两种损失函数及处理亚稳态的方法,在基准系统上获得更准确的结果。

详情
AI中文摘要

稀有事件,如生物分子的构象变化、相变和化学反应,是许多物理系统行为的关键,但由于无偏模拟很少产生这些事件,因此计算研究极其困难。过渡路径理论(TPT)为分析此类事件提供了严格的统计框架:它表征了两个指定亚稳态(反应物和产物)之间的反应轨迹集合,其核心对象——committor函数(给出系统下一步到达产物而非反应物的概率)——编码了所有基本的动力学和热力学信息。我们引入了一个框架,将committor估计转化为随机最优控制(SOC)问题。在此公式中,committor定义了一个反馈控制(与其对数梯度成正比),该控制主动引导轨迹朝向反应区域,从而实现对反应路径的高效采样。为了解决由此产生的命中时间控制问题,我们开发了两个互补的目标:直接反向传播损失和基于原理的离策略值匹配损失,并为其建立了一阶最优性保证。我们进一步通过引入一种替代采样过程来解决亚稳态问题(该问题可能使受控轨迹陷入中间势阱),该过程在降低有效能垒的同时保持反应电流。在基准系统上,该框架比现有方法产生了显著更准确的committor估计、反应速率和平衡常数。

英文摘要

Rare events such as conformational changes in biomolecules, phase transitions, and chemical reactions are central to the behavior of many physical systems, yet they are extremely difficult to study computationally because unbiased simulations seldom produce them. Transition Path Theory (TPT) provides a rigorous statistical framework for analyzing such events: it characterizes the ensemble of reactive trajectories between two designated metastable states (reactant and product), and its central object--the committor function, which gives the probability that the system will next reach the product rather than the reactant--encodes all essential kinetic and thermodynamic information. We introduce a framework that casts committor estimation as a stochastic optimal control (SOC) problem. In this formulation the committor defines a feedback control--proportional to the gradient of its logarithm--that actively steers trajectories toward the reactive region, thereby enabling efficient sampling of reactive paths. To solve the resulting hitting-time control problem we develop two complementary objectives: a direct backpropagation loss and a principled off-policy Value Matching loss, for which we establish first-order optimality guarantees. We further address metastability, which can trap controlled trajectories in intermediate basins, by introducing an alternative sampling process that preserves the reactive current while lowering effective energy barriers. On benchmark systems, the framework yields markedly more accurate committor estimates, reaction rates, and equilibrium constants than existing methods.

2606.06493 2026-06-10 cs.RO cs.AI cs.LG 版本更新

HANDOFF: Humanoid Agentic Task-Space Whole-Body Control via Distilled Complementary Teachers

HANDOFF: 通过蒸馏互补教师实现人形机器人任务空间全身控制

Lizhi Yang, Junheng Li, Nehar Poddar, Yiling Hou, Gio Huh, Robert Griffin, Georgia Gkioxari, Aaron Ames

发表机构 * California Institute of Technology(加州理工学院) The Institute for Human & Machine Cognition(人机认知研究院)

AI总结 提出HANDOFF框架,通过多教师KL蒸馏和上下文门控机制,将全身运动跟踪、行走和跌倒恢复三个专家策略融合为混合专家学生策略,实现基于紧凑显式接口的全身控制,在Unitree G1上达到先进的速度跟踪性能并扩展了操作工作空间。

Comments 22 pages, 9 figures, Project page: https://lzyang2000.github.io/HANDOFF/

详情
AI中文摘要

对于要在现实世界中部署的人形机器人,命令空间(即任务规划与全身控制之间的接口)的选择至关重要。现有的全身控制器通常需要密集的运动学或空间参考,而规划器难以从任务语义中合成这些参考。我们提出了一种紧凑、显式的接口,该接口直观、通用、模块化且具有足够的表达能力,适用于多种操作技能。为此,我们引入了HANDOFF,这是一个单一的人形全身控制器,遵循该接口,并通过多教师KL蒸馏,在上下文条件门控方案下,从三个互补专家(具有安全过滤数据的全身运动跟踪、行走和跌倒恢复)中蒸馏出混合专家学生。在Unitree G1上,HANDOFF达到了最先进的速度跟踪性能,并提供了最大的鲁棒操作工作空间之一。我们进一步通过多个自然语言驱动的任务执行演示了硬件可行性,这些任务由VLM驱动的智能体规划器提供支持,无需特定任务数据或控制器微调。

英文摘要

For a humanoid robot to be deployed in the real world, the choice of command space (i.e., the interface between task planning and whole-body control) is crucial. Existing whole-body controllers typically demand dense kinematic or spatial references that planners struggle to synthesize from task semantics. We instead propose a compact, explicit interface that is intuitive, general, modular, and expressive enough for diverse loco-manipulation skills. To this end, we introduce HANDOFF, a single humanoid whole-body controller that follows this interface and is distilled via multi-teacher KL distillation under a context-conditioned gating scheme into a mixture-of-experts student from three complementary specialists: whole-body motion tracking with safety-filtered data, locomotion, and fall-recovery. On the Unitree G1, HANDOFF matches state-of-the-art velocity tracking and offers one of the largest robust manipulation workspaces. We further demonstrate hardware feasibility through multiple natural-language-driven task roll-outs, powered by a VLM-driven agentic planner with no task-specific data or controller fine-tuning.

13. 其他/综合机器学习 35 篇

2606.09894 2026-06-10 cs.LG cs.CL 新提交

A Navigable Manifold of Hypothesized Consciousness-Spectrum States in Language Model Representations

语言模型表示中假设的意识谱状态的可导航流形

Sophie Zhao

发表机构 * School of Computer Science(计算机科学学院) Georgia Institute of Technology(佐治亚理工学院)

AI总结 研究语言模型嵌入空间中与意识谱相关的几何结构,发现嵌入形成可导航流形,高低层区域稳定,中间为过渡走廊,导航性为内在属性。

详情
AI中文摘要

在沉思、哲学和心理学描述中,人类意识常被描述为从反应性和自我聚焦模式到更整合和连贯模式的类似谱系。理解语言模型是否在表示空间中编码了这种结构化、人类可解释的意识谱系,对于模型引导、评估和对齐具有重要意义。在这项工作中,我们研究了Transformer嵌入空间中沿该谱系的几何结构和动态模式。我们表明,嵌入表现出与该谱系对齐的全局组织几何:与相似状态相关的句子聚类成局部连贯区域,形成结构化流形。特别地,高层和低层区域表现出类似凸性的稳定性,而中间区域形成过渡走廊。在动态上,效用引导和纯几何贪婪轨迹都一致地从低层区域穿越到高层区域,经过中间层级,表明可导航性是表示空间的内在属性,由全局方向信号引导但非决定。这些结果表明,嵌入空间编码了与假设的意识谱分类法(广泛受沉思传统、哲学和现代心理学中人类意识反复出现的结构描述启发)对齐的结构化和可导航几何,为分析和引导模型行为提供了表示层面的视角。

英文摘要

Across contemplative, philosophical, and psychological accounts, human consciousness is often described along a similar spectrum, ranging from reactive and self-focused patterns to more integrative and coherent ones. Understanding whether language models encode such a structured, human-interpretable consciousness spectrum in representation space is important for model guidance, evaluation and alignment. In this work, we study the geometric structure and dynamics of patterns along this spectrum in transformer embedding spaces. We show that embeddings exhibit a globally organized geometry aligned with this spectrum: sentences associated with similar states cluster into locally coherent regions, forming a structured manifold. In particular, higher-level and lower-level regions exhibit convexity-like stability, while intermediate regions form a transition corridor. Dynamically, both utility-guided and geometry-only greedy trajectories consistently traverse from lower- to higher-level regions, passing through intermediate tiers, indicating that navigability is an intrinsic property of the representation space, guided but not dictated by a global directional signal. These results suggest that embedding spaces encode structured and navigable geometry aligned with a hypothesized consciousness-spectrum taxonomy, broadly inspired by recurring structural descriptions of human consciousness across contemplative traditions, philosophy, and modern psychology, providing a representation-level perspective for analyzing and guiding model behavior.

2606.09936 2026-06-10 cs.LG cs.AI 新提交

One Lens, Many Worlds : A Capability-Typed Interface for World-Model Interpretability

一个镜头,多个世界:面向世界模型可解释性的能力类型接口

Bhavith Chandra Challagundla, Sanskar Pandey, Param Thakkar, Rishikesh Mallagundla, Yugandhar Reddy Gogireddy, Wenhao Lu, Hindol Roy Choudhury, Shravani Challagundla, Mohamed Deraz Nasr, Spursh Deshpande

AI总结 提出WorldModelLens,通过能力类型适配器统一不同世界模型(如PlaNet、IRIS、I-JEPA)的可解释性分析,避免重复实现。

详情
AI中文摘要

世界模型现在建立在截然不同的计算基板上。潜在循环状态空间模型(如PlaNet和Dreamer系列)将观测压缩为循环状态;基于token的模型(如IRIS)将观测量化到学习到的码本中,并用transformer进行自回归预测;联合嵌入预测架构(如I-JEPA)在没有像素解码器的学习潜在空间中进行预测。应用于这些模型的可解释性方法,包括探针、激活修补、稀疏自编码器和惊喜分析,共享一组共同的基元,但由于现有的钩子和缓存工具假设一个没有动作、环境步骤或想象回滚概念的transformer语言模型,它们为每个架构从头重新实现。我们认为这种碎片化反映了工具而非模型,并且世界模型的共享结构可以通过一个小型类型接口捕获。我们提出了WorldModelLens,一个围绕能力类型适配器组织的开源可解释性基板:每个模型实现四个必需方法(编码、转移、初始状态、采样),并通过显式能力描述符声明一组可选头(解码、奖励、继续、行动者、评论者),使得强化学习和自监督世界模型成为一等公民,而无需模仿对方。单一的钩子和缓存层在此接口上暴露时间索引的激活、想象回滚和干预重放,使得每个分析只需编写一次。

英文摘要

World models are now built on substantially different computational substrates. Latent recurrent state-space models such as PlaNet and the Dreamer family compress observations into recurrent states; token-based models such as IRIS quantize observations into a learned codebook and predict autoregressively with a transformer; and joint-embedding predictive architectures such as I-JEPA predict in a learned latent space with no pixel decoder. The interpretability methods applied to these models, including probing, activation patching, sparse autoencoders, and surprise analysis, share a common set of primitives, yet they are re-implemented from scratch for each architecture because existing hook-and-cache tooling assumes a transformer language model with no notion of actions, environment steps, or imagined rollouts. We argue that this fragmentation reflects the tooling rather than the models, and that the shared structure of world models is captured by a small typed interface. We present WorldModelLens, an open-source interpretability substrate organized around a capability-typed adapter: every model implements four required methods (encode, transition, initial state, sample) and declares a set of optional heads (decode, reward, continue, actor, critic) through an explicit capability descriptor, so that reinforcement-learning and self-supervised world models are first-class without either imitating the other. A single hook and cache layer exposes time-indexed activations, imagination rollouts, and intervention replay over this interface, allowing each analysis to be written once.

2606.10092 2026-06-10 cs.LG econ.GN q-fin.EC 新提交

Decision-Making under Combinatorial Risk

组合风险下的决策

Yifan Hong, Hongmiao Fan, Chen Wang

AI总结 通过投资分配任务研究组合风险下的决策,发现参与者主要依据投资后成功概率等特征而非精确评估完整分布,并利用符号回归发现简洁描述模型。

详情
AI中文摘要

风险下的决策通常通过单次彩票选择来研究。然而,许多实际决策涉及组合风险,其中风险来自多个风险组件,因此结果上的彩票是诱导的而非直接给出的,并且精确评估可能代价高昂。我们引入了一项投资分配任务来研究组合风险下的决策,其中投资于一个组件会提高其成功概率,从而重塑结果分布。参与者倾向于选择概率增量较大的选项,当增量相等时,选择初始成功概率较高的选项。揭示诱导的概率质量函数(PMF)会显著改变行为,使参与者对组合风险特征的反应减弱,并减少选择方差。为了解释这些模式,我们超越标准基准和手工假设,使用符号回归发现简洁的描述模型。发现的模型主要依赖于组合风险特征,例如投资后的成功概率,而不是对完整诱导分布的精确评估。当显示PMF时,行为可以通过用前景理论残差模型增强该模型来很好地解释。结果表明,人们主要通过核心特征来导航组合风险,仅在显示诱导PMF时才转向彩票估值。

英文摘要

Decision-making under risk is typically studied through single-shot lottery choices. Yet many real decisions involve combinatorial risk, where risk arises from multiple risky components, so the lottery over outcomes is induced rather than given outright and can be costly to evaluate exactly. We introduce an investment-allocation task to study decision under combinatorial risk, where investing in a component raises its success probability and thereby reshapes the outcome distribution. Participants favor the option with the larger probability increment, and, when increments are equal, the option with the higher initial success probability. Revealing the induced probability mass function (PMF) substantially changes behavior, making participants less responsive to combinatorial-risk features and reducing choice variance. To explain these patterns, we move beyond standard benchmarks and hand-crafted hypotheses with symbolic regression to discover compact descriptive models. The discovered models rely mainly on combinatorial-risk features, such as the after-investment success probability, rather than exact evaluation of the full induced distribution. Behavior under the displayed PMF is then well explained by augmenting this model with a prospect-theoretic residual model. The results show that people navigate combinatorial risk primarily through its core features, shifting toward lottery valuation only when the induced PMF is displayed.

2606.10582 2026-06-10 cs.LG cs.AI 新提交

Drawing with Strangers: Population Scaling Drives Zero-Shot Mutual Intelligibility in Emergent Sketching

与陌生人共绘:种群规模驱动涌现素描中的零-shot互懂性

Jooyeon Kim

发表机构 * Graduate School of Artificial Intelligence, UNIST(UNIST人工智能研究生院)

AI总结 研究通过视觉素描游戏,发现扩大训练种群规模能显著提升独立训练群体间的零-shot互懂性,其机制在于增加群体内变异并减少群体间差异,最终通过感知锚定实现结构收敛。

详情
AI中文摘要

涌现通信中的泛化主要关注新颖输入或语言结构,但智能体与来自严格不相交社区的陌生人进行通信的能力仍相对未被探索。在这项工作中,我们将这种能力形式化为\textit{零-shot互懂性(ZMI)}:独立训练群体之间无需事先接触即可成功通信。利用涌现素描(智能体通过绘制一组笔画进行通信)作为视觉接地模态,我们发现扩大训练种群规模显著提高了独立群体间的ZMI。关键的是,随着种群规模扩大,群体内通信变异增加,防止了同质化共适应。同时,群体间变异减少,表明向某种普遍性的结构收敛。进一步分析揭示,这种普遍性是通过感知接地实现的:扩大后的种群越来越将其涌现素描锚定在目标图像的客观视觉相似性上。这些结果共同将ZMI定位为涌现通信中一个独特的泛化轴,并提出了实现社会可互操作人工智能体的途径。

英文摘要

Generalization in emergent communication has largely focused on novel inputs or linguistic structures, yet the capacity for agents to communicate with strangers from strictly disjoint communities remains relatively unexplored. In this work, we formalize this capability as \textit{zero-shot mutual intelligibility (ZMI)}: successful communication between independently trained populations without prior exposure. Leveraging emergent sketching -- in which agents communicate through sets of drawn strokes -- as a visually grounded modality, we find that scaling the training population substantially improves ZMI across independent groups. Crucially, as we scale the population size, in-group communicative variation increases, preventing co-adaptation into homogeneity. Simultaneously, cross-group variation decreases, indicating a structural convergence toward a certain type of universality. Further analysis reveals that this universality is achieved through perceptual grounding: scaled populations increasingly anchor their emergent sketches on the objective visual resemblance of the target images. Together, these results position ZMI as a distinct axis of generalization in emergent communication and suggest a route toward socially interoperable artificial agents.

2606.10587 2026-06-10 cs.LG cs.AI 新提交

Towards Diverse Scientific Hypothesis Search with Large Language Models

面向多样化科学假设搜索的大语言模型

Haorui Wang, Parshin Shojaee, Kazem Meidani, Kunyang Sun, José Miguel Hernández-Lobato, Teresa Head-Gordon, Jiajun He, Chandan K. Reddy, Chao Zhang, Yuanqi Du

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 针对科学假设搜索中多样性崩溃问题,提出基于并行回火的多温度进化框架,在固定验证预算下提升假设质量与多样性。

Comments ICML 2026

详情
AI中文摘要

大语言模型(LLMs)在加速科学发现方面日益崛起,最近在生成有效科学假设等高级任务中表现突出。然而,在许多发现场景中,目标并非识别单一最佳假设,因为验证可能噪声大且成本高,科学家受益于一组高质量替代假设,以对冲下游不确定性,寻求最佳解决方案。尽管如此,常用的进化搜索策略在假设生成中往往优先优化而非探索,搜索过程中的选择压力导致多样性崩溃。受这些局限性的启发,我们将假设搜索表述为采样问题,目标是在固定验证预算下高效生成多样化、高质量的假设。基于这一视角,我们提出\ours,一种受经典并行回火算法启发的进化框架,在多个温度水平下搜索假设,并实现跨温度的原则性信息交换,以在不干扰收敛的情况下改善探索。在分子发现、方程发现和算法发现等领域,我们的方法在相同验证预算下持续提升假设质量和多样性,生成的候选假设在更昂贵的下游计算验证中仍保持稳健。

英文摘要

Large language models (LLMs) are on the rise for accelerating scientific discovery, most recently in advanced tasks such as generating valid scientific hypotheses. Yet in many discovery settings, the goal is not to identify a single best hypothesis since validation can be noisy and expensive, and scientists benefit from a set of high-quality alternative hypotheses that hedge against downstream uncertainty for the best solutions. Nevertheless, commonly used evolutionary search recipes tend to prioritize optimization over exploration in hypothesis generation, and the resulting selection pressure during the search process leads to diversity collapse. Motivated by these limitations, we formulate hypothesis search as a sampling problem, where the objective is to efficiently produce diverse, high-quality hypotheses under a fixed validation budget. Building on this perspective, we propose \ours, an evolutionary framework inspired by the classical parallel tempering algorithm that searches hypotheses at multiple temperature levels and enables principled information exchange across temperatures to improve exploration without disrupting convergence. Across domains including molecular discovery, equation discovery, and algorithm discovery, our approach consistently improves both hypothesis quality and diversity under the same validation budget, and produces candidates that remain robust under more expensive downstream computational validations.

2606.10607 2026-06-10 cs.LG cs.AI cs.CL 新提交

Causal Ensemble Agent: Hierarchical Causal Discovery with LLM-guided Expert Reweighting

因果集成智能体:基于LLM引导的专家重加权的层次化因果发现

Xinyu Li, Yuanyuan Wang, Haoxuan Li, Chuan Zhou, Erdun Gao, Bo Han, Tongliang Liu, Kun Zhang, Howard Bondell, Mingming Gong

发表机构 * The University of Melbourne(墨尔本大学) MBZUAI(穆罕默德·本·扎耶德人工智能大学) Peking University(北京大学) Adelaide University(阿德莱德大学) Hong Kong Baptist University(香港浸会大学) The University of Sydney(悉尼大学) Carnegie Mellon University(卡内基梅隆大学)

AI总结 提出因果集成智能体(CEA)框架,通过线性意见池聚合不同层次的统计因果发现结果,并利用大语言模型(LLM)作为元裁判在决策边界附近动态重加权专家,从而构建更准确完整的因果图。

详情
AI中文摘要

因果发现旨在从观测数据中揭示因果结构,这对现实世界决策至关重要。然而,不同的因果发现算法可能产生相互冲突的结果,使得识别准确的因果图复杂化。传统方法依赖数值和统计假设,往往忽略丰富的领域特定信息(如特征描述),而这些信息也有助于结构学习。尽管近期研究探索使用大语言模型(LLM)通过直接查询推断因果关系,但由于缺乏与实际数据的一致性,此类方法可能不可靠。为解决这些限制,我们提出因果集成智能体(CEA),一种新颖框架,通过线性意见池聚合来自不同图层次的统计发现专家的结构见解,并在聚合置信度接近决策边界时,使用LLM作为元裁判动态重加权专家,从而组合出更完善、更完整的因果图。在合成和真实数据集上的大量实验表明,CEA在广泛的因果发现方法中实现了最强的整体性能,突显了在因果发现中使用LLM进行元分析的有效性。

英文摘要

Causal discovery aims to uncover causal structures from observational data, which is crucial for real-world decision-making. However, different causal discovery algorithms can produce divergent results that conflict with each other, complicating the identification of accurate causal graphs. Traditional approaches rely on numerical values and statistical assumptions, often ignoring rich domain-specific information, such as feature descriptions, which could also help structure learning. While recent works explore using Large Language Models (LLMs) to infer causal relations via direct queries, such methods can be unreliable due to a lack of alignment with the actual data. To address these limitations, we propose Causal Ensemble Agent (CEA), a novel framework that aggregates structural insights from statistical discovery experts across different graph levels via linear opinion pooling, and uses an LLM as a meta-referee to dynamically reweight experts when the aggregated confidence is close to the decision boundary, thereby composing an improved and more complete causal graph. Extensive experiments on both synthetic and real-world datasets demonstrate that CEA achieves the strongest overall performance across a wide range of causal discovery methods, highlighting the effectiveness of using LLMs for meta-analysis in causal discovery.

2606.10824 2026-06-10 cs.LG math.AT 新提交

Encoding the Euler Characteristic Transform

编码欧拉特征变换

Nello Blaser, Odin Hoff Gardaa, Lars M. Salbu, Elena Xinyi Wang, Bastian Rieck

AI总结 提出连续编码方法,将欧拉特征曲线转化为每个顶点的净变化序列,通过小型变换器生成特征向量,并在多个数据集上提升分类精度。

详情
AI中文摘要

欧拉特征曲线(ECC)记录线性嵌入的胞复形在给定方向上的欧拉特征随过滤高度的变化,而欧拉特征变换(ECT)是通过收集多个方向上的ECC得到的单射形状描述符。如何为神经网络编码ECT本身是一种归纳偏置,传统上通过离散化每个ECC来固定。我们引入一种连续编码:对于每个方向和每个顶点,它记录归因于该顶点的净欧拉特征变化,产生一个每个方向的令牌序列,由一个小型变换器映射到特征向量。我们将得到的流程分为两个正交轴上的阶段:一个ECC编码器,在每个方向内作用,将其曲线映射到固定长度向量;以及一个ECT表示,跨方向作用,聚合每个方向的向量为一个。我们研究了六种ECT表示架构,涵盖从结构无关的前馈基线到保持平面旋转等变性的卷积和复值模型的一系列归纳偏置。在涵盖点云、图、立方复形和网格的六个分类基准上,连续编码在六个数据集中有五个提高了准确率,控制实验将增益归因于令牌化本身而非增加的变换器容量。表示架构的重要性小于编码,其归纳偏置的收益取决于编码:前馈网络在连续编码下表现最佳,但在离散化下不如卷积架构鲁棒。

英文摘要

The Euler Characteristic Curve (ECC) records the Euler characteristic of a linearly embedded cell complex as a function of filtration height in a given direction, and the Euler Characteristic Transform (ECT) is the injective shape descriptor obtained by collecting ECCs over many directions. How the ECT is encoded for a neural network is itself an inductive bias, conventionally fixed by discretizing each ECC. We introduce a continuous encoding: for each direction and each vertex it records the net Euler-characteristic change attributed to that vertex, producing a per-direction token sequence that a small transformer maps to a feature vector. We separate the resulting pipeline into two stages on orthogonal axes: an ECC encoder that acts within each direction, mapping its curve to a fixed-length vector, and an ECT representation that acts across directions, aggregating the per-direction vectors into one. We study six ECT representation architectures spanning a range of inductive biases, from a structure-agnostic feedforward baseline to convolutional and complex-valued models that preserve equivariance under planar rotations. Across six classification benchmarks covering point clouds, graphs, cubical complexes, and meshes, the continuous encoding improves accuracy on five of six datasets, and control experiments attribute the gain to the tokenization itself rather than to the added transformer capacity. The representation architecture matters less than the encoding, and the payoff from its inductive biases depends on the encoding: a feedforward network performs best under continuous encoding but is less robust under discretization than convolutional architectures.

2606.09854 2026-06-10 cs.CL cs.AI cs.CY cs.LG 交叉投稿

Can Multi-Agent LLMs Identify Their Peers? Stylometric Fingerprinting in Role-Constrained Political Analysis

多智能体大语言模型能否识别其同类?角色约束政治分析中的笔迹风格指纹识别

Juergen Dietrich

AI总结 研究多智能体LLM在政治分析中能否通过笔迹风格识别模型家族,提出SD-CV协议,T5模型在五类归属任务中达到F1=0.991,证明提示级匿名化无法消除模型身份信号。

Comments 24 pages, 3 figures

详情
AI中文摘要

用于政治声明分析的多智能体大语言模型(LLM)管道容易受到同伴保护偏见的影响:模型倾向于保护同伴模型免于停用,并表现出依赖身份的评分扭曲。提示级匿名化被提出作为缓解措施,但先前的工作同时记录了在角色约束输出中笔迹风格指纹在匿名化后仍然存在——这引发了该缓解措施是否足够的问题。本文首次系统研究LLM是否能在匿名化条件下识别政治分析文本背后的模型家族。我们评估了三种分类器方法——LLM零样本和少样本(Claude Sonnet 4.6和Llama-3.3-70B)以及微调的T5-base模型——在一个涵盖四个商业LLM家族和一个开放世界“未知”类的五类归属任务上。我们引入了一种声明不相交的交叉验证协议(SD-CV;定义见第3.5节),该协议保证训练和验证数据之间没有内容重叠,并将其与运行不相交的基线(RD-CV)进行对比。T5在SD-CV下达到Macro F1 = 0.991(±0.008),在24个完全保留的声明上F1 = 0.978——尽管与RD-CV相比,训练-测试内容距离增加了2.1倍(0.767 vs. 0.366,p<0.001),但仍表现出稳健性,证明了真正的笔迹风格泛化能力。一项分数SD-CV分析确定了训练数据40%(约440篇文本)处的性能拐点。我们的研究结果证实,仅靠提示级匿名化无法消除模型身份信号,这对欧盟AI法案合规性(第13、14、26条)以及质量关键型多智能体部署中的计算机系统验证(CSV)具有直接影响。

英文摘要

Multi-agent large language model (LLM) pipelines for political statement analysis are vulnerable to peer-preservation bias: models tend to protect peer models from deactivation and show identity-dependent scoring distortions. Prompt-level anonymization was proposed as a mitigation, but prior work simultaneously documented that stylometric fingerprints survive anonymization in role-constrained outputs - raising the question of whether this mitigation is sufficient. This paper provides the first systematic investigation of whether LLMs can identify the model family behind political analysis texts under anonymization conditions. We evaluate three classifier approaches - LLM zero-shot and few-shot (Claude Sonnet 4.6 and Llama-3.3-70B) and a fine-tuned T5-base model - on a five-class attribution task covering four commercial LLM families and an open-world 'unknown' class. We introduce a statement-disjoint cross-validation protocol (SD-CV; defined in Section 3.5) that guarantees no content overlap between training and validation data, and contrast it with a run-disjoint baseline (RD-CV). T5 achieves Macro F1 = 0.991 (+-0.008) under SD-CV and F1 = 0.978 on 24 completely held-out statements - robust despite a 2.1x increase in train-test content distance versus RD-CV (0.767 vs. 0.366, p<0.001), demonstrating genuine stylometric generalization. A fractional SD-CV analysis identifies a performance knee at 40% of training data (~440 texts). Our findings confirm that prompt-level anonymization alone cannot neutralize model identity signals, with direct implications for EU AI Act compliance (Articles 13, 14, 26) and for computer system validation (CSV) in quality-critical multi-agent deployments.

2606.09957 2026-06-10 cs.SE cs.LG 交叉投稿

Data-aware Static Analysis: Improving Detection of Semantic Faults in Machine Learning Code Using Data Characteristics

数据感知静态分析:利用数据特征改进机器学习代码中语义故障的检测

Willem Meijer, Kristian Sandahl, Dániel Varró

发表机构 * Knut and Alice Wallenberg Foundation(Knut和Alice沃尔贝格基金会) Software Center Project 61(软件中心项目61) Vinnova CoDig competence center(Vinnova CoDig专业中心)

AI总结 提出一种数据感知静态分析方法,结合数据流与控制流分析及API契约,在编写代码时而非训练后检测机器学习代码中的语义故障,如误用未缩放数据训练尺度敏感模型。

Comments 6 pages, 3 figures, 2 listings, 1 table; To be published in "2026 IEEE/ACM 48th International Conference on Software Engineering (ICSE-NIER '26)"

详情
AI中文摘要

机器学习模型使用中的语义故障是机器学习开发者常见的问题,会导致预测次优、计算成本高或输出错误。例如,有人可能错误地使用未缩放的数据来训练尺度敏感模型。机器学习开发者在训练模型后手动分析结果来检测这些故障,这使得过程效率低下。我们提出了一种新颖的数据感知静态分析方法来检测机器学习代码中的语义故障,使开发者能够在编写代码时而不是在训练模型后揭示这些错误。我们的方法结合了数据流和控制流分析以及API契约,能够在高抽象层次上对机器学习代码进行数据感知推理。通过分析真实世界的机器学习笔记本样本,我们展示了我们解决方案的潜力,发现我们可以检测需要数据感知方法的故障。

英文摘要

Semantic faults specific to the use of machine learning models are a common problem for machine learning developers, causing suboptimal predictions, high computational cost, or incorrect outputs. For example, one may erroneously use unscaled data to train a scale-sensitive model. Machine learning developers detect these faults after training their models and manually analyzing the results, making it an inefficient process. We propose a novel data-aware static analysis approach to detect semantic faults in machine learning code, allowing developers to reveal these bugs while writing code instead of after training the model. Our approach uses combined data and control flow analysis, and API contracts, enabling data-aware reasoning about machine learning code at a high level of abstraction. We highlight the potential of our solution by analyzing a sample of real-world machine learning notebooks, finding that we can detect faults that require a data-aware approach.

2506.03411 2026-06-10 cs.LG cs.GT 版本更新

A Machine Learning Theory Perspective on Strategic Litigation

战略诉讼的机器学习理论视角

Melissa Dutz, Han Shao, Avrim Blum, Aloni Cohen

发表机构 * Toyota Technological Institute at Chicago(芝加哥丰田技术研究所) University of Maryland(马里兰大学) The University of Chicago(芝加哥大学)

AI总结 从机器学习理论出发,建模普通法体系中战略诉讼者通过选择案件影响下级法院决策规则的问题,分析其影响力和最优策略,发现反直觉现象。

详情
AI中文摘要

战略诉讼是指提起诉讼的目标不仅限于解决特定纠纷,而是产生更广泛的影响。在普通法体系中,案件产生深远影响的一种方式是通过确立新的法律先例,后续法院必须遵循。本文从机器学习理论的角度探讨战略诉讼。我们考虑一个普通法法律体系的抽象模型,其中下级法院通过应用从上级法院过去裁决中学习到的决策规则来裁决新案件。在该模型中,我们探索战略诉讼者的力量,他们战略性地将案件提交给上级法院,以影响下级法院在未来案件中应用的决策规则。我们探讨的问题包括:战略诉讼者能产生什么影响?战略诉讼者应该将哪些案件提交法院?当战略诉讼者确信法院会做出不利于他们的裁决时,提起诉讼是否有意义?我们表明,这一战略案件选择问题具有有趣的结构,即使是简单的设置也会表现出反直觉的现象。当案件由一维点表示且下级法院的学习算法是最近邻时,或者当案件由d维点表示且下级法院的学习算法是支持向量机时,我们刻画了可诱导决策规则的集合,并开发了根据战略诉讼者目标选择最优案件集提交给上级法院的算法。

英文摘要

Strategic litigation involves bringing a case to court with the goal of having an impact beyond resolving the particular dispute at hand. In a common law system, one way a case may have far-reaching impact is by establishing new legal precedent that later courts must follow. In this paper, we explore strategic litigation from the perspective of machine learning theory. We consider an abstract model of a common law legal system where a lower court decides new cases by applying a decision rule learned from a higher court's past rulings. In this model, we explore the power of a strategic litigator, who strategically brings cases to the higher court to influence the decision rule applied by the lower court in future cases. We explore questions including: What impact can a strategic litigator have? Which cases should a strategic litigator bring to court? Does it ever make sense for a strategic litigator to bring a case when they are sure the court will rule against them? We show that this strategic case selection problem has interesting structure, with even simple settings exhibiting counterintuitive phenomena. When cases are represented by points in one dimension and the lower court's learning algorithm is nearest neighbor, or as points in d dimensions and the lower court's learning algorithm is a support vector machine, we characterize the set of inducible decision rules and develop algorithms for selecting an optimal set of cases to bring to the higher court given the strategic litigator's objectives.

2511.10234 2026-06-10 cs.LG cs.AI 版本更新

Lost in Serialization: Invariance and Generalization of LLM Graph Reasoners

迷失在序列化中:LLM图推理器的不变性与泛化能力

Daniel Herbst, Lea Karbevska, Divyanshu Kumar, Akanksha Ahuja, Fatemeh Gholamzadeh Nasrabadi, Fabrizio Frasca

发表机构 * University of Cambridge(剑桥大学) ETH Zurich(苏黎世联邦理工学院)

AI总结 研究LLM图推理器对图表示对称性的缺乏不变性,通过分解序列化因素并评估微调影响,发现大模型更鲁棒,微调降低节点重标敏感但增加结构和格式敏感,且不保证泛化。

Comments ICML 2026 Workshop on Graph Foundation Models

详情
AI中文摘要

尽管前景广阔,基于大型语言模型(LLM)的图推理器缺乏对图表示中对称性的内置不变性。在顺序图序列化上操作时,LLM在节点重索引、边重排序或格式变化下可能产生不同输出,引发鲁棒性问题。我们系统分析了这些影响,研究了微调如何影响编码敏感性以及在未见任务上的泛化能力。我们提出了一种将图序列化分解为节点标记、边编码和语法的原则性方法,并在一个全面的基准测试套件上评估了LLM对每个因素变化的鲁棒性。我们还贡献了一组新的谱任务,以进一步评估微调推理器的泛化能力。结果表明,较大的(未微调)模型更鲁棒。微调降低了对节点重标的敏感性,但可能增加对结构和格式变化的敏感性,同时并未一致地提高在未见任务上的性能。

英文摘要

While promising, graph reasoners based on Large Language Models (LLMs) lack built-in invariance to symmetries in graph representations. Operating on sequential graph serializations, LLMs can produce different outputs under node reindexing, edge reordering, or formatting changes, raising robustness concerns. We systematically analyze these effects, studying how fine-tuning impacts encoding sensitivity as well generalization on unseen tasks. We propose a principled decomposition of graph serializations into node labeling, edge encoding, and syntax, and evaluate LLM robustness to variations of each of these factors on a comprehensive benchmarking suite. We also contribute a novel set of spectral tasks to further assess generalization abilities of fine-tuned reasoners. Results show that larger (non-fine-tuned) models are more robust. Fine-tuning reduces sensitivity to node relabeling but may increase it to variations in structure and format, while it does not consistently improve performance on unseen tasks.

2602.01135 2026-06-10 cs.LG 版本更新

Your Autoregressive Model Already Reveals the Causal Graph

你的自回归模型已经揭示了因果图

Hugo Math, Rainer Lienhart

发表机构 * Department of Machine Learning \& Computer Vision, University of Augsburg, Augsburg, Germany

AI总结 本文提出TRACE框架,利用预训练自回归模型作为密度估计器进行条件独立性测试,从单序列离散事件中恢复时间因果图,并在大规模非线性SCM和真实车辆诊断日志上取得显著性能提升。

Comments 8 pages

详情
Journal ref
Structured Probabilistic Inference & Generative Modeling workshop ICML 2026
AI中文摘要

通过下一个词预测训练的自回归模型隐式地学习了其数据生成过程的条件独立结构。我们利用这一观察结果,从单个离散事件观测序列中执行可扩展的因果发现——无需任何特定任务的重新训练。这种单流设置自然出现在车辆诊断、制造系统和患者轨迹中,但至今仍未得到充分解决:缺乏重复样本、大量事件词汇和长程时间依赖使得现有方法要么不准确,要么计算上不可行。我们引入了TRACE,一个将任何预训练自回归模型重新用作条件互信息密度估计器的框架,条件互信息是条件独立性测试的基本原语。通过在GPU上构建并行化的CI测试,TRACE恢复了样本级时间因果图及其摘要投影,其规模随词汇量线性增长,同时自然处理延迟因果效应。关键的是,我们证明了最小化标准交叉熵预训练损失直接最小化了因果识别误差的上界,建立了序列预测与因果发现之间的对偶性。在非线性SCM(|X| = 8000)和真实车辆诊断日志(|X| = 29100)上,TRACE是此规模下首个适用的方法,在F1分数上超过最强基线20多点。

英文摘要

Autoregressive models trained via next-token prediction implicitly learn the conditional independence structure of their data-generating process. We exploit this observation to perform scalable causal discovery from a single observed sequence of discrete events -- without any task-specific retraining. Such single-stream settings arise naturally in vehicle diagnostics, manufacturing systems, and patient trajectories, yet they remain largely unsolved: the absence of repeated samples, massive event vocabularies, and long-range temporal dependencies render existing methods either inaccurate or computationally intractable. We introduce TRACE, a framework that repurposes any pretrained autoregressive model as a density estimator for conditional mutual information, the fundamental primitive for conditional independence testing. By constructing parallelized CI tests on GPUs, TRACE recovers both the sample-level time causal graph and its summary projection, scaling linearly with the vocabulary size while naturally handling delayed causal effects. Crucially, we prove that minimizing the standard cross-entropy pretraining loss directly minimizes an upper bound on the causal identification error, establishing a duality between sequence prediction and causal discovery. On nonlinear SCMs (|X| = 8000) and real-world vehicle diagnostic logs (|X| = 29100), TRACE is the first applicable method at this scale, outperforming the strongest baseline by over 20 F1 points.

2604.04287 2026-06-10 cs.LG cs.CL q-bio.GN 版本更新

Entropy, Disagreement, and the Limits of Foundation Models in Genomics

熵、分歧与基因组基础模型的局限性

Maxime Rochkoulets, Lovro Vrček, Mile Šikić

发表机构 * Genome Institute of Singapore, A*STAR(新加坡基因组研究院,A*STAR) KU Leuven(卢森堡大学) Faculty of Electrical Engineering and Computing, University of Zagreb(扎格雷布大学电子工程与计算学院)

AI总结 本文通过分析熵对模型学习的影响,发现基因组序列的高熵导致输出分布接近均匀、模型间分歧大和静态嵌入不稳定,且Fisher信息集中在嵌入层,表明仅靠序列自监督训练可能不适用于基因组数据。

Comments Accepted to LMLR Workshop at ICLR 2026

详情
AI中文摘要

基因组学中的基础模型与自然语言处理中的基础模型相比,成功程度参差不齐。然而,其有效性有限的原因仍不清楚。在这项工作中,我们研究了熵作为限制此类模型从训练数据中学习并发展基础能力的基本因素的作用。我们在文本和DNA序列上训练模型集成,并分析它们的预测、静态嵌入和经验Fisher信息流。我们表明,从未见标记预测的角度来看,基因组序列的高熵导致输出分布接近均匀、模型间分歧大以及静态嵌入不稳定,即使模型在架构、训练和数据上匹配也是如此。然后,我们证明在DNA上训练的模型将Fisher信息集中在嵌入层,似乎未能利用标记间关系。我们的结果表明,仅从序列进行自监督训练可能不适用于基因组数据,这质疑了当前训练基因组基础模型方法背后的假设。

英文摘要

Foundation models in genomics have shown mixed success compared to their counterparts in natural language processing. Yet, the reasons for their limited effectiveness remain poorly understood. In this work, we investigate the role of entropy as a fundamental factor limiting the capacities of such models to learn from their training data and develop foundational capabilities. We train ensembles of models on text and DNA sequences and analyze their predictions, static embeddings, and empirical Fisher information flow. We show that the high entropy of genomic sequences -- from the point of view of unseen token prediction -- leads to near-uniform output distributions, disagreement across models, and unstable static embeddings, even for models that are matched in architecture, training and data. We then demonstrate that models trained on DNA concentrate Fisher information in embedding layers, seemingly failing to exploit inter-token relationships. Our results suggest that self-supervised training from sequences alone may not be applicable to genomic data, calling into question the assumptions underlying current methodologies for training genomic foundation models.

2206.02178 2026-06-10 cs.AI cs.LG 版本更新

Belief Acquisition as Stochastic Filtering

信念获取作为随机滤波

Dawei Chen, John Lloyd, Samuel Yang-Zhao, Kee Siong Ng

发表机构 * School of Computing, Australian National University(计算机学院,澳大利亚国立大学)

AI总结 本文提出将信念获取视为随机滤波问题,通过分解条件滤波器在高维状态空间中同时跟踪状态和估计参数,并在流行病跟踪等实验中验证有效性。

Comments 51 pages

详情
AI中文摘要

本文研究如何利用随机滤波实现信念获取。首先,概述了经验信念的理论基础。然后,研究了该背景下的随机滤波。本文引入了因子化条件滤波器,这是一种新的滤波算法,用于在高维状态空间中同时跟踪状态和估计参数。算法的条件性质用于估计参数,因子化性质用于将状态空间分解为低维子空间,使得在这些子空间上的滤波得到的分布的乘积是对整个状态空间上分布的良好近似。算法成功应用的条件是:观测在子空间级别可用,且转移模式可以分解为近似局限于子空间的局部转移模式;这些条件在计算机科学、工程和地球物理滤波应用中广泛满足。在大型接触网络上跟踪流行病和估计参数的实验结果显示了该方法的有效性。

英文摘要

This paper studies how belief acquisition can be accomplished using stochastic filtering. First, a theoretical foundation for empirical beliefs is outlined. Then stochastic filtering in this context is studied. The paper introduces factored conditional filters, new filtering algorithms for simultaneously tracking states and estimating parameters in high-dimensional state spaces. The conditional nature of the algorithms is used to estimate parameters and the factored nature is used to decompose the state space into low-dimensional subspaces in such a way that filtering on these subspaces gives distributions whose product is a good approximation to the distribution on the entire state space. The conditions for successful application of the algorithms are that observations be available at the subspace level and that the transition schema can be factored into local transition schemas that are approximately confined to the subspaces; these conditions are widely satisfied in computer science, engineering, and geophysical filtering applications. Experimental results on tracking epidemics and estimating parameters in large contact networks show the effectiveness of the approach.

2603.04689 2026-06-10 cs.DS cs.CC cs.CG cs.CY cs.DB cs.LG 版本更新

Generalizing Fair Top-$k$ Selection: An Integrative Approach

公平的top-k选择的泛化:一种整合方法

Guangya Cai

发表机构 * University of Minnesota, Twin Cities(明尼苏达大学,双城分校)

AI总结 本文研究了在多个受保护群体下寻找公平线性评分函数的问题,通过分析发现即使对于二维数据集和小k值,问题可能计算上不可行,但通过引入新的 disparity 测量方法,可恢复小k值时的效率。

详情
AI中文摘要

公平的top-k选择,确保在选出的top-k候选者中适当代表少数族或历史上不利群体,引起了广泛关注。我们研究了在多个受保护群体下寻找公平(线性)评分函数的问题,同时最小化与参考评分函数的差异。这扩展了之前的设置,该设置仅限于单群体设置且不考虑差异最小化。先前研究暗示受保护群体的数量可能对运行时间效率影响有限。然而,出于实验探索的需要,我们发现这一暗示忽略了可能影响结果公平性的关键问题。一旦正确考虑这个问题,我们的难度分析显示,即使对于二维数据集和小k值,问题可能计算上不可行。然而,我们的分析也揭示了难度障碍的差距,使我们能够在受保护群体数量足够小时恢复小k值的效率。此外,除了将差异测量为“公平评分函数与参考评分函数之间的距离”外,我们引入了另一种差异测量——即“效用损失”,这可能在小权重扰动下产生更稳定的评分函数。通过仔细的工程权衡,平衡实现复杂性、鲁棒性和性能,我们的增强双管方案在真实世界数据集上表现出强大的经验性能,实验观察也影响了算法设计和实现决策。

英文摘要

Fair top-$k$ selection, which ensures appropriate proportional representation of members from minority or historically disadvantaged groups among the top-$k$ selected candidates, has drawn significant attention. We study the problem of finding a fair (linear) scoring function with multiple protected groups while also minimizing the disparity from a reference scoring function. This generalizes the prior setup, which was restricted to the single-group setting without disparity minimization. Previous studies imply that the number of protected groups may have a limited impact on the runtime efficiency. However, driven by the need for experimental exploration, we find that this implication overlooks a critical issue that may affect the fairness of the outcome. Once this issue is properly considered, our hardness analysis shows that the problem may become computationally intractable even for a two-dimensional dataset and small values of $k$. However, our analysis also reveals a gap in the hardness barrier, enabling us to recover the efficiency for the case of small $k$ when the number of protected groups is sufficiently small. Furthermore, beyond measuring disparity as the "distance" between the fair and the reference scoring functions, we introduce an alternative disparity measure$\unicode{x2014}$utility loss$\unicode{x2014}$that may yield a more stable scoring function under small weight perturbations. Through careful engineering trade-offs that balance implementation complexity, robustness, and performance, our augmented two-pronged solution demonstrates strong empirical performance on real-world datasets, with experimental observations also informing algorithm design and implementation decisions.

2606.04833 2026-06-10 cs.LG cs.AI 版本更新

Signed Dual Attention: Capturing Signed Dependencies in Time Series Forecasting

符号双注意力:在时间序列预测中捕捉符号依赖关系

Balthazar Courvoisier, Tristan Cazenave

发表机构 * Queensfield AI Technologies

AI总结 提出符号双注意力机制,通过双消息传递方案同时捕捉正负依赖关系,无需额外参数,提升时间序列预测性能。

Comments 5 pages, 3 figures, accepted at AAAI 2026 AI4TS Workshop

详情
AI中文摘要

最初为自然语言处理开发的Transformer架构和注意力机制,现在已成为各种深度学习模型的核心,包括时间序列预测应用。然而,标准注意力机制隐含地假设同质性交互,限制了其对具有正负依赖关系(如时间序列)的数据建模能力。在这项工作中,我们引入了符号双注意力,一种新颖的注意力公式,无需额外参数即可捕捉正负关系模式。通过利用受相关结构启发的双消息传递方案,符号双注意力在单个共享块内传播支持和对比信息,有效实现了两个头注意力的表达能力而无需额外参数。该模块可以无缝集成到现有架构中,并在需要符号关系建模的某些情况下带来性能提升。这种方法为构建更具表达力和参数效率的Transformer开辟了道路。

英文摘要

Initially developed for natural language processing, Transformer architectures and attention mechanisms are now central to a wide range of deep learning models, including applications in time series forecasting. A standard attention mechanism, however, implicitly assumes homophilic interactions, limiting its ability to model data with positive and negative dependencies, such as time series. In this work, we introduce the Signed Dual Attention, a novel attention formulation that captures both positive and negative relational patterns without additional parameters. By leveraging a dual message-passing scheme inspired by correlation structures, Signed Dual Attention propagates both supportive and contrastive information within a single shared block, effectively achieving the expressiveness of two head attention without additional parameters. This module can be seamlessly integrated into existing architectures and can yield performance gains in certain situations, requiring signed relational modeling. This approach opens a pathway toward more expressive and parameter-efficient transformers.

2509.04154 2026-06-10 cs.LG cs.AI 版本更新

Robust Filter Attention: Self-Attention as Precision-Weighted State Estimation

鲁棒滤波注意力:自注意力作为精度加权状态估计

Peter Racioppo

发表机构 * Independent Researcher, Los Angeles, CA, USA(独立研究者,美国加利福尼亚州洛杉矶)

AI总结 提出鲁棒滤波注意力(RFA),将自注意力建模为基于线性随机微分方程的状态估计,在语言建模中实现优于RoPE的困惑度与零样本外推稳定性。

详情
AI中文摘要

我们引入鲁棒滤波注意力(RFA),一种将自注意力表述为鲁棒状态估计的方法。每个令牌被视为由线性随机微分方程(SDE)控制的潜在轨迹的带噪声观测,注意力权重由该模型下的一致性决定,而非静态特征相似性。在各向同性噪声和衰减假设下,RFA的计算复杂度与标准注意力相当。在语言建模基准上,RFA在训练窗口内实现了比RoPE更低的困惑度,同时在零样本外推到更长上下文时保持稳定。该框架还提供了标准位置机制的动力学解释,将旋转嵌入和近因偏差与随机动力学引起的传输和不确定性传播联系起来。

英文摘要

We introduce Robust Filter Attention (RFA), a formulation of self-attention as a robust state estimator. Each token is treated as a noisy observation of a latent trajectory governed by a linear stochastic differential equation (SDE), and attention weights are determined by consistency under this model rather than static feature similarity. Under isotropic noise and decay assumptions, RFA matches the computational complexity of standard attention. On language modeling benchmarks, RFA achieves lower perplexity than RoPE within the training window while remaining stable under zero-shot extrapolation to longer contexts. The framework also provides a dynamical interpretation of standard positional mechanisms, connecting rotational embeddings and recency biases to transport and uncertainty propagation induced by stochastic dynamics.

2602.16898 2026-06-10 cs.RO cs.AI cs.CV cs.LG 版本更新

MALLVI: A Multi-Agent Framework for Integrated Generalized Robotics Manipulation

MALLVI:一种多智能体框架用于集成通用机器人操作

Mehrshad Taji, Arad Mahdinezhad Kashani, Iman Ahmadi, AmirHossein Jadidi, Saina Kashani, Babak Khalaj

发表机构 * Department of Electrical Engineering, Sharif University of Technology(电气工程系,谢里夫大学)

AI总结 MALLVI通过多智能体协作实现闭环反馈驱动的机器人操作,提升泛化能力和零样本任务成功率。

Comments Some fundemental change in text and codebase

详情
AI中文摘要

MALLVI通过多智能体协作实现闭环反馈驱动的机器人操作,提升泛化能力和零样本任务成功率。

英文摘要

Task planning for robotic manipulation with large language models (LLMs) is an emerging area. Prior approaches rely on specialized models, fine tuning, or prompt tuning, and often operate in an open loop manner without robust environmental feedback, making them fragile in dynamic settings. MALLVI presents a Multi Agent Large Language and Vision framework that enables closed-loop feedback driven robotic manipulation. Given a natural language instruction and an image of the environment, MALLVI generates executable atomic actions for a robot manipulator. After action execution, a Vision Language Model (VLM) evaluates environmental feedback and decides whether to repeat the process or proceed to the next step. Rather than using a single model, MALLVI coordinates specialized agents, Decomposer, Localizer, Thinker, and Reflector, to manage perception, localization, reasoning, and high level planning. An optional Descriptor agent provides visual memory of the initial state. The Reflector supports targeted error detection and recovery by reactivating only relevant agents, avoiding full replanning. Experiments in simulation and real-world settings show that iterative closed loop multi agent coordination improves generalization and increases success rates in zero shot manipulation tasks. Code available at https://github.com/iman1234ahmadi/MALLVI .

2510.17876 2026-06-10 physics.geo-ph cs.LG 版本更新

Three-dimensional inversion of gravity data using implicit neural representations and scientific machine learning

利用隐式神经表示和科学机器学习进行三维重力数据反演

Pankaj K Mishra, Sanni Laaksonen, Jochen Kamm, Anand Singh

发表机构 * Geological Survey of Finland(芬兰地质调查局) Indian Institute of Technology Bombay(印度理工学院孟买分校)

AI总结 本文提出一种基于隐式神经表示的三维重力反演方法,通过物理正演模型损失直接训练深度神经网络,实现连续密度场的反演,无需预定义网格或离散化,提升对地质结构的重建能力。

Comments Codes for reproducing results are at https://zenodo.org/records/19440024

详情
Journal ref
Scientific Reports (2026)
AI中文摘要

重力数据反演是研究地下密度变化的重要方法,涉及矿产勘探、地热评估、碳储存、天然氢、地下水资源和构造演化。本文提出一种科学机器学习方法,利用隐式神经表示(INR)将地下密度表示为连续场。该方法通过物理正演模型损失直接训练深度神经网络,将空间坐标映射到连续密度场,无需预定义网格或离散化。空间编码增强了网络捕捉尖锐对比和短波长特征的能力,克服了传统坐标基网络因频谱偏倚导致的过度平滑问题。本文在合成示例中验证了该方法,包括平滑模型和倾斜块模型,以评估不同深度结构的恢复能力。INR框架在不使用显式正则化或深度加权的情况下重建了详细的结构和地质合理的边界,同时随着问题规模增大,减少了反演参数数量。这些结果展示了隐式表示在实现可扩展、灵活和可解释的大规模地球物理反演中的潜力。该框架可推广到其他地球物理方法及联合/多物理场反演。

英文摘要

Inversion of gravity data is an important method for investigating subsurface density variations relevant to mineral exploration, geothermal assessment, carbon storage, natural hydrogen, groundwater resources, and tectonic evolution. Here we present a scientific machine-learning approach for three-dimensional gravity inversion that represents subsurface density as a continuous field using an implicit neural representation (INR). The method trains a deep neural network directly through a physics-based forward-model loss, mapping spatial coordinates to a continuous density field without predefined meshes or discretisation. Spatial encoding enhances the network's capacity to capture sharp contrasts and short-wavelength features that conventional coordinate-based networks tend to oversmooth due to spectral bias. We demonstrate the approach on synthetic examples including smooth models, representing realistic geological complexity, and a dipping block model to assess recovery of structures at different depths. The INR framework reconstructs detailed structure and geologically plausible boundaries without explicit regularisation or depth weighting, while reducing the number of inversion parameters as the problem size grows bigger. These results highlight the potential of implicit representations to enable scalable, flexible, and interpretable large-scale geophysical inversion. This framework could generalise to other geophysical methods and for joint/multiphysics inversion.

2509.11517 2026-06-10 cs.CL cs.LG 版本更新

PeruMedQA: Benchmarking Large Language Models (LLMs) on Peruvian Medical Exams -- Dataset Construction and Evaluation

PeruMedQA:在秘鲁医学考试上评估大语言模型(LLMs)——数据集构建与评估

Rodrigo M. Carrillo-Larco, Jesus Lovón Melgarejo, Manuel Castillo-Cara, Gusseppe Bravo-Rocca

发表机构 * Hubert Department of Global Health, Rollins School of Public Health, Emory University(霍伯特全球健康部门,埃默里大学公共卫生学院) Emory Global Diabetes Research Center of Woodruff Health Sciences Center, Emory University(埃默里大学伍德鲁夫健康科学中心全球糖尿病研究中心) Institut de Recherche en Informatique de Toulouse(图卢兹信息研究院) Universidad Nacional de Educación a Distancia(远程教育国立大学) Instituto de Investigación Científica, Universidad de Lima(科学研究所,利马大学) Barcelona Supercomputing Center(巴塞罗那超级计算中心)

AI总结 本文构建了包含8380道题的秘鲁医学考试数据集,通过微调大语言模型并对比不同模型的准确率,揭示了在西班牙语国家医学问题上的性能差异。

Comments https://github.com/rodrigo-carrillo/PeruMedQA

详情
AI中文摘要

背景:医疗大语言模型(LLMs)在回答医学考试中表现出色,但其在西班牙语和拉丁美洲国家的医疗问题上的泛化能力尚不明确。目标:构建秘鲁医师专科学习考试问题数据集,对LLMs进行微调,并评估和比较普通LLMs与微调LLMs的准确性。方法:我们整理了包含8380道题的PeruMedQA数据集,涵盖12个专科(2018-2025年)。我们选择了10个医学LLMs,包括medgemma-4b-it和medgemma-27b-text-it,并开发了零样本任务特定提示来回答问题。我们使用参数高效微调(PEFT)和低秩适应(LoRA)对medgemma-4b-it进行微调,使用所有问题除外2025年(测试集)的问题。结果:medgemma-27b在所有专科中表现最佳,达到精神科89.29%的最高分;然而,在两个专科中,OctoMed-7B略胜一筹:神经外科77.27%和77.38%,放射科76.13%和77.39%。在专科层面,大多数参数少于100亿的LLM正确率低于50%。微调版的medgemma-4b-it在所有参数少于100亿的LLM中胜出,并在各种考试中与700亿参数的LLM竞争。结论:对于需要来自西班牙语国家和与秘鲁有相似流行病学特征的知识库的医疗AI应用和研究,应使用medgemma-27b-text-it。

英文摘要

BACKGROUND: Medical large language models (LLMs) have demonstrated remarkable performance in answering medical examinations. However, the extent to which this high performance is transferable to medical questions in Spanish and from a Latin American country remains unexplored. This knowledge is crucial as LLM-based medical applications gain traction in Latin America. AIMS: To build a dataset of questions medical examinations taken by Peruvian physicians pursuing specialty training; to fine-tune a LLM on this dataset; to evaluate and compare the performance in terms of accuracy between vanilla LLMs and the fine-tuned LLM. METHODS: We curated PeruMedQA, a multiple-choice question-answering (MCQA) dataset containing 8,380 questions spanning 12 specialties (2018-2025). We selected ten medical LLMs, including medgemma-4b-it and medgemma-27b-text-it, and developed zero-shot task specific prompts to answer the questions. We employed parameter-efficient fine tuning (PEFT) and low-rand adaptation (LoRA) to fine-tune medgemma-4b-it utilizing all questions except those from 2025 (test set). RESULTS: Medgemma-27b showed the highest accuracy across all specialities, achieving the highest score of 89.29% in Psychiatry; yet, in two specialties, OctoMed-7B exhibited slight superiority: Neurosurgery with 77.27% and 77.38, respectively; and Radiology with 76.13% and 77.39%, respectively. Across specialties, most LLMs with <10 billion parameters exhibited <50% of correct answers. The fine-tuned version of medgemma-4b-it emerged victorious against all LLMs with <10 billion parameters and rivaled a LLM with 70 billion parameters across various examinations. CONCLUSIONS: For medical AI applications and research that require knowledge bases from Spanish-speaking countries and those exhibiting similar epidemiological profile to Peru's, interested parties should utilize medgemma-27b-text-it.

2602.01023 2026-06-10 cs.IR cs.AI cs.LG 版本更新

Unifying Ranking and Generation in Query Auto-Completion via Retrieval-Augmented Generation and Multi-Objective Alignment

通过检索增强生成和多目标对齐统一查询自动补全中的排序与生成

Kai Yuan, Anthony Zheng, Jia Hu, Divyanshu Sheth, Hemanth Velaga, Kylee Kim, Matteo Guarrera, Besim Avci, Jianhua Li, Xuetao Yin, Rajyashree Mukherjee, Sean Suchter

发表机构 * Apple(苹果公司) UC Berkeley(加州大学伯克利分校)

AI总结 提出一个统一框架,通过检索增强生成(RAG)和多目标直接偏好优化(DPO)将查询自动补全重构为端到端列表生成,解决传统流水线长尾覆盖不足和生成方法幻觉风险的问题,并在大规模商业搜索平台上验证了有效性。

Comments 11 pages, 4 figures

详情
Journal ref
Proceedings of the 32nd ACM SIGKDD Conference on Knowledge Discovery and Data Mining V.2 (KDD '26), August 09--13, 2026, Jeju Island, Republic of Korea
AI中文摘要

查询自动补全(QAC)在用户输入时建议查询补全,帮助用户表达意图并更高效地获取结果。现有方法面临根本性挑战:传统的检索-排序流水线长尾覆盖有限且需要大量特征工程,而最近的生成方法存在幻觉和安全风险。我们提出了一个统一框架,通过检索增强生成(RAG)和多目标直接偏好优化(DPO)将QAC重构为端到端列表生成。我们的方法结合了三个关键创新:(1)将QAC重构为具有多目标优化的端到端列表生成;(2)定义并部署一套基于规则、基于模型和以LLM为评判的验证器用于QAC,并在综合方法中使用它们,结合RAG、多目标DPO和迭代批评-修订以生成高质量合成数据;(3)一种混合服务架构,可在严格的延迟约束下实现高效的生产部署。在大规模商业搜索平台上的评估显示了显著改进:离线指标在所有维度上均有提升,人工评估获得+0.40至+0.69的偏好分数,受控在线实验实现了击键次数减少5.44%和建议采纳率增加3.46%,验证了结合RAG和多目标对齐的统一生成为生产级QAC提供了有效解决方案。这项工作代表了向由大语言模型、RAG和多目标对齐驱动的端到端生成的范式转变,建立了一个经过生产验证的框架,可惠及更广泛的搜索和推荐行业。

英文摘要

Query Auto-Completion (QAC) suggests query completions as users type, helping them articulate intent and reach results more efficiently. Existing approaches face fundamental challenges: traditional retrieve-and-rank pipelines have limited long-tail coverage and require extensive feature engineering, while recent generative methods suffer from hallucination and safety risks. We present a unified framework that reformulates QAC as end-to-end list generation through Retrieval-Augmented Generation (RAG) and multi-objective Direct Preference Optimization (DPO). Our approach combines three key innovations: (1) reformulating QAC as end-to-end list generation with multi-objective optimization; (2) defining and deploying a suite of rule-based, model-based, and LLM-as-judge verifiers for QAC, and using them in a comprehensive methodology that combines RAG, multi-objective DPO, and iterative critique-revision for high-quality synthetic data; (3) a hybrid serving architecture enabling efficient production deployment under strict latency constraints. Evaluation on a large-scale commercial search platform demonstrates substantial improvements: offline metrics show gains across all dimensions, human evaluation yields +0.40 to +0.69 preference scores, and a controlled online experiment achieves 5.44\% reduction in keystrokes and 3.46\% increase in suggestion adoption, validating that unified generation with RAG and multi-objective alignment provides an effective solution for production QAC. This work represents a paradigm shift to end-to-end generation powered by large language models, RAG, and multi-objective alignment, establishing a production-validated framework that can benefit the broader search and recommendation industry.

2510.09498 2026-06-10 q-bio.TO cs.CE cs.LG 版本更新

Unsupervised full-field Bayesian inference of orthotropic hyperelasticity from a single biaxial test: a myocardial case study

无监督的全场贝叶斯推断各向异性超弹性材料从单次双轴测试:心肌案例研究

Rogier P. Krijnen, Akshay Joshi, Siddhant Kumar, Mathias Peirlinck

发表机构 * TUDelft(代尔夫特理工大学)

AI总结 本文提出利用全场运动学进行无监督贝叶斯推断,实现从单次双轴测试中可靠恢复各向异性超弹性材料参数,减少样本需求和实验操作。

详情
AI中文摘要

心脏肌肉组织在被动变形过程中表现出高度非线性超弹性和各向异性材料行为。传统本构识别协议通常结合多种加载模式,通常需要多个样本和大量处理。在软活组织中,此类协议受到样本间和样本内变异性以及操作诱导的机械响应变化的挑战,这些变化可能偏转逆校准。在本工作中,我们利用空间异质性的全场运动学作为多模式测试的替代方案。我们将EUCLID方法重新定向为用于高度非线性、各向异性本构模型的贝叶斯参数推断。使用合成心肌组织板,我们证明单次异质双轴实验结合稀疏反力测量能够可靠恢复Holzapfel-Ogden参数并量化不确定性,适用于多种噪声水平。推断响应与地面真实模拟高度一致,并产生反映测量噪声对各向异性材料模型推断影响的可信区间。我们的工作支持从单次双轴测试中进行非线性各向异性材料模型的单次测试、不确定性感知表征,减少样本需求和实验操作。

英文摘要

Cardiac muscle tissue exhibits highly non-linear hyperelastic and orthotropic material behavior during passive deformation. Traditional constitutive identification protocols therefore combine multiple loading modes and typically require multiple specimens and substantial handling. In soft living tissues, such protocols are challenged by inter- and intra-sample variability and by manipulation-induced alterations of mechanical response, which can bias inverse calibration. In this work we exploit spatially heterogeneous full-field kinematics as an information-rich alternative to multimodal testing. We recast EUCLID, an unsupervised method for the automated discovery of constitutive models, towards Bayesian parameter inference for highly nonlinear, orthotropic constitutive models. Using synthetic myocardial tissue slabs, we demonstrate that a single heterogeneous biaxial experiment, combined with sparse reaction-force measurements, enables robust recovery of Holzapfel-Ogden parameters with quantified uncertainty, across multiple noise levels. The inferred responses agree closely with ground-truth simulations and yield credible intervals that reflect the impact of measurement noise on orthotropic material model inference. Our work supports single-shot, uncertainty-aware characterization of nonlinear orthotropic material models from a single biaxial test, reducing sample demand and experimental manipulation.

2510.03844 2026-06-10 cs.LG stat.AP stat.ME 版本更新

On Using Large Language Models to Enhance Clinically-Driven Missing Data Recovery Algorithms in Electronic Health Records

利用大型语言模型增强电子健康记录中临床驱动的缺失数据恢复算法

Sarah C. Lotspeich, Abbey Collins, Brian J. Wells, Ashish K. Khanna, Joseph Rigdon, Lucy D'Agostino McGowan

发表机构 * Department of Statistical Sciences, Wake Forest University(统计科学系,威克森林大学) Wake Forest University(威克森林大学) Wake Forest University School of Medicine(威克森林大学医学院) Department of Psychology, North Carolina State University(心理学系,北卡罗来纳州立大学) Department of Biostatistics and Data Science, Wake Forest University School of Medicine(生物统计学与数据科学系,威克森林大学医学院) Department of Anesthesiology, Division of Critical Care Medicine, Wake Forest University School of Medicine(麻醉学系,重症医学科,威克森林大学医学院) Outcomes Research Consortium(结局研究联盟)

AI总结 本文探讨利用大型语言模型改进电子健康记录中缺失数据恢复算法的准确性与可扩展性,通过临床专家和LLM协同优化路标,实现与专家审查相似的数据恢复效果。

详情
Journal ref
2026
AI中文摘要

目的:电子健康记录(EHR)数据易出现缺失和错误。先前,我们设计了一种“增强”图表审查协议,利用辅助诊断(路标)来恢复EHR数据中的缺失值(例如,糖尿病控制不良可能暗示缺失的血红蛋白A1c值不健康)。然而,图表审查成本高且耗时,限制了可审查患者的数量。现在,我们研究了基于ICD-10代码的路标驱动算法的准确性和可扩展性,以模拟专家图表审查并恢复缺失值。材料和方法:除了临床专家原始的路标外,我们考虑了通过大型语言模型(LLM)与临床专业知识结合迭代优化的新版本,以扩展辅助诊断列表。使用100名患者在扩展学习健康系统中的图表审查数据,我们检验了不同路标下的算法性能。在1000名患者的更大研究中,我们应用了最终算法,该算法使用了经临床专家批准的LLM添加的路标。结果:该算法恢复的缺失数据量与专家图表审查相当,甚至更多。讨论:临床驱动的算法(通过LLM增强)可以以与图表审查相似的准确性恢复EHR数据,并可应用于大规模样本。将这些算法扩展以监控其他数据质量维度(如合理性)是具有前景的未来方向。

英文摘要

Objective: Electronic health records (EHR) data are prone to missingness and errors. Previously, we devised an "enriched" chart review protocol where a "roadmap" of auxiliary diagnoses (anchors) was used to recover missing values in EHR data (e.g., a diagnosis of impaired glycemic control might imply that a missing hemoglobin A1c value would be considered unhealthy). Still, chart reviews are expensive and time-intensive, which limits the number of patients whose data can be reviewed. Now, we investigate the accuracy and scalability of a roadmap-driven algorithm, based on ICD-10 codes (International Classification of Diseases, 10th revision), to mimic expert chart reviews and recover missing values. Materials and Methods: In addition to the clinicians' original roadmap from our previous work, we consider new versions that were iteratively refined using large language models (LLM) in conjunction with clinical expertise to expand the list of auxiliary diagnoses. Using chart reviews for 100 patients from the EHR at an extensive learning health system, we examine algorithm performance with different roadmaps. Using the larger study of $1000$ patients, we applied the final algorithm, which used a roadmap with clinician-approved additions from the LLM. Results: The algorithm recovered as much, if not more, missing data as the expert chart reviewers, depending on the roadmap. Discussion: Clinically-driven algorithms (enhanced by LLM) can recover missing EHR data with similar accuracy to chart reviews and can feasibly be applied to large samples. Extending them to monitor other dimensions of data quality (e.g., plausability) is a promising future direction.

2508.17196 2026-06-10 cs.LG cs.AI 版本更新

BudgetThinker: Empowering Budget-aware LLM Reasoning with Control Tokens

BudgetThinker: 通过控制令牌赋能预算感知的LLM推理

Hao Wen, Xinrui Wu, Yi Sun, Feifei Zhang, Liye Chen, Jie Wang, Yunxin Liu, Yunhao Liu, Ya-Qin Zhang, Yuanchun Li

发表机构 * Institute for AI Industry Research (AIR) Tsinghua University(人工智能产业研究院(AIR)清华大学) Global Innovation Exchange & Department of Automation Tsinghua University(全球创新交流中心及自动化系 清华大学)

AI总结 BudgetThinker通过在推理过程中插入控制令牌,使LLM能够精确控制推理过程长度,采用两阶段训练流程提升模型在不同预算下的表现。

详情
AI中文摘要

近年来,大语言模型(LLM)通过增加测试时计算来增强推理能力,但此策略导致显著延迟和资源成本,限制了其在现实时间受限或成本敏感场景中的应用。本文提出BudgetThinker,一种新型框架,旨在使LLM具备预算感知推理能力,通过在推理过程中定期插入特殊控制令牌,持续告知模型剩余令牌预算。我们提出了一种方法,结合监督微调(SFT)和基于课程的学习强化学习(RL)阶段,利用长度感知奖励函数优化准确性和预算遵守度。我们证明BudgetThinker在各种推理预算下的数学基准测试中显著优于强基线。我们的方法提供了一种可扩展且有效的解决方案,用于开发高效可控的LLM推理,使高级模型更适用于资源受限和实时环境。

英文摘要

Recent advancements in Large Language Models (LLMs) have leveraged increased test-time computation to enhance reasoning capabilities, a strategy that, while effective, incurs significant latency and resource costs, limiting their applicability in real-world time-constrained or cost-sensitive scenarios. This paper introduces BudgetThinker, a novel framework designed to empower LLMs with budget-aware reasoning, enabling precise control over the length of their thought processes. We propose a methodology that periodically inserts special control tokens during inference to continuously inform the model of its remaining token budget. This approach is coupled with a comprehensive two-stage training pipeline, beginning with Supervised Fine-Tuning (SFT) to familiarize the model with budget constraints, followed by a curriculum-based Reinforcement Learning (RL) phase that utilizes a length-aware reward function to optimize for both accuracy and budget adherence. We demonstrate that BudgetThinker significantly surpasses strong baselines in maintaining performance across a variety of reasoning budgets on challenging mathematical benchmarks. Our method provides a scalable and effective solution for developing efficient and controllable LLM reasoning, making advanced models more practical for deployment in resource-constrained and real-time environments.

2410.22967 2026-06-10 cs.LG eess.SP 版本更新

Adaptive NAD: Online and Self-adaptive Unsupervised Network Anomaly Detector

自适应NAD:在线且自适应的无监督网络异常检测器

Yachao Yuan, Yu Huang, Yingwen Wu

发表机构 * Suda University(苏州大学)

AI总结 提出一种在线自适应的无监督网络异常检测框架Adaptive NAD,通过两层异常检测策略生成伪标签和在线训练方案,在多个数据集上实现最低误报率和更快推理速度。

详情
AI中文摘要

物联网的广泛使用增加了网络威胁的风险;因此,开发能够适应不断变化的流量模式的异常检测系统(ADS)至关重要。以往的研究主要关注离线无监督学习方法以保护ADS,但这在实际应用中并不适用。本文设计了Adaptive NAD,一种面向安全领域的在线自适应无监督网络异常检测框架。提出了一种两层异常检测策略来生成可靠的高置信度伪标签。然后,引入了一种在线训练方案,通过新颖的阈值计算技术来更新Adaptive NAD。实验结果表明,在CIC-Darknet2020、NSL-KDD和Edge-IIoTset数据集上,Adaptive NAD实现了最低的误报率(分别为1.33%、0.71%和0.08%),并且在线推理延迟比现有最优解决方案快3倍以上。代码已发布在https://github.com/MyLearnCodeSpace/Adaptive-NAD。

英文摘要

The widespread usage of the Internet of Things (IoT) has raised the risks of cyber threats; thus, developing Anomaly Detection Systems (ADSs) that can adapt to evolving traffic pattern is critical. Previous studies primarily focused on offline unsupervised learning methods to safeguard ADSs, which is not applicable in practical real-world applications. In this paper, we design Adaptive NAD, an online and self-Adaptive unsupervised Network Anomaly Detection framework for security domains. A two-layer anomaly detection strategy is proposed to generate reliable high-confidence pseudo-labels. Then, an online training scheme is introduced to update Adaptive NAD by a novel threshold calculation technique. Experimental results demonstrate that Adaptive NAD achieves the lowest false alarm rate (1.33%, 0.71%, and 0.08%) and has a more than 3 times faster online inference latency compared with state-of-the-art solutions on the CIC-Darknet2020, NSL-KDD, and Edge-IIoTset datasets, respectively. The code is released at https://github.com/MyLearnCodeSpace/Adaptive-NAD.

2501.12486 2026-06-10 cs.LG cs.CL 版本更新

The Journey Matters: Average Parameter Count over Pre-training Unifies Sparse and Dense Scaling Laws

训练过程至关重要:平均预训练参数计数统一了稀疏和密集的扩展规律

Tian Jin, Ahmed Imtiaz Humayun, Utku Evci, Suvinay Subramanian, Amir Yazdanbakhsh, Dan Alistarh, Gintare Karolina Dziugaite

发表机构 * MIT CSAIL(MIT 计算科学与人工智能实验室) Rice University(稻大学) Google Research(谷歌研究) Google DeepMind(谷歌深度思维) Google(谷歌) IST Austria(奥地利科学院)

AI总结 本文通过研究80种不同的剪枝计划,发现预训练过程中在25%和75%的计算量启动和结束剪枝可获得最佳评估损失,提出新的扩展规律统一了稀疏和密集预训练的扩展规律。

Comments 17 pages

详情
Journal ref
The Thirteenth International Conference on Learning Representations (ICLR), 2025
AI中文摘要

剪枝通过消除神经网络中不必要的参数,为大型语言模型(LLMs)日益增长的计算需求提供了一个有前途的解决方案。虽然许多研究关注训练后的剪枝,但将剪枝和预训练结合到一个阶段的稀疏预训练提供了一个更简单的替代方案。在本文中,我们通过研究80种不同的剪枝计划,探讨了不同稀疏度和训练持续时间下的最优稀疏预训练配置。我们发现,在总训练计算量的25%处启动剪枝并在75%处结束可获得接近最优的最终评估损失。这些发现为高效且有效的LLMs稀疏预训练提供了有价值的见解。此外,我们提出了一种新的扩展规律,修改了Chinchilla扩展规律以使用预训练期间的平均参数计数。通过实证和理论验证,我们证明了这种修改后的扩展规律能够准确地建模稀疏和密集预训练LLMs的评估损失,统一了预训练范式的扩展规律。我们的发现表明,虽然稀疏预训练在等效计算预算下能获得与密集预训练相同的最终模型质量,但通过减少模型大小,它在推理过程中提供了显著的计算节省潜力。

英文摘要

Pruning eliminates unnecessary parameters in neural networks; it offers a promising solution to the growing computational demands of large language models (LLMs). While many focus on post-training pruning, sparse pre-training--which combines pruning and pre-training into a single phase--provides a simpler alternative. In this work, we present the first systematic exploration of optimal sparse pre-training configurations for LLMs through an examination of 80 unique pruning schedules across different sparsity levels and training durations. We find that initiating pruning at 25% of total training compute and concluding at 75% achieves near-optimal final evaluation loss. These findings provide valuable insights for efficient and effective sparse pre-training of LLMs. Furthermore, we propose a new scaling law that modifies the Chinchilla scaling law to use the average parameter count over pre-training. Through empirical and theoretical validation, we demonstrate that this modified scaling law accurately models evaluation loss for both sparsely and densely pre-trained LLMs, unifying scaling laws across pre-training paradigms. Our findings indicate that while sparse pre-training achieves the same final model quality as dense pre-training for equivalent compute budgets, it provides substantial benefits through reduced model size, enabling significant potential computational savings during inference.

2502.11517 2026-06-10 cs.CL cs.DC cs.LG 版本更新

Learning to Keep a Promise: Scaling Language Model Decoding Parallelism with Learned Asynchronous Decoding

学习承诺:通过学习异步解码扩展语言模型解码并行性

Tian Jin, Ellie Y. Cheng, Zack Ankner, Nikunj Saunshi, Blake M. Elias, Amir Yazdanbakhsh, Jonathan Ragan-Kelley, Suvinay Subramanian, Michael Carbin

发表机构 * DeepMind, London, UK(深度思维公司,伦敦,英国) Google Research, New York, NY, USA(谷歌研究院,纽约,纽约州,美国) Stanford University, Stanford, CA, USA(斯坦福大学,斯坦福,加利福尼亚州,美国) University of Toronto, Toronto, Ontario, Canada(多伦多大学,多伦多,安大略省,加拿大) University of Washington, Seattle, WA, USA(华盛顿大学,西雅图,华盛顿州,美国)

AI总结 本文提出PASTA系统,通过学习使语言模型识别语义独立性,提升解码并行性,实验证明在解码速度和响应质量上优于现有方法。

Comments 15 pages

详情
Journal ref
Proceedings of the 42nd International Conference on Machine Learning (ICML), PMLR 267:27941-27956, 2025
AI中文摘要

传统的自回归大语言模型(LLM)解码通常是顺序进行的,逐个生成token。新兴的研究探索了通过识别并同时生成语义独立的LLM响应片段来实现并行解码。然而,这些技术依赖于手工制定的启发式方法,与语法结构如列表和段落相关,使它们僵化且不精确。我们提出了PASTA,一个基于学习的系统,教会LLM识别语义独立性并在自身响应中表达并行解码机会。其核心是PASTA-LANG及其解释器:PASTA-LANG是一种注释语言,使LLM能够在自身响应中表达语义独立性;语言解释器作用于这些注释,以在推理时实时协调并行解码。通过两阶段微调过程,我们训练LLM生成PASTA-LANG注释,以优化响应质量和解码速度。在AlpacaEval指令遵循基准上的评估显示,我们的方法在解码速度和响应质量上优于现有方法;我们的结果表明,几何平均速度提升范围从1.21x到1.93x,对应的质量变化为+2.2%到-7.1%,通过长度控制的胜利率与顺序解码基线比较。

英文摘要

Decoding with autoregressive large language models (LLMs) traditionally occurs sequentially, generating one token after another. An emerging line of work explored parallel decoding by identifying and simultaneously generating semantically independent chunks of LLM responses. However, these techniques rely on hand-crafted heuristics tied to syntactic structures like lists and paragraphs, making them rigid and imprecise. We present PASTA, a learning-based system that teaches LLMs to identify semantic independence and express parallel decoding opportunities in their own responses. At its core are PASTA-LANG and its interpreter: PASTA-LANG is an annotation language that enables LLMs to express semantic independence in their own responses; the language interpreter acts on these annotations to orchestrate parallel decoding on-the-fly at inference time. Through a two-stage finetuning process, we train LLMs to generate PASTA-LANG annotations that optimize both response quality and decoding speed. Evaluation on AlpacaEval, an instruction following benchmark, shows that our approach Pareto-dominates existing methods in terms of decoding speed and response quality; our results demonstrate geometric mean speedups ranging from 1.21x to 1.93x with corresponding quality changes of +2.2% to -7.1%, measured by length-controlled win rates against sequential decoding baseline.

2501.11937 2026-06-10 cs.LG cs.AI 版本更新

MeshONet: A Generalizable and Efficient Operator Learning Method for Structured Mesh Generation

MeshONet: 一种通用且高效的结构网格生成运算学习方法

Jing Xiao, Xinhai Chen, Qingling Wang, Jie Liu

发表机构 * Laboratory of Digitizing Software for Frontier Equipment, Science and Technology on Parallel and Distributed Processing Laboratory, National University of Defense Technology(前沿装备数字化软件实验室、并行与分布式处理技术实验室、国防科技大学)

AI总结 本文提出MeshONet,一种用于结构网格生成的通用智能学习方法,通过将网格生成任务转化为运算学习问题,实现了高效生成和跨几何泛化。

详情
Journal ref
Neural Networks 199: 108746 (2026)
AI中文摘要

网格生成在科学计算中起着关键作用。传统网格生成方法如TFI和基于PDE的方法往往难以在效率和网格质量之间取得平衡。为解决这一挑战,近年来出现了物理引导的智能学习方法,显著提高了生成效率并保持了高质量网格。然而,物理引导方法在应用于以前未见过的几何时无法泛化,因为即使边界形状的微小变化也需要负担得起的重新训练来适应新的几何变化。在本文中,我们引入了MeshONet,这是第一个用于结构网格生成的通用智能学习方法。该方法将网格生成任务转换为一个具有多个输入和解函数的运算学习问题。为了有效克服运算学习方法的多变量映射限制,我们提出了一种双分支、共享主干的架构,以基于输入输出对的方式近似函数空间之间的映射。实验结果表明,MeshONet在生成效率上比传统方法快了四个数量级。它还能够泛化到不同的几何形状而无需重新训练,大大增强了智能方法的实用性。

英文摘要

Mesh generation plays a crucial role in scientific computing. Traditional mesh generation methods, such as TFI and PDE-based methods, often struggle to achieve a balance between efficiency and mesh quality. To address this challenge, physics-informed intelligent learning methods have recently emerged, significantly improving generation efficiency while maintaining high mesh quality. However, physics-informed methods fail to generalize when applied to previously unseen geometries, as even small changes in the boundary shape necessitate burdensome retraining to adapt to new geometric variations. In this paper, we introduce MeshONet, the first generalizable intelligent learning method for structured mesh generation. The method transforms the mesh generation task into an operator learning problem with multiple input and solution functions. To effectively overcome the multivariable mapping restriction of operator learning methods, we propose a dual-branch, shared-trunk architecture to approximate the mapping between function spaces based on input-output pairs. Experimental results show that MeshONet achieves a speedup of up to four orders of magnitude in generation efficiency over traditional methods. It also enables generalization to different geometries without retraining, greatly enhancing the practicality of intelligent methods.

2409.12263 2026-06-10 cs.LG cs.SI 版本更新

Detecting LGBTQ+ Instances of Cyberbullying

检测LGBTQ+群体的网络欺凌实例

Arslan Bisharat, Manuel Sandoval Madrigal, Mohammed Abuhamad, Deborah L. Hall, Yasin N. Silva

发表机构 * Loyola University Chicago(洛伊拉大学芝加哥分校) Arizona State University(亚利桑那州立大学)

AI总结 本文研究利用Transformer模型识别针对LGBTQ+群体的网络欺凌,分析不同模型在复杂微妙欺凌行为中的有效性。

Comments 10 pages, 4 tables, 1 figure, 17th International Conference on Social Computing, Behavioral-Cultural Modeling, & Prediction and Behavior Representation in Modeling and Simulation

详情
AI中文摘要

社交媒体持续影响人类发展轨迹,但其引入也使键盘成为武器,使原本用于现实欺凌的攻击性语言跃入屏幕,即网络欺凌。网络欺凌对全球青少年构成重大威胁,影响许多人心理健康。LGBTQ+群体尤其易受网络欺凌影响,研究发现认同LGBTQ+与遭受更多在线骚扰存在强相关性。因此,开发能准确识别LGBTQ+成员网络欺凌事件的机器学习模型至关重要。本研究旨在比较几种Transformer模型在识别针对LGBTQ+个体网络欺凌方面的有效性,通过真实社交媒体数据评估现有方法的优劣。

英文摘要

Social media continues to have an impact on the trajectory of humanity. However, its introduction has also weaponized keyboards, allowing the abusive language normally reserved for in-person bullying to jump onto the screen, i.e., cyberbullying. Cyberbullying poses a significant threat to adolescents globally, affecting the mental health and well-being of many. A group that is particularly at risk is the LGBTQ+ community, as researchers have uncovered a strong correlation between identifying as LGBTQ+ and suffering from greater online harassment. Therefore, it is critical to develop machine learning models that can accurately discern cyberbullying incidents as they happen to LGBTQ+ members. The aim of this study is to compare the efficacy of several transformer models in identifying cyberbullying targeting LGBTQ+ individuals. We seek to determine the relative merits and demerits of these existing methods in addressing complex and subtle kinds of cyberbullying by assessing their effectiveness with real social media data.

2409.04519 2026-06-10 quant-ph cs.AI cs.LG physics.data-an 版本更新

The role of data embedding in quantum autoencoders for improved anomaly detection

数据嵌入在量子自编码器中用于改进异常检测的作用

Jack Y. Araz, Michael Spannowsky

发表机构 * Thomas Jefferson National Accelerator Facility(托马斯·杰斐逊国家加速器设施) Institute for Particle Physics Phenomenology(粒子物理学现象研究所) Durham University(达勒姆大学)

AI总结 研究探讨了三种数据嵌入技术对量子自编码器异常检测性能的影响,发现改进的嵌入策略能显著提升检测准确性和数据表征能力。

Comments 8 pages, 5 figures, 4 tables

详情
Journal ref
Quantum Mach. Intell. 8, 61 (2026)
AI中文摘要

量子自编码器(QAEs)在异常检测任务中的性能严重依赖于数据嵌入和ansatz设计。本研究探讨了三种数据嵌入技术——数据重新上传、并行嵌入和交替嵌入——对QAEs表征能力和检测效果的影响。我们的发现表明,即使使用相对简单的变分电路,增强的数据嵌入策略也能显著提高异常检测准确性和不同数据集下底层数据的表征能力。从低维数据的玩具示例开始,我们通过可视化展示不同嵌入技术对模型表征的影响。然后我们将分析扩展到复杂、高维数据集,强调嵌入方法对QAE性能的重大影响。

英文摘要

The performance of Quantum Autoencoders (QAEs) in anomaly detection tasks is critically dependent on the choice of data embedding and ansatz design. This study explores the effects of three data embedding techniques, data re-uploading, parallel embedding, and alternate embedding, on the representability and effectiveness of QAEs in detecting anomalies. Our findings reveal that even with relatively simple variational circuits, enhanced data embedding strategies can substantially improve anomaly detection accuracy and the representability of underlying data across different datasets. Starting with toy examples featuring low-dimensional data, we visually demonstrate the effect of different embedding techniques on the representability of the model. We then extend our analysis to complex, higher-dimensional datasets, highlighting the significant impact of embedding methods on QAE performance.

2408.07925 2026-06-10 cs.LG eess.SP 版本更新

A Single Channel-Based Neonatal Sleep-Wake Classification using Hjorth Parameters and Improved Gradient Boosting

基于Hjorth参数和改进梯度提升的单通道新生儿睡眠-觉醒分类

Arslan Bisharat, Muhammad Mubeen, Saadullah Farooq Abbasi, Muhammad Shahbaz Khan, Wadii Boulila, Jawad Ahmad

发表机构 * Department of Computer Science, Loyola University(洛约拉大学计算机科学系) Department of Computer Science, University of People(人民大学计算机科学系) Department of Electronic, Electrical and Systems Engineering, University of Birmingham(伯明翰大学电子、电气与系统工程系) School of Computing, Engineering and the Built Environment, Edinburgh Napier University(爱丁堡纳皮尔大学计算、工程与环境科学学院) RIOTU Lab, Prince Sultan University(普森大学RIOTU实验室)

AI总结 本文提出利用单通道梯度提升算法与Hjorth特征进行新生儿睡眠阶段分类,通过随机搜索交叉验证优化参数,达到82.35%的分类准确率,验证方法采用5折交叉验证,提升现有算法并拓展应用范围。

Comments 8 pages, 5 figures, 3 tables, International Polydisciplinary Conference on Artificial Intelligence and New Technologies

详情
AI中文摘要

睡眠在新生儿发育中起关键作用。在新生儿重症监护室(NICU)中监测新生儿睡眠模式对于理解成熟过程至关重要。尽管多通道脑电图(EEG)被认为是睡眠分类的最佳实践,但其成本和对人工标注的依赖带来了挑战。现有研究常依赖多通道EEG信号,但对新生儿的脆弱性和可能影响睡眠质量存在担忧。本文提出一种新颖的新生儿睡眠阶段分类方法,采用单通道梯度提升算法与Hjorth特征。梯度提升参数通过随机搜索交叉验证(randomsearchCV)进行微调,实现82.35%的新生儿睡眠-觉醒分类准确率。通过5折交叉验证进行验证。所提算法不仅提升了现有新生儿睡眠算法,还为更广泛的应用开辟了新途径。

英文摘要

Sleep plays a crucial role in neonatal development. Monitoring the sleep patterns in neonates in a Neonatal Intensive Care Unit (NICU) is imperative for understanding the maturation process. While polysomnography (PSG) is considered the best practice for sleep classification, its expense and reliance on human annotation pose challenges. Existing research often relies on multichannel EEG signals; however, concerns arise regarding the vulnerability of neonates and the potential impact on their sleep quality. This paper introduces a novel approach to neonatal sleep stage classification using a single-channel gradient boosting algorithm with Hjorth features. The gradient boosting parameters are fine-tuned using random search cross-validation (randomsearchCV), achieving an accuracy of 82.35% for neonatal sleep-wake classification. Validation is conducted through 5-fold cross-validation. The proposed algorithm not only enhances existing neonatal sleep algorithms but also opens avenues for broader applications.

2408.07922 2026-06-10 cs.CV cs.LG 版本更新

A Deep Features-Based Approach Using Modified ResNet50 and Gradient Boosting for Visual Sentiments Classification

基于改进ResNet50和梯度提升的深度特征方法用于视觉情感分类

Arslan Bisharat, Muhammad Mubeen, Arslan Akram, Saadullah Farooq Abbasi, Muhammad Salman Ali, Muhammad Usman Tariq

发表机构 * Department of Computer Science(计算机科学系) Loyola University Chicago(芝加哥洛伊拉大学) University Of the People(人民大学) The Superior University Lahore(拉合尔超级大学) University of Birmingham(伯明翰大学)

AI总结 本文提出一种结合改进ResNet50提取深度特征和梯度提升算法的情感分类方法,通过两个基准数据集验证,优于现有深度学习和机器学习模型。

Comments 4 pages, 4 figures, 3 tables, IEEE International Conference on Multimedia Information Processing and Retrieval (MIPR) 2024

详情
AI中文摘要

视觉情感分析(VSA)的多功能性是其日益受到关注的原因之一。由于以往研究主要集中在单一模态的情感分析上,如文本,因此难以高效管理包含视觉信息的社会媒体数据。此外,大多数视觉情感研究需要充分分类情感,因为它们主要关注简单合并模态属性而未深入研究其复杂关系。为此,提出了一种融合深度学习和机器学习算法的方法。本研究使用深度特征方法进行多类分类,从改进的ResNet50中提取深度特征,并使用梯度提升算法对包含情感内容的照片进行分类。该方法在两个基准数据集CrowdFlower和GAPED上进行了彻底评估。最后,使用最先进的深度学习和机器学习模型来比较所提出的方法。与现有最先进的方法相比,所提出的方法在所呈现的数据集上表现出色。

英文摘要

The versatile nature of Visual Sentiment Analysis (VSA) is one reason for its rising profile. It isn't easy to efficiently manage social media data with visual information since previous research has concentrated on Sentiment Analysis (SA) of single modalities, like textual. In addition, most visual sentiment studies need to adequately classify sentiment because they are mainly focused on simply merging modal attributes without investigating their intricate relationships. This prompted the suggestion of developing a fusion of deep learning and machine learning algorithms. In this research, a deep feature-based method for multiclass classification has been used to extract deep features from modified ResNet50. Furthermore, gradient boosting algorithm has been used to classify photos containing emotional content. The approach is thoroughly evaluated on two benchmarked datasets, CrowdFlower and GAPED. Finally, cutting-edge deep learning and machine learning models were used to compare the proposed strategy. When compared to state-of-the-art approaches, the proposed method demonstrates exceptional performance on the datasets presented.

2310.04680 2026-06-10 cs.CL cs.AI cs.LG 版本更新

The Cost of Down-Scaling Language Models: Fact Recall Deteriorates before In-Context Learning

大语言模型降维的成本:事实回忆在内省学习之前恶化

Tian Jin, Nolan Clement, Xin Dong, Vaishnavh Nagarajan, Michael Carbin, Jonathan Ragan-Kelley, Gintare Karolina Dziugaite

发表机构 * MIT CSAIL(麻省理工学院计算机科学与人工智能实验室) MIT Harvard University(麻省理工学院哈佛大学) Google Research(谷歌研究) Google DeepMind(谷歌深Mind)

AI总结 研究探讨了大语言模型参数数量缩放对核心能力的影响,发现模型规模缩减会显著降低事实回忆能力,但对内省信息处理影响较小。

详情
Journal ref
The Twelfth International Conference on Learning Representations (ICLR), 2024
AI中文摘要

如何缩放大语言模型(LLMs)的参数数量会影响其核心能力?我们研究了两种自然缩放技术——权重剪枝和简单训练更小或更大的模型(称为密集缩放)——对LLMs两个核心能力的影响:(a)回忆训练期间呈现的事实,以及(b)处理推理期间呈现的信息。通过设计一系列任务来区分这两种能力,我们发现这两种能力在缩放时的表现存在显著差异。通过超过30%的模型规模缩减(通过任一缩放方法)会显著降低对训练期间呈现事实的回忆能力。然而,60-70%的缩减在很大程度上保留了模型处理内省信息的各种方式,从从长上下文检索答案到从内省示例中学习参数化函数。两种缩放方法均表现出这种行为,表明缩放模型大小对事实回忆和内省学习有本质上不同的影响。

英文摘要

How does scaling the number of parameters in large language models (LLMs) affect their core capabilities? We study two natural scaling techniques -- weight pruning and simply training a smaller or larger model, which we refer to as dense scaling -- and their effects on two core capabilities of LLMs: (a) recalling facts presented during pre-training and (b) processing information presented in-context during inference. By curating a suite of tasks that help disentangle these two capabilities, we find a striking difference in how these two abilities evolve due to scaling. Reducing the model size by more than 30\% (via either scaling approach) significantly decreases the ability to recall facts seen in pre-training. Yet, a 60--70\% reduction largely preserves the various ways the model can process in-context information, ranging from retrieving answers from a long context to learning parameterized functions from in-context exemplars. The fact that both dense scaling and weight pruning exhibit this behavior suggests that scaling model size has an inherently disparate effect on fact recall and in-context learning.

2212.04930 2026-06-10 eess.AS cs.HC cs.LG cs.SD 版本更新

DDSupport: Language Learning Support System that Displays Differences and Distances from Model Speech

DDSupport: 一种展示差异和距离的语言学习支持系统

Kazuki Kawamura, Jun Rekimoto

发表机构 * The University of Tokyo, Tokyo, Japan(东京大学) Sony CSL Kyoto, Kyoto, Japan(索尼CSL京都)

AI总结 本文提出DDSupport系统,通过小规模未标注语音数据计算学习者发音评分和错误识别,以直观方式展示学习者与模型发音的差异和距离,帮助非母语者提升英语口语清晰度。

详情
Journal ref
2022 21st IEEE International Conference on Machine Learning and Applications (ICMLA)
AI中文摘要

当初学者学习非母语发音时,他们难以自行判断发音是否良好。因此,计算机辅助发音训练系统被用来检测学习者的误发音。这些系统通常将用户发音与特定母语者的发音进行比较,以节奏、音素或单词为单位计算差异。然而,它们需要大量详细标注的语音数据或只能比较单一特定母语者。为克服这些问题,我们提出了一种新的语言学习支持系统,该系统基于少量未标注语音数据计算发音评分和检测初学者的误发音,而无需与特定个体比较。所提出的系统使用基于深度学习的语音处理技术,以直观的方式显示学习者发音的评分以及学习者与一组模型发音之间的差异/距离。学习者可以通过消除差异并缩短与模型的距离来逐步提高发音。此外,由于发音评分和差异/距离不是基于特定模型的特定句子计算的,用户可以自由选择他们想学习的句子。我们还构建了一个应用程序来帮助非母语者学习英语,并确认它可以提高用户的语音可懂度。

英文摘要

When beginners learn to speak a non-native language, it is difficult for them to judge for themselves whether they are speaking well. Therefore, computer-assisted pronunciation training systems are used to detect learner mispronunciations. These systems typically compare the user's speech with that of a specific native speaker as a model in units of rhythm, phonemes, or words and calculate the differences. However, they require extensive speech data with detailed annotations or can only compare with one specific native speaker. To overcome these problems, we propose a new language learning support system that calculates speech scores and detects mispronunciations by beginners based on a small amount of unannotated speech data without comparison to a specific person. The proposed system uses deep learning--based speech processing to display the pronunciation score of the learner's speech and the difference/distance between the learner's and a group of models' pronunciation in an intuitively visual manner. Learners can gradually improve their pronunciation by eliminating differences and shortening the distance from the model until they become sufficiently proficient. Furthermore, since the pronunciation score and difference/distance are not calculated compared to specific sentences of a particular model, users are free to study the sentences they wish to study. We also built an application to help non-native speakers learn English and confirmed that it can improve users' speech intelligibility.

2209.14328 2026-06-10 quant-ph cond-mat.quant-gas cond-mat.str-el cs.LG 版本更新

Scalably learning quantum many-body Hamiltonians from dynamical data

从动力学数据中可扩展地学习量子多体哈密顿量

Frederik Wilde, Augustine Kshetrimayum, Ingo Roth, Dominik Hangleiter, Ryan Sweke, Jens Eisert

发表机构 * Dahlem Center for Complex Quantum Systems(达姆施塔特复杂量子系统中心) Quantum Research Centre, Technology Innovation Institute (TII)(技术创新研究所量子研究中心) Joint Center for Quantum Information and Computer Science (QuICS), University of Maryland & NIST(联合量子信息与计算机科学中心(QuICS),马里兰大学及国家标准与技术研究院) Joint Quantum Institute (JQI), University of Maryland & NIST(联合量子研究所(JQI),马里兰大学及国家标准与技术研究院) Fraunhofer Heinrich Hertz Institute(弗劳恩霍夫海因里希·赫兹研究所)

AI总结 本文提出一种可扩展的数据驱动方法,利用梯度优化和张量网络,从动力学数据中学习多体相互作用哈密顿量家族,针对一维Heisenberg模型展示出误差随系统规模递减的特性。

Comments 11 pages, 5 figures

详情
Journal ref
Quantum Sci. Technol. 11, 035002 (2026)
AI中文摘要

闭合量子系统的物理由其哈密顿量决定。然而在大多数实际情况下,哈密顿量并不精确已知,最终只有通过测量获得的数据存在。本文提出一种高度可扩展、数据驱动的方法,结合机器学习中的梯度优化技术和高效的张量网络量子态表示,从动力学数据中学习多体相互作用哈密顿量家族。该方法具有高度实用性、实验友好性,并能扩展到超过100个自旋的系统规模。特别地,在合成数据上展示,即使受限于单一初始态、少量单量子比特可观测量和相对较短的时间演化,算法仍能有效工作。对于一维Heisenberg模型,该算法的误差常数与系统规模成反比,且随数据集大小的平方根递减。

英文摘要

The physics of a closed quantum mechanical system is governed by its Hamiltonian. However, in most practical situations, this Hamiltonian is not precisely known, and ultimately all there is are data obtained from measurements on the system. In this work, we introduce a highly scalable, data-driven approach to learning families of interacting many-body Hamiltonians from dynamical data, by bringing together techniques from gradient-based optimization from machine learning with efficient quantum state representations in terms of tensor networks. Our approach is highly practical, experimentally friendly, and intrinsically scalable to allow for system sizes of above 100 spins. In particular, we demonstrate on synthetic data that the algorithm works even if one is restricted to one simple initial state, a small number of single-qubit observables, and time evolution up to relatively short times. For the concrete example of the one-dimensional Heisenberg model our algorithm exhibits an error constant in the system size and scaling as the inverse square root of the size of the data set.