arXivDaily: arXiv daily academic digest, updated Monday to Friday

1. Deep learning architectures and training methods (11 papers)

2606.19379 2026-06-19 cs.LG cs.AI cs.CL new submission

How Linear Is a Transformer Feed-Forward Block? Per-Block Linear Recoverability Is Learned, Not Architectural

Stuart Whipp

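Affiliations * Independent Research

AI summary Uses exact least-squares linear approximation to measure the linear recoverability of each feed-forward block in trained Transformers; finds it highly heterogeneous and non-monotone with depth, a learned rather than architectural property, and useful for compression and diagnosis.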

Comments 14 pages, 5 figures

Abstract

Transformer feed-forward networks (FFNs) are often treated as nonlinear stores of computation, yet how nonlinear a trained FFN block actually is has rarely been measured. We treat each FFN as a position-wise input-to-output map and split it into the exact least-squares linear approximation plus a residual. The held-out variance the closed-form linear map explains defines a block's linear recoverability (R^2_lin), an optimiser-free measure of its linearity. Across all twelve blocks of GPT-2, Pythia-160m, and llama-160m, R^2_lin is highly heterogeneous and non-monotone with depth, ranging from near-linear (>0.99) to strongly nonlinear (<0.3) between adjacent blocks, and is not set by the activation function: same-width GELU models GPT-2 and Pythia-160m have sharply different profiles, so recoverability is a learned property of individual trained blocks, not an architectural one. A low-rank bilinear probe of the residual recovers only a few points of R^2, with gain uncorrelated with residual nonlinearity: the unrecovered computation is not a single position-wise product but higher-order or distributed structure. The measurement also serves as a targeted compression signal: recoverable blocks admit large single-layer replacements (GPT-2's early FFN at 8x fewer parameters for +0.77 perplexity), while low-recoverability blocks flag where this is unsafe. It further exposes a methodological pitfall: trained linear baselines can badly under-converge on ill-conditioned transformer activations, so we report the exact closed-form least-squares ceiling throughout.
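The linear-recoverability measurement described in this abstract can be sketched as follows. This is a minimal illustration, not the paper's code: the toy tanh block, the bias column in the fit, the pooling of variance over output dimensions, and all names (`linear_recoverability`, `toy_ffn`, `gain`) are assumptions made for the example.

```python
import numpy as np

def linear_recoverability(x_train, y_train, x_test, y_test):
    """Held-out R^2 of the closed-form least-squares affine map from x to y."""
    # Bias column: the fit is affine (an assumption of this sketch).
    a_train = np.hstack([x_train, np.ones((len(x_train), 1))])
    a_test = np.hstack([x_test, np.ones((len(x_test), 1))])
    # Exact least-squares solution: no optimiser, no learning rate.
    coef, *_ = np.linalg.lstsq(a_train, y_train, rcond=None)
    residual = y_test - a_test @ coef
    centered = y_test - y_test.mean(axis=0, keepdims=True)
    # Variance is pooled over all output dimensions (also an assumption).
    return 1.0 - np.sum(residual ** 2) / np.sum(centered ** 2)

def toy_ffn(x, w_in, w_out, gain):
    """Hypothetical stand-in for an FFN block; small gain keeps tanh near-linear."""
    return np.tanh(gain * (x @ w_in)) @ w_out

rng = np.random.default_rng(0)
d_model, d_hidden, n = 16, 64, 4000
w_in = rng.normal(size=(d_model, d_hidden)) / np.sqrt(d_model)
w_out = rng.normal(size=(d_hidden, d_model)) / np.sqrt(d_hidden)
x = rng.normal(size=(n, d_model))
x_tr, x_te = x[: n // 2], x[n // 2 :]

def block_r2(gain):
    """Fit on the first half of the positions, score on the held-out half."""
    return linear_recoverability(
        x_tr, toy_ffn(x_tr, w_in, w_out, gain),
        x_te, toy_ffn(x_te, w_in, w_out, gain),
    )

r2_near_linear = block_r2(0.01)  # block operating in its linear regime
r2_nonlinear = block_r2(10.0)    # block driven into saturation
print(r2_near_linear, r2_nonlinear)
```

For a real model, `x` and the block outputs would be activations captured at the input and output of one FFN block; the toy block only shows that the same architecture can score very differently depending on its weights.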

2606.19489 2026-06-19 cs.LG cs.AI new submission

Concept Flow Models: Anchoring Concept-Based Reasoning with Hierarchical Bottlenecks

Ya Wang, Adrian Paschke

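Affiliations * Fraunhofer Institute for Open Communication Systems; Freie Universität Berlin

AI summary Proposes Concept Flow Models (CFMs), which replace the flat bottleneck with a hierarchical concept decision tree that progressively narrows the prediction scope, reducing information leakage and improving interpretability while preserving predictive performance.

Journal ref Transactions on Machine Learning Research, 2/2026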

Abstract

Concept Bottleneck Models (CBMs) enhance interpretability by projecting learned features into a human-understandable concept space. Recent approaches leverage vision-language models to generate concept embeddings, reducing the need for manual concept annotations. However, these models suffer from a critical limitation: as the number of concepts approaches the embedding dimension, information leakage increases, enabling the model to exploit spurious or semantically irrelevant correlations and undermining interpretability. In this work, we propose Concept Flow Models (CFMs), which replace the flat bottleneck with a hierarchical, concept-driven decision tree. Each internal node in the hierarchy focuses on a localized subset of discriminative concepts, progressively narrowing the prediction scope. Our framework constructs decision hierarchies from visual embeddings, distributes semantic concepts at each hierarchy level, and trains differentiable concept weights through probabilistic tree traversal. Extensive experiments on diverse benchmarks demonstrate that CFMs match the predictive performance of flat CBMs, while substantially mitigating information leakage by reducing effective concept usage. Furthermore, CFMs yield stepwise decision flows that enable transparent and auditable model reasoning with hierarchical class structures.

2606.19697 2026-06-19 cs.LG cs.AI cs.CL new submission

Efficiently Representing Algorithms With Chain-of-Thought Transformers

Yanhong Li, Anej Svete, Ashish Sabharwal, William Merrill

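Affiliations * Allen Institute for AI; ETH Zürich

AI summary Proves that chain-of-thought Transformers can efficiently simulate Word RAM algorithms, including sorting and Dijkstra's algorithm, with poly-logarithmic overhead, improving on the quadratic overhead of simulating Turing machines.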

Abstract

The increasing popularity of reasoning models -- language models that output a series of reasoning or thought tokens before producing an answer -- is justified, in part, by theoretical results showing that chain-of-thought (CoT) transformers can simulate Turing machines, and thus perform arbitrary computation. However, the Turing machine, while suitable for complexity-theoretic analysis, is not convenient, intuitive, or efficient for discussing algorithms. Algorithms are typically designed and analyzed at a higher level of abstraction, captured by the Word RAM model with random-access memory and unit-cost operations on $O(\log n)$-bit words. As a result, Word RAM algorithms can be substantially more efficient than their Turing machine counterparts, raising the question: Can CoT transformers efficiently simulate Word RAM algorithms? For instance, can they sort $n$ items in $O(n \log n)$ steps or run Dijkstra's algorithm in $O(E + V \log V)$ steps? We answer affirmatively, up to poly-logarithmic overhead. We first establish this for finite-precision transformers with poly-logarithmic width and rightmost unique hard attention, then strengthen the result to two more practical settings with finite width and log-precision: continuous CoT, where reasoning takes the form of vectors rather than tokens, and a hybrid architecture in which transformer layers sit atop a recurrent (linear RNN) layer. In all three cases, we find that CoT can efficiently simulate any Word RAM algorithm with only a poly-logarithmic overhead in $n$. This overhead reduces to log-square when the Word RAM has a "flat" instruction set, and only logarithmic for multiplication-free flat instructions -- in stark contrast to known CoT simulations of Turing machines, which require quadratic overhead over Word RAM.

2606.19754 2026-06-19 cs.LG cs.NA math.NA new submission

Learning universal approximations for partial differential equations with Physics-Informed Broad Learning System

Zhiwen Yu, Derong Yang, Liujian Zhang, Kaixiang Yang, Peilin Zhan, Jianmin Lv, Jane You, C. L. Philip Chen

Affiliations * School of Computer Science and Engineering, South China University of Technology; Peng Cheng Laboratory; School of Future Technology, South China University of Technology; School of Computer Science and Technology, Guangdong University of Technology; Department of Industrial and Systems Engineering, The Hong Kong Polytechnic University

AI summary Proposes the Physics-Informed Broad Learning System (PIBLS), which solves linear and nonlinear PDEs efficiently through backpropagation-free least-squares optimization, running one to three orders of magnitude faster than conventional PINNs with higher accuracy.

Abstract

Partial differential equations (PDEs) play a central role in modeling complex physical, biological, and engineering systems. While traditional numerical solvers are robust, they often incur prohibitive computational costs due to mesh dependencies, whereas recent Physics-Informed Neural Networks (PINNs) offer a mesh-free alternative but frequently suffer from slow convergence and optimization instability. To bridge this gap, this article proposes the Physics-Informed Broad Learning System (PIBLS), a novel backpropagation-free framework that reformulates PDE solving as a direct least-squares optimization. We improved an algorithm within this framework to handle nonlinear PDEs efficiently and provide a rigorous mathematical proof establishing the universal approximation property of PIBLS for these equations. Experiments on linear and nonlinear PDEs demonstrate that PIBLS is one to three orders of magnitude faster than conventional PINNs while achieving significantly higher solution accuracy. This framework provides a computationally efficient paradigm for scientific machine learning, offering a practical, high-speed alternative for real-time simulation and design optimization tasks.
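To make "PDE solving as a direct least-squares optimization" concrete, here is a minimal sketch on a linear 1D Poisson problem, $u''(x) = f(x)$ with $u(0) = u(1) = 0$. It is not the PIBLS algorithm: it uses fixed random tanh features plus constant and linear terms as a stand-in for the broad-learning feature nodes, and the feature ranges, collocation grid, and boundary weight are all assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 50
# Fixed random hidden weights: never trained, only the output layer is solved.
w = rng.uniform(-5.0, 5.0, size=n_features)
b = rng.uniform(-5.0, 5.0, size=n_features)

def features(x):
    """Random tanh features plus constant and linear terms."""
    t = np.tanh(np.outer(x, w) + b)
    return np.hstack([t, np.ones((len(x), 1)), x[:, None]])

def features_xx(x):
    """Second x-derivative of each feature (zero for the constant and linear terms)."""
    t = np.tanh(np.outer(x, w) + b)
    d2 = (w ** 2) * (-2.0 * t * (1.0 - t ** 2))
    return np.hstack([d2, np.zeros((len(x), 2))])

# Manufactured problem with known solution u(x) = sin(pi x).
x_in = np.linspace(0.0, 1.0, 200)          # collocation points
x_bc = np.array([0.0, 1.0])                # boundary points
f = -np.pi ** 2 * np.sin(np.pi * x_in)     # right-hand side of u'' = f

bc_weight = 100.0  # assumed weighting of the boundary rows
A = np.vstack([features_xx(x_in), bc_weight * features(x_bc)])
rhs = np.concatenate([f, np.zeros(2)])

# One linear least-squares solve replaces gradient-based training.
beta, *_ = np.linalg.lstsq(A, rhs, rcond=None)

x_test = np.linspace(0.0, 1.0, 101)
u_pred = features(x_test) @ beta
max_err = np.max(np.abs(u_pred - np.sin(np.pi * x_test)))
print(max_err)
```

Because the PDE here is linear in the output weights, a single solve suffices; a nonlinear PDE would need an outer iteration around such solves, which this sketch does not attempt.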

2606.19850 2026-06-19 cs.LG cs.AI new submission

Neural Additive and Basis Models with Feature Selection and Interactions

Yasutoshi Kishimoto, Kota Yamanishi, Takuya Matsuda, Shinichi Shirakawa

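Affiliations * Yokohama National University

AI summary Introduces a feature selection mechanism into neural additive models and neural basis models; a feature selection layer reduces computational overhead and enables learning feature interactions on high-dimensional data, with performance better than or comparable to existing GAM methods.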

Comments Accepted at PAKDD 2024. Code is available at https://github.com/shiralab/NAM-FS

Abstract

Deep neural networks (DNNs) exhibit attractive performance in various fields but often suffer from low interpretability. The neural additive model (NAM) and its variant called the neural basis model (NBM) use neural networks (NNs) as nonlinear shape functions in generalized additive models (GAMs). Both models are highly interpretable and exhibit good performance and flexibility for NN training. NAM and NBM can provide and visualize the contribution of each feature to the prediction owing to GAM-based architectures. However, when using two-input NNs to consider feature interactions or when applying them to high-dimensional datasets, training NAM and NBM becomes intractable due to the increase in the computational resources required. This paper proposes incorporating the feature selection mechanism into NAM and NBM to resolve computational bottlenecks. We introduce the feature selection layer in both models and update the selection weights during training. Our method is simple and can reduce computational costs and model sizes compared to vanilla NAM and NBM. In addition, it enables us to use two-input NNs even in high-dimensional datasets and capture feature interactions. We demonstrate that the proposed models are computationally efficient compared to vanilla NAM and NBM, and they exhibit better or comparable performance with state-of-the-art GAMs.

2606.19853 2026-06-19 cs.LG physics.comp-ph new submission

Physics-Informed Neural Network with Squeeze-Excitation-like Attention

Yun-Fei Song, Long-Gang Pang, Fu-Peng Li, Jun-Jie Zhang

Affiliations * Key Laboratory of Quark and Lepton Physics (MOE) & Institute of Particle Physics, Central China Normal University; Artificial Intelligence and Computational Physics Research Center, Central China Normal University; Key Laboratory of Nuclear Physics and Ion-beam Application (MOE) & Institute of Modern Physics, Fudan University; Shanghai Research Center for Theoretical Nuclear Physics, NSFC and Fudan University; Northwest Institute of Nuclear Technology

AI summary Proposes the SEA-PINN architecture, which uses a Squeeze-Excitation-like attention mechanism to dynamically adjust neuron importance and gives stable initialization, with negligible variance on 17 of 20 benchmark problems; it reaches accuracy comparable to TSA-PINN without Fourier embeddings or periodic activations, and can serve as a lightweight plug-in to improve other PINNs.

Comments 15 pages, 6 figures

Abstract

We introduce SEA-PINN, a novel architecture that incorporates a Squeeze-Excitation-like attention mechanism into physics-informed neural networks to dynamically recalibrate the importance of neurons across layers. A key feature of SEA-PINN is its highly stable initialization. On 17 out of 20 benchmark problems, SEA-PINN exhibits nearly negligible variance and significantly reduced initial loss, establishing a quasi-deterministic and favorable starting point for optimization. Notably, without employing Fourier feature embeddings or periodic activation functions, SEA-PINN attains accuracy competitive with TSA-PINN, a model specifically engineered for high-frequency problems via learnable frequencies in sinusoidal activations (83% vs. 90% improvement relative to FNN-PINN on the high-frequency case 7). Furthermore, integrating SEA-PINN into TSA-PINN boosts performance by 42.49%. These results underscore SEA-PINN as a lightweight plug-in module that enhances nonlinear representation power, promotes more robust and efficient convergence, and strengthens the overall reliability of physics-informed learning.

2606.19941 2026-06-19 cs.LG new submission

Compositionality Emerges in a Narrow Depth-Connectivity Regime: Architecture Constraints and Solution Manifolds

Dat H. Do, Rushi Shah, Duc V. Le, Dianbo Liu

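Affiliations * National University of Singapore; University of Twente

AI summary Finds that compositionality emerges only in specific sparse networks and within a specific depth range, proposes similarity-based pruning and a depth predictor, and explains the cause with a theoretical framework.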

Abstract

Compositionality is believed to be the foundation for generalization, enabling models to reuse meaningful primitives in novel combinations. Yet, models trained with standard gradient-based optimization rarely, and often only weakly, exhibit compositional internal structure, and it remains unclear how or why such compositionality forms. In this work, we show that compositionality emerges in a narrow connectivity-depth sweet spot. Along the connectivity axis, compositionality appears only in certain sparse networks and depends heavily on which connections remain rather than on the sparsity of the weights alone. Along the depth axis, compositionality emerges within a narrow, target-dependent regime, peaking at specific depths, while both shallower and deeper networks fail. When either the depth or connectivity condition is violated, gradient descent silently converges to fractured solutions rather than compositional ones. To discover and exploit this emergence, we introduce (i) similarity-based pruning (SP) to recover compositional connectivity and (ii) a heuristic depth predictor to estimate where compositionality is most likely to appear. Finally, we support these empirical findings with a theoretical framework based on compositional sparsity, volume-ratio arguments, and feature-interference bounds, explaining why compositional solutions are reachable only in a narrow depth-connectivity regime.

2606.19984 2026-06-19 cs.LG new submission

Kolmogorov-Arnold Reservoir Computing

Juntian Huang, Jurgen Kurths, Ying Tang

Affiliations * Institute of Fundamental and Frontier Sciences, University of Electronic Science and Technology of China; Potsdam Institute for Climate Impact Research; Department of Physics, Humboldt University Berlin; Research Institute of Intelligent Complex Systems, Fudan University; School of Physics, University of Electronic Science and Technology of China; Key Laboratory of Quantum Physics and Photonic Quantum Information, Ministry of Education, University of Electronic Science and Technology of China; Non-classical Information Science Basic Discipline Research Center of Sichuan Province, University of Electronic Science and Technology of China

AI summary Proposes Kolmogorov-Arnold Reservoir Computing (KARC), which replaces the reservoir with explicit basis-function expansions, combining the expressive capacity of KANs with the closed-form training of reservoir computing, and outperforms existing methods on benchmarks including partial differential equations.

Abstract

Reservoir computing offers a lightweight framework for forecasting dynamical systems but may struggle to capture long-range dependencies due to limited representational capacity. Conventional reservoir computing recurrently uses trainable reservoirs with hyperparameter sensitivity, while the next-generation reservoir computing removes recurrence at the cost of rapidly growing feature dimensions. Here, we develop Kolmogorov-Arnold Reservoir Computing (KARC), which replaces reservoirs with explicit basis-function expansions inspired by the Kolmogorov-Arnold representation theorem. We rigorously show that KARC is a lightweight design of Kolmogorov-Arnold networks (KANs), preserving the potential expressive capacity of KANs while admitting efficient closed-form training of reservoir computing. At comparable cost, KARC outperforms existing reservoir computing methods on challenging benchmarks including partial differential equations. It can also be integrated with generative diffusion models for text-to-image generation. This work thus establishes a principled bridge between reservoir computing and KANs, enabling efficient and high-fidelity dynamical system forecasting.
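The combination of an explicit basis-function expansion with a closed-form readout can be sketched on a toy forecasting task. This is not the KARC construction: the Chebyshev basis, the degree, the ridge penalty, and the logistic-map benchmark are all assumptions picked to keep the example small.

```python
import numpy as np
from numpy.polynomial import chebyshev

def basis_features(x, degree=4):
    """Explicit basis expansion of the current state (no recurrent reservoir)."""
    # Map states in [0, 1] to [-1, 1], the natural Chebyshev domain.
    return chebyshev.chebvander(2.0 * x - 1.0, degree)

def ridge_readout(phi, y, lam=1e-8):
    """Closed-form ridge regression: W = (Phi^T Phi + lam I)^{-1} Phi^T y."""
    d = phi.shape[1]
    return np.linalg.solve(phi.T @ phi + lam * np.eye(d), phi.T @ y)

# Toy dynamical system: the logistic map x_{t+1} = r x_t (1 - x_t).
r = 3.7
x = np.empty(1200)
x[0] = 0.3
for t in range(len(x) - 1):
    x[t + 1] = r * x[t] * (1.0 - x[t])

phi = basis_features(x[:-1])            # features of x_t
W = ridge_readout(phi[:1000], x[1:1001])  # train on the first 1000 transitions

pred = phi[1000:] @ W                   # one-step forecasts on held-out steps
one_step_err = np.max(np.abs(pred - x[1001:]))
print(one_step_err)
```

The only trained quantity is `W`, obtained from one linear solve, which is the property the abstract refers to as closed-form training.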

2606.20292 2026-06-19 cs.LG cs.LO new submission

Shifting-based Optimizable Linear Relaxations for General Activation Functions

Philipp Kern, László Antal, Erika Ábráham, Carsten Sinz

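Affiliations * Karlsruhe Institute of Technology; Karlsruhe University of Applied Sciences

AI summary Proposes SLiR, which generates linear relaxations of arbitrary activation functions through slope parameterization and a shifting procedure, enabling efficient optimization while maintaining correctness and verifying up to 7.8x more properties than existing methods.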

Comments 21 pages, under review

Abstract

The use of neural networks (NNs) is rapidly increasing, including in safety- and security-critical domains. To provide formal guarantees about NN behavior, many verification methods rely on optimizable linear relaxations of activation functions. However, existing techniques depend on hand-crafted relaxations for each activation function. Extension to state-of-the-art activation functions therefore requires substantial manual effort. In contrast, our approach SLiR (Shifting-based Linear Relaxations) is broadly applicable, requiring only a Lipschitz constant or a set of critical points. SLiR parameterizes relaxations by their slope and computes the corresponding offset via a shifting procedure that ensures sound upper and lower bounds over the input domain, enabling efficient optimization while maintaining correctness. Our experiments show that SLiR produces tight relaxations across a wide range of practical activation functions and enables verification of up to 7.8x more properties compared to state-of-the-art methods.
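One way to picture "fix the slope, then shift the line until it is a sound bound" is the grid-based sketch below. It is an illustrative assumption, not the SLiR procedure itself: it samples $f(x) - ax$ on a grid and pads the extremes by a Lipschitz slack so that the bounds hold between grid points as well. The function names and the SiLU example (with the conservative Lipschitz constant 1.1) are choices made for this example.

```python
import numpy as np

def silu(x):
    """SiLU activation x * sigmoid(x); its derivative stays within [-0.1, 1.1]."""
    return x / (1.0 + np.exp(-x))

def shifted_relaxation(f, lo, hi, slope, lipschitz, n_grid=1001):
    """Offsets (b_lower, b_upper) with slope*x + b_lower <= f(x) <= slope*x + b_upper on [lo, hi]."""
    xs = np.linspace(lo, hi, n_grid)
    gap = f(xs) - slope * xs
    step = (hi - lo) / (n_grid - 1)
    # f(x) - slope*x is (lipschitz + |slope|)-Lipschitz, and every point is at
    # most step/2 from a grid point, so this slack covers the unsampled gaps.
    slack = (lipschitz + abs(slope)) * step / 2.0
    return gap.min() - slack, gap.max() + slack

lo, hi = -3.0, 2.0
slope = (silu(hi) - silu(lo)) / (hi - lo)  # chord slope as one candidate
b_lower, b_upper = shifted_relaxation(silu, lo, hi, slope, lipschitz=1.1)
print(slope, b_lower, b_upper)
```

Since the offsets are computed for any given slope, the slope remains a free parameter that an outer optimizer could tune for tightness while soundness is preserved.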

2606.20442 2026-06-19 cs.LG cs.NA cs.NE math.NA new submission

Evolutionary Two-Stage Hyperparameter Optimization Strategies for Physics-Informed Neural Networks

Fedor Buzaev, Dmitry Efremenko, Egor Bugaev, Andrei Ermakov, Denis Derkach, Daria Pugacheva, Fedor Ratnikov

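Affiliations * HSE University; AXXX

AI summary Addresses the unstable training and hyperparameter sensitivity of physics-informed neural networks with a two-stage optimization strategy based on evolutionary algorithms, low-fidelity screening followed by full training, which markedly reduces error on three PDE problems.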

Comments Equal advising: Daria Pugacheva and Fedor Ratnikov. Accepted to the ICLR 2026 Workshop on AI and PDEs

Abstract

Physics-Informed Neural Networks (PINNs) solve Partial Differential Equations (PDEs) by embedding physical laws into neural network training. However, their performance suffers from unstable convergence, training plateaus, and strong sensitivity to architectural and optimization hyperparameters due to the highly non-convex and multi-term structure of the physics-informed loss. In this setting, the outer-loop hyperparameter search is a noisy and black-box optimization problem over heterogeneous parameters, where classical local or gradient-based strategies are easily trapped in suboptimal regions. Evolutionary algorithms, with their population-based exploration and ability to handle mixed, non-differentiable search spaces, provide a more robust mechanism for discovering promising configurations. We propose and investigate a two-stage approach based on evolutionary algorithms that combines exploration and exploitation parts of PINNs training to improve solution accuracy and robustness under fixed computational budgets. In the first stage, we perform low-fidelity training runs with truncated epochs to rapidly screen candidate configurations, treating hyperparameter selection as a black-box outer-loop problem. In the second stage, only the most promising candidates are fully trained with standard gradient-based optimizers to refine the solution. Evaluated on three popular problems, namely Advection, Klein-Gordon and Helmholtz equations, our method consistently outperforms standard training and achieves significantly lower mean error within constrained computational resources.

2606.20547 2026-06-19 cs.LG cs.CV cs.GR cs.RO math.DG new submission

The Token Is a Group Element: On Lie-Algebra Attention over Matrix Lie Groups

Przemyslaw Musialski

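Affiliations * New Jersey Institute of Technology

AI summary Proposes a Lie-algebra attention mechanism that defines tokens as matrix Lie group elements and uses the Lie-algebra norm of the relative pose as the attention score, with no learned kernel or representation-theoretic tools; it applies to non-compact non-abelian groups such as the affine full-frame groups.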

Comments preprint, 19 pages, 3 figures

Abstract

We place the attention token on the group: a token is an element $g_i$ of a matrix Lie group $G$ -- a bare transformation, with no feature payload and no external action $\rho(g)$ carrying it. To our knowledge this is the first attention construction whose tokens are bare matrix Lie group elements: their score is the closed-form algebra norm of the relative pose rather than a learned kernel, and it reaches the affine full-frame groups that every irrep- or surjective-exp-based method must exclude. We call it Lie-Algebra Attention. Once tokens are group elements, the rest follows with none of the usual representation-theoretic machinery. The relative geometry of a pair is canonical, $g_i^{-1} g_j$, so the pairwise invariant $w_{ij} = \log(g_i^{-1} g_j)$ is intrinsic rather than designed; equivariance under the diagonal $G$-action is tautological, and the cocycle condition holds automatically. The attention score is the negative squared algebra norm, $s_{ij} = -\|\log(g_i^{-1} g_j)\|_\lambda^2/\tau$: the canonical proximity kernel under a block-weighted Frobenius inner product, with no irreducible representations, spherical harmonics, Clebsch-Gordan products, or learned kernel. The construction applies to any matrix Lie group on a chosen logarithm chart containing the relative poses, including the non-compact non-abelian affine groups with scale and shear that no vector-token attention method reaches: neither the irrep tradition nor surjective-exp methods. Three sequence-completion experiments, on SE(2), SO(3), and Aff(2), bear this out: the closed-form score matches a learned MLP kernel on the same invariant and outperforms it on SE(2), using 50 to 80x fewer score parameters, while a vector-token baseline breaks invariance by five to twelve orders of magnitude.
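The score $s_{ij} = -\|\log(g_i^{-1} g_j)\|^2/\tau$ can be sketched for SO(3), where the matrix logarithm has a closed form. This is a minimal illustration under assumptions: it uses the plain Frobenius norm (all block weights set to 1), the principal logarithm for rotation angles below $\pi$, and helper names (`so3_exp`, `so3_log`, `lie_algebra_scores`) invented for the example.

```python
import numpy as np

def so3_exp(v):
    """Rodrigues' formula: axis-angle vector -> rotation matrix."""
    theta = np.linalg.norm(v)
    K = np.array([[0.0, -v[2], v[1]], [v[2], 0.0, -v[0]], [-v[1], v[0], 0.0]])
    if theta < 1e-8:
        return np.eye(3) + K
    return (np.eye(3) + np.sin(theta) / theta * K
            + (1.0 - np.cos(theta)) / theta ** 2 * (K @ K))

def so3_log(R):
    """Principal matrix logarithm of a rotation with angle below pi."""
    cos_theta = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-8:
        return (R - R.T) / 2.0
    return theta / (2.0 * np.sin(theta)) * (R - R.T)

def lie_algebra_scores(tokens, tau=1.0):
    """s_ij = -||log(g_i^{-1} g_j)||_F^2 / tau, with unit weights (assumed)."""
    n = len(tokens)
    scores = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            rel = tokens[i].T @ tokens[j]  # g_i^{-1} = g_i^T for rotations
            scores[i, j] = -np.sum(so3_log(rel) ** 2) / tau
    return scores

def softmax_rows(s):
    """Row-wise softmax turning scores into attention weights."""
    e = np.exp(s - s.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(0)
tokens = [so3_exp(0.3 * rng.normal(size=3)) for _ in range(5)]
scores = lie_algebra_scores(tokens)
weights = softmax_rows(scores)
print(np.round(weights, 3))
```

Left-multiplying every token by the same rotation $h$ leaves $g_i^{-1} g_j$ unchanged, so the scores are invariant by construction; this is the "tautological" invariance the abstract mentions.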

2. Representation learning, self-supervised and contrastive learning (7 papers)

2606.19408 2026-06-19 cs.LG cs.RO new submission

FlexLAM: Resolving the Bottleneck Trade-off in Latent Action Learning

Takanori Yoshimoto, Yang Hu, Naruya Kondo, Tatsuya Matsushima

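Affiliations * University of Tsukuba; The University of Tokyo

AI summary Addresses the trade-off caused by fixed-capacity bottlenecks in latent action models with FlexLAM, which uses nested dropout to produce variable-length latent actions; without adding architectures or losses, it matches or outperforms fixed-capacity models under scarce-label and low-return settings and supports token-budget adjustment at inference time.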

Abstract

Latent actions provide a compact interface between action-free video and downstream decision-making, yet existing Latent Action Models (LAMs) force every transition through a fixed-capacity bottleneck. We identify a bottleneck trade-off: overly tight codes can discard transition cues needed for action alignment, while overly loose codes preserve additional transition variation that must be resolved when alignment labels are scarce or narrowly distributed. FlexLAM replaces this fixed capacity with variable-length latent actions trained by nested dropout, yielding prefix-valid codes that capture compact transition structure first and add detail only when needed, without new architectures or losses. A single FlexLAM matches or surpasses separately trained fixed-capacity LAMs at every evaluated token budget under standard scarce-label supervision and under a low-return single-task alignment stress test, indicating that FlexLAM is not merely adjustable at inference time but learns a better latent-action interface at the same token budgets. The same model supports inference-time token-budget adjustment without retraining, and FlexLAM improves Ego4D transition reconstruction. These results suggest that variable-length latent actions are an architecture-free, drop-in upgrade to the fixed-capacity bottleneck in latent action models, latent-action world models, and video-pretrained action interfaces.
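Nested dropout, the mechanism the abstract credits for prefix-valid codes, can be sketched as a mask that keeps a random-length prefix of the latent tokens during training. This is a generic illustration, not FlexLAM's implementation: the uniform prior over prefix lengths, the zero-filling of dropped tokens, and the function names are assumptions.

```python
import numpy as np

def nested_dropout(tokens, rng):
    """Keep a random-length prefix of latent tokens and zero out the rest.

    tokens: array of shape (batch, n_tokens, dim).
    Returns the masked tokens and the prefix length kept for each sample.
    """
    batch, n_tokens, _ = tokens.shape
    # Uniform prefix lengths in [1, n_tokens] (an assumed prior).
    keep = rng.integers(1, n_tokens + 1, size=batch)
    mask = np.arange(n_tokens)[None, :] < keep[:, None]
    return tokens * mask[:, :, None], keep

def truncate_to_budget(tokens, budget):
    """Inference-time token budget: keep only the first `budget` tokens."""
    out = tokens.copy()
    out[:, budget:, :] = 0.0
    return out

rng = np.random.default_rng(0)
latents = np.ones((4, 8, 3))
dropped, keep = nested_dropout(latents, rng)
short = truncate_to_budget(latents, 3)
print(keep)
```

Because earlier tokens survive truncation more often than later ones, training pushes the most important transition information into the first tokens, which is what makes any prefix usable at inference time.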

2606.19451 2026-06-19 cs.LG cs.CV cs.RO new submission

3D-DLP: Self-Supervised 3D Object-Centric Scene Representation Learning

Ellina Zhang, Madhaven Iyengar, Amir Zadeh, Chuan Li, Deepak Pathak, David Held, Tal Daniel

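AI summary Proposes 3D-DLP, a self-supervised model that decomposes scene-level RGB-D or voxel observations into 3D latent particles, each encoding disentangled attributes; it yields interpretable per-particle segmentation maps and supports scene manipulation and downstream robotic manipulation.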

Comments ICML 2026. Project webpage: https://eubooks3003.github.io/3d-dlp

Abstract

We introduce 3D-DLP, a self-supervised object-centric representation learning model that decomposes scene-level RGB-D or voxel observations into a set of 3D latent particles. Building on the Deep Latent Particles (DLP) framework, each particle encodes disentangled attributes, including 3D keypoint position, bounding box dimensions, and appearance features, and represents a distinct entity in the scene. The model learns interpretable per-particle segmentation maps through an end-to-end self-supervised reconstruction objective. We demonstrate on both simulated and real-world datasets that the learned latent space is interpretable and controllable: by manipulating particle positions and decoding, we can generate novel scene configurations. Furthermore, we show that leveraging these compact 3D latent particles for downstream robotic manipulation improves performance over baselines that either lack explicit 3D information or rely on memory-intensive dense 3D inputs without object-centric structure. Code and videos are available at https://eubooks3003.github.io/3d-dlp.

2606.19542 2026-06-19 cs.LG new submission

Tracking Representation Dynamics in Large Language Models with Persistent Homology

Naman Malhotra, Jay Ambadkar, Abhinav Gupta, Kushal Kasivel, Abbas Schwarz, Kamillo Ferry, Anthea Monod

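Affiliations * Imperial College London

AI summary Analyzes the topology of activation spaces with persistent homology and finds that topological reorganization during alignment occurs mainly early in training, and that different alignment objectives produce distinguishable topological trajectories.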

Comments 29 pages

Abstract

Large language models are commonly aligned through supervised fine-tuning, yet little is known about how their internal representations evolve during this process. We study alignment dynamics using persistent homology by tracking the topology of activation spaces throughout fine-tuning. Across four transformer language models ranging from 1B to 7B parameters and three alignment objectives corresponding to helpful, harmless, and mixed training data, we find that the majority of topological reorganization occurs during the earliest stages of training. A dense checkpoint analysis reveals a transient peak in topological activity followed by rapid stabilization. We further show that different alignment objectives induce distinguishable topological trajectories, while instruction-tuned and pretrained models exhibit qualitatively different patterns of evolution. Our results suggest that persistent homology provides a complementary perspective on alignment, revealing representation-level changes that are not apparent from behavioral metrics alone.

2606.19594 2026-06-19 cs.LG 新提交

Unsupervised Causal Abstractions Discovery

无监督因果抽象发现

Théo Saulus, Simon Lacoste-Julien, Dhanya Sridhar

发表机构 * Mila - Quebec AI Institute(魁北克人工智能研究所) Université de Montréal(蒙特利尔大学) Canada CIFAR AI Chair(加拿大CIFAR人工智能主席)

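AI summary Proposes learning high-level structural causal models directly from low-level measurements; under assumptions from low-rank causal discovery, it shows that observations generated by a low-rank graph induce latents that form a causal abstraction, and gives identifiability results and a practical learning objective.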

详情
AI中文摘要

因果抽象形式化了当高层结构因果模型(SCM)捕捉低层SCM的干预行为时的情形。该概念的现有应用主要遵循假设检验范式:专家提出候选高层模型,然后评估低层系统是否实现了它。我们研究了直接从低层测量中学习高层模型的互补问题。我们的贡献利用了低秩因果发现的假设,可以总结如下:(1)我们证明了由低秩图生成的观测数据诱导出形成因果抽象的潜变量,(2)我们提供了关于这些潜变量的可辨识性结果,以及(3)我们提出了一个实用的目标来学习这个高层SCM。

英文摘要

Causal abstractions formalize when a high-level structural causal model (SCM) captures the interventional behavior of a lower-level SCM. Existing applications of this notion largely follow a hypothesis-testing paradigm: an expert proposes a candidate high-level model and then evaluates if the low-level system implements it. We study the complementary problem of learning a high-level model directly from low-level measurements. Our contributions leverage hypotheses from low-rank causal discovery, and can be summarized as follows: (1) we show that observations generated by a low-rank graph induce latents that form a causal abstraction, (2) we provide identifiability results about these latents, and (3) we propose a practical objective to learn this high-level SCM.

2606.19827 2026-06-19 cs.LG cs.AI 新提交

When, Where, and How: Adaptive Binning for Tabular Self-Supervised Learning

何时、何地以及如何:面向表格自监督学习的自适应分箱

Daehwan Kim, Haejun Chung, Ikbeom Jang

发表机构 * Hanyang University(汉阳大学) Hankuk University of Foreign Studies(韩国外国语大学)

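AI summary Proposes Adaptive Binning, which refines discretization during training through a feature-wise coarse-to-fine curriculum and combines categorical reconstruction with ordinal supervision, improving self-supervised learning on medical tabular data.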

Comments Accepted to MICCAI 2026

详情
AI中文摘要

医疗表格数据在临床研究中无处不在,但表格数据的深度学习仍未被充分探索,因为可靠的标签通常需要昂贵的专家判定,尽管结构化临床变量通常以表格形式常规可用。自监督学习可以利用这些未标记的表格,而最近基于分箱的前置任务提供了一种有前景的归纳偏置,但现有目标固定单个全局分位数离散化并应用特征无关的监督。我们提出自适应分箱,一种用于表格自监督学习的训练自适应离散化前置任务,通过特征级粗到细课程将离散化与学习耦合。受神经网络的频谱偏差和课程学习原则的启发,我们的方法在检测到平台期时逐步细化每个特征的离散化,并选择表示感知的分割点,以联合改善值空间浓度和表示空间一致性。一种异质性感知目标统一了类别重建与数值特征的顺序监督,在统一评估协议下对公共医疗表格数据集的实验显示,线性探测和微调均取得一致改进,无需数据集特定的离散化调整。我们进一步引入一个医疗表格自监督学习基准,配备标准化协议,以支持这一未被充分探索领域的可重复进展。我们的代码可在 https://github.com/labhai/Adaptive-Binning 获取。

英文摘要

Medical tabular data are ubiquitous in clinical research, but deep learning for tables remains underexplored because reliable labels often require costly expert adjudication, even though structured clinical variables are routinely available in tabular form. Self-supervised learning can leverage these unlabeled tables, and recent binning-based pretexts offer a promising inductive bias, but existing objectives fix a single global quantile discretization and apply feature-agnostic supervision. We propose Adaptive Binning, a training-adaptive discretization pretext for tabular SSL that couples discretization to learning through a feature-wise coarse-to-fine curriculum. Motivated by the spectral bias of neural networks and the principles of curriculum learning, our method progressively refines discretization per feature upon plateau detection and selects representation-aware splits to jointly improve value-space concentration and representation-space coherence. A heterogeneity-aware objective unifies categorical reconstruction with ordinal supervision for numerical features, and experiments on public medical tabular datasets under unified evaluation protocols show consistent gains for linear probing and fine-tuning without dataset-specific discretization tuning. We further introduce a medical tabular SSL benchmark with standardized protocols to support reproducible progress in this underexplored domain. Our code is available at https://github.com/labhai/Adaptive-Binning.
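
The coarse-to-fine idea can be sketched as follows: start each feature with a few quantile bins and increase the resolution once its training loss stops improving. This is a simplified reading, not the authors' method. The abstract says splits are chosen in a representation-aware way, whereas this sketch uses plain quantile cuts, a hypothetical plateau test, and a doubling schedule, all of which are assumptions.

```python
def quantile_edges(values, n_bins):
    """Interior bin edges at empirical quantiles (n_bins - 1 cut points)."""
    ordered = sorted(values)
    n = len(ordered)
    return [ordered[(k * n) // n_bins] for k in range(1, n_bins)]

def assign_bin(x, edges):
    """Index of the bin containing x, used as the pretext-task target."""
    return sum(x >= e for e in edges)

def has_plateaued(losses, window=3, tol=1e-3):
    """Hypothetical plateau test: improvement over the last `window` steps is below tol."""
    if len(losses) <= window:
        return False
    return losses[-window - 1] - losses[-1] < tol

def refine(n_bins, losses, max_bins=32):
    """Double one feature's resolution when its loss has plateaued (assumed schedule)."""
    return min(2 * n_bins, max_bins) if has_plateaued(losses) else n_bins

# Toy numerical feature (hypothetical values).
ages = [23, 35, 41, 47, 52, 58, 63, 71]
coarse = quantile_edges(ages, 2)  # a single cut at the median
fine = quantile_edges(ages, 4)    # quartile cuts
```

Because refinement is decided per feature, an easy feature can reach fine bins early while a hard one stays coarse, which is the curriculum effect the abstract describes.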

2606.19888 2026-06-19 cs.LG cs.AI 新提交

SL-S4Wave: Self-Supervised Learning of Physiological Waveforms with Structured State Space Models

SL-S4Wave:基于结构化状态空间模型的生理波形自监督学习

Feng Wu, Harsh Deep, Eric Lehman, Sanyam Kapoor, Guoshuai Zhao, Rahul Krishnan, Gari Clifford, Li-wei H Lehman

发表机构 * Massachusetts Institute of Technology(麻省理工学院) OpenEvidence, USA(OpenEvidence(美国)) New York University(纽约大学) Xi’an Jiaotong University(西安交通大学) University of Toronto(多伦多大学) Emory University(埃默里大学)

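AI summary Proposes SL-S4Wave, a framework combining contrastive learning with an encoder built on structured state space models; multi-layer global convolution with multiscale subkernels captures local and long-range dependencies in multichannel physiological waveforms, outperforming existing methods on tasks such as arrhythmia detection.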

详情
AI中文摘要

由于高采样率、多通道信号复杂性、固有噪声和有限的标记数据,对长序列医学时间序列数据(如心电图)进行建模面临重大挑战。尽管最近基于各种编码器架构(如卷积神经网络)的自监督学习方法被提出用于从未标记数据中学习表示,但它们往往在捕获长程依赖和噪声不变特征方面存在不足。结构化状态空间模型擅长长序列建模,但现有的S4架构无法捕获多通道生理波形的独特特征。在这项工作中,我们提出了SL-S4Wave,一个自监督学习框架,它将对比学习与基于结构化状态空间模型的定制编码器相结合。该编码器利用多尺度子核实现多层全局卷积,从而能够在嘈杂的高分辨率多通道波形中捕获细粒度局部模式和长程时间依赖。在真实世界数据集上的大量实验表明,SL-S4Wave(1)在具有挑战性的心律失常检测任务中持续优于最先进的监督和自监督基线,(2)使用显著更少的标记示例实现高性能,展示了强大的标签效率,(3)在长波形片段上保持稳健性能,突出了其对大多数现有方法无法有效建模的长序列中复杂时间动态的建模能力,以及(4)有效迁移到未见的心律失常类型,强调了其强大的跨域泛化能力。我们还在多个EEG任务上评估了SL-S4Wave,在强基线上取得了优越性能,证明了我们的方法在心脏波形之外的泛化能力。

英文摘要

Modeling long-sequence medical time series data, such as electrocardiograms (ECG), poses significant challenges due to high sampling rates, multichannel signal complexity, inherent noise, and limited labeled data. While recent self-supervised learning (SSL) methods, based on various encoder architectures such as convolutional neural networks, have been proposed to learn representations from unlabeled data, they often fall short in capturing long-range dependencies and noise-invariant features. Structured state space models (S4) excel at long-sequence modeling, but existing S4 architectures fail to capture the unique characteristics of multichannel physiological waveforms. In this work, we propose SL-S4Wave, a self-supervised learning framework that combines contrastive learning with a tailored encoder built on structured state space models. The encoder incorporates multi-layer global convolution using multiscale subkernels, enabling the capture of both fine-grained local patterns and long-range temporal dependencies in noisy, high-resolution multichannel waveforms. Extensive experiments on real-world datasets demonstrate that SL-S4Wave (1) consistently outperforms state-of-the-art supervised and self-supervised baselines in a challenging arrhythmia detection task, (2) achieves high performance with significantly fewer labeled examples, showcasing strong label efficiency, (3) maintains robust performance on long waveform segments, highlighting its capacity to model complex temporal dynamics in long sequences that most existing approaches fail to model efficiently, and (4) transfers effectively to unseen arrhythmia types, underscoring its robust cross-domain generalization. We additionally evaluate SL-S4Wave on multiple EEG tasks, achieving superior performance over strong baselines and demonstrating the generalizability of our approach beyond cardiac waveforms.

2606.20167 2026-06-19 cs.LG 新提交

Multi-Modal Contrastive Learning for Implicit Earth Embeddings via Location Tying

多模态对比学习用于基于位置绑定的隐式地球嵌入

Jonathan Hecht, Lukas Arzoumanidis, Ziyue Li, Youness Dehbi

发表机构 * Computational Methods Lab, HafenCity University Hamburg(汉堡港城大学计算方法实验室) Dept. of Operations & Technology, Technical University of Munich(慕尼黑工业大学运营与技术系) Heilbronn Data Science Center(海尔布隆数据科学中心) Munich Data Science Institute(慕尼黑数据科学研究所) Institute of Geodesy and Geoinformation, University of Bonn(波恩大学大地测量与地理信息研究所)

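AI summary Proposes two multimodal contrastive learning architectures, MELT and SALT, that integrate unpaired geospatial data through location tying; they match the strongest two-modality baseline (SATCLIP) on four downstream tasks, but adding modalities does not consistently improve performance, suggesting the location encoder is the main bottleneck.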

详情
AI中文摘要

空间预测任务通常受限于缺乏高质量标记的地面真值观测。为克服这一挑战,自监督预训练是一种可能的解决方案,其中对比学习在位置编码器中占主导地位。这些方法通常仅将地理坐标与一种额外模态对齐。我们提出了两种多模态对比学习架构:通过位置绑定的多模态嵌入(MELT)和顺序交替位置训练(SALT)。这些架构通过利用未配对的地理空间数据,将框架扩展到两种模态以上。两种方法在技术上均可行,并在四个下游任务中匹配了最强的双模态基线(SATCLIP)的性能。然而,增加模态数量并未持续提升性能,这表明所选的位置编码器是主要限制——对比目标在早期达到峰值,无论模态多样性或预训练量如何。MELT比SALT提供更稳定的训练,并为未来的扩展提供了更强的基础。

英文摘要

Spatial prediction tasks are often limited by a lack of high-quality labelled ground-truth observations. Self-supervised pre-training is one way to overcome this challenge, with contrastive learning the dominant approach for location encoders. These approaches usually align geographic coordinates with just one additional modality. We propose two multimodal contrastive learning architectures: Multimodal Embedding via Location Tying (MELT) and Sequential Alternating Location Training (SALT). These architectures expand this framework beyond two modalities by utilising unpaired geospatial data. Both methods are technically viable and match the performance of the strongest two-modality baseline (SATCLIP) across four downstream tasks. However, increasing the number of modalities does not consistently improve performance, suggesting that the chosen location encoder is the main limitation: the contrastive objective reaches its peak early, regardless of modality diversity or pre-training volume. MELT provides more stable training than SALT and presents a stronger foundation for future scaling.
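
To make "aligning coordinates with a modality" concrete, here is a minimal sketch of a one-directional InfoNCE-style contrastive loss between location embeddings and embeddings of one other modality, plus a sum over modalities that share the same location encoder. The abstract does not specify the loss used by MELT or SALT; the loss form, the temperature, and the function names are assumptions.

```python
import math

def info_nce(loc_emb, mod_emb, temperature=0.1):
    """Mean cross-entropy of matching each location to its paired modality embedding."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))

    loss = 0.0
    for i, loc in enumerate(loc_emb):
        logits = [dot(loc, m) / temperature for m in mod_emb]
        peak = max(logits)  # subtract the max for numerical stability
        log_norm = peak + math.log(sum(math.exp(l - peak) for l in logits))
        loss += log_norm - logits[i]  # negative log-probability of the true pair
    return loss / len(loc_emb)

def multi_modal_loss(loc_batches, modality_batches):
    """Sum of per-modality losses; each modality has its own batch of locations,
    so modalities never need to be paired with each other (assumed reading of
    'location tying')."""
    return sum(info_nce(loc, mod) for loc, mod in zip(loc_batches, modality_batches))

aligned = [[1.0, 0.0], [0.0, 1.0]]
swapped = [[0.0, 1.0], [1.0, 0.0]]
```

In this sketch the location embeddings are the only quantity shared across terms, which is one way unpaired modalities could still shape a common location representation.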

3. Reinforcement Learning and Sequential Decision-Making (13 papers)

2606.19370 2026-06-19 cs.LG cs.AI cs.MA 新提交

Human-like autonomy emerges from self-play and a pinch of human data

类人自主性从自我对弈和少量人类数据中涌现

Daphne Cornelisse, Julian Hunt, Zixu Zhang, Waël Doulazmi, Kevin Joseph, Jaime Fernández Fisac, Eugene Vinitsky

发表机构 * NYU Tandon School of Engineering(纽约大学坦登工程学院) NYU Courant(纽约大学库朗数学科学研究所) Princeton University(普林斯顿大学) Centre for Robotics, Mines Paris(巴黎矿业大学机器人中心) Valeo(法雷奥)

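AI summary Proposes regularizing self-play reinforcement learning with a small amount of human demonstration data; with only 30 minutes of human data, it trains driving policies that coordinate with humans in 15 hours of training.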

Comments 10 pages

详情
AI中文摘要

自我对弈强化学习最近成为一种无需任何人类数据即可训练驾驶策略的方法。它利用廉价的大规模模拟来替代昂贵的大规模人类驾驶演示。这种方法的一个关键局限性是,通过纯自我对弈训练的策略可以学习有效但不符合人类习惯的驾驶惯例。先前的工作试图通过广泛的奖励工程和领域随机化来缓解这种行为偏差,但这些方法脆弱且劳动密集。我们的方法没有完全抛弃人类演示,而是将其作为最小安全目标达到奖励之上的正则化目标。就像好炖菜中的香料一样,我们发现少量人类数据大有裨益:我们的方法仅使用30分钟的人类演示,比同类模仿学习方法少2500倍。由此产生的策略与保留的人类轨迹协调,并在单个消费级GPU上15小时内完成训练。视频和完整源代码见 https://spiced-self-play.com/。

英文摘要

Self-play reinforcement learning has recently emerged as a way to train driving policies without any human data. It uses cheap, large-scale simulations as a substitute for expensive, large-scale human driving demonstrations. A key limitation of this approach is that policies trained through pure self-play can learn effective but alien driving conventions incompatible with people. Previous work attempts to mitigate such behavioral misalignments through extensive reward engineering and domain randomization, which are brittle and labor-intensive. Instead of completely discarding human demonstrations, our method treats them as a regularization objective on top of a minimal safe goal-reaching reward. Like the spice in a good stew, we find that a little human data goes a long way: our method uses only 30 minutes of human demonstrations, 2500x fewer than comparable imitation learning approaches. The resulting policies coordinate with held-out human trajectories and complete training in 15 hours on a single consumer-grade GPU. Videos and full source code are available at https://spiced-self-play.com/.

2606.19476 2026-06-19 cs.LG cs.AI 新提交

Can In-Context Learning Support Intrinsic Curiosity?

上下文学习能否支持内在好奇心?

Eric Elmoznino, Sangnie Bhardwaj, Johannes von Oswald, Rajai Nasser, Blaise Agüera y Arcas, João Sacramento, Rif A. Saurous, Guillaume Lajoie

发表机构 * Google – Paradigms of Intelligence Team(Google – 智能范式团队) Google DeepMind

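AI summary Studies whether the in-context learning ability of sequence models can serve as an immediate, update-free world model, removing the gradient-descent bottleneck of classic intrinsic curiosity methods; proves that in non-temporal settings the resulting rewards asymptotically converge to the true learning progress.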

详情
AI中文摘要

有效的机器学习不仅取决于我们如何对数据建模,还取决于我们选择收集哪些数据。虽然大型序列模型已经彻底改变了数据建模,但自动数据选择或“内在好奇心”的问题仍然是一个重大挑战。经典方法通过基于智能体的“学习进度”奖励来激励探索,该奖励衡量新获得的观测在多大程度上改进了世界模型的预测能力。然而,传统上评估这些奖励需要在每个轨迹内进行昂贵的梯度下降内循环更新,这使得它们在规模上计算上不可行。在这项工作中,我们研究序列模型涌现的上下文学习(ICL)能力是否可以通过作为即时的、无需更新的世界模型来消除这一瓶颈。具体来说,我们评估是否可以训练一个探索策略来最大化学习进度,仅使用上下文学习者的预测误差和反事实上下文操作。我们首先证明,在一般马尔可夫决策过程中,这实际上不可能以无偏的方式实现:由此产生的内在奖励要么包含干扰项,使其对真实学习进度的估计产生偏差,要么无法使用上下文学习者的预测误差来实现。相反,我们对于非时间设置的一个广泛子类(包括主动学习和贝叶斯实验设计)证明了积极结果:在这里,ICL派生的奖励成功界定了真实学习进度并渐近收敛到它。我们通过连续和符号环境中的受控实验证实了我们的理论,表明我们的ICL驱动框架成功训练了以最优方式进行探索的好奇数据收集策略。

英文摘要

Effective machine learning depends not only on how we model data, but also on what data we choose to collect. While large sequence models have revolutionized data modeling, the problem of automated data selection, or "intrinsic curiosity", remains a significant challenge. Classic approaches incentivize exploration by rewarding an agent based on its "learning progress", which measures how much a newly acquired observation improves a world model's predictive ability. However, evaluating these rewards traditionally requires expensive inner loops of gradient descent updates within each trajectory, rendering them computationally impractical at scale. In this work, we investigate whether the emergent in-context learning (ICL) capabilities of sequence models can eliminate this bottleneck by serving as immediate, update-free world models. Specifically, we evaluate whether an exploration policy can be trained to maximize learning progress, using solely the prediction errors and counterfactual context manipulations of an in-context learner. We first prove that in general Markov decision processes, this is in fact impossible in an unbiased way: the resulting intrinsic rewards either suffer from nuisance terms that bias their estimation of true learning progress, or they cannot be implemented using an in-context learner's prediction errors. Conversely, we prove a positive result for a broad subclass of non-temporal settings, encompassing active learning and Bayesian Experimental Design: here, ICL-derived rewards successfully bound and asymptotically converge to the true learning progress. We corroborate our theory with controlled experiments across continuous and symbolic environments, demonstrating that our ICL-driven framework successfully trains curious data-collection policies that explore optimally.
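
The notion of a learning-progress reward computed from prediction errors and a counterfactual context edit can be sketched with a toy learner. The reward constructions analyzed in the paper are not given in the abstract; the running-mean "in-context learner", the squared error, and the function names below are hypothetical stand-ins.

```python
def predict(context):
    """Toy update-free 'in-context learner': predict the context mean (0.0 when empty)."""
    return sum(context) / len(context) if context else 0.0

def prediction_error(context, target):
    """Squared error of the learner's prediction for `target` given `context`."""
    return (predict(context) - target) ** 2

def learning_progress(context, new_obs, probe):
    """Drop in error on `probe` from counterfactually adding `new_obs` to the context.

    No weights are updated: the only thing that changes is the context itself.
    """
    return prediction_error(context, probe) - prediction_error(context + [new_obs], probe)
```

For example, `learning_progress([0.0], 2.0, 1.0)` is positive because the new observation moves the prediction toward the probe, while `learning_progress([1.0], 1.0, 1.0)` is zero because the observation is uninformative.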

2606.19690 2026-06-19 cs.LG 新提交

Multi-Granular Attention-Driven Reinforcement Learning Framework for Web Intelligent Enhancement Systems

多粒度注意力驱动的强化学习框架用于Web智能增强系统

Navin Chhibber, Deepak Singh, Anokh Kishore, Nikita Chawla, K. Anguraj

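AI summary Proposes the MGAR-WIES framework, which combines semantic graph modeling, attention mechanisms, and adaptive reinforcement learning to address semantic understanding and scalability for heterogeneous, dynamic web data, reporting 80% accuracy.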

Comments 2026 3rd International Conference on Integrated Intelligence and Communication Systems (ICIICS), 6 Pages

详情
AI中文摘要

近年来,Web智能增强系统越来越依赖异构和动态的Web数据来提供个性化的上下文感知服务。然而,传统的机器学习、深度学习和强化学习模型在持续演化的Web环境中往往难以应对语义理解、适应性和可扩展性的挑战。本研究提出了一种基于多粒度注意力的强化Web智能增强系统(MGAR-WIES),通过集成语义图建模、注意力机制和自适应强化学习来应对这些挑战。首先,收集包括结构化、半结构化和非结构化来源的异构Web数据,并进行预处理以生成统一特征表示。这些表示被转换为动态语义图,其中实体及其关系通过注意力机制增强的图嵌入进行建模,以捕捉局部相关性和全局上下文依赖。随后,一种自适应多智能体强化学习策略利用注意力感知的语义状态来优化个性化Web动作,如内容推荐、导航优化和服务自适应。最后,持续在线反馈被进一步集成,以实时更新图表示和学习策略,确保持续的适应性和性能。与现有方法相比,提出的MGAR-WIES在准确率(80%)方面取得了更好的结果。

英文摘要

In recent years, web intelligent enhancement systems have increasingly relied on heterogeneous and dynamic web data to deliver personalized, context-aware services. However, traditional machine learning, deep learning, and reinforcement learning models often struggle with semantic understanding, adaptability, and scalability in continuously evolving web environments. In this research, a Multi-Granular Attention-based Reinforcement Web Intelligent Enhancement System (MGAR-WIES) is proposed to address these challenges by integrating semantic graph modeling, attention mechanisms, and adaptive reinforcement learning. Initially, heterogeneous web data comprising structured, semi-structured, and unstructured sources are collected and preprocessed to generate unified feature representations. These representations are transformed into a dynamic semantic graph, where entities and their relationships are modeled using graph embeddings enhanced by attention mechanisms to capture both local relevance and global contextual dependencies. Subsequently, an adaptive multi-agent reinforcement learning strategy leverages the attention-aware semantic states to optimize personalized web actions such as content recommendation, navigation optimization, and service adaptation. Finally, continuous online feedback is integrated to update graph representations and learning policies in real time, ensuring sustained adaptability and performance. The proposed MGAR-WIES achieved better results in terms of accuracy (80%) than existing approaches.

2606.19721 2026-06-19 cs.LG cs.AI 新提交

OnDeFog: Online Decision Transformer under Frame Dropping

OnDeFog:帧丢失下的在线 Decision Transformer

Daiki Yotsufuji, Kenta Nishihara, Shoma Shimizu, Kento Uchida, Shinichi Shirakawa

发表机构 * Yokohama National University(横滨国立大学)

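AI summary To address the performance degradation caused by frame dropping, proposes OnDeFog, which combines the DeFog mechanisms with the online decision transformer (ODT) to learn policies through direct environment interaction; it outperforms ODT at high frame-dropping rates and outperforms DeFog on datasets dominated by low-reward data.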

Comments Accepted to PRICAI 2025

详情
AI中文摘要

在具有挑战性的现实世界强化学习应用中,通信延迟或传感器故障经常导致帧丢失,此时智能体无法接收丢失的状态及相关奖励。为了解决帧丢失导致的性能下降问题,通过将额外机制引入 Decision Transformer 以处理帧丢失,开发了随机帧丢失下的 Decision Transformer(DeFog)。尽管DeFog可以缓解帧丢失环境中的性能下降,但由于DeFog是一种离线学习方法,它难以有效泛化到训练数据集中未充分表示的新状态。在本研究中,我们提出OnDeFog,它将DeFog中的机制与在线 Decision Transformer(ODT)相结合,ODT是一种通过直接环境交互学习策略的在线强化学习方法。全面的实验评估表明,我们提出的OnDeFog在高丢帧率环境下相比ODT取得了更优的性能,并且在包含大量低奖励数据的数据集上优于DeFog。

英文摘要

In challenging real-world reinforcement learning applications, communication delays or sensor failures often cause frame dropping, in which the agent cannot receive the dropped states and associated rewards. To address the resulting performance degradation, the Decision Transformer under Random Frame Dropping (DeFog) was developed by incorporating additional mechanisms into the decision transformer. Although DeFog can mitigate performance degradation in frame-dropping environments, it is an offline learning method and therefore struggles to generalize to novel states not adequately represented in the training dataset. In this study, we propose OnDeFog, which integrates the mechanisms in DeFog with the online decision transformer (ODT), an online reinforcement learning method that learns policies through direct environmental interaction. Comprehensive experimental evaluation demonstrates that our proposed OnDeFog achieves superior performance compared to ODT in environments with high frame-dropping rates and outperforms DeFog on datasets containing a large amount of low-reward data.

2606.19750 2026-06-19 cs.LG cs.AI cs.CL 新提交

Manifold Bandits: Bayesian Curriculum Learning over the Latent Geometry of Large Language Models

流形赌博机:大语言模型潜在几何上的贝叶斯课程学习

Darrien McKenzie, Nicklas Hansen, Xiaolong Wang

发表机构 * University of California, San Diego(加州大学圣迭戈分校)

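AI summary Proposes the Bayesian Manifold Curriculum (BMC) framework, which models problem sampling as a manifold-structured bandit problem and guides sampling with a hierarchical task tree and Bayesian learning, trading off learning signal, diversity, and utility.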

Comments Webpage: https://darrienmckenzie.com/manifold-bandits/

详情
AI中文摘要

强化学习(RL)是提高大语言模型(LLMs)推理能力的关键方法,其中训练效率关键取决于优化过程中问题的采样方式。现有的自适应课程学习方法通常优先考虑中等难度的提示,将问题选择视为具有独立臂的标准赌博机问题,忽略了任务空间的结构化和异质性。在这项工作中,我们将问题采样框架化为具有内生非平稳性的流形结构赌博机问题:问题通过模型的潜在表示空间相关联,采样决策可以影响学习信号在该空间中的演变方式。为了实现这一视角,我们引入了贝叶斯流形课程(BMC),这是一个结构感知框架,将问题组织成层次任务树,并应用贝叶斯学习来指导采样。实验发现,不同的采样策略在生产性(学习信号)、多样性(任务流形覆盖)和实用性(评估相关性)之间引入了非平凡的权衡。这些结果表明,仅优先考虑难度不足以获得强大的下游性能,突出了将结构和类型感知纳入问题采样中的重要性。

英文摘要

Reinforcement learning (RL) is a central approach for improving reasoning capabilities in large language models (LLMs), where training efficiency depends critically on how problems are sampled during optimization. Existing adaptive curriculum learning methods typically prioritize prompts of intermediate difficulty, treating problem selection as a standard bandit problem with independent arms and overlooking the structured, heterogeneous nature of the task space. In this work, we frame problem sampling as a manifold-structured bandit problem with endogenous non-stationarity: problems are related through the model's latent representation space, and sampling decisions can steer how learning signals evolve across that space. To operationalize this perspective, we introduce Bayesian Manifold Curriculum (BMC), a structure-aware framework that organizes problems into a hierarchical task tree and applies Bayesian learning to guide sampling. Empirically, we find that different sampling strategies induce non-trivial tradeoffs between productivity (learning signal), diversity (coverage of the task manifold), and utility (evaluation relevance). These results show that prioritizing difficulty alone is insufficient for strong downstream performance, highlighting the importance of incorporating structure and type-awareness into problem sampling.

2606.19883 2026-06-19 cs.LG stat.ML 新提交

Matching Markets meet Cumulative Prospect Theory: Towards Optimal and Adversarially Robust Learning

匹配市场遇上累积前景理论:迈向最优和对抗鲁棒学习

Ananya Kunisetty, Avishek Ghosh

发表机构 * Indian Institute of Technology Bombay(印度理工学院孟买分校)

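AI summary Studies a multi-agent multi-armed bandit problem in competitive two-sided matching markets under cumulative prospect theory (CPT), proposing algorithms with optimal regret bounds and extending them to adversarial markets.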

Comments Accepted at ECML-PKDD 2026, Naples, Italy

详情
AI中文摘要

我们研究了一个在竞争性设置下具有双边匹配市场的多智能体多臂赌博机问题,该问题基于以人为中心的决策模型。为了捕捉人类偏好,我们使用累积前景理论(CPT),该理论通过一个(α-Hölder连续)权重函数以非线性方式加权智能体的行动。CPT已被广泛用于行为经济学和风险敏感机器学习中,以模拟人类偏好。我们分析了带有CPT权重扭曲奖励的最先进学习算法,并获得了玩家最优遗憾界为$\mathcal{O}(K\log T \left(\frac{1}{\Delta}\right)^{2/\alpha})$,其中$K$表示臂数,$T$是学习时间,$\Delta$表示(适当定义的)玩家的最小偏好差距。注意到对$\Delta$的依赖是次优的,我们通过明智地选择探索期间的活跃臂集进一步改进了这一遗憾,从而在主导项中消除了对$K$的依赖,并在臂数$K$显著大于玩家数$N$的设置中实现了改进的(最优)遗憾保证。此外,我们考虑了对抗性市场,其中智能体的观测奖励可能被破坏。我们提出并分析了在已知和未知总破坏预算两种设置下,以CPT作为风险敏感度量的鲁棒市场算法,并在两种情况下建立了对数级别的玩家最优遗憾保证。

英文摘要

We study a multi-agent multi-armed bandit problem in the competitive setup with two-sided matching markets under a human-centric decision-making model. To capture human preferences, we use cumulative prospect theory (CPT), which weighs the actions of the agent in a nonlinear fashion using an ($\alpha$-Hölder continuous) weight function. CPT has been widely used in behavioral economics and risk-sensitive machine learning to emulate human preferences. We analyze the state-of-the-art learning algorithm with CPT weight-distorted rewards and obtain a player-optimal regret of $\mathcal{O}(K\log T \left(\frac{1}{\Delta}\right)^{2/\alpha})$, where $K$ denotes the number of arms, $T$ is the learning horizon, and $\Delta$ represents the (suitably defined) players' minimum preference gap. Noticing the dependence on $\Delta$ to be sub-optimal, we further improve this regret by judiciously selecting the active set of arms during exploration, which removes the dependence on $K$ in the dominant term and achieves improved (optimal) regret guarantees in the setting where the number of arms $K$ is significantly larger than the number of players $N$. In addition, we consider adversarial markets where the observed rewards of the agents may be corrupted. We propose and analyze algorithms for robust markets with CPT as a risk-sensitive measure, both when the total corruption budget is known and when it is unknown, and establish logarithmic player-optimal regret guarantees in both cases.
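
For readers unfamiliar with CPT, the sketch below evaluates a discrete reward distribution under a probability-distorting weight function, restricted to non-negative outcomes (gains only). The weight function w(p) = p^alpha is chosen here only because it is alpha-Hölder continuous on [0, 1] for 0 < alpha <= 1; the abstract does not say which weight or utility functions the paper uses, so both are assumptions.

```python
def cpt_value(outcomes, probs, alpha=0.5, utility=lambda x: x):
    """CPT value of a discrete distribution over non-negative outcomes.

    Each outcome x gets the decision weight w(P(X >= x)) - w(P(X > x)),
    with the illustrative weight function w(p) = p ** alpha.
    """
    w = lambda p: p ** alpha
    pairs = sorted(zip(outcomes, probs), reverse=True)  # largest outcome first
    value, tail = 0.0, 0.0
    for x, p in pairs:
        # tail is P(X > x) before the update and P(X >= x) after it
        value += utility(x) * (w(min(tail + p, 1.0)) - w(tail))
        tail += p
    return value
```

With `alpha=1.0` the weights are undistorted and the result is the ordinary expected value; with `alpha < 1` the same gamble is valued more highly because the tail probability of the large outcome is overweighted.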

2606.20002 2026-06-19 cs.LG cs.AI cs.CL 新提交

Connect the Dots: Training LLMs for Long-Lifecycle Agents with Cross-Domain Generalization Via Reinforcement Learning

Connect the Dots:通过强化学习训练具备跨域泛化能力的长期生命周期智能体

Yanxi Chen, Weijie Shi, Yuexiang Xie, Boyi Hu, Yaliang Li, Bolin Ding, Jingren Zhou

发表机构 * Alibaba Group(阿里巴巴集团)

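AI summary Proposes the Connect the Dots framework, which uses end-to-end reinforcement learning to train LLMs to self-update their context over long task sequences and generalize to new domains; experiments support cross-domain generalization.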

Comments Work in progress; we will continuously update the codebase and arXiv version

详情
AI中文摘要

本文提出了一个通用框架,用于训练大型语言模型(LLMs)具备“Connect the Dots”(CoD)这一元能力,该能力是长期生命周期智能体所必需的:当基于LLM的AI智能体部署在环境中时,它解决一系列长期任务,同时持续探索环境、从自身经验中学习,并迭代地自我更新关于环境的上下文,从而在更新上下文的条件下,在未来任务上实现逐步更好的性能。CoD框架的主要组成部分包括:(1)用于端到端强化学习(RL)的算法设计和基础设施,其中包含交替执行任务和更新上下文的长展开序列;(2)用于在训练过程中激励和激发LLM中目标元能力的任务和环境,以及在评估过程中忠实衡量进展的任务和环境。我们展示了CoD框架的概念验证实现,包括具有细粒度信用分配的GRPO风格RL算法,以及针对目标元能力(而非特定领域的LLM能力或标准的逐任务RL)量身定制的任务和环境。实证结果验证了CoD设置中端到端RL训练的有效性,并展示了所激发元能力的分布外泛化潜力——在训练领域内、跨不同领域以及从CoD到Ralph-loop设置中。我们对CoD的研究连接了多项先前工作,并为推进LLM和AI智能体开辟了新的机遇。为促进进一步研究和应用,我们在 https://github.com/agentscope-ai/Trinity-RFT/tree/research/cod/examples/research_cod 上发布了我们的实现。

英文摘要

This work presents a general framework for training large language models (LLMs) to "Connect the Dots" (CoD), a meta-capability required by long-lifecycle agents: as an LLM-based AI agent gets deployed in an environment, it solves a long sequence of tasks while continuously exploring the environment, learning from its own experiences, and iteratively self-updating its context about the environment, thereby achieving progressively better performance on future tasks conditioned on the updated context. Major components of the CoD framework include: (1) algorithm design and infrastructure for end-to-end reinforcement learning (RL) with long rollout sequences interleaving solve-task and update-context episodes; (2) tasks and environments for incentivizing and eliciting the targeted meta-capability in LLMs during training, as well as for faithfully measuring progress during evaluation. We present proof-of-concept implementations of the CoD framework, including a GRPO-style RL algorithm with fine-grained credit assignment, as well as tasks and environments tailored to the targeted meta-capability (rather than domain-specific LLM capabilities or standard task-by-task RL). Empirical results validate the efficacy of end-to-end RL training in the CoD setting, and demonstrate the potential for out-of-distribution generalization of the elicited meta-capability (within the training domains, across different domains, and from CoD to Ralph-loop settings). Our investigation of CoD connects several lines of prior work and opens up new opportunities for advancing LLMs and AI agents. To facilitate further research and applications, we release our implementations at https://github.com/agentscope-ai/Trinity-RFT/tree/research/cod/examples/research_cod.

2606.20008 2026-06-19 cs.LG 新提交

VIMPO: Value-Implicit Policy Optimization for LLMs

VIMPO: 值隐式策略优化用于大语言模型

Zhewei Kang, Aosong Feng, Sergey Levine, Dawn Song, Xuandong Zhao

发表机构 * UC Berkeley(加州大学伯克利分校) Yale University(耶鲁大学)

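AI summary Proposes VIMPO, which derives a policy-implied value function from the optimality conditions of KL-regularized reinforcement learning, enabling fine-grained credit assignment without training a critic and outperforming GRPO on mathematical reasoning benchmarks.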

详情
AI中文摘要

基于可验证奖励的强化学习已成为提升大语言模型推理能力的核心工具,但当前方法在简单性与信用分配之间存在权衡。GRPO等群组相对方法避免了训练评论家,但通常为每个token分配轨迹级优势。Actor-critic方法提供更密集的学习信号,但需要学习值函数,其自身存在训练不稳定性。我们提出VIMPO,一种无需评论家的策略优化方法,从KL正则化强化学习的最优条件推导出策略隐含值函数。对于自回归生成,得到的值递归可以用策略-参考对数比率表示,并由轨迹结束时无未来奖励的终止条件锚定。这给出了一个简单的值损失,它结合了结果级可验证奖励,而无需训练评论家。相同的推导也产生了无需评论家的actor优势,使VIMPO能够通过值损失分离奖励合并,并通过PPO风格的actor更新进行策略改进。在数学RLVR基准上,VIMPO在MATH-500、AIME 2024、AIME 2025和OlympiadBench上均优于GRPO,尤其在竞赛式评估中提升更大。在噪声奖励下,VIMPO保持对GRPO的持续优势,表明策略隐含值优化可以在保持无评论家训练实用简单性的同时提供更精细的信用分配。

英文摘要

Reinforcement learning with verifiable rewards has become a central tool for improving the reasoning ability of large language models, but current methods face a trade-off between simplicity and credit assignment. Group-relative methods such as GRPO avoid training a critic, but typically assign a trajectory-level advantage to every token. Actor-critic methods provide denser learning signals, but require a learned value function with its own training instability. We introduce VIMPO, a critic-free policy optimization method that derives a policy-implied value function from the optimality conditions of KL-regularized reinforcement learning. For autoregressive generation, the resulting value recurrence can be written in terms of policy-reference log-ratios and anchored by the terminal condition that no future reward remains at the end of a trajectory. This gives a simple value loss that incorporates outcome-level verifiable rewards without training a critic. The same derivation also yields a critic-free actor advantage, allowing VIMPO to separate reward incorporation through the value loss from policy improvement through a PPO-style actor update. On mathematical RLVR benchmarks, VIMPO improves over GRPO across MATH-500, AIME 2024, AIME 2025, and OlympiadBench, with especially large gains on competition-style evaluations. Under noisy rewards, VIMPO retains a consistent advantage over GRPO, suggesting that policy-implied value optimization can provide finer credit assignment while preserving the practical simplicity of critic-free training.
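
The credit-assignment limitation attributed to group-relative methods can be seen in a short sketch of the baseline: outcome rewards are standardized within a group of responses to the same prompt, and each response's single advantage is copied to all of its tokens. This illustrates the GRPO-style baseline only, not VIMPO, whose value recurrence is not spelled out in the abstract; the use of the population standard deviation and the `eps` constant are assumptions.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Standardize the outcome rewards of one group of responses to the same prompt."""
    mu, sigma = mean(rewards), pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def broadcast_to_tokens(advantages, lengths):
    """Every token of response i receives the same trajectory-level advantage."""
    return [[a] * n for a, n in zip(advantages, lengths)]

# Four sampled responses with verifiable 0/1 rewards and different lengths.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
token_advantages = broadcast_to_tokens(advantages, [3, 2, 4, 1])
```

Because all tokens in a response share one value, a correct final answer rewards every intermediate step equally, which is the coarse signal a token-level value estimate is meant to refine.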

2606.20014 2026-06-19 cs.LG cs.AI 新提交

Hierarchical Control in Multi-Agent Games: LLM-based Planning and RL Execution

多智能体博弈中的层次化控制:基于LLM的规划与RL执行

Jannik Hösch, Alessandro Sestini, Florian Fuchs, Amir Baghi, Joakim Bergdahl, Konrad Tollmar, Jean-Philippe Barrette-LaPierre, Linus Gisslén

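AI summary Proposes a hierarchical architecture in which an LLM acts as a central strategic controller that selects among RL skill policies; in a 2v2 competitive environment it reaches a win rate comparable to a hand-crafted behavior tree (BT) and is perceived as the most human-like.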

Comments 12 pages, 9 figures

详情
AI中文摘要

强化学习(RL)在序列决策中取得了强劲表现,但由于稀疏奖励、大状态-动作空间以及学习协调策略的困难,扩展到复杂多智能体环境仍具挑战。我们提出一种层次化架构,其中预训练的大语言模型(LLM)作为集中式策略控制器,为一组智能体选择专门的RL技能策略,而RL策略负责反应式底层执行。我们在竞争性2v2 King of the Hill环境中评估该混合系统,与行为树(BT)和“扁平”RL(无技能分解的端到端训练)基线进行比较。LLM+RL系统实现了与手工BT统计上相当的任务性能(胜率46.4% vs 51.5%,p=0.103),而两者均显著优于无技能分解训练的扁平RL。一项用户研究(n=15)显示,60%的参与者认为LLM+RL智能体最像人类(p=0.027),归因于行为适应性和战术变异性。这些结果表明,预训练LLM推理可以有效编排预训练RL技能,实现具有竞争力的多智能体协调和优越的感知可信度,而无需手动规则工程。

英文摘要

Reinforcement learning (RL) has achieved strong performance in sequential decision-making, yet scaling to complex multi-agent environments remains challenging due to sparse rewards, large state-action spaces, and the difficulty of learning coordinated strategies. We propose a hierarchical architecture where a pretrained large language model (LLM) acts as a centralized strategic controller that selects among specialized RL skill policies for a team of agents, while RL policies handle reactive low-level execution. We evaluate this hybrid system in a competitive 2v2 King of the Hill environment against behavior tree (BT) and "Flat" RL (end-to-end training without skill decomposition) baselines. The LLM+RL system achieves task performance statistically equivalent to hand-crafted BT (46.4% vs 51.5% win rate, $p=0.103$) while both significantly outperform Flat RL trained without skill decomposition. A user study ($n=15$) reveals that 60% of participants perceive LLM+RL agents as the most human-like ($p=0.027$), citing behavioral adaptability and tactical variability. These results demonstrate that pretrained LLM reasoning can effectively orchestrate pretrained RL skills, achieving competitive multi-agent coordination and superior perceived believability without manual rule engineering.

2606.20104 2026-06-19 cs.LG cs.AI 新提交

Sensorimotor World Models: Perception for Action via Inverse Dynamics

感觉运动世界模型:通过逆动力学实现面向行动的感知

Petr Ivashkov, Randall Balestriero, Bernhard Schölkopf

发表机构 * Max Planck Institute for Intelligent Systems(马克斯·普朗克智能系统研究所) Department of Computer Science, Brown University(布朗大学计算机科学系) ELLIS Institute(ELLIS研究所) ETH Zürich(苏黎世联邦理工学院)

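AI summary Proposes the sensorimotor world model (SMWM), a latent world model trained end-to-end with inverse dynamics regularization, which prevents representation collapse and learns compact, action-aligned representations, achieving competitive planning performance on 2D and 3D control tasks.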

详情
AI中文摘要

面向行动的感知表明,世界的表示不应仅由视觉保真度决定,而应由其与行动的相关性决定。同时,JEPA风格的潜空间世界模型主张从高维观测中学习紧凑的预测状态以促进未来状态的预测,但这些模型的端到端训练并非易事,因为如果我们的唯一目标是构建易于预测的潜在状态,表示可能会崩溃。我们引入了一种感觉运动世界模型(SMWM):一种通过逆动力学正则化进行端到端训练的潜空间世界模型。这一单一正则化解决了两个问题:它防止表示崩溃并诱导与行动对齐的表示。通过迫使潜在状态保留关于转换背后行动的信息,它使模型偏向于环境中可控的自由度,同时丢弃不可控的干扰因素。这产生了从离线、无奖励轨迹中训练的稳定潜空间世界模型,无需冻结编码器、指数移动平均或复杂的潜在正则化。实验表明,SMWM学习了紧凑、可解释的潜在空间,并在简单的2D和3D控制任务中实现了竞争性的规划性能。

英文摘要

Perception for action suggests that representations of the world should be shaped not by visual fidelity alone, but by their relevance for actions. At the same time, latent JEPA-style world models advocate learning compact predictive states from high-dimensional observations to facilitate the prediction of future states, but end-to-end training of these models is nontrivial because representations may collapse if our only goal is to construct a latent state that is easy to predict. We introduce a sensorimotor world model (SMWM): a latent world model trained end-to-end with inverse dynamics regularization. This single regularizer addresses both issues: it prevents representation collapse and induces action-aligned representations. By forcing latent states to preserve information about the action underlying a transition, it biases the model toward the controllable degrees of freedom of the environment while discarding uncontrollable distractors. This yields stable latent world models trained from offline, reward-free trajectories, without frozen encoders, exponential moving averages, or complex latent regularizers. Empirically, SMWM learns compact, interpretable latent spaces and enables competitive planning performance across simple 2D and 3D control tasks.
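
The role of the regularizer can be sketched as a two-term objective: a latent prediction loss plus an inverse-dynamics loss that asks for the action to be recoverable from consecutive latent states. The abstract does not give the exact losses or weighting; the mean-squared-error form (which presumes continuous actions), the weight `lam`, and the function names are assumptions.

```python
def mse(u, v):
    """Mean squared error between two equal-length vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v)) / len(u)

def smwm_loss(z_next_pred, z_next, action_pred, action, lam=1.0):
    """Latent prediction loss plus an inverse-dynamics regularizer weighted by lam.

    z_next_pred: next latent state predicted by the forward model
    z_next:      next latent state produced by the encoder
    action_pred: action inferred from (z_t, z_next) by an inverse-dynamics head
    action:      action actually taken in the offline trajectory
    """
    return mse(z_next_pred, z_next) + lam * mse(action_pred, action)

# A collapsed encoder maps every observation to the same latent state, so the
# prediction term is zero, but the action can no longer be recovered.
collapsed = smwm_loss([0.0, 0.0], [0.0, 0.0], action_pred=[0.0], action=[1.0])
```

In this toy case the prediction term alone would rate the collapsed solution as perfect, while the combined loss does not, which is the collapse-prevention effect the abstract attributes to the regularizer.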

2606.20107 2026-06-19 cs.LG 新提交

Quantile of Means: A Bonus-Free Ensemble Method for Minimax Optimal Reinforcement Learning

均值分位数:一种用于极小极大最优强化学习的无需探索加成(bonus)的集成方法

Asaf Cassel, Aviv Rosenberg

发表机构 * Google Research(谷歌研究院)

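AI summary Proposes a quantile-based ensemble method that achieves optimal variance-dependent regret bounds in finite-horizon MDPs without counts, providing theoretical grounding for ensemble-based exploration in reinforcement learning.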

详情
AI中文摘要

最优强化学习算法通常依赖于精心构造的基于计数的不确定性估计来驱动探索。尽管理论上合理,但这类估计在实际设置中难以计算,因此对设计探索启发式方法提供的见解有限。与此同时,集成方法已成为一种实用的方法,但仍缺乏理论证明。基于最近一种用于多臂赌博机的集成方法,我们提出了一种用于有限时域马尔可夫决策过程(MDP)的基于分位数的集成方法。我们这种简单的无计数方法实现了最优方差依赖的遗憾界,为强化学习中的集成探索提供了理论基础。

英文摘要

Optimal Reinforcement Learning (RL) algorithms typically rely on carefully constructed count-based uncertainty estimates to drive exploration. Although theoretically sound, such estimates are hard to compute in practical settings and therefore offer limited insight for designing exploration heuristics. Meanwhile, ensembling has emerged as a practical approach, but remains without theoretical justification. Building on a recent ensemble-based method for Multi-Armed Bandits, we propose a quantile-based ensemble method for finite-horizon Markov Decision Processes (MDPs). Our simple count-free approach achieves optimal variance-dependent regret bounds, providing theoretical grounding for ensemble-based exploration in RL.
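
To give a feel for what a count-free, quantile-based ensemble index might look like, here is a minimal bandit-style sketch. It is only an illustration of the phrase "quantile of means": the bootstrap resampling, the function name `quantile_of_means`, and the parameters `n_members` and `q` are assumptions, not the algorithm or constants analysed in the paper.

```python
import numpy as np

def quantile_of_means(rewards, n_members=20, q=0.9, rng=None):
    """Optimistic index: an upper quantile of ensemble-member means.

    Each ensemble member averages a bootstrap resample of the observed
    rewards (an assumed way of building the ensemble). No visit counts or
    explicit bonus terms are used.
    """
    rng = np.random.default_rng() if rng is None else rng
    rewards = np.asarray(rewards, dtype=float)
    idx = rng.integers(0, len(rewards), size=(n_members, len(rewards)))
    member_means = rewards[idx].mean(axis=1)
    return float(np.quantile(member_means, q))

observed = [0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0]
upper = quantile_of_means(observed, q=0.9, rng=np.random.default_rng(1))
lower = quantile_of_means(observed, q=0.1, rng=np.random.default_rng(1))
constant = quantile_of_means([0.5] * 10, rng=np.random.default_rng(2))
print(lower, upper, constant)
```

The spread between member means shrinks as more rewards are observed, so the upper quantile plays the role that an explicit exploration bonus would otherwise play.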

2606.20411 2026-06-19 cs.LG 新提交

Direct Advantage Estimation for Scalable and Sample-efficient Deep Reinforcement Learning

直接优势估计:可扩展且样本高效的深度强化学习

Hsiao-Ru Pan, Bernhard Schölkopf

AI总结 针对直接优势估计(DAE)在部分可观测域和高维观测下的局限性,本文扩展其理论框架并引入离散潜动态模型降低计算复杂度,在Arcade学习环境中验证了DAE的可扩展性和样本效率。

Comments Accepted at RLC2026

详情
AI中文摘要

直接优势估计(DAE)已被证明可以提高深度强化学习算法的样本效率。然而,它对完全环境可观测性的依赖限制了其在现实场景中的适用性,并且其对转移概率建模的要求在高维观测下会带来巨大的计算开销。在本文中,我们解决了这两个局限性。首先,我们将DAE的理论框架扩展到部分可观测域,只需最小的修改。其次,我们通过引入高效近似转移概率的离散潜动态模型来降低其计算复杂度。我们在Arcade学习环境上评估了我们的方法,发现DAE在保持高样本效率的同时,能有效地随函数逼近器容量扩展。

英文摘要

Direct Advantage Estimation (DAE) has been shown to improve the sample efficiency of deep reinforcement learning algorithms. However, its reliance on full environment observability limits its applicability in realistic settings, and its requirement to model transition probabilities incurs substantial computational overhead for high-dimensional observations. In the present work, we address both limitations. First, we extend the theoretical framework of DAE to partially observable domains with minimal modifications. Second, we reduce its computational complexity by introducing discrete latent dynamics models that efficiently approximate transition probabilities. We evaluate our approach on the Arcade Learning Environment and find that DAE scales effectively with function approximator capacity while retaining high sample efficiency.

2606.20475 2026-06-19 cs.LG 新提交

Marginal Advantage Accumulation for Memory-Driven Agent Self-Evolution

边际优势累积用于记忆驱动智能体自我进化

Mingyu Yang, Keye Zheng, Congchao Cheng, Yujie Liu, Xingkang Lu, Fan Jiang, Yefei Zheng

发表机构 * Alibaba International Digital Commerce Group(阿里巴巴国际数字商业集团)

AI总结 针对批量式轨迹蒸馏中跨批次证据缺失问题,提出边际优势累积(MAA)方法,通过差分信号构造、指数移动平均累积和语义身份合并,在16个设置中14个取得最佳结果,优化阶段token消耗减少约75%。

Comments 26 pages, 4 figures, 10 tables, 42 references

详情
AI中文摘要

在批量式轨迹蒸馏中,同一记忆操作可能在不同批次间收到矛盾的反馈。现有方法缺乏跨批次、操作级别的证据累积机制,无法区分稳定有效的操作与偶然命中。本文将需求形式化为两个结构条件:可对齐性和可比性,并提出边际优势累积(MAA)。MAA构造差分信号使其跨批次可比,通过指数移动平均(EMA)累积每个操作的有符号证据,并通过语义身份合并确保跨批次可追溯性。作为一种后处理架构,MAA在4个基准和4个目标模型的16个设置中14个取得最佳结果,持续优于现有批量级蒸馏基线,并在大多数设置中匹配或超越在线替代方法,同时将优化阶段的token消耗减少约75%。

英文摘要

In batch-style trace distillation, the same memory operation may receive contradictory feedback across different batches. Existing methods lack a cross-batch, operation-level evidence accumulation mechanism, making it impossible to distinguish stably effective operations from accidental hits. This paper formalizes the requirement as two structural conditions, alignability and comparability, and proposes Marginal Advantage Accumulation (MAA). MAA constructs differential signals to make them comparable across batches, accumulates signed evidence per operation via EMA, and ensures cross-batch traceability through semantic identity merging. As a post-processing architecture, MAA achieves the best results in 14 out of 16 settings across 4 benchmarks and 4 target models, consistently outperforming existing batch-level distillation baselines and matching or surpassing online alternatives in most settings, while reducing optimization-phase token consumption by approximately 75%.
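
The cross-batch accumulation step can be sketched as an exponential moving average over signed per-operation signals. This is a minimal illustration, not the MAA implementation: the function name `accumulate`, the smoothing factor `alpha`, and the convention that a signal is a batch-relative score difference are assumptions.

```python
def accumulate(evidence, batch_signals, alpha=0.3):
    """Update per-operation evidence with an exponential moving average.

    evidence: dict mapping operation id -> accumulated signed evidence.
    batch_signals: dict mapping operation id -> signed differential signal
        observed in the current batch (positive means the operation helped).
    """
    updated = dict(evidence)
    for op_id, signal in batch_signals.items():
        previous = updated.get(op_id, 0.0)
        updated[op_id] = (1.0 - alpha) * previous + alpha * signal
    return updated

evidence = {}
batches = [
    {"op_stable": 1.0, "op_lucky": 1.0},
    {"op_stable": 1.0, "op_lucky": -1.0},
    {"op_stable": 1.0, "op_lucky": 1.0},
]
for batch in batches:
    evidence = accumulate(evidence, batch)
print(evidence)
```

An operation that receives consistent positive feedback accumulates more evidence than one with contradictory feedback, which is the distinction between stably effective operations and accidental hits.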

4. 生成模型与概率建模 7 篇

2606.19377 2026-06-19 cs.LG cs.AI 新提交

Emyx: Fast and efficient all-atom protein generation

Emyx: 快速高效的全原子蛋白质生成

Nicholas J. Williams, Ward Haddadin, Matteo P. Ferla, Constantin Schneider, Nicholas B. Woodall, Ruby Sedgwick, Christian D. Madsen, Andrew L. Hopkins, Edward O. Pyzer-Knapp

发表机构 * Xyme

AI总结 提出Emyx,一种140M参数的流匹配模型,通过轻量条件表示和稀疏连接降低复杂度,在酶设计基准上超越现有方法,训练仅需682 GPU小时。

详情
AI中文摘要

计算酶设计需要生成能够支撑催化残基和配体的蛋白质,这要求生成模型同时具备几何准确性和结构多样性。当前的全原子生成模型继承了结构预测中的昂贵架构,导致训练成本高、样本多样性有限。我们认为,对于生成模型而言,这种复杂性大多是不必要的,因为生成模型依赖于稀疏的几何约束而非丰富的共进化信号。Emyx是一个140M参数的条件流匹配模型,将能力集中在标准Transformer块中,用轻量条件表示和稀疏连接替代了厚重的嵌入堆叠。此外,我们推导了流匹配插值到EDM噪声水平框架的精确重参数化,将流匹配训练效率与为扩散模型设计的最先进采样方法桥接起来,无需重新训练。尽管是最小的模型,Emyx在AME酶设计基准上,在要求全局折叠恢复和催化几何准确性的严格评估下,在成功率、结构新颖性、骨架多样性和几何有效性方面均优于Proteína-Complexa和RFdiffusion3,而训练仅需682 GPU小时,约为RFdiffusion3的1/4。

英文摘要

Computational enzyme design requires generating proteins that scaffold catalytic residues and ligands, a task that demands both geometric accuracy and structural diversity from the underlying generative model. Current all-atom generators inherit expensive architectures from structure prediction, leading to high training costs and limited sample diversity. We argue that much of this complexity is unnecessary for generators, which condition on sparse geometric constraints rather than rich co-evolutionary signals. Emyx is a 140M-parameter conditional flow matching model that concentrates capacity within standard transformer blocks, replacing heavy embedding stacks with lightweight conditional representations and sparse connectivity. We additionally derive an exact reparametrisation of the flow matching interpolant into the EDM noise-level framework, bridging flow matching training efficiency with state-of-the-art sampling methods designed for diffusion models without retraining. Despite being the smallest model, Emyx outperforms both Proteína-Complexa and RFdiffusion3 on the AME enzyme design benchmark in success rate (under a strict evaluation requiring both global fold recovery and catalytic geometry accuracy), structural novelty, scaffold diversity, and geometric validity, while training in just 682 GPU-hours, roughly 4× less than RFdiffusion3.
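
The link between a flow matching interpolant and an EDM-style noise level can be illustrated with a small numerical check. The sketch assumes the common linear interpolant x_t = (1 - t) x_0 + t ε, with t = 0 as data and t = 1 as noise; this convention and the names `to_edm` and `sigma` are assumptions for illustration and may differ from the reparametrisation derived in the paper.

```python
import numpy as np

def to_edm(x_t, t):
    """Rescale a linear flow-matching sample into the form x_0 + sigma * eps."""
    sigma = t / (1.0 - t)  # implied noise level under the assumed convention
    return x_t / (1.0 - t), sigma

rng = np.random.default_rng(0)
x0 = rng.normal(size=8)   # clean sample
eps = rng.normal(size=8)  # Gaussian noise
t = 0.3

x_t = (1.0 - t) * x0 + t * eps
x_edm, sigma = to_edm(x_t, t)

# The rescaled sample is the clean sample plus noise at level sigma.
print(np.allclose(x_edm, x0 + sigma * eps), sigma)
```

Under this convention a model trained on the interpolant can be queried at any noise level sigma by inverting the map, t = sigma / (1 + sigma), which is what lets diffusion samplers be reused without retraining.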

2606.19496 2026-06-19 cs.LG 新提交

Calibrating Generative Models to Feature Distributions with MMD Finetuning

使用MMD微调将生成模型校准到特征分布

Nathaniel L. Diamant, Brian L. Trippe

发表机构 * Stanford University(斯坦福大学)

AI总结 提出kCGM方法,通过最小化生成与目标特征分布的最大均值差异(MMD)并加入KL正则化,在不牺牲有效性的前提下校准生成模型的特征分布,适用于多种生成模型。

详情
AI中文摘要

生成模型可以产生个体上合理的样本,但在关键特征的分布上与目标集存在显著偏差。例如,在广泛的类药化学空间上预训练的模型,其生成分子的分子特征可能与感兴趣的治疗类别(如已知抗生素)不同。纠正这种分布失准具有挑战性:在目标集上直接微调可能导致过拟合,并且无法控制匹配哪些特征。为了填补这一空白,我们引入了核校准生成模型(kCGM)。kCGM使用无偏得分函数估计器最小化生成特征分布与目标特征分布之间的最大均值差异(MMD),并通过KL正则化保持与预训练模型的接近。在一个包含174种抗生素的目标集上,直接微调牺牲了化学有效性以匹配特征分布,而kCGM在提高有效性的同时改善了目标特征匹配。我们还在蛋白质和DNA生成任务中展示了kCGM,表明它可以仅使用特征级别的监督来适配自回归、连续空间扩散和离散扩散模型。代码可在 https://github.com/smithhenryd/cgm 获取。

英文摘要

Generative models can produce individually plausible samples while deviating substantially from a target set in the distribution of key features. For example, a model pretrained on broad drug-like chemical space may generate molecules whose molecular features differ from those of a therapeutic class of interest, such as known antibiotics. Correcting such distributional miscalibration is challenging: direct finetuning on the target set can overfit and does not control which features are matched. To fill this gap, we introduce kernel Calibrating Generative Models (kCGM). kCGM minimizes a maximum mean discrepancy (MMD) between generated and target feature distributions using an unbiased score-function estimator, with KL regularization to remain close to the pretrained model. On a target set of 174 antibiotics, direct finetuning sacrifices chemical validity for feature-distribution matching, whereas kCGM improves target feature matching while increasing validity. We further demonstrate kCGM in protein and DNA generation tasks, showing it can adapt autoregressive, continuous-space diffusion, and discrete diffusion models using only feature-level supervision. Code is available at https://github.com/smithhenryd/cgm.
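
For readers unfamiliar with the quantity being minimized, the following is a minimal sketch of the standard unbiased estimator of squared MMD with an RBF kernel. It computes only the discrepancy between two feature samples; it does not implement the score-function gradient estimator or the KL regularization, and the kernel choice, bandwidth, and function names are assumptions rather than the kCGM code.

```python
import numpy as np

def rbf_kernel(x, y, bandwidth=1.0):
    """Gaussian RBF kernel matrix between the rows of x and the rows of y."""
    sq_dists = (
        np.sum(x ** 2, axis=1)[:, None]
        + np.sum(y ** 2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))

def mmd2_unbiased(x, y, bandwidth=1.0):
    """Unbiased estimate of MMD^2; diagonal (i == j) terms are excluded."""
    n, m = len(x), len(y)
    k_xx = rbf_kernel(x, x, bandwidth)
    k_yy = rbf_kernel(y, y, bandwidth)
    k_xy = rbf_kernel(x, y, bandwidth)
    term_x = (k_xx.sum() - np.trace(k_xx)) / (n * (n - 1))
    term_y = (k_yy.sum() - np.trace(k_yy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * k_xy.mean()

rng = np.random.default_rng(0)
target = rng.normal(size=(500, 2))
matched = rng.normal(size=(500, 2))
shifted = rng.normal(size=(500, 2)) + 2.0

mmd_matched = mmd2_unbiased(matched, target)
mmd_shifted = mmd2_unbiased(shifted, target)
print(mmd_matched, mmd_shifted)
```

The estimate is close to zero when the generated and target features follow the same distribution and clearly positive when they differ; because it is unbiased, it can be slightly negative.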

2606.19770 2026-06-19 cs.LG 新提交

An Information Theoretic Framework for Graph Novelty Generation via Latent Mixture Modeling

基于潜在混合建模的图新颖性生成的信息论框架

Itsuki Nakagawa, Kenji Yamanishi

发表机构 * Graduate School of Information Science and Technology, The University of Tokyo(东京大学信息科学与技术研究生院)

AI总结 提出信息论框架,通过潜在混合建模和描述长度约束,生成与现有模式不同且保持全局结构一致性的新颖图数据。

详情
AI中文摘要

我们提出了一个用于图新颖性生成的信息论框架,旨在生成与现有模式不同且保持全局结构一致性的数据。我们的方法将数据嵌入潜在空间,使用有限混合模型对潜在分布进行建模,并通过基于描述长度制定的显式新颖性和可靠性条件生成新颖样本。具体来说,新颖性通过要求生成样本难以被所有现有混合成分解释来强制执行,而可靠性则根据最小描述长度(MDL)原则约束其对整体混合结构的影响。我们提供了理论分析,表明在适当的阈值选择下,将非新颖或不可靠样本错误分类的概率以显式速率收敛到零。在合成和基准图数据集上的实验表明,所提出的方法能够以可量化的风险实现原则性的新颖性生成。

英文摘要

We propose an information-theoretic framework for graph novelty generation, which aims to generate data that are distinct from existing patterns while preserving global structural consistency. Our approach embeds data into a latent space, models the latent distribution using finite mixture models, and generates novel samples by imposing explicit novelty and reliability conditions formulated in terms of description length. Specifically, novelty is enforced by requiring generated samples to be poorly explained by all existing mixture components, while reliability constrains their impact on the overall mixture structure under the Minimum Description Length (MDL) principle. We provide a theoretical analysis showing that, with appropriate threshold choices, the probabilities of misclassifying non-novel or unreliable samples converge to zero with explicit rates. Experiments on synthetic and benchmark graph datasets demonstrate that the proposed method enables principled novelty generation with quantifiable risk.
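
The novelty condition can be sketched as a code-length test against every mixture component. The example below assumes isotropic Gaussian components in latent space and a hand-picked threshold; the function names, the threshold value, and the omission of the reliability condition are simplifications for illustration, not the paper's procedure.

```python
import numpy as np

def code_lengths(z, means, weights, sigma=1.0):
    """Code length in nats of latent z under each weighted Gaussian component."""
    dim = z.shape[0]
    sq_dists = np.sum((means - z) ** 2, axis=1)
    log_pdf = -0.5 * sq_dists / sigma ** 2 - 0.5 * dim * np.log(2.0 * np.pi * sigma ** 2)
    return -(np.log(weights) + log_pdf)

def is_novel(z, means, weights, threshold, sigma=1.0):
    """A sample counts as novel if even the best component explains it poorly."""
    return bool(np.min(code_lengths(z, means, weights, sigma)) > threshold)

means = np.array([[0.0, 0.0], [5.0, 5.0]])
weights = np.array([0.5, 0.5])

typical = np.array([0.0, 0.0])
outlying = np.array([10.0, -10.0])

typical_is_novel = is_novel(typical, means, weights, threshold=10.0)
outlying_is_novel = is_novel(outlying, means, weights, threshold=10.0)
print(typical_is_novel, outlying_is_novel)
```

A full treatment would add the reliability check, which bounds how much accepting the sample would change the description length of the overall mixture.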

2606.19802 2026-06-19 cs.LG cs.CV 新提交

Flow Map Denoisers: Traversing the Distortion-Perception Plane for Inverse Problems

流映射去噪器:遍历逆问题的失真-感知平面

Nicolas Zilberstein, Morteza Mardani, Santiago Segarra

发表机构 * Rice University(莱斯大学) NVIDIA Inc.(英伟达公司)

AI总结 提出流映射模型,通过单一参数t在MMSE和感知质量间连续调节,实现逆问题的失真-感知权衡,无需额外监督或调参。

详情
AI中文摘要

图像复原面临一个基本权衡:最小化误差的方法产生模糊重建,而最大化感知质量的方法产生锐利但不够保真的图像。现有方法要么在失真-感知(DP)前沿上固定一个操作点,要么需要配对数据监督、辅助模型或对采样器进行超参数调优以访问不同点。我们证明,流映射模型——一种用于少步采样的流匹配的近期扩展,学习一个平均场——隐式定义了一个单参数去噪器族,连续跨越DP前沿。前瞻参数t充当MMSE和感知区域之间的控制旋钮。对于高斯目标,我们证明改变t精确恢复最优DP前沿;对于自然图像,我们在经验上观察到类似行为。在即插即用求解器中,相同机制扩展到一般逆问题,控制感知对齐与数据一致性之间的权衡。尽管在此设置中缺乏精确最优性保证,单个训练的流映射跨越DP权衡,在两端匹配或超越专门基线。在CelebA(128×128)和AFHQ(256×256)上的多个线性和非线性逆任务的广泛实验验证了我们的发现。

英文摘要

Image restoration faces a fundamental tradeoff: methods that minimize error produce blurry reconstructions, while those that maximize perceptual quality yield sharp but less faithful images. Existing approaches either commit to a single operating point on this distortion-perception (DP) frontier or require paired-data supervision, auxiliary models, or hyperparameter tuning of the sampler to access different points. We show that flow map models, a recent extension of flow matching for few-step sampling that learns an average field, implicitly define a one-parameter family of denoisers that continuously spans the DP frontier. The lookahead parameter t acts as a control knob between the MMSE and perceptual regimes. For Gaussian targets, we prove that varying t exactly recovers the optimal DP frontier; for natural images, we observe similar behavior empirically. Within a Plug-and-Play solver, the same mechanism extends to general inverse problems, where it controls a tradeoff between perceptual alignment and data consistency. Despite the lack of exact optimality guarantees in this setting, a single trained flow map spans the DP tradeoff, matching or exceeding specialized baselines at both extremes. Extensive experiments on CelebA (128×128) and AFHQ (256×256) across several linear and nonlinear inverse tasks validate our findings.

2606.19894 2026-06-19 cs.LG 新提交

Score Approximation for Diffusion Models on Arbitrary Low-Dimensional Structures

任意低维结构上扩散模型的分数近似

Xinhe Mu, Zaijiu Shang, Zhaoqi Zhou, Chuan Zhou, Qi Meng, Guiying Yan, Zhiming Ma

发表机构 * Shanghai Institute for Mathematics and Interdisciplinary Sciences(上海数学与交叉科学研究院) Huawei Technologies Co., Ltd.(华为技术有限公司)

AI总结 针对任意紧支撑分布,提出一种基于离散混合的分数近似方法,证明ReLU网络复杂度仅随上Minkowski维数d指数增长,打破环境维数诅咒,解释扩散模型在非光滑数据上的有效性。

详情
AI中文摘要

基于分数的扩散模型的显著成功激发了大量建立其理论基础的努力。然而,现有的分数近似复杂度界限严重依赖于限制性假设,如Lipschitz连续密度或光滑流形支撑,而这些假设通常被真实感知数据固有的奇异性、尖锐边界和不连续簇所违反。本文建立了一个通用的分数近似定理,适用于任何支撑在任意上Minkowski维数为$d$的紧集上的分布。通过一种新颖的离散混合公式,我们证明了分数函数可以用ReLU网络近似,其复杂度仅随$d$指数增长,从而打破了环境维数的指数诅咒。结合现有关于精确求解任意紧分布的反向扩散SDE的理论,我们的工作表明扩散模型能够自适应地处理不规则、非光滑的数据结构,解释了它们在真实生成任务中的能力。

英文摘要

The remarkable success of score-based diffusion models has spurred significant efforts to establish their theoretical foundations. However, existing complexity bounds for score approximation rely heavily on restrictive assumptions like Lipschitz continuous densities or smooth manifold supports, which are routinely violated by the singularities, sharp boundaries, and disjoint clusters inherent to real-world perceptual data. This work establishes a universal score approximation theorem that works for any distribution supported on any compact set of upper Minkowski dimension $d$. Using a novel discrete-mixture formulation, we prove that the score function can be approximated with a ReLU network whose complexity grows exponentially only with $d$, thus breaking the exponential curse of ambient dimensionality. Combined with existing theories on accurately solving the backward diffusion SDE for arbitrary compact distributions, our work shows that diffusion models readily adapt to irregular, non-smooth data structures, explaining their competence in real-world generative tasks.

2606.14510 2026-06-19 cs.LG q-bio.BM 新提交

PepALD: Macrocyclic Peptide Generation via Autoregressive Latent Diffusion

PepALD: 通过自回归潜在扩散生成大环肽

Junming Zhang, Siyu Yi, Wei Ju, Zhonghui Gu

发表机构 * College of Computer Science, Sichuan University(四川大学计算机科学学院) School of Mathematics, Sichuan University(四川大学数学学院) School of Artificial Intelligence, Sichuan University(四川大学人工智能学院) Lingang Laboratory(临港实验室)

AI总结 提出PepALD模型,结合自回归潜在扩散与化学嵌入,实现从头设计大环肽,并利用偏好优化对齐亲和力奖励,通过计算机模拟实验与代表性基线对比了生成质量和奖励优化性能。

Comments 18 pages, 5 figures, 3 tables

详情
AI中文摘要

大环肽是细胞内靶点的有前景的治疗候选物,但其设计需要同时控制非天然单体化学、环拓扑、膜通透性和靶点结合。现有的SMILES或HELM字符串生成模型要么在很长的原子级序列空间中操作,要么将单体视为化学基础有限的符号化令牌。我们引入了PepALD,一个用于从头生成大环肽的自回归潜在扩散(ALD)基础模型。该模型使用结构化化学嵌入表示HELM单体,通过在化学信息潜在空间中的上下文条件扩散生成每个残基,在自回归生成过程中预测R基团感知的环闭合,并使用胜者保护的扩散自适应偏好优化将去噪器与亲和力奖励对齐。计算机模拟(in silico)实验在与代表性肽生成基线的对比中展示了PepALD的生成质量和奖励优化性能。

英文摘要

Macrocyclic peptides are promising therapeutic candidates for intracellular targets, but their design requires simultaneous control over non-natural monomer chemistry, ring topology, membrane permeability, and target binding. Existing SMILES- or HELM-string generative models either operate in long atom-level sequence spaces or treat monomers as symbolic tokens with limited chemical grounding. We introduce PepALD, an Autoregressive Latent Diffusion (ALD) foundation model for de novo macrocyclic peptide generation. The model represents HELM monomers with structured chemical embeddings, generates each residue through context-conditioned diffusion in chemically informed latent space, predicts R-group-aware ring closures during autoregressive generation, and aligns the denoiser to affinity rewards using winner-protected diffusion-adapted preference optimization. In silico experiments demonstrate PepALD's generation quality and reward-optimization performance against representative peptide generation baselines.

2606.20416 2026-06-19 cs.LG cs.CV 新提交

On the Redundancy of Timestep Embeddings in Diffusion Models

扩散模型中时间步嵌入的冗余性研究

José A. Chávez

发表机构 * Independent Researcher, Lima, Peru(独立研究者,秘鲁利马)

AI总结 本文通过理论和实验证明,在U-Net和Diffusion Transformer架构中,扩散模型无需显式时间步嵌入也能达到全局最优,甚至在某些指标上超越有条件模型。

Comments 17 pages

详情
AI中文摘要

扩散模型严重依赖显式的时间步嵌入来调节不同噪声尺度下的去噪过程。在这项工作中,我们通过分析时间步嵌入对U-Net和Diffusion Transformer架构的影响,挑战了这些时间信号的必要性。除了经验证据外,我们提供了一个理论框架,证明在某些条件下,无需显式时间步条件即可达到扩散训练目标的全局最小值。我们的发现揭示了当完全移除时间步嵌入时令人惊讶的鲁棒性。在CelebA和CIFAR-10数据集上的大量消融研究表明,这些时间无关模型可以保持高结构保真度,甚至在竞争性指标(包括FID、精确率和召回率)上超越其有条件对应模型。我们的分析表明,这些架构可以在特定假设下从损坏输入中隐式推断噪声尺度,使得显式时间条件变得冗余。这项研究挑战了长期以来的时间条件范式,并为更高效、更注重结构的生成架构铺平了道路。

英文摘要

Diffusion models rely heavily on explicit timestep embeddings to modulate the denoising process across various noise scales. In this work, we challenge the necessity of these temporal signals by analyzing their impact on U-Net and Diffusion Transformer architectures. Beyond empirical evidence, we provide a theoretical framework demonstrating that, under certain conditions, the global minimizer of the diffusion training objective can be achieved without explicit timestep conditioning. Our findings reveal a surprising robustness when timestep embeddings are completely removed. Extensive ablation studies on the CelebA and CIFAR-10 datasets show that these time-agnostic models can maintain high structural fidelity and even surpass their conditioned counterparts in competitive metrics, including FID, precision, and recall. Our analysis suggests these architectures can implicitly infer noise scales from the corrupted input under specific assumptions, rendering explicit temporal conditioning redundant. This study challenges long-standing temporal conditioning paradigms and paves the way for more efficient and structurally focused generative architectures.

5. 优化、泛化与理论分析 15 篇

2606.19361 2026-06-19 cs.LG cs.AI cs.NA math.NA stat.CO stat.ME stat.ML 新提交

Computational Identifiability

计算可识别性

Lucius E. J. Bynum, Rajesh Ranganath, Kyunghyun Cho

发表机构 * New York University(纽约大学)

AI总结 提出“计算可识别性”框架,通过有限计算搜索过程在指定误差容限内找到经验估计量,从而解决理论可识别性在有限样本、模糊图标准等实际场景中的不足。

详情
AI中文摘要

识别条件描述了目标查询或感兴趣参数的可计算性,它是可用信息类型和数量的函数。在因果识别中,这些信息通常以因果图的形式表达,数据则针对图中某些变量子集观测或收集。目标查询可以是单个效应,也可以是给定模型中的一类效应。识别算法的推导在数学上定义了一个过程,使所需的因果效应在理论上、在期望意义下被唯一确定。期望意义下的可识别性,即“理论可识别性”,通常假设渐近性质、无限数据或其他数学理想化条件。在本文中,我们探讨了这种理论上理想化的可识别性与一种受计算限制的替代方案之间的根本区别。我们提出的框架“计算可识别性”转而为经验估计量定义一个有限的计算搜索过程。如果该过程在期望的误差容限内经验性地找到了估计量,则可识别性成立,其前提是搜索所指定的假设(即参数上的先验分布)以及搜索过程本身。通过多个实验,我们展示了该框架如何回答细粒度的实际识别问题,例如小有限样本下的识别、图准则存在歧义时的识别、混合观测-干预数据下的识别,以及跨反事实数据和估计目标(estimand)的识别。代码见 https://github.com/lbynum/metadentify。

英文摘要

Identification conditions describe the computability of a target query or parameter of interest as a function of the type and amount of information available. In causal identification, this information is often expressed in the form of a causal graph, and data are observed or collected for some subset of variables in the graph. Target queries may be for a single effect alone or for a class of effects in a given model. The derivation of an identification algorithm then defines mathematically the process by which the desired causal effect(s) can be uniquely determined, theoretically, in expectation. Identifiability in expectation, or 'theoretical identifiability,' generally assumes asymptotic properties, infinite data, or other mathematically idealized conditions. In this paper, we explore a fundamental distinction between this theoretical, idealized notion of identifiability and a proposed alternative that is computation-bound. The framework we propose - 'computational identifiability' - is to instead define a finite computational search procedure for an empirical estimator. If this process finds an estimator empirically, within a desired error tolerance, then identifiability is satisfied, conditional on the specified assumptions of the search (i.e., a prior distribution over the parameters) and conditional on the search procedure itself. Through several experiments, we demonstrate how this framework allows us to answer fine-grained, practical identification questions, such as identification with small finite samples, with ambiguous graphical criteria, with mixed observational-interventional data, and across counterfactual data and estimands. Code is available at https://github.com/lbynum/metadentify.

2606.19366 2026-06-19 cs.LG cs.AI eess.SP 新提交

Information Lattice Learning as Probabilistic Graphical Model Structure Learning

信息格学习作为概率图模型结构学习

Haizi Yu, Lav R. Varshney

发表机构 * Kocree, Inc.(Kocree公司) AI Innovation Institute, Stony Brook University(石溪大学人工智能创新研究所)

AI总结 将信息格学习(ILL)解释为概率图模型结构学习,通过投影到分区格上学习可解释规则,并建立与最大熵和因子图的联系。

详情
AI中文摘要

信息格学习(ILL)通过将信号交替投影到编码抽象层次结构的分区格上,并将选定的规则提升回信号域,来学习信号的可解释规则。当信号是概率质量函数时,我们证明ILL学习的概率规则具有自然的概率图模型(PGM)解释,并详细发展了这一解释。ILL中的分区诱导出一个确定性的商变量,规则是该商变量的边际分布。因此,规则集是可解释抽象上的边际约束集合。一般提升是满足这些约束的所有联合分布的可行族,而特殊提升则选择最大无知重建,在ILL中通过L2均匀性原理实现,该原理与最大熵密切相关。在香农熵提升下,相同的约束产生一个对数线性因子图,其因子由学习的抽象索引。然而,信息格本身不是贝叶斯网络:其边编码抽象的细化与粗化,而非条件依赖。因此,ILL最好被视为商变量上可解释的基于约束的因子图的结构学习。这一观点阐明了ILL如何与图模型和最大熵模型相关,同时为推理、可识别性和混合符号-概率学习提出了新方向。

英文摘要

Information lattice learning (ILL) learns interpretable rules of a signal by alternately projecting the signal onto a partition lattice that encodes a hierarchy of abstractions and lifting selected rules back to the signal domain. When the signal is a probability mass function, we show the probabilistic rules learned by ILL admit a natural probabilistic graphical model (PGM) interpretation and develop this interpretation in detail. A partition in ILL induces a deterministic quotient variable, and a rule is the marginal law of that quotient variable. A rule set is therefore a collection of marginal constraints over interpretable abstractions. General lifting is the feasible family of all joint distributions satisfying those constraints, while special lifting chooses a maximum-ignorance reconstruction, implemented in ILL by an L2 uniformity principle closely related to maximum entropy. Under a Shannon-entropy lifting, the same constraints yield a log-linear factor graph whose factors are indexed by learned abstractions. The information lattice itself, however, is not a Bayesian network: its edges encode refinement and coarsening of abstractions, not conditional dependence. Thus ILL is best viewed as structure learning for interpretable constraint-based factor graphs over quotient variables. This view clarifies how ILL relates to graphical models and maximum entropy models, while suggesting new directions for inference, identifiability, and hybrid symbolic-probabilistic learning.
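
The projection and lifting steps for a single partition can be written out in a few lines. This sketch assumes one partition of a four-element domain and uses the fact that, for a single partition, the maximum-entropy distribution consistent with the cell marginals is uniform within each cell; the function names `project` and `max_entropy_lift` are illustrative and not taken from the ILL codebase.

```python
import numpy as np

def project(pmf, partition):
    """Rule: the marginal law of the quotient variable induced by the partition."""
    return np.array([pmf[list(cell)].sum() for cell in partition])

def max_entropy_lift(rule, partition, size):
    """Maximum-ignorance reconstruction: spread each cell's mass uniformly."""
    lifted = np.zeros(size)
    for mass, cell in zip(rule, partition):
        lifted[list(cell)] = mass / len(cell)
    return lifted

pmf = np.array([0.1, 0.2, 0.3, 0.4])
partition = [[0, 1], [2, 3]]  # an abstraction that merges outcomes pairwise

rule = project(pmf, partition)
lifted = max_entropy_lift(rule, partition, len(pmf))
print(rule, lifted)
```

The lifted distribution satisfies the same marginal constraint as the original signal but forgets everything the abstraction does not record, which is why a rule behaves like a constraint (a factor) rather than a conditional dependence.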

2606.19367 2026-06-19 cs.LG 新提交

Weibull Weight-Scale Parameter Evolution under AdamW Training Dynamics

Weibull 权重尺度参数在 AdamW 训练动态下的演化

Tiexin Ding

发表机构 * Independent Researcher(独立研究员)

AI总结 研究 AdamW 训练中 Weibull 权重尺度参数 λ 增长、过冲和松弛的原因,推导出三种力(对齐、注入、衰减)的分解,并在 Pythia-70M 模型上验证对齐力主导上升阶段,贡献 88-94%。

Comments 21 pages, 14 figures

详情
AI中文摘要

基于用于诊断 Transformer 权重分布的双参数 Weibull 框架,我们研究了为什么在 AdamW 训练期间 Weibull 权重尺度参数 λ 会增长、过冲然后松弛。我们从 AdamW 更新中推导出平方权重范数的主导阶三力分解:一个对齐力,测量权重与自适应更新方向之间的相关性;一个注入力,来自自适应步长幅度;以及一个衰减力,来自解耦的权重衰减。在具有真实优化器矩的自训练 Pythia-70M 模型上,对齐力主导上升阶段,在四个随机种子中贡献了绝对力预算的 88-94%,并且对超权重移除具有鲁棒性。接近饱和时,对齐力和衰减力趋于平衡,解释了从权重尺度增长到松弛的转变。这些力动态直接控制 λ(t) 背后的平方范数分量;剩余的 RMS 到 Weibull 重建偏移是可测量的,并分解为桥接分量和积分分量,在密集采样区域总计约 5-6%。为了将分析扩展到无法获得优化器矩的真实模型,我们引入了一种样条位移方法,该方法从稀疏检查点以约 92-94% 的准确率恢复对齐力,大约是朴素两点基线的两倍。我们进一步观察到,在我们的实验中,λ(t) 的峰值随训练数据一致性而变化,这表明权重尺度增长存在数据依赖成分,我们将其留待后续对照研究。代码和数据可在 https://github.com/tiexinding/NPM-Weibull-public 获取。

英文摘要

Building on a two-parameter Weibull framework for diagnosing transformer weight distributions, we study why the Weibull weight-scale parameter $\lambda$ grows, overshoots, and then relaxes during AdamW training. We derive a leading-order three-force decomposition of the squared weight norm from the AdamW update: an alignment force measuring the correlation between weights and the adaptive update direction, an injection force from adaptive step magnitude, and a decay force from decoupled weight decay. On self-trained Pythia-70M models with ground-truth optimizer moments, alignment dominates the rise phase, contributing 88-94% of the absolute force budget across four random seeds and remaining robust to super-weight removal. Near saturation, alignment and decay approach balance, explaining the transition from weight-scale growth to relaxation. These force dynamics directly govern the squared-norm component underlying $\lambda(t)$; the remaining RMS-to-Weibull reconstruction offset is measurable and decomposes into bridge and integration components, totaling approximately 5-6% in densely sampled regions. To extend the analysis to real models where optimizer moments are unavailable, we introduce a spline displacement method that recovers the alignment force from sparse checkpoints with approximately 92-94% accuracy, about twice the naive two-point baseline. We further observe that the peak value of $\lambda(t)$ varies with training-data coherence in our experiments, suggesting a data-dependent component of weight-scale growth that we leave to a controlled follow-up study. Code and data are available at https://github.com/tiexinding/NPM-Weibull-public.
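
A three-term split of the squared-norm change follows directly from expanding one decoupled-weight-decay step, w_next = (1 - lr * wd) * w - lr * u, where u is the adaptive update direction. The sketch below checks that expansion numerically. Mapping the three terms to the names "alignment", "injection", and "decay" is an assumption based on the abstract's descriptions; the paper's sign conventions and normalization may differ, and u is random here rather than computed from Adam moments.

```python
import numpy as np

def norm_change_terms(w, u, lr, wd):
    """Leading-order contributions to ||w_next||^2 - ||w||^2 for one step."""
    alignment = -2.0 * lr * np.dot(w, u)    # correlation of weights with update
    injection = lr ** 2 * np.dot(u, u)      # magnitude of the adaptive step
    decay = -2.0 * lr * wd * np.dot(w, w)   # decoupled weight decay
    return alignment, injection, decay

rng = np.random.default_rng(0)
w = rng.normal(size=1000)
u = rng.normal(size=1000)  # stand-in for m_hat / (sqrt(v_hat) + eps)
lr, wd = 1e-3, 0.1

w_next = (1.0 - lr * wd) * w - lr * u
exact_change = np.dot(w_next, w_next) - np.dot(w, w)

alignment, injection, decay = norm_change_terms(w, u, lr, wd)
# Terms dropped at leading order (both carry a factor lr^2 * wd).
remainder = lr ** 2 * wd ** 2 * np.dot(w, w) + 2.0 * lr ** 2 * wd * np.dot(w, u)

print(exact_change, alignment + injection + decay, remainder)
```

With typical learning rates and weight-decay coefficients the remainder is orders of magnitude smaller than the three named terms, which is what justifies calling the decomposition leading-order.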

2606.19369 2026-06-19 cs.LG cs.AI 新提交

Zero-Inflated Gaussian Distributions Enable Parameter-Space Sparsity in Estimation-of-Distribution Algorithms

零膨胀高斯分布使估计分布算法中的参数空间稀疏化

Andreas Faust, Sven Nitzsche, Juergen Becker

发表机构 * University of Freiburg(弗莱堡大学) FZI Research Center for Information Technology(FZI信息技术研究中心) Karlsruhe Institute of Technology(卡尔斯鲁厄理工学院)

AI总结 提出多元零膨胀高斯分布作为估计分布算法的采样分布,联合优化稀疏模式和活跃参数,无需手工设计稀疏算子,在Lunar Lander基准上收敛更快且最终回报更高。

详情
AI中文摘要

估计分布算法(EDA)是一类强大的黑箱优化进化方法,尤其当目标函数结构未知时。经典进化算法依赖于手工设计的变异和交叉算子,这些算子难以针对未知问题结构设计,且是偏差的来源,而EDA完全绕过了算子设计:它们将概率分布拟合到最佳个体,并从中采样下一代。EDA在连续参数空间上已得到充分确立,但此前尚未推广到稀疏空间——其中良好解的大多数系数恰好为零。现有的稀疏黑箱优化器因此重新引入了EDA旨在避免的东西:手工制作的稀疏算子、支持集与活跃值交替的双层方案、零阈值以及其他内置假设。我们通过提出多元零膨胀高斯(ZIG)分布作为EDA采样法则来填补这一空白。一个具有独立指示维度和值维度的潜在高斯模型表示稀疏模式、活跃参数之间的相关性以及两者之间的相互作用,因此稀疏模式和活跃值被联合优化,无需层次结构。我们证明该模型的潜在参数可以从观测样本中识别,不同于相关构造起源的缺失数据设置,并引入了实用的基于摊销反演的估计器。这些估计器准确恢复潜在相关结构,在Lunar Lander基准上,由此产生的ZIG-EDA比稠密高斯EDA、手工制作的稀疏进化算法和特设稀疏EDA收敛更快且最终回报更高,同时找到的控制器只有一小部分参数活跃。

英文摘要

Estimation-of-distribution algorithms (EDAs) are a powerful class of evolutionary methods for black-box optimization, especially when little is known about the structure of the objective. Whereas classical evolutionary algorithms rely on hand-designed mutation and crossover operators, hard to devise for unknown problem structures, and a source of bias, EDAs sidestep operator design entirely: they fit a probability distribution to the best individuals and sample the next generation from it. EDAs are well established on continuous parameter spaces, but they have not previously been generalized to sparse ones, in which most coefficients of a good solution are exactly zero. Existing sparse black-box optimizers therefore reintroduce exactly what EDAs were designed to avoid: hand-crafted sparsity operators, bi-level schemes alternating between support set and active values, zeroing thresholds, and other baked-in assumptions. We close this gap by proposing multivariate zero-inflated Gaussian (ZIG) distributions as EDA sampling laws. A latent Gaussian model with separate indicator and value dimensions represents sparsity patterns, correlations among active parameters, and the interactions between the two, so sparsity patterns and active values are optimized jointly, hierarchy-free. We show that the latent parameters of this model are identifiable from observed samples, unlike in the missing-data settings where related constructions originate, and introduce practical amortized inversion-based estimators for them. The estimators accurately recover latent correlation structures, and on the Lunar Lander benchmark the resulting ZIG-EDA converges faster and reaches higher final returns than a dense Gaussian EDA, a hand-crafted sparse evolutionary algorithm, and an ad-hoc sparse EDA, while finding controllers with only a small fraction of parameters active.
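
Sampling from a latent-Gaussian zero-inflated model can be sketched as follows. The construction assumes that a parameter is active when its latent indicator coordinate is positive; that thresholding rule, the block layout of the latent vector, and the function name `sample_zig` are assumptions for illustration and may not match the paper's parameterization.

```python
import numpy as np

def sample_zig(mean, cov, n, rng):
    """Draw n sparse vectors from a latent Gaussian over (indicators, values).

    The first half of the latent vector gates each parameter; the second half
    holds the value used when the gate is open. Off-diagonal covariance
    entries couple sparsity patterns with active values.
    """
    dim = len(mean) // 2
    latent = rng.multivariate_normal(mean, cov, size=n)
    active = latent[:, :dim] > 0.0
    values = latent[:, dim:]
    return np.where(active, values, 0.0)

rng = np.random.default_rng(0)
# Indicator means first, then value means: parameter 0 is almost never
# active, parameter 1 almost always is.
mean = np.array([-10.0, 10.0, 1.0, 2.0])
cov = np.eye(4)

samples = sample_zig(mean, cov, 1000, rng)
print(samples[:3], (samples == 0.0).mean(axis=0))
```

An EDA built on this distribution would refit `mean` and `cov` to the best-performing samples each generation, so sparsity patterns and active values evolve together.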

2606.19491 2026-06-19 cs.LG stat.ML 新提交

Algebraic Dead Directions in LayerNorm Transformers: A Forward-Pass-Only Diagnostic at LLM Scale

LayerNorm Transformer 中的代数死方向:一种仅需前向传播的大语言模型规模诊断方法

Tejas Pradeep Shirodkar, P. J. Narayanan

发表机构 * IIIT, Hyderabad(海得拉巴国际信息技术学院)

AI总结 本文发现 LayerNorm 的逆尺度方向是最终归一化层之后中心化激活协方差矩阵的精确代数核,可仅从参数中读取死方向,无需前向或后向传播,并在 14 个预训练模型上验证了其有效性。

Comments 34 pages, 7 figures, 6 tables. Empirical companion to arXiv:2606.05957

详情
AI中文摘要

预训练 Transformer 位于损失函数的奇异极小值附近,此时 Fisher 信息度量沿死方向退化:参数空间中方向性 Fisher 为零的方向。通常定位这样的方向需要一次前向传播和激活矩阵的特征分解,或基于采样的复杂度估计;这些方法都无法给出仅凭网络参数即可计算的方向。我们针对 LayerNorm Transformer 给出了一个这样的方向。LayerNorm 仿射的逆尺度方向 $\gamma^{-1}/\|\gamma^{-1}\|$ 是最终归一化层之后中心化激活协方差矩阵的精确代数核,适用于任何输入分布,并在参数空间中诱导出相应的死方向。它仅从 LN 尺度参数读取,无需前向或后向传播,无需特征分解:这是针对 LayerNorm 的最廉价死方向读取方法。我们在 14 个预训练 Transformer(9 个 LayerNorm,5 个 RMSNorm;160M-35B;语言和视觉目标)上进行了测试。在随机初始化时,预测方向与测量的底部奇异方向(一次前向传播,直接 SVD)在 9/9 的 LayerNorm 模型上匹配到小数点后四位,并在 5/5 的 RMSNorm 模型上正确缺失,后者缺乏产生该方向的均值减法投影器。在训练后的检查点上,沿该方向的协方差特征值加深约 $10^3$ 倍,并打开更多死方向;随机初始化到训练后的差距是一次前向传播、每检查点沿预测坐标的奇异结构读出。由此得出两个闭式结论:残差流的最小奇异值在 13/14 个 Transformer 上逐块保持不变(在其自身输入分布上测量),唯一的例外(Gemma4-31B)是一个真正的死方向,同一读出可精确定位;核方向的存在从参数本身即可对 Transformer 的归一化进行分类。

英文摘要

Pretrained transformers sit near singular minima of the loss, where the Fisher information metric degenerates along dead directions: directions in parameter space along which the directional Fisher vanishes. Locating such a direction normally needs a forward pass and an eigendecomposition of activations, or a sampling-based complexity estimate; none returns a direction computable from the network's parameters alone. We give one, for LayerNorm transformers. The inverse-scale direction $\gamma^{-1}/\|\gamma^{-1}\|$ of the LayerNorm affine is an exact algebraic kernel of the post-final-norm centred activation covariance, for any input distribution, and induces a corresponding dead direction in parameter space. It is read from the LN scale parameter alone, with no forward or backward pass and no eigensolve: the cheapest dead-direction read, specific to LayerNorm. We test it on 14 pretrained transformers (9 LayerNorm, 5 RMSNorm; 160M-35B; language and vision objectives). At random initialisation the predicted direction matches the measured bottom singular direction (one forward pass, direct SVD) to four decimal places on 9/9 LayerNorm models, and is correctly absent on 5/5 RMSNorm models, which lack the mean-subtraction projector that creates it. On the trained checkpoint the covariance eigenvalue along this direction deepens by ${\sim}10^3\times$ and further dead directions open; the random-init-to-trained gap is a one-forward-pass, per-checkpoint readout of singular structure along the predicted coordinate. Two consequences follow in closed form: the residual stream's smallest singular value is preserved block-to-block on 13/14 transformers measured on their own input distribution, the one exception (Gemma4-31B) a genuine dead direction the same read pinpoints; and the kernel direction's presence classifies a transformer's normalisation from the parameters alone.
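
The kernel claim can be checked numerically on a standalone LayerNorm. Because the normalized activations sum to zero across features, the output y = gamma * x_hat + beta satisfies sum_i (y_i - beta_i) / gamma_i = 0 for every input, so y has no variance along the inverse-scale direction. The sketch below uses a random gamma, beta, and input distribution chosen purely for illustration; it tests a single LayerNorm in isolation, not a full transformer.

```python
import numpy as np

def layer_norm(x, gamma, beta, eps=1e-5):
    """Standard LayerNorm over the last axis with an affine scale and shift."""
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

rng = np.random.default_rng(0)
dim = 16
gamma = rng.uniform(0.5, 1.5, size=dim)  # nonzero scales so 1 / gamma exists
beta = rng.normal(size=dim)
# Anisotropic inputs, to show the result does not depend on the input law.
x = rng.normal(size=(2000, dim)) * rng.uniform(0.1, 5.0, size=dim)
y = layer_norm(x, gamma, beta)

# Predicted dead direction, read from the scale parameter alone.
v = (1.0 / gamma) / np.linalg.norm(1.0 / gamma)

cov = np.cov(y, rowvar=False)
variance_along_v = float(v @ cov @ v)
largest_eigenvalue = float(np.linalg.eigvalsh(cov)[-1])
print(variance_along_v, largest_eigenvalue)
```

Replacing the mean subtraction with a pure RMS rescaling removes the zero-sum constraint, which is consistent with the abstract's observation that the direction is absent in RMSNorm models.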

2606.19521 2026-06-19 cs.LG math.OC 新提交

Interactive Pareto navigation for deep multi-task learning

深度多任务学习的交互式帕累托导航

Augustina C. Amakor, Konstantin Sonntag, Sebastian Peitz

发表机构 * Department of Computer Science, TU Dortmund, Dortmund, Germany(多特蒙德工业大学计算机科学系,德国多特蒙德) Lamarr Institute for Machine Learning and Artificial Intelligence(拉马尔机器学习和人工智能研究所)

AI总结 提出偏好帕累托探索(PPE)框架,通过预测-校正方法沿帕累托流形切线方向引导偏好,利用Krylov子空间方法避免Hessian计算,实现高效交互式多目标优化。

详情
AI中文摘要

在多任务学习中,处理越来越多的目标在计算资源和决策者选择适当权衡的能力方面都很快变得具有挑战性。因此,一种广泛使用的方法是通过加权和将各个损失聚合到单个损失函数中。这通常由于帕累托前沿的形状而无法捕捉决策者的偏好,或者需要多次调整和计算,这在深度学习应用中变得过于昂贵。为了解决这些问题,我们引入了一个新颖的框架,偏好帕累托探索(PPE),它在交互式探索过程中强制执行决策者的偏好,同时考虑帕累托集的几何形状。PPE基于预测-校正方法,该方法沿着帕累托最优解流形的切线方向执行预测步骤,遵循决策者的偏好。随后的校正步骤产生反映该偏好的新权衡。为了在表征流形切空间时避免显式的Hessian计算,我们采用了一种仅依赖于矩阵-向量乘积的Krylov子空间方法。这些乘积可以通过自动微分高效获得,确保了整个优化过程的效率和鲁棒性。该方法的有效性和性能通过玩具问题和深度学习示例进行了展示。

英文摘要

In multi-task learning, handling an increasing number of objectives can quickly become challenging, both in terms of the computational resources and the decision maker's capacity to choose appropriate trade-offs. A widely used approach is thus to aggregate the individual losses in a single loss function by a weighted sum. This often fails to capture either the decision maker's preferences as a result of the shape of the Pareto front, or requires multiple adjustments and computations which becomes prohibitively expensive in deep learning applications. To address these issues, we introduce a novel framework, Preference Pareto Exploration (PPE), which enforces the decision maker's preferences while accounting for the geometry of the Pareto set in an interactive exploration process. PPE is based on a predictor-corrector method that performs predictor steps tangential to the manifold of Pareto-optimal solutions, following the decision maker's preference. The subsequent corrector step results in a new trade-off reflecting this preference. To avoid explicit Hessian computations when characterizing the tangent space of the manifold, we employ a Krylov subspace method that relies solely on matrix-vector products. These products can be efficiently obtained via automatic differentiation, ensuring both efficiency and robustness throughout the optimization process. The method's functionality and performance are demonstrated using both toy problems and examples from deep learning.

2606.19652 2026-06-19 cs.LG 新提交

Convex training of Lipschitz-regularized shallow neural networks

Lipschitz正则化浅层神经网络的凸训练

Chao Yin, Antoine Lesage-Landry

发表机构 * Polytechnique Montréal, GERAD & Mila, Montréal, QC, Canada(蒙特利尔理工学院,GERAD & Mila,加拿大魁北克省蒙特利尔市)

AI总结 提出一种凸限制方法求解非凸Lipschitz正则化训练问题,可全局最优求解,并作为预训练网络的后处理步骤,提升对抗鲁棒性和准确性。

详情
AI中文摘要

在这项工作中,我们引入了一种针对浅层神经网络的训练程序,该程序能够提升对对抗攻击的鲁棒性。我们通过引入一个凸限制来解决非凸的Lipschitz正则化训练问题,该凸限制可以高效地求解全局最优解。我们的方法可以作为后处理步骤,将预训练网络作为初始解,然后求解凸规划,其最优网络保证不劣于初始网络。我们通过在对抗设置下使用真实世界数据集进行回归任务的实验,展示了我们训练程序的改进。数值结果表明,与现有方法相比,求解我们提出的凸规划得到的网络在Lipschitz正则化程序上具有更低的目标值。此外,我们表明,在某些数据集上,使用我们的凸训练程序获得的网络在对抗攻击下既更准确又更鲁棒。

英文摘要

In this work, we introduce a training procedure for shallow neural networks that promotes robustness against adversarial attacks. We solve a non-convex Lipschitz-regularized training program by introducing a convex restriction that can be efficiently solved to global optimality. Our approach can be employed as a post-processing step by taking a pre-trained network as an initial solution and then solving the convex program, whose optimal network is guaranteed to be no worse than the initial one. We illustrate the improvements of our training procedure with experiments using real-world datasets for regression tasks under an adversarial setting. We show numerically that solving our proposed convex program yields networks with lower objective values on the Lipschitz-regularized program compared to existing methods. Additionally, we show that on certain datasets, networks obtained using our convex training program are both more accurate and more robust with respect to adversarial attacks.

2606.19876 2026-06-19 cs.LG math.OC 新提交

Global Convergence of Gradient Descent for Score Matching in Gaussian Mixtures via Reverse Fisher Divergence

通过反向Fisher散度实现高斯混合模型中得分匹配的梯度下降全局收敛

Alexander Tyurin

AI总结 研究反向Fisher散度下梯度下降拟合高斯混合模型的全局收敛性,证明从任意初始化或随机初始化下学生分量收敛到最近教师分量,并给出全变差距离收敛条件。

详情
AI中文摘要

得分匹配问题是现代生成建模、扩散模型、拟合非归一化统计模型和逆问题中的核心训练目标。标准方法是最小化前向Fisher散度,其中期望相对于教师分布取。然而,最近结果表明,即使在简单的高斯混合模型设置中,该目标也可能导致不良且依赖初始化的收敛行为。本文研究另一种目标:反向Fisher散度,其中期望相对于学生分布取。我们分析梯度下降(GD)拟合高斯混合模型,并表明目标函数的这一改变导致显著更好的优化性质。首先,当教师分布是单个高斯分布且学生是固定权重和单位协方差的高斯混合模型时,我们证明了从任意初始化出发GD的全局收敛性。其次,我们将分析扩展到教师也是高斯混合模型的情况,并在全局随机初始化方案和目标均值满足$\widetilde{\Omega}(1)$-分离假设下证明了全局收敛保证。特别地,以高概率,每个学生分量收敛到其最近的教师分量,并且我们提供了学生分布在全变差距离下收敛的条件。我们的证明依赖于基于Lyapunov的梯度下降动力学新分析,表明反向Fisher散度比前向Fisher散度具有更有利的优化景观。

英文摘要

The score matching problem is a central training objective in modern generative modeling, diffusion models, fitting unnormalized statistical models, and inverse problems. A standard approach is to minimize the forward Fisher divergence, where the expectation is taken with respect to the teacher distribution. However, recent results show that even in simple Gaussian mixture model settings, this objective can lead to undesirable and initialization-dependent convergence behavior. In this paper, we study an alternative objective: the reverse Fisher divergence, where the expectation is taken with respect to the student distribution. We analyze gradient descent (GD) for fitting Gaussian mixture models and show that this change in the objective leads to significantly better optimization properties. First, when the teacher distribution is a single Gaussian and the student is a Gaussian mixture model with fixed weights and identity covariances, we prove the global convergence of GD from arbitrary initializations. Second, we extend the analysis to the case where the teacher is also a Gaussian mixture model and prove global convergence guarantees under a global random initialization scheme and a $\widetilde{\Omega}(1)$-separation assumption on the target means. In particular, with high probability, each student component converges near its closest teacher component, and we provide conditions under which the student distribution converges in total variation distance. Our proofs rely on a new Lyapunov-based analysis of the gradient descent dynamics, showing that the reverse Fisher divergence has a much more favorable optimization landscape than the forward Fisher divergence.

2606.19878 2026-06-19 cs.LG math.OC stat.ML 新提交

On the Oracle Complexity of Interpolation-Based Gradient Descent

基于插值的梯度下降的预言复杂度

Dongmin Lee, William Lu, Anuran Makur

发表机构 * Purdue University(普渡大学)

AI总结 提出分段多项式插值梯度下降(PPI-GD)方法,通过数据域等距点查询一阶预言构造多项式插值近似全梯度,在强凸和非凸损失下分析预言复杂度,证明在数据维数受限且损失足够光滑时优于多种GD变体。

Comments 16 pages, 2 figures

详情
AI中文摘要

最近关于经验风险最小化(ERM)的一阶优化器的工作表明,可以利用ERM损失函数在训练数据中的光滑性(而非优化参数中的光滑性)来改进梯度下降(GD)方法的预言复杂度。在本文中,我们提出了一种不精确梯度方法——分段多项式插值梯度下降(PPI-GD),该方法通过在数据域中的等距点处查询一阶预言来近似每次迭代中的全梯度,从而在数据域的适当大小的块上构造所得梯度样本的多项式插值。我们分析了PPI-GD在强凸和非凸损失函数下的预言复杂度,其中数据空间维数以训练样本数量的多对数函数为界,并发现当损失函数足够光滑时,PPI-GD在关键区域优于几种GD变体。此外,我们的分析将双三次样条插值误差分析中的几种技术扩展到$d$变量张量积多项式插值的设置中,这可能对插值分析具有独立意义。

英文摘要

Recent work on first-order optimizers for empirical risk minimization (ERM) has suggested that smoothness of ERM loss functions in the training data, rather than in the optimization parameters, can be leveraged to improve the oracle complexity of gradient descent (GD) methods. In this paper, we propose an inexact gradient method, piecewise polynomial interpolation-based gradient descent (PPI-GD), which approximates the full gradient in each iteration by querying the first-order oracle at equidistant points in the data domain to construct polynomial interpolants of the resulting gradient samples over appropriately sized patches of the data domain. We analyze the oracle complexity of PPI-GD for strongly convex and non-convex loss functions when the data space dimension is bounded by a polylogarithmic function of the number of training samples, and find it to outperform several GD variants in key regimes when the loss function is sufficiently smooth. Furthermore, our analysis extends several techniques from the error analysis of bicubic spline interpolants to the setting of $d$-variate tensor product polynomial interpolants which may be of independent interest in interpolation analysis.
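
The core idea of the abstract, replacing n per-sample oracle calls with a few calls at equidistant points plus interpolation, can be sketched in one dimension. This is a simplified illustration and not the authors' PPI-GD: it assumes scalar data in [0, 1], a logistic-type per-sample loss, and piecewise-linear (degree-1) interpolants on each patch. The function names are hypothetical.

```python
import math

def sample_grad(w, x):
    """First-order oracle: gradient in w of the per-sample loss log(1 + exp(w * x))."""
    return x / (1.0 + math.exp(-w * x))

def interpolated_full_gradient(w, data, num_nodes):
    """Approximate the average gradient with only num_nodes oracle calls on [0, 1]."""
    h = 1.0 / (num_nodes - 1)
    # Query the oracle at equidistant points of the data domain.
    node_grads = [sample_grad(w, k * h) for k in range(num_nodes)]
    total = 0.0
    for x in data:
        k = min(int(x / h), num_nodes - 2)   # index of the patch containing x
        t = (x - k * h) / h                  # relative position inside the patch
        # Evaluate the piecewise-linear interpolant instead of calling the oracle.
        total += (1.0 - t) * node_grads[k] + t * node_grads[k + 1]
    return total / len(data)

n = 10_000
data = [(i + 0.5) / n for i in range(n)]     # synthetic 1-D training data in [0, 1]
w = 2.0
exact = sum(sample_grad(w, x) for x in data) / n            # n oracle calls
approx = interpolated_full_gradient(w, data, num_nodes=33)  # 33 oracle calls
```

Because the per-sample gradient is smooth in x, the interpolation error shrinks with the patch width, which is the property the oracle-complexity analysis relies on. Higher-degree interpolants would exploit additional smoothness.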

2606.19891 2026-06-19 cs.LG 新提交

Adversarial Bandit Optimization with Globally Bounded Perturbations to Convex Losses

具有全局有界扰动的凸损失对抗性赌博机优化

Zhuoyu Cheng, Kohei Hatano, Eiji Takimoto

发表机构 * Department of Informatics, Kyushu University(九州大学信息学系) RIKEN AIP(理化学研究所革新智能综合研究中心)

AI总结 研究损失函数可能非凸非光滑的对抗性赌博机优化,提出一种修改的赌博机优化算法,并分析扰动预算对遗憾的影响,将线性损失下的全局预算后行动扰动模型扩展到一般凸且β-光滑损失。

详情
AI中文摘要

我们研究对抗性赌博机优化,其中损失函数可能非凸且非光滑。在每一轮中,学习者选择一个动作并仅观察该动作产生的损失。损失由一个潜在的凸且β-光滑分量和一个对抗性扰动组成,该扰动可能在观察学习者的动作后选择。扰动受全局预算约束,控制其随时间累积的幅度。该框架将全局预算的后行动扰动模型从线性损失扩展到一般凸且β-光滑损失。对于这个更广泛的类别,我们建立了期望遗憾保证,明确刻画了扰动预算的影响。为了建立这些保证,我们修改了一个标准的赌博机优化算法,并开发了一种分析来控制由扰动引起的额外遗憾。在没有扰动的情况下,我们的结果退化为具有β-光滑损失的标准赌博机凸优化设置的遗憾保证。

英文摘要

We study adversarial bandit optimization in which the loss functions may be non-convex and non-smooth. In each round, the learner selects an action and observes only the loss incurred at that action. The loss consists of an underlying convex and $\beta$-smooth component and an adversarial perturbation that may be chosen after observing the learner's action. The perturbations are subject to a global budget controlling their cumulative magnitude over time. This framework extends the globally budgeted, post-action perturbation model from underlying linear losses to general convex and $\beta$-smooth losses. For this broader class, we establish expected regret guarantees that explicitly characterize the effect of the perturbation budget. To establish these guarantees, we modify a standard bandit optimization algorithm and develop an analysis that controls the additional regret caused by the perturbations. In the absence of perturbations, our results reduce to regret guarantees for the standard bandit convex optimization setting with $\beta$-smooth losses.

2606.20075 2026-06-19 cs.LG cs.CL 新提交

What Makes Effective Supervision in Latent Chain-of-Thought: An Information-Theoretic Analysis

什么使得潜在思维链中的监督有效:一种信息论分析

Xinghao Chen, Chak Tou Leong, Wenjin Guo, Jian Wang, Wenjie Li, Xiaoyu Shen

发表机构 * Ningbo Institute of Digital Twin, Eastern Institute of Technology(宁波数字孪生研究院,东方理工大学) Department of Computing, The Hong Kong Polytechnic University(香港理工大学计算学系)

AI总结 本文从信息论角度分析潜在思维链中的监督失效问题,提出轨迹监督和空间监督两个维度,并引入统一潜在探针(ULP)量化信息保真度,揭示了信息-性能绑定关系。

详情
AI中文摘要

潜在思维链(Latent Chain-of-Thought, CoT)将推理内化到连续隐藏状态中,为冗长的离散推理轨迹提供了一种有前景的替代方案。然而,鲁棒的潜在推理仍然困难,因为结果监督提供的学习信号较弱,且容易导致潜在轨迹发生语义漂移。在这项工作中,我们从信息论角度分析潜在CoT,并将这种失效识别为双重崩溃:优化路径上的梯度衰减和潜在空间中的表征漂移。我们进一步将过程监督分解为两个互补维度:轨迹监督(注入密集的逐步推理信号)和空间监督(保持潜在流形的语义结构)。我们的分析表明,刚性几何压缩可能坍缩推理空间,而生成式重建提供了更灵活的语义锚点,更好地保留了信息容量。为了衡量这些效应,我们引入了统一潜在探针(Unified Latent Probe, ULP),用于量化潜在轨迹与显式推理步骤之间的互信息。实验揭示了清晰的信息-性能绑定关系:推理准确性取决于潜在链中保留的信息保真度。这些发现为潜在推理监督提供了一个原则性框架,并建议从几何模仿转向互信息最大化。我们的代码可在 https://github.com/EIT-NLP/Supervision-in-Latent-CoT 获取。

英文摘要

Latent Chain-of-Thought (CoT) internalizes reasoning within continuous hidden states, offering a promising alternative to verbose discrete reasoning traces. However, robust latent reasoning remains difficult because outcome supervision provides weak learning signals and leaves latent trajectories prone to semantic drift. In this work, we analyze Latent CoT from an information-theoretic perspective and identify this failure as a dual collapse: gradient attenuation along the optimization path and representational drift in the latent space. We further decompose process supervision into two complementary dimensions: Trajectory Supervision, which injects dense stepwise reasoning signals, and Space Supervision, which preserves the semantic structure of the latent manifold. Our analysis shows that rigid geometric compression can collapse the reasoning space, whereas generative reconstruction provides a more flexible semantic anchor that better preserves information capacity. To measure these effects, we introduce the Unified Latent Probe (ULP), which quantifies the mutual information between latent trajectories and explicit reasoning steps. Experiments reveal a clear Information-Performance Binding: reasoning accuracy depends on the information fidelity preserved in the latent chain. These findings provide a principled framework for latent reasoning supervision and suggest shifting from geometric imitation toward mutual information maximization. Our code is available at https://github.com/EIT-NLP/Supervision-in-Latent-CoT.

2606.20183 2026-06-19 cs.LG 新提交

Effective Dimension Governs Generalization in Quantum Kernel Vision Models

有效维度主导量子核视觉模型的泛化

Jian Xu, Delu Zeng, John Paisley, Qibin Zhao

AI总结 通过有效维度d_eff解释量子视觉模型中纠缠结构增强泛化与量子噪声提升测试精度的现象,提出噪声形状核的谱分解与正则化机制。

详情
AI中文摘要

最近的量子视觉模型(量子视觉变换器和量子卷积网络)报告了两个引人注目但尚未解释的经验现象:(i) 具有更多或更均匀分布纠缠的拟设泛化更好,以及(ii) 注入量子噪声可以提高测试精度而不是降低它。这些观察目前被视为奇特现象,通过网格搜索发现,即便有解释,也只是人工给出的。我们表明,两者都是一个单一可测量量的表现:即(噪声形状的)量子特征核的有效维度$d_{\rm eff}$。主要使用量子核视觉模型(由核分类器读出的量子特征映射),我们给出了一个谱解释,其中纠缠结构和量子噪声是调节$d_{\rm eff}$的两个旋钮;在过拟合区域,收缩$d_{\rm eff}$起到类似岭正则化的作用。我们分析了机制:退极化核$K_p=(1-p)^2K+\tfrac{p(2-p)}{D}\mathbf{1}\mathbf{1}^\top$的精确分解,其中$d_{\rm eff}(K_p)\to1$,振幅阻尼的收缩结果(及其边界),核机器容量界,以及容量/对齐风险分解;在我们的纠缠实验中起作用的单调收缩是经验验证的,并非普遍证明。沿着单参数退极化族,坍缩反而是通过构造精确的;我们仅用它来确认核分解到机器精度,最多达12个量子比特,而不是作为$d_{\rm eff}$的证据。振幅阻尼收缩$d_{\rm eff}$并沿倒U型最佳点将测试精度提升高达+13%;效应符号在过拟合和欠拟合区域之间翻转;噪声注入匹配显式谱过滤前沿。我们的结果将两个报告的现象组织成一个单一可测量原则,用于设计量子视觉模型。

英文摘要

Recent quantum vision models (quantum vision transformers and quantum convolutional networks) report two striking but unexplained empirical phenomena: (i) ansatze with more, or more uniformly distributed, entanglement generalize better, and (ii) injecting quantum noise can improve test accuracy rather than degrade it. These observations are currently treated as curiosities, discovered by grid search and explained, if at all, by hand. We show that both are manifestations of a single, measurable quantity: the effective dimension $d_{\rm eff}$ of the (noise-shaped) quantum feature kernel. Working primarily with quantum-kernel vision models (a quantum feature map read out by a kernel classifier), we give a spectral account in which entanglement structure and quantum noise are two knobs that move $d_{\rm eff}$; in an overfitting regime, contracting $d_{\rm eff}$ acts as ridge-like regularization. We analyze the mechanism: an exact decomposition of the depolarized kernel $K_p=(1-p)^2K+\tfrac{p(2-p)}{D}\mathbf{1}\mathbf{1}^\top$ with $d_{\rm eff}(K_p)\to1$, a contraction result (and its boundary) for amplitude damping, a kernel-machine capacity bound, and a capacity/alignment risk decomposition; the monotone contraction operative in our entangled experiments is verified empirically, not proven in general. Along the one-parameter depolarizing family the collapse is instead exact by construction; we use it only to confirm the kernel decomposition to machine precision and at up to $12$ qubits, not as evidence for $d_{\rm eff}$. Amplitude damping contracts $d_{\rm eff}$ and lifts test accuracy by up to $+13\%$ along an inverted-U sweet spot; the effect's sign flips between the over- and under-fitting regimes; noise injection matches an explicit spectral-filtering frontier. Our results organize two reported anecdotes into a single measurable principle for designing quantum-vision models.
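
The depolarized-kernel formula quoted in the abstract can be checked numerically on a toy example. The abstract does not define $d_{\rm eff}$, so this sketch assumes the participation ratio $(\mathrm{tr}\,K)^2/\mathrm{tr}(K^2)$ as a stand-in; the kernel matrix, the value of $D$, and the function names are also illustrative assumptions.

```python
def depolarized_kernel(K, p, D):
    """K_p = (1 - p)^2 * K + p * (2 - p) / D * (all-ones matrix)."""
    n = len(K)
    shift = p * (2.0 - p) / D
    return [[(1.0 - p) ** 2 * K[i][j] + shift for j in range(n)] for i in range(n)]

def participation_ratio(K):
    """(tr K)^2 / tr(K^2); for symmetric K, tr(K^2) is the sum of squared entries.
    Assumed stand-in for the effective dimension, not the paper's definition."""
    trace = sum(K[i][i] for i in range(len(K)))
    frob_sq = sum(v * v for row in K for v in row)
    return trace ** 2 / frob_sq

# Toy kernel: three mutually orthogonal inputs (identity Gram matrix), with D = 4.
K = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
for p in (0.0, 0.5, 0.9, 1.0):
    print(p, participation_ratio(depolarized_kernel(K, p, D=4)))
```

At p = 1 the kernel reduces to the rank-one matrix $\tfrac{1}{D}\mathbf{1}\mathbf{1}^\top$, so under this stand-in definition the effective dimension collapses to 1, consistent with the limit $d_{\rm eff}(K_p)\to1$ stated above.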

2606.20325 2026-06-19 cs.LG cs.SC math.DS 新提交

Recurrent neural networks approximate continuous functions

递归神经网络近似连续函数

Valentin Abadie, Clemens Hutter, Helmut Bölcskei

AI总结 本文证明,对于[-1,1]上的任意连续函数,存在一个固定权重和隐藏维度的ReLU递归神经网络,其时间演化可以均匀逼近该函数,并给出了收敛速率和极小极大下界。

详情
AI中文摘要

经典逼近定理要求每当目标精度提高时,就需要一个新的神经网络。本文研究相反的可能性:能否一劳永逸地选择网络,而仅通过让其运行更长时间来换取精度?我们证明这对于[-1,1]上的每个连续函数都是可能的。更准确地说,每个这样的函数都可以通过一个具有固定权重和固定隐藏维度的单ReLU递归神经网络的时间演化来均匀逼近。该构造背后的机制是一个新的中间模型——带神经单元的图灵机(TMNU)。该模型保留了实现多项式逼近方案所需的算法自由度,同时保持足够的刚性,以便被具有显式隐藏维度和权重幅度界限的RNN模拟。由此产生的收敛速率反映了底层多项式逼近的速率。我们通过极小极大下界补充了该构造,表明运行时间不仅仅是证明的产物,而是这种固定网络逼近范式中不可避免的资源。

英文摘要

Classical approximation theorems ask for a new neural network whenever the target accuracy is improved. This paper studies the opposite possibility: can the network be chosen once and for all, and can accuracy be bought only by letting it run longer? We prove that this is possible for every continuous function on [-1,1]. More precisely, each such function is uniformly approximated by the time evolution of a single ReLU recurrent neural network with fixed weights and fixed hidden dimension. The mechanism behind the construction is a new intermediate model, the Turing machine with neural units (TMNU). This model retains the algorithmic freedom needed to implement polynomial approximation schemes, while remaining rigid enough to be simulated by RNNs with explicit bounds on hidden dimension and weight magnitude. The resulting convergence rates reflect the underlying polynomial approximation rates. We complement the construction with minimax lower bounds showing that runtime is not merely a proof artifact, but an unavoidable resource in this fixed-network approximation paradigm.

2606.20357 2026-06-19 cs.LG 新提交

On the Variance of Temporal Difference Learning and its Reduction Using Control Variates

时序差分学习的方差及其通过控制变量的降低

Hsiao-Ru Pan, Bernhard Schölkopf

AI总结 本文在表格表示的分阶段(phased)设置下分析时序差分学习的方差,证明其方差降低机制之一是有效聚合更多独立轨迹,并比较了TD、MC和DAE的方差界限。

Comments Accepted at RLC2026

详情
AI中文摘要

我们在采用表格表示的分阶段(phased)设置下分析了时序差分(TD)学习的方差,并表明其降低方差能力背后的机制之一,是有效地聚合了更多的独立轨迹。基于这一见解,我们证明(1)TD的方差渐近地以蒙特卡洛(MC)估计器的方差为上界,以及(2)在样本数量固定时,视界(horizon)较短的更新方差更小。除了TD,我们还表明直接优势估计(DAE)这种估计优势函数的方法可以被视为一种回归调整的控制变量,它在大样本极限下实现了比TD更紧的方差界。最后,我们通过精心设计的环境,以数值方式展示了这些估计器的行为。

英文摘要

We analyze the variance of temporal difference (TD) learning using the phased setting with tabular representation, and show that one of the mechanisms behind its ability to reduce variance is effective aggregation over a larger number of independent trajectories. Based on this insight, we demonstrate that (1) the variance of TD is asymptotically bounded from above by that of Monte Carlo (MC) estimators, and (2) shorter-horizon updates incur less variance for a fixed number of samples. Beyond TD, we show that Direct Advantage Estimation (DAE), a method for estimating the advantage function, can be seen as a type of regression-adjusted control variate, which achieves a tighter bound on the variance compared to TD in the large-sample limit. Finally, we numerically illustrate the behaviors of these estimators with carefully designed environments.

2606.20469 2026-06-19 cs.LG cs.CG 新提交

Fisher-Geometric Sharpness and the Implicit Bias of SGD toward Flat Minima

Fisher-几何锐度与SGD对平坦极小值的隐式偏好

Md Sakir Ahmed, Kumaresh Sarmah, Hemen Dutta

发表机构 * Gauhati University(高哈蒂大学)

AI总结 针对SGD偏好平坦极小值但欧氏锐度不具重参数化不变性的问题,提出基于Fisher信息矩阵的黎曼锐度,证明其不变性,并导出SGD稳态分布集中于平坦极小值,PAC-Bayes界联系泛化性能。

Comments 18 pages, 5 figures, preprint

详情
AI中文摘要

深度学习中的一个广泛直觉是随机梯度下降(SGD)隐式偏好平坦极小值,且平坦极小值泛化更好,但损失Hessian的迹或最大特征值等标准欧氏平坦度度量在保持网络函数的重参数化下并非不变,这削弱了这一叙事的理论基础。在本研究中,我们通过将平坦度建立在由Fisher信息矩阵(FIM)诱导的统计流形的黎曼几何上,解决了这一问题。我们在数学上定义了黎曼锐度,并证明它在光滑、保函数的重参数化下是不变的,这直接回应了Dinh等人在论文“Sharp minima can generalize for deep nets”中的批评。我们注意到这种不变性是真实FIM的一个性质;实践中使用的对角经验估计量(以及下面所有实验中的)仅近似继承不变性,而在任意重参数化下的精确不变性需要结构化估计量如K-FAC。我们将小批量SGD的梯度噪声形式化为具有与FIM成比例的协方差结构,推导出所得随机微分方程的稳态分布,然后证明概率质量指数级集中在黎曼平坦极小值处。一个由SR显式控制的PAC-Bayes泛化界正式地将这种几何偏差与测试性能联系起来。我们在MNIST和CIFAR-10上的实验证实,SR以欧氏锐度无法做到的方式可靠地跟踪泛化,并且其随$\eta/B$的缩放与理论预测相匹配。这些结果共同提供了一个严格的、重参数化不变的解释,说明为什么平坦极小值能泛化。

英文摘要

A widely held intuition in deep learning is that stochastic gradient descent (SGD) implicitly favors flat minima and that flat minima generalize better, but standard Euclidean measures of flatness such as the trace or maximum eigenvalue of the loss Hessian are not invariant under reparametrizations that preserve the network function, which undermines the theoretical foundations of this narrative. In this study we resolve this issue by grounding flatness in the Riemannian geometry of the statistical manifold induced by the Fisher Information Matrix (FIM). We define Riemannian sharpness mathematically and prove that it is invariant under smooth, function-preserving reparametrizations, which directly addresses the critique of Dinh et al. in the paper "Sharp minima can generalize for deep nets". We note that this invariance is a property of the true FIM; the diagonal empirical estimator used in practice (and in all experiments below) inherits invariance only approximately, and exact invariance under arbitrary reparametrizations would require structured estimators such as K-FAC. We formalize the gradient noise of mini-batch SGD as having a covariance structure proportional to the FIM, derive the stationary distribution of the resulting stochastic differential equation, and then show that the probability mass is exponentially concentrated at Riemannian-flat minima. A PAC-Bayes generalization bound controlled explicitly by SR formally links this geometric bias to test performance. Our experiments on MNIST and CIFAR-10 confirm that SR reliably tracks generalization in ways that Euclidean sharpness does not, and that its scaling with $\eta/B$ matches the theoretical predictions. Together these results provide a rigorous, reparametrization-invariant account of why flat minima generalize.
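
The non-invariance of Euclidean sharpness that motivates this paper can be seen in a two-parameter toy model. The sketch below is an illustration, not the paper's construction: it assumes a "network" x -> a * b * x fitted to slope 1, for which rescaling (a, b) to (alpha * a, b / alpha) leaves the function unchanged. The function names are hypothetical.

```python
def loss(a, b):
    """Squared error of the toy model x -> a * b * x against the target slope 1."""
    return (a * b - 1.0) ** 2

def hessian_trace(a, b):
    """Analytic trace of the loss Hessian: d2L/da2 + d2L/db2 = 2*b^2 + 2*a^2."""
    return 2.0 * b * b + 2.0 * a * a

# Every point (alpha, 1/alpha) realises the same function and the same zero loss,
# yet the Euclidean sharpness (Hessian trace) changes with alpha.
for alpha in (1.0, 10.0, 100.0):
    a, b = alpha, 1.0 / alpha
    print(alpha, loss(a, b), hessian_trace(a, b))
```

The same minimum can therefore be made to look arbitrarily sharp by a function-preserving rescaling, which is the critique the abstract attributes to Dinh et al.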

6. 高效学习、压缩与部署 11 篇

2606.19364 2026-06-19 cs.LG 新提交

Closing the Social-Semantic Gap: SPSD for Edge-Based Prompt Compression in Cloud LLM Inference

缩小社会-语义差距:SPSD用于云LLM推理中的边缘端提示压缩

Abhinit Sen, Ajeet Kumar, Manaranjan Pradhan

AI总结 针对云LLM推理中提示词预填充阶段能耗高的问题,提出SPSD边缘端管道,利用4比特量化小语言模型压缩用户提示,在保持响应质量非劣效的前提下,平均节省99.9个输入token,每调用净节能70-270 uWh。

Comments 19 pages, 7 tables, 1 figure, includes appendix

详情
AI中文摘要

大语言模型(LLM)推理的预填充阶段正成为云规模能耗的日益增长的贡献者。许多面向消费者的支持和对话提示包含社会性支架:礼貌标记、道歉性开场白、重复以及建立融洽关系的语言,这些对人类交流很重要,但对机器推理而言边际信息量较低。我们将这种差异称为社会-语义差距。我们提出SPSD(情感保留语义蒸馏),一种边缘端管道,在传输到云端部署的LLM之前,使用4比特量化的小语言模型压缩用户提示。在248个提示的语料库上,使用Gemma-2-2B-Instruct(Q4_K_M)作为SLM、Llama-3.1-8B-Instruct作为云端评估模型进行评估,每次蒸馏调用平均输入token节省99.9个,所有146次蒸馏调用均产生正向节省。通过盲法LLM-as-judge评分对121对进行评估,响应质量在15分制中预先指定的1分非劣效范围内不劣于原始路径;评审员给出43%平局、28%蒸馏胜出和29%原始胜出。余弦相似度结果不一:均值0.682,中位数0.712,54.1%的配对高于0.70参考阈值。安全关键领域通过基于规则的网关保守地路由至直通模式。在所述假设下,每次调用净节能估计为70-270 uWh。SPSD表明,设备端提示蒸馏可以在保持响应质量在实际非劣效范围内的同时,降低云LLM的输入token成本。

英文摘要

The prefill stage of Large Language Model (LLM) inference is a growing contributor to cloud-scale energy cost. Many consumer-support and conversational prompts contain social scaffolding: politeness markers, apologetic preamble, repetition, and rapport-building language that is important for human communication but carries low marginal information for machine reasoning. We call this discrepancy the Social-Semantic Gap. We present SPSD (Sentiment Preserving Semantic Distillation), an edge-based pipeline that compresses user prompts using a 4-bit quantised Small Language Model before transmission to a cloud-deployed LLM. Evaluation on a 248-prompt corpus using Gemma-2-2B-Instruct (Q4_K_M) as the SLM and Llama-3.1-8B-Instruct as the cloud evaluation model yields a mean input token saving of 99.9 tokens per distilled call, with all 146 distilled calls yielding positive savings. Response quality, assessed by blind LLM-as-judge scoring across 121 pairs, is non-inferior to the raw path within a pre-specified 1-point margin on a 15-point rubric; the judge awarded 43 percent ties, 28 percent distilled wins, and 29 percent raw wins. Cosine similarity is mixed: mean 0.682, median 0.712, with 54.1 percent of pairs above the 0.70 reference threshold. Safety-critical domains are conservatively routed to passthrough via rule-based gates. Per-call net energy saving is estimated at 70-270 uWh under stated assumptions. SPSD shows that on-device prompt distillation can reduce cloud LLM input-token cost while preserving response quality within a practical non-inferiority margin.

2606.19365 2026-06-19 cs.LG 新提交

Performance Analysis and Optimization of 3D Generative Diffusion Models across GPU Architectures

跨GPU架构的3D生成扩散模型性能分析与优化

Jeeho Ryoo, Yongchan Jung, Muhammad Ali Khaliq, Weidong Zhang, Jiatong Han, Byeong Kil Lee

发表机构 * Fairleigh Dickinson University(费尔利·迪金森大学) The University of Colorado at Colorado Springs(科罗拉多大学科罗拉多斯普林斯分校) Northeastern University(东北大学)

AI总结 针对3D MRI扩散模型Med-DDPM,分析其在三代NVIDIA架构上的内核级性能瓶颈,评估TF32 Tensor Core激活和3D channels-last布局两种优化,使SM周期最多减少100倍、动态指令减少100倍,Tensor Core利用率从1.45倍提升至9.98倍,并在A100上将IPC提升7%。

详情
AI中文摘要

扩散模型已成为高保真3D MRI合成的关键,但由于每个样本需要数百次U-Net评估以及高度异构的内核行为,其部署仍受到大量GPU资源需求的限制。本文对最先进的医学扩散模型Med-DDPM在三代NVIDIA架构上进行了全面的性能分析,研究了内核级运行时分解、指令混合特征、内存系统利用率、线程束级活动以及分析器优先级得分估计。我们发现训练主要由cuDNN卷积和隐式GEMM内核主导,效率低下源于内存访问模式、张量布局转换和有限的Tensor Core利用率。基于这些洞察,我们评估了两种架构感知优化——TF32 Tensor Core激活和3D channels-last布局,并证明它们将SM周期减少多达100倍,动态指令减少100倍,Tensor Core利用率从1.45倍提高到9.98倍,并在A100上将IPC提高7%,且不降低合成质量。

英文摘要

Diffusion models have become essential for high-fidelity 3D MRI synthesis, yet their deployment remains constrained by substantial GPU resource demands arising from hundreds of U-Net evaluations per sample and highly heterogeneous kernel behavior. This paper performs a comprehensive performance analysis of the state-of-the-art medical diffusion model, Med-DDPM, across three generations of NVIDIA architectures to study kernel-level runtime breakdowns, instruction-mix characteristics, memory system utilization, warp-level activities, and profiler priority-score estimates. We show that training is overwhelmingly dominated by cuDNN convolution and implicit-GEMM kernels, with inefficiencies arising from memory-access patterns, tensor-layout conversions, and limited Tensor Core utilization. Guided by these insights, we evaluate two architecture-aware optimizations, TF32 Tensor Core activation and a 3D channels-last layout, and demonstrate that they reduce SM cycles by up to 100x, cut dynamic instructions by 100x, raise Tensor Core utilization from 1.45 to 9.98x, and increase IPC by 7% on A100, all without degrading synthesis quality.

2606.19528 2026-06-19 cs.LG cs.AI 新提交

Techniques for Peak Memory Reduction for LoRA Fine-tuning of LLMs on Edge Devices

边缘设备上LLM LoRA微调峰值内存降低技术

Hassan Dbouk, Matthias Reisser, Prathamesh Mandke, Likhita Arun Navali, Christos Louizos

AI总结 针对边缘设备上LLM LoRA微调的内存瓶颈,提出四种互补技术(量化、检查点、softmax近似、logits掩码),在Llama-3.2 3B和Qwen-2.5 3B上实现高达26倍和28倍的峰值内存降低。

Comments Hassan Dbouk and Matthias Reisser contributed equally to this work

详情
AI中文摘要

使用低秩适配(LoRA)在终端用户数据上微调大型语言模型(LLM)可提供个性化体验并保护数据隐私,但在消费级硬件上面临严重的内存限制。微调期间的峰值内存通常超过设备限制,尤其是对于具有数十亿参数和长上下文训练数据的模型。本文介绍了一套互补技术,可在不牺牲模型质量的情况下减少内存占用:(1)基模型量化与即时反量化,(2)结合选择性激活缓存和磁盘卸载的内存高效检查点,(3)使用语义相关令牌子集的softmax近似,以及(4)logits掩码。在Llama-3.2 3B和Qwen-2.5 3B上的实验表明,峰值内存降低高达26倍和28倍,从而能够在资源受限设备上进行微调。

英文摘要

Fine-tuning of Large Language Models (LLMs) using Low-Rank Adaptation (LoRA) on an end-user's data offers personalized experiences while keeping data private, but faces severe memory constraints on consumer hardware. Peak memory during fine-tuning often exceeds device limits, especially for models with billions of parameters and long-context training data. This paper introduces a suite of complementary techniques to reduce memory footprint without sacrificing model quality: (1) base model quantization with on-the-fly dequantization, (2) memory-efficient checkpointing combining selective activation caching and disk offloading, (3) softmax approximation using semantically relevant token subsets, and (4) logits masking. Experiments on Llama-3.2 3B and Qwen-2.5 3B demonstrate up to $26\times$ and $28\times$ reduction in peak memory, enabling fine-tuning on resource-constrained devices.
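
Technique (3) can be illustrated with a small sketch of a cross-entropy loss whose softmax is normalised over a token subset rather than the full vocabulary. This is an assumption-laden illustration, not the authors' implementation: how the "semantically relevant" subset is chosen is not described in the abstract, so the candidate list is simply passed in, and all names are hypothetical.

```python
import math

def subset_cross_entropy(hidden, out_embed, target, candidates):
    """Cross-entropy with the softmax normalised over `candidates` only.
    Only len(ids) logits are computed instead of one per vocabulary entry."""
    ids = sorted(set(candidates) | {target})      # the target must be in the subset
    logits = [sum(h * e for h, e in zip(hidden, out_embed[i])) for i in ids]
    m = max(logits)                               # subtract the max for stability
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_z - logits[ids.index(target)]

# Toy vocabulary of 6 tokens with 2-dimensional output embeddings.
out_embed = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5],
             [-1.0, 0.0], [0.0, -1.0], [-0.5, -0.5]]
hidden = [2.0, 1.0]
full = subset_cross_entropy(hidden, out_embed, target=0, candidates=range(6))
approx = subset_cross_entropy(hidden, out_embed, target=0, candidates=[1, 2])
```

Dropping tokens can only shrink the normaliser, so the subset loss never exceeds the full loss. The approximation is close when the omitted tokens carry little probability mass, and the memory saving comes from never materialising the full logit vector.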

2606.19549 2026-06-19 cs.LG 新提交

Predicting Mergeability of Parameter-Efficient Fine-Tuning Updates

预测参数高效微调更新的可合并性

Lin Tang, Wei Zhang, Jing Li, Hongyu Chen, Ming Zhao, Yuxuan Wang

发表机构 * Sichuan University(四川大学) University of Electronic Science and Technology of China(电子科技大学)

AI总结 提出MergeProbe,通过训练初期信号预测LoRA适配器的可合并性,在MERGE-PEFT基准上实现最佳平均和最差保留性能。

详情
AI中文摘要

低秩适配(LoRA)使得训练许多领域和任务特定的语言模型适配器变得廉价,但两个适配器是否可以合并通常只有在两者都经过充分训练和评估后才能发现。这种延迟反馈代价高昂:单独表现强大的适配器在合并更新后可能会产生破坏性干扰。我们询问是否可以预测这种结果。我们将适配器可合并性形式化为适配器在合并后保持其单任务效用的程度,并表明可以从训练初期百分之几的信号中预测——主要是低秩更新及其梯度在不同任务间的对齐程度以及它们对共享表示的干扰程度。我们将这些信号打包成MergeProbe,一个轻量级预测器,用于估计成对和集合级别的保留,并将估计转化为具体决策:直接合并、重新加权、剪枝或路由。在MERGE-PEFT(一个涵盖数学、代码、科学、指令遵循和安全的五领域基准)上,MergeProbe在强干扰感知合并基线中实现了最佳平均和最差保留,同时增加的部署开销远低于完整任务路由。这将LoRA合并从事后工程步骤转变为预期测量问题。

英文摘要

Low-rank adaptation (LoRA) makes it cheap to train many domain- and task-specific language model adapters, but whether two adapters can be merged is usually discovered only after both have been fully trained and evaluated. This late feedback is costly: adapters that are strong in isolation can interfere destructively once their updates are combined. We ask whether this outcome can be anticipated. We formalize adapter mergeability as the degree to which an adapter preserves its single-task utility after merging, and show that it can be forecast from signals measured in the first few percent of training -- chiefly how the low-rank updates and their gradients align across tasks and how much they disturb shared representations. We package these signals into MergeProbe, a lightweight predictor that estimates pairwise and set-level retention and turns the estimate into a concrete decision: merge directly, reweight, prune, or route. On MERGE-PEFT, a five-domain benchmark spanning math, code, science, instruction following, and safety, MergeProbe attains the best average and worst-case retention among strong interference-aware merge baselines while adding far less deployment overhead than full task routing. This turns LoRA merging from a post-hoc engineering step into an anticipatory measurement problem.
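
One of the signals the abstract mentions, how the low-rank updates of different tasks align, can be sketched as a cosine similarity between two LoRA weight updates. This is a hypothetical illustration and not MergeProbe itself: the abstract does not specify the exact features, so the choice of the Frobenius inner product of dW = B @ A and the function names are assumptions.

```python
import math

def matmul(B, A):
    """Plain-Python matrix product B @ A."""
    return [[sum(B[i][k] * A[k][j] for k in range(len(A))) for j in range(len(A[0]))]
            for i in range(len(B))]

def update_alignment(B1, A1, B2, A2):
    """Cosine similarity between two LoRA updates dW = B @ A (Frobenius inner product)."""
    d1 = [v for row in matmul(B1, A1) for v in row]
    d2 = [v for row in matmul(B2, A2) for v in row]
    inner = sum(x * y for x, y in zip(d1, d2))
    norm1 = math.sqrt(sum(x * x for x in d1))
    norm2 = math.sqrt(sum(y * y for y in d2))
    return inner / (norm1 * norm2)

# Rank-1 adapters on a 2x2 weight matrix (toy values).
B_math, A_math = [[1.0], [0.0]], [[1.0, 0.0]]    # touches entry (0, 0)
B_code, A_code = [[0.0], [1.0]], [[0.0, 1.0]]    # touches entry (1, 1): disjoint
B_safe, A_safe = [[-2.0], [0.0]], [[1.0, 0.0]]   # touches entry (0, 0), opposite sign

disjoint = update_alignment(B_math, A_math, B_code, A_code)
opposed = update_alignment(B_math, A_math, B_safe, A_safe)
```

Under this reading, a value near zero suggests the two updates occupy different directions and may merge with little interference, while a strongly negative value suggests they push the same weights in opposite directions.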

2606.19712 2026-06-19 cs.LG cs.CV 新提交

Efficient Neural Network Model Selection for Few-Class Application Datasets

面向少类应用数据集的高效神经网络模型选择

Bryan Bo Cao, Abhinav Sharma, Lawrence O'Gorman, Michael Coss, Shubham Jain

发表机构 * Nokia Bell Labs(诺基亚贝尔实验室)

AI总结 针对实际应用中常见的少类数据集,提出基于数据属性的分类难度度量,实现比传统方法快6-29倍的模型选择,并扩展模型族至更小规模,在移动机器人等场景中提升效率。

Comments 36 pages, 9 tables, 13 figures

详情
AI中文摘要

尽管大量工作集中在开发和基准测试高性能神经网络上,但较少关注已知的数据集属性如何指导高效的模型选择。神经网络模型通常在数千类数据集上评估,然而许多实际应用涉及少于十类。为了解决这一被忽视但常见的情况,我们基于数据侧属性开发了一种分类难度度量,并展示了它如何为少类数据集实现更高效的模型选择,而传统方法在此效果较差。我们将此现象称为“少类独特性”。我们的度量允许比重复训练和测试快6到29倍的模型和数据集比较。利用这一洞察,我们将缩放模型族扩展到已发布的最小模型以下,在相似精度下实现更高效率,例如在移动机器人任务中模型比YOLOv5-nano小42%。针对资源受限的应用,我们在移动机器人、无人机和物联网场景中展示了少类模型选择,突出了在不牺牲性能的情况下效率的实际提升。

英文摘要

While much effort has focused on developing and benchmarking high-performance neural networks, less attention has been given to how dataset properties, known to practitioners, can guide efficient model selection. Neural models are typically evaluated on datasets with thousands of classes, yet many real-world applications involve fewer than ten. To address this understudied but common setting, we develop a measure of classification difficulty based on data-side properties and show how it enables more efficient model selection for few-class datasets, where traditional approaches are less effective. We term this phenomenon "few-class distinctiveness". Our metric allows comparison of models and datasets 6 to 29$\times$ faster than repeated training and testing. Leveraging this insight, we extend scaled model families below the smallest published models, achieving greater efficiency at similar accuracy, for example models up to 42% smaller than YOLOv5-nano for a mobile robot task. Targeting resource-constrained applications, we demonstrate few-class model selection across mobile robot, drone, and IoT scenarios, highlighting practical gains in efficiency without sacrificing performance.

2606.19919 2026-06-19 cs.LG 新提交

ADaPT: Token-Level Decoupling for Efficient Large Reasoning Models

ADaPT:面向高效大推理模型的令牌级解耦

Tingyun Li, Zishang Jiang, Jinyi Han, Xinyi Wang, Sihang Jiang, Han Xia, Zhaoqian Dai, Shuguang Ma, Fei Yu, Jiaqing Liang, Yanghua Xiao

发表机构 * School of Data Science, Fudan University(复旦大学数据科学学院) Shanghai Institute of Artificial Intelligence for Education, East China Normal University(华东师范大学上海智能教育研究院) College of Computer Science and Artificial Intelligence, Fudan University(复旦大学计算机科学与人工智能学院) Ant Group(蚂蚁集团)

AI总结 提出ADaPT,通过令牌级双过程框架解耦效率与正确性信号,引入模式选择令牌控制快慢推理,实现推理时效率-性能权衡的精确连续控制,在降低推理成本的同时保持强推理能力。

详情
AI中文摘要

大型推理模型依赖长思维链实现强性能,但统一应用此类推理会产生高计算成本。现有面向效率的方法试图缩短或混合推理策略,但往往会降低推理能力。我们将根本原因识别为效率激励与正确性优化之间的序列级耦合,这隐式惩罚了长但正确的推理轨迹。为解决此问题,我们提出自适应双过程思维(ADaPT),一种令牌级双过程框架,在训练期间显式解耦效率和正确性信号。ADaPT引入模式选择令牌来控制快速和慢速推理,将效率相关奖励仅应用于此令牌,以避免惩罚正确的长推理,同时在适当时鼓励效率。此外,ADaPT在推理时实现了对效率-性能权衡的精确连续控制:通过调整模式选择令牌的生成概率,单个训练好的模型可以平滑地沿效率-性能帕累托前沿移动。大量实验表明,ADaPT在多个基准测试中显著降低推理成本,同时保持强推理性能。

英文摘要

Large reasoning models rely on long chain-of-thought to achieve strong performance, but applying such reasoning uniformly incurs high computational cost. Existing efficiency-oriented methods attempt to shorten or mix reasoning strategies, yet often degrade reasoning capability. We identify the root cause as sequence-level coupling between efficiency incentives and correctness optimization, which implicitly penalizes long but correct reasoning trajectories. To address this issue, we propose Adaptive Dual-Process Thinking (ADaPT), a token-level dual-process framework that explicitly decouples efficiency and correctness signals during training. ADaPT introduces a mode-selection token to control fast and slow reasoning, applying efficiency-related rewards exclusively to this token to avoid penalizing correct long reasoning while encouraging efficiency when appropriate. Moreover, ADaPT enables precise and continuous control over the efficiency-performance trade-off at inference time: by adjusting the generation probability of the mode-selection token, a single trained model can smoothly move along the efficiency-performance Pareto frontier. Extensive experiments demonstrate that ADaPT significantly reduces inference cost while maintaining strong reasoning performance across multiple benchmarks.
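
The inference-time control described in the abstract, adjusting the generation probability of the mode-selection token, can be sketched as a bias added to that token's logit. This is a minimal illustration under assumptions: it supposes exactly two competing mode tokens (fast and slow) at the selection position and a scalar bias knob. Neither detail is confirmed by the abstract, and the function name is hypothetical.

```python
import math

def slow_mode_probability(logit_fast, logit_slow, bias=0.0):
    """Probability of selecting slow reasoning after adding `bias` to its logit
    (two-way softmax over the fast and slow mode tokens)."""
    return 1.0 / (1.0 + math.exp(logit_fast - (logit_slow + bias)))

# Sweeping the bias moves a single model between mostly-fast and mostly-slow behaviour.
sweep = [slow_mode_probability(1.0, 0.0, bias) for bias in (-4.0, -2.0, 0.0, 2.0, 4.0)]
```

Because the probability is monotone in the bias, one scalar is enough to trace out a continuous range of fast/slow mixtures, which matches the smooth trade-off the abstract describes.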

2606.19964 2026-06-19 cs.LG cs.AR 新提交

Low-Energy Reduced RISC-V Instruction Subset Processor for Tsetlin Machine Inference at the Edge

用于边缘Tsetlin Machine推理的低能耗精简RISC-V指令子集处理器

Chanda Gupta, Sanidhya Bhatia, Shaurya Priyadarshi, Himani Panwar, Rishad Shafik, Sudip Roy

AI总结 针对Tsetlin Machine推理,提出一种领域专用RISC-V微处理器架构,通过指令精简和数据路径简化,在保持可编程性的同时实现高达98%的执行时间减少和29.7倍能耗降低。

Comments 6 pages, 6 Figures, Accepted in IEEE ISVLSI Conference 2026

详情
AI中文摘要

Tsetlin Machine (TM) 是一种基于逻辑的机器学习方法,依赖于简单的位运算和有限状态自动机,使其适用于边缘AI部署。最近的工作集中在基于Tsetlin Machine (TM) 的协处理器和加速器设计上。尽管这些设计实现了高性能,但它们通常依赖于紧密耦合的接口、微码风格的编程和外部主机处理器,限制了灵活性和编程简易性。在这项工作中,我们提出了一种面向TM推理的领域专用RISC-V微处理器架构和设计流程。利用RISC-V的模块化结构,我们设计了一个精简指令子集处理器,在保持可编程性的同时,针对TM工作负载提高了性能并降低了能耗。采用指令分析来指导指令精简,随后针对TM推理进行数据路径和控制路径的简化。在多个数据集上评估了基线RV32IM核心和所提出的精简核心,并与二值神经网络 (BNN) 进行比较,BNN由于在推理过程中依赖位运算而被用作硬件高效基线。结果表明,TM实现了相当或更高的准确率(例如,在CIFAR-2上高达88.18%,而BNN为60.0%),同时在多个数据集上执行时间减少了高达98%。此外,所提出的设计实现了平均29.7倍的能耗降低,证明了其在可编程且高效的边缘AI系统中的有效性。

英文摘要

Tsetlin Machine (TM) is a logic-based machine learning approach that relies on simple bitwise operations and finite-state automata, which makes it attractive for edge AI deployments. Recent work has focused on co-processor and accelerator designs based on Tsetlin Machines (TMs). Although these designs achieve high performance, they typically depend on tightly coupled interfaces, microcode-style programming, and external host processors, limiting flexibility and ease of programming. In this work, we present a domain-specific RISC-V microprocessor architecture and design flow tailored for TM inference. Leveraging the modular structure of RISC-V, we design a reduced instruction subset processor that retains programmability while targeting improved performance and lower energy consumption for TM workloads. Instruction profiling is employed to guide instruction reduction, followed by datapath and control path simplifications tailored to TM inference. Both the baseline RV32IM core and the proposed reduced core are evaluated across multiple datasets and compared with Binarized Neural Networks (BNNs), which serve as a hardware-efficient baseline due to their reliance on bitwise operations during inference. Results show that TM achieves comparable or higher accuracy (e.g., up to 88.18% on CIFAR-2 compared to 60.0% for BNN) while reducing execution time by up to 98% across multiple datasets. Furthermore, the proposed design achieves an average $29.7\times$ reduction in energy consumption, demonstrating its effectiveness for programmable and efficient edge AI systems.
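The bitwise character of TM inference mentioned above can be illustrated with a minimal Python sketch. It is not the paper's processor or instruction subset. The packing layout (features in the low bits, negations in the high bits), the function names, and the hand-written XOR clauses are assumptions made for illustration. Each clause is a conjunction of included literals, and a class score is the signed sum of clause outputs.

```python
from typing import List, Tuple

def literals(x_bits: int, n_features: int) -> int:
    """Pack the features and their negations into one integer: [NOT x | x]."""
    mask = (1 << n_features) - 1
    return (x_bits & mask) | ((~x_bits & mask) << n_features)

def clause_output(include_mask: int, lits: int) -> int:
    """A clause fires when every included literal is 1 (one AND and one compare)."""
    if include_mask == 0:
        return 0  # common inference convention: empty clauses do not vote
    return int((lits & include_mask) == include_mask)

def class_sum(clauses: List[Tuple[int, int]], x_bits: int, n_features: int) -> int:
    """Signed vote: positive-polarity clauses add, negative-polarity clauses subtract."""
    lits = literals(x_bits, n_features)
    return sum(polarity * clause_output(mask, lits) for mask, polarity in clauses)

# Hand-written clauses for XOR on two features (illustrative, not learned):
# bit 0 = x0, bit 1 = x1, bit 2 = NOT x0, bit 3 = NOT x1.
xor_clauses = [
    (0b1001, +1),  # x0 AND NOT x1
    (0b0110, +1),  # x1 AND NOT x0
    (0b0011, -1),  # x0 AND x1
    (0b1100, -1),  # NOT x0 AND NOT x1
]
```

Because the inner loop needs only AND, compare, and add, a workload of this shape uses a small part of a general-purpose instruction set, which is the kind of observation instruction profiling can expose.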

2606.19993 2026-06-19 cs.LG 新提交

Activation- and Influence-Aware Ranks (AIR): Function-Preserving SVD Compression for LLMs

激活与影响感知秩 (AIR):保持功能的SVD压缩用于大语言模型

Nico Harder, Daniel Becking, Karsten Mueller, Wojciech Samek

AI总结 提出AIR框架,基于SVD和反向信号影响度量,通过单次交替最小二乘扫描实现权重矩阵的低秩近似,在参数保留≤60%时困惑度比SVD-LLM(W)改善>18%,并减少90%校准数据。

Comments Accepted at the ICML 2026 Workshop on Resource-Adaptive Foundation Model Inference (AdaptFM), Seoul, South Korea (non-archival)

AI中文摘要

我们提出了激活与影响感知秩(AIR),一个基于SVD的大语言模型压缩框架,它使用反向信号影响度量来指导每个权重矩阵的低秩近似。从SVD-LLM(W)的激活感知最优解出发,AIR运行单次封闭形式的交替最小二乘(ALS)扫描,在单调下降保证下逐元素整合影响。AIR是层局部的,并与端到端方法正交组合:单独使用时超过ACIP,AIR+LoRA进一步超越。AIR在参数保留≤60%时,困惑度比SVD-LLM(W)改善超过18%,使用约90%更少的校准数据达到相同质量,并将参数节省转化为FLOP、峰值内存和每令牌延迟的收益。

英文摘要

We present Activation- and Influence-Aware Ranks (AIR), an SVD-based LLM compression framework that guides each weight matrix's low-rank approximation with a backward-signal influence metric. Starting from the activation-aware optimum of SVD-LLM(W), AIR runs a single closed-form alternating least squares (ALS) sweep that integrates influence element-wise under a monotone-descent guarantee. AIR is layer-local and composes orthogonally with end-to-end methods: alone it exceeds ACIP, and AIR+LoRA outperforms it further. AIR improves perplexity over SVD-LLM(W) by >18% at <=60% parameter retention, matches its quality with ~90% less calibration data, and turns parameter savings into FLOP, peak-memory, and per-token latency gains.

2606.20005 2026-06-19 cs.LG cs.AI 新提交

StreamKL: Fast and Memory-Efficient KL Divergence for Boosting Attention Distillation

StreamKL: 快速且内存高效的KL散度用于提升注意力蒸馏

Guangda Liu, Yiquan Wang, Chengwei Li, Wenhao Chen, Jing Lin, Yiwu Yao, Danning Ke, Wenchao Ding, Jieru Zhao

发表机构 * Shanghai Jiao Tong University(上海交通大学) Huawei(华为) Fudan University(复旦大学)

AI总结 提出StreamKL,首个用于注意力KL散度的融合GPU原语,通过在线公式和逐块重计算避免具体化二次规模的注意力分布,将额外HBM占用从O(N_QN_K)降至O(1),前向和反向传播分别最高加速43倍和14倍。

AI中文摘要

注意力蒸馏通过最小化Kullback-Leibler (KL)散度来训练一个注意力分布匹配另一个,广泛应用于知识蒸馏、模型压缩、持续学习和稀疏注意力LLM训练。然而,现有方法在计算KL归约前需要具体化两个注意力分布,导致$O(N_QN_K)$的内存和IO成本,在长上下文长度下变得不可接受。我们提出StreamKL,首个用于注意力KL散度的融合GPU原语,消除了这种二次具体化。StreamKL推导了一种新颖的在线公式用于耦合的双分布KL归约,使得单个前向内核能够通过片上SRAM流式处理查询-键块。对于反向传播,StreamKL逐块重计算注意力概率,避免存储二次中间结果。我们进一步设计并实现了具有专用优化的高效GPU内核。实验表明,StreamKL在前向和反向传播中分别比基线方法快高达43倍和14倍。最重要的是,StreamKL将注意力蒸馏的额外HBM占用从$O(N_QN_K)$减少到$O(1)$,使得在单个GPU上进行长上下文蒸馏成为可能。

英文摘要

Attention distillation, which trains one attention distribution to match another by minimizing their Kullback-Leibler (KL) divergence, is widely used in knowledge distillation, model compression, continual learning, and sparse-attention LLM training. However, existing approaches materialize both attention distributions before computing the KL reduction, incurring $O(N_QN_K)$ memory and IO costs that become prohibitive at long context lengths. We present StreamKL, the first fused GPU primitive for attention KL divergence that eliminates this quadratic materialization. StreamKL derives a novel online formulation for the coupled two-distribution KL reduction, enabling a single one-pass forward kernel that streams query-key tiles through on-chip SRAM. For the backward pass, StreamKL recomputes attention probabilities tile-by-tile, avoiding storage of quadratic intermediates. We further design and implement efficient GPU kernels with dedicated optimizations. Experiments show StreamKL delivers up to $43\times$ and $14\times$ speedups over baseline methods in the forward and backward passes, respectively. Most importantly, StreamKL reduces the extra HBM footprint of attention distillation from $O(N_QN_K)$ to $O(1)$, enabling long-context distillation on a single GPU.
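To make the idea of a one-pass KL reduction concrete, here is a minimal NumPy sketch for a single query row. It is not the StreamKL kernel, and the formulation in the paper may differ. It relies on the identity KL(P||Q) = sum_i p_i (s_i - t_i) - logsumexp(s) + logsumexp(t), where s and t are the two sets of attention logits, and keeps only running maxima, normalizers, and one weighted accumulator per row. The function names, the tile size, and the KL direction are assumptions.

```python
import numpy as np

def streaming_attention_kl(s, t, tile=4):
    """KL(softmax(s) || softmax(t)) for one query row, visiting key tiles once."""
    m_s, l_s, acc = -np.inf, 0.0, 0.0  # running max, normalizer, sum of w * (s - t)
    m_t, l_t = -np.inf, 0.0
    for start in range(0, len(s), tile):
        s_blk, t_blk = s[start:start + tile], t[start:start + tile]
        # Rescale the accumulators for s whenever the running max changes.
        new_m_s = max(m_s, s_blk.max())
        scale = np.exp(m_s - new_m_s)
        w = np.exp(s_blk - new_m_s)
        l_s = l_s * scale + w.sum()
        acc = acc * scale + (w * (s_blk - t_blk)).sum()
        m_s = new_m_s
        # The second distribution only needs its log-normalizer.
        new_m_t = max(m_t, t_blk.max())
        l_t = l_t * np.exp(m_t - new_m_t) + np.exp(t_blk - new_m_t).sum()
        m_t = new_m_t
    return acc / l_s - (m_s + np.log(l_s)) + (m_t + np.log(l_t))

def reference_kl(s, t):
    """Materializing baseline: build both distributions, then reduce."""
    p = np.exp(s - s.max()); p /= p.sum()
    q = np.exp(t - t.max()); q /= q.sum()
    return float((p * (np.log(p) - np.log(q))).sum())
```

The streaming version stores a constant number of scalars per row regardless of the number of keys, whereas the reference version holds both full probability vectors before reducing.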

2606.20474 2026-06-19 cs.LG cs.AI cs.PF 新提交

UltraQuant: 4-bit KV Caching for Context-Heavy Agents

UltraQuant: 面向上下文密集型智能体的4位KV缓存

Inesh Chakrabarti, David Limpus, Aditi Ghai Rana, Bowen Bao, Spandan Tiwari, Thiago Crepaldi, Ashish Sirasao

发表机构 * Advanced Micro Devices(超威半导体) University of California, Los Angeles(加州大学洛杉矶分校) Purdue University(普渡大学)

AI总结 针对上下文密集型智能体场景,提出UltraQuant方法,通过4位KV缓存压缩、旋转量化和代码本量化,结合AMD GPU优化,在长上下文多轮任务中延迟降低3.47倍,吞吐量提升1.63倍。

Comments 11 pages, 9 figures

AI中文摘要

上下文密集型智能体给键值(KV)缓存带来了异常压力:长前缀在多个短轮次中重复使用,而并发性决定了服务系统能否保持GPU利用率。我们针对此场景研究4位KV缓存压缩,采用TurboQuant风格的旋转和代码本量化作为质量锚点,vLLM FP8 KV缓存作为部署锚点。我们报告三项贡献。首先,我们将4位KV缓存框架用于多轮智能体工作负载,其中任务质量、缓存驻留和服务吞吐量必须联合衡量。其次,我们描述了使4位路径鲁棒所需的实际设计选择,包括非对称K/V处理、Walsh-Hadamard旋转、QJL移除和块尺度变体。第三,我们展示了AMD GPU上的服务优化,包括优化的解码注意力内核和UltraQuant,一种使用FP8查询、FP4 KV张量、UE8M0组尺度和CDNA4上原生缩放MFMA支持的FP4近似路径。在长上下文、多轮智能体工作负载上,UltraQuant在缓存压力大的后期轮次中将P50首令牌延迟降低了3.47倍(所有轮次平均2.3倍),并将输出吞吐量比FP8 KV基线提高了1.63倍。

英文摘要

Context-heavy agents place unusual pressure on the key-value (KV) cache: long prefixes are reused across many short turns, while concurrency determines whether the serving system can keep GPUs utilized. We study 4-bit KV-cache compression for this setting, using TurboQuant-style rotation and codebook quantization as a quality anchor and vLLM FP8 KV caching as the deployment anchor. We report three contributions. First, we frame 4-bit KV caching around multi-round agent workloads where task quality, cache residency, and serving throughput must be measured jointly. Second, we describe the practical design choices needed to make the 4-bit path robust, including asymmetric K/V treatment, Walsh-Hadamard rotation, QJL removal, and block-scale variants. Third, we present serving optimizations on AMD GPUs, including optimized decode-attention kernels and UltraQuant, an FP4 approximation path that uses FP8 queries, FP4 KV tensors, UE8M0 group scales, and native scaled-MFMA support on CDNA4. On a long-context, multi-turn agentic workload, UltraQuant cuts P50 time-to-first-token by 3.47x in the cache-pressured late rounds (2.3x across all rounds) and raises output throughput by 1.63x over the FP8 KV baseline.
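The combination of a Walsh-Hadamard rotation with 4-bit storage can be sketched as follows. This is only an illustration: it uses uniform symmetric int4 levels with a floating-point absmax scale per group as a stand-in, not the codebook, FP4, or UE8M0 formats named in the abstract, and every function name and the group size of 16 are assumptions.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Walsh-Hadamard matrix (Sylvester construction); n is a power of two."""
    assert n > 0 and n & (n - 1) == 0
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def quantize_int4(x, group=16):
    """Symmetric 4-bit codes in [-8, 7] with one absmax scale per group of values."""
    g = x.reshape(-1, group)
    scale = np.abs(g).max(axis=1, keepdims=True) / 7.0
    scale[scale == 0] = 1.0  # avoid dividing by zero for all-zero groups
    codes = np.clip(np.round(g / scale), -8, 7).astype(np.int8)
    return codes, scale

def dequantize_int4(codes, scale, shape):
    return (codes * scale).reshape(shape)

def roundtrip(kv, group=16):
    """Rotate along the head dimension, quantize, dequantize, rotate back."""
    h = hadamard(kv.shape[-1])
    rotated = kv @ h
    codes, scale = quantize_int4(rotated, group)
    return dequantize_int4(codes, scale, rotated.shape) @ h.T
```

The rotation is orthonormal, so it can be undone exactly after dequantization. Rotations of this kind are commonly motivated by spreading outlier channels across the vector before low-bit rounding.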

2606.20537 2026-06-19 cs.LG cs.DC 新提交

Execution-State Capsules: Graph-Bound Execution-State Checkpoint and Restore for Low-Latency, Small-Batch, On-Device Physical-AI Serving

执行状态胶囊:面向低延迟、小批量、设备端物理AI服务的图绑定执行状态检查点与恢复

Liang Su

AI总结 针对低延迟、小批量、设备端物理AI服务场景,提出执行状态胶囊机制,通过图绑定检查点与恢复完整可恢复状态,在RTX 5090上实现亚毫秒级恢复,TTFT加速比达3.9倍至27倍。

Comments 27 pages, 9 figures

AI中文摘要

主流LLM服务系统主要通过分页或基数键值(KV)缓存重用前缀工作。这对于高吞吐量、高并发服务非常有效,但它只管理执行状态的一个位置片段:KV缓存。我们研究相反的场景:低延迟、小批量、设备端物理AI服务,其中交互式LLM代理、语音系统和机器人策略在严格的响应预算下频繁分支、重置、中断和重新进入。我们引入执行状态胶囊,一种图绑定的检查点和恢复机制,用于在提交边界处保存完整的可恢复状态。FlashRT是一个白盒、后端内核运行时,其评估的NVIDIA CUDA后端在连续的静态缓冲区上运行捕获的图计划,无需块表间接寻址。由于活动状态是一组命名的封闭缓冲区,胶囊可以快照、恢复、分叉或回滚整个执行边界,包括KV、循环状态、卷积状态、MTP状态和元数据。这将重用从令牌寻址的KV片段转移到图绑定的执行状态边界。在RTX 5090上,胶囊恢复在存储状态级别是字节精确的,在贪婪解码下是令牌一致的。仅KV的消融实验出现分歧,表明循环状态是承载负载的。GPU驻留的快照和恢复是亚毫秒级的,TTFT相对于冷预填充的加速比从2k令牌时的3.9倍增长到16k令牌时的27倍。在Jetson AGX Thor和DGX Spark上,相同的正确性和结构属性成立。胶囊不是高吞吐量KV缓存服务的替代品;它们定义了一个互补的以延迟为先的服务点,用于显式执行状态重用。

英文摘要

Mainstream LLM serving systems reuse prefix work mainly through paged or radix key-value (KV) caches. This is highly effective for high-throughput, high-concurrency serving, but it manages only one positional fragment of execution state: the KV cache. We study the opposite regime: low-latency, small-batch, on-device physical-AI serving, where interactive LLM agents, speech systems, and robot policies repeatedly branch, reset, interrupt, and re-enter under tight responsiveness budgets. We introduce execution-state capsules, a graph-bound checkpoint and restore mechanism for the complete restorable state at a committed boundary. FlashRT is a white-box, backend-facing kernel runtime whose evaluated NVIDIA CUDA backend runs captured graph plans over contiguous static buffers with no block-table indirection. Because the live state is a closed set of named buffers, a capsule can snapshot, restore, fork, or roll back the whole execution boundary, including KV, recurrent state, convolution state, MTP state, and metadata. This moves reuse from token-addressed KV fragments to graph-bound execution-state boundaries. On an RTX 5090, capsule restore is byte-exact at the stored-state level and token-identical under greedy decode. A KV-only ablation diverges, showing that recurrent state is load-bearing. GPU-resident snapshot and restore are sub-millisecond, and TTFT speedup over cold prefill grows from 3.9x at 2k tokens to 27x at 16k tokens. On Jetson AGX Thor and DGX Spark, the same correctness and structural properties hold. Capsules are not a replacement for high-throughput KV-cache serving; they define a complementary latency-first serving point for explicit execution-state reuse.
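The claim that a KV-only restore diverges while a full-state restore does not can be illustrated with a toy Python sketch. `ToyRuntime` is a hypothetical stand-in, not FlashRT or its API. The buffer names, the update rule, and the output formula are invented purely to show the effect of leaving a recurrent buffer out of a snapshot.

```python
import numpy as np

class ToyRuntime:
    """Hypothetical runtime whose live state is a closed set of named buffers."""

    def __init__(self):
        self.buffers = {
            "kv": np.zeros(8),
            "recurrent": np.zeros(1),
            "pos": np.zeros(1, dtype=np.int64),
        }

    def step(self, token):
        """Consume one token; the output depends on both the KV and recurrent buffers."""
        pos = int(self.buffers["pos"][0])
        self.buffers["recurrent"][0] = 0.5 * self.buffers["recurrent"][0] + token
        self.buffers["kv"][pos] = token
        self.buffers["pos"][0] = pos + 1
        return float(self.buffers["kv"][:pos + 1].sum() + self.buffers["recurrent"][0])

    def snapshot(self, names=None):
        """Copy the selected buffers (all of them by default) into a capsule."""
        names = list(self.buffers) if names is None else names
        return {name: self.buffers[name].copy() for name in names}

    def restore(self, capsule):
        """Write saved contents back in place; buffers absent from the capsule stay stale."""
        for name, saved in capsule.items():
            self.buffers[name][...] = saved
```

In this toy, restoring a capsule that covers every buffer reproduces the original continuation exactly, while restoring only `kv` and `pos` leaves `recurrent` at its later value, so the next output differs.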

7. 联邦学习、隐私与安全 4 篇

2606.19734 2026-06-19 cs.LG 新提交

Federated Bilevel Performative Prediction

联邦双层执行预测

Liangxin Qian, Chang Liu, Xuanyu Cao, Jun Zhao, Kwok-Yan Lam

发表机构 * Nanyang Technological University(南洋理工大学) Zhejiang University(浙江大学) Washington State University(华盛顿州立大学)

AI总结 研究联邦学习中客户端数据分布受决策影响的双层优化问题,提出联邦双层执行稳定点概念及两种求解方法,实验验证了稳定性阈值和元泛化提升。

Comments Accepted by ICML 2026

AI中文摘要

联邦双层优化广泛用于跨分布式客户端的嵌套学习问题,例如在隐私和通信约束下的联邦超参数调整和元学习。大多数现有公式假设客户端数据分布固定,但执行性可能违反这一假设,其中部署的决策会重塑客户端行为和数据收集,导致客户端特定的、决策依赖的分布偏移。我们研究联邦双层执行预测,其中上层(UL)和下层(LL)目标都在客户端依赖、决策依赖的分布下进行评估。我们在解耦风险视角下形式化联邦双层执行稳定(FBPS)点,并给出其存在性和唯一性的充分条件。然后,我们开发两种联邦方法来计算FBPS解:FBi-RRM,在收缩条件下线性收敛;以及FBi-SGD,一种基于联邦超梯度估计的通信高效随机方法,在步长递减且敏感性足够小时具有收敛保证。在策略回归和元策略分类上的实验验证了预测的稳定性阈值,并展示了相对于非执行基线的元泛化改进,基于CNN的分类进一步证明了所提方法在非凸神经网络设置中的实际有效性。

英文摘要

Federated bilevel optimization is widely used for nested learning problems across distributed clients, such as federated hyperparameter tuning and meta-learning under privacy and communication constraints. Most existing formulations assume fixed client data distributions, which can be violated by performativity, where deployed decisions reshape client behavior and data collection, inducing client-specific, decision-dependent distribution shift. We study federated bilevel performative prediction, where both upper-level (UL) and lower-level (LL) objectives are evaluated under client-dependent, decision-dependent distributions. We formalize the federated bilevel performatively stable (FBPS) point under a decoupled-risk perspective and provide sufficient conditions for its existence and uniqueness. We then develop two federated methods to compute the FBPS solution: FBi-RRM, which converges linearly under a contraction condition, and FBi-SGD, a communication-efficient stochastic method based on federated hypergradient estimation with convergence guarantees under diminishing step sizes when sensitivities are sufficiently small. Experiments on strategic regression and meta strategic classification validate the predicted stability thresholds and demonstrate improved meta-generalization over non-performative baselines, and CNN-based classification further demonstrates the practical effectiveness of the proposed methods in nonconvex neural network settings.

2606.20115 2026-06-19 cs.LG cs.CV 新提交

When Calibration Fails the Vulnerable Hospital: Federated Conformal Risk Control via Risk-Curve Shrinkage

当校准辜负脆弱的医院:通过风险曲线收缩实现联邦共形风险控制

Nafis Fuad Shahid

AI总结 针对联邦部署中标准共形风险控制(CRC)对个体机构覆盖不足的问题,提出基于风险曲线收缩的联邦CRC协议,在真实脑肿瘤数据上实现2.7/20的违规率且预测集仅扩大2.0倍。

Comments 9 pages, 3 figures, 2 tables. Submitted to the DeCaF Workshop at MICCAI 2026

AI中文摘要

共形风险控制(CRC)通过在保留数据上校准预测集阈值,提供分割质量的无分布保证。在联邦部署中,标准方法将各站点的校准分数合并为一个阈值。我们在真实多机构脑肿瘤数据(FeTS-2022,1251名受试者,20个机构)上首次量化表明,这种朴素的合并CRC保护了平均医院,但违反了40%个体机构的覆盖,最差站点的假阴性率超出目标7.8个百分点。朴素的替代方案——每个站点本地CRC——基本恢复了覆盖,但将预测集扩大了83倍,使其在临床上无用。我们提出一种基于收缩的联邦CRC协议:每个站点仅将其经验风险曲线(G个标量)传输到服务器,服务器为每个站点计算收缩正则化阈值。单个超参数n0平滑地权衡最坏情况覆盖与预测集效率;留一站点敏感性分析确定n0=19,在2.0倍拉伸下实现2.7/20的违规。我们进一步表明,覆盖预算的直接拉格朗日优化失败,将风险集中在脆弱的医院,并且有限样本修正项是必不可少的:移除它会使违规增加三倍。在所述站点混合假设下,边际CRC保证通过构造得以保留;在三个种子下针对四个目标验证了每个站点的覆盖。没有患者级别的图像、掩膜或每体积分数离开任何站点。

英文摘要

Conformal risk control (CRC) provides distribution-free guarantees on segmentation quality by calibrating a prediction-set threshold on held-out data. In federated deployments, the standard approach pools calibration scores across sites into a single threshold. We provide the first quantification, on real multi-institutional brain tumor data (FeTS-2022, 1,251 subjects, 20 institutions), showing that this naive pooled CRC protects the average hospital but violates coverage at 40% of individual institutions, with the worst site exceeding the target false-negative rate by 7.8 percentage points. The naive alternative, per-site local CRC, largely restores coverage but inflates prediction sets by 83x, rendering them clinically useless. We propose a shrinkage-based federated CRC protocol: each site transmits only its empirical risk curve (G scalars) to a server, which computes a shrinkage-regularized threshold per site. A single hyperparameter n0 smoothly trades worst-case coverage for prediction-set efficiency; leave-one-site-out sensitivity analysis identifies n0=19, achieving 2.7/20 violations at 2.0x stretch. We further show that direct Lagrangian optimization of coverage budgets fails, concentrating risk on vulnerable hospitals, and that the finite-sample correction term is essential: removing it triples violations. The marginal CRC guarantee is preserved by construction under the stated site-mixture assumption; per-site coverage is validated across four targets with three seeds. No patient-level images, masks, or per-volume scores leave any site.
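A hedged sketch of what a shrinkage-regularized per-site threshold could look like is given below. The abstract does not state the shrinkage formula, so the convex combination with weight n_site / (n_site + n0) is an assumption. The finite-sample rule used here is the standard CRC correction (n / (n + 1)) * R + B / (n + 1) <= alpha for a loss bounded by B. The function name, the use of n_site in the correction, and the assumption that risk is non-increasing along the threshold grid are also illustrative.

```python
import numpy as np

def shrunk_threshold(site_risk, n_site, pooled_risk, n0, alpha, loss_bound=1.0):
    """Index of the first grid point whose corrected, shrunk risk is at most alpha.

    site_risk and pooled_risk are empirical risk curves on a shared grid of
    G thresholds. Returns None when no grid point satisfies the target.
    """
    site_risk = np.asarray(site_risk, dtype=float)
    pooled_risk = np.asarray(pooled_risk, dtype=float)
    # Assumed shrinkage: n0 acts like a pseudo-count pulling the site toward the pool.
    weight = n_site / (n_site + n0)
    shrunk = weight * site_risk + (1.0 - weight) * pooled_risk
    # Standard CRC finite-sample correction for a loss bounded by loss_bound.
    corrected = n_site / (n_site + 1.0) * shrunk + loss_bound / (n_site + 1.0)
    ok = np.flatnonzero(corrected <= alpha)
    return int(ok[0]) if ok.size else None
```

Under these assumptions, n0 = 0 reduces to per-site local CRC and a very large n0 approaches the pooled curve, which is the kind of trade-off between worst-case coverage and set size that the abstract attributes to n0.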

2606.20382 2026-06-19 cs.LG 新提交

Towards Modality-imbalanced Federated Graph Learning: A Data Synthesis-based Approach

面向模态不平衡的联邦图学习:一种基于数据合成的方法

Zhengyu Wu, Hongchao Qin, Xunkai Li, Zekai Chen, Rong-Hua Li, Guoren Wang

AI总结 针对联邦图学习中客户端级和节点级模态不平衡问题,提出隐式图感知潜在语义表示合成范式FedMGS,通过可用性感知图编码器、原型引导语义合成器和可靠性校准融合机制恢复缺失模态语义,在四个任务上最高提升17.41%。

AI中文摘要

多模态联邦图学习(MM-FGL)提供了一种自然的协作训练范式,但其实际部署受到两种粒度的模态不平衡挑战。当某些客户端缺少完整模态时,会出现客户端级不平衡;而当单个节点缺少视觉或文本属性时,会出现节点级不平衡。尽管存在一些相关研究,但我们的调查表明,它们主要针对图无关或集中式场景,难以直接适应。为了解决这些挑战,我们将模态不平衡的MM-FGL形式化为一个隐式图感知潜在语义表示合成问题。该范式直接在表示空间中恢复缺失的模态语义,从而最大化与原始数据语义分布的对齐,并缓解由缺失模态引起的高方差。为此,我们提出了FedMGS(联邦模态感知图合成),它集成了三个核心组件。可用性感知图编码器防止缺失模态污染局部结构传播。原型引导潜在语义合成器为不可用模态建立跨客户端语义锚点。可靠性校准语义融合机制在预测读出之前调节恢复的潜在表示的影响。在四个任务上的大量实验表明,FedMGS始终优于竞争基线,最高提升17.41%,并实现了最佳效率-性能权衡。

英文摘要

Multimodal Federated Graph Learning (MM-FGL) offers a natural collaborative training paradigm, but its practical deployment is challenged by two granularities of modality imbalance. Client-level imbalance occurs when certain clients lack entire modalities, while node-level imbalance occurs when individual nodes exhibit missing visual or textual attributes. While several relevant studies exist, our investigation reveals that they predominantly target graph-agnostic or centralized scenarios, rendering them difficult to adapt directly. To address these challenges, we formalize modality-imbalanced MM-FGL as an implicit graph-aware latent semantic representation synthesis problem. This paradigm recovers missing modal semantics directly within the representation space, thereby maximizing alignment with the original data's semantic distribution and mitigating the high variance induced by missing modalities. To this end, we propose FedMGS (Federated Modality-aware Graph Synthesis), which integrates three core components. The availability-aware graph encoder prevents missing modalities from contaminating local structural propagation. The prototype-guided latent semantic synthesizer establishes cross-client semantic anchors for unavailable modalities. The reliability-calibrated semantic fusion mechanism regulates the impact of recovered latent representations prior to predictive readout. Extensive experiments on four tasks show that FedMGS consistently outperforms competitive baselines, with gains of up to 17.41% and the best efficiency-performance trade-off.

2606.20546 2026-06-19 cs.LG 新提交

Predictability as a Fine-Grained Measure for Privacy

可预测性作为隐私的细粒度度量

Linda Lu, Karthik Sridharan

AI总结 提出可预测性框架,通过攻击者预测敏感信息的能力增益来衡量隐私泄露,与差分隐私互补,并基于广义矩方法分析渐近可预测性,用于ERM输出扰动。

AI中文摘要

差分隐私(DP)确保针对最知识渊博的攻击者的严格个体级隐私保证,但其最坏情况性质可能导致代价高昂的隐私-准确性权衡。我们引入了通过可预测性实现的隐私,这是一个细粒度框架,明确包含了攻击者的核心知识、由随机过程生成的数据集的受损部分以及指定的查询族。可预测性将隐私泄露衡量为攻击者在观察算法输出后,预测关于未知个体的敏感信息的能力的增量增益,超出已从受损数据中推断出的信息。我们表明,可预测性和DP通常是不可比的:一个可以很小而另一个很大。然而,在最坏情况下,当除一个个体外所有个体都受损且所有二元查询都被视为敏感时,可预测性意味着互信息DP。更一般地,可预测性提供了一种针对特定敏感信息和特定攻击者模型量身定制的更细粒度的隐私度量。我们引入了一个通用框架,使用广义矩方法(GMM),来分析当受损数据由平稳、遍历、混合过程生成时的渐近可预测性。利用这一分析,我们推导出用于ERM的可预测性校准输出扰动方案。我们的方法与DP互补,并且可以与DP一起使用以提供细粒度的隐私控制。

英文摘要

Differential privacy (DP) ensures rigorous individual-level privacy guarantees against even the most knowledgeable attackers, but its worst-case nature can impose a costly privacy-accuracy tradeoff. We introduce privacy via predictability, a fine-grained framework that explicitly incorporates the attacker's core knowledge, a compromised portion of the dataset generated by a stochastic process, and a specified family of queries. Predictability measures privacy leakage as the incremental gain in an attacker's ability to predict sensitive information about unknown individuals after observing the algorithm's output, beyond what can already be inferred from the compromised data. We show that predictability and DP are generally incomparable: each can be small while the other is large. However, in the worst-case regime where all but one individual is compromised, and all binary queries are considered sensitive, predictability implies mutual-information DP. More generally, predictability provides a finer-grained privacy metric tailored to specific sensitive information and specific attacker models. We introduce a general framework, using the generalized method of moments (GMM), to analyze asymptotic predictability when the compromised data is generated by a stationary, ergodic, mixing process. Using this analysis, we derive a predictability-calibrated output perturbation scheme for ERM. Our approach is complementary to DP and can be used alongside DP to provide fine-grained privacy control.

8. 鲁棒性、不确定性与可信学习 6 篇

2606.19404 2026-06-19 cs.LG cs.CL 新提交

Thermodynamic Signatures of Reasoning: Free-Energy and Spectral-Form-Factor Diagnostics for Hallucination Detection in Large Language Models

推理的热力学特征:用于大型语言模型幻觉检测的自由能和谱形因子诊断

Salim Khazem

发表机构 * Talan Research & Innovation Center(Talan研究与创新中心)

AI总结 提出自由能签名(Fes)作为谱描述符,将注意力拉普拉斯视为哈密顿量并提取热力学势和随机矩阵理论谱形因子,用于检测LLM幻觉,无需训练即可实现高AUROC。

AI中文摘要

大型语言模型(LLM)中的幻觉检测对部署至关重要,近期研究表明注意力导出的图拉普拉斯谱携带关于推理质量的强信号。然而,先前的谱诊断仅通过少数特征值或手工选取的标量来总结拉普拉斯谱,忽略了其大部分结构。我们提出自由能签名(Fes),一种谱描述符,将每层的注意力拉普拉斯视为哈密顿量,并提取其热力学势(配分函数、自由能、谱熵、热容)以及随机矩阵理论(RMT)谱形因子。我们证明了三个结果:(i)Fes在注意力扰动下的Lipschitz稳定性;(ii)一个表达性结果,表明Fes丰富了有限谱摘要,并在明确的规则性和网格分辨率假设下逼近矩导出的谱泛函;(iii)基于Fes构建的无训练检测器AUROC的有限样本PAC界。实验上,在六个开源LLM和六个基准测试中,基于Fes描述符的轻量级探测在注意力谱基线中实现了最强的平均AUROC,相比LapEig平均提高+6.5 AUROC点,相比GoR-4平均提高+2.4点,且无需更新底层LLM。在完全无监督设置下,RMT偏差得分达到平均AUROC 0.71,提供了一个无标签但较弱的检测器。互补的RMT分析表明,正确生成表现出更接近Wigner-Dyson的谱统计,而幻觉表现出更接近Poisson的统计。匿名代码和配置在补充材料中提供。

英文摘要

Hallucination detection in large language models (LLMs) is deployment-critical, and recent work shows that the spectrum of attention-derived graph Laplacians carries strong signal about reasoning quality. Prior spectral diagnostics, however, summarize the Laplacian spectrum by a handful of eigenvalues or hand-picked scalars, leaving most of its structure unused. We propose Free-Energy Signatures (Fes), a spectral descriptor that treats each layer's attention Laplacian as a Hamiltonian and extracts its thermodynamic potentials (partition function, free energy, spectral entropy, heat capacity) together with the random-matrix-theory (RMT) spectral form factor. We prove three results: (i) Lipschitz stability of Fes under attention perturbation; (ii) an expressiveness result showing that Fes enriches finite spectral summaries and approximates moment-derived spectral functionals under explicit regularity and grid-resolution assumptions; and (iii) a finite-sample PAC bound on the AUROC of a training-free detector built from Fes. Empirically, across six open-weight LLMs and six benchmarks, a lightweight probe on Fes descriptors achieves the strongest aggregate AUROC among attention-spectral baselines, improving over LapEig by $+6.5$ AUROC points and over GoR-4 by $+2.4$ points on average, while requiring no update to the underlying LLM. In the fully unsupervised setting, an RMT-deviation score achieves mean AUROC $0.71$, providing a label-free but weaker detector. A complementary RMT analysis shows that correct generations exhibit more Wigner-Dyson-like spectral statistics, whereas hallucinations exhibit more Poisson-like statistics. The anonymized code and config are provided in the supplementary material.

2606.19569 2026-06-19 cs.LG 新提交

On the QUEST for Uncertainty Quantification via Highest Density Regions

论基于最高密度区域的量化不确定性探索

Sam Goring, Tom Kuipers, Nicola Paoletti, David S. Watson

发表机构 * Northeastern University London(东北大学伦敦校区)

AI总结 针对概率机器学习中回归问题的不确定性量化,提出基于最高密度区域体积的QUEST框架,满足单调性和平移不变性公理,在选择性预测基准上优于方差和微分熵。

Comments 27 pages, of which 10 are main text. Contains 7 figures, 4 tables, 1 algorithm in total

AI中文摘要

不确定性量化(UQ)对于概率机器学习中安全关键应用的可靠决策至关重要。对于回归问题,主流的标量不确定性量化方法(特别是基于适当评分规则的方法)通过逐点预测风险来衡量不确定性。当目标统计量不是条件期望时,这可能导致反直觉的结果。我们提出了一种替代框架,其中不确定性由分布支撑集中最可能子集的体积来刻画。QUEST(通过最高密度区域量化不确定性)是一种新颖的不确定性量化方法,它基于勒贝格测度在分布峰值处的集中程度,并在鲁棒性参数$\alpha$的一个或多个取值处进行评估。我们建立了我们的度量与信息论和经济学中经典统计量之间的联系。我们表明,与基于适当评分规则的流行替代方案不同,QUEST的认知不确定性和偶然不确定性度量满足从不确定性量化文献中改编的一组公理,包括在分布扩散下的单调性和对位置偏移的不变性。选择性预测基准证实,与方差和微分熵等标准度量相比,QUEST表现更优。

英文摘要

Uncertainty quantification (UQ) is essential for reliable decision-making in safety-critical applications in probabilistic machine learning. For regression problems, dominant scalar UQ approaches - notably, those based on proper scoring rules - measure uncertainty via pointwise predictive risk. This can lead to counterintuitive results when the target statistic is not the conditional expectation. We propose an alternative framework, in which uncertainty is characterised by the volume of the most probable subset of a distribution's support. QUEST (Quantifying Uncertainty via highest dEnSiTy regions) is a novel approach to UQ based on the concentration of Lebesgue measure at a distribution's peak(s), evaluated at one or more values of a robustness parameter $α$. We establish connections between our measures and classical statistics from information theory and economics. We show that, unlike popular alternatives based on proper scoring rules, QUEST measures of epistemic and aleatoric uncertainty satisfy a set of axioms adapted from the UQ literature, including monotonicity under distributional spread and invariance to location shifts. Selective prediction benchmarks confirm that QUEST performs favourably against standard measures such as variance and differential entropy.
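The notion of measuring uncertainty by the volume of a highest density region can be sketched on a one-dimensional grid. This is an illustrative computation, not the QUEST estimator. Reading the robustness parameter as a coverage level 1 - alpha, the grid discretization, and the function name are assumptions.

```python
import numpy as np

def hdr_volume(density, dx, alpha):
    """Length of the smallest grid region holding at least 1 - alpha of the mass.

    density holds density values on a uniform grid with spacing dx. Cells are
    added in order of decreasing density until the target mass is reached.
    """
    mass = np.sort(np.asarray(density, dtype=float))[::-1] * dx  # densest cells first
    cells = np.searchsorted(np.cumsum(mass), 1.0 - alpha) + 1
    return min(cells, mass.size) * dx

def normal_pdf(x, mu=0.0, sigma=1.0):
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))
```

For a Gaussian, the 95% region is the central interval of half-width about 1.96 standard deviations, so the volume grows with the spread and is unchanged by a shift of the mean. This matches the monotonicity and location-invariance axioms mentioned in the abstract.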

2606.19603 2026-06-19 cs.LG 新提交

Comparing Linear Probes with Mahalanobis Cosine Similarity

比较线性探针与马氏余弦相似度

Zhuofan Josh Ying, Peter Hase, Nikolaus Kriegeskorte

发表机构 * Columbia University(哥伦比亚大学) Stanford University(斯坦福大学) Schmidt Sciences(施密特科学)

AI总结 研究证明马氏余弦相似度与OOD AUROC存在线性关系,提供理论解释并验证其作为线性探针比较指标的有效性。

Comments 16 pages, 10 figures

AI中文摘要

线性探针广泛用于可解释性研究,并常通过余弦相似度进行比较。两个方向之间的马氏余弦相似度(MCS)通过测试数据协方差重新加权内积,是一种自然的任务感知改进。Ying等人(2026)报告称,探针与在分布外(OOD)数据上训练的参考探针的MCS近乎完美地线性预测了该探针的OOD AUROC(R^2 = 0.98)。在这里,我们将这一实证发现扩展到不同模型、层和概念领域,并以封闭形式证明了这一普遍现象:对于投影为高斯分布的平衡类别,OOD AUROC与参考探针的MCS是线性的,因为两者都是探针在测试数据上信噪比(SNR)的S形函数。该理论还预测了这种线性何时失效,我们通过实验验证了这一点。MCS为比较线性探针提供了有理论依据且经验有效的替代方案,优于欧几里得余弦相似度。

英文摘要

Linear probes are widely used in interpretability research and often compared by cosine similarity. The Mahalanobis cosine similarity (MCS) between two directions, which reweights the inner product by test data covariance, is a natural task-aware refinement. Ying et al. (2026) report that a probe's MCS to a reference probe trained on the out-of-distribution (OOD) data near-perfectly linearly predicts the probe's OOD AUROC (R^2 = 0.98). Here, we extend this empirical finding across models, layers, and concept domains, and prove this general phenomenon in closed form: For balanced classes whose projections are Gaussian, OOD AUROC and MCS to the reference probe are linear because both are sigmoid-shaped functions of the probe's signal-to-noise ratio (SNR) on the test data. The theory also predicts when this linearity fails, which we verify empirically. MCS offers a theoretically grounded and empirically effective alternative to Euclidean cosine similarity for comparing linear probes.
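One way to read "reweights the inner product by test data covariance" is as a cosine similarity under the inner product u^T Σ v. The sketch below follows that reading. The exact definition used in the paper is not given in the abstract, so the formula and the function name are assumptions.

```python
import numpy as np

def mahalanobis_cosine(u, v, cov):
    """Cosine similarity of u and v under the inner product <u, v> = u^T cov v."""
    u, v, cov = (np.asarray(a, dtype=float) for a in (u, v, cov))
    return float(u @ cov @ v / np.sqrt((u @ cov @ u) * (v @ cov @ v)))
```

With the identity as covariance this reduces to the Euclidean cosine. With the empirical covariance of the test data it equals the Pearson correlation between the two probes' scalar outputs on that data, which is why two directions that look similar in Euclidean terms can still disagree on high-variance features.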

2606.19818 2026-06-19 cs.LG cs.AI 新提交

Uncertainty-Aware Reward Modeling for Stable RLHF

不确定性感知的奖励建模用于稳定的RLHF

Licheng Pan, Haocheng Yang, Haoxuan Li, Yichen Sun, Yunsheng Lu, Shijian Wang, Lei Shen, Yuan Lu, Zhixuan Chu, Hao Wang

发表机构 * Zhejiang University(浙江大学) Peking University(北京大学) National University of Singapore(新加坡国立大学)

AI总结 提出不确定性感知奖励建模(UARM),通过分位数保形预测校准不确定性并利用异方差方差分解重加权GRPO优势,以缓解奖励黑客问题,提升对齐质量。

AI中文摘要

从人类反馈中强化学习(RLHF)通过在偏好数据上训练奖励模型并优化策略以最大化预测奖励来对齐大型语言模型。然而,该流程面临两个基本挑战:(1)奖励模型无法在预测不可靠时发出信号,因为它们通常充当确定性点估计器;(2)现代基于组的策略优化可能放大不可靠的奖励信号,例如GRPO在优势计算中对奖励的统一处理。随着策略探索越来越多样化的响应,这两个限制造成了一个关键漏洞:不可靠的奖励估计可能被赋予不成比例的影响力,引发严重的奖励黑客问题。我们提出不确定性感知奖励建模(UARM),通过基于分位数的保形预测为奖励模型配备校准的不确定性,并通过异方差方差分解重加权GRPO优势。在HelpSteer、UltraFeedback和PKU-SafeRLHF上的实验表明,与标准GRPO和不确定性无关的基线相比,UARM显著改善了奖励模型校准,减少了奖励黑客问题,并增强了下游对齐质量。

英文摘要

Reinforcement learning from human feedback (RLHF) aligns large language models by training reward models on preference data and optimizing policies to maximize predicted rewards. However, this pipeline faces two fundamental challenges: (1) reward models cannot signal when their predictions are unreliable, since they usually act as deterministic point estimators; and (2) modern group-based policy optimization can amplify unreliable reward signals, as exemplified by GRPO's uniform treatment of rewards during advantage computation. As policies explore increasingly diverse responses, these two limitations create a critical vulnerability: unreliable reward estimates may be granted disproportionate influence, triggering severe reward hacking. We propose Uncertainty-Aware Reward Modeling (UARM), which equips reward models with calibrated uncertainty via quantile-based conformal prediction and reweights GRPO advantages through heteroscedastic variance decomposition. Experiments across HelpSteer, UltraFeedback, and PKU-SafeRLHF demonstrate that UARM significantly improves reward model calibration, reduces reward hacking, and enhances downstream alignment quality compared to standard GRPO and uncertainty-agnostic baselines.

2606.20415 2026-06-19 cs.LG 新提交

Pseudo-Feature Padding: A Lightweight Defense Against False Data Injection in Power Grids

伪特征填充:一种针对电网虚假数据注入的轻量级防御方法

Farhin Farhad Riya, Shahinul Hoque, Yingyuan Yang, Jinyuan Sun, Kevin Tomsovic

发表机构 * University of Tennessee(田纳西大学) The University of Illinois at Springfield(伊利诺伊大学斯普林菲尔德分校) Clemson University(克莱姆森大学)

AI总结 提出一种轻量级防御框架,通过基于输入统计分布的伪特征填充增加输入维度,使对抗攻击因扰动不可转移和填充结构不可预测而计算不可行,显著提升深度神经网络在电网状态估计中的鲁棒性。

AI中文摘要

深度神经网络(DNN)在各种任务中取得了显著的准确性,包括在信息物理系统(CPS)中用于检测关键操作期间的虚假数据注入攻击(FDIA)。然而,CPS的独特基础设施使得DNN容易受到攻击者的利用,以逃避检测。此外,CPS的独特性质对传统的FDIA防御机制提出了挑战。本文提出了一种创新的防御框架,通过引入一个额外的输入层,该层使用从输入统计分布中导出的伪特征值对输入样本进行填充,从而增强DNN抵御此类攻击的能力。这种填充以随机化和数据感知的方式增加了输入维度,使得由于精心设计的扰动的不可转移性和填充结构的不可预测性,对抗攻击在计算上变得不可行。我们的方法轻量级、与模型无关,并且不需要对核心架构进行修改,使其在现实世界的CPS环境中高度可部署。我们在关键电网应用(如使用IEEE 14节点、30节点、118节点和300节点系统的状态估计)上评估了我们的框架。对抗性设置下的实验表明,我们的填充策略显著提高了模型的鲁棒性,对性能的影响可以忽略不计,并有效缓解了原本会绕过传统防御的攻击。

英文摘要

Deep Neural Networks (DNNs) have achieved remarkable accuracy in various tasks, including their application in Cyber-Physical Systems (CPS) for detecting False Data Injection Attacks (FDIA) during critical operations. However, the unique infrastructure of CPS makes DNNs vulnerable to exploitation by attackers aiming to evade detection. Additionally, the distinct nature of CPS presents challenges for conventional defense mechanisms against FDIA. This paper proposes an innovative defense framework that strengthens DNNs against such attacks by introducing an additional input layer that performs padding in the input samples using pseudo-feature values derived from the input's statistical distribution. This padding increases the input dimensionality in a randomized and data-aware manner, making adversarial attacks computationally infeasible due to the non-transferable nature of crafted perturbations and the unpredictability of the padded structure. Our method is lightweight, model-agnostic, and requires no modifications to the core architecture, making it highly deployable in real-world CPS settings. We evaluated our framework on critical power grid applications such as state estimation using the IEEE 14-bus, 30-bus, 118-bus, and 300-bus systems. Experiments under adversarial settings demonstrate that our padding strategy significantly improves model robustness with negligible impact on performance and effectively mitigates attacks that would otherwise bypass conventional defenses.
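A rough sketch of a randomized, data-aware padding layer is shown below. The abstract does not specify how slot positions or pseudo-feature values are chosen, so everything here is hypothetical: the secret `layout_seed` that fixes where padding goes, the Gaussian draw parameterized by a mean and standard deviation estimated from training inputs, and the function name.

```python
import numpy as np

def pad_with_pseudo_features(x, k, mean, std, layout_seed, value_rng):
    """Insert k pseudo-features into a 1-D sample at positions fixed by a secret seed.

    Returns the padded vector and a boolean mask marking the padded slots.
    """
    d = x.shape[0]
    # The layout depends only on the seed, so the defender can reproduce it.
    slots = np.random.default_rng(layout_seed).choice(d + k, size=k, replace=False)
    is_pad = np.zeros(d + k, dtype=bool)
    is_pad[slots] = True
    out = np.empty(d + k)
    out[~is_pad] = x                                      # real measurements keep their order
    out[is_pad] = value_rng.normal(mean, std, size=k)     # fresh pseudo-features per sample
    return out, is_pad
```

In this sketch the detector would be trained on the padded dimensionality, while an attacker who crafts perturbations for the unpadded input does not know which coordinates carry real measurements.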

2606.20557 2026-06-19 cs.LG math.ST stat.ML stat.TH 新提交

Optimal Deterministic Multicalibration and Omniprediction

最优确定性多校准与全预测

Georgy Noarov, Aaron Roth

发表机构 * University of Pennsylvania(宾夕法尼亚大学)

AI总结 本文提出一种输出确定性预测器的算法,达到多校准的极小化最优样本复杂度,并推广到结果不可区分性,从而解决了实现最优样本复杂度是否必须依赖随机化这一开放问题。

AI中文摘要

一个模型在一组群体权重 $G$ 上是多校准的,如果它是校准的——即即使以其预测为条件也是无偏的——不仅整体上,而且在通过每个 $g \in G$ 对上下文重新加权后也是如此。这对于许多下游应用是一个有用的性质,也是可信机器学习的基本要求。在这项工作之前,所有已知达到 $\varepsilon$-多校准的极小化最优 $\widetilde O(\varepsilon^{-3})$ 样本复杂度的预测器都是随机化的,而确定性预测器仅以更差的样本复杂度已知。多校准中随机化对于最优样本复杂度是否必要的问题由 [CLNR26] 明确提出,并在之前的几项工作中隐含提出。我们通过给出一个输出确定性预测器的极小化最优多校准算法解决了这个开放问题。然后我们将该算法推广到产生满足关于有限或有限覆盖测试集合的结果不可区分性(OI)的最优确定性预测器。作为一个应用,这也给出了具有最优样本复杂度的确定性全预测器和泛预测器,解决了 [OKK25] 和 [BHHLZ25] 提出的开放问题。

英文摘要

A model is multicalibrated on a collection of group weights $G$ if it is calibrated -- i.e. unbiased even conditional on its prediction -- not just overall, but also after reweighting contexts by each $g \in G$. It is a useful property for many downstream applications and is a basic desideratum of trustworthy machine learning. Before this work, all predictors known to attain the minimax-optimal $\widetilde O(\varepsilon^{-3})$ sample complexity rate for $\varepsilon$-multicalibration were randomized, while deterministic predictors were known only with substantially worse sample complexity. Whether randomization is necessary for optimal sample complexity in multicalibration was explicitly asked by [CLNR26] and implicitly in several prior works. We resolve this open problem by giving a minimax-optimal multicalibration algorithm that outputs a deterministic predictor. We then generalize the algorithm to produce optimal deterministic predictors that satisfy outcome indistinguishability (OI) with respect to finite or finitely covered collections of tests. As an application, this also gives deterministic omnipredictors and panpredictors with optimal sample complexity, resolving open problems posed by [OKK25] and [BHHLZ25].

9. 图学习与结构化数据 4 篇

2606.19374 2026-06-19 cs.LG cs.AI 新提交

Protein Representation Learning with Secondary-Structure and Energy-Filtered Hydrogen-Bond Graphs

基于二级结构和能量过滤氢键图的蛋白质表示学习

Mohamed Mouhajir, Limei Wang, El Houcine Bergou, Hajar El Hammouti, Lamiae Azizi, Dongqi Fu

发表机构 * College of Computing, UM6P(穆罕默德六世理工大学计算机学院)

AI总结 提出一种二级结构感知的图神经网络,通过增强残基节点表示并基于能量过滤的氢键构建边,以捕获局部结构上下文和长程耦合,在蛋白质基准上取得一致改进并增强生物学可解释性。

Journal ref The 25th International Workshop on Data Mining in Bioinformatics (BIOKDD 2026)

详情
AI中文摘要

基于图的表示被广泛用于蛋白质建模,然而许多现有方法主要依赖序列邻接或几何邻近,这仅部分反映了控制蛋白质折叠的原理。蛋白质实际上采用围绕二级结构元素(如α-螺旋和β-折叠)组织的复杂三维构象,这些元素编码了重复的局部基序和稳定的氢键相互作用。在这项工作中,我们引入了一种二级结构感知的图神经网络用于蛋白质表示学习。残基级别的节点表示通过二级结构分配得到增强,图边由经过能量强度过滤的氢键相互作用构建。这种设计使模型能够捕获对蛋白质稳定性和功能至关重要的局部结构上下文和长程耦合。我们在常用的蛋白质基准上评估了所提出的方法,并观察到相对于现有基于图的方法的一致改进。此外,生成的图表示提供了增强的生物学可解释性,因为学习到的连接性与已建立的结构基序一致。这些发现表明,融入二级结构和能量过滤的氢键拓扑为蛋白质表示学习提供了有效的归纳偏置。代码发布在 https://github.com/mohamedmohamed2021/SSProNet。

英文摘要

Graph-based representations are widely used in protein modeling, yet many existing approaches rely primarily on sequence adjacency or geometric proximity, which only partially reflect the principles governing protein folding. Proteins instead adopt complex three-dimensional conformations organized around secondary structure elements, such as $α$-helices and $β$-sheets, which encode recurring local motifs and stabilizing hydrogen-bond interactions. In this work, we introduce a secondary-structure-aware graph neural network for protein representation learning. Residue-level node representations are augmented with secondary structure assignments, and graph edges are constructed from hydrogen-bond interactions filtered by their energetic strength. This design enables the model to capture both local structural context and long-range couplings that are central to protein stability and function. We evaluate the proposed approach on commonly used protein benchmarks and observe consistent improvements over existing graph-based methods. In addition, the resulting graph representations offer enhanced biological interpretability, as the learned connectivity aligns with established structural motifs. These findings suggest that incorporating secondary structure and energy-filtered hydrogen-bond topology provides an effective inductive bias for protein representation learning. The code is released at https://github.com/mohamedmohamed2021/SSProNet

2606.19825 2026-06-19 cs.LG 新提交

Enhancing Graph Neural Networks Using Proximity Graphs for Dust Source Emission Forecasting

利用邻近图增强图神经网络用于沙尘源排放预测

Maryam Sanisales, Zahed Rahmati, Ali Darvishi Boloorani, Ali Vefghi

发表机构 * Amirkabir University of Technology(阿米尔卡比尔理工大学) University of Tehran(德黑兰大学)

AI总结 提出使用Delaunay三角剖分等邻近图作为图神经网络输入,通过消息传递捕捉沙尘源排放的时空动态,相比随机图和LSTM模型显著提升预测精度。

详情
AI中文摘要

准确预测沙尘源排放对于减轻沙尘暴带来的重大环境和健康危害至关重要。传统预测方法通常难以捕捉这些现象的复杂时空动态。在本文中,我们证明邻近图使图神经网络(GNN)能够有效建模数据点之间复杂的空间和时间关系。具体来说,我们使用邻近图——如Delaunay三角剖分、Gabriel图、k-最近邻图和Yao图——作为GNN(包括GraphSAGE、图卷积网络和图注意力网络)的输入来执行消息传递。我们的方法强调了将邻近图与GNN集成用于稳健准确的沙尘源预测的有效性。为了强调邻近图表示的重要性,我们将我们的方法与使用随机图进行消息传递的GNN进行了比较。结果表明,使用邻近图的GNN显著优于使用随机图的GNN,并且在沙尘源排放预测中也远优于长短期记忆(LSTM)模型。

英文摘要

Accurate prediction of dust source emissions is critical for mitigating the significant environmental and health hazards posed by dust storms. Traditional forecasting methods often struggle to capture the complex spatiotemporal dynamics of these phenomena. In this paper, we demonstrate that proximity graphs enable Graph Neural Networks (GNNs) to effectively model the intricate spatial and temporal relationships between data points. Specifically, we use proximity graphs--such as the Delaunay triangulation, Gabriel graph, k-Nearest Neighbor graph, and Yao graph--as the input for GNNs (including GraphSAGE, Graph Convolutional Networks, and Graph Attention Networks) to perform message passing. Our approach highlights the effectiveness of integrating proximity graphs with GNNs for robust and accurate dust source forecasting. To emphasize the importance of proximity graph representations, we compare our method against GNNs using random graphs for message passing. The results show that GNNs with proximity graphs significantly outperform those with random graphs and also far outperform a Long Short-Term Memory (LSTM) model in dust source emission forecasting.
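
To illustrate what a proximity graph supplies to a GNN, here is a minimal pure-Python sketch of two of the constructions named above: the k-nearest-neighbor graph and the Gabriel graph. The point coordinates and function names are hypothetical, and the quadratic and cubic loops are for clarity only; the paper's actual pipeline and libraries are not described in the abstract.

```python
from itertools import combinations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def knn_edges(points, k):
    """Undirected edges linking each point to its k nearest neighbors."""
    edges = set()
    for i, p in enumerate(points):
        others = sorted(
            (j for j in range(len(points)) if j != i),
            key=lambda j: (sq_dist(p, points[j]), j),  # ties broken by index
        )
        for j in others[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def gabriel_edges(points):
    """Edges (i, j) whose diameter disk contains no other point.

    A third point r lies strictly inside that disk exactly when
    d(i, r)^2 + d(j, r)^2 < d(i, j)^2.
    """
    edges = set()
    for i, j in combinations(range(len(points)), 2):
        d_ij = sq_dist(points[i], points[j])
        if all(
            sq_dist(points[i], points[r]) + sq_dist(points[j], points[r]) >= d_ij
            for r in range(len(points))
            if r not in (i, j)
        ):
            edges.add((i, j))
    return edges


# Hypothetical station coordinates.
stations = [(0, 0), (4, 0), (2, 1), (10, 10)]
knn = knn_edges(stations, k=1)
gabriel = gabriel_edges(stations)
```

Either edge set can then serve as the adjacency structure over which a GNN passes messages, in contrast to a random graph that ignores spatial layout.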

2606.19956 2026-06-19 cs.LG 新提交

Towards Graph-Based Deep Learning for Map Generalization: Insights from Building Footprints Simplification and Aggregation

基于图深度学习的制图综合:来自建筑足迹简化和聚合的见解

Yanning Wang, Zhiyong Zhou, Zhouyu Liu, Mengni Yu, Yu Feng

发表机构 * The Hong Kong University of Science and Technology (Guangzhou)(香港科技大学(广州)) Zhejiang University(浙江大学) Mainz University of Applied Sciences(美因茨应用科学大学)

AI总结 本研究首次探索将图深度学习应用于建筑足迹简化(节点移动预测)和聚合(链接预测),评估了GCN、GAT和GraphSAGE等架构,发现GraphSAGE在链接预测上表现较好,但节点移动预测仍具挑战,且聚合比简化更复杂。

Comments 15 pages, 20 figures, 10 tables

详情
AI中文摘要

制图综合仍然是制图学的基本任务之一,特别是对于复杂建筑足迹的简化和聚合。本研究首次探索将基于图的深度学习应用于这两项任务,在统一的图学习框架中将简化重新表述为节点移动预测,将聚合重新表述为链接预测。我们在多尺度建筑数据集上评估了代表性的图神经网络架构(GCN、GAT和GraphSAGE),结果表明GraphSAGE在链接预测准确性方面表现出相对优势,同时也揭示了精确节点移动预测中持续存在的挑战。除了定量性能外,结果还强调聚合比简化带来更大的复杂性和挑战,突显了当前深度学习方法在制图综合中捕捉更高层次空间关系的困难。尽管存在数据不平衡和需要后处理等局限性,该研究为利用深度学习方法推进自动化制图综合提供了宝贵的见解和方法方向。

英文摘要

Map generalization remains one of the fundamental tasks in cartography, especially for the simplification and aggregation of complex building footprints. This study presents the first exploratory application of graph-based deep learning to both tasks, reformulating simplification as node movement prediction and aggregation as link prediction within a unified graph learning framework. We evaluate representative graph neural network architectures (GCN, GAT, and GraphSAGE) on multi-scale building datasets, showing that GraphSAGE demonstrates relative strengths in link prediction accuracy, while also revealing persistent challenges in precise node movement prediction. Beyond quantitative performance, the results highlight that aggregation poses greater complexity and challenges than simplification, underscoring the difficulty of capturing higher-level spatial relationships in map generalization with current deep learning approaches. Although limitations such as data imbalance and the need for post-processing remain, the study provides valuable insights and methodological directions for advancing automated map generalization with deep learning approaches.

2606.20283 2026-06-19 cs.LG cs.AI 新提交

Boundary Embedding Shaping with Adaptive Contrastive Learning for Graph Structural Disentanglement

基于自适应对比学习的边界嵌入塑造用于图结构解缠

Jiaqing Chen, Zidu Yin, Yichao Cai, Yuhang Liu, Zhen Zhang, Dong Gong, Javen Qinfeng Shi

发表机构 * Yunnan Normal University(云南师范大学) Adelaide University(阿德莱德大学) The University of New South Wales(新南威尔士大学)

AI总结 针对图结构纠缠导致的分类性能下降,提出边界嵌入塑造模块,通过自适应对比学习选择性抑制决策边界处的虚假结构噪声,提升节点分类和链接预测精度。

Comments Accepted at ICML 2026

详情
AI中文摘要

图神经网络(GNN)在聚合邻居信息进行分类方面表现出色,但其性能受到图结构纠缠的阻碍,来自语义无关邻居的虚假相关污染了节点嵌入。这种挑战在嵌入空间中靠近类边界的节点最为严重,放大的结构噪声模糊了决策边界并破坏了预测的稳定性。现有的鲁棒GNN方法大多统一处理所有节点,忽略了边界脆弱性。本文中,为了提高分类性能,我们通过将边界区域纠缠识别为主要瓶颈来解决图结构解缠问题,并提出边界嵌入塑造(BES),一种自适应对比学习GNN插件模块,以最小的模型参数扰动选择性地抑制决策边界处的虚假结构噪声。大量实验表明,BES持续改善边界判别性,并优于现有领先方法。值得注意的是,BES在节点分类中平均提升GCN性能3.3%(在WikiCS上高达5.0%),并在链接预测中实现更优的准确率。

英文摘要

Graph neural networks (GNNs) excel at aggregating neighbor information for classification, yet their performance is hindered by graph structural entanglement, where spurious correlations from semantically irrelevant neighbors contaminate node embeddings. This challenge is most acute for nodes near class boundaries in the embedding space, where amplified structural noise blurs decision boundaries and destabilizes predictions. Existing robust GNN methods largely treat all nodes uniformly, ignoring boundary vulnerabilities. In this paper, to improve classification performance, we tackle graph structural disentanglement by identifying boundary-region entanglement as the primary bottleneck and propose Boundary Embedding Shaping (BES), an adaptive contrastive learning GNN plug-in module that selectively suppresses spurious structural noise at decision boundaries with minimal model parameter perturbation. Extensive experiments demonstrate that BES consistently improves boundary discrimination and outperforms existing leading methods. Notably, BES boosts GCN performance by an average of 3.3% in node classification (up to 5.0% on WikiCS) and achieves superior accuracy in link prediction.

10. 迁移、元学习与持续学习 3 篇

2606.19679 2026-06-19 cs.LG cs.AI 新提交

LOKI: Memory-Free Null-Space Constrained Lifelong Knowledge Editing

LOKI: 无记忆零空间约束的终身知识编辑

Masih Eskandar, Miquel Sirera Perelló, Stratis Ioannidis, Jennifer Dy

AI总结 提出LOKI方法,通过希尔伯特-施密特独立性准则动态选择层,并将梯度更新投影到模型权重的零空间,实现无需访问旧知识的终身知识编辑,平均准确率提升14%。

详情
AI中文摘要

终身知识编辑旨在随着时间推移,当新知识可用或模型出错时,高效且顺序地更新语言模型,同时保持对过去知识的可接受性能。一个未解决的挑战是现有方法对所有新知识样本修改固定层集,降低了灵活性并增加了灾难性遗忘。另一个挑战是需要访问先前知识并进行大量预处理以获得数据统计。为了解决这些挑战,我们引入了LOKI,一种新颖的方法,它基于希尔伯特-施密特独立性准则进行动态层选择,并将梯度更新投影到模型权重的零空间,从而绕过了对先前知识访问的需求。我们表明,LOKI在广泛的实验中实现了优于现有方法的性能,平均准确率提升高达14%。

英文摘要

Lifelong knowledge editing aims to efficiently and sequentially update language models over time, as new knowledge becomes available or when the model makes mistakes, while preserving acceptable performance on past knowledge. One unresolved challenge is that existing methods modify a fixed set of layers for all new knowledge samples, reducing flexibility and increasing catastrophic forgetting. Another is the need for access to previous knowledge and for extensive pre-processing to obtain data statistics. To address these challenges, we introduce LOKI, a novel approach that uses dynamic layer selection based on the Hilbert-Schmidt Independence Criterion and projects gradient updates onto the null-space of the model weights, bypassing the requirement for previous knowledge access. We show that LOKI achieves superior performance to existing approaches across a wide variety of experiments, achieving up to a 14% improvement in average accuracy.

2606.20431 2026-06-19 cs.LG 新提交

Sparsity, Superposition, and Forgetting: A Mechanistic Study of Representation Retention in Continual Learning

稀疏性、叠加与遗忘:持续学习中表示保持的机制研究

Jan Wasilewski, Jędrzej Kozal, Michał Woźniak, Bartosz Krawczyk

发表机构 * Rochester Institute of Technology(罗切斯特理工学院) Wrocław University of Science and Technology(弗罗茨瓦夫科技大学)

AI总结 通过可控玩具框架研究持续学习中的遗忘机制,发现叠加随时间增加但任务边界处有瞬降,高稀疏性增加叠加但不必然导致遗忘,任务级有效秩随稀疏性增长。

详情
AI中文摘要

持续学习(CL)系统常常遗忘先前获得的知识,但由于真实数据集纠缠了许多因素,遗忘的机制在实践中难以孤立。我们提出了一个可控的玩具世界框架,使这些机制可观察和可测试。使用合成生成器-分离器流水线,我们定义了真实潜在特征,构建了具有可调稀疏性和重叠的任务,并引入了表示强度和叠加(特征间的方向重叠)的可测量量。然后,我们通过拟合保留、叠加和暴露历史之间的稀疏动态关系(通过SINDy)来研究保留动态——表示强度的时间变化。基于有效秩的互补任务级分析表征了表示能力如何在任务间分配。我们的受控实验得出三个要点。(1)叠加随时间增加,在任务边界处有瞬降,表明边界特定的干扰而非稳定漂移。(2)更高的特征稀疏性导致更多叠加,但不必然引起遗忘;当表示保持强时,尽管重叠,遗忘可以减少。(3)任务级有效秩随稀疏性增长,表明在稀疏机制下更广泛的能力使用。这些结果共同细化了常见直觉——更多叠加导致更多遗忘,通过显示重叠与表示强度和能力分配相互作用。我们的玩具分析为CL提供了可证伪的假设和诊断工具。

英文摘要

Continual learning (CL) systems often forget previously acquired knowledge, yet the mechanisms driving forgetting remain hard to isolate in practice because real datasets entangle many factors. We present a controlled, toy-world framework that makes these mechanisms observable and testable. Using a synthetic generator-separator pipeline, we define ground-truth latent features, build tasks with tunable sparsity and overlap, and introduce measurable quantities for representation strength and superposition (directional overlap among features). We then study retention dynamics (the temporal change of representation strength) by fitting sparse dynamical relations (via SINDy) between retention, superposition, and exposure history. A complementary task-level analysis based on effective rank characterizes how representational capacity is allocated across tasks. Our controlled experiments yield three takeaways. (1) Superposition tends to increase over time with transient dips at task boundaries, suggesting boundary-specific interference rather than steady drift. (2) Higher feature sparsity induces more superposition yet does not inevitably cause forgetting; when representations remain strong, forgetting can be reduced despite overlap. (3) Task-level effective rank grows with sparsity, indicating broader capacity usage under sparse regimes. Together, these results nuance the common intuition that more superposition leads to more forgetting by showing that overlap interacts with representation strength and capacity allocation. Our toy analysis provides falsifiable hypotheses and diagnostic tools for CL.
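
Two of the quantities mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions: effective rank is taken as the exponential of the Shannon entropy of the normalized singular values (a common definition), and superposition is taken as the mean absolute cosine similarity between feature directions. The abstract does not give the paper's exact formulas, so both definitions and all names here are assumptions.

```python
import math


def effective_rank(singular_values):
    """exp(entropy) of the singular values normalized to sum to one."""
    total = sum(singular_values)
    probs = [s / total for s in singular_values if s > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return math.exp(entropy)


def mean_abs_cosine(features):
    """Average |cosine similarity| over all distinct pairs of feature vectors."""

    def cosine(u, v):
        dot = sum(a * b for a, b in zip(u, v))
        norm_u = math.sqrt(sum(a * a for a in u))
        norm_v = math.sqrt(sum(b * b for b in v))
        return dot / (norm_u * norm_v)

    pairs = [
        (i, j) for i in range(len(features)) for j in range(i + 1, len(features))
    ]
    return sum(abs(cosine(features[i], features[j])) for i, j in pairs) / len(pairs)


flat_spectrum = effective_rank([1.0, 1.0, 1.0, 1.0])    # capacity spread evenly
peaked_spectrum = effective_rank([1.0, 0.0, 0.0, 0.0])  # a single direction
no_overlap = mean_abs_cosine([(1.0, 0.0), (0.0, 1.0)])
full_overlap = mean_abs_cosine([(1.0, 0.0), (1.0, 0.0)])
```

A flat spectrum gives an effective rank equal to the number of directions, while a spectrum dominated by one singular value gives an effective rank of one; orthogonal features give zero overlap and identical directions give full overlap.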

2606.20538 2026-06-19 cs.LG 新提交

Multi-Task Bayesian In-Context Learning

多任务贝叶斯上下文学习

Qingyang Zhu, Eric Karl Oermann, Kyunghyun Cho

发表机构 * New York University(纽约大学)

AI总结 提出多任务上下文学习框架,通过将先验信息表示为上下文数据集前缀,训练Transformer实现分层贝叶斯预测推理,在多种分布偏移下匹配最优贝叶斯性能且速度提升数个数量级。

Comments ICML 2026

详情
AI中文摘要

贝叶斯预测推断为不确定性量化、数据效率和鲁棒泛化提供了原则性框架。然而,精确推断通常难以处理,可扩展近似可能仍计算昂贵或需要限制性建模假设,从而降低预测性能。先验数据拟合和上下文模型最近作为一种摊销替代方案出现,通过学习直接将数据集映射到预测分布,但现有方法与训练先验的支持紧密耦合,缺乏在测试时适应新先验的显式机制,导致在分布偏移下鲁棒性有限。我们引入了一个多任务上下文学习框架,用于摊销分层贝叶斯预测推断,该框架将先验信息显式表示为上下文数据集的前缀。一个在先验和目标任务序列上训练的Transformer学习跨先验族调整其预测。在一系列难度递增的评估中,包括元分布外先验和具有高维潜在结构的先验,我们的方法匹配了最优贝叶斯预测器,同时速度快了几个数量级。我们进一步在真实世界的时空温度预测基准上展示了其实用性。代码可在 https://github.com/martianmartina/multi-task-bayesian-icl/ 获取。

英文摘要

Bayesian predictive inference provides a principled framework for uncertainty quantification, data efficiency, and robust generalization. However, exact inference is often intractable, and scalable approximations may remain computationally expensive or require restrictive modeling assumptions that degrade predictive performance. Prior-Data Fitted and in-context models have recently emerged as an amortized alternative by learning to map datasets directly to predictive distributions, but existing approaches are tightly coupled to the support of the training prior and lack explicit mechanisms for adapting to new priors at test time, resulting in limited robustness under distribution shift. We introduce a multi-task in-context learning framework for amortized hierarchical Bayesian predictive inference that explicitly represents prior information as a prefix of in-context datasets. A transformer trained on sequences of prior and target tasks learns to adapt its predictions across families of priors. On a suite of evaluations with increasing difficulty, including out-of-meta-distribution priors and priors with high-dimensional latent structures, our method matches oracle Bayesian predictors while being orders of magnitude faster. We further demonstrate its practical relevance on a real-world spatiotemporal temperature prediction benchmark. Code is available at https://github.com/martianmartina/multi-task-bayesian-icl/.

11. 数据集、基准与评测 13 篇

2606.19411 2026-06-19 cs.LG 新提交

Spectral DPPs via NEPv: A Scalable Continuous Relaxation of Determinantal MAP for Diversity-Aware Data Selection

通过NEPv的谱DPP:用于多样性感知数据选择的确定性点过程MAP的可扩展连续松弛

Richard Yi Da Xu

发表机构 * Hong Kong Baptist University(香港浸会大学) TadReamk Limited(TadReamk有限公司)

AI总结 提出将NP难的DPP-MAP选择问题转化为Stiefel流形上的连续优化,通过非线性特征值问题(NEPv)的自洽场迭代实现近线性时间求解,适用于大规模数据选择。

详情
AI中文摘要

从海量候选池中选择一个小的、多样化的、高质量的子集是现代机器学习中的一个常见原语——用于训练和微调大型模型的数据整理和核心集选择、主动学习批次获取、上下文学习的提示和示例选择、检索多样化以及实验设计。确定性点过程(DPP)为此任务提供了原则性的、良好校准的多样性概念,但其MAP目标——选择大小为$k$的子集$S$最大化$\log\det(L_S)$——是NP难的,并且标准的贪心和采样算法在候选集大小$n$上具有超线性复杂度。这种成本在多样性最重要的数据为中心的场景中尤其高昂,其中$n$范围从数百万到数十亿的候选示例、特征或嵌入。我们将DPP-MAP重新表述为Stiefel流形上的连续优化问题,并证明其最优性条件构成一个先前未研究形式的具有特征向量依赖性的非线性特征值问题(NEPv)。该NEPv允许自洽场(SCF)迭代,具有基于谱间隙的局部收缩保证,从而提供了一个原则性的迭代求解器,其中多样性目标驱动一个特征向量依赖的算子。由此产生的算法OurMethod仅需要与核的矩阵-向量乘积,运行时间为$O\!\big((ndk+nk^2)\,t\big)$,其中迭代次数$t$很小,在$n$上接近线性,并直接与机器学习中常见的低秩和特征映射核集成。本文重点介绍松弛、求解器和扩展分析;完整的真实数据基准测试留给计划中的实证研究。

英文摘要

Selecting a small, diverse, high-quality subset from a massive pool of candidates is a recurring primitive in modern machine learning -- data curation and coreset selection for training and fine-tuning large models, active-learning batch acquisition, prompt and exemplar selection for in-context learning, retrieval diversification, and experimental design. Determinantal Point Processes (DPPs) give a principled, well-calibrated notion of diversity for this task, but their MAP objective -- pick a size-$k$ subset $S$ maximizing $\log\det(L_S)$ -- is NP-hard, and the standard greedy and sampling algorithms scale superlinearly in the ground-set size $n$. This cost is prohibitive precisely in the data-centric regime where diversity matters most, where $n$ ranges over millions to billions of candidate examples, features, or embeddings. We recast DPP-MAP as a continuous optimization problem over the Stiefel manifold, and show that its first-order optimality conditions form a Nonlinear Eigenvalue Problem with eigenvector dependency (NEPv) of a previously unstudied form. This NEPv admits a self-consistent field (SCF) iteration with a spectral-gap-based local contraction guarantee, giving a principled iterative solver where the diversity objective drives an eigenvector-dependent operator. The resulting algorithm, OurMethod, requires only matrix-vector products with the kernel and runs in time $O\!\big((ndk+nk^2)\,t\big)$ for a small number of iterations $t$, scaling near-linearly in $n$ and integrating directly with low-rank and feature-map kernels common in ML. This paper focuses on the relaxation, solver, and scaling analysis; full real-data benchmarking is left to a planned empirical study.
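
For orientation, the discrete objective that the paper relaxes can be written down directly. The sketch below evaluates $\log\det(L_S)$ and runs the naive greedy baseline that the abstract contrasts against; it is not the paper's Stiefel-manifold or SCF solver. The kernel values and function names are hypothetical, and the elimination routine assumes a symmetric positive semidefinite kernel.

```python
import math


def log_det(matrix):
    """Log-determinant via Gaussian elimination without pivoting.

    Intended for symmetric positive definite matrices; returns -inf when a
    pivot is numerically zero (a singular submatrix).
    """
    a = [row[:] for row in matrix]
    n = len(a)
    total = 0.0
    for col in range(n):
        pivot = a[col][col]
        if pivot <= 1e-12:
            return float("-inf")
        total += math.log(pivot)
        for row in range(col + 1, n):
            factor = a[row][col] / pivot
            for c in range(col, n):
                a[row][c] -= factor * a[col][c]
    return total


def submatrix(kernel, indices):
    """Principal submatrix L_S of the kernel for the index list S."""
    return [[kernel[i][j] for j in indices] for i in indices]


def greedy_dpp_map(kernel, k):
    """Naive greedy baseline: repeatedly add the item that maximizes log det(L_S)."""
    selected = []
    for _ in range(k):
        candidates = [i for i in range(len(kernel)) if i not in selected]
        best = max(
            candidates, key=lambda i: log_det(submatrix(kernel, selected + [i]))
        )
        selected.append(best)
    return selected


# Items 0 and 1 are near-duplicates (similarity 0.9); item 2 is unrelated to both.
kernel = [
    [1.0, 0.9, 0.0],
    [0.9, 1.0, 0.0],
    [0.0, 0.0, 1.0],
]
chosen = greedy_dpp_map(kernel, k=2)
```

The greedy pass skips the near-duplicate and selects the unrelated item, which is the diversity behavior the determinant rewards. Each step here recomputes determinants from scratch, so this baseline is far more expensive than the near-linear scaling the abstract reports for its solver.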

2606.19416 2026-06-19 cs.LG 新提交

MortarBench: Evaluating Mortgage Loan Origination Agents

MortarBench: 评估抵押贷款发起代理

Matthew Toles, Yunan Lu, Manav Munjal, Bojun Liu, Yuanhao Deng, Stephanie Selig, Derek Rindner, Cheng Li, Zhou Yu

发表机构 * Columbia University(哥伦比亚大学) Tidalwave

AI总结 提出MortarBench基准,通过金融数据合成与变异管道生成覆盖边缘案例的示例,评估大语言模型在贷款发起任务中的表现,发现模型准确率低且存在偏见,并引入CRIT校准框架提升准确率至80.5%。

详情
AI中文摘要

贷款发起是贷方创建新贷款的过程,从申请和承保到批准和融资。该过程在评估申请人的资格和风险水平方面起着关键作用。最近,尽管缺乏任何公开基准,公司已开始使用抵押贷款代理来增强人类贷款官员。为填补这一空白,我们提出了MortarBench,一个贷款发起代理基准。MortarBench使用金融数据合成和变异管道生成具有广泛边缘案例覆盖的示例,这些示例匹配真实世界的分布和问题。我们发现最先进的大语言模型(LLM)表现不佳,闭源模型最多达到77.1%的精确匹配准确率。我们还发现LLM对与非英语名字相关的外国性存在系统性偏见。注意到这些弱点,我们引入了CRIT,一个置信度校准框架。我们的方法将准确率提高到80.5%,同时改善了风险管理导向并减少了偏见。

英文摘要

Loan origination is the process by which a lender creates a new loan, from application and underwriting through approval and funding. This process serves a critical role in evaluating the eligibility and level of risk posed by an applicant. Recently, firms have begun using mortgage loan agents to augment human loan officers, despite a lack of any public benchmark. To fill this gap, we present MortarBench, a loan origination agent benchmark. MortarBench uses a financial data synthesis and mutation pipeline to generate examples with broad edge case coverage that match real-world distributions and questions. We find that state-of-the-art large language models (LLMs) perform poorly, with closed-source models achieving at most 77.1% exact match accuracy. We also discover systematic biases in LLM perception of foreignness related to non-English names. Noting these weaknesses, we introduce CRIT, a confidence calibration framework. Our method increases accuracy to 80.5% while improving risk management steering and reducing bias.

2606.19481 2026-06-19 cs.LG 新提交

Insulin4RL: Real-Time Insulin Management in the Intensive Care Unit for Offline Reinforcement Learning

Insulin4RL:面向离线强化学习的重症监护室实时胰岛素管理

Thomas Frost, Steve Harris

AI总结 针对电子健康记录离散化导致模型泛化性差的问题,提出基于真实临床轨迹的离线强化学习数据集Insulin4RL,包含375,000+决策和12,209名患者,用于评估模型在真实采样假设下的性能。

Comments Under submission

详情
AI中文摘要

离线强化学习(ORL)有潜力利用历史电子健康记录(EHR)数据提高临床决策质量。当前该领域的训练和评估实践严重依赖于按固定规则时间间隔离散化的EHR数据集。离散化创建了复杂临床场景的虚构表示,并损害了回顾性模型评估的泛化性。在本文中,我们介绍Insulin4RL,一个医疗ORL数据集,其特点是来自真实临床轨迹的自然不规则输入和动作。该数据集源自MIMIC-IV,包含超过375,000个标记决策,涉及12,209名需要在重症监护室进行胰岛素输注滴定的患者。因此,该数据集可用于研究ORL模型在现实临床采样假设下的性能。我们提供了数据集结构和特征的描述、使用无模型离线强化学习的基线性能指标,以及使用拟合Q评估的标准化评估协议。最后,我们提出了未来研究可以利用该资源解决的领域。

英文摘要

Offline reinforcement learning (ORL) offers the potential to improve the quality of clinical decision-making using historical electronic health record (EHR) data. Current training and evaluative practices in this field rely heavily on EHR datasets that have been temporally discretised into fixed, regular time intervals. Discretisation creates fictional representations of complex clinical scenarios and compromises the generalisability of retrospective model evaluations. In this paper, we introduce Insulin4RL, a healthcare ORL dataset featuring naturally irregular inputs and actions from real clinical trajectories. Derived from MIMIC-IV, Insulin4RL comprises over 375,000 labelled decisions across 12,209 patients requiring insulin infusion titration in the Intensive Care Unit. The dataset can thus be used for research into ORL model performance under realistic clinical sampling assumptions. We provide a description of the dataset's structure and characteristics, baseline performance metrics using model-free offline reinforcement learning, and a standardised evaluation protocol using fitted Q-evaluation. We conclude with suggested areas for future research that could be addressed using this resource.

2606.19558 2026-06-19 cs.LG cs.CL 新提交

Displacement Is Not Direction: Evaluating Fidelity Metrics for Quantized LLM Deployment

位移不是方向:评估量化LLM部署的保真度指标

Miloš Nikolić, Ali Hadi Zadeh, Enrique Torres Sanchez, Andreas Moshovos

发表机构 * ByteShape University of Toronto(多伦多大学) Vector Institute for Artificial Intelligence(向量人工智能研究所)

AI总结 本文研究KL散度等保真度指标在量化语言模型部署中与下游基准分数的相关性,发现整体强相关但在近基线区域失效,归因于KL散度主要衡量分歧量而非方向。

详情
AI中文摘要

保真度指标,如每个token的KL散度(KLD)与高精度参考模型的比较,常被用作基准质量的低成本代理。我们在Qwen3.6-35B-A3B的28个量化模型和Devstral-Small-2-24B的41个量化模型上,通过一系列下游基准测试验证了这一做法。我们发现,在整个量化队列中,KLD与基准分数强相关(Qwen上ρ=-0.72,Devstral上ρ=-0.86,p<0.001)。然而,在接近基线的静默区,这种关系变得不显著(Qwen上ρ=+0.00,Devstral上ρ=-0.24,p=0.36)。这种失效在14种测量变体中持续存在,包括不同的KLD聚合方式、困惑度公式、top-1一致性、校准语料库和上下文长度。在逐提示层面,KLD在代码任务上仅有较弱的失败预测能力,在LiveCodeBench上五个模型的失败与通过几何平均比在[1.08,1.22]之间,并且作为跨模型路由器失败,在分歧提示上仅达到42.3%-49.4%的准确率。我们将这种失效归因于结构分解:KLD主要衡量与参考模型的分歧量,在静默区复合ρ在Qwen上为+0.94(p<0.001),在Devstral上为+0.55(p=0.03),而其与分歧方向的关系较弱且依赖于任务。

英文摘要

Fidelity metrics, such as per-token KL divergence (KLD) against a high-precision reference, are often used in practice as low-cost proxies for benchmark quality. We test this practice on a 28-quant cohort of Qwen3.6-35B-A3B and a 41-quant cohort of Devstral-Small-2-24B, evaluated across a suite of downstream benchmarks. We find that KLD is strongly correlated with benchmark score over the full cohort ($ρ=-0.72$ on Qwen and $ρ=-0.86$ on Devstral, both with $p<0.001$). However, this relationship collapses to non-significance in the near-baseline silent zone ($ρ=+0.00$ on Qwen and $ρ=-0.24$, $p=0.36$, on Devstral). This collapse persists across 14 measurement variants, including different KLD aggregations, perplexity formulations, top-1 agreement, calibration corpora, and context lengths. At the per-prompt level, KLD has only weak failure-prediction power on code, with failed-vs-passed geometric-mean ratios in $[1.08,1.22]$ across five models on LiveCodeBench, and fails as a cross-model router, achieving only $42.3\%-49.4\%$ accuracy on disagreement prompts. We trace the collapse to a structural decomposition: KLD primarily measures the volume of disagreement with the reference, with silent-zone composite $ρ=+0.94$ ($p<0.001$) on Qwen and $+0.55$ ($p=0.03$) on Devstral, while its relationship to the direction of those disagreements is weak and task-conditional.
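
The two measurements discussed above, per-token KL divergence against a reference and a rank correlation between KLD and benchmark score, can be sketched as follows. This is a minimal illustration with hypothetical names and made-up numbers; the abstract does not specify the paper's aggregation choices or correlation estimator, and the KL function assumes the quantized distribution is nonzero wherever the reference is.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same vocabulary."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def mean_token_kld(reference_dists, quantized_dists):
    """Average per-token KLD of a quantized model against the reference."""
    pairs = list(zip(reference_dists, quantized_dists))
    return sum(kl_divergence(p, q) for p, q in pairs) / len(pairs)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and values[order[end + 1]] == values[order[pos]]:
            end += 1
        shared = (pos + end) / 2 + 1
        for idx in order[pos:end + 1]:
            ranks[idx] = shared
        pos = end + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical cohort: mean KLD per quantized model and its benchmark score.
cohort_kld = [0.01, 0.05, 0.20, 0.90]
cohort_score = [70.0, 65.0, 50.0, 20.0]
rho = spearman(cohort_kld, cohort_score)
```

A strongly negative correlation over a wide cohort like this one says nothing about the near-baseline models alone: restricting the same computation to the few lowest-KLD entries is where, per the abstract, the relationship stops being significant.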

2606.19595 2026-06-19 cs.LG cs.AI 新提交

IHBench: Evaluating Post-Interruption Recovery in Voice Agents with Structured Workflows

IHBench:评估语音代理在结构化工作流中的中断后恢复能力

Ahmad Salimi, Wentao Ma, Yuzhi Tang, Dongming Shen, Mu Li, Alex Smola

发表机构 * Boson AI

AI总结 提出IHBench基准,评估语音代理在结构化工作流中处理中断后的恢复能力,涵盖任务完成和恢复质量两个维度,实验表明闭源模型比开源模型更鲁棒。

详情
AI中文摘要

部署在结构化工作流(客户服务、医疗调度、账户管理)中的语音代理必须处理频繁的用户中断,同时保持多步骤程序的进度。现有的语音能力模型基准侧重于中断的时机:闯入检测、端点检测和轮流对话动态。它们忽略了中断后发生的情况:代理是否在正确的步骤恢复工作流?是否处理了用户的插话?是否避免重复用户已经听过的内容?我们引入了IHBench(中断处理基准),这是一个评估语音代理在10个企业领域中执行状态机驱动工作流时的中断后恢复能力的基准。六种中断类型在话语中间的控制点注入,并随数据生成每个中断的评估标准。每个中断在两个轴上评分:任务完成和恢复质量。我们评估了来自OpenAI、Google和开源社区的27个音频-语言模型配置。模型差异很大,恢复质量强烈依赖于中断类型。在我们的实验中,闭源模型比开源模型对中断更鲁棒:它们在任务完成上获胜的频率更高,随着对话变长,性能下降速度慢约3.3倍,并且没有音频与文本模态差距,而开源模型在这三个方面都处于劣势。一项人类研究验证了LLM评判员与人类标注者的一致性,与AudioMultiChallenge的跨基准分析表明,恢复质量在很大程度上是一个独立的能力轴。

英文摘要

Voice agents deployed in structured workflows (customer service, healthcare scheduling, account management) must handle frequent user interruptions while maintaining progress through multi-step procedures. Existing benchmarks for speech-capable models focus on the timing of interruptions: barge-in detection, endpointing, and turn-taking dynamics. They leave unmeasured what happens after the interruption: does the agent resume the workflow at the correct step? Does it address the user's interjection? Does it avoid re-delivering content the user already heard? We introduce IHBench (Interruption Handling Benchmark), a benchmark that evaluates post-interruption recovery in voice agents executing state-machine-driven workflows across 10 enterprise domains. Six interruption types are injected at controlled points mid-utterance, with per-interruption evaluation rubrics generated alongside the data. Each interruption is scored on two axes: task fulfillment and recovery quality. We evaluate 27 audio-language model configurations from OpenAI, Google, and the open-weight community. Models vary widely, and recovery quality depends strongly on the interruption type. Across our experiments, closed-weight models are consistently more robust to interruptions than open-weight ones: they win far more often on task fulfillment, degrade roughly 3.3x more slowly as conversations grow longer, and show no audio-versus-text modality gap, whereas the open-weight models lose ground on all three. A human study validates the LLM judge against human annotators, and a cross-benchmark analysis against AudioMultiChallenge indicates that recovery quality is a largely distinct capability axis.

2606.19624 2026-06-19 cs.LG 新提交

MassSpecGym in the Wild: Uncovering and Correcting Evaluation Pitfalls in AI-Driven Molecule Discovery

MassSpecGym in the Wild: 揭示并纠正AI驱动分子发现中的评估陷阱

Hongxuan Liu, Roman Bushuiev, Ivy Lightheart, Mrunali Manjrekar, Anton Bushuiev, Magdalena Lederbauer, Filip Jozefov, Yinkai Wang, Soha Hassoun, Josef Sivic, James Taylor, Runzhong Wang, David Healey, Tomáš Pluskal, Connor W. Coley

AI总结 本文系统审查了基于串联质谱的分子发现中机器学习模型的评估问题,以MassSpecGym基准为例,发现26篇论文中至少17篇存在数据泄露、捷径学习和实现错误三类问题,并通过实验量化影响,提出改进建议并发布MassSpecGym v1.5。

详情
AI中文摘要

可靠的基准测试对于开发基于串联质谱(MS/MS)分子发现的机器学习模型至关重要。实验设计和模型评估过程中的细微问题会降低此类基准的可信度,并导致错误结论。我们以标准MassSpecGym基准套件为例,对近期MS/MS机器学习文献中的模型评估问题进行了全面审查,以说明这些问题的影响。在采用MassSpecGym基准的第一年内,我们发现在26篇报告MassSpecGym基准结果的论文中,至少有17篇存在评估问题。我们将失败原因归纳为三类:(i) 数据泄露,(ii) 捷径学习,以及(iii) 实现错误和指标分歧。通过大量实验和代码复现,我们量化了这些问题的影响,并展示了它们如何破坏MassSpecGym旨在强制执行的评估标准。我们将研究结果提炼为适用于MS/MS挑战、基准和自定义评估设置的建议。我们还发布了MassSpecGym v1.5,这是我们在MassSpecGym基准套件中实施建议的版本,解决了本次审计中发现的失败模式。MassSpecGym v1.5可从 https://github.com/pluskal-lab/MassSpecGym 公开获取。

英文摘要

Reliable benchmarking is critical for developing machine learning models for tandem mass spectrometry (MS/MS) based molecule discovery. Subtle issues in experimental design and model evaluation procedures can degrade the trustworthiness of such benchmarks and lead to erroneous conclusions. We conduct a thorough review of model evaluation issues in the recent MS/MS machine learning literature, using the standard MassSpecGym benchmark suite as a case study to illustrate the impact of these issues. We find evaluation issues in at least 17 of 26 papers reporting MassSpecGym benchmark results in the first year of its adoption. We isolate three classes of failures: (i) data leakage, (ii) shortcut learning, and (iii) implementation bugs and metric divergence. Through extensive experimentation and code replication, we quantify the impact of these issues and show how they corrupt the evaluation standards MassSpecGym was designed to enforce. We distill our findings into recommendations generalizable to MS/MS challenges, benchmarks, and custom evaluation setups. We also release MassSpecGym v1.5, an implementation of our recommendations in the MassSpecGym benchmarking suite which addresses the failure modes identified in this audit. MassSpecGym v1.5 is publicly available at https://github.com/pluskal-lab/MassSpecGym.

2606.19636 2026-06-19 cs.LG cs.AI 新提交

Hard or Just Unreached? Diagnosing the Sampling Blind Spot in Math-Reasoning Difficulty Estimation

困难还是未触及?诊断数学推理难度估计中的采样盲点

Luca Zhou, Sajel Shah, Emanuele Rodolà, Roberto Dessì

发表机构 * Sapienza University of Rome(罗马大学)

AI总结 发现pass@k在数学推理难度估计中存在盲点,通过激活嫁接的确定性采样可恢复10.3-22.9%的零解样本,揭示结构可识别性。

Comments 9 pages of main paper, 4 figures and 5 tables in the main paper, with more in the appendix

详情
AI中文摘要

数学和科学推理基准依赖pass@k(达到正确结果的采样链比例)作为每个示例的典型难度信号。同样的信号驱动具有可验证奖励的强化学习、数学数据整理、合成课程和验证器训练。我们表明该代理在其最困难的层级上存在持续盲点:在我们测试的八个自由形式数学单元(GSM8K和MATH,跨四个开放权重模型)中,10.3-22.9%的示例在六次尝试中没有任何采样种子解决,但通过六链确定性机制在匹配计算量下被解决。这些是贪婪解码加上通过激活嫁接应用的五个廉价残差流扰动,而单独贪婪解码在这些数学单元上最多解决6%。恢复随额外预算扩展,跨扰动(其机制差异性我们通过所有十二个单元验证,每种设置下跨类型固定集Jaccard <= 0.47)。激活嫁接用作对内部表示的干预,而非解码方法;我们纯粹将其作为诊断和多样化工具,并且我们恢复的项目表明pass@k=0%层级在残差流中结构可识别,而非未修改模型在普通推理下达到它们。

英文摘要

Math and science reasoning benchmarks rely on pass@k, the fraction of sampled chains that reach gold, as the canonical per-example difficulty signal. The same signal drives RL with verifiable rewards, math data curation, synthetic curricula, and verifier training. We show this proxy has a persistent blind spot on its hardest stratum: on the eight free-form math cells we test (GSM8K and MATH across four open-weight models), 10.3-22.9% of the examples that no sampling seed solves in six tries are instead solved at matched compute by a six-chain deterministic regime. This regime consists of greedy decoding plus five cheap residual-stream perturbations applied via activation grafting, while greedy alone solves at most 6% on these math cells. Recovery scales with the additional budget, across perturbations whose mechanistic distinctness we verify across all twelve cells (cross-kind fix-set Jaccard <= 0.47 in every setup). Activation grafting is used as an intervention on internal representations, not a decoding method; we use it purely as a diagnostic and diversification tool, and our recovered items show that the pass@k = 0% stratum is structurally identifiable in the residual stream, not that the unmodified model reaches them under ordinary inference.
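
The headline number above is a ratio over the hardest stratum, which a short sketch makes explicit. The following follows the abstract's wording (pass@k as the fraction of sampled chains that reach the gold answer) and uses hypothetical example IDs and outcomes; the function names are illustrative and do not come from the paper's code.

```python
def pass_fraction(chain_correct):
    """Fraction of sampled chains that reach the gold answer."""
    return sum(chain_correct) / len(chain_correct)


def blind_spot_rate(sampled, deterministic):
    """Share of never-solved-by-sampling examples that a deterministic regime solves.

    `sampled` and `deterministic` map example IDs to lists of per-chain booleans.
    """
    unsolved = [ex for ex, chains in sampled.items() if pass_fraction(chains) == 0.0]
    if not unsolved:
        return 0.0
    recovered = [ex for ex in unsolved if any(deterministic[ex])]
    return len(recovered) / len(unsolved)


# Hypothetical outcomes for four examples, six chains each.
sampled = {
    "a": [False] * 6,
    "b": [False] * 6,
    "c": [True, False, False, True, False, False],
    "d": [False] * 6,
}
deterministic = {
    "a": [False, True, False, False, False, False],
    "b": [False] * 6,
    "c": [True] * 6,
    "d": [False] * 6,
}
rate = blind_spot_rate(sampled, deterministic)
```

Here three examples have a sampled pass fraction of zero and one of them is recovered by the deterministic chains, so a difficulty label derived from sampling alone would overstate how hard that example is.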

2606.20010 2026-06-19 cs.LG 新提交

Self-Adaptive Scale Handling for Forecasting Time Series with Scale Heterogeneity

面向尺度异质性时间序列的自适应尺度处理方法

Xu Zhang, Zhengang Huang, Yunzhi Wu, Xun Lu, Erpeng Qi, Yunkai Chen, Zhongya Xue, Peng Wang, Wei Wang

发表机构 * College of Computer Science and Artificial Intelligence, Fudan University(复旦大学计算机科学与人工智能学院) Ant Group(蚂蚁集团)

AI总结 提出自适应尺度处理模块,通过学习自适应尺度因子保留语义区分性并减少逆缩放误差,在基金销售数据集上提升主流预测模型性能。

Comments This is the full version of the paper accepted by the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). The code and dataset are available at https://github.com/Meteor-Stars/ASTSF

详情
AI中文摘要

当前时间序列预测研究主要关注尺度同质数据,即不同时间序列具有相似的数值量级范围。然而,在金融产品销售等真实工业场景中,不同时间序列常相差多个数量级(尺度异质性)。由于这些序列共享相似的时间模式,联合建模有利于更好地利用数据,但现有缩放方法要么压缩低尺度信号(全局归一化),要么破坏语义区分性并放大逆缩放误差(基于窗口的缩放)。本文提出一种自适应尺度处理模块,该模块学习针对每个输入的自适应尺度因子,在保持语义区分性的同时减少逆缩放误差。AS由尺度校准(SC)和缩放选择(SS)组成,SC通过神经网络校准先验均值尺度因子,SS决定是否应用校准或保留原始因子,避免过度校准。在蚂蚁财富和支付宝的真实基金销售数据集上的实验表明,AS能无缝集成到主流TSF模型中并持续提升其性能。代码和数据集可在链接 https://github.com/Meteor-Stars/ASTSF 获取。

英文摘要

Current time series forecasting (TSF) research predominantly focuses on scale-homogeneous data, where different time series share similar numerical magnitude ranges. However, in real-world industrial scenarios such as financial product sales, different time series often differ by orders of magnitude (scale heterogeneity). Since these series share similar temporal patterns, joint modeling is desirable for better data utilization, yet existing scaling methods either compress low-scale signals (global normalization) or destroy semantic discriminability and amplify inverse-scaling errors (window-based scaling). This paper proposes a self-Adaptive Scale-handling (AS) module that learns adaptive scale factors tailored to each input, preserving semantic discriminability while reducing inverse-scaling errors. AS consists of Scale Calibrating (SC), which calibrates prior mean scaling factors through neural networks, and Scaling Selection (SS), which decides whether to apply calibration or retain the original factor, avoiding over-calibration. Experiments on real-world fund sales datasets from Ant Fortune and Alipay show that AS seamlessly integrates into popular TSF models and consistently improves their performance. The code and dataset are available at the link https://github.com/Meteor-Stars/ASTSF.
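
The two failure modes of existing scaling methods described above can be seen on a toy pair of series. This is a minimal sketch with made-up numbers and hypothetical function names; it illustrates the problem setting only and does not implement the AS module, whose calibration network and selection rule are not specified in the abstract.

```python
def global_scale(series_list):
    """Divide every series by the single largest magnitude across all series."""
    peak = max(abs(v) for series in series_list for v in series)
    return [[v / peak for v in series] for series in series_list]


def window_mean_scale(window):
    """Divide a window by its own mean magnitude; return scaled values and the factor."""
    factor = sum(abs(v) for v in window) / len(window) or 1.0
    return [v / factor for v in window], factor


small_fund = [1.0, 2.0, 3.0]           # low-scale series
large_fund = [1000.0, 2000.0, 3000.0]  # same shape, three orders of magnitude larger

# Global normalization: the low-scale series is squeezed close to zero.
globally_scaled = global_scale([small_fund, large_fund])

# Window-based scaling: both series become identical, so scale information is lost.
small_scaled, small_factor = window_mean_scale(small_fund)
large_scaled, large_factor = window_mean_scale(large_fund)

# The same error in scaled space is multiplied by the factor when scaled back.
scaled_error = 0.1
small_raw_error = scaled_error * small_factor
large_raw_error = scaled_error * large_factor
```

After window scaling the model can no longer tell the two funds apart, and an identical scaled-space error turns into a raw error a thousand times larger for the high-scale series.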

2606.20216 2026-06-19 cs.LG cs.AI 新提交

Learner-based Concept Drift Detection: Analysis and Evaluation

基于学习器的概念漂移检测:分析与评估

Md Moman Ul Haque Khan, Samira Sadaoui

发表机构 * Department of Computer Science, University of Regina(里贾纳大学计算机科学系)

AI总结 本文从理论上分析概念漂移特征,并评估多种漂移检测算法在合成和真实数据集上的性能,旨在增强对漂移检测器行为及其适用性的理解。

Comments 2 authors, 29 pages

详情
AI中文摘要

部署于演化流环境中的机器学习算法必须处理非平稳数据分布,即所谓的概念漂移。概念漂移的存在对许多实际应用构成重大挑战,因为它会严重降低预测性能,阻碍其支持稳健决策的能力。因此,及时高效地检测漂移事件对于长期保持高准确性至关重要。本研究从理论上考察了概念漂移特征以及多个类别的多种漂移检测算法。此外,我们评估了它们在合成和真实数据集上的性能,这些数据集展示了多样的流场景和漂移特征,如突变和渐变。本研究旨在增强对概念漂移特征和漂移检测器行为这一复杂概念的理解,以及它们在不同情境下的适用性。

英文摘要

Machine learning algorithms deployed for evolving streaming environments must handle the non-stationary data distributions, commonly referred to as concept drift. The presence of concept drift poses a major challenge for many real-world applications because it can severely degrade their predictive performance, hindering their ability to support robust decision-making. Consequently, the timely and efficient detection of drift events is critical for sustaining high accuracy over time. This study examines theoretically the concept drift characteristics and numerous drift detection algorithms across several categories. Furthermore, we evaluate their performance on both synthetic and real-world datasets exhibiting diverse streaming scenarios and drift characteristics, such as abrupt and gradual changes. This study aims to enhance understanding of the complex notion of concept drift characteristics and behavior of drift detectors, along with their applicability to diverse contexts.
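The abstract does not name the detectors it covers, so the following is only an illustration of what a learner-based detector looks like: a minimal Python sketch of the classic Drift Detection Method (DDM, Gama et al., 2004), which monitors the error stream of a learner. The class name, the `min_samples` default, and the synthetic stream are assumptions made for this example, not details taken from the paper.

```python
import math


class DDM:
    """Minimal sketch of the Drift Detection Method (illustrative only)."""

    def __init__(self, min_samples=30, warn_k=2.0, drift_k=3.0):
        self.min_samples = min_samples  # warm-up length before any decision
        self.warn_k = warn_k            # warning level: p_min + 2 * s_min
        self.drift_k = drift_k          # drift level:   p_min + 3 * s_min
        self.reset()

    def reset(self):
        self.n = 0
        self.p = 0.0                    # running error rate of the learner
        self.p_min = math.inf
        self.s_min = math.inf

    def update(self, error):
        """error is 1 if the learner misclassified the sample, else 0.

        Returns 'stable', 'warning' or 'drift'.
        """
        self.n += 1
        self.p += (error - self.p) / self.n
        s = math.sqrt(self.p * (1.0 - self.p) / self.n)
        if self.n < self.min_samples:
            return "stable"
        # Remember the best (lowest) error level seen so far.
        if self.p + s < self.p_min + self.s_min:
            self.p_min, self.s_min = self.p, s
        if self.p + s > self.p_min + self.drift_k * self.s_min:
            self.reset()                # start over on the new concept
            return "drift"
        if self.p + s > self.p_min + self.warn_k * self.s_min:
            return "warning"
        return "stable"


# Synthetic abrupt drift: 10% error rate for 200 samples, then every sample wrong.
stream = [1 if i % 10 == 0 else 0 for i in range(200)] + [1] * 100
detector = DDM()
states = [detector.update(e) for e in stream]
first_drift = states.index("drift")
```

On this synthetic stream the detector stays silent during the stable phase and signals drift shortly after the error rate jumps at sample 200. Gradual drifts would typically produce a longer run of `warning` states first.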

2606.20347 2026-06-19 cs.LG cond-mat.dis-nn 新提交

Critical Percolation as a Synthetic Data Model for Interpretability

临界渗流作为可解释性的合成数据模型

Aryeh Brill, Tom Ingebretsen Carlson

AI总结 提出基于临界平均场渗流簇的层次函数合成数据集,具有稀疏、分形和幂律分布特性,支持几乎线性时间算法生成任意规模数据,可用于评估可解释性方法。

Comments 21 pages, 10 figures, accepted to the Mechanistic Interpretability Workshop at ICML 2026

详情
AI中文摘要

神经网络学习反映自然数据层次化、多尺度结构的特征。用于评估可解释性方法的合成数据集通常缺乏这种结构,限制了其作为现实玩具模型的价值。为弥补这一差距,我们引入了一系列合成数据集,由定义在高维数据空间中嵌入的临界平均场渗流簇上的层次函数组成。渗流数据由稀疏、低维的分形簇组成,具有幂律大小分布。模拟分类层次结构的潜变量生成每个数据点的目标值。该数据模型在分析上易于处理,具有已知的临界指数,无需超参数调整即可固定其属性。我们利用渗流簇、随机树和加法凝聚之间的映射,提出了一种几乎线性时间的算法,用于联合采样随机树及其层次潜变量分解,从而能够生成任意规模的数据。通过探测实验,我们发现模型的真实(ground-truth)潜变量可以从神经网络激活中线性解码。稀疏性、自相似性、幂律统计和分析可处理性共同使临界渗流成为可解释性研究的原理性测试平台。

英文摘要

Neural networks learn features that reflect the hierarchical, multi-scale structure of natural data. Synthetic datasets used to evaluate interpretability methods typically lack this structure, limiting their value as realistic toy models. To close this gap, we introduce a family of synthetic datasets consisting of hierarchical functions defined on critical mean-field percolation clusters embedded in a high-dimensional data space. The percolation data consists of sparse, low-dimensional fractal clusters with a power-law size distribution. Latent variables modeling a taxonomic hierarchy generate each data point's target value. The data model is analytically tractable with known critical exponents that fix its properties without requiring hyperparameter tuning. We leverage a mapping between percolation clusters, random trees, and additive coalescence to propose an almost linear-time algorithm to jointly sample a random tree and its hierarchical latent decomposition, enabling data generation at arbitrary scale. Using probing experiments, we find that the model's ground-truth latent variables can be linearly decoded from neural network activations. Together, sparsity, self-similarity, power-law statistics, and analytical tractability make critical percolation a principled testbed for interpretability research.

2606.20376 2026-06-19 cs.LG cs.AI 新提交

CRAX: Fast Safe Reinforcement Learning Benchmarking

CRAX:快速安全强化学习基准测试

Tristan Tomilin, Mourad Boustani, Mickey Beurskens, Thiago D. Simão

发表机构 * Eindhoven University of Technology(埃因霍温理工大学)

AI总结 提出基于JAX加速的安全RL基准CRAX,利用MJX物理引擎实现高达100倍加速,包含6个环境套件和3个智能体任务,评估6种方法揭示性能与安全权衡。

详情
AI中文摘要

安全性是强化学习(RL)智能体在机器人、自动驾驶等现实领域部署的核心问题。尽管基准测试对RL的进步至关重要,但现有具有高保真3D物理的安全基准计算速度慢,限制了大规模实验和快速原型开发。为解决这一问题,我们提出CRAX(基于JAX加速的约束RL)。CRAX构建在具有逼真3D动力学的MuJoCo XLA(MJX)物理引擎之上,利用向量化操作和硬件加速,相比基于CPU的同类安全基准实现高达约100倍的加速。该基准包含六个环境套件和三个智能体特定任务,每个任务涵盖三个难度级别。对六种流行安全RL方法的评估表明,没有单一方法在所有任务中占主导地位,并揭示了性能与安全之间的权衡。我们发现,跨难度级别的课程学习和安全迁移可以比直接在更困难设置中训练提高性能。

英文摘要

Safety is a core concern for deploying reinforcement learning (RL) agents in real-world domains such as robotics and autonomous driving. While benchmarks have been central to progress in RL, existing safety benchmarks with high-fidelity 3D physics remain computationally slow, limiting large-scale experimentation and rapid prototyping. To address this gap, we propose CRAX (Constrained RL Accelerated with JAX). Built on top of the MuJoCo XLA (MJX) physics engine with realistic 3D dynamics, CRAX leverages vectorized operations and hardware acceleration, yielding up to ~100x speedups over comparable CPU-based safety benchmarks. The benchmark features six environment suites and three agent-specific tasks, each spanning three difficulty levels. Evaluating six popular safe RL methods shows that no single approach dominates across all tasks, and reveals the trade-offs between performance and safety. We find that curriculum learning across difficulty levels and safety transfer can improve performance over direct training in harder settings.

2606.20400 2026-06-19 cs.LG 新提交

The Significance of Style Diversity in Annotation-Free Synthetic Data Generation

无标注合成数据生成中风格多样性的重要性

Zahra Abbasiantaeb, Zeno Belligoli, Omar Essam, Mohammad Aliannejadi

发表机构 * University of Amsterdam(阿姆斯特丹大学)

AI总结 提出无需人工标注的对话生成框架,利用主题和风格属性增强多样性,并设计两种后处理风格化模型,实验表明风格多样性比主题多样性更关键,性能可达人工标注数据的93.3%。

详情
AI中文摘要

为意图分类生成高实用性的合成数据通常需要人工标注的种子数据,这在快节奏的工业环境中往往不可用。在本文中,我们提出了一个完全无需人工标注数据、仅依赖意图定义的合成对话生成框架。我们提出的对话生成框架利用两种不同类型的主题和风格属性来提高数据多样性。此外,我们提出了两种新颖的后处理风格化模型,称为Univ和Exam,以将合成的LLM生成的语句转换为更多样化、更接近人类的语言风格。为了提升数据质量,我们利用LLM作为评判的过滤过程。在工业数据集和公开数据集上的实验结果表明,所提出的方法达到了使用人工标注训练数据所获得性能的93.3%。至关重要的是,研究结果揭示,对于合成数据的实用性,风格多样性比主题多样性更为关键,因为它能防止模型学习虚假的风格相关性。此外,研究表明,在生成过程中融入风格属性比后处理风格适应更有效。

英文摘要

Generating high-utility synthetic data for intent classification typically requires human-annotated seed data, which is often unavailable in fast-paced industrial settings. In this paper, we propose a framework for synthetic dialogue generation that works entirely without human-annotated data, relying solely on intent definitions. Our proposed dialogue generation framework utilizes two different types of topic and style attributes to improve data diversity. Also, we propose two novel post-hoc stylization models called Univ and Exam to transform synthetic LLM-generated utterances into more varied, human-like linguistic styles. To enhance data quality, we utilize an LLM-as-a-judge filtering process. Experimental results on both industrial and public datasets demonstrate that the proposed approach achieves up to 93.3% of the performance obtained using human-annotated training data. Crucially, the findings reveal that style diversity is more critical than topic diversity for synthetic data utility, as it prevents models from learning spurious stylistic correlations. Furthermore, the study shows that incorporating style attributes during the generation process is more effective than post-hoc style adaptation.

2606.20461 2026-06-19 cs.LG cs.CY cs.DB 新提交

Data Bias Mitigation under Coverage Constraints & The Price of Fairness

覆盖约束下的数据偏差缓解与公平的代价

Bruno Scarone, Alfredo Viola, Renée J. Miller

发表机构 * Khoury College of Computer Sciences, Northeastern University(东北大学库里计算机科学学院) Cheriton School of Computer Science, University of Waterloo(滑铁卢大学切里顿计算机科学学院)

AI总结 针对多敏感属性交叉群体的偏差问题,提出在覆盖约束下扩展偏差缓解框架,通过整数线性规划优化缓解策略,权衡偏差近似误差与数据效率,并刻画公平的代价。

Comments Accepted to FAccT 2026

详情
AI中文摘要

机器学习模型已被证明在多个敏感属性(如种族和性别)交叉的个体上表现出歧视性结果或性能下降。这源于两个相互关联的挑战:缺乏量化偏差(可能是交叉的)的原则性措施,以及训练数据中交叉子群的代表性不足。我们扩展了一个最近的偏差缓解框架,以纳入覆盖约束,确保跨群体(包括交叉子群)的充分代表性。由于对所有群体实现完全零偏差可能不是数据高效的(意味着可能需要大量数据),我们的解决方案在满足覆盖约束的同时,用偏差的小近似误差换取更高的数据效率。我们还将偏差缓解表述为一个整数线性规划,优化所有缓解策略,并刻画公平的代价,即最小数据修改成本,作为公平容忍度的函数。这对于法律合规(法规可能规定特定的公平阈值)和数据治理(使从业者能够在偏差减少和数据修改(特别是数据购买)成本之间做出明智的权衡)都至关重要。我们在公开数据集上评估了我们的技术,表明通过我们的框架进行偏差缓解可以保持多个分类器的预测准确性,并且覆盖约束虽然出于统计考虑,但对于保持下游机器学习性能至关重要。

英文摘要

Machine learning models have been shown to exhibit discriminatory outcomes or degraded performance for individuals at the intersection of multiple sensitive attributes, such as race and gender. This stems in part from two interrelated challenges: the lack of principled measures for quantifying bias (potentially intersectional), and insufficient representation of intersectional subgroups in training data. We extend a recent bias mitigation framework to incorporate coverage constraints that enforce sufficient representation across groups, including intersectional subgroups. Since achieving exactly zero bias for all groups may not be data efficient (meaning it may require large amounts of data), our solution trades small approximation errors in bias for greater data efficiency while satisfying coverage constraints. We also formulate bias mitigation as an integer linear program that optimizes over all mitigation strategies, and characterize the price of fairness, the minimum data modification cost, as a function of fairness tolerance. This is essential both for legal compliance, where regulations may mandate specific fairness thresholds, and for data governance, enabling practitioners to make informed trade-offs between bias reduction and data modification (particularly, data purchasing) costs. We evaluate our techniques on publicly available datasets, demonstrating that bias mitigation via our framework preserves predictive accuracy across multiple classifiers, and that coverage constraints, while motivated by statistical considerations, are essential for preserving downstream ML performance.

12. 机器学习应用 26 篇

2606.19363 2026-06-19 cs.LG 新提交

When to Trust, How to Distill: Multi-Foundation Model Guidance for Lightweight, Robust Scientific Time Series Forecasting

何时信任,如何蒸馏:面向轻量级鲁棒科学时间序列预测的多基础模型指导

Rupasree Dey, Abdul Matin, Nathan Orwick, Yao Zhang, Shrideep Pallickara, Sangmi Lee Pallickara

发表机构 * Colorado State University(科罗拉多州立大学)

AI总结 提出Guard框架,通过上下文路由器和不确定性门控温度机制,从多个分布偏移的基础模型中蒸馏知识,训练轻量级预测器,在气象、碳通量等四个领域降低RMSE。

Comments KDD 2026, paper decision: Accepted, track: AI for Science. total 12 pages including references and appendix

详情
AI中文摘要

时间序列基础模型(TSFMs)在物理科学中的部署受到一个关键权衡的阻碍:虽然这些模型编码了丰富、通用的时间动态,但当零样本应用于特定科学领域时,它们会遭受严重的分布错位,并且其计算成本阻碍了在边缘计算传感器网络中的部署。我们解决了一个基本挑战:如何从错位的基础模型(FM)中提取潜在的结构知识,以训练轻量级、专门的预测器?我们提出了用于蒸馏的门控不确定性感知路由(Guard),这是一个新颖的框架,将多教师蒸馏重新定义为实例级决策过程,具有两种自适应机制:(1)上下文路由器,根据局部输入统计动态选择最相关的教师,利用不同基础模型之间的互补性;(2)不确定性门控温度机制,充当“断路器”,当教师置信度与领域现实偏离时自动减弱蒸馏强度。我们在四个气候关键领域评估了我们提出的轻量级框架:气象学、生态系统碳通量、土壤湿度和能源电网。我们的方法相对于固定权重的多教师蒸馏基线显著降低了RMSE,成功地从预训练的FM(教师)中蒸馏知识,即使由于原始和目标数据域之间的分布偏移,它们表现出次优的零样本准确性。我们证明,这些领域错位的教师仍然可以作为关键的纠正者,在28.5%的最难实例上优于全局优越的FM。最终,这使得适用于资源受限边缘部署的高精度科学预测成为可能。代码可在 https://github.com/RupasreeDey/GUARD-KDD2026 获取。

英文摘要

The deployment of Time-Series Foundation Models (TSFMs) in physical sciences is hindered by a critical trade-off: while these models encode rich, universal temporal dynamics, they suffer from severe distributional misalignment when applied zero-shot to specific scientific domains, and their computational cost prohibits deployment in edge-computing sensor networks. We address a fundamental challenge: How can we extract latent structural knowledge from misaligned foundation models (FM) to train lightweight, specialized forecasters? We propose Gated Uncertainty-Aware Routing for Distillation (Guard), a novel framework that reframes multiteacher distillation as an instance-wise decision process with two adaptive mechanisms: (1) a Contextual Router that dynamically selects the most relevant teacher based on local input statistics, exploiting complementarity across diverse foundation models; and (2) an Uncertainty-Gated Temperature mechanism that acts as a "circuit-breaker," automatically attenuating distillation strength when teacher confidence diverges from domain reality. We evaluate our proposed lightweight framework on four climate-critical domains: meteorology, ecosystem carbon flux, soil moisture, and energy grids. Our method significantly reduces RMSE relative to a fixed-weight multi-teacher distillation baseline, successfully distilling knowledge from pretrained FMs (teachers) even when they exhibit suboptimal zero-shot accuracy due to distribution shift between the original and target data domains. We demonstrate that these domain-misaligned teachers can still serve as critical correctives, outperforming the globally superior FMs on 28.5% of the hardest instances. Ultimately, this enables high-precision scientific forecasting suitable for resource-constrained edge deployment. Code is available at https://github.com/RupasreeDey/GUARD-KDD2026.

2606.19371 2026-06-19 cs.LG cs.AI cs.CV 新提交

ProMUSE: Progressive Multi-modal Uncertainty-guided Staged Evidential Alzheimer Disease Classification

ProMUSE: 渐进式多模态不确定性引导的分阶段证据阿尔茨海默病分类

Long Doan, Branden Chen, Ethan Litton, Huan Huang, Jiajing Huang, Yixin Xie, Weihua Zhou, Nandakumar Narayanan, Chen Zhao

发表机构 * Kennesaw State University(肯尼索州立大学) Michigan Technological University(密歇根理工大学) University of Iowa(爱荷华大学)

AI总结 提出ProMUSE,一种渐进式多模态不确定性引导的分阶段证据网络,通过自适应决定何时需要额外模态,在保持准确性的同时降低数据采集成本。

详情
AI中文摘要

阿尔茨海默病(AD)是一种致命性疾病,会破坏老年人的记忆和认知能力。大多数AD治疗在早期阶段有效,导致对早期AD诊断的需求日益增加。AD诊断越来越依赖多模态数据,如临床评估、结构磁共振成像(MRI)和正电子发射断层扫描(PET)成像。然而,MRI和PET采集仍然昂贵且不易普及,使得全模态推理在现实临床工作流程中不切实际。我们提出ProMUSE,一种渐进式多模态不确定性引导的分阶段证据网络,该网络自适应地确定何时需要额外模态,有助于在保持准确性的同时降低数据采集的总体成本。ProMUSE首先使用低成本临床数据进行证据分类,并通过基于Dirichlet的主观逻辑模型量化不确定性。当不确定性超过学习阈值时,ProMUSE逐步引入MRI或PET特征,通过Dempster-Shafer理论融合模态层面的信念和不确定性,获得校准的多模态预测。这种分阶段采集策略能够在最小化对昂贵成像依赖的同时实现准确诊断。在ADNI、AIBL和OASIS数据集上针对CN-AD、CN-MCI和MCI-AD任务的实验表明,ProMUSE在减少50-90%的MRI/PET使用量的同时,实现了与全模态基线相当或更优的准确性,从而大幅节省成本。这些结果突显了ProMUSE作为现实世界AD筛查中一种实用、不确定性感知且资源高效的解决方案。

英文摘要

Alzheimer's disease (AD) is a fatal disorder that destroys memory and cognitive skills in the elderly population. Most treatments for AD are effective in the early stage, leading to an increasing demand for early AD diagnosis. AD diagnosis increasingly relies on multimodal data such as clinical assessments, structural Magnetic Resonance Imaging (MRI), and Positron Emission Tomography (PET) imaging. However, MRI and PET acquisition remain costly and not universally accessible, making full-modality inference impractical in real-world clinical workflows. We propose ProMUSE, a Progressive Multi-modal Uncertainty Guided Staged Evidential Network that adaptively determines when additional modalities are necessary, helping reduce the overall cost of data acquisition while maintaining accuracy. ProMUSE first performs evidential classification using low-cost clinical data and quantifies uncertainty via a Dirichlet-based subjective logic model. When uncertainty exceeds a learned threshold, ProMUSE progressively incorporates MRI or PET features, fusing modality-wise belief and uncertainty through Dempster-Shafer theory to obtain a calibrated multimodal prediction. This staged acquisition strategy enables accurate diagnosis while minimizing reliance on expensive imaging. Experiments on ADNI, AIBL, and OASIS across CN-AD, CN-MCI, and MCI-AD tasks demonstrate that ProMUSE achieves competitive or superior accuracy compared to full-modality baselines while reducing MRI/PET usage by 50-90%, yielding substantial cost savings. These results highlight ProMUSE as a practical, uncertainty-aware, and resource-efficient solution for real-world AD screening.
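The staged pipeline described above can be sketched in Python under some loudly stated assumptions. The formulas are the standard ones from evidential deep learning (Dirichlet-based subjective logic) and the reduced Dempster-Shafer combination rule commonly used for multi-view evidential fusion; they are not confirmed to be ProMUSE's exact implementation. The function names, the fixed `threshold` (learned in the paper), and the evidence values are hypothetical.

```python
import numpy as np


def opinion_from_evidence(evidence):
    """Map non-negative per-class evidence to (belief, uncertainty).

    Dirichlet parameters are alpha = evidence + 1; with strength S = sum(alpha),
    belief_k = evidence_k / S and uncertainty = K / S, so they sum to 1.
    """
    evidence = np.asarray(evidence, dtype=float)
    strength = (evidence + 1.0).sum()
    return evidence / strength, len(evidence) / strength


def ds_combine(b1, u1, b2, u2):
    """Reduced Dempster-Shafer combination of two subjective-logic opinions."""
    # Conflict: belief mass the two sources assign to different classes.
    conflict = np.outer(b1, b2).sum() - np.dot(b1, b2)
    scale = 1.0 / (1.0 - conflict)
    belief = scale * (b1 * b2 + b1 * u2 + b2 * u1)
    uncertainty = scale * u1 * u2
    return belief, uncertainty


def staged_predict(clinical_evidence, imaging_evidence_fn, threshold):
    """Use cheap clinical evidence first; request imaging only if uncertain."""
    belief, u = opinion_from_evidence(clinical_evidence)
    used_imaging = False
    if u > threshold:
        b_img, u_img = opinion_from_evidence(imaging_evidence_fn())
        belief, u = ds_combine(belief, u, b_img, u_img)
        used_imaging = True
    return int(np.argmax(belief)), belief, u, used_imaging


# Hypothetical two-class evidence (e.g. CN vs AD); the numbers are made up.
uncertain_case = staged_predict([1.0, 0.5], lambda: [8.0, 1.0], threshold=0.5)
confident_case = staged_predict([20.0, 1.0], lambda: [8.0, 1.0], threshold=0.5)
```

In this sketch the weak clinical evidence triggers the imaging stage and the fused opinion is less uncertain than either source alone, while the confident case never requests imaging. That second path is where the savings in MRI/PET usage would come from.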

2606.19373 2026-06-19 cs.LG cs.AI 新提交

cAPM: Continual AI-Assisted Pace-Mapping with Active Learning

cAPM:具有主动学习的持续AI辅助起搏标测

Dylan O'Hara, Pradeep Bajracharya, Casey Meisenzahl, Karli Gillette, Anton J. Prassl, Gernot Plank, Saman Nazarian, Roderick Tung, John L Sapp, Linwei Wang

发表机构 * Rochester Institute of Technology(罗切斯特理工学院) University of Utah(犹他大学) Scientific Computing and Imaging Institute, University of Utah(犹他大学科学计算与成像研究所) Medical University of Graz(格拉茨医科大学) University of Pennsylvania Perelman School of Medicine(宾夕法尼亚大学佩雷尔曼医学院) The University of Arizona College of Medicine(亚利桑那大学医学院) Dalhousie University(达尔豪斯大学)

AI总结 提出cAPM框架,通过任务无关的代理神经网络、主动学习和持续学习策略,在减少起搏标测数据量的同时,实现跨室性心动过速的知识迁移,将临床容差(5毫米)范围内的定位概率提升至81%。

详情
AI中文摘要

室性心动过速是一种危及生命的心律失常,是心源性猝死的主要原因。起搏标测是一种临床程序,用于在导管消融室性心动过速期间识别干预靶点。它要求临床医生在心室的不同部位起搏,并快速解释由此产生的心电图,以确定下一步起搏位置或是否已识别出靶点。已提出主动学习AI模型来指导临床医生选择下一个起搏点,显示出在减少起搏点数量和改善起搏标测效率方面的潜力。现有方法需要对每个靶点重新训练,无法在同一患者或不同患者的多个室性心动过速之间迁移知识。我们引入cAPM用于持续AI辅助起搏标测,以捕获和迁移从过去起搏标测数据中积累的知识,从而减少未来靶点室性心动过速所需的起搏标测数据量。这是通过一个任务无关的代理神经网络实现的,该网络学习从起搏点到12导联心电图形态的映射;一种主动学习策略,通过为每个靶点选择信息量最大的起搏点来优化该代理模型;以及一种持续学习策略,以顺序方式执行此操作,同时保留先前靶点的知识。在由不同生理条件和心室几何形状下顺序呈现的定位任务组成的计算机模拟测试平台上评估,cAPM(无论是否重放过去数据样本)在使用4.5个起搏标测点时,在临床耐受范围内(5毫米精度)定位的概率达到81%,而最先进的主动学习方法使用13.7个起搏点达到38%的概率。这些结果为cAPM准备用于体内临床前和临床研究提供了坚实基础,在这些研究中,cAPM可用于指导起搏标测。

英文摘要

Ventricular tachycardia is a life-threatening rhythm disorder and a major cause of sudden cardiac death. Pace-mapping is a clinical procedure for identifying the intervention target during catheter ablation of VT. It requires clinicians to pace different sites in the ventricles and rapidly interpret the resulting electrocardiograms to determine where to pace next or whether a target site has been identified. Active learning AI models have been proposed to guide clinicians to the next pacing site, showing promise in reducing the number of pacing sites and improving the efficiency of pace-mapping. Existing methods require retraining each target without the ability to transfer knowledge across multiple VTs within the same patient or across patients. We introduce cAPM for continuous AI-assisted pace-mapping to capture and transfer knowledge accumulated from past pace-mapping data to reduce the number of pace-mapping data needed for future target VTs. This is made possible by a task-agnostic surrogate neural network that learns the mapping from pacing sites to 12-lead ECG morphology, an active-learning strategy that refines this surrogate model by selecting the most informative pacing site for each target, and a continual learning strategy to do so sequentially while retaining knowledge from prior targets. Evaluated on an in-silico testbed consisting of sequentially-presented localization tasks across different physiological conditions and ventricular geometries, cAPM with and without replay of past data samples achieved an 81% probability of localizing within clinical tolerance (5 mm accuracy) using 4.5 pace-mapping sites, compared to the state-of-the-art active-learning method achieving 38% probability using 13.7 pacing sites. These results provide a strong basis for preparing cAPM towards in-vivo preclinical and clinical studies where it can be used to guide pace-mapping.

2606.19375 2026-06-19 cs.LG cond-mat.mtrl-sci 新提交

Physics-Informed Discovery of Yield Functions in Plasticity via Convex Neural Representations

基于凸神经表示的塑性屈服函数物理信息发现

Hyeonbin Moon, Donghyuk Cho, Jecheon Yu, Jeong Whan Yoon, Seunghwa Ryu

发表机构 * KAIST(韩国科学技术院)

AI总结 提出一种物理信息框架,从全场位移和反力数据中自动发现各向异性屈服函数,无需应力观测或预设参数形式,采用凸神经网络表示并嵌入弹塑性应力积分中训练。

Comments 39 pages

详情
AI中文摘要

识别各向异性屈服函数仍然具有挑战性,因为屈服在全场力学测量中无法直接观测,方向标定可能需要多个加载方向,且选择合适的解析形式并非易事。本研究提出一种物理信息框架,用于从全场位移数据和反力数据中发现屈服函数,无需应力观测、塑性应变测量、直接屈服面数据或预设的参数化屈服函数。该框架将屈服函数识别为弹塑性应力积分中受力学约束的本构组成部分,而非通过直接的应力空间监督。屈服函数由凸神经网络表示,该网络强制执行凸性和一次正齐次性,同时施加假定的拉压对称性,并通过可微应力更新和跨多个加载工况的物理信息力平衡损失来训练该神经屈服函数。使用von Mises、Hill 1948和Yld2000-2d屈服函数的有限元基准研究验证了所提框架,评估了屈服轮廓一致性、位移噪声敏感性、通过塑性活跃应力状态的可识别性、认知不确定性和多项式代理部署。本研究提供了一条受力学约束的路径,用于从位移和力数据中发现各向异性屈服函数,同时将识别出的组件保留在弹塑性应力积分的结构内。

英文摘要

Identifying anisotropic yield functions remains challenging since yielding is not directly observed in full-field mechanical measurements, directional calibration can require many loading directions, and selecting an appropriate analytical form is nontrivial. This study proposes a physics-informed framework for discovering yield functions from full-field displacement data and reaction force data, without stress observations, plastic strain measurements, direct yield surface data, or a prescribed parametric yield function. The framework identifies the yield function as a mechanically constrained constitutive component inside elastoplastic stress integration, rather than through direct stress-space supervision. The yield function is represented by a convex neural network that enforces convexity and positive homogeneity of degree one while imposing the assumed tension-compression symmetry, and this neural yield function is trained with a differentiable stress update and a physics-informed force equilibrium loss across multiple loading cases. The proposed framework is validated using finite element (FE) benchmark studies with von Mises, Hill 1948, and Yld2000-2d yield functions, assessing yield contour agreement, displacement-noise sensitivity, identifiability through plastically active stress states, epistemic uncertainty, and polynomial-surrogate deployment. This study provides a mechanics-constrained pathway for discovering anisotropic yield functions from displacement and force data while keeping the identified component within the structure of elastoplastic stress integration.

2606.19376 2026-06-19 cs.LG cs.AI cs.IR 新提交

Cost-Optimal LLM Routing with Limited User Feedback under User Satisfaction Guarantees

在用户满意度保证下基于有限用户反馈的成本最优LLM路由

Herbert Woisetschläger, Arastun Mammadli, Ryan Zhang, Shiqiang Wang

发表机构 * Technical University of Munich(慕尼黑工业大学) University of Exeter(埃克塞特大学) Horace Greeley High School(霍勒斯格里利高中)

AI总结 针对LLM推理成本与服务质量之间的矛盾,提出SLARouter在线路由算法,利用稀疏单侧用户反馈学习成本最优策略,理论保证成本最优和SLA合规,实验显示成本降低高达2.2倍。

Comments Preprint. Under review

详情
AI中文摘要

大型语言模型(LLM)应用的推理成本正在快速增长,这是由于需求激增和基础设施成本上升所驱动的。用户期望高质量的响应,在商业环境中,这被正式编码在服务级别协议(SLA)中,从而在成本和质量之间形成了根本性的矛盾。最近在成本感知的LLM请求路由方面的进展显示出解决这一矛盾的潜力,但现有方法依赖于完整的反馈信号、离线训练、大量的每工作负载调优,并且大多数缺乏SLA保证或推理时适应性。我们引入了SLARouter,一种在线路由算法,它从生产系统中可用的稀疏、单侧用户反馈中学习成本最优策略。SLARouter为成本最优性和严格的SLA合规性提供了理论保证。在广泛的LLM基准测试上的实验表明,SLARouter无需每基准调优即可满足SLA约束,相比现有基线将运营成本降低多达2.2倍。

英文摘要

Inference costs for large language model (LLM) applications are rapidly growing, driven by surging demand and rising infrastructure cost. Users expect high-quality responses, and in commercial settings this is formally codified in Service Level Agreements (SLAs), creating a fundamental tension between cost and quality. Recent progress on cost-aware LLM request routing has shown potential to resolve this tension, but existing approaches rely on complete feedback signals, offline training, extensive per-workload tuning, and most lack SLA guarantees or inference-time adaptivity. We introduce SLARouter, an online routing algorithm that learns a cost-optimal policy from the sparse, one-sided user feedback available in production systems. SLARouter provides theoretical guarantees for both cost optimality and strict SLA compliance. Experiments across a wide range of LLM benchmarks show that SLARouter satisfies SLA constraints without the need for per-benchmark tuning, reducing operating cost by up to 2.2x over existing baselines.

2606.19378 2026-06-19 cs.LG cond-mat.mtrl-sci 新提交

A Hybrid GNN-FEM Framework for Phase-Field Fracture Simulation. Physics-Preserving Hybridization for Generalizable Surrogate Modeling

一种用于相场断裂模拟的混合GNN-FEM框架:面向通用代理模型的物理保持混合方法

Hyeonbin Moon, Yongjin Choi, Seunghwa Ryu

发表机构 * KAIST(韩国科学技术院)

AI总结 提出混合GNN-FEM框架,用图神经网络替代相场更新步骤,保留FEM位移求解器,通过无量纲特征设计和物理信息损失实现跨几何、载荷、材料和离散化的通用断裂模拟,降低计算成本并保持精度。

Comments 46 pages

详情
AI中文摘要

科学机器学习(SciML)已成为加速复杂物理系统模拟的一种有前景的方法,但对于非线性、历史依赖问题实现物理一致且可泛化的预测仍然是一个核心挑战。在本研究中,我们提出了一种混合GNN-FEM框架,用于高效且可泛化的相场断裂建模。虽然相场方法为模拟复杂裂纹演化提供了稳健的变分框架,但其高计算成本限制了实际应用,因为需要在增量有限元过程中求解耦合、非线性和历史依赖的系统。为应对这一挑战,我们将图神经网络代理集成到传统的交错方案中,在每个载荷增量下替代相场更新,同时保留基于FEM的位移求解器以强制执行力学平衡和边界条件。通过保留增量求解结构,该框架与历史依赖的断裂演化保持一致,而无需代理近似整个解轨迹。这种选择性代理策略强调识别物理上有意义且增量结构化的学习目标,而非依赖暴力数据生成来学习整个断裂过程。所提出的框架通过无量纲特征设计、基于网格域的图公式以及源自控制相场方程的物理信息损失,实现了跨不同几何、载荷条件、材料属性和离散化的强泛化能力。数值实验表明,与传统FEM相比,该混合方法在保持精度的同时降低了计算成本,并在多种问题设置下展现出稳健的预测性能。

英文摘要

Scientific machine learning (SciML) has emerged as a promising approach for accelerating simulations of complex physical systems, yet achieving physically consistent and generalizable predictions for nonlinear, history-dependent problems remains a central challenge. In this study, we propose a hybrid GNN-FEM framework for efficient and generalizable phase-field fracture modeling. While phase-field approaches provide a robust variational framework for simulating complex crack evolution, their high computational cost limits practical applications because they require solving coupled, nonlinear, and history-dependent systems within an incremental finite element procedure. To address this challenge, a graph neural network surrogate is integrated into the conventional staggered scheme, replacing the phase-field update at each load increment while retaining the FEM-based displacement solver to enforce mechanical equilibrium and boundary conditions. By preserving the incremental solution structure, the framework remains consistent with history-dependent fracture evolution without requiring the surrogate to approximate the full solution trajectory. This selective surrogate strategy emphasizes the identification of a physically meaningful and incrementally structured learning target, rather than relying on brute-force data generation to learn the full fracture process. The proposed framework achieves strong generalization across varying geometries, loading conditions, material properties, and discretizations through dimensionless feature design, a graph-based formulation on mesh-based domains, and a physics-informed loss derived from the governing phase-field equation. Numerical experiments demonstrate that the hybrid approach reduces computational cost while maintaining accuracy compared with conventional FEM, and exhibits robust predictive performance across diverse problem settings.

2606.19399 2026-06-19 cs.LG cs.AI cs.LO cs.PL 新提交

VERITAS: Verifier-Guided Proof Search for Zero-Shot Formal Theorem Proving

VERITAS:验证器引导的零样本形式定理证明搜索

Manish Acharya, Zhenyu Liao, Yueke Zhang, Kevin Leach, Yu Huang, Yifan Zhang

发表机构 * Department of Computer Science, Vanderbilt University(范德堡大学计算机科学系) Amazon(亚马逊)

AI总结 提出VERITAS框架,通过两阶段协议(Best-of-N采样+批评引导MCTS)利用验证器反馈进行零样本定理证明,在miniF2F上达40.6%准确率,并发布组合学基准VERITAS-CombiBench。

详情
AI中文摘要

基于LLM的形式化证明器通常将丰富的验证器信号(语法错误、类型不匹配、部分目标进展)压缩为二进制的通过/失败位。我们提出VERITAS,一个零样本框架,通过两阶段协议将每个验证器信号路由回证明搜索:首先进行Best-of-N采样,然后进行批评引导的MCTS遍历,该遍历将第一阶段失败作为显式负例吸收。该协议保留其第一阶段扫描解决的每个定理,因此第二阶段额外的解决可归因于反馈驱动的探索。VERITAS在miniF2F上达到40.6%(相比之下,独立运行的Best-of-5为36.9%,Portfolio为26.2%),在VERITAS-CombiBench上达到7.3%,这是一个我们发布的55个定理的组合学基准,在该基准上Best-of-5(1.8%)低于Portfolio(3.6%),暴露了当必须从验证器反馈中迭代恢复正确的引理名称时,无指导的采样会带来损害。工件可在GitHub上获取。

英文摘要

LLM-based formal provers often collapse rich verifier signals (syntax errors, type mismatches, partial goal progress) into a binary pass/fail bit. We present VERITAS, a zero-shot framework that routes every verifier signal back into proof search through a two-phase protocol: Best-of-N sampling first, then a critic-guided MCTS pass that ingests Phase 1 failures as explicit negative examples. The protocol preserves every theorem solved by its own Phase 1 sweep, so Phase 2's additional solves are attributable to feedback-driven exploration. VERITAS reaches 40.6% on miniF2F (vs. an independently run Best-of-5 at 36.9%, Portfolio 26.2%) and 7.3% on VERITAS-CombiBench, a 55-theorem combinatorics benchmark we release on which Best-of-5 (1.8%) falls below Portfolio (3.6%), exposing that unguided sampling hurts when correct lemma names must be recovered iteratively from verifier feedback. Artifacts are available on GitHub.

2606.19412 2026-06-19 cs.LG 新提交

Spectral Retrieval-Augmented Time-Series Forecasting

频谱检索增强的时间序列预测

Huu Hiep Nguyen, Minh Hoang Nguyen, Dung Nguyen, Hung Le

发表机构 * Applied Artificial Intelligence Initiative(应用人工智能倡议) Deakin University(迪肯大学)

AI总结 提出SpecReTF方法,通过将时间序列转换为窗口化频率表示并采用结合幅度和相位的相似性度量,以及指数移动平均加权方案,解决了现有检索方法在频谱盲区和时间近因上的局限性,提升了非平稳时间序列预测的准确性。

详情
AI中文摘要

时间序列预测利用历史模式来预测未来值,但传统方法在处理复杂、非平稳模式时面临挑战,这些模式在训练期间难以记忆。检索增强方法通过检索相似历史模式来增强预测,已成为有前景的解决方案。然而,现有检索方法存在两个基本局限性:频谱盲区,即忽略了捕捉潜在周期结构的关键频域特征;以及时间近因,即对所有历史数据一视同仁,而不强调最近、更相关的模式。在本文中,我们提出SpecReTF,一种新颖的检索方法,通过将时间序列转换为窗口化频率表示,并使用结合幅度和相位信息的组合度量来衡量相似性,从而解决这些问题。为了平衡近因和历史上下文,我们应用指数移动平均加权方案,强调最近的窗口。在基准数据集上的大量实验表明,SpecReTF优于时域检索方法,在多样化的非平稳时间序列上实现了卓越的预测准确性。

英文摘要

Time series forecasting leverages historical patterns to predict future values, but traditional methods face challenges when dealing with complex, non-stationary patterns that are difficult to memorize during training. Retrieval-augmented approaches have emerged as promising solutions by retrieving similar historical patterns to enhance predictions. However, existing retrieval methods suffer from two fundamental limitations: spectral blindness, which overlooks critical frequency-domain characteristics that capture underlying periodic structures, and temporal recency, which treats all historical data equally without emphasizing recent, more relevant patterns. In this paper, we propose SpecReTF, a novel retrieval method that addresses these issues by converting time series into windowed frequency representations, measuring similarity with a combined metric that captures both amplitude and phase information. To balance recency and historical context, we apply an exponential moving average weighting scheme that emphasizes recent windows. Extensive experiments on benchmark datasets demonstrate that SpecReTF outperforms time-domain retrieval methods, achieving superior forecasting accuracy across diverse, non-stationary time series.
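To make the retrieval idea concrete, here is a minimal NumPy sketch of spectral retrieval with recency weighting. The abstract does not give the exact similarity metric or weighting formula, so everything specific here is an assumption: the equal-weight blend of amplitude cosine similarity and amplitude-weighted phase agreement (`alpha`), the geometric `decay` standing in for the exponential-moving-average weighting, and all function names.

```python
import numpy as np


def window_spectra(series, window, stride):
    """Slice the history into windows and return their start indices and rFFTs."""
    starts = np.arange(0, len(series) - window + 1, stride)
    windows = np.stack([series[s:s + window] for s in starts])
    windows = windows - windows.mean(axis=1, keepdims=True)  # remove DC offset
    return starts, np.fft.rfft(windows, axis=1)


def spectral_similarity(query_spec, cand_specs, alpha=0.5, eps=1e-12):
    """Blend amplitude similarity with amplitude-weighted phase agreement."""
    q_amp, c_amp = np.abs(query_spec), np.abs(cand_specs)
    amp_sim = (c_amp @ q_amp) / (
        np.linalg.norm(c_amp, axis=1) * np.linalg.norm(q_amp) + eps
    )
    phase_diff = np.angle(cand_specs) - np.angle(query_spec)
    weight = c_amp * q_amp  # ignore the phase of near-empty frequency bins
    phase_sim = (weight * np.cos(phase_diff)).sum(axis=1) / (weight.sum(axis=1) + eps)
    return alpha * amp_sim + (1.0 - alpha) * phase_sim


def retrieve(series, query, stride=1, top_k=3, decay=0.999):
    """Return the start indices and scores of the top_k historical windows."""
    starts, specs = window_spectra(series, len(query), stride)
    query_spec = np.fft.rfft(query - query.mean())
    sim = spectral_similarity(query_spec, specs)
    age = starts.max() - starts          # 0 for the most recent window
    score = sim * decay ** age           # exponentially favour recent windows
    order = np.argsort(-score)[:top_k]
    return starts[order], score[order]


# Toy periodic history (period 24) and a query that continues it.
t = np.arange(240)
history = np.sin(2 * np.pi * t / 24)
query = np.sin(2 * np.pi * np.arange(240, 264) / 24)
hits, scores = retrieve(history, query)

# A window shifted by half a period has the same amplitudes but opposite phase.
_, specs = window_spectra(history, 24, 1)
q_spec = np.fft.rfft(query - query.mean())
aligned_sim, shifted_sim = spectral_similarity(q_spec, specs[[216, 12]])
```

The half-period-shifted window shows why phase is part of the metric: an amplitude-only comparison would rate it as a perfect match even though it moves in the opposite direction to the query.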

2606.19413 2026-06-19 cs.LG 新提交

Does Text Actually Help? Uncovering and Resolving Text Collapse in Multimodal Time Series Forecasting

文本真的有用吗?揭示并解决多模态时间序列预测中的文本坍缩问题

Huu Hiep Nguyen, Minh Hoang Nguyen, Dung Nguyen, Hung Le

AI总结 针对多模态时间序列预测中文本分支被忽视导致“文本坍缩”的问题,提出REST-TS方法,通过让文本分支专门预测数值主干无法解释的残差,强制其提取真实内容,实现最先进性能。

详情
AI中文摘要

多模态时间序列预测将数值序列与领域相关的文本报告配对,有望将世界知识注入预测流程。然而,我们揭示了现有框架中的一个关键失败模式,称为文本坍缩:文本分支收敛到与内容无关的变换,无论输入描述如何,都贡献可忽略的判别信号。我们认为文本坍缩是时间序列预测中基本不对称性的结果:数值输入与输出强自相关,使得数值主干天生占主导地位,而文本分支尽管携带互补且通常关键的信息,却未被充分利用,导致其系统性欠利用。为解决此问题,我们提出REST-TS(时间序列中文本的残差独占监督),将不对称性转化为设计原则:数值主干产生其独立的数值预测,而文本分支被独占监督以预测残差的结构化组成部分,即数值无法解释的预测差距。由于没有数值路径可以减少这些损失,文本分支必须从输入描述中提取真实内容。在多样化的现实领域和主干架构上的评估表明,REST-TS实现了最先进的性能,并一致地显示出比现有框架更高的文本分支利用率,提供了强有力的经验证据,表明对文本分支进行残差监督迫使其从输入中提取真实内容。

英文摘要

Multimodal time series forecasting, which pairs numerical sequences with domain-relevant textual reports, promises to inject world knowledge into forecasting pipelines. However, we uncover a critical failure mode in existing frameworks that we term text collapse: the text branch converges to a content-independent transformation, contributing negligible discriminative signal regardless of the input description. We argue that text collapse is a consequence of a fundamental asymmetry in time series forecasting: the numerical input is strongly autocorrelated with the output, making the numerical backbone inherently dominant, while the text branch, despite carrying complementary and often critical information, is insufficiently utilized, leading to its systematic underexploitation. To address this, we propose REST-TS (Residual-Exclusive Supervision for Text in Time Series), which turns the asymmetry into a design principle: the numerical backbone produces its own independent numerical forecast, and the text branch is exclusively supervised to predict the structured components of the residual, the prediction gap that numbers cannot explain. Because no numerical pathway can reduce these losses, the text branch must extract genuine content from the input description. Evaluated across diverse real-world domains and backbone architectures, REST-TS achieves state-of-the-art performance and consistently demonstrates greater text-branch utilization than existing frameworks, providing strong empirical evidence that supervising the text branch on the residual compels it to extract genuine content from the input.
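The residual-supervision arrangement can be illustrated with a deliberately simplified Python sketch. It replaces both neural branches with closed-form linear least squares and uses synthetic data, so it shows only how the training targets are arranged, not the REST-TS architecture or its "structured components" of the residual. All variable names, feature dimensions, and coefficients are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
x_num = rng.normal(size=(n, 4))   # stand-in for numerical history features
x_txt = rng.normal(size=(n, 3))   # stand-in for text embedding features

# Synthetic target: mostly explained by numbers, partly only by the text.
y = (
    x_num @ np.array([1.0, -0.5, 0.3, 0.0])
    + 0.8 * x_txt[:, 0]
    + 0.05 * rng.normal(size=n)
)

# Step 1: the numerical backbone is fitted on the target alone.
w_num, *_ = np.linalg.lstsq(x_num, y, rcond=None)
y_num = x_num @ w_num

# Step 2: the text branch sees only the residual the backbone left behind.
# The residual is a fixed target here, so no numerical pathway can reduce
# the text branch's loss.
residual = y - y_num
w_txt, *_ = np.linalg.lstsq(x_txt, residual, rcond=None)

y_hat = y_num + x_txt @ w_txt
mse_backbone = float(np.mean((y - y_num) ** 2))
mse_combined = float(np.mean((y - y_hat) ** 2))
```

Because the text weights are fitted against the residual only, any reduction from `mse_backbone` to `mse_combined` has to come from information in the text features. That is the property the abstract relies on to argue the text branch cannot collapse to a content-independent transformation.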

2606.19560 2026-06-19 cs.LG 新提交

Understanding Key Features of Time Series Foundation Models from Epidemic Forecasting

从流行病预测理解时间序列基础模型的关键特征

Alireza Jafari, Judy Fox, Geoffrey C. Fox, Madhav Marathe, Aniruddha Adiga

发表机构 * Department of Computer Science, School of Engineering and Applied Science, University of Virginia(弗吉尼亚大学工程与应用科学学院计算机科学系) School of Data Science, University of Virginia(弗吉尼亚大学数据科学学院) Biocomplexity Institute, University of Virginia(弗吉尼亚大学生物复杂性研究所) Department of Electrical and Computer Engineering, School of Engineering and Applied Science, University of Virginia(弗吉尼亚大学工程与应用科学学院电气与计算机工程系)

AI总结 系统评估多种时间序列模型在流感预测中的表现,发现混合专家模型性能最优,预训练在长时域提升显著,而LLM方法效果较差。

Comments 15 pages, 2 figures, 9 tables

详情
AI中文摘要

季节性流感每年感染数百万人,并在美国造成大量发病和死亡,因此准确的短期预测成为核心公共卫生需求。可靠的流行病时间序列预测可以为疫苗接种时机、医院人员配备和资源分配提供信息,然而现代预测架构在传染病监测数据上的比较行为仍未得到充分表征。我们通过系统评估区域流感预测来填补这一空白,使用流感样疾病监测和流感相关住院时间序列,在时间泛化和空间泛化设置下进行1-4周提前预测。我们比较了经典神经网络架构、基于数值的Transformer模型、预训练时间序列基础模型和基于LLM的预测方法。在各项任务中,我们证明融合多个预训练预测器的混合专家模型实现了最强的整体性能,表明异质预训练表示提供了互补的预测信息。我们的结果进一步表明,基于数值的Transformer模型产生可靠的预测,而预训练在更长时域上提供最大增益,特别是当预训练领域与流感动力学机制一致时。相比之下,基于LLM的时间序列方法在此设置下表现不如数值预测器。最后,我们研究了住院信息作为辅助协变量和预训练源的作用。住院信号在特定设置中提供了互补的改进,并阐明了额外的监测流如何增强多时域预测的鲁棒性。这些发现为流感防范的模型选择、预训练策略和辅助信号使用提供了可操作的指导。

英文摘要

Seasonal influenza infects millions of people and causes substantial morbidity and mortality in the United States each year, making accurate short-term forecasting a core public-health need. Reliable forecasts of epidemic time series can inform vaccination timing, hospital staffing, and resource allocation, yet the comparative behavior of modern forecasting architectures on infectious-disease surveillance data remains insufficiently characterized. We address this gap through a systematic evaluation of regional influenza forecasting using influenza-like illness surveillance and influenza-associated hospitalization time series under both temporal and spatial generalization settings for 1-4-week-ahead prediction. We compare classical neural network architectures, numerical transformer-based models, pretrained time series foundation models, and LLM-based forecasting approaches. Across tasks, we demonstrate that a mixture-of-experts model that fuses multiple pretrained forecasters achieves the strongest overall performance, indicating that heterogeneous pretrained representations provide complementary predictive information. Our results further show that numerical transformer-based models produce reliable forecasts, while pretraining provides the largest gains at longer horizons, particularly when the pretraining domain is mechanistically aligned with influenza dynamics. In contrast, LLM-based time series methods underperform relative to numerical forecasters in this setting. Finally, we examine hospitalization information as both an auxiliary covariate and a pretraining source. Hospitalization signals provide complementary improvements in selected settings and clarify when additional surveillance streams enhance the robustness of multi-horizon forecasting. These findings provide actionable guidance on model selection, pretraining strategy, and auxiliary-signal use for influenza preparedness.

2606.19562 2026-06-19 cs.LG physics.flu-dyn 新提交

Advances in Scientific Machine Learning for Coupled Fluid Flow and Transport

耦合流体流动与输运的科学机器学习进展

Gabriel F. Barros, Rômulo M. Silva, Alvaro L. G. A. Coutinho

发表机构 * COPPE - Federal University of Rio de Janeiro - UFRJ(里约热内卢联邦大学COPPE学院)

AI总结 综述科学机器学习在耦合流体流动与输运问题中的进展,包括基于SVD的线性降阶和PINNs、β-VAE等神经网络方法,并展示其在浊流和热对流中的应用。

详情
AI中文摘要

本章回顾了科学机器学习(SciML)在模拟由不可压缩Navier-Stokes方程和标量输运方程控制的耦合流体流动与输运现象方面的最新进展。这类系统出现在浊流和热对流等应用中,具有强非线性耦合和多尺度行为,使得高保真模拟计算成本高昂。为此,本章调查了构建高效代理模型的最新SciML方法,包括基于奇异值分解的线性降阶技术(如动态模态分解)和非线性神经网络方法(如物理信息神经网络(PINNs)和β-变分自编码器(β-VAEs))。首先介绍了作者将这些模型与高性能计算策略相结合的工作,包括自适应网格细化/粗化(AMR/C)和科学浮点数据压缩。然后提出了两个新贡献:通过PINNs对浊流进行代理建模,以及使用β-VAEs从热流中提取解缠的非线性模态。控制方程和代表性基准(包括锁交换流和Rayleigh-Bénard对流)说明了这些方法。本章篇幅较长,涵盖了耦合流体流动的数学和物理基础以及最先进建模的计算方面。总体而言,它展示了SciML如何在特定数据范围和建模假设下,实现复杂耦合系统的快速、精确近似,同时相对于全阶模拟大幅降低计算成本。实时预测和不确定性量化等更广泛的能力仍然是活跃的研究方向,其可行性在很大程度上取决于具体问题。

英文摘要

This chapter reviews recent advances in Scientific Machine Learning (SciML) for modeling coupled fluid flow and transport phenomena governed by the incompressible Navier-Stokes and scalar transport equations. Such systems, found in applications like turbidity currents and thermal convection, feature strong nonlinear coupling and multiscale behavior that make high-fidelity simulations computationally expensive. To address this, the chapter surveys state-of-the-art SciML methods for building efficient surrogate models, including linear reduced-order techniques based on Singular Value Decomposition (such as Dynamic Mode Decomposition) and nonlinear neural network approaches like Physics-Informed Neural Networks (PINNs) and β-Variational Autoencoders (β-VAEs). It first covers the authors' work combining these models with High Performance Computing strategies, including Adaptive Mesh Refinement/Coarsening (AMR/C) and scientific floating-point data compression. It then presents two new contributions: surrogate modeling of turbidity currents via PINNs, and the extraction of disentangled nonlinear modes from thermal flows using β-VAEs. Governing equations and representative benchmarks, including lock-exchange flows and Rayleigh-Bénard convection, illustrate these methodologies. The chapter is intentionally long, covering both the mathematical and physical foundations of coupled fluid flow and the computational aspects of state-of-the-art modeling. Overall, it demonstrates how SciML enables fast, accurate approximations of complex coupled systems within the specific data regimes and modeling assumptions considered, while substantially reducing computational cost relative to full-order simulations. Broader capabilities such as real-time prediction and uncertainty quantification remain active research directions whose feasibility depends strongly on the problem at hand.

2606.19623 2026-06-19 cs.LG 新提交

SEAGAN: domain-Specific and Edge-Aware Graph Attention Network for Dynamic Plant Processes

SEAGAN:面向动态植物过程的领域特定与边感知图注意力网络

Antriksh Srivastava, Soumyashree Kar

AI总结 提出SEAGAN,将植物A-Ci曲线中的生化限制状态识别建模为图节点分类问题,利用基于距离的kNN和辅助信号引导连接构建图,通过边感知图注意力网络提升分类性能,F1分数达0.857。

详情
AI中文摘要

图神经网络(GNN)为从通过物理、生物或功能关系关联的科学数据中学习提供了灵活框架。一个有前景的领域是植物生理学,其中测量的响应通常来自多个相互作用的过程,即使通过人工干预,这些过程的精确分离仍然困难。在植物生理学中,一个关键例子是A-Ci曲线,它关联净CO2同化速率(Anet)与叶片胞间CO2浓度(Ci),并用于估计叶片和作物冠层模型中的光合参数。然而,可靠估计需要识别每个曲线点处的活跃生化限制状态,这仍然是主要的不确定性来源。在这里,我们将沿A-Ci曲线的限制状态识别表述为基于图的节点分类问题,以曲线点为节点。使用基于距离的k近邻(kNN)和辅助信号引导(ASG)连接创建领域特定的图表示,边属性编码成对关系。该框架与常规学习基线、基于图的架构以及基于自动拟合的基准进行了评估。在具有已知真实限制状态的大型合成数据集上的结果表明,基于图的模型改善了分类,特别是在生化过渡区域附近。最佳配置SEAGAN(面向动态植物过程的领域特定与边感知图注意力网络)整合了过程感知节点特征、边属性、kNN连接和带加权交叉熵损失的图注意力,实现了0.857的F1分数和0.882的准确率。结果表明,将A-Ci曲线表示为图改善了生化限制状态分析,而局部kNN邻域上的边感知注意力提供了最有效的策略。

英文摘要

Graph neural networks (GNNs) provide a flexible framework for learning from scientific data linked through physical, biological, or functional relationships. One promising domain is plant physiology, where measured responses often arise from multiple interacting processes whose exact separation remains difficult even with manual intervention. In plant physiology, a key example is the A-Ci curve, which relates net CO2 assimilation rate (Anet) to leaf intercellular CO2 concentration (Ci) and is used to estimate photosynthetic parameters in leaf and crop-canopy models. However, reliable estimation requires identifying the active biochemical limitation state at each curve point, which remains a major source of uncertainty. Here, we formulate limitation-state identification along A-Ci curves as a graph-based node classification problem, with curve points as nodes. Domain-specific graph representations are created using distance-based k-nearest-neighbor (kNN) and auxiliary-signal-guided (ASG) connectivity, with edge attributes encoding pairwise relations. The framework was evaluated against conventional learning baselines, graph-based architectures, and an automated fitting-based benchmark. Results on a large synthetic dataset with known ground-truth limitation states show that graph-based models improve classification, particularly near biochemical transition regions. The best-performing configuration, SEAGAN (domain-Specific and Edge-Aware Graph Attention Network for Dynamic Plant Processes), integrates process-aware node features, edge attributes, kNN connectivity, and graph attention with weighted cross-entropy loss, achieving an F1-score of 0.857 and an accuracy of 0.882. The results show that representing A-Ci curves as graphs improves biochemical limitation-state analysis, with edge-aware attention over local kNN neighborhoods providing the most effective strategy.
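The distance-based kNN connectivity with edge attributes mentioned above can be sketched roughly as follows. This is only an illustration under stated assumptions: each node is a (Ci, Anet) pair, distance is plain Euclidean on unscaled values, and the edge attributes (`d_ci`, `d_anet`, `dist`) are hypothetical choices, not the features used in the paper. The curve values are made up.

```python
import math
from typing import Dict, List, Tuple

Point = Tuple[float, float]           # (Ci, Anet) for one curve point
Edge = Tuple[int, int, Dict[str, float]]


def knn_edges(points: List[Point], k: int) -> List[Edge]:
    """Connect every curve point to its k nearest neighbours.

    Returns directed edges (source, target, attributes). In practice Ci and
    Anet live on very different scales and would be normalised first.
    """
    edges: List[Edge] = []
    for i, (ci_i, a_i) in enumerate(points):
        # Distance from node i to every other node, closest first.
        dists = sorted(
            (math.hypot(ci_j - ci_i, a_j - a_i), j)
            for j, (ci_j, a_j) in enumerate(points)
            if j != i
        )
        for dist, j in dists[:k]:
            attrs = {
                "d_ci": points[j][0] - ci_i,    # signed Ci difference
                "d_anet": points[j][1] - a_i,   # signed Anet difference
                "dist": dist,
            }
            edges.append((i, j, attrs))
    return edges


# Made-up, already rescaled curve points for illustration only.
curve = [(0.0, 0.0), (1.0, 0.5), (2.0, 0.9), (5.0, 1.0)]
edges = knn_edges(curve, k=1)
pairs = [(i, j) for i, j, _ in edges]
```

A graph attention layer could then consume `pairs` as the edge index and the attribute dictionaries as edge features.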

2606.20015 2026-06-19 cs.LG 新提交

Adaptive Distance-Aware Trunk Deep Operator Learning for Long-Span Roadway Bridges

自适应距离感知主干深度算子学习用于大跨度公路桥梁

Bilal Ahmed, Diab W. Abueidda, Waleed El-Sekelly, Tarek Abdoun, Mostafa E. Mobasher

发表机构 * Urban Engineering Department, New York University Abu Dhabi, United Arab Emirates; National Center for Supercomputing Applications, University of Illinois at Urbana-Champaign, United States of America; Department of Structural Engineering, Mansoura University, Mansoura, Egypt

AI总结 提出自适应主干DeepONet框架,通过KNN构建荷载相关学习域、距离感知特征和基于刚度信息的Schur补重建,实现大跨度桥梁局部响应高精度快速预测,相对误差低于5%,速度提升约60倍。

Comments 39 pages, 26 figures

详情
AI中文摘要

大跨度公路桥梁在车辆荷载下表现出高度局部化的结构响应,使得重复有限元分析在影响面生成和结构数字孪生等应用中计算成本高昂。现有的科学机器学习方法难以准确捕捉这些局部响应。为解决这一挑战,本研究提出了一种自适应主干DeepONet用于大型桥梁系统的局部结构响应预测。该框架利用KNN策略动态构建荷载相关的学习域,使网络聚焦于结构影响区域。主干网络进一步通过距离感知特征增强,这些特征编码了荷载与结构节点之间的几何关系。通过基于刚度信息的Schur补公式引入基于物理的全场重建,使得自适应节点上的预测能够扩展到整个结构域。为了实现可扩展训练,使用降阶等效壳模型生成响应数据,该模型保留了主要的全局行为,同时显著降低了计算成本。该框架在基准桥梁模型和真实世界的Mussafah桥上进行了验证。结果表明,该方法实现了有限元级别的精度,相对误差低于5%,同时将总响应评估时间(包括全场重建)减少了约60倍;排除后处理重建步骤,AD-DeepONet推理比有限元最多快四个数量级。此外,该框架能够在任意车辆荷载配置下快速生成全场响应、影响线和影响面,显示出在大规模桥梁分析和数字孪生应用中的巨大潜力。

英文摘要

Long-span roadway bridges exhibit highly localized structural responses under vehicular loading, making repeated FE analysis computationally expensive for applications such as influence surface generation and structural digital twins. Existing SciML approaches struggle to accurately capture these localized responses. To address this challenge, this study proposes an adaptive-trunk DeepONet for localized structural response prediction in large-scale bridge systems. The framework dynamically constructs a load-dependent learning domain using a KNN strategy, allowing the network to focus on structural influence zones. The trunk network is further enhanced using distance-aware features that encode the geometric relationship between the load and structural nodes. A physics-based full-field reconstruction is incorporated through a stiffness-informed Schur complement formulation, enabling predictions at adaptive nodes to be extended to the entire structural domain. To enable scalable training, response data are generated using a reduced-order equivalent shell model that preserves the dominant global behavior while significantly reducing computational cost. The proposed framework is validated on both a benchmark bridge model and the real-world Mussafah Bridge. Results show that the method achieves FEM-level accuracy with relative errors below 5%, while reducing the total response evaluation time (including full-field reconstruction) by approximately 60x; excluding the post-processing reconstruction step, the AD-DeepONet inference is up to four orders of magnitude faster than FEM. In addition, the framework enables rapid generation of full-field responses, influence lines, and influence surfaces under arbitrary vehicular loading configurations, demonstrating strong potential for large-scale bridge analysis and digital twin applications.
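The Schur complement reconstruction mentioned above can be related to standard static condensation. The block below is a textbook sketch, not the paper's exact formulation: the partition into adaptive nodes (subscript a) and remaining nodes (subscript r), and the symbols K, u, f for stiffness, displacement, and load, are notation assumed here for illustration.

```latex
% Linear static system partitioned into adaptive (a) and remaining (r) nodes
\begin{bmatrix} K_{aa} & K_{ar} \\ K_{ra} & K_{rr} \end{bmatrix}
\begin{bmatrix} u_a \\ u_r \end{bmatrix}
=
\begin{bmatrix} f_a \\ f_r \end{bmatrix}

% Second block row: recover the remaining nodes from the predicted u_a
u_r = K_{rr}^{-1}\left( f_r - K_{ra}\, u_a \right)

% Substituting into the first block row gives the condensed system,
% with S the Schur complement of K_{rr} in K
S\, u_a = f_a - K_{ar} K_{rr}^{-1} f_r,
\qquad
S = K_{aa} - K_{ar} K_{rr}^{-1} K_{ra}
```

Under this reading, a network would supply u_a at the adaptively selected nodes, and the second relation would extend that prediction to the rest of the structure using the known stiffness blocks.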

2606.20034 2026-06-19 cs.LG 新提交

Exploring the potential of AlphaEarth and TESSERA embeddings for Fine-scale Local Climate Zone Mapping: A case study across five cities in Switzerland

探索AlphaEarth和TESSERA嵌入在精细尺度局地气候区制图中的应用潜力:以瑞士五个城市为例

Htet Yamin Ko Ko, Clement Atzberger

AI总结 本研究对比TESSERA和AlphaEarth嵌入与传统Sentinel-1/2数据,使用注意力U-Net将粗分辨率LCZ图提升至10米,发现嵌入模型在跨城市迁移和精度上表现更优,但跨年迁移仍是挑战。

详情
AI中文摘要

理解城市空间形态对于气候建模、风险评估和可持续城市设计至关重要,而局地气候区(LCZ)制图为此提供了基本框架。然而,许多城市仍使用约100米分辨率的粗LCZ记录,这并不适用于精细尺度的城市研究。在本研究中,我们将TESSERA(Feng等人,2025)和AlphaEarth(Brown等人,2025)的预计算嵌入与传统的Sentinel-1/2(S1S2)合成数据在瑞士五个城市进行比较,以评估它们是否能够使用基于注意力的U-Net将粗LCZ图提升至10米分辨率。三个实验评估了多城市迁移性、更高分辨率参考数据的影响以及对年际物候变化的时间鲁棒性。我们发现,所有数据集在前两个实验中均取得了强劲性能,测试数据的交并比(IoU)分别在0.59-0.69和0.77-0.82之间。TESSERA在两种设置下均一致优于S1S2和AlphaEarth。正如预期,我们发现基于嵌入的模型从一年迁移到另一年仍然是一个开放的挑战。然而,总体而言,我们的结果表明,来自地球观测基础模型的嵌入在减少耗时预处理和手动特征工程任务方面具有巨大潜力,并能够指导通用的基于深度学习的LCZ制图工作流程。当与简单的位置感知注意力U-Net架构结合时,这些嵌入增强了区域迁移性和可扩展性,支持为全球城市气候应用开发全面且可重复的精细尺度LCZ图。提高参考数据质量仍然是进一步提升精度的最强杠杆。

英文摘要

Understanding urban spatial morphology is critical for climate modeling, risk assessment, and sustainable urban design, and Local Climate Zone (LCZ) mapping provides the basic framework for this. However, many cities still use coarse ~100-m resolution LCZ records, which are unsuitable for fine-scale urban research. In this study, precomputed embeddings from TESSERA (Feng et al., 2025) and AlphaEarth (Brown et al., 2025) are compared to traditional Sentinel-1/2 (S1S2) composites in five Swiss cities to see if they can upscale coarse LCZ maps to 10-m resolution using an attention-based U-Net. Three experiments assess multi-city transferability, the impact of higher-resolution reference data, and temporal robustness to year-to-year phenology changes. We find that all datasets achieve strong performance, with test-data Intersection-over-Union (IoU) of 0.59-0.69 and 0.77-0.82 in the first and second experiments, respectively. TESSERA consistently outperforms both S1S2 and AlphaEarth across both settings. As expected, we find that the transfer of embedding-based models from one year to another remains an open challenge. Overall, however, our results demonstrate the promising potential of embeddings derived from Earth observation (EO) foundation models to reduce time-consuming preprocessing and manual feature engineering tasks and to guide a universal deep learning-based LCZ mapping workflow. When combined with a simple location-aware attention U-Net architecture, the embeddings enhance regional transferability and scalability, supporting the development of comprehensive and reproducible fine-scale LCZ maps for global urban climate applications. Improving reference data quality remains the strongest lever for further accuracy gains.
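For readers unfamiliar with the IoU metric quoted above, here is a minimal sketch of how a per-class and mean IoU could be computed on flattened label maps. Averaging over classes that occur in either map is an assumption; the abstract does not say how the study aggregates IoU, and the labels below are toy values.

```python
from typing import Dict, List


def per_class_iou(pred: List[int], ref: List[int], classes: List[int]) -> Dict[int, float]:
    """Intersection-over-Union per class for two flattened label maps."""
    ious: Dict[int, float] = {}
    for c in classes:
        inter = sum(1 for p, r in zip(pred, ref) if p == c and r == c)
        union = sum(1 for p, r in zip(pred, ref) if p == c or r == c)
        if union > 0:  # skip classes absent from both maps
            ious[c] = inter / union
    return ious


def mean_iou(pred: List[int], ref: List[int], classes: List[int]) -> float:
    """Unweighted mean of the per-class IoU values (NaN if no class occurs)."""
    ious = per_class_iou(pred, ref, classes)
    return sum(ious.values()) / len(ious) if ious else float("nan")


# Toy LCZ label maps, flattened to 1-D.
pred = [1, 1, 2, 2, 3, 3]
ref = [1, 2, 2, 2, 3, 1]
ious = per_class_iou(pred, ref, classes=[1, 2, 3])
miou = mean_iou(pred, ref, classes=[1, 2, 3])
```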

2606.20037 2026-06-19 cs.LG 新提交

Alzheimer's Disease Diagnosis using a Multimodal Approach with 3D MRI and PET

使用3D MRI和PET的多模态方法诊断阿尔茨海默病

Loukas Ilias, Anthi-Maria Vozinaki, Christos Ntanos, Dimitris Askounis

发表机构 * DSS Lab, School of ECE, NTUA(NTUA ECE学院DSS实验室)

AI总结 提出结合3D卷积特征提取器与三种融合策略(拼接、门控多模态单元、门控自注意力)及稀疏门控混合专家分类器的多模态模型,用于阿尔茨海默病诊断,在三个二分类任务上验证了输入自适应建模的有效性。

Comments 2025 IEEE International Conference on Bioinformatics and Biomedicine (BIBM)

详情
AI中文摘要

阿尔茨海默病(AD)是一种不可逆的神经退行性疾病,也是全球主要的死亡原因之一。早期诊断尤为重要,尤其是在轻度认知障碍(MCI)阶段,及时干预有助于延缓其向AD的进展。神经影像数据,如磁共振成像(MRI)和正电子发射断层扫描(PET),可以通过提供与疾病相关的结构和功能脑变化来帮助早期检测脑部变化。然而,许多多模态模型仍通过静态拼接融合MRI和PET,并对所有受试者应用相同的计算,这限制了其对患者/站点异质性的鲁棒性,并可能浪费计算资源。为解决这些局限性,我们首次研究了将3D卷积特征提取器与三种融合策略(拼接、门控多模态单元(GMU)和门控自注意力)以及一个稀疏门控混合专家(MoE)分类器相结合的方法,该分类器执行输入自适应路由,仅激活每个病例中最具信息量的专家。最后,我们利用Grad-CAM可视化疾病相关区域,确保模型的可解释性。实验在三个二分类任务(NC vs. MCI、MCI vs. AD和NC vs. AD)上进行。结果表明,GMU在NC vs. MCI和NC vs. AD上分别达到80.46%和95.47%的准确率,而门控自注意力在MCI vs. AD上达到82.08%。消融实验表明,移除MoE会持续降低所有任务的准确率。这些发现强调了利用MRI和PET互补性的输入自适应多模态建模在AD诊断中的价值。

英文摘要

Alzheimer's disease (AD) is an irreversible neurodegenerative disorder and a leading cause of death worldwide. Early diagnosis is especially important at the Mild Cognitive Impairment (MCI) stage, where timely intervention can help slow progression to AD. Neuroimaging data, such as Magnetic Resonance Imaging (MRI) and Positron Emission Tomography (PET) scans, can support early detection by revealing structural and functional brain changes related to the disease. Yet, many multimodal models still fuse MRI and PET with static concatenation and apply identical computation to all subjects, which limits robustness to patient/site heterogeneity and can waste computation. To address these limitations, we present the first study of combining 3D convolutional feature extractors with three fusion strategies (concatenation, Gated Multimodal Unit (GMU), and gated self-attention) and a sparsely gated Mixture-of-Experts (MoE) classifier that performs input-adaptive routing, activating only the most informative experts per case. Finally, we utilize Grad-CAM to visualize disease-related regions, ensuring model interpretability. Experiments are performed across three binary classification tasks (NC vs. MCI, MCI vs. AD, and NC vs. AD). Results show that GMU achieves accuracies of 80.46% (NC vs. MCI) and 95.47% (NC vs. AD), while gated self-attention attains 82.08% on MCI vs. AD. Ablations show that removing the MoE consistently degrades accuracy across all tasks. These findings underscore the value of input-adaptive, multimodal modeling for AD diagnosis by leveraging the complementary nature of MRI and PET.

2606.20053 2026-06-19 cs.LG 新提交

Comparative Study of Neural Surrogate Architectures for Autoregressive Prediction of Internal Battery States

用于电池内部状态自回归预测的神经代理架构比较研究

Gihyun Lee, Thorben Menne, Simon Olma, Jakob Hilgert, Sangyoung Park

AI总结 系统比较四种神经网络架构(MLP、ResNet、U-Net、FNO)作为自回归状态转移算子,预测锂离子电池DFN模型内部状态,发现U-Net因多尺度空间归纳偏置在精度和速度上最优。

Comments 8 pages, 5 figures

详情
AI中文摘要

Doyle-Fuller-Newman (DFN) 模型以高保真度解析锂离子电池的内部电化学状态。然而,其控制方程的数值求解对于实时部署而言计算成本过高,限制了从单个电池到电池组及车队规模应用的可扩展性。虽然机器学习代理可以通过GPU加速大幅降低推理延迟,但现有大多数方法学习的是特定操作条件下的解近似,而非可泛化的状态演化动力学。本文系统比较了四种神经网络架构(MLP、ResNet、U-Net、FNO),它们被构建为自回归状态转移算子,可预测广泛操作条件下的完整DFN内部状态。为确保受控的架构比较,所有模型在统一框架下训练,采用多步展开和电流条件化,隔离了空间归纳偏置的影响。结果表明,U-Net的多尺度特征层次在300步自回归展开后,所有内部状态变量的平均最终步nRMSE达到3%,同时相比数值求解器实现了5.38倍的加速。这些发现强调了空间归纳偏置是代理性能的关键决定因素,推动了用于下一代电池管理系统和数字孪生的内部状态可观测性代理的发展。

英文摘要

The Doyle-Fuller-Newman (DFN) model resolves internal electrochemical states in lithium-ion batteries with high fidelity. However, the numerical solution of its governing equations is computationally prohibitive for real-time deployment, limiting scalability from individual cells to pack and fleet-scale applications. While machine learning surrogates can substantially reduce inference latency through GPU acceleration, most existing approaches learn solution approximations tied to specific operating conditions rather than learning generalizable state-evolution dynamics. This work presents a systematic comparison of four neural network architectures (MLP, ResNet, U-Net, FNO) formulated as autoregressive state-transition operators that predict full DFN internal states across a wide range of operating conditions. To ensure a controlled architectural comparison, all models are trained under a unified framework using multi-step unrolling and current-conditioning, isolating the impact of spatial inductive bias. Results demonstrate that the U-Net's multi-scale feature hierarchy achieves a mean final-step nRMSE of 3% averaged across all internal state variables after 300-step autoregressive rollouts, while providing a 5.38x speed-up over the numerical solver. These findings highlight spatial inductive bias as a critical determinant of surrogate performance, advancing the development of surrogates for internal state observability for next-generation battery management systems and digital twins.
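The autoregressive, current-conditioned rollout and the nRMSE metric described above can be sketched as follows. The `toy_step` function merely stands in for a trained surrogate, and normalising the RMSE by the range of the reference is an assumption, since the abstract does not define its normalisation.

```python
import math
from typing import Callable, List

State = List[float]


def rollout(step: Callable[[State, float], State], state: State,
            currents: List[float]) -> List[State]:
    """Feed each predicted state back in as the next input."""
    trajectory: List[State] = []
    for current in currents:
        state = step(state, current)   # one surrogate call per time step
        trajectory.append(state)
    return trajectory


def nrmse(pred: List[float], ref: List[float]) -> float:
    """RMSE divided by the range of the reference (assumed normalisation)."""
    rmse = math.sqrt(sum((p - r) ** 2 for p, r in zip(pred, ref)) / len(ref))
    return rmse / (max(ref) - min(ref))


def toy_step(state: State, current: float) -> State:
    """Hypothetical stand-in for a learned state-transition operator."""
    return [s - 0.01 * current for s in state]


trajectory = rollout(toy_step, [1.0, 0.5], [1.0] * 300)   # 300-step rollout
final_state = trajectory[-1]
example_error = nrmse([1.0, 2.0, 3.0], [1.0, 2.0, 4.0])
```

Because errors compound across steps in such a loop, the final-step error after a long rollout is a stricter test than single-step accuracy.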

2606.20055 2026-06-19 cs.LG 新提交

PaAno+: Multiscale Encoding and Cross-Variable Attention for Time Series Anomaly Detection

PaAno+:用于时间序列异常检测的多尺度编码与跨变量注意力

Youji Zhu, Hongbing Wang, Wenchao Liu, Xiaodong Liu, Xiangguang Xiong

发表机构 * School of Mathematical Sciences, Guizhou Normal University(贵州师范大学数学科学学院) School of Big Data and Computer Science, Guizhou Normal University(贵州师范大学大数据与计算机科学学院)

AI总结 提出PaAno+模型,通过多尺度特征提取、跨变量融合注意力和补丁窗口排序预任务,实现轻量高效的时间序列异常检测,在TSB-AD基准上达到SOTA。

详情
AI中文摘要

时间序列异常检测在工业和医疗监测等关键领域具有重要的实用价值。当前基于Transformer和大模型的检测方法计算开销过大,而现有的轻量级替代方案受限于特征提取不足以及多变量间依赖关系建模不充分。为缓解上述缺陷,本研究在面向补丁的表征学习范式下,开发了一种轻量高效的异常检测模型PaAno+。在编码器模块中,使用具有差异化感受野的卷积核构建多尺度特征提取主干,以捕获层次化时间特征;随后通过跨尺度自适应注意力聚合结合残差连接优化,进一步稳定特征表征学习。嵌入跨变量融合注意力模块以显式表征变量间相关性,使模型能够在复杂运行条件下识别异常模式。此外,定制了一种基于时间补丁窗口排序的新型前置任务,以揭示时间序列的内在结构特性,并利用三元组损失优化补丁嵌入空间以增强特征判别性。在TSB-AD基准上的大量实验表明,所提出的PaAno+在单变量和多变量任务上均实现了最先进的检测精度,在包括VUS-PR在内的评估指标上相对于原始PaAno取得了显著性能提升。凭借紧凑的网络设计,该模型实现了良好的计算效率,能够在资源受限的终端上部署用于实时异常推理。

英文摘要

Time-series anomaly detection has significant practical value for industrial and medical monitoring, as well as other critical domains. Current Transformer- and large-model-based detection approaches incur excessive computational overhead, while existing lightweight alternatives are constrained by insufficient feature extraction and inadequate modeling of dependencies across variables. To mitigate these drawbacks, this study develops a lightweight, efficient anomaly detection model, dubbed PaAno+, within the patch-oriented representation learning paradigm. In the encoder module, a multiscale feature-extraction backbone is constructed using convolutional kernels with differentiated receptive fields to capture hierarchical temporal characteristics; subsequent cross-scale adaptive attention aggregation, combined with residual connection optimization, further stabilizes feature representation learning. A cross-variable fusion attention module is embedded to explicitly characterize inter-variable correlations, empowering the model to identify anomalous patterns amid intricate operational conditions. Moreover, a novel pretext task based on temporal patch-window sorting is customized to uncover intrinsic structural properties of time series, and triplet loss is leveraged to optimize the patch embedding space for enhanced feature discrimination. Extensive experiments on the TSB-AD benchmark demonstrate that the proposed PaAno+ achieves state-of-the-art detection accuracy on both univariate and multivariate tasks, yielding significant performance gains across evaluation metrics, including VUS-PR, relative to the original PaAno. Leveraging a compact network design, the presented model achieves favorable computational efficiency, enabling deployment on resource-limited terminals for real-time anomaly inference.
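The triplet loss mentioned above is commonly written as max(0, d(a, p) - d(a, n) + margin). The sketch below uses Euclidean distance and a margin of 1.0; both are assumptions, since the abstract does not give the distance function or margin used, and the embeddings are toy vectors.

```python
import math
from typing import List


def euclidean(a: List[float], b: List[float]) -> float:
    """Euclidean distance between two embedding vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor: List[float], positive: List[float],
                 negative: List[float], margin: float = 1.0) -> float:
    """Hinge-style triplet loss on patch embeddings.

    The loss is zero once the negative is at least `margin` farther from
    the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor = [0.0, 0.0]
positive = [0.0, 1.0]

easy_loss = triplet_loss(anchor, positive, negative=[3.0, 4.0])   # far negative
hard_loss = triplet_loss(anchor, positive, negative=[0.0, 1.5])   # close negative
```

Only the close negative contributes a gradient signal, which is why triplet-based training typically depends on how negatives are chosen.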

2606.20172 2026-06-19 cs.LG 新提交

Predicting gestational age at birth in the context of preterm birth from multi-modal fetal MRI

基于多模态胎儿MRI预测早产背景下的出生胎龄

Diego Fajardo-Rojas, Megan Hall, Daniel Cromb, Mary A. Rutherford, Lisa Story, Emma C. Robinson, Jana Hutter

发表机构 * Leibniz University Hannover(莱布尼茨汉诺威大学)

AI总结 提出结合多模态胎儿MRI和机器学习流程预测出生胎龄,包括数据插补、特征选择和回归模型,在333例对照和93例早产数据上评估,R²=0.13,MAE=2.74周,准确率0.77。

Comments Accepted for publication at the Journal of Machine Learning for Biomedical Imaging (MELBA) https://melba-journal.org/2026:013

Journal ref Machine.Learning.for.Biomedical.Imaging. 2026 (2026)

详情
AI中文摘要

早产与高死亡率和终身发病风险相关。复杂的多因素病因阻碍了准确预测和最佳护理。我们开发并评估了一个包含定制机器学习方法的流程,用于数据插补、特征选择和回归模型,以从333例对照和93例早产病例的综合多模态形态和功能胎儿MRI数据预测出生胎龄。将出生胎龄预测分为足月和早产类别,并报告其准确性、敏感性和特异性。进行了消融研究以进一步验证流程设计。使用分层10折交叉验证评估性能。该流程实现了0.13的R²分数和2.74周的平均绝对误差。在交叉验证中,准确率为0.77,敏感性为0.59,特异性为0.82。流程选择的主要特征包括宫颈长度和基于胎盘T2*值的统计量。快速、运动鲁棒的多模态胎儿MRI技术与机器学习预测的结合使得能够预测出生胎龄。这些信息对任何妊娠都至关重要。据我们所知,早产在文献中仅作为分类问题处理。因此,这项工作提供了概念验证。未来工作将增加队列规模,以允许在早产队列内进行更精细的分层。我们的代码可在以下网址获取:https://github.com/dfajardorojas/ml-for-preterm-birth-。

英文摘要

Preterm birth is associated with significant mortality and a risk for lifelong morbidity. The complex multifactorial aetiology hampers accurate prediction and thus optimal care. A pipeline consisting of bespoke machine learning methods for data imputation, feature selection, and regression models to predict gestational age (GA) at birth was developed and evaluated on comprehensive multi-modal morphological and functional fetal MRI data from 333 control cases and 93 preterm birth cases. The GA at birth predictions were classified into term and preterm categories and their accuracy, sensitivity, and specificity were reported. An ablation study was performed to further validate the design of the pipeline. Performance was evaluated using stratified 10-fold cross-validation. The pipeline achieves an R² score of 0.13 and a mean absolute error of 2.74 weeks. It also achieves a 0.77 accuracy, 0.59 sensitivity, and 0.82 specificity across folds. The predominant features selected by the pipeline include cervical length and statistics derived from placental T2* values. The confluence of fast, motion-robust, multi-modal fetal MRI techniques and machine learning prediction allowed the prediction of gestational age at birth. This information is essential for any pregnancy. To the best of our knowledge, preterm birth has so far been addressed only as a classification problem in the literature. Therefore, this work provides a proof of concept. Future work will increase the cohort size to allow for finer stratification within the preterm birth cohort. Our code is available at https://github.com/dfajardorojas/ml-for-preterm-birth-.
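The step of turning regression outputs into term/preterm labels and reporting accuracy, sensitivity, and specificity can be sketched as below. The 37-week cutoff is the conventional clinical definition of preterm birth and is assumed here; the abstract does not state the threshold the authors applied. All gestational ages are made-up toy values.

```python
from typing import Dict, List

PRETERM_THRESHOLD_WEEKS = 37.0  # conventional cutoff, assumed for this sketch


def mean_absolute_error(ga_true: List[float], ga_pred: List[float]) -> float:
    """Average absolute difference in weeks."""
    return sum(abs(t - p) for t, p in zip(ga_true, ga_pred)) / len(ga_true)


def classification_metrics(ga_true: List[float], ga_pred: List[float],
                           threshold: float = PRETERM_THRESHOLD_WEEKS) -> Dict[str, float]:
    """Threshold regression outputs and score them, with preterm as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(ga_true, ga_pred):
        true_preterm, pred_preterm = t < threshold, p < threshold
        if true_preterm and pred_preterm:
            tp += 1
        elif true_preterm:
            fn += 1
        elif pred_preterm:
            fp += 1
        else:
            tn += 1
    return {
        "accuracy": (tp + tn) / (tp + tn + fp + fn),
        "sensitivity": tp / (tp + fn),   # share of preterm births detected
        "specificity": tn / (tn + fp),   # share of term births detected
    }


ga_true = [39.0, 40.0, 35.0, 32.0, 38.0]
ga_pred = [38.5, 36.5, 34.0, 37.5, 39.0]
metrics = classification_metrics(ga_true, ga_pred)
mae = mean_absolute_error(ga_true, ga_pred)
```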

2606.20174 2026-06-19 cs.LG 新提交

Computational Methods and Challenges in Cell-Free DNA Analysis for Multi-Cancer Early Detection

基于无细胞DNA分析的多癌早期检测的计算方法与挑战

Nicko Starkey, Marcin W. Wojewodzic, Krzysztof Rzecki

发表机构 * AGH University of Krakow(AGH克拉科夫大学) Norwegian Institute of Public Health(挪威公共卫生研究所)

AI总结 综述2022-2025年cfDNA多癌早期检测的计算方法,重点分析片段组学和表观遗传特征提取技术,指出多模态集成方法最具临床整合潜力,但需标准化评估协议。

详情
AI中文摘要

无细胞DNA(cfDNA)是非侵入性多癌早期检测(MCED)的一个有前景的途径,因为它可以通过单次抽血同时检测多种癌症,尤其对目前缺乏既定筛查程序的癌症具有敏感性。本文综述了2022年至2025年间基于cfDNA的MCED计算方法。我们重点关注如何提取和分析片段组学和表观遗传特征以在早期阶段检测癌症。我们首先简要概述cfDNA信号的生物学基础,然后回顾经典的统计和机器学习方法以及深度学习框架,包括基于自编码器的模型。对于每种方法,我们讨论其生物学可解释性、验证策略以及临床整合的准备情况。此外,我们将当前挑战分为技术、计算和方法论三类,并概述该领域的开放问题。本综述表明,多模态集成方法在临床整合方面具有最强的前景和最高的准备度。然而,为了更好地评估未来工作和进行并排比较,标准化评估协议和报告结果至关重要。

英文摘要

Cell-free DNA (cfDNA) is a promising avenue for non-invasive multi-cancer early detection (MCED) because it can enable the simultaneous detection of multiple cancers from a single blood draw, with particular sensitivity to cancers that currently lack established screening programs. Here we review the computational methods developed between 2022 and 2025 for cfDNA-based MCED. We focus on how fragmentomics and epigenetic features are extracted and analyzed to detect cancer at early stages. We first briefly outline the biological basis of cfDNA signals, then review classical statistical and machine learning approaches alongside deep learning frameworks, including autoencoder-based models. For each method we discuss biological interpretability, validation strategy, and readiness for clinical integration. Furthermore, we categorize the current challenges as technical, computational, and methodological, and outline open problems in the field. This review shows that multimodal ensemble approaches have the strongest promise for clinical integration and the highest readiness. However, for better assessment of future work and side-by-side comparison, standardized evaluation protocols and result reporting will be crucial.

2606.20291 2026-06-19 cs.LG cs.CV 新提交

Integrating national forest inventory, airborne lidar, and satellite imagery for wall-to-wall mapping of forest structure with computer vision

整合国家森林清查、机载激光雷达和卫星影像,利用计算机视觉实现森林结构的全覆盖制图

Luke J. Zachmann, David D. Diaz, Vincent A. Landau, Chelsey Walden-Schreiner, Tony Chang, Nathan E. Rutenbeck, Katharyn A. Duffy, Kiarie Ndegwa, Andreas Gros, Scott Conway, Guy Bayes

发表机构 * Vibrant Planet Public Benefit Corporation(Vibrant Planet 公益公司)

AI总结 提出VibrantForests框架,结合卫星影像、激光雷达样本和计算机视觉,以10米分辨率生成美国本土的冠层覆盖、高度、生物量等森林属性图,减少饱和与回归均值问题。

详情
AI中文摘要

遥感技术越来越被依赖,以提供可操作的科学研究,用于大型景观的森林和野火风险管理。全覆盖、每年更新的地图是有效森林管理的持续需求。许多规划系统和数据收集结合了不同目的、年份和预测质量的异质数据源,导致运营规划系统中的混淆行为。我们介绍了VibrantForests框架,该框架被开发并应用于绘制森林属性,为有效的森林和野火规划提供一致的基础。VibrantForests包括一个基于卫星的森林结构模型,该模型在激光雷达衍生的样本上训练,并应用于美国本土,以10米分辨率同时生成冠层覆盖度、冠层高度、地上活树生物量、胸高断面积和二次平均直径的估计。我们展示了跨越从稀疏冠层/低生物量到密集冠层/高生物量的全部森林条件的预测能力。结果表明,我们的模型扩展了在类似被动传感器模型中常见的饱和范围,并减少了回归均值行为,该行为通常在小/稀疏条件下高估森林属性,在大/密集条件下低估森林属性。VibrantForests框架通过以年度节奏和10米分辨率提供管理相关属性的一致全覆盖估计,解决了大面积森林和野火规划中的一个关键限制。

英文摘要

Remote sensing is increasingly relied upon to deliver actionable science for forest and wildfire risk management across large landscapes. Wall-to-wall, annually updated maps are a persistent need for effective forest management. Many planning systems and data collections combine disparate data sources with different purposes, vintages, and prediction quality, which leads to confounding behavior in operational planning systems. We introduce the VibrantForests framework, developed and applied to map forest attributes and provide a coherent foundation for effective forest and wildfire planning. VibrantForests includes a satellite-based forest structure model trained on lidar-derived samples and applied across the contiguous United States to concurrently generate estimates of canopy cover, canopy height, aboveground live tree biomass, basal area, and quadratic mean diameter at 10-meter resolution. We demonstrate predictive capability spanning the full spectrum of forest conditions ranging from sparse-canopy/low-biomass to dense-canopy/high-biomass. Results show that our model extends the range at which saturation is commonly encountered in comparable passive-sensor models, and reduces regression-to-mean behavior that commonly produces overestimation of forest attributes in small/sparse conditions and underestimation in large/dense conditions. The VibrantForests framework addresses a key limitation in large-area forest and wildfire planning by delivering coherent wall-to-wall estimates of management-relevant attributes at annual cadence and 10m resolution.

2606.20326 2026-06-19 cs.LG physics.comp-ph 新提交

Quantum-classical physics-informed Kolmogorov-Arnold networks for PDEs

量子-经典物理信息Kolmogorov-Arnold网络求解偏微分方程

Xiang Rao, Yuxuan Shen

AI总结 提出QCPIKAN,首个量子-经典物理信息Kolmogorov-Arnold网络,结合Chebyshev多项式KAN层和参数化量子电路,通过嵌入物理约束加速高频误差指数收敛并抑制数值色散,在多孔介质渗流场景中优于现有量子-经典PINN。

详情
AI中文摘要

我们开发了QCPIKAN,这是首个旨在求解偏微分方程(PDE)的量子-经典物理信息Kolmogorov-Arnold网络。该混合框架基于Chebyshev多项式KAN层和参数化量子电路构建,将物理约束嵌入训练损失中以强制执行物理一致性。我们的基于逼近论的理论研究证明,该设计将高频误差收敛加速至指数速率,并有效抑制数值色散。我们在多孔介质中的三个典型渗流场景(包括单相流、组分运移和两相流)上验证了该框架。与现有的量子-经典物理信息神经网络相比,QCPIKAN在全局预测精度、局部误差控制、动态演化跟踪和驱替前沿定位方面均实现了优越性能。这项工作为求解复杂PDE提供了一种鲁棒且高效的替代方案。

英文摘要

We develop QCPIKAN, the first quantum-classical physics-informed Kolmogorov-Arnold network designed to solve partial differential equations (PDEs). Built upon Chebyshev-polynomial KAN layers and parameterized quantum circuits, this hybrid framework embeds physical constraints into the training loss to enforce physical consistency. Our theoretical investigations grounded in approximation theory prove that this design accelerates high-frequency error convergence to an exponential rate and effectively mitigates numerical dispersion. We validate the framework across three typical seepage scenarios in porous media, including single-phase flow, component transport and two-phase flow. Compared with existing quantum-classical physics-informed neural networks, QCPIKAN achieves superior performance in global prediction accuracy, local error control, dynamic evolution tracking and displacement front localization. This work provides a robust and efficient alternative for solving complex PDEs.
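To make the Chebyshev-polynomial KAN layer mentioned above more concrete, the sketch below evaluates one learnable edge function as a weighted sum of Chebyshev polynomials, using the recurrence T_{n+1}(x) = 2x T_n(x) - T_{n-1}(x). Squashing the input with tanh so that it falls in [-1, 1] is a common choice in Chebyshev-based KAN variants and is assumed here; the quantum circuit and the physics-informed loss are not modelled, and the coefficients are arbitrary.

```python
import math
from typing import List


def chebyshev_basis(x: float, degree: int) -> List[float]:
    """Return [T_0(x), ..., T_degree(x)] via the three-term recurrence."""
    values = [1.0, x]
    for _ in range(2, degree + 1):
        values.append(2.0 * x * values[-1] - values[-2])
    return values[: degree + 1]


def cheby_edge(x: float, coeffs: List[float]) -> float:
    """One KAN-style edge function: sum_k coeffs[k] * T_k(tanh(x)).

    `coeffs` plays the role of the trainable parameters of the edge.
    """
    z = math.tanh(x)  # map the input into the Chebyshev domain [-1, 1]
    basis = chebyshev_basis(z, len(coeffs) - 1)
    return sum(c * t for c, t in zip(coeffs, basis))


basis_at_half = chebyshev_basis(0.5, 3)
edge_at_zero = cheby_edge(0.0, [1.0, 2.0, 3.0])
```

A full layer would sum such edge functions over all input features for each output unit.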

2606.20329 2026-06-19 cs.LG physics.geo-ph 新提交

Constrained hybrid modelling to predict microbial dynamics and organic matter turnover in soil systems

约束混合建模预测土壤系统中微生物动态与有机质周转

Paul Collart, Juergen Gall, Andrea Schnepf, Holger Pagel, Lars Doorenbos

发表机构 * Agrosphere (IBG-3), Forschungszentrum Jülich GmbH(农业圈(IBG-3),于利希研究中心) Institute of Crop Science and Resource Conservation, University of Bonn(波恩大学作物科学与资源保护研究所) Institute of Computer Science, University of Bonn(波恩大学计算机科学研究所) Lamarr Institute for Machine Learning and Artificial Intelligence(拉马尔机器学习和人工智能研究所)

AI总结 提出首个混合建模框架,利用神经网络从宏基因组推断功能性状预测过程模型参数,并整合生态理论约束,有效预测微生物动态和有机质周转。

Comments Accepted at ICML '26

Details
AI Chinese abstract

土壤微生物控制有机质循环,并在很大程度上决定土壤系统如何应对和缓解气候变化及环境威胁。因此,在基于过程的土壤模型中表示微生物动态对于预测土壤碳循环至关重要,尽管从数据中获取信息极具挑战性。改进参数化的一个有前景的方法是整合基因组数据,然而建模基因组与微生物驱动过程之间复杂且未知的关系是一个未解决的问题。在这项工作中,我们提出了第一个混合建模框架,用于从基于DNA测序数据的宏基因组推断功能性状中推导基于过程的土壤有机质周转模型的生物动力学参数值。我们的模型通过神经网络从基因组性状数据预测过程模型的生物动力学参数,并整合来自生态理论和文献的约束,以确保即使是非观测状态变量也能实现逼真的行为。我们在不同复杂度的合成基因组性状数据集和真实数据上评估了我们的方法,结果表明,我们的方法在多个基线上提高了性能,并有效学习了过程模型中不可测量组分的动态,即使是在小训练数据集上也是如此。

English abstract

Soil microorganisms control organic matter cycling and largely determine how soil systems can cope with and mitigate climate change and environmental threats. Representing microbial dynamics in process-based soil models is therefore critical to predict carbon cycling in soils, albeit highly challenging to inform from data. One promising approach to improve their parametrisation is the integration of genomic data, yet modelling the complex and unknown relationship between genomes and the processes the microbes are driving is an unsolved problem. In this work, we present the first hybrid modeling framework for deriving biokinetic parameter values of a process-based soil organic matter turnover model from metagenome-inferred functional traits based on DNA sequencing data. Our model predicts biokinetic parameters of the process-based model from genomic trait data with a neural network and integrates constraints from ecological theory and literature to ensure realistic behavior, even of non-observed state variables. We evaluate our method on synthetic genomic trait datasets of varying complexity and on real data, showing that our approach improves performance over multiple baselines and learns the dynamics of unmeasurable components of the process-based model effectively, even for small training datasets.

2606.20359 2026-06-19 cs.LG New submission

Train, Retrieve, or Both? A Four-Arm Head-to-Head for Correct Statutory Citation on the Ontario Residential Tenancies Act

训练、检索,还是两者兼用?针对安大略省住宅租赁法的正确法定引用的四组头对头比较

Ali Asaria, Tony Salomone, Deep Gandhi

Affiliation * Transformer Lab

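AI summary Studies how self-represented tenants, landlords, and help-desk staff can obtain correct statutory citations; a four-arm experiment compares fine-tuning, retrieval, and a hybrid of the two, finding that the SFT+RAG hybrid scores highest on exact match with zero hallucinated citations.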

Details
AI Chinese abstract

自诉租户、房东和帮助台工作人员需要被指向实际管辖问题的法律条款,并附有正确的法定引用。我们在2006年安大略省住宅租赁法(RTA)及其核心法规上研究此任务,从操作者的角度实证提问:微调是否足够,还是需要混合检索?我们在Qwen2.5-7B-Instruct上运行四组头对头比较(基础零样本、仅LoRA SFT、仅RAG、以及SFT+RAG混合),在一个小型、待人工验证的真实评估集上,以引用的精确匹配(节+小节)评分。基础模型无法引用RTA,仅SFT会错误回忆章节;检索至关重要,并通过构造将幻觉降至零;而SFT+RAG混合模型得分最高,精确匹配为0.481,且无幻觉引用。其优势在于SFT使得条款选择对高召回候选集(损害零样本RAG)更加鲁棒。值得注意的是,这种廉价的bge-small混合模型匹配或超越了基于更大、专门检索模型(更大的嵌入器和交叉编码器重排序器)的管道,更大/改进的训练集也无帮助:在此任务中,强法定引用性能不需要专门的检索模型或更多数据。该工件将幻觉归零并超过了基准提升线,但未达到期望的0.70精确匹配目标。所有结果均基于小型、待人工验证的真实评估集,并作为初步结果报告。

English abstract

Self-represented tenants, landlords, and help-desk staff need to be pointed at the provision of law that actually governs a question, with a correct statutory citation. We study this task on the Ontario Residential Tenancies Act, 2006 (RTA) and its core regulation, asking the operator's question empirically: is fine-tuning enough, or is hybrid retrieval needed? We run a four-arm head-to-head on Qwen2.5-7B-Instruct (base zero-shot, LoRA SFT-only, RAG-only, and an SFT+RAG hybrid), scored on citation exact-match (section+subsection) over a small, human-verification-pending real eval set. The base model cannot cite the RTA and SFT-only mis-recalls sections; retrieval is essential and drives hallucination to zero by construction; and the SFT+RAG hybrid scores highest at 0.481 exact-match with zero hallucinated citations. Its edge comes from SFT making provision selection more robust to the higher-recall candidate sets that hurt zero-shot RAG. Notably, this cheap bge-small hybrid matches or beats a pipeline built on bigger, specialized retrieval models (a larger embedder and a cross-encoder reranker), and a larger/improved training set does not help either: strong statutory-citation performance here does not require specialized retrieval models or more data. The artifact zeroes hallucination and clears the lift-over-base bar but does not reach the aspirational 0.70 exact-match target. All results are on a small, human-verification-pending real eval set and are reported as preliminary.
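
The scoring described above (citation exact-match on section plus subsection, alongside a hallucination count) can be illustrated with a short sketch. The abstract does not define how a hallucinated citation is detected; this sketch assumes it means a cited provision that does not exist in the statute. The function name `score_citations`, the tuple representation, and the toy identifiers in the usage note are all hypothetical and are not real RTA provisions.

```python
def score_citations(predictions, gold, known_provisions):
    """Score aligned lists of (section, subsection) tuples.

    exact_match: fraction of predictions equal to the gold citation.
    hallucination_rate: fraction of predictions that are not in
    `known_provisions` (an assumed definition of a hallucinated citation).
    """
    if not gold or len(predictions) != len(gold):
        raise ValueError("predictions and gold must be non-empty and aligned")
    n = len(gold)
    exact = sum(1 for p, g in zip(predictions, gold) if p == g)
    hallucinated = sum(1 for p in predictions if p not in known_provisions)
    return {"exact_match": exact / n, "hallucination_rate": hallucinated / n}
```

For example, with toy provisions `{("10", "1"), ("20", "2"), ("20", "3"), ("30", "1")}`, a prediction of `("20", "2")` against gold `("20", "3")` counts as a miss but not a hallucination, while `("99", "9")` counts as both.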

2606.20364 2026-06-19 cs.LG New submission

Judging to Improve: A De-biased VLM-as-3D-Judge Protocol for Single-Image 3D Generation

评判以改进:一种去偏的 VLM-as-3D-Judge 协议用于单图像 3D 生成

Ali Asaria, Tony Salomone, Deep Gandhi

Affiliation * Transformer Lab

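AI summary Proposes a de-biased, cross-model VLM-as-3D-Judge protocol that extends the judge from ranking to optimization; with separate training and evaluation judges, position-bias correction, and fixes for three failure modes, lightweight adaptation matches the strong base.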

Details
AI Chinese abstract

一项伴随研究建立了一个去偏的、跨模型的 VLM-as-3D-Judge,能够可靠地对单图像到 3D 网格质量进行排序,而廉价的几何和 CLIP 代理在此方面表现不足。本文提出:该评判者的偏好能否专门化一个强大的开放生成器 TRELLIS,针对单一资产类别(家具),且无需人工标注?将评判者从排序扩展到优化是本文的工作所在。将 VLM 评判者推入训练和评估循环会暴露排序从未触发的失效模式,因此我们的贡献是对评判者进行优化级别的强化:一个训练评判者(Qwen2.5-VL-7B)与一个评估评判者(InternVL3-8B)保持分离以打破循环性;位置偏差校正;以及针对三种失效模式(图像过载、隐藏几何的溅射渲染、以及奖励干净但错误输出的无参考评判)的修复,并附有校准证据(清晰差距胜率 0.83-1.0;基线间约 0.5)。使用此协议作为独立评估者,仅从公开模型和数据出发,采用轻量级参数高效适应,我们发现我们的方法匹配了强基线而非超越它。独立基线样本几乎不携带可学习的偏好(0.94 顺序翻转率),因此信号必须通过质量对比构造来设计。在六种适应方法、两种输入模式和严重程度扫描中,最具针对性的方法——严重退化下的条件器修复——达到了与基线持平(0.50),而没有方法达到 >=65% 的胜率目标。结果是机制性的:干净输入使评判者饱和,流式 DIT 微调通过采样器被冲刷,而条件器修复是改变几何的位点。胜率在 n=8 个对象时具有方向性。匹配一个强大的公开数据基线本身具有信息量:超越它需要比公开数据上的轻量级 PEFT 更多,而评判者协议是可复用的。

English abstract

A companion study established a de-biased, cross-model VLM-as-3D-judge that reliably ranks single-image-to-3D mesh quality where cheap geometry and CLIP proxies fall short. This paper asks: can that judge's preferences specialize a strong open generator, TRELLIS, on one asset class (furniture), cheaply and without human labels? Taking the judge from ranking to optimization is where the work lives. Pushing a VLM judge into the training and evaluation loop exposes failure modes ranking never triggered, so our contribution is an optimization-grade hardening of the judge: a training judge (Qwen2.5-VL-7B) held distinct from an evaluation judge (InternVL3-8B) to break circularity; position-bias correction; and fixes for three failure modes (image overload, geometry-hiding splat renders, and reference-free judging that rewards clean-but-wrong outputs), with calibration evidence (clear-gap win-rate 0.83-1.0; base-vs-base ~0.5). Using this protocol as an independent evaluator, and working only from public models and data with lightweight parameter-efficient adaptation, we find our methods match the strong base rather than exceed it. Independent base samples carry essentially no learnable preference (0.94 order-flip rate), so signal must be engineered by quality-contrastive construction. Across six adaptation methods, two input regimes, and a severity sweep, the most targeted - conditioner repair under severe degradation - reaches parity (0.50) with the base, while no method clears the >=65% win-rate target. The result is mechanistic: clean inputs saturate the judge, flow-DIT fine-tuning washes out through the sampler, and conditioning repair is the locus that moves geometry. Win-rates are directional at n=8 objects. Matching a strong public-data base with cheap adaptation is itself informative: exceeding it needs more than lightweight PEFT on public data, and the judge protocol is reusable.
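
Position-bias correction and the order-flip rate mentioned above can be sketched with a common scheme: query the judge in both presentation orders and count a win only when the two verdicts agree. The abstract does not spell out the paper's exact correction, so this is an assumed stand-in. The `judge` callable, its `"first"`/`"second"` return values, and the function names are hypothetical.

```python
def debiased_compare(judge, a, b):
    """Compare candidates a and b with both presentation orders.

    `judge(x, y)` is a hypothetical callable returning "first" or "second"
    for whichever candidate it prefers in the order shown.
    """
    a_wins_forward = judge(a, b) == "first"    # a shown first
    a_wins_backward = judge(b, a) == "second"  # a shown second
    if a_wins_forward and a_wins_backward:
        return "a"
    if not a_wins_forward and not a_wins_backward:
        return "b"
    return "tie"  # the verdict flipped with presentation order


def order_flip_rate(judge, pairs):
    """Fraction of pairs whose verdict changes when the order is swapped."""
    if not pairs:
        raise ValueError("pairs must be non-empty")
    flips = sum(1 for a, b in pairs if debiased_compare(judge, a, b) == "tie")
    return flips / len(pairs)
```

A judge that always prefers whatever it sees first yields a flip rate of 1.0 under this scheme, which is the kind of signal-free behaviour a high order-flip rate points to.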

2606.20417 2026-06-19 cs.LG New submission

Neural network surrogates with uncertainty quantification for inverse problems in partial differential equations

具有不确定性量化的神经网络代理模型用于偏微分方程反问题

Christian Jimenez-Beltran, Aretha L. Teckentrup, Antonio Vergari, Konstantinos C. Zygalakis

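AI summary Proposes DeepGaLA, a neural-network surrogate that gives uncertainty-aware predictions for differential equation solvers and, paired with a delayed-acceptance MCMC diagnostic, enables efficient and reliable Bayesian inversion.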

Details
AI Chinese abstract

微分方程的反问题在科学和工程中普遍存在,其目标是从噪声或不完整的观测中推断未知模型参数。传统数值方法通常计算成本高昂,尤其是在贝叶斯设置中,对于复杂正向模型和高维参数空间,评估似然函数变得非常昂贵。为了应对这一挑战,我们引入了DeepGaLA,一种用于微分方程求解器的神经网络代理模型,它提供不确定性感知的预测,在训练数据有限时减少过度自信的推断。为了在实践中评估代理诱导的后验近似的保真度,我们表明,短时间运行的延迟接受马尔可夫链蒙特卡洛可以作为有效的诊断工具。在一系列数值实验中,DeepGaLA提供的正向模型近似精度与已建立的高斯过程代理相当,同时在参数维度增加时更好地保持效率。此外,它可以纳入微分方程约束,包括非线性情况。总体而言,这些结果表明,具有不确定性量化的神经代理模型能够实现复杂系统中反问题的可扩展且可靠的贝叶斯推断。

English abstract

Inverse problems for differential equations arise throughout science and engineering, where one seeks to infer unknown model parameters from noisy or incomplete observations. Traditional numerical methods for these problems are often computationally expensive, particularly in Bayesian settings where evaluating the likelihood becomes costly for complex forward models and high-dimensional parameter spaces. To address this challenge, we introduce DeepGaLA, a neural-network surrogate for differential equation solvers that provides uncertainty-aware predictions, reducing overconfident inference when training data are limited. To evaluate the fidelity of the surrogate-induced posterior approximations in practice, we show that a short run of delayed-acceptance Markov chain Monte Carlo can serve as an effective diagnostic. Across a range of numerical experiments, DeepGaLA delivers forward-model approximations with accuracy comparable to established Gaussian-process surrogates, while better maintaining efficiency as parameter dimension grows. Moreover, it can incorporate differential-equation constraints, including in nonlinear settings. Overall, these results indicate that uncertainty-quantified neural surrogates can enable scalable and reliable Bayesian inference for inverse problems in complex systems.
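
Delayed-acceptance MCMC, which the abstract uses as a diagnostic, can be sketched in its standard two-stage Metropolis-Hastings form with a symmetric random-walk proposal: a cheap surrogate screens each proposal, and only survivors are checked against the expensive posterior. This is a generic one-dimensional sketch, not the paper's code. The function name, arguments, and the idea of reading the second-stage acceptance rate as a fidelity signal are assumptions for illustration.

```python
import math
import random


def delayed_acceptance_mh(log_post, log_surrogate, x0, n_steps, step=1.0, seed=0):
    """Two-stage Metropolis-Hastings for a 1-D target.

    Returns (samples, promoted, accepted): the chain, the number of
    proposals that reached the expensive second stage, and the number
    accepted there.
    """
    rng = random.Random(seed)
    x = x0
    lp_x, ls_x = log_post(x), log_surrogate(x)
    samples, promoted, accepted = [], 0, 0
    for _ in range(n_steps):
        y = x + rng.gauss(0.0, step)  # symmetric random-walk proposal
        ls_y = log_surrogate(y)
        # Stage 1: screen with the cheap surrogate only.
        if math.log(1.0 - rng.random()) < ls_y - ls_x:
            promoted += 1
            lp_y = log_post(y)  # the expensive evaluation happens only here
            # Stage 2: correct for the surrogate's error.
            if math.log(1.0 - rng.random()) < (lp_y - lp_x) - (ls_y - ls_x):
                x, lp_x, ls_x = y, lp_y, ls_y
                accepted += 1
        samples.append(x)
    return samples, promoted, accepted
```

The chain still targets the exact posterior whatever the surrogate's quality. When the surrogate matches the posterior closely, nearly every promoted proposal is accepted in stage 2, so a low `accepted / promoted` ratio is one plausible warning sign of a poor surrogate.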

2606.20467 2026-06-19 cs.LG cs.NA math.NA physics.comp-ph New submission

Agentic Symbolic Search: Characterizing PDEs Beyond Hand-crafted Expressions, Meshes, and Neural Networks

智能符号搜索:超越手工表达式、网格和神经网络的PDE特征化

Zongmin Yu, Liu Yang

Affiliation * National University of Singapore

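AI summary Proposes the ASYS framework, in which an agent translates PDE theory into differentiable symbolic programs and combines evolutionary search with gradient-based optimization to automatically discover analytical forms or approximations, producing interpretable representations across several problems.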

Details
AI Chinese abstract

数学家通过数学结构而非计算值表来理解PDE解。历史上,这需要针对每个问题单独进行数学分析。数值模拟和神经网络都不能直接产生这些结构。我们提出智能符号搜索(ASYS),一种先验引导框架,其中智能体将PDE理论、公共问题约束和累积搜索经验转化为可测试的可微分符号程序。数学形式在进化搜索下被精炼,而其连续参数通过基于梯度的优化拟合。这使得搜索成为归纳偏置注入的自动化形式,而非盲目的符号回归。对于已知解析形式的问题,ASYS自然恢复这些形式;对于其他问题,ASYS构建解析近似,可引导数学家进行进一步分析。在我们的实验中,跨越五个问题,包括有界动力学、有限时间爆破和自由边界聚焦,ASYS产生了可解释表示,包括Allen-Cahn 2D动力学的几何界面公式和Keller-Segel趋化爆破的九参数收缩律,这些场景中先前没有闭式描述。ASYS展示了表征PDE解的新范式的可能性,超越了手工解析解、基于网格的数值解和神经网络近似。

English abstract

Mathematicians understand a PDE solution through mathematical structures rather than tables of computed values. Historically, this has been the product of mathematical analysis, carried out by hand for each problem individually. Neither numerical simulation nor neural networks produce those structures directly. We propose Agentic Symbolic Search (ASYS), a prior-guided framework in which an agent translates PDE theory, public problem constraints, and accumulated search experience into testable differentiable symbolic programs. The mathematical forms are refined under evolutionary search, while their continuous parameters are fit by gradient-based optimization. This makes the search an automated form of inductive-bias injection rather than blind symbolic regression. For problems with known analytical forms, ASYS recovers these forms naturally; for other problems, ASYS constructs analytical approximations which can guide mathematicians toward further analysis. In our experiments, across five problems spanning bounded dynamics, finite-time blow-up, and free-boundary focusing, ASYS produces interpretable representations, including a geometric interface formula for Allen-Cahn 2D dynamics and a nine-parameter contraction law for Keller-Segel chemotactic blow-up, in settings where no closed-form description was previously available. ASYS shows the possibility of a new paradigm for characterizing PDE solutions, beyond handcrafted analytical solutions, mesh-based numerical solutions, and neural network approximations.
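
The inner step described above, fitting the continuous parameters of one fixed symbolic form by gradient-based optimization, can be sketched as follows. The candidate form `u(x) = a * tanh(b * x)` is chosen only for illustration, and the function names, learning rate, and step count are assumptions. The outer evolutionary search over forms and the agent that proposes them are not shown.

```python
import math


def mse(xs, ys, a, b):
    """Mean squared error of the candidate form a * tanh(b * x)."""
    return sum((a * math.tanh(b * x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def fit_tanh_profile(xs, ys, a=0.5, b=0.5, lr=0.05, steps=2000):
    """Fit u(x) = a * tanh(b * x) by plain gradient descent on the MSE."""
    n = len(xs)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for x, y in zip(xs, ys):
            t = math.tanh(b * x)
            err = a * t - y
            grad_a += 2.0 * err * t / n                      # d(MSE)/da
            grad_b += 2.0 * err * a * x * (1.0 - t * t) / n  # d(MSE)/db
        a -= lr * grad_a
        b -= lr * grad_b
    return a, b
```

In a search of this kind, the fitted loss of each candidate form would serve as its fitness score, so that forms are compared only after their parameters have been tuned.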

13. Other / General Machine Learning (3 papers)

2606.19610 2026-06-19 cs.LG cs.AI New submission

Latent Confounded Causal Discovery via Lie Bracket Geometry

基于李括号几何的潜在混杂因果发现

Sridhar Mahadevan

Affiliations * Adobe Research; University of Massachusetts, Amherst

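AI summary Drawing on information geometry and category theory, proposes two algorithms (BRIDGE and SKFM) that detect latent confounding through the failure of intervention-induced flows to close under Lie brackets, sharply reducing the causal-graph search space.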

Comments 39 pages

Details
AI Chinese abstract

最近关于Kan-Do-Calculus (KDC)的工作已经确立了被动观察和主动干预在因果推断中的边界是一个范畴论双伴随,其中干预由左Kan扩展建模,条件作用由右Kan扩展建模。本文在潜在混杂下引入了两种因果发现算法,基于KDC的信息几何和范畴论结果。在光滑统计设置中,观测和干预测度之间的Radon-Nikodym导数诱导局部因果向量场;这些场在李括号下不闭合的失败成为可计算的Frobenius残差,我们将其解释为失败的可视可积性和可能的潜在或未建模结构的证据。我们的第一个算法BRIDGE(用于干预发现和几何估计的括号残差)结合了一个干预密度或Radon-Nikodym比引擎与一个几何筛选器,该筛选器提出一个高召回率的可接受箭头族,识别非闭合的可视对作为潜在障碍候选,并将缩减后的族传递给下游的基于分数或可微的发现程序。第二个算法贡献,谱Kan-Do流匹配(SKFM),学习摊销干预场并在谱上分解潜在曲率,揭示BRIDGE指向的直接李空间端点。一系列详细的实验表明,两种算法都能发现具有潜在混杂的因果模型,同时将可能的DAG的超指数空间缩减多个数量级。本文引入了一种新的因果发现范式,其中潜在结构直接从干预诱导流的几何中推断出来。

English abstract

Recent work on Kan-Do-Calculus (KDC) has established that the boundary between passive observation and active intervention in causal inference is a category-theoretic bi-adjunction, with interventions modeled by left Kan extensions and conditioning by right Kan extensions. This paper introduces two causal discovery algorithms under latent confounding, building on the information-geometric and categorical consequences of KDC. In smooth statistical settings, Radon-Nikodym derivatives between observational and interventional measures induce local causal vector fields; failures of these fields to close under Lie brackets become computable Frobenius residuals, which we interpret as witnesses of failed visible integrability and possible latent or unmodeled structure. Our first algorithm, BRIDGE (Bracket Residuals for Interventional Discovery and Geometric Estimation), combines an interventional density or Radon-Nikodym-ratio engine with a geometric screen that proposes a high-recall family of admissible arrows, identifies non-closing visible pairs as latent-obstruction candidates, and passes the reduced family to downstream score-based or differentiable discovery routines. The second algorithmic contribution, Spectral Kan-Do Flow Matching (SKFM), learns amortized intervention fields and factors latent curvature spectrally, exposing the direct Lie-space endpoint toward which BRIDGE points. A detailed set of experiments shows that both algorithms are capable of discovering causal models with latent confounders while collapsing the super-exponential space of possible DAGs by many orders of magnitude. This paper introduces a new paradigm in causal discovery, where latent structure is inferred directly from the geometry of intervention-induced flows.
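
The idea of vector fields failing to close under Lie brackets can be made concrete numerically. The sketch below computes the bracket [X, Y] = DY·X - DX·Y by central differences and then measures the part of the bracket that lies outside span{X, Y} at a point. Reading that leftover norm as the "Frobenius residual" is an assumption about the abstract's terminology. The function names and the toy fields in the usage note are hypothetical and unrelated to the paper's estimators.

```python
import math


def directional_derivative(field, p, v, h=1e-5):
    """Central-difference derivative of `field` at point p along direction v."""
    plus = field([pi + h * vi for pi, vi in zip(p, v)])
    minus = field([pi - h * vi for pi, vi in zip(p, v)])
    return [(a - b) / (2.0 * h) for a, b in zip(plus, minus)]


def lie_bracket(X, Y, p):
    """[X, Y](p) = DY(p) X(p) - DX(p) Y(p)."""
    dY_X = directional_derivative(Y, p, X(p))
    dX_Y = directional_derivative(X, p, Y(p))
    return [a - b for a, b in zip(dY_X, dX_Y)]


def frobenius_residual(X, Y, p):
    """Norm of the component of [X, Y](p) outside span{X(p), Y(p)}."""
    dot = lambda u, v: sum(ui * vi for ui, vi in zip(u, v))
    basis = []
    for v in (X(p), Y(p)):  # Gram-Schmidt orthonormalisation of the span
        w = list(v)
        for e in basis:
            c = dot(w, e)
            w = [wi - c * ei for wi, ei in zip(w, e)]
        norm = math.sqrt(dot(w, w))
        if norm > 1e-12:
            basis.append([wi / norm for wi in w])
    residual = lie_bracket(X, Y, p)
    for e in basis:  # remove the part of the bracket inside the span
        c = dot(residual, e)
        residual = [ri - c * ei for ri, ei in zip(residual, e)]
    return math.sqrt(dot(residual, residual))
```

For the toy fields X = (1, 0, 0) and Y = (0, 1, x) in three dimensions, the bracket is (0, 0, 1), which leaves the plane spanned by X and Y at the origin, so the residual there is close to 1. For two constant fields the residual is 0.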

2606.20493 2026-06-19 cs.LG cs.AI cs.MA New submission

Contagion Networks: Evaluator Bias Propagation in Multi-Agent LLM Systems

传染网络:多智能体LLM系统中的评估者偏见传播

Zewen Liu

Affiliation * Qilu Institute of Technology, School of Software Engineering

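AI summary Proposes the Contagion Networks framework to quantify how evaluator bias propagates in multi-agent LLM systems; contagion coefficients between same-model agents range from 0.157 to 0.352, and enlarging the evaluator committee reduces effective contagion by 72.4%.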

Comments 20 pages, 4 figures, 4 tables

Details
AI Chinese abstract

当大型语言模型在多智能体系统中担任评估者时,其系统性评估偏见会通过智能体网络传播。我们引入传染网络,这是一个用于衡量评估者偏见如何在交互的LLM智能体间传播的正式框架。在使用DeepSeek-chat进行的受控3智能体实验中,我们采用了三种不同的评估者偏见配置文件(结构化、平衡、基于证据),测量了跨智能体传染矩阵Gamma_3,并发现评估者偏见始终在智能体间传播(gamma在[0.157, 0.352]范围内),即使是在相同底层模型内也是如此。我们识别出由谱半径rho(Gamma_N)控制的三种传播机制,并证明同质模型智能体产生的传染系数比先前工作中观察到的跨模型系数弱3-5倍(MM-EPC: gamma约0.85-1.3),使其处于抑制机制中。我们表明,将评估委员会规模从k=1增加到k=3可将有效传染减少72.4%,提供了一种可行的缓解策略。我们发布了开源的传染网络实验框架。

English abstract

When large language models serve as evaluators in multi-agent systems, their systematic evaluation biases propagate through the agent network. We introduce Contagion Networks, a formal framework for measuring how evaluator biases spread across interacting LLM agents. In a controlled 3-agent experiment using DeepSeek-chat with three distinct evaluator bias profiles (structured, balanced, evidence-based), we measure the Cross-Agent Contagion Matrix Gamma_3 and find that evaluator biases consistently propagate between agents (gamma in [0.157, 0.352]), even within the same underlying model. We identify three propagation regimes governed by the spectral radius rho(Gamma_N), and demonstrate that homogeneous-model agents produce contagion coefficients 3-5x weaker than cross-model coefficients observed in prior work (MM-EPC: gamma approx 0.85-1.3), placing them in the suppression regime. We show that increasing evaluator committee size from k=1 to k=3 reduces effective contagion by 72.4%, providing an actionable mitigation strategy. We release the open-source Contagion Network experimental framework.
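
The spectral radius that governs the propagation regimes can be computed for a small contagion matrix with power iteration. The abstract does not give the regime thresholds, so the sketch only computes rho and leaves its interpretation open. The function name and the use of power iteration, which assumes a nonnegative matrix with a single dominant eigenvalue, are illustrative choices. The matrix in the usage note is made up, with a zero diagonal assumed and off-diagonal entries picked from inside the reported [0.157, 0.352] range; it is not the paper's measured Gamma_3.

```python
def spectral_radius(matrix, iterations=500):
    """Estimate the spectral radius of a nonnegative square matrix.

    Uses power iteration with max-norm scaling, which converges to the
    dominant (Perron) eigenvalue when the matrix is nonnegative and primitive.
    """
    n = len(matrix)
    v = [1.0] * n
    rho = 0.0
    for _ in range(iterations):
        w = [sum(matrix[i][j] * v[j] for j in range(n)) for i in range(n)]
        rho = max(abs(x) for x in w)
        if rho == 0.0:
            return 0.0  # the iterate collapsed to zero (e.g. a zero matrix)
        v = [x / rho for x in w]
    return rho
```

As a usage example, the hypothetical matrix `[[0.0, 0.157, 0.352], [0.2, 0.0, 0.3], [0.25, 0.18, 0.0]]` has every row sum below 1, so its spectral radius is also below 1 and repeated propagation steps shrink an initial bias instead of amplifying it.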

2606.20560 2026-06-19 cs.LG cs.AI New submission

How Transparent is DiffusionGemma?

DiffusionGemma 的透明度如何?

Joshua Engels, Callum McDougall, Bilal Chughtai, Janos Kramar, Senthoran Rajamanoharan, Cindy Wu, Arthur Conmy, Asic Q Chen, Jean Tarbouriech, Min Ma, Brendan O'Donoghue, João Gabriel Lopes de Oliveira, Rohin Shah, Neel Nanda

Affiliation * Google

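AI summary Studies the reasoning transparency of DiffusionGemma, which computes in a continuous latent space, by decomposing transparency into variable and algorithmic transparency; an interpretable token bottleneck reduces opaque serial depth to 1.1x that of Gemma 4, and diffusion-specific phenomena are uncovered.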

Comments 20 main text pages and 6 pages of references and appendices

Details
AI Chinese abstract

LLM推理透明度是理解模型决策、减少误用和错位以及调试意外模型行为的关键能力。然而,DiffusionGemma在连续潜空间中执行了更大比例的计算;这是否使其推理透明度降低?我们通过将透明度分解为两个组成部分来研究这个问题:变量透明度,即我们是否理解模型计算状态的中间快照;以及算法透明度,即我们是否能够利用这些快照重建模型得出其输出的过程。直观上,DiffusionGemma的变量透明度较差:其不透明串行深度,即在可解释模型状态之间发生的串行计算量,最初似乎是相应自回归Gemma 4模型的28.6倍。然而,我们表明,我们可以通过一个可解释的令牌瓶颈映射去噪步骤之间流动的信息,且下游性能没有下降。将这些中间状态视为可解释的,将不透明串行深度降至仅为Gemma 4的1.1倍。对于扩散模型来说,算法透明度比自回归模型更难,因为画布中的所有令牌预测在每个去噪步骤中都可能发生变化,这使模型有能力在去噪过程中实现复杂的分布式算法。为了开始弥合这一差距,我们进行了一系列可解释性案例研究,发现了扩散特有现象(如非时序推理、令牌和序列涂抹以及中间上下文推理)的初步证据。最后,我们测试了可监控性,这是透明度的一个关键应用,衡量模型输出是否对下游任务有用。我们发现DiffusionGemma的可监控性与Gemma 4相似。

English abstract

LLM reasoning transparency is a critical affordance for understanding model decisions, mitigating misuse and misalignment, and debugging surprising model behaviors. However, DiffusionGemma performs a larger fraction of its computation in a continuous latent space; does this make its reasoning less transparent? We study this question by decomposing transparency into two components: variable transparency, whether we understand intermediate snapshots of a model's computational state; and algorithmic transparency, whether we can use these snapshots to reconstruct the process by which the model arrived at its outputs. Naively, DiffusionGemma has poor variable transparency: its opaque serial depth, the amount of serial computation that occurs in between interpretable model states, seems at first 28.6X higher than the corresponding autoregressive Gemma 4 model. However, we show that we can map the information flowing between denoising steps through an interpretable token bottleneck with no decrease in downstream performance. Treating these intermediate states as interpretable reduces the opaque serial depth to just 1.1X that of Gemma 4. Algorithmic transparency is harder for diffusion models than for autoregressive models because all token predictions in the canvas can change at every denoising step, giving the model the power to implement complicated distributed algorithms during the denoising process. To begin bridging this gap, we conduct a suite of interpretability case studies, uncovering initial evidence of novel diffusion-specific phenomena such as non-chronological reasoning, token and sequence smearing, and intermediate-context reasoning. Finally, we test monitorability, a key application of transparency that measures whether model outputs are useful for downstream tasks. We find that DiffusionGemma is similarly monitorable to Gemma 4.