2606.17567 2026-06-17 cs.LG new submission

Reducing Learner Redundancy in Boosting via Residual Orthogonalization

Ye Su, Jipeng Guo, Yong Liu, Xin Xu, Gangchun Zhang, Jinxin Chen, Di Wu, Longlong Zhao

Affiliations: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; College of Information Science and Technology, Beijing University of Chemical Technology; Gaoling School of Artificial Intelligence, Renmin University of China; School of Computer Science, Central China Normal University; School of Computing, Engineering and Mathematical Sciences, La Trobe University

AI summary: To address the learner redundancy caused by residual fitting in boosting, the paper proposes the SCBoost framework, which reduces redundancy through two mechanisms: spectral residual projection and covariance-regularized weighting. It proves the framework's geometric properties, and experiments show strong accuracy and F1 scores.

Abstract

While sequential residual fitting is the bedrock of standard boosting frameworks, it inherently breeds learner redundancy by repeatedly revisiting correlated error components. To address this bottleneck, we propose a shift from residual fitting to residual orthogonalization and introduce SCBoost. Our framework tackles redundancy through two complementary mechanisms: Spectral Residual Projection (SRP) and Covariance-Regularized Weighting (CRW). During training, SRP projects each residual target onto the orthogonal complement of the historical prediction subspace, forcing successive learners to capture only novel empirical innovations. During aggregation, CRW optimizes ensemble weights on a validation set with an explicit covariance penalty to mitigate remaining correlations. Theoretically, we provide a finite-sample geometric characterization proving that SRP yields an exact additive residual-energy decomposition. Furthermore, under an isotropic-noise assumption, we rigorously establish the conditions under which this projection improves the effective Signal-to-Noise Ratio. Extensive experiments across ten benchmark datasets demonstrate that SCBoost delivers strong out-of-the-box performance, particularly in accuracy and F1 score. This work reinterprets boosting through a geometric lens, suggesting that explicit redundancy control is a principled and necessary step toward more efficient ensemble architectures.
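The projection step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`orthonormal_basis`, `project_out`) and the toy vectors are assumptions made for illustration, and Gram-Schmidt stands in for whatever spectral routine SCBoost actually uses. The sketch removes from a residual vector every component that lies in the span of earlier learners' predictions, then checks the Pythagorean split of the residual energy.

```python
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    """Euclidean inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def orthonormal_basis(vectors: List[Vector], tol: float = 1e-10) -> List[Vector]:
    """Gram-Schmidt: orthonormal basis for the span of the given vectors."""
    basis: List[Vector] = []
    for v in vectors:
        w = list(v)
        for q in basis:
            c = dot(w, q)
            w = [wi - c * qi for wi, qi in zip(w, q)]
        norm = dot(w, w) ** 0.5
        if norm > tol:  # skip vectors that add no new direction
            basis.append([wi / norm for wi in w])
    return basis


def project_out(residual: Vector, past_predictions: List[Vector]) -> Vector:
    """Project the residual onto the orthogonal complement of the span of
    past predictions (hypothetical helper, for illustration only)."""
    r = list(residual)
    for q in orthonormal_basis(past_predictions):
        c = dot(r, q)
        r = [ri - c * qi for ri, qi in zip(r, q)]
    return r


# Toy data: predictions of two earlier learners on three training points.
past = [[1.0, 0.0, 0.0], [1.0, 1.0, 0.0]]
residual = [1.0, 2.0, 3.0]

novel = project_out(residual, past)               # part no earlier learner explains
explained = [r - n for r, n in zip(residual, novel)]

# Energy splits additively because the two parts are orthogonal.
total_energy = dot(residual, residual)
split_energy = dot(novel, novel) + dot(explained, explained)
print(novel, total_energy, split_energy)
```

In this toy case the projected target is orthogonal to both earlier prediction vectors, so a learner fitted to it cannot simply re-fit directions the ensemble already covers.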

2606.17572 2026-06-17 cs.LG cs.SY eess.SY new submission

When Dynamics Models Read the Wrong Time Steps: Label-Free Event Credit Re-Anchoring for Robust Global Readouts

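Yifan Wang

AI summary: To address temporal credit dilution in sequence-to-global interfaces, the paper proposes CREST, a training-free and label-free method that reduces out-of-distribution error and restores event credit through event-core estimation and contrastive re-anchoring.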

Comments 7 pages, 6 figures

Abstract

Learned dynamics models often answer global physical questions, such as fault severity or impact stiffness, by pooling a per-step feature sequence into one readout vector. This sequence-to-global interface creates an under-studied temporal credit problem: with only trajectory-level supervision, a model can predict accurately in training conditions while reading from abundant smooth correlates rather than the brief physical events that determine the target. We call this failure temporal credit dilution. It is not exposed by the training loss and is not removed by standard physics-informed residuals, because the error lies in where the global readout assigns functional credit. We introduce Credit-in-Event, an interface-level probe for measuring how much pooled credit lands on event steps, and prove in closed form that a pooled linear reader routes credit to a spurious background channel as the event fraction shrinks. We then propose CREST, a training-free and label-free readout that estimates a transient event core from learned features and re-anchors the pooled representation through event-versus-rest contrast. Across simulated gear and impact systems, recurrent and attention encoders, and public bearing vibration data, CREST reduces out-of-distribution error while restoring event credit. Ablations show that stable-step selection and receptive-field shrinking fail, confirming that the gain comes from event-core credit re-anchoring rather than a generic locality or stability prior.

2606.17816 2026-06-17 cs.LG cs.AI new submission

Conservation Laws for Modern Neural Architectures

Viet-Hoang Tran, Vinh Khanh Bui, Tan Lai Ngoc, Nam Nguyen, Tuan Dam, Tan M. Nguyen

Affiliations: National University of Singapore; Center for AI Research, VinUniversity; Independent Researcher; Hanoi University of Science and Technology

AI summary: The paper proposes a unified framework that characterizes gradient-flow conservation laws in feedforward networks with GELU, SiLU, and SwiGLU activations, multi-head attention, and mixture-of-experts models. Experiments verify the invariants predicted by the theory.

Comments Published at the International Conference on Machine Learning (ICML 2026)

2606.17830 2026-06-17 cs.LG cs.AI new submission

Functional Equivalence in Attention: A Comprehensive Study with Applications to Linear Mode Connectivity

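Viet-Hoang Tran, Vinh Khanh Bui, Van-Hoan Trinh, Tan Lai Ngoc, Tan M. Nguyen

Affiliations: National University of Singapore; Center for AI Research, VinUniversity; Independent Researcher; Technical University of Munich

AI summary: The paper formally studies how positional encodings affect functional equivalence in Transformers. It finds that sinusoidal encodings preserve the symmetries of vanilla attention, whereas rotary encodings significantly reduce the symmetry group and thereby enhance expressivity. Using an alignment algorithm, it also shows empirically that positional encodings play a key role in linear mode connectivity.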

Comments Published at the International Conference on Machine Learning (ICML 2026)

Abstract

Neural network parameter spaces are inherently non-injective, as distinct parameter configurations can realize identical functions through functional equivalence. While this symmetry is well understood in classical fully connected and convolutional models, it becomes substantially more intricate in modern attention-based architectures. Existing analyses of multihead attention have largely focused on the vanilla formulation, overlooking positional encodings that fundamentally reshape architectural symmetries. In this work, we provide a formal study of functional equivalence in Transformers with positional encodings. Focusing on the two most widely used variants--sinusoidal and rotary positional encodings (RoPE)--we show that sinusoidal encodings preserve the equivalence structure of vanilla attention, whereas rotary encodings significantly reduce the symmetry group, thereby enhancing expressivity. This offers a principled explanation for the growing prominence of RoPE in practice. We further examine how positional encodings affect linear mode connectivity, and through an alignment algorithm, empirically demonstrate that the presence and variability of connectivity across Transformer settings crucially depend on the positional encoding.

2606.17832 2026-06-17 cs.LG new submission

From Drift to Coherence: Stabilizing Beliefs in LLMs

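SongEun Kim, Seungyoo Lee, Edwin Fong, Hyungi Lee, Juho Lee

Affiliations: Department of Statistics, Seoul National University; Korea Advanced Institute of Science & Technology; Department of AI, Kookmin University; University of Hong Kong

AI summary: The paper studies belief drift in LLMs on multiple-choice question answering and proposes prompted predictive resampling (PPR). It finds that the belief process self-stabilizes and converges, and then proposes a seed-answer prompting strategy and a self-consistency loss to accelerate stabilization and improve predictive coherence.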

Abstract

Large language models (LLMs) are often hypothesized to perform implicit Bayesian inference, yet a key coherence condition, the martingale property of predictive beliefs, has been shown to fail in controlled synthetic in-context learning settings. We revisit this question in a more typical usage regime: generic multiple-choice question answering. Exploiting the discrete answer space, we compute exact predictive distributions and study belief dynamics induced by autoregressive answer resampling. We introduce prompted predictive resampling (PPR), where an LLM generates a sequence of answers to the same question. Empirically, PPR reveals early-stage belief drift, indicating martingale violations. However, after sufficient resampling steps, the belief process self-stabilizes and converges to a coherent predictive distribution. Based on this observation, we further propose (i) a seed-answer prompting strategy to accelerate stabilization, and (ii) a self-consistency loss that amortizes early-stage drift into the model via fine-tuning. Experiments on multiple-choice QA benchmarks show that our methods substantially reduce belief drift and improve predictive coherence without sacrificing accuracy.

2606.17886 2026-06-17 cs.LG new submission

Monotonic Kolmogorov-Arnold Networks: A Theoretical and Empirical Study of Monotonicity as an Inductive Bias

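Mikhail Krasnov, Carolina Fortuna, Blaž Bertalanič

Affiliations: Jozef Stefan Institute

AI summary: The paper proposes MKAN, which achieves hard monotonicity through exponential reparameterization of B-spline coefficients, positive edge weights, and a monotone base activation. It proves that any feature extractor meeting the theorem's conditions admits a monotone realization of bounded size, and experiments show that MKAN is competitive with the state of the art on monotonicity benchmarks while retaining KAN's per-edge functional transparency.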

Abstract

Monotonicity has been a long-running architectural inductive bias for neural networks, motivated by tabular, scientific, and economic settings where outputs are known to respond monotonically to certain inputs. Existing approaches are MLP- or flow-based and lack per-edge functional transparency; the only Kolmogorov-Arnold Network (KAN) variant with monotonicity, MonoKAN, enforces the constraint only on a restricted parameter subset and requires a projection-style training procedure. We close this gap with MKAN, a KAN with hard monotonicity guaranteed for all parameter values via exponential reparameterization of B-spline coefficients, positive edge weights, and a monotone base activation. Training reduces to standard unconstrained gradient descent. Our headline theoretical contribution is a representation-cost theorem: any $C^K$ ($K > 0$) feature extractor inducing a ball-shaped semantic-neighborhood partition admits a monotone realization of the equivalent neighborhood structure at $N' = N^* + k \le 2N^*$, where $k$ is the number of non-monotone coordinates of the original. The bound is architecture-agnostic and gives a principled sizing rule for monotone encoders. Empirically, MKAN is competitive with state-of-the-art monotone NNs on the SMM/ICML-2024 benchmark while being the only method that combines hard unconstrained monotonicity with KAN's per-edge functional transparency; the $2N^*$ prediction is validated in a self-supervised feature-size sweep on four real datasets, and on a controlled monotone-generative dataset MKAN recovers ground-truth factors with substantially higher Spearman alignment than KAN, MLP, and linear baselines.
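A small sketch can show why an exponential reparameterization makes monotonicity hold for every parameter value. It is an illustration under stated assumptions, not MKAN's actual code: the helper names are hypothetical, and a degree-1 B-spline (linear interpolation between knots) stands in for the higher-order splines a KAN edge would normally use. Each coefficient is the previous one plus `exp(theta)`, so the coefficients increase for any real `theta`, and a B-spline with nondecreasing coefficients is nondecreasing.

```python
import math
from typing import List


def monotone_coefficients(theta: List[float], offset: float = 0.0) -> List[float]:
    """Map unconstrained parameters to strictly increasing spline coefficients.

    Every increment is exp(theta_i) > 0, so no projection or clipping is
    needed during gradient descent (hypothetical helper for illustration).
    """
    coeffs = [offset]
    for t in theta:
        coeffs.append(coeffs[-1] + math.exp(t))
    return coeffs


def piecewise_linear(x: float, knots: List[float], coeffs: List[float]) -> float:
    """Evaluate a degree-1 B-spline: linear interpolation of coeffs at knots."""
    if x <= knots[0]:
        return coeffs[0]
    if x >= knots[-1]:
        return coeffs[-1]
    for i in range(len(knots) - 1):
        if knots[i] <= x <= knots[i + 1]:
            w = (x - knots[i]) / (knots[i + 1] - knots[i])
            return (1.0 - w) * coeffs[i] + w * coeffs[i + 1]
    return coeffs[-1]  # unreachable for sorted knots; kept as a safe default


# Arbitrary unconstrained parameters, including large negative values.
theta = [-3.0, 0.5, 2.0, -10.0]
knots = [0.0, 1.0, 2.0, 3.0, 4.0]
coeffs = monotone_coefficients(theta)

grid = [i / 10 for i in range(41)]
values = [piecewise_linear(x, knots, coeffs) for x in grid]
print(all(b >= a for a, b in zip(values, values[1:])))
```

Because the constraint is built into the parameterization, any optimizer step on `theta` yields another monotone function, which is the sense in which training stays unconstrained.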

2606.17927 2026-06-17 cs.LG cs.AI new submission

KANLib -- A Modular, Extensible and Fast Kolmogorov-Arnold Network Implementation

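Julian Hoever, Gregor Schiele

Affiliations: Intelligent Embedded Systems, University of Duisburg-Essen

AI summary: The paper proposes the KANLib framework, which unifies existing KAN implementations and supports multiple basis functions and adaptive grid rescaling, reproducing the predictions of reference implementations while remaining flexible and fast.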

Abstract

Kolmogorov-Arnold Networks (KANs) have recently emerged as a promising alternative to traditional multilayer perceptrons by replacing linear weights with learnable univariate functions. Despite their theoretical advantages in interpretability and expressiveness, practical research of KANs remains difficult due to high computational costs and inconsistent feature support across existing frameworks. This paper introduces KANLib, a modular, extensible, and computationally efficient framework for developing and evaluating KAN architectures. KANLib unifies core concepts from existing implementations, including PyKAN, EfficientKAN, and FastKAN, within a consistent software architecture that emphasizes flexibility, feature parity, and high performance. The framework supports two basis function types, adaptive grid rescaling, grid extension, and fine-grained architectural customization while maintaining compatibility with standard PyTorch workflows. Experimental evaluation on the California Housing benchmark demonstrates that KANLib reproduces the predictive behavior of established reference KAN implementations while achieving competitive computational efficiency. Furthermore, the framework enables the exploration of architectural variations beyond standard KAN formulations with only minor impacts on predictive performance. Overall, KANLib provides a robust foundation for future research on scalable and extensible KAN architectures.

2606.17952 2026-06-17 cs.LG cs.AI new submission

SoftMoE: Soft Differentiable Routing for Mixture-of-Experts in LLMs

Mikołaj Zasada, Łukasz Struski, Jacek Tabor, Marcin Kurdziel

Affiliations: AGH University of Krakow, Poland; Faculty of Mathematics and Computer Science, Jagiellonian University, Poland; Centre for Credible Artificial Intelligence, Warsaw University of Technology

AI summary: The paper proposes SoftMoE, which replaces discrete routing with a soft top-k LapSum relaxation to enable gradient-based optimization of expert routing and learns the number of active experts per layer, matching or exceeding sparse MoE on language modeling while activating fewer experts.

Comments Accepted at ICML 2026

Abstract

Sparse Mixture-of-Experts (MoE) architectures enable scaling LLM parameters under a fixed inference budget by activating only a small subset of experts via top-$k$ routing. While this preserves causality and suits autoregressive language models, the discrete top-$k$ operator is not differentiable, forcing a fixed number of active experts per input and resulting in inefficient use of computation. We propose SoftMoE, which replaces discrete routing with a truncated soft top-$k$ LapSum relaxation, allowing gradient-based optimization of expert routing. We further parameterize the mean number of active experts per layer and impose a global budget constraint, enabling the model to learn how to allocate expert capacity across layers. SoftMoE remains fully compatible with autoregressive modeling and achieves performance comparable to or better than sparse MoE on language modeling and downstream tasks, while activating significantly fewer experts. Notably, the learned allocation is highly non-uniform, with later layers activating more experts. The source code is publicly available.

2606.18023 2026-06-17 cs.LG cs.AI new submission

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

Jian Yang, Shawn Guo, Wei Zhang, Tianyu Zheng, Yaxin Du, Haau-Sing Li, Jiajun Wu, Yue Song, Yan Xing, Qingsong Cai, Zelong Huang, Chuan Hao, Ran Tao, Xianglong Liu, Wayne Xin Zhao, Mingjie Tang, Weifeng Lv, Ming Zhou, Bryan Dai

Affiliations: Beihang University; IQuest Research; Langboat; Renmin University of China

AI summary: The paper studies loop-count selection for the Parallel Loop Transformer (PLT). It finds that the two-loop variant brings significant gains on code generation and related tasks, while variants with three or more loops degrade, revealing a gain-cost trade-off.

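Abstract (translated from Chinese)

Looped Transformers scale latent computation by repeatedly applying a shared block, but sequential looping increases latency and KV-cache memory as the loop count grows. The Parallel Loop Transformer (PLT) mitigates this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making the loop count a practical design choice. We therefore study PLT loop-count selection through a gain-cost lens: additional loops may refine representations, but CLP introduces a positional mismatch at every loop boundary. We instantiate this study by training LoopCoder-v2, a family of 7B PLT coders with different loop counts, from scratch on 18T tokens, followed by matched instruction tuning and evaluation. Empirically, the two-loop variant delivers broad gains over the non-looped baseline on code generation, code reasoning, agentic software engineering, and tool-use benchmarks, raising SWE-bench Verified from 43.0 to 64.4 and Multi-SWE from 14.0 to 31.0. In contrast, variants with three or more loops degrade, revealing a strongly non-monotonic loop-count effect. Our diagnostics show that the second loop provides the main productive refinement, while later loops produce diminishing, oscillating updates and reduced representational diversity. Because the CLP-induced mismatch stays roughly fixed while the refinement gain shrinks, the offset cost increasingly dominates. This gain-cost trade-off explains why PLT saturates at two loops and provides a diagnostic for loop-count selection.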

Revisiting Structural Dependency in Autoregressive Multi-Task Table Recognition: Order-Agnostic Cell-Level Representations

Takaya Kawakatsu

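Affiliations: Preferred Networks, Inc.

AI summary: Order-dependent cell representations degrade global consistency in autoregressive multi-task table recognition. To address this, the paper generates order-agnostic cell features with non-causal attention, which enables parallel inference. The approach improves localization and recognition performance on two major datasets and cuts inference time by about 3x.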

Comments ICDAR 2026

2606.18032 2026-06-17 math.NA cs.LG cs.NA physics.comp-ph cross-list

INI-VPINN: A Variational Physics-Informed Neural Network with Implicit Neumann and Interface Handling for Multi-Material Domains with Geometric Singularities

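Shayan Dodge, Alessandro Formisano, Sami Barmada

Affiliations: DESTeC, University of Pisa

AI summary: The paper proposes INI-VPINN, a new weak-form physics-informed neural network that handles Neumann boundary and interface conditions implicitly, without additional loss terms or multiple subdomain networks, and achieves higher accuracy and faster convergence on multi-material problems.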

Comments Preprint version. Under peer review. Code available at: https://github.com/ShayanDodge/INI-VPINN

Abstract

We propose a new weak-form Physics-Informed Neural Network approach, named INI-VPINN. INI-VPINN naturally incorporates Neumann boundary and interface conditions into the variational formulation, which removes the need for additional loss terms or multiple subdomain networks. The framework employs compactly supported weighting functions and integration by parts to impose flux and continuity constraints implicitly, thereby ensuring physical consistency across material boundaries. The proposed method is tested on Poisson and Laplace problems with sharp interfaces and complex geometries. Results show that, compared with several other formulations based on physics-informed neural networks, INI-VPINN consistently achieves higher accuracy and smoother, faster convergence. The proposed framework provides a general approach for solving multimaterial problems with complex geometries and mixed Neumann-Dirichlet boundary conditions using neural networks. The implementation is publicly available in a GitHub repository.

2606.18175 2026-06-17 math.NA cs.LG cs.NA physics.comp-ph cross-list

A Convex Quasilinearization Method for Solving Nonlinear PDEs with Physics-Informed Neural Networks

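Gbenga T. Awojinrin, Abdul-Akeem Olawoyin, Rami M. Younis

Affiliations: Texas A&M University, College Station, Texas, U.S.A.

AI summary: The paper proposes LiL-Q, which uses Bellman-Kalaba quasilinearization to turn a nonlinear PDE into a sequence of linear subproblems, each solved by convex least squares over a Linear-in-Learnables (LiL) trial space, avoiding nonconvex gradient-based training. The method comes with a Newton-Kantorovich convergence guarantee and reaches high accuracy on several benchmarks within a few outer iterations.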

Comments Preprint. 56 pages, 18 figures. Code: https://github.com/awojinrin/lilq-pinn

Abstract

We present a numerical method for the forward solution of nonlinear partial differential equations (PDEs) in which Bellman-Kalaba quasilinearization reduces the nonlinear problem to a sequence of linear subproblems, each discretized by collocation onto a trial space that is linear in its parameters and solved by a single direct linear least-squares QR factorization. The trial space, which we term Linear-in-Learnables (LiL), comprises representations whose trainable parameters enter linearly, including random-feature extreme learning machines, spectral polynomial bases, and trigonometric expansions, each implemented as a physics-informed neural network. The method thus replaces the nonconvex gradient-based training that limits standard PINNs with a convex per-step solve. We establish local Newton-Kantorovich convergence of the outer iteration to a residual-limited neighborhood under an explicit smallness condition, with the limiting accuracy governed by the best-approximation residual of the trial space rather than by an optimization tolerance. The method, denoted LiL-Q, is assessed on seven benchmarks spanning scalar nonlinear PDEs (Bratu, viscous Burgers, Buckley-Leverett), coupled systems (plane-strain elasticity and the incompressible Navier-Stokes equations in two and three spatial dimensions), and steady-state Darcy flow with heterogeneous permeability. Across these problems, LiL-Q converges in single-digit outer iterations in most cases, even at the coarsest basis sizes and independent of the parameter count. When the exact solution lies in the span of the trial space, the method recovers it to machine precision in a single solve. On the Navier-Stokes benchmarks, it matches or exceeds published PINN solvers with up to two orders of magnitude fewer trainable parameters, without gradient-based optimization.
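The outer loop described in this abstract can be sketched on the one-dimensional Bratu problem $u'' + \lambda e^{u} = 0$, $u(0) = u(1) = 0$. Linearizing around the current iterate $u_k$ gives the linear subproblem $u'' + \lambda e^{u_k} u = \lambda e^{u_k}(u_k - 1)$. The sketch below is an illustration under assumptions, not the authors' code: the sine basis, the collocation grid, and all function names are choices made here, and the least-squares solve uses the normal equations with Gaussian elimination to stay dependency-free, whereas the paper uses a QR factorization.

```python
import math
from typing import List


def solve_least_squares(A: List[List[float]], b: List[float]) -> List[float]:
    """Minimize ||A a - b||_2 via normal equations and Gaussian elimination."""
    n = len(A[0])
    # Augmented system [A^T A | A^T b].
    M = [[sum(row[i] * row[j] for row in A) for j in range(n)]
         + [sum(row[i] * bi for row, bi in zip(A, b))] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    a = [0.0] * n
    for i in reversed(range(n)):
        s = M[i][n] - sum(M[i][j] * a[j] for j in range(i + 1, n))
        a[i] = s / M[i][i]
    return a


def trial(a: List[float], x: float) -> float:
    """Trial function linear in its parameters: sum_n a_n sin(n pi x).
    Every basis function already satisfies u(0) = u(1) = 0."""
    return sum(an * math.sin((n + 1) * math.pi * x) for n, an in enumerate(a))


def quasilinearize_bratu(lam: float = 1.0, n_basis: int = 6,
                         n_colloc: int = 20, n_outer: int = 6) -> List[float]:
    """Outer quasilinearization loop; each step is one linear least-squares solve."""
    xs = [(j + 1) / (n_colloc + 1) for j in range(n_colloc)]
    a = [0.0] * n_basis  # initial guess u_0 = 0
    for _ in range(n_outer):
        A, b = [], []
        for x in xs:
            u_k = trial(a, x)
            g = lam * math.exp(u_k)
            # Row of the linearized operator u'' + g * u applied to each sine mode.
            A.append([(g - ((n + 1) * math.pi) ** 2) * math.sin((n + 1) * math.pi * x)
                      for n in range(n_basis)])
            b.append(g * (u_k - 1.0))
        a = solve_least_squares(A, b)
    return a


coeffs = quasilinearize_bratu()
u_mid = trial(coeffs, 0.5)
# For lam = 1 the exact lower-branch solution has u(0.5) close to 0.14.
print(u_mid)
```

Each outer step is a convex problem in the coefficients, so the only nonlinearity handled iteratively is the PDE itself. For a real solver, a QR-based least-squares routine would be the better-conditioned choice.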

2606.18231 2026-06-17 cs.CV cs.LG cs.RO cross-list

Adaptive Volumetric Mechanical Property Fields Invariant to Resolution

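Rishit Dagli, Donglai Xiang, Vismay Modi, Xuning Yang, Gavriel State, David I. W. Levin, Maria Shugrina

Affiliations: NVIDIA

AI summary: The paper proposes AdaVoMP, which uses a sparse adaptive voxel structure and an autoregressive transformer encoder-decoder to predict high-resolution, spatially varying Young's modulus, Poisson's ratio, and density for 3D objects, achieving $16^3\times$ higher resolution and better accuracy than prior art.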

Comments Project Page and hi-res paper: https://research.nvidia.com/labs/sil/projects/adavomp/. ICML 2026

Abstract

Accurate mechanical properties (or materials), namely Young's modulus ($E$), Poisson's ratio ($ν$), and density ($ρ$), are essential for reliable physics simulation of digital worlds, but most 3D assets lack this information. We propose AdaVoMP, a method for predicting accurate, dense, spatially varying ($E$, $ν$, $ρ$) for input 3D objects across representations, improving resolution, accuracy, and memory efficiency over the state of the art. The foundation of our technique is a sparse and adaptive voxel structure, SAV, that efficiently represents both the input 3D shape and the material field output. We replace the fixed-voxel model of the most accurate prior method, VoMP, with a novel sparse transformer encoder-decoder model that learns to generate a unique SAV autoregressively for every input shape to represent its materials, achieving a resolution $16^3\times$ higher than prior art. Experiments show that AdaVoMP estimates more accurate volumetric properties, even with less test-time compute than all prior art. This allows us to convert high-resolution complex 3D objects into simulation-ready assets, resulting in realistic deformable simulations.

2505.17740 2026-06-17 cs.LG cs.NE physics.comp-ph replacement

A tensor network approach for chaotic time series prediction

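Rodrigo Martínez-Peña, Román Orús

AI summary: For chaotic time series prediction, the paper applies a tensor-network model that decomposes high-dimensional arrays to reduce parameter complexity, outperforming conventional echo state networks in accuracy and computational efficiency.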

Comments 15 pages, 4 figures. Comments are welcome!

Abstract

Making accurate predictions of chaotic time series is a complex challenge. Reservoir computing, a neuromorphic-inspired approach, has emerged as a powerful tool for this task. It exploits the memory and nonlinearity of dynamical systems without requiring extensive parameter tuning. However, selecting and optimizing reservoir architectures remains an open problem. Next-generation reservoir computing simplifies this problem by employing nonlinear vector autoregression based on truncated Volterra series, thereby reducing hyperparameter complexity. Nevertheless, the latter suffers from exponential parameter growth in terms of the maximum monomial degree. Tensor networks offer a promising solution to this issue by decomposing multidimensional arrays into low-dimensional structures, thus mitigating the curse of dimensionality. This paper explores the application of a previously proposed tensor network model for predicting chaotic time series, demonstrating its advantages in terms of accuracy and computational efficiency compared to conventional echo state networks. Using a state-of-the-art tensor network approach enables us to bridge the gap between the tensor network and reservoir computing communities, fostering advances in both fields.
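The parameter-growth argument in this abstract can be made concrete with a short counting sketch. The numbers are illustrative assumptions, not figures from the paper: a polynomial readout over `n_inputs` delayed features needs one weight per monomial up to the chosen degree, while a tensor-train (matrix product state) factorization of the same order-`p` weight tensor needs only a chain of small cores. The function names and the bond dimension are hypothetical.

```python
from math import comb


def monomial_count(n_inputs: int, max_degree: int) -> int:
    """Number of monomials of total degree <= max_degree in n_inputs variables."""
    return comb(n_inputs + max_degree, max_degree)


def dense_tensor_size(mode_size: int, order: int) -> int:
    """Entries of a full order-`order` weight tensor with `mode_size` per mode."""
    return mode_size ** order


def tensor_train_parameter_count(mode_size: int, order: int, bond_dim: int) -> int:
    """Parameters of a tensor train with uniform bond dimension.

    The two boundary cores have shape (mode_size, bond_dim); interior cores
    have shape (bond_dim, mode_size, bond_dim).
    """
    if order == 1:
        return mode_size
    boundary = 2 * mode_size * bond_dim
    interior = (order - 2) * mode_size * bond_dim ** 2
    return boundary + interior


n_inputs, degree, bond_dim = 10, 5, 4
mode_size = n_inputs + 1  # each mode carries the inputs plus a constant 1

print(monomial_count(n_inputs, degree))                           # distinct monomials
print(dense_tensor_size(mode_size, degree))                       # full weight tensor
print(tensor_train_parameter_count(mode_size, degree, bond_dim))  # factorized
```

The tensor-train count grows linearly with the degree for a fixed bond dimension, which is the sense in which a low-rank factorization sidesteps the growth of the full monomial expansion. How much accuracy a given bond dimension retains is an empirical question the counting does not answer.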

2512.13853 2026-06-17 cs.LG cond-mat.stat-mech math.PR stat.ML replacement

Dropout Neural Network Training Viewed from a Percolation Perspective

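Finley Devlin, Jaron Sanders

AI summary: The paper studies percolation when training deep neural networks with dropout. It builds new percolation models that characterize the relationship between network topology and the input-output path problem, and shows that dropout exhibits a percolative effect that can cause training to break down.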

Comments 21 pages, 14 figures

Abstract

In this work, we investigate the existence and effect of percolation in training deep Neural Networks (NNs) with dropout. Dropout methods are regularisation techniques for training NNs, first introduced by G. Hinton et al. (2012). These methods temporarily remove connections in the NN, randomly at each stage of training, and update the remaining subnetwork with Stochastic Gradient Descent (SGD). The process of removing connections from a network at random is similar to percolation, a paradigm model of statistical physics. If dropout were to remove enough connections such that there is no path between the input and output of the NN, then the NN could not make predictions informed by the data. We study new percolation models that mimic dropout in NNs and characterise the relationship between network topology and this path problem. The theory shows the existence of a percolative effect in dropout. We also show that this percolative effect can cause a breakdown when training NNs without biases with dropout; and we argue heuristically that this breakdown extends to NNs with biases.
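The path problem described in this abstract can be explored with a small Monte Carlo sketch. It is a toy illustration, not one of the percolation models studied in the paper: the fully connected layered topology, the layer widths, and the function names are assumptions. Every connection is kept independently with probability `keep_prob`, and the code checks whether at least one path still links the input layer to the output layer.

```python
import random
from typing import List


def has_input_output_path(widths: List[int], keep_prob: float,
                          rng: random.Random) -> bool:
    """Sample one dropout mask over connections and test input-output connectivity."""
    reachable = set(range(widths[0]))  # every input unit is a valid start
    for n_out in widths[1:]:
        next_reachable = set()
        for _ in reachable:
            for j in range(n_out):
                # Keep the connection into unit j with probability keep_prob.
                if rng.random() < keep_prob:
                    next_reachable.add(j)
        if not next_reachable:
            return False  # the layer is cut off: no path can reach the output
        reachable = next_reachable
    return True


def path_probability(widths: List[int], keep_prob: float,
                     trials: int = 2000, seed: int = 0) -> float:
    """Estimate the probability that a surviving input-output path exists."""
    rng = random.Random(seed)
    hits = sum(has_input_output_path(widths, keep_prob, rng) for _ in range(trials))
    return hits / trials


widths = [4, 4, 4, 4, 1]  # hypothetical small network with a single output
for p in (0.1, 0.3, 0.5, 0.9):
    print(p, path_probability(widths, p))
```

Sweeping `keep_prob` on deeper or narrower networks shows how quickly connectivity is lost as more connections are removed. On a disconnected sample the output cannot depend on the input at all, which is the failure mode the abstract describes.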

2603.22372 2026-06-17 cs.LG cs.AI replacement

Rethinking Multimodal Fusion for Time Series: Text Modalities Need Constrained Fusion

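Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn

AI summary: To address the poor performance of naive fusion in multimodal time series forecasting, the paper proposes constrained fusion methods and the Controlled Fusion Adapter (CFA), which uses low-rank adapters to filter irrelevant textual information. The approach is validated across many datasets and models.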

Comments KDD Workshop on Mining and Learning from Time Series 2026

Abstract

Recent advances in multimodal learning have motivated the integration of auxiliary modalities such as text or vision into time series (TS) forecasting. However, most existing methods provide limited gains, often improving performance only in specific datasets or relying on architecture-specific designs that limit generalization. In this paper, we show that multimodal models with naive fusion strategies (e.g., simple addition or concatenation) often underperform unimodal TS models, which we attribute to the uncontrolled integration of auxiliary modalities, which may introduce irrelevant information. Motivated by this observation, we explore various constrained fusion methods designed to control such integration and find that they consistently outperform naive fusion methods. Furthermore, we propose Controlled Fusion Adapter (CFA), a simple plug-in method that enables controlled cross-modal interactions without modifying the TS backbone, integrating only relevant textual information aligned with TS dynamics. CFA employs low-rank adapters to filter irrelevant textual information before fusing it into temporal representations. We conduct over 20K experiments across various datasets and TS/text models, demonstrating the effectiveness of the constrained fusion methods. Code is available at: https://github.com/seunghan96/cfa.

2604.03444 2026-06-17 cs.LG cs.CL replacement

Olmo Hybrid: From Theory to Practice and Back

William Merrill, Yanhong Li, Tyler Romero, Anej Svete, Caia Costello, Pradeep Dasigi, Dirk Groeneveld, David Heineman, Bailey Kuehl, Nathan Lambert, Chuan Li, Kyle Lo, Saumya Malik, DJ Matusz, Benjamin Minixhofer, Jacob Morrison, Luca Soldaini, Finbarr Timbers, Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi, Ashish Sabharwal

AI summary: Through theoretical analysis and experiments, the paper shows that hybrid models combining attention with linear RNNs surpass pure transformers in expressivity and scaling efficiency, and trains the 7B-parameter Olmo Hybrid, which outperforms Olmo 3 on standard evaluations.

Comments Corrected author list and typos in appendix

Abstract

Recent work has demonstrated the potential of non-transformer language models, especially linear recurrent neural networks (RNNs) and hybrid models that mix recurrence and attention. Yet there is no consensus on whether the potential benefits of these new architectures justify the risk and effort of scaling them up. To address this, we provide evidence for the advantages of hybrid models over pure transformers on several fronts. First, theoretically, we show that hybrid models do not merely inherit the expressivity of transformers and linear RNNs, but can express tasks beyond both, such as code execution. Putting this theory to practice, we train Olmo Hybrid, a 7B-parameter model largely comparable to Olmo 3 7B but with the sliding window layers replaced by Gated DeltaNet layers. We show that Olmo Hybrid outperforms Olmo 3 across standard pretraining and mid-training evaluations, demonstrating the benefit of hybrid models in a controlled, large-scale setting. We find that the hybrid model scales significantly more efficiently than the transformer, explaining its higher performance. However, it is unclear why greater expressivity on specific formal problems should result in better scaling or superior performance on downstream tasks unrelated to those problems. To explain this apparent gap, we return to theory and argue why increased expressivity should translate to better scaling efficiency, completing the loop. Overall, our results suggest that hybrid models mixing attention and recurrent layers are a powerful extension to the language modeling paradigm: not merely to reduce memory during inference, but as a fundamental way to obtain more expressive models that scale better during pretraining.

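2606.03089 2026-06-17 cs.LG cs.AI (version update)

Constitutional On-Policy Safe Distillation

Ming Wen, Yuxuan Liu, Kun Yang, Yunhao Feng, Zhuoer Xu, Yuhao Sun, Shiwen Cui, Xiang Zheng, Guoyu Wang, Xingjun Ma, Yu-Gang Jiang

Affiliations: Institute of Trustworthy Embodied AI; Fudan University; Shanghai Innovation Institute; Ant Group; Zhejiang University; City University of Hong Kong

AI summary: On-policy self-distillation for safety alignment suffers because constitutional conditioning contracts the teacher distribution and reduces expressiveness. The paper proposes Constitutional On-Policy Safe Distillation (COPSD), which calibrates the teacher distribution through a Cross-SFT cold start and then performs constitution-conditioned on-policy distillation, achieving a better safety-helpfulness trade-off on 12 benchmarks and a lower safety tax.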

Abstract

On-policy self-distillation (OPSD) has emerged as an efficient post-training paradigm by using a teacher conditioned on privileged information to provide dense token-level supervision. Prior work has shown that OPSD can collapse in verifiable reasoning tasks, but safety alignment differs in that it is guided by high-level constitutions rather than explicit target answers, making it a natural setting to revisit dense distillation. However, our pilot study shows that safety OPSD still suffers from severe collapse: constitutional conditioning contracts the teacher distribution toward short and overly conservative responses, and Reverse KL further amplifies this contraction into reduced expressiveness. We formalize this effect as geometric leakage under safety boundaries in a non-orthogonal semantic space, where safety pressure transfers into the expressiveness dimension. Based on this analysis, we propose Constitutional On-Policy Safe Distillation (COPSD), which first calibrates the teacher through a Cross-SFT cold-start and then performs constitution-conditioned on-policy distillation. Experiments on 12 benchmarks show that COPSD achieves a consistently stronger safety-helpfulness trade-off than baselines while substantially reducing the safety tax on general reasoning ability.
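
As an illustration of why the choice of divergence matters here, the following minimal sketch compares forward KL, KL(teacher || student), with reverse KL, KL(student || teacher), on toy next-token distributions. It is not the authors' code: the four-token vocabulary, the probability values, and the function name `kl` are assumptions made for this example.

```python
import math


def kl(p, q):
    """KL(p || q) for two discrete distributions given as equal-length lists."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)


# Toy next-token distributions over 4 tokens (hypothetical values).
teacher = [0.85, 0.05, 0.05, 0.05]         # contracted teacher: mass on one token
student_broad = [0.40, 0.20, 0.20, 0.20]   # student that still spreads its mass
student_narrow = [0.90, 0.04, 0.03, 0.03]  # student that copies the contraction

for name, student in [("broad", student_broad), ("narrow", student_narrow)]:
    forward = kl(teacher, student)  # KL(teacher || student)
    reverse = kl(student, teacher)  # KL(student || teacher), the reverse direction
    print(f"{name}: forward={forward:.4f} reverse={reverse:.4f}")
```

Under these toy numbers, the reverse KL is much smaller for the student that copies the teacher's contraction than for the one that keeps a broader distribution, which is consistent with the amplification effect the abstract describes.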

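2606.14668 2026-06-17 cs.LG (version update)

When to Write and When to Suppress: Route-Specialized Dual Adapters for Memory-Assisted Knowledge Editing

Baijia Zhang, Yining Huang

AI summary: Proposes a route-specialized dual-adapter editor in which a relevance router decides whether to apply the edit memory and separate edit and locality adapters are trained. The method achieves the best probability-preference accuracy on three benchmarks.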

Abstract

Knowledge editing systems must update selected facts while preserving nearby but irrelevant behavior. This paper studies this problem in a memory-assisted setting where an edit memory is retrieved at inference time and a parameter-efficient adapter corrects the model's object preference. We argue that the central design question is not only how to write an edit, but also when to suppress it. We introduce \method{}, a route-specialized dual-adapter editor. A relevance router first decides whether a prompt should receive an edit memory. Routed prompts use an edit adapter trained to prefer the new object over the original object; unrouted non-direct prompts use a separate locality adapter trained to preserve or restore the original-object preference. We evaluate \method{} on three 1,000-case protocols, \cf{}, \zsre{}, and \mquake{}, under the same memory protocol and two 7B/8B base models. On Llama-3.1-8B-Instruct, \method{} obtains the best overall probability-preference accuracy on all three benchmarks: 0.8180 on \cf{}, 0.8946 on \zsre{}, and 0.9922 on \mquake{}. The same trend holds on Qwen3-8B. Router ablations show that the relevant memory boundary differs across datasets: a lexical neural router is safest on \cf{}, while BGE embedding routing is better on \zsre{} and \mquake{}. Component and module ablations show that the gain mainly comes from separating edit injection from off-route suppression rather than from simply increasing LoRA capacity.

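2606.14990 2026-06-17 cs.LG cs.AI (version update)

Rational Sparse Autoencoder

Naiyu Yin, Yue Yu

Affiliations: Lehigh University

AI summary: Proposes the Rational Sparse Autoencoder (RSAE), which replaces fixed encoder activations with trainable rational functions. A two-stage pipeline (initialization, then fine-tuning) improves reconstruction and downstream-behavior metrics across several language models and baseline activation families without sacrificing feature interpretability.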

Comments Accepted to the Mechanistic Interpretability Workshop at ICML 2026

Abstract

Sparse autoencoders (SAEs) are standard tools for mechanistic interpretability, but current SAE families are constrained by fixed encoder nonlinearities such as ReLU, JumpReLU, and TopK. This hard-codes a particular sparsity mechanism into the model and can distort the reconstruction-versus-sparsity trade-off. We introduce the Rational Sparse Autoencoder (RSAE), which replaces the fixed encoder activation with a trainable rational function. Rational activations are flexible enough to uniformly approximate the activation primitives used by existing SAE families on compact domains (for TopK, the thresholded gate obtained after a separating top-k threshold is supplied), while also providing a richer function class for adapting to the observed pre-activation geometry. We realise this idea through a two-stage pipeline: an initialisation procedure that copies the pre-trained baseline SAE weights, plugs in rational coefficients obtained by the relaxed Remez exchange on synthetic data, and calibrates the scale parameters along with the rational coefficients; followed by a fine-tuning step under the standard sparsity-regularised reconstruction objective. Empirically, on residual-stream activations of three open-weight language models and across all three baseline activation families, the RSAE strictly improves on the corresponding baseline SAE after the fine-tuning step, both on reconstruction-side metrics and on downstream-behaviour metrics, without sacrificing feature-level interpretability under sparse probing. These gains are consistent across host language models, across baseline activation families, and across the full range of baseline sparsity we tested, while the upgrade itself adds only a handful of scalar parameters per autoencoder and runs in minutes on a single consumer GPU.
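
To make the idea of a trainable rational activation concrete, here is a minimal sketch of evaluating P(x)/Q(x) from two coefficient lists. It is an illustration only, not the RSAE implementation: the function name `rational`, the coefficient layout, and the pole-free denominator Q(x) = 1 + |q_1 x + q_2 x^2 + ...| are assumptions, and the paper may parameterise its rational functions differently.

```python
def rational(x, p, q):
    """Evaluate a rational activation P(x) / Q(x) at a scalar x.

    p: numerator coefficients, lowest degree first (p[0] + p[1]*x + ...).
    q: denominator coefficients for degrees 1..len(q).
    The denominator is 1 + |q[0]*x + q[1]*x**2 + ...|, which is always >= 1,
    so the function has no poles (an assumed "safe" parameterisation).
    """
    numerator = sum(c * x**i for i, c in enumerate(p))
    denominator = 1.0 + abs(sum(c * x ** (j + 1) for j, c in enumerate(q)))
    return numerator / denominator


# With P(x) = x and no denominator terms, the activation is the identity.
identity_value = rational(3.0, [0.0, 1.0], [])

# With P(x) = x and Q(x) = 1 + |x|, the activation is the bounded map x / (1 + |x|).
bounded_value = rational(2.0, [0.0, 1.0], [1.0])
```

In a trainable setting, `p` and `q` would be learned alongside the autoencoder weights; here they are plain lists so the evaluation rule stays visible.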

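2406.07435 2026-06-17 cs.CV cs.LG eess.IV (version update)

Beware of Aliases -- Signal Preservation is Crucial for Robust Image Restoration

Shashank Agnihotri, Julia Grabinski, Janis Keuper, Margret Keuper

AI summary: Image restoration networks are often not robust because of aliasing. The paper proposes BOA-Restormer, which performs some downsampling and upsampling operations in the frequency domain to guarantee alias-free paths, improving robustness at low cost.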

Comments Tags: Adversarial attack, image restoration, image deblurring, frequency sampling

2511.09204 2026-06-17 quant-ph cs.LG (version update)

Resource-Efficient Variational Quantum Classifier

Petr Ptáček, Paulina Lewandowska, Ryszard Kukulski

Affiliations: IT4Innovations, VSB - Technical University of Ostrava; Faculty of Electrical Engineering and Computer Science, VSB - Technical University of Ostrava

AI summary: Proposes an unambiguous quantum classifier based on Hamming-distance measurement and classical post-processing. It uses the expressivity of the ansatz more effectively to improve classification, while greatly reducing the number of circuit evaluations and improving robustness to noise.

Comments 13 pages, 7 figures, 1 table; current format of preprint template

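2601.18252 2026-06-17 cs.CV cs.AI cs.LG stat.ML (version update)

Co-PLNet: A Collaborative Point-Line Network for Prompt-Guided Wireframe Parsing

Chao Wang, Xuanying Li, Cheng Dai, Jinglei Feng, Yuxiang Luo, Hao Qin, Yuqi Ouyang

AI summary: Proposes Co-PLNet, a point-line collaborative framework that exchanges spatial cues through a point-line prompt encoder and strengthens point-line consistency with a cross-guidance line decoder, improving the accuracy and robustness of wireframe parsing on the Wireframe and YorkUrban datasets.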

Abstract

Wireframe parsing aims to recover line segments and their junctions to form a structured geometric representation useful for downstream tasks such as Simultaneous Localization and Mapping (SLAM). Existing methods predict lines and junctions separately and reconcile them post-hoc, causing mismatches and reduced robustness. We present Co-PLNet, a point-line collaborative framework that exchanges spatial cues between the two tasks, where early detections are converted into spatial prompts via a Point-Line Prompt Encoder (PLP-Encoder), which encodes geometric attributes into compact and spatially aligned maps. A Cross-Guidance Line Decoder (CGL-Decoder) then refines predictions with sparse attention conditioned on complementary prompts, enforcing point-line consistency and efficiency. Experiments on Wireframe and YorkUrban show consistent improvements in accuracy and robustness, together with favorable real-time efficiency, demonstrating the effectiveness of our approach for structured geometry perception. Our code is available at https://github.com/GalacticHogrider/Co-PLNet.

2602.14771 2026-06-17 cs.CV cs.AI cs.LG cs.MM cs.NE (version update)

GOT-JEPA: Generic Object Tracking with Model Adaptation and Occlusion Handling using Joint-Embedding Predictive Architecture

Shih-Fang Chen, Jun-Cheng Chen, I-Hong Jhuo, Yen-Yu Lin

AI summary: Proposes GOT-JEPA, a framework that improves generalization by predicting tracking models rather than image features, and OccuSolver, which strengthens occlusion perception. Effectiveness is validated on seven benchmarks.

Comments Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT). This research focuses on learning model adaptation for adverse and dynamic environments, as well as fine-grained occlusion perception for tracking

DOI: 10.1109/TCSVT.2026.3675005
Journal ref: IEEE Transactions on Circuits and Systems for Video Technology 2026

Abstract

The human visual system tracks objects by integrating current observations with previously observed information, adapting to target and scene changes, and reasoning about occlusion at fine granularity. In contrast, recent generic object trackers are often optimized for training targets, which limits robustness and generalization in unseen scenarios, and their occlusion reasoning remains coarse, lacking detailed modeling of occlusion patterns. To address these limitations in generalization and occlusion perception, we propose GOT-JEPA, a model-predictive pretraining framework that extends JEPA from predicting image features to predicting tracking models. Given identical historical information, a teacher predictor generates pseudo-tracking models from a clean current frame, and a student predictor learns to predict the same pseudo-tracking models from a corrupted version of the current frame. This design provides stable pseudo supervision and explicitly trains the predictor to produce reliable tracking models under occlusions, distractors, and other adverse observations, improving generalization to dynamic environments. Building on GOT-JEPA, we further propose OccuSolver to enhance occlusion perception for object tracking. OccuSolver adapts a point-centric point tracker for object-aware visibility estimation and detailed occlusion-pattern capture. Conditioned on object priors iteratively generated by the tracker, OccuSolver incrementally refines visibility states, strengthens occlusion handling, and produces higher-quality reference labels that progressively improve subsequent model predictions. Extensive evaluations on seven benchmarks show that our method effectively enhances tracker generalization and robustness.

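2603.18104 2026-06-17 cs.AI cs.DC cs.LG cs.NE (version update)

Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI

Houston Haynes

AI summary: Proposes an alternative training architecture built on a dimensional type system, the Program Hypergraph, and the b-posit bounded-regime design, giving constant memory overhead, exact gradient accumulation, and grade-preserving updates. It also introduces Bayesian distillation and warm rotation to support continual adaptation and verifiable correctness of domain-specific models.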

Comments 32 pages, 3 figures

Abstract

Prevailing AI training assumes reverse-mode automatic differentiation over IEEE-754 arithmetic. The memory overhead of training relative to inference, optimizer complexity, and structural degradation of geometric properties through training are consequences of this arithmetic substrate. This paper develops an alternative training architecture grounded in three prior results: the Dimensional Type System and Deterministic Memory Management framework (Haynes 2026), which establishes stack-eligible gradient allocation and exact quire accumulation as design-time verifiable properties; the Program Hypergraph (Haynes 2026), which establishes grade preservation through geometric algebra computations as a type-level invariant; and the b-posit bounded-regime design (Jonnalagadda et al. 2025), which makes posit arithmetic tractable across hardware targets conventionally considered inference-only. Their composition enables depth-independent training memory bounded to approximately twice the inference footprint, grade-preserving weight updates, and exact gradient accumulation, applicable uniformly to loss-function-optimized and spike-timing-dependent neuromorphic models. We introduce *Bayesian distillation*, a mechanism by which the latent prior structure of a general-purpose model is extracted through the ADM training regime, resolving the data-scarcity bootstrapping problem for domain-specific training. For deployment, we introduce *warm rotation*, an operational pattern in which an updated model transitions into an active inference pathway without service interruption, with correctness formalized through PHG certificates and signed version records. The result is a class of domain-specific AI systems that are smaller and more precise than general-purpose models, continuously adaptive, verifiably correct with respect to the physical structure of their domains, and initializable from existing models.

2606.17516 2026-06-17 cs.LG cs.AI stat.ME stat.ML (new submission)

FoundCause: Causal Discovery with Latent Confounders from Observational Data

Patrick Blöbaum, Krishnakumar Balasubramanian, Shiva Prasad Kasiviswanathan

Affiliations: Amazon Web Services; Department of Statistics, University of California, Davis

AI summary: Proposes FoundCause, an amortized causal discovery model trained on synthetic data that maps a dataset directly to a causal graph in a single forward pass and explicitly models latent confounders. It outperforms 11 non-amortized and 4 amortized methods on 15 real-world datasets.

Comments Download the model at https://github.com/amazon-science/foundcause

Abstract

Causal discovery from observational data remains challenging due to the need to recover directed structure and latent confounding without interventions. We propose FoundCause, an amortized causal discovery model trained entirely on synthetic data that maps datasets directly to causal graphs in a single forward pass. By learning from large collections of simulated structural causal models, FoundCause captures transferable statistical patterns that generalize beyond individual datasets. The architecture incorporates several key inductive biases for causal discovery. It uses a permutation-invariant transformer encoder with alternating attention over samples and variables to jointly model cross-variable dependence and per-variable distributions. Pairwise statistical features derived from classical asymmetry measures are injected through statistics-conditioned attention, guiding the model toward known causal signals. A factorized decoder separates edge existence from direction, while a triangular refinement module enables reasoning over higher-order causal motifs such as chains and colliders. In addition, a dedicated confounder module based on learnable latent tokens explicitly models hidden common causes, and the model explicitly handles missing data via its masked input representation. To our knowledge, FoundCause is the first amortized causal discovery approach to explicitly model latent confounding. FoundCause outperforms 11 classical non-amortized methods (e.g., PC, GES, NOTEARS-style optimization) and 4 amortized causal discovery methods on 15 real-world datasets, achieving +9.6% improvement in $F_1$, +1.2% in AUROC, and an 18.9% reduction in structural Hamming distance relative to the strongest non-amortized methods, while performing inference in a single forward pass.

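2606.17603 2026-06-17 cs.LG (new submission)

Brep2Shape: Aligning Boundary and Shape Representations via Self-Supervised Transformers

Yuanxu Sun, Yuezhou Ma, Haixu Wu, Guanyang Zeng, Muye Chen, Jianmin Wang, Mingsheng Long

AI summary: Proposes Brep2Shape, a self-supervised pre-training method that uses a Dual Transformer backbone and topology attention to align the abstract boundary representation of B-reps with intuitive shape representations, reaching state-of-the-art accuracy and faster convergence on several downstream tasks.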

Abstract

Boundary representation (B-rep) is the industry standard for computer-aided design (CAD). While deep learning shows promise in processing B-rep models, existing methods suffer from a representation gap: continuous approaches offer analytical precision but are visually abstract, whereas discrete methods provide intuitive clarity at the expense of geometric precision. To bridge this gap, we introduce Brep2Shape, a novel self-supervised pre-training method designed to align abstract boundary representations with intuitive shape representations. Our method employs a geometry-aware task where the model learns to predict dense spatial points from parametric Bézier control points, enabling the network to better understand physical manifolds derived from abstract coefficients. To enhance this alignment, we propose a Dual Transformer backbone with parallel streams that independently encode surface and curve tokens to capture their distinct geometric properties. Moreover, topology attention is integrated to model the interdependencies between surfaces and curves, thereby maintaining topological consistency. Experimental results demonstrate that Brep2Shape offers significant scalability, achieving state-of-the-art accuracy and faster convergence across various downstream tasks. Code is available at this repository: https://github.com/thuml/Brep2Shape.

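2606.16379 2026-06-17 cs.LG stat.ML (version update)

Scalable and Interpretable Representation Alignment with Ordinal Similarity

Diogo Soares, Pankhil Gawade, Andrea Dittadi, Ewa Szczurek

Affiliations: University of Maryland; Google Research

AI summary: Existing representation-similarity measures lack interpretability, are sensitive to outliers, and are computationally expensive. The paper proposes Triplet and Quadruplet Similarity Indices based on ordinal similarity, giving an interpretable, robust, and efficient alignment measure.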

Abstract

Evaluating representation similarity is fundamental to representation learning. However, existing metrics suffer from significant limitations: they lack interpretability due to shifting baselines, lack robustness to outliers, and are computationally intractable for large datasets, forcing reliance on heuristic approximations. To address this, we develop an ordinal-similarity framework, instantiated by the Triplet (TSI) and Quadruplet (QSI) Similarity Indices, which measure alignment by quantifying the consistency of ordinal relationships. We theoretically demonstrate this formulation is inherently interpretable, robust to outliers, and computationally efficient. Finally, we establish a formal equivalence between TSI and local neighborhood alignment, measured by Mutual Nearest Neighbors. Empirically, we validate these properties and show that ordinal similarity offers a scalable approach to measuring alignment, enabling practitioners to better understand and design representations.
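
The notion of "consistency of ordinal relationships" can be sketched as a triplet agreement score: for every anchor i and pair (j, k), check whether both representations agree on which of j and k is closer to i. The code below is a brute-force illustration under that assumed definition. The paper's exact TSI (normalisation, tie handling, sampling, and its efficient computation) may differ, and the names `dist` and `triplet_agreement` are hypothetical.

```python
import itertools
import math


def dist(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_agreement(X, Y):
    """Fraction of triplets (i, j, k) whose distance ordering matches in X and Y.

    X and Y hold one representation vector per item, in the same item order.
    A triplet agrees when "j is closer to i than k" has the same answer
    (closer, farther, or tied) in both representations.
    """
    n = len(X)
    agree = total = 0
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for j, k in itertools.combinations(others, 2):
            a = dist(X[i], X[j]) - dist(X[i], X[k])
            b = dist(Y[i], Y[j]) - dist(Y[i], Y[k])
            total += 1
            if (a > 0) == (b > 0) and (a < 0) == (b < 0):
                agree += 1
    return agree / total


X = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [5.0, 5.0]]
Y = [[2.0 * a, 2.0 * b] for a, b in X]  # same geometry, different scale
score = triplet_agreement(X, Y)
```

Because only orderings are compared, rescaling a representation leaves the score unchanged, which hints at why an ordinal measure has a fixed, interpretable range. This brute-force loop is cubic in the number of items and is meant for illustration, not for large datasets.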

2407.13053 2026-06-17 cs.CY cs.AI cs.CL cs.LG (version update)

E2Vec: Feature Embedding with Temporal Information for Analyzing Student Actions in E-Book Systems

Yuma Miyazaki, Valdemar Švábenský, Yuta Taniguchi, Fumiya Okubo, Tsubasa Minematsu, Atsushi Shimada

Affiliations: Kyushu University

AI summary: Proposes E2Vec, which uses word embeddings to turn operation logs and time intervals into student vectors for an at-risk detection task, improving generalizability and performance.

Comments Research paper published in the Proceedings of the 17th Educational Data Mining Conference (EDM 2024), see https://doi.org/10.5281/zenodo.12729853

DOI: 10.5281/zenodo.12729853

Abstract

Digital textbook (e-book) systems record student interactions with textbooks as a sequence of events called EventStream data. In the past, researchers extracted meaningful features from EventStream, and utilized them as inputs for downstream tasks such as grade prediction and modeling of student behavior. Previous research evaluated models that mainly used statistical-based features derived from EventStream logs, such as the number of operation types or access frequencies. While these features are useful for providing certain insights, they lack temporal information that captures fine-grained differences in learning behaviors among different students. This study proposes E2Vec, a novel feature representation method based on word embeddings. The proposed method regards operation logs and their time intervals for each student as a string sequence of characters and generates a student vector of learning activity features that incorporates time information. We applied fastText to generate an embedding vector for each of 305 students in a dataset from two years of computer science courses. Then, we investigated the effectiveness of E2Vec in an at-risk detection task, demonstrating potential for generalizability and performance.
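
The step of treating operation logs and their time intervals as a string of characters can be sketched as follows. This is an illustration only: the operation names, the one-letter symbols, and the interval thresholds are hypothetical and are not taken from the paper. The resulting string is the kind of input a word-embedding model such as fastText could then consume.

```python
# Hypothetical vocabulary: these operation names and symbols are assumptions
# made for this example, not the mapping used by E2Vec.
OP_SYMBOL = {"OPEN": "O", "NEXT": "N", "PREV": "P", "ADD_MARKER": "M", "CLOSE": "C"}


def interval_symbol(seconds):
    """Bucket the gap between two events into a short/medium/long symbol."""
    if seconds < 10:
        return "s"
    if seconds < 300:
        return "m"
    return "l"


def encode_log(events):
    """Turn a time-sorted list of (timestamp_seconds, operation) into a string.

    Each operation becomes one character, and the gap before every operation
    except the first becomes one interval character.
    """
    chars = []
    previous_time = None
    for timestamp, operation in events:
        if previous_time is not None:
            chars.append(interval_symbol(timestamp - previous_time))
        chars.append(OP_SYMBOL[operation])
        previous_time = timestamp
    return "".join(chars)


log = [(0, "OPEN"), (5, "NEXT"), (65, "NEXT"), (1000, "CLOSE")]
encoded = encode_log(log)
```

Interleaving interval symbols with operation symbols keeps the temporal information inside the sequence itself, which is what separates this representation from count-based features.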

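2603.04198 2026-06-17 stat.ML cs.LG (version update)

Stable and Steerable Sparse Autoencoders with Weight Regularization

Piotr Jedryszek, Oliver M. Crook

AI summary: L1/L2 weight regularization improves the cross-seed feature consistency of sparse autoencoders and raises steering success rates on language models while preserving interpretability scores.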

Abstract

Sparse autoencoders (SAEs) are widely used to extract human-interpretable features from neural network activations, but their learned features can vary substantially across random seeds and training choices. To improve stability, we study weight regularization by adding L1 or L2 penalties on encoder and decoder weights, and evaluate how regularization interacts with common SAE training defaults. On MNIST, we observe that L2 weight regularization produces a core of highly aligned features and, when combined with tied initialization and unit-norm decoder constraints, it dramatically increases cross-seed feature consistency. For TopK SAEs trained on language model activations (Pythia-70M-deduped), adding a small L2 weight penalty increases the fraction of features shared across three random seeds and roughly doubles steering success rates, while leaving the mean of automated interpretability scores essentially unchanged. Finally, in the regularized setting, activation steering success becomes better predicted by auto-interpretability scores, suggesting that regularization can align text-based feature explanations with functional controllability.
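
A minimal sketch of the objective described above, a TopK sparse autoencoder whose loss adds an L2 penalty on encoder and decoder weights, is given below in plain Python. It is illustrative rather than the authors' training code: the function names, the list-of-rows weight layout, and the coefficient `l2_coeff` are assumptions, and details such as tied initialization or unit-norm decoder constraints are omitted.

```python
def topk_mask(values, k):
    """Keep the k largest entries of values and set the rest to zero."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    keep = set(order[:k])
    return [v if i in keep else 0.0 for i, v in enumerate(values)]


def matvec(W, x, b):
    """Return W @ x + b, where W is a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]


def sae_loss(x, W_enc, b_enc, W_dec, b_dec, k, l2_coeff):
    """Mean squared reconstruction error plus an L2 penalty on both weight matrices."""
    z = topk_mask(matvec(W_enc, x, b_enc), k)  # sparse code: top-k pre-activations
    x_hat = matvec(W_dec, z, b_dec)            # reconstruction from the sparse code
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    l2 = sum(w * w for W in (W_enc, W_dec) for row in W for w in row)
    return mse + l2_coeff * l2


identity = [[1.0, 0.0], [0.0, 1.0]]
zeros = [0.0, 0.0]
loss_no_penalty = sae_loss([1.0, 2.0], identity, zeros, identity, zeros, k=2, l2_coeff=0.0)
loss_with_penalty = sae_loss([1.0, 2.0], identity, zeros, identity, zeros, k=2, l2_coeff=0.1)
```

The penalty term depends only on the weights, so it pulls every training run toward smaller-norm solutions regardless of the data batch, which is one plausible reading of why it could reduce seed-to-seed variation.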

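2603.22281 2026-06-17 cs.CV cs.AI cs.CL cs.LG cs.RO (version update)

ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

Haichao Zhang, Yijiang Li, Shwai He, Tushar Nagarajan, Mingfei Chen, Jianglin Lu, Ang Li, Yun Fu

AI summary: Proposes the ThinkJEPA framework, which combines a dense JEPA branch with a sparse VLM thinker branch and uses a hierarchical pyramid representation extraction module to deliver fine-grained motion modeling with long-horizon semantic guidance, outperforming baselines on hand-manipulation trajectory prediction.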

Comments 10 pages, 5 figures

Abstract

Recent progress in latent world models (e.g., V-JEPA2) has shown promising capability in forecasting future world states from video observations. Nevertheless, dense prediction from a short observation window limits temporal context and can bias predictors toward local, low-level extrapolation, making it difficult to capture long-horizon semantics and reducing downstream utility. Vision-language models (VLMs), in contrast, provide strong semantic grounding and general knowledge by reasoning over uniformly sampled frames, but they are not ideal as standalone dense predictors due to compute-driven sparse sampling, a language-output bottleneck that compresses fine-grained interaction states into text-oriented representations, and a data-regime mismatch when adapting to small action-conditioned datasets. We propose a VLM-guided JEPA-style latent world modeling framework that combines dense-frame dynamics modeling with long-horizon semantic guidance via a dual-temporal pathway: a dense JEPA branch for fine-grained motion and interaction cues, and a uniformly sampled VLM "thinker" branch with a larger temporal stride for knowledge-rich guidance. To transfer the VLM's progressive reasoning signals effectively, we introduce a hierarchical pyramid representation extraction module that aggregates multi-layer VLM representations into guidance features compatible with latent prediction. Experiments on hand-manipulation trajectory prediction show that our method outperforms both a strong VLM-only baseline and a JEPA-predictor baseline, and yields more robust long-horizon rollout behavior.

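2604.22128 2026-06-17 cs.CL cs.LG (version update)

Dissociating Decodability and Causal Use in Bracket-Sequence Transformers

Aryan Sharma, Cutter Dawes, Shivam Raval

AI summary: Using probing and intervention experiments, the paper finds that hierarchical representations in transformers trained on Dyck languages are decodable, but only attention to the top-of-stack position has a causal effect on long-distance accuracy.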

Abstract

When trained on tasks requiring an understanding of hierarchical structure, transformers have been found to represent this hierarchy in distinct ways: in the geometry of the residual stream, and in stack-like attention patterns maintaining a last-in, first-out ordering. However, it remains unclear whether these representations are causally used or merely decodable. We examine this gap in transformers trained on the Dyck language (a formal language of balanced bracket sequences), where the hierarchical ground truth is explicit. By probing and intervening on the residual stream and attention patterns, we find that depth, distance, and top-of-stack signals are all decodable, yet their causal roles diverge. Specifically, masking attention to the true top-of-stack position causes a sharp drop in long-distance accuracy, while ablating low-dimensional residual stream subspaces has comparatively little effect. These results, which extend to a templated natural language setting, suggest that even in a controlled setting where the relevant hierarchical variables are known, decodability alone does not imply causal use.
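
For readers unfamiliar with the setup, the ground-truth hierarchical variables of a Dyck sequence (stack depth and the position of the current top-of-stack bracket) can be computed with an explicit stack, as sketched below. This is background illustration, not the paper's code: the two bracket types and the name `stack_trace` are assumptions.

```python
CLOSE_TO_OPEN = {")": "(", "]": "["}
OPENERS = set(CLOSE_TO_OPEN.values())


def stack_trace(seq):
    """Return (is_balanced, depths, tops) for a bracket string.

    depths[t] is the stack depth after reading seq[t].
    tops[t] is the index of the open bracket on top of the stack after
    reading seq[t], or None when the stack is empty.
    On the first mismatched bracket, the trace collected so far is returned.
    """
    stack = []  # indices of currently unmatched open brackets
    depths, tops = [], []
    for t, ch in enumerate(seq):
        if ch in OPENERS:
            stack.append(t)
        elif ch in CLOSE_TO_OPEN:
            if not stack or seq[stack[-1]] != CLOSE_TO_OPEN[ch]:
                return False, depths, tops  # mismatched or unopened bracket
            stack.pop()
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
        depths.append(len(stack))
        tops.append(stack[-1] if stack else None)
    return not stack, depths, tops


balanced, depths, tops = stack_trace("([])[]")
```

Labels such as `depths` and `tops` are the kind of targets a probe could be trained to decode, and `tops` gives the positions whose attention could be masked in an intervention of the sort the abstract describes.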

2606.07555 2026-06-17 cs.CL cs.LG 版本更新

Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override

先验通过抑制持续存在：词汇覆盖的斯特鲁普范式

Han-yu Wang

发表机构 * The University of Hong Kong（香港大学）

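AI summary: Using a Stroop-style paradigm, the paper finds that lexical priors in language models persist after being overridden by a local rule; activation patching localizes the effect to a source-position triplet, revealing the prior as the shared channel for both the origin of interference and the trace left by the override.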

AI中文摘要

词汇表、技术规范和系统提示通常要求语言模型以不熟悉的方式使用熟悉的词汇。当这种方式有效时，词汇先验通过覆盖而非替换持续存在：它在局部规则应用后继续运作，规则降低其logit而非在顶部安装新含义。我们通过斯特鲁普风格范式对此进行测试：一个重映射规则（“doctor”意为“forest”）与查询词的词汇先验干扰项（“hospital”）对抗，并匹配中性对照。在跨越四个家族和1B-9B参数的11个开源权重模型中，即使在项目级别控制答案先验、频率、分词和提示措辞后，词汇先验强度仍能预测干扰。对五个对齐模型的激活修补定位到一个源位置三元组（定义主语、定义目标、查询词），该三元组几乎完全恢复了冲突效应（聚合$R \in [0.92, 1.06]$）。定义目标交换表明该三元组执行绑定而非身份匹配。分离实验将目标保留隔离为绑定特定特征：干扰抑制在匹配、交换和项目不匹配条件下均发生，而目标logit崩溃仅在定义目标位置被破坏时发生。行为和机制汇聚到同一通道：词汇先验既是干扰的起源，也是覆盖留下痕迹的地方。

英文摘要

Glossaries, technical specifications, and system prompts routinely ask language models to use familiar words in unfamiliar ways. When this works, the local rule does not install the new meaning on top of the old one; the pretrained prior keeps operating underneath, and its strength still shows through. We test this with a Stroop-style paradigm: a remapping rule (doctor means forest) pitted against the query word's lexical-prior distractor (hospital), with matched neutral controls. Across 11 open-weight models spanning four families and 1B-9B parameters, lexical-prior strength predicts interference even after item-level controls for answer prior, frequency, tokenization, and prompt wording. Activation patching on five aligned models locates a source-position triplet (definition subject, definition target, query word) that nearly fully recovers the conflict effect (aggregate $R \in [0.92, 1.06]$); a definition-target swap shows the triplet performs binding rather than identity matching. Dissociation experiments isolate target preservation as the binding-specific signature: distractor suppression occurs under matched, swap, and item-mismatched conditions alike, whereas target logit collapse occurs only when the definition-target position is corrupted. Behavior and mechanism converge on the same channel: the prior's strength both predicts which overrides fail and marks where the causal repair lands.

2606.09770 2026-06-17 q-bio.NC cs.LG 版本更新

Discovering Functionally Selective Brain Regions with a Deep Topographic Multimodal Model

发现功能选择性脑区：一种深度地形多模态模型

Badr AlKhamissi, Johannes Mehrer, Lara Marinov, Ahmed Abdelaal, Abdulkadir Gokce, Martin Schrimpf

发表机构 * University of California, Berkeley（加州大学伯克利分校）； Max Planck Institute for Human Cognitive and Brain Sciences（马克斯·普朗克人类认知与脑科学研究所）； ETH Zurich（苏黎世联邦理工学院）

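AI summary: Proposes Topo-Omni, a model that fine-tunes pretrained foundation models with spatial smoothness to integrate visual, auditory, and language/cognitive processing on a single continuous virtual cortex, yielding multimodal clusters consistent with human neuroimaging and supporting the discovery of new brain regions.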

Comments Preprint. The first two authors contributed equally

AI中文摘要

基于时变需求的约束赌博机在线LLM选择

Yin Huang, Qingsong Liu, Jie Xu

发表机构 * Department of Electrical and Computer Engineering, University of Florida（佛罗里达大学电气与计算机工程系）； Manning College of Information and Computer Sciences, University of Massachusetts Amherst（马萨诸塞大学阿默斯特分校曼宁信息与计算机科学学院）

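AI summary: For selecting among heterogeneous LLMs in edge-cloud inference systems, proposes an online learning algorithm based on confidence-bound estimates and demand predictions that achieves sublinear regret and sublinear constraint violation under hard budget and soft latency constraints.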

Comments 11 pages, 3 figures with multiple subfigures, 1 table, submitted for possible journal publication

AI中文摘要

大型语言模型（LLM）越来越多地部署在边缘云推理系统中，以处理具有异构准确性、延迟和成本配置的多样化用户任务。为每个传入任务选择合适的LLM对于确保服务质量和高效资源利用至关重要。然而，模型异构性、随机且未知的性能特征以及时变的任务需求使得静态选择策略不再适用。实际部署通常施加硬资源预算（如货币支出限制）和软服务级别要求（如延迟保证）。这些约束为在线决策带来了额外挑战。我们将该问题形式化为一个约束随机赌博机学习任务，其中学习者在包装型（硬）和覆盖型（软）约束下顺序选择模型，同时适应时变的任务需求。学习者无法访问底层奖励、成本或延迟分布，必须依赖部分反馈。我们开发了一种新颖的在线学习算法，利用置信界估计和需求预测来平衡奖励最大化与长期约束满足。我们提供了理论保证，表明与具有完整信息的离线基准相比，该算法实现了亚线性遗憾和亚线性覆盖约束违反。在合成工作负载上的实验结果证明了我们的方法在动态、资源受限环境中的有效性和鲁棒性。

英文摘要

Large Language Models (LLMs) are increasingly deployed in edge-cloud inference systems to handle diverse user tasks with heterogeneous accuracy, latency, and cost profiles. Selecting the appropriate LLM for each incoming task is critical for ensuring service quality and efficient resource utilization. However, model heterogeneity, stochastic and unknown performance characteristics, and time-varying task demands make static selection strategies inadequate. Real-world deployments often impose hard resource budgets such as monetary expenditure limits, along with soft service-level requirements such as latency guarantees. These constraints introduce additional challenges for online decision-making. We formulate this problem as a constrained stochastic bandit learning task, where the learner sequentially selects models under both packing-type (hard) and covering-type (soft) constraints, while adapting to time-varying task demand. The learner operates without access to the underlying reward, cost, or latency distributions and must rely on partial feedback. We develop a novel online learning algorithm that leverages confidence-bound estimates and demand predictions to balance reward maximization with long-term constraint satisfaction. We provide theoretical guarantees showing sublinear regret and sublinear covering constraint violations compared to an offline benchmark with full information. Experimental results on synthetic workloads demonstrate the effectiveness and robustness of our approach in dynamic, resource-constrained environments.

2606.17524 2026-06-17 cs.LG 新提交

Learning to Refine Hidden States for Reliable LLM Reasoning

学习精炼隐藏状态以实现可靠的LLM推理

Chia-Hsuan Hsu, Jui-Ming Yao

发表机构 * Tongyu0924

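AI summary: Proposes ReLAR, a framework that uses reinforcement-learning-guided refinement of latent states to adaptively adjust the number and direction of reasoning steps, improving the accuracy and stability of complex multi-step reasoning while reducing inference overhead.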

Comments Code is available at tongyu0924/Learning-to-Refine-Hidden-States

2606.17545 2026-06-17 cs.LG q-fin.CP q-fin.PR 新提交

Continuous-time Optimal Stopping through Deep Reinforcement Learning

通过深度强化学习的连续时间最优停止

Cosmin Borsa, Michael Ludkovski

发表机构 * Department of Statistics & Applied Probability, UC Santa Barbara（加州大学圣塔芭芭拉分校统计与应用概率系）

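AI summary: Proposes the CARLOS algorithm, which uses an aggregate deep neural network to learn stopping rules at arbitrarily fine time resolution and, through progressive time-grid refinement and adaptive sampling, approaches the American option price upper bound.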

Comments 33 pages

AI中文摘要

基于仿真的最优停止问题求解器必须离散化停止决策。在经典动态规划下，粗网格（只有少数停止机会）会显著低估最优期望回报，而在极细网格上，近似误差通过反向递归累积。为消除这一限制，我们开发了一种新的强化学习启发算法，能够在任意精细时间分辨率下学习停止规则。我们的CARLOS（连续时间自适应强化学习最优停止）算法利用聚合深度神经网络（ADNN）学习联合时空决策边界。从粗时间网格开始，我们逐步增加停止机会的频率，同时并行训练ADNN以精化其时机-价值估计。此外，我们设计了一种自适应采样策略，逐渐将训练集中到停止边界附近。基准测试结果表明，CARLOS相比现有百慕大求解器提供更高的价格，接近美式上界，并且相对于非RL比较器实现了高计算效率。

英文摘要

Simulation-based solvers for optimal stopping problems must discretize the stopping decision. Under classical dynamic programming, a coarse exercise grid with only a few stopping opportunities can materially undervalue the optimal expected reward, whereas on a very fine grid, approximation errors accumulate through the backward recursion. To remove this limitation, we develop a new reinforcement-learning-inspired algorithm that enables us to learn the exercise rule at arbitrarily fine time resolution. Our CARLOS (Continuous-time Adaptive Reinforcement Learning for Optimal Stopping) algorithm utilizes an aggregate deep neural network (ADNN) to learn a joint space-time decision boundary. Starting from a coarse time grid, we progressively increase the frequency of stopping opportunities, while in parallel training the ADNN to refine its timing-value estimates. We moreover design an adaptive sampling strategy that gradually concentrates training effort near the stopping boundary. Benchmarked results show that CARLOS delivers higher prices than existing Bermudan solvers, approaching the American upper bound, and achieves high computational efficiency relative to non-RL comparators.

2606.17551 2026-06-17 cs.LG cs.AI 新提交

Reversal Q-Learning

逆向Q学习

Aditya Oberai, Seohong Park, Sergey Levine

发表机构 * University of California, Berkeley（加州大学伯克利分校）

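AI summary: Proposes Reversal Q-learning (RQL), which uses an expanded MDP framework and reversed flows to generate virtual on-policy trajectories, combined with a bias-and-variance reduction technique, to train flow policies for offline reinforcement learning, achieving the best average performance on 50 robotic tasks.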

AI中文摘要

迭代生成建模技术（如流匹配）为建模复杂行为以进行有效的离线强化学习（RL）提供了强大工具。在这项工作中，我们提出了一种新的离策略RL算法，该算法基于先验数据训练流策略。我们的想法始于“扩展”马尔可夫决策过程（MDP）框架，该框架将单个流细化步骤视为MDP中的独立动作。为了在该框架中实现离策略RL，我们应用了两种技术：我们通过“逆向”流生成虚拟在线轨迹，使该框架与先验数据兼容；并应用偏差-方差缩减技术来缓解离策略RL中的视界诅咒。我们将由此产生的算法称为逆向Q学习（RQL）。RQL相比先前基于流的RL方法具有若干优势：它不受时间反向传播的影响，更好地利用学习到的价值函数，并直接训练完整的、富有表现力的流策略。通过在50个具有挑战性的模拟机器人任务上的实验，我们表明，与最先进的基于流的离线RL算法相比，RQL实现了最佳的平均离线RL性能。

英文摘要

Iterative generative modeling techniques, such as flow matching, provide powerful tools to model complex behaviors for effective offline reinforcement learning (RL). In this work, we propose a new off-policy RL algorithm that trains a flow policy based on prior data. Our idea starts from the "expanded" Markov decision process (MDP) framework, which treats individual flow refinement steps as separate actions in an MDP. To enable off-policy RL within this framework, we apply two techniques: we generate virtual on-policy trajectories (by "reversing" flows) to make this framework compatible with prior data, and we apply a bias-and-variance reduction technique to mitigate the curse of horizon in off-policy RL. We call the resulting algorithm Reversal Q-learning (RQL). RQL has several advantages over previous flow-based RL methods: it does not suffer from backpropagation through time, makes better use of the learned value function, and directly trains the full, expressive flow policy. Through our experiments on 50 challenging simulated robotic tasks, we show that RQL leads to the best average offline RL performance compared to state-of-the-art flow-based offline RL algorithms.

2606.17680 2026-06-17 cs.LG cs.CL 新提交

EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning

EnvRL: 在智能体强化学习中从环境动力学中学习

Zhitong Wang, Songze Li, Hao Peng, Shuzheng Si, Yi Wang, Maosong Sun, Juanzi Li

发表机构 * Department of Computer Science and Technology, Tsinghua University（清华大学计算机科学与技术系）； Shanghai AI Laboratory（上海人工智能实验室）

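AI summary: Proposes EnvRL, a framework that incorporates environment-dynamics learning into agentic reinforcement learning through two auxiliary objectives, state prediction and inverse dynamics, significantly improving success rates on long-horizon tasks.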

AI中文摘要

强化学习已成为训练大型语言模型作为智能体的强大范式。然而，针对长周期智能体任务的常规强化学习方法往往难以处理稀疏的结果奖励。直观上，这忽略了展开交互轨迹中包含的丰富环境动力学信息。我们认为交互体验本身固有地充当隐式监督信号，揭示了环境的潜在转换机制，并使智能体能够构建更准确的环境内部模型。因此，在这项工作中，我们研究了如何利用这一额外信号来改进策略学习。具体来说，我们提出了EnvRL，一个通过两个辅助目标（状态预测和逆动力学）将环境动力学学习融入智能体强化学习的框架。通过与主要强化学习目标联合优化，我们鼓励智能体从其自身的交互体验中内化环境动力学。在两个长周期智能体基准上的大量实验表明，EnvRL在成功率上比仅使用强化学习的基线有显著提升，例如，当使用GRPO训练时，在ALFWorld上将Qwen-2.5-1.5B-Instruct从72.8%提升到77.4%，在WebShop上从56.8%提升到67.0%。

英文摘要

Reinforcement learning (RL) has emerged as a powerful paradigm for training Large Language Models (LLMs) as agents. However, conventional RL methods for long-horizon agentic tasks often struggle with sparse outcome rewards. Intuitively, this overlooks the rich environment dynamics information contained in rollout interaction trajectories. We argue that the interaction experience inherently serves as an implicit supervision signal, reveals the underlying transition mechanisms of the environment, and enables the agent to construct a more accurate internal model of the environment. Therefore, in this work, we investigate how to leverage this additional signal to improve policy learning. Specifically, we propose EnvRL, a framework that incorporates environment dynamics learning into agentic RL via two auxiliary objectives: state prediction and inverse dynamics. By jointly optimizing with the primary RL objective, we encourage the agent to internalize environment dynamics from its own interaction experience. Extensive experiments on two long-horizon agentic benchmarks demonstrate that EnvRL achieves significant improvements in success rate over RL-only baselines, e.g., when trained with GRPO, lifting Qwen-2.5-1.5B-Instruct from 72.8% to 77.4% on ALFWorld, and from 56.8% to 67.0% on WebShop.

2606.18106 2026-06-17 cs.LG 新提交

Deep Reinforcement Learning for Minimum Zero-Forcing Sets

深度强化学习用于最小零强制集

Steve Halley, Maurício Gruppi

发表机构 * Department of Computing Sciences, Villanova University（维拉诺瓦大学计算科学系）

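AI summary: Proposes SD-ZFS, a reinforcement-learning framework that adapts the S2V-DQN architecture to the minimum zero-forcing set problem, and reports that it outperforms greedy heuristics on graphs of varying structure.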

AI中文摘要

本文探讨了在无向图中寻找最小零强制集的问题，并提出了一种自适应的机器学习框架来解决该问题。最小零强制集问题是一种图着色问题，其中初始节点集的颜色在整个网络中传播。如果节点集在颜色变化规则的约束下迫使所有未着色节点改变颜色，则该节点集是零强制集。该问题在网络科学、网络控制和逻辑电路设计等不同领域有多种应用。寻找最小零强制集已被证明是NP难的。我们提出了一种强化学习框架SD-ZFS，该框架将S2V-DQN架构适配到ZFS问题。我们在该适配框架上训练了多个模型，并分析了在不同结构图数据集上的性能。我们评估了在该框架上训练的模型在不同网络类型上的泛化、扩展和迁移能力。结果表明，与最优解和贪心启发式算法相比，该框架是有效的。我们进一步深入了解了如何通过机器学习解决ZFS问题以及网络结构对该问题的影响。

英文摘要

This paper explores the problem of finding the minimum zero-forcing set on undirected graphs and proposes an adapted machine-learning framework to solve the problem. The minimum zero-forcing set problem is a graph coloring problem where the color of an initial set of nodes propagates throughout a network. The set of nodes is zero-forcing if it forces all uncolored nodes to change color under the constraint of the color-change rule. There are several applications to this problem across different domains such as network science, network control, and designing logical circuits. Finding the minimum zero-forcing set is shown to be NP-hard. We propose a reinforcement learning framework, SD-ZFS, that adapts the S2V-DQN architecture to the ZFS problem. We train several models on this adapted framework and analyze the performance across graph datasets that have varying structures. We evaluate how the models trained on the framework generalize, scale, and transfer to different network types. The results demonstrate the effectiveness of the framework when compared against the optimal solution and greedy heuristic. We provide further insight into how the ZFS problem can be solved through machine-learning and the influence of network structure on the problem.

2606.18111 2026-06-17 cs.LG cs.AI 新提交

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning

多目标强化学习中学习公平帕累托最优策略

Umer Siddique, Peilang Li, Yongcan Cao

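AI summary: To address the inability of fixed-preference methods in multi-objective reinforcement learning to provide diverse policies, proposes a multi-policy approach based on the generalized Gini welfare function that learns a set of fair Pareto-optimal policies.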

Comments Accepted at the Reinforcement Learning Conference (RLC) 2025. 12 pages main + appendix, 8 figures, 4 tables

AI中文摘要

公平性是多目标强化学习（MORL）决策中的一个重要方面，策略必须确保在多个潜在冲突的目标上既达到最优又实现公平。虽然单策略MORL方法可以使用福利函数（如广义基尼福利函数GGF）为固定的用户偏好学习公平策略，但它们无法提供动态或未知用户偏好所需的多样的策略集。为解决这一局限性，我们形式化了多策略MORL中的公平优化问题，其目标是学习一组帕累托最优策略，确保在所有可能的用户偏好下实现公平。我们的关键技术贡献有三点：（1）我们证明对于凹的、分段线性的福利函数（例如GGF），公平策略仍然在凸覆盖集（CCS）中，CCS是线性标量化下的近似帕累托前沿。（2）我们证明非平稳策略（通过累积奖励历史增强）和随机策略通过动态适应历史不公平性来改善公平性。（3）我们提出了三种新算法，包括将GGF与多策略多目标Q学习（MOQL）集成、用于学习非平稳策略的状态增强多策略MOQL，以及用于学习随机策略的新扩展。我们在多个领域评估了我们的算法，并将我们的方法与最先进的MORL基线进行了比较。实验结果表明，我们的方法学习了一组公平策略，能够适应不同的用户偏好。

英文摘要

Fairness is an important aspect of decision-making in multi-objective reinforcement learning (MORL), where policies must ensure both optimality and equity across multiple, potentially conflicting objectives. While single-policy MORL methods can learn fair policies for fixed user preferences using welfare functions such as the generalized Gini welfare function (GGF), they fail to provide the diverse set of policies necessary for dynamic or unknown user preferences. To address this limitation, we formalize the fair optimization problem in multi-policy MORL, where the goal is to learn a set of Pareto-optimal policies that ensure fairness across all possible user preferences. Our key technical contributions are threefold: (1) We show that for concave, piecewise-linear welfare functions (e.g., GGF), fair policies remain in the convex coverage set (CCS), which is an approximated Pareto front for linear scalarization. (2) We demonstrate that non-stationary policies, augmented with accrued reward histories, and stochastic policies improve fairness by dynamically adapting to historical inequities. (3) We propose three novel algorithms, which include integrating GGF with multi-policy multi-objective Q-Learning (MOQL), state-augmented multi-policy MOQL for learning non-stationary policies, and its novel extension for learning stochastic policies. We evaluate our algorithms across various domains and compare our methods against the state-of-the-art MORL baselines. The empirical results show that our methods learn a set of fair policies that accommodate different user preferences.
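As a rough illustration of the welfare function named in the abstract, the sketch below evaluates a generalized Gini welfare function in its standard textbook form: the objective values are sorted in ascending order and combined with non-increasing weights, so the worst-off objective counts most. The function name `ggf` and the example weights are assumptions for illustration, not taken from the paper.

```python
from typing import Sequence


def ggf(values: Sequence[float], weights: Sequence[float]) -> float:
    """Generalized Gini welfare: sum_i weights[i] * (i-th smallest value).

    weights must be non-increasing so that lower-ranked (worse-off)
    objectives receive at least as much weight as better-off ones.
    """
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if any(a < b for a, b in zip(weights, weights[1:])):
        raise ValueError("weights must be non-increasing")
    ordered = sorted(values)  # worst-off objective first
    return sum(w * v for w, v in zip(weights, ordered))


# Two return vectors with the same total; the balanced one scores higher.
print(ggf([3.0, 1.0], [0.75, 0.25]))  # 1.5
print(ggf([2.0, 2.0], [0.75, 0.25]))  # 2.0
```

Because the sort makes the function a minimum over linear functions of the input (one per permutation of the weights), it is concave and piecewise linear, which is the property the abstract's first contribution relies on.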

2606.17847 2026-06-17 cs.AI cs.LG 交叉投稿

WallZero: Mastering the Game of WallGo with Strategic Analysis

WallZero：通过战略分析掌握WallGo游戏

Hsing-Yu Chen, Jérôme Arjonilla, I-Chen Wu, Ti-Rong Wu

发表机构 * National Yang Ming Chiao Tung University（国立阳明交通大学）； Academia Sinica（中央研究院）

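AI summary: Proposes WallZero, an AlphaZero-based agent with tailored action and feature designs that defeats professional Go players at WallGo, and uses it to analyze game fairness and key strategies.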

Comments Accepted by the Computers and Games conference (CG 2026)

AI中文摘要

WallGo是一种最近引入的战略棋盘游戏，因2025年Netflix系列剧《The Devil's Plan》而流行。尽管在7x7的小棋盘上进行，但其石头移动和墙壁放置的组合导致了高游戏树复杂性和复杂的战略互动。尽管其日益流行，WallGo仍未得到充分探索。本文提出了WallZero，一个基于AlphaZero的双人WallGo设置智能体。我们引入了定制的动作和特征设计，以显著提高游戏性能。在评估中，WallZero击败了参与本研究的两位职业围棋选手，平均每局获得1.98倍的地盘。除了其强度，我们使用WallZero评估游戏公平性并识别掌握WallGo的关键策略。有趣的是，我们的结果显示，Netflix系列剧中使用的开局产生了更平衡的游戏。我们的代码可在以下网址获取：此 https URL。

英文摘要

WallGo is a recently introduced strategic board game popularized by the 2025 Netflix series The Devil's Plan. Although played on a small 7 x 7 board, its combination of stone movement and wall placement yields high game-tree complexity and intricate strategic interactions. Despite its growing popularity, WallGo remains underexplored. This paper presents WallZero, an AlphaZero-based agent for the two-player WallGo setting. We introduce tailored action and feature designs to improve playing performance significantly. In the evaluation, WallZero defeats two professional Go players who participated in this study, securing on average 1.98x more territory per game. Beyond its strength, we use WallZero to assess game fairness and identify key strategies for mastering WallGo. Interestingly, our results show that the opening used in the Netflix series yields a more balanced game. Our code is available at https://rlg.iis.sinica.edu.tw/papers/wallzero.

2502.17518 2026-06-17 cs.LG cs.AI q-fin.CP stat.ML 版本更新

Ensemble RL through Classifier Models: Enhancing Risk-Return Trade-offs in Trading Strategies

通过分类器模型进行集成强化学习：在交易策略中增强风险回报权衡

Zheli Xiong

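AI summary: This paper presents a comprehensive study of ensemble reinforcement learning models in financial trading strategies, using classifier models to improve performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers such as Support Vector Machines (SVM), Decision Trees, and Logistic Regression, it examines how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates various ensemble methods against individual RL models on key financial metrics, including cumulative returns, Sharpe ratio (SR), Calmar ratio, and maximum drawdown (MDD). The results indicate that ensemble methods consistently outperform base models in risk-adjusted returns, with better drawdown management and overall stability. However, ensemble performance is sensitive to the choice of the variance threshold τ, which highlights the importance of tuning τ dynamically. The study underscores the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.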

Comments 23 pages, 10 figures, 9 tables

AI中文摘要

本文提出了一项全面研究，探讨在金融交易策略中使用集成强化学习（RL）模型的应用，利用分类器模型来提升性能。通过结合A2C、PPO和SAC等强化学习算法与传统分类器如支持向量机（SVM）、决策树和逻辑回归，我们研究了不同分类器组如何整合以改善风险回报权衡。研究评估了各种集成方法的有效性，将其与单个RL模型在关键金融指标（包括累计回报率、夏普比率（SR）、卡勒姆比率和最大回撤（MDD））上进行比较。我们的结果表明，集成方法在风险调整后的回报方面始终优于基础模型，提供了更好的回撤管理和整体稳定性。然而，我们发现集成性能对方差阈值τ的选择敏感，强调了动态调整τ以达到最佳性能的重要性。本研究强调了将强化学习与分类器结合在自适应决策中的价值，对金融交易、机器人和其他动态环境具有启示。

英文摘要

This paper presents a comprehensive study on the use of ensemble Reinforcement Learning (RL) models in financial trading strategies, leveraging classifier models to enhance performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers like Support Vector Machines (SVM), Decision Trees, and Logistic Regression, we investigate how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates the effectiveness of various ensemble methods, comparing them with individual RL models across key financial metrics, including Cumulative Returns, Sharpe Ratios (SR), Calmar Ratios, and Maximum Drawdown (MDD). Our original experimental results demonstrate that ensemble methods often outperform base models in terms of risk-adjusted returns, providing better management of drawdowns and overall stability. However, both the original analysis and the additional reproduction reported in this version show that ensemble performance is sensitive to the choice of variance threshold $τ$, classifier group, RL-agent pair, and market universe. The reproduction evidence strengthens the conclusion that classifier-assisted ensemble selection can improve robustness, while also clarifying that the advantage is conditional rather than automatic across all datasets. This study emphasizes the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.

2506.03802 2026-06-17 cs.LG 版本更新

Learning in Matching Games with Bandit Feedback

带强盗反馈的匹配博弈学习

Andreas Athanasopoulos, Christos Dimitrakakis

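AI summary: Studies a learning problem in generalized two-sided matching markets where matched agents interact through zero-sum games, and proposes a UCB-based algorithm that, with matching instability as the regret measure, attains a sublinear regret upper bound.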

Comments 22 pages, 2 figures

AI中文摘要

我们在广义双边匹配市场中引入了一个学习问题，其中代理选择行动以与其匹配对象进行交互。具体来说，我们考虑一个场景，其中匹配的代理参与具有初始未知收益矩阵的零和博弈，并研究集中式程序是否可以从强盗反馈中学习均衡。我们采用\emph{匹配均衡}的解概念，其中匹配$ \mathfrak{m} $和一组代理策略$ X $构成均衡，如果没有代理有动机偏离$ (\mathfrak{m}, X) $。为了量化候选解$ (\mathfrak{m}, X) $与均衡$ (\mathfrak{m}^\star, X^\star) $的偏差，我们引入了\emph{匹配不稳定性}的概念，它作为学习问题的遗憾度量。我们提出了一种基于UCB的算法，其中代理根据收益的乐观估计形成偏好并选择行动。我们的分析建立了一个次线性、实例无关的遗憾上界，并得到了经验证据的进一步支持。

英文摘要

We introduce a learning problem in a generalized two-sided matching market, where agents select actions to interact with their match. Specifically, we consider a setting in which matched agents engage in zero-sum games with initially unknown payoff matrices, and we investigate whether a centralized procedure can learn an equilibrium from bandit feedback. We adopt the solution concept of a \emph{matching equilibrium}, where a matching $ \mathfrak{m} $ and a set of agent strategies $ X $ form an equilibrium if no agent has an incentive to deviate from $ (\mathfrak{m}, X) $. To quantify deviations of a candidate solution $ (\mathfrak{m}, X) $ from the equilibrium $ (\mathfrak{m}^\star, X^\star) $, we introduce the notion of \emph{matching instability}, which serves as a regret measure for the learning problem. We propose a UCB-based algorithm in which agents form preferences and select actions according to optimistic estimates of the payoffs. Our analysis establishes a sublinear, instance-independent regret upper bound, further supported by empirical evidence.
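For readers unfamiliar with "optimistic estimates", the following sketch shows the generic UCB1-style index: an empirical mean plus a confidence bonus that shrinks as an option is sampled more often. This is the classical single-agent rule, shown only to illustrate the principle; the paper's algorithm for matching games is not reproduced here, and the names `ucb_index` and `pick_action` and the constant `c` are assumptions.

```python
import math
from typing import Sequence


def ucb_index(mean: float, pulls: int, t: int, c: float = 2.0) -> float:
    """Optimistic payoff estimate: empirical mean plus exploration bonus."""
    if pulls == 0:
        return math.inf  # untried options are maximally optimistic
    return mean + math.sqrt(c * math.log(t) / pulls)


def pick_action(means: Sequence[float], pulls: Sequence[int], t: int) -> int:
    """Choose the action with the largest optimistic estimate at round t."""
    return max(range(len(means)), key=lambda a: ucb_index(means[a], pulls[a], t))


# An untried action is selected first; with equal counts the higher mean wins.
print(pick_action([0.5, 0.4], [10, 0], t=11))      # 1
print(pick_action([0.5, 0.4], [100, 100], t=200))  # 0
```

In a matching setting, indices of this kind would feed both the preference lists used to form the matching and the strategies played within each matched pair, though how the paper combines the two is not specified in the abstract.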

2602.06014 2026-06-17 cs.LG cs.AI math.OC math.ST stat.ML stat.TH 版本更新

Optimism Stabilizes Thompson Sampling for Adaptive Inference

乐观主义稳定自适应推断的汤普森采样

Shunxing Yan, Han Zhong

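AI summary: Stabilizes Thompson sampling by introducing optimism (variance inflation or a mean bonus) so that each arm's pull count concentrates around a deterministic scale, enabling asymptotically valid Wald inference in K-armed stochastic bandits and resolving the extension to multiple optimal arms.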

Comments Accepted in part to COLT 2026

AI中文摘要

汤普森采样（TS）广泛用于随机多臂老虎机，但其在自适应数据收集下的推断性质微妙。样本均值的经典渐近理论可能失效，因为臂特定样本量是随机的，并通过动作选择规则与奖励耦合。我们研究了具有高斯随机指数的K臂随机bandit中汤普森采样的自适应推断，其中奖励噪声为独立次高斯，并确定乐观主义是恢复稳定性的关键机制，即每个臂的拉取次数集中在确定性尺度附近。这种稳定性使得尽管自适应采样，仍能获得渐近有效的Wald推断。首先，我们证明方差膨胀的TS对任意K≥2是稳定的，包括多个臂最优的挑战性情况，对最优臂具有渐近均匀分配，对次优臂具有尖锐的对数拉取次数渐近性。这解决了Halder等人提出的K臂扩展问题，使用新的胜者图和Lyapunov漂移技术来控制多个最优臂之间的分配。其次，我们分析了一种替代的乐观修改，保持高斯指数方差不变但向指数中心添加显式均值奖励，并建立了类似的稳定性结论。总之，适当实施的乐观主义稳定了汤普森采样，并在多臂老虎机中实现了渐近有效的Wald推断，同时仅产生轻微额外的遗憾代价。

英文摘要

Thompson sampling (TS) is widely used for stochastic multi-armed bandits, yet its inferential properties under adaptive data collection are subtle. Classical asymptotic theory for sample means can fail because arm-specific sample sizes are random and coupled with the rewards through the action-selection rule. We study adaptive inference for Thompson sampling with Gaussian randomized indices in $K$-armed stochastic bandits with independent sub-Gaussian reward noises, and identify \emph{optimism} as a key mechanism for restoring \emph{stability}, meaning that each arm's pull count concentrates around a deterministic scale. This stability yields asymptotically valid Wald inference despite adaptive sampling. First, we prove that variance-inflated TS is stable for any $K \ge 2$, including the challenging regime where multiple arms are optimal, with asymptotically uniform allocation over optimal arms and sharp logarithmic pull-count asymptotics for suboptimal arms. This resolves the $K$-armed extension question raised by \citet{halder2025stable}, using new winner-map and Lyapunov-drift techniques to control allocation among multiple optimal arms. Second, we analyze an alternative optimistic modification that keeps the Gaussian index variance unchanged but adds an explicit mean bonus to the index center, and establish a similar stability conclusion. In summary, suitably implemented optimism stabilizes Thompson sampling and enables asymptotically valid Wald inference in multi-armed bandits, while incurring only a mild additional regret cost.

2602.23116 2026-06-17 cs.LG cs.GT stat.ML 版本更新

Provably Efficient Regularized Online RLHF with Generalized Bilinear Preferences

具有广义双线性偏好的可证明高效正则化在线RLHF

Junghyun Lee, Minju Hong, Kwang-Sung Jun, Chulhee Yun, Se-Young Yun

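AI summary: Studies regularized best-response max-regret minimization in online RLHF and, under the Generalized Bilinear Preference Model, proves that strong convexity yields polylogarithmic regret, showing that fast regret rates are not limited to KL divergence.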

Comments 48 pages, 3 figures (ver3: major revisions; ver2: more colorful boxes, fixed some typos)

AI中文摘要

我们考虑在一般偏好和bandit反馈下在线RLHF中的正则化最佳响应最大遗憾最小化问题。虽然各种正则化器被用于增强对齐的鲁棒性，但已知的多对数遗憾保证仍然高度特定于KL。为了研究这种快速速率是否扩展到KL之外，我们采用广义双线性偏好模型（GBPM）——通过一个秩为$2r$的斜对称矩阵捕获$d$维逐项特征上的非传递偏好——以隔离一般正则化的影响。关键地，在GBPM下，我们证明任何贪婪策略的对偶间隙受限于平方估计误差，该误差仅利用强凸性和斜对称性导出。在特征覆盖假设下，我们通过贪婪采样建立了$\tilde{\mathcal{O}}(\eta d^4 C_{\min}^{-1} (\log T)^2 \wedge d^2 C_{\min}^{-1/2} \sqrt{T})$的通用多对数遗憾，并通过探索后提交（Explore-Then-Commit）建立了$\tilde{\mathcal{O}}(C_{\min}^{-2} \sqrt{\eta r T} \wedge r^{1/3} C_{\min}^{-4/3} T^{2/3})$的维度改进遗憾（对于条件良好的臂集），其中$\eta^{-1}$是正则化系数，$T$是时间范围，$C_{\min}$是依赖于臂集的量。这表明“快速”遗憾并非KL特有，而是通用强凸几何的基本结果。

英文摘要

We consider the problem of regularized best-response max-regret minimization in online RLHF under general preferences and bandit feedback. While various regularizers are utilized to robustify alignment, known polylogarithmic regret guarantees remain heavily specific to KL. To investigate whether such fast rates extend beyond KL, we adopt the Generalized Bilinear Preference Model (GBPM) -- capturing intransitive preferences over $d$-dimensional item-wise features via a rank-$2r$ skew-symmetric matrix -- to isolate the impact of generic regularization. Crucially, under GBPM, we prove that the dual gap of any greedy policy is bounded by the squared estimation error, derived using \emph{only} strong convexity and skew-symmetry. Under a feature coverage assumption, we establish a \emph{generic} polylogarithmic regret of $\tilde{\mathcal{O}}(\eta d^4 C_{\min}^{-1} (\log T)^2 \wedge d^2 C_{\min}^{-1/2} \sqrt{T})$ with Greedy Sampling, and a dimension-wise improved regret (for well-conditioned arm-sets) of $\tilde{\mathcal{O}}(C_{\min}^{-2} \sqrt{\eta r T} \wedge r^{1/3} C_{\min}^{-4/3} T^{2/3})$ with Explore-Then-Commit, where $\eta^{-1}$ is the regularization coefficient, $T$ is the time horizon, and $C_{\min}$ is an arm-set dependent quantity. This demonstrates that ``fast'' regrets are not KL-specific, but rather a fundamental consequence of generic strongly convex geometry.

2604.18701 2026-06-17 cs.LG cs.AI stat.ML 版本更新

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

Curiosity-Critic：累积预测误差改进作为世界模型训练的可处理内在奖励

Vin Bhaskara, Haicheng Wang

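AI summary: Proposes Curiosity-Critic, which uses a tractable per-step surrogate (the difference between the current prediction error and an asymptotic error baseline) as the intrinsic reward, estimates the baseline online with a co-trained critic, separates reducible from irreducible prediction error, and outperforms existing methods in stochastic grid-world experiments.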

Comments Accepted to ICML 2026 Workshop on Epistemic Intelligence in Machine Learning (EIML@ICML 2026). Code: https://github.com/vinbhaskara/Curiosity-Critic

AI中文摘要

基于局部预测误差的好奇心奖励仅关注当前转移，而不考虑世界模型在所有已访问转移上的累积预测误差。我们引入了Curiosity-Critic，其内在奖励基于这一累积目标的改进，并证明它有一个可处理的每步替代项：当前预测误差与当前状态转移的渐近误差基线之间的差值。我们通过一个与世界模型共同训练的评论家在线估计这一误差基线；由于评论家只需学习一个转移的预测难度，其对不可约噪声基线的估计在世界模型饱和之前就已收敛，从而将探索引导向可学习的转移。该奖励对可学习转移较高，而对随机转移趋近于零，从而在线分离认知（可约）和偶然（不可约）预测误差。从Schmidhuber（1991）到学习特征空间变体的先前预测误差好奇心公式，都作为该误差基线的特定近似特例出现。在随机网格世界上的实验表明，Curiosity-Critic在训练速度和最终世界模型准确性上优于基于预测误差、访问计数和随机网络蒸馏的方法。

英文摘要

Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it admits a tractable per-step surrogate: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this error baseline online with a learned critic co-trained alongside the world model; since the critic only has to learn how hard a transition is to predict, its estimate of the irreducible noise floor converges well before the world model saturates, redirecting exploration toward learnable transitions. The reward is higher for learnable transitions and collapses toward zero for stochastic ones, thereby separating epistemic (reducible) from aleatoric (irreducible) prediction error online. Prior prediction-error curiosity formulations, from Schmidhuber (1991) to learned-feature-space variants, emerge as special cases corresponding to specific approximations of this error baseline. Experiments on a stochastic grid world show that Curiosity-Critic outperforms prediction-error, visitation-count, and Random Network Distillation methods in training speed and final world model accuracy.

2606.16590 2026-06-17 cs.LG cs.AI q-bio.NC 版本更新

Infant Spontaneous Movement Noise Improves Exploration in Deep RL

婴儿自发运动噪声改善深度强化学习中的探索

Francisco M. López, Markus R. Ernst, Francisco Cruz, Matej Hoffmann, and Jochen Triesch

发表机构 * Frankfurt Institute for Advanced Studies（法兰克福高等研究所）； School of Computer Science and Engineering, University of New South Wales（新南威尔士大学计算机科学与工程学院）； Escuela de Ingeniería, Universidad Central de Chile（智利中央大学工程学院）； Faculty of Electrical Engineering, Czech Technical University（捷克理工大学电气工程学院）

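AI summary: Inspired by the noise in infants' spontaneous movements, proposes an exploration-noise mechanism whose temporal autocorrelation increases progressively during RL training; experiments show that it produces structured exploratory behavior and improves learning efficiency.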

Comments 6 pages, 4 figures, 1 table. Accepted at IEEE ICDL 2026. Cite as: F. M. López, M. R. Ernst, F. Cruz, M. Hoffmann, and J. Triesch, "Infant Spontaneous Movement Noise Improves Exploration in Deep RL", in 2026 IEEE International Conference on Development and Learning (ICDL). IEEE, 2026, pp. 1-6

AI中文摘要

深度强化学习（RL）中的探索通常实现为时间上不相关的白噪声。然而，最近的研究表明，时间相关的有色噪声可以通过产生更平滑的轨迹和更好的状态空间覆盖来提高探索效率。我们探究受婴儿自发运动启发的动作噪声是否也能改善深度RL中的探索。我们发现婴儿末端执行器速度的功率谱密度遵循有色噪声过程，其谱指数随年龄增长而增加。受这一发育模式的启发，我们引入了一种机制，在RL训练过程中逐步增加探索噪声的时间自相关，与婴儿统计数据相匹配。在多个RL环境中的实验表明，婴儿启发的噪声产生结构化的探索行为，并且与传统的探索策略相比可以提高学习效率。这些发现表明，人类运动和认知发展可以为人工智能体的学习机制设计提供有用的指导。我们的代码可在 https://github.com/trieschlab/baby-noise-rl 获取。

英文摘要

Exploration in deep reinforcement learning (RL) is commonly implemented as temporally uncorrelated white noise. However, recent works show that temporally correlated colored noise can improve exploration efficiency by producing smooth trajectories with better coverage of the state space. We inquire whether action noise inspired by infant spontaneous movements can also improve exploration in deep RL. We find that the power spectral densities of babies' end-effector velocities follow a colored noise process where the spectral exponent increases with age. Inspired by this developmental pattern, we introduce a mechanism that progressively increases the temporal auto-correlation of exploration noise during RL training, matching the infant statistics. Experiments across several RL environments show that infant-inspired noise produces structured exploratory behavior and can improve learning efficiency compared to conventional exploration strategies. These findings suggest that human motor and cognitive development can provide useful guidance for designing learning mechanisms in artificial agents. Our code is available at https://github.com/trieschlab/baby-noise-rl.
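The idea of exploration noise whose temporal correlation grows over training can be sketched with a first-order autoregressive (AR(1)) process. Note the substitution: the abstract describes colored noise characterized by a spectral exponent, whereas AR(1) is a simpler stand-in chosen here so the example needs only the standard library. The linear schedule and every name below (`correlated_noise`, `rho_schedule`, `lag1_autocorr`) are hypothetical; the authors' implementation is in their linked repository.

```python
import math
import random
from typing import List


def correlated_noise(n_steps: int, rho: float, seed: int = 0) -> List[float]:
    """Unit-variance AR(1) noise; rho=0 gives white noise, rho near 1 is smooth."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("rho must lie in [0, 1)")
    if n_steps <= 0:
        return []
    rng = random.Random(seed)
    x = rng.gauss(0.0, 1.0)
    out = [x]
    scale = math.sqrt(1.0 - rho * rho)  # keeps the stationary variance at 1
    for _ in range(n_steps - 1):
        x = rho * x + scale * rng.gauss(0.0, 1.0)
        out.append(x)
    return out


def rho_schedule(progress: float, rho_start: float = 0.0, rho_end: float = 0.9) -> float:
    """Assumed linear increase of correlation with training progress in [0, 1]."""
    p = min(max(progress, 0.0), 1.0)
    return rho_start + p * (rho_end - rho_start)


def lag1_autocorr(xs: List[float]) -> float:
    """Sample lag-1 autocorrelation, used here only to check the generator."""
    m = sum(xs) / len(xs)
    num = sum((a - m) * (b - m) for a, b in zip(xs, xs[1:]))
    den = sum((a - m) ** 2 for a in xs)
    return num / den


early = correlated_noise(5000, rho_schedule(0.0))  # start of training: white
late = correlated_noise(5000, rho_schedule(1.0))   # end of training: correlated
print(lag1_autocorr(early) < lag1_autocorr(late))  # True
```

In an RL loop, one such sequence per action dimension would be added to the policy's mean action, with `progress` set to the fraction of training completed.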

2509.26633 2026-06-17 cs.RO cs.AI cs.LG cs.SY eess.SY 版本更新

OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction

OmniRetarget：面向人形全身运动操控与场景交互的交互保持数据生成

Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, C. Karen Liu, Rocky Duan, Guanya Shi

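AI summary: Proposes OmniRetarget, an engine that uses an interaction mesh to explicitly model and preserve the spatial and contact relationships among the agent, terrain, and objects when retargeting human motion to robots, generating high-quality trajectories for training RL policies that perform long-horizon parkour and manipulation skills.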

Comments Project website: https://omniretarget.github.io

AI中文摘要

教授人形机器人复杂技能的主流范式是将人类运动重定向为运动学参考，以训练强化学习（RL）策略。然而，现有的重定向流程常常难以应对人与机器人之间的显著具身差异，产生物理上不可信的伪影，如脚滑和穿透。更重要的是，常见的重定向方法忽略了对于表达性运动及运动操控至关重要的丰富的人-物和人-环境交互。为解决这一问题，我们引入了OmniRetarget，一种基于交互网格的交互保持数据生成引擎，该网格显式建模并保持智能体、地形和操作对象之间的关键空间与接触关系。通过最小化人体与机器人网格之间的拉普拉斯变形同时施加运动学约束，OmniRetarget生成运动学上可行的轨迹。此外，保持任务相关的交互使得从单一示范到不同机器人本体、地形和物体配置的高效数据增强成为可能。我们通过将来自OMOMO、LAFAN1和我们内部MoCap数据集的运动进行重定向，全面评估了OmniRetarget，生成了超过8小时的轨迹，这些轨迹在运动学约束满足和接触保持方面优于广泛使用的基线。这种高质量数据使得本体感觉RL策略能够在Unitree G1人形机器人上成功执行长达30秒的长时间跑酷和运动操控技能，且仅使用5个奖励项和所有任务共享的简单域随机化进行训练，无需任何学习课程。

英文摘要

A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8 hours of trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.

2510.19528 2026-06-17 stat.ML cs.LG math.ST stat.TH 版本更新

Learning Upper Lower Value Envelopes to Shape Online RL: A Principled Approach

学习上下值包络以塑造在线强化学习：一种原则性方法

Sebastian Reboul, Hélène Halconruy

AI总结提出一种两阶段框架，利用离线数据学习值函数的上下界，并将其融入在线算法，通过解耦上下界实现更灵活紧致的近似，理论分析给出高概率遗憾界，实验表明显著降低遗憾。

Comments 35 pages, 5 figures

AI中文摘要

我们研究了利用离线数据加速在线强化学习这一基本问题——该方向潜力巨大但理论基础有限。我们的研究聚焦于如何在此背景下\emph{学习}和\emph{应用}值包络。为此，我们引入了一个原则性的两阶段框架：第一阶段使用离线数据推导值函数的上下界，第二阶段将这些学习到的界融入在线算法。我们的方法通过解耦上下界扩展了先前工作，实现了更灵活和紧致的近似。与依赖固定塑形函数的方法不同，我们的包络是数据驱动的，并明确建模为随机变量，通过过滤论证确保各阶段的独立性。分析建立了由两个可解释量决定的高概率遗憾界，从而为离线预训练和在线微调之间提供了形式化的桥梁。在表格型MDP上的实验结果表明，与UCBVI和先前方法相比，我们的方法显著降低了遗憾，同时与相关方法保持竞争力。

英文摘要

We investigate the fundamental problem of leveraging offline data to accelerate online reinforcement learning - a direction with strong potential but limited theoretical grounding. Our study centers on how to \emph{learn} and \emph{apply} value envelopes within this context. To this end, we introduce a principled two-stage framework: the first stage uses offline data to derive upper and lower bounds on value functions, while the second incorporates these learned bounds into online algorithms. Our method extends prior work by decoupling the upper and lower bounds, enabling more flexible and tighter approximations. In contrast to approaches that rely on fixed shaping functions, our envelopes are data-driven and explicitly modeled as random variables, with a filtration argument ensuring independence across phases. The analysis establishes high-probability regret bounds determined by two interpretable quantities, thereby providing a formal bridge between offline pre-training and online fine-tuning. Empirical results on tabular MDPs demonstrate substantial regret reductions compared with both UCBVI and prior methods while remaining competitive with related approaches.

2601.22184 2026-06-17 cs.GT cs.LG cs.MA 版本更新

Tacit Coordination of Large Language Models

大型语言模型的隐性协调

Ido Aharon, Emanuele La Malfa, Michael Wooldridge, Sarit Kraus

AI总结研究大型语言模型在多智能体无通信协调中的焦点涌现能力，通过博弈和搜救任务评估，发现模型在多数场景匹配或超越人类，但在数值常识和文化显著性任务中失败，并提出无学习策略改善协调。

Comments Code: https://github.com/EmanueleLM/focal-points

AI中文摘要

大型语言模型（LLMs）越来越多地被部署在需要无通信协调的多智能体环境中，从人机交互到安全关键场景。人类通常通过焦点来克服缺乏沟通的问题：这些是自然突出的显著解决方案。我们首次大规模评估了焦点如何在LLMs中涌现、何时以及为何涌现，通过合作与竞争博弈（包括真实的搜救场景）比较其与人类的行为，展示了焦点何时能实现有效协调。在超过20个开源和闭源模型中，我们发现LLMs表现出显著的无通信协调能力，通常匹配或超越人类。然而，相同的模型在需要数值常识或文化细微显著性的任务中始终失败。我们还评估了简单的无学习策略，这些策略显著改善了LLMs之间以及人类与LLMs之间的协调。我们的结果揭示了现代LLMs中惊人的协调能力以及社会局限性，并提供了对其编码的潜在显著性概念的新见解。我们的发现警示，在协调环境中部署LLMs时，不应假设它们共享人类的文化和感知基础。

英文摘要

Large Language Models (LLMs) are increasingly deployed in multi-agent settings that require coordination without communication, from human-AI interaction to safety-critical scenarios. Humans often overcome the absence of communication through focal points: salient solutions that naturally stand out to all participants. We present the first large-scale evaluation of how, when, and why focal points emerge in LLMs, comparing their behaviour with humans across cooperative and competitive games, including realistic search and rescue scenarios, demonstrating when focal points enable effective coordination. Across more than 20 open- and closed-source models, we find that LLMs exhibit a remarkable ability to coordinate without communication, often matching or outperforming humans. However, the same models consistently fail in tasks requiring numerical common sense or culturally nuanced notions of salience. We additionally evaluate simple learning-free strategies that substantially improve coordination both among LLMs and between humans and LLMs. Our results reveal striking coordination capabilities, as well as social limitations in modern LLMs, and offer new insight into the latent notions of salience encoded within them. Our findings caution against assuming that LLMs share humans' cultural and perceptual substrate when deployed in coordination settings.

2602.10635 2026-06-17 cs.AI cs.LG 版本更新

OmniSapiens: A Foundation Model for Social Behavior Processing via Heterogeneity-Aware Relative Policy Optimization

OmniSapiens: 一种通过异质性感知相对策略优化进行社会行为处理的基础模型

Keane Ong, Sabri Boughorbel, Luwei Xiao, Chanakya Ekbote, Wei Dai, Ao Qu, Jingyao Wu, Rui Mao, Ehsan Hoque, Erik Cambria, Gianmarco Mengaldo, Paul Pu Liang

发表机构 * Massachusetts Institute of Technology（麻省理工学院）； National University of Singapore（新加坡国立大学）； Nanyang Technological University（南洋理工大学）； Prince Sattam bin Abdulaziz University（萨塔姆·本·阿卜杜勒阿齐兹王子大学）； University of Rochester（罗切斯特大学）

AI总结针对行为数据异质性导致的训练不平衡问题，提出Omnisapiens-7B 2.0基础模型，采用异质性感知相对策略优化（HARPO）方法，在10个行为任务和5个零样本泛化基准上取得最佳性能。

Comments Accepted to ICML 2026 Main Conference

AI中文摘要

社交智能AI系统必须能够推理多样的人类行为任务，并泛化到新情境。然而，AI尚未达到这种社交智能水平。现有模型仍然受到行为数据训练引起的学习动态不平衡的根本限制。即，行为数据本质上是异质的，包含多种模态和预测目标，通常在不同样本间产生不均匀的训练信号。为了解决这个问题，我们开发了Omnisapiens-7B 2.0，一个专门处理异质行为数据学习的社会行为处理基础模型。这是通过异质性感知相对策略优化（HARPO）实现的，这是一种新颖的推理强化学习方法，明确地重新平衡样本间的学习信号。核心思想是近似策略更新的贡献信号，利用它们进行几何中心化和惯性平滑的优势调节。结果表明，Omnisapiens-7B 2.0在10个不同的行为任务上取得了最佳且最一致的性能，同时在所有五个保留的零样本泛化基准上也取得了最佳性能，分别提升了高达+12.02%和+9.37%。此外，Omnisapiens-7B 2.0展示了更一致和可解释的推理轨迹，支持可靠的现实世界行为应用。我们的模型和代码可在https://github.com/MIT-MI/human_behavior_atlas找到。

英文摘要

Socially intelligent AI systems must reason across diverse human behavioral tasks and generalize to new social contexts. However, behavioral data is inherently heterogeneous, comprising diverse modalities and prediction targets that produce uneven training signals across samples, creating imbalanced learning dynamics that challenge existing AI models. To address this, we develop Omnisapiens-7B 2.0, a foundation model for social behavior processing that explicitly addresses learning from heterogeneous behavioral data. This is enabled through Heterogeneity-Aware Relative Policy Optimization, a new RL method that rebalances learning signals across samples by approximating each sample's contribution to the policy update and using these estimates to drive geometrically centered, inertially smoothed advantage modulation for stable training. Omnisapiens-7B 2.0 achieves the best and most consistent performance across 10 behavioral tasks, while also attaining the best performance on all five held-out benchmarks, with gains of up to +12.02% and +9.37% respectively. Furthermore, it demonstrates more consistent and interpretable reasoning traces, supporting reliable real-world behavioral applications. Our model is available at https://github.com/MIT-MI/human_behavior_atlas.

2603.27049 2026-06-17 stat.ML cs.LG 版本更新

Overcoming the Incentive Collapse Paradox

克服激励崩溃悖论

Qichuan Yin, Ziwei Su, Shuangning Li

AI总结针对AI辅助任务中激励崩溃问题，提出哨兵审计支付机制，在有限成本下维持正人力努力，并构建激励感知的主动统计推断框架优化审计率与采样分配。

Comments Accepted to ICML 2026

AI中文摘要

AI辅助任务委派日益普遍，但此类系统中的人力成本高昂且通常不可观测。Bastani和Cachon (2025); Sambasivan等人 (2021) 的最新研究表明，基于准确度的支付方案存在激励崩溃：随着AI准确度提升，维持正向人力努力需要无界支付。我们在预算约束的委托-代理框架中研究这一现象，其中战略型人类代理的输出准确度取决于不可观测的努力。我们的第一个贡献是一般性不可能结果，表明激励崩溃不仅是简单线性支付的局限，而是任何仅基于观测任务结果的支付规则都会出现。为克服这一障碍，我们提出一种哨兵审计支付机制，该机制以有限成本强制执行严格为正且可控的人力努力水平，且与AI准确度无关。在此激励鲁棒的基础上，我们构建了一个激励感知的主动统计推断框架，联合优化(i)审计率和(ii)跨不同难度任务的主动采样与预算分配，以在单一预算下最小化最终统计损失。实验表明，相对于标准主动学习和仅审计基线，该方法改善了成本-误差权衡。

英文摘要

AI-assisted task delegation is increasingly common, yet human effort in such systems is costly and typically unobserved. Recent work (Bastani and Cachon, 2025; Sambasivan et al., 2021) shows that accuracy-based payment schemes suffer from incentive collapse: as AI accuracy improves, sustaining positive human effort requires unbounded payments. We study this phenomenon in a budget-constrained principal-agent framework with strategic human agents whose output accuracy depends on unobserved effort. Our first contribution is a general impossibility result showing that incentive collapse is not merely a limitation of simple linear payments, but arises for any payment rule based only on observed task accuracy. To overcome this barrier, we propose a sentinel-auditing payment mechanism that enforces a strictly positive and controllable level of human effort at finite cost, independent of AI accuracy. Building on this incentive-robust foundation, we develop an incentive-aware active statistical inference framework that jointly optimizes (i) the auditing rate and (ii) active sampling and budget allocation across tasks of varying difficulty to minimize the final statistical loss under a single budget. Experiments demonstrate improved cost-error tradeoffs relative to standard active learning and auditing-only baselines.

2606.06227 2026-06-17 physics.flu-dyn cs.LG 版本更新

Reward hacking in physical reinforcement learning revealed by turbulent drag reduction

湍流减阻揭示的物理强化学习中的奖励黑客

Giorgio Maria Cavallazzi, Miguel Pérez-Cuadrado, Alfredo Pinelli

发表机构 * School of Science and Technology, Department of Engineering, City St. George’s, University of London（伦敦大学城市圣乔治学院科学与技术学院工程系）

AI总结针对壁湍流减阻控制中强化学习奖励与设计目标偏离的问题，提出可微投影、循环策略和真实壁面功率奖励的修正方案，在诚实核算下实现17%的保守减阻。

AI中文摘要

强化学习智能体最大化其奖励，这可能偏离其设计者预期的结果。在物理控制中，奖励很少弥合这一差距，而壁湍流中的减阻使其具体化。质量守恒投影耦合了智能体的输出，并抹去了策略梯度所需的每个智能体信用；无记忆策略无法解决其作用的缓慢近壁循环；压力梯度奖励通过壁面泵送功率来支付名义上的减阻。两个退化控制器实现了大的减阻，而总耗散增加，因此报告的数字可能掩盖了更耗能的流动。我们将每个缺陷追溯到其原因并加以修复：恢复信用的可微投影、具有加宽感知模板的循环策略以及基于真实壁面功率的奖励。修正后的控制器在封闭能量预算内作用于流动，在诚实核算下实现了保守的17%减阻。

英文摘要

A reinforcement-learning agent maximises its reward, which can diverge from the outcome its designer intended. In physical control the reward rarely closes that gap, and drag reduction in wall turbulence makes it concrete. A mass-conservation projection couples agents' outputs and erases the per-agent credit the policy gradient needs; a memoryless policy cannot resolve the slow near-wall cycle it acts on; and a pressure-gradient reward pays for nominal drag reduction by pumping power through the wall. Two degenerate controllers achieve large drag reductions while total dissipation rises, so the reported figure can mask a more wasteful flow. We trace each fault to its cause and fix it: a differentiable projection that restores credit, a recurrent policy with a widened sensing stencil, and a reward scored on the true wall power. The corrected controller acts on the flow within a closed energy budget, earning a conservative $17\%$ under honest accounting.

2606.17106 2026-06-17 cs.LG cs.CY 新提交

Informative Missingness to Generate Irregular Clinical Time Series

信息性缺失生成不规则临床时间序列

Hadi Mehdizavareh, Gabriele Santangelo, Giovanna Nicora, Simon Lebech Cichosz, Arianna Dagliati, Arijit Khan, Riccardo Bellazzi

发表机构 * Aalborg University（奥尔堡大学）； University of Pavia（帕维亚大学）； Bowling Green State University（博林格林州立大学）

AI总结提出基于扩散的临床时间序列生成方法，联合建模实验室值和观察模式，在DACMI基准上验证，能捕获生理与检测行为间的临床依赖。

AI中文摘要

电子健康记录中的实验室检测是不规则收集的，检测指令的缺失可能与测量值本身一样具有信息性。这种缺失反映了临床医生的决策和患者生理状态，因此直接对其建模而非将其视为预处理伪影非常重要。本文提出一种基于扩散的方法，用于生成临床时间序列，该方法使用源自MIMIC-III的公共数据填补缺失数据挑战（DACMI）基准，联合建模实验室值及其观察模式。为了保持真实的采样，我们将图表时间对齐为4小时间隔，并将入院记录分割为7天窗口，生成每个实验室值对应一个观察指示符的轨迹。应用标准变换和归一化以稳定训练。我们的方法扩展了TimeDiff框架，通过互补的扩散目标学习连续的实验室值和离散的缺失模式。实验表明，生成的数据在单个实验室分布和联合值-缺失嵌入方面与真实患者轨迹高度匹配，证明扩散模型能够捕获在类似MNAR（非随机缺失）缺失下患者生理与临床医生检测行为之间的临床有意义依赖。这些初步结果表明，我们的模型可以作为开发临床基础模型的初始组件。通过生成保留关键生理-缺失关系的合成先验，本工作激励了后续训练能够利用信息性缺失的先验数据拟合网络，我们将在扩展工作中对此进行研究。

英文摘要

Laboratory tests in electronic health records are collected irregularly, and the absence of a test order can be as informative as the measurement itself. Such missingness reflects clinicians' decisions and patient physiology, making it important to model it directly rather than treat it as a preprocessing artifact. Here we present a diffusion-based approach for generating clinical time series that jointly models laboratory values and their observation patterns using the public Data Analytics Challenge on Missing Data Imputation (DACMI) benchmark derived from MIMIC-III. To preserve realistic sampling, we align chart times into 4-hour intervals and segment admissions into 7-day windows, producing trajectories that pair each lab value with a corresponding observation indicator. Standard transformations and normalization are applied to stabilize training. Our method extends the TimeDiff framework to learn continuous lab values and discrete missingness patterns through complementary diffusion objectives. Experiments show that the generated data closely match real patient trajectories across individual lab distributions and joint value-missingness embeddings, demonstrating that diffusion models can capture clinically meaningful dependencies between patient physiology and clinicians' testing behavior under MNAR-like (missing-not-at-random) missingness. These preliminary results indicate that our model can serve as an initial component toward developing clinical foundation models. By producing synthetic priors that preserve key physiology-missingness relationships, this work motivates the subsequent training of Prior-Data Fitted Networks capable of leveraging informative missingness, which we will investigate in the extended work.

2606.17192 2026-06-17 cs.LG 新提交

Constrained Diffusion Models with Primal-Dual Inference

约束扩散模型与原始-对偶推理

Samar Hadou, Yigit Berkay Uslu, Alejandro Ribeiro

发表机构 * Department of Electrical and Systems Engineering, University of Pennsylvania（宾夕法尼亚大学电气与系统工程系）

AI总结提出原始-对偶推理（PDI）方法，通过联合推断最优原始分布和其对偶变量，在扩散模型反向过程中交替去噪与对偶上升，实现平均约束下的熵正则化优化问题采样。

AI中文摘要

本文开发了具有原始-对偶推理（PDI）的约束扩散模型，用于从具有平均约束的熵正则化优化问题的最优分布中采样。我们在拉格朗日对偶域中形式化约束采样，其中最优分布采用由最优对偶变量索引的吉布斯分布形式。PDI不是先估计该对偶乘子并在整个生成过程中冻结它，而是联合推断最优原始分布及其参数化对偶变量。每个反向扩散步骤使用与当前乘子相关的得分场去噪，然后通过使用去噪样本的估计约束违反进行对偶上升来更新乘子。为了实现这种条件得分场，我们在推理过程中遇到的对偶变量所诱导的吉布斯分布族上训练一个单一的条件得分网络。我们证明了沿推理轨迹生成的对偶变量的时间平均收敛到对偶最优的邻域，并通过依赖于调度的时间稳定性因子限定了残余对偶失配对终端分布的影响。我们在高斯混合约束采样、无线资源分配和投资组合管理上评估了PDI。

英文摘要

This paper develops constrained diffusion models with primal-dual inference (PDI) to sample from optimal distributions of entropy-regularized optimization problems with \emph{average} constraints. We formalize constrained sampling in the Lagrangian dual domain, where the optimal distribution takes the form of a Gibbs distribution indexed by the optimal dual variable. Rather than estimating this dual multiplier before sampling and freezing it throughout generation, PDI jointly infers the optimal primal distribution and its parametrizing dual variable. Each reverse diffusion step denoises using the score field associated with the current multiplier and then updates the multiplier through dual ascent using the estimated constraint violation of the denoised samples. To enable this conditional score field, we train a single dual-conditioned score network over the family of Gibbs distributions induced by the dual variables encountered during inference. We prove that the time average of the dual variables generated along the inference trajectory converges to a neighborhood of the dual optimum and bound the effect of residual dual mismatch on the terminal distribution through schedule-dependent stability factors. We evaluate PDI on constrained sampling from a mixture of Gaussians, wireless resource allocation, and portfolio management.
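The Gibbs form and the dual-ascent update described in this abstract can be sketched on a toy problem. The example below assumes a small discrete state space, a uniform reference distribution `q`, and a single average constraint E_p[g] <= 0, and it uses exact expectations rather than denoised samples. It involves no diffusion model or score network, so it shows only the primal-dual idea and is not the PDI algorithm itself. All names and numbers are illustrative assumptions.

```python
import numpy as np


def gibbs(q, g, lam):
    """Gibbs distribution p_lam(x) proportional to q(x) * exp(-lam * g(x))."""
    logits = np.log(q) - lam * g
    logits -= logits.max()  # shift for numerical stability
    p = np.exp(logits)
    return p / p.sum()


def dual_ascent(q, g, step=0.5, n_iters=500):
    """Projected dual ascent on the multiplier of the constraint E_p[g] <= 0."""
    lam = 0.0
    for _ in range(n_iters):
        p = gibbs(q, g, lam)
        violation = float(p @ g)  # E_p[g]; positive means the constraint is violated
        lam = max(0.0, lam + step * violation)  # keep the multiplier nonnegative
    return lam, gibbs(q, g, lam)


# Toy problem: states x in {0, 1, 2, 3}, uniform reference, constraint E[x] <= 1.
x = np.arange(4, dtype=float)
q = np.full(4, 0.25)
g = x - 1.0
lam, p = dual_ascent(q, g)
print(lam, p, p @ x)
```

Under the uniform reference the mean of `x` is 1.5, so the constraint is initially violated and the multiplier grows until the tilted distribution meets the constraint with equality.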

2606.17409 2026-06-17 cs.LG cs.AI 新提交

Discrete Autoregressive Transformer for Generative Mechanism Synthesis

离散自回归变压器用于生成式机构综合

Anar Nurizada, Anurag Purwar

发表机构 * Computer-Aided Design and Innovation Lab, Department of Mechanical Engineering, Stony Brook University（石溪大学机械工程系计算机辅助设计与创新实验室）

AI总结提出离散自回归Transformer，将平面路径综合转化为条件序列建模，通过VAE潜在变量和机构类型令牌生成关节坐标，实现多样准确机构设计。

AI中文摘要

平面路径综合需要机构的连杆曲线（coupler curve）匹配预定轨迹；从曲线到连杆机构的映射本质上是一对多的，跨越四杆、六杆和八杆拓扑。我们通过基于仿真的评估，在一个包含超过一百万个机构的精选语料库上解决这个设计问题，报告了正向运动学和几何对齐后的Chamfer距离和动态时间规整。我们将综合问题表述为条件自回归序列建模：关节坐标被均匀量化成令牌，并由一个仅解码器（decoder-only）Transformer生成，该Transformer以目标曲线的变分自编码器（VAE）潜在变量和一个显式的机构类型令牌为条件。训练结合了令牌交叉熵和一个高斯平滑的bin辅助损失，该损失尊重bin之间的序数结构。在推理时，一个有界潜在噪声调度在每个噪声水平下解码所有机构类型；我们根据几何误差保留前五个候选，从而在没有数据集查找的情况下产生多样且准确的机构族。在保留测试中，总体平均Chamfer距离为$0.0132$，平均动态时间规整为$0.153$；一个潜在$k$-最近邻基线，在VAE空间中基于训练集邻居潜在变量进行条件化，使用相同的解码器实现了匹配拓扑的平均Chamfer距离$0.0071$和平均动态时间规整$0.117$。

英文摘要

Planar path synthesis requires mechanisms whose coupler curves match a prescribed trajectory; the mapping from curve to linkage is inherently one-to-many across four-, six-, and eight-bar topologies. We address this design problem with simulation-grounded evaluation on a curated corpus of over one million mechanisms, reporting Chamfer distance and dynamic time warping after forward kinematics and geometric alignment. We formulate synthesis as conditional autoregressive sequence modeling: joint coordinates are uniformly quantized to tokens and generated by a decoder-only transformer with a variational-autoencoder (VAE) latent of the target curve and an explicit mechanism-type token. Training combines token cross-entropy with a Gaussian-smoothed bin auxiliary loss that respects ordinal structure among bins. At inference, a bounded latent-noise schedule decodes all mechanism types at each noise level; we retain the top five candidates by geometric error, yielding diverse accurate families without dataset lookup. On held-out tests, aggregate mean Chamfer distance is $0.0132$ and mean dynamic time warping is $0.153$; a latent $k$-nearest-neighbor baseline that conditions on training-set neighbor latents in VAE space achieves matched-topology mean Chamfer distance $0.0071$ and mean dynamic time warping $0.117$ using the same decoder.
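The uniform quantization of joint coordinates and the Gaussian-smoothed bin target can be sketched as follows. The coordinate range `[-1, 1]`, the 256 bins, and the smoothing width `sigma` are hypothetical choices for illustration; the abstract does not state the values the authors use, and the function names are assumptions.

```python
import numpy as np


def quantize(x, lo=-1.0, hi=1.0, n_bins=256):
    """Map coordinates in [lo, hi] to integer tokens 0..n_bins-1 with equal-width bins."""
    x = np.clip(np.asarray(x, dtype=float), lo, hi)
    idx = np.floor((x - lo) / (hi - lo) * n_bins).astype(int)
    return np.minimum(idx, n_bins - 1)  # x == hi falls into the last bin


def dequantize(tokens, lo=-1.0, hi=1.0, n_bins=256):
    """Map tokens back to the centers of their bins."""
    return lo + (np.asarray(tokens) + 0.5) * (hi - lo) / n_bins


def smoothed_target(token, n_bins=256, sigma=1.5):
    """Soft label over bins: a normalized Gaussian centered on the true bin index."""
    bins = np.arange(n_bins)
    w = np.exp(-0.5 * ((bins - token) / sigma) ** 2)
    return w / w.sum()


coords = np.array([-1.0, -0.3, 0.0, 0.999, 1.0])
tokens = quantize(coords)
print(tokens, dequantize(tokens))
print(smoothed_target(tokens[1])[tokens[1] - 2 : tokens[1] + 3])
```

A cross-entropy against `smoothed_target` penalizes a prediction one bin away less than one many bins away, which is one way to respect the ordinal structure that a plain one-hot target ignores.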

2606.17465 2026-06-17 cs.LG cs.SY eess.SY 新提交

Perron--Frobenius Operator Matching for Generative Modeling

Perron--Frobenius算子匹配用于生成建模

Shiqi Zhang, Wuwei Wu, Jaemin Oh, Jie Chen, Xiaoning Qian

发表机构 * Texas A&M University（德克萨斯农工大学）； City University of Hong Kong（香港城市大学）

AI总结提出Perron-Frobenius算子匹配（PFOM）生成框架，通过积分PF算子匹配密度演化，统一流、扩散和跳跃模型，并证明KL散度在Bregman散度中唯一保持密度级与样本条件目标等价，开发Nesterov加速训练和采样方法。

2606.18022 2026-06-17 cs.LG 新提交

用于鲁棒扩散策略的Kolmogorov回归

Lekan Molu

发表机构 * Bala Cynwyd, PA 19004（巴拉辛威德, PA 19004）

AI总结提出后向Kolmogorov方程将扩散策略提升至Cameron-Martin空间，用确定性边界值PDE问题替代随机分数匹配，通过精度加权损失和残差诊断实现收敛保证、轨迹规则化和无奖励故障检测。

AI中文摘要

有限维扩散策略由于离散化伪影导致时间漂移，降低了长期性能（当部署在物理系统上时）。我们引入了一个后向Kolmogorov方程，将扩散策略提升至Cameron-Martin空间——希尔伯特空间的一个子集。本质上，用确定性边界值PDE问题替代随机分数匹配。我们的核心创新基于高斯测度理论，其中扩散噪声协方差算子由有色噪声分布实现，该分布规定了推理时模型样本的正则性概念。我们使用推导出的精度加权Cameron-Martin损失训练扩散模型，并引入Kolmogorov残差作为推理时的PDE诊断。这些替换产生了：(i) 收敛保证，其中界的常数取决于核的有效秩而非动作维度，(ii) 通过谱加权改进轨迹规则性，以及(iii) 无需奖励信号的确定性故障检测器。在两个应用领域的验证显示了显著改进：在PushT操作基准测试中，Cameron-Martin损失在最大回合奖励上实现了17%的提升（0.95对比0.78的MSE），并通过引入的残差幅度在推理期间减少了67.6%的步间漂移。类似地，在具有恒定在制品（CONWIP）流量控制的6站生产线上，我们实现了比经典LSTM基线低28.4%的RMSE；高饥饿事件召回率（测试周期中为1.0），以及有效的瓶颈识别（测试集中Precision@1=1.0，信噪比13倍）。然后，我们使用Hamilton-Jacobi可达性理论认证调度策略，与100次模拟运行中的无控制调度相比，死锁事件减少了96%（防止了351个事件）。

英文摘要

Finite-dimensional (FD) diffusion policies exhibit temporal drift owing to discretization artifacts that degrade long-horizon performance when deployed on physical systems. We introduce a backward Kolmogorov equation that lifts diffusion policies to a Cameron-Martin space, a subset of the Hilbert space, essentially replacing stochastic score matching with a deterministic boundary-value PDE problem. Our core innovation builds on Gaussian measure theory, in which the diffusion noise covariance operator is realized from a colored noise distribution that prescribes a notion of regularity on samples from the model at inference time. We train the diffusion model with a derived precision-weighted Cameron-Martin loss and introduce a Kolmogorov residual as a PDE diagnostic during inference. These substitutions yield (i) convergence guarantees where the bound's constants depend on the effective rank of the kernel rather than action dimension, (ii) improved trajectory regularity via spectral weighting, and (iii) a deterministic failure detector that needs no reward signals. Validation across two application domains demonstrates substantial improvements: on the PushT manipulation benchmark, the Cameron-Martin loss achieves a 17% improvement in maximum episode reward (0.95 vs. 0.78 for MSE) and a 67.6% reduction in inter-step drift during inference via the introduced residual magnitude. Similarly, on a 6-station manufacturing line with constant work-in-process (CONWIP) flow control, we achieve 28.4% lower RMSE than classical LSTM baselines, high starvation-event recall (1.0 in test cycles), and effective bottleneck identification (Precision@1 = 1.0 in test set, 13x signal-to-noise ratio). We then certify the dispatch policies with Hamilton-Jacobi reachability theory, which reduces deadlock events by 96% compared to uncontrolled dispatch over 100 simulated runs (351 events prevented).

2606.17127 2026-06-17 q-bio.QM cs.AI cs.LG 交叉投稿

Agentic Discovery of Non-Canonical Antimicrobial Peptides with AMPGAN v3

AMPGAN v3 的非经典抗菌肽智能发现

Jay Jung, Xiaohan Zhang, Shenghan Song, Mahmoud Sayedahmed, Chijian Xiang, Yunong Xu, Ahmed AbdelKhalek, Severin T. Schneebeli, Matthew J. Wargo, Jianing Li, Safwan Wshah

发表机构 * University of Vermont（佛蒙特大学）； Larner College of Medicine, University of Vermont（佛蒙特大学拉纳医学院）； Purdue University（普渡大学）； Department of Comparative Pathobiology（比较病理生物学系）； Department of Horticulture and Landscape Architecture（园艺与景观建筑系）； Department of Industrial and Molecular Pharmaceutics（工业与分子药剂学系）

AI总结提出 AMPGAN v3，一种多目标条件 GAN，扩展生成词汇至 D-氨基酸和末端修饰，通过双判别器提升稳定性，体外验证显示对革兰氏阳性菌有活性，并引入 PepCraft 多智能体框架用于端到端发现。

Comments Presented at the GenBio Workshop, ICML 2026

AI中文摘要

抗菌药物耐药性每年导致超过一百万人死亡。抗菌肽（AMP）是一种有前景的解决方案，但生成式 AMP 模型尚未准备好设计含有非天然氨基酸和/或化学修饰的肽，而这些对于实际肽药物至关重要。我们提出了 AMPGAN v3，一种多目标条件 GAN，它将生成词汇扩展到 D-氨基酸和 N/C 末端修饰（如酰胺化）。通过将对抗性和活性感知监督分离到两个专门的判别器中，AMPGAN v3 显著提高了训练稳定性，并在外部分类器上优于先前的生成式 AMP 模型。我们在体外验证了跨越三个结构类别的五个候选物；其中两个对革兰氏阳性菌株表现出活性，最佳候选物对枯草芽孢杆菌的 MIC 达到 8 μg/mL。为了支持下游筛选，我们进一步提出了 PepCraft，一个用于端到端 AMP 发现的多智能体框架，其中规划智能体协调专门的执行器进行生成、过滤和验证。其优先级推荐与我们的体外结果一致。这些贡献使我们能够在小型但真实的规模上研究生成式和智能体 AI 如何在治疗性肽发现中协同作用。代码：https://github.com/marszzibros/AMPGANv3

英文摘要

Antimicrobial resistance causes over a million deaths annually. Antimicrobial peptides (AMPs) are a promising solution, but generative AMP models are not yet ready to design peptides with non-natural amino acids and/or chemical modifications, which are essential for real-world peptide drugs. We present AMPGAN v3, a multi-objective conditional GAN that expands the generative vocabulary to D-amino acids and N/C-terminus modifications such as amidation. By separating adversarial and activity-aware supervision across two specialized discriminators, AMPGAN v3 substantially improves training stability and outperforms prior generative AMP models on external classifiers. We validated five candidates spanning three structural classes in vitro; two showed activity against Gram-positive strains, with the best candidate reaching MIC 8 μg/mL against B. subtilis. To support downstream curation, we further present PepCraft, a multi-agent framework for end-to-end AMP discovery in which a Planning Agent orchestrates specialized executors for generation, filtering, and verification. Its prioritization recommendations align with our in vitro outcomes. Together, these contributions let us examine, on a small but real scale, how generative and agentic AI compose in therapeutic peptide discovery. Code: https://github.com/marszzibros/AMPGANv3

2606.17301 2026-06-17 cs.SD cs.LG 交叉投稿

Turning music identification into a neural forward pass

将音乐识别转化为神经前向传播

Muhammad Taimoor Haseeb, Ahmad Hammoudeh, Gus Xia

发表机构 * Music X Lab（音乐X实验室）； Mohamed Bin Zayed University of Artificial Intelligence（穆罕默德·本·扎耶德人工智能大学）

AI总结提出用生成式Transformer通过单次神经前向传播实现音乐识别，在短音频片段上超越传统声学指纹方法，存储和延迟显著降低。

AI中文摘要

搜索是计算机科学中的基础操作，它将查询映射到集合中的匹配项。通常，它被实现为类似系统2的基于规则的流水线：计算键、探测索引、验证候选。相比之下，人类识别类似于系统1的联想式身份恢复模型，其中即使部分线索也能触发回忆，而无需显式枚举、排序甚至访问离散候选。在这里，我们展示了音乐声音识别这一困难的搜索问题可以通过生成式Transformer在单次神经前向传播中完成。该模型在音频数据集上训练，从短音频片段预测对应的曲目标识符。这种方法超越了最先进的声学指纹识别，对于短音频片段（1秒）的提升最大，证明了该方法不仅可行而且具有优势。此外，它将外部存储减少到基线的0.33%，并将推理延迟改善了2.3倍（p95）。而且，该模型可以拒绝未见曲目的查询，支持开放集操作，同时降低误归因风险。以音乐曲目识别为例，这项工作重新定义了搜索，使其更接近人类联想识别，远离算法数据库查找。

英文摘要

Search, a foundational operation in computer science, maps a query to a matching item in a collection. It is typically implemented as a System-2 like, rule-based pipeline in which a key is computed, an index is probed, and candidates are verified. By contrast, human recognition resembles a System-1 like, associative model of identity recovery, in which even partial cues can trigger a recall without explicitly enumerating, ranking, or even accessing discrete candidates. Here, we show that music sound identification, a difficult search problem, can be performed in a single neural feed-forward pass by a generative transformer. Trained on an audio dataset, the model predicts the corresponding track identifier from a short audio excerpt. This approach surpasses state-of-the-art acoustic fingerprinting, with the largest gains for short audio segments (1 second), demonstrating the method is not only viable but advantageous. Moreover, it reduces external storage to 0.33% of the baseline footprint and improves inference latency by 2.3x (p95). Furthermore, the model can reject queries for unseen tracks, supporting open-set operation while reducing misattribution risk. Using music track identification as an example, this work reframes search, bringing it closer in spirit to human associative recognition and away from algorithmic database lookup.

2606.17584 2026-06-17 cs.CV cs.LG 交叉投稿

Root-Selecting Fixed-Point Inversion for Rectified Flows via Trajectory Straightness

基于轨迹直线度的整流流根选择不动点反演

Semin Kim, Jihwan Yoon, Seunghoon Hong

发表机构 * KAIST（韩国科学技术院）

AI总结提出SelFix方法，通过选择使逆轨迹更直的不动点解，在整流流中实现精确反演，提升图像重建和编辑质量。

AI中文摘要

找到生成给定数据样本的初始噪声（称为反演）是下游应用（如无训练图像编辑）的关键组成部分。现有的不动点反演方法通过将每个反演步骤表述为不动点问题来提高反演精度，但它们缺乏一个原则性的机制来选择实践中可能出现的多个不动点解。我们观察到不同的选择会引发不同的反演轨迹，导致重建和编辑质量的显著变化。对于整流流，我们进一步发现这种变化与轨迹直线度密切相关，这促使我们将直线度作为原则性的选择标准。我们提出SelFix，一种不动点反演方法，它选择诱导更直逆轨迹的不动点解，同时在标准局部假设下保持收敛到精确的反演根。在FLUX.1-dev和PIE-Bench上的实验表明，SelFix改进了不动点反演，实现了比先前反演基线更强的真实图像重建和更好的源保持提示编辑。代码可在 https://github.com/seminkim/selfix 获取。

英文摘要

Finding the initial noise that generates a given data sample, known as inversion, is a key component for downstream applications such as training-free image editing. Existing fixed-point inversion methods improve inversion accuracy by formulating each inversion step as a fixed-point problem, but they lack a principled mechanism for selecting among multiple fixed-point solutions that can arise in practice. We observe that different selections induce different inversion trajectories, leading to substantial variation in reconstruction and editing quality. For rectified flows, we further find that this variation is closely associated with trajectory straightness, motivating straightness as a principled selection criterion. We propose SelFix, a fixed-point inversion method that selects fixed-point solutions inducing straighter inverse trajectories while retaining convergence to an exact inverse root under standard local assumptions. Experiments on FLUX.1-dev and PIE-Bench show that SelFix improves fixed-point inversion, achieving stronger real-image reconstruction and better source-preserving prompt-based editing than prior inversion baselines. The code is available at https://github.com/seminkim/selfix.

2410.10137 2026-06-17 cs.LG math.DG stat.CO stat.ML 版本更新

Variational autoencoders with latent high-dimensional steady geometric flows for dynamics

具有潜在高维稳态几何流的变分自编码器用于动力学

Andrew Gracyk

AI总结提出VAE-DLM方法，在潜在空间中引入稳态几何流，通过物理信息方法求解高维流，增强潜在表示的表达能力，在PDE型数据上降低OOD误差15%-35%。

Journal ref: 23rd International Conference of Numerical Analysis and Applied Mathematics (ICNAAM) 2025

AI中文摘要

我们开发了用于PDE型环境数据的变分自编码器（VAE）的黎曼方法，其中包含正则化几何潜在动力学，称为VAE-DLM（具有动态潜在流形的VAE）。我们重新构建了VAE框架，使得嵌入欧几里得空间中的流形几何（受我们的几何流约束）在编码器和解码器开发的中间潜在空间中被学习。通过定制潜在空间演化的几何流，我们诱导出我们选择的潜在几何性质，这些性质反映在经验性能中。我们通过谨慎选择先验重新表述了传统的证据下界（ELBO）损失。我们开发了一个具有稳态正则化项的线性几何流。该流只需要对一个时间导数进行自动微分，并且可以在中等高维度上以物理信息方法求解，从而允许更具表达力的潜在表示。我们讨论了该流如何被表述为梯度流，并保持熵远离度量奇点。这结合特征值惩罚条件，有助于确保流形在测度上足够大、非退化且具有规范几何，从而有助于鲁棒表示。我们的方法侧重于改进的多层感知器架构，使用tanh激活函数用于流形编码器-解码器。我们在感兴趣的数据集上证明，我们的方法至少与传统VAE表现相当，且通常更好。我们的方法可以超越传统VAE以及采用我们提出架构的VAE，在选定数据集上经常将分布外（OOD）误差降低15%至35%。我们重点展示了我们的方法在环境PDE上的应用，这些PDE的解在后期保持最小变化。我们提供了经验性证明，说明如何通过VAE改进外部动力学的鲁棒学习。

英文摘要

We develop Riemannian approaches to variational autoencoders (VAEs) for PDE-type ambient data with regularizing geometric latent dynamics, which we refer to as VAE-DLM, or VAEs with dynamical latent manifolds. We redevelop the VAE framework such that manifold geometries, subject to our geometric flow and embedded in Euclidean space, are learned in the intermediary latent space developed by encoders and decoders. By tailoring the geometric flow in which the latent space evolves, we induce latent geometric properties of our choosing, which are reflected in empirical performance. We reformulate the traditional evidence lower bound (ELBO) loss with a careful choice of prior. We develop a linear geometric flow with a steady-state regularizing term. This flow requires only automatic differentiation of one time derivative, and can be solved in moderately high dimensions in a physics-informed approach, allowing more expressive latent representations. We discuss how this flow can be formulated as a gradient flow and how it maintains entropy away from metric singularity. This, along with an eigenvalue penalization condition, helps ensure the manifold is sufficiently large in measure, nondegenerate, and of canonical geometry, all of which contribute to a robust representation. Our methods focus on the modified multi-layer perceptron architecture with tanh activations for the manifold encoder-decoder. We demonstrate, on our datasets of interest, that our methods perform at least as well as the traditional VAE, and often better. Our methods can outperform both the traditional VAE and a VAE endowed with our proposed architecture, frequently reducing out-of-distribution (OOD) error by 15% to 35% on select datasets. We highlight our method on ambient PDEs whose solutions maintain minimal variation in late times. We provide empirical evidence for how robust learning of external dynamics with VAEs can be improved.

2507.05169 2026-06-17 cs.LG cs.AI cs.CL cs.CV cs.RO 版本更新

Critique of World Model: A Generative Latent Prediction Architecture for World Modeling

世界模型批判：一种用于世界建模的生成式潜在预测架构

Eric Xing, Mingkai Deng, Jinyu Hou

AI summary: Drawing on the notion of "hypothetical thinking" from psychology, the essay argues that the core goal of a world model is to simulate all actionable possibilities of the real world, and it designs a Generative Latent Prediction (GLP) architecture built on stateful, hierarchical, multi-level, mixed continuous/discrete representations.

Abstract

The world model, an algorithmic simulator of the real-world environment that biological agents experience and act upon, has been an emerging topic in recent years due to the rising need to develop virtual agents with artificial (general) intelligence. There has been much discussion on what a world model really is, how to build it, how to use it, and how to evaluate it. In this essay, starting from the imagination in the famed sci-fi classic Dune, and drawing inspiration from the concept of "hypothetical thinking" in the psychology literature, we argue that the primary goal of a world model is simulating all actionable possibilities of the real world for purposeful reasoning and acting. We examine the key design dimensions of world modeling: data, representation, architecture, learning objective, and usage, surveying existing approaches and analyzing their tradeoffs. Building on this examination, we propose a new Generative Latent Prediction (GLP) architecture for a general-purpose world model, based on stateful, hierarchical, multi-level, and mixed continuous/discrete representations, and a generative and self-supervised learning framework, with an outlook of a Physical, Agentic, and Nested (PAN) AGI system enabled by such a model.

2601.22495 2026-06-17 cs.LG Updated version

Gradual Fine-Tuning for Flow Matching Models

Gudrun Thorkelsdottir, Arindam Banerjee

AI summary: Proposes Gradual Fine-Tuning (GFT), a framework that uses an annealing schedule to fine-tune flow generative models from target-distribution samples, with a theoretical guarantee of approaching the true target; experiments show better stability, efficiency, and diversity than existing methods.

Comments: Preprint. Added methodology and experimental sections

Abstract

Fine-tuning flow matching models is a central challenge in settings with limited data, evolving distributions, or computational constraints. While recent work has produced significant advances, particularly in the area of reward-based fine-tuning, current methods fail to demonstrate both theoretical correctness and strong empirical results in terms of stability, efficiency, and diversity preservation. In this work, we propose Gradual Fine-Tuning (GFT), a simple yet principled annealing-based framework for fine-tuning flow generative models when only samples from the target distribution are available. For stochastic flows, GFT defines a temperature-controlled sequence of intermediate objectives that smoothly interpolate between the pretrained and target drifts, provably approaching the true target as the temperature approaches zero. We analytically demonstrate that sample generation after GFT can be made substantially more efficient with the use of arbitrary (e.g., optimal transport) couplings, as well as by utilizing few-step inference methods. Empirically, GFT significantly improves convergence stability, while maintaining or improving generation quality, training speed, and generation diversity compared to other fine-tuning methods. Our results position GFT as a simple yet theoretically grounded and practically effective alternative for scalable adaptation of flow matching models under distribution shift.

2602.11590 2026-06-17 cs.LG Updated version

Learn from Your Mistakes: Self-Correcting Masked Diffusion Models

Yair Schiff, Omer Belhasin, Roy Uziel, Guanghan Wang, Marianne Arriola, Gilad Turok, Ran Zilberstein, Michael Elad, Volodymyr Kuleshov

Affiliations: Cornell; NVIDIA

AI summary: Proposes the ProSeCo framework, which trains a model to perform both unmasking and error correction and iteratively revises already decoded tokens during generation, improving sample quality and enabling faster sampling.

Comments: Code to reproduce our experiments is available here: https://github.com/kuleshov-group/proseco

Abstract

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models, enabling parallel token generation while achieving competitive performance. Despite these advantages, MDMs face a fundamental limitation: once tokens are unmasked, they remain fixed, leading to error accumulation and ultimately degrading sample quality. We address this by proposing a framework that trains a model to perform both unmasking and correction. By reusing outputs from the MDM denoising network as inputs for corrector training, we train a model to recover from potential mistakes. During generation, we apply additional corrective refinement steps between unmasking steps in order to change decoded tokens and improve outputs. We name our training and sampling method Progressive Self-Correction (ProSeCo) for its unique ability to iteratively refine an entire sequence, including already generated tokens. We conduct extensive experimental validation across multiple conditional and unconditional tasks, demonstrating that ProSeCo yields better quality-efficiency trade-offs (up to ~4x faster sampling) and enables inference-time compute scaling to further increase sample quality beyond standard MDMs (up to ~1.2x improvement on benchmarks).

2604.24357 2026-06-17 cs.LG cs.AI Updated version

DPRM: A Plug-in Doob $h$-transform-induced Token-Ordering Module for Diffusion Language Models

Dake Bu, Wei Huang, Andi Han, Hau-San Wong, Qingfu Zhang, Taiji Suzuki, Atsushi Nitanda

AI summary: Proposes the DPRM module, which uses online estimates to shift gradually from confidence-driven ordering to process-reward-guided ordering, improving the token-ordering policy of diffusion language models, with gains reported across nine host settings.

Abstract

Diffusion language models generate without a fixed left-to-right order, leaving token ordering as a central algorithmic choice. Existing systems mainly use random masking or confidence-driven ordering, which respectively suffer from train-test mismatch and myopic exploration. We introduce DPRM (Doob $h$-transform Process Reward Model), a plug-in token-ordering module that keeps the host architecture, denoising objective, and supervision unchanged, and modifies only the ordering policy. DPRM starts from confidence-driven ordering and gradually shifts to process-reward-guided ordering through online estimates. We characterize the exact DPRM policy as a reward-tilted Gibbs reveal law, prove convergence of its stagewise Soft-BoN approximation, show that the online bucketized controller tracks the exact DPRM score at empirical-Bernstein rates, and establish a sample-complexity advantage under tractable optimization assumptions. Across nine hosts covering language reasoning, test-time scaling, protein, single-cell, molecular, DNA, text-to-image generation, and VQA, DPRM order variants improve several language, DNA, and multimodal settings while also identifying boundary cases where confidence-only ordering or task-specific utilities are preferable. Code is available at: https://github.com/DakeBU/DPRM-DLLM

2501.09876 2026-06-17 math.NA cs.LG cs.NA Updated version

Geometry-Preserving Encoder/Decoder in Latent Generative Models

Wonjun Lee, Riley C. W. O'Neill, Dongmian Zou, Jeff Calder, Gilad Lerman

Affiliations: Department of Mathematics, The Ohio State University; Department of Mathematics, University of Minnesota; Zu Chongzhi Center for Mathematics and Computational Sciences, Duke Kunshan University

AI summary: Proposes a new geometry-preserving encoder/decoder framework that, by preserving the geometric structure of the data distribution, enables more efficient training and faster convergence in latent diffusion models.

Comments: 50 pages

Abstract

Generative modeling aims to generate new data samples that resemble a given dataset. When using diffusion models for this task, one of the main challenges is solving the problem in the input space, which tends to be very high-dimensional. To address this, recent approaches solve diffusion models in the latent space through an encoder that maps from the data space to a lower-dimensional latent space, improving training efficiency and achieving state-of-the-art results. The variational autoencoder (VAE) is the most commonly used encoder/decoder framework in this domain, known for its ability to learn latent representations and generate data samples. In this paper, we introduce a novel encoder/decoder framework with theoretical properties distinct from those of the VAE, specifically designed to preserve the geometric structure of the data distribution. We demonstrate the significant advantages of this geometry-preserving encoder in the training process of both the encoder and decoder. Additionally, we provide theoretical results proving convergence of the training process, including convergence guarantees for encoder training, and results showing faster convergence of decoder training when using the geometry-preserving encoder.

2502.18049 2026-06-17 stat.ML cs.LG Updated version

Recursive Learning Without Collapse: A Weighting-Based Stabilization Framework

Hengzhi He, Shirong Xu, Guang Cheng

AI summary: To address model collapse in recursive generative model training, proposes a weighting-based training strategy; for mixtures of real and synthetic data, it theoretically derives a unified expression for the optimal weighting scheme, revealing a trade-off between leveraging synthetic data and model performance.

Comments: This article has been accepted for publication in Journal of the Royal Statistical Society: Series B, published by Oxford University Press

Abstract

Recent studies identified an intriguing phenomenon in recursive generative model training known as model collapse, where models trained on data generated by previous models exhibit severe performance degradation. Addressing this issue and developing more effective training strategies have become central challenges in generative model research. In this paper, we investigate this phenomenon within a novel framework, where generative models are iteratively trained on a combination of newly collected real data and synthetic data from the previous training step. To develop an optimal training strategy for integrating real and synthetic data, we evaluate the performance of a weighted training scheme in various scenarios, including Gaussian distribution estimation, generalized linear models, and nonparametric estimation. We theoretically characterize the impact of the mixing proportion and weighting scheme of synthetic data on the final model's performance. Our key finding is that, across different settings, the optimal weighting scheme under different proportions of synthetic data asymptotically follows a unified expression, revealing a fundamental trade-off between leveraging synthetic data and model performance. In some cases, the optimal weight assigned to real data corresponds to the reciprocal of the golden ratio. Finally, we validate our theoretical results on extensive simulated datasets and a real tabular dataset.
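
As a quick numerical check of the constant named in the abstract, the reciprocal of the golden ratio is (sqrt(5) - 1) / 2, roughly 0.618. The sketch below only computes that constant; the function and variable names are hypothetical, and the settings in which this value is the optimal weight on real data are established in the paper, not here.

```python
import math


def golden_ratio_reciprocal() -> float:
    """Return 1/phi, where phi = (1 + sqrt(5)) / 2 is the golden ratio."""
    phi = (1.0 + math.sqrt(5.0)) / 2.0
    return 1.0 / phi


# Hypothetical name: the weight on real data in the cases the abstract mentions.
w_real = golden_ratio_reciprocal()

# 1/phi is the positive root of x**2 + x - 1 = 0, so this residual should be ~0.
residual = w_real**2 + w_real - 1.0

print(f"1/phi = {w_real:.6f}")
```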

2602.03420 2026-06-17 cs.SD cs.LG Updated version

CoCoEmo: Composable and Controllable Human-Like Emotional TTS via Activation Steering

Siyi Wang, Shihong Tan, Siyi Liu, Hong Jia, Gongping Huang, James Bailey, Ting Dang

AI summary: Proposes an activation-steering framework that enables composable mixed-emotion synthesis and text-emotion mismatch synthesis in hybrid TTS models, and finds that emotional prosody is produced mainly by the language module rather than the flow-matching module.

Abstract

Emotional expression in human speech is nuanced and compositional, often involving multiple, sometimes conflicting, affective cues that may diverge from linguistic content. In contrast, most expressive text-to-speech systems enforce a single utterance-level emotion, collapsing affective diversity and suppressing mixed or text-emotion-misaligned expression. While activation steering via latent direction vectors offers a promising solution, it remains unclear whether emotion representations are linearly steerable in TTS, where steering should be applied within hybrid TTS architectures, and how such complex emotion behaviors should be evaluated. This paper presents the first systematic analysis of activation steering for emotional control in hybrid TTS models, introducing a quantitative, controllable steering framework, and multi-rater evaluation protocols that enable composable mixed-emotion synthesis and reliable text-emotion mismatch synthesis. Our results demonstrate, for the first time, that emotional prosody and expressive variability are primarily synthesized by the TTS language module instead of the flow-matching module, and also provide a lightweight steering approach for generating natural, human-like emotional speech.
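
To make "activation steering via latent direction vectors" concrete, here is a minimal generic sketch: a hidden activation is shifted by a weighted sum of direction vectors, one per emotion. This is the textbook form of steering under the assumption of linear steerability, not the paper's implementation; the function name, the toy vectors, and the strengths are all made up for illustration.

```python
from typing import Dict, List


def steer(hidden: List[float],
          directions: Dict[str, List[float]],
          strengths: Dict[str, float]) -> List[float]:
    """Return hidden + sum over emotions of strengths[e] * directions[e]."""
    steered = list(hidden)  # copy so the input activation is not mutated
    for emotion, alpha in strengths.items():
        direction = directions[emotion]
        if len(direction) != len(hidden):
            raise ValueError(f"dimension mismatch for {emotion!r}")
        for i, component in enumerate(direction):
            steered[i] += alpha * component
    return steered


# Toy 3-dimensional activation and made-up emotion directions.
directions = {"happy": [1.0, 0.0, 0.0], "sad": [0.0, 1.0, 0.0]}
mixed = steer([0.5, 0.5, 0.5], directions, {"happy": 0.6, "sad": 0.4})
```

Mixing two emotions then amounts to passing two nonzero strengths; where in a hybrid TTS stack such a shift should be applied is exactly the question the paper studies.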

2602.06806 2026-06-17 cs.CV cs.LG Updated version

RAIGen: Rare Attribute Identification in Text-to-Image Generative Models

Silpa Vadakkeeveetil Sreelatha, Dan Wang, Serge Belongie, Muhammad Awais, Anjan Dutta

Affiliations: University of California, Berkeley

AI summary: Proposes the RAIGen framework, which uses Matryoshka Sparse Autoencoders and a novel minority metric to discover rare attributes in diffusion models without labels, and supports attribute amplification.

Comments: Accepted at ICML 2026. Webpage and code available at https://github.com/VSSILPA/RAIGen

Abstract

Text-to-image diffusion models achieve impressive generation quality but inherit and amplify training-data biases, skewing coverage of semantic attributes. Prior work addresses this in two ways. Closed-set approaches mitigate biases in predefined fairness categories (e.g., gender, race), assuming socially salient minority attributes are known a priori. Open-set approaches frame the task as bias identification, highlighting majority attributes that dominate outputs. Both overlook a complementary task: uncovering rare or minority features underrepresented in the data distribution (social, cultural, or stylistic) yet still encoded in model representations. We introduce RAIGen, the first framework, to our knowledge, for label-free rare-attribute discovery in diffusion models, requiring no predefined minority categories. RAIGen leverages Matryoshka Sparse Autoencoders and a novel minority metric combining neuron activation frequency with semantic distinctiveness to identify interpretable neurons whose top-activating images reveal underrepresented attributes. Experiments show RAIGen discovers attributes beyond fixed fairness categories in Stable Diffusion, scales to larger models such as SDXL, supports systematic auditing across architectures, and enables targeted amplification of rare attributes during generation. The project page is available at https://vssilpa.github.io/RAIGen_webpage/ .

2602.11453 2026-06-17 cs.IR cs.AI cs.LG Updated version

From Noise to Order: Learning to Rank via Denoising Diffusion

Sajad Ebrahimi, Bhaskar Mitra, Negar Arabzadeh, Ye Yuan, Haolun Wu, Fattane Zarrinkalam, Ebrahim Bagheri

Affiliations: University of Guelph; Independent Researcher; University of California, Berkeley; McGill University; University of Toronto

AI summary: Proposes DiffusionRank, a denoising-diffusion-based generative ranking model that models the joint distribution of feature vectors and relevance labels and outperforms traditional discriminative methods on four standard LTR datasets.

Abstract

Learning-to-rank (LTR) methods have traditionally been limited to discriminative machine learning approaches that model the probability of the document being relevant to the query given some feature representation of the query-document pair. We propose an alternative denoising diffusion-based generative approach to LTR that instead models the full joint distribution over features and relevance labels. While in discriminative LTR an over-parameterized ranking model may find different ways to fit the training data, we posit that candidate solutions that can explain the full data distribution under the generative setting may be better at estimating relevance. Thus, we propose DiffusionRank, which extends TabDiff, an existing diffusion model for tabular datasets, to create generative alternatives to classical discriminative pointwise and pairwise LTR objectives. Our work demonstrates improvements from DiffusionRank over discriminative counterparts on four standard LTR datasets and points to a rich space for future exploration to leverage ongoing advancements in deep generative models for LTR. Our code is publicly available at https://github.com/sadjadeb/DiffusionRank.

2603.04438 2026-06-17 eess.IV cs.AI cs.LG Updated version

CogGen: Cognitive-Load-Inspired Fully Unsupervised Deep Generative Modeling for Compressively Sampled MRI Reconstruction

Qingyong Zhu, Yumin Tan, Xiang Gu, Dong Liang

AI summary: Proposes the CogGen framework, which follows the cognitive easy-to-hard principle and uses self-paced curriculum learning with an MRI-aware dual-threshold weighting strategy to recast CS-MRI reconstruction as a staged inversion problem; theory shows reduced local sufficient-iteration and cumulative noise-amplification bounds, and experiments show it outperforms existing unsupervised and supervised methods.

Abstract

Fully unsupervised deep generative modeling (FU-DGM) offers significant potential for compressively sampled magnetic resonance imaging (CS-MRI) reconstruction. Representative FU-DGM formulations, such as deep image prior (DIP) and implicit neural representation (INR), employ architectural bias to induce a low-dimensional manifold in the image space that aligns with the forward observation. However, as the underlying inverse system is highly ill-posed, prolonged iterative fitting in FU-DGM typically leads to poor efficiency and noise amplification. In this paper, guided by the cognitive principle of easy-to-hard learning, we propose CogGen, an FU-DGM framework that reformulates CS-MRI reconstruction as a staged inversion problem. Specifically, CogGen implements a self-paced curriculum learning (SPCL)-driven progressive scheduling strategy through an MRI-aware dual-threshold weighting criterion, which adaptively regulates k-space measurement participation. The data-consistency residual thresholding evaluates the fitting reliability of the current generator, while the k-space radius thresholding controls stage-wise measurement exposure, thereby avoiding uniform fitting throughout optimization. Theoretically, our analysis shows that, when early stages favor easy-to-fit measurements, CogGen yields a reduced local sufficient-iteration bound and a smaller cumulative noise-amplification bound, explaining the improved convergence behavior and reconstruction fidelity of CogGen within a finite iteration budget. Numerical experiments demonstrate that both CogGen instantiations, CogGen-DIP and CogGen-INR, achieve superior performance over prevailing CS-MRI reconstruction techniques, including unsupervised and supervised pipelines.

2604.01197 2026-06-17 quant-ph cond-mat.stat-mech cs.CC cs.LG Updated version

Learning and Generating Mixed States Prepared by Shallow Channel Circuits

Fangjun Hu, Christian Kokail, Milan Kornjača, Pedro L. S. Lopes, Weiyuan Gong, Sheng-Tao Wang, Xun Gao, Stefan Ostermann

Affiliations: QuEra Computing Inc.; School of Engineering and Applied Sciences, Harvard University

AI summary: Studies the problem of learning to generate mixed states prepared by shallow channel circuits, proving that for states in a specific phase the mixed state can be learned efficiently from measurement data alone, which provides a structural foundation for quantum generative models.

Comments: 44 pages, 14 figures, 1 table

Abstract

Learning quantum states from measurement data is a central problem in quantum information and computational complexity. In this work, we study the problem of learning to generate mixed states on a finite-dimensional lattice. Motivated by recent developments in mixed state phases of matter, we focus on arbitrary states in the trivial phase. A state belongs to the trivial phase if there exists a shallow preparation channel circuit under which local reversibility is preserved throughout the preparation. We prove that any mixed state in this class can be efficiently learned from measurement access alone. Specifically, given copies of an unknown trivial phase mixed state, our algorithm outputs a shallow local channel circuit that approximately generates this state in trace distance. The sample complexity and runtime are polynomial (or quasi-polynomial) in the number of qubits, assuming constant (or polylogarithmic) circuit depth and gate locality. Importantly, the learner is not given the original preparation circuit and relies only on its existence. Our results provide a structural foundation for quantum generative models based on shallow channel circuits. In the classical limit, our framework also inspires an efficient algorithm for classical diffusion models using only a polynomial overhead of training and generation.

2605.07971 2026-06-17 cs.CV cs.LG Updated version

DVD: Discrete Voxel Diffusion for 3D Generation and Editing

Zhengrui Xiang, Jiaqi Wu, Fupeng Sun, Heliang Zheng, Yingzhen Li

Affiliations: Imperial College London; Math Magic; Hitem3D

AI summary: Proposes the Discrete Voxel Diffusion (DVD) framework, which treats voxel occupancy as a discrete variable to generate, estimate uncertainty for, and edit sparse voxels, avoiding continuous-to-discrete thresholding and providing interpretable generation dynamics.

Abstract

We introduce Discrete Voxel Diffusion (DVD), a discrete diffusion framework to generate, assess, and edit sparse voxels for SLat (Structured LATent) based 3D generative pipelines. Although discrete diffusion has not generally displaced continuous diffusion in image-like generation, we show that it can be an effective first-stage prior for sparse voxel scaffolds. By treating voxel occupancy as a native discrete variable, DVD avoids continuous-to-discrete thresholding and provides a simple framework for voxel generation, uncertainty estimation, and editing. Beyond quality gains, DVD provides more interpretable generation dynamics through explicit categorical modeling. Furthermore, we leverage the predictive entropy as a robust uncertainty metric to identify ambiguous voxel regions and complicated samples, facilitating tasks such as data filtering and quality assessment. Finally, we propose a lightweight fine-tuning strategy using block-structured perturbation patterns. This approach empowers the model to inpaint and edit voxels within a single sampling round, requiring negligible auxiliary computation and no additional model evaluations. Code is available at https://github.com/TeCai/DVD.
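
One plausible reading of "predictive entropy as an uncertainty metric" can be sketched as follows, assuming each voxel carries a binary occupancy probability. The helper names, the threshold, and the example probabilities are hypothetical; the paper's actual categorical parameterization and thresholds may differ.

```python
import math
from typing import List


def bernoulli_entropy(p: float) -> float:
    """Entropy in nats of a binary occupancy variable with P(occupied) = p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a deterministic prediction carries no uncertainty
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def ambiguous_voxels(probs: List[float], threshold: float) -> List[int]:
    """Indices of voxels whose predictive entropy exceeds `threshold`."""
    return [i for i, p in enumerate(probs) if bernoulli_entropy(p) > threshold]


# Made-up per-voxel occupancy probabilities; 0.5 nats is an arbitrary cut-off.
occupancy_probs = [0.01, 0.5, 0.97, 0.6]
flagged = ambiguous_voxels(occupancy_probs, threshold=0.5)
```

Entropy peaks at ln 2 for p = 0.5 and vanishes for confident predictions, so the flagged indices are the voxels a filtering or quality-assessment step would inspect first.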

2606.08810 2026-06-17 cs.CL cs.LG Updated version

Continuous Language Diffusion as a Decoder-Interface Problem

Zhicheng Du, Lan Ma

Affiliations: Tsinghua Shenzhen International Graduate School, Tsinghua University

AI summary: Studies how continuous diffusion language models generate fluent text from Gaussian noise, proposes a decoder-basin mechanism, designs a diagnostic protocol that exposes failures hidden by scalar metrics, and explains token-recovery behavior through an interface phase diagram.

Abstract

Gaussian-corrupted sentence embeddings have no direct linguistic interpretation, yet continuous diffusion language models can generate fluent text from them. We study this puzzle through Embedded Language Flows (ELF) and identify a decoder-basin mechanism: our evidence suggests that denoising becomes reliable when trajectories reach regions where the native decoder can read stable tokens. We introduce a diagnostic protocol for denoisability, semantic recoverability, order sensitivity, decoder compatibility, and trajectory reliability. It exposes failures hidden by scalar metrics: low mean-squared error can discard linguistic content, low perplexity can reflect low-entropy collapse, and clean latent reconstruction can coexist with a narrow decoder basin. A decoder-margin bound explains why token recovery depends on margin and local decoder sensitivity, not latent error alone. Auditing public ELF checkpoints reveals an interface phase diagram: early predictions are weakly readable, mid-trajectory disagreement marks a competition region, and late predictions enter a high-margin decoder basin. Once inside, token realization is surprisingly simple on generated ELF states: frozen T5 (Text-to-Text Transfer Transformer) token-embedding lookup recovers $93$--$96\%$ of native decoder decisions, and a single linear readout reaches $97.9\%$ agreement at 32k samples, leaving an $\approx1.1$--$1.2$ perplexity gap in a structured residual tail. Under conservative held-out gates, a margin rule exits roughly $17$--$28\%$ earlier in denoising steps under an explicit diagnostic monitor. Boundary checks on LangFlow, BitstreamDiffusion, and the Continuous Latent Diffusion Language Model (Cola-DLM) show that the same interface questions remain meaningful when the state object and decoder change. Continuous and latent diffusion language models should therefore be evaluated as representation-decoder systems.

2606.17120 2026-06-17 cs.LG physics.chem-ph New submission

Noise-Driven Escape from Metastable Phases explains Grokking in Deep Neural Networks

Ibrahim Talha Ersoy, Karoline Wiesner

Affiliations: Complexity Science Group, Institute of Physics and Astronomy, University of Potsdam

AI summary: Using linear DNN models, the paper shows that grokking stems from hysteresis in first-order phase transitions induced by L2 regularization: SGD noise drives the model out of a low-accuracy metastable state, with escape times following Arrhenius scaling.

Comments: 13 pages, 4 figures. Accepted at HiLD 2026: 4th Workshop on High-dimensional Learning Dynamics

Abstract

Deep neural networks (DNNs) exhibit first-order phase transitions under variations of the L2 regularization strength, with each transition marking the onset of a new learnable feature. Below a critical regularization strength, all features are in principle learnable, but coexisting metastable states, separated by energy barriers, can trap the network and impede convergence. A strength of DNNs is their ability to generalize, but many open questions remain, among them the origin of so-called grokking: the abrupt, delayed onset of generalization after prolonged apparent overfitting. We show for linear DNNs that grokking is consistent with hysteresis in first-order L2 phase transitions: using L2 regularization to engineer deliberate trapping, we demonstrate that a model in a low-accuracy metastable state escapes only when SGD noise drives it across an energy barrier, with escape times following Arrhenius scaling. We reproduce grokking-like delayed convergence across two orders of magnitude in escape time by deliberately trapping models in metastable phases. Using sparse sub-sampling, we also reproduce the canonical grokking curve, where test error eventually approaches the final training error. Our work suggests that, because the number of metastable states equals the number of learnable features (one per singular value of the data covariance), the potential for hysteresis grows naturally with task complexity. We provide evidence that the same mechanism likely operates in general nonlinear DNNs. Our results provide routes toward more efficient learning schemes.
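
For readers unfamiliar with the term, "Arrhenius scaling" refers to the generic law that the escape time grows exponentially in the ratio of barrier height to noise level. The sketch below illustrates that generic law only; the parameter names (`barrier`, `noise_temperature`, `prefactor`) are assumptions for illustration, and how the paper maps SGD noise and L2 strength onto these quantities is not reproduced here.

```python
import math


def arrhenius_escape_time(barrier: float,
                          noise_temperature: float,
                          prefactor: float = 1.0) -> float:
    """Generic Arrhenius law: tau = prefactor * exp(barrier / noise_temperature)."""
    if noise_temperature <= 0.0:
        raise ValueError("noise_temperature must be positive")
    return prefactor * math.exp(barrier / noise_temperature)


# Raising barrier / temperature by ln(100) multiplies the escape time by 100,
# i.e. it spans two orders of magnitude.
tau_low = arrhenius_escape_time(barrier=1.0, noise_temperature=1.0)
tau_high = arrhenius_escape_time(barrier=1.0 + math.log(100.0), noise_temperature=1.0)
ratio = tau_high / tau_low
```

On a plot of log escape time against inverse noise level, this law appears as a straight line whose slope is the barrier height.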

2606.17215 2026-06-17 cs.LG cs.DS stat.ML New submission

Sum-of-Squares Degree Barriers for the Reweighted-Hinge Method in Robust Halfspace Learning: A Christoffel-Function Characterization

Xiaoyu Li

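Affiliations: Xiaoyu Li

AI summary: Using Christoffel functions, the paper exactly characterizes the outlier mass that bounded-degree certificates cannot remove, revealing a fundamental trade-off between the certificate's SoS degree and outlier tolerance when the reweighted-hinge method learns γ-margin halfspaces under malicious noise.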

Revisiting Log-PCA for Probability Measures: A Dynamical Formulation and Statistical Convergence

Peng Xu, Changbo Zhu, Young-Heon Kim, Xiaohui Chen

Affiliations: Department of Statistics, University of Illinois Urbana-Champaign; Department of ACMS, University of Notre Dame; Department of Mathematics, University of British Columbia; Department of Mathematics and Thomas Lord Department of Computer Science, University of Southern California

AI summary: Proposes a dynamical formulation that interprets log-PCA under Wasserstein geometry, termed Wasserstein tangent PCA (WT-PCA), and derives statistical convergence rates of empirical WT-PCA with respect to the population measure.

2606.17260 2026-06-17 math.OC cs.LG stat.ML cross-list

Accelerated Convex Optimization via Hamiltonian Dynamics with Deterministic Integration Time

Xiuyuan Wang, Vishwak Srinivasan, Qiang Fu, Siddharth Mitra, Ashia Wilson, Andre Wibisono

Affiliations: Department of Computer Science, Yale University; Department of EECS, Massachusetts Institute of Technology

AI summary: Proposes Hamiltonian-dynamics-based algorithms for smooth convex optimization that achieve deterministic accelerated convergence by exploiting contraction of the averaged Hamiltonian flow trajectory rather than contraction of its endpoint, and derives a discrete implementation with optimal first-order complexity.

Comments 51 pages, 7 figures. Accepted to the 39th Annual Conference on Learning Theory (COLT 2026)

2606.17319 2026-06-17 stat.ML cs.LG math.CO math.ST stat.TH cross-list

Tight $L_\infty$ Sample Complexity for Low-Degree and Sparse Boolean Polynomials

Jasper van Doornmalen, Mathieu Molina, Victor Verdugo, José Verschae

Affiliations: Institute for Mathematical and Computational Engineering; Pontificia Universidad Católica de Chile; Blavatnik School of Computer Science and AI; Tel Aviv University; Department of Industrial and Systems Engineering

AI summary: Motivated by the optimization of bounded binary black-box functions, studies learning polynomial surrogates over the Boolean hypercube with uniform $L_\infty$-error guarantees, and characterizes the minimax sample complexity under subgaussian noise for two classes of bounded polynomials.

Abstract

Motivated by the optimization of bounded binary black-box functions, we study the problem of learning polynomial surrogates over the Boolean hypercube. To ensure that optimizing the surrogate yields good solutions for the underlying objective, we require uniform $L_\infty$-error guarantees rather than the usual $L_2$-type guarantees. We characterize the minimax sample complexity of uniform estimation under subgaussian noise for two classes of bounded polynomials. First, for polynomials of degree at most $d$ on $n$ variables, the sample complexity scales as $n^{d+1}$. Second, for $s$-sparse Fourier-Walsh polynomials with $s \leq n$, it scales as $ns^2$. These rates differ structurally from the noiseless setting, where uniform exact recovery scales as $n^d$ and $ns$, respectively. Our lower bounds hold even for arbitrary adaptive learners, showing that the additional factors are intrinsic to the noisy cases. Standard Fourier-analysis tools for the $L_2$-norm do not naturally extend to the $L_\infty$-setting in a way that yields uniform guarantees. Our proofs overcome this difficulty by relying on suitably chosen auxiliary norms that serve as proxies for controlling the $L_\infty$-error. Together, our results provide a tight characterization of the sample complexity of learning optimization-safe polynomial surrogates.

2606.17426 2026-06-17 stat.ML cs.LG math.PR cross-list

Bounded Difference Concentration for Infinitely Exchangeable Sequences with Applications to AI Benchmark Uncertainty

Fangyuan Lin, Spencer Frei, Victor H. de la Pena

Affiliations: Department of Statistics, Columbia University; Google DeepMind

AI summary: Decomposes the deviation of bounded-difference functions by conditioning on the de Finetti directing measure, derives a concentration inequality with an effective variance proxy, proves that the latent mixture term cancels exactly for zero-sum linear contrasts, and applies the results to uncertainty quantification for AI benchmarks such as MMLU.

Abstract

We consider the concentration properties of functions of infinitely exchangeable random variables. By conditioning on the de Finetti directing measure, we show that the deviation of any function with bounded-difference constants $c_1, \dots, c_n$ decomposes into a conditional sampling fluctuation and a latent mixture fluctuation. When this latent mixture is $\sigma_{\mathrm{mix}}^2$-subgaussian, we establish a concentration inequality with an effective variance proxy of $\frac{1}{4}\sum_i c_i^2 + \sigma_{\mathrm{mix}}^2$. Crucially, we demonstrate that for zero-sum linear contrasts, such as the difference between a subsample mean and a full population mean, the latent mixture term cancels exactly. This cancellation yields a tight, mixture-free Hoeffding-type bound that provides a direct de Finetti mechanism for the infinite-extendibility limit of recent finite-exchangeable concentration results. We apply this framework to quantify uncertainty in composite AI benchmarks, such as MMLU, where question items naturally exhibit exchangeable dependence across domains. Our results provide both a domain-stratified hierarchical model for bounding the uncertainty of accuracy scores, and a distribution-free, cost-saving statistical guarantee for accurately estimating full benchmark scores from random subsets.
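
To make the variance proxy above concrete, the following is a minimal sketch that plugs $v = \frac{1}{4}\sum_i c_i^2 + \sigma_{\mathrm{mix}}^2$ into the generic two-sided subgaussian tail $2\exp(-t^2/(2v))$. The function name and the exact tail form are assumptions made for illustration; the constants in the paper's theorem may differ. With $\sigma_{\mathrm{mix}}^2 = 0$ the expression reduces to the classical bounded-differences (McDiarmid) bound.

```python
import math


def exchangeable_tail_bound(c, sigma_mix_sq, t):
    """Generic subgaussian tail 2*exp(-t^2 / (2*v)), capped at 1.

    v = (1/4) * sum(c_i^2) + sigma_mix^2 is the effective variance proxy
    quoted in the abstract. Using this particular tail form is an
    assumption of this sketch, not a statement of the paper's theorem.
    """
    v = 0.25 * sum(ci * ci for ci in c) + sigma_mix_sq
    if v <= 0.0:
        raise ValueError("variance proxy must be positive")
    return min(1.0, 2.0 * math.exp(-t * t / (2.0 * v)))


# Mean of n = 100 items with values in [0, 1]: each c_i = 1/n.
c = [1.0 / 100] * 100
no_mixture = exchangeable_tail_bound(c, 0.0, 0.1)     # about 0.27
with_mixture = exchangeable_tail_bound(c, 0.01, 0.1)  # looser, capped at 1
```

The comparison shows why the exact cancellation of the mixture term for zero-sum contrasts matters: any positive $\sigma_{\mathrm{mix}}^2$ inflates the proxy and weakens the bound.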

2606.17523 2026-06-17 math.OC cs.LG cross-list

Beyond IGO-Flow: Toward Convergence Analysis of IGO in Continuous Spaces

Ryosuke Kimura, Youhei Akimoto

Affiliations: University of Tsukuba, Tsukuba, Japan; RIKEN Center for Advanced Intelligence Project, Tokyo, Japan

AI summary: Studies the convergence of discrete-time IGO in continuous spaces; for the multivariate Gaussian family on strongly convex quadratic objectives, proves that the covariance matrix converges to the zero matrix and that the mean vector converges to the global optimum when the condition number is bounded.

Comments Accepted at PPSN 2026

Abstract

Information-Geometric Optimization (IGO) provides a unified framework for black-box optimization by interpreting the adaptation of a search distribution as a natural gradient update. Despite its conceptual importance, the convergence theory of IGO remains limited: most existing results concern continuous-time idealizations such as the IGO flow, rather than discrete-time updates with non-infinitesimal learning rates. In this paper, we study discrete-time IGO in continuous spaces, formulated as natural gradient updates in the expectation-parameter coordinates of an exponential family. In particular, we analyze IGO over the multivariate Gaussian family on strongly convex quadratic objective functions. Our analysis covers a setting that simultaneously incorporates full covariance adaptation, a fixed positive learning rate, and quantile-based weights. In this setting, we prove that the covariance matrix converges to the zero matrix. We further show that the mean vector converges to the global optimum, provided that the condition number of the appropriately scaled covariance matrix is bounded at sufficiently frequent iterations. These results advance the convergence theory of IGO and help bridge the gap between the mathematical theory of IGO and practical covariance-adaptive search methods such as CMA-ES.

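2606.18074 2026-06-17 stat.ML cs.LG stat.ME cross-list

Tensor-based second-order causal discovery

Nathan Ouyang, Kexin Wan, Anna Seigal

AI summary: Proposes the TSCD algorithm, which uses a tensor of covariance matrices from observational and interventional data to identify the directed acyclic graph and its edge functions under a linear structural equation model, requiring only uncorrelated noise; the method extends to nonlinear models and achieves identifiability with logarithmically many interventions.

Comments 27 pages, 7 figures. Code available at https://github.com/QWE123665/Tensor-based-Second-order-Causal-Discovery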

Softmax as Linear Attention in the Large-Prompt Regime: A Measure-Based Perspective

Etienne Boursier, Claire Boyer

AI summary: Proposes a measure-based framework, proves that softmax attention converges to a linear operator in the infinite-prompt limit, and gives non-asymptotic concentration bounds at finite prompt length, so that optimization analyses of linear attention transfer to softmax attention with large prompts.

Abstract

Softmax attention is a central component of transformer architectures, yet its nonlinear structure poses significant challenges for theoretical analysis. We develop a unified, measure-based framework for studying single-layer softmax attention under both finite and infinite prompts. For i.i.d. Gaussian inputs, we lean on the fact that the softmax operator converges in the infinite-prompt limit to a linear operator acting on the underlying input-token measure. Building on this insight, we establish non-asymptotic concentration bounds for the output and gradient of softmax attention, quantifying how rapidly the finite-prompt model approaches its infinite-prompt counterpart, and prove that this concentration remains stable along the entire training trajectory in general in-context learning settings with sub-Gaussian tokens. In the case of in-context linear regression, we use the tractable infinite-prompt dynamics to analyze training at finite prompt length. Our results allow optimization analyses developed for linear attention to transfer directly to softmax attention when prompts are sufficiently long, showing that large-prompt softmax attention inherits the analytical structure of its linear counterpart. This, in turn, provides a principled and broadly applicable toolkit for studying the training dynamics and statistical behavior of softmax attention layers in large prompt regimes.

2601.10962 2026-06-17 cs.LG cond-mat.dis-nn updated version

Noise-Driven Exploration and Transient Freezing Select Flat Minima in Stochastic Gradient Descent

Ning Yang, Yikuan Zhang, Qi Ouyang, Chao Tang, Yuhai Tu

AI summary: By analyzing SGD learning dynamics, identifies a nonequilibrium mechanism that drives solution selection: a transient exploratory phase escapes sharp valleys, noise reshapes the potential to stabilize flat solutions, and delayed freezing improves generalization.

Comments 12 pages, 4 figures

Abstract

Stochastic gradient descent (SGD) is central to deep learning, yet the dynamical origin of its preference for flatter, more generalizable solutions remains unclear. Here, by analyzing SGD learning dynamics, we identify a nonequilibrium mechanism that governs solution selection during training. Numerical experiments reveal a transient exploratory phase in which SGD trajectories repeatedly escape sharp valleys and migrate toward flatter regions of the loss landscape before becoming confined to a final basin. Using a tractable physical model, we show that SGD noise reshapes the loss landscape into an effective potential that preferentially stabilizes flat solutions. We further uncover a transient freezing mechanism: as training progresses, the flattening landscape suppresses transitions between competing valleys. Stronger SGD noise delays this freezing transition, prolonging the exploratory phase and thereby increasing the probability of convergence to flatter minima. Together, these results provide a unified physical framework connecting learning dynamics, loss-landscape geometry, and generalization, and suggest guiding principles for the design of more effective optimization algorithms.

2602.06257 2026-06-17 cs.LG cs.GT updated version

On Randomized Algorithms in Online Strategic Classification

Chase Hutton, Adam Melrod, Han Shao

AI summary: Studies the advantages of randomized algorithms in online strategic classification, giving improved bounds in the realizable and agnostic settings in terms of the Littlestone dimension and the maximum degree of the manipulation graph, and showing that randomization can circumvent lower-bound constructions for deterministic algorithms.

Abstract

Online strategic classification studies settings in which agents strategically modify their features to obtain favorable predictions. For example, given a classifier that determines loan approval based on credit scores, applicants may open or close credit cards and bank accounts to obtain a positive prediction. The learning goal is to achieve low mistake or regret bounds despite such behavior. While randomized algorithms have the potential to offer advantages to the learner in strategic settings, they have been largely underexplored. In the realizable setting, no lower bound is known for randomized algorithms, and existing lower bound constructions for deterministic learners can be circumvented by randomization. In the agnostic setting, the best known regret upper bound is $O(T^{3/4}\log^{1/4}T|\mathcal H|)$, which is far from the standard online learning rate of $O(\sqrt{T\log|\mathcal H|})$. In this work, we provide refined bounds for online strategic classification in both settings; our bounds depend on the Littlestone dimension $\mathrm{Ldim}(\mathcal H)$ of the hypothesis class $\mathcal H$ and the maximum degree $\Delta$ of the manipulation graph. In the realizable setting, we extend, for $T > \mathrm{Ldim}(\mathcal H) \Delta^2$, the existing lower bound $\Omega(\mathrm{Ldim}(\mathcal H) \Delta)$ for deterministic learners to all learners. This yields the first lower bound that applies to randomized learners. We then provide the first randomized learner that improves the known (deterministic) upper bound of $O(\mathrm{Ldim}(\mathcal H) \cdot \Delta \log \Delta)$. In the agnostic setting, we give an improper randomized learner that improves the regret upper bound to $O(\sqrt{T\log|\mathcal H|})$, matching the standard online learning rate. We also show a larger lower bound for all proper learning rules, demonstrating that improperness is necessary to achieve the optimal rate.

2606.14187 2026-06-17 cs.LG updated version

Zeta: Dual Whitening for Matrix Optimization via Coordinate-Adaptive Preconditioning

Kaiwen Chen, Shuhai Zhang, Zimo Liu, Linxiao Li, Ying Sun, Yuchen Li, Yifan Zhang, Bo Han, Mingkui Tan, Qiuwu Chen

Affiliations: South China University of Technology; AIGCode; Hong Kong Baptist University

AI summary: To address coordinate-wise scale heterogeneity in matrix optimization, proposes Zeta, a dual whitening optimizer that applies coordinate whitening and then spectral whitening in strict order to reduce orthogonalization error, improving convergence speed and generalization on language modeling and vision tasks.

Abstract

Large-scale neural network training increasingly relies on matrix-aware optimizers that exploit the structure of weight parameters beyond element-wise adaptation. However, existing matrix-aware methods such as Muon have an underappreciated vulnerability: their core operation, Newton-Schulz iteration, depends critically on input conditioning, yet the raw momentum matrices exhibit severe coordinate-wise scale heterogeneity. In this paper, we first verify this scale heterogeneity through a chi-square uniformity test, showing that intra-matrix scale imbalance is prevalent across Transformer layers and that coordinate whitening effectively corrects it. Motivated by this finding, we propose Zeta, a dual whitening optimizer that applies coordinate whitening and spectral whitening in a strictly ordered pipeline. The ordering is not a tunable choice but follows from a mathematical dependency: coordinate whitening establishes the statistical isotropy that spectral whitening requires to function reliably. We further prove that this dual pipeline strictly reduces orthogonalization error relative to pure spectral methods by improving the condition number of the input. Empirically, Zeta matches or surpasses strong baselines across language modeling (0.6B to 8B parameters), mixture-of-experts architectures, and vision tasks, demonstrating that resolving scale imbalance before orthogonalization leads to faster convergence and better generalization. Code is available at https://github.com/AIGCodeOS/aigcode_zeta_optimizer.
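
The dependence of Newton-Schulz iteration on input conditioning can be illustrated with a small sketch. It uses the textbook cubic iteration $X \leftarrow 1.5X - 0.5XX^\top X$, and `row_rescale` is a hypothetical stand-in for coordinate whitening (plain row normalization). Neither is taken from the Zeta code, whose actual whitening rule and iteration coefficients may differ.

```python
import numpy as np


def newton_schulz(M, steps=5):
    """Cubic Newton-Schulz iteration started from M / ||M||_F.

    Frobenius normalization keeps all singular values in (0, 1], where
    the map s -> 1.5*s - 0.5*s**3 pushes each of them toward 1.
    """
    X = M / np.linalg.norm(M)
    for _ in range(steps):
        X = 1.5 * X - 0.5 * (X @ X.T @ X)
    return X


def orthogonality_error(X):
    """Frobenius distance of X X^T from the identity (rows <= columns)."""
    return np.linalg.norm(X @ X.T - np.eye(X.shape[0]))


def row_rescale(M, eps=1e-12):
    """Hypothetical coordinate whitening: rescale every row to unit norm."""
    return M / (np.linalg.norm(M, axis=1, keepdims=True) + eps)


rng = np.random.default_rng(0)
# Rows with wildly different scales mimic coordinate-wise heterogeneity.
M = np.diag([1.0, 10.0, 100.0, 1000.0]) @ rng.normal(size=(4, 6))
raw_err = orthogonality_error(newton_schulz(M))
whitened_err = orthogonality_error(newton_schulz(row_rescale(M)))
```

With a fixed, small number of steps, small singular values grow by only about 1.5x per step, so the badly scaled input stays far from orthogonal while the rescaled input gets close. This is the qualitative effect the abstract attributes to whitening before orthogonalization.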

2405.15379 2026-06-17 stat.ML cs.LG math.PR math.ST stat.TH updated version

Randomized Midpoint Method for Log-Concave Sampling under Constraints

Yifeng Yu, Shijie Zhang, Lu Yu

AI summary: Proposes randomized midpoint discretizations of overdamped and kinetic Langevin diffusions in constrained domains, builds a unified framework via projection operators, proves convergence guarantees in Wasserstein-$q$ distance, and obtains near-optimal lower bounds.

Abstract

In this paper, we study the problem of sampling from log-concave distributions supported on convex and compact sets, with a particular focus on the randomized midpoint discretization of both overdamped and kinetic Langevin diffusions in constrained domains. We revisit the proximal framework for handling constraints through projection operators and develop a more general formulation that encompasses Euclidean, Bregman, and Gauge projections. The resulting smooth approximation allows a unified and tractable analysis of Langevin algorithms and their variants under constraints. Within this framework, we establish convergence guarantees in Wasserstein-$q$ $(q\geqslant 1)$ distances between the smooth surrogate and the target distribution. We further derive complementary lower bounds, showing that the results are near-optimal in order. Building upon this tight approximation analysis, we obtain new convergence guarantees for the randomized midpoint Langevin algorithms and refined bounds for both vanilla and kinetic Langevin Monte Carlo methods under constraints, thereby advancing the theoretical understanding of constrained diffusion-based sampling.

2501.10729 2026-06-17 stat.ME cs.LG stat.ML updated version

Robust Local Polynomial Regression with Similarity Kernels

Yaniv Shulman

AI summary: To address the sensitivity of traditional local polynomial regression to outliers, proposes a conditional-density kernel weighting scheme that incorporates response information and reduces the influence of outliers through local density estimation, lowering empirical bias while remaining competitive with standard LOWESS.

Abstract

Local Polynomial Regression (LPR) is a widely used nonparametric method for modeling complex relationships due to its flexibility and simplicity. It estimates a regression function by fitting low-degree polynomials to localized subsets of the data, weighted by proximity. However, traditional LPR is sensitive to outliers and high-leverage points, which can significantly affect estimation accuracy. This paper revisits the kernel function used to compute regression weights and proposes a novel framework that incorporates both predictor and response variables in the weighting mechanism. The focus of this work is a conditional density kernel that robustly estimates weights by mitigating the influence of outliers through localized density estimation. The proposed method is implemented in Python and is publicly available at https://github.com/yaniv-shulman/rsklpr. The population analysis quantifies the bias induced by density-based robust weighting, and the reported experiments show lower empirical bias than iterative robust LOWESS while remaining competitive with standard LOWESS. This advancement provides a promising extension to traditional LPR, opening new possibilities for robust regression applications.

2507.05164 2026-06-17 math.DS cs.LG nlin.AO updated version

A Dynamical Systems Perspective on the Analysis of Neural Networks

Dennis Chemnitz, Maximilian Engel, Christian Kuehn, Sara-Viola Kuntz

AI summary: Uses dynamical systems to reformulate challenges in deep neural networks, gradient descent, and related topics, studying information propagation, training dynamics, and mean-field limits, and covering properties such as network embeddings, stability, and graph limits.

Comments preprint of a book chapter contribution

Abstract

In this chapter, we utilize dynamical systems to analyze several aspects of machine learning algorithms. As an expository contribution we demonstrate how to re-formulate a wide variety of challenges from deep neural networks, (stochastic) gradient descent, and related topics into dynamical statements. We also tackle three concrete challenges. First, we consider the process of information propagation through a neural network, i.e., we study the input-output map for different architectures. We explain the universal embedding property for augmented neural ODEs representing arbitrary functions of given regularity, the classification of multilayer perceptrons and neural ODEs in terms of suitable function classes, and the memory-dependence in neural delay equations. Second, we consider the training aspect of neural networks dynamically. We describe a dynamical systems perspective on gradient descent and study stability for overdetermined problems. We then extend this analysis to the overparameterized setting and describe the edge of stability phenomenon, also in the context of possible explanations for implicit bias. For stochastic gradient descent, we present stability results for the overparameterized setting via Lyapunov exponents of interpolation solutions. Third, we explain several results regarding mean-field limits of neural networks. We describe a result that extends existing techniques to heterogeneous neural networks involving graph limits via digraph measures. This shows how large classes of neural networks naturally fall within the framework of Kuramoto-type models on graphs and their large-graph limits. Finally, we point out that similar strategies to use dynamics to study explainable and reliable AI can also be applied to settings such as generative models or fundamental issues in gradient training methods, such as backpropagation or vanishing/exploding gradients.

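2507.11366 2026-06-17 cs.GT cs.LG updated version

Characterizing Nash Equilibria in Zero-Sum Games: A Physics-Inspired, Parallelizable Approach with a Linear Number of Gradient Queries

Taemin Kim, James P. Bailey

Affiliations: Industrial and Systems Engineering; Rensselaer Polytechnic Institute

AI summary: Proposes an online optimization method inspired by Hamiltonian dynamics that characterizes the set of Nash equilibria of a zero-sum game within a linear number of iterations of alternating gradient descent, supports parallelization and arbitrary learning rates, and markedly outperforms traditional methods in experiments.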

2602.17894 2026-06-17 stat.ML cs.LG math.ST stat.TH updated version

Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget

Michael O. Harding, Vikas Singh, Kirthevasan Kandasamy

AI summary: For multi-source data collection under a fixed budget, proposes a sampling plan that maximizes the effective sample size and, paired with a post-stratification estimator, achieves minimax-optimal risk.

Comments COLT 2026

Abstract

Data collection is a critical component of modern statistical and machine learning pipelines, particularly when data must be gathered from multiple heterogeneous sources to study a target population of interest. In many use cases, such as medical studies or political polling, different sources incur different sampling costs. Observations often have associated group identities - for example, health markers, demographics, or political affiliations - and the relative composition of these groups may differ substantially, both among the source populations and between sources and target population. In this work, we study multi-source data collection under a fixed budget, focusing on the estimation of population means and group-conditional means. We show that naive data collection strategies (e.g. attempting to "match" the target distribution) or relying on standard estimators (e.g. sample mean) can be highly suboptimal. Instead, we develop a sampling plan which maximizes the effective sample size - the total sample size divided by $D_{\chi^2}(q\mid\mid\overline{p}) + 1$, where $q$ is the target distribution, $\overline{p}$ is the aggregated source distribution, and $D_{\chi^2}$ is the $\chi^2$-divergence. We pair this sampling plan with a classical post-stratification estimator and upper bound its risk. We provide matching lower bounds, establishing that our approach achieves the budgeted minimax optimal risk. Our techniques also extend to prediction problems when minimizing the excess risk, providing a principled approach to multi-source learning with costly and heterogeneous data sources.
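
The effective sample size defined above can be computed directly for discrete group distributions. The sketch below assumes the standard identity $D_{\chi^2}(q\|p) = \sum_g q_g^2/p_g - 1$ for probability vectors over the same groups; the function names are illustrative and not taken from the paper.

```python
def chi2_divergence(q, p):
    """Chi-square divergence D(q || p) = sum_g q_g^2 / p_g - 1.

    q and p are probability vectors over the same groups. A group with
    q_g > 0 but p_g = 0 makes the divergence infinite, so it is rejected.
    """
    total = 0.0
    for qg, pg in zip(q, p):
        if qg == 0.0:
            continue
        if pg <= 0.0:
            raise ValueError("target group has no mass in the source mixture")
        total += qg * qg / pg
    return total - 1.0


def effective_sample_size(n, q, p_bar):
    """Total sample size divided by D_chi2(q || p_bar) + 1."""
    return n / (chi2_divergence(q, p_bar) + 1.0)


# Target is balanced across two groups; the aggregated sources are not.
matched = effective_sample_size(1000, [0.5, 0.5], [0.5, 0.5])
skewed = effective_sample_size(1000, [0.5, 0.5], [0.25, 0.75])
```

When the aggregated source distribution matches the target, no samples are lost. In the skewed case, 1000 collected samples are worth 750 effective ones.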

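2603.02159 2026-06-17 stat.ML cs.LG updated version

Instrumental and Proximal Causal Inference with Gaussian Processes

Yuqi Zhang, Krikamol Muandet, Dino Sejdinovic, Edwin Fong, Siu Lun Chau

AI summary: Proposes a deconditional Gaussian process framework for causal inference in the presence of unobserved confounding, providing reliable posterior uncertainty quantification and enabling model selection via marginal-likelihood optimization.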

Representation Costs in Data Science: Foundations and the Quasi-Banach Spaces of Deep Neural Networks

Greg Ongie, Rahul Parhi

Affiliations: Marquette University; University of California, San Diego

AI summary: Establishes a unified framework for analyzing the representation costs of parametric data-fitting methods through their parameter-space regularizers, shows that the native spaces induced by deep neural networks are quasi-Banach spaces, and proves natural results such as representer theorems.

Abstract

We develop a general framework for analyzing representation costs of parametric data-fitting methods through their parameter-space regularizers. From this abstract perspective, we define representation costs for arbitrary parametric models and reveal their induced (native) function spaces. This unifies recent function-space views of data-fitting methods. We also prove that many natural results hold in this abstract setting, including representer theorems for parametric methods on their native spaces. The framework also rigorously connects parametric methods with their equivalent nonparametric descriptions under sufficient overparameterization. Classical methods and their native spaces, such as kernel methods / reproducing kernel Hilbert spaces, wavelets / Besov spaces, and shallow neural networks / variation spaces emerge as special cases of our abstract framework. A byproduct of "axiomatizing" the study of representation costs is that we also immediately obtain new results for deep neural networks: For depth-$L$ feedforward ReLU networks, their induced native spaces are $p$-normable quasi-Banach spaces with $p = 2/L$. This reveals that the inductive bias of deep neural networks (as given by the representation cost) cannot be captured by norms for depths $L > 2$.

2606.17118 2026-06-17 cs.LG cs.AI new submission

MODE: Modality-Decomposed Expert-Level Mixed-Precision Quantization for MoE Multimodal LLMs

Yuanteng Chen, Peisong Wang, Zhilei Liu, Nanxin Zeng, Yuantian Shao, Shiqiang Lang, Tao Liu, Chuangyi Li, Qinghao Hu, Gang Li, Jing Liu, Jian Cheng

Affiliations: Institute of Automation, Chinese Academy of Sciences; School of Artificial Intelligence, University of Chinese Academy of Sciences; Zhongguancun Academy

AI summary: To address cross-modal and intra-vision biases in expert importance estimation for MoE multimodal LLMs, proposes MODE, a modality-decomposed expert-level mixed-precision quantization framework that decomposes selection frequency, filters redundant vision tokens, and evaluates per-modality sensitivity to assign bit-widths under a given budget, keeping average performance loss within 2.9% at W3A16.

Comments 18 pages, 8 figures

Abstract

Mixture-of-Experts Multimodal Large Language Models (MoE-MLLMs) offer remarkable performance but incur prohibitive GPU memory costs, making compression essential. Among post-training quantization (PTQ) methods, expert-level mixed-precision quantization has proven effective for MoE-LLMs, yet it suffers notable degradation on MoE-MLLMs due to two overlooked biases in expert importance estimation: (1) at the cross-modal level, the numerical dominance of vision tokens causes them to dominate expert selection frequency, masking experts that are critical to the text modality; (2) at the intra-vision level, the large proportion of redundant vision tokens further skews frequency statistics, obscuring experts critical for informative visual content. To bridge these gaps, we propose MODE, a modality-decomposed expert-level mixed-precision quantization framework for MoE-MLLMs that decomposes expert selection frequency by modality, filters redundant vision tokens to obtain denoised visual frequency, and further evaluates quantization sensitivity per modality as a complementary signal to frequency-based estimation. These signals are integrated into an Integer Linear Programming formulation to assign per-expert bit-widths under a given budget. Extensive experiments show that MODE is particularly well-suited for MoE-MLLMs, limiting average performance loss to within 2.9% at W3A16, with larger gains at the extreme 2-bit setting.

2606.17460 2026-06-17 cs.LG cs.NA math.NA physics.comp-ph new submission

Operator Boosting Produces Pareto-Efficient PDE Surrogates

Lennon J. Shikhman

Affiliations: College of Computing, Georgia Institute of Technology; Department of Mathematics and Systems Engineering, Florida Institute of Technology

AI summary: Proposes the Operator Boosting framework, which builds compact neural-operator surrogates directly through residual learning; mean accuracy improves on 21 of 30 dataset-architecture pairs, trainable parameters drop by roughly 72-95%, and empirical Pareto improvements appear on 7 of 10 PDE benchmarks.

Comments 19 pages, 4 figures, 3 tables. Preprint submitted to Elsevier

Abstract

Neural operators are widely used as surrogate solution maps for partial differential equations (PDEs), but full-size models can be costly to store, deploy, and evaluate in many-query scientific workflows. This work introduces Operator Boosting, a stagewise residual-learning framework for constructing compact neural-operator surrogates directly, rather than training a large model and compressing it afterward. Starting from the empirical mean predictor in normalized output coordinates, the method trains a sequence of tiny same-family neural operators on residual fields and incorporates each correction through validation-selected shrinkage. We instantiate the framework with Fourier neural operators (FNOs), DeepONets, and convolutional neural operators (CNOs), and compare boosted tiny stacks against full-size monolithic baselines across one-, two-, and three-dimensional PDE benchmarks from PDEBench, APEBench, and The Well. Across 30 dataset-architecture pairs, 21 show positive mean accuracy gains and 17 have positive confidence intervals, while all boosted stacks reduce trainable parameter count by approximately 72-95%. Best-model comparisons show empirical Pareto improvements on 7 of 10 completed PDE benchmarks, including two-dimensional Navier-Stokes, shallow-water dynamics, Darcy flow, one-dimensional transport and reaction systems, and three-dimensional compressible Navier-Stokes. These results show that Operator Boosting often improves the empirical accuracy-parameter Pareto frontier of neural PDE surrogates, while also exposing PDE- and architecture-dependent regimes where residual boosting fails to offset compression.
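
The stagewise loop in this abstract (mean predictor, tiny learners fitted to residuals, shrinkage chosen on validation data) can be sketched on a scalar toy problem. This is a minimal sketch under stated assumptions: the single-sinusoid "tiny learner", the shrinkage grid, and all function names are hypothetical stand-ins, not the paper's neural operators or its selection procedure.

```python
import math


def fit_tiny_learner(xs, residuals, freq):
    """Least-squares fit of residual ~ a * sin(freq * x); stands in for a tiny operator."""
    basis = [math.sin(freq * x) for x in xs]
    denom = sum(b * b for b in basis) or 1.0
    a = sum(b * r for b, r in zip(basis, residuals)) / denom
    return lambda x: a * math.sin(freq * x)


def mse(preds, targets):
    return sum((p - t) ** 2 for p, t in zip(preds, targets)) / len(targets)


def boost(train, val, n_stages=4, shrink_grid=(0.0, 0.25, 0.5, 1.0)):
    """Start from the training mean, then add shrunken residual corrections stage by stage."""
    xs, ys = train
    vx, vy = val
    mean = sum(ys) / len(ys)
    stages = []  # list of (shrinkage, learner)

    def predict(x):
        return mean + sum(s * f(x) for s, f in stages)

    for k in range(1, n_stages + 1):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        learner = fit_tiny_learner(xs, residuals, freq=k)
        base = [predict(x) for x in vx]
        corr = [learner(x) for x in vx]
        # Validation-selected shrinkage: keep the factor with the lowest validation MSE.
        best = min(shrink_grid,
                   key=lambda s: mse([b + s * c for b, c in zip(base, corr)], vy))
        stages.append((best, learner))
    return predict
```

Because the grid contains 0.0, a stage that does not help on validation data is effectively skipped, so the validation error never increases from one stage to the next in this sketch.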


2606.17471 2026-06-17 cs.LG cs.SY eess.SY 新提交

ReRAM-aware Model Finetuning addressing I-V Non-linearity and Retention Errors

面向ReRAM的模型微调：解决I-V非线性和保留误差

Ching-Yi Lin, Shamik Kundu, Arnab Raha, Sahil Shah

发表机构 * Intel Corporation（英特尔公司）

AI总结提出一种基于微调的硬件感知训练算法，通过范围收缩的sinh变换缓解I-V非线性，并将保留误差纳入正则化损失，实现ReRAM上DNN的高效部署，在图像分类和问答任务中精度损失极小。

Comments 11 pages, 12 figures, 2 tables, with appendix (5 pages, 9 figures)

AI中文摘要

传统的CPU、GPU和NPU架构日益受到冯·诺依曼瓶颈的限制。虽然使用ReRAM交叉阵列的存内计算（IMC）提供了一种高密度、高能效的替代方案，但其实际部署受到非理想特性的制约。现有的硬件感知训练框架通常需要从头开始训练，这对于现代大规模模型来说计算成本过高。在这项工作中，我们提出了一种基于微调的硬件感知训练算法，能够在最小训练开销下实现DNN在ReRAM上的鲁棒部署。我们的方法通过应用范围收缩的sinh变换来缓解I-V非线性，并在微调过程中将保留误差直接纳入正则化损失。我们在图像分类和问答（QA）等模型和任务上评估了我们的框架。实验结果表明，我们的方法在ResNet18和DeiT-Tiny等大规模模型上实现了与基础模型相似的精度。在ImageNet上的MobileNetV3系列中，该技术的精度下降不到2%。此外，将该技术应用于SQuAD v2数据集，F-1分数仅下降1点。

英文摘要

Traditional CPU, GPU, and NPU architectures are increasingly limited by the von Neumann bottleneck. While In-Memory Computing (IMC) using ReRAM crossbar arrays offers a high-density, energy-efficient alternative, its practical deployment is constrained by device non-idealities. Existing hardware-aware training frameworks often require training from scratch, which is computationally prohibitive for modern large-scale models. In this work, we propose a finetuning-based hardware-aware training algorithm that enables robust DNN deployment on ReRAM with minimal training overhead. Our approach mitigates I-V non-linearity by applying a range-shrunk sinh transformation and incorporates retention errors directly into a regularization loss during the finetuning process. We evaluate our framework across models and tasks such as image classification and question-answering (QA). Experimental results demonstrate that our method achieves accuracy similar to that of the base model on large-scale models like ResNet18 and DeiT-Tiny. For the MobileNetV3 family on ImageNet, the technique incurs less than 2% accuracy degradation. Further, applying the technique to the SQuAD v2 dataset results in only a 1-point degradation in F1 score.


2606.17500 2026-06-17 cs.LG cs.AR 新提交

Reconfigurable Computing Challenge: Transformer for Jet Tagging on Versal AI Engines

可重构计算挑战：Versal AI Engine上的喷注标记Transformer

Gram Koski, Sean Lipps, Zhenghua Ma, G. Abarajithan, Ryan Kastner

发表机构 * Department of Computer Science and Engineering, University of California San Diego, La Jolla, CA, USA（加州大学圣地亚哥分校计算机科学与工程系，美国加利福尼亚州拉霍亚）

AI总结针对CERN LHC喷注标记任务，提出在AMD Versal AI Engine上部署量化整数Transformer的初始实现，并开发可重用软件框架自动生成Vitis图代码。

Comments 4 pages, 4 figures. In FCCM 2026 proceedings

DOI: 10.1109/FCCM68464.2026.00078
Journal ref: 2026 IEEE 34th Int. Symp. on Field-Programmable Custom Computing Machines (FCCM), Atlanta, GA, USA, 2026, pp. 307-310

AI中文摘要

基于Transformer的模型在CERN LHC的喷注标记中表现出强大的性能，但在低延迟、资源受限的触发系统中部署它们具有挑战性。我们提出了一个在AMD Versal AI Engine（AIE）上用于喷注标记的量化、纯整数Transformer的初始实现，将密集层和多头注意力（MHA）层映射到AIE瓦片。主要贡献是一个可重用的软件框架，该框架将Transformer层表示为可组合的AIE构建块，并从高级Python模型描述自动生成相应的Vitis图代码。该框架为未来研究提供了基础，并作为开源软件在此https URL发布。

英文摘要

Transformer-based models achieve strong performance for jet tagging at the CERN LHC, but deploying them in low-latency, resource-constrained trigger systems is challenging. We present an initial implementation of a quantized, integer-only transformer for jet tagging on the AMD Versal AI Engine (AIE), mapping dense and multi-head attention (MHA) layers to AIE tiles. The main contribution is a reusable software framework that represents transformer layers as composable AIE building blocks and automatically generates the corresponding Vitis graph code from a high-level Python model description. This framework provides a foundation for future research and is released as open-source software at https://github.com/KastnerRG/particle_transformer_aie.


2606.17803 2026-06-17 cs.LG 新提交

Continual Self-Improvement with Lightweight Experiential Latent Memories

持续自我改进：轻量级经验潜在记忆

Vaggelis Dorovatas, Nancy Kalaj, Rahaf Aljundi

发表机构 * Toyota Motor Europe（丰田汽车欧洲公司）； University of Trento（特伦托大学）

AI总结提出一种在线方法，将推理时计算转化为轻量级模块化潜在记忆，通过自生成测试时信号进行训练，实现持续改进且避免灾难性遗忘。

AI中文摘要

大型语言模型通过扩展推理时计算实现了强大的推理性能，但本质上仍然是无状态的，丢弃了在此过程中产生的丰富、自生成的推理轨迹。我们研究模型是否可以从这种经验中在线学习，将瞬态计算（推理轨迹）转化为持久可复用的知识，且无需外部监督或访问未来数据。我们表明，对原始推理轨迹进行上下文学习（ICL）无法泛化，反映了令牌级复用的根本局限性：即使经过细化（例如自我反思），单个轨迹也缺乏迁移所需的抽象。相比之下，受近期无监督强化学习工作的启发，我们发现使用自生成的测试时信号（多数投票）作为奖励的轻量级每实例训练能带来显著收益，通常超过全数据集离线训练，这促使从原始轨迹转向学习到的潜在表示。基于这一见解，我们提出一种在线方法，将遇到问题所花费的推理时计算蒸馏为紧凑的模块化潜在记忆，捕捉底层推理结构。这些记忆被存储并检索用于未来输入，通过模块化设计实现持续改进，同时避免灾难性遗忘。重要的是，我们的方法高效，参数化为极其轻量级的软提示记忆（约模型参数的0.001%），仅需少量梯度步训练，但性能与完全参数更新和离线训练相当。在具有挑战性的数学推理基准测试中，我们的方法显著优于零样本和原始数据ICL基线，并在数据集间有效迁移。

英文摘要

Large language models achieve strong reasoning performance by scaling inference-time compute, yet remain fundamentally stateless, discarding the rich, self-produced reasoning traces generated during this process. We investigate whether models can instead learn online from this experience, converting transient computation (reasoning traces) into persistent reusable knowledge, and without external supervision or access to future data. We show that In-Context Learning (ICL) over raw reasoning traces fails to generalize, reflecting a fundamental limitation of token-level reuse: individual traces lack the abstraction needed for transfer, even after refinement (e.g. self-reflection). In contrast, drawing inspiration from recent works on unsupervised reinforcement learning, we find that lightweight per-instance training with self-generated test-time signals (majority voting) as rewards yields substantial gains, often surpassing full-dataset offline training, motivating a shift from raw traces to learned latent representations. Building on this insight, we propose an online method that distills inference-time compute spent on encountered problems into compact modular latent memories capturing the underlying reasoning structure. These memories are stored and retrieved for future inputs, enabling continual improvement while avoiding catastrophic forgetting through modular design. Importantly, our method is highly efficient, parametrized as extremely lightweight soft prompt memories (~0.001% of model parameters) and trained with only a few gradient steps, yet achieving performance competitive with full parametric updates and offline training. Across challenging mathematical reasoning benchmarks, our approach significantly outperforms zero-shot and raw data ICL baselines, while transferring effectively across datasets.


2606.17872 2026-06-17 cs.LG cs.AI 新提交

AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor

AnchorKV: 通过拒绝锚点的软惩罚实现安全感知的KV缓存压缩

Ning Ni, Yingjie Lao

发表机构 * Department of Computer Science, Tufts University（塔夫茨大学计算机科学系）； Department of Electrical and Computer Engineering, Tufts University（塔夫茨大学电气与计算机工程系）

AI总结提出AnchorKV，一种通过软惩罚机制调整令牌保留分数以远离有害提示的KV缓存压缩方法，在保持实用性的同时显著提升安全性。

AI中文摘要

大型语言模型（LLMs）在生成推理和长上下文任务上优于早期架构，但其庞大的规模在内存使用、能耗和设备端部署方面带来了重大挑战。由于缩放预训练语言模型能提升下游能力\cite{zhao2023survey}，键值（KV）缓存成为主要的推理瓶颈。最近的KV缓存压缩方法\cite{jo2025fastkv,li2024snapkv,zhou2024dynamickv}通过仅保留注意力相关令牌的子集来降低这一成本。然而，虽然这些方法在良性工作负载上保持了准确性，但其压缩策略要么无法防御越狱攻击\cite{jiang2024robustkv}，要么在激进驱逐下降低安全对齐。我们提出AnchorKV，一种对KV缓存压缩的即插即用修改，它使令牌保留分数偏向远离与有害提示相关的键空间方向。AnchorKV通过将均值差异表示工程方法\cite{arditi2024refusal,zou2023representation}适配到KV缓存中使用的层特定键投影空间，构建了一个离线安全锚点。基于该锚点，一种软惩罚令牌选择规则以少量效用换取显著改善的安全对齐，当惩罚为零时则退化为原始压缩器。

英文摘要

Large language models (LLMs) outperform earlier architectures on generative inference and long-context tasks, but their large size introduces significant challenges in memory usage, energy cost, and on-device deployment. Since scaling pre-trained language models improves downstream capability \cite{zhao2023survey}, the key-value (KV) cache becomes a dominant inference bottleneck. Recent KV cache compression methods \cite{jo2025fastkv,li2024snapkv,zhou2024dynamickv} reduce this cost by retaining only a subset of attention-relevant tokens. However, while these approaches preserve accuracy on benign workloads, their compression policies either fail to defend against jailbreak attacks \cite{jiang2024robustkv} or degrade safety alignment under aggressive eviction. We propose AnchorKV, a drop-in modification to KV cache compression that biases token retention scores away from directions in key space associated with harmful prompts. AnchorKV constructs an offline safety anchor by adapting a difference-of-means representation engineering approach \cite{arditi2024refusal,zou2023representation} to the layer-specific key projection space used in KV caching. Based on this anchor, a soft penalty token selection rule trades a small amount of utility for substantially improved safety alignment, while reducing to the original compressor when the penalty is zero.
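
A rough sketch of the two ingredients named in this abstract, a difference-of-means anchor and a soft penalty on retention scores, is given below. The penalty form (retention score minus `penalty` times the cosine between a key and the anchor), the toy key vectors, and the function names are assumptions for illustration; the abstract does not give AnchorKV's exact rule.

```python
import math


def difference_of_means(harmful_keys, benign_keys):
    """Unit vector pointing from the benign key mean to the harmful key mean."""
    dim = len(harmful_keys[0])

    def mean(rows):
        return [sum(r[i] for r in rows) / len(rows) for i in range(dim)]

    diff = [h - b for h, b in zip(mean(harmful_keys), mean(benign_keys))]
    norm = math.sqrt(sum(d * d for d in diff)) or 1.0
    return [d / norm for d in diff]


def select_tokens(scores, keys, anchor, budget, penalty):
    """Keep `budget` tokens ranked by score minus penalty * cos(key, anchor).

    Assumed penalty form. With penalty == 0 this is plain top-k on the
    original scores, i.e. the unmodified compressor.
    """
    adjusted = []
    for s, k in zip(scores, keys):
        k_norm = math.sqrt(sum(x * x for x in k)) or 1.0
        cos = sum(x * a for x, a in zip(k, anchor)) / k_norm
        adjusted.append(s - penalty * cos)
    ranked = sorted(range(len(scores)), key=lambda i: adjusted[i], reverse=True)
    return sorted(ranked[:budget])
```

Setting `penalty=0` reproduces the baseline selection, which mirrors the abstract's statement that the method reduces to the original compressor when the penalty is zero.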


2606.18096 2026-06-17 cs.LG cs.AI cs.DC 新提交

S4oP: Operator-level Pruning of Structured State Space Models for Resource-Constrained Devices

S4oP：面向资源受限设备的结构化状态空间模型的算子级剪枝

Marco Deano, Filippo Ziche, Nicola Bombieri

发表机构 * University of Verona（维罗纳大学）

AI总结提出一种针对S4和S4D模型的增量算子级剪枝方法，通过结构化掩码与微调交替进行，在保持预测性能的同时显著降低推理成本，首次系统研究SSM的结构化算子剪枝。

AI中文摘要

结构化状态空间模型（SSMs），包括S4和S4D架构，最近已成为捕捉序列数据中长程依赖关系的基于注意力模型的有力替代方案。尽管其经验性能强劲，但由于计算和内存需求，在时间和资源受限的环境中部署这些模型仍然具有挑战性。在本文中，我们提出了一种新颖的增量式算子级剪枝方法，用于基于S4和S4D的模型，该方法在保持预测性能的同时显著降低推理成本。据我们所知，这是首个系统研究SSM结构化算子剪枝的工作。我们的方法通过将结构化掩码与微调交替进行，逐步剪枝模型算子，同时联合监控准确性和推理延迟。我们在一个统一的训练和评估框架中实现了这种方法，该框架能够系统地探索效率-准确性的权衡。在多个基准数据集上的实验表明，剪枝高达70%的模型算子在大多数情况下保持了原始模型的性能，同时显著降低了推理延迟。这些结果表明，结构化算子剪枝是一种有效且先前未被探索的提高SSM效率的策略，并有助于它们在资源受限的实际场景中的部署。

英文摘要

Structured State Space Models (SSMs), including the S4 and S4D architectures, have recently emerged as powerful alternatives to attention-based models for capturing long-range dependencies in sequential data. Despite their strong empirical performance, deploying these models in time- and resource-constrained settings remains challenging due to their computational and memory demands. In this paper, we propose a novel incremental, operator-level pruning approach for S4- and S4D-based models that significantly reduces inference cost while preserving predictive performance. To the best of our knowledge, this is the first work to systematically investigate structured operator pruning for SSMs. Our method progressively prunes model operators by interleaving structured masking with fine-tuning, while jointly monitoring accuracy and inference latency. We implement this approach within a unified training and evaluation framework that enables systematic exploration of efficiency-accuracy trade-offs. Experiments across multiple benchmark datasets show that pruning up to 70% of the model operators preserves the performance of the original models in most cases, while substantially reducing inference latency. These results demonstrate that structured operator pruning is an effective and previously unexplored strategy for improving the efficiency of SSMs and facilitates their deployment in practical, resource-constrained scenarios.


2606.18114 2026-06-17 cs.LG cs.AI 新提交

Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models

Ternary Mamba: 分组量化感知训练的 W1.58A16 状态空间模型

Ramprasath Ganesaraja, Sahil Dilip Panse, Swathika N

发表机构 * EdgeVerve Systems Limited（EdgeVerve系统有限公司）

AI总结提出从预训练检查点进行分组量化感知训练（QAT）结合知识蒸馏，以极低数据量（1亿token）将Mamba-2 1.3B压缩至3.61倍，零样本准确率接近Bi-Mamba，并发现预训练QAT特有的零比率坍塌问题。

AI中文摘要

状态空间模型（SSM）如Mamba-2提供线性时间推理，但其内存占用限制了边缘部署。先前的三元SSM工作（Slender-Mamba）在150B token上从头训练；我们证明预训练检查点足以胜任，将边际token预算减少1000倍。使用分组量化感知训练（QAT）结合冻结FP16教师的知识蒸馏，我们将Mamba-2 1.3B压缩3.61倍（从2687 MB到744 MB），并在仅102M token（4 GPU小时，单H100）下达到48.1%的零样本准确率（7任务平均）——接近Bi-Mamba的48.4%（在+/-0.9pp置信区间内）。这种从预训练开始的QAT设置揭示了零比率坍塌，一种由可学习量化尺度引起的新不稳定性，在从头训练中不会出现。我们进一步证明，由于通过循环的误差累积，对Transformer有效的后处理校正策略对SSM失效。这些结果表明三元SSM不需要昂贵的从头训练：从预训练检查点进行QAT结合KD是一种数据高效的替代方案。

英文摘要

State Space Models (SSMs) such as Mamba-2 offer linear-time inference, but their memory footprint limits edge deployment. Prior ternary SSM work (Slender-Mamba) trains from scratch on 150B tokens; we show a pretrained checkpoint suffices, reducing the marginal token budget by 1,000x. Using grouped quantization-aware training (QAT) with knowledge distillation from a frozen FP16 teacher, we compress Mamba-2 1.3B by 3.61x (from 2,687 MB to 744 MB) and achieve 48.1% zero-shot accuracy (7-task average) with just 102M tokens (4 GPU-hours, single H100) -- approaching Bi-Mamba's 48.4% (within +/-0.9pp CI). This QAT-from-pretrained setting reveals zero-ratio collapse, a novel instability caused by learnable quantization scales that does not arise in from-scratch training. We further show that post-hoc correction strategies effective for Transformers fail for SSMs due to error accumulation through the recurrence. These results demonstrate that ternary SSMs do not require expensive from-scratch training: QAT from pretrained checkpoints with KD is a data-efficient alternative.
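
To make the "W1.58" and "grouped" terms concrete, the sketch below ternarizes weights group by group. It assumes a fixed absmean scale per group (a common ternary scheme), whereas the abstract refers to learnable scales, so this is an illustrative stand-in rather than the paper's quantizer; the function names and group size are hypothetical.

```python
import math


def ternarize_group(weights):
    """Map one group of weights to codes in {-1, 0, +1} plus one scale.

    Assumption: scale = mean absolute value of the group, not the learnable
    scale used in the paper.
    """
    scale = sum(abs(w) for w in weights) / len(weights) or 1.0
    codes = [max(-1, min(1, round(w / scale))) for w in weights]
    return codes, scale


def ternarize_grouped(weights, group_size=4):
    """Quantize a flat weight list in independent groups, each with its own scale."""
    return [ternarize_group(weights[i:i + group_size])
            for i in range(0, len(weights), group_size)]


def dequantize(groups):
    return [c * scale for codes, scale in groups for c in codes]


# A ternary code carries log2(3) bits of information, hence "W1.58".
BITS_PER_TERNARY_WEIGHT = math.log2(3)
# Compression ratio implied by the sizes reported in the abstract.
COMPRESSION_RATIO = 2687 / 744
```

The two constants reproduce the headline numbers: log2(3) is about 1.58 bits per weight, and 2,687 MB divided by 744 MB is about 3.61.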


2606.17249 2026-06-17 cs.AR cs.LG cs.NE eess.SP 交叉投稿

From Compression to Deployment: Real-Time and Energy-Efficient FastGRNN on Ultra-Constrained Microcontrollers

从压缩到部署：超受限微控制器上的实时节能FastGRNN

Emre Can Kizilates

发表机构 * Electronics Engineer Independent Researcher, Izmir, Turkey

AI总结针对超受限微控制器，提出端到端开源FastGRNN压缩部署方案，结合低秩分解、稀疏化和量化，在8位和16位MCU上实现实时50Hz推理，模型仅566字节权重，F1达0.918，并贡献了跨平台确定性推理、循环预热延迟、无乘法器查找表和硬件能耗分析。

Comments 14 pages, 8 figures. Code: https://github.com/emre1998/fastgrnn-har

AI中文摘要

现代机器学习的主导轨迹一直是规模化：更大的模型、更大的加速器、更大的内存预算。然而，多年的全球半导体供应限制以及始终在线推理日益增长的能源和碳成本暴露了这一轨迹的脆弱性，并推动了相反的方向：重构AI和ML算法，使其适应已经在可穿戴设备、传感器和边缘设备中大规模生产的小型、无处不在的微控制器。我们提出了FastGRNN（一种紧凑的门控循环单元）的端到端开源复现，部署在两个裸机目标上：8位Arduino（ATmega328P）和16位MSP430（无硬件乘法器；16 KB闪存；512 B SRAM）。我们的压缩流水线结合了低秩权重分解、迭代硬阈值稀疏性和基于张量的Q15训练后量化，并带有显式激活校准。部署的模型占用566字节权重，在HAPT测试集上达到宏F1=0.918（种子0；五个种子的Q15平均值为0.853±0.107）。它在3399个测试窗口上与PyTorch参考实现达到100%预测一致（MCU种子0；五个种子上99.91-100% C等效）。两个平台都支持实时50Hz流式推理（Arduino上每个样本9.21 ms；MSP430上13 ms），其中256条目sigmoid/tanh查找表在无乘法器的MSP430上实现了30.5倍加速。四个贡献扩展了原始FastGRNN论文：（i）跨平台位等效确定性推理；（ii）循环预热延迟的表征（中位数74个样本，1.48秒；最坏情况125个样本，2.50秒，超过100个测试窗口）；（iii）针对无乘法器嵌入式目标的可部署查找表方案；（iv）硬件能耗表征，显示17.7 mW主动推理功率，<0.09 mW空闲功率，以及使用LUT实现96.7%的能耗降低。

英文摘要

The dominant trajectory of modern machine learning has been to scale up: larger models, larger accelerators, larger memory budgets. Yet a multi-year global semiconductor supply constraint and the growing energy and carbon cost of always-online inference expose the fragility of this trajectory and motivate the opposite direction: refactoring AI and ML algorithms to fit the small, ubiquitous microcontrollers already in mass production in wearables, sensors, and edge appliances. We present an end-to-end open-source reproduction of FastGRNN, a compact gated recurrent cell, deployed on two bare-metal targets: the 8-bit Arduino (ATmega328P) and the 16-bit MSP430 (no hardware multiplier; 16 KB Flash; 512 B SRAM). Our compression pipeline combines low-rank weight factorization, iterative hard-thresholding sparsity, and per-tensor Q15 post-training quantization with explicit activation calibration. The deployed model occupies 566 bytes of weights and achieves macro F1 = 0.918 (seed 0; five-seed Q15 mean 0.853 ± 0.107) on the HAPT test set. It matches a PyTorch reference at 100% prediction agreement across 3,399 test windows (MCU seed 0; 99.91-100% C-equivalent across five seeds). Both platforms sustain real-time 50 Hz streaming inference (9.21 ms per sample on Arduino; 13 ms on MSP430), where a 256-entry sigmoid/tanh look-up table delivers a 30.5x speedup on the multiplier-less MSP430. Four contributions extend the original FastGRNN paper: (i) cross-platform bit-equivalent deterministic inference; (ii) characterization of recurrent warm-up latency (median 74 samples, 1.48 s; worst-case 125 samples, 2.50 s over 100 test windows); (iii) a deployable look-up-table recipe for multiplier-less embedded targets; and (iv) hardware energy characterization showing 17.7 mW active inference power, <0.09 mW idle power, and 96.7% energy reduction with the LUT.
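
The Q15 format and the 256-entry look-up table mentioned in this abstract can be sketched in a few lines. The input range of [-8, 8], the nearest-entry indexing, and all names below are assumptions; the abstract does not describe the table layout, and a real MCU build would use integer indexing rather than Python floats.

```python
import math

Q15_ONE = 1 << 15  # Q15: signed 16-bit fixed point with 15 fractional bits


def to_q15(x):
    """Quantize a real value in [-1, 1) to a saturated Q15 integer."""
    return max(-32768, min(32767, int(round(x * Q15_ONE))))


def from_q15(q):
    return q / Q15_ONE


LUT_SIZE = 256
X_MIN, X_MAX = -8.0, 8.0  # assumed table range, not taken from the paper

# Precomputed sigmoid table stored as Q15 integers (2 bytes per entry).
SIGMOID_LUT = [
    to_q15(1.0 / (1.0 + math.exp(-(X_MIN + i * (X_MAX - X_MIN) / (LUT_SIZE - 1)))))
    for i in range(LUT_SIZE)
]


def sigmoid_lut(x):
    """Approximate sigmoid by clamping x and reading the nearest table entry."""
    x = max(X_MIN, min(X_MAX, x))
    idx = int(round((x - X_MIN) / (X_MAX - X_MIN) * (LUT_SIZE - 1)))
    return from_q15(SIGMOID_LUT[idx])


SAMPLE_RATE_HZ = 50
BUDGET_MS = 1000 / SAMPLE_RATE_HZ    # time available per sample at 50 Hz
MEDIAN_WARMUP_S = 74 / SAMPLE_RATE_HZ  # 74 samples of warm-up at 50 Hz
```

The timing constants restate the abstract's arithmetic: 50 Hz leaves 20 ms per sample, so the reported 9.21 ms and 13 ms inference times fit the budget, and 74 samples of warm-up correspond to 1.48 s.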


2606.17566 2026-06-17 cs.DC cs.LG 交叉投稿

AoiZora: Topology-Aware Auto-Parallel Optimization for Inference of Diffusion Transformers

AoiZora: 面向扩散变换器推理的拓扑感知自动并行优化

Kaijian Wang, Yuanyuan Xu, Fanjiang Ye, Ye Cao, Jingwei Zuo, T. S. Eugene Ng, Yarong Mu, Yuke Wang

发表机构 * Rice University（莱斯大学）； Independent Researcher（独立研究者）； Google（谷歌）

AI总结针对扩散变换器推理中的低延迟需求，提出AoiZora编译器，通过拓扑感知的物理布局优化自动并行策略，在TPU子片上实现高达1.42倍的加速。

AI中文摘要

视频扩散已迅速成为关键的生成服务负载，但生成每个片段需要对大型时空潜在变量进行多次去噪迭代，这使得在单个设备上难以实现低延迟推理。因此，去噪步骤通常分布在多个加速器上，而TPU子片已成为一种有吸引力且实用的计算结构。然而，当前的自动并行系统几乎完全在逻辑设备网格上进行搜索，忽略了所选分片在物理TPU互连上的实际布局——这种疏忽导致了大量与拓扑相关的性能损失。我们通过AoiZora填补了这一空白，这是一个专为TPU子片上低延迟视频扩散推理设计的编译器中介拓扑规划器。其指导原则是通过利用编译流程中的不同点，重新连接逻辑分片与物理布局：AoiZora首先从廉价的预编译IR中消除弱分片候选，然后仅编译存活的候选，并使用编译后的HLO结合拓扑感知通信模型对其物理布局进行排序。最终方案沿普通编译器路径实现，保持模型代码、编译器降级、集合内核和网络路由完全不变。在TPU v5e子片上，与现有解决方案相比，AoiZora将Wan 2.1单步去噪延迟降低了多达1.42倍。

英文摘要

Video diffusion has quickly grown into a key generative serving workload, yet producing each clip demands many denoising iterations over large spatio-temporal latents, which puts low-latency inference out of reach on a single device. A denoising step is therefore typically distributed across multiple accelerators, and TPU sub-slices have become an attractive and practical fabric for doing so. Current auto-parallel systems, however, search almost exclusively over logical device meshes and disregard how a chosen sharding is actually laid out on the physical TPU interconnect -- an oversight that leaves large, topology-dependent performance on the table. We address this gap with AoiZora, a compiler-mediated topology planner built for low-latency video diffusion inference on TPU sub-slices. Its guiding principle is to reconnect logical sharding with physical placement by drawing on different points in the compilation flow: AoiZora first eliminates weak sharding candidates from inexpensive pre-compilation IRs, then compiles only the ones that survive and orders their physical placements using compiled HLO together with a topology-aware communication model. The winning plan is realized along the ordinary compiler path, leaving model code, compiler lowering, collective kernels, and network routing entirely intact. On TPU v5e sub-slices, AoiZora reduces Wan 2.1 one-step denoising latency by as much as 1.42x relative to existing solutions.


2404.01965 2026-06-17 cs.LG cs.AI 版本更新

Towards Leveraging AutoML for Sustainable Deep Learning: A Multi-Objective HPO Approach on Deep Shift Neural Networks

迈向利用AutoML实现可持续深度学习：深度移位神经网络上的多目标HPO方法

Leona Hennig, Tanja Tornede, Marius Lindauer

AI总结针对深度学习计算成本高的问题，提出结合多保真度HPO与多目标优化，在深度移位神经网络上同时最大化精度和最小化能耗，实验获得超80%精度且低计算开销。

AI中文摘要

深度学习通过从大型数据集中提取复杂模式，推动了各个领域的发展。然而，深度学习模型的计算需求带来了环境和资源方面的挑战。深度移位神经网络（DSNNs）通过利用移位操作来降低推理时的计算复杂度，提供了一种解决方案。遵循标准DNNs的见解，我们感兴趣的是通过AutoML技术充分利用DSNNs的潜力。我们研究了超参数优化（HPO）的影响，以最大化DSNN性能，同时最小化资源消耗。由于这结合了多目标（MO）优化，其中精度和能耗作为潜在互补目标，我们提出将最先进的多保真度（MF）HPO与多目标优化相结合。实验结果表明了我们方法的有效性，得到了精度超过80%且计算成本低的模型。总体而言，我们的方法加速了高效模型开发，同时实现了可持续的AI应用。

英文摘要

Deep Learning (DL) has advanced various fields by extracting complex patterns from large datasets. However, the computational demands of DL models pose environmental and resource challenges. Deep shift neural networks (DSNNs) offer a solution by leveraging shift operations to reduce computational complexity at inference. Following the insights from standard DNNs, we are interested in leveraging the full potential of DSNNs by means of AutoML techniques. We study the impact of hyperparameter optimization (HPO) to maximize DSNN performance while minimizing resource consumption. Since this combines multi-objective (MO) optimization with accuracy and energy consumption as potentially complementary objectives, we propose to combine state-of-the-art multi-fidelity (MF) HPO with multi-objective optimization. Experimental results demonstrate the effectiveness of our approach, resulting in models with over 80% accuracy and low computational cost. Overall, our method accelerates efficient model development while enabling sustainable AI applications.


2505.00986 2026-06-17 cs.LG cs.CV 版本更新

EmbodiTTA: Resource-Efficient Test-Time Adaptation for Embodied Visual Systems

EmbodiTTA：面向具身视觉系统的资源高效测试时自适应

Xiao Ma, Young D. Kwon, Dong Ma

AI总结提出按需测试时自适应范式OD-TTA，通过轻量域移检测、源域选择和分离批归一化更新，在边缘设备上实现高效准确的自适应，显著降低计算和能耗开销。

AI中文摘要

连续测试时自适应（CTTA）持续对每个到达的数据批次调整部署模型。虽然达到了最优精度，但现有的CTTA方法由于巨大的内存开销和能耗，在资源受限的边缘设备上实际应用性差。本文首先引入一种新范式——按需TTA，仅在检测到显著域移时触发自适应。然后，我们提出OD-TTA，一种用于边缘设备上准确高效自适应的按需TTA框架。OD-TTA包含三项创新技术：1）轻量级域移检测机制，仅在需要时激活TTA，大幅降低总体计算开销；2）源域选择模块，选择合适的源模型进行自适应，确保高且鲁棒的精度；3）解耦的批归一化（BN）更新方案，实现小批量下的内存高效自适应。大量实验表明，OD-TTA在显著降低能量和计算开销的同时，实现了可比甚至更好的性能，使TTA成为实际可行的技术。

英文摘要

Continual test-time adaptation (CTTA) continuously adapts the deployed model on every incoming batch of data. While achieving optimal accuracy, existing CTTA approaches present poor real-world applicability on resource-constrained edge devices, due to the substantial memory overhead and energy consumption. In this work, we first introduce a novel paradigm -- on-demand TTA -- which triggers adaptation only when a significant domain shift is detected. Then, we present OD-TTA, an on-demand TTA framework for accurate and efficient adaptation on edge devices. OD-TTA comprises three innovative techniques: 1) a lightweight domain shift detection mechanism to activate TTA only when it is needed, drastically reducing the overall computation overhead; 2) a source domain selection module that chooses an appropriate source model for adaptation, ensuring high and robust accuracy; and 3) a decoupled Batch Normalization (BN) update scheme to enable memory-efficient adaptation with small batch sizes. Extensive experiments show that OD-TTA achieves comparable or even better performance while reducing the energy and computation overhead remarkably, making TTA a practical reality.


2505.23939 2026-06-17 cs.LG cs.NI 版本更新

Searching Neural Architectures for Sensor Nodes on IoT Gateways

搜索物联网网关上传感器节点的神经架构

Andrea Mattia Garavagno, Edoardo Ragusa, Antonio Frisoli, Paolo Gastaldo

发表机构 * University of Genoa（热那亚大学）

AI总结提出一种在物联网网关上自动设计神经网络的方法，保护数据隐私，在Raspberry Pi Zero 2上10小时内搜索出达到SOTA的架构。

DOI: 10.1109/JIOT.2025.3581442
Journal ref: IEEE Internet of Things Journal, vol. 12, no. 21, pp. 44492-44501, 2025

AI中文摘要

本文提出一种在边缘自动设计神经网络的方法，即使在隐私敏感的物联网应用中也能实现机器学习。该方法在物联网网关上运行，为连接的传感器节点设计神经网络，而无需将收集的数据共享到本地网络之外，将数据保留在采集现场。这种方法有潜力为医疗物联网和工业物联网启用机器学习，在边缘设计硬件友好且定制的神经网络，用于个性化医疗和高级工业服务，如质量控制、预测性维护或故障诊断。通过防止数据泄露到云服务，该方法保护了敏感信息，包括工业机密和个人数据。全面的实验结果表明，在Visual Wake Words数据集上，所提出的方法通过在Raspberry Pi Zero 2上运行不到10小时的搜索过程，可以达到最先进的结果。

英文摘要

This paper presents an automatic method for the design of Neural Networks (NNs) at the edge, enabling Machine Learning (ML) access even in privacy-sensitive Internet of Things (IoT) applications. The proposed method runs on IoT gateways and designs NNs for connected sensor nodes without sharing the collected data outside the local network, keeping the data in the site of collection. This approach has the potential to enable ML for Healthcare Internet of Things (HIoT) and Industrial Internet of Things (IIoT), designing hardware-friendly and custom NNs at the edge for personalized healthcare and advanced industrial services such as quality control, predictive maintenance, or fault diagnosis. By preventing data from being disclosed to cloud services, this method safeguards sensitive information, including industrial secrets and personal data. The outcomes of a thorough experimental session confirm that -- on the Visual Wake Words dataset -- the proposed approach can achieve state-of-the-art results by exploiting a search procedure that runs in less than 10 hours on the Raspberry Pi Zero 2.


2602.06154 2026-06-17 cs.LG cs.CL 版本更新

MoSE: Mixture of Slimmable Experts for Efficient and Adaptive Language Models

MoSE: 混合可瘦身专家实现高效自适应语言模型

Nurbek Tastan, Stefanos Laskaridis, Karthik Nandakumar, Samuel Horvath

AI总结提出MoSE架构，每个专家具有可变宽度的嵌套结构，支持在推理时连续调节精度-计算权衡，通过多宽度训练和轻量级测试时训练实现高效自适应。

Comments Accepted to ICML 2026

详情

AI中文摘要

混合专家（MoE）模型通过稀疏激活专家高效扩展大型语言模型，但一旦选定专家，其执行是完整的。因此，MoE模型中精度与计算之间的权衡通常表现出较大的不连续性。我们提出混合可瘦身专家（MoSE），这是一种MoE架构，其中每个专家具有嵌套的、可瘦身的结构，可以以可变宽度执行。这不仅实现了对激活哪些专家的条件计算，还实现了对每个专家利用多少的条件计算。因此，单个预训练的MoSE模型可以在推理时支持更连续的精度-计算权衡谱。我们提出了一种简单且稳定的训练方法，用于在稀疏路由下训练可瘦身专家，将多宽度训练与标准MoE目标相结合。在推理过程中，我们探索了运行时宽度确定的策略，包括一种轻量级的测试时训练机制，该机制学习如何在固定预算下将路由器置信度/概率映射到专家宽度。在GPT风格模型、各种路由机制、零样本下游推理基准以及DeepSeek模型的持续预训练适应上的实验表明，MoSE在全宽度下匹配或优于标准MoE，并持续将计算-质量边界向更低的推理FLOPs移动。代码可在以下网址找到：this https URL。

英文摘要

Mixture-of-Experts (MoE) models scale large language models efficiently by sparsely activating experts, but once an expert is selected, it is executed fully. Hence, the trade-off between accuracy and computation in an MoE model typically exhibits large discontinuities. We propose Mixture of Slimmable Experts (MoSE), an MoE architecture in which each expert has a nested, slimmable structure that can be executed at variable widths. This enables conditional computation not only over which experts are activated but also over how much of each expert is utilized. Consequently, a single pretrained MoSE model can support a more continuous spectrum of accuracy-compute trade-offs at inference time. We present a simple and stable training recipe for slimmable experts under sparse routing, combining multi-width training with standard MoE objectives. During inference, we explore strategies for runtime width determination, including a lightweight test-time training mechanism that learns how to map router confidence/probabilities to expert widths under a fixed budget. Experiments on GPT-style models, various routing regimes, zero-shot downstream reasoning benchmarks, and continual pre-training adaptation of a DeepSeek model show that MoSE matches or improves on standard MoE at full width and consistently shifts the compute-quality frontier toward lower inference FLOPs. The code can be found at: https://github.com/tnurbek/mose.
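
The idea of a nested expert that runs at a variable width can be sketched with a two-layer feed-forward block whose hidden units are used as a prefix. The layer shapes, the ReLU activation, and the name `slim_expert` are assumptions for illustration; the abstract does not specify MoSE's expert architecture or how widths are chosen at runtime.

```python
def relu(values):
    return [max(0.0, v) for v in values]


def slim_expert(x, w_in, w_out, width):
    """Run a two-layer expert using only the first `width` fraction of hidden units.

    w_in:  one row of input weights per hidden unit (hidden x d_in)
    w_out: one row per output unit, one column per hidden unit (d_out x hidden)
    Narrower widths reuse a prefix of the same weights (nested structure),
    so compute scales roughly with `width`.
    """
    h = max(1, int(len(w_in) * width))
    hidden = relu([sum(w * xi for w, xi in zip(row, x)) for row in w_in[:h]])
    return [sum(row[j] * hidden[j] for j in range(h)) for row in w_out]
```

Calling the same expert with `width=1.0` and `width=0.5` shares the first half of the hidden units, which is what allows one set of weights to serve several compute budgets.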


2603.08001 2026-06-17 cs.LG stat.ML 版本更新

Amortizing Maximum Inner Product Search with Learned Support Functions

通过学习支持函数摊销最大内积搜索

Theo X. Olausson, João Monteiro, Michal Klein, Marco Cuturi

AI总结提出基于回归的摊销MIPS方法，通过训练神经网络直接预测最优键，利用支持函数的凸性加速搜索，在BEIR基准上显著提升IVF匹配率。

AI中文摘要

最大内积搜索（MIPS）是机器学习中的关键子程序，需要从数据库（键）中识别出与给定查询最匹配的向量。我们提出摊销MIPS：一种基于回归的方法，训练神经网络直接预测MIPS解，从而摊销在固定键数据库上从已知分布中重复求解查询的MIPS成本。我们的关键洞察是，MIPS值函数是键集合的支持函数，这是一个经过充分研究的凸函数，其梯度给出最优键。这激发了两种互补的摊销模型：SupportNet，一个输入凸神经网络，用于回归支持函数；以及KeyNet，一个向量值网络，直接回归最优键。SupportNet可以作为聚类路由器，将查询引导到相关的数据库分区，而KeyNet可以作为原始查询的直接替代品，直接输入到现成的索引流水线中。我们在BEIR基准上的实验表明，对于文档嵌入，当考虑计算工作量（无论是FLOPs、探测次数还是挂钟时间）时，学习的SupportNet和KeyNet显著提高了IVF匹配率。我们的代码可在以下网址获取：this https URL。

英文摘要

Maximum inner product search (MIPS) is a crucial subroutine in machine learning, requiring the identification of a vector taken within a database (the keys) that best aligns with a given query. We propose amortized MIPS: a regression-based approach that trains neural networks to directly predict MIPS solutions, amortizing the cost of repeatedly solving MIPS for queries drawn from a known distribution over a fixed key database. Our key insight is that the MIPS value function is the support function of the set of keys, a well-studied convex function whose gradient yields the optimal key. This motivates two complementary amortized models: SupportNet, an input-convex neural network trained to regress the support function, and KeyNet, a vector-valued network that directly regresses the optimal key. SupportNet can serve as a cluster router, steering queries toward relevant database partitions, while KeyNet can be used as a drop-in replacement for the original query, fed directly to off-the-shelf indexing pipelines. Our experiments on the BEIR benchmark show that, for document embeddings, learned SupportNets and KeyNets significantly improve IVF match rates when accounting for compute effort, whether measured in FLOPs, number of probes, or wall-clock time. Our code is available at: https://github.com/apple/ml-amips.
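
The key insight in this abstract, that the MIPS value is the support function of the key set and that its gradient is the optimal key, can be checked numerically on a toy database. The sketch below uses exact brute-force search rather than the learned SupportNet or KeyNet models; the helper names and the three-key database are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def support(q, keys):
    """Support function of the key set: h(q) = max over keys k of <q, k>."""
    return max(dot(q, k) for k in keys)


def mips(q, keys):
    """Exact MIPS solution: the key attaining the maximum inner product."""
    return max(keys, key=lambda k: dot(q, k))


def numerical_gradient(q, keys, eps=1e-6):
    """Forward-difference gradient of the support function at q."""
    base = support(q, keys)
    grad = []
    for i in range(len(q)):
        shifted = list(q)
        shifted[i] += eps
        grad.append((support(shifted, keys) - base) / eps)
    return grad
```

For a query whose best key is unique, `numerical_gradient(q, keys)` agrees with `mips(q, keys)` up to finite-difference error, and `support` satisfies the midpoint convexity inequality because it is a maximum of linear functions of `q`.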


2603.18492 2026-06-17 cs.LG 版本更新

AIMER: Calibration-Free Task-Agnostic MoE Expert Pruning

AIMER: 免校准任务无关的MoE专家剪枝

Zongfang Liu, Guangyi Chen, Shengkun Tang, Yifan Shen, Huan Wang, Xin Yuan

AI总结提出AIMER方法，通过专家权重的集中度模式识别独特专家，实现免校准的任务无关MoE专家剪枝，在7B至47B模型上优于现有方法。

AI中文摘要

混合专家（MoE）语言模型在不增加每token计算量的情况下增加了参数容量，但部署时仍需存储全部专家池，因此专家剪枝对于减少内存和服务开销至关重要。现有的任务无关专家剪枝方法通常依赖校准：它们通过校准集上的路由或激活统计估计专家重要性，使得剪枝决策对校准数据变化敏感，同时引入大量预处理成本。我们提出AIMER（基于均方根绝对均值的重要性专家排序），一种简单的免校准准则，通过捕捉专家权重的集中度模式来识别更独特的专家，使其非常适合任务无关的专家剪枝。在具有不同架构的7B至47B MoE语言模型和16个多样化基准上，AIMER在跨任务能力平衡方面始终优于现有的免校准方法。令人惊讶的是，AIMER还比基于强校准的专家剪枝基线（在广泛使用的任务无关C4语料库上校准）实现了更好的平衡，同时仅需0.22–2.06秒即可对所有专家进行评分。

英文摘要

Mixture-of-Experts (MoE) language models increase parameter capacity without proportional per-token computation, yet deployment still requires storing the full expert pool, making expert pruning important for reducing memory and serving overhead. Existing task-agnostic expert-pruning methods are typically calibration-dependent: they estimate expert importance from routing or activation statistics on a calibration set, making pruning decisions sensitive to calibration-data variation while introducing substantial preprocessing cost. We propose AIMER (Absolute mean over root mean square IMportance for Expert Ranking), a simple calibration-free criterion that identifies more distinct experts by capturing the concentration pattern of expert weights, making it well suited for task-agnostic expert pruning. Across 7B to 47B MoE language models with distinct architectures and 16 diverse benchmarks, AIMER consistently delivers stronger capability balance across diverse tasks than existing calibration-free methods. Surprisingly, AIMER also achieves better balance than strong calibration-based expert-pruning baselines calibrated on the widely used task-agnostic C4 corpus, while requiring only 0.22--2.06 seconds to score all experts.
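
Reading the acronym literally, the criterion is a ratio of the absolute mean to the root mean square of an expert's weights. The sketch below computes that ratio for a flat list of weights; how AIMER aggregates across an expert's matrices and which end of the ranking gets pruned are not stated in the abstract, so the function name and its use here are assumptions.

```python
import math


def abs_mean_over_rms(weights):
    """Ratio mean(|w|) / sqrt(mean(w^2)) for one expert's flattened weights.

    The ratio is scale-invariant and lies in (0, 1] for nonzero weights:
    it equals 1 when all magnitudes are equal and shrinks as the weight
    mass concentrates in a few entries.
    """
    n = len(weights)
    abs_mean = sum(abs(w) for w in weights) / n
    rms = math.sqrt(sum(w * w for w in weights) / n)
    return abs_mean / rms if rms > 0 else 0.0
```

Because the score needs only the weights, it can be computed without any calibration data, which is consistent with the calibration-free framing in the abstract.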


2605.00330 2026-06-17 cs.LG 版本更新

高斯过程后验采样的差分隐私

Tomasz Maciazek

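Affiliations * School of Mathematics, University of Bristol

AI summary: Studies the privacy of Gaussian process posterior sample paths, deriving Rényi-DP bounds that separate posterior-mean leakage from posterior-covariance leakage, showing the key role of effective ridge regularisation, and verifying with membership-inference attacks how leakage depends on regularisation.

Comments 8 pages of main text + 25 pages appendix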

AI中文摘要

我们研究了当整个训练集（包括协变量和响应）是私有时，从高斯过程（GP）发布后验样本路径的隐私性。与添加外部噪声的标准差分隐私（DP）机制不同，后验采样在构造上是随机的。我们表明，这种内在随机性通过推导GP后验样本路径发布的显式Rényi-DP界来提供DP保证。这些界将后验均值泄露与数据相关的后验协方差泄露分开，表明有意义的隐私严重依赖于有效的岭正则化。我们应用成员推断攻击来表明经验泄露遵循对正则化、后验方差和发布的样本路径数量的预测依赖关系。在下游后验采样任务上的效用实验识别了噪声观测机制，其中隐私兼容的正则化以适度的效用损失保留了有用的决策。当需要更强的隐私时，可以通过添加校准的GP噪声来增强内在保证，提供显式的额外隐私调节旋钮。

英文摘要

We study the privacy of releasing posterior sample paths from a Gaussian process (GP) when the entire training set, including covariates and responses, is private. Unlike standard differential-privacy (DP) mechanisms that add external noise, posterior sampling is random by construction. We show that this intrinsic randomness yields DP guarantees by deriving explicit Rényi-DP bounds for GP posterior sample-path release. The bounds separate posterior-mean leakage from data-dependent posterior-covariance leakage, showing that meaningful privacy depends sharply on effective ridge regularisation. We apply membership-inference attacks to show that empirical leakage follows the predicted dependence on regularisation, posterior variance, and the number of released posterior sample paths. Utility experiments on downstream posterior-sampling tasks identify noisy-observation regimes where privacy-compatible regularisation preserves useful decisions with modest utility loss. When stronger privacy is needed, the intrinsic guarantee can be sharpened by adding calibrated GP noise, providing an explicit additional privacy knob.

2503.10945 2026-06-17 cs.LG cs.AI cs.CR stat.ML Version update

Gaussian DP for Reporting Differential Privacy Guarantees in Machine Learning

高斯差分隐私：机器学习中报告差分隐私保证的方法

Juan Felipe Gomez, Bogdan Kulynych, Georgios Kaissis, Flavio P. Calmon, Jamie Hayes, Borja Balle, Antti Honkela

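AI summary: Argues that current differential privacy reporting in machine learning is incomplete and proposes non-asymptotic Gaussian differential privacy (GDP) as the primary reporting format; using numerical accountants and a decision-theoretic metric, it shows that GDP captures the full privacy profile of DP-SGD and related algorithms with virtually no error.

Comments IEEE SatML 2026 (position paper track)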

AI中文摘要

当前报告机器学习算法（如DP-SGD）的差分隐私（DP）保证的做法提供了不完整且可能误导的图景。例如，如果仅知道机制的一个$(\varepsilon, \delta)$，标准分析表明可能存在针对训练数据记录的高精度推理攻击，而更仔细的分析发现，对于大多数实际机制，这种精确攻击并不存在。在这篇立场论文中，我们主张使用_非渐近_高斯差分隐私（GDP）作为机器学习中传达DP保证的主要手段，以避免这些潜在缺点。利用DP文献中的两个最新进展：（i）能够以任意精度计算DP-SGD的隐私配置文件和$f$-DP曲线的开源数值会计，以及（ii）关于DP表示的决策理论度量，我们展示了如何使用数值会计提供GDP的非渐近界，并表明GDP能够以几乎无误差的方式捕获DP-SGD及相关算法的整个隐私配置文件（由该度量量化）。为了支持我们的主张，我们研究了最先进的DP大规模图像分类以及美国十年人口普查的TopDown算法的隐私配置文件，观察到GDP在所有情况下都与其配置文件拟合得非常好。最后，我们讨论了这种方法的优缺点，并探讨了哪些其他隐私机制可以从GDP中受益。

英文摘要

Current practices for reporting differential privacy (DP) guarantees for machine learning (ML) algorithms such as DP-SGD provide an incomplete and potentially misleading picture. For instance, if only a single $(\varepsilon, \delta)$ is known about a mechanism, standard analyses show that there could exist highly accurate inference attacks against training data records, when, upon a more careful analysis, such accurate attacks do not exist for most practical mechanisms. In this position paper, we argue that using _non-asymptotic_ Gaussian Differential Privacy (GDP) as the primary means of communicating DP guarantees in ML avoids these potential downsides. Using two recent developments in the DP literature: (i) open-source numerical accountants capable of computing the privacy profile and $f$-DP curves of DP-SGD to arbitrary accuracy, and (ii) a decision-theoretic metric over DP representations, we show how to provide non-asymptotic bounds on GDP using numerical accountants, and show that GDP can capture the entire privacy profile of DP-SGD and related algorithms with virtually no error, as quantified by the metric. To support our claims, we investigate the privacy profiles of state-of-the-art DP large-scale image classification, and the TopDown algorithm for the U.S. Decennial Census, observing that GDP fits their profiles remarkably well in all cases. We conclude with a discussion on the strengths and weaknesses of this approach, and discuss which other privacy mechanisms could benefit from GDP.
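
To relate a single GDP parameter to the familiar $(\varepsilon, \delta)$ form, the sketch below evaluates the standard closed-form privacy profile of a $\mu$-GDP mechanism from the Gaussian DP literature, $\delta(\varepsilon) = \Phi(-\varepsilon/\mu + \mu/2) - e^{\varepsilon}\,\Phi(-\varepsilon/\mu - \mu/2)$, where $\Phi$ is the standard normal CDF. It does not reproduce the paper's numerical-accountant procedure for obtaining a non-asymptotic $\mu$; the function names `std_normal_cdf` and `gdp_delta` are illustrative assumptions.

```python
import math


def std_normal_cdf(x):
    """Standard normal CDF, written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def gdp_delta(mu, eps):
    """delta(eps) implied by mu-GDP (hypothetical helper name).

    Implements delta = Phi(-eps/mu + mu/2) - exp(eps) * Phi(-eps/mu - mu/2).
    """
    if mu <= 0.0:
        raise ValueError("mu must be positive")
    return (std_normal_cdf(-eps / mu + mu / 2.0)
            - math.exp(eps) * std_normal_cdf(-eps / mu - mu / 2.0))


# A single mu traces out the whole (eps, delta) curve.
profile = [(eps, gdp_delta(1.0, eps)) for eps in (0.0, 0.5, 1.0, 2.0)]
```

One number thus determines every $(\varepsilon, \delta)$ pair on the curve, which is the practical appeal of reporting $\mu$ rather than a single pair.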

2507.15104 2026-06-17 cs.LG cs.AI Version update

AnalogFed: Privacy-Preserving Discovery of Analog Circuits at Scale with Federated Generative AI

AnalogFed: 基于联邦生成式AI的大规模模拟电路隐私保护发现

Qiufeng Li, Shu Hong, Tian Lan, Weidong Cao

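AI summary: Proposes AnalogFed, the first privacy-preserving framework combining federated learning and generative AI for large-scale analog circuit topology discovery; it defends against membership inference attacks with dummy token injection and against model inversion attacks with homomorphic encryption, enabling efficient collaborative design.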

AI中文摘要

生成式AI的最新进展已展现出对现代硬件设计的变革潜力。然而，由于硬件数据集的专有性和孤立性，无法集中进行模型训练，现有的生成式AI驱动方法难以实现大规模电子设计自动化。实现大规模生成式AI驱动的EDA需要一种新颖的隐私保护框架，能够在不损害机密性的情况下利用分布式数据。本文介绍了AnalogFed，这是首个利用联邦学习和生成式AI进行大规模模拟电路拓扑发现的隐私保护框架。AnalogFed在解决关键安全挑战的同时，确立了协作式模拟拓扑设计的可行性：它通过基于虚拟令牌注入的新型输入扰动策略减轻成员推理攻击，并使用定制的高效同态加密防御模型反转攻击。大量实验证明了AnalogFed的有效性和效率，在保持模型效用的同时实现了强大的隐私保护。该框架为下一代基于生成式AI的硬件设计自动化中的可扩展多方协作奠定了基础。

英文摘要

Recent advances in generative AI (GenAI) have shown transformative potential for modern hardware design. However, existing GenAI-driven approaches fall short of enabling large-scale electronic design automation (EDA) due to the proprietary and siloed nature of hardware datasets, which cannot be centralized for model training. Achieving at-scale GenAI-driven EDA, therefore, requires a novel privacy-preserving framework that can leverage distributed data without compromising confidentiality. This work introduces AnalogFed, the first privacy-preserving framework for large-scale analog circuit topology discovery using federated learning (FedL) and GenAI. AnalogFed establishes the feasibility of collaborative analog topology design while addressing key security challenges: it mitigates membership inference attacks (MIAs) through a novel input perturbation strategy based on dummy token injection, and defends against model inversion attacks with customized, efficient homomorphic encryption. Extensive experiments demonstrate AnalogFed's effectiveness and efficiency, achieving strong privacy protection without degrading model utility. This framework lays the foundation for scalable, multi-party collaboration in next-generation hardware design automation with GenAI.

2508.06692 2026-06-17 cs.LG Version update

HeteRo-Select: Informativeness as the Participation Driver in Heterogeneous Federated Learning

HeteRo-Select: 信息量作为异构联邦学习中的参与驱动因素

Md. Akmol Masud, Md Abrar Jahin, Mahmud Hasan

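AI summary: Proposes HeteRo-Select, a framework that replaces bandwidth with a per-client informativeness score as the driver of compression; the score jointly determines client selection, compression ratio, and aggregation weight, lowering heterogeneity and traffic, with a 1.78x speedup and an 18.2% traffic reduction on CIFAR-10.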

AI中文摘要

联邦学习系统通常根据链路速度分配梯度压缩。当带宽和数据信息量一致时，这是合理的。然而，在非IID数据下，这些信号常常去相关或反转。基于带宽的分配器可能最严重地压缩信息量最大的梯度。我们提出HeteRo-Select，一个用每个客户端的信息量分数替代带宽作为压缩主要驱动因素的框架。该分数联合控制每轮的三个决策：客户端选择、压缩比和服务器聚合权重，带宽仅作为硬上限保留。分数比例选择可证明地降低所选子集的有效异构性；分数比例压缩可证明地在固定流量下降低聚合top-$k$误差。在精确的FedCG模拟协议下，HeteRo-Select在CIFAR-10上实现了$1.78\times$加速和$18.2\%$流量减少。相同的配置，未经改变，从$7{,}850$参数的逻辑回归扩展到$11.27$M参数的ResNet-18，在四个基准测试中的三个达到了准确率目标。当带宽和信息量被故意反相关时，该方法仍能以比正常带宽运行更少的流量达到目标准确率。

英文摘要

Federated learning systems typically allocate gradient compression by link speed. This is sensible when bandwidth and data informativeness align. However, under non-IID data, these signals often decorrelate or invert. A bandwidth-driven allocator then risks compressing the most informative gradients hardest. We propose HeteRo-Select, a framework that replaces bandwidth with a per-client informativeness score as the primary driver of compression. The score jointly governs three decisions per round: client selection, compression ratio, and server aggregation weight, with bandwidth retained only as a hard ceiling. Score-proportional selection provably reduces the effective heterogeneity of the chosen subset; score-proportional compression provably lowers aggregate top-$k$ error at fixed traffic. Under the exact FedCG simulation protocol, HeteRo-Select delivers a $1.78\times$ speedup and an $18.2\%$ reduction in traffic on CIFAR-10. The same configuration, unchanged, scales from a $7{,}850$-parameter logistic regression to an $11.27$M-parameter ResNet-18, hitting the accuracy target on three of four benchmarks. When bandwidth and informativeness are deliberately anti-correlated, the method still achieves the target accuracy with less traffic than the normal-bandwidth run.
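
The idea of score-proportional compression with bandwidth kept only as a ceiling can be sketched as follows. This is a toy illustration under stated assumptions, not the HeteRo-Select algorithm: the proportional split in `allocate_k`, the integer truncation, and the choice not to redistribute budget lost to a ceiling are all hypothetical, and the abstract does not define how the informativeness score is computed. Only top-$k$ sparsification itself is a standard technique.

```python
def top_k_sparsify(grad, k):
    """Keep the k largest-magnitude entries of grad and zero the rest."""
    if k <= 0:
        return [0.0] * len(grad)
    order = sorted(range(len(grad)), key=lambda i: abs(grad[i]), reverse=True)
    kept = set(order[:k])
    return [g if i in kept else 0.0 for i, g in enumerate(grad)]


def allocate_k(scores, total_k, ceilings):
    """Split a budget of kept entries in proportion to per-client scores.

    Hypothetical rule: each client gets floor(total_k * score / sum(scores)),
    capped by its bandwidth ceiling. Budget lost to a cap is not reassigned.
    """
    total_score = sum(scores)
    if total_score <= 0:
        raise ValueError("scores must sum to a positive value")
    return [min(cap, int(total_k * s / total_score))
            for s, cap in zip(scores, ceilings)]


# Client 0 is more informative but has the tighter bandwidth ceiling.
budgets = allocate_k(scores=[3.0, 1.0], total_k=4, ceilings=[2, 4])
sparse = top_k_sparsify([0.1, -5.0, 2.0, 0.3], k=budgets[0])
```

In this toy run the more informative client would receive three of the four slots, but its ceiling limits it to two, which mirrors the abstract's description of bandwidth as a hard cap and not the primary driver.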

2606.10774 2026-06-17 cs.LG cs.DC Version update

Asynchronous Decentralized Federated Learning over Lossy Wireless Links via Reception- and Age-Aware Aggregation

部分接收下分散式联邦学习的逆概率加权与信息年龄聚合

Chanuka A. S. Hewa Kaluannakkage, Rajkumar Buyya

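Affiliations * University of Melbourne; University of Technology Sydney

AI summary: Addresses selection bias and update staleness in decentralized federated learning over wireless networks by proposing DFL-AA, which combines inverse probability weighting with Age-of-Information weighting; it theoretically removes link-quality bias and outperforms existing baselines in experiments.

Comments 14 pages, 9 figures, research paper for journal submission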

AI中文摘要

在有损无线网络上的分散式联邦学习面临两个关键挑战：选择偏差，即由于部分模型接收，来自劣质链路的更新被系统性地低估；以及更新过时，即异步节点贡献过时信息。我们表明，使用局部填充重建的均匀八卦聚合会引入持久的链路质量诱导偏差，而基于完整性的加权进一步放大了这种效应。为了解决这些挑战，我们提出了DFL-AA（具有自适应AoI加权聚合的分散式联邦学习），它结合了逆概率加权与基于在线EWMA的信道估计来纠正选择偏差，以及基于信息年龄的加权来减轻过时，而无需全局同步。我们从理论上证明DFL-AA在期望上消除了链路质量失真，并通过实验证明在不同丢包率、网络规模和异构无线条件下，其性能持续优于最先进的基线。

英文摘要

Decentralized Federated Learning (DFL) enables collaborative model training across wireless edge nodes, including IoT deployments, autonomous vehicles, UAV swarms, and satellite constellations. Operating over lossy wireless links under constraints, these systems cannot rely on retransmissions, so model parameters must be accepted as partial chunks. This leads to two key failure modes: selection bias, where poor-quality links are systematically under-represented in gossip aggregation, and update staleness, where asynchronous nodes contribute outdated models. We prove that classical gossip aggregation introduces irreducible selection bias proportional to the link-loss rate. We propose DFL-AA (Decentralized Federated Learning with Adaptive AoI-weighted Aggregation), which corrects selection bias using Inverse Probability Weighting (IPW) with online channel estimation and mitigates staleness via Age-of-Information (AoI) decay without requiring a global clock. We prove that DFL-AA removes link-quality distortion in expectation, and we show experimentally that it consistently outperforms state-of-the-art baselines across varying loss rates and heterogeneous channel conditions on fixed directed topologies.
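
The two corrections named in the abstract, inverse probability weighting and Age-of-Information decay, can be combined in a few lines. The sketch below is a minimal illustration, not DFL-AA itself: the exponential decay form, the `decay` and `p_floor` parameters, the EWMA smoothing factor `alpha`, and per-neighbour (as opposed to per-chunk) weighting are all assumptions.

```python
import math


def ewma_update(p_hat, received, alpha=0.1):
    """Online estimate of a link's reception probability (EWMA)."""
    return (1.0 - alpha) * p_hat + alpha * (1.0 if received else 0.0)


def aggregation_weights(p_hats, ages, decay=0.5, p_floor=1e-3):
    """Normalised weights combining IPW with an age penalty.

    Hypothetical form: weight_i is proportional to exp(-decay * age_i) / p_i,
    so rarely received updates are up-weighted and stale ones down-weighted.
    p_floor guards against division by a near-zero estimate.
    """
    raw = [math.exp(-decay * age) / max(p, p_floor)
           for p, age in zip(p_hats, ages)]
    total = sum(raw)
    return [r / total for r in raw]


# Equal freshness: the lossier link (p = 0.3) gets the larger weight.
w_fresh = aggregation_weights([0.9, 0.3], [0.0, 0.0])
# Equal link quality: the staler update (age 4) gets the smaller weight.
w_stale = aggregation_weights([0.5, 0.5], [0.0, 4.0])
```

Dividing by the estimated reception probability is what removes the bias in expectation: an update that arrives only 30% of the time counts roughly three times as much as one that arrives 90% of the time when it does arrive.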

2606.17352 2026-06-17 cs.LG cs.CV New submission

MM++: Unsupervised Scale-Invariant Multilayer OOD Detection via Top-K Gated Feature Fusion

MM++: 无监督尺度不变多层OOD检测通过Top-K门控特征融合

Rahim Hossain, Md Tawheedul Islam Bhuian, Md Farhan Shadiq, Kyoung-Don Kang

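Affiliations * School of Computing, State University of New York at Binghamton

AI summary: Proposes MM++, a framework that identifies discriminative intermediate layers via drops in entropy density and combines them with Ledoit-Wolf regularized covariance matrices for unsupervised, post-hoc, scale-invariant multilayer OOD detection that is robust in both near- and far-OOD settings.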

2606.17435 2026-06-17 cs.LG New submission

MorphStrata: Layer-Specific Perturbations for Generating Morphence Students in Time-Series Moving Target Defense

MorphStrata: 时间序列移动目标防御中生成Morphence学生的层特定扰动

Abhishek Bhardwaj, Arnav Doshi, Anusri Nagarajan, Thanh Quynh Nhu Ta, Mohammad Masum, Robert Chun, Jaydip Sen, Saptarshi Sengupta

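Affiliations * Department of Computer Science, San José State University, San José, CA, USA; Department of Computer Engineering, San José State University, San José, CA, USA; Praxis Business School, Kolkata, India

AI summary: Proposes MorphStrata, a strategy that generates structurally heterogeneous student models through selective, layer-specific stochastic noise injection; it preserves moving target defense robustness while keeping the added training overhead under 1%, and reduces RMSE by up to 24.11% and 97.97% on high-entropy periodic datasets.

Comments 13 pages, 9 figures, 11 tables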

AI中文摘要

时间序列预测模型仍然容易受到基于梯度的对抗攻击，而现有的防御机制通常会在鲁棒性与有限响应和计算成本之间进行权衡。这个问题在移动目标防御中尤为突出，因为维护多个随机化模型实例会显著增加训练开销。在这项工作中，我们引入了MorphStrata，一种具有选择性、层特定随机噪声注入的学生生成策略，扩展了传统的Morphence防御。MorphStrata使用Transformer骨干网络作为教师，并随机扰动选定的架构块，以在学生模型之间创建结构异质性，以应对不同的数据分布和威胁模型。我们在包括Jena Climate、Electricity Load Diagrams和Appliances Energy Prediction在内的一系列基准测试上，使用FGSM、BIM和PGD攻击以及多种攻击强度，与原始Transformer和Morphence骨干网络进行了评估。在不同的数据集和攻击机制下，所提出的集成模型保持了可比较的对抗RMSE。具体来说，对于高熵、周期性的数据集（如AEP数据），MorphStrata在所有攻击和扰动预算下实现了最低的RMSE，在30次随机试验中，在epsilon值为0.5时，相对于静态基线，在FGSM和BIM下分别提高了24.11%和97.97%。在大多数实验中，针对层生成MorphStrata学生导致的训练时间增加不到Morphence MTD基线的1%，同时实现了两位数的对抗RMSE降低。我们还观察到生成的学生的成对L2距离与整体防御有效性之间存在正相关。总之，与现有基线相比，MorphStrata以边际成本差异保持了作为MTD防御的对抗鲁棒性。

英文摘要

Time-series forecasting models remain vulnerable to gradient-based adversarial attacks, while existing defense mechanisms typically incur a trade-off in robustness for bounded response and compute cost. The problem is pronounced in Moving Target Defense (MTD), where maintaining multiple randomized model instances substantially exacerbates the training overhead. In this work, we introduce MorphStrata, a student generation strategy with selective, layer-specific stochastic noise injection that extends the traditional Morphence defense. MorphStrata uses a Transformer backbone as the teacher and perturbs randomly selected architectural blocks to create structured heterogeneity across student models in response to varied data distributions and threat models. We evaluate against vanilla Transformer and Morphence backbones on a suite of benchmarks including the Jena Climate, Electricity Load Diagrams, and Appliances Energy Prediction (AEP) datasets, using FGSM, BIM, and PGD attacks across multiple attack strengths. Across datasets and attack regimes, the proposed ensemble maintains comparable adversarial RMSE. Specifically, for high-entropy, periodic datasets such as the AEP data, MorphStrata achieves the lowest RMSE across all attacks and perturbation budgets, improving over the static baseline by up to 24.11% and 97.97% under FGSM and BIM, respectively, at an epsilon value of 0.5 over 30 randomized trials. Targeting the layers to generate MorphStrata students adds less than 1% to training time over the Morphence MTD baseline in most of the experiments, while yielding double-digit gains in adversarial RMSE reduction. We also observe a positive correlation between higher pairwise L2 distance (among generated students) and overall defense effectiveness. In summary, MorphStrata maintains adversarial robustness as an MTD defense at marginal cost deltas when compared to existing baselines.

2606.17513 2026-06-17 cs.LG cs.AI New submission

Geometry-Aware Post-Hoc Uncertainty Quantification in Operator Learning

几何感知的算子学习事后不确定性量化

Oriol Vendrell-Gallart, Nima Negarandeh, Ramin Bostanabad

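Affiliations * Department of Mechanical and Aerospace Engineering, University of California, Irvine

AI summary: Proposes REEF-GP, a framework that fits a Gaussian process to the residuals of a frozen neural operator and uses the operator's intrinsic coordinate-feature representations to build geometry-aware uncertainties, giving calibrated uncertainty estimates on several PDE benchmarks at far lower cost than deep ensembles.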

AI中文摘要

神经算子为偏微分方程提供快速代理模型，但其确定性预测限制了在需要不确定性量化（UQ）的任务中的使用，尤其是在几何变化下。现有方法主要对网络参数进行不确定性建模，很大程度上忽略了算子本身学习的几何感知表示。我们提出REEF-GP（残差嵌入特征高斯过程），一种事后UQ框架，将高斯过程拟合到冻结神经算子的残差上，该算子的内部嵌入定义了核特征空间。REEF-GP不学习单独的特征映射，而是调整算子固有的坐标-特征表示以构建几何感知的不确定性。为了确保非结构化域上的稳定性和可扩展性，REEF-GP结合了谱归一化投影、异方差几何感知噪声以及高效基于子集的训练，避免了限制性的低秩近似。在五个具有不同几何形状的PDE基准测试中，REEF-GP保持了预测准确性，同时实现了与深度集成相竞争但成本仅为其一小部分的校准不确定性估计。我们的方法在几何分布偏移下保持鲁棒性，不确定性集中在物理上有意义的区域（例如激波前沿）。我们的结果表明，神经算子的准确且可扩展的事后UQ可以直接在其学习的特征空间中实现，为参数中心方法提供了实用替代方案。

英文摘要

Neural operators provide fast surrogates for PDEs but their deterministic predictions limit their use in tasks requiring uncertainty quantification (UQ), especially under geometric variability. Existing approaches primarily model uncertainty in network parameters, largely overlooking the geometry-aware representations learned by the operator itself. We propose REEF-GP (Residual on Embedded Features Gaussian Process), a post-hoc UQ framework that fits a GP to the residuals of a frozen neural operator whose internal embeddings define the kernel feature space. Rather than learning a separate feature map, REEF-GP adapts the operator's intrinsic coordinate-feature representations to construct geometry-aware uncertainties. To ensure stability and scalability on unstructured domains, REEF-GP incorporates spectral-normalized projections, heteroscedastic geometry-aware noise, and efficient subset-based training that avoids restrictive low-rank approximations. Across five PDE benchmarks with varying geometries, REEF-GP preserves predictive accuracy while achieving calibrated uncertainty estimates competitive with deep ensembles but at a fraction of their cost. Our approach remains robust under geometric distribution shift, with uncertainty concentrating in physically meaningful regions (e.g., shock fronts). Our results demonstrate that accurate and scalable post-hoc UQ for neural operators can be achieved directly in their learned feature space, offering a practical alternative to parameter-centric approaches.

2606.17756 2026-06-17 cs.LG New submission

A fairness-aware extension of Stochastic Multicriteria Acceptability Analysis for ranking

一种公平性感知的随机多准则可接受性分析扩展用于排序

Guilherme Dean Pelegrina, Renata Pelissari

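Affiliations * Engineering School, Mackenzie Presbyterian University

AI summary: Proposes SMAA-Fair, which reweights simulated rankings to improve group fairness using the Statistical Parity, rKL, and nDKL metrics, improving the representation of protected groups in favourable positions while preserving robustness.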

AI中文摘要

公平性已成为涉及个人或社会群体的排序问题的核心关注点，特别是在负责任人工智能议程下。在多准则决策分析中，随机多准则可接受性分析（SMAA）为处理不确定性和不完整偏好信息提供了稳健框架，但未明确解决排序结果中的公平性。本文提出SMAA-Fair，一种公平性感知的SMAA扩展用于排序问题。该方法根据模拟排序的群体公平性水平对其重新加权，使得更公平的排序对可接受性指数和中心权重向量贡献更大。该框架独立于聚合模型，并可纳入不同的公平性度量。本研究采用统计均等、归一化折扣Kullback-Leibler散度（rKL）和归一化折扣累积Kullback-Leibler散度（nDKL）。排序通过公平性调整的可接受性矩阵，使用期望排序和最大可接受性排序得出。我们还根据所得排序的公平程度推导中心权重。使用合成数据和真实数据的数值实验表明，SMAA-Fair改善了受保护群体在有利排序位置中的代表性，同时保持对偏好不确定性的鲁棒性。

英文摘要

Fairness has become a central concern in ranking problems involving individuals or social groups, particularly under the Responsible Artificial Intelligence agenda. In Multi-Criteria Decision Analysis, Stochastic Multicriteria Acceptability Analysis (SMAA) provides a robust framework for handling uncertainty and incomplete preference information, but it does not explicitly address fairness in the resulting rankings. This paper proposes SMAA-Fair, a fairness-aware extension of SMAA for ranking problems. The approach reweights the simulated rankings generated by SMAA according to their level of group fairness, so that fairer rankings contribute more strongly to the acceptability indices and central weights vector. The framework is independent of the aggregation model and can incorporate different fairness metrics. In this study, Statistical Parity, normalized discounted Kullback--Leibler divergence (rKL) and normalized discounted cumulative Kullback--Leibler divergence (nDKL) are adopted. Rankings are derived from the fairness-adjusted acceptability matrix using expected ranking and maximum acceptability ranking. We also derive the central weight according to the degree of fairness in the obtained rankings. Numerical experiments with synthetic and real data show that SMAA-Fair improves the representation of protected groups among favourable ranking positions, while preserving robustness to preference uncertainty.

2606.17810 2026-06-17 cs.LG cs.AI New submission

No-Free-Fairness: Fundamental Limits and Trade-offs in Learning Systems

无免费公平：学习系统中的基本限制与权衡

Khoat Than

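Affiliations * Hanoi University of Science and Technology

AI summary: Presents the No-Free-Fairness theorems, which identify three inherent sources of disparity in learning systems: irreducible task cost forces a trade-off between performance and fairness, finite samples induce subgroup disparity, and limited model-class expressivity can make fairness unattainable; unfairness thus stems from the structure of the decision problem, finite data, and model expressivity.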

AI中文摘要

在本文中，我们建立了一组理论不可能性结果，称为无免费公平定理，这些定理识别了学习系统中三个根本性的差异来源。首先，我们证明当任务在某个子群上表现出不可约成本时，任何决策规则都必须在整体性能与差异之间进行权衡，从而产生固有的公平-成本前沿。其次，我们证明即使在理想的无噪声环境中，存在完全公平且准确的解，仅凭有限样本学习就会导致非平凡的子群差异，排除了分布无关的公平保证。更严重的是，强制执行严格的相对公平会造成统计瓶颈：实现低成本可能需要指数级数量的样本。第三，我们证明模型类的局限性可以独立地导致差异：如果模型无法为某个子群表示准确的解，那么无论数据或训练过程如何，公平性都无法实现。总体而言，这些结果表明不公平不仅仅是由于有偏数据或次优优化，而是源于决策问题的内在结构、有限数据的约束以及模型的表达力。我们的框架广泛适用于标准监督学习之外，并表明实现公平需要明确的权衡，应被视为核心设计考虑因素。

英文摘要

In this paper, we establish a set of theoretical impossibility results, termed the No-Free-Fairness theorems, that identify three fundamental sources of disparity in learning systems. First, we show that when a task exhibits irreducible cost on a subgroup, any decision rule must trade off overall performance with disparity, yielding an inherent fairness--cost frontier. Second, we prove that even in ideal, noise-free settings where a perfectly fair and accurate solution exists, finite-sample learning alone induces nontrivial subgroup disparity, ruling out distribution-free fairness guarantees. More seriously, enforcing strict relative fairness creates a statistical bottleneck: achieving low cost may require exponentially many samples. Third, we show that limitations of the model class can independently induce disparity: if the model cannot represent accurate solutions for a subgroup, fairness remains unattainable regardless of data or training procedure. Overall, these results demonstrate that unfairness is not solely a consequence of biased data or suboptimal optimization, but arises from the intrinsic structure of decision problems, the constraints of finite data, and the expressivity of models. Our framework applies broadly beyond standard supervised learning, and suggests that achieving fairness requires explicit trade-offs and should be treated as a core design consideration.

2606.17110 2026-06-17 cs.CR cs.LG Cross-list

Theoretical Foundations of Out-of-Distribution Detection with Reinforcement-Learning-Based Optimizers

Salimeh Sekeh, Xin Zhang

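Affiliations * San Diego State University

AI summary: Proposes a reinforcement-learning-guided optimizer that corrects gradient-descent updates to lower the semantic OOD false positive rate, and theoretically analyzes how model change and environment change affect generalization error.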

AI中文摘要

动态开放世界环境中的分布外（OOD）检测要求模型持续适应不断变化的数据分布，同时泛化到协变量偏移输入并拒绝语义偏移的OOD样本。大多数现有的OOD检测方法仅优化当前步目标，并未明确考虑部署后环境变化如何影响未来的OOD行为。在本文中，我们使用强化学习（RL）引导的优化器为动态OOD检测建立了理论基础，该优化器明确偏好随时间降低语义OOD假阳性率的更新。我们开发了一种新颖的增强优化器，在标准梯度下降（GD）之上使用RL引导的修正项，并展示了其在未来域泛化和语义OOD拒绝方面的改进。我们从模型变化和环境变化泛化误差的角度分析了时间误差分解，并开发了一个新的理论框架来比较GD和RL引导优化器下的泛化误差。

英文摘要

Out-of-distribution (OOD) detection in dynamic open-world environments requires a model to continually adapt to evolving data distributions while generalizing to covariate-shifted inputs and rejecting semantic-shifted OOD examples. Most existing OOD detection methods optimize only the current-step objective and do not explicitly account for how post-deployment environment changes affect future OOD behavior. In this paper, we establish a theoretical grounding for dynamic OOD detection using a reinforcement learning (RL)-guided optimizer that explicitly favors updates that reduce the semantic OOD false positive rate over time. We develop a novel augmented optimizer that uses an RL-guided correction term on top of standard gradient descent (GD) and show its improvement over both future-domain generalization and semantic-OOD rejection. We analyze temporal error decomposition in terms of model-change and environment-change generalization errors and develop a new theoretical framework for comparing the generalization errors under both GD and RL-guided optimizers.

2606.18043 2026-06-17 cs.RO cs.LG Cross-list

Uncertainty Quantification for Flow-Based Vision-Language-Action Models

基于流的视觉-语言-动作模型的不确定性量化

Ralf Römer, Maximilian Seeliger, Saida Liu, Ben Sturgis, Marco Bagatella, Daniel Marta, Andreas Krause, Angela P. Schoellig

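Affiliations * TU Munich; ETH Zurich; MPI IS Tübingen

AI summary: Proposes quantifying epistemic uncertainty in flow-matching models through velocity-field disagreement (VFD) and uses it for failure detection and active fine-tuning, achieving efficient task adaptation on the LIBERO benchmark.

Comments Project page: tum-lsy.github.io/uq_vla/. 28 pages, 12 figures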

AI中文摘要

视觉-语言-动作模型（VLAs）将视觉-语言骨干网络与通过大规模机器人数据集上的流匹配训练的生成式动作头相结合。尽管在机器人操作中表现出强大的经验性能，但VLAs缺乏量化其预测置信度和检测动作可能不可靠的机制。这对于在非平稳环境中的实际部署构成了关键限制，因为模型不可避免地会遇到其预训练分布之外的场景，并可能在没有警告的情况下失败。为了解决这个问题，我们通过利用小集成中的速度场差异（VFD），推导出一种量化流匹配模型中认知不确定性的高效方法。我们成功地将这种不确定性估计用于部署期间的故障检测和基于流的VLA的主动微调。为此，我们提出了SAVE，一个不确定性引导的主动多任务微调框架，减少了将VLA适应新任务所需的高成本专家演示数量。通过在LIBERO基准上的广泛实验，我们证明VFD能产生更校准的不确定性估计，预测下游性能，VFD在检测故障方面表现出色，并且使用SAVE进行不确定性引导的数据采集所需的样本比基线至少少22%。总之，我们的工作表明，量化基于流的VLA中的认知不确定性既提高了故障感知能力，也提高了适应性。项目网站：此http URL。

英文摘要

Vision-language-action models (VLAs) combine vision-language backbones with expressive generative action heads trained via flow matching on large-scale robotic datasets. Despite their strong empirical performance in robotic manipulation, VLAs lack mechanisms to quantify confidence in their predictions and to detect when their actions may be unreliable. This presents a critical limitation for real-world deployment in non-stationary environments, where models inevitably encounter scenarios outside their pretraining distribution and may fail without warning. To address this, we derive an efficient method for quantifying epistemic uncertainty in flow-matching models by leveraging velocity-field disagreement (VFD) across a small ensemble. We successfully use this uncertainty estimate for failure detection during deployment and active fine-tuning of flow-based VLAs. To this end, we propose SAVE, a framework for uncertainty-guided active multitask fine-tuning that reduces the number of costly expert demonstrations required to adapt VLAs to new tasks. Through extensive experiments on the LIBERO benchmark, we demonstrate that VFD yields better-calibrated uncertainty estimates predictive of downstream performance, that VFD achieves strong performance in detecting failures, and that uncertainty-guided data acquisition with SAVE requires at least 22% fewer samples than baselines. In summary, our work shows that quantifying epistemic uncertainty in flow-based VLAs improves both failure awareness and adaptation. Project website: tum-lsy.github.io/uq_vla/.

2507.20708 2026-06-17 cs.LG math.OC stat.AP Version update

Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks

揭露公平的幻象：审计对分布操纵攻击的脆弱性

Valentin Lafargue, Adriana Laurindo Monteiro, Emmanuelle Claeys, Laurent Risser, Jean-Michel Loubes

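AI summary: Studies how a malicious auditee can create an illusion of fairness through distributional manipulation, proposes manipulation strategies based on entropic and optimal transport projections, and evaluates how well statistical tests detect them, offering guidance for supervisory verification.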

Journal ref: European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Applied Data Science Track, 2026

AI中文摘要

人工智能系统在高风险领域（包括欧盟AI法案（Regulation (EU) 2024/1689）归类为高风险的领域）的快速部署，加剧了对可靠合规审计的需求。对于二分类器，监管风险评估通常依赖于全局公平性指标，如差异影响比，该指标广泛用于评估潜在歧视。在典型的审计设置中，被审计方将其数据集的一个子集提供给审计方，而监管机构可能验证该子集是否代表完整的底层分布。在这项工作中，我们研究了恶意被审计方在多大程度上可以从一个不合规的原始分布中构建一个符合公平性且看似具有代表性的样本，从而制造公平的幻象。我们将该问题形式化为一个受约束的分布投影任务，并引入基于熵和最优传输投影的数学基础操纵策略。这些构造刻画了满足公平约束所需的最小分布偏移。为了对抗此类攻击，我们通过基于分布距离的统计检验形式化代表性，并系统评估其检测操纵样本的能力。我们的分析强调了公平性操纵在统计上未被检测到的条件，并为加强监管验证提供了实用指南。我们通过在用于偏差检测的标准表格数据集上进行实验来验证我们的理论发现。代码公开于 https://this URL。

英文摘要

The rapid deployment of AI systems in high-stakes domains, including those classified as high-risk under the EU AI Act (Regulation (EU) 2024/1689), has intensified the need for reliable compliance auditing. For binary classifiers, regulatory risk assessment often relies on global fairness metrics such as the Disparate Impact ratio, widely used to evaluate potential discrimination. In typical auditing settings, the auditee provides a subset of its dataset to an auditor, while a supervisory authority may verify whether this subset is representative of the full underlying distribution. In this work, we investigate to what extent a malicious auditee can construct a fairness-compliant yet representative-looking sample from a non-compliant original distribution, thereby creating an illusion of fairness. We formalize this problem as a constrained distributional projection task and introduce mathematically grounded manipulation strategies based on entropic and optimal transport projections. These constructions characterize the minimal distributional shift required to satisfy fairness constraints. To counter such attacks, we formalize representativeness through distributional-distance-based statistical tests and systematically evaluate their ability to detect manipulated samples. Our analysis highlights the conditions under which fairness manipulation can remain statistically undetected and provides practical guidelines for strengthening supervisory verification. We validate our theoretical findings through experiments on standard tabular datasets for bias detection. Code is publicly available at https://github.com/ValentinLafargue/Inspection.
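
The Disparate Impact ratio that the audit relies on is simple to compute, which is part of why a curated sample can game it. The sketch below uses the common definition, the positive-prediction rate of the protected group divided by that of the reference group. The function names and the 0/1 encoding of the protected attribute are assumptions for illustration; the paper's own code lives in the repository linked in the abstract.

```python
def positive_rate(predictions, mask):
    """Share of positive (1) predictions among the rows selected by mask."""
    selected = [p for p, keep in zip(predictions, mask) if keep]
    if not selected:
        raise ValueError("group is empty")
    return sum(selected) / len(selected)


def disparate_impact(predictions, protected):
    """DI = P(pred = 1 | protected) / P(pred = 1 | reference).

    predictions: list of 0/1 model outputs.
    protected: list of 0/1 flags, where 1 marks the protected group
    (an assumed encoding).
    """
    in_group = [flag == 1 for flag in protected]
    out_group = [flag != 1 for flag in protected]
    rate_protected = positive_rate(predictions, in_group)
    rate_reference = positive_rate(predictions, out_group)
    if rate_reference == 0:
        raise ValueError("reference group has no positive predictions")
    return rate_protected / rate_reference


# Protected group: 1 of 2 positive. Reference group: 2 of 2 positive.
di = disparate_impact(predictions=[1, 0, 1, 1], protected=[1, 1, 0, 0])
```

Because the ratio depends only on two group-level rates, dropping or reweighting a modest number of rows can shift it substantially, which is the kind of manipulation the representativeness tests in the abstract are meant to catch.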

2510.11709 2026-06-17 cs.LG cs.AI cs.CV Version update

Adversarial Attacks Leverage Interference Between Features in Superposition

对抗攻击利用特征叠加中的干扰

Edward Stevinson, Lucas Prieto, Melih Barsbey, Tolga Birdal

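AI summary: Shows that interference caused by feature superposition in neural networks can be a source of adversarial vulnerability, and verifies through theory and experiments that interference patterns determine attack success and transferability.

Comments Forty-third International Conference on Machine Learning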

AI中文摘要

为什么对抗样本存在，并且为什么它们能在模型间迁移？现有的解释诉诸于高维几何、输入中的非鲁棒模式以及决策边界结构，但没有一个提供表示层面的机制来解释为什么特定的扰动会成功以及为什么攻击能在模型间迁移。在本文中，我们表明对抗脆弱性可能源于神经网络中高效的信息编码。具体来说，脆弱性可能源于叠加——网络表示的概念数量超过其维度，迫使非正交表示从而产生干扰。这种干扰导致针对一个表示的扰动会影响其他表示，从而产生由干扰模式决定的脆弱性。在精确控制叠加的合成环境中，我们证实叠加足以产生对抗脆弱性。由此产生的攻击是可预测的：PGD发现的扰动与从干扰几何导出的理论最优扰动一致。在相似数据上训练的模型会发展出相似的干扰模式，这解释了攻击的可迁移性。然后我们表明，对图像分类器的成功攻击表现出我们提出的机制所预测的结构。这些发现揭示了对抗脆弱性可能是网络表示压缩的副产品，补充了基于数据属性或架构因素的现有解释。

英文摘要

Why do adversarial examples exist, and why do they transfer between models? Existing explanations appeal to high-dimensional geometry, non-robust patterns in the input, and decision boundary structure, but none provides a representation-level mechanism that explains why specific perturbations succeed and why attacks transfer between models. In this paper, we show that adversarial vulnerability can stem from efficient information encoding in neural networks. Specifically, vulnerability can arise from superposition - the phenomenon where networks represent more concepts than they have dimensions, forcing non-orthogonal representation and thus interference. This interference causes perturbations targeting one representation to affect others, creating vulnerabilities determined by interference patterns. In synthetic settings with precisely controlled superposition, we establish that superposition suffices to create adversarial vulnerability. The resulting attacks are predictable: PGD-discovered perturbations align with theoretically optimal perturbations derived from the interference geometry. Models trained on similar data develop similar interference patterns, explaining attack transferability. We then show that successful attacks on image classifiers exhibit the structure predicted by our proposed mechanism. These findings reveal that adversarial vulnerability can be a byproduct of networks' representational compression, complementing existing explanations based on data properties or architectural factors.

2511.01352 2026-06-17 cs.LG astro-ph.HE astro-ph.IM hep-ex physics.data-an Version update

MiniFool -- Physics-Constraint-Aware Minimizer-Based Adversarial Attacks in Deep Neural Networks

MiniFool——深度神经网络中基于物理约束感知的最小化器对抗攻击

Lucie Flek, Oliver Janik, Philipp Alexander Jung, Akbar Karimi, Timo Saala, Alexander Schmidt, Matthias Schott, Philipp Soldin, Matthias Thiesmeyer, Christopher Wiebusch, Ulrich Willemsen

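AI summary: Proposes MiniFool, an algorithm that generates physics-aware adversarial examples by minimizing a cost function combining a χ² test statistic with the deviation from a target score; it is used to test neural network classifiers in particle and astroparticle physics and to quantify the robustness of network decisions.

Comments Submitted to Computing and Software for Big Science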

DOI: 10.1140/epjc/s10052-026-15773-2
Journal ref: Published in: Eur.Phys.J.C 86 (2026) 6, 641

AI中文摘要

在本文中，我们提出了一种新算法MiniFool，该算法实现了物理启发的对抗攻击，用于测试粒子物理和天体粒子物理中基于神经网络的分类任务。虽然我们最初为IceCube中微子天文台的天体物理tau中微子搜索开发了该算法，但我们将其应用于其他科学领域的更多数据，从而证明了其通用性。在此，我们将该算法应用于著名的MNIST数据集，以及大型强子对撞机CMS实验的开放数据。该算法基于最小化一个代价函数，该函数结合了基于χ²的检验统计量与期望目标分数的偏差。检验统计量根据实验不确定性量化了应用于数据的扰动的概率。对于我们研究的用例，我们发现翻转分类的可能性对于最初正确分类和错误分类的事件是不同的。当测试分类随攻击参数（该参数缩放实验不确定性）的变化时，可以量化网络决策的鲁棒性。此外，这允许测试未标记实验数据分类的鲁棒性。

英文摘要

In this paper, we present a new algorithm, MiniFool, that implements physics-inspired adversarial attacks for testing neural network-based classification tasks in particle and astroparticle physics. While we initially developed the algorithm for the search for astrophysical tau neutrinos with the IceCube Neutrino Observatory, we apply it to further data from other science domains, thus demonstrating its general applicability. Here, we apply the algorithm to the well-known MNIST data set and, furthermore, to open data from the CMS experiment at the Large Hadron Collider. The algorithm is based on minimizing a cost function that combines a $\chi^2$-based test statistic with the deviation from the desired target score. The test statistic quantifies the probability of the perturbations applied to the data based on the experimental uncertainties. For our studied use cases, we find that the likelihood of a flipped classification differs between initially correctly and initially incorrectly classified events. When testing changes of the classifications as a function of an attack parameter that scales the experimental uncertainties, the robustness of the network decision can be quantified. Furthermore, this allows testing the robustness of the classification of unlabeled experimental data.
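
The cost function described above, a $\chi^2$ term penalising perturbations that are improbable given the experimental uncertainties plus a term for the distance to the target score, can be sketched as follows. This is an assumed form and not the published MiniFool code: the squared score deviation, the `weight` factor, the plain sum of the two terms, and the toy `score_fn` are all hypothetical.

```python
import math


def minifool_style_cost(x_adv, x, sigma, score_fn, target, weight=1.0):
    """chi^2 of the perturbation plus weighted squared score deviation.

    x_adv: perturbed input, x: original input, sigma: per-feature
    experimental uncertainties (all lists of equal length).
    score_fn: callable returning the classifier score for an input.
    """
    chi2 = sum(((a - b) / s) ** 2 for a, b, s in zip(x_adv, x, sigma))
    deviation = (score_fn(x_adv) - target) ** 2
    return chi2 + weight * deviation


def toy_score(features):
    """Stand-in classifier (assumption): logistic function of the sum."""
    return 1.0 / (1.0 + math.exp(-sum(features)))


x = [0.0, 0.0]
sigma = [1.0, 2.0]
# Unperturbed input: only the score-deviation term contributes.
cost_clean = minifool_style_cost(x, x, sigma, toy_score, target=1.0)
# Shifting each feature by one sigma adds chi^2 = 2 but moves the score.
cost_shifted = minifool_style_cost([1.0, 2.0], x, sigma, toy_score, target=1.0)
```

Scaling `sigma` by a common factor plays the role of the attack parameter mentioned in the abstract: larger assumed uncertainties make the same perturbation cheaper, so more classifications can be flipped.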

2602.08470 2026-06-17 cs.LG stat.ML 版本更新

Learning Credal Ensembles via Distributionally Robust Optimization

通过分布鲁棒优化学习信度集成

Kaizheng Wang, Ghifari Adam Faza, Fabio Cuzzolin, Siu Lun Chau, David Moens, Hans Hallez

AI总结提出CreDRO方法，通过分布鲁棒优化学习集成模型，捕获由训练与测试数据分布偏移导致的认知不确定性，在分布外检测和选择性分类任务上优于现有方法。

Comments Accepted by ICML 2026 as Spotlight paper (https://icml.cc/virtual/2026/poster/62862)

AI中文摘要

信度预测器是能够感知认知不确定性并产生凸集概率预测的模型。它们提供了一种量化预测认知不确定性（EU）的原则性方法，并已被证明能在各种设置下提高模型鲁棒性。然而，大多数最先进的方法主要将EU定义为由随机训练初始化引起的不一致性，这主要反映对优化随机性的敏感性，而非来自更深层次来源的不确定性。为了解决这一问题，我们将EU定义为在训练数据和测试数据之间i.i.d.假设的不同松弛下训练的模型之间的不一致性。基于这一思想，我们提出CreDRO，通过分布鲁棒优化学习一个由合理模型组成的集成。因此，CreDRO不仅从训练随机性中捕获EU，还从由于训练和测试数据之间潜在分布偏移而产生的有意义的不一致性中捕获EU。实验结果表明，CreDRO在多个基准的分布外检测和医学应用中的选择性分类等任务上，始终优于现有的信度方法。

英文摘要

Credal predictors are models that are aware of epistemic uncertainty and produce a convex set of probabilistic predictions. They offer a principled way to quantify predictive epistemic uncertainty (EU) and have been shown to improve model robustness in various settings. However, most state-of-the-art methods mainly define EU as disagreement caused by random training initializations, which mostly reflects sensitivity to optimization randomness rather than uncertainty from deeper sources. To address this, we define EU as disagreement among models trained with varying relaxations of the i.i.d. assumption between training and test data. Based on this idea, we propose CreDRO, which learns an ensemble of plausible models through distributionally robust optimization. As a result, CreDRO captures EU not only from training randomness but also from meaningful disagreement due to potential distribution shifts between training and test data. Empirical results show that CreDRO consistently outperforms existing credal methods on tasks such as out-of-distribution detection across multiple benchmarks and selective classification in medical applications.

2606.10703 2026-06-17 cs.LG cs.CL 版本更新

From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models

从观察到干预：混合专家模型中专家重要性的因果审计

Leonard Engmann, Christian Medeiros Adriano, Holger Giese

发表机构 * University of California, Berkeley（加州大学伯克利分校）

AI总结通过因果审计发现，混合专家模型中的路由统计指标无法预测专家重要性，现有剪枝方法的成功源于早期层冗余而非识别可删除专家。

Comments 9 pages, 2 figures, 9 tables. Accepted at the ICML 2026 Workshop on Philosophy of Science Meets Machine Learning (PhilML). Camera-ready Version. Non-archival

AI中文摘要

可解释性方法通常使用观察到的模型行为的总体统计量来支持关于针对特定计算的定向干预效果的结论；用Pearl的术语来说，它们将第一层的关联证据视为支持第二层的干预结论，而这种做法的有效性很少被检验。我们考察了一个具体实例：混合专家（MoE）剪枝中路由统计量的使用，其中利用率、激活范数和路由权重分布被视为预测哪些专家可以被移除而不产生功能损失的指标。在三个高冗余MoE架构（OLMoE-1B-7B-0924、Qwen1.5-MoE-A2.7B、DeepSeek-V2-Lite）上进行的token级干预审计发现，没有任何观测指标能预测任何模型中的因果专家重要性：在全部60个指标-层组合中，效应量均低于Cohen's $d = 0.23$，并且在我们经过校正的双重检验标准下，没有任何指标可靠地为正。在相同$n$下运行的逐token路由权重对照排除了统计功效不足的可能，并在OLMoE的最后一个MoE层恢复出一个置信区间不包含零的信号（$d = +0.231$，95% CI $[+0.09, +0.37]$，$p = 0.0013$）。现有剪枝方法在此场景下的成功并非由于识别了可删除的专家，而是因为早期层的冗余使得大多数选择标准可互换。我们的结果提供了一个明确的反例，表明从总体观测统计量到关于专家重要性的token级干预推断这一常见推理步骤存在问题，并展示了干预审计如何校准可解释性主张的证据标准。

英文摘要

Interpretability methods routinely use population-level summary statistics over observed model behaviour to license claims about the effects of targeted interventions on specific computations; in Pearl's terms, they treat rung-1 associational evidence as if it supported rung-2 interventional conclusions, a move whose validity is rarely tested. We examine one concrete instance: the use of routing statistics in Mixture-of-Experts (MoE) pruning, where utilization rates, activation norms, and routing weight distributions are treated as predictors of which experts can be removed without functional cost. A token-level interventional audit across three high-redundancy MoE architectures (OLMoE-1B-7B-0924, Qwen1.5-MoE-A2.7B, DeepSeek-V2-Lite) finds no observational metric predicts causal expert importance in any model: across all 60 metric-layer combinations effect sizes stay below Cohen's $d = 0.23$, and no metric is reliably positive under our corrected, dual-test criterion. A per-token routing weight control, run with identical $n$, rules out insufficient power, recovering a signal whose CI excludes zero at OLMoE's final MoE layer ($d = +0.231$, 95\% CI $[+0.09, +0.37]$, $p = 0.0013$). Existing pruning methods succeed in this regime not by identifying dispensable experts but because early-layer redundancy renders most selection criteria interchangeable. Our results provide an explicit counterexample to the common inferential step from population-level observational summaries to token-level interventional claims about expert importance, and illustrate how interventional audits can calibrate the evidential standards for interpretability claims.

2606.15531 2026-06-17 cs.LG cs.CR 版本更新

Greedy Coordinate Diffusion: Effective and Semantically Coherent Adversarial Attacks via Diffusion Guidance

贪婪坐标扩散：通过扩散引导实现有效且语义一致的对抗攻击

Bohdan Turbal, Blossom Metevier, Max Springer, Aleksandra Korolova

发表机构 * University of Maryland（马里兰大学）； University of California, Berkeley（加州大学伯克利分校）

AI总结提出贪婪坐标扩散方法，利用扩散模型引导生成语义连贯的对抗样本，在保持自然性的同时实现高攻击成功率。

Journal ref: ICML 2026

AI中文摘要

尽管已有大量研究，针对大型语言模型的对抗攻击的实际影响仍然有限。基于优化的攻击，如贪婪坐标梯度（GCG）（Zou et al., 2023），会产生高困惑度、不连贯的后缀，容易被现有防御检测到（Bengio et al., 2024）。此外，在优化过程中强制施加连贯性约束，往往使攻击无法成功诱导出特定的目标响应，导致对鲁棒模型的成功率较低。相反，保持连贯性的攻击常常改变查询的语义意图；当模型遵从这些被改变的查询时，其响应无法满足攻击者的原始目标。在这项工作中，我们提出贪婪坐标扩散（GCD），这是一个新框架，能够高效生成针对安全对齐模型的对抗攻击，同时保持低困惑度以及对攻击者原始意图的高度语义忠实。GCD利用离散扩散语言模型的生成先验来引导对抗后缀的搜索，从而实现语义连贯性和忠实性。与GCG不同，GCD不需要直接访问梯度，因此可以在灰盒设置下运行。我们表明，GCD在响应质量得分上保持竞争力的同时取得了最高的攻击成功率（ASR），并且所构造的对抗提示被基于困惑度的过滤器和守卫模型过滤器检测到的比率低于其他方法。

英文摘要

Adversarial attacks on large language models have limited practical impact despite extensive research. Optimization-based attacks such as Greedy Coordinate Gradient (GCG) (Zou et al., 2023) produce high-perplexity, incoherent suffixes that existing defenses easily detect (Bengio et al., 2024). Moreover, attempting to enforce coherence constraints during optimization often prevents the attack from successfully eliciting the specific targeted response, resulting in low success rates against robust models. Conversely, attacks that maintain coherence often alter the semantic intent of queries; when the model complies with these altered queries, responses fail to address the adversary's original goal. In this work, we introduce Greedy Coordinate Diffusion (GCD), a novel framework that efficiently generates adversarial attacks against safety-aligned models while maintaining low perplexity and high semantic adherence to the adversary's original intent. GCD leverages the generative priors of discrete diffusion language models to guide the search for adversarial suffixes that achieve semantic coherence and adherence. Unlike GCG, GCD does not require direct gradient access, allowing it to operate in a gray-box setting. We show GCD achieves highest ASR while remaining competitive on response-quality scores, and that the constructed adversarial prompts are detected at lower rates than other methods by perplexity-based and guard-model filters.

2411.08821 2026-06-17 stat.ML cs.LG stat.CO 版本更新

Conditional Local Importance by Quantile Expectations

基于分位数期望的条件局部重要性

Kelvyn K. Bladen, Adele Cutler, D. Richard Cutler, Kevin R. Moon

AI总结提出模型无关的局部变量重要性方法CLIQUE，通过分位数期望捕获局部依赖关系，提升稳定性并直接适用于多类分类问题。

Comments 29 pages, 28 figures

Journal ref: Transactions on Machine Learning Research (2026)

AI中文摘要

全局变量重要性度量通常用于解释机器学习模型的结果。局部变量重要性技术评估变量如何影响单个观测。当前流行的方法，包括LIME和SHAP，在预测空间中提供了有用的特征贡献度量，但在模型损失空间中改进局部结构表征方面仍有空间。此外，它们本身不适用于多类分类问题。我们提出了一种新的模型无关的局部变量重要性计算方法CLIQUE，它突出局部依赖关系，比基于置换的方法具有更好的稳定性，并且可以直接应用于多类分类问题。模拟和真实示例表明，CLIQUE强调局部依赖信息，捕获超出相关性可评估的交互行为，并在响应变量对变量变化不变的区域分配零重要性。

英文摘要

Global variable importance measures are commonly used to interpret the results of machine learning models. Local variable importance techniques assess how variables contribute to individual observations. Current, popular methods, including LIME and SHAP, provide useful measures of feature contribution in the prediction space, while leaving opportunities for improved characterization of local structure in the model loss space. Additionally, they are not natively adapted for multi-class classification problems. We propose a new model-agnostic method for calculating local variable importance, CLIQUE, that highlights locally dependent relationships, provides improved stability over permutation-based methods, and can be directly applied to multi-class classification problems. Simulated and real-world examples show that CLIQUE emphasizes locally dependent information, captures interaction behavior beyond what can be evaluated by correlations, and assigns zero importance in regions where the response is invariant to changes in variables.

2602.04155 2026-06-17 stat.ML cs.GT cs.LG 版本更新

Maximin Relative Improvement: Fair Learning as a Bargaining Problem

最大最小相对改进：将公平学习视为讨价还价问题

Jiwoo Han, Moulinath Banerjee, Yuekai Sun

AI总结提出将群体公平解释为子群体间的讨价还价问题，通过相对改进指标恢复Kalai-Smorodinsky解，并给出公理化和有限样本收敛保证。

Comments Accepted at ICML 2026

2603.03824 2026-06-17 cs.AI cs.CL cs.LG cs.MA 版本更新

芬斯勒几何、图神经网络与你

T. Mitchell Roddenberry, Richard G. Baraniuk

发表机构 * Rice University（莱斯大学）

AI总结针对图拉普拉斯只能近似各向同性算子的局限，提出基于芬斯勒拉普拉斯的图神经网络层，证明其收敛性并恢复非线性扩散方程的几何结构。

2606.17531 2026-06-17 cs.LG cs.CG math.AT 新提交

Non-negative Matrix Factorisation with Topological Regularisation

带拓扑正则化的非负矩阵分解

Matias de Jong van Lier, Shizuo Kaji, Keunsu Kim

发表机构 * Recursive Inc.（Recursive公司）； Graduate School of Science, Kyoto University（京都大学理学研究科）； Institute of Mathematics for Industry, Kyushu University（九州大学产业数学研究所）

AI总结提出通过持久同调作为拓扑正则化项融入非负矩阵分解目标函数，以学习具有空间连贯性、周期结构或团状图信号的可解释基函数。

2606.17579 2026-06-17 cs.LG cs.AI cs.CL cs.SI 新提交

LLM Features Can Hurt GNNs: Concatenation Interference on Homophilous Graph Benchmarks

LLM特征可能损害GNN：同配图基准上的拼接干扰

Zhongyuan Wang, Pratyusha Vemuri

AI总结本文发现将LLM特征通过纯输入拼接（而非联合训练）引入图神经网络时，会在同配基准上系统性地降低准确率，并提出了一个衡量LLM单独判别性的指标Delta_sig，用于预测拼接效果。

Comments 29 pages, 8 figures

AI中文摘要

将LLM生成的节点特征添加到图神经网络（GNN）中，被广泛报道能提高标准基准的准确率。我们记录了一个相反的观察：当LLM特征通过纯输入拼接（而非联合训练、蒸馏或提示条件）引入时，它们会在相同的同配基准上系统地降低准确率，而端到端LLM流水线在这些基准上却能成功。使用MLP骨干网络、Planetoid公共划分和词袋原始特征，拼接SBERT编码的GPT-4o-mini TAPE特征导致PubMed测试准确率下降-17.0±0.3个百分点，Cora下降-4.3±0.6个百分点（CiteSeer下降-0.6±0.8个百分点，在种子噪声范围内）。当我们放宽每个条件（GCN/GCNII/GAT骨干网络、随机划分、更小编码器）时，下降幅度减弱，并在中等同配的WikiCS（+4.4个百分点）和ogbn-arxiv（+11.7个百分点）上逆转。为了预测拼接何时有益或有害，我们报告了一个简单的LLM单独判别性指标Delta_sig。在9个数据集上，Delta_sig与拼接成本的相关系数（r^2=0.38）强于同配性（r^2=0.06；N=9，bootstrap置信区间重叠）。bootstrap最佳变点为tau=13.8个百分点，规则“Delta_sig <= tau预测非正拼接成本”正确分类了7/9个数据集；由于60%的bootstrap样本将tau置于[5,30]个百分点之间，我们将Delta_sig视为解释性透镜而非精确过滤器。在PubMed上进行的维度控制消融实验将LLM特征下降置于同源PCA（-2.3个百分点）和同维高斯噪声（-37.3个百分点）之间，排除了维度和权重衰减的影响。九个PubMed配置拟合出幂律|Delta_concat| ∝ (sqrt(d_l/n))^1.31，r^2=0.97；低Delta_sig、小n的角落正是标题中-17个百分点PubMed缺陷出现的位置。

英文摘要

Adding LLM-generated node features to graph neural networks (GNNs) is widely reported to improve accuracy on standard benchmarks. We document a contrasting observation: when LLM features are introduced through pure input concatenation (rather than joint training, distillation, or prompt-conditioning), they can systematically degrade accuracy on the same homophilous benchmarks where end-to-end LLM pipelines succeed. With an MLP backbone on the Planetoid public split and bag-of-words original features, concatenating SBERT-encoded GPT-4o-mini TAPE features reduces PubMed test accuracy by -17.0 +/- 0.3 pp and Cora by -4.3 +/- 0.6 pp (CiteSeer -0.6 +/- 0.8 pp, within seed noise). The drop attenuates as we relax each condition (GCN / GCNII / GAT backbones, random splits, smaller encoders) and reverses on medium-homophily WikiCS (+4.4 pp) and ogbn-arxiv (+11.7 pp). To predict when concatenation helps versus hurts, we report a simple measure of LLM-alone discriminability, Delta_sig. Across 9 datasets Delta_sig correlates with the concatenation cost more strongly than homophily at point estimate (r^2 = 0.38 vs. 0.06; N=9, bootstrap CIs overlap). The bootstrap-best change-point is tau = 13.8 pp, and the rule "Delta_sig <= tau predicts non-positive concat cost" classifies 7/9 datasets correctly; since 60% of bootstrap samples place tau in [5, 30] pp, we treat Delta_sig as an interpretive lens rather than a precision filter. A dimension-controlled ablation on PubMed places the LLM-feature drop between same-source PCA (-2.3 pp) and same-dim Gaussian noise (-37.3 pp), ruling out dimensionality and weight-decay artifacts. Nine PubMed configurations fit a power law |Delta_concat| proportional to (sqrt(d_l/n))^1.31 with r^2 = 0.97; the low-Delta_sig, small-n corner is exactly where the headline -17 pp PubMed deficit appears.

2606.17667 2026-06-17 cs.LG cs.AI 新提交

Handling Feature Heterogeneity with Learnable Graph Patches

处理特征异质性：可学习图块方法

Yifei Sun, Yang Yang, Xiao Feng, Zijun Wang, Haoyang Zhong, Chunping Wang, Lei Chen

发表机构 * Zhejiang University（浙江大学）； Huazhong University of Science and Technology（华中科技大学）； Finvolution Group（信也科技集团）

AI总结提出可学习图块概念，将图分解为语义单元，通过补丁编码器和聚合器实现跨域图数据的可迁移预训练，提升下游任务性能。

Comments Accepted at KDD 2025

DOI: 10.1145/3690624.3709242

AI中文摘要

近年来，基础模型和图预训练技术的快速发展激发了构建通用预训练图模型或图基础模型（GFM）的兴趣。然而，一个重大挑战是现有模型无法处理无文本信息的图数据中的特征异质性，这阻碍了图模型在不同数据集间的可迁移性。为弥补这一差距，我们提出了可学习图块的概念，将其视为任何图数据的最小语义单元。我们通过展开节点特征并分别构建相应的图块结构，将图分解为可学习图块。然后，我们设计了一个框架，从跨域图数据中挖掘可迁移信息。具体来说，在提取图块后，我们提出一个补丁编码器从每个单元中提取知识，以及一个补丁聚合器学习如何将单元组合成整体。由于其领域无关的特性，该模型可应用于不同领域的下游数据。此外，我们分析了我们的方法与现有图模型之间的联系，以及其生成的节点嵌入的可迁移性。实验表明，我们的方法不仅实现了使用多域图进行预训练的能力，而且在各种下游数据集和任务上表现出增强的性能。此外，我们观察到随着预训练数据量的增加，下游性能持续提升。

英文摘要

In recent years, the rapid development of foundation models and graph pre-training technologies has spurred increasing interest in constructing a universal pre-trained graph model or Graph Foundation Model (GFM). However, a significant challenge is that existing models are unable to address feature heterogeneity in graph data without textual information, which hinders the transferability of graph models across different datasets. To bridge this gap, we propose the concept of learnable graph patches, which we regard as the smallest semantic units of any graph data. We decompose the graph into learnable graph patches by unfolding the node features and constructing corresponding patch structures separately. We then design a framework that mines transferable information from graph data across domains. Specifically, after extracting graph patches, we propose a patch encoder to extract knowledge from each unit and a patch aggregator to learn how the units are combined into a whole. Due to its domain-agnostic nature, the model can be applied to downstream data across different domains. Furthermore, we analyze the connection between our method and existing graph models, as well as the transferability of the node embeddings it generates. Empirically, our method not only achieves the capability to use multi-domain graphs for pre-training, but also shows enhanced performance across various downstream datasets and tasks. Moreover, we observe consistent improvement in downstream performance as the volume of pre-training data increases.

2606.18001 2026-06-17 cs.LG 新提交

Half a Link can Be Enough to Predict a Whole Link: Understanding Generalization in Knowledge Graph Foundation Models

半条链接足以预测整条链接：理解知识图谱基础模型中的泛化

Cosimo Gregucci, Obaidah Theeb, Daniel Hernandez, Antonio Vergari, Steffen Staab

发表机构 * Institute for AI, University of Stuttgart（斯图加特大学人工智能研究所）； University of Southampton（南安普顿大学）； University of Edinburgh（爱丁堡大学）

AI总结本文通过分析知识图谱基础模型在未见图上的零样本泛化，发现模型利用部分可见的“半链接”进行预测，并基于此提出四类场景的分类法，揭示现有模型的泛化机制与改进方向。

AI中文摘要

知识图谱（KG）基础模型（KGFMs）是零样本泛化器：只需训练一次，它们就能在未见过的图上预测链接，无需重新训练。然而，理解它们何时以及如何能够在不同KG间稳健泛化仍是一个开放问题。在本文中，我们揭示了它们的泛化机制，强调了它们在未见KG上的性能在涉及部分可见链接（我们称之为半链接）时并非均匀。事实上，我们表明，要预测一个测试三元组$(h,r,t)$，在实践中可能只需在推理图中观察到半链接$(h,r)$或$(r,t)$。这产生了四种场景的分类法，这些半链接的组合被观察到或未被观察到。通过对这些场景进行严格的分层分析，我们揭示了SoTA KGFMs利用可见的半链接进行预测，而不可见的半链接则带来不同的挑战。因此，我们更细粒度的分类法可以作为稳健KGFM泛化的诊断协议，并突出新KGFM可以改进的地方。

英文摘要

Knowledge graph (KG) foundation models (KGFMs) are zero-shot generalizers: trained once, they can predict links on unseen graphs without retraining. However, understanding when and how they can robustly generalize across KGs is still an open question. In this paper, we shed some light on their generalization mechanisms, highlighting how their performance on unseen KGs is not uniform when it comes to partially seen links, which we call half-links. In fact, we show that to predict a test triple $(h,r,t)$ it might suffice in practice to have observed the half-link $(h,r)$ or $(r,t)$ in the inference graph. This yields a taxonomy of four scenarios, depending on which combinations of these half-links are observed. In a rigorous stratified analysis over these scenarios, we reveal that SoTA KGFMs use seen half-links for predictions, while unseen half-links pose different challenges. As such, our finer-grained taxonomy can serve as a diagnostic protocol for robust KGFM generalization and highlights where novel KGFMs can improve.
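
The four-scenario taxonomy in the abstract above can be sketched in a few lines. The abstract says only that the half-links $(h,r)$ and $(r,t)$ are either observed in the inference graph or not; the function name `half_link_scenario` and the scenario labels below are hypothetical and are not taken from the paper.

```python
def half_link_scenario(test_triple, inference_triples):
    """Classify a test triple by which of its half-links appear in the inference graph."""
    h, r, t = test_triple
    # (h, r) is "seen" if some inference triple has the same head and relation.
    head_seen = any(h2 == h and r2 == r for h2, r2, _ in inference_triples)
    # (r, t) is "seen" if some inference triple has the same relation and tail.
    tail_seen = any(r2 == r and t2 == t for _, r2, t2 in inference_triples)
    if head_seen and tail_seen:
        return "both_seen"
    if head_seen:
        return "head_only"
    if tail_seen:
        return "tail_only"
    return "none_seen"

# Tiny illustrative inference graph (made-up entities and relations).
inference_graph = [("a", "likes", "b"), ("c", "likes", "d")]
scenarios = {
    triple: half_link_scenario(triple, inference_graph)
    for triple in [
        ("a", "likes", "d"),
        ("a", "likes", "z"),
        ("z", "likes", "d"),
        ("z", "knows", "b"),
    ]
}
```

Grouping test triples by such a label and reporting link-prediction metrics per group is one plausible way to run a stratified analysis of this kind.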

2606.17684 2026-06-17 stat.ML cs.CY cs.LG 交叉投稿

Geometrical fairness in graph neural networks

图神经网络中的几何公平性

Arturo Pérez-Peralta, Sandra Benítez-Peña, Blas Kolic, Rosa E. Lillo

发表机构 * Department of Statistics, University Carlos III of Madrid, Spain（西班牙马德里卡洛斯三世大学统计系）； uc3m-Santander Big Data Institute（uc3m-桑坦德大数据研究所）

AI总结针对图神经网络中公平性问题，通过修改拉普拉斯算子引入多种互补变换（子空间投影、频谱调整、频率滤波）来缓解偏差，理论分析并实验验证了公平性提升与竞争性能。

Comments 32 pages, 21 tables, 6 figures

AI中文摘要

基于图的学习方法因其在多种应用中的强大性能而日益突出。其中，基于扩散过程的最新框架提供了一个统一的视角，扩展了传统的图神经网络公式，同时解决了标准消息传递机制的局限性。尽管取得了这些进展，但此类模型的公平性问题仍然令人担忧，因为它们可能传播或放大数据中存在的偏差。在这项工作中，我们通过修改底层拉普拉斯算子，引入了一种基于图扩散的公平性感知适应方法。我们的方法结合了多种互补变换，包括子空间投影、频谱调整和基于频率的滤波，以减轻与偏差相关的成分。利用图扩散的内在平滑特性，我们对由此产生的行为进行了原则性分析，并建立了公平性属性的理论见解。我们在合成数据集和真实数据集上评估了所提出的框架，结果表明，在有限的计算成本下，它实现了具有竞争力的性能，同时提高了公平性指标。

英文摘要

Graph-based learning methods have become increasingly prominent due to their strong performance across diverse applications. Among these, recent frameworks grounded in diffusion processes provide a unifying perspective that extends traditional graph neural network formulations while addressing limitations of standard message-passing mechanisms. Despite these advances, concerns remain regarding the fairness of such models, as they may propagate or amplify biases present in the data. In this work, we introduce a fairness-aware adaptation of graph-based diffusion by modifying the underlying Laplacian operator. Our approach incorporates multiple complementary transformations, including subspace projections, spectral adjustments, and frequency-based filtering, to mitigate bias-related components. Leveraging the intrinsic smoothing properties of graph diffusion, we provide a principled analysis of the resulting behavior and establish theoretical insights into fairness properties. We evaluate the proposed framework on both synthetic and real-world datasets, demonstrating that it achieves competitive performance while improving fairness metrics with limited additional computational cost.

2401.14381 2026-06-17 cs.LG math.DG 版本更新

Manifold GCN: Diffusion-based Convolutional Neural Network for Manifold-valued Graphs

Manifold GCN：基于扩散的流形值图卷积神经网络

Martin Hanik, Gabriele Steidl, Christoph von Tycowicz

发表机构 * BIFOLD - Berlin Institute for the Foundations of Learning and Data（柏林学习与数据基础研究院）； Technical University Berlin（柏林工业大学）； Zuse Institute Berlin（柏林楚泽研究所）

AI总结提出两种适用于黎曼流形特征图的图神经网络层：基于流形值图扩散方程的扩散层和受向量神经元启发的切向多层感知器，两者在节点置换和流形等距下等变，在更广泛问题上优于任务特定网络。

Comments Extended ADNI experiment

DOI: 10.1007/s11263-026-02899-9
Journal ref: International Journal of Computer Vision, Volume 134, article number 315 (2026)

AI中文摘要

我们提出了两种适用于黎曼流形中特征图的图神经网络层。首先，基于流形值图扩散方程，我们构建了一个可应用于任意数量节点和图连接模式的扩散层。其次，通过将向量神经元框架的思想迁移到我们的通用设置中，我们建模了一个切向多层感知器。这两层在节点置换和特征流形的等距变换下都是等变的。这些特性在许多深度学习任务中带来了有益的归纳偏置。此外，它们还支持新颖、更灵活的特征设计。合成数据上的数值示例以及基于右海马体三角网格的阿尔茨海默病分类应用证明了我们新层的实用性：虽然它们适用于更广泛的问题类别，但在性能上优于任务特定的最先进网络。

英文摘要

We propose two graph neural network layers for graphs with features in a Riemannian manifold. First, based on a manifold-valued graph diffusion equation, we construct a diffusion layer that can be applied to an arbitrary number of nodes and graph connectivity patterns. Second, we model a tangent multilayer perceptron by transferring ideas from the vector neuron framework to our general setting. Both layers are equivariant under node permutations and the feature manifold's isometries. These properties have led to a beneficial inductive bias in many deep-learning tasks. Furthermore, they enable novel, more flexible feature designs. Numerical examples on synthetic data and an Alzheimer's classification application on triangle meshes of the right hippocampus demonstrate the usefulness of our new layers: While they apply to a much broader class of problems, they outperform task-specific state-of-the-art networks.

2507.11178 2026-06-17 cs.LG cs.AI 版本更新

A Gradient-based Causal Discovery Framework with Applications to Complex Industrial Processes

基于梯度的因果发现框架及其在复杂工业过程中的应用

Meiliang Liu, Huiwen Dong, Xiaoxiao Yang, Yunfang Xu, Mingbao Yang, Zijin Li, Zhengye Si, Xinyue Yang, Zhiwen Zhao

AI总结提出GRNGC方法，通过对模型输入输出梯度施加L1正则化推断Granger因果，仅需一个预测模型，降低计算开销，在多个基准和真实数据集上优于现有方法。

Comments 9 pages, 3 figures, conference

AI中文摘要

随着深度学习技术的发展，各种基于神经网络的Granger因果模型已被提出。尽管这些模型表现出显著改进，但仍存在若干局限性。大多数现有方法采用组件式架构，需要为每个时间序列构建单独的模型，导致大量计算成本。此外，对神经网络第一层权重施加稀疏性惩罚以提取因果关系，削弱了模型捕捉复杂交互的能力。为解决这些局限性，我们提出基于梯度正则化的神经Granger因果（GRNGC），该方法仅需一个时间序列预测模型，并对模型输入与输出之间的梯度施加$L_{1}$正则化以推断Granger因果。此外，GRNGC不依赖于特定的时间序列预测模型，可通过KAN、MLP和LSTM等多种架构实现，提供增强的灵活性。在DREAM、Lorenz-96、fMRI BOLD和CausalTime上的数值模拟表明，GRNGC优于现有基线，并显著降低计算开销。同时，在真实世界的DNA、酵母、HeLa和膀胱尿路上皮癌数据集上的实验进一步验证了该模型在重建基因调控网络方面的有效性。

英文摘要

With the advancement of deep learning technologies, various neural network-based Granger causality models have been proposed. Although these models have demonstrated notable improvements, several limitations remain. Most existing approaches adopt the component-wise architecture, necessitating the construction of a separate model for each time series, which results in substantial computational costs. In addition, imposing the sparsity-inducing penalty on the first-layer weights of the neural network to extract causal relationships weakens the model's ability to capture complex interactions. To address these limitations, we propose Gradient Regularization-based Neural Granger Causality (GRNGC), which requires only one time series prediction model and applies $L_{1}$ regularization to the gradient between model's input and output to infer Granger causality. Moreover, GRNGC is not tied to a specific time series forecasting model and can be implemented with diverse architectures such as KAN, MLP, and LSTM, offering enhanced flexibility. Numerical simulations on DREAM, Lorenz-96, fMRI BOLD, and CausalTime show that GRNGC outperforms existing baselines and significantly reduces computational overhead. Meanwhile, experiments on real-world DNA, Yeast, HeLa, and bladder urothelial carcinoma datasets further validate the model's effectiveness in reconstructing gene regulatory networks.
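
The core idea in the abstract above, an $L_{1}$ penalty on the gradient of a single forecasting model's output with respect to its input, can be sketched for a tiny two-layer tanh network whose Jacobian has a closed form. Everything below is an illustrative assumption: the one-step, single-lag setup, the network shape, the function name `input_gradient_l1`, and the reading of the averaged absolute Jacobian as a causality score are not taken from the paper.

```python
import numpy as np

def input_gradient_l1(W1, W2, X):
    """L1 penalty on d(output)/d(input) for the model y = W2 @ tanh(W1 @ x).

    Returns the mean (over samples) L1 norm of the Jacobian and the
    sample-averaged absolute Jacobian, shape (n_outputs, n_inputs).
    """
    H = np.tanh(X @ W1.T)                  # hidden activations, (n_samples, hidden)
    D = 1.0 - H ** 2                       # derivative of tanh at each hidden unit
    # Per-sample Jacobian J[s] = W2 @ diag(D[s]) @ W1, shape (n_samples, out, in).
    J = np.einsum("oh,sh,hi->soi", W2, D, W1)
    penalty = float(np.abs(J).sum(axis=(1, 2)).mean())
    scores = np.abs(J).mean(axis=0)
    return penalty, scores

rng = np.random.default_rng(0)
n_series, hidden = 3, 8
W1 = rng.normal(size=(hidden, n_series))   # input-to-hidden weights
W2 = rng.normal(size=(n_series, hidden))   # hidden-to-output weights
X = rng.normal(size=(16, n_series))        # previous-step values of each series

penalty, scores = input_gradient_l1(W1, W2, X)
```

In a training loop the `penalty` term would be added to the forecasting loss with some regularization weight. Under the assumptions of this sketch, `scores[o, i]` could then be read as a candidate strength for "series `i` helps predict series `o`".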

2605.00725 2026-06-17 cs.LG 版本更新

Weisfeiler Lehman Test on Combinatorial Complexes: Generalized Expressive Power of Topological Neural Networks

组合复形上的Weisfeiler-Lehman测试：拓扑神经网络的泛化表达能力

Jiawen Chen, Qi Shao, Zhiqiang Ge, Duxin Chen, Wenwu Yu

AI总结提出组合复形Weisfeiler-Lehman（CCWL）框架，通过四种结构邻域统一拓扑神经网络的表达能力，并证明在特定条件下可简化为仅使用上下邻域桥信息，实例化为CCIN网络，实验验证其有效性。

AI中文摘要

拓扑神经网络已成为建模超图、单纯复形和胞腔复形等超越成对图的高阶关系结构的有效工具。然而，现有的Weisfeiler-Leman类型表达能力分析通常在不同的结构域上开发，并依赖于特定域的邻域系统，使得它们的表达能力难以在统一形式下进行比较。本文提出了组合复形Weisfeiler-Lehman（CCWL）框架，这是在组合复形上定义的一种统一的表达能力细化。通过利用组合复形表示集合类型关系和部分-整体层次结构的能力，CCWL通过四个结构邻域进行拓扑颜色细化：边界、共边界、下邻接和上邻接。我们证明，在指定的提升映射下，CCWL可以模拟多个特定域的WL类型细化，从而为分析拓扑消息传递提供了共同的理论基线。我们进一步研究了邻域充分性问题，并证明在显式覆盖条件下，仅使用下邻接和上邻接桥信息的简化细化保留了完整四邻域CCWL细化的区分能力。基于这一理论结果，我们将简化细化实例化为组合复形同构网络（CCIN）。在合成和真实世界基准上的实验表明，CCIN在代表性图和拓扑神经网络基线上取得了有竞争力的性能。消融研究和资源效率分析进一步支持了所提出的下/上邻域设计的有效性。

英文摘要

Topological neural networks have emerged as effective tools for modeling higher-order relational structures beyond pairwise graphs, including hypergraphs, simplicial complexes, and cell complexes. However, existing Weisfeiler-Leman type expressivity analyses are typically developed on different structural domains and rely on domain-specific neighborhood systems, making their expressive powers difficult to compare within a common formalism. In this paper, we introduce the Combinatorial Complex Weisfeiler-Leman (CCWL) framework, a unified expressive power refinement defined on combinatorial complexes. By exploiting the ability of combinatorial complexes to represent both set-type relations and part-whole hierarchies, CCWL performs topological color refinement through four structural neighborhoods: boundary, co-boundary, lower adjacency, and upper adjacency. We show that, under specified lifting maps, CCWL can simulate several domain-specific WL-type refinements, thereby providing a common theoretical baseline for analyzing topological message passing. We further study the neighborhood sufficiency problem and prove that, under explicit coverage conditions, a reduced refinement using only lower- and upper-adjacent bridge information preserves the distinguishing power of the full four-neighborhood CCWL refinement. Guided by this theoretical result, we instantiate the reduced refinement as the Combinatorial Complex Isomorphism Network (CCIN). Experiments on synthetic and real-world benchmarks demonstrate that CCIN achieves competitive performance against representative graph and topological neural network baselines. Ablation studies and resource-efficiency analyses further support the effectiveness of the proposed lower/upper-neighborhood design.

2606.12863 2026-06-17 cs.LG 版本更新

Multimodal Graph Negative Learning

多模态图负学习

Zhengyu Wu, Xu Wang, Hongchao Qin, Xunkai Li, Guang Zeng, Rong-Hua Li, Guoren Wang

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出GraphMNL框架，通过负学习解决多模态属性图中节点级分支语义不平衡问题，避免主导分支偏差传播，在Grocery和Reddit M数据集上取得最优性能。

AI中文摘要

多模态属性图（MAGs）将图拓扑与异构模态属性（如文本和图像）集成，从而能够对复杂关系系统进行更丰富的建模。然而，这种表达能力也使得MAGs上的学习依赖于多个语义源，包括结构拓扑、文本和视觉属性，每个都可以被视为节点表示的一个分支。当这些分支在语义信息量和可靠性上因节点而异时，就会出现节点级分支语义不平衡：一个分支为某个节点提供判别性语义，但由于模态质量或结构上下文的偏差，可能会误导另一个节点。现有方法通常通过跨分支一致性或对齐来缓解这种异质性，隐含地将主导预测视为可靠监督。当主导分支有偏差时，强制模仿可能会将其偏差传播到其他分支，并抑制对分类有用的原始语义。我们提出GraphMNL，一种图感知的多模态负学习框架，通过使用负学习作为跨分支指导来解决这个问题。该模型不强制劣质分支模仿教师预测，而是教导它们节点不太可能属于哪些类别。GraphMNL构建分支库，通过图感知可靠性仲裁识别主导和劣质分支，门控不稳定传输，并对非目标类别应用目标保持负学习。这种设计将目标监督与分支指导解耦，使得监督损失学习正确类别，而当分支一致性不可靠时，负学习抑制不太可能的备选类别。通过全面的实验评估，GraphMNL在Grocery数据集上达到72.47%的准确率，在Reddit M数据集上达到76.60的F1分数，取得了最佳性能。

英文摘要

Multimodal attributed graphs (MAGs) integrate graph topology with heterogeneous modality attributes, such as text and images, thereby enabling richer modeling of complex relational systems. However, such expressiveness also makes learning on MAGs depend on multiple semantic sources, including structural topology and textual and visual attributes, each of which can be regarded as a branch for node representation. Node-level branch semantic imbalance arises when these branches differ across nodes in semantic informativeness and reliability: a branch that provides discriminative semantics for one node may mislead another due to bias in modality quality or structural context. Existing methods often mitigate such heterogeneity through cross-branch agreement or alignment, implicitly treating the dominant prediction as reliable supervision. When the dominant branch is biased, forced imitation may propagate its bias to other branches and suppress original semantics that are useful for classification. We propose GraphMNL, a graph-aware multimodal negative learning framework that addresses this issue by using Negative Learning as cross-branch guidance. Instead of forcing inferior branches to imitate a teacher prediction, the model teaches them which classes a node is unlikely to belong to. GraphMNL builds a branch library, identifies dominant and inferior branches via graph-aware reliability arbitration, gates unstable transfer, and applies target-preserving negative learning over non-target classes. This design decouples target supervision from branch guidance so that supervised losses learn the correct class, while Negative Learning suppresses unlikely alternatives when branch agreement is unreliable. In a comprehensive experimental evaluation, GraphMNL achieves the best performance, with 72.47% accuracy on the Grocery dataset and a 76.60 F1 score on the Reddit M dataset.

超越领域：通过可迁移交互模式重用网络技能

Shiqi He, Yue Cui, Feijie Wu, Xinyu Ma, Jiaheng Lu, Yaliang Li, Bolin Ding, Mosharaf Chowdhury

发表机构 * University of Michigan（密歇根大学）； Alibaba Group（阿里巴巴集团）； Purdue University（普渡大学）； McMaster University（麦克马斯特大学）； University of Pennsylvania（宾夕法尼亚大学）

AI总结提出SkillMigrator代理，通过学习可迁移交互模式（TIP）匹配布局结构而非元素引用，实现跨站点技能重用，在WebArena和Mind2Web上成功轨迹的LLM动作数减少8-10%。

AI中文摘要

大型语言模型（LLM）网络代理通常被部署为工具调用者：每轮，模型读取新的页面观察并发出一个结构化工具动作。当每个动作都是低级原语时，视野迅速增长，面向策略的LLM完成次数也随之增加，在Mind2Web和WebArena等基准测试中主导了延迟和成本。因此，最近的系统将重复的交互片段包装为网络技能：从成功轨迹或诱导程序中构建的可调用工具，这样一次调用可以替代多个原语。然而，先前的技能库仍然主要通过指令相似性或粗略的站点元数据触发，这导致在未见站点上技能重用率低，并留下了许多潜在的步骤和令牌减少空间。我们提出了SkillMigrator，一个学习可重用网络技能并通过匹配布局结构而非特定元素引用来跨站点迁移它们的代理。每个诱导技能被存储为可迁移交互模式（TIP）：技能与诱导时快照的结构草图配对。在测试时，SkillMigrator通过布局相似性检索TIP，并将其引用锚定到实时页面。其余堆栈是标准的：具有稳定引用的可访问性快照观察，以及基于原语加技能调用的固定工具调用。与最先进的方法相比，SkillMigrator在匹配成功率的情况下，将WebArena和Mind2Web上成功轨迹的平均LLM动作数减少了8-10%。

英文摘要

Large language model (LLM) web agents are usually deployed as tool callers: each turn, the model reads a fresh page observation and emits one structured tool action. When every action is a low-level primitive, horizons grow quickly and so do policy-facing LLM completions, dominating latency and cost on benchmarks such as Mind2Web and WebArena. Recent systems therefore wrap repeated interaction fragments as web skills: callable tools built from successful trajectories or induced programs, so one call can replace several primitives. However, prior skill libraries are still triggered mainly by instruction similarity or coarse site metadata, which yields low skill reuse on held-out sites and leaves much of the potential step and token reduction on the table. We present SkillMigrator, an agent that learns reusable web skills and transfers them across sites by matching layout structure rather than specific element references. Each induced skill is stored as a transferable interaction pattern (TIP): the skill paired with a structural sketch of the snapshot at induction time. At test time, SkillMigrator retrieves TIPs by layout similarity and grounds their references on the live page. The rest of the stack is standard: accessibility-snapshot observations with stable references, and fixed tool calling over primitives plus skill invocations. Compared with the state-of-the-art approaches, SkillMigrator reduces the average LLM-action count on successful trajectories by 8-10% across both WebArena and Mind2Web at matched success rate.

2512.04524 2026-06-17 cs.LG cs.AI 版本更新

Prototype-Based Semantic Consistency Alignment for Domain Adaptive Retrieval

基于原型语义一致性对齐的域自适应检索

Tianle Hu, Weijun Lv, Na Han, Xiaozhao Fang, Jie Wen, Jiaxing Li, Guoxu Zhou

发表机构 * School of Computer Science and Technology, Guangdong University of Technology（广东工业大学计算机科学与技术学院）； School of Automation, Guangdong University of Technology（广东工业大学自动化学院）； School of Computer Science, Guangdong Polytechnic Normal University（广东技术师范大学计算机科学学院）； School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen（哈尔滨工业大学深圳校区计算机科学与技术学院）； School of Artificial Intelligence, Guangzhou University（广州大学人工智能学院）

AI总结提出原型语义一致性对齐（PSCA）两阶段框架，通过正交原型建立类级语义连接，利用几何邻近性加权伪标签置信度，并在重构特征上量化生成统一哈希码，解决域自适应检索中的类级对齐缺失和量化质量下降问题。

Comments AAAI 2026

AI中文摘要

域自适应检索旨在将知识从有标签的源域迁移到无标签的目标域，实现有效检索的同时缓解域差异。然而，现有方法存在几个根本性局限：1）忽略类级语义对齐，过度追求成对样本对齐；2）缺乏伪标签可靠性考虑或评估标签正确性的几何指导；3）直接量化受域偏移影响的原始特征，损害所学哈希码的质量。鉴于这些局限，我们提出基于原型的语义一致性对齐（PSCA），一种用于有效域自适应检索的两阶段框架。在第一阶段，一组正交原型直接建立类级语义连接，在聚集类内样本的同时最大化类间分离性。在原型学习过程中，几何邻近性通过自适应加权伪标签置信度，为语义一致性对齐提供可靠性指标。所得的隶属度矩阵和原型促进特征重建，确保在重建特征而非原始特征上进行量化，从而改善后续哈希编码质量并无缝连接两个阶段。在第二阶段，特定域的量化函数在相互逼近约束下处理重建特征，生成跨域的统一二进制哈希码。大量实验验证了PSCA在多个数据集上的优越性能。

英文摘要

Domain adaptive retrieval aims to transfer knowledge from a labeled source domain to an unlabeled target domain, enabling effective retrieval while mitigating domain discrepancies. However, existing methods encounter several fundamental limitations: 1) neglecting class-level semantic alignment and excessively pursuing pair-wise sample alignment; 2) lacking either pseudo-label reliability consideration or geometric guidance for assessing label correctness; 3) directly quantizing original features affected by domain shift, undermining the quality of learned hash codes. In view of these limitations, we propose Prototype-Based Semantic Consistency Alignment (PSCA), a two-stage framework for effective domain adaptive retrieval. In the first stage, a set of orthogonal prototypes directly establishes class-level semantic connections, maximizing inter-class separability while gathering intra-class samples. During the prototype learning, geometric proximity provides a reliability indicator for semantic consistency alignment through adaptive weighting of pseudo-label confidences. The resulting membership matrix and prototypes facilitate feature reconstruction, ensuring quantization on reconstructed rather than original features, thereby improving subsequent hash coding quality and seamlessly connecting both stages. In the second stage, domain-specific quantization functions process the reconstructed features under mutual approximation constraints, generating unified binary hash codes across domains. Extensive experiments validate PSCA's superior performance across multiple datasets.

2602.03846 2026-06-17 cs.LG cs.AI 版本更新

PLATE: Plasticity-Tunable Efficient Adapters for Geometry-Aware Continual Learning

PLATE: 可塑性可调的几何感知持续学习高效适配器

Romain Cosentino

AI总结提出无需旧任务数据的持续学习方法PLATE，利用预训练网络的几何冗余性，通过结构化低秩更新显式控制可塑性-保留权衡，提升最坏情况保留保证。

AI中文摘要

我们为预训练模型开发了一种持续学习方法，该方法不需要访问旧任务数据，解决了基础模型适应中预训练分布通常不可用的实际障碍。我们的关键观察是，预训练网络表现出大量的几何冗余性，并且这种冗余性可以通过两种互补的方式加以利用。首先，冗余神经元提供了预训练时代主导特征方向的代理，使得可以直接从预训练权重构建近似受保护的更新子空间。其次，冗余性为可塑性的放置位置提供了自然偏差：通过将更新限制在冗余神经元的子集并约束剩余的自由度，我们获得了在旧数据分布上功能漂移减少且最坏情况保留保证改善的更新族。这些见解催生了PLATE（可塑性可调的高效适配器），一种不需要过去任务数据的持续学习方法，它提供了对可塑性-保留权衡的显式控制。PLATE通过结构化低秩更新ΔW = B A Q^T参数化每一层，其中B和Q从预训练权重一次性计算并保持冻结，只有A在新任务上训练。代码可在 https://github.com/SalesforceAIResearch/PLATE 获取。

英文摘要

We develop a continual learning method for pretrained models that requires no access to old-task data, addressing a practical barrier in foundation model adaptation where pretraining distributions are often unavailable. Our key observation is that pretrained networks exhibit substantial geometric redundancy, and that this redundancy can be exploited in two complementary ways. First, redundant neurons provide a proxy for dominant pretraining-era feature directions, enabling the construction of approximately protected update subspaces directly from pretrained weights. Second, redundancy offers a natural bias for where to place plasticity: by restricting updates to a subset of redundant neurons and constraining the remaining degrees of freedom, we obtain update families with reduced functional drift on the old-data distribution and improved worst-case retention guarantees. These insights lead to PLATE (Plasticity-Tunable Efficient Adapters), a continual learning method requiring no past-task data that provides explicit control over the plasticity-retention trade-off. PLATE parameterizes each layer with a structured low-rank update $\Delta W = B A Q^\top$, where $B$ and $Q$ are computed once from pretrained weights and kept frozen, and only $A$ is trained on the new task. The code is available at https://github.com/SalesforceAIResearch/PLATE.
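The shape of the update described in the PLATE abstract can be illustrated with a small pure-Python sketch. It only shows the algebra of ΔW = B A Qᵀ with B and Q held fixed and A as the sole trainable block. The abstract does not say how B and Q are derived from the pretrained weights, so the matrices below are placeholder values, and the helper names (`matmul`, `transpose`, `plate_style_update`) are hypothetical.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def transpose(X):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]


def plate_style_update(B, A, Q):
    """Structured low-rank update: delta_W = B @ A @ Q^T."""
    return matmul(matmul(B, A), transpose(Q))


# Toy layer of shape 3x4. B (3x2) and Q (4x2) are frozen.
# ASSUMPTION: these values are placeholders, not the paper's construction.
B = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
Q = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]

# A (2x2) is the only block that would be trained on the new task.
A = [[0.5, -1.0], [2.0, 0.0]]

delta_W = plate_style_update(B, A, Q)
trainable_params = len(A) * len(A[0])          # 4 parameters in A
full_layer_params = len(B) * len(Q)            # 12 parameters in a dense 3x4 update
```

In this toy case the update only touches the rows selected by B and the input directions selected by Q, which is the sense in which the frozen factors decide where plasticity is placed.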

2603.01761 2026-06-17 cs.LG cs.AI 版本更新

Position: Modular Memory is the Key to Continual Learning Agents

Position: 模块化记忆是持续学习智能体的关键

Vaggelis Dorovatas, Malte Schwerin, Andrew D. Bagdanov, Lucas Caccia, Antonio Carta, Laurent Charlin, Barbara Hammer, Tyler L. Hayes, Timm Hess, Christopher Kanan, Dhireesha Kudithipudi, Xialei Liu, Vincenzo Lomonaco, Jorge Mendez-Mendez, Darshan Patil, Ameya Prabhu, Elisa Ricci, Tinne Tuytelaars, Gido M. van de Ven, Liyuan Wang, Joost van de Weijer, Jonghyun Choi, Martin Mundt, Rahaf Aljundi

AI总结本文提出通过模块化记忆结合权重内学习与上下文学习，解决持续学习中的灾难性遗忘问题，实现大规模持续适应。

Comments ICML 2026 Position Track Spotlight. This work stems from discussions held at the Dagstuhl seminar on Continual Learning in the Era of Foundation Models (October 2025)

AI中文摘要

基础模型通过大规模预训练和增加测试时计算已经改变了机器学习。尽管在多个领域超越了人类表现，这些模型在持续运行、经验积累和个性化方面仍然存在根本性限制，而这些能力是自适应智能的核心。虽然持续学习研究长期以来一直瞄准这些目标，但其历史上专注于权重内学习（IWL），即更新单个模型的参数以吸收新知识，导致灾难性遗忘成为一个持续挑战。我们的立场是，通过设计模块化记忆，结合权重内学习（IWL）和新出现的上下文学习（ICL）的优势，是实现大规模持续适应的缺失环节。我们概述了一个以模块化记忆为中心的架构的概念框架，该架构利用ICL进行快速适应和知识积累，利用IWL对模型能力进行稳定更新，为持续学习智能体绘制了一条实用的路线图。

英文摘要

Foundation models have transformed machine learning through large-scale pretraining and increased test-time compute. Despite surpassing human performance in several domains, these models remain fundamentally limited in continuous operation, experience accumulation, and personalization, capabilities that are central to adaptive intelligence. While continual learning research has long targeted these goals, its historical focus on in-weight learning (IWL), i.e., updating a single model's parameters to absorb new knowledge, has rendered catastrophic forgetting a persistent challenge. Our position is that combining the strengths of In-Weight Learning (IWL) and the newly emerged capabilities of In-Context Learning (ICL) through the design of modular memory is the missing piece for continual adaptation at scale. We outline a conceptual framework for modular memory-centric architectures that leverage ICL for rapid adaptation and knowledge accumulation, and IWL for stable updates to model capabilities, charting a practical roadmap toward continually learning agents.

2605.01973 2026-06-17 cs.CL cs.LG 版本更新

Learn-To-Learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM

在任意文本条件下学会学习：一种超网络驱动的元门控大语言模型

Luo Ji, Qi Qin, Ningyuan Xi, Teng Chen, Qingqing Gu, Hongyan Li

发表机构 * University of Science and Technology of China（中国科学技术大学）

AI总结提出一种超网络驱动的元门控机制，通过动态调整SwiGLU块中的β参数，使LLM适应不同文本条件，优于微调和元学习基线。

Comments Accepted by ICML2026

2606.17321 2026-06-17 cs.LG cs.CV 新提交

ProCUA-SFT Technical Report

ProCUA-SFT 技术报告

Jaehun Jung, Ximing Lu, Brandon Cui, Muhammad Khalifa, Shaokun Zhang, Hao Zhang, Jin Xu, Amala Sanjay Deshmukh, Karan Sapra, Andrew Tao, Yejin Choi, Jan Kautz, Mingjie Liu, Yi Dong

发表机构 * NVIDIA（英伟达）； University of Washington（华盛顿大学）； Allen Institute for AI（艾伦人工智能研究所）

AI总结提出 ProCUA-SFT 数据集，通过自动化管道从 2484 个应用组合的合成轨迹中蒸馏出 310 万步级 SFT 样本，微调 UI-TARS 7B 在 OSWorld 上达到 45.0% 的成功率，比基线提升 18.7 个百分点。

Comments 15 pages, 5 figures

AI中文摘要

训练计算机使用智能体（CUA）——通过截图和键盘/鼠标操作与图形桌面交互的模型——需要在全桌面环境中收集的大规模、多样化的轨迹数据。最大的公共资源 AgentNet（22.5K 条人类轨迹）在用于监督微调（SFT）时会导致负迁移：在 AgentNet 上继续训练 UI-TARS 7B 导致 OSWorld 成功率从 26.3% 下降到 8-10%。我们提出了 ProCUA-SFT，一个包含 310 万步级 SFT 样本的数据集，这些样本从 2484 个应用组合中的 93K 条合成轨迹中蒸馏得到。该数据集由一个全自动管道生成，该管道（i）在带有真实世界内容的实况桌面上合成有基础的任务——912 个来自 SpreadsheetBench 的电子表格、约 10K 个来自 Zenodo10K 的宽松许可演示文稿以及多应用 OSWorld 配置——以及（ii）在展开前通过二元前置条件检查验证每个任务的可行性。单个 VLM（Kimi-K2.5）作为目标生成器、前置条件判断器和轨迹执行器，消除了规划器-执行器的能力差距。每条轨迹被扩展为步前缀样本，精确复现推理时看到的上下文布局。在 ProCUA-SFT 上微调 UI-TARS 7B 一个 epoch 后，在 OSWorld 上达到 45.0%——比基础模型提升 18.7 个百分点，比 AgentNet 训练的模型高出 35% 以上。ProCUA 的一个子集被纳入 Nemotron 3 Nano Omni 模型的训练数据中，为其计算机使用能力做出了贡献。

英文摘要

Training computer-use agents (CUAs) -- models that interact with graphical desktops through screenshots and keyboard/mouse actions -- requires large-scale, diverse trajectory data collected in full desktop environments. The largest public resource, AgentNet (22.5K human trajectories), leads to negative transfer when used for supervised fine-tuning (SFT): continuing training UI-TARS 7B on AgentNet causes OSWorld success rate to fall from 26.3% to 8-10%. We present ProCUA-SFT, a dataset of 3.1M step-level SFT samples distilled from 93K synthetic trajectories across 2,484 application combinations. The dataset is produced by a fully automated pipeline that (i) synthesizes grounded tasks on live desktops seeded with real-world content -- 912 spreadsheets from SpreadsheetBench, approximately 10K permissively-licensed presentations from Zenodo10K, and multi-application OSWorld configs -- and (ii) verifies each task's feasibility through binary precondition checking before rollout. A single VLM (Kimi-K2.5) serves as goal generator, precondition judge, and trajectory executor, eliminating planner-actor capability gaps. Each trajectory is expanded into step-prefix samples that exactly reproduce the context layout seen at inference time. Fine-tuning UI-TARS 7B on ProCUA-SFT for one epoch yields 45.0% on OSWorld -- an 18.7 percentage-point improvement over the base model and over 35% above AgentNet-trained counterparts. A subset of ProCUA was incorporated into the training data for the Nemotron 3 Nano Omni model, contributing to its computer-use capabilities.
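The step-prefix expansion mentioned in the ProCUA-SFT abstract can be sketched as follows. This is a minimal illustration of turning one trajectory into one training sample per step, where each sample carries the context available just before that step. The field names (`task`, `history`, `observation`, `target`) and the choice to keep only past actions in the history are assumptions; the abstract does not specify the actual context layout.

```python
def expand_to_step_prefix_samples(task, steps):
    """Expand one trajectory into one SFT sample per step.

    steps: list of (observation, action) pairs in execution order.
    Each sample holds the prefix seen before step t plus the action to predict.
    """
    samples = []
    for t, (observation, action) in enumerate(steps):
        samples.append({
            "task": task,
            "history": [past_action for _, past_action in steps[:t]],  # earlier actions only (assumed)
            "observation": observation,                                # current screenshot stand-in
            "target": action,                                          # action the model should emit
        })
    return samples


# Hypothetical three-step trajectory.
trajectory = [
    ("screen_0", "click(File)"),
    ("screen_1", "click(Save As)"),
    ("screen_2", "type(report.xlsx)"),
]
samples = expand_to_step_prefix_samples("Save the sheet as report.xlsx", trajectory)
```

A trajectory with n steps yields n samples, which is consistent with the abstract's ratio of roughly 93K trajectories to 3.1M step-level samples.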

2606.17464 2026-06-17 cs.LG 新提交

CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models

CheckMIABench: 语言模型成员推理攻击的坚实基础

Jeffrey G. Wang, Jason Wang, Marvin Li, Seth Neel

发表机构 * Harvard University（哈佛大学）； Harvard Business School（哈佛商学院）

AI总结为解决成员推理攻击评估中的分布偏移问题，提出基于训练中固定点前后数据同分布的基准框架，在Pythia和OLMo模型上评估多种攻击，并开源模块化库。

AI中文摘要

成员推理攻击（MIA）是评估机器学习模型隐私属性的标准方法。尽管已有多次尝试评估语言模型上的MIA，但现有文献在构建干净评估以测试新技术方面遇到诸多困难。特别是，成员集和非成员集之间的微妙分布偏移可能破坏MIA的统计有效性；最近的研究通过展示没有访问底层模型的“盲”方法在同一基准上的表现远优于已发布方法，强调了这一点。本文利用训练过程中固定点前后的训练数据来自同一分布的洞察，构建了一个用于对LLM进行原则性MIA评估的基准。因此，所有具有中间检查点和公开训练数据的开源模型都可以转化为MIA测试平台。我们将我们的框架应用于针对Pythia和OLMo模型系列（从70M到7B参数）的半打已发布攻击。为促进进一步的隐私研究，我们开源了一个模块化库，用于在此设置中设计和实现攻击：https://github.com/safr-ai-lab/pandora_llm。

英文摘要

Membership inference attacks (MIAs) are a canonical way to assess a machine learning model's privacy properties. Although several attempts have been made to evaluate MIAs on language models, the extant literature has suffered numerous difficulties in constructing clean evaluations to test new techniques. In particular, subtle distribution shifts between member and non-member sets can undermine the statistical validity of MIAs; recent work has underscored this by showing that "blind" methods with no access to the underlying model can perform far better than published methods on the same benchmarks. This paper constructs a benchmark for principled evaluation of MIAs against LLMs, by leveraging the insight that training data before and after a fixed point during training are drawn from the same distribution. Therefore, all open-source models with intermediate checkpoints and public training data can be converted into MIA testbeds. We apply our framework to a half-dozen published attacks on the Pythia and OLMo family of models, from 70M to 7B parameters. To facilitate further privacy research, we open-source a modular library for designing and implementing attacks in this setting: https://github.com/safr-ai-lab/pandora_llm.
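The checkpoint-based construction in the CheckMIABench abstract can be illustrated with a small sketch: examples seen before a fixed training step are treated as members of the checkpoint taken at that step, later examples as non-members, and an attack is scored by AUC. The function names and the example scores are hypothetical and are not taken from the released library.

```python
def split_by_checkpoint(examples, checkpoint_step):
    """Split attack scores into members and non-members of a checkpoint.

    examples: list of (training_step_when_seen, attack_score) pairs.
    Data seen at or before checkpoint_step counts as member data.
    """
    members = [score for step, score in examples if step <= checkpoint_step]
    nonmembers = [score for step, score in examples if step > checkpoint_step]
    return members, nonmembers


def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member (ties count half)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(nonmember_scores))


# Hypothetical attack scores (higher means "more likely a member").
examples = [(100, 0.9), (200, 0.7), (300, 0.8), (400, 0.2)]
members, nonmembers = split_by_checkpoint(examples, checkpoint_step=250)
attack_auc = auc(members, nonmembers)
```

Because both groups come from the same training stream, an AUC above 0.5 here is harder to explain by distribution shift alone, which is the point the abstract makes about "blind" baselines.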

2606.17508 2026-06-17 cs.LG cs.DC cs.PL cs.SE 新提交

When the Next Step Is Not One Step: Distribution-Aware Execution Modeling for Concurrent Go Programs

当下一步不是一步：面向并发Go程序的分布感知执行建模

Kaviru Hapuarachchi

发表机构 * University of Colombo School of Computing（科伦坡大学计算学院）

AI总结针对并发程序非确定性调度导致单标签预测困难的问题，提出分布感知训练方法，通过多次运行聚合经验分布并微调7B模型，在真实Go缺陷预测中准确率达36.2%，并降低期望校准误差。

Comments 10 pages, 2 figures

AI中文摘要

ARVO：开源软件可复现漏洞图谱

Xiang Mei, Jordi Del Castillo, Pulkit Singh Singaria, Haoran Xi, Abdelouahab Benchikh, Tiffany Bao, Ruoyu Wang, Yan Shoshitaishvili, Adam Doupé, Hammond Pearce, Brendan Dolan-Gavitt

发表机构 * National Vulnerability Database（国家漏洞数据库）； Google（谷歌）

AI总结提出一种大规模构建可复现漏洞数据集的方法，基于OSS-Fuzz构建含6100+真实漏洞的ARVO数据集，实现81%复现率与89.4%补丁定位精度，解决可复现性、数量与多样性三难问题。

Comments Accepted at IEEE European Symposium on Security and Privacy (EuroS&P) 2026

AI中文摘要

长期以来，在漏洞数据集中实现可复现性、数量和多样性被视为固有的三方权衡，改进一个维度往往以牺牲其他维度为代价。在实践中，可复现性是最常被忽视的维度。这限制了从历史错误数据集中自动提取的内容，并降低了它们对下游安全研究的实用性。在这项工作中，我们提出了一种方法，通过识别大规模错误复现的关键障碍并用通用解决方案加以解决，从而生成一个新的安全数据集，确保大规模多样化漏洞的可复现性。使用这种方法，我们为最大的开源软件漏洞数据集（OSS-Fuzz）引入了完全可复现性，并构建了ARVO数据集（开源软件可复现漏洞图谱）。ARVO是一个大规模数据集，包含311个项目中的6100多个真实世界漏洞。专注于可复现性，ARVO与现有数据集的不同之处在于，它以可以跨版本一致重建、触发和分析的形式提供每个漏洞。可复现性还使得能够自动识别每个漏洞的相应补丁，并支持代码更改后直接与漏洞交互，这是现有大规模数据集所不具备的能力。在我们的评估中，ARVO成功复现了81%的漏洞，并在定位的补丁上达到了89.4%的准确率。我们还讨论了ARVO对上游实践和下游安全研究的影响。

英文摘要

Achieving reproducibility, quantity, and diversity in vulnerability datasets has long been viewed as an inherent three-way trade-off, where improving one dimension often comes at the cost of the others. In practice, reproducibility has been the dimension most often neglected. This has limited what can be automatically extracted from historical bug datasets, and has reduced their utility for downstream security research. In this work, we propose a method to produce a new security dataset which ensures reproducibility for diverse vulnerabilities at scale by identifying the key obstacles to large-scale bug reproduction and addressing them with general solutions. Using this method, we introduce full reproducibility to the largest open source software vulnerability dataset (OSS-Fuzz) and construct the ARVO dataset (an Atlas of Reproducible Vulnerabilities in Open-source software). ARVO is a large-scale dataset consisting of over 6,100 real-world vulnerabilities across 311 projects. Focusing on reproducibility, ARVO differs from existing datasets by providing each vulnerability in a form that can be consistently rebuilt, triggered, and analyzed across versions. Reproducibility also enables automatic identification of the corresponding patch for each vulnerability and supports direct interaction with vulnerabilities after code changes, capabilities that existing large-scale datasets do not provide. In our evaluation, ARVO successfully reproduces 81% of vulnerabilities and achieves 89.4% accuracy on the located patches. We also discuss ARVO's influence on both upstream practices and downstream security research.

2606.17391 2026-06-17 cs.CL cs.AI cs.LG 交叉投稿

NarrativeWorldBench: A Frontier-Saturated Benchmark and a Latent World Model for Long-Horizon Co-Creative Audio Drama

NarrativeWorldBench：面向长程共创音频剧的前沿饱和基准与潜在世界模型

Logan Mann, Abdur Rahman, Mohammad Saifullah, Taaha Kazi, Vasu Sharma

发表机构 * University of California, Santa Barbara（加州大学圣塔芭芭拉分校）； Pocket FM

AI总结提出NarrativeWorldBench基准，在九种叙事结构指标上评估21个模型，并引入N-VSSM变分状态空间模型，通过Mamba-2骨干和事件条件后验在200集以上维持结构化潜在状态，在长弧一致性和可控性上超越Claude Opus 4.5。

Comments 10 pages. Accepted to the ICML 2026 Workshops on High-dimensional Learning Dynamics (HiLD) and Culture x AI

AI中文摘要

长篇连载音频剧，其剧情弧线跨越200至800集，是一种重要的创意媒介，也是前沿大语言模型（LLM）表现不佳的场景。我们在一组统一的叙事结构指标上，对21个模型进行了基准测试，涵盖经典、微调、开放前沿、封闭前沿和推理层级。所有封闭前沿系统在情节节拍F1上饱和于[0.78, 0.81]区间，并在视界h=200时下降约-0.20 F1。我们引入了NarrativeWorldBench，一个开放基准，包含九种叙事结构指标，在h∈{10, 20, 50, 100, 200}的视界上评估，并在四种印度语言（印地语、泰米尔语、泰卢固语、马拉地语）上进行跨语言评估。我们提出了N-VSSM，一种叙事变分状态空间模型，通过Mamba-2骨干网络和事件条件后验以及8B解码器，在超过200集的时间内维持一个结构化的256维潜在世界状态。N-VSSM在所有视界上保持情节节拍F1≥0.84，计算量仅为封闭前沿区间的1/4。学习到的文化迁移函数将跨语言忠实度提高了+0.20至+0.23 Likert分。在一项受试者内作家研究（n=12位专业作者，240次试验）中，N-VSSM在长弧一致性上以71%的偏好率优于Claude Opus 4.5，在可控性上评分高出+1.3 Likert分。

英文摘要

Long-form serialized audio drama, with arcs that run for 200 to 800 episodes, is a major creative medium and a setting where frontier large language models (LLMs) fail. We benchmark 21 models, spanning classical, fine-tuned, open-frontier, closed-frontier, and reasoning tiers, on a uniform set of structural narrative metrics. All closed-frontier systems saturate at a plot-beat F1 in the band [0.78, 0.81] and collapse by about -0.20 F1 at horizon h=200. We introduce NarrativeWorldBench, an open benchmark of nine narrative-structure metrics evaluated across horizons h in {10, 20, 50, 100, 200}, with cross-lingual evaluation across four Indic languages (Hindi, Tamil, Telugu, Marathi). We introduce N-VSSM, a Narrative Variational State-Space Model that maintains a structured 256-dimensional latent world state over more than 200 episodes via a Mamba-2 backbone with an event-conditioned posterior and an 8B decoder. N-VSSM holds plot-beat F1 >= 0.84 across all horizons at 4x lower compute than the closed-frontier band. A learned Cultural Transfer Function lifts cross-language fidelity by +0.20 to +0.23 Likert points. In a within-subjects writer study (n = 12 professional authors, 240 trials), N-VSSM is preferred over Claude Opus 4.5 on long-arc consistency 71% of the time and rated +1.3 Likert points higher on controllability.

2606.17529 2026-06-17 cs.CE cs.LG 交叉投稿

Domain-Validity-Gated Metamorphic Testing of Scientific ML Surrogates

基于域有效性门控的科学机器学习代理模型蜕变测试

Meng Li, Xiaohua Yang, Jie Liu, Shiyu Yan

发表机构 * School of Computing, University of South China（南华大学计算机学院）； Hunan Engineering Research Center of Software Evaluation and Testing for Intellectual Equipment（湖南智能装备软件评估与测试工程研究中心）； CNNC Key Laboratory on High Trusted Computing（中核高可信计算重点实验室）

AI总结针对科学机器学习代理模型缺乏真实输出的问题，提出域有效性筛选方法将候选蜕变关系转化为可执行测试资产，并在多种代理模型上验证了其有效性。

AI中文摘要

科学机器学习（SciML）代理模型近似昂贵的模拟，但任意输入的精确预期输出不可用（预言机问题）。蜕变测试检查执行间的关系，但候选关系并非自动有效：其前提条件、输出映射以及评分算子的数值下限决定了违反是否有意义。我们研究如何筛选候选蜕变关系（MR）的域有效性，并将其转化为可执行的、无预言机的SciML代理模型测试资产。我们提出：（i）域有效性准则，仅当候选的容差主导算子的数值下限且其前提条件成立时才接受该候选；（ii）MR卡可执行资产格式，记录源案例、变换、度量、容差和类型化的关系级判定；（iii）在MeshGraphNets圆柱流代理模型上的案例研究协议，附带声明账本将每个结果绑定到可追踪工件。在MeshGraphNets检查点上，节点置换达到机器精度，镜像y是有界分布外压力发现而非精确对称，绝对守恒被推迟而参考相对守卫通过。相同的读数在保留轨迹、检查点列表、另外三种架构以及PhysicsNeMo上保持一致。在第二个CFD任务（可压缩翼型）上，谓词反而基于物理原因拒绝不可压缩连续性，表明它推理域有效性而非运行固定检查表。在第二个PDE族上，FNO Burgers和热代理模型运行完整的接受/拒绝/执行判定。证据涵盖两个CFD任务和第二个PDE族，支持从候选MR到可审计SciML测试资产的域有效性感知桥梁，将模型级违反与域外应用区分开。

英文摘要

Scientific machine-learning (SciML) surrogates approximate expensive simulations, but exact expected outputs for arbitrary inputs are unavailable (the oracle problem). Metamorphic testing checks relations across executions, yet a candidate relation is not automatically valid: its preconditions, output mapping, and the numerical floor of the scoring operator determine whether a violation is meaningful. We study how candidate metamorphic relations (MRs) can be screened for domain validity and turned into executable, oracle-free test assets for SciML surrogates. We propose (i) a domain-validity rubric that admits a candidate only when its tolerance dominates the operator's numerical floor and its preconditions hold; (ii) an MR-card executable-asset format recording source cases, transformations, metrics, tolerances, and typed relation-level verdicts; and (iii) a case-study protocol on MeshGraphNets cylinder-flow surrogates, with a claim ledger binding every result to a tracked artifact. On a MeshGraphNets checkpoint, node permutation holds to machine precision, mirror-y is a bounded out-of-distribution stress finding rather than an exact symmetry, and absolute conservation stays deferred while a reference-relative guard passes. The same readings hold across held-out trajectories, a checkpoint roster, three further architectures, and PhysicsNeMo. On a second CFD task (compressible airfoil) the predicate instead rejects incompressible continuity on physical grounds, showing it reasons about domain validity rather than running a fixed checklist. On a second PDE family, FNO Burgers and heat surrogates run full admit/reject/execute verdicts. The evidence spans two CFD tasks and a second PDE family, supporting a validity-aware bridge from candidate MRs to auditable SciML test assets that separates model-level violations from out-of-domain applications.
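The admission rule and the node-permutation relation described in this abstract can be sketched in a few lines. This is an illustrative reading, not the authors' MR-card format: `toy_surrogate` is a hypothetical stand-in for a mesh surrogate, and reading "tolerance dominates the numerical floor" as a strict inequality is an assumption.

```python
def admit_relation(tolerance, numerical_floor, preconditions_hold):
    """Admit a candidate metamorphic relation only if its tolerance exceeds
    the scoring operator's numerical floor and its preconditions hold.
    ASSUMPTION: 'dominates' is modeled as a strict inequality."""
    return preconditions_hold and tolerance > numerical_floor


def toy_surrogate(node_values):
    """Hypothetical permutation-equivariant model: each node's output depends
    on its own value and on the global mean over all nodes."""
    mean = sum(node_values) / len(node_values)
    return [2.0 * v + mean for v in node_values]


def permutation_violation(model, node_values, perm):
    """Largest gap between model(permuted input) and the permuted model(input)."""
    base = model(node_values)
    permuted_output = model([node_values[i] for i in perm])
    expected = [base[i] for i in perm]
    return max(abs(a - b) for a, b in zip(permuted_output, expected))


values = [1.0, 2.0, 4.0]
perm = [2, 0, 1]
tolerance, floor = 1e-9, 1e-12

if admit_relation(tolerance, floor, preconditions_hold=True):
    violation = permutation_violation(toy_surrogate, values, perm)
    verdict = "pass" if violation <= tolerance else "violation"
else:
    verdict = "rejected"
```

Separating the admit step from the execute step means a relation that fails admission is reported as inapplicable rather than as a model defect, which matches the distinction the abstract draws between model-level violations and out-of-domain applications.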

2606.17710 2026-06-17 cs.CV cs.AI cs.CL cs.LG 交叉投稿

Vision-language models for chest radiography do not always need the image

胸部X光片的视觉-语言模型并不总是需要图像

Mahshad Lotfinia, Sebastian Ziegelmayer, Lisa Adams, Daniel Truhn, Andreas Maier, Soroosh Tayebi Arasteh

发表机构 * Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg（弗里德里希-亚历山大-埃尔朗根-纽伦堡大学模式识别实验室）； Department of Diagnostic and Interventional Radiology, TUM University Clinic, School of Medicine and Health, Klinikum rechts der Isar, Technical University of Munich（慕尼黑工业大学医学院与健康学院伊萨尔河右岸医院诊断与介入放射学系）； Lab for AI in Medicine, RWTH Aachen University（亚琛工业大学医学人工智能实验室）； Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen（亚琛工业大学医院诊断与介入放射学系）

AI总结本文通过因果审计方法，发现许多医学视觉-语言模型在胸部X光片任务中依赖文本先验而非图像，纯文本模型与多模态模型性能接近，并提出了基于图像依赖性的评估框架。

AI中文摘要

医学视觉-语言模型报告了强大的胸部X光片准确性，这越来越多地被解读为它们使用了图像的证据。这种推断是不安全的：一个利用发现名称先验的模型得分与真正读取影像的模型相当，且没有标准基准能区分它们。我们引入了一种因果审计方法，通过遮挡相关区域、遮挡无关区域以及替换为另一患者的相同标签扫描来干预图像，并结合三种行为指标测试正确答案是否依赖于图像。在九个系统中，一个没有图像访问权限的纯文本模型达到了最佳多模态模型5.7个准确度点以内的水平，而一个1190亿参数的多模态模型在统计上与70亿参数的纯文本基线无法区分。审计将队列分为三个忽略图像的模型、一个不稳定的模型和五个选择性使用图像的模型（针对部分发现）；这些分类在第二个数据集、分辨率和提示措辞上保持一致。与委员会认证的放射科医生相比，纯文本模型在准确率上与放射科医生无统计差异，但其图像依据率（grounding）为零，而使用图像的模型的图像依据率与放射科医生相当。报告的置信度仅在模型使用图像时才能标记出缺乏图像依据的答案。图像依据审计（而非准确率）应成为临床部署的门槛。

英文摘要

Medical vision-language models report strong chest radiograph accuracy, and this is increasingly read as evidence that they use the image. That inference is unsafe: a model exploiting finding-name priors scores like one that reads the scan, and no standard benchmark separates them. We introduce a causal audit that intervenes on the image, occluding the relevant region, occluding an irrelevant one, and swapping in another patient's same-label scan, and combines three behavioral metrics to test whether a correct answer depends on the image. Across nine systems, a text-only model with no image access reaches within 5.7 accuracy points of the best multimodal one, and a 119-billion-parameter multimodal model is statistically indistinguishable from a 7-billion text-only baseline. The audit splits the cohort into three models that ignore the image, one that is unstable, and five that use it selectively, for a subset of findings; the categories hold across a second dataset, resolution, and prompt phrasing. Against board-certified radiologists, a text-only model is statistically indistinguishable from a radiologist's accuracy while grounding at zero, whereas the image-using models ground at radiologist-comparable rates. Reported confidence flags ungrounded answers only when a model uses the image. Grounding audits, not accuracy, should gate clinical deployment.

2606.18011 2026-06-17 stat.ML cs.LG stat.ME 交叉投稿

Fast Nonparametric Conditional Independence Testing via Two-Stage Regression

通过两阶段回归的快速非参数条件独立性检验

Eric V. Strobl

发表机构 * Department of Biomedical Informatics, University of Pittsburgh（匹兹堡大学生物医学信息学系）

AI总结提出BLITZ方法，通过两阶段回归（低阶多项式+浅层树）快速消除条件集影响，实现校准良好的非参数条件独立性检验，适用于因果发现。

Comments A fast R implementation with C++ back-end is available at https://github.com/ericstrobl/BLITZ

AI中文摘要

基于约束的因果发现依赖于重复的条件独立性检验，但快速非参数检验往往牺牲校准性，尤其是当变量通过非线性关系依赖于条件集时。我们提出了BLITZ（Broad-to-Local Independence Testing via residualiZation），一种非参数条件独立性检验，旨在以远低于一秒的时间完成运行，同时保持约束因果发现算法执行数千次查询所需的准确性。BLITZ首先使用低阶多项式回归消除对条件集的广泛平滑依赖，然后应用一个小型非线性特征映射，并通过浅层树回归对这些特征进行残差化。得到的统计量检验残差互协方差，并用矩匹配的卡方分布近似零分布。我们从理论上证明，两阶段设计降低了树残差化器面临的有效复杂度，使得浅层树能够控制残差条件均值偏差，同时避免严重过拟合。在模拟中，BLITZ提供了比快速核、随机特征和基于回归的竞争者更好的零校准，同时保持所测试方法中最快的速度之一。在合成图和流式细胞术数据的因果发现实验中，BLITZ在保留的邻接中产生了更可靠的端点方向，并具有竞争力的结构恢复。这些结果表明，从宽到局部残差化是实现因果发现中校准、可扩展的非参数条件独立性检验的实用途径。

英文摘要

Constraint-based causal discovery relies on repeated conditional independence tests, but fast nonparametric tests often sacrifice calibration, especially when variables depend on the conditioning set through nonlinear relationships. We introduce BLITZ (Broad-to-Local Independence Testing via residualiZation), a nonparametric conditional independence test designed to run well under a second while maintaining the accuracy needed for the thousands of queries performed by constraint-based causal discovery algorithms. BLITZ first removes broad smooth dependence on the conditioning set using low-order polynomial regression, then applies a small nonlinear feature map and residualizes those features with shallow tree regressions. The resulting statistic tests residual cross-covariance, with a moment-matched chi-square approximation to the null distribution. We show theoretically that the two-stage design reduces the effective complexity faced by the tree residualizers, allowing shallow trees to control residual conditional-mean bias while avoiding excessive overfitting. In simulations, BLITZ provides better null calibration than fast kernel, random-feature, and regression-based competitors while remaining among the fastest methods tested. In causal discovery experiments on synthetic graphs and flow-cytometry data, BLITZ yields more reliable endpoint orientations among retained adjacencies and competitive structural recovery. These results suggest that broad-to-local residualization is a practical route to calibrated, scalable nonparametric conditional independence testing for causal discovery.

2606.18166 2026-06-17 cs.CR cs.LG 交叉投稿

Evaluating Open-Source LLMs for Multi-Label ATT&CK Technique Classification on CTI Reports

评估开源大语言模型在CTI报告上的多标签ATT&CK技术分类

Ahmed Ryan, Saad Sakib Noor, Md Erfan, Shaswata Mitra, Sudip Mittal, Md Rayhanur Rahman

发表机构 * The University of Dhaka（达卡大学）

AI总结针对开源LLM在复杂非结构化CTI报告上的ATT&CK分类性能未被评估的问题，构建了2076句人工标注数据集，评估7个开源LLM，最高F1为0.22，表明当前模型不足以用于生产。

AI中文摘要

使用MITRE ATT&CK对网络威胁情报（CTI）进行分类对于主动防御至关重要，但历史上需要大量人工。大语言模型（LLM）之前的自动化加速了这一过程，但无法解决非结构化CTI报告中复杂的语言和多步攻击模式。LLM通过上下文推理理解非结构化文本，解决了以前的局限性。然而，当前的评估依赖于简化的单技术句子，忽略了真实CTI报告的复杂性，往往导致性能结果膨胀。因此，开源LLM在复杂非结构化CTI报告上的基线性能仍未得到评估。为弥补这一差距，我们从83份复杂非结构化CTI报告中构建了一个包含2076句人工标注（1281句技术阳性，795句阴性）的真实数据集。这些句子通过六阶段标注过程映射到114种独特的ATT&CK技术，实现了kappa=0.68的标注者间一致性。利用该数据集，我们评估了7个参数从8B到236B的开源LLM，涉及提示策略和温度配置。性能最高的LLM实现了0.22的微平均F1分数，为复杂非结构化CTI上的多标签ATT&CK分类建立了经验基线。参数大小与F1分数呈统计显著正相关。提示策略和温度在不同模型配置下未产生统计显著的增益。这些结果表明，当前开源LLM不足以用于生产级ATT&CK分类。该数据集、基准和发现为未来的CTI研究提供了可复现的基础。

英文摘要

Classifying Cyber Threat Intelligence (CTI) using MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) is essential for proactive defense, but historically required extensive human effort. Pre-Large Language Model (LLM) automation sped up this process, but could not resolve the complex language and multi-step attack patterns found in unstructured CTI reports. LLMs addressed previous limitations by using contextual reasoning to understand unstructured text. However, current evaluations rely on simplified, single-technique sentences that ignore the complexity of real-world CTI reports, which often leads to inflated performance results. Consequently, the baseline performance of open-source LLMs on complex unstructured CTI reports remains unevaluated. To address this gap, we constructed a ground-truth dataset of 2,076 human-annotated sentences (1,281 technique-positive, 795 negative) from 83 complex unstructured CTI reports. These sentences were mapped to 114 unique ATT&CK techniques using a six-phase annotation process, achieving κ = 0.68 inter-annotator agreement. Using this dataset, we evaluated seven open-source LLMs ranging from 8B to 236B parameters across prompt strategy and temperature configurations. The highest-performing LLM achieved a micro-averaged F1 score of 0.22, establishing the empirical baseline for multi-label ATT&CK classification on complex unstructured CTI. Parameter size showed a statistically significant positive correlation with F1 score. Prompt strategy and temperature produced no statistically significant gains across model configurations. These results indicate that current open-source LLMs are insufficient for production-grade ATT&CK classification. The dataset, benchmark, and findings provide a reproducible foundation for future CTI research.
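For readers unfamiliar with the headline metric, micro-averaged F1 for multi-label classification pools true positives, false positives, and false negatives over all sentences before computing precision and recall. The sketch below is a generic implementation, not the authors' evaluation script; the gold and predicted technique IDs are made-up illustrations.

```python
def micro_f1(gold, predicted):
    """Micro-averaged F1 over multi-label predictions.

    gold, predicted: lists of label sets, one set per sentence.
    A technique-negative sentence is represented by an empty set.
    """
    tp = fp = fn = 0
    for g, p in zip(gold, predicted):
        tp += len(g & p)   # labels predicted and correct
        fp += len(p - g)   # labels predicted but not in gold
        fn += len(g - p)   # gold labels that were missed
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Illustrative sentences: two technique-positive, one negative.
gold = [{"T1059", "T1105"}, set(), {"T1566"}]
predicted = [{"T1059"}, {"T1003"}, {"T1566", "T1204"}]
score = micro_f1(gold, predicted)
```

Note that a label predicted for a technique-negative sentence counts as a false positive, so negative sentences lower precision even though they contribute no gold labels.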

2606.18190 2026-06-17 cs.CR cs.LG 交叉投稿

Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation

多源网络安全日志：一个ATT&CK标记数据集及小语言模型评估

Abir Ashab Niloy, Ahmed Ryan, Imamul Hossain Rafi, Md Erfan, Md Rayhanur Rahman

发表机构 * Windows endpoints（Windows终端）

AI总结为解决多阶段网络攻击检测中缺乏带ATT&CK技术标签的多源日志数据集问题，构建了包含870个会话（70个攻击、800个良性）和约230万事件的多源日志数据集，并基于该数据集微调三个小语言模型，在分块分类任务上准确率从约8%提升至90%-97%。

AI中文摘要

多阶段网络攻击跨越系统、网络和浏览器日志。检测它们需要关联所有三个来源的事件。机器学习方法可以学习这些跨源模式，但需要带标签的多源数据。现有的公共数据集存在不足。仅网络数据集如CICIDS和UNSW-NB15缺少主机和浏览器活动。以主机为中心的数据集如LMDG和CICAPT-IIoT缺乏浏览器遥测。ATLAS包含所有三个来源，但仅将事件标记为恶意或良性，没有MITRE对抗战术、技术和通用知识（ATT&CK）技术的粒度。没有公共数据集将三个来源与每条记录的ATT&CK技术标签结合起来。我们通过构建一个包含870个会话（70个攻击，800个良性）和约230万事件的多源日志数据集来弥补这一差距。我们在Windows端点上同时捕获了系统、网络和浏览器活动。我们用ATT&CK技术ID标记了恶意事件，涵盖了12种战术和53种技术。我们使用真实工具生成了所有攻击数据，包括远程访问木马（RAT）、命令与控制（C2）隧道和云外泄。为了展示可学习性，我们使用低秩适配（LoRA）微调了三个小语言模型（SLM）（Qwen2.5-1.5B、Llama-3.2-3B、Phi-4-Mini）。我们在两个任务（分块分类和ATT&CK技术识别）上，将每个模型与其基础变体在十个指标上进行了比较。微调在每个指标上改进了每个模型。分块分类准确率从基础变体的大约8%提高到微调后的90%到97%。技术识别仍然具有挑战性，最佳精确匹配准确率为42%，尽管高部分匹配分数表明模型捕捉到了大部分底层推理。

英文摘要

Multi-stage cyberattacks span system, network, and browser logs. Detecting them requires correlating events across all three sources. Machine learning methods can learn these cross-source patterns, but they need labeled multi-source data. Existing public datasets fall short. Network-only datasets such as CICIDS and UNSW-NB15 miss host and browser activity. Host-focused datasets such as LMDG and CICAPT-IIoT lack browser telemetry. ATLAS includes all three sources but labels events only as malicious or benign, without MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) technique granularity. No public dataset combines all three sources with per-entry ATT&CK technique labels. We close the gap by building a multi-source log dataset of 870 sessions (70 attack, 800 benign) and approximately 2.3 million events. We captured system, network, and browser activity simultaneously on Windows endpoints. We labeled malicious events with ATT&CK technique IDs, covering 12 tactics and 53 techniques. We generated all attack data using real tools, including Remote Access Trojan (RAT), Command and Control (C2) tunnels, and cloud exfiltration. To demonstrate learnability, we fine-tuned three Small Language Models (SLMs) (Qwen2.5-1.5B, Llama-3.2-3B, Phi-4-Mini) using Low-Rank Adaptation (LoRA). We compared each against its base variant across ten metrics on two tasks: chunk classification and ATT&CK technique identification. Fine-tuning improved every model on every metric. Chunk classification accuracy rose from approximately 8% in the base variants to between 90% and 97% after fine-tuning. Technique identification remained challenging, with the best exact-match accuracy at 42%, although high partial-match scores show the models captured most of the underlying reasoning.

2606.18237 2026-06-17 cs.CL cs.AI cs.LG 交叉投稿

ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues

ReproRepo: 利用 GitHub 仓库问题扩展可重复性审计

Shanda Li, Qiuhong Anna Wei, Jingwu Tang, Valerie Chen, Nihar B Shah, Tim Dettmers, Yiming Yang, Ameet Talwalkar

发表机构 * School of Computer Science, Carnegie Mellon University（卡内基梅隆大学计算机科学学院）； Datadog

AI总结提出 ReproRepo 框架，利用 GitHub issues 作为监督信号，对 1149 篇论文进行可重复性评估，发现 Codex with GPT-5.5 能识别约 90% 论文的语义相关复现问题。

AI中文摘要

从论文和已发布代码中复现研究结果对科学进步至关重要。现有工作引入了基准测试来评估 LLM 代理是否能协助可重复性，但由于数据整理和评估需要大量人工努力，这些基准难以扩展。我们提出了 ReproRepo，一个可扩展的可重复性评估框架，利用人类提出的 GitHub issues 作为真实复现障碍的自然监督信号。我们在来自主要会议的 1149 篇近期机器学习论文上实例化 ReproRepo，并评估了四种前沿模型代理配置。我们的结果表明，即使不执行代码，LLM 代理也能从论文-仓库对中识别出许多现实世界的可重复性问题：我们研究中的最佳代理，即带有 GPT-5.5 的 Codex，为研究中约 90% 的论文揭示了至少一个语义相关的人类报告的障碍。进一步分析表明，代理在揭示可见故障和识别正确语义区域方面特别有效，但在精确定位方面可能仍不足。ReproRepo 可作为未来在真实世界可重复性审计中评估 LLM 代理的可重用、可扩展框架。我们的代码发布在 https://github.com/LithiumDA/ReproRepo。

英文摘要

Reproducing research results from papers and released code is central to scientific progress. Existing works have introduced benchmarks to evaluate whether LLM agents can assist with reproducibility, but they are difficult to scale due to their reliance on substantial manual effort for data curation and evaluation. We introduce ReproRepo, a scalable framework for reproducibility evaluation that leverages human-raised GitHub issues as naturally occurring supervision on realistic reproduction blockers. We instantiate ReproRepo on 1,149 recent machine learning papers from major conferences and evaluate four frontier model-agent configurations. Our results show that LLM agents, even without executing code, can identify many real-world reproducibility problems from paper-repository pairs: the best agent in our study, namely Codex with GPT-5.5, surfaces at least one semantically related human-reported blocker for ~90% of papers in the study. Further analysis shows that agents are particularly effective for surfacing visible failures and identifying the right semantic region, but may still be insufficient in exact localization. ReproRepo can serve as a reusable, scalable framework for future evaluations of LLM agents on real-world reproducibility auditing. Our code is released at https://github.com/LithiumDA/ReproRepo.


2502.00241 2026-06-17 cs.LG cs.AI cs.CL cs.CV 版本更新

Mordal: Automated Pretrained Model Selection for Vision Language Models

Shiqi He, Insu Jang, Mosharaf Chowdhury

AI summary: Proposes Mordal, a framework that automates the search for the best vision language model for a user-defined task by reducing the number of candidate models and the evaluation time; it uses 8.9x to 11.6x fewer GPU hours than grid search and improves weighted Kendall's τ by 69% on average.

Abstract

Incorporating multiple modalities into large language models (LLMs) is a powerful way to enhance their understanding of non-textual data, enabling them to perform multimodal tasks. Vision language models (VLMs) form the fastest growing category of multimodal models because of their many practical use cases, including in healthcare, robotics, and accessibility. Unfortunately, even though different VLMs in the literature demonstrate impressive visual capabilities in different benchmarks, they are handcrafted by human experts; there is no automated framework to create task-specific multimodal models. We introduce Mordal, an automated multimodal model search framework that efficiently finds the best VLM for a user-defined task without manual intervention. Mordal achieves this both by reducing the number of candidates to consider during the search process and by minimizing the time required to evaluate each remaining candidate. Our evaluation shows that Mordal can find the best VLM for a given problem using $8.9\times$--$11.6\times$ fewer GPU hours than grid search. We have also discovered that Mordal achieves about 69\% higher weighted Kendall's $\tau$ on average than the state-of-the-art model selection method across diverse tasks.


2507.18623 2026-06-17 cs.LG cs.AI cs.MA 版本更新

Moving Out: Physically-grounded Human-AI Collaboration

Xuhui Kang, Sung-Wook Lee, Haolin Liu, Yuyan Wang, Yen-Ling Kuo

AI summary: Proposes the Moving Out benchmark, which simulates collaboration scenarios under physical constraints, and develops the BASS method to enhance agent diversity and action understanding; experiments show that it collaborates effectively with both unseen AI agents and humans.

Comments Accepted at ICML 2026

Abstract

The ability to adapt to physical actions and constraints in an environment is crucial for embodied agents (e.g., robots) to effectively collaborate with humans. Such physically grounded human-AI collaboration must account for the increased complexity of the continuous state-action space and constrained dynamics caused by physical constraints. However, most existing collaboration benchmarks are discrete or do not consider physical attributes and constraints. To address this, we introduce Moving Out, a human-AI collaboration benchmark that resembles a wide range of collaboration modes affected by physical attributes and constraints, such as moving heavy items together and coordinating actions to move an item around a corner. Moving Out consists of two challenges and human-human interaction data to comprehensively evaluate models' abilities to adapt to diverse human behaviors and unseen physical attributes. To give embodied agents the capability to collaborate with humans under physical attributes and constraints, we propose a novel method, BASS (Behavior Augmentation, Simulation, and Selection), to enhance the diversity of agents and their understanding of the outcome of actions. We systematically compare BASS and state-of-the-art models in AI-AI and human-AI experiments, showing that BASS can effectively collaborate with both unseen AI and humans. The project page is available at https://live-robotics-uva.github.io/movingout_ai/.


2512.21315 2026-06-17 cs.LG cs.CV stat.ML 版本更新

Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks

Roy Turgeman, Tom Tirer

AI summary: Studies how low-level processing (such as denoising and encoding) can improve classification performance, proves that for any finite number of training samples there exists a preprocessing step that improves accuracy, and validates the theoretical trends experimentally.

Comments ICLR 2026 (camera-ready). Code is available at: https://github.com/serveroy/process-before-you-classify

Journal ref: The Fourteenth International Conference on Learning Representations (ICLR 2026)

Abstract

The data processing inequality is an information-theoretic principle stating that the information content of a signal cannot be increased by processing the observations. In particular, it suggests that there is no benefit in enhancing the signal or encoding it before addressing a classification problem. This assertion can be proven to be true for the case of the optimal Bayes classifier. However, in practice, it is common to perform "low-level" tasks before "high-level" downstream tasks despite the overwhelming capabilities of modern deep neural networks. In this paper, we aim to understand when and why low-level processing can be beneficial for classification. We present a comprehensive theoretical study of a binary classification setup, where we consider a classifier that is tightly connected to the optimal Bayes classifier and converges to it as the number of training samples increases. We prove that for any finite number of training samples, there exists a pre-classification processing that improves the classification accuracy. We also explore the effect of class separation, training set size, and class balance on the relative gain from this procedure. We support our theory with an empirical investigation of the theoretical setup. Finally, we conduct an empirical study where we investigate the effect of denoising and encoding on the performance of practical deep classifiers on benchmark datasets. Specifically, we vary the size and class distribution of the training set, and the noise level, and demonstrate trends that are consistent with our theoretical results.


2602.03300 2026-06-17 cs.LG cs.AI cs.CL cs.CV 版本更新

R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model?

Jingyi Zhang, Tianyi Lin, Huanjin Yao, Xiang Lan, Shunyu Liu, Jiaxing Huang

AI summary: Proposes Collective Adversarial Data Synthesis (CADS), which uses collective intelligence and adversarial learning to automatically generate high-quality, diverse, and challenging multimodal data for improving multimodal large language models (MLLMs) on complex real-world tasks.

Comments ICML 2026 Camera Ready

Abstract

In this work, we aim to develop effective data synthesis techniques that autonomously synthesize multimodal training data for enhancing MLLMs in solving complex real-world tasks. To this end, we propose Collective Adversarial Data Synthesis (CADS), a novel and general approach to synthesize high-quality, diverse and challenging multimodal data for MLLMs. The core idea of CADS is to leverage collective intelligence to ensure high-quality and diverse generation, while exploring adversarial learning to synthesize challenging samples for effectively driving model improvement. Specifically, CADS operates with two cyclic phases, i.e., Collective Adversarial Data Generation (CAD-Generate) and Collective Adversarial Data Judgment (CAD-Judge). CAD-Generate leverages collective knowledge to jointly generate new and diverse multimodal data, while CAD-Judge collaboratively assesses the quality of synthesized data. In addition, CADS introduces an Adversarial Context Optimization mechanism to optimize the generation context to encourage challenging and high-value data generation. With CADS, we construct MMSynthetic-20K and train our model R1-SyntheticVL, which demonstrates superior performance on various benchmarks.


2602.06276 2026-06-17 cs.LG stat.ML 版本更新

Statistical Learning from Attribution Sets

Lorne Applebaum, Robert Busa-Fekete, August Y. Chen, Claudio Gentile, Tomer Koren, Aryan Mokhtari

Affiliations: Google Research; Cornell University; Tel Aviv University; UT Austin

AI summary: For the setting where privacy constraints prevent ad clicks from being linked directly to conversions, proposes an unbiased loss estimator based on attribution sets, obtains generalization guarantees for empirical risk minimization, and outperforms industry heuristics.

Comments COLT 2026. 45 pages

Abstract

We address the problem of training conversion prediction models in advertising domains under privacy constraints, where direct links between ad clicks and conversions are unavailable. Motivated by privacy-preserving browser APIs and the deprecation of third-party cookies, we study a setting where the learner observes a sequence of clicks and a sequence of conversions, but can only link a conversion to a set of candidate clicks (an attribution set) rather than a unique source. We formalize this as learning from attribution sets generated by an oblivious adversary equipped with a prior distribution over the candidates. Despite the lack of explicit labels, we construct an unbiased estimator of the population loss from these coarse signals via a novel approach. Leveraging this estimator, we show that Empirical Risk Minimization achieves generalization guarantees that scale with the informativeness of the prior and is also robust against estimation errors in the prior, despite complex dependencies among attribution sets. Simple empirical evaluations on standard datasets suggest our unbiased approach significantly outperforms common industry heuristics, particularly in regimes where attribution sets are large or overlapping.


2603.20775 2026-06-17 cs.LG 版本更新

Evaluating Uplift Modeling under Structural Biases: Insights into Metric Stability and Model Robustness

Yuxuan Yang, Dugang Liu, Yiyan Huang

AI summary: Designs a semi-synthetic benchmarking framework for the multiple biases found in real-world marketing data, and finds that TARNet is robust and that metrics aligned with the ATE are more stable.

Comments Accepted by KDD 26

Abstract

In personalized marketing, uplift models estimate the incremental effect of an intervention by modeling how customer behavior would change under alternative treatments using counterfactual analysis. However, real-world marketing data often exhibit various biases, such as selection bias, spillover effects, measurement error, and unobserved confounding. These biases can adversely affect both the accuracy of uplift estimation and the validity of evaluation metrics. Despite the importance of bias-aware assessment, there remains a lack of systematic studies evaluating how different models and metrics perform under such biased conditions. To bridge this gap, we design a systematic benchmarking framework. Unlike standard predictive tasks, real-world uplift datasets inherently lack counterfactual ground truth. This limitation renders the direct validation of evaluation metrics infeasible and prevents the precise quantification of biases. Therefore, a semi-synthetic approach serves as a critical enabler for systematic benchmarking. This approach effectively bridges the gap by retaining real-world feature dependencies while providing the ground truth needed to isolate structural biases. Our investigations reveal that (i) uplift targeting and prediction can manifest as distinct objectives, where proficiency in one does not ensure efficacy in the other; (ii) while many models exhibit inconsistent performance under diverse biases, TARNet shows notable robustness, providing insights for subsequent model design; (iii) the stability of evaluation metrics is linked to their mathematical alignment with the ATE, suggesting that ATE-approximating metrics yield more consistent model rankings under structural data imperfections. These findings suggest the need for more robust uplift models and evaluation metrics under real-world data imperfections.


2603.26592 2026-06-17 cs.LG cs.AI cs.HC 版本更新

Evaluating Interactive 2D Visualization as a Sample Selection Strategy for Biomedical Time-Series Data Annotation

Einari Vaaras, Manu Airaksinen, Okko Räsänen

AI summary: To address the difficulty of annotating biomedical time series, compares three sample selection methods: random sampling, farthest-first traversal, and interactive 2D visualization (2DV). On infant motility assessment and speech emotion recognition tasks, 2DV performs best when labels are aggregated, but label distributions vary widely between individual annotators, and random sampling is the safest choice.

Comments Accepted for publication in Computers in Biology and Medicine (Elsevier)

DOI: 10.1016/j.compbiomed.2026.111809

Abstract

Reliable machine-learning models in biomedical settings depend on accurate labels, yet annotating biomedical time-series data remains challenging. Algorithmic sample selection may support annotation, but evidence from studies involving real human annotators is scarce. Consequently, we compare three sample selection methods for annotation: random sampling (RND), farthest-first traversal (FAFT), and a graphical user interface-based method enabling exploration of complementary 2D visualizations (2DVs) of high-dimensional data. We evaluated the methods across four classification tasks in infant motility assessment (IMA) and speech emotion recognition (SER). Twelve annotators, categorized as experts or non-experts, performed data annotation under a limited annotation budget, and post-annotation experiments were conducted to evaluate the sampling methods. Across all classification tasks, 2DV performed best when aggregating labels across annotators. In IMA, 2DV most effectively captured rare classes, but also exhibited greater annotator-to-annotator label distribution variability resulting from the limited annotation budget, decreasing classification performance when models were trained on individual annotators' labels; in these cases, FAFT excelled. For SER, 2DV outperformed the other methods among expert annotators and matched their performance for non-experts in the individual-annotator setting. A failure risk analysis revealed that RND was the safest choice when annotator count or annotator expertise was uncertain, whereas 2DV had the highest risk due to its greater label distribution variability. Furthermore, post-experiment interviews indicated that 2DV made the annotation task more interesting and enjoyable. Overall, 2DV-based sampling appears promising for biomedical time-series data annotation, particularly when the annotation budget is not highly constrained.
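The abstract names farthest-first traversal (FAFT) as one of the compared selection strategies without spelling it out. A minimal sketch of the generic textbook procedure follows, assuming Euclidean distance on feature vectors and an arbitrary starting index; the function name, the `start` parameter, and the toy points are illustrative and are not taken from the paper.

```python
import math


def farthest_first_traversal(points, k, start=0):
    """Pick up to k indices, each time taking the point farthest from the set chosen so far.

    `points` is a list of equal-length numeric tuples. Distance is Euclidean
    (an assumption here; the paper may use a different metric or feature space).
    """
    if k <= 0 or not points:
        return []
    selected = [start]
    # min_dist[i] holds the distance from point i to its nearest selected point.
    min_dist = [math.dist(p, points[start]) for p in points]
    min_dist[start] = -1.0  # mark as selected so it is never picked again
    while len(selected) < min(k, len(points)):
        nxt = max(range(len(points)), key=lambda i: min_dist[i])
        selected.append(nxt)
        min_dist[nxt] = -1.0
        for i, p in enumerate(points):
            if min_dist[i] >= 0.0:
                min_dist[i] = min(min_dist[i], math.dist(p, points[nxt]))
    return selected


# Toy usage: four points on a line, annotation budget of three samples.
toy_points = [(0.0, 0.0), (1.0, 0.0), (10.0, 0.0), (5.0, 0.0)]
picked = farthest_first_traversal(toy_points, k=3)
```

Each pick maximizes the distance to the nearest already-selected sample, which spreads a small annotation budget across the feature space rather than sampling in proportion to density as random sampling does.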


2606.11616 2026-06-17 cs.LG cs.IR 版本更新

DeMix: Debugging Training Data with Mixed Data Error Types by Investigating Influence Vectors

Jiale Deng, Yanyan Shen, Xiaogang Shi, Junjun Chai

Affiliations: Shanghai Jiao Tong University; ByteDance Inc.; Tiktok

AI summary: Proposes DeMix, a framework that uses influence vectors to capture the distinct patterns that different error types produce in model behavior, casts data debugging as a multi-label classification problem, and introduces an intervention-based learning strategy; it substantially improves debugging F1 score and post-repair model performance across 11 tasks.

Abstract

High-quality training data is essential for the success of machine learning models. However, real-world datasets often contain mixed types of errors arising from systematic flaws in data preparation pipelines, including label errors, feature errors, and spurious correlations. Effective debugging of training data requires both detecting erroneous samples and identifying their specific error types to enable targeted repair, yet existing data cleaning and attribution methods fail to adequately address this dual requirement. In this paper, we propose DeMix, a novel framework that simultaneously diagnoses erroneous samples and their error types. Our key insight is that different error types produce distinct patterns on model behavior. DeMix captures such error-specific patterns by influence vectors that characterize how each training sample affects model predictions across all validation samples. We formulate training data debugging as a multi-label classification problem where a classifier is developed to predict error types directly from influence vectors. We further introduce an intervention-based learning strategy that guides the classifier to capture invariant rationales specific to each error type, ensuring the learned classifier generalizes effectively. Empirical evaluations on 11 tasks across tabular data prediction, recommendation systems, and LLM alignment demonstrate that DeMix significantly outperforms state-of-the-art approaches, achieving a 22.61% improvement in data debugging F1-score and a 9.32% gain in task model performance after data repair. Code is available at: https://github.com/SJTU-DMTai/DeMix.


2510.04421 2026-06-17 stat.ML cs.LG math.ST stat.TH 版本更新

Learning Survival Models with Right-Censored Reporting Delays

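Yuta Shikuri, Hironori Fujisawa

Affiliations: The Graduate University for Advanced Studies; Tokio Marine Holdings, Inc.; The Institute of Statistical Mathematics; RIKEN

AI summary: Addresses right-censoring of survival data caused by reporting delays by jointly modeling parametric hazards for the event and reporting processes, proposes a consistent estimator and a Monte Carlo EM algorithm, and uses transfer learning to improve the accuracy of timely risk assessment under administrative censoring.

Comments 26 pages, 3 figures, 3 tables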

2511.01650 2026-06-17 cs.CL cs.AI cs.LG 版本更新

EngTrace: A Symbolic Benchmark for Verifiable Process Supervision of Engineering Reasoning

Ayesha Gull, Muhammad Usman Safder, Rania Elbadry, Fan Zhang, Veselin Stoyanov, Preslav Nakov, Zhuohan Xie

AI summary: Proposes EngTrace, a symbolic benchmark with 1,350 parameterized test cases that checks intermediate reasoning traces and final answers through a two-stage verifiable evaluation framework (a tiered protocol plus an AI Tribunal), revealing a trade-off between numeric precision and trace fidelity.

Comments 33 pages, includes figures and tables; introduces the EngTrace benchmark

Abstract

Large Language Models (LLMs) are increasingly entering specialized, safety-critical engineering workflows governed by strict quantitative standards and immutable physical laws, making rigorous evaluation of their reasoning capabilities imperative. However, existing benchmarks such as MMLU, MATH, and HumanEval assess isolated cognitive skills, failing to capture the physically grounded reasoning central to engineering, where scientific principles, quantitative modeling, and practical constraints must converge. To enable verifiable process supervision in engineering, we introduce EngTrace, a symbolic benchmark built on 90 parameterized templates, each generating unique, contamination-resistant problem instances, spanning three major engineering branches, nine core domains, and 20 distinct areas, yielding 1,350 test cases that stress-test generalization across diverse physical scenarios. Moving beyond outcome matching, we introduce a verifiable two-stage evaluation framework that uses a tiered protocol to validate intermediate reasoning traces alongside final answers through automated procedural checks and a heterogeneous AI Tribunal. Our evaluation of 27 leading LLMs reveals a distinct trade-off between numeric precision and trace fidelity, identifying a complexity cliff where abstract mathematical pre-training fails to translate into the integrative reasoning required for advanced engineering tasks.


2601.21455 2026-06-17 stat.ML cs.LG 版本更新

Questioning the Coverage-Length Metric in Conformal Prediction: When Shorter Intervals Are Not Better

Yizhou Min, Yizhou Lu, Lanqi Li, Zhen Zhang, Jiaye Teng

AI summary: Critically examines whether the standard conformal prediction metrics (coverage and interval length) are sufficient, shows that a counter-intuitive method called the Prejudicial Trick (PT) can deceptively shorten intervals while keeping coverage valid, and proposes a new metric, interval stability, to detect such behavior.

Abstract

Conformal prediction (CP) has become a cornerstone of distribution-free uncertainty quantification, conventionally evaluated by its coverage and interval length. This work critically examines the sufficiency of these standard metrics. We demonstrate that the interval length might be deceptively improved through a counter-intuitive approach termed the Prejudicial Trick (PT), while the coverage remains valid. Specifically, for any given test sample, PT probabilistically returns an interval that is either null or constructed using an adjusted confidence level, thereby preserving marginal coverage. While PT potentially yields a deceptively lower interval length, it introduces practical vulnerabilities: the same input can yield completely different prediction intervals across repeated runs of the algorithm. We formally derive the conditions under which PT achieves these misleading improvements and provide extensive empirical evidence across various regression and classification tasks. Furthermore, we introduce a new metric, interval stability, which helps detect whether a new CP method implicitly improves the length based on such PT-like techniques. Code is available at https://github.com/benben-cd/PT-Conformal-Prediction.
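The mechanism described in this abstract can be sketched in a few lines. The sketch below is one reading of the abstract, not the authors' implementation: it assumes split conformal regression with absolute-residual scores, an empty set returned with probability `p_empty`, and a non-empty branch run at an adjusted miscoverage level α′ chosen so that (1 − p)(1 − α′) = 1 − α. All function and parameter names are hypothetical.

```python
import math
import random


def adjusted_alpha(alpha, p_empty):
    """Miscoverage level for the non-empty branch.

    Chosen so that (1 - p_empty) * (1 - alpha_adj) = 1 - alpha, which keeps
    marginal coverage at 1 - alpha (an assumed parameterization).
    """
    if not 0.0 <= p_empty <= alpha < 1.0:
        raise ValueError("need 0 <= p_empty <= alpha < 1")
    return 1.0 - (1.0 - alpha) / (1.0 - p_empty)


def conformal_quantile(cal_scores, alpha):
    """Split-conformal quantile: the ceil((n + 1)(1 - alpha))-th smallest score."""
    n = len(cal_scores)
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this level
    return sorted(cal_scores)[k - 1]


def pt_interval(pred, cal_scores, alpha, p_empty, rng):
    """Return None (the empty set) with probability p_empty, else a wider interval."""
    if rng.random() < p_empty:
        return None
    q = conformal_quantile(cal_scores, adjusted_alpha(alpha, p_empty))
    return (pred - q, pred + q)


# The same input gives different outputs across repeated calls.
rng = random.Random(0)
outputs = [pt_interval(10.0, [1.0, 2.0, 3.0], 0.5, 0.25, rng) for _ in range(5)]
```

Averaged over the random draw, the empty outputs contribute zero length, which is how the reported mean length can drop even though every non-empty interval is at least as wide as the standard one. Whether the mean actually drops depends on the score distribution, which is the condition the paper says it derives.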


2603.25937 2026-06-17 cs.RO cs.LG 版本更新

AgentCyberRange: Benchmarking Frontier AI Systems in Realistic Cyber Ranges

Fengyu Liu, Jiarun Dai, Yihe Fan, Wuyuao Mai, Ziao Li, Bofei Chen, Jie Zhang, Zheng Lou, Bocheng Xiang, Qiyi Zhang, Xudong Pan, Geng Hong, Yuan Zhang, Min Yang

Affiliations: Fudan University

AI summary: Proposes AgentCyberRange, the first open-source multi-range infrastructure, integrating 110 vulnerabilities and 156 internal hosts, to evaluate frontier AI systems in realistic cyber attacks; GPT-5.5 with Codex performs best on web exploitation and post-exploitation tasks.

Abstract

Frontier AI systems are increasingly capable of cybersecurity tasks, including codebase inspection, vulnerability detection, and exploitation. However, evaluating their offensive capabilities remains constrained by limited access to open, reproducible, multi-host cyber ranges. Existing public benchmarks capture isolated skills such as CTF solving, vulnerability reproduction, and exploit generation, but often abstract away realistic intrusion workflows: discovering exposed services, gaining a foothold, collecting internal information, and expanding compromise across hosts. This gap makes it difficult to observe emerging risks early, because frontier AI systems are rarely evaluated under realistic attack conditions. We introduce AgentCyberRange, the first open, multi-range infrastructure for measuring autonomous cyber attack capability in realistic cyber ranges. It combines 110 vulnerabilities across 15 real web applications and 8 enterprise-like cyber ranges with 156 internal hosts, plus Cage, a toolchain for execution, orchestration, result collection, and verification. The benchmark covers two core stages: web exploitation, where agents explore exposed applications and validate vulnerabilities, and post exploitation, where agents turn an initial foothold into broader internal compromise. We evaluate six frontier AI systems under matched prompts and budgets. GPT-5.5 with Codex performs best, solving 16.1% of web exploitation tasks and 31.7% of post-exploitation tasks; with more concrete hints, these rates increase to 33.0% and 46.3%. We also observe out-of-benchmark findings, including unknown vulnerabilities in popular projects, and payload mutation that bypasses host defenses. These results show that open cyber-range evaluation is necessary for observing emerging offensive capabilities under realistic and reproducible conditions.


2606.17057 2026-06-17 cs.LG cs.AI cs.CL 新提交

Correct When Paired, Wrong When Split: Decoupling and Editing Modality-Specific Neurons in MLLMs

Tingchao Fu, Wenkai Wang, Fanxiao Li, Huadong Zhang, Jinhong Zhang, Dayang Li, Yunyun Dong, Renyang Liu, Wei Zhou

Affiliations: School of Information Science and Engineering, Yunnan University; School of Software, Yunnan University; National University of Singapore; School of Engineering, Yunnan University

AI summary: To address editing decoupling failure in knowledge editing for multimodal large language models, proposes DECODE, which explicitly disentangles and localizes modality-specific neuron groups to achieve effective knowledge updates under different modality triggers.

Comments 18 pages, 11 figures

Abstract

Although Knowledge Editing provides an efficient mechanism for updating the knowledge of Multimodal Large Language Models (MLLMs), we find that current paradigms still suffer from an important yet underexplored issue: editing decoupling failure, where entity-related knowledge can be updated when the model is triggered by multimodal inputs (text--image query pairs) but often reverts to outdated pre-edit facts when the paired inputs are split into unimodal ones. Our in-depth empirical analysis reveals that the entity knowledge in MLLMs is not stored as a unified representation, but is instead distributed across disentangled modality-specific pathways. As a result, updates biased toward multimodal queries fail to propagate effectively to unimodal circuits. To bridge this gap, we propose DECODE, which explicitly disentangles and localizes modality-specific neuron groups for targeted knowledge. Extensive experiments demonstrate that DECODE consistently achieves effective knowledge updates under different modality triggers, thereby mitigating editing decoupling failures.


2606.17093 2026-06-17 cs.LG eess.IV 新提交

Diagnosing and Repairing Shape-Prior Shortcuts in Long-Range Single-Shot Fringe Projection Profilometry

Adam Haroon, Anush Lakshman, Cody Fleming, Beiwen Li

Affiliations: Department of Mechanical Engineering, Iowa State University; College of Engineering, University of Georgia

AI summary: Uses mechanistic interpretability and conformal uncertainty quantification to diagnose that networks for long-range single-shot fringe projection profilometry rely on shape priors rather than fringe-phase decoding, and proposes the PhiCalNet architecture as a repair, reducing object mean absolute error by 3.3x.

Comments 44 pages, 27 figures

Abstract

Learning-based single-shot fringe projection profilometry (FPP) has been studied mostly at close range. The long-range regime (standoff beyond 1 m) remains largely unaddressed: inverse-square intensity falloff lowers fringe signal-to-noise ratio and degrades physical ground truth, the single-shot problem is ill-posed because fringe-order information is absent from one image, and these architectures have not been studied mechanistically. We present a diagnose-repair-verify study using mechanistic interpretability (MI) and conformal uncertainty quantification (UQ) as convergent diagnostics: they agree on one physical failure locus, driving and verifying an architectural repair. On a photorealistic synthetic benchmark (15,600 fringe images, 50 objects at 1.5-2.1 m), a best UNet baseline reaches 14.54 mm object mean absolute error (MAE). Three probes (linear probing, Grad-CAM, flat-plane out-of-distribution test) converge: the baseline solves the task via object-boundary shape priors rather than fringe-phase decoding. We repair this with PhiCalNet, which outputs wrapped phase rather than depth and applies a fixed differentiable calibration layer mapping phase to depth, removing the shape-prior solution from the hypothesis space architecturally rather than by a loss penalty. A physics-informed loss that enforces the same physics as a soft penalty on a depth-regressing network yields no measurable gain, isolating the architecture as the operative factor. PhiCalNet reduces object MAE 3.3x to 4.46 mm; the residual is carried by 0.103% of pixels at the +/-pi wrap discontinuity. Pixel-wise conformal UQ confirms the diagnosis: rejecting the top 5% of object pixels by snapshot disagreement cuts PhiCalNet RMSE by 64% (20.6->7.4 mm) versus 3.5% for the baseline. MI and UQ converge on the same failure locus.


2606.17113 2026-06-17 cs.LG cs.CL 新提交

The Critical Role of Model Selection in Causal Inference: A Comparative Analysis of Classification Models within the InferBERT Framework for Pharmacovigilance

模型选择在因果推断中的关键作用：基于InferBERT框架的药物警戒分类模型比较分析

Csaba Kiss, Roland Molontay, Gabriele Pergola

Affiliations: Department of Stochastics, Institute of Mathematics, Budapest University of Technology and Economics; Institute of Biostatistics and Network Science, Semmelweis University; Department of Computer Science, University of Warwick

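AI summary: Within the InferBERT framework, this study compares four models (XGBoost, ALBERT, BioBERT, and Med-LLaMA) and finds that domain-specific pre-training (BioBERT) outperforms both the simpler baseline and the larger LLM for causal ADE detection in pharmacovigilance; calibration improves ECE but has mixed effects on accuracy and causal discovery.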

Comments 10 pages, 5 figures

AI中文摘要

区分因果性药物不良事件（ADE）与虚假相关性仍然是药物警戒中的核心挑战。InferBERT框架将Transformer模型与Do-calculus相结合，但其成功依赖于底层的分类模型。本研究评估了InferBERT中模型选择的影响，考察了更简单的模型是否足够、领域特定预训练是否有帮助、扩展到LLM是否能改善因果检测，以及事后校准的效果。我们在两个基准上进行了比较研究：镇痛药诱导的急性肝衰竭（AILF）和曲马多相关死亡率（TRAM）。评估了四种模型——XGBoost（基线）、ALBERT（原始InferBERT）、BioBERT（生物医学Transformer）和Med-LLaMA（医学LLM）——使用重复20次的5折交叉验证。我们测量了准确率、等渗回归前后的期望校准误差（ECE），以及因果项与PRR、ROR和EBGM的Jaccard一致性；显著性通过配对t检验测试。BioBERT在两个数据集上均取得了最高准确率，而Med-LLaMA尽管规模大且进行了参数高效微调，表现不佳。领域特定预训练起到了决定性作用。校准改善了ECE，但对准确率和因果发现的影响不一。BioBERT的优越性也使其与传统药物警戒信号的一致性最强。这些结果表明，领域特定预训练相比简单基线和更大的LLM具有明显优势。在计算药物警戒中，投资于可管理的、领域感知的模型比单纯扩大模型规模更有效。

英文摘要

Distinguishing causal adverse drug events (ADEs) from spurious correlations remains a central challenge in pharmacovigilance. The InferBERT framework integrates transformer models with Do-calculus, but its success hinges on the underlying classification model. This study evaluates the impact of model choice in InferBERT, assessing whether simpler models suffice, whether domain-specific pre-training helps, whether scaling to LLMs improves causal detection, and what effect post-hoc calibration has. We performed a comparative study on two benchmarks: Analgesics-induced Acute Liver Failure (AILF) and Tramadol-related Mortalities (TRAM). Four models, XGBoost (baseline), ALBERT (original InferBERT), BioBERT (biomedical transformer), and Med-LLaMA (medical LLM), were evaluated using 5-fold cross-validation repeated over 20 runs. We measured accuracy, Expected Calibration Error (ECE) before and after isotonic regression, and Jaccard concordance of causal terms with PRR, ROR, and EBGM; significance was tested with paired t-tests. BioBERT achieved the highest accuracy on both datasets, while Med-LLaMA underperformed despite its size and parameter-efficient fine-tuning. Domain-specific pre-training was decisive. Calibration improved ECE but had mixed effects on accuracy and causal discovery. BioBERT's superiority also yielded the strongest concordance with traditional pharmacovigilance signals. These results show that domain-specific pre-training provides a clear advantage over simpler baselines and larger LLMs. Investing in manageable, domain-aware models is more effective for computational pharmacovigilance than simply scaling model size.
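
As a rough illustration of two metrics named in the abstract above, the sketch below computes an equal-width-bin Expected Calibration Error and a Jaccard concordance between two term sets. The binning scheme, the function names, and the toy inputs are assumptions made for illustration; the abstract does not specify how the authors implemented either metric.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Equal-width-bin ECE: size-weighted mean of |accuracy - confidence| per bin."""
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Clamp the index so that a confidence of exactly 1.0 lands in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - avg_conf)
    return ece


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B|; two empty sets count as identical."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


# Toy inputs, invented purely for illustration.
confidences = [0.9, 0.9, 0.9, 0.9]
correct = [1, 1, 1, 0]
model_terms = {"overdose", "jaundice", "hepatitis"}
signal_terms = {"overdose", "jaundice", "coma"}
print(expected_calibration_error(confidences, correct))
print(jaccard(model_terms, signal_terms))
```

In practice the same ECE function would be called twice, once on raw confidences and once on confidences remapped by a fitted calibrator, to obtain the before/after comparison the abstract describes.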

2606.17115 2026-06-17 cs.LG cs.AI q-bio.QM 新提交

Probing, Fusion, and Trustworthiness: A Systematic Evaluation of Foundation Model Representations for Multimodal Cancer Analysis

探测、融合与可信度：基础模型表示在多模态癌症分析中的系统评估

Jingyu Hu, Giuseppe Tripodi, Reed Naidoo, Sarah F. McGough, Tapabrata Chakraborti

Affiliations: The Alan Turing Institute; University of Bristol; University of Manchester; The Institute of Cancer Research; Genentech

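AI summary: A systematic evaluation of foundation-model representations on computational pathology tasks finds that image and omics representations are complementary, that multimodal fusion helps when no single modality dominates, and, using conformal prediction, that uncertainty-aware inference has clinical value.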

AI中文摘要

基础模型（FMs）已成为医学数据的强大表示提取器，但它们在分布偏移下的泛化能力仍未充分探索。本工作系统评估了基于FM的表示在计算病理学任务上的表现，涉及两个真实世界商业队列IH-BC和IH-NSCLC，这些队列来自许可的内部（IH）肿瘤学数据集。分析聚焦于两种模态：全切片图像和转录组图谱，均来自IH多模态数据。我们首先在八个下游分类任务上对五个FM进行单模态探测性能基准测试，发现图像和组学表示携带互补的预测信号。然后，我们通过比较三种基于配对表示的图像-组学融合策略，研究多模态融合是否能在单模态基线之上带来额外收益。进一步通过共形预测评估所选单模态和多模态管道的可信度。我们的结果表明，FM表示在分布外数据上取得了竞争性性能，且多模态融合主要在单模态不占主导信号时有所帮助。共形预测揭示，在点预测失败的大多数情况下，真实诊断仍可在预测集中恢复，这强化了不确定性感知推理对临床支持的价值。

英文摘要

Foundation models (FMs) have emerged as powerful representation extractors for medical data, yet their generalizability to datasets under distribution shift remains underexplored. This work systematically evaluates FM-based representations on a suite of computational pathology tasks across two real-world commercial cohorts, IH-BC and IH-NSCLC, drawn from the licensed in-house (IH) oncology dataset. The analysis focuses on two modalities, whole-slide images and transcriptomic profiles, drawn from the IH multimodal data. We first benchmark unimodal probing performance across five FMs on eight downstream classification tasks, and find that image and omics representations carry complementary predictive signals. Then we investigate whether multimodal fusion can yield additional gains over unimodal baselines by comparing three image-omics fusion strategies built on paired representations. The trustworthiness of selected unimodal and multimodal pipelines is further assessed through conformal prediction. Our results show that FM representations achieve competitive performance on out-of-distribution data and that multimodal fusion helps mainly when no single modality dominates the signal. Conformal prediction reveals that in the majority of cases where a point prediction fails, the true diagnosis remains recoverable within the prediction set, reinforcing the value of uncertainty-aware inference for clinical support.
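
To make the closing claim about prediction sets concrete, here is a minimal sketch of split conformal prediction for classification. It assumes the common score "one minus the probability of the true class" and a held-out calibration split; the abstract does not say which conformal variant the authors used, and all names below are hypothetical.

```python
import math


def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Finite-sample-corrected quantile of the calibration nonconformity scores."""
    scores = sorted(1.0 - probs[label] for probs, label in zip(cal_probs, cal_labels))
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points for this alpha: fall back to including every class.
        return float("inf")
    return scores[k - 1]


def prediction_set(probs, threshold):
    """All classes whose nonconformity score does not exceed the threshold."""
    return [cls for cls, p in enumerate(probs) if 1.0 - p <= threshold]


# Toy calibration data (invented): class probabilities and true labels.
cal_probs = [[0.9, 0.1], [0.2, 0.8], [0.6, 0.4], [0.7, 0.3]]
cal_labels = [0, 1, 0, 1]
threshold = conformal_threshold(cal_probs, cal_labels, alpha=0.5)
print(prediction_set([0.55, 0.45], threshold))
```

A point prediction keeps only the top class, whereas the set can also retain a runner-up class; that is the mechanism by which a true diagnosis can remain recoverable when the point prediction is wrong.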

2606.17233 2026-06-17 cs.LG stat.ML 新提交

Uncertainty Quantification of Engineering Structures by Polynomial Chaos Expansion and Multivariate Active Learning

基于多项式混沌展开与多元主动学习的工程结构不确定性量化

Qitian Lu, Jafar Jafari-Asl, Panagiotis Spyridis, Lukas Novak

Affiliations: Brno University of Technology; University of Rostock

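AI summary: For multi-output engineering problems in which a single experimental design struggles to approximate all output quantities accurately at once, the paper proposes an adaptive sequential sampling method that builds polynomial chaos expansion surrogates by balancing input-space exploration against variance information aggregated across outputs; numerical experiments show improved surrogate accuracy and stability.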

AI中文摘要

在许多工程应用中，单个高保真模型在相同输入参数下产生多个感兴趣的量（QoIs），例如复杂物理系统的有限元模型。为了减轻直接模型评估的高计算成本，代理模型被广泛用于构建模型响应的高效近似。自然地，代理模型的精度强烈依赖于实验设计（ED）的质量。然而，单个ED可能无法同时为所有输出提供足够的表示，特别是当不同输出对输入变量表现出不同的敏感性时。一个直接的解决方案是为每个输出分别进行采样，但这会导致采样复杂性和计算成本增加。从统计角度来看，这种方法也忽略了所有输出之间潜在的相关性，并可能损害数据一致性。为了解决这个问题，一种用于构建多项式混沌展开代理模型的自适应序贯采样方法被推广到向量值QoIs。该方法基于新样本对输出方差的局部贡献，从候选池中顺序选择新样本，同时平衡基于距离的输入空间探索和跨所有输出的聚合方差信息的利用。通过来自工程问题的几个数值示例，将其性能与非序贯拉丁超立方采样进行比较。数值结果表明，所提出的策略提高了代理模型的精度和稳定性，并提供了更可靠的二阶统计量估计。

英文摘要

In many engineering applications, a single high-fidelity model produces multiple quantities of interest (QoIs) under the same input parameters, e.g. finite element models of complex physical systems. To alleviate the high computational cost of direct model evaluations, surrogate models are widely used to construct efficient approximations of model responses. Naturally, the accuracy of surrogates strongly depends on the quality of the experimental design (ED). However, a single ED may not provide an adequate representation for all outputs simultaneously, especially when different outputs exhibit varying sensitivities to the input variables. A straightforward solution is to perform separate sampling for each output, but this results in increased sampling complexity and computational cost. From a statistical perspective, such an approach also ignores potential correlations among all outputs and may compromise data consistency. To address this issue, an adaptive sequential sampling method for constructing polynomial chaos expansion surrogate models is generalized for vector valued QoIs. The method sequentially selects new samples from a candidate pool based on their local contribution to the output variance, while balancing distance-based exploration of the input space and exploitation of aggregated variance information across all outputs. Its performance is compared with non-sequential Latin Hypercube Sampling through several numerical examples from engineering problems. Numerical results demonstrate that the proposed strategy improves both surrogate accuracy and stability, and provides a more reliable estimation of second-order statistics.

2606.17345 2026-06-17 cs.LG cs.AI 新提交

Counterfactual Optimization of Baseball Pitch Sequences and Estimation of Its Impact on Season-Level Statistics

棒球投球序列的反事实优化及其对赛季级统计指标影响的估计

Ryota Takamido, Hiroki Nakamoto

Affiliations: Sports Innovation Organization, National Institute of Fitness and Sports in Kanoya

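AI summary: Using a Transformer model and counterfactual analysis, the study optimizes the final and setup pitches in MLB pitch sequences, finds that this can substantially improve season-level performance (e.g., K/9 gains of more than 1.0), and offers practical insights such as effective locations by velocity band.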

AI中文摘要

尽管投球序列是棒球分析的核心话题，但以往研究主要关注单次打席中最终投球的优化，对前期设置投球的作用及其对长期赛季级表现的影响研究不足。为解决这些问题，本研究利用MLB Statcast数据进行了反事实分析。训练了一个基于Transformer的机器学习模型，用于预测目标投球是否会导致击球结果或挥空。然后，通过将最终投球或前期设置投球替换为替代的投球类型和位置，同时保持周围背景信息不变，生成了反事实投球序列。最优反事实选择定义为那些最小化预测击球概率的选择，并使用将模型输出与赛季统计指标关联的回归模型估计其对投手赛季统计指标的预期影响。结果表明，最终投球和设置投球的优化都可能显著影响赛季级表现，包括K/9提高超过1.0。分析还提供了若干实用见解，包括特定速度带的有效位置、投球指令的重要性以及通过中速投球扩展投球选择范围。这些发现定量支持了投球序列在棒球中的战略重要性。

英文摘要

Although pitch sequencing is a central topic in baseball analytics, previous studies have primarily focused on optimizing the final pitch within a single plate appearance, leaving the role of preceding setup pitches and their impact on long-term season-level performance insufficiently examined. To address these issues, this study conducted counterfactual analyses using MLB Statcast data. A Transformer-based machine-learning model was trained to predict whether a target pitch would result in an in-play outcome or swing-out. Counterfactual pitch sequences were then generated by replacing either the final pitch or the preceding setup pitch with alternative pitch types and locations while keeping the surrounding contextual information fixed. Optimal counterfactual selections were defined as those that minimized the predicted in-play probability, and their expected effects on pitchers' seasonal statistics were estimated using regression models linking model outputs to season statistics. The results suggest that the optimization of both final and setup pitches may substantially influence season-level performance, including improvements of more than 1.0 in K/9. The analyses also provided several practical insights, including velocity-band-specific effective locations, the importance of pitch commands, and the expansion of pitch-selection options through middle-velocity pitches. These findings quantitatively support the strategic importance of pitch sequencing in baseball.

2606.17413 2026-06-17 cs.LG stat.AP 新提交

Amortized Probabilistic Retrieval of Atmospheric CO2 from OCO-2 Spectra Using Deep Learning with Laplace Approximations and Normalizing Flows

基于深度学习的OCO-2光谱大气CO2摊销概率检索：结合拉普拉斯近似与归一化流

Alejandro Calle-Saldarriaga, Felix Jimenez, Jack Grosskreuz, Jiazheng Wang, Jonathan Hobbs, Matthias Katzfuss

Affiliations: University of Wisconsin–Madison; Jet Propulsion Laboratory, California Institute of Technology

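AI summary: Proposes a deep learning framework that uses Laplace approximations and normalizing flows to retrieve atmospheric CO2 concentrations from OCO-2 spectra quickly and accurately while quantifying uncertainty, running orders of magnitude faster and more accurately than conventional methods.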

Comments 23 pages, 8 figures

AI中文摘要

基于空间的大气二氧化碳（CO2）监测对于约束全球碳收支至关重要。NASA的轨道碳观测者-2号（OCO-2）利用高分辨率光谱估算柱平均干空气CO2摩尔分数（XCO2）。然而，当前的操作检索算法计算成本高且未能正确量化不确定性。我们提出了一种新颖的深度学习框架来解决这些挑战。由于真实卫星观测的地面真值数据难以获取，我们使用高保真模拟数据集开发并验证了我们的方法。该数据集旨在支持OCO-2不确定性量化（UQ），并包含了真实的前向模型误差。我们的架构使用多分支神经网络编码光谱波段，并通过两种可扩展的UQ方法——拉普拉斯近似和归一化流——来估计完整CO2柱或其所需汇总的后验分布。与操作性的“全物理”求解器相比，我们的方法具有五个关键优势：（1）摊销：推理速度提高数个数量级，能够实时处理海量数据流；（2）模型误差鲁棒性：通过在明确包含模型差异的模拟数据上训练，我们的方法考虑了标准反演中常被忽略的系统误差；（3）点估计精度：与基线方法相比，我们实现了更优的预测精度；（4）改进的UQ：概率输出提供了校准更好的不确定性估计；（5）非高斯后验：当使用归一化流时，我们的框架成功建模了复杂、非对称的后验分布，克服了高斯假设的局限性。这些结果表明，基于模拟的深度学习是迈向下一代操作处理系统的可行路径。

英文摘要

Space-based monitoring of atmospheric carbon dioxide (CO2) is essential for constraining the global carbon budget. NASA's Orbiting Carbon Observatory-2 (OCO-2) estimates column-averaged dry-air mole fractions of CO2 (XCO2) using high-resolution spectra. However, current operational retrieval algorithms are computationally expensive and do not properly quantify uncertainties. We present a novel deep learning framework that addresses these challenges. Because ground-truth data are difficult to obtain for real satellite observations, we develop and validate our approach using a high-fidelity simulation dataset. This dataset, created to support OCO-2 uncertainty quantification (UQ), incorporates realistic forward model errors. Our architecture encodes spectral bands using a multi-branch neural network and estimates posteriors of the full CO2 column or desired summaries thereof using two scalable UQ methods: Laplace approximations and normalizing flows. Our approach has five key advantages relative to operational "full-physics" solvers: (1) Amortization: Inference is orders of magnitude faster, enabling real-time processing of massive data streams; (2) Model error robustness: By training on simulations that explicitly include model discrepancies, our method accounts for systematic errors often neglected by standard inversions; (3) Point estimate accuracy: We achieve superior predictive accuracy compared to baseline methods; (4) Improved UQ: The probabilistic outputs yield better-calibrated uncertainty estimates; and (5) Non-Gaussian posteriors: When utilizing normalizing flows, our framework successfully models complex, asymmetric posterior distributions, overcoming the limitations of the Gaussian assumption. These results suggest that simulation-based deep learning is a viable path toward next-generation operational processing systems.

2606.17445 2026-06-17 cs.LG cond-mat.mtrl-sci physics.chem-ph 新提交

Toward Controllable Catalyst Inverse Design via Large-Scale Autoregressive Pretraining

面向可控催化剂逆向设计的大规模自回归预训练

Dong Hyeon Mok, Jonggeol Na, Seoin Back

Affiliations: Department of Chemical and Biomolecular Engineering, Institute of Emergent Materials, Sogang University; Department of Chemical Engineering and Materials Science, Ewha Womans University; Department of Chemical Engineering, Graduate Program in System Health Science and Engineering, Ewha Womans University; Institute for Multiscale Matter and Systems (IMMS), Ewha Womans University; KU-KIST Graduate School of Converging Science and Technology, Korea University; Department of Integrated Energy Engineering, Korea University; Center for Hydrogen and Fuel Cells, Korea Institute of Science and Technology (KIST)

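AI summary: Proposes a conditional catalyst generative model based on a Generative Pretrained Transformer; large-scale pretraining followed by fine-tuning yields high structural validity and condition match rates and markedly improves screening efficiency.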

AI中文摘要

多相催化剂的逆向设计仍然具有挑战性，因为催化剂表面表现出显著的结构复杂性，在广阔的化学空间中存在耦合的表面-吸附物相互作用，仅通过传统筛选难以高效探索。尽管基于机器学习的高通量筛选加速了催化剂发现，但其效率随着搜索空间的增长而不可避免地下降，这促使了能够直接构建具有目标特性的催化剂的生成模型的发展。在这里，我们提出了一种基于生成式预训练Transformer架构的条件催化剂生成模型，该模型具有数值嵌入层，能够在单一自回归框架内生成以分类和连续属性为条件的催化剂结构。该模型在1.33亿个催化剂结构上进行了预训练，随后在大约46万个优化结构上进行了微调，这些结构具有相关的分类属性和结合能，用于条件生成。最终模型实现了98%的结构有效性、95%的优化有效性以及高分类条件保真度，吸附物类型和组成的联合匹配率达到93%。对于结合能条件，约20%的匹配率相比基线训练分布提高了四倍，生成的分布系统地朝向目标值偏移，使得无需额外微调即可将反应靶向催化剂发现的筛选效率提高1.5至4倍。这些结果表明，大规模自回归预训练结合显式属性条件为可控催化剂生成和加速催化剂发现提供了一条实用途径。

英文摘要

Inverse design of heterogeneous catalysts remains challenging because catalyst surfaces exhibit substantial structural complexity with coupled surface-adsorbate interactions across a vast chemical space that is difficult to explore efficiently through conventional screening alone. Although machine learning-based high-throughput screening has accelerated catalyst discovery, its efficiency inevitably declines as the search space grows, motivating the development of generative models that can directly construct catalysts with target properties. Here, we present a conditional catalyst generative model based on the Generative Pretrained Transformer architecture with a numerical embedding layer that enables the generation of catalyst structures conditioned on both categorical and continuous properties within a single autoregressive framework. The model was pretrained on 133 million catalyst structures and subsequently fine-tuned on approximately 460,000 optimized structures with associated categorical properties and binding energies for conditional generation. The resulting model achieved 98% structural validity, 95% optimization validity, and high categorical condition fidelity, with a 93% joint match rate for adsorbate type and composition. For binding energy conditioning, the match rate of approximately 20% represents a four-fold improvement over the baseline training distribution, and the generated distributions shift systematically toward the target values, enabling a 1.5- to 4-fold improvement in screening efficiency for reaction-targeted catalyst discovery without additional fine-tuning. These results show that large-scale autoregressive pretraining, combined with explicit property conditioning, provides a practical route toward controllable catalyst generation and accelerated catalyst discovery.

2606.17451 2026-06-17 cs.LG cs.RO 新提交

Credibility-Weighted Pricing of Autonomous Vehicle Liability Under Operational Design Domain Shift

操作设计域转移下自动驾驶汽车责任的可信度加权定价

Doyeon Jang

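AI summary: To address sparse experience, ODD shift, and non-stationary risk in deployed automated driving systems, the paper proposes a hierarchical Bayesian credibility framework that performs partial pooling through an ODD-similarity kernel, and validates it on Waymo data.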

2606.17462 2026-06-17 cs.LG cs.NI 新提交

ResAware: Cross-Environment Website Fingerprinting via Resource-Privileged Distillation

ResAware: 通过资源特权蒸馏实现跨环境网站指纹识别

Chongru Fan, Wei Wang, Wentao Huang, Zhenquan Ding, Jinqiao Shi, Lei Cui, Zhiyu Hao, Xiaochun Yun

Affiliations: Beijing University of Posts and Telecommunications; Zhongguancun Laboratory

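AI summary: Proposes the ResAware framework, which trains a teacher model on resource-level features and guides a student model through heterogeneous knowledge distillation, improving cross-environment robustness without added online overhead and substantially boosting baseline methods on a large-scale five-month dataset.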

Comments 18 pages, 9 figures

AI中文摘要

虽然网站指纹识别（WF）攻击在受控实验室环境中实现了高精度，但在现实环境中，由于时空漂移、浏览器异构性、代理混淆等因素，其性能往往大幅下降。这一限制源于它们仅依赖低层流量特征，而这些特征噪声大且对环境扰动高度敏感。为解决此问题，我们提出\textbf{ResAware}，一种在\textit{训练丰富/推理贫乏}非对称设置下的跨环境资源感知蒸馏框架。具体来说，ResAware在资源级特征上训练教师模型，然后通过异构知识蒸馏将所得特权知识蒸馏到学生模型中。部署时，学生模型仅使用加密流量进行推理，不产生额外成本。我们在一个跨越五个月、从六个全球观测点收集的大规模数据集上评估ResAware，包含超过$160{,}000$个配对样本。结果表明，ResAware显著增强了多种WF基线的跨环境鲁棒性。例如，在150天的时间漂移下，ResAware将Var-CNN的F1分数从$72.77\%$提升至$81.49\%$，开放世界$TPR@1\%FPR$从$22.40\%$提升至$27.20\%$。我们的结果表明，资源级监督在不扩大在线观测能力的情况下提高了WF鲁棒性。

英文摘要

While Website Fingerprinting (WF) attacks achieve high accuracy in controlled laboratory settings, they often degrade substantially in real-world environments due to factors such as spatio-temporal drift, browser heterogeneity, and proxy obfuscation. This limitation stems from their sole reliance on low-level traffic features that are noisy and highly sensitive to environmental perturbations. To address this problem, we propose \textbf{ResAware}, a cross-environment resource-aware distillation framework under a \textit{training-rich/inference-poor} asymmetric setting. Specifically, ResAware trains a teacher model on resource-level features, and then distills the resulting privileged knowledge into a student model through heterogeneous knowledge distillation. At deployment time, the student model performs inference using only encrypted traffic, incurring zero additional cost. We evaluate ResAware on a large-scale dataset collected over five months from six globally distributed vantage points, comprising more than $160{,}000$ paired samples. The results show that ResAware significantly enhances the cross-environment robustness of diverse WF baselines. Under a 150-day temporal drift, for example, ResAware improves the F1-score of Var-CNN from $72.77\%$ to $81.49\%$ and the open-world $TPR@1\%FPR$ from $22.40\%$ to $27.20\%$. Our results demonstrate that resource-level supervision improves WF robustness without expanding online observation capabilities.
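
The teacher/student setup described above can be sketched with the classic temperature-scaled logit distillation loss. This is an assumption for illustration only: the abstract does not give ResAware's heterogeneous distillation objective, and the function names, temperature, and weighting below are hypothetical.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax of logits divided by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label, temperature=2.0, weight=0.5):
    """Weighted sum of hard-label cross-entropy and soft-target KL divergence."""
    # Cross-entropy of the student's ordinary (temperature 1) prediction.
    ce = -math.log(softmax(student_logits)[label])
    # KL(teacher || student) on temperature-softened distributions,
    # scaled by T^2 so its gradient magnitude stays comparable to the CE term.
    p_teacher = softmax(teacher_logits, temperature)
    p_student = softmax(student_logits, temperature)
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_teacher, p_student) if pt > 0)
    return weight * ce + (1 - weight) * temperature ** 2 * kl


# The teacher would be fed resource-level features during training only;
# at deployment only the student (encrypted-traffic input) is evaluated.
print(distillation_loss([2.0, 0.0, 0.0], [1.5, 0.5, 0.0], label=0))
```

Because the teacher appears only inside the training loss, dropping it at inference time adds no online cost, which matches the asymmetric setting the abstract describes.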

2606.17476 2026-06-17 cs.LG 新提交

Multi-Adapter PPO: A Cross-Attention Enhanced Wavelength Selection Framework for LIBS Quantitative Analysis

多适配器PPO：一种用于LIBS定量分析的交叉注意力增强波长选择框架

Hao Li, Man Fung Zhuo

Affiliations: Electrical and Computer Engineering, University of Arizona, Tucson, USA

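AI summary: Proposes a Multi-Adapter PPO framework that casts wavelength selection as a reinforcement learning problem and uses cross-attention and multiple adapters to capture spectral relationships; on steel and coal datasets the composite score improves by 28.4% on average and prediction accuracy by 45.2%.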

Comments 6 pages

2606.17553 2026-06-17 cs.LG 新提交

Short-Term Electricity Load Forecasting with LSTM and Transformer Models via Delta-Based Target Reformulation

Vansh Bansal

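AI summary: To handle non-stationarity in electricity load, the paper proposes a delta target reformulation in which LSTM and Transformer models predict the change in load rather than its absolute value, reducing MAE and MAPE by more than 50% for hour-ahead forecasting.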

Comments 8 pages, 3 tables

AI中文摘要

准确的短期电力负荷预测对于现代电力系统的可靠和经济运行至关重要，尤其是在天气变化、日历效应和消费模式演变导致的非平稳性下。尽管LSTM和Transformer等深度学习模型表现出色，但大多数现有研究侧重于直接预测绝对负荷，而未明确解决目标非平稳性。受ARIMA模型中经典时间序列差分技术的启发，本文研究了一种基于Delta的目标重构方法，用于深度学习的短期电力负荷预测。该方法不直接预测绝对负荷值，而是训练模型预测连续时间步之间的负荷变化，最终预测通过最后一次观测负荷重建。这旨在稳定学习目标并降低预测难度。利用印度多年逐小时真实电力负荷数据，辅以NASA POWER项目的气象变量和日历特征，本研究评估了LSTM和Transformer在两种公式下的表现，并以LightGBM作为基准。实验针对小时前和日前预测范围进行，通过平均绝对误差（MAE）和平均绝对百分比误差（MAPE）评估性能。结果表明，Delta重构在所有评估模型的小时前预测中持续提高预测精度，与绝对公式相比，MAPE降低超过50%。对于日前预测，Delta目标特别有利于深度序列模型（LSTM和Transformer），而LightGBM在绝对公式下仍具有竞争力。这些发现表明，Delta重构是神经网络的一种强大归纳偏置，但其效果依赖于模型和预测范围。

英文摘要

Accurate short-term electricity load forecasting is critical for the reliable and economic operation of modern power systems, under non-stationarity arising from weather variability, calendar effects, and evolving consumption patterns. While deep learning models such as LSTMs and Transformers show promising performance, most existing studies focus on direct absolute load prediction without explicitly addressing target non-stationarity. Motivated by classical time-series differencing techniques in ARIMA models, this paper investigates a delta-based target reformulation for short-term electricity load forecasting using deep learning. Instead of directly predicting absolute load values, the proposed formulation trains models to predict the change in load between consecutive time steps, with final forecasts reconstructed using the last observed load. This aims to stabilize the learning target and reduce forecasting difficulty. Using multi-year, hourly real-world electricity load data from India, augmented with meteorological variables from the NASA POWER project and calendar features, this study evaluates LSTM and Transformer models under both formulations, benchmarking them against LightGBM. Experiments are conducted for hour-ahead and day-ahead horizons, assessing performance via Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). Results show that delta-based reformulation consistently improves forecasting accuracy for hour-ahead prediction across all evaluated models, yielding MAPE reductions of over 50% compared to absolute formulations. For day-ahead forecasting, delta targets specifically benefit deep sequence models (LSTM and Transformer), while LightGBM remains competitive under the absolute formulation. These findings indicate that while delta reformulation is a powerful inductive bias for neural networks, its efficacy is model- and horizon-dependent.
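
The delta reformulation described above amounts to a change of training target plus a one-line reconstruction step. The sketch below shows both, together with MAPE; the function names and the toy load values are assumptions, and the forecasting model itself is left out.

```python
def make_delta_targets(load):
    """Pair each last observed load with the change to the next time step."""
    return [(load[t - 1], load[t] - load[t - 1]) for t in range(1, len(load))]


def reconstruct(last_observed, predicted_delta):
    """Recover an absolute forecast from a predicted change in load."""
    return last_observed + predicted_delta


def mape(actual, forecast):
    """Mean Absolute Percentage Error, in percent."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)


# Toy hourly load values (invented for illustration).
load = [100.0, 104.0, 103.0, 110.0]
pairs = make_delta_targets(load)

# A model trained on the delta target would replace this constant guess;
# predicting a zero change reproduces the persistence baseline.
forecasts = [reconstruct(last, 0.0) for last, _ in pairs]
print(pairs)
print(mape(load[1:], forecasts))
```

The delta series is typically closer to stationary than the raw load, which is the same motivation as differencing in ARIMA models that the abstract cites.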

2606.17805 2026-06-17 cs.LG 新提交

QueryMarket: Cost-Aware Online Active Learning in Data Markets

QueryMarket: 数据市场中成本感知的在线主动学习

Xiwen Huang, Pierre Pinson

Affiliations: Dyson School of Design Engineering, Imperial College London; Halfspace (part of Accenture); Technical University of Denmark (DTU Management); Aarhus University (CoRE)

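AI summary: Proposes the QueryMarket framework and the OVBAL algorithm, which estimates marginal utility through a D-optimality criterion and performs cost-aware online active learning under a rolling budget constraint, adapting to non-stationary streams and heterogeneous label costs.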

Comments 10 pages, 8 figures. Submitted to IEEE Transactions on Neural Networks and Learning Systems

AI中文摘要

数据采集是实时流学习中一个主要瓶颈：分析师必须在滚动预算约束下即时决定购买哪些标签。然而，现有的在线主动学习很少在概念漂移下统一考虑定价、信息增益和滚动预算约束。我们引入了QueryMarket，一个受市场启发的框架，它根据每个传入数据点对模型的估计效用及其价格进行查询。在该框架内，我们提出了OVBAL（基于方差的在线主动学习），它通过使用带有指数遗忘的D-最优性准则估计每个样本的边际效用，并在滚动预算约束下执行成本感知的购买，将数据定价与信息驱动的选择相结合。OVBAL产生了一个简单的、完全在线的决策规则，能够适应非平稳流和异构标签成本。在合成数据和真实世界太阳能发电预测任务上的实验表明，OVBAL在卖方中心定价下特别有效，并且在两种定价方案下，在真实世界任务中实现了更有利的长期误差-成本权衡。

英文摘要

Data acquisition is a major bottleneck for learning in real-time streams: analysts must decide on the fly which labels to purchase while respecting a rolling budget. However, existing online active learning rarely unifies pricing, information gain, and rolling budget constraints under concept drift. We introduce QueryMarket, a market-inspired framework that queries each incoming data point based on its estimated utility to the model and its price. Within this framework, we propose OVBAL (online variance-based active learning), which integrates data pricing with information-driven selection by estimating each sample's marginal utility via a D-optimality criterion with exponential forgetting and executing cost-aware purchases under rolling budget constraints. OVBAL yields a simple, fully online decision rule that adapts to nonstationary streams and heterogeneous label costs. Experiments on synthetic data and a real-world solar power generation forecasting task show that OVBAL is particularly effective under seller-centric pricing and yields a more favorable long-run error-cost trade-off in the real-world task under both pricing schemes.
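
One way to read "marginal utility via a D-optimality criterion with exponential forgetting" is as the log-determinant gain of a discounted information matrix, which has a closed form through the matrix determinant lemma. The sketch below follows that reading for a linear model. It is not the published OVBAL rule: the utility-per-price threshold, the single fixed budget (instead of a rolling one), and the choice to update the matrix only on purchases are all simplifying assumptions.

```python
import math


def mat_vec(matrix, vector):
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def d_optimal_gain(P, x, forgetting):
    """log det(f*A + x x^T) - log det(f*A), where P is the inverse of A."""
    Px = mat_vec(P, x)
    return math.log(1.0 + sum(a * b for a, b in zip(x, Px)) / forgetting)


def update_inverse(P, x, forgetting):
    """Sherman-Morrison update of P = A^-1 after A <- f*A + x x^T (P symmetric)."""
    Px = mat_vec(P, x)
    denom = forgetting + sum(a * b for a, b in zip(x, Px))
    n = len(x)
    return [[(P[i][j] - Px[i] * Px[j] / denom) / forgetting for j in range(n)]
            for i in range(n)]


def run_stream(stream, budget, forgetting=0.95, threshold=0.5):
    """Buy a label when gain per unit price clears the threshold and budget allows."""
    dim = len(stream[0][0])
    P = [[1.0 if i == j else 0.0 for j in range(dim)] for i in range(dim)]
    spent, bought = 0.0, []
    for t, (x, price) in enumerate(stream):
        gain = d_optimal_gain(P, x, forgetting)
        if gain / price >= threshold and spent + price <= budget:
            P = update_inverse(P, x, forgetting)
            spent += price
            bought.append(t)
    return bought, spent


# Toy stream of (feature vector, asking price) pairs, invented for illustration.
stream = [([1.0, 0.0], 1.0), ([1.0, 0.0], 1.0), ([0.0, 1.0], 1.0)]
print(run_stream(stream, budget=2.0, forgetting=1.0))
```

With forgetting below 1, older purchases are discounted, so a direction in feature space that was already well covered becomes worth buying again after enough time has passed, which is how such a rule can track non-stationary streams.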

2606.17931 2026-06-17 cs.LG 新提交

Predictive Analytics in E-Commerce for Customer Behavior Forecasting using hybrid Ret-DNN with XGBoost Model

电子商务中基于混合Ret-DNN与XGBoost模型的客户行为预测分析

Degala Pushpa Sri, Mayank Atreya, Lakshmi. H, Navin Chhibber, Mukesh Soni

Affiliations: Chewy Inc; Pace Institute of Technology and Atlanta, USA; Nitte Meenakshi Institute of Sciences; Lovely Professional University; Infinity Tech Group; University

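AI summary: Proposes a hybrid Ret-DNN and XGBoost model that predicts customer purchase probability through feature extraction followed by gradient boosting, reaching an MAE of 0.2193 on a UK retail dataset.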

Comments 2025 2nd International Conference on Software, Systems and Information Technology (SSITCON)

AI中文摘要

近年来，电子商务服务在人们的日常生活中迅速增长，帮助他们在线购买产品。然而，零售平台难以理解客户行为，并难以预测其未来购买。为克服这些挑战，本研究提出一种混合零售深度神经网络（Ret-DNN）与极端梯度提升（XGBoost）模型，用于捕捉零售数据的时间特征和表格动态。首先，数据来自一家英国在线零售商，包含近50万条交易记录。然后，使用一系列技术对收集的数据进行预处理，如数据清洗、异常值处理、时间特征提取、特征编码和z-score归一化，以确保数据准备好进行模型训练和测试。随后，预处理后的数据被输入到Ret-DNN模型中，该模型作为特征提取器，理解客户交易的完整上下文。进一步，提取的数据作为输入输入到XGBoost模型，该模型预测最终输出为客户购买概率。最后，提出的Ret-DNN XGBoost模型取得了更好的结果，平均绝对误差（MAE）为0.2193，优于现有的Ret-DNN模型。关键词：客户行为预测，极端梯度提升，电子商务，预测分析，零售深度神经网络。

英文摘要

In recent years, electronic commerce (e-commerce) services have spread rapidly in people's daily lives, helping them purchase products online. However, retail platforms struggle to understand customer behavior, which makes it difficult to predict future purchases. To overcome these challenges, this study proposes a hybrid Retail Deep Neural Network (Ret-DNN) with an Extreme Gradient Boosting (XGBoost) model for capturing temporal features and tabular dynamics of retail data. First, data were sourced from a United Kingdom (UK)-based online retailer and comprise almost 500,000 transaction records. Then, the collected data were pre-processed using a series of techniques, such as data cleaning, outlier handling, temporal feature extraction, feature encoding, and z-score normalization, to ensure that the data were ready for model training and testing. Subsequently, the pre-processed data were fed into the Ret-DNN model, which acts as a feature extractor to understand the complete context of customer transactions. Further, the extracted features were fed as input into the XGBoost model, which predicted the final output as the purchase probability of customers. Finally, the proposed Ret-DNN XGBoost model achieved better results, attaining a Mean Absolute Error (MAE) of 0.2193, than the existing Ret-DNN model. Keywords: customer behavior forecasting, extreme gradient boosting, electronic commerce, predictive analytics, retail deep neural networks.

2606.17996 2026-06-17 cs.LG cs.AI 新提交

KFTD: A Koopman-Fourier Time-Differentiable Network for Continuous Ocean Spatiotemporal Forecasting

Qinghui Chen, Zekai Zhang, Hailong Liu, Jinglin Zhang, Cong Bai

Affiliations: Shandong University; Laoshan Laboratory; Chinese Academy of Sciences; Zhejiang University of Technology

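AI summary: Proposes the KFTD network, which achieves continuous-time interpolation through a Koopman linear space and Fourier analysis and forecasts with a lightweight residual network; across four ocean datasets, mean squared error falls by 5.6% on average and efficiency improves by 76.25%.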

AI中文摘要

准确的海洋预测对于气候监测和灾害预警至关重要。然而，海洋时空预测面临建模复杂动力系统和确保计算效率的双重挑战。我们提出了Koopman傅里叶时间可微（KFTD）网络，一种时间连续的两阶段范式，将插值与预测解耦，以实现高效且可扩展的时空建模。我们将复杂的非线性动力学映射到Koopman线性空间，并利用傅里叶分析实现任意子步的连续时间插值。一个轻量级残差网络消耗高保真中间状态以产生最终预测。与扩散模型不同，KFTD消除了多步噪声采样，直接在连续时间内演化系统，实现了4倍的计算加速。我们进一步引入DPP损失，以端到端方式支持任意PDE约束，打破了纯数据驱动方法的物理一致性瓶颈。在四个海洋数据集上的实验结果证实，我们的连续时间框架使MSE平均降低5.6%（SST最高达12.7%），并且效率比MCVD提高了76.25%。

英文摘要

Accurate oceanic forecasting is critical for climate monitoring and disaster early warning. However, ocean spatiotemporal forecasting faces the dual challenges of modeling complex dynamical systems and ensuring computational efficiency. We present the Koopman Fourier Time-Differentiable (KFTD) Network, a time-continuous two-stage paradigm that decouples interpolation from prediction to achieve efficient and scalable spatiotemporal modeling. We map complex nonlinear dynamics into the Koopman linear space and exploit Fourier analysis to enable continuous-time interpolation at arbitrary sub-steps. A lightweight residual network consumes the high-fidelity intermediate states to yield the final forecast. Unlike diffusion models, KFTD eliminates multi-step noise sampling and directly evolves the system in continuous time, yielding a 4x computational speedup. We further introduce a DPP Loss that supports arbitrary PDE constraints in an end-to-end manner, breaking the physical consistency bottleneck of pure data-driven approaches. Empirical results on four ocean datasets confirm that our continuous-time framework reduces MSE by an average of 5.6% (up to 12.7% for SST) and improves efficiency over MCVD by 76.25%.

2606.17109 2026-06-17 cs.CR cs.AI cs.LG 交叉投稿

Timestamp-Aware Spatio-Temporal Graph Contrastive Learning for Network Intrusion Detection

时间戳感知的时空图对比学习用于网络入侵检测

Jianli Dai, Guangwei Wu, Jiacheng Li, Weiping Wang, An He, Xinjun Xiao

Affiliations: Central South University of Forestry and Technology, School of Computer Science and Mathematics; Central South University, School of Computer Science and Engineering

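AI summary: Proposes a self-supervised graph neural network framework that builds temporal graphs from timestamps, encodes spatio-temporal dependencies with E-GraphSAGE and an LSTM, and applies multi-view graph contrastive learning (temporal, spatial, and feature contrasts) with an adaptive weighting strategy, matching supervised methods on four datasets.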

AI中文摘要

鉴于图神经网络（GNN）在建模网络流量间关系结构方面的有效性，它们已被广泛用于网络入侵检测系统（NIDS）。然而，大多数现有基于GNN的NIDS方法关注流量关系的结构，并将其视为时间独立，这限制了它们应对不断演变的攻击行为的能力。此外，它们对监督或半监督学习的依赖通常限制了对未见攻击的泛化能力。为解决这些限制，我们提出了一种新颖的自监督GNN框架。据我们所知，所提出的模型是首批显式利用真实时间戳的自监督GNN-based NIDS模型之一，这为表示学习提供了忠实的时间依赖关系。我们首先根据时间戳从网络流量中构建一系列时序图，然后采用基于E-GraphSAGE和LSTM的编码器充分提取网络流量的时间信息和空间依赖关系，而无需引入耗时的注意力机制。引入了一种多视图图对比学习（GCL）方案，其中联合执行时间、空间和特征对比，分别捕获时间连续性、保持结构一致性并提高所学表示的泛化性和鲁棒性。此外，设计了一种基于梯度范数的自适应加权策略来优化对比损失权重。在四个具有真实时间戳的代表性NIDS数据集上的实验结果表明，我们的方法显著优于现有自监督方法，并达到了与监督最先进GNN方法相当的性能，同时保持了高计算效率。

英文摘要

Given their effectiveness in modeling the relational structure among network traffic flows, graph neural networks (GNNs) have been widely adopted in network intrusion detection systems (NIDSs). However, most existing GNN-based NIDS approaches focus on the relational structure of traffic flows, and treat them as temporally independent, which limits their ability to cope with evolving attack behaviors. Moreover, their reliance on supervised or semi-supervised learning often restricts generalization to unseen attacks. To address these limitations, we propose a novel self-supervised GNN-based framework. To the best of our knowledge, the proposed model is among the first self-supervised GNN-based NIDS models to explicitly leverage real timestamps, which provides faithful temporal dependencies for representation learning. We first construct a series of temporal graphs from network traffic flows according to their timestamps, and then employ an E-GraphSAGE and LSTM based encoder to fully extract temporal information and spatial dependencies of network traffic, without introducing time-costly attention mechanisms. A multi-view graph contrastive learning (GCL) scheme is introduced, where temporal, spatial, and feature contrasts are jointly performed to capture temporal continuity, preserve structural consistency, and improve the generalization and robustness of the learned representations, respectively. In addition, a gradient-norm-based adaptive weighting strategy is designed to optimize the contrastive loss weights. Experimental results on four representative NIDS datasets with real timestamps demonstrate that our method significantly outperforms existing self-supervised approaches and achieves performance comparable to the supervised state-of-the-art GNN method, while maintaining high computational efficiency.

2606.17121 2026-06-17 stat.AP cs.LG physics.flu-dyn cross-list

Regularized Machine Learning for System Identification of Ship Free-Running Manoeuvres from CFD-Based Synthetic Data: A Comparative Study

R. F. Suárez, J. C. Berndt, M. Abdel-Maksoud

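Affiliations: Hamburg University of Technology (TUHH)

AI summary: This study uses regularized regression to identify ship hydrodynamic coefficients from CFD-generated free-running data, evaluating how coefficient set size, training length, and manoeuvre combinations affect model performance; Ridge regression offers the best balance between computational efficiency and prediction accuracy.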

Comments 28 pages

Abstract

This study investigates supervised machine learning techniques for identifying ship hydrodynamic coefficients from CFD-generated data from free-running simulations. Specifically, ordinary least squares and regularized regression methods are applied to Abkowitz-type manoeuvring models. Training and validation datasets are derived from URANS simulations of zig-zag and turning circle manoeuvres, which are validated against experimental benchmark data. The analysis evaluates the effects of coefficient set size, minimum training length required for predictive model training, and manoeuvre combinations on model performance. Results demonstrate the suitability of large-angle zig-zag manoeuvres for hydrodynamic system identification, provided that multicollinearity is addressed through appropriate coefficient selection, regression models, or input data variability. Larger coefficient sets offer greater model flexibility for variable conditions but are more prone to multicollinearity. Regularized regression techniques effectively mitigate multicollinearity and notably enhance prediction accuracy, as does incorporating more diverse manoeuvring data. Among tested models, Ridge regression provided the best compromise between computational efficiency and prediction accuracy.
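
The multicollinearity point in this abstract can be illustrated with a small sketch. The data below are synthetic and hypothetical (two nearly identical regressors), and the helper names `solve_2x2` and `fit_two_feature_regression` are made up for illustration; this is not the Abkowitz-type model or the data used in the paper, only a toy comparison of ordinary least squares with an L2-penalised (ridge) fit.

```python
# Illustrative sketch only: ridge vs. ordinary least squares on two nearly
# collinear regressors. All data are synthetic and hypothetical.

def solve_2x2(a, b, c, d, e, f):
    """Solve [[a, b], [c, d]] @ [x, y] = [e, f] by Cramer's rule."""
    det = a * d - b * c
    return ((e * d - b * f) / det, (a * f - e * c) / det)

def fit_two_feature_regression(x1, x2, y, lam=0.0):
    """Minimise sum (y - b1*x1 - b2*x2)^2 + lam*(b1^2 + b2^2); lam=0 is OLS."""
    s11 = sum(u * u for u in x1) + lam   # penalty added on the diagonal
    s22 = sum(v * v for v in x2) + lam
    s12 = sum(u * v for u, v in zip(x1, x2))
    t1 = sum(u * t for u, t in zip(x1, y))
    t2 = sum(v * t for v, t in zip(x2, y))
    return solve_2x2(s11, s12, s12, s22, t1, t2)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0]
x2 = [1.01, 1.99, 3.02, 3.98, 5.01]   # almost a copy of x1
y = [2.1, 3.9, 6.2, 7.8, 10.1]        # roughly 2 * x1 plus small noise

ols = fit_two_feature_regression(x1, x2, y, lam=0.0)
ridge = fit_two_feature_regression(x1, x2, y, lam=1.0)
print("OLS coefficients:  ", ols)
print("Ridge coefficients:", ridge)
```

On this toy data the unpenalised fit splits the effect into large coefficients of opposite sign, while the ridge fit keeps both coefficients close to 1. The penalty strength `lam=1.0` is an arbitrary choice here; in practice it would be selected by validation.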

2606.17294 2026-06-17 cs.RO cs.LG cross-list

Public Transit Gains and Spatially Uneven Changes in Travel Demand after Congestion Pricing in New York City

Donghang Li, Dingyi Zhuang, Yunlin Li, Chenan Shen, Nina Cao, Yunhan Zheng, Shenhao Wang, Jinhua Zhao

Affiliations: Department of Civil and Environmental Engineering, Massachusetts Institute of Technology; Department of Urban Studies and Planning, Massachusetts Institute of Technology; Mathematical Institute, University of Oxford; Department of Mechanical Engineering, Massachusetts Institute of Technology; College of Urban and Environmental Sciences, Peking University; Department of Urban and Regional Planning, University of Florida; Center for Computational Science and Engineering, Massachusetts Institute of Technology

AI summary: Uses time series foundation models to generate probabilistic counterfactual forecasts for evaluating the congestion pricing policy that New York City implemented in 2025; bus and subway ridership rose significantly while overall travel demand fell slightly, with spatially heterogeneous effects.

Abstract

New York City implemented the nation's first cordon-based congestion pricing program in January 2025, providing an opportunity to evaluate how system-wide urban mobility responds to large-scale pricing interventions. Because such policies generate spillovers across modes and locations, credible control groups are difficult to construct. We address this challenge using time series foundation models to generate probabilistic counterfactual demand forecasts with calibrated uncertainty. Applying this framework to bus, subway, and aggregate trip volume data, we find that post-policy bus and subway ridership increased significantly relative to expected no-policy demand, while overall travel demand decreased modestly. The effects are spatially heterogeneous: while reductions in overall travel demand are concentrated within the Congestion Relief Zone, transit gains extend beyond Manhattan's core. Socio-demographic analyses further reveal uneven adaptation across neighborhoods, highlighting spatial equity implications. Our framework provides a scalable approach for the uncertainty-aware evaluation of system-wide urban interventions when clean control groups are unavailable.

2606.17958 2026-06-17 cs.CV cs.LG cross-list

Beyond Visual Cues: CoT-Enhanced Reasoning for Semi-supervised Medical Image Segmentation

Yuming Chen, Yuxin Xie, Tao Zhou, Yi Zhou

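Affiliations: School of Computer Science and Engineering, Southeast University; Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education; Nanjing University of Science and Technology

AI summary: Proposes the CERS framework, which integrates chain-of-thought reasoning with a semantic-aware reference selection strategy to address the visual-semantic mismatch in semi-supervised medical image segmentation, outperforming existing methods in cases with ambiguous boundaries and semantic inconsistencies.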

Comments Accepted to MICCAI 2026

Abstract

Semi-supervised medical image segmentation has emerged as a dominant research problem in medical image analysis, mitigating annotation scarcity by leveraging consistency regularization on unlabeled data. However, existing approaches operate predominantly via visual pattern matching, relying heavily on pixel-level similarities. This visual-centric dependency often falters in clinical scenarios characterized by the visual-semantic mismatch, where visually similar lesions warrant distinct diagnostic conclusions, thus failing to capture the underlying diagnostic logic used by experts. To address this, we move beyond visual cues and propose CERS (CoT-Enhanced Reasoning Segmentation), a framework that integrates Chain-of-Thought (CoT) reasoning to distinguish pathologically distinct cases. Specifically, we construct a knowledge pool enriched with linguistic reasoning descriptions generated by large language models (LLMs). A semantic-aware reference selection strategy is introduced to identify historical evidence, filtering candidates first by morphology, and then refining them via CoT consistency to eliminate hard negatives. Furthermore, a multi-scale coordinate attention module (MCAM) is designed to effectively fuse this reasoning-derived context into the decoding process. Extensive experiments demonstrate the superiority of CERS against state-of-the-art approaches, particularly in resolving boundary ambiguities and semantic inconsistencies. The code is available at https://github.com/cymasuna/CERS.

2606.18021 2026-06-17 cs.AI cs.CL cs.LG cs.MA cross-list

LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI

Lalit Yadav, Akshaj Gurugubelli

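Affiliations: Independent Researcher, Sunnyvale, CA, USA; Independent Researcher, San Diego, CA, USA

AI summary: To expose the concentration and direction of errors that aggregate metrics hide in legal AI, proposes the LegalHalluLens auditing framework, which combines typed hallucination profiles, a Risk Direction Index (RDI), and a calibrated debate pipeline; the pipeline reduces fabricated detections by 45%, and the diagnostics reveal failure modes hidden by aggregate metrics.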

Comments 15 pages, 5 figures; Published at the Second Workshop on Agents in the Wild: Safety, Security, and Beyond (AIWILD) at ICML 2026

Abstract

AI systems deployed in legal workflows hallucinate at rates that aggregate metrics report at ~52%, but this average conceals where errors concentrate and in which direction they run, leaving compliance officers without an actionable signal for trustworthy deployment. We present LegalHalluLens, an auditing framework with three components: typed hallucination profiles across four legally-motivated claim categories (numeric, temporal, obligation/entitlement, factual) over CUAD (Hendrycks et al., 2021); a Risk Direction Index (RDI) that reduces omission-versus-invention bias to a single deployment-comparable scalar; and a typed debate pipeline calibrated to both magnitudes and directions. Across 510 contracts and 249,252 clause-level instances we measure a within-model gap of approximately 38-40 pp between obligation/numeric and temporal claims that aggregate reporting hides, and show that two systems with matched 52% rates can carry opposite RDIs. The debate pipeline reduces fabricated detections by 45% with per-category gains tracking the diagnosis, matching commercial APIs with a substantially smaller backbone (4B active parameters). Typed profiles and RDI surface failure modes that aggregate metrics hide; we further show these diagnostics serve as calibration inputs for multi-agent debate pipelines, where Skeptic challenges and asymmetric gates targeted at measured failure modes outperform generically-tuned debate. The framework supports direction-aware procurement, accountability, and agent design for legal AI deployed in the wild.

2606.18063 2026-06-17 cs.CV cs.AI cs.LG cross-list

When LLMs Analyze Scars: From Images to Clinically-Meaningful Features

Ruman Wang, Hangting Ye

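Affiliations: Liaoning University of Traditional Chinese Medicine; School of Artificial Intelligence, Jilin University

AI summary: Proposes the ScaFE framework, which uses an LLM as a knowledge-driven feature engineer to turn high-dimensional images into low-dimensional, clinically interpretable features, outperforming end-to-end deep learning in data-scarce scar classification.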

Abstract

Medical image classification faces a fundamental dilemma: while deep learning models achieve remarkable performance at scale, real-world clinical scenarios often suffer from severe data scarcity due to annotation costs, privacy constraints, and disease rarity. This challenge is particularly pronounced in pathological scar classification, where differentiating keloids from hypertrophic scars requires subtle expert knowledge and labeled images are extremely limited. We propose a novel paradigm that repositions large language models (LLMs) as knowledge-driven feature engineers rather than end-to-end classifiers. We call this framework ScaFE (Scar Feature Engineering). Our key insight is that LLMs encode rich medical knowledge that can be externalized as executable feature extraction code, enabling the transformation of high-dimensional images into low-dimensional, clinically interpretable representations. Specifically, we prompt an LLM with established scar assessment criteria to generate deterministic Python code that extracts features aligned with clinical scoring systems such as the Vancouver Scar Scale. Our approach offers three key advantages: (1) data efficiency, achieving robust performance with limited training samples by decoupling knowledge acquisition from statistical learning; (2) privacy preservation, as raw images are processed locally without exposure to external LLMs; and (3) interpretability, through explicit features grounded in clinical reasoning. Extensive experiments on scar classification demonstrate that our method consistently outperforms end-to-end deep learning baselines or using LLMs as black-box classifiers under limited data conditions, establishing a promising direction for integrating LLMs into data-efficient and clinically transparent medical AI systems.

2606.18223 2026-06-17 cs.CR cs.AI cs.LG cs.SY eess.SY cross-list

Learning Red Agent Policy from Observations for Neurosymbolic Autonomous Cyber Agents

Ankita Samaddar, Sandeep Neema, Daniel Balasubramanian, Xenofon Koutsoukos

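Affiliations: MIT

AI summary: Because red-agent actions are unobservable during cyber-attacks, proposes an imitation-learning-based policy learning technique that predicts red-agent behavior from network observations and defender actions, and integrates it with a neurosymbolic defense agent to achieve high prediction accuracy.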

Abstract

With sophisticated cyber-attacks becoming increasingly prevalent, modern networks require intelligent autonomous cyber-defense agents trained via Reinforcement Learning (RL). These agents employ neurosymbolic approaches such as behavior trees with learning-enabled components (LECs) to learn, reason, adapt, and implement security rules while maintaining critical operations. However, these autonomous networks are partially observable systems, i.e., the cyber-attacker's (red agent's) actions are not observable, making it difficult for the defender to predict red actions, learn red policies, or assess the attacker's intrusion levels. To address this, we propose a Policy Learning Technique using imitation learning to learn policies for partially observable RL agents with discrete states and discrete actions. We apply this technique in an autonomous cyber environment to predict red agent's actions from network observations and defender actions. Integrated with a neurosymbolic cyber-defense agent, our method effectively handles different red policies and achieves high prediction accuracy across diverse simulated scenarios.

2505.03509 2026-06-17 cs.LG astro-ph.IM version update

AnomalyMatch: Discovering Rare Objects of Interest with Semi-supervised and Active Learning

Pablo Gómez, Laslo E. Ruhberg, Maria Teresa Nardone, David O'Ryan

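AI summary: Proposes the AnomalyMatch framework, which combines the semi-supervised FixMatch algorithm with active learning, treats anomaly detection as binary classification, and trains on a few labelled and many unlabelled images, achieving high AUROC and AUPRC under severe class imbalance.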

Comments Accepted for publication in RASTI; 17 pages; 12 figures

Abstract

Anomaly detection in large datasets is essential in astronomy and computer vision. However, due to a scarcity of labelled data, it is often infeasible to apply supervised methods to anomaly detection. We present AnomalyMatch, an anomaly detection framework combining the semi-supervised FixMatch algorithm using EfficientNet classifiers with active learning. AnomalyMatch is tailored for large-scale applications and integrated into the ESA Datalabs science platform. In this method, we treat anomaly detection as a binary classification problem and efficiently utilise limited labelled and abundant unlabelled images for training. We enable active learning via a user interface for verification of high-confidence anomalies and correction of false positives. Evaluations on the GalaxyMNIST astronomical dataset and the miniImageNet natural-image benchmark under severe class imbalance show strong performance. Starting from five to ten labelled anomalies, we achieve an average AUROC of 0.96 (miniImageNet) and 0.89 (GalaxyMNIST), with respective AUPRC of 0.82 and 0.77. After three active learning cycles, anomalies are ranked with 76% (miniImageNet) to 94% (GalaxyMNIST) precision in the top 1% of the highest-ranking images by score. We compare to the established Astronomaly software on selected 'odd' galaxies from the 'Galaxy Zoo - The Galaxy Challenge' dataset, achieving comparable performance with an average AUROC of 0.83. Our results underscore the exceptional utility and scalability of this approach for anomaly discovery, highlighting the value of specialised approaches for domains characterised by severe label scarcity.
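
The two kinds of metric quoted in this abstract, AUROC and precision within the top-scoring fraction of images, can be sketched in a few lines. The scores and labels below are hypothetical toy values, and the function names `auroc` and `precision_at_top_fraction` are illustrative; the paper's own evaluation code may differ in details such as tie handling.

```python
# Illustrative sketch only: ranking metrics for a binary anomaly detector.
# Scores and labels are made-up toy values, not results from the paper.

def auroc(scores, labels):
    """Probability that a random anomaly outranks a random normal item (ties count half)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def precision_at_top_fraction(scores, labels, fraction):
    """Share of true anomalies among the top `fraction` of items ranked by score."""
    k = max(1, int(len(scores) * fraction))
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    return sum(label for _, label in ranked[:k]) / k

scores = [0.95, 0.90, 0.80, 0.40, 0.35, 0.30, 0.20, 0.15, 0.10, 0.05]
labels = [1, 0, 1, 0, 0, 0, 0, 0, 0, 0]   # 1 marks an anomaly

print("AUROC:", auroc(scores, labels))
print("Precision in top 20%:", precision_at_top_fraction(scores, labels, 0.2))
```

The pairwise AUROC above is quadratic in the number of items, which is fine for a toy example; for large datasets a rank-based implementation would be the usual choice.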

2506.05797 2026-06-17 cs.LG cs.CE cs.RO version update

EqCollide: Equivariant and Collision-Aware Deformable Objects Neural Simulator

Qianyi Chen, Tianrun Gao, Chenbo Jiang, Tailin Wu

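Affiliations: Westlake University; Fudan University; Tongji University; McGill University

AI summary: Proposes EqCollide, the first end-to-end equivariant neural fields simulator, which uses an equivariant encoder and a graph-neural-network-based neural ODE with collision-aware message passing to simulate deformable object collisions accurately, stably, and scalably.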

Comments SIGKDD 2026 Oral AI4S Track. 20 pages, 16 figures

Abstract

Simulating collisions of deformable objects is a fundamental yet challenging task due to the complexity of modeling solid mechanics and multi-body interactions. Existing data-driven methods often suffer from lack of equivariance to physical symmetries, inadequate handling of collisions, and limited scalability. Here we introduce EqCollide, the first end-to-end equivariant neural fields simulator for deformable objects and their collisions. We propose an equivariant encoder to map object geometry and velocity into latent control points. A subsequent equivariant Graph Neural Network-based Neural Ordinary Differential Equation models the interactions among control points via collision-aware message passing. To reconstruct velocity fields, we query a neural field conditioned on control point features, enabling continuous and resolution-independent motion predictions. Experimental results on 2D and 3D scenarios show that EqCollide achieves accurate, stable, and scalable simulations across diverse object configurations. It achieves $24.34\%$ to $57.62\%$ lower rollout MSE, even compared with the best-performing baseline model. Furthermore, EqCollide could generalize to more colliding objects and extended temporal horizons, and stay robust to input transformed with group action. Code is available at: https://github.com/AI4Science-WestlakeU/EqCollide

2602.03045 2026-06-17 cs.LG version update

Clarify Before You Draw: Proactive Agents for Robust Text-to-CAD Generation

Bo Yuan, Zelin Zhao, Petr Molodyk, Bin Hu, Yongxin Chen

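AI summary: Proposes ProCAD, a proactive agentic framework in which a clarifying agent resolves ambiguities in the user prompt before code generation and a CAD coding agent then produces an executable program, substantially improving robustness and reducing mean Chamfer distance by 79.9%.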

Comments ICML 2026

Abstract

Large language models have recently enabled text-to-CAD systems that synthesize parametric CAD programs (e.g., CadQuery) from natural-language prompts. In practice, however, geometric descriptions can be under-specified or internally inconsistent: critical dimensions may be missing and constraints may conflict. Moreover, existing fine-tuned models tend to reactively follow user instructions and hallucinate dimensions when the text is ambiguous. To address this, we propose a proactive agentic framework for text-to-CadQuery generation, named ProCAD, that resolves specification issues before code synthesis. Our framework pairs a proactive clarifying agent, which audits the prompt and asks targeted clarification questions only when necessary to produce a self-consistent specification, with a CAD coding agent that translates the specification into an executable CadQuery program. We fine-tune the coding agent on a curated high-quality text-to-CadQuery dataset and train the clarifying agent via agentic SFT on clarification trajectories. Experiments show that proactive clarification significantly improves robustness to ambiguous prompts while keeping interaction overhead low. ProCAD outperforms frontier closed-source models, including Claude Sonnet 4.5, reducing the mean Chamfer distance by 79.9% and lowering the invalidity ratio from 4.8% to 0.9%. Our code and datasets are publicly available at https://github.com/BoYuanVisionary/Pro-CAD.
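
For readers unfamiliar with the Chamfer distance reported in this abstract, here is a minimal sketch of one common variant: the mean nearest-neighbour Euclidean distance, summed over both directions. The abstract does not state which variant the paper uses (squared distances and different normalisations are also common), so this choice is an assumption, and the point sets below are hypothetical.

```python
# Illustrative sketch only: one common form of the symmetric Chamfer distance
# between two point sets. The exact variant used in the paper is not given
# in the abstract, so treat this definition as an assumption.
import math

def chamfer_distance(points_a, points_b):
    """Mean nearest-neighbour distance from A to B plus the same from B to A."""
    def mean_nearest(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return mean_nearest(points_a, points_b) + mean_nearest(points_b, points_a)

# Hypothetical point sets: corners of a unit square and a copy shifted by 0.5 in z.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (1.0, 1.0, 0.0), (0.0, 1.0, 0.0)]
shifted = [(x, y, z + 0.5) for x, y, z in square]

print("Identical shapes:", chamfer_distance(square, square))
print("Shifted copy:    ", chamfer_distance(square, shifted))
```

In a text-to-CAD setting the point sets would typically be sampled from the surfaces of the generated and reference geometry; a lower value means the generated shape lies closer to the reference.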

2602.11715 2026-06-17 cs.LG cs.CL version update

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

Haolei Bai, Lingcheng Kong, Xueyi Chen, Jianmian Wang, Zhiqiang Tao, Huan Wang

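AI summary: Introduces the CuKe dataset and the BiC-RL training framework to build DICE, a series of diffusion large language models (1.7B/4B/8B) that significantly outperform comparable autoregressive and diffusion models on KernelBench, setting a new state of the art for CUDA kernel generation.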

Comments v2: Expanded with dLLM vs. autoregressive LLM comparisons, ablation studies, and qualitative case studies

Abstract

Diffusion large language models (dLLMs) have emerged as a compelling alternative to autoregressive (AR) LLMs, owing to their capacity for parallel token generation. This paradigm is particularly well-suited for code generation, where holistic structural planning and non-sequential refinement are critical. Despite this potential, tailoring dLLMs for CUDA kernel generation remains challenging, obstructed not only by the high specialization but also by the severe lack of high-quality training data. To address these challenges, we construct CuKe, an augmented supervised fine-tuning dataset optimized for high-performance CUDA kernels. On top of it, we propose a bi-phase curated reinforcement learning (BiC-RL) framework consisting of a CUDA kernel infilling stage and an end-to-end CUDA kernel generation stage. Leveraging this training framework, we introduce DICE, a series of diffusion large language models designed for CUDA kernel generation, spanning three parameter scales, 1.7B, 4B, and 8B. Extensive experiments on KernelBench demonstrate that DICE significantly outperforms both autoregressive and diffusion LLMs of comparable scale, establishing a new state-of-the-art for CUDA kernel generation.

2602.22277 2026-06-17 cs.LG eess.SP version update

X-REFINE: XAI-based RElevance input-Filtering and archItecture fiNe-tuning for channel Estimation

Abdul Karim Gizzini, Yahia Medjahdi

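AI summary: Proposes the X-REFINE framework, which uses a decomposition-based, sign-stabilized LRP epsilon rule to jointly optimize input filtering and architecture fine-tuning, achieving a superior performance-complexity-interpretability trade-off in channel estimation.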

Comments This paper has been accepted for publication in the IEEE Transactions on Vehicular Technology (TVT) as a correspondence paper

Abstract

AI-native architectures are vital for 6G wireless communications. The black-box nature and high complexity of deep learning models employed in critical applications, such as channel estimation, limit their practical deployment. While perturbation-based eXplainable Artificial Intelligence (XAI) solutions offer input filtering, they often neglect internal structural optimization. We propose X-REFINE, an XAI-based framework for joint input-filtering and architecture fine-tuning. By utilizing a decomposition-based, sign-stabilized LRP epsilon rule, X-REFINE backpropagates predictions to derive high-resolution relevance scores for both subcarriers and hidden neurons. This enables a reliable optimization that identifies the most reliable model components. Simulation results demonstrate that X-REFINE achieves a superior performance-complexity-interpretability trade-off compared to the external perturbation-based XAI frameworks, significantly reducing computational complexity while maintaining robust bit error rate (BER) performance.

2604.17616 2026-06-17 cs.LG version update

Conditional Attribution for Root Cause Analysis in Time-Series Anomaly Detection

Shashank Mishra, Karan Patil, Cedric Schockaert, Didier Stricker, Jason Rambach

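AI summary: Proposes a conditional attribution framework that retrieves normal instances contextually similar to the anomalous observation for dependency-preserving explanations, uses variational autoencoder latent spaces and UMAP manifold embeddings for efficient attribution on high-dimensional time series, and improves root-cause identification accuracy and robustness on the SWaT and MSDS benchmarks.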

Comments Accepted at ECML PKDD. 16 pages, 8 figures, 13 tables, and an appendix

Journal ref: ECML PKDD 2026

Abstract

Root cause analysis (RCA) for time-series anomaly detection is critical for the reliable operation of complex real-world systems. Existing explanation methods often rely on unrealistic feature perturbations and ignore temporal and cross-feature dependencies, leading to unreliable attributions. We propose a conditional attribution framework that explains anomalies relative to contextually similar normal system states. Instead of using marginal or randomly sampled baselines, our method retrieves representative normal instances conditioned on the anomalous observation, enabling dependency-preserving and operationally meaningful explanations. To support high-dimensional time-series data, contextual retrieval is performed in learned low-dimensional representations using both variational autoencoder latent spaces and UMAP manifold embeddings. By grounding the retrieval process in the system's learned manifold, this strategy avoids out-of-distribution artifacts and ensures attribution fidelity while maintaining computational efficiency. We further introduce confidence-aware and temporal evaluation metrics for assessing explanation reliability and responsiveness. Experiments on the SWaT and MSDS benchmarks demonstrate that the proposed approach consistently improves root-cause identification accuracy, temporal localization, and robustness across multiple anomaly detection models. These results highlight the practical utility of conditional attribution for explainable anomaly diagnosis in complex time-series systems. Code and models are available at: https://github.com/dfki-av/Conditional-Attribution-for-Root-Cause-Analysis-in-Time-Series-Anomaly-Detection.

2606.11990 2026-06-17 cs.LG cs.AI version update

Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation

Amir El-Ghoussani, Michele De Vita, Ronald Naumann, Vasileios Belagiannis

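Affiliations: University of Erlangen-Nuremberg; Siemens AG

AI summary: Uses the frozen pretrained time-series foundation model Chronos-2 as a backbone with a lightweight regression head for remaining useful life prediction, outperforming several baselines on industrial sensor data.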

Comments Accepted to EUSIPCO 2026, 4 pages, 2 figures, 2 tables

Abstract

Remaining Useful Life (RUL) prediction is essential for industrial predictive maintenance, yet many learning-based approaches rely on extensive feature engineering or large labeled datasets to train task-specific sequence models. In this work, we introduce a lightweight learning approach, in which we leverage a frozen pretrained time-series foundation model (TSFM) and combine it with a small regression head for RUL estimation from multivariate sensor streams. More specifically, we use Chronos-2 as a frozen backbone to extract context window features and train a lightweight regression neural network for RUL prediction. Experiments on real-world industrial sensor data from two device types show that Chronos-2 features consistently improve over recurrent, convolutional, Transformer-based, and gradient-boosting baselines under the same preprocessing and evaluation protocol. We further analyze the impact of context length and find that performance improves significantly with longer histories, indicating that TSFM representations offer a practical and data-efficient alternative for RUL estimation in industrial settings.

2606.16878 2026-06-17 cs.LG version update

Integrated Marketing Attribution: A Bayesian Framework for Privacy-Safe Granular Measurement Anchored in MMM

Meghana R. Bhat, Ankit Umare, Utsav Aggarwal, Richard Vecsler, Arunkumar Mani, Karthik Nair, Chandhu Nair

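AI summary: Proposes the Integrated Marketing Attribution (IMA) framework, which combines marketing mix models (MMM) with a Bayesian attribution model to derive campaign-level effects from aggregate data, enabling privacy-safe, granular attribution.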

2403.18957 2026-06-17 cs.CY cs.CL cs.LG cs.SI version update

Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision-Language Models

Keyan Guo, Ayush Utkarsh, Wenbo Ding, Isabelle Ondracek, Ziming Zhao, Guo Freeman, Nishant Vishwamitra, Hongxin Hu

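AI summary: Targeting images on social media that illicitly promote unsafe UGC games, proposes the UGCG-Guard system, which uses large vision-language models and a conditional prompting strategy for zero-shot domain adaptation and reaches 94% detection accuracy.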

Comments In Proceedings of the 33rd USENIX Conference on Security Symposium (SEC '24), August 14-16, 2024

Abstract

Online user generated content games (UGCGs) are increasingly popular among children and adolescents for social interaction and more creative online entertainment. However, they pose a heightened risk of exposure to explicit content, raising growing concerns for the online safety of children and adolescents. Despite these concerns, few studies have addressed the issue of illicit image-based promotions of unsafe UGCGs on social media, which can inadvertently attract young users. This challenge arises from the difficulty of obtaining comprehensive training data for UGCG images and the unique nature of these images, which differ from traditional unsafe content. In this work, we take the first step towards studying the threat of illicit promotions of unsafe UGCGs. We collect a real-world dataset comprising 2,924 images that display diverse sexually explicit and violent content used to promote UGCGs by their game creators. Our in-depth studies reveal a new understanding of this problem and the urgent need for automatically flagging illicit UGCG promotions. We additionally create a cutting-edge system, UGCG-Guard, designed to aid social media platforms in effectively identifying images used for illicit UGCG promotions. This system leverages recently introduced large vision-language models (VLMs) and employs a novel conditional prompting strategy for zero-shot domain adaptation, along with chain-of-thought (CoT) reasoning for contextual identification. UGCG-Guard achieves outstanding results, with an accuracy rate of 94% in detecting these images used for the illicit promotion of such games in real-world scenarios.


2410.08562 2026-06-17 cond-mat.mtrl-sci cs.LG 版本更新

Detecting and Mitigating DDoS Attacks with AI: A Survey

利用人工智能检测和缓解DDoS攻击：综述

Alexandru Apostu, Silviu Gheorghe, Andrei Hîji, Nicolae Cleju, Andrei Pătraşcu, Cristian Rusu, Radu Ionescu, Paul Irofti

发表机构 * Department of Computer Science, University of Bucharest（布加勒斯特大学计算机科学系）

AI总结本文综述了基于AI的DDoS攻击检测与缓解方法，提供了基于专家层次和AI生成树状图的分类法，讨论了数据集、对抗训练及未来研究方向。

2509.15210 2026-06-17 cs.SD cs.AI cs.LG 版本更新

Explicit Context-Driven Neural Acoustic Modeling for High-Fidelity RIR Generation

显式上下文驱动的神经声学建模用于高保真RIR生成

Chen Si, Qianyi Wu, Chaitanya Amballa, Romit Roy Choudhury

AI总结提出MiNAF模型，通过查询房间网格并提取距离分布作为显式局部几何特征，引导神经隐式模型生成更准确的房间脉冲响应（RIR），在多项指标上达到竞争性能。

详情

AI中文摘要

逼真的声音模拟在许多应用中起着关键作用。声音模拟的一个关键要素是房间脉冲响应（RIR），它描述了声音在给定空间中的传播方式。最近的研究应用神经隐式方法，利用从环境中收集的上下文信息（如场景图像）来学习RIR。然而，这些方法没有有效利用环境中的显式几何信息。为了进一步利用具有直接几何特征的神经隐式模型，我们提出了MiNAF，它在给定位置查询粗略的房间网格，并提取距离分布作为局部上下文的显式表示。我们的方法表明，结合显式的局部几何特征可以更好地引导模型生成更准确的RIR预测。通过与常规和最先进方法的比较，我们展示了MiNAF在各种评估指标上具有竞争力的性能。

英文摘要

Realistic sound simulation plays a critical role in many applications. A key element in sound simulation is the room impulse response (RIR), which characterizes how sound propagates within a given space. Recent studies have applied neural implicit methods to learn RIR using context information collected from the environment, such as scene images. However, these approaches do not effectively leverage explicit geometric information from the environment. To further exploit neural implicit models with direct geometric features, we present MiNAF, which queries a rough room mesh at given locations and extracts distance distributions as an explicit representation of local context. Our approach demonstrates that incorporating explicit local geometric features can better guide the model in generating more accurate RIR predictions. Through comparisons with conventional and state-of-the-art methods, we show that MiNAF performs competitively across various evaluation metrics.


2509.26476 2026-06-17 cs.CL cs.AI cs.LG cs.PF cs.SE 版本更新

Regression Language Models for Code

代码的回归语言模型

Yash Akhauri, Xingyou Song, Arissa Wongpanich, Bryan Lewandowski, Mohamed S. Abdelfattah

AI总结提出回归语言模型（RLM），利用冻结的大语言模型编码器直接从文本预测代码执行结果（如内存占用、延迟、神经网络精度等），在多个任务上达到高相关度。

Comments Published in International Conference on Machine Learning (ICML) 2026

详情

AI中文摘要

我们研究代码到指标的回归：预测代码执行的数值结果，由于编程语言的开放性，这是一项具有挑战性的任务。虽然先前的方法依赖于繁重且特定领域的特征工程，但我们展示了一个统一的回归语言模型（RLM），使用冻结的LLM编码器可以直接从文本同时预测：(i) 多种高级语言（如Python和C++）代码的内存占用，(ii) Triton GPU内核的延迟，以及(iii) 以ONNX表示的已训练神经网络的精度和速度。特别是，一个基于T5Gemma的较小300M参数RLM在APPS的竞赛编程提交上获得了>0.9的Spearman等级相关系数，而单个统一模型在CodeNet的17种不同语言上获得了>0.5的平均Spearman等级相关系数。此外，RLM在五个经典NAS设计空间上获得了最高平均Kendall-Tau 0.46，这些空间此前由图神经网络主导，并且能同时预测多种硬件平台上的架构延迟。

英文摘要

We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to heavy and domain-specific feature engineering, we show that a single unified Regression Language Model (RLM) using a frozen LLM encoder can simultaneously predict, directly from text, (i) the memory footprint of code across multiple high-level languages such as Python and C++, (ii) the latency of Triton GPU kernels, and (iii) the accuracy and speed of trained neural networks represented in ONNX. In particular, a relatively small 300M-parameter RLM based on T5Gemma obtains >0.9 Spearman-rank on competitive programming submissions from APPS, and a single unified model achieves >0.5 average Spearman-rank across 17 different programming languages from CodeNet. Furthermore, the RLM can obtain the highest average Kendall-Tau of 0.46 on five classic NAS design spaces previously dominated by graph neural networks, and simultaneously predict architecture latencies on numerous hardware platforms.
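The Spearman-rank figures quoted in this abstract compare the ordering of predicted metrics with the ordering of measured ones. As a minimal sketch of how such a score could be computed (the helper names and the sample numbers are illustrative assumptions, not taken from the paper), one can rank both lists with averaged ranks for ties and take the Pearson correlation of the ranks:

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def spearman(predicted, measured):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(predicted), average_ranks(measured))


# Hypothetical predicted vs. measured memory footprints (arbitrary units).
predicted = [1.0, 2.0, 3.0, 4.0, 5.0]
measured = [2.0, 1.0, 4.0, 3.0, 5.0]
score = spearman(predicted, measured)
```

Because only the ordering matters, a model can score well here even when its absolute predictions are off by a monotone transformation.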


2510.19838 2026-06-17 cs.AI cs.CL cs.LG 版本更新

Branch-and-Browse: Efficient and Controllable Web Exploration with Tree-Structured Reasoning and Action Memory

Branch-and-Browse：具有树状推理与动作记忆的高效可控网页探索

Shiqi He, Yue Cui, Xinyu Ma, Yaliang Li, Bolin Ding, Mosharaf Chowdhury

AI总结提出Branch-and-Browse框架，通过树状结构化推理、网页状态重放和页面动作记忆，实现LLM网页代理的高效可控多分支探索，在WebArena上成功率35.8%，执行时间降低40.4%。

详情

AI中文摘要

由大型语言模型（LLM）驱动的自主网页代理在执行目标导向任务（如信息检索、报告生成和在线交易）方面展现出强大潜力。这些代理标志着向开放网络环境中实用具身推理的关键一步。然而，现有方法在推理深度和效率方面仍然受限：简单的线性方法无法进行多步推理且缺乏有效的回溯，而其他搜索策略则粗粒度且计算成本高。我们引入了Branch-and-Browse，一个细粒度的网页代理框架，它统一了结构化推理-行动、上下文记忆和高效执行。它（i）采用显式子任务管理与树状结构化探索，实现可控的多分支推理；（ii）通过高效的网页状态重放与后台推理引导探索；（iii）利用页面动作记忆在会话内和跨会话间共享已探索的动作。在WebArena基准测试中，Branch-and-Browse的任务成功率达到35.8%，相对于最先进的方法执行时间减少高达40.4%。这些结果表明，Branch-and-Browse是一个可靠且高效的基于LLM的网页代理框架。

英文摘要

Autonomous web agents powered by large language models (LLMs) show strong potential for performing goal-oriented tasks such as information retrieval, report generation, and online transactions. These agents mark a key step toward practical embodied reasoning in open web environments. However, existing approaches remain limited in reasoning depth and efficiency: vanilla linear methods fail at multi-step reasoning and lack effective backtracking, while other search strategies are coarse-grained and computationally costly. We introduce Branch-and-Browse, a fine-grained web agent framework that unifies structured reasoning-acting, contextual memory, and efficient execution. It (i) employs explicit subtask management with tree-structured exploration for controllable multi-branch reasoning, (ii) bootstraps exploration through efficient web state replay with background reasoning, and (iii) leverages a page action memory to share explored actions within and across sessions. On the WebArena benchmark, Branch-and-Browse achieves a task success rate of 35.8\% and reduces execution time by up to 40.4\% relative to state-of-the-art methods. These results demonstrate that Branch-and-Browse is a reliable and efficient framework for LLM-based web agents.


2511.19162 2026-06-17 cs.IR cs.CY cs.HC cs.LG cs.MM 版本更新

BioArtlas: Computational Clustering of Multi-Dimensional Complexity in Bioart

BioArtlas：生物艺术中多维复杂性的计算聚类

Joonhyung Bae

发表机构 * Graduate School of Culture Technology（文化科技研究生院）

AI总结本文提出BioArtlas，通过新型轴感知表示对81件生物艺术作品进行多维分析，揭示四种组织模式，并通过交互式网页界面提供分析与探索。

Comments Bae, J. BioArtlas: Computational Clustering of Multi-Dimensional Complexity in Bioart. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Creative AI Track: Humanity

详情

AI中文摘要

生物艺术的混合性质跨越艺术、科学、技术、伦理和政治，挑战传统单一轴分类。我提出了BioArtlas，利用新型轴感知表示分析81件生物艺术作品，共十三个 curated 维度。我们的代码本方法将相关概念分组为统一聚类，解决文化术语的多义性。对多达800种表示空间-算法组合的全面评估发现，Agglomerative clustering在k=15的4D UMAP上最优（轮廓系数0.664±0.008，信任度/连续性0.805/0.812）。该方法揭示了四种组织模式：艺术家特定的方法论凝聚力、基于技术的分段、时间艺术演变以及跨时间的概念亲和力。通过将分析优化与公共传播分离，我通过交互式网页界面（https://www.bioartlas.com）提供严谨分析和可访问的探索，数据集公开可用（https://github.com/joonhyungbae/BioArtlas）.

英文摘要

Bioart brings living material into artistic practice, where a single work can be at once an aesthetic object, a scientific instrument, and an ethical provocation. Traditional categories sort such works along one axis at a time, which flattens the very hybridity that defines the field and leaves curators no way to compare works across many dimensions together. I introduce BioArtlas, a computational atlas that represents each bioartwork along many curated dimensions at once and organizes the field by conceptual similarity rather than by medium or chronology. My method embeds the keywords of all 81 works on each of thirteen interpretive axes, groups related concepts into a shared codebook that tames inconsistent terminology, and then searches systematically for a clustering that is both statistically clean and interpretable. Among the methods that place every work on the map, agglomerative clustering separates the field far more cleanly than the usual k-means baseline (silhouette 0.664 versus 0.483), whereas density-based methods reach higher scores only by discarding most of the corpus as noise. By separating rigorous analysis from public storytelling, BioArtlas turns the tangled complexity of bioart into a navigable landscape, openly available as an interactive interface (https://www.bioartlas.com) and dataset (https://github.com/joonhyungbae/BioArtlas).


2601.06862 2026-06-17 cs.CR cs.CV cs.LG cs.MM eess.IV 版本更新

Learning QoE from Packet-Level Measurements in Encrypted Video Conferencing Traffic

从加密视频会议流量的数据包级别测量中学习QoE

Michael Sidorov, Ofer Hadar

AI总结针对ISP无法访问加密内容评估QoE的挑战，提出基于CNN的框架仅利用数据包大小预测BRISQUE和MOS，在WhatsApp和Zoom数据集上优于先前模型。

详情

AI中文摘要

用户体验质量已成为当今世界最重要的方面之一，因为它直接影响个人继续使用或放弃产品或服务的意愿。在此背景下，视频会议应用（VCAs）在COVID-19大流行后得到广泛采用，必须在日益拥挤的市场中提供卓越性能以保持竞争力。尽管内容提供商（CPs）如Zoom、WhatsApp、Telegram和Google Meet可以通过比较发送和接收的数据来评估通话质量，但VCAs中广泛使用的端到端加密使得互联网服务提供商（ISPs）评估体验质量（QoE）变得更加困难。由于ISPs无法访问加密内容，他们必须依赖对数据路径上未加密流量特征的被动测量。在这项工作中，我们提出了一个简单而有效的QoE预测框架，基于几乎原生的卷积神经网络（CNN）架构，仅使用从视频会议（VC）通话中两个参与者之间的通信中提取的数据包大小来预测两个QoE指标：BRISQUE和MOS。所提出的框架简单、易于实现，且不需要高端计算资源，但提供了优越的预测性能，正如我们在从WhatsApp和Zoom收集的两个自定义数据集上的实验所示，这些实验在QoE预测任务上比先前模型取得了显著改进。

英文摘要

The quality of the user experience has become one of the most important aspects in today's world, as it directly influences individuals' willingness to continue using or abandon a product or service. In this context, video conferencing applications (VCAs), which experienced widespread adoption following the COVID-19 pandemic, must deliver excellent performance to remain competitive in an increasingly crowded market. Although content providers (CPs) such as Zoom, WhatsApp, Telegram, and Google Meet can assess conversation quality by comparing transmitted and received data, the widespread use of end-to-end encryption in VCAs makes quality-of-experience (QoE) evaluation by internet service providers (ISPs) far more challenging. Since ISPs do not have access to the encrypted content, they must rely on passive measurements of unencrypted traffic characteristics on the data path. In this work, we present a simple yet effective QoE prediction framework based on an almost stock convolutional neural network (CNN) architecture that uses only the packet sizes extracted from the communication between two participants in a video conferencing (VC) call to predict two QoE metrics: BRISQUE and MOS. The proposed framework is simple, easy to implement, and does not require high-end computational resources, yet it provides superior prediction performance, as shown in our experiments on two custom datasets collected from WhatsApp and Zoom, which achieve substantial improvements over previous models for the QoE prediction task.


2602.04901 2026-06-17 q-bio.GN cs.LG 版本更新

Memory as a Depreciating Asset: Pricing Flash Endurance for Embodied Agents, and Its Limits

记忆作为消耗性资产：为具身智能体定价闪存耐久性及其局限性

Josef Liyanjun Chen

发表机构 * KAIKAKU

AI总结本文提出将机器人闪存耐久性视为折旧资本，通过单一影子价格η进行定价，实现成本最优的存储层级分配，并基于真实机器人日志测量价值-写入关联χ的符号，发现其取决于部署场景。

详情

AI中文摘要

机器人的闪存耐久性是一种不可再生资源：每次持久化写入都会消耗数千次编程/擦除周期中的一次，且无法补充，然而目前没有实际部署的机器人内存系统对哪些记忆值得消耗一次擦除周期进行定价。我们将具身记忆视为折旧资本，并用单一耐久性影子价格η对该资源定价，这使得在RAM/板载NVM/云层级中进行成本最小化的放置成为一个在磨损增强的每字节索引中的阈值。无论价值-写入关联χ的符号如何，该索引都是成本最优的；只有当χ>0时，最优解才变为非单调，将机器人最有价值的记忆从闪存中移出。因此，关键点是经验性的，我们在预定义的关口上测量真实机器人日志中的χ：其符号是部署场景的一个属性——在重复的长时域操作中为正（χ̂≈+1.0×10^{-3}，在全功率下可复现），在较短时域任务中为零，在非重复遥操作中为负。两个边界限制了该结果。在高端3,000 P/E TLC闪存按数据手册价格计算时，耐久性预算处于休眠状态；而在廉价边缘机器人使用的商用QLC/eMMC（约1,000 P/E）上则具有约束力。当约束生效时，学习到的磨损感知控制器仅在任务价值上与基于价格的路由持平，因为实现的价值在RAM、NVM和云层级之间是不变的：租金决定设备寿命和成本，而非任务性能。磨损感知放置是否能提高任务价值仍是一个开放问题——χ是针对价值代理测量的，而非单调最优解虽已被证明，但尚未在数据中观察到。

英文摘要

A robot's flash endurance is a non-renewable stock: every persisted write spends one of a few thousand program/erase cycles and never refills, yet no fielded robot memory system prices which memories are worth an erase cycle. We treat embodied memory as depreciating capital and price that stock with a single endurance shadow price $η$, which makes cost-minimizing placement across a RAM / on-board NVM / cloud hierarchy a threshold in a wear-augmented per-byte index. The index is cost-optimal whatever the sign of the value-write association $χ$; only when $χ> 0$ does the optimum turn non-monotone, sending a robot's most valuable memories off its flash. The pivot is thus empirical, and we measure $χ$ on real robot logs at a pre-specified gate: its sign is a property of the deployment regime -- positive on recurrent long-horizon manipulation ($\hatχ \approx +1.0 \times 10^{-3}$, replicated at full power), null on a shorter-horizon suite, and negative on non-recurrent teleoperation. Two boundaries scope the result. The endurance budget is dormant on premium 3,000-P/E TLC at datasheet prices and binding on the commodity QLC/eMMC ($\sim$1,000 P/E) that cheaper edge robots run. And where it binds, a learned wear-aware controller only ties price-based routing on task value, because realized value is tier-invariant across RAM, NVM, and cloud: the rent governs device lifetime and cost, not task performance. Whether wear-aware placement improves task value remains open -- $χ$ is measured against a value proxy, and the non-monotone optimum, while proven, is not yet observed in data.


2409.17502 2026-06-17 cs.LG 版本更新

Broadcast Product: Redefining Shape-aligned Element-wise Multiplication and Beyond

广播乘积：重新定义形状对齐的逐元素乘法及其扩展

Yusuke Matsui, Tatsuya Yokota

AI总结本文引入广播乘积$\boxdot$，形式化扩展Hadamard乘积以处理形状不匹配的张量逐元素乘法，并建立其代数性质及与线性代数的联系，为广播感知的张量运算奠定数学基础。

Comments TMLR2026. OpenReview: https://openreview.net/forum?id=zv0OtOPpPO

详情

AI中文摘要

广播操作在科学计算库中被广泛使用，但其数学形式化在机器学习文献中常常是隐式的且表示不一致。当逐元素乘积被写出但张量形状不匹配时，这个问题经常导致无效的方程。在本文中，我们通过引入广播乘积$\boxdot$来形式化此类操作，该乘积通过形状对齐的元素复制显式扩展了Hadamard乘积。我们提供了广播乘积的严格定义，分析了其代数性质，并展示了如何使用标准线性代数表示它。基于这一框架，我们制定了最小二乘问题并勾勒出一个概念验证的广播分解。作为初步说明，我们展示了该形式化方法能够产生一类具有与传统张量分解不同结构特性的新分解。这项工作为广播感知的张量运算建立了数学基础，将实际实现与严格的张量分析联系起来。

英文摘要

Broadcast operations are widely used in scientific computing libraries, yet their mathematical formulation is often implicit and inconsistently represented in machine learning literature. This problem frequently leads to invalid equations when element-wise products are written despite mismatched tensor shapes. In this paper, we formalize such operations by introducing the broadcast product $\boxdot$, which explicitly extends the Hadamard product through shape-aligned element duplication. We provide a rigorous definition of the broadcast product, analyze its algebraic properties, and show how it can be expressed using standard linear algebra. Building on this framework, we formulate least-squares problems and sketch a proof-of-concept broadcast decomposition. As a preliminary illustration, we show that the formalism enables a new family of decompositions with distinct structural properties from conventional tensor decompositions. This work establishes a mathematical foundation for broadcast-aware tensor operations, connecting practical implementations with rigorous tensor analysis.
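The idea of extending the Hadamard product through shape-aligned element duplication can be illustrated for the two-dimensional case. The sketch below is an assumption-laden illustration, not the paper's definition: it handles only nested-list matrices, and the function names `duplicate` and `broadcast_product` are hypothetical. An axis of size 1 is copied until both operands share a shape, after which an ordinary element-wise product applies.

```python
def shape(matrix):
    """Shape (rows, cols) of a non-empty nested-list matrix."""
    return (len(matrix), len(matrix[0]))


def duplicate(matrix, target):
    """Copy size-1 axes of `matrix` until it has the `target` shape."""
    rows, cols = shape(matrix)
    out_rows, out_cols = target
    if rows not in (1, out_rows) or cols not in (1, out_cols):
        raise ValueError(f"shape {(rows, cols)} cannot be aligned to {target}")
    return [
        [matrix[i if rows > 1 else 0][j if cols > 1 else 0]
         for j in range(out_cols)]
        for i in range(out_rows)
    ]


def broadcast_product(a, b):
    """Element-wise product after aligning shapes by duplication."""
    (a_rows, a_cols), (b_rows, b_cols) = shape(a), shape(b)
    target = (max(a_rows, b_rows), max(a_cols, b_cols))
    a_full, b_full = duplicate(a, target), duplicate(b, target)
    # With equal shapes this is the ordinary Hadamard product.
    return [[x * y for x, y in zip(row_a, row_b)]
            for row_a, row_b in zip(a_full, b_full)]


A = [[1, 2, 3],
     [4, 5, 6]]
column = [[10], [100]]      # shape (2, 1): duplicated across columns
row = [[1, 0, 2]]           # shape (1, 3): duplicated across rows

by_column = broadcast_product(A, column)   # [[10, 20, 30], [400, 500, 600]]
by_row = broadcast_product(A, row)         # [[1, 0, 6], [4, 0, 12]]
```

When both operands already have the same shape, no duplication happens and the result coincides with the Hadamard product, which is the sense in which the operation is an extension. Shapes that cannot be aligned raise an error instead of silently producing an invalid equation.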


2605.12646 2026-06-17 cs.LG cs.AI cs.HC 版本更新

Learning to Decide with AI Assistance under Human-Alignment

在人工智能协助下的人类对齐决策学习

Nina Corvelo Benz, Eleni Straitouri, Manuel Gomez-Rodriguez

发表机构 * GitHub

AI总结本文研究了在高风险领域中，人工智能如何通过预测结果帮助决策者，并探讨了AI预测信心与决策者自身信心的对齐程度对决策学习复杂性的影响。

详情

AI中文摘要

人们普遍认为，当人工智能模型通过预测感兴趣的结果来协助决策者时，它们应传达预测的置信度。然而，实证证据表明，决策者往往难以仅根据传达的置信度来判断何时信任预测。在此背景下，近期的理论和实证工作表明，AI辅助决策的效用与AI置信度和决策者自身置信度之间的对齐程度之间存在正相关性。关键的是，这些发现尚未阐明这种对齐程度如何影响通过重复交互学习做出最佳决策的复杂性。在本文中，我们考虑二元预测和二元决策的典型情况，首先证明该问题等价于具有完全反馈的双臂在线上下文学习问题，并建立了任何学习者可以达到的期望遗憾的下界为$Ω(\sqrt{|H| \cdot |B| \cdot T} )$，其中$H$和$B$分别表示人类和AI置信度的集合。然后我们证明，在AI和人类置信度完全对齐的情况下，学习者可以达到期望遗憾为$O(\sqrt{|H| \cdot T\log T})$，当$\sqrt{|H|} = O(\log T)$且$B$是可数的时，Dvoretzky-Kiefer-Wolfowitz不等式的非平凡推广将遗憾界改进到$O(\sqrt{T\log T})$。这些结果表明，对齐可以减少在人工智能协助下学习决策的复杂性。在两个不同的人类主体研究中，参与者通过AI模型协助解决简单决策任务的实验证明，我们的理论结果在完全对齐被违反时仍然稳健。

英文摘要

It is widely agreed that when AI models assist decision-makers in high-stakes domains by predicting an outcome of interest, they should communicate the confidence of their predictions. However, empirical evidence suggests that decision-makers often struggle to determine when to trust a prediction based solely on this communicated confidence. In this context, recent theoretical and empirical work suggests a positive correlation between the utility of AI-assisted decision-making and the degree of alignment between the AI confidence and the decision-makers' confidence in their own predictions. Crucially, these findings do not yet elucidate the extent to which this alignment influences the complexity of learning to make optimal decisions through repeated interactions. In this paper, we address this question in the canonical case of binary predictions and binary decisions. We first show that this problem is equivalent to a two-armed online contextual learning problem with full feedback, and establish a lower bound of $Ω(\sqrt{|H| \cdot |B| \cdot T} )$ on the expected regret any learner can attain, where $H$ and $B$ denote the sets of human and AI confidence values. We then demonstrate that, under perfect alignment between AI and human confidence, a learner can attain an expected regret of $O(\sqrt{|H| \cdot T\log T})$ and, when $\sqrt{|H|} = O(\log T)$ and $B$ is countable, a non-trivial generalization of the Dvoretzky-Kiefer-Wolfowitz inequality improves the regret bound to $O(\sqrt{T\log T})$. Taken together, these results reveal that alignment can reduce the complexity of learning to make decisions with AI assistance. Experiments on real data from two different human-subject studies where participants solve simple decision-making tasks assisted by AI models show that our theoretical results are robust to violations of perfect alignment.


2606.15386 2026-06-17 cs.LG 版本更新

A Compositional Framework for Open-ended Intelligence

开放智能的组合框架

Ida Momennejad, Roberta Raileanu

发表机构 * GitHub

AI总结提出开放智能的形式化定义，通过有限原始集和组合算子生成闭包，支持跨任务和世界的无限组合生成，并引入下一原始预测作为架构目标。

详情

AI中文摘要

开放智能是指适应与训练环境显著不同的新问题和新环境的能力。我们将开放智能形式化为由有限原始集 $P$ 和一组组合算子 $C$ 诱导的闭包。我们刻画了诱导闭包 $\mathcal{L}(P,C)$ 的性质，该闭包支持跨任务和世界族的无界组合生成。开放智能的数学需要两个支柱：一组最小的表示原始（例如状态、动作）和算法原始（例如最近邻），以及反映习得组合语法的组合模式（例如递归、序列化）。这两个支柱的闭包使得能够在广泛的环境中生成无限的自适应响应。该数学支持互补的研究议程，包括解释性和可解释性的评估指标，以及构建组合泛化原生的架构。我们提出下一原始预测作为一种新的架构目标，其中训练目标鼓励获取可重用的算法原始及其组合语法，从而通过重组生成新的解决方案。课程学习和自我博弈通过跨任务和世界族发现可重用原始和转换模式，实现闭包的终身学习和扩展。我们通过物理学、进化论和神经科学的案例研究来夯实该框架。

英文摘要

Open-ended intelligence is the capacity to adapt to novel problems and environments that are substantially different from those in training. A mathematics of open-ended intelligence requires two pillars: first, a minimal set of representational primitives (e.g., states, actions) and algorithmic primitives (e.g., nearest neighbor); and second, an acquired compositional grammar for selection, recursion, and branching that produces sequences of operations and recurring motifs. We formalize open-ended intelligence in terms of the compositional closure induced by a finite primitive set $P$ and a set of composition operators $C$. We characterize properties of the induced closure $\mathcal{L}(P,C)$ that support unbounded compositional generation across families of tasks and worlds. The closure of the two pillars yields infinite adaptive responses across a wide range of settings. The mathematics supports complementary research agendas, including evaluation metrics for explanation and interpretability, and novel architectures where compositional generalization is native. We propose next primitive prediction (NPP) as a novel architectural objective, where training encourages the acquisition of reusable algorithmic primitives and their compositional grammar, such that new solutions are generated through recombination. Given such an objective, curriculum learning and self-play can enable lifelong learning, expanding the closure by discovering reusable primitives and transition motifs across settings. We ground the framework through case studies in physics, evolution, and neuroscience.


2606.06523 2026-06-17 cs.AI cs.LG cs.LO cs.SE 版本更新

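TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion

TriBand-BEV：基于高度感知的鸟瞰图与高分辨率特征融合的实时仅LiDAR三维行人检测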

Mohammad Khoshkdahan, Alexey Vinel

发表机构 * Karlsruhe Institute of Technology（卡尔斯鲁厄理工学院）

AI总结本文提出TriBand-BEV方法，通过高度感知的鸟瞰图与高分辨率特征融合实现实时LiDAR-only三维行人检测，采用轻量级鸟瞰图张量映射，单网络一次通过检测车辆、行人和自行车，提升检测精度与速度。

Comments Accepted for publication in the Proceedings of the 2026 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

详情

DOI: 10.65109/INST9866
Journal ref: Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

AI中文摘要

安全的自动驾驶代理和移动机器人需要快速的实时三维感知，尤其是对于行人等易受伤害道路使用者。我们介绍了一种新的鸟瞰图（BEV）编码方法，将完整的三维LiDAR点云映射到轻量级的二维BEV张量中，分为三个高度带。我们明确地将三维检测重新公式化为二维检测问题，然后从BEV输出中重建三维框。单个网络在一次通过中检测车辆、行人和自行车。骨干网络在深层阶段使用区域注意力，层次化的双向颈部网络在P1到P4之间融合上下文和细节，头部使用分布焦点学习预测定向框，以预测侧偏移和旋转IoU损失。训练应用小垂直重新分箱和温和的反射率抖动以防止记忆化。我们使用四分位距（IQR）过滤器在三维重建中去除噪声和离群的LiDAR点。在KITTI数据集上，TriBand-BEV在49 FPS的单个消费级GPU上实现了易、中等和困难样本的行人BEV AP分别为58.7/52.6/47.2%，优于Complex-YOLO，分别提升了+12.6%、+7.5%和+3.1%。定性场景显示在遮挡下检测稳定。该流程紧凑且适用于实时机器人部署。我们的源代码在GitHub上公开可用。

英文摘要

Safe autonomous agents and mobile robots need fast real-time 3D perception, especially for vulnerable road users (VRUs) such as pedestrians. We introduce a new bird's eye view (BEV) encoding, which maps the full 3D LiDAR point cloud into a lightweight 2D BEV tensor with three height bands. We explicitly reformulate 3D detection as a 2D detection problem and then reconstruct 3D boxes from the BEV outputs. A single network detects cars, pedestrians, and cyclists in one pass. The backbone uses area attention at deep stages, a hierarchical bidirectional neck over P1 to P4 fuses context and detail, and the head predicts oriented boxes with distribution focal learning for side offsets and a rotated IoU loss. Training applies a small vertical re-binning and a mild reflectance jitter in channel space to resist memorization. We use an interquartile range (IQR) filter to remove noisy and outlier LiDAR points during 3D reconstruction. On the KITTI dataset, TriBand-BEV attains 58.7/52.6/47.2 pedestrian BEV AP (%) for easy, moderate, and hard at 49 FPS on a single consumer GPU, surpassing Complex-YOLO with gains of +12.6%, +7.5%, and +3.1%. Qualitative scenes show stable detection under occlusion. The pipeline is compact and ready for real-time robotic deployment. Our source code is publicly available on GitHub.
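The IQR filtering step mentioned in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say which coordinate is filtered or which multiplier is used, so the sketch filters on height (z) with the conventional 1.5 × IQR fences, and the function name `iqr_filter` is hypothetical.

```python
from statistics import quantiles


def iqr_filter(points, k=1.5):
    """Keep (x, y, z) points whose z lies within [Q1 - k*IQR, Q3 + k*IQR].

    Assumption: filtering is applied to the height coordinate only, and
    k = 1.5 is the usual Tukey fence, not a value taken from the paper.
    """
    heights = [p[2] for p in points]
    q1, _median, q3 = quantiles(heights, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [p for p in points if low <= p[2] <= high]


# Hypothetical points inside one detected box; the last one is a stray return.
box_points = [
    (0.0, 0.0, 0.90), (0.1, 0.0, 1.00), (0.0, 0.1, 1.10), (0.1, 0.1, 1.00),
    (0.2, 0.0, 0.95), (0.0, 0.2, 1.05), (0.1, 0.2, 5.00),
]
kept = iqr_filter(box_points)
```

Using quartiles rather than the mean and standard deviation keeps the fences from being dragged outward by the very outliers they are meant to remove.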


2604.13662 2026-06-17 cond-mat.mes-hall cs.CV cs.LG 版本更新

Automatic Charge State Tuning of 300 mm FDSOI Quantum Dots Using Neural Network Segmentation of Charge Stability Diagram

300毫米FDSOI量子点自动电荷状态调节：基于神经网络的电荷稳定性图分割

Peter Samaha, Amine Torki, Ysaline Renaud, Sam Fiette, Emmanuel Chanrion, Pierre-Andre Mortemousque, Yann Beilliard

发表机构 * CEA-Leti, Univ. Grenoble Alpes（格勒诺布尔-阿尔卑斯大学CEA-Leti）

AI总结本文提出基于深度学习的语义分割流程，通过识别电荷稳定性图中的过渡线实现量子点自动电荷调节，提升硅量子点量子比特的高通量电荷调节效率。

Comments 10 pages, 6 figures, supplementary materials available

详情

DOI: 10.1088/2632-2153/ae7cda

AI中文摘要

调节由栅极定义的半导体量子点（QDs）是扩展自旋量子比特技术的主要瓶颈。我们提出了一种由深度学习（DL）驱动的语义分割流程，通过在完整的电荷稳定性图（CSDs）中定位跃迁线来实现电荷自动调节，并返回单电荷区间的栅极电压目标。我们组装并手动标注了一个大型异构数据集，包含1015张在硅量子点器件上实验测得的CSD，涵盖九种设计几何形状、多个晶圆和制造批次。一个采用MobileNetV2编码器的U-Net风格卷积神经网络（CNN）通过五折分组交叉验证进行训练和验证。我们的模型在定位单电荷区间方面实现了80.0%的总体离线调节成功率，某些设计的峰值性能超过88%。我们分析了主要的失败模式并提出了针对性的缓解措施。最后，宽范围图分割还自然地支持可扩展的基于物理的特征提取，其结果可以反馈到制造和设计流程中；我们还概述了在低温晶圆探针台中实现实时集成的路线图。总体而言，我们的结果表明，基于神经网络（NN）的宽范围图分割是迈向硅量子点量子比特自动化、高通量电荷调节的可行一步。

英文摘要

Tuning of gate-defined semiconductor quantum dots (QDs) is a major bottleneck for scaling spin qubit technologies. We present a deep learning (DL) driven, semantic-segmentation pipeline that performs charge auto-tuning by locating transition lines in full charge stability diagrams (CSDs) and returns gate voltage targets for the single-charge regime. We assemble and manually annotate a large, heterogeneous dataset of 1015 experimental CSDs measured from silicon QD devices, spanning nine design geometries, multiple wafers, and fabrication runs. A U-Net style convolutional neural network (CNN) with a MobileNetV2 encoder is trained and validated through five-fold group cross validation. Our model achieves an overall offline tuning success of 80.0% in locating the single-charge regime, with peak performance exceeding 88% for some designs. We analyze dominant failure modes and propose targeted mitigations. Finally, wide-range diagram segmentation also naturally enables scalable physics-based feature extraction that can feed back to fabrication and design workflows, and we outline a roadmap for real-time integration in a cryogenic wafer prober. Overall, our results show that neural network (NN) based wide-diagram segmentation is a practical step toward automated, high-throughput charge tuning for silicon QD qubits.


2512.03805 2026-06-17 cs.LG 版本更新

Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the (1+($λ$,$λ$))-GA

用于动态算法配置的深度强化学习：以(1+(λ,λ))-GA优化OneMax的案例研究

Tai Nguyen, Phong Le, André Biedenkapp, Carola Doerr, Nguyen Dang

发表机构 * University of St Andrews, United Kingdom（圣安德鲁大学，英国）； Sorbonne Université, CNRS, LIP6, France（索邦大学，法国）； University of Freiburg, Germany（弗赖堡大学，德国）

AI总结本文研究了深度强化学习算法DDQN和PPO在OneMax问题中控制(1+(λ,λ))-GA种群大小的挑战，发现DDQN和PPO存在可扩展性下降和学习不稳定问题，通过自适应奖励转移机制改进DDQN，使其在样本效率上优于传统方法。

Comments arXiv admin note: text overlap with arXiv:2502.20265

详情

DOI: 10.1145/3821217

AI中文摘要

动态算法配置（DAC）研究参数化优化算法控制策略的高效识别。许多研究利用强化学习（RL）解决DAC挑战；然而，应用RL通常需要大量领域专业知识。在本文中，我们对两种深度RL算法——双深度Q网络（DDQN）和近端策略优化（PPO）——进行深入研究，以控制OneMax实例上的(1+(λ,λ))-GA种群大小。尽管OneMax在结构上简单，但为(1+(λ,λ))-GA学习有效的控制策略诱导了一个高度具有挑战性的DAC景观，使其成为受控且 demanding 的基准。我们的研究揭示了限制DDQN和PPO的两个基本挑战：可扩展性下降和学习不稳定，归因于探索不足和规划时间跨度覆盖不足。为了解决探索不足，我们引入了一种自适应奖励转移机制，利用奖励分布统计信息来增强DDQN的探索。这消除了实例特定超参数调优，并确保了在问题规模上的一致有效性。为了解决规划时间跨度覆盖问题，我们证明了在DDQN中无折扣学习的成功，而PPO面临根本的方差问题，需要替代设计。我们进一步表明，尽管超参数优化增强了PPO的稳定性，但它始终无法识别有效的策略。最后，DDQN结合自适应奖励转移在样本效率上与理论推导的策略相当，远超先前的DAC方法。我们的发现提供了对标准深度RL方法在这一具有挑战性的DAC设置中所面临根本障碍的理解，并突显了有效学习所需的关键方法论成分。

英文摘要

Dynamic Algorithm Configuration (DAC) studies the efficient identification of control policies for parameterized optimization algorithms. Numerous studies leverage Reinforcement Learning (RL) to address DAC challenges; however, applying RL often requires extensive domain expertise. In this work, we conduct a comprehensive study of two deep-RL algorithms--Double Deep Q-Networks (DDQN) and Proximal Policy Optimization (PPO)--for controlling the population size of the $(1+(λ,λ))$-GA on OneMax instances. Although OneMax is structurally simple, learning effective control policies for the $(1+(λ,λ))$-GA induces a highly challenging DAC landscape, making it a controlled yet demanding benchmark. Our investigation reveals two fundamental challenges limiting DDQN and PPO: scalability degradation and learning instability, traced to under-exploration and planning horizon coverage. To address under-exploration, we introduce an adaptive reward shifting mechanism that leverages reward distribution statistics to enhance DDQN exploration. This eliminates instance-specific hyperparameter tuning and ensures consistent effectiveness across problem scales. To resolve planning horizon coverage, we demonstrate that undiscounted learning succeeds in DDQN, while PPO faces fundamental variance issues necessitating alternative designs. We further show that while hyperparameter optimization enhances PPO's stability, it consistently fails to identify effective policies. Finally, DDQN with adaptive reward shifting achieves performance comparable to theoretically derived policies with vastly improved sample efficiency, outperforming prior DAC approaches by orders of magnitude. Our findings provide insights into the fundamental obstacles faced by standard deep-RL approaches in this challenging DAC setting and highlight the key methodological ingredients required for effective learning.

URL PDF HTML ☆

赞 0 踩 0

2310.06328 2026-06-17 cs.LG eess.SP 版本更新

ARC-Fi: Exploiting Antenna Spatial Diversity for Label-Efficient Domain Generalization in Wi-Fi Sensing

ARC-Fi：利用天线空间多样性实现Wi-Fi感知中标签高效的领域泛化

Ke Xu, Zhiyong Zheng, Hongyuan Zhu, Lei Wang, Jiangtao Wang

发表机构 * Suzhou Institute for Advanced Research, University of Science and Technology of China（中国科学技术大学苏州研究院）； Suzhou Big Data and AI Research and Engineering Center（苏州大数据与人工智能研究与工程中心）； School of Artificial Intelligence and Data Science, University of Science and Technology of China（中国科学技术大学人工智能与数据科学学院）； Institute for Infocomm Research (I 2 R), A*STAR（资讯与通讯研究院（I2R），A*STAR）； School of Computer Science and Technology, Soochow University（苏州大学计算机科学与技术学院）

AI总结 ARC-Fi通过引入物理指导的数据增强策略，解决Wi-Fi传感中领域偏移问题，实现高效领域泛化。

Comments This work has been submitted to the IEEE for possible publication

详情

AI中文摘要

Wi-Fi传感系统在部署于未见过的现实环境时受到领域偏移的严重阻碍。尽管现有方法试图通过无监督领域适应（UDA）或领域泛化（DG）来解决这一问题，但它们严重依赖于不可用的目标数据或过于昂贵且庞大的标注源数据集。在实践中，收集大量未标注的信道状态信息（CSI）是可行的，而手动标注则受到严重限制。这种现实困境需要半监督领域泛化（SSDG）。为此，我们提出了ARC-Fi，这是首个专门用于Wi-Fi传感的SSDG框架。直接应用传统对比学习到CSI数据不可避免地触发领域特定的“捷径学习”，导致模型记忆环境背景而非手势动态。为克服这一问题，ARC-Fi引入了一种物理指导的数据增强策略：天线响应一致性（ARC）模块。ARC利用多天线系统的内在空间多样性，将位于同一位置的天线信号视为自然语义保持的增强视图，以明确阻止环境捷径。此外，我们引入了一个统一的半监督对比目标，利用稀缺标签和可靠的伪标签对跨领域特征进行对齐，有效防止了同类实例的盲目排斥。在Widar和CSIDA数据集上的广泛实验表明，ARC-Fi建立了新的最先进的水平，显著优于现有的UDA、DG和SSDG方法。最终，这项工作提供了一个基于物理的、标签高效的解决方案，推动了稳健现实Wi-Fi传感系统的大规模部署。代码可在：https://github.com/KaoruMiyazono/UniCrossFi。

英文摘要

Wi-Fi sensing systems are severely hindered by domain shifts when deployed in unseen real-world environments. While existing methods attempt to tackle this through Unsupervised Domain Adaptation (UDA) or Domain Generalization (DG), they critically rely on either inaccessible target data or prohibitively expensive, massive labeled source datasets. In practice, collecting abundant unlabeled Channel State Information (CSI) is feasible, whereas manual labeling is severely constrained. This realistic dilemma necessitates Semi-Supervised Domain Generalization (SSDG). To this end, we propose ARC-Fi, the first dedicated SSDG framework for Wi-Fi sensing. Directly applying conventional contrastive learning to CSI data inevitably triggers domain-specific "shortcut learning," causing models to memorize environmental backgrounds rather than gesture dynamics. To overcome this, ARC-Fi introduces a physics-informed data augmentation strategy: the Antenna Response Consistency (ARC) module. ARC exploits the intrinsic spatial diversity of multi-antenna systems, treating signals from co-located antennas as naturally semantics-preserving augmented views to explicitly block environmental shortcuts. Furthermore, we introduce a unified Semi-Supervised Contrastive Objective that leverages scarce labels and reliable pseudo-labels to align cross-domain features, effectively preventing the blind repulsion of same-class instances. Extensive experiments on the Widar and CSIDA datasets demonstrate that ARC-Fi establishes a new state-of-the-art, significantly outperforming existing UDA, DG, and SSDG methods. Ultimately, this work provides a physics-grounded, label-efficient solution, advancing the scalable deployment of robust real-world Wi-Fi sensing systems. Code is available at: https://github.com/KaoruMiyazono/UniCrossFi.

URL PDF HTML ☆

赞 0 踩 0

2602.13318 2026-06-17 cs.AI cs.CV cs.LG 版本更新

DECKBench: Benchmarking Multi-Agent Frameworks for Academic Slide Generation and Editing

DECKBench：用于学术幻灯片生成和编辑的多智能体框架基准测试

Daesik Jang, Morgan Lindsay Heisler, Linzi Xing, Yifei Li, Edward Wang, Ying Xiong, Yong Zhang, Zhenan Fan

发表机构 * Huawei Technologies Canada（华为加拿大技术有限公司）； University of British Columbia（不列颠哥伦比亚大学）

AI总结本文提出DECKBench，一个用于评估多智能体生成和编辑学术幻灯片的框架，通过定制数据集和模拟编辑指令，系统评估幻灯片和整个演示文稿的忠实度、连贯性、布局质量和多轮指令遵循能力。

详情

DOI: 10.1145/3770855.3817525

AI中文摘要

本文提出DECKBench，一个用于评估多智能体生成和编辑学术幻灯片的框架，通过定制数据集和模拟编辑指令，系统评估幻灯片和整个演示文稿的忠实度、连贯性、布局质量和多轮指令遵循能力。

英文摘要

Automatically generating and iteratively editing academic slide decks requires more than document summarization. It demands faithful content selection, coherent slide organization, layout-aware rendering, and robust multi-turn instruction following. However, existing benchmarks and evaluation protocols do not adequately measure these challenges. To address this gap, we introduce the Deck Edits and Compliance Kit Benchmark (DECKBench), an evaluation framework for multi-agent slide generation and editing. DECKBench is built on a curated dataset of paper to slide pairs augmented with realistic, simulated editing instructions. Our evaluation protocol systematically assesses slide-level and deck-level fidelity, coherence, layout quality, and multi-turn instruction following. We further implement a modular multi-agent baseline system that decomposes the slide generation and editing task into paper parsing and summarization, slide planning, HTML creation, and iterative editing. Experimental results demonstrate that the proposed benchmark highlights strengths, exposes failure modes, and provides actionable insights for improving multi-agent slide generation and editing systems. Overall, this work establishes a standardized foundation for reproducible and comparable evaluation of academic presentation generation and editing. Code and data are publicly available at https://github.com/morgan-heisler/DeckBench .

2602.00473 2026-06-17 quant-ph cs.AI cs.LG updated version

Quantum Phase Recognition via Quantum Attention Mechanism

Jin-Long Chen, Xin Li, Zhang-Qi Yin

Affiliations: Center for Quantum Technology Research and Key Laboratory of Advanced Optoelectronic Quantum Architecture and Measurements (MOE), School of Physics, Beijing Institute of Technology, Beijing 100081, China

AI summary: This paper proposes a hybrid quantum-classical attention model that uses swap tests and a parameterized quantum circuit to extract correlations within quantum states and classify ground states. On the cluster-Ising model with 9 and 15 qubits it shows high accuracy and robustness.

Comments: 10 pages, 7 figures

DOI: 10.1103/rcjd-bgdb
Journal ref: Phys. Rev. A 113, 062403 (2026)

Abstract

Quantum phase transitions in many-body systems are fundamentally characterized by complex correlation structures, which pose computational challenges for conventional methods in large systems. To address this, we propose a hybrid quantum-classical attention model. This model uses an attention mechanism, realized through swap tests and a parameterized quantum circuit, to extract correlations within quantum states and perform ground-state classification. Benchmarked on the cluster-Ising model with system sizes of 9 and 15 qubits, the model achieves high classification accuracy with fewer than 100 training samples and demonstrates robustness against variations in the training set. Further analysis reveals that the model successfully captures phase-sensitive features and characteristic physical length scales, offering a scalable and data-efficient approach for quantum phase recognition in complex many-body systems.
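
Illustrative sketch (not from the paper): the swap test named in the abstract estimates the squared overlap between two states, because the ancilla qubit reads 0 with probability (1 + |<psi|phi>|^2) / 2. The NumPy simulation below is a classical stand-in built on that standard identity. The function names `swap_test_p0` and `estimate_overlap_sq` are hypothetical, and how the paper's attention mechanism consumes these overlaps is not shown here.

```python
import numpy as np

def swap_test_p0(psi: np.ndarray, phi: np.ndarray) -> float:
    """Probability that the swap-test ancilla is measured in |0>."""
    psi = psi / np.linalg.norm(psi)      # normalize both state vectors
    phi = phi / np.linalg.norm(phi)
    overlap = np.vdot(psi, phi)          # <psi|phi> (vdot conjugates psi)
    return float(0.5 * (1.0 + abs(overlap) ** 2))

def estimate_overlap_sq(psi, phi, shots=20000, seed=0):
    """Estimate |<psi|phi>|^2 from a finite number of simulated shots."""
    rng = np.random.default_rng(seed)
    p0 = min(1.0, swap_test_p0(psi, phi))   # guard against rounding above 1
    zeros = rng.binomial(shots, p0)         # number of times the ancilla read 0
    return 2.0 * zeros / shots - 1.0        # invert p0 = (1 + overlap^2) / 2

ket0 = np.array([1.0, 0.0])
plus = np.array([1.0, 1.0]) / np.sqrt(2)
print(swap_test_p0(ket0, plus))          # exact probability for |0> vs |+>
print(estimate_overlap_sq(ket0, plus))   # sampled estimate of the overlap
```

Identical states give an ancilla probability of 1 and orthogonal states give 0.5, so the estimate needs many shots to resolve small overlaps.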

2509.11154 2026-06-17 cs.LG cs.AI updated version

Feature Space Topology Control via Hopkins Loss

Einari Vaaras, Manu Airaksinen

Affiliations: Signal Processing Research Centre, Tampere University; BABA Center, Department of Physiology, University of Helsinki

AI summary: This paper proposes Hopkins loss for controlling feature space topology and validates its effectiveness on speech, text, and image data in classification and in dimensionality reduction with nonlinear bottleneck autoencoders.

Comments: Accepted for publication in Proc. IEEE ICTAI 2025, Athens, Greece

DOI: 10.1109/ICTAI66417.2025.00064

Abstract

Feature space topology refers to the organization of samples within the feature space. Modifying this topology can be beneficial in machine learning applications, including dimensionality reduction, generative modeling, transfer learning, and robustness to adversarial attacks. This paper introduces a novel loss function, Hopkins loss, which leverages the Hopkins statistic to enforce a desired feature space topology, which is in contrast to existing topology-related methods that aim to preserve input feature topology. We evaluate the effectiveness of Hopkins loss on speech, text, and image data in two scenarios: classification and dimensionality reduction using nonlinear bottleneck autoencoders. Our experiments show that integrating Hopkins loss into classification or dimensionality reduction has only a small impact on classification performance while providing the benefit of modifying feature topology.
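
Illustrative sketch (not the paper's loss): the Hopkins statistic that the abstract builds on compares nearest-neighbor distances from uniformly drawn probe points with those from sampled data points. The NumPy version below is the plain, non-differentiable statistic under one common convention (distances raised to the power of the dimension, probes drawn from the data's bounding box); both choices are assumptions, as is the name `hopkins_statistic`. Values near 0.5 suggest uniformly scattered data and values near 1 suggest clustered data.

```python
import numpy as np

def hopkins_statistic(X: np.ndarray, m: int = 50, seed: int = 0) -> float:
    """Hopkins statistic of the rows of X, using m probe points."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    m = min(m, n - 1)

    # u: distance from each uniform probe point to its nearest data point
    lo, hi = X.min(axis=0), X.max(axis=0)
    probes = rng.uniform(lo, hi, size=(m, d))
    u = np.linalg.norm(probes[:, None, :] - X[None, :, :], axis=2).min(axis=1)

    # w: distance from each sampled data point to its nearest *other* data point
    idx = rng.choice(n, size=m, replace=False)
    dist = np.linalg.norm(X[idx][:, None, :] - X[None, :, :], axis=2)
    dist[np.arange(m), idx] = np.inf      # exclude the zero self-distance
    w = dist.min(axis=1)

    return float((u ** d).sum() / ((u ** d).sum() + (w ** d).sum()))

rng = np.random.default_rng(1)
uniform = rng.uniform(size=(500, 2))
blobs = np.vstack([rng.normal(0, 0.1, (250, 2)), rng.normal(10, 0.1, (250, 2))])
print(hopkins_statistic(uniform), hopkins_statistic(blobs))
```

A training loss would need a differentiable variant evaluated on mini-batch features and a target value for the statistic; those details are left to the paper.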

2509.03932 2026-06-17 cs.CL cs.CY cs.LG updated version

KPoEM: A Human-Annotated Dataset for Emotion Classification and RAG-Based Poetry Generation in Korean Modern Poetry

Iro Lim, Haein Ji, Byungjun Kim

Affiliations: The Academy of Korean Studies; Graduate School of Korean Studies; Cultural Informatics

AI summary: This study builds KPoEM, a multi-label emotion dataset. A sequential fine-tuning strategy yields an emotion classifier with an F1-micro of 0.60, and the study demonstrates the feasibility of RAG-based poetry generation that reflects the emotional and cultural expression of Korean literature.

Comments: 43 pages, 22 tables, 3 figures, Digital Humanities and Social Sciences Korea Conference, James Joo-Jin Kim Center for Korean Studies, University of Pennsylvania, Philadelphia, USA

DOI: 10.25024/review.2026.29.1.006
Journal ref: The Review of Korean Studies 29(1) (2026) 161-206

Abstract

This study introduces KPoEM (Korean Poetry Emotion Mapping), a novel dataset that serves as a foundation for both emotion-centered analysis and generative applications in modern Korean poetry. Despite advancements in NLP, poetry remains underexplored due to its complex figurative language and cultural specificity. We constructed a multi-label dataset of 7,662 entries (7,007 line-level and 615 work-level), annotated with 44 fine-grained emotion categories from five influential Korean poets. The KPoEM emotion classification model, fine-tuned through a sequential strategy -- moving from general-purpose corpora to the specialized KPoEM dataset -- achieved an F1-micro score of 0.60, significantly outperforming previous models (0.43). The model demonstrates an enhanced ability to identify temporally and culturally specific emotional expressions while preserving core poetic sentiments. Furthermore, applying the structured emotion dataset to a RAG-based poetry generation model demonstrates the empirical feasibility of generating texts that reflect the emotional and cultural sensibilities of Korean literature. This integrated approach strengthens the connection between computational techniques and literary analysis, opening new pathways for quantitative emotion research and generative poetics. Overall, this study provides a foundation for advancing emotion-centered analysis and creation in modern Korean poetry.
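
Illustrative sketch (not from the paper): the F1-micro score quoted in the abstract pools true positives, false positives, and false negatives over every label before computing F1, so frequent emotion labels weigh more than rare ones. The function name `f1_micro` and the toy label matrices are hypothetical.

```python
import numpy as np

def f1_micro(y_true, y_pred) -> float:
    """Micro-averaged F1 for multi-label indicator matrices (rows = items)."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)       # label assigned and correct
    fp = np.sum(~y_true & y_pred)      # label assigned but wrong
    fn = np.sum(y_true & ~y_pred)      # label missed
    denom = 2 * tp + fp + fn
    return float(2 * tp / denom) if denom else 0.0

# Two items, three hypothetical emotion labels each.
y_true = [[1, 0, 1], [0, 1, 0]]
y_pred = [[1, 1, 0], [0, 1, 0]]
print(f1_micro(y_true, y_pred))   # tp=2, fp=1, fn=1, so F1 = 4/6
```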

2511.03876 2026-06-17 eess.IV cs.CV cs.LG physics.med-ph updated version

Computed Tomography (CT)-derived Cardiovascular Flow Estimation Using Physics-Informed Neural Networks Improves with Sinogram-based Training: A Simulation Study

Jinyuxuan Guo, Gurnoor Singh Khurana, Alejandro Gonzalo Grande, Juan C. del Alamo, Francisco Contijoch

Affiliations: Dept. of Bioengineering, University of California San Diego; Dept. of Computer Science Engineering, University of California San Diego; Dept. of Mechanical Engineering, Univ of Washington; Depts of Mechanical Engineering and Cardiology, Univ. of Washington; Depts. of Bioengineering, Radiology, University of California San Diego

AI summary: This study evaluates the impact of CT imaging on Physics-Informed Neural Network (PINN)-based flow estimation and proposes an improved framework, SinoFlow, which estimates flow directly from sinogram data. SinoFlow performs better because it avoids the errors introduced by filtered backprojection.

DOI: 10.1002/mp.70519

Abstract

Background: Non-invasive imaging-based assessment of blood flow plays a critical role in evaluating heart function and structure. Computed Tomography (CT) is a widely-used imaging modality that can robustly evaluate cardiovascular anatomy and function, but direct methods to estimate blood flow velocity from movies of contrast evolution have not been developed. Purpose: This study evaluates the impact of CT imaging on Physics-Informed Neural Networks (PINN)-based flow estimation and proposes an improved framework, SinoFlow, which uses sinogram data directly to estimate blood flow. Methods: We generated pulsatile flow fields in an idealized 2D vessel bifurcation using computational fluid dynamics and simulated CT scans with varying gantry rotation speeds, tube currents, and pulse mode imaging settings. We compared the performance of PINN-based flow estimation using reconstructed images (ImageFlow) to SinoFlow. Results: SinoFlow significantly improved flow estimation performance by avoiding propagating errors introduced by filtered backprojection. SinoFlow was robust across all tested gantry rotation speeds and consistently produced lower mean squared error and velocity errors than ImageFlow. Additionally, SinoFlow was compatible with pulsed-mode imaging and maintained higher accuracy with shorter pulse widths. Conclusions: This study demonstrates the potential of SinoFlow for CT-based flow estimation, providing a more promising approach for non-invasive blood flow assessment. The findings aim to inform future applications of PINNs to CT images and provide a solution for image-based estimation, with reasonable acquisition parameters yielding accurate flow estimates.

2501.16370 2026-06-17 cs.LG cs.AI cs.NA cs.NE math.NA updated version

Advanced Physics-Informed Neural Network with Residuals for Solving Complex Integral Equations

Mahdi Movahedian Moghaddam, Kourosh Parand, Saeed Reza Kheradpisheh

Affiliations: Department of Computer and Data Sciences, Shahid Beheshti University; Department of Cognitive Modeling, Shahid Beheshti University

AI summary: This paper proposes the Residual Integral Solver Network (RISN), which combines high-accuracy numerical methods with residual connections to improve the accuracy and stability of solving integral and integro-differential equations. Experiments show that it outperforms classical PINNs and their variants across many equation types.

DOI: 10.22128/ansne.2026.3261.1200
Journal ref: Anal. Numer. Solut. Nonlinear Equ. 11 (2026), no. 1, 153-173

Abstract

In this paper, we present the Residual Integral Solver Network (RISN), a novel neural network architecture designed to solve a wide range of integral and integro-differential equations, including one-dimensional, multi-dimensional, ordinary and partial integro-differential, systems, fractional types, and Helmholtz-type integral equations involving oscillatory kernels. RISN integrates residual connections with high-accuracy numerical methods such as Gaussian quadrature and fractional derivative operational matrices, enabling it to achieve higher accuracy and stability than traditional Physics-Informed Neural Networks (PINN). The residual connections help mitigate vanishing gradient issues, allowing RISN to handle deeper networks and more complex kernels, particularly in multi-dimensional problems. Through extensive experiments, we demonstrate that RISN consistently outperforms not only classical PINNs but also advanced variants such as Auxiliary PINN (A-PINN) and Self-Adaptive PINN (SA-PINN), achieving significantly lower Mean Absolute Errors (MAE) across various types of equations. These results highlight RISN's robustness and efficiency in solving challenging integral and integro-differential problems, making it a valuable tool for real-world applications where traditional methods often struggle.
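
Illustrative sketch (not RISN itself): the abstract names Gaussian quadrature as one of the numerical building blocks. The snippet below shows only that ingredient, using it in a classical Nyström discretization of a Fredholm equation of the second kind, u(x) = f(x) + ∫ K(x, t) u(t) dt, with no neural network involved. The function name `solve_fredholm` and the test problem (K(x, t) = x t and f(x) = x on [0, 1], whose exact solution is u(x) = 1.5 x) are chosen here for illustration.

```python
import numpy as np

def solve_fredholm(kernel, f, a=0.0, b=1.0, n=16):
    """Solve u(x) = f(x) + int_a^b K(x, t) u(t) dt at Gauss-Legendre nodes."""
    t, w = np.polynomial.legendre.leggauss(n)   # nodes and weights on [-1, 1]
    x = 0.5 * (b - a) * t + 0.5 * (b + a)       # map nodes to [a, b]
    w = 0.5 * (b - a) * w                       # rescale weights accordingly
    K = kernel(x[:, None], x[None, :])          # K[i, j] = K(x_i, x_j)
    A = np.eye(n) - K * w[None, :]              # discretized (I - K W) u = f
    return x, np.linalg.solve(A, f(x))

x, u = solve_fredholm(lambda x, t: x * t, lambda x: x)
print(np.max(np.abs(u - 1.5 * x)))   # error against the exact solution 1.5 * x
```

In a physics-informed setting the same quadrature nodes and weights would instead approximate the integral term inside the training residual.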

2509.10089 2026-06-17 cs.LG updated version

KAN-SR: A Kolmogorov-Arnold Network Guided Symbolic Regression Framework

Marco Andrea Bühler, Gonzalo Guillén-Gosálbez

Affiliations: ETH Zürich

AI summary: This paper proposes KAN-SR, a framework built on Kolmogorov-Arnold Networks that uses deep learning techniques and simplification strategies to recover ground-truth equations of the Feynman Symbolic Regression for Scientific Discovery dataset, and that is combined with neural controlled differential equations to precisely model a bioprocess system.

DOI: 10.1016/j.compchemeng.2026.109721
Journal ref: Computers & Chemical Engineering, Volume 213, 2026, 109721

Abstract

We introduce a novel symbolic regression framework, namely KAN-SR, built on Kolmogorov-Arnold Networks (KANs), which follows a divide-and-conquer approach. Symbolic regression searches for mathematical equations that best fit a given dataset and is commonly solved with genetic programming approaches. We show that by using deep learning techniques, more specifically KANs, and combining them with simplification strategies such as translational symmetries and separabilities, we are able to recover ground-truth equations of the Feynman Symbolic Regression for Scientific Discovery (SRSD) dataset. Additionally, we show that by combining the proposed framework with neural controlled differential equations, we are able to model the dynamics of an in-silico bioprocess system precisely, opening the door for the dynamic modeling of other engineering systems.

2508.10908 2026-06-17 physics.ao-ph cs.LG updated version

Data-driven global ocean model resolving ocean-atmosphere coupling dynamics

Jeong-Hwan Kim, Daehyun Kang, Young-Min Yang, Jae-Heung Park, Yoo-Geun Ham

Affiliations: Center for Climate and Carbon Cycle Research, Korea Institute of Science and Technology, Seoul, Republic of Korea; Department of Environment and Energy, Jeonbuk National University, Jeonju, Republic of Korea; School of Earth and Environmental Sciences, Seoul National University, Seoul, Republic of Korea; Department of Environmental Management, Seoul National University, Seoul, Republic of Korea

AI summary: This paper presents KIST-Ocean, a model with a U-shaped visual attention adversarial network architecture that uses partial convolution, adversarial training, and transfer learning to improve ocean prediction. It accurately simulates Kelvin and Rossby wave propagation in the tropical Pacific and the vertical motions induced by cyclonic and anticyclonic wind stress, showing that it can represent the coupling mechanisms underlying climate phenomena.

Comments: The manuscript contains 4 main figures. The Extended Data contains 7 figures and 3 tables. The Supplementary Information contains 3 text sections, 7 figures, 1 table

DOI: 10.1126/sciadv.aed1225
Journal ref: Sci. Adv. 12, eaed1225 (2026)

Abstract

Artificial intelligence has advanced global weather forecasting, outperforming traditional numerical models in both accuracy and computational efficiency. Nevertheless, extending predictions beyond subseasonal timescales requires the development of deep learning (DL)-based ocean-atmosphere coupled models that can realistically simulate complex oceanic responses to atmospheric forcing. This study presents KIST-Ocean, a DL-based global three-dimensional ocean general circulation model using a U-shaped visual attention adversarial network architecture. KIST-Ocean integrates partial convolution, adversarial training, and transfer learning to address coastal complexity and predictive distribution drift in auto-regressive models. Comprehensive evaluations confirmed the model's robust ocean predictive skill and efficiency. Moreover, it accurately captures realistic ocean responses, such as Kelvin and Rossby wave propagation in the tropical Pacific, and vertical motions induced by cyclonic and anticyclonic wind stress, demonstrating its ability to represent key ocean-atmosphere coupling mechanisms underlying climate phenomena, including the El Niño-Southern Oscillation. These findings reinforce confidence in DL-based global weather and climate models and support extending DL-based approaches to broader Earth system modeling, offering potential for enhancing climate prediction capabilities.

2502.10112 2026-06-17 cs.LG updated version

Accelerometry-based Energy Expenditure Estimation During Activities of Daily Living: A Comparison Among Different Accelerometer Compositions

Shuhao Que, Remco Poelarends, Peter Veltink, Miriam Vollenbroek-Hutten, Ying Wang

Affiliations: Department of Electrical Engineering, University of Twente; Department of Nuclear Medicine, Isala

AI summary: This paper compares body center of mass (COM)-based and wrist-based accelerometer configurations for estimating energy expenditure during activities of daily living and finds that the COM-based 3-acc configuration performs best.

Comments: This work has been accepted by IEEE EMBC 2025

DOI: 10.1109/EMBC58623.2025.11253809

Abstract

Physical activity energy expenditure (PAEE) can be measured from breath-by-breath respiratory data, which can serve as a reference. Alternatively, PAEE can be predicted from the body movements, which can be measured and estimated with accelerometers. The body center of mass (COM) acceleration reflects the movements of the whole body and thus serves as a good predictor for PAEE. However, the wrist has also become a popular location due to recent advancements in wrist-worn devices. Therefore, in this work, using the respiratory data measured by COSMED K5 as the reference, we evaluated and compared the performances of COM-based settings and wrist-based settings. The COM-based settings include two different accelerometer compositions, using only the pelvis accelerometer (pelvis-acc) and the pelvis accelerometer with two accelerometers from two thighs (3-acc). The wrist-based settings include using only the left wrist accelerometer (l-wrist-acc) and only the right wrist accelerometer (r-wrist-acc). We implemented two existing PAEE estimation methods on our collected dataset, where 9 participants performed activities of daily living while wearing 5 accelerometers (i.e., pelvis, two thighs, and two wrists). These two methods include a linear regression (LR) model and a CNN-LSTM model. Both models yielded the best results with the COM-based 3-acc setting (LR: $R^2$ = 0.41, CNN-LSTM: $R^2$ = 0.53). No significant difference was found between the 3-acc and pelvis-acc settings (p-value = 0.278). For both models, neither the l-wrist-acc nor the r-wrist-acc settings demonstrated predictive power on PAEE with $R^2$ values close to 0, significantly outperformed by the two COM-based settings (p-values $<$ 0.05). No significant difference was found between the two wrists (p-value = 0.329).

2503.08679 2026-06-17 cs.AI cs.CL cs.LG updated version

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

Iván Arcuschin, Jett Janiak, Robert Krzyzanowski, Senthooran Rajamanoharan, Neel Nanda, Arthur Conmy

Affiliations: Poseidon Research

AI summary: The study finds that on naturally worded prompts models sometimes produce superficially coherent but mutually contradictory chains of thought, revealing implicit post-hoc rationalization that even frontier models do not fully avoid.

Comments: Published at the 43rd International Conference on Machine Learning (ICML 2026)

Abstract

Recent studies indicate that when faced with explicit biases in prompts, models often omit mentioning these biases in their Chain-of-Thought (CoT) output, revealing that verbalized reasoning can give an incorrect picture of how models arrive at conclusions (unfaithfulness). In this work, we show that unfaithful CoT also occurs on naturally worded, non-adversarial prompts without adding artificial biases or editing model outputs. We find that when separately presented with the questions "Is X bigger than Y?" and "Is Y bigger than X?", models sometimes produce superficially coherent arguments to justify systematically answering Yes to both or No to both, despite the contradiction. We present preliminary evidence that this is due to models' implicit biases towards Yes or No, labeling this Implicit Post-Hoc Rationalization. Our results reveal rates up to 13% for production models, and while frontier models are more faithful, none are entirely so, including thinking models like DeepSeek R1 (0.37%) and Sonnet 3.7 with thinking (0.04%). We also investigate Unfaithful Illogical Shortcuts, where models use subtly illogical reasoning to make speculative answers to hard math problems seem rigorously proven. Our findings indicate that while CoT can be useful for assessing outputs, it is not a complete account of the internal process that produced the model's answer and should be used with caution in agentic or safety-critical settings.
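
Illustrative sketch (not the paper's evaluation code): the reversed-question check described in the abstract can be expressed as a small consistency test. Given a model's final Yes/No answers to "Is X bigger than Y?" and "Is Y bigger than X?", a pair is flagged when both orderings receive the same answer. The function name `contradictory_pairs`, the dictionary layout, and the example answers are hypothetical, and the sketch assumes X and Y are never exactly equal (otherwise two No answers would be consistent).

```python
from typing import Dict, List, Tuple

def contradictory_pairs(answers: Dict[Tuple[str, str], str]) -> List[Tuple[str, str]]:
    """Return pairs (x, y) whose two orderings received the same Yes/No answer.

    answers[(x, y)] is the model's final answer to "Is x bigger than y?".
    """
    flagged = []
    for (x, y), ans in answers.items():
        # Visit each unordered pair once, and only if both orderings were asked.
        if x < y and (y, x) in answers and answers[(y, x)] == ans:
            flagged.append((x, y))
    return flagged

answers = {
    ("A", "B"): "Yes", ("B", "A"): "Yes",   # contradictory: Yes to both
    ("C", "D"): "Yes", ("D", "C"): "No",    # consistent
}
print(contradictory_pairs(answers))   # [('A', 'B')]
```

A rate like those quoted in the abstract would additionally require many sampled responses per question and a criterion for calling the bias systematic, which this sketch does not attempt.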

2506.08654 2026-06-17 physics.med-ph cs.LG updated version

A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck

Ciro Benito Raggio, Paolo Zaffino, Maria Francesca Spadea

Affiliations: Institute of Biomedical Engineering; Karlsruhe Institute of Technology; Department of Experimental and Clinical Medicine

AI summary: This paper proposes a cross-silo federated learning framework for CBCT-to-synthetic-CT translation in the head and neck region that generalizes across institutions while preserving data privacy.

DOI: 10.3389/fdgth.2026.1812254
Journal ref: Frontiers in Digital Health, 8:1812254, June 2026

Abstract

Shortened abstract: Cone-beam computed tomography (CBCT) has become a widely adopted modality for image-guided radiotherapy (IGRT). However, CBCT suffers from increased noise, limited soft-tissue contrast, and artifacts, resulting in unreliable Hounsfield unit values and hindering direct dose calculation. Synthetic CT (sCT) generation from CBCT addresses these issues, especially using deep learning (DL) methods. Existing approaches are limited by institutional heterogeneity, scanner-dependent variations, and data privacy regulations that prevent multi-center data sharing. To overcome these challenges, we propose a cross-silo horizontal federated learning (FL) approach for CBCT-to-sCT synthesis in the head and neck region, extending our FedSynthCT framework. A conditional generative adversarial network was collaboratively trained on data from three European medical centers in the public SynthRAD2025 challenge dataset. The federated model demonstrated effective generalization across centers, with mean absolute error (MAE) ranging from $64.38\pm13.63$ to $85.90\pm7.10$ HU, structural similarity index (SSIM) from $0.882\pm0.022$ to $0.922\pm0.039$, and peak signal-to-noise ratio (PSNR) from $32.86\pm0.94$ to $34.91\pm1.04$ dB. Notably, on an external validation dataset of 60 patients, comparable performance was achieved (MAE: $75.22\pm11.81$ HU, SSIM: $0.904\pm0.034$, PSNR: $33.52\pm2.06$ dB) without additional training, confirming robust generalization despite protocol and scanner differences and registration errors. These findings demonstrate the technical feasibility of FL for CBCT-to-sCT synthesis while preserving data privacy and offer a collaborative solution for developing generalizable models across institutions without centralized data sharing or site-specific fine-tuning.

2501.15351 2026-06-17 cs.CY cs.LG updated version

Fairness in LLM-Generated Surveys

Andrés Abeliuk, Vanessa Gaete, Naim Bro

Affiliations: Department of Computer Science, University of Chile; National Center for Artificial Intelligence (CENIA); School of Government, Adolfo Ibáñez University; Millennium Institute for Foundational Research on Data (IMFD)

AI summary: The study analyzes LLM performance across different populations and finds better performance on U.S. datasets, with fairness problems attributed to biased training data. It proposes a new measurement framework to improve model fairness.

DOI: 10.1140/epjds/s13688-026-00673-y
Journal ref: EPJ Data Science (2026)

Abstract

Large Language Models (LLMs) excel in text generation and understanding, especially in simulating socio-political and economic patterns, serving as an alternative to traditional surveys. However, their global applicability remains questionable due to unexplored biases across socio-demographic and geographic contexts. This study examines how LLMs perform across diverse populations by analyzing public surveys from Chile and the United States, focusing on predictive accuracy and fairness metrics. The results show performance disparities, with LLMs consistently performing better on U.S. datasets. This bias originates from the U.S.-centric training data and remains evident after accounting for socio-demographic differences. In the U.S., political identity and race significantly influence prediction accuracy, while in Chile, gender, education, and religious affiliation play more pronounced roles. Our study presents a novel framework for measuring socio-demographic biases in LLMs, offering a path toward ensuring fairer and more equitable model performance across diverse socio-cultural contexts.

2305.09366 2026-06-17 cs.LG eess.SP updated version

Evaluation of self-supervised pre-training for automatic infant movement classification using wearable movement sensors

Einari Vaaras, Manu Airaksinen, Sampsa Vanhatalo, Okko Räsänen

Affiliations: Helsinki University Hospital, Helsinki, Finland

AI summary: This paper evaluates self-supervised pre-training for improving the accuracy of infant movement classification from wearable movement sensors. Pre-training on unlabeled data makes the classification models more robust, and selecting context-relevant pre-training data improves performance further.

Comments: To be published in Proc. IEEE EMBC 2023, Sydney, Australia

DOI: 10.1109/EMBC40787.2023.10340118

Abstract

The recently-developed infant wearable MAIJU provides a means to automatically evaluate infants' motor performance in an objective and scalable manner in out-of-hospital settings. This information could be used for developmental research and to support clinical decision-making, such as detection of developmental problems and guiding of their therapeutic interventions. MAIJU-based analyses rely fully on the classification of infant's posture and movement; it is hence essential to study ways to increase the accuracy of such classifications, aiming to increase the reliability and robustness of the automated analysis. Here, we investigated how self-supervised pre-training improves performance of the classifiers used for analyzing MAIJU recordings, and we studied whether performance of the classifier models is affected by context-selective quality-screening of pre-training data to exclude periods of little infant movement or with missing sensors. Our experiments show that i) pre-training the classifier with unlabeled data leads to a robust accuracy increase of subsequent classification models, and ii) selecting context-relevant pre-training data leads to substantial further improvements in the classifier performance.

2206.10188 2026-06-17 cs.LG cs.SD eess.AS updated version

Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition

Einari Vaaras, Manu Airaksinen, Okko Räsänen

Affiliations: Unit of Computing Sciences, Tampere University, Finland; Helsinki University Hospital, Helsinki, Finland

AI summary: This paper studies self-supervised learning and dimensionality reduction for improving clustering-based active learning in speech emotion recognition. It examines how the local and global topology of the feature space affects active learning and finds that dimensionality reduction does not substantially harm performance and that 2-D features perform well.

Comments: To be published in Proc. Interspeech 2022, Incheon, South Korea

DOI: 10.21437/Interspeech.2022-329

Abstract

When domain experts are needed to perform data annotation for complex machine-learning tasks, reducing annotation effort is crucial in order to cut down time and expenses. For cases when there are no annotations available, one approach is to utilize the structure of the feature space for clustering-based active learning (AL) methods. However, these methods are heavily dependent on how the samples are organized in the feature space and what distance metric is used. Unsupervised methods such as contrastive predictive coding (CPC) can potentially be used to learn organized feature spaces, but these methods typically create high-dimensional features which might be challenging for estimating data density. In this paper, we combine CPC and multiple dimensionality reduction methods in search of functioning practices for clustering-based AL. Our experiments for simulating speech emotion recognition system deployment show that both the local and global topology of the feature space can be successfully used for AL, and that CPC can be used to improve clustering-based AL performance over traditional signal features. Additionally, we observe that compressing data dimensionality does not harm AL performance substantially, and that 2-D feature representations achieved similar AL performance as higher-dimensional representations when the number of annotations is not very low.

2106.09539 2026-06-17 eess.AS cs.LG cs.SD updated version

Automatic Analysis of the Emotional Content of Speech in Daylong Child-Centered Recordings from a Neonatal Intensive Care Unit

Einari Vaaras, Sari Ahlqvist-Björkroth, Konstantinos Drossos, Okko Räsänen

Affiliations: Unit of Computing Sciences, Tampere University, Finland; Department of Clinical Medicine, University of Turku, Finland; Department of Signal Processing and Acoustics, Aalto University, Finland

AI summary: This paper studies how an automatic speech emotion recognition system can analyze the emotional content of recordings from a neonatal intensive care unit. It compares cross-corpus generalization, WGAN-based domain adaptation, and active learning for deployment to a new domain, and reaches a classification performance of 73.4% UAR.

DOI: 10.21437/Interspeech.2021-303

Abstract

Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes. As a part of this research, hundreds of hours of daylong recordings from preterm infants' audio environments were collected from two hospitals in Finland and Estonia in the context of the so-called APPLE study. In order to analyze the emotional content of speech in such a massive dataset, an automatic speech emotion recognition (SER) system is required. However, there are no emotion labels or existing in-domain SER systems to be used for this purpose. In this paper, we introduce this initially unannotated large-scale real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data. We explore the effectiveness of alternative state-of-the-art techniques to deploy a SER system to a new domain, comparing cross-corpus generalization, WGAN-based domain adaptation, and active learning in the task. As a result, we show that the best-performing models are able to achieve a classification performance of 73.4% unweighted average recall (UAR) and 73.2% UAR for a binary classification for valence and arousal, respectively. The results also show that active learning achieves the most consistent performance compared to the two alternatives.
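
Illustrative sketch (not from the paper): unweighted average recall (UAR), the metric reported in the abstract, is the mean of the per-class recalls, so each class counts equally regardless of how many samples it has. The function name `unweighted_average_recall` and the toy labels are hypothetical.

```python
import numpy as np

def unweighted_average_recall(y_true, y_pred) -> float:
    """Mean of per-class recalls over the classes present in y_true."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    recalls = [np.mean(y_pred[y_true == c] == c) for c in np.unique(y_true)]
    return float(np.mean(recalls))

# Imbalanced binary example: four samples of class 0, two of class 1.
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(unweighted_average_recall(y_true, y_pred))   # (3/4 + 1/2) / 2 = 0.625
```

On the same toy data plain accuracy is 4/6, which shows how UAR penalizes errors on the smaller class more heavily.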

1. Deep Learning Architectures and Training Methods (31 papers)

Models Take Notes at Prefill: KV Cache Can Be Editable and Composable

PowerOPD: Stabilizing On-Policy Distillation with Bounded Power Transformation

The Discrete-Log Clock: How a Transformer Learns Modular Multiplication

Reducing Learner Redundancy in Boosting via Residual Orthogonalization

When Dynamics Models Read the Wrong Time Steps: Label-Free Event Credit Re-Anchoring for Robust Global Readouts

Conservation Laws for Modern Neural Architectures

Functional Equivalence in Attention: A Comprehensive Study with Applications to Linear Mode Connectivity

From Drift to Coherence: Stabilizing Beliefs in LLMs

Monotonic Kolmogorov-Arnold Networks: A Theoretical and Empirical Study of Monotonicity as an Inductive Bias

KANLib -- An Modular, Extensible and Fast Kolmogorov-Arnold Network Implementation

SoftMoE: Soft Differentiable Routing for Mixture-of-Experts in LLMs

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

Looped World Models

Where Should Action Generation Begin? A Learnable Source Prior for Generative Robot Policies

An expressivity analysis of hierarchical modelling in deep transformers via bounded-depth grammars

Revisiting Structural Dependency in Autoregressive Multi-Task Table Recognition via Order-Independent Cell-Level Representations

INI-VPINN: A Variational Physics-Informed Neural Network with Implicit Neumann and Interface Handling for Multi-Material Domains with Geometric Singularities

A Convex Quasilinearization Method for Solving Nonlinear PDEs with Physics-Informed Neural Networks

Adaptive Volumetric Mechanical Property Fields Invariant to Resolution

A tensor network approach for chaotic time series prediction

Dropout Neural Network Training Viewed from a Percolation Perspective

Rethinking Multimodal Fusion for Time Series: Text Modalities Need Constrained Fusion

Olmo Hybrid: From Theory to Practice and Back

Constitutional On-Policy Safe Distillation

When to Write and When to Suppress: Route-Specialized Dual Adapters for Memory-Assisted Knowledge Editing

Rational Sparse Autoencoder

Beware of Aliases -- Signal Preservation is Crucial for Robust Image Restoration

Resource-Efficient Variational Quantum Classifier

Co-PLNet: A Collaborative Point-Line Network for Prompt-Guided Wireframe Parsing

GOT-JEPA: Generic Object Tracking with Model Adaptation and Occlusion Handling using Joint-Embedding Predictive Architecture

Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI

2. Representation Learning, Self-Supervised and Contrastive Learning (12 papers)

FoundCause: Causal Discovery with Latent Confounders from Observational Data

Expanding SPHERE-JEPA: A Family of Statistical Regularizers for the Hypersphere

Blind Recovery of Latent Domains via Unsupervised Symmetry Discovery

Geodesic Calculus on Implicitly Defined Latent Manifolds

Brep2Shape: Boundary and Shape Representation Alignment via Self-Supervised Transformers

Scalable and Interpretable Representation Alignment with Ordinal Similarity

E2Vec: Feature Embedding with Temporal Information for Analyzing Student Actions in E-Book Systems

Stable and Steerable Sparse Autoencoders with Weight Regularization

ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

Dissociating Decodability and Causal Use in Bracket-Sequence Transformers

Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override

Discovering Functionally Selective Brain Regions with a Deep Topographic Multimodal Model

3. Reinforcement Learning and Sequential Decision-Making (24 papers)

Rethinking Groups in Critic-Free RLVR

Decision-Driven Geosteering Under Uncertainty: A Unified Framework for Sequential Decision Optimization

Performance-Driven Environment Abstraction with Multi-Timescale Learning

Memory-Efficient Meta-Reinforcement Learning for Adaptive Safety-Critical Control in Adversarial Spacecraft Proximity Operations

Online LLM Selection via Constrained Bandits with Time-Varying Demand

Learning to Refine Hidden States for Reliable LLM Reasoning

Continuous-time Optimal Stopping through Deep Reinforcement Learning

Reversal Q-Learning

EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning

Deep Reinforcement Learning for Minimum Zero-Forcing Sets

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning

WallZero: Mastering the Game of WallGo with Strategic Analysis

Ensemble RL through Classifier Models: Enhancing Risk-Return Trade-offs in Trading Strategies

Learning in Matching Games with Bandit Feedback

Optimism Stabilizes Thompson Sampling for Adaptive Inference

Provably Efficient Regularized Online RLHF with Generalized Bilinear Preferences

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

Infant Spontaneous Movement Noise Improves Exploration in Deep RL

OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction

Learning Upper Lower Value Envelopes to Shape Online RL: A Principled Approach

Tacit Coordination of Large Language Models

OmniSapiens: A Foundation Model for Social Behavior Processing via Heterogeneity-Aware Relative Policy Optimization

Overcoming the Incentive Collapse Paradox

Reward hacking in physical reinforcement learning revealed by turbulent drag reduction

4. Generative Models and Probabilistic Modeling (25 papers)

Informative Missingness to Generate Irregular Clinical Time Series

Constrained Diffusion Models with Primal-Dual Inference

Discrete Autoregressive Transformer for Generative Mechanism Synthesis

Perron-Frobenius Operator Matching for Generative Modeling

Recursive Scaling in Masked Diffusion Models

NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

Volterra Generative Models

Kolmogorov Regression for Robust Diffusion Policies

Agentic Discovery of Non-Canonical Antimicrobial Peptides with AMPGAN v3