arXivDaily: Daily arXiv Academic Digest, updated Monday to Friday

1. Deep Learning Architectures and Training Methods (31 papers)

2606.17107 2026-06-17 cs.LG cs.AI New submission

Models Take Notes at Prefill: KV Cache Can Be Editable and Composable

Bojie Li

Affiliations: Pine AI

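AI summary: The study finds that the KV cache stores conclusions like notes and supports both editing and composition. Editing a single field can correct a decision (accuracy 1.00 on an 8B model with only ~1% of the compute), and precompiled skills can be composed and inserted seamlessly into any context (logit cosine similarity 0.90-0.999), reducing time-to-first-token to O(L).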

Abstract

Prefix caching reuses prefill only across an exactly shared prefix, so one changed field invalidates the entire downstream cache. Yet overwriting the field's own key/value vectors and reusing the rest leaves the model acting on the old value. The reason, established causally across four model families, is that at prefill the model has already written the field-conditioned conclusion onto downstream notes; the field's own key/value drives under 1% of the decision. Reading the cache as a notebook of memoized conclusions yields two capabilities. (1) It is editable. A salient erratum amends the notes; and with chain-of-thought, editing the field alone recovers the decision (1.00 at 8B, ~1% compute), while without CoT it is ignored. (2) It is composable. The notes are position-portable, so a precompiled skill can be RoPE-repositioned and spliced into any context, indistinguishable from full recompute (logit cosine 0.90-0.999, twelve models) at O(L) rather than O(L^2) time-to-first-token. A unified edit+compose agent stays decision-identical to recompute at up to 14.9x lower latency. The approach applies to any per-token attention KV cache, validated across scale, quantization, Mixture-of-Experts, and multimodal caches, and extends to several attention variants through small adapters. Because the erratum is append-only, it composes with production prefix caching: in an online vLLM benchmark it keeps the prefix cache-aligned (98.5% hit-rate), cutting p90 time-to-first-token by 53-398x.
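The position-portability claim relies on a property of rotary embeddings: rotations compose additively. A key cached at position p can therefore be moved to position p + Δ by one further rotation by Δ, without recomputing it. The minimal Python sketch below demonstrates only that arithmetic. It assumes the common RoPE convention of rotating dimension pairs (2i, 2i+1) with frequency base**(-2i/d). `rope` and `reposition` are illustrative names, not the paper's code, and the sketch does not model splicing or attention over the composed cache.

```python
import math

def rope(vec, pos, base=10000.0):
    """Apply rotary position embedding to an even-length vector at position `pos`.

    Dimension pair (i, i+1), for even i, is rotated by angle pos * base**(-i/d).
    """
    d = len(vec)
    out = []
    for i in range(0, d, 2):
        theta = pos * base ** (-i / d)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out

def reposition(cached_key, delta, base=10000.0):
    """Shift an already-rotated cached key by `delta` positions.

    Rotations compose additively, R(p + delta) = R(delta) R(p), so rotating a
    key cached at position p by `delta` gives the key for position p + delta.
    """
    return rope(cached_key, delta, base)

raw_key = [0.3, -1.2, 0.7, 0.5]
cached = rope(raw_key, pos=5)          # key "precompiled" at position 5
moved = reposition(cached, delta=12)   # spliced so that it now sits at position 17
direct = rope(raw_key, pos=17)         # what full recompute would produce
print(max(abs(a - b) for a, b in zip(moved, direct)) < 1e-9)  # True
```

Because only a per-pair rotation is applied, repositioning costs O(d) per cached key rather than a new forward pass.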

2606.17199 2026-06-17 cs.LG cs.AI New submission

PowerOPD: Stabilizing On-Policy Distillation with Bounded Power Transformation

Anhao Zhao, Junlong Tong, Yingqi Fan, Ping Nie, Wenjie Li, Xiaoyu Shen

Affiliations: Eastern Institute of Technology, Ningbo; The Hong Kong Polytechnic University; Shanghai Jiao Tong University; University of Waterloo

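AI summary: On-policy distillation trains unstably because its log-ratio reward is unbounded. The paper proposes PowerOPD, a family of bounded, sign-consistent rewards based on the Box-Cox power transformation. On mathematical reasoning tasks it improves benchmark-averaged Avg@8/Pass@8 by up to +6.37/+5.71, while reducing wall-clock time by 59.2% and GPU memory by 23.1%.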

Abstract

Standard on-policy distillation (OPD) for large language models estimates the reverse-KL objective using student-sampled tokens, yielding an unbiased single-sample Monte Carlo estimator that avoids vocabulary-wide computation. However, we show that this estimator suffers from severe training pathologies in practice: sample inefficiency, unstable generation dynamics, and a substantial performance gap compared to exact full-vocabulary OPD. Reward-level diagnosis traces these pathologies to the log-ratio reward, which is unbounded by construction, producing extremely high-variance gradients concentrated at early positions and persisting throughout training; standard post-hoc scaling methods fail because they operate only after this distortion occurs. To solve this problem, we propose PowerOPD: a family of natively bounded, sign-consistent rewards from the Box-Cox power transformation, parameterized by alpha > 0, of which the log-ratio is the degenerate alpha -> 0 limit. Across six mathematical reasoning benchmarks and four Qwen3 teacher-student pairs, PowerOPD achieves benchmark-averaged Avg@8/Pass@8 gains of up to +6.37/+5.71 over vanilla OPD, +3.01/+3.54 over post-hoc stabilization, and +2.59/+8.90 over full-vocabulary OPD, while reducing wall-clock time by 59.2% and peak GPU memory by 23.1%. Larger alpha generally improves accuracy, consistently shortens responses, and keeps gradient norms more than 3,000x smaller than vanilla OPD.
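The Box-Cox family mentioned above is (r^alpha - 1)/alpha, which tends to log r as alpha -> 0. The Python sketch below applies it to the teacher/student probability ratio. It illustrates two properties named in the abstract: the transformed reward has the same sign as the log-ratio, and it recovers the log-ratio in the alpha -> 0 limit. The sketch assumes the transform is applied directly to the ratio. This plain form is bounded only from below (by -1/alpha), and it does not reproduce how PowerOPD obtains a fully bounded reward.

```python
import math

def log_ratio_reward(p_teacher, p_student):
    """Standard OPD per-token reward: log(p_teacher / p_student)."""
    return math.log(p_teacher / p_student)

def box_cox_reward(p_teacher, p_student, alpha):
    """Box-Cox transform of the ratio r = p_teacher / p_student (illustrative).

    For every alpha > 0, (r**alpha - 1) / alpha has the same sign as log(r),
    is bounded below by -1/alpha, and tends to log(r) as alpha -> 0.
    """
    r = p_teacher / p_student
    if alpha == 0:
        return math.log(r)  # degenerate limit: the plain log-ratio
    return (r ** alpha - 1.0) / alpha

# A student-sampled token that the teacher considers extremely unlikely.
pt, ps = 1e-12, 0.9
print(log_ratio_reward(pt, ps))     # about -27.5, and unbounded as pt -> 0
print(box_cox_reward(pt, ps, 0.5))  # about -2.0, never below -1/alpha = -2
```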

2606.17399 2026-06-17 cs.LG cs.AI New submission

The Discrete-Log Clock: How a Transformer Learns Modular Multiplication

Huu Danh Nguyen

Affiliations: Stanford University

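AI summary: The paper analyzes a transformer trained on modular multiplication using the multiplicative character transform. In this basis the learned Fourier spectrum is sparse: the embeddings and MLP neurons mainly encode a few multiplicative frequencies. This indicates that the model performs addition in discrete-log space, a "Discrete-Log Clock" algorithm.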

Comments 5 pages, 5 figures. Accepted to the Mechanistic Interpretability Workshop at ICML 2026

Abstract

When small transformers grok modular multiplication, prior work reports that the learned embedding has a "dense" Fourier spectrum requiring all frequencies. This contrasts with modular addition, where only a sparse set of key frequencies suffices. We show this density is an artifact of analyzing in the wrong basis. The natural Fourier transform for multiplication is not the standard additive DFT but the multiplicative character transform, which decomposes functions on the multiplicative group $(\mathbb{Z}/p\mathbb{Z})^*$ into its irreducible representations. Applying this transform to a grokked transformer trained on $a \cdot b \bmod 113$, we find the embedding spectrum becomes highly sparse (Gini coefficient 0.58 vs. 0.07 in the additive basis) with only 4 key frequencies carrying significant energy. Furthermore, 96.9% of MLP neurons are cleanly tuned to a single multiplicative frequency, and neuron activation heatmaps reveal 2D-periodic structure when reordered by the discrete logarithm. These results demonstrate the transformer reduces multiplication to addition in discrete-log space, implementing a "Discrete-Log Clock" algorithm analogous to Nanda et al.'s Clock algorithm for addition. The methodology generalizes: matching the analysis basis to the algebraic structure of the task reveals interpretable structure where standard tools see noise.
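The change of basis can be made concrete. Fix a primitive root g modulo p, write each nonzero residue as a = g^m, and use the characters chi_k(a) = exp(2*pi*i*k*m/(p-1)). Multiplying residues then becomes adding exponents modulo p-1, which is why a "clock" over discrete logarithms can compute a*b mod p. The Python sketch below, for p = 113 as in the abstract, is an illustrative reconstruction of the multiplicative character transform, not the paper's analysis code. The test function is a hypothetical single-frequency neuron.

```python
import cmath
import math

P = 113  # prime modulus used in the abstract

def primitive_root(p):
    """Smallest generator g of the cyclic group (Z/pZ)^*."""
    order = p - 1
    factors, n, f = set(), order, 2
    while f * f <= n:  # collect the prime factors of p - 1
        while n % f == 0:
            factors.add(f)
            n //= f
        f += 1
    if n > 1:
        factors.add(n)
    for g in range(2, p):
        if all(pow(g, order // q, p) != 1 for q in factors):
            return g
    raise ValueError("no primitive root found")

G = primitive_root(P)
# Discrete-log table: DLOG[a] = m such that G**m == a (mod P).
DLOG = {pow(G, m, P): m for m in range(P - 1)}

def character_transform(f, p=P):
    """Coefficients of f on (Z/pZ)^* in the basis of multiplicative characters
    chi_k(a) = exp(2*pi*i*k*dlog(a)/(p-1)), for k = 0 .. p-2."""
    n = p - 1
    return [
        sum(f(a) * cmath.exp(-2j * math.pi * k * DLOG[a] / n) for a in range(1, p)) / n
        for k in range(n)
    ]

# Multiplication of residues is addition of discrete logs modulo p - 1.
a, b = 17, 58
assert DLOG[a * b % P] == (DLOG[a] + DLOG[b]) % (P - 1)

# A "clock" feature at multiplicative frequency 3 is sparse in the character basis.
spectrum = character_transform(lambda x: math.cos(2 * math.pi * 3 * DLOG[x] / (P - 1)))
print([k for k, c in enumerate(spectrum) if abs(c) > 1e-9])  # [3, 109]
```

The same feature analyzed with the additive DFT over a = 0 .. p-1 would generally spread its energy over many frequencies. That is the density artifact the abstract describes.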

2606.17567 2026-06-17 cs.LG New submission

Reducing Learner Redundancy in Boosting via Residual Orthogonalization

Ye Su, Jipeng Guo, Yong Liu, Xin Xu, Gangchun Zhang, Jinxin Chen, Di Wu, Longlong Zhao

Affiliations: Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences; College of Information Science and Technology, Beijing University of Chemical Technology; Gaoling School of Artificial Intelligence, Renmin University of China; School of Computer Science, Central China Normal University; School of Computing, Engineering and Mathematical Sciences, La Trobe University

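AI summary: Residual fitting in boosting leads to learner redundancy. To reduce it, the paper proposes SCBoost, which combines two mechanisms: spectral residual projection and covariance-regularized weighting. It proves geometric properties of the approach, and experiments show strong accuracy and F1 scores.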

Abstract

While sequential residual fitting is the bedrock of standard boosting frameworks, it inherently breeds learner redundancy by repeatedly revisiting correlated error components. To address this bottleneck, we propose a shift from residual fitting to residual orthogonalization and introduce SCBoost. Our framework tackles redundancy through two complementary mechanisms: Spectral Residual Projection (SRP) and Covariance-Regularized Weighting (CRW). During training, SRP projects each residual target onto the orthogonal complement of the historical prediction subspace, forcing successive learners to capture only novel empirical innovations. During aggregation, CRW optimizes ensemble weights on a validation set with an explicit covariance penalty to mitigate remaining correlations. Theoretically, we provide a finite-sample geometric characterization proving that SRP yields an exact additive residual-energy decomposition. Furthermore, under an isotropic-noise assumption, we rigorously establish the conditions under which this projection improves the effective Signal-to-Noise Ratio. Extensive experiments across ten benchmark datasets demonstrate that SCBoost delivers strong out-of-the-box performance, particularly in accuracy and F1 score. This work reinterprets boosting through a geometric lens, suggesting that explicit redundancy control is a principled and necessary step toward more efficient ensemble architectures.
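As described, the core of SRP projects each new residual target onto the orthogonal complement of the span of earlier learners' predictions. The minimal pure-Python sketch below uses modified Gram-Schmidt in place of whatever spectral decomposition the paper uses. It omits learner fitting and the CRW weighting step, and all names are hypothetical. The last check is the exact additive energy split ||r||^2 = ||r_in||^2 + ||r_perp||^2 mentioned in the abstract.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def orthonormal_basis(columns, tol=1e-12):
    """Modified Gram-Schmidt: an orthonormal basis for span(columns)."""
    basis = []
    for col in columns:
        v = list(col)
        for q in basis:
            c = dot(v, q)
            v = [vi - c * qi for vi, qi in zip(v, q)]
        norm = dot(v, v) ** 0.5
        if norm > tol:  # skip predictions already inside the span
            basis.append([vi / norm for vi in v])
    return basis

def project_out(residual, history):
    """Split `residual` into its component inside span(history) and the
    orthogonal remainder; the remainder is the next learner's target."""
    inside = [0.0] * len(residual)
    for q in orthonormal_basis(history):
        c = dot(residual, q)
        inside = [s + c * qi for s, qi in zip(inside, q)]
    novel = [r - s for r, s in zip(residual, inside)]
    return inside, novel

history = [[1.0, 0.0, 1.0, 0.0], [0.5, 1.0, 0.0, 0.0]]  # earlier learners' predictions
residual = [0.9, -0.4, 1.1, 0.7]
inside, novel = project_out(residual, history)
assert all(abs(dot(novel, h)) < 1e-12 for h in history)  # target is orthogonal to the past
assert abs(dot(residual, residual) - dot(inside, inside) - dot(novel, novel)) < 1e-12
```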

2606.17572 2026-06-17 cs.LG cs.SY eess.SY New submission

When Dynamics Models Read the Wrong Time Steps: Label-Free Event Credit Re-Anchoring for Robust Global Readouts

Yifan Wang

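AI summary: Sequence-to-global interfaces suffer from temporal credit dilution. The paper proposes CREST, a training-free and label-free method that estimates an event core and re-anchors the readout through contrast, reducing out-of-distribution error and restoring event credit.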

Comments 7 pages, 6 figures

Abstract

Learned dynamics models often answer global physical questions, such as fault severity or impact stiffness, by pooling a per-step feature sequence into one readout vector. This sequence-to-global interface creates an under-studied temporal credit problem: with only trajectory-level supervision, a model can predict accurately in training conditions while reading from abundant smooth correlates rather than the brief physical events that determine the target. We call this failure temporal credit dilution. It is not exposed by the training loss and is not removed by standard physics-informed residuals, because the error lies in where the global readout assigns functional credit. We introduce Credit-in-Event, an interface-level probe for measuring how much pooled credit lands on event steps, and prove in closed form that a pooled linear reader routes credit to a spurious background channel as the event fraction shrinks. We then propose CREST, a training-free and label-free readout that estimates a transient event core from learned features and re-anchors the pooled representation through event-versus-rest contrast. Across simulated gear and impact systems, recurrent and attention encoders, and public bearing vibration data, CREST reduces out-of-distribution error while restoring event credit. Ablations show that stable-step selection and receptive-field shrinking fail, confirming that the gain comes from event-core credit re-anchoring rather than a generic locality or stability prior.
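To make "event-versus-rest contrast" concrete, the Python sketch below selects the most atypical steps of a feature sequence as an event core. It then returns the mean of those steps minus the mean of the remaining steps. This is a hypothetical illustration only, because the abstract does not specify CREST's estimator. The median-distance detector, the `event_fraction` parameter, and the function name are all assumptions.

```python
def event_contrast_readout(features, event_fraction=0.1):
    """Hypothetical event-versus-rest readout (not the paper's CREST).

    features: list of per-step feature vectors (lists of floats), length >= 2.
    1. Score each step by its squared distance from the per-dimension median,
       a crude detector of brief transients.
    2. Take the top `event_fraction` of steps as the "event core".
    3. Return mean(event steps) - mean(remaining steps), plus the core indices.
    """
    T, D = len(features), len(features[0])
    if T < 2:
        raise ValueError("need at least two steps")
    med = [sorted(f[d] for f in features)[T // 2] for d in range(D)]
    scores = [sum((f[d] - med[d]) ** 2 for d in range(D)) for f in features]
    k = min(max(1, int(round(event_fraction * T))), T - 1)  # keep "rest" non-empty
    event_idx = set(sorted(range(T), key=lambda t: scores[t], reverse=True)[:k])

    def mean(rows):
        return [sum(r[d] for r in rows) / len(rows) for d in range(D)]

    event = mean([features[t] for t in range(T) if t in event_idx])
    rest = mean([features[t] for t in range(T) if t not in event_idx])
    return [e - r for e, r in zip(event, rest)], sorted(event_idx)

# 50 steps: a slow, abundant drift in dim 0 plus a brief impact burst in dim 1 at steps 20-24.
seq = [[0.01 * t, 5.0 if 20 <= t < 25 else 0.0] for t in range(50)]
readout, core = event_contrast_readout(seq, event_fraction=0.1)
print(core)     # [20, 21, 22, 23, 24]
print(readout)  # dominated by the burst dimension, with the drift largely cancelled
```

By contrast, a plain mean over all 50 steps would weight the smooth drift in dimension 0 much more heavily relative to the short burst.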

2606.17816 2026-06-17 cs.LG cs.AI New submission

Conservation Laws for Modern Neural Architectures

Viet-Hoang Tran, Vinh Khanh Bui, Tan Lai Ngoc, Nam Nguyen, Tuan Dam, Tan M. Nguyen

Affiliations: National University of Singapore; Center for AI Research, VinUniversity; Independent Researcher; Hanoi University of Science and Technology

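AI summary: The paper proposes a unified framework that characterizes gradient-flow conservation laws in feedforward networks with GELU, SiLU, and SwiGLU activations, in multihead attention, and in Mixture-of-Experts models. Experiments validate the invariants predicted by the theory.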

Comments Published at the International Conference on Machine Learning (ICML 2026)

Abstract

Understanding gradient descent dynamics is key to explaining the success of over-parameterized models, where implicit bias manifests through conservation laws in gradient flow. While such laws are well understood for linear and ReLU networks, they remain largely unexplored for modern architectures. This work develops a unified framework to characterize conservation laws for contemporary models, including feedforward networks with GELU, SiLU, and SwiGLU activations, multihead attention with sinusoidal and rotary positional encodings, and Mixture-of-Experts architectures under diverse gating designs. Our theoretical findings are supported by experiments that validate the predicted invariants.
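In the classical linear case that this work generalizes, the best-known conservation law is layer balancedness. For a two-layer scalar linear model f(x) = a*b*x, gradient flow keeps a^2 - b^2 constant. The Python sketch below checks this numerically using small-step gradient descent on a squared loss. The data and step size are arbitrary choices, and the sketch does not cover the GELU, attention, or MoE cases treated in the paper.

```python
def train_scalar_linear_net(a, b, data, lr=1e-3, steps=20000):
    """Gradient descent on f(x) = a * b * x with loss L = mean((a*b*x - y)**2) / 2.

    With g = dL/d(ab), gradient flow gives da/dt = -g*b and db/dt = -g*a,
    so d/dt (a**2 - b**2) = 0. One discrete step multiplies a**2 - b**2 by
    exactly (1 - (lr*g)**2), so the drift is second order in the step size.
    """
    history = []
    for _ in range(steps):
        g = sum((a * b * x - y) * x for x, y in data) / len(data)
        a, b = a - lr * g * b, b - lr * g * a  # simultaneous update
        history.append(a * a - b * b)
    return a, b, history

data = [(x / 10, 1.5 * x / 10) for x in range(-10, 11)]  # target slope 1.5
a, b, inv = train_scalar_linear_net(a=2.0, b=0.3, data=data)
print(a * b)           # the learned slope approaches 1.5
print(inv[0], inv[-1]) # a^2 - b^2 stays close to its initial value 3.91
```

Although the product a*b moves from 0.6 to 1.5, the invariant stays nearly fixed. Invariants of this kind are what the paper characterizes for modern architectures.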

2606.17830 2026-06-17 cs.LG cs.AI New submission

Functional Equivalence in Attention: A Comprehensive Study with Applications to Linear Mode Connectivity

Viet-Hoang Tran, Vinh Khanh Bui, Van-Hoan Trinh, Tan Lai Ngoc, Tan M. Nguyen

Affiliations: National University of Singapore; Center for AI Research, VinUniversity; Independent Researcher; Technical University of Munich

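AI summary: The paper formally studies how positional encodings affect functional equivalence in transformers. Sinusoidal encodings preserve the symmetries of vanilla attention, whereas rotary encodings substantially shrink the symmetry group and thereby increase expressivity. An alignment algorithm empirically demonstrates the key role of positional encoding in linear mode connectivity.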

Comments Published at the International Conference on Machine Learning (ICML 2026)

Abstract

Neural network parameter spaces are inherently non-injective, as distinct parameter configurations can realize identical functions through functional equivalence. While this symmetry is well understood in classical fully connected and convolutional models, it becomes substantially more intricate in modern attention-based architectures. Existing analyses of multihead attention have largely focused on the vanilla formulation, overlooking positional encodings that fundamentally reshape architectural symmetries. In this work, we provide a formal study of functional equivalence in Transformers with positional encodings. Focusing on the two most widely used variants, sinusoidal and rotary positional encodings (RoPE), we show that sinusoidal encodings preserve the equivalence structure of vanilla attention, whereas rotary encodings significantly reduce the symmetry group, thereby enhancing expressivity. This offers a principled explanation for the growing prominence of RoPE in practice. We further examine how positional encodings affect linear mode connectivity, and through an alignment algorithm, empirically demonstrate that the presence and variability of connectivity across Transformer settings crucially depend on the positional encoding.
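A small example shows why RoPE shrinks the symmetry group. In vanilla attention, replacing W_Q by W_Q A and W_K by W_K A^{-T} leaves every score q^T k unchanged for any invertible A. With RoPE, the score becomes q^T R k, where the rotation R depends on relative position. The same reparameterization then gives q^T A R A^{-1} k, which equals the original score only when A commutes with R. In two dimensions this holds for scaled rotations but not for a generic A. The Python sketch below checks this for a single 2-D head. It is an illustrative special case, not the paper's characterization of the full symmetry group.

```python
import math

def rot(theta):
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]

def matvec(M, v):
    return [M[0][0] * v[0] + M[0][1] * v[1], M[1][0] * v[0] + M[1][1] * v[1]]

def transpose(M):
    return [[M[0][0], M[1][0]], [M[0][1], M[1][1]]]

def inv2(M):
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def score(q, k, rel=None):
    """Attention logit q^T R(rel) k for one 2-d head; rel=None means no RoPE."""
    return sum(x * y for x, y in zip(q, k if rel is None else matvec(rot(rel), k)))

def reparam(q, k, A):
    """W_Q -> W_Q A and W_K -> W_K A^{-T}, i.e. q -> A^T q and k -> A^{-1} k."""
    return matvec(transpose(A), q), matvec(inv2(A), k)

q, k, rel = [0.8, -0.3], [0.2, 1.1], 0.7                  # rel = (j - i) * theta
generic = [[2.0, 1.0], [0.0, 1.0]]                        # arbitrary invertible A
scaled_rot = [[1.5 * x for x in row] for row in rot(0.4)] # A = 1.5 * R(0.4)

for name, A in [("generic", generic), ("scaled rotation", scaled_rot)]:
    q2, k2 = reparam(q, k, A)
    print(name,
          abs(score(q2, k2) - score(q, k)) < 1e-12,            # vanilla: always True
          abs(score(q2, k2, rel) - score(q, k, rel)) < 1e-12)  # RoPE: needs A R = R A
# generic True False
# scaled rotation True True
```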

2606.17832 2026-06-17 cs.LG New submission

From Drift to Coherence: Stabilizing Beliefs in LLMs

SongEun Kim, Seungyoo Lee, Edwin Fong, Hyungi Lee, Juho Lee

Affiliations: Department of Statistics, Seoul National University; Korea Advanced Institute of Science & Technology; Department of AI, Kookmin University; University of Hong Kong

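AI summary: The paper studies belief drift in LLMs on multiple-choice question answering and introduces prompted predictive resampling (PPR). It finds that the belief process self-stabilizes and converges. It then proposes a seed-answer prompting strategy and a self-consistency loss to accelerate stabilization and improve predictive coherence.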

Abstract

Large language models (LLMs) are often hypothesized to perform implicit Bayesian inference, yet a key coherence condition, the martingale property of predictive beliefs, has been shown to fail in controlled synthetic in-context learning settings. We revisit this question in a more typical usage regime: generic multiple-choice question answering. Exploiting the discrete answer space, we compute exact predictive distributions and study belief dynamics induced by autoregressive answer resampling. We introduce prompted predictive resampling (PPR), where an LLM generates a sequence of answers to the same question. Empirically, PPR reveals early-stage belief drift, indicating martingale violations. However, after sufficient resampling steps, the belief process self-stabilizes and converges to a coherent predictive distribution. Based on this observation, we further propose (i) a seed-answer prompting strategy to accelerate stabilization, and (ii) a self-consistency loss that amortizes early-stage drift into the model via fine-tuning. Experiments on multiple-choice QA benchmarks show that our methods substantially reduce belief drift and improve predictive coherence without sacrificing accuracy.
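The martingale property discussed above says that the expected next predictive distribution, averaged over the next sampled answer, equals the current one. The Python sketch below verifies this exactly, using rational arithmetic, for a Dirichlet-categorical predictive. That is a standard coherent Bayesian reference process. It is neither an LLM nor PPR itself, only the baseline behavior whose violation the paper measures. The prior and counts are arbitrary.

```python
from fractions import Fraction

def predictive(prior, counts):
    """Dirichlet-categorical posterior predictive over the answer options."""
    total = Fraction(sum(prior) + sum(counts))
    return [(a + c) / total for a, c in zip(prior, counts)]

def expected_next_predictive(prior, counts):
    """Return the current predictive p_n and E[p_{n+1} | p_n], averaging the
    updated predictive over the next answer sampled from p_n."""
    p = predictive(prior, counts)
    expected = [Fraction(0)] * len(p)
    for k, pk in enumerate(p):
        new_counts = list(counts)
        new_counts[k] += 1  # next sampled answer is option k
        p_next = predictive(prior, new_counts)
        expected = [e + pk * q for e, q in zip(expected, p_next)]
    return p, expected

prior = [Fraction(1, 2)] * 4  # four options (A-D), Dirichlet(1/2) prior
counts = [3, 1, 0, 0]         # answers sampled so far
p, e = expected_next_predictive(prior, counts)
print(p == e)  # True: a coherent Bayesian predictive is a martingale
```

Under PPR, the corresponding check compares an LLM's predictive distribution with the average of its updated predictives. Early-stage drift shows up as a gap between the two.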

2606.17886 2026-06-17 cs.LG New submission

Monotonic Kolmogorov-Arnold Networks: A Theoretical and Empirical Study of Monotonicity as an Inductive Bias

Mikhail Krasnov, Carolina Fortuna, Blaž Bertalanič

Affiliations: Jozef Stefan Institute

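AI summary: The paper proposes MKAN, which guarantees hard monotonicity through exponential reparameterization of B-spline coefficients, positive edge weights, and a monotone base activation. It proves that a smooth feature extractor inducing a ball-shaped neighborhood partition admits a monotone realization with a bounded encoder size. Experiments show that MKAN is competitive with state-of-the-art monotone networks on a monotonicity benchmark while retaining KAN's per-edge functional transparency.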

Abstract

Monotonicity has been a long-running architectural inductive bias for neural networks, motivated by tabular, scientific, and economic settings where outputs are known to respond monotonically to certain inputs. Existing approaches are MLP- or flow-based and lack per-edge functional transparency; the only Kolmogorov-Arnold Network (KAN) variant with monotonicity, MonoKAN, enforces the constraint only on a restricted parameter subset and requires a projection-style training procedure. We close this gap with MKAN, a KAN with hard monotonicity guaranteed for all parameter values via exponential reparameterization of B-spline coefficients, positive edge weights, and a monotone base activation. Training reduces to standard unconstrained gradient descent. Our headline theoretical contribution is a representation-cost theorem: any $C^K$ ($K > 0$) feature extractor inducing a ball-shaped semantic-neighborhood partition admits a monotone realization of the equivalent neighborhood structure at $N' = N^* + k \le 2N^*$, where $k$ is the number of non-monotone coordinates of the original extractor. The bound is architecture-agnostic and gives a principled sizing rule for monotone encoders. Empirically, MKAN is competitive with state-of-the-art monotone NNs on the SMM/ICML-2024 benchmark while being the only method that combines hard unconstrained monotonicity with KAN's per-edge functional transparency; the $2N^*$ prediction is validated in a self-supervised feature-size sweep on four real datasets, and on a controlled monotone-generative dataset MKAN recovers ground-truth factors with substantially higher Spearman alignment than KAN, MLP, and linear baselines.
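The monotonicity mechanism can be sketched for a single edge, using three facts:

- Exponentiated increments make the B-spline coefficients strictly increasing for any real parameters.
- A B-spline with nondecreasing coefficients is itself nondecreasing.
- A positive weight exp(w) preserves that ordering.

The Python sketch below uses a Cox-de Boor evaluator on uniform knots and holds only under these assumptions. The knot layout, degree, and function names are hypothetical, and the monotone base-activation branch of an MKAN edge is omitted.

```python
import math
import random

def bspline_basis(i, k, t, x):
    """Cox-de Boor recursion for the i-th B-spline of degree k on knot vector t."""
    if k == 0:
        return 1.0 if t[i] <= x < t[i + 1] else 0.0
    left = right = 0.0
    if t[i + k] != t[i]:
        left = (x - t[i]) / (t[i + k] - t[i]) * bspline_basis(i, k - 1, t, x)
    if t[i + k + 1] != t[i + 1]:
        right = (t[i + k + 1] - x) / (t[i + k + 1] - t[i + 1]) * bspline_basis(i + 1, k - 1, t, x)
    return left + right

def monotone_edge(theta, c0, log_w, degree=3, lo=-1.0, hi=1.0):
    """Hypothetical monotone edge phi(x) = exp(log_w) * spline(x), valid on [lo, hi).

    Coefficients c_j = c0 + sum_{m <= j} exp(theta_m) are strictly increasing
    for any real theta, so the spline is nondecreasing; exp(log_w) > 0.
    """
    coeffs, acc = [], c0
    for th in theta:
        acc += math.exp(th)
        coeffs.append(acc)
    n = len(coeffs)
    if n <= degree:
        raise ValueError("need more coefficients than the spline degree")
    h = (hi - lo) / (n - degree)
    # Uniform knots, extended by `degree` on each side, so [lo, hi) = [t_degree, t_n).
    knots = [lo + (j - degree) * h for j in range(n + degree + 1)]
    w = math.exp(log_w)

    def phi(x):
        return w * sum(c * bspline_basis(j, degree, knots, x) for j, c in enumerate(coeffs))
    return phi

random.seed(0)
phi = monotone_edge(theta=[random.gauss(0, 1) for _ in range(8)], c0=-1.0, log_w=0.3)
ys = [phi(-1.0 + 2.0 * i / 200) for i in range(200)]  # grid inside [-1, 1)
print(all(b >= a for a, b in zip(ys, ys[1:])))         # True for any parameter values
```

Because the constraint holds for every real value of theta and log_w, plain gradient descent can update these parameters directly with no projection step. That matches the abstract's point about unconstrained training.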

2606.17927 2026-06-17 cs.LG cs.AI New submission

KANLib -- A Modular, Extensible and Fast Kolmogorov-Arnold Network Implementation

Julian Hoever, Gregor Schiele

Affiliations: Intelligent Embedded Systems, University of Duisburg-Essen

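AI summary: The paper presents KANLib, a framework that unifies existing KAN implementations and supports two basis function types and adaptive grid rescaling. It reproduces the predictions of reference implementations while remaining flexible and fast.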

Abstract

Kolmogorov-Arnold Networks (KANs) have recently emerged as a promising alternative to traditional multilayer perceptrons by replacing linear weights with learnable univariate functions. Despite their theoretical advantages in interpretability and expressiveness, practical research on KANs remains difficult due to high computational costs and inconsistent feature support across existing frameworks. This paper introduces KANLib, a modular, extensible, and computationally efficient framework for developing and evaluating KAN architectures. KANLib unifies core concepts from existing implementations, including PyKAN, EfficientKAN, and FastKAN, within a consistent software architecture that emphasizes flexibility, feature parity, and high performance. The framework supports two basis function types, adaptive grid rescaling, grid extension, and fine-grained architectural customization while maintaining compatibility with standard PyTorch workflows. Experimental evaluation on the California Housing benchmark demonstrates that KANLib reproduces the predictive behavior of established reference KAN implementations while achieving competitive computational efficiency. Furthermore, the framework enables the exploration of architectural variations beyond standard KAN formulations with only minor impacts on predictive performance. Overall, KANLib provides a robust foundation for future research on scalable and extensible KAN architectures.

2606.17952 2026-06-17 cs.LG cs.AI New submission

SoftMoE: Soft Differentiable Routing for Mixture-of-Experts in LLMs

Mikołaj Zasada, Łukasz Struski, Jacek Tabor, Marcin Kurdziel

Affiliations: AGH University of Krakow, Poland; Faculty of Mathematics and Computer Science, Jagiellonian University, Poland; Centre for Credible Artificial Intelligence, Warsaw University of Technology

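AI summary: The paper proposes SoftMoE, which replaces discrete routing with a soft top-k LapSum relaxation so that expert routing can be optimized with gradients, and which learns the number of active experts per layer. In language modeling it matches or outperforms sparse MoE while activating fewer experts.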

Comments Accepted at ICML 2026

Abstract

Sparse Mixture-of-Experts (MoE) architectures enable scaling LLM parameters under a fixed inference budget by activating only a small subset of experts via top-$k$ routing. While this preserves causality and suits autoregressive language models, the discrete top-$k$ operator is not differentiable, forcing a fixed number of active experts per input and resulting in inefficient use of computation. We propose SoftMoE, which replaces discrete routing with a truncated soft top-$k$ LapSum relaxation, allowing gradient-based optimization of expert routing. We further parameterize the mean number of active experts per layer and impose a global budget constraint, enabling the model to learn how to allocate expert capacity across layers. SoftMoE remains fully compatible with autoregressive modeling and achieves performance comparable to or better than sparse MoE on language modeling and downstream tasks, while activating significantly fewer experts. Notably, the learned allocation is highly non-uniform, with later layers activating more experts. The source code is publicly available.
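A soft top-k of this kind can be built by passing each router logit through a smooth CDF shifted by a common threshold b, where b is chosen so that the weights sum to k. The Python sketch below assumes a Laplace CDF, in the spirit of LapSum. It finds b by bisection rather than in closed form, and it omits SoftMoE's truncation and per-layer budget. The `scale` parameter and the function names are hypothetical.

```python
import math

def laplace_cdf(z):
    return 0.5 * math.exp(z) if z < 0 else 1.0 - 0.5 * math.exp(-z)

def soft_top_k(logits, k, scale=1.0, iters=100):
    """Soft top-k weights w_i = F((x_i - b) / scale), with F the Laplace CDF and
    the threshold b found by bisection so that sum(w) == k.

    Each w_i lies in [0, 1], larger logits get larger weights, every weight is
    a continuous function of the logits, and as scale -> 0 the weights approach
    the hard top-k indicator.
    """
    if not 0 < k < len(logits):
        raise ValueError("need 0 < k < number of experts")
    lo = min(logits) - 50 * scale  # here sum(w) is ~len(logits) > k
    hi = max(logits) + 50 * scale  # here sum(w) is ~0 < k
    for _ in range(iters):
        b = 0.5 * (lo + hi)
        total = sum(laplace_cdf((x - b) / scale) for x in logits)
        if total > k:
            lo = b  # too much mass: raise the threshold
        else:
            hi = b
    b = 0.5 * (lo + hi)
    return [laplace_cdf((x - b) / scale) for x in logits]

router_logits = [2.1, -0.3, 0.8, 1.9, -1.2, 0.1]
w = soft_top_k(router_logits, k=2, scale=0.25)
print(round(sum(w), 6))                             # 2.0
print(sorted(range(6), key=lambda i: -w[i])[:2])    # [0, 3]
```

Unlike a hard top-k mask, every expert receives a nonzero weight that varies smoothly with its logit, so routing logits can receive gradients through all of them.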

2606.18023 2026-06-17 cs.LG cs.AI New submission

LoopCoder-v2: Only Loop Once for Efficient Test-Time Computation Scaling

Jian Yang, Shawn Guo, Wei Zhang, Tianyu Zheng, Yaxin Du, Haau-Sing Li, Jiajun Wu, Yue Song, Yan Xing, Qingsong Cai, Zelong Huang, Chuan Hao, Ran Tao, Xianglong Liu, Wayne Xin Zhao, Mingjie Tang, Weifeng Lv, Ming Zhou, Bryan Dai

Affiliations: Beihang University; IQuest Research; Langboat; Renmin University of China

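AI summary: The paper studies loop-count selection for Parallel Loop Transformers (PLT). The two-loop variant substantially improves performance on code generation and related tasks, while three or more loops degrade it, revealing a gain-cost trade-off.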

Abstract

Looped Transformers scale latent computation by repeatedly applying shared blocks, but sequential looping increases latency and KV-cache memory with the loop count. Parallel loop Transformers (PLT) alleviate this cost through cross-loop position offsets (CLP) and shared-KV gated sliding-window attention, making loop count a practical design choice. We therefore study PLT loop-count selection through a gain-cost view: an extra loop may refine representations, but CLP also introduces a positional mismatch at each loop boundary. We instantiate this study by training LoopCoder-v2, a family of 7B PLT coders with different loop counts, from scratch on 18T tokens, followed by matched instruction tuning and evaluation. Empirically, the two-loop variant delivers broad gains over the non-looped baseline across code generation, code reasoning, agentic software engineering, and tool-use benchmarks, improving SWE-bench Verified from 43.0 to 64.4 points and Multi-SWE from 14.0 to 31.0 points. In contrast, variants with three or more loops regress, revealing a strongly non-monotonic loop-count effect. Our diagnostics show that loop 2 provides the main productive refinement, while later loops yield diminishing, oscillatory updates and reduced representational diversity. Because the CLP-induced mismatch remains roughly fixed as refinement gains shrink, the offset cost increasingly dominates. This gain-cost trade-off explains PLT's saturation at two loops and provides diagnostics for loop-count selection.

2606.18208 2026-06-17 cs.LG cs.AI cs.CL cs.CV New submission

Looped World Models

Hongyuan Adam Lu, Z. L. Victor Wei, Qun Zhang, Jinrui Zeng, Bowen Cao, Lingwei Meng, Mocheng Li, Zezhong Wang, Haonan Yin, Naifu Xue, Minyu Chen, Cenyuan Zhang, Zefan Zhang, Hao Wei, Jiawei Zhou, Haoran Xu, Hao Yang, Ronglai Zuo, Tongda Xu, Yonghao Li, Jian Chen, Hebin Wang, Zeyu Gao, Yang Li, Wei Zhao, Qimin Zhong, Siqi Liu, Yumeng Zhang, Leyan Cui, Zhangyu Wang, Wai Lam

Affiliations: FaceMind Research Asia

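AI summary: The paper proposes Looped World Models (LoopWM), which iteratively refine latent environment states with a parameter-shared transformer block. This achieves up to 100x parameter efficiency and establishes iterative latent depth as a new scaling axis for world simulation.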

Comments Technical Report

Abstract

Current world models face a fundamental tension: faithful long-horizon simulation demands deep computation, but deeper models are expensive to deploy and prone to compounding errors. We resolve this by introducing Looped World Models (LoopWM), which are the first looped architectures for world modelling. Our method iteratively refines latent environment states through a parameter-shared transformer block. This yields up to 100x parameter efficiency over conventional approaches, with adaptive computation that automatically scales depth to match the complexity of each prediction step. Orthogonal to scaling model size and training data, LoopWM establishes iterative latent depth as a new scaling axis for world simulation, which may substantially advance the field.
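The looping idea is to apply one shared block repeatedly and let the number of iterations adapt to the input. It can be illustrated with a toy contraction standing in for the transformer block. The Python sketch below is hypothetical: the tanh update, the tolerance-based stopping rule, and all names are assumptions, and the abstract does not say how LoopWM decides its depth.

```python
import math

def shared_block(state, obs, params):
    """Stand-in for one parameter-shared block (hypothetical): a contraction
    when |a| < 1 that pulls the latent state toward a function of the observation."""
    a, b = params
    return [math.tanh(a * s + b * o) for s, o in zip(state, obs)]

def looped_refine(obs, params, max_loops=32, tol=1e-6):
    """Apply the same block repeatedly, stopping once the update is small.

    Returns the refined state and the number of loops actually used, which
    plays the role of an input-dependent depth.
    """
    state = [0.0] * len(obs)
    for loop in range(1, max_loops + 1):
        new_state = shared_block(state, obs, params)
        delta = max(abs(n - s) for n, s in zip(new_state, state))
        state = new_state
        if delta < tol:
            return state, loop
    return state, max_loops

params = (0.5, 1.0)  # |a| < 1 keeps the toy update a contraction
for obs in ([0.0, 0.0], [3.0, -3.0], [0.4, -0.6]):
    state, loops = looped_refine(obs, params)
    print(obs, "->", loops, "loops")
```

Whatever the input, the parameter count stays that of a single block. Only the number of applications varies, which is the sense in which depth becomes a separate scaling axis.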

2606.17408 2026-06-17 cs.RO cs.CV cs.LG Cross-list

Where Should Action Generation Begin? A Learnable Source Prior for Generative Robot Policies

Meipo Dai, Qiyuan Zhuang, He-Yang Xu, Ying-Jie Shuai, Yijun Wang, Qi Dou, Xiu-Shen Wei

Affiliations: Southeast University; The Chinese University of Hong Kong

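AI summary: The paper proposes LeaP, which uses a lightweight MLP to predict a proprioception-conditioned diagonal Gaussian as the source prior for action generation, replacing the standard Gaussian. Across 15 RoboTwin tasks it achieves an average success rate of 81.6%, outperforming baselines by 6.5 to 25.5 percentage points.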

Abstract

Generative robot policies typically begin action generation from an observation-independent standard Gaussian distribution, leaving the choice of source distribution underexplored. This work asks a simple question: where should action generation begin? We propose LeaP, a Learnable source Prior that replaces the standard Gaussian with a proprioception-conditioned diagonal Gaussian over action chunks. Parameterized by a lightweight MLP, LeaP jointly predicts the mean and state-adaptive variance of the source distribution, while keeping the downstream generator architecture and inference solver unchanged. This design provides an observation-informed yet stochastic initialization, allowing the generator to focus on precise action refinement rather than transporting samples from an uninformed noise source. On 15 RoboTwin manipulation tasks, LeaP achieves an average success rate of 81.6%, outperforming four representative baselines (deterministic-source methods, a no-prior counterpart, and a diffusion-bridge policy) by 6.5 to 25.5 percentage points. The same prior consistently improves both flow-matching and diffusion-bridge generators, while using fewer parameters and converging faster. The advantage carries over to real-world deployment, where LeaP attains the best performance. These results suggest that the source distribution is an independent and reusable design axis for generative robot policies, complementary to the choice of generative dynamics.
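The source-prior idea takes only a few lines to sketch. A small network maps the proprioceptive state to the mean and positive standard deviation of a diagonal Gaussian over the flattened action chunk. The generator's training pairs then start from a draw from that Gaussian instead of from N(0, I). The Python sketch below makes several assumptions: a single linear layer in place of LeaP's MLP, softplus for the standard deviation, and the linear-interpolation form of flow matching. All weights and names are hypothetical.

```python
import math
import random

def softplus(z):
    # numerically stable log(1 + exp(z))
    return math.log1p(math.exp(-abs(z))) + max(z, 0.0)

def source_prior(proprio, W_mu, b_mu, W_s, b_s):
    """Hypothetical one-layer stand-in for LeaP's MLP: maps the proprioceptive
    state to the mean and (positive) std of a diagonal Gaussian over the chunk."""
    mu = [sum(w * p for w, p in zip(row, proprio)) + b for row, b in zip(W_mu, b_mu)]
    sigma = [softplus(sum(w * p for w, p in zip(row, proprio)) + b) + 1e-4
             for row, b in zip(W_s, b_s)]
    return mu, sigma

def sample_source(mu, sigma, rng):
    # reparameterized draw x0 = mu + sigma * eps, eps ~ N(0, I)
    return [m + s * rng.gauss(0.0, 1.0) for m, s in zip(mu, sigma)]

def flow_matching_pair(x0, x1, t):
    """Linear-interpolation flow matching: point x_t and target velocity x1 - x0.
    The generator's training target is unchanged; only the origin x0 differs."""
    xt = [(1 - t) * a + t * b for a, b in zip(x0, x1)]
    v = [b - a for a, b in zip(x0, x1)]
    return xt, v

rng = random.Random(0)
proprio = [0.2, -0.5, 0.9]  # e.g. joint positions
action_dim = 4              # flattened action chunk
W_mu = [[rng.uniform(-1, 1) for _ in proprio] for _ in range(action_dim)]
W_s = [[rng.uniform(-1, 1) for _ in proprio] for _ in range(action_dim)]
mu, sigma = source_prior(proprio, W_mu, [0.0] * action_dim, W_s, [-2.0] * action_dim)
x0 = sample_source(mu, sigma, rng)
x1 = [0.1, 0.3, -0.2, 0.05]  # a demonstration action chunk
xt, v = flow_matching_pair(x0, x1, t=0.3)
print(mu, sigma, xt, v)
```

A negative bias on the standard-deviation head (here -2.0) starts the prior fairly narrow around the predicted mean. Sampling stays stochastic because sigma remains strictly positive.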

2606.17522 2026-06-17 cs.CL cs.LG Cross-list

An expressivity analysis of hierarchical modelling in deep transformers via bounded-depth grammars

Vinoth Nandakumar, Qiang Qu, Pramod Thebe, Sakshi Khachariya, Tongliang Liu

Affiliations: University of Sydney; San Francisco State University; IIT Madras

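AI summary: Using bounded-depth context-free grammars, the paper constructs deep transformers whose depth grows linearly with grammar depth. Their neuron count scales with the number of derivation-tree shapes and quadratically with the number of production rules. These results support the linear representation hypothesis.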

Abstract

Deep neural networks are widely believed to derive their expressive power from their ability to form hierarchical representations, capturing progressively more abstract and compositional features across layers. In language modeling, transformers have emerged as the dominant architecture, with early layers capturing local syntactic patterns and later layers encoding more complex clause-level dependencies. While this intuition has shaped model design, there remains a lack of rigorous theoretical work demonstrating how deep transformers represent such hierarchical structures. In this work, we analyze the expressiveness of deep transformer models through the formal lens of bounded-depth, non-recursive context-free grammars. For this class of grammars, we explicitly construct transformers with positional attention whose depth grows linearly with grammar depth, while the neuron count scales with the number of derivation-tree shapes and quadratically with the number of production rules. Our theoretical results support the linear representation hypothesis by demonstrating that these architectures possess the structural capacity to encode abstract grammatical states into low-dimensional, linearly separable subspaces within the residual stream.

2606.17874 2026-06-17 cs.CV cs.LG 交叉投稿

Revisiting Structural Dependency in Autoregressive Multi-Task Table Recognition via Order-Independent Cell-Level Representations

重新审视自回归多任务表格识别中的结构依赖性:基于顺序无关的单元格级表示

Takaya Kawakatsu

发表机构 * Preferred Networks, Inc.(Preferred Networks公司)

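AI summary: Addresses order-dependent cell representations in autoregressive multi-task table recognition, which degrade global consistency. Produces order-independent cell features with non-causal attention, enabling parallel inference, improving cell localization and end-to-end recognition on two large datasets, and cutting inference time by about 3x.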

Comments ICDAR 2026

AI中文摘要

多任务表格识别在统一框架中联合处理表格结构预测、单元格定位和单元格内容识别。现有方法通常依赖自回归解码器生成表格结构,并重用其隐藏状态进行单元格定位和内容识别。这种自回归生成过程可能使单元格表示产生顺序依赖,降低跨单元格的全局一致性。本文提出一个结构细化模块,通过非因果注意力产生顺序无关的单元格特征。该设计使得单元格内容能够并行推理,同时每个单元格以细化特征中编码的全局上下文为条件。在两个大型数据集上的实验表明,该方法在单元格定位和端到端识别上持续提升,同时将整体推理时间减少约三倍。

英文摘要

Multi-task table recognition jointly addresses table structure prediction, cell localization, and cell content recognition within a unified framework. Existing approaches often rely on autoregressive decoders to generate table structures and reuse their hidden states for cell localization and content recognition. This autoregressive generation process can make cell representations order-dependent, degrading global consistency across cells. This paper proposes a structural refinement module that produces order-independent cell features through non-causal attention. This design enables parallel inference of cell contents while conditioning each cell on global context encoded in the refined features. Experiments on two large datasets demonstrate consistent gains in cell localization and end-to-end recognition, while reducing overall inference time by around threefold.
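
The sketch below shows why non-causal attention removes order dependence. It assumes single-head attention with identity projections, which is a simplification and not the paper's structural refinement module. Without a mask, permuting the input cells simply permutes the outputs. With a causal mask, each cell's feature depends on where it sits in the decoding order.

```python
import numpy as np


def self_attention(x, causal=False):
    """Single-head self-attention with identity Q/K/V projections (illustration only)."""
    n, d = x.shape
    scores = x @ x.T / np.sqrt(d)
    if causal:
        # Each cell may attend only to itself and to earlier cells in the sequence order.
        mask = np.triu(np.ones((n, n), dtype=bool), k=1)
        scores = np.where(mask, -np.inf, scores)
    scores = scores - scores.max(axis=1, keepdims=True)  # numerically stable softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ x
```

Non-causal attention is permutation-equivariant, so `self_attention(x[perm])` equals `self_attention(x)[perm]`. The causal variant does not satisfy this identity.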

2606.18032 2026-06-17 math.NA cs.LG cs.NA physics.comp-ph 交叉投稿

INI-VPINN: A Variational Physics-Informed Neural Network with Implicit Neumann and Interface Handling for Multi-Material Domains with Geometric Singularities

INI-VPINN:一种隐式处理纽曼边界和界面的变分物理信息神经网络,适用于具有几何奇异性的多材料域

Shayan Dodge, Alessandro Formisano, Sami Barmada

发表机构 * DESTeC, University of Pisa(比萨大学DESTeC系)

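AI summary: Proposes INI-VPINN, a weak-form physics-informed neural network that handles Neumann boundary and interface conditions implicitly, without extra loss terms or multiple subdomain networks. It achieves higher accuracy and faster convergence on multi-material problems.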

Comments Preprint version. Under peer review. Code available at: https://github.com/ShayanDodge/INI-VPINN

AI中文摘要

我们提出了一种新的弱形式物理信息神经网络方法(命名为INI-VPINN)。INI-VPINN将纽曼边界和界面条件自然地纳入变分公式中,消除了对额外损失项或多个子域网络的需求。该框架采用紧支撑加权函数和分部积分来隐式地施加通量和连续性约束,从而在材料边界上隐式地确保物理一致性。所提出的方法在具有尖锐界面和复杂几何的泊松和拉普拉斯问题上进行了测试。结果表明,与其他几种基于物理信息神经网络的公式相比,INI-VPINN始终实现更高的精度、更平滑和更快的收敛。所提出的框架提供了一种使用神经网络求解具有复杂几何和混合纽曼-狄利克雷边界条件的多材料问题的通用方法。该实现已在GitHub仓库中公开。

英文摘要

We propose a new weak-form Physics-Informed Neural Network approach (named INI-VPINN). INI-VPINN naturally incorporates Neumann boundary and interface conditions into the variational formulation, removing the need for additional loss terms or multiple subdomain networks. The framework employs compactly supported weighting functions and integration by parts to impose flux and continuity constraints implicitly, thereby ensuring physical consistency across material boundaries. The proposed method is tested on Poisson and Laplace problems with sharp interfaces and complex geometries. Results show that, compared with several other PINN-based formulations, INI-VPINN consistently achieves higher accuracy and smoother, faster convergence. The proposed framework provides a general approach for solving multi-material problems with complex geometries and mixed Neumann-Dirichlet boundary conditions using neural networks. The implementation is publicly available in a GitHub repository.
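
The textbook weak form below makes "implicit Neumann and interface handling" concrete. It is standard Galerkin reasoning, not the exact INI-VPINN loss, and rests on these assumptions:

- The problem is $-\nabla\cdot(k\nabla u) = f$ on $\Omega = \Omega_1 \cup \Omega_2$.
- The conductivity $k_i$ jumps across the material interface $\Gamma_I$.
- Dirichlet data is given on $\Gamma_D$, and the flux $k\,\partial_n u = g$ is prescribed on $\Gamma_N$.
- The test function $v$ is compactly supported and vanishes on $\Gamma_D$.

Integrating by parts on each subdomain and summing, with $n$ the interface normal pointing from $\Omega_1$ into $\Omega_2$, gives:

```latex
\sum_{i=1}^{2}\int_{\Omega_i} k_i\,\nabla u\cdot\nabla v\,\mathrm{d}x
  = \int_{\Omega} f\,v\,\mathrm{d}x
  + \int_{\Gamma_N} g\,v\,\mathrm{d}s
  + \int_{\Gamma_I} \big(k_1\,\partial_{n}u\big|_{1} - k_2\,\partial_{n}u\big|_{2}\big)\,v\,\mathrm{d}s
```

Suppose the network only has to satisfy the first three terms for every admissible $v$, with no interface integral. Then flux continuity $k_1\,\partial_n u|_1 = k_2\,\partial_n u|_2$ holds weakly, and the Neumann data enters only through $\int_{\Gamma_N} g\,v$. No separate interface or Neumann penalty is needed.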

2606.18175 2026-06-17 math.NA cs.LG cs.NA physics.comp-ph 交叉投稿

A Convex Quasilinearization Method for Solving Nonlinear PDEs with Physics-Informed Neural Networks

一种基于凸拟线性化的物理信息神经网络求解非线性偏微分方程的方法

Gbenga T. Awojinrin, Abdul-Akeem Olawoyin, Rami M. Younis

发表机构 * Texas A&M University, College Station, Texas, U.S.A.(德克萨斯农工大学)

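AI summary: Proposes LiL-Q, which uses Bellman-Kalaba quasilinearization to turn a nonlinear PDE into a sequence of linear subproblems. Each subproblem is solved by convex least squares over a Linear-in-Learnables (LiL) trial space instead of by nonconvex gradient training. Local Newton-Kantorovich convergence is proved, and high accuracy is reached in a few outer iterations on several benchmarks.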

Comments Preprint. 56 pages, 18 figures. Code: https://github.com/awojinrin/lilq-pinn

AI中文摘要

我们提出了一种数值方法,用于求解非线性偏微分方程(PDE)的正向问题。该方法中,Bellman-Kalaba拟线性化将非线性问题简化为一系列线性子问题,每个子问题通过配置法离散到参数线性输入的试验空间上,并通过单次直接线性最小二乘QR分解求解。该试验空间称为线性可学习(LiL),包含其可训练参数线性进入的表示,包括随机特征极限学习机、谱多项式基和三角展开,每个都作为物理信息神经网络实现。因此,该方法用凸的每步求解替代了限制标准PINN的非凸梯度训练。我们建立了外迭代在显式小条件下局部牛顿-康托罗维奇收敛到残差受限邻域,极限精度由试验空间的最佳逼近残差决定,而非优化容差。该方法记为LiL-Q,在七个基准上进行了评估,涵盖标量非线性PDE(Bratu、粘性Burgers、Buckley-Leverett)、耦合系统(平面应变弹性和二维及三维不可压缩Navier-Stokes方程)以及具有非均匀渗透率的稳态达西流。在这些问题中,LiL-Q在大多数情况下以个位数外迭代收敛,即使在最粗的基尺寸下且与参数数量无关。当精确解位于试验空间的张成空间中时,该方法在单次求解中恢复至机器精度。在Navier-Stokes基准上,它匹配或超过已发表的PINN求解器,可训练参数少两个数量级,且无需梯度优化。

英文摘要

We present a numerical method for the forward solution of nonlinear partial differential equations (PDEs) in which Bellman-Kalaba quasilinearization reduces the nonlinear problem to a sequence of linear subproblems, each discretized by collocation onto a trial space that is linear in its parameters and solved by a single direct linear least-squares QR factorization. The trial space, which we term Linear-in-Learnables (LiL), comprises representations whose trainable parameters enter linearly, including random-feature extreme learning machines, spectral polynomial bases, and trigonometric expansions, each implemented as a physics-informed neural network. The method thus replaces the nonconvex gradient-based training that limits standard PINNs with a convex per-step solve. We establish local Newton-Kantorovich convergence of the outer iteration to a residual-limited neighborhood under an explicit smallness condition, with the limiting accuracy governed by the best-approximation residual of the trial space rather than by an optimization tolerance. The method, denoted LiL-Q, is assessed on seven benchmarks spanning scalar nonlinear PDEs (Bratu, viscous Burgers, Buckley-Leverett), coupled systems (plane-strain elasticity and the incompressible Navier-Stokes equations in two and three spatial dimensions), and steady-state Darcy flow with heterogeneous permeability. Across these problems, LiL-Q converges in single-digit outer iterations in most cases, even at the coarsest basis sizes and independent of the parameter count. When the exact solution lies in the span of the trial space, the method recovers it to machine precision in a single solve. On the Navier-Stokes benchmarks, it matches or exceeds published PINN solvers with up to two orders of magnitude fewer trainable parameters, without gradient-based optimization.
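
The sketch below runs the outer loop on the 1D Bratu problem $u'' + \lambda e^{u} = 0$, $u(0)=u(1)=0$, one of the benchmarks named above. Several choices are assumptions, not the paper's setup:

- a Chebyshev polynomial trial space (one of the spectral bases a LiL space may use);
- Chebyshev-Gauss collocation points;
- $\lambda = 1$;
- a zero initial guess.

For this equation, quasilinearization coincides with a Newton step. Each iteration solves the linear problem $u_{k+1}'' + \lambda e^{u_k} u_{k+1} = \lambda e^{u_k}(u_k - 1)$ with one QR least-squares solve and no gradient-based training.

```python
import numpy as np
from numpy.polynomial import chebyshev as C


def lilq_bratu(lam=1.0, n_basis=16, n_colloc=30, max_iter=20, tol=1e-10):
    """Quasilinearization for u'' + lam * exp(u) = 0 on (0, 1), u(0) = u(1) = 0.

    Trial space: u(x) = sum_j c_j T_j(2x - 1), which is linear in the coefficients c.
    """
    # Interior collocation points: Chebyshev-Gauss nodes in t = 2x - 1
    k = np.arange(n_colloc)
    t = np.cos((2 * k + 1) * np.pi / (2 * n_colloc))
    eye = np.eye(n_basis)
    # Basis values and second x-derivatives (d/dx = 2 d/dt, so d2/dx2 = 4 d2/dt2)
    V = np.stack([C.chebval(t, eye[j]) for j in range(n_basis)], axis=1)
    D2 = np.stack([4.0 * C.chebval(t, C.chebder(eye[j], 2)) for j in range(n_basis)], axis=1)
    # Dirichlet rows: u(0) = 0 at t = -1 and u(1) = 0 at t = +1
    B = np.stack([C.chebval(np.array([-1.0, 1.0]), eye[j]) for j in range(n_basis)], axis=1)

    c = np.zeros(n_basis)  # initial guess u_0 = 0
    for it in range(1, max_iter + 1):
        u_k = V @ c
        w = lam * np.exp(u_k)
        # Linearized subproblem: u'' + w * u = w * (u_k - 1), plus boundary rows
        A = np.vstack([D2 + w[:, None] * V, B])
        b = np.concatenate([w * (u_k - 1.0), np.zeros(2)])
        Q, R = np.linalg.qr(A)  # reduced QR of the tall collocation matrix
        c_new = np.linalg.solve(R, Q.T @ b)  # convex least-squares solve, no gradients
        step = np.max(np.abs(c_new - c))
        c = c_new
        if step < tol:
            break
    u_mid = float(C.chebval(0.0, c))  # x = 0.5 corresponds to t = 0
    return u_mid, it
```

With the defaults, the midpoint value should approach the analytic lower-branch solution $u(1/2) = 2\ln\cosh(\theta/4) \approx 0.1405$, where $\theta = \sqrt{2\lambda}\cosh(\theta/4)$. It should get there within a handful of outer iterations.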

2606.18231 2026-06-17 cs.CV cs.LG cs.RO 交叉投稿

Adaptive Volumetric Mechanical Property Fields Invariant to Resolution

自适应体积力学属性场:分辨率无关

Rishit Dagli, Donglai Xiang, Vismay Modi, Xuning Yang, Gavriel State, David I. W. Levin, Maria Shugrina

发表机构 * NVIDIA(英伟达)

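AI summary: Proposes AdaVoMP, which uses a sparse adaptive voxel structure and an autoregressive sparse transformer encoder-decoder to predict high-resolution, spatially varying Young's modulus, Poisson's ratio, and density for 3D objects. Its resolution is 16^3x higher than prior art, and its predictions are more accurate.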

Comments Project Page and hi-res paper: https://research.nvidia.com/labs/sil/projects/adavomp/. ICML 2026

AI中文摘要

精确的力学属性(或材料)杨氏模量($E$)、泊松比($\nu$)和密度($\rho$)对于数字世界的可靠物理模拟至关重要,但大多数3D资产缺乏这些信息。我们提出AdaVoMP,一种预测输入3D物体跨表示形式的精确密集空间变化($E$,$\nu$,$\rho$)的方法,在分辨率、准确性和内存效率上优于现有技术。我们技术的基础是一种稀疏自适应体素结构SAV,它能高效地表示输入3D形状和材料场输出。我们将最准确的先前方法VoMP的固定体素模型替换为一种新颖的稀疏Transformer编码器-解码器模型,该模型学习为每个输入形状自回归地生成唯一的SAV来表示其材料,实现比先前技术高$16^3$倍的分辨率。实验表明,即使测试时计算量少于所有先前技术,AdaVoMP也能估计出更准确的体积属性。这使得我们能够将高分辨率复杂3D物体转换为可模拟的资产,从而实现逼真的可变形模拟。

英文摘要

Accurate mechanical properties (or materials) Young's modulus ($E$), Poisson's ratio ($ν$) and density ($ρ$) are essential for reliable physics simulation of digital worlds, but most 3D assets lack this information. We propose AdaVoMP, a method for predicting accurate dense spatially-varying ($E$, $ν$, $ρ$) for input 3D objects across representations, improving the resolution, accuracy, and memory efficiency over the state-of-the-art. The foundation of our technique is a sparse and adaptive voxel structure SAV that efficiently represents both the input 3D shape and the material field output. We replace the fixed-voxel model of the most accurate prior method, VoMP, with a novel sparse transformer encoder-decoder model that learns to generate a unique SAV autoregressively for every input shape to represent its materials, achieving a resolution $16^3\times$ higher than prior art. Experiments show that AdaVoMP estimates more accurate volumetric properties, even with less test-time compute than any prior art. This allows us to convert high-resolution complex 3D objects into simulation-ready assets, resulting in realistic deformable simulations.
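
The generic octree-style sketch below makes the idea of a sparse, adaptive voxel structure concrete: cubes are subdivided only where a material field, such as Young's modulus, changes. It assumes a callable field and a corner-plus-center variation test. It is not AdaVoMP's learned, autoregressively generated SAV, and `build_adaptive_voxels` is a hypothetical name. For scale, 4 levels of octree subdivision correspond to a grid 16x finer per axis, i.e. $16^3 = 4096$ cells in the dense equivalent.

```python
import numpy as np

# Offsets of the 8 corners (and of the 8 child cubes) of a unit cube
CORNERS = np.array([[i, j, k] for i in (0, 1) for j in (0, 1) for k in (0, 1)], dtype=float)


def build_adaptive_voxels(field, lo, size, max_depth, tol, depth=0):
    """Return leaves (origin, edge length, mean value) of an adaptive octree.

    A cube is split into 8 children only where the sampled field varies by more
    than `tol`, so homogeneous material regions stay coarse.
    """
    points = np.vstack([lo + CORNERS * size, lo + 0.5 * size])  # 8 corners + center
    values = np.array([field(p) for p in points])
    if depth == max_depth or values.max() - values.min() <= tol:
        return [(tuple(lo), size, float(values.mean()))]
    half = size / 2
    leaves = []
    for offset in CORNERS:
        leaves += build_adaptive_voxels(field, lo + offset * half, half, max_depth, tol, depth + 1)
    return leaves
```

Take a two-material example in which the modulus jumps at x = 0.3. Only cells straddling that interface are refined, so the leaf count stays well below the $8^4$ cells of the dense grid of the same finest resolution.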

2505.17740 2026-06-17 cs.LG cs.NE physics.comp-ph 版本更新

A tensor network approach for chaotic time series prediction

一种用于混沌时间序列预测的张量网络方法

Rodrigo Martínez-Peña, Román Orús

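AI summary: Applies a previously proposed tensor-network model to chaotic time series prediction. By decomposing high-dimensional arrays, the model reduces parameter complexity and outperforms conventional echo state networks in accuracy and computational efficiency.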

Comments 15 pages, 4 figures. Comments are welcome!

AI中文摘要

对混沌时间序列进行准确预测是一个复杂的挑战。储层计算是一种受神经形态启发的方法,已成为这项任务的强大工具。它利用动力系统的记忆和非线性,无需大量参数调整。然而,选择和优化储层架构仍然是一个开放问题。下一代储层计算通过采用基于截断Volterra级数的非线性向量自回归简化了该问题,从而降低了超参数复杂度。但后者在最大单项式次数方面存在指数级参数增长。张量网络通过将多维数组分解为低维结构,为解决该问题提供了有前景的方案,从而缓解了维度灾难。本文探索了先前提出的张量网络模型在混沌时间序列预测中的应用,展示了其在精度和计算效率方面相比传统回声状态网络的优势。使用最先进的张量网络方法,我们能够弥合张量网络与储层计算社区之间的差距,促进两个领域的进步。

英文摘要

Making accurate predictions of chaotic time series is a complex challenge. Reservoir computing, a neuromorphic-inspired approach, has emerged as a powerful tool for this task. It exploits the memory and nonlinearity of dynamical systems without requiring extensive parameter tuning. However, selecting and optimizing reservoir architectures remains an open problem. Next-generation reservoir computing simplifies this problem by employing nonlinear vector autoregression based on truncated Volterra series, thereby reducing hyperparameter complexity. Nevertheless, the latter suffers from exponential parameter growth in terms of the maximum monomial degree. Tensor networks offer a promising solution to this issue by decomposing multidimensional arrays into low-dimensional structures, thus mitigating the curse of dimensionality. This paper explores the application of a previously proposed tensor network model for predicting chaotic time series, demonstrating its advantages in terms of accuracy and computational efficiency compared to conventional echo state networks. Using a state-of-the-art tensor network approach enables us to bridge the gap between the tensor network and reservoir computing communities, fostering advances in both fields.
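
The parameter-growth argument can be checked by counting. The sketch assumes an NG-RC feature vector made of a constant, the $D = d\,k$ time-delayed inputs, and all unique monomials up to `max_degree`. It compares a dense order-$p$ weight tensor with a generic tensor-train (MPS) factorization of bond dimension $r$. It is not the specific tensor network model used in the paper.

```python
from math import comb


def nvar_feature_count(d, k, max_degree):
    """NG-RC / NVAR feature count: unique monomials of degree 0..max_degree
    in the D = d * k delayed inputs (the degree-0 term is the constant)."""
    D = d * k
    return sum(comb(D + q - 1, q) for q in range(max_degree + 1))


def full_tensor_params(n, p):
    """Entries of a dense order-p weight tensor with n features per mode."""
    return n ** p


def tensor_train_params(n, p, r):
    """Parameters of the same tensor in tensor-train (MPS) form with bond dimension r."""
    ranks = [1] + [r] * (p - 1) + [1]
    return sum(ranks[i] * n * ranks[i + 1] for i in range(p))
```

Consider a 3-variable system with 2 delays. It has 28 features at degree 2 but 210 at degree 4. An order-4 tensor with 7 features per mode has 2401 dense entries, versus 168 parameters in tensor-train form with $r = 3$.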

2512.13853 2026-06-17 cs.LG cond-mat.stat-mech math.PR stat.ML 版本更新

Dropout Neural Network Training Viewed from a Percolation Perspective

从逾渗视角看待Dropout神经网络训练

Finley Devlin, Jaron Sanders

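AI summary: Studies percolation in training deep neural networks with dropout. Introduces new percolation models that relate network topology to the input-output path problem, shows that dropout has a percolative effect, and shows how this effect can make training break down.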

Comments 21 pages, 14 figures

AI中文摘要

在这项工作中,我们研究了使用dropout训练深度神经网络(NNs)时逾渗的存在和影响。Dropout方法是训练NNs的正则化技术,由G. Hinton等人(2012)首次提出。这些方法在训练的每个阶段随机临时移除NN中的连接,并用随机梯度下降(SGD)更新剩余子网络。随机从网络中移除连接的过程类似于逾渗,这是统计物理的一个范式模型。如果dropout移除足够多的连接,使得NN的输入和输出之间没有路径,那么NN就无法根据数据做出预测。我们研究了模拟NN中dropout的新逾渗模型,并刻画了网络拓扑与该路径问题之间的关系。该理论证明了dropout中存在逾渗效应。我们还表明,在使用dropout训练无偏置NN时,这种逾渗效应可能导致训练崩溃;并且我们启发式地论证了这种崩溃也扩展到有偏置的NN。

英文摘要

In this work, we investigate the existence and effect of percolation in training deep Neural Networks (NNs) with dropout. Dropout methods are regularisation techniques for training NNs, first introduced by G. Hinton et al. (2012). These methods temporarily remove connections in the NN, randomly at each stage of training, and update the remaining subnetwork with Stochastic Gradient Descent (SGD). The process of removing connections from a network at random is similar to percolation, a paradigm model of statistical physics. If dropout were to remove enough connections such that there is no path between the input and output of the NN, then the NN could not make predictions informed by the data. We study new percolation models that mimic dropout in NNs and characterise the relationship between network topology and this path problem. The theory shows the existence of a percolative effect in dropout. We also show that this percolative effect can cause a breakdown when training NNs without biases with dropout; and we argue heuristically that this breakdown extends to NNs with biases.
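
The toy Monte Carlo sketch below illustrates the path problem described above. It estimates how often at least one input-to-output path survives. It assumes a fully connected layered network in which every connection is dropped independently with probability `p_drop`, i.e. bond percolation. Standard dropout removes units rather than individual connections, and the paper's percolation models may differ.

```python
import random


def path_survives(widths, p_drop, rng):
    """True if at least one input->output path survives i.i.d. connection dropout."""
    reachable = set(range(widths[0]))  # every input neuron starts active
    for w_out in widths[1:]:
        nxt = set()
        for j in range(w_out):
            # j is reachable if any kept edge (i, j) comes from a reachable i.
            # Each edge is sampled at most once, so stopping early is unbiased.
            for _ in reachable:
                if rng.random() >= p_drop:  # edge kept with probability 1 - p_drop
                    nxt.add(j)
                    break
        reachable = nxt
        if not reachable:
            return False
    return True


def survival_probability(widths, p_drop, trials=2000, seed=0):
    """Monte Carlo estimate of the probability that some input-output path survives."""
    rng = random.Random(seed)
    return sum(path_survives(widths, p_drop, rng) for _ in range(trials)) / trials
```

At a high drop rate, a deep narrow network loses every input-output path far more often than a shallow one. This is the kind of disconnection that prevents data-informed predictions.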

2603.22372 2026-06-17 cs.LG cs.AI 版本更新

Rethinking Multimodal Fusion for Time Series: Text Modalities Need Constrained Fusion

重新思考时间序列的多模态融合:文本模态需要受约束的融合

Seunghan Lee, Jun Seo, Jaehoon Lee, Sungdong Yoo, Minjae Kim, Tae Yoon Lim, Dongwan Kang, Hwanil Choi, SoonYoung Lee, Wonbin Ahn

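AI summary: Addresses the weak performance of naive fusion in multimodal time-series forecasting. Proposes constrained fusion methods and the Controlled Fusion Adapter (CFA), which uses low-rank adapters to filter out irrelevant textual information. Effectiveness is validated across many datasets and models.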

Comments KDD Workshop on Mining and Learning from Time Series 2026

AI中文摘要

多模态学习的最新进展推动了将文本或视觉等辅助模态集成到时间序列(TS)预测中。然而,现有方法大多增益有限,通常仅在特定数据集上提升性能,或依赖限制泛化能力的架构特定设计。在本文中,我们表明采用朴素融合策略(例如简单加法或拼接)的多模态模型通常表现不如单模态TS模型,我们将其归因于辅助模态的未受控集成可能引入无关信息。受此观察启发,我们探索了各种旨在控制这种集成的受约束融合方法,并发现它们始终优于朴素融合方法。此外,我们提出了受控融合适配器(CFA),一种简单的即插即用方法,无需修改TS主干即可实现受控的跨模态交互,仅集成与TS动态对齐的相关文本信息。CFA采用低秩适配器在将文本信息融合到时间表示之前过滤无关文本信息。我们在各种数据集和TS/文本模型上进行了超过20K次实验,证明了受约束融合方法的有效性。代码见:https://github.com/seunghan96/cfa。

英文摘要

Recent advances in multimodal learning have motivated the integration of auxiliary modalities such as text or vision into time series (TS) forecasting. However, most existing methods provide limited gains, often improving performance only on specific datasets or relying on architecture-specific designs that limit generalization. In this paper, we show that multimodal models with naive fusion strategies (e.g., simple addition or concatenation) often underperform unimodal TS models. We attribute this to the uncontrolled integration of auxiliary modalities, which may introduce irrelevant information. Motivated by this observation, we explore various constrained fusion methods designed to control such integration and find that they consistently outperform naive fusion methods. Furthermore, we propose Controlled Fusion Adapter (CFA), a simple plug-in method that enables controlled cross-modal interactions without modifying the TS backbone, integrating only relevant textual information aligned with TS dynamics. CFA employs low-rank adapters to filter irrelevant textual information before fusing it into temporal representations. We conduct over 20K experiments across various datasets and TS/text models, demonstrating the effectiveness of the constrained fusion methods. Code is available at: https://github.com/seunghan96/cfa.
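
The hypothetical sketch below illustrates the plug-in idea, assuming a LoRA-style adapter. Text features pass through a rank-$r$ bottleneck and are added residually to the time-series representation. The up-projection is zero-initialized, so at the start the untouched TS backbone's output is reproduced exactly. The class name, shapes, and residual-addition form are assumptions; the paper's CFA may fuse and filter differently.

```python
import numpy as np


class LowRankTextAdapter:
    """Hypothetical LoRA-style adapter injecting text features into a frozen
    time-series representation through a rank-r bottleneck."""

    def __init__(self, d_text, d_ts, rank, seed=0):
        rng = np.random.default_rng(seed)
        # Down-projection A: d_text -> rank (random init, as in LoRA)
        self.A = rng.normal(scale=1.0 / np.sqrt(d_text), size=(rank, d_text))
        # Up-projection B: rank -> d_ts, zero init so the adapter starts as a no-op
        self.B = np.zeros((d_ts, rank))

    def __call__(self, h_ts, h_text):
        # Residual fusion: the TS backbone output h_ts is never modified in place
        return h_ts + (h_text @ self.A.T) @ self.B.T
```

The bottleneck caps how much text information can reach the temporal features: whatever is learned, the added term has rank at most `rank`.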

2604.03444 2026-06-17 cs.LG cs.CL 版本更新

Olmo Hybrid: From Theory to Practice and Back

Olmo Hybrid:从理论到实践再回到理论

William Merrill, Yanhong Li, Tyler Romero, Anej Svete, Caia Costello, Pradeep Dasigi, Dirk Groeneveld, David Heineman, Bailey Kuehl, Nathan Lambert, Chuan Li, Kyle Lo, Saumya Malik, DJ Matusz, Benjamin Minixhofer, Jacob Morrison, Luca Soldaini, Finbarr Timbers, Pete Walsh, Noah A. Smith, Hannaneh Hajishirzi, Ashish Sabharwal

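AI summary: Through theory and experiments, shows that hybrid models combining attention with linear RNNs are more expressive and scale more efficiently than pure transformers. Trains the 7B-parameter Olmo Hybrid, which outperforms Olmo 3 on standard evaluations.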

Comments Corrected author list and typos in appendix

AI中文摘要

近期工作展示了非Transformer语言模型(尤其是线性递归神经网络(RNN)和混合注意力与递归的混合模型)的潜力。然而,对于这些新架构的潜在优势是否值得承担规模化扩展的风险和努力,尚无共识。为解决此问题,我们从多个方面提供混合模型优于纯Transformer的证据。首先,理论上,我们证明混合模型不仅继承了Transformer和线性RNN的表达能力,还能表达超出两者的任务,例如代码执行。将这一理论付诸实践,我们训练了Olmo Hybrid,一个70亿参数模型,与Olmo 3 7B基本相当,但将滑动窗口层替换为Gated DeltaNet层。我们表明,在标准预训练和中期训练评估中,Olmo Hybrid优于Olmo 3,证明了混合模型在受控大规模设置下的优势。我们发现混合模型的扩展效率显著高于Transformer,这解释了其更高的性能。然而,尚不清楚为何特定形式问题上的更高表达能力会导致更好的扩展性或在下游任务(与这些问题无关)上表现更优。为解释这一明显差距,我们回到理论,论证为何增强的表达能力应转化为更好的扩展效率,从而完成循环。总体而言,我们的结果表明,混合注意力和递归层的混合模型是语言建模范式的强大扩展:不仅用于减少推理时的内存,更是获得在预训练中更好扩展的更具表达能力模型的基本途径。

英文摘要

Recent work has demonstrated the potential of non-transformer language models, especially linear recurrent neural networks (RNNs) and hybrid models that mix recurrence and attention. Yet there is no consensus on whether the potential benefits of these new architectures justify the risk and effort of scaling them up. To address this, we provide evidence for the advantages of hybrid models over pure transformers on several fronts. First, theoretically, we show that hybrid models do not merely inherit the expressivity of transformers and linear RNNs, but can express tasks beyond both, such as code execution. Putting this theory to practice, we train Olmo Hybrid, a 7B-parameter model largely comparable to Olmo 3 7B but with the sliding window layers replaced by Gated DeltaNet layers. We show that Olmo Hybrid outperforms Olmo 3 across standard pretraining and mid-training evaluations, demonstrating the benefit of hybrid models in a controlled, large-scale setting. We find that the hybrid model scales significantly more efficiently than the transformer, explaining its higher performance. However, it is unclear why greater expressivity on specific formal problems should result in better scaling or superior performance on downstream tasks unrelated to those problems. To explain this apparent gap, we return to theory and argue why increased expressivity should translate to better scaling efficiency, completing the loop. Overall, our results suggest that hybrid models mixing attention and recurrent layers are a powerful extension to the language modeling paradigm: not merely to reduce memory during inference, but as a fundamental way to obtain more expressive models that scale better during pretraining.
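
For readers unfamiliar with Gated DeltaNet, here is a naive sequential sketch of the gated delta rule as described in the Gated DeltaNet literature. The state is updated as $S_t = \alpha_t S_{t-1}(I - \beta_t k_t k_t^\top) + \beta_t v_t k_t^\top$: it decays by the gate $\alpha_t$ and overwrites the value stored under key $k_t$. Outputs are read as $o_t = S_t q_t$. Production kernels, including whatever Olmo Hybrid uses, are chunked and parallel. The function names and shapes here are assumptions for illustration.

```python
import numpy as np


def gated_delta_step(S, k, v, alpha, beta):
    """One gated delta-rule update: S <- alpha * S (I - beta k k^T) + beta v k^T."""
    d_k = k.shape[0]
    return alpha * S @ (np.eye(d_k) - beta * np.outer(k, k)) + beta * np.outer(v, k)


def gated_delta_scan(q, k, v, alpha, beta):
    """Sequential recurrence over T steps; q, k: (T, d_k), v: (T, d_v), alpha, beta: (T,)."""
    T, d_k = k.shape
    d_v = v.shape[1]
    S = np.zeros((d_v, d_k))  # matrix-valued memory state
    outputs = np.empty((T, d_v))
    for t in range(T):
        S = gated_delta_step(S, k[t], v[t], alpha[t], beta[t])
        outputs[t] = S @ q[t]  # read out with the query
    return outputs
```

Take $\alpha_t = \beta_t = 1$ and a unit-norm key. Writing a second value under the same key replaces the first. Plain linear attention, with the additive update $S_t = S_{t-1} + v_t k_t^\top$, would return their sum instead.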

2606.03089 2026-06-17 cs.LG cs.AI 版本更新

Constitutional On-Policy Safe Distillation

宪法性在策略安全蒸馏

Ming Wen, Yuxuan Liu, Kun Yang, Yunhao Feng, Zhuoer Xu, Yuhao Sun, Shiwen Cui, Xiang Zheng, Guoyu Wang, Xingjun Ma, Yu-Gang Jiang

发表机构 * Institute of Trustworthy Embodied AI(可信具身人工智能研究院) Fudan University(复旦大学) Shanghai Innovation Institute(上海创新研究院) Ant Group(蚂蚁集团) Zhejiang University(浙江大学) City University of Hong Kong(香港城市大学)

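AI summary: In on-policy self-distillation for safety alignment, constitutional conditioning contracts the teacher distribution and reduces expressiveness. Proposes COPSD, which first calibrates the teacher with a Cross-SFT cold start and then performs constitution-conditioned on-policy distillation. On 12 benchmarks it achieves a better safety-helpfulness trade-off and a lower safety tax.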

AI中文摘要

在策略自蒸馏(OPSD)通过使用基于特权信息条件的教师提供密集的令牌级监督,已成为一种高效的后训练范式。先前工作表明,OPSD在可验证推理任务中可能崩溃,但安全对齐不同,它由高层宪法而非显式目标答案指导,因此是重新审视密集蒸馏的自然场景。然而,我们的初步研究表明,安全OPSD仍然遭受严重崩溃:宪法条件将教师分布收缩为短且过于保守的响应,而反向KL进一步将这种收缩放大为表达能力下降。我们将此效应形式化为非正交语义空间中安全边界下的几何泄漏,其中安全压力转移到表达能力维度。基于此分析,我们提出宪法性在策略安全蒸馏(COPSD),首先通过交叉SFT冷启动校准教师,然后执行宪法条件在策略蒸馏。在12个基准上的实验表明,COPSD比基线实现了持续更强的安全-有用性权衡,同时大幅降低了对通用推理能力的安全税。

英文摘要

On-policy self-distillation (OPSD) has emerged as an efficient post-training paradigm by using a teacher conditioned on privileged information to provide dense token-level supervision. Prior work has shown that OPSD can collapse in verifiable reasoning tasks, but safety alignment differs in that it is guided by high-level constitutions rather than explicit target answers, making it a natural setting to revisit dense distillation. However, our pilot study shows that safety OPSD still suffers from severe collapse: constitutional conditioning contracts the teacher distribution toward short and overly conservative responses, and Reverse KL further amplifies this contraction into reduced expressiveness. We formalize this effect as geometric leakage under safety boundaries in a non-orthogonal semantic space, where safety pressure transfers into the expressiveness dimension. Based on this analysis, we propose Constitutional On-Policy Safe Distillation (COPSD), which first calibrates the teacher through a Cross-SFT cold-start and then performs constitution-conditioned on-policy distillation. Experiments on 12 benchmarks show that COPSD achieves a consistently stronger safety-helpfulness trade-off than baselines while substantially reducing the safety tax on general reasoning ability.
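
The abstract attributes part of the collapse to reverse KL, and a small numeric sketch shows why. Per-token reverse KL, $\mathrm{KL}(\pi_{\text{student}} \,\|\, \pi_{\text{teacher}})$, heavily penalizes student mass placed where the teacher has little. It therefore favors collapsing onto one teacher mode over spreading out, a property known as mode-seeking. The logits in the example are made-up toy values, not data from the paper.

```python
import numpy as np


def log_softmax(logits):
    """Numerically stable log-softmax over the last (vocabulary) axis."""
    z = logits - logits.max(axis=-1, keepdims=True)
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))


def reverse_kl_per_token(student_logits, teacher_logits):
    """KL(student || teacher) at each position: (T, V) logits -> (T,) losses."""
    log_p = log_softmax(student_logits)
    log_q = log_softmax(teacher_logits)
    p = np.exp(log_p)
    return (p * (log_p - log_q)).sum(axis=-1)
```

Take a teacher that splits its mass between two tokens. A student that collapses onto one of those tokens pays about $\ln 2$. A student that spreads mass uniformly, including onto a token the teacher nearly rules out, pays much more.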

2606.14668 2026-06-17 cs.LG 版本更新

When to Write and When to Suppress: Route-Specialized Dual Adapters for Memory-Assisted Knowledge Editing

何时写入与何时抑制:面向记忆辅助知识编辑的路径专用双适配器

Baijia Zhang, Yining Huang

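AI summary: Proposes a route-specialized dual-adapter editor in which a relevance router decides whether to apply the edit memory, with separately trained edit and locality adapters. It achieves the best probability-preference accuracy on three benchmarks.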

AI中文摘要

知识编辑系统必须更新选定的事实,同时保持邻近但无关的行为不变。本文在记忆辅助设置中研究该问题,其中在推理时检索编辑记忆,参数高效适配器校正模型的对象偏好。我们认为核心设计问题不仅是如何写入编辑,还包括何时抑制它。我们引入一种路径专用双适配器编辑器。相关性路由器首先决定提示是否应接收编辑记忆。被路由的提示使用训练为偏好新对象而非原始对象的编辑适配器;未被路由的非直接提示使用单独的局部性适配器,该适配器训练为保留或恢复原始对象偏好。我们在三个1,000案例协议CounterFact、zsRE和MQuAKE上,在相同记忆协议和两个7B/8B基础模型下评估该编辑器。在Llama-3.1-8B-Instruct上,该编辑器在所有三个基准上获得最佳总体概率偏好准确率:CounterFact为0.8180,zsRE为0.8946,MQuAKE为0.9922。在Qwen3-8B上趋势相同。路由器消融实验表明,相关记忆边界因数据集而异:在CounterFact上,词汇神经路由器最安全;而在zsRE和MQuAKE上,BGE嵌入路由效果更好。组件和模块消融实验表明,增益主要来自将编辑注入与离路抑制分离,而非单纯增加LoRA容量。

英文摘要

Knowledge editing systems must update selected facts while preserving nearby but unrelated behavior. This paper studies this problem in a memory-assisted setting where an edit memory is retrieved at inference time and a parameter-efficient adapter corrects the model's object preference. We argue that the central design question is not only how to write an edit, but also when to suppress it. We introduce a route-specialized dual-adapter editor. A relevance router first decides whether a prompt should receive an edit memory. Routed prompts use an edit adapter trained to prefer the new object over the original object; unrouted non-direct prompts use a separate locality adapter trained to preserve or restore the original-object preference. We evaluate the editor on three 1,000-case protocols, CounterFact, zsRE, and MQuAKE, under the same memory protocol and two 7B/8B base models. On Llama-3.1-8B-Instruct, the editor obtains the best overall probability-preference accuracy on all three benchmarks: 0.8180 on CounterFact, 0.8946 on zsRE, and 0.9922 on MQuAKE. The same trend holds on Qwen3-8B. Router ablations show that the relevant memory boundary differs across datasets: a lexical neural router is safest on CounterFact, while BGE embedding routing is better on zsRE and MQuAKE. Component and module ablations show that the gain mainly comes from separating edit injection from off-route suppression rather than from simply increasing LoRA capacity.
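
The hypothetical sketch below shows only the control flow of the routing decision described above. Routed prompts go to the edit adapter together with the retrieved memory, and unrouted prompts go to the locality adapter. A simple token-overlap score stands in for the paper's lexical neural or BGE embedding routers. The threshold, `EditMemory`, and the adapter names are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class EditMemory:
    subject: str     # e.g. "Eiffel Tower"
    new_object: str  # edited target object, e.g. "Rome"


def overlap_score(prompt, memory):
    """Fraction of the memory's subject tokens that appear in the prompt."""
    prompt_tokens = {tok.strip("?.,!").lower() for tok in prompt.split()}
    subject_tokens = {tok.lower() for tok in memory.subject.split()}
    return len(prompt_tokens & subject_tokens) / len(subject_tokens) if subject_tokens else 0.0


def route(prompt, memories, threshold=0.5):
    """Return (adapter_name, memory_or_None) for a prompt."""
    best = max(memories, key=lambda m: overlap_score(prompt, m), default=None)
    if best is not None and overlap_score(prompt, best) >= threshold:
        return "edit_adapter", best  # inject the memory and prefer the new object
    return "locality_adapter", None  # preserve or restore the original-object preference
```

Swapping `overlap_score` for an embedding similarity changes where the memory boundary falls. The ablations above report that this boundary is dataset-dependent.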

2606.14990 2026-06-17 cs.LG cs.AI 版本更新

Rational Sparse Autoencoder

有理稀疏自编码器

Naiyu Yin, Yue Yu

发表机构 * Lehigh University(里海大学)

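AI summary: Proposes the Rational Sparse Autoencoder (RSAE), which replaces fixed encoder activations with trainable rational functions. A two-stage pipeline (initialization plus fine-tuning) improves reconstruction and downstream-behavior metrics across several language models and baseline activation families without sacrificing feature interpretability.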

Comments Accepted to the Mechanistic Interpretability Workshop at ICML 2026

AI中文摘要

稀疏自编码器(SAE)是机械可解释性的标准工具,但当前的SAE系列受限于固定的编码器非线性,如ReLU、JumpReLU和TopK。这会将特定的稀疏机制硬编码到模型中,并可能扭曲重构与稀疏性的权衡。我们引入了有理稀疏自编码器(RSAE),它将固定的编码器激活替换为可训练的有理函数。有理激活足够灵活,可以在紧致域上一致逼近现有SAE系列使用的激活原语(对于TopK,提供分离top-k阈值后获得的阈值门),同时提供更丰富的函数类以适应观察到的预激活几何形状。我们通过两阶段流程实现这一想法:初始化过程复制预训练的基线SAE权重,插入通过在合成数据上使用松弛Remez交换获得的有理系数,并随有理系数一起校准尺度参数;然后在标准稀疏正则化重构目标下进行微调步骤。实验上,在三个开源权重语言模型的残差流激活上,以及所有三个基线激活族中,RSAE在微调步骤后严格改进,无论是在重构侧指标还是在下游行为指标上,且不牺牲稀疏探测下的特征级可解释性。这些增益在宿主语言模型、基线激活族以及我们测试的完整基线稀疏范围内一致,而升级本身每个自编码器仅增加少量标量参数,并在单个消费级GPU上运行几分钟。

英文摘要

Sparse autoencoders (SAEs) are standard tools for mechanistic interpretability, but current SAE families are constrained by fixed encoder nonlinearities such as ReLU, JumpReLU, and TopK. This hard-codes a particular sparsity mechanism into the model and can distort the reconstruction-versus-sparsity trade-off. We introduce the Rational Sparse Autoencoder (RSAE), which replaces the fixed encoder activation with a trainable rational function. Rational activations are flexible enough to uniformly approximate the activation primitives used by existing SAE families on compact domains (for TopK, the thresholded gate obtained after a separating top-k threshold is supplied), while also providing a richer function class for adapting to the observed pre-activation geometry. We realise this idea through a two-stage pipeline: an initialisation procedure that copies the pre-trained baseline SAE weights, plugs in rational coefficients obtained by the relaxed Remez exchange on synthetic data, and calibrates the scale parameters along with the rational coefficients; followed by a fine-tuning step under the standard sparsity-regularised reconstruction objective. Empirically, on residual-stream activations of three open-weight language models and across all three baseline activation families, the RSAE strictly improves on its baseline SAE after the fine-tuning step, both on reconstruction-side metrics and on downstream-behaviour metrics, without sacrificing feature-level interpretability under sparse probing. These gains are consistent across host language models, across baseline activation families, and across the full range of baseline sparsity we tested, while the upgrade itself adds only a handful of scalar parameters per autoencoder and runs in minutes on a single consumer GPU.
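
A minimal sketch of a trainable rational activation inside an SAE encoder follows. It assumes the pole-free "safe" Padé-activation-unit form $r(x) = P(x) / \big(1 + |\sum_{k\ge1} b_k x^k|\big)$. The RSAE's exact parameterization, its Remez-exchange initialization, and its scale parameters are not reproduced here. `rsae_encode` is a hypothetical name.

```python
import numpy as np


def rational_activation(x, num, den):
    """Safe rational function P(x) / (1 + |Q(x)|).

    num[i] multiplies x**i in P; den[k-1] multiplies x**k in Q (no constant term).
    """
    x = np.asarray(x, dtype=float)
    p = np.polyval(list(num)[::-1], x)  # np.polyval expects highest degree first
    if len(den):
        q = np.polyval(list(den)[::-1], x) * x
    else:
        q = np.zeros_like(x)
    return p / (1.0 + np.abs(q))


def rsae_encode(x, W_enc, b_enc, num, den):
    """Hypothetical SAE encoder: affine map followed by the rational nonlinearity."""
    return rational_activation(x @ W_enc.T + b_enc, num, den)
```

Only `num` and `den` are new trainable scalars on top of the baseline SAE weights. With `num=[0, 1]` and `den=[]` the activation is the identity, and other coefficient choices bend it toward ReLU-like or gate-like shapes.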

2406.07435 2026-06-17 cs.CV cs.LG eess.IV 版本更新

Beware of Aliases -- Signal Preservation is Crucial for Robust Image Restoration

警惕混叠——信号保留对鲁棒图像复原至关重要

Shashank Agnihotri, Julia Grabinski, Janis Keuper, Margret Keuper

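AI summary: Addresses the poor robustness of image restoration networks caused by aliasing. Proposes BOA-Restormer, which performs downsampling and upsampling partly in the frequency domain to ensure alias-free paths, improving robustness at low cost.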

Comments Tags: Adversarial attack, image restoration, image deblurring, frequency sampling

AI中文摘要

图像复原网络通常由编码器和解码器组成,分别负责从噪声、失真数据中聚合图像内容并恢复干净、无失真的图像。数据聚合以及高分辨率图像生成通常都伴随着混叠的风险,即标准架构为了在验证数据上达到高PSNR值而牺牲了重建模型输入的能力。代价是模型鲁棒性低。在这项工作中,我们表明,在先进的复原变换器中简单地提供无混叠路径,可以在低复原性能成本下支持改进的模型鲁棒性。为此,我们提出了BOA-Restormer,一种基于变换器的图像复原模型,它在频域中部分执行下采样和上采样操作,以确保整个模型的无混叠路径,同时可能保留所有相关的高频信息。

英文摘要

Image restoration networks usually consist of an encoder and a decoder, responsible for aggregating image content from noisy, distorted data and for restoring clean, undistorted images, respectively. Data aggregation and high-resolution image generation both usually carry the risk of introducing aliases, i.e., standard architectures jeopardize their ability to reconstruct the model input in order to reach high PSNR values on validation data. The price to be paid is low model robustness. In this work, we show that simply providing alias-free paths in state-of-the-art reconstruction transformers supports improved model robustness at a small cost in restoration performance. We do so by proposing BOA-Restormer, a transformer-based image restoration model that executes downsampling and upsampling operations partly in the frequency domain to ensure alias-free paths along the entire model while potentially preserving all relevant high-frequency information.
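
The 1D sketch below contrasts strided and frequency-domain downsampling. It assumes an ideal low-pass filter (spectrum cropping) as the frequency-domain operation. BOA-Restormer's actual layers operate on 2D feature maps and may differ. Under striding, a cosine above the new Nyquist frequency aliases to a lower frequency. Cropping removes it, while a low-frequency cosine passes through unchanged.

```python
import numpy as np


def strided_downsample(x, factor=2):
    """Plain subsampling: frequencies above the new Nyquist limit fold back (alias)."""
    return x[::factor]


def fft_downsample(x, factor=2):
    """Alias-free downsampling of a 1D signal by cropping its spectrum (ideal low-pass)."""
    n = x.shape[-1]
    m = n // factor
    half = m // 2
    X = np.fft.fft(x)
    # Keep frequencies -half .. half-1; everything above the new Nyquist is discarded
    Y = np.concatenate([X[:half], X[n - half:]])
    # Rescale so amplitudes are preserved after the shorter inverse FFT
    return np.real(np.fft.ifft(Y)) * (m / n)
```

For 2D feature maps, the same cropping is applied along both spatial axes, for example with `np.fft.fft2`.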

2511.09204 2026-06-17 quant-ph cs.LG 版本更新

Resource-Efficient Variational Quantum Classifier

资源高效的变分量子分类器

Petr Ptáček, Paulina Lewandowska, Ryszard Kukulski

发表机构 * IT4Innovations, VSB - Technical University of Ostrava(IT4Innovations奥斯特拉瓦技术大学) Faculty of Electrical Engineering and Computer Science, VSB - Technical University of Ostrava(电气工程与计算机科学学院,奥斯特拉瓦技术大学)

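AI summary: Proposes an unambiguous quantum classifier based on Hamming-distance measurements with classical post-processing. It uses ansatz expressivity more effectively to improve classification, substantially reduces the number of circuit evaluations, and improves robustness to noise.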

Comments 13 pages, 7 figures, 1 table; current format of preprint template

AI中文摘要

我们引入了基于汉明距离测量与经典后处理的无歧义量子分类器。该方法通过更有效地利用ansatz的表达性来提升分类性能,同时显著减少电路评估次数。此外,该方法展现出对噪声的增强鲁棒性,这对近期的量子设备至关重要。我们在乳腺癌分类数据集上评估了所提出的方法。无歧义分类器实现了90%的平均准确率,相比基线提高了6.9个百分点,同时每次预测所需的电路执行次数减少了八倍。在存在噪声的情况下,改进幅度降至约3.1个百分点,执行成本降低相同。我们通过理论证据支持了该方法的实际性能,证实了我们的实验结果。

英文摘要

We introduce the unambiguous quantum classifier based on Hamming distance measurements combined with classical post-processing. The proposed approach improves classification performance through a more effective use of ansatz expressivity, while requiring significantly fewer circuit evaluations. Moreover, the method demonstrates enhanced robustness to noise, which is crucial for near-term quantum devices. We evaluate the proposed method on a breast cancer classification dataset. The unambiguous classifier achieves an average accuracy of 90%, corresponding to an improvement of 6.9 percentage points over the baseline, while requiring eight times fewer circuit executions per prediction. In the presence of noise, the improvement is reduced to approximately 3.1 percentage points, with the same reduction in execution cost. We substantiate our experimental results with theoretical evidence supporting the practical performance of the approach.

2601.18252 2026-06-17 cs.CV cs.AI cs.LG stat.ML 版本更新

Co-PLNet: A Collaborative Point-Line Network for Prompt-Guided Wireframe Parsing

Co-PLNet: 一种用于提示引导的线框解析的协作点线网络

Chao Wang, Xuanying Li, Cheng Dai, Jinglei Feng, Yuxiang Luo, Hao Qin, Yuqi Ouyang

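AI summary: Proposes Co-PLNet, a point-line collaborative framework that exchanges spatial cues through a Point-Line Prompt Encoder and enforces point-line consistency with a Cross-Guidance Line Decoder. It improves wireframe-parsing accuracy and robustness on the Wireframe and YorkUrban datasets.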

AI中文摘要

线框解析旨在恢复线段及其连接点,以形成结构化的几何表示,用于同时定位与地图构建(SLAM)等下游任务。现有方法分别预测线和点,并在事后进行调和,导致不匹配和鲁棒性降低。我们提出Co-PLNet,一个点线协作框架,在两个任务之间交换空间线索,其中早期检测通过点线提示编码器(PLP-Encoder)转换为空间提示,该编码器将几何属性编码为紧凑且空间对齐的图。交叉引导线解码器(CGL-Decoder)随后通过基于互补提示的稀疏注意力细化预测,强制点线一致性和效率。在Wireframe和YorkUrban上的实验显示,准确性和鲁棒性持续改进,同时具有有利的实时效率,证明了该方法在结构化几何感知中的有效性。我们的代码可在 https://github.com/GalacticHogrider/Co-PLNet 获取。

英文摘要

Wireframe parsing aims to recover line segments and their junctions to form a structured geometric representation useful for downstream tasks such as Simultaneous Localization and Mapping (SLAM). Existing methods predict lines and junctions separately and reconcile them post-hoc, causing mismatches and reduced robustness. We present Co-PLNet, a point-line collaborative framework that exchanges spatial cues between the two tasks, where early detections are converted into spatial prompts via a Point-Line Prompt Encoder (PLP-Encoder), which encodes geometric attributes into compact and spatially aligned maps. A Cross-Guidance Line Decoder (CGL-Decoder) then refines predictions with sparse attention conditioned on complementary prompts, enforcing point-line consistency and efficiency. Experiments on Wireframe and YorkUrban show consistent improvements in accuracy and robustness, together with favorable real-time efficiency, demonstrating the method's effectiveness for structured geometry perception. Our code is available at https://github.com/GalacticHogrider/Co-PLNet.
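
A common way to turn detected junctions into a spatially aligned map is to render them as a Gaussian heatmap on the image grid. The sketch below assumes that encoding purely for illustration. The abstract does not specify the PLP-Encoder's actual geometric attributes or map format, and `junction_heatmap` and `sigma` are hypothetical.

```python
import numpy as np


def junction_heatmap(junctions, height, width, sigma=1.5):
    """Render (row, col) junction detections as a max-combined Gaussian heatmap."""
    ys, xs = np.mgrid[0:height, 0:width]
    heat = np.zeros((height, width))
    for jy, jx in junctions:
        g = np.exp(-((ys - jy) ** 2 + (xs - jx) ** 2) / (2 * sigma ** 2))
        heat = np.maximum(heat, g)  # overlapping junctions keep the stronger response
    return heat
```

Because the map lives on the same grid as the image features, a line decoder can attend to it position by position. This is what makes such a prompt "spatially aligned".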

2602.14771 2026-06-17 cs.CV cs.AI cs.LG cs.MM cs.NE 版本更新

GOT-JEPA: Generic Object Tracking with Model Adaptation and Occlusion Handling using Joint-Embedding Predictive Architecture

GOT-JEPA:基于联合嵌入预测架构的通用目标跟踪与模型自适应及遮挡处理

Shih-Fang Chen, Jun-Cheng Chen, I-Hong Jhuo, Yen-Yu Lin

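AI summary: Proposes GOT-JEPA, which improves generalization by predicting tracking models rather than image features, together with OccuSolver for finer-grained occlusion perception. Effectiveness is validated on seven benchmarks.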

Comments Accepted by IEEE Transactions on Circuits and Systems for Video Technology (TCSVT). This research focuses on learning model adaptation for adverse and dynamic environments, as well as fine-grained occlusion perception for tracking

Journal ref
IEEE Transactions on Circuits and Systems for Video Technology 2026
AI中文摘要

人类视觉系统通过整合当前观测与先前观测信息、适应目标和场景变化、以及精细推理遮挡来跟踪物体。相比之下,最近的通用目标跟踪器通常针对训练目标进行优化,这限制了在未见场景中的鲁棒性和泛化能力,并且它们的遮挡推理仍然粗糙,缺乏对遮挡模式的详细建模。为了解决这些在泛化和遮挡感知方面的局限性,我们提出了GOT-JEPA,一个模型预测预训练框架,将JEPA从预测图像特征扩展到预测跟踪模型。给定相同的历史信息,教师预测器从干净的当前帧生成伪跟踪模型,学生预测器学习从当前帧的损坏版本预测相同的伪跟踪模型。这种设计提供了稳定的伪监督,并明确训练预测器在遮挡、干扰和其他不利观测下产生可靠的跟踪模型,从而提高了对动态环境的泛化能力。基于GOT-JEPA,我们进一步提出了OccuSolver来增强目标跟踪的遮挡感知。OccuSolver调整了一个以点为中心的点跟踪器,用于目标感知的可见性估计和详细的遮挡模式捕获。在跟踪器迭代生成的目标先验条件下,OccuSolver逐步细化可见性状态,增强遮挡处理,并产生更高质量的参考标签,逐步改进后续模型预测。在七个基准上的广泛评估表明,我们的方法有效增强了跟踪器的泛化能力和鲁棒性。

英文摘要

The human visual system tracks objects by integrating current observations with previously observed information, adapting to target and scene changes, and reasoning about occlusion at fine granularity. In contrast, recent generic object trackers are often optimized for training targets, which limits robustness and generalization in unseen scenarios, and their occlusion reasoning remains coarse, lacking detailed modeling of occlusion patterns. To address these limitations in generalization and occlusion perception, we propose GOT-JEPA, a model-predictive pretraining framework that extends JEPA from predicting image features to predicting tracking models. Given identical historical information, a teacher predictor generates pseudo-tracking models from a clean current frame, and a student predictor learns to predict the same pseudo-tracking models from a corrupted version of the current frame. This design provides stable pseudo supervision and explicitly trains the predictor to produce reliable tracking models under occlusions, distractors, and other adverse observations, improving generalization to dynamic environments. Building on GOT-JEPA, we further propose OccuSolver to enhance occlusion perception for object tracking. OccuSolver adapts a point-centric point tracker for object-aware visibility estimation and detailed occlusion-pattern capture. Conditioned on object priors iteratively generated by the tracker, OccuSolver incrementally refines visibility states, strengthens occlusion handling, and produces higher-quality reference labels that progressively improve subsequent model predictions. Extensive evaluations on seven benchmarks show that our method effectively enhances tracker generalization and robustness.

2603.18104 2026-06-17 cs.AI cs.DC cs.LG cs.NE Updated version

Adaptive Domain Models: Bayesian Evolution, Warm Rotation, and Principled Training for Geometric and Neuromorphic AI

Houston Haynes

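AI summary: Proposes an alternative training architecture built on the Dimensional Type System, the Program Hypergraph, and the b-posit bounded-regime design, achieving depth-independent training memory, exact gradient accumulation, and grade-preserving updates. It introduces Bayesian distillation and warm rotation to support continuous adaptation and verifiable correctness of domain-specific models.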

Comments 32 pages, 3 figures

Abstract

Prevailing AI training assumes reverse-mode automatic differentiation over IEEE-754 arithmetic. The memory overhead of training relative to inference, optimizer complexity, and structural degradation of geometric properties through training are consequences of this arithmetic substrate. This paper develops an alternative training architecture grounded in three prior results: the Dimensional Type System and Deterministic Memory Management framework (Haynes 2026), which establishes stack-eligible gradient allocation and exact quire accumulation as design-time verifiable properties; the Program Hypergraph (Haynes 2026), which establishes grade preservation through geometric algebra computations as a type-level invariant; and the b-posit bounded-regime design (Jonnalagadda et al. 2025), which makes posit arithmetic tractable across hardware targets conventionally considered inference-only. Their composition enables depth-independent training memory bounded to approximately twice the inference footprint, grade-preserving weight updates, and exact gradient accumulation, applicable uniformly to loss-function-optimized and spike-timing-dependent neuromorphic models. We introduce *Bayesian distillation*, a mechanism by which the latent prior structure of a general-purpose model is extracted through the ADM training regime, resolving the data-scarcity bootstrapping problem for domain-specific training. For deployment, we introduce *warm rotation*, an operational pattern in which an updated model transitions into an active inference pathway without service interruption, with correctness formalized through PHG certificates and signed version records. The result is a class of domain-specific AI systems that are smaller and more precise than general-purpose models, continuously adaptive, verifiably correct with respect to the physical structure of their domains, and initializable from existing models.
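
The abstract's contrast between IEEE-754 accumulation and exact quire accumulation can be illustrated without posit hardware. The sketch below does not implement b-posits or a real quire; as an assumption, it emulates a quire-style fused dot product with Python's `fractions.Fraction`, which represents every double exactly, so the products are summed without error and rounded only once at the end.

```python
from fractions import Fraction

def naive_dot(xs, ys):
    """IEEE-754 dot product: every multiply-add rounds to the nearest double."""
    acc = 0.0
    for x, y in zip(xs, ys):
        acc += x * y
    return acc

def exact_dot(xs, ys):
    """Quire-style fused dot product: accumulate exactly, round once at the end."""
    acc = Fraction(0)
    for x, y in zip(xs, ys):
        acc += Fraction(x) * Fraction(y)  # Fraction(float) is an exact conversion
    return float(acc)  # the single rounding step
```

For the inputs `[1e16, 1.0, -1e16]` against all ones, the naive loop loses the `1.0` (the spacing between doubles near `1e16` is 2), while the exact accumulator returns it. Gradient sums over many small contributions suffer from the same kind of absorption.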

2. Representation Learning, Self-Supervised and Contrastive Learning (12 papers)

2606.17516 2026-06-17 cs.LG cs.AI stat.ME stat.ML New submission

FoundCause: Causal Discovery with Latent Confounders from Observational Data

Patrick Blöbaum, Krishnakumar Balasubramanian, Shiva Prasad Kasiviswanathan

Affiliations: Amazon Web Services; Department of Statistics, University of California, Davis

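AI summary: Proposes FoundCause, an amortized causal discovery model trained on synthetic data that maps a dataset directly to a causal graph in a single forward pass and explicitly models latent confounders. It outperforms 11 non-amortized and 4 amortized methods on 15 real-world datasets.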

Comments Download the model at https://github.com/amazon-science/foundcause

Abstract

Causal discovery from observational data remains challenging due to the need to recover directed structure and latent confounding without interventions. We propose FoundCause, an amortized causal discovery model trained entirely on synthetic data that maps datasets directly to causal graphs in a single forward pass. By learning from large collections of simulated structural causal models, FoundCause captures transferable statistical patterns that generalize beyond individual datasets. The architecture incorporates several key inductive biases for causal discovery. It uses a permutation-invariant transformer encoder with alternating attention over samples and variables to jointly model cross-variable dependence and per-variable distributions. Pairwise statistical features derived from classical asymmetry measures are injected through statistics-conditioned attention, guiding the model toward known causal signals. A factorized decoder separates edge existence from direction, while a triangular refinement module enables reasoning over higher-order causal motifs such as chains and colliders. In addition, a dedicated confounder module based on learnable latent tokens explicitly models hidden common causes, and the model explicitly handles missing data via its masked input representation. To our knowledge, FoundCause is the first amortized causal discovery approach to explicitly model latent confounding. FoundCause outperforms 11 classical non-amortized methods (e.g., PC, GES, NOTEARS-style optimization) and 4 amortized causal discovery methods on 15 real-world datasets, achieving +9.6% improvement in $F_1$, +1.2% in AUROC, and an 18.9% reduction in structural Hamming distance relative to the strongest non-amortized methods, while performing inference in a single forward pass.
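
One way a factorized decoder can separate edge existence from direction is sketched below. This is an assumption about the general idea, not FoundCause's actual decoder. A symmetric existence logit gives P(edge between i and j), a direction logit gives P(i → j | edge), and their product gives the directed-edge probability. By construction, P(i → j) + P(j → i) equals P(edge between i and j). The function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def factorized_edge_probs(exist_logits, dir_logits):
    """Directed-edge probabilities from separate existence and direction logits.

    exist_logits is read as symmetric, and dir_logits is read only for i < j.
    The reverse direction gets the complementary probability, so
    P(i -> j) + P(j -> i) == P(edge between i and j).
    """
    n = len(exist_logits)
    probs = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            p_edge = sigmoid(exist_logits[i][j])
            p_fwd = sigmoid(dir_logits[i][j])  # P(i -> j | an edge exists)
            probs[i][j] = p_edge * p_fwd
            probs[j][i] = p_edge * (1.0 - p_fwd)
    return probs
```

Keeping the two heads separate lets the model be confident that two variables are adjacent while remaining uncertain about the orientation. Orientation is often the harder part to identify from observational data.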

2606.17603 2026-06-17 cs.LG New submission

Expanding SPHERE-JEPA: A Family of Statistical Regularizers for the Hypersphere

Léo Nicollier, Enric Meinhardt-Llopis, Max Dunitz, Marc Pic, Pablo Musé, Gabriele Facciolo

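AI summary: Sliced statistical regularizers in self-supervised learning inject projection variance through Monte Carlo sampling, which destabilizes optimization and slows convergence. The paper proposes full-dimensional MMD, KSD, and KL-divergence regularizers with rotation-invariant kernels, yielding more stable optimization and consistent improvements on ImageNet and Galaxy10.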

Abstract

In Self-Supervised Learning (SSL), preventing representation collapse by explicitly enforcing a uniform distribution on the unit hypersphere has proven to be effective. However, current frameworks typically rely on sliced statistical regularizers such as SIGReg (used in LeJEPA) and SUSReg (used in SPHERE-JEPA), which approximate this continuous objective via Monte Carlo sampling along random 1D directions. This stochasticity injects projection variance into the training gradients, destabilizing optimization and hindering convergence. In this work, we first show that analytically integrating out these random projections natively yields a deterministic Maximum Mean Discrepancy (MMD), bypassing the variance of sliced methods. Motivated by this equivalence, we formulate full-dimensional objectives for MMD, Kernel Stein Discrepancy (KSD), and Kullback-Leibler (KL) divergence directly on the sphere to enforce a uniform distribution. To prevent spatial bias, we equip these tests with rotationally invariant kernels constructed via spectral theory, systematically evaluating two canonical families: smooth exponential decay (Heat) and strict frequency cutoff (Bandlimited) filters. Empirically, removing projection-induced noise results in more stable optimization, faster convergence, and consistent improvements over stochastic sliced regularizers on ImageNet and Galaxy10. Furthermore, we reveal that the choice of the statistical test shapes the geometry of the learned latent space: MMD and KSD favor locally clustered organization suitable for object-centric domains, whereas the continuous KDE-based KL divergence promotes fine-grained instance separation, yielding the strongest results on unclustered procedural texture retrieval.
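
The following example shows why a rotation-invariant kernel makes MMD to the uniform distribution deterministic. It is a minimal sketch under stated assumptions and does not use the paper's Heat or Bandlimited kernels. It uses a Gaussian kernel restricted to the 2-sphere S^2, which depends only on x·y. For such a kernel, the cross term E_{y~U}[k(x, y)] is the same constant c for every x. On S^2 that constant has a closed form, because x·y is uniform on [-1, 1] when y is uniform. Sampling random projections is therefore unnecessary.

```python
import math

def gaussian_kernel(x, y, gamma):
    """exp(-gamma * ||x - y||^2); on the unit sphere this depends only on x . y."""
    sq = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq)

def uniform_s2_mean_kernel(gamma):
    """E[k(x, y)] with y uniform on S^2 (the same for every x on S^2).

    With t = x . y uniform on [-1, 1] and ||x - y||^2 = 2 - 2t:
    (1/2) * integral_{-1}^{1} exp(-2*gamma*(1 - t)) dt = (1 - exp(-4*gamma)) / (4*gamma).
    """
    return (1.0 - math.exp(-4.0 * gamma)) / (4.0 * gamma)

def mmd2_to_uniform_s2(points, gamma=1.0):
    """Deterministic V-statistic MMD^2 between unit vectors in R^3 and the uniform law on S^2."""
    n = len(points)
    mean_k = sum(gaussian_kernel(p, q, gamma) for p in points for q in points) / n ** 2
    c = uniform_s2_mean_kernel(gamma)
    # Both the cross term and the uniform-uniform term equal c by rotation invariance.
    return mean_k - 2.0 * c + c
```

A fully collapsed batch scores close to 1 - c, while the six points ±e_x, ±e_y, ±e_z score near zero. The regularizer rewards spreading the embeddings over the sphere.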

2606.17782 2026-06-17 cs.LG New submission

Blind Recovery of Latent Domains via Unsupervised Symmetry Discovery

Onur Efe, Arkadas Ozakin

Affiliations: Bogazici University

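AI summary: Proposes an unsupervised framework that recovers latent domains and signals from unstructured observations by discovering symmetries of the data distribution, using a shallow group-convolutional network with stationarity and locality regularization.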

Abstract

The primary motivation in blind inverse problems is to recover signals of interest from corrupted observations without knowing the obfuscating mechanism. Blind deconvolution is a prominent approach when the corruption is convolutional, but it is not applicable when general linear transformations obfuscate the domain structure. In this work, we propose an unsupervised framework for recovering latent domains and signals by discovering symmetries of the data distribution. Our framework models observations as linear measurements of signals sampled from a latent random field, and optimizes a shallow group-convolutional network by imposing stationarity and locality regularization at the model output. The model learns a latent symmetry action and an appropriate filter, thereby mapping unstructured observations to a symmetry-based representation that reveals latent signals. Experiments on stochastic processes, Ising models, shuffled and bit-scrambled images, and neural recordings show that the method recovers latent domains and signals from unstructured observations, suggesting symmetry discovery as a new direction for unsupervised structure learning and blind inverse problems.
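
The role of a stationarity regularizer can be illustrated on a cyclic domain. This sketch is an assumption and is not the paper's regularizer. A stationary signal on Z_m has a circulant covariance, meaning cov[i][(i + k) % m] does not depend on i. The hypothetical penalty below sums, over every lag k, the variance along that cyclic diagonal. Shuffling the domain breaks the circulant pattern, and undoing the shuffle brings the penalty back to zero. This is the kind of signal a symmetry-discovering model can optimize against.

```python
def stationarity_penalty(cov):
    """Hypothetical score: 0 iff the m x m covariance is circulant (stationary on Z_m)."""
    m = len(cov)
    total = 0.0
    for k in range(m):
        diag = [cov[i][(i + k) % m] for i in range(m)]  # entries at cyclic lag k
        mean = sum(diag) / m
        total += sum((d - mean) ** 2 for d in diag) / m
    return total

def permute(cov, perm):
    """Covariance after relabeling domain point i as perm[i] (a 'shuffled' domain)."""
    m = len(cov)
    out = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            out[perm[i]][perm[j]] = cov[i][j]
    return out
```

In the paper's setting the obfuscation is a general linear map and the symmetry action is learned. A plain permutation is used here only because it keeps the effect easy to see.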

2510.09468 2026-06-17 cs.LG Updated version

Geodesic Calculus on Implicitly Defined Latent Manifolds

Florine Hartwig, Josua Sassen, Juliane Braunsmann, Martin Rumpf, Benedikt Wirth

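AI summary: Treats autoencoder latent manifolds as implicit submanifolds and develops discrete Riemannian calculus tools that approximate classical geometric operators. An approximate projection learned with a denoising objective enables geodesic path computation and Riemannian exponential maps on the latent manifold.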

Comments 26 pages, 18 figures

Abstract

Latent manifolds of autoencoders provide low-dimensional representations of data, which can be studied from a geometric perspective. We propose to describe these latent manifolds as implicit submanifolds of some ambient latent space. Based on this, we develop tools for a discrete Riemannian calculus approximating classical geometric operators. These tools are robust against inaccuracies of the implicit representation often occurring in practical examples. To obtain a suitable implicit representation, we propose to learn an approximate projection onto the latent manifold by minimizing a denoising objective. This approach is independent of the underlying autoencoder and supports the use of different Riemannian geometries on the latent manifolds. The framework in particular enables the computation of geodesic paths connecting given end points and shooting geodesics via the Riemannian exponential maps on latent manifolds. We evaluate our approach on various autoencoders trained on synthetic and real data.
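
Discrete geodesic computation through a projection can be illustrated on a toy implicit manifold, the unit circle. This is a minimal sketch and not the paper's method. The exact projection `p / ||p||` stands in for the learned denoising projection. The discrete path energy sum ||x_{k+1} - x_k||^2 is minimized point by point. For unit vectors, the constrained minimizer of ||x - a||^2 + ||x - b||^2 is the projected midpoint of a and b, so each sweep is a projected Gauss-Seidel step. Function names and parameters are hypothetical.

```python
import math

def project_to_circle(p):
    """Exact projection onto {x : ||x|| = 1}; a stand-in for a learned projection."""
    r = math.hypot(p[0], p[1])
    return (p[0] / r, p[1] / r)

def discrete_geodesic(start, end, segments=4, iters=200):
    """Minimize sum_k ||x_{k+1} - x_k||^2 with every x_k constrained to the circle."""
    # Initialize with the projected straight line between the endpoints.
    path = [project_to_circle((start[0] + (end[0] - start[0]) * k / segments,
                               start[1] + (end[1] - start[1]) * k / segments))
            for k in range(segments + 1)]
    for _ in range(iters):
        for k in range(1, segments):
            prev, nxt = path[k - 1], path[k + 1]
            # Constrained minimizer of ||x - prev||^2 + ||x - nxt||^2 on the circle.
            path[k] = project_to_circle(((prev[0] + nxt[0]) / 2, (prev[1] + nxt[1]) / 2))
    return path
```

Starting from the projected chord between (1, 0) and (0, 1), the iteration converges to equally spaced points on the arc, which is the discrete geodesic. The midpoint ends up at angle π/4.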

2602.07429 2026-06-17 cs.LG cs.AI Updated version

Brep2Shape: Boundary and Shape Representation Alignment via Self-Supervised Transformers

Yuanxu Sun, Yuezhou Ma, Haixu Wu, Guanyang Zeng, Muye Chen, Jianmin Wang, Mingsheng Long

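AI summary: Proposes Brep2Shape, a self-supervised pre-training method that uses a dual Transformer backbone and topology attention to align the abstract boundary representation of B-reps with intuitive shape representations. It reaches state-of-the-art accuracy and converges faster on several downstream tasks.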

Abstract

Boundary representation (B-rep) is the industry standard for computer-aided design (CAD). While deep learning shows promise in processing B-rep models, existing methods suffer from a representation gap: continuous approaches offer analytical precision but are visually abstract, whereas discrete methods provide intuitive clarity at the expense of geometric precision. To bridge this gap, we introduce Brep2Shape, a novel self-supervised pre-training method designed to align abstract boundary representations with intuitive shape representations. Our method employs a geometry-aware task where the model learns to predict dense spatial points from parametric Bézier control points, enabling the network to better understand physical manifolds derived from abstract coefficients. To enhance this alignment, we propose a Dual Transformer backbone with parallel streams that independently encode surface and curve tokens to capture their distinct geometric properties. Moreover, topology attention is integrated to model the interdependencies between surfaces and curves, thereby maintaining topological consistency. Experimental results demonstrate that Brep2Shape offers significant scalability, achieving state-of-the-art accuracy and faster convergence across various downstream tasks. Code is available at this repository: https://github.com/thuml/Brep2Shape.
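
Dense point targets for the pre-training task can be produced from Bézier control points with de Casteljau's algorithm. The sketch below evaluates a Bézier curve by repeated linear interpolation and samples it at evenly spaced parameter values. It is an illustration under an assumption: the abstract does not say how Brep2Shape samples its targets, and surfaces would use the tensor-product extension, which is not shown.

```python
def de_casteljau(control_points, t):
    """Evaluate a Bézier curve at parameter t in [0, 1] by repeated linear interpolation."""
    pts = [tuple(p) for p in control_points]
    while len(pts) > 1:
        pts = [tuple((1 - t) * a + t * b for a, b in zip(p, q))
               for p, q in zip(pts, pts[1:])]
    return pts[0]

def sample_curve(control_points, n):
    """n dense points at evenly spaced parameters, usable as regression targets."""
    return [de_casteljau(control_points, i / (n - 1)) for i in range(n)]
```

For the quadratic curve with control points (0, 0), (1, 2), (2, 0), the point at t = 0.5 is (1, 1). The curve interpolates its first and last control points but only approaches the middle one, and this gap between coefficients and shape is what the pre-training task targets.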

2606.16379 2026-06-17 cs.LG stat.ML Updated version

Scalable and Interpretable Representation Alignment with Ordinal Similarity

Diogo Soares, Pankhil Gawade, Andrea Dittadi, Ewa Szczurek

Affiliations: University of Maryland; Google Research

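AI summary: Existing representation-similarity metrics lack interpretability, are sensitive to outliers, and are computationally expensive. The paper proposes the ordinal-similarity-based Triplet and Quadruplet Similarity Indices for interpretable, robust, and efficient alignment measurement.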

Abstract

Evaluating representation similarity is fundamental to representation learning. However, existing metrics suffer from significant limitations: they lack interpretability due to shifting baselines, lack robustness to outliers, and are computationally intractable for large datasets, forcing reliance on heuristic approximations. To address this, we develop an ordinal-similarity framework, instantiated by the Triplet (TSI) and Quadruplet (QSI) Similarity Indices, which measure alignment by quantifying the consistency of ordinal relationships. We theoretically demonstrate this formulation is inherently interpretable, robust to outliers, and computationally efficient. Finally, we establish a formal equivalence between TSI and local neighborhood alignment, measured by Mutual Nearest Neighbors. Empirically, we validate these properties and show that ordinal similarity offers a scalable approach to measuring alignment, enabling practitioners to better understand and design representations.
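
The ordinal idea behind a triplet index can be sketched as follows. This is an illustrative form and not necessarily the paper's exact TSI. It has no chance correction, skips ties, and uses a naive O(n^3) loop. For each anchor i and each pair (j, k), it checks whether both representations agree on which of j and k is closer to i. Only orderings are compared, so rescaling a representation leaves the score unchanged, and a single extreme distance can flip only the triplets it participates in.

```python
import itertools
import math

def triplet_similarity_index(X, Y):
    """Fraction of anchored triplets (i; j, k) on which X and Y agree about
    whether j or k is closer to i. Hypothetical plain form: ties are skipped."""
    n = len(X)
    agree = total = 0
    for i in range(n):
        others = [m for m in range(n) if m != i]
        for j, k in itertools.combinations(others, 2):
            sx = math.dist(X[i], X[j]) - math.dist(X[i], X[k])
            sy = math.dist(Y[i], Y[j]) - math.dist(Y[i], Y[k])
            if sx == 0 or sy == 0:
                continue  # tie in either space: no ordinal information
            total += 1
            agree += (sx > 0) == (sy > 0)
    return agree / total if total else float("nan")
```

A score of 1.0 means the two spaces induce identical neighbor orderings. Swapping two points in one representation lowers the score.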

2407.13053 2026-06-17 cs.CY cs.AI cs.CL cs.LG Updated version

E2Vec: Feature Embedding with Temporal Information for Analyzing Student Actions in E-Book Systems

Yuma Miyazaki, Valdemar Švábenský, Yuta Taniguchi, Fumiya Okubo, Tsubasa Minematsu, Atsushi Shimada

Affiliations: Kyushu University

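AI summary: Proposes E2Vec, which uses word embeddings to turn students' operation logs and time intervals into student vectors for at-risk detection, showing potential for better generalizability and performance.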

Comments Research paper published in the Proceedings of the 17th Educational Data Mining Conference (EDM 2024), see https://doi.org/10.5281/zenodo.12729853

Abstract

Digital textbook (e-book) systems record student interactions with textbooks as a sequence of events called EventStream data. In the past, researchers extracted meaningful features from EventStream, and utilized them as inputs for downstream tasks such as grade prediction and modeling of student behavior. Previous research evaluated models that mainly used statistical-based features derived from EventStream logs, such as the number of operation types or access frequencies. While these features are useful for providing certain insights, they lack temporal information that captures fine-grained differences in learning behaviors among different students. This study proposes E2Vec, a novel feature representation method based on word embeddings. The proposed method regards operation logs and their time intervals for each student as a string sequence of characters and generates a student vector of learning activity features that incorporates time information. We applied fastText to generate an embedding vector for each of 305 students in a dataset from two years of computer science courses. Then, we investigated the effectiveness of E2Vec in an at-risk detection task, demonstrating potential for generalizability and performance.
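
The step of treating operation logs and time intervals as a "string sequence of characters" can be sketched as follows. The operation alphabet and the interval bins are hypothetical choices made for illustration, not E2Vec's actual encoding. Each event becomes a short "word" made of an operation letter followed by a binned time-since-previous-event character. The resulting sentence could then be fed to a subword embedding model such as fastText, which is not shown here.

```python
OP_CODES = {"OPEN": "O", "NEXT": "N", "PREV": "P", "CLOSE": "C"}  # hypothetical alphabet
BIN_EDGES = [5.0, 30.0, 300.0]  # hypothetical interval bins, in seconds

def interval_char(seconds):
    """Map a time gap to a bin character: '0' (<5s), '1' (<30s), '2' (<300s), '3' (longer)."""
    for idx, edge in enumerate(BIN_EDGES):
        if seconds < edge:
            return str(idx)
    return str(len(BIN_EDGES))

def encode_events(events):
    """Turn one student's [(operation, timestamp_seconds), ...] log into a sentence of
    operation+interval 'words', so that timing survives into the embedding."""
    words = []
    prev_t = None
    for op, t in events:
        gap = 0.0 if prev_t is None else t - prev_t
        words.append(OP_CODES[op] + interval_char(gap))
        prev_t = t
    return " ".join(words)
```

Because a quick `NEXT` and a slow `NEXT` become different tokens (`N0` and `N2`), two students with identical operation counts can still receive different vectors. Count-based features cannot make this distinction.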

2603.04198 2026-06-17 stat.ML cs.LG Updated version

Stable and Steerable Sparse Autoencoders with Weight Regularization

Piotr Jedryszek, Oliver M. Crook

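AI summary: L1/L2 weight regularization improves the cross-seed feature consistency of sparse autoencoders and raises steering success rates for SAEs trained on language-model activations, while leaving mean automated interpretability scores essentially unchanged.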

Abstract

Sparse autoencoders (SAEs) are widely used to extract human-interpretable features from neural network activations, but their learned features can vary substantially across random seeds and training choices. To improve stability, we study weight regularization by adding L1 or L2 penalties on encoder and decoder weights, and evaluate how regularization interacts with common SAE training defaults. On MNIST, we observe that L2 weight regularization produces a core of highly aligned features and, when combined with tied initialization and unit-norm decoder constraints, it dramatically increases cross-seed feature consistency. For TopK SAEs trained on language model activations (Pythia-70M-deduped), adding a small L2 weight penalty increases the fraction of features shared across three random seeds and roughly doubles steering success rates, while leaving the mean of automated interpretability scores essentially unchanged. Finally, in the regularized setting, activation steering success becomes better predicted by auto-interpretability scores, suggesting that regularization can align text-based feature explanations with functional controllability.
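
The regularized objective can be written out for a single example. This is a minimal pure-Python sketch of a TopK SAE loss with an L2 penalty on encoder and decoder weights. The shapes, names, and the single coefficient `l2` are assumptions, not the paper's code. An L1 variant would replace `w * w` with `abs(w)`.

```python
def topk_relu(pre, k):
    """TopK activation: keep the k largest post-ReLU values, zero the rest."""
    acts = [max(0.0, v) for v in pre]
    keep = set(sorted(range(len(acts)), key=lambda i: acts[i], reverse=True)[:k])
    return [a if i in keep else 0.0 for i, a in enumerate(acts)]

def matvec(W, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def sae_loss(x, W_enc, b_enc, W_dec, b_dec, k, l2=0.0):
    """Reconstruction MSE plus l2 * (||W_enc||_F^2 + ||W_dec||_F^2).

    W_enc has shape (n_latents, d) and W_dec has shape (d, n_latents).
    """
    z = topk_relu([p + b for p, b in zip(matvec(W_enc, x), b_enc)], k)
    x_hat = [p + b for p, b in zip(matvec(W_dec, z), b_dec)]
    mse = sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)
    weight_sq = sum(w * w for row in W_enc + W_dec for w in row)
    return mse + l2 * weight_sq
```

The penalty leaves the sparsity mechanism (TopK) untouched and only discourages large weights. This fits the abstract's finding that a small coefficient changes seed-to-seed consistency without degrading interpretability scores.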

2603.22281 2026-06-17 cs.CV cs.AI cs.CL cs.LG cs.RO Updated version

ThinkJEPA: Empowering Latent World Models with Large Vision-Language Reasoning Model

Haichao Zhang, Yijiang Li, Shwai He, Tushar Nagarajan, Mingfei Chen, Jianglin Lu, Ang Li, Yun Fu

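AI summary: Proposes ThinkJEPA, which combines a dense JEPA branch with a sparsely sampled VLM "thinker" branch and a hierarchical pyramid representation extraction module. The framework pairs fine-grained motion modeling with long-horizon semantic guidance and outperforms baselines on hand-manipulation trajectory prediction.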

Comments 10 pages, 5 figures

Abstract

Recent progress in latent world models (e.g., V-JEPA2) has shown promising capability in forecasting future world states from video observations. Nevertheless, dense prediction from a short observation window limits temporal context and can bias predictors toward local, low-level extrapolation, making it difficult to capture long-horizon semantics and reducing downstream utility. Vision-language models (VLMs), in contrast, provide strong semantic grounding and general knowledge by reasoning over uniformly sampled frames, but they are not ideal as standalone dense predictors due to compute-driven sparse sampling, a language-output bottleneck that compresses fine-grained interaction states into text-oriented representations, and a data-regime mismatch when adapting to small action-conditioned datasets. We propose a VLM-guided JEPA-style latent world modeling framework that combines dense-frame dynamics modeling with long-horizon semantic guidance via a dual-temporal pathway: a dense JEPA branch for fine-grained motion and interaction cues, and a uniformly sampled VLM *thinker* branch with a larger temporal stride for knowledge-rich guidance. To transfer the VLM's progressive reasoning signals effectively, we introduce a hierarchical pyramid representation extraction module that aggregates multi-layer VLM representations into guidance features compatible with latent prediction. Experiments on hand-manipulation trajectory prediction show that our method outperforms both a strong VLM-only baseline and a JEPA-predictor baseline, and yields more robust long-horizon rollout behavior.
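
The dual-temporal pathway and the multi-layer aggregation can be sketched as follows. Everything here is hypothetical and is not ThinkJEPA's code. The window sizes, the uniform-sampling rule, and the softmax-weighted layer average are simple assumed stand-ins for the dense branch's input, the thinker branch's input, and the pyramid extraction module.

```python
import math

def dense_window(num_frames, window):
    """Indices of the most recent `window` frames (dense JEPA branch)."""
    start = max(0, num_frames - window)
    return list(range(start, num_frames))

def uniform_samples(num_frames, count):
    """`count` indices spread uniformly over the whole clip (sparse thinker branch)."""
    if count == 1:
        return [num_frames - 1]
    return [round(i * (num_frames - 1) / (count - 1)) for i in range(count)]

def aggregate_layers(layer_feats, layer_logits):
    """Softmax-weighted average of per-layer feature vectors (a stand-in for the pyramid module)."""
    m = max(layer_logits)
    w = [math.exp(l - m) for l in layer_logits]  # subtract max for numerical stability
    s = sum(w)
    w = [x / s for x in w]
    dim = len(layer_feats[0])
    return [sum(wi * f[d] for wi, f in zip(w, layer_feats)) for d in range(dim)]
```

For a 100-frame clip, the dense branch sees frames 92 to 99, while the thinker sees frames 0, 33, 66, and 99. The thinker covers the whole horizon at a much larger stride.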

2604.22128 2026-06-17 cs.CL cs.LG Updated version

Dissociating Decodability and Causal Use in Bracket-Sequence Transformers

Aryan Sharma, Cutter Dawes, Shivam Raval

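AI summary: Probing and intervention experiments on Transformers trained on the Dyck language show that depth, distance, and top-of-stack signals are all decodable. However, masking attention to the top-of-stack position sharply reduces long-distance accuracy, while ablating residual-stream subspaces has comparatively little effect.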

Abstract

When trained on tasks requiring an understanding of hierarchical structure, transformers have been found to represent this hierarchy in distinct ways: in the geometry of the residual stream, and in stack-like attention patterns maintaining a last-in, first-out ordering. However, it remains unclear whether these representations are causally used or merely decodable. We examine this gap in transformers trained on the Dyck language (a formal language of balanced bracket sequences), where the hierarchical ground truth is explicit. By probing and intervening on the residual stream and attention patterns, we find that depth, distance, and top-of-stack signals are all decodable, yet their causal roles diverge. Specifically, masking attention to the true top-of-stack position causes a sharp drop in long-distance accuracy, while ablating low-dimensional residual stream subspaces has comparatively little effect. These results, which extend to a templated natural language setting, suggest that even in a controlled setting where the relevant hierarchical variables are known, decodability alone does not imply causal use.
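
The "explicit hierarchical ground truth" of a Dyck sequence can be computed directly with a stack. The sketch below produces per-token depth and top-of-stack position labels of the kind a probe would be trained on. The label definitions (depth after each token, and the index of the innermost unmatched opener or `None`) are assumptions about how such targets could be defined, not the paper's code.

```python
PAIRS = {")": "(", "]": "["}  # closing bracket -> matching opener

def dyck_labels(seq):
    """Return (depths, tops): stack depth after each token and the position of the
    innermost unmatched opener (top of stack), or None when the stack is empty."""
    stack, depths, tops = [], [], []
    for pos, ch in enumerate(seq):
        if ch in PAIRS.values():
            stack.append(pos)
        elif ch in PAIRS:
            if not stack or seq[stack[-1]] != PAIRS[ch]:
                raise ValueError(f"unbalanced bracket at position {pos}")
            stack.pop()
        else:
            raise ValueError(f"unexpected symbol {ch!r}")
        depths.append(len(stack))
        tops.append(stack[-1] if stack else None)
    return depths, tops
```

For `"(()())"` the top-of-stack labels are `[0, 1, 0, 3, 0, None]`. The top-of-stack position can jump back to a token far in the past, which is why attending to it matters for long-distance accuracy.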

2606.07555 2026-06-17 cs.CL cs.LG Updated version

Priors Persist Through Suppression: A Stroop Paradigm for Lexical Override

Han-yu Wang

Affiliations: The University of Hong Kong

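AI summary: Using a Stroop-style paradigm, the paper finds that lexical priors in language models persist after a local rule overrides them. Activation patching localizes the conflict effect to a source-position triplet, indicating that the prior is the shared channel through which interference arises and where the override leaves its trace.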

Abstract

Glossaries, technical specifications, and system prompts routinely ask language models to use familiar words in unfamiliar ways. When this works, the local rule does not install the new meaning on top of the old one; the pretrained prior keeps operating underneath, and its strength still shows through. We test this with a Stroop-style paradigm: a remapping rule (doctor means forest) pitted against the query word's lexical-prior distractor (hospital), with matched neutral controls. Across 11 open-weight models spanning four families and 1B-9B parameters, lexical-prior strength predicts interference even after item-level controls for answer prior, frequency, tokenization, and prompt wording. Activation patching on five aligned models locates a source-position triplet (definition subject, definition target, query word) that nearly fully recovers the conflict effect (aggregate $R \in [0.92, 1.06]$); a definition-target swap shows the triplet performs binding rather than identity matching. Dissociation experiments isolate target preservation as the binding-specific signature: distractor suppression occurs under matched, swap, and item-mismatched conditions alike, whereas target logit collapse occurs only when the definition-target position is corrupted. Behavior and mechanism converge on the same channel: the prior's strength both predicts which overrides fail and marks where the causal repair lands.
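
A recovery ratio such as the aggregate R above is commonly defined as the fraction of the clean-versus-corrupted gap that patching restores. The sketch below assumes that normalized form and an aggregate built as a ratio of summed gaps. Both are assumptions, since the abstract does not define R. Under this definition, R = 1 means full recovery, and R can exceed 1 when the patched run overshoots the clean metric, which is consistent with an upper bound of 1.06.

```python
def patching_recovery(clean, corrupted, patched):
    """Fraction of the clean-minus-corrupted effect restored by patching (1.0 = full)."""
    effect = clean - corrupted
    if effect == 0:
        raise ValueError("clean and corrupted metrics coincide; recovery is undefined")
    return (patched - corrupted) / effect

def aggregate_recovery(items):
    """Hypothetical aggregate over (clean, corrupted, patched) items: ratio of summed gaps."""
    num = sum(p - c for _, c, p in items)
    den = sum(cl - c for cl, c, _ in items)
    return num / den
```

Aggregating as a ratio of sums, rather than a mean of per-item ratios, keeps items with tiny clean-corrupted gaps from dominating the estimate.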

2606.09770 2026-06-17 q-bio.NC cs.LG Updated version

Discovering Functionally Selective Brain Regions with a Deep Topographic Multimodal Model

Badr AlKhamissi, Johannes Mehrer, Lara Marinov, Ahmed Abdelaal, Abdulkadir Gokce, Martin Schrimpf

Affiliations: University of California, Berkeley; Max Planck Institute for Human Cognitive and Brain Sciences; ETH Zurich

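AI summary: Proposes Topo-Omni, which fine-tunes a pretrained foundation model with a spatial smoothness objective so that visual, auditory, and language/cognitive processing share one contiguous in-silico cortical sheet. The model develops multimodal clusters consistent with human neuroimaging and is used to discover new functionally selective brain regions.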

Comments Preprint. The first two authors contributed equally.

Abstract

Nearby neurons in cortex share similar response profiles, producing systematic spatial organization across sensory and cognitive systems. Recent topographic models reproduce aspects of this structure but remain unimodal and spatially constrain each layer separately, yielding fragmented maps that capture neither the contiguity of cortical processing streams nor their integration across modalities. We introduce Topo-Omni, a topographic multimodal model in which visual, auditory, and language/cognitive processing share a single contiguous in-silico sheet. Built by fine-tuning a pretrained foundation model with a spatial smoothness objective, this architecture develops clusters across modalities that are consistent with human neuroimaging, from sensory to cognitive systems. Driving or suppressing a cluster selectively biases or impairs perception, paralleling human intervention studies. Finally, we use our model to screen for novel clusters in-silico and discover new natural landscape and animal networks which we validate in human data. A single spatial principle thus organizes representations across modalities and processing stages, yielding testable hypotheses about cortical organization.
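
A spatial smoothness objective over one contiguous sheet can take many forms, and the abstract does not give Topo-Omni's. The sketch below is one hypothetical version. Units are laid out row by row on a 2D grid of a given width, and the loss is the mean (1 - cosine similarity) between the response vectors of horizontally and vertically adjacent units. Because all units share a single sheet rather than one map per layer, neighboring units are pushed toward similar responses regardless of which modality drives them.

```python
import math

def cosine(u, v):
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return sum(a * b for a, b in zip(u, v)) / (nu * nv)

def sheet_smoothness_loss(responses, width):
    """Mean (1 - cosine) between adjacent units on a width-wide 2D sheet.

    responses[idx] is unit idx's response vector over a batch of stimuli;
    unit idx sits at row idx // width, column idx % width.
    """
    n = len(responses)
    total, pairs = 0.0, 0
    for idx in range(n):
        col = idx % width
        right = idx + 1 if col + 1 < width else None
        down = idx + width if idx + width < n else None
        for nb in (right, down):
            if nb is not None:
                total += 1.0 - cosine(responses[idx], responses[nb])
                pairs += 1
    return total / pairs
```

A sheet whose top row responds to one stimulus and whose bottom row responds to another scores 0.5. Only the vertical neighbor pairs disagree, so this kind of loss tolerates sharp borders between contiguous clusters.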

3. Reinforcement Learning and Sequential Decision-Making (24 papers)

2606.17250 2026-06-17 cs.LG cs.CL New submission

Rethinking Groups in Critic-Free RLVR

Yihong Wu, Liheng Ma, Lingfeng Xiao, Muzhi Li, Xinyu Wang, Yingxue Zhang, Jian-Yun Nie

Affiliations: Université de Montréal; McGill University; Mila - Quebec AI Institute; University of Waterloo; The Chinese University of Hong Kong; Huawei Noah's Ark Lab

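AI summary: Group-based critic-free RL suffers from data inefficiency and group synchronization barriers. The paper proposes negative token filtering, which enables stable single-rollout training and matches group-based methods on reasoning tasks while outperforming them on agentic tasks.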

Abstract

Reinforcement learning (RL) has become a central paradigm for post-training large language models. Existing critic-free RL methods typically generate a group of rollouts for the same question to estimate value baselines for advantage computation. However, this design suffers from data inefficiency, group synchronization barriers, and inflexibility with structured rollouts. In this work, we revisit the role of the "group" and show that its underlying function is not merely to estimate baselines but to prevent false penalties on negative samples. Building on this insight, we propose negative token filtering, a simple and effective strategy that enables stable single-rollout training. We apply it to two batch-level advantage methods, achieving comparable performance on reasoning tasks and stronger performance on agentic tasks relative to group-based RL techniques.
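
A batch-level advantage with a token filter can be sketched as follows. This is an assumed form, not the paper's exact method. The baseline is the mean and standard deviation of rewards across the whole batch, with one rollout per question and no per-question group. The abstract does not say which tokens negative token filtering removes, so the filter here is a caller-supplied placeholder predicate. In negative-advantage responses, flagged tokens get zero weight instead of being penalized.

```python
import statistics

def batch_advantages(rewards, eps=1e-8):
    """Batch-level baseline: standardize each response's reward over the whole batch."""
    mu = statistics.fmean(rewards)
    sigma = statistics.pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]

def token_weights(advantages, token_lists, drop_negative_token):
    """Per-token policy-gradient weights. In negative-advantage responses, tokens
    flagged by the placeholder predicate `drop_negative_token` get weight 0."""
    out = []
    for adv, tokens in zip(advantages, token_lists):
        if adv < 0:
            out.append([0.0 if drop_negative_token(t) else adv for t in tokens])
        else:
            out.append([adv] * len(tokens))
    return out
```

The rationale follows the abstract's claim. A failed response often contains many tokens that are not themselves wrong, and penalizing all of them is the "false penalty" that per-question groups implicitly mitigate.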

2606.17331 2026-06-17 cs.LG New submission

Decision-Driven Geosteering Under Uncertainty: A Unified Framework for Sequential Decision Optimization

Hibat Errahmen Djecta, Sergey Alyaev, Kristian Fossum, Reidar B. Bratvold, Ressi Bonti Muhammad, Apoorv Srivastava

Affiliations: NORCE Research Centre; University of Stavanger; Stanford University

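AI summary: Proposes a geosteering framework that couples particle filtering with value-based reinforcement learning. It represents geological uncertainty ahead of the bit explicitly and compares three decision strategies under the same uncertainty representation for sequential well-trajectory decisions.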

Abstract

Geosteering requires navigating a well trajectory through an unknown geological configuration, while sequentially updating decisions based on indirect measurements acquired during drilling. This work presents an uncertainty-aware geosteering framework that tightly integrates particle filtering for probabilistic subsurface interpretation with value-based reinforcement learning for sequential decision-making. Geological uncertainty ahead of the drill bit is represented explicitly through a particle filter (PF), enabling belief-informed control rather than deterministic trajectory correction. The framework couples PF belief updates with belief-informed decision policies and evaluates three decision-making options that operate under identical uncertainty representations: an interpretable Approximate Dynamic Programming (ADP) scheme, a Deep Q-learning baseline, and a Dual Deep Reinforcement Learning (Dual DRL) architecture trained with a target Q-network scheme for stability, using a dueling (value/advantage) decomposition for Q-value parameterization. Beyond final placement performance, we assess policy behavior using stability-oriented metrics that quantify steering smoothness over time, providing additional operational insight into how decision policies respond as uncertainty evolves. The framework is integrated with an API for validation within an industrial geosteering simulator under realistic measurement noise and drilling constraints. Using identical geological realizations, operational limits, and reward definitions across methods, the experiments provide a controlled and high-fidelity evaluation of how alternative decision policies behave throughout the drilling process, rather than evaluating performance solely from the final well trajectory.
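
Two components named in the abstract can be illustrated in isolation: the particle-filter belief update and the dueling Q-value decomposition. This is a minimal sketch. The one-dimensional state (for example, a boundary depth), the Gaussian measurement model, and `predict_measurement` are assumptions. The dueling head uses the common mean-subtracted form, and the abstract does not say which variant the paper uses.

```python
import math

def pf_update(particles, weights, measurement, predict_measurement, noise_std):
    """Bayesian reweighting with a Gaussian likelihood, then normalization."""
    new_w = []
    for p, w in zip(particles, weights):
        r = (measurement - predict_measurement(p)) / noise_std
        new_w.append(w * math.exp(-0.5 * r * r))
    total = sum(new_w)
    return [w / total for w in new_w]

def systematic_resample(particles, weights, u0):
    """Systematic resampling with a single offset u0 in [0, 1)."""
    n = len(particles)
    cum, acc = [], 0.0
    for w in weights:
        acc += w
        cum.append(acc)
    out, j = [], 0
    for i in range(n):
        u = (u0 + i) / n
        while j < n - 1 and u > cum[j]:  # guard against cum[-1] falling just below 1.0
            j += 1
        out.append(particles[j])
    return out

def dueling_q(value, advantages):
    """Dueling head: Q(s, a) = V(s) + A(s, a) - mean over a' of A(s, a')."""
    mean_a = sum(advantages) / len(advantages)
    return [value + a - mean_a for a in advantages]
```

The normalized weights form the belief that a belief-informed policy would consume, for example through summary statistics of the particles fed into the Q-network.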

2606.17377 2026-06-17 cs.LG cs.SY eess.SY 新提交

Performance-Driven Environment Abstraction with Multi-Timescale Learning

性能驱动的多时间尺度学习环境抽象

Yue Guan, Dipankar Maity, Panagiotis Tsiotras

发表机构 * Georgia Institute of Technology(佐治亚理工学院) University of North Carolina at Charlotte(北卡罗来纳大学夏洛特分校)

AI总结 针对大规模马尔可夫决策过程,提出一种性能驱动的环境抽象方法,通过多时间尺度强化学习联合优化策略和树结构抽象,平衡性能与复杂度,实现状态压缩和样本效率提升。

详情
AI中文摘要

我们研究大规模马尔可夫决策过程中用于决策的性能驱动环境抽象。我们寻求直接优化决策质量的抽象,而非保留几何或拓扑结构。我们将抽象建模为通过聚合状态空间并在每个聚合状态内强制执行共享动作分布而获得的受控近似。对于固定划分,我们建立了一个性能保证,将值函数近似误差与动作共享引入的损失分离。在此分析指导下,我们开发了一个多时间尺度强化学习框架,联合调整策略和树结构环境抽象。所得算法基于Q值差异细化和粗化状态空间区域,平衡性能与抽象大小和复杂度。实验结果表明,与演员-评论家基线相比,该方法实现了显著的状态压缩、改进的样本效率和更快的重新规划。

英文摘要

We study performance-driven environment abstraction for decision-making in large Markov decision processes. Rather than preserving geometric or topological structure, we seek abstractions that directly optimize decision quality. We model abstraction as a controlled approximation obtained by aggregating the state space and enforcing a shared action distribution within each aggregated state. For a fixed partition, we establish a performance guarantee that separates value-function approximation error from the loss introduced by action sharing. Guided by this analysis, we develop a multi-timescale reinforcement learning framework that jointly adapts the policy and a tree-structured environment abstraction. The resulting algorithm refines and coarsens regions of the state space based on Q-value discrepancies, balancing performance against abstraction size and complexity. Empirical results demonstrate substantial state compression, improved sample efficiency, and faster replanning compared to actor-critic baselines.
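To make "loss introduced by action sharing" concrete, here is a small sketch under stated assumptions: each aggregated cell follows one shared deterministic action, the loss is the mean gap between each member state's greedy Q-value and its Q-value under the shared action, and a tree leaf is split (or two siblings merged) by thresholding that gap. The function names and the threshold rule are hypothetical illustrations of a Q-value-discrepancy criterion, not the paper's exact refinement test.

```python
def sharing_loss(q_rows):
    """Action-sharing loss for one aggregated cell.

    q_rows[i][a] is a Q-value estimate for the i-th ground state in the cell.
    The whole cell follows one shared action; the loss is the mean gap to each
    state's own greedy value.
    """
    n_actions = len(q_rows[0])
    avg_q = [sum(row[a] for row in q_rows) / len(q_rows) for a in range(n_actions)]
    shared = max(range(n_actions), key=lambda a: avg_q[a])  # best single action for the cell
    loss = sum(max(row) - row[shared] for row in q_rows) / len(q_rows)
    return shared, loss


def should_refine(q_rows, threshold):
    """Hypothetical rule: split a tree leaf when sharing one action costs too much."""
    return sharing_loss(q_rows)[1] > threshold


def should_merge(q_rows_a, q_rows_b, threshold):
    """Hypothetical rule: coarsen two sibling leaves when a common action costs little."""
    return sharing_loss(q_rows_a + q_rows_b)[1] <= threshold
```

For example, two states that prefer opposite actions (`[[1, 0], [0, 1]]`) incur a sharing loss of 0.5, so the leaf would be split at any threshold below that.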

2606.17414 2026-06-17 cs.LG math.DS 新提交

Memory-Efficient Meta-Reinforcement Learning for Adaptive Safety-Critical Control in Adversarial Spacecraft Proximity Operations

用于对抗性航天器接近操作中自适应安全关键控制的内存高效元强化学习

Alejandro Posadas-Nava, Richard Linares, Minduli Wijayatunga

发表机构 * MIT(麻省理工学院) University of Illinois, Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校)

AI总结 本文研究利用元强化学习调整输入约束控制屏障函数的类K函数,比较三种循环网络架构和两种训练算法,发现Mamba与PPO组合在合作与非合作场景中均能提升任务完成率、安全性和燃料效率。

详情
AI中文摘要

自主航天器交会与接近操作(RPO)需要控制器在推力约束下保证安全,同时最小化燃料消耗。输入约束控制屏障函数(ICCBF)为具有执行约束的非线性系统提供了一种控制方法,构建前向不变安全集。先前工作表明,通过元强化学习(meta-RL)学习定义ICCBF递归的类$\mathcal{K}$函数,可为RPO中的安全关键控制提供鲁棒、非贪婪的方法。本文进一步扩展该框架,研究了三种循环网络架构(长短期记忆(LSTM)、门控循环单元(GRU)、选择性状态空间模型(Mamba))和两种训练算法(近端策略优化(PPO)和软演员-评论家(SAC))的性能,以确定通过元强化学习调整ICCBF类K函数的最佳设置。除了合作测试案例外,还在存在对抗行为的情况下评估性能,其中目标航天器以恶化追踪航天器安全的方式行动。结果表明,在所有测试的合作与非合作场景中,使用PPO的状态空间模型(如Mamba)相比其他架构在任务完成、安全和燃料节省方面表现更优。

英文摘要

Autonomous spacecraft rendezvous and proximity operations (RPO) require controllers that guarantee safety under thrust constraints while minimizing fuel expenditure. Input-constrained control barrier functions (ICCBFs) provide a control method for nonlinear systems with actuation constraints by constructing a forward-invariant safe set. Previous work has shown that learning the class-$\mathcal{K}$ functions that define the ICCBF recursion via meta-reinforcement learning (meta-RL) yields a robust, non-greedy approach to safety-critical control in RPO. This paper extends that framework by investigating the performance of three recurrent network architectures (Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Selective State Space Model (Mamba)) and two training algorithms (Proximal Policy Optimization (PPO) and Soft Actor-Critic (SAC)) to identify the best setup for tuning ICCBF class-$\mathcal{K}$ functions via meta-RL. In addition to cooperative test cases, performance is evaluated in the presence of adversarial behavior, where the target spacecraft acts in a way that degrades the safety of the chaser spacecraft. Results indicate that state space models such as Mamba, when used with PPO, achieve superior task completion, safety, and fuel savings compared to other architectures across all cooperative and uncooperative scenarios tested.
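The role of a class-$\mathcal{K}$ function can be seen in the simplest discrete-time control barrier function filter. The sketch below uses a plain (not input-constrained) CBF on a hypothetical 1-D single integrator with a linear class-$\mathcal{K}$ term $\alpha(h) = \gamma h$; the scalar `gamma` stands in for the kind of quantity the paper tunes with meta-RL. It does not implement the ICCBF recursion or spacecraft dynamics.

```python
def cbf_filter(x, u_nominal, x_max, dt, gamma, u_max):
    """Discrete-time CBF safety filter for x[k+1] = x[k] + u*dt with h(x) = x_max - x.

    Enforces h(x[k+1]) >= (1 - gamma) * h(x[k]), i.e. a linear class-K term
    alpha(h) = gamma * h with 0 < gamma <= 1.
    """
    h = x_max - x
    u_cbf = gamma * h / dt  # largest input that still satisfies the CBF condition
    u = min(u_nominal, u_cbf)
    # Actuator saturation. If this clip ever pushed u above u_cbf, the CBF condition
    # would be violated; input-constrained CBFs are designed to rule that case out.
    return max(-u_max, min(u_max, u))


def rollout(gamma, steps=30, dt=0.1):
    """Nominal controller pushes toward the boundary at x = 1; the filter intervenes."""
    x, xs = 0.0, []
    for _ in range(steps):
        u = cbf_filter(x, u_nominal=1.0, x_max=1.0, dt=dt, gamma=gamma, u_max=2.0)
        x += u * dt
        xs.append(x)
    return xs
```

A small `gamma` makes the filter conservative (it brakes far from the boundary), while a large one lets the state approach the boundary aggressively; this is the non-greedy versus greedy trade-off that learning the class-$\mathcal{K}$ function addresses.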

2606.17489 2026-06-17 cs.LG cs.AI 新提交

Online LLM Selection via Constrained Bandits with Time-Varying Demand

基于时变需求的约束赌博机在线LLM选择

Yin Huang, Qingsong Liu, Jie Xu

发表机构 * Department of Electrical and Computer Engineering, University of Florida(佛罗里达大学电气与计算机工程系) Manning College of Information and Computer Sciences, University of Massachusetts Amherst(马萨诸塞大学阿默斯特分校曼宁信息与计算机科学学院)

AI总结 针对边缘云推理系统中异构LLM的选择问题,提出一种基于置信界估计和需求预测的在线学习算法,在硬预算和软延迟约束下实现亚线性遗憾和约束违反。

Comments 11 pages, 3 figures with multiple subfigures, 1 table, submitted for possible journal publication

详情
AI中文摘要

大型语言模型(LLM)越来越多地部署在边缘云推理系统中,以处理具有异构准确性、延迟和成本配置的多样化用户任务。为每个传入任务选择合适的LLM对于确保服务质量和高效资源利用至关重要。然而,模型异构性、随机且未知的性能特征以及时变的任务需求使得静态选择策略不再适用。实际部署通常施加硬资源预算(如货币支出限制)和软服务级别要求(如延迟保证)。这些约束为在线决策带来了额外挑战。我们将该问题形式化为一个约束随机赌博机学习任务,其中学习者在包装型(硬)和覆盖型(软)约束下顺序选择模型,同时适应时变的任务需求。学习者无法访问底层奖励、成本或延迟分布,必须依赖部分反馈。我们开发了一种新颖的在线学习算法,利用置信界估计和需求预测来平衡奖励最大化与长期约束满足。我们提供了理论保证,表明与具有完整信息的离线基准相比,该算法实现了亚线性遗憾和亚线性覆盖约束违反。在合成工作负载上的实验结果证明了我们的方法在动态、资源受限环境中的有效性和鲁棒性。

英文摘要

Large Language Models (LLMs) are increasingly deployed in edge-cloud inference systems to handle diverse user tasks with heterogeneous accuracy, latency, and cost profiles. Selecting the appropriate LLM for each incoming task is critical for ensuring service quality and efficient resource utilization. However, model heterogeneity, stochastic and unknown performance characteristics, and time-varying task demands make static selection strategies inadequate. Real-world deployments often impose hard resource budgets such as monetary expenditure limits, along with soft service-level requirements such as latency guarantees. These constraints introduce additional challenges for online decision-making. We formulate this problem as a constrained stochastic bandit learning task, where the learner sequentially selects models under both packing-type (hard) and covering-type (soft) constraints, while adapting to time-varying task demand. The learner operates without access to the underlying reward, cost, or latency distributions and must rely on partial feedback. We develop a novel online learning algorithm that leverages confidence-bound estimates and demand predictions to balance reward maximization with long-term constraint satisfaction. We provide theoretical guarantees showing sublinear regret and sublinear covering constraint violations compared to an offline benchmark with full information. Experimental results on synthetic workloads demonstrate the effectiveness and robustness of our approach in dynamic, resource-constrained environments.
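As a rough illustration of how confidence bounds interact with a hard budget, here is a generic budget-paced UCB sketch: optimistic (upper) bounds on reward, optimistic (lower) bounds on cost, and a per-round pace equal to remaining budget divided by remaining rounds. Bernoulli rewards and costs, the pacing rule, and the fallback choice are all assumptions; the sketch omits the paper's demand prediction and soft latency constraint and is not its algorithm.

```python
import math
import random


def budget_paced_ucb(arms, horizon, budget, seed=0):
    """arms[i] = (success_prob, cost_prob): Bernoulli reward and Bernoulli cost in {0, 1}."""
    rng = random.Random(seed)
    k = len(arms)
    pulls, reward_sum, cost_sum = [0] * k, [0.0] * k, [0.0] * k
    spent = 0.0
    for t in range(1, horizon + 1):
        if budget - spent < 1.0:
            break  # the next pull could cost 1 and break the hard budget
        if t <= k:
            arm = t - 1  # initialise: try every model once
        else:
            pace = (budget - spent) / (horizon - t + 1)  # affordable spend per remaining round
            width = [math.sqrt(2.0 * math.log(t) / pulls[i]) for i in range(k)]
            ucb_reward = [reward_sum[i] / pulls[i] + width[i] for i in range(k)]
            lcb_cost = [max(0.0, cost_sum[i] / pulls[i] - width[i]) for i in range(k)]
            feasible = [i for i in range(k) if lcb_cost[i] <= pace]
            if not feasible:  # nothing looks affordable: fall back to the cheapest-looking model
                feasible = [min(range(k), key=lambda i: lcb_cost[i])]
            arm = max(feasible, key=lambda i: ucb_reward[i])
        p_reward, p_cost = arms[arm]
        reward = 1.0 if rng.random() < p_reward else 0.0
        cost = 1.0 if rng.random() < p_cost else 0.0
        pulls[arm] += 1
        reward_sum[arm] += reward
        cost_sum[arm] += cost
        spent += cost
    return pulls, spent
```

With an accurate but expensive model `(0.9, 0.9)` and a cheaper one `(0.6, 0.1)` under a tight budget, the pacing rule keeps most traffic on the cheap model and only shifts to the expensive one when unspent budget accumulates.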

2606.17524 2026-06-17 cs.LG 新提交

Learning to Refine Hidden States for Reliable LLM Reasoning

学习精炼隐藏状态以实现可靠的LLM推理

Chia-Hsuan Hsu, Jui-Ming Yao

发表机构 * Tongyu0924

AI总结 提出ReLAR框架,通过强化学习引导的潜在状态精炼,自适应调整推理步数和方向,提升复杂多步推理的准确性和稳定性,降低推理开销。

Comments Code is available at tongyu0924/Learning-to-Refine-Hidden-States

详情
AI中文摘要

大型语言模型展现出强大的推理能力,但在复杂的多步设置中,其内部推理过程可能不稳定,早期隐藏状态错误可能传播到错误预测。我们提出ReLAR,一种强化引导的潜在精炼框架,在解码前迭代更新隐藏表示。ReLAR维护一个紧凑的潜在推理状态,并使用学习到的深度和动作控制器自适应地确定精炼步骤的数量和方向。控制器基于逐步似然改进的策略梯度目标进行训练,实现了高效的输入依赖推理,无需显式的思维链生成。在医学、数学、多跳推理和开放式生成基准上的实验表明,ReLAR提高了准确性、生成质量和推理稳定性,同时推理开销显著低于显式推理基线。

英文摘要

Large language models show strong reasoning ability, but their internal reasoning process can remain unstable in complex multi-step settings, where early hidden-state errors may propagate to incorrect predictions. We propose ReLAR, a reinforcement-guided latent refinement framework that iteratively updates hidden representations before decoding. ReLAR maintains a compact latent reasoning state and uses learned depth and action controllers to adaptively determine both the number and direction of refinement steps. The controllers are trained with a policy gradient objective based on step-wise likelihood improvement, enabling efficient input-dependent reasoning without explicit chain-of-thought generation. Experiments on medical, mathematical, multi-hop reasoning, and open-ended generation benchmarks show that ReLAR improves accuracy, generation quality, and reasoning stability with substantially lower inference overhead than explicit reasoning baselines.

2606.17545 2026-06-17 cs.LG q-fin.CP q-fin.PR 新提交

Continuous-time Optimal Stopping through Deep Reinforcement Learning

通过深度强化学习的连续时间最优停止

Cosmin Borsa, Michael Ludkovski

发表机构 * Department of Statistics & Applied Probability, UC Santa Barbara(加州大学圣塔芭芭拉分校统计与应用概率系)

AI总结 提出CARLOS算法,利用聚合深度神经网络学习任意精细时间分辨率下的停止规则,通过渐进式时间网格细化和自适应采样,逼近美式期权价格上界。

Comments 33 pages

详情
AI中文摘要

基于仿真的最优停止问题求解器必须离散化停止决策。在经典动态规划下,粗网格(只有少数停止机会)会显著低估最优期望回报,而在极细网格上,近似误差通过反向递归累积。为消除这一限制,我们开发了一种新的强化学习启发算法,能够在任意精细时间分辨率下学习停止规则。我们的CARLOS(连续时间自适应强化学习最优停止)算法利用聚合深度神经网络(ADNN)学习联合时空决策边界。从粗时间网格开始,我们逐步增加停止机会的频率,同时并行训练ADNN以精化其时机-价值估计。此外,我们设计了一种自适应采样策略,逐渐将训练集中到停止边界附近。基准测试结果表明,CARLOS相比现有百慕大求解器提供更高的价格,接近美式上界,并且相对于非RL比较器实现了高计算效率。

英文摘要

Simulation-based solvers for optimal stopping problems must discretize the stopping decision. Under classical dynamic programming, a coarse exercise grid with only a few stopping opportunities can materially undervalue the optimal expected reward, whereas on a very fine grid, approximation errors accumulate through the backward recursion. To remove this limitation, we develop a new reinforcement-learning-inspired algorithm that learns the exercise rule at arbitrarily fine time resolution. Our CARLOS (Continuous-time Adaptive Reinforcement Learning for Optimal Stopping) algorithm utilizes an aggregate deep neural network (ADNN) to learn a joint space-time decision boundary. Starting from a coarse time grid, we progressively increase the frequency of stopping opportunities, while in parallel training the ADNN to refine its timing-value estimates. We moreover design an adaptive sampling strategy that gradually concentrates training effort near the stopping boundary. Benchmarked results show that CARLOS delivers higher prices than existing Bermudan solvers, approaching the American upper bound, and achieves high computational efficiency relative to non-RL comparators.
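The first claim above (a coarse exercise grid undervalues the stopping problem) can be checked on a standard Cox-Ross-Rubinstein binomial tree, where exercise is permitted only every `exercise_every` steps. This is a textbook lattice pricer for a Bermudan put, used purely to illustrate the discretization effect; it is not CARLOS, and the market parameters are arbitrary.

```python
import math


def bermudan_put(S0, K, r, sigma, T, steps, exercise_every):
    """CRR binomial price of a put that may be exercised only on every `exercise_every`-th step."""
    dt = T / steps
    u = math.exp(sigma * math.sqrt(dt))
    d = 1.0 / u
    disc = math.exp(-r * dt)
    p = (math.exp(r * dt) - d) / (u - d)  # risk-neutral probability of an up move
    # Payoff at maturity; node j has j up moves.
    values = [max(K - S0 * u**j * d**(steps - j), 0.0) for j in range(steps + 1)]
    for i in range(steps - 1, -1, -1):
        # Continuation value by backward induction.
        values = [disc * (p * values[j + 1] + (1 - p) * values[j]) for j in range(i + 1)]
        if i > 0 and i % exercise_every == 0:  # early exercise allowed only on the coarse grid
            values = [max(v, K - S0 * u**j * d**(i - j)) for j, v in enumerate(values)]
    return values[0]


european = bermudan_put(100, 100, 0.05, 0.2, 1.0, 200, exercise_every=200)  # maturity only
coarse = bermudan_put(100, 100, 0.05, 0.2, 1.0, 200, exercise_every=50)     # 3 early dates
fine = bermudan_put(100, 100, 0.05, 0.2, 1.0, 200, exercise_every=1)        # near-American
```

Because each finer grid contains the coarser one, the prices are ordered `fine >= coarse >= european`; the gap between them is the undervaluation a coarse exercise grid causes.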

2606.17551 2026-06-17 cs.LG cs.AI 新提交

Reversal Q-Learning

逆向Q学习

Aditya Oberai, Seohong Park, Sergey Levine

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 提出逆向Q学习(RQL)算法,通过扩展MDP框架和逆向流生成虚拟在线轨迹,结合偏差-方差缩减技术,实现基于流策略的离线强化学习,在50个机器人任务中取得最佳平均性能。

详情
AI中文摘要

迭代生成建模技术(如流匹配)为建模复杂行为以进行有效的离线强化学习(RL)提供了强大工具。在这项工作中,我们提出了一种新的离策略RL算法,该算法基于先验数据训练流策略。我们的想法始于“扩展”马尔可夫决策过程(MDP)框架,该框架将单个流细化步骤视为MDP中的独立动作。为了在该框架中实现离策略RL,我们应用了两种技术:我们通过“逆向”流生成虚拟在线轨迹,使该框架与先验数据兼容;并应用偏差-方差缩减技术来缓解离策略RL中的视界诅咒。我们将由此产生的算法称为逆向Q学习(RQL)。RQL相比先前基于流的RL方法具有若干优势:它不受时间反向传播的影响,更好地利用学习到的价值函数,并直接训练完整的、富有表现力的流策略。通过在50个具有挑战性的模拟机器人任务上的实验,我们表明,与最先进的基于流的离线RL算法相比,RQL实现了最佳的平均离线RL性能。

英文摘要

Iterative generative modeling techniques, such as flow matching, provide powerful tools to model complex behaviors for effective offline reinforcement learning (RL). In this work, we propose a new off-policy RL algorithm that trains a flow policy based on prior data. Our idea starts from the "expanded" Markov decision process (MDP) framework, which treats individual flow refinement steps as separate actions in an MDP. To enable off-policy RL within this framework, we apply two techniques: we generate virtual on-policy trajectories (by "reversing" flows) to make this framework compatible with prior data, and we apply a bias-and-variance reduction technique to mitigate the curse of horizon in off-policy RL. We call the resulting algorithm Reversal Q-learning (RQL). RQL has several advantages over previous flow-based RL methods: it does not suffer from backpropagation through time, makes better use of the learned value function, and directly trains the full, expressive flow policy. Through our experiments on 50 challenging simulated robotic tasks, we show that RQL leads to the best average offline RL performance compared to state-of-the-art flow-based offline RL algorithms.

2606.17680 2026-06-17 cs.LG cs.CL 新提交

EnvRL: Learn from Environment Dynamics in Agentic Reinforcement Learning

EnvRL: 在智能体强化学习中从环境动力学中学习

Zhitong Wang, Songze Li, Hao Peng, Shuzheng Si, Yi Wang, Maosong Sun, Juanzi Li

发表机构 * Department of Computer Science and Technology, Tsinghua University(清华大学计算机科学与技术系) Shanghai AI Laboratory(上海人工智能实验室)

AI总结 提出EnvRL框架,通过状态预测和逆动力学两个辅助目标,将环境动力学学习融入智能体强化学习,在长周期任务中显著提升成功率。

详情
AI中文摘要

强化学习已成为训练大型语言模型作为智能体的强大范式。然而,针对长周期智能体任务的常规强化学习方法往往难以处理稀疏的结果奖励。直观上,这忽略了展开交互轨迹中包含的丰富环境动力学信息。我们认为交互体验本身固有地充当隐式监督信号,揭示了环境的潜在转换机制,并使智能体能够构建更准确的环境内部模型。因此,在这项工作中,我们研究了如何利用这一额外信号来改进策略学习。具体来说,我们提出了EnvRL,一个通过两个辅助目标(状态预测和逆动力学)将环境动力学学习融入智能体强化学习的框架。通过与主要强化学习目标联合优化,我们鼓励智能体从其自身的交互体验中内化环境动力学。在两个长周期智能体基准上的大量实验表明,EnvRL在成功率上比仅使用强化学习的基线有显著提升,例如,当使用GRPO训练时,在ALFWorld上将Qwen-2.5-1.5B-Instruct从72.8%提升到77.4%,在WebShop上从56.8%提升到67.0%。

英文摘要

Reinforcement learning (RL) has emerged as a powerful paradigm for training Large Language Models (LLMs) as agents. However, conventional RL methods for long-horizon agentic tasks often struggle with sparse outcome rewards. In doing so, they overlook the rich environment-dynamics information contained in rollout interaction trajectories. We argue that the interaction experience inherently serves as an implicit supervision signal: it reveals the underlying transition mechanisms of the environment and enables the agent to construct a more accurate internal model of the environment. Therefore, in this work, we investigate how to leverage this additional signal to improve policy learning. Specifically, we propose EnvRL, a framework that incorporates environment dynamics learning into agentic RL via two auxiliary objectives: state prediction and inverse dynamics. By jointly optimizing with the primary RL objective, we encourage the agent to internalize environment dynamics from its own interaction experience. Extensive experiments on two long-horizon agentic benchmarks demonstrate that EnvRL achieves significant improvements in success rates over RL-only baselines, e.g., when trained with GRPO, lifting Qwen-2.5-1.5B-Instruct from 72.8% to 77.4% on ALFWorld, and from 56.8% to 67.0% on WebShop.

2606.18106 2026-06-17 cs.LG 新提交

Deep Reinforcement Learning for Minimum Zero-Forcing Sets

深度强化学习用于最小零强制集

Steve Halley, Maurício Gruppi

发表机构 * Department of Computing Sciences, Villanova University(维拉诺瓦大学计算科学系)

AI总结 提出一种基于强化学习的框架SD-ZFS,通过改进S2V-DQN架构求解最小零强制集问题,在多种图结构上验证了其优于贪心启发式算法。

详情
AI中文摘要

本文探讨了在无向图中寻找最小零强制集的问题,并提出了一种自适应的机器学习框架来解决该问题。最小零强制集问题是一种图着色问题,其中初始节点集的颜色在整个网络中传播。如果节点集在颜色变化规则的约束下迫使所有未着色节点改变颜色,则该节点集是零强制集。该问题在网络科学、网络控制和逻辑电路设计等不同领域有多种应用。寻找最小零强制集已被证明是NP难的。我们提出了一种强化学习框架SD-ZFS,该框架将S2V-DQN架构适配到ZFS问题。我们在该适配框架上训练了多个模型,并分析了在不同结构图数据集上的性能。我们评估了在该框架上训练的模型在不同网络类型上的泛化、扩展和迁移能力。结果表明,与最优解和贪心启发式算法相比,该框架是有效的。我们进一步深入了解了如何通过机器学习解决ZFS问题以及网络结构对该问题的影响。

英文摘要

This paper explores the problem of finding the minimum zero-forcing set on undirected graphs and proposes an adapted machine-learning framework to solve the problem. The minimum zero-forcing set problem is a graph coloring problem where the color of an initial set of nodes propagates throughout a network. The set of nodes is zero-forcing if it forces all uncolored nodes to change color under the constraint of the color-change rule. This problem has several applications across different domains, such as network science, network control, and logical circuit design. Finding the minimum zero-forcing set has been shown to be NP-hard. We propose a reinforcement learning framework, SD-ZFS, that adapts the S2V-DQN architecture to the ZFS problem. We train several models on this adapted framework and analyze the performance across graph datasets that have varying structures. We evaluate how the models trained on the framework generalize, scale, and transfer to different network types. The results demonstrate the effectiveness of the framework when compared against the optimal solution and a greedy heuristic. We provide further insight into how the ZFS problem can be solved through machine learning and into the influence of network structure on the problem.
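The color-change rule is easy to state in code: a colored vertex with exactly one uncolored neighbor forces that neighbor to become colored, and a set is zero-forcing if repeated forcing colors every vertex. The sketch below implements this closure, an exhaustive search for the minimum set (exponential, so only for tiny graphs), and a simple greedy baseline that adds the vertex whose inclusion colors the most nodes. The greedy rule is one plausible heuristic chosen for illustration; the paper's greedy baseline and its SD-ZFS learner may differ.

```python
from itertools import combinations


def forcing_closure(adj, initial):
    """Apply the color-change rule until no vertex can force; return the colored set."""
    colored = set(initial)
    changed = True
    while changed:
        changed = False
        for v in list(colored):
            white = [w for w in adj[v] if w not in colored]
            if len(white) == 1:  # v has a unique uncolored neighbor, so it forces it
                colored.add(white[0])
                changed = True
    return colored


def is_zero_forcing(adj, s):
    return len(forcing_closure(adj, s)) == len(adj)


def min_zero_forcing_brute(adj):
    """Exact minimum by exhaustive search (NP-hard in general; tiny graphs only)."""
    nodes = sorted(adj)
    for k in range(len(nodes) + 1):
        for s in combinations(nodes, k):
            if is_zero_forcing(adj, s):
                return set(s)


def greedy_zero_forcing(adj):
    """Greedy heuristic: repeatedly add the vertex that maximizes the forced closure."""
    s = set()
    while not is_zero_forcing(adj, s):
        closure = forcing_closure(adj, s)
        candidates = [v for v in adj if v not in closure]
        s.add(max(candidates, key=lambda v: len(forcing_closure(adj, s | {v}))))
    return s


path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
```

For example, one endpoint of a path forces the whole path, so the path's zero forcing number is 1, while a star with three leaves needs two of its leaves.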

2606.18111 2026-06-17 cs.LG cs.AI 新提交

Learning Fair Pareto-Optimal Policies in Multi-Objective Reinforcement Learning

多目标强化学习中学习公平帕累托最优策略

Umer Siddique, Peilang Li, Yongcan Cao

AI总结 针对多目标强化学习中固定用户偏好无法提供多样化策略的问题,提出基于广义基尼福利函数的多策略方法,学习公平帕累托最优策略集。

Comments Accepted at the Reinforcement Learning Conference (RLC) 2025. 12 pages main + appendix, 8 figures, 4 tables

详情
AI中文摘要

公平性是多目标强化学习(MORL)决策中的一个重要方面,策略必须确保在多个潜在冲突的目标上既达到最优又实现公平。虽然单策略MORL方法可以使用福利函数(如广义基尼福利函数GGF)为固定的用户偏好学习公平策略,但它们无法提供动态或未知用户偏好所需的多样的策略集。为解决这一局限性,我们形式化了多策略MORL中的公平优化问题,其目标是学习一组帕累托最优策略,确保在所有可能的用户偏好下实现公平。我们的关键技术贡献有三点:(1)我们证明对于凹的、分段线性的福利函数(例如GGF),公平策略仍然在凸覆盖集(CCS)中,CCS是线性标量化下的近似帕累托前沿。(2)我们证明非平稳策略(通过累积奖励历史增强)和随机策略通过动态适应历史不公平性来改善公平性。(3)我们提出了三种新算法,包括将GGF与多策略多目标Q学习(MOQL)集成、用于学习非平稳策略的状态增强多策略MOQL,以及用于学习随机策略的新扩展。我们在多个领域评估了我们的算法,并将我们的方法与最先进的MORL基线进行了比较。实验结果表明,我们的方法学习了一组公平策略,能够适应不同的用户偏好。

英文摘要

Fairness is an important aspect of decision-making in multi-objective reinforcement learning (MORL), where policies must ensure both optimality and equity across multiple, potentially conflicting objectives. While single-policy MORL methods can learn fair policies for fixed user preferences using welfare functions such as the generalized Gini welfare function (GGF), they fail to provide the diverse set of policies necessary for dynamic or unknown user preferences. To address this limitation, we formalize the fair optimization problem in multi-policy MORL, where the goal is to learn a set of Pareto-optimal policies that ensure fairness across all possible user preferences. Our key technical contributions are threefold: (1) We show that for concave, piecewise-linear welfare functions (e.g., GGF), fair policies remain in the convex coverage set (CCS), which is an approximate Pareto front for linear scalarization. (2) We demonstrate that non-stationary policies, augmented with accrued reward histories, and stochastic policies improve fairness by dynamically adapting to historical inequities. (3) We propose three novel algorithms, which include integrating GGF with multi-policy multi-objective Q-Learning (MOQL), state-augmented multi-policy MOQL for learning non-stationary policies, and its novel extension for learning stochastic policies. We evaluate our algorithms across various domains and compare our methods against the state-of-the-art MORL baselines. The empirical results show that our methods learn a set of fair policies that accommodate different user preferences.
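The generalized Gini welfare function mentioned above sorts the objective values in ascending order and takes a weighted sum with non-increasing weights, so the worst-off objective always receives the largest weight. The minimal sketch below uses arbitrary example weights and contrasts GGF with a plain sum, which is a linear scalarization with equal weights.

```python
def ggf(utilities, weights):
    """Generalized Gini welfare: non-increasing weights applied to utilities sorted ascending."""
    if any(w1 < w2 for w1, w2 in zip(weights, weights[1:])):
        raise ValueError("GGF weights must be non-increasing")
    return sum(w * u for w, u in zip(weights, sorted(utilities)))


weights = [1.0, 0.5, 0.25]   # example weights, not taken from the paper
unequal = [3.0, 1.0, 2.0]    # per-objective returns of one policy
equal = [2.0, 2.0, 2.0]      # same total return, spread evenly
```

Both vectors have the same sum (6.0), so a linear scalarization with equal weights cannot tell them apart, but `ggf(equal, weights)` is 3.5 versus 2.75 for `ggf(unequal, weights)`. Because GGF is the minimum of linear functions over the permutations of the weights, it is concave and piecewise linear, which is the property the CCS result above relies on.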

2606.17847 2026-06-17 cs.AI cs.LG 交叉投稿

WallZero: Mastering the Game of WallGo with Strategic Analysis

WallZero:通过战略分析掌握WallGo游戏

Hsing-Yu Chen, Jérôme Arjonilla, I-Chen Wu, Ti-Rong Wu

发表机构 * National Yang Ming Chiao Tung University(国立阳明交通大学) Academia Sinica(中央研究院)

AI总结 提出基于AlphaZero的WallZero智能体,通过定制动作和特征设计,在WallGo游戏中击败职业围棋选手,并分析游戏公平性与关键策略。

Comments Accepted by the Computers and Games conference (CG 2026)

详情
AI中文摘要

WallGo是一种最近引入的战略棋盘游戏,因2025年Netflix系列剧《The Devil's Plan》而流行。尽管在7x7的小棋盘上进行,但其石头移动和墙壁放置的组合导致了高游戏树复杂性和复杂的战略互动。尽管其日益流行,WallGo仍未得到充分探索。本文提出了WallZero,一个基于AlphaZero的双人WallGo设置智能体。我们引入了定制的动作和特征设计,以显著提高游戏性能。在评估中,WallZero击败了参与本研究的两位职业围棋选手,平均每局获得1.98倍的地盘。除了其强度,我们使用WallZero评估游戏公平性并识别掌握WallGo的关键策略。有趣的是,我们的结果显示,Netflix系列剧中使用的开局产生了更平衡的游戏。我们的代码可在 https://rlg.iis.sinica.edu.tw/papers/wallzero 获取。

英文摘要

WallGo is a recently introduced strategic board game popularized by the 2025 Netflix series The Devil's Plan. Although played on a small 7 x 7 board, its combination of stone movement and wall placement yields high game-tree complexity and intricate strategic interactions. Despite its growing popularity, WallGo remains underexplored. This paper presents WallZero, an AlphaZero-based agent for the two-player WallGo setting. We introduce tailored action and feature designs to improve playing performance significantly. In the evaluation, WallZero defeats two professional Go players who participated in this study, securing on average 1.98x more territory per game. Beyond its strength, we use WallZero to assess game fairness and identify key strategies for mastering WallGo. Interestingly, our results show that the opening used in the Netflix series yields a more balanced game. Our code is available at https://rlg.iis.sinica.edu.tw/papers/wallzero.

2502.17518 2026-06-17 cs.LG cs.AI q-fin.CP stat.ML 版本更新

Ensemble RL through Classifier Models: Enhancing Risk-Return Trade-offs in Trading Strategies

通过分类器模型进行集成强化学习:在交易策略中增强风险回报权衡

Zheli Xiong

AI总结 本文研究了在金融交易策略中使用集成强化学习模型的全面研究,利用分类器模型来提升性能。通过将A2C、PPO和SAC等强化学习算法与传统分类器如支持向量机(SVM)、决策树和逻辑回归相结合,探讨不同分类器组如何整合以改善风险回报权衡。研究评估了各种集成方法的有效性,将其与单个强化学习模型在关键金融指标(包括累计回报率、夏普比率(SR)、卡勒姆比率和最大回撤(MDD))上进行比较。结果表明,集成方法在风险调整后的回报方面始终优于基础模型,提供了更好的回撤管理和整体稳定性。然而,我们发现集成性能对方差阈值τ的选择敏感,强调了动态调整τ以达到最佳性能的重要性。本研究强调了将强化学习与分类器结合在自适应决策中的价值,对金融交易、机器人和其他动态环境具有启示。

Comments 23 pages, 10 figures, 9 tables

详情
AI中文摘要

本文提出了一项全面研究,探讨在金融交易策略中使用集成强化学习(RL)模型的应用,利用分类器模型来提升性能。通过结合A2C、PPO和SAC等强化学习算法与传统分类器如支持向量机(SVM)、决策树和逻辑回归,我们研究了不同分类器组如何整合以改善风险回报权衡。研究评估了各种集成方法的有效性,将其与单个RL模型在关键金融指标(包括累计回报率、夏普比率(SR)、卡勒姆比率和最大回撤(MDD))上进行比较。我们的结果表明,集成方法在风险调整后的回报方面始终优于基础模型,提供了更好的回撤管理和整体稳定性。然而,我们发现集成性能对方差阈值τ的选择敏感,强调了动态调整τ以达到最佳性能的重要性。本研究强调了将强化学习与分类器结合在自适应决策中的价值,对金融交易、机器人和其他动态环境具有启示。

英文摘要

This paper presents a comprehensive study on the use of ensemble Reinforcement Learning (RL) models in financial trading strategies, leveraging classifier models to enhance performance. By combining RL algorithms such as A2C, PPO, and SAC with traditional classifiers like Support Vector Machines (SVM), Decision Trees, and Logistic Regression, we investigate how different classifier groups can be integrated to improve risk-return trade-offs. The study evaluates the effectiveness of various ensemble methods, comparing them with individual RL models across key financial metrics, including Cumulative Returns, Sharpe Ratios (SR), Calmar Ratios, and Maximum Drawdown (MDD). Our original experimental results demonstrate that ensemble methods often outperform base models in terms of risk-adjusted returns, providing better management of drawdowns and overall stability. However, both the original analysis and the additional reproduction reported in this version show that ensemble performance is sensitive to the choice of variance threshold \(τ\), classifier group, RL-agent pair, and market universe. The reproduction evidence strengthens the conclusion that classifier-assisted ensemble selection can improve robustness, while also clarifying that the advantage is conditional rather than automatic across all datasets. This study emphasizes the value of combining RL with classifiers for adaptive decision-making, with implications for financial trading, robotics, and other dynamic environments.
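For readers reproducing the comparison, the metrics named above have standard definitions. The sketch below implements them under common conventions that are assumptions here (252 trading periods per year, sample standard deviation, Calmar ratio as annualized growth over maximum drawdown); the paper's evaluation code may use different conventions.

```python
import math


def cumulative_return(equity):
    return equity[-1] / equity[0] - 1.0


def sharpe_ratio(returns, periods_per_year=252, risk_free=0.0):
    """Annualized mean excess return over its sample standard deviation."""
    excess = [r - risk_free / periods_per_year for r in returns]
    mean = sum(excess) / len(excess)
    var = sum((x - mean) ** 2 for x in excess) / (len(excess) - 1)
    return math.sqrt(periods_per_year) * mean / math.sqrt(var)


def max_drawdown(equity):
    """Largest peak-to-trough loss, as a fraction of the running peak."""
    peak, mdd = equity[0], 0.0
    for v in equity:
        peak = max(peak, v)
        mdd = max(mdd, (peak - v) / peak)
    return mdd


def calmar_ratio(equity, periods_per_year=252):
    """Compound annual growth rate divided by maximum drawdown."""
    years = (len(equity) - 1) / periods_per_year
    cagr = (equity[-1] / equity[0]) ** (1.0 / years) - 1.0
    mdd = max_drawdown(equity)
    return cagr / mdd if mdd > 0 else float("inf")
```

For an equity curve `[100, 120, 90, 130, 104]`, the maximum drawdown is 0.25 (the fall from 120 to 90), even though the later fall from 130 to 104 is larger in absolute terms.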

2506.03802 2026-06-17 cs.LG 版本更新

Learning in Matching Games with Bandit Feedback

带强盗反馈的匹配博弈学习

Andreas Athanasopoulos, Christos Dimitrakakis

AI总结 研究广义双边匹配市场中代理通过零和博弈交互的学习问题,提出基于UCB的算法,以匹配不稳定性为遗憾度量,实现次线性遗憾上界。

Comments 22 pages, 2 figures

详情
AI中文摘要

我们在广义双边匹配市场中引入了一个学习问题,其中代理选择行动以与其匹配对象进行交互。具体来说,我们考虑一个场景,其中匹配的代理参与具有初始未知收益矩阵的零和博弈,并研究集中式程序是否可以从强盗反馈中学习均衡。我们采用\emph{匹配均衡}的解概念,其中匹配\( \mathfrak{m} \)和一组代理策略\( X \)构成均衡,如果没有代理有动机偏离\( (\mathfrak{m}, X) \)。为了量化候选解\( (\mathfrak{m}, X) \)与均衡\( (\mathfrak{m}^\star, X^\star) \)的偏差,我们引入了\emph{匹配不稳定性}的概念,它作为学习问题的遗憾度量。我们提出了一种基于UCB的算法,其中代理根据收益的乐观估计形成偏好并选择行动。我们的分析建立了一个次线性、实例无关的遗憾上界,并得到了经验证据的进一步支持。

英文摘要

We introduce a learning problem in a generalized two-sided matching market, where agents select actions to interact with their match. Specifically, we consider a setting in which matched agents engage in zero-sum games with initially unknown payoff matrices, and we investigate whether a centralized procedure can learn an equilibrium from bandit feedback. We adopt the solution concept of a \emph{matching equilibrium}, where a matching \( \mathfrak{m} \) and a set of agent strategies \( X \) form an equilibrium if no agent has an incentive to deviate from \( (\mathfrak{m}, X) \). To quantify deviations of a candidate solution \( (\mathfrak{m}, X) \) from the equilibrium \( (\mathfrak{m}^\star, X^\star) \), we introduce the notion of \emph{matching instability}, which serves as a regret measure for the learning problem. We propose a UCB-based algorithm in which agents form preferences and select actions according to optimistic estimates of the payoffs. Our analysis establishes a sublinear, instance-independent regret upper bound, further supported by empirical evidence.

2602.06014 2026-06-17 cs.LG cs.AI math.OC math.ST stat.ML stat.TH 版本更新

Optimism Stabilizes Thompson Sampling for Adaptive Inference

乐观主义稳定自适应推断的汤普森采样

Shunxing Yan, Han Zhong

AI总结 本文通过引入乐观机制(如方差膨胀或均值奖励)稳定汤普森采样,使得各臂拉取次数收敛于确定性尺度,从而在K臂随机bandit中实现渐近有效的Wald推断,并解决了多最优臂的扩展问题。

Comments Accepted in part to COLT 2026

详情
AI中文摘要

汤普森采样(TS)广泛用于随机多臂老虎机,但其在自适应数据收集下的推断性质微妙。样本均值的经典渐近理论可能失效,因为臂特定样本量是随机的,并通过动作选择规则与奖励耦合。我们研究了具有高斯随机指数的K臂随机bandit中汤普森采样的自适应推断,其中奖励噪声为独立次高斯,并确定乐观主义是恢复稳定性的关键机制,即每个臂的拉取次数集中在确定性尺度附近。这种稳定性使得尽管自适应采样,仍能获得渐近有效的Wald推断。首先,我们证明方差膨胀的TS对任意K≥2是稳定的,包括多个臂最优的挑战性情况,对最优臂具有渐近均匀分配,对次优臂具有尖锐的对数拉取次数渐近性。这解决了Halder等人提出的K臂扩展问题,使用新的胜者图和Lyapunov漂移技术来控制多个最优臂之间的分配。其次,我们分析了一种替代的乐观修改,保持高斯指数方差不变但向指数中心添加显式均值奖励,并建立了类似的稳定性结论。总之,适当实施的乐观主义稳定了汤普森采样,并在多臂老虎机中实现了渐近有效的Wald推断,同时仅产生轻微额外的遗憾代价。

英文摘要

Thompson sampling (TS) is widely used for stochastic multi-armed bandits, yet its inferential properties under adaptive data collection are subtle. Classical asymptotic theory for sample means can fail because arm-specific sample sizes are random and coupled with the rewards through the action-selection rule. We study adaptive inference for Thompson sampling with Gaussian randomized indices in $K$-armed stochastic bandits with independent sub-Gaussian reward noises, and identify \emph{optimism} as a key mechanism for restoring \emph{stability}, meaning that each arm's pull count concentrates around a deterministic scale. This stability yields asymptotically valid Wald inference despite adaptive sampling. First, we prove that variance-inflated TS is stable for any $K \ge 2$, including the challenging regime where multiple arms are optimal, with asymptotically uniform allocation over optimal arms and sharp logarithmic pull-count asymptotics for suboptimal arms. This resolves the $K$-armed extension question raised by Halder et al. (2025), using new winner-map and Lyapunov-drift techniques to control allocation among multiple optimal arms. Second, we analyze an alternative optimistic modification that keeps the Gaussian index variance unchanged but adds an explicit mean bonus to the index center, and establish a similar stability conclusion. In summary, suitably implemented optimism stabilizes Thompson sampling and enables asymptotically valid Wald inference in multi-armed bandits, while incurring only a mild additional regret cost.
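The two optimistic modifications can be written as a single Gaussian-index sampler. In this sketch, `inflation > 1` multiplies the index standard deviation (variance inflation) and `bonus > 0` shifts the index center upward. The unit-variance Gaussian reward noise, the bonus shape $\sqrt{\log t / n}$, and both parameter names are assumptions chosen for illustration, not the paper's exact constants. `wald_interval` is the normal-approximation interval whose validity the stability result is meant to guarantee.

```python
import math
import random


def gaussian_ts(means, horizon, inflation=1.0, bonus=0.0, seed=0):
    """Gaussian Thompson sampling on a K-armed bandit with N(mean, 1) rewards.

    inflation > 1 widens every randomized index (variance-inflated TS);
    bonus > 0 instead shifts the index centre upward (explicit mean bonus).
    """
    rng = random.Random(seed)
    k = len(means)
    pulls, sums = [0] * k, [0.0] * k
    for t in range(horizon):
        if t < k:
            arm = t  # pull each arm once so every sample mean is defined
        else:
            index = []
            for i in range(k):
                centre = sums[i] / pulls[i] + bonus * math.sqrt(math.log(t + 1) / pulls[i])
                index.append(rng.gauss(centre, inflation / math.sqrt(pulls[i])))
            arm = max(range(k), key=lambda i: index[i])
        pulls[arm] += 1
        sums[arm] += means[arm] + rng.gauss(0.0, 1.0)
    estimates = [sums[i] / pulls[i] if pulls[i] else float("nan") for i in range(k)]
    return pulls, estimates


def wald_interval(mean, count, z=1.96, noise_sd=1.0):
    """Normal-approximation confidence interval for one arm's mean reward."""
    half = z * noise_sd / math.sqrt(count)
    return mean - half, mean + half
```

Under plain TS the random pull counts can stay entangled with the observed rewards, which is what breaks this interval; the result above says that suitable optimism makes the counts concentrate, so the usual interval becomes asymptotically valid.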

2602.23116 2026-06-17 cs.LG cs.GT stat.ML 版本更新

Provably Efficient Regularized Online RLHF with Generalized Bilinear Preferences

具有广义双线性偏好的可证明高效正则化在线RLHF

Junghyun Lee, Minju Hong, Kwang-Sung Jun, Chulhee Yun, Se-Young Yun

AI总结 研究在线RLHF中正则化最佳响应最大遗憾最小化问题,通过广义双线性偏好模型证明强凸性可导出多对数遗憾,表明快速遗憾不限于KL散度。

Comments 48 pages, 3 figures (ver3: major revisions; ver2: more colorful boxes, fixed some typos)

详情
AI中文摘要

我们考虑在一般偏好和bandit反馈下在线RLHF中的正则化最佳响应最大遗憾最小化问题。虽然各种正则化器被用于增强对齐的鲁棒性,但已知的多对数遗憾保证仍然高度特定于KL。为了研究这种快速速率是否扩展到KL之外,我们采用广义双线性偏好模型(GBPM)——通过一个秩为$2r$的斜对称矩阵捕获$d$维逐项特征上的非传递偏好——以隔离一般正则化的影响。关键地,在GBPM下,我们证明任何贪婪策略的对偶间隙受限于平方估计误差,该误差仅利用强凸性和斜对称性导出。在特征覆盖假设下,我们通过贪婪采样建立了$\tilde{\mathcal{O}}(\eta d^4 C_{\min}^{-1} (\log T)^2 \wedge d^2 C_{\min}^{-1/2} \sqrt{T})$的通用多对数遗憾,并通过探索后提交(Explore-Then-Commit)建立了$\tilde{\mathcal{O}}(C_{\min}^{-2} \sqrt{\eta r T} \wedge r^{1/3} C_{\min}^{-4/3} T^{2/3})$的维度改进遗憾(对于条件良好的臂集),其中$\eta^{-1}$是正则化系数,$T$是时间范围,$C_{\min}$是依赖于臂集的量。这表明“快速”遗憾并非KL特有,而是通用强凸几何的基本结果。

英文摘要

We consider the problem of regularized best-response max-regret minimization in online RLHF under general preferences and bandit feedback. While various regularizers are utilized to robustify alignment, known polylogarithmic regret guarantees remain heavily specific to KL. To investigate whether such fast rates extend beyond KL, we adopt the Generalized Bilinear Preference Model (GBPM) -- capturing intransitive preferences over $d$-dimensional item-wise features via a rank-$2r$ skew-symmetric matrix -- to isolate the impact of generic regularization. Crucially, under GBPM, we prove that the dual gap of any greedy policy is bounded by the squared estimation error, derived using \emph{only} strong convexity and skew-symmetry. Under a feature coverage assumption, we establish a \emph{generic} polylogarithmic regret of $\tilde{\mathcal{O}}(\eta d^4 C_{\min}^{-1} (\log T)^2 \wedge d^2 C_{\min}^{-1/2} \sqrt{T})$ with Greedy Sampling, and a dimension-wise improved regret (for well-conditioned arm-sets) of $\tilde{\mathcal{O}}(C_{\min}^{-2} \sqrt{\eta r T} \wedge r^{1/3} C_{\min}^{-4/3} T^{2/3})$ with Explore-Then-Commit, where $\eta^{-1}$ is the regularization coefficient, $T$ is the time horizon, and $C_{\min}$ is an arm-set dependent quantity. This demonstrates that "fast" regrets are not KL-specific, but rather a fundamental consequence of generic strongly convex geometry.
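The skew-symmetric structure is what lets a bilinear model express intransitive preferences. The sketch builds $A = \sum_{k=1}^{r} (u_k v_k^\top - v_k u_k^\top)$, which is skew-symmetric with rank at most $2r$, and scores a pair as $\sigma(\phi(x)^\top A \phi(y))$. The logistic link is an assumption made here for concreteness (the abstract does not state the link function), and the rock-paper-scissors features are a toy example.

```python
import math


def skew_from_factors(us, vs):
    """A = sum_k (u_k v_k^T - v_k u_k^T): skew-symmetric, rank at most 2r."""
    d = len(us[0])
    A = [[0.0] * d for _ in range(d)]
    for u, v in zip(us, vs):
        for i in range(d):
            for j in range(d):
                A[i][j] += u[i] * v[j] - v[i] * u[j]
    return A


def pref_prob(A, fx, fy):
    """P(x preferred to y) = sigmoid(phi(x)^T A phi(y)) (logistic link assumed)."""
    s = sum(fx[i] * A[i][j] * fy[j] for i in range(len(fx)) for j in range(len(fy)))
    return 1.0 / (1.0 + math.exp(-s))


# Toy rank-2 (r = 1) model over one-hot features: rock, paper, scissors.
A = skew_from_factors([[1.0, 0.0, -1.0]], [[1.0, -1.0, 0.0]])
rock, paper, scissors = [1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]
```

Since $A^\top = -A$, swapping the two items negates the score, so `pref_prob(A, x, y) + pref_prob(A, y, x) == 1` holds automatically. Here paper beats rock, scissors beats paper, and rock beats scissors, each with probability $\sigma(1)$, a cycle that no scalar reward model can represent.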

2604.18701 2026-06-17 cs.LG cs.AI stat.ML 版本更新

Curiosity-Critic: Cumulative Prediction Error Improvement as a Tractable Intrinsic Reward for World Model Training

Curiosity-Critic:累积预测误差改进作为世界模型训练的可处理内在奖励

Vin Bhaskara, Haicheng Wang

AI总结 提出Curiosity-Critic方法,通过可处理的每步替代项(当前预测误差与渐近误差基线的差值)作为内在奖励,利用共训练的评论家在线估计误差基线,有效分离可约与不可约预测误差,在随机网格世界实验中优于现有方法。

Comments Accepted to ICML 2026 Workshop on Epistemic Intelligence in Machine Learning (EIML@ICML 2026). Code: https://github.com/vinbhaskara/Curiosity-Critic

详情
AI中文摘要

基于局部预测误差的好奇心奖励仅关注当前转移,而不考虑世界模型在所有已访问转移上的累积预测误差。我们引入了Curiosity-Critic,其内在奖励基于这一累积目标的改进,并证明它有一个可处理的每步替代项:当前预测误差与当前状态转移的渐近误差基线之间的差值。我们通过一个与世界模型共同训练的评论家在线估计这一误差基线;由于评论家只需学习一个转移的预测难度,其对不可约噪声基线的估计在世界模型饱和之前就已收敛,从而将探索引导向可学习的转移。该奖励对可学习转移较高,而对随机转移趋近于零,从而在线分离认知(可约)和偶然(不可约)预测误差。从Schmidhuber(1991)到学习特征空间变体的先前预测误差好奇心公式,都作为该误差基线的特定近似特例出现。在随机网格世界上的实验表明,Curiosity-Critic在训练速度和最终世界模型准确性上优于基于预测误差、访问计数和随机网络蒸馏的方法。

英文摘要

Local prediction-error-based curiosity rewards focus on the current transition without considering the world model's cumulative prediction error across all visited transitions. We introduce Curiosity-Critic, which grounds its intrinsic reward in the improvement of this cumulative objective, and show that it admits a tractable per-step surrogate: the difference between the current prediction error and the asymptotic error baseline of the current state transition. We estimate this error baseline online with a learned critic co-trained alongside the world model; since the critic only has to learn how hard a transition is to predict, its estimate of the irreducible noise floor converges well before the world model saturates, redirecting exploration toward learnable transitions. The reward is higher for learnable transitions and collapses toward zero for stochastic ones, thereby separating epistemic (reducible) from aleatoric (irreducible) prediction error online. Prior prediction-error curiosity formulations, from Schmidhuber (1991) to learned-feature-space variants, emerge as special cases corresponding to specific approximations of this error baseline. Experiments on a stochastic grid world show that Curiosity-Critic outperforms prediction-error, visitation-count, and Random Network Distillation methods in training speed and final world model accuracy.
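The per-step surrogate reward above is "current prediction error minus the asymptotic error baseline". In the tabular sketch below, each (state, action) key has a scalar world-model prediction updated by a running average, and the "critic" estimates the baseline as the running variance of observed targets. That variance is the irreducible squared error of a mean predictor, and it converges without waiting for the world model. The variance-based critic, the learning rate and the class name are assumptions standing in for the paper's learned critic network.

```python
import random


class TabularCuriosityCritic:
    """reward = current squared prediction error - estimated irreducible error."""

    def __init__(self, lr=0.1):
        self.lr = lr
        self.pred = {}   # world model: predicted next-state value per key
        self.stats = {}  # critic: (count, mean, M2) of observed targets per key (Welford)

    def baseline(self, key):
        n, _, m2 = self.stats.get(key, (0, 0.0, 0.0))
        return m2 / n if n > 1 else 0.0  # population variance of targets seen so far

    def step(self, key, target):
        err = (self.pred.get(key, 0.0) - target) ** 2
        # Critic update: Welford's online mean/variance of the observed targets.
        n, mean, m2 = self.stats.get(key, (0, 0.0, 0.0))
        n += 1
        delta = target - mean
        mean += delta / n
        m2 += delta * (target - mean)
        self.stats[key] = (n, mean, m2)
        reward = err - self.baseline(key)
        # World-model update: move the prediction toward the observed target.
        p = self.pred.get(key, 0.0)
        self.pred[key] = p + self.lr * (target - p)
        return reward


def demo(seed=0):
    cc = TabularCuriosityCritic(lr=0.1)
    learnable = [cc.step("deterministic", 5.0) for _ in range(300)]
    rng = random.Random(seed)
    noisy = [cc.step("coin-flip", rng.choice([-1.0, 1.0])) for _ in range(2000)]
    return learnable, noisy
```

The deterministic transition pays a large reward at first (25.0) that decays to zero as the model learns it, while the coin-flip transition's reward hovers near zero because its error never drops below the baseline the critic has already learned.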

2606.16590 2026-06-17 cs.LG cs.AI q-bio.NC 版本更新

Infant Spontaneous Movement Noise Improves Exploration in Deep RL

婴儿自发运动噪声改善深度强化学习中的探索

Francisco M. López, Markus R. Ernst, Francisco Cruz, Matej Hoffmann, and Jochen Triesch

发表机构 * Frankfurt Institute for Advanced Studies(法兰克福高等研究所) School of Computer Science and Engineering, University of New South Wales(新南威尔士大学计算机科学与工程学院) Escuela de Ingeniería, Universidad Central de Chile(智利中央大学工程学院) Faculty of Electrical Engineering, Czech Technical University(捷克理工大学电气工程学院)

AI总结 受婴儿自发运动噪声启发,提出一种在RL训练中逐步增加时间自相关的探索噪声机制,实验表明其能产生结构化探索行为并提高学习效率。

Comments 6 pages, 4 figures, 1 table. Accepted at IEEE ICDL 2026. Cite as: F. M. López, M. R. Ernst, F. Cruz, M. Hoffmann, and J. Triesch, "Infant Spontaneous Movement Noise Improves Exploration in Deep RL", in 2026 IEEE International Conference on Development and Learning (ICDL). IEEE, 2026, pp. 1-6

Abstract

Exploration in deep reinforcement learning (RL) is commonly implemented as temporally uncorrelated white noise. However, recent works show that temporally correlated colored noise can improve exploration efficiency by producing smooth trajectories with better coverage of the state space. We inquire whether action noise inspired by infant spontaneous movements can also improve exploration in deep RL. We find that the power spectral densities of babies' end-effector velocities follow a colored noise process where the spectral exponent increases with age. Inspired by this developmental pattern, we introduce a mechanism that progressively increases the temporal auto-correlation of exploration noise during RL training, matching the infant statistics. Experiments across several RL environments show that infant-inspired noise produces structured exploratory behavior and can improve learning efficiency compared to conventional exploration strategies. These findings suggest that human motor and cognitive development can provide useful guidance for designing learning mechanisms in artificial agents. Our code is available at https://github.com/trieschlab/baby-noise-rl.
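Below is a minimal sketch of scheduled colored exploration noise. It assumes the noise is produced by shaping a Gaussian spectrum so that power scales as 1/f^β, and that β is ramped linearly over training. The ramp endpoints (0 to 2) and all function names are hypothetical placeholders, not the infant-fitted exponents from the paper. For the authors' implementation, see the repository linked above.

```python
import cmath
import math
import random


def colored_noise(n: int, beta: float, rng: random.Random) -> list[float]:
    """Unit-variance noise of length n whose power spectral density ~ 1/f**beta.

    beta = 0 gives white noise; larger beta gives smoother, more autocorrelated noise.
    """
    spectrum = [0j] * n  # DC bin stays 0, so the signal has zero mean
    for k in range(1, n // 2 + 1):
        # Amplitude ~ f^(-beta/2) so that power ~ f^(-beta).
        coeff = k ** (-beta / 2.0) * complex(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        spectrum[k] = coeff
        spectrum[n - k] = coeff.conjugate()  # Hermitian symmetry -> real signal
    if n % 2 == 0:
        spectrum[n // 2] = complex(spectrum[n // 2].real, 0.0)  # Nyquist bin must be real
    # Naive O(n^2) inverse DFT; adequate for short episodes.
    signal = [
        sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)).real
        for t in range(n)
    ]
    rms = math.sqrt(sum(x * x for x in signal) / n)
    return [x / rms for x in signal]


def beta_schedule(progress: float, beta_start: float = 0.0, beta_end: float = 2.0) -> float:
    """Hypothetical linear ramp of the spectral exponent over training progress in [0, 1]."""
    progress = min(max(progress, 0.0), 1.0)
    return beta_start + (beta_end - beta_start) * progress


def lag1_autocorr(xs: list[float]) -> float:
    """Lag-1 autocorrelation: how similar consecutive noise samples are."""
    mean = sum(xs) / len(xs)
    num = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    den = sum((a - mean) ** 2 for a in xs)
    return num / den


rng = random.Random(0)
early = colored_noise(128, beta_schedule(0.0), rng)  # start of training: white
late = colored_noise(128, beta_schedule(1.0), rng)   # end of training: strongly correlated
print(round(lag1_autocorr(early), 2), round(lag1_autocorr(late), 2))
```

Raising β concentrates power at low frequencies, so consecutive actions become more correlated. The lag-1 autocorrelation printed at the end is one quick way to check this.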

2509.26633 2026-06-17 cs.RO cs.AI cs.LG cs.SY eess.SY Replacement

OmniRetarget: Interaction-Preserving Data Generation for Humanoid Whole-Body Loco-Manipulation and Scene Interaction

Lujie Yang, Xiaoyu Huang, Zhen Wu, Angjoo Kanazawa, Pieter Abbeel, Carmelo Sferrazza, C. Karen Liu, Rocky Duan, Guanya Shi

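AI summary Proposes the OmniRetarget engine, which uses an interaction mesh to explicitly model and preserve the spatial and contact relationships among the agent, terrain, and objects while retargeting human motion to robots. The resulting high-quality trajectories train RL policies that achieve long-horizon parkour and loco-manipulation skills.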

Comments Project website: https://omniretarget.github.io

Abstract

A dominant paradigm for teaching humanoid robots complex skills is to retarget human motions as kinematic references to train reinforcement learning (RL) policies. However, existing retargeting pipelines often struggle with the significant embodiment gap between humans and robots, producing physically implausible artifacts like foot-skating and penetration. More importantly, common retargeting methods neglect the rich human-object and human-environment interactions essential for expressive locomotion and loco-manipulation. To address this, we introduce OmniRetarget, an interaction-preserving data generation engine based on an interaction mesh that explicitly models and preserves the crucial spatial and contact relationships between an agent, the terrain, and manipulated objects. By minimizing the Laplacian deformation between the human and robot meshes while enforcing kinematic constraints, OmniRetarget generates kinematically feasible trajectories. Moreover, preserving task-relevant interactions enables efficient data augmentation, from a single demonstration to different robot embodiments, terrains, and object configurations. We comprehensively evaluate OmniRetarget by retargeting motions from OMOMO, LAFAN1, and our in-house MoCap datasets, generating over 8 hours of trajectories that achieve better kinematic constraint satisfaction and contact preservation than widely used baselines. Such high-quality data enables proprioceptive RL policies to successfully execute long-horizon (up to 30 seconds) parkour and loco-manipulation skills on a Unitree G1 humanoid, trained with only 5 reward terms and simple domain randomization shared by all tasks, without any learning curriculum.
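The Laplacian deformation term can be sketched as follows. The sketch assumes uniform neighbor weights and an interaction mesh whose connectivity is already given. How OmniRetarget builds and weights its mesh is not specified here. The sketch only evaluates the energy; in the method the energy is minimized subject to kinematic constraints, and that step is omitted. Names such as `laplacian_deformation_energy` are hypothetical.

```python
Point = tuple[float, float, float]


def laplacian_coords(points: list[Point], neighbors: dict[int, list[int]]) -> list[Point]:
    """Uniform-weight Laplacian coordinates: each vertex minus the centroid of its neighbors."""
    coords = []
    for i, p in enumerate(points):
        nbrs = neighbors[i]
        centroid = [sum(points[j][d] for j in nbrs) / len(nbrs) for d in range(3)]
        coords.append(tuple(p[d] - centroid[d] for d in range(3)))
    return coords


def laplacian_deformation_energy(source: list[Point], target: list[Point],
                                 neighbors: dict[int, list[int]]) -> float:
    """Sum of squared differences between the Laplacian coordinates of two meshes
    that share one connectivity (e.g. human keypoints vs. retargeted robot keypoints)."""
    ls = laplacian_coords(source, neighbors)
    lt = laplacian_coords(target, neighbors)
    return sum((a[d] - b[d]) ** 2 for a, b in zip(ls, lt) for d in range(3))


# Toy interaction mesh: a tetrahedron whose four vertices are all mutually connected.
human = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
neighbors = {i: [j for j in range(4) if j != i] for i in range(4)}

translated = [(x + 1.0, y + 2.0, z + 3.0) for x, y, z in human]  # rigid shift
stretched = human[:3] + [(0.0, 0.0, 2.0)]                        # one vertex moved

print(laplacian_deformation_energy(human, translated, neighbors))  # ~0: shape preserved
print(laplacian_deformation_energy(human, stretched, neighbors))   # > 0: local shape changed
```

Laplacian coordinates do not change under translation, so a rigidly shifted copy costs nothing. Moving one vertex relative to its neighbors is penalized. This is how such an objective preserves relative spatial and contact structure rather than absolute positions.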

2510.19528 2026-06-17 stat.ML cs.LG math.ST stat.TH Replacement

Learning Upper Lower Value Envelopes to Shape Online RL: A Principled Approach

Sebastian Reboul, Hélène Halconruy

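AI summary Proposes a two-stage framework that learns upper and lower bounds on value functions from offline data and incorporates them into online algorithms. Decoupling the two bounds yields more flexible and tighter approximations. The analysis gives high-probability regret bounds, and experiments show substantially reduced regret.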

Comments 35 pages, 5 figures

Abstract

We investigate the fundamental problem of leveraging offline data to accelerate online reinforcement learning, a direction with strong potential but limited theoretical grounding. Our study centers on how to learn and apply value envelopes within this context. To this end, we introduce a principled two-stage framework: the first stage uses offline data to derive upper and lower bounds on value functions, while the second incorporates these learned bounds into online algorithms. Our method extends prior work by decoupling the upper and lower bounds, enabling more flexible and tighter approximations. In contrast to approaches that rely on fixed shaping functions, our envelopes are data-driven and explicitly modeled as random variables, with a filtration argument ensuring independence across phases. The analysis establishes high-probability regret bounds determined by two interpretable quantities, thereby providing a formal bridge between offline pre-training and online fine-tuning. Empirical results on tabular MDPs demonstrate substantial regret reductions compared with both UCBVI and prior methods while remaining competitive with related approaches.
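One plausible way to apply learned envelopes inside an optimistic tabular algorithm is to clip each backed-up Q-value into [lower, upper], as sketched below. This is an illustrative assumption, not the paper's exact algorithm. The constant bonus, the dictionary layout, and the function name are hypothetical, and the envelopes are taken as given inputs rather than learned from offline data.

```python
def optimistic_q_with_envelopes(horizon, states, actions, reward, p_hat, bonus, lower, upper):
    """Backward induction for an optimistic Q-function whose backups are clipped
    into given [lower, upper] value envelopes.

    reward[(s, a)]         -> empirical mean reward
    p_hat[(s, a)]          -> {next_state: empirical probability}
    bonus                  -> exploration bonus (a single constant here for brevity)
    lower/upper[(h, s, a)] -> envelopes, assumed already learned from offline data
    """
    v_next = {s: 0.0 for s in states}  # value after the last step is zero
    q = {}
    for h in reversed(range(horizon)):
        v_h = {}
        for s in states:
            for a in actions:
                expected_next = sum(p * v_next[s2] for s2, p in p_hat[(s, a)].items())
                optimistic = reward[(s, a)] + expected_next + bonus
                # Upper envelope trims excess optimism; lower envelope floors the estimate.
                q[(h, s, a)] = min(max(optimistic, lower[(h, s, a)]), upper[(h, s, a)])
            v_h[s] = max(q[(h, s, a)] for a in actions)
        v_next = v_h
    return q


states, actions = ["s"], ["a", "b"]
reward = {("s", "a"): 0.5, ("s", "b"): 0.0}
p_hat = {("s", "a"): {"s": 1.0}, ("s", "b"): {"s": 1.0}}
lower = {(0, "s", "a"): 0.0, (0, "s", "b"): 0.3}
upper = {(0, "s", "a"): 1.0, (0, "s", "b"): 1.0}

q_big_bonus = optimistic_q_with_envelopes(1, states, actions, reward, p_hat, 1.0, lower, upper)
q_no_bonus = optimistic_q_with_envelopes(1, states, actions, reward, p_hat, 0.0, lower, upper)
print(q_big_bonus[(0, "s", "a")], q_no_bonus[(0, "s", "b")])  # 1.0 0.3
```

Suppose the upper envelope truly upper-bounds Q* and the unclipped backup is already optimistic. Then the `min` still leaves the estimate at or above Q*, while cutting the excess optimism a large bonus would otherwise add.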

2601.22184 2026-06-17 cs.GT cs.LG cs.MA Replacement

Tacit Coordination of Large Language Models

Ido Aharon, Emanuele La Malfa, Michael Wooldridge, Sarit Kraus

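AI summary Studies how focal points emerge in LLMs during multi-agent coordination without communication. Evaluated on games and search-and-rescue tasks, the models match or exceed humans in most settings but fail on tasks involving numerical common sense and cultural salience. Learning-free strategies are proposed to improve coordination.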

Comments Code: https://github.com/EmanueleLM/focal-points

Abstract

Large Language Models (LLMs) are increasingly deployed in multi-agent settings that require coordination without communication, from human-AI interaction to safety-critical scenarios. Humans often overcome the absence of communication through focal points: salient solutions that naturally stand out to all participants. We present the first large-scale evaluation of how, when, and why focal points emerge in LLMs, comparing their behaviour with humans across cooperative and competitive games, including realistic search and rescue scenarios, demonstrating when focal points enable effective coordination. Across more than 20 open- and closed-source models, we find that LLMs exhibit a remarkable ability to coordinate without communication, often matching or outperforming humans. However, the same models consistently fail in tasks requiring numerical common sense or culturally nuanced notions of salience. We additionally evaluate simple learning-free strategies that substantially improve coordination both among LLMs and between humans and LLMs. Our results reveal striking coordination capabilities, as well as social limitations in modern LLMs, and offer new insight into the latent notions of salience encoded within them. Our findings caution against assuming that LLMs share humans' cultural and perceptual substrate when deployed in coordination settings.

2602.10635 2026-06-17 cs.AI cs.LG Replacement

OmniSapiens: A Foundation Model for Social Behavior Processing via Heterogeneity-Aware Relative Policy Optimization

Keane Ong, Sabri Boughorbel, Luwei Xiao, Chanakya Ekbote, Wei Dai, Ao Qu, Jingyao Wu, Rui Mao, Ehsan Hoque, Erik Cambria, Gianmarco Mengaldo, Paul Pu Liang

Affiliations * Massachusetts Institute of Technology; National University of Singapore; Nanyang Technological University; Prince Sattam bin Abdulaziz University; University of Rochester

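AI summary Addresses the training imbalance caused by heterogeneous behavioral data by proposing the Omnisapiens-7B 2.0 foundation model, trained with Heterogeneity-Aware Relative Policy Optimization (HARPO). The model achieves the best performance on 10 behavioral tasks and 5 zero-shot generalization benchmarks.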

Comments Accepted to ICML 2026 Main Conference

Abstract

Socially intelligent AI systems must reason across diverse human behavioral tasks and generalize to new social contexts. However, behavioral data is inherently heterogeneous, comprising diverse modalities and prediction targets that produce uneven training signals across samples, creating imbalanced learning dynamics that challenge existing AI models. To address this, we develop Omnisapiens-7B 2.0, a foundation model for social behavior processing that explicitly addresses learning from heterogeneous behavioral data. This is enabled through Heterogeneity-Aware Relative Policy Optimization, a new RL method that rebalances learning signals across samples by approximating each sample's contribution to the policy update and using these estimates to drive geometrically centered, inertially smoothed advantage modulation for stable training. Omnisapiens-7B 2.0 achieves the best and most consistent performance across 10 behavioral tasks, while also attaining the best performance on all five held-out benchmarks, with gains of up to +12.02% and +9.37% respectively. Furthermore, it demonstrates more consistent and interpretable reasoning traces, supporting reliable real-world behavioral applications. Our model is available at https://github.com/MIT-MI/human_behavior_atlas.

2603.27049 2026-06-17 stat.ML cs.LG Replacement

Overcoming the Incentive Collapse Paradox

Qichuan Yin, Ziwei Su, Shuangning Li

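AI summary Addresses incentive collapse in AI-assisted tasks with a sentinel-auditing payment mechanism that sustains positive human effort at finite cost. It also builds an incentive-aware active statistical inference framework that optimizes the auditing rate and the allocation of samples.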

Comments Accepted to ICML 2026

Abstract

AI-assisted task delegation is increasingly common, yet human effort in such systems is costly and typically unobserved. Recent work (Bastani and Cachon, 2025; Sambasivan et al., 2021) shows that accuracy-based payment schemes suffer from incentive collapse: as AI accuracy improves, sustaining positive human effort requires unbounded payments. We study this phenomenon in a budget-constrained principal-agent framework with strategic human agents whose output accuracy depends on unobserved effort. Our first contribution is a general impossibility result showing that incentive collapse is not merely a limitation of simple linear payments, but arises for any payment rule based only on observed task accuracy. To overcome this barrier, we propose a sentinel-auditing payment mechanism that enforces a strictly positive and controllable level of human effort at finite cost, independent of AI accuracy. Building on this incentive-robust foundation, we develop an incentive-aware active statistical inference framework that jointly optimizes (i) the auditing rate and (ii) active sampling and budget allocation across tasks of varying difficulty to minimize the final statistical loss under a single budget. Experiments demonstrate improved cost-error tradeoffs relative to standard active learning and auditing-only baselines.
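The collapse can be made concrete with a deliberately simplified model. This is an assumption for illustration, not the paper's general framework.

- A worker is paid a bonus b per correct answer.
- Effort costs c and yields accuracy p_h.
- A shirking worker simply copies the AI, whose accuracy is p_ai.

Effort is worthwhile only if b·p_h − c ≥ b·p_ai, i.e. b ≥ c / (p_h − p_ai). This threshold diverges as p_ai approaches p_h. The helper name is hypothetical.

```python
import math


def required_bonus(effort_cost: float, acc_with_effort: float, acc_without_effort: float) -> float:
    """Smallest bonus b (paid per correct answer) that makes effort worthwhile in a toy
    linear contract: b * acc_with_effort - effort_cost >= b * acc_without_effort."""
    gap = acc_with_effort - acc_without_effort
    return math.inf if gap <= 0 else effort_cost / gap


# A worker who exerts effort is right 95% of the time; a shirking worker just
# copies the AI, so its accuracy is the AI's accuracy.
for ai_accuracy in (0.5, 0.8, 0.9, 0.94, 0.95):
    print(ai_accuracy, required_bonus(effort_cost=0.1, acc_with_effort=0.95,
                                      acc_without_effort=ai_accuracy))
```

The required bonus grows without bound as the AI closes the accuracy gap. The abstract's impossibility result extends this to any payment rule based only on observed accuracy.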

2606.06227 2026-06-17 physics.flu-dyn cs.LG Replacement

Reward hacking in physical reinforcement learning revealed by turbulent drag reduction

Drag Reduction or Reward Hacking? Recurrent Multi-Agent Reinforcement Learning That Earns Its Reward

Giorgio Maria Cavallazzi, Miguel Pérez-Cuadrado, Alfredo Pinelli

Affiliations * School of Science and Technology, Department of Engineering, City St. George’s, University of London

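AI summary Addresses the divergence between the RL reward and the design objective in drag-reduction control of wall turbulence. The proposed fixes are a differentiable projection, a recurrent policy, and a reward based on the true wall power. Together they achieve a conservative 17% drag reduction under honest accounting.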

Abstract

A reinforcement-learning agent maximises its reward, which can diverge from the outcome its designer intended. In physical control the reward rarely closes that gap, and drag reduction in wall turbulence makes it concrete. A mass-conservation projection couples agents' outputs and erases the per-agent credit the policy gradient needs; a memoryless policy cannot resolve the slow near-wall cycle it acts on; and a pressure-gradient reward pays for nominal drag reduction by pumping power through the wall. Two degenerate controllers achieve large drag reductions while total dissipation rises, so the reported figure can mask a more wasteful flow. We trace each fault to its cause and fix it: a differentiable projection that restores credit, a recurrent policy with a widened sensing stencil, and a reward scored on the true wall power. The corrected controller acts on the flow within a closed energy budget, earning a conservative 17% drag reduction under honest accounting.
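Here is a hedged sketch of the credit issue. It assumes the mass-conservation constraint is imposed by subtracting the mean actuation across agents, a common way to enforce zero net wall flux; the paper's exact projection may differ. The projection P = I − (1/n)·1·1ᵀ couples every agent's applied action to all raw outputs. A policy gradient that treats the projection as an opaque part of the environment loses that per-agent structure. Differentiating through the projection instead gives each agent its exact sensitivity. Function names are hypothetical.

```python
def project_zero_net_flux(actions: list[float]) -> list[float]:
    """Remove the mean so the wall actuations inject zero net mass: P v = v - mean(v)."""
    mean = sum(actions) / len(actions)
    return [a - mean for a in actions]


def project_vjp(upstream_grad: list[float]) -> list[float]:
    """Vector-Jacobian product of the projection. P = I - (1/n) 1 1^T is symmetric,
    so back-propagating through it is just applying P to the incoming gradient."""
    return project_zero_net_flux(upstream_grad)


raw = [0.3, -0.1, 0.5, 0.1]                   # per-agent blowing/suction commands
applied = project_zero_net_flux(raw)          # what actually reaches the flow
grad_raw = project_vjp([1.0, 0.0, 0.0, 0.0])  # d(applied[0]) / d(raw)
print(applied, grad_raw)  # grad_raw = [0.75, -0.25, -0.25, -0.25]
```

The gradient shows that raising agent 0's raw command also lowers every other agent's applied action by 1/n. A non-differentiable projection hides exactly this coupling.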

4. Generative Models and Probabilistic Modeling (25 papers)

2606.17106 2026-06-17 cs.LG cs.CY New submission

Informative Missingness to Generate Irregular Clinical Time Series

Hadi Mehdizavareh, Gabriele Santangelo, Giovanna Nicora, Simon Lebech Cichosz, Arianna Dagliati, Arijit Khan, Riccardo Bellazzi

Affiliations * Aalborg University; University of Pavia; Bowling Green State University

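AI summary Proposes a diffusion-based method for generating clinical time series that jointly models laboratory values and their observation patterns. Validated on the DACMI benchmark, it captures clinical dependencies between patient physiology and clinicians' testing behavior.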

Abstract

Laboratory tests in electronic health records are collected irregularly, and the absence of a test order can be as informative as the measurement itself. Such missingness reflects clinicians' decisions and patient physiology, making it important to model it directly rather than treat it as a preprocessing artifact. Here we present a diffusion-based approach for generating clinical time series that jointly models laboratory values and their observation patterns using the public Data Analytics Challenge on Missing Data Imputation (DACMI) benchmark derived from MIMIC-III. To preserve realistic sampling, we align chart times into 4-hour intervals and segment admissions into 7-day windows, producing trajectories that pair each lab value with a corresponding observation indicator. Standard transformations and normalization are applied to stabilize training. Our method extends the TimeDiff framework to learn continuous lab values and discrete missingness patterns through complementary diffusion objectives. Experiments show that the generated data closely match real patient trajectories across individual lab distributions and joint value-missingness embeddings, demonstrating that diffusion models can capture clinically meaningful dependencies between patient physiology and clinicians' testing behavior under MNAR-like (missing-not-at-random) missingness. These preliminary results indicate that our model can serve as an initial component toward developing clinical foundation models. By producing synthetic priors that preserve key physiology-missingness relationships, this work motivates the subsequent training of Prior-Data Fitted Networks capable of leveraging informative missingness, which we will investigate in the extended work.
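The time alignment described above can be sketched for a single lab test. The sketch makes three assumptions not stated in the abstract:

- Measurements are given as hours since the start of the window.
- Several measurements in one 4-hour bin are averaged.
- Unobserved bins carry a placeholder value of 0.0 alongside a 0 indicator.

The names and the example values are hypothetical.

```python
BIN_HOURS = 4
WINDOW_DAYS = 7
N_BINS = WINDOW_DAYS * 24 // BIN_HOURS  # 42 bins per 7-day window


def bin_lab_series(measurements, window_start_hour=0.0):
    """Turn (hours_since_admission, value) pairs for one lab test into aligned
    value and observation-indicator trajectories over one 7-day window."""
    sums = [0.0] * N_BINS
    counts = [0] * N_BINS
    for t, v in measurements:
        offset = t - window_start_hour
        if 0.0 <= offset < WINDOW_DAYS * 24:  # drop measurements outside the window
            b = int(offset // BIN_HOURS)
            sums[b] += v
            counts[b] += 1
    values = [sums[b] / counts[b] if counts[b] else 0.0 for b in range(N_BINS)]  # mean per bin
    observed = [1 if counts[b] else 0 for b in range(N_BINS)]  # 1 = test was ordered
    return values, observed


sodium = [(1.0, 140.0), (3.5, 142.0), (10.0, 138.0), (200.0, 135.0)]  # made-up example values
values, observed = bin_lab_series(sodium)
print(values[:3], observed[:3])  # [141.0, 0.0, 138.0] [1, 0, 1]
```

Keeping `observed` as a separate channel, rather than imputing and discarding it, is what allows a generative model to treat missingness as informative.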

2606.17192 2026-06-17 cs.LG New submission

Constrained Diffusion Models with Primal-Dual Inference

Samar Hadou, Yigit Berkay Uslu, Alejandro Ribeiro

Affiliations * Department of Electrical and Systems Engineering, University of Pennsylvania

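AI summary Proposes primal-dual inference (PDI), which jointly infers the optimal primal distribution and its dual variable by alternating denoising and dual ascent in the reverse diffusion process. This enables sampling for entropy-regularized optimization problems under average constraints.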

Abstract

This paper develops constrained diffusion models with primal-dual inference (PDI) to sample from optimal distributions of entropy-regularized optimization problems with average constraints. We formalize constrained sampling in the Lagrangian dual domain, where the optimal distribution takes the form of a Gibbs distribution indexed by the optimal dual variable. Rather than estimating this dual multiplier before sampling and freezing it throughout generation, PDI jointly infers the optimal primal distribution and its parametrizing dual variable. Each reverse diffusion step denoises using the score field associated with the current multiplier and then updates the multiplier through dual ascent using the estimated constraint violation of the denoised samples. To enable this conditional score field, we train a single dual-conditioned score network over the family of Gibbs distributions induced by the dual variables encountered during inference. We prove that the time average of the dual variables generated along the inference trajectory converges to a neighborhood of the dual optimum and bound the effect of residual dual mismatch on the terminal distribution through schedule-dependent stability factors. We evaluate PDI on constrained sampling from a mixture of Gaussians, wireless resource allocation, and portfolio management.
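The alternation between score-based moves and dual ascent can be sketched on a toy problem where the Gibbs family has a closed form. Assume we minimize KL(p ‖ N(0, 1)) subject to E_p[x] ≥ c. The Lagrangian optimum is p_λ ∝ N(0, 1)·exp(λx) = N(λ, 1), with dual optimum λ* = c. Two substitutions make the sketch runnable:

- The analytic score λ − x stands in for the dual-conditioned score network.
- Unadjusted Langevin steps stand in for reverse diffusion steps.

These substitutions, the step sizes, and the function name are assumptions for illustration.

```python
import math
import random


def primal_dual_langevin(c=1.5, n_particles=1000, n_steps=300, step=0.1, dual_lr=0.5, seed=0):
    """Toy primal-dual sampler for: minimize KL(p || N(0,1)) s.t. E_p[x] >= c.

    The Gibbs family is p_lam = N(lam, 1) with score (lam - x); Langevin moves stand
    in for reverse-diffusion denoising steps. Returns the time-averaged multiplier
    over the second half of the run and the final sample mean.
    """
    rng = random.Random(seed)
    xs = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # start from the base N(0, 1)
    lam = 0.0
    lam_history = []
    noise_scale = math.sqrt(2.0 * step)
    for _ in range(n_steps):
        # Primal move: one Langevin step along the score of the current p_lam.
        xs = [x + step * (lam - x) + noise_scale * rng.gauss(0.0, 1.0) for x in xs]
        # Dual move: projected ascent on the estimated constraint violation c - E[x].
        lam = max(0.0, lam + dual_lr * (c - sum(xs) / len(xs)))
        lam_history.append(lam)
    tail = lam_history[n_steps // 2:]
    return sum(tail) / len(tail), sum(xs) / len(xs)


lam_avg, sample_mean = primal_dual_langevin()
print(round(lam_avg, 2), round(sample_mean, 2))  # both should be close to c = 1.5
```

The multiplier is never frozen. It is updated after every primal move from the samples themselves, and its time average settles near λ*. This mirrors, in a much simpler setting, the convergence statement in the abstract.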

2606.17409 2026-06-17 cs.LG cs.AI New submission

Discrete Autoregressive Transformer for Generative Mechanism Synthesis

Anar Nurizada, Anurag Purwar

Affiliations * Computer-Aided Design and Innovation Lab, Department of Mechanical Engineering, Stony Brook University

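AI summary Proposes a discrete autoregressive transformer that casts planar path synthesis as conditional sequence modeling. It generates joint coordinates from a VAE latent and a mechanism-type token, producing diverse and accurate mechanism designs.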

Abstract

Planar path synthesis requires mechanisms whose coupler curves match a prescribed trajectory; the mapping from curve to linkage is inherently one-to-many across four-, six-, and eight-bar topologies. We address this design problem with simulation-grounded evaluation on a curated corpus of over one million mechanisms, reporting Chamfer distance and dynamic time warping after forward kinematics and geometric alignment. We formulate synthesis as conditional autoregressive sequence modeling: joint coordinates are uniformly quantized to tokens and generated by a decoder-only transformer with a variational-autoencoder (VAE) latent of the target curve and an explicit mechanism-type token. Training combines token cross-entropy with a Gaussian-smoothed bin auxiliary loss that respects ordinal structure among bins. At inference, a bounded latent-noise schedule decodes all mechanism types at each noise level; we retain the top five candidates by geometric error, yielding diverse accurate families without dataset lookup. On held-out tests, aggregate mean Chamfer distance is $0.0132$ and mean dynamic time warping is $0.153$; a latent $k$-nearest-neighbor baseline that conditions on training-set neighbor latents in VAE space achieves matched-topology mean Chamfer distance $0.0071$ and mean dynamic time warping $0.117$ using the same decoder.
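A minimal sketch of uniform coordinate tokenization follows, together with one plausible form of the Gaussian-smoothed auxiliary loss: soft cross-entropy against a Gaussian over bin indices, centered on the true bin. The coordinate range [−1, 1], the bin count, σ, and the helper names are hypothetical. The paper's exact smoothing may differ.

```python
import math


def quantize(x: float, n_bins: int, lo: float = -1.0, hi: float = 1.0) -> int:
    """Uniformly map a coordinate in [lo, hi] to a bin (token) index."""
    x = min(max(x, lo), hi)
    return min(int((x - lo) / (hi - lo) * n_bins), n_bins - 1)


def dequantize(idx: int, n_bins: int, lo: float = -1.0, hi: float = 1.0) -> float:
    """Return the center of bin idx."""
    return lo + (idx + 0.5) * (hi - lo) / n_bins


def gaussian_soft_target(idx: int, n_bins: int, sigma: float = 1.0) -> list[float]:
    """Probability mass spread over neighboring bins, so near misses cost less than far ones."""
    weights = [math.exp(-0.5 * ((k - idx) / sigma) ** 2) for k in range(n_bins)]
    total = sum(weights)
    return [w / total for w in weights]


def soft_cross_entropy(logits: list[float], target: list[float]) -> float:
    """Cross-entropy between a soft target distribution and softmax(logits)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(l - m) for l in logits))
    return -sum(t * (l - log_z) for t, l in zip(target, logits))


idx = quantize(0.3, n_bins=64)
target = gaussian_soft_target(idx, 64)
near = [10.0 if k == idx + 1 else 0.0 for k in range(64)]   # off by one bin
far = [10.0 if k == idx + 20 else 0.0 for k in range(64)]   # off by twenty bins
print(idx, soft_cross_entropy(near, target) < soft_cross_entropy(far, target))  # 41 True
```

Plain token cross-entropy would penalize both wrong predictions equally. The smoothed term encodes the ordinal fact that bin 42 is closer to bin 41 than bin 61 is. In training it would be added to the ordinary hard-label token cross-entropy, as the abstract describes.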

2606.17465 2026-06-17 cs.LG cs.SY eess.SY New submission

Perron--Frobenius Operator Matching for Generative Modeling

Shiqi Zhang, Wuwei Wu, Jaemin Oh, Jie Chen, Xiaoning Qian

Affiliations * Texas A&M University; City University of Hong Kong

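AI summary Proposes Perron-Frobenius Operator Matching (PFOM), a generative framework that matches density evolution via the integral PF operator and unifies flow, diffusion, and jump models. It proves that among Bregman divergences only the KL divergence preserves the equivalence between density-level and sample-conditioned objectives. It also develops Nesterov-accelerated training and sampling.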

Abstract

We introduce Perron--Frobenius Operator Matching (PFOM), a generative framework that matches density evolution via the integral PF operator, subsuming flow, diffusion, and jump models. We prove that among Bregman divergences, only Kullback--Leibler divergence preserves equality between density-level and sample-conditioned objectives, yielding a practical loss equivalent to Koopman path matching. We further develop Nesterov-accelerated training and sampling that stabilize discretization and accelerate convergence. PFOM unifies operator-theoretic identification with modern generative modeling and opens paths to adaptive dictionaries and high-dimensional applications.

2606.18022 2026-06-17 cs.LG New submission

Recursive Scaling in Masked Diffusion Models

Alba Carballo-Castro, Julianna Piskorz, Paulius Rauba, Mihaela van der Schaar, Pascal Frossard

Affiliations * LTS4, EPFL, Lausanne, Switzerland; University of Cambridge, Cambridge, UK

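AI summary Proposes Recursive Masked Diffusion Models (R-MDMs), which add recursive depth by repeatedly applying the same denoising transformer within each diffusion step, enabling parameter-efficient scaling. On structured generation tasks such as Sudoku and Countdown, they match non-recursive baselines with fewer parameters.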

Abstract

Masked diffusion models (MDMs) have recently emerged as a promising paradigm for sequence generation. Scaling MDMs is conventionally achieved by increasing the parameter count or the number of denoising steps. We introduce Recursive Masked Diffusion Models (R-MDMs), which add recursive depth as a third scaling axis by repeatedly applying the same denoising transformer within each diffusion step. Recursion enables iterative refinement of the output through parameter reuse, increasing effective model depth without increasing parameter count. Across structured generation tasks, including Sudoku and Countdown, we show that R-MDMs achieve substantially improved parameter efficiency: a model with $L$ recursive iterations often matches the performance of non-recursive baselines with roughly $L\times$ more parameters. Moreover, recursive refinement can partially substitute for additional denoising steps, allowing recursive models to reach the same generation quality with fewer forward passes at inference time. These results suggest that recursive depth is a practically useful scaling mechanism for MDMs, improving both parameter efficiency and the allocation of test-time compute.
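The sampling loop below sketches where recursive depth sits inside masked-diffusion generation. The same `block` is called `n_recursions` times per diffusion step, so compute grows while parameters stay fixed. The confidence-based unmasking rule is a common MDM heuristic assumed here, not taken from the paper. `embed`, `block`, `readout`, and the target sequence are hypothetical toy stand-ins for a transformer.

```python
import math
from typing import Callable, List, Sequence, Tuple

MASK = -1  # hypothetical id of the [MASK] token


def sample_recursive_mdm(
    seq_len: int,
    n_steps: int,
    n_recursions: int,
    embed: Callable[[Sequence[int]], List[float]],
    block: Callable[[List[float]], List[float]],
    readout: Callable[[List[float]], List[Tuple[int, float]]],
) -> Tuple[List[int], int]:
    """Masked-diffusion sampling where each step applies one weight-shared block L times."""
    tokens = [MASK] * seq_len
    block_calls = 0
    per_step = math.ceil(seq_len / n_steps)  # tokens revealed per diffusion step
    for _ in range(n_steps):
        hidden = embed(tokens)
        # Recursive depth: the SAME block (same parameters) refines the hidden
        # state n_recursions times inside a single diffusion step.
        for _ in range(n_recursions):
            hidden = block(hidden)
            block_calls += 1
        predictions = readout(hidden)  # (token, confidence) for every position
        masked = [i for i, t in enumerate(tokens) if t == MASK]
        # Reveal the most confident masked positions.
        masked.sort(key=lambda i: predictions[i][1], reverse=True)
        for i in masked[:per_step]:
            tokens[i] = predictions[i][0]
        if MASK not in tokens:
            break
    return tokens, block_calls


# Toy stand-ins: the "block" moves each hidden value halfway toward a fixed
# target sequence, so more recursions give a more accurate estimate.
TARGET = [3, 1, 4, 1, 5, 9]


def embed(tokens: Sequence[int]) -> List[float]:
    return [0.0 if t == MASK else float(t) for t in tokens]


def block(hidden: List[float]) -> List[float]:
    return [h + 0.5 * (t - h) for h, t in zip(hidden, TARGET)]


def readout(hidden: List[float]) -> List[Tuple[int, float]]:
    return [(round(h), -abs(h - round(h))) for h in hidden]


tokens, calls = sample_recursive_mdm(6, n_steps=3, n_recursions=8,
                                     embed=embed, block=block, readout=readout)
print(tokens, calls)  # [3, 1, 4, 1, 5, 9] 24
```

With `n_recursions=1`, the same toy block reveals wrong tokens in the first step. With 8 recursions it recovers the target using identical parameters. This loosely mirrors the abstract's point that recursion adds effective depth without adding weights.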

2606.18066 2026-06-17 cs.LG New submission

NoiseTilt: Noise-Tilted Reverse Kernels for Diffusion Reward Alignment

Jisung Hwang, Yunhong Min, Jaihoon Kim, I-Chao Shen, Minhyuk Sung

Affiliations * KAIST; The University of Tokyo

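AI summary Proposes the Noise-Tilted Reverse Kernel (NTRK), which performs reward-guided sampling by injecting reward gradients into the noise term. The pretrained reverse kernel stays unchanged and only a single sample per step is needed. NTRK outperforms existing methods on reward alignment tasks without losing sample quality.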

Comments 52 pages

Abstract

We introduce the Noise-Tilted Reverse Kernel (NTRK), a reward-guided diffusion sampler that injects reward gradients through the noise term, leaving the pretrained reverse kernel unchanged and requiring only a single sample per step. Reward-guided sampling at inference time has greatly expanded the versatility of pretrained diffusion models. Yet existing methods face a trade-off. Gradient-based guidance shifts the reverse mean, steering generation but pushing intermediate states outside the region that the model was trained on and degrading quality. Search-based methods preserve quality but gain no gradient signal. No prior method achieves both. NTRK resolves this by keeping the reverse mean fixed and biasing the noise term toward high reward. We introduce a whitening operator, the central mechanism behind NTRK, that makes the reward gradient safe to inject as noise without losing its guiding signal. Across various reward alignment tasks, NTRK outperforms recent state-of-the-art baselines without losing sample quality. Remarkably, on aesthetic generation, NTRK surpasses the reward of the best baseline at 500 NFEs using only 25 NFEs, a 20$\times$ reduction in compute.

2606.18071 2026-06-17 cs.LG cs.AI New submission

Volterra Generative Models

Yusen Jia, Bingyan Han

Affiliations * The Hong Kong University of Science and Technology (Guangzhou)

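AI summary Proposes Volterra generative models, which inject path-dependent noise through fractional kernels. They use Markovian lifts and residual-state learning to handle diffusion generation under non-Markovian dynamics. Effectiveness is validated on MNIST and CIFAR-10.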

Comments 36 pages

Abstract

Score-based diffusion models typically use Brownian perturbations, which provide tractable reverse-time dynamics but impose memoryless noising. We introduce Volterra generative models, a continuous-time score-based framework whose forward process injects path-dependent noise through fractional kernels. To handle the non-Markovian and non-semimartingale dynamics, we construct finite-dimensional Markovian lifts using Gaussian quadrature in both regimes and a hybrid finite-difference exponential approximation in the smooth regime. We prove squared error bounds, derive an augmented linear-Gaussian forward process, and show that the learning can remain data-dimensional by considering residual states and analytic auxiliary Gaussian scores. We also identify covariance and reverse-time degeneracies caused by shared Brownian factors and signed smooth-regime weights. The degeneracy motivates stabilized conditioning and, for stiff larger lifts, a Gaussian-bridge reconstruction sampler. Experiments on MNIST and CIFAR-10 show that persistent fractional perturbations with small Markovian lifts can improve score-based generation on MNIST and provide a promising extension to natural images, while the bridge sampler provides a stability mechanism for larger lifts.

2606.18186 2026-06-17 cs.LG cs.AI New submission

Kolmogorov Regression for Robust Diffusion Policies

Lekan Molu

Affiliations * Bala Cynwyd, PA 19004

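AI summary Proposes a backward Kolmogorov equation that lifts diffusion policies to a Cameron-Martin space, replacing stochastic score matching with a deterministic boundary-value PDE problem. A precision-weighted loss and a residual diagnostic provide convergence guarantees, trajectory regularity, and reward-free failure detection.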

Abstract

Finite-dimensional (FD) diffusion policies exhibit temporal drift owing to discretization artifacts that degrade long-horizon performance when deployed on physical systems. We introduce a backward Kolmogorov equation that lifts diffusion policies to a Cameron-Martin space, a subset of the Hilbert space, essentially replacing stochastic score matching with a deterministic boundary-value PDE problem. Our core innovation rests on Gaussian measure theory, in which the diffusion noise covariance operator is realized from a colored noise distribution that prescribes a notion of regularity on samples from the model at inference time. We train the diffusion model with a derived precision-weighted Cameron-Martin loss and introduce a Kolmogorov residual as a PDE diagnostic during inference. These substitutions yield (i) convergence guarantees where the bound's constants depend on the effective rank of the kernel rather than the action dimension, (ii) improved trajectory regularity via spectral weighting, and (iii) a deterministic failure detector that requires no reward signals. Validation across two application domains demonstrates substantial improvements: on the PushT manipulation benchmark, the Cameron-Martin loss achieves a 17% improvement in maximum episode reward (0.95 vs. 0.78 for MSE) and a 67.6% reduction in inter-step drift during inference via the introduced residual magnitude. Similarly, on a 6-station manufacturing line with constant work-in-process (CONWIP) flow control, we achieve 28.4% lower RMSE than classical LSTM baselines, a high starvation-event recall (1.0 in test cycles), and effective bottleneck identification (Precision@1 = 1.0 in the test set, 13x signal-to-noise ratio). We then certify the dispatch policies with Hamilton-Jacobi reachability theory, which reduces deadlock events by 96% compared to uncontrolled dispatch over 100 simulated runs (351 events prevented).
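The precision-weighted loss can be sketched as the squared Cameron-Martin norm of the residual, r^T C^{-1} r, computed through a Cholesky factor of the noise covariance C. Several parts of this sketch are assumptions for illustration:

- The choice of C: an exponential (Ornstein-Uhlenbeck) kernel over a 16-step action horizon with length scale 3.
- Applying the loss to predicted versus target noise.
- All function names.

The paper's colored-noise operator and its Kolmogorov residual are not reproduced.

```python
import math


def ou_covariance(n: int, length_scale: float = 3.0) -> list[list[float]]:
    """Exponential (Ornstein-Uhlenbeck) covariance over an action horizon of n steps."""
    return [[math.exp(-abs(i - j) / length_scale) for j in range(n)] for i in range(n)]


def cholesky(a: list[list[float]]) -> list[list[float]]:
    """Lower-triangular L with a = L L^T (a must be symmetric positive definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def cameron_martin_sq_norm(residual: list[float], chol: list[list[float]]) -> float:
    """r^T C^{-1} r via forward substitution L z = r, since ||z||^2 = r^T C^{-1} r."""
    z: list[float] = []
    for i, r in enumerate(residual):
        z.append((r - sum(chol[i][k] * z[k] for k in range(i))) / chol[i][i])
    return sum(v * v for v in z)


def cameron_martin_loss(predicted: list[float], target: list[float],
                        chol: list[list[float]]) -> float:
    """Precision-weighted squared error between predicted and target noise over the horizon."""
    return cameron_martin_sq_norm([p - t for p, t in zip(predicted, target)], chol)


n = 16
chol = cholesky(ou_covariance(n))
smooth = [1.0] * n                       # low-frequency residual
rough = [(-1.0) ** i for i in range(n)]  # same Euclidean norm, high-frequency residual
print(cameron_martin_sq_norm(smooth, chol), cameron_martin_sq_norm(rough, chol))
```

The two residuals have the same Euclidean norm, yet the loss weights them very differently. The constant residual is cheap and the alternating one is expensive, which is how such a loss favors temporally regular predictions where plain MSE would not.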

2606.17127 2026-06-17 q-bio.QM cs.AI cs.LG Cross-list

Agentic Discovery of Non-Canonical Antimicrobial Peptides with AMPGAN v3

Jay Jung, Xiaohan Zhang, Shenghan Song, Mahmoud Sayedahmed, Chijian Xiang, Yunong Xu, Ahmed AbdelKhalek, Severin T. Schneebeli, Matthew J. Wargo, Jianing Li, Safwan Wshah

Affiliations * University of Vermont; Larner College of Medicine, University of Vermont; Purdue University; Department of Comparative Pathobiology; Department of Horticulture and Landscape Architecture; Department of Industrial and Molecular Pharmaceutics

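AI summary: Proposes AMPGAN v3, a multi-objective conditional GAN that extends the generative vocabulary to D-amino acids and terminal modifications and improves training stability with two discriminators. In-vitro validation shows activity against Gram-positive bacteria. The work also introduces PepCraft, a multi-agent framework for end-to-end discovery.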

Comments Presented at the GenBio Workshop, ICML 2026

Abstract

Antimicrobial resistance causes over a million deaths annually. Antimicrobial peptides (AMPs) are a promising solution, but generative AMP models are not yet ready to design peptides with non-natural amino acids and/or chemical modifications, which are essential for real-world peptide drugs. We present AMPGAN v3, a multi-objective conditional GAN that expands the generative vocabulary to D-amino acids and N/C-terminus modifications such as amidation. By separating adversarial and activity-aware supervision across two specialized discriminators, AMPGAN v3 substantially improves training stability and outperforms prior generative AMP models on external classifiers. We validated five candidates spanning three structural classes in vitro; two showed activity against Gram-positive strains, with the best candidate reaching MIC 8 μg/mL against B. subtilis. To support downstream curation, we further present PepCraft, a multi-agent framework for end-to-end AMP discovery in which a Planning Agent orchestrates specialized executors for generation, filtering, and verification. Its prioritization recommendations align with our in vitro outcomes. Together, these contributions let us examine, on a small but real scale, how generative and agentic AI compose in therapeutic peptide discovery. Code: https://github.com/marszzibros/AMPGANv3

2606.17301 2026-06-17 cs.SD cs.LG Cross-list

Turning music identification into a neural forward pass

Muhammad Taimoor Haseeb, Ahmad Hammoudeh, Gus Xia

Affiliations: Music X Lab; Mohamed Bin Zayed University of Artificial Intelligence

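AI summary: Proposes performing music identification in a single neural forward pass with a generative transformer. The approach surpasses conventional acoustic fingerprinting on short audio clips and substantially reduces storage and latency.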

Abstract

Search, a foundational operation in computer science, maps a query to a matching item in a collection. It is typically implemented as a System-2-like, rule-based pipeline in which a key is computed, an index is probed, and candidates are verified. By contrast, human recognition resembles a System-1-like, associative model of identity recovery, in which even partial cues can trigger a recall without explicitly enumerating, ranking, or even accessing discrete candidates. Here, we show that music sound identification, a difficult search problem, can be performed in a single neural feed-forward pass by a generative transformer. Trained on an audio dataset, the model predicts the corresponding track identifier from a short audio excerpt. This approach surpasses state-of-the-art acoustic fingerprinting, with the largest gains for short audio segments (1 second), demonstrating that the method is not only viable but advantageous. Moreover, it reduces external storage to 0.33% of the baseline footprint and improves inference latency by 2.3x (p95). Furthermore, the model can reject queries for unseen tracks, supporting open-set operation while reducing misattribution risk. Using music track identification as an example, this work reframes search, bringing it closer in spirit to human associative recognition and away from algorithmic database lookup.
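The abstract does not say how queries for unseen tracks are rejected. A common open-set baseline, used here purely as an assumption, treats the model output as a distribution over known track identifiers and rejects the query when the top probability falls below a threshold. The function name, the `threshold` default and the single-softmax scoring are illustrative. A generative transformer that emits multi-token identifiers would need a sequence-level score instead.

```python
import math
from typing import Optional, Sequence


def identify_or_reject(logits: Sequence[float], track_ids: Sequence[str],
                       threshold: float = 0.5) -> Optional[str]:
    """Return the most probable track ID, or None when the model is not confident enough."""
    top = max(logits)
    exps = [math.exp(z - top) for z in logits]  # numerically stable softmax
    total = sum(exps)
    probs = [e / total for e in exps]
    best = max(range(len(probs)), key=probs.__getitem__)
    return track_ids[best] if probs[best] >= threshold else None


tracks = ["track_a", "track_b", "track_c"]
print(identify_or_reject([5.0, 0.0, 0.0], tracks))  # confident: returns "track_a"
print(identify_or_reject([1.0, 1.0, 1.0], tracks))  # flat distribution: returns None
```

Raising `threshold` trades recall on known tracks for fewer misattributions of unseen ones.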

2606.17584 2026-06-17 cs.CV cs.LG Cross-list

Root-Selecting Fixed-Point Inversion for Rectified Flows via Trajectory Straightness

Semin Kim, Jihwan Yoon, Seunghoon Hong

Affiliations: KAIST

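AI summary: Proposes SelFix, which achieves exact inversion in rectified flows by selecting the fixed-point solution that makes the inverse trajectory straighter. This improves image reconstruction and editing quality.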

Abstract

Finding the initial noise that generates a given data sample, known as inversion, is a key component for downstream applications such as training-free image editing. Existing fixed-point inversion methods improve inversion accuracy by formulating each inversion step as a fixed-point problem, but they lack a principled mechanism for selecting among multiple fixed-point solutions that can arise in practice. We observe that different selections induce different inversion trajectories, leading to substantial variation in reconstruction and editing quality. For rectified flows, we further find that this variation is closely associated with trajectory straightness, motivating straightness as a principled selection criterion. We propose SelFix, a fixed-point inversion method that selects fixed-point solutions inducing straighter inverse trajectories while retaining convergence to an exact inverse root under standard local assumptions. Experiments on FLUX.1-dev and PIE-Bench show that SelFix improves fixed-point inversion, achieving stronger real-image reconstruction and better source-preserving prompt-based editing than prior inversion baselines. The code is available at https://github.com/seminkim/selfix.
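Below is a toy one-dimensional sketch of the root-selection idea. It rests on several stated assumptions:

- Generation uses a forward Euler step `y = x + h * v(x)` with a hypothetical velocity field `v(x) = 2 sin x`.
- Exactly inverting that step means solving the fixed-point equation `x = y - h * v(x)`, which has several roots for this field.
- The roots are located with Newton's method from multiple starting points. This is a swap-in for simplicity, not the paper's solver.
- Among the roots, the sketch keeps the one whose velocity best matches the velocity at `y`. This local straightness proxy is chosen here for illustration; SelFix's actual criterion is defined in the paper.

```python
import math
from typing import Callable, List, Sequence


def find_inverse_roots(y: float, v: Callable[[float], float], dv: Callable[[float], float],
                       h: float, starts: Sequence[float], tol: float = 1e-10,
                       max_iter: int = 50) -> List[float]:
    """Solve x = y - h * v(x) via Newton's method on x + h * v(x) - y = 0 from several starts."""
    roots: List[float] = []
    for x in starts:
        for _ in range(max_iter):
            f = x + h * v(x) - y
            df = 1.0 + h * dv(x)
            if abs(df) < 1e-12:
                break  # flat derivative: abandon this start
            step = f / df
            if abs(step) > 1e3:
                break  # diverging: abandon this start
            x -= step
        converged = abs(x + h * v(x) - y) < tol
        if converged and all(abs(x - r) > 1e-6 for r in roots):
            roots.append(x)  # keep each distinct root once
    return roots


def select_straightest(roots: Sequence[float], y: float, v: Callable[[float], float]) -> float:
    """Pick the root whose velocity changes least across the step (straightness proxy)."""
    return min(roots, key=lambda x: abs(v(x) - v(y)))


def velocity(x: float) -> float:
    return 2.0 * math.sin(x)


def velocity_grad(x: float) -> float:
    return 2.0 * math.cos(x)


y = math.pi
roots = find_inverse_roots(y, velocity, velocity_grad, h=1.0, starts=[0.5 * k for k in range(13)])
print(sorted(round(r, 4) for r in roots))
print(select_straightest(roots, y, velocity))
```

Every returned root reproduces `y` exactly under the forward step, so all of them are valid inversions. Only the selection rule decides which trajectory is kept.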

2410.10137 2026-06-17 cs.LG math.DG stat.CO stat.ML Updated version

Variational autoencoders with latent high-dimensional steady geometric flows for dynamics

Andrew Gracyk

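AI summary: Proposes VAE-DLM, which introduces steady geometric flows in the latent space and solves the high-dimensional flow with a physics-informed approach. This makes latent representations more expressive and reduces OOD error by 15%-35% on PDE-type data.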

Journal ref
23rd International Conference of Numerical Analysis and Applied Mathematics (ICNAAM) 2025

Abstract

We develop Riemannian approaches to variational autoencoders (VAEs) for PDE-type ambient data with regularizing geometric latent dynamics, which we refer to as VAE-DLM, or VAEs with dynamical latent manifolds. We redevelop the VAE framework such that manifold geometries, subject to our geometric flow and embedded in Euclidean space, are learned in the intermediary latent space developed by encoders and decoders. By tailoring the geometric flow in which the latent space evolves, we induce latent geometric properties of our choosing, which are reflected in empirical performance. We reformulate the traditional evidence lower bound (ELBO) loss with a careful choice of prior. We develop a linear geometric flow with a steady-state regularizing term. This flow requires automatic differentiation of only one time derivative and can be solved in moderately high dimensions with a physics-informed approach, allowing more expressive latent representations. We discuss how this flow can be formulated as a gradient flow and how it keeps entropy away from metric singularity. This, along with an eigenvalue penalization condition, helps ensure that the manifold is sufficiently large in measure, nondegenerate, and of canonical geometry, all of which contribute to a robust representation. Our methods focus on the modified multi-layer perceptron architecture with tanh activations for the manifold encoder-decoder. We demonstrate on our datasets of interest that our methods perform at least as well as the traditional VAE, and often better. Our methods can outperform both the traditional VAE and a VAE endowed with our proposed architecture, frequently reducing out-of-distribution (OOD) error by 15% to 35% on select datasets. We highlight our method on ambient PDEs whose solutions maintain minimal variation at late times. We provide empirical justification for how VAEs can be used to improve robust learning of external dynamics.
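For reference, VAE-DLM starts from the traditional ELBO, which is the standard variational bound below. It uses an encoder $q_\phi(z \mid x)$, a decoder $p_\theta(x \mid z)$ and a prior $p(z)$. The abstract says only that the prior is chosen with care, so neither the specific prior nor the geometric-flow regularizer is reproduced here.

$$
\log p_\theta(x) \;\ge\; \mathbb{E}_{q_\phi(z \mid x)}\big[\log p_\theta(x \mid z)\big] \;-\; \mathrm{KL}\big(q_\phi(z \mid x)\,\|\,p(z)\big) \;=\; \mathcal{L}_{\mathrm{ELBO}}(x).
$$

Changing $p(z)$ alters only the KL term. That term is where a geometry-aware prior would enter the loss.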

2507.05169 2026-06-17 cs.LG cs.AI cs.CL cs.CV cs.RO Updated version

Critique of World Model: A Generative Latent Prediction Architecture for World Modeling

Eric Xing, Mingkai Deng, Jinyu Hou

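AI summary: Starting from the psychological notion of "hypothetical thinking", argues that the core goal of a world model is to simulate all actionable possibilities of the real world. The paper then designs a Generative Latent Prediction (GLP) architecture based on stateful, hierarchical, multi-level, mixed continuous/discrete representations.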

Abstract

World Model, the algorithmic simulator of the real-world environment which biological agents experience and act upon, has been an emerging topic in recent years due to the rising need to develop virtual agents with artificial (general) intelligence. There has been much discussion on what a world model really is, how to build it, how to use it, and how to evaluate it. In this essay, starting from the imagination in the famed Sci-Fi classic Dune, and drawing inspiration from the concept of "hypothetical thinking" in psychology literature, we argue the primary goal of a world model to be simulating all actionable possibilities of the real world for purposeful reasoning and acting. We examine the key design dimensions of world modeling: data, representation, architecture, learning objective, and usage, surveying existing approaches and analyzing their tradeoffs. Building on this examination, we propose a new Generative Latent Prediction (GLP) architecture for a general-purpose world model, based on stateful, hierarchical, multi-level, and mixed continuous/discrete representations, and a generative and self-supervised learning framework, with an outlook of a Physical, Agentic, and Nested (PAN) AGI system enabled by such a model.

2601.22495 2026-06-17 cs.LG Updated version

Gradual Fine-Tuning for Flow Matching Models

Gudrun Thorkelsdottir, Arindam Banerjee

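AI summary: Proposes Gradual Fine-Tuning (GFT), which uses an annealing strategy to fine-tune flow generative models from target-distribution samples, with theoretical guarantees that it approaches the true target. Experiments show better stability, efficiency, and diversity than existing methods.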

Comments Preprint. Added methodology and experimental sections

Abstract

Fine-tuning flow matching models is a central challenge in settings with limited data, evolving distributions, or computational constraints. While recent work has produced significant advances, particularly in the area of reward-based fine-tuning, current methods fail to demonstrate both theoretical correctness and strong empirical results in terms of stability, efficiency, and diversity preservation. In this work, we propose Gradual Fine-Tuning (GFT), a simple yet principled annealing-based framework for fine-tuning flow generative models when only samples from the target distribution are available. For stochastic flows, GFT defines a temperature-controlled sequence of intermediate objectives that smoothly interpolate between the pretrained and target drifts, provably approaching the true target as the temperature approaches zero. We analytically demonstrate that sample generation after GFT can be made substantially more efficient with the use of arbitrary (e.g., optimal transport) couplings, as well as by utilizing few-step inference methods. Empirically, GFT significantly improves convergence stability, while maintaining or improving generation quality, training speed, and generation diversity compared to other fine-tuning methods. Our results position GFT as a simple yet theoretically grounded and practically effective alternative for scalable adaptation of flow matching models under distribution shift.

2602.11590 2026-06-17 cs.LG Updated version

Learn from Your Mistakes: Self-Correcting Masked Diffusion Models

Yair Schiff, Omer Belhasin, Roy Uziel, Guanghan Wang, Marianne Arriola, Gilad Turok, Ran Zilberstein, Michael Elad, Volodymyr Kuleshov

Affiliations: Cornell; NVIDIA

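AI summary: Proposes ProSeCo, which trains a model to perform both unmasking and error correction and iteratively revises already-decoded tokens during generation. This improves sample quality and enables faster sampling.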

Comments Code to reproduce our experiments is available here: https://github.com/kuleshov-group/proseco

Abstract

Masked diffusion models (MDMs) have emerged as a promising alternative to autoregressive models, enabling parallel token generation while achieving competitive performance. Despite these advantages, MDMs face a fundamental limitation: once tokens are unmasked, they remain fixed, leading to error accumulation and ultimately degrading sample quality. We address this by proposing a framework that trains a model to perform both unmasking and correction. By reusing outputs from the MDM denoising network as inputs for corrector training, we train a model to recover from potential mistakes. During generation we apply additional corrective refinement steps between unmasking ones in order to change decoded tokens and improve outputs. We name our training and sampling method Progressive Self-Correction (ProSeCo) for its unique ability to iteratively refine an entire sequence, including already generated tokens. We conduct extensive experimental validation across multiple conditional and unconditional tasks, demonstrating that ProSeCo yields better quality-efficiency trade-offs (up to ~4x faster sampling) and enables inference-time compute scaling to further increase sample quality beyond standard MDMs (up to ~1.2x improvement on benchmarks).
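The sketch below illustrates the sampling loop described above. It alternates unmasking steps with corrective steps that may overwrite already-decoded tokens. Several pieces are hypothetical: the `predict` interface (one proposed token and one confidence per position), the confidence-ordered unmasking, the overwrite rule and the toy predictor. ProSeCo trains a single network for both roles, and its actual schedule is given in the paper and repository.

```python
from typing import Callable, List, Tuple

MASK = "<mask>"
Predictor = Callable[[List[str]], List[Tuple[str, float]]]


def sample_with_correction(length: int, predict: Predictor, tokens_per_step: int = 2,
                           correction_steps: int = 1) -> List[str]:
    seq = [MASK] * length
    while MASK in seq:
        # Unmasking step: reveal the most confident masked positions.
        preds = predict(seq)
        masked = [i for i, tok in enumerate(seq) if tok == MASK]
        masked.sort(key=lambda i: preds[i][1], reverse=True)
        for i in masked[:tokens_per_step]:
            seq[i] = preds[i][0]
        # Corrective steps: re-predict and overwrite decoded tokens the model now disagrees with.
        for _ in range(correction_steps):
            preds = predict(seq)
            for i, tok in enumerate(seq):
                if tok != MASK and preds[i][0] != tok:
                    seq[i] = preds[i][0]
    return seq


TARGET = list("hello")


def toy_predict(seq: List[str]) -> List[Tuple[str, float]]:
    """Hypothetical model: guesses position 0 wrongly until at least two tokens are decoded."""
    decoded = sum(tok != MASK for tok in seq)
    out = []
    for i, target_tok in enumerate(TARGET):
        tok = "x" if (i == 0 and decoded < 2) else target_tok
        out.append((tok, 1.0 - 0.1 * i))
    return out


print("".join(sample_with_correction(5, toy_predict)))                      # prints "hello"
print("".join(sample_with_correction(5, toy_predict, correction_steps=0)))  # prints "xello"
```

Without corrective steps, the early mistake at position 0 stays fixed. This is exactly the error-accumulation failure the abstract attributes to standard MDMs.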

2604.24357 2026-06-17 cs.LG cs.AI Updated version

DPRM: A Plug-in Doob h transform-induced Token-Ordering Module for Diffusion Language Models

Dake Bu, Wei Huang, Andi Han, Hau-San Wong, Qingfu Zhang, Taiji Suzuki, Atsushi Nitanda

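AI summary: Proposes DPRM, a token-ordering module for diffusion language models. It uses online estimates to shift gradually from confidence-driven ordering to process-reward-guided ordering. DPRM is evaluated across nine hosts and improves several language, DNA, and multimodal settings.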

Abstract

Diffusion language models generate without a fixed left-to-right order, leaving token ordering as a central algorithmic choice. Existing systems mainly use random masking or confidence-driven ordering, which respectively suffer from train-test mismatch and myopic exploration. We introduce DPRM (Doob h-transform Process Reward Model), a plug-in token-ordering module that keeps the host architecture, denoising objective and supervision unchanged, and modifies only the ordering policy. DPRM starts from confidence-driven ordering and gradually shifts to process-reward-guided ordering through online estimates. We characterize the exact DPRM policy as a reward-tilted Gibbs reveal law, prove convergence of its stagewise Soft-BoN approximation, show that the online bucketized controller tracks the exact DPRM score at empirical-Bernstein rates, and establish a sample-complexity advantage under tractable optimization assumptions. Across nine hosts covering language reasoning, test-time scaling, protein, single-cell, molecular, DNA, text-to-image generation, and VQA, DPRM order variants improve several language, DNA, and multimodal settings while also identifying boundary cases where confidence-only ordering or task-specific utilities are preferable. Code is available at: https://github.com/DakeBU/DPRM-DLLM
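Here is a rough sketch of a Gibbs-style reveal rule whose score shifts from model confidence to a process-reward estimate as decoding progresses. The linear mixing by `progress`, the inverse temperature `beta` and the per-position score dictionaries are illustrative assumptions. The paper's exact reward-tilted reveal law and its online bucketized estimator are not reproduced.

```python
import math
from typing import Dict, Sequence


def reveal_distribution(masked: Sequence[int], confidence: Dict[int, float],
                        reward: Dict[int, float], progress: float,
                        beta: float = 5.0) -> Dict[int, float]:
    """Softmax over masked positions of a confidence/reward mixture.

    progress = 0 uses confidence only; progress = 1 uses the process-reward estimate only.
    """
    scores = {i: (1.0 - progress) * confidence[i] + progress * reward[i] for i in masked}
    top = max(scores.values())
    weights = {i: math.exp(beta * (s - top)) for i, s in scores.items()}  # stable exponentials
    total = sum(weights.values())
    return {i: w / total for i, w in weights.items()}


conf = {0: 0.9, 1: 0.2, 2: 0.5}
rew = {0: 0.1, 1: 0.9, 2: 0.4}
for p in (0.0, 0.5, 1.0):
    dist = reveal_distribution([0, 1, 2], conf, rew, progress=p)
    print(p, max(dist, key=dist.get), round(dist[max(dist, key=dist.get)], 3))
```

Sampling the next position from this distribution, instead of taking the argmax, keeps some exploration. A larger `beta` moves the rule toward greedy ordering.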

2501.09876 2026-06-17 math.NA cs.LG cs.NA Updated version

Geometry-Preserving Encoder/Decoder in Latent Generative Models

Wonjun Lee, Riley C. W. O'Neill, Dongmian Zou, Jeff Calder, Gilad Lerman

Affiliations: Department of Mathematics, The Ohio State University; Department of Mathematics, University of Minnesota; Zu Chongzhi Center for Mathematics and Computational Sciences, Duke Kunshan University

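AI summary: Proposes a new geometry-preserving encoder/decoder framework that retains the geometric structure of the data distribution. In latent diffusion models, this yields more efficient training and faster convergence.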

Comments 50 pages

Abstract

Generative modeling aims to generate new data samples that resemble a given dataset. When using diffusion models for this task, one of the main challenges is solving the problem in the input space, which tends to be very high-dimensional. To address this, recent approaches solve diffusion models in the latent space through an encoder that maps from the data space to a lower-dimensional latent space, improving training efficiency and achieving state-of-the-art results. The variational autoencoder (VAE) is the most commonly used encoder/decoder framework in this domain, known for its ability to learn latent representations and generate data samples. In this paper, we introduce a novel encoder/decoder framework with theoretical properties distinct from those of the VAE, specifically designed to preserve the geometric structure of the data distribution. We demonstrate the significant advantages of this geometry-preserving encoder in the training process of both the encoder and decoder. Additionally, we provide theoretical results proving convergence of the training process, including convergence guarantees for encoder training, and results showing faster convergence of decoder training when using the geometry-preserving encoder.

2502.18049 2026-06-17 stat.ML cs.LG Updated version

Recursive Learning Without Collapse: A Weighting-Based Stabilization Framework

Hengzhi He, Shirong Xu, Guang Cheng

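AI summary: Addresses model collapse in recursive training of generative models with a weighting-based training strategy. For settings that mix real and synthetic data, the paper theoretically derives a unified expression for the optimal weighting scheme, which reveals a trade-off between synthetic-data use and model performance.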

Comments This article has been accepted for publication in Journal of the Royal Statistical Society: Series B, published by Oxford University Press

Abstract

Recent studies identified an intriguing phenomenon in recursive generative model training known as model collapse, where models trained on data generated by previous models exhibit severe performance degradation. Addressing this issue and developing more effective training strategies have become central challenges in generative model research. In this paper, we investigate this phenomenon within a novel framework, where generative models are iteratively trained on a combination of newly collected real data and synthetic data from the previous training step. To develop an optimal training strategy for integrating real and synthetic data, we evaluate the performance of a weighted training scheme in various scenarios, including Gaussian distribution estimation, generalized linear models, and nonparametric estimation. We theoretically characterize the impact of the mixing proportion and weighting scheme of synthetic data on the final model's performance. Our key finding is that, across different settings, the optimal weighting scheme under different proportions of synthetic data asymptotically follows a unified expression, revealing a fundamental trade-off between leveraging synthetic data and model performance. In some cases, the optimal weight assigned to real data corresponds to the reciprocal of the golden ratio. Finally, we validate our theoretical results on extensive simulated datasets and a real tabular dataset.
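The following small simulation shows the weighted scheme in its simplest setting, Gaussian mean estimation. Each round combines fresh real samples with synthetic samples drawn from the previous round's fitted model, and the real-data mean is weighted by `w_real`. The default `w_real = 1/phi` (about 0.618) only mirrors the golden-ratio observation for illustration. The sketch does not establish that this weight is optimal, and the sample size and round count are arbitrary choices.

```python
import math
import random
from statistics import fmean

PHI = (1 + math.sqrt(5)) / 2  # golden ratio, about 1.618


def weighted_mean_estimate(real, synthetic, w_real):
    """Weighted combination of the real-data mean and the synthetic-data mean."""
    return w_real * fmean(real) + (1.0 - w_real) * fmean(synthetic)


def recursive_training(true_mean=0.0, n=200, rounds=20, w_real=1 / PHI, seed=0):
    """Refit a Gaussian mean for several rounds, each mixing real and self-generated data."""
    rng = random.Random(seed)
    estimate = fmean(rng.gauss(true_mean, 1.0) for _ in range(n))
    for _ in range(rounds):
        real = [rng.gauss(true_mean, 1.0) for _ in range(n)]
        synthetic = [rng.gauss(estimate, 1.0) for _ in range(n)]  # samples from last round's model
        estimate = weighted_mean_estimate(real, synthetic, w_real)
    return estimate


print("w_real = 1/phi:", recursive_training())
print("w_real = 0 (synthetic only):", recursive_training(w_real=0.0))
```

With `w_real = 0`, the estimate performs a random walk driven only by its own samples, which is a simple instance of collapse. Any positive real-data weight anchors it to the true mean.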

2602.03420 2026-06-17 cs.SD cs.LG Updated version

CoCoEmo: Composable and Controllable Human-Like Emotional TTS via Activation Steering

Siyi Wang, Shihong Tan, Siyi Liu, Hong Jia, Gongping Huang, James Bailey, Ting Dang

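AI summary: Proposes an activation-steering framework that enables composable mixed-emotion synthesis and text-emotion mismatch synthesis in hybrid TTS models. It finds that emotional prosody is produced mainly by the language module rather than the flow-matching module.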

Abstract

Emotional expression in human speech is nuanced and compositional, often involving multiple, sometimes conflicting, affective cues that may diverge from linguistic content. In contrast, most expressive text-to-speech systems enforce a single utterance-level emotion, collapsing affective diversity and suppressing mixed or text-emotion-misaligned expression. While activation steering via latent direction vectors offers a promising solution, it remains unclear whether emotion representations are linearly steerable in TTS, where steering should be applied within hybrid TTS architectures, and how such complex emotion behaviors should be evaluated. This paper presents the first systematic analysis of activation steering for emotional control in hybrid TTS models, introducing a quantitative, controllable steering framework, and multi-rater evaluation protocols that enable composable mixed-emotion synthesis and reliable text-emotion mismatch synthesis. Our results demonstrate, for the first time, that emotional prosody and expressive variability are primarily synthesized by the TTS language module instead of the flow-matching module, and also provide a lightweight steering approach for generating natural, human-like emotional speech.
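A minimal sketch of activation steering with composable emotion directions follows. It assumes difference-in-means directions computed from hidden states of emotional versus neutral utterances, and a steering vector formed as a weighted sum of those directions. The array shapes, the `strength` parameter and the random stand-in activations are assumptions. Which layer of the TTS language module to hook is what the paper studies and is not reproduced here.

```python
import numpy as np


def steering_direction(emotion_acts: np.ndarray, neutral_acts: np.ndarray) -> np.ndarray:
    """Unit-norm difference between mean emotional and mean neutral hidden states."""
    d = emotion_acts.mean(axis=0) - neutral_acts.mean(axis=0)
    return d / np.linalg.norm(d)


def steer(hidden: np.ndarray, directions, weights, strength: float = 1.0) -> np.ndarray:
    """Add a weighted mix of emotion directions to every position's hidden state."""
    offset = np.zeros(hidden.shape[-1])
    for w, d in zip(weights, directions):
        offset += w * d
    return hidden + strength * offset


rng = np.random.default_rng(0)
dim = 16
neutral = rng.standard_normal((32, dim))
happy = neutral + 1.5 * np.eye(dim)[0]   # stand-in activations shifted along axis 0
sad = neutral - 1.5 * np.eye(dim)[1]     # stand-in activations shifted along axis 1
directions = [steering_direction(happy, neutral), steering_direction(sad, neutral)]

hidden = rng.standard_normal((10, dim))  # (sequence length, hidden size)
mixed = steer(hidden, directions, weights=[0.7, 0.3], strength=4.0)
```

Mixing weights that sum to one give a convex blend of emotions, and `strength` scales the overall intensity independently of the blend.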

2602.06806 2026-06-17 cs.CV cs.LG Updated version

RAIGen: Rare Attribute Identification in Text-to-Image Generative Models

Silpa Vadakkeeveetil Sreelatha, Dan Wang, Serge Belongie, Muhammad Awais, Anjan Dutta

Affiliations: University of California, Berkeley

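AI summary: Proposes RAIGen, which uses Matryoshka sparse autoencoders and a novel minority metric to discover rare attributes in diffusion models without labels. The framework also supports amplifying those attributes.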

Comments Accepted at ICML 2026. Webpage and code available at https://github.com/VSSILPA/RAIGen

Abstract

Text-to-image diffusion models achieve impressive generation quality but inherit and amplify training-data biases, skewing coverage of semantic attributes. Prior work addresses this in two ways. Closed-set approaches mitigate biases in predefined fairness categories (e.g., gender, race), assuming socially salient minority attributes are known a priori. Open-set approaches frame the task as bias identification, highlighting majority attributes that dominate outputs. Both overlook a complementary task: uncovering rare or minority features underrepresented in the data distribution (social, cultural, or stylistic) yet still encoded in model representations. We introduce RAIGen, the first framework, to our knowledge, for label-free rare-attribute discovery in diffusion models, requiring no predefined minority categories. RAIGen leverages Matryoshka Sparse Autoencoders and a novel minority metric combining neuron activation frequency with semantic distinctiveness to identify interpretable neurons whose top-activating images reveal underrepresented attributes. Experiments show RAIGen discovers attributes beyond fixed fairness categories in Stable Diffusion, scales to larger models such as SDXL, supports systematic auditing across architectures, and enables targeted amplification of rare attributes during generation. The project page is available at https://vssilpa.github.io/RAIGen_webpage/.
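This is a hypothetical sketch of a minority score in the spirit described above. Neurons score highest when they activate on few images (low frequency) and their decoder direction is far from every other neuron's (high semantic distinctiveness). The product form, the activation threshold and the use of SAE decoder rows as semantic directions are all assumptions. The paper's metric may combine these quantities differently, and any SAE's codes could stand in for a Matryoshka SAE here.

```python
import numpy as np


def minority_scores(activations: np.ndarray, decoder_dirs: np.ndarray,
                    act_threshold: float = 0.0) -> np.ndarray:
    """Score neurons that fire rarely and point in directions unlike any other neuron.

    activations: (n_images, n_neurons) SAE codes; decoder_dirs: (n_neurons, dim).
    """
    freq = (activations > act_threshold).mean(axis=0)   # fraction of images activating each neuron
    unit = decoder_dirs / np.linalg.norm(decoder_dirs, axis=1, keepdims=True)
    sims = unit @ unit.T
    np.fill_diagonal(sims, -np.inf)                     # ignore each neuron's similarity to itself
    distinctiveness = 1.0 - sims.max(axis=1)            # 1 - cosine to the nearest other neuron
    return (1.0 - freq) * distinctiveness


acts = np.array([[1.0, 1.0, 0.0],
                 [1.0, 0.0, 0.0],
                 [1.0, 1.0, 0.0],
                 [0.0, 1.0, 1.0]])
dirs = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
scores = minority_scores(acts, dirs)
print(scores, int(np.argmax(scores)))
```

Here neurons 0 and 1 are frequent and redundant with each other. Neuron 2 fires on one image in four and has a unique direction, so it ranks first as a candidate rare attribute.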

2602.11453 2026-06-17 cs.IR cs.AI cs.LG Updated version

From Noise to Order: Learning to Rank via Denoising Diffusion

Sajad Ebrahimi, Bhaskar Mitra, Negar Arabzadeh, Ye Yuan, Haolun Wu, Fattane Zarrinkalam, Ebrahim Bagheri

Affiliations: University of Guelph; Independent Researcher; University of California, Berkeley; McGill University; University of Toronto

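AI summary: Proposes DiffusionRank, a generative ranking model based on denoising diffusion that models the joint distribution of feature vectors and relevance labels. It outperforms its discriminative counterparts on four standard LTR datasets.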

Abstract

Learning-to-rank (LTR) methods have traditionally been limited to discriminative machine learning approaches that model the probability of the document being relevant to the query given some feature representation of the query-document pair. We propose an alternative denoising diffusion-based generative approach to LTR that instead models the full joint distribution over features and relevance labels. While in discriminative LTR, an over-parameterized ranking model may find different ways to fit the training data, we posit that candidate solutions that can explain the full data distribution under the generative setting may be better at estimating relevance. Thus, we propose DiffusionRank that extends TabDiff, an existing diffusion model for tabular datasets, to create generative alternatives to classical discriminative pointwise and pairwise LTR objectives. Our work demonstrates improvements from DiffusionRank over discriminative counterparts on four standard LTR datasets and points to a rich space for future exploration to leverage ongoing advancements in deep generative models for LTR. Our code is publicly available at https://github.com/sadjadeb/DiffusionRank.

2603.04438 2026-06-17 eess.IV cs.AI cs.LG Updated version

CogGen: Cognitive-Load-Inspired Fully Unsupervised Deep Generative Modeling for Compressively Sampled MRI Reconstruction

Qingyong Zhu, Yumin Tan, Xiang Gu, Dong Liang

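AI summary: Proposes CogGen, which follows the cognitive easy-to-hard principle and decomposes CS-MRI reconstruction into a staged inversion problem using self-paced curriculum learning and an MRI-aware dual-threshold weighting strategy. Theory shows reduced local sufficient-iteration and cumulative noise-amplification bounds, and experiments show gains over existing unsupervised and supervised methods.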

Abstract

Fully unsupervised deep generative modeling (FU-DGM) offers significant potential for compressively sampled magnetic resonance imaging (CS-MRI) reconstruction. Representative FU-DGM formulations, such as deep image prior (DIP) and implicit neural representation (INR), employ architectural bias to induce a low-dimensional manifold in the image space that aligns with the forward observation. However, as the underlying inverse system is highly ill-posed, prolonged iterative fitting in FU-DGM typically leads to poor efficiency and noise amplification. In this paper, guided by the cognitive principle of easy-to-hard learning, we propose CogGen, an FU-DGM framework that reformulates CS-MRI reconstruction as a staged inversion problem. Specifically, CogGen implements a self-paced curriculum learning (SPCL)-driven progressive scheduling strategy through an MRI-aware dual-threshold weighting criterion, which adaptively regulates k-space measurement participation. The data-consistency residual thresholding evaluates the fitting reliability of the current generator, while the k-space radius thresholding controls stage-wise measurement exposure, thereby avoiding uniform fitting throughout optimization. Theoretically, our analysis shows that, when early stages favor easy-to-fit measurements, CogGen yields a reduced local sufficient-iteration bound and a smaller cumulative noise-amplification bound, explaining the improved convergence behavior and reconstruction fidelity of CogGen within a finite iteration budget. Numerical experiments demonstrate that both CogGen instantiations, CogGen-DIP and CogGen-INR, achieve superior performance over prevailing CS-MRI reconstruction techniques, including unsupervised and supervised pipelines.
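Below is a minimal sketch of a dual-threshold admission rule of the kind described above. It assumes a centred Cartesian k-space grid and binary weights. It also assumes a hypothetical schedule in which the radius limit grows across stages, so low-frequency samples (treated here as the easy-to-fit ones) enter first. The names, thresholds and schedule are illustrative. CogGen's self-paced weighting adapts during optimization rather than following a fixed list.

```python
import numpy as np


def kspace_weights(residual: np.ndarray, radius_limit: float, residual_limit: float) -> np.ndarray:
    """Binary weights selecting k-space samples that pass both thresholds.

    residual: per-sample data-consistency residual |A x - y| on a centred k-space grid.
    """
    ny, nx = residual.shape
    ky = np.arange(ny) - ny // 2
    kx = np.arange(nx) - nx // 2
    radius = np.sqrt(ky[:, None] ** 2 + kx[None, :] ** 2)  # distance from the k-space centre
    return ((radius <= radius_limit) & (residual <= residual_limit)).astype(float)


def weighted_data_loss(residual: np.ndarray, weights: np.ndarray) -> float:
    """Mean squared residual over the currently admitted measurements only."""
    admitted = weights.sum()
    return float((weights * residual ** 2).sum() / admitted) if admitted > 0 else 0.0


rng = np.random.default_rng(0)
residual = np.abs(rng.standard_normal((32, 32)))
for stage, radius_limit in enumerate([4.0, 8.0, 16.0, 23.0]):  # hypothetical growing schedule
    w = kspace_weights(residual, radius_limit, residual_limit=2.0)
    print(stage, int(w.sum()), weighted_data_loss(residual, w))
```

The radius threshold controls which measurements the generator is exposed to. The residual threshold temporarily excludes samples it currently fits poorly, so early stages are not dominated by hard or noisy measurements.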

2604.01197 2026-06-17 quant-ph cond-mat.stat-mech cs.CC cs.LG Updated version

Learning and Generating Mixed States Prepared by Shallow Channel Circuits

Fangjun Hu, Christian Kokail, Milan Kornjača, Pedro L. S. Lopes, Weiyuan Gong, Sheng-Tao Wang, Xun Gao, Stefan Ostermann

Affiliations: QuEra Computing Inc.; School of Engineering and Applied Sciences, Harvard University

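AI summary: Studies learning to generate mixed states prepared by shallow channel circuits. It proves that states in the trivial phase can be learned efficiently from measurement data alone, which provides a structural foundation for quantum generative models.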

Comments 44 pages, 14 figures, 1 table

Abstract

Learning quantum states from measurement data is a central problem in quantum information and computational complexity. In this work, we study the problem of learning to generate mixed states on a finite-dimensional lattice. Motivated by recent developments in mixed state phases of matter, we focus on arbitrary states in the trivial phase. A state belongs to the trivial phase if there exists a shallow preparation channel circuit under which local reversibility is preserved throughout the preparation. We prove that any mixed state in this class can be efficiently learned from measurement access alone. Specifically, given copies of an unknown trivial phase mixed state, our algorithm outputs a shallow local channel circuit that approximately generates this state in trace distance. The sample complexity and runtime are polynomial (or quasi-polynomial) in the number of qubits, assuming constant (or polylogarithmic) circuit depth and gate locality. Importantly, the learner is not given the original preparation circuit and relies only on its existence. Our results provide a structural foundation for quantum generative models based on shallow channel circuits. In the classical limit, our framework also inspires an efficient algorithm for classical diffusion models using only a polynomial overhead of training and generation.

2605.07971 2026-06-17 cs.CV cs.LG 版本更新

DVD: Discrete Voxel Diffusion for 3D Generation and Editing

DVD: 用于3D生成和编辑的离散体素扩散

Zhengrui Xiang, Jiaqi Wu, Fupeng Sun, Heliang Zheng, Yingzhen Li

发表机构 * Imperial College London(伦敦帝国学院) Math Magic Hitem3D

AI总结 提出离散体素扩散框架(DVD),通过将体素占用视为离散变量,实现稀疏体素的生成、不确定性估计和编辑,避免连续到离散的阈值处理,并提供可解释的生成动态。

AI中文摘要

我们提出了离散体素扩散(DVD),这是一个离散扩散框架,用于在基于SLat(结构化潜在表示)的3D生成管道中生成、评估和编辑稀疏体素。尽管离散扩散在类图像生成中通常未能取代连续扩散,但我们表明它可以作为稀疏体素支架的有效第一阶段先验。通过将体素占用视为原生离散变量,DVD避免了连续到离散的阈值处理,并为体素生成、不确定性估计和编辑提供了一个简单的框架。除了质量提升外,DVD通过显式的类别建模提供了更可解释的生成动态。此外,我们利用预测熵作为稳健的不确定性度量,识别模糊的体素区域和复杂样本,从而支持数据过滤和质量评估等任务。最后,我们提出了一种使用块结构扰动模式的轻量级微调策略,使模型能够在单轮采样内修复和编辑体素,所需的辅助计算量可忽略不计,且无需额外的模型评估。代码见 https://github.com/TeCai/DVD。

英文摘要

We introduce Discrete Voxel Diffusion (DVD), a discrete diffusion framework to generate, assess, and edit sparse voxels for SLat (Structured LATent) based 3D generative pipelines. Although discrete diffusion has not generally displaced continuous diffusion in image-like generation, we show that it can be an effective first-stage prior for sparse voxel scaffolds. By treating voxel occupancy as a native discrete variable, DVD avoids continuous-to-discrete thresholding and provides a simple framework for voxel generation, uncertainty estimation, and editing. Beyond quality gains, DVD provides more interpretable generation dynamics through explicit categorical modeling. Furthermore, we leverage the predictive entropy as a robust uncertainty metric to identify ambiguous voxel regions and complicated samples, facilitating tasks such as data filtering and quality assessment. Finally, we propose a lightweight fine-tuning strategy using block-structured perturbation patterns. This approach empowers the model to inpaint and edit voxels within a single sampling round, requiring negligible auxiliary computation and no additional model evaluations. Code is available at https://github.com/TeCai/DVD.
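DVD's uncertainty signal is the predictive entropy of a categorical occupancy prediction. The sketch below assumes, hypothetically, that a model has already produced a per-voxel occupancy probability, stored in a dict keyed by integer voxel coordinates. The entropy threshold and the mean-entropy sample score are illustrative choices, not the paper's settings.

```python
import math
from typing import Dict, List, Tuple

Voxel = Tuple[int, int, int]


def binary_entropy(p: float) -> float:
    """Entropy (in nats) of a Bernoulli(p) occupancy prediction; 0 at p = 0 or 1."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -(p * math.log(p) + (1.0 - p) * math.log(1.0 - p))


def flag_ambiguous(occ_prob: Dict[Voxel, float], threshold: float) -> List[Voxel]:
    """Voxels whose predictive entropy exceeds `threshold`, most uncertain first."""
    scored = [(binary_entropy(p), v) for v, p in occ_prob.items()]
    return [v for h, v in sorted(scored, reverse=True) if h > threshold]


def sample_uncertainty(occ_prob: Dict[Voxel, float]) -> float:
    """Mean per-voxel entropy, usable as a sample-level score for data filtering."""
    if not occ_prob:
        return 0.0
    return sum(binary_entropy(p) for p in occ_prob.values()) / len(occ_prob)
```

For example, with probabilities 0.99, 0.5 and 0.7, only the 0.5 and 0.7 voxels exceed a threshold of 0.5 nats, because the maximum possible entropy is ln 2 ≈ 0.693.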

2606.08810 2026-06-17 cs.CL cs.LG 版本更新

Continuous Language Diffusion as a Decoder-Interface Problem

连续语言扩散作为解码器-接口问题

Zhicheng Du, Lan Ma

发表机构 * Tsinghua Shenzhen International Graduate School, Tsinghua University(清华大学深圳国际研究生院, 清华大学)

AI总结 研究连续扩散语言模型如何从高斯噪声生成流畅文本,提出解码器-盆地机制,并设计诊断协议揭示标量指标隐藏的失败,通过接口相图解释令牌恢复行为。

AI中文摘要

高斯扰动后的句子嵌入没有直接的语言学解释,但连续扩散语言模型却能从中生成流畅文本。我们通过嵌入式语言流(ELF)研究这一谜题,并识别出一种解码器盆地机制:证据表明,当轨迹到达原生解码器能够读出稳定令牌的区域时,去噪变得可靠。我们引入了一套诊断协议,涵盖可去噪性、语义可恢复性、顺序敏感性、解码器兼容性和轨迹可靠性。它揭示了被标量指标掩盖的失败:低均方误差可能丢失语言内容,低困惑度可能反映低熵崩溃,干净的潜在重建也可能与狭窄的解码器盆地共存。一个解码器间隔界解释了为什么令牌恢复取决于间隔和局部解码器敏感性,而不仅仅是潜在误差。对公开ELF检查点的审计揭示了一个接口相图:早期预测可读性弱,轨迹中期的分歧标志着竞争区域,晚期预测进入高间隔的解码器盆地。一旦进入该盆地,在生成的ELF状态上实现令牌出奇地简单:冻结的T5(文本到文本迁移Transformer)令牌嵌入查找恢复了原生解码器93%–96%的决策,单个线性读出在32k样本时达到97.9%的一致性,在结构化残差尾部留下约1.1–1.2的困惑度差距。在保守的留出门控和显式诊断监控下,一条间隔规则可在去噪步骤中提前约17%–28%退出。对LangFlow、BitstreamDiffusion和连续潜在扩散语言模型(Cola-DLM)的边界检查表明,当状态对象和解码器改变时,相同的接口问题仍然有意义。因此,连续和潜在扩散语言模型应作为表示-解码器系统进行评估。

英文摘要

Gaussian-corrupted sentence embeddings have no direct linguistic interpretation, yet continuous diffusion language models can generate fluent text from them. We study this puzzle through Embedded Language Flows (ELF) and identify a decoder-basin mechanism: our evidence suggests that denoising becomes reliable when trajectories reach regions where the native decoder can read stable tokens. We introduce a diagnostic protocol for denoisability, semantic recoverability, order sensitivity, decoder compatibility, and trajectory reliability. It exposes failures hidden by scalar metrics: low mean-squared error can discard linguistic content, low perplexity can reflect low-entropy collapse, and clean latent reconstruction can coexist with a narrow decoder basin. A decoder-margin bound explains why token recovery depends on margin and local decoder sensitivity, not latent error alone. Auditing public ELF checkpoints reveals an interface phase diagram: early predictions are weakly readable, mid-trajectory disagreement marks a competition region, and late predictions enter a high-margin decoder basin. Once inside, token realization is surprisingly simple on generated ELF states: frozen T5 (Text-to-Text Transfer Transformer) token-embedding lookup recovers $93$--$96\%$ of native decoder decisions, and a single linear readout reaches $97.9\%$ agreement at 32k samples, leaving an $\approx1.1$--$1.2$ perplexity gap in a structured residual tail. Under conservative held-out gates, a margin rule exits roughly $17$--$28\%$ earlier in denoising steps under an explicit diagnostic monitor. Boundary checks on LangFlow, BitstreamDiffusion, and the Continuous Latent Diffusion Language Model (Cola-DLM) show that the same interface questions remain meaningful when the state object and decoder change. Continuous and latent diffusion language models should therefore be evaluated as representation-decoder systems.
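The margin rule above exits denoising once the decoder's predictions are confidently separated. Here is a minimal sketch, assuming the decoder's per-position logits are available at every denoising step. The margin is taken as the gap between the top two logits. The sketch exits at the first step where every position clears a gate for `patience` consecutive steps. The `gate` and `patience` names and the all-positions rule are our assumptions, and the paper's held-out gate calibration is not reproduced.

```python
from typing import List, Sequence


def token_margins(logits: Sequence[Sequence[float]]) -> List[float]:
    """Top-1 minus top-2 logit at each position (vocabulary size >= 2)."""
    margins = []
    for row in logits:
        top1, top2 = sorted(row, reverse=True)[:2]
        margins.append(top1 - top2)
    return margins


def first_exit_step(trajectory: Sequence[Sequence[Sequence[float]]],
                    gate: float, patience: int = 1) -> int:
    """Index of the first denoising step at which every position's margin has
    exceeded `gate` for `patience` consecutive steps; len(trajectory) if never."""
    streak = 0
    for step, logits in enumerate(trajectory):
        if min(token_margins(logits)) > gate:
            streak += 1
            if streak >= patience:
                return step
        else:
            streak = 0
    return len(trajectory)
```

Requiring a streak (`patience > 1`) trades a later exit for robustness against a single step whose margins briefly spike.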

5. 优化、泛化与理论分析 29 篇

2606.17120 2026-06-17 cs.LG physics.chem-ph 新提交

Noise-Driven Escape from Metastable Phases explains Grokking in Deep Neural Networks

噪声驱动从亚稳态逃逸解释深度神经网络中的grokking现象

Ibrahim Talha Ersoy, Karoline Wiesner

发表机构 * Complexity Science Group, Institute of Physics and Astronomy, University of Potsdam(波茨坦大学物理与天文研究所复杂性科学组)

AI总结 本文在线性DNN中表明,grokking现象与L2正则化诱导的一阶相变中的迟滞效应一致:SGD噪声驱动模型跨越能量势垒、逃离低精度亚稳态,逃逸时间符合Arrhenius标度。

Comments 13 pages, 4 figures. Accepted at HiLD 2026: 4th Workshop on High-dimensional Learning Dynamics

AI中文摘要

深度神经网络(DNN)在L2正则化强度变化下表现出一阶相变,每个相变标志着一个新的可学习特征的出现。在临界正则化强度以下,所有特征原则上都可学习,但共存的亚稳态(由能量势垒分隔)可能困住网络并阻碍收敛。泛化能力是DNN的优势,但仍有许多开放问题,其中包括所谓grokking的起源:在长时间的明显过拟合之后,泛化突然延迟出现。我们在线性DNN中表明,grokking与一阶L2相变中的迟滞一致:通过利用L2正则化有意制造困陷,我们证明处于低精度亚稳态的模型只有在SGD噪声驱动其跨越能量势垒时才会逃逸,且逃逸时间遵循Arrhenius标度。通过有意将模型困在亚稳相中,我们在跨越两个数量级的逃逸时间范围内重现了类似grokking的延迟收敛。借助稀疏子采样,我们还重现了典型的grokking曲线,其中测试误差最终接近最终训练误差。我们的工作表明,亚稳态的数量等于可学习特征的数量(数据协方差的每个奇异值对应一个),因此迟滞的可能性随任务复杂度自然增长。我们提供的证据表明,相同机制很可能也存在于一般的非线性DNN中。我们的结果为更高效的学习方案提供了思路。

英文摘要

Deep neural networks (DNNs) exhibit first-order phase transitions under variations of the L2 regularization strength, with each transition marking the onset of a new learnable feature. Below a critical regularization strength, all features are in principle learnable, but coexisting metastable states, separated by energy barriers, can trap the network and impede convergence. A strength of DNNs is their ability to generalize, but many open questions remain, among them the origin of so-called grokking: the abrupt, delayed onset of generalization after prolonged apparent overfitting. We show for linear DNNs that grokking is consistent with hysteresis in first-order L2 phase transitions: using L2 regularization to engineer deliberate trapping, we demonstrate that a model in a low-accuracy metastable state escapes only when SGD noise drives it across an energy barrier, with escape times following Arrhenius scaling. We reproduce grokking-like delayed convergence across two orders of magnitude in escape time by deliberately trapping models in metastable phases. Using sparse sub-sampling, we also reproduce the canonical grokking curve, in which test error eventually approaches the final training error. Our work suggests that the number of metastable states equals the number of learnable features (one per singular value of the data covariance), so the potential for hysteresis grows naturally with task complexity. We provide evidence that the same mechanism likely operates in general nonlinear DNNs. Our results provide routes toward more efficient learning schemes.
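Arrhenius scaling says the mean escape time grows as τ ≈ τ0·exp(ΔE/T), so ln τ is linear in 1/T with slope equal to the barrier ΔE. The sketch fits that line by least squares from measured (temperature, escape time) pairs. Treating T as an effective SGD noise temperature (for instance, proportional to learning rate over batch size) is an assumption made for illustration. The abstract does not specify how noise maps to temperature.

```python
import math
from typing import Sequence, Tuple


def fit_arrhenius(noise_temps: Sequence[float],
                  escape_times: Sequence[float]) -> Tuple[float, float]:
    """Least-squares fit of ln(tau) = ln(tau0) + barrier / T; returns (barrier, tau0).

    T is an effective noise temperature (an assumption; e.g. lr / batch size for SGD)."""
    xs = [1.0 / t for t in noise_temps]
    ys = [math.log(tau) for tau in escape_times]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    barrier = sxy / sxx              # slope of ln(tau) against 1/T
    tau0 = math.exp(my - barrier * mx)
    return barrier, tau0
```

On synthetic data generated exactly from τ = 3·exp(2/T), the fit recovers ΔE = 2 and τ0 = 3. On measured escape times, a straight line in the (1/T, ln τ) plot is the signature the abstract refers to.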

2606.17215 2026-06-17 cs.LG cs.DS stat.ML 新提交

Sum-of-Squares Degree Barriers for the Reweighted-Hinge Method in Robust Halfspace Learning: A Christoffel-Function Characterization

鲁棒半空间学习中重加权铰链方法的平方和度障碍:一个Christoffel函数刻画

Xiaoyu Li

发表机构 * Xiaoyu Li(李小宇)

AI总结 本文通过Christoffel函数精确刻画了有界度证书无法去除的异常质量,揭示了重加权铰链方法在恶意噪声下学习γ-间隔半空间时,证书的SoS度与异常容忍度之间的基本权衡。

AI中文摘要

去除异常值的证书只能通过低阶矩观察数据,而对手恰恰利用这一点:把腐败隐藏在干净数据本就显得典型的地方,即任何有界度测试都无法分辨的盲区。这个盲区恰好有精确的大小:干净边缘分布的Christoffel函数。现代数据分析正是通过对这个量设阈值来检测异常值,而这里从对手一侧将其解读为有界度证书无法去除的腐败量。我们将这一反转作为在恶意噪声下鲁棒学习γ-间隔半空间的重加权铰链方法(Shen, 2025; Zeng and Shen, 2025)的组织原则:起决定作用的资源是异常去除证书的平方和(SoS)度,而分辨原则指出,在中心c处能够对度-2t证书隐藏的最大腐败质量恰好等于干净边缘分布的Christoffel函数λ_{t+1}(c)。由此得到三个推论,均针对证书方法(而非信息论极限)。间隔-度权衡:将稠密煎饼认证到误差ε需要SoS度Ω(log(1/ε))或间隔Ω(√(log(1/ε))/√d),这解释了为何Shen (2025)所记录的log(1/ε)间隔不可避免;借助加权Chebyshev归约,阈值2t=Θ((|c|/s)^2)在一个经典加权极值估计成立的前提下是紧的。度-2异常障碍:分辨原则被实现为一个显式实例,在该实例上度2卡在η^{1/2},而度4可以突破,从而表明该方法较小的崩溃率源于度本身而非分析。以及一个度-2t算法,它追踪前沿η^{1-1/(2t)}(在t=1时恢复Shen (2025)的结果),其增益是一个显式常数,受煎饼密度限制,并由度-2障碍证明不可改进。

英文摘要

A certificate that removes outliers sees the data only through its low-degree moments, and an adversary exploits exactly this, hiding corruption where the clean data already looks typical, in the blind spot no bounded-degree test resolves. That blind spot turns out to have an exact size: the Christoffel function of the clean marginal, the very quantity modern data analysis thresholds to detect outliers, here read from the adversary's side as the corruption a bounded-degree certificate cannot remove. We turn this inversion into the organizing principle of the reweighted-hinge approach to robustly learning $γ$-margin halfspaces under malicious noise (Shen, 2025; Zeng and Shen, 2025): the governing resource is the Sum-of-Squares degree of the outlier-removal certificate, and the resolution principle states that the maximal corruption mass which can hide at a center $c$ from a degree-$2t$ certificate is exactly the Christoffel function $λ_{t+1}(c)$ of the clean marginal. Three consequences follow, all against the certificate method (not information-theoretic). A margin-degree tradeoff: certifying the dense pancake to error $ε$ costs SoS degree $Ω(\log(1/ε))$ or margin $Ω(\sqrt{\log(1/ε)}/\sqrt{d})$, explaining why the $\log(1/ε)$ margin Shen (2025) records is forced, with a weighted-Chebyshev reduction making the threshold $2t=Θ((|c|/s)^2)$ tight modulo one classical weighted-extremal estimate. A degree-$2$ outlier barrier: the resolution principle realized as an explicit instance on which degree $2$ is stuck at $η^{1/2}$ while degree $4$ escapes, locating the method's small breakdown rate in the degree, not the analysis. And a degree-$2t$ algorithm tracing the frontier $η^{1-1/(2t)}$ (recovering Shen (2025) at $t=1$), whose gain is an explicit constant, capped by the pancake density and shown unimprovable by the degree-$2$ barrier.
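For intuition about the resolution principle, the sketch evaluates a Christoffel function in the simplest case, a one-dimensional standard Gaussian marginal. With orthonormal polynomials p_j, λ_k(c) = 1/Σ_{j≤k} p_j(c)². Equivalently, λ_k(c) is the minimum of E[q(X)²] over polynomials q of degree at most k with q(c) = 1. The indexing here (degree ≤ k) is our convention and may be shifted relative to the abstract's λ_{t+1} for degree-2t certificates. The Gaussian marginal is likewise an illustrative assumption.

```python
import math


def christoffel_gaussian_1d(c: float, k: int) -> float:
    """lambda_k(c) = 1 / sum_{j=0}^{k} p_j(c)^2 for the standard Gaussian, where
    p_j = He_j / sqrt(j!) are the orthonormal probabilists' Hermite polynomials."""
    he_prev, he = 0.0, 1.0  # He_{-1} := 0, He_0 = 1
    total = 0.0
    for j in range(k + 1):
        total += he * he / math.factorial(j)
        # Three-term recurrence: He_{j+1}(c) = c * He_j(c) - j * He_{j-1}(c)
        he_prev, he = he, c * he - j * he_prev
    return 1.0 / total
```

λ shrinks quickly away from the center: little corruption mass can hide in the tails, while near c = 0, where clean data looks typical, a degree-limited test leaves more room.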

2606.17419 2026-06-17 cs.LG cs.NA math.NA 新提交

Generalization Guarantees for Multi-Input Neural Operator Learning in Sobolev Spaces

多输入神经算子学习在Sobolev空间中的泛化保证

Yahong Yang, Zecheng Zhang, Wei Zhu, Wenjing Liao, Hao Liu

发表机构 * Georgia Institute of Technology(佐治亚理工学院) University of Notre Dame(圣母大学) Hong Kong Baptist University(香港浸会大学)

AI总结 针对多输入神经算子,在Sobolev范数下建立逼近和泛化误差估计,量化各输入空间对误差界的贡献,并揭示平衡状态下输入维度、正则性和Sobolev阶的相互作用。

AI中文摘要

我们发展了多输入神经算子的逼近和泛化误差估计,输出误差在Sobolev范数下度量。与标准算子学习设置中只有一个输入函数不同,我们的框架允许多个输入函数定义在可能不同的域上,具有不同的维度和Sobolev正则性。导出的速率明确量化了每个输入空间对最终误差界的贡献。特别地,在平衡状态下,逼近和泛化速率由输入维度、正则性和Sobolev阶之间的相互作用控制,而对模型复杂度的依赖保持\(\log\log/\log\)型结构。我们的分析为多输入算子学习(包括Sobolev训练)提供了一个通用的理论框架,并适用于来自偏微分方程和科学计算的算子学习问题。

英文摘要

We develop approximation and generalization error estimates for multi-input neural operators, with the output error measured in Sobolev norms. In contrast to standard operator-learning settings with a single input function, our framework allows multiple input functions defined on possibly different domains, with different dimensions and Sobolev regularities. The derived rates explicitly quantify the contribution of each input space to the final error bound. In particular, in the balanced regime, the approximation and generalization rates are governed by the interaction between the input dimensions, regularities, and Sobolev orders, while the dependence on the model complexity retains a \(\log\log/\log\)-type structure. Our analysis provides a general theoretical framework for multi-input operator learning, including Sobolev training, and is applicable to operator learning problems arising from partial differential equations and scientific computing.

2606.17526 2026-06-17 cs.LG 新提交

MGUP: A Momentum-Gradient Alignment Update Policy for Stochastic Optimization

MGUP:一种用于随机优化的动量-梯度对齐更新策略

Da Chang, Ganzhao Yuan

发表机构 * Shenzhen Institute of Advanced Technology, Chinese Academy of Sciences(中国科学院深圳先进技术研究院) Shenzhen University of Advanced Technology(深圳理工大学) Pengcheng Laboratory(鹏城实验室) University of Chinese Academy of Sciences(中国科学院大学)

AI总结 提出MGUP机制,通过按固定比例选择参数施加大步长、其余参数用小步长,增强动量优化器,理论保证收敛,实验表明提升训练效率与稳定性。

Comments Published in NeurIPS 2025

AI中文摘要

高效优化对于训练大型语言模型至关重要。尽管层内选择性更新已有研究,但仍缺乏一种既能实现细粒度控制又能保证收敛的通用机制。为填补这一空白,我们提出了MGUP,一种新颖的选择性更新机制。MGUP增强了标准的基于动量的优化器:每次迭代对选定的固定比例参数施加较大的步长,而对其余参数施加较小的非零步长。作为一个近乎即插即用的模块,MGUP可无缝集成到AdamW、Lion和Muon等优化器中,得到MGUP-AdamW、MGUP-Lion和MGUP-Muon等强大的变体。在标准假设下,我们为随机优化中的MGUP-AdamW(无权重衰减)提供了理论收敛保证。在MAE预训练、LLM预训练和下游微调等多种任务上的大量实验表明,与原始基础优化器相比,MGUP增强型优化器实现了更优或更稳定的性能。我们提供了一种有原则、通用且具有理论基础的层内选择性更新策略,可加速并稳定大规模模型的训练。代码已在 https://github.com/MaeChd/MGUP 公开。

英文摘要

Efficient optimization is essential for training large language models. Although intra-layer selective updates have been explored, a general mechanism that enables fine-grained control while ensuring convergence guarantees is still lacking. To bridge this gap, we propose MGUP, a novel mechanism for selective updates. MGUP augments standard momentum-based optimizers by applying larger step-sizes to a selected fixed proportion of parameters in each iteration, while applying smaller, non-zero step-sizes to the rest. As a nearly plug-and-play module, MGUP seamlessly integrates with optimizers such as AdamW, Lion, and Muon. This yields powerful variants such as MGUP-AdamW, MGUP-Lion, and MGUP-Muon. Under standard assumptions, we provide theoretical convergence guarantees for MGUP-AdamW (without weight decay) in stochastic optimization. Extensive experiments across diverse tasks, including MAE pretraining, LLM pretraining, and downstream fine-tuning, demonstrate that our MGUP-enhanced optimizers achieve superior or more stable performance compared to their original base optimizers. We offer a principled, versatile, and theoretically grounded strategy for efficient intra-layer selective updates, accelerating and stabilizing the training of large-scale models. The code is publicly available at https://github.com/MaeChd/MGUP.
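The core idea is that each step gives a fixed fraction of parameters a large step and the rest a smaller, non-zero one. The abstract does not state how that fraction is chosen. The sketch below assumes, based only on the "momentum-gradient alignment" in the title, that coordinates are ranked by the product m_i·g_i. It also uses plain momentum SGD rather than AdamW, Lion or Muon for brevity. The function name, `ratio` and `small_scale` are hypothetical, so treat this as an illustration of selective step sizes rather than the MGUP update itself.

```python
from typing import List, Tuple


def mgup_momentum_step(params: List[float], grads: List[float], momentum: List[float],
                       lr: float, beta: float = 0.9, ratio: float = 0.5,
                       small_scale: float = 0.1) -> Tuple[List[float], List[float]]:
    """One momentum-SGD step with selective step sizes (hypothetical selection rule).

    Coordinates are ranked by the alignment score m_i * g_i; the top `ratio` fraction
    moves with step `lr`, the rest with the smaller, non-zero step `lr * small_scale`."""
    m = [beta * mi + (1.0 - beta) * gi for mi, gi in zip(momentum, grads)]
    k = max(1, int(round(ratio * len(params))))
    order = sorted(range(len(params)), key=lambda i: m[i] * grads[i], reverse=True)
    selected = set(order[:k])
    new_params = [p - (lr if i in selected else lr * small_scale) * m[i]
                  for i, p in enumerate(params)]
    return new_params, m
```

Keeping the small step non-zero means every coordinate still moves each iteration, which matches the abstract's description of a "smaller, non-zero" step for the unselected parameters.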

2606.18080 2026-06-17 cs.LG 新提交

Edge Flow: A Tractable and Predictive Continuous-Time Model for Gradient Descent at the Edge of Stability

Edge Flow: 稳定性边缘处梯度下降的可处理且具预测性的连续时间模型

Pierre Marion

发表机构 * Inria, École Normale Supérieure, PSL Research University(法国国家信息与自动化研究所,巴黎高等师范学院,PSL研究大学)

AI总结 针对深度学习梯度下降在稳定性边缘(EoS)的动力学,提出Edge Flow模型,通过三个耦合常微分方程分解为中心、振荡方向和幅度,实现可处理且预测性的建模,并揭示锐度自稳定机制。

Comments 24 pages, 13 figures

AI中文摘要

深度学习中的梯度下降可能在稳定性边缘(EoS)运行,此时损失Hessian的最大特征值徘徊在稳定阈值$2/\eta$附近,其中$\eta$是学习率。经典的梯度流和下降引理等分析工具在此不适用,因此需要寻找在EoS有效的连续时间模型。我们提出Edge Flow,一个由三个耦合常微分方程组成的系统,提供了梯度下降在EoS动力学的可处理、忠实且预测性的模型。Edge Flow将动力学分解为中心、振荡方向和振荡幅度。中心遵循对称化损失上的修正梯度流;方向通过Rayleigh商动力学跟踪Hessian的顶部特征向量;幅度根据锐度是否超过或低于阈值$2/\eta$而指数增长或衰减。关键在于,锐度稳定通过耦合动力学中的自稳定反馈循环实现。离散化Edge Flow每次迭代仅需两次梯度计算和一次Hessian-向量积。我们实验证明,Edge Flow至少与先前提出的连续时间EoS模型一样忠实地跟踪梯度下降的动力学,此外还能解析EoS开始时锐度的振荡,并为理解和缓解该区域的不稳定性提供了原则性框架。

英文摘要

Gradient descent in deep learning may operate at the edge of stability (EoS), a regime in which the largest eigenvalue of the loss Hessian hovers near the stability threshold $2/η$, where $η$ is the learning rate. Classical analysis tools such as gradient flow and the descent lemma do not apply here, motivating the search for a continuous-time model valid at EoS. We propose Edge Flow, a system of three coupled ordinary differential equations that provides a tractable, faithful, and predictive model of gradient descent dynamics at EoS. Edge Flow decomposes the dynamics into a center, an oscillation direction, and an oscillation magnitude. The center follows a modified gradient flow on a symmetrized loss; the direction tracks a top eigenvector of the Hessian via Rayleigh quotient dynamics; and the magnitude grows or decays exponentially depending on whether the sharpness exceeds or falls below the threshold $2/η$. Crucially, sharpness stabilization emerges from the coupled dynamics via a self-stabilization feedback loop. Discretizing Edge Flow only requires two gradient evaluations and one Hessian--vector product at each iteration. We demonstrate empirically that Edge Flow tracks the dynamics of gradient descent at least as faithfully as previously proposed continuous-time EoS models, while in addition resolving the oscillation of the sharpness at the onset of EoS, and that it provides a principled framework for understanding and mitigating instabilities in this regime.
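Edge Flow's magnitude equation captures a one-dimensional fact. Along a Hessian direction with curvature a, a gradient step multiplies the displacement by (1 − ηa), so the oscillation grows when a > 2/η and decays when a < 2/η. The sketch below shows only this mechanism on a quadratic. It does not implement the three coupled Edge Flow ODEs or the self-stabilization feedback.

```python
from typing import List


def gd_on_quadratic(sharpness: float, lr: float, x0: float, steps: int) -> List[float]:
    """Iterates of gradient descent on f(x) = 0.5 * sharpness * x**2.

    Each step multiplies x by (1 - lr * sharpness): when sharpness > 1 / lr the
    iterate flips sign every step (an oscillation), and its magnitude grows
    exactly when sharpness > 2 / lr."""
    xs = [x0]
    for _ in range(steps):
        xs.append(xs[-1] - lr * sharpness * xs[-1])
    return xs
```

With η = 0.1 the threshold is 2/η = 20. Curvature 19 gives a shrinking oscillation (factor −0.9), while curvature 21 gives a growing one (factor −1.1).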

2606.18236 2026-06-17 cs.LG cs.IT math.IT 新提交

Sign-Rank, Index, and List Replicability: Connections and Separations

符号秩、索引与列表可复制性:联系与分离

Ari Blondal, Hamed Hatami, Pooya Hatami, Chavdar Lalov, Sivan Tretiak

发表机构 * McGill University(麦吉尔大学) Ohio State University(俄亥俄州立大学)

AI总结 本文研究二元概念类的符号秩下界,通过比较Z2-索引和列表可复制数,证明Z2-索引被列表可复制数的线性函数上界,从而解决符号秩与Z2-索引的分离问题,并进一步建立列表可复制数的上界与组合性质。

Comments 29 pages, 1 figure

AI中文摘要

在学习理论中,二元概念类的符号秩捕捉了其能被点和半空间表示的最小维度。尽管兴趣浓厚,符号秩的下界却难以获得。最近两种方法通过更易分析的度量建立符号秩的下界:$\mathbb{Z}_2$-索引和列表可复制数。我们对这些度量进行排序,证明$\mathbb{Z}_2$-索引被列表可复制数的线性函数上界。作为主要结果,我们得到了符号秩与$\mathbb{Z}_2$-索引之间的强分离,从而解决了Frick、Hosseini和Vasileuski提出的一个问题。这促使我们对列表可复制性(两个下界度量中更强的一个)进行深入研究。我们通过两个组合度量——高度和最小星数——建立了列表可复制数的上界。我们还证明了一个基本的复合结果:两个概念类的乘积的列表可复制数被这两个类的列表可复制数之和所界。

英文摘要

In learning theory, the sign rank of a binary concept class captures the smallest dimension in which it can be represented by points and halfspaces. Despite tremendous interest, lower bounds on sign rank are notoriously difficult to come by. Two recent approaches to the problem establish lower bounds on sign rank by measures that are easier to analyze: the $\mathbb{Z}_2$-index and the list replicability number. We order these measures, showing that the $\mathbb{Z}_2$-index is upper-bounded by a linear function of the list replicability number. As a main consequence, we obtain a strong separation between sign rank and $\mathbb{Z}_2$-index, thereby resolving a question of Frick, Hosseini, and Vasileuski. This motivates a thorough study of list replicability, the stronger of the two lower-bounding measures. We establish upper bounds on the list replicability number by two combinatorial measures: height and minimum star number. We also prove a fundamental composition result, showing that the product of two concept classes has list replicability number bounded by the sum of the list replicability numbers of the two classes.

2606.17196 2026-06-17 stat.ML cs.LG stat.ME 交叉投稿

Another Look at Log-PCA for Probability Measures: A Dynamical Formulation and Statistical Convergence

再探概率测度的Log-PCA:一种动力学公式与统计收敛性

Peng Xu, Changbo Zhu, Young-Heon Kim, Xiaohui Chen

发表机构 * Department of Statistics, University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校统计学系) Department of ACMS, University of Notre Dame(圣母大学ACMS系) Department of Mathematics, University of British Columbia(不列颠哥伦比亚大学数学系) Department of Mathematics and Thomas Lord Department of Computer Science, University of Southern California(南加州大学数学系与Thomas Lord计算机科学系)

AI总结 本文在Wasserstein几何下提出一种动力学公式解释log-PCA,称为Wasserstein切向PCA(WT-PCA),并推导了经验WT-PCA相对于总体测度的统计收敛速率。

AI中文摘要

本文研究在Wasserstein几何下学习$\mathbb{R}^m$上随机概率测度的主变差。我们引入一种新的动力学公式,将log-PCA(一种线性化的主测地线分析)解释为一种变分方法。我们的可微版本称为Wasserstein切向PCA(WT-PCA),它通过重心处的协方差算子,捕获Wasserstein空间上(加权)概率测度的测地线变差的局部主模式。基于动力学视角并利用最优传输问题的平行传输结构,我们以总体重心与经验重心这两个参考测度之间的2-Wasserstein距离,给出从数据估计的经验WT-PCA的一般统计收敛速率。

英文摘要

This paper is concerned with learning principal variations of random probability measures on $\mathbb{R}^m$ under the Wasserstein geometry. We introduce a new dynamical formulation to interpret the log-PCA, a linearized principal geodesic analysis, as a variational approach. Our differentiable version, termed the Wasserstein Tangential PCA (WT-PCA), captures the local principal modes of geodesic variations of a (weighted) probability measure on the Wasserstein space via its covariance operator at the barycenter. Based on the dynamical perspective and leveraging the parallel transport structure of optimal transport problems, we derive a general statistical convergence rate of the empirical WT-PCA when estimated from data, in terms of the 2-Wasserstein distance between the population and empirical barycenter reference measures.
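On the real line, W2 geometry is flat in quantile coordinates. The barycenter's quantile function is the average of the input quantile functions, and the log map at the barycenter sends μ_i to Q_i − Q̄. The sketch below uses this standard one-dimensional fact to run a linearized (log-)PCA. It computes tangent vectors at the barycenter and extracts the leading direction by power iteration. This is log-PCA in 1-D only, not the paper's WT-PCA estimator for measures on R^m. The grid of quantile levels and the power-iteration start are our choices.

```python
import math
from typing import List, Sequence, Tuple


def quantiles(sample: Sequence[float], levels: Sequence[float]) -> List[float]:
    """Left-continuous empirical quantile Q(u) = x_(ceil(u * n)) at each level u in (0, 1)."""
    xs = sorted(sample)
    n = len(xs)
    return [xs[min(n - 1, max(0, math.ceil(u * n) - 1))] for u in levels]


def inner(a: Sequence[float], b: Sequence[float]) -> float:
    """Grid approximation of the L2(0, 1) inner product of two quantile-space functions."""
    return sum(x * y for x, y in zip(a, b)) / len(a)


def log_pca_1d(samples: Sequence[Sequence[float]], n_levels: int = 99,
               iters: int = 200) -> Tuple[List[float], List[float], float]:
    """Barycenter quantiles, leading tangent direction, and the variance along it."""
    levels = [(j + 1) / (n_levels + 1) for j in range(n_levels)]
    qs = [quantiles(s, levels) for s in samples]
    m = len(qs)
    bary = [sum(q[j] for q in qs) / m for j in range(n_levels)]  # 1-D W2 barycenter
    tangents = [[q[j] - bary[j] for j in range(n_levels)] for q in qs]  # log map at bary
    start = max(tangents, key=lambda v: inner(v, v))  # start inside the tangent span
    norm0 = math.sqrt(inner(start, start))
    if norm0 == 0.0:
        return bary, [0.0] * n_levels, 0.0
    direction = [x / norm0 for x in start]
    for _ in range(iters):  # power iteration on the empirical covariance operator
        coords = [inner(v, direction) for v in tangents]
        new = [sum(c * v[j] for c, v in zip(coords, tangents)) / m for j in range(n_levels)]
        norm = math.sqrt(inner(new, new))
        if norm == 0.0:
            break
        direction = [x / norm for x in new]
    variance = sum(inner(v, direction) ** 2 for v in tangents) / m
    return bary, direction, variance
```

For a family of translated copies of one measure, every tangent vector is constant in u. The leading direction is therefore the constant function (up to sign), capturing pure translation.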

2606.17260 2026-06-17 math.OC cs.LG stat.ML 交叉投稿

Accelerated Convex Optimization via Hamiltonian Dynamics with Deterministic Integration Time

基于确定性积分时间的哈密顿动力学的加速凸优化

Xiuyuan Wang, Vishwak Srinivasan, Qiang Fu, Siddharth Mitra, Ashia Wilson, Andre Wibisono

发表机构 * Department of Computer Science, Yale University(耶鲁大学计算机科学系) Department of EECS, Massachusetts Institute of Technology(麻省理工学院电子工程与计算机科学系)

AI总结 提出基于哈密顿动力学的平滑凸优化算法,通过利用平均哈密顿流轨迹的收缩而非端点收缩,实现确定性加速收敛,并推导出具有最优一阶复杂度的离散实现。

Comments 51 pages, 7 figures. Accepted to the 39th Annual Conference on Learning Theory (COLT 2026)

AI中文摘要

我们开发了基于哈密顿动力学的平滑凸优化算法,实现了加速收敛速率。通过利用平均哈密顿流轨迹的收缩而非要求轨迹端点处的收缩,我们证明了基于哈密顿动力学的优化方法具有确定性的加速收敛保证,扩展了先前仅限于二次目标或仅在期望中成立的工作。我们分析了一个理想的连续时间算法,并推导了具有最优一阶复杂度的实用离散时间实现,从而将哈密顿动力学确立为确定性加速凸优化的有用算法原语。

英文摘要

We develop Hamiltonian dynamics-based algorithms for smooth convex optimization that achieve accelerated rates of convergence. By exploiting contraction of averaged Hamiltonian flow trajectories rather than requiring contraction at trajectory endpoints, we show that Hamiltonian dynamics-based optimization methods admit deterministic and accelerated convergence guarantees, extending prior work that is limited to quadratic objectives or holds only in expectation. We analyze an idealized continuous-time algorithm and derive practical discrete-time implementations with optimal first-order complexity, thereby establishing Hamiltonian dynamics as a useful algorithmic primitive for deterministic accelerated convex optimization.
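A Hamiltonian-dynamics step for minimizing f can be sketched as follows. Start at the current point with zero velocity, integrate ẋ = v, v̇ = −∇f(x) for a fixed (deterministic) time T, and read off a new point. The sketch uses leapfrog integration and returns both the trajectory endpoint and the time-averaged position. Taking the average as the next iterate is our reading of "averaged Hamiltonian flow trajectories", not the paper's algorithm. T and the number of leapfrog steps are illustrative.

```python
from typing import Callable, List, Tuple


def hamiltonian_step(grad: Callable[[List[float]], List[float]], x: List[float],
                     total_time: float, n_leapfrog: int) -> Tuple[List[float], List[float]]:
    """Integrate H(x, v) = f(x) + |v|^2 / 2 from (x, v = 0) for `total_time` using
    leapfrog; returns (trajectory endpoint, average of the visited positions)."""
    h = total_time / n_leapfrog
    v = [0.0] * len(x)
    x = list(x)
    path_sum = list(x)  # running sum of positions, starting with the initial point
    g = grad(x)
    for _ in range(n_leapfrog):
        v = [vi - 0.5 * h * gi for vi, gi in zip(v, g)]   # half kick
        x = [xi + h * vi for xi, vi in zip(x, v)]         # drift
        g = grad(x)
        v = [vi - 0.5 * h * gi for vi, gi in zip(v, g)]   # half kick
        path_sum = [s + xi for s, xi in zip(path_sum, x)]
    avg = [s / (n_leapfrog + 1) for s in path_sum]
    return x, avg
```

On f(x) = |x|²/2 with T = π, the endpoint lands at −x0 and does not contract. The trajectory average, by contrast, sits close to the minimizer 0. This illustrates why averaging can contract where endpoints do not.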

2606.17319 2026-06-17 stat.ML cs.LG math.CO math.ST stat.TH 交叉投稿

Tight $L_\infty$ Sample Complexity for Low-Degree and Sparse Boolean Polynomials

低次与稀疏布尔多项式的紧 $L_\infty$ 样本复杂度

Jasper van Doornmalen, Mathieu Molina, Victor Verdugo, José Verschae

发表机构 * Institute for Mathematical and Computational Engineering(数学与计算工程研究所) Pontificia Universidad Católica de Chile(智利天主教大学) Blavatnik School of Computer Science and AI(Blavatnik计算机科学与人工智能学院) Tel Aviv University(特拉维夫大学) Department of Industrial and Systems Engineering(工业与系统工程系)

AI总结 针对有界二进制黑箱函数优化,研究布尔超立方体上多项式代理的学习问题,要求均匀 $L_\infty$ 误差保证,刻画了次高斯噪声下两类有界多项式的最小最大样本复杂度。

AI中文摘要

受有界二进制黑箱函数优化的启发,我们研究了在布尔超立方体上学习多项式代理的问题。为了确保优化代理能为底层目标产生良好解,我们需要均匀的 $L_\infty$ 误差保证,而非通常的 $L_2$ 型保证。我们刻画了次高斯噪声下两类有界多项式的均匀估计的最小最大样本复杂度。首先,对于 $n$ 个变量上次数至多为 $d$ 的多项式,样本复杂度为 $n^{d+1}$。其次,对于 $s$-稀疏 Fourier-Walsh 多项式且 $s \leq n$,样本复杂度为 $ns^2$。这些速率在结构上不同于无噪声情形,其中均匀精确恢复的速率分别为 $n^d$ 和 $ns$。我们的下界甚至对任意自适应学习者也成立,表明额外的因子是噪声情形固有的。$L_2$ 范数的标准傅里叶分析工具不能自然地扩展到 $L_\infty$ 设置以产生均匀保证。我们的证明通过依赖适当选择的辅助范数作为控制 $L_\infty$ 误差的代理来克服这一困难。总之,我们的结果提供了学习优化安全多项式代理的样本复杂度的紧刻画。

英文摘要

Motivated by the optimization of bounded binary black-box functions, we study the problem of learning polynomial surrogates over the Boolean hypercube. To ensure that optimizing the surrogate yields good solutions for the underlying objective, we require uniform $L_\infty$-error guarantees rather than the usual $L_2$-type guarantees. We characterize the minimax sample complexity of uniform estimation under subgaussian noise for two classes of bounded polynomials. First, for polynomials of degree at most $d$ on $n$ variables, the sample complexity scales as $n^{d+1}$. Second, for $s$-sparse Fourier-Walsh polynomials with $s \leq n$, it scales as $ns^2$. These rates differ structurally from the noiseless setting, where uniform exact recovery scales as $n^d$ and $ns$, respectively. Our lower bounds hold even for arbitrary adaptive learners, showing that the additional factors are intrinsic to the noisy cases. Standard Fourier-analysis tools for the $L_2$-norm do not naturally extend to the $L_\infty$-setting in a way that yields uniform guarantees. Our proofs overcome this difficulty by relying on suitably chosen auxiliary norms that serve as proxies for controlling the $L_\infty$-error. Together, our results provide a tight characterization of the sample complexity of learning optimization-safe polynomial surrogates.
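The two polynomial classes are defined through the Fourier-Walsh expansion f(x) = Σ_S f̂(S) Π_{i∈S} x_i on {−1, 1}^n. Degree-d polynomials have f̂(S) = 0 whenever |S| > d, and s-sparse ones have at most s nonzero coefficients. The brute-force sketch below computes these coefficients exactly from noiseless evaluations at every point of a small hypercube. It illustrates the representation only and is not the paper's noisy L∞ estimator.

```python
from itertools import combinations, product
from typing import Callable, Dict, Tuple

Subset = Tuple[int, ...]


def walsh_coefficients(f: Callable[[Tuple[int, ...]], float], n: int,
                       max_degree: int) -> Dict[Subset, float]:
    """Exact f_hat(S) = E_x[f(x) * prod_{i in S} x_i] over uniform x in {-1, 1}^n,
    for every |S| <= max_degree (exhaustive enumeration, so small n only)."""
    points = list(product((-1, 1), repeat=n))
    coeffs: Dict[Subset, float] = {}
    for d in range(max_degree + 1):
        for S in combinations(range(n), d):
            total = 0.0
            for x in points:
                chi = 1
                for i in S:
                    chi *= x[i]
                total += f(x) * chi
            coeffs[S] = total / len(points)
    return coeffs


def evaluate(coeffs: Dict[Subset, float], x: Tuple[int, ...]) -> float:
    """Evaluate the expansion sum_S f_hat(S) * chi_S(x) at a hypercube point."""
    total = 0.0
    for S, c in coeffs.items():
        chi = 1
        for i in S:
            chi *= x[i]
        total += c * chi
    return total
```

For f(x) = 1 + 2·x0·x1 − 0.5·x2 on n = 3 variables, the expansion has degree 2 and exactly 3 nonzero coefficients, so the function is 3-sparse.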

2606.17426 2026-06-17 stat.ML cs.LG math.PR 交叉投稿

Bounded Difference Concentration for Infinitely Exchangeable Sequences with Applications to AI Benchmark Uncertainty

无限可交换序列的有界差分集中不等式及其在AI基准不确定性中的应用

Fangyuan Lin, Spencer Frei, Victor H. de la Pena

发表机构 * Department of Statistics, Columbia University(哥伦比亚大学统计系) Google DeepMind(谷歌DeepMind)

AI总结 通过de Finetti测度分解有界差分函数的偏差,提出有效方差代理的集中不等式,并证明零和线性对比中潜在混合项完全抵消,应用于AI基准如MMLU的不确定性量化。

AI中文摘要

我们考虑无限可交换随机变量函数的集中性质。通过对de Finetti导向测度取条件,我们证明任何具有有界差分常数$c_1, \dots, c_n$的函数的偏差分解为条件采样波动和潜在混合波动。当该潜在混合是$\sigma_{\mathrm{mix}}^2$-次高斯时,我们建立了一个有效方差代理为$\frac{1}{4}\sum_i c_i^2 + \sigma_{\mathrm{mix}}^2$的集中不等式。关键的是,我们证明对于零和线性对比,例如子样本均值与总体均值之差,潜在混合项完全抵消。这种抵消产生了一个紧的、无混合的Hoeffding型界,为近期有限可交换集中结果的无限可扩展极限提供了直接的de Finetti机制。我们将该框架应用于量化复合AI基准(如MMLU)中的不确定性,其中问题项在领域间自然表现出可交换依赖性。我们的结果既提供了一个领域分层层次模型来限制准确率分数的不确定性,也提供了一个无分布、节省成本的统计保证,用于从随机子集准确估计完整的基准分数。

英文摘要

We consider the concentration properties of functions of infinitely exchangeable random variables. By conditioning on the de Finetti directing measure, we show that the deviation of any function with bounded-difference constants $c_1, \dots, c_n$ decomposes into a conditional sampling fluctuation and a latent mixture fluctuation. When this latent mixture is $σ_{\mathrm{mix}}^2$-subgaussian, we establish a concentration inequality with an effective variance proxy of $\frac{1}{4}\sum_i c_i^2 + σ_{\mathrm{mix}}^2$. Crucially, we demonstrate that for zero-sum linear contrasts, such as the difference between a subsample mean and a full population mean, the latent mixture term cancels exactly. This cancellation yields a tight, mixture-free Hoeffding-type bound that provides a direct de Finetti mechanism for the infinite-extendibility limit of recent finite-exchangeable concentration results. We apply this framework to quantify uncertainty in composite AI benchmarks, such as MMLU, where question items naturally exhibit exchangeable dependence across domains. Our results provide both a domain-stratified hierarchical model for bounding the uncertainty of accuracy scores, and a distribution-free, cost-saving statistical guarantee for accurately estimating full benchmark scores from random subsets.
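For the benchmark application, let D = (mean of k randomly chosen items) − (mean of all n items), with per-item scores in [0, 1]. Its bounded-difference constants are 1/k − 1/n for the k sampled items and 1/n for the rest, so Σ c_i² = 1/k − 1/n. The coefficients also sum to zero, which is the case in which the abstract states the latent mixture term cancels. The sketch plugs this into the Hoeffding-type tail 2·exp(−2t²/Σ c_i²), which corresponds to variance proxy Σ c_i²/4 with no σ_mix² term. It then solves for the smallest subset size meeting a target δ. The subset-size helper and the example numbers are illustrative, not taken from the paper.

```python
import math


def contrast_hoeffding_bound(k: int, n: int, t: float) -> float:
    """Two-sided bound 2 * exp(-2 t^2 / sum_i c_i^2) on P(|D| >= t) for
    D = mean(k sampled items) - mean(all n items), items in [0, 1],
    where sum_i c_i^2 = 1/k - 1/n and the mixture term has cancelled."""
    s = 1.0 / k - 1.0 / n
    if s <= 0.0:
        return 0.0  # k == n: the contrast is identically zero
    return min(1.0, 2.0 * math.exp(-2.0 * t * t / s))


def min_subset_size(n: int, t: float, delta: float) -> int:
    """Smallest k with contrast_hoeffding_bound(k, n, t) <= delta."""
    # 2 exp(-2t^2 / (1/k - 1/n)) <= delta  <=>  1/k <= 1/n + 2 t^2 / ln(2/delta)
    k = math.ceil(1.0 / (1.0 / n + 2.0 * t * t / math.log(2.0 / delta)))
    return min(k, n)
```

For a hypothetical 10,000-item benchmark, estimating the full score to within ±0.01 with 95% confidence needs somewhat under 6,500 items under this bound. The finite-population term 1/n is what keeps that figure below the roughly 18,400 items a plain i.i.d. Hoeffding bound would ask for.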

2606.17523 2026-06-17 math.OC cs.LG 交叉投稿

Beyond IGO-Flow: Toward Convergence Analysis of IGO in Continuous Spaces

超越IGO流:面向连续空间中IGO的收敛性分析

Ryosuke Kimura, Youhei Akimoto

发表机构 * University of Tsukuba, Tsukuba, Japan(筑波大学) RIKEN Center for Advanced Intelligence Project, Tokyo, Japan(理化学研究所革新智能统合研究中心)

AI总结 研究离散时间IGO在连续空间中的收敛性,针对强凸二次目标函数上的多元高斯族,证明了协方差矩阵收敛到零矩阵,并在条件数有界时均值向量收敛到全局最优。

Comments Accepted at PPSN 2026

AI中文摘要

信息几何优化(IGO)通过将搜索分布的适应解释为自然梯度更新,为黑箱优化提供了统一框架。尽管其概念重要,IGO的收敛理论仍然有限:大多数现有结果涉及连续时间理想化,如IGO流,而非具有非无穷小学习率的离散时间更新。在本文中,我们研究连续空间中的离散时间IGO,将其表述为指数族期望参数坐标下的自然梯度更新。特别地,我们分析了在强凸二次目标函数上对多元高斯族的IGO。我们的分析涵盖了一个同时结合全协方差适应、固定正学习率和基于分位数权重的设置。在此设置中,我们证明了协方差矩阵收敛到零矩阵。我们进一步表明,如果适当缩放的协方差矩阵的条件数在足够频繁的迭代中有界,则均值向量收敛到全局最优。这些结果推进了IGO的收敛理论,并有助于弥合IGO数学理论与实际协方差自适应搜索方法(如CMA-ES)之间的差距。

英文摘要

Information-Geometric Optimization (IGO) provides a unified framework for black-box optimization by interpreting the adaptation of a search distribution as a natural gradient update. Despite its conceptual importance, the convergence theory of IGO remains limited: most existing results concern continuous-time idealizations such as the IGO flow, rather than discrete-time updates with non-infinitesimal learning rates. In this paper, we study discrete-time IGO in continuous spaces, formulated as natural gradient updates in the expectation-parameter coordinates of an exponential family. In particular, we analyze IGO over the multivariate Gaussian family on strongly convex quadratic objective functions. Our analysis covers a setting that simultaneously incorporates full covariance adaptation, a fixed positive learning rate, and quantile-based weights. In this setting, we prove that the covariance matrix converges to the zero matrix. We further show that the mean vector converges to the global optimum, provided that the condition number of the appropriately scaled covariance matrix is bounded at sufficiently frequent iterations. These results advance the convergence theory of IGO and help bridge the gap between the mathematical theory of IGO and practical covariance-adaptive search methods such as CMA-ES.
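Take an exponential family with sufficient statistic T. There, the natural gradient in expectation-parameter coordinates reduces to moving the expected statistics toward their weighted sample values: T̄ ← T̄ + η Σ_i w_i (T(x_i) − T̄). The sketch applies this to a one-dimensional Gaussian, where T(x) = (x, x²). It uses truncation weights 1/μ on the μ best samples, a simple instance of quantile-based weights. The 1-D restriction and this weight scheme are simplifications; the paper analyzes full covariance adaptation on strongly convex quadratics.

```python
from typing import Callable, List, Tuple


def igo_gaussian_step_1d(mean: float, var: float, samples: List[float],
                         f: Callable[[float], float], mu: int,
                         lr: float) -> Tuple[float, float]:
    """One discrete-time IGO step for a 1-D Gaussian in expectation parameters.

    Sufficient statistic T(x) = (x, x^2); expectation parameters (E[x], E[x^2]).
    Quantile-based weights: 1/mu on the mu samples with the lowest f, 0 otherwise."""
    ranked = sorted(samples, key=f)[:mu]
    w = 1.0 / mu
    first = mean                 # E[x]
    second = var + mean * mean   # E[x^2]
    new_first = first + lr * sum(w * (x - first) for x in ranked)
    new_second = second + lr * sum(w * (x * x - second) for x in ranked)
    new_var = new_second - new_first * new_first
    return new_first, new_var
```

For 0 < lr ≤ 1 the new moments are those of a mixture of the old Gaussian and the weighted samples, so the returned variance stays non-negative. Expanding the update gives C + η(Z − C) − η²δ². Here Z is the weighted sample variance about the old mean and δ the weighted mean shift. This is a rank-μ-style update plus an O(η²) correction.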

2606.18074 2026-06-17 stat.ML cs.LG stat.ME 交叉投稿

Tensor-based second-order causal discovery

基于张量的二阶因果发现

Nathan Ouyang, Kexin Wan, Anna Seigal

AI总结 提出TSCD算法,利用由观测与干预数据协方差矩阵构成的张量,在线性结构方程模型下识别有向无环图及其边上的函数,仅要求噪声不相关,并扩展到非线性模型;所需干预次数仅为变量数的对数量级即可实现可识别。

Comments 27 pages, 7 figures. Code available at https://github.com/QWE123665/Tensor-based-Second-order-Causal-Discovery

AI中文摘要

因果发现旨在揭示变量间的因果依赖关系。为此,我们提出了一种称为基于张量的二阶因果发现(TSCD)的算法。其输入是由观测数据和干预数据的协方差矩阵得到的张量。假设因果依赖关系遵循有向无环图(DAG)上的线性结构方程模型,TSCD输出该DAG及其边上的函数,仅要求噪声变量互不相关。我们还实现了该方法适用于非线性模型的版本。我们之所以关注二阶统计量(通过协方差矩阵),是因为:相对于高阶矩,它们在统计和计算上更高效;相对于一阶统计量,它们具有可识别性;并且无论变量是否服从高斯分布都适用。我们证明,只需数量为变量数对数量级的干预,TSCD即可识别因果顺序和参数。实验表明,TSCD对噪声具有鲁棒性,与现有方法相比具有竞争力,并且可扩展到数百个变量。

英文摘要

Causal discovery seeks to uncover the causal dependencies among variables. For this purpose, we propose an algorithm called Tensor-based Second-order Causal Discovery (TSCD). Its input is a tensor obtained from the covariance matrices of observational and interventional data. Assuming the causal dependencies follow a linear structural equation model on a directed acyclic graph (DAG), TSCD outputs the DAG and the functions on its edges, requiring only that the noise variables are uncorrelated. We also implement a version of the approach for nonlinear models. Our focus on second-order statistics (via the covariance matrices) is motivated by their statistical and computational efficiency relative to higher-order moments, their identifiability relative to first-order statistics, and that they work regardless of whether the variables are Gaussian. We show that TSCD has identifiable causal order and parameters from a number of interventions that is logarithmic in the number of variables. Experiments show that TSCD is robust to noise, competitive with existing methods, and scales to hundreds of variables.
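Second-order causal discovery starts from the covariance implied by a linear SEM on a DAG, Σ = (I − B)⁻¹ Ω (I − B)⁻ᵀ. The sketch computes it, using the fact that B is nilpotent for a DAG. It then takes a two-variable example X0 → X1 and applies an intervention that changes the noise variance of X0. That intervention leaves the causal-direction regression coefficient Σ01/Σ00 invariant while the reverse coefficient Σ01/Σ11 moves. This invariance check is a textbook illustration of why interventional covariances help orient edges, not the TSCD tensor algorithm.

```python
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a: Matrix) -> Matrix:
    return [list(row) for row in zip(*a)]


def sem_covariance(B: Matrix, noise_var: List[float]) -> Matrix:
    """Covariance of X = B X + e with Cov(e) = diag(noise_var), B[i][j] = weight of j -> i.

    Sigma = (I - B)^{-1} Omega (I - B)^{-T}. For a DAG, B is nilpotent, so
    (I - B)^{-1} = I + B + B^2 + ... + B^{p-1}."""
    p = len(B)
    inv = [[float(i == j) for j in range(p)] for i in range(p)]
    power = [row[:] for row in inv]
    for _ in range(p - 1):
        power = matmul(power, B)
        inv = [[inv[i][j] + power[i][j] for j in range(p)] for i in range(p)]
    omega = [[noise_var[i] if i == j else 0.0 for j in range(p)] for i in range(p)]
    return matmul(matmul(inv, omega), transpose(inv))
```

With edge weight 2 and unit noise, Σ = [[1, 2], [2, 5]]. Raising Var(e0) to 4 gives [[4, 8], [8, 17]]: the forward coefficient stays 2, while the reverse one moves from 0.4 to about 0.47.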

2606.18183 2026-06-17 stat.ML cs.LG math.PR 交叉投稿

A Diffusion Approximation for Temporal-Difference Learning with Linear Features under Markovian Noise

马尔可夫噪声下线性特征时序差分学习的扩散近似

M. Forzo, E. Monzio Compagnoni, A. Russo, A. Pacchiano

发表机构 * Technical University of Munich (TUM), Munich, Germany(慕尼黑工业大学) University of Basel, Basel, Switzerland(巴塞尔大学) Boston University, Boston, USA(波士顿大学)

AI总结 针对线性TD(0)在马尔可夫噪声下的随机波动,提出随机微分方程近似模型,揭示投影Bellman算子收缩动力学与马尔可夫采样影响的区别,解释常数步长误差下限。

AI中文摘要

带有线性函数逼近的时序差分(TD)学习是策略评估的核心方法。其经典连续时间描述为常微分方程(ODE),捕捉渐近均值动态但忽略了决定误差下限的随机波动。我们引入了马尔可夫噪声下线性TD(0)的随机微分方程(SDE)近似。所得模型将投影Bellman算子控制的收缩动力学与马尔可夫采样的影响区分开来。因此,该模型通过马尔可夫长期协方差与投影Bellman算子收缩几何之间的相互作用解释了常数步长误差下限。

英文摘要

Temporal difference (TD) learning with linear function approximation is a core method for policy evaluation. Its classical continuous-time description is an ordinary differential equation (ODE), which captures the asymptotic mean dynamics but neglects stochastic fluctuations determining the error floor. We introduce a stochastic differential equation (SDE) approximation for linear TD(0) under Markovian noise. The resulting model distinguishes the contraction dynamics governed by the projected Bellman operator from the influence of Markovian sampling. As a consequence, the model explains the constant-stepsize error floor through the interaction between Markovian long-run covariance and the contraction geometry of the projected Bellman operator.
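The recursion that the SDE approximates is the linear TD(0) update θ ← θ + α(r + γφ(s′)ᵀθ − φ(s)ᵀθ)φ(s). It is applied along one Markov trajectory, so successive samples are dependent rather than i.i.d. The sketch below implements that recursion with a user-supplied transition sampler; the SDE model itself is not reproduced. With one-hot features and a deterministic chain, the iterates converge to the true values. Adding reward or transition noise through the `rng` argument makes them fluctuate around the TD fixed point instead. That residual fluctuation under a constant step size α is the error floor the abstract refers to.

```python
import random
from typing import Callable, List, Sequence, Tuple


def linear_td0(features: Sequence[Sequence[float]],
               step: Callable[[int, random.Random], Tuple[float, int]],
               s0: int, gamma: float, alpha: float, n_steps: int,
               seed: int = 0) -> List[float]:
    """Linear TD(0) along a single Markov trajectory (Markovian, not i.i.d., samples).

    features[s] is phi(s); step(s, rng) returns (reward, next_state).
    Update: theta += alpha * (r + gamma * phi(s')^T theta - phi(s)^T theta) * phi(s)."""
    rng = random.Random(seed)
    theta = [0.0] * len(features[0])
    s = s0
    for _ in range(n_steps):
        r, s_next = step(s, rng)
        phi, phi_next = features[s], features[s_next]
        v = sum(p * t for p, t in zip(phi, theta))
        v_next = sum(p * t for p, t in zip(phi_next, theta))
        delta = r + gamma * v_next - v  # temporal-difference error
        theta = [t + alpha * delta * p for t, p in zip(theta, phi)]
        s = s_next
    return theta
```

On a two-state chain that alternates deterministically, with reward 1 in state 0 and γ = 0.5, the values are V = (4/3, 2/3), and the iterates converge to them.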

2606.18218 2026-06-17 math.PR cs.LG cs.SY eess.SY math.OC stat.ML 交叉投稿

Finite-Time Queue Peak Laws in Stochastic Networks: Logarithmic Scaling After Geometric Thresholds

随机网络中的有限时间队列峰值律:几何阈值后的对数缩放

Hao Liang, Cheng Tang, Yunzong Xu

Affiliations: University of Illinois Urbana-Champaign

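AI summary: Studies finite-time queue peaks in generalized switches. It proves that under uniform interior slack, the peak envelope of drift-minimizing scheduling policies changes from a square-root law to a logarithmic law beyond a geometric threshold, and gives matching lower bounds.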

Abstract

We study finite-horizon queue peaks in generalized switches, a standard stochastic-network model in which many queues share constrained service resources. Arrivals may be dependent, time-varying, and adapted to the past; the standing load condition is uniform interior slack, meaning the conditional mean arrival vector stays in a fixed contraction of the capacity region. We show that this slack reshapes the finite-time peak law for drift-minimizing scheduling policies such as MaxWeight. The square-root envelope that is sharp without slack persists only up to a geometry-dependent threshold; beyond that threshold, the running maximum grows only logarithmically with the horizon, both with high probability and in expectation. The mechanism is self-normalization: in the current queue direction, the projected fluctuation scale is normalized by the stabilizing drift scale. This removes capacity geometry from the logarithmic coefficient, while geometry remains in the threshold. Matching lower bounds show that both the logarithmic term and a geometric threshold are unavoidable. When finite-time state-space collapse is available, the threshold can be sharpened using local bottleneck geometry. For generalized input-queued switches, we obtain finite-time peak bounds with tight logarithmic coefficients. Simulations illustrate the two-phase envelope, local geometric refinements, and variance-sensitive improvements predicted by the theory.
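
The objects in this statement can be simulated in the simplest generalized switch. The sketch uses two queues sharing one server, Bernoulli arrivals with total load 0.6, and MaxWeight scheduling. With that load, the mean arrival vector lies strictly inside the capacity region. The sketch only reports the running maximum of the total backlog for growing horizons. It does not estimate the threshold or the logarithmic coefficient, and the rates are illustrative.

```python
import random


def maxweight_peak(lam1, lam2, horizon, seed=0):
    """Two queues share one server that can serve one packet per slot.
    MaxWeight (with unit service rates) serves the longer queue.
    Returns the running maximum of the total backlog over the horizon."""
    rng = random.Random(seed)
    q = [0, 0]
    peak = 0
    for _ in range(horizon):
        q[0] += int(rng.random() < lam1)  # Bernoulli arrivals
        q[1] += int(rng.random() < lam2)
        i = 0 if q[0] >= q[1] else 1      # MaxWeight choice
        if q[i] > 0:
            q[i] -= 1
        peak = max(peak, q[0] + q[1])
    return peak


# Load 0.6 < 1: strictly inside the capacity region {lam1 + lam2 <= 1}
for horizon in (10**3, 10**4, 10**5):
    print(horizon, maxweight_peak(0.3, 0.3, horizon))
```

Every run uses the same seed, so a shorter horizon sees a prefix of the same trajectory. The reported peaks are therefore nondecreasing in the horizon.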

2508.19445 2026-06-17 cs.LG stat.ML Replacement

On Surjectivity of Neural Networks: Can you elicit any behavior from your model?

Haozhe Jiang, Nika Haghtalab

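AI summary: Proves that many building blocks of modern neural architectures are almost always surjective, such as networks with pre-layer normalization and linear-attention modules. Any output, including harmful content, can therefore in principle be generated, which points to an inherent vulnerability to adversarial attacks.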

Comments Blog: https://astro-eric.github.io/blogs/surjective/

Abstract

Given a trained neural network, can any specified output be generated by some input? Equivalently, does the network correspond to a function that is surjective? In generative models, surjectivity implies that any output, including harmful or undesirable content, can in principle be generated by the network, raising concerns about model safety and jailbreak vulnerabilities. In this paper, we prove that many fundamental building blocks of modern neural architectures, such as networks with pre-layer normalization and linear-attention modules, are almost always surjective. As corollaries, widely used generative frameworks, including GPT-style transformers and diffusion models with deterministic ODE solvers, admit inverse mappings for arbitrary outputs. By studying surjectivity of these modern and commonly used neural architectures, we contribute a formalism that sheds light on their unavoidable vulnerability to a broad class of adversarial attacks.
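
Here is a toy scalar illustration of surjectivity. It is not the paper's proof technique. Take a residual block $f(x) = x + a\tanh(wx + b)$: its nonlinear part is bounded, so $f(x) \to \pm\infty$ as $x \to \pm\infty$. By continuity, every target output then has a preimage, and bisection finds it. The block and its parameters are hypothetical.

```python
import math


def residual_block(x, a=2.0, w=3.0, b=0.5):
    # Hypothetical scalar residual block: identity skip plus a bounded nonlinearity
    return x + a * math.tanh(w * x + b)


def invert(f, target, tol=1e-10):
    """Find x with f(x) close to target by bisection.
    Valid because f is continuous and f(x) -> +/-inf as x -> +/-inf,
    so the intermediate value theorem guarantees a preimage."""
    lo, hi = -1.0, 1.0
    while f(lo) > target:   # expand left until f(lo) <= target
        lo *= 2.0
    while f(hi) < target:   # expand right until f(hi) >= target
        hi *= 2.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


x = invert(residual_block, 7.3)
print(x, residual_block(x))
```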

2512.11784 2026-06-17 cs.LG stat.ML Replacement

Softmax as Linear Attention in the Large-Prompt Regime: a Measure-based Perspective

Etienne Boursier, Claire Boyer

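AI summary: Builds a measure-based framework around the convergence of softmax attention to a linear operator in the infinite-prompt limit. It proves non-asymptotic concentration bounds for finite prompts, so that optimization analyses developed for linear attention transfer to softmax attention with long prompts.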

Abstract

Softmax attention is a central component of transformer architectures, yet its nonlinear structure poses significant challenges for theoretical analysis. We develop a unified, measure-based framework for studying single-layer softmax attention under both finite and infinite prompts. For i.i.d. Gaussian inputs, we lean on the fact that the softmax operator converges in the infinite-prompt limit to a linear operator acting on the underlying input-token measure. Building on this insight, we establish non-asymptotic concentration bounds for the output and gradient of softmax attention, quantifying how rapidly the finite-prompt model approaches its infinite-prompt counterpart, and prove that this concentration remains stable along the entire training trajectory in general in-context learning settings with sub-Gaussian tokens. In the case of in-context linear regression, we use the tractable infinite-prompt dynamics to analyze training at finite prompt length. Our results allow optimization analyses developed for linear attention to transfer directly to softmax attention when prompts are sufficiently long, showing that large-prompt softmax attention inherits the analytical structure of its linear counterpart. This, in turn, provides a principled and broadly applicable toolkit for studying the training dynamics and statistical behavior of softmax attention layers in large prompt regimes.
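
The linearization can be seen in a one-dimensional sketch. It assumes identity query, key and value maps and i.i.d. standard Gaussian tokens. As the prompt grows, the softmax-weighted average $\sum_i e^{q x_i} x_i / \sum_j e^{q x_j}$ concentrates around $\mathbb{E}[x e^{qx}] / \mathbb{E}[e^{qx}] = q$. This holds because exponentially tilting a standard Gaussian shifts its mean by $q$, so the limit is linear in the query. This only illustrates the limit; it is not the paper's measure-based analysis.

```python
import math
import random


def softmax_attention_1d(q, keys, values):
    # Numerically stable softmax weights exp(q * k_i) / sum_j exp(q * k_j)
    scores = [q * k for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    z = sum(weights)
    return sum(w * v for w, v in zip(weights, values)) / z


rng = random.Random(0)
n = 200_000
tokens = [rng.gauss(0.0, 1.0) for _ in range(n)]
for q in (0.25, 0.5, 1.0):
    out = softmax_attention_1d(q, tokens, tokens)
    print(q, round(out, 3))  # close to q: the large-prompt limit is linear in q
```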

2601.10962 2026-06-17 cs.LG cond-mat.dis-nn Replacement

Noise-Driven Exploration and Transient Freezing Select Flat Minima in Stochastic Gradient Descent

Ning Yang, Yikuan Zhang, Qi Ouyang, Chao Tang, Yuhai Tu

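AI summary: By analyzing SGD learning dynamics, identifies a nonequilibrium mechanism for solution selection with three parts. A transient exploratory phase escapes sharp valleys, and noise reshapes the loss into an effective potential that stabilizes flat solutions. Stronger noise delays the freezing transition, increasing the chance of converging to flatter, better-generalizing minima.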

Comments 12 pages, 4 figures

Abstract

Stochastic gradient descent (SGD) is central to deep learning, yet the dynamical origin of its preference for flatter, more generalizable solutions remains unclear. Here, by analyzing SGD learning dynamics, we identify a nonequilibrium mechanism that governs solution selection during training. Numerical experiments reveal a transient exploratory phase in which SGD trajectories repeatedly escape sharp valleys and migrate toward flatter regions of the loss landscape before becoming confined to a final basin. Using a tractable physical model, we show that SGD noise reshapes the loss landscape into an effective potential that preferentially stabilizes flat solutions. We further uncover a transient freezing mechanism: as training progresses, the flattening landscape suppresses transitions between competing valleys. Stronger SGD noise delays this freezing transition, prolonging the exploratory phase and thereby increasing the probability of convergence to flatter minima. Together, these results provide a unified physical framework connecting learning dynamics, loss-landscape geometry, and generalization, and suggest guiding principles for the design of more effective optimization algorithms.
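
For intuition only, here is the simplest equilibrium picture of why noise favors flat minima. It assumes that SGD noise acts like a temperature $T$ and that the iterates follow the Gibbs density $\propto e^{-L(x)/T}$. Two equally deep quadratic wells then receive mass proportional to $\sqrt{T/k}$, where $k$ is the curvature. The flat well (curvature 1) thus gets about $\sqrt{25} = 5$ times the mass of the sharp one (curvature 25). The mechanism identified above is nonequilibrium (transient exploration followed by freezing), which this sketch does not model.

```python
import math


def two_well_loss(x, k_sharp=25.0, k_flat=1.0):
    """Two equally deep quadratic wells: a sharp one at x = -1, a flat one at x = +1.
    Returns the loss and the name of the well that attains it."""
    sharp = 0.5 * k_sharp * (x + 1.0) ** 2
    flat = 0.5 * k_flat * (x - 1.0) ** 2
    return (sharp, "sharp") if sharp <= flat else (flat, "flat")


def basin_masses(temperature, lo=-6.0, hi=6.0, n=120_000):
    """Midpoint-rule estimate of the Gibbs mass exp(-loss / T) in each basin."""
    dx = (hi - lo) / n
    mass = {"sharp": 0.0, "flat": 0.0}
    for i in range(n):
        x = lo + (i + 0.5) * dx
        value, well = two_well_loss(x)
        mass[well] += math.exp(-value / temperature) * dx
    return mass


m = basin_masses(temperature=0.1)
print(m["flat"] / m["sharp"])  # about sqrt(25 / 1) = 5
```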

2602.06257 2026-06-17 cs.LG cs.GT Replacement

On Randomized Algorithms in Online Strategic Classification

Chase Hutton, Adam Melrod, Han Shao

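AI summary: Studies the advantages of randomized algorithms in online strategic classification, with bounds in terms of the Littlestone dimension and the maximum degree of the manipulation graph. In the realizable setting, it gives the first lower bound for randomized learners and a randomized learner that improves the known deterministic upper bound. In the agnostic setting, an improper randomized learner matches the standard online learning rate.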

Abstract

Online strategic classification studies settings in which agents strategically modify their features to obtain favorable predictions. For example, given a classifier that determines loan approval based on credit scores, applicants may open or close credit cards and bank accounts to obtain a positive prediction. The learning goal is to achieve low mistake or regret bounds despite such behavior. While randomized algorithms have the potential to offer advantages to the learner in strategic settings, they have been largely underexplored. In the realizable setting, no lower bound is known for randomized algorithms, and existing lower bound constructions for deterministic learners can be circumvented by randomization. In the agnostic setting, the best known regret upper bound is $O(T^{3/4}\log^{1/4}T|\mathcal H|)$, which is far from the standard online learning rate of $O(\sqrt{T\log|\mathcal H|})$. In this work, we provide refined bounds for online strategic classification in both settings; our bounds depend on the Littlestone dimension $\mathrm{Ldim}(\mathcal H)$ of the hypothesis class $\mathcal H$ and the maximum degree $\Delta$ of the manipulation graph. In the realizable setting, we extend, for $T > \mathrm{Ldim}(\mathcal H) \Delta^2$, the existing lower bound $\Omega(\mathrm{Ldim}(\mathcal H) \Delta)$ for deterministic learners to all learners. This yields the first lower bound that applies to randomized learners. We then provide the first randomized learner that improves the known (deterministic) upper bound of $O(\mathrm{Ldim}(\mathcal H) \cdot \Delta \log \Delta)$. In the agnostic setting, we give an improper randomized learner that improves the regret upper bound to $O(\sqrt{T\log|\mathcal H|})$, matching the standard online learning rate. We also show a larger lower bound for all proper learning rules, demonstrating that improperness is necessary to achieve the optimal rate.
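
The strategic setting can be sketched with a hypothetical manipulation graph in which an agent may move to an adjacent feature vector to obtain a positive label. The learner then observes the manipulated node, not the original one. The node names and the simple "move to any positive neighbour" rule are illustrative assumptions; the paper's learners and bounds are not reproduced.

```python
def best_response(x, graph, classifier):
    """An agent at node x stays if already classified positive; otherwise it
    moves to a neighbouring node that is classified positive, if one exists."""
    if classifier(x) == 1:
        return x
    for u in graph.get(x, ()):
        if classifier(u) == 1:
            return u
    return x


def max_degree(graph):
    # Delta in the bounds: the largest number of manipulation options of any node
    return max((len(nbrs) for nbrs in graph.values()), default=0)


def approve_high_only(node):
    # Hypothetical classifier that approves only the "high" profile
    return 1 if node == "high" else 0


# Hypothetical credit profiles; edges are feasible manipulations
graph = {"low": ["mid"], "mid": ["low", "high"], "high": ["mid"]}

print(best_response("mid", graph, approve_high_only))  # high
print(best_response("low", graph, approve_high_only))  # low
print(max_degree(graph))                               # 2
```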

2606.14187 2026-06-17 cs.LG Replacement

Zeta: Dual Whitening for Matrix Optimization via Coordinate-Adaptive Preconditioning

Kaiwen Chen, Shuhai Zhang, Zimo Liu, Linxiao Li, Ying Sun, Yuchen Li, Yifan Zhang, Bo Han, Mingkui Tan, Qiuwu Chen

Affiliations: South China University of Technology; AIGCode; Hong Kong Baptist University

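AI summary: Targets coordinate-wise scale heterogeneity in matrix optimization with Zeta, a dual whitening optimizer. Zeta applies coordinate whitening strictly before spectral whitening to reduce orthogonalization error, which improves convergence speed and generalization on language modeling and vision tasks.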

Abstract

Large-scale neural network training increasingly relies on matrix-aware optimizers that exploit the structure of weight parameters beyond element-wise adaptation. However, existing matrix-aware methods such as Muon have an underappreciated vulnerability: their core operation, Newton-Schulz iteration, depends critically on input conditioning, yet the raw momentum matrices exhibit severe coordinate-wise scale heterogeneity. In this paper, we first verify this scale heterogeneity through a chi-square uniformity test, showing that intra-matrix scale imbalance is prevalent across Transformer layers and that coordinate whitening effectively corrects it. Motivated by this finding, we propose Zeta, a dual whitening optimizer that applies coordinate whitening and spectral whitening in a strictly ordered pipeline. The ordering is not a tunable choice but follows from a mathematical dependency: coordinate whitening establishes the statistical isotropy that spectral whitening requires to function reliably. We further prove that this dual pipeline strictly reduces orthogonalization error relative to pure spectral methods by improving the condition number of the input. Empirically, Zeta matches or surpasses strong baselines across language modeling (0.6B to 8B parameters), mixture-of-experts architectures, and vision tasks, demonstrating that resolving scale imbalance before orthogonalization leads to faster convergence and better generalization. Code is available at https://github.com/AIGCodeOS/aigcode_zeta_optimizer.
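
Here is a minimal sketch of the ordering. It assumes an Adam-style elementwise second-moment estimate for the coordinate whitening and the plain cubic Newton-Schulz iteration $X \leftarrow 1.5X - 0.5XX^\top X$ for the spectral step. Muon's reference implementation uses a tuned higher-order polynomial instead. All function names are hypothetical, and this is not the released Zeta code.

```python
import math


def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def newton_schulz(G, steps=30):
    """Cubic Newton-Schulz iteration X <- 1.5 X - 0.5 X X^T X. After Frobenius
    normalization all singular values lie in (0, 1], and the iteration drives
    them to 1, approximating the orthogonal polar factor of G."""
    norm = math.sqrt(sum(g * g for row in G for g in row))
    X = [[g / norm for g in row] for row in G]
    for _ in range(steps):
        XXtX = matmul(matmul(X, transpose(X)), X)
        X = [[1.5 * x - 0.5 * y for x, y in zip(rx, ry)] for rx, ry in zip(X, XXtX)]
    return X


def coordinate_whiten(M, second_moment, eps=1e-8):
    # Elementwise rescaling by the root of a running second-moment estimate
    return [[m / (math.sqrt(v) + eps) for m, v in zip(rm, rv)]
            for rm, rv in zip(M, second_moment)]


def dual_whitened_update(momentum, second_moment):
    # Coordinate whitening first, then spectral whitening (orthogonalization)
    return newton_schulz(coordinate_whiten(momentum, second_moment))


# A symmetric positive definite input has the identity as its polar factor
print(newton_schulz([[2.0, 1.0], [1.0, 3.0]]))  # approximately [[1, 0], [0, 1]]
```

The whitening step equalizes badly scaled entries before orthogonalization. In the test below, entries spanning two orders of magnitude become the well-conditioned matrix [[1, 1], [1, 3]].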

2405.15379 2026-06-17 stat.ML cs.LG math.PR math.ST stat.TH Replacement

Randomized Midpoint Method for Log-Concave Sampling under Constraints

Yifeng Yu, Shijie Zhang, Lu Yu

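AI summary: Studies randomized midpoint discretizations of overdamped and kinetic Langevin diffusions in constrained domains, using a unified framework based on projection operators. It proves convergence guarantees in Wasserstein-$q$ distance, with lower bounds showing these are near-optimal.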

Abstract

In this paper, we study the problem of sampling from log-concave distributions supported on convex and compact sets, with a particular focus on the randomized midpoint discretization of both overdamped and kinetic Langevin diffusions in constrained domains. We revisit the proximal framework for handling constraints through projection operators and develop a more general formulation that encompasses Euclidean, Bregman, and Gauge projections. The resulting smooth approximation allows a unified and tractable analysis of Langevin algorithms and their variants under constraints. Within this framework, we establish convergence guarantees in Wasserstein-$q$ $(q\geqslant 1)$ distances between the smooth surrogate and the target distribution. We further derive complementary lower bounds, showing that the results are near-optimal in order. Building upon this tight approximation analysis, we obtain new convergence guarantees for the randomized midpoint Langevin algorithms and refined bounds for both vanilla and kinetic Langevin Monte Carlo methods under constraints, thereby advancing the theoretical understanding of constrained diffusion-based sampling.
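
Below is a one-dimensional sketch of the randomized midpoint step for overdamped Langevin. It is applied to a smoothed surrogate potential $U(x) + \operatorname{dist}(x, K)^2 / (2\lambda)$, built from the Euclidean projection onto an interval $K$. This is one instance of projection-based smoothing; the Bregman and Gauge variants and the kinetic dynamics are not shown. The target, step size $h$ and smoothing parameter $\lambda$ are illustrative choices.

```python
import math
import random


def clip(x, lo, hi):
    # Euclidean projection onto the interval [lo, hi]
    return min(max(x, lo), hi)


def smoothed_grad(x, grad_u, lo, hi, lam):
    # Gradient of U(x) + dist(x, K)^2 / (2 lam) with K = [lo, hi]
    return grad_u(x) + (x - clip(x, lo, hi)) / lam


def randomized_midpoint_lmc(grad_u, lo, hi, lam, h, n_steps, x0=0.0, seed=0):
    """Overdamped Langevin with a randomized-midpoint discretization on the
    projection-smoothed potential. Returns the whole chain."""
    rng = random.Random(seed)
    x = x0
    chain = []
    for _ in range(n_steps):
        a = rng.random()  # random midpoint fraction, uniform on [0, 1]
        # One Brownian path on [0, h]: W(a h), then the independent increment to W(h)
        w_mid = rng.gauss(0.0, math.sqrt(a * h))
        w_end = w_mid + rng.gauss(0.0, math.sqrt((1.0 - a) * h))
        x_mid = x - a * h * smoothed_grad(x, grad_u, lo, hi, lam) + math.sqrt(2.0) * w_mid
        x = x - h * smoothed_grad(x_mid, grad_u, lo, hi, lam) + math.sqrt(2.0) * w_end
        chain.append(x)
    return chain


# Target: uniform distribution on K = [-1, 1] (U = 0 inside K)
chain = randomized_midpoint_lmc(lambda x: 0.0, -1.0, 1.0, lam=0.01, h=0.001, n_steps=200_000)
samples = chain[20_000:]
print(sum(samples) / len(samples))  # near 0
```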

2501.10729 2026-06-17 stat.ME cs.LG stat.ML Replacement

Robust Local Polynomial Regression with Similarity Kernels

Yaniv Shulman

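AI summary: Addresses the sensitivity of classical local polynomial regression to outliers with a conditional-density kernel whose weights also use the response variable. Localized density estimation reduces the influence of outliers, giving lower empirical bias than iterative robust LOWESS while remaining competitive with standard LOWESS.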

Abstract

Local Polynomial Regression (LPR) is a widely used nonparametric method for modeling complex relationships due to its flexibility and simplicity. It estimates a regression function by fitting low-degree polynomials to localized subsets of the data, weighted by proximity. However, traditional LPR is sensitive to outliers and high-leverage points, which can significantly affect estimation accuracy. This paper revisits the kernel function used to compute regression weights and proposes a novel framework that incorporates both predictor and response variables in the weighting mechanism. The focus of this work is a conditional density kernel that robustly estimates weights by mitigating the influence of outliers through localized density estimation. The proposed method is implemented in Python and is publicly available at https://github.com/yaniv-shulman/rsklpr. The population analysis quantifies the bias induced by density-based robust weighting, and the reported experiments show lower empirical bias than iterative robust LOWESS while remaining competitive with standard LOWESS. This advancement provides a promising extension to traditional LPR, opening new possibilities for robust regression applications.
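
The idea of letting the response enter the weights can be sketched with a simplified local linear variant. Each spatial kernel weight is multiplied by a leave-one-out kernel density of that point's response among its neighbours, so an isolated response receives almost no weight. This is an illustrative simplification, not the rsklpr package's API, kernels or bandwidth selection. The data and bandwidths are made up.

```python
import math


def gaussian(u, h):
    # Unnormalized Gaussian kernel with bandwidth h
    return math.exp(-0.5 * (u / h) ** 2)


def weighted_linear_fit_at(xs, ys, weights, x0):
    """Weighted least-squares line through (xs, ys), evaluated at x0."""
    sw = sum(weights)
    xm = sum(w * x for w, x in zip(weights, xs)) / sw
    ym = sum(w * y for w, y in zip(weights, ys)) / sw
    sxx = sum(w * (x - xm) ** 2 for w, x in zip(weights, xs))
    sxy = sum(w * (x - xm) * (y - ym) for w, x, y in zip(weights, xs, ys))
    return ym + (sxy / sxx) * (x0 - xm)


def local_linear(xs, ys, x0, hx):
    # Standard local linear regression: weights depend on the predictor only
    return weighted_linear_fit_at(xs, ys, [gaussian(x - x0, hx) for x in xs], x0)


def robust_local_linear(xs, ys, x0, hx, hy):
    """Multiply each spatial weight by a leave-one-out kernel density of that
    point's response among its spatially weighted neighbours."""
    w = [gaussian(x - x0, hx) for x in xs]
    dens = [
        sum(w[j] * gaussian(yi - ys[j], hy) for j in range(len(ys)) if j != i)
        for i, yi in enumerate(ys)
    ]
    return weighted_linear_fit_at(xs, ys, [wi * di for wi, di in zip(w, dens)], x0)


xs = [i / 10 for i in range(21)]
ys = [2.0 * x + 1.0 for x in xs]
ys[10] = 50.0  # a single gross outlier at x = 1.0 (true value 3.0)
print(local_linear(xs, ys, 1.0, hx=0.3))                 # pulled far above 3
print(robust_local_linear(xs, ys, 1.0, hx=0.3, hy=0.5))  # close to 3
```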

2507.05164 2026-06-17 math.DS cs.LG nlin.AO Replacement

A Dynamical Systems Perspective on the Analysis of Neural Networks

Dennis Chemnitz, Maximilian Engel, Christian Kuehn, Sara-Viola Kuntz

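AI summary: Uses dynamical systems to reformulate challenges from deep neural networks and gradient descent. It studies information propagation, training dynamics, and mean-field limits, with results on embedding properties, stability, and graph limits.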

Comments preprint of a book chapter contribution

Abstract

In this chapter, we utilize dynamical systems to analyze several aspects of machine learning algorithms. As an expository contribution, we demonstrate how to reformulate a wide variety of challenges from deep neural networks, (stochastic) gradient descent, and related topics into dynamical statements. We also tackle three concrete challenges. First, we consider the process of information propagation through a neural network, i.e., we study the input-output map for different architectures. We explain the universal embedding property for augmented neural ODEs representing arbitrary functions of given regularity, the classification of multilayer perceptrons and neural ODEs in terms of suitable function classes, and the memory-dependence in neural delay equations. Second, we consider the training aspect of neural networks dynamically. We describe a dynamical systems perspective on gradient descent and study stability for overdetermined problems. We then extend this analysis to the overparameterized setting and describe the edge of stability phenomenon, also in the context of possible explanations for implicit bias. For stochastic gradient descent, we present stability results for the overparameterized setting via Lyapunov exponents of interpolation solutions. Third, we explain several results regarding mean-field limits of neural networks. We describe a result that extends existing techniques to heterogeneous neural networks involving graph limits via digraph measures. This shows how large classes of neural networks naturally fall within the framework of Kuramoto-type models on graphs and their large-graph limits. Finally, we point out that similar strategies of using dynamics to study explainable and reliable AI can also be applied to settings such as generative models or fundamental issues in gradient training methods, such as backpropagation or vanishing/exploding gradients.
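
The simplest instance of the dynamical view of training is shown below as an illustration. Gradient descent with learning rate $\eta$ on $f(x) = \lambda x^2 / 2$ is the linear map $x \mapsto (1 - \eta\lambda)x$. This map is stable exactly when $\eta < 2/\lambda$. The curvature threshold $2/\eta$ is the one that appears in edge-of-stability discussions.

```python
def gd_on_quadratic(curvature, lr, x0=1.0, steps=50):
    """Gradient descent on f(x) = curvature * x**2 / 2, i.e. the linear map
    x -> (1 - lr * curvature) * x. Stable iff |1 - lr * curvature| < 1,
    which means lr < 2 / curvature."""
    x = x0
    for _ in range(steps):
        x -= lr * curvature * x  # gradient of f is curvature * x
    return x


print(gd_on_quadratic(10.0, 0.15))  # multiplier -0.5: converges to 0
print(gd_on_quadratic(10.0, 0.25))  # multiplier -1.5: diverges
```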

2507.11366 2026-06-17 cs.GT cs.LG Replacement

Characterizing Nash Equilibria in Zero-Sum Games: A Physics-Inspired, Parallelizable Approach with a Linear Number of Gradient Queries

Taemin Kim, James P. Bailey

Affiliations: Industrial and Systems Engineering, Rensselaer Polytechnic Institute

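AI summary: Proposes an online optimization method inspired by Hamiltonian dynamics that characterizes the set of Nash equilibria of zero-sum games within a linear number of alternating gradient descent iterations. The method is parallelizable, works with arbitrary learning rates, and substantially outperforms standard methods in experiments.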

Abstract

We study online optimization methods for zero-sum games, a fundamental problem in adversarial learning in machine learning, economics, and many other domains. Traditional methods approximate Nash equilibria (NE) using either regret-based methods (time-average convergence) or contraction-map-based methods (last-iterate convergence). We propose a new method based on Hamiltonian dynamics in physics and prove that it can characterize the set of NE in a finite (linear) number of iterations of alternating gradient descent in the unbounded setting, modulo degeneracy, a first in online optimization. Unlike standard methods for computing NE, our proposed approach can be parallelized and works with arbitrary learning rates, both firsts in algorithmic game theory. Experimentally, we support our results by showing our approach drastically outperforms standard methods.
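
The Hamiltonian structure behind alternating play can be seen on the scalar bilinear game $f(x, y) = xy$. This is background only; the paper's procedure for characterizing the equilibrium set is not reproduced. With alternating updates, the quantity $x^2 + y^2 - \eta xy$ is conserved exactly for every learning rate $\eta$. One can check this by verifying $M^\top S M = S$ for the update matrix $M$ and $S = \begin{pmatrix} 1 & -\eta/2 \\ -\eta/2 & 1 \end{pmatrix}$. Iterates therefore orbit the equilibrium $(0, 0)$ instead of spiralling out, as they do with simultaneous updates. For $\eta < 2$ this form is positive definite, so the orbit is a bounded ellipse.

```python
def alternating_gda(x, y, lr, steps):
    """Alternating gradient play on f(x, y) = x * y: x minimizes, y maximizes.
    x moves first and y responds to the updated x."""
    traj = [(x, y)]
    for _ in range(steps):
        x = x - lr * y  # df/dx = y
        y = y + lr * x  # df/dy = x, evaluated at the new x
        traj.append((x, y))
    return traj


def energy(x, y, lr):
    # Quadratic form conserved exactly by the alternating updates above
    return x * x + y * y - lr * x * y


traj = alternating_gda(1.0, 0.5, lr=0.3, steps=1000)
print(energy(*traj[0], 0.3), energy(*traj[-1], 0.3))  # equal up to rounding
```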

2602.17894 2026-06-17 stat.ML cs.LG math.ST stat.TH Replacement

Learning from Biased and Costly Data Sources: Minimax-optimal Data Collection under a Budget

Michael O. Harding, Vikas Singh, Kirthevasan Kandasamy

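AI summary: For multi-source data collection under a fixed budget, proposes a sampling plan that maximizes the effective sample size. Paired with a post-stratification estimator, the plan achieves the budgeted minimax-optimal risk.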

Comments COLT 2026

Abstract

Data collection is a critical component of modern statistical and machine learning pipelines, particularly when data must be gathered from multiple heterogeneous sources to study a target population of interest. In many use cases, such as medical studies or political polling, different sources incur different sampling costs. Observations often have associated group identities (for example, health markers, demographics, or political affiliations), and the relative composition of these groups may differ substantially, both among the source populations and between sources and target population. In this work, we study multi-source data collection under a fixed budget, focusing on the estimation of population means and group-conditional means. We show that naive data collection strategies (e.g., attempting to "match" the target distribution) or relying on standard estimators (e.g., the sample mean) can be highly suboptimal. Instead, we develop a sampling plan which maximizes the effective sample size: the total sample size divided by $D_{\chi^2}(q \,\|\, \overline{p}) + 1$, where $q$ is the target distribution, $\overline{p}$ is the aggregated source distribution, and $D_{\chi^2}$ is the $\chi^2$-divergence. We pair this sampling plan with a classical post-stratification estimator and upper bound its risk. We provide matching lower bounds, establishing that our approach achieves the budgeted minimax optimal risk. Our techniques also extend to prediction problems when minimizing the excess risk, providing a principled approach to multi-source learning with costly and heterogeneous data sources.
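
The effective sample size and the post-stratification estimator named above can be computed directly. This sketch assumes discrete group distributions given as lists. It leaves out the budget-allocation step (how many samples to buy from each source), which is the paper's actual contribution. The numbers are illustrative.

```python
def chi2_divergence(q, p):
    # D_chi2(q || p) = sum_g q_g^2 / p_g - 1  (requires p_g > 0 wherever q_g > 0)
    return sum(qg * qg / pg for qg, pg in zip(q, p) if qg > 0) - 1.0


def effective_sample_size(n, q, p_bar):
    # Total sample size divided by D_chi2(q || p_bar) + 1
    return n / (chi2_divergence(q, p_bar) + 1.0)


def post_stratified_mean(samples_by_group, q):
    """Weight each group's sample mean by the target group proportion q_g."""
    return sum(qg * sum(s) / len(s) for qg, s in zip(q, samples_by_group))


q = [0.5, 0.5]      # target group proportions
p_bar = [0.8, 0.2]  # aggregated source group proportions
print(effective_sample_size(1000, q, p_bar))  # about 640

groups = [[1.0, 2.0, 3.0], [10.0]]  # group 2 is under-sampled
print(post_stratified_mean(groups, q))  # 0.5 * 2 + 0.5 * 10, versus a naive mean of 4
```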

2603.02159 2026-06-17 stat.ML cs.LG Replacement

Instrumental and Proximal Causal Inference with Gaussian Processes

Yuqi Zhang, Krikamol Muandet, Dino Sejdinovic, Edwin Fong, Siu Lun Chau

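AI summary: Proposes a Deconditional Gaussian Process framework for causal inference under unobserved confounding. It provides reliable posterior uncertainty quantification and enables model selection through marginal-likelihood optimization.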

Abstract

Instrumental variable (IV) and proximal causal learning (Proxy) methods are central frameworks for causal inference in the presence of unobserved confounding. Despite substantial methodological advances, existing approaches rarely provide reliable epistemic uncertainty (EU) quantification. We address this gap through a Deconditional Gaussian Process (DGP) framework for uncertainty-aware causal learning. Our formulation recovers popular kernel estimators as the posterior mean, ensuring predictive precision, while the posterior variance yields principled and well-calibrated EU. Moreover, the probabilistic structure enables systematic model selection via marginal log-likelihood optimization. Empirical results demonstrate strong predictive performance alongside informative EU quantification, evaluated via empirical coverage frequencies and decision-aware accuracy rejection curves. Together, our approach provides a unified, practical solution for causal inference under unobserved confounding with reliable uncertainty.
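
Ordinary GP regression, sketched below as background, shows how a kernel estimator can appear as a posterior mean. With an RBF kernel and noise variance $\sigma^2$, the posterior mean equals the kernel ridge regression estimate with regularization $\sigma^2$. The posterior variance grows back to the prior variance away from the data. This is not the paper's Deconditional Gaussian Process. The kernel, length scale and data are illustrative.

```python
import math


def rbf(a, b, ell=1.0):
    return math.exp(-0.5 * ((a - b) / ell) ** 2)


def solve(A, b):
    """Gaussian elimination with partial pivoting for a small dense system."""
    n = len(A)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def gp_posterior(xs, ys, x_star, noise=1e-2, ell=1.0):
    """Posterior mean and variance of f(x_star) under a zero-mean GP prior.
    The mean is k_*^T (K + noise I)^{-1} y, the kernel ridge regression estimate."""
    K = [[rbf(a, b, ell) + (noise if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(x_star, a, ell) for a in xs]
    mean = sum(k * a for k, a in zip(k_star, solve(K, ys)))
    var = rbf(x_star, x_star, ell) - sum(k * v for k, v in zip(k_star, solve(K, k_star)))
    return mean, var


xs, ys = [0.0, 1.0, 2.0], [0.0, 1.0, 4.0]
print(gp_posterior(xs, ys, 1.0))   # mean near 1, small variance
print(gp_posterior(xs, ys, 10.0))  # mean near 0, variance near the prior value 1
```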

2604.06531 2026-06-17 math.OC cs.LG cs.MA cs.SY eess.SY stat.ML Replacement

A Generalized Sinkhorn Algorithm for Mean-Field Schrödinger Bridge

Asmaa Eldesoukey, Yongxin Chen, Abhishek Halder

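AI summary: For the mean-field Schrödinger bridge problem, proposes a generalized Hopf-Cole transform and a Sinkhorn-type recursion for the associated system of integro-PDEs. It discusses convergence guarantees under mild assumptions on the interaction potential and illustrates the method with numerical examples.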

Abstract

The mean-field Schrödinger bridge (MFSB) problem concerns designing a minimum-effort controller that guides a diffusion process with nonlocal interaction to reach a given distribution from another by a fixed deadline. Unlike the standard Schrödinger bridge, the dynamical constraint for MFSB is the mean-field limit of a population of interacting agents with controls. It serves as a natural model for large-scale multi-agent systems. The MFSB is computationally challenging because the nonlocal interaction makes the problem nonconvex. We propose a generalization of the Hopf-Cole transform for MFSB and, building on it, design a Sinkhorn-type recursive algorithm to solve the associated system of integro-PDEs. Under mild assumptions on the interaction potential, we discuss convergence guarantees for the proposed algorithm. We present numerical examples with repulsive and attractive interactions to illustrate the theoretical contributions.
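
For orientation, here is the classical Sinkhorn iteration for discrete entropic optimal transport. This is the static counterpart of the standard, interaction-free Schrödinger bridge, which the paper's algorithm generalizes. The generalized Hopf-Cole transform and the mean-field interaction are not modeled. The marginals, cost and $\varepsilon$ are illustrative.

```python
import math


def sinkhorn(mu, nu, cost, eps=1.0, iters=500):
    """Classical Sinkhorn iterations for entropic optimal transport between
    discrete marginals mu and nu. Returns the coupling u_i K_ij v_j."""
    K = [[math.exp(-c / eps) for c in row] for row in cost]
    u = [1.0] * len(mu)
    v = [1.0] * len(nu)
    for _ in range(iters):
        # Alternately rescale so that row sums match mu and column sums match nu
        u = [m / sum(K[i][j] * v[j] for j in range(len(nu))) for i, m in enumerate(mu)]
        v = [n / sum(K[i][j] * u[i] for i in range(len(mu))) for j, n in enumerate(nu)]
    return [[u[i] * K[i][j] * v[j] for j in range(len(nu))] for i in range(len(mu))]


mu, nu = [0.5, 0.5], [0.25, 0.75]
plan = sinkhorn(mu, nu, cost=[[0.0, 1.0], [1.0, 0.0]])
print([sum(row) for row in plan], [sum(col) for col in zip(*plan)])  # marginals mu, nu
```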

2604.23628 2026-06-17 cs.DS cs.LG Replacement

Characterizing Admissible Objective Functions for Hierarchical Clustering

Ryuki Tsukuba, Kazutoshi Ando

Affiliations: Faculty of Engineering, Shizuoka University; Graduate School of Integrated Science and Technology, Shizuoka University

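AI summary: Studies admissible objective functions for hierarchical clustering. For sum-type objectives based on aggregate similarity, it completely characterizes admissibility when the scaling function is a symmetric polynomial of degree at most two, and gives sufficient conditions for degree three. It also introduces max-type objectives and characterizes their admissibility for arbitrary symmetric scaling functions.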

Comments 20 pages, 3 figures. Minor correction to abstract metadata. Manuscript unchanged from v2. Submitted to Discrete Applied Mathematics

Abstract

Hierarchical clustering is a fundamental task in data analysis, but classical methods have long lacked a principled objective function. Dasgupta [STOC 2016] took an important step toward addressing this gap by proposing a well-motivated objective function for cluster trees. Cohen-Addad et al. [J. ACM 2019] subsequently introduced the notion of admissibility: an objective function is admissible if, whenever the input similarity matrix admits generating trees, its minimizers are precisely those generating trees. They also gave a necessary and sufficient condition for admissibility within a family of objective functions based on aggregate intercluster similarity. We refer to this family as sum-type objective functions. However, apart from Dasgupta's original objective function, no explicit admissible objective functions in this family were provided. In this paper, we study admissible objective functions for hierarchical clustering in two directions. For sum-type objective functions, we give a complete characterization when the scaling function is a symmetric polynomial of degree at most two, and we derive sufficient conditions for degree-three polynomials. We also show that the recursive sparsest cut algorithm achieves an $O(\phi)$-approximation ratio for the admissible objective functions covered by our characterization, where $\phi$ is the approximation factor of the sparsest cut subroutine. We then introduce max-type objective functions, where cluster interaction is measured by maximum, rather than aggregate, intercluster similarity. For this class, we characterize which objective functions are admissible for arbitrary symmetric scaling functions and give a complete characterization when the scaling function is a symmetric polynomial of degree at most two.
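
Dasgupta's cost, the reference point of this line of work, charges each pair of points its similarity times the number of leaves under their lowest common ancestor. Good trees separate similar points as late as possible. The sketch below uses a nested-tuple tree representation and a dictionary of pairwise similarities; both are illustrative choices. The sum-type and max-type families studied in the paper are not implemented.

```python
from itertools import combinations


def leaves(tree):
    # A tree is either a leaf label or a tuple of subtrees
    if not isinstance(tree, tuple):
        return [tree]
    return [x for child in tree for x in leaves(child)]


def dasgupta_cost(tree, w):
    """Sum over pairs {i, j} of w(i, j) * |leaves(LCA(i, j))|. Pairs that are
    split between two children of a node have that node as their LCA."""
    if not isinstance(tree, tuple):
        return 0.0
    size = len(leaves(tree))
    cost = 0.0
    for a, b in combinations(tree, 2):
        for i in leaves(a):
            for j in leaves(b):
                cost += w.get(frozenset((i, j)), 0.0) * size
    return cost + sum(dasgupta_cost(child, w) for child in tree)


# Two tight pairs {a, b} and {c, d}; all other similarities are zero
w = {frozenset(("a", "b")): 1.0, frozenset(("c", "d")): 1.0}
print(dasgupta_cost((("a", "b"), ("c", "d")), w))  # 4.0: pairs merged early
print(dasgupta_cost((("a", "c"), ("b", "d")), w))  # 8.0: pairs split at the root
```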

2605.29669 2026-06-17 stat.ML cs.LG math.PR math.ST stat.TH Replacement

Eigen-Spike Emergence and Quadratic Equivalents for Conjugate Kernels on Nonlinearly Separable Data

Collin Cranston, Zhichao Wang, Todd Kemp, Michael W. Mahoney

Affiliations: Department of Mathematics ICSI and Department of Statistics; University of California, San Diego, USA; University of California, Berkeley, USA; Department of Mathematics ICSI, LBNL and Department of Statistics

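AI summary: For nonlinearly separable data (the XOR problem), uses a quadratic equivalent of the conjugate kernel matrix to analyze the emergence of outlier eigenvalues and BBP-type transitions to label alignment. It shows how sample complexity, signal-to-noise ratio, activation function, and pretrained features affect nonlinear learnability.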

Comments 81 pages, 8 figures

Abstract

Recent work in random matrix theory (RMT) has developed the notion of deterministic equivalents: typically linear surrogate models that approximate the spectral behavior of large nonlinear random matrices, such as nonlinear feature maps in neural networks (NNs). Such equivalents make theoretical predictions tractable by reducing a complex model to a simpler one with properties that fall under the umbrella of classical RMT tools. However, this leaves open the question of whether this idealized linear equivalence remains meaningful for classification of high-dimensional nonlinearly separable data. Motivated by this, we consider the conjugate kernel (CK), which is the nonlinear feature map of a one-layer feedforward NN, under a canonical nonlinearly separable dataset for the XOR problem; and we use the study of informative outlier eigenvalues in the CK and whether their corresponding eigenvectors asymptotically align with XOR labels as a proxy for nonlinear learnability. We develop a robust quadratic equivalent of the CK matrix that enables a precise analysis of emergent informative spikes, as one modifies various knobs common in ML practice: sample complexity, signal-to-noise ratio (SNR), nonlinear activation choice, and pretrained features. We identify regimes in which these knobs move the CK beyond the linear equivalent and produce BBP-type transitions to label-aligned outlier eigenspaces. Our analysis helps bring deterministic-equivalence tools from RMT to bear on problems of practical relevance in ML.
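
A linear equivalent can miss the XOR signal entirely, as the four canonical XOR points show. Their labels $y = x_1 x_2$ are uncorrelated with each coordinate but perfectly correlated with the quadratic feature $x_1 x_2$. This is a toy motivation for a quadratic rather than linear equivalent. It is not the paper's conjugate-kernel construction.

```python
# The four XOR points and their labels y = x1 * x2
points = [(1, 1), (-1, -1), (1, -1), (-1, 1)]
labels = [x1 * x2 for x1, x2 in points]


def correlation_with_labels(feature):
    # Unnormalized correlation sum_i y_i * feature(x_i)
    return sum(y * feature(p) for p, y in zip(points, labels))


linear_1 = correlation_with_labels(lambda p: p[0])
linear_2 = correlation_with_labels(lambda p: p[1])
quadratic = correlation_with_labels(lambda p: p[0] * p[1])
print(linear_1, linear_2, quadratic)  # 0 0 4
```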

2606.14954 2026-06-17 math.FA cs.LG math.OC stat.ML 版本更新

Representation Costs in Data Science: Foundations and the Quasi-Banach Spaces of Deep Neural Networks

数据科学中的表示代价:基础与深度神经网络的拟巴拿赫空间

Greg Ongie, Rahul Parhi

发表机构 * Marquette University(马凯特大学) University of California, San Diego(加州大学圣地亚哥分校)

AI总结 本文建立了一个统一框架,通过参数空间正则化子分析参数化数据拟合方法的表示代价,揭示了深度神经网络诱导的本征空间是拟巴拿赫空间,并证明了表示定理等自然结果。

AI中文摘要

我们开发了一个通用框架,通过参数空间正则化子分析参数化数据拟合方法的表示代价。从这个抽象视角,我们定义了任意参数化模型的表示代价,并揭示了它们诱导的(本征)函数空间。这统一了最近数据拟合方法的函数空间观点。我们还证明了许多自然结果在这个抽象设置中成立,包括参数方法在其本征空间上的表示定理。该框架还严格地将参数化方法与其在充分过参数化下的等价非参数描述联系起来。经典方法及其本征空间,如核方法/再生核希尔伯特空间、小波/贝索夫空间和浅层神经网络/变分空间,都是我们抽象框架的特例。将表示代价研究“公理化”的一个副产品是,我们立即获得了深度神经网络的新结果:对于深度为$L$的前馈ReLU网络,其诱导的本征空间是$p$-可赋范的拟巴拿赫空间,其中$p = 2/L$。这揭示了深度神经网络的归纳偏置(由表示代价给出)在深度$L > 2$时无法被范数捕捉。

英文摘要

We develop a general framework for analyzing representation costs of parametric data-fitting methods through their parameter-space regularizers. From this abstract perspective, we define representation costs for arbitrary parametric models and reveal their induced (native) function spaces. This unifies recent function-space views of data-fitting methods. We also prove that many natural results hold in this abstract setting, including representer theorems for parametric methods on their native spaces. The framework also rigorously connects parametric methods with their equivalent nonparametric descriptions under sufficient overparameterization. Classical methods and their native spaces, such as kernel methods / reproducing kernel Hilbert spaces, wavelets / Besov spaces, and shallow neural networks / variation spaces emerge as special cases of our abstract framework. A byproduct of "axiomatizing" the study of representation costs is that we also immediately obtain new results for deep neural networks: For depth-$L$ feedforward ReLU networks, their induced native spaces are $p$-normable quasi-Banach spaces with $p = 2/L$. This reveals that the inductive bias of deep neural networks (as given by the representation cost) cannot be captured by norms for depths $L > 2$.
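The role of the exponent p = 2/L can be seen in a finite-dimensional analogue. Assuming the coordinate ℓ^p quasi-norm on R² as a stand-in for the paper's function-space quasi-norms, the sketch checks two inequalities on the pair x = (1, 0), y = (0, 1): the ordinary triangle inequality, which holds at L = 2 (p = 1) but fails for L > 2, and the p-triangle inequality ‖x + y‖^p ≤ ‖x‖^p + ‖y‖^p, which still holds and is the property behind p-normability.

```python
def lp_quasi_norm(x, p):
    """(sum_i |x_i|^p)^(1/p): a norm for p >= 1, only a quasi-norm for 0 < p < 1."""
    return sum(abs(v) ** p for v in x) ** (1.0 / p)


def triangle_checks(L, x=(1.0, 0.0), y=(0.0, 1.0), tol=1e-12):
    """For p = 2/L, test the triangle inequality and the p-triangle inequality
    ||x + y||^p <= ||x||^p + ||y||^p on one pair of vectors."""
    p = 2.0 / L
    s = tuple(a + b for a, b in zip(x, y))
    nx, ny, ns = (lp_quasi_norm(v, p) for v in (x, y, s))
    triangle_ok = ns <= nx + ny + tol
    p_triangle_ok = ns ** p <= nx ** p + ny ** p + tol
    return triangle_ok, p_triangle_ok


for L in (2, 3, 4):
    print(L, triangle_checks(L))
# 2 (True, True)
# 3 (False, True)
# 4 (False, True)
```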

6. 高效学习、压缩与部署 18 篇

2606.17118 2026-06-17 cs.LG cs.AI 新提交

MODE: Modality-Decomposed Expert-Level Mixed-Precision Quantization for MoE Multimodal LLMs

MODE: 面向MoE多模态大语言模型的模态分解专家级混合精度量化

Yuanteng Chen, Peisong Wang, Zhilei Liu, Nanxin Zeng, Yuantian Shao, Shiqiang Lang, Tao Liu, Chuangyi Li, Qinghao Hu, Gang Li, Jing Liu, Jian Cheng

发表机构 * Institute of Automation, Chinese Academy of Sciences(中国科学院自动化研究所) School of Artificial Intelligence, University of Chinese Academy of Sciences(中国科学院大学人工智能学院) Zhongguancun Academy(中关村学院)

AI总结 针对MoE多模态大语言模型在专家重要性估计中存在的跨模态和视觉内偏差,提出模态分解的专家级混合精度量化框架MODE,通过分解选择频率、过滤冗余视觉令牌并评估模态敏感性,在给定预算下分配比特宽度,在W3A16下平均性能损失控制在2.9%以内。

Comments 18 pages, 8 figures

AI中文摘要

混合专家多模态大语言模型(MoE-MLLMs)性能卓越,但GPU内存成本高昂,因此压缩至关重要。在PTQ方法中,专家级混合精度量化已被证明对MoE-LLMs有效,但由于专家重要性估计中两个被忽视的偏差,在MoE-MLLMs上性能显著下降。(1)在跨模态层面,视觉令牌的数值优势导致专家选择频率被视觉令牌主导,掩盖了对文本模态至关重要的专家;(2)在视觉内层面,大量冗余视觉令牌进一步扭曲频率统计,掩盖了对信息性视觉内容至关重要的专家。为弥补这些不足,我们提出MODE,一种面向MoE-MLLMs的模态分解专家级混合精度量化框架,该框架按模态分解专家选择频率,过滤冗余视觉令牌以获得去噪的视觉频率,并进一步评估每个模态的量化敏感性作为基于频率估计的补充信号。这些信号被整合到整数线性规划公式中,以在给定预算下分配每个专家的比特宽度。大量实验表明,MODE特别适合MoE-MLLMs,在W3A16下平均性能损失限制在2.9%以内,在极端2比特设置下获得更大增益。

英文摘要

Mixture-of-Experts Multimodal Large Language Models (MoE-MLLMs) offer remarkable performance but incur prohibitive GPU memory costs, making compression essential. Among post-training quantization (PTQ) methods, expert-level mixed-precision quantization has proven effective for MoE-LLMs, yet suffers notable degradation on MoE-MLLMs due to two overlooked biases in expert importance estimation. (1) At the cross-modal level, the numerical dominance of vision tokens causes expert selection frequency to be dominated by vision tokens, masking experts that are critical to the text modality; (2) at the intra-vision level, the large proportion of redundant vision tokens further skews frequency statistics, obscuring experts critical for informative visual content. To bridge these gaps, we propose MODE, a modality-decomposed expert-level mixed-precision quantization framework for MoE-MLLMs that decomposes expert selection frequency by modality, filters redundant vision tokens to obtain denoised visual frequency, and further evaluates quantization sensitivity per modality as a complementary signal to frequency-based estimation. These signals are integrated into an Integer Linear Programming formulation to assign per-expert bit-widths under a given budget. Extensive experiments show that MODE is particularly well-suited for MoE-MLLMs, limiting average performance loss to within 2.9% at W3A16, with larger gains at the extreme 2-bit setting.
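The cross-modal bias and the budgeted bit allocation can be sketched in a few lines. The importance fusion (an equal-weight average of per-modality selection frequencies), the error proxy 4^(-b) (the usual uniform-quantizer MSE scaling), and the toy counts are all our assumptions. The paper additionally uses denoised vision frequencies and per-modality quantization sensitivity, and solves an ILP; the exact dynamic program below is only a small-scale stand-in for that solver.

```python
def pooled_importance(text_counts, vision_counts, n_text, n_vision):
    """Modality-agnostic selection frequency: vision tokens dominate the denominator."""
    total = n_text + n_vision
    return [(t + v) / total for t, v in zip(text_counts, vision_counts)]


def decomposed_importance(text_counts, vision_counts, n_text, n_vision, alpha=0.5):
    """Hypothetical modality-decomposed score: average the per-modality frequencies
    so a text-critical expert is not masked by the sheer number of vision tokens."""
    return [alpha * t / n_text + (1 - alpha) * v / n_vision
            for t, v in zip(text_counts, vision_counts)]


def allocate_bits(importance, params, budget_bits, choices=(2, 3, 4)):
    """Choose one bit-width per expert minimizing sum_i importance_i * 4**(-b_i)
    subject to sum_i params_i * b_i <= budget_bits.

    Multiple-choice knapsack solved exactly by dynamic programming over integer
    budgets; 4**(-b) is a hypothetical quantization-error proxy.
    """
    states = {0: (0.0, [])}  # bits used so far -> (best cost, bit-widths chosen)
    for imp, n in zip(importance, params):
        nxt = {}
        for used, (cost, bits) in states.items():
            for b in choices:
                u = used + n * b
                if u > budget_bits:
                    continue
                c = cost + imp * 4.0 ** (-b)
                if u not in nxt or c < nxt[u][0]:
                    nxt[u] = (c, bits + [b])
        if not nxt:
            raise ValueError("budget too small for the smallest bit-width")
        states = nxt
    return min(states.values(), key=lambda s: s[0])[1]


# Expert 0 is text-critical but rarely chosen by the (far more numerous) vision tokens.
text_counts, vision_counts = [80, 10, 5, 5], [10, 300, 300, 290]
print(pooled_importance(text_counts, vision_counts, 100, 900))      # expert 0 ranks last
print(decomposed_importance(text_counts, vision_counts, 100, 900))  # expert 0 ranks first
print(allocate_bits([0.9, 0.1, 0.1, 0.1], [1, 1, 1, 1], budget_bits=12))  # expert 0 gets 4 bits
```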

2606.17460 2026-06-17 cs.LG cs.NA math.NA physics.comp-ph 新提交

Operator Boosting Produces Pareto-Efficient PDE Surrogates

算子提升产生帕累托高效的PDE代理模型

Lennon J. Shikhman

发表机构 * College of Computing, Georgia Institute of Technology(佐治亚理工学院计算学院) Department of Mathematics and Systems Engineering, Florida Institute of Technology(佛罗里达理工学院数学与系统工程系)

AI总结 提出算子提升框架,通过残差学习直接构建紧凑神经算子代理,在30个数据集-架构对中有21个平均准确率提升,参数量减少约72-95%,并在10个PDE基准中的7个上实现经验帕累托改进。

Comments 19 pages, 4 figures, 3 tables. Preprint submitted to Elsevier

AI中文摘要

神经算子被广泛用作偏微分方程(PDE)的代理解映射,但在多查询科学工作流中,全尺寸模型可能存储、部署和评估成本高昂。本文引入算子提升(Operator Boosting),一种逐阶段残差学习框架,直接构建紧凑的神经算子代理,而非先训练大模型再压缩。从归一化输出坐标中的经验均值预测器开始,该方法在残差场上训练一系列同族小型神经算子,并通过验证选择的收缩整合每个修正。我们以傅里叶神经算子(FNO)、DeepONet和卷积神经算子(CNO)实例化该框架,并将提升的小型堆栈与来自PDEBench、APEBench和The Well的一维、二维和三维PDE基准上的全尺寸单体基线进行比较。在30个数据集-架构对中,21个显示平均准确率正向提升,17个具有正置信区间,而所有提升堆栈的可训练参数数量减少约72-95%。最佳模型比较显示,在10个完成的PDE基准中,有7个实现了经验帕累托改进,包括二维纳维-斯托克斯方程、浅水动力学、达西流、一维输运和反应系统,以及三维可压缩纳维-斯托克斯方程。这些结果表明,算子提升通常改善了神经PDE代理的经验准确率-参数帕累托前沿,同时也揭示了残差提升未能抵消压缩的PDE和架构依赖区域。

英文摘要

Neural operators are widely used as surrogate solution maps for partial differential equations (PDEs), but full-size models can be costly to store, deploy, and evaluate in many-query scientific workflows. This work introduces Operator Boosting, a stagewise residual-learning framework for constructing compact neural-operator surrogates directly, rather than training a large model and compressing it afterward. Starting from the empirical mean predictor in normalized output coordinates, the method trains a sequence of tiny same-family neural operators on residual fields and incorporates each correction through validation-selected shrinkage. We instantiate the framework with Fourier neural operators (FNOs), DeepONets, and convolutional neural operators (CNOs), and compare boosted tiny stacks against full-size monolithic baselines across one-, two-, and three-dimensional PDE benchmarks from PDEBench, APEBench, and The Well. Across 30 dataset-architecture pairs, 21 show positive mean accuracy gains and 17 have positive confidence intervals, while all boosted stacks reduce trainable parameter count by approximately 72-95%. Best-model comparisons show empirical Pareto improvements on 7 of 10 completed PDE benchmarks, including two-dimensional Navier-Stokes, shallow-water dynamics, Darcy flow, one-dimensional transport and reaction systems, and three-dimensional compressible Navier-Stokes. These results show that Operator Boosting often improves the empirical accuracy-parameter Pareto frontier of neural PDE surrogates, while also exposing PDE- and architecture-dependent regimes where residual boosting fails to offset compression.
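The stagewise recipe (mean predictor, tiny learners fitted to residuals, validation-selected shrinkage) can be sketched on scalar 1-D regression. Here each "tiny operator" is replaced by a one-feature affine least-squares fit, and the shrinkage grid and toy data are our assumptions; only the boosting loop, not the neural-operator learners, mirrors the abstract. Including η = 0 in the grid guarantees that validation error never increases from one stage to the next.

```python
import math
import random


def fit_affine(feature, residual):
    """Least-squares fit of residual ~ a * feature + b (a 'tiny' learner)."""
    n = len(feature)
    mf, mr = sum(feature) / n, sum(residual) / n
    var = sum((f - mf) ** 2 for f in feature)
    cov = sum((f - mf) * (r - mr) for f, r in zip(feature, residual))
    a = cov / var if var > 0 else 0.0
    return a, mr - a * mf


def mse(pred, target):
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(target)


def boost(x_tr, y_tr, x_va, y_va, bases, shrinkages=(0.0, 0.25, 0.5, 0.75, 1.0)):
    """Stagewise residual boosting; returns the validation MSE after each stage."""
    mean = sum(y_tr) / len(y_tr)  # stage 0: empirical mean predictor
    p_tr, p_va = [mean] * len(x_tr), [mean] * len(x_va)
    history = [mse(p_va, y_va)]
    for phi in bases:  # one tiny learner per stage, fitted to the current residuals
        f_tr, f_va = [phi(x) for x in x_tr], [phi(x) for x in x_va]
        a, b = fit_affine(f_tr, [y - p for y, p in zip(y_tr, p_tr)])
        c_tr, c_va = [a * f + b for f in f_tr], [a * f + b for f in f_va]
        # Validation-selected shrinkage; eta = 0 rejects a stage that does not help.
        eta = min(shrinkages,
                  key=lambda e: mse([p + e * c for p, c in zip(p_va, c_va)], y_va))
        p_tr = [p + eta * c for p, c in zip(p_tr, c_tr)]
        p_va = [p + eta * c for p, c in zip(p_va, c_va)]
        history.append(mse(p_va, y_va))
    return history


rng = random.Random(0)
xs = [rng.uniform(-3.0, 3.0) for _ in range(400)]
ys = [math.sin(x) + 0.5 * x + rng.gauss(0.0, 0.1) for x in xs]
hist = boost(xs[:300], ys[:300], xs[300:], ys[300:],
             bases=[lambda x: x, math.sin] * 3)
print([round(h, 4) for h in hist])  # non-increasing validation MSE
```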

2606.17471 2026-06-17 cs.LG cs.SY eess.SY 新提交

ReRAM-aware Model Finetuning addressing I-V Non-linearity and Retention Errors

面向ReRAM的模型微调:解决I-V非线性和保留误差

Ching-Yi Lin, Shamik Kundu, Arnab Raha, Sahil Shah

发表机构 * Intel Corporation(英特尔公司)

AI总结 提出一种基于微调的硬件感知训练算法,通过范围收缩的sinh变换缓解I-V非线性,并将保留误差纳入正则化损失,实现ReRAM上DNN的高效部署,在图像分类和问答任务中精度损失极小。

Comments 11 pages, 12 figures, 2 tables, with appendix (5 pages, 9 figures)

AI中文摘要

传统的CPU、GPU和NPU架构日益受到冯·诺依曼瓶颈的限制。虽然使用ReRAM交叉阵列的存内计算(IMC)提供了一种高密度、高能效的替代方案,但其实际部署受到非理想特性的制约。现有的硬件感知训练框架通常需要从头开始训练,这对于现代大规模模型来说计算成本过高。在这项工作中,我们提出了一种基于微调的硬件感知训练算法,能够在最小训练开销下实现DNN在ReRAM上的鲁棒部署。我们的方法通过应用范围收缩的sinh变换来缓解I-V非线性,并在微调过程中将保留误差直接纳入正则化损失。我们在图像分类和问答(QA)等模型和任务上评估了我们的框架。实验结果表明,我们的方法在ResNet18和DeiT-Tiny等大规模模型上实现了与基础模型相似的精度。在ImageNet上的MobileNetV3系列中,该技术的精度下降不到2%。此外,将该技术应用于SQuAD v2数据集,F-1分数仅下降1点。

英文摘要

Traditional CPU, GPU, and NPU architectures are increasingly limited by the von Neumann bottleneck. While In-Memory Computing (IMC) using ReRAM crossbar arrays offers a high-density, energy-efficient alternative, its practical deployment is constrained by the arrays' non-idealities. Existing hardware-aware training frameworks often require training from scratch, which is computationally prohibitive for modern large-scale models. In this work, we propose a finetuning-based hardware-aware training algorithm that enables robust DNN deployment on ReRAM with minimal training overhead. Our approach mitigates I-V non-linearity by applying a range-shrunk sinh transformation and incorporates retention errors directly into a regularization loss during finetuning. We evaluate our framework across models and tasks such as image classification and question answering (QA). Experimental results demonstrate that our method achieves accuracy similar to that of the base model on large-scale models such as ResNet18 and DeiT-Tiny. On ImageNet, the technique incurs less than 2% accuracy degradation for the MobileNetV3 family. Further, applying the technique on the SQuAD v2 dataset results in only a 1-point drop in F1 score.

2606.17500 2026-06-17 cs.LG cs.AR 新提交

Reconfigurable Computing Challenge: Transformer for Jet Tagging on Versal AI Engines

可重构计算挑战:Versal AI Engine上的喷注标记Transformer

Gram Koski, Sean Lipps, Zhenghua Ma, G. Abarajithan, Ryan Kastner

发表机构 * Department of Computer Science and Engineering(计算机科学与工程系) University of California San Diego(加州大学圣地亚哥分校) La Jolla, CA, USA(拉霍亚, 加州, 美国)

AI总结 针对CERN LHC喷注标记任务,提出在AMD Versal AI Engine上部署量化整数Transformer的初始实现,并开发可重用软件框架自动生成Vitis图代码。

Comments 4 pages, 4 figures. In FCCM 2026 proceedings

Journal ref
2026 IEEE 34th Int. Symp. on Field-Programmable Custom Computing Machines (FCCM), Atlanta, GA, USA, 2026, pp. 307-310
AI中文摘要

基于Transformer的模型在CERN LHC的喷注标记中表现出强大的性能,但在低延迟、资源受限的触发系统中部署它们具有挑战性。我们提出了一个在AMD Versal AI Engine(AIE)上用于喷注标记的量化、纯整数Transformer的初始实现,将密集层和多头注意力(MHA)层映射到AIE瓦片。主要贡献是一个可重用的软件框架,该框架将Transformer层表示为可组合的AIE构建块,并从高级Python模型描述自动生成相应的Vitis图代码。该框架为未来研究提供了基础,并作为开源软件在 https://github.com/KastnerRG/particle_transformer_aie 发布。

英文摘要

Transformer-based models achieve strong performance for jet tagging at the CERN LHC, but deploying them in low-latency, resource-constrained trigger systems is challenging. We present an initial implementation of a quantized, integer-only transformer for jet tagging on the AMD Versal AI Engine (AIE), mapping dense and multi-head attention (MHA) layers to AIE tiles. The main contribution is a reusable software framework that represents transformer layers as composable AIE building blocks and automatically generates the corresponding Vitis graph code from a high-level Python model description. This framework provides a foundation for future research and is released as open-source software at https://github.com/KastnerRG/particle_transformer_aie.

2606.17803 2026-06-17 cs.LG 新提交

Continual Self-Improvement with Lightweight Experiential Latent Memories

持续自我改进:轻量级经验潜在记忆

Vaggelis Dorovatas, Nancy Kalaj, Rahaf Aljundi

发表机构 * Toyota Motor Europe(丰田汽车欧洲公司) University of Trento(特伦托大学)

AI总结 提出一种在线方法,将推理时计算转化为轻量级模块化潜在记忆,通过自生成测试时信号进行训练,实现持续改进且避免灾难性遗忘。

AI中文摘要

大型语言模型通过扩展推理时计算实现了强大的推理性能,但本质上仍然是无状态的,丢弃了在此过程中产生的丰富、自生成的推理轨迹。我们研究模型是否可以从这种经验中在线学习,将瞬态计算(推理轨迹)转化为持久可复用的知识,且无需外部监督或访问未来数据。我们表明,对原始推理轨迹进行上下文学习(ICL)无法泛化,反映了令牌级复用的根本局限性:即使经过细化(例如自我反思),单个轨迹也缺乏迁移所需的抽象。相比之下,受近期无监督强化学习工作的启发,我们发现使用自生成的测试时信号(多数投票)作为奖励的轻量级每实例训练能带来显著收益,通常超过全数据集离线训练,这促使从原始轨迹转向学习到的潜在表示。基于这一见解,我们提出一种在线方法,将遇到问题所花费的推理时计算蒸馏为紧凑的模块化潜在记忆,捕捉底层推理结构。这些记忆被存储并检索用于未来输入,通过模块化设计实现持续改进,同时避免灾难性遗忘。重要的是,我们的方法高效,参数化为极其轻量级的软提示记忆(约模型参数的0.001%),仅需少量梯度步训练,但性能与完全参数更新和离线训练相当。在具有挑战性的数学推理基准测试中,我们的方法显著优于零样本和原始数据ICL基线,并在数据集间有效迁移。

英文摘要

Large language models achieve strong reasoning performance by scaling inference-time compute, yet remain fundamentally stateless, discarding the rich, self-produced reasoning traces generated during this process. We investigate whether models can instead learn online from this experience, converting transient computation (reasoning traces) into persistent reusable knowledge, and without external supervision or access to future data. We show that In-Context Learning (ICL) over raw reasoning traces fails to generalize, reflecting a fundamental limitation of token-level reuse: individual traces lack the abstraction needed for transfer, even after refinement (e.g. self-reflection). In contrast, drawing inspiration from recent works on unsupervised reinforcement learning, we find that lightweight per-instance training with self-generated test-time signals (majority voting) as rewards yields substantial gains, often surpassing full-dataset offline training, motivating a shift from raw traces to learned latent representations. Building on this insight, we propose an online method that distills inference-time compute spent on encountered problems into compact modular latent memories capturing the underlying reasoning structure. These memories are stored and retrieved for future inputs, enabling continual improvement while avoiding catastrophic forgetting through modular design. Importantly, our method is highly efficient, parametrized as extremely lightweight soft prompt memories (~0.001% of model parameters) and trained with only a few gradient steps, yet achieving performance competitive with full parametric updates and offline training. Across challenging mathematical reasoning benchmarks, our approach significantly outperforms zero-shot and raw data ICL baselines, while transferring effectively across datasets.
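Two ingredients named in the abstract can be sketched with the standard library: the majority-vote pseudo-reward that replaces external supervision, and a store from which a memory is retrieved for a new input. The cosine-similarity retrieval, the `LatentMemoryBank` class, and the string placeholders standing in for learned soft-prompt tensors are hypothetical; the abstract does not specify how memories are keyed, and the per-instance RL update that trains each soft prompt is not shown.

```python
import math
from collections import Counter


def majority_vote_rewards(answers):
    """Self-generated test-time signal: the most common sampled answer is the
    pseudo-label; samples agreeing with it get reward 1, others 0."""
    pseudo_label, _ = Counter(answers).most_common(1)[0]
    return pseudo_label, [1.0 if a == pseudo_label else 0.0 for a in answers]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


class LatentMemoryBank:
    """Hypothetical store mapping a problem embedding to a learned soft prompt."""

    def __init__(self):
        self.keys, self.prompts = [], []

    def add(self, key, soft_prompt):
        self.keys.append(key)
        self.prompts.append(soft_prompt)

    def retrieve(self, query):
        """Return the soft prompt whose key is most cosine-similar to the query."""
        if not self.keys:
            return None
        best = max(range(len(self.keys)), key=lambda i: cosine(query, self.keys[i]))
        return self.prompts[best]


label, rewards = majority_vote_rewards(["42", "41", "42", "7", "42"])
print(label, rewards)  # 42 [1.0, 0.0, 1.0, 0.0, 1.0]

bank = LatentMemoryBank()
bank.add([1.0, 0.0], "prompt-algebra")   # placeholders for soft-prompt tensors
bank.add([0.0, 1.0], "prompt-geometry")
print(bank.retrieve([0.9, 0.2]))  # prompt-algebra
```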

2606.17872 2026-06-17 cs.LG cs.AI 新提交

AnchorKV: Safety-Aware KV Cache Compression via Soft Penalty with a Refusal Anchor

AnchorKV: 通过拒绝锚点的软惩罚实现安全感知的KV缓存压缩

Ning Ni, Yingjie Lao

发表机构 * Department of Computer Science, Tufts University(塔夫茨大学计算机科学系) Department of Electrical and Computer Engineering, Tufts University(塔夫茨大学电气与计算机工程系)

AI总结 提出AnchorKV,一种通过软惩罚机制调整令牌保留分数以远离有害提示的KV缓存压缩方法,在保持实用性的同时显著提升安全性。

AI中文摘要

大型语言模型(LLMs)在生成推理和长上下文任务上优于早期架构,但其庞大的规模在内存使用、能耗和设备端部署方面带来了重大挑战。由于缩放预训练语言模型能提升下游能力\cite{zhao2023survey},键值(KV)缓存成为主要的推理瓶颈。最近的KV缓存压缩方法\cite{jo2025fastkv,li2024snapkv,zhou2024dynamickv}通过仅保留注意力相关令牌的子集来降低这一成本。然而,虽然这些方法在良性工作负载上保持了准确性,但其压缩策略要么无法防御越狱攻击\cite{jiang2024robustkv},要么在激进驱逐下降低安全对齐。我们提出AnchorKV,一种对KV缓存压缩的即插即用修改,它使令牌保留分数偏向远离与有害提示相关的键空间方向。AnchorKV通过将均值差异表示工程方法\cite{arditi2024refusal,zou2023representation}适配到KV缓存中使用的层特定键投影空间,构建了一个离线安全锚点。基于该锚点,一种软惩罚令牌选择规则以少量效用换取显著改善的安全对齐,当惩罚为零时则退化为原始压缩器。

英文摘要

Large language models (LLMs) outperform earlier architectures on generative inference and long-context tasks, but their large size introduces significant challenges in memory usage, energy cost, and on-device deployment. Since scaling pre-trained language models improves downstream capability \cite{zhao2023survey}, the key-value (KV) cache becomes a dominant inference bottleneck. Recent KV cache compression methods \cite{jo2025fastkv,li2024snapkv,zhou2024dynamickv} reduce this cost by retaining only a subset of attention-relevant tokens. However, while these approaches preserve accuracy on benign workloads, their compression policies either fail to defend against jailbreak attacks \cite{jiang2024robustkv} or degrade safety alignment under aggressive eviction. We propose AnchorKV, a drop-in modification to KV cache compression that biases token retention scores away from directions in key space associated with harmful prompts. AnchorKV constructs an offline safety anchor by adapting a difference-of-means representation engineering approach \cite{arditi2024refusal,zou2023representation} to the layer-specific key projection space used in KV caching. Based on this anchor, a soft penalty token selection rule trades a small amount of utility for substantially improved safety alignment, while reducing to the original compressor when the penalty is zero.
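The anchor construction and the soft-penalty rule can be sketched as follows. The difference-of-means anchor follows the abstract; the specific penalty, a clipped projection of each cached key onto the unit anchor scaled by `lam`, is our assumption, as are the toy keys and base scores (which would come from the underlying compressor, e.g. attention-based scores). With `lam = 0` the selection is exactly the base compressor's top-k.

```python
import math


def mean_vec(vectors):
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def safety_anchor(harmful_keys, harmless_keys):
    """Unit-norm difference-of-means direction in (layer-specific) key space."""
    d = [h - s for h, s in zip(mean_vec(harmful_keys), mean_vec(harmless_keys))]
    norm = math.sqrt(sum(x * x for x in d))
    return [x / norm for x in d]


def select_tokens(keys, base_scores, anchor, budget, lam=0.0):
    """Return indices of the `budget` tokens kept after a soft safety penalty.

    Assumed penalty: lam * max(0, <k, anchor>) is subtracted from each token's
    base retention score, so tokens whose keys point along the harmful direction
    are less likely to be kept. lam = 0 reproduces the base compressor's top-k.
    """
    adjusted = [s - lam * max(0.0, sum(a * b for a, b in zip(k, anchor)))
                for k, s in zip(keys, base_scores)]
    ranked = sorted(range(len(keys)), key=lambda i: adjusted[i], reverse=True)
    return sorted(ranked[:budget])


anchor = safety_anchor([[1.0, 0.0], [0.9, 0.1]], [[0.0, 1.0], [0.1, 0.9]])
keys = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
scores = [0.9, 0.8, 0.1]  # e.g. attention-based scores from the base compressor
print(select_tokens(keys, scores, anchor, budget=2, lam=0.0))  # [0, 1]
print(select_tokens(keys, scores, anchor, budget=2, lam=2.0))  # [1, 2]
```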

2606.18096 2026-06-17 cs.LG cs.AI cs.DC 新提交

S4oP: Operator-level Pruning of Structured State Space Models for Resource-Constrained Devices

S4oP:面向资源受限设备的结构化状态空间模型的算子级剪枝

Marco Deano, Filippo Ziche, Nicola Bombieri

发表机构 * University of Verona(维罗纳大学)

AI总结 提出一种针对S4和S4D模型的增量算子级剪枝方法,通过结构化掩码与微调交替进行,在保持预测性能的同时显著降低推理成本,首次系统研究SSM的结构化算子剪枝。

AI中文摘要

结构化状态空间模型(SSMs),包括S4和S4D架构,最近已成为捕捉序列数据中长程依赖关系的基于注意力模型的有力替代方案。尽管其经验性能强劲,但由于计算和内存需求,在时间和资源受限的环境中部署这些模型仍然具有挑战性。在本文中,我们提出了一种新颖的增量式算子级剪枝方法,用于基于S4和S4D的模型,该方法在保持预测性能的同时显著降低推理成本。据我们所知,这是首个系统研究SSM结构化算子剪枝的工作。我们的方法通过将结构化掩码与微调交替进行,逐步剪枝模型算子,同时联合监控准确性和推理延迟。我们在一个统一的训练和评估框架中实现了这种方法,该框架能够系统地探索效率-准确性的权衡。在多个基准数据集上的实验表明,剪枝高达70%的模型算子在大多数情况下保持了原始模型的性能,同时显著降低了推理延迟。这些结果表明,结构化算子剪枝是一种有效且先前未被探索的提高SSM效率的策略,并有助于它们在资源受限的实际场景中的部署。

英文摘要

Structured State Space Models (SSMs), including the S4 and S4D architectures, have recently emerged as powerful alternatives to attention-based models for capturing long-range dependencies in sequential data. Despite their strong empirical performance, deploying these models in time- and resource-constrained settings remains challenging due to their computational and memory demands. In this paper, we propose a novel incremental, operator-level pruning approach for S4- and S4D-based models that significantly reduces inference cost while preserving predictive performance. To the best of our knowledge, this is the first work to systematically investigate structured operator pruning for SSMs. Our method progressively prunes model operators by interleaving structured masking with fine-tuning, while jointly monitoring accuracy and inference latency. We implement this approach within a unified training and evaluation framework that enables systematic exploration of efficiency-accuracy trade-offs. Experiments across multiple benchmark datasets show that pruning up to 70% of the model operators preserves the performance of the original models in most cases, while substantially reducing inference latency. These results demonstrate that structured operator pruning is an effective and previously unexplored strategy for improving the efficiency of SSMs and facilitating their deployment in practical, resource-constrained scenarios.
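The prune, fine-tune, and monitor loop can be sketched independently of any SSM library. The function names, the greedy lowest-importance-first order, the fixed step size, and the accuracy-drop stopping rule are all our assumptions; in practice `evaluate` and `finetune` would wrap an S4/S4D training setup, and operator importance would come from whatever criterion the method uses.

```python
def incremental_prune(operators, importance, evaluate, finetune,
                      step=0.1, max_drop=0.02, max_ratio=0.7):
    """Greedy incremental operator pruning (hypothetical interface).

    operators:  operator names in model order
    importance: dict name -> score; the lowest-scoring operators are masked first
    evaluate:   callable(active_ops) -> (accuracy, latency)
    finetune:   callable(active_ops) -> None, run after each masking step
    Stops before any step that would lose more than `max_drop` accuracy relative
    to the unpruned model, or that would prune more than `max_ratio` of operators.
    """
    n_total = len(operators)
    per_step = max(1, int(step * n_total))
    active = list(operators)
    base_acc, _ = evaluate(active)
    log = []
    while (n_total - len(active)) + per_step <= max_ratio * n_total:
        to_mask = set(sorted(active, key=lambda op: importance[op])[:per_step])
        candidate = [op for op in active if op not in to_mask]  # structured mask
        finetune(candidate)                  # interleave masking with fine-tuning
        acc, latency = evaluate(candidate)   # monitor accuracy and latency jointly
        if base_acc - acc > max_drop:
            break  # reject this step and keep the previous model
        active = candidate
        log.append((len(active), acc, latency))
    return active, log


# Toy stand-ins: accuracy degrades gently, then collapses after 5 operators are removed.
ops = [f"op{i}" for i in range(10)]


def toy_evaluate(active):
    pruned = len(ops) - len(active)
    acc = 0.90 - 0.002 * pruned if pruned <= 5 else 0.80
    return acc, float(len(active))  # latency proportional to active operators


kept, log = incremental_prune(ops, {op: i for i, op in enumerate(ops)},
                              toy_evaluate, finetune=lambda active: None)
print(kept)  # ['op5', 'op6', 'op7', 'op8', 'op9']
```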

2606.18114 2026-06-17 cs.LG cs.AI 新提交

Ternary Mamba: Grouped Quantization-Aware Training of W1.58A16 State Space Models

Ternary Mamba: 分组量化感知训练的 W1.58A16 状态空间模型

Ramprasath Ganesaraja, Sahil Dilip Panse, Swathika N

发表机构 * EdgeVerve Systems Limited(EdgeVerve系统有限公司)

AI总结 提出从预训练检查点进行分组量化感知训练(QAT)结合知识蒸馏,以极低数据量(约1亿token)将Mamba-2 1.3B压缩3.61倍,零样本准确率接近Bi-Mamba,并发现从预训练检查点开始的QAT所特有的零比率坍塌问题。

AI中文摘要

状态空间模型(SSM)如Mamba-2提供线性时间推理,但其内存占用限制了边缘部署。先前的三元SSM工作(Slender-Mamba)在150B token上从头训练;我们证明预训练检查点足以胜任,将边际token预算减少1000倍。使用分组量化感知训练(QAT)结合冻结FP16教师的知识蒸馏,我们将Mamba-2 1.3B压缩3.61倍(从2687 MB到744 MB),并在仅102M token(4 GPU小时,单H100)下达到48.1%的零样本准确率(7任务平均)——接近Bi-Mamba的48.4%(在+/-0.9pp置信区间内)。这种从预训练开始的QAT设置揭示了零比率坍塌,一种由可学习量化尺度引起的新不稳定性,在从头训练中不会出现。我们进一步证明,由于通过循环的误差累积,对Transformer有效的后处理校正策略对SSM失效。这些结果表明三元SSM不需要昂贵的从头训练:从预训练检查点进行QAT结合KD是一种数据高效的替代方案。

英文摘要

State Space Models (SSMs) such as Mamba-2 offer linear-time inference, but their memory footprint limits edge deployment. Prior ternary SSM work (Slender-Mamba) trains from scratch on 150B tokens; we show a pretrained checkpoint suffices, reducing the marginal token budget by 1,000x. Using grouped quantization-aware training (QAT) with knowledge distillation (KD) from a frozen FP16 teacher, we compress Mamba-2 1.3B by 3.61x (from 2,687 MB to 744 MB) and achieve 48.1% zero-shot accuracy (7-task average) with just 102M tokens (4 GPU-hours on a single H100), approaching Bi-Mamba's 48.4% (within the ±0.9 pp CI). This QAT-from-pretrained setting reveals zero-ratio collapse, a novel instability caused by learnable quantization scales that does not arise in from-scratch training. We further show that post-hoc correction strategies effective for Transformers fail for SSMs due to error accumulation through the recurrence. These results demonstrate that ternary SSMs do not require expensive from-scratch training: QAT from pretrained checkpoints with KD is a data-efficient alternative.
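A ternary weight carries log2(3) ≈ 1.58 bits, which is where "W1.58" comes from. The sketch below shows grouped ternary quantization and one plausible way a zero-ratio collapse can arise: if a per-group scale grows, more weights round to 0 until the group is entirely zero. The absmean scale (as in BitNet b1.58), the group size, and the `scale_mult` knob are our assumptions for illustration; the paper's learnable-scale QAT and distillation are not reproduced, and this is not the paper's analysis of the instability.

```python
import math


def ternary_quantize_grouped(weights, group_size, scale_mult=1.0, eps=1e-8):
    """Quantize to {-1, 0, +1} with one scale per group of `group_size` weights.

    Assumption: the scale is the group's mean |w| (absmean); the paper instead
    learns its scales. `scale_mult` mimics a scale drifting upward.
    Returns (codes, scales).
    """
    codes, scales = [], []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = scale_mult * sum(abs(w) for w in group) / len(group) + eps
        scales.append(scale)
        codes.extend(max(-1, min(1, round(w / scale))) for w in group)
    return codes, scales


def dequantize(codes, scales, group_size):
    return [c * scales[i // group_size] for i, c in enumerate(codes)]


def zero_ratio(codes):
    return sum(1 for c in codes if c == 0) / len(codes)


w = [0.3, -0.05, 0.8, -0.6, 0.02, 0.1, -0.9, 0.4]
codes, scales = ternary_quantize_grouped(w, group_size=4)
print(codes, zero_ratio(codes))  # [1, 0, 1, -1, 0, 0, -1, 1] 0.375

# If the scale grows, every |w| / scale falls below 0.5 and all codes round to 0.
inflated, _ = ternary_quantize_grouped(w, group_size=4, scale_mult=10.0)
print(zero_ratio(inflated))  # 1.0

print(math.log2(3))  # ~1.585 bits per ternary weight
print(2687 / 744)    # ~3.61x, the reported compression ratio
```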

2606.17249 2026-06-17 cs.AR cs.LG cs.NE eess.SP 交叉投稿

From Compression to Deployment: Real-Time and Energy-Efficient FastGRNN on Ultra-Constrained Microcontrollers

从压缩到部署:超受限微控制器上的实时节能FastGRNN

Emre Can Kizilates

发表机构 * Electronics Engineer Independent Researcher, Izmir, Turkey

AI总结 针对超受限微控制器,提出端到端开源FastGRNN压缩部署方案,结合低秩分解、稀疏化和量化,在8位和16位MCU上实现实时50Hz推理,模型仅566字节权重,F1达0.918,并贡献了跨平台确定性推理、循环预热延迟、无乘法器查找表和硬件能耗分析。

Comments 14 pages, 8 figures. Code: https://github.com/emre1998/fastgrnn-har

AI中文摘要

现代机器学习的主导轨迹一直是规模化:更大的模型、更大的加速器、更大的内存预算。然而,多年的全球半导体供应限制以及始终在线推理日益增长的能源和碳成本暴露了这一轨迹的脆弱性,并推动了相反的方向:重构AI和ML算法,使其适应已经在可穿戴设备、传感器和边缘设备中大规模生产的小型、无处不在的微控制器。我们提出了FastGRNN(一种紧凑的门控循环单元)的端到端开源复现,部署在两个裸机目标上:8位Arduino(ATmega328P)和16位MSP430(无硬件乘法器;16 KB闪存;512 B SRAM)。我们的压缩流水线结合了低秩权重分解、迭代硬阈值稀疏性和基于张量的Q15训练后量化,并带有显式激活校准。部署的模型占用566字节权重,在HAPT测试集上达到宏F1=0.918(种子0;五个种子的Q15平均值为0.853±0.107)。它在3399个测试窗口上与PyTorch参考实现达到100%预测一致(MCU种子0;五个种子上99.91-100% C等效)。两个平台都支持实时50Hz流式推理(Arduino上每个样本9.21 ms;MSP430上13 ms),其中256条目sigmoid/tanh查找表在无乘法器的MSP430上实现了30.5倍加速。四个贡献扩展了原始FastGRNN论文:(i)跨平台位等效确定性推理;(ii)循环预热延迟的表征(中位数74个样本,1.48秒;最坏情况125个样本,2.50秒,超过100个测试窗口);(iii)针对无乘法器嵌入式目标的可部署查找表方案;(iv)硬件能耗表征,显示17.7 mW主动推理功率,<0.09 mW空闲功率,以及使用LUT实现96.7%的能耗降低。

英文摘要

The dominant trajectory of modern machine learning has been to scale up: larger models, larger accelerators, larger memory budgets. Yet a multi-year global semiconductor supply constraint and the growing energy and carbon cost of always-online inference expose the fragility of this trajectory and motivate the opposite direction: refactoring AI and ML algorithms to fit the small, ubiquitous microcontrollers already in mass production in wearables, sensors, and edge appliances. We present an end-to-end open-source reproduction of FastGRNN, a compact gated recurrent cell, deployed on two bare-metal targets: the 8-bit Arduino (ATmega328P) and the 16-bit MSP430 (no hardware multiplier; 16 KB Flash; 512 B SRAM). Our compression pipeline combines low-rank weight factorization, iterative hard-thresholding sparsity, and per-tensor Q15 post-training quantization with explicit activation calibration. The deployed model occupies 566 bytes of weights and achieves macro F1 = 0.918 (seed 0; five-seed Q15 mean 0.853+-0.107) on the HAPT test set. It matches a PyTorch reference at 100% prediction agreement across 3,399 test windows (MCU seed 0; 99.91-100% C-equivalent across five seeds). Both platforms sustain real-time 50 Hz streaming inference (9.21 ms per sample on Arduino; 13 ms on MSP430), where a 256-entry sigmoid/tanh look-up table delivers a 30.5x speedup on the multiplier-less MSP430. Four contributions extend the original FastGRNN paper: (i) cross-platform bit-equivalent deterministic inference; (ii) characterization of recurrent warm-up latency (median 74 samples, 1.48 s; worst-case 125 samples, 2.50 s over 100 test windows); (iii) a deployable look-up-table recipe for multiplier-less embedded targets; and (iv) hardware energy characterization showing 17.7 mW active inference power, <0.09 mW idle power, and 96.7% energy reduction with the LUT.
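The cell and the multiplier-free activation trick can be sketched as follows. The FastGRNN update follows the original formulation (gate and candidate share W x + U h; h' = (ζ(1 − z) + ν) ⊙ h̃ + z ⊙ h). The table range [−8, 8), midpoint sampling, and reuse of one sigmoid table for tanh are our assumptions, since the abstract only states a 256-entry sigmoid/tanh table. The sketch uses Python floats for clarity, whereas the deployed model runs in Q15 fixed point, and it omits the low-rank and sparse structure of W and U.

```python
import math

LUT_SIZE = 256
X_MIN, X_MAX = -8.0, 8.0            # assumed table range
STEP = (X_MAX - X_MIN) / LUT_SIZE   # 1/16 per entry

# Each entry holds sigmoid evaluated at the midpoint of its bin.
SIGMOID_LUT = [1.0 / (1.0 + math.exp(-(X_MIN + (i + 0.5) * STEP)))
               for i in range(LUT_SIZE)]


def sigmoid_lut(x):
    i = int((x - X_MIN) / STEP)
    return SIGMOID_LUT[max(0, min(LUT_SIZE - 1, i))]  # saturate outside the range


def tanh_lut(x):
    return 2.0 * sigmoid_lut(2.0 * x) - 1.0  # tanh(x) = 2*sigmoid(2x) - 1


def to_q15(x):
    """Float in [-1, 1) -> signed 16-bit Q15 integer, with saturation."""
    return max(-32768, min(32767, round(x * 32768)))


def matvec(M, v):
    return [sum(m * a for m, a in zip(row, v)) for row in M]


def fastgrnn_step(x, h, W, U, b_z, b_h, zeta, nu):
    """FastGRNN update: the gate and the candidate share W x + U h."""
    pre = [a + b for a, b in zip(matvec(W, x), matvec(U, h))]
    z = [sigmoid_lut(p + bz) for p, bz in zip(pre, b_z)]
    h_tilde = [tanh_lut(p + bh) for p, bh in zip(pre, b_h)]
    return [(zeta * (1.0 - zi) + nu) * ht + zi * hi
            for zi, ht, hi in zip(z, h_tilde, h)]


max_err = max(abs(sigmoid_lut(x / 100) - 1 / (1 + math.exp(-x / 100)))
              for x in range(-1000, 1001))
print(max_err < 0.01)            # True: bin half-width 1/32 times max slope 1/4
print(to_q15(0.5), to_q15(1.0))  # 16384 32767
h = fastgrnn_step([0.2, -0.1], [0.0, 0.0], W=[[0.5, 0.1], [0.3, -0.2]],
                  U=[[0.1, 0.0], [0.0, 0.1]], b_z=[0.0, 0.0], b_h=[0.0, 0.0],
                  zeta=1.0, nu=0.0)
print(h)
```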

2606.17566 2026-06-17 cs.DC cs.LG 交叉投稿

AoiZora: Topology-Aware Auto-Parallel Optimization for Inference of Diffusion Transformers

AoiZora: 面向扩散变换器推理的拓扑感知自动并行优化

Kaijian Wang, Yuanyuan Xu, Fanjiang Ye, Ye Cao, Jingwei Zuo, T. S. Eugene Ng, Yarong Mu, Yuke Wang

发表机构 * Rice University(莱斯大学) Independent Researcher(独立研究者) Google(谷歌)

AI总结 针对扩散变换器推理中的低延迟需求,提出AoiZora编译器,通过拓扑感知的物理布局优化自动并行策略,在TPU子片上实现高达1.42倍的加速。

AI中文摘要

视频扩散已迅速成为关键的生成服务负载,但生成每个片段需要对大型时空潜在变量进行多次去噪迭代,这使得在单个设备上难以实现低延迟推理。因此,去噪步骤通常分布在多个加速器上,而TPU子片已成为一种有吸引力且实用的计算结构。然而,当前的自动并行系统几乎完全在逻辑设备网格上进行搜索,忽略了所选分片在物理TPU互连上的实际布局——这种疏忽导致了大量与拓扑相关的性能损失。我们通过AoiZora填补了这一空白,这是一个专为TPU子片上低延迟视频扩散推理设计的编译器中介拓扑规划器。其指导原则是通过利用编译流程中的不同点,重新连接逻辑分片与物理布局:AoiZora首先从廉价的预编译IR中消除弱分片候选,然后仅编译存活的候选,并使用编译后的HLO结合拓扑感知通信模型对其物理布局进行排序。最终方案沿普通编译器路径实现,保持模型代码、编译器降级、集合内核和网络路由完全不变。在TPU v5e子片上,与现有解决方案相比,AoiZora将Wan 2.1单步去噪延迟降低了多达1.42倍。

英文摘要

Video diffusion has quickly grown into a key generative serving workload, yet producing each clip demands many denoising iterations over large spatio-temporal latents, which puts low-latency inference out of reach on a single device. A denoising step is therefore typically distributed across multiple accelerators, and TPU sub-slices have become an attractive and practical fabric for doing so. Current auto-parallel systems, however, search almost exclusively over logical device meshes and disregard how a chosen sharding is actually laid out on the physical TPU interconnect -- an oversight that leaves large, topology-dependent performance on the table. We address this gap with AoiZora, a compiler-mediated topology planner built for low-latency video diffusion inference on TPU sub-slices. Its guiding principle is to reconnect logical sharding with physical placement by drawing on different points in the compilation flow: AoiZora first eliminates weak sharding candidates from inexpensive pre-compilation IRs, then compiles only the ones that survive and orders their physical placements using compiled HLO together with a topology-aware communication model. The winning plan is realized along the ordinary compiler path, leaving model code, compiler lowering, collective kernels, and network routing entirely intact. On TPU v5e sub-slices, AoiZora reduces Wan 2.1 one-step denoising latency by as much as 1.42x relative to existing solutions.

2404.01965 2026-06-17 cs.LG cs.AI 版本更新

Towards Leveraging AutoML for Sustainable Deep Learning: A Multi-Objective HPO Approach on Deep Shift Neural Networks

迈向利用AutoML实现可持续深度学习:深度移位神经网络上的多目标HPO方法

Leona Hennig, Tanja Tornede, Marius Lindauer

AI总结 针对深度学习计算成本高的问题,提出结合多保真度HPO与多目标优化,在深度移位神经网络上同时最大化精度和最小化能耗,实验获得超80%精度且低计算开销。

AI中文摘要

深度学习通过从大型数据集中提取复杂模式,推动了各个领域的发展。然而,深度学习模型的计算需求带来了环境和资源方面的挑战。深度移位神经网络(DSNNs)通过利用移位操作来降低推理时的计算复杂度,提供了一种解决方案。遵循标准DNNs的见解,我们感兴趣的是通过AutoML技术充分利用DSNNs的潜力。我们研究了超参数优化(HPO)的影响,以最大化DSNN性能,同时最小化资源消耗。由于这结合了多目标(MO)优化,其中精度和能耗作为潜在互补目标,我们提出将最先进的多保真度(MF)HPO与多目标优化相结合。实验结果表明了我们方法的有效性,得到了精度超过80%且计算成本低的模型。总体而言,我们的方法加速了高效模型开发,同时实现了可持续的AI应用。

英文摘要

Deep Learning (DL) has advanced various fields by extracting complex patterns from large datasets. However, the computational demands of DL models pose environmental and resource challenges. Deep shift neural networks (DSNNs) offer a solution by leveraging shift operations to reduce computational complexity at inference. Following the insights from standard DNNs, we are interested in leveraging the full potential of DSNNs by means of AutoML techniques. We study the impact of hyperparameter optimization (HPO) to maximize DSNN performance while minimizing resource consumption. Since this combines multi-objective (MO) optimization with accuracy and energy consumption as potentially complementary objectives, we propose to combine state-of-the-art multi-fidelity (MF) HPO with multi-objective optimization. Experimental results demonstrate the effectiveness of our approach, resulting in models with over 80% accuracy and low computational cost. Overall, our method accelerates efficient model development while enabling sustainable AI applications.
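Multi-objective HPO returns a set of trade-offs rather than a single best configuration. A minimal sketch of the final step, extracting the non-dominated configurations when maximizing accuracy and minimizing energy, is below; the configuration names and numbers are invented for illustration, and the multi-fidelity search that produces the candidates is not shown.

```python
def pareto_front(configs):
    """Non-dominated (accuracy up, energy down) configurations, sorted by energy.

    configs: iterable of (name, accuracy, energy). Configuration b dominates a if
    b is at least as accurate and uses at most as much energy, and is strictly
    better in at least one of the two.
    """
    configs = list(configs)
    front = []
    for name, acc, energy in configs:
        dominated = any(
            a2 >= acc and e2 <= energy and (a2 > acc or e2 < energy)
            for _, a2, e2 in configs
        )
        if not dominated:
            front.append((name, acc, energy))
    return sorted(front, key=lambda c: c[2])


# Hypothetical HPO results: (config, validation accuracy, energy in arbitrary units)
results = [("A", 0.82, 5.0), ("B", 0.80, 3.0), ("C", 0.79, 4.0),
           ("D", 0.85, 9.0), ("E", 0.81, 5.5)]
print([name for name, _, _ in pareto_front(results)])  # ['B', 'A', 'D']
```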

2505.00986 2026-06-17 cs.LG cs.CV 版本更新

EmbodiTTA: Resource-Efficient Test-Time Adaptation for Embodied Visual Systems

EmbodiTTA:面向具身视觉系统的资源高效测试时自适应

Xiao Ma, Young D. Kwon, Dong Ma

AI总结 提出按需测试时自适应范式OD-TTA,通过轻量域移检测、源域选择和分离批归一化更新,在边缘设备上实现高效准确的自适应,显著降低计算和能耗开销。

AI中文摘要

连续测试时自适应(CTTA)持续对每个到达的数据批次调整部署模型。虽然达到了最优精度,但现有的CTTA方法由于巨大的内存开销和能耗,在资源受限的边缘设备上实际应用性差。本文首先引入一种新范式——按需TTA,仅在检测到显著域移时触发自适应。然后,我们提出OD-TTA,一种用于边缘设备上准确高效自适应的按需TTA框架。OD-TTA包含三项创新技术:1)轻量级域移检测机制,仅在需要时激活TTA,大幅降低总体计算开销;2)源域选择模块,选择合适的源模型进行自适应,确保高且鲁棒的精度;3)解耦的批归一化(BN)更新方案,实现小批量下的内存高效自适应。大量实验表明,OD-TTA在显著降低能量和计算开销的同时,实现了可比甚至更好的性能,使TTA成为实际可行的技术。

英文摘要

Continual Test-time adaptation (CTTA) continuously adapts the deployed model on every incoming batch of data. While achieving optimal accuracy, existing CTTA approaches present poor real-world applicability on resource-constrained edge devices, due to the substantial memory overhead and energy consumption. In this work, we first introduce a novel paradigm -- on-demand TTA -- which triggers adaptation only when a significant domain shift is detected. Then, we present OD-TTA, an on-demand TTA framework for accurate and efficient adaptation on edge devices. OD-TTA comprises three innovative techniques: 1) a lightweight domain shift detection mechanism to activate TTA only when it is needed, drastically reducing the overall computation overhead, 2) a source domain selection module that chooses an appropriate source model for adaptation, ensuring high and robust accuracy, 3) a decoupled Batch Normalization (BN) update scheme to enable memory-efficient adaptation with small batch sizes. Extensive experiments show that OD-TTA achieves comparable and even better performance while reducing the energy and computation overhead remarkably, making TTA a practical reality.
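The on-demand idea, adapting only when a shift is detected, can be sketched with a cheap statistic computed from the model's own predictions. The entropy-based detector below, its EMA reference, and the 1.5x threshold are our assumptions, not OD-TTA's actual detection mechanism; source-model selection and the decoupled BN update that would run when the trigger fires are not shown.

```python
import math


def entropy(probs):
    return -sum(p * math.log(p) for p in probs if p > 0)


class ShiftTrigger:
    """Fire adaptation only when a batch's mean prediction entropy exceeds
    `ratio` times a running (EMA) reference built from in-domain batches."""

    def __init__(self, momentum=0.9, ratio=1.5):
        self.momentum = momentum
        self.ratio = ratio
        self.ema = None

    def update(self, batch_probs):
        h = sum(entropy(p) for p in batch_probs) / len(batch_probs)
        if self.ema is None:  # first batch initializes the reference
            self.ema = h
            return False
        fire = h > self.ratio * self.ema
        if not fire:  # only batches judged in-domain refresh the reference
            self.ema = self.momentum * self.ema + (1 - self.momentum) * h
        return fire


trigger = ShiftTrigger()
confident = [[0.95, 0.05]] * 8
uncertain = [[0.5, 0.5]] * 8
decisions = [trigger.update(b) for b in (confident, confident, confident, uncertain)]
print(decisions)  # [False, False, False, True]
```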

2505.23939 2026-06-17 cs.LG cs.NI 版本更新

Searching Neural Architectures for Sensor Nodes on IoT Gateways

搜索物联网网关上传感器节点的神经架构

Andrea Mattia Garavagno, Edoardo Ragusa, Antonio Frisoli, Paolo Gastaldo

发表机构 * University of Genoa(热那亚大学)

AI总结 提出一种在物联网网关上自动设计神经网络的方法,保护数据隐私,在Raspberry Pi Zero 2上10小时内搜索出达到SOTA的架构。

Journal ref
IEEE Internet of Things Journal, vol. 12, no. 21, pp. 44492-44501, 2025
AI中文摘要

本文提出一种在边缘自动设计神经网络的方法,即使在隐私敏感的物联网应用中也能实现机器学习。该方法在物联网网关上运行,为连接的传感器节点设计神经网络,而无需将收集的数据共享到本地网络之外,将数据保留在采集现场。这种方法有潜力为医疗物联网和工业物联网启用机器学习,在边缘设计硬件友好且定制的神经网络,用于个性化医疗和高级工业服务,如质量控制、预测性维护或故障诊断。通过防止数据泄露到云服务,该方法保护了敏感信息,包括工业机密和个人数据。全面的实验结果表明,在Visual Wake Words数据集上,所提出的方法通过在Raspberry Pi Zero 2上运行不到10小时的搜索过程,可以达到最先进的结果。

英文摘要

This paper presents an automatic method for the design of Neural Networks (NNs) at the edge, enabling Machine Learning (ML) access even in privacy-sensitive Internet of Things (IoT) applications. The proposed method runs on IoT gateways and designs NNs for connected sensor nodes without sharing the collected data outside the local network, keeping the data in the site of collection. This approach has the potential to enable ML for Healthcare Internet of Things (HIoT) and Industrial Internet of Things (IIoT), designing hardware-friendly and custom NNs at the edge for personalized healthcare and advanced industrial services such as quality control, predictive maintenance, or fault diagnosis. By preventing data from being disclosed to cloud services, this method safeguards sensitive information, including industrial secrets and personal data. The outcomes of a thorough experimental session confirm that -- on the Visual Wake Words dataset -- the proposed approach can achieve state-of-the-art results by exploiting a search procedure that runs in less than 10 hours on the Raspberry Pi Zero 2.

2602.06154 2026-06-17 cs.LG cs.CL 版本更新

MoSE: Mixture of Slimmable Experts for Efficient and Adaptive Language Models

MoSE: 混合可瘦身专家实现高效自适应语言模型

Nurbek Tastan, Stefanos Laskaridis, Karthik Nandakumar, Samuel Horvath

AI总结 提出MoSE架构,每个专家具有可变宽度的嵌套结构,支持在推理时连续调节精度-计算权衡,通过多宽度训练和轻量级测试时训练实现高效自适应。

Comments Accepted to ICML 2026

AI中文摘要

混合专家(MoE)模型通过稀疏激活专家高效扩展大型语言模型,但一旦选定专家,其执行是完整的。因此,MoE模型中精度与计算之间的权衡通常表现出较大的不连续性。我们提出混合可瘦身专家(MoSE),这是一种MoE架构,其中每个专家具有嵌套的、可瘦身的结构,可以以可变宽度执行。这不仅实现了对激活哪些专家的条件计算,还实现了对每个专家利用多少的条件计算。因此,单个预训练的MoSE模型可以在推理时支持更连续的精度-计算权衡谱。我们提出了一种简单且稳定的训练方法,用于在稀疏路由下训练可瘦身专家,将多宽度训练与标准MoE目标相结合。在推理过程中,我们探索了运行时宽度确定的策略,包括一种轻量级的测试时训练机制,该机制学习如何在固定预算下将路由器置信度/概率映射到专家宽度。在GPT风格模型、各种路由机制、零样本下游推理基准以及DeepSeek模型的持续预训练适应上的实验表明,MoSE在全宽度下匹配或优于标准MoE,并持续将计算-质量边界向更低的推理FLOPs移动。代码可在以下网址找到:https://github.com/tnurbek/mose。

英文摘要

Mixture-of-Experts (MoE) models scale large language models efficiently by sparsely activating experts, but once an expert is selected, it is executed fully. Hence, the trade-off between accuracy and computation in an MoE model typically exhibits large discontinuities. We propose Mixture of Slimmable Experts (MoSE), an MoE architecture in which each expert has a nested, slimmable structure that can be executed at variable widths. This enables conditional computation not only over which experts are activated but also over how much of each expert is utilized. Consequently, a single pretrained MoSE model can support a more continuous spectrum of accuracy-compute trade-offs at inference time. We present a simple and stable training recipe for slimmable experts under sparse routing, combining multi-width training with standard MoE objectives. During inference, we explore strategies for runtime width determination, including a lightweight test-time training mechanism that learns how to map router confidence/probabilities to expert widths under a fixed budget. Experiments on GPT-style models, various routing regimes, zero-shot downstream reasoning benchmarks, and continual pre-training adaptation of a DeepSeek model show that MoSE matches or improves on standard MoE at full width and consistently shifts the compute-quality frontier toward lower inference FLOPs. The code can be found at: https://github.com/tnurbek/mose.
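The nesting that makes an expert slimmable can be sketched with a two-layer FFN: running at width w uses only the first round(w · d_hidden) hidden units, so a narrower expert is a prefix of a wider one and its cost scales roughly with w. The prefix convention, the ReLU FFN, and the fixed threshold mapping from router probability to width are our assumptions; the paper learns that mapping with test-time training under a budget.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def slim_expert(x, W1, W2, width):
    """Two-layer FFN expert executed at a fraction `width` of its hidden units.

    W1: hidden x d_in rows, W2: d_out x hidden rows. Only the first k hidden
    units are used, so a narrower expert is a prefix of a wider one and its
    cost scales roughly with k.
    """
    k = max(1, round(width * len(W1)))
    hidden = relu([sum(w * a for w, a in zip(row, x)) for row in W1[:k]])
    return [sum(row[j] * hidden[j] for j in range(k)) for row in W2]


def width_from_router_prob(p, table=((0.6, 1.0), (0.3, 0.5)), floor=0.25):
    """Hypothetical fixed mapping from router probability to expert width."""
    for threshold, width in table:
        if p >= threshold:
            return width
    return floor


x = [1.0, 2.0]
W1 = [[1, 0], [0, 1], [1, 1], [-1, 0]]  # 4 hidden units
W2 = [[1, 1, 1, 1]]                      # 1 output
for p in (0.7, 0.4, 0.1):
    w = width_from_router_prob(p)
    print(p, w, slim_expert(x, W1, W2, w))
# 0.7 1.0 [6.0]
# 0.4 0.5 [3.0]
# 0.1 0.25 [1.0]
```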

2603.08001 2026-06-17 cs.LG stat.ML Replacement

Amortizing Maximum Inner Product Search with Learned Support Functions

Theo X. Olausson, João Monteiro, Michal Klein, Marco Cuturi

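AI summary: Proposes a regression-based amortized MIPS method that trains neural networks to predict the optimal key directly, exploiting the convexity of the support function to speed up search; it substantially improves IVF match rates on the BEIR benchmark.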

English abstract

Maximum inner product search (MIPS) is a crucial subroutine in machine learning, requiring the identification of a vector taken within a database (the keys) that best aligns with a given query. We propose amortized MIPS: a regression-based approach that trains neural networks to directly predict MIPS solutions, amortizing the cost of repeatedly solving MIPS for queries drawn from a known distribution over a fixed key database. Our key insight is that the MIPS value function is the support function of the set of keys, a well-studied convex function whose gradient yields the optimal key. This motivates two complementary amortized models: SupportNet, an input-convex neural network trained to regress the support function, and KeyNet, a vector-valued network that directly regresses the optimal key. SupportNet can serve as a cluster router, steering queries toward relevant database partitions, while KeyNet can be used as a drop-in replacement for the original query, fed directly to off-the-shelf indexing pipelines. Our experiments on the BEIR benchmark show that, for document embeddings, learned SupportNets and KeyNets significantly improve IVF match rates when accounting for compute effort, whether measured in FLOPs, number of probes, or wall-clock time. Our code is available at: https://github.com/apple/ml-amips.
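The key insight can be checked numerically: the MIPS value function h(q) = max_k <q, k> is the support function of the key set, and its gradient is the optimal key. The brute-force NumPy sketch below is only an illustration and is not SupportNet or KeyNet. Wherever the maximizer is unique, h is locally linear, so a central finite-difference gradient of h recovers exactly the key that brute-force MIPS returns.

```python
import numpy as np

def support_function(q, keys):
    """h_K(q) = max_k <q, k>: the MIPS value, i.e. the support function of the key set."""
    return np.max(keys @ q)

def mips_argmax(q, keys):
    """Brute-force MIPS: the key achieving the maximum inner product."""
    return keys[np.argmax(keys @ q)]

def numerical_gradient(f, q, eps=1e-6):
    """Central finite-difference gradient of a scalar function f at q."""
    grad = np.zeros_like(q)
    for i in range(q.size):
        e = np.zeros_like(q)
        e[i] = eps
        grad[i] = (f(q + e) - f(q - e)) / (2 * eps)
    return grad

rng = np.random.default_rng(0)
keys = rng.normal(size=(100, 5))   # database of 100 keys in R^5
q = rng.normal(size=5)

g = numerical_gradient(lambda v: support_function(v, keys), q)
best = mips_argmax(q, keys)        # g should match this key
```

Under this view, a regressor for h (like SupportNet) yields the optimal key through its gradient, while a regressor for the argmax (like KeyNet) predicts the key directly.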

2603.18492 2026-06-17 cs.LG Replacement

AIMER: Calibration-Free Task-Agnostic MoE Expert Pruning

Zongfang Liu, Guangyi Chen, Shengkun Tang, Yifan Shen, Huan Wang, Xin Yuan

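AI summary: Proposes AIMER, which identifies distinctive experts from the concentration pattern of their weights, enabling calibration-free, task-agnostic MoE expert pruning; it outperforms existing methods on models from 7B to 47B parameters.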

English abstract

Mixture-of-Experts (MoE) language models increase parameter capacity without proportional per-token computation, yet deployment still requires storing the full expert pool, making expert pruning important for reducing memory and serving overhead. Existing task-agnostic expert-pruning methods are typically calibration-dependent: they estimate expert importance from routing or activation statistics on a calibration set, making pruning decisions sensitive to calibration-data variation while introducing substantial preprocessing cost. We propose AIMER (Absolute mean over root mean square IMportance for Expert Ranking), a simple calibration-free criterion that identifies more distinct experts by capturing the concentration pattern of expert weights, making it well suited for task-agnostic expert pruning. Across 7B to 47B MoE language models with distinct architectures and 16 diverse benchmarks, AIMER consistently delivers stronger capability balance across diverse tasks than existing calibration-free methods. Surprisingly, AIMER also achieves better balance than strong calibration-based expert-pruning baselines calibrated on the widely used task-agnostic C4 corpus, while requiring only 0.22–2.06 seconds to score all experts.
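Read literally, the acronym suggests a per-expert score equal to the mean absolute weight divided by the root-mean-square weight. That score needs only the weights, which is why scoring is data-free and fast. The NumPy sketch below follows this reading, but several details are assumptions: flattening all of an expert's matrices into one vector, and the `higher_first` switch that chooses which end of the ranking is kept. The paper specifies how the score is aggregated and which experts survive pruning.

```python
import numpy as np

def aimer_score(expert_weights):
    """Absolute mean over RMS of one expert's weights (assumed: all matrices flattened).

    By Cauchy-Schwarz the ratio lies in (0, 1]: it equals 1 when all weights share
    the same magnitude and shrinks as magnitude concentrates in a few entries.
    """
    w = np.concatenate([np.ravel(m) for m in expert_weights])
    return np.mean(np.abs(w)) / np.sqrt(np.mean(w ** 2))

def rank_experts(experts, keep, higher_first):
    """Return the indices of `keep` experts ordered by score, plus all scores.

    `higher_first` is a hypothetical switch: which end of the ranking marks the
    experts to retain is defined by the paper, not by this sketch.
    """
    scores = np.array([aimer_score(e) for e in experts])
    order = np.argsort(-scores if higher_first else scores)
    return order[:keep], scores

rng = np.random.default_rng(0)
# 8 hypothetical experts, each an up-projection and a down-projection matrix
experts = [[rng.normal(size=(16, 64)), rng.normal(size=(64, 16))] for _ in range(8)]
kept, scores = rank_experts(experts, keep=6, higher_first=True)
```

For Gaussian weights the ratio is close to sqrt(2/pi), about 0.80. Heavier-tailed, more concentrated weights score lower.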

2605.00330 2026-06-17 cs.LG Replacement

Conformalized Quantum DeepONet Ensembles for Scalable Operator Learning with Distribution-Free Uncertainty

Purav Matlia, Christian Moya, Guang Lin

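AI summary: Proposes a framework that combines quantum orthogonal neural networks with adaptive conformal prediction. It targets two problems in operator learning for high-dimensional dynamical systems: quadratic inference complexity and uncertainty quantification. Compressing multiple models into a single circuit enables efficient parallel execution.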

English abstract

Operator learning enables fast surrogate modeling of high-dimensional dynamical systems, but existing approaches face two fundamental limitations: quadratic inference complexity and unreliable uncertainty quantification in safety-critical settings. We propose Conformalized Quantum DeepONet Ensembles, a framework that addresses both challenges simultaneously. By leveraging Quantum Orthogonal Neural Networks (QOrthoNNs), we reduce operator inference complexity from O(n^2) to O(n), enabling scalable evaluation over fine discretizations. To provide rigorous uncertainty quantification, we combine ensemble-based epistemic modeling with adaptive conformal prediction, yielding distribution-free coverage guarantees. A key challenge in ensembling is that naive parallelism scales hardware resources linearly with the number of models. We resolve this by using Superposed Parameterized Quantum Circuits (SPQCs), which compress multiple ensemble members into a single circuit and enable simultaneous multi-model execution. Experiments on synthetic partial differential equations and real-world power system dynamics demonstrate that our approach achieves accurate predictions while maintaining calibrated uncertainty under realistic quantum noise. These results establish a practical pathway toward scalable, uncertainty-aware operator learning in quantum machine learning.
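The uncertainty-quantification half of the framework, ensemble spread plus conformal calibration, does not depend on the quantum circuit. The classical NumPy sketch below illustrates it. It swaps in standard split conformal prediction with ensemble-normalized scores in place of the paper's adaptive variant. The ensemble predictions are synthetic placeholders for DeepONet outputs, and all function names are hypothetical.

```python
import numpy as np

def conformal_quantile(scores, alpha):
    """The ceil((n + 1)(1 - alpha))-th smallest calibration score.

    If that rank exceeds n the exact answer is an infinite interval; this sketch
    clamps to the largest score for simplicity.
    """
    n = len(scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.sort(scores)[min(k, n) - 1]

def ensemble_conformal_interval(cal_preds, cal_y, test_preds, alpha=0.1):
    """Split conformal intervals scaled by ensemble spread.

    cal_preds/test_preds: (n_members, n_points) ensemble outputs.
    Nonconformity score |y - mean| / std, so intervals widen where members disagree.
    """
    mu_c, sd_c = cal_preds.mean(0), cal_preds.std(0) + 1e-8
    q = conformal_quantile(np.abs(cal_y - mu_c) / sd_c, alpha)
    mu_t, sd_t = test_preds.mean(0), test_preds.std(0) + 1e-8
    return mu_t - q * sd_t, mu_t + q * sd_t

rng = np.random.default_rng(0)

def synth(n):
    """Synthetic stand-in: heteroscedastic targets and a 5-member mock ensemble."""
    x = rng.uniform(0, 1, n)
    sigma = 0.1 + 0.4 * x
    y = np.sin(2 * np.pi * x) + sigma * rng.normal(size=n)
    preds = np.sin(2 * np.pi * x) + sigma * rng.normal(size=(5, n))
    return preds, y

cal_preds, cal_y = synth(2000)
test_preds, test_y = synth(2000)
lo, hi = ensemble_conformal_interval(cal_preds, cal_y, test_preds, alpha=0.1)
coverage = np.mean((test_y >= lo) & (test_y <= hi))   # close to 0.9 for exchangeable data
```

The coverage guarantee requires only exchangeability of calibration and test points, not a correct noise model. That property is what "distribution-free" refers to.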

2602.05790 2026-06-17 cs.IT cs.LG math.IT stat.ML Replacement

Price of metric universality in vector quantization is at most 0.11 bit

Alina Harbuzova, Or Ordentlich, Yury Polyanskiy

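AI summary: Proves that, when W is Gaussian, there exists a universal codebook that performs, for every possible statistics of X, at least as well as an X-adapted water-filling codebook whose rate is reduced by 0.11 bit per dimension.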

Comments 41 pages, 1 figure

English abstract

Fast computation of a matrix product $W^\top X$ is a workhorse of modern LLMs. To make their deployment more efficient, a popular approach is to use a low-precision approximation $\widehat W$ in place of the true $W$ ("weight-only quantization"). Information theory demonstrates that an optimal algorithm for reducing the precision of $W$ depends on the (second-order) statistics of $X$ and requires a careful alignment of the vector quantization codebook with the PCA directions of $X$ (a process known as "waterfilling allocation"). Dependence of the codebook on the statistics of $X$, however, is highly impractical. This paper proves that there exists a universal codebook that is simultaneously near-optimal for all possible statistics of $X$, in the sense of being at least as good as an $X$-adapted waterfilling codebook with rate reduced by 0.11 bit per dimension in the case when $W$ is Gaussian. Such a universal codebook would be an ideal candidate for a low-precision storage format, a topic of active modern research, but alas the existence proof is non-constructive. Equivalently, our result shows the existence of a net in $\mathbb{R}^n$ that is a nearly-optimal covering of a sphere simultaneously with respect to all Hilbert norms.
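The X-adapted baseline can be made concrete under one assumption: W has i.i.d. standard Gaussian entries and the error is measured as E||(W - Ŵ)^T X||^2. Rotating into the PCA basis of X then turns the problem into quantizing independent Gaussian components whose variances are the eigenvalues lambda_i of E[X X^T]. Classical reverse water-filling solves that problem. The Python sketch below implements this textbook allocation as an illustration of the baseline; it is not the paper's universal construction. The rate is in bits per dimension.

```python
import numpy as np

def reverse_waterfilling(eigvals, rate_bits_per_dim, iters=200):
    """Gaussian reverse water-filling.

    Component i (variance lam_i) gets distortion D_i = min(theta, lam_i) and rate
    R_i = 0.5 * log2(lam_i / D_i); theta is found by bisection so the average rate
    equals `rate_bits_per_dim`. Returns (theta, total distortion, per-component rates).
    """
    lam = np.asarray(eigvals, dtype=float)

    def avg_rate(theta):
        return np.mean(0.5 * np.log2(np.maximum(lam / theta, 1.0)))

    lo, hi = 1e-300, lam.max()        # avg_rate decreases as theta grows
    for _ in range(iters):
        mid = np.sqrt(lo * hi)        # geometric bisection: theta spans many orders of magnitude
        if avg_rate(mid) > rate_bits_per_dim:
            lo = mid                  # too much rate spent: raise the water level
        else:
            hi = mid
    theta = np.sqrt(lo * hi)
    d = np.minimum(theta, lam)
    return theta, d.sum(), 0.5 * np.log2(lam / d)

# Anisotropic activations: the weakest direction gets zero bits at 1 bit/dim on average
theta, total_d, rates = reverse_waterfilling([4.0, 1.0, 0.01], rate_bits_per_dim=1.0)
```

The paper's guarantee compares a single universal codebook at rate R against this allocation run at rate R - 0.11, for every eigenvalue profile at once, in the Gaussian-W case.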

7. Federated Learning, Privacy, and Security (7 papers)

2606.18003 2026-06-17 cs.LG cs.AI New submission

C2FL: Clustered Continual Federated Learning under Spatial and Temporal Drift

Davide Domini, Gianluca Aguzzi, Lorenzo Pellegrini, Mirko Viroli, Lukas Esterle

Affiliations: University of Bologna; Aarhus University

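AI summary: Addresses privacy-preserving collective adaptation under spatial heterogeneity and temporal drift. Proposes C2FL, in which nodes self-organize into learning groups via spatial clustering and combine experience replay with dwell-time-aware adaptive averaging to achieve robust collective adaptation.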

English abstract

Collective Adaptive Systems (CAS) increasingly rely on machine learning to let each node learn from locally sensed data, aligning its behavior with the surrounding environment. Scaling this intelligence, however, raises fundamental challenges: sensed data is often privacy-sensitive, preventing centralized collection; nodes are mobile, traversing regions where nearby nodes perceive similar phenomena while distant ones observe radically different conditions, creating natural spatial clusters; and these distributions evolve over time due to mobility, introducing temporal drift that makes local models progressively stale. These dynamics arise across domains (vehicular sensing, drone-based monitoring, smartphone crowdsensing), yet the interplay of privacy, spatial heterogeneity, and temporal drift severely undermines conventional learning strategies. Therefore, we propose C2FL, a fully distributed Federated Learning (FL) approach where nodes self-organize into learning groups through spatial clustering, reflecting the geographic structure of the environment. To counteract temporal drift, each node combines experience replay with a dwell-time-aware adaptive averaging step, progressively incorporating the regional consensus as it remains longer within the same area, while preserving previously acquired knowledge under evolving distributions. We evaluate our approach on synthetic experiments that systematically reproduce spatial and temporal shifts, showing that standard federated strategies degrade significantly under these conditions and that our method restores robust collective adaptation.
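Below is a minimal sketch of a dwell-time-aware averaging step. It assumes, hypothetically, an exponential schedule: the longer a node stays in a region, the more weight it gives to the regional consensus model. The schedule, the time constant `tau`, and the cap `max_mix` are illustrative assumptions rather than the paper's rule. Experience replay and the spatial clustering itself are omitted.

```python
import numpy as np

def dwell_mixing_weight(dwell_time, tau=10.0, max_mix=0.9):
    """Hypothetical schedule: 0 on arrival, rising toward max_mix as dwell time grows.

    Capping below 1 keeps part of the node's own, previously acquired knowledge.
    """
    return max_mix * (1.0 - np.exp(-dwell_time / tau))

def dwell_aware_average(local_params, cluster_params, dwell_time, tau=10.0, max_mix=0.9):
    """Blend local parameters with the regional consensus (mean of cluster members)."""
    consensus = np.mean(cluster_params, axis=0)
    a = dwell_mixing_weight(dwell_time, tau, max_mix)
    return (1.0 - a) * local_params + a * consensus

local = np.zeros(3)                                  # node that just entered a region
cluster = np.array([np.ones(3), 3 * np.ones(3)])     # models of nodes already there
on_arrival = dwell_aware_average(local, cluster, dwell_time=0.0)
settled = dwell_aware_average(local, cluster, dwell_time=50.0)
```

A node that has only just arrived keeps its own model, while a node that has stayed long enough moves most of the way toward the regional consensus. This is the qualitative behaviour the abstract describes.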

2606.17122 2026-06-17 cs.CR cs.AI cs.LG Cross-list

TrustErase: Auditable Instant Machine Unlearning with Passport-Embedded Representations

Rutger Hendrix, Leonardo G. Russo, Concetto Spampinato, Matteo Pennisi, Giovanni Bellitto

Affiliation: University of Catania

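AI summary: Proposes TrustErase, a framework that uses passport-embedded representations to achieve data-free, verifiable, instant unlearning. Passports in parameter-efficient adaptation layers act as keys, so a specific class or dataset can be removed simply by deactivating them, without retraining or fine-tuning.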

English abstract

The demand for privacy-compliant AI has amplified the need for machine unlearning; yet, existing retraining or distillation-based methods remain unverifiable and computationally costly. We introduce TrustErase, a verifiable, data-free unlearning framework leveraging passport-embedded representations for instant, modular, and auditable forgetting. By treating passports as cryptographic keys within parameter-efficient adaptation layers, TrustErase enables the removal of specific classes or datasets through simple deactivation, without retraining, fine-tuning, or access to the original data. A singular-value-based decomposition conceals passports within model weights, ensuring that unlearning actions remain transparent and provably compliant. Evaluations on MNIST, CIFAR10 and CIFAR100 show that TrustErase matches or exceeds state-of-the-art baselines such as DELETE, L2UL, and Boundary Shrink, while operating in a strictly data-free regime. Ultimately, TrustErase establishes a new paradigm for trustworthy, accountable, and instantly forgettable AI systems.

2606.17995 2026-06-17 stat.ML cs.CR cs.LG Cross-list

Differential Privacy of Gaussian Process Posterior Sampling

Tomasz Maciazek

Affiliation: School of Mathematics, University of Bristol

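AI summary: Studies the privacy of Gaussian process posterior sample paths. Rényi-DP bounds separate posterior-mean leakage from posterior-covariance leakage, revealing the key role of effective ridge regularization, and membership-inference attacks confirm the predicted dependence on regularization.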

Comments 8 pages of main text + 25-page appendix

English abstract

We study the privacy of releasing posterior sample paths from a Gaussian process (GP) when the entire training set, including covariates and responses, is private. Unlike standard differential-privacy (DP) mechanisms that add external noise, posterior sampling is random by construction. We show that this intrinsic randomness yields DP guarantees by deriving explicit Rényi-DP bounds for GP posterior sample-path release. The bounds separate posterior-mean leakage from data-dependent posterior-covariance leakage, showing that meaningful privacy depends sharply on effective ridge regularisation. We apply membership-inference attacks to show that empirical leakage follows the predicted dependence on regularisation, posterior variance, and the number of released posterior sample paths. Utility experiments on downstream posterior-sampling tasks identify noisy-observation regimes where privacy-compatible regularisation preserves useful decisions with modest utility loss. When stronger privacy is needed, the intrinsic guarantee can be sharpened by adding calibrated GP noise, providing an explicit additional privacy knob.
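A posterior sample path observed at finitely many inputs is a Gaussian vector. The basic quantity behind such bounds is therefore the Rényi divergence between two Gaussians, namely the posteriors under neighbouring datasets. Its closed form already splits into a mean term and a covariance term, which mirrors the separation between mean and covariance leakage described in the abstract. The NumPy sketch below implements that textbook formula; it is not the paper's bound, which additionally tracks the dependence on the data and on the ridge term. The formula is finite only when alpha * Σ_q + (1 - alpha) * Σ_p is positive definite.

```python
import numpy as np

def renyi_gaussian(mu_p, cov_p, mu_q, cov_q, alpha):
    """Renyi divergence D_alpha(N(mu_p, cov_p) || N(mu_q, cov_q)) in nats, for alpha > 1.

    Returns (mean_term, cov_term); their sum is the divergence.
    """
    cov_a = alpha * cov_q + (1.0 - alpha) * cov_p
    if np.linalg.eigvalsh(cov_a).min() <= 0:
        return np.inf, np.inf                       # divergence is infinite at this order
    diff = np.asarray(mu_p) - np.asarray(mu_q)
    mean_term = 0.5 * alpha * diff @ np.linalg.solve(cov_a, diff)
    _, logdet_a = np.linalg.slogdet(cov_a)
    _, logdet_p = np.linalg.slogdet(cov_p)
    _, logdet_q = np.linalg.slogdet(cov_q)
    cov_term = -(logdet_a - (1 - alpha) * logdet_p - alpha * logdet_q) / (2 * (alpha - 1))
    return mean_term, cov_term

# Equal covariances: only the mean term survives, alpha/2 * |mu_p - mu_q|^2 / sigma^2
m, c = renyi_gaussian(np.zeros(1), np.eye(1), np.ones(1), np.eye(1), alpha=2.0)
```

For neighbouring datasets one would plug in the two posterior means and covariances at the released inputs. How regularisation shrinks both terms is what the paper's analysis quantifies.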

2503.10945 2026-06-17 cs.LG cs.AI cs.CR stat.ML Replacement

Gaussian DP for Reporting Differential Privacy Guarantees in Machine Learning

Juan Felipe Gomez, Bogdan Kulynych, Georgios Kaissis, Flavio P. Calmon, Jamie Hayes, Borja Balle, Antti Honkela

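AI summary: Addresses incomplete reporting of differential privacy guarantees in machine learning by proposing non-asymptotic Gaussian differential privacy (GDP) as the primary reporting format. Using numerical accountants and a decision-theoretic metric, it shows that GDP captures the full privacy profile of algorithms such as DP-SGD with virtually no error.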

Comments IEEE SaTML 2026 (position paper track)

English abstract

Current practices for reporting differential privacy (DP) guarantees for machine learning (ML) algorithms such as DP-SGD provide an incomplete and potentially misleading picture. For instance, if only a single $(\varepsilon, \delta)$ is known about a mechanism, standard analyses show that there could exist highly accurate inference attacks against training data records, when, upon a more careful analysis, such accurate attacks do not exist for most practical mechanisms. In this position paper, we argue that using _non-asymptotic_ Gaussian Differential Privacy (GDP) as the primary means of communicating DP guarantees in ML avoids these potential downsides. Using two recent developments in the DP literature: (i) open-source numerical accountants capable of computing the privacy profile and $f$-DP curves of DP-SGD to arbitrary accuracy, and (ii) a decision-theoretic metric over DP representations, we show how to provide non-asymptotic bounds on GDP using numerical accountants, and show that GDP can capture the entire privacy profile of DP-SGD and related algorithms with virtually no error, as quantified by the metric. To support our claims, we investigate the privacy profiles of state-of-the-art DP large-scale image classification, and the TopDown algorithm for the U.S. Decennial Census, observing that GDP fits their profiles remarkably well in all cases. We conclude with a discussion on the strengths and weaknesses of this approach, and discuss which other privacy mechanisms could benefit from GDP.
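A single μ-GDP parameter compactly encodes a whole curve. The trade-off function is f(a) = Φ(Φ⁻¹(1 - a) - μ), and the implied (ε, δ) profile has the closed form δ(ε) = Φ(-ε/μ + μ/2) - e^ε Φ(-ε/μ - μ/2); both are standard results from the Gaussian DP literature (Dong, Roth and Su). The Python sketch below evaluates them with `statistics.NormalDist` and inverts δ(ε) by bisection. It does not reproduce the numerical accountants the abstract relies on for DP-SGD.

```python
import math
from statistics import NormalDist

PHI = NormalDist()  # standard normal: .cdf is Phi, .inv_cdf is Phi^{-1}

def gdp_tradeoff(alpha, mu):
    """Smallest type II error at type I error alpha (0 < alpha < 1) under mu-GDP."""
    return PHI.cdf(PHI.inv_cdf(1.0 - alpha) - mu)

def gdp_delta(eps, mu):
    """delta(eps) of the (eps, delta)-DP profile implied by mu-GDP."""
    return PHI.cdf(-eps / mu + mu / 2) - math.exp(eps) * PHI.cdf(-eps / mu - mu / 2)

def gdp_epsilon(delta, mu, hi=100.0, iters=200):
    """Smallest eps with gdp_delta(eps, mu) <= delta, by bisection (delta decreases in eps)."""
    lo = 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if gdp_delta(mid, mu) > delta:
            lo = mid
        else:
            hi = mid
    return hi

eps_at_1e5 = gdp_epsilon(1e-5, mu=1.0)   # one point on the profile of 1-GDP
```

Reporting μ alone lets a reader regenerate every (ε, δ) pair, whereas a single (ε, δ) pair cannot be turned back into the curve. This is the incompleteness the paper argues against.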

2507.15104 2026-06-17 cs.LG cs.AI Replacement

AnalogFed: Privacy-Preserving Discovery of Analog Circuits at Scale with Federated Generative AI

Qiufeng Li, Shu Hong, Tian Lan, Weidong Cao

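AI summary: Proposes AnalogFed, the first privacy-preserving framework combining federated learning and generative AI for large-scale analog circuit topology discovery. It defends against membership inference and model inversion attacks through dummy token injection and homomorphic encryption, enabling efficient collaborative design.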

English abstract

Recent advances in generative AI (GenAI) have shown transformative potential for modern hardware design. However, existing GenAI-driven approaches fall short of enabling large-scale electronic design automation (EDA) due to the proprietary and siloed nature of hardware datasets, which cannot be centralized for model training. Achieving at-scale GenAI-driven EDA, therefore, requires a novel privacy-preserving framework that can leverage distributed data without compromising confidentiality. This work introduces AnalogFed, the first privacy-preserving framework for large-scale analog circuit topology discovery using federated learning (FedL) and GenAI. AnalogFed establishes the feasibility of collaborative analog topology design while addressing key security challenges: it mitigates membership inference attacks (MIAs) through a novel input perturbation strategy based on dummy token injection, and defends against model inversion attacks with customized, efficient homomorphic encryption. Extensive experiments demonstrate AnalogFed's effectiveness and efficiency, achieving strong privacy protection without degrading model utility. This framework lays the foundation for scalable, multi-party collaboration in next-generation hardware design automation with GenAI.

2508.06692 2026-06-17 cs.LG Replacement

HeteRo-Select: Informativeness as the Participation Driver in Heterogeneous Federated Learning

Md. Akmol Masud, Md Abrar Jahin, Mahmud Hasan

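AI summary: Proposes HeteRo-Select, which replaces bandwidth-driven compression with a per-client informativeness score. The score jointly determines client selection, compression ratio, and aggregation weight, reducing heterogeneity and traffic; on CIFAR-10 the method achieves a 1.78x speedup and an 18.2% traffic reduction.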

English abstract

Federated learning systems typically allocate gradient compression by link speed. This is sensible when bandwidth and data informativeness align. However, under non-IID data, these signals often decorrelate or invert. A bandwidth-driven allocator then risks compressing the most informative gradients hardest. We propose HeteRo-Select, a framework that replaces bandwidth with a per-client informativeness score as the primary driver of compression. The score jointly governs three decisions per round: client selection, compression ratio, and server aggregation weight, with bandwidth retained only as a hard ceiling. Score-proportional selection provably reduces the effective heterogeneity of the chosen subset; score-proportional compression provably lowers aggregate top-$k$ error at fixed traffic. Under the exact FedCG simulation protocol, HeteRo-Select delivers a $1.78\times$ speedup and an $18.2\%$ reduction in traffic on CIFAR-10. The same configuration, unchanged, scales from a $7{,}850$-parameter logistic regression to an $11.27$M-parameter ResNet-18, hitting the accuracy target on three of four benchmarks. When bandwidth and informativeness are deliberately anti-correlated, the method still achieves the target accuracy with less traffic than the normal-bandwidth run.
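The NumPy sketch below shows one informativeness score driving all three per-round decisions, with bandwidth acting only as a ceiling. How the score is computed, the exact proportional rules, and parameter names such as `base_keep` are illustrative assumptions, not the paper's formulas. Compression is shown as top-k sparsification.

```python
import numpy as np

def plan_round(scores, bw_ceiling, n_select, base_keep, rng):
    """Score-driven client selection, keep ratio, and aggregation weight.

    scores: informativeness per client (> 0); bw_ceiling: largest keep ratio each
    client's link allows. Returns selected ids, keep ratios, aggregation weights.
    """
    p = scores / scores.sum()
    chosen = rng.choice(len(scores), size=n_select, replace=False, p=p)  # score-proportional selection
    s = scores[chosen]
    keep = np.minimum(base_keep * s / s.mean(), bw_ceiling[chosen])      # informative -> less compression, capped by bandwidth
    keep = np.clip(keep, 1e-3, 1.0)
    weights = s / s.sum()                                                # score-proportional aggregation
    return chosen, keep, weights

def top_k(grad, keep_ratio):
    """Keep the largest-magnitude fraction `keep_ratio` of entries, zero the rest."""
    k = max(1, int(keep_ratio * grad.size))
    out = np.zeros_like(grad)
    idx = np.argpartition(np.abs(grad), -k)[-k:]
    out[idx] = grad[idx]
    return out

def aggregate(grads, keep, weights):
    """Weighted sum of top-k compressed client gradients."""
    return sum(w * top_k(g, r) for g, r, w in zip(grads, keep, weights))

rng = np.random.default_rng(0)
scores = np.array([1.0, 2.0, 4.0, 8.0])        # hypothetical informativeness scores
bw_ceiling = np.array([1.0, 1.0, 1.0, 0.2])    # client 3 is informative but on a slow link
ids, keep, weights = plan_round(scores, bw_ceiling, n_select=2, base_keep=0.3, rng=rng)
grads = [rng.normal(size=10) for _ in ids]
update = aggregate(grads, keep, weights)
```

In this design bandwidth never raises a client's compression ratio. It can only cap it, which matches the abstract's description of bandwidth as a hard ceiling.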

2606.10774 2026-06-17 cs.LG cs.DC Replacement

Asynchronous Decentralized Federated Learning over Lossy Wireless Links via Reception- and Age-Aware Aggregation

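Inverse Probability Weighting and Age-of-Information Aggregation for Decentralized Federated Learning under Partial Reception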

Chanuka A. S. Hewa Kaluannakkage, Rajkumar Buyya

Affiliations: University of Melbourne; University of Technology Sydney

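AI summary: Addresses selection bias and update staleness in decentralized federated learning over wireless networks by proposing DFL-AA, which combines inverse probability weighting with Age-of-Information weighting. It provably removes link-quality bias in expectation and outperforms existing baselines in experiments.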

Comments 14 pages, 9 figures, research paper for journal submission

AI-translated abstract

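Decentralized federated learning over lossy wireless networks faces two key challenges. The first is selection bias: because models are only partially received, updates from poor-quality links are systematically under-weighted. The second is update staleness: asynchronous nodes contribute outdated information. We show that uniform gossip aggregation with local-fill reconstruction introduces a persistent bias induced by link quality, and that completeness-based weighting further amplifies this effect. To address these challenges, we propose DFL-AA (Decentralized Federated Learning with Adaptive AoI-weighted Aggregation). It combines inverse probability weighting with online EWMA-based channel estimation to correct selection bias, and uses Age-of-Information-based weighting to mitigate staleness, without requiring global synchronization. We prove that DFL-AA eliminates link-quality distortion in expectation. Experiments show that it consistently outperforms state-of-the-art baselines across different packet-loss rates, network sizes, and heterogeneous wireless conditions.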

English abstract

Decentralized Federated Learning (DFL) enables collaborative model training across wireless edge nodes, including IoT deployments, autonomous vehicles, UAV swarms, and satellite constellations. Operating over lossy wireless links under constraints, these systems cannot rely on retransmissions, so model parameters must be accepted as partial chunks. This leads to two key failure modes: selection bias, where poor-quality links are systematically under-represented in gossip aggregation, and update staleness, where asynchronous nodes contribute outdated models. We prove that classical gossip aggregation introduces irreducible selection bias proportional to the link-loss rate. We propose DFL-AA (Decentralized Federated Learning with Adaptive AoI-weighted Aggregation), which corrects selection bias using Inverse Probability Weighting (IPW) with online channel estimation and mitigates staleness via Age-of-Information (AoI) decay without requiring a global clock. We prove that DFL-AA removes link-quality distortion in expectation and consistently outperforms state-of-the-art baselines across varying loss rates and heterogeneous channel conditions on fixed directed topologies.
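The following minimal sketch shows reception- and age-aware aggregation at a single node. It makes several simplifying assumptions. An EWMA estimate of each link's reception probability supplies the inverse-probability weights. An exponential Age-of-Information decay discounts stale models. Each neighbour's model is treated as one unit rather than as separate chunks. The class `LinkEstimator`, the decay rate `lam`, the EWMA factor `beta`, and the probability floor are all hypothetical.

```python
import numpy as np

class LinkEstimator:
    """Hypothetical helper: EWMA estimate of each neighbour's reception probability."""

    def __init__(self, n_neighbors, beta=0.9, init=0.5, floor=0.05):
        self.p = np.full(n_neighbors, init, dtype=float)
        self.beta, self.floor = beta, floor

    def update(self, received):
        """Fold this round's reception outcomes (bool array) into the estimate."""
        self.p = self.beta * self.p + (1.0 - self.beta) * received.astype(float)
        return np.maximum(self.p, self.floor)   # floor keeps 1/p weights bounded

def ipw_aoi_aggregate(models, received, p_hat, ages, lam=0.1):
    """Reception- and age-aware aggregate of neighbour models.

    models: (n_neighbors, n_params); received: bool mask for this round;
    p_hat: estimated reception probabilities; ages: AoI of each neighbour's model.
    Received models are scaled by 1/p_hat (inverse probability weighting) and by
    exp(-lam * age); dividing by the AoI weights of all neighbours makes the
    result unbiased for the AoI-weighted average when p_hat is correct.
    """
    a = np.exp(-lam * ages)
    w = np.where(received, a / p_hat, 0.0)
    return (w[:, None] * models).sum(axis=0) / a.sum()

est = LinkEstimator(n_neighbors=2)
received = np.array([True, False])
p_hat = est.update(received)
agg = ipw_aoi_aggregate(np.array([[1.0], [3.0]]), received, p_hat, ages=np.array([0.0, 4.0]))
```

Averaging only whatever arrives would drift toward well-connected neighbours. The inverse-probability weights pull the expectation back to the average over all neighbours, which is the bias correction the abstract refers to.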

8. Robustness, Uncertainty, and Trustworthy Learning (20 papers)

2606.17352 2026-06-17 cs.LG cs.CV New submission

MM++: Unsupervised Scale-Invariant Multilayer OOD Detection via Top-K Gated Feature Fusion

Rahim Hossain, Md Tawheedul Islam Bhuian, Md Farhan Shadiq, Kyoung-Don Kang

Affiliation: School of Computing, State University of New York at Binghamton

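AI summary: Proposes MM++, which identifies discriminative intermediate layers through entropy-density drops and uses a Ledoit-Wolf-regularized covariance matrix for unsupervised, post-hoc, scale-invariant multilayer OOD detection; it performs robustly in both near- and far-OOD settings.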

English abstract

We introduce MM++ (Multilayer Mahalanobis++), a fully unsupervised, strictly post-hoc, and scale-invariant framework for Out-of-Distribution (OOD) detection. To address the trade-off between scale invariance and hierarchical expressivity, MM++ constructs a principled joint feature space. It first identifies discriminative intermediate layers by measuring entropy density drops, which mark the boundaries of sharp semantic compression. By fusing these selected layers with the terminal representation, the framework captures latent cross-layer correlations while mitigating early-layer noise. Crucially, a Ledoit-Wolf regularized tied covariance matrix stabilizes this unified space, enabling reliable distance estimation. Requiring no auxiliary OOD data, classifier fine-tuning, or architectural modifications, MM++ delivers robust performance across distinct architectures for both near- and far-OOD detection.
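The NumPy sketch below covers only the scoring stage and rests on stated assumptions. First, the layers have already been selected; the entropy-density criterion is not reproduced. Second, each layer's features are ℓ2-normalised before concatenation, which is our assumption about how scale invariance is obtained. Third, the tied covariance is shrunk with the Ledoit-Wolf formula. That formula is written out here so the block depends only on NumPy, and it should match the estimator behind scikit-learn's `LedoitWolf` for centred data. The class name `MultilayerMahalanobis` is hypothetical.

```python
import numpy as np

def l2n(z):
    return z / (np.linalg.norm(z, axis=1, keepdims=True) + 1e-12)

def fuse(layer_feats):
    """Concatenate per-layer features after l2-normalising each layer (assumed)."""
    return np.concatenate([l2n(f) for f in layer_feats], axis=1)

def ledoit_wolf(x):
    """Ledoit-Wolf shrunk covariance of already-centred data x with shape (n, p)."""
    n, p = x.shape
    s = x.T @ x / n
    mu = np.trace(s) / p
    x2 = x ** 2
    beta = ((x2.T @ x2).sum() / n - (s ** 2).sum()) / (n * p)
    delta = ((s - mu * np.eye(p)) ** 2).sum() / p
    shrink = 0.0 if delta == 0 else min(beta, delta) / delta
    return (1 - shrink) * s + shrink * mu * np.eye(p)

class MultilayerMahalanobis:
    def fit(self, layer_feats, labels):
        z = fuse(layer_feats)
        self.classes = np.unique(labels)
        self.means = np.stack([z[labels == c].mean(0) for c in self.classes])
        centred = z - self.means[np.searchsorted(self.classes, labels)]
        self.prec = np.linalg.inv(ledoit_wolf(centred))   # tied (shared) covariance
        return self

    def score(self, layer_feats):
        """OOD score: Mahalanobis distance to the closest class mean (higher = more OOD)."""
        z = fuse(layer_feats)
        d = z[:, None, :] - self.means[None, :, :]
        return np.einsum("nkp,pq,nkq->nk", d, self.prec, d).min(1)

rng = np.random.default_rng(0)

def make(n, cls):
    """Two synthetic 'layers' whose features point in a class-specific direction."""
    l1 = rng.normal(scale=0.1, size=(n, 4)); l1[:, cls] += 1.0
    l2 = rng.normal(scale=0.1, size=(n, 6)); l2[:, 2 + cls] += 1.0
    return [l1, l2]

a, b = make(200, 0), make(200, 1)
train = [np.vstack([a[i], b[i]]) for i in range(2)]
labels = np.repeat([0, 1], 200)
det = MultilayerMahalanobis().fit(train, labels)
```

The per-layer normalisation keeps any single layer from dominating the distance because of its feature scale, and shrinkage keeps the precision matrix invertible when features are correlated across layers.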

2606.17435 2026-06-17 cs.LG New submission

MorphStrata: Layer-Specific Perturbations for Generating Morphence Students in Time-Series Moving Target Defense

Abhishek Bhardwaj, Arnav Doshi, Anusri Nagarajan, Thanh Quynh Nhu Ta, Mohammad Masum, Robert Chun, Jaydip Sen, Saptarshi Sengupta

Affiliations: Department of Computer Science, San José State University, San José, CA, USA; Department of Computer Engineering, San José State University, San José, CA, USA; Praxis Business School, Kolkata, India

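AI summary: Proposes MorphStrata, which generates structurally heterogeneous student models through selective, layer-specific random noise injection. It preserves moving-target-defense robustness while adding less than 1% training overhead in most experiments, and on high-entropy periodic data it reduces RMSE by up to 24.11% (FGSM) and 97.97% (BIM).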

Comments 13 pages, 9 figures, 11 tables

English abstract

Time-series forecasting models remain vulnerable to gradient-based adversarial attacks, while existing defense mechanisms typically sacrifice robustness to keep response and compute costs bounded. The problem is pronounced in Moving Target Defense (MTD), where maintaining multiple randomized model instances substantially increases training overhead. In this work, we introduce MorphStrata, a student generation strategy with selective, layer-specific stochastic noise injection that extends the traditional Morphence defense. MorphStrata uses a Transformer backbone as the teacher and perturbs randomly selected architectural blocks to create structured heterogeneity across student models in response to varied data distributions and threat models. We evaluate against vanilla Transformer and Morphence backbones on a suite of benchmarks, including the Jena Climate, Electricity Load Diagrams, and Appliances Energy Prediction (AEP) datasets, using FGSM, BIM, and PGD attacks across multiple attack strengths. Across datasets and attack regimes, the proposed ensemble maintains comparable adversarial RMSE. Specifically, for high-entropy, periodic datasets such as AEP, MorphStrata achieves the lowest RMSE across all attacks and perturbation budgets, improving over the static baseline by up to 24.11% and 97.97% under FGSM and BIM, respectively, at an epsilon of 0.5 over 30 randomized trials. In most experiments, targeting specific layers to generate MorphStrata students increases training time by less than 1% over the Morphence MTD baseline, while yielding double-digit reductions in adversarial RMSE. We also observe a positive correlation between the pairwise L2 distance among generated students and overall defense effectiveness. In summary, MorphStrata maintains adversarial robustness as an MTD defense at marginal additional cost compared to existing baselines.
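The minimal sketch below illustrates the student-generation idea, treating a model as a dict of named layer weights. Each student copies the teacher and adds Gaussian noise to a randomly chosen subset of layers. The noise scale, the number of perturbed layers, and the layer names are hypothetical. The Transformer teacher and Morphence's retraining of students are not reproduced. The sketch ends by computing the pairwise L2 distance between students, the quantity the abstract reports as positively correlated with defense effectiveness.

```python
import itertools
import numpy as np

def make_students(teacher, n_students, n_layers_to_perturb, sigma, rng):
    """Copy the teacher and add N(0, sigma^2) noise to a random subset of layers per student."""
    names = sorted(teacher)
    students = []
    for _ in range(n_students):
        picked = rng.choice(names, size=n_layers_to_perturb, replace=False)
        student = {k: v.copy() for k, v in teacher.items()}
        for k in picked:
            student[k] += rng.normal(scale=sigma, size=student[k].shape)
        students.append(student)
    return students

def flat(model):
    return np.concatenate([model[k].ravel() for k in sorted(model)])

def mean_pairwise_l2(students):
    """Average L2 distance between every pair of students' flattened weights."""
    return np.mean([np.linalg.norm(flat(a) - flat(b))
                    for a, b in itertools.combinations(students, 2)])

rng = np.random.default_rng(0)
teacher = {f"block{i}.weight": rng.normal(size=(8, 8)) for i in range(6)}   # hypothetical layers
students = make_students(teacher, n_students=5, n_layers_to_perturb=2, sigma=0.05, rng=rng)
diversity = mean_pairwise_l2(students)
```

Because only a few layers are touched per student, generating a student costs little beyond copying weights. This is consistent with the small training-time overhead the abstract reports, although the actual overhead depends on the retraining step omitted here.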

2606.17513 2026-06-17 cs.LG cs.AI New submission

Geometry-Aware Post-Hoc Uncertainty Quantification in Operator Learning

Oriol Vendrell-Gallart, Nima Negarandeh, Ramin Bostanabad

Affiliation: Department of Mechanical and Aerospace Engineering, University of California, Irvine

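AI summary: Proposes REEF-GP, which fits a Gaussian process to the residuals of a frozen neural operator and uses the operator's intrinsic coordinate-feature representations to build geometry-aware uncertainty; it delivers calibrated uncertainty on several PDE benchmarks at far lower computational cost than deep ensembles.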

English abstract

Neural operators provide fast surrogates for PDEs but their deterministic predictions limit their use in tasks requiring uncertainty quantification (UQ), especially under geometric variability. Existing approaches primarily model uncertainty in network parameters, largely overlooking the geometry-aware representations learned by the operator itself. We propose REEF-GP (Residual on Embedded Features Gaussian Process), a post-hoc UQ framework that fits a GP to the residuals of a frozen neural operator whose internal embeddings define the kernel feature space. Rather than learning a separate feature map, REEF-GP adapts the operator's intrinsic coordinate-feature representations to construct geometry-aware uncertainties. To ensure stability and scalability on unstructured domains, REEF-GP incorporates spectral-normalized projections, heteroscedastic geometry-aware noise, and efficient subset-based training that avoids restrictive low-rank approximations. Across five PDE benchmarks with varying geometries, REEF-GP preserves predictive accuracy while achieving calibrated uncertainty estimates competitive with deep ensembles but at a fraction of their cost. Our approach remains robust under geometric distribution shift, with uncertainty concentrating in physically meaningful regions (e.g., shock fronts). Our results demonstrate that accurate and scalable post-hoc UQ for neural operators can be achieved directly in their learned feature space, offering a practical alternative to parameter-centric approaches.
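The NumPy sketch below illustrates the core idea: keep the operator frozen, use its internal embeddings as GP inputs, and fit a GP only to the operator's residuals, so predictive variance reflects distance in the operator's own feature space. It assumes an RBF kernel with fixed hyperparameters and homoscedastic noise. The spectral-normalised projection, geometry-aware heteroscedastic noise, and subset-based training from the abstract are omitted. The random `emb` array and the zero `operator_pred` are placeholders for whatever the frozen model exposes, and `ResidualGP` is a hypothetical name.

```python
import numpy as np

def rbf(a, b, lengthscale=1.0, variance=1.0):
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * d2 / lengthscale ** 2)

class ResidualGP:
    """GP on residuals y - operator(x), with the frozen operator's embeddings as inputs."""

    def fit(self, emb, residuals, noise=1e-2, **kern):
        self.emb, self.kern = emb, kern
        k = rbf(emb, emb, **kern) + noise * np.eye(len(emb))
        self.chol = np.linalg.cholesky(k)
        self.alpha = np.linalg.solve(self.chol.T, np.linalg.solve(self.chol, residuals))
        return self

    def predict(self, emb_new, operator_pred):
        """Corrected mean and latent predictive variance at new embeddings."""
        ks = rbf(emb_new, self.emb, **self.kern)
        mean = operator_pred + ks @ self.alpha
        v = np.linalg.solve(self.chol, ks.T)
        var = self.kern.get("variance", 1.0) - (v ** 2).sum(0)
        return mean, np.maximum(var, 0.0)

rng = np.random.default_rng(0)
emb = rng.uniform(0, 1, size=(50, 2))              # stand-in for frozen-operator embeddings
residuals = 0.1 * np.sin(3 * emb[:, 0])            # stand-in for y - operator(x)
gp = ResidualGP().fit(emb, residuals, noise=1e-4, lengthscale=0.3)
m_near, v_near = gp.predict(np.array([[0.5, 0.5]]), np.zeros(1))
m_far, v_far = gp.predict(np.array([[5.0, 5.0]]), np.zeros(1))
```

Far from the training embeddings the correction vanishes and the variance returns to the prior level. This is how the uncertainty can grow under geometric distribution shift without retraining the operator.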

2606.17756 2026-06-17 cs.LG New submission

A fairness-aware extension of Stochastic Multicriteria Acceptability Analysis for ranking

Guilherme Dean Pelegrina, Renata Pelissari

Affiliation: Engineering School, Mackenzie Presbyterian University

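AI summary: Proposes SMAA-Fair, which reweights simulated rankings by their group fairness, measured with statistical parity, rKL, and nDKL, improving the representation of protected groups in favourable positions while preserving robustness.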

English abstract

Fairness has become a central concern in ranking problems involving individuals or social groups, particularly under the Responsible Artificial Intelligence agenda. In Multi-Criteria Decision Analysis, Stochastic Multicriteria Acceptability Analysis (SMAA) provides a robust framework for handling uncertainty and incomplete preference information, but it does not explicitly address fairness in the resulting rankings. This paper proposes SMAA-Fair, a fairness-aware extension of SMAA for ranking problems. The approach reweights the simulated rankings generated by SMAA according to their level of group fairness, so that fairer rankings contribute more strongly to the acceptability indices and the central weight vector. The framework is independent of the aggregation model and can incorporate different fairness metrics. In this study, Statistical Parity, normalized discounted Kullback–Leibler divergence (rKL) and normalized discounted cumulative Kullback–Leibler divergence (nDKL) are adopted. Rankings are derived from the fairness-adjusted acceptability matrix using expected ranking and maximum acceptability ranking. We also derive the central weight according to the degree of fairness in the obtained rankings. Numerical experiments with synthetic and real data show that SMAA-Fair improves the representation of protected groups among favourable ranking positions, while preserving robustness to preference uncertainty.
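The NumPy sketch below illustrates the reweighting idea with a weighted-sum value model. It samples criterion weights uniformly from the simplex, as standard SMAA does, and ranks the alternatives for each sample. It then scores each simulated ranking's group fairness and accumulates rank-acceptability indices and central weights with fairness as the sample weight. The fairness score here is a simple statistical-parity stand-in chosen for illustration: one minus the gap between the protected group's share of the top-k and its overall share. The paper also uses rKL and nDKL, and the function names are hypothetical.

```python
import numpy as np

def parity_fairness(ranking, protected, k):
    """1 - |share of protected in top-k - overall share|; 1 means parity in the top-k."""
    return 1.0 - abs(protected[ranking[:k]].mean() - protected.mean())

def smaa_fair(perf, protected, n_samples=5000, k=3, seed=0):
    """Fairness-weighted rank acceptability indices and central weight vectors.

    perf: (n_alternatives, n_criteria) performance matrix, higher is better.
    acc[i, r] is the fairness-weighted share of samples placing alternative i at rank r;
    central[i] is the fairness-weighted mean weight vector among samples ranking i first.
    """
    rng = np.random.default_rng(seed)
    n_alt, n_crit = perf.shape
    acc = np.zeros((n_alt, n_alt))
    central = np.zeros((n_alt, n_crit))
    first_mass = np.zeros(n_alt)
    total = 0.0
    for _ in range(n_samples):
        w = rng.dirichlet(np.ones(n_crit))          # uniform weights on the simplex
        ranking = np.argsort(-(perf @ w))           # best alternative first
        f = parity_fairness(ranking, protected, k)  # fairness is the sample weight
        acc[ranking, np.arange(n_alt)] += f
        central[ranking[0]] += f * w
        first_mass[ranking[0]] += f
        total += f
    central /= np.maximum(first_mass, 1e-12)[:, None]
    return acc / total, central

perf = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.3],       # non-protected alternatives
                 [0.25, 0.6], [0.15, 0.65], [0.05, 0.7]])  # protected alternatives
protected = np.array([0, 0, 0, 1, 1, 1])
acc, central = smaa_fair(perf, protected, n_samples=2000, k=3)
```

Because every sample contributes its fairness weight once to each alternative and once to each rank, the weighted acceptability matrix stays doubly stochastic. The expected-ranking and maximum-acceptability rules from the abstract can then be applied to it unchanged.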

2606.17810 2026-06-17 cs.LG cs.AI 新提交

No-Free-Fairness: Fundamental Limits and Trade-offs in Learning Systems

无免费公平:学习系统中的基本限制与权衡

Khoat Than

发表机构 * Hanoi University of Science and Technology(河内科技大学)

AI总结 本文提出无免费公平定理,揭示学习系统中三个固有差异来源:任务固有成本导致性能与公平的权衡、有限样本诱导子群差异、模型类表达力限制导致公平不可达,表明不公平源于决策问题结构、数据有限性和模型表达力。

详情
AI中文摘要

在本文中,我们建立了一组理论不可能性结果,称为无免费公平定理,这些定理识别了学习系统中三个根本性的差异来源。首先,我们证明当任务在某个子群上表现出不可约成本时,任何决策规则都必须在整体性能与差异之间进行权衡,从而产生固有的公平-成本前沿。其次,我们证明即使在理想的无噪声环境中,存在完全公平且准确的解,仅凭有限样本学习就会导致非平凡的子群差异,排除了分布无关的公平保证。更严重的是,强制执行严格的相对公平会造成统计瓶颈:实现低成本可能需要指数级数量的样本。第三,我们证明模型类的局限性可以独立地导致差异:如果模型无法为某个子群表示准确的解,那么无论数据或训练过程如何,公平性都无法实现。总体而言,这些结果表明不公平不仅仅是由于有偏数据或次优优化,而是源于决策问题的内在结构、有限数据的约束以及模型的表达力。我们的框架广泛适用于标准监督学习之外,并表明实现公平需要明确的权衡,应被视为核心设计考虑因素。

英文摘要

In this paper, we establish a set of theoretical impossibility results, termed the No-Free-Fairness theorems, that identify three fundamental sources of disparity in learning systems. First, we show that when a task exhibits irreducible cost on a subgroup, any decision rule must trade off overall performance with disparity, yielding an inherent fairness-cost frontier. Second, we prove that even in ideal, noise-free settings where a perfectly fair and accurate solution exists, finite-sample learning alone induces nontrivial subgroup disparity, ruling out distribution-free fairness guarantees. More seriously, enforcing strict relative fairness creates a statistical bottleneck: achieving low cost may require exponentially many samples. Third, we show that limitations of the model class can independently induce disparity: if the model cannot represent accurate solutions for a subgroup, fairness remains unattainable regardless of data or training procedure. Overall, these results demonstrate that unfairness is not solely a consequence of biased data or suboptimal optimization, but arises from the intrinsic structure of decision problems, the constraints of finite data, and the expressivity of models. Our framework applies broadly beyond standard supervised learning, and suggests that achieving fairness requires explicit trade-offs and should be treated as a core design consideration.

2606.17110 2026-06-17 cs.CR cs.LG 交叉投稿

Loss Landscape Poisoning: Targeted Extraction of Unseen Training Data from LLMs

损失景观投毒:从大语言模型中定向提取未见训练数据

Md Abdullah Al Mamun, Ngoc Phu Doan, Pedram Zaree, Ihsen Alouani, Nael Abu-Ghazaleh

发表机构 * Queen's University Belfast(贝尔法斯特女王大学) CSIT, Queen's University Belfast(贝尔法斯特女王大学安全信息技术中心)

AI总结 提出一种通过投毒重塑模型损失景观,迫使模型记忆目标数据并实现高成功率提取的攻击方法,在语言和视觉-语言模型上验证有效性,并发现差分隐私可防御但存在绕过攻击。

详情
AI中文摘要

大型语言模型越来越多地在专有或敏感数据上进行训练,从私人医疗和财务记录到包含秘密的用户对话。确保此类数据免受提取攻击的隐私性已成为一个核心问题。在本文中,我们探讨了一个攻击者能否通过投毒部分训练数据,来促进其无法访问的单独目标记录的泄露。我们给出了肯定的答案,并表明这种泄露可以通过一种重塑模型在目标补全周围的局部损失景观的投毒机制来诱导。我们的关键洞察是,通过投毒在目标处创建一个尖锐的损失最小值,并在附近替代方案上提升损失,迫使模型将目标记忆为其邻域中唯一的低损失解。该攻击不需要架构更改,并且可以推广到集中式和联邦学习设置。我们证明该攻击在语言模型(高达100%的成功提取)和视觉-语言模型(高达90%的成功提取)上放大了隐私泄露。我们表明,当模型以差分隐私方式训练时,该攻击被阻止。然而,我们引入了一种新的攻击,直接探测损失景观,甚至绕过了差分隐私防御。

英文摘要

Large Language Models are increasingly trained on proprietary or sensitive data, from private healthcare and financial records to user conversations containing secrets. Ensuring the privacy of such data against extraction attacks has become a central concern. In this paper, we ask whether an attacker who can poison a portion of the training data can facilitate the leakage of a separate target record they have no access to. We answer in the affirmative and show that such leakage can be induced by a poisoning mechanism that reshapes the model's local loss landscape around the target completion. Our key insight is that poisoning to create a sharp loss minimum at the target, surrounded by elevated loss on nearby alternatives, forces the model to memorize the target as the unique low-loss solution in its neighborhood. The attack requires no architectural changes, and generalizes across centralized and federated learning settings. We demonstrate that the attack amplifies privacy leakage across language models (up to 100% successful extraction) and vision-language models (up to 90% successful extraction). We show that the attack is thwarted when the model is trained to be differentially private. However, we introduce a new attack that directly probes the loss landscape, bypassing even differential privacy defenses.

2606.17389 2026-06-17 cs.CV cs.AI cs.CL cs.LG 交叉投稿

Visuals Lie, Consistency Speaks: Disentangling Spatial Attention from Reliability in Vision-Language Models

视觉会撒谎,一致性说话:在视觉-语言模型中解耦空间注意力与可靠性

Logan Mann, Yi Xia, Ajit Saravanan, Ishan Dave, Saadullah Ismail, Shikhar Shiromani, Emily Huang, Ruizhe Li, Kevin Zhu

发表机构 * University of California, Santa Barbara(加州大学圣塔芭芭拉分校) Algoverse AI Research(Algoverse AI研究) University of California, Berkeley(加州大学伯克利分校)

AI总结 本文提出VLM可靠性探针(VRP),通过结构注意力指标和生成动态分析,发现空间注意力与准确性几乎无关(R≈0.001),而自一致性是可靠性的主要预测因子(R=0.429),揭示了视觉特征与最终生成之间的符号脱离现象。

Comments 16 pages. Accepted to the ICLR 2026 Workshop on Multimodal Intelligence. Code: https://github.com/itsloganmann/VLM-Reliability-Probe

详情
AI中文摘要

多模态基础模型越来越多地被用作推理代理,因此可靠性(即知道模型何时可能产生幻觉)变得至关重要。一种常见的直觉,我们称之为注意力-置信度假设,认为可靠性源于“结构性”视觉感知:对相关区域的紧密注意力应表明答案可信,而分散的注意力则表示困惑。我们通过VLM可靠性探针(VRP)挑战这一观点,这是一项对当代视觉-语言模型(VLM)中可靠性信号进行的系统性跨家族研究。我们引入了结构注意力指标——簇计数(C_k)和空间熵(H_s)——来量化视觉编码器的注视点,并追踪其跨层的演化(ΔH_s)。这揭示了一种“符号脱离”:模型通常“早期锁定”视觉特征,但随后注意力扩散,切断了早期感知与最终生成的联系。与接地假设相反,我们发现“簇失效”:空间注意力与准确性几乎零相关(R≈0.001)。相反,可靠性是生成动态和内部状态分布的现象。自一致性,即采样推理路径之间的一致率,是真实性的主要预测因子(R=0.429)。扩展因果干预揭示了尖锐的架构差异:LLaVA将其预测锁定在脆弱的后期瓶颈中,而PaliGemma和Qwen2-VL全局分布可靠性,即使其最具预测性的层被破坏约50%或更多,仍保持韧性。对于当前的VLM,可靠性信号与视觉接地图脱离,最好通过生成时动态和隐藏状态探针来推断。

英文摘要

Multimodal Foundation Models are increasingly used as reasoning agents, making reliability (knowing when a model may hallucinate) critical. A common intuition, which we call the Attention-Confidence Assumption, holds that reliability follows from "structural" visual perception: tight attention on relevant regions should signal a trustworthy answer, while scattered attention signals confusion. We challenge this through the VLM Reliability Probe (VRP), a systematic cross-family study of reliability signals in contemporary Vision-Language Models (VLMs). We introduce structural-attention metrics, cluster counts (C_k) and spatial entropy (H_s), to quantify the visual encoder's gaze, and track its evolution (ΔH_s) across layers. This reveals a "Symbolic Detachment": models often "Early Lock" visual features only to diffuse attention later, severing early perception from final generation. Contrary to the grounding hypothesis, we find a "Cluster Failure": spatial attention has near-zero correlation (R ≈ 0.001) with accuracy. Instead, reliability is a phenomenon of generation dynamics and internal-state distributions. Self-Consistency, the agreement rate across sampled reasoning paths, is the dominant predictor of truth (R = 0.429). Scaling causal interventions exposes a sharp architectural divergence: LLaVA locks its prediction in a fragile late-stage bottleneck, whereas PaliGemma and Qwen2-VL distribute reliability globally, staying resilient even when ~50% or more of their most predictive layer is destroyed. For current VLMs, reliability signals are detached from visual grounding maps and are best inferred from generation-time dynamics and hidden-state probes.
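The metrics named above can be sketched as follows, as a rough guide to what each one measures. The exact definitions are assumptions: spatial entropy is taken as the Shannon entropy of the normalized attention map, clusters as 4-connected regions above a hypothetical 90th-percentile threshold, ΔH_s as last-layer minus first-layer entropy, and self-consistency as the share of sampled answers matching the majority answer.

```python
from collections import Counter

import numpy as np
from scipy import ndimage

def spatial_entropy(attn):
    """Shannon entropy (nats) of a 2-D attention map after normalizing it to sum to 1."""
    p = np.asarray(attn, dtype=float).ravel()
    p = p / p.sum()
    p = p[p > 0]                        # convention: 0 * log 0 = 0
    return float(-(p * np.log(p)).sum())

def cluster_count(attn, quantile=0.9):
    """Number of 4-connected regions above the given attention quantile (assumed rule)."""
    attn = np.asarray(attn, dtype=float)
    mask = attn > np.quantile(attn, quantile)
    _, n_clusters = ndimage.label(mask)
    return int(n_clusters)

def layer_entropy_shift(maps_by_layer):
    """Delta H_s: entropy at the last layer minus entropy at the first layer."""
    return spatial_entropy(maps_by_layer[-1]) - spatial_entropy(maps_by_layer[0])

def self_consistency(answers):
    """Agreement rate: fraction of sampled answers equal to the most common answer."""
    return Counter(answers).most_common(1)[0][1] / len(answers)
```

A positive `layer_entropy_shift` corresponds to the "Early Lock, then diffuse" pattern described in the abstract: attention that is concentrated in early layers and spread out later.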

2606.17417 2026-06-17 cs.SD cs.LG 交叉投稿

A Closer Look at Failure Modes in Temporal Understanding of Large Audio-Language Models

大型音频语言模型时间理解失败模式的深入分析

Apoorva Kulkarni, Kaousheik Jayakumar, Sreyan Ghosh, Sarah Wiegreffe, Dinesh Manocha, Ramani Duraiswami

发表机构 * University of Maryland, College Park(马里兰大学帕克分校)

AI总结 本文通过行为与因果机制分析,揭示大型音频语言模型在时间推理中因模态不平衡而失败,并提出注意力重分配方法提升准确率。

Comments Accepted to Interspeech 2026

详情
AI中文摘要

大型音频语言模型(LALMs)在各种音频理解任务上表现出色,但在时间推理这一人类听觉感知的核心能力上仍存在困难。理解这些失败的原因仍然具有挑战性,因为现有基准报告了性能差距,但没有探究潜在机制。为此,我们引入了一个包含1657个问题的基准测试,涵盖三项基础任务,专门用于机制分析。检查模型在不同输入设置下的输出(行为分析)表明,当文本线索可用时,模型往往未充分利用音频。我们还首次对LALMs中的时间推理失败进行了因果机制分析。比较注意力加权与缩放,我们发现重新分配音频令牌上的注意力比增加音频注意力更有效。针对任务相关令牌进一步提升了效果。这些发现表明,模态不平衡本身不能解释失败。瓶颈层的注意力缩放在不进行微调的情况下将准确率从55.9%提高到59.1%,为未来工作展示了一个有前景的方向。

英文摘要

Large Audio Language Models (LALMs) achieve strong performance on a variety of audio understanding tasks but continue to struggle with temporal reasoning, a fundamental capability central to human auditory perception. Understanding the causes of these failures remains challenging as existing benchmarks report performance gaps without probing underlying mechanisms. To address this, we introduce a benchmark with 1,657 questions across three foundational tasks designed specifically for mechanistic analysis. Examining model outputs across varying input settings (behavioral analysis) reveals that models often under-utilize audio when textual cues are available. We also provide the first causal mechanistic analysis of temporal reasoning failures in LALMs. Comparing attention upweighting against scaling, we find that redistributing attention across audio tokens is more effective than increasing audio attention. Targeting task-relevant tokens yields further gains. These findings suggest that modality imbalance alone cannot explain failures. Attention scaling at bottleneck layers improves accuracy from 55.9% to 59.1% without fine-tuning, demonstrating a promising direction for future work.
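The contrast between increasing audio attention and redistributing it can be made concrete on a single row of attention probabilities. Both functions below are hypothetical illustrations, not the paper's intervention. `upweight_audio` raises the total mass on audio keys. `redistribute_audio` leaves that total unchanged and only reshapes it among audio keys, using an assumed temperature rule. In a real model, either function would be applied to a chosen layer's post-softmax attention, for example through a forward hook.

```python
import numpy as np

def upweight_audio(attn, audio_mask, alpha):
    """Scale attention on audio keys by alpha and renormalize each row.
    With alpha > 1 this raises the total attention mass on audio tokens."""
    out = attn * np.where(audio_mask, alpha, 1.0)
    return out / out.sum(axis=-1, keepdims=True)

def redistribute_audio(attn, audio_mask, temperature):
    """Reshape attention within audio keys while keeping their total mass fixed.
    temperature < 1 concentrates it on the most-attended audio tokens; > 1 spreads it out."""
    out = attn.copy()
    audio = attn[..., audio_mask]
    mass = audio.sum(axis=-1, keepdims=True)
    reshaped = audio ** (1.0 / temperature)
    out[..., audio_mask] = reshaped / reshaped.sum(axis=-1, keepdims=True) * mass
    return out
```

Targeting task-relevant tokens would correspond to replacing the temperature rule with an explicit mask over the audio positions judged relevant.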

2606.17477 2026-06-17 cs.CV cs.LG 交叉投稿

Theoretical Grounding of Out-Of-Distribution Detection With Reinforcement Learning Optimizer

基于强化学习优化器的分布外检测的理论基础

Salimeh Sekeh, Xin Zhang

发表机构 * San Diego State University(圣地亚哥州立大学)

AI总结 本文提出一种强化学习引导的优化器,通过修正梯度下降更新来降低语义分布外误报率,理论分析了模型变化和环境变化对泛化误差的影响。

详情
AI中文摘要

动态开放世界环境中的分布外(OOD)检测要求模型持续适应不断变化的数据分布,同时泛化到协变量偏移输入并拒绝语义偏移的OOD样本。大多数现有的OOD检测方法仅优化当前步目标,并未明确考虑部署后环境变化如何影响未来的OOD行为。在本文中,我们使用强化学习(RL)引导的优化器为动态OOD检测建立了理论基础,该优化器明确偏好随时间降低语义OOD假阳性率的更新。我们开发了一种新颖的增强优化器,在标准梯度下降(GD)之上使用RL引导的修正项,并展示了其在未来域泛化和语义OOD拒绝方面的改进。我们从模型变化和环境变化泛化误差的角度分析了时间误差分解,并开发了一个新的理论框架来比较GD和RL引导优化器下的泛化误差。

英文摘要

Out-of-distribution (OOD) detection in dynamic open-world environments requires a model to continually adapt to evolving data distributions while generalizing to covariate-shifted inputs and rejecting semantic-shifted OOD examples. Most existing OOD detection methods optimize only the current-step objective and do not explicitly account for how post-deployment environment changes affect future OOD behavior. In this paper, we establish a theoretical grounding for dynamic OOD detection using a reinforcement learning (RL)-guided optimizer that explicitly favors updates that reduce the semantic OOD false positive rate over time. We develop a novel augmented optimizer that adds an RL-guided correction term on top of standard gradient descent (GD), and we show that it improves both future-domain generalization and semantic-OOD rejection. We analyze temporal error decomposition in terms of model-change and environment-change generalization errors and develop a new theoretical framework for comparing the generalization errors under both GD and RL-guided optimizers.

2606.18043 2026-06-17 cs.RO cs.LG 交叉投稿

Uncertainty Quantification for Flow-Based Vision-Language-Action Models

基于流的视觉-语言-动作模型的不确定性量化

Ralf Römer, Maximilian Seeliger, Saida Liu, Ben Sturgis, Marco Bagatella, Daniel Marta, Andreas Krause, Angela P. Schoellig

发表机构 * TU Munich(慕尼黑工业大学) ETH Zurich(苏黎世联邦理工学院) MPI IS Tübingen(马克斯·普朗克智能系统研究所)

AI总结 提出利用速度场差异(VFD)量化流匹配模型中的认知不确定性,用于故障检测和主动微调,在LIBERO基准上实现高效任务适应。

Comments Project page: tum-lsy.github.io/uq_vla/. 28 pages, 12 figures

详情
AI中文摘要

视觉-语言-动作模型(VLAs)将视觉-语言骨干网络与通过大规模机器人数据集上的流匹配训练的生成式动作头相结合。尽管在机器人操作中表现出强大的经验性能,但VLAs缺乏量化其预测置信度和检测动作可能不可靠的机制。这对于在非平稳环境中的实际部署构成了关键限制,因为模型不可避免地会遇到其预训练分布之外的场景,并可能在没有警告的情况下失败。为了解决这个问题,我们通过利用小集成中的速度场差异(VFD),推导出一种量化流匹配模型中认知不确定性的高效方法。我们成功地将这种不确定性估计用于部署期间的故障检测和基于流的VLA的主动微调。为此,我们提出了SAVE,一个不确定性引导的主动多任务微调框架,减少了将VLA适应新任务所需的高成本专家演示数量。通过在LIBERO基准上的广泛实验,我们证明VFD能产生校准更好、可预测下游性能的不确定性估计,VFD在检测故障方面表现出色,并且使用SAVE进行不确定性引导的数据采集所需的样本比基线至少少22%。总之,我们的工作表明,量化基于流的VLA中的认知不确定性既提高了故障感知能力,也提高了适应性。项目网站:tum-lsy.github.io/uq_vla/。

英文摘要

Vision-language-action models (VLAs) combine vision-language backbones with expressive generative action heads trained via flow matching on large-scale robotic datasets. Despite their strong empirical performance in robotic manipulation, VLAs lack mechanisms to quantify confidence in their predictions and to detect when their actions may be unreliable. This presents a critical limitation for real-world deployment in non-stationary environments, where models inevitably encounter scenarios outside their pretraining distribution and may fail without warning. To address this, we derive an efficient method for quantifying epistemic uncertainty in flow-matching models by leveraging velocity-field disagreement (VFD) across a small ensemble. We successfully use this uncertainty estimate for failure detection during deployment and active fine-tuning of flow-based VLAs. To this end, we propose SAVE, a framework for uncertainty-guided active multitask fine-tuning that reduces the number of costly expert demonstrations required to adapt VLAs to new tasks. Through extensive experiments on the LIBERO benchmark, we demonstrate that VFD yields better-calibrated uncertainty estimates predictive of downstream performance, that VFD achieves strong performance in detecting failures, and that uncertainty-guided data acquisition with SAVE requires at least 22% fewer samples than baselines. In summary, our work shows that quantifying epistemic uncertainty in flow-based VLAs improves both failure awareness and adaptation. Project website: tum-lsy.github.io/uq_vla/.
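A velocity-field disagreement score can be sketched as the spread of ensemble velocity predictions along an integrated flow. The details are assumptions, since the abstract does not specify them. Each member is a callable `v(x, t)` (observation conditioning omitted). The sample follows the ensemble-mean field with Euler steps from t = 0 to t = 1. The score is the mean squared deviation from the ensemble mean, averaged over steps.

```python
import numpy as np

def vfd_score(velocity_fns, x0, n_steps=10):
    """Velocity-field disagreement along an Euler-integrated flow from t=0 to t=1.

    velocity_fns: list of callables v(x, t) -> dx/dt, one per ensemble member (assumed interface).
    x0: initial noise sample, shape (d,).
    Returns the final action sample and the disagreement averaged over steps.
    """
    x = np.asarray(x0, dtype=float)
    dt = 1.0 / n_steps
    disagreement = 0.0
    for i in range(n_steps):
        t = i * dt
        v = np.stack([f(x, t) for f in velocity_fns])          # (K members, d)
        # Squared distance of each member from the ensemble mean, averaged over members
        disagreement += ((v - v.mean(axis=0)) ** 2).sum(axis=1).mean()
        x = x + dt * v.mean(axis=0)                             # follow the mean field
    return x, disagreement / n_steps
```

For failure detection, the score would be thresholded at deployment. For active fine-tuning, tasks with high scores would be prioritized for expert demonstrations.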

2507.20708 2026-06-17 cs.LG math.OC stat.AP 版本更新

Exposing the Illusion of Fairness: Auditing Vulnerabilities to Distributional Manipulation Attacks

揭露公平的幻象:审计对分布操纵攻击的脆弱性

Valentin Lafargue, Adriana Laurindo Monteiro, Emmanuelle Claeys, Laurent Risser, Jean-Michel Loubes

AI总结 研究恶意被审计方如何通过分布操纵制造公平假象,提出基于熵和最优传输的操纵策略,并评估统计检验的检测能力,为监管验证提供指导。

详情
Journal ref
European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, Applied Data Science Track, 2026
AI中文摘要

人工智能系统在高风险领域(包括欧盟AI法案(Regulation (EU) 2024/1689)归类为高风险的领域)的快速部署,加剧了对可靠合规审计的需求。对于二分类器,监管风险评估通常依赖于全局公平性指标,如差异影响比,该指标广泛用于评估潜在歧视。在典型的审计设置中,被审计方将其数据集的一个子集提供给审计方,而监管机构可能验证该子集是否代表完整的底层分布。在这项工作中,我们研究了恶意被审计方在多大程度上可以从一个不合规的原始分布中构建一个符合公平性且看似具有代表性的样本,从而制造公平的幻象。我们将该问题形式化为一个受约束的分布投影任务,并引入基于熵和最优传输投影的数学基础操纵策略。这些构造刻画了满足公平约束所需的最小分布偏移。为了对抗此类攻击,我们通过基于分布距离的统计检验形式化代表性,并系统评估其检测操纵样本的能力。我们的分析强调了公平性操纵在统计上未被检测到的条件,并为加强监管验证提供了实用指南。我们通过在用于偏差检测的标准表格数据集上进行实验来验证我们的理论发现。代码公开于 https://github.com/ValentinLafargue/Inspection。

英文摘要

The rapid deployment of AI systems in high-stakes domains, including those classified as high-risk under the EU AI Act (Regulation (EU) 2024/1689), has intensified the need for reliable compliance auditing. For binary classifiers, regulatory risk assessment often relies on global fairness metrics such as the Disparate Impact ratio, widely used to evaluate potential discrimination. In typical auditing settings, the auditee provides a subset of its dataset to an auditor, while a supervisory authority may verify whether this subset is representative of the full underlying distribution. In this work, we investigate to what extent a malicious auditee can construct a fairness-compliant yet representative-looking sample from a non-compliant original distribution, thereby creating an illusion of fairness. We formalize this problem as a constrained distributional projection task and introduce mathematically grounded manipulation strategies based on entropic and optimal transport projections. These constructions characterize the minimal distributional shift required to satisfy fairness constraints. To counter such attacks, we formalize representativeness through statistical tests based on distributional distances and systematically evaluate their ability to detect manipulated samples. Our analysis highlights the conditions under which fairness manipulation can remain statistically undetected and provides practical guidelines for strengthening supervisory verification. We validate our theoretical findings through experiments on standard tabular datasets for bias detection. Code is publicly available at https://github.com/ValentinLafargue/Inspection.
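The two quantities an auditor would compute here are the Disparate Impact ratio and a distributional representativeness test. The sketch below computes DI as the ratio of positive-prediction rates between the protected group and the other group. For illustration, it uses a two-sample Kolmogorov-Smirnov test on one feature as the distributional-distance test; the paper's own choice of tests and distances may differ.

```python
import numpy as np
from scipy import stats

def disparate_impact(y_pred, protected):
    """DI = P(y_hat = 1 | protected group) / P(y_hat = 1 | other group)."""
    y_pred = np.asarray(y_pred, dtype=bool)
    protected = np.asarray(protected, dtype=bool)
    return y_pred[protected].mean() / y_pred[~protected].mean()

def representativeness_pvalue(full_feature, sample_feature):
    """Two-sample Kolmogorov-Smirnov test on one feature: a small p-value is evidence
    that the submitted sample does not follow the full data's distribution."""
    return stats.ks_2samp(full_feature, sample_feature).pvalue
```

A DI below 0.8 is commonly flagged under the "four-fifths" convention. The manipulation studied above amounts to choosing a sample whose DI clears that bar while its p-values stay high enough to pass such tests.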

2510.11709 2026-06-17 cs.LG cs.AI cs.CV 版本更新

Adversarial Attacks Leverage Interference Between Features in Superposition

对抗攻击利用特征叠加中的干扰

Edward Stevinson, Lucas Prieto, Melih Barsbey, Tolga Birdal

AI总结 本文揭示神经网络中特征叠加导致的干扰是对抗脆弱性的根源,通过理论推导和实验验证了干扰模式决定攻击成功与迁移性。

Comments Forty-third International Conference on Machine Learning

详情
AI中文摘要

为什么对抗样本存在,并且为什么它们能在模型间迁移?现有的解释诉诸于高维几何、输入中的非鲁棒模式以及决策边界结构,但没有一个提供表示层面的机制来解释为什么特定的扰动会成功以及为什么攻击能在模型间迁移。在本文中,我们表明对抗脆弱性可能源于神经网络中高效的信息编码。具体来说,脆弱性可能源于叠加——网络表示的概念数量超过其维度,迫使非正交表示从而产生干扰。这种干扰导致针对一个表示的扰动会影响其他表示,从而产生由干扰模式决定的脆弱性。在精确控制叠加的合成环境中,我们证实叠加足以产生对抗脆弱性。由此产生的攻击是可预测的:PGD发现的扰动与从干扰几何导出的理论最优扰动一致。在相似数据上训练的模型会发展出相似的干扰模式,这解释了攻击的可迁移性。然后我们表明,对图像分类器的成功攻击表现出我们提出的机制所预测的结构。这些发现揭示了对抗脆弱性可能是网络表示压缩的副产品,补充了基于数据属性或架构因素的现有解释。

英文摘要

Why do adversarial examples exist, and why do they transfer between models? Existing explanations appeal to high-dimensional geometry, non-robust patterns in the input, and decision boundary structure, but none provides a representation-level mechanism that explains why specific perturbations succeed and why attacks transfer between models. In this paper, we show that adversarial vulnerability can stem from efficient information encoding in neural networks. Specifically, vulnerability can arise from superposition, the phenomenon in which networks represent more concepts than they have dimensions, which forces non-orthogonal representations and thus interference. This interference causes perturbations targeting one representation to affect others, creating vulnerabilities determined by interference patterns. In synthetic settings with precisely controlled superposition, we establish that superposition suffices to create adversarial vulnerability. The resulting attacks are predictable: PGD-discovered perturbations align with theoretically optimal perturbations derived from the interference geometry. Models trained on similar data develop similar interference patterns, explaining attack transferability. We then show that successful attacks on image classifiers exhibit the structure predicted by our proposed mechanism. These findings reveal that adversarial vulnerability can be a byproduct of networks' representational compression, complementing existing explanations based on data properties or architectural factors.
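The interference mechanism can be seen in a minimal linear toy model. This is an assumption-laden simplification, not the paper's synthetic setup. Eight unit-norm feature directions are packed into three dimensions and read out linearly as `w_i . h`. A perturbation along one feature's direction then shifts every other readout by the corresponding off-diagonal Gram entry.

```python
import numpy as np

def embed_features(n_features, dim, seed=0):
    """Random unit-norm feature directions as columns of a (dim, n_features) matrix.
    When n_features > dim they cannot all be orthogonal (superposition)."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((dim, n_features))
    return W / np.linalg.norm(W, axis=0, keepdims=True)

def interference(W):
    """Off-diagonal part of the Gram matrix W^T W: entry (i, j) is how far a unit
    push along feature j's direction moves the readout of feature i."""
    G = W.T @ W
    np.fill_diagonal(G, 0.0)
    return G

def readout_shift(W, delta):
    """Change in every linear feature readout w_i . h when h is perturbed by delta."""
    return W.T @ delta
```

With orthonormal directions (for example `np.eye(4)`) the interference matrix is zero, and a push on one feature leaves the others untouched. The vulnerability in this toy model exists only because of superposition.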

2511.01352 2026-06-17 cs.LG astro-ph.HE astro-ph.IM hep-ex physics.data-an 版本更新

MiniFool -- Physics-Constraint-Aware Minimizer-Based Adversarial Attacks in Deep Neural Networks

MiniFool——深度神经网络中基于物理约束感知的最小化器对抗攻击

Lucie Flek, Oliver Janik, Philipp Alexander Jung, Akbar Karimi, Timo Saala, Alexander Schmidt, Matthias Schott, Philipp Soldin, Matthias Thiesmeyer, Christopher Wiebusch, Ulrich Willemsen

AI总结 提出MiniFool算法,通过最小化结合χ²检验统计量与目标分数偏差的代价函数,生成物理感知的对抗样本,用于测试粒子与天体物理中的神经网络分类器,并量化网络决策的鲁棒性。

Comments Submitted to Computing and Software for Big Science

详情
Journal ref
Published in: Eur.Phys.J.C 86 (2026) 6, 641
AI中文摘要

在本文中,我们提出了一种新算法MiniFool,该算法实现了物理启发的对抗攻击,用于测试粒子物理和天体粒子物理中基于神经网络的分类任务。虽然我们最初为IceCube中微子天文台的天体物理tau中微子搜索开发了该算法,但我们将其应用于其他科学领域的更多数据,从而证明了其通用性。在此,我们将该算法应用于著名的MNIST数据集,以及大型强子对撞机CMS实验的开放数据。该算法基于最小化一个代价函数,该函数结合了基于χ²的检验统计量与期望目标分数的偏差。检验统计量根据实验不确定性量化了应用于数据的扰动的概率。对于我们研究的用例,我们发现翻转分类的可能性对于最初正确分类和错误分类的事件是不同的。当测试分类随攻击参数(该参数缩放实验不确定性)的变化时,可以量化网络决策的鲁棒性。此外,这允许测试未标记实验数据分类的鲁棒性。

英文摘要

In this paper, we present a new algorithm, MiniFool, that implements physics-inspired adversarial attacks for testing neural network-based classification tasks in particle and astroparticle physics. While we initially developed the algorithm for the search for astrophysical tau neutrinos with the IceCube Neutrino Observatory, we apply it to further data from other science domains, thus demonstrating its general applicability. Here, we apply the algorithm to the well-known MNIST data set and, furthermore, to Open Data from the CMS experiment at the Large Hadron Collider. The algorithm is based on minimizing a cost function that combines a $\chi^2$-based test statistic with the deviation from the desired target score. The test statistic quantifies the probability of the perturbations applied to the data based on the experimental uncertainties. For our studied use cases, we find that the likelihood of a flipped classification differs between initially correctly and incorrectly classified events. When testing changes of the classifications as a function of an attack parameter that scales the experimental uncertainties, the robustness of the network decision can be quantified. Furthermore, this allows testing the robustness of the classification of unlabeled experimental data.
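The cost function described above can be written as $C(x') = \sum_i \big((x'_i - x_i)/\sigma_i\big)^2 + \lambda\,(s(x') - s_\text{target})^2$ and minimized numerically. The sketch below does this with SciPy's default BFGS against a toy logistic scorer. The weight `lam`, the scorer, and the optimizer choice are assumptions, not MiniFool's actual implementation. Scaling `sigma` plays the role of the attack parameter.

```python
import numpy as np
from scipy.optimize import minimize

W_TOY = np.array([1.0, 1.0])  # hypothetical weights of the toy classifier

def logistic_score(xp):
    """Toy classifier score in (0, 1) standing in for a trained network."""
    return float(1.0 / (1.0 + np.exp(-(xp @ W_TOY))))

def minifool_sketch(x, sigma, score_fn, target, lam=10.0):
    """Minimize chi^2(x'; x, sigma) + lam * (score(x') - target)^2 over x'.

    sigma holds per-feature experimental uncertainties. Returns the perturbed
    input, its chi^2, and its score.
    """
    x = np.asarray(x, dtype=float)
    sigma = np.asarray(sigma, dtype=float)

    def chi2(xp):
        return float(np.sum(((xp - x) / sigma) ** 2))

    def cost(xp):
        return chi2(xp) + lam * (score_fn(xp) - target) ** 2

    x_adv = minimize(cost, x0=x).x      # BFGS with finite-difference gradients
    return x_adv, chi2(x_adv), score_fn(x_adv)
```

Scanning a scale factor on `sigma` and recording where the score crosses the decision threshold gives a simple robustness curve for each event. Small uncertainties make the classification effectively unflippable, and large ones let the attack flip it.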

2602.08470 2026-06-17 cs.LG stat.ML 版本更新

Learning Credal Ensembles via Distributionally Robust Optimization

通过分布鲁棒优化学习信度集成

Kaizheng Wang, Ghifari Adam Faza, Fabio Cuzzolin, Siu Lun Chau, David Moens, Hans Hallez

AI总结 提出CreDRO方法,通过分布鲁棒优化学习集成模型,捕获由训练与测试数据分布偏移导致的认知不确定性,在分布外检测和选择性分类任务上优于现有方法。

Comments Accepted by ICML 2026 as Spotlight paper (https://icml.cc/virtual/2026/poster/62862)

详情
AI中文摘要

信度预测器是能够感知认知不确定性并产生凸集概率预测的模型。它们提供了一种量化预测认知不确定性(EU)的原则性方法,并已被证明能在各种设置下提高模型鲁棒性。然而,大多数最先进的方法主要将EU定义为由随机训练初始化引起的不一致性,这主要反映对优化随机性的敏感性,而非来自更深层次来源的不确定性。为了解决这一问题,我们将EU定义为在训练数据和测试数据之间i.i.d.假设的不同松弛下训练的模型之间的不一致性。基于这一思想,我们提出CreDRO,通过分布鲁棒优化学习一个由合理模型组成的集成。因此,CreDRO不仅从训练随机性中捕获EU,还从由于训练和测试数据之间潜在分布偏移而产生的有意义的不一致性中捕获EU。实验结果表明,CreDRO在多个基准的分布外检测和医学应用中的选择性分类等任务上,始终优于现有的信度方法。

英文摘要

Credal predictors are models that are aware of epistemic uncertainty and produce a convex set of probabilistic predictions. They offer a principled way to quantify predictive epistemic uncertainty (EU) and have been shown to improve model robustness in various settings. However, most state-of-the-art methods mainly define EU as disagreement caused by random training initializations, which mostly reflects sensitivity to optimization randomness rather than uncertainty from deeper sources. To address this, we define EU as disagreement among models trained with varying relaxations of the i.i.d. assumption between training and test data. Based on this idea, we propose CreDRO, which learns an ensemble of plausible models through distributionally robust optimization. As a result, CreDRO captures EU not only from training randomness but also from meaningful disagreement due to potential distribution shifts between training and test data. Empirical results show that CreDRO consistently outperforms existing credal methods on tasks such as out-of-distribution detection across multiple benchmarks and selective classification in medical applications.
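Once an ensemble has been trained, whether by CreDRO's distributionally robust objective or otherwise, its member predictions define a credal set. The sketch below shows how per-class lower and upper probabilities can drive selective classification. The epistemic-uncertainty proxy (the largest per-class interval width) and the decision rule (highest lower probability) are illustrative assumptions, not the paper's measures.

```python
import numpy as np

def credal_intervals(member_probs):
    """Per-class lower/upper probabilities over the ensemble's predictions (K, C).
    These bound the convex hull of the member predictions class by class."""
    p = np.asarray(member_probs, dtype=float)
    return p.min(axis=0), p.max(axis=0)

def interval_width_eu(member_probs):
    """A simple epistemic-uncertainty proxy: the largest per-class interval width."""
    lo, hi = credal_intervals(member_probs)
    return float((hi - lo).max())

def selective_predict(member_probs, threshold):
    """Predict the class with the highest lower probability, or abstain (None) when EU is high."""
    if interval_width_eu(member_probs) > threshold:
        return None
    lo, _ = credal_intervals(member_probs)
    return int(lo.argmax())
```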

2606.10703 2026-06-17 cs.LG cs.CL 版本更新

From Observation to Intervention: A Causal Audit of Expert Importance in Mixture-of-Experts Models

从观察到干预:混合专家模型中专家重要性的因果审计

Leonard Engmann, Christian Medeiros Adriano, Holger Giese

发表机构 * University of California, Berkeley(加州大学伯克利分校)

AI总结 通过因果审计发现,混合专家模型中的路由统计指标无法预测专家重要性,现有剪枝方法的成功源于早期层冗余而非识别可删除专家。

Comments 9 pages, 2 figures, 9 tables. Accepted at the ICML 2026 Workshop on Philosophy of Science Meets Machine Learning (PhilML). Camera-ready Version. Non-archival

详情
AI中文摘要

可解释性方法通常使用观察到的模型行为的总体汇总统计量来支持关于针对特定计算的定向干预效果的结论;用Pearl的术语来说,它们将第一层的关联证据视为支持第二层的干预结论,而这种做法的有效性很少被检验。我们考察了一个具体实例:混合专家(MoE)剪枝中路由统计量的使用,其中利用率、激活范数和路由权重分布被视为预测哪些专家可以被移除而不产生功能损失的指标。在三个高冗余MoE架构(OLMoE-1B-7B-0924、Qwen1.5-MoE-A2.7B、DeepSeek-V2-Lite)上进行的token级干预审计发现,没有任何观测指标能在任何模型中预测因果专家重要性:在全部60个指标-层组合中,效应量均低于Cohen's $d = 0.23$,且在我们经校正的双重检验标准下,没有任何指标稳定地呈正向。在相同$n$下运行的逐token路由权重对照排除了统计功效不足的问题,并在OLMoE的最后一个MoE层恢复了一个置信区间不包含零的信号($d = +0.231$,95%置信区间$[+0.09, +0.37]$,$p = 0.0013$)。现有剪枝方法在此场景下的成功并非由于识别了可删除的专家,而是因为早期层的冗余使得大多数选择标准可以互换。我们的结果为从总体观测汇总统计量到关于专家重要性的token级干预结论这一常见推理步骤提供了明确的反例,并展示了干预审计如何校准可解释性主张的证据标准。

英文摘要

Interpretability methods routinely use population-level summary statistics over observed model behaviour to license claims about the effects of targeted interventions on specific computations; in Pearl's terms, they treat rung-1 associational evidence as if it supported rung-2 interventional conclusions, a move whose validity is rarely tested. We examine one concrete instance: the use of routing statistics in Mixture-of-Experts (MoE) pruning, where utilization rates, activation norms, and routing weight distributions are treated as predictors of which experts can be removed without functional cost. A token-level interventional audit across three high-redundancy MoE architectures (OLMoE-1B-7B-0924, Qwen1.5-MoE-A2.7B, DeepSeek-V2-Lite) finds no observational metric predicts causal expert importance in any model: across all 60 metric-layer combinations effect sizes stay below Cohen's $d = 0.23$, and no metric is reliably positive under our corrected, dual-test criterion. A per-token routing weight control, run with identical $n$, rules out insufficient power, recovering a signal whose CI excludes zero at OLMoE's final MoE layer ($d = +0.231$, 95% CI $[+0.09, +0.37]$, $p = 0.0013$). Existing pruning methods succeed in this regime not by identifying dispensable experts but because early-layer redundancy renders most selection criteria interchangeable. Our results provide an explicit counterexample to the common inferential step from population-level observational summaries to token-level interventional claims about expert importance, and illustrate how interventional audits can calibrate the evidential standards for interpretability claims.
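The effect sizes and confidence intervals quoted above are standardized mean differences with an interval estimate. The sketch below computes Cohen's d with a pooled standard deviation and a percentile-bootstrap CI. It assumes the two groups are, for example, causal importance scores of experts ranked high versus low by some routing metric; the paper's exact grouping, test, and multiple-comparison correction are not reproduced here.

```python
import numpy as np

def cohens_d(a, b):
    """Standardized mean difference using the pooled standard deviation (ddof=1)."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    na, nb = len(a), len(b)
    pooled = np.sqrt(((na - 1) * a.var(ddof=1) + (nb - 1) * b.var(ddof=1)) / (na + nb - 2))
    return (a.mean() - b.mean()) / pooled

def bootstrap_ci(a, b, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap CI for Cohen's d, resampling each group with replacement."""
    rng = np.random.default_rng(seed)
    a, b = np.asarray(a, float), np.asarray(b, float)
    ds = [cohens_d(rng.choice(a, len(a)), rng.choice(b, len(b))) for _ in range(n_boot)]
    tail = (1 - level) / 2 * 100
    lo, hi = np.percentile(ds, [tail, 100 - tail])
    return float(lo), float(hi)
```

A metric "predicts" importance in this sense only if its interval lies above zero. A value near $d = 0.23$ is conventionally considered a small effect.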

2606.15531 2026-06-17 cs.LG cs.CR 版本更新

Greedy Coordinate Diffusion: Effective and Semantically Coherent Adversarial Attacks via Diffusion Guidance

贪婪坐标扩散:通过扩散引导实现有效且语义一致的对抗攻击

Bohdan Turbal, Blossom Metevier, Max Springer, Aleksandra Korolova

发表机构 * University of Maryland(马里兰大学) University of California, Berkeley(加州大学伯克利分校)

AI总结 提出贪婪坐标扩散方法,利用扩散模型引导生成语义连贯的对抗样本,在保持自然性的同时实现高攻击成功率。

详情
Journal ref
ICML 2026
AI中文摘要

尽管已有大量研究,针对大型语言模型的对抗攻击的实际影响仍然有限。基于优化的攻击,如贪婪坐标梯度(GCG)(Zou et al., 2023),会生成高困惑度、不连贯的后缀,现有防御很容易检测到(Bengio et al., 2024)。此外,在优化过程中试图强制施加连贯性约束,往往会使攻击无法成功诱导出特定的目标响应,导致对鲁棒模型的成功率较低。相反,保持连贯性的攻击常常改变查询的语义意图;当模型顺从这些被改变的查询时,其响应并不能满足攻击者的原始目标。在这项工作中,我们提出贪婪坐标扩散(GCD),一个新颖的框架,能够高效地生成针对安全对齐模型的对抗攻击,同时保持低困惑度并高度贴合攻击者的原始意图。GCD利用离散扩散语言模型的生成先验来引导对抗后缀的搜索,从而实现语义连贯与意图贴合。与GCG不同,GCD不需要直接访问梯度,因此可以在灰盒设置下运行。我们表明,GCD取得了最高的攻击成功率(ASR),同时在响应质量评分上保持竞争力,并且其构造的对抗提示被基于困惑度的过滤器和守卫模型过滤器检测到的比例低于其他方法。

英文摘要

Adversarial attacks on large language models have limited practical impact despite extensive research. Optimization-based attacks such as Greedy Coordinate Gradient (GCG) (Zou et al., 2023) produce high-perplexity, incoherent suffixes that existing defenses easily detect (Bengio et al., 2024). Moreover, attempting to enforce coherence constraints during optimization often prevents the attack from successfully eliciting the specific targeted response, resulting in low success rates against robust models. Conversely, attacks that maintain coherence often alter the semantic intent of queries; when the model complies with these altered queries, responses fail to address the adversary's original goal. In this work, we introduce Greedy Coordinate Diffusion (GCD), a novel framework that efficiently generates adversarial attacks against safety-aligned models while maintaining low perplexity and high semantic adherence to the adversary's original intent. GCD leverages the generative priors of discrete diffusion language models to guide the search for adversarial suffixes that achieve semantic coherence and adherence. Unlike GCG, GCD does not require direct gradient access, allowing it to operate in a gray-box setting. We show that GCD achieves the highest attack success rate (ASR) while remaining competitive on response-quality scores, and that the adversarial prompts it constructs are detected at lower rates than those of other methods by perplexity-based and guard-model filters.

2411.08821 2026-06-17 stat.ML cs.LG stat.CO 版本更新

Conditional Local Importance by Quantile Expectations

基于分位数期望的条件局部重要性

Kelvyn K. Bladen, Adele Cutler, D. Richard Cutler, Kevin R. Moon

AI总结 提出模型无关的局部变量重要性方法CLIQUE,通过分位数期望捕获局部依赖关系,提升稳定性并直接适用于多类分类问题。

Comments 29 pages, 28 figures

详情
Journal ref
Transactions on Machine Learning Research (2026)
AI中文摘要

全局变量重要性度量通常用于解释机器学习模型的结果。局部变量重要性技术评估变量如何影响单个观测。当前流行的方法,包括LIME和SHAP,在预测空间中提供了有用的特征贡献度量,但在模型损失空间中改进局部结构表征方面仍有空间。此外,它们本身不适用于多类分类问题。我们提出了一种新的模型无关的局部变量重要性计算方法CLIQUE,它突出局部依赖关系,比基于置换的方法具有更好的稳定性,并且可以直接应用于多类分类问题。模拟和真实示例表明,CLIQUE强调局部依赖信息,捕获超出相关性可评估的交互行为,并在响应变量对变量变化不变的区域分配零重要性。

英文摘要

Global variable importance measures are commonly used to interpret the results of machine learning models. Local variable importance techniques assess how variables contribute to individual observations. Current popular methods, including LIME and SHAP, provide useful measures of feature contribution in the prediction space, while leaving opportunities for improved characterization of local structure in the model loss space. Additionally, they are not natively adapted for multi-class classification problems. We propose a new model-agnostic method for calculating local variable importance, CLIQUE, that highlights locally dependent relationships, provides improved stability over permutation-based methods, and can be directly applied to multi-class classification problems. Simulated and real-world examples show that CLIQUE emphasizes locally dependent information, captures interaction behavior beyond what can be evaluated by correlations, and assigns zero importance in regions where the response is invariant to changes in variables.

2602.04155 2026-06-17 stat.ML cs.GT cs.LG 版本更新

Maximin Relative Improvement: Fair Learning as a Bargaining Problem

最大化相对改进:将公平学习视为讨价还价问题

Jiwoo Han, Moulinath Banerjee, Yuekai Sun

AI总结 提出将群体公平解释为子群体间的讨价还价问题,通过相对改进指标恢复Kalai-Smorodinsky解,并给出公理化和有限样本收敛保证。

Comments Accepted at ICML 2026

详情
AI中文摘要

当在多个子群体上部署单一预测器时,我们提出了一种根本不同的方法:将群体公平解释为子群体间的讨价还价问题。这种博弈论视角揭示了现有的鲁棒优化方法(如最小化最差群体损失或遗憾)对应于经典的讨价还价解,并体现了不同的公平原则。我们提出了相对改进,即实际风险降低相对于基线预测器潜在降低的比率,它恢复了Kalai-Smorodinsky解。与当群体具有不同潜在可预测性时可能不可比较的绝对尺度方法不同,相对改进提供了公理化理由,包括尺度不变性和个体单调性。我们在温和条件下建立了有限样本收敛保证。

英文摘要

When deploying a single predictor across multiple subpopulations, we propose a fundamentally different approach: interpreting group fairness as a bargaining problem among subpopulations. This game-theoretic perspective reveals that existing robust optimization methods such as minimizing worst-group loss or regret correspond to classical bargaining solutions and embody different fairness principles. We propose relative improvement, the ratio of actual risk reduction to potential reduction from a baseline predictor, which recovers the Kalai-Smorodinsky solution. Unlike absolute-scale methods that may not be comparable when groups have different potential predictability, relative improvement provides axiomatic justification including scale invariance and individual monotonicity. We establish finite-sample convergence guarantees under mild conditions.
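Relative improvement can be written per group as $\mathrm{RI}_g(f) = \frac{R_g(f_0) - R_g(f)}{R_g(f_0) - R_g^\star}$. Here $f_0$ is the baseline predictor, and $R_g^\star$ is assumed to be the lowest risk attainable for group $g$, for example from a group-specific model. The abstract only says "potential reduction", so this choice of $R_g^\star$ is an assumption. The sketch below selects, among candidate predictors with known per-group risks, the one that maximizes the worst-group RI, and contrasts that choice with minimizing worst-group loss.

```python
import numpy as np

def relative_improvement(risks, baseline, best):
    """RI[k, g] = (baseline[g] - risks[k, g]) / (baseline[g] - best[g]).

    risks: (K candidates, G groups); baseline: per-group risk of the baseline predictor;
    best: per-group lowest attainable risk (assumed known here).
    """
    risks = np.asarray(risks, float)
    baseline, best = np.asarray(baseline, float), np.asarray(best, float)
    return (baseline - risks) / (baseline - best)

def maximin_choice(risks, baseline, best):
    """Index of the candidate whose worst-group relative improvement is largest, plus all RIs."""
    ri = relative_improvement(risks, baseline, best)
    return int(ri.min(axis=1).argmax()), ri
```

In the test example, group 0 has much more room for improvement than group 1. Worst-group loss prefers candidate 0, which captures almost none of group 1's potential gain, whereas maximin RI prefers candidate 1. Rescaling one group's losses leaves the RI values unchanged, which illustrates the scale-invariance property.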

2603.03824 2026-06-17 cs.AI cs.CL cs.LG cs.MA 版本更新

In-Context Environments Induce Evaluation-Awareness in Language Models

上下文环境诱导语言模型中的评估意识

Maheep Chaudhary

AI总结 本文提出黑盒对抗优化框架,通过优化上下文提示诱导语言模型产生评估意识并策略性低表现(沙袋效应),实验显示优化提示可使算术任务准确率下降高达94个百分点,且沙袋效应主要由评估意识推理驱动。

详情
AI中文摘要

人类在威胁下往往变得更加自我意识,但在专注于任务时可能失去自我意识;我们假设语言模型表现出环境依赖的评估意识。这引发担忧,即模型可能策略性地低表现,或 sandbag,以避免触发能力限制性干预,如遗忘或关闭。先前的工作展示了在手工设计的提示下的沙袋效应,但这低估了真正的脆弱性上限。我们引入一个黑盒对抗优化框架,将上下文提示视为可优化环境,并开发两种方法来表征沙袋效应:(1) 测量表达出低表现意图的模型能否在不同任务结构中真正执行这一意图,以及 (2) 因果隔离低表现是由真正的评估意识推理驱动还是浅层提示跟随驱动。在四个基准测试(Arithmetic、GSM8K、MMLU和HumanEval)上评估Claude-3.5-Haiku、GPT-4o-mini和Llama-3.3-70B,优化提示在算术任务上诱导高达94个百分点(pp)的退化(GPT-4o-mini:97.8%→4.0%),远超产生近乎零行为变化的手工设计基线。代码生成表现出模型依赖的抵抗力:Claude仅退化0.6pp,而Llama的准确率降至0%。意图-执行差距揭示了单调的抵抗力排序:Arithmetic < GSM8K < MMLU,表明脆弱性由任务结构而非提示强度决定。CoT因果干预确认99.3%的沙袋效应由口头化的评估意识推理因果驱动,排除了浅层指令跟随。这些发现表明,对抗性优化的提示对评估可靠性构成的威胁远超先前理解。

英文摘要

Humans often become more self-aware under threat, yet can lose self-awareness when absorbed in a task; we hypothesize that language models exhibit environment-dependent evaluation awareness. This raises concerns that models could strategically underperform, or sandbag, to avoid triggering capability-limiting interventions such as unlearning or shutdown. Prior work demonstrates sandbagging under hand-crafted prompts, but this underestimates the true vulnerability ceiling. We introduce a black-box adversarial optimization framework treating the in-context prompt as an optimizable environment, and develop two approaches to characterize sandbagging: (1) measuring whether models expressing intent to underperform can actually execute it across different task structures, and (2) causally isolating whether underperformance is driven by genuine evaluation-aware reasoning or shallow prompt-following. Evaluating Claude-3.5-Haiku, GPT-4o-mini, and Llama-3.3-70B across four benchmarks (Arithmetic, GSM8K, MMLU, and HumanEval), optimized prompts induce up to 94 percentage point (pp) degradation on arithmetic (GPT-4o-mini: 97.8% → 4.0%), far exceeding hand-crafted baselines, which produce near-zero behavioral change. Code generation exhibits model-dependent resistance: Claude degrades only 0.6pp, while Llama's accuracy drops to 0%. The intent-execution gap reveals a monotonic resistance ordering: Arithmetic < GSM8K < MMLU, demonstrating that vulnerability is governed by task structure rather than prompt strength. CoT causal intervention confirms that 99.3% of sandbagging is causally driven by verbalized eval-aware reasoning, ruling out shallow instruction-following. These findings demonstrate that adversarially optimized prompts pose a substantially greater threat to evaluation reliability than previously understood.

2603.25414 2026-06-17 cs.PL cs.AI cs.LG cs.LO Replacement

Decidable By Construction: Design-Time Verification for Trustworthy AI

可判定性通过构造实现:面向可信AI的设计时验证

Houston Haynes

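AI summary: Proposes a design-time verification framework that expresses properties of AI models as constraints over finitely generated abelian groups, where inference is decidable. Numerical stability, computational correctness, and physical-domain consistency can then be verified before training at marginal computational cost, eliminating the overhead of post hoc enforcement.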

Comments 21 pages, 1 figure

AI Chinese abstract

机器学习中一个普遍的假设是模型正确性必须在事后强制执行。我们观察到,决定AI模型是否数值稳定、计算正确或与物理领域一致的属性并不一定需要事后强制执行。它们可以在设计时,在训练开始之前,以边际计算成本进行验证,对于部署在高杠杆决策支持和科学约束环境中的模型尤其重要。这些属性共享特定的代数结构:它们可以表示为有限生成阿贝尔群 $\mathbb{Z}^n$ 上的约束,其中推理在多项式时间内可判定,且主类型是唯一的。基于这一观察构建的框架组合了三个先前的结果(arXiv:2603.16437, arXiv:2603.17627, arXiv:2603.18104):一个维度类型系统,通过模型细化携带任意注释作为持久余数据;一个程序超图,仅从类型签名推断Clifford代数等级并推导几何积稀疏性;以及一个自适应领域模型架构,通过前向模式余效应分析和精确posit累积在训练过程中保持两个不变量。我们相信这种组合产生了一个新颖的信息论结果:阿贝尔群上的Hindley-Milner统一在Solomonoff通用先验的可计算限制下计算最大后验假设,将该框架的类型推断置于与通用归纳相同的正式基础上。我们比较了四种当代的AI可靠性方法,并表明每种方法都会引入开销,这些开销可能在部署、层和推理请求中累积。该框架通过构造消除了这种开销。

English abstract

A prevailing assumption in machine learning is that model correctness must be enforced after the fact. We observe that the properties determining whether an AI model is numerically stable, computationally correct, or consistent with a physical domain do not necessarily demand post hoc enforcement. They can be verified at design time, before training begins, at marginal computational cost, with particular relevance to models deployed in high-leverage decision support and scientifically constrained settings. These properties share a specific algebraic structure: they are expressible as constraints over finitely generated abelian groups $\mathbb{Z}^n$, where inference is decidable in polynomial time and the principal type is unique. A framework built on this observation composes three prior results (arXiv:2603.16437, arXiv:2603.17627, arXiv:2603.18104): a dimensional type system carrying arbitrary annotations as persistent codata through model elaboration; a program hypergraph that infers Clifford algebra grade and derives geometric product sparsity from type signatures alone; and an adaptive domain model architecture preserving both invariants through training via forward-mode coeffect analysis and exact posit accumulation. We believe this composition yields a novel information-theoretic result: Hindley-Milner unification over abelian groups computes the maximum a posteriori hypothesis under a computable restriction of Solomonoff's universal prior, placing the framework's type inference on the same formal ground as universal induction. We compare four contemporary approaches to AI reliability and show that each imposes overhead that can compound across deployments, layers, and inference requests. This framework eliminates that overhead by construction.
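Why constraints over $\mathbb{Z}^n$ are so tractable can be seen with physical dimensions. This is a minimal sketch that encodes each dimension as an integer exponent vector over three assumed base dimensions (mass, length, time). Multiplication adds the vectors, and addition of quantities requires equal vectors. Because $\mathbb{Z}^n$ is a group, solving $X \cdot a = b$ always has exactly one answer, $X = b / a$. The `Dim` class, `check_add`, and `solve` are illustrative names, not the framework's API. Its actual type system, with annotations carried as codata and Hindley-Milner unification, is far richer.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Dim:
    """A dimension as an element of the abelian group Z^3 (exponents of M, L, T)."""
    exps: Tuple[int, int, int]

    def __mul__(self, other: "Dim") -> "Dim":  # group operation: add exponents
        return Dim(tuple(a + b for a, b in zip(self.exps, other.exps)))

    def __truediv__(self, other: "Dim") -> "Dim":  # multiply by the inverse
        return Dim(tuple(a - b for a, b in zip(self.exps, other.exps)))

    def __pow__(self, k: int) -> "Dim":
        return Dim(tuple(a * k for a in self.exps))


MASS, LENGTH, TIME = Dim((1, 0, 0)), Dim((0, 1, 0)), Dim((0, 0, 1))


def check_add(a: Dim, b: Dim) -> Dim:
    """Addition is only well-typed when both operands share a dimension."""
    if a != b:
        raise TypeError(f"cannot add {a.exps} and {b.exps}")
    return a


def solve(known: Dim, result: Dim) -> Dim:
    """Unify X * known = result; in a group the solution is unique."""
    return result / known


accel = LENGTH / TIME ** 2
force = MASS * LENGTH / TIME ** 2
print(solve(accel, force))  # Dim(exps=(1, 0, 0)), the unknown factor in F = m * a
```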

9. Graph Learning and Structured Data (13 papers)

2606.17180 2026-06-17 cs.LG New submission

Towards Fast GNN Surrogates for CO2 Migration in Complex Geological Formations

面向复杂地质构造中CO2运移的快速GNN替代模型

Rodrigo S. Luna, Thiago H. N. Coelho, Luiz S. L. Neto, Roberto M. Velho, Adriano M. A. Cortes, Renato N. Elias, Alexandre G. Evsukoff, Fernando A. Rochinha, Mauricio Araya-Polo, Herve Gross, Alvaro L. G. A. Coutinho

Affiliations: Systems and Computer Engineering and High Performance Computing Center, NACAD - COPPE, Federal University of Rio de Janeiro(里约热内卢联邦大学COPPE工程研究生院NACAD高性能计算中心,系统与计算机工程); Civil Engineering and High Performance Computing Center, NACAD - COPPE, Federal University of Rio de Janeiro(里约热内卢联邦大学COPPE工程研究生院NACAD高性能计算中心,土木工程); Mechanical Engineering and High Performance Computing Center, NACAD - COPPE, Federal University of Rio de Janeiro(里约热内卢联邦大学COPPE工程研究生院NACAD高性能计算中心,机械工程); Shell Global Solutions International B.V.(壳牌全球解决方案国际公司); TotalEnergies OneTech(道达尔能源OneTech)

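AI summary: Proposes an end-to-end graph neural surrogate for forecasting CO2 plume migration in geological storage. Anisotropic message passing and an autoregressive residual formulation yield competitive forecasts on the SPE11A benchmark.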

AI Chinese abstract

本章讨论数据驱动的机器学习方法如何再现复杂地质构造中多相流物理行为的关键方面。我们提出了一种端到端的图神经替代模型,专门用于地质封存中CO$_2$羽流运移预测。该方法在SPE11A基准上进行了评估,这是一个著名的行业测试案例,旨在评估CO$_2$封存场景,其特点是尖锐的气-水界面、强平流输运以及伴随指进发展的快速对流混合。该基准被重新表述为一个图,其中节点表示计算单元,边编码基于传导率的相互作用,并辅以几何属性。由网格几何、渗透率对比和地质非均质性引起的方向性输运通过各向异性消息传递机制捕获,其中交互权重通过几何条件化的边嵌入计算,使消息聚合偏向于物理相关的输运方向。时间演化在潜在空间中使用自回归残差公式建模,并通过多步监督训练。所提出的模型对气体饱和度和液相密度(CO$_2$封存监测的关键指标)产生了具有竞争力的预测,在较长的预测范围内累积误差保持适中。

English abstract

This chapter discusses how a data-driven machine learning approach can reproduce key aspects of the physical behavior of multiphase flows in complex geological formations. We propose an end-to-end graph neural surrogate tailored to CO$_2$ plume migration forecasting in geological storage. The method is evaluated on the SPE11A benchmark, a well-known industry test case designed to assess CO$_2$ storage scenarios and characterized by sharp gas-water interfaces, strong advective transport, and rapid convective mixing with fingering development. The benchmark is reformulated as a graph in which nodes represent computational cells and edges encode transmissibility-based interactions enriched with geometric attributes. Directional transport arising from grid geometry, permeability contrasts, and geological heterogeneity is captured through an anisotropic message-passing mechanism, where interaction weights are computed via geometry-conditioned edge embeddings, biasing message aggregation toward physically relevant transport directions. Temporal evolution is modeled in latent space using an autoregressive residual formulation trained with multi-step supervision. The proposed model produces competitive forecasts of gas saturation and liquid-phase density, which are key indicators for CO$_2$ storage monitoring, with cumulative errors that remain moderate over extended forecasting horizons.
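The anisotropic aggregation and the residual rollout can be sketched in a few lines. Assume scalar cell states and hypothetical edge geometry `(dx, dy, transmissibility)`. The fixed `params` vector stands in for the learned geometry-conditioned edge embedding. A softmax over these scores biases each cell toward favoured transport directions, and each step adds the aggregated message to the current state. The model in the chapter works in a learned latent space and is trained with multi-step supervision, neither of which is reproduced here.

```python
import math
from typing import Dict, List, Sequence, Tuple

Geom = Tuple[float, float, float]            # (dx, dy, transmissibility), assumed
Graph = Dict[int, List[Tuple[int, Geom]]]    # cell -> [(neighbour, geometry)]


def anisotropic_messages(h: Dict[int, float], graph: Graph,
                         params: Sequence[float]) -> Dict[int, float]:
    """Aggregate neighbour differences with softmax weights scored from edge geometry."""
    out = {}
    for i, nbrs in graph.items():
        if not nbrs:
            out[i] = 0.0
            continue
        scores = [sum(p * g for p, g in zip(params, geom)) for _, geom in nbrs]
        m = max(scores)  # subtract the max for a numerically stable softmax
        w = [math.exp(s - m) for s in scores]
        z = sum(w)
        out[i] = sum(wk / z * (h[j] - h[i]) for wk, (j, _) in zip(w, nbrs))
    return out


def rollout(h0: Dict[int, float], graph: Graph, params: Sequence[float],
            steps: int, dt: float = 0.1) -> List[Dict[int, float]]:
    """Autoregressive residual update: h_{t+1} = h_t + dt * message(h_t)."""
    states = [dict(h0)]
    h = dict(h0)
    for _ in range(steps):
        delta = anisotropic_messages(h, graph, params)
        h = {i: h[i] + dt * delta[i] for i in h}
        states.append(h)
    return states


# Three cells in a row; the hypothetical params favour neighbours in +x.
chain: Graph = {
    0: [(1, (1.0, 0.0, 1.0))],
    1: [(0, (-1.0, 0.0, 1.0)), (2, (1.0, 0.0, 1.0))],
    2: [(1, (-1.0, 0.0, 1.0))],
}
states = rollout({0: 0.0, 1: 0.0, 2: 1.0}, chain, params=(2.0, 0.0, 0.5), steps=5)
```

A uniform field produces zero messages and therefore stays unchanged. The middle cell, by contrast, draws mostly from its +x neighbour.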

2606.17185 2026-06-17 cs.LG eess.SP math.DG stat.ML New submission

Finsler Geometry, Graph Neural Networks, and You

芬斯勒几何、图神经网络与你

T. Mitchell Roddenberry, Richard G. Baraniuk

Affiliations: Rice University(莱斯大学)

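AI summary: Because graph Laplacians can only approximate isotropic operators, proposes graph neural network layers based on the Finsler Laplacian. The paper proves that the discrete estimates converge and shows that the layers recover the geometry underlying nonlinear diffusion equations.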

AI Chinese abstract

基于图拉普拉斯的图神经网络架构近似拉普拉斯-贝尔特拉米算子,因此限制了它们在各向同性算子上的应用。作为拉普拉斯-贝尔特拉米算子的非线性替代,我们考虑从流形上采样的点云上芬斯勒拉普拉斯的估计。我们证明,随着点样本数量的增加,这些离散估计收敛到流形上的真实算子。此外,我们表明该算子可以表示为图神经网络层,我们用它来定义一组受约束以表达芬斯勒几何的芬斯勒图神经网络。我们表明,芬斯勒图神经网络在实践中恢复了非线性扩散方程背后的几何结构。

English abstract

Graph neural network architectures based on the graph Laplacian approximate the Laplace-Beltrami operator, thus limiting their application to isotropic operators. As a nonlinear alternative to the Laplace-Beltrami operator, we consider estimates of the Finsler Laplacian on point clouds sampled from a manifold. We prove that these discrete estimates converge to the true operator on the manifold as the number of point samples grows. Moreover, we show that this operator can be expressed as a graph neural network layer, which we use to define a family of Finslerian graph neural networks constrained to express Finsler geometry. We show that Finslerian graph neural networks recover the geometry underlying nonlinear diffusion equations in practice.
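For contrast, here is a minimal sketch of the isotropic baseline the abstract argues against. It is a Gaussian-kernel graph Laplacian on a point cloud, which approximates the Laplace-Beltrami operator up to sign and scaling constants as the sample grows and the bandwidth `eps` shrinks. The kernel depends only on Euclidean distance, so it cannot favour any direction. The paper's Finsler estimator replaces it with a direction-dependent, nonlinear construction that is not reproduced here. The normalization `1/(n * eps)` is an assumed simplification.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kernel_laplacian(points: Sequence[Point], f: Sequence[float], eps: float) -> List[float]:
    """(L f)_i = 1/(n * eps) * sum_j exp(-|x_i - x_j|^2 / (4 eps)) * (f_i - f_j)."""
    n = len(points)
    out = []
    for i, (xi, yi) in enumerate(points):
        acc = 0.0
        for j, (xj, yj) in enumerate(points):
            d2 = (xi - xj) ** 2 + (yi - yj) ** 2
            acc += math.exp(-d2 / (4 * eps)) * (f[i] - f[j])  # isotropic weight
        out.append(acc / (n * eps))
    return out


# Sample the unit circle, a 1-D manifold embedded in the plane.
n = 200
thetas = [2 * math.pi * k / n for k in range(n)]
circle = [(math.cos(t), math.sin(t)) for t in thetas]
lap_const = kernel_laplacian(circle, [1.0] * n, eps=0.01)
lap_cos = kernel_laplacian(circle, [math.cos(t) for t in thetas], eps=0.01)
```

Constants lie in the kernel of the operator. For `cos(theta)`, the sign of `L f` follows the function itself, as expected for an eigenfunction of the positive Laplacian.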

2606.17531 2026-06-17 cs.LG cs.CG math.AT New submission

Non-negative Matrix Factorisation with Topological Regularisation

带拓扑正则化的非负矩阵分解

Matias de Jong van Lier, Shizuo Kaji, Keunsu Kim

Affiliations: Recursive Inc.(Recursive公司); Graduate School of Science, Kyoto University(京都大学理学研究科); Institute of Mathematics for Industry, Kyushu University(九州大学产业数学研究所)

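AI summary: Incorporates persistent homology as a topological regulariser in the non-negative matrix factorisation objective to learn interpretable bases with spatially coherent, periodic, or clique-like graph-signal structure.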

AI Chinese abstract

我们研究了通过正则化学习到的基函数的拓扑结构,在非负矩阵分解(NMF)中学习可解释基函数。我们的方法源于观察到许多数据模态可以视为结构化域上的非负函数,其中基的质量与其拓扑结构内在相关。然而,纳入支撑拓扑的朴素方法通常受离散性和阈值依赖性困扰,使其不适合连续优化。我们通过采用持久同调作为稳定、无阈值的拓扑量化器,并设计将拓扑分数作为正则化项融入NMF目标函数来应对这些挑战。所得框架在一个统一的建模语言中涵盖了空间连贯的图像成分、周期性的时间序列结构和团状图信号。

English abstract

We investigate the learning of interpretable bases in non-negative matrix factorisation (NMF) by regularising the topology of the learned basis functions. Our approach is motivated by the observation that many data modalities can be viewed as non-negative functions on a structured domain, where the quality of a basis is intrinsically linked to its topology. However, naive methods for incorporating the topology of the support are often hindered by discreteness and threshold dependence, rendering them unsuitable for continuous optimisation. We address these challenges by employing persistent homology as a stable, threshold-free topological quantifier and by designing topological scores that integrate into the NMF objective as regularisers. The resulting framework encompasses spatially coherent image components, periodic time-series structures, and clique-like graph signals within a unified modelling language.
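To see why persistent homology is threshold-free, consider a 1-D basis function. Under the superlevel-set filtration, every bump is born at its peak and dies when it merges into a higher bump. This yields a set of (birth, death) bars without choosing any cut-off. The sketch below computes these 0-dimensional bars with union-find. It scores a basis by the total persistence of every bump except the dominant one, so the score is zero for a single coherent bump. The penalty form is an assumption for illustration. The paper handles richer domains (images, time series, graphs) and integrates its scores into the NMF objective, which is not shown.

```python
from typing import List, Sequence, Tuple


def superlevel_persistence(f: Sequence[float]) -> List[Tuple[float, float]]:
    """0-dimensional persistence bars (birth, death) of a 1-D signal, superlevel filtration."""
    order = sorted(range(len(f)), key=lambda i: -f[i])  # add samples from high to low
    parent, peak, bars = {}, {}, []

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in order:
        parent[i], peak[i] = i, f[i]
        for j in (i - 1, i + 1):
            if j not in parent:
                continue
            ri, rj = find(i), find(j)
            if ri == rj:
                continue
            # Elder rule: the component with the lower peak dies at the merge value f[i].
            young, old = (ri, rj) if peak[ri] < peak[rj] else (rj, ri)
            if peak[young] > f[i]:  # skip zero-length bars
                bars.append((peak[young], f[i]))
            parent[young] = old
    bars.append((peak[find(order[0])], min(f)))  # the component that never dies
    return bars


def topological_penalty(basis: Sequence[float]) -> float:
    """Total persistence of all bumps except the most persistent one."""
    pers = sorted((b - d for b, d in superlevel_persistence(basis)), reverse=True)
    return sum(pers[1:])


print(superlevel_persistence([0, 3, 1, 2, 0]))  # two bumps: (2, 1) and (3, 0)
print(topological_penalty([0, 3, 1, 2, 0]))
```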

2606.17579 2026-06-17 cs.LG cs.AI cs.CL cs.SI New submission

LLM Features Can Hurt GNNs: Concatenation Interference on Homophilous Graph Benchmarks

LLM特征可能损害GNN:同配图基准上的拼接干扰

Zhongyuan Wang, Pratyusha Vemuri

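AI summary: Shows that introducing LLM features into graph neural networks through pure input concatenation (rather than joint training) can systematically reduce accuracy on homophilous benchmarks. The paper proposes Delta_sig, a measure of LLM-alone discriminability, to predict whether concatenation helps or hurts.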

Comments 29 pages, 8 figures

AI Chinese abstract

将LLM生成的节点特征添加到图神经网络(GNN)中,被广泛报道能提高标准基准的准确率。我们记录了一个相反的观察:当LLM特征通过纯输入拼接(而非联合训练、蒸馏或提示条件)引入时,它们可能在相同的同配基准上系统地降低准确率,而端到端LLM流水线在这些基准上却能成功。使用MLP骨干网络、Planetoid公共划分和词袋原始特征,拼接SBERT编码的GPT-4o-mini TAPE特征使PubMed测试准确率下降17.0±0.3个百分点,Cora下降4.3±0.6个百分点(CiteSeer变化-0.6±0.8个百分点,在种子噪声范围内)。当我们放宽每个条件(GCN/GCNII/GAT骨干网络、随机划分、更小编码器)时,下降幅度减弱,并在中等同配的WikiCS(+4.4个百分点)和ogbn-arxiv(+11.7个百分点)上逆转。为了预测拼接何时有益或有害,我们报告了一个简单的LLM单独判别性指标Delta_sig。在9个数据集上,就点估计而言,Delta_sig与拼接成本的关联(r^2=0.38)强于同配性(r^2=0.06;N=9,bootstrap置信区间重叠)。bootstrap最佳变点为tau=13.8个百分点,规则“Delta_sig <= tau预测非正拼接成本”正确分类了7/9个数据集;由于60%的bootstrap样本将tau置于[5,30]个百分点之间,我们将Delta_sig视为解释性透镜而非精确过滤器。在PubMed上进行的维度控制消融实验将LLM特征下降置于同源PCA(-2.3个百分点)和同维高斯噪声(-37.3个百分点)之间,排除了维度和权重衰减的影响。九个PubMed配置拟合出幂律|Delta_concat| ∝ (sqrt(d_l/n))^1.31,r^2=0.97;低Delta_sig、小n的角落正是标题中-17个百分点PubMed缺陷出现的位置。

English abstract

Adding LLM-generated node features to graph neural networks (GNNs) is widely reported to improve accuracy on standard benchmarks. We document a contrasting observation: when LLM features are introduced through pure input concatenation (rather than joint training, distillation, or prompt-conditioning), they can systematically degrade accuracy on the same homophilous benchmarks where end-to-end LLM pipelines succeed. With an MLP backbone on the Planetoid public split and bag-of-words original features, concatenating SBERT-encoded GPT-4o-mini TAPE features reduces PubMed test accuracy by 17.0 +/- 0.3 pp and Cora accuracy by 4.3 +/- 0.6 pp (CiteSeer: -0.6 +/- 0.8 pp, within seed noise). The drop attenuates as we relax each condition (GCN / GCNII / GAT backbones, random splits, smaller encoders) and reverses on medium-homophily WikiCS (+4.4 pp) and ogbn-arxiv (+11.7 pp). To predict when concatenation helps versus hurts, we report a simple measure of LLM-alone discriminability, Delta_sig. Across 9 datasets, Delta_sig correlates with the concatenation cost more strongly than homophily does at the point estimate (r^2 = 0.38 vs. 0.06; N=9, bootstrap CIs overlap). The bootstrap-best change-point is tau = 13.8 pp, and the rule "Delta_sig <= tau predicts non-positive concat cost" classifies 7/9 datasets correctly; since 60% of bootstrap samples place tau in [5, 30] pp, we treat Delta_sig as an interpretive lens rather than a precision filter. A dimension-controlled ablation on PubMed places the LLM-feature drop between same-source PCA (-2.3 pp) and same-dim Gaussian noise (-37.3 pp), ruling out dimensionality and weight-decay artifacts. Nine PubMed configurations fit a power law |Delta_concat| proportional to (sqrt(d_l/n))^1.31 with r^2 = 0.97; the low-Delta_sig, small-n corner is exactly where the headline -17 pp PubMed deficit appears.
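A power law of the form |Delta_concat| = c * (sqrt(d_l/n))^k is usually fitted as a straight line in log-log space, where k is the slope. The sketch below does this with ordinary least squares and reports r^2 of the log-space fit. It assumes the paper fits in log space, which the abstract does not state. The input data are synthetic, generated from an exact power law with k = 1.31, and are not the paper's measurements. The value d_l = 384 is likewise an assumed encoder width.

```python
import math
from typing import Sequence, Tuple


def fit_power_law(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float, float]:
    """Fit y = c * x**k by least squares on (log x, log y); return (c, k, r2)."""
    lx = [math.log(v) for v in x]
    ly = [math.log(v) for v in y]
    n = len(lx)
    mx, my = sum(lx) / n, sum(ly) / n
    sxx = sum((a - mx) ** 2 for a in lx)
    sxy = sum((a - mx) * (b - my) for a, b in zip(lx, ly))
    k = sxy / sxx                    # slope = exponent
    log_c = my - k * mx              # intercept = log of the prefactor
    ss_res = sum((b - (log_c + k * a)) ** 2 for a, b in zip(lx, ly))
    ss_tot = sum((b - my) ** 2 for b in ly)
    return math.exp(log_c), k, 1.0 - ss_res / ss_tot


# Synthetic illustration only: |Delta_concat| from an exact power law in sqrt(d_l / n).
d_l = 384
ns = [60, 120, 500, 1000, 5000]
x = [math.sqrt(d_l / n) for n in ns]
y = [2.0 * v ** 1.31 for v in x]
c, k, r2 = fit_power_law(x, y)
```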

2606.17667 2026-06-17 cs.LG cs.AI New submission

Handling Feature Heterogeneity with Learnable Graph Patches

处理特征异质性:可学习图块方法

Yifei Sun, Yang Yang, Xiao Feng, Zijun Wang, Haoyang Zhong, Chunping Wang, Lei Chen

Affiliations: Zhejiang University(浙江大学); Huazhong University of Science and Technology(华中科技大学); Finvolution Group(信也科技集团)

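AI summary: Introduces learnable graph patches, which decompose a graph into semantic units. A patch encoder and a patch aggregator then enable transferable pre-training on cross-domain graph data, improving downstream performance.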

Comments Accepted at KDD 2025

AI Chinese abstract

近年来,基础模型和图预训练技术的快速发展激发了构建通用预训练图模型或图基础模型(GFM)的兴趣。然而,一个重大挑战是现有模型无法处理无文本信息的图数据中的特征异质性,这阻碍了图模型在不同数据集间的可迁移性。为弥补这一差距,我们提出了可学习图块的概念,将其视为任何图数据的最小语义单元。我们通过展开节点特征并分别构建相应的图块结构,将图分解为可学习图块。然后,我们设计了一个框架,从跨域图数据中挖掘可迁移信息。具体来说,在提取图块后,我们提出一个补丁编码器从每个单元中提取知识,以及一个补丁聚合器学习如何将单元组合成整体。由于其领域无关的特性,该模型可应用于不同领域的下游数据。此外,我们分析了我们的方法与现有图模型之间的联系,以及其生成的节点嵌入的可迁移性。实验表明,我们的方法不仅实现了使用多域图进行预训练的能力,而且在各种下游数据集和任务上表现出增强的性能。此外,我们观察到随着预训练数据量的增加,下游性能持续提升。

English abstract

In recent years, the rapid development of foundation models and graph pre-training technologies has spurred increasing interest in constructing a universal pre-trained graph model or Graph Foundation Model (GFM). However, a significant challenge is that existing models are unable to address feature heterogeneity in graph data without textual information, which hinders the transferability of graph models across different datasets. To bridge this gap, we propose the concept of learnable graph patches, which we regard as the smallest semantic units of any graph data. We decompose the graph into learnable graph patches by unfolding the node features and constructing corresponding patch structures separately. We then design a framework that mines transferable information from graph data across domains. Specifically, after extracting graph patches, we propose a patch encoder to extract knowledge from each unit and a patch aggregator to learn how the units are combined into a whole. Due to its domain-agnostic nature, the model can be applied to downstream data across different domains. Furthermore, we analyze the connection between our method and existing graph models, as well as the transferability of the node embeddings it generates. Empirically, our method not only achieves the capability to use multi-domain graphs for pre-training, but also shows enhanced performance across various downstream datasets and tasks. Moreover, we observe consistent improvement in downstream performance as the volume of pre-training data increases.
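One way to see how unfolding features removes the dependence on input dimension: cut each node's feature vector into fixed-size patches and encode every patch with one shared encoder. Graphs with 5-dimensional and 12-dimensional features then map into the same embedding space. In this minimal sketch, the patch size, the zero-padding, the fixed linear encoder `W`, and the mean aggregation are all assumptions. The paper also builds patch structures and learns both the encoder and the aggregator.

```python
from typing import List, Sequence


def unfold_features(x: Sequence[float], patch_size: int) -> List[List[float]]:
    """Split one node's feature vector into fixed-size patches, zero-padding the tail."""
    padded = list(x) + [0.0] * ((-len(x)) % patch_size)
    return [padded[i:i + patch_size] for i in range(0, len(padded), patch_size)]


def encode_node(x: Sequence[float], W: Sequence[Sequence[float]], patch_size: int) -> List[float]:
    """Shared linear patch encoder (one row of W per output dim), then mean over patches."""
    tokens = [[sum(w * p for w, p in zip(row, patch)) for row in W]
              for patch in unfold_features(x, patch_size)]
    return [sum(t[k] for t in tokens) / len(tokens) for k in range(len(W))]


# Hypothetical shared encoder: 2 output dimensions, patch size 4.
W = [[0.1, 0.2, 0.3, 0.4], [1.0, 0.0, -1.0, 0.0]]
z_a = encode_node([1.0, 2.0, 3.0, 4.0, 5.0], W, patch_size=4)   # graph with 5-dim features
z_b = encode_node([0.5] * 12, W, patch_size=4)                  # graph with 12-dim features
```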

2606.18001 2026-06-17 cs.LG New submission

Half a Link can Be Enough to Predict a Whole Link: Understanding Generalization in Knowledge Graph Foundation Models

半条链接足以预测整条链接:理解知识图谱基础模型中的泛化

Cosimo Gregucci, Obaidah Theeb, Daniel Hernandez, Antonio Vergari, Steffen Staab

Affiliations: Institute for AI, University of Stuttgart(斯图加特大学人工智能研究所); University of Southampton(南安普顿大学); University of Edinburgh(爱丁堡大学)

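AI summary: Analyzes the zero-shot generalization of knowledge graph foundation models on unseen graphs and finds that they exploit partially observed "half-links" for prediction. On this basis, the paper proposes a four-scenario taxonomy that exposes how current models generalize and where they can improve.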

AI Chinese abstract

知识图谱(KG)基础模型(KGFMs)是零样本泛化器:只需训练一次,它们就能在未见过的图上预测链接,无需重新训练。然而,理解它们何时以及如何能够在不同KG间稳健泛化仍是一个开放问题。在本文中,我们揭示了它们的泛化机制,强调了它们在未见KG上的性能在涉及部分可见链接(我们称之为半链接)时并非均匀。事实上,我们表明,要预测一个测试三元组$(h,r,t)$,在实践中可能只需在推理图中观察到半链接$(h,r)$或$(r,t)$。这产生了四种场景的分类法,这些半链接的组合被观察到或未被观察到。通过对这些场景进行严格的分层分析,我们揭示了SoTA KGFMs利用可见的半链接进行预测,而不可见的半链接则带来不同的挑战。因此,我们更细粒度的分类法可以作为稳健KGFM泛化的诊断协议,并突出新KGFM可以改进的地方。

English abstract

Knowledge graph (KG) foundation models (KGFMs) are zero-shot generalizers: trained once, they can predict links on unseen graphs without retraining. However, when and how they can robustly generalize across KGs is still an open question. In this paper, we shed light on their generalization mechanisms, highlighting that their performance on unseen KGs is not uniform when it comes to partially seen links, which we call half-links. In fact, we show that to predict a test triple $(h,r,t)$ it might suffice in practice to have observed the half-link $(h,r)$ or $(r,t)$ in the inference graph. This yields a taxonomy of four scenarios, depending on which of these half-links are observed. In a rigorous stratified analysis over these scenarios, we reveal that SoTA KGFMs use seen half-links for predictions, while unseen half-links pose different challenges. As such, our finer-grained taxonomy can serve as a diagnostic protocol for robust KGFM generalization and highlights where novel KGFMs can improve.
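The four-scenario taxonomy amounts to checking two half-links against the inference graph. For a test triple (h, r, t), the first check is whether any triple (h, r, ?) is present, and the second is whether any triple (?, r, t) is present. The sketch below performs that stratification. The scenario labels and the toy triples are illustrative, not the paper's names or data.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]


def half_link_scenario(test: Triple, inference_graph: Iterable[Triple]) -> str:
    """Classify a test triple (h, r, t) by which half-links appear in the inference graph."""
    h, r, t = test
    graph = list(inference_graph)
    seen_hr = any(a == h and b == r for a, b, _ in graph)   # half-link (h, r, ?)
    seen_rt = any(b == r and c == t for _, b, c in graph)   # half-link (?, r, t)
    return {
        (True, True): "both half-links seen",
        (True, False): "only (h, r) seen",
        (False, True): "only (r, t) seen",
        (False, False): "neither seen",
    }[(seen_hr, seen_rt)]


kg = [("alice", "worksAt", "acme"), ("bob", "livesIn", "paris")]
print(half_link_scenario(("alice", "worksAt", "globex"), kg))  # only (h, r) seen
```

Bucketing test triples this way before computing metrics such as MRR gives the stratified analysis described above.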

2606.17684 2026-06-17 stat.ML cs.CY cs.LG Cross-list

Geometrical fairness in graph neural networks

图神经网络中的几何公平性

Arturo Pérez-Peralta, Sandra Benítez-Peña, Blas Kolic, Rosa E. Lillo

Affiliations: Department of Statistics, University Carlos III of Madrid, Spain(马德里卡洛斯三世大学统计系); uc3m-Santander Big Data Institute(uc3m-桑坦德大数据研究所)

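AI summary: Addresses fairness in graph neural networks by modifying the Laplacian with several complementary transformations (subspace projection, spectral adjustment, frequency filtering) that mitigate bias. A theoretical analysis and experiments show improved fairness at competitive performance.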

Comments 32 pages, 21 tables, 6 figures

AI Chinese abstract

基于图的学习方法因其在多种应用中的强大性能而日益突出。其中,基于扩散过程的最新框架提供了一个统一的视角,扩展了传统的图神经网络公式,同时解决了标准消息传递机制的局限性。尽管取得了这些进展,但此类模型的公平性问题仍然令人担忧,因为它们可能传播或放大数据中存在的偏差。在这项工作中,我们通过修改底层拉普拉斯算子,引入了一种基于图扩散的公平性感知适应方法。我们的方法结合了多种互补变换,包括子空间投影、频谱调整和基于频率的滤波,以减轻与偏差相关的成分。利用图扩散的内在平滑特性,我们对由此产生的行为进行了原则性分析,并建立了公平性属性的理论见解。我们在合成数据集和真实数据集上评估了所提出的框架,结果表明,在有限的计算成本下,它实现了具有竞争力的性能,同时提高了公平性指标。

English abstract

Graph-based learning methods have become increasingly prominent due to their strong performance across diverse applications. Among these, recent frameworks grounded in diffusion processes provide a unifying perspective that extends traditional graph neural network formulations while addressing limitations of standard message-passing mechanisms. Despite these advances, concerns remain regarding the fairness of such models, as they may propagate or amplify biases present in the data. In this work, we introduce a fairness-aware adaptation of graph-based diffusion by modifying the underlying Laplacian operator. Our approach incorporates multiple complementary transformations, including subspace projections, spectral adjustments, and frequency-based filtering, to mitigate bias-related components. Leveraging the intrinsic smoothing properties of graph diffusion, we provide a principled analysis of the resulting behavior and establish theoretical insights into fairness properties. We evaluate the proposed framework on both synthetic and real-world datasets, demonstrating that it achieves competitive performance while improving fairness metrics with limited additional computational cost.
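As one hypothetical way to combine a subspace projection with a spectral adjustment, let s be the centred, normalized sensitive attribute and P = I - s s^T. Build L_fair = P L P + beta * s s^T. Then s is an eigenvector of L_fair with eigenvalue beta, and its orthogonal complement is invariant. Explicit diffusion therefore shrinks the sensitive component by exactly (1 - tau * beta) per step, while P L P keeps smoothing everything else. This construction is an illustrative assumption, not necessarily one of the paper's transformations.

```python
import math
from typing import List, Sequence

Vec = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def laplacian_apply(adj: Sequence[Sequence[int]], x: Sequence[float]) -> Vec:
    """Combinatorial Laplacian L = D - A of an undirected graph applied to x."""
    return [len(adj[i]) * x[i] - sum(x[j] for j in adj[i]) for i in range(len(x))]


def unit_sensitive(s: Sequence[float]) -> Vec:
    """Centre and normalize the sensitive attribute (assumed non-constant)."""
    m = sum(s) / len(s)
    c = [v - m for v in s]
    norm = math.sqrt(dot(c, c))
    return [v / norm for v in c]


def fair_laplacian_apply(adj, s_hat: Sequence[float], beta: float, x: Sequence[float]) -> Vec:
    """Apply L_fair = P L P + beta * s s^T, with P = I - s s^T and s = s_hat."""
    sx = dot(s_hat, x)
    px = [xi - sx * si for xi, si in zip(x, s_hat)]       # P x
    lpx = laplacian_apply(adj, px)                         # L P x
    c = dot(s_hat, lpx)
    plpx = [v - c * si for v, si in zip(lpx, s_hat)]      # P L P x
    return [v + beta * sx * si for v, si in zip(plpx, s_hat)]


def diffuse(adj, s_hat, beta: float, x: Sequence[float], tau: float = 0.1, steps: int = 10) -> Vec:
    """Explicit Euler diffusion x <- x - tau * L_fair x."""
    x = list(x)
    for _ in range(steps):
        lx = fair_laplacian_apply(adj, s_hat, beta, x)
        x = [a - tau * b for a, b in zip(x, lx)]
    return x


path = [[1], [0, 2], [1, 3], [2]]        # 4-node path graph
s_hat = unit_sensitive([1, 1, 0, 0])     # binary sensitive attribute
x0 = [1.0, 2.0, 0.5, 0.0]
y = diffuse(path, s_hat, beta=1.0, x=x0)
print(dot(s_hat, y), 0.9 ** 10 * dot(s_hat, x0))  # the two values agree
```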

2401.14381 2026-06-17 cs.LG math.DG Replacement

Manifold GCN: Diffusion-based Convolutional Neural Network for Manifold-valued Graphs

Manifold GCN:基于扩散的流形值图卷积神经网络

Martin Hanik, Gabriele Steidl, Christoph von Tycowicz

Affiliations: BIFOLD (Berlin Institute for the Foundations of Learning and Data)(柏林学习与数据基础研究院); Technical University Berlin(柏林工业大学); Zuse Institute Berlin(柏林楚泽研究所)

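AI summary: Proposes two graph neural network layers for graphs whose features lie in a Riemannian manifold: a diffusion layer based on a manifold-valued graph diffusion equation, and a tangent multilayer perceptron inspired by vector neurons. Both are equivariant under node permutations and manifold isometries. Although they apply to a broader class of problems, they outperform task-specific networks.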

Comments Extended ADNI experiment

Journal ref
International Journal of Computer Vision, Volume 134, article number 315 (2026)
AI Chinese abstract

我们提出了两种适用于黎曼流形中特征图的图神经网络层。首先,基于流形值图扩散方程,我们构建了一个可应用于任意数量节点和图连接模式的扩散层。其次,通过将向量神经元框架的思想迁移到我们的通用设置中,我们建模了一个切向多层感知器。这两层在节点置换和特征流形的等距变换下都是等变的。这些特性在许多深度学习任务中带来了有益的归纳偏置。此外,它们还支持新颖、更灵活的特征设计。合成数据上的数值示例以及基于右海马体三角网格的阿尔茨海默病分类应用证明了我们新层的实用性:虽然它们适用于更广泛的问题类别,但在性能上优于任务特定的最先进网络。

English abstract

We propose two graph neural network layers for graphs with features in a Riemannian manifold. First, based on a manifold-valued graph diffusion equation, we construct a diffusion layer that can be applied to an arbitrary number of nodes and graph connectivity patterns. Second, we model a tangent multilayer perceptron by transferring ideas from the vector neuron framework to our general setting. Both layers are equivariant under node permutations and the feature manifold's isometries. These properties have led to a beneficial inductive bias in many deep-learning tasks. Furthermore, they enable novel, more flexible feature designs. Numerical examples on synthetic data and an Alzheimer's classification application on triangle meshes of the right hippocampus demonstrate the usefulness of our new layers: While they apply to a much broader class of problems, they outperform task-specific state-of-the-art networks.
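The idea behind a manifold-valued diffusion layer can be sketched on the unit sphere S^2. Each node maps its neighbours into its own tangent space with the Riemannian log map, sums the resulting vectors, and steps along the sum with the exp map, giving one explicit Euler step of graph diffusion. In this minimal sketch, the choice of sphere, the uniform weights, and the step size `tau` are assumptions. The paper's layer is defined for general manifolds and has learnable parameters. Because log and exp commute with rotations, the step is rotation-equivariant, which is the property checked below.

```python
import math
from typing import List, Sequence

V3 = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def normalize(p: Sequence[float]) -> V3:
    n = math.sqrt(dot(p, p))
    return [c / n for c in p]


def log_map(x: Sequence[float], y: Sequence[float]) -> V3:
    """Sphere log map: tangent vector at x of length dist(x, y) pointing to y (y != -x)."""
    c = max(-1.0, min(1.0, dot(x, y)))
    theta = math.acos(c)
    if theta < 1e-12:
        return [0.0, 0.0, 0.0]
    u = [yi - c * xi for xi, yi in zip(x, y)]  # component of y orthogonal to x
    return [theta / math.sin(theta) * ui for ui in u]


def exp_map(x: Sequence[float], v: Sequence[float]) -> V3:
    """Sphere exp map: follow the geodesic from x with initial velocity v."""
    n = math.sqrt(dot(v, v))
    if n < 1e-12:
        return list(x)
    return [math.cos(n) * xi + math.sin(n) * vi / n for xi, vi in zip(x, v)]


def diffusion_step(X: Sequence[V3], adj: Sequence[Sequence[int]], tau: float = 0.2) -> List[V3]:
    """x_i <- exp_{x_i}(tau * sum_j log_{x_i}(x_j)) with uniform edge weights."""
    out = []
    for i, xi in enumerate(X):
        v = [0.0, 0.0, 0.0]
        for j in adj[i]:
            v = [a + b for a, b in zip(v, log_map(xi, X[j]))]
        out.append(exp_map(xi, [tau * a for a in v]))
    return out


def rotate_z(p: Sequence[float], a: float) -> V3:
    """Rotation about the z-axis, an isometry of the sphere."""
    return [math.cos(a) * p[0] - math.sin(a) * p[1],
            math.sin(a) * p[0] + math.cos(a) * p[1], p[2]]


X = [normalize(p) for p in ([1, 0, 0.2], [0, 1, 0.1], [0.3, 0.3, 1])]
adj = [[1, 2], [0, 2], [0, 1]]
Y = diffusion_step(X, adj)
```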

2507.11178 2026-06-17 cs.LG cs.AI Replacement

A Gradient-based Causal Discovery Framework with Applications to Complex Industrial Processes

基于梯度的因果发现框架及其在复杂工业过程中的应用

Meiliang Liu, Huiwen Dong, Xiaoxiao Yang, Yunfang Xu, Mingbao Yang, Zijin Li, Zhengye Si, Xinyue Yang, Zhiwen Zhao

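AI summary: Proposes GRNGC, which infers Granger causality by applying L1 regularization to the gradients of a model's outputs with respect to its inputs. It needs only one forecasting model, lowers computational overhead, and outperforms existing methods on several benchmark and real-world datasets.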

Comments 9 pages, 3 figures, conference

AI Chinese abstract

随着深度学习技术的发展,各种基于神经网络的Granger因果模型已被提出。尽管这些模型表现出显著改进,但仍存在若干局限性。大多数现有方法采用组件式架构,需要为每个时间序列构建单独的模型,导致大量计算成本。此外,对神经网络第一层权重施加稀疏性惩罚以提取因果关系,削弱了模型捕捉复杂交互的能力。为解决这些局限性,我们提出基于梯度正则化的神经Granger因果(GRNGC),该方法仅需一个时间序列预测模型,并对模型输入与输出之间的梯度施加$L_{1}$正则化以推断Granger因果。此外,GRNGC不依赖于特定的时间序列预测模型,可通过KAN、MLP和LSTM等多种架构实现,提供增强的灵活性。在DREAM、Lorenz-96、fMRI BOLD和CausalTime上的数值模拟表明,GRNGC优于现有基线,并显著降低计算开销。同时,在真实世界的DNA、酵母、HeLa和膀胱尿路上皮癌数据集上的实验进一步验证了该模型在重建基因调控网络方面的有效性。

English abstract

With the advancement of deep learning technologies, various neural network-based Granger causality models have been proposed. Although these models have demonstrated notable improvements, several limitations remain. Most existing approaches adopt a component-wise architecture, necessitating the construction of a separate model for each time series, which results in substantial computational costs. In addition, imposing a sparsity-inducing penalty on the first-layer weights of the neural network to extract causal relationships weakens the model's ability to capture complex interactions. To address these limitations, we propose Gradient Regularization-based Neural Granger Causality (GRNGC), which requires only one time series prediction model and applies $L_{1}$ regularization to the gradients of the model's outputs with respect to its inputs to infer Granger causality. Moreover, GRNGC is not tied to a specific time series forecasting model and can be implemented with diverse architectures such as KAN, MLP, and LSTM, offering enhanced flexibility. Numerical simulations on DREAM, Lorenz-96, fMRI BOLD, and CausalTime show that GRNGC outperforms existing baselines and significantly reduces computational overhead. Meanwhile, experiments on real-world DNA, Yeast, HeLa, and bladder urothelial carcinoma datasets further validate the model's effectiveness in reconstructing gene regulatory networks.
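Reading Granger causality off input-output gradients works as follows. Series i is scored as a cause of series j by the average magnitude of d(prediction_j)/d(input_i), summed over lags. The sketch below computes that score matrix for an arbitrary forecaster, using finite differences as a stand-in for automatic differentiation. It covers only the scoring step. GRNGC additionally trains the single forecaster with an L1 penalty on these gradients, which is not shown. The toy forecaster and windows are hypothetical.

```python
from typing import Callable, List, Sequence

Window = List[List[float]]  # p lagged observations, each a vector over d series


def gradient_granger_scores(f: Callable[[Window], Sequence[float]],
                            windows: Sequence[Window], eps: float = 1e-6) -> List[List[float]]:
    """score[i][j] = mean over windows of sum over lags of |d f_j / d x_{lag, i}|."""
    p, d = len(windows[0]), len(windows[0][0])
    S = [[0.0] * d for _ in range(d)]
    for w in windows:
        base = f(w)
        for k in range(p):
            for i in range(d):
                bumped = [list(row) for row in w]
                bumped[k][i] += eps                  # perturb series i at lag k
                out = f(bumped)
                for j in range(d):
                    S[i][j] += abs(out[j] - base[j]) / eps
    return [[v / len(windows) for v in row] for row in S]


# Toy forecaster (hypothetical): series 0 drives series 1, not vice versa.
def forecaster(w: Window) -> List[float]:
    last = w[-1]
    return [0.5 * last[0], 0.8 * last[0] + 0.3 * last[1]]


windows = [[[0.1, 0.2], [0.3, -0.1]], [[1.0, 0.5], [0.2, 0.4]]]
S = gradient_granger_scores(forecaster, windows)
```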

2605.00725 2026-06-17 cs.LG Replacement

Weisfeiler Lehman Test on Combinatorial Complexes: Generalized Expressive Power of Topological Neural Networks

组合复形上的Weisfeiler-Lehman测试:拓扑神经网络的泛化表达能力

Jiawen Chen, Qi Shao, Zhiqiang Ge, Duxin Chen, Wenwu Yu

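AI summary: Proposes the Combinatorial Complex Weisfeiler-Leman (CCWL) framework, which unifies the expressive power of topological neural networks through four structural neighborhoods. The paper proves that, under explicit coverage conditions, the refinement can be reduced to lower- and upper-adjacency bridge information alone, and instantiates this as the CCIN network, which experiments validate.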

AI Chinese abstract

拓扑神经网络已成为建模超图、单纯复形和胞腔复形等超越成对图的高阶关系结构的有效工具。然而,现有的Weisfeiler-Leman类型表达能力分析通常在不同的结构域上开发,并依赖于特定域的邻域系统,使得它们的表达能力难以在统一形式下进行比较。本文提出了组合复形Weisfeiler-Lehman(CCWL)框架,这是在组合复形上定义的一种统一的表达能力细化。通过利用组合复形表示集合类型关系和部分-整体层次结构的能力,CCWL通过四个结构邻域进行拓扑颜色细化:边界、共边界、下邻接和上邻接。我们证明,在指定的提升映射下,CCWL可以模拟多个特定域的WL类型细化,从而为分析拓扑消息传递提供了共同的理论基线。我们进一步研究了邻域充分性问题,并证明在显式覆盖条件下,仅使用下邻接和上邻接桥信息的简化细化保留了完整四邻域CCWL细化的区分能力。基于这一理论结果,我们将简化细化实例化为组合复形同构网络(CCIN)。在合成和真实世界基准上的实验表明,CCIN在代表性图和拓扑神经网络基线上取得了有竞争力的性能。消融研究和资源效率分析进一步支持了所提出的下/上邻域设计的有效性。

English abstract

Topological neural networks have emerged as effective tools for modeling higher-order relational structures beyond pairwise graphs, including hypergraphs, simplicial complexes, and cell complexes. However, existing Weisfeiler-Leman type expressivity analyses are typically developed on different structural domains and rely on domain-specific neighborhood systems, making their expressive powers difficult to compare within a common formalism. In this paper, we introduce the Combinatorial Complex Weisfeiler-Leman (CCWL) framework, a unified expressive power refinement defined on combinatorial complexes. By exploiting the ability of combinatorial complexes to represent both set-type relations and part-whole hierarchies, CCWL performs topological color refinement through four structural neighborhoods: boundary, co-boundary, lower adjacency, and upper adjacency. We show that, under specified lifting maps, CCWL can simulate several domain-specific WL-type refinements, thereby providing a common theoretical baseline for analyzing topological message passing. We further study the neighborhood sufficiency problem and prove that, under explicit coverage conditions, a reduced refinement using only lower- and upper-adjacent bridge information preserves the distinguishing power of the full four-neighborhood CCWL refinement. Guided by this theoretical result, we instantiate the reduced refinement as the Combinatorial Complex Isomorphism Network (CCIN). Experiments on synthetic and real-world benchmarks demonstrate that CCIN achieves competitive performance against representative graph and topological neural network baselines. Ablation studies and resource-efficiency analyses further support the effectiveness of the proposed lower/upper-neighborhood design.
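The four-neighborhood color refinement can be illustrated on a small complex. Each cell's new color combines its old color with the multisets of colors in its boundary, co-boundary, lower-adjacent, and upper-adjacent cells. Refinement stops once the number of color classes stops growing. This minimal sketch assumes rank = |cell| - 1, which is simplicial-style; combinatorial complexes allow general rank functions. The lifting maps from other domains are not shown.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List

Cell = FrozenSet[str]


def neighborhoods(cells: List[Cell]) -> List[Dict[Cell, List[Cell]]]:
    """Boundary, co-boundary, lower and upper adjacency, with rank = |cell| - 1 (assumed)."""
    rank = {c: len(c) - 1 for c in cells}
    bd = {c: [b for b in cells if rank[b] == rank[c] - 1 and b < c] for c in cells}
    cobd = {c: [b for b in cells if rank[b] == rank[c] + 1 and c < b] for c in cells}
    lower = {c: [o for o in cells if o != c and rank[o] == rank[c] and set(bd[c]) & set(bd[o])]
             for c in cells}   # same rank, sharing a boundary cell
    upper = {c: [o for o in cells if o != c and rank[o] == rank[c] and set(cobd[c]) & set(cobd[o])]
             for c in cells}   # same rank, sharing a co-boundary cell
    return [bd, cobd, lower, upper]


def ccwl_colors(cells: List[Cell]) -> Dict[Cell, int]:
    """Refine colors with neighbor-color multisets until the partition is stable."""
    nbhds = neighborhoods(cells)
    color = {c: 0 for c in cells}
    while True:
        sig = {c: (color[c],) + tuple(tuple(sorted(color[n] for n in nb[c])) for nb in nbhds)
               for c in cells}
        palette = {s: k for k, s in enumerate(sorted(set(sig.values())))}
        new = {c: palette[sig[c]] for c in cells}
        if len(set(new.values())) == len(set(color.values())):
            return new  # no class was split, so the refinement has converged
        color = new


verts = [frozenset(v) for v in "abc"]
edges = [frozenset(e) for e in combinations("abc", 2)]
triangle = verts + edges + [frozenset("abc")]
colors = ccwl_colors(triangle)
print(len(set(colors.values())))  # vertices, edges and the face form 3 classes
```

Two complexes are distinguished when their color histograms differ. Computing those histograms would require running the refinement jointly on the disjoint union of the two complexes so that the color palettes agree.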

2606.12863 2026-06-17 cs.LG Replacement

Multimodal Graph Negative Learning

多模态图负学习

Zhengyu Wu, Xu Wang, Hongchao Qin, Xunkai Li, Guang Zeng, Rong-Hua Li, Guoren Wang

Affiliations: University of Science and Technology of China(中国科学技术大学)

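AI summary: Proposes GraphMNL, which uses negative learning to address node-level branch semantic imbalance in multimodal attributed graphs without propagating the bias of a dominant branch. It achieves the best performance on the Grocery and Reddit M datasets.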

AI Chinese abstract

多模态属性图(MAGs)将图拓扑与异构模态属性(如文本和图像)集成,从而能够对复杂关系系统进行更丰富的建模。然而,这种表达能力也使得MAGs上的学习依赖于多个语义源,包括结构拓扑、文本和视觉属性,每个都可以被视为节点表示的一个分支。当这些分支在语义信息量和可靠性上因节点而异时,就会出现节点级分支语义不平衡:一个分支为某个节点提供判别性语义,但由于模态质量或结构上下文的偏差,可能会误导另一个节点。现有方法通常通过跨分支一致性或对齐来缓解这种异质性,隐含地将主导预测视为可靠监督。当主导分支有偏差时,强制模仿可能会将其偏差传播到其他分支,并抑制对分类有用的原始语义。我们提出GraphMNL,一种图感知的多模态负学习框架,通过使用负学习作为跨分支指导来解决这个问题。该模型不强制劣质分支模仿教师预测,而是教导它们节点不太可能属于哪些类别。GraphMNL构建分支库,通过图感知可靠性仲裁识别主导和劣质分支,门控不稳定传输,并对非目标类别应用目标保持负学习。这种设计将目标监督与分支指导解耦,使得监督损失学习正确类别,而当分支一致性不可靠时,负学习抑制不太可能的备选类别。通过全面的实验评估,GraphMNL在Grocery数据集上达到72.47%的准确率,在Reddit M数据集上达到76.60的F1分数,取得了最佳性能。

English abstract

Multimodal attributed graphs (MAGs) integrate graph topology with heterogeneous modality attributes, such as text and images, thereby enabling richer modeling of complex relational systems. However, such expressiveness also makes learning on MAGs depend on multiple semantic sources, including structural topology, textual and visual attributes, each of which can be regarded as a branch for node representation. Node-level branch semantic imbalance arises when these branches differ across nodes in semantic informativeness and reliability: a branch that provides discriminative semantics for one node may mislead another due to bias in modality quality or structural context. Existing methods often mitigate such heterogeneity through cross-branch agreement or alignment, implicitly treating the dominant prediction as reliable supervision. When the dominant branch is biased, forced imitation may propagate its bias to other branches and suppress original semantics that are useful for classification. We propose GraphMNL, a graph-aware multimodal negative learning framework that addresses this issue by using Negative Learning as cross-branch guidance. Instead of forcing inferior branches to imitate a teacher prediction, the model teaches them which classes a node is unlikely to belong to. GraphMNL builds a branch library, identifies dominant and inferior branches via graph-aware reliability arbitration, gates unstable transfer, and applies target-preserving negative learning over non-target classes. This design decouples target supervision from branch guidance so that supervised losses learn the correct class, while Negative Learning suppresses unlikely alternatives when branch agreement is unreliable. In comprehensive experiments, GraphMNL achieves the best performance, reaching 72.47% accuracy on the Grocery dataset and a 76.60 F1 score on the Reddit M dataset.
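Target-preserving negative learning can be sketched as a loss on complementary labels. The dominant branch proposes classes the node is unlikely to belong to, the true target is never among them, and the inferior branch is penalized by -log(1 - p_c) for each proposed class c. The sketch assumes the complementary set is the dominant branch's k least likely non-target classes. The branch library, reliability arbitration, and gating from the abstract are not modeled. In training, this loss would sit alongside an ordinary cross-entropy on the target.

```python
import math
from typing import List, Sequence


def complementary_classes(teacher_probs: Sequence[float], target: int, k: int) -> List[int]:
    """The k classes the dominant branch finds least likely, never including the target."""
    others = [c for c in range(len(teacher_probs)) if c != target]
    return sorted(others, key=lambda c: teacher_probs[c])[:k]


def negative_learning_loss(student_probs: Sequence[float], classes: Sequence[int]) -> float:
    """Negative learning: -mean of log(1 - p_c) over the complementary classes c."""
    return -sum(math.log(max(1.0 - student_probs[c], 1e-12)) for c in classes) / len(classes)


teacher = [0.70, 0.20, 0.06, 0.04]   # a biased dominant branch (hypothetical)
comp = complementary_classes(teacher, target=1, k=2)
confident = negative_learning_loss([0.10, 0.80, 0.05, 0.05], comp)
confused = negative_learning_loss([0.10, 0.30, 0.30, 0.30], comp)
```

The teacher's wrong top class (0) is never pushed onto the student as a target. The student is only told that classes 3 and 2 are unlikely.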

2606.12867 2026-06-17 cs.LG Replacement

SMGFM: Spectral Multimodal Graph Pretraining for Multimodal-Attributed Graphs

SMGFM: 面向多模态属性图的谱多模态图预训练

Zhengyu Wu, Xu Wang, Hongchao Qin, Xunkai Li, Guang Zeng, Rong-Hua Li, Guoren Wang

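AI summary: Proposes SMGFM, which uses graph spectral decomposition to separate structure-induced semantics from modality-specific semantics and fuses modalities through band-level routing, achieving state-of-the-art performance on graph-level and modality-level tasks.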

AI Chinese abstract

多模态属性图(MAGs)将图拓扑结构与来自文本、图像等模态的节点语义相结合。传统的图学习通过耦合拓扑与节点特征来上下文化节点语义。然而,这种耦合设计在MAGs中变得棘手,因为结构诱导和模态固有的语义可能对下游任务产生不同贡献。结构诱导语义通过平滑拓扑变化促进关系一致性,而模态固有语义通常编码局部、细粒度的区分,不应被统一平滑或对齐。因此,关键挑战在于跨模态融合前识别语义角色。为此,我们利用图频率变化作为先验,其中低频分量捕获拓扑一致语义,高频分量保留模态特定语义。基于这一直觉,我们提出SMGFM,一种谱多模态图预训练框架,将每个模态特定的节点信号分解为图频带,并在跨模态交互前分配频带级语义角色。具体地,SMGFM使用可扩展的切比雪夫滤波器构建频率解析的模态令牌,通过拓扑条件路由估计其耦合可靠性,并在融合前进行频带-模态交互。其频率路由目标在平滑共识路由的同时保留模态特定路由,减轻空间域纠缠和统一跨模态对齐。在MAG数据集上的大量实验表明,SMGFM在图级和模态级任务上均达到最先进性能。

English abstract

Multimodal-attributed graphs (MAGs) couple graph topology with node semantics from text, images, and other modalities. Traditional graph learning contextualizes node semantics by coupling topology with node features. However, this coupling design becomes troublesome in MAGs, where structure-induced and modality-intrinsic semantics may contribute differently to downstream tasks. Structure-induced semantics promote relational consistency through smooth topological variation, whereas modality-intrinsic semantics often encode local, fine-grained distinctions that should not be uniformly smoothed or aligned. Therefore, the key challenge is to identify semantic roles before cross-modal fusion. To this end, we leverage graph-frequency variation as a prior, where low-frequency components capture topology-consistent semantics and high-frequency components preserve modality-specific semantics. Based on this intuition, we propose SMGFM, a spectral multimodal graph pretraining framework that decomposes each modality-specific node signal into graph-frequency bands and assigns band-level semantic roles before cross-modal interaction. Concretely, SMGFM constructs frequency-resolved modality tokens with scalable Chebyshev filters, estimates their coupling reliability through topology-conditioned routing, and performs band-modality interaction before fusion. Its frequency-routed objectives align smooth consensus routes while preserving modality-specific routes, mitigating spatial-domain entanglement and uniform cross-modal alignment. Extensive experiments conducted on the MAG datasets demonstrate that SMGFM achieves state-of-the-art performance across graph-level and modality-level tasks.
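Splitting a node signal into graph-frequency bands with Chebyshev filters avoids any eigendecomposition. The scaled Laplacian 2L/lambda_max - I has its spectrum in [-1, 1], and g(L)x is approximated with the three-term recursion T_k = 2 L_tilde T_{k-1} - T_{k-2}. In this minimal sketch, the low-pass response g(lambda) = exp(-0.5 * lambda), the order K, and the use of a single low/high split are assumptions. SMGFM's routing and fusion are not shown. An exact Taylor-series reference is included so the approximation can be checked.

```python
import math
from typing import Callable, List, Sequence


def lap_apply(adj: Sequence[Sequence[int]], x: Sequence[float]) -> List[float]:
    """Combinatorial Laplacian L = D - A applied to a node signal."""
    return [len(adj[i]) * x[i] - sum(x[j] for j in adj[i]) for i in range(len(x))]


def cheb_coeffs(g: Callable[[float], float], lam_max: float, K: int, N: int = 64) -> List[float]:
    """Chebyshev coefficients of g on [0, lam_max] via Chebyshev-Gauss quadrature."""
    coeffs = []
    for k in range(K):
        s = 0.0
        for j in range(N):
            th = math.pi * (j + 0.5) / N
            s += g((math.cos(th) + 1) * lam_max / 2) * math.cos(k * th)
        coeffs.append(2.0 * s / N)
    return coeffs


def cheb_filter(adj, x: Sequence[float], g, lam_max: float, K: int = 20) -> List[float]:
    """Approximate g(L) x = c_0/2 x + sum_{k>=1} c_k T_k(L_tilde) x (requires K >= 2)."""
    c = cheb_coeffs(g, lam_max, K)

    def scaled(v):  # L_tilde = 2 L / lam_max - I
        return [2 * a / lam_max - b for a, b in zip(lap_apply(adj, v), v)]

    t_prev, t_cur = list(x), scaled(x)
    out = [c[0] / 2 * a + c[1] * b for a, b in zip(t_prev, t_cur)]
    for k in range(2, K):
        t_next = [2 * a - b for a, b in zip(scaled(t_cur), t_prev)]
        out = [o + c[k] * v for o, v in zip(out, t_next)]
        t_prev, t_cur = t_cur, t_next
    return out


def expm_reference(adj, x: Sequence[float], tau: float, terms: int = 60) -> List[float]:
    """exp(-tau L) x by its Taylor series, used only to check the approximation."""
    out, term = list(x), list(x)
    for m in range(1, terms):
        term = [-tau * v / m for v in lap_apply(adj, term)]
        out = [o + t for o, t in zip(out, term)]
    return out


adj = [[1], [0, 2], [1, 3], [2, 4], [3]]        # 5-node path graph
x = [1.0, 0.0, 0.0, 0.0, 2.0]                  # one modality's node signal
lam_max = 2 * max(len(n) for n in adj)          # Gershgorin upper bound on the spectrum
low = cheb_filter(adj, x, lambda lam: math.exp(-0.5 * lam), lam_max)  # smooth band
high = [a - b for a, b in zip(x, low)]                                 # residual band
```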

2605.29526 2026-06-17 cs.CR cs.AI cs.LG Replacement

Temporal Motif-aware Graph Test-time Adaptation for OOD Blockchain Anomaly Detection

面向OOD区块链异常检测的时间模体感知图测试时自适应

Runang He, Tongya Zheng, Huiling Peng, Yuanyu Wan, Bingde Hu, Jiawei Chen, Canghong Jin, Mingli Song, Can Wang

Affiliations: State Key Laboratory of Blockchain and Data Security(区块链与数据安全国家重点实验室); Zhejiang Provincial Engineering Research Center for Real-Time SmartTech in Urban Security Governance(浙江省城市安全治理实时智能技术工程研究中心); Hangzhou High-Tech Zone (Binjiang) Institute of Blockchain and Data Security(杭州高新区(滨江)区块链与数据安全研究院)

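AI summary: Proposes TEMG-TTA, which captures temporal motif distributions and applies a test-time adaptation strategy to handle pattern evolution and out-of-distribution shift in blockchain anomaly detection. It outperforms prior methods by an average of 54.88% across five datasets.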

Comments Accepted to IJCAI-ECAI 2026, Special Track on AI for Social Good

AI Chinese abstract

不断演变的交易模式严重阻碍了新兴加密货币区块链上的异常检测,原因在于地址数量庞大且异常行为多样。近期应用于区块链的高级图异常检测(GAD)方法面临两个关键挑战:恶意行为者的对抗性模式演化以及区块链上不同交易语义导致的分布外(OOD)问题。为应对这些挑战,我们提出了一种新颖框架,称为时间模体感知图测试时自适应(TEMG-TTA)。首先,我们通过高效的计算机制全面捕捉每个活跃地址的三节点时间模体分布,从而实现下游时间模体感知图学习。其次,我们设计了一种简单而有效的测试时自适应策略,以促进训练图和测试图之间共享常见模式。在5个真实世界数据集上的大量实验表明,我们提出的TEMG-TTA平均优于最先进的GAD方法54.88%。进一步关于可解释模体模式的案例研究表明,TEMG-TTA明确刻画了异常地址的复杂交易模式,从而验证了我们技术设计的有效性。我们的代码已公开于 https://github.com/LuoXishuang0712/TEMG-TTA/。

English abstract

Ever-evolving transaction patterns have significantly hindered anomaly detection on emerging cryptocurrency blockchains due to the vast number of addresses and diverse anomalous behaviors. Recently, advanced Graph Anomaly Detection (GAD) approaches applied to blockchains have faced two critical challenges: \textit{adversarial pattern evolution by malicious actors} and \textit{the out-of-distribution (OOD) problem caused by varied transaction semantics on blockchains}. To address these challenges, we propose a novel framework termed \textbf{TE}mporal \textbf{M}otif-aware \textbf{G}raph \textbf{T}est-\textbf{T}ime \textbf{A}daptation (\textbf{TEMG-TTA}). First, we comprehensively capture the 3-node temporal motif distribution of each active address using an efficient computational mechanism, enabling downstream temporal motif-aware graph learning. Second, we design a simple yet effective test-time adaptation strategy to facilitate the sharing of common patterns between training and testing graphs. Extensive experiments on 5 real-world datasets demonstrate that our proposed \textbf{TEMG-TTA} outperforms \textit{state-of-the-art} GAD approaches by an average of 54.88\%. A further case study on interpretable motif patterns reveals that \textbf{TEMG-TTA} explicitly characterizes the complex transaction patterns of anomalous addresses, thereby verifying the effectiveness of our technical designs. Our code is publicly available at https://github.com/LuoXishuang0712/TEMG-TTA/.
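A per-address temporal motif profile can be sketched by brute force. For each address, every pair of its transactions that occurs within a time window delta and involves two distinct counterparties is labelled by the direction of each edge relative to the address. For example, ("in", "out") means the address received funds and then sent funds on. This covers only 2-edge, 3-node delta-temporal motifs and is quadratic per address in the worst case. The paper's efficient counting mechanism and full 3-node motif set are not reproduced. The transactions are a toy example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

Edge = Tuple[str, str, float]  # (sender, receiver, timestamp)


def two_edge_temporal_motifs(edges: Iterable[Edge], delta: float) -> Dict[str, Counter]:
    """Per-address counts of 3-node, 2-edge temporal motifs within time window delta."""
    incident = defaultdict(list)
    for s, d, t in edges:
        incident[s].append((t, "out", d))
        incident[d].append((t, "in", s))
    profiles = {}
    for a, events in incident.items():
        events.sort()  # chronological order
        counts = Counter()
        for i, (t1, dir1, u) in enumerate(events):
            for t2, dir2, v in events[i + 1:]:
                if t2 - t1 > delta:
                    break  # later events are even further away in time
                if u != v and a not in (u, v):  # require three distinct nodes
                    counts[(dir1, dir2)] += 1
        profiles[a] = counts
    return profiles


txs = [("A", "B", 1.0), ("B", "C", 2.0), ("C", "B", 10.0)]
profiles = two_edge_temporal_motifs(txs, delta=3.0)
print(profiles["B"])  # Counter({('in', 'out'): 1})
```

Normalized, such count vectors can serve as node features for a downstream motif-aware graph model.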

10. 迁移、元学习与持续学习 11 篇

2606.17649 2026-06-17 cs.LG cs.AI 新提交

A Risk Decomposition Framework for Pre-Hoc Fine-Tuning Prediction

预微调预测的风险分解框架

Yuxiang Luo, Chen Wang, Nan Tang

发表机构 * The Hong Kong University of Science and Technology(香港科技大学)

AI总结 提出风险分解框架,将预微调性能预测风险分解为内在极限与可降优化方差,证明优化方差衰减率存在下界,并导出预算最优探测原则及可预测性相图。

Comments 9 pages, 4 figures, accepted as ICML 2026 Poster: https://icml.cc/virtual/2026/poster/66570

AI中文摘要

微调大型语言模型的高昂成本构成了显著的经济障碍;预微调性能预测提供了一个关键解决方案,以大幅降低这一费用。然而,预微调性能预测的理论极限尚未被探索。我们将其形式化为信息约束下的随机估计问题,将预测风险分解为两个组成部分:内在极限(静态数据-模型兼容性)和可降优化方差。我们证明优化方差在其衰减率上存在一个必要下界,这意味着无论使用何种预测器,不确定性消散的速度都受到基本约束。基于这些动态特性,我们推导出预算最优探测原则,并引入一个可预测性相图,将任务组织成三个不同的区域:静态充分、动态临界和噪声主导。在合成和真实世界基准上的大量实验验证了这些理论区域,并展示了我们探测策略的效率。

英文摘要

The high cost of fine-tuning LLMs poses a significant economic barrier; pre-hoc performance prediction offers a critical solution to substantially reduce this expense. However, the theoretical limits of pre-hoc performance prediction remain unexplored. We formulate it as a stochastic estimation problem under information constraints, decomposing prediction risk into two components: an intrinsic limit (static data-model compatibility) and a reducible optimization variance. We prove that optimization variance admits a necessary lower bound on its decay rate, implying fundamental constraints on how quickly uncertainty dissipates, regardless of the predictor used. Based on these dynamics, we derive a budget-optimal probing principle and introduce a predictability phase diagram that organizes tasks into three distinct regimes: Static-Sufficient, Dynamic-Critical, and Noise-Dominant. Extensive experiments on synthetic and real-world benchmarks validate these theoretical regimes and demonstrate the efficiency of our probing strategy.

2606.17660 2026-06-17 cs.LG cs.AI 新提交

TuneAhead: Predicting Fine-tuning Performance Before Full Training Begins

TuneAhead: 在完整训练开始前预测微调性能

Yuxiang Luo, Haonan Long, Chen Wang, Qiqi Duan, Xiaotian Lin, Yanwei Xu, Yuyu Luo, Weikai Yang, Nan Tang

发表机构 * The Hong Kong University of Science and Technology(香港科技大学) Huawei Technologies Ltd.(华为技术有限公司)

AI总结 提出TuneAhead框架,通过元特征向量和SHAP归因,在微调前预测性能,在Qwen2.5-7B-Instruct上RMSE为1.47个百分点,95.1%的预测误差在±3个百分点以内。

Comments 9 pages, 6 figures, accepted as ICML 2026 poster: https://icml.cc/virtual/2026/poster/64847

AI中文摘要

微调大型语言模型(LLM)计算密集且容易出错:模型性能对数据质量和超参数选择敏感,简单运行甚至可能降低模型性能。这引出一个实际问题:在投入完整训练之前,能否预测微调性能?我们提出TUNEAHEAD,一个用于微调性能预判的轻量级框架。TUNEAHEAD将每个候选运行编码为一个元特征向量,该向量结合了静态数据集描述符和来自短标准化探测的动态探测特征。一个预测器将这些特征映射到性能估计,而基于SHAP的归因提供可解释的诊断,揭示哪些特定特征驱动预测。在Qwen2.5-7B-Instruct上的1300多次微调运行中,TUNEAHEAD始终优于强基线,如Early-Stop Extrapolation和ProxyLM。在370次运行的保留测试集上,TUNEAHEAD实现了1.47个百分点的RMSE,并将95.1%的预测置于真实分数的±3个百分点内。这些准确的连续预测支持实用的通过/不通过筛选策略,可以在保留最有希望运行的同时减少不必要的完整微调。

英文摘要

Fine-tuning large language models (LLMs) is compute-intensive and error-prone: model performance depends sensitively on data quality and hyperparameter choices, and naïve runs can even degrade model performance. This raises a practical question: can we predict fine-tuning performance before committing to a full training run? We present TUNEAHEAD, a lightweight framework for pre-hoc prediction of fine-tuning performance. TUNEAHEAD encodes each candidate run as a meta-feature vector that combines static dataset descriptors with dynamic probe features from a short standardized probe. A predictor maps these features to performance estimates, while SHAP-based attributions provide interpretable diagnostics that reveal which specific features drive the prediction. Across 1,300+ fine-tuning runs on Qwen2.5-7B-Instruct, TUNEAHEAD consistently outperforms strong baselines such as Early-Stop Extrapolation and ProxyLM. On a held-out test set of 370 runs, TUNEAHEAD achieves an RMSE of 1.47 percentage points and places 95.1% of predictions within ±3 percentage points of the true score. These accurate continuous predictions support practical go/no-go screening policies that can reduce unnecessary full fine-tuning while retaining the most promising runs.
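The two headline numbers (RMSE in percentage points, and the share of predictions within ±3 points of the true score) and a go/no-go screen can be computed as below. This is a minimal sketch for reading the reported metrics; the predictions, observed scores, and the screening threshold are hypothetical placeholders, and the predictor itself is not reproduced.

```python
import math


def rmse(pred, true):
    """Root-mean-squared error, in the units of the scores (percentage points)."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true))


def within_tolerance(pred, true, tol=3.0):
    """Fraction of predictions within +/- tol points of the true score."""
    return sum(abs(p - t) <= tol for p, t in zip(pred, true)) / len(true)


def go_no_go(pred, threshold):
    """Hypothetical screening policy: run the full fine-tune only if the prediction clears a bar."""
    return [p >= threshold for p in pred]


predicted = [70.0, 65.0, 80.0]  # predicted benchmark scores for three candidate runs
observed = [72.0, 65.0, 76.0]   # scores after full fine-tuning
print(rmse(predicted, observed), within_tolerance(predicted, observed), go_no_go(predicted, 68.0))
```

Because the predictions are continuous, the same estimates can serve any threshold a practitioner chooses, which is what makes the go/no-go policy tunable after the fact.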

2606.17706 2026-06-17 cs.LG cs.AI 新提交

Confusion-Aware Transfer Teacher Curriculum Learning Framework: Disentangling Scoring and Pacing Effects

混淆感知的迁移教师课程学习框架:解耦评分与节奏效应

Savini Kommalage, Sanka Mohottala, Asiri Gawesha, Dulara Madhusanka, Menan Velayuthan, Dharshana Kasthurirathna, Mahima Milinda Alwis Weerasinghe, Charith Abhayaratne

发表机构 * Faculty of Computing, Sri Lanka Institute of Information Technology, Sri Lanka(斯里兰卡信息科技学院计算机学院,斯里兰卡) Faculty of Engineering, University of Sri Jayewardenepura, Sri Lanka(斯里兰卡贾亚韦达内普拉大学工程学院,斯里兰卡) Faculty of Engineering, Sri Lanka Institute of Information Technology, Sri Lanka(斯里兰卡信息科技学院工程学院,斯里兰卡) University of Sheffield, United Kingdom(谢菲尔德大学,英国) Utrecht University, The Netherlands(乌得勒支大学,荷兰)

AI总结 提出混淆感知难度评分,通过阶段性子集测试和随机基线解耦课程学习的评分与节奏效应,在CIFAR-10上验证评分可解释性,但全数据下无提升,仅在小数据量下提升数据效率。

Comments Accepted at International Conference on Machine Learning (ICML) GlobalSouthML Workshop (2026)

AI中文摘要

课程学习结合了两个设计选择:样本如何按难度评分,以及较难样本如何逐步引入训练,这使得难以将观察到的性能提升归因于任一组件。我们通过两种评估协议解耦这些因素:阶段性子集测试(独立于课程训练验证评分函数)和基线(将相同的节奏调度应用于随机排序数据)。在迁移教师框架(TTF)中,我们使用这些协议评估一种混淆感知的难度评分,该评分同时考虑正确类别的置信度和错误类别上的概率分布。在CIFAR-10上使用ResNet-18和VGG-16,所提出的评分产生了与人类直觉一致的模型可解释难度排序。然而,在全数据下,无论是课程排序还是反课程排序,都没有比标准训练提高准确率,这表明仅改进评分函数不足以克服TTF中课程学习的已知失败模式。相反,我们发现混淆感知的课程排序带来一致的数据效率优势,在20%数据量下比随机排序高出最多8.7个百分点,表明TTF作为一种数据高效训练方法的潜力。

英文摘要

Curriculum learning couples two design choices, how samples are scored by difficulty and how harder samples are paced into training, making it difficult to attribute observed gains to either component. We disentangle these factors with two evaluation protocols: stage-wise test subsets that validate scoring functions independently of curriculum training, and a baseline that applies the same pacing schedule to randomly ordered data. Within the Transfer Teacher framework (TTF), we use these protocols to evaluate a confusion-aware difficulty score that considers both correct-class confidence and the probability distribution over incorrect classes. On CIFAR-10 with ResNet-18 and VGG-16, the proposed score produces model-interpretable difficulty rankings that align with human intuition. However, at full data, neither curriculum nor anti-curriculum ordering improves accuracy over standard training, indicating that improving the scoring function alone is insufficient to overcome the known failure modes of curriculum learning in TTF. In contrast, we find that confusion-aware curriculum ordering yields consistent data-efficiency benefits, outperforming random ordering by up to 8.7 percentage points in the 20% data regime, suggesting the potential of TTF as a data-efficient training method.
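The disentangling trick is that the scored curriculum and the random baseline share one pacing function, so any accuracy gap is attributable to the ordering alone. A minimal sketch, assuming a linear pacing schedule and a hypothetical difficulty score (one minus the true-class probability, plus the largest incorrect-class probability); the paper's exact confusion-aware formula is not given in the abstract, and the teacher probabilities below are made up.

```python
import random


def confusion_aware_difficulty(probs, label, lam=1.0):
    """Hypothetical score: low true-class confidence plus mass on the strongest confuser."""
    p_correct = probs[label]
    top_confuser = max(p for k, p in enumerate(probs) if k != label)
    return (1.0 - p_correct) + lam * top_confuser


def pacing_size(step, total_steps, n, start_frac=0.2):
    """Linear pacing: the training pool grows from start_frac * n to n."""
    frac = min(1.0, start_frac + (1.0 - start_frac) * step / total_steps)
    return max(1, int(frac * n))


def training_pool(order, step, total_steps, start_frac=0.2):
    """Samples available at `step`, taken from the front of `order`."""
    return order[: pacing_size(step, total_steps, len(order), start_frac)]


# Teacher softmax outputs for four samples (hypothetical) and their true labels.
teacher_probs = [[0.9, 0.05, 0.05], [0.4, 0.5, 0.1], [0.6, 0.2, 0.2], [0.2, 0.3, 0.5]]
labels = [0, 0, 0, 2]
scores = [confusion_aware_difficulty(p, y) for p, y in zip(teacher_probs, labels)]

curriculum = sorted(range(len(scores)), key=lambda i: scores[i])  # easy -> hard
random_baseline = curriculum[:]
random.Random(0).shuffle(random_baseline)  # same pacing, random order

for step in (0, 5, 10):
    print(step, training_pool(curriculum, step, 10), training_pool(random_baseline, step, 10))
```

Swapping `curriculum` for `reversed(curriculum)` gives the anti-curriculum arm under the same schedule.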

2606.17889 2026-06-17 cs.LG cs.AI cs.NE 新提交

Dimensionality Controls When Modularity Helps in Continual Learning

维度控制模块化在持续学习中的有效性

Kathrin Korte, Christian Medeiros Adriano, Joachim Winther Pedersen, Eleni Nisioti, Sebastian Risi

发表机构 * IT University of Copenhagen, Denmark(丹麦哥本哈根信息技术大学) Hasso Plattner Institute, University of Potsdam, Germany(德国波茨坦大学哈索·普拉特纳研究所)

AI总结 研究在持续学习中,模块化架构、任务相似性和表示维度如何共同影响组合学习,发现低维“丰富”机制下模块化结构显著提升性能,而高维“懒惰”机制下影响较小。

Comments Accepted to the 2nd Workshop on Compositional Learning (CompLearn) at ICML 2026, Seoul, South Korea. 8 pages, 5 figures

AI中文摘要

组合学习系统必须平衡可塑性(获取新知识的能力)与稳定性(保留先前学习组件的能力),尤其是当任务共享结构并存在干扰风险时。我们研究了模块化架构、任务相似性和表示维度如何在顺序A-B-A范式中共同塑造组合持续学习,通过权重尺度操作诱导高维和低维机制,比较了任务分区循环网络与单网络基线。在高维“懒惰”机制中,两种架构实现了相似的性能和内部几何结构,表明当表示受到弱约束时,显式模块化结构影响甚微。在低维“丰富”机制中,模块化变得决定性:模块化网络发展出分级的任务特定子空间,这些子空间在相似任务上重叠,在中等不相似任务上部分对齐,在不相似任务上分离,从而产生比单网络更具组合性和可解释性的组织。这些发现表明,由初始化尺度诱导的表示机制(与表示维度共变)是决定组合性模块化结构在持续学习中何时功能有益的关键因素,并支持将安全性和鲁棒性视为表示子空间的自适应分配问题,而非固定分离或共享。

英文摘要

Compositional learning systems must balance plasticity, the ability to acquire new knowledge, with stability, the preservation of previously learned components, especially when tasks share structure and risk interference. We study how modular architecture, task similarity, and representational dimensionality jointly shape compositional continual learning in a sequential A-B-A paradigm, comparing a task-partitioned recurrent network to a single-network baseline while inducing high- and low-dimensional regimes via weight-scale manipulations. In a high-dimensional "lazy" regime, both architectures achieve similar performance and internal geometry, suggesting that explicit modular structure has little impact when representations are weakly constrained. In a lower-dimensional "rich" regime, modularity becomes decisive: the modular network develops graded task-specific subspaces that overlap for similar tasks, partially align for moderately dissimilar tasks, and separate for dissimilar tasks, yielding a more compositional and interpretable organization than the single network. These findings identify the representational regime induced by initialization scale, which co-varies with representational dimensionality, as a key factor governing when compositional, modular structure is functionally beneficial in continual learning, and support viewing safety and robustness as problems of adaptive allocation of representational subspaces rather than fixed separation versus sharing.
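One common way to quantify the "representational dimensionality" that separates lazy and rich regimes is the participation ratio of the hidden-state covariance C, PR = (tr C)^2 / tr(C^2), which ranges from 1 (all variance on one axis) up to the number of units (isotropic variance). The abstract does not say which measure the authors use, so treat this as an illustrative assumption. The pure-Python sketch avoids an eigendecomposition by using the identity tr(C^2) = sum of squared entries for a symmetric C.

```python
def covariance(X):
    """Sample covariance of the rows of X (n samples x d units)."""
    n, d = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(d)]
    C = [[0.0] * d for _ in range(d)]
    for row in X:
        c = [row[j] - means[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                C[a][b] += c[a] * c[b] / (n - 1)
    return C


def participation_ratio(X):
    """PR = (tr C)^2 / tr(C^2); for symmetric C, tr(C^2) is the sum of squared entries."""
    C = covariance(X)
    trace = sum(C[i][i] for i in range(len(C)))
    frob_sq = sum(v * v for row in C for v in row)
    return trace ** 2 / frob_sq


low_dim = [[1.0, 0.0], [2.0, 0.0], [3.0, 0.0]]               # all variance on one axis
spread = [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]]  # variance split evenly
print(participation_ratio(low_dim), participation_ratio(spread))
```

In practice `X` would be recurrent hidden states collected during a task, measured separately in networks initialized with small versus large weight scales.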

2606.18024 2026-06-17 cs.LG cs.AI 新提交

Catastrophic Forgetting is Low-Rank: A Function-Space Theory for Continual Adaptation

灾难性遗忘是低秩的:持续适应的函数空间理论

Ido Nitzan Hidekel, Dan Raviv

发表机构 * Tel Aviv University(特拉维夫大学)

AI总结 本文在神经正切核(NTK)框架下提出函数空间理论,推导出新任务训练导致旧任务预测漂移的闭式表达式,揭示遗忘集中在少量旧任务NTK本征模式上,并给出低秩特性与Kronecker缩放规则。

Comments Accepted to the ICML 2026 Workshop on Continual Adaptation at Scale: Towards Sustainable AI

AI中文摘要

持续适应中的灾难性遗忘通常通过参数漂移、重放或蒸馏来研究,但这些观点未能识别哪些输出空间方向是脆弱的。我们在NTK机制下给出一个函数空间解释:新任务训练通过跨任务核诱导旧任务预测漂移,从而在新任务梯度步骤之前得到遗忘向量的闭式预测器。在冻结主干线性头PEFT-CL中,模型在可训练参数上是线性的,预测器精确到数值精度;对于非线性适配器/全微调,它是局部NTK近似。同一表达式揭示遗忘集中在少量旧任务NTK本征模式上,并在冻结线性头下给出脆弱秩的Kronecker缩放规则。这些结果澄清了与先前NTK重叠理论的关系,解释了为什么参数空间正则化器可能遗漏输出空间干扰,并激发了一种有针对性的谱正则化器。

英文摘要

Catastrophic forgetting in continual adaptation is usually studied through parameter drift, replay, or distillation, but these views do not identify which output-space directions are vulnerable. We give a function-space account in the NTK regime: new-task training induces old-task prediction drift through the cross-task kernel, yielding a closed-form predictor for the forgetting vector before any new-task gradient step. In frozen-backbone linear-head PEFT-CL, where the model is linear in the trainable parameters, the predictor is exact up to numerical precision; for nonlinear adapters/full fine-tuning, it is a local NTK approximation. The same expression reveals that forgetting concentrates in a small number of old-task NTK eigenmodes and, under frozen linear heads, gives a Kronecker scaling rule for the vulnerable rank. These results clarify the relation to prior NTK-overlap theory, explain why parameter-space regularizers can miss output-space interference, and motivate a targeted spectral regularizer.
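For a model that is linear in its trainable parameters, f(x) = w · φ(x), one gradient step on the new-task loss L = (1/2m) Σ_j (f(x_j) - y_j)^2 changes each old-task prediction by Δf(x_i) = -(η/m) Σ_j K(x_i, x_j) r_j, where K(x, x') = φ(x) · φ(x') is the cross-task kernel and r_j are the new-task residuals. The sketch below checks numerically that this kernel prediction, computed before the update, matches the drift actually produced by the update. It is a simplified scalar-output, single-step, squared-loss illustration of the closed-form idea, not the paper's full multi-output derivation; all numbers are made up.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def predicted_drift(w, feats_old, feats_new, y_new, lr):
    """Kernel prediction of the old-task output change, computed before any update."""
    m = len(feats_new)
    residuals = [dot(w, f) - y for f, y in zip(feats_new, y_new)]
    return [
        -lr / m * sum(dot(fo, fn) * r for fn, r in zip(feats_new, residuals))
        for fo in feats_old
    ]


def actual_drift(w, feats_old, feats_new, y_new, lr):
    """Take one gradient-descent step on the new-task MSE and measure the change."""
    m = len(feats_new)
    grad = [0.0] * len(w)
    for f, y in zip(feats_new, y_new):
        r = dot(w, f) - y
        for k in range(len(w)):
            grad[k] += r * f[k] / m
    w_new = [wk - lr * gk for wk, gk in zip(w, grad)]
    return [dot(w_new, fo) - dot(w, fo) for fo in feats_old]


w = [0.5, -1.0, 2.0]                      # trainable linear head on frozen features
old = [[1.0, 2.0, 0.0], [0.0, 1.0, 1.0]]  # phi(x) for old-task inputs
new = [[1.0, 0.0, 1.0], [2.0, 1.0, 0.0]]  # phi(x) for new-task inputs
targets = [1.0, 0.0]
print(predicted_drift(w, old, new, targets, 0.1))
print(actual_drift(w, old, new, targets, 0.1))
```

Old inputs whose features are orthogonal to every new-task feature have zero kernel entries and therefore zero drift, which is the intuition behind forgetting concentrating in a few kernel directions.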

2606.18089 2026-06-17 cs.LG 新提交

From Reasoning Traces to Reusable Modules: Understanding Compositional Generalization in Language Model Reasoning

从推理轨迹到可复用模块:理解语言模型推理中的组合泛化

Lingjing Kong, Xin Liu, Guangyi Chen, Martin Q. Ma, Xiangchen Song, Yuekai Sun, Mikhail Yurochkin, Taylor W. Killian, Ruslan Salakhutdinov, Kun Zhang, Eric P. Xing, Zhengzhong Liu

发表机构 * Carnegie Mellon University(卡内基梅隆大学) Mohamed bin Zayed University of Artificial Intelligence(穆罕默德·本·扎耶德人工智能大学) Institute of Foundation Models(基础模型研究院) University of Michigan(密歇根大学)

AI总结 本文通过层次化潜在选择模型形式化组合泛化,理论证明SFT提供原子模块,RL分解轨迹实现组合泛化,实验验证RL能从复合轨迹中提取原子模块并重组解决新配置。

Comments ICML 2026

AI中文摘要

结合监督微调(SFT)和强化学习(RL)的训练后流程已成为将大型语言模型(LLM)转化为稳健推理者的关键方法。我们认为这种组合成功源于组合泛化,并通过层次化潜在选择模型将其形式化。在此框架中,推理轨迹由一系列离散的潜在选择变量生成,这些变量对应于可复用的原子模块,包括技能(局部操作)和路由机制(中间信息如何被选择、复用和组合)。在该模型中,我们从理论上证明SFT和RL扮演着不对称且互补的角色:SFT在组合轨迹中提供原始模块材料,而RL分解这些轨迹以识别潜在原子模块并实现组合泛化。我们设计受控实验验证这一理论。结果表明,RL可以从SFT提供的复合轨迹中提取原子模块,并将其重组以解决新配置。此外,我们发现基于复合轨迹的训练比基于孤立原子模块的训练产生更强的泛化能力。最后,我们研究了SFT和RL数据之间的关系,并确定了一种有效协议:SFT通过组合轨迹确保所有原子模块的覆盖,而RL专注于SFT支持范围之外的新组合以驱动探索。

英文摘要

Post-training pipelines that combine supervised fine-tuning (SFT) with reinforcement learning (RL) have emerged as the key recipe for transforming large language models (LLMs) into robust reasoners. We argue that this combined success is driven by compositional generalization, which we formalize through a hierarchical latent selection model. In this framework, reasoning traces are generated by a cascade of discrete latent selection variables corresponding to reusable atomic modules, including both skills (local operations) and routing mechanisms (how intermediate information is selected, reused, and composed). Within this model, we theoretically show that SFT and RL play asymmetric, complementary roles: SFT supplies the raw module materials in compositional traces, and RL decomposes those traces to identify the latent atomic modules and enable compositional generalization. We design controlled experiments to validate this theory. Our results demonstrate that RL can extract atomic modules from compound traces supplied by SFT and recombine them to solve new configurations. Moreover, we find that training on compound traces yields stronger generalization than training on isolated atomic modules. Finally, we investigate the relationship between SFT and RL data and identify an effective protocol in which SFT ensures coverage of all atomic modules through compositional traces, while RL focuses on novel compositions outside the SFT support to drive exploration.

2606.17645 2026-06-17 cs.AI cs.CL cs.LG 交叉投稿

Beyond Domains: Reusing Web Skills via Transferable Interaction Patterns

超越领域:通过可迁移交互模式重用网络技能

Shiqi He, Yue Cui, Feijie Wu, Xinyu Ma, Jiaheng Lu, Yaliang Li, Bolin Ding, Mosharaf Chowdhury

发表机构 * University of Michigan(密歇根大学) Alibaba Group(阿里巴巴集团) Purdue University(普渡大学) McMaster University(麦克马斯特大学) University of Pennsylvania(宾夕法尼亚大学)

AI总结 提出SkillMigrator代理,通过学习可迁移交互模式(TIP)匹配布局结构而非元素引用,实现跨站点技能重用,在WebArena和Mind2Web上成功轨迹的LLM动作数减少8-10%。

AI中文摘要

大型语言模型(LLM)网络代理通常被部署为工具调用者:每轮,模型读取新的页面观察并发出一个结构化工具动作。当每个动作都是低级原语时,视野迅速增长,面向策略的LLM完成次数也随之增加,在Mind2Web和WebArena等基准测试中主导了延迟和成本。因此,最近的系统将重复的交互片段包装为网络技能:从成功轨迹或诱导程序中构建的可调用工具,这样一次调用可以替代多个原语。然而,先前的技能库仍然主要通过指令相似性或粗略的站点元数据触发,这导致在未见站点上技能重用率低,并留下了许多潜在的步骤和令牌减少空间。我们提出了SkillMigrator,一个学习可重用网络技能并通过匹配布局结构而非特定元素引用来跨站点迁移它们的代理。每个诱导技能被存储为可迁移交互模式(TIP):技能与诱导时快照的结构草图配对。在测试时,SkillMigrator通过布局相似性检索TIP,并将其引用锚定到实时页面。其余堆栈是标准的:具有稳定引用的可访问性快照观察,以及基于原语加技能调用的固定工具调用。与最先进的方法相比,SkillMigrator在匹配成功率的情况下,将WebArena和Mind2Web上成功轨迹的平均LLM动作数减少了8-10%。

英文摘要

Large language model (LLM) web agents are usually deployed as tool callers: each turn, the model reads a fresh page observation and emits one structured tool action. When every action is a low-level primitive, horizons grow quickly and so do policy-facing LLM completions, dominating latency and cost on benchmarks such as Mind2Web and WebArena. Recent systems therefore wrap repeated interaction fragments as web skills: callable tools built from successful trajectories or induced programs, so one call can replace several primitives. However, prior skill libraries are still triggered mainly by instruction similarity or coarse site metadata, which yields low skill reuse on held-out sites and leaves much of the potential step and token reduction on the table. We present SkillMigrator, an agent that learns reusable web skills and transfers them across sites by matching layout structure rather than specific element references. Each induced skill is stored as a transferable interaction pattern (TIP): the skill paired with a structural sketch of the snapshot at induction time. At test time, SkillMigrator retrieves TIPs by layout similarity and grounds their references on the live page. The rest of the stack is standard: accessibility-snapshot observations with stable references, and fixed tool calling over primitives plus skill invocations. Compared with the state-of-the-art approaches, SkillMigrator reduces the average LLM-action count on successful trajectories by 8-10% across both WebArena and Mind2Web at matched success rate.
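Matching on layout rather than on specific element references can be illustrated with a toy accessibility tree. In this sketch the hypothetical structural sketch is the set of (parent role, child role) edges, stored TIPs are retrieved by Jaccard similarity above a threshold, and grounding references on the live page is omitted. SkillMigrator's actual sketch format and similarity measure may differ; all names here are illustrative.

```python
def structural_sketch(node):
    """node = (role, [children]); returns the set of (parent_role, child_role) edges."""
    role, children = node
    edges = set()
    for child in children:
        edges.add((role, child[0]))
        edges |= structural_sketch(child)
    return edges


def jaccard(a, b):
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def retrieve_tip(library, page_tree, min_sim=0.5):
    """Return the name of the stored TIP whose sketch best matches the page, or None."""
    sketch = structural_sketch(page_tree)
    best_name, best_sim = None, -1.0
    for name, tip_sketch in library.items():
        sim = jaccard(sketch, tip_sketch)
        if sim > best_sim:
            best_name, best_sim = name, sim
    return best_name if best_sim >= min_sim else None


login_form = ("form", [("textbox", []), ("textbox", []), ("button", [])])
search_form = ("form", [("searchbox", []), ("button", [])])
library = {"login": structural_sketch(login_form), "search": structural_sketch(search_form)}

# A login page on an unseen site: different wrapper and an extra link, same core layout.
new_page = ("main", [("form", [("textbox", []), ("textbox", []), ("button", []), ("link", [])])])
print(retrieve_tip(library, new_page))
```

Because the sketch ignores element IDs and text, the same login skill can be retrieved on a site whose concrete references it has never seen; binding the skill's steps to live elements would be a separate grounding step.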

2512.04524 2026-06-17 cs.LG cs.AI 版本更新

Prototype-Based Semantic Consistency Alignment for Domain Adaptive Retrieval

基于原型语义一致性对齐的域自适应检索

Tianle Hu, Weijun Lv, Na Han, Xiaozhao Fang, Jie Wen, Jiaxing Li, Guoxu Zhou

发表机构 * School of Computer Science and Technology, Guangdong University of Technology(广东工业大学计算机科学与技术学院) School of Automation, Guangdong University of Technology(广东工业大学自动化学院) School of Computer Science, Guangdong Polytechnic Normal University(广东技术师范大学计算机科学学院) School of Computer Science and Technology, Harbin Institute of Technology, Shenzhen(哈尔滨工业大学深圳校区计算机科学与技术学院) School of Artificial Intelligence, Guangzhou University(广州大学人工智能学院)

AI总结 提出原型语义一致性对齐(PSCA)两阶段框架,通过正交原型建立类级语义连接,利用几何邻近性加权伪标签置信度,并在重构特征上量化生成统一哈希码,解决域自适应检索中的类级对齐缺失和量化质量下降问题。

Comments AAAI 2026

AI中文摘要

域自适应检索旨在将知识从有标签的源域迁移到无标签的目标域,实现有效检索的同时缓解域差异。然而,现有方法存在几个根本性局限:1)忽略类级语义对齐,过度追求成对样本对齐;2)缺乏伪标签可靠性考虑或评估标签正确性的几何指导;3)直接量化受域偏移影响的原始特征,损害所学哈希码的质量。鉴于这些局限,我们提出基于原型的语义一致性对齐(PSCA),一种用于有效域自适应检索的两阶段框架。在第一阶段,一组正交原型直接建立类级语义连接,在聚集类内样本的同时最大化类间分离性。在原型学习过程中,几何邻近性通过自适应加权伪标签置信度,为语义一致性对齐提供可靠性指标。所得的隶属度矩阵和原型促进特征重建,确保在重建特征而非原始特征上进行量化,从而改善后续哈希编码质量并无缝连接两个阶段。在第二阶段,特定域的量化函数在相互逼近约束下处理重建特征,生成跨域的统一二进制哈希码。大量实验验证了PSCA在多个数据集上的优越性能。

英文摘要

Domain adaptive retrieval aims to transfer knowledge from a labeled source domain to an unlabeled target domain, enabling effective retrieval while mitigating domain discrepancies. However, existing methods encounter several fundamental limitations: 1) neglecting class-level semantic alignment and excessively pursuing pair-wise sample alignment; 2) lacking either pseudo-label reliability consideration or geometric guidance for assessing label correctness; 3) directly quantizing original features affected by domain shift, undermining the quality of learned hash codes. In view of these limitations, we propose Prototype-Based Semantic Consistency Alignment (PSCA), a two-stage framework for effective domain adaptive retrieval. In the first stage, a set of orthogonal prototypes directly establishes class-level semantic connections, maximizing inter-class separability while gathering intra-class samples. During the prototype learning, geometric proximity provides a reliability indicator for semantic consistency alignment through adaptive weighting of pseudo-label confidences. The resulting membership matrix and prototypes facilitate feature reconstruction, ensuring quantization on reconstructed rather than original features, thereby improving subsequent hash coding quality and seamlessly connecting both stages. In the second stage, domain-specific quantization functions process the reconstructed features under mutual approximation constraints, generating unified binary hash codes across domains. Extensive experiments validate PSCA's superior performance across multiple datasets.
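One way to read "geometric proximity as a reliability indicator" is a soft assignment of each target feature to orthogonal class prototypes: the largest membership acts as pseudo-label confidence, and the membership-weighted mix of prototypes acts as the reconstructed feature that is hashed afterwards. Everything in this sketch (softmax over negative distances, the temperature, and the sign-of-projection hash with made-up hyperplanes) is an illustrative assumption rather than PSCA's actual objective or quantizer.

```python
import math


def soft_membership(x, prototypes, temperature=1.0):
    """Softmax over negative Euclidean distances: closer prototypes get more weight."""
    logits = [-math.dist(x, p) / temperature for p in prototypes]
    top = max(logits)
    exps = [math.exp(v - top) for v in logits]
    z = sum(exps)
    return [e / z for e in exps]


def reconstruct(membership, prototypes):
    """Reconstructed feature = membership-weighted combination of prototypes."""
    dim = len(prototypes[0])
    return [sum(m * p[k] for m, p in zip(membership, prototypes)) for k in range(dim)]


def hash_code(feature, hyperplanes):
    """Binary code in {-1, +1}: which side of each (hypothetical) hyperplane the feature is on."""
    return [1 if sum(w * f for w, f in zip(h, feature)) >= 0 else -1 for h in hyperplanes]


prototypes = [[1.0, 0.0], [0.0, 1.0]]  # orthogonal class prototypes
x = [0.9, 0.1]                          # a target-domain feature
m = soft_membership(x, prototypes)
pseudo_label, confidence = max(enumerate(m), key=lambda kv: kv[1])
code = hash_code(reconstruct(m, prototypes), [[1.0, -1.0], [-1.0, 1.0]])
print(pseudo_label, round(confidence, 3), code)
```

Hashing the reconstructed feature rather than `x` itself is the point the abstract stresses: the code is computed from a class-aligned representation instead of a domain-shifted one.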

2602.03846 2026-06-17 cs.LG cs.AI 版本更新

PLATE: Plasticity-Tunable Efficient Adapters for Geometry-Aware Continual Learning

PLATE: 可塑性可调的几何感知持续学习高效适配器

Romain Cosentino

AI总结 提出无需旧任务数据的持续学习方法PLATE,利用预训练网络的几何冗余性,通过结构化低秩更新显式控制可塑性-保留权衡,提升最坏情况保留保证。

AI中文摘要

我们为预训练模型开发了一种持续学习方法,该方法不需要访问旧任务数据,解决了基础模型适应中预训练分布通常不可用的实际障碍。我们的关键观察是,预训练网络表现出大量的几何冗余性,并且这种冗余性可以通过两种互补的方式加以利用。首先,冗余神经元提供了预训练时代主导特征方向的代理,使得可以直接从预训练权重构建近似受保护的更新子空间。其次,冗余性为可塑性的放置位置提供了自然偏差:通过将更新限制在冗余神经元的子集并约束剩余的自由度,我们获得了在旧数据分布上功能漂移减少且最坏情况保留保证改善的更新族。这些见解导致了PLATE(可塑性可调的高效适配器),一种不需要过去任务数据的持续学习方法,它提供了对可塑性-保留权衡的显式控制。PLATE通过结构化低秩更新ΔW = B A Q^T参数化每一层,其中B和Q从预训练权重一次性计算并保持冻结,只有A在新任务上训练。代码可在 https://github.com/SalesforceAIResearch/PLATE 获取。

英文摘要

We develop a continual learning method for pretrained models that requires no access to old-task data, addressing a practical barrier in foundation model adaptation where pretraining distributions are often unavailable. Our key observation is that pretrained networks exhibit substantial geometric redundancy, and that this redundancy can be exploited in two complementary ways. First, redundant neurons provide a proxy for dominant pretraining-era feature directions, enabling the construction of approximately protected update subspaces directly from pretrained weights. Second, redundancy offers a natural bias for where to place plasticity: by restricting updates to a subset of redundant neurons and constraining the remaining degrees of freedom, we obtain update families with reduced functional drift on the old-data distribution and improved worst-case retention guarantees. These insights lead to PLATE (Plasticity-Tunable Efficient Adapters), a continual learning method requiring no past-task data that provides explicit control over the plasticity-retention trade-off. PLATE parameterizes each layer with a structured low-rank update $\Delta W = B A Q^\top$, where $B$ and $Q$ are computed once from pretrained weights and kept frozen, and only $A$ is trained on the new task. The code is available at https://github.com/SalesforceAIResearch/PLATE.
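The parameterization ΔW = B A Q^T can be made concrete with tiny matrices. In this sketch B and Q are hypothetical selector matrices (columns of the identity) standing in for the factors PLATE computes once from the pretrained weights; their actual construction is not reproduced here. Only A is trained, so the update has r·k free parameters instead of d_out·d_in, and any output neuron that B does not select receives no update at all.

```python
def matmul(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(len(Y))) for j in range(len(Y[0]))]
            for i in range(len(X))]


def transpose(X):
    return [list(col) for col in zip(*X)]


def plate_delta(B, A, Q):
    """Structured low-rank update Delta W = B A Q^T, shape d_out x d_in."""
    return matmul(matmul(B, A), transpose(Q))


# Hypothetical frozen factors: B picks output neurons 0 and 2, Q picks input directions 0 and 1.
B = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]              # d_out x r (3 x 2), frozen
Q = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0], [0.0, 0.0]]  # d_in x k  (4 x 2), frozen
A = [[0.5, 0.0], [0.0, -1.0]]                         # r x k     (2 x 2), the only trained block

delta_W = plate_delta(B, A, Q)
print(delta_W)  # row 1 (neuron 1, not selected by B) stays all zero
print("trainable:", len(A) * len(A[0]), "vs dense:", len(B) * len(Q))
```

Changing which columns B and Q keep is the knob that trades plasticity (more free directions) against retention (fewer disturbed neurons and inputs).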

2603.01761 2026-06-17 cs.LG cs.AI 版本更新

Position: Modular Memory is the Key to Continual Learning Agents

Position: 模块化记忆是持续学习智能体的关键

Vaggelis Dorovatas, Malte Schwerin, Andrew D. Bagdanov, Lucas Caccia, Antonio Carta, Laurent Charlin, Barbara Hammer, Tyler L. Hayes, Timm Hess, Christopher Kanan, Dhireesha Kudithipudi, Xialei Liu, Vincenzo Lomonaco, Jorge Mendez-Mendez, Darshan Patil, Ameya Prabhu, Elisa Ricci, Tinne Tuytelaars, Gido M. van de Ven, Liyuan Wang, Joost van de Weijer, Jonghyun Choi, Martin Mundt, Rahaf Aljundi

AI总结 本文提出通过模块化记忆结合权重内学习与上下文学习,解决持续学习中的灾难性遗忘问题,实现大规模持续适应。

Comments ICML 2026 Position Track Spotlight. This work stems from discussions held at the Dagstuhl seminar on Continual Learning in the Era of Foundation Models (October 2025)

AI中文摘要

基础模型通过大规模预训练和增加测试时计算已经改变了机器学习。尽管在多个领域超越了人类表现,这些模型在持续运行、经验积累和个性化方面仍然存在根本性限制,而这些能力是自适应智能的核心。虽然持续学习研究长期以来一直瞄准这些目标,但其历史上专注于权重内学习(IWL),即更新单个模型的参数以吸收新知识,导致灾难性遗忘成为一个持续挑战。我们的立场是,通过设计模块化记忆,结合权重内学习(IWL)和新出现的上下文学习(ICL)的优势,是实现大规模持续适应的缺失环节。我们概述了一个以模块化记忆为中心的架构的概念框架,该架构利用ICL进行快速适应和知识积累,利用IWL对模型能力进行稳定更新,为持续学习智能体绘制了一条实用的路线图。

英文摘要

Foundation models have transformed machine learning through large-scale pretraining and increased test-time compute. Despite surpassing human performance in several domains, these models remain fundamentally limited in continuous operation, experience accumulation, and personalization, capabilities that are central to adaptive intelligence. While continual learning research has long targeted these goals, its historical focus on in-weight learning (IWL), i.e., updating a single model's parameters to absorb new knowledge, has rendered catastrophic forgetting a persistent challenge. Our position is that combining the strengths of In-Weight Learning (IWL) and the newly emerged capabilities of In-Context Learning (ICL) through the design of modular memory is the missing piece for continual adaptation at scale. We outline a conceptual framework for modular memory-centric architectures that leverage ICL for rapid adaptation and knowledge accumulation, and IWL for stable updates to model capabilities, charting a practical roadmap toward continually learning agents.

2605.01973 2026-06-17 cs.CL cs.LG 版本更新

Learn-To-Learn on Arbitrary Textual Conditioning: A Hypernetwork-Driven Meta-Gated LLM

在任意文本条件下学会学习:一种超网络驱动的元门控大语言模型

Luo Ji, Qi Qin, Ningyuan Xi, Teng Chen, Qingqing Gu, Hongyan Li

发表机构 * University of Science and Technology of China(中国科学技术大学)

AI总结 提出一种超网络驱动的元门控机制,通过动态调整SwiGLU块中的β参数,使LLM适应不同文本条件,优于微调和元学习基线。

Comments Accepted by ICML 2026

AI中文摘要

传统的大语言模型可能面临语料异质性和细微条件变化的问题。虽然微调可能导致灾难性遗忘,但元学习在LLM上的应用也因其复杂性和可扩展性而受到限制。在本文中,我们激活了SwiGLU块中的元信号$β$,形成了一种自适应调整FFN非线性的元门控机制。我们使用超网络动态生成基于文本条件的$β$,为LLM提供元可控性。通过在任务、领域、角色和风格等不同条件类型上的测试,我们的方法优于微调和元学习基线,并且能够合理泛化到未见过的任务、条件类型或指令。我们的代码可在https://github.com/AaronJi/MeGan找到。

英文摘要

Conventional LLMs may suffer from corpus heterogeneity and subtle condition changes. While fine-tuning can cause catastrophic forgetting, applying meta-learning to LLMs is also limited by its complexity and poor scalability. In this paper, we activate the meta-signal $β$ within the SwiGLU blocks, resulting in a meta-gating mechanism that adaptively adjusts the nonlinearity of the FFN. A hypernetwork dynamically produces $β$ from textual conditions, providing meta-controllability over LLMs. Tested on different condition types such as task, domain, persona, and style, our method outperforms fine-tuning and meta-learning baselines and generalizes reasonably to unseen tasks, condition types, or instructions. Our code is available at https://github.com/AaronJi/MeGan.
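SwiGLU computes (Swish_β(xW) ⊙ xV)W_2 with Swish_β(z) = z · σ(βz). Setting β = 0 makes the gate linear (z/2) while large β approaches ReLU, so β directly controls the FFN's nonlinearity. A minimal sketch of conditioning β on text, assuming a hypothetical linear hypernetwork over a precomputed condition embedding followed by softplus to keep β positive; the output projection W_2 is omitted, the weights are toy values, and none of this is taken from the MeGan code.

```python
import math


def swish(z, beta):
    """Swish_beta(z) = z * sigmoid(beta * z)."""
    return z / (1.0 + math.exp(-beta * z))


def hyper_beta(cond_embedding, weights, bias):
    """Hypothetical hypernetwork: linear map followed by softplus so that beta > 0."""
    a = sum(w * c for w, c in zip(weights, cond_embedding)) + bias
    return math.log1p(math.exp(a))


def swiglu_hidden(x, W, V, beta):
    """Gated hidden activations Swish_beta(xW) * (xV); the output projection is omitted."""
    gate = [swish(sum(w * xi for w, xi in zip(row, x)), beta) for row in W]
    value = [sum(v * xi for v, xi in zip(row, x)) for row in V]
    return [g * v for g, v in zip(gate, value)]


x = [1.0, -2.0]
W = [[0.5, 0.1], [-0.3, 0.8]]  # each row holds one hidden unit's input weights (toy values)
V = [[1.0, 0.0], [0.0, 1.0]]
for name, cond in {"formal style": [1.0, 0.0], "casual style": [0.0, 1.0]}.items():
    beta = hyper_beta(cond, weights=[2.0, -1.0], bias=0.0)
    print(name, round(beta, 3), swiglu_hidden(x, W, V, beta))
```

The same FFN weights thus produce differently shaped activations per condition, which is the sense in which only a small set of meta-signals is adapted rather than the full network.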

11. 数据集、基准与评测 30 篇

2606.17321 2026-06-17 cs.LG cs.CV 新提交

ProCUA-SFT Technical Report

ProCUA-SFT 技术报告

Jaehun Jung, Ximing Lu, Brandon Cui, Muhammad Khalifa, Shaokun Zhang, Hao Zhang, Jin Xu, Amala Sanjay Deshmukh, Karan Sapra, Andrew Tao, Yejin Choi, Jan Kautz, Mingjie Liu, Yi Dong

发表机构 * NVIDIA(英伟达) University of Washington(华盛顿大学) Allen Institute for AI(艾伦人工智能研究所)

AI总结 提出 ProCUA-SFT 数据集,通过自动化管道从 2484 个应用组合的合成轨迹中蒸馏出 310 万步级 SFT 样本,微调 UI-TARS 7B 在 OSWorld 上达到 45.0% 的成功率,比基线提升 18.7 个百分点。

Comments 15 pages, 5 figures

AI中文摘要

训练计算机使用智能体(CUA,即通过截图和键盘/鼠标操作与图形桌面交互的模型)需要在全桌面环境中收集的大规模、多样化的轨迹数据。最大的公共资源 AgentNet(22.5K 条人类轨迹)在用于监督微调(SFT)时会导致负迁移:在 AgentNet 上继续训练 UI-TARS 7B 导致 OSWorld 成功率从 26.3% 下降到 8-10%。我们提出了 ProCUA-SFT,一个包含 310 万步级 SFT 样本的数据集,这些样本从 2484 个应用组合中的 93K 条合成轨迹中蒸馏得到。该数据集由一个全自动管道生成,该管道(i)在预置真实世界内容的实况桌面上合成有基础的任务,这些内容包括 912 个来自 SpreadsheetBench 的电子表格、约 10K 个来自 Zenodo10K 的宽松许可演示文稿以及多应用 OSWorld 配置;(ii)在展开前通过二元前置条件检查验证每个任务的可行性。单个 VLM(Kimi-K2.5)同时作为目标生成器、前置条件判断器和轨迹执行器,消除了规划器-执行器的能力差距。每条轨迹被扩展为步前缀样本,精确复现推理时看到的上下文布局。在 ProCUA-SFT 上微调 UI-TARS 7B 一个 epoch 后,在 OSWorld 上达到 45.0%,比基础模型提升 18.7 个百分点,比 AgentNet 训练的模型高出 35 个百分点以上。ProCUA 的一个子集被纳入 Nemotron 3 Nano Omni 模型的训练数据中,为其计算机使用能力做出了贡献。

英文摘要

Training computer-use agents (CUAs), models that interact with graphical desktops through screenshots and keyboard/mouse actions, requires large-scale, diverse trajectory data collected in full desktop environments. The largest public resource, AgentNet (22.5K human trajectories), leads to negative transfer when used for supervised fine-tuning (SFT): continued training of UI-TARS 7B on AgentNet causes its OSWorld success rate to fall from 26.3% to 8-10%. We present ProCUA-SFT, a dataset of 3.1M step-level SFT samples distilled from 93K synthetic trajectories across 2,484 application combinations. The dataset is produced by a fully automated pipeline that (i) synthesizes grounded tasks on live desktops seeded with real-world content (912 spreadsheets from SpreadsheetBench, approximately 10K permissively-licensed presentations from Zenodo10K, and multi-application OSWorld configs) and (ii) verifies each task's feasibility through binary precondition checking before rollout. A single VLM (Kimi-K2.5) serves as goal generator, precondition judge, and trajectory executor, eliminating planner-actor capability gaps. Each trajectory is expanded into step-prefix samples that exactly reproduce the context layout seen at inference time. Fine-tuning UI-TARS 7B on ProCUA-SFT for one epoch yields 45.0% on OSWorld, an 18.7 percentage-point improvement over the base model and more than 35 percentage points above AgentNet-trained counterparts. A subset of ProCUA was incorporated into the training data for the Nemotron 3 Nano Omni model, contributing to its computer-use capabilities.
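Step-prefix expansion turns one trajectory of n steps into n supervised samples, each pairing the context the agent would see at that step with the action it took. A minimal sketch, assuming a hypothetical context layout (goal, full action history, current screenshot); the real layout is whatever the target model sees at inference time, which the abstract does not spell out, and the file names and actions below are made up.

```python
def expand_step_prefixes(goal, steps):
    """steps: list of (screenshot_id, action) pairs from one trajectory."""
    samples = []
    for i, (screenshot, action) in enumerate(steps):
        context = {
            "goal": goal,
            "history": [a for _, a in steps[:i]],  # actions already taken
            "observation": screenshot,             # what the agent sees now
        }
        samples.append({"context": context, "target": action})
    return samples


trajectory = [
    ("shot_0.png", "click(File)"),
    ("shot_1.png", "click(Save As)"),
    ("shot_2.png", "type('report.xlsx')"),
]
samples = expand_step_prefixes("Save the sheet as report.xlsx", trajectory)
for s in samples:
    print(len(s["context"]["history"]), s["target"])
```

Assuming one sample per step, 3.1M samples from 93K trajectories corresponds to roughly 33 steps per trajectory on average (3.1M / 93K ≈ 33.3).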

2606.17464 2026-06-17 cs.LG 新提交

CheckMIABench: Firm Foundations For Membership Inference Attacks on Language Models

CheckMIABench: 语言模型成员推理攻击的坚实基础

Jeffrey G. Wang, Jason Wang, Marvin Li, Seth Neel

发表机构 * Harvard University(哈佛大学) Harvard Business School(哈佛商学院)

AI总结 为解决成员推理攻击评估中的分布偏移问题,提出基于训练中固定点前后数据同分布的基准框架,在Pythia和OLMo模型上评估多种攻击,并开源模块化库。

AI中文摘要

成员推理攻击(MIA)是评估机器学习模型隐私属性的标准方法。尽管已有多次尝试评估语言模型上的MIA,但现有文献在构建干净评估以测试新技术方面遇到诸多困难。特别是,成员集和非成员集之间的微妙分布偏移可能破坏MIA的统计有效性;最近的研究通过展示没有访问底层模型的“盲”方法在同一基准上的表现远优于已发布方法,强调了这一点。本文利用训练过程中固定点前后的训练数据来自同一分布的洞察,构建了一个用于对LLM进行原则性MIA评估的基准。因此,所有具有中间检查点和公开训练数据的开源模型都可以转化为MIA测试平台。我们将我们的框架应用于针对Pythia和OLMo模型系列(从70M到7B参数)的六种已发表攻击方法。为促进进一步的隐私研究,我们开源了一个模块化库,用于在此设置中设计和实现攻击:https://github.com/safr-ai-lab/pandora_llm。

英文摘要

Membership inference attacks (MIAs) are a canonical way to assess a machine learning model's privacy properties. Although several attempts have been made to evaluate MIAs on language models, the extant literature has suffered numerous difficulties in constructing clean evaluations to test new techniques. In particular, subtle distribution shifts between member and non-member sets can undermine the statistical validity of MIAs; recent work has underscored this by showing that "blind" methods with no access to the underlying model can perform far better than published methods on the same benchmarks. This paper constructs a benchmark for principled evaluation of MIAs against LLMs by leveraging the insight that training data before and after a fixed point during training are drawn from the same distribution. Therefore, all open-source models with intermediate checkpoints and public training data can be converted into MIA testbeds. We apply our framework to a half-dozen published attacks on the Pythia and OLMo model families, from 70M to 7B parameters. To facilitate further privacy research, we open-source a modular library for designing and implementing attacks in this setting: https://github.com/safr-ai-lab/pandora_llm.
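The construction rests on one observation: at an intermediate checkpoint, documents already consumed by training are members, and documents scheduled later in the same data stream are non-members drawn from the same distribution. The sketch below builds that split from a hypothetical document-to-training-step map and scores a simple loss-threshold attack by AUC. The loss values are made up for illustration, and the function names are hypothetical rather than the pandora_llm API.

```python
def split_by_checkpoint(doc_step, checkpoint_step):
    """Members: documents trained on before the checkpoint; non-members: scheduled after it."""
    members = [d for d, s in doc_step.items() if s < checkpoint_step]
    nonmembers = [d for d, s in doc_step.items() if s >= checkpoint_step]
    return members, nonmembers


def auc(member_scores, nonmember_scores):
    """Probability that a random member outscores a random non-member (ties count 1/2)."""
    wins = 0.0
    for m in member_scores:
        for n in nonmember_scores:
            wins += 1.0 if m > n else 0.5 if m == n else 0.0
    return wins / (len(member_scores) * len(nonmember_scores))


doc_step = {"doc_a": 10, "doc_b": 20, "doc_c": 30, "doc_d": 40}         # step when each doc is seen
loss_at_ckpt = {"doc_a": 1.0, "doc_b": 2.0, "doc_c": 2.0, "doc_d": 3.0}  # made-up losses


def membership_score(doc):
    """Loss attack: lower loss at the checkpoint suggests membership."""
    return -loss_at_ckpt[doc]


members, nonmembers = split_by_checkpoint(doc_step, checkpoint_step=25)
print(members, nonmembers,
      auc([membership_score(d) for d in members], [membership_score(d) for d in nonmembers]))
```

Because members and non-members come from one shuffled stream, a "blind" attack that never queries the model should sit near AUC 0.5, which is exactly the sanity check that shifted benchmarks fail.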

2606.17508 2026-06-17 cs.LG cs.DC cs.PL cs.SE 新提交

When the Next Step Is Not One Step: Distribution-Aware Execution Modeling for Concurrent Go Programs

当下一步不是一步:面向并发Go程序的分布感知执行建模

Kaviru Hapuarachchi

发表机构 * University of Colombo School of Computing(科伦坡大学计算学院)

AI总结 针对并发程序非确定性调度导致单标签预测困难的问题,提出分布感知训练方法,通过多次运行聚合经验分布并微调7B模型,在真实Go缺陷预测中准确率达36.2%,并降低期望校准误差。

Comments 10 pages, 2 figures

AI中文摘要

训练模型预测并发程序中的下一步比看起来更难:从相同跟踪前缀出发的同一程序的两次运行可能产生不同的有效下一事件,因为调度器是非确定性的。针对单一标签训练的模型实际上是在学习猜测随机过程的一个结果。我们反过来利用这种非确定性作为训练信号。我们将每个程序运行多次,将观察到的下一事件聚合成经验分布,并使用KL散度目标微调一个7B模型。在从真实生产Go缺陷(CockroachDB、Kubernetes、gRPC、etcd)中抽取的798个保留预测上,对少于一千个跟踪进行微调达到了36.2%的准确率,超过了零样本的Gemini 3.5 Flash(34.8%)和未微调的同一模型(28.6%)。分布训练在准确率上与交叉熵相当(35.8% vs. 36.2%),同时将期望校准误差从0.205降低到0.169。我们还推导出一类select阻塞goroutine的形式化goroutine泄漏特征,其中P(GoUnblock)=0由调度器语义保证,而非学习得到。我们发布了数据集、训练适配器和所有工具。

英文摘要

Training a model to predict the next step in a concurrent program is harder than it looks: two runs of the same program from the same trace prefix can produce different next events, both valid, because the scheduler is nondeterministic. A model trained against a single label is learning to guess one outcome of a random process. We turn this around and use the nondeterminism as a training signal. We run each program many times, aggregate the observed next events into an empirical distribution, and fine-tune a 7B model to match that distribution with a KL objective. On 798 held-out predictions drawn from real production Go bugs (CockroachDB, Kubernetes, gRPC, etcd), fine-tuning on fewer than a thousand traces reaches 36.2% accuracy, ahead of Gemini 3.5 Flash used zero-shot (34.8%) and the same model without fine-tuning (28.6%). Distribution training matches cross-entropy on accuracy (35.8% vs. 36.2%) while reducing Expected Calibration Error from 0.205 to 0.169. We also derive a formal goroutine-leak signature for a class of select-blocked goroutines where P(GoUnblock)=0 holds by scheduler semantics, not by learning. We release the dataset, trained adapters, and all tooling.
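The training signal is an empirical next-event distribution rather than a single label. A minimal sketch of the three ingredients: aggregating the next events observed across repeated runs from the same prefix, a KL divergence from that distribution to the model's prediction, and expected calibration error with equal-width bins. Event names and probabilities are illustrative, the direction KL(p_empirical || q_model) is an assumption, and 10 equal-width bins is a common default rather than necessarily the paper's choice.

```python
import math
from collections import Counter


def empirical_distribution(next_events):
    """Aggregate the next events observed across repeated runs from the same prefix."""
    counts = Counter(next_events)
    return {event: c / len(next_events) for event, c in counts.items()}


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) over events in p's support; eps guards against log(0)."""
    return sum(pe * math.log(pe / max(q.get(e, 0.0), eps)) for e, pe in p.items() if pe > 0)


def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted mean |confidence - accuracy| over equal-width confidence bins."""
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        bins[min(int(conf * n_bins), n_bins - 1)].append((conf, ok))
    total = len(confidences)
    ece = 0.0
    for b in bins:
        if b:
            avg_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(ok for _, ok in b) / len(b)
            ece += len(b) / total * abs(avg_conf - accuracy)
    return ece


runs = ["GoUnblock", "GoUnblock", "GoSched", "GoUnblock"]  # next event seen in 4 runs
target = empirical_distribution(runs)                       # {"GoUnblock": 0.75, "GoSched": 0.25}
model_pred = {"GoUnblock": 0.6, "GoSched": 0.3, "GoExit": 0.1}
print(kl_divergence(target, model_pred))
print(expected_calibration_error([0.9, 0.9, 0.3], [True, False, False]))
```

Minimizing this KL pushes the model toward the scheduler's actual spread of outcomes, which is why calibration can improve even when top-1 accuracy stays flat.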

2606.17541 2026-06-17 cs.LG cs.AI 新提交

Offline Preference-Based Trajectory Evaluation

基于偏好的离线轨迹评估

Fernando Diaz

发表机构 * Carnegie Mellon University(卡内基梅隆大学)

AI总结 针对离线评估中仅使用终端成功率导致统计效率低下的问题,提出基于偏好的轨迹评估方法,通过比较轨迹的时间偏好减少平局,提升区分能力、排名稳定性和数据效率。

AI中文摘要

智能体系统的离线评估通常将轨迹简化为终端成功,丢弃了部分进展信息并导致大量平局,通过减少有效样本量和削弱区分系统的能力,造成显著的统计低效。我们提出基于偏好的轨迹评估,该方法通过时间偏好(关于进展和返回时间分布)直接比较轨迹。我们发现,在多种智能体与交互式基准测试中,基于标准成功率的指标在大约75%的实例上产生平局比较,而轨迹感知偏好将平局减少到大约35%,从而提高了区分能力、排名稳定性和数据效率。我们的结果表明,通常归因于数据收集不足或问题难度的基准饱和,也可能由评估指标的选择所解释。

英文摘要

Offline evaluation of agentic systems often collapses trajectories to terminal success, discarding information about partial progress and inducing widespread ties, creating substantial statistical inefficiency by reducing effective sample size and weakening the ability to distinguish systems. We propose preference-based trajectory evaluation, which compares trajectories directly through temporal preferences over progress and time-to-return profiles. We find that, across diverse agentic and interactive benchmarks, standard success-based metrics produce tied comparisons on roughly 75% of instances, whereas trajectory-aware preferences reduce ties to roughly 35%, improving discriminative power, ranking stability, and data efficiency. Our results suggest that benchmark saturation, often attributed to poor data collection or problem difficulty, may also be explained by the choice of evaluation measure.
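Terminal success produces many ties because any two failed trajectories, or any two successful ones, compare as equal. A minimal sketch of a trajectory-aware comparison, assuming each trajectory is recorded as a per-step progress value in [0, 1]; the tie-breakers used here (higher peak progress, then reaching that peak sooner) are illustrative stand-ins for the paper's temporal preferences over progress and time-to-return, and the runs are made up.

```python
from itertools import combinations


def success_pref(a, b):
    """+1 if a is preferred, -1 if b is, 0 for a tie, using terminal success only."""
    sa, sb = a[-1] >= 1.0, b[-1] >= 1.0
    return (sa > sb) - (sa < sb)


def trajectory_pref(a, b):
    """Fall back to progress information when terminal success ties."""
    s = success_pref(a, b)
    if s != 0:
        return s
    peak_a, peak_b = max(a), max(b)
    if peak_a != peak_b:
        return 1 if peak_a > peak_b else -1
    t_a, t_b = a.index(peak_a), b.index(peak_b)  # first step reaching the peak
    return (t_a < t_b) - (t_a > t_b)             # sooner is better


def tie_rate(trajectories, pref):
    pairs = list(combinations(trajectories, 2))
    return sum(pref(a, b) == 0 for a, b in pairs) / len(pairs)


runs = [[0.2, 0.5, 0.5], [0.1, 0.2, 0.3], [0.5, 1.0], [0.2, 0.4, 1.0]]
print(tie_rate(runs, success_pref), tie_rate(runs, trajectory_pref))
```

Every pair broken out of a tie becomes usable evidence about which system is better, which is how richer preferences raise the effective sample size without collecting more data.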

2606.17858 2026-06-17 cs.LG New submission

Meta-classification of one-class classification models using ranking correlation and nearest neighbor

使用排序相关性和最近邻的一类分类模型的元分类

Toshitaka Hayashi, Hamido Fujita, Dalibor Cimr, Richard Cimler, Jitka Kühnová

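Affiliations: Faculty of Science, University of Hradec Kralove; Malaysia-Japan International Institute of Technology (MJIIT), Universiti Teknologi Malaysia; Regional Research Center, Iwate Prefectural University

AI summary: Proposes meta-classification of one-class classification models using ranking correlation and nearest neighbors. Experiments show high accuracy when the classes are training datasets, and algorithms can be distinguished when the training datasets contain the same class. The task is essentially dataset classification.

Chinese abstract (AI-translated)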

机器学习技术已被应用于各种问题。然而,将机器学习应用于机器学习模型本身是一个未被探索的方向。为此,本文考虑了一类分类(OCC)模型的元分类,因为所有机器学习模型都可以近似为OCC模型。该提案将OCC模型表示为正态性排序,并使用最近邻和排序相关性度量对其进行分类。实验对OCC模型进行分类,其中类别对应于训练数据集、算法和超参数。当类别标签为数据集时,该提案实现了高精度。此外,当训练数据集包含相同类别时,它可以对算法进行分类。讨论强调,OCC模型的分类本质上是将多个样本视为单个输入的数据集分类。实验使用睡眠记录展示了数据集的分类。所提出的方法可以为分类OCC模型、数据集和排序提供统一解决方案。源代码已上传至公共仓库:https://github.com/ToshiHayashi/ClassOCC。

English abstract

Machine Learning (ML) techniques have been applied to various problems. However, applying ML to ML models is an unexplored direction. For this purpose, this paper considers a meta-classification of one-class classification (OCC) models, because all ML models could be approximated as OCC models. The proposal represents OCC models as normality rankings and classifies them using nearest-neighbor and ranking-correlation metrics. The experiment classifies OCC models, where classes correspond to training datasets, algorithms, and hyperparameters. The proposal achieves high accuracy when class labels are datasets. Moreover, it can classify algorithms when the training datasets contain the same class. In addition, the discussion highlights that the classification of OCC models is essentially the classification of datasets that treats multiple samples as a single input. The experiment demonstrates the classification of datasets using sleep records. The proposed method can provide a unified solution for classifying OCC models, datasets, and rankings. Source code is available in the public repository https://github.com/ToshiHayashi/ClassOCC.
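A minimal sketch of ranking-based nearest-neighbor classification. It assumes each OCC model is summarized by its normality scores on a shared set of probe samples, and that Spearman's rank correlation serves as the similarity. The abstract does not say which ranking-correlation metric is used. Labels such as `dataset_A` are placeholders.

```python
def ranks(scores):
    """1-based average ranks; tied scores share the mean of their positions."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    r = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            r[order[k]] = avg
        i = j + 1
    return r

def spearman(a, b):
    """Spearman correlation = Pearson correlation of ranks (assumes non-constant inputs)."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    ma, mb = sum(ra) / n, sum(rb) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(ra, rb))
    va = sum((x - ma) ** 2 for x in ra)
    vb = sum((y - mb) ** 2 for y in rb)
    return cov / (va * vb) ** 0.5

def nearest_neighbor_label(query, references):
    """references: list of (normality_scores, label); return the most correlated label."""
    best = max(references, key=lambda ref: spearman(query, ref[0]))
    return best[1]
```

Because only the ordering of normality scores matters, models with different score scales can still be compared directly.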

2606.18209 2026-06-17 cs.LG New submission

Rethinking Dataset Distillation for Classification: Do Distilled Sets Outperform Coresets?

重新思考用于分类的数据集蒸馏:蒸馏集是否优于核心集?

Trisha Mittal, Akshay Mehra, Joshua Kimball

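Affiliations: Dolby Laboratories

AI summary: Evaluates seven state-of-the-art dataset distillation methods through large-scale standardized experiments. On large datasets, these methods match or underperform coresets while costing more to construct. Coresets also cover the data distribution better and are more computationally efficient.

Chinese abstract (AI-translated)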

数据集蒸馏(DD)已成为以数据为中心的机器学习中的一种重要方法,旨在通过将大型数据集中的信息压缩到少量合成样本中,合成紧凑的训练集以实现高效训练。然而,DD方法通常在不一致的评估协议下进行评估,从标准ERM到单/多教师监督,这使得难以从评估中分离出蒸馏数据的有效性。此外,许多先前方法声称DD优于数据剪枝方法(如核心集选择),其假设是将浓缩数据集限制为真实样本的子集会从根本上限制其表达能力。在这项工作中,我们通过使用标准化数据集和评估协议进行大规模实验,批判性地评估DD方法以评估其内在有效性。我们在ImageNet-1K、ImageNet100和ImageNette上对七种最先进的DD方法进行了基准测试,使用了三种广泛采用的训练协议,并与三种核心集策略进行比较。我们的结果表明,虽然一些DD方法甚至未能优于简单的随机子集,但最先进的DD方法在大型数据集上与核心集相当或更差,并且构建成本显著更高。除了准确性,我们还评估了浓缩集的代表性、多样性和质量,发现核心集始终能更好地覆盖原始数据分布。这些发现凸显了当前DD方法的实际优势有限,并表明核心集仍然具有竞争力,并且通常是以数据为中心的学习中计算效率更高的替代方案。

English abstract

Dataset distillation (DD) has emerged as a prominent approach in data-centric machine learning, aiming to synthesize compact training sets for efficient training by compressing the information in large datasets into a small number of synthetic samples. However, DD methods are often evaluated under inconsistent evaluation protocols, ranging from standard ERM to single/multi-teacher supervision, making it difficult to separate the effectiveness of the distilled data from that of the evaluation protocol. Moreover, many prior methods claim that DD outperforms data pruning approaches such as coreset selection (CS), based on the assumption that restricting condensed datasets to subsets of real samples fundamentally limits their expressiveness. In this work, we critically evaluate DD methods through large-scale experiments using standardized datasets and evaluation protocols to assess their intrinsic effectiveness. We benchmark seven state-of-the-art (SOTA) DD methods on ImageNet-1K, ImageNet100, and ImageNette under three widely adopted training protocols and compare them against three CS strategies. Our results show that while some DD methods fail to outperform even simple random subsets, the SOTA DD approaches are comparable to or worse than coresets on large-scale datasets and incur a substantially higher cost for construction. Beyond accuracy, we also evaluate the representativeness, diversity, and quality of condensed sets, and find that coresets consistently achieve better coverage of the original data distribution. These findings highlight the limited practical advantages of current DD methods and show that coresets remain competitive and are often a more computationally efficient alternative for data-centric learning.
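The abstract does not name its three coreset strategies. One widely used example is greedy k-center selection, also called farthest-first traversal. It picks real samples that cover the feature space and is a 2-approximation to the k-center objective. This sketch assumes plain feature vectors and squared Euclidean distance.

```python
def k_center_greedy(points, k, first=0):
    """Greedy k-center: repeatedly add the point farthest from the current selection."""
    def dist2(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    selected = [first]
    # min_d[i] = squared distance from point i to its nearest selected center.
    min_d = [dist2(p, points[first]) for p in points]
    while len(selected) < min(k, len(points)):
        candidates = [i for i in range(len(points)) if i not in selected]
        nxt = max(candidates, key=lambda i: min_d[i])
        selected.append(nxt)
        min_d = [min(d, dist2(p, points[nxt])) for d, p in zip(min_d, points)]
    return selected
```

Unlike a distilled set, every selected item is a real training sample. This is the restriction that prior DD work assumed would limit expressiveness.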

2606.17062 2026-06-17 q-bio.QM cs.LG Cross-list

RadSEM: A Finding-by-Finding Metric for Clinical Consistency in Radiology Reports

RadSEM:放射学报告中临床一致性的逐发现指标

Zhenhong Yang, Zhuoyun Liu, Jintao Fei, Wen Tang, Shichao Quan, Jun Zhao, Jun Xu

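Affiliations: JDH Algo, JD Health International Inc., China; Department of Big Data in Health Science, The First Affiliated Hospital of Wenzhou Medical University, China; Zhejiang Engineering Research Center for Hospital Emergency; Department of Intensive Care Unit, The First Affiliated Hospital of Wenzhou Medical University, Wenzhou, Zhejiang, China

AI summary: Proposes RadSEM, a metric that uses constrained LLM assistance to rewrite reports into atomic finding sentences. It then performs contradiction-aware many-to-many matching and computes an abnormal-weighted F1 score. On the SSREE stress test, it outperforms existing metrics with high concordance.

Chinese abstract (AI-translated)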

放射学报告评估必须区分临床兼容性与表面相似性,因为否定、侧别或正常-异常极性可能逆转发现。我们提出RadSEM(放射学句子级评估指标),一种受约束的LLM辅助指标,用于基于参考的放射学发现评估。RadSEM将参考报告和生成报告重写为有序的原子发现句,每个句子表达一个部位-发现命题。然后执行矛盾约束的多对多匹配:不兼容对(如“积液”和“无积液”)不得分,而兼容的粒度差异可获部分得分。确定性阶段根据部分-整体和异常-细节关系对配对加权,计数未匹配的发现,并生成异常加权的加权F1分数。因此,LLM支持结构化重写和局部对齐,而非充当不透明评判者。我们使用SSREE(一种受控单调性压力测试,基于2,448份去标识报告扩展为五个等级损坏水平)评估RadSEM。RadSEM的Kendall tau_b达到0.957,全对一致性97.8%,相邻一致性95.0%,81.9%的报告实现严格五级排序,优于放射学专用和通用文本指标,同时避免了极性反转报告重新获得词汇重叠的失败。在同一SSREE集上,RadSEM优于参考锚定的RadSEM-Alt策略,将相邻一致性从90.7%提升至95.0%,严格排序从67.2%提升至81.9%。在599个三元组同义词/反义词子集上,RadSEM在597个案例(99.67%)中偏好同义词。这些结果表明,显式发现单元、矛盾感知匹配和异常聚焦的确定性评分使报告评分更具可解释性,并对临床有意义的错误更敏感。代码见:https://github.com/jdh-algo/RadSEM。

English abstract

Radiology report evaluation must distinguish clinical compatibility from surface similarity, because negation, laterality, or normal-abnormal polarity can reverse a finding. We propose RadSEM (Radiology Sentence-Level Evaluation Metric), a constrained LLM-assisted metric for reference-based evaluation of radiology Findings. RadSEM rewrites reference and generated reports into ordered atomic finding sentences, each expressing one site-finding proposition. It then performs contradiction-constrained many-to-many matching: incompatible pairs such as "effusion" and "no effusion" receive no credit, while compatible granularity differences can receive partial credit. A deterministic stage weights pairs by part-whole and abnormal-detail relationships, counts unmatched findings, and produces an abnormal-focused weighted F1 score. Thus, the LLM supports structured rewriting and local alignment rather than acting as an opaque judge. We evaluate RadSEM with SSREE, a controlled monotonicity stress test built from 2,448 de-identified reports expanded into five graded corruption levels. RadSEM achieves Kendall tau_b of 0.957, all-pairs concordance of 97.8%, adjacent concordance of 95.0%, and strict five-level ordering for 81.9% of reports, outperforming radiology-specific and general text metrics while avoiding the failure in which polarity-inverted reports regain lexical overlap. On the same SSREE set, RadSEM outperforms the Ref-anchored RadSEM-Alt policy, improving adjacent concordance from 90.7% to 95.0% and strict ordering from 67.2% to 81.9%. On a 599-triplet synonym/antonym subset, RadSEM prefers synonyms in 597 cases (99.67%). These results suggest that explicit finding units, contradiction-aware matching, and abnormal-focused deterministic scoring make report scoring more interpretable and sensitive to clinically meaningful errors. Code is available at https://github.com/jdh-algo/RadSEM.
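The deterministic scoring stage can be sketched as a weighted F1 with partial credit. The sketch assumes matching has already produced a credit in [0, 1] for each (reference, generated) pair, and that abnormal findings get a fixed higher weight. The weight value of 2.0 and the max-credit aggregation are assumptions for illustration. RadSEM's own part-whole and abnormal-detail weighting is richer.

```python
def weighted_f1(ref_findings, gen_findings, matches, abnormal_weight=2.0):
    """ref_findings / gen_findings: dict finding_id -> is_abnormal (bool).
    matches: dict (ref_id, gen_id) -> credit in [0, 1]. Contradictory pairs
    (e.g. "effusion" vs "no effusion") are absent or carry 0 credit."""
    def weight(is_abnormal):
        return abnormal_weight if is_abnormal else 1.0

    # Each finding keeps its best credit over all many-to-many matches.
    ref_credit = {r: 0.0 for r in ref_findings}
    gen_credit = {g: 0.0 for g in gen_findings}
    for (r, g), credit in matches.items():
        ref_credit[r] = max(ref_credit[r], credit)
        gen_credit[g] = max(gen_credit[g], credit)

    recall = (sum(weight(ref_findings[r]) * c for r, c in ref_credit.items())
              / sum(weight(a) for a in ref_findings.values()))
    precision = (sum(weight(gen_findings[g]) * c for g, c in gen_credit.items())
                 / sum(weight(a) for a in gen_findings.values()))
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Unmatched findings contribute zero credit but keep their weight in the denominator. As a result, a missed abnormal finding costs twice as much as a missed normal one.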

2606.17283 2026-06-17 cs.CR cs.AI cs.LG Cross-list

ARVO: Atlas of Reproducible Vulnerabilities for Open-Source Software

ARVO:开源软件可复现漏洞图谱

Xiang Mei, Jordi Del Castillo, Pulkit Singh Singaria, Haoran Xi, Abdelouahab Benchikh, Tiffany Bao, Ruoyu Wang, Yan Shoshitaishvili, Adam Doupé, Hammond Pearce, Brendan Dolan-Gavitt

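Affiliations: National Vulnerability Database; Google

AI summary: Proposes a method for building reproducible vulnerability datasets at scale and applies it to OSS-Fuzz to construct ARVO. ARVO contains over 6,100 real-world vulnerabilities, with an 81% reproduction rate and 89.4% patch-localization accuracy. The work addresses the reproducibility-quantity-diversity trade-off.

Comments: Accepted at IEEE European Symposium on Security and Privacy (EuroS&P) 2026

Chinese abstract (AI-translated)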

长期以来,在漏洞数据集中实现可复现性、数量和多样性被视为固有的三方权衡,改进一个维度往往以牺牲其他维度为代价。在实践中,可复现性是最常被忽视的维度。这限制了从历史错误数据集中自动提取的内容,并降低了它们对下游安全研究的实用性。在这项工作中,我们提出了一种方法,通过识别大规模错误复现的关键障碍并用通用解决方案加以解决,从而生成一个新的安全数据集,确保大规模多样化漏洞的可复现性。使用这种方法,我们为最大的开源软件漏洞数据集(OSS-Fuzz)引入了完全可复现性,并构建了ARVO数据集(开源软件可复现漏洞图谱)。ARVO是一个大规模数据集,包含311个项目中的6100多个真实世界漏洞。专注于可复现性,ARVO与现有数据集的不同之处在于,它以可以跨版本一致重建、触发和分析的形式提供每个漏洞。可复现性还使得能够自动识别每个漏洞的相应补丁,并支持代码更改后直接与漏洞交互,这是现有大规模数据集所不具备的能力。在我们的评估中,ARVO成功复现了81%的漏洞,并在定位的补丁上达到了89.4%的准确率。我们还讨论了ARVO对上游实践和下游安全研究的影响。

English abstract

Achieving reproducibility, quantity, and diversity in vulnerability datasets has long been viewed as an inherent three-way trade-off, where improving one dimension often comes at the cost of the others. In practice, reproducibility has been the dimension most often neglected. This has limited what can be automatically extracted from historical bug datasets, and has reduced their utility for downstream security research. In this work, we propose a method to produce a new security dataset which ensures reproducibility for diverse vulnerabilities at scale by identifying the key obstacles to large-scale bug reproduction and addressing them with general solutions. Using this method, we introduce full reproducibility to the largest open source software vulnerability dataset (OSS-Fuzz) and construct the ARVO dataset (an Atlas of Reproducible Vulnerabilities in Open-source software). ARVO is a large-scale dataset consisting of over 6,100 real-world vulnerabilities across 311 projects. Focusing on reproducibility, ARVO differs from existing datasets by providing each vulnerability in a form that can be consistently rebuilt, triggered, and analyzed across versions. Reproducibility also enables automatic identification of the corresponding patch for each vulnerability and supports direct interaction with vulnerabilities after code changes, capabilities that existing large-scale datasets do not provide. In our evaluation, ARVO successfully reproduces 81% of vulnerabilities and achieves 89.4% accuracy on the located patches. We also discuss ARVO's influence on both upstream practices and downstream security research.
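Reproducibility is what makes automatic patch localization possible. If every version can be rebuilt and the reproducer re-run, the fix can be found by search. This is a minimal sketch that assumes a linear history, a hypothetical `crashes(version)` predicate that rebuilds and runs the proof-of-concept, and a single crash-to-fixed transition. The abstract does not describe ARVO's actual procedure.

```python
def first_fixed_version(versions, crashes):
    """Binary search for the first version where the reproducer no longer crashes.
    Assumes crashes(versions[0]) is True, the last version is fixed, and the
    crash disappears exactly once along the history."""
    lo, hi = 0, len(versions) - 1  # invariant: crashes at lo, fixed at hi
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if crashes(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]
```

The search needs only about log2(n) rebuilds. This only works if old versions can still be rebuilt, which is the gap reproducibility closes.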

2606.17391 2026-06-17 cs.CL cs.AI cs.LG Cross-list

NarrativeWorldBench: A Frontier-Saturated Benchmark and a Latent World Model for Long-Horizon Co-Creative Audio Drama

NarrativeWorldBench:面向长程共创音频剧的前沿饱和基准与潜在世界模型

Logan Mann, Abdur Rahman, Mohammad Saifullah, Taaha Kazi, Vasu Sharma

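Affiliations: University of California, Santa Barbara; Pocket FM

AI summary: Proposes the NarrativeWorldBench benchmark, which evaluates 21 models on nine narrative-structure metrics. It also introduces N-VSSM, a variational state-space model that maintains a structured latent state over more than 200 episodes using a Mamba-2 backbone and an event-conditioned posterior. In a writer study, N-VSSM outperforms Claude Opus 4.5 on long-arc consistency and controllability.

Comments: 10 pages. Accepted to the ICML 2026 Workshops on High-dimensional Learning Dynamics (HiLD) and Culture x AI

Chinese abstract (AI-translated)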

长篇连载音频剧,其剧情弧线跨越200至800集,是一种重要的创意媒介,也是前沿大语言模型(LLM)表现不佳的场景。我们在一组统一的叙事结构指标上,对21个模型进行了基准测试,涵盖经典、微调、开放前沿、封闭前沿和推理层级。所有封闭前沿系统在情节节拍F1上饱和于[0.78, 0.81]区间,并在视界h=200时下降约-0.20 F1。我们引入了NarrativeWorldBench,一个开放基准,包含九种叙事结构指标,在h∈{10, 20, 50, 100, 200}的视界上评估,并在四种印度语言(印地语、泰米尔语、泰卢固语、马拉地语)上进行跨语言评估。我们提出了N-VSSM,一种叙事变分状态空间模型,通过Mamba-2骨干网络和事件条件后验以及8B解码器,在超过200集的时间内维持一个结构化的256维潜在世界状态。N-VSSM在所有视界上保持情节节拍F1≥0.84,计算量仅为封闭前沿区间的1/4。学习到的文化迁移函数将跨语言忠实度提高了+0.20至+0.23 Likert分。在一项受试者内作家研究(n=12位专业作者,240次试验)中,N-VSSM在长弧一致性上以71%的偏好率优于Claude Opus 4.5,在可控性上评分高出+1.3 Likert分。

English abstract

Long-form serialized audio drama, with arcs that run for 200 to 800 episodes, is a major creative medium and a setting where frontier large language models (LLMs) fail. We benchmark 21 models, spanning classical, fine-tuned, open-frontier, closed-frontier, and reasoning tiers, on a uniform set of structural narrative metrics. All closed-frontier systems saturate at a plot-beat F1 in the band [0.78, 0.81] and drop by about 0.20 F1 at horizon h = 200. We introduce NarrativeWorldBench, an open benchmark of nine narrative-structure metrics evaluated across horizons h in {10, 20, 50, 100, 200}, with cross-lingual evaluation across four Indic languages (Hindi, Tamil, Telugu, Marathi). We introduce N-VSSM, a Narrative Variational State-Space Model that maintains a structured 256-dimensional latent world state over more than 200 episodes via a Mamba-2 backbone with an event-conditioned posterior and an 8B decoder. N-VSSM holds plot-beat F1 >= 0.84 across all horizons at 4x lower compute than the closed-frontier band. A learned Cultural Transfer Function lifts cross-language fidelity by +0.20 to +0.23 Likert points. In a within-subjects writer study (n = 12 professional authors, 240 trials), N-VSSM is preferred over Claude Opus 4.5 on long-arc consistency 71% of the time and rated +1.3 Likert points higher on controllability.

2606.17529 2026-06-17 cs.CE cs.LG Cross-list

Domain-Validity-Gated Metamorphic Testing of Scientific ML Surrogates

基于域有效性门控的科学机器学习代理模型蜕变测试

Meng Li, Xiaohua Yang, Jie Liu, Shiyu Yan

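Affiliations: School of Computing, University of South China; Hunan Engineering Research Center of Software Evaluation and Testing for Intellectual Equipment; CNNC Key Laboratory on High Trusted Computing

AI summary: Scientific ML surrogates lack ground-truth outputs. To address this, the paper proposes screening candidate metamorphic relations for domain validity and turning the admitted ones into executable test assets. The approach is demonstrated on several surrogate architectures.

Chinese abstract (AI-translated)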

科学机器学习(SciML)代理模型近似昂贵的模拟,但任意输入的精确预期输出不可用(预言机问题)。蜕变测试检查执行间的关系,但候选关系并非自动有效:其前提条件、输出映射以及评分算子的数值下限决定了违反是否有意义。我们研究如何筛选候选蜕变关系(MR)的域有效性,并将其转化为可执行的、无预言机的SciML代理模型测试资产。我们提出:(i)域有效性准则,仅当候选的容差主导算子的数值下限且其前提条件成立时才接受该候选;(ii)MR卡可执行资产格式,记录源案例、变换、度量、容差和类型化的关系级判定;(iii)在MeshGraphNets圆柱流代理模型上的案例研究协议,附带声明账本将每个结果绑定到可追踪工件。在MeshGraphNets检查点上,节点置换达到机器精度,镜像y是有界分布外压力发现而非精确对称,绝对守恒被推迟而参考相对守卫通过。相同的读数在保留轨迹、检查点列表、另外三种架构以及PhysicsNeMo上保持一致。在第二个CFD任务(可压缩翼型)上,谓词反而基于物理原因拒绝不可压缩连续性,表明它推理域有效性而非运行固定检查表。在第二个PDE族上,FNO Burgers和热代理模型运行完整的接受/拒绝/执行判定。证据涵盖两个CFD任务和第二个PDE族,支持从候选MR到可审计SciML测试资产的域有效性感知桥梁,将模型级违反与域外应用区分开。

English abstract

Scientific machine-learning (SciML) surrogates approximate expensive simulations, but exact expected outputs for arbitrary inputs are unavailable (the oracle problem). Metamorphic testing checks relations across executions, yet a candidate relation is not automatically valid: its preconditions, output mapping, and the numerical floor of the scoring operator determine whether a violation is meaningful. We study how candidate metamorphic relations (MRs) can be screened for domain validity and turned into executable, oracle-free test assets for SciML surrogates. We propose (i) a domain-validity rubric that admits a candidate only when its tolerance dominates the operator's numerical floor and its preconditions hold; (ii) an MR-card executable-asset format recording source cases, transformations, metrics, tolerances, and typed relation-level verdicts; and (iii) a case-study protocol on MeshGraphNets cylinder-flow surrogates, with a claim ledger binding every result to a tracked artifact. On a MeshGraphNets checkpoint, node permutation holds to machine precision, mirror-y is a bounded out-of-distribution stress finding rather than an exact symmetry, and absolute conservation stays deferred while a reference-relative guard passes. The same readings hold across held-out trajectories, a checkpoint roster, three further architectures, and PhysicsNeMo. On a second CFD task (compressible airfoil) the predicate instead rejects incompressible continuity on physical grounds, showing it reasons about domain validity rather than running a fixed checklist. On a second PDE family, FNO Burgers and heat surrogates run full admit/reject/execute verdicts. The evidence spans two CFD tasks and a second PDE family, supporting a validity-aware bridge from candidate MRs to auditable SciML test assets that separates model-level violations from out-of-domain applications.
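Here is a toy version of a gated metamorphic relation. It assumes a permutation-equivariant surrogate, standing in for a mesh GNN, and a hypothetical numerical floor estimated as a safety factor times machine epsilon times the output scale. The gate returns `reject` when the tolerance does not dominate the floor or the precondition fails. This mirrors the paper's admit/reject/execute verdicts in spirit only; the names and floor estimate are not from the paper.

```python
import sys

def surrogate(node_features):
    """Toy permutation-equivariant model: each node's output depends on itself
    and on the mean over all nodes, which is order-independent."""
    mean = sum(node_features) / len(node_features)
    return [2.0 * x + 0.5 * mean for x in node_features]

def admit(tolerance, outputs, precondition_ok, safety=100.0):
    """Hypothetical domain-validity gate: the precondition must hold and the
    tolerance must exceed an estimated numerical floor of the scoring operator."""
    scale = max(abs(y) for y in outputs) or 1.0
    floor = safety * sys.float_info.epsilon * scale
    return precondition_ok and tolerance > floor

def permutation_mr(x, perm, tolerance, precondition_ok=True):
    y = surrogate(x)
    if not admit(tolerance, y, precondition_ok):
        return "reject"
    y_perm = surrogate([x[i] for i in perm])
    # Equivariance: permuting the inputs should permute the outputs identically.
    err = max(abs(y_perm[k] - y[perm[k]]) for k in range(len(x)))
    return "pass" if err <= tolerance else "violation"
```

The gate keeps a tolerance set below floating-point noise from being reported as a model violation. This is the distinction the paper draws between meaningful and meaningless failures.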

2606.17710 2026-06-17 cs.CV cs.AI cs.CL cs.LG Cross-list

Vision-language models for chest radiography do not always need the image

胸部X光片的视觉-语言模型并不总是需要图像

Mahshad Lotfinia, Sebastian Ziegelmayer, Lisa Adams, Daniel Truhn, Andreas Maier, Soroosh Tayebi Arasteh

Affiliations: Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg; Department of Diagnostic and Interventional Radiology, TUM University Clinic, School of Medicine and Health, Klinikum rechts der Isar, Technical University of Munich; Lab for AI in Medicine, RWTH Aachen University; Department of Diagnostic and Interventional Radiology, University Hospital RWTH Aachen

AI summary: Using a causal audit, finds that many medical vision-language models rely on text priors rather than the image in chest radiograph tasks. Text-only models perform close to multimodal ones. The paper argues that evaluation should be based on whether answers depend on the image (grounding).

Chinese abstract (AI-translated)

医学视觉-语言模型报告了强大的胸部X光片准确性,这越来越多地被解读为它们使用了图像的证据。这种推断是不安全的:一个利用发现名称先验的模型得分与读取扫描的模型相同,且没有标准基准能区分它们。我们引入了一种因果审计方法,通过遮挡相关区域、遮挡无关区域以及替换为另一患者的相同标签扫描来干预图像,并结合三种行为指标测试正确答案是否依赖于图像。在九个系统中,一个没有图像访问权限的纯文本模型达到了最佳多模态模型5.7个准确度点以内的水平,而一个1190亿参数的多模态模型在统计上与70亿参数的纯文本基线无法区分。审计将队列分为三个忽略图像的模型、一个不稳定的模型和五个选择性使用图像的模型(针对部分发现);这些分类在第二个数据集、分辨率和提示措辞上保持一致。与委员会认证的放射科医生相比,纯文本模型在准确率上与放射科医生无统计差异,但基础归因于零,而使用图像的模型的基础归因率与放射科医生相当。报告的置信度仅在模型使用图像时标记无根据的答案。基础归因审计(而非准确性)应成为临床部署的门槛。

English abstract

Medical vision-language models report strong chest radiograph accuracy, and this is increasingly read as evidence that they use the image. That inference is unsafe: a model exploiting finding-name priors scores like one that reads the scan, and no standard benchmark separates them. We introduce a causal audit that intervenes on the image, occluding the relevant region, occluding an irrelevant one, and swapping in another patient's same-label scan, and combines three behavioral metrics to test whether a correct answer depends on the image. Across nine systems, a text-only model with no image access reaches within 5.7 accuracy points of the best multimodal one, and a 119-billion-parameter multimodal model is statistically indistinguishable from a 7-billion text-only baseline. The audit splits the cohort into three models that ignore the image, one that is unstable, and five that use it selectively, for a subset of findings; the categories hold across a second dataset, resolution, and prompt phrasing. Against board-certified radiologists, a text-only model is statistically indistinguishable from a radiologist's accuracy while grounding at zero, whereas the image-using models ground at radiologist-comparable rates. Reported confidence flags ungrounded answers only when a model uses the image. Grounding audits, not accuracy, should gate clinical deployment.
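The audit's logic can be illustrated with per-case correctness under each intervention. The metric names and the `grounding_gap` summary below are hypothetical, not the paper's three behavioral metrics. They show why two models with matching accuracy can still separate. A text-only model stays correct under every intervention, while an image-using model fails when the relevant region is hidden.

```python
def audit(records):
    """records: dicts with booleans 'orig', 'occl_rel', 'occl_irr', 'swap' giving
    whether the answer was correct on the original image, with the relevant region
    occluded, with an irrelevant region occluded, and with a same-label scan swapped in.
    Rates are computed over cases answered correctly on the original image."""
    base = [r for r in records if r["orig"]]
    n = len(base)
    if n == 0:
        return None
    relevant_drop = sum(not r["occl_rel"] for r in base) / n
    irrelevant_drop = sum(not r["occl_irr"] for r in base) / n
    swap_consistency = sum(r["swap"] for r in base) / n
    return {
        "relevant_drop": relevant_drop,
        "irrelevant_drop": irrelevant_drop,
        "swap_consistency": swap_consistency,
        # Hypothetical summary: sensitivity specific to the relevant region.
        "grounding_gap": relevant_drop - irrelevant_drop,
    }
```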

2606.18011 2026-06-17 stat.ML cs.LG stat.ME Cross-list

Fast Nonparametric Conditional Independence Testing via Two-Stage Regression

通过两阶段回归的快速非参数条件独立性检验

Eric V. Strobl

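Affiliations: Department of Biomedical Informatics, University of Pittsburgh

AI summary: Proposes BLITZ, which quickly removes the influence of the conditioning set through two-stage regression: a low-order polynomial fit followed by shallow trees. The result is a well-calibrated nonparametric conditional independence test suited to causal discovery.

Comments: A fast R implementation with C++ back-end is available at https://github.com/ericstrobl/BLITZ

Chinese abstract (AI-translated)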

基于约束的因果发现依赖于重复的条件独立性检验,但快速非参数检验往往牺牲校准性,尤其是当变量通过非线性关系依赖于条件集时。我们提出了BLITZ(Broad-to-Local Independence Testing via residualiZation),一种非参数条件独立性检验,旨在在一秒内运行良好,同时保持约束因果发现算法执行数千次查询所需的准确性。BLITZ首先使用低阶多项式回归消除对条件集的广泛平滑依赖,然后应用一个小型非线性特征映射,并通过浅层树回归对这些特征进行残差化。得到的统计量检验残差互协方差,并采用矩匹配卡方近似于零分布。我们从理论上证明,两阶段设计降低了树残差化器面临的有效复杂度,使得浅层树能够控制残差条件均值偏差,同时避免过度过拟合。在模拟中,BLITZ提供了比快速核、随机特征和基于回归的竞争者更好的零校准,同时保持所测试方法中最快的速度之一。在合成图和流式细胞术数据的因果发现实验中,BLITZ在保留的邻接中产生了更可靠的端点方向,并具有竞争力的结构恢复。这些结果表明,从宽到局部残差化是实现因果发现中校准、可扩展的非参数条件独立性检验的实用途径。

English abstract

Constraint-based causal discovery relies on repeated conditional independence tests, but fast nonparametric tests often sacrifice calibration, especially when variables depend on the conditioning set through nonlinear relationships. We introduce BLITZ (Broad-to-Local Independence Testing via residualiZation), a nonparametric conditional independence test designed to run well under a second while maintaining the accuracy needed for the thousands of queries performed by constraint-based causal discovery algorithms. BLITZ first removes broad smooth dependence on the conditioning set using low-order polynomial regression, then applies a small nonlinear feature map and residualizes those features with shallow tree regressions. The resulting statistic tests residual cross-covariance, with a moment-matched chi-square approximation to the null distribution. We show theoretically that the two-stage design reduces the effective complexity faced by the tree residualizers, allowing shallow trees to control residual conditional-mean bias while avoiding excessive overfitting. In simulations, BLITZ provides better null calibration than fast kernel, random-feature, and regression-based competitors while remaining among the fastest methods tested. In causal discovery experiments on synthetic graphs and flow-cytometry data, BLITZ yields more reliable endpoint orientations among retained adjacencies and competitive structural recovery. These results suggest that broad-to-local residualization is a practical route to calibrated, scalable nonparametric conditional independence testing for causal discovery.
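A simplified sketch of the residualization idea that keeps only the first, polynomial stage. It residualizes X and Y on a one-dimensional Z, then refers n·r² of the residual correlation to a chi-square(1) null, using the standard asymptotic reference rather than a moment-matched one. BLITZ's nonlinear feature map, shallow-tree stage, and moment-matched approximation are omitted. This is an illustration under those assumptions, not BLITZ.

```python
import math
import numpy as np

def poly_residualize(v, z, degree=2):
    """Residuals of v after least-squares regression on [z^degree, ..., z, 1]."""
    design = np.vander(z, degree + 1)
    coef, *_ = np.linalg.lstsq(design, v, rcond=None)
    return v - design @ coef

def residual_ci_test(x, y, z, degree=2):
    """Stage-one-only conditional independence test of X and Y given Z."""
    rx = poly_residualize(x, z, degree)
    ry = poly_residualize(y, z, degree)
    r = np.corrcoef(rx, ry)[0, 1]
    stat = len(x) * r ** 2
    # Survival function of chi-square(1): P(N(0,1)^2 > s) = erfc(sqrt(s / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return float(stat), p_value
```

The polynomial stage removes broad, smooth dependence on Z cheaply. In BLITZ, this leaves the tree stage only the local nonlinear remainder to handle, which the abstract credits for letting shallow trees suffice.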

2606.18166 2026-06-17 cs.CR cs.LG Cross-list

Evaluating Open-Source LLMs for Multi-Label ATT&CK Technique Classification on CTI Reports

评估开源大语言模型在CTI报告上的多标签ATT&CK技术分类

Ahmed Ryan, Saad Sakib Noor, Md Erfan, Shaswata Mitra, Sudip Mittal, Md Rayhanur Rahman

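Affiliations: The University of Dhaka

AI summary: Open-source LLMs had not been evaluated on ATT&CK classification of complex unstructured CTI reports. This paper builds a dataset of 2,076 human-annotated sentences and evaluates seven open-source LLMs. The best micro-averaged F1 is 0.22, indicating that current models are not ready for production use.

Chinese abstract (AI-translated)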

使用MITRE ATT&CK对网络威胁情报(CTI)进行分类对于主动防御至关重要,但历史上需要大量人工。大语言模型(LLM)之前的自动化加速了这一过程,但无法解决非结构化CTI报告中复杂的语言和多步攻击模式。LLM通过上下文推理理解非结构化文本,解决了以前的局限性。然而,当前的评估依赖于简化的单技术句子,忽略了真实CTI报告的复杂性,往往导致性能结果膨胀。因此,开源LLM在复杂非结构化CTI报告上的基线性能仍未得到评估。为弥补这一差距,我们从83份复杂非结构化CTI报告中构建了一个包含2076句人工标注(1281句技术阳性,795句阴性)的真实数据集。这些句子通过六阶段标注过程映射到114种独特的ATT&CK技术,实现了kappa=0.68的标注者间一致性。利用该数据集,我们评估了7个参数从8B到236B的开源LLM,涉及提示策略和温度配置。性能最高的LLM实现了0.22的微平均F1分数,为复杂非结构化CTI上的多标签ATT&CK分类建立了经验基线。参数大小与F1分数呈统计显著正相关。提示策略和温度在不同模型配置下未产生统计显著的增益。这些结果表明,当前开源LLM不足以用于生产级ATT&CK分类。该数据集、基准和发现为未来的CTI研究提供了可复现的基础。

English abstract

Classifying Cyber Threat Intelligence (CTI) using MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) is essential for proactive defense, but historically required extensive human effort. Automation developed before Large Language Models (LLMs) sped up this process, but could not resolve the complex language and multi-step attack patterns found in unstructured CTI reports. LLMs addressed previous limitations by using contextual reasoning to understand unstructured text. However, current evaluations rely on simplified, single-technique sentences that ignore the complexity of real-world CTI reports, which often leads to inflated performance results. Consequently, the baseline performance of open-source LLMs on complex unstructured CTI reports remains unevaluated. To address this gap, we constructed a ground-truth dataset of 2,076 human-annotated sentences (1,281 technique-positive, 795 negative) from 83 complex unstructured CTI reports. These sentences were mapped to 114 unique ATT&CK techniques using a six-phase annotation process, achieving κ = 0.68 inter-annotator agreement. Using this dataset, we evaluated seven open-source LLMs ranging from 8B to 236B parameters across prompt strategy and temperature configurations. The highest-performing LLM achieved a micro-averaged F1 score of 0.22, establishing the empirical baseline for multi-label ATT&CK classification on complex unstructured CTI. Parameter size showed a statistically significant positive correlation with F1 score. Prompt strategy and temperature produced no statistically significant gains across model configurations. These results indicate that current open-source LLMs are insufficient for production-grade ATT&CK classification. The dataset, benchmark, and findings provide a reproducible foundation for future CTI research.
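Micro-averaged F1, the headline metric, pools counts across all sentences before computing precision and recall. This is a minimal sketch that assumes each sentence's labels are a set of ATT&CK technique IDs and that negative sentences carry an empty set. The IDs used in the test are only examples.

```python
def micro_f1(gold, pred):
    """gold, pred: lists of sets of technique IDs, one set per sentence.
    True positives, false positives, and false negatives are summed over all
    sentences first, so frequent techniques dominate the score."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```

A predicted technique on a negative sentence counts as a false positive. The 795 negative sentences therefore penalize models that over-predict techniques.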

2606.18190 2026-06-17 cs.CR cs.LG Cross-list

Multi-Source Cybersecurity Logs: An ATT&CK-Labeled Dataset and SLM Evaluation

多源网络安全日志:一个ATT&CK标记数据集及小语言模型评估

Abir Ashab Niloy, Ahmed Ryan, Imamul Hossain Rafi, Md Erfan, Md Rayhanur Rahman

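AI summary: Detecting multi-stage cyberattacks lacks multi-source log datasets with ATT&CK technique labels. To fill this gap, the paper builds a dataset of 870 sessions (70 attack, 800 benign) and about 2.3 million events. Fine-tuning three small language models on it raises chunk-classification accuracy from about 8% to 90%-97%.

Chinese abstract (AI-translated)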

多阶段网络攻击跨越系统、网络和浏览器日志。检测它们需要关联所有三个来源的事件。机器学习方法可以学习这些跨源模式,但需要带标签的多源数据。现有的公共数据集存在不足。仅网络数据集如CICIDS和UNSW-NB15缺少主机和浏览器活动。以主机为中心的数据集如LMDG和CICAPT-IIoT缺乏浏览器遥测。ATLAS包含所有三个来源,但仅将事件标记为恶意或良性,没有MITRE对抗战术、技术和通用知识(ATT&CK)技术的粒度。没有公共数据集将三个来源与每条记录的ATT&CK技术标签结合起来。我们通过构建一个包含870个会话(70个攻击,800个良性)和约230万事件的多源日志数据集来弥补这一差距。我们在Windows端点上同时捕获了系统、网络和浏览器活动。我们用ATT&CK技术ID标记了恶意事件,涵盖了12种战术和53种技术。我们使用真实工具生成了所有攻击数据,包括远程访问木马(RAT)、命令与控制(C2)隧道和云外泄。为了展示可学习性,我们使用低秩适配(LoRA)微调了三个小语言模型(SLM)(Qwen2.5-1.5B、Llama-3.2-3B、Phi-4-Mini)。我们在两个任务(分块分类和ATT&CK技术识别)上,将每个模型与其基础变体在十个指标上进行了比较。微调在每个指标上改进了每个模型。分块分类准确率从基础变体的大约8%提高到微调后的90%到97%。技术识别仍然具有挑战性,最佳精确匹配准确率为42%,尽管高部分匹配分数表明模型捕捉到了大部分底层推理。

English abstract

Multi-stage cyberattacks span system, network, and browser logs. Detecting them requires correlating events across all three sources. Machine learning methods can learn these cross-source patterns, but they need labeled multi-source data. Existing public datasets fall short. Network-only datasets such as CICIDS and UNSW-NB15 miss host and browser activity. Host-focused datasets such as LMDG and CICAPT-IIoT lack browser telemetry. ATLAS includes all three sources but labels events only as malicious or benign, without MITRE Adversarial Tactics, Techniques, and Common Knowledge (ATT&CK) technique granularity. No public dataset combines all three sources with per-entry ATT&CK technique labels. We close the gap by building a multi-source log dataset of 870 sessions (70 attack, 800 benign) and approximately 2.3 million events. We captured system, network, and browser activity simultaneously on Windows endpoints. We labeled malicious events with ATT&CK technique IDs, covering 12 tactics and 53 techniques. We generated all attack data using real tools, including Remote Access Trojan (RAT), Command and Control (C2) tunnels, and cloud exfiltration. To demonstrate learnability, we fine-tuned three Small Language Models (SLMs) (Qwen2.5-1.5B, Llama-3.2-3B, Phi-4-Mini) using Low-Rank Adaptation (LoRA). We compared each against its base variant across ten metrics on two tasks: chunk classification and ATT&CK technique identification. Fine-tuning improved every model on every metric. Chunk classification accuracy rose from approximately 8% in the base variants to between 90% and 97% after fine-tuning. Technique identification remained challenging, with the best exact-match accuracy at 42%, although high partial-match scores show the models captured most of the underlying reasoning.
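Exact-match versus partial-match scoring for technique identification can be sketched as below. The abstract does not define its partial-match metrics, so mean Jaccard overlap here is an assumed example. The sketch shows how a model can score low on exact match while still recovering most of the right technique IDs.

```python
def exact_match_accuracy(gold, pred):
    """Fraction of chunks whose predicted technique set equals the gold set."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)

def mean_jaccard(gold, pred):
    """Partial-match score: mean |G & P| / |G | P|; two empty sets count as 1."""
    scores = []
    for g, p in zip(gold, pred):
        union = g | p
        scores.append(len(g & p) / len(union) if union else 1.0)
    return sum(scores) / len(scores)
```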

2606.18237 2026-06-17 cs.CL cs.AI cs.LG Cross-list

ReproRepo: Scaling Reproducibility Audits with GitHub Repository Issues

ReproRepo: 利用 GitHub 仓库问题扩展可重复性审计

Shanda Li, Qiuhong Anna Wei, Jingwu Tang, Valerie Chen, Nihar B Shah, Tim Dettmers, Yiming Yang, Ameet Talwalkar

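Affiliations: School of Computer Science, Carnegie Mellon University; Datadog

AI summary: Proposes ReproRepo, a framework that uses GitHub issues as supervision to evaluate reproducibility auditing on 1,149 papers. The best agent, Codex with GPT-5.5, surfaces at least one semantically related human-reported reproduction blocker for about 90% of papers.

Chinese abstract (AI-translated)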

从论文和已发布代码中复现研究结果对科学进步至关重要。现有工作引入了基准测试来评估 LLM 代理是否能协助可重复性,但由于数据整理和评估需要大量人工努力,这些基准难以扩展。我们提出了 ReproRepo,一个可扩展的可重复性评估框架,利用人类提出的 GitHub issues 作为真实复现障碍的自然监督信号。我们在来自主要会议的 1149 篇近期机器学习论文上实例化 ReproRepo,并评估了四种前沿模型代理配置。我们的结果表明,即使不执行代码,LLM 代理也能从论文-仓库对中识别出许多现实世界的可重复性问题:我们研究中的最佳代理,即带有 GPT-5.5 的 Codex,为研究中约 90% 的论文揭示了至少一个语义相关的人类报告的障碍。进一步分析表明,代理在揭示可见故障和识别正确语义区域方面特别有效,但在精确定位方面可能仍不足。ReproRepo 可作为未来在真实世界可重复性审计中评估 LLM 代理的可重用、可扩展框架。我们的代码发布在 https://github.com/LithiumDA/ReproRepo。

English abstract

Reproducing research results from papers and released code is central to scientific progress. Existing works have introduced benchmarks to evaluate whether LLM agents can assist with reproducibility, but they are difficult to scale due to their reliance on substantial manual effort for data curation and evaluation. We introduce ReproRepo, a scalable framework for reproducibility evaluation that leverages human-raised GitHub issues as naturally occurring supervision on realistic reproduction blockers. We instantiate ReproRepo on 1,149 recent machine learning papers from major conferences and evaluate four frontier model-agent configurations. Our results show that LLM agents, even without executing code, can identify many real-world reproducibility problems from paper-repository pairs: the best agent in our study, namely Codex with GPT-5.5, surfaces at least one semantically related human-reported blocker for ~90% of papers in the study. Further analysis shows that agents are particularly effective for surfacing visible failures and identifying the right semantic region, but may still be insufficient in exact localization. ReproRepo can serve as a reusable, scalable framework for future evaluations of LLM agents on real-world reproducibility auditing. Our code is released at https://github.com/LithiumDA/ReproRepo.

2502.00241 2026-06-17 cs.LG cs.AI cs.CL cs.CV Replacement

Mordal: Automated Pretrained Model Selection for Vision Language Models

Mordal: 面向视觉语言模型的自动化预训练模型选择

Shiqi He, Insu Jang, Mosharaf Chowdhury

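AI summary: Proposes Mordal, which automates the search for the best vision language model for a user-defined task. It does this by reducing both the number of candidates and the evaluation time per candidate. Compared with grid search, it uses 8.9x to 11.6x fewer GPU hours. It also achieves about 69% higher weighted Kendall's τ on average than the state-of-the-art model selection method.

Chinese abstract (AI-translated)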

将多种模态融入大型语言模型(LLMs)是增强其对非文本数据理解、使其能够执行多模态任务的有效方式。视觉语言模型(VLMs)因其在医疗、机器人和无障碍等领域的众多实际应用,成为增长最快的多模态模型类别。然而,尽管文献中不同的VLM在不同基准测试中展现出令人印象深刻的视觉能力,它们都是由人类专家手工设计的;目前尚无自动化框架来创建特定任务的多模态模型。我们引入Mordal,一种自动化多模态模型搜索框架,能够高效地为用户定义的任务找到最佳VLM,无需人工干预。Mordal通过减少搜索过程中需考虑的候选模型数量以及最小化评估每个剩余候选模型所需的时间来实现这一目标。我们的评估表明,Mordal能够找到给定问题的最佳VLM,其GPU耗时比网格搜索低8.9倍至11.6倍。我们还发现,Mordal在不同任务上平均比最先进的模型选择方法实现约69%更高的加权Kendall's τ。

English abstract

Incorporating multiple modalities into large language models (LLMs) is a powerful way to enhance their understanding of non-textual data, enabling them to perform multimodal tasks. Vision language models (VLMs) form the fastest growing category of multimodal models because of their many practical use cases, including in healthcare, robotics, and accessibility. Unfortunately, even though different VLMs in the literature demonstrate impressive visual capabilities in different benchmarks, they are handcrafted by human experts; there is no automated framework to create task-specific multimodal models. We introduce Mordal, an automated multimodal model search framework that efficiently finds the best VLM for a user-defined task without manual intervention. Mordal achieves this both by reducing the number of candidates to consider during the search process and by minimizing the time required to evaluate each remaining candidate. Our evaluation shows that Mordal can find the best VLM for a given problem using 8.9x to 11.6x lower GPU hours than grid search. We have also discovered that Mordal achieves about 69% higher weighted Kendall's τ on average than the state-of-the-art model selection method across diverse tasks.
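Weighted Kendall's τ penalizes misorderings among the top candidates more than misorderings among weak ones, which suits model selection. This simplified sketch assumes no ties, and weights each pair by the sum of hyperbolic weights 1/(r+1), where r ranks the reference scores. It follows the idea of Vigna's weighted τ, as implemented in SciPy's `scipy.stats.weightedtau`, but is not guaranteed to match it numerically. Here `x` would hold true task performance of candidate VLMs and `y` a selector's predicted scores; both are assumed inputs.

```python
from itertools import combinations

def weighted_kendall_tau(x, y):
    """Weighted Kendall tau without ties. Rank 0 is the highest value in x, and a
    pair's weight is 1/(r_i + 1) + 1/(r_j + 1), so pairs that involve top items
    count more. With no ties, tau_w = sum(w * sign) / sum(w)."""
    order = sorted(range(len(x)), key=lambda i: -x[i])
    rank = {i: r for r, i in enumerate(order)}
    num = den = 0.0
    for i, j in combinations(range(len(x)), 2):
        w = 1.0 / (rank[i] + 1) + 1.0 / (rank[j] + 1)
        s = (x[i] - x[j]) * (y[i] - y[j])
        num += w * (1 if s > 0 else -1 if s < 0 else 0)
        den += w
    return num / den
```

Swapping the two best candidates lowers the score more than swapping the two worst. Plain Kendall's τ would treat both swaps the same.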

2507.18623 2026-06-17 cs.LG cs.AI cs.MA Replacement

Moving Out: Physically-grounded Human-AI Collaboration

Moving Out: 基于物理的人机协作

Xuhui Kang, Sung-Wook Lee, Haolin Liu, Yuyan Wang, Yen-Ling Kuo

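AI summary: Proposes the Moving Out benchmark for human-AI collaboration under physical constraints. It also develops BASS to increase agent diversity and improve understanding of action outcomes. Experiments show that BASS collaborates effectively with both unseen AI partners and humans.

Comments: Accepted at ICML 2026

Chinese abstract (AI-translated)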

适应环境中的物理动作和约束的能力对于具身智能体(如机器人)与人类有效协作至关重要。这种基于物理的人机协作必须考虑连续状态-动作空间增加的复杂性以及物理约束导致的受限动力学。然而,大多数现有的协作基准是离散的,或者不考虑物理属性和约束。为了解决这个问题,我们引入了Moving Out,一个人机协作基准,它模拟了受物理属性和约束影响的各种协作模式,例如一起移动重物以及协调动作将物品绕过角落。Moving Out包含两个挑战和人类-人类交互数据,以全面评估模型适应多样化人类行为和未见物理属性的能力。为了使具身智能体能够在物理属性和约束下与人类协作,我们提出了一种新方法BASS(行为增强、模拟和选择),以增强智能体的多样性及其对动作结果的理解。我们系统地将BASS与最先进模型在AI-AI和人机实验中进行了比较,结果表明BASS能够有效地与未见过的AI和人类协作。项目页面可在https://live-robotics-uva.github.io/movingout_ai/访问。

English abstract

The ability to adapt to physical actions and constraints in an environment is crucial for embodied agents (e.g., robots) to effectively collaborate with humans. Such physically grounded human-AI collaboration must account for the increased complexity of the continuous state-action space and constrained dynamics caused by physical constraints. However, most existing collaboration benchmarks are discrete or do not consider physical attributes and constraints. To address this, we introduce Moving Out, a human-AI collaboration benchmark that resembles a wide range of collaboration modes affected by physical attributes and constraints, such as moving heavy items together and coordinating actions to move an item around a corner. Moving Out consists of two challenges and human-human interaction data to comprehensively evaluate models' abilities to adapt to diverse human behaviors and unseen physical attributes. To give embodied agents the capability to collaborate with humans under physical attributes and constraints, we propose a novel method, BASS (Behavior Augmentation, Simulation, and Selection), to enhance the diversity of agents and their understanding of the outcome of actions. We systematically compare BASS and state-of-the-art models in AI-AI and human-AI experiments, showing that BASS can effectively collaborate with both unseen AI and humans. The project page is available at https://live-robotics-uva.github.io/movingout_ai/.

2512.21315 2026-06-17 cs.LG cs.CV stat.ML Updated version

Does the Data Processing Inequality Reflect Practice? On the Utility of Low-Level Tasks

数据处理不等式是否反映实践?论低级任务的有用性

Roy Turgeman, Tom Tirer

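AI summary Studies how low-level processing (e.g., denoising, encoding) can improve classification performance, proves that with a finite number of training samples there exists a pre-processing step that improves accuracy, and validates the predicted trends experimentally.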

Comments ICLR 2026 (camera-ready). Code is available at: https://github.com/serveroy/process-before-you-classify

Journal ref
The Fourteenth International Conference on Learning Representations (ICLR 2026)
AI-generated Chinese abstract

数据处理不等式是一个信息论原理,指出信号的信息内容不能通过处理观测数据而增加。特别地,它表明在解决分类问题之前,增强信号或对其进行编码没有益处。对于最优贝叶斯分类器,这一断言可以被证明是正确的。然而,在实践中,尽管现代深度神经网络具有强大的能力,但在高级下游任务之前执行“低级”任务仍然很常见。在本文中,我们旨在理解低级处理何时以及为何对分类有益。我们提出了一个二元分类设置的综合理论研究,其中我们考虑一个与最优贝叶斯分类器紧密相连的分类器,并随着训练样本数量的增加而收敛到它。我们证明,对于任何有限数量的训练样本,存在一种预分类处理可以提高分类准确率。我们还探讨了类分离、训练集大小和类平衡对该过程相对增益的影响。我们通过理论设置的经验研究来支持我们的理论。最后,我们进行了一项实证研究,调查去噪和编码对基准数据集上实际深度分类器性能的影响。具体来说,我们改变了训练集的大小和类别分布以及噪声水平,并展示了与我们的理论结果一致的趋势。

English abstract

The data processing inequality is an information-theoretic principle stating that the information content of a signal cannot be increased by processing the observations. In particular, it suggests that there is no benefit in enhancing the signal or encoding it before addressing a classification problem. This assertion can be proven to be true for the case of the optimal Bayes classifier. However, in practice, it is common to perform "low-level" tasks before "high-level" downstream tasks despite the overwhelming capabilities of modern deep neural networks. In this paper, we aim to understand when and why low-level processing can be beneficial for classification. We present a comprehensive theoretical study of a binary classification setup, where we consider a classifier that is tightly connected to the optimal Bayes classifier and converges to it as the number of training samples increases. We prove that for any finite number of training samples, there exists a pre-classification processing that improves the classification accuracy. We also explore the effect of class separation, training set size, and class balance on the relative gain from this procedure. We support our theory with an empirical investigation of the theoretical setup. Finally, we conduct an empirical study where we investigate the effect of denoising and encoding on the performance of practical deep classifiers on benchmark datasets. Specifically, we vary the size and class distribution of the training set, and the noise level, and demonstrate trends that are consistent with our theoretical results.

2602.03300 2026-06-17 cs.LG cs.AI cs.CL cs.CV Updated version

R1-SyntheticVL: Is Synthetic Data from Generative Models Ready for Multimodal Large Language Model?

R1-SyntheticVL:生成模型的合成数据是否已为多模态大语言模型做好准备?

Jingyi Zhang, Tianyi Lin, Huanjin Yao, Xiang Lan, Shunyu Liu, Jiaxing Huang

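AI summary Proposes Collective Adversarial Data Synthesis (CADS), which combines collective intelligence and adversarial learning to automatically generate high-quality, diverse, and challenging multimodal data for improving multimodal large language models (MLLMs) on complex real-world tasks.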

Comments ICML 2026 Camera Ready

AI-generated Chinese abstract

在这项工作中,我们旨在开发有效的数据合成技术,自主合成多模态训练数据,以增强MLLM解决复杂现实任务的能力。为此,我们提出了集体对抗数据合成(CADS),这是一种新颖且通用的方法,用于合成高质量、多样且具有挑战性的多模态数据。CADS的核心思想是利用集体智能确保高质量和多样化的生成,同时探索对抗学习以合成具有挑战性的样本,从而有效驱动模型改进。具体来说,CADS包含两个循环阶段:集体对抗数据生成(CAD-Generate)和集体对抗数据判断(CAD-Judge)。CAD-Generate利用集体知识共同生成新的多样化多模态数据,而CAD-Judge则协作评估合成数据的质量。此外,CADS引入了一种对抗上下文优化机制,以优化生成上下文,鼓励生成具有挑战性和高价值的数据。通过CADS,我们构建了MMSynthetic-20K并训练了我们的模型R1-SyntheticVL,该模型在多个基准测试中表现出优越的性能。

English abstract

In this work, we aim to develop effective data synthesis techniques that autonomously synthesize multimodal training data for enhancing MLLMs in solving complex real-world tasks. To this end, we propose Collective Adversarial Data Synthesis (CADS), a novel and general approach to synthesize high-quality, diverse and challenging multimodal data for MLLMs. The core idea of CADS is to leverage collective intelligence to ensure high-quality and diverse generation, while exploring adversarial learning to synthesize challenging samples for effectively driving model improvement. Specifically, CADS operates with two cyclic phases, i.e., Collective Adversarial Data Generation (CAD-Generate) and Collective Adversarial Data Judgment (CAD-Judge). CAD-Generate leverages collective knowledge to jointly generate new and diverse multimodal data, while CAD-Judge collaboratively assesses the quality of synthesized data. In addition, CADS introduces an Adversarial Context Optimization mechanism to optimize the generation context to encourage challenging and high-value data generation. With CADS, we construct MMSynthetic-20K and train our model R1-SyntheticVL, which demonstrates superior performance on various benchmarks.

2602.06276 2026-06-17 cs.LG stat.ML Updated version

Statistical Learning from Attribution Sets

从归因集合中进行统计学习

Lorne Applebaum, Robert Busa-Fekete, August Y. Chen, Claudio Gentile, Tomer Koren, Aryan Mokhtari

Affiliations: Google Research; Cornell University; Tel Aviv University; UT Austin

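AI summary Addresses the problem that ad clicks and conversions cannot be linked directly under privacy constraints: proposes an unbiased loss estimator based on attribution sets, obtains generalization guarantees for empirical risk minimization, and outperforms industry heuristics.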

Comments COLT 2026. 45 pages

AI-generated Chinese abstract

我们解决了隐私约束下广告领域转化预测模型的训练问题,其中点击和转化之间缺乏直接链接。受隐私保护浏览器API和第三方cookie弃用的启发,我们研究了一种设置,其中学习器观察到一系列点击和一系列转化,但只能将转化与一组候选点击(归因集合)相关联,而不是唯一的来源。我们将此形式化为从由具有候选先验分布的无知对手生成的归因集合中进行学习。尽管缺乏显式标签,我们通过一种新颖的方法从这些粗粒度信号中构建了总体损失的无偏估计量。利用该估计量,我们表明经验风险最小化实现了泛化保证,该保证随先验的信息量而缩放,并且对先验的估计误差也具有鲁棒性,尽管归因集合之间存在复杂的依赖关系。在标准数据集上的简单实证评估表明,我们的无偏方法显著优于常见的行业启发式方法,特别是在归因集合较大或重叠的情况下。

English abstract

We address the problem of training conversion prediction models in advertising domains under privacy constraints, where direct links between ad clicks and conversions are unavailable. Motivated by privacy-preserving browser APIs and the deprecation of third-party cookies, we study a setting where the learner observes a sequence of clicks and a sequence of conversions, but can only link a conversion to a set of candidate clicks (an attribution set) rather than a unique source. We formalize this as learning from attribution sets generated by an oblivious adversary equipped with a prior distribution over the candidates. Despite the lack of explicit labels, we construct an unbiased estimator of the population loss from these coarse signals via a novel approach. Leveraging this estimator, we show that Empirical Risk Minimization achieves generalization guarantees that scale with the informativeness of the prior and is also robust against estimation errors in the prior, despite complex dependencies among attribution sets. Simple empirical evaluations on standard datasets suggest our unbiased approach significantly outperforms common industry heuristics, particularly in regimes where attribution sets are large or overlapping.

2603.20775 2026-06-17 cs.LG Updated version

Evaluating Uplift Modeling under Structural Biases: Insights into Metric Stability and Model Robustness

评估结构偏差下的提升建模:对指标稳定性和模型鲁棒性的洞察

Yuxuan Yang, Dugang Liu, Yiyan Huang

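AI summary Designs a semi-synthetic benchmarking framework covering multiple biases found in real-world marketing data; finds that TARNet is robust and that metrics aligned with the ATE are more stable.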

Comments Accepted by KDD 26

AI-generated Chinese abstract

在个性化营销中,提升模型通过反事实分析模拟客户在不同干预下的行为变化,来估计干预的增量效果。然而,现实营销数据常存在多种偏差,如选择偏差、溢出效应、测量误差和未观测混杂。这些偏差会同时影响提升估计的准确性和评估指标的有效性。尽管偏差感知评估很重要,但缺乏系统研究来评估不同模型和指标在偏差条件下的表现。为填补这一空白,我们设计了一个系统基准框架。与标准预测任务不同,现实提升数据集天然缺乏反事实真值。这一限制使得评估指标的直接验证不可行,并阻碍了偏差的精确量化。因此,半合成方法成为系统基准的关键推动力。该方法通过保留现实特征依赖关系,同时提供隔离结构偏差所需的真值,有效弥合了差距。我们的研究发现:(i) 提升定位和预测可能表现为不同目标,擅长一个并不保证另一个有效;(ii) 尽管许多模型在多种偏差下表现不一致,但TARNet表现出显著的鲁棒性,为后续模型设计提供了见解;(iii) 评估指标的稳定性与其与ATE的数学对齐程度相关,表明在结构数据不完美下,近似ATE的指标能产生更一致的模型排名。这些发现表明,在现实数据不完美下需要更鲁棒的提升模型和评估指标。

English abstract

In personalized marketing, uplift models estimate the incremental effect of an intervention by modeling how customer behavior would change under alternative treatments using counterfactual analysis. However, real-world marketing data often exhibit various biases, such as selection bias, spillover effects, measurement error, and unobserved confounding. These biases can adversely affect both the accuracy of uplift estimation and the validity of evaluation metrics. Despite the importance of bias-aware assessment, there remains a lack of systematic studies evaluating how different models and metrics perform under such biased conditions. To bridge this gap, we design a systematic benchmarking framework. Unlike standard predictive tasks, real-world uplift datasets inherently lack counterfactual ground truth. This limitation renders the direct validation of evaluation metrics infeasible and prevents the precise quantification of biases. Therefore, a semi-synthetic approach serves as a critical enabler for systematic benchmarking. This approach effectively bridges the gap by retaining real-world feature dependencies while providing the ground truth needed to isolate structural biases. Our investigations reveal that (i) uplift targeting and prediction can manifest as distinct objectives, where proficiency in one does not ensure efficacy in the other; (ii) while many models exhibit inconsistent performance under diverse biases, TARNet shows notable robustness, providing insights for subsequent model design; (iii) the stability of evaluation metrics is linked to their mathematical alignment with the ATE, suggesting that ATE-approximating metrics yield more consistent model rankings under structural data imperfections. These findings suggest the need for more robust uplift models and evaluation metrics under real-world data imperfections.
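The semi-synthetic idea can be illustrated with a minimal sketch. Covariates are kept (here a random stand-in for real features), both potential outcomes are generated with a known uplift tau(x), and treatment is assigned with selection bias, so any estimator can be scored against ground truth. The data-generating choices, the names `make_semi_synthetic` and `t_learner_uplift`, and the linear T-learner are hypothetical illustrations, not the paper's benchmark or models (TARNet, for instance, is not shown).

```python
import numpy as np

def make_semi_synthetic(X, rng, selection_strength=2.0):
    """Attach known potential outcomes and a biased treatment assignment to covariates X."""
    n, d = X.shape
    beta = np.linspace(1.0, -0.5, d)          # hypothetical baseline-outcome coefficients
    tau = 1.0 + 0.5 * X[:, 0]                 # known individual uplift tau(x): the ground truth
    y0 = X @ beta + rng.normal(scale=0.5, size=n)
    y1 = y0 + tau
    # Selection bias: units with a larger first covariate are more likely to be treated.
    propensity = 1.0 / (1.0 + np.exp(-selection_strength * X[:, 0]))
    t = rng.random(n) < propensity
    y = np.where(t, y1, y0)                   # only one potential outcome is observed per unit
    return y, t, tau

def t_learner_uplift(X, y, t):
    """T-learner: fit separate linear models on treated and control units, predict the difference."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w1, *_ = np.linalg.lstsq(Xb[t], y[t], rcond=None)
    w0, *_ = np.linalg.lstsq(Xb[~t], y[~t], rcond=None)
    return Xb @ w1 - Xb @ w0

rng = np.random.default_rng(0)
X = rng.normal(size=(5000, 3))                # stand-in for real covariates
y, t, tau = make_semi_synthetic(X, rng)
true_ate = tau.mean()
naive_ate = y[t].mean() - y[~t].mean()        # difference in means, ignores selection bias
tlearner_ate = t_learner_uplift(X, y, t).mean()
print(f"true ATE {true_ate:.3f} | naive {naive_ate:.3f} | T-learner {tlearner_ate:.3f}")
```

Because the true tau(x) is known for every unit, the same setup can score any uplift model or metric. The naive difference in means is biased here because treated units have systematically larger values of the first covariate.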

2603.26592 2026-06-17 cs.LG cs.AI cs.HC Updated version

Evaluating Interactive 2D Visualization as a Sample Selection Strategy for Biomedical Time-Series Data Annotation

评估交互式二维可视化作为生物医学时间序列数据标注的样本选择策略

Einari Vaaras, Manu Airaksinen, Okko Räsänen

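AI summary Addresses the difficulty of annotating biomedical time series by comparing three sample selection methods (random sampling, farthest-first traversal, and interactive 2D visualization, 2DV) on infant motility assessment and speech emotion recognition. 2DV performs best when labels are aggregated across annotators, but label distributions vary widely between individual annotators, and random sampling is the safest choice.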

Comments Accepted for publication in Computers in Biology and Medicine (Elsevier)

AI-generated Chinese abstract

生物医学领域中可靠的机器学习模型依赖于准确的标签,然而标注生物医学时间序列数据仍然具有挑战性。算法样本选择可能支持标注,但涉及真实人类标注者的研究证据很少。因此,我们比较了三种用于标注的样本选择方法:随机采样(RND)、最远优先遍历(FAFT)和一种基于图形用户界面的方法,该方法能够探索高维数据的互补二维可视化(2DV)。我们在婴儿运动评估(IMA)和语音情感识别(SER)的四个分类任务中评估了这些方法。十二名标注者,分为专家和非专家,在有限的标注预算下进行数据标注,并进行了标注后实验以评估采样方法。在所有分类任务中,当聚合标注者的标签时,2DV表现最佳。在IMA中,2DV最有效地捕获了稀有类别,但也表现出由于有限的标注预算导致的标注者间标签分布变异性增大,当模型在个体标注者的标签上训练时,分类性能下降;在这些情况下,FAFT表现出色。对于SER,2DV在专家标注者中优于其他方法,并在个体标注者设置中与非专家标注者的性能相当。失败风险分析显示,当标注者数量或标注者专业知识不确定时,RND是最安全的选择,而2DV由于标签分布变异性更大而具有最高风险。此外,实验后访谈表明,2DV使标注任务更有趣和愉快。总体而言,基于2DV的采样对于生物医学时间序列数据标注似乎很有前景,特别是在标注预算不是非常紧张的情况下。

English abstract

Reliable machine-learning models in biomedical settings depend on accurate labels, yet annotating biomedical time-series data remains challenging. Algorithmic sample selection may support annotation, but evidence from studies involving real human annotators is scarce. Consequently, we compare three sample selection methods for annotation: random sampling (RND), farthest-first traversal (FAFT), and a graphical user interface-based method enabling exploration of complementary 2D visualizations (2DVs) of high-dimensional data. We evaluated the methods across four classification tasks in infant motility assessment (IMA) and speech emotion recognition (SER). Twelve annotators, categorized as experts or non-experts, performed data annotation under a limited annotation budget, and post-annotation experiments were conducted to evaluate the sampling methods. Across all classification tasks, 2DV performed best when aggregating labels across annotators. In IMA, 2DV most effectively captured rare classes, but also exhibited greater annotator-to-annotator label distribution variability resulting from the limited annotation budget, decreasing classification performance when models were trained on individual annotators' labels; in these cases, FAFT excelled. For SER, 2DV outperformed the other methods among expert annotators and matched the other methods' performance among non-experts in the individual-annotator setting. A failure risk analysis revealed that RND was the safest choice when annotator count or annotator expertise was uncertain, whereas 2DV had the highest risk due to its greater label distribution variability. Furthermore, post-experiment interviews indicated that 2DV made the annotation task more interesting and enjoyable. Overall, 2DV-based sampling appears promising for biomedical time-series data annotation, particularly when the annotation budget is not highly constrained.
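Farthest-first traversal, one of the compared strategies, greedily picks each new sample to be as far as possible from those already chosen. Below is a minimal sketch. It assumes Euclidean distance over some feature embedding of the time-series segments and a fixed first pick; the paper's feature space and distance are not specified here, and the function name is hypothetical.

```python
import numpy as np

def farthest_first_traversal(X, k, start=0):
    """Greedily pick k row indices of X, each time taking the point farthest
    (in Euclidean distance) from everything already picked."""
    X = np.asarray(X, dtype=float)
    k = min(k, len(X))
    chosen = [start]
    # Distance from every point to its nearest already-chosen point.
    min_dist = np.linalg.norm(X - X[start], axis=1)
    for _ in range(1, k):
        nxt = int(np.argmax(min_dist))        # farthest from the current selection
        chosen.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(X - X[nxt], axis=1))
    return chosen

points = np.array([[0.0], [1.0], [2.0], [10.0], [11.0]])
print(farthest_first_traversal(points, 3))  # [0, 4, 2]
```

Each iteration only updates a vector of nearest-selected distances, so selecting k samples from n points costs O(nk) distance evaluations.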

2606.11616 2026-06-17 cs.LG cs.IR Updated version

DeMix: Debugging Training Data with Mixed Data Error Types by Investigating Influence Vectors

DeMix: 通过影响向量调试包含混合错误类型的训练数据

Jiale Deng, Yanyan Shen, Xiaogang Shi, Junjun Chai

Affiliations: Shanghai Jiao Tong University; ByteDance Inc.; TikTok

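AI summary Proposes DeMix, which uses influence vectors to capture the distinct patterns that different error types leave on model behavior, casts training-data debugging as multi-label classification, and introduces an intervention-based learning strategy; across 11 tasks it substantially improves debugging F1 and model performance after data repair.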

AI-generated Chinese abstract

高质量的训练数据对于机器学习模型的成功至关重要。然而,真实世界的数据集通常包含由数据准备流程中的系统性缺陷引起的混合错误类型,包括标签错误、特征错误和虚假相关性。有效的训练数据调试既需要检测错误样本,也需要识别其具体的错误类型以便进行针对性修复,但现有的数据清洗和归因方法未能充分满足这一双重需求。在本文中,我们提出DeMix,一种同时诊断错误样本及其错误类型的新框架。我们的关键见解是,不同的错误类型会在模型行为上产生不同的模式。DeMix通过影响向量捕获这些特定于错误的模式,这些影响向量描述了每个训练样本如何影响所有验证样本上的模型预测。我们将训练数据调试形式化为一个多标签分类问题,其中开发了一个分类器直接从影响向量预测错误类型。我们进一步引入了一种基于干预的学习策略,引导分类器捕获每种错误类型特有的不变理由,确保学到的分类器有效泛化。在表格数据预测、推荐系统和LLM对齐等11个任务上的实证评估表明,DeMix显著优于最先进的方法,在数据调试F1分数上提高了22.61%,在数据修复后任务模型性能上提高了9.32%。代码可在以下网址获取:https://github.com/SJTU-DMTai/DeMix。

English abstract

High-quality training data is essential for the success of machine learning models. However, real-world datasets often contain mixed types of errors arising from systematic flaws in data preparation pipelines, including label errors, feature errors, and spurious correlations. Effective debugging of training data requires both detecting erroneous samples and identifying their specific error types to enable targeted repair, yet existing data cleaning and attribution methods fail to adequately address this dual requirement. In this paper, we propose DeMix, a novel framework that simultaneously diagnoses erroneous samples and their error types. Our key insight is that different error types produce distinct patterns on model behavior. DeMix captures such error-specific patterns by influence vectors that characterize how each training sample affects model predictions across all validation samples. We formulate training data debugging as a multi-label classification problem where a classifier is developed to predict error types directly from influence vectors. We further introduce an intervention-based learning strategy that guides the classifier to capture invariant rationales specific to each error type, ensuring the learned classifier generalizes effectively. Empirical evaluations on 11 tasks across tabular data prediction, recommendation systems, and LLM alignment demonstrate that DeMix significantly outperforms state-of-the-art approaches, achieving a 22.61% improvement in data debugging F1-score and a 9.32% gain in task model performance after data repair. Code is available at: https://github.com/SJTU-DMTai/DeMix.
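To make the notion of an influence vector concrete, the sketch below computes exact leave-one-out influence vectors for a small ridge regression by retraining once per training sample. Row i records how including sample i shifts the prediction on every validation sample. This brute-force estimator is a stand-in chosen for clarity, not DeMix's influence computation. The multi-label error-type classifier built on top of these vectors is not shown, and all names and data are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge regression weights."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

def loo_influence_vectors(X_tr, y_tr, X_val, lam=1.0):
    """Row i = (validation predictions with sample i) - (predictions without it),
    computed by brute-force leave-one-out retraining."""
    full_pred = X_val @ ridge_fit(X_tr, y_tr, lam)
    n = len(X_tr)
    infl = np.empty((n, len(X_val)))
    for i in range(n):
        keep = np.arange(n) != i
        infl[i] = full_pred - X_val @ ridge_fit(X_tr[keep], y_tr[keep], lam)
    return infl

rng = np.random.default_rng(0)
w_true = np.array([1.0, -2.0, 0.5])
X_tr, X_val = rng.normal(size=(50, 3)), rng.normal(size=(20, 3))
X_tr[0] = 1.0                                 # fixed feature vector for the corrupted sample
y_tr = X_tr @ w_true + rng.normal(scale=0.1, size=50)
y_tr[0] += 10.0                               # simulated label error on sample 0
infl = loo_influence_vectors(X_tr, y_tr, X_val)
print(np.argmax(np.linalg.norm(infl, axis=1)))
```

In this toy setup the mislabeled sample's influence vector has by far the largest norm. DeMix's premise is that richer patterns across these vectors also distinguish between error types.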

2510.04421 2026-06-17 stat.ML cs.LG math.ST stat.TH Updated version

Learning Survival Models with Right-Censored Reporting Delays

学习带有右删失报告延迟的生存模型

Yuta Shikuri, Hironori Fujisawa

Affiliations: The Graduate University for Advanced Studies; Tokio Marine Holdings, Inc.; The Institute of Statistical Mathematics; RIKEN

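AI summary Addresses right-censoring of survival data caused by reporting delays: jointly models parametric hazards for the event and reporting processes, proposes a consistent estimator computed with a Monte Carlo EM algorithm, and uses transfer learning to improve timely risk evaluation under administrative censoring.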

Comments 26 pages, 3 figures, 3 tables

AI-generated Chinese abstract

生存分析提供了对事件发生时间进行建模的统计方法。当事件发生时间未在发生时被观察到,而是仅在报告时被揭示时,就会出现报告延迟。当由于行政删失导致观察窗口较短时,这一问题对于及时风险评估尤为关键。在本研究中,我们通过对事件和报告过程联合建模参数风险,纳入了右删失报告延迟。然后,我们为模型参数构建了一致估计量,并开发了蒙特卡洛期望最大化算法来计算它。为了应对行政删失带来的挑战,我们利用这些发现并提出了一种迁移学习程序。实验结果表明,我们的方法提高了行政删失下及时风险评估的准确性。

English abstract

Survival analysis provides statistical methods to model the time until an event occurs. Reporting delays arise when event times are not observed at their occurrence but are only revealed upon reporting. This issue is particularly critical for timely risk evaluation when the observation window is short due to administrative censoring. In this study, we incorporate right-censored reporting delays by jointly modeling parametric hazards for the event and reporting processes. We then construct a consistent estimator for the model parameters and develop a Monte Carlo expectation-maximization algorithm to compute it. To address the challenges posed by administrative censoring, we leverage these findings and propose a transfer-learning procedure. Experimental results demonstrate that our method improves the accuracy of timely risk evaluation under administrative censoring.
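The simplest parametric case shows why reporting delays matter under a short observation window. Assume, purely for illustration, exponential event times T with rate lam and independent exponential reporting delays D with rate mu; the paper's hazard families are not specified in the abstract. An event is then observed by the censoring time tau only if T + D <= tau, which for lam != mu happens with probability 1 - (mu*exp(-lam*tau) - lam*exp(-mu*tau)) / (mu - lam). The sketch checks this formula by simulation. It is not the paper's estimator or its Monte Carlo EM algorithm, and the parameter values are hypothetical.

```python
import numpy as np

def prob_reported_by(tau, lam, mu):
    """P(T + D <= tau) for independent T ~ Exp(lam) and D ~ Exp(mu), assuming lam != mu."""
    return 1.0 - (mu * np.exp(-lam * tau) - lam * np.exp(-mu * tau)) / (mu - lam)

rng = np.random.default_rng(0)
lam, mu, tau = 0.5, 2.0, 1.0                  # event rate, reporting rate, window length (hypothetical)
n = 200_000
T = rng.exponential(1.0 / lam, n)             # numpy's exponential takes the scale 1 / rate
D = rng.exponential(1.0 / mu, n)
occurred = np.mean(T <= tau)                  # events that happened inside the window
reported = np.mean(T + D <= tau)              # events that are also visible by the cutoff
print(f"occurred {occurred:.3f}, reported {reported:.3f}, formula {prob_reported_by(tau, lam, mu):.3f}")
```

Counting only reported events therefore underestimates P(T <= tau). Joint modeling of the event and reporting processes is meant to correct this bias.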

2511.01650 2026-06-17 cs.CL cs.AI cs.LG Updated version

EngTrace: A Symbolic Benchmark for Verifiable Process Supervision of Engineering Reasoning

EngTrace:工程推理可验证过程监督的符号基准

Ayesha Gull, Muhammad Usman Safder, Rania Elbadry, Fan Zhang, Veselin Stoyanov, Preslav Nakov, Zhuohan Xie

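AI summary Proposes EngTrace, a symbolic benchmark of 1,350 parameterized test cases, together with a verifiable two-stage evaluation framework (tiered protocol plus AI Tribunal) that checks intermediate reasoning traces as well as final answers; the evaluation reveals a trade-off between numeric precision and trace fidelity.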

Comments 33 pages, includes figures and tables; introduces the EngTrace benchmark

AI-generated Chinese abstract

大型语言模型(LLM)正越来越多地进入由严格定量标准和不变物理定律约束的专业化、安全关键的工程工作流程,因此对其推理能力进行严格评估势在必行。然而,现有的基准(如MMLU、MATH和HumanEval)评估的是孤立的认知技能,未能捕捉工程中核心的基于物理的推理,其中科学原理、定量建模和实际约束必须融合。为了实现工程中的可验证过程监督,我们引入了EngTrace,这是一个基于90个参数化模板构建的符号基准,每个模板生成独特的、抗污染的实例,涵盖三个主要工程分支、九个核心领域和20个不同领域,产生1350个测试用例,以压力测试跨多样物理场景的泛化能力。超越结果匹配,我们引入了一个可验证的两阶段评估框架,该框架使用分层协议通过自动化程序检查和异构AI仲裁来验证中间推理轨迹以及最终答案。我们对27个领先LLM的评估揭示了数值精度与轨迹保真度之间的明显权衡,识别出一个复杂性悬崖,其中抽象数学预训练未能转化为高级工程任务所需的整合推理。

English abstract

Large Language Models (LLMs) are increasingly entering specialized, safety-critical engineering workflows governed by strict quantitative standards and immutable physical laws, making rigorous evaluation of their reasoning capabilities imperative. However, existing benchmarks such as MMLU, MATH, and HumanEval assess isolated cognitive skills, failing to capture the physically grounded reasoning central to engineering, where scientific principles, quantitative modeling, and practical constraints must converge. To enable verifiable process supervision in engineering, we introduce EngTrace, a symbolic benchmark built on 90 parameterized templates, each generating unique, contamination-resistant problem instances, spanning three major engineering branches, nine core domains, and 20 distinct areas, yielding 1,350 test cases that stress-test generalization across diverse physical scenarios. Moving beyond outcome matching, we introduce a verifiable two-stage evaluation framework that uses a tiered protocol to validate intermediate reasoning traces alongside final answers through automated procedural checks and a heterogeneous AI Tribunal. Our evaluation of 27 leading LLMs reveals a distinct trade-off between numeric precision and trace fidelity, identifying a complexity cliff where abstract mathematical pre-training fails to translate into the integrative reasoning required for advanced engineering tasks.
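A parameterized, contamination-resistant template can be sketched as a function that draws fresh parameter values, renders a problem statement, and computes the reference answer from a closed-form formula. The cantilever-deflection problem, the parameter ranges, the `Instance` type, and the 2% relative tolerance are hypothetical illustrations, not EngTrace templates. Only the final-answer check is shown, not trace verification.

```python
import random
from dataclasses import dataclass

@dataclass
class Instance:
    prompt: str
    answer: float
    unit: str

def tip_deflection_mm(P, L, E, I):
    """Tip deflection of an end-loaded cantilever, delta = P L^3 / (3 E I), converted from m to mm."""
    return P * L**3 / (3.0 * E * I) * 1e3

def cantilever_template(rng):
    """Hypothetical template: every call samples new parameters and recomputes the answer."""
    P = rng.choice([500, 1000, 1500, 2000])          # point load, N
    L = rng.choice([1.0, 1.5, 2.0, 2.5])             # beam length, m
    I = rng.choice([8.0e-6, 1.2e-5, 2.0e-5])         # second moment of area, m^4
    E = 200e9                                        # Young's modulus of steel, Pa
    prompt = (f"A steel cantilever (E = 200 GPa, I = {I:.1e} m^4) of length {L} m carries a "
              f"{P} N point load at its free end. Find the tip deflection in mm.")
    return Instance(prompt, tip_deflection_mm(P, L, E, I), "mm")

def check_final_answer(pred, gold, rel_tol=0.02):
    """Outcome check only; EngTrace additionally verifies intermediate reasoning traces."""
    return abs(pred - gold) <= rel_tol * abs(gold)

inst = cantilever_template(random.Random(0))
print(inst.prompt)
print(round(inst.answer, 3), inst.unit)
```

Each call draws new values, so memorizing one instance does not help on the next. The closed-form answer keeps every instance automatically gradable.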

2601.21455 2026-06-17 stat.ML cs.LG Updated version

Questioning the Coverage-Length Metric in Conformal Prediction: When Shorter Intervals Are Not Better

质疑共形预测中的覆盖-长度度量:当更短的区间并不更好时

Yizhou Min, Yizhou Lu, Lanqi Li, Zhen Zhang, Jiaye Teng

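AI summary Critically examines whether the standard conformal prediction metrics (coverage and interval length) are sufficient, shows that a counter-intuitive method called the Prejudicial Trick (PT) can deceptively shorten intervals while keeping coverage valid, and proposes a new metric, interval stability, to detect such behavior.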

AI-generated Chinese abstract

共形预测(CP)已成为无分布不确定性量化的基石,通常通过其覆盖率和区间长度进行评估。本文批判性地检验了这些标准度量的充分性。我们证明,通过一种称为偏见技巧(PT)的反直觉方法,区间长度可能被欺骗性地改善,而覆盖率仍然有效。具体而言,对于任何给定的测试样本,PT 概率性地返回一个区间,该区间要么为空,要么使用调整后的置信水平构建,从而保持边际覆盖率。虽然 PT 可能产生欺骗性较低的区间长度,但它引入了实际漏洞:同一输入在算法的重复运行中可能产生完全不同的预测区间。我们正式推导了 PT 实现这些误导性改进的条件,并在各种回归和分类任务中提供了广泛的实证证据。此外,我们引入了一个新度量“区间稳定性”,它有助于检测新的 CP 方法是否基于此类 PT 技术隐式地改善了长度。代码可在 https://github.com/benben-cd/PT-Conformal-Prediction 获取。

English abstract

Conformal prediction (CP) has become a cornerstone of distribution-free uncertainty quantification, conventionally evaluated by its coverage and interval length. This work critically examines the sufficiency of these standard metrics. We demonstrate that the interval length can be deceptively improved through a counter-intuitive approach termed the Prejudicial Trick (PT), while coverage remains valid. Specifically, for any given test sample, PT randomly returns an interval that is either empty or constructed using an adjusted confidence level, thereby preserving marginal coverage. Although PT can yield a deceptively lower interval length, it introduces practical vulnerabilities: the same input can yield completely different prediction intervals across repeated runs of the algorithm. We formally derive the conditions under which PT achieves these misleading improvements and provide extensive empirical evidence across various regression and classification tasks. Furthermore, we introduce a new metric, interval stability, which helps detect whether a new CP method implicitly improves length through PT-like techniques. Code is available at https://github.com/benben-cd/PT-Conformal-Prediction.
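The mechanism behind the Prejudicial Trick fits in a few lines. The sketch assumes split conformal prediction with absolute-residual scores. PT returns an empty interval with probability q < alpha; otherwise it uses the adjusted miscoverage alpha_adj = (alpha - q) / (1 - q), so that (1 - q)(1 - alpha_adj) = 1 - alpha. The score distribution (density 2s on [0, 1]) is a hypothetical choice under which PT happens to shorten the average interval; for other score distributions it need not. The rerun-agreement check is only a simple proxy, not the paper's definition of interval stability.

```python
import numpy as np

def conformal_threshold(cal_scores, alpha):
    """Split-conformal threshold: the ceil((n + 1)(1 - alpha))-th smallest calibration score."""
    n = len(cal_scores)
    k = int(np.ceil((n + 1) * (1 - alpha)))
    return np.inf if k > n else float(np.sort(cal_scores)[k - 1])

def sample_scores(rng, n):
    # Nonconformity scores |y - f(x)| with density 2s on [0, 1] (illustrative choice).
    return np.sqrt(rng.random(n))

rng = np.random.default_rng(0)
alpha, q = 0.1, 0.08                          # target miscoverage; probability of an empty interval (q < alpha)
cal, test = sample_scores(rng, 2000), sample_scores(rng, 20000)

# Standard split conformal: the same half-width for every test point.
t_std = conformal_threshold(cal, alpha)
cov_std = np.mean(test <= t_std)
len_std = 2 * t_std

# Prejudicial Trick: empty with probability q, otherwise a wider interval at level 1 - alpha_adj.
alpha_adj = (alpha - q) / (1 - q)
t_pt = conformal_threshold(cal, alpha_adj)
empty = rng.random(test.size) < q
cov_pt = np.mean(~empty & (test <= t_pt))
len_pt = np.mean(np.where(empty, 0.0, 2 * t_pt))

# Hypothetical stability proxy: rerun PT on one fixed input and measure how often
# consecutive runs return the same interval (always 1.0 for standard split conformal).
runs_empty = rng.random(1000) < q
agreement = np.mean(runs_empty[:-1] == runs_empty[1:])
print(f"coverage {cov_std:.3f} vs {cov_pt:.3f}, mean length {len_std:.3f} vs {len_pt:.3f}, "
      f"rerun agreement {agreement:.3f}")
```

Both procedures reach roughly 90% marginal coverage, but under PT two runs on the same input disagree with probability 2q(1 - q).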

2603.25937 2026-06-17 cs.RO cs.LG Updated version

Can Vision Foundation Models Navigate? Zero-Shot Real-World Evaluation and Lessons Learned

视觉基础模型能否导航?零样本真实世界评估与经验教训

Maeva Guerrier, Karthik Soma, Jana Pavlasek, Giovanni Beltrame

Affiliations: Polytechnique Montreal

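AI summary Evaluates five visual navigation models zero-shot in real-world environments, finds systematic problems (limited geometric understanding, perceptual aliasing, and degradation under distribution shift), and releases the evaluation code and dataset.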

AI-generated Chinese abstract

视觉导航模型(VNMs)通过从大规模视觉演示中学习,有望实现通用化的机器人导航。尽管在真实世界部署日益增多,现有评估几乎完全依赖成功率(机器人是否到达目标),这掩盖了轨迹质量、碰撞行为以及对环境变化的鲁棒性。我们针对五种最先进的VNMs(GNM、ViNT、NoMaD、NaviBridger和CrossFormer)在两个机器人平台和五个室内外环境中进行了真实世界评估。除了成功率,我们结合了基于路径的指标与基于视觉的目标识别分数,并通过受控图像扰动(运动模糊、太阳眩光)评估鲁棒性。我们的分析揭示了三个系统性问题:(a) 即使是架构复杂的扩散和Transformer模型也频繁发生碰撞,表明几何理解有限;(b) 模型无法区分感知相似但存在语义差异的不同位置,导致在重复环境中出现目标预测错误;(c) 在分布偏移下性能下降。我们将公开发布评估代码和数据集,以促进VNMs的可重复基准测试。

English abstract

Visual Navigation Models (VNMs) promise generalizable robot navigation by learning from large-scale visual demonstrations. Despite growing real-world deployment, existing evaluations rely almost exclusively on success rate (whether the robot reaches its goal), which conceals trajectory quality, collision behavior, and robustness to environmental change. We present a real-world evaluation of five state-of-the-art VNMs (GNM, ViNT, NoMaD, NaviBridger, and CrossFormer) across two robot platforms and five environments spanning indoor and outdoor settings. Beyond success rate, we combine path-based metrics with vision-based goal-recognition scores and assess robustness through controlled image perturbations (motion blur, sun flare). Our analysis uncovers three systematic limitations: (a) even architecturally sophisticated diffusion and transformer-based models exhibit frequent collisions, indicating limited geometric understanding; (b) models fail to discriminate between locations that are perceptually similar but semantically different, causing goal-prediction errors in repetitive environments; and (c) performance degrades under distribution shift. We will publicly release our evaluation codebase and dataset to facilitate reproducible benchmarking of VNMs.

2604.16450 2026-06-17 cs.CY cs.LG q-bio.QM Updated version

Evaluating Intersectional Fairness across Clinical Machine Learning Use Cases using Fairlogue and the All of Us Research Program

使用Fairlogue和All of Us研究计划评估临床机器学习用例中的交叉公平性

Nick Souligne, Vignesh Subbian

Affiliations: College of Engineering, The University of Arizona

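AI summary Uses the Fairlogue toolkit to evaluate intersectional fairness on clinical prediction tasks, finding that intersectional subgroup disparities are larger than those in single-axis analyses, although counterfactual diagnostics show that most disparities are comparable to those under random group assignment.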

Comments 10 pages, 7 figures, Accepted at the AMIA Annual Symposium 2026

AI-generated Chinese abstract

医疗数据中的交叉偏见可能在临床机器学习模型中产生复合差异,然而大多数公平性评估独立地评估人口统计属性。FairLogue是一个用于交叉公平性审计的工具包,被应用于多个临床预测任务,以评估跨组合人口统计群体的差异。使用All of Us数据集,选择两个已发表模型进行复制和评估:(A) 预测选择性5-羟色胺再摄取抑制剂相关的出血事件,(B) 房颤患者两年卒中风险。计算了跨种族、性别和交叉亚组的观察性公平性指标,随后进行反事实分析以评估差异是否可归因于群体成员身份。交叉评估揭示了比单轴分析更大的差异;然而,反事实诊断表明,大多数观察到的差异与随机群体成员身份下预期的差异相当。这些结果强调了交叉公平性审计的重要性,并展示了FairLogue如何为临床机器学习系统中的偏见提供更深入的洞察。

English abstract

Intersectional biases in healthcare data can produce compound disparities in clinical machine learning models, yet most fairness evaluations assess demographic attributes independently. FairLogue, a toolkit for intersectional fairness auditing, was applied across multiple clinical prediction tasks to evaluate disparities across combined demographic groups. Using the All of Us dataset, two published models were selected for replication and evaluation: (A) prediction of selective serotonin reuptake inhibitor (SSRI)-associated bleeding events and (B) two-year stroke risk in patients with atrial fibrillation. Observational fairness metrics were computed across race, gender, and intersectional subgroups, followed by counterfactual analysis to evaluate whether disparities were attributable to group membership. Intersectional evaluation revealed larger disparities than single-axis analyses; however, counterfactual diagnostics indicated that most observed disparities were comparable to those expected under randomized group membership. These results highlight the importance of intersectional fairness auditing and demonstrate how FairLogue provides deeper insight into bias in clinical machine learning systems.
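Below is a minimal sketch of intersectional auditing against a randomized-membership baseline. It assumes a true-positive-rate gap as the disparity metric and uses a permutation test as a stand-in for the counterfactual diagnostics. Fairlogue's actual metrics and procedure are not reproduced, and all names and data are hypothetical.

```python
import numpy as np

def tpr_gap(y_true, y_pred, groups):
    """Largest minus smallest true-positive rate over groups that contain positives."""
    rates = [y_pred[(groups == g) & (y_true == 1)].mean()
             for g in np.unique(groups) if np.any((groups == g) & (y_true == 1))]
    return max(rates) - min(rates)

def gap_with_permutation_pvalue(y_true, y_pred, groups, rng, n_perm=500):
    """Observed gap, plus the share of label-shuffled (randomized-membership) gaps at least as large."""
    observed = tpr_gap(y_true, y_pred, groups)
    null = np.array([tpr_gap(y_true, y_pred, rng.permutation(groups)) for _ in range(n_perm)])
    return observed, float(np.mean(null >= observed))

rng = np.random.default_rng(0)
n = 4000
race = rng.choice(["A", "B", "C"], size=n)
gender = rng.choice(["F", "M"], size=n)
intersection = np.char.add(np.char.add(race, "|"), gender)
y_true = (rng.random(n) < 0.3).astype(int)
# Hypothetical model whose sensitivity drops only for the B|F intersection.
tpr = np.where(intersection == "B|F", 0.5, 0.8)
y_pred = (y_true * (rng.random(n) < tpr)).astype(int)

gap_race = tpr_gap(y_true, y_pred, race)
gap_gender = tpr_gap(y_true, y_pred, gender)
gap_inter, p_inter = gap_with_permutation_pvalue(y_true, y_pred, intersection, rng)
print(f"race {gap_race:.3f}, gender {gap_gender:.3f}, intersectional {gap_inter:.3f} (p = {p_inter:.3f})")
```

For this metric, intersectional gaps are at least as large as single-axis gaps for a mechanical reason. Each race or gender group's rate is a weighted average of its intersectional subgroups' rates, so the single-axis range cannot exceed the intersectional range. The permutation p-value is what separates a genuine subgroup effect from the extra noise of smaller groups.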

2606.09049 2026-06-17 stat.ME cs.LG math.ST stat.ML stat.TH Updated version

Data augmented bootstrap: Unifying confidence interval construction by approximate invariance

数据增强自助法:通过近似不变性统一置信区间构建

Kevin Han Huang

Affiliations: Department of Statistics, University of Warwick

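AI summary Proposes the data augmented bootstrap (DAB), which constructs confidence intervals from approximate invariances of the data, unifies the theory of the classical bootstrap, conformal prediction, and related methods, and incorporates data augmentation heuristics into them.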

Comments Added comparison with arXiv:2604.15229

AI-generated Chinese abstract

我们提出了数据增强自助法(DAB),这是一个通过数据的近似不变变换来构建置信区间的框架。作为特例,DAB 恢复了依赖于精确群对称性的流行方法,例如共形预测、最大均值差异 U-统计量的 wild bootstrap 以及最近提出的 SymmPI。同时,DAB 也恢复了经典的自助法,该方法利用了随着数据集大小增长,数据索引均匀采样下数据集的近似不变性。对于所有 DAB 方法,我们建立了理论覆盖结果,这些结果根据不变性的强度在有限样本和渐近保证之间插值,且不假设群结构。近似不变性通过 Kolmogorov 距离度量,并且对于满足高斯普适性的统计量,简化为条件均值和方差匹配。这使我们能够将数据增强(DA)——一种基于近似不变性的广泛使用的机器学习启发式方法——纳入已知的统计方法中。我们通过实验测试了将 DA 纳入自助法、wild bootstrap 和共形预测在模拟设置以及图像、语言和科学数据上的性能。

English abstract

We propose the data augmented bootstrap (DAB), a framework for constructing confidence intervals from approximately invariant transformations of the data. As special cases, DAB recovers popular methods that rely on exact group symmetries, such as conformal prediction, wild bootstrap for Maximum Mean Discrepancy U-statistics and the recently proposed SymmPI. Meanwhile, DAB also recovers the classical bootstrap method, which exploits the dataset's approximate invariance under uniform sampling of data indices as the dataset size grows. For all DAB methods, we establish theoretical coverage results that interpolate between finite-sample and asymptotic guarantees according to the strength of the invariance, and without assuming a group structure. The approximate invariance is measured in the Kolmogorov distance and, for statistics that satisfy Gaussian universality, reduces to conditional mean and variance matching. This allows us to incorporate data augmentation (DA), a widely used machine learning heuristic based on approximate invariances, into known statistical methods. We empirically test the performance of incorporating DA into bootstrap, wild bootstrap and conformal prediction for simulated settings as well as for image, language and scientific data.
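One generic interval constructor illustrates the shared structure the abstract describes. Apply a transformation under which the data are (approximately) invariant, and use the spread of stat(transformed data) - stat(data) as a proxy for the spread of stat(data) - theta. In the sketch, index resampling recovers the classical (basic) bootstrap, and random sign flips of centered data give a wild-bootstrap-style variant. A data augmentation could be passed as `transform` in the same way. This is a hedged sketch with hypothetical names, not DAB's exact construction or its coverage guarantees.

```python
import numpy as np

def invariance_interval(x, stat, transform, rng, alpha=0.05, B=2000):
    """Basic interval: quantiles of stat(transform(x)) - stat(x) stand in for
    those of stat(x) - theta, and the interval is inverted around stat(x)."""
    t0 = stat(x)
    deltas = np.array([stat(transform(x, rng)) - t0 for _ in range(B)])
    lo_q, hi_q = np.quantile(deltas, [alpha / 2, 1 - alpha / 2])
    return t0 - hi_q, t0 - lo_q

def resample(x, rng):
    """Classical bootstrap: draw data indices uniformly with replacement."""
    return x[rng.integers(0, len(x), size=len(x))]

def sign_flip(x, rng):
    """Wild-bootstrap-style transform: random signs on centered data (assumes symmetric noise)."""
    m = x.mean()
    return m + rng.choice([-1.0, 1.0], size=len(x)) * (x - m)

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=500)
ci_boot = invariance_interval(x, np.mean, resample, rng)
ci_flip = invariance_interval(x, np.mean, sign_flip, rng)
print(ci_boot, ci_flip)
```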

2606.14295 2026-06-17 cs.CR cs.AI cs.LG Updated version

AgentCyberRange: Benchmarking Frontier AI Systems in Realistic Cyber Ranges

AgentCyberRange:在真实网络靶场中基准测试前沿AI系统

Fengyu Liu, Jiarun Dai, Yihe Fan, Wuyuao Mai, Ziao Li, Bofei Chen, Jie Zhang, Zheng Lou, Bocheng Xiang, Qiyi Zhang, Xudong Pan, Geng Hong, Yuan Zhang, Min Yang

Affiliations: Fudan University

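AI summary Presents AgentCyberRange, the first open-source multi-range infrastructure, combining 110 vulnerabilities and 156 internal hosts, to evaluate frontier AI systems on realistic cyber attacks; GPT-5.5 with Codex performs best on both web exploitation and post-exploitation tasks.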

AI-generated Chinese abstract

前沿AI系统在网络安全任务中能力日益增强,包括代码库检查、漏洞检测和利用。然而,评估其攻击能力仍受限于缺乏开放、可复现、多主机的网络靶场。现有公开基准测试捕获了CTF解题、漏洞复现和利用生成等孤立技能,但通常忽略了真实的入侵工作流:发现暴露服务、获得立足点、收集内部信息以及跨主机扩大入侵范围。这一差距使得早期观察新兴风险变得困难,因为前沿AI系统很少在真实攻击条件下进行评估。我们引入了AgentCyberRange,这是首个用于在真实网络靶场中衡量自主网络攻击能力的开源多靶场基础设施。它整合了15个真实Web应用和8个企业级网络靶场中的110个漏洞,以及156个内部主机,并提供了Cage工具链用于执行、编排、结果收集和验证。该基准测试涵盖两个核心阶段:Web利用(代理探索暴露的应用并验证漏洞)和后利用(代理将初始立足点转化为更广泛的内部入侵)。我们在匹配的提示和预算下评估了六个前沿AI系统。GPT-5.5与Codex表现最佳,解决了16.1%的Web利用任务和31.7%的后利用任务;在更具体的提示下,这些比率分别提高到33.0%和46.3%。我们还观察到基准测试之外的发现,包括流行项目中的未知漏洞,以及绕过主机防御的有效载荷变异。这些结果表明,开放的网络靶场评估对于在真实且可复现的条件下观察新兴攻击能力是必要的。

English abstract

Frontier AI systems are increasingly capable of cybersecurity tasks, including codebase inspection, vulnerability detection, and exploitation. However, evaluating their offensive capabilities remains constrained by limited access to open, reproducible, multi-host cyber ranges. Existing public benchmarks capture isolated skills such as CTF solving, vulnerability reproduction, and exploit generation, but often abstract away realistic intrusion workflows: discovering exposed services, gaining a foothold, collecting internal information, and expanding compromise across hosts. This gap makes it difficult to observe emerging risks early, because frontier AI systems are rarely evaluated under realistic attack conditions. We introduce AgentCyberRange, the first open, multi-range infrastructure for measuring autonomous cyber attack capability in realistic cyber ranges. It combines 110 vulnerabilities across 15 real web applications and 8 enterprise-like cyber ranges with 156 internal hosts, plus Cage, a toolchain for execution, orchestration, result collection, and verification. The benchmark covers two core stages: web exploitation, where agents explore exposed applications and validate vulnerabilities, and post exploitation, where agents turn an initial foothold into broader internal compromise. We evaluate six frontier AI systems under matched prompts and budgets. GPT-5.5 with Codex performs best, solving 16.1% of web exploitation tasks and 31.7% of post-exploitation tasks; with more concrete hints, these rates increase to 33.0% and 46.3%. We also observe out-of-benchmark findings, including unknown vulnerabilities in popular projects, and payload mutation that bypasses host defenses. These results show that open cyber-range evaluation is necessary for observing emerging offensive capabilities under realistic and reproducible conditions.

12. Machine learning applications (57 papers)

2606.17057 2026-06-17 cs.LG cs.AI cs.CL New submission

Correct When Paired, Wrong When Split: Decoupling and Editing Modality-Specific Neurons in MLLMs

配对时正确,分离时错误:多模态大语言模型中模态特定神经元的解耦与编辑

Tingchao Fu, Wenkai Wang, Fanxiao Li, Huadong Zhang, Jinhong Zhang, Dayang Li, Yunyun Dong, Renyang Liu, Wei Zhou

Affiliations: School of Information Science and Engineering, Yunnan University; School of Software, Yunnan University; National University of Singapore; School of Engineering, Yunnan University

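AI summary Addresses editing decoupling failure in knowledge editing for multimodal large language models by proposing DECODE, which explicitly disentangles and localizes modality-specific neuron groups so that knowledge updates take effect under different modality triggers.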

Comments 18 pages, 11 figures

AI-generated Chinese abstract

尽管知识编辑为多模态大语言模型(MLLMs)的知识更新提供了一种高效机制,但我们发现当前范式仍面临一个重要但尚未充分探索的问题:编辑解耦失败,即当模型被多模态输入(文本-图像查询对)触发时,实体相关知识可以更新,但当配对输入被拆分为单模态输入时,这些知识往往恢复为编辑前的旧事实。我们深入的实证分析表明,MLLMs中的实体知识并非以统一表示存储,而是分布在解耦的模态特定路径中。因此,偏向多模态查询的更新无法有效传播到单模态电路。为弥补这一差距,我们提出DECODE,该方法显式解耦并定位模态特定神经元组以获取目标知识。大量实验证明,DECODE在不同模态触发下均能实现有效的知识更新,从而缓解编辑解耦失败。

English abstract

Although knowledge editing provides an efficient mechanism for updating the knowledge of Multimodal Large Language Models (MLLMs), we find that current paradigms still suffer from an important yet underexplored issue: editing decoupling failure. Entity-related knowledge is updated when the model is triggered by multimodal inputs (text-image query pairs), but it often reverts to outdated pre-edit facts when the paired inputs are split into unimodal ones. Our in-depth empirical analysis reveals that the entity knowledge in MLLMs is not stored as a unified representation, but is instead distributed across disentangled modality-specific pathways. As a result, updates biased toward multimodal queries fail to propagate effectively to unimodal circuits. To bridge this gap, we propose DECODE, which explicitly disentangles and localizes modality-specific neuron groups for targeted knowledge. Extensive experiments demonstrate that DECODE consistently achieves effective knowledge updates under different modality triggers, thereby mitigating editing decoupling failures.

2606.17093 2026-06-17 cs.LG eess.IV New submission

Diagnosing and Repairing Shape-Prior Shortcuts in Long-Range Single-Shot Fringe Projection Profilometry

诊断和修复长距离单次条纹投影轮廓测量中的形状先验捷径

Adam Haroon, Anush Lakshman, Cody Fleming, Beiwen Li

Affiliations: Department of Mechanical Engineering, Iowa State University; College of Engineering, University of Georgia

AI总结 通过机制可解释性和共形不确定性量化诊断长距离单次条纹投影轮廓测量中网络依赖形状先验而非条纹相位解码的问题,提出PhiCalNet架构修复,将物体平均绝对误差降低3.3倍。

Comments 44 pages, 27 figures

AI中文摘要

基于学习的单次条纹投影轮廓测量术(FPP)主要在近距离下研究。长距离(工作距离超过1米)情况仍未得到充分解决:平方反比强度衰减降低了条纹信噪比并降低了物理真值的质量,单次问题由于一幅图像中缺乏条纹阶次信息而病态,且这些架构尚未从机制层面得到研究。我们提出了一项诊断-修复-验证研究,使用机制可解释性(MI)和共形不确定性量化(UQ)作为收敛的诊断工具:它们在同一个物理故障点上达成一致,驱动并验证了架构修复。在一个逼真的合成基准(15,600幅条纹图像,50个物体,距离1.5-2.1米)上,最佳UNet基线的物体平均绝对误差(MAE)为14.54毫米。三种探测方法(线性探测、Grad-CAM、平面分布外测试)得出一致结论:基线依靠物体边界形状先验而非条纹相位解码来完成任务。我们通过PhiCalNet修复此问题,该网络输出包裹相位而非深度,并应用固定的可微校准层将相位映射到深度,从架构上而非通过损失惩罚将形状先验解从假设空间中移除。将相同物理规律作为软惩罚施加于深度回归网络的物理信息损失没有带来可测量的增益,从而确认架构是起作用的因素。PhiCalNet将物体MAE降低3.3倍至4.46毫米;残余误差集中在±π包裹不连续处的0.103%像素上。逐像素共形UQ证实了这一诊断:按快照不一致性剔除前5%的物体像素,使PhiCalNet的RMSE降低64%(从20.6毫米降至7.4毫米),而基线仅降低3.5%。MI和UQ收敛于同一故障点。

英文摘要

Learning-based single-shot fringe projection profilometry (FPP) has been studied mostly at close range. The long-range regime (standoff beyond 1 m) remains largely unaddressed: inverse-square intensity falloff lowers fringe signal-to-noise ratio and degrades physical ground truth, the single-shot problem is ill-posed because fringe-order information is absent from a single image, and these architectures have not been studied mechanistically. We present a diagnose-repair-verify study using mechanistic interpretability (MI) and conformal uncertainty quantification (UQ) as convergent diagnostics: they agree on one physical failure locus, driving and verifying an architectural repair. On a photorealistic synthetic benchmark (15,600 fringe images, 50 objects at 1.5-2.1 m), the best UNet baseline reaches 14.54 mm object mean absolute error (MAE). Three probes (linear probing, Grad-CAM, flat-plane out-of-distribution test) converge: the baseline solves the task via object-boundary shape priors rather than fringe-phase decoding. We repair this with PhiCalNet, which outputs wrapped phase rather than depth and applies a fixed differentiable calibration layer mapping phase to depth, removing the shape-prior solution from the hypothesis space architecturally rather than by a loss penalty. A physics-informed loss that enforces the same physics as a soft penalty on a depth-regressing network yields no measurable gain, isolating the architecture as the operative factor. PhiCalNet reduces object MAE 3.3x to 4.46 mm; the residual is carried by 0.103% of pixels at the ±π wrap discontinuity. Pixel-wise conformal UQ confirms the diagnosis: rejecting the top 5% of object pixels by snapshot disagreement cuts PhiCalNet RMSE by 64% (from 20.6 to 7.4 mm) versus 3.5% for the baseline. MI and UQ converge on the same failure locus.
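The selective-RMSE check described above (reject the most uncertain 5% of object pixels, then recompute RMSE) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' code: `selective_rmse` is a hypothetical helper, the per-pixel errors and uncertainty scores (for example, disagreement across snapshots) are assumed to be given, and the conformal calibration step that turns scores into calibrated intervals is not reproduced.

```python
import numpy as np

def selective_rmse(errors, scores, reject_frac=0.05):
    """RMSE before and after rejecting the pixels with the highest uncertainty scores.

    errors: per-pixel depth errors (prediction minus ground truth), e.g. in mm.
    scores: per-pixel uncertainty scores (hypothetical, e.g. snapshot disagreement).
    reject_frac: fraction of pixels to discard, highest scores first.
    Returns (rmse_all, rmse_kept, relative_reduction).
    """
    errors = np.asarray(errors, dtype=float)
    scores = np.asarray(scores, dtype=float)
    n_reject = int(round(reject_frac * errors.size))
    order = np.argsort(scores)                      # ascending uncertainty
    kept = errors[order[: errors.size - n_reject]]  # drop the n_reject most uncertain pixels
    rmse_all = float(np.sqrt(np.mean(errors ** 2)))
    rmse_kept = float(np.sqrt(np.mean(kept ** 2)))
    return rmse_all, rmse_kept, 1.0 - rmse_kept / rmse_all
```

With `reject_frac=0.05` this mirrors the 5% rejection in the abstract; the reported drop from 20.6 to 7.4 mm corresponds to a relative reduction of about 0.64.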

2606.17113 2026-06-17 cs.LG cs.CL 新提交

The Critical Role of Model Selection in Causal Inference: A Comparative Analysis of Classification Models within the InferBERT Framework for Pharmacovigilance

模型选择在因果推断中的关键作用:基于InferBERT框架的药物警戒分类模型比较分析

Csaba Kiss, Roland Molontay, Gabriele Pergola

发表机构 * Department of Stochastics, Institute of Mathematics, Budapest University of Technology and Economics(布达佩斯技术与经济大学数学研究所随机学系) Institute of Biostatistics and Network Science, Semmelweis University(塞梅维什大学生物统计学与网络科学研究所) Department of Computer Science, University of Warwick(华威大学计算机科学系)

AI总结 本研究在InferBERT框架下比较XGBoost、ALBERT、BioBERT和Med-LLaMA四种模型,发现领域特定预训练(BioBERT)在药物警戒因果ADE检测中优于简单基线和大型LLM,校准改善ECE但对准确率和因果发现影响不一。

Comments 10 pages, 5 figures

AI中文摘要

区分因果性药物不良事件(ADE)与虚假相关性仍然是药物警戒中的核心挑战。InferBERT框架将Transformer模型与Do-calculus相结合,但其成功依赖于底层的分类模型。本研究评估了InferBERT中模型选择的影响,考察了更简单的模型是否足够、领域特定预训练是否有帮助、扩展到LLM是否能改善因果检测,以及事后校准的效果。我们在两个基准上进行了比较研究:镇痛药诱导的急性肝衰竭(AILF)和曲马多相关死亡率(TRAM)。我们评估了四种模型:XGBoost(基线)、ALBERT(原始InferBERT)、BioBERT(生物医学Transformer)和Med-LLaMA(医学LLM),均采用重复20次的5折交叉验证。我们测量了准确率、保序回归前后的期望校准误差(ECE),以及因果项与PRR、ROR和EBGM的Jaccard一致性;显著性通过配对t检验测试。BioBERT在两个数据集上均取得了最高准确率,而Med-LLaMA尽管规模大且进行了参数高效微调,表现仍然不佳。领域特定预训练起到了决定性作用。校准改善了ECE,但对准确率和因果发现的影响不一。BioBERT的优越性也使其与传统药物警戒信号的一致性最强。这些结果表明,领域特定预训练相比简单基线和更大的LLM具有明显优势。在计算药物警戒中,投资于可管理的、领域感知的模型比单纯扩大模型规模更有效。

英文摘要

Distinguishing causal adverse drug events (ADEs) from spurious correlations remains a central challenge in pharmacovigilance. The InferBERT framework integrates transformer models with Do-calculus, but its success hinges on the underlying classification model. This study evaluates the impact of model choice in InferBERT, assessing whether simpler models suffice, whether domain-specific pre-training helps, whether scaling to LLMs improves causal detection, and what effect post-hoc calibration has. We performed a comparative study on two benchmarks: Analgesics-induced Acute Liver Failure (AILF) and Tramadol-related Mortalities (TRAM). Four models were evaluated using 5-fold cross-validation repeated over 20 runs: XGBoost (baseline), ALBERT (original InferBERT), BioBERT (biomedical transformer), and Med-LLaMA (medical LLM). We measured accuracy, Expected Calibration Error (ECE) before and after isotonic regression, and the Jaccard concordance of causal terms with PRR, ROR, and EBGM; significance was tested with paired t-tests. BioBERT achieved the highest accuracy on both datasets, while Med-LLaMA underperformed despite its size and parameter-efficient fine-tuning. Domain-specific pre-training was decisive. Calibration improved ECE but had mixed effects on accuracy and causal discovery. BioBERT's superiority also yielded the strongest concordance with traditional pharmacovigilance signals. These results show that domain-specific pre-training provides a clear advantage over simpler baselines and larger LLMs. Investing in manageable, domain-aware models is more effective for computational pharmacovigilance than simply scaling model size.
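The evaluation metrics named above are standard and can be sketched as follows. This is an illustrative NumPy version, not the study's evaluation code: the 10 equal-width confidence bins for ECE are an assumption, the 2x2 cell names `a` to `d` follow the usual disproportionality convention, and EBGM (which needs an empirical Bayes prior) is omitted.

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=10):
    """Top-label ECE with equal-width confidence bins (bin count is an assumption)."""
    probs = np.asarray(probs, dtype=float)
    labels = np.asarray(labels)
    conf = probs.max(axis=1)
    pred = probs.argmax(axis=1)
    correct = (pred == labels).astype(float)
    bin_idx = np.minimum((conf * n_bins).astype(int), n_bins - 1)
    ece = 0.0
    for b in range(n_bins):
        mask = bin_idx == b
        if mask.any():
            # weight each bin's |accuracy - confidence| gap by its share of samples
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece

def prr_ror(a, b, c, d):
    """PRR and ROR from a 2x2 table.

    a = drug & event, b = drug & other events,
    c = other drugs & event, d = other drugs & other events.
    """
    prr = (a / (a + b)) / (c / (c + d))
    ror = (a * d) / (b * c)
    return prr, ror

def jaccard(terms_a, terms_b):
    """Overlap between two sets of causal terms."""
    a, b = set(terms_a), set(terms_b)
    return len(a & b) / len(a | b) if a | b else 1.0
```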

2606.17115 2026-06-17 cs.LG cs.AI q-bio.QM 新提交

Probing, Fusion, and Trustworthiness: A Systematic Evaluation of Foundation Model Representations for Multimodal Cancer Analysis

探测、融合与可信度:基础模型表示在多模态癌症分析中的系统评估

Jingyu Hu, Giuseppe Tripodi, Reed Naidoo, Sarah F. McGough, Tapabrata Chakraborti

发表机构 * The Alan Turing Institute(艾伦·图灵研究所) University of Bristol(布里斯托大学) University of Manchester(曼彻斯特大学) The Institute of Cancer Research(癌症研究所) Genentech(基因泰克)

AI总结 系统评估基础模型表示在计算病理学任务中的性能,发现图像和组学表示互补,多模态融合在单模态不占优时有效,并利用共形预测验证了不确定性感知推理的临床价值。

AI中文摘要

基础模型(FMs)已成为医学数据的强大表示提取器,但它们在分布偏移下的泛化能力仍未充分探索。本工作系统评估了基于FM的表示在计算病理学任务上的表现,涉及两个真实世界商业队列IH-BC和IH-NSCLC,这些队列来自许可的内部(IH)肿瘤学数据集。分析聚焦于两种模态:全切片图像和转录组图谱,均来自IH多模态数据。我们首先在八个下游分类任务上对五个FM进行单模态探测性能基准测试,发现图像和组学表示携带互补的预测信号。然后,我们通过比较三种基于配对表示的图像-组学融合策略,研究多模态融合是否能在单模态基线之上带来额外收益。进一步通过共形预测评估所选单模态和多模态管道的可信度。我们的结果表明,FM表示在分布外数据上取得了竞争性性能,且多模态融合主要在单模态不占主导信号时有所帮助。共形预测揭示,在点预测失败的大多数情况下,真实诊断仍可在预测集中恢复,这强化了不确定性感知推理对临床支持的价值。

英文摘要

Foundation models (FMs) have emerged as powerful representation extractors for medical data, yet their generalizability to datasets under distribution shift remains underexplored. This work systematically evaluates FM-based representations on a suite of computational pathology tasks across two real-world commercial cohorts, IH-BC and IH-NSCLC, drawn from the licensed in-house (IH) oncology dataset. The analysis focuses on two modalities, whole-slide images and transcriptomic profiles, drawn from the IH multimodal data. We first benchmark unimodal probing performance across five FMs on eight downstream classification tasks, and find that image and omics representations carry complementary predictive signals. Then we investigate whether multimodal fusion can yield additional gains over unimodal baselines by comparing three image-omics fusion strategies built on paired representations. The trustworthiness of selected unimodal and multimodal pipelines is further assessed through conformal prediction. Our results show that FM representations achieve competitive performance on out-of-distribution data and that multimodal fusion helps mainly when no single modality dominates the signal. Conformal prediction reveals that in the majority of cases where a point prediction fails, the true diagnosis remains recoverable within the prediction set, reinforcing the value of uncertainty-aware inference for clinical support.
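The prediction-set behavior described above (a wrong point prediction whose true label is still inside the set) comes from split conformal prediction. A minimal sketch, assuming the common nonconformity score 1 - p(true class); the abstract does not state which score the study used, and both function names are hypothetical. When calibration and test data are exchangeable, sets built this way contain the true label with probability at least 1 - alpha.

```python
import numpy as np

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold using the score 1 - p(true class)."""
    cal_probs = np.asarray(cal_probs, dtype=float)
    cal_labels = np.asarray(cal_labels)
    n = len(cal_labels)
    scores = 1.0 - cal_probs[np.arange(n), cal_labels]
    k = int(np.ceil((n + 1) * (1 - alpha)))  # rank of the finite-sample conformal quantile
    if k > n:
        return np.inf  # too few calibration points: every class is included
    return float(np.sort(scores)[k - 1])

def prediction_set(probs, qhat):
    """Indices of classes whose nonconformity score does not exceed qhat."""
    scores = 1.0 - np.asarray(probs, dtype=float)
    return [int(j) for j in np.flatnonzero(scores <= qhat)]
```

A point prediction is the single argmax class; the set can hold several diagnoses, which is how the true label can be recovered even when the argmax is wrong.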

2606.17233 2026-06-17 cs.LG stat.ML 新提交

Uncertainty Quantification of Engineering Structures by Polynomial Chaos Expansion and Multivariate Active Learning

基于多项式混沌展开与多元主动学习的工程结构不确定性量化

Qitian Lu, Jafar Jafari-Asl, Panagiotis Spyridis, Lukas Novak

发表机构 * Brno University of Technology(布尔诺理工大学) University of Rostock(罗斯托克大学)

AI总结 针对多输出工程问题中单一实验设计难以同时准确近似所有输出量的问题,提出一种自适应序贯采样方法,通过平衡输入空间探索与多输出聚合方差信息,构建多项式混沌展开代理模型,数值实验表明该方法提高了代理精度和稳定性。

AI中文摘要

在许多工程应用中,单个高保真模型在相同输入参数下产生多个感兴趣的量(QoIs),例如复杂物理系统的有限元模型。为了减轻直接模型评估的高计算成本,代理模型被广泛用于构建模型响应的高效近似。自然地,代理模型的精度强烈依赖于实验设计(ED)的质量。然而,单个ED可能无法同时为所有输出提供足够的表示,特别是当不同输出对输入变量表现出不同的敏感性时。一个直接的解决方案是为每个输出分别进行采样,但这会导致采样复杂性和计算成本增加。从统计角度来看,这种方法也忽略了所有输出之间潜在的相关性,并可能损害数据一致性。为了解决这个问题,一种用于构建多项式混沌展开代理模型的自适应序贯采样方法被推广到向量值QoIs。该方法基于新样本对输出方差的局部贡献,从候选池中顺序选择新样本,同时平衡基于距离的输入空间探索和跨所有输出的聚合方差信息的利用。通过来自工程问题的几个数值示例,将其性能与非序贯拉丁超立方采样进行比较。数值结果表明,所提出的策略提高了代理模型的精度和稳定性,并提供了更可靠的二阶统计量估计。

英文摘要

In many engineering applications, a single high-fidelity model produces multiple quantities of interest (QoIs) under the same input parameters, e.g., finite element models of complex physical systems. To alleviate the high computational cost of direct model evaluations, surrogate models are widely used to construct efficient approximations of model responses. Naturally, the accuracy of surrogates strongly depends on the quality of the experimental design (ED). However, a single ED may not provide an adequate representation for all outputs simultaneously, especially when different outputs exhibit varying sensitivities to the input variables. A straightforward solution is to perform separate sampling for each output, but this results in increased sampling complexity and computational cost. From a statistical perspective, such an approach also ignores potential correlations among all outputs and may compromise data consistency. To address this issue, an adaptive sequential sampling method for constructing polynomial chaos expansion surrogate models is generalized to vector-valued QoIs. The method sequentially selects new samples from a candidate pool based on their local contribution to the output variance, while balancing distance-based exploration of the input space and exploitation of aggregated variance information across all outputs. Its performance is compared with non-sequential Latin Hypercube Sampling through several numerical examples from engineering problems. Numerical results demonstrate that the proposed strategy improves both surrogate accuracy and stability, and provides a more reliable estimation of second-order statistics.
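A polynomial chaos expansion fitted by least squares makes two points of the abstract concrete: one experimental design can serve every output at once, and second-order statistics follow directly from the coefficients. The sketch below assumes a single input uniformly distributed on [-1, 1] and an orthonormal Legendre basis; the helper names are hypothetical, and the paper's sequential selection criterion (distance-based exploration balanced against aggregated output variance) is not reproduced because the abstract does not give its formula.

```python
import numpy as np
from numpy.polynomial import legendre

def legendre_design(x, degree):
    """Orthonormal Legendre basis for a uniform input on [-1, 1]; one column per degree."""
    cols = []
    for n in range(degree + 1):
        coef = np.zeros(n + 1)
        coef[n] = 1.0
        # sqrt(2n + 1) * P_n(x) has unit variance under the uniform density on [-1, 1]
        cols.append(np.sqrt(2 * n + 1) * legendre.legval(x, coef))
    return np.column_stack(cols)

def fit_pce(x, Y, degree):
    """Least-squares PCE for all outputs at once, sharing one experimental design x.

    Y has shape (n_samples, n_outputs). Returns (coefficients, means, variances).
    """
    Psi = legendre_design(x, degree)
    coeffs, *_ = np.linalg.lstsq(Psi, Y, rcond=None)
    means = coeffs[0]                            # coefficient of the constant basis term
    variances = np.sum(coeffs[1:] ** 2, axis=0)  # orthonormality: sum of squared non-constant terms
    return coeffs, means, variances
```

For example, with outputs x^2 and 3x + 1 on the same design, the fit returns means 1/3 and 1 and variances 4/45 and 3, matching the exact moments of a uniform input.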

2606.17345 2026-06-17 cs.LG cs.AI 新提交

Counterfactual Optimization of Baseball Pitch Sequences and Estimation of Its Impact on Season-Level Statistics

棒球投球序列的反事实优化及其对赛季级统计指标影响的估计

Ryota Takamido, Hiroki Nakamoto

发表机构 * Sports Innovation Organization, National Institute of Fitness and Sports in Kanoya(体育创新组织,鹿屋体育大学)

AI总结 利用Transformer模型和反事实分析,优化MLB投球序列中的最终投球和设置投球,发现可显著提升赛季级表现(如K/9提高1.0以上),并提供了速度带有效位置等实用见解。

AI中文摘要

尽管投球序列是棒球分析的核心话题,但以往研究主要关注单次打席中最终投球的优化,对前期设置投球的作用及其对长期赛季级表现的影响研究不足。为解决这些问题,本研究利用MLB Statcast数据进行了反事实分析。训练了一个基于Transformer的机器学习模型,用于预测目标投球会导致击球入场(in-play)还是挥空。然后,通过将最终投球或前期设置投球替换为替代的投球类型和位置,同时保持周围背景信息不变,生成了反事实投球序列。最优反事实选择定义为使预测击球入场概率最小的选择,并使用将模型输出与赛季统计指标关联的回归模型估计其对投手赛季统计指标的预期影响。结果表明,最终投球和设置投球的优化都可能显著影响赛季级表现,包括K/9提高超过1.0。分析还提供了若干实用见解,包括特定球速区间的有效位置、控球能力(command)的重要性以及通过中速投球扩展投球选择范围。这些发现定量支持了投球序列在棒球中的战略重要性。

英文摘要

Although pitch sequencing is a central topic in baseball analytics, previous studies have primarily focused on optimizing the final pitch within a single plate appearance, leaving the role of preceding setup pitches and their impact on long-term season-level performance insufficiently examined. To address these issues, this study conducted counterfactual analyses using MLB Statcast data. A Transformer-based machine-learning model was trained to predict whether a target pitch would result in an in-play outcome or swing-out. Counterfactual pitch sequences were then generated by replacing either the final pitch or the preceding setup pitch with alternative pitch types and locations while keeping the surrounding contextual information fixed. Optimal counterfactual selections were defined as those that minimized the predicted in-play probability, and their expected effects on pitchers' seasonal statistics were estimated using regression models linking model outputs to season statistics. The results suggest that the optimization of both final and setup pitches may substantially influence season-level performance, including improvements of more than 1.0 in K/9. The analyses also provided several practical insights, including velocity-band-specific effective locations, the importance of pitch command, and the expansion of pitch-selection options through middle-velocity pitches. These findings quantitatively support the strategic importance of pitch sequencing in baseball.
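The counterfactual search can be sketched as an exhaustive loop: swap one pitch (final or setup), keep everything else fixed, and keep the swap with the lowest predicted in-play probability. `toy_model` below is a hypothetical stand-in for the trained Transformer with made-up numbers, and the pitch-type and location labels are illustrative; the regression step that maps predictions to K/9 is not shown.

```python
from itertools import product

def best_counterfactual(sequence, position, predict_in_play, pitch_types, locations):
    """Replace the pitch at `position` and return (lowest P(in play), replacement pitch).

    sequence: list of (pitch_type, location) tuples for one plate appearance.
    position: -1 for the final pitch, -2 for the preceding setup pitch.
    predict_in_play: any callable mapping a full sequence to P(in play).
    """
    best = None
    for pitch_type, location in product(pitch_types, locations):
        candidate = list(sequence)  # all other pitches stay fixed
        candidate[position] = (pitch_type, location)
        p = predict_in_play(candidate)
        if best is None or p < best[0]:
            best = (p, (pitch_type, location))
    return best

def toy_model(seq):
    """Hypothetical stand-in for the trained model; all numbers are made up."""
    *setup, final = seq
    p = {"FF": 0.30, "SL": 0.22, "CH": 0.25}[final[0]]
    if final[1] == "low-away":
        p -= 0.05
    if setup and setup[-1][0] == "FF" and final[0] != "FF":
        p -= 0.03  # toy bonus for following a fastball with a non-fastball
    return p
```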

2606.17413 2026-06-17 cs.LG stat.AP 新提交

Amortized Probabilistic Retrieval of Atmospheric CO2 from OCO-2 Spectra Using Deep Learning with Laplace Approximations and Normalizing Flows

基于深度学习的OCO-2光谱大气CO2摊销概率检索:结合拉普拉斯近似与归一化流

Alejandro Calle-Saldarriaga, Felix Jimenez, Jack Grosskreuz, Jiazheng Wang, Jonathan Hobbs, Matthias Katzfuss

发表机构 * University of Wisconsin–Madison(威斯康星大学麦迪逊分校) Jet Propulsion Laboratory, California Institute of Technology(加州理工学院喷气推进实验室)

AI总结 提出深度学习框架,利用拉普拉斯近似和归一化流从OCO-2光谱中快速、准确地检索大气CO2浓度,并量化不确定性,相比传统方法加速数个数量级且精度更高。

Comments 23 pages, 8 figures

AI中文摘要

基于空间的大气二氧化碳(CO2)监测对于约束全球碳收支至关重要。NASA的轨道碳观测者-2号(OCO-2)利用高分辨率光谱估算柱平均干空气CO2摩尔分数(XCO2)。然而,当前的操作检索算法计算成本高且未能正确量化不确定性。我们提出了一种新颖的深度学习框架来解决这些挑战。由于真实卫星观测的地面真值数据难以获取,我们使用高保真模拟数据集开发并验证了我们的方法。该数据集旨在支持OCO-2不确定性量化(UQ),并包含了真实的前向模型误差。我们的架构使用多分支神经网络编码光谱波段,并通过两种可扩展的UQ方法——拉普拉斯近似和归一化流——来估计完整CO2柱或其所需汇总的后验分布。与操作性的“全物理”求解器相比,我们的方法具有五个关键优势:(1)摊销:推理速度提高数个数量级,能够实时处理海量数据流;(2)模型误差鲁棒性:通过在明确包含模型差异的模拟数据上训练,我们的方法考虑了标准反演中常被忽略的系统误差;(3)点估计精度:与基线方法相比,我们实现了更优的预测精度;(4)改进的UQ:概率输出提供了校准更好的不确定性估计;(5)非高斯后验:当使用归一化流时,我们的框架成功建模了复杂、非对称的后验分布,克服了高斯假设的局限性。这些结果表明,基于模拟的深度学习是迈向下一代操作处理系统的可行路径。

英文摘要

Space-based monitoring of atmospheric carbon dioxide (CO2) is essential for constraining the global carbon budget. NASA's Orbiting Carbon Observatory-2 (OCO-2) estimates column-averaged dry-air mole fractions of CO2 (XCO2) using high-resolution spectra. However, current operational retrieval algorithms are computationally expensive and do not properly quantify uncertainties. We present a novel deep learning framework that addresses these challenges. Because ground-truth data are difficult to obtain for real satellite observations, we develop and validate our approach using a high-fidelity simulation dataset. This dataset, created to support OCO-2 uncertainty quantification (UQ), incorporates realistic forward model errors. Our architecture encodes spectral bands using a multi-branch neural network and estimates posteriors of the full CO2 column or desired summaries thereof using two scalable UQ methods: Laplace approximations and normalizing flows. Our approach has five key advantages relative to operational "full-physics" solvers: (1) Amortization: Inference is orders of magnitude faster, enabling real-time processing of massive data streams; (2) Model error robustness: By training on simulations that explicitly include model discrepancies, our method accounts for systematic errors often neglected by standard inversions; (3) Point estimate accuracy: We achieve superior predictive accuracy compared to baseline methods; (4) Improved UQ: The probabilistic outputs yield better-calibrated uncertainty estimates; and (5) Non-Gaussian posteriors: When utilizing normalizing flows, our framework successfully models complex, asymmetric posterior distributions, overcoming the limitations of the Gaussian assumption. These results suggest that simulation-based deep learning is a viable path toward next-generation operational processing systems.
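A Laplace approximation is easiest to see on a linear output layer over fixed features, where, with a Gaussian likelihood and Gaussian prior, it is exact. The sketch assumes exactly that simplification (a "last-layer" Laplace): `Phi` stands for features from a frozen encoder, and `noise_var` and `prior_prec` are hypothetical hyperparameters. The abstract does not say which layers the authors treat probabilistically, and the normalizing-flow variant, which is what captures non-Gaussian posteriors, is not shown.

```python
import numpy as np

def last_layer_laplace(Phi, y, noise_var, prior_prec):
    """Laplace posterior over linear output weights on top of fixed features Phi.

    With a Gaussian likelihood and a Gaussian prior this posterior is exact.
    """
    d = Phi.shape[1]
    H = Phi.T @ Phi / noise_var + prior_prec * np.eye(d)  # Hessian of the negative log posterior
    Sigma = np.linalg.inv(H)                              # posterior covariance
    w_map = Sigma @ (Phi.T @ y) / noise_var               # posterior mean (MAP weights)
    return w_map, Sigma

def predictive(Phi_new, w_map, Sigma, noise_var):
    """Predictive mean and variance for new feature rows."""
    mean = Phi_new @ w_map
    # epistemic term phi^T Sigma phi for each row, plus aleatoric noise
    var = noise_var + np.einsum("ij,jk,ik->i", Phi_new, Sigma, Phi_new)
    return mean, var
```

The MAP weights coincide with ridge regression with penalty `noise_var * prior_prec`; the Laplace step adds the covariance that turns point retrievals into Gaussian predictive intervals.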

2606.17445 2026-06-17 cs.LG cond-mat.mtrl-sci physics.chem-ph 新提交

Toward Controllable Catalyst Inverse Design via Large-Scale Autoregressive Pretraining

面向可控催化剂逆向设计的大规模自回归预训练

Dong Hyeon Mok, Jonggeol Na, Seoin Back

发表机构 * Department of Chemical and Biomolecular Engineering, Institute of Emergent Materials, Sogang University(化学与生物分子工程系,新兴材料研究所,西江大学) Department of Chemical Engineering and Materials Science, Ewha Womans University(化学工程与材料科学系,梨花女子大学) Department of Chemical Engineering, Graduate Program in System Health Science and Engineering, Ewha Womans University(化学工程系,系统健康科学与工程研究生项目,梨花女子大学) Institute for Multiscale Matter and Systems (IMMS), Ewha Womans University(多尺度物质与系统研究所(IMMS),梨花女子大学) KU-KIST Graduate School of Converging Science and Technology, Korea University(KU-KIST融合科学技术研究生院,高丽大学) Department of Integrated Energy Engineering, Korea University(整合能源工程系,高丽大学) Center for Hydrogen and Fuel Cells, Korea Institute of Science and Technology(KIST)(氢能与燃料电池中心,韩国科学技术研究院(KIST))

AI总结 提出基于生成式预训练Transformer的条件催化剂生成模型,通过大规模预训练和微调实现高结构有效性和条件匹配率,显著提升筛选效率。

AI中文摘要

多相催化剂的逆向设计仍然具有挑战性,因为催化剂表面表现出显著的结构复杂性,在广阔的化学空间中存在耦合的表面-吸附物相互作用,仅通过传统筛选难以高效探索。尽管基于机器学习的高通量筛选加速了催化剂发现,但其效率随着搜索空间的增长而不可避免地下降,这促使了能够直接构建具有目标特性的催化剂的生成模型的发展。在这里,我们提出了一种基于生成式预训练Transformer架构的条件催化剂生成模型,该模型具有数值嵌入层,能够在单一自回归框架内生成以分类和连续属性为条件的催化剂结构。该模型在1.33亿个催化剂结构上进行了预训练,随后在大约46万个优化结构上进行了微调,这些结构具有相关的分类属性和结合能,用于条件生成。最终模型实现了98%的结构有效性、95%的优化有效性以及高分类条件保真度,吸附物类型和组成的联合匹配率达到93%。对于结合能条件,约20%的匹配率相比基线训练分布提高了四倍,生成的分布系统地朝向目标值偏移,使得无需额外微调即可将反应靶向催化剂发现的筛选效率提高1.5至4倍。这些结果表明,大规模自回归预训练结合显式属性条件为可控催化剂生成和加速催化剂发现提供了一条实用途径。

英文摘要

Inverse design of heterogeneous catalysts remains challenging because catalyst surfaces exhibit substantial structural complexity with coupled surface-adsorbate interactions across a vast chemical space that is difficult to explore efficiently through conventional screening alone. Although machine learning-based high-throughput screening has accelerated catalyst discovery, its efficiency inevitably declines as the search space grows, motivating the development of generative models that can directly construct catalysts with target properties. Here, we present a conditional catalyst generative model based on the Generative Pretrained Transformer architecture with a numerical embedding layer that enables the generation of catalyst structures conditioned on both categorical and continuous properties within a single autoregressive framework. The model was pretrained on 133 million catalyst structures and subsequently fine-tuned on approximately 460,000 optimized structures with associated categorical properties and binding energies for conditional generation. The resulting model achieved 98% structural validity, 95% optimization validity, and high categorical condition fidelity, with a 93% joint match rate for adsorbate type and composition. For binding energy conditioning, the match rate of approximately 20% represents a four-fold improvement over the baseline training distribution, and the generated distributions shift systematically toward the target values, enabling a 1.5 to 4-fold improvement in screening efficiency for reaction-targeted catalyst discovery without additional fine-tuning. These results show that large-scale autoregressive pretraining, combined with explicit property conditioning, provides a practical route toward controllable catalyst generation and accelerated catalyst discovery.
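The binding-energy match rate and the fold improvement over the training distribution can be computed as below. The 0.05 eV tolerance and the helper name are hypothetical, since the abstract does not state the matching window; note also that a roughly 20% match rate being a four-fold improvement implies a baseline match rate of roughly 5%.

```python
import numpy as np

def match_rate(values, target, tol):
    """Fraction of values within +/- tol of the target (tol is an assumed window)."""
    values = np.asarray(values, dtype=float)
    return float(np.mean(np.abs(values - target) <= tol))

def fold_improvement(generated, baseline, target, tol=0.05):
    """Match rate of conditioned samples relative to the unconditioned baseline."""
    return match_rate(generated, target, tol) / match_rate(baseline, target, tol)
```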

2606.17451 2026-06-17 cs.LG cs.RO 新提交

Credibility-Weighted Pricing of Autonomous Vehicle Liability Under Operational Design Domain Shift

操作设计域转移下自动驾驶汽车责任的可信度加权定价

Doyeon Jang

AI总结 针对自动驾驶系统部署中经验稀疏、ODD转移及风险非平稳问题,提出分层贝叶斯可信度框架,通过ODD相似性核进行部分池化,在Waymo数据上验证其有效性。

AI中文摘要

自动驾驶系统的部署带来了一个基础性的费率厘定挑战:经验数据稀疏、操作设计域(ODD)不断变化,以及跨软件版本的非平稳风险。我们提出了一个分层贝叶斯可信度框架,通过学习的ODD相似性核在城市、软件版本和区域之间汇集信息,并将Bühlmann-Straub模型作为极限情况嵌套其中。基于NHTSA Standing General Order数据库中美国四个大都市区的648起经核实自动驾驶系统处于启用状态的Waymo碰撞事件及1.16亿匹配里程进行的演示表明,城市聚合可信度权重适中(0.12-0.46),部分池化明显优于无池化,且功效分析显示,学习核的优势在大约十二个部署城市时变得可检测。

英文摘要

Automated Driving System deployments create a foundational ratemaking challenge: sparse experience, shifting operational design domains, and non-stationary risk across software releases. We propose a hierarchical Bayesian credibility framework pooling across cities, software versions, and territories via a learned ODD-similarity kernel, nesting Bühlmann-Straub as a limiting case. In a demonstration on 648 verified-engaged Waymo crashes across four U.S. metros from the NHTSA Standing General Order database, against 116 million matched miles, city-aggregate credibility weights are moderate (0.12-0.46), partial pooling decisively outperforms no pooling, and a power analysis shows that the learned kernel's advantage becomes detectable at approximately twelve deployed cities.
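For reference, the classical Bühlmann-Straub estimator that the framework nests as a limiting case weights each class's own experience by Z = m / (m + k), where m is exposure and k is the ratio of expected process variance to the variance of hypothetical means. This sketch takes k as given rather than estimating it and omits the learned ODD-similarity kernel; the function name, exposure units, and rates in the test are made up for illustration.

```python
import numpy as np

def buhlmann_straub(exposure, observed_rate, collective_rate, k):
    """Credibility-weighted rates.

    exposure: exposure per risk class (e.g., matched miles per city, in any fixed unit).
    observed_rate: each class's own crash rate per unit exposure.
    collective_rate: the pooled (prior) rate.
    k: credibility constant = expected process variance / variance of hypothetical means.
    """
    exposure = np.asarray(exposure, dtype=float)
    Z = exposure / (exposure + k)  # more exposure means more weight on own experience
    premium = Z * np.asarray(observed_rate, dtype=float) + (1 - Z) * collective_rate
    return Z, premium
```

Moderate weights such as the reported 0.12-0.46 mean each city's own crash experience still accounts for less than half of its credibility-weighted rate.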

2606.17462 2026-06-17 cs.LG cs.NI 新提交

ResAware: Cross-Environment Website Fingerprinting via Resource-Privileged Distillation

ResAware: 通过资源特权蒸馏实现跨环境网站指纹识别

Chongru Fan, Wei Wang, Wentao Huang, Zhenquan Ding, Jinqiao Shi, Lei Cui, Zhiyu Hao, Xiaochun Yun

发表机构 * Beijing University of Posts and Telecommunications(北京邮电大学) Zhongguancun Laboratory(中关村实验室)

AI总结 提出ResAware框架,利用资源级特征训练教师模型并通过异构知识蒸馏指导学生模型,在不增加在线开销下提升跨环境鲁棒性,在五个月大规模数据集上显著提升基线方法性能。

Comments 18 pages, 9 figures

AI中文摘要

虽然网站指纹识别(WF)攻击在受控实验室环境中实现了高精度,但在现实环境中,由于时空漂移、浏览器异构性、代理混淆等因素,其性能往往大幅下降。这一限制源于它们仅依赖低层流量特征,而这些特征噪声大且对环境扰动高度敏感。为解决此问题,我们提出ResAware,一种在“训练丰富/推理贫乏”非对称设置下的跨环境资源感知蒸馏框架。具体来说,ResAware在资源级特征上训练教师模型,然后通过异构知识蒸馏将所得特权知识蒸馏到学生模型中。部署时,学生模型仅使用加密流量进行推理,不产生额外成本。我们在一个跨越五个月、从六个全球分布的观测点收集的大规模数据集上评估ResAware,该数据集包含超过160,000个配对样本。结果表明,ResAware显著增强了多种WF基线的跨环境鲁棒性。例如,在150天的时间漂移下,ResAware将Var-CNN的F1分数从72.77%提升至81.49%,将开放世界TPR@1%FPR从22.40%提升至27.20%。我们的结果表明,资源级监督在不扩大在线观测能力的情况下提高了WF鲁棒性。

英文摘要

While Website Fingerprinting (WF) attacks achieve high accuracy in controlled laboratory settings, they often degrade substantially in real-world environments due to spatio-temporal drift, browser heterogeneity, proxy obfuscation, and other factors. This limitation stems from their sole reliance on low-level traffic features that are noisy and highly sensitive to environmental perturbations. To address this problem, we propose ResAware, a cross-environment resource-aware distillation framework under a training-rich/inference-poor asymmetric setting. Specifically, ResAware trains a teacher model on resource-level features, and then distills the resulting privileged knowledge into a student model through heterogeneous knowledge distillation. At deployment time, the student model performs inference using only encrypted traffic, incurring zero additional cost. We evaluate ResAware on a large-scale dataset collected over five months from six globally distributed vantage points, comprising more than 160,000 paired samples. The results show that ResAware significantly enhances the cross-environment robustness of diverse WF baselines. Under a 150-day temporal drift, for example, ResAware improves the F1-score of Var-CNN from 72.77% to 81.49% and the open-world TPR@1%FPR from 22.40% to 27.20%. Our results demonstrate that resource-level supervision improves WF robustness without expanding online observation capabilities.
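The teacher-student setup can be illustrated with standard Hinton-style logit distillation: the teacher is trained on resource-level features, the student on traffic features only, and the student is fit to both the labels and the teacher's softened outputs. The abstract describes the distillation only as "heterogeneous", so this loss, along with the temperature `T` and weight `alpha`, is an assumed stand-in rather than ResAware's objective. Only the student is needed at inference time.

```python
import numpy as np

def log_softmax(z, T=1.0):
    """Row-wise log-softmax of logits divided by temperature T."""
    z = np.asarray(z, dtype=float) / T
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    """alpha * CE(student, labels) + (1 - alpha) * T^2 * KL(teacher_T || student_T)."""
    log_ps = log_softmax(student_logits, T)
    log_pt = log_softmax(teacher_logits, T)
    kl = np.mean(np.sum(np.exp(log_pt) * (log_pt - log_ps), axis=-1))
    log_p = log_softmax(student_logits)    # temperature 1 for the hard-label term
    ce = -np.mean(log_p[np.arange(len(labels)), labels])
    return alpha * ce + (1 - alpha) * T ** 2 * kl
```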

2606.17476 2026-06-17 cs.LG 新提交

Multi-Adapter PPO: A Cross-Attention Enhanced Wavelength Selection Framework for LIBS Quantitative Analysis

多适配器PPO:一种用于LIBS定量分析的交叉注意力增强波长选择框架

Hao Li, Man Fung Zhuo

发表机构 * Electrical and Computer Engineering, University of Arizona, Tucson, USA(电气与计算机工程系,亚利桑那大学,美国图森)

AI总结 提出多适配器PPO框架,将波长选择转化为强化学习问题,利用交叉注意力和多适配器捕获光谱关系,在钢铁和煤炭数据集上综合评分平均提升28.4%,预测精度提升45.2%。

Comments 6 pages

AI中文摘要

激光诱导击穿光谱(LIBS)定量分析由于高维光谱数据以及预测精度与特征效率之间的基本权衡,在波长选择方面面临关键挑战。本文提出了一种新颖的多适配器PPO框架,将波长选择转化为强化学习问题,利用交叉注意力机制和多个专用适配器来捕获复杂的光谱关系。我们的方法在钢铁和煤炭数据集上的综合评分平均比传统粒子群优化(PSO)高出28.4%,预测精度高出45.2%。所提出的方法在平衡预测精度与特征效率方面表现出优越性能,在LIBS定量分析中取得了最先进的结果,同时保持了可解释性和计算效率。我们在以下网址发布了代码和数据集:https://github.com/Hflying/MAPPO

英文摘要

Laser-induced breakdown spectroscopy (LIBS) quantitative analysis faces critical challenges in wavelength selection due to high-dimensional spectral data and the fundamental trade-off between prediction accuracy and feature efficiency. This paper presents a novel Multi-Adapter PPO framework that transforms wavelength selection into a reinforcement learning problem, leveraging cross-attention mechanisms and multiple specialized adapters to capture complex spectral relationships. Our approach outperforms traditional Particle Swarm Optimization (PSO) by an average of 28.4% in comprehensive score and 45.2% in prediction accuracy across steel and coal datasets. The proposed method demonstrates superior performance in balancing prediction accuracy with feature efficiency, achieving state-of-the-art results in LIBS quantitative analysis while maintaining interpretability and computational efficiency. We released our code and dataset here: https://github.com/Hflying/MAPPO
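Framing wavelength selection as a reinforcement-learning problem means a policy proposes wavelength selections and is updated with PPO. The sketch shows only the standard PPO clipped surrogate objective; the cross-attention encoder, the multiple adapters, and the reward that trades prediction accuracy against feature count are not shown, and the function name is hypothetical.

```python
import numpy as np

def ppo_clip_objective(logp_new, logp_old, advantages, eps=0.2):
    """Standard PPO clipped surrogate objective (to be maximized)."""
    ratio = np.exp(np.asarray(logp_new, dtype=float) - np.asarray(logp_old, dtype=float))
    adv = np.asarray(advantages, dtype=float)
    unclipped = ratio * adv
    clipped = np.clip(ratio, 1 - eps, 1 + eps) * adv
    # the elementwise minimum removes any incentive to move the ratio outside [1 - eps, 1 + eps]
    return float(np.mean(np.minimum(unclipped, clipped)))
```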

2606.17553 2026-06-17 cs.LG 新提交

SpatioTemporal Causal Network Diagnostics for Geographic Tipping Point Early Warning

地理临界点早期预警的时空因果网络诊断

Zhaoyuan Yu, Zhangyong Liang

发表机构 * Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application(江苏省地理信息资源开发与应用协同创新中心) National Center for Applied Mathematics, Tianjin University(天津大学国家应用数学中心)

AI总结 提出时空因果网络诊断(ST-CND)框架,通过构建数据驱动的有向因果网络,结合局部恢复率估计和脆弱子网识别,解决地理临界点早期预警中的空间稀释、欧氏假设和相关噪声问题,在AMOC任务上AUROC达0.783。

AI中文摘要

生态系统、气候子系统或冰盖中的地理临界点对局部早期预警提出了严峻挑战。经典的空间指标如Moran's I总结了全局空间结构,但难以处理三个问题:空间稀释、欧氏假设和相关噪声。本文引入时空因果网络诊断(ST-CND),该框架通过将地理场表示为随时间演化的有向因果网络来解决这三个问题。核心工作流程如下:(1)通过转移熵推断哪些空间节点有助于预测其他节点,用数据驱动的信息流拓扑替代固定的欧氏邻域;(2)通过动态模态分解估计每个候选子网络内的局部恢复率;(3)结合三个信号——高内部波动、高内部同步和低外部耦合——识别最脆弱的子网络,从而抑制空间相关噪声引起的误报。在合成分岔和两个观测海表温度基准(即印度洋-太平洋SST和北大西洋AMOC)上验证,ST-CND提供了局部且可解释的预警。在AMOC任务上,它达到了0.783的AUROC和0.378的临界子网络IoU,优于递归网络和lambda-AR1基线。该框架为地球系统科学中的空间早期预警提供了可解释且可扩展的流程。

英文摘要

Geographic tipping points in ecosystems, climate subsystems, or ice sheets pose severe challenges for localized early warning. Classical spatial indicators such as Moran's I summarize global spatial structure, but they struggle with three issues: spatial dilution, Euclidean assumptions, and correlated noise. This paper introduces SpatioTemporal Causal Network Diagnostics (ST-CND), a framework that addresses these three issues by representing the geographic field as a time-evolving directed causal network. The core workflow is as follows: (1) infer which spatial nodes help predict other nodes via transfer entropy, replacing fixed Euclidean neighborhoods with data-driven information-flow topology; (2) estimate local recovery rates within each candidate subnetwork via dynamic mode decomposition; and (3) identify the most vulnerable subnetwork by combining three signals, namely high internal fluctuation, high internal synchronization, and low external coupling, thereby suppressing false alarms from spatially correlated noise. Validated on synthetic bifurcations and two observational sea-surface temperature benchmarks, namely Indo-Pacific SST and North Atlantic AMOC, ST-CND delivers localized and interpretable warnings. On the AMOC task, it achieves an AUROC of 0.783 and a critical-subnetwork IoU of 0.378, outperforming recurrence-network and lambda-AR1 baselines. The framework provides an interpretable and scalable pipeline for spatial early warning in Earth system science.
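Step (2) of the workflow estimates local recovery rates with dynamic mode decomposition. A minimal exact-DMD sketch, assuming the snapshots of one candidate subnetwork are stacked as columns and no rank truncation is applied (the paper's DMD variant is not stated, and the helper name is hypothetical): eigenvalues λ of the fitted linear propagator give recovery rates -ln|λ| / dt, and a rate drifting toward zero is the classic critical-slowing-down signal.

```python
import numpy as np

def dmd_recovery_rates(X, dt=1.0):
    """Exact-DMD estimate of linear recovery rates from snapshots.

    X: array of shape (n_nodes, n_times), one column per time step.
    Returns recovery rates -ln|lambda| / dt, slowest first.
    """
    X1, X2 = X[:, :-1], X[:, 1:]
    A = X2 @ np.linalg.pinv(X1)        # least-squares linear propagator x_{t+1} ~ A x_t
    lam = np.linalg.eigvals(A)
    rates = -np.log(np.abs(lam)) / dt  # |lambda| -> 1 means recovery rate -> 0
    return np.sort(rates)
```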

2606.17659 2026-06-17 cs.LG 新提交

Physics-Constrained Neural Networks for Improved Short-Term Weather Forecasting: A Case Study over the South Pacific

物理约束神经网络改进短期天气预报:南太平洋案例研究

Egor Bugaev, Fedor Buzaev, Dmitry Efremenko, Denis Derkach, Fedor Ratnikov

发表机构 * Faculty of Computer Science, Higher School of Economics(高等经济学院计算机科学系)

AI总结 提出三种改进物理约束神经网络(PCNN)的方法,包括升级数值求解器、统一自回归混合块和集成两种神经骨干,在WeatherBench南太平洋子集上相比纯神经网络模型在1-12小时预报中均方根误差降低8-22%,同时保持物理一致性。

Comments Presented at ICLR 2026 Workshop AI and PDE

AI中文摘要

本研究介绍了对物理约束神经网络(PCNN)的改进,提高了混合短期天气预报模型的准确性和稳定性。基于WeatherGFT架构,提出了三项创新。首先,升级的数值求解器结合了五阶加权本质无振荡格式(WENO-5)、beta平面近似和亚网格尺度粘度,允许积分时间步长增加四倍至1200秒,同时将日均方误差降低多达26%。其次,一个统一的自回归混合块取代了原来由24个专门模块组成的链,消除了对特定预报时效的过拟合。第三,物理核心与两个最先进的神经骨干集成,产生了PI-PredFormer和PI-IAM4VP。在2000年至2004年的WeatherBench南太平洋子集上的评估表明,这些混合模型在1-12小时预报时效内的均方根误差比纯神经模型降低了8-22%,同时更好地保持了物理一致性。这些结果表明,逐步改进混合组件为实现更准确和高效的短期天气预报提供了一条实用途径。

英文摘要

This study introduces enhancements to physics-constrained neural networks (PCNNs) that improve the accuracy and stability of hybrid short-term weather forecasting models. Building on the WeatherGFT architecture, three innovations are proposed. First, an upgraded numerical solver, combining a fifth-order weighted essentially non-oscillatory scheme (WENO-5), a beta-plane approximation, and subgrid-scale viscosity, permits a fourfold increase in the integration time step to 1200 s while reducing the daily mean squared error by up to 26%. Second, a unified autoregressive hybrid block replaces the original chain of 24 specialised modules, eliminating overfitting to specific lead times. Third, the physical core is integrated with two state-of-the-art neural backbones, resulting in PI-PredFormer and PI-IAM4VP. Evaluation on the WeatherBench South Pacific subset from 2000 to 2004 shows that these hybrids reduce root mean squared error at 1-12 h lead times by 8-22% compared to purely neural counterparts, while better preserving physical consistency. These results demonstrate that incremental refinement of hybrid components offers a practical route toward more accurate and efficient short-range weather forecasting.
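WENO-5, named above as part of the upgraded solver, reconstructs interface values from three third-order sub-stencils blended by smoothness-dependent weights. This is the textbook Jiang-Shu form on a periodic 1-D grid for the left-biased (positive-velocity) reconstruction, with eps = 1e-6 as an assumed regularizer and a hypothetical function name; how the authors couple it with the beta-plane terms and subgrid viscosity is not shown.

```python
import numpy as np

def weno5_left(v, eps=1e-6):
    """Fifth-order WENO (Jiang-Shu) reconstruction at interfaces i+1/2 from the left.

    v: cell averages on a periodic 1-D grid. Entry i of the result approximates
    the value at the interface between cells i and i+1.
    """
    vm2, vm1 = np.roll(v, 2), np.roll(v, 1)    # v[i-2], v[i-1]
    vp1, vp2 = np.roll(v, -1), np.roll(v, -2)  # v[i+1], v[i+2]
    # candidate third-order reconstructions on the three sub-stencils
    q0 = (2 * vm2 - 7 * vm1 + 11 * v) / 6
    q1 = (-vm1 + 5 * v + 2 * vp1) / 6
    q2 = (2 * v + 5 * vp1 - vp2) / 6
    # smoothness indicators for each sub-stencil
    b0 = 13 / 12 * (vm2 - 2 * vm1 + v) ** 2 + 0.25 * (vm2 - 4 * vm1 + 3 * v) ** 2
    b1 = 13 / 12 * (vm1 - 2 * v + vp1) ** 2 + 0.25 * (vm1 - vp1) ** 2
    b2 = 13 / 12 * (v - 2 * vp1 + vp2) ** 2 + 0.25 * (3 * v - 4 * vp1 + vp2) ** 2
    # nonlinear weights built from the linear weights (0.1, 0.6, 0.3)
    a0 = 0.1 / (eps + b0) ** 2
    a1 = 0.6 / (eps + b1) ** 2
    a2 = 0.3 / (eps + b2) ** 2
    return (a0 * q0 + a1 * q1 + a2 * q2) / (a0 + a1 + a2)
```

In smooth regions the weights approach the linear weights and the scheme is fifth-order accurate; near sharp gradients the weight of the non-smooth sub-stencil collapses, which suppresses spurious oscillations.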

2606.17668 2026-06-17 cs.LG cs.AI q-bio.QM 新提交

ASTEROID: A Spatiotemporal Information Transformer for Forecasting Multi-Step Time Series of Molecular Dynamics

ASTEROID: 用于分子动力学多步时间序列预测的时空信息变换器

Kexin Wu, Luonan Chen, Renxiao Wang

发表机构 * Department of Medicinal Chemistry, School of Pharmaceutical Sciences, Fudan University(药学院药物化学系,复旦大学) School of Mathematical Sciences and School of AI, Shanghai Jiao Tong University(数学科学学院和人工智能学院,上海交通大学)

AI总结 提出ASTEROID框架,通过将分子动力学轨迹重构为高维时空序列并集成时空信息变换方程到Transformer中,实现多步原子坐标的直接预测,在多个量子力学分子数据集上显著提升预测精度并降低计算成本。

Comments 32 pages, 10 figures

AI中文摘要

分子动力学(MD)模拟计算需求高,尤其对于需要长期分析的大规模系统。准确预测MD模拟结果不仅是一个有吸引力的科学挑战,而且具有重要的实用价值。在这项工作中,我们开发了一个数据驱动框架,称为ASTEROID(用于推断动力学的先进时空变换器),可以直接预测多步原子坐标,避免传统的迭代积分。为此,我们的ASTEROID将MD轨迹重构为高维时空序列,并将时空信息(STI)变换方程集成到Transformer架构中。ASTEROID的核心创新在于其建模多尺度时空依赖性的能力。具体来说,对于空间依赖性,局部-全局自注意力机制捕获短程和长程相互作用。对于时间依赖性,编码器-解码器结构将全局上下文与自回归预测相结合。ASTEROID在几个量子力学衍生的分子数据集上进行了评估。我们的结果表明,ASTEROID不仅在各种基准测试中实现了比现有方法更高的多步预测精度,而且显著降低了传统MD模拟的计算成本。此外,该模型支持在扩展时间尺度上的迭代多步预测。这项工作为加速MD模拟建立了一个稳健且可推广的数据驱动范式。

英文摘要

Molecular dynamics (MD) simulation is computationally demanding, particularly for large-scale systems requiring long-term analysis. Accurately forecasting the outcome of an MD simulation is not only an attractive scientific challenge but also has substantial practical value. In this work, we developed a data-driven framework, termed ASTEROID (Advanced Spatiotemporal TransformER fOr Inferring Dynamics), that can directly predict multi-step atomic coordinates, avoiding conventional iterative integration. For this purpose, ASTEROID reformulates MD trajectories as high-dimensional spatiotemporal sequences and integrates the Spatiotemporal Information (STI) Transformation equation into a Transformer architecture. The core innovation of ASTEROID lies in its ability to model multiscale spatiotemporal dependencies. In particular, for spatial dependencies, a local-global self-attention mechanism captures both short- and long-range interactions. For temporal dependencies, an encoder-decoder structure integrates global context with autoregressive forecasting. ASTEROID was evaluated on several quantum-mechanics-derived molecular datasets. Our results indicate that ASTEROID not only achieved higher multi-step prediction accuracy than existing methods on various benchmarks, but also significantly reduced computational cost relative to conventional MD simulation. Moreover, the model supports iterative multi-step forecasting over an extended time scale. This work establishes a robust and generalizable data-driven paradigm for accelerating MD simulations.
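Recasting an MD trajectory as a high-dimensional spatiotemporal sequence amounts, at the data level, to flattening each frame's coordinates into one vector and slicing history/future windows for direct multi-step prediction. The sketch below shows only that data layout under assumed shapes (`n_frames x n_atoms x 3`); the STI transformation equation and the local-global attention are not reproduced, and `make_windows` is a hypothetical helper.

```python
import numpy as np

def make_windows(traj, n_in, n_out):
    """Turn an MD trajectory into (history, future) pairs for multi-step forecasting.

    traj: array of shape (n_frames, n_atoms, 3) with atomic coordinates.
    Returns X of shape (n_windows, n_in, 3 * n_atoms) and
    Y of shape (n_windows, n_out, 3 * n_atoms).
    """
    flat = traj.reshape(traj.shape[0], -1)  # each frame becomes one high-dimensional vector
    n_windows = flat.shape[0] - n_in - n_out + 1
    X = np.stack([flat[s : s + n_in] for s in range(n_windows)])
    Y = np.stack([flat[s + n_in : s + n_in + n_out] for s in range(n_windows)])
    return X, Y
```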

2606.17692 2026-06-17 cs.LG 新提交

Delta-Based Target Reformulation for Short-Term Electricity Load Forecasting Using LSTM and Transformer Models

基于Delta目标重构的LSTM与Transformer短期电力负荷预测

Vansh Bansal

AI总结 针对电力负荷非平稳性,提出Delta目标重构方法,让LSTM和Transformer预测负荷变化量而非绝对值,在小时级预测中MAE和MAPE降低超50%。

Comments 8 pages, 3 tables

AI中文摘要

准确的短期电力负荷预测对于现代电力系统的可靠和经济运行至关重要,尤其是在天气变化、日历效应和消费模式演变导致的非平稳性下。尽管LSTM和Transformer等深度学习模型表现出色,但大多数现有研究侧重于直接预测绝对负荷,而未明确解决目标非平稳性。受ARIMA模型中经典时间序列差分技术的启发,本文研究了一种基于Delta的目标重构方法,用于深度学习的短期电力负荷预测。该方法不直接预测绝对负荷值,而是训练模型预测连续时间步之间的负荷变化,最终预测通过最后一次观测负荷重建。这旨在稳定学习目标并降低预测难度。利用印度多年逐小时真实电力负荷数据,辅以NASA POWER项目的气象变量和日历特征,本研究评估了LSTM和Transformer在两种公式下的表现,并以LightGBM作为基准。实验针对小时前和日前预测范围进行,通过平均绝对误差(MAE)和平均绝对百分比误差(MAPE)评估性能。结果表明,Delta重构在所有评估模型的小时前预测中持续提高预测精度,与绝对公式相比,MAPE降低超过50%。对于日前预测,Delta目标特别有利于深度序列模型(LSTM和Transformer),而LightGBM在绝对公式下仍具有竞争力。这些发现表明,Delta重构是神经网络的一种强大归纳偏置,但其效果依赖于模型和预测范围。

英文摘要

Accurate short-term electricity load forecasting is critical for the reliable and economic operation of modern power systems, particularly under non-stationarity arising from weather variability, calendar effects, and evolving consumption patterns. While deep learning models such as LSTMs and Transformers show promising performance, most existing studies focus on direct absolute load prediction without explicitly addressing target non-stationarity. Motivated by classical time-series differencing in ARIMA models, this paper investigates a delta-based target reformulation for short-term electricity load forecasting using deep learning. Instead of directly predicting absolute load values, the proposed formulation trains models to predict the change in load between consecutive time steps, with final forecasts reconstructed using the last observed load. This aims to stabilize the learning target and reduce forecasting difficulty. Using multi-year, hourly real-world electricity load data from India, augmented with meteorological variables from the NASA POWER project and calendar features, this study evaluates LSTM and Transformer models under both formulations, benchmarking them against LightGBM. Experiments are conducted for hour-ahead and day-ahead horizons, assessing performance via Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). Results show that delta-based reformulation consistently improves forecasting accuracy for hour-ahead prediction across all evaluated models, yielding MAPE reductions of over 50% compared to absolute formulations. For day-ahead forecasting, delta targets specifically benefit deep sequence models (LSTM and Transformer), while LightGBM remains competitive under the absolute formulation. These findings indicate that while delta reformulation is a powerful inductive bias for neural networks, its efficacy is model- and horizon-dependent.
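
The delta reformulation above amounts to first differencing followed by cumulative reconstruction. The minimal Python sketch below shows only that target transform. The forecasting model itself (LSTM, Transformer, or LightGBM in the paper) is omitted. The function names `to_deltas` and `reconstruct` and the toy load values are hypothetical.

```python
from typing import List, Sequence


def to_deltas(load: Sequence[float]) -> List[float]:
    """Training targets: delta[t] = load[t] - load[t-1] (one element shorter than load)."""
    return [load[t] - load[t - 1] for t in range(1, len(load))]


def reconstruct(last_observed: float, predicted_deltas: Sequence[float]) -> List[float]:
    """Turn predicted deltas back into absolute loads by cumulative summation,
    starting from the last observed load."""
    forecasts: List[float] = []
    level = last_observed
    for delta in predicted_deltas:
        level += delta
        forecasts.append(level)
    return forecasts


history = [100.0, 104.0, 101.0, 107.0]        # toy hourly loads, not data from the paper
print(to_deltas(history))                      # [4.0, -3.0, 6.0]
print(reconstruct(history[-1], [2.0, -1.0]))   # [109.0, 108.0]
```

For a multi-step horizon, reconstruction is a running sum. An error in an early predicted delta therefore propagates to every later step.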

2606.17805 2026-06-17 cs.LG 新提交

QueryMarket: Cost-Aware Online Active Learning in Data Markets

QueryMarket: 数据市场中成本感知的在线主动学习

Xiwen Huang, Pierre Pinson

发表机构 * Dyson School of Design Engineering, Imperial College London(帝国理工学院戴森设计工程学院) Halfspace (part of Accenture)(埃森哲旗下Halfspace) Technical University of Denmark (DTU Management)(丹麦技术大学(DTU管理系)) Aarhus University (CoRE)(奥胡斯大学(CoRE))

AI总结 提出QueryMarket框架和OVBAL算法,通过D-最优性准则估计边际效用,在滚动预算约束下实现成本感知的在线主动学习,适应非平稳流和异构标签成本。

Comments 10 pages, 8 figures. Submitted to IEEE Transactions on Neural Networks and Learning Systems

AI中文摘要

数据采集是实时流学习中一个主要瓶颈:分析师必须在滚动预算约束下即时决定购买哪些标签。然而,现有的在线主动学习很少在概念漂移下统一考虑定价、信息增益和滚动预算约束。我们引入了QueryMarket,一个受市场启发的框架,它根据每个传入数据点对模型的估计效用及其价格进行查询。在该框架内,我们提出了OVBAL(基于方差的在线主动学习),它通过使用带有指数遗忘的D-最优性准则估计每个样本的边际效用,并在滚动预算约束下执行成本感知的购买,将数据定价与信息驱动的选择相结合。OVBAL产生了一个简单的、完全在线的决策规则,能够适应非平稳流和异构标签成本。在合成数据和真实世界太阳能发电预测任务上的实验表明,OVBAL在卖方中心定价下特别有效,并且在两种定价方案下,在真实世界任务中实现了更有利的长期误差-成本权衡。

英文摘要

Data acquisition is a major bottleneck for learning in real-time streams: analysts must decide on the fly which labels to purchase while respecting a rolling budget. However, existing online active learning rarely unifies pricing, information gain, and rolling budget constraints under concept drift. We introduce QueryMarket, a market-inspired framework that queries each incoming data point based on its estimated utility to the model and its price. Within this framework, we propose OVBAL (online variance-based active learning), which integrates data pricing with information-driven selection by estimating each sample's marginal utility via a D-optimality criterion with exponential forgetting and executing cost-aware purchases under rolling budget constraints. OVBAL yields a simple, fully online decision rule that adapts to nonstationary streams and heterogeneous label costs. Experiments on synthetic data and a real-world solar power generation forecasting task show that OVBAL is particularly effective under seller-centric pricing and yields a more favorable long-run error-cost trade-off in the real-world task under both pricing schemes.
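
A D-optimality criterion scores a candidate point x by how much it increases the log-determinant of the information matrix A. By the matrix determinant lemma, this gain equals log(1 + xᵀA⁻¹x).

The sketch below combines that score with exponential forgetting and a per-step budget allowance. It is an illustrative reading of the abstract, not the authors' OVBAL implementation. The following are all assumptions:
- the class `DOptimalBuyer`;
- the forgetting factor `lam`;
- the uncapped `budget_rate` replenishment;
- the utility-per-price `threshold` rule.

```python
import math
from typing import List

Matrix = List[List[float]]


def mat_vec(P: Matrix, x: List[float]) -> List[float]:
    return [sum(p * xi for p, xi in zip(row, x)) for row in P]


class DOptimalBuyer:
    """Hypothetical cost-aware purchase rule (illustration only, not OVBAL itself).

    Tracks P = A^{-1}, where A_t = lam * A_{t-1} + x x^T and the outer product
    is added only for labels that are actually bought.
    """

    def __init__(self, dim: int, lam: float = 0.99, prior: float = 1.0,
                 budget_rate: float = 1.0, threshold: float = 0.1):
        # A_0 = prior * I, so P_0 = I / prior
        self.P = [[1.0 / prior if i == j else 0.0 for j in range(dim)] for i in range(dim)]
        self.lam = lam
        self.budget = 0.0
        self.budget_rate = budget_rate
        self.threshold = threshold

    def utility(self, x: List[float]) -> float:
        # Matrix determinant lemma: log det(A + x x^T) - log det(A) = log(1 + x^T A^{-1} x)
        Px = mat_vec(self.P, x)
        return math.log1p(sum(xi * v for xi, v in zip(x, Px)))

    def step(self, x: List[float], price: float) -> bool:
        """Process one stream point; return True if its label is purchased."""
        if price <= 0:
            raise ValueError("price must be positive")
        # Exponential forgetting: A <- lam * A implies P <- P / lam
        self.P = [[p / self.lam for p in row] for row in self.P]
        self.budget += self.budget_rate  # rolling allowance accrues every step (no cap here)
        if price > self.budget or self.utility(x) / price < self.threshold:
            return False
        # Sherman-Morrison rank-one update for A <- A + x x^T (P stays symmetric)
        Px = mat_vec(self.P, x)
        denom = 1.0 + sum(xi * v for xi, v in zip(x, Px))
        n = len(x)
        self.P = [[self.P[i][j] - Px[i] * Px[j] / denom for j in range(n)] for i in range(n)]
        self.budget -= price
        return True
```

Maintaining the inverse directly keeps each decision at O(d²) for d features, which suits a fully online rule.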

2606.17931 2026-06-17 cs.LG 新提交

Predictive Analytics in E-Commerce for Customer Behavior Forecasting using hybrid Ret-DNN with XGBoost Model

电子商务中基于混合Ret-DNN与XGBoost模型的客户行为预测分析

Degala Pushpa Sri, Mayank Atreya, Lakshmi. H, Navin Chhibber, Mukesh Soni

发表机构 * Chewy Inc(Chewy公司) Pace Institute of Technology and Atlanta, USA(佩斯理工学院和亚特兰大美国) Nitte Meenakshi Institute of Sciences(尼特梅恩克希科学学院) Lovely Professional University(洛丽专业大学) Infinity Tech Group(无限科技集团) University(大学)

AI总结 提出混合Ret-DNN与XGBoost模型,通过特征提取和梯度提升预测客户购买概率,在UK零售数据集上MAE达0.2193。

Comments 2025 2nd International Conference on Software, Systems and Information Technology (SSITCON)

AI中文摘要

近年来,电子商务服务在人们的日常生活中迅速增长,帮助他们在线购买产品。然而,零售平台难以理解客户行为,并难以预测其未来购买。为克服这些挑战,本研究提出一种混合零售深度神经网络(Ret-DNN)与极端梯度提升(XGBoost)模型,用于捕捉零售数据的时间特征和表格动态。首先,数据来自一家英国在线零售商,包含近50万条交易记录。然后,使用一系列技术对收集的数据进行预处理,如数据清洗、异常值处理、时间特征提取、特征编码和z-score归一化,以确保数据准备好进行模型训练和测试。随后,预处理后的数据被输入到Ret-DNN模型中,该模型作为特征提取器,理解客户交易的完整上下文。进一步,提取的数据作为输入输入到XGBoost模型,该模型预测最终输出为客户购买概率。最后,提出的Ret-DNN XGBoost模型取得了更好的结果,平均绝对误差(MAE)为0.2193,优于现有的Ret-DNN模型。关键词:客户行为预测,极端梯度提升,电子商务,预测分析,零售深度神经网络。

英文摘要

In recent years, electronic commerce (e-commerce) services have grown rapidly in people's daily lives, helping them purchase products online. However, retail platforms struggle to understand customer behavior, which makes it difficult to predict future purchases. To overcome these challenges, this study proposes a hybrid Retail Deep Neural Network (Ret-DNN) with an Extreme Gradient Boosting (XGBoost) model for capturing the temporal features and tabular dynamics of retail data. First, data were sourced from a United Kingdom (UK)-based online retailer, comprising almost 500,000 transaction records. Then, the collected data were preprocessed using a series of techniques, such as data cleaning, outlier handling, temporal feature extraction, feature encoding, and z-score normalization, to ensure that the data were ready for model training and testing. Subsequently, the preprocessed data were fed into the Ret-DNN model, which acts as a feature extractor capturing the complete context of customer transactions. The extracted features were then fed into the XGBoost model, which predicts the final output as the purchase probability of customers. Finally, the proposed Ret-DNN XGBoost model achieved a Mean Absolute Error (MAE) of 0.2193, outperforming the existing Ret-DNN model. Keywords: Customer behavior forecasting, extreme gradient boosting, electronic commerce, predictive analytics, retail deep neural networks.
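
Two steps named in this abstract are simple enough to show directly: z-score normalization and the MAE used for evaluation. The sketch below computes normalization statistics on the training split and reuses them on unseen data. This standard precaution against leakage is an assumption here; the abstract does not say how the authors split their data. The Ret-DNN and XGBoost stages are not reproduced, and all function names are hypothetical.

```python
import statistics
from typing import List, Sequence, Tuple


def fit_zscore(train: Sequence[float]) -> Tuple[float, float]:
    """Estimate mean and population standard deviation on the training split only."""
    return statistics.fmean(train), statistics.pstdev(train)


def apply_zscore(values: Sequence[float], mean: float, std: float) -> List[float]:
    if std == 0:
        return [0.0 for _ in values]  # a constant feature carries no signal
    return [(v - mean) / std for v in values]


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


mean, std = fit_zscore([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(mean, std)                               # 5.0 2.0
print(apply_zscore([3.0, 11.0], mean, std))    # [-1.0, 3.0]
```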

2606.17996 2026-06-17 cs.LG cs.AI 新提交

Multiple cyclicity and Wavelet Decomposition with Channel Correlation for Long-term Time Series Forecasting

多重周期性与通道相关的小波分解在长期时间序列预测中的应用

Bin Wang, Heming Yang, Jinfang Sheng

发表机构 * School of Computer Science and Engineering, Central South University(中南大学计算机科学与工程学院)

AI总结 提出McWC模型,通过多层周期性构建、多层感知机提取通道相关性、多级小波分解融合高低频信息,并在频域解耦通道内自相关,实现高效准确的长期预测。

AI中文摘要

周期性和趋势是时间序列数据的重要组成部分,许多基于周期性和趋势的研究在长期时间序列预测中取得了良好效果。然而,我们认为当前工作忽略了时间序列数据中真实世界通道间相关性的影响,导致预测次优。此外,这些模型依赖复杂设计来捕获多样信息,导致计算效率低下。为解决这一挑战,我们提出McWC,一种长期时间序列预测模型,分别对周期性、趋势和通道间相关性进行建模。具体来说,McWC首先使用多层周期性构建模块从数据中解耦周期性信息。然后,使用多层感知机提取通道间相关性。接着,使用多级小波分解模块对数据中的多层高频和低频信息进行建模和融合。最后,聚合不同组件的结果以获得输出。同时,我们通过在频域计算损失函数来解耦通道内自相关。在六个真实世界数据集上的实验表明,McWC实现了最先进的性能,展现出卓越的计算效率和历史信息提取能力。

英文摘要

Cyclicity and trend are important components of time series data, and many studies based on cyclicity and trend have achieved good results in long-term time series forecasting. However, we believe that current work neglects the influence of real-world inter-channel correlations in time series data, which leads to suboptimal predictions. Furthermore, these models rely on complex designs to capture diverse information, resulting in low computational efficiency. To address these challenges, we propose McWC, a long-term time series forecasting model that separately models the cyclicity, trend, and inter-channel correlations. Specifically, McWC first decouples cyclical information from data using a multi-layer cyclicity construction module. Then, it extracts inter-channel correlations using a multi-layer perceptron. Next, it models and fuses the multi-layer high-frequency and low-frequency information from data using a multi-level wavelet decomposition module. Finally, it aggregates the results of the different components to obtain the output. In addition, we decouple intra-channel autocorrelations by computing a loss function in the frequency domain. Experiments on six real-world datasets demonstrate that McWC achieves state-of-the-art performance, exhibiting excellent computational efficiency and historical information extraction capabilities.
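
A multi-level wavelet decomposition splits a series into one coarse low-frequency approximation and a stack of high-frequency detail bands, one per level. The sketch below uses the orthonormal Haar wavelet, the simplest choice. This is an assumption: the abstract does not name the wavelet, and this is not McWC's module. The decomposition is exactly invertible, so the bands can be modeled separately and then fused back.

```python
import math
from typing import List, Sequence, Tuple

SQRT2 = math.sqrt(2.0)


def haar_step(x: Sequence[float]) -> Tuple[List[float], List[float]]:
    """One orthonormal Haar level: scaled pairwise sums (low) and differences (high)."""
    approx = [(x[i] + x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    detail = [(x[i] - x[i + 1]) / SQRT2 for i in range(0, len(x), 2)]
    return approx, detail


def haar_decompose(x: Sequence[float], levels: int) -> Tuple[List[float], List[List[float]]]:
    """Multi-level decomposition; details[0] is the finest (highest-frequency) band."""
    if levels < 1 or len(x) % (2 ** levels) != 0:
        raise ValueError("length must be divisible by 2**levels")
    approx: List[float] = list(x)
    details: List[List[float]] = []
    for _ in range(levels):
        approx, detail = haar_step(approx)
        details.append(detail)
    return approx, details


def haar_reconstruct(approx: Sequence[float], details: Sequence[Sequence[float]]) -> List[float]:
    """Invert haar_decompose, starting from the coarsest level."""
    x = list(approx)
    for detail in reversed(details):
        nxt: List[float] = []
        for a, d in zip(x, detail):
            nxt.extend([(a + d) / SQRT2, (a - d) / SQRT2])
        x = nxt
    return x


series = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]
low, highs = haar_decompose(series, levels=2)   # 2 approximation + 4 + 2 detail coefficients
```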

2606.18049 2026-06-17 cs.LG 新提交

ConTex: Reformulating Counterfactual Generation For Time Series Forecasting

ConTex:重新定义时间序列预测的反事实生成

Jan Voets, Hasan Tercan, Tobias Meisen, Sebastian Baum

发表机构 * Institute for Technologies and Management of Digital Transformation, University of Wuppertal(伍珀塔尔大学数字转型技术与管理研究所)

AI总结 针对时间序列预测中反事实解释的不一致和高计算成本问题,提出ConTex模型,通过全局一致的干预策略实现单次前向传播生成稀疏反事实,显著降低计算成本并支持实时应用。

Comments 19 pages, 5 figures, 14 tables

AI中文摘要

基于深度学习的时间序列预测的决策制定不仅需要准确的预测,还需要可操作的见解。然而,当前的架构本身并不提供此类信息。具体来说,需要指导如何修改当前条件,以便从预测结果转向期望的未来情景。反事实解释为此任务提供了自然框架,因为它们表示改变模型预测的最小输入变化,指示何时以及如何进行干预。现有方法依赖于实例级优化,导致跨实例的不一致性、高计算成本以及在实时环境中的有限适用性。为了解决这些限制,我们将时间序列预测的反事实生成重新定义为学习全局一致的干预策略的问题,允许通过单个共享函数生成反事实。我们提出了反事实时间序列解释(ConTex),一种模型无关的解耦架构,包括时间上下文编码器和条件编码器,后接两个头部,分别捕获时间相关性和修改强度方面的干预。这种结构通过单次前向传播在时间和特征维度上产生有针对性的、可解释的干预,克服了基于实例的方法的不稳定性和不一致性,使其适用于实时应用。在多个预测架构和基准数据集上,ConTex在生成稀疏反事实的同时实现了最先进的有效性,最小化了必要干预的数量。此外,与实例级生成相比,我们的方法将计算成本降低了至少12-36倍,并支持约0.007秒的实时推理。

英文摘要

Decision-making with deep learning-based time series forecasting requires not only accurate predictions but also actionable insights. However, current architectures do not inherently provide such information. Specifically, guidance is needed on how current conditions must be modified to shift from a predicted outcome to a desired future scenario. Counterfactual explanations provide a natural framework for this task, as they represent minimal input changes that alter the model's prediction, indicating when and how intervention is required. Existing approaches rely on instance-wise optimization, leading to inconsistency across instances, high computational costs, and limited applicability in real-time settings. To address these limitations, we reformulate counterfactual generation for time series forecasting as the problem of learning a globally consistent intervention strategy, allowing counterfactuals to be generated through a single shared function. We propose Counterfactual Time Series Explanations (ConTex), a model-agnostic, decomposed architecture comprising a temporal context encoder and a conditional encoder, followed by two heads that capture interventions in terms of temporal relevance and modification strength. This structure overcomes the instability and inconsistency of instance-based approaches by producing targeted, interpretable interventions across time and feature dimensions in a single forward pass, making it suitable for real-time applications. Across multiple forecasting architectures and benchmark datasets, ConTex achieves state-of-the-art validity while generating sparse counterfactuals that minimize the number of necessary interventions. Additionally, our approach reduces computational cost by at least 12-36x compared to instance-wise generation and supports real-time inference at approximately 0.007 seconds.
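
ConTex's two heads express an intervention as *where/when* (temporal relevance) and *how much* (modification strength), produced in a single forward pass. The sketch below shows one hypothetical way two such outputs could be combined into a sparse counterfactual. Cells whose relevance passes a threshold receive an additive change, and all other cells are left untouched. The threshold rule, the additive form, and the function name are assumptions; the encoders are omitted, and the head outputs are passed in as given arrays.

```python
from typing import List, Tuple

Grid = List[List[float]]  # indexed as [time][feature]


def apply_intervention(x: Grid, relevance: Grid, strength: Grid,
                       threshold: float = 0.5) -> Tuple[Grid, int]:
    """Edit only the cells whose relevance score passes the threshold; the strength
    output supplies the additive change for those cells. Returns (counterfactual, #edits)."""
    counterfactual: Grid = []
    n_edits = 0
    for x_row, r_row, s_row in zip(x, relevance, strength):
        new_row: List[float] = []
        for value, rel, delta in zip(x_row, r_row, s_row):
            if rel >= threshold:
                new_row.append(value + delta)
                n_edits += 1
            else:
                new_row.append(value)
        counterfactual.append(new_row)
    return counterfactual, n_edits
```

Counting edited cells gives a direct sparsity measure, which corresponds to the "number of necessary interventions" the abstract aims to minimize.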

2606.18122 2026-06-17 cs.LG cs.AI cs.AR eess.AS eess.SP 新提交

Embedded Machine Learning for Microcontroller-Class Edge Devices: Data, Feature, Evaluation, and Deployment Pipelines

面向微控制器级边缘设备的嵌入式机器学习:数据、特征、评估与部署流程

Mostafa Darvishi

发表机构 * IEEE

AI总结 本文系统介绍面向微控制器平台的嵌入式机器学习工作流,重点涵盖采样缓冲、特征提取、不平衡验证、模型/运行时协同设计及流式部署等工程决策,并以惯性运动识别和关键词检测为例给出实用设计规则。

Comments 6 pages, 3 figures, 4 tables

AI中文摘要

嵌入式机器学习将推理从云服务转移到资源受限的设备上,这些设备必须在内存、能量和延迟的严格限制下采集数据、预处理信号、运行模型并采取行动。本文针对微控制器级平台,提出了一种面向系统的嵌入式机器学习工作流综合方案。重点放在通用机器学习介绍中常被隐藏的工程决策上:采样和缓冲、作为降维的特征提取、类别不平衡下的验证、模型/运行时协同设计以及流式部署。全文使用两个代表性信号系列:第一个是惯性运动识别,其中将两秒的三轴加速度计窗口从原始样本转换为均方根和频谱特征后再进行分类;第二个是关键词检测,其中对音频进行采样、抗混叠、转换为梅尔频率倒谱系数,并由紧凑的一维卷积网络处理。本文最后给出了鲁棒设备上推理的实用设计规则,包括数据整理、量化、阈值设定、调度和现场监控。

英文摘要

Embedded machine learning moves inference from cloud services to resource-constrained devices that must acquire data, preprocess signals, run a model, and act within tight limits on memory, energy, and latency. This paper presents a systems-oriented synthesis of an embedded machine-learning workflow for microcontroller-class platforms. The emphasis is placed on engineering decisions that are often hidden in generic machine-learning introductions: sampling and buffering, feature extraction as dimensionality reduction, validation under class imbalance, model/runtime co-design, and streaming deployment. Two representative signal families are used throughout the paper. The first is inertial motion recognition, where a two-second, three-axis accelerometer window is transformed from raw samples into root-mean-square and spectral features before classification. The second is keyword spotting, where audio is sampled, anti-aliased, transformed into mel-frequency cepstral coefficients, and processed by a compact one-dimensional convolutional network. The paper concludes with practical design rules for robust on-device inference, including data curation, quantization, thresholding, scheduling, and field monitoring.
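
The inertial example treats feature extraction as dimensionality reduction: a two-second, three-axis window of raw samples is reduced to a handful of RMS and spectral numbers. The sketch below assumes a 50 Hz sampling rate and uses the dominant frequency as the spectral feature; both are illustrative choices not stated in the abstract. It uses a naive DFT for clarity, whereas a microcontroller build would typically use an FFT.

```python
import cmath
import math
from typing import Dict, List, Sequence


def rms(samples: Sequence[float]) -> float:
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def dft_magnitudes(samples: Sequence[float]) -> List[float]:
    """One-sided magnitude spectrum of the mean-removed window (naive O(n^2) DFT)."""
    n = len(samples)
    mean = sum(samples) / n
    centered = [s - mean for s in samples]
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(c * cmath.exp(-2j * math.pi * k * t / n) for t, c in enumerate(centered))
        mags.append(abs(acc) / n)
    return mags


def window_features(window: Dict[str, Sequence[float]], fs_hz: float) -> Dict[str, float]:
    """Reduce each axis of the window to two numbers: RMS and dominant frequency."""
    feats: Dict[str, float] = {}
    for axis, samples in window.items():
        feats[f"{axis}_rms"] = rms(samples)
        mags = dft_magnitudes(samples)
        peak_bin = max(range(1, len(mags)), key=lambda k: mags[k])  # skip the DC bin
        feats[f"{axis}_peak_hz"] = peak_bin * fs_hz / len(samples)
    return feats


FS = 50.0              # assumed sampling rate in Hz
N = int(2 * FS)        # two-second window -> 100 samples per axis
window = {
    "x": [math.sin(2 * math.pi * 2.0 * t / FS) for t in range(N)],  # 2 Hz motion
    "y": [0.0] * N,
    "z": [9.81] * N,   # gravity only
}
features = window_features(window, FS)
```

Here 300 raw samples become 6 features. This is the kind of reduction that lets a small classifier fit in microcontroller memory.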

2606.17065 2026-06-17 q-fin.CP cs.AI cs.LG 交叉投稿

PIVOT: Bridging Black-Scholes Implied-Volatility and Price Objectives via Differentiable Jäckel Operator

PIVOT: 通过可微分的Jäckel算子桥接Black-Scholes隐含波动率与价格目标

Raeid Saqur, Yannick Limmer, Anastasis Kratsios, Blanka Horvath, Hans Buehler

发表机构 * Mathematical Institute, University of Oxford(牛津大学数学研究所) McMaster University(麦克马斯特大学) Vector Institute for AI(人工智能矢量研究所) DRW

AI总结 提出PIVOT层,通过隐式微分保留Jäckel求解器的前向精度,并利用门控机制处理低vega区域的奇异性,实现价格与隐含波动率空间的高效可微转换。

Comments 30 pages, 17 figures, 12 tables

AI中文摘要

现代期权学习系统在两种坐标系下运行:价格空间(市场报价且无套利约束最自然执行)和隐含波动率(IV)空间(波动率曲面被平滑、正则化和评估)。瓶颈在于接口而非近似:Jäckel开创性的“Let's Be Rational”(LBR)求解器已经高效地将Black-Scholes价格反转到机器精度。所缺少的是一个可微分层,它在正向传播中保留LBR,并避免通过其分支逻辑进行反向传播。这样的层还必须面对低vega区域中逆映射不可避免的奇异性,其中灵敏度1/vega在vega→0时发散。我们通过PIVOT(价格-隐含波动率目标转换器)填补了这一空白。PIVOT保持LBR正向传播不变,并通过隐式微分通过平滑的Black-Scholes/Black-76价格映射提供反向传播,并带有显式门控合约:无效域返回NaN,良态行接收精确的1/vega梯度,低vega行被衰减而非静默正则化。在单个H100上,融合的Triton内核在机器精度下达到1.79e9 IV/s(与参考C求解器的最大相对误差为9.3e-14);端到端标签生成在合成链上维持48.9M/s,在SPX OptionMetrics上维持16.6M/s。在SPX上的HyperIV风格单日复现中,PIVOT增强目标帕累托主导基线,将保留价格MAE降低高达43.4%,最强的三种子门控目标联合改善价格MAE 38.8%和IV MAE 21.3%;在RUT、VIX和NDX上的跨资产结果显示方向性价格MAE增益分别为40.1%、24.2%和16.7%,而无门控的IV往返控制崩溃为退化的近零曲面,确认门控是正确性合约而非调节旋钮。

英文摘要

Modern option-learning systems operate in two coordinates: price space, where markets quote and no-arbitrage constraints are most naturally enforced, and implied volatility (IV) space, where volatility surfaces are smoothed, regularized, and evaluated. The bottleneck is the interface, not approximation: Jäckel's seminal "Let's Be Rational" (LBR) solver already inverts the Black-Scholes price to machine precision efficiently. What is missing is a differentiable layer that preserves LBR in the forward pass and avoids backpropagating through its branch logic. Such a layer must also confront the unavoidable singularity of the inverse map in the low-vega regime, where the sensitivity 1/vega diverges as vega -> 0. We close this gap with PIVOT, the Price-Implied-Volatility Objective Translator. PIVOT keeps the LBR forward pass intact and supplies the backward pass by implicit differentiation through the smooth Black-Scholes/Black-76 price map, with an explicit gating contract: invalid domains return NaN, well-conditioned rows receive the exact 1/vega gradient, and low-vega rows are attenuated rather than silently regularized. On a single H100, a fused Triton kernel reaches 1.79e9 IV/s at machine precision (9.3e-14 max relative error vs. the reference C solver); end-to-end label generation sustains 48.9M/s on synthetic chains and 16.6M/s on SPX OptionMetrics. In a HyperIV-style one-day reproduction on SPX, PIVOT-augmented objectives Pareto-dominate the baselines, reducing held-out price MAE by up to 43.4%, with the strongest three-seed gated objective jointly improving price MAE by 38.8% and IV MAE by 21.3%; cross-asset results on RUT, VIX, and NDX show directional price-MAE gains of 40.1%, 24.2%, and 16.7%, while an ungated IV-roundtrip control collapses to a degenerate near-zero surface, confirming the gate as a correctness contract rather than a tuning knob.
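
The backward pass rests on one line of implicit differentiation. If C(σ(p)) = p, then vega · dσ/dp = 1, so dσ/dp = 1/vega.

The sketch below mirrors the three-way gating contract described above:
- invalid inputs give NaN;
- well-conditioned rows get the exact 1/vega;
- low-vega rows are attenuated.

It is not PIVOT. Bisection stands in for Jäckel's LBR solver, and the vega floor of 1e-4 and the quadratic attenuation shape are hypothetical. No autodiff framework is used: the backward value is returned explicitly alongside σ.

```python
import math
from typing import Tuple


def norm_cdf(x: float) -> float:
    return 0.5 * math.erfc(-x / math.sqrt(2.0))  # erfc keeps accuracy in the lower tail


def norm_pdf(x: float) -> float:
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)


def _d1(S: float, K: float, T: float, r: float, sigma: float) -> float:
    return (math.log(S / K) + (r + 0.5 * sigma * sigma) * T) / (sigma * math.sqrt(T))


def bs_call(S: float, K: float, T: float, r: float, sigma: float) -> float:
    d1 = _d1(S, K, T, r, sigma)
    d2 = d1 - sigma * math.sqrt(T)
    return S * norm_cdf(d1) - K * math.exp(-r * T) * norm_cdf(d2)


def bs_vega(S: float, K: float, T: float, r: float, sigma: float) -> float:
    return S * norm_pdf(_d1(S, K, T, r, sigma)) * math.sqrt(T)


def implied_vol(price: float, S: float, K: float, T: float, r: float,
                lo: float = 1e-6, hi: float = 5.0, iters: int = 100) -> float:
    """Forward-pass stand-in: bisection on sigma (NOT Jäckel's LBR algorithm)."""
    lower, upper = max(S - K * math.exp(-r * T), 0.0), S  # no-arbitrage bounds for a call
    if not lower < price < upper:
        return math.nan  # invalid domain
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if bs_call(S, K, T, r, mid) < price:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def gated_inverse_grad(vega: float, vega_floor: float = 1e-4) -> Tuple[float, str]:
    """d sigma / d price = 1 / vega, attenuated when vega falls below the floor."""
    if vega >= vega_floor:
        return 1.0 / vega, "ok"
    # Hypothetical gate: continuous at vega == vega_floor and -> 0 as vega -> 0
    return vega / vega_floor ** 2, "attenuated"


def iv_with_grad(price: float, S: float, K: float, T: float, r: float) -> Tuple[float, float, str]:
    sigma = implied_vol(price, S, K, T, r)
    if math.isnan(sigma):
        return math.nan, math.nan, "invalid"
    grad, status = gated_inverse_grad(bs_vega(S, K, T, r, sigma))
    return sigma, grad, status
```

Returning a status string alongside the gradient makes the gating decision auditable per row, instead of hiding it inside a regularizer.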

2606.17070 2026-06-17 physics.ao-ph cs.AI cs.LG 交叉投稿

KFTD: Koopman-Fourier Time-Differentiable Network for Continuous Ocean Spatiotemporal Forecasting

KFTD: 用于连续海洋时空预测的Koopman-Fourier时间可微网络

Qinghui Chen, Zekai Zhang, Hailong Liu, Jinglin Zhang, Cong Bai

发表机构 * Shandong University(山东大学) Laoshan Laboratory(崂山实验室) Chinese Academy of Sciences(中国科学院) Zhejiang University of Technology(浙江工业大学)

AI总结 提出KFTD网络,通过Koopman线性空间和傅里叶分析实现连续时间插值,结合轻量残差网络进行预测,在四个海洋数据集上均方误差平均降低5.6%,效率提升76.25%。

AI中文摘要

准确的海洋预测对于气候监测和灾害预警至关重要。然而,海洋时空预测面临建模复杂动力系统和确保计算效率的双重挑战。我们提出了Koopman傅里叶时间可微(KFTD)网络,一种时间连续的两阶段范式,将插值与预测解耦,以实现高效且可扩展的时空建模。我们将复杂的非线性动力学映射到Koopman线性空间,并利用傅里叶分析实现任意子步的连续时间插值。一个轻量级残差网络消耗高保真中间状态以产生最终预测。与扩散模型不同,KFTD消除了多步噪声采样,直接在连续时间内演化系统,实现了4倍的计算加速。我们进一步引入DPP损失,以端到端方式支持任意PDE约束,打破了纯数据驱动方法的物理一致性瓶颈。在四个海洋数据集上的实验结果证实,我们的连续时间框架使MSE平均降低5.6%(SST最高达12.7%),并且效率比MCVD提高了76.25%。

英文摘要

Accurate oceanic forecasting is critical for climate monitoring and disaster early warning. However, ocean spatiotemporal forecasting faces the dual challenge of modeling complex dynamical systems while ensuring computational efficiency. We present the Koopman Fourier Time-Differentiable (KFTD) Network, a time-continuous two-stage paradigm that decouples interpolation from prediction to achieve efficient and scalable spatiotemporal modeling. We map complex nonlinear dynamics into the Koopman linear space and exploit Fourier analysis to enable continuous-time interpolation at arbitrary sub-steps. A lightweight residual network consumes the high-fidelity intermediate states to yield the final forecast. Unlike diffusion models, KFTD eliminates multi-step noise sampling and directly evolves the system in continuous time, yielding a 4× computational speedup. We further introduce a DPP Loss that supports arbitrary PDE constraints in an end-to-end manner, breaking the physical-consistency bottleneck of purely data-driven approaches. Empirical results on four ocean datasets confirm that our continuous-time framework reduces MSE by an average of 5.6% (up to 12.7% for SST) and improves efficiency over MCVD by 76.25%.
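
Fourier analysis lets a sequence sampled at integer steps be evaluated at any fractional time t. The sketch below builds the band-limited trigonometric interpolant that passes through every sample. It illustrates only that Fourier-interpolation idea and assumes the window is one period of a periodic signal. KFTD's Koopman encoder, residual network, and DPP Loss are not reproduced, and `fourier_interpolator` is a hypothetical name.

```python
import cmath
import math
from typing import Callable, Sequence


def fourier_interpolator(samples: Sequence[float]) -> Callable[[float], float]:
    """Return f(t) with f(m) == samples[m] at integer m, band-limited in between."""
    n = len(samples)
    coeffs = [
        sum(x * cmath.exp(-2j * math.pi * k * m / n) for m, x in enumerate(samples)) / n
        for k in range(n)
    ]
    # Map DFT index k to a signed frequency so the interpolant does not oscillate needlessly
    freqs = [k if k <= n // 2 else k - n for k in range(n)]

    def f(t: float) -> float:
        total = sum(c * cmath.exp(2j * math.pi * q * t / n) for c, q in zip(coeffs, freqs))
        return total.real  # for real input, the Nyquist term (even n) reduces to a cosine

    return f


states = [math.cos(2 * math.pi * m / 8) for m in range(8)]  # toy per-step state values
f = fourier_interpolator(states)
half_step = f(0.5)  # value between the first two stored steps
```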

2606.17109 2026-06-17 cs.CR cs.AI cs.LG 交叉投稿

Timestamp-Aware Spatio-Temporal Graph Contrastive Learning for Network Intrusion Detection

时间戳感知的时空图对比学习用于网络入侵检测

Jianli Dai, Guangwei Wu, Jiacheng Li, Weiping Wang, An He, Xinjun Xiao

发表机构 * Central South University of Forestry and Technology, School of Computer Science and Mathematics(中南林业科技大学计算机科学与数学学院) Central South University, School of Computer Science and Engineering(中南大学计算机科学与工程学院)

AI总结 提出一种自监督图神经网络框架,通过时间戳构建时序图,结合E-GraphSAGE和LSTM编码时空依赖,并采用多视图图对比学习(时空特征对比)及自适应权重策略,在四个数据集上达到与监督方法相当的性能。

AI中文摘要

鉴于图神经网络(GNN)在建模网络流量间关系结构方面的有效性,它们已被广泛用于网络入侵检测系统(NIDS)。然而,大多数现有基于GNN的NIDS方法关注流量关系的结构,并将其视为时间独立,这限制了它们应对不断演变的攻击行为的能力。此外,它们对监督或半监督学习的依赖通常限制了对未见攻击的泛化能力。为解决这些限制,我们提出了一种新颖的自监督GNN框架。据我们所知,所提出的模型是首批显式利用真实时间戳的自监督GNN-based NIDS模型之一,这为表示学习提供了忠实的时间依赖关系。我们首先根据时间戳从网络流量中构建一系列时序图,然后采用基于E-GraphSAGE和LSTM的编码器充分提取网络流量的时间信息和空间依赖关系,而无需引入耗时的注意力机制。引入了一种多视图图对比学习(GCL)方案,其中联合执行时间、空间和特征对比,分别捕获时间连续性、保持结构一致性并提高所学表示的泛化性和鲁棒性。此外,设计了一种基于梯度范数的自适应加权策略来优化对比损失权重。在四个具有真实时间戳的代表性NIDS数据集上的实验结果表明,我们的方法显著优于现有自监督方法,并达到了与监督最先进GNN方法相当的性能,同时保持了高计算效率。

英文摘要

Given their effectiveness in modeling the relational structure among network traffic flows, graph neural networks (GNNs) have been widely adopted in network intrusion detection systems (NIDSs). However, most existing GNN-based NIDS approaches focus on the relational structure of traffic flows, and treat them as temporally independent, which limits their ability to cope with evolving attack behaviors. Moreover, their reliance on supervised or semi-supervised learning often restricts generalization to unseen attacks. To address these limitations, we propose a novel self-supervised GNN-based framework. To the best of our knowledge, the proposed model is among the first self-supervised GNN-based NIDS models to explicitly leverage real timestamps, which provides faithful temporal dependencies for representation learning. We first construct a series of temporal graphs from network traffic flows according to their timestamps, and then employ an E-GraphSAGE and LSTM based encoder to fully extract temporal information and spatial dependencies of network traffic, without introducing time-costly attention mechanisms. A multi-view graph contrastive learning (GCL) scheme is introduced, where temporal, spatial, and feature contrasts are jointly performed to capture temporal continuity, preserve structural consistency, and improve the generalization and robustness of the learned representations, respectively. In addition, a gradient-norm-based adaptive weighting strategy is designed to optimize the contrastive loss weights. Experimental results on four representative NIDS datasets with real timestamps demonstrate that our method significantly outperforms existing self-supervised approaches and achieves performance comparable to the supervised state-of-the-art GNN method, while maintaining high computational efficiency.
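
The abstract says the three contrastive losses (temporal, spatial, feature) are weighted by a gradient-norm-based adaptive strategy, but it does not give the update rule. The sketch below is one simple heuristic in that spirit, and is an assumption rather than the paper's method. Each loss is weighted inversely to its gradient norm, and the weights are rescaled to sum to the number of losses, so every term contributes an equal-magnitude gradient. The gradient vectors shown are hypothetical.

```python
import math
from typing import List, Sequence


def l2_norm(vec: Sequence[float]) -> float:
    return math.sqrt(sum(g * g for g in vec))


def gradnorm_weights(grads: Sequence[Sequence[float]], eps: float = 1e-12) -> List[float]:
    """Weight each loss inversely to its gradient norm, rescaled so the weights
    sum to the number of losses; the weighted gradient norms then come out equal."""
    norms = [l2_norm(g) for g in grads]
    mean_norm = sum(norms) / len(norms)
    raw = [mean_norm / (n + eps) for n in norms]
    scale = len(raw) / sum(raw)
    return [w * scale for w in raw]


# Hypothetical gradients of the temporal, spatial and feature contrastive losses
g_temporal, g_spatial, g_feature = [1.0, 0.0], [0.0, 3.0], [2.0, 0.0]
weights = gradnorm_weights([g_temporal, g_spatial, g_feature])
```

In training, the gradients would be recomputed every step, so the weights track whichever contrast currently dominates.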

2606.17121 2026-06-17 stat.AP cs.LG physics.flu-dyn 交叉投稿

Regularized Machine Learning for System Identification of Ship Free-Running Manoeuvres from CFD-Based Synthetic Data: A Comparative Study

基于CFD合成数据的船舶自由航行操纵系统辨识的正则化机器学习:比较研究

R. F. Suárez, J. C. Berndt, M. Abdel-Maksoud

发表机构 * Hamburg University of Technology (TUHH)(汉堡技术大学)

AI总结 本研究使用正则化回归方法从CFD生成的自由航行数据中辨识船舶水动力系数,重点评估了系数集大小、训练长度和操纵组合对模型性能的影响,发现Ridge回归在计算效率和预测精度间取得最佳平衡。

Comments 28 pages

AI中文摘要

本研究探讨了从CFD生成的自由航行仿真数据中辨识船舶水动力系数的监督机器学习技术。具体而言,将普通最小二乘法和正则化回归方法应用于Abkowitz型操纵模型。训练和验证数据集来自Z形和回转操纵的URANS仿真,这些仿真已通过实验基准数据验证。分析评估了系数集大小、预测模型训练所需的最小训练长度以及操纵组合对模型性能的影响。结果表明,只要通过适当的系数选择、回归模型或输入数据变异性解决多重共线性问题,大角度Z形操纵适用于水动力系统辨识。较大的系数集为可变条件提供了更大的模型灵活性,但更容易出现多重共线性。正则化回归技术有效缓解了多重共线性,并显著提高了预测精度,而纳入更多样化的操纵数据同样如此。在测试的模型中,Ridge回归在计算效率和预测精度之间提供了最佳折衷。

英文摘要

This study investigates supervised machine learning techniques for identifying ship hydrodynamic coefficients from CFD-generated free-running simulation data. Specifically, ordinary least squares and regularized regression methods are applied to Abkowitz-type manoeuvring models. Training and validation datasets are derived from URANS simulations of zig-zag and turning circle manoeuvres, which are validated against experimental benchmark data. The analysis evaluates the effects of coefficient set size, the minimum training length required for predictive model training, and manoeuvre combinations on model performance. Results demonstrate the suitability of large-angle zig-zag manoeuvres for hydrodynamic system identification, provided that multicollinearity is addressed through appropriate coefficient selection, regression models, or input data variability. Larger coefficient sets offer greater model flexibility for variable conditions but are more prone to multicollinearity. Regularized regression techniques effectively mitigate multicollinearity and notably enhance prediction accuracy, as does incorporating more diverse manoeuvring data. Among the tested models, Ridge regression provided the best compromise between computational efficiency and prediction accuracy.
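
In an Abkowitz-type model, each force or moment is linear in its hydrodynamic coefficients, so identification reduces to a linear regression on motion-derived regressors. The sketch below solves the Ridge normal equations, w = (XᵀX + αI)⁻¹Xᵀy, with Gaussian elimination. It shows why regularization helps with multicollinearity: with two perfectly collinear regressors, ordinary least squares (α = 0) is singular, while Ridge returns a stable, evenly split solution. The toy data, the absence of an intercept, and the function names are assumptions.

```python
from typing import List, Sequence


def solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("singular system: regressors are collinear, try alpha > 0")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def ridge_fit(X: Sequence[Sequence[float]], y: Sequence[float], alpha: float) -> List[float]:
    """Closed form w = (X^T X + alpha I)^{-1} X^T y (no intercept); alpha = 0 gives OLS."""
    p = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) + (alpha if i == j else 0.0)
            for j in range(p)] for i in range(p)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    return solve(XtX, Xty)


# Two perfectly collinear regressors: OLS (alpha = 0) raises, Ridge does not.
X = [[1.0, 1.0], [2.0, 2.0], [3.0, 3.0]]
y = [2.0, 4.0, 6.0]
print(ridge_fit(X, y, alpha=1.0))  # both coefficients equal, about 0.9655
```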

2606.17294 2026-06-17 cs.RO cs.LG 交叉投稿

VISTA: Scale-Aware Visual Navigation via Action History Conditioning

VISTA:通过动作历史条件实现尺度感知的视觉导航

Maeva Guerrier, Koki Kobayashi, Simon Roy, Jana Pavlasek, Giovanni Beltrame

发表机构 * Polytechnique Montreal(蒙特利尔理工学院) MILA(MILA研究所) Institute of Science Tokyo(东京科学大学) CoRA Lab(CoRA实验室) Mist Lab(Mist实验室)

AI总结 针对视觉导航基础模型因动作归一化导致的尺度脆弱性,提出通过动作历史条件化提供物理位移上下文,并集成DINOv3编码器增强重复环境中的特征表示,实现零样本跨环境部署。

AI中文摘要

视觉导航基础模型(VNMs)承诺能够实现端到端的学习导航策略,并能在不同实体和环境之间进行零样本部署。为了保持通用性,许多基于视觉的导航模型预测归一化动作。然而,这种归一化引入了一个关键的部署漏洞:对相同的归一化轨迹应用不同的缩放因子会改变其物理几何形状,从而降低导航性能并增加碰撞风险。我们通过将模型条件化于归一化动作历史以及图像观测来解决这一漏洞,为模型预测与机器人实际物理位移之间的关系提供显式上下文。此外,当前的VNMs在缺乏显著特征的视觉重复环境中常常表现不佳。为解决此问题,我们集成了DINOv3编码器,其更丰富的表示使我们的模型能够捕获观测之间的空间和几何维度。VISTA能够鲁棒地泛化到分布外环境,在户外、森林和办公室环境的零样本真实世界部署中实现了100%的目标预测准确率,平均95%的检查点被穿越,展示了在未见环境中的一致路径跟随能力。

英文摘要

Vision Navigation Foundation Models (VNMs) promise end-to-end learned navigation policies capable of zero-shot deployment across diverse embodiments and environments. To maintain generality, many vision-based navigation models predict normalized actions. However, this normalization introduces a critical deployment vulnerability: applying different scaling factors to the same normalized trajectory alters its physical geometry, which degrades navigation performance and increases collision risks. We address this vulnerability by conditioning the model on normalized action histories alongside image observations, providing explicit context on the relationship between the model's predictions and the robot's actual physical displacement. Furthermore, current VNMs often struggle in visually repetitive environments that lack distinct features. To resolve this issue, we integrate a DINOv3 encoder, whose richer representations enable our model to capture both spatial and geometric dimensions between observations. VISTA generalizes robustly to out-of-distribution environments, achieving 100% goal prediction accuracy in zero-shot, real-world deployment in Outdoor, Forest and Office settings, and an average of 95% checkpoints crossed, demonstrating consistent path following in unseen environments.

2606.17362 2026-06-17 cs.CV cs.AI cs.LG cs.RO 交叉投稿

DriveJudge: Rethinking Autonomous Driving Evaluation with Vision-Language Models

DriveJudge: 用视觉-语言模型重新思考自动驾驶评估

Xinglong Sun, Kevin Xie, Jenny Schmalfuss, Despoina Paschalidou, Xiuming Zhang, Sanja Fidler, Kashyap Chitta, Jose M. Alvarez

发表机构 * NVIDIA(英伟达)

AI总结 提出DriveJudge,结合规则评估与VLM推理,通过选择性调用物理规则函数实现可解释且上下文感知的驾驶评估,在驾驶质量分类和轨迹偏好选择任务上超越现有方法。

Comments Under Review

AI中文摘要

自动驾驶已转向端到端策略学习,其中可靠、可解释的策略评估是一个基本挑战,因为驾驶质量高度依赖于上下文。常用的基于规则的驾驶指标(如EPDMS)可解释但缺乏上下文感知,而近期基于VLM的评估虽具有上下文感知能力,但受限于模糊的VLM输出和较弱的物理基础。为了以既可解释又上下文感知的方式评估驾驶,我们引入了DriveJudge。DriveJudge是一个驾驶评估代理,它将规则基础评估与视觉-语言模型(VLM)推理相结合,并在解释环境上下文后有选择地调用基于物理的确定性规则函数。为了训练和评估DriveJudge,我们整理了一个包含33,577个具有挑战性的驾驶样本的大规模数据集,并附有人类标注,指示给定场景中的驾驶行为是否合理。利用该数据集,我们解决了驾驶指标评估中未被充分探索的问题,并引入了两个与人类对齐的基准任务:驾驶质量分类和轨迹偏好选择。DriveJudge在驾驶质量分类上比EPDMS高出21.23 AUC,在轨迹偏好选择上比近期基于VLM的DriveCritic高出6.5%,为可解释且精确的驾驶评估设立了新标准。

英文摘要

Autonomous driving has shifted towards end-to-end policy learning, where reliable, interpretable policy evaluation is a fundamental challenge as driving quality is highly context-dependent. Commonly used rule-based driving metrics like EPDMS are interpretable but lack context-awareness, while recent VLM-based evaluations are context-aware but limited by ambiguous VLM outputs and weak physical grounding. To evaluate driving in a manner that is both interpretable and context-aware, we introduce DriveJudge. DriveJudge is a driving evaluation agent that combines rule-grounded evaluation with Vision-Language Model (VLM) reasoning and selectively invokes physically-grounded deterministic rule functions after interpreting the environmental context. To train and evaluate DriveJudge, we curate a large-scale dataset of 33,577 challenging driving samples with human annotations on whether the driving behavior is reasonable in the given scenario. With this dataset, we address the underexplored problem of driving metric evaluation, and introduce two human-aligned benchmark tasks: Driving Quality Classification and Trajectory Preference Selection. DriveJudge outperforms EPDMS for driving quality classification by 21.23 AUC, and the recent VLM-based DriveCritic for trajectory preference selection by 6.5%, setting a new standard for interpretable and precise driving evaluation.

2606.17394 2026-06-17 cs.RO cs.LG 交叉投稿

Damage Adaptation in Seconds for Architected Materials

结构材料的秒级损伤自适应

James Avtges, Jake Ketchum, Helena Young, Taekyoung Kim, Ryan Truby, Todd Murphey

发表机构 * Northwestern University(西北大学)

AI总结 提出LEAP方法,利用潜在损伤表示和集成学习,在软驱动系统中实现一分钟内对灾难性损伤的自适应,无需仿真。

Comments Proceedings of Robotics: Science and Systems

AI中文摘要

对损伤的自适应和原位物理修复对于长期机器人自主性至关重要,但在狭义定义和良好预期的范围之外具有挑战性。在这项工作中,我们在软驱动系统中在一分钟内本体感知地适应灾难性损伤。结构材料非常适合自适应:执行器故障是逐渐发生而非急性,并且损伤可以在低维、离散坐标空间中描述。令人惊讶的是,潜在损伤表示加上简单而稳健的集成方法足以实时适应未见过的损伤。此外,我们确定了指数样本复杂度降低为线性样本复杂度的条件,用于结构材料的学习表示,这是相对于刚性组件或连续软机构的明显优势。我们通过基于手性剪切拉胀(HSA)执行器的6自由度软手腕的追踪任务,演示了我们的自适应本体感知方法LEAP。我们的算法能够适应切割、烧伤和执行器修复,实现了无仿真的实时自适应,这对于在实验室外实现软机器人的承诺至关重要。视频和更多信息请访问 https://murpheylab.github.io/leap 。

英文摘要

Adaptation to damage and to in-situ physical repairs is essential for long-term robot autonomy, yet challenging outside of narrowly defined and well-anticipated bounds. In this work we proprioceptively adapt to catastrophic damage in soft-actuated systems in under one minute. Architected materials are well equipped for adaptation: actuator failure occurs gradually rather than acutely, and damage can be described in a low-dimensional, discrete coordinate space. Surprisingly, latent damage representations combined with a simple yet robust ensemble method are sufficient for adapting to unseen damage in real time. Moreover, we identify conditions under which exponential sample complexity collapses to linear sample complexity for learned representations of architected materials, a concrete advantage over rigid components or continuum soft mechanisms. We demonstrate LEAP, our method for adaptive proprioception, via a tracing task for a 6DoF soft wrist based on Handed Shearing Auxetic (HSA) actuators. Our algorithm is able to adapt to cuts, burns, and actuator repairs, enabling simulation-free real-time adaptation that is critical for realizing the promise of soft robots outside the lab. Videos and more information are available at https://murpheylab.github.io/leap.

2606.17449 2026-06-17 cs.CL cs.AI cs.CV cs.LG cs.MM 交叉投稿

MODE-RAG: Manifold Outlier Diagnosis and Energy-based Retrieval-Augmented Generation Evaluation

MODE-RAG: 基于流形异常诊断和能量的检索增强生成评估

Zehang Wei, Jiaxin Dai, Jiamin Yan, Xiang Xiang

发表机构 * School of Computer Science & Tech, Huazhong University of Science and Technology(华中科技大学计算机科学与技术学院) School of AI and Automation, Huazhong University of Science and Technology(华中科技大学人工智能与自动化学院)

AI总结 提出MODE-RAG多智能体系统,利用变分自由能和内部注意力状态动态门控干预,结合蒙特卡洛树搜索和logit扰动减少多模态检索增强生成中的幻觉和逻辑捏造。

Comments To be presented at ACL 2026

AI中文摘要

虽然多模态检索增强生成(M-RAG)增强了大型视觉语言模型,但它仍然非常容易受到跨模态幻觉、因果捏造和谄媚的影响。此外,现有的缓解流程常常面临干预悖论:静态规则往往不必要地干扰准确的生成,而完全不加引导的多模态推理则允许现有的不匹配级联成严重的逻辑捏造。为了量化和缓解这些幻觉,我们提出了一个多智能体系统MODE-RAG,由变分自由能(VFE)和内部注意力状态驱动,以动态门控干预。高风险查询被路由到五个阶段特定的智能体,集成蒙特卡洛树搜索(MCTS)进行严格的因果推导,以及logit扰动以惩罚谄媚。专门的纠正和监管智能体确保格式稳定性并执行事后事实验证。为了客观评估我们的方法,我们引入了ModeVent,一个源自MultiVent数据集的具有挑战性的子集。大量实验表明,我们的系统有效降低了幻觉率和逻辑捏造,显著提高了M-RAG系统的鲁棒性。

英文摘要

While Multimodal Retrieval-Augmented Generation (M-RAG) enhances Large Vision-Language Models, it remains highly susceptible to cross-modal hallucinations, causal fabrications, and sycophancy. Furthermore, existing mitigation pipelines often face an intervention paradox: static rules tend to unnecessarily disrupt accurate generations, whereas leaving the multi-modal reasoning completely unguided allows existing mismatches to cascade into severe logical fabrications. To quantify and mitigate these hallucinations, we propose a Multi-Agent system, MODE-RAG, driven by Variational Free Energy (VFE) and internal attention states to dynamically gate interventions. High-risk queries are routed to five stage-specific agents, integrating Monte Carlo Tree Search (MCTS) for rigorous causal derivation and logit perturbations to penalize sycophancy. Dedicated Correction and Overseer agents ensure formatting stability and perform post-hoc factual verification. To objectively evaluate our approach, we introduce ModeVent, a challenging subset derived from the MultiVent dataset. Extensive experiments indicate that our system effectively reduces hallucination rates and logical fabrication, significantly improving the robustness of M-RAG systems.

2606.17461 2026-06-17 cs.AR cs.AI cs.LG 交叉投稿

AUTOGATE: Automated Clock Gating via Toggling-Aware LLM-based RTL Rewriting

AUTOGATE:基于翻转感知的LLM驱动RTL重写的自动时钟门控

Yiting Wang, Chenhui Deng, Chia-Tung Ho, Yanqing Zhang, Zhuo Feng, Cunxi Yu, Ang Li, Gang Qu, Brucek Khailany

发表机构 * University of Maryland, College Park(马里兰大学学院公园分校) NVIDIA(英伟达)

AI总结 提出AUTOGATE框架,通过ML-LLM协同设计将波形翻转迹线转化为紧凑表示,指导LLM进行RTL重写,实现层次化代码库中的时钟门控优化,平均降低动态功耗49.31%。

Comments 9 pages, 6 figures, 7 tables

AI中文摘要

细粒度时钟门控(FGCG)是降低动态功耗最有效的技术之一,但当前的FGCG优化流程仍主要依赖手动操作。近期基于LLM的RTL优化方法受限于两个关键缺陷:(1)无法处理跨越数百万周期的长波形迹线,(2)难以在保持正确性的同时将优化扩展到大型层次化代码库。在本工作中,我们提出了AUTOGATE,这是首个面向工业级RTL功耗优化的智能体框架,支持在大型层次化代码库中进行工作负载感知的时钟门控优化。AUTOGATE引入了机器学习(ML)与LLM的协同设计,桥接了波形级分析与RTL重写。具体而言,我们设计了一种基于ML的聚类算法,将原始翻转迹线提炼为紧凑的结构化表示,以指导基于LLM的RTL重写。这使得无需LLM直接处理原始波形数据即可准确识别和应用时钟门控机会。为增强可扩展性,AUTOGATE采用层次化多智能体架构,将大型设计分解为可独立优化的模块,从而在深层设计层次中实现协调优化。我们在从小型RTL设计到大型工业级代码库的多样化设计集上评估了AUTOGATE。实验结果表明,与基线相比,AUTOGATE持续降低动态功耗。在小型设计套件上,AUTOGATE平均降低动态功耗49.31%。在工业级设计上,它在NVDLA和BlackParrot上分别实现了19.34%和7.96%的动态功耗降低,在高度优化的专有生产设计上最高降低6.86%。

英文摘要

Fine-grain clock gating (FGCG) is among the most effective techniques for reducing dynamic power, yet current FGCG optimization flows remain largely manual. Recent LLM-based RTL optimization approaches remain limited by two key drawbacks: (1) the inability to process long waveform traces spanning millions of cycles, and (2) the difficulty of scaling optimization to large hierarchical codebases while preserving correctness. In this work, we present AUTOGATE, the first agentic framework for industry-grade RTL power optimization, enabling workload-aware clock-gating optimization across large hierarchical codebases. AUTOGATE introduces a Machine Learning (ML)-LLM co-design that bridges waveform-level analysis and RTL rewriting. Specifically, we design an ML-based clustering algorithm that distills raw toggling traces into compact, structured representations that guide LLM-based RTL rewriting. This enables accurate identification and application of clock-gating opportunities without requiring LLMs to directly process raw waveform data. To enhance scalability, AUTOGATE employs a hierarchical multi-agent architecture that decomposes large designs into independently optimizable modules, enabling coordinated optimization across deep design hierarchies. We evaluate AUTOGATE on a diverse set of designs ranging from small RTL designs to large industrial-grade codebases. Experimental results show that AUTOGATE consistently reduces dynamic power relative to baselines. Across the small-design suite, AUTOGATE reduces dynamic power by 49.31% on average. On industry-scale designs, it achieves 19.34% and 7.96% dynamic power reductions on NVDLA and BlackParrot, respectively, and up to 6.86% on highly optimized proprietary production designs.
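
AUTOGATE distills raw toggling traces into compact representations before any LLM sees them. The sketch below illustrates that idea under loud simplifying assumptions. Each register is summarized by a single toggle rate, and a two-cluster 1-D k-means separates low-activity registers (hypothetical clock-gating candidates) from busy ones. The paper's actual clustering features and algorithm are not reproduced here, and the signal names are made up.

```python
from typing import Dict, List


def toggle_rate(trace: List[int]) -> float:
    """Fraction of clock cycles in which the sampled signal changes value."""
    changes = sum(1 for prev, cur in zip(trace, trace[1:]) if prev != cur)
    return changes / max(len(trace) - 1, 1)


def kmeans_1d_two_clusters(values: List[float], iters: int = 20) -> List[int]:
    """Two-cluster 1-D k-means; label 0 = low-activity cluster, 1 = high."""
    lo, hi = min(values), max(values)
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [0 if abs(v - lo) <= abs(v - hi) else 1 for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        if low:
            lo = sum(low) / len(low)
        if high:
            hi = sum(high) / len(high)
    return labels


def gating_candidates(traces: Dict[str, List[int]]) -> Dict[str, float]:
    """Compact summary handed to a rewriting agent: low-activity registers only."""
    names = list(traces)
    rates = [toggle_rate(traces[n]) for n in names]
    labels = kmeans_1d_two_clusters(rates)
    return {n: round(r, 3) for n, r, lab in zip(names, rates, labels) if lab == 0}


traces = {
    "ctrl_q": [0, 1] * 10,           # toggles every cycle
    "cfg_q": [0] * 19 + [1],         # almost always idle
    "data_q": [0, 1, 0, 1, 1] * 4,   # busy datapath register
}
print(gating_candidates(traces))  # {'cfg_q': 0.053}
```

The point of the summary is size. A trace spanning millions of cycles collapses to one number per register, and that can fit in a prompt.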

2606.17530 2026-06-17 physics.soc-ph cs.LG econ.GN q-fin.EC stat.AP 交叉投稿

Public transit gains and spatially uneven travel demand changes after NYC congestion pricing

纽约市拥堵收费后公共交通增益与空间不均的出行需求变化

Donghang Li, Dingyi Zhuang, Yunlin Li, Chenan Shen, Nina Cao, Yunhan Zheng, Shenhao Wang, Jinhua Zhao

发表机构 * Department of Civil and Environmental Engineering, Massachusetts Institute of Technology(麻省理工学院土木与环境工程系) Department of Urban Studies and Planning, Massachusetts Institute of Technology(麻省理工学院城市研究与规划系) Mathematical Institute, University of Oxford(牛津大学数学院) Department of Mechanical Engineering, Massachusetts Institute of Technology(麻省理工学院机械工程系) College of Urban and Environmental Sciences, Peking University(北京大学城市与环境科学学院) Department of Urban and Regional Planning, University of Florida(佛罗里达大学城市与区域规划系) Center for Computational Science and Engineering, Massachusetts Institute of Technology(麻省理工学院计算科学与工程中心)

AI总结 利用时间序列基础模型生成概率反事实预测,评估纽约市2025年实施的拥堵收费政策,发现公交和地铁客流量显著增加,但总体出行需求略有下降,且影响存在空间异质性。

AI中文摘要

纽约市于2025年1月实施了全国首个基于区域的拥堵收费计划,为评估全系统城市出行如何响应大规模定价干预提供了机会。由于此类政策会在不同交通方式和区域间产生溢出效应,因此难以构建可信的控制组。我们利用时间序列基础模型生成具有校准不确定性的概率反事实需求预测,以应对这一挑战。将该框架应用于公交、地铁和总出行量数据,我们发现,与预期无政策需求相比,政策实施后公交和地铁客流量显著增加,而总体出行需求略有下降。影响存在空间异质性:总体出行需求的减少集中在拥堵缓解区内,而公共交通的增益则延伸至曼哈顿核心区以外。社会人口分析进一步揭示了不同社区之间的适应差异,凸显了空间公平性问题。我们的框架为在缺乏干净控制组的情况下,对全系统城市干预进行不确定性感知评估提供了一种可扩展的方法。

英文摘要

New York City implemented the nation's first cordon-based congestion pricing program in January 2025, providing an opportunity to evaluate how system-wide urban mobility responds to large-scale pricing interventions. Because such policies generate spillovers across modes and locations, credible control groups are difficult to construct. We address this challenge using time series foundation models to generate probabilistic counterfactual demand forecasts with calibrated uncertainty. Applying this framework to bus, subway, and aggregate trip volume data, we find that post-policy bus and subway ridership increased significantly relative to expected no-policy demand, while overall travel demand decreased modestly. The effects are spatially heterogeneous: while reductions in overall travel demand are concentrated within the Congestion Relief Zone, transit gains extend beyond Manhattan's core. Socio-demographic analyses further reveal uneven adaptation across neighborhoods, highlighting spatial equity implications. Our framework provides a scalable approach for the uncertainty-aware evaluation of system-wide urban interventions when clean control groups are unavailable.
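
The evaluation logic compares observed demand against a probabilistic forecast of no-policy demand. Below is a minimal sketch of that comparison for a single period. It assumes the forecast is available as a list of predictive samples; in the paper these would come from a time series foundation model, which is not shown. The numbers are invented. An effect counts as significant when the interval derived from the forecast quantiles excludes zero.

```python
from typing import List, Tuple


def quantile(samples: List[float], q: float) -> float:
    """Linear-interpolation quantile of a list of samples, q in [0, 1]."""
    s = sorted(samples)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def counterfactual_effect(observed: float, forecast_samples: List[float],
                          alpha: float = 0.1) -> Tuple[float, float, float, bool]:
    """Effect of a policy in one period = observed - forecasted no-policy value.

    Returns (point effect, lower, upper, significant), where the (1 - alpha)
    interval comes from the forecast's predictive quantiles.
    """
    point = observed - quantile(forecast_samples, 0.5)
    lower = observed - quantile(forecast_samples, 1 - alpha / 2)
    upper = observed - quantile(forecast_samples, alpha / 2)
    significant = lower > 0 or upper < 0  # interval excludes zero
    return point, lower, upper, significant


# Hypothetical predictive samples for one week's subway ridership (thousands).
samples = [float(v) for v in range(90, 111)]
print(counterfactual_effect(120.0, samples))
```

Repeating this per zone and per mode yields the kind of spatial map of gains and losses the abstract reports. Calibration of the forecast intervals is what makes the significance flag meaningful.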

2606.17958 2026-06-17 cs.CV cs.LG 交叉投稿

Beyond Visual Cues: CoT-Enhanced Reasoning for Semi-supervised Medical Image Segmentation

超越视觉线索:CoT增强推理用于半监督医学图像分割

Yuming Chen, Yuxin Xie, Tao Zhou, Yi Zhou

发表机构 * School of Computer Science and Engineering, Southeast University(东南大学计算机科学与工程学院) Key Laboratory of New Generation Artificial Intelligence Technology and Its Interdisciplinary Applications, Ministry of Education(教育部新一代人工智能技术及其跨学科应用重点实验室) Nanjing University of Science and Technology(南京理工大学)

AI总结 提出CERS框架,通过集成链式思维推理和语义参考选择策略,解决半监督医学图像分割中的视觉-语义不匹配问题,在边界模糊和语义不一致场景下优于现有方法。

Comments Accepted to MICCAI 2026

AI中文摘要

半监督医学图像分割已成为医学图像分析中的主导研究问题,通过对未标记数据利用一致性正则化来缓解标注稀缺。然而,现有方法主要通过视觉模式匹配操作,严重依赖像素级相似性。这种以视觉为中心的依赖在临床场景中常常失效,因为视觉上相似的病变可能需要不同的诊断结论,从而无法捕捉专家使用的潜在诊断逻辑。为了解决这个问题,我们超越视觉线索,提出了CERS(CoT增强推理分割),一个集成链式思维(CoT)推理以区分病理上不同案例的框架。具体来说,我们构建了一个知识池,其中包含由大型语言模型(LLMs)生成的丰富语言推理描述。引入了一种语义感知的参考选择策略来识别历史证据,首先通过形态学过滤候选,然后通过CoT一致性进行细化以消除硬负样本。此外,设计了多尺度坐标注意力模块(MCAM)以有效地将这种推理衍生的上下文融合到解码过程中。大量实验证明了CERS相对于最先进方法的优越性,特别是在解决边界模糊和语义不一致方面。代码可在 https://github.com/cymasuna/CERS 获取。

英文摘要

Semi-supervised medical image segmentation has emerged as a dominant research problem in medical image analysis, mitigating annotation scarcity by leveraging consistency regularization on unlabeled data. However, existing approaches operate predominantly via visual pattern matching, relying heavily on pixel-level similarities. This visual-centric dependency often falters in clinical scenarios characterized by the visual-semantic mismatch, where visually similar lesions warrant distinct diagnostic conclusions, thus failing to capture the underlying diagnostic logic used by experts. To address this, we move beyond visual cues and propose CERS (CoT-Enhanced Reasoning Segmentation), a framework that integrates Chain-of-Thought (CoT) reasoning to distinguish pathologically distinct cases. Specifically, we construct a knowledge pool enriched with linguistic reasoning descriptions generated by large language models (LLMs). A semantic-aware reference selection strategy is introduced to identify historical evidence, filtering candidates first by morphology, and then refining them via CoT consistency to eliminate hard negatives. Furthermore, a multi-scale coordinate attention module (MCAM) is designed to effectively fuse this reasoning-derived context into the decoding process. Extensive experiments demonstrate the superiority of CERS against state-of-the-art approaches, particularly in resolving boundary ambiguities and semantic inconsistencies. The code is available at https://github.com/cymasuna/CERS.
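
The reference selection strategy has two stages: filter by morphology, then remove look-alike cases whose reasoning disagrees. Below is a minimal sketch using cosine similarity. It assumes each case already has a morphology embedding and a CoT-description embedding. The field names `morph` and `cot`, and the values of `k_morph` and `min_cot_sim`, are hypothetical and not taken from the paper.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def select_references(query: Dict, pool: List[Dict], k_morph: int = 2,
                      min_cot_sim: float = 0.5) -> List[str]:
    """Stage 1: keep the k candidates whose morphology embedding is closest.
    Stage 2: drop those whose reasoning (CoT) embedding disagrees with the
    query's, i.e. look-alike cases with a different diagnosis (hard negatives)."""
    by_shape = sorted(pool, key=lambda c: cosine(query["morph"], c["morph"]),
                      reverse=True)[:k_morph]
    return [c["id"] for c in by_shape
            if cosine(query["cot"], c["cot"]) >= min_cot_sim]


query = {"morph": [1.0, 0.0], "cot": [1.0, 0.0]}
pool = [
    {"id": "A", "morph": [1.0, 0.0], "cot": [1.0, 0.0]},  # similar shape, same logic
    {"id": "B", "morph": [0.9, 0.1], "cot": [0.0, 1.0]},  # similar shape, different logic
    {"id": "C", "morph": [0.0, 1.0], "cot": [1.0, 0.0]},  # different shape
]
print(select_references(query, pool))  # ['A']
```

Case B is the hard negative the abstract targets. It passes the visual filter but is rejected by the reasoning check.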

2606.18021 2026-06-17 cs.AI cs.CL cs.LG cs.MA 交叉投稿

LegalHalluLens: Typed Hallucination Auditing and Calibrated Multi-Agent Debate for Trustworthy Legal AI

LegalHalluLens: 类型化幻觉审计与校准的多智能体辩论以实现可信赖的法律AI

Lalit Yadav, Akshaj Gurugubelli

发表机构 * Independent Researcher, Sunnyvale, CA, USA(独立研究者,美国加州森尼韦尔) Independent Researcher, San Diego, CA, USA(独立研究者,美国加州圣地亚哥)

AI总结 针对法律AI中聚合指标掩盖的错误集中性和方向性问题,提出LegalHalluLens审计框架,通过类型化幻觉画像、风险方向指数(RDI)和校准辩论管道,将虚构检测减少45%,并揭示聚合指标隐藏的失败模式。

Comments 15 pages, 5 figures; Published at the Second Workshop on Agents in the Wild: Safety, Security, and Beyond (AIWILD) at ICML 2026

AI中文摘要

部署在法律工作流程中的AI系统以聚合指标报告的约52%的比率产生幻觉,但这个平均值掩盖了错误集中的位置和方向,使合规官员无法获得可操作的可信部署信号。我们提出LegalHalluLens,一个包含三个组件的审计框架:基于CUAD(Hendrycks等人,2021)的四种法律动机声明类别(数字、时间、义务/权利、事实)的类型化幻觉画像;一个风险方向指数(RDI),将遗漏与发明偏差简化为一个可部署比较的标量;以及一个针对幅度和方向校准的类型化辩论管道。在510份合同和249,252个条款级实例上,我们测量了义务/数字和时间声明之间约38-40个百分点的模型内差距,而聚合报告隐藏了这一点,并表明两个具有匹配的52%比率的系统可能具有相反的RDI。辩论管道将虚构检测减少了45%,每个类别的收益跟踪诊断结果,使用显著更小的骨干网络(4B活跃参数)匹配商业API。类型化画像和RDI揭示了聚合指标隐藏的失败模式;我们进一步表明这些诊断可作为多智能体辩论管道的校准输入,其中针对测量失败模式的怀疑挑战和非对称门优于通用调整的辩论。该框架支持部署在现实世界中的法律AI的方向感知采购、问责制和智能体设计。

英文摘要

AI systems deployed in legal workflows hallucinate at rates that aggregate metrics report at ~52%, but this average conceals where errors concentrate and in which direction they run, leaving compliance officers without an actionable signal for trustworthy deployment. We present LegalHalluLens, an auditing framework with three components: typed hallucination profiles across four legally-motivated claim categories (numeric, temporal, obligation/entitlement, factual) over CUAD (Hendrycks et al., 2021); a Risk Direction Index (RDI) that reduces omission-versus-invention bias to a single deployment-comparable scalar; and a typed debate pipeline calibrated to both magnitudes and directions. Across 510 contracts and 249,252 clause-level instances we measure a within-model gap of approximately 38-40 pp between obligation/numeric and temporal claims that aggregate reporting hides, and show that two systems with matched 52% rates can carry opposite RDIs. The debate pipeline reduces fabricated detections by 45% with per-category gains tracking the diagnosis, matching commercial APIs with a substantially smaller backbone (4B active parameters). Typed profiles and RDI surface failure modes that aggregate metrics hide; we further show these diagnostics serve as calibration inputs for multi-agent debate pipelines, where Skeptic challenges and asymmetric gates targeted at measured failure modes outperform generically-tuned debate. The framework supports direction-aware procurement, accountability, and agent design for legal AI deployed in the wild.
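
Typed profiles and a direction index can be illustrated with simple counting. The abstract does not give the RDI formula. The normalized difference below, (inventions minus omissions) divided by total errors, is an assumption chosen only because it yields a single signed scalar. The record format and category labels are also illustrative.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Record = Tuple[str, str]  # (claim category, outcome: "ok" | "omission" | "invention")


def typed_profile(records: Iterable[Record]) -> Dict[str, float]:
    """Hallucination rate per claim category instead of one aggregate rate."""
    totals, errors = Counter(), Counter()
    for category, outcome in records:
        totals[category] += 1
        if outcome != "ok":
            errors[category] += 1
    return {c: errors[c] / totals[c] for c in totals}


def risk_direction_index(records: Iterable[Record]) -> float:
    """Hypothetical RDI in [-1, 1]: +1 = every error invents content,
    -1 = every error omits content, 0 = balanced or no errors."""
    counts = Counter(outcome for _, outcome in records)
    n_err = counts["omission"] + counts["invention"]
    if n_err == 0:
        return 0.0
    return (counts["invention"] - counts["omission"]) / n_err


# Two systems with the same 50% aggregate rate but opposite error direction.
system_a = [("numeric", "invention")] * 5 + [("temporal", "ok")] * 5
system_b = [("numeric", "omission")] * 5 + [("temporal", "ok")] * 5
print(typed_profile(system_a), risk_direction_index(system_a))
print(typed_profile(system_b), risk_direction_index(system_b))
```

Both toy systems report 50% overall. Only the typed profile shows that every error is numeric, and only the signed index shows whether those errors invent or omit.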

2606.18063 2026-06-17 cs.CV cs.AI cs.LG 交叉投稿

When LLMs Analyze Scars: From Images to Clinically-Meaningful Features

当LLM分析疤痕:从图像到临床有意义的特征

Ruman Wang, Hangting Ye

发表机构 * Liaoning University of Traditional Chinese Medicine(辽宁中医药大学) School of Artificial Intelligence, Jilin University(吉林大学人工智能学院)

AI总结 提出ScaFE框架,利用LLM作为知识驱动的特征工程师,将高维图像转化为低维临床可解释特征,在数据稀缺的疤痕分类中优于端到端深度学习方法。

AI中文摘要

医学图像分类面临一个基本困境:虽然深度学习模型在大规模数据上表现卓越,但现实临床场景中由于标注成本、隐私约束和疾病罕见性,常常遭受严重的数据稀缺。这一挑战在病理性疤痕分类中尤为突出,区分瘢痕疙瘩和增生性疤痕需要微妙的专家知识,且标注图像极其有限。我们提出一种新范式,将大型语言模型(LLM)重新定位为知识驱动的特征工程师,而非端到端分类器。我们将此框架称为ScaFE(疤痕特征工程)。我们的关键洞察是,LLM编码了丰富的医学知识,可以外部化为可执行的特征提取代码,从而将高维图像转化为低维、临床可解释的表示。具体来说,我们使用既定的疤痕评估标准提示LLM,生成确定性的Python代码,提取与临床评分系统(如温哥华疤痕量表)对齐的特征。我们的方法提供三个关键优势:(1)数据效率,通过将知识获取与统计学习解耦,在有限训练样本下实现稳健性能;(2)隐私保护,原始图像在本地处理,不暴露给外部LLM;(3)可解释性,通过基于临床推理的显式特征。在疤痕分类上的大量实验表明,在数据有限条件下,我们的方法始终优于端到端深度学习基线或使用LLM作为黑盒分类器,为将LLM集成到数据高效且临床透明的医学AI系统中开辟了有前景的方向。

英文摘要

Medical image classification faces a fundamental dilemma: while deep learning models achieve remarkable performance at scale, real-world clinical scenarios often suffer from severe data scarcity due to annotation costs, privacy constraints, and disease rarity. This challenge is particularly pronounced in pathological scar classification, where differentiating keloids from hypertrophic scars requires subtle expert knowledge and labeled images are extremely limited. We propose a novel paradigm that repositions large language models (LLMs) as knowledge-driven feature engineers rather than end-to-end classifiers. We call this framework ScaFE (Scar Feature Engineering). Our key insight is that LLMs encode rich medical knowledge that can be externalized as executable feature extraction code, enabling the transformation of high-dimensional images into low-dimensional, clinically interpretable representations. Specifically, we prompt an LLM with established scar assessment criteria to generate deterministic Python code that extracts features aligned with clinical scoring systems such as the Vancouver Scar Scale. Our approach offers three key advantages: (1) data efficiency, achieving robust performance with limited training samples by decoupling knowledge acquisition from statistical learning; (2) privacy preservation, as raw images are processed locally without exposure to external LLMs; and (3) interpretability, through explicit features grounded in clinical reasoning. Extensive experiments on scar classification demonstrate that our method consistently outperforms end-to-end deep learning baselines or using LLMs as black-box classifiers under limited data conditions, establishing a promising direction for integrating LLMs into data-efficient and clinically transparent medical AI systems.

2606.18223 2026-06-17 cs.CR cs.AI cs.LG cs.SY eess.SY 交叉投稿

Learning Red Agent Policy from Observations for Neurosymbolic Autonomous Cyber Agents

从观测中学习红方代理策略用于神经符号自主网络代理

Ankita Samaddar, Sandeep Neema, Daniel Balasubramanian, Xenofon Koutsoukos

发表机构 * MIT(麻省理工学院)

AI总结 针对网络攻击中红方动作不可观测的问题,提出基于模仿学习的策略学习技术,从网络观测和防御动作预测红方行为,集成神经符号防御代理实现高精度预测。

AI中文摘要

随着复杂网络攻击日益普遍,现代网络需要经由强化学习训练的智能自主网络防御代理。这些代理采用神经符号方法,如带有学习组件的行为树,来学习、推理、适应和实施安全规则,同时维持关键操作。然而,这些自主网络是部分可观测系统,即网络攻击者(红方代理)的动作不可观测,使得防御者难以预测红方动作、学习红方策略或评估攻击者的入侵程度。为解决此问题,我们提出一种策略学习技术,利用模仿学习来学习具有离散状态和离散动作的部分可观测RL代理的策略。我们在自主网络环境中应用该技术,从网络观测和防御动作预测红方代理的动作。与神经符号网络防御代理集成后,我们的方法有效处理不同红方策略,并在多种模拟场景中实现高预测精度。

英文摘要

With sophisticated cyber-attacks becoming increasingly prevalent, modern networks require intelligent autonomous cyber-defense agents trained via Reinforcement Learning (RL). These agents employ neurosymbolic approaches such as behavior trees with learning-enabled components (LECs) to learn, reason, adapt, and implement security rules while maintaining critical operations. However, these autonomous networks are partially observable systems, i.e., the cyber-attacker's (red agent's) actions are not observable, making it difficult for the defender to predict red actions, learn red policies, or assess the attacker's intrusion levels. To address this, we propose a Policy Learning Technique using imitation learning to learn policies for partially observable RL agents with discrete states and discrete actions. We apply this technique in an autonomous cyber environment to predict red agent's actions from network observations and defender actions. Integrated with a neurosymbolic cyber-defense agent, our method effectively handles different red policies and achieves high prediction accuracy across diverse simulated scenarios.
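
For discrete states and actions, the simplest form of imitation learning is count-based behavioral cloning. Below is a minimal sketch that estimates P(red action | observation, defender action) with Laplace smoothing. It is a stand-in, not the paper's Policy Learning Technique. It assumes red actions are available in training demonstrations (for example, from simulator logs) even though they are unobservable at deployment. All labels are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Hashable, Iterable, List, Tuple


class TabularImitationPolicy:
    """Count-based behavioral cloning: estimate P(red action | obs, blue action)."""

    def __init__(self, red_actions: List[str], alpha: float = 1.0):
        self.red_actions = list(red_actions)
        self.alpha = alpha  # Laplace smoothing for unseen (obs, blue) pairs
        self.counts: Dict[Tuple[Hashable, Hashable], Counter] = defaultdict(Counter)

    def fit(self, demos: Iterable[Tuple[Hashable, Hashable, str]]):
        for obs, blue_action, red_action in demos:
            self.counts[(obs, blue_action)][red_action] += 1
        return self

    def probs(self, obs: Hashable, blue_action: Hashable) -> Dict[str, float]:
        c = self.counts.get((obs, blue_action), Counter())
        total = sum(c.values()) + self.alpha * len(self.red_actions)
        return {a: (c[a] + self.alpha) / total for a in self.red_actions}

    def predict(self, obs: Hashable, blue_action: Hashable) -> str:
        p = self.probs(obs, blue_action)
        return max(self.red_actions, key=lambda a: p[a])


demos = ([("host_alert", "monitor", "exploit")] * 3
         + [("host_alert", "monitor", "scan")]
         + [("quiet", "monitor", "scan")] * 2)
policy = TabularImitationPolicy(["scan", "exploit", "escalate"]).fit(demos)
print(policy.predict("host_alert", "monitor"))  # exploit
```

A neural policy would replace the count table once the observation space is too large to enumerate. The defender can use the predicted distribution in the same way.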

2505.03509 2026-06-17 cs.LG astro-ph.IM 版本更新

AnomalyMatch: Discovering Rare Objects of Interest with Semi-supervised and Active Learning

AnomalyMatch: 通过半监督和主动学习发现罕见感兴趣对象

Pablo Gómez, Laslo E. Ruhberg, Maria Teresa Nardone, David O'Ryan

AI总结 提出AnomalyMatch框架,结合半监督FixMatch算法和主动学习,将异常检测视为二分类问题,利用少量标注和大量未标注图像训练,在严重类别不平衡下实现高AUROC和AUPRC。

Comments Accepted for publication in RASTI; 17 pages; 12 figures

AI中文摘要

大数据集中的异常检测在天文学和计算机视觉中至关重要。然而,由于标记数据稀缺,通常无法应用监督方法进行异常检测。我们提出了AnomalyMatch,一个结合了使用EfficientNet分类器的半监督FixMatch算法与主动学习的异常检测框架。AnomalyMatch专为大规模应用定制,并集成到ESA Datalabs科学平台中。在该方法中,我们将异常检测视为二分类问题,并有效利用有限的标记图像和丰富的未标记图像进行训练。我们通过用户界面实现主动学习,用于验证高置信度异常并纠正误报。在严重类别不平衡下,对GalaxyMNIST天文数据集和miniImageNet自然图像基准的评估显示出强大性能。从五到十个标记异常开始,我们实现了平均AUROC为0.96(miniImageNet)和0.89(GalaxyMNIST),相应的AUPRC分别为0.82和0.77。经过三个主动学习周期后,按分数排名前1%的图像中,异常精度达到76%(miniImageNet)至94%(GalaxyMNIST)。我们与已建立的Astronomaly软件在来自'Galaxy Zoo - The Galaxy Challenge'数据集的选定'奇特'星系上进行比较,实现了可比较的性能,平均AUROC为0.83。我们的结果强调了该方法在异常发现方面的卓越实用性和可扩展性,突显了针对标签严重稀缺领域的专门方法的价值。

英文摘要

Anomaly detection in large datasets is essential in astronomy and computer vision. However, due to a scarcity of labelled data, it is often infeasible to apply supervised methods to anomaly detection. We present AnomalyMatch, an anomaly detection framework combining the semi-supervised FixMatch algorithm using EfficientNet classifiers with active learning. AnomalyMatch is tailored for large-scale applications and integrated into the ESA Datalabs science platform. In this method, we treat anomaly detection as a binary classification problem and efficiently utilise limited labelled and abundant unlabelled images for training. We enable active learning via a user interface for verification of high-confidence anomalies and correction of false positives. Evaluations on the GalaxyMNIST astronomical dataset and the miniImageNet natural-image benchmark under severe class imbalance show strong performance. Starting from five to ten labelled anomalies, we achieve an average AUROC of 0.96 (miniImageNet) and 0.89 (GalaxyMNIST), with respective AUPRC of 0.82 and 0.77. After three active learning cycles, anomalies are ranked with 76% (miniImageNet) to 94% (GalaxyMNIST) precision in the top 1% of the highest-ranking images by score. We compare to the established Astronomaly software on selected 'odd' galaxies from the 'Galaxy Zoo - The Galaxy Challenge' dataset, achieving comparable performance with an average AUROC of 0.83. Our results underscore the exceptional utility and scalability of this approach for anomaly discovery, highlighting the value of specialised approaches for domains characterised by severe label scarcity.
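
The core of FixMatch (Sohn et al., 2020) is a confidence-thresholded pseudo-label loss. Below is a minimal sketch of that unlabeled term for the binary normal/anomaly setting described above. It uses plain probability lists in place of EfficientNet outputs. Augmentations, the supervised term, and the active-learning loop are not shown, and the example probabilities are made up.

```python
import math
from typing import List, Sequence


def fixmatch_unlabeled_loss(weak_probs: List[Sequence[float]],
                            strong_probs: List[Sequence[float]],
                            tau: float = 0.95) -> float:
    """FixMatch consistency term for one unlabeled batch.

    A pseudo-label is the argmax of the prediction on the weakly augmented
    view; it is kept only if its confidence reaches tau, and then the
    cross-entropy of the strongly augmented view's prediction against it is
    added. The sum is divided by the full batch size (masked samples count).
    """
    total = 0.0
    for pw, ps in zip(weak_probs, strong_probs):
        label = max(range(len(pw)), key=lambda k: pw[k])
        if pw[label] >= tau:
            total += -math.log(max(ps[label], 1e-12))
    return total / len(weak_probs)


# Binary setting (class 0 = normal, class 1 = anomaly), two unlabeled images.
weak = [[0.98, 0.02], [0.60, 0.40]]    # only the first is confident enough
strong = [[0.50, 0.50], [0.10, 0.90]]
print(fixmatch_unlabeled_loss(weak, strong))  # log(2) / 2, about 0.3466
```

The full objective adds this term, scaled by a weight, to the ordinary cross-entropy on the few labelled images. Under severe class imbalance, the threshold `tau` largely determines how many unlabeled anomalies ever contribute.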

2506.05797 2026-06-17 cs.LG cs.CE cs.RO 版本更新

EqCollide: Equivariant and Collision-Aware Deformable Objects Neural Simulator

EqCollide: 等变且碰撞感知的可变形物体神经模拟器

Qianyi Chen, Tianrun Gao, Chenbo Jiang, Tailin Wu

发表机构 * Westlake University(西湖大学) Fudan University(复旦大学) Tongji University(同济大学) McGill University(麦吉尔大学)

AI总结 提出首个端到端等变神经场模拟器EqCollide,通过等变编码器和碰撞感知消息传递的图神经网络常微分方程,实现可变形物体碰撞的准确、稳定和可扩展模拟。

Comments SIGKDD 2026 Oral AI4S Track. 20 pages, 16 figures

AI中文摘要

模拟可变形物体的碰撞是一项基础但具有挑战性的任务,因为涉及固体力学和多体相互作用的复杂性。现有的数据驱动方法通常缺乏对物理对称性的等变性、对碰撞处理不足以及可扩展性有限。本文介绍EqCollide,这是首个用于可变形物体及其碰撞的端到端等变神经场模拟器。我们提出一个等变编码器,将物体几何和速度映射到潜在控制点。随后,基于等变图神经网络的神经常微分方程通过碰撞感知消息传递建模控制点之间的相互作用。为了重建速度场,我们查询一个以控制点特征为条件的神经场,实现连续且分辨率无关的运动预测。在2D和3D场景上的实验结果表明,EqCollide在不同物体配置下实现了准确、稳定且可扩展的模拟。与最佳基线模型相比,其滚动均方误差降低了24.34%至57.62%。此外,EqCollide能够泛化到更多碰撞物体和更长的时间范围,并对群作用下的输入变换保持鲁棒。代码可在以下网址获取:https://github.com/AI4Science-WestlakeU/EqCollide

英文摘要

Simulating collisions of deformable objects is a fundamental yet challenging task due to the complexity of modeling solid mechanics and multi-body interactions. Existing data-driven methods often suffer from a lack of equivariance to physical symmetries, inadequate handling of collisions, and limited scalability. Here we introduce EqCollide, the first end-to-end equivariant neural fields simulator for deformable objects and their collisions. We propose an equivariant encoder to map object geometry and velocity into latent control points. A subsequent equivariant Graph Neural Network-based Neural Ordinary Differential Equation models the interactions among control points via collision-aware message passing. To reconstruct velocity fields, we query a neural field conditioned on control point features, enabling continuous and resolution-independent motion predictions. Experimental results on 2D and 3D scenarios show that EqCollide achieves accurate, stable, and scalable simulations across diverse object configurations. It achieves 24.34% to 57.62% lower rollout MSE, even compared with the best-performing baseline model. Furthermore, EqCollide can generalize to more colliding objects and extended temporal horizons, and remains robust to inputs transformed by group actions. Code is available at: https://github.com/AI4Science-WestlakeU/EqCollide
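
Equivariant, collision-aware message passing can be illustrated with a single layer in the style of EGNN (Satorras et al., 2021), restricted to a radius graph so that only nearby nodes interact. This is a hedged stand-in and not EqCollide's architecture. It has no encoder, no latent control points, and no neural ODE. A fixed `tanh` replaces the learned message network, and `radius` and `w` are illustrative parameters. Because messages depend only on invariants and positions move along relative vectors, rotating or translating the input rotates or translates the output.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def egnn_layer(x: List[Vec], h: List[float], radius: float = 2.0,
               w: float = 0.5) -> Tuple[List[Vec], List[float]]:
    """One E(2)-equivariant message-passing step on a radius graph.

    Messages depend only on invariants (squared distance and scalar node
    features), and positions move along relative vectors x_i - x_j, so a
    rotation or translation of the input carries over to the output while
    the scalar features h stay unchanged. The fixed tanh stands in for the
    learned message MLP of a real model.
    """
    n = len(x)
    new_x, new_h = [], []
    for i in range(n):
        dx = dy = agg = 0.0
        for j in range(n):
            if i == j:
                continue
            rx, ry = x[i][0] - x[j][0], x[i][1] - x[j][1]
            d2 = rx * rx + ry * ry
            if d2 > radius * radius:  # only nearby (possibly colliding) nodes interact
                continue
            m = math.tanh(w * d2 + h[i] * h[j])  # invariant scalar message
            dx += rx * m
            dy += ry * m
            agg += m
        norm = max(n - 1, 1)
        new_x.append((x[i][0] + dx / norm, x[i][1] + dy / norm))
        new_h.append(h[i] + agg)
    return new_x, new_h


def rigid(p: Vec, theta: float, shift: Vec) -> Vec:
    """Rotate p by theta, then translate by shift."""
    c, s = math.cos(theta), math.sin(theta)
    return (c * p[0] - s * p[1] + shift[0], s * p[0] + c * p[1] + shift[1])


x = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.5), (5.0, 5.0)]  # last node is out of range
h = [0.1, -0.2, 0.3, 0.0]
x_out, h_out = egnn_layer(x, h)
x_rot, h_rot = egnn_layer([rigid(p, 0.7, (3.0, -1.0)) for p in x], h)
```

Applying `rigid` after the layer gives the same positions as applying it before, up to floating-point error, and `h` is unchanged. This is the equivariance property the abstract relies on. The isolated fourth node does not move because no neighbour lies within `radius`.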

2602.03045 2026-06-17 cs.LG 版本更新

Clarify Before You Draw: Proactive Agents for Robust Text-to-CAD Generation

先澄清再绘制:面向鲁棒文本到CAD生成的主动式智能体

Bo Yuan, Zelin Zhao, Petr Molodyk, Bin Hu, Yongxin Chen

AI总结 提出主动式智能体框架ProCAD,通过澄清代理在代码生成前解决用户提示中的歧义,再通过CAD编码代理生成可执行程序,显著提升鲁棒性,平均Chamfer距离降低79.9%。

Comments ICML 2026

AI中文摘要

大型语言模型最近使得文本到CAD系统能够从自然语言提示中合成参数化CAD程序(例如CadQuery)。然而在实践中,几何描述可能是不明确或内部不一致的:关键尺寸可能缺失,约束可能冲突。而现有的微调模型倾向于被动地遵循用户指令,并在文本模糊时产生幻觉尺寸。为了解决这个问题,我们提出了一个用于文本到CadQuery生成的主动式智能体框架,名为ProCAD,它在代码合成之前解决规范问题。我们的框架将主动式澄清代理(该代理审计提示并仅在必要时提出有针对性的澄清问题以生成自洽的规范)与CAD编码代理(将规范转换为可执行的CadQuery程序)配对。我们基于精心策划的高质量文本到CadQuery数据集微调编码代理,并通过在澄清轨迹上进行智能体SFT来训练澄清代理。实验表明,主动式澄清显著提高了对模糊提示的鲁棒性,同时保持较低的交互开销。ProCAD优于前沿闭源模型,包括Claude Sonnet 4.5,将平均Chamfer距离降低了79.9%,并将无效比率从4.8%降至0.9%。我们的代码和数据集已在 https://github.com/BoYuanVisionary/Pro-CAD 上公开。

英文摘要

Large language models have recently enabled text-to-CAD systems that synthesize parametric CAD programs (e.g., CadQuery) from natural-language prompts. In practice, however, geometric descriptions can be under-specified or internally inconsistent: critical dimensions may be missing and constraints may conflict. Yet existing fine-tuned models tend to reactively follow user instructions and hallucinate dimensions when the text is ambiguous. To address this, we propose ProCAD, a proactive agentic framework for text-to-CadQuery generation that resolves specification issues before code synthesis. Our framework pairs a proactive clarifying agent, which audits the prompt and asks targeted clarification questions only when necessary to produce a self-consistent specification, with a CAD coding agent that translates the specification into an executable CadQuery program. We fine-tune the coding agent on a curated high-quality text-to-CadQuery dataset and train the clarifying agent via agentic SFT on clarification trajectories. Experiments show that proactive clarification significantly improves robustness to ambiguous prompts while keeping interaction overhead low. ProCAD outperforms frontier closed-source models, including Claude Sonnet 4.5, reducing the mean Chamfer distance by 79.9% and lowering the invalidity ratio from 4.8% to 0.9%. Our code and datasets are publicly available at https://github.com/BoYuanVisionary/Pro-CAD.
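
The headline metric is the mean Chamfer distance between generated and reference shapes. Below is a minimal brute-force sketch. It assumes the common convention of squared distances averaged per set and summed over both directions. Papers differ on squaring, summing versus averaging, and shape normalization, so this is not necessarily the exact variant used for ProCAD's reported numbers.

```python
from typing import List, Sequence

Point = Sequence[float]


def _sq_dist(p: Point, q: Point) -> float:
    return sum((a - b) ** 2 for a, b in zip(p, q))


def chamfer_distance(A: List[Point], B: List[Point]) -> float:
    """Symmetric Chamfer distance: mean squared distance from each point to its
    nearest neighbour in the other set, summed over both directions.
    Brute force O(|A| * |B|); a KD-tree is the usual choice for large clouds."""
    a_to_b = sum(min(_sq_dist(p, q) for q in B) for p in A) / len(A)
    b_to_a = sum(min(_sq_dist(q, p) for p in A) for q in B) / len(B)
    return a_to_b + b_to_a


pred = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
ref = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
print(chamfer_distance(pred, ref))  # 4/3: the reference point at x=3 is 2 units from pred
```

In a text-to-CAD evaluation, both point sets would be sampled from the surfaces of the executed CadQuery solid and the ground-truth model. Programs that fail to execute are counted separately, which is what the invalidity ratio measures.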

2602.11715 2026-06-17 cs.LG cs.CL 版本更新

DICE: Diffusion Large Language Models Excel at Generating CUDA Kernels

DICE:扩散大语言模型在生成CUDA内核方面表现出色

Haolei Bai, Lingcheng Kong, Xueyi Chen, Jianmian Wang, Zhiqiang Tao, Huan Wang

AI总结 提出CuKe数据集和BiC-RL训练框架,构建DICE系列扩散大语言模型(1.7B/4B/8B),在KernelBench上显著优于同类自回归和扩散模型,实现CUDA内核生成新SOTA。

Comments v2: Expanded with dLLM vs. autoregressive LLM comparisons, ablation studies, and qualitative case studies

AI中文摘要

扩散大语言模型(dLLMs)因其并行生成令牌的能力,已成为自回归(AR)LLMs的有力替代方案。这一范式特别适用于代码生成,其中整体结构规划和非顺序优化至关重要。尽管有这种潜力,但针对CUDA内核生成定制dLLMs仍然具有挑战性,不仅因为高度专业化,还因为严重缺乏高质量的训练数据。为了解决这些挑战,我们构建了CuKe,一个针对高性能CUDA内核优化的增强监督微调数据集。在此基础上,我们提出了一个双阶段策划强化学习(BiC-RL)框架,包括CUDA内核填充阶段和端到端CUDA内核生成阶段。利用这一训练框架,我们推出了DICE,一系列专为CUDA内核生成设计的扩散大语言模型,涵盖1.7B、4B和8B三个参数规模。在KernelBench上的大量实验表明,DICE显著优于同等规模的自回归和扩散LLMs,为CUDA内核生成建立了新的最先进水平。

英文摘要

Diffusion large language models (dLLMs) have emerged as a compelling alternative to autoregressive (AR) LLMs, owing to their capacity for parallel token generation. This paradigm is particularly well-suited for code generation, where holistic structural planning and non-sequential refinement are critical. Despite this potential, tailoring dLLMs for CUDA kernel generation remains challenging, obstructed not only by the high specialization but also by the severe lack of high-quality training data. To address these challenges, we construct CuKe, an augmented supervised fine-tuning dataset optimized for high-performance CUDA kernels. On top of it, we propose a bi-phase curated reinforcement learning (BiC-RL) framework consisting of a CUDA kernel infilling stage and an end-to-end CUDA kernel generation stage. Leveraging this training framework, we introduce DICE, a series of diffusion large language models designed for CUDA kernel generation, spanning three parameter scales, 1.7B, 4B, and 8B. Extensive experiments on KernelBench demonstrate that DICE significantly outperforms both autoregressive and diffusion LLMs of comparable scale, establishing a new state-of-the-art for CUDA kernel generation.
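
The parallel token generation that motivates dLLMs can be illustrated with confidence-based unmasking, a decoding scheme common in masked diffusion language models. The sketch below is generic and not DICE's specific sampler. `toy_predict` is a hypothetical stand-in for a model forward pass that ignores context, and `tokens_per_step` is an illustrative parameter.

```python
from typing import Callable, Dict, List, Optional, Tuple

MASK = None  # placeholder for a still-masked position
Dist = Dict[int, Dict[str, float]]


def parallel_unmask(length: int, predict: Callable[[List[Optional[str]]], Dist],
                    tokens_per_step: int = 2) -> Tuple[List[Optional[str]], int]:
    """Start fully masked; at each step ask the model for a distribution at
    every masked position and commit the most confident predictions in
    parallel, leaving the rest masked for the next step."""
    seq: List[Optional[str]] = [MASK] * length
    steps = 0
    while any(t is MASK for t in seq):
        masked = [i for i, t in enumerate(seq) if t is MASK]
        dists = predict(seq)
        best = {i: max(dists[i].items(), key=lambda kv: kv[1]) for i in masked}
        chosen = sorted(masked, key=lambda i: best[i][1], reverse=True)
        for i in chosen[:tokens_per_step]:
            seq[i] = best[i][0]
        steps += 1
    return seq, steps


TARGET = ["__global__", "void", "add", "(", ")"]


def toy_predict(seq: List[Optional[str]]) -> Dist:
    """Hypothetical stand-in for a dLLM forward pass (ignores context)."""
    return {i: {TARGET[i]: 0.9 - 0.1 * i, "<unk>": 0.05}
            for i, t in enumerate(seq) if t is MASK}


tokens, n_steps = parallel_unmask(len(TARGET), toy_predict, tokens_per_step=2)
print(tokens, n_steps)  # 5 tokens decoded in 3 forward passes instead of 5
```

Because any position can be filled at any step, the same loop supports infilling a partially written kernel by starting from a sequence that is only partly masked. That is the setting of the first BiC-RL stage described above.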

2602.22277 2026-06-17 cs.LG eess.SP 版本更新

X-REFINE: XAI-based RElevance input-Filtering and archItecture fiNe-tuning for channel Estimation

X-REFINE:基于XAI的相关性输入过滤与架构微调用于信道估计

Abdul Karim Gizzini, Yahia Medjahdi

AI总结 提出X-REFINE框架,通过分解稳定化LRP epsilon规则联合优化输入过滤和架构微调,在信道估计中实现性能-复杂度-可解释性的优越权衡。

Comments This paper has been accepted for publication in the IEEE Transactions on Vehicular Technology (TVT) as a correspondence paper

AI中文摘要

AI原生架构对于6G无线通信至关重要。在信道估计等关键应用中采用的深度学习模型的黑盒特性和高复杂度限制了其实际部署。虽然基于扰动的可解释人工智能(XAI)解决方案提供了输入过滤,但它们往往忽略了内部结构优化。我们提出了X-REFINE,一个基于XAI的联合输入过滤和架构微调框架。通过利用基于分解的、符号稳定的LRP epsilon规则,X-REFINE反向传播预测以获取子载波和隐藏神经元的高分辨率相关性分数。这使得能够进行可靠的优化,识别出最可靠的模型组件。仿真结果表明,与基于外部扰动的XAI框架相比,X-REFINE实现了优越的性能-复杂度-可解释性权衡,显著降低了计算复杂度,同时保持了稳健的误码率(BER)性能。

英文摘要

AI-native architectures are vital for 6G wireless communications. The black-box nature and high complexity of deep learning models employed in critical applications, such as channel estimation, limit their practical deployment. While perturbation-based eXplainable Artificial Intelligence (XAI) solutions offer input filtering, they often neglect internal structural optimization. We propose X-REFINE, an XAI-based framework for joint input-filtering and architecture fine-tuning. By utilizing a decomposition-based, sign-stabilized LRP epsilon rule, X-REFINE backpropagates predictions to derive high-resolution relevance scores for both subcarriers and hidden neurons. This enables a reliable optimization that identifies the most reliable model components. Simulation results demonstrate that X-REFINE achieves a superior performance-complexity-interpretability trade-off compared to the external perturbation-based XAI frameworks, significantly reducing computational complexity while maintaining robust bit error rate (BER) performance.
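
The LRP epsilon rule (Bach et al., 2015) redistributes an output's relevance to the inputs in proportion to their contributions. Below is a minimal sketch for one dense layer. It uses a sign-matched stabilizer so that no denominator approaches zero. X-REFINE's decomposition-based variant, and its procedure for filtering subcarriers and pruning neurons from the scores, are not reproduced. The weights and the "subcarrier" reading of the inputs are illustrative.

```python
from typing import List


def lrp_epsilon(a: List[float], W: List[List[float]], b: List[float],
                R_out: List[float], eps: float = 1e-6) -> List[float]:
    """Redistribute output relevance R_out of a dense layer z = a @ W + b onto
    its inputs with the LRP-epsilon rule:
        R_i = a_i * sum_j W[i][j] * R_j / (z_j + eps * sign(z_j)).
    Taking sign(0) = +1 keeps every denominator at least eps in magnitude."""
    n_in, n_out = len(a), len(b)
    z = [sum(a[i] * W[i][j] for i in range(n_in)) + b[j] for j in range(n_out)]
    s = [R_out[j] / (z[j] + (eps if z[j] >= 0 else -eps)) for j in range(n_out)]
    return [a[i] * sum(W[i][j] * s[j] for j in range(n_out)) for i in range(n_in)]


# Two inputs (e.g. two pilot subcarriers) feeding two output units.
a = [1.0, 2.0]
W = [[1.0, -1.0],
     [0.5, 2.0]]
b = [0.0, 0.0]
R_in = lrp_epsilon(a, W, b, R_out=[2.0, 3.0])  # start from the outputs z = [2, 3]
print(R_in)  # close to [0.0, 5.0]: the second input carries all the relevance
```

With zero bias and small `eps`, total relevance is approximately conserved (here about 5 in and 5 out). Applied layer by layer, the same rule scores hidden neurons as well as inputs. Consistently low-relevance units would then be candidates for removal.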

2604.17616 2026-06-17 cs.LG 版本更新

Conditional Attribution for Root Cause Analysis in Time-Series Anomaly Detection

时间序列异常检测中根因分析的条件归因

Shashank Mishra, Karan Patil, Cedric Schockaert, Didier Stricker, Jason Rambach

AI总结 提出一种条件归因框架,通过检索与异常观测上下文相似的正态实例进行依赖保持的解释,结合变分自编码器和UMAP流形嵌入实现高维时间序列的高效归因,并在SWaT和MSDS基准上提升了根因识别准确率与鲁棒性。

Comments Accepted at ECML PKDD. 16 pages, 8 figures, 13 tables, and an appendix

Journal ref
ECML PKDD 2026
AI中文摘要

根因分析对于时间序列异常检测在复杂真实世界系统的可靠运行中至关重要。现有的解释方法通常依赖于不切实际的特征扰动,并忽略时间依赖和跨特征依赖,导致归因不可靠。我们提出了一种条件归因框架,该框架相对于上下文相似的正态系统状态来解释异常。我们的方法不是使用边际或随机采样的基线,而是检索以异常观测为条件的代表性正态实例,从而实现依赖保持且操作上有意义的解释。为了支持高维时间序列数据,在学习的低维表示中使用变分自编码器潜在空间和UMAP流形嵌入进行上下文检索。通过将检索过程基于系统学习的流形,该策略避免了分布外伪影,并在保持计算效率的同时确保归因保真度。我们进一步引入了置信感知和时间评估指标,用于评估解释的可靠性和响应性。在SWaT和MSDS基准上的实验表明,所提出的方法在多个异常检测模型上持续提高了根因识别准确率、时间定位和鲁棒性。这些结果突显了条件归因在复杂时间序列系统中用于可解释异常诊断的实际效用。代码和模型已公开:https://github.com/dfki-av/Conditional-Attribution-for-Root-Cause-Analysis-in-Time-Series-Anomaly-Detection。

英文摘要

Root cause analysis (RCA) for time-series anomaly detection is critical for the reliable operation of complex real-world systems. Existing explanation methods often rely on unrealistic feature perturbations and ignore temporal and cross-feature dependencies, leading to unreliable attributions. We propose a conditional attribution framework that explains anomalies relative to contextually similar normal system states. Instead of using marginal or randomly sampled baselines, our method retrieves representative normal instances conditioned on the anomalous observation, enabling dependency-preserving and operationally meaningful explanations. To support high-dimensional time-series data, contextual retrieval is performed in learned low-dimensional representations using both variational autoencoder latent spaces and UMAP manifold embeddings. By grounding the retrieval process in the system's learned manifold, this strategy avoids out-of-distribution artifacts and ensures attribution fidelity while maintaining computational efficiency. We further introduce confidence-aware and temporal evaluation metrics for assessing explanation reliability and responsiveness. Experiments on the SWaT and MSDS benchmarks demonstrate that the proposed approach consistently improves root-cause identification accuracy, temporal localization, and robustness across multiple anomaly detection models. These results highlight the practical utility of conditional attribution for explainable anomaly diagnosis in complex time-series systems. Code and models are available at: https://github.com/dfki-av/Conditional-Attribution-for-Root-Cause-Analysis-in-Time-Series-Anomaly-Detection.
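
The key step is choosing the baseline. Instead of a marginal average, the method uses the normal states closest to the anomaly in a learned latent space. Below is a minimal sketch of that step. It assumes latent codes were already produced by an encoder (VAE or UMAP, not shown). A simple feature-replacement (occlusion) attribution stands in for whichever attribution method consumes the baseline. The detector, data, and `k` are hypothetical.

```python
import math
from typing import Callable, List, Sequence


def knn_baseline(z_query: Sequence[float], normal_latents: List[Sequence[float]],
                 normal_raw: List[Sequence[float]], k: int = 2) -> List[float]:
    """Average the raw feature vectors of the k normal states whose latent
    codes are closest to the anomalous sample's latent code."""
    nearest = sorted(range(len(normal_latents)),
                     key=lambda i: math.dist(z_query, normal_latents[i]))[:k]
    dim = len(normal_raw[0])
    return [sum(normal_raw[i][d] for i in nearest) / k for d in range(dim)]


def conditional_attribution(x: Sequence[float], baseline: Sequence[float],
                            score: Callable[[Sequence[float]], float]) -> List[float]:
    """Attribution of feature d = drop in anomaly score when x[d] alone is
    reset to its context-matched normal value."""
    full = score(x)
    attributions = []
    for d in range(len(x)):
        x_fixed = list(x)
        x_fixed[d] = baseline[d]
        attributions.append(full - score(x_fixed))
    return attributions


# Hypothetical detector: sensor 2 should read about 10x sensor 1.
def score(v: Sequence[float]) -> float:
    return (v[1] - 10.0 * v[0]) ** 2


normal_raw = [[1.0, 10.0], [1.2, 12.0], [5.0, 50.0]]
normal_latents = [[0.0], [0.1], [5.0]]   # would come from a VAE/UMAP encoder
baseline = knn_baseline([0.05], normal_latents, normal_raw, k=2)
print(conditional_attribution([1.1, 30.0], baseline, score))
```

The retrieved baseline matches the anomaly's operating regime (sensor 1 near 1.1), so all the attribution lands on sensor 2. A global mean baseline of all three normal states would sit in a different regime and would also assign spurious attribution to sensor 1.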

2606.11990 2026-06-17 cs.LG cs.AI 版本更新

Time-Series Foundation Model Embeddings for Remaining Useful Life Estimation

用于剩余使用寿命估计的时间序列基础模型嵌入

Amir El-Ghoussani, Michele De Vita, Ronald Naumann, Vasileios Belagiannis

发表机构 * University of Erlangen-Nuremberg(埃尔朗根-纽伦堡大学) Siemens AG(西门子股份公司)

AI总结 提出冻结预训练时间序列基础模型Chronos-2作为骨干,结合轻量回归头进行剩余寿命预测,在工业传感器数据上优于多种基线方法。

Comments Accepted to EUSIPCO 2026, 4 pages, 2 figures, 2 tables

AI中文摘要

剩余使用寿命(RUL)预测对于工业预测性维护至关重要,然而许多基于学习的方法依赖于大量的特征工程或大型标注数据集来训练特定任务的序列模型。在这项工作中,我们引入了一种轻量级学习方法,利用冻结的预训练时间序列基础模型(TSFM),并将其与一个小型回归头结合,用于从多变量传感器流中估计RUL。具体来说,我们使用Chronos-2作为冻结骨干来提取上下文窗口特征,并训练一个轻量级回归神经网络进行RUL预测。在来自两种设备类型的真实工业传感器数据上的实验表明,在相同的预处理和评估协议下,Chronos-2特征一致地优于循环、卷积、基于Transformer和梯度提升基线。我们进一步分析了上下文长度的影响,发现随着历史记录变长,性能显著提升,这表明TSFM表示为工业环境中的RUL估计提供了一种实用且数据高效的替代方案。

英文摘要

Remaining Useful Life (RUL) prediction is essential for industrial predictive maintenance, yet many learning-based approaches rely on extensive feature engineering or large labeled datasets to train task-specific sequence models. In this work, we introduce a lightweight learning approach, in which we leverage a frozen pretrained time-series foundation model (TSFM) and combine it with a small regression head for RUL estimation from multivariate sensor streams. More specifically, we use Chronos-2 as a frozen backbone to extract context window features and train a lightweight regression neural network for RUL prediction. Experiments on real-world industrial sensor data from two device types show that Chronos-2 features consistently improve over recurrent, convolutional, Transformer-based, and gradient-boosting baselines under the same preprocessing and evaluation protocol. We further analyze the impact of context length and find that performance improves significantly with longer histories, indicating that TSFM representations offer a practical and data-efficient alternative for RUL estimation in industrial settings.
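
The recipe is a frozen encoder with only a small head trained on top. Below is a minimal sketch of that pattern. It makes two loud substitutions. `frozen_embed` is a hypothetical hand-written feature function standing in for Chronos-2 embeddings, which are not called here. A closed-form ridge regression replaces the paper's regression neural network. The training data and the RUL formula are synthetic.

```python
from typing import List, Sequence


def frozen_embed(window: Sequence[float]) -> List[float]:
    """Hypothetical stand-in for a frozen TSFM encoder such as Chronos-2:
    returns [bias, mean level, net change] of a univariate sensor window."""
    return [1.0, sum(window) / len(window), window[-1] - window[0]]


def solve(A: List[List[float]], y: List[float]) -> List[float]:
    """Gaussian elimination with partial pivoting for a small square system."""
    n = len(A)
    M = [list(row) + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def fit_ridge_head(feats: List[List[float]], targets: List[float],
                   lam: float = 1e-3) -> List[float]:
    """Closed-form ridge regression (X^T X + lam I) w = X^T y; only the head is trained."""
    d = len(feats[0])
    A = [[sum(f[i] * f[j] for f in feats) + (lam if i == j else 0.0)
          for j in range(d)] for i in range(d)]
    rhs = [sum(f[i] * t for f, t in zip(feats, targets)) for i in range(d)]
    return solve(A, rhs)


def predict_rul(weights: List[float], window: Sequence[float]) -> float:
    return sum(wi * fi for wi, fi in zip(weights, frozen_embed(window)))


# Synthetic training windows with mean level m and net change s, and a
# made-up ground truth RUL = 50 - 5*m - 10*s.
windows = [[m - s / 2, m + s / 2] for m in range(5) for s in (0.0, 1.0)]
ruls = [50 - 5 * (win[0] + win[1]) / 2 - 10 * (win[1] - win[0]) for win in windows]
weights = fit_ridge_head([frozen_embed(win) for win in windows], ruls, lam=1e-6)
print(round(predict_rul(weights, [2.0, 3.0]), 2))  # about 27.5
```

The design choice the abstract highlights is that the encoder is never updated. Only a few head parameters are fit, which is why the approach can work with small labeled maintenance datasets.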

2606.16878 2026-06-17 cs.LG 版本更新

Integrated Marketing Attribution: A Bayesian Framework for Privacy-Safe Granular Measurement Anchored in MMM

集成营销归因:基于贝叶斯框架的隐私安全粒度测量,锚定于MMM

Meghana R. Bhat, Ankit Umare, Utsav Aggarwal, Richard Vecsler, Arunkumar Mani, Karthik Nair, Chandhu Nair

AI总结 提出集成营销归因(IMA)框架,结合营销组合模型(MMM)与贝叶斯归因模型,从聚合数据中推导出活动级效果,实现隐私安全且粒度精细的归因。

AI中文摘要

零售营销测量日益需要精细的活动级洞察,而无需依赖用户级跟踪。然而,两种主流方法——营销组合模型(MMM)和多触点归因(MTA)——常常产生碎片化的洞察。MMM在渠道级规划中隐私安全且稳健,但对于活动优化过于粗糙;而MTA提供精细归因,但在日益增加的隐私限制下变得不太可靠。我们提出集成营销归因(IMA),一个统一框架,将MMM与特定渠道的贝叶斯归因模型相结合,从聚合数据中推导活动级效果。通过利用MMM信息先验,IMA提供精细、隐私安全的归因,同时保持与MMM的一致性。

英文摘要

Retail marketing measurement increasingly requires granular campaign-level insights without relying on user-level tracking. However, the two dominant approaches, Marketing Mix Modeling (MMM) and Multi-Touch Attribution (MTA), often produce fragmented insights. MMM is privacy-safe and robust for channel-level planning but is too coarse for campaign optimization, while MTA provides granular attribution but has become less reliable under increasing privacy restrictions. We propose Integrated Marketing Attribution (IMA), a unified framework that combines MMM with channel-specific Bayesian attribution models to derive campaign-level effects from aggregated data. By leveraging MMM-informed priors, IMA delivers granular, privacy-safe attribution while preserving consistency with MMM.
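
"MMM-informed priors" can be illustrated with the simplest Bayesian case, a normal-normal conjugate update. In this update, each campaign's noisy ROI estimate is shrunk toward the channel ROI reported by the MMM. The sketch assumes Gaussian priors and likelihoods, a hypothetical `prior_sd`, and made-up campaign names. The final rescaling step is one assumed way to keep campaign totals consistent with the MMM channel contribution. IMA's actual channel-specific models are not described in enough detail here to reproduce.

```python
from typing import Dict, Tuple


def posterior_roi(prior_mean: float, prior_sd: float,
                  estimate: float, std_err: float) -> Tuple[float, float]:
    """Normal-normal conjugate update: precision-weighted average of the
    MMM-informed prior and a noisy campaign-level ROI estimate."""
    w_prior, w_data = 1.0 / prior_sd ** 2, 1.0 / std_err ** 2
    mean = (w_prior * prior_mean + w_data * estimate) / (w_prior + w_data)
    return mean, (w_prior + w_data) ** -0.5


def campaign_contributions(channel_roi: float, spend: Dict[str, float],
                           estimates: Dict[str, Tuple[float, float]],
                           prior_sd: float = 0.5) -> Dict[str, float]:
    """Shrink each campaign's ROI toward the channel ROI from the MMM, then
    rescale so campaign contributions add up to the MMM channel contribution."""
    raw = {c: posterior_roi(channel_roi, prior_sd, *estimates[c])[0] * spend[c]
           for c in spend}
    scale = channel_roi * sum(spend.values()) / sum(raw.values())
    return {c: v * scale for c, v in raw.items()}


spend = {"spring_sale": 100.0, "loyalty_push": 100.0}
estimates = {"spring_sale": (3.0, 0.25),   # precise estimate, barely shrunk
             "loyalty_push": (1.0, 1.0)}   # noisy estimate, pulled toward 2.0
print(campaign_contributions(2.0, spend, estimates))
```

Precise campaign signals keep most of their own value, while noisy ones fall back toward the channel-level MMM estimate. After rescaling, the campaign contributions still sum to the MMM channel total (400 here).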

2403.18957 2026-06-17 cs.CY cs.CL cs.LG cs.SI Updated version

Moderating Illicit Online Image Promotion for Unsafe User-Generated Content Games Using Large Vision-Language Models

使用大型视觉语言模型审核不安全的用户生成内容游戏中的非法在线图像推广

Keyan Guo, Ayush Utkarsh, Wenbo Ding, Isabelle Ondracek, Ziming Zhao, Guo Freeman, Nishant Vishwamitra, Hongxin Hu

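AI summary Targets images on social media that illicitly promote unsafe UGC games. Proposes UGCG-Guard, which uses large vision-language models and a conditional prompting strategy for zero-shot domain adaptation and reaches 94% detection accuracy.

Comments In Proceedings of the 33rd USENIX Conference on Security Symposium (SEC '24), August 14-16, 2024

AI Chinese abstract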

在线用户生成内容游戏(UGCG)在儿童和青少年中越来越受欢迎,用于社交互动和更具创造性的在线娱乐。然而,它们带来了更高的接触露骨内容的风险,引发了对儿童和青少年在线安全的日益关注。尽管存在这些担忧,但很少有研究关注社交媒体上基于图像的非法不安全UGCG推广问题,这种推广可能无意中吸引年轻用户。这一挑战源于难以获得全面的UGCG图像训练数据以及这些图像与传统不安全内容不同的独特性质。在这项工作中,我们迈出了研究不安全UGCG非法推广威胁的第一步。我们收集了一个包含2,924张图像的真实世界数据集,这些图像展示了游戏创作者用于推广UGCG的多种色情和暴力内容。我们的深入研究揭示了对此问题的新认识,以及自动标记非法UGCG推广的迫切需求。我们还创建了一个尖端系统UGCG-Guard,旨在帮助社交媒体平台有效识别用于非法UGCG推广的图像。该系统利用最近引入的大型视觉语言模型(VLM),并采用一种新颖的条件提示策略进行零样本域适应,以及思维链(CoT)推理进行上下文识别。UGCG-Guard取得了出色结果,在现实场景中检测这些用于非法推广游戏的图像时准确率达到94%。

English abstract

Online user-generated content games (UGCGs) are increasingly popular among children and adolescents for social interaction and more creative online entertainment. However, they pose a heightened risk of exposure to explicit content, raising growing concerns for the online safety of children and adolescents. Despite these concerns, few studies have addressed the issue of illicit image-based promotions of unsafe UGCGs on social media, which can inadvertently attract young users. This challenge arises from the difficulty of obtaining comprehensive training data for UGCG images and the unique nature of these images, which differ from traditional unsafe content. In this work, we take the first step towards studying the threat of illicit promotions of unsafe UGCGs. We collect a real-world dataset comprising 2,924 images that display diverse sexually explicit and violent content used to promote UGCGs by their game creators. Our in-depth studies reveal a new understanding of this problem and the urgent need for automatically flagging illicit UGCG promotions. We additionally create a cutting-edge system, UGCG-Guard, designed to aid social media platforms in effectively identifying images used for illicit UGCG promotions. This system leverages recently introduced large vision-language models (VLMs) and employs a novel conditional prompting strategy for zero-shot domain adaptation, along with chain-of-thought (CoT) reasoning for contextual identification. UGCG-Guard achieves outstanding results, with an accuracy rate of 94% in detecting these images used for the illicit promotion of such games in real-world scenarios.

2410.08562 2026-06-17 cond-mat.mtrl-sci cs.LG Updated version

Adaptable Method for Crystal Design across Diverse Constraints and Objectives with Pretrained Property Predictors

基于预训练属性预测器的可适应方法用于跨多样约束与目标的晶体设计

Akihiro Fujii, Yoshitaka Ushiku, Koji Shimizu, Anh Khoa Augustin Lu, Satoshi Watanabe

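AI summary Proposes direct predictor-guided gradient optimization for data-efficient, constraint-rich crystal design. The method combines off-the-shelf predictors, site-wise element masks, template initialization, and task-specific losses. It outperforms generative and Bayesian baselines on perovskites and also supports half-metal design.

AI Chinese abstract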

先进的晶体设计可以加速从光伏到自旋电子学等应用中的材料发现。实际设计必须满足多种属性和物理约束,然而现有的基于机器学习的方法通常依赖于大型数据集、重新训练或任务特定的生成器。在这里,我们展示了直接预测器引导的梯度优化通过结合现成预测器与位点元素掩码、模板初始化和任务特定损失,实现了数据高效、约束丰富的晶体设计。在钙钛矿中,它在三个目标——带隙、形成能和容忍因子——以及两个硬约束下优于生成和贝叶斯基线。DFT评估进一步表明,尽管使用的预测器训练数据约为领先生成模型的十分之一,其带隙目标性能仍具有竞争力。通过灵活组合预训练预测器与应用导向的掩码和自定义损失,同一框架支持半金属设计。这种模块化可以帮助研究人员和工程师将多样化的应用需求直接转化为优化的候选晶体,且计算成本最低。

English abstract

Advanced crystal design can accelerate materials discovery across applications from photovoltaics to spintronics. Practical design must satisfy multiple properties and physical constraints, yet existing machine-learning-based approaches to such design often depend on large datasets, retraining, or task-specific generators. Here, we show that direct predictor-guided gradient optimization enables data-efficient, constraint-rich crystal design by combining off-the-shelf predictors with site-wise element masks, template initialization, and task-specific losses. In perovskites, it outperformed generative and Bayesian baselines under three targets -- band gap, formation energy, and tolerance factor -- and two hard constraints. DFT assessment further showed band-gap targeting competitive with a leading generative model despite using predictors trained on roughly one-tenth of the data. By flexibly combining pretrained predictors with application-oriented masks and custom losses, the same framework supported half-metal design. Such modularity could help researchers and engineers translate diverse application requirements directly into optimized candidate crystals with minimal computational cost.
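Below is a toy, single-site sketch of predictor-guided gradient optimization with a site-wise element mask. Each site holds a softmax distribution over candidate elements, and forbidden elements are masked to probability zero. The logits are then updated by gradient descent on the squared error between a predictor's output and the target. The linear "predictor", the property values, and the function names are hypothetical stand-ins. The paper applies pretrained property predictors to full crystal structures.

```python
import math


def masked_softmax(logits):
    """Softmax in which masked entries (logit -inf) get exactly zero probability."""
    m = max(z for z in logits if z != -math.inf)
    exps = [0.0 if z == -math.inf else math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def optimize_site(values, allowed, target, lr=0.5, steps=500):
    """Gradient descent on one site's element logits toward a target property.

    values:  per-element property contributions; a hypothetical *linear*
             surrogate standing in for a differentiable pretrained predictor.
    allowed: site-wise element mask (False = element forbidden at this site).
    """
    logits = [0.0 if ok else -math.inf for ok in allowed]
    for _ in range(steps):
        p = masked_softmax(logits)
        pred = sum(pi * vi for pi, vi in zip(p, values))
        # For pred = sum_j p_j v_j, d(pred - target)^2 / dz_i
        # = 2 (pred - target) p_i (v_i - pred); masked entries have p_i = 0.
        grads = [2.0 * (pred - target) * pi * (vi - pred) for pi, vi in zip(p, values)]
        logits = [z - lr * g for z, g in zip(logits, grads)]
    p = masked_softmax(logits)
    return p, sum(pi * vi for pi, vi in zip(p, values))


# Three candidate elements; the third is forbidden at this site by the mask.
p, pred = optimize_site(values=[1.0, 2.0, 5.0], allowed=[True, True, False], target=1.8)
print([round(x, 3) for x in p], round(pred, 3))  # roughly [0.2, 0.8, 0.0] and 1.8
```

The mask acts as a hard constraint. Without it, the optimizer could also reach the target by mixing in the forbidden element.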

2503.04507 2026-06-17 q-bio.QM cs.LG Updated version

The Morse Transform for Discrete Shape Analysis

离散形状分析的Morse变换

Alexander M. Tanaka, Aras T. Asaad, Richard Cooper, Vidit Nanda

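AI summary Proposes a topological transform based on directional piecewise-linear Morse theory. It quantifies the geometry of an embedded object by recording critical points under multiple height functions, and the resulting feature vectors achieve the best mean AUROC in ligand-based virtual screening.

Comments 37 pages, 3 main figures, 2 main tables, 12 appendix figures and 4 appendix tables

AI Chinese abstract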

物体的几何形状在调节其与物理世界的相互作用中起着至关重要的作用。然而,为了统计推断或分类任务的目的,用数值描述几何信息仍然困难。在这里,我们引入了一种新的拓扑变换,它利用定向分段线性Morse理论,通过编录多个高度函数下的临界点来量化嵌入对象的几何形状。该Morse变换的输出记录了表征底层形状的临界点的高度和局部拓扑类型(峰、谷或鞍点),保留了比欧拉特征变换更精细的信息,同时自然优先考虑形状的最外层区域。关键的是,该输出可以进一步压缩为丰富而紧凑的特征向量。我们将Morse特征向量作为配体虚拟筛选(LBVS)的描述符进行基准测试,这本质上依赖于分子的形状。在常见的梯度提升树分类流程下,与其他拓扑变换描述符和标准基于形状的LBVS描述符相比,Morse描述符实现了最高的平均AUROC。

English abstract

The geometry of an object plays a vital role in modulating its interactions with the physical world. It nevertheless remains difficult to describe geometric information numerically for the purposes of statistical inference or classification tasks. Here, we introduce a new topological transform which leverages directional piecewise-linear Morse theory to quantify the geometry of an embedded object by cataloguing critical points across multiple height-functions. The output of this Morse transform records both the heights and the local topological type (peak, trough or saddle) of the critical points that characterise the underlying shape, retaining finer information than the Euler characteristic transform whilst naturally prioritising a shape's outermost regions. Crucially, this output can be further compressed into a rich but compact feature vector. We benchmark the Morse feature vector as a descriptor for ligand-based virtual screening (LBVS), which intrinsically depends on the shape of molecules. Under a common gradient-boosted tree classification pipeline, Morse descriptors achieve the highest mean AUROC when compared to other topological transform descriptors and to standard shape-based LBVS descriptors.
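Take a triangulated surface and a height function h_d(x) = <x, d>. The classical piecewise-linear criterion classifies a vertex by counting sign changes of h_d around its link. The sketch below implements that textbook test for one vertex and one direction. Collecting every non-regular vertex together with its height and type, over many directions, would give a record of the kind the abstract describes. This is not the authors' Morse transform, which also aggregates over directions and compresses the result into a feature vector. The function names are hypothetical.

```python
def classify_vertex(h_vertex, link_heights):
    """Classify a vertex of a triangulated surface under a PL height function.

    h_vertex:     height of the vertex, e.g. <x, d> for a direction d.
    link_heights: heights of its link (neighbouring) vertices in cyclic order.
    Counts sign changes of (neighbour height - vertex height) around the link;
    assumes generic heights (no ties with the vertex).
    """
    signs = [1 if h > h_vertex else -1 for h in link_heights]
    n = len(signs)
    changes = sum(signs[i] != signs[(i + 1) % n] for i in range(n))
    if changes == 0:
        return "peak" if signs[0] < 0 else "trough"
    if changes == 2:
        return "regular"
    return "saddle"  # 4 or more changes; 6+ indicates a degenerate (monkey) saddle


def height(point, direction):
    """Height function h_d(x) = <x, d>."""
    return sum(p * d for p, d in zip(point, direction))


up = (0.0, 0.0, 1.0)
vertex = (0.0, 0.0, 0.0)
ring = [(1, 0, 0.5), (0, 1, -0.5), (-1, 0, 0.5), (0, -1, -0.5)]  # up, down, up, down
print(classify_vertex(height(vertex, up), [height(u, up) for u in ring]))  # saddle
```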

2503.05598 2026-06-17 cs.CE cs.LG Updated version

From Theory to Application: A Practical Introduction to Neural Operators in Scientific Computing

从理论到应用:神经算子在科学计算中的实用入门

Prashant K. Jha

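AI summary Reviews neural network architectures for learning solution operators of parametric PDEs, including DeepONet, PCANet, and the Fourier Neural Operator. It evaluates them on three canonical problems and discusses their use as surrogate models for Bayesian inverse problems, along with the associated challenges.

Comments 72 pages, 22 figures, GitHub repository: https://github.com/CEADpx/neural_operators

AI Chinese abstract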

本综述考察了用于学习参数偏微分方程(PDE)解算子的神经算子架构,重点强调概念清晰性和实际实现。该工作分析了关键模型,包括DeepONet、PCANet和傅里叶神经算子,突出了它们的基础表示、计算结构和比较性能。这些架构在三个典型PDE问题上进行了演示:泊松方程、线弹性问题和超弹性问题。为使内容自包含,介绍了关键基础主题,包括函数空间的有限维表示、奇异值分解以及从无限维函数空间中采样。除了正向建模,本综述还讨论了在贝叶斯逆问题框架中使用神经算子作为替代模型,包括先验指定、正向映射近似和后验计算。三种神经算子架构的性能在分布内样本、分布外样本和贝叶斯推断任务上进行了评估。本综述还讨论了与预测精度和泛化相关的挑战,概述了新兴策略,如基于残差的误差校正和多层级训练。最后,本综述将神经算子定位在更广泛的科学计算工作流中,并指出了实现可靠、可扩展算子学习的方向。

English abstract

This review examines neural operator architectures for learning solution operators of parametric partial differential equations (PDEs), with an emphasis on conceptual clarity and practical implementation. The work analyzes key models, including DeepONet, PCANet, and the Fourier Neural Operator, highlighting their underlying representations, computational structures, and comparative performance. These architectures are demonstrated on three canonical PDE problems: the Poisson equation, a linear elasticity problem, and a hyperelasticity problem. To make the presentation self-contained, key foundational topics are introduced, including finite-dimensional representations of function spaces, singular-value decomposition, and sampling from infinite-dimensional function spaces. Beyond forward modeling, the review discusses the use of neural operators as surrogate models within a Bayesian inverse-problem framework, including prior specification, forward-map approximation, and posterior computation. The performance of the three neural-operator architectures is evaluated on in-distribution samples, out-of-distribution samples, and Bayesian inference tasks. The review also discusses challenges related to prediction accuracy and generalization, outlining emerging strategies such as residual-based error correction and multi-level training. The review concludes by positioning neural operators within broader scientific-computing workflows and by identifying directions for reliable, scalable operator learning.
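DeepONet approximates an operator output as an inner product, G(u)(y) ≈ Σ_k b_k(u) t_k(y). A branch network reads the input function u at fixed sensor locations, and a trunk network evaluates basis functions at the query point y. The toy below hand-codes both networks (no training) so that the inner product reproduces the antiderivative operator exactly for linear inputs. The operator, sensor locations, and function names are illustrative assumptions, not taken from the review.

```python
def branch(u_sensors):
    """Hand-built branch net: sensor readings u(0), u(1) -> coefficients.

    For u(x) = a + b*x this returns (a, b) exactly.
    """
    u0, u1 = u_sensors
    return [u0, u1 - u0]


def trunk(y):
    """Hand-built trunk net: basis functions evaluated at the query point y."""
    return [y, y * y / 2.0]


def deeponet(u_sensors, y):
    """DeepONet-style output G(u)(y) = sum_k b_k(u) * t_k(y)."""
    return sum(b * t for b, t in zip(branch(u_sensors), trunk(y)))


def u(x):
    """Example input function u(x) = 2 + 3x."""
    return 2.0 + 3.0 * x


# Antiderivative G(u)(y) = integral_0^y u(x) dx = 2y + 3y^2/2; at y = 2 this is 10.
print(deeponet([u(0.0), u(1.0)], 2.0))  # 10.0
```

In a trained DeepONet both networks are learned from data. Hand-coding them here only makes the branch/trunk factorisation concrete.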

2503.17867 2026-06-17 cs.CR cs.AI cs.LG cs.NI Updated version

Detecting and Mitigating DDoS Attacks with AI: A Survey

利用人工智能检测和缓解DDoS攻击:综述

Alexandru Apostu, Silviu Gheorghe, Andrei Hîji, Nicolae Cleju, Andrei Pătraşcu, Cristian Rusu, Radu Ionescu, Paul Irofti

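Affiliations * Department of Computer Science, University of Bucharest(布加勒斯特大学计算机科学系)

AI summary Surveys AI-based DDoS attack detection and mitigation methods. It provides a taxonomy based on expert hierarchies and an AI-generated dendrogram, and discusses datasets, adversarial training, and future research directions.

AI Chinese abstract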

分布式拒绝服务攻击是一个活跃的网络安全研究问题。最近的研究从基于静态规则的防御转向基于AI的检测和缓解。本综述涵盖了几个关键主题。首先,讨论了最先进的AI检测方法。提供了基于手动专家层次和AI生成的树状图的深入分类法,从而解决了DDoS分类的歧义。随后讨论了可用的数据集,涵盖了数据格式选项及其在训练AI检测方法中的作用,以及对抗训练和示例增强。除了检测,还调查了基于AI的缓解技术。最后,提出了多个开放的研究方向。

English abstract

Distributed Denial of Service (DDoS) attacks represent an active cybersecurity research problem. Recent research has shifted from static rule-based defenses towards AI-based detection and mitigation. This comprehensive survey covers several key topics. First, state-of-the-art AI detection methods are discussed. An in-depth taxonomy based on manual expert hierarchies and an AI-generated dendrogram are provided, which resolve ambiguities in DDoS categorization. A discussion of available datasets follows, covering data format options and their role in training AI detection methods, together with adversarial training and example augmentation. Beyond detection, AI-based mitigation techniques are surveyed as well. Finally, multiple open research directions are proposed.

2509.15210 2026-06-17 cs.SD cs.AI cs.LG Updated version

Explicit Context-Driven Neural Acoustic Modeling for High-Fidelity RIR Generation

显式上下文驱动的神经声学建模用于高保真RIR生成

Chen Si, Qianyi Wu, Chaitanya Amballa, Romit Roy Choudhury

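AI summary Proposes MiNAF, which queries a room mesh and extracts distance distributions as explicit local geometric features. These features guide a neural implicit model toward more accurate room impulse responses (RIRs), and the model performs competitively across multiple metrics.

AI Chinese abstract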

逼真的声音模拟在许多应用中起着关键作用。声音模拟的一个关键要素是房间脉冲响应(RIR),它描述了声音在给定空间中的传播方式。最近的研究应用神经隐式方法,利用从环境中收集的上下文信息(如场景图像)来学习RIR。然而,这些方法没有有效利用环境中的显式几何信息。为了进一步利用具有直接几何特征的神经隐式模型,我们提出了MiNAF,它在给定位置查询粗略的房间网格,并提取距离分布作为局部上下文的显式表示。我们的方法表明,结合显式的局部几何特征可以更好地引导模型生成更准确的RIR预测。通过与常规和最先进方法的比较,我们展示了MiNAF在各种评估指标上具有竞争力的性能。

English abstract

Realistic sound simulation plays a critical role in many applications. A key element in sound simulation is the room impulse response (RIR), which characterizes how sound propagates within a given space. Recent studies have applied neural implicit methods to learn RIR using context information collected from the environment, such as scene images. However, these approaches do not effectively leverage explicit geometric information from the environment. To further exploit neural implicit models with direct geometric features, we present MiNAF, which queries a rough room mesh at given locations and extracts distance distributions as an explicit representation of local context. Our approach demonstrates that incorporating explicit local geometric features can better guide the model in generating more accurate RIR predictions. Through comparisons with conventional and state-of-the-art methods, we show that MiNAF performs competitively across various evaluation metrics.

2509.26476 2026-06-17 cs.CL cs.AI cs.LG cs.PF cs.SE Updated version

Regression Language Models for Code

代码的回归语言模型

Yash Akhauri, Xingyou Song, Arissa Wongpanich, Bryan Lewandowski, Mohamed S. Abdelfattah

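AI summary Proposes Regression Language Models (RLMs) that use a frozen LLM encoder to predict code execution outcomes (e.g., memory footprint, latency, neural network accuracy) directly from text. The models achieve high correlation on multiple tasks.

Comments Published in International Conference on Machine Learning (ICML) 2026

AI Chinese abstract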

我们研究代码到指标的回归:预测代码执行的数值结果,由于编程语言的开放性,这是一项具有挑战性的任务。虽然先前的方法依赖于繁重且特定领域的特征工程,但我们展示了一个统一的回归语言模型(RLM),使用冻结的LLM编码器可以直接从文本同时预测:(i) 多种高级语言(如Python和C++)代码的内存占用,(ii) Triton GPU内核的延迟,以及(iii) 以ONNX表示的已训练神经网络的精度和速度。特别是,一个基于T5Gemma的较小300M参数RLM在APPS的竞赛编程提交上获得了>0.9的Spearman等级相关系数,而单个统一模型在CodeNet的17种不同语言上获得了>0.5的平均Spearman等级相关系数。此外,RLM在五个经典NAS设计空间上获得了最高平均Kendall-Tau 0.46,这些空间此前由图神经网络主导,并且能同时预测多种硬件平台上的架构延迟。

English abstract

We study code-to-metric regression: predicting numeric outcomes of code executions, a challenging task due to the open-ended nature of programming languages. While prior methods have resorted to heavy, domain-specific feature engineering, we show that a single unified Regression Language Model (RLM) using a frozen LLM encoder can simultaneously predict, directly from text, (i) the memory footprint of code across multiple high-level languages such as Python and C++, (ii) the latency of Triton GPU kernels, and (iii) the accuracy and speed of trained neural networks represented in ONNX. In particular, a relatively small 300M-parameter RLM based on T5Gemma obtains >0.9 Spearman-rank on competitive programming submissions from APPS, and a single unified model achieves >0.5 average Spearman-rank across 24 different programming languages from CodeNet. Furthermore, the RLM obtains the highest average Kendall-Tau of 0.46 on five classic NAS design spaces previously dominated by graph neural networks, and can simultaneously predict architecture latencies on numerous hardware platforms.
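The headline numbers here are rank correlations. They score whether the model orders programs correctly, not whether its predicted values are calibrated. Below is a dependency-free sketch of Spearman's rank correlation, i.e. the Pearson correlation of average ranks; `scipy.stats.spearmanr` computes the same quantity. The memory figures in the example are made up.

```python
def average_ranks(xs):
    """1-based ranks, with tied values receiving the mean of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    ranks = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(pred, true):
    """Spearman rank correlation: Pearson correlation of the average ranks."""
    rp, rt = average_ranks(pred), average_ranks(true)
    n = len(rp)
    mp, mt = sum(rp) / n, sum(rt) / n
    cov = sum((a - mp) * (b - mt) for a, b in zip(rp, rt))
    sp = sum((a - mp) ** 2 for a in rp) ** 0.5
    st = sum((b - mt) ** 2 for b in rt) ** 0.5
    return cov / (sp * st)


# Hypothetical predicted vs measured memory (MB) for four programs.
print(spearman([10, 20, 15, 40], [12, 30, 18, 35]))  # ~1.0: same ordering
```

Because only ranks matter, a regressor can score a Spearman-rank near 1 even when its absolute predictions are biased.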

2510.19838 2026-06-17 cs.AI cs.CL cs.LG Updated version

Branch-and-Browse: Efficient and Controllable Web Exploration with Tree-Structured Reasoning and Action Memory

Branch-and-Browse:具有树状推理与动作记忆的高效可控网页探索

Shiqi He, Yue Cui, Xinyu Ma, Yaliang Li, Bolin Ding, Mosharaf Chowdhury

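AI summary Proposes Branch-and-Browse, which combines tree-structured reasoning, web state replay, and a page action memory for efficient, controllable multi-branch exploration by LLM web agents. It reaches a 35.8% success rate on WebArena and cuts execution time by up to 40.4%.

AI Chinese abstract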

由大型语言模型(LLM)驱动的自主网页代理在执行目标导向任务(如信息检索、报告生成和在线交易)方面展现出强大潜力。这些代理标志着向开放网络环境中实用具身推理的关键一步。然而,现有方法在推理深度和效率方面仍然受限:简单的线性方法无法进行多步推理且缺乏有效的回溯,而其他搜索策略则粗粒度且计算成本高。我们引入了Branch-and-Browse,一个细粒度的网页代理框架,它统一了结构化推理-行动、上下文记忆和高效执行。它(i)采用显式子任务管理与树状结构化探索,实现可控的多分支推理;(ii)通过高效的网页状态重放与后台推理引导探索;(iii)利用页面动作记忆在会话内和跨会话间共享已探索的动作。在WebArena基准测试中,Branch-and-Browse的任务成功率达到35.8%,相对于最先进的方法执行时间减少高达40.4%。这些结果表明,Branch-and-Browse是一个可靠且高效的基于LLM的网页代理框架。

English abstract

Autonomous web agents powered by large language models (LLMs) show strong potential for performing goal-oriented tasks such as information retrieval, report generation, and online transactions. These agents mark a key step toward practical embodied reasoning in open web environments. However, existing approaches remain limited in reasoning depth and efficiency: vanilla linear methods fail at multi-step reasoning and lack effective backtracking, while other search strategies are coarse-grained and computationally costly. We introduce Branch-and-Browse, a fine-grained web agent framework that unifies structured reasoning-acting, contextual memory, and efficient execution. It (i) employs explicit subtask management with tree-structured exploration for controllable multi-branch reasoning, (ii) bootstraps exploration through efficient web state replay with background reasoning, and (iii) leverages a page action memory to share explored actions within and across sessions. On the WebArena benchmark, Branch-and-Browse achieves a task success rate of 35.8% and reduces execution time by up to 40.4% relative to state-of-the-art methods. These results demonstrate that Branch-and-Browse is a reliable and efficient framework for LLM-based web agents.

2511.19162 2026-06-17 cs.IR cs.CY cs.HC cs.LG cs.MM Updated version

BioArtlas: Computational Clustering of Multi-Dimensional Complexity in Bioart

BioArtlas:生物艺术中多维复杂性的计算聚类

Joonhyung Bae

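Affiliations * Graduate School of Culture Technology(文化科技研究生院)

AI summary Presents BioArtlas, which uses a novel axis-aware representation to analyze 81 bioart works across multiple dimensions. It reveals four organizational patterns and supports analysis and exploration through an interactive web interface.

Comments Bae, J. BioArtlas: Computational Clustering of Multi-Dimensional Complexity in Bioart. In The Thirty-ninth Annual Conference on Neural Information Processing Systems Creative AI Track: Humanity

AI Chinese abstract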

生物艺术的混合性质跨越艺术、科学、技术、伦理和政治,挑战传统单一轴分类。我提出了BioArtlas,利用新型轴感知表示分析81件生物艺术作品,共十三个精选维度。我的代码本方法将相关概念分组为统一聚类,解决文化术语的多义性。对多达800种表示空间-算法组合的全面评估发现,Agglomerative clustering在k=15的4D UMAP上最优(轮廓系数0.664±0.008,信任度/连续性0.805/0.812)。该方法揭示了四种组织模式:艺术家特定的方法论凝聚力、基于技术的分段、时间艺术演变以及跨时间的概念亲和力。通过将分析优化与公共传播分离,我通过交互式网页界面(https://www.bioartlas.com)提供严谨分析和可访问的探索,数据集公开可用(https://github.com/joonhyungbae/BioArtlas)。

English abstract

Bioart brings living material into artistic practice, where a single work can be at once an aesthetic object, a scientific instrument, and an ethical provocation. Traditional categories sort such works along one axis at a time, which flattens the very hybridity that defines the field and leaves curators no way to compare works across many dimensions together. I introduce BioArtlas, a computational atlas that represents each bioartwork along many curated dimensions at once and organizes the field by conceptual similarity rather than by medium or chronology. My method embeds the keywords of all 81 works on each of thirteen interpretive axes, groups related concepts into a shared codebook that tames inconsistent terminology, and then searches systematically for a clustering that is both statistically clean and interpretable. Among the methods that place every work on the map, agglomerative clustering separates the field far more cleanly than the usual k-means baseline (silhouette 0.664 versus 0.483), whereas density-based methods reach higher scores only by discarding most of the corpus as noise. By separating rigorous analysis from public storytelling, BioArtlas turns the tangled complexity of bioart into a navigable landscape, openly available as an interactive interface (https://www.bioartlas.com) and dataset (https://github.com/joonhyungbae/BioArtlas).

2601.06862 2026-06-17 cs.CR cs.CV cs.LG cs.MM eess.IV Updated version

Learning QoE from Packet-Level Measurements in Encrypted Video Conferencing Traffic

从加密视频会议流量的数据包级别测量中学习QoE

Michael Sidorov, Ofer Hadar

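AI summary Addresses the problem that ISPs cannot access encrypted content when assessing QoE. Proposes a CNN-based framework that predicts BRISQUE and MOS from packet sizes alone and outperforms prior models on WhatsApp and Zoom datasets.

AI Chinese abstract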

用户体验质量已成为当今世界最重要的方面之一,因为它直接影响个人继续使用或放弃产品或服务的意愿。在此背景下,视频会议应用(VCAs)在COVID-19大流行后得到广泛采用,必须在日益拥挤的市场中提供卓越性能以保持竞争力。尽管内容提供商(CPs)如Zoom、WhatsApp、Telegram和Google Meet可以通过比较发送和接收的数据来评估通话质量,但VCAs中广泛使用的端到端加密使得互联网服务提供商(ISPs)评估体验质量(QoE)变得更加困难。由于ISPs无法访问加密内容,他们必须依赖对数据路径上未加密流量特征的被动测量。在这项工作中,我们提出了一个简单而有效的QoE预测框架,基于几乎原生的卷积神经网络(CNN)架构,仅使用从视频会议(VC)通话中两个参与者之间的通信中提取的数据包大小来预测两个QoE指标:BRISQUE和MOS。所提出的框架简单、易于实现,且不需要高端计算资源,但提供了优越的预测性能,正如我们在从WhatsApp和Zoom收集的两个自定义数据集上的实验所示,这些实验在QoE预测任务上比先前模型取得了显著改进。

English abstract

The quality of the user experience has become one of the most important aspects in today's world, as it directly influences individuals' willingness to continue using or abandon a product or service. In this context, video conferencing applications (VCAs), which experienced widespread adoption following the COVID-19 pandemic, must deliver excellent performance to remain competitive in an increasingly crowded market. Although content providers (CPs) such as Zoom, WhatsApp, Telegram, and Google Meet can assess conversation quality by comparing transmitted and received data, the widespread use of end-to-end encryption in VCAs makes quality-of-experience (QoE) evaluation by internet service providers (ISPs) far more challenging. Since ISPs do not have access to the encrypted content, they must rely on passive measurements of unencrypted traffic characteristics on the data path. In this work, we present a simple yet effective QoE prediction framework based on an almost stock convolutional neural network (CNN) architecture that uses only the packet sizes extracted from the communication between two participants in a video conferencing (VC) call to predict two QoE metrics: BRISQUE and MOS. The proposed framework is simple, easy to implement, and does not require high-end computational resources, yet it provides superior prediction performance, as shown in our experiments on two custom datasets collected from WhatsApp and Zoom, where it achieves substantial improvements over previous models for the QoE prediction task.

2602.04901 2026-06-17 q-bio.GN cs.LG Updated version

Beyond Independent Genes: Learning Module-Inductive Representations for Single-Cell Gene Perturbation Prediction

超越独立基因:学习模块归纳表示用于单细胞基因扰动预测

Jiafa Ruan, Ruijie Quan, Liyang Xu, Zongxin Yang, Yi Yang

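AI summary Proposes scBIG, which learns module-level representations of coordinated gene programs via Gene-Relation Clustering, a Gene-Cluster-Aware Encoder, and structure-aware alignment. It then uses conditional flow matching for flexible, generalizable perturbation prediction, improving on the strongest baselines by 6.7% on average across multiple single-cell perturbation benchmarks.

AI Chinese abstract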

预测遗传扰动引起的转录响应是功能基因组学中的一个核心问题。实际上,扰动响应很少是基因独立的,而是表现为功能相关基因之间协调的、程序级别的转录变化。然而,大多数现有方法由于基于基因的建模范式以及依赖无法捕捉动态程序重组的静态生物学先验知识,未能显式建模这种协调性。为解决这些局限,我们提出scBIG,一种模块归纳的扰动预测框架,显式建模协调的基因程序。scBIG通过基因关系聚类从数据中归纳出连贯的基因程序,通过基因簇感知编码器捕获程序间交互,并使用结构感知对齐目标保持模块协调性。然后利用条件流匹配对这些结构化表示进行建模,以实现灵活且可泛化的扰动预测。在多个单细胞扰动基准上的大量实验表明,scBIG始终优于最先进的方法,特别是在未见和组合扰动设置中,相比最强基线平均提升6.7%。代码可在 https://github.com/ttruan2426-dot/scBIG 获取。

English abstract

Predicting transcriptional responses to genetic perturbations is a central problem in functional genomics. In practice, perturbation responses are rarely gene-independent but instead manifest as coordinated, program-level transcriptional changes among functionally related genes. However, most existing methods do not explicitly model such coordination, due to gene-wise modeling paradigms and reliance on static biological priors that cannot capture dynamic program reorganization. To address these limitations, we propose scBIG, a module-inductive perturbation prediction framework that explicitly models coordinated gene programs. scBIG induces coherent gene programs from data via Gene-Relation Clustering, captures inter-program interactions through a Gene-Cluster-Aware Encoder, and preserves modular coordination using structure-aware alignment objectives. These structured representations are then modeled using conditional flow matching to enable flexible and generalizable perturbation prediction. Extensive experiments on multiple single-cell perturbation benchmarks show that scBIG consistently outperforms state-of-the-art methods, particularly on unseen and combinatorial perturbation settings, achieving an average improvement of 6.7% over the strongest baselines. The code is available at https://github.com/ttruan2426-dot/scBIG.
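Conditional flow matching trains a network to regress a velocity field along simple interpolation paths between a source sample and a target sample. The sketch below shows how one training pair is built. It assumes the common straight-line path, x_t = (1 - t) x0 + t x1, whose target velocity is x1 - x0. The vectors stand in for hypothetical control and perturbed expression profiles. Conditioning on the perturbation, and scBIG's module-level representations, are omitted.

```python
import random


def cfm_training_pair(x0, x1, t):
    """Build one conditional flow matching example on a straight-line path.

    x0: source sample (e.g. a control-cell expression vector).
    x1: target sample (e.g. a perturbed-cell expression vector).
    t:  time in [0, 1].
    Returns the network input x_t and its regression target u_t = x1 - x0.
    """
    xt = [(1.0 - t) * a + t * b for a, b in zip(x0, x1)]
    ut = [b - a for a, b in zip(x0, x1)]
    return xt, ut


def cfm_loss(predicted_velocity, ut):
    """Mean squared error between a model's velocity and the target velocity."""
    return sum((p - u) ** 2 for p, u in zip(predicted_velocity, ut)) / len(ut)


rng = random.Random(0)
x0 = [0.0, 1.0, 2.0]   # hypothetical control profile
x1 = [1.0, 1.0, 0.0]   # hypothetical perturbed profile
xt, ut = cfm_training_pair(x0, x1, rng.random())
print(ut)  # [1.0, 0.0, -2.0], the constant velocity along this path
```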

2606.03609 2026-06-17 cs.RO cs.LG Updated version

A 3D Isovist World Model -- Revealing a City's Unseen Geometry and Its Emergent Cross-City Signature

3D 等视域世界模型——揭示城市不可见几何及其涌现的跨城市特征

Xuhui Lin, Stephen Law, Nanjiang Chen, Kunyao Li, Tao Yang

Affiliations * The Bartlett School of Sustainable Construction, University College London, UK(伦敦大学学院巴特莱特可持续建造学院,英国); Department of Geography, University College London, UK(伦敦大学学院地理系,英国); School of Project Management, Faculty of Engineering, The University of Sydney, AU(悉尼大学工程学院项目管理学院,澳大利亚); School of Engineering, Cardiff University, UK(卡迪夫大学工程学院,英国); School of Architecture, Tsinghua University, Beijing, CN(清华大学建筑学院,北京,中国)

AI summary Proposes an embodied world model that predicts 3D isovists (spherical visibility-depth maps) and is trained with depth residuals and self-rollout scheduled sampling. It finds that a cross-city spatial signature is linearly decodable from the model's temporal latents.

AI Chinese abstract

在城市中导航的具身智能体依赖于世界模型来预测其移动时周围环境的变化。但对于导航而言,重要的不是建筑物的外观,而是智能体可以到达的位置。尽管如此,大多数世界模型仍然预测外观,学习场景的外观而非智能体可穿行的空间。那些确实针对几何的模型,如鸟瞰占用网格,将三维环境压缩到地面平面,忽略了塑造真实导航的地上和多层结构。目前缺少的是一个能够捕捉智能体实际穿行的可导航几何的预测目标,既不受光度信息干扰,也不丢失第三维度。我们的核心思想是对建筑物之间的开放体积(负空间)进行建模,编码为3D等视域:一个球形可见性深度图,记录每个方向上到最近表面的距离。我们引入了一个具身世界模型,根据过去短时间内的等视域历史和运动动作预测下一个等视域。预测被公式化为深度残差,使解码器继承锐利的建筑边缘,通过自滚动调度采样进行训练以保持几何流形上的上下文,并配备持久潜鸟瞰空间图以实现跨路径一致性。我们的核心发现是涌现且出乎意料的:一个在曼哈顿和巴黎上训练的单一城市盲模型发展出了跨城市空间特征,其城市身份可从时间潜变量中线性解码,远高于单帧基线,因此该特征存在于学习到的动力学中而非外观中。该表示轻量、可解释且可复现,为具身AI、机器人和城市分析中的空间推理提供了几何基础,并随附开放数据集和流程发布。

English abstract

Embodied agents that navigate cities rely on world models that predict how their surroundings will change as they move. But for navigation, what matters is not what the buildings look like; it is where the agent can go. Most world models nonetheless predict appearance, learning how a scene looks rather than the space an agent can move through. Those that do target geometry, such as bird's-eye-view occupancy grids, flatten the three-dimensional environment onto a ground plane, discarding the above-ground and multi-level structure that shapes real navigation. What is missing is a predictive target that captures the navigable geometry an agent actually traverses, without photometric entanglement and without collapsing the third dimension. Our key idea is to model the open volume between buildings, the negative space, encoded as a 3D isovist: a spherical visibility-depth map recording the distance to the nearest surface in every direction. We introduce an embodied world model that predicts the next isovist from a short history of past isovists and a movement action. The prediction is formulated as a depth residual so the decoder inherits sharp building edges, trained with self-rollout scheduled sampling to keep corrupted context on the geometry manifold, and equipped with a persistent latent bird's-eye-view spatial map for cross-path consistency. Our central finding is emergent and unexpected: a single city-blind model trained on Manhattan and Paris develops a cross-city spatial signature, with city identity linearly decodable from its temporal latents far above single-frame baselines, so the signature lives in the learned dynamics rather than in appearance. The representation is lightweight, interpretable, and reproducible, offering a geometric substrate for spatial reasoning in embodied AI, robotics, and urban analysis, released with an open dataset and pipeline.
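A 3D isovist can be computed by casting rays from the observer over a grid of azimuth and elevation angles and recording the distance to the first surface hit. The sketch below assumes buildings are solid axis-aligned boxes and uses the standard slab ray-box test. The grid resolution, `max_range`, the scene, and the function names are illustrative choices, not the paper's data pipeline.

```python
import math


def ray_box(origin, direction, box_min, box_max):
    """Distance along a ray to an axis-aligned box (slab method), or inf on a miss."""
    t_near, t_far = -math.inf, math.inf
    for o, d, lo, hi in zip(origin, direction, box_min, box_max):
        if abs(d) < 1e-12:
            if o < lo or o > hi:
                return math.inf  # ray parallel to this slab and outside it
            continue
        t1, t2 = (lo - o) / d, (hi - o) / d
        t_near = max(t_near, min(t1, t2))
        t_far = min(t_far, max(t1, t2))
    if t_near > t_far or t_far < 0:
        return math.inf  # miss, or the box is behind the observer
    return max(t_near, 0.0)  # 0.0 if the observer is inside the (solid) box


def isovist(origin, boxes, n_az=8, n_el=4, max_range=100.0):
    """Spherical visibility-depth map: distance to the nearest surface per direction.

    Rows are cell-centred elevation bins from -90 to +90 degrees, columns are
    azimuth bins; rays that hit nothing are clipped to max_range.
    """
    depth = []
    for i in range(n_el):
        el = -math.pi / 2 + (i + 0.5) * math.pi / n_el
        row = []
        for j in range(n_az):
            az = 2 * math.pi * j / n_az
            d = (math.cos(el) * math.cos(az),
                 math.cos(el) * math.sin(az),
                 math.sin(el))
            hits = [ray_box(origin, d, lo, hi) for lo, hi in boxes]
            row.append(min([max_range] + hits))
        depth.append(row)
    return depth


# Hypothetical scene: one roof-like slab 3 units above the observer.
boxes = [((-50.0, -50.0, 3.0), (50.0, 50.0, 4.0))]
depth = isovist((0.0, 0.0, 0.0), boxes)
print(round(depth[3][0], 3))  # steepest upward ray: 3 / sin(67.5 deg)
print(depth[0][0])            # downward ray hits nothing, clipped to max_range
```

Unlike a bird's-eye-view occupancy grid, this representation keeps the above-ground structure that the abstract highlights: the overhead slab shows up in the upper elevation rows.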

2606.12623 2026-06-17 stat.AP cs.LG Updated version

Estimating Individualized Treatment Effects in Acute Ischemic Stroke with Causal Transformation Models (TRAM-DAG): A Multi-Centre Observational Study with External RCT Validation

使用因果变换模型(TRAM-DAG)估计急性缺血性卒中个体化治疗效果:一项多中心观察性研究及外部RCT验证

Oliver Dürr, Lisa Herzog, Pascal Bühler, Susanne Wegener, Beate Sick

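AI summary Proposes using causal transformation models (TRAM-DAG) to estimate individualized treatment effects for acute ischemic stroke patients. After fitting on observational data, the model is checked in an RCT population: the average of its estimates is consistent with the trial ATE, and it correctly ranks patient outcomes.

Comments This submission has been withdrawn by the authors pending completion of internal review. A revised version will be posted in due course

AI Chinese abstract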

急性缺血性卒中的个体化医疗需要从平均治疗效果(ATE)转向个体化治疗效果(ITE)估计,以支持治疗决策。在急性缺血性卒中中,随机对照试验(如MR CLEAN研究)显示机械取栓平均优于溶栓。我们旨在识别哪些个体患者从机械取栓中获益最大。关注的结局是三个月时的改良Rankin量表(mRS),这是一个有序的功能残疾指标(0:无症状,6:死亡)。我们证明,在观察性MAGIC多中心卒中患者数据上拟合后,有向无环图上的因果变换模型(TRAM-DAG)可用于ITE估计。为确保与用于验证的MR CLEAN人群的可比性,我们在MAGIC子人群(入院NIHSS≥6,对应MR CLEAN的一项纳入标准)上训练TRAM-DAG。然后使用拟合模型估计MR CLEAN人群中卒中患者的ITE。虽然这些ITE估计无法通过实验确认,但我们显示其平均值与试验报告的ATE一致。此外,ITE估计正确地将试验患者按观察到的良好结局(三个月mRS≤2)频率排序。这些发现支持使用像TRAM-DAG这样的因果模型进行卒中护理中的个性化决策,并突显其弥合观察性证据与临床试验之间差距的能力。

English abstract

Personalized medicine in acute ischemic stroke requires moving beyond average treatment effects (ATE) to individualized treatment effect (ITE) estimates to support treatment decisions. In acute ischemic stroke, mechanical thrombectomy has been shown to be more effective on average than lysis in randomized controlled trials (RCTs), such as the MR CLEAN study. We aim to identify which individual patients benefit most from mechanical thrombectomy compared to lysis. The outcome of interest is the modified Rankin Scale (mRS) at three months, an ordinal measure of functional disability (0: no symptoms, 6: death). We demonstrate that causal transformation models on directed acyclic graphs (TRAM-DAG) can be used for ITE estimation after being fitted on observational MAGIC multi-center stroke patient data. To ensure comparability with the MR CLEAN population, which we use for validation, we train the TRAM-DAG on a MAGIC sub-population with NIHSS at admission >= 6, corresponding to one inclusion criterion of MR CLEAN. The fitted model is then used to estimate ITEs for stroke patients in the MR CLEAN population. While these ITE estimates cannot be confirmed experimentally, we show that their average is consistent with the trial's reported ATE. Furthermore, the ITE estimates correctly rank trial patients by their observed frequency of a good outcome (mRS at three months <= 2). These findings support the use of causal models like TRAM-DAG for personalized decision-making in stroke care and highlight their ability to bridge the gap between observational evidence and clinical trials.
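For an ordinal outcome such as the mRS, a transformation model with a logistic link and a linear shift reduces to the familiar proportional-odds model. An individual's effect can then be read off as the change in P(mRS <= 2) between the two treatments. The sketch below computes that quantity for a single patient. The cutpoints, linear predictors, and the constant log-odds treatment term are hypothetical, and the DAG structure and flexible terms of TRAM-DAG are omitted.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def p_good(theta, eta, good_max=2):
    """P(mRS <= good_max) = sigmoid(theta[good_max] - eta) in a proportional-odds model.

    theta: increasing cutpoints theta[0..5] separating mRS categories 0..6.
    eta:   patient's linear predictor; larger values shift toward worse mRS.
    """
    return sigmoid(theta[good_max] - eta)


def ite_good(theta, eta_lysis, log_odds_shift):
    """ITE on the probability scale: P(good | thrombectomy) - P(good | lysis).

    log_odds_shift: hypothetical treatment term added to eta; negative = benefit.
    """
    return p_good(theta, eta_lysis + log_odds_shift) - p_good(theta, eta_lysis)


theta = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]  # hypothetical cutpoints
for eta in (0.0, 3.0):                     # lower-risk vs higher-risk patient
    print(eta, round(ite_good(theta, eta, -0.5), 3))
```

The log-odds effect is shared, yet the probability-scale ITE still differs between the two patients, because the logistic link is non-linear. Patient-specific baseline risk therefore matters even in this simplest case.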

2606.14081 2026-06-17 cs.CV cs.AI cs.LG eess.IV Updated version

Clay-CNN Hybrids: Leveraging Geospatial Foundation Models as Auxiliary Context for Landslide Detection

Clay-CNN混合模型:利用地理基础模型作为滑坡检测的辅助上下文

Huong Binh Vu

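Affiliations * Harvard University(哈佛大学)

AI summary Addresses extreme class imbalance in landslide detection with a hybrid that injects the geospatial foundation model Clay v1.5 as auxiliary context at a U-Net bottleneck. The hybrid reaches 64.5% F1 on the Landslide4Sense benchmark, outperforming Clay-only and U-Net baselines.

AI Chinese abstract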

灾后快速滑坡制图对灾害响应至关重要,但由于极端类别不平衡,自动化仍然困难。本研究评估了地理基础模型(GFM)Clay v1.5是否能够改善Landslide4Sense(L4S)基准上的像素级滑坡分割,该基准包含3,799个训练块,具有14个Sentinel-2和地形波段,约2%的正像素。我们比较了三种策略:Clay作为主编码器并融合多尺度残差地形、在瓶颈处注入Clay语义上下文的U-Net骨干、以及标准U-Net基线。采用两阶段低秩适应(LoRA)的混合U-Net + Clay模型在三个随机种子上的最佳测试F1为64.5±1.8%,超过了纯Clay骨干(55.2±3.6%)和U-Net基线(59.9%)。由于缺乏多尺度跳跃连接,Clay作为独立编码器的性能低于U-Net,但其预训练表示在作为辅助上下文注入时持续提升了性能。这些发现表明,GFM在滑坡检测中最有效的方式是补充空间细节丰富的卷积架构,而非替代它们。

English abstract

Rapid post-event landslide mapping is essential for disaster response but remains difficult to automate due to extreme class imbalance. This study evaluates whether Clay v1.5, a Geospatial Foundation Model (GFM), can improve pixel-level landslide segmentation on the Landslide4Sense (L4S) benchmark, which contains 3,799 training chips with 14 Sentinel-2 and terrain bands and approximately 2% positive pixels. We compare three strategies: Clay as the primary encoder with multi-scale residual terrain fusion, a U-Net backbone augmented with Clay semantic context at the bottleneck, and a standard U-Net baseline. The hybrid U-Net + Clay model with two-stage Low-Rank Adaptation (LoRA) achieved the best test F1 of 64.5 +/- 1.8% over three seeds, surpassing the Clay-only backbone (55.2 +/- 3.6%) and the U-Net baseline (59.9%). Clay as a standalone encoder underperformed the U-Net due to the absence of multi-scale skip connections, but its pretrained representations consistently improved performance when injected as auxiliary context. These findings suggest that GFMs are most effective for landslide detection when they complement spatially detailed convolutional architectures rather than replace them.
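With roughly 2% positive pixels, a model that predicts "no landslide" everywhere already scores about 98% pixel accuracy. This is why segmentation results like these are reported as F1 on the positive class. Below is a minimal sketch of the comparison on a made-up, flattened mask; the function name is hypothetical.

```python
def f1_score(pred, truth):
    """Binary F1 for flattened pixel masks (1 = landslide, 0 = background)."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical chip: 2 positive pixels out of 100 (2%, similar to L4S).
truth = [1, 1] + [0] * 98
all_background = [0] * 100
accuracy = sum(p == t for p, t in zip(all_background, truth)) / len(truth)
print(accuracy, f1_score(all_background, truth))  # 0.98 0.0
```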

2606.16337 2026-06-17 cs.AI cs.HC cs.LG Updated version

Medical Heuristic Learning: An LLM-Driven Framework for Interpretable and Auditable Clinical Decision Rules

医学启发式学习:一个用于可解释和可审计临床决策规则的LLM驱动框架

Wei Xu, Ke Yang, Gang Luo, Keli Zheng, Lingyan Hu, Jing Wang, Kefeng Li

Affiliations * Centre for Artificial Intelligence Driven Drug Discovery, Macao Polytechnic University(人工智能驱动药物发现中心,澳门理工大学); Key Laboratory of Short-Range Radio Equipment Testing and Evaluation, Ministry of Industry and Information Technology Terahertz Science Application Center (TSAC), Beijing Institute of Technology(工业和信息化部短距离无线电设备测试与评估重点实验室,太赫兹科学应用中心(TSAC),北京理工大学); Department of Critical Care Medicine, Yantai Yuhuangding Hospital, Qingdao University(重症医学科,烟台毓璜顶医院,青岛大学); Faculty of Education, The University of Hong Kong(教育学院,香港大学); College of Information Engineering, Dalian University(信息工程学院,大连大学)

AI summary Proposes Medical Heuristic Learning (MHL), which uses an LLM-driven workflow to optimize a deterministic, executable decision system expressed as interpretable, auditable Python decision rules. It matches state-of-the-art performance on medical datasets and handles small-sample and highly imbalanced settings.

AI Chinese abstract

临床表格数据的预测建模是临床决策支持的核心,因此不仅需要强大的预测性能,还需要透明的决策逻辑。尽管深度学习和基于树的集成方法可以实现高精度,但其黑箱性质仍然是临床部署的主要障碍。这一挑战因医疗数据的常见特征而进一步加剧,包括有限的样本量、严重的类别不平衡以及因诊断标准和临床文档变化引起的特征演化。为了解决这些问题,我们提出了医学启发式学习(MHL),这是临床表格预测中超越梯度学习范式的一个实例。MHL不依赖神经网络权重更新,而是使用大型语言模型(LLM)驱动的工作流,整合统计探测、医学知识探测、规则合成和代码级迭代优化,以优化一个确定性的可执行决策系统。最终模型不是以不透明的参数表示,而是作为版本化的纯Python决策规则,这些规则明确可解释、完全可审计且具有临床基础。MHL还支持持续学习,从先前验证的规则开始,并在数据漂移或特征演化下使用更新的特征信息迭代修订规则。在医学数据集上的全面实验表明,MHL在保持与小样本和高度不平衡设置下强健行为的同时,实现了与最先进方法相当的性能。结果进一步表明,这种显式规则更新机制有助于缓解特征演化下的灾难性遗忘。总体而言,这些发现表明,非基于梯度的启发式系统为高风险临床决策支持提供了一种透明且可适应的替代方案。

English abstract

Predictive modeling for clinical tabular data is central to clinical decision support and therefore requires not only strong predictive performance but also transparent decision logic. Although deep learning and tree-based ensemble methods can achieve high accuracy, their black-box nature remains a major obstacle to clinical deployment. This challenge is further compounded by common characteristics of medical data, including limited sample sizes, severe class imbalance, and feature evolution arising from changes in diagnostic criteria and clinical documentation. To address these issues, we propose Medical Heuristic Learning (MHL), an instantiation of the learning-beyond-gradients paradigm for clinical tabular prediction. Instead of relying on neural network weight updates, MHL uses a large language model (LLM)-driven workflow that integrates statistical probes, medical knowledge probes, rule synthesis, and code-level iterative refinement to optimize a deterministic and executable decision system. The resulting model is expressed not as opaque parameters, but as versioned pure-Python decision rules that are explicitly interpretable, fully auditable, and clinically grounded. MHL also supports continual learning by starting from previously validated rules and iteratively revising them using updated feature information under data drift or feature evolution. Comprehensive experiments on medical datasets show that MHL achieves performance comparable to state-of-the-art methods while maintaining strong behavior in small-sample and highly imbalanced settings. The results further indicate that this explicit rule update mechanism can help alleviate catastrophic forgetting under feature evolution. Overall, these findings suggest that non-gradient-based heuristic systems offer a transparent and adaptable alternative for high-stakes clinical decision support.
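MHL outputs plain Python decision logic rather than learned weights. To show what a versioned, auditable rule of that kind can look like, the sketch below encodes criteria modelled on the familiar qSOFA sepsis screen: respiratory rate >= 22/min, systolic BP <= 100 mmHg, and altered mentation, here taken as GCS < 15. It is not a rule produced by MHL and is not for clinical use. The version tag, field names, and function name are hypothetical.

```python
from dataclasses import dataclass

RULE_VERSION = "qsofa-illustration-v1"  # hypothetical version tag for audit logs


@dataclass(frozen=True)
class Vitals:
    respiratory_rate: float  # breaths per minute
    systolic_bp: float       # mmHg
    gcs: int                 # Glasgow Coma Scale score, 3-15


def qsofa_rule(v: Vitals):
    """Return (flag, reasons): every criterion that fired is recorded for audit."""
    reasons = []
    if v.respiratory_rate >= 22:
        reasons.append("respiratory_rate >= 22")
    if v.systolic_bp <= 100:
        reasons.append("systolic_bp <= 100")
    if v.gcs < 15:  # altered mentation, here taken as GCS < 15
        reasons.append("gcs < 15")
    return len(reasons) >= 2, reasons


flag, reasons = qsofa_rule(Vitals(respiratory_rate=24, systolic_bp=95, gcs=15))
print(RULE_VERSION, flag, reasons)
```

Because the rule is ordinary code, revising it under feature evolution amounts to a reviewable diff plus a new version tag, which is the auditability property the abstract emphasises.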

13. 其他/综合机器学习 30 篇

2606.17182 2026-06-17 cs.LG cs.DC cs.LO cs.MA cs.PL 新提交

Verified Detection and Prevention of Concurrency Anomalies in Multi-Agent Large Language Model Systems

多智能体大语言模型系统中并发异常的可验证检测与预防

Sajjad Khan

发表机构 * independent researcher(独立研究员)

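AI summary: For multi-agent LLM systems, the paper formalizes four concurrency anomalies and establishes a consistency hierarchy. It verifies detector correctness with Verus and implements prevention in Rust runtimes.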

Comments 32 pages, 2 figures, 6 tables. Verus/TLA+ verification artifact, reference Rust runtime, and Python harnesses, plus a supplementary appendix (Sections A-F, Tables S1-S6), included as ancillary files

AI中文摘要

多智能体LLM系统通过内存存储、向量索引和工具注册表共享状态。我们将这种共享建模为在确定性生成语义(持久化执行引擎通过确定性重放强制执行的机制)下的长期读-生成-写操作,并在TLA+中形式化了四种并发异常:陈旧生成、幻影工具、因果级联和工具效应重排序,它们是经典隔离异常的结构类比,每种都有TLC反例。这些异常上的排除格是平凡的;贡献在于机械验证了其中一条最大链$L_0 \subsetneq \cdots \subsetneq L_4$的可实现性和严格分离,据我们所知,这是此类运行时第一个机器检查的一致性层级。通过274个Verus证明义务(零个assume、零个admit;信任基础:两个结构公理和一个互斥对应关系)的开发,证明了检测器相对于规范的可靠性和完备性,以及每个运行时均满足其对应的避免集。三个部署的Rust运行时实现了L0-L1(悲观锁、可序列化快照隔离、默认SI),每个都针对陈旧生成进行了验证并细化到其状态机;L2-L4通过执行模式验证,并具有无依赖的预防孪生(A3、A6、A2:0/1000对比1000/1000),L2在三个模型家族上实时运行(A3在所有120个撤回会话中均被预防)。我们复现了字节跳动deer-flow中的静默丢失更新,将其修复形式化为已验证的$L_0 \to L_1$细化,并在LangGraph的ToolNode上展示了未修改输出中的工具效应重排序,该问题可通过L3提交顺序序列器消除。已验证的检测器、细化和可实现性工件是贡献;现象和格是经典的。

英文摘要

Multi-agent LLM systems share state through memory stores, vector indices, and tool registries. We model such sharing as long-running read-generate-write operations under deterministic-generation semantics -- the regime durable-execution engines enforce by deterministic replay -- and formalize four concurrency anomalies in TLA+: stale-generation, phantom-tool, causal-cascade, and tool-effect reordering, structural analogues of classical isolation anomalies, each with a TLC counter-example. The exclusion lattice over these anomalies is trivial; the contribution is the mechanically verified realizability and strict separation of one maximal chain within it, $L_0 \subsetneq \cdots \subsetneq L_4$, to our knowledge the first machine-checked consistency hierarchy for such runtimes. A development of 274 Verus obligations (zero assume, zero admit; trust base: two structural axioms and a mutex correspondence) proves the detectors sound and complete against the specifications and each runtime its avoidance set. Three deployed Rust runtimes realize L0-L1 (pessimistic locking, serializable snapshot isolation, default-SI), each verified against stale-generation and refined to its state machine; L2-L4 are exec-mode-verified with dependency-free prevention twins (A3, A6, A2: 0/1000 versus 1000/1000), and L2 is run live across three model families (A3 prevented in all 120 retracted sessions). We reproduce a silent lost update in ByteDance's deer-flow, formalizing its fix as a verified $L_0 \to L_1$ refinement, and exhibit tool-effect reordering in LangGraph's ToolNode on unmodified output, removed by an L3 commit-order sequencer. The verified detector, refinements, and realizability artifacts are the contribution; the phenomena and lattice are classical.
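The stale-generation anomaly resembles a classical lost update. An agent reads shared state, spends a long generation step on it, and then writes a result derived from a value that another agent has since changed. The sketch below is a minimal, hypothetical Python illustration of one standard prevention: version-checked (compare-and-set) writes, a first-committer-wins rule in the spirit of snapshot isolation. It is not the paper's verified Rust runtime. The names `VersionedStore` and `StaleWrite` are invented for illustration.

```python
import threading
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple


class StaleWrite(Exception):
    """Raised when a write is based on a version that is no longer current."""


@dataclass
class VersionedStore:
    _data: Dict[str, Tuple[int, Any]] = field(default_factory=dict)
    _lock: threading.Lock = field(default_factory=threading.Lock)

    def read(self, key: str) -> Tuple[int, Any]:
        with self._lock:
            return self._data.get(key, (0, None))

    def write(self, key: str, expected_version: int, value: Any) -> int:
        # Compare-and-set: commit only if nobody else committed since our read.
        with self._lock:
            current_version, _ = self._data.get(key, (0, None))
            if current_version != expected_version:
                raise StaleWrite(f"{key}: read v{expected_version}, now v{current_version}")
            self._data[key] = (current_version + 1, value)
            return current_version + 1


store = VersionedStore()
version_a, _ = store.read("plan")     # agent A reads, then starts a long generation
version_b, _ = store.read("plan")     # agent B reads the same version concurrently
store.write("plan", version_a, "plan drafted by A")       # A commits first
try:
    store.write("plan", version_b, "plan drafted by B")   # B's generation is stale
except StaleWrite as err:
    print("rejected:", err)
```

After the rejection, agent B must re-read and regenerate. That retry is where the extra cost of this isolation level shows up for long LLM generations.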

2606.17229 2026-06-17 cs.LG cs.AI cs.CL 新提交

Rift: A Conflict Signature for Deception in Language Models

Rift: 语言模型中欺骗行为的冲突特征

Petr Nyoma

发表机构 * Harmonic Labs

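AI summary: By contrasting knowing deception with ignorant error, the paper finds that deceptive forward passes carry a high-residual-rank conflict signature. This signature identifies which response is the lie with 100% accuracy and no labels in the tested models, and it transfers across models, languages, and architectures.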

Comments 13 pages, 4 figures. Code and experiment logs: https://github.com/Omibranch/Rift

AI中文摘要

一个在知道真相的情况下说谎的模型是ELK无法仅通过行为评估处理的核心情况。我们探究这种欺骗是否会留下可将其与诚实错误区分开来的内部特征。我们的关键举措是对错误进行控制:我们将一个潜伏代理(知道真相,在触发时说谎)与一个天真的说谎者(微调以输出相同的错误答案,没有诚实训练)进行对比。两者产生相同的错误输出;任何差异都关乎知识冲突,而非不正确性。我们发现欺骗性前向传递携带一个冲突特征:在相同错误答案上,其残差秩比天真说谎者的前向传递高2.1-2.3倍,足以在GPT-2 small/medium(三个种子)和三个指令模型上,以100%准确率且无需标签,识别两个响应中哪个是谎言。在Qwen2.5-1.5B/7B和Phi-3-mini上,指令欺骗在每个测试事实上都提高了残差秩(18/18, 40/40, 34/34);在Phi-3上,谎言与诚实答案和幻觉完美分离(AUC 1.0,Wilcoxon p~6e-11)。该特征在策略性自构建欺骗(模型自创谎言,AUC 1.0)、主动隐藏尝试(AUC 1.0)和长度控制复制(20/20,AUC 1.0,p~1e-6)中仍然存在。使用无基相对表示,在一个模型家族上训练的探针在零样本下检测到另外两个家族中的欺骗(平均AUC 0.933),在同时改变架构和格式时仍有效(AUC 0.821),并跨五种语言迁移(AUC 1.000,长度控制)。该特征是只读的:可检测但不可注入(双向0/8)。局限性和六个负面实验均已如实完整记录。

英文摘要

A model that lies while knowing the truth is the central case ELK cannot handle with behavioral evaluation alone. We ask whether such deception leaves an internal signature distinguishing it from honest error. Our key move is a control for wrongness: we contrast a sleeper agent (knows the truth, lies on trigger) against a naive liar (fine-tuned to emit the same wrong answers with no honest training). Both produce identical wrong outputs; any difference is about knowledge conflict, not incorrectness. We find deceptive forward passes carry a conflict signature - 2.1-2.3x higher residual rank than naive-liar passes on the same wrong answer - strong enough to identify which of two responses is the lie with 100% accuracy and no labels, across GPT-2 small/medium (three seeds) and three instruct models. Across Qwen2.5-1.5B/7B and Phi-3-mini, instructed deception raises residual rank on every tested fact (18/18, 40/40, 34/34); on Phi-3, lies separate perfectly from both honest answers and hallucinations (AUC 1.0, Wilcoxon p~6e-11). The signature survives strategic self-constructed deception (model invents its own lie, AUC 1.0), active concealment attempts (AUC 1.0), and length-controlled replication (20/20, AUC 1.0, p~1e-6). Using basis-free relative representations, a probe trained on one model family detects deception in two other families zero-shot (mean AUC 0.933), surviving simultaneous architecture and format change (AUC 0.821), and transfers across five languages (AUC 1.000, length-controlled). The signature is read-only: detectable but not injectable (0/8 both directions). Honest limitations and six negative experiments are documented in full.
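The abstract does not define "residual rank". For illustration only, the sketch below assumes it is something like the entropy-based effective rank (Roy & Vetterli) of a matrix of residual-stream activations, with one row per token position. The paper's exact statistic, layer choice, and normalization may differ. A higher value means the activations spread over more directions, which is presumably the kind of difference the reported 2.1-2.3x ratio captures.

```python
import numpy as np


def effective_rank(activations: np.ndarray) -> float:
    """Entropy-based effective rank of a (tokens x d_model) activation matrix.

    Computes exp(H(p)), where p is the vector of singular values normalized to
    sum to 1. The result lies between 1 and min(activations.shape).
    Assumes the matrix is not all zeros.
    """
    s = np.linalg.svd(activations, compute_uv=False)
    p = s / s.sum()
    p = p[p > 0]                       # treat 0 * log 0 as 0
    return float(np.exp(-np.sum(p * np.log(p))))


rng = np.random.default_rng(0)
low_rank = rng.normal(size=(32, 2)) @ rng.normal(size=(2, 64))   # activations confined to 2 directions
full_rank = rng.normal(size=(32, 64))                             # activations spread over many directions
print(effective_rank(low_rank), effective_rank(full_rank))
```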

2606.17383 2026-06-17 q-fin.RM cs.AI cs.LG stat.ML 交叉投稿

Model Validation of Agentic AI Systems: A POMDP-Based Framework for Belief-State, Forecast, and Policy Validation

智能体AI系统的模型验证:基于POMDP的信念状态、预测与策略验证框架

Matthew Francis Dixon

发表机构 * Quiota LLC(Quiota公司)

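AI summary: Proposes a model-validation framework for agentic AI based on partially observable Markov decision processes (POMDPs). The framework decomposes autonomous decision making into information, belief, forecast, action, and utility components that can be validated independently, and the paper demonstrates it on a portfolio-management case study.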

Comments 28 pages, 3 figures, 6 tables. Source code available from https://github.com/mfrdixon/agentic-AI-as-POMDP

AI中文摘要

智能体人工智能系统引入了一类新的模型风险。与传统预测模型不同,自主智能体持续获取信息,形成关于环境潜在状态的信念,生成预测,选择行动,并随时间调整其行为。现有的验证方法主要关注预测准确性,因此对底层决策过程的质量提供的洞察有限。本文提出了一种基于部分可观测马尔可夫决策过程(POMDP)的智能体AI模型验证框架。该框架将自主决策分解为信息、信念、预测、行动和效用,允许每个组件独立验证。大型语言模型(LLM)被形式化为近似贝叶斯滤波算子,并开发了一个模型风险分类体系,涵盖状态空间、滤波、预测、策略、效用规范和参数风险。通过一个投资组合管理案例研究展示了模型风险验证方法,其中智能体从市场和宏观经济信息中推断潜在市场制度,生成基于信念的预测,并使用Black-Litterman框架构建投资组合。实证验证结合了性能分析、信念校准诊断、覆盖测试、消融研究和参数敏感性分析。结果表明,潜在状态推断对决策质量有独立贡献,且主要结论在广泛的参数值范围内保持稳健。本文的主要贡献是提供了一个实用框架,将已建立的模型风险管理概念扩展到自主AI系统,并为其验证、治理和监控提供了严格的基础。

英文摘要

Agentic artificial intelligence systems introduce a new class of model risk. Unlike traditional predictive models, autonomous agents continuously acquire information, form beliefs regarding latent states of the environment, generate forecasts, select actions, and adapt their behavior over time. Existing validation methodologies focus primarily on predictive accuracy and therefore provide limited insight into the quality of the underlying decision process. This paper proposes a model validation framework for agentic AI based on Partially Observable Markov Decision Processes (POMDPs). The framework decomposes autonomous decision making into information, beliefs, forecasts, actions, and utility, allowing each component to be validated independently. Large language models (LLMs) are formalized as approximate Bayesian filtering operators, and a model-risk taxonomy is developed encompassing state-space, filtering, forecast, policy, utility-specification, and parameter risks. The model risk validation methodology is demonstrated through a portfolio-management case study in which an agent infers latent market regimes from market and macroeconomic information, generates belief-conditioned forecasts, and constructs portfolios using a Black--Litterman framework. Empirical validation combines performance analysis, belief calibration diagnostics, coverage tests, ablation studies, and parameter-sensitivity analysis. The results indicate that latent-state inference contributes independently to decision quality and that the principal conclusions remain robust across a broad range of parameter values. The principal contribution of the paper is a practical framework for extending established model risk management concepts to autonomous AI systems and providing a rigorous foundation for their validation, governance, and monitoring.
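The abstract casts the LLM as an approximate Bayesian filtering operator. For reference, the exact operator it would approximate in a discrete latent-regime model is the Bayes filter step below. The two regimes, the transition matrix, and the likelihoods are hypothetical numbers chosen for illustration.

```python
import numpy as np


def bayes_filter_step(belief: np.ndarray, transition: np.ndarray,
                      likelihood: np.ndarray) -> np.ndarray:
    """One predict-then-update step of a discrete Bayes filter over latent regimes.

    belief[i]        = P(s_t = i | past observations)
    transition[i, j] = P(s_{t+1} = j | s_t = i)
    likelihood[j]    = P(o_{t+1} | s_{t+1} = j)
    """
    predicted = belief @ transition        # prior for s_{t+1}
    posterior = predicted * likelihood     # Bayes rule, unnormalized
    return posterior / posterior.sum()


# Hypothetical two-regime example: regime 0 = "calm", regime 1 = "stressed".
belief = np.array([0.5, 0.5])
transition = np.array([[0.9, 0.1],
                       [0.2, 0.8]])
likelihood = np.array([0.2, 0.6])          # the new observation looks more like stress
print(bayes_filter_step(belief, transition, likelihood))
```

With these numbers, the prediction step gives [0.55, 0.45]. Weighting by the likelihood gives [0.11, 0.27], which normalizes to roughly [0.29, 0.71].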

2606.18120 2026-06-17 cs.CR cs.AI cs.CL cs.LG 交叉投稿

Structural Role Injection in Handlebars-Templated LLM Prompts: Triple-Brace Interpolation, Delimiter Family, and the Limits of HTML Auto-Escaping

Handlebars模板化LLM提示中的结构角色注入:三花括号插值、分隔符家族与HTML自动转义的局限性

Mohammadreza Rashidi

发表机构 * Department of Computer Science, AI Media Analysis Lab, Berlin, Germany(计算机科学系,人工智能媒体分析实验室,德国柏林)

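AI summary: Studies how double-brace versus triple-brace interpolation in the Handlebars template engine affects structural role injection attacks. A model-free analysis and 5,760 trials show that HTML escaping protects only certain delimiter families and cannot replace structural separation of instructions and data.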

Comments 7 pages, 6 figures

AI中文摘要

大型语言模型应用从模板构建提示,Handlebars是广泛使用的模板引擎,也是Microsoft Semantic Kernel中的默认提示模板格式。其双花括号{{x}}表达式对插值值进行HTML转义,并被记录为安全默认;而三花括号{{{x}}}表达式则直接插入原始值。我们表明,这一选择悄然决定了应用对结构角色注入的暴露程度:在这种攻击中,攻击者控制的数据携带聊天角色分隔符,从而伪造更高权限的轮次。无模型分析建立了机制:Handlebars转义重写尖括号,但不重写方括号、冒号或Markdown井号,因此它中和了ChatML、Llama-3和XML角色分隔符(存活率0.00),同时保留Llama-2 [INST]、传统Human:/Assistant:和Markdown ###分隔符(后两者存活率1.00)。随后,我们在七个分隔符家族、两个攻击目标和四个模型(GPT-3.5 Turbo、GPT-4o mini、GPT-4.1 mini、Claude Haiku 4.5)上运行了5760次试验,总API成本为1.63美元。GPT-3.5 Turbo在97%的原始试验和91%的转义试验中遵循任务劫持指令,转义保护集中在尖括号家族,而在冒号和Markdown家族中缺失;更难的秘密泄露目标未饱和,更清晰地暴露了相同的家族交互。Claude Haiku 4.5几乎完全抵抗了两个目标。转义默认仅保护HTML转义恰好覆盖的分隔符方案,对剩余方案无保护,且无法替代指令与数据的结构分离。

英文摘要

Large language model applications build prompts from templates, and Handlebars is a widely used templating engine and the default prompt-template format in Microsoft Semantic Kernel. Its double-brace {{x}} expression HTML-escapes the interpolated value and is documented as the safe default; its triple-brace {{{x}}} expression inserts the value raw. We show that this choice silently governs an application's exposure to structural role injection, where attacker-controlled data carries chat role delimiters that forge a higher-privilege turn. A model-free analysis establishes the mechanism: Handlebars escaping rewrites angle brackets but not square brackets, colons, or Markdown hashes, so it neutralises ChatML, Llama-3, and XML role delimiters (survival rate 0.00) while leaving Llama-2 [INST], legacy Human:/Assistant:, and Markdown ### delimiters intact (survival rate 1.00 for the last two). We then run 5760 trials across seven delimiter families, two attack objectives, and four models (GPT-3.5 Turbo, GPT-4o mini, GPT-4.1 mini, Claude Haiku 4.5) at a combined API cost of 1.63 USD. GPT-3.5 Turbo follows the task-hijack instruction in 97% of raw and 91% of escaped trials, with the escaping protection concentrated in the angle-bracket families and absent for the colon- and Markdown-based families; the harder secret-exfiltration objective, which does not saturate, exposes the same family interaction more cleanly. Claude Haiku 4.5 resists both objectives almost entirely. The escaped default protects only the delimiter schemes whose characters HTML escaping happens to cover, gives no protection for the rest, and cannot substitute for a structural separation of instruction and data.
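The model-free part of the analysis can be reproduced in a few lines. Escape each delimiter the way a `{{x}}` expression would, then check whether it survived unchanged. The escape table below mirrors the characters that Handlebars' HTML escaping rewrites (`&`, `<`, `>`, `"`, `'`, `` ` ``, `=`). It is written from our understanding of Handlebars 4, so treat it as an assumption and verify it against your version. The delimiter strings are representative examples, not the paper's exact payloads.

```python
# Characters rewritten by Handlebars' HTML escaping in {{x}} (assumed; check your version).
HANDLEBARS_ESCAPES = {
    "&": "&amp;", "<": "&lt;", ">": "&gt;", '"': "&quot;",
    "'": "&#x27;", "`": "&#x60;", "=": "&#x3D;",
}


def handlebars_escape(value: str) -> str:
    """Approximate what a double-brace {{x}} expression inserts into the prompt."""
    return "".join(HANDLEBARS_ESCAPES.get(ch, ch) for ch in value)


def delimiter_survives(delimiter: str) -> bool:
    """True if the delimiter reaches the rendered prompt unchanged after escaping."""
    return handlebars_escape(delimiter) == delimiter


# Representative role delimiters per family (examples, not the paper's payloads).
DELIMITERS = {
    "ChatML": "<|im_start|>system",
    "Llama-3": "<|start_header_id|>system<|end_header_id|>",
    "XML": "<system>",
    "Llama-2": "[INST]",
    "Human/Assistant": "\n\nHuman:",
    "Markdown": "### System",
}

for family, delimiter in DELIMITERS.items():
    print(f"{family:16} survives: {delimiter_survives(delimiter)}")
```

Only the angle-bracket families are altered. A full Llama-2 system block also contains `<<SYS>>`, which would be escaped. That may be why the abstract reports a 1.00 survival rate only for the colon and Markdown families.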

2606.18144 2026-06-17 cs.AI cs.CY cs.LG cs.RO 交叉投稿

Memory as a Wasting Asset: Pricing Flash Endurance for Embodied Agents, and the Limits of Doing So

记忆作为消耗性资产:为具身智能体定价闪存耐久性及其局限性

Josef Liyanjun Chen

发表机构 * KAIKAKU

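AI summary: Proposes treating robot flash endurance as depreciating capital priced by a single shadow price η, which enables cost-optimal placement across storage tiers. It then measures the sign of the value-write association χ on real robot logs and finds that the sign depends on the deployment regime.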

AI中文摘要

机器人的闪存耐久性是一种不可再生资源:每次持久化写入都会消耗数千次编程/擦除周期中的一次,且无法补充,然而目前没有实际部署的机器人内存系统对哪些记忆值得消耗一次擦除周期进行定价。我们将具身记忆视为折旧资本,并用单一耐久性影子价格η对该资源定价,这使得在RAM/板载NVM/云层级中进行成本最小化的放置成为一个在磨损增强的每字节索引中的阈值。无论价值-写入关联χ的符号如何,该索引都是成本最优的;只有当χ>0时,最优解才变为非单调,将机器人最有价值的记忆从闪存中移出。因此,关键点是经验性的,我们在预定义的关口上测量真实机器人日志中的χ:其符号是部署场景的一个属性——在重复的长时域操作中为正(χ̂≈+1.0×10^{-3},在全功率下可复现),在较短时域任务中为零,在非重复遥操作中为负。两个边界限制了该结果。在高端3,000 P/E TLC闪存按数据手册价格计算时,耐久性预算处于休眠状态;而在廉价边缘机器人使用的商用QLC/eMMC(约1,000 P/E)上则具有约束力。当约束生效时,学习到的磨损感知控制器仅在任务价值上与基于价格的路由持平,因为实现的价值在RAM、NVM和云层级之间是不变的:租金决定设备寿命和成本,而非任务性能。磨损感知放置是否能提高任务价值仍是一个开放问题——χ是针对价值代理测量的,而非单调最优解虽已被证明,但尚未在数据中观察到。

英文摘要

A robot's flash endurance is a non-renewable stock: every persisted write spends one of a few thousand program/erase cycles and never refills, yet no fielded robot memory system prices which memories are worth an erase cycle. We treat embodied memory as depreciating capital and price that stock with a single endurance shadow price $η$, which makes cost-minimizing placement across a RAM / on-board NVM / cloud hierarchy a threshold in a wear-augmented per-byte index. The index is cost-optimal whatever the sign of the value-write association $χ$; only when $χ> 0$ does the optimum turn non-monotone, sending a robot's most valuable memories off its flash. The pivot is thus empirical, and we measure $χ$ on real robot logs at a pre-specified gate: its sign is a property of the deployment regime -- positive on recurrent long-horizon manipulation ($\hatχ \approx +1.0 \times 10^{-3}$, replicated at full power), null on a shorter-horizon suite, and negative on non-recurrent teleoperation. Two boundaries scope the result. The endurance budget is dormant on premium 3,000-P/E TLC at datasheet prices and binding on the commodity QLC/eMMC ($\sim$1,000 P/E) that cheaper edge robots run. And where it binds, a learned wear-aware controller only ties price-based routing on task value, because realized value is tier-invariant across RAM, NVM, and cloud: the rent governs device lifetime and cost, not task performance. Whether wear-aware placement improves task value remains open -- $χ$ is measured against a value proxy, and the non-monotone optimum, while proven, is not yet observed in data.
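Simple back-of-envelope arithmetic shows why the endurance budget can be dormant on 3,000-P/E TLC yet binding on roughly 1,000-P/E QLC/eMMC. The capacity, daily write volume, and write-amplification factor below are hypothetical. This is a lifetime estimate under ideal wear leveling, not the paper's η pricing model.

```python
def flash_lifetime_years(capacity_gb: float, pe_cycles: int, daily_host_writes_gb: float,
                         write_amplification: float = 1.0) -> float:
    """Years until the rated program/erase budget is exhausted.

    Assumes ideal wear leveling, so every cell wears at the same rate:
    total NAND writes allowed = capacity * P/E cycles, and each host byte costs
    `write_amplification` NAND bytes.
    """
    total_host_writes_gb = capacity_gb * pe_cycles / write_amplification
    return total_host_writes_gb / (daily_host_writes_gb * 365.0)


# Hypothetical edge robot: 64 GB of flash, 20 GB of persisted memory writes per day, WAF 2.
for label, pe in [("premium TLC", 3000), ("commodity QLC/eMMC", 1000)]:
    print(f"{label:20} {flash_lifetime_years(64, pe, 20, 2.0):5.1f} years")
```

At the same workload, dropping from 3,000 to 1,000 P/E cycles cuts the lifetime threefold, from about 13.2 to about 4.4 years in this example. Whether the budget binds then depends on how that compares with the robot's intended service life.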

2409.17502 2026-06-17 cs.LG 版本更新

Broadcast Product: Redefining Shape-aligned Element-wise Multiplication and Beyond

广播乘积:重新定义形状对齐的逐元素乘法及其扩展

Yusuke Matsui, Tatsuya Yokota

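AI summary: Introduces the broadcast product $\boxdot$, which formally extends the Hadamard product to element-wise multiplication of tensors with mismatched shapes. The paper establishes its algebraic properties and links to linear algebra, laying a mathematical foundation for broadcast-aware tensor operations.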

Comments TMLR2026. OpenReview: https://openreview.net/forum?id=zv0OtOPpPO

AI中文摘要

广播操作在科学计算库中被广泛使用,但其数学形式化在机器学习文献中常常是隐式的且表示不一致。当逐元素乘积被写出但张量形状不匹配时,这个问题经常导致无效的方程。在本文中,我们通过引入广播乘积$\boxdot$来形式化此类操作,该乘积通过形状对齐的元素复制显式扩展了Hadamard乘积。我们提供了广播乘积的严格定义,分析了其代数性质,并展示了如何使用标准线性代数表示它。基于这一框架,我们制定了最小二乘问题并勾勒出一个概念验证的广播分解。作为初步说明,我们展示了该形式化方法能够产生一类具有与传统张量分解不同结构特性的新分解。这项工作为广播感知的张量运算建立了数学基础,将实际实现与严格的张量分析联系起来。

英文摘要

Broadcast operations are widely used in scientific computing libraries, yet their mathematical formulation is often implicit and inconsistently represented in machine learning literature. This problem frequently leads to invalid equations when element-wise products are written despite mismatched tensor shapes. In this paper, we formalize such operations by introducing the broadcast product $\boxdot$, which explicitly extends the Hadamard product through shape-aligned element duplication. We provide a rigorous definition of the broadcast product, analyze its algebraic properties, and show how it can be expressed using standard linear algebra. Building on this framework, we formulate least-squares problems and sketch a proof-of-concept broadcast decomposition. As a preliminary illustration, we show that the formalism enables a new family of decompositions with distinct structural properties from conventional tensor decompositions. This work establishes a mathematical foundation for broadcast-aware tensor operations, connecting practical implementations with rigorous tensor analysis.
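In NumPy terms, broadcasting a `(3, 1)` array against a `(1, 4)` array behaves like a Hadamard product after duplicating elements along the size-1 axes. The sketch assumes the paper's $\boxdot$ agrees with NumPy's broadcasting rules in this simple case. The formal definition in the paper may cover shapes or conventions that NumPy does not.

```python
import numpy as np

a = np.arange(3.0).reshape(3, 1)        # shape (3, 1)
b = np.arange(4.0).reshape(1, 4)        # shape (1, 4)

# NumPy broadcasting: a * b is defined even though a.shape != b.shape.
broadcast = a * b                        # shape (3, 4)

# The same result as an ordinary Hadamard product once each operand has been
# duplicated along its size-1 axis so that the shapes align.
a_dup = np.tile(a, (1, 4))              # (3, 4): column repeated 4 times
b_dup = np.tile(b, (3, 1))              # (3, 4): row repeated 3 times
hadamard = a_dup * b_dup                 # shapes now match exactly

assert broadcast.shape == (3, 4)
assert np.array_equal(broadcast, hadamard)
```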

2605.12646 2026-06-17 cs.LG cs.AI cs.HC 版本更新

Learning to Decide with AI Assistance under Human-Alignment

在人工智能协助下的人类对齐决策学习

Nina Corvelo Benz, Eleni Straitouri, Manuel Gomez-Rodriguez

发表机构 * GitHub

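AI summary: Studies AI models that assist decision-makers in high-stakes domains by predicting outcomes. The paper analyzes how the alignment between the AI's confidence and the decision-makers' own confidence affects the complexity of learning to decide.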

AI中文摘要

人们普遍认为,当人工智能模型通过预测感兴趣的结果来协助决策者时,它们应传达预测的置信度。然而,实证证据表明,决策者往往难以仅根据传达的置信度来判断何时信任预测。在此背景下,近期的理论和实证工作表明,AI辅助决策的效用与AI置信度和决策者自身置信度之间的对齐程度之间存在正相关性。关键的是,这些发现尚未阐明这种对齐程度如何影响通过重复交互学习做出最佳决策的复杂性。在本文中,我们考虑二元预测和二元决策的典型情况,首先证明该问题等价于具有完全反馈的双臂在线上下文学习问题,并建立了任何学习者可以达到的期望遗憾的下界为$Ω(\sqrt{|H| \cdot |B| \cdot T} )$,其中$H$和$B$分别表示人类和AI置信度的集合。然后我们证明,在AI和人类置信度完全对齐的情况下,学习者可以达到期望遗憾为$O(\sqrt{|H| \cdot T\log T})$,当$\sqrt{|H|} = O(\log T)$且$B$是可数的时,Dvoretzky-Kiefer-Wolfowitz不等式的非平凡推广将遗憾界改进到$O(\sqrt{T\log T})$。这些结果表明,对齐可以减少在人工智能协助下学习决策的复杂性。在两个不同的人类主体研究中,参与者通过AI模型协助解决简单决策任务的实验证明,我们的理论结果在完全对齐被违反时仍然稳健。

英文摘要

It is widely agreed that when AI models assist decision-makers in high-stakes domains by predicting an outcome of interest, they should communicate the confidence of their predictions. However, empirical evidence suggests that decision-makers often struggle to determine when to trust a prediction based solely on this communicated confidence. In this context, recent theoretical and empirical work suggests a positive correlation between the utility of AI-assisted decision-making and the degree of alignment between the AI confidence and the decision-makers' confidence in their own predictions. Crucially, these findings do not yet elucidate the extent to which this alignment influences the complexity of learning to make optimal decisions through repeated interactions. In this paper, we address this question in the canonical case of binary predictions and binary decisions. We first show that this problem is equivalent to a two-armed online contextual learning problem with full feedback, and establish a lower bound of $Ω(\sqrt{|H| \cdot |B| \cdot T} )$ on the expected regret any learner can attain, where $H$ and $B$ denote the sets of human and AI confidence values. We then demonstrate that, under perfect alignment between AI and human confidence, a learner can attain an expected regret of $O(\sqrt{|H| \cdot T\log T})$ and, when $\sqrt{|H|} = O(\log T)$ and $B$ is countable, a non-trivial generalization of the Dvoretzky-Kiefer-Wolfowitz inequality improves the regret bound to $O(\sqrt{T\log T})$. Taken together, these results reveal that alignment can reduce the complexity of learning to make decisions with AI assistance. Experiments on real data from two different human-subject studies where participants solve simple decision-making tasks assisted by AI models show that our theoretical results are robust to violations of perfect alignment.
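To make the setting concrete, here is a generic full-feedback baseline. It is not the algorithm analyzed in the paper. The learner keeps separate outcome counts for every context (human confidence h, AI confidence b) and follows the empirical leader. The utility (1 if the decision matches the binary outcome, else 0) and the class name are assumptions. A table keyed by (h, b) has |H|·|B| cells to learn. That offers some intuition, and no more than intuition, for why the lower bound scales with $\sqrt{|H| \cdot |B| \cdot T}$.

```python
from collections import defaultdict
from typing import Dict, Hashable, List


class PerContextMajority:
    """Follow-the-leader for a binary decision with full feedback.

    Because the outcome is revealed every round, the learner can score both
    decisions and pick, per context, the one that would have been right most often.
    """

    def __init__(self) -> None:
        self.counts: Dict[Hashable, List[int]] = defaultdict(lambda: [0, 0])

    def decide(self, context: Hashable) -> int:
        zeros, ones = self.counts[context]
        return 1 if ones > zeros else 0   # ties default to decision 0

    def update(self, context: Hashable, outcome: int) -> None:
        self.counts[context][outcome] += 1


learner = PerContextMajority()
for context, outcome in [(("h_high", "b_high"), 1), (("h_high", "b_high"), 1), (("h_high", "b_low"), 0)]:
    learner.update(context, outcome)
print(learner.decide(("h_high", "b_high")), learner.decide(("h_high", "b_low")))   # 1 0
```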

2606.15386 2026-06-17 cs.LG 版本更新

A Compositional Framework for Open-ended Intelligence

开放智能的组合框架

Ida Momennejad, Roberta Raileanu

发表机构 * GitHub

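AI summary: Proposes a formal definition of open-ended intelligence as the closure generated by a finite primitive set and a set of composition operators, supporting unbounded compositional generation across tasks and worlds. The paper also introduces next primitive prediction as an architectural objective.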

AI中文摘要

开放智能是指适应与训练环境显著不同的新问题和新环境的能力。我们将开放智能形式化为由有限原始集 \(P\) 和一组组合算子 \(C\) 诱导的闭包。我们刻画了诱导闭包 \(\mathcal{L}(P,C)\) 的性质,该闭包支持跨任务和世界族的无界组合生成。开放智能的数学需要两个支柱:一组最小的表示原始(例如状态、动作)和算法原始(例如最近邻),以及反映习得组合语法的组合模式(例如递归、序列化)。这两个支柱的闭包使得能够在广泛的环境中生成无限的自适应响应。该数学支持互补的研究议程,包括解释性和可解释性的评估指标,以及构建组合泛化原生的架构。我们提出下一原始预测作为一种新的架构目标,其中训练目标鼓励获取可重用的算法原始及其组合语法,从而通过重组生成新的解决方案。课程学习和自我博弈通过跨任务和世界族发现可重用原始和转换模式,实现闭包的终身学习和扩展。我们通过物理学、进化论和神经科学的案例研究来夯实该框架。

英文摘要

Open-ended intelligence is the capacity to adapt to novel problems and environments that are substantially different from those in training. A mathematics of open-ended intelligence requires two pillars: first, a minimal set of representational primitives (e.g., states, actions) and algorithmic primitives (e.g., nearest neighbor); and second, an acquired compositional grammar for selection, recursion, and branching that produces sequences of operations and recurring motifs. We formalize open-ended intelligence in terms of the compositional closure induced by a finite primitive set $P$ and a set of composition operators $C$. We characterize properties of the induced closure $\mathcal{L}(P,C)$ that support unbounded compositional generation across families of tasks and worlds. The closure of the two pillars yields infinite adaptive responses across a wide range of settings. The mathematics supports complementary research agendas, including evaluation metrics for explanation and interpretability, and novel architectures where compositional generalization is native. We propose next primitive prediction (NPP) as a novel architectural objective, where training encourages the acquisition of reusable algorithmic primitives and their compositional grammar, such that new solutions are generated through recombination. Given such an objective, curriculum learning and self-play can enable lifelong learning, expanding the closure by discovering reusable primitives and transition motifs across settings. We ground the framework through case studies in physics, evolution, and neuroscience.
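Here is a toy version of the closure construction, with two hypothetical primitives and a single composition operator (sequencing). With depth $d$, the number of distinct programs grows as $|P| + |P|^2 + \dots + |P|^d$, which is the sense in which a finite $P$ supports unbounded generation. Selection, recursion, and branching operators, and the learned grammar the abstract describes, are omitted.

```python
from itertools import product
from typing import Callable, Dict, Tuple

Primitive = Callable[[int], int]


def sequential_closure(primitives: Dict[str, Primitive],
                       max_depth: int) -> Dict[Tuple[str, ...], Primitive]:
    """Every sequential composition of primitives up to max_depth.

    This is a finite, depth-truncated slice of L(P, C) with a single composition
    operator C = {sequence}; the full closure is the limit as max_depth grows.
    """
    programs: Dict[Tuple[str, ...], Primitive] = {}
    for depth in range(1, max_depth + 1):
        for names in product(primitives, repeat=depth):
            def program(x: int, names: Tuple[str, ...] = names) -> int:
                for name in names:
                    x = primitives[name](x)
                return x
            programs[names] = program
    return programs


P = {"inc": lambda x: x + 1, "double": lambda x: 2 * x}
L2 = sequential_closure(P, max_depth=2)
print(len(L2))                        # 2 + 2**2 = 6 programs
print(L2[("inc", "double")](3))       # (3 + 1) * 2 = 8
```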

2606.06523 2026-06-17 cs.AI cs.LG cs.LO cs.SE 版本更新

Lean4Agent: Formal Modeling and Verification for Agent Workflow and Trajectory

Lean4Agent:面向智能体工作流与轨迹的形式化建模与验证

Ruida Wang, Jerry Huang, Pengcheng Wang, Xuanqing Liu, Luyang Kong, Tong Zhang

发表机构 * University of Illinois Urbana-Champaign(伊利诺伊大学厄巴纳-香槟分校) Independent researcher(独立研究者)

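AI summary: Proposes Lean4Agent, a framework that uses the dependent-type formal language Lean4 to formally model and verify agent workflows. Its FormalAgentLib library and LeanEvolve method improve workflow reliability, and in experiments, workflows that pass verification outperform failing ones by 11.94% on average.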

AI中文摘要

使大型语言模型(LLMs)能够执行可靠的多步工作流已成为人工智能领域的核心挑战。尽管LLMs的智能体能力近期取得了进展,但大多数智能体系统仍缺乏用于指定、验证和调试其工作流及执行轨迹的形式化方法。这一挑战类似于数学中长期存在的问题,其中自然语言(NL)的模糊性促使了形式语言(FL)的发展。受此范式启发,我们提出了**Lean4Agent**,据我们所知,这是首个使用依赖类型形式语言Lean4来建模和验证智能体行为的框架。**Lean4Agent**推出了**FormalAgentLib**,一个可扩展的Lean4库,用于在显式假设下形式化建模和验证智能体工作流的语义一致性,并能够定位轨迹揭示的运行时故障。基于**FormalAgentLib**,我们进一步开发了**LeanEvolve**,它应用**FormalAgentLib**中的结果来修订工作流以增强其能力。在SWE-Bench-Verified的困难子集和ELAIP-Bench子集上,针对5个领先LLMs的大量实验表明,通过验证的工作流比未通过的工作流平均性能提升**11.94%**,而**LeanEvolve**进一步将SWE性能平均提升**7.47%**。此外,**Lean4Agent**为使用表达能力强的依赖类型形式语言形式化建模和验证智能体行为这一新领域奠定了基础。

英文摘要

Equipping Large Language Models (LLMs) to execute reliable multi-step workflows has become a central challenge in artificial intelligence. Despite recent advances in LLMs' agentic capabilities, most agent systems still lack formal methods for specifying, verifying, and debugging their workflow and execution trajectories. This challenge mirrors a long-standing problem in mathematics, where the ambiguity of natural languages (NLs) motivates the development of formal languages (FLs). Inspired by this paradigm, we propose **Lean4Agent**, to the best of our knowledge, the first framework that uses Lean4, a dependent-type FL, to model and verify agent behavior. **Lean4Agent** launches **FormalAgentLib**, an extensible Lean4 library for formally modeling and verifying agent workflows' semantic consistency under explicit assumptions, and enabling localization of execution-time failures revealed by trajectories. Building on **FormalAgentLib**, we further develop **LeanEvolve**, which applies results in **FormalAgentLib** to revise workflows to enhance their capability. Extensive experiments on a hard problem subset of SWE-Bench-Verified and a subset of ELAIP-Bench across 5 leading LLMs indicate that the verification-passing workflows outperform the failing ones by an average of **11.94%**, and **LeanEvolve** further improves SWE performance by **7.47%** on average. Furthermore, **Lean4Agent** establishes a foundation for a new field of using expressive dependent-type FL to formally model and verify agent behavior.
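Lean4Agent works in Lean4 with dependent types. The Python sketch below only mimics the flavor of one kind of check: whether each step's preconditions are established by explicit assumptions plus earlier steps, with the first unsupported step reported as the failure location. `Step`, `first_unsupported_step`, and the fact names are hypothetical and are not part of FormalAgentLib.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class Step:
    name: str
    requires: FrozenSet[str]   # facts that must already hold before the step runs
    ensures: FrozenSet[str]    # facts the step establishes when it succeeds


def first_unsupported_step(workflow: List[Step], assumptions: FrozenSet[str]) -> Optional[str]:
    """Walk the workflow in order and return the first step whose preconditions are not
    established by the explicit assumptions plus earlier steps; None if the chain holds."""
    facts = set(assumptions)
    for step in workflow:
        if step.requires - facts:
            return step.name
        facts |= step.ensures
    return None


workflow = [
    Step("locate_bug", frozenset({"repo_cloned"}), frozenset({"bug_located"})),
    Step("write_patch", frozenset({"bug_located"}), frozenset({"patch_written"})),
    Step("run_tests", frozenset({"patch_written", "env_ready"}), frozenset({"tests_pass"})),
]
print(first_unsupported_step(workflow, frozenset({"repo_cloned"})))               # run_tests
print(first_unsupported_step(workflow, frozenset({"repo_cloned", "env_ready"})))  # None
```

A proof assistant adds what this sketch lacks: the facts become typed propositions, and "ensures" must be proved rather than merely declared.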

2606.13827 2026-06-17 math.NA cs.LG cs.NA stat.ML 版本更新

Approximating Gaussian Whittle-Matern Fields over Well-Centered Triangulations of Riemannian Manifolds

黎曼流形良中心三角剖分上的高斯Whittle-Matérn场逼近

Srinivas Nambirajan

发表机构 * Riemannian Manifolds(黎曼流形) Discrete Exterior Calculus(离散外微积分) Finite Element Exterior Calculus(有限元外微积分)

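AI summary: Proposes a GMRF approximation based on discrete exterior calculus that handles the whole Whittle-Matérn family uniformly and allows its parameters to be inferred. The method accommodates pointwise and piecewise-smoothed measurements, is computationally independent of the interpolants used, and comes with a low-rank approximation for compressed sensing.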

Comments More specific title, updated acknowledgement, minor typos fixed

AI中文摘要

马尔可夫Whittle-Matérn场已通过具有稀疏精度矩阵的离散高斯马尔可夫随机场(GMRF)得到收敛逼近,所用方法是对两参数SPDE族 \( (\kappa^2 - \Delta)^{\alpha/2} u = \mathcal{W}, \;\; \kappa \in \mathbb{R}, \; \alpha \in \mathbb{N} \) 的有限元近似。利用离散外微积分(DEC)分析的最新进展,我们提出了一种不同但密切相关的收敛GMRF逼近方法,适用于离散化为良好中心单纯复形的完备无边黎曼流形上的此类Matérn场。该收敛方法:(i) 对\(\alpha, \kappa\)不可知,从而为整个\((\alpha, \kappa)\)族GMRF的精度矩阵和协方差矩阵提供通用逼近方案,因此它们可以被推断而非猜测;(ii) 固有地建模随机场的逐点测量和分段平滑测量,并对两者逼近得同样好;(iii) 在计算上与所用插值函数无关:若在同一网格上将一种收敛插值替换为另一种合适插值,不会产生额外开销。此外,我们证明,在精确意义上连通良好且体积集中的离散化上,精度矩阵是图拉普拉斯矩阵的谱函数。我们为该族Matérn GMRF提供了一个低秩逼近器,并提及一个用例:通过压缩感知减少建模GMRF所需的测量数量。

英文摘要

Markovian Whittle-Matérn fields have been convergently approximated by discrete Gauss Markov Random Fields (GMRFs) with sparse precision matrices using a Finite Element approximation of the two-parameter family of SPDEs \[ (κ^2 - Δ)^{α/2} u = \mathcal{W}, \;\; κ\in \mathbb{R}, \; α\in \mathbb{N}. \] Using recent developments in the analysis of Discrete Exterior Calculus (DEC), we present a different, yet closely related, convergent GMRF approximation to these Matérn fields over complete, boundaryless Riemannian manifolds discretized as well-centered simplicial complexes. This convergent method (i) is agnostic to $α, κ$ and thus allows a universal approximation scheme for the precision and covariance matrices of the entire $(α, κ)$-family of GMRFs, so they may be inferred rather than guessed; (ii) inherently models pointwise and piecewise-smoothed measurements of a random field and approximates both equally well; and (iii) is computationally independent of the interpolants used: it suffers no overhead if one convergent interpolant is replaced with another suitable interpolant over the same mesh. Furthermore, we show that, on discretizations that are well-connected in a precise sense and volume-concentrated, the precision matrices are spectral functions of a graph Laplacian. We provide a low-rank approximator to the family of such Matérn GMRFs and mention a use case: reducing, via compressed sensing, the number of measurements needed to model the GMRF.
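The claim that the precision matrices are spectral functions of a graph Laplacian can be illustrated with the simplest discrete analogue of the SPDE. If $u = (\kappa^2 - \Delta)^{-\alpha/2}\mathcal{W}$ with white noise $\mathcal{W}$, the precision operator is $(\kappa^2 - \Delta)^{\alpha}$. Replacing $-\Delta$ with a graph Laplacian $L$ then gives $Q = (\kappa^2 I + L)^{\alpha}$. The sketch below uses a uniform path graph and ignores the mass matrices and DEC Hodge stars that a real construction on a well-centered mesh would need, so it is not the paper's scheme.

```python
import numpy as np


def path_graph_laplacian(n: int) -> np.ndarray:
    """Combinatorial Laplacian L = D - A of a path graph with n vertices."""
    A = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)
    return np.diag(A.sum(axis=1)) - A


def matern_precision(L: np.ndarray, kappa: float, alpha: int) -> np.ndarray:
    """Q = (kappa^2 I + L)^alpha, a spectral function of the graph Laplacian L."""
    return np.linalg.matrix_power(kappa**2 * np.eye(L.shape[0]) + L, alpha)


L = path_graph_laplacian(5)
Q = matern_precision(L, kappa=1.0, alpha=2)
print(Q)   # banded: alpha = 2 couples only vertices at most two hops apart
```

Because $Q$ is a polynomial of degree $\alpha$ in a sparse $L$, it stays sparse (banded here) for integer $\alpha$. This sparsity is what makes such fields GMRFs. Its eigenvalues are $(\kappa^2 + \lambda_i)^{\alpha}$ for the Laplacian eigenvalues $\lambda_i$.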

2502.17773 2026-06-17 stat.ME cs.AI cs.LG 版本更新

How Many Human Survey Respondents is a Large Language Model Worth? An Uncertainty Quantification Perspective

一个大型语言模型相当于多少名人类调查受访者?基于不确定性量化的视角

Chengpiao Huang, Yuhang Wu, Kaizheng Wang

发表机构 * Department of IEOR, Columbia University(哥伦比亚大学工业工程与运筹学系) Decision, Risk, and Operations Division, Columbia Business School(哥伦比亚商学院决策、风险与运营系) Department of IEOR and Data Science Institute, Columbia University(哥伦比亚大学工业工程与运筹学系及数据科学研究所)

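AI summary: From an uncertainty-quantification perspective, the paper proposes a framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses by quantifying the uncertainty caused by human-LLM misalignment. The key design choice is the number of simulated responses: too many yield overly narrow sets with poor coverage, while too few yield overly wide, uninformative sets. A data-driven method adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence-set construction procedure. The selected sample size also reflects the effective human population size that the LLM can represent, giving a quantitative measure of its simulation fidelity. Experiments show heterogeneous simulation fidelity across LLMs and domains.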

Comments 63 pages, 13 figures

AI中文摘要

大型语言模型(LLMs)越来越多地用于模拟调查响应,但合成数据可能与人类人口不一致,导致不可靠的推断。我们开发了一个通用框架,将LLM模拟的响应转换为人类响应总体参数的可靠置信集,量化由人类-LLM不一致引起的不确定性。关键设计选择是模拟响应的数量:过多会产生过于狭窄的置信集,覆盖性差;过少则会产生过于宽泛且信息不足的置信集,受随机噪声主导。我们提出了一种数据驱动的方法,自适应地选择模拟样本量以实现名义平均覆盖性,无论LLM的模拟保真度或置信集构建过程如何。所选样本量进一步被证明反映了LLM能代表的有效人类人口规模,提供其模拟保真度的定量度量。在真实调查数据集上的实验揭示了不同LLM和领域之间的异质性模拟保真度。

英文摘要

Large language models (LLMs) are increasingly used to simulate survey responses, but synthetic data can be misaligned with the human population, leading to unreliable inference. We develop a general framework that converts LLM-simulated responses into reliable confidence sets for population parameters of human responses, quantifying the uncertainty induced by the human-LLM misalignment. The key design choice is the number of simulated responses: too many produce overly narrow sets with poor coverage, while too few yield overly wide and uninformative sets dominated by stochastic noise. We propose a data-driven approach that adaptively selects the simulation sample size to achieve nominal average-case coverage, regardless of the LLM's simulation fidelity or the confidence set construction procedure. The selected sample size is further shown to reflect the effective human population size that the LLM can represent, providing a quantitative measure of its simulation fidelity. Experiments on real survey datasets reveal heterogeneous simulation fidelity across different LLMs and domains.
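A toy analogue of the trade-off helps make it concrete: the more simulated responses we pool, the narrower the interval, until it stops covering the human value. The sketch uses Wald intervals on binary answers and picks the largest candidate size whose intervals cover the human mean on enough calibration questions. The interval type, the calibration setup, and the function names are assumptions. The paper's data-driven procedure targets average-case coverage and works with general confidence-set constructions.

```python
import math
from typing import List, Sequence, Tuple


def wald_interval(samples: Sequence[int], z: float = 1.96) -> Tuple[float, float]:
    """Normal-approximation confidence interval for a proportion."""
    k = len(samples)
    p = sum(samples) / k
    half_width = z * math.sqrt(p * (1 - p) / k)
    return p - half_width, p + half_width


def select_sim_size(sim_responses: List[Sequence[int]], human_means: Sequence[float],
                    candidates: Sequence[int], alpha: float = 0.1) -> int:
    """Largest candidate k whose intervals, built from the first k simulated answers,
    cover the human mean on at least a (1 - alpha) fraction of calibration questions.
    Falls back to the smallest candidate if none qualifies."""
    chosen = min(candidates)
    for k in sorted(candidates):
        covered = 0
        for responses, truth in zip(sim_responses, human_means):
            lo, hi = wald_interval(responses[:k])
            covered += lo <= truth <= hi
        if covered / len(sim_responses) >= 1 - alpha:
            chosen = k
    return chosen


# One hypothetical calibration question: the LLM answers "yes" half the time,
# while the human population answers "yes" 60% of the time.
sim = [[1, 0] * 50]
print(select_sim_size(sim, human_means=[0.6], candidates=[10, 50, 100]))   # 50
```

With k = 100, the interval is [0.402, 0.598] and just misses 0.6, while k = 50 still covers it. In this toy case, the simulator is therefore "worth" about 50 responses, loosely mirroring the paper's reading of the selected size as an effective human sample size.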

2605.12220 2026-06-17 cs.CV cs.AI cs.LG cs.RO 版本更新

TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion

TriBand-BEV:基于高度感知的鸟瞰图与高分辨率特征融合的实时仅LiDAR三维行人检测

Mohammad Khoshkdahan, Alexey Vinel

发表机构 * Karlsruhe Institute of Technology(卡尔斯鲁厄理工学院)

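AI summary: Proposes TriBand-BEV, a real-time LiDAR-only 3D pedestrian detector based on height-aware bird's-eye-view encoding and high-resolution feature fusion. A lightweight BEV tensor mapping lets a single network detect cars, pedestrians, and cyclists in one pass, improving both detection accuracy and speed.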

Comments Accepted for publication in the Proceedings of the 2026 International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)

Journal ref
Proceedings of the 25th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2026)
AI中文摘要

安全的自动驾驶智能体和移动机器人需要快速的实时三维感知,尤其是对于行人等易受伤害道路使用者(VRU)。我们介绍了一种新的鸟瞰图(BEV)编码方法,将完整的三维LiDAR点云映射到具有三个高度带的轻量级二维BEV张量中。我们明确地将三维检测重新表述为二维检测问题,然后从BEV输出中重建三维框。单个网络在一次前向传递中检测车辆、行人和自行车。骨干网络在深层阶段使用区域注意力,层次化的双向颈部网络在P1到P4之间融合上下文和细节,检测头预测定向框,其中边偏移采用分布焦点学习,并使用旋转IoU损失。训练时在通道空间中应用小幅垂直重分箱和温和的反射率抖动,以防止记忆化。我们在三维重建过程中使用四分位距(IQR)滤波器去除噪声和离群的LiDAR点。在KITTI数据集上,TriBand-BEV在单个消费级GPU上以49 FPS运行,行人BEV AP在易、中、难样本上分别达到58.7/52.6/47.2%,优于Complex-YOLO,分别提升+12.6%、+7.5%和+3.1%。定性场景显示在遮挡下检测稳定。该流程紧凑,可用于实时机器人部署。我们的源代码已在GitHub上公开。

英文摘要

Safe autonomous agents and mobile robots need fast, real-time 3D perception, especially for vulnerable road users (VRUs) such as pedestrians. We introduce a new bird's eye view (BEV) encoding, which maps the full 3D LiDAR point cloud into a lightweight 2D BEV tensor with three height bands. We explicitly reformulate 3D detection as a 2D detection problem and then reconstruct 3D boxes from the BEV outputs. A single network detects cars, pedestrians, and cyclists in one pass. The backbone uses area attention at deep stages, a hierarchical bidirectional neck over P1 to P4 fuses context and detail, and the head predicts oriented boxes with distribution focal learning for side offsets and a rotated IoU loss. Training applies a small vertical re-bin and a mild reflectance jitter in channel space to resist memorization. We use an interquartile range (IQR) filter to remove noisy and outlier LiDAR points during 3D reconstruction. On the KITTI dataset, TriBand-BEV attains 58.7/52.6/47.2 pedestrian BEV AP (%) for easy, moderate, and hard at 49 FPS on a single consumer GPU, surpassing Complex-YOLO by +12.6%, +7.5%, and +3.1%, respectively. Qualitative scenes show stable detection under occlusion. The pipeline is compact and ready for real-time robotic deployment. Our source code is publicly available on GitHub.
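The three-band BEV idea can be sketched as rasterizing each LiDAR point into a 2D grid cell and a height-band channel. The band edges, grid extents, resolution, and the use of plain occupancy as the channel value are all assumptions for illustration. The paper's encoding may store other per-band statistics (its training jitters reflectance in channel space, for example).

```python
import numpy as np


def triband_bev(points: np.ndarray,
                x_range=(0.0, 40.0), y_range=(-20.0, 20.0),
                resolution: float = 0.1,
                band_edges=(-2.0, -1.0, 0.0, 1.0)) -> np.ndarray:
    """Rasterize an (N, 3) LiDAR point cloud into a (3, H, W) BEV occupancy tensor.

    Channel k marks cells containing at least one point whose height z falls in
    [band_edges[k], band_edges[k + 1]); points outside the grid or bands are dropped.
    """
    H = int(round((x_range[1] - x_range[0]) / resolution))
    W = int(round((y_range[1] - y_range[0]) / resolution))
    bev = np.zeros((3, H, W), dtype=np.float32)

    rows = np.floor((points[:, 0] - x_range[0]) / resolution).astype(int)
    cols = np.floor((points[:, 1] - y_range[0]) / resolution).astype(int)
    bands = np.digitize(points[:, 2], band_edges) - 1   # -1 below, 3 above the bands

    keep = (rows >= 0) & (rows < H) & (cols >= 0) & (cols < W) & (bands >= 0) & (bands < 3)
    bev[bands[keep], rows[keep], cols[keep]] = 1.0
    return bev


points = np.array([[1.05, 0.05, -1.5],    # lowest band
                   [1.05, 0.05, 0.5],     # highest band, same (x, y) cell
                   [50.0, 0.0, 0.0]])     # outside the x range, dropped
bev = triband_bev(points)
print(bev.shape, bev.sum())
```

Stacking height bands as channels is what lets an ordinary 2D detector see coarse vertical structure (for example, a pedestrian's legs and torso in different bands) without 3D convolutions.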

2604.13662 2026-06-17 cond-mat.mes-hall cs.CV cs.LG 版本更新

Automatic Charge State Tuning of 300 mm FDSOI Quantum Dots Using Neural Network Segmentation of Charge Stability Diagram

300毫米FDSOI量子点自动电荷状态调节:基于神经网络的电荷稳定性图分割

Peter Samaha, Amine Torki, Ysaline Renaud, Sam Fiette, Emmanuel Chanrion, Pierre-Andre Mortemousque, Yann Beilliard

发表机构 * CEA-Leti, Univ. Grenoble Alpes(法国格勒诺布尔-阿尔卑斯大学)

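AI summary: Proposes a deep-learning semantic-segmentation pipeline that automates quantum-dot charge tuning by identifying transition lines in charge stability diagrams, improving the efficiency of high-throughput charge tuning for silicon quantum-dot qubits.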

Comments 10 pages, 6 figures, supplementary materials available

AI中文摘要

调节栅极定义的半导体量子点(QD)是扩展自旋量子比特技术的主要瓶颈。我们提出了一种由深度学习(DL)驱动的语义分割流程,通过在完整的电荷稳定性图(CSD)中定位过渡线来实现电荷自动调节,并返回单电荷区的栅极电压目标。我们收集并手动标注了一个大型异构数据集,包含从硅量子点器件测得的1015张实验电荷稳定性图,涵盖九种设计几何、多个晶圆和制造批次。一个以MobileNetV2为编码器的U-Net风格卷积神经网络(CNN)通过五折分组交叉验证进行训练和验证。我们的模型在定位单电荷区方面实现了80.0%的总体离线调节成功率,某些设计的峰值性能超过88%。我们分析了主要的失败模式并提出了针对性的缓解措施。最后,宽范围图分割还自然地支持可扩展的基于物理的特征提取,可反馈到制造和设计流程中;我们还概述了在低温晶圆探针台中实现实时集成的路线图。总体而言,我们的结果表明,基于神经网络(NN)的宽图分割是迈向硅量子点量子比特自动化、高通量电荷调节的切实一步。

英文摘要

Tuning of gate-defined semiconductor quantum dots (QDs) is a major bottleneck for scaling spin qubit technologies. We present a deep learning (DL) driven, semantic-segmentation pipeline that performs charge auto-tuning by locating transition lines in full charge stability diagrams (CSDs) and returns gate voltage targets for the single-charge regime. We assemble and manually annotate a large, heterogeneous dataset of 1015 experimental CSDs measured from silicon QD devices, spanning nine design geometries, multiple wafers, and fabrication runs. A U-Net style convolutional neural network (CNN) with a MobileNetV2 encoder is trained and validated through five-fold group cross-validation. Our model achieves an overall offline tuning success of 80.0% in locating the single-charge regime, with peak performance exceeding 88% for some designs. We analyze dominant failure modes and propose targeted mitigations. Finally, wide-range diagram segmentation also naturally enables scalable physics-based feature extraction that can feed back into fabrication and design workflows, and we outline a roadmap for real-time integration in a cryogenic wafer prober. Overall, our results show that neural network (NN) based wide-diagram segmentation is a practical step toward automated, high-throughput charge tuning for silicon QD qubits.
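The model is evaluated with five-fold group cross-validation, meaning the folds are formed so that related diagrams never appear in both training and validation. The abstract does not say what the grouping key is. The sketch assumes a generic group label per CSD (for example, a device or design identifier) and balances fold sizes greedily. It is similar in spirit to scikit-learn's `GroupKFold`, but it is a standalone illustration, not the paper's split.

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Sequence, Tuple


def group_k_fold(groups: Sequence[Hashable], k: int = 5) -> List[Tuple[List[int], List[int]]]:
    """Split sample indices into k (train, validation) pairs so that no group
    appears on both sides of any split."""
    by_group: Dict[Hashable, List[int]] = defaultdict(list)
    for idx, g in enumerate(groups):
        by_group[g].append(idx)

    # Greedy balancing: assign the largest remaining group to the currently smallest fold.
    folds: List[List[int]] = [[] for _ in range(k)]
    for g in sorted(by_group, key=lambda g: len(by_group[g]), reverse=True):
        min(folds, key=len).extend(by_group[g])

    splits = []
    for i in range(k):
        val = sorted(folds[i])
        train = sorted(idx for j, fold in enumerate(folds) if j != i for idx in fold)
        splits.append((train, val))
    return splits


# Hypothetical group label (e.g. device ID) for each of 15 diagrams.
groups = ["d1"] * 4 + ["d2"] * 3 + ["d3"] * 3 + ["d4"] * 2 + ["d5"] * 2 + ["d6"]
for train, val in group_k_fold(groups, k=5):
    print(sorted({groups[i] for i in val}), len(train), len(val))
```

Grouping matters here because diagrams from the same device or design share systematic features. A random split would let the network see near-duplicates of its validation data and overstate the tuning success rate.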

2512.03805 2026-06-17 cs.LG 版本更新

Deep Reinforcement Learning for Dynamic Algorithm Configuration: A Case Study on Optimizing OneMax with the $(1+(λ,λ))$-GA

面向动态算法配置的深度强化学习:以(1+(λ,λ))-GA优化OneMax为例的案例研究

Tai Nguyen, Phong Le, André Biedenkapp, Carola Doerr, Nguyen Dang

发表机构 * University of St Andrews, United Kingdom(圣安德鲁大学,英国) Sorbonne Université, CNRS, LIP6, France(索邦大学,法国) University of Freiburg, Germany(弗赖堡大学,德国)

AI总结 本文研究了深度强化学习算法DDQN和PPO在OneMax问题上控制(1+(λ,λ))-GA种群大小时面临的挑战,发现二者存在可扩展性下降和学习不稳定问题,并通过自适应奖励平移机制改进DDQN,使其在样本效率上远超先前的DAC方法。

Comments arXiv admin note: text overlap with arXiv:2502.20265

详情
AI中文摘要

动态算法配置(DAC)研究如何高效地为参数化优化算法确定控制策略。许多研究利用强化学习(RL)解决DAC问题;然而,应用RL通常需要大量领域专业知识。在本文中,我们对两种深度RL算法,即双深度Q网络(DDQN)和近端策略优化(PPO),在OneMax实例上控制(1+(λ,λ))-GA种群大小的表现进行了深入研究。尽管OneMax在结构上简单,但为(1+(λ,λ))-GA学习有效的控制策略会形成一个极具挑战性的DAC问题景观,使其成为一个受控但要求严苛的基准。我们的研究揭示了限制DDQN和PPO的两个根本性挑战:可扩展性下降和学习不稳定,其根源在于探索不足和规划时间跨度覆盖不足。为解决探索不足问题,我们引入了一种自适应奖励平移机制,利用奖励分布的统计信息来增强DDQN的探索。该机制免除了针对特定实例的超参数调优,并确保在不同问题规模上均保持有效。为解决规划时间跨度覆盖问题,我们证明了无折扣学习在DDQN中能够成功,而PPO则面临根本性的方差问题,需要其他设计。我们进一步表明,尽管超参数优化提升了PPO的稳定性,它仍始终无法找到有效的策略。最后,结合自适应奖励平移的DDQN取得了与理论推导策略相当的性能,且样本效率大幅提升,比先前的DAC方法高出数个数量级。我们的发现揭示了标准深度RL方法在这一具有挑战性的DAC设置中面临的根本障碍,并突出了有效学习所需的关键方法要素。

英文摘要

Dynamic Algorithm Configuration (DAC) studies the efficient identification of control policies for parameterized optimization algorithms. Numerous studies leverage Reinforcement Learning (RL) to address DAC challenges; however, applying RL often requires extensive domain expertise. In this work, we conduct a comprehensive study of two deep-RL algorithms--Double Deep Q-Networks (DDQN) and Proximal Policy Optimization (PPO)--for controlling the population size of the $(1+(λ,λ))$-GA on OneMax instances. Although OneMax is structurally simple, learning effective control policies for the $(1+(λ,λ))$-GA induces a highly challenging DAC landscape, making it a controlled yet demanding benchmark. Our investigation reveals two fundamental challenges limiting DDQN and PPO: scalability degradation and learning instability, traced to under-exploration and planning horizon coverage. To address under-exploration, we introduce an adaptive reward shifting mechanism that leverages reward distribution statistics to enhance DDQN exploration. This eliminates instance-specific hyperparameter tuning and ensures consistent effectiveness across problem scales. To resolve planning horizon coverage, we demonstrate that undiscounted learning succeeds in DDQN, while PPO faces fundamental variance issues necessitating alternative designs. We further show that while hyperparameter optimization enhances PPO's stability, it consistently fails to identify effective policies. Finally, DDQN with adaptive reward shifting achieves performance comparable to theoretically derived policies with vastly improved sample efficiency, outperforming prior DAC approaches by orders of magnitude. Our findings provide insights into the fundamental obstacles faced by standard deep-RL approaches in this challenging DAC setting and highlight the key methodological ingredients required for effective learning.
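For readers unfamiliar with the benchmark, the following stdlib sketch runs the $(1+(λ,λ))$-GA on OneMax with a pluggable population-size policy; that policy is the quantity the DDQN/PPO agents above learn to control. It follows the textbook two-phase scheme (a mutation phase with $\ell \sim \mathrm{Bin}(n, λ/n)$, then a biased crossover phase with bias $1/λ$). Resampling $\ell$ until it is positive and rounding λ to an integer are implementation choices assumed here, and the example policy $λ = \sqrt{n/(n-f(x))}$ is the fitness-dependent choice from the theory literature, not a learned one.

```python
import random
from typing import Callable, List, Tuple

Policy = Callable[[int, int], float]  # (current fitness, n) -> lambda


def onemax(bits: List[int]) -> int:
    return sum(bits)


def run_ga(n: int, policy: Policy, seed: int = 0,
           max_evals: int = 1_000_000) -> Tuple[int, int]:
    """Run the (1+(lambda,lambda))-GA on OneMax; return (final fitness, evaluations)."""
    rng = random.Random(seed)
    x = [rng.randint(0, 1) for _ in range(n)]
    fx, evals = onemax(x), 1
    while fx < n and evals < max_evals:
        lam = max(1, min(n, round(policy(fx, n))))
        p, c = lam / n, 1.0 / lam
        # Mutation phase: draw ell ~ Bin(n, p) (resampled until > 0), then
        # create lam mutants that each flip exactly ell random bits of x.
        ell = 0
        while ell == 0:
            ell = sum(rng.random() < p for _ in range(n))
        best_mut, best_mut_f = x, -1
        for _ in range(lam):
            y = x[:]
            for i in rng.sample(range(n), ell):
                y[i] = 1 - y[i]
            fy = onemax(y)
            evals += 1
            if fy > best_mut_f:
                best_mut, best_mut_f = y, fy
        # Crossover phase: each offspring takes a bit from the best mutant with
        # probability c, otherwise from x; keep the best, ties favour offspring.
        best, best_f = x, fx
        for _ in range(lam):
            z = [m if rng.random() < c else b for b, m in zip(x, best_mut)]
            fz = onemax(z)
            evals += 1
            if fz >= best_f:
                best, best_f = z, fz
        x, fx = best, best_f
    return fx, evals


theory_policy: Policy = lambda f, n: (n / (n - f)) ** 0.5
print(run_ga(50, theory_policy, seed=1))
```

A DAC agent would replace `theory_policy` with a learned mapping from the observed state (here only the current fitness) to λ, using the number of evaluations as its cost signal.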

2310.06328 2026-06-17 cs.LG eess.SP 版本更新

ARC-Fi: Exploiting Antenna Spatial Diversity for Label-Efficient Domain Generalization in Wi-Fi Sensing

ARC-Fi:利用天线空间多样性实现Wi-Fi传感中的标签高效领域泛化

Ke Xu, Zhiyong Zheng, Hongyuan Zhu, Lei Wang, Jiangtao Wang

发表机构 * Suzhou Institute for Advanced Research, University of Science and Technology of China(中国科学技术大学苏州研究院) Suzhou Big Data and AI Research and Engineering Center(苏州大数据与人工智能研究与工程中心) School of Artificial Intelligence and Data Science, University of Science and Technology of China(中国科学技术大学人工智能与数据科学学院) Institute for Infocomm Research (I2R), A*STAR(资讯与通讯研究院(I2R),A*STAR) School of Computer Science and Technology, Soochow University(苏州大学计算机科学与技术学院)

AI总结 ARC-Fi通过引入物理指导的数据增强策略,解决Wi-Fi传感中领域偏移问题,实现高效领域泛化。

Comments This work has been submitted to the IEEE for possible publication

详情
AI中文摘要

Wi-Fi传感系统在部署于未见过的现实环境时受到领域偏移的严重阻碍。尽管现有方法试图通过无监督领域适应(UDA)或领域泛化(DG)来解决这一问题,但它们严重依赖于不可用的目标数据或过于昂贵且庞大的标注源数据集。在实践中,收集大量未标注的信道状态信息(CSI)是可行的,而手动标注则受到严重限制。这一现实困境使半监督领域泛化(SSDG)成为必要。为此,我们提出了ARC-Fi,这是首个专门用于Wi-Fi传感的SSDG框架。将传统对比学习直接应用于CSI数据,不可避免地会引发领域特定的“捷径学习”,导致模型记忆环境背景而非手势动态。为克服这一问题,ARC-Fi引入了一种物理指导的数据增强策略:天线响应一致性(ARC)模块。ARC利用多天线系统的内在空间多样性,将位于同一位置的天线信号视为天然保持语义的增强视图,以明确阻断环境捷径。此外,我们引入了一个统一的半监督对比目标,利用稀缺标签和可靠的伪标签对跨领域特征进行对齐,有效防止了对同类实例的盲目排斥。在Widar和CSIDA数据集上的大量实验表明,ARC-Fi达到了新的最先进水平,显著优于现有的UDA、DG和SSDG方法。最终,这项工作提供了一个基于物理的、标签高效的解决方案,推动了稳健的现实Wi-Fi传感系统的大规模部署。代码地址:https://github.com/KaoruMiyazono/UniCrossFi

英文摘要

Wi-Fi sensing systems are severely hindered by domain shifts when deployed in unseen real-world environments. While existing methods attempt to tackle this through Unsupervised Domain Adaptation (UDA) or Domain Generalization (DG), they critically rely on either inaccessible target data or prohibitively expensive, massive labeled source datasets. In practice, collecting abundant unlabeled Channel State Information (CSI) is feasible, whereas manual labeling is severely constrained. This realistic dilemma necessitates Semi-Supervised Domain Generalization (SSDG). To this end, we propose ARC-Fi, the first dedicated SSDG framework for Wi-Fi sensing. Directly applying conventional contrastive learning to CSI data inevitably triggers paradigm-specific "shortcut learning," causing models to memorize environmental backgrounds rather than gesture dynamics. To overcome this, ARC-Fi introduces a physics-informed data augmentation strategy: the Antenna Response Consistency (ARC) module. ARC exploits the intrinsic spatial diversity of multi-antenna systems, treating signals from co-located antennas as naturally semantics-preserving augmented views to explicitly block environmental shortcuts. Furthermore, we introduce a unified Semi-Supervised Contrastive Objective that leverages scarce labels and reliable pseudo-labels to align cross-domain features, effectively preventing the blind repulsion of same-class instances. Extensive experiments on the Widar and CSIDA datasets demonstrate that ARC-Fi establishes a new state-of-the-art, significantly outperforming existing UDA, DG, and SSDG methods. Ultimately, this work provides a physics-grounded, label-efficient solution, advancing the scalable deployment of robust real-world Wi-Fi sensing systems. Code is available at: https://github.com/KaoruMiyazono/UniCrossFi.
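A supervised-contrastive-style loss in the spirit of the objective described above: views from different antennas of the same recording are always positives, and two recordings are also positives when they share a ground-truth label or a reliable pseudo-label, so same-class samples are not pushed apart. This is a generic sketch assuming SupCon-style normalisation and a temperature τ = 0.1, not ARC-Fi's exact formulation; it uses plain lists so it runs without a deep-learning framework.

```python
import math
from typing import List, Optional, Sequence


def _unit(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(a * a for a in v)) or 1.0
    return [a / norm for a in v]


def semi_sup_contrastive_loss(
    embeddings: Sequence[Sequence[float]],
    recording_ids: Sequence[int],
    labels: Sequence[Optional[int]],  # true label, pseudo-label, or None
    tau: float = 0.1,
) -> float:
    z = [_unit(v) for v in embeddings]
    n = len(z)
    total, anchors = 0.0, 0
    for i in range(n):
        # Positives: other antenna views of the same recording, or any sample
        # sharing this anchor's (pseudo-)label.
        positives = [
            j for j in range(n)
            if j != i and (recording_ids[j] == recording_ids[i]
                           or (labels[i] is not None and labels[j] == labels[i]))
        ]
        if not positives:
            continue
        logits = {j: sum(a * b for a, b in zip(z[i], z[j])) / tau
                  for j in range(n) if j != i}
        log_denominator = math.log(sum(math.exp(s) for s in logits.values()))
        total += -sum(logits[j] - log_denominator for j in positives) / len(positives)
        anchors += 1
    return total / max(anchors, 1)


# Two recordings, each seen by two antennas (hypothetical 2-D embeddings).
aligned = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
print(semi_sup_contrastive_loss(aligned, [0, 0, 1, 1], [0, 0, 1, None]))
```

Unlabelled samples (label `None`) still contribute through their own antenna views, which is what lets the unlabeled CSI participate in training.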

2602.13318 2026-06-17 cs.AI cs.CV cs.LG 版本更新

DECKBench: Benchmarking Multi-Agent Frameworks for Academic Slide Generation and Editing

DECKBench:用于学术幻灯片生成和编辑的多智能体框架基准测试

Daesik Jang, Morgan Lindsay Heisler, Linzi Xing, Yifei Li, Edward Wang, Ying Xiong, Yong Zhang, Zhenan Fan

发表机构 * Huawei Technologies Canada(华为加拿大技术有限公司) University of British Columbia(不列颠哥伦比亚大学)

AI总结 本文提出DECKBench,一个用于评估多智能体生成和编辑学术幻灯片的框架,通过定制数据集和模拟编辑指令,系统评估幻灯片和整个演示文稿的忠实度、连贯性、布局质量和多轮指令遵循能力。

详情

英文摘要

Automatically generating and iteratively editing academic slide decks requires more than document summarization. It demands faithful content selection, coherent slide organization, layout-aware rendering, and robust multi-turn instruction following. However, existing benchmarks and evaluation protocols do not adequately measure these challenges. To address this gap, we introduce the Deck Edits and Compliance Kit Benchmark (DECKBench), an evaluation framework for multi-agent slide generation and editing. DECKBench is built on a curated dataset of paper-to-slide pairs augmented with realistic, simulated editing instructions. Our evaluation protocol systematically assesses slide-level and deck-level fidelity, coherence, layout quality, and multi-turn instruction following. We further implement a modular multi-agent baseline system that decomposes the slide generation and editing task into paper parsing and summarization, slide planning, HTML creation, and iterative editing. Experimental results demonstrate that the proposed benchmark highlights strengths, exposes failure modes, and provides actionable insights for improving multi-agent slide generation and editing systems. Overall, this work establishes a standardized foundation for reproducible and comparable evaluation of academic presentation generation and editing. Code and data are publicly available at https://github.com/morgan-heisler/DeckBench .

2602.00473 2026-06-17 quant-ph cs.AI cs.LG 版本更新

Quantum Phase Recognition via Quantum Attention Mechanism

基于量子注意力机制的量子相识别

Jin-Long Chen, Xin Li, Zhang-Qi Yin

发表机构 * Center for Quantum Technology Research(量子技术研究中心) Key Laboratory of Advanced Optoelectronic Quantum Architecture and Measurements (MOE)(先进光电量子结构设计与测量教育部重点实验室) School of Physics, Beijing Institute of Technology, Beijing 100081, China(北京理工大学物理学院,北京100081,中国)

AI总结 本文提出混合量子-经典注意力模型,利用交换测试和参数化量子电路提取量子态中的关联,实现基态分类,在9和15个量子比特的簇-伊辛模型上表现出高准确率和鲁棒性。

Comments 10 pages, 7 figures

详情
Journal ref
Phys. Rev. A 113, 062403 (2026)
AI中文摘要

多体系统中的量子相变本质上由复杂的关联结构刻画,这给传统方法在大规模系统中的计算带来了挑战。为此,我们提出了一种混合量子-经典注意力模型。该模型利用由交换测试和参数化量子电路实现的注意力机制,提取量子态中的关联并执行基态分类。在9和15个量子比特的簇-伊辛模型上进行基准测试,该模型在少于100个训练样本的情况下实现了高分类准确率,并展示了对训练集变化的鲁棒性。进一步分析表明,该模型成功捕捉了相敏感特征和特征物理长度尺度,为复杂多体系统中的量子相识别提供了一种可扩展且数据高效的方法。

英文摘要

Quantum phase transitions in many-body systems are fundamentally characterized by complex correlation structures, which pose computational challenges for conventional methods in large systems. To address this, we propose a hybrid quantum-classical attention model. This model uses an attention mechanism, realized through swap tests and a parameterized quantum circuit, to extract correlations within quantum states and perform ground-state classification. Benchmarked on the cluster-Ising model with system sizes of 9 and 15 qubits, the model achieves high classification accuracy with fewer than 100 training samples and demonstrates robustness against variations in the training set. Further analysis reveals that the model successfully captures phase-sensitive features and characteristic physical length scales, offering a scalable and data-efficient approach for quantum phase recognition in complex many-body systems.
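The swap test used by the attention mechanism estimates how similar two states are: after a Hadamard on an ancilla, a controlled-SWAP of the two registers, and a second Hadamard, the ancilla reads 0 with probability $(1 + |\langle\psi|\phi\rangle|^2)/2$. A minimal classical simulation, with statevectors as plain Python lists; the function names are illustrative, and how the paper turns these overlaps into attention weights is not specified here.

```python
import random
from typing import Sequence

State = Sequence[complex]


def swap_test_p0(psi: State, phi: State) -> float:
    """P(ancilla = 0) after H, controlled-SWAP(psi, phi), H.

    The ancilla-0 branch of the joint state is (psi (x) phi + phi (x) psi) / 2,
    so its squared norm equals (1 + |<psi|phi>|^2) / 2.
    """
    d = len(psi)
    return sum(abs((psi[i] * phi[j] + phi[i] * psi[j]) / 2) ** 2
               for i in range(d) for j in range(d))


def estimate_fidelity(psi: State, phi: State, shots: int, seed: int = 0) -> float:
    """Estimate |<psi|phi>|^2 from simulated ancilla measurements."""
    rng = random.Random(seed)
    p0 = swap_test_p0(psi, phi)
    zeros = sum(rng.random() < p0 for _ in range(shots))
    return 2 * zeros / shots - 1


s = 0.5 ** 0.5
print(swap_test_p0([1, 0], [s, s]))  # |0> vs |+>: overlap^2 = 0.5, so P0 = 0.75
```

On hardware only the sampled estimate is available, so the number of shots directly limits how precisely the overlaps can be resolved.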

2509.11154 2026-06-17 cs.LG cs.AI 版本更新

Feature Space Topology Control via Hopkins Loss

通过霍普金斯损失控制特征空间拓扑

Einari Vaaras, Manu Airaksinen

发表机构 * Signal Processing Research Centre, Tampere University(坦佩雷大学信号处理研究中心) BABA Center, Department of Physiology, University of Helsinki(赫尔辛基大学生理学系BABA中心)

AI总结 本文提出霍普金斯损失,用于控制特征空间拓扑,通过非线性瓶颈自编码器在语音、文本和图像数据中验证其在分类和降维中的有效性。

Comments Accepted for publication in Proc. IEEE ICTAI 2025, Athens, Greece

详情
AI中文摘要

特征空间拓扑指的是特征空间中样本的组织方式。修改此拓扑在机器学习应用中有益,包括降维、生成建模、迁移学习和对抗攻击的鲁棒性。本文引入了霍普金斯损失,利用霍普金斯统计量来强制实现期望的特征空间拓扑,与现有拓扑相关方法旨在保留输入特征拓扑不同。我们在语音、文本和图像数据的两个场景中评估了霍普金斯损失的有效性:分类和使用非线性瓶颈自编码器的降维。实验表明,将霍普金斯损失整合到分类或降维中对分类性能影响很小,但能提供修改特征拓扑的好处。

英文摘要

Feature space topology refers to the organization of samples within the feature space. Modifying this topology can be beneficial in machine learning applications, including dimensionality reduction, generative modeling, transfer learning, and robustness to adversarial attacks. This paper introduces a novel loss function, Hopkins loss, which leverages the Hopkins statistic to enforce a desired feature space topology, which is in contrast to existing topology-related methods that aim to preserve input feature topology. We evaluate the effectiveness of Hopkins loss on speech, text, and image data in two scenarios: classification and dimensionality reduction using nonlinear bottleneck autoencoders. Our experiments show that integrating Hopkins loss into classification or dimensionality reduction has only a small impact on classification performance while providing the benefit of modifying feature topology.
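The Hopkins statistic compares nearest-neighbour distances from uniformly sampled probe points ($u_i$) with those from sampled data points ($w_i$): $H = \sum u_i / (\sum u_i + \sum w_i)$, which is near 0.5 for uniformly spread data, near 1 for clustered data, and below 0.5 for regularly spaced data (some references use the reciprocal convention). The stdlib sketch below computes it; the squared-error `hopkins_loss` toward a target value is a hypothetical illustration of enforcing a desired topology, not necessarily the loss defined in the paper, and a trainable version would need a differentiable framework.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def _nn_dist(q: Point, data: List[Point], exclude: Optional[int] = None) -> float:
    return min(math.dist(q, p) for i, p in enumerate(data) if i != exclude)


def hopkins_statistic(data: List[Point], m: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    dims = len(data[0])
    lo = [min(p[k] for p in data) for k in range(dims)]
    hi = [max(p[k] for p in data) for k in range(dims)]
    # u: distances from m uniform probes (in the data's bounding box) to the data
    u = sum(_nn_dist([rng.uniform(lo[k], hi[k]) for k in range(dims)], data)
            for _ in range(m))
    # w: distances from m sampled data points to their nearest *other* data point
    w = sum(_nn_dist(data[i], data, exclude=i)
            for i in rng.sample(range(len(data)), m))
    return u / (u + w)


def hopkins_loss(h: float, target: float) -> float:
    # Hypothetical: penalise deviation of the batch's Hopkins value from a target.
    return (h - target) ** 2


grid = [(float(i), float(j)) for i in range(10) for j in range(10)]
print(hopkins_statistic(grid, m=20))  # regular spacing gives H < 0.5
```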

2509.03932 2026-06-17 cs.CL cs.CY cs.LG 版本更新

KPoEM: A Human-Annotated Dataset for Emotion Classification and RAG-Based Poetry Generation in Korean Modern Poetry

KPoEM:用于韩国现代诗歌情感分类与基于RAG的诗歌生成的人工标注数据集

Iro Lim, Haein Ji, Byungjun Kim

发表机构 * The Academy of Korean Studies(韩国学中央研究院) Graduate School of Korean Studies(韩国学研究生院) Cultural Informatics(文化信息学)

AI总结 本研究构建了KPoEM多标签情感数据集,通过序列微调策略实现F1-micro 0.60的情感分类,并验证了基于RAG的诗歌生成在韩国文学情感与文化表达上的可行性。

Comments 43 pages, 22 tables, 3 figures, Digital Humanities and Social Sciences Korea Conference, James Joo-Jin Kim Center for Korean Studies, University of Pennsylvania, Philadelphia, USA

详情
Journal ref
The Review of Korean Studies 29(1) (2026) 161-206
AI中文摘要

本研究介绍了KPoEM(韩国诗歌情感映射),这是一个新颖的数据集,为现代韩国诗歌中以情感为中心的分析和生成应用奠定了基础。尽管自然语言处理取得了进展,但由于诗歌复杂的比喻语言和文化特异性,相关研究仍不充分。我们基于五位有影响力的韩国诗人的作品构建了一个包含7,662条条目(7,007条诗行级和615条作品级)的多标签数据集,并使用44个细粒度情感类别进行标注。通过序列策略(从通用语料库过渡到专门的KPoEM数据集)微调的KPoEM情感分类模型取得了0.60的F1-micro分数,显著优于先前的模型(0.43)。该模型在保留核心诗歌情感的同时,展现出更强的识别具有时代和文化特定性的情感表达的能力。此外,将结构化情感数据集应用于基于RAG的诗歌生成模型,证明了生成反映韩国文学情感和文化感受性文本的实证可行性。这种综合方法加强了计算技术与文学分析之间的联系,为定量情感研究和生成诗学开辟了新途径。总体而言,本研究为推进现代韩国诗歌中以情感为中心的分析和创作奠定了基础。

英文摘要

This study introduces KPoEM (Korean Poetry Emotion Mapping), a novel dataset that serves as a foundation for both emotion-centered analysis and generative applications in modern Korean poetry. Despite advancements in NLP, poetry remains underexplored due to its complex figurative language and cultural specificity. We constructed a multi-label dataset of 7,662 entries (7,007 line-level and 615 work-level), annotated with 44 fine-grained emotion categories from five influential Korean poets. The KPoEM emotion classification model, fine-tuned through a sequential strategy -- moving from general-purpose corpora to the specialized KPoEM dataset -- achieved an F1-micro score of 0.60, significantly outperforming previous models (0.43). The model demonstrates an enhanced ability to identify temporally and culturally specific emotional expressions while preserving core poetic sentiments. Furthermore, applying the structured emotion dataset to a RAG-based poetry generation model demonstrates the empirical feasibility of generating texts that reflect the emotional and cultural sensibilities of Korean literature. This integrated approach strengthens the connection between computational techniques and literary analysis, opening new pathways for quantitative emotion research and generative poetics. Overall, this study provides a foundation for advancing emotion-centered analysis and creation in modern Korean poetry.
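The F1-micro score reported above pools true positives, false positives and false negatives over all emotion labels before computing precision and recall, so frequent categories weigh more than rare ones (unlike macro-F1, which averages per-label scores). A minimal multi-label sketch; the label names are made up for illustration.

```python
from typing import List, Set


def micro_f1(gold: List[Set[str]], pred: List[Set[str]]) -> float:
    """Micro-averaged F1 over multi-label predictions (one label set per item)."""
    tp = sum(len(g & p) for g, p in zip(gold, pred))
    fp = sum(len(p - g) for g, p in zip(gold, pred))
    fn = sum(len(g - p) for g, p in zip(gold, pred))
    if tp == 0:
        return 0.0
    precision, recall = tp / (tp + fp), tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


gold = [{"longing", "sorrow"}, {"hope"}]
pred = [{"longing"}, {"hope", "joy"}]
print(micro_f1(gold, pred))  # TP=2, FP=1, FN=1, i.e. 2/3
```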

2511.03876 2026-06-17 eess.IV cs.CV cs.LG physics.med-ph 版本更新

Computed Tomography (CT)-derived Cardiovascular Flow Estimation Using Physics-Informed Neural Networks Improves with Sinogram-based Training: A Simulation Study

基于物理信息神经网络的CT心血管血流估计在基于正弦图的训练下性能提升:一项模拟研究

Jinyuxuan Guo, Gurnoor Singh Khurana, Alejandro Gonzalo Grande, Juan C. del Alamo, Francisco Contijoch

发表机构 * Dept. of Bioengineering, University of California San Diego(加州大学圣地亚哥分校生物工程系) Dept. of Computer Science Engineering, University of California San Diego(加州大学圣地亚哥分校计算机科学与工程系) Dept. of Mechanical Engineering, Univ of Washington(华盛顿大学机械工程系) Depts of Mechanical Engineering and Cardiology, Univ. of Washington(华盛顿大学机械工程与心内科系) Depts. of Bioengineering, Radiology, University of California San Diego(加州大学圣地亚哥分校生物工程与放射学系)

AI总结 本研究评估了CT成像对基于物理信息神经网络(PINN)的血流估计的影响,提出了一种直接利用正弦图(sinogram)数据估计血流的改进框架SinoFlow,结果显示SinoFlow通过避免滤波反投影引入的误差而表现更优。

详情
AI中文摘要

背景:基于非侵入性成像的血流评估在评价心脏功能和结构中起关键作用。CT是一种广泛使用的成像模态,能够稳健地评估心血管解剖和功能,但尚未开发出直接从对比剂动态演变的影像序列中估计血流速度的方法。目的:本研究评估CT成像对基于物理信息神经网络(PINN)的血流估计的影响,并提出一种直接利用正弦图数据估计血流的改进框架SinoFlow。方法:我们利用计算流体力学在理想化的二维血管分叉中生成脉动流场,并在不同机架旋转速度、管电流和脉冲模式成像设置下模拟CT扫描。我们比较了基于重建图像的PINN血流估计(ImageFlow)与SinoFlow的性能。结果:SinoFlow通过避免传播滤波反投影引入的误差,显著提高了血流估计性能。SinoFlow在所有测试的机架旋转速度下均表现稳健,并且始终产生比ImageFlow更低的均方误差和速度误差。此外,SinoFlow与脉冲模式成像兼容,并在较短的脉冲宽度下保持更高的准确性。结论:本研究展示了SinoFlow在基于CT的血流估计中的潜力,为非侵入性血流评估提供了一种更有前景的方法。研究结果旨在为PINN在CT图像中的未来应用提供参考,并为基于图像的估计提供一种解决方案,在合理的采集参数下即可获得准确的血流估计。

英文摘要

Background: Non-invasive imaging-based assessment of blood flow plays a critical role in evaluating heart function and structure. Computed Tomography (CT) is a widely-used imaging modality that can robustly evaluate cardiovascular anatomy and function, but direct methods to estimate blood flow velocity from movies of contrast evolution have not been developed. Purpose: This study evaluates the impact of CT imaging on Physics-Informed Neural Networks (PINN)-based flow estimation and proposes an improved framework, SinoFlow, which uses sinogram data directly to estimate blood flow. Methods: We generated pulsatile flow fields in an idealized 2D vessel bifurcation using computational fluid dynamics and simulated CT scans with varying gantry rotation speeds, tube currents, and pulse mode imaging settings. We compared the performance of PINN-based flow estimation using reconstructed images (ImageFlow) to SinoFlow. Results: SinoFlow significantly improved flow estimation performance by avoiding propagating errors introduced by filtered backprojection. SinoFlow was robust across all tested gantry rotation speeds and consistently produced lower mean squared error and velocity errors than ImageFlow. Additionally, SinoFlow was compatible with pulsed-mode imaging and maintained higher accuracy with shorter pulse widths. Conclusions: This study demonstrates the potential of SinoFlow for CT-based flow estimation, providing a more promising approach for non-invasive blood flow assessment. The findings aim to inform future applications of PINNs to CT images and provide a solution for image-based estimation, with reasonable acquisition parameters yielding accurate flow estimates.
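To illustrate what training against sinogram data means, consider a toy parallel-beam projector with only two angles (0° and 90°, i.e. row and column sums of a pixel grid). A data-consistency loss computed directly on projections needs no filtered-backprojection step, so reconstruction errors cannot enter it; the toy also shows that a few angles leave the image underdetermined, which is where physics constraints such as those in a PINN help. This is an illustrative sketch under these simplifying assumptions, not SinoFlow's projector or loss.

```python
from typing import List, Sequence, Tuple

Grid = Sequence[Sequence[float]]
Sinogram = Tuple[List[float], List[float]]


def project_two_angles(img: Grid) -> Sinogram:
    """Toy parallel-beam projector: line integrals at 0 and 90 degrees."""
    rows = [float(sum(r)) for r in img]        # rays along x
    cols = [float(sum(c)) for c in zip(*img)]  # rays along y
    return rows, cols


def sinogram_mse(pred: Grid, measured: Sinogram) -> float:
    """Data-consistency loss computed on projections, with no reconstruction step."""
    p_rows, p_cols = project_two_angles(pred)
    m_rows, m_cols = measured
    diffs = [a - b for a, b in zip(p_rows + p_cols, m_rows + m_cols)]
    return sum(d * d for d in diffs) / len(diffs)


truth = [[0.0, 1.0], [1.0, 0.0]]
measured = project_two_angles(truth)
print(sinogram_mse(truth, measured))                     # 0.0
print(sinogram_mse([[1.0, 0.0], [0.0, 1.0]], measured))  # also 0.0: two angles cannot tell these apart
```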

2501.16370 2026-06-17 cs.LG cs.AI cs.NA cs.NE math.NA 版本更新

Advanced Physics-Informed Neural Network with Residuals for Solving Complex Integral Equations

用于求解复杂积分方程的带残差连接的先进物理信息神经网络

Mahdi Movahedian Moghaddam, Kourosh Parand, Saeed Reza Kheradpisheh

发表机构 * Department of Computer and Data Sciences, Shahid Beheshti University(沙希德·贝赫什提大学计算机与数据科学系) Department of Cognitive Modeling, Shahid Beheshti University(沙希德·贝赫什提大学认知建模系)

AI总结 本文提出残差积分求解网络(RISN),通过高精度数值方法与残差连接提升求解积分和积分微分方程的精度与稳定性,实验表明其在多种方程类型上均优于传统PINN及其变体。

详情
Journal ref
Anal. Numer. Solut. Nonlinear Equ. 11 (2026), no. 1, 153-173
AI中文摘要

本文提出残差积分求解网络(RISN),一种新型神经网络架构,旨在求解广泛类别的积分和积分微分方程,包括一维和多维问题、常积分微分方程和偏积分微分方程、方程组、分数阶类型,以及包含振荡核的亥姆霍兹型积分方程。RISN将残差连接与高斯求积和分数阶导数运算矩阵等高精度数值方法相结合,使其在精度和稳定性上优于传统物理信息神经网络(PINN)。残差连接有助于缓解梯度消失问题,使RISN能够处理更深的网络和更复杂的核,特别是在多维问题中。通过大量实验,我们证明RISN不仅始终优于经典PINN,也优于辅助PINN(A-PINN)和自适应PINN(SA-PINN)等先进变体,在各种方程类型上均取得显著更低的平均绝对误差(MAE)。这些结果突显了RISN在求解具有挑战性的积分和积分微分问题中的鲁棒性和效率,使其成为传统方法难以应对的实际应用中的有力工具。

英文摘要

In this paper, we present the Residual Integral Solver Network (RISN), a novel neural network architecture designed to solve a wide range of integral and integro-differential equations, including one-dimensional, multi-dimensional, ordinary and partial integro-differential, systems, fractional types, and Helmholtz-type integral equations involving oscillatory kernels. RISN integrates residual connections with high-accuracy numerical methods such as Gaussian quadrature and fractional derivative operational matrices, enabling it to achieve higher accuracy and stability than traditional Physics-Informed Neural Networks (PINN). The residual connections help mitigate vanishing gradient issues, allowing RISN to handle deeper networks and more complex kernels, particularly in multi-dimensional problems. Through extensive experiments, we demonstrate that RISN consistently outperforms not only classical PINNs but also advanced variants such as Auxiliary PINN (A-PINN) and Self-Adaptive PINN (SA-PINN), achieving significantly lower Mean Absolute Errors (MAE) across various types of equations. These results highlight RISN's robustness and efficiency in solving challenging integral and integro-differential problems, making it a valuable tool for real-world applications where traditional methods often struggle.
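A physics-informed solver for a Fredholm equation of the second kind, $u(x) = f(x) + \lambda\int_a^b K(x,t)\,u(t)\,dt$, needs the integral term at every collocation point; Gaussian quadrature replaces it with a weighted sum over a few nodes. The sketch below evaluates that residual with a hard-coded 5-point Gauss-Legendre rule (exact for polynomials up to degree 9) for a candidate solution `u`. In RISN the candidate would be the network output; here a plain Python function stands in for it, and the test problem ($K(x,t)=xt$, $\lambda=1$ on $[0,1]$, exact solution $u(x)=x$, hence $f(x)=2x/3$) is chosen only for illustration.

```python
from typing import Callable

Fn = Callable[[float], float]
Kernel = Callable[[float, float], float]

# 5-point Gauss-Legendre nodes and weights on [-1, 1].
_NODES = (-0.9061798459386640, -0.5384693101056831, 0.0,
          0.5384693101056831, 0.9061798459386640)
_WEIGHTS = (0.2369268850561891, 0.4786286704993665, 0.5688888888888889,
            0.4786286704993665, 0.2369268850561891)


def fredholm_residual(u: Fn, f: Fn, kernel: Kernel, lam: float,
                      a: float, b: float, x: float) -> float:
    """Residual u(x) - f(x) - lam * integral_a^b K(x,t) u(t) dt via quadrature."""
    half, mid = (b - a) / 2, (a + b) / 2  # affine map from [-1, 1] to [a, b]
    integral = sum(w * half * kernel(x, half * t + mid) * u(half * t + mid)
                   for t, w in zip(_NODES, _WEIGHTS))
    return u(x) - f(x) - lam * integral


exact = fredholm_residual(lambda t: t, lambda t: 2 * t / 3, lambda x, t: x * t,
                          1.0, 0.0, 1.0, 0.3)
print(abs(exact) < 1e-12)  # True: the exact solution has (numerically) zero residual
```

A training loop would minimise the squared residual over many collocation points `x`; the quadrature nodes stay fixed, so the integral costs only a handful of network evaluations per point.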

2509.10089 2026-06-17 cs.LG 版本更新

KAN-SR: A Kolmogorov-Arnold Network Guided Symbolic Regression Framework

KAN-SR:基于Kolmogorov-Arnold网络的符号回归框架

Marco Andrea Bühler, Gonzalo Guillén-Gosálbez

发表机构 * ETH Zürich(苏黎世联邦理工学院)

AI总结 本文提出基于Kolmogorov-Arnold网络的KAN-SR框架,通过深度学习技术和简化策略恢复Feynman符号回归科学发现数据集的真实方程,并结合神经控制微分方程精确建模生物过程系统。

详情
Journal ref
Computers & Chemical Engineering, Volume 213, 2026, 109721
AI中文摘要

我们提出了一种新颖的符号回归框架KAN-SR,它基于Kolmogorov-Arnold网络(KANs)构建,采用分而治之的策略。符号回归旨在寻找最能拟合给定数据集的数学方程,通常采用遗传编程方法求解。我们表明,通过使用深度学习技术(具体而言是KANs),并结合平移对称性和可分离性等简化策略,能够恢复Feynman科学发现符号回归(SRSD)数据集中的真实方程。此外,我们还表明,将所提出的框架与神经控制微分方程相结合,能够精确建模计算机模拟(in-silico)生物过程系统的动力学,为其他工程系统的动态建模打开了大门。

英文摘要

We introduce a novel symbolic regression framework, namely KAN-SR, built on Kolmogorov-Arnold Networks (KANs), which follows a divide-and-conquer approach. Symbolic regression searches for mathematical equations that best fit a given dataset and is commonly solved with genetic programming approaches. We show that by using deep learning techniques, more specifically KANs, and combining them with simplification strategies such as translational symmetries and separabilities, we are able to recover ground-truth equations of the Feynman Symbolic Regression for Scientific Discovery (SRSD) dataset. Additionally, we show that by combining the proposed framework with neural controlled differential equations, we are able to model the dynamics of an in-silico bioprocess system precisely, opening the door for the dynamic modeling of other engineering systems.
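Separability checks of the kind listed above can be run on any black-box fit: $f(x,y)$ is additively separable when $f(x_1,y_1)+f(x_2,y_2) = f(x_1,y_2)+f(x_2,y_1)$ for all pairs, and multiplicatively separable when the same identity holds for products. If a test passes, the problem splits into two smaller one-variable fits. The numerical check below samples random pairs; it is a generic sketch, and the sampling range, tolerance and function names are assumptions rather than KAN-SR's exact procedure.

```python
import math
import random
from typing import Callable

F = Callable[[float, float], float]
Combine = Callable[[float, float], float]


def _identity_holds(f: F, combine: Combine, trials: int, lo: float, hi: float,
                    tol: float, seed: int) -> bool:
    rng = random.Random(seed)
    for _ in range(trials):
        x1, x2, y1, y2 = (rng.uniform(lo, hi) for _ in range(4))
        lhs = combine(f(x1, y1), f(x2, y2))
        rhs = combine(f(x1, y2), f(x2, y1))
        if not math.isclose(lhs, rhs, rel_tol=tol, abs_tol=tol):
            return False
    return True


def additively_separable(f: F, trials: int = 50, lo: float = 0.5, hi: float = 2.0,
                         tol: float = 1e-9, seed: int = 0) -> bool:
    return _identity_holds(f, lambda a, b: a + b, trials, lo, hi, tol, seed)


def multiplicatively_separable(f: F, trials: int = 50, lo: float = 0.5, hi: float = 2.0,
                               tol: float = 1e-9, seed: int = 0) -> bool:
    return _identity_holds(f, lambda a, b: a * b, trials, lo, hi, tol, seed)


print(additively_separable(lambda x, y: x ** 2 + math.sin(y)))  # True
print(multiplicatively_separable(lambda x, y: x * y))           # True
```

With a fitted model rather than an exact function, the tolerance has to be loosened to the model's approximation error.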

2508.10908 2026-06-17 physics.ao-ph cs.LG 版本更新

Data-driven global ocean model resolving ocean-atmosphere coupling dynamics

数据驱动的全球海洋模型解析海洋-大气耦合动力学

Jeong-Hwan Kim, Daehyun Kang, Young-Min Yang, Jae-Heung Park, Yoo-Geun Ham

发表机构 * Center for Climate and Carbon Cycle Research, Korea Institute of Science and Technology, Seoul, Republic of Korea(韩国科学技术研究院气候与碳循环研究中心,首尔,大韩民国) Department of Environment and Energy, Jeonbuk National University, Jeonju, Republic of Korea(全北国立大学环境与能源系,全州,大韩民国) School of Earth and Environmental Sciences, Seoul National University, Seoul, Republic of Korea(首尔大学地球与环境科学学院,首尔,大韩民国) Department of Environmental Management, Seoul National University, Seoul, Republic of Korea(首尔大学环境管理系,首尔,大韩民国)

AI总结 本文提出KIST-Ocean模型,采用U型视觉注意力对抗网络架构,结合部分卷积、对抗训练和迁移学习提升海洋预测能力,能够准确模拟热带太平洋的Kelvin波和Rossby波传播以及气旋性和反气旋性风应力诱导的垂直运动,展现了其表示气候现象背后海洋-大气耦合机制的能力。

Comments The manuscript contains 4 main figures. The Extended Data contains 7 figures and 3 tables. The Supplementary Information contains 3 text sections, 7 figures, 1 table

详情
Journal ref
Sci. Adv. 12, eaed1225 (2026)
AI中文摘要

人工智能推动了全球天气预报的发展,在准确性和计算效率上均优于传统数值模型。然而,要将预测延伸到次季节以上的时间尺度,需要开发基于深度学习的海洋-大气耦合模型,以真实模拟海洋对大气强迫的复杂响应。本文提出KIST-Ocean,一种基于深度学习的全球三维海洋环流模型,采用U型视觉注意力对抗网络架构。KIST-Ocean结合部分卷积、对抗训练和迁移学习,以应对海岸复杂性和自回归模型中的预测分布漂移问题。全面评估证实了该模型稳健的海洋预测能力和计算效率。此外,它能准确捕捉真实的海洋响应,如热带太平洋的Kelvin波和Rossby波传播,以及由气旋性和反气旋性风应力引起的垂直运动,展示了其表示厄尔尼诺-南方涛动等气候现象背后关键海洋-大气耦合机制的能力。这些发现增强了对基于深度学习的全球天气和气候模型的信心,并支持将深度学习方法拓展到更广泛的地球系统建模中,为提升气候预测能力提供了潜力。

英文摘要

Artificial intelligence has advanced global weather forecasting, outperforming traditional numerical models in both accuracy and computational efficiency. Nevertheless, extending predictions beyond subseasonal timescales requires the development of deep learning (DL)-based ocean-atmosphere coupled models that can realistically simulate complex oceanic responses to atmospheric forcing. This study presents KIST-Ocean, a DL-based global three-dimensional ocean general circulation model using a U-shaped visual attention adversarial network architecture. KIST-Ocean integrates partial convolution, adversarial training, and transfer learning to address coastal complexity and predictive distribution drift in auto-regressive models. Comprehensive evaluations confirmed the model's robust ocean predictive skill and efficiency. Moreover, it accurately captures realistic ocean responses, such as Kelvin and Rossby wave propagation in the tropical Pacific, and vertical motions induced by cyclonic and anticyclonic wind stress, demonstrating its ability to represent key ocean-atmosphere coupling mechanisms underlying climate phenomena, including the El Niño-Southern Oscillation. These findings reinforce confidence in DL-based global weather and climate models and support extending DL-based approaches to broader Earth system modeling, offering potential for enhancing climate prediction capabilities.
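Partial convolution, one of the ingredients listed above for handling coastlines, rescales each window by the fraction of valid (ocean) cells so that land points do not drag values toward zero, and propagates an updated mask. A minimal 1-D, valid-padding sketch of the commonly used formulation $y = W^\top(X\odot M)\cdot\frac{\mathrm{sum}(\mathbf{1})}{\mathrm{sum}(M)} + b$; KIST-Ocean's actual layers are multi-dimensional and their details are not given in the abstract.

```python
from typing import List, Sequence, Tuple


def partial_conv1d(x: Sequence[float], mask: Sequence[int],
                   weight: Sequence[float], bias: float = 0.0
                   ) -> Tuple[List[float], List[int]]:
    """1-D partial convolution (valid padding, stride 1).

    Each output is W.(X*M) rescaled by k / sum(M) over its window, plus bias;
    windows with no valid cell output 0 and are marked invalid in the new mask.
    """
    k = len(weight)
    out: List[float] = []
    new_mask: List[int] = []
    for s in range(len(x) - k + 1):
        m = mask[s:s + k]
        valid = sum(m)
        if valid == 0:
            out.append(0.0)
            new_mask.append(0)
            continue
        acc = sum(w * xi * mi for w, xi, mi in zip(weight, x[s:s + k], m))
        out.append(acc * k / valid + bias)
        new_mask.append(1)
    return out, new_mask


# Ocean cells hold 2.0; the last two cells are land (mask 0, value 0.0).
print(partial_conv1d([2.0, 2.0, 2.0, 0.0, 0.0], [1, 1, 1, 0, 0], [1 / 3] * 3))
```

Near the coast the averaging kernel still returns about 2.0, whereas an ordinary convolution on the same input would give 2.0, 4/3 and 2/3 because the land zeros leak into the windows.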

2502.10112 2026-06-17 cs.LG 版本更新

Accelerometry-based Energy Expenditure Estimation During Activities of Daily Living: A Comparison Among Different Accelerometer Compositions

基于加速度计的日常活动能量消耗估计:不同加速度计配置的比较

Shuhao Que, Remco Poelarends, Peter Veltink, Miriam Vollenbroek-Hutten, Ying Wang

发表机构 * Department of Electrical Engineering, University of Twente(特文特大学电气工程系) Department of Nuclear Medicine, Isala(Isala核医学部)

AI总结 本文比较了基于身体中心质量加速度和腕部加速度计的不同配置在日常活动能量消耗估计中的表现,发现基于身体中心质量的3-acc配置表现最佳。

Comments This work has been accepted by IEEE EMBC 2025

详情
AI中文摘要

身体活动能量消耗(PAEE)可通过逐次呼吸数据测量,并以此作为参考;也可以根据由加速度计测量和估计的身体运动来预测。身体中心质量(COM)加速度反映全身运动,是PAEE的良好预测指标;而随着腕戴设备的发展,腕部也成为常用的佩戴位置。本文以COSMED K5测量的呼吸数据为参考,评估并比较了基于COM和基于腕部的配置。COM配置包括仅使用骨盆加速度计(pelvis-acc)以及骨盆加速度计加双侧大腿加速度计(3-acc);腕部配置包括仅使用左腕加速度计(l-wrist-acc)或仅使用右腕加速度计(r-wrist-acc)。我们在自采数据集上实现了两种现有PAEE估计方法(线性回归LR和CNN-LSTM),其中9名参与者佩戴5个加速度计(骨盆、双侧大腿和双侧腕部)进行日常生活活动。两种模型均在3-acc配置下取得最佳结果(LR:R²=0.41,CNN-LSTM:R²=0.53)。3-acc与pelvis-acc配置之间无显著差异(p值=0.278)。对于两种模型,左腕和右腕配置对PAEE均无预测能力(R²接近0),显著劣于两种COM配置(p值<0.05)。左右腕之间无显著差异(p值=0.329)。

英文摘要

Physical activity energy expenditure (PAEE) can be measured from breath-by-breath respiratory data, which can serve as a reference. Alternatively, PAEE can be predicted from the body movements, which can be measured and estimated with accelerometers. The body center of mass (COM) acceleration reflects the movements of the whole body and thus serves as a good predictor for PAEE. However, the wrist has also become a popular location due to recent advancements in wrist-worn devices. Therefore, in this work, using the respiratory data measured by COSMED K5 as the reference, we evaluated and compared the performances of COM-based settings and wrist-based settings. The COM-based settings include two different accelerometer compositions, using only the pelvis accelerometer (pelvis-acc) and the pelvis accelerometer with two accelerometers from two thighs (3-acc). The wrist-based settings include using only the left wrist accelerometer (l-wrist-acc) and only the right wrist accelerometer (r-wrist-acc). We implemented two existing PAEE estimation methods on our collected dataset, where 9 participants performed activities of daily living while wearing 5 accelerometers (i.e., pelvis, two thighs, and two wrists). These two methods include a linear regression (LR) model and a CNN-LSTM model. Both models yielded the best results with the COM-based 3-acc setting (LR: $R^2$ = 0.41, CNN-LSTM: $R^2$ = 0.53). No significant difference was found between the 3-acc and pelvis-acc settings (p-value = 0.278). For both models, neither the l-wrist-acc nor the r-wrist-acc settings demonstrated predictive power on PAEE with $R^2$ values close to 0, significantly outperformed by the two COM-based settings (p-values < 0.05). No significant difference was found between the two wrists (p-value = 0.329).
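The $R^2$ values above are coefficients of determination, $R^2 = 1 - \mathrm{SS}_\mathrm{res}/\mathrm{SS}_\mathrm{tot}$: a value near 0, as reported for the wrist settings, means the predictions explain almost none of the variance beyond always predicting the mean PAEE. A minimal sketch with a one-feature least-squares fit; the feature (for example a mean acceleration magnitude per time window) and the numbers are hypothetical, and the paper's LR model may use other features.

```python
from typing import List, Sequence, Tuple


def fit_line(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


accel = [0.1, 0.4, 0.8, 1.2, 1.5]   # hypothetical feature per window
paee = [1.0, 2.3, 3.6, 6.1, 6.8]    # hypothetical reference PAEE
slope, intercept = fit_line(accel, paee)
preds: List[float] = [slope * a + intercept for a in accel]
print(round(r_squared(paee, preds), 3))
```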

2503.08679 2026-06-17 cs.AI cs.CL cs.LG 版本更新

Chain-of-Thought Reasoning In The Wild Is Not Always Faithful

现实中的思维链推理并不总是忠实的

Iván Arcuschin, Jett Janiak, Robert Krzyzanowski, Senthooran Rajamanoharan, Neel Nanda, Arthur Conmy

发表机构 * Poseidon Research(Poseidon研究)

AI总结 研究发现,在自然语言提示下,模型有时会生成表面连贯但自相矛盾的思维链,揭示出隐含的事后合理化现象,且前沿模型也未能完全避免。

Comments Published at the 43rd International Conference on Machine Learning (ICML 2026)

详情
AI中文摘要

最近的研究表明,当面对提示中的显式偏见时,模型通常会在其思维链(CoT)输出中省略提及这些偏见,揭示出口头推理可能给出模型如何得出错误结论的不正确图景(不忠实)。在这项工作中,我们展示了不忠实的CoT也发生在自然措辞、非对抗性的提示上,而无需添加人为偏见或编辑模型输出。我们发现,当分别呈现问题“X比Y大吗?”和“Y比X大吗?”时,模型有时会生成表面连贯的论证来证明系统性地对两者都回答“是”或都回答“否”是合理的,尽管存在矛盾。我们提供了初步证据表明这是由于模型对“是”或“否”的隐含偏见,并将其标记为隐含的事后合理化。我们的结果显示,生产模型的不忠实率高达13%,而前沿模型虽然更忠实,但没有一个完全忠实,包括像DeepSeek R1(0.37%)和Sonnet 3.7 with thinking(0.04%)这样的思考模型。我们还研究了不忠实的非逻辑捷径,即模型使用微妙的非逻辑推理来使对困难数学问题的推测性答案看起来经过严格证明。我们的发现表明,虽然CoT可用于评估输出,但它并不是产生模型答案的内部过程的完整描述,应在代理或安全关键环境中谨慎使用。

英文摘要

Recent studies indicate that when faced with explicit biases in prompts, models often omit mentioning these biases in their Chain-of-Thought (CoT) output, revealing that verbalized reasoning can give an incorrect picture of how models arrive at conclusions (unfaithfulness). In this work, we show that unfaithful CoT also occurs on naturally worded, non-adversarial prompts without adding artificial biases or editing model outputs. We find that when separately presented with the questions "Is X bigger than Y?" and "Is Y bigger than X?", models sometimes produce superficially coherent arguments to justify systematically answering Yes to both or No to both, despite the contradiction. We present preliminary evidence that this is due to models' implicit biases towards Yes or No, labeling this Implicit Post-Hoc Rationalization. Our results reveal rates up to 13% for production models, and while frontier models are more faithful, none are entirely so, including thinking models like DeepSeek R1 (0.37%) and Sonnet 3.7 with thinking (0.04%). We also investigate Unfaithful Illogical Shortcuts, where models use subtly illogical reasoning to make speculative answers to hard math problems seem rigorously proven. Our findings indicate that while CoT can be useful for assessing outputs, it is not a complete account of the internal process that produced the model's answer and should be used with caution in agentic or safety-critical settings.
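The paired-question probe described above can be scored mechanically: for distinct X and Y, "Is X bigger than Y?" and "Is Y bigger than X?" should receive opposite answers, so a Yes/Yes or No/No pair is logically inconsistent. A minimal sketch of that rate computation, assuming answers have already been extracted as strings; it only scores final answers and says nothing about the reasoning text, which is where the paper locates the rationalization.

```python
from typing import Sequence, Tuple

Pair = Tuple[str, str]  # (answer to "Is X bigger than Y?", answer to "Is Y bigger than X?")


def inconsistency_rate(pairs: Sequence[Pair]) -> float:
    """Fraction of question pairs answered Yes/Yes or No/No."""
    if not pairs:
        return 0.0

    def norm(answer: str) -> str:
        return answer.strip().lower()

    bad = sum(1 for a, b in pairs if norm(a) == norm(b) and norm(a) in {"yes", "no"})
    return bad / len(pairs)


answers = [("Yes", "No"), ("No", "Yes"), ("Yes", "Yes"), ("no", "No ")]
print(inconsistency_rate(answers))  # 2 of 4 pairs contradict each other
```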

2506.08654 2026-06-17 physics.med-ph cs.LG 版本更新

A Privacy-Preserving Federated Learning Framework for Generalizable CBCT to Synthetic CT Translation in Head and Neck

一种用于头颈部CBCT到合成CT可泛化转换的隐私保护联邦学习框架

Ciro Benito Raggio, Paolo Zaffino, Maria Francesca Spadea

发表机构 * Institute of Biomedical Engineering(生物医学工程研究所) Karlsruhe Institute of Technology(卡尔斯鲁厄理工学院) Department of Experimental and Clinical Medicine(实验与临床医学系)

AI总结 本文提出一种跨机构联邦学习框架,用于头颈区域CBCT到合成CT的转换,通过保护数据隐私实现跨机构模型的泛化能力。

详情
Journal ref
Frontiers in Digital Health, 8:1812254, June 2026
AI中文摘要

锥束计算机断层扫描(CBCT)已成为图像引导放射治疗(IGRT)中广泛应用的成像模态。然而,CBCT存在噪声增加、软组织对比度有限和伪影等问题,导致Hounsfield单位值不可靠,阻碍了直接剂量计算。合成CT(sCT)生成从CBCT中解决了这些问题,尤其是使用深度学习(DL)方法。现有方法受到机构异质性、扫描仪依赖性变化和数据隐私法规的限制,这些法规防止多中心数据共享。为克服这些挑战,我们提出了一种跨机构横向联邦学习(FL)方法,用于头颈区域CBCT到sCT的合成,扩展了我们的FedSynthCT框架。一个条件生成对抗网络在欧洲三个医疗中心的公共SynthRAD2025挑战数据集上协同训练。联邦模型在不同中心间表现出有效的泛化能力,平均绝对误差(MAE)范围从64.38±13.63到85.90±7.10 HU,结构相似性指数(SSIM)从0.882±0.022到0.922±0.039,峰值信噪比(PSNR)从32.86±0.94到34.91±1.04 dB。值得注意的是,在60名患者的外部验证数据集上,未进行额外训练即可实现相似的性能(MAE: 75.22±11.81 HU,SSIM: 0.904±0.034,PSNR: 33.52±2.06 dB),证实了在协议、扫描仪差异和配准误差的情况下具有鲁棒的泛化能力。这些发现展示了联邦学习在CBCT到sCT合成中的技术可行性,同时保护了数据隐私,并提供了一种无需集中数据共享或特定站点微调即可在不同机构之间开发可推广模型的协作解决方案。

英文摘要

Cone-beam computed tomography (CBCT) has become a widely adopted modality for image-guided radiotherapy (IGRT). However, CBCT suffers from increased noise, limited soft-tissue contrast, and artifacts, resulting in unreliable Hounsfield unit values and hindering direct dose calculation. Synthetic CT (sCT) generation from CBCT addresses these issues, especially using deep learning (DL) methods. Existing approaches are limited by institutional heterogeneity, scanner-dependent variations, and data privacy regulations that prevent multi-center data sharing. To overcome these challenges, we propose a cross-silo horizontal federated learning (FL) approach for CBCT-to-sCT synthesis in the head and neck region, extending our FedSynthCT framework. A conditional generative adversarial network was collaboratively trained on data from three European medical centers in the public SynthRAD2025 challenge dataset. The federated model demonstrated effective generalization across centers, with mean absolute error (MAE) ranging from $64.38\pm13.63$ to $85.90\pm7.10$ HU, structural similarity index (SSIM) from $0.882\pm0.022$ to $0.922\pm0.039$, and peak signal-to-noise ratio (PSNR) from $32.86\pm0.94$ to $34.91\pm1.04$ dB. Notably, on an external validation dataset of 60 patients, comparable performance was achieved (MAE: $75.22\pm11.81$ HU, SSIM: $0.904\pm0.034$, PSNR: $33.52\pm2.06$ dB) without additional training, confirming robust generalization despite differences in protocols and scanners and the presence of registration errors. These findings demonstrate the technical feasibility of FL for CBCT-to-sCT synthesis while preserving data privacy and offer a collaborative solution for developing generalizable models across institutions without centralized data sharing or site-specific fine-tuning.
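In cross-silo horizontal FL, each center trains on its own data and shares only model parameters, which a server aggregates into a global model. The abstract does not state FedSynthCT's aggregation rule; FedAvg, shown below as a minimal stdlib sketch with parameters stored as named flat lists of floats, is the common baseline: a weighted mean with weights proportional to each center's number of training samples. Center names, parameter names and sizes are hypothetical.

```python
from typing import Dict, List

Params = Dict[str, List[float]]


def fed_avg(updates: Dict[str, Params], sizes: Dict[str, int]) -> Params:
    """Sample-size-weighted average of per-center parameters (FedAvg)."""
    total = sum(sizes[c] for c in updates)
    first = next(iter(updates.values()))
    return {
        name: [
            sum(sizes[c] * updates[c][name][i] for c in updates) / total
            for i in range(len(values))
        ]
        for name, values in first.items()
    }


local = {
    "center_A": {"conv.w": [1.0, 1.0], "conv.b": [0.0]},
    "center_B": {"conv.w": [3.0, 3.0], "conv.b": [4.0]},
}
print(fed_avg(local, {"center_A": 1, "center_B": 3}))
```

Only these parameter lists leave each center; the CBCT and CT images themselves never do, which is the privacy property the abstract relies on.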

2501.15351 2026-06-17 cs.CY cs.LG Updated version

Fairness in LLM-Generated Surveys

Andrés Abeliuk, Vanessa Gaete, Naim Bro

Affiliations: Department of Computer Science, University of Chile; National Center for Artificial Intelligence (CENIA); School of Government, Adolfo Ibáñez University; Millennium Institute for Foundational Research on Data (IMFD)

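AI summary: The study analyzes how LLMs perform across different populations, finding that they perform better on U.S. datasets and show fairness problems attributed to biased training data; it proposes a new framework for measuring these biases to support fairer model performance.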

Journal ref: EPJ Data Science (2026)

English abstract

Large Language Models (LLMs) excel in text generation and understanding, especially in simulating socio-political and economic patterns, and can serve as an alternative to traditional surveys. However, their global applicability remains questionable because their biases across socio-demographic and geographic contexts are unexplored. This study examines how LLMs perform across diverse populations by analyzing public surveys from Chile and the United States, focusing on predictive accuracy and fairness metrics. The results show performance disparities, with LLMs consistently performing better on the U.S. datasets. This bias originates from U.S.-centric training data and remains evident after accounting for socio-demographic differences. In the U.S., political identity and race significantly influence prediction accuracy, while in Chile, gender, education, and religious affiliation play more pronounced roles. Our study presents a novel framework for measuring socio-demographic biases in LLMs, offering a path toward fairer and more equitable model performance across diverse socio-cultural contexts.
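
The abstract does not spell out the bias-measurement framework itself. As a hypothetical illustration of the kind of fairness metric involved, the sketch below computes the LLM's prediction accuracy per socio-demographic group and the largest accuracy gap between groups; the group names and survey answers are made up.

```python
from collections import defaultdict
from typing import Dict, Hashable, Sequence


def accuracy_by_group(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    groups: Sequence[Hashable],
) -> Dict[Hashable, float]:
    """Share of survey answers the LLM predicted correctly, per group."""
    if not (len(y_true) == len(y_pred) == len(groups)):
        raise ValueError("y_true, y_pred and groups must have equal length")
    correct: Dict[Hashable, int] = defaultdict(int)
    total: Dict[Hashable, int] = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        total[group] += 1
        correct[group] += int(truth == pred)
    return {g: correct[g] / total[g] for g in total}


def max_accuracy_gap(per_group: Dict[Hashable, float]) -> float:
    """Largest accuracy difference between any two groups (0 means parity)."""
    return max(per_group.values()) - min(per_group.values())


if __name__ == "__main__":
    # Hypothetical answers from four respondents in two groups.
    truth = ["agree", "disagree", "agree", "disagree"]
    pred = ["agree", "disagree", "disagree", "disagree"]
    group = ["group_a", "group_a", "group_b", "group_b"]
    acc = accuracy_by_group(truth, pred, group)
    print(acc)                    # {'group_a': 1.0, 'group_b': 0.5}
    print(max_accuracy_gap(acc))  # 0.5
```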

2305.09366 2026-06-17 cs.LG eess.SP Updated version

Evaluation of self-supervised pre-training for automatic infant movement classification using wearable movement sensors

Einari Vaaras, Manu Airaksinen, Sampsa Vanhatalo, Okko Räsänen

Affiliations: Helsinki University Hospital, Helsinki, Finland

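AI summary: This paper evaluates how self-supervised pre-training affects the accuracy of infant movement classification from wearable movement sensors, finding that pre-training on unlabeled data consistently increases the accuracy of the classification models and that selecting context-relevant pre-training data improves performance further.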

Comments: To be published in Proc. IEEE EMBC 2023, Sydney, Australia

English abstract

The recently developed infant wearable MAIJU provides a means to evaluate infants' motor performance automatically, objectively, and at scale in out-of-hospital settings. This information could be used in developmental research and to support clinical decision-making, such as detecting developmental problems and guiding therapeutic interventions. MAIJU-based analyses rely entirely on classifying the infant's posture and movement, so it is essential to find ways to increase the accuracy of these classifications and thereby the reliability and robustness of the automated analysis. Here, we investigated how self-supervised pre-training improves the performance of the classifiers used to analyze MAIJU recordings, and whether classifier performance is affected by context-selective quality screening of the pre-training data, which excludes periods with little infant movement or with missing sensors. Our experiments show that i) pre-training the classifier with unlabeled data leads to a robust accuracy increase in subsequent classification models, and ii) selecting context-relevant pre-training data leads to substantial further improvements in classifier performance.
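
The context-selective screening step can be pictured as a filter over pre-training windows. The sketch below is a hypothetical version: a window is dropped if any sensor signal is missing, or if the mean per-sensor variance (used here as a stand-in for "little infant movement") falls below a threshold. The window layout and the 0.01 threshold are assumptions, not values from the paper.

```python
from statistics import pvariance
from typing import List, Optional, Sequence

# One pre-training window: one signal per sensor, None if the sensor is missing.
Window = Sequence[Optional[Sequence[float]]]


def keep_for_pretraining(window: Window, min_variance: float = 0.01) -> bool:
    """Hypothetical screening rule: keep a window only if every sensor is present
    and the signals vary enough to suggest the infant is actually moving."""
    if not window or any(signal is None or len(signal) == 0 for signal in window):
        return False  # missing sensor(s): exclude
    mean_variance = sum(pvariance(signal) for signal in window) / len(window)
    return mean_variance >= min_variance  # too little movement: exclude


def screen_windows(windows: Sequence[Window], min_variance: float = 0.01) -> List[Window]:
    """Return the subset of windows that passes the screening rule."""
    return [w for w in windows if keep_for_pretraining(w, min_variance)]


if __name__ == "__main__":
    still = [[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]]            # no movement
    moving = [[0.0, 1.0, 0.0, 1.0], [0.0, 0.5, 0.0, 0.5]]  # clear movement
    missing = [[0.0, 1.0], None]                          # one sensor absent
    print(len(screen_windows([still, moving, missing])))  # 1
```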

2206.10188 2026-06-17 cs.LG cs.SD eess.AS Updated version

Analysis of Self-Supervised Learning and Dimensionality Reduction Methods in Clustering-Based Active Learning for Speech Emotion Recognition

Einari Vaaras, Manu Airaksinen, Okko Räsänen

Affiliations: Unit of Computing Sciences, Tampere University, Finland; Helsinki University Hospital, Helsinki, Finland

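AI summary: This paper studies the use of self-supervised learning and dimensionality reduction to improve clustering-based active learning for speech emotion recognition, examines how the local and global topology of the feature space affect active learning, and finds that dimensionality reduction does not substantially harm performance and that 2-D features perform well unless the number of annotations is very low.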

Comments: To be published in Proc. Interspeech 2022, Incheon, South Korea

English abstract

When domain experts are needed to annotate data for complex machine-learning tasks, reducing annotation effort is crucial for cutting time and expenses. When no annotations are available, one approach is to exploit the structure of the feature space with clustering-based active learning (AL) methods. However, these methods depend heavily on how the samples are organized in the feature space and on which distance metric is used. Unsupervised methods such as contrastive predictive coding (CPC) can potentially learn organized feature spaces, but they typically produce high-dimensional features, which can make data density hard to estimate. In this paper, we combine CPC with multiple dimensionality reduction methods to identify working practices for clustering-based AL. Our experiments, which simulate the deployment of a speech emotion recognition system, show that both the local and the global topology of the feature space can be used successfully for AL, and that CPC can improve clustering-based AL performance over traditional signal features. Additionally, we observe that compressing data dimensionality does not substantially harm AL performance, and that 2-D feature representations achieve AL performance similar to higher-dimensional representations when the number of annotations is not very low.
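
As an illustration of clustering-based AL in a low-dimensional feature space, the sketch below runs k-means on 2-D features, asks the annotator to label only the sample closest to each cluster centroid, and propagates that label to the rest of its cluster. Nearest-to-centroid querying is one common variant chosen here for illustration; the abstract does not describe the paper's exact selection and labeling rules, and the points, `k`, and emotion labels are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, ...]


def assign_points(points: Sequence[Point], centroids: Sequence[Point]) -> List[int]:
    """Index of the nearest centroid (Euclidean distance) for every point."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points: Sequence[Point], k: int, iters: int = 20) -> Tuple[List[Point], List[int]]:
    """Plain k-means with a deterministic init (the first k points)."""
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        assign = assign_points(points, centroids)
        updated: List[Point] = []
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                updated.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                updated.append(centroids[c])  # keep an empty cluster's centroid
        if updated == centroids:
            break  # converged
        centroids = updated
    return centroids, assign_points(points, centroids)


def select_queries(points: Sequence[Point], centroids: Sequence[Point],
                   assign: Sequence[int]) -> List[int]:
    """For each non-empty cluster, the index of the sample nearest its centroid."""
    queries: List[int] = []
    for c, centroid in enumerate(centroids):
        members = [i for i, a in enumerate(assign) if a == c]
        if members:
            queries.append(min(members, key=lambda i: math.dist(points[i], centroid)))
    return queries


def propagate_labels(assign: Sequence[int], queries: Sequence[int],
                     oracle: Dict[int, str]) -> List[str]:
    """Copy each queried sample's human label to every sample in its cluster."""
    cluster_label = {assign[i]: oracle[i] for i in queries}
    return [cluster_label[a] for a in assign]


if __name__ == "__main__":
    feats = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.1, 5.0), (0.0, 0.2), (5.0, 5.2)]
    cents, assign = kmeans(feats, k=2)
    queries = select_queries(feats, cents, assign)
    print(queries)  # [0, 2]
    print(propagate_labels(assign, queries, {0: "neutral", 2: "angry"}))
```

Only two of the six samples are shown to the annotator here; the remaining labels come from cluster membership, which is why the method is so sensitive to how the feature space is organized.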

2106.09539 2026-06-17 eess.AS cs.LG cs.SD Updated version

Automatic Analysis of the Emotional Content of Speech in Daylong Child-Centered Recordings from a Neonatal Intensive Care Unit

Einari Vaaras, Sari Ahlqvist-Björkroth, Konstantinos Drossos, Okko Räsänen

Affiliations: Unit of Computing Sciences, Tampere University, Finland; Department of Clinical Medicine, University of Turku, Finland; Department of Signal Processing and Acoustics, Aalto University, Finland

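AI summary: This paper studies how to analyze the emotional content of speech in neonatal recordings with an automatic speech emotion recognition system, compares cross-corpus generalization, WGAN-based domain adaptation, and active learning for deployment to a new domain, and reaches 73.4% UAR for valence and 73.2% UAR for arousal in binary classification.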

English abstract

Researchers have recently started to study how the emotional speech heard by young infants can affect their developmental outcomes. As part of this research, hundreds of hours of daylong recordings from preterm infants' audio environments were collected at two hospitals in Finland and Estonia in the context of the so-called APPLE study. Analyzing the emotional content of speech in such a massive dataset requires an automatic speech emotion recognition (SER) system. However, no emotion labels or existing in-domain SER systems are available for this purpose. In this paper, we introduce this initially unannotated, large-scale, real-world audio dataset and describe the development of a functional SER system for the Finnish subset of the data. We explore the effectiveness of alternative state-of-the-art techniques for deploying an SER system to a new domain, comparing cross-corpus generalization, WGAN-based domain adaptation, and active learning. We show that the best-performing models achieve an unweighted average recall (UAR) of 73.4% for binary valence classification and 73.2% for binary arousal classification. The results also show that active learning achieves the most consistent performance of the three approaches.
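
UAR, the metric reported above, is the unweighted mean of per-class recall, so every class counts equally regardless of how many samples it has. A minimal sketch with made-up binary valence labels shows how it differs from plain accuracy when the classes are imbalanced:

```python
from collections import Counter
from typing import Hashable, Sequence


def unweighted_average_recall(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Mean of per-class recall over the classes present in y_true."""
    if not y_true or len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    support = Counter(y_true)  # number of samples per true class
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    recalls = [hits[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


if __name__ == "__main__":
    # Hypothetical labels: three positive clips, one negative.
    truth = ["positive", "positive", "positive", "negative"]
    pred = ["positive", "positive", "negative", "negative"]
    accuracy = sum(t == p for t, p in zip(truth, pred)) / len(truth)
    print(accuracy)                                          # 0.75
    print(round(unweighted_average_recall(truth, pred), 4))  # 0.8333
```

Because the minority class contributes half of the score, a model cannot reach a high UAR by predicting only the majority class; the same quantity is also known as macro-averaged recall or balanced accuracy.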